
gh-150717: Avoid mark-array allocation for groupless regex patterns #150719

Open
gaborbernat wants to merge 1 commit into python:main from gaborbernat:opt/re-skip-groupless-mark-alloc



@gaborbernat gaborbernat commented Jun 1, 2026

Every match, search, or fullmatch on a pattern with no capturing groups allocates capture-group bookkeeping, then frees it without ever reading it. Group-less patterns are common in validation and scanning code.

This skips the allocation for patterns with no capturing groups. Patterns that capture stay untouched, and results are identical.

It helps validation and scanning in tight loops: checking record formats during an import with `re.match(r"\d{4}-\d{2}-\d{2}", value)`, scanning log lines with `re.search(r"ERROR|WARN", line)`, or testing many small patterns per request.
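
Whether a given pattern is affected can be checked from Python: the compiled pattern's `groups` attribute counts capturing groups, and group-less patterns report 0. A minimal sketch using the example patterns above (the input strings are made up for illustration):

```python
import re

# Pattern.groups counts capturing groups only; non-capturing groups
# and lookarounds do not contribute.
date_pat = re.compile(r"\d{4}-\d{2}-\d{2}")
level_pat = re.compile(r"ERROR|WARN")
noncapturing_pat = re.compile(r"(?:ab)+(?=c)")
capturing_pat = re.compile(r"(\d+)\.(\d+)")

group_counts = [p.groups for p in (date_pat, level_pat, noncapturing_pat, capturing_pat)]
print(group_counts)

# The whole-match span (group 0) is still available for a group-less pattern.
m = date_pat.match("2026-06-01 import batch")
print(m.group(0), m.span(), m.groups())
```

Only the last pattern has capturing groups, so only it keeps the allocation.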

A pyperf comparison of base versus patched builds (script and full table in a comment below) uses the most widely used group-less patterns mined from the top-1000 PyPI packages. They run 1.03 to 1.17x faster, geometric mean 1.09x, with the short match/search cases gaining most; a capturing-group control is unchanged.

Resolves #150717.

state_init() always allocated state->mark with PyMem_New() for groups*2
entries, which for a pattern with no capturing groups is PyMem_Malloc(0) --
a real allocation (plus matching free) on every match/search/fullmatch
call, for an array that is never read: groupless patterns emit no MARK
opcodes and group 0's span is taken from state->start/ptr.

Guard the allocation with `if (pattern->groups)`. state->mark stays NULL
(set by the preceding memset), and both the error path and state_fini()
already call PyMem_Free() on it, which is a no-op for NULL.
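
The allocator behaviour this relies on can be observed from Python on CPython through `ctypes.pythonapi`. This is only an illustration of the documented `PyMem_Malloc`/`PyMem_Free` contract, not code from the patch, and the variable names are hypothetical:

```python
import ctypes

# Bind the CPython allocator functions. ctypes.pythonapi calls run with the
# GIL held, which PyMem_Malloc/PyMem_Free require.
pymem_malloc = ctypes.pythonapi.PyMem_Malloc
pymem_malloc.argtypes = [ctypes.c_size_t]
pymem_malloc.restype = ctypes.c_void_p

pymem_free = ctypes.pythonapi.PyMem_Free
pymem_free.argtypes = [ctypes.c_void_p]
pymem_free.restype = None

# A zero-byte request still returns a distinct non-NULL pointer (barring
# memory exhaustion), so it is a real allocation that must be freed.
ptr = pymem_malloc(0)
zero_size_is_real_allocation = ptr is not None
pymem_free(ptr)

# Freeing NULL is a no-op, which is why a state->mark left at NULL
# needs no special handling on the cleanup paths.
pymem_free(None)
```
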
@gaborbernat gaborbernat force-pushed the opt/re-skip-groupless-mark-alloc branch from 6fdde54 to 28624be June 1, 2026 16:11
@gaborbernat gaborbernat marked this pull request as ready for review June 1, 2026 17:15
@sobolevn sobolevn requested a review from serhiy-storchaka June 1, 2026 19:57
@gaborbernat
Contributor Author

Benchmark using pyperf, with the patterns taken from real usage: I mined every literal re.* pattern across the top-1000 PyPI packages, kept the group-less ones, and picked the most widely used (the bracketed number is how many distinct projects use each). Inputs and operations match how each pattern is actually applied (whitespace collapse, PEP 503 name normalization, version/slug validation, number/whitespace scans, camelCase splitting). A capturing-group pattern is included as a control.
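
The mining step could look roughly like the sketch below. It is not the script used for this PR; the helper names (`literal_re_patterns`, `is_groupless`) and the sample source are hypothetical, and it assumes patterns are passed as string literals to calls written as `re.<func>(...)`:

```python
import ast
import re

# re module functions whose first positional argument is the pattern.
RE_FUNCS = {"compile", "match", "search", "fullmatch", "sub", "subn",
            "split", "findall", "finditer"}


def literal_re_patterns(source: str) -> list[str]:
    """Collect string-literal patterns passed to re.<func>(...) calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_re_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
            and func.attr in RE_FUNCS
        )
        if not is_re_call or not node.args:
            continue
        first = node.args[0]
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.append(first.value)
    return found


def is_groupless(pattern: str) -> bool:
    """True if the pattern compiles and has no capturing groups."""
    try:
        return re.compile(pattern).groups == 0
    except re.error:
        return False


# Source text is only parsed, never executed, so undefined names are fine.
SAMPLE = r'''
import re
clean = re.sub(r"\s+", " ", text)
parts = re.match(r"(\d+)\.(\d+)", version)
hit = re.search(pattern_from_config, line)
'''

groupless = [p for p in literal_re_patterns(SAMPLE) if is_groupless(p)]
print(groupless)
```

A sketch like this misses patterns reached through `from re import ...`, aliases, or variables, so it undercounts rather than overcounts.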

I built base (main ancestor) and patched interpreters, ran the script under each, and compared the results with pyperf compare_to. Platform: macOS arm64, non-PGO builds.

import pyperf

runner = pyperf.Runner()

# Each entry is (benchmark name, setup code, timed statement).
# Names match the rows of the results table; [N repos] is the number of
# distinct projects that use the pattern.
BENCHES = [
    (r"sub r'\s+' (collapse ws) [66 repos]",             r"import re; p=re.compile(r'\s+'); s='the  quick   brown\tfox jumps'", "p.sub(' ', s)"),
    (r"sub r'[-_.]+' (PEP 503 norm) [21 repos]",         r"import re; p=re.compile(r'[-_.]+'); s='Foo_._Bar--Baz'",            "p.sub('-', s)"),
    (r"match r'^\d+\.\d+\.\d+$' (version) [11 repos]",   r"import re; p=re.compile(r'^\d+\.\d+\.\d+$'); s='12.4.301'",          "p.match(s)"),
    (r"match r'^[A-Za-z0-9_-]+$' (slug) [6 repos]",      r"import re; p=re.compile(r'^[A-Za-z0-9_-]+$'); s='my-package_v2'",    "p.match(s)"),
    (r"search r'\d+' (number) [20 repos]",               r"import re; p=re.compile(r'\d+'); s='retry after 30 seconds'",       "p.search(s)"),
    (r"search r'\s' (has ws) [23 repos]",                r"import re; p=re.compile(r'\s'); s='no_spaces_here_value'",          "p.search(s)"),
    (r"sub r'(?<!^)(?=[A-Z])' (camel split) [11 repos]", r"import re; p=re.compile(r'(?<!^)(?=[A-Z])'); s='CamelCaseClassName'", "p.sub('_', s)"),
    (r"match r'(\d+)\.(\d+)\.(\d+)' (CONTROL, groups)",  r"import re; p=re.compile(r'(\d+)\.(\d+)\.(\d+)'); s='12.4.301'",    "p.match(s)"),
]

# Only the statement is timed; the pattern is compiled once in setup.
for name, setup, stmt in BENCHES:
    runner.timeit(name=name, stmt=stmt, setup=setup)

Results (pyperf compare_to base.json patched.json --table):

+-------------------------------------------------+---------+-----------------------+
| Benchmark                                       | re_base | re_patched            |
+=================================================+=========+=======================+
| sub r'\s+' (collapse ws) [66 repos]             | 598 ns  | 558 ns: 1.07x faster  |
+-------------------------------------------------+---------+-----------------------+
| sub r'[-_.]+' (PEP 503 norm) [21 repos]         | 339 ns  | 318 ns: 1.07x faster  |
+-------------------------------------------------+---------+-----------------------+
| match r'^\d+\.\d+\.\d+$' (version) [11 repos]   | 119 ns  | 104 ns: 1.14x faster  |
+-------------------------------------------------+---------+-----------------------+
| match r'^[A-Za-z0-9_-]+$' (slug) [6 repos]      | 99.8 ns | 85.5 ns: 1.17x faster |
+-------------------------------------------------+---------+-----------------------+
| search r'\d+' (number) [20 repos]               | 184 ns  | 170 ns: 1.08x faster  |
+-------------------------------------------------+---------+-----------------------+
| search r'\s' (has ws) [23 repos]                | 113 ns  | 99.9 ns: 1.13x faster |
+-------------------------------------------------+---------+-----------------------+
| sub r'(?<!^)(?=[A-Z])' (camel split) [11 repos] | 545 ns  | 527 ns: 1.03x faster  |
+-------------------------------------------------+---------+-----------------------+
| Geometric mean                                  | (ref)   | 1.09x faster          |
+-------------------------------------------------+---------+-----------------------+

Group-less patterns run 1.03-1.17x faster (geometric mean 1.09x); the short match/search cases gain most, since the per-call allocation is a larger share of their cost. The capturing-group control was hidden as not significant, i.e. unchanged.
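
The per-row speedups can be recomputed from the rounded timings in the table. The sketch below does that and takes their geometric mean; the short row labels are abbreviations introduced here, and the timings are copied from the table:

```python
from statistics import geometric_mean

# (base ns, patched ns) for each visible row of the results table.
TIMINGS = {
    "collapse ws": (598, 558),
    "PEP 503 norm": (339, 318),
    "version": (119, 104),
    "slug": (99.8, 85.5),
    "number": (184, 170),
    "has ws": (113, 99.9),
    "camel split": (545, 527),
}

# Speedup is base time divided by patched time (above 1.0 means faster).
speedups = {name: base / patched for name, (base, patched) in TIMINGS.items()}
for name, ratio in speedups.items():
    print(f"{name:>12}: {ratio:.2f}x")

overall = geometric_mean(list(speedups.values()))
print(f"geometric mean of visible rows: {overall:.3f}x")
```

Each ratio rounds to the figure shown in its row. The geometric mean recomputed this way comes out at about 1.10x, slightly above the reported 1.09x; pyperf works from the unrounded measurements rather than the displayed values, so a small difference in the last digit is plausible.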


Development

Successfully merging this pull request may close these issues.

re: avoid per-call mark-array allocation for patterns with no capturing groups
