Drop mimalloc #6231

Merged
dscho merged 6 commits into git-for-windows:main from dscho:drop-mimalloc
May 26, 2026

dscho (Member) commented May 5, 2026

When 6a29c2d ("mingw: use mimalloc", 2019-06-24) introduced the vendored mimalloc, the comparison was against nedmalloc (which by then had not seen an upstream commit since 2014, and whose repository was archived in 2019). The two were essentially at parity in that benchmark; mimalloc was chosen because it was actively developed. I do not really recall whether the platform's default allocator was part of the comparison: if it was, its performance was still worse than mimalloc's; if it wasn't, I forgot to test ;-)

Six years on, with nedmalloc safely on its way to being dropped from the upstream codebase entirely (gitgitgadget#2104, currently in seen as e576abb), the question is no longer "mimalloc vs nedmalloc" but "mimalloc vs the OS allocator". Re-running the same git repack -adfq benchmark against each platform's current default allocator finds no measurable speedup from mimalloc on any of Windows, macOS, or Linux.

Methods

I recapitulated the same benchmark as cited in 6a29c2d (the original comparison was nedmalloc vs mimalloc on git repack -adfq over a subset of linux.git), now extended to the three GitHub-hosted runners (ubuntu-latest, macos-latest, windows-latest). Each job built two git binaries from the same source tree, vanilla and USE_MIMALLOC=YesPlease, then prepared a fresh bare clone of linux.git to a fixed SHA, and ran the repacks with both built gits in randomized order for five iterations. Each iteration ran both binaries exactly once on a freshly copytree-ed copy of the immutable template repository; the order within an iteration was randomized so any per-iteration confounder (cache state, runner warm-up, neighbour-VM contention) would be shared symmetrically between variants. Timings excluded the copytree. The full driver is the Python script ci/bench-mimalloc.py on the mimalloc-benchmark branch.
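The iteration scheme described above can be sketched roughly as follows. This is not the actual `ci/bench-mimalloc.py` driver; the names `run_paired_benchmark`, `prepare` and `summarize` are hypothetical, the workloads are passed in as plain callables, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def run_paired_benchmark(
    variants: Dict[str, Callable[[], None]],
    iterations: int = 5,
    seed: int = 0,
    prepare: Callable[[], None] = lambda: None,
) -> Dict[str, List[float]]:
    """Time every variant exactly once per iteration, in a shuffled order.

    `variants` maps a label (e.g. "vanilla", "mimalloc") to a callable that
    performs one run; `prepare` is called before each run and is NOT timed
    (it stands in for copying the template repository).
    """
    rng = random.Random(seed)
    timings: Dict[str, List[float]] = {name: [] for name in variants}
    for _ in range(iterations):
        order = list(variants)
        rng.shuffle(order)  # randomize the order within this iteration
        for name in order:
            prepare()  # untimed setup, e.g. a fresh copy of the repository
            start = time.perf_counter()
            variants[name]()  # e.g. run `git repack -adfq` with this binary
            timings[name].append(time.perf_counter() - start)
    return timings


def summarize(samples: List[float]) -> tuple:
    """Return (mean, sample standard deviation) of the collected timings."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Stand-in workloads; a real run would invoke the two built binaries.
    demo = run_paired_benchmark(
        {
            "vanilla": lambda: sum(range(10_000)),
            "mimalloc": lambda: sum(range(10_000)),
        }
    )
    for name, samples in demo.items():
        mean, stdev = summarize(samples)
        print(f"{name}: {mean:.6f}s ± {stdev:.6f}s")
```

Shuffling inside each iteration, rather than running all of one variant first, is what lets slow drifts in runner state affect both variants about equally.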

Results: original linux v2.6.20-era workload (49,917 commits, 431,605 objects, ~204 MB pack)

| Platform | vanilla mean ± stdev | mimalloc mean ± stdev | Δ (mimalloc − vanilla) |
|---|---|---|---|
| ubuntu-latest | 27.089s ± 0.060s | 27.041s ± 0.065s | −0.048s (−0.18%) |
| macos-latest | 23.259s ± 1.206s | 25.076s ± 2.279s | +1.817s (+7.8%) |
| windows-latest | 29.828s ± 1.651s | 30.329s ± 2.428s | +0.501s (+1.7%) |

Workflow run: https://github.com/dscho/git/actions/runs/25374127848

Results: 4x larger linux v3.0 workload (255,039 commits, 2,059,429 objects, ~788 MB pack)

| Platform | vanilla mean ± stdev | mimalloc mean ± stdev | Δ (mimalloc − vanilla) |
|---|---|---|---|
| ubuntu-latest | 134.723s ± 0.329s | 134.801s ± 0.191s | +0.078s (+0.06%) |
| macos-latest | 130.183s ± 19.098s | 133.292s ± 18.991s | +3.109s (+2.4%) |
| windows-latest | 145.183s ± 1.272s | 146.271s ± 4.161s | +1.088s (+0.75%) |

Workflow run: https://github.com/dscho/git/actions/runs/25376885309
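For readers who want to re-derive the Δ column, here is a small sketch of the arithmetic. The means are copied from the two tables above; the helper name `delta` and the choice to express the percentage relative to the vanilla mean are my reading of the column header, not something taken from the benchmark script.

```python
def delta(vanilla_mean: float, mimalloc_mean: float) -> tuple:
    """Return (difference in seconds, difference in percent of vanilla)."""
    diff = mimalloc_mean - vanilla_mean
    return diff, 100.0 * diff / vanilla_mean


# (vanilla mean, mimalloc mean) in seconds, copied from the tables above.
MEANS = {
    ("v2.6.20", "ubuntu-latest"): (27.089, 27.041),
    ("v2.6.20", "macos-latest"): (23.259, 25.076),
    ("v2.6.20", "windows-latest"): (29.828, 30.329),
    ("v3.0", "ubuntu-latest"): (134.723, 134.801),
    ("v3.0", "macos-latest"): (130.183, 133.292),
    ("v3.0", "windows-latest"): (145.183, 146.271),
}

for (workload, platform), (vanilla, mimalloc) in MEANS.items():
    diff, pct = delta(vanilla, mimalloc)
    print(f"{workload:8} {platform:15} {diff:+.3f}s ({pct:+.2f}%)")
```

A positive Δ means the mimalloc build took longer than the vanilla build.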

Discussion

The Linux numbers on the larger workload are particularly clear: stdev is below 0.3% of the mean for both variants, and the difference is well inside that floor. Glibc's allocator and the vendored mimalloc are statistically indistinguishable for git repack -adfq here.

windows-latest runners are noisier (stdev between roughly 1% and 8% of the mean across the two workloads, mostly neighbour-VM scheduling), but mimalloc never beats vanilla in either workload. With the original justification for keeping a custom allocator gone (the modern Windows segment-heap is no longer the slow Windows-XP-era HeapAlloc that drove the original 2009 nedmalloc adoption), there is nothing left to motivate the maintenance cost of a vendored allocator.

macos-latest is too noisy at n=5 (stdev about 14-15% of the mean on the larger workload) to draw a firm conclusion, but the visible point estimate goes the wrong way and there is no plausible mechanism by which Apple's libsystem_malloc would be slower than mimalloc.
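The noise figures in this discussion can be checked against the larger-workload table with a few lines of Python. This is only a crude sanity check, not a significance test: comparing |Δ| against the smaller of the two standard deviations is a heuristic I am assuming here, and the names `coefficient_of_variation` and `within_noise` are hypothetical.

```python
def coefficient_of_variation(mean: float, stdev: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev / mean


def within_noise(vanilla: tuple, mimalloc: tuple) -> bool:
    """True if |difference of means| is below the smaller of the two stdevs.

    Each argument is a (mean, stdev) pair in seconds.
    """
    diff = abs(mimalloc[0] - vanilla[0])
    return diff < min(vanilla[1], mimalloc[1])


# (mean, stdev) pairs in seconds for the linux v3.0 workload, from the table.
LARGE_WORKLOAD = {
    "ubuntu-latest": ((134.723, 0.329), (134.801, 0.191)),
    "macos-latest": ((130.183, 19.098), (133.292, 18.991)),
    "windows-latest": ((145.183, 1.272), (146.271, 4.161)),
}

for platform, (vanilla, mimalloc) in LARGE_WORKLOAD.items():
    cv_vanilla = coefficient_of_variation(*vanilla)
    cv_mimalloc = coefficient_of_variation(*mimalloc)
    print(
        f"{platform:15} CV vanilla {cv_vanilla:.2f}%, "
        f"CV mimalloc {cv_mimalloc:.2f}%, "
        f"within noise: {within_noise(vanilla, mimalloc)}"
    )
```

With only five samples per variant, a proper test (for example a paired comparison on the per-iteration timings) would need the raw data from the workflow runs rather than the summary statistics used here.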

What this PR does not do

It does not by itself remove nedmalloc from the tree; that is still promised as a follow-up of the in-flight upstream patch gitgitgadget#2104, presently in seen as e576abb. The first commit here is an amend! whose autosquashed result is byte-identical to that upstream commit, so once the next merging-rebase picks up the upstream patch the two will collapse cleanly.

The five remaining fixup! reverts target each of the original mimalloc-vendoring commits in reverse chronological order; once autosquashed, the pairs cancel out to empty commits which the rebase will drop, leaving the tree free of compat/mimalloc/, the USE_MIMALLOC build infrastructure, and the supporting changes (compat/posix.h _DEFAULT_SOURCE guard, win32_pthread_* renames) that only existed to support the vendored allocator.

dscho (Member, Author) commented May 5, 2026

Gaaaaah:

  [...]
  In file included from hook.c:1:
  Error: ./git-compat-util.h:353:11: '_Generic' is a C11 extension [-Werror,-Wc11-extensions]
    353 |         return !!strchr(path, '/');
        |                  ^
  /usr/include/string.h:265:3: note: expanded from macro 'strchr'
    265 |   __glibc_const_generic (S, const char *, strchr (S, C))
        |   ^
  /usr/include/x86_64-linux-gnu/sys/cdefs.h:838:3: note: expanded from macro '__glibc_const_generic'
    838 |   _Generic (0 ? (PTR) : (void *) 1,                     \
        |   ^
  [...]

That is obviously independent of this PR's changes. At least the ever-productive @pks-gitlab saves our day by already providing a fix.

mjcheetham (Member) left a comment

Love when we get to remove 16k lines of code in a PR :-)

dscho added a commit that referenced this pull request May 26, 2026

This closes two gaps in the current `AGENTS.md` that came up while
drafting #6231 and #2104:

1. The `Fixup Commits` section only covered `fixup!`. `amend!` has
different semantics (replaces the target's commit message and combines
diffs to produce any final state), and the "amend! whose body is the
upstream commit and whose diff aligns the squashed result with that
upstream commit" pattern is precisely how the first commit of #6231
anticipates the in-flight nedmalloc removal in `seen` as `e576abb9f8`.
That pattern is undocumented today.

2. There is no top-level "Contributing to Git for Windows" section, only
"Contributing to Upstream Git via GitGitGadget". The cross-fork `gh pr
create` invocation, the conditions under which a PR is naturally a
fixup/amend-only series against existing thicket commits, and the
upstream-aligning `amend!` shape were all things I inferred from context
rather than the guide.

Add subsections that cover those gaps. Nothing in the existing text
changes.
dscho added 6 commits May 26, 2026 18:35

mingw: stop using nedmalloc

The vendored nedmalloc allocator under compat/nedmalloc/ has been
unmaintained upstream for a very long time: the original repository at
https://github.com/ned14/nedmalloc received its last commit on July 5,
2014, and was archived (made read-only) by its owner on March 15, 2019.
Our copy has been carried forward unchanged ever since.

The Git for Windows commit that introduced mimalloc as a replacement
on Windows ("mingw: use mimalloc", 2019-06-24, present in the Git for
Windows branch thicket but not upstream) already observed at that time
that nedmalloc had ceased to see any updates for several years.

This came to a head when the Git for Windows SDK upgraded to GCC 16:
the `add_segment()` function in `compat/nedmalloc/malloc.c.h` declares
`int nfences = 0` and only references it inside an `assert()`, which
GCC 16 now flags as `-Wunused-but-set-variable`. Combined with the
`-Werror` enabled by `DEVELOPER=1`, this turns into a hard build
failure:

	compat/nedmalloc/malloc.c.h: In function 'add_segment':
	compat/nedmalloc/malloc.c.h:3897:7: error: variable 'nfences' set but not used [-Werror=unused-but-set-variable=]
	 3897 |   int nfences = 0;
	      |       ^~~~~~~
	cc1.exe: all warnings being treated as errors

The same source built without complaint under GCC 15.2.0; the
regression was bisected to the SDK package update at
git-for-windows/git-sdk-64@188d93dd455
(`mingw-w64-x86_64-gcc 15.2.0-14 -> 16.1.0-1`), with the failing CI
run captured at
https://github.com/git-for-windows/git-sdk-64/actions/runs/25244795074.

Rather than patch the unmaintained vendored sources to silence the
warning, stop opting into nedmalloc altogether on MINGW. The platform
allocator is what every non-MINGW build already uses, and a fresh
build of git.git's master against a minimal Git for Windows SDK
upgraded to GCC 16, with `USE_NED_ALLOCATOR` removed from the MINGW
section, completes successfully.

The compat/nedmalloc/ subtree itself is left in place to keep this
change minimal; nothing in the build links against it any longer, so
it can be removed in a follow-up if desired.

Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Re-running the `git repack -adfq` benchmark from
6a29c2d ("mingw: use mimalloc",
2019-06-24) against the platform's *current* default allocator (so
without `nedmalloc` in the picture at all) shows mimalloc is no longer
faster than the system allocator on any of Windows, macOS, or Linux,
neither for the original ~30-second `linux v2.6.20` workload nor for a
4x larger `linux v3.0` workload where each individual run takes ~2
minutes (and the noise floor on Linux is below 0.3% of the mean, so
even small differences would be visible if any existed).

`mimalloc` was originally chosen over nedmalloc, not over the system
allocator. Six years on, with nedmalloc now being dropped from the
codebase entirely, the allocator that mimalloc has to beat is whatever
the OS ships by default; modern Windows segment-heap, glibc malloc, and
the macOS libsystem allocator have all closed the gap, and there is no
longer a measurable benefit to keep maintaining a custom allocator.

The actual benchmark methodology, the per-platform numbers, and links
to the workflow runs that produced them are spelled out in the PR
description rather than repeated across each fixup.

The `fixup!` subject is so that the next rebase against an upstream
Git that already lacks this commit will autosquash this revert into
the original (which becomes empty and is dropped), leaving the tree
free of `mimalloc`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Part of the series that drops the vendored `mimalloc` from this fork;
the rationale (no measurable speedup over the platform allocator on
any of Windows, macOS, or Linux) is in the second commit of the
series and the PR description. The `fixup!` subject is so the next
rebase against an upstream Git that already lacks the target commit
autosquashes this revert into it, dropping the original cleanly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Part of the series that drops the vendored `mimalloc` from this fork;
the rationale (no measurable speedup over the platform allocator on
any of Windows, macOS, or Linux) is in the second commit of the
series and the PR description. The `fixup!` subject is so the next
rebase against an upstream Git that already lacks the target commit
autosquashes this revert into it, dropping the original cleanly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Part of the series that drops the vendored `mimalloc` from this fork;
the rationale (no measurable speedup over the platform allocator on
any of Windows, macOS, or Linux) is in the second commit of the
series and the PR description. The `fixup!` subject is so the next
rebase against an upstream Git that already lacks the target commit
autosquashes this revert into it, dropping the original cleanly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Part of the series that drops the vendored `mimalloc` from this fork;
the rationale (no measurable speedup over the platform allocator on
any of Windows, macOS, or Linux) is in the second commit of the
series and the PR description. The original commit was a preparation
step for vendoring `mimalloc` in (which forces C11 mode under
mingw-w64 GCC and so implicitly links libwinpthread, clashing with
Git's own emulation). With `mimalloc` gone the rename is no longer
needed, so this revert restores the plain `pthread_create` /
`pthread_self` names. The `fixup!` subject is so the next rebase
against an upstream Git that already lacks the target commit
autosquashes this revert into it, dropping the original cleanly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

dscho merged commit 37bff80 into git-for-windows:main May 26, 2026
60 checks passed
dscho deleted the drop-mimalloc branch May 26, 2026 19:14