prio-queue: use cascade-down sift for faster extract-min#2132
Draft
spkrka wants to merge 1 commit into
Draft
Conversation
df494d3 to
2038c08
Compare
Replace the standard sift-down in prio_queue_get() with a
cascade-down approach.
The standard approach places the last array element at the root,
then sifts it down. At each level this requires two comparisons
(left vs right child, then element vs winner) and, when the
element is larger, a swap (three 16-byte copies).
The cascade approach instead promotes the smaller child into the
vacant root slot at each level — one comparison and one copy.
The vacancy sinks to a leaf, where the last array element is
placed and sifted up if needed — typically zero levels since the
last array element tends to be large.
For a heap of depth d, work per extract drops from 2d comparisons
+ 3d copies to d comparisons + d copies: roughly half the
comparisons and a third of the data movement.
Simplify prio_queue_replace() to a plain get+put sequence. The
previous implementation shared sift_down_root() with get, but the
cascade approach no longer accommodates that cleanly. This is
fine in practice: replace is only called from
pop_most_recent_commit() (fetch-pack, object-name, walker) and
show-branch — none of which appear in any hot path.
A synthetic benchmark (10M put+get cycles with ascending keys,
CPU-pinned, 3 runs per data point) shows consistent improvement
across all queue sizes, with no regressions:
queue width baseline cascade speedup
------------------------------------------------
10 0.457s 0.428s 1.07x
100 0.845s 0.692s 1.22x
1,000 1.207s 0.980s 1.23x
10,000 1.652s 1.338s 1.23x
100,000 2.208s 1.795s 1.23x
In end-to-end git commands the improvement is modest because
sift_down_root is only ~8% of total runtime. Profiling
rev-list --count on a 2.5M-commit monorepo shows sift_down_root
dropping from 8.2% to 0.4% of total runtime. The improvement
scales with DAG width: wider DAGs produce larger priority queues,
amplifying the per-level savings. In small or narrow repos the
queues stay shallow and the effect is negligible.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the standard sift-down in
prio_queue_get()with a cascade-down approach that halves the number of comparisons and reduces data movement from 3 copies per level (swap) to 1 copy per level.The standard extract-min places the last array element at the root, then sifts it down. At each level this requires two comparisons (left vs right child, then element vs winner) and, when the element is larger, a swap (three 16-byte copies).
The cascade approach saves the root, then promotes the smaller child into the vacant slot at each level — one comparison and one copy. The hole sinks to a leaf, where the saved element is placed and sifted up if needed (typically zero levels, since the last array element tends to be a low-priority leaf value).
For a heap of depth d, work per extract drops from 2d comparisons + 3d copies to d comparisons + d copies.
prio_queue_replace()is simplified to a plain get+put sequence. The previous implementation sharedsift_down_root()with get, but the cascade approach no longer accommodates that cleanly. This is fine in practice: replace is only called frompop_most_recent_commit()(fetch-pack, object-name, walker) and show-branch — none of which appear in any hot path.Performance
Profiling
git rev-list --counton a 2.5M-commit monorepo showssift_down_rootdropping from 8.2% to 0.4% of total runtime, effectively eliminated as significant overhead.Synthetic benchmark
10M put+get cycles with ascending integer keys, CPU-pinned, 3 runs per data point. Consistent improvement across all queue sizes, no regressions:
End-to-end benchmarks
All benchmarks use
git-benchwith 1 warmup run followed by 10 timed runs. Each configuration is built from the same source tree and tested on the same repo in alternating order.Large monorepo (2.5M commits, wide DAG, 8.4K remote branches) — range of ~200 first-parent / ~434 total commits:
linux kernel (1.4M commits) — range v5.0..v6.0 (311K commits):
git.git (81K commits) — range v2.0.0..v2.45.0 (37K commits):
The improvement scales with DAG width: wider DAGs produce larger priority queues, amplifying the per-level savings. In small or narrow repositories the priority queues stay shallow and the sift-down cost is already negligible, so the change is not noticeable.
Testing
Existing test coverage is thorough:
All tests pass.