Enable zero-copy for QNN GPU by qti-mattsinc · Pull Request #2105 · microsoft/onnxruntime-genai

qti-mattsinc · 2026-04-27T22:55:51Z

Use the new GPU shared memory allocator in QNN EP to
allocate the KV cache on CPU-accessible GPU memory.
This provides a large speedup by eliminating unnecessary
copy overhead.
Refactor QNN-specific checks out of the common
EnsureDeviceOrtInit by adding GetMemoryInfo and
GetProviderOptionsForAllocatorSession to DeviceInterface.

johnpaultaken

This change does not look ideal to me; we need to discuss why it deviates from the norm.
Ideally, I would like to see no change to genai at all.
When OrtMemoryInfo is of type QnnShared, can we just return the QnnGpuAllocator based on the device the user selected, i.e. GPU?
Everything else should work transparently, just as it does for the other EPs (CUDA, OpenVINO, etc.).
I don't see a need for the enable_dx12_shared_memory_allocator option; why not always enable it in the EP?
Also, we cannot have variable names like use_dx12_shared_memory that then turn out to be a QNN-specific option. Ideally, I want to avoid any EP-specific code in genai.

* Use the new GPU shared memory allocator in QNN EP to allocate the KV cache on CPU-accessible GPU memory. This provides a large speedup by eliminating unnecessary copy overhead.
* Keep fallback path with CPU<->GPU copies if outdated QNN EP or driver dependencies prevent the shared allocator from being available.

Co-authored-by: qti-mattsinc <mattsinc@qti.qualcomm.com>
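The probe-then-fallback behavior described in this commit message can be sketched roughly as follows. This is a standalone illustration, not the PR's code: `KvCachePlacement`, `ChooseKvCachePlacement`, and the `probe` callable are hypothetical names, and the sketch assumes the probe signals an unavailable allocator by throwing.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>

// Where the KV cache ends up. Hypothetical names, not taken from the PR.
enum class KvCachePlacement { SharedGpuMemory, CpuWithCopies };

// Runs a capability probe once and picks a placement. `probe` stands in for
// "try to obtain the shared allocator"; it is assumed to throw when an
// outdated EP or driver cannot provide it.
inline KvCachePlacement ChooseKvCachePlacement(const std::function<void()>& probe) {
  assert(probe && "a probe callable is required");
  try {
    probe();
    return KvCachePlacement::SharedGpuMemory;
  } catch (const std::exception& e) {
    // Warn and keep the slower path instead of failing initialization.
    std::cerr << "warning: shared allocator unavailable (" << e.what()
              << "); falling back to CPU<->GPU copies\n";
    return KvCachePlacement::CpuWithCopies;
  }
}
```

Running the probe once at initialization keeps the decision out of the per-token path, so the fallback only costs a warning at startup.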

Copilot

Pull request overview

Enables zero-copy KV-cache placement for the QNN GPU backend by using the shared-memory allocator exposed by recent QNN EP packages, with a runtime capability probe and fallback. Also refactors device-specific allocator/session-option logic out of the common EnsureDeviceOrtInit path by adding two new virtual methods (GetMemoryInfo, GetProviderOptionsForAllocatorSession) on DeviceInterface, with QNN and WebGPU providing overrides.

Changes:

* Add DeviceInterface::GetMemoryInfo / GetProviderOptionsForAllocatorSession plus a DeviceTypeToString helper, and replace ad-hoc per-device branches in EnsureDeviceOrtInit with calls to these virtuals.
* Add IsQNNGPUBackend and an IsQNNGPUSharedAllocatorAvailable runtime probe (embedded trivial ONNX model) to opt the GPU backend into shared-memory allocation when supported, with a warning-and-fallback otherwise.
* Move WebGPU's "WebGPU_Buf" vs "WebGPU_Buffer" name fallback into the WebGPU GetMemoryInfo override.
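As a rough illustration of the refactor listed above, the sketch below models the two new virtuals with a name-based default and a QNN override. It is self-contained and does not use ONNX Runtime types: `MemoryInfoSketch`, `ProviderOptions`, `CpuDevice`, `QnnGpuDevice`, `DescribeAllocatorSetup`, the reduced `DeviceType` enum, and the option value `"1"` are all assumptions made for the example. Only the method names, `DeviceTypeToString`, `QnnShared`, and `enable_dx12_shared_memory_allocator` appear in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Reduced device list for illustration; the real enum may differ.
enum class DeviceType { CPU, WEBGPU, QNN };

inline const char* DeviceTypeToString(DeviceType type) {
  switch (type) {
    case DeviceType::CPU: return "CPU";
    case DeviceType::WEBGPU: return "WebGPU";
    case DeviceType::QNN: return "QNN";
  }
  return "Unknown";
}

// Hypothetical stand-ins for OrtMemoryInfo and provider options.
struct MemoryInfoSketch {
  std::string allocator_name;
};
using ProviderOptions = std::map<std::string, std::string>;

struct DeviceInterface {
  virtual ~DeviceInterface() = default;
  virtual DeviceType GetType() const = 0;

  // Default: derive the allocator name from the device type.
  virtual MemoryInfoSketch GetMemoryInfo() const {
    return MemoryInfoSketch{DeviceTypeToString(GetType())};
  }
  // Default: the allocator session needs no extra provider options.
  virtual ProviderOptions GetProviderOptionsForAllocatorSession() const {
    return {};
  }
};

struct CpuDevice : DeviceInterface {
  DeviceType GetType() const override { return DeviceType::CPU; }
};

struct QnnGpuDevice : DeviceInterface {
  explicit QnnGpuDevice(bool shared_allocator_available)
      : shared_allocator_available_(shared_allocator_available) {}

  DeviceType GetType() const override { return DeviceType::QNN; }

  MemoryInfoSketch GetMemoryInfo() const override {
    if (!shared_allocator_available_) return DeviceInterface::GetMemoryInfo();
    return MemoryInfoSketch{"QnnShared"};
  }

  ProviderOptions GetProviderOptionsForAllocatorSession() const override {
    if (!shared_allocator_available_) return {};
    return {{"enable_dx12_shared_memory_allocator", "1"}};  // value is assumed
  }

 private:
  bool shared_allocator_available_;
};

// Common code asks the device and contains no per-device branches.
inline std::string DescribeAllocatorSetup(const DeviceInterface& device) {
  const MemoryInfoSketch info = device.GetMemoryInfo();
  assert(!info.allocator_name.empty());
  return std::string(DeviceTypeToString(device.GetType())) + " -> " + info.allocator_name;
}
```

With this shape, the common initialization path only calls the virtuals, and each backend keeps its special cases in its own override.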

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/smartptrs.h | Adds `DeviceTypeToString` and two new virtual methods on `DeviceInterface` with default implementations driven by per-device name tables. |
| src/models/model.cpp | Simplifies `EnsureDeviceOrtInit` to use the new virtuals; drops inline QNN/WebGPU special cases and uses `DeviceTypeToString` in error messages. |
| src/qnn/interface.h | Forward declarations plus `IsQNNGPUBackend` declaration. |
| src/qnn/interface.cpp | QNN overrides of the new virtuals (with GPU vs NPU device filtering) and implementation of `IsQNNGPUBackend`. |
| src/qnn/session_options.cpp | Adds shared-memory allocator capability probe via a hardcoded trivial ONNX model; selects QNN device only when GPU shared allocator is available or NPU shared allocator is explicitly enabled. |
| src/webgpu/interface.cpp | Adds WebGPU `GetMemoryInfo` with attempted fallback between "WebGPU_Buf" and "WebGPU_Buffer". |
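The name fallback mentioned for src/webgpu/interface.cpp could look something like the sketch below. It is a generic "try each candidate name in order" helper, not the PR's implementation: `FirstAcceptedName` and the `lookup` callable are hypothetical, the order of the two names is an assumption, and the sketch assumes an unknown name is reported by throwing.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the first candidate that `lookup` accepts, or an empty string if
// every candidate is rejected. `lookup` stands in for the call that resolves
// a memory-info name and is assumed to throw on an unknown name.
inline std::string FirstAcceptedName(
    const std::vector<std::string>& candidates,
    const std::function<void(const std::string&)>& lookup) {
  assert(!candidates.empty() && "at least one candidate name is required");
  for (const std::string& name : candidates) {
    try {
      lookup(name);
      return name;
    } catch (const std::exception&) {
      // This name is not known to the runtime in use; try the next one.
    }
  }
  return std::string();
}
```

Returning an empty string rather than throwing leaves it to the caller to decide whether a missing name is a hard error.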

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

johnpaultaken

Looks cleaner than the existing code.
QNN-specific checks are now moved to the QNN-specific interface.
And so is the WebGPU-specific code.

qti-mattsinc commented Apr 27, 2026

Comment thread src/models/onnxruntime_api.h

qti-mattsinc commented Apr 27, 2026

Comment thread src/models/model.cpp Outdated

johnpaultaken suggested changes Apr 30, 2026

qti-mattsinc force-pushed the dev/mattsinc/gpu-zero-copy branch from 6ea280c to 922e278 Compare May 12, 2026 21:22

vjatoth-qti and others added 2 commits May 15, 2026 15:08

Move QNN branches out of common initialization

cf07c29

* Refactor QNN-specific checks out of the common `EnsureDeviceOrtInit` by adding `GetMemoryInfo` and `GetProviderOptionsForAllocatorSession` to `DeviceInterface`.
* Also move a WebGPU branch out of the common code along similar lines.

qti-mattsinc force-pushed the dev/mattsinc/gpu-zero-copy branch from 922e278 to cf07c29 Compare May 15, 2026 22:09

qti-mattsinc requested a review from johnpaultaken May 15, 2026 22:54

qti-mattsinc changed the title ~~WIP: Enable zero-copy for QNN GPU~~ Enable zero-copy for QNN GPU May 18, 2026

qti-mattsinc marked this pull request as ready for review May 18, 2026 23:22

qti-mattsinc requested a review from a team as a code owner May 18, 2026 23:22

Copilot AI review requested due to automatic review settings May 18, 2026 23:22

Copilot started reviewing on behalf of qti-mattsinc May 18, 2026 23:27

Copilot AI reviewed May 18, 2026

Comment thread src/webgpu/interface.cpp

Comment thread src/qnn/interface.cpp Outdated

Comment thread src/qnn/session_options.cpp Outdated

qti-mattsinc added 3 commits May 21, 2026 16:56

Check for enable_dx12_shared_memory_allocator option

c14c3d6

* EP implementation changed to use this instead of choosing allocator by selected backend

Clean up availability check

22edf49

Look for memory info exception in allocator check

45f9e78

qti-mattsinc requested a review from Copilot May 22, 2026 17:14

Copilot AI reviewed May 22, 2026

Comment thread src/smartptrs.h

Comment thread src/smartptrs.h

Comment thread src/qnn/session_options.cpp

Comment thread src/qnn/session_options.cpp

Comment thread src/qnn/session_options.cpp

Comment thread src/webgpu/interface.cpp Outdated

Comment thread src/models/model.cpp Outdated

qti-mattsinc added 2 commits May 22, 2026 10:49

Address comments

fc85649

Address comment

dcabbd5

qti-mattsinc mentioned this pull request May 26, 2026

Add DX12 shared memory support for QNN-EP with GPU backend. onnxruntime/onnxruntime-qnn#213

Open

johnpaultaken approved these changes May 27, 2026

Clean up unused variable

c6877f3

kunal-vaishnavi requested a review from baijumeswani May 28, 2026 23:20

qti-mattsinc wants to merge 8 commits into microsoft:main from CodeLinaro:dev/mattsinc/gpu-zero-copy

4 participants