Summary
ThinClientStoreModel.getRootUri() returns null and triggers NullPointerException: Cannot invoke "java.net.URI.getHost()" because "rootUri" is null at RxGatewayStoreModel.getUri:421 whenever the routing fallback chain bottoms out at defaultRoutingContext, because defaultRoutingContext.thinclientRegionalEndpoint is never populated.
This systematically breaks ~19 query tests (every *QueryTest* / ReadFeed*Test in the query profile) running against thin-client-enabled multi-master accounts in CI, blocking PR merges (e.g. #49090, #49258, and any PR that runs Public_Cosmos_Live_Test_ThinClient_MultiRegion).
Stack trace (from CI log)
java.lang.NullPointerException: Cannot invoke "java.net.URI.getHost()" because "rootUri" is null
    at com.azure.cosmos.implementation.RxGatewayStoreModel.getUri(RxGatewayStoreModel.java:421)
    at com.azure.cosmos.implementation.RxGatewayStoreModel.performRequest(RxGatewayStoreModel.java:301)
    at com.azure.cosmos.implementation.RxGatewayStoreModel.query(RxGatewayStoreModel.java:281)
    at com.azure.cosmos.implementation.RxGatewayStoreModel.invokeAsyncInternal(RxGatewayStoreModel.java:789)
    at com.azure.cosmos.implementation.RxGatewayStoreModel.lambda$invokeAsync$0(RxGatewayStoreModel.java:797)
    at com.azure.cosmos.implementation.BackoffRetryUtility.lambda$executeRetry$0(BackoffRetryUtility.java:36)
Trigger conditions (all must hold)
- SDK client has COSMOS.THINCLIENT_ENABLED=true + HTTP2 enabled → useThinClient=true
- Account has thinClientReadableLocations (federation has IsThinClientEnabled=true server-side)
- Account enableMultipleWriteLocations=true AND client multipleWriteRegionsEnabled=true
- Client preferredRegions does not match any account region (e.g. [East US 2] against an account with [West Central US, East US 3])
- Request is ResourceType.Document (Document queries / bulk delete during truncateCollection)
Root cause
- LocationCache constructor (line 72) builds defaultRoutingContext = new RegionalRoutingContext(defaultEndpoint) from the global account URL. RegionalRoutingContext constructor only sets gatewayRegionalEndpoint; thinclientRegionalEndpoint remains null.
- LocationCache.addRoutingContexts() (lines 947-963) is the only place setThinclientRegionalEndpoint(...) is ever called, and it iterates regional endpoints only; defaultRoutingContext is never passed through it.
- With preferred regions that do not match, getPreferredAvailableRoutingContexts() returns an empty endpoint list and falls back to fallbackRegionalRoutingContext (line 887/903). For the WRITE path the fallback is defaultRoutingContext; for the READ path the fallback is writeRegionalRoutingContexts.get(0), which itself fell back to defaultRoutingContext.
- For a Document request, RxDocumentClientImpl.useThinClientStoreModel(request) returns true (useThinClient + hasThinClientReadLocations() + ResourceType.Document), so the request is routed through ThinClientStoreModel.
- ThinClientStoreModel.getRootUri() returns resolveServiceEndpoint(req).getThinclientRegionalEndpoint() → reads null from defaultRoutingContext → rootUri is null at RxGatewayStoreModel.getUri:421.
// ThinClientStoreModel.java:96
@Override
public URI getRootUri(RxDocumentServiceRequest request) {
    // need to have thin client endpoint here
    return this.globalEndpointManager.resolveServiceEndpoint(request).getThinclientRegionalEndpoint();
}
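The chain above can be modeled without the SDK. The following is a minimal sketch, not the actual LocationCache logic: FallbackChainSketch, Ctx, resolveFor and regional are hypothetical names introduced for illustration, and the endpoints are the ones used in the reproduction in this issue. It only shows why a context built from the global account URL carries no thin-client endpoint once no preferred region matches.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the fallback chain; none of these types are the SDK's. */
final class FallbackChainSketch {

    /** Stand-in for RegionalRoutingContext: a gateway endpoint plus an optional thin-client one. */
    static final class Ctx {
        final URI gateway;
        URI thinClient; // stays null unless it is set explicitly

        Ctx(String gateway) {
            this.gateway = URI.create(gateway);
        }
    }

    /** Returns the context the sample account resolves to for the given preferred regions. */
    static Ctx resolveFor(List<String> preferredRegions) {
        // The default context is built from the global account URL only:
        // gateway endpoint set, thin-client endpoint left null.
        Ctx defaultCtx = new Ctx("https://acct.documents.azure.com:443/");

        // Only regional contexts receive a thin-client endpoint.
        Map<String, Ctx> byRegion = new LinkedHashMap<>();
        byRegion.put("West Central US", regional("https://acct-westcentralus.documents.azure.com"));
        byRegion.put("East US 3", regional("https://acct-eastus3.documents.azure.com"));

        for (String region : preferredRegions) {
            Ctx ctx = byRegion.get(region);
            if (ctx != null) {
                return ctx; // a preferred region matched
            }
        }
        return defaultCtx; // nothing matched: the chain bottoms out at the default context
    }

    private static Ctx regional(String host) {
        Ctx ctx = new Ctx(host + ":443/");
        ctx.thinClient = URI.create(host + ":10250/");
        return ctx;
    }

    public static void main(String[] args) {
        Ctx unmatched = resolveFor(Arrays.asList("East US 2"));
        Ctx matched = resolveFor(Arrays.asList("East US 3"));
        System.out.println("unmatched thin client: " + unmatched.thinClient);
        System.out.println("matched thin client:   " + matched.thinClient);
    }
}
```

With the unmatched region the returned context has a gateway endpoint but a null thin-client endpoint, which is the value getRootUri() would hand to getUri().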
Minimal reproduction
Verified on both the current main (no PR) and on PR #49090 — byte-identical output, no network required:
// loc(name, endpoint) is a small test helper (not shown) that builds a DatabaseAccountLocation.
DatabaseAccount dbAccount = new DatabaseAccount();
dbAccount.setEnableMultipleWriteLocations(true);

// Regular gateway endpoints for the two account regions.
List<DatabaseAccountLocation> readable = Arrays.asList(
    loc("West Central US", "https://acct-westcentralus.documents.azure.com:443/"),
    loc("East US 3", "https://acct-eastus3.documents.azure.com:443/"));
dbAccount.setReadableLocations(readable);
dbAccount.setWritableLocations(readable);

// Thin-client endpoints for the same regions.
List<DatabaseAccountLocation> tcLocs = Arrays.asList(
    loc("West Central US", "https://acct-westcentralus.documents.azure.com:10250/"),
    loc("East US 3", "https://acct-eastus3.documents.azure.com:10250/"));
dbAccount.set(Constants.Properties.THINCLIENT_READABLE_LOCATIONS, tcLocs);
dbAccount.set(Constants.Properties.THINCLIENT_WRITABLE_LOCATIONS, tcLocs);

ConnectionPolicy policy = new ConnectionPolicy(DirectConnectionConfig.getDefaultConfig());
policy.setEndpointDiscoveryEnabled(true);
policy.setMultipleWriteRegionsEnabled(true);
policy.setPreferredRegions(Arrays.asList("East US 2")); // unmatched

LocationCache cache = new LocationCache(policy,
    new URI("https://acct.documents.azure.com:443/"), new Configs());
cache.onDatabaseAccountRead(dbAccount);

RxDocumentServiceRequest req = RxDocumentServiceRequest.create(
    null, OperationType.Query, ResourceType.Document,
    "/dbs/db1/colls/col1/docs", new HashMap<>());
RegionalRoutingContext resolved = cache.resolveServiceEndpoint(req);

assert resolved.getGatewayRegionalEndpoint() != null;   // OK
assert resolved.getThinclientRegionalEndpoint() == null; // BUG
Output:
Resolved gateway endpoint: https://acct.documents.azure.com:443/
Resolved thinclient endpoint: null
Matches defaultEndpoint? true
Suggested fixes (any one)
- Null-check in ThinClientStoreModel.getRootUri() (smallest, defensive):
  public URI getRootUri(RxDocumentServiceRequest request) {
      RegionalRoutingContext ctx = this.globalEndpointManager.resolveServiceEndpoint(request);
      URI tc = ctx.getThinclientRegionalEndpoint();
      // fall back to the gateway endpoint when no thin-client endpoint is set
      return tc != null ? tc : ctx.getGatewayRegionalEndpoint();
  }
- Populate defaultRoutingContext.thinclientRegionalEndpoint when thin-client locations are present (correct fix, keeps RegionalRoutingContext invariants consistent).
- Tighten useThinClientStoreModel(request) to also require that the resolved context has a non-null thinclient endpoint.
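The third option could look roughly like the sketch below. It is not the SDK's method: StoreModelChoiceSketch is a hypothetical class, the three boolean parameters stand in for the existing checks named in the root cause (useThinClient, hasThinClientReadLocations(), ResourceType.Document), and the fourth parameter stands in for the thin-client endpoint of the resolved routing context.

```java
import java.net.URI;

/** Hypothetical sketch of the tightened routing predicate; not the SDK's actual method. */
final class StoreModelChoiceSketch {

    static boolean useThinClientStoreModel(
            boolean useThinClient,
            boolean hasThinClientReadLocations,
            boolean isDocumentRequest,
            URI resolvedThinClientEndpoint) {
        return useThinClient
            && hasThinClientReadLocations
            && isDocumentRequest
            && resolvedThinClientEndpoint != null; // the additional guard
    }

    public static void main(String[] args) {
        URI tc = URI.create("https://acct-eastus3.documents.azure.com:10250/");
        System.out.println(useThinClientStoreModel(true, true, true, tc));   // prints "true"
        System.out.println(useThinClientStoreModel(true, true, true, null)); // prints "false"
    }
}
```

Under this guard a request whose routing fell back to a context without a thin-client endpoint would presumably go through the regular gateway store model instead of reaching getUri() with a null root URI.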
Concurrent test-infrastructure issue
sdk/cosmos/live-platform-matrix.json, live-thinclient-platform-matrix.json, and live-http2-platform-matrix.json hard-code "PREFERRED_LOCATIONS": "[\"East US 2\"]" under MultiMaster_MultiRegion ArmConfig entries. The live thin-client static accounts (thin-client-multi-writer-ci, thin-client-multi-region-ci) only have [West Central US, East US 3] — no East US 2 — so the hardcoded preferred region never matches and unconditionally triggers the fallback path that exposes this bug. Even after the SDK fix lands, the matrix should be updated to use a region the static account actually has, otherwise we are masking other test signal.
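A check along the following lines could catch this kind of mismatch before a test run starts. It is only a sketch: PreferredRegionCheckSketch and its methods are hypothetical, the region lists are passed in as plain strings rather than read from the matrix files or the account, and the normalization (ignoring case and spaces) is an assumption, not necessarily how the SDK compares region names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-flight check: which preferred regions does the account not have? */
final class PreferredRegionCheckSketch {

    // Assumption of this sketch: region names are compared ignoring case and spaces.
    static String normalize(String region) {
        return region.replace(" ", "").toLowerCase(Locale.ROOT);
    }

    /** Returns the preferred regions that do not appear among the account regions. */
    static List<String> unmatched(List<String> preferred, List<String> accountRegions) {
        Set<String> known = new HashSet<>();
        for (String region : accountRegions) {
            known.add(normalize(region));
        }
        List<String> missing = new ArrayList<>();
        for (String region : preferred) {
            if (!known.contains(normalize(region))) {
                missing.add(region);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> account = Arrays.asList("West Central US", "East US 3");
        System.out.println(unmatched(Arrays.asList("East US 2"), account)); // prints "[East US 2]"
        System.out.println(unmatched(Arrays.asList("East US 3"), account)); // prints "[]"
    }
}
```

A non-empty result for a matrix entry means the run exercises the fallback path rather than the intended regional routing.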
Impact
- query profile runs against multi-master accounts fail with this NPE (~19 test classes across *QueryTest*, ReadFeed*Test)
- [...] JsonSerializable.propertyBag final fix, zero routing code touched); both hit the identical NPE, confirming this is environmental and pre-existing
- [...] Public_Cosmos_Live_Test_ThinClient_* jobs
cc @FabianMeiswinkel