Skip to content

Run transport security tests in process instead of over sockets#2764

Merged
maxisbey merged 1 commit into
mainfrom
maxisbey/deflake-security-tests
Jun 2, 2026
Merged

Run transport security tests in process instead of over sockets#2764
maxisbey merged 1 commit into
mainfrom
maxisbey/deflake-security-tests

Conversation

@maxisbey
Copy link
Copy Markdown
Contributor

@maxisbey maxisbey commented Jun 2, 2026

The SSE and StreamableHTTP security test files have been flaking in CI. Two recent examples: test_sse_security_custom_allowed_hosts failing with 421 for an explicitly allowed host, and test_idle_session_is_reaped failing with 406 instead of 404 on Windows.

Motivation and Context

The security tests had a port-allocation race. Each test picked an ephemeral port by bind-then-close, spawned a uvicorn subprocess to re-bind it, and polled until the port accepted connections. Under pytest-xdist, two workers can pick the same port in that window: the second server fails to bind (silently — log_level="error" in a subprocess), the readiness poll succeeds against the other worker's server, and the test asserts against a server configured with different security settings. That's how a test that allows custom.host observes a 421.

The fix removes the network entirely: both files now drive the same Starlette apps in process through the interaction suite's StreamingASGITransport (re-exported from tests.interaction.transports as the sanctioned import point for code outside that suite). No sockets, no subprocesses, no ports to race over. Assertions are unchanged — the diff is a harness swap. Each file also drops from seconds of subprocess churn to ~0.25s.

The idle-reap test had a timing race. It slept 0.1s after a 0.05s idle timeout and asserted the session was gone — on a slow runner the reaper hasn't fired yet, the request routes into the still-live transport, and the missing Accept header yields the 406. Polling for the 404 instead would never converge: each request to a live session pushes its idle deadline forward. The test now waits (bounded by anyio.fail_after(5)) on the manager's own "idle timeout" log record, which is emitted synchronously with the session being unregistered — once observed, the 404 is guaranteed.

One src change: the new in-process GET test covers the validation-failure return in _handle_get_request, so the now-stale # pragma: no cover on that line is removed (the success path was already driven in process by the interaction suite).

Two subtleties worth knowing as a reviewer:

  • The SSE rejection path sends a second http.response.start through the bridge (connect_sse sends the 421/403 itself, then the handler's trailing Response() is sent by Starlette). This is deterministic because anyio's MemoryObjectSendStream.send() checkpoints on the non-empty error body before the overwrite can happen (holds down to the anyio>=4.9 floor); there's a comment at the spot documenting the assumption.
  • The SSE client uses cancel_on_close=False so the bridge drains the app's disconnect handling instead of cancelling it, matching how the interaction suite's own SSE leg uses the bridge.

tests/shared/test_sse.py, tests/shared/test_streamable_http.py, and tests/client/test_http_unicode.py still use the subprocess pattern and are follow-up candidates.

How Has This Been Tested?

  • ./scripts/test green: 1526 passed, 100% line+branch coverage, strict-no-cover clean
  • The three changed files stress-tested heavily: thousands of executions across repeated sequential runs, pytest-xdist at several -n values, shuffled test orders, and under sustained CPU load — zero failures
  • The deflaked idle-reap test passed 50/50 consecutive solo runs

Breaking Changes

None — test-only, plus one pragma removal in src.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Test names keep their existing feature-label style (rather than the interaction suite's behaviour-sentence names) so the "assertions unchanged" claim stays auditable in the diff; happy to rename in a follow-up if preferred.

AI Disclaimer

The SSE and StreamableHTTP security tests each spawned a uvicorn
subprocess on a port picked by bind-then-close, then polled until the
port accepted connections. Under pytest-xdist two workers can pick the
same port in that window: the second server fails to bind, the
readiness poll succeeds against the other worker's server, and the test
asserts against a server configured with different security settings
(e.g. 421 for a host the test explicitly allowed).

Rewrite both files to drive the same Starlette apps in process through
the interaction suite's StreamingASGITransport (re-exported from
tests.interaction.transports as the sanctioned import point): no
sockets, no subprocesses, no ports to race over. Assertions are
unchanged. The new in-process GET test covers the validation-failure
return in _handle_get_request; the pragma on that line was already
stale (the success path has been driven in process by the interaction
suite since it merged) and is removed.

Also deflake test_idle_session_is_reaped, which slept 0.1s after a
0.05s idle timeout and failed on slow runners when the reaper had not
fired yet. Re-requesting the session to poll for the 404 would push its
idle deadline forward, so instead wait on the manager's "idle timeout"
log record, which is emitted synchronously with the session being
unregistered.
@maxisbey maxisbey marked this pull request as ready for review June 2, 2026 18:53
@maxisbey maxisbey merged commit b3025f9 into main Jun 2, 2026
32 checks passed
@maxisbey maxisbey deleted the maxisbey/deflake-security-tests branch June 2, 2026 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants