Support baking PythonCall into a juliacall system image (opt-in) #773

Open
ncudlenco wants to merge 2 commits into JuliaPy:main from buchi-labortechnik-ag:feat/embed-sysimage

ncudlenco (Contributor) commented May 19, 2026

Summary

Lets PythonCall itself be compiled into a juliacall system image (opt-in). With it baked in, the using PythonCall that import juliacall performs becomes a memory-map instead of a multi-second load+compile. In a containerised juliacall workload we measured, fresh-process startup drops from ~18 s to ~1.9 s. Off by default - no behaviour change unless enabled. It's meant as a low-risk interim until the automatic fix discussed in #436 (or related work) lands; it doesn't conflict with or preclude that.

Motivation

In short-lived Python processes that embed Julia via juliacall - serverless or autoscaled containers (AWS Lambda, queue workers, CI jobs) that start, handle one request, and exit - there's no long-lived process to amortise Julia start-up. Every cold start pays it again, and the dominant part is the using PythonCall that import juliacall runs to bring the bridge up.

Measured, fresh container to first call:

| Setup | Container init |
| --- | --- |
| stock juliacall (only app packages baked) | ~18 s |
| PythonCall also baked into the sysimage | ~1.9 s |
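
For anyone who wants to reproduce this kind of comparison locally, here is a minimal sketch that times a fresh interpreter process. The helper name `time_fresh_process` is made up for illustration, and it measures process start-up only, not container init, so its numbers are not directly comparable to the table above.

```python
import statistics
import subprocess
import sys
import time


def time_fresh_process(cmd, repeats=3):
    """Run cmd in a new process `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the child exits non-zero
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Baseline: bare interpreter start-up with nothing imported.
    baseline = time_fresh_process([sys.executable, "-c", "pass"])
    print(f"bare interpreter: median {statistics.median(baseline):.3f} s")
```

In an environment where juliacall is installed, swapping `"pass"` for `"import juliacall"` should give the cold-start cost with and without a baked sysimage.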

PYTHON_JULIACALL_SYSIMAGE already exists, but a sysimage containing only the application packages still runs using PythonCall on every cold start. Baking PythonCall in as well is the fix, but it currently fails: when PythonCall is in the sysimage, its __init__ runs during jl_init_with_image, before juliacall's bootstrap has set Main.__PythonCall_libptr. init_context() decides "embedded" solely from that global, so it concludes "not embedded". Because Python is still in the middle of import juliacall, init_juliacall() then errors with 'juliacall' module already exists.

Change

Add one module-level flag, PythonCall._is_embedded, that the user flips inside a PackageCompiler script= to mark the resulting sysimage as embedded. At runtime, PythonCall.__init__ reads the flag and takes the embedded path; libpython is opened from the existing lib preference / JULIA_PYTHONCALL_LIB. If the sysimage is loaded by a julia process with no running Python (e.g. a Base.compilecache child), the embedded path rolls back and PythonCall stays inactive instead of erroring. Docs include a worked PackageCompiler example.
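
On the Python side, using a baked sysimage comes down to two existing settings: PYTHON_JULIACALL_SYSIMAGE to select the image and JULIA_PYTHONCALL_LIB to point at libpython. A rough sketch of a launcher helper follows; the function name `baked_sysimage_env` and both example paths are placeholders I made up, not defaults of juliacall or PythonCall.

```python
import os


def baked_sysimage_env(sysimage_path, libpython_path, base_env=None):
    """Return a copy of the environment with both settings for a baked sysimage."""
    env = dict(os.environ if base_env is None else base_env)
    # Existing juliacall option: the system image Julia should start from.
    env["PYTHON_JULIACALL_SYSIMAGE"] = os.fspath(sysimage_path)
    # Existing PythonCall setting: the libpython that the embedded path opens.
    env["JULIA_PYTHONCALL_LIB"] = os.fspath(libpython_path)
    return env


if __name__ == "__main__":
    # Placeholder paths; substitute the real sysimage and libpython locations.
    env = baked_sysimage_env("/opt/app/sysimage.so", "/usr/lib/libpython3.12.so")
    for name in ("PYTHON_JULIACALL_SYSIMAGE", "JULIA_PYTHONCALL_LIB"):
        print(f"{name}={env[name]}")
```

The returned mapping can be passed as `env=` to `subprocess.run`, or the two variables can be set in the container image so they are in place before `import juliacall` runs.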

Backward compatibility

Backwards compatible since the flag is off by default. The classic juliacall path and the non-embedded path are unchanged; lib and exe preferences keep their existing semantics. No new preferences or environment variables are introduced.

Testing

The new code path only runs when explicitly enabled, so default behaviour is unchanged and the existing test suites are unaffected.

Verified manually on Linux (AWS Lambda base image public.ecr.aws/lambda/python:3.12) and Windows (desktop, Python 3.13). A plain julia.exe --sysimage= load of a baked sysimage leaves PythonCall inactive without erroring. Not tested on macOS. No automated test is included because exercising it requires building a system image in CI, which is slow - happy to add one, or document the steps, if you'd prefer.

Related issues

Relates to #436 and #600 - this unblocks their use case (baking PythonCall into a custom system image) as an interim. The root-cause fix described in #436 - resetting PythonCall's sysimage-persisted state in __init__ so it works with no opt-in - is the longer-term path; this PR doesn't attempt or preclude it.

Also relevant to #762 ("Improve juliacall startup time?", which explicitly asks about compiling a system image) and a building block for #76 ("Compile and use custom sysimages automatically"). Does not address #129 (a different failure: no environment in the LOAD_PATH depends on CondaPkg).

cjdoris (Member) commented May 24, 2026

Looks good. My main feedback is about the docs - could you document the basic steps to actually create a sysimg?

I don't think you need to go much into the why (the current second paragraph) nor mention Main.__PythonCall_libptr (which is internal). Instead, you can pretty much just say that you need to set the prefs embedded=true and lib=.... But what's not totally obvious is how you set up a project and these prefs and use PackageCompiler to actually make the sysimg.

cjdoris (Member) commented May 24, 2026

Another thought - if you know that you are embedded, then you can find the libptr (by calling into the C-API functions, which will be globally available in this case). We only don't do this because it's easier to directly communicate the ptr itself by setting Main.__PythonCall_libptr from Python.

Therefore if we can "bake in" the fact that PythonCall is embedded into the sysimg, then we won't need any of these preferences. I wonder if we could simply do @eval PythonCall _is_embedded=true or something when we make the sysimg, so it's baked into PythonCall, then test for this variable in PythonCall.__init__?

When PythonCall is compiled into a juliacall system image, its __init__
runs during jl_init_with_image, before juliacall's bootstrap defines
Main.__PythonCall_libptr. Embedding was therefore mis-detected as
non-embedded and failed with "'juliacall' module already exists".

Add an opt-in embedded preference / JULIA_PYTHONCALL_EMBEDDED (via the
same getpref mechanism as exe/lib) that forces the embedded path and
obtains libpython from the lib preference / JULIA_PYTHONCALL_LIB (already
loaded in the host process). Unset, behaviour is unchanged. Docs and
CHANGELOG updated.
ncudlenco pushed a commit to buchi-labortechnik-ag/PythonCall.jl that referenced this pull request May 29, 2026
Address review feedback from @cjdoris on JuliaPy#773:

  > if you know that you are embedded, then you can find the libptr (by
  > calling into the C-API functions, which will be globally available in
  > this case) [...] if we can 'bake in' the fact that PythonCall is
  > embedded into the sysimg, then we won't need any of these preferences.
  > I wonder if we could simply do '@eval PythonCall _is_embedded=true' or
  > something when we make the sysimg, so it's baked into PythonCall, then
  > test for this variable in 'PythonCall.__init__'?

  > [...] could you document the basic steps to actually create a sysimg?
  > I don't think you need to go much into the why [...] nor mention
  > Main.__PythonCall_libptr (which is internal). Instead, you can pretty
  > much just say that you need to set the prefs [...]. But what's not
  > totally obvious is how you set up a project and these prefs and use
  > PackageCompiler to actually make the sysimg.

Design
------
Add a module-level 'const _is_embedded = Ref(false)' on PythonCall, flipped
at sysimage build time via PackageCompiler's 'script=' keyword (NOT
'precompile_execution_file=', which runs in a separate child process whose
state is not snapshotted). The mutated value is captured in the snapshot;
at runtime, 'PythonCall.__init__' reads it and takes the embedded path.

A 'const Ref' is preferred over a rebound non-const global so the C
submodule can 'import' the name once and read it without 'parentmodule'
indirection. Same baked-into-sysimage behaviour as the literal '@eval'
form suggested in review.

libpython is opened from the existing 'lib' preference / JULIA_PYTHONCALL_LIB
(added in 0.9.33). The PR does not introduce new preferences or
environment variables.

The interpreter's executable path is resolved via 'sys.executable' using
PyImport_ImportModule + PyObject_GetAttrString + PyUnicode_AsUTF8AndSize -
stable across all supported CPython versions and platforms.
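
For illustration, the same three-call sequence can be tried from Python through ctypes. This is only a sketch of the call order, not the PR's Julia code; the function name `executable_via_c_api` is hypothetical, and the Py_IsInitialized guard is included because the commit message relies on that check too.

```python
import ctypes
import sys

# ctypes.pythonapi is the libpython of the running interpreter;
# calls made through it keep the GIL held.
api = ctypes.pythonapi

api.Py_IsInitialized.argtypes = []
api.Py_IsInitialized.restype = ctypes.c_int
api.PyImport_ImportModule.argtypes = [ctypes.c_char_p]
api.PyImport_ImportModule.restype = ctypes.py_object
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyUnicode_AsUTF8AndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p


def executable_via_c_api():
    """Read sys.executable using only the C-API calls named above."""
    if not api.Py_IsInitialized():
        raise RuntimeError("no running Python interpreter in this process")
    sys_module = api.PyImport_ImportModule(b"sys")
    executable = api.PyObject_GetAttrString(sys_module, b"executable")
    size = ctypes.c_ssize_t()
    data = api.PyUnicode_AsUTF8AndSize(executable, ctypes.byref(size))
    # The UTF-8 buffer is owned by the str object, so copy it out here.
    return ctypes.string_at(data, size.value).decode("utf-8")


if __name__ == "__main__":
    print(executable_via_c_api())
```

From Julia the equivalent calls would go through ccall against the already-loaded libpython; that part is not shown here.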

If '_is_embedded[]' is true but 'Py_IsInitialized()' returns 0 - e.g. the
sysimage is loaded by a 'julia.exe' child of 'Base.compilecache' rather
than by juliacall - init_context resets CTX and downstream module
__init__s short-circuit. PythonCall loads as inactive instead of erroring.
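
The resulting decision can be summarised as a small toy model. This is my reading of the description, written in Python purely for illustration: the names `Mode` and `choose_mode` and the order of the checks are assumptions, and the real logic lives in init_context() in Julia.

```python
from enum import Enum


class Mode(Enum):
    EMBEDDED = "embedded"          # PythonCall runs inside a Python process
    NON_EMBEDDED = "non-embedded"  # PythonCall starts and owns Python itself
    INACTIVE = "inactive"          # loaded, but initialization is skipped


def choose_mode(libptr_defined, baked_flag, python_initialized):
    """Toy model of which path __init__ takes (check order is an assumption)."""
    if libptr_defined:
        # Classic juliacall path: the bootstrap already set the global.
        return Mode.EMBEDDED
    if baked_flag:
        # Baked sysimage: trust the flag only if Python is actually running.
        return Mode.EMBEDDED if python_initialized else Mode.INACTIVE
    return Mode.NON_EMBEDDED


if __name__ == "__main__":
    cases = {
        "stock juliacall": (True, False, True),
        "baked, flag set, under juliacall": (False, True, True),
        "baked, flag set, plain julia": (False, True, False),
        "baked, flag unset, under juliacall": (False, False, True),
    }
    for label, args in cases.items():
        print(f"{label}: {choose_mode(*args).value}")
```

The last case is the mis-detection from the Motivation section: without the flag, a baked PythonCall under juliacall falls through to the non-embedded path and then fails.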

Files
-----
  src/PythonCall.jl: declare 'const _is_embedded = Ref(false)'.
  src/C/C.jl: import _is_embedded into the C submodule.
  src/C/context.jl: rewrite init_context() embedded branch; add
    _embedded_program_path() reading sys.executable.
  src/Core/Core.jl, src/Convert/Convert.jl, src/Wrap/Wrap.jl,
  src/JlWrap/JlWrap.jl, src/JlWrap/C.jl, src/Compat/Compat.jl: guard
    __init__ on CTX.is_initialized for the inactive-load case.
  docs/src/juliacall.md: rewrite the 'Baking PythonCall into a system
    image' section with a worked example.
  CHANGELOG.md: Unreleased entry.
@ncudlenco ncudlenco force-pushed the feat/embed-sysimage branch from 28dd020 to f06bb4a Compare May 29, 2026 22:03
ncudlenco (Contributor, Author) commented May 29, 2026

> Looks good. My main feedback is about the docs - could you document the basic steps to actually create a sysimg?
>
> I don't think you need to go much into the why (the current second paragraph) nor mention Main.__PythonCall_libptr (which is internal). Instead, you can pretty much just say that you need to set the prefs embedded=true and lib=.... But what's not totally obvious is how you set up a project and these prefs and use PackageCompiler to actually make the sysimg.

> Another thought - if you know that you are embedded, then you can find the libptr (by calling into the C-API functions, which will be globally available in this case). We only don't do this because it's easier to directly communicate the ptr itself by setting Main.__PythonCall_libptr from Python.
>
> Therefore if we can "bake in" the fact that PythonCall is embedded into the sysimg, then we won't need any of these preferences. I wonder if we could simply do @eval PythonCall _is_embedded=true or something when we make the sysimg, so it's baked into PythonCall, then test for this variable in PythonCall.__init__?

Thanks - both points shaped the rewrite. The flag is now baked into the sysimage as you suggested, the docs section has a worked PackageCompiler example, and Main.__PythonCall_libptr is no longer mentioned. libpython is opened via the existing lib preference (JULIA_PYTHONCALL_LIB) - no new preferences or env vars.

I also explored the C-API angle. It works in standalone tests but I couldn't get it to resolve libpython reliably during the sysimage init window across all targets, so the PR sticks with the explicit lib preference. Happy to dig deeper if you'd like more detail.

One implementation note: I used const _is_embedded = Ref(false) rather than a rebound non-const global, so the C submodule can import the name once and read it without parentmodule indirection. User API is the same one line in script=. Happy to switch to the literal @eval form if you'd prefer.

P.S. Also added you as a Write collaborator on buchi-labortechnik-ag/PythonCall.jl so you can push directly to this branch if you'd like to make changes yourself.

@ncudlenco ncudlenco force-pushed the feat/embed-sysimage branch from f06bb4a to 58ffd5a Compare May 29, 2026 22:34