fix: materialize ConstantColumnVector on Comet's serialize/export paths by schenksj · Pull Request #4532 · apache/datafusion-comet

schenksj · 2026-05-30T03:10:24Z

Which issue does this PR close?

Closes #4527.

Rationale for this change

Spark wraps file-source partition columns and other per-batch constants in ConstantColumnVector. Before this change, when such a batch reached Comet's serialization path (Utils.getBatchFieldVectors, used by broadcast/shuffle) or the FFI export path (NativeUtil.exportBatch), it was rejected with:

Comet execution only takes Arrow Arrays, but got ...ConstantColumnVector

This is a standalone fix. It surfaced while working on the Delta Lake contrib integration, where the OPTIMIZE / deletion-vector rewrite paths pull constants through a Comet operator, so prioritizing it helps that effort. It applies equally to any plan that routes a constant column through a Comet operator.

What changes are included in this PR?

ConstantColumnVectors.materialize (in the org.apache.spark.sql.comet.execution.arrow package) builds a fresh Arrow FieldVector holding the constant repeated numRows times. It reuses the existing per-type ArrowFieldWriters instead of a hand-rolled per-type switch, so it covers every type -- scalars, decimal, timestamps, and complex struct/array/map -- and stays in sync with Spark's type handling.
Utils.materializeConstantColumnVector exposes it to the serialization path.
New match arms in Utils.getBatchFieldVectors and NativeUtil.exportBatch materialize a ConstantColumnVector instead of throwing. The existing CometVector path is untouched.
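
The dispatch described above can be sketched in a few lines. This is only an illustration under stated assumptions: ArrowColumn, ConstantColumn, materialize, and fieldVectors below are hypothetical stand-ins written for this example, not Spark's ConstantColumnVector, Arrow's FieldVector, or the actual Comet helpers. The real change goes through the per-type ArrowFieldWriters rather than copying boxed values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ConstantMaterializeSketch {

    /** Hypothetical stand-in for an Arrow-backed column: one stored value per row. */
    static final class ArrowColumn {
        final List<Object> values;
        ArrowColumn(List<Object> values) { this.values = values; }
    }

    /** Hypothetical stand-in for a constant column: one value shared by every row. */
    static final class ConstantColumn {
        final Object value; // null models an all-null constant column
        ConstantColumn(Object value) { this.value = value; }
    }

    /** Builds a fresh column holding the constant repeated numRows times. */
    static ArrowColumn materialize(ConstantColumn c, int numRows) {
        return new ArrowColumn(new ArrayList<>(Collections.nCopies(numRows, c.value)));
    }

    /** Mirrors the shape of the new dispatch: pass through, materialize, or reject. */
    static List<ArrowColumn> fieldVectors(List<Object> columns, int numRows) {
        List<ArrowColumn> out = new ArrayList<>();
        for (Object col : columns) {
            if (col instanceof ArrowColumn) {
                out.add((ArrowColumn) col);                          // existing path, untouched
            } else if (col instanceof ConstantColumn) {
                out.add(materialize((ConstantColumn) col, numRows)); // new arm
            } else {
                throw new IllegalArgumentException(
                    "Comet execution only takes Arrow Arrays, but got " + col.getClass());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> batch = Arrays.asList(
            new ArrowColumn(Arrays.<Object>asList("a", "b", "c")),
            new ConstantColumn("2026-05-30"), // e.g. a partition value
            new ConstantColumn(null));        // a null constant
        for (ArrowColumn col : fieldVectors(batch, 3)) {
            System.out.println(col.values);
        }
    }
}
```

The trade-off the sketch makes visible is that materializing costs memory proportional to numRows for a value that was previously stored once, in exchange for letting the batch continue down a path that only understands per-row Arrow data.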

How are these changes tested?

A new test in UtilsSuite round-trips a batch containing a non-null ConstantColumnVector and a null ConstantColumnVector through serializeBatches / decodeBatches and asserts that the materialized values (and nulls) survive. The test fails on main with the "only takes Arrow Arrays" exception and passes with this change. UtilsSuite (3/3) and CometExecSuite (126/0) pass. The FFI exportBatch arm shares the same materializeConstantColumnVector helper.

Spark wraps file-source partition columns and other per-batch constants in ConstantColumnVector. When such a batch reaches Comet's serialization path (Utils.getBatchFieldVectors, used by broadcast/shuffle) or FFI export path (NativeUtil.exportBatch), it was rejected with "Comet execution only takes Arrow Arrays".

Materialize the constant into a fresh Arrow FieldVector (the constant repeated numRows times) inline. The materializer reuses the existing per-type ArrowFieldWriters, so it covers every type -- scalars, decimal, timestamps, and complex struct/array/map -- and stays in sync with Spark's type handling.

Adds ConstantColumnVectors.materialize (arrow package) + Utils.materializeConstantColumnVector, with new match arms in getBatchFieldVectors and exportBatch.

Closes apache#4527

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

schenksj wants to merge 1 commit into apache:main from schenksj:fix/materialize-constant-column-vector
