fix: materialize ConstantColumnVector on Comet's serialize/export paths#4532
Open
schenksj wants to merge 1 commit into
Open
fix: materialize ConstantColumnVector on Comet's serialize/export paths#4532schenksj wants to merge 1 commit into
schenksj wants to merge 1 commit into
Conversation
Spark wraps file-source partition columns and other per-batch constants in ConstantColumnVector. When such a batch reaches Comet's serialization path (Utils.getBatchFieldVectors, used by broadcast/shuffle) or FFI export path (NativeUtil.exportBatch), it was rejected with "Comet execution only takes Arrow Arrays". Materialize the constant into a fresh Arrow FieldVector (the constant repeated numRows times) inline. The materializer reuses the existing per-type ArrowFieldWriters, so it covers every type -- scalars, decimal, timestamps, and complex struct/array/map -- and stays in sync with Spark's type handling. Adds ConstantColumnVectors.materialize (arrow package) + Utils.materializeConstantColumnVector, with new match arms in getBatchFieldVectors and exportBatch. Closes apache#4527 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #4527.
Rationale for this change
Spark wraps file-source partition columns and other per-batch constants in
ConstantColumnVector. When such a batch reaches Comet's serialization path (Utils.getBatchFieldVectors, used by broadcast/shuffle) or the FFI export path (NativeUtil.exportBatch), it was rejected with:This is a standalone fix; it was surfaced while working on the Delta Lake contrib integration (the OPTIMIZE / deletion-vector rewrite paths pull constants through a Comet operator), so prioritizing it helps that effort, but it applies to any plan that routes a constant column through a Comet operator.
What changes are included in this PR?
ConstantColumnVectors.materialize(in theorg.apache.spark.sql.comet.execution.arrowpackage) builds a fresh ArrowFieldVectorholding the constant repeatednumRowstimes. It reuses the existing per-typeArrowFieldWriters, so it covers every type -- scalars, decimal, timestamps, and complex struct/array/map -- and stays in sync with Spark's type handling, rather than a hand-rolled per-type switch.Utils.materializeConstantColumnVectorexposes it to the serialization path.Utils.getBatchFieldVectorsandNativeUtil.exportBatchmaterialize aConstantColumnVectorinstead of throwing. The existingCometVectorpath is untouched.How are these changes tested?
New test in
UtilsSuiteround-trips a batch with a valueConstantColumnVectorand a nullConstantColumnVectorthroughserializeBatches/decodeBatchesand asserts the materialized values (and nulls) survive. The test fails onmainwith the "only takes Arrow Arrays" exception and passes with this change.UtilsSuite(3/3) andCometExecSuite(126/0) pass. The FFIexportBatcharm shares the samematerializeConstantColumnVectorhelper.