GH-3530: Optimize PLAIN encoding and decoding with direct ByteBuffer I/O by iemejia · Pull Request #3565 · apache/parquet-java

iemejia · 2026-05-17T22:38:32Z

Part of #3530 — Apache Parquet Java Performance Improvements

Summary

Replace ByteBufferInputStream and LittleEndianDataInputStream wrappers with direct ByteBuffer access for all PLAIN value readers and writers.

Readers (PlainValuesReader, BooleanPlainValuesReader, BinaryPlainValuesReader, FixedLenByteArrayPlainValuesReader): hold a little-endian ByteBuffer obtained from initFromPage() and call getInt/getLong/getFloat/getDouble directly, eliminating per-value stream overhead.
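A minimal sketch of the reader-side idea, assuming the whole page arrives as a single ByteBuffer. SketchIntPlainReader is a hypothetical name, not the parquet-java class. It only illustrates holding one little-endian view and calling getInt() per value instead of reading through a stream wrapper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical, simplified INT32 PLAIN reader (not the real parquet-java class).
class SketchIntPlainReader {
  private ByteBuffer buffer;

  // Assumption: the page is available as one ByteBuffer.
  // slice() always returns a BIG_ENDIAN view, so the order is set explicitly.
  void initFromPage(ByteBuffer page) {
    this.buffer = page.slice().order(ByteOrder.LITTLE_ENDIAN);
  }

  // One relative getInt(): reads 4 little-endian bytes and advances the position by 4.
  int readInteger() {
    return buffer.getInt();
  }
}
```

For example, the page bytes 01 00 00 00 decode to 1, with no intermediate stream or byte[] per value.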

Writers (PlainValuesWriter, BooleanPlainValuesWriter, FixedLenByteArrayPlainValuesWriter): write through CapacityByteArrayOutputStream's new writeInt/writeLong methods, which put values directly into the NIO slab buffer in little-endian order, avoiding temporary byte-array allocation.
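The writer side can be sketched with a heavily simplified, hypothetical stand-in for CapacityByteArrayOutputStream (SketchSlabOutput is a made-up name). It assumes fixed-size slabs allocated in little-endian order and simply opens a new slab when a value does not fit; the real class may handle slab boundaries and growth differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical slab writer: primitives go straight into a little-endian slab,
// with no temporary byte[] per value.
class SketchSlabOutput {
  private final int slabSize;
  private final List<ByteBuffer> slabs = new ArrayList<>();
  private ByteBuffer current;

  SketchSlabOutput(int slabSize) {
    if (slabSize < Long.BYTES) throw new IllegalArgumentException("slab too small");
    this.slabSize = slabSize;
    addSlab();
  }

  private void addSlab() {
    current = ByteBuffer.allocate(slabSize).order(ByteOrder.LITTLE_ENDIAN);
    slabs.add(current);
  }

  void writeInt(int v) {
    // Simplification: if the value does not fit, start a new slab.
    if (current.remaining() < Integer.BYTES) addSlab();
    current.putInt(v);
  }

  void writeLong(long v) {
    if (current.remaining() < Long.BYTES) addSlab();
    current.putLong(v);
  }

  // Concatenates the written bytes of every slab (for inspection only).
  byte[] toByteArray() {
    int total = 0;
    for (ByteBuffer s : slabs) total += s.position();
    byte[] out = new byte[total];
    int off = 0;
    for (ByteBuffer s : slabs) {
      ByteBuffer view = s.duplicate(); // independent position/limit over the same bytes
      view.flip();
      int n = view.remaining();
      view.get(out, off, n);
      off += n;
    }
    return out;
  }
}
```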

Supporting changes:

- CapacityByteArrayOutputStream: allocate slabs with ByteOrder.LITTLE_ENDIAN; add writeInt(int) and writeLong(long) for single-value NIO writes.
- BytesInput: add a zero-copy writeTo(ByteBuffer), and a toByteArray() that uses bulk ByteBuffer.get() instead of a stream copy.
- LittleEndianDataOutputStream: batch single-byte writes into a single write(buf, 0, N) call for writeShort/writeInt.
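The LittleEndianDataOutputStream batching idea can be sketched as follows. SketchLittleEndianOut is a hypothetical name, not the parquet-java class: instead of one out.write(int) call per byte, it fills a small scratch array in little-endian order and issues one out.write(buf, 0, N) call.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of batched little-endian writes.
class SketchLittleEndianOut {
  private final OutputStream out;
  private final byte[] scratch = new byte[4]; // reused, so no allocation per value

  SketchLittleEndianOut(OutputStream out) {
    this.out = out;
  }

  void writeShort(int v) throws IOException {
    scratch[0] = (byte) v;          // low byte first
    scratch[1] = (byte) (v >>> 8);
    out.write(scratch, 0, 2);       // one call instead of two
  }

  void writeInt(int v) throws IOException {
    scratch[0] = (byte) v;
    scratch[1] = (byte) (v >>> 8);
    scratch[2] = (byte) (v >>> 16);
    scratch[3] = (byte) (v >>> 24);
    out.write(scratch, 0, 4);       // one call instead of four
  }
}
```

The gain comes from fewer virtual calls into the underlying stream, which matters most when that stream does bookkeeping on every write.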

Includes JMH benchmarks (PlainEncodingBenchmark, PlainDecodingBenchmark) covering all 7 primitive types for both encoding and decoding.

Benchmark results

Environment: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.

Decoding (100K values/iteration, 3 forks x 5 iterations, throughput mode):

Benchmark	Master (M ops/s)	Branch (M ops/s)	Speedup
decodeInt	425	5,427	12.8x
decodeFloat	416	5,440	13.1x
decodeLong	119	4,720	39.5x (*)
decodeDouble	116	6,026	51.8x (*)
decodeBoolean	639	1,642	2.6x
decodeFlba (len=2,12,16)	188	680	3.6x
decodeBinary (len=10,100,1000)	142	225-230	1.6x

Encoding:

Benchmark	Master (M ops/s)	Branch (M ops/s)	Speedup
encodeInt	148	559	3.8x
encodeFloat	150	532	3.5x
encodeLong	193	478	2.5x
encodeDouble	179	439	2.4x
encodeBoolean	850	1,692	2.0x
encodeBinary (len=10)	76	150	2.0x
encodeFlba (len=2-16)	156-184	178-224	1.1-1.2x

(*) decodeLong/decodeDouble show JIT variance across forks (error bars >20%); the true steady-state speedup is likely ~13x, consistent with INT32/FLOAT.


Fokko · 2026-05-22T20:57:44Z

-      int length = BytesUtils.readIntLittleEndian(in);
-      return Binary.fromConstantByteBuffer(in.slice(length));
-    } catch (IOException | RuntimeException e) {
-      throw new ParquetDecodingException("could not read bytes at offset " + in.position(), e);


Should we keep the ParquetDecodingException? Otherwise we're throwing the raw {IOException,RuntimeException}, which is a behavioral change.

Done. Added try/catch wrapping RuntimeException (which covers BufferUnderflowException, IllegalArgumentException, etc.) into ParquetDecodingException in both readBytes() and skip().
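A sketch of the wrapping pattern for a length-prefixed BYTE_ARRAY value, assuming each value is a 4-byte little-endian length followed by that many bytes. SketchBinaryReader and DecodingException are hypothetical names; DecodingException is a local stand-in for ParquetDecodingException so the block compiles on its own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Local stand-in for ParquetDecodingException (hypothetical, for self-containment).
class DecodingException extends RuntimeException {
  DecodingException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical PLAIN BYTE_ARRAY reader over a little-endian view of the page.
class SketchBinaryReader {
  private final ByteBuffer buffer;

  SketchBinaryReader(ByteBuffer page) {
    this.buffer = page.slice().order(ByteOrder.LITTLE_ENDIAN);
  }

  ByteBuffer readBytes() {
    int start = buffer.position();
    try {
      int length = buffer.getInt();        // BufferUnderflowException if < 4 bytes remain
      ByteBuffer value = buffer.slice();   // shares content, no copy
      value.limit(length);                 // IllegalArgumentException if negative or too large
      buffer.position(buffer.position() + length);
      return value;
    } catch (RuntimeException e) {
      // Keep the original contract: callers see a decoding exception, not a raw NIO one.
      throw new DecodingException("could not read bytes at offset " + start, e);
    }
  }
}
```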

Fokko · 2026-05-22T20:58:15Z

+    if (available > 0) {
+      this.buffer = stream.slice(available).order(ByteOrder.LITTLE_ENDIAN);
+    } else {
+      this.buffer = ByteBuffer.allocate(0).order(ByteOrder.LITTLE_ENDIAN);


Should we create a constant for the ByteBuffer.allocate(0).order(ByteOrder.LITTLE_ENDIAN);?

Done. Extracted EMPTY_LE_BUFFER as a private static final read-only constant (ByteBuffer.allocate(0).order(LITTLE_ENDIAN).asReadOnlyBuffer()), used via .duplicate() in initFromPage. Same pattern applied in PlainValuesReader.
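One detail worth keeping in mind with this constant: per the ByteBuffer Javadoc, asReadOnlyBuffer(), duplicate() and slice() all return buffers whose byte order is BIG_ENDIAN, regardless of the source buffer's order. For an empty buffer this is harmless (any get() underflows either way), but a sketch that keeps the name accurate re-applies the order after duplicate(). SketchEmptyPage and emptyPageBuffer() are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for a shared, read-only empty page buffer.
class SketchEmptyPage {
  // Note: asReadOnlyBuffer() resets the order to BIG_ENDIAN.
  private static final ByteBuffer EMPTY_LE_BUFFER =
      ByteBuffer.allocate(0).order(ByteOrder.LITTLE_ENDIAN).asReadOnlyBuffer();

  static ByteBuffer emptyPageBuffer() {
    // duplicate() gives each reader its own position/limit over the shared constant;
    // it also resets the order, so little-endian is applied again here.
    return EMPTY_LE_BUFFER.duplicate().order(ByteOrder.LITTLE_ENDIAN);
  }
}
```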

Fokko · 2026-05-22T21:02:48Z

-    try {
-      return Binary.fromConstantByteBuffer(in.slice(length));
-    } catch (IOException | RuntimeException e) {
-      throw new ParquetDecodingException("could not read bytes at offset " + in.position(), e);


Same as above, should we keep the wrapped ParquetDecodingException?

Done. Added try/catch wrapping in readBytes(), skip(), and skip(int n). Also using Math.multiplyExact(n, length) in skip(int n) to detect overflow.
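A sketch of the fixed-length variant, assuming every value is exactly `length` raw bytes (so byte order does not matter). SketchFixedLenReader and DecodingException are hypothetical names; DecodingException stands in for ParquetDecodingException. The skip uses Math.multiplyExact so an overflowing n * length fails loudly and is wrapped like every other read error.

```java
import java.nio.ByteBuffer;

// Local stand-in for ParquetDecodingException (hypothetical).
class DecodingException extends RuntimeException {
  DecodingException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical FIXED_LEN_BYTE_ARRAY PLAIN reader.
class SketchFixedLenReader {
  private final ByteBuffer buffer;
  private final int length;

  SketchFixedLenReader(ByteBuffer page, int length) {
    this.buffer = page.slice();
    this.length = length;
  }

  ByteBuffer readBytes() {
    try {
      ByteBuffer value = buffer.slice();
      value.limit(length); // IllegalArgumentException if fewer than `length` bytes remain
      buffer.position(buffer.position() + length);
      return value;
    } catch (RuntimeException e) {
      throw new DecodingException("could not read bytes at offset " + buffer.position(), e);
    }
  }

  void skip(int n) {
    try {
      // multiplyExact throws ArithmeticException instead of silently wrapping;
      // an out-of-range target position throws IllegalArgumentException.
      buffer.position(buffer.position() + Math.multiplyExact(n, length));
    } catch (RuntimeException e) {
      throw new DecodingException("could not skip " + n + " values of length " + length, e);
    }
  }
}
```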

Fokko · 2026-05-22T21:05:43Z

-      try {
-        skipBytesFully(n * 8);
-      } catch (IOException e) {
-        throw new ParquetDecodingException("could not skip " + n + " double values", e);


Same here, do we want to keep the ParquetDecodingException?

Done. All readXxx() and skip(int n) methods across DoublePlainValuesReader, FloatPlainValuesReader, IntegerPlainValuesReader, and LongPlainValuesReader now wrap RuntimeException in ParquetDecodingException with descriptive messages matching the original error contract.
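For the primitive readers the same contract can be sketched on a DOUBLE reader. SketchDoubleReader, DecodingException (a stand-in for ParquetDecodingException) and the "could not read double" message are hypothetical; the skip message mirrors the one in the removed code above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Local stand-in for ParquetDecodingException (hypothetical).
class DecodingException extends RuntimeException {
  DecodingException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical DOUBLE PLAIN reader: every failure surfaces as a DecodingException.
class SketchDoubleReader {
  private final ByteBuffer buffer;

  SketchDoubleReader(ByteBuffer page) {
    this.buffer = page.slice().order(ByteOrder.LITTLE_ENDIAN);
  }

  double readDouble() {
    try {
      return buffer.getDouble(); // BufferUnderflowException past the end of the page
    } catch (RuntimeException e) {
      throw new DecodingException("could not read double", e);
    }
  }

  void skip(int n) {
    try {
      buffer.position(buffer.position() + Math.multiplyExact(n, 8));
    } catch (RuntimeException e) {
      throw new DecodingException("could not skip " + n + " double values", e);
    }
  }
}
```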

Fokko · 2026-05-22T21:09:32Z

 public abstract class PlainValuesReader extends ValuesReader {
  private static final Logger LOG = LoggerFactory.getLogger(PlainValuesReader.class);

-  protected LittleEndianDataInputStream in;


We should go through the deprecation cycle here, but is anything using this outside of the project itself?

Good point. The old protected LittleEndianDataInputStream in field is now protected ByteBuffer buffer — the type change is binary-incompatible regardless, so a deprecation cycle wouldn't help external subclasses (they'd get a compile error either way). I searched the project and only internal subclasses (the 4 inner classes) access this field. I think this is acceptable given this class was never annotated @Public and the field type change makes deprecation impractical. WDYT?

Fokko · 2026-05-22T21:13:43Z

+     * mutable {@code BAOS.getBuf()}.
+     */
+    @Override
+    public byte[] toByteArray() {


This overrides a deprecated API; as a follow-up, we should probably move the internal calls to the new API:

@deprecated Use {@link #toByteBuffer(ByteBufferAllocator, Consumer)}

Addressed in f0bdac6. The base-class toInputStream() now tries getInternalByteBuffer() first (zero-copy fast path) before falling back to the deprecated toByteBuffer(). Also added getInternalByteBuffer() and toInputStream() overrides to ByteArrayBytesInput so the byte-array-backed path is zero-copy too. Added @SuppressWarnings("deprecation") on the intentional overrides.
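A hypothetical model of the "zero-copy first, then fall back" order described in the reply. SketchBytesInput, SketchByteArrayBytesInput, SketchByteBufferInputStream and copyBytes() are made-up names that only mirror the description; copyBytes() stands in for the deprecated copying path and is not the real BytesInput API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Minimal InputStream that reads directly from a ByteBuffer (no intermediate copy).
class SketchByteBufferInputStream extends InputStream {
  private final ByteBuffer buf;

  SketchByteBufferInputStream(ByteBuffer buf) {
    this.buf = buf;
  }

  @Override
  public int read() {
    return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
  }

  @Override
  public int read(byte[] b, int off, int len) {
    if (len == 0) return 0;
    if (!buf.hasRemaining()) return -1;
    int n = Math.min(len, buf.remaining());
    buf.get(b, off, n);
    return n;
  }
}

abstract class SketchBytesInput {
  // Fast path: a view of the backing bytes if one exists without copying, else null.
  ByteBuffer getInternalByteBuffer() {
    return null;
  }

  // Slow path: materialize a copy (stands in for the deprecated copying API).
  abstract byte[] copyBytes();

  InputStream toInputStream() {
    ByteBuffer internal = getInternalByteBuffer();
    if (internal != null) {
      return new SketchByteBufferInputStream(internal.duplicate());
    }
    return new ByteArrayInputStream(copyBytes());
  }
}

class SketchByteArrayBytesInput extends SketchBytesInput {
  private final byte[] bytes;
  private final int offset;
  private final int length;

  SketchByteArrayBytesInput(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  @Override
  ByteBuffer getInternalByteBuffer() {
    return ByteBuffer.wrap(bytes, offset, length).slice(); // wraps, does not copy
  }

  @Override
  byte[] copyBytes() {
    return Arrays.copyOfRange(bytes, offset, offset + length);
  }
}
```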

Fokko · 2026-05-22T21:16:38Z

-  public int getNextOffset() {
-    return in.getNextOffset();
+  public void skip(int n) {
+    bitIndex += n;


Should we check for bounds, and throw a ParquetDecodingException in case of out of bounds?

Done. Added a bitCount field (set to length * 8 in initFromPage) and an explicit bounds check in readBoolean() that throws ParquetDecodingException with a descriptive message when attempting to read beyond the page boundary.
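A sketch of the bounds check, assuming PLAIN BOOLEAN values are bit-packed eight per byte, least-significant bit first (as in the Parquet encoding spec). SketchBooleanReader and DecodingException (a stand-in for ParquetDecodingException) are hypothetical names; the checked addition in skip() is an extra safeguard of this sketch, not something stated in the reply.

```java
import java.nio.ByteBuffer;

// Local stand-in for ParquetDecodingException (hypothetical).
class DecodingException extends RuntimeException {
  DecodingException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical PLAIN BOOLEAN reader with an explicit page-boundary check.
class SketchBooleanReader {
  private ByteBuffer buffer;
  private int bitIndex;
  private int bitCount;

  void initFromPage(ByteBuffer page) {
    this.buffer = page.slice();
    this.bitIndex = 0;
    this.bitCount = buffer.remaining() * 8; // assumes the page is smaller than 256 MiB
  }

  boolean readBoolean() {
    if (bitIndex < 0 || bitIndex >= bitCount) {
      throw new DecodingException(
          "cannot read boolean at bit " + bitIndex + ": page holds only " + bitCount + " bits", null);
    }
    int b = buffer.get(bitIndex >>> 3);          // absolute get; position is untouched
    boolean value = ((b >>> (bitIndex & 7)) & 1) != 0;
    bitIndex++;
    return value;
  }

  void skip(int n) {
    try {
      bitIndex = Math.addExact(bitIndex, n);     // out-of-range skips surface on the next read
    } catch (ArithmeticException e) {
      throw new DecodingException("could not skip " + n + " boolean values", e);
    }
  }
}
```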

Fokko · 2026-05-22T21:17:27Z

-      } catch (IOException e) {
-        throw new ParquetDecodingException("could not skip " + n + " double values", e);
-      }
+      buffer.position(buffer.position() + n * 8);


Should we use Math.multiplyExact here and below?

Done. Applied Math.multiplyExact in all skip(int n) methods: Math.multiplyExact(n, 8) for double/long, Math.multiplyExact(n, 4) for float/int, and Math.multiplyExact(n, length) in FixedLenByteArrayPlainValuesReader. Overflow now produces an ArithmeticException which gets caught and wrapped in ParquetDecodingException.
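The arithmetic behind the suggestion: plain int multiplication wraps modulo 2^32, so a huge skip count can become a small, perfectly valid offset. SkipOverflowDemo is a hypothetical helper used only to show the two behaviors side by side.

```java
// Contrasts unchecked and checked offset computation for skip(int n) on 8-byte values.
class SkipOverflowDemo {
  // Unchecked: n * 8 silently wraps on overflow.
  static int plainByteOffset(int n) {
    return n * 8;
  }

  // Checked: throws ArithmeticException instead of wrapping.
  static int checkedByteOffset(int n) {
    return Math.multiplyExact(n, 8);
  }
}
```

With n = 536_870_913 (2^29 + 1), n * 8 = 2^32 + 8, so plainByteOffset returns 8 and an unchecked skip would silently advance by a single double. checkedByteOffset throws ArithmeticException instead, which the reader can then wrap in ParquetDecodingException.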

- Wrap RuntimeException in ParquetDecodingException in all read/skip methods to preserve existing error contract
- Extract EMPTY_LE_BUFFER constant for empty page initialization
- Add bounds check in BooleanPlainValuesReader.readBoolean()
- Use Math.multiplyExact in skip(int n) to detect overflow

@SuppressWarnings

… toByteArray/toByteBuffer

- Base-class toInputStream() now tries getInternalByteBuffer() first for zero-copy path before falling back to deprecated toByteBuffer()
- ByteArrayBytesInput: add getInternalByteBuffer() and toInputStream() overrides to avoid unnecessary copy through BAOS
- Add @SuppressWarnings("deprecation") on intentional deprecated overrides

Fokko reviewed May 22, 2026


iemejia added 2 commits June 2, 2026 18:59
