parquet-dotnet

Fully managed Apache Parquet implementation

MIT License

Stars: 596
Committers: 100


parquet-dotnet - 4.25.0

Published by github-actions[bot] about 1 month ago

Improvements

  • File merger utility has Stream overload for non file-based operations.
  • File merger utility has extra overload to choose compression codec and specify custom metadata, by @dxdjgl in #519.
  • Timestamp logical type is supported, by @cliedeman in #521.
  • More data types support encoding using Dictionary encoding, by @EamonHetherton in #531.
  • Support for Roslyn nullable types, by @ErikApption in #537.
  • Internal: Decode methods now return the actual destination length, by @artnim in #543.

parquet-dotnet - 4.24.0 Latest Release

Published by github-actions[bot] 4 months ago

New features

  • Enum serialization is supported, using Enum's underlying type as a storage type.
  • [ParquetIgnore] is supported in addition to [JsonIgnore] for class properties. This is useful when you want to ignore a property in Parquet serialization but not in JSON serialization. Thanks to @rhvieira1980 in #411.
  • By popular demand, there is now a FileMerger utility which can merge multiple Parquet files into a single file, either by merging the files or by merging the actual data together.

Improvements

  • Nullable TimeSpan support in ParquetSerializer by @cliedeman in #409.
  • DataFrame support for int16/uint16 types by @asmirnov82 in #469.
  • Dropping build targets for .NET Core 3.1 and .NET 7.0 (STS). This should not affect anyone as .NET 6 and 8 are the LTS versions now.
  • Added convenience methods to serialize/deserialize collections into a single row group in #506 by @piiertho.
  • Serialization of interfaces and interface member properties is now supported, see #513 thanks to @Pragmateek.
  • ParquetReader is now easier to use in LINQ expressions thanks to @danielearwicker in #509.
  • Upgraded to latest IronCompress dependency.

Bug fixes

  • Fixed a loop that could read past the end of a block, in #487 by @alex-harper.
  • Decimal scale condition check fixed in #504 by @sierzput.
  • Class schema reflector was using single cache for reading and writing, which resulted in incorrect schema for writing. Thanks to @Pragmateek in #514.
  • Incorrect definition level for null values in #516 by @greg0rym.
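
For context on the definition-level fix: in the Parquet format, a top-level optional column has a maximum definition level of 1. A null is recorded as level 0 with nothing written to the value stream, and a present value as level 1. The sketch below illustrates that rule in Python for a flat optional column only. It is not Parquet.Net's code, and both helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def encode_optional_column(
    values: Sequence[Optional[int]],
) -> Tuple[List[int], List[int]]:
    """Split a flat optional column into definition levels and non-null values.

    Assumes a top-level optional field, so the maximum definition level is 1.
    """
    definition_levels: List[int] = []
    present_values: List[int] = []
    for value in values:
        if value is None:
            # Null: level 0, and nothing is appended to the value stream.
            definition_levels.append(0)
        else:
            # Present: level 1, and the value itself is stored.
            definition_levels.append(1)
            present_values.append(value)
    return definition_levels, present_values


def decode_optional_column(
    definition_levels: Sequence[int], present_values: Sequence[int]
) -> List[Optional[int]]:
    """Rebuild the column by consuming one stored value per level-1 entry."""
    remaining = iter(present_values)
    return [next(remaining) if level == 1 else None for level in definition_levels]
```

Writing the wrong level for a null shifts every later value on read, because the reader pulls one stored value for each level-1 entry.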

Parquet Floor

  • New feature "File explorer" lists filesystem using a panel on the left, allowing you to quickly load different files in the same directory and navigate to other directories.
  • Hovering over title will show full file path and load time in milliseconds.
  • Right-clicking a row shows a context menu that lets you copy the row to the clipboard in text format.
  • Icon updated to use the official Parquet logo.
  • You will get a notification popup if a new version of Parquet Floor is available.
  • Telemetry agreement changed and made clearer to understand.

parquet-dotnet - 4.23.5

Published by github-actions[bot] 7 months ago

Bug fixes

  • Reading decimal fields ignored precision and scale; fixed by @sierzput in #482.
  • UUID logical type was not read correctly; it must always be in big-endian format. Thanks to @anatoliy-savchak in #496.
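
Both fixes above come down to how raw bytes are interpreted. As a rough Python illustration (not Parquet.Net's code; the helper names are hypothetical): a Parquet decimal stored in a byte array is a big-endian two's-complement unscaled integer that only becomes meaningful once the column's scale is applied, and a Parquet UUID is 16 bytes in big-endian order.

```python
import uuid
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Read big-endian two's-complement bytes as an unscaled integer, then apply the scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Building the Decimal from a string keeps the result exact,
    # independent of the current decimal context precision.
    return Decimal(f"{unscaled}e-{scale}")


def decode_uuid(raw: bytes) -> uuid.UUID:
    """Read 16 bytes as a big-endian UUID."""
    if len(raw) != 16:
        raise ValueError("a UUID must be exactly 16 bytes")
    return uuid.UUID(bytes=raw)


raw_uuid = bytes.fromhex("00112233445566778899aabbccddeeff")
big_endian = decode_uuid(raw_uuid)
# The mixed-endian layout (as used by .NET's Guid.ToByteArray) reverses the
# first three fields, so reading the same bytes that way gives a different UUID.
mixed_endian = uuid.UUID(bytes_le=raw_uuid)
```

Here `big_endian` and `mixed_endian` differ even though they come from the same 16 bytes, which is why the byte order has to be fixed by the format and not by the platform.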

parquet-dotnet - 4.23.4

Published by github-actions[bot] 9 months ago

Bug fixes

Fixed regression in schema discovery of nullables for DateTime, DateOnly, TimeOnly.

parquet-dotnet - 4.23.3

Published by github-actions[bot] 9 months ago

Bug fixes

Fixed regression in schema discovery of nullable decimal data types. Thanks to @stefer in #465 for investigating and reporting this.

parquet-dotnet - 4.23.2

Published by github-actions[bot] 9 months ago

Bug fixes

  • Avoid file truncation when serializing with Append = true by @danielearwicker in #462.
  • Fixed a failure to read Parquet files with FIXED_LEN_BYTE_ARRAY data generated by Python (#463), by @aloneguid, thanks to @AndrewDavidLees.

parquet-dotnet - 4.23.1

Published by github-actions[bot] 9 months ago

Improvement

  • Flat file converter understands simple arrays and lists.

parquet-dotnet - 4.23.0

Published by github-actions[bot] 9 months ago

New features

  • Class serializer now supports fields, in addition to properties (#405).
  • New helper class ParquetToFlatTableConverter to simplify conversion of parquet files to flat data destinations.

Bugs fixed

  • .NET >= 6 specific types DateOnly and TimeOnly deserialization was failing due to schema validation errors (#395).
  • TimeOnly nullability wasn't respected.
  • Custom attributes like [ParquetTimestamp], [ParquetMicroSecondsTime] or [ParquetDecimal] were ignored for nullable class properties (#408).

Floor

  • Remembers theme variant - "light" or "dark".
  • Asks for permission to send anonymous telemetry data on start.
  • New button - reload file from disk.
  • Simple conversion to CSV.
  • Implemented version check on start.

parquet-dotnet - 4.22.1

Published by github-actions[bot] 9 months ago

Improvements

  • Deserialization into array of primitives is now supported on root class level in #456.
  • Untyped deserializer supports legacy arrays.

Parquet Floor

  • Schema will still be loaded even if a file has failed to load.
  • Legacy arrays can be viewed.

parquet-dotnet - 4.22.0

Published by github-actions[bot] 9 months ago

Improvements

  • Added ParquetSerializer DeserializeAsync overloads accepting local file path (#379)

Bug fixes

  • DataFrameReader did not handle files with multiple row groups (#365)

Parquet Floor

  • Reduced binary size after enabling partial trimming.
  • byte[] columns are left-aligned.
  • Increased data cell top and bottom padding by 2.

parquet-dotnet - 4.20.1

Published by github-actions[bot] 9 months ago

Fixes

  • NetBox was exposing some internal types (#451)

Experimental

Parquet Floor (reference implementation of desktop viewer) user interface improvements.

parquet-dotnet - 4.20.0

Published by github-actions[bot] 9 months ago

New features

Support for writing Int64 timestamps with the MICROS unit (#362).
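
For readers unfamiliar with the unit: a Parquet TIMESTAMP with the MICROS unit is stored as a 64-bit integer counting microseconds since the Unix epoch. The Python sketch below shows that conversion in both directions. It illustrates the storage convention only, and the function names are hypothetical, not part of Parquet.Net.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the count: 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_micros(moment: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


def from_timestamp_micros(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

Dividing one timedelta by another keeps the arithmetic in integers, so a value survives a round trip without losing its microsecond component.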

Experimental features

Cross-platform desktop app called Floor is published as a part of this release.

parquet-dotnet - 4.19.0

Published by github-actions[bot] 10 months ago

Improvements

  • Pre-allocate result list capacity when serializing by @Arithmomaniac in #444.

Experimental features

  1. This release has an experimental API for a new "dictionary serializer" (name might change) to give you a taste of the future before the row API is deprecated in the very distant future.
  2. The codebase also includes an experimental cross-platform desktop application written in Avalonia to view Parquet files. It's in very early stages but works for basic use cases. The Avalonia app was included in the solution because it does not require any IDE add-ons or extra SDKs and builds with the stock .NET 8 SDK. In the future the app will be pre-built for Linux, Windows (and possibly Mac with community help) and included in the build artifacts.

Looking forward to your thoughts!

parquet-dotnet - 4.18.1

Published by github-actions[bot] 10 months ago

Critical bug fix: reverting #423, as it introduced side effects that prevented correct files from being generated.

parquet-dotnet - 4.18.0

Published by github-actions[bot] 10 months ago

This is another stability improvement release, and a big thanks to everyone who contributed! Without you this project would not be possible.

Please don't forget to star this project on GitHub if you like it; this helps the project grow and motivates fellow contributors to keep contributing!

Improvements

  • Explicitly use invariant culture when encoding number types, eliminating the potential for generating invalid JSON by @rachied in #438.
  • Added DeserializeAllAsync in #433 by @Arithmomaniac.
  • Added option to reduce flushing of streams during write operation in #432 by @dxdjgl.
  • Added explicit target for .NET 8 by @aloneguid.

Bug fixes

  • DataFrameMapper returns incompatible DataFrameColumn by @aloneguid (#343).

parquet-dotnet - 4.17.0

Published by github-actions[bot] 11 months ago

This is a community bugfix release. As a maintainer I have only approved PRs raised by this wonderful community. Thanks everyone, and keep doing what you do.

Improvements

  • Allow deserialization from open RowGroupReaders by @ddrinka in #423/#422.

Bugs fixed

  • Gracefully handle malformed fields with trailing bytes in the data by @mukunku in #413.
  • ParquetSerializer doesn't support different JsonPropertyName and ClrPropertyName on struct fields by @mrinal-thomas in #410.
  • ParquetSerializer can sometimes fail when populating _typeToAssembler cache in parallel by @scottfavre in #420/#411.

parquet-dotnet - 4.16.4

Published by github-actions[bot] about 1 year ago

Class serializer was writing map key as optional (#396). Schema reflector for class serializer now emits non-nullable keys.

Validation for map keys in the schema was also added.

parquet-dotnet - 4.16.3

Published by github-actions[bot] about 1 year ago

Delta encoding can be optionally turned off (thanks to @itayfisz for the suggestion in #392).

parquet-dotnet - 4.16.2

Published by github-actions[bot] about 1 year ago

Critical bug fix in DELTA_BINARY_PACKED decoding: the first value is now added to the destination array before reading the block, by @ee-naveen in #391.
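
To see why the order matters: in DELTA_BINARY_PACKED, the header carries the first value and the total value count, and each block stores a minimum delta plus deltas relative to that minimum. The Python sketch below is a deliberately simplified model that skips varint and bit-packing and assumes the deltas are already unpacked. It is not Parquet.Net's decoder, and `decode_delta` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

# One block: (min_delta, deltas stored relative to min_delta).
Block = Tuple[int, Sequence[int]]


def decode_delta(first_value: int, total_count: int, blocks: Sequence[Block]) -> List[int]:
    """Rebuild the original values from a first value and per-block deltas."""
    if total_count == 0:
        return []
    # The first value goes into the output before any block is read;
    # every later value is the previous one plus a delta.
    values = [first_value]
    previous = first_value
    for min_delta, relative_deltas in blocks:
        for relative in relative_deltas:
            if len(values) == total_count:
                # The last block may be padded; ignore the padding.
                return values
            previous += min_delta + relative
            values.append(previous)
    return values
```

For example, `decode_delta(7, 8, [(-2, [0, 0, 0, 3, 3, 3, 3, 0])])` yields `[7, 5, 3, 1, 2, 3, 4, 5]`, with the trailing padding entry ignored. If the first value is not emitted first, every decoded value lands one slot off.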

parquet-dotnet - 4.16.1

Published by github-actions[bot] about 1 year ago

Critical Bug Fixes

  • Ensure delta encoding footer blocks are complete and handle overflow, by @ee-naveen in #387.
  • Use PLAIN encoding for columns without defined data by @spanglerco in #388.