parquet-dotnet

Fully managed Apache Parquet implementation

MIT License

Stars: 596
Committers: 100


parquet-dotnet - 4.6.0

Published by github-actions[bot] over 1 year ago

This release contains a completely rewritten class serializer built on .NET expression trees. All the class serialisation documentation has been updated to reflect this. The existing ParquetConvert serializer is left untouched and marked obsolete, hence there is no increase in the major version number.
If you are using ParquetConvert, consider switching to ParquetSerializer soon, because no new features or fixes will be added to ParquetConvert.

The new class serializer supports all primitive types and nested types (structs, lists, maps and their combinations). It also fully conforms to the Dremel specification, which was a massive headache to implement properly.
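To give a feel for what Dremel conformance involves, here is a minimal Python sketch (not the library's C# code) that shreds a single `repeated int32` column into values, repetition levels and definition levels, and then reassembles the records. The function names and the one-column schema are assumptions made for illustration only; real nested schemas have deeper level ranges.

```python
def shred_repeated_ints(records):
    """Shred records of one `repeated int32` column into Dremel-style levels.

    Max repetition level is 1 and max definition level is 1 for this schema.
    """
    values, rep_levels, def_levels = [], [], []
    for record in records:
        if not record:
            # Empty list: emit one placeholder entry with nothing defined.
            rep_levels.append(0)
            def_levels.append(0)
            continue
        for i, v in enumerate(record):
            # 0 starts a new record, 1 continues the current list.
            rep_levels.append(0 if i == 0 else 1)
            def_levels.append(1)
            values.append(v)
    return values, rep_levels, def_levels


def assemble_repeated_ints(values, rep_levels, def_levels):
    """Inverse of shred_repeated_ints: rebuild the records from the levels."""
    records = []
    remaining = iter(values)
    for r, d in zip(rep_levels, def_levels):
        if r == 0:
            records.append([])
        if d == 1:
            records[-1].append(next(remaining))
    return records
```

Note that the empty list still occupies a slot in both level arrays but contributes no value, which is the kind of detail that is easy to get wrong.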

Other improvements:

  • Documentation for the low-level and serializer APIs updated to explain how to use nested data types.
  • Corrected how definition and repetition levels are calculated, making them more conformant to the Parquet specification.
  • Schema path calculation logic improved so that it no longer relies on string splitting/joining, which allows you to use any characters anywhere in column names (#278)
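The schema path point can be illustrated with a short Python sketch. It assumes nothing about the library's internals: `PathSegments` is a hypothetical name, and the idea shown is simply that keeping a path as a sequence of segments avoids the ambiguity of a dot-joined string.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PathSegments:
    """Hypothetical schema path stored as segments, never re-split from text."""

    parts: Tuple[str, ...]

    def child(self, name: str) -> "PathSegments":
        # Append a segment as-is; dots or other characters need no escaping.
        return PathSegments(self.parts + (name,))

    def __str__(self) -> str:
        # Joined form is for display only and is not parsed back.
        return ".".join(self.parts)


# Two different columns whose dot-joined names collide.
dotted_parent = PathSegments(("a.b",)).child("c")
dotted_child = PathSegments(("a",)).child("b.c")
```

Here `str(dotted_parent)` and `str(dotted_child)` are identical, so a splitter could not tell them apart, while the segment tuples remain distinct.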

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.5.4...4.6.0

parquet-dotnet - 4.5.4

Published by github-actions[bot] over 1 year ago

What's Changed

  • bitpacking range check bug on destination buffer (#267)
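For context on this class of bug, the sketch below packs fixed-width integers into a preallocated buffer, least-significant bit first, and checks the destination range before writing anything. It is written in Python for illustration and is not the library's code; `pack_bits` and its parameters are hypothetical names.

```python
def pack_bits(values, bit_width, dest, dest_offset=0):
    """Pack `values` into `dest` using `bit_width` bits each, LSB first.

    Returns the number of bytes written. Raises ValueError if the
    destination buffer cannot hold the packed output.
    """
    needed = (len(values) * bit_width + 7) // 8
    # The range check: validate the whole destination span up front.
    if dest_offset < 0 or dest_offset + needed > len(dest):
        raise ValueError("destination buffer too small for packed values")

    mask = (1 << bit_width) - 1
    acc = 0        # bit accumulator
    acc_bits = 0   # number of valid bits currently in acc
    pos = dest_offset
    for v in values:
        if v < 0 or v > mask:
            raise ValueError("value does not fit in bit_width bits")
        acc |= v << acc_bits
        acc_bits += bit_width
        while acc_bits >= 8:
            dest[pos] = acc & 0xFF
            pos += 1
            acc >>= 8
            acc_bits -= 8
    if acc_bits > 0:
        # Flush the final partial byte.
        dest[pos] = acc & 0xFF
        pos += 1
    return pos - dest_offset
```

Checking the full span before the loop means a too-small buffer fails cleanly instead of being partially overwritten.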

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.5.3...4.5.4

parquet-dotnet - 4.5.3

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.5.2...4.5.3

parquet-dotnet - 4.5.2

Published by github-actions[bot] over 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.5.1...4.5.2

parquet-dotnet - 4.5.1

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.5.0...4.5.1

parquet-dotnet - 4.5.0

Published by github-actions[bot] over 1 year ago

What's Changed

New Contributors

Performance

In general, this library now outperforms native code in raw write performance, which was the aim of this release (see chart). Unfortunately, writing strings is still slightly slower, which will improve in the next versions.
[image: write performance chart]

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.7...4.5.0

parquet-dotnet - 4.4.7

Published by github-actions[bot] over 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.6...4.4.7

parquet-dotnet - 4.4.6

Published by github-actions[bot] over 1 year ago

parquet-dotnet - 4.4.5

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.4...4.4.5

parquet-dotnet - 4.4.4

Published by github-actions[bot] over 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.3...4.4.4

parquet-dotnet - 4.4.3

Published by github-actions[bot] over 1 year ago

Massive performance boost for RLE encoder.
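As background on what an RLE encoder does, here is a deliberately simplified Python sketch of plain run-length encoding. It is an assumption-level illustration: Parquet's actual encoding is a hybrid of RLE and bit-packed runs with its own byte layout, and this sketch says nothing about how the library achieved its speed-up.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (run_length, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1  # extend the current run
        else:
            runs.append([1, v])  # start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Expand (run_length, value) pairs back into the original sequence."""
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out
```

Columns such as definition levels tend to contain long runs of the same small integer, which is why this kind of encoder sits on a hot path.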

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.2...4.4.3

parquet-dotnet - 4.4.2

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.1...4.4.2

parquet-dotnet - 4.4.1

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.4.0...4.4.1

parquet-dotnet - 4.4.0

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.3.4...4.4.0

parquet-dotnet - 4.3.4

Published by github-actions[bot] over 1 year ago

Experimental managed Snappy implementation.

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.3.3...4.3.4

parquet-dotnet - 4.3.3

Published by github-actions[bot] over 1 year ago

Improvements

  • Slight performance improvements in string writes.
  • Reading the file schema now uses only async methods.

Unrelated

Check out the Parquet Online project; help wanted!

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.3.2...4.3.3

parquet-dotnet - 4.3.2

Published by github-actions[bot] over 1 year ago

Performance Improvements

Write performance compared to ParquetSharp:

Method        DataType  Mean        Error        StdDev     Gen0        Gen1        Gen2      Allocated
ParquetNet    float     4.562 ms    0.3217 ms    0.0176 ms  273.4375    273.4375    273.4375  3919.8 KB
ParquetSharp  float     26.750 ms   4.2722 ms    0.2342 ms  -           -           -         20.68 KB
ParquetNet    int       4.276 ms    0.4586 ms    0.0251 ms  218.7500    218.7500    218.7500  3919.26 KB
ParquetSharp  int       22.786 ms   5.9526 ms    0.3263 ms  -           -           -         20.93 KB
ParquetNet    str       278.591 ms  163.0088 ms  8.9351 ms  -           -           -         83953.52 KB
ParquetSharp  str       128.194 ms  24.6149 ms   1.3492 ms  11500.0000  500.0000    -         73337.16 KB

Read performance compared to ParquetSharp:

Method        DataType  Mean        Error        StdDev     Gen0        Gen1        Gen2      Allocated
ParquetNet    float     4.440 ms    10.1130 ms   0.5543 ms  46.8750     46.8750     46.8750   7.64 MB
ParquetSharp  float     2.274 ms    0.8282 ms    0.0454 ms  31.2500     23.4375     23.4375   3.85 MB
ParquetNet    int       3.787 ms    7.4689 ms    0.4094 ms  46.8750     46.8750     46.8750   7.64 MB
ParquetSharp  int       2.176 ms    0.9895 ms    0.0542 ms  31.2500     23.4375     23.4375   3.85 MB
ParquetNet    str       188.185 ms  73.6461 ms   4.0368 ms  21000.0000  10666.6667  666.6667  192.66 MB
ParquetSharp  str       192.253 ms  72.4862 ms   3.9732 ms  21000.0000  10666.6667  666.6667  129.82 MB

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.3.1...4.3.2

parquet-dotnet - 4.3.1

Published by github-actions[bot] over 1 year ago

parquet-dotnet - 4.3.0

Published by github-actions[bot] almost 2 years ago

Breaking Changes

(I promise to stop these!)

  • Due to massive ambiguity issues between DateTimeOffset and DateTime, the former is retired, because DateTime is used more often by developers. If you are using DateTimeOffset (with Parquet), please stop now!
  • The DataField constructor now accepts nullable parameters for isNullable and isArray, which allows you to override reflected type metadata.

New Features

  • Dictionary encoding is now supported. It is enabled by default for string columns with a uniqueness factor of 0.8. You can tune it or turn it off completely in the ParquetOptions class (#223, #151).
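
The idea behind a uniqueness threshold can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's implementation: it assumes the uniqueness factor means distinct values divided by total values, and that dictionary encoding is applied only when that ratio does not exceed the threshold. The function name and return shape are hypothetical.

```python
def dictionary_encode(values, threshold=0.8):
    """Return (dictionary, indexes), or None if the column is too unique.

    Assumption: uniqueness = distinct count / total count, and the
    dictionary is used only when uniqueness <= threshold.
    """
    if not values:
        return None
    positions = {}  # value -> index in the dictionary, in first-seen order
    indexes = []
    for v in values:
        indexes.append(positions.setdefault(v, len(positions)))
    uniqueness = len(positions) / len(values)
    if uniqueness > threshold:
        return None  # fall back to plain encoding
    return list(positions), indexes
```

With many repeated strings the column shrinks to a small dictionary plus integer indexes; with mostly unique strings the dictionary would only add overhead, hence the cut-off.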

Deprecations

  • The DataType enumeration is now obsolete. It was introduced in early versions when the number of supported types was small, but now almost every basic CLR type is supported, so there is no need for it.

Bugs Fixed

  • Due to refactoring, a data boundary check was missed, causing an exception when deserialising columns with all nulls (#224).
  • Creator metadata wasn't set after migrating to GitHub Actions (#104).

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.2.3...4.3.0

parquet-dotnet - 4.2.3

Published by github-actions[bot] almost 2 years ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.2.2...4.2.3