parquet-dotnet

Fully managed Apache Parquet implementation

MIT License

Stars: 596, Committers: 100

parquet-dotnet - 4.2.2

Published by github-actions[bot] almost 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.2.1...4.2.2

parquet-dotnet - 4.2.1

Published by github-actions[bot] almost 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.2.0...4.2.1

parquet-dotnet - 4.2.0

Published by github-actions[bot] almost 2 years ago

Performance

This release focuses heavily on performance optimisations, but it also brings breaking changes (see below), because the work required to improve performance was combined with a general cleanup. Version 4.1 was not near the top of the list when it came to performance; that changes drastically in this release.

  • Parquet data encoders are generally rewritten to make use of .NET's low-level Span and Memory API where possible.
  • Code is much simpler.
  • Many reads are heavily optimised!

Parquet.Net is now faster than ParquetSharp (a wrapper around the native C++ implementation) at writing numeric data.

In summary, read performance is now roughly on par with ParquetSharp (it used to be much slower), while write performance for numeric types is much higher: for integers this library is about 5 times faster on write, and for floats about 6 times faster!

It's worth noting that this is only the beginning of the performance work.

Method        DataType  DataSize  Mode   Mean (ms)  Error (ms)  StdDev (ms)        Gen0        Gen1        Gen2  Allocated (KB)
ParquetNet    float     1000000   read       3.239      9.9790       0.5470    273.4375    269.5313    269.5313          7827.2
ParquetSharp  float     1000000   read       2.494      0.3515       0.0193    402.3438    394.5313    394.5313         3945.39
ParquetNet    float     1000000   write      4.836      0.8731       0.0479    656.2500    656.2500    656.2500         7822.63
ParquetSharp  float     1000000   write     31.201     10.7644       0.5900           -           -           -           20.65
ParquetNet    int       1000000   read       2.577      2.4778       0.1358    269.5313    265.6250    265.6250         7827.21
ParquetSharp  int       1000000   read       2.589      0.7094       0.0389    402.3438    394.5313    394.5313         3945.74
ParquetNet    int       1000000   write      5.046      1.0249       0.0562    640.6250    640.6250    640.6250         7822.72
ParquetSharp  int       1000000   write     26.367     10.6178       0.5820           -           -           -            20.9
ParquetNet    str       1000000   read     379.114    221.4981      12.1411  36000.0000  18000.0000   1000.0000       339862.54
ParquetSharp  str       1000000   read     352.314     64.6664       3.5446  36000.0000  18000.0000   1000.0000       226690.46
ParquetNet    str       1000000   write    442.085     42.8804       2.3504   1000.0000   1000.0000   1000.0000       376007.85
ParquetSharp  str       1000000   write    264.072    483.3020      26.4914  11500.0000   1000.0000           -       110923.76
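
As a quick check of the "5 times" and "6 times" figures, the speed-up can be computed as ParquetSharp's mean divided by ParquetNet's mean. This Python sketch only redoes that arithmetic on the Mean column above; it does not run any benchmark, and the names are hypothetical.

```python
# Mean times in ms copied from the benchmark table: (ParquetNet, ParquetSharp)
MEANS_MS = {
    ("float", "read"): (3.239, 2.494),
    ("float", "write"): (4.836, 31.201),
    ("int", "read"): (2.577, 2.589),
    ("int", "write"): (5.046, 26.367),
    ("str", "read"): (379.114, 352.314),
    ("str", "write"): (442.085, 264.072),
}


def speedup(net_ms: float, sharp_ms: float) -> float:
    """How many times faster ParquetNet is than ParquetSharp (> 1 means faster)."""
    return sharp_ms / net_ms


if __name__ == "__main__":
    for (dtype, mode), (net, sharp) in MEANS_MS.items():
        print(f"{dtype:5} {mode:5} {speedup(net, sharp):5.2f}x")
```

Ratios above 1 favour Parquet.Net: the numeric writes come out at roughly 5.2x (int) and 6.5x (float), int reads are essentially even, and the string rows fall below 1.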

Breaking Changes

  • Schema-related classes moved to the Parquet.Schema namespace.
  • Attributes moved to the root Parquet namespace.
  • Row-related classes moved to the Parquet.Rows namespace (from Parquet.Data.Rows).

Improvements

  • DateTime and DateTimeOffset are now treated the same!

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.1.3...4.2.0

parquet-dotnet - 4.1.3

Published by github-actions[bot] almost 2 years ago

parquet-dotnet - 4.1.2

Published by github-actions[bot] almost 2 years ago

parquet-dotnet - 4.1.1

Published by github-actions[bot] almost 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.1.0...4.1.1

parquet-dotnet - 4.1.0: Data Compression

Published by aloneguid about 2 years ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.0.2...4.1.0

parquet-dotnet - 4.0.2

Published by aloneguid about 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.0.1...4.0.2

parquet-dotnet - 4.0.1

Published by aloneguid about 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.0.0...4.0.1

parquet-dotnet - 4.0.0

Published by aloneguid about 2 years ago

This release starts a new major version 4 for parquet-dotnet. The Parquet interface becomes incompatible with previous versions due to the full switch to an async API. You will see some performance improvements in high-load scenarios, but broader performance gains will come during the v4.0 lifetime.

In addition, I have upgraded parquet.thrift to the latest version and regenerated/fixed the Thrift contracts to enable implementation of the newest Parquet features.

And:

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/3.10.0...4.0.0

parquet-dotnet - 3.10.0

Published by aloneguid about 2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/3.9.1...3.10.0

parquet-dotnet - 3.9.1

Published by aloneguid about 3 years ago

thread safety fix #125

parquet-dotnet - 3.9.0

Published by aloneguid over 3 years ago

Another release. It's 3.9 now; is it going to be 4.0 any time soon? I don't know!

Breaking Changes

The "parq" tool has been removed: according to usage stats it is rarely used, and I don't have time to support it. If you want to view Parquet files, try Parquet Viewer.

Community Contributions

  • #116 Fix for NullReferenceException in case of Int16, UnsignedShort and UnsignedByte (thanks @nikolapeja6).
  • #109 Add support for decimal scale and precision. (thanks @bdquig).
  • #112 Preserve ticks during the conversion from NanoTime to DateTimeOffset (thanks @MaratFaskhiev).
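
The two storage details behind #109 and #112 can be sketched as follows. This is a minimal Python illustration based on the Parquet format specification, not Parquet.Net's code, and the function names are hypothetical. A DECIMAL column stores an unscaled integer plus a scale (for the byte-array physical types, as big-endian two's-complement bytes). The legacy INT96 "NanoTime" timestamp stores nanoseconds of the day followed by a Julian day number. Converting the nanoseconds to 100 ns .NET ticks with integer division is what keeps sub-millisecond precision.

```python
import struct
from decimal import Decimal

# Constants for the legacy INT96 ("NanoTime") layout and .NET ticks.
UNIX_EPOCH_JULIAN_DAY = 2440588          # Julian day number of 1970-01-01
UNIX_EPOCH_TICKS = 621355968000000000    # .NET ticks (100 ns units) at 1970-01-01T00:00:00
TICKS_PER_DAY = 864_000_000_000          # 86,400 s * 10,000,000 ticks per second


def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """DECIMAL value = unscaled integer * 10^(-scale)."""
    return Decimal(unscaled).scaleb(-scale)


def decode_decimal_bytes(raw: bytes, scale: int) -> Decimal:
    """Decode a DECIMAL stored as a big-endian two's-complement byte array."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return decode_decimal(unscaled, scale)


def int96_to_ticks(raw: bytes) -> int:
    """Convert a 12-byte INT96 timestamp to .NET ticks, keeping sub-millisecond precision."""
    # First 8 bytes: little-endian int64 nanoseconds within the day;
    # last 4 bytes: little-endian int32 Julian day number.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - UNIX_EPOCH_JULIAN_DAY
    # Integer division keeps every whole 100 ns tick; only sub-tick nanoseconds are dropped.
    return UNIX_EPOCH_TICKS + days_since_epoch * TICKS_PER_DAY + nanos_of_day // 100
```

For example, 123,456,789 ns into 1970-01-01 maps to 1,234,567 ticks past the epoch, whereas truncating to milliseconds first would keep only 1,230,000.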

Small Improvements

  • csproj: removed the NET14 directive, which wasn't in use at all.
  • explicit dependencies on System.Buffers and System.Memory were removed.
  • dependency on System.Reflection.Emit.Lightweight is now only required for netstandard 2.0 and lower.
  • added 2 new explicit .NET Core LTS targets: .NET Core 2.1 and .NET Core 3.1. For you, this only means that if you target one of those frameworks, you will have 2 fewer transitive NuGet dependencies.
  • using official parquet graphics.

parquet-dotnet - 3.8.6

Published by aloneguid over 3 years ago

Bugfix release. Note to users: parquet-dotnet is pretty much alive. It's actively used in quite a few projects and is pretty solid.

  • Many thanks to @felipepessoto for fixing an issue in RunLengthBitPackingHybridValuesReader and adding support for dots in field names.
  • Thanks to @DmitryBaranov1986 for improving native serialiser interface (#92).
  • Thanks to @ishepherd for handling empty data pages correctly (#95) and also congrats on the first OSS contribution 💛
  • @aloncatz is also a hero: fixed reading a plain dictionary of zero length (#96).
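
RunLengthBitPackingHybridValuesReader decodes Parquet's RLE/bit-packing hybrid encoding. The Python sketch below follows that encoding as described in the Parquet format specification; it is not Parquet.Net's code, and the function names are hypothetical. Each run starts with a ULEB128 varint header whose lowest bit selects either an RLE run (one value repeated `header >> 1` times) or a bit-packed run (`header >> 1` groups of 8 values, packed least-significant-bit first). It assumes any length or bit-width prefix in front of the runs has already been stripped.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, new_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_rle_bitpacked_hybrid(buf: bytes, bit_width: int, num_values: int) -> list[int]:
    """Decode num_values integers of bit_width bits from RLE/bit-packed hybrid runs."""
    values: list[int] = []
    pos = 0
    byte_width = (bit_width + 7) // 8   # RLE values are padded to whole bytes
    mask = (1 << bit_width) - 1
    while len(values) < num_values:
        header, pos = read_uleb128(buf, pos)
        if header & 1 == 0:
            # RLE run: one little-endian value repeated (header >> 1) times.
            count = header >> 1
            value = int.from_bytes(buf[pos:pos + byte_width], "little")
            pos += byte_width
            values.extend([value] * count)
        else:
            # Bit-packed run: (header >> 1) groups of 8 values, LSB first.
            count = (header >> 1) * 8
            nbytes = count * bit_width // 8
            packed = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            values.extend((packed >> (i * bit_width)) & mask for i in range(count))
    # The last bit-packed group may be padded beyond num_values.
    return values[:num_values]
```

For example, at bit width 3 the header byte 0x03 followed by 0x88 0xC6 0xFA decodes to the values 0 through 7.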

parquet-dotnet - 3.8.5

Published by aloneguid over 3 years ago

various bugfixes

parquet-dotnet - https://github.com/aloneguid/parquet-dotnet/releases/tag/3.8.3

Published by aloneguid almost 4 years ago

parquet-dotnet - https://github.com/aloneguid/parquet-dotnet/releases/tag/3.8.2

Published by aloneguid almost 4 years ago

parquet-dotnet - https://github.com/aloneguid/parquet-dotnet/releases/tag/3.8.1

Published by aloneguid almost 4 years ago

parquet-dotnet - https://github.com/aloneguid/parquet-dotnet/releases/tag/3.8.0

Published by aloneguid almost 4 years ago

parquet-dotnet - https://github.com/aloneguid/parquet-dotnet/releases/tag/3.7.7

Published by aloneguid over 4 years ago