lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

APACHE-2.0 License

Downloads
814.7K
Stars
3.8K

lance - v0.4.11 Expose fragment data files in python

Published by changhiskhan over 1 year ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.10...v0.4.11

lance - v0.4.10: It is all about performance

Published by eddyxu over 1 year ago

Welcome our newest contributor @LiWeiJie!

This release introduces several performance improvements, including index caching and customized prefetching.

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.9...v0.4.10

lance - v0.4.9 Document improvement, and Distributed Fragments

Published by eddyxu over 1 year ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.8...v0.4.9

lance - v0.4.8 Better support for nested fields and more supported predicates

Published by changhiskhan over 1 year ago

Previously predicates on nested (and deeply nested) fields were not properly supported. This release adds support for filtering on struct sub-fields or deeply nested structs.

We also added support for more filter predicates and fixed a regression in NULL handling for string columns.
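To illustrate what a predicate on a nested sub-field means, here is a minimal pure-Python sketch. It does not use the Lance API: `get_nested` and `filter_rows` are hypothetical helpers, and rows are modeled as plain dicts rather than Arrow structs.

```python
from typing import Any, Callable


def get_nested(row: dict, path: str) -> Any:
    """Resolve a dotted path such as "meta.location.city".

    Returns None if any level along the path is missing or null.
    """
    value: Any = row
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def filter_rows(rows: list, path: str, predicate: Callable[[Any], bool]) -> list:
    """Keep rows whose nested field is non-null and satisfies the predicate."""
    kept = []
    for row in rows:
        value = get_nested(row, path)
        if value is not None and predicate(value):
            kept.append(row)
    return kept


rows = [
    {"id": 1, "meta": {"location": {"city": "Oslo"}}},
    {"id": 2, "meta": {"location": {"city": "Lima"}}},
    {"id": 3, "meta": {"location": None}},  # null struct: never matches
]
matches = filter_rows(rows, "meta.location.city", lambda city: city == "Oslo")
```

In this sketch a null anywhere along the path makes the row fail the predicate, which is one common way to treat NULLs in filters; it is an assumption here, not a statement about Lance's semantics.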

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.7...v0.4.8

lance - v0.4.7 Random access improvements

Published by eddyxu over 1 year ago

In this version, we improve the random access over cloud storage by allowing a higher number of parallel I/Os.
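The general idea of raising I/O parallelism for random access can be sketched as follows. This is not Lance's implementation: `BLOB`, `read_range`, `take`, and the `max_parallel_io` parameter are all hypothetical, and an in-memory byte string stands in for a remote object.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an object in cloud storage (1024 bytes).
BLOB = bytes(range(256)) * 4


def read_range(offset: int, length: int) -> bytes:
    """Read one byte range; against real cloud storage this would be a ranged GET."""
    return BLOB[offset:offset + length]


def take(offsets: list, length: int, max_parallel_io: int = 16) -> list:
    """Fetch many small ranges concurrently, returning them in request order."""
    with ThreadPoolExecutor(max_workers=max_parallel_io) as pool:
        return list(pool.map(lambda off: read_range(off, length), offsets))


chunks = take([0, 300, 700], length=4, max_parallel_io=8)
```

Because each ranged read against cloud storage is dominated by round-trip latency rather than bandwidth, allowing more reads in flight at once tends to reduce the total time for a batch of random lookups.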

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.6...v0.4.7

lance - v0.4.6 Support FileFragment creation

Published by changhiskhan over 1 year ago

Allows the creation of a distributed Lance dataset from scratch.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.5...v0.4.6

lance - v0.4.5 Preview private API for merging columns

Published by changhiskhan over 1 year ago

Welcome @Mause as our newest contributor! Also, a big thank you for your work on the duckdb extension framework.

In this release we added a preview of the feature to do distributed column additions. This makes it possible to distribute Lance Fragments across nodes, add a new column to each Fragment, and then write a new Lance dataset version manifest with the updated schema and files.
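The workflow described above (compute a new column per fragment, then commit a new manifest) can be modeled with a small sketch. The `Fragment` and `Manifest` classes and the `add_column` and `commit` functions are hypothetical stand-ins for illustration only; they are not the preview API shipped in this release.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_id: int
    columns: dict  # column name -> list of values


@dataclass
class Manifest:
    version: int
    schema: list     # column names
    fragments: list  # Fragment objects


def add_column(fragment: Fragment, name: str, fn) -> Fragment:
    """Runs on a worker: derive a new column from one fragment's rows."""
    num_rows = len(next(iter(fragment.columns.values())))
    rows = [{k: v[i] for k, v in fragment.columns.items()} for i in range(num_rows)]
    new_columns = dict(fragment.columns)  # shallow copy, original stays untouched
    new_columns[name] = [fn(row) for row in rows]
    return Fragment(fragment.fragment_id, new_columns)


def commit(old: Manifest, new_fragments: list, new_column: str) -> Manifest:
    """Runs on the coordinator: publish a new version with the updated schema."""
    ordered = sorted(new_fragments, key=lambda f: f.fragment_id)
    return Manifest(old.version + 1, old.schema + [new_column], ordered)


v1 = Manifest(1, ["x"], [Fragment(0, {"x": [1, 2]}), Fragment(1, {"x": [3]})])
# In a real deployment each call below would run on a different node.
updated = [add_column(f, "x_squared", lambda row: row["x"] ** 2) for f in v1.fragments]
v2 = commit(v1, updated, "x_squared")
```

Note that the old manifest is left unchanged in this sketch, so the previous dataset version remains readable after the commit.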

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.4...v0.4.5

lance - v0.4.4 Various bug fixes

Published by changhiskhan over 1 year ago

#805 fixed an integer overflow bug in the plain decoder that resulted in high latency for Take (and consequently high latency for the vector search). We'll be adding continuous performance benchmarks soon to prevent issues like this from being released in the future.

We also fixed a gap in cosine similarity that occurred when the vector length does not line up perfectly with the SIMD stride on the platform.
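The class of issue can be illustrated with a pure-Python model of a strided dot product. This is only a sketch of the general technique, not the Rust code that was fixed: `LANES` is a hypothetical stride width, and the inner loop over lanes stands in for a single SIMD instruction.

```python
import math

LANES = 8  # hypothetical SIMD width, in f32 lanes


def dot_strided(a: list, b: list, lanes: int = LANES) -> float:
    """Dot product computed in full strides, plus a scalar loop for the tail."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = [0.0] * lanes
    full = len(a) - len(a) % lanes  # number of elements covered by whole strides
    for start in range(0, full, lanes):
        for lane in range(lanes):  # models one SIMD multiply-add
            acc[lane] += a[start + lane] * b[start + lane]
    total = sum(acc)
    # Tail: elements that do not fill a whole stride must not be skipped.
    for i in range(full, len(a)):
        total += a[i] * b[i]
    return total


def cosine_similarity(a: list, b: list) -> float:
    denom = math.sqrt(dot_strided(a, a) * dot_strided(b, b))
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_strided(a, b) / denom


a = [float(i) for i in range(1, 11)]  # length 10: one full stride plus 2 tail elements
b = [2.0 * x for x in a]              # same direction as a
similarity = cosine_similarity(a, b)
```

Dropping the tail loop would silently ignore the last `len(a) % lanes` elements, which is the kind of gap that only shows up for vector lengths that are not a multiple of the stride.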

DiskANN progress is continuing. First milestone will be an in-memory version to support smaller datasets. A compressed, disk-based version will follow soon after that.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.3...v0.4.4

lance - v0.4.3 Bug fixes and code cleanup

Published by changhiskhan over 1 year ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.2...v0.4.3

lance - v0.4.2 Polars, GCS, and distributed lances

Published by changhiskhan over 1 year ago

A warm welcome to @hzhang86 as Lance's newest contributor. Thanks for adding TPCH benchmarks for Lance to establish a baseline. This really helps us focus our performance optimization roadmap.

This release is packed with valuable features:

  1. Direct Polars scans are added, without needing to pull everything into memory.
  2. FileFragments are now exposed, so distributed processing engines like Spark can easily access parts of a Lance dataset.
  3. Last but not least, we've added support for reading Lance data directly from Google Cloud Storage (GCS) buckets.

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.1...v0.4.2

lance - v0.4.1 Support Append in Vector Search

Published by changhiskhan over 1 year ago

Vector search in Lance now supports live updates. Previously, adding new vectors to the dataset required rebuilding the index. Now the index is "inherited", and the vector search results combine ANN search on the indexed data with KNN search on the newly appended data. This adds a small amount of latency, and recall should be the same or better.

This provides a smooth performance curve until you have inserted enough new data that re-indexing is warranted.
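A minimal sketch of combining the two result sets, assuming L2 distance and a brute-force scan over the appended rows. The function names are hypothetical, and `ann` below is only a stand-in for a real ANN index; none of this is the Lance API.

```python
import heapq
import math


def l2(a: list, b: list) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(query: list, rows: list, k: int) -> list:
    """Exact search over (row_id, vector) pairs; returns (distance, row_id) pairs."""
    return heapq.nsmallest(k, ((l2(query, vec), row_id) for row_id, vec in rows))


def search(query: list, k: int, ann_search, appended_rows: list) -> list:
    """Merge hits from the existing index with an exact scan of unindexed rows."""
    indexed_hits = ann_search(query, k)                    # approximate, indexed data
    fresh_hits = brute_force_knn(query, appended_rows, k)  # exact, appended data
    return heapq.nsmallest(k, indexed_hits + fresh_hits)


indexed = [(0, [0.0, 0.0]), (1, [5.0, 5.0])]
appended = [(2, [1.0, 0.0])]  # written after the index was built


def ann(query: list, k: int) -> list:
    """Stand-in for a real ANN index lookup over the indexed rows."""
    return brute_force_knn(query, indexed, k)


hits = search([0.9, 0.0], k=2, ann_search=ann, appended_rows=appended)
```

The brute-force scan grows linearly with the number of appended rows, which is why latency rises gradually until re-indexing becomes worthwhile.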

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.0...v0.4.1

lance - v0.4.0 Windows support

Published by changhiskhan over 1 year ago

A warm welcome to @gsajko! Thanks for making our tutorial notebook easier to use and understand!

Note: OPQ is disabled on Windows for the vector index. This will be addressed once LAPACK support is added.

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.19...v0.4.0

lance - v0.3.19 Bug fix for filter predicates on large-utf8 type

Published by changhiskhan over 1 year ago

Also fixes publishing to crates.io.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.18...v0.3.19

lance - v0.3.18 Bug fix release for binary offsets

Published by changhiskhan over 1 year ago

Fix for incorrect offset for string/variable list columns as reported in https://github.com/eto-ai/lance/issues/720#issuecomment-1479716990

Thanks @lucazanna for the feedback!

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.17...v0.3.18

lance - v0.3.17 Support for nested dict columns

Published by changhiskhan over 1 year ago

A warm welcome to @haoxins, a new contributor who has helped improve Lance documentation.

This release adds support for list-of-dict columns (thanks @lucazanna for reporting the bug in #715).

Also included in this release are various vector index improvements for scalability and more progress towards OPQ implementation.

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.16...v0.3.17

lance - v0.3.16 Filter pushdown improvements

Published by changhiskhan over 1 year ago

Welcome @wangfenjin to the Lance contributors. Thanks for submitting a bug fix for the Lance DuckDB extensions 🔥

This release contains 2 workarounds for arrow limitations:

  1. Lance datasets now accept filters such as <field> LIKE '%' and <field> IN (<values>) passed in as strings. Generic SQL syntax supported by DataFusion is now accepted. This is a break from standard pyarrow Dataset behavior, which only accepts Arrow compute Expressions. Those are not present in Rust and also do not support introspection in Python, so developers cannot build a custom adapter on top of them.

  2. When concatenating Arrow dictionary arrays, the dictionary values are duplicated. There are currently no concrete plans to change this behavior in Arrow. Instead, we fix that at write time in Lance.
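The second workaround amounts to unifying dictionaries before writing. Here is a minimal sketch of that idea, with a dictionary array modeled as a `(values, indices)` pair of Python lists. `concat_dictionary_arrays` is a hypothetical helper and not the code Lance uses.

```python
def concat_dictionary_arrays(arrays: list) -> tuple:
    """Concatenate (values, indices) pairs into one pair with unique values.

    Indices are remapped so every distinct value is stored once. None
    indices (nulls) are passed through unchanged.
    """
    merged_values = []
    position = {}  # value -> index in merged_values
    merged_indices = []
    for values, indices in arrays:
        for idx in indices:
            if idx is None:
                merged_indices.append(None)
                continue
            value = values[idx]
            if value not in position:
                position[value] = len(merged_values)
                merged_values.append(value)
            merged_indices.append(position[value])
    return merged_values, merged_indices


first = (["cat", "dog"], [0, 1, 0])
second = (["dog", "bird"], [0, 1, None])
values, indices = concat_dictionary_arrays([first, second])
decoded = [None if i is None else values[i] for i in indices]
```

A naive concatenation would instead store "dog" twice in the merged dictionary. As a side effect, this sketch also drops dictionary values that no index refers to.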

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.15...v0.3.16

lance - v0.3.15 Bug fix for combining vector search and filter predicate

Published by changhiskhan over 1 year ago

Thanks to @cemoody for the bug report!

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.14...v0.3.15

lance - v0.3.14 Timestamp support

Published by changhiskhan over 1 year ago

This is a patch release that adds support for Arrow Timestamp type. Thanks @kesavkolla for the bug report!

Thanks to @Renkai, we also have an optimized Take for Boolean arrays.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.13...v0.3.14

lance - v0.3.13 Support fast Take for variable length list

Published by changhiskhan over 1 year ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.12...v0.3.13

lance - v0.3.12 Upgrade arrow-rs and bug fixes

Published by changhiskhan over 1 year ago

  • Upgraded arrow-rs dependency to 33.0 (Waiting on datafusion for 34.0 upgrade).
  • Nested Dictionary fields are now parsed and written correctly.
  • More progress towards OPQ implementation.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.11...v0.3.12