Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
APACHE-2.0 License
Published by changhiskhan over 1 year ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.10...v0.4.11
Published by eddyxu over 1 year ago
Welcome our newest contributor @LiWeiJie!
This release introduces several performance improvements, including index caching and customized prefetching.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.9...v0.4.10
Published by eddyxu over 1 year ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.8...v0.4.9
Published by changhiskhan over 1 year ago
Previously, predicates on nested (and deeply nested) fields were not properly supported. This release adds support for filtering on struct sub-fields, including sub-fields of deeply nested structs.
We also added support for more filter predicates and fixed a regression in NULL handling for string columns.
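The notes do not show how nested predicates are evaluated. As a rough illustration only, the sketch below resolves a dotted path such as meta.info.score through nested records and applies a predicate, treating a missing or NULL sub-field as a non-match (SQL-style). It is plain Python, not Lance's filter engine; resolve_path and filter_nested are hypothetical names.

```python
from typing import Any, Callable, List, Optional

def resolve_path(record: dict, path: str) -> Optional[Any]:
    """Walk a dotted path like "a.b.c" through nested dicts; None if any level is missing or NULL."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or value.get(part) is None:
            return None
        value = value[part]
    return value

def filter_nested(records: List[dict], path: str, predicate: Callable[[Any], bool]) -> List[dict]:
    """Keep records whose nested field satisfies the predicate; NULL never matches."""
    out = []
    for rec in records:
        v = resolve_path(rec, path)
        if v is not None and predicate(v):
            out.append(rec)
    return out

rows = [
    {"id": 1, "meta": {"info": {"score": 0.9}}},
    {"id": 2, "meta": {"info": {"score": 0.2}}},
    {"id": 3, "meta": {"info": None}},  # NULL struct: the sub-field resolves to None
]
print([r["id"] for r in filter_nested(rows, "meta.info.score", lambda s: s > 0.5)])  # [1]
```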
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.7...v0.4.8
Published by eddyxu over 1 year ago
In this version, we improved random access over cloud storage by allowing a higher number of parallel I/Os.
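How Lance schedules its I/O is not described here. A minimal sketch of the general idea, assuming reads are independent byte ranges against an object store: keep up to max_parallel range requests in flight and return the results in request order. read_ranges, fake_fetch and max_parallel are hypothetical names, and fake_fetch only stands in for a real ranged GET.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

ByteRange = Tuple[int, int]  # (offset, length)

def read_ranges(fetch: Callable[[ByteRange], bytes], ranges: List[ByteRange],
                max_parallel: int = 16) -> List[bytes]:
    """Keep up to max_parallel range reads in flight; results come back in request order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(fetch, ranges))

# Hypothetical stand-in for a ranged GET against an object store.
BLOB = bytes(range(256))

def fake_fetch(r: ByteRange) -> bytes:
    offset, length = r
    return BLOB[offset:offset + length]

chunks = read_ranges(fake_fetch, [(0, 4), (100, 2), (250, 6)], max_parallel=8)
print(chunks)
```

Raising the concurrency bound tends to help on cloud storage because each request carries a high fixed latency, so overlapping many small reads hides most of it; a good bound depends on the workload.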
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.6...v0.4.7
Published by changhiskhan over 1 year ago
This release allows creating a distributed Lance dataset from scratch.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.5...v0.4.6
Published by changhiskhan over 1 year ago
Welcome @Mause as our newest contributor! Also, a big thank you for your work on the DuckDB extension framework.
In this release, we added a preview of distributed column additions. This makes it possible to distribute Lance Fragments across nodes, add a new column to each Fragment, and then write a new Lance dataset version manifest with the updated schema and files.
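The workflow described above can be modeled in a few lines. This is a toy, in-memory sketch, not Lance's API: Fragment, Manifest, add_column_to_fragment and commit are hypothetical, a thread pool stands in for separate nodes, and a fragment's "files" are just Python lists. Each worker extends only its own fragment; a single commit then produces a new manifest version with the updated schema, leaving the previous version untouched.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy types: a fragment's "data files" are plain Python lists.
@dataclass(frozen=True)
class Fragment:
    fragment_id: int
    columns: Dict[str, list]  # column name -> values

@dataclass(frozen=True)
class Manifest:
    version: int
    schema: List[str]
    fragments: List[Fragment]

def add_column_to_fragment(frag: Fragment, name: str, fn: Callable[[dict], object]) -> Fragment:
    """Worker side: derive a new column for one fragment, independently of all others."""
    num_rows = len(next(iter(frag.columns.values())))
    rows = [{col: vals[i] for col, vals in frag.columns.items()} for i in range(num_rows)]
    new_columns = dict(frag.columns)  # copy, so the old version's fragment is unchanged
    new_columns[name] = [fn(row) for row in rows]
    return Fragment(frag.fragment_id, new_columns)

def commit(old: Manifest, name: str, new_frags: List[Fragment]) -> Manifest:
    """Coordinator side: write one new version with the updated schema and fragments."""
    ordered = sorted(new_frags, key=lambda f: f.fragment_id)
    return Manifest(old.version + 1, old.schema + [name], ordered)

v1 = Manifest(1, ["x"], [Fragment(0, {"x": [1, 2]}), Fragment(1, {"x": [3]})])
with ThreadPoolExecutor() as pool:  # stands in for separate nodes
    new_frags = list(pool.map(lambda f: add_column_to_fragment(f, "x2", lambda r: r["x"] * 2),
                              v1.fragments))
v2 = commit(v1, "x2", new_frags)
print(v2.version, v2.schema, [f.columns["x2"] for f in v2.fragments])  # 2 ['x', 'x2'] [[2, 4], [6]]
```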
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.4...v0.4.5
Published by changhiskhan over 1 year ago
#805 fixed an integer overflow bug in the plain decoder that caused high latency for Take (and consequently for vector search). We'll be adding continuous performance benchmarks soon to prevent issues like this from being released in the future.
We also fixed a gap in cosine similarity when the vectors do not line up perfectly with the SIMD strides on the platform.
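Assuming the mismatch in question is a vector length that is not a multiple of the SIMD width, the sketch below emulates the usual pattern in plain Python: per-lane accumulators over full strides, a horizontal reduction, then a scalar tail loop for the leftover elements. LANES = 8 is an assumed width (for example, eight f32 values in a 256-bit register); this is not Lance's kernel.

```python
import math
from typing import Sequence

LANES = 8  # assumed SIMD width, e.g. eight f32 values in a 256-bit register

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a strided main loop plus a scalar tail (zero vectors not handled)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    main = n - n % LANES  # elements that fill whole strides
    dot_acc = [0.0] * LANES  # one accumulator per emulated SIMD lane
    aa_acc = [0.0] * LANES
    bb_acc = [0.0] * LANES
    for i in range(0, main, LANES):
        for lane in range(LANES):
            x, y = a[i + lane], b[i + lane]
            dot_acc[lane] += x * y
            aa_acc[lane] += x * x
            bb_acc[lane] += y * y
    # Horizontal reduction of the lanes.
    dot, aa, bb = sum(dot_acc), sum(aa_acc), sum(bb_acc)
    # Scalar tail: the n % LANES elements that do not fill a full stride.
    for j in range(main, n):
        dot += a[j] * b[j]
        aa += a[j] * a[j]
        bb += b[j] * b[j]
    return dot / math.sqrt(aa * bb)

print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Without the tail loop, the last n % LANES elements would simply be left out of all three sums.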
DiskANN progress is continuing. The first milestone will be an in-memory version to support smaller datasets; a compressed, disk-based version will follow soon after.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.3...v0.4.4
Published by changhiskhan over 1 year ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.2...v0.4.3
Published by changhiskhan over 1 year ago
A warm welcome to @hzhang86 as Lance's newest contributor. Thanks for adding TPC-H benchmarks for Lance to establish a baseline. This is really helpful for focusing our performance optimization roadmap.
This release is packed with valuable features:
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.1...v0.4.2
Published by changhiskhan over 1 year ago
The vector search in Lance now supports live updates. Previously, when you added new vectors to the dataset, you had to rebuild the index. Now the index is "inherited", and vector search results combine an ANN search over the indexed data with a KNN search over the newly appended data. This adds a small amount of latency, and recall should be the same or better.
This provides a smooth performance curve until you have inserted enough new data that re-indexing is warranted.
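A minimal sketch of how the two result sets can be combined, assuming an L2 metric and that the index has already returned its (distance, row_id) candidates: run an exact scan over the rows appended after the index was built, then keep the k closest from the union. search_with_unindexed and l2 are hypothetical names, not Lance functions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[float, int]  # (distance, row_id)

def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_with_unindexed(ann_hits: List[Hit], unindexed: Dict[int, Sequence[float]],
                          query: Sequence[float], k: int) -> List[Hit]:
    """Combine approximate hits from the existing index with an exact scan of appended rows."""
    # Exact (brute-force) KNN over rows the index has not seen yet.
    flat_hits = [(l2(query, vec), row_id) for row_id, vec in unindexed.items()]
    # Keep the k closest candidates from either source.
    return heapq.nsmallest(k, ann_hits + flat_hits)

ann_hits = [(0.5, 1), (2.0, 2)]              # assumed output of the ANN index
appended = {10: [0.1, 0.0], 11: [5.0, 5.0]}  # vectors added after indexing
hits = search_with_unindexed(ann_hits, appended, [0.0, 0.0], k=2)
print([row for _, row in hits])  # [10, 1]
```

Because the exact scan grows linearly with the amount of unindexed data, latency creeps up as appends accumulate, which is why re-indexing eventually becomes worthwhile.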
Full Changelog: https://github.com/eto-ai/lance/compare/v0.4.0...v0.4.1
Published by changhiskhan over 1 year ago
A warm welcome to @gsajko! Thanks for making our tutorial notebook easier to use and understand!
Note: OPQ is disabled on Windows for the vector index. This will be addressed once LAPACK support is added.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.19...v0.4.0
Published by changhiskhan over 1 year ago
Also fixes publishing to crates.io.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.18...v0.3.19
Published by changhiskhan over 1 year ago
Fix for incorrect offsets in string/variable-length list columns, as reported in https://github.com/eto-ai/lance/issues/720#issuecomment-1479716990
Thanks @lucazanna for the feedback!
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.17...v0.3.18
Published by changhiskhan over 1 year ago
A warm welcome to @haoxins, a new contributor who has helped improve the Lance documentation.
This release adds support for list-of-dict columns (thanks @lucazanna for reporting the bug in #715).
Also included in this release are various vector index improvements for scalability and more progress towards the OPQ implementation.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.16...v0.3.17
Published by changhiskhan over 1 year ago
Welcome @wangfenjin to the Lance contributors! Thanks for submitting a bug fix for the Lance DuckDB extensions 🔥
This release contains two workarounds for Arrow limitations:
Lance datasets now accept filters such as <field> LIKE '%' and <field> IN (<values>) passed in as strings. Generic SQL syntax supported by DataFusion is now accepted. This is a break from standard pyarrow Dataset behavior, which only accepts an Arrow compute Expression. That Expression type is not present in Rust, and it does not support introspection in Python, so developers cannot use it to build custom adapters.
When concatenating Arrow dictionary arrays, the dictionary values are duplicated. There are currently no concrete plans to change this behavior in Arrow, so we fix it at write time in Lance instead.
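A simplified model of the problem and one possible write-time fix, in plain Python rather than pyarrow: a dictionary array is represented as (values, indices), naive concatenation appends the dictionaries back to back, and unify_dictionary collapses repeated values and remaps the indices. All names here are hypothetical, and this is not necessarily how Lance implements its fix.

```python
from typing import Dict, List, Tuple

# (dictionary values, indices into those values); a toy model, not pyarrow's types
DictArray = Tuple[List[str], List[int]]

def concat_naive(arrays: List[DictArray]) -> DictArray:
    """Append dictionaries back to back and shift indices; shared values end up duplicated."""
    values: List[str] = []
    indices: List[int] = []
    for dict_values, idx in arrays:
        offset = len(values)
        values.extend(dict_values)
        indices.extend(i + offset for i in idx)
    return values, indices

def unify_dictionary(arr: DictArray) -> DictArray:
    """Collapse duplicate dictionary values and remap indices, e.g. just before writing."""
    values, idx = arr
    unique: List[str] = []
    position: Dict[str, int] = {}
    remap: List[int] = []  # old dictionary slot -> new dictionary slot
    for v in values:
        if v not in position:
            position[v] = len(unique)
            unique.append(v)
        remap.append(position[v])
    return unique, [remap[i] for i in idx]

a = (["cat", "dog"], [0, 1, 0])
b = (["dog", "cat"], [0, 0, 1])
naive = concat_naive([a, b])
unified = unify_dictionary(naive)
print(naive)    # (['cat', 'dog', 'dog', 'cat'], [0, 1, 0, 2, 2, 3])
print(unified)  # (['cat', 'dog'], [0, 1, 0, 1, 1, 0])
```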
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.15...v0.3.16
Published by changhiskhan over 1 year ago
Thanks to @cemoody for the bug report!
nearest and filter are applied by @changhiskhan in https://github.com/eto-ai/lance/pull/686
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.14...v0.3.15
Published by changhiskhan over 1 year ago
This is a patch release that adds support for Arrow Timestamp type. Thanks @kesavkolla for the bug report!
Thanks to @Renkai, we also have an optimized Take for Boolean arrays.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.13...v0.3.14
Published by changhiskhan over 1 year ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.12...v0.3.13
Published by changhiskhan over 1 year ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.11...v0.3.12