Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
APACHE-2.0 License
Published by changhiskhan over 1 year ago
Fixed a bug when reading variable-length list arrays (welcome, @gsilvestrin!).
We're working on Windows support (welcome, @dnsco!) and an OPQ implementation for the vector index, so stay tuned!
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.10...v0.3.11
Published by changhiskhan over 1 year ago
You can now bypass the ANN index, even when one is available, and perform vector search by brute force. This helps when debugging ANN results. Note that SIMD acceleration still applies during brute-force search.
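To illustrate what bypassing the index means, here is a minimal exact (brute-force) k-nearest-neighbor search in plain Python. It is a sketch under our own assumptions, not Lance's implementation: `brute_force_knn` and `recall_at_k` are hypothetical names, distances are squared L2, and there is no SIMD. Exact results like these are the ground truth you would compare ANN results against when debugging recall.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; ranks neighbors the same as L2 without the sqrt."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Hypothetical helper: score every vector against the query, keep the k closest.

    Returns (row_id, distance) pairs sorted by ascending distance.
    """
    scored = ((squared_l2(v, query), row_id) for row_id, v in enumerate(vectors))
    best = heapq.nsmallest(k, scored)  # ties are broken by the smaller row id
    return [(row_id, dist) for dist, row_id in best]


def recall_at_k(exact: List[int], approx: List[int]) -> float:
    """Fraction of the exact top-k row ids that an ANN search also returned."""
    return len(set(exact) & set(approx)) / len(exact)


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [5.0, 5.0]]
    exact_ids = [row for row, _ in brute_force_knn(data, [0.0, 0.1], k=2)]
    print(exact_ids, recall_at_k(exact_ids, [0, 3]))
```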
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.9...v0.3.10
Published by changhiskhan over 1 year ago
By default, pyarrow compute Expressions don't serialize to SQL strings. This patch release enables a limited set of filter pushdowns from Python. Supported syntax:
This enables querying via DuckDB without first loading the whole dataset into memory.
e.g., duckdb.query("SELECT * FROM dataset WHERE id=5")
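Because pyarrow Expressions can't be rendered as SQL directly, pushing a filter down means translating it yourself. The sketch below shows the general idea with a tiny expression tree that renders to a SQL predicate and rejects anything it doesn't support. `Field`, `Lit`, `Cmp`, `And`, `Or`, and `to_sql` are our own hypothetical classes and functions, not pyarrow's API, and the operator set is an assumption rather than the exact syntax Lance supports.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Union[int, float, str, bool]


@dataclass(frozen=True)
class Cmp:
    op: str  # must be one of SUPPORTED_OPS
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Field, Lit, Cmp, And, Or]

# Hypothetical subset of comparison operators we agree to push down.
SUPPORTED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def to_sql(expr: Expr) -> str:
    """Render a hypothetical filter expression tree as a SQL predicate string."""
    if isinstance(expr, Field):
        return expr.name
    if isinstance(expr, Lit):
        v = expr.value
        if isinstance(v, bool):  # check bool before int: bool subclasses int
            return "TRUE" if v else "FALSE"
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"  # escape embedded quotes
        return repr(v)
    if isinstance(expr, Cmp):
        if expr.op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operator: {expr.op}")
        return f"{to_sql(expr.left)} {expr.op} {to_sql(expr.right)}"
    if isinstance(expr, And):
        return f"({to_sql(expr.left)} AND {to_sql(expr.right)})"
    if isinstance(expr, Or):
        return f"({to_sql(expr.left)} OR {to_sql(expr.right)})"
    raise TypeError(f"cannot push down {expr!r}")
```

A filter such as `Cmp("=", Field("id"), Lit(5))` renders as `id = 5`, the kind of predicate a SQL engine can apply during the scan instead of after loading everything.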
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.8...v0.3.9
Published by changhiskhan over 1 year ago
You can now query Lance datasets outside of Python using DuckDB! Thanks to @dacort for making the Lance extension play nice with DuckDB. dbt-duckdb-lance, anyone? You can find the extension under integration/duckdb_lance.
We're also excited to release a substantial performance optimization for random access on non-numeric columns.
Previously, fetching a string or blob column along with nearest-neighbor search results could add 5-20x latency overhead, because the unoptimized binary decoder's take operation was slow; the exact overhead depended on the sparsity of the indices. In this release we've optimized take so that this is basically a free operation.
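In Arrow's variable-length binary layout, a column is an offsets array of length n + 1 plus one contiguous data buffer, so row i's bytes are `data[offsets[i]:offsets[i + 1]]`. A take that slices only the requested rows via their offsets does work proportional to the rows fetched, rather than decoding the whole column first. The plain-Python sketch below contrasts the two; it illustrates the idea only and does not reflect Lance's actual decoder or I/O path. `encode_binary`, `naive_take`, and `offset_take` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def encode_binary(values: Sequence[bytes]) -> Tuple[List[int], bytes]:
    """Build an Arrow-style (offsets, data) pair from a list of byte strings."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return offsets, b"".join(values)


def naive_take(offsets: List[int], data: bytes, indices: Sequence[int]) -> List[bytes]:
    """Decode every row, then pick the requested ones: cost grows with total rows."""
    decoded = [data[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
    return [decoded[i] for i in indices]


def offset_take(offsets: List[int], data: bytes, indices: Sequence[int]) -> List[bytes]:
    """Slice only the requested rows: cost grows with len(indices) and bytes copied."""
    return [data[offsets[i]:offsets[i + 1]] for i in indices]
```

Both functions return the same rows; the difference is how much of the column is touched, which matters most when the requested indices are sparse.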
While most of the Rust work for filter pushdown is complete, we've had to delay the general release of this feature until we smooth over some rough edges in making pyarrow compute Expressions play nicely with DataFusion and sqlparser-rs. It'll be worth the wait, we promise!
Cosine similarity has shipped, but its recall is lower due to some issues during index creation. We recommend sticking with the default L2 distance metric until we address this in the next few releases.
We'd love to hear from you!
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.7...v0.3.8
Published by changhiskhan over 1 year ago
Thanks to @ananis25 for implementing Lance support for Duration and Null Arrow arrays!
We've also completed the core implementation of cosine distance (with SIMD) and refactored the distance functions to be pluggable. The next release will expose this as a public API in Rust and Python.
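One common way to make distance functions pluggable is a registry keyed by metric name, so search code looks up the metric instead of hard-coding L2. The sketch below is our own plain-Python illustration, not Lance's Rust API: `DISTANCES`, `register_distance`, and `nearest` are hypothetical names, and there is no SIMD. Cosine distance is computed as 1 minus the cosine similarity.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
DistanceFn = Callable[[Vector, Vector], float]

# Hypothetical registry: metric name -> distance function.
DISTANCES: Dict[str, DistanceFn] = {}


def register_distance(name: str) -> Callable[[DistanceFn], DistanceFn]:
    """Decorator that adds a distance function to the registry under `name`."""
    def wrap(fn: DistanceFn) -> DistanceFn:
        DISTANCES[name] = fn
        return fn
    return wrap


@register_distance("l2")
def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@register_distance("cosine")
def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def nearest(vectors: Sequence[Vector], query: Vector, metric: str = "l2") -> int:
    """Return the row id of the closest vector under the chosen metric."""
    dist = DISTANCES[metric]
    return min(range(len(vectors)), key=lambda i: dist(vectors[i], query))
```

The metric genuinely changes the answer: for the query `[1.0, 0.0]`, the vector `[0.9, 1.1]` is closer under L2, while `[10.0, 0.0]` is closer under cosine because it points in exactly the same direction.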
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.6...v0.3.7
Published by changhiskhan over 1 year ago
Welcome, @ananis25 and @yah01!
This release adds time travel, allowing you to check out the latest version of a dataset as of a given date and time.
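Conceptually, checking out a dataset as of a point in time means finding the newest version whose commit timestamp is at or before that time. Assuming a list of (version, commit time) pairs sorted by time (a hypothetical stand-in for Lance's version metadata; `version_as_of` is not a Lance function), a binary search with Python's `bisect` module does this in O(log n):

```python
import bisect
from datetime import datetime
from typing import List, Tuple


def version_as_of(history: List[Tuple[int, datetime]], as_of: datetime) -> int:
    """Return the latest version committed at or before `as_of`.

    `history` is a hypothetical list of (version, commit_time) pairs, sorted by
    commit time in ascending order.
    """
    times = [ts for _, ts in history]
    # bisect_right counts the commits with ts <= as_of; the last of them is the answer.
    idx = bisect.bisect_right(times, as_of)
    if idx == 0:
        raise ValueError(f"no version exists as of {as_of.isoformat()}")
    return history[idx - 1][0]
```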
We've refactored the query and index creation code to make room for multiple distance metrics.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.5...v0.3.6
Published by changhiskhan over 1 year ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.4...v0.3.5
Published by changhiskhan over 1 year ago
This is a minor release with bug fixes, documentation, and ergonomic improvements for vector indices.
Welcome to our newest contributor, @yah01!
Improved take_rows() performance using binary search, by @yah01 in https://github.com/eto-ai/lance/pull/564
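The note doesn't say how binary search is used. One common pattern, assumed here purely for illustration, is locating which fragment (or batch) holds a given row id by binary-searching the cumulative row counts, then computing the offset inside that fragment. `locate_rows` and the fragment sizes are hypothetical:

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


def locate_rows(fragment_lengths: Sequence[int], row_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Map global row ids to (fragment_index, offset_within_fragment) pairs.

    `fragment_lengths` is a hypothetical list of row counts per fragment, in order.
    """
    # starts[i] is the global row id of the first row in fragment i.
    starts = [0] + list(accumulate(fragment_lengths))
    total = starts[-1]
    result = []
    for row in row_ids:
        if not 0 <= row < total:
            raise IndexError(f"row id {row} out of range [0, {total})")
        # O(log F) lookup instead of scanning all F fragments; empty fragments are skipped.
        frag = bisect.bisect_right(starts, row) - 1
        result.append((frag, row - starts[frag]))
    return result
```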
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.2...v0.3.4
Published by changhiskhan over 1 year ago
We discovered two things that made index creation unnecessarily slow:
On a MacBook Air, index creation on SIFT1M went from 25 min to 24 s.
On Ubuntu, it's roughly a 6x speedup.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.1...v0.3.2
Published by changhiskhan over 1 year ago
We added an index creation tool that's 2x faster than FAISS.
It's accessible in Python via Dataset.create_index.
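The note doesn't say what kind of index `Dataset.create_index` builds. As a rough illustration of how a partition-based (IVF-style) vector index speeds up search, here is a toy in plain Python: k-means picks centroids, each vector goes into the inverted list of its nearest centroid, and a query scans only the `nprobe` closest lists. This is our own simplified sketch (no product quantization, no SIMD), not Lance's or FAISS's implementation; `build_ivf` and `search_ivf` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(centroids: Sequence[Vector], v: Vector) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))


def build_ivf(
    vectors: Sequence[Vector], k: int, iters: int = 10
) -> Tuple[List[List[float]], Dict[int, List[int]]]:
    """Toy IVF build: Lloyd's k-means, then an inverted list of row ids per centroid."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    for _ in range(iters):
        sums = [[0.0] * len(vectors[0]) for _ in range(k)]
        counts = [0] * k
        for v in vectors:
            c = closest(centroids, v)
            counts[c] += 1
            sums[c] = [s + x for s, x in zip(sums[c], v)]
        for c in range(k):
            if counts[c]:  # keep the old centroid if its cluster went empty
                centroids[c] = [s / counts[c] for s in sums[c]]
    lists: Dict[int, List[int]] = {c: [] for c in range(k)}
    for row_id, v in enumerate(vectors):
        lists[closest(centroids, v)].append(row_id)
    return centroids, lists


def search_ivf(
    vectors: Sequence[Vector],
    centroids: Sequence[Vector],
    lists: Dict[int, List[int]],
    query: Vector,
    k: int = 1,
    nprobe: int = 1,
) -> List[int]:
    """Scan only the `nprobe` partitions whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))[:nprobe]
    candidates = [r for c in probe for r in lists[c]]
    return sorted(candidates, key=lambda r: sq_dist(vectors[r], query))[:k]
```

Raising `nprobe` trades speed for recall: probing more partitions scans more candidates, and probing all of them degenerates into brute-force search.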
Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.0...v0.3.1
Published by changhiskhan over 1 year ago
Sayonara C++, bonjour Rust
What started out as a holiday hack has become a full-blown Rust rewrite.
As we say farewell to our much beloved C++ implementation, we welcome a major new feature to Lance: the vector index.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.9...v0.3.0
Published by changhiskhan almost 2 years ago
We've also started implementing Lance in Rust. A new kickass vector indexing feature is coming soon, once we finish some cleanup and hook the Rust module back into Python.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.8...v0.2.9
Published by changhiskhan almost 2 years ago
This release contains the following:
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.7...v0.2.8
Published by eddyxu almost 2 years ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.6...v0.2.7
Published by eddyxu almost 2 years ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.5...v0.2.6
Published by eddyxu almost 2 years ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.4...v0.2.5
Published by eddyxu almost 2 years ago
Support Schema Evolution via Append Column.
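Appending a column to a versioned dataset can be modeled as committing a new version whose schema has one more field, while earlier versions stay readable unchanged. The plain-Python sketch below captures that idea with a hypothetical in-memory `VersionedTable`; it says nothing about how Lance lays this out on disk.

```python
from typing import Dict, List, Sequence

Columns = Dict[str, List[object]]


class VersionedTable:
    """Hypothetical in-memory table where every change creates a new version."""

    def __init__(self, columns: Columns) -> None:
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.versions: List[Columns] = [dict(columns)]

    @property
    def latest(self) -> Columns:
        return self.versions[-1]

    def append_column(self, name: str, values: Sequence[object]) -> int:
        """Add a column as a new version and return that version number (1-based)."""
        current = self.latest
        if name in current:
            raise ValueError(f"column {name!r} already exists")
        num_rows = len(next(iter(current.values()), []))
        if len(values) != num_rows:
            raise ValueError(f"expected {num_rows} values, got {len(values)}")
        # Shallow copy: existing column lists are shared, not rewritten.
        new_version = dict(current)
        new_version[name] = list(values)
        self.versions.append(new_version)
        return len(self.versions)

    def checkout(self, version: int) -> Columns:
        if not 1 <= version <= len(self.versions):
            raise IndexError(f"version {version} does not exist")
        return self.versions[version - 1]
```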
lq command-line tool for inspecting the new versioned format, by @eddyxu in https://github.com/eto-ai/lance/pull/334
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.3...v0.2.4
Published by changhiskhan almost 2 years ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.2...v0.2.3
Published by eddyxu almost 2 years ago
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.1...v0.2.2
Published by changhiskhan almost 2 years ago
Fixed a bug affecting writes of fixed-size list arrays, as well as the datagen code for COCO.
Updated to the newly released Arrow 10.0.
Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.0...v0.2.1