lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.

APACHE-2.0 License



lance - v0.3.11 Bug fix release

Published by changhiskhan over 1 year ago

Bug fix for reading variable length list arrays (welcome @gsilvestrin).

We're working on Windows support (welcome to @dnsco) and an OPQ implementation for the vector index, so stay tuned!

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.10...v0.3.11

lance - v0.3.10 Easier debugging for vector index

Published by changhiskhan over 1 year ago

You can now bypass the ANN index, even when one is available, and run vector search by brute force instead. This helps with debugging ANN results. Note that SIMD acceleration still applies during brute-force search.
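A minimal sketch of that debugging workflow in plain Python (not Lance's API): compute the exact nearest neighbors by brute force, then measure what fraction of them an ANN result recovered (recall@k). The names `brute_force_knn` and `recall_at_k` and the sample data are hypothetical; Lance's own brute-force path is SIMD-accelerated Rust, not this loop.

```python
# Illustrative sketch with hypothetical names; not Lance's implementation.
import heapq
import math
from typing import List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(data: List[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Exact k-NN: score every row and keep the ids of the k closest rows."""
    return heapq.nsmallest(k, range(len(data)), key=lambda i: l2(data[i], query))


def recall_at_k(ann_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the exact top-k that the ANN search also returned."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
exact = brute_force_knn(data, [0.1, 0.1], k=3)
ann = [0, 4, 3]  # hypothetical ANN output that missed one true neighbor
print(exact, recall_at_k(ann, exact))
```

A recall noticeably below 1.0 on queries you care about is the signal to revisit index parameters.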

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.9...v0.3.10

lance - v0.3.9 limited python support for predicate pushdown

Published by changhiskhan over 1 year ago

By default, pyarrow compute Expressions don't serialize to SQL strings. This patch release enables a limited set of filter pushdowns via Python. Supported syntax:

  1. field references
  2. Operators: > < >= <= = == !=
  3. conjunctions / disjunctions

This enables querying via DuckDB without first loading the whole dataset into memory.

e.g., duckdb.query("SELECT * FROM dataset WHERE id=5"), where dataset is the Python variable holding the Lance dataset
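To show what such a translation involves, here is a small sketch that serializes a tiny filter-expression tree, limited to the syntax listed above (field references, comparison operators, conjunctions/disjunctions), into a SQL predicate string. The `Field`, `Literal`, `Compare`, and `BoolOp` classes are hypothetical stand-ins, not pyarrow's Expression type, and this is not Lance's implementation.

```python
# Illustrative sketch with hypothetical expression classes; not pyarrow or Lance code.
from dataclasses import dataclass
from typing import Union

SUPPORTED_OPS = {">", "<", ">=", "<=", "=", "==", "!="}


@dataclass
class Field:
    name: str


@dataclass
class Literal:
    value: Union[int, float, str]


@dataclass
class Compare:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class BoolOp:
    op: str  # "AND" (conjunction) or "OR" (disjunction)
    left: "Expr"
    right: "Expr"


Expr = Union[Field, Literal, Compare, BoolOp]


def to_sql(expr: Expr) -> str:
    """Recursively render a supported expression as a SQL predicate."""
    if isinstance(expr, Field):
        return expr.name
    if isinstance(expr, Literal):
        if isinstance(expr.value, str):
            # Quote string literals, escaping embedded single quotes
            return "'" + expr.value.replace("'", "''") + "'"
        return repr(expr.value)
    if isinstance(expr, Compare):
        if expr.op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operator: {expr.op}")
        op = "=" if expr.op == "==" else expr.op  # SQL equality is a single '='
        return f"({to_sql(expr.left)} {op} {to_sql(expr.right)})"
    if isinstance(expr, BoolOp):
        if expr.op not in ("AND", "OR"):
            raise ValueError(f"unsupported boolean operator: {expr.op}")
        return f"({to_sql(expr.left)} {expr.op} {to_sql(expr.right)})"
    raise TypeError(f"cannot serialize {expr!r}")


pred = BoolOp("AND", Compare("==", Field("id"), Literal(5)),
              Compare("!=", Field("label"), Literal("cat")))
print(to_sql(pred))
```

Anything outside the supported subset raises instead of silently producing a wrong filter, which is the safer failure mode for pushdown.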

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.8...v0.3.9

lance - v0.3.8 Improved random access for non-numeric columns and duckdb extension

Published by changhiskhan over 1 year ago

You can now query Lance datasets outside of Python using DuckDB! Thanks to @dacort for making the Lance extension play nice with DuckDB. dbt-duckdb-lance, anyone? You can find the extension under integration/duckdb_lance.

We're also excited to release a substantial performance optimization for random access on non-numeric columns.
Previously, if you fetched a string or blob column along with nearest-neighbor search results, the unoptimized take in the binary decoder could add 5-20x latency overhead, depending on the sparsity of the indices. In this release we've optimized take so that it's essentially free.
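For context, variable-length columns such as strings are commonly stored Arrow-style as one contiguous data buffer plus an offsets array, where row i spans data[offsets[i]:offsets[i + 1]]. The sketch below shows a take that uses the offsets to copy only the requested rows rather than decoding the whole column. The layout is an assumption borrowed from Arrow and `take_binary` is a hypothetical name; this is not a description of Lance's decoder.

```python
# Illustrative sketch assuming an Arrow-style offsets layout; not Lance's decoder.
from typing import List


def take_binary(offsets: List[int], data: bytes, indices: List[int]) -> List[bytes]:
    """Fetch rows `indices` from a variable-length binary column.

    Row i occupies data[offsets[i]:offsets[i + 1]], so each lookup costs two
    offset reads plus one slice; rows that aren't requested are never touched.
    """
    n_rows = len(offsets) - 1
    out = []
    for i in indices:
        if not 0 <= i < n_rows:
            raise IndexError(f"row {i} out of range for {n_rows} rows")
        out.append(data[offsets[i]:offsets[i + 1]])
    return out


# Column ["cat", "", "zebra", "ox"] encoded as offsets plus one data buffer
offsets = [0, 3, 3, 8, 10]
data = b"catzebraox"
print(take_binary(offsets, data, [2, 0, 3]))
```

Because the cost scales with the number of requested rows rather than the column size, sparse takes after a nearest-neighbor search stay cheap.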

While most of the Rust work for filter pushdown is complete, we've had to delay its general release until we can smooth over some rough edges in making pyarrow compute Expressions play nice with DataFusion and sqlparser-rs. It'll be worth the wait, we promise!

Cosine similarity has shipped, but its recall is lower due to some issues during index creation. We recommend sticking with the default L2 distance metric until we address this in the next few releases.

We'd love to hear from you!

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.7...v0.3.8

lance - v0.3.7 Duration and Null support

Published by changhiskhan over 1 year ago

Thanks @ananis25 for implementing Lance support for Duration and Null arrow arrays!

We've also completed the core implementation of cosine distance (with SIMD) and refactored the distance functions to be pluggable. The next release will expose this as a public API in Rust and Python.
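A minimal sketch of what "pluggable" distance functions can look like: a registry that maps metric names to functions, so the search code takes the metric as a parameter instead of hard-coding one. The registry, the metric names "l2" and "cosine", and `nearest` are hypothetical; Lance's Rust implementation uses SIMD and its API may differ. Cosine distance here is defined as 1 - cosine similarity.

```python
# Illustrative sketch with hypothetical names; not Lance's distance API.
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
DistanceFn = Callable[[Vector, Vector], float]


def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


# Search code looks the metric up by name; adding a metric means adding an entry.
DISTANCES: Dict[str, DistanceFn] = {"l2": l2_distance, "cosine": cosine_distance}


def nearest(data: List[Vector], query: Vector, metric: str = "l2") -> int:
    dist = DISTANCES[metric]
    return min(range(len(data)), key=lambda i: dist(data[i], query))


data = [[1.0, 0.0], [10.0, 10.0]]
query = [2.0, 2.0]
print(nearest(data, query, "l2"), nearest(data, query, "cosine"))
```

The example also shows why the metric matters: L2 picks the row that is closest in space, while cosine picks the row pointing in the same direction as the query.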

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.6...v0.3.7

lance - v0.3.6 Time travel

Published by changhiskhan over 1 year ago

Welcome to @ananis25 and @yah01!

This release adds time travel: you can check out the latest version of a dataset as of a given date and time.
We've also refactored the query and index creation code to make room for multiple distance metrics.
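Conceptually, checking out a dataset "as of" a timestamp means picking the newest version committed at or before that time. Here is a sketch using binary search over a version history sorted by commit time. The `(version, timestamp)` history list and `version_as_of` are hypothetical; this is not Lance's manifest format or API.

```python
# Illustrative sketch with a hypothetical version history; not Lance's API.
import bisect
from datetime import datetime
from typing import List, Tuple


def version_as_of(history: List[Tuple[int, datetime]], as_of: datetime) -> int:
    """Return the latest version whose commit time is <= `as_of`.

    `history` must be sorted by commit time (ascending), which holds when
    versions are appended in order.
    """
    times = [ts for _, ts in history]
    pos = bisect.bisect_right(times, as_of)  # count of versions at or before as_of
    if pos == 0:
        raise LookupError(f"no version exists at or before {as_of}")
    return history[pos - 1][0]


history = [
    (1, datetime(2023, 1, 1, 9, 0)),
    (2, datetime(2023, 1, 5, 12, 0)),
    (3, datetime(2023, 2, 1, 8, 30)),
]
print(version_as_of(history, datetime(2023, 1, 20)))
```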

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.5...v0.3.6

lance - v0.3.5 Fast take and Decimal{128, 256} support

Published by changhiskhan over 1 year ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.4...v0.3.5

lance - v0.3.4 Bug fixes and ergonomics

Published by changhiskhan over 1 year ago

This is a minor release with bug fixes, documentation, and ergonomic improvements for vector indices.
Welcome to our newest contributor, @yah01!

What's Changed

New Contributors

Preview

  1. We're most of the way to a DuckDB extension that reads Lance datasets natively (i.e., without Python)
  2. We're integrating DataFusion to enable pushdown of filter predicates

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.2...v0.3.4

lance - v0.3.2 Speed up index creation by more than 60x

Published by changhiskhan over 1 year ago

We found two things that made index creation unnecessarily slow, and fixed them:

  1. Use random initialization instead of KMeans++
  2. Turn off BLAS on macOS, because it turned out to be very slow there

On a MacBook Air, index creation on SIFT1M drops from 25 min to 24 s.
On Ubuntu, it's roughly a 6x speedup.
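For reference, here is a minimal k-means (Lloyd's algorithm) sketch using the cheaper initialization from item 1: pick k distinct data points at random as starting centroids in a single step. KMeans++ instead chooses centroids one at a time, each weighted by squared distance to the centroids picked so far, which costs a pass over the data per centroid. This is plain Python for illustration, not Lance's Rust implementation, and `kmeans` and its parameters are hypothetical.

```python
# Illustrative sketch with hypothetical names; not Lance's Rust implementation.
import random
from typing import List, Sequence

Vector = List[float]


def sq_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    # Random initialization: k distinct rows, chosen uniformly in one step.
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda c: sq_l2(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if its cluster came up empty.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
print(sorted(kmeans(data, k=2)))
```

Random initialization can need a few more Lloyd iterations than KMeans++, but each iteration is cheap compared with the sequential initialization passes it replaces.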

What's Changed

New Contributors

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.1...v0.3.2

lance - v0.3.1 Index creation tool

Published by changhiskhan over 1 year ago

We added an index creation tool that's 2x faster than FAISS.
It's accessible in Python via Dataset.create_index.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.3.0...v0.3.1

lance - v0.3.0 Rusty Lances and Friendly Neighbors

Published by changhiskhan over 1 year ago

Sayonara C++, bonjour Rust

What started out as a holiday hack has become a full-blown Rust rewrite.
As we say farewell to our much beloved C++ implementation, we welcome a major new feature to Lance: the vector index.

  1. Lance's vector index is fast and has a small memory footprint. Reading from disk, we benchmark average latencies of 1 ms for 1M vectors on a vanilla MacBook Air.
  2. Your data, vectors, and index can live in harmony under one roof so you don't need to manage a separate index or service.
  3. You can store and retrieve additional features alongside the vectors with very little performance impact.
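To make "vector index" concrete, here is a sketch of one common ANN design, an inverted-file (IVF) index: vectors are bucketed by their nearest centroid, and a query scans only the `nprobes` closest buckets instead of every row. These notes don't describe Lance's index internals, so treat the structure, the class name `IVFIndex`, and its parameters as illustrative assumptions; the centroids are fixed by hand here rather than trained.

```python
# Illustrative IVF sketch with hypothetical names; not Lance's index implementation.
import heapq
from typing import Dict, List, Sequence


def sq_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IVFIndex:
    def __init__(self, centroids: List[List[float]]):
        self.centroids = centroids
        # partition id -> list of (row id, vector)
        self.lists: Dict[int, List[tuple]] = {c: [] for c in range(len(centroids))}

    def _closest_partitions(self, v: Sequence[float], n: int) -> List[int]:
        return heapq.nsmallest(n, range(len(self.centroids)),
                               key=lambda c: sq_l2(v, self.centroids[c]))

    def add(self, row_id: int, v: List[float]) -> None:
        # Each vector lives in the partition of its nearest centroid.
        self.lists[self._closest_partitions(v, 1)[0]].append((row_id, v))

    def search(self, q: Sequence[float], k: int, nprobes: int = 1) -> List[int]:
        # Scan only the nprobes nearest partitions: more probes, better recall, more work.
        candidates = [item for c in self._closest_partitions(q, nprobes)
                      for item in self.lists[c]]
        best = heapq.nsmallest(k, candidates, key=lambda item: sq_l2(q, item[1]))
        return [row_id for row_id, _ in best]


index = IVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0], [4.0, 4.0]]
for i, v in enumerate(vectors):
    index.add(i, v)
print(index.search([5.5, 5.5], k=1, nprobes=1), index.search([5.5, 5.5], k=1, nprobes=2))
```

With one probe, the query lands in the far bucket and misses row 4, its true nearest neighbor; probing both buckets recovers it. That recall/latency trade-off is the knob an ANN index exposes.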

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.9...v0.3.0

lance - v0.2.9 pandas extension type for inline images

Published by changhiskhan almost 2 years ago

We've also started implementing Lance in Rust. A new kickass vector indexing feature is coming soon, once we do some more cleanup and hook the Rust module back into Python.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.8...v0.2.9

lance - v0.2.8 Happy Holidays!

Published by changhiskhan almost 2 years ago

This release contains the following:

  1. A full-fledged ML data quality improvement workflow using Lance showing model performance insights, detecting mislabels, and doing active learning. An experimental integration with Label Studio is demonstrated as well.
  2. Critical bug fix affecting read/write of dictionary columns
  3. ImageNet dataset converter

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.7...v0.2.8

lance - v0.2.7 Dataset Diff and Metrics computation, and Dataset Version Metadata

Published by eddyxu almost 2 years ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.6...v0.2.7

lance - v0.2.6

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.5...v0.2.6

lance - v0.2.5 Schema evolution, support merging with arrow Table

Published by eddyxu almost 2 years ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.4...v0.2.5

lance - v0.2.4: Schema Evolution and Append Column

Published by eddyxu almost 2 years ago

Supports schema evolution by appending columns.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.3...v0.2.4

lance - v0.2.3 Bugfix release; breaks dataset proto schema

Published by changhiskhan almost 2 years ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.2...v0.2.3

lance - v0.2.2 Python notebooks and CV dataset conversion.

Published by eddyxu almost 2 years ago

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.1...v0.2.2

lance - v0.2.1 Bug fix release

Published by changhiskhan almost 2 years ago

Fixed a bug affecting writes of fixed-size list arrays, as well as the data generation code for COCO.
Updated to the newly released Arrow 10.0.

What's Changed

Full Changelog: https://github.com/eto-ai/lance/compare/v0.2.0...v0.2.1