tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust.

MIT License


tantivy - Tantivy 0.12

Published by fulmicoton over 4 years ago

  • Removed static dispatch in tokenizers for simplicity. (#762)
  • Added backward iteration for TermDictionary stream. (@halvorboe)
  • Fixed a performance issue when searching for the posting lists of a missing term (@audunhalland)
  • Added a configurable maximum number of docs (10M by default) for a segment to be considered for merge (@hntd187, landed by @halvorboe #713)
  • Important bugfix #777: tantivy was retaining memory mappings. (diagnosed by @poljar)
  • Added support for field boosting. (#547, @fulmicoton)
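The merge limit above can be pictured as a filter that runs before segments are grouped for merging. Below is a minimal std-only sketch. Its names (`SegmentMeta`, `merge_candidates`, `DEFAULT_MAX_DOCS_BEFORE_MERGE`) are hypothetical and are not tantivy's API. Only the 10M default comes from the note, and treating the limit as inclusive is an assumption.

```rust
/// Hypothetical stand-in for a segment's metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentMeta {
    pub id: u32,
    pub num_docs: u32,
}

/// Default threshold from the 0.12 release note (10M docs).
pub const DEFAULT_MAX_DOCS_BEFORE_MERGE: u32 = 10_000_000;

/// Keeps only segments small enough to be considered for a merge.
/// Segments above `max_docs` are left alone (inclusive bound assumed).
pub fn merge_candidates(segments: &[SegmentMeta], max_docs: u32) -> Vec<SegmentMeta> {
    segments
        .iter()
        .filter(|seg| seg.num_docs <= max_docs)
        .cloned()
        .collect()
}
```

Excluding very large segments keeps a merge policy from repeatedly rewriting huge segments just to absorb a few small ones.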
tantivy - Tantivy 0.11.3

Published by fulmicoton almost 5 years ago

  • Fixed DateTime as a fast field (#735)
tantivy - Tantivy 0.11.1

Published by fulmicoton almost 5 years ago

  • Bug fix #729
tantivy - Tantivy 0.11.0

Published by fulmicoton almost 5 years ago

  • Added f64 field. Internally, it reuses the u64 code the same way i64 does. (@fdb-hiroshima)
  • Various bugfixes in the query parser.
    • Better handling of hyphens in query parser. (#609)
    • Better handling of whitespace.
  • Closes #498: added support for Elastic-style unbounded range queries for alphanumeric types, e.g. "title:>hello", "weight:>=70.5", "height:<200" (@petr-tik)
  • API change around Box<BoxableTokenizer>. See detail in #629
  • Avoid rebuilding Regex automaton whenever a regex query is reused. #639 (@brainlock)
  • Add footer with some metadata to index files. #605 (@fdb-hiroshima)
  • Add a method to check the compatibility of the footer in the index with the running version of tantivy (@petr-tik)
  • TopDocs collector: ensure stable sorting on equal score. #671 (@brainlock)
  • Added handling of pre-tokenized text fields (#642), which will enable users to
    load tokens created outside tantivy. See usage in examples/pre_tokenized_text. (@kkoziara)
  • Fix crash when committing multiple times with deleted documents. #681 (@brainlock)
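Reusing u64 code for f64 (and i64) depends on mapping each value to a u64 whose unsigned order matches the numeric order. The sketch below shows one standard way to do this. For i64 and non-negative floats it flips the sign bit. For negative floats it inverts all bits, so a larger magnitude sorts lower. The function names are hypothetical, tantivy's own helpers may differ, and NaN is not handled.

```rust
const SIGN_BIT: u64 = 1 << 63;

/// Order-preserving i64 -> u64: flipping the sign bit moves negatives below positives.
pub fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

/// Order-preserving f64 -> u64 (NaN not handled).
pub fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        // Positive floats already sort by their bits; set the top bit to put them above negatives.
        bits ^ SIGN_BIT
    } else {
        // Negative floats sort in reverse bit order; inverting all bits fixes that.
        !bits
    }
}

/// Inverse of `f64_to_u64`.
pub fn u64_to_f64(val: u64) -> f64 {
    if (val & SIGN_BIT) != 0 {
        f64::from_bits(val ^ SIGN_BIT)
    } else {
        f64::from_bits(!val)
    }
}
```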

How to update?

  • The index format is changed. You are required to reindex your data to use tantivy 0.11.
  • Box<dyn BoxableTokenizer> has been replaced by a BoxedTokenizer struct.
  • Regexes are now compiled when the RegexQuery instance is built. As a result, building a RegexQuery
    can now return an error, and the Result must be handled.
  • tantivy::version() now returns a Version object. This object implements ToString.
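In Rust, a type usually gets ToString from the blanket implementation that covers every type implementing Display, so implementing Display is enough. Here is a minimal sketch with a hypothetical Version struct. Its fields and the "major.minor.patch" format are assumptions and may not match tantivy's actual output.

```rust
use std::fmt;

/// Hypothetical version struct; tantivy's real `Version` may hold different fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Implementing Display is what gives `to_string()` via the std blanket impl.
impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}
```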
tantivy - Tantivy 0.10.3

Published by fulmicoton almost 5 years ago

  • Fix crash when committing multiple times with deleted documents. #681 (@brainlock)
tantivy - Tantivy 0.10.2

Published by fulmicoton about 5 years ago

Hotfix for #656

tantivy - Tantivy 0.10.1

Published by fulmicoton about 5 years ago

  • Closes #544. A few users experienced problems with the directory watching system.
    The mmap directory is no longer watched until a reader that actually uses this
    functionality is created.
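Lazy watching of this kind can be sketched with std::sync::OnceLock (Rust 1.70+): the watcher is created only the first time something asks for it. `LazyWatchedDir` and `Watcher` are hypothetical placeholders, and no real file watching happens here.

```rust
use std::sync::OnceLock;

/// Hypothetical placeholder for a file-watching handle.
pub struct Watcher {
    pub path: String,
}

/// Hypothetical directory that starts its watcher on first demand only.
pub struct LazyWatchedDir {
    path: String,
    watcher: OnceLock<Watcher>,
}

impl LazyWatchedDir {
    /// Opening the directory does not start any watcher.
    pub fn open(path: &str) -> Self {
        LazyWatchedDir { path: path.to_string(), watcher: OnceLock::new() }
    }

    pub fn is_watching(&self) -> bool {
        self.watcher.get().is_some()
    }

    /// The first call creates the watcher; later calls return the same instance.
    pub fn watcher(&self) -> &Watcher {
        self.watcher.get_or_init(|| Watcher { path: self.path.clone() })
    }
}
```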
tantivy - Tantivy 0.10.0

Published by fulmicoton over 5 years ago

Tantivy 0.10.0 index format is compatible with the index format in 0.9.0.

  • Added an API to easily tweak or entirely replace the
    default score. See TopDocs::tweak_score and TopScore::custom_score (@pmasurel)
  • Added an ASCII folding filter (@drusellers)
  • Bugfix in query.count in presence of deletes (@pmasurel)
  • Added .explain(...) to Query and Weight (@pmasurel)
  • Added an efficient way to delete_all_documents in IndexWriter (@petr-tik).
    All segments are simply removed.
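An ASCII folding filter maps accented Latin characters to their closest ASCII form, so that "café" and "cafe" index to the same term. The sketch below works on a plain string rather than a tantivy token stream. It covers only a small, illustrative subset of lowercase mappings, and `ascii_fold` is a hypothetical name.

```rust
/// Illustrative subset of folding rules; a real filter covers far more characters.
fn fold_char(c: char) -> Option<&'static str> {
    let folded = match c {
        'à' | 'á' | 'â' | 'ã' | 'ä' | 'å' => "a",
        'ç' => "c",
        'è' | 'é' | 'ê' | 'ë' => "e",
        'ì' | 'í' | 'î' | 'ï' => "i",
        'ñ' => "n",
        'ò' | 'ó' | 'ô' | 'õ' | 'ö' => "o",
        'ù' | 'ú' | 'û' | 'ü' => "u",
        'æ' => "ae",
        'ß' => "ss",
        _ => return None,
    };
    Some(folded)
}

/// Replaces each foldable character; everything else is copied unchanged.
pub fn ascii_fold(token: &str) -> String {
    let mut out = String::with_capacity(token.len());
    for c in token.chars() {
        match fold_char(c) {
            Some(rep) => out.push_str(rep),
            None => out.push(c),
        }
    }
    out
}
```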

Minor

  • Switched to Rust 2018 (@uvd)
  • Small simplification of the code.
    Calling .freq() or .doc() when .advance() has never been called
    on segment postings should panic from now on.
  • Tokens exceeding u16::max_value() - 4 chars are discarded silently instead of panicking.
  • Fast fields are now preloaded when the SegmentReader is created.
  • IndexMeta is now public. (@hntd187)
  • IndexWriter::add_document and IndexWriter::delete_term only require a read lock. IndexWriter is Sync,
    making it possible to share it through an Arc<RwLock<IndexWriter>>. (@pmasurel)
  • Introducing Opstamp as an expressive type alias for u64. (@petr-tik)
  • Stamper now relies on AtomicU64 on all platforms (@petr-tik)
  • Bugfix - Files get deleted slightly earlier
  • Compilation resources improved (@fdb-hiroshima)
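Opstamp and an AtomicU64-based stamper can be sketched as follows. This is an illustration under assumptions, not tantivy's code: the alias name comes from the note, but the `Stamper` layout and its `stamp` method are hypothetical. Because `fetch_add` is atomic, concurrent callers never receive the same opstamp.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Expressive alias: a u64 that identifies the position of an operation.
pub type Opstamp = u64;

/// Hypothetical stamper handing out increasing opstamps; cheap to clone and share.
#[derive(Clone)]
pub struct Stamper(Arc<AtomicU64>);

impl Stamper {
    pub fn new(first: Opstamp) -> Self {
        Stamper(Arc::new(AtomicU64::new(first)))
    }

    /// Returns the current value and advances the counter atomically.
    pub fn stamp(&self) -> Opstamp {
        self.0.fetch_add(1, Ordering::SeqCst)
    }
}
```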

How to update?

Your program should be usable as is.

Fast fields

Fast fields used to be accessed directly from the SegmentReader.
The API changed: you now acquire your fast field reader via
segment_reader.fast_fields() and use one of its typed methods:

  • .u64(), .i64() if your field is single-valued;
  • .u64s(), .i64s() if your field is multi-valued;
  • .bytes() if your field is a bytes fast field.
tantivy - Tantivy 0.9.1

Published by fulmicoton over 5 years ago

Hotfix. All languages were using the English stemmer.

tantivy - Tantivy 0.9

Published by fulmicoton over 5 years ago

0.9.0 index format is not compatible with the previous index format.

Bugfix

Some Mmap objects were being leaked, and would never get released. (@fulmicoton)

New Features

  • Added IndexReader. By default, index is reloaded automatically upon new commits (@fulmicoton)
  • Stemming in other languages is now possible (@pentlander)
  • Added grouped add and delete operations.
    They are guaranteed to happen together (i.e. they cannot be split by a commit).
    In addition, adds are guaranteed to happen on the same segment. (@elbow-jason)
  • Added DateTime field (@barrotsteindev)
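The grouping guarantee can be illustrated by buffering every operation of a group under a single lock that commit also takes, so a commit sees either the whole group or none of it. `Writer`, `Op`, `run_group` and `commit` are hypothetical names. The "same segment" guarantee for adds is not modeled.

```rust
use std::sync::Mutex;

/// Hypothetical operation kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Add(String),
    Delete(String),
}

/// Hypothetical writer: pending operations live behind one mutex.
#[derive(Default)]
pub struct Writer {
    pending: Mutex<Vec<Op>>,
}

impl Writer {
    /// Appends the whole group while holding the lock, so it cannot be interleaved with a commit.
    pub fn run_group(&self, ops: Vec<Op>) {
        self.pending.lock().unwrap().extend(ops);
    }

    /// Drains everything buffered so far under the same lock.
    pub fn commit(&self) -> Vec<Op> {
        std::mem::take(&mut *self.pending.lock().unwrap())
    }
}
```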

Misc improvements

  • Indexer memory footprint improved (VInt compression, inlining of the first block). (@fulmicoton)
  • Removed most unsafe code. (@fulmicoton)
  • Segments with no docs are deleted earlier (@barrotsteindev)
  • Removed INT_STORED and INT_INDEXED. It is now possible to use STORED and INDEXED
    for int fields. (@fulmicoton)
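VInt compression stores small integers in fewer bytes. Below is a minimal LEB128-style sketch: each byte carries 7 payload bits, and the high bit marks that more bytes follow. tantivy's own VInt may use a different continuation-bit convention, so treat the byte layout as an assumption.

```rust
/// Encodes `value` 7 bits at a time, low bits first; the high bit is set
/// on every byte except the last.
pub fn encode_vint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value as u8) & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes one VInt from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or None if truncated.
pub fn decode_vint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 needs at most 10 groups of 7 bits.
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7F) << (7 * i);
        if (b & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```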
tantivy - Tantivy 0.8.2

Published by fulmicoton over 5 years ago

0.8.2 fixes build for non x86_64 platforms. See #496 for details.

tantivy - Tantivy 0.8.1

Published by fulmicoton over 5 years ago

Hotfix of #476.

Merges were reflecting deletes before the corresponding commit had gone through.
Thanks @barrotsteindev for reporting the bug.

tantivy - Tantivy 0.8.0

Published by fulmicoton almost 6 years ago

  • API Breaking change in the collector API. (@jwolfe, @fulmicoton)
  • Multithreaded search (@jwolfe, @fulmicoton)
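Multithreaded search typically runs one collector per segment in parallel and then merges the per-segment results. The std-only sketch below does this for a top-k search over precomputed scores, using `thread::scope` (Rust 1.63+). `Segment`, `collect_segment` and `parallel_top_k` are hypothetical and do not reproduce tantivy's Collector trait.

```rust
use std::thread;

/// Hypothetical segment: one precomputed score per doc id.
pub struct Segment {
    pub scores: Vec<f32>,
}

/// Per-segment collection: the top `k` hits as (score, segment_ord, doc_id).
fn collect_segment(seg_ord: usize, seg: &Segment, k: usize) -> Vec<(f32, usize, u32)> {
    let mut hits: Vec<(f32, usize, u32)> = seg
        .scores
        .iter()
        .enumerate()
        .map(|(doc, &s)| (s, seg_ord, doc as u32))
        .collect();
    hits.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.2.cmp(&b.2)));
    hits.truncate(k);
    hits
}

/// Searches each segment on its own thread, then merges the partial results.
pub fn parallel_top_k(segments: &[Segment], k: usize) -> Vec<(f32, usize, u32)> {
    let per_segment: Vec<Vec<(f32, usize, u32)>> = thread::scope(|scope| {
        let handles: Vec<_> = segments
            .iter()
            .enumerate()
            .map(|(ord, seg)| scope.spawn(move || collect_segment(ord, seg, k)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let mut merged: Vec<(f32, usize, u32)> = per_segment.into_iter().flatten().collect();
    // Highest score first; ties broken by (segment, doc) for a stable order.
    merged.sort_by(|a, b| b.0.total_cmp(&a.0).then((a.1, a.2).cmp(&(b.1, b.2))));
    merged.truncate(k);
    merged
}
```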
tantivy - Tantivy 0.7.2

Published by fulmicoton almost 6 years ago

Bugfix #457: removed a faulty debug_assert!.

tantivy - Tantivy 0.7.1

Published by fulmicoton almost 6 years ago

  • Bugfix: NGramTokenizer panicked on non-ASCII chars
  • Added a space usage API
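A panic on non-ASCII input is the classic result of slicing a &str at byte offsets that fall inside a multi-byte character. The hypothetical `ngrams` helper below avoids it by slicing only at `char_indices` boundaries. Options of tantivy's NGramTokenizer, such as prefix-only mode, are not modeled.

```rust
/// Returns all n-grams of `text` with lengths in `min_gram..=max_gram`, counted in chars.
/// Slicing only at char boundaries avoids the panic that byte-based
/// `&text[i..i + n]` can trigger on multi-byte UTF-8 characters.
pub fn ngrams(text: &str, min_gram: usize, max_gram: usize) -> Vec<&str> {
    assert!(min_gram >= 1 && min_gram <= max_gram);
    // Byte offset where each char starts, plus the end of the string.
    let mut bounds: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    bounds.push(text.len());
    let num_chars = bounds.len() - 1;
    let mut out = Vec::new();
    for start in 0..num_chars {
        for n in min_gram..=max_gram {
            let end = start + n;
            if end > num_chars {
                break;
            }
            out.push(&text[bounds[start]..bounds[end]]);
        }
    }
    out
}
```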
tantivy - Tantivy 0.7.0

Published by fulmicoton about 6 years ago

  • Skip data for doc ids and positions (@fulmicoton),
    greatly improving performance
  • Tantivy errors now rely on the failure crate (@drusellers)
  • Added support for AND, OR, NOT syntax in addition to the +,- syntax
  • Added a snippet generator with highlight (@vigneshsarma, @fulmicoton)
  • Added a TopFieldCollector (@pentlander)
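Skip data lets a posting list jump over whole blocks of doc ids when seeking. The sketch below stores the last doc id of each block and binary-searches it, so only the one block that can contain the target is scanned. `Postings` and `seek` are hypothetical names. Blocks are stored uncompressed, and the block size of 128 used in the test is an assumption.

```rust
/// Hypothetical posting list: sorted doc ids split into fixed-size blocks,
/// plus skip data holding the last doc id of each block.
pub struct Postings {
    blocks: Vec<Vec<u32>>,
    skip_last_doc: Vec<u32>,
}

impl Postings {
    pub fn new(sorted_docs: &[u32], block_size: usize) -> Self {
        assert!(block_size > 0);
        let blocks: Vec<Vec<u32>> = sorted_docs.chunks(block_size).map(|c| c.to_vec()).collect();
        // Chunks are never empty, so `last()` always succeeds.
        let skip_last_doc = blocks.iter().map(|b| *b.last().unwrap()).collect();
        Postings { blocks, skip_last_doc }
    }

    /// Returns the first doc id >= `target`, or None if there is none.
    pub fn seek(&self, target: u32) -> Option<u32> {
        // Skip data only: first block whose last doc id can reach `target`.
        let block_idx = self.skip_last_doc.partition_point(|&last| last < target);
        let block = self.blocks.get(block_idx)?;
        // Only this block needs to be scanned.
        block.iter().copied().find(|&d| d >= target)
    }
}
```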
tantivy - Tantivy 0.6.1

Published by fulmicoton over 6 years ago

  • Bugfix #324. GC was removing files that were still in use.
  • Added support for parsing AllQuery and RangeQuery via QueryParser
    • AllQuery: *
    • RangeQuery:
      • Inclusive field:[startIncl to endIncl]
      • Exclusive field:{startExcl to endExcl}
      • Mixed field:[startIncl to endExcl} and vice versa
      • Unbounded field:[start to *], field:[* to end]
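The range syntax above can be parsed by reading the opening and closing brackets to decide whether each bound is inclusive or exclusive, with `*` meaning unbounded. Below is a minimal sketch using std::ops::Bound. `RangeQuery` and `parse_range` are hypothetical. Values stay raw strings with no type conversion, and matching `to` case-insensitively is a choice of this sketch.

```rust
use std::ops::Bound;

/// Hypothetical parsed range: field name plus bounds (values kept as raw strings).
#[derive(Debug, PartialEq)]
pub struct RangeQuery {
    pub field: String,
    pub lower: Bound<String>,
    pub upper: Bound<String>,
}

/// Turns one endpoint into a Bound; `*` means unbounded.
fn to_bound(value: &str, inclusive: bool) -> Bound<String> {
    if value == "*" {
        Bound::Unbounded
    } else if inclusive {
        Bound::Included(value.to_string())
    } else {
        Bound::Excluded(value.to_string())
    }
}

/// Parses `field:[a to b]`, `field:{a to b}`, mixed brackets, and `*` endpoints.
pub fn parse_range(input: &str) -> Option<RangeQuery> {
    let (field, rest) = input.split_once(':')?;
    let rest = rest.trim();
    // '[' and ']' are inclusive, '{' and '}' exclusive; anything else is not a range.
    let lower_inclusive = match rest.chars().next()? {
        '[' => true,
        '{' => false,
        _ => return None,
    };
    let upper_inclusive = match rest.chars().last()? {
        ']' => true,
        '}' => false,
        _ => return None,
    };
    // Both brackets are single ASCII bytes, so these byte offsets are char boundaries.
    if rest.len() < 2 {
        return None;
    }
    let parts: Vec<&str> = rest[1..rest.len() - 1].split_whitespace().collect();
    if parts.len() != 3 || !parts[1].eq_ignore_ascii_case("to") {
        return None;
    }
    Some(RangeQuery {
        field: field.to_string(),
        lower: to_bound(parts[0], lower_inclusive),
        upper: to_bound(parts[2], upper_inclusive),
    })
}
```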
tantivy - Tantivy 0.6.0

Published by fulmicoton over 6 years ago

Special thanks to @drusellers and @jason-wolfe for their contributions
to this release!

From now on Tantivy compiles on stable Rust.

  • Removed C code. Tantivy is now pure Rust. (@pmasurel)
  • BM25 (@pmasurel)
  • Approximate field norms encoded over 1 byte. (@pmasurel)
  • Compiles on stable Rust (@pmasurel)
  • Add &[u8] fast field for associating arbitrary bytes to each document (@jason-wolfe) (#270)
    • Completely uncompressed
    • Internally: One u64 fast field for indexes, one fast field for the bytes themselves.
  • Add NGram token support (@drusellers)
  • Add Stopword Filter support (@drusellers)
  • Add a FuzzyTermQuery (@drusellers)
  • Add a RegexQuery (@drusellers)
  • Various performance improvements (@pmasurel)
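For reference, the Lucene-style BM25 score of one term in one field is idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)), with idf = ln(1 + (N - df + 0.5) / (df + 0.5)). The sketch below assumes the common defaults k1 = 1.2 and b = 0.75. It uses the exact field length, whereas 0.6.0 stores an approximate field norm in 1 byte. The function names are hypothetical.

```rust
/// BM25 free parameters; 1.2 and 0.75 are common defaults (assumed here).
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Inverse document frequency: rarer terms (small doc_freq) weigh more.
pub fn bm25_idf(doc_freq: u64, num_docs: u64) -> f32 {
    let n = num_docs as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// Score contribution of one term: term frequency saturates through K1,
/// and fields longer than average are penalized through B.
pub fn bm25_term_score(idf: f32, term_freq: f32, field_len: f32, avg_field_len: f32) -> f32 {
    let norm = K1 * (1.0 - B + B * field_len / avg_field_len);
    idf * term_freq * (K1 + 1.0) / (term_freq + norm)
}
```

With an average-length field and tf = 1, the score reduces to the idf alone, which makes a handy sanity check.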
tantivy - Tantivy 0.5.2

Published by fulmicoton over 6 years ago

Hotfix of 0.5.x for the following issues:

  • bugfix #274
  • bugfix #280
  • bugfix #289
tantivy - Tantivy 0.5.1

Published by fulmicoton over 6 years ago

Bugfix #254: tantivy failed if no documents in a segment contained a specific field.