Represent, send, store and search multimodal data
Apache-2.0 License
Published by github-actions[bot] almost 2 years ago
0.19.0
Release time: 2022-11-15 15:22:16
This release contains 2 breaking changes, 11 new features, 1 performance improvement, 7 bug fixes and 7 documentation improvements.
Sometimes you do not need to use a DocumentArray as a list and access Documents by offset. Since this capability requires keeping an Offset2ID mapping in the store, it comes with overhead.
Now, when using a DocumentArray with external storage, you can disable this behavior. This improves performance when accessing Documents by ID, while disallowing some list-like behavior.
from docarray import DocumentArray
da = DocumentArray(storage='qdrant', config={'n_dim': 10, 'list_like': False})
For the Elasticsearch and Redis document stores, we now support find by text while applying a filter:
from docarray import DocumentArray, Document

da = DocumentArray(
    storage='elasticsearch',
    config={'n_dim': 32, 'columns': {'price': 'int'}, 'index_text': True},
)
with da:
    da.extend(
        [Document(tags={'price': i}, text=f'pizza {i}') for i in range(10)]
    )
    da.extend(
        [Document(tags={'price': i}, text=f'noodles {i}') for i in range(10)]
    )
results = da.find(
    'pizza',
    filter={'range': {'price': {'lte': 5}}},
)
assert len(results) > 0
assert all(r.tags['price'] <= 5 for r in results)
assert all('pizza' in r.text for r in results)
DocArray now supports loading data with vertices and faces to represent 3D objects. You can visualize them using display
:
from docarray import Document
doc = Document(uri='some/uri')
doc.load_uri_to_vertices_and_faces()
doc.display()
embed_and_evaluate method (#702, #731)
The embed_and_evaluate method has been added to DocumentArray. It performs embedding, matching, and computation of evaluation metrics all at once, and batches operations to reduce the memory footprint.
import numpy as np
from docarray import Document, DocumentArray

def emb_func(da):
    # deterministic embeddings seeded by each Document's text
    for d in da:
        np.random.seed(int(d.text))
        d.embedding = np.random.random(5)

da = DocumentArray(
    [Document(text=str(i), tags={'label': i % 10}) for i in range(1_000)]
)
da.embed_and_evaluate(
    metrics=['precision_at_k'], embed_funcs=emb_func, query_sample_size=100
)
Reduction of memory usage when evaluating 100 query vectors against 500,000 index vectors with 500 dimensions:
Manual Evaluation:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
28 1130.7 MiB 1130.7 MiB 1 @profile
29 def run_evaluation_old_style(queries, index, model):
30 1133.1 MiB 2.5 MiB 1 queries.embed(model)
31 2345.6 MiB 1212.4 MiB 1 index.embed(model)
32 2360.4 MiB 14.8 MiB 1 queries.match(index)
33 2360.4 MiB 0.0 MiB 1 return queries.evaluate(metrics=['reciprocal_rank'])
Evaluation with `embed_and_evaluate` (batch_size 100,000):
Line # Mem usage Increment Occurrences Line Contents
=============================================================
23 1130.6 MiB 1130.6 MiB 1 @profile
24 def run_evaluation(queries, index, model, batch_size=None):
25 1130.6 MiB 0.0 MiB 1 kwargs = {'match_batch_size':batch_size} if batch_size else {}
26 1439.9 MiB 309.3 MiB 1 return queries.embed_and_evaluate(metrics=['reciprocal_rank'], index_data=index, embed_models=model, **kwargs)
This release supports Qdrant versions above 0.10.1. This comes with a lot of performance improvements and bug fixes on the backend.
Qdrant document store now supports pure filtering:
from docarray import Document, DocumentArray
import numpy as np

n_dim = 3
da = DocumentArray(
    storage='qdrant',
    config={'n_dim': n_dim, 'columns': {'price': 'float'}},
)
with da:
    da.extend(
        [
            Document(id=f'r{i}', embedding=i * np.ones(n_dim), tags={'price': i})
            for i in range(10)
        ]
    )

max_price = 7
n_limit = 4
filter = {'must': [{'key': 'price', 'range': {'lte': max_price}}]}
results = da.filter(filter=filter, limit=n_limit)

print('\nPoints with "price" at most 7:\n')
for embedding, price in zip(results.embeddings, results[:, 'tags__price']):
    print(f'\tembedding={embedding},\t price={price}')
This prints:
Points with "price" at most 7:
embedding=[6. 6. 6.], price=6
embedding=[7. 7. 7.], price=7
embedding=[1. 1. 1.], price=1
embedding=[2. 2. 2.], price=2
search_params in find for Qdrant document store (#675)
You can now pass search_params to the find interface with Qdrant:
np_query = np.random.random(n_dim)  # a query vector, continuing the example above
results = da.find(np_query, filter=filter, limit=n_limit, search_params={"hnsw_ef": 64})
DocArray offers login
and logout
methods to log into your Jina AI Cloud account directly from DocArray.
from docarray import login, logout
login()
# you are logged in
logout()
# you are logged out
docarray version added to push (#710)
When pushing a DocumentArray to the cloud, the docarray version is now added as metadata.
Keyword arguments for load_uri_to_video_tensor (#663)
Keyword arguments available in av.open() can now be passed to load_uri_to_video_tensor():
from docarray import Document
doc = Document(uri='/some/uri')
doc.load_uri_to_video_tensor(timeout=5000)
This release adds support for Weaviate versions above v1.16.0. Make sure to use version 1.16.1 of the Weaviate backend to enjoy all Weaviate features.
Previously, if you used the sub-index feature, every time you add new Documents with chunks, DocArray would persist the offset2ids of the chunk subindex. With this change, the offset2id is persisted once, when the parent DocumentArray's offset2id is persisted.
Previously, calling generator class methods such as from_csv from a DocumentArray instance had the non-intuitive behavior of not changing the DocumentArray in place.
Now DocumentArray instances are not allowed to call these methods and instead raise an exception:
from docarray import DocumentArray

da = DocumentArray()
da.from_files(
    patterns='*.*',
    size=2,
)
AttributeError: Class method can't be called from a DocumentArray instance but only from the DocumentArray class.
Previously, calling summary on a Document whose text contains certain textual patterns (which rich interprets as console markup) would raise an exception from rich. This release uses rich's Text class to ensure the text is rendered properly.
When using the find or match interfaces with the Redis document store, scores are now returned as float and not string.
Allow initialization of a Document instance with a dataclass object as well as additional kwargs.
Currently, when a Document is initialized with both a dataclass object and kwargs, the kwargs take precedence over attributes set via the dataclass object.
from docarray import dataclass, Document
from docarray.typing import Text

@dataclass
class MyDoc:
    chunk_text: Text

d = Document(MyDoc(chunk_text='chunk level text'), text='top level text')
assert d.text == 'top level text'
assert d.chunk_text.text == 'chunk level text'
Allow passing an empty List as field input of a dataclass:
from docarray import dataclass, Document
from docarray.typing import Text
from typing import List

@dataclass
class A:
    img: List[Text]

Document(A(img=[]))
When using DocumentArray
as a context manager, subindices
are now handled as context managers as well.
This makes handling subindices
more robust.
Change the type hint for tags in docarray.document.data.DocumentData from tags: Optional[Dict[str, 'StructValueType']] to tags: Optional[Dict[str, Any]].
This stops the IDE from complaining when you pass nested dictionaries inside tags.
Change the benchmark section of the docs to use the SIFT1M dataset, and add QPS-Recall graphs comparing how different DocumentStores perform in DocArray.
Replace plot with display in the docs (#689)
We would like to thank all contributors to this release:
Published by github-actions[bot] almost 2 years ago
This release contains 1 hotfix.
To avoid a breaking change, DocArray now requires AnnLite version 0.3.13.
samsja (@samsja)
Published by github-actions[bot] about 2 years ago
This release contains 7 new features, 6 bug fixes and 8 documentation improvements.
The Redis Document Store can now accept geospatial filter queries in the DocumentArray.find()
method:
from docarray import Document, DocumentArray

n_dim = 3
da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'location': 'geo'},
    },
)
with da:
    da.extend(
        [
            Document(id=f'r{i}', tags={'location': f'{-98.17 + i},{38.71 + i}'})
            for i in range(10)
        ]
    )

max_distance = 300
filter = f'@location:[-98.71 38.71 {max_distance} km]'
results = da.find(filter=filter, limit=10)
print(
    f'Locations within: {max_distance} km',
    [(doc.id, doc.tags['location']) for doc in results],
)
Results:
Locations within: 300 km [('r0', '-98.17,38.71'), ('r1', '-97.17,39.71')]
DocumentArray.evaluate() now supports computing evaluations for multiple metrics at once. The metric parameter is renamed to metrics, and metric_name is renamed to metric_names.
The evaluate() method expects a list for metrics and metric_names rather than a single value.
For instance, instead of doing:
da2.evaluate(
    ground_truth=da1, metric='precision_at_k', metric_name='precision@k', k=5
)  # returns average_evaluation
use:
da2.evaluate(
    ground_truth=da1, metrics=['precision_at_k'], metric_names=['precision@k'], k=10
)  # returns {'precision@k': prec_at_k_average_evaluation}
The first usage raises a deprecation warning and will be removed soon.
The return type also changed: evaluate() now returns a dict mapping metric names to their average evaluation scores instead of a single score value.
For more info, check the Evaluate Matches section in the documentation.
When using DocumentArray.push()
, error messages returned by the server will show up in the stack trace. For instance, pushing a DocumentArray
with a name reserved by another user will return the following error:
requests.exceptions.HTTPError: 403 Client Error: OperationNotAllowedError: Current user is not allowed to edit this artifact. Permission denied. for url: https://api.hubble.jina.ai/v2/rpc/artifact.upload
MongoDB-like filter QL is no longer supported in the Redis backend; this release adds support for the native Redis QL syntax instead. Using MongoDB-like filter QL raises a deprecation warning and will be removed soon.
Therefore, instead of using:
redis_da.find(filter={'field': {'@eq': 'value'}})
use this syntax instead:
redis_da.find(filter='@field:value')
For more information, check the Redis Document Store documentation.
As of this release, DocumentArray.evaluate() supports labeled datasets. Labels can be added via a tag field in each Document of your DocumentArray:
import numpy as np
from docarray import Document, DocumentArray

example_da = DocumentArray([Document(tags={'label': (i % 2)}) for i in range(10)])
example_da.embeddings = np.random.random([10, 3])
example_da.match(example_da)
print(example_da.evaluate(metrics=['precision_at_k']))
The results of the evaluation are stored in the evaluations field of each Document.
You can specify the label field name using the label_tag attribute:
example_da = DocumentArray(
    [Document(tags={'my_custom_label': (i % 2)}) for i in range(10)]
)
example_da.embeddings = np.random.random([10, 3])
example_da.match(example_da)
print(example_da.evaluate(metrics=['precision_at_k'], label_tag='my_custom_label'))
You can see the progress of batching documents using DocumentArray.batch()
with the show_progress
parameter:
import time
from docarray import Document, DocumentArray

da = DocumentArray(Document(text=str(i)) for i in range(100_000))
for batch in da.batch(500, show_progress=True):
    time.sleep(0.1)  # simulate work on each batch
n_components PCA parameter added to AnnLite configuration (#606)
The parameter n_components has been added to AnnLite's configuration in DocArray. Use it when you want to apply PCA in your AnnLite backend.
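A minimal configuration sketch (values are illustrative; we assume n_components sits at the top level of the config dict alongside n_dim, per #606):

```python
from docarray import DocumentArray

# Project 128-dimensional embeddings down to 32 principal components with PCA
da = DocumentArray(
    storage='annlite',
    config={'n_dim': 128, 'n_components': 32},
)
```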
DocArray adds support for Qdrant versions greater than or equal to v0.8.0 and drops support for previous versions.
Therefore, make sure to use version 0.8.0 or higher for both qdrant-client
and the Qdrant database.
Fully persisting (syncing) data in a DocumentArray to a database is now ensured using either the context manager or the sync() method. Make sure to wrap write operations to a DocumentArray in a context manager like so:
my_da = DocumentArray(storage='my_storage', config=...)
with my_da:
    ...  # write operations
or use the sync()
method:
my_da = DocumentArray(storage='my_storage', config=...)
... # write operations
my_da.sync()
Fix file handling in load_uri_to_audio_tensor (#609)
The load_uri_to_audio_tensor method used to open a file handle without properly closing it. This release fixes the bug: the file is now opened with a context manager and closed properly.
Concatenation operations in DocumentArray used to operate on objects in-place, without making a copy.
This resulted in the following unexpected behavior:
from docarray import DocumentArray
da1 = DocumentArray.empty(3)
da2 = DocumentArray.empty(4)
da3 = DocumentArray.empty(5)
print(da1 + da2 + da3)
da1 += da2
print('length =', len(da1)) # expected length = 7 but prints length = 16
This release fixes the bug. Concatenation will operate on new copied objects each time rather than concatenating
in-place.
Prior to this release, reloading a DocumentArray configured with subindices from a database would produce a "unique ID already exists" error (the actual error message depends on the backend). This happened because DocumentArray attempted to index the initial documents in the sub-index twice, even though they had already been indexed. This release fixes the issue.
Serializing a Document used to ignore scores with value 0.0. For instance, the string representation of a Document might drop scores with value 0 and treat them as an empty field. This release fixes the issue.
Document the convert_uri_to_datauri() method in the documentation (#608)
Update the documentation of the set_image_normalization() method so that it mentions proper usage and aligns with PyTorch.

Package | Minimum Version |
---|---|
qdrant | 0.8.0 |
The Qdrant backend in DocArray now requires Qdrant database v0.8.0 or higher.
The return type of DocumentArray.evaluate() changed from a single score float to a dict mapping metric names to score values.
Persisting a DocumentArray that uses a storage backend now has to be ensured by using the context manager. Therefore, you need to wrap your write operations to a DocumentArray in a context manager like so:
my_da = DocumentArray(storage='my_storage', config=...)
with my_da:
    ...  # write operations
Alternatively, you can call the sync() method when you finish write operations:
my_da = DocumentArray(storage='my_storage', config=...)
...  # write operations
my_da.sync()
The metric and metric_name parameters in the DocumentArray.evaluate() method were renamed and now accept a list rather than a single value (as mentioned above). The old naming and type will be removed soon.
We would like to thank all contributors to this release:
Jie Fu (@jemmyshin)
Wang Bo (@bwanglzu)
Jonathan Rowley (@jonathan-rowley)
Alex Cureton-Griffiths (@alexcg1)
Han Xiao (@hanxiao)
AlaeddineAbdessalem (@alaeddine-13)
Michael Günther (@guenthermi)
samsja (@samsja)
Johannes Messner (@JohannesMessner)
dong xiang (@dongxiang123)
Jackmin801 (@Jackmin801)
Joan Fontanals (@JoanFM)
Published by github-actions[bot] about 2 years ago
0.17.0
Release time: 2022-09-23 16:18:19
This release contains 8 new features, 2 performance improvements, 7 bug fixes, and 2 documentation improvements.
kwargs for load_uri_to_* methods (#540)
The load_uri_to_* methods (load_uri_to_blob, load_uri_to_text, etc.) now accept kwargs, so you can pass a timeout parameter to the underlying request methods.
For example:
from docarray import Document

doc = Document(uri='uri_path')
doc.load_uri_to_blob(timeout=2)
You can now store multiple DocumentArrays in a single Redis instance, as long as each DocumentArray has a different index_name
:
da1 = DocumentArray(storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 128, 'index_name': 'da1'})
da2 = DocumentArray(storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 256, 'index_name': 'da2'})
da3 = DocumentArray(storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 512, 'index_name': 'da3'})
Logging in to Jina Cloud is now required before pushing/pulling DocumentArrays to/from Jina Cloud. You can log in either by creating a token in hub.jina.ai
and setting it as an environment variable (JINA_AUTH_TOKEN=my_token
) or using the CLI command jina auth login
.
cloud_list and cloud_delete methods (#490)
DocumentArray.push will extract metadata about the DocumentArray and send it to Jina Cloud. Although this is transparent to users, it helps with visualization of DocumentArrays in Jina Cloud.
It is also possible to list and delete DocumentArrays in Jina Cloud using the following methods:
DocumentArray.cloud_list(): lists all DocumentArray objects owned by the authenticated user
DocumentArray.cloud_delete(da_name): deletes the DocumentArray with the given name if it is owned by the authenticated user
Full-text search is supported either on the Document.text field or on Document tags, as long as you enable text indexing or specify tag fields to be indexed.
For example:
from docarray import Document, DocumentArray

da = DocumentArray(storage='redis', config={'n_dim': 2, 'index_text': True})
# add documents with a text field
da.extend(
    [
        Document(text='Redis allows you to search by text query,'),
        Document(text='by vector similarity'),
        Document(text='Or by filter conditions'),
    ]
)
da.find('my text query').texts
Result:
['Redis allows you to search by text query,']
$and and $or in Redis (#509)
The Redis backend now supports the $and and $or logical operators. For example:
from docarray import DocumentArray

da = DocumentArray(
    storage='redis',
    config={'n_dim': 128, 'columns': {'col1': 'str', 'col2': 'int'}},
)
redis_filter = {
    "$or": {
        "col1": {"$eq": "value"},
        "col2": {"$lt": 100},
    }
}
# retrieve documents using the filter
da.find(redis_filter)
The columns
configuration parameter for storage backends has been changed from a list of tuples to a dictionary in the following format: {'column_name': 'column_type'}
. This helps with YAML compatibility.
For example:
from docarray import DocumentArray
da = DocumentArray(storage='annlite', config={'n_dim': 128, 'columns': {'col1': 'str', 'col2': 'float'}})
It is now possible to choose which field to use when displaying an image document:
from docarray import Document

d = Document(uri='toydata/test.png')
d.display()
d.display(from_='uri')

or

d.load_uri_to_image_tensor()
d.display(from_='tensor')
Package | Minimum Version |
---|---|
jina-hubble-sdk | 0.13.1 |
annlite | 0.3.12 |

Other API Changes:
The columns configuration parameter for storage backends has been changed from a list of tuples to a dictionary in the format {'column_name': 'column_type'}.
Optimized find with the exists condition (#519)
We got rid of unnecessary and costly computation in DocumentArray.find with an exists filter. When running the following code:
from docarray import DocumentArray, Document
num = 1_000  # illustrative size; the speedup grows with larger arrays
da = DocumentArray(Document(text='text') for _ in range(num)) + DocumentArray(
    Document(blob=b'blob') for _ in range(num)
)
da.find(query={'text': {'$exists': True}})
you should expect a 200-300% speedup in find.
This optimization only affects DocumentArray.find or DocumentArray.match when an exists condition is used with the in-memory document store.
The default journal mode in the SQLite backend is now WAL. This should improve performance when using the SQLite backend.
According to the SQLite docs, WAL is significantly faster, provides more concurrency, and is more robust.
DocumentArray's Redis backend previously initialized schemas in the Redis database with default values of vector similarity search parameters. Those default values came from DocArray, not Redis.
This altered the database's default behavior, although the user didn't explicitly specify that. We've changed the implementation to avoid altering default values of the database. Default values now depend on the Redis database version.
AnnLite introduced a breaking change in 0.3.12
. Therefore, we have adapted our implementation to the latest version of AnnLite and increased the minimum required version to 0.3.12
.
DocumentArray's delete-by-mask operation used to behave unexpectedly. The following code erases the last Document, even though it is not covered by the mask:
from docarray import DocumentArray

da = DocumentArray.empty(3)
mask = [True, False]
del da[mask]
print(len(da))  # prints 1
We have fixed this behavior, and DocumentArray will now correctly keep documents that are not present in the mask.
We've fixed an incorrect link in the documentation.
DocArray type mapping used the wrong types in AnnLite. We've now replaced the types specified in the document store implementation with the correct ones.
Strawberry introduced a breaking change in 0.128.0
, making it necessary to pass parameters as key arguments. We've adapted our code base to this change.
Some parts of in-memory distance computation used to restrict tensor device conversion to cuda
. We've changed the implementation to make device conversion more generic.
We've added a "One Million Benchmark" section to the "Feature Summary" page.
We've updated the pip setup instruction required to use DocumentArray push/pull.
We would like to thank all contributors to this release:
Leon Wolf (@fogx)
samsja (@samsja)
AlaeddineAbdessalem (@alaeddine-13)
Halo Master (@linkerlin)
Han Xiao (@hanxiao)
Wang Bo (@bwanglzu)
Anne Yang (@AnneYang720)
Joan Fontanals (@JoanFM)
Published by github-actions[bot] about 2 years ago
0.16.5
Release time: 2022-09-08 17:56:12
We'd like to thank all contributors for this new release! In particular,
Anne Yang, Jina Dev Bot
[cefb66c0] - redis: add logic operators $and and $or in redis (#509) (Anne Yang)
[404b9731] - version: the next version will be 0.16.5 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.4
Release time: 2022-09-08 15:59:21
We'd like to thank all contributors for this new release! In particular,
Joan Fontanals, Wang Bo, Jina Dev Bot
[fff3ecca] - columns should be a dictionary not list of tuples (#526) (Joan Fontanals)
[531bd835] - fix finetuner link for totally looks like (#532) (Wang Bo)
[4526bc7d] - fix annlite type map (#533) (Joan Fontanals)
[c7105983] - version: the next version will be 0.16.4 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.3
Release time: 2022-09-06 09:46:29
We'd like to thank all contributors for this new release! In particular,
Joan Fontanals, AlaeddineAbdessalem, samsja, Jina Dev Bot
[fa05ec35] - allow choose to display from tensor or from uri in document (#518) (samsja)
[d6152331] - only check if field is set (#519) (AlaeddineAbdessalem)
[e8cc9c56] - create strawberry types with kwargs (#527) (Joan Fontanals)
[0b2326bd] - update seri (#516) (samsja)
[b268f6f3] - ci fix (#520) (AlaeddineAbdessalem)
[4d4fb504] - version: the next version will be 0.16.3 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.2
Release time: 2022-08-30 19:00:32
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Halo Master, Jina Dev Bot
[34bf27f3] - find: make device more generic (#515) (Han Xiao)
[459703e9] - sqlite: change default journal mode to WAL (#506) (Halo Master)
Published by github-actions[bot] about 2 years ago
0.16.1
Release time: 2022-08-29 13:57:21
We'd like to thank all contributors for this new release! In particular,
Jina Dev Bot
[68533181] - version: the next version will be 0.16.1 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.0
Release time: 2022-08-29 10:19:26
We'd like to thank all contributors for this new release! In particular,
Han Xiao, AlaeddineAbdessalem, Anne Yang, Joan Fontanals, felix-wang, Johannes Messner, Jina Dev Bot
[c2235de1] - redis: implement Redis storage backend and unit tests (#452) (Anne Yang)
[24be6ba8] - bump protobuf (#371) (Joan Fontanals)
[7c91c7bd] - annlite offsetmapping (#504) (felix-wang)
[be788678] - plot: be robust against non-existing subindices (#503) (Johannes Messner)
[2c17d888] - docs: update docs generation (Han Xiao)
[615fa85c] - include redis in benchmarking script (Han Xiao)
[120135cf] - cleanup ci (#505) (AlaeddineAbdessalem)
[6aaf0e9d] - update readme (Han Xiao)
[c5ff8705] - fix readme (Han Xiao)
[cc88ec28] - version: the next version will be 0.15.5 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.15.4
Release time: 2022-08-25 07:34:28
We'd like to thank all contributors for this new release! In particular,
AlaeddineAbdessalem, samsja, Jina Dev Bot
[51fec4f4] - cap annlite version (#497) (AlaeddineAbdessalem)
[e75ca80d] - bump es version in documentation to match the tests (#498) (samsja)
Published by github-actions[bot] about 2 years ago
0.15.3
Release time: 2022-08-23 12:02:49
We'd like to thank all contributors for this new release! In particular,
Johannes Messner, Han Xiao, Jina Dev Bot
[cb066d83] - update offset2id when deleting in subindex (#496) (Johannes Messner)
[69a976fd] - find: fix misleading signature of _find_by_text (#495) (Han Xiao)
[0dfb5f2c] - version: the next version will be 0.15.3 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.15.2
Release time: 2022-08-19 12:58:37
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot
[b54bb09a] - video: add height_width for webcam caps (#494) (Han Xiao)
[fefc0025] - version: the next version will be 0.15.2 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.15.1
Release time: 2022-08-19 11:18:56
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Johannes Messner, Jina Dev Bot
[c936b18f] - video: add from_webcam generator (#493) (Han Xiao)
[9e00a008] - push meta data along with docarray (Han Xiao)
[461b996f] - set subindices directly via access path (#488) (Johannes Messner)
Published by github-actions[bot] about 2 years ago
0.15.0
Release time: 2022-08-12 19:11:24
Subindices allow you to efficiently search through nested and multimodal Documents, at any nesting level, without loading them into memory.
This works by creating dedicated database indices using subindex_configs=
and searching through a specific subindex using on=
:
from docarray import dataclass, Document, DocumentArray
from docarray.typing import Image, Text

@dataclass
class MyDoc:
    image: Image
    paragraph: Text

da = DocumentArray(
    [
        Document(MyDoc(image='apple.png', paragraph='hello')),
        Document(MyDoc(image='apple.png', paragraph='world')),
    ],
    storage='annlite',
    subindex_configs={'@.[image]': None},  # add subindex for '@.[image]'
)
da['@.[image]'].embeddings = ...  # add image embeddings
da.find(on='@.[image]')  # find closest image; can also take '@c' etc.
Read more about this feature in our docs.
Subindices for all document stores (#456)
Elastic: allow kwargs in .extend() (#473)
Fixes to .plot() (#472, #468)
Published by github-actions[bot] about 2 years ago
0.14.11
Release time: 2022-08-11 20:09:20
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot
[930c75cb] - use hubble sdk for get_token (#485) (Han Xiao)
[f7b8eb89] - version: the next version will be 0.14.11 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.10
Release time: 2022-08-11 15:25:02
We'd like to thank all contributors for this new release! In particular,
Nan Wang, Alvin Prayuda, Han Xiao, Johannes Messner, Jina Dev Bot
[8f66a15e] - elastic: add bulk get operation (#478) (Alvin Prayuda)
[413af099] - elastic: add find kwargs for perf tuning (#481) (Alvin Prayuda)
[ec7dcce1] - subindex for all document stores (#456) (Johannes Messner)
[71274d2d] - retrieve MAX_ES_RETURNED_DOCS docs in bulk operations (#484) (Nan Wang)
[a43b498d] - use hubble sdk for push pull (#482) (Han Xiao)
[fc39c042] - version: the next version will be 0.14.10 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.9
Release time: 2022-08-09 20:59:47
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot
[30e00322] - image: fix image padding on vertical direction (#480) (Han Xiao)
[23ce6518] - version: the next version will be 0.14.9 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.8
Release time: 2022-08-05 09:17:34
We'd like to thank all contributors for this new release! In particular,
Alvin Prayuda, Jina Dev Bot
[9f939e6f] - elastic: allow kwargs to be passed at extend (#473) (Alvin Prayuda)
[67f60cd7] - version: the next version will be 0.14.8 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.7
Release time: 2022-08-04 12:53:30
We'd like to thank all contributors for this new release! In particular,
Johannes Messner, Jina Dev Bot
[6b4f6fc0] - keep offset2id updated when deleting in sqlite (#471) (Johannes Messner)
[505d7767] - version: the next version will be 0.14.7 (Jina Dev Bot)