Represent, send, store and search multimodal data
Apache-2.0 License
Published by github-actions[bot] almost 2 years ago
0.19.0
Release time: 2022-11-15 15:22:16
This release contains 2 breaking changes, 11 new features, 1 performance improvement, 7 bug fixes and 7 documentation improvements.
Sometimes you do not need to use a DocumentArray as a list and access Documents by offset. Since this capability requires keeping an Offset2ID mapping in the store, it comes with overhead.
Now, when using a DocumentArray with external storage, you can disable this behavior. This improves performance when accessing Documents by ID, while disallowing some list-like behavior.
from docarray import DocumentArray
da = DocumentArray(storage='qdrant', config={'n_dim': 10, 'list_like': False})
For the Elasticsearch and Redis document stores, we now support find by text while applying a filter:
from docarray import DocumentArray, Document

da = DocumentArray(
    storage='elasticsearch',
    config={'n_dim': 32, 'columns': {'price': 'int'}, 'index_text': True},
)
with da:
    da.extend(
        [Document(tags={'price': i}, text=f'pizza {i}') for i in range(10)]
    )
    da.extend(
        [Document(tags={'price': i}, text=f'noodles {i}') for i in range(10)]
    )
results = da.find(
    'pizza',
    filter={'range': {'price': {'lte': 5}}},
)
assert len(results) > 0
assert all(r.tags['price'] <= 5 for r in results)
assert all('pizza' in r.text for r in results)
DocArray now supports loading data with vertices and faces to represent 3D objects. You can visualize them using display
:
from docarray import Document
doc = Document(uri='some/uri')
doc.load_uri_to_vertices_and_faces()
doc.display()
embed_and_evaluate method (#702, #731)
The embed_and_evaluate method has been added to DocumentArray. It performs embedding, matching, and computation of evaluation metrics all at once, and batches operations to reduce the memory footprint.
import numpy as np
from docarray import Document, DocumentArray

def emb_func(da):
    # deterministic embeddings seeded by each Document's text
    for d in da:
        np.random.seed(int(d.text))
        d.embedding = np.random.random(5)

da = DocumentArray(
    [Document(text=str(i), tags={'label': i % 10}) for i in range(1_000)]
)
da.embed_and_evaluate(
    metrics=['precision_at_k'], embed_funcs=emb_func, query_sample_size=100
)
Reduction of memory usage when evaluating 100 query vectors against 500,000 index vectors with 500 dimensions:
Manual Evaluation:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
28 1130.7 MiB 1130.7 MiB 1 @profile
29 def run_evaluation_old_style(queries, index, model):
30 1133.1 MiB 2.5 MiB 1 queries.embed(model)
31 2345.6 MiB 1212.4 MiB 1 index.embed(model)
32 2360.4 MiB 14.8 MiB 1 queries.match(index)
33 2360.4 MiB 0.0 MiB 1 return queries.evaluate(metrics=['reciprocal_rank'])
Evaluation with `embed_and_evaluate` (batch_size 100,000):
Line # Mem usage Increment Occurrences Line Contents
=============================================================
23 1130.6 MiB 1130.6 MiB 1 @profile
24 def run_evaluation(queries, index, model, batch_size=None):
25 1130.6 MiB 0.0 MiB 1 kwargs = {'match_batch_size':batch_size} if batch_size else {}
26 1439.9 MiB 309.3 MiB 1 return queries.embed_and_evaluate(metrics=['reciprocal_rank'], index_data=index, embed_models=model, **kwargs)
This release supports Qdrant versions above 0.10.1. This comes with a lot of performance improvements and bug fixes on the backend.
Qdrant document store now supports pure filtering:
from docarray import Document, DocumentArray
import numpy as np

n_dim = 3
da = DocumentArray(
    storage='qdrant',
    config={'n_dim': n_dim, 'columns': {'price': 'float'}},
)
with da:
    da.extend(
        [
            Document(id=f'r{i}', embedding=i * np.ones(n_dim), tags={'price': i})
            for i in range(10)
        ]
    )

max_price = 7
n_limit = 4
filter = {'must': [{'key': 'price', 'range': {'lte': max_price}}]}
results = da.filter(filter=filter, limit=n_limit)

print('\nPoints with "price" at most 7:\n')
for embedding, price in zip(results.embeddings, results[:, 'tags__price']):
    print(f'\tembedding={embedding},\t price={price}')
This prints:
Points with "price" at most 7:
embedding=[6. 6. 6.], price=6
embedding=[7. 7. 7.], price=7
embedding=[1. 1. 1.], price=1
embedding=[2. 2. 2.], price=2
search_params in find for Qdrant document store (#675)
You can now pass search_params to the find interface with Qdrant:
np_query = np.random.random(n_dim)  # a query vector, continuing the example above
results = da.find(np_query, filter=filter, limit=n_limit, search_params={"hnsw_ef": 64})
DocArray offers login
and logout
methods to log into your Jina AI Cloud account directly from DocArray.
from docarray import login, logout
login()
# you are logged in
logout()
# you are logged out
docarray version added to push (#710)
When pushing a DocumentArray to the cloud, the docarray version is now added as metadata.
Keyword arguments for load_uri_to_video_tensor (#663)
Keyword arguments available in av.open() can now be passed to load_uri_to_video_tensor():
from docarray import Document
doc = Document(uri='/some/uri')
doc.load_uri_to_video_tensor(timeout=5000)
This release adds support for Weaviate versions above v1.16.0. Make sure to use version 1.16.1 of the Weaviate backend to enjoy all Weaviate features.
Previously, if you used the sub-index feature, every time you add new Documents with chunks, DocArray would persist the offset2ids of the chunk subindex. With this change, the offset2id is persisted once, when the parent DocumentArray's offset2id is persisted.
Previously, calling generator class methods such as from_csv from a DocumentArray instance had the non-intuitive behavior of not changing the DocumentArray in place.
Now DocumentArray instances are not allowed to call these methods and instead raise an exception:
from docarray import DocumentArray

da = DocumentArray()
da.from_files(
    patterns='*.*',
    size=2,
)
AttributeError: Class method can't be called from a DocumentArray instance but only from the DocumentArray class.
Previously, calling summary on a Document whose text contains certain textual patterns (which rich interprets as console markup) would raise an exception from rich. This release uses rich's Text class to ensure the text is rendered properly.
When using the find or match interfaces with the Redis document store, scores are now returned as float and not string.
Allow initialization of a Document instance with a dataclass object as well as additional kwargs.
Currently, when a Document is initialized with both a dataclass object and kwargs, the kwargs take precedence over attributes set via the dataclass object.
from docarray import dataclass, Document
from docarray.typing import Text

@dataclass
class MyDoc:
    chunk_text: Text

d = Document(MyDoc(chunk_text='chunk level text'), text='top level text')
assert d.text == 'top level text'
assert d.chunk_text.text == 'chunk level text'
Allow passing an empty List as field input of a dataclass:
from docarray import dataclass, Document
from docarray.typing import Text
from typing import List

@dataclass
class A:
    img: List[Text]

Document(A(img=[]))
When using DocumentArray
as a context manager, subindices
are now handled as context managers as well.
This makes handling subindices
more robust.
Change the type hint for tags in docarray.document.data.DocumentData from tags: Optional[Dict[str, 'StructValueType']] to tags: Optional[Dict[str, Any]].
This stops the IDE from complaining when you pass nested dictionaries inside tags.
Change the benchmark section of the docs to use the SIFT1M dataset, and add QPS-Recall graphs comparing how different DocumentStores perform in DocArray.
Replace plot with display in the docs (#689)
We would like to thank all contributors to this release:
Published by github-actions[bot] almost 2 years ago
This release contains 1 hotfix.
To avoid a breaking change, DocArray now requires AnnLite version 0.3.13.
samsja (@samsja)
Published by github-actions[bot] about 2 years ago
This release contains 7 new features, 6 bug fixes and 8 documentation improvements.
The Redis Document Store can now accept geospatial filter queries in the DocumentArray.find()
method:
from docarray import Document, DocumentArray

n_dim = 3
da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'location': 'geo'},
    },
)
with da:
    da.extend(
        [
            Document(id=f'r{i}', tags={'location': f'{-98.17 + i},{38.71 + i}'})
            for i in range(10)
        ]
    )

max_distance = 300
filter = f'@location:[-98.71 38.71 {max_distance} km]'
results = da.find(filter=filter, limit=10)
print(
    f'Locations within: {max_distance} km',
    [(doc.id, doc.tags['location']) for doc in results],
)
Results:
Locations within: 300 km [('r0', '-98.17,38.71'), ('r1', '-97.17,39.71')]
DocumentArray.evaluate() now supports computing evaluations for multiple metrics at once. The metric parameter is renamed to metrics, and metric_name is renamed to metric_names.
The evaluate() method expects a list for metrics and metric_names rather than a single value.
For instance, instead of doing:
da2.evaluate(
    ground_truth=da1, metric='precision_at_k', metric_name='precision@k', k=5
)  # returns average_evaluation
use:
da2.evaluate(
    ground_truth=da1, metrics=['precision_at_k'], metric_names=['precision@k'], k=10
)  # returns {'precision@k': prec_at_k_average_evaluation}
The first usage raises a deprecation warning and will be removed soon.
The return type also changed: evaluate() now returns a dict mapping metric names to their average evaluation scores instead of a single score value.
For more info, check the Evaluate Matches section in the documentation.
When using DocumentArray.push()
, error messages returned by the server will show up in the stack trace. For instance, pushing a DocumentArray
with a name reserved by another user will return the following error:
requests.exceptions.HTTPError: 403 Client Error: OperationNotAllowedError: Current user is not allowed to edit this artifact. Permission denied. for url: https://api.hubble.jina.ai/v2/rpc/artifact.upload
MongoDB-like filter QL is no longer supported in the Redis backend; this release adds support for the native Redis QL syntax instead. Using MongoDB-like filter QL raises a deprecation warning and will be removed soon.
Therefore, instead of using:
redis_da.find(filter={'field': {'@eq': 'value'}})
use this syntax instead:
redis_da.find(filter='@field:value')
For more information, check the Redis Document Store documentation.
As of this release, DocumentArray.evaluate() supports labeled datasets. Labels can be added via a tag field in each Document of your DocumentArray:
import numpy as np
from docarray import Document, DocumentArray

example_da = DocumentArray([Document(tags={'label': (i % 2)}) for i in range(10)])
example_da.embeddings = np.random.random([10, 3])
example_da.match(example_da)
print(example_da.evaluate(metrics=['precision_at_k']))
The results of the evaluation are stored in the evaluations field of each Document.
You can specify the label field name using the label_tag attribute:
example_da = DocumentArray(
    [Document(tags={'my_custom_label': (i % 2)}) for i in range(10)]
)
example_da.embeddings = np.random.random([10, 3])
example_da.match(example_da)
print(example_da.evaluate(metrics=['precision_at_k'], label_tag='my_custom_label'))
You can see the progress of batching documents using DocumentArray.batch()
with the show_progress
parameter:
import time
from docarray import Document, DocumentArray

da = DocumentArray(Document(text=str(i)) for i in range(100_000))
for batch in da.batch(500, show_progress=True):
    time.sleep(0.1)  # simulate work on each batch
n_components PCA parameter added to AnnLite configuration (#606)
The parameter n_components has been added to AnnLite's configuration in DocArray. Use it when you want to apply PCA in your AnnLite backend.
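A minimal configuration sketch (values are illustrative; we assume n_components sits at the top level of the config dict alongside n_dim, per #606):

```python
from docarray import DocumentArray

# Project 128-dimensional embeddings down to 32 principal components with PCA
da = DocumentArray(
    storage='annlite',
    config={'n_dim': 128, 'n_components': 32},
)
```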
DocArray adds support for Qdrant versions greater than or equal to v0.8.0 and drops support for previous versions.
Therefore, make sure to use version 0.8.0 or higher for both qdrant-client
and the Qdrant database.
Fully persisting (syncing) data in a DocumentArray to a database is now ensured using either the context manager or the sync() method. Make sure to wrap write operations to a DocumentArray in a context manager like so:
my_da = DocumentArray(storage='my_storage', config=...)
with my_da:
    ...  # write operations
or use the sync()
method:
my_da = DocumentArray(storage='my_storage', config=...)
... # write operations
my_da.sync()
Fix file handling in load_uri_to_audio_tensor (#609)
The load_uri_to_audio_tensor method used to open a file handle without properly closing it. This release fixes the bug: the file is now opened with a context manager and closed properly.
Concatenation operations in DocumentArray used to operate on objects in-place, without making a copy.
This resulted in the following unexpected behavior:
from docarray import DocumentArray
da1 = DocumentArray.empty(3)
da2 = DocumentArray.empty(4)
da3 = DocumentArray.empty(5)
print(da1 + da2 + da3)
da1 += da2
print('length =', len(da1)) # expected length = 7 but prints length = 16
This release fixes the bug. Concatenation will operate on new copied objects each time rather than concatenating
in-place.
Prior to this release, reloading a DocumentArray configured with subindices from a database would produce a "unique ID already exists" error (the actual error message depends on the backend). This happened because DocumentArray attempted to index the initial documents in the sub-index twice, even though they had already been indexed. This release fixes the issue.
Serializing a Document used to ignore scores with value 0.0. For instance, the string representation of a Document might drop scores with value 0 and treat them as an empty field. This release fixes the issue.
Document the convert_uri_to_datauri() method in the documentation (#608)
Update the documentation of the set_image_normalization() method so that it mentions proper usage and aligns with PyTorch.

Package | Minimum Version |
---|---|
qdrant | 0.8.0 |
The Qdrant backend in DocArray now requires Qdrant database v0.8.0 or higher.
The return type of DocumentArray.evaluate() changed from a single score float to a dict mapping metric names to score values.
Persisting a DocumentArray that uses a storage backend now has to be ensured by using the context manager. Therefore, you need to wrap your write operations to a DocumentArray in a context manager like so:
my_da = DocumentArray(storage='my_storage', config=...)
with my_da:
    ...  # write operations
Alternatively, you can call the sync() method when you finish write operations:
my_da = DocumentArray(storage='my_storage', config=...)
...  # write operations
my_da.sync()
The metric and metric_name parameters in the DocumentArray.evaluate() method were renamed and now accept a list rather than a single value (as mentioned above). The old naming and type will be removed soon.
We would like to thank all contributors to this release:
Jie Fu (@jemmyshin)
Wang Bo (@bwanglzu)
Jonathan Rowley (@jonathan-rowley)
Alex Cureton-Griffiths (@alexcg1)
Han Xiao (@hanxiao)
AlaeddineAbdessalem (@alaeddine-13)
Michael Günther (@guenthermi)
samsja (@samsja)
Johannes Messner (@JohannesMessner)
dong xiang (@dongxiang123)
Jackmin801 (@Jackmin801)
Joan Fontanals (@JoanFM)
Published by github-actions[bot] about 2 years ago
0.17.0
Release time: 2022-09-23 16:18:19
This release contains 8 new features, 2 performance improvements, 7 bug fixes, and 2 documentation improvements.
kwargs for load_uri_to_* methods (#540)
The load_uri_to_* methods (load_uri_to_blob, load_uri_to_text, etc.) now accept kwargs, so you can pass a timeout parameter to the underlying request methods.
For example:
from docarray import Document

doc = Document(uri='uri_path')
doc.load_uri_to_blob(timeout=2)
You can now store multiple DocumentArrays in a single Redis instance, as long as each DocumentArray has a different index_name
:
da1 = DocumentArray(storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 128, 'index_name': 'da1'})
da2 = DocumentArray(storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 256, 'index_name': 'da2'})
da3 = DocumentArray(storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 512, 'index_name': 'da3'})
Logging in to Jina Cloud is now required before pushing/pulling DocumentArrays to/from Jina Cloud. You can log in either by creating a token in hub.jina.ai
and setting it as an environment variable (JINA_AUTH_TOKEN=my_token
) or using the CLI command jina auth login
.
cloud_list and cloud_delete methods (#490)
DocumentArray.push will extract metadata about the DocumentArray and send it to Jina Cloud. Although this is transparent to users, it helps with visualization of DocumentArrays in Jina Cloud.
It is also possible to list and delete DocumentArrays in Jina Cloud using the following methods:
DocumentArray.cloud_list(): lists all DocumentArray objects owned by the authenticated user
DocumentArray.cloud_delete(da_name): deletes the DocumentArray with the given name if it is owned by the authenticated user
Full-text search is supported either on the Document.text field or on Document tags, as long as you enable text indexing or specify tag fields to be indexed.
For example:
from docarray import Document, DocumentArray

da = DocumentArray(storage='redis', config={'n_dim': 2, 'index_text': True})
# add documents with a text field
da.extend(
    [
        Document(text='Redis allows you to search by text query,'),
        Document(text='by vector similarity'),
        Document(text='Or by filter conditions'),
    ]
)
da.find('my text query').texts
Result:
['Redis allows you to search by text query,']
$and and $or in Redis (#509)
The Redis backend now supports the $and and $or logical operators. For example:
from docarray import DocumentArray

da = DocumentArray(
    storage='redis',
    config={'n_dim': 128, 'columns': {'col1': 'str', 'col2': 'int'}},
)
redis_filter = {
    "$or": {
        "col1": {"$eq": "value"},
        "col2": {"$lt": 100},
    }
}
# retrieve documents using the filter
da.find(redis_filter)
The columns
configuration parameter for storage backends has been changed from a list of tuples to a dictionary in the following format: {'column_name': 'column_type'}
. This helps with YAML compatibility.
For example:
from docarray import DocumentArray
da = DocumentArray(storage='annlite', config={'n_dim': 128, 'columns': {'col1': 'str', 'col2': 'float'}})
It is now possible to choose which field to use when displaying an image document:
from docarray import Document

d = Document(uri='toydata/test.png')
d.display()
d.display(from_='uri')

or

d.load_uri_to_image_tensor()
d.display(from_='tensor')
Package | Minimum Version |
---|---|
jina-hubble-sdk | 0.13.1 |
annlite | 0.3.12 |

Other API Changes:
The columns configuration parameter for storage backends has been changed from a list of tuples to a dictionary in the format {'column_name': 'column_type'}.
Optimized find with the exists condition (#519)
We got rid of unnecessary and costly computation in DocumentArray.find with an exists filter. When running the following code:
from docarray import DocumentArray, Document
num = 1_000  # illustrative size; the speedup grows with larger arrays
da = DocumentArray(Document(text='text') for _ in range(num)) + DocumentArray(
    Document(blob=b'blob') for _ in range(num)
)
da.find(query={'text': {'$exists': True}})
you should expect a 200-300% speedup in find.
This optimization only affects DocumentArray.find or DocumentArray.match when an exists condition is used with the in-memory document store.
The default journal mode in the SQLite backend is now WAL. This should improve performance when using the SQLite backend.
According to the SQLite docs, WAL is significantly faster, provides more concurrency, and is more robust.
DocumentArray's Redis backend previously initialized schemas in the Redis database with default values of vector similarity search parameters. Those default values came from DocArray, not Redis.
This altered the database's default behavior, although the user didn't explicitly specify that. We've changed the implementation to avoid altering default values of the database. Default values now depend on the Redis database version.
AnnLite introduced a breaking change in 0.3.12
. Therefore, we have adapted our implementation to the latest version of AnnLite and increased the minimum required version to 0.3.12
.
DocumentArray's delete-by-mask operation used to behave unexpectedly. The following code erases the last Document, even though it is not covered by the mask:
from docarray import DocumentArray

da = DocumentArray.empty(3)
mask = [True, False]
del da[mask]
print(len(da))  # prints 1
We have fixed this behavior, and DocumentArray will now correctly keep documents that are not present in the mask.
We've fixed an incorrect link in the documentation.
DocArray type mapping used the wrong types in AnnLite. We've now replaced the types specified in the document store implementation with the correct ones.
Strawberry introduced a breaking change in 0.128.0
, making it necessary to pass parameters as key arguments. We've adapted our code base to this change.
Some parts of in-memory distance computation used to restrict tensor device conversion to cuda
. We've changed the implementation to make device conversion more generic.
We've added a "One Million Benchmark" section to the "Feature Summary" page.
We've updated the pip setup instruction required to use DocumentArray push/pull.
We would like to thank all contributors to this release:
Leon Wolf (@fogx)
samsja (@samsja)
AlaeddineAbdessalem (@alaeddine-13)
Halo Master (@linkerlin)
Han Xiao (@hanxiao)
Wang Bo (@bwanglzu)
Anne Yang (@AnneYang720)
Joan Fontanals (@JoanFM)
Published by github-actions[bot] about 2 years ago
0.16.5
Release time: 2022-09-08 17:56:12
We'd like to thank all contributors for this new release! In particular,
Anne Yang, Jina Dev Bot
[cefb66c0] - redis: add logic operators $and and $or in redis (#509) (Anne Yang)
[404b9731] - version: the next version will be 0.16.5 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.4
Release time: 2022-09-08 15:59:21
We'd like to thank all contributors for this new release! In particular,
Joan Fontanals, Wang Bo, Jina Dev Bot
[fff3ecca] - columns should be a dictionary not list of tuples (#526) (Joan Fontanals)
[531bd835] - fix finetuner link for totally looks like (#532) (Wang Bo)
[4526bc7d] - fix annlite type map (#533) (Joan Fontanals)
[c7105983] - version: the next version will be 0.16.4 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.3
Release time: 2022-09-06 09:46:29
We'd like to thank all contributors for this new release! In particular,
Joan Fontanals, AlaeddineAbdessalem, samsja, Jina Dev Bot
[fa05ec35] - allow choose to display from tensor or from uri in document (#518) (samsja)
[d6152331] - only check if field is set (#519) (AlaeddineAbdessalem)
[e8cc9c56] - create strawberry types with kwargs (#527) (Joan Fontanals)
[0b2326bd] - update seri (#516) (samsja)
[b268f6f3] - ci fix (#520) (AlaeddineAbdessalem)
[4d4fb504] - version: the next version will be 0.16.3 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.2
Release time: 2022-08-30 19:00:32
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Halo Master, Jina Dev Bot
[34bf27f3] - find: make device more generic (#515) (Han Xiao)
[459703e9] - sqlite: change default journal mode to WAL (#506) (Halo Master)
Published by github-actions[bot] about 2 years ago
0.16.1
Release time: 2022-08-29 13:57:21
We'd like to thank all contributors for this new release! In particular,
Jina Dev Bot
[68533181] - version: the next version will be 0.16.1 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.16.0
Release time: 2022-08-29 10:19:26
We'd like to thank all contributors for this new release! In particular,
Han Xiao, AlaeddineAbdessalem, Anne Yang, Joan Fontanals, felix-wang, Johannes Messner, Jina Dev Bot
[c2235de1] - redis: implement Redis storage backend and unit tests (#452) (Anne Yang)
[24be6ba8] - bump protobuf (#371) (Joan Fontanals)
[7c91c7bd] - annlite offsetmapping (#504) (felix-wang)
[be788678] - plot: be robust against non-existing subindices (#503) (Johannes Messner)
[2c17d888] - docs: update docs generation (Han Xiao)
[615fa85c] - include redis in benchmarking script (Han Xiao)
[120135cf] - cleanup ci (#505) (AlaeddineAbdessalem)
[6aaf0e9d] - update readme (Han Xiao)
[c5ff8705] - fix readme (Han Xiao)
[cc88ec28] - version: the next version will be 0.15.5 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.15.4
Release time: 2022-08-25 07:34:28
We'd like to thank all contributors for this new release! In particular,
AlaeddineAbdessalem, samsja, Jina Dev Bot
[51fec4f4] - cap annlite version (#497) (AlaeddineAbdessalem)
[e75ca80d] - bump es version in documentation to match the tests (#498) (samsja)
Published by github-actions[bot] about 2 years ago
0.15.3
Release time: 2022-08-23 12:02:49
We'd like to thank all contributors for this new release! In particular,
Johannes Messner, Han Xiao, Jina Dev Bot
[cb066d83] - update offset2id when deleting in subindex (#496) (Johannes Messner)
[69a976fd] - find: fix misleading signature of _find_by_text (#495) (Han Xiao)
[0dfb5f2c] - version: the next version will be 0.15.3 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.15.2
Release time: 2022-08-19 12:58:37
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot
[b54bb09a] - video: add height_width for webcam caps (#494) (Han Xiao)
[fefc0025] - version: the next version will be 0.15.2 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.15.1
Release time: 2022-08-19 11:18:56
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Johannes Messner, Jina Dev Bot
[c936b18f] - video: add from_webcam generator (#493) (Han Xiao)
[9e00a008] - push meta data along with docarray (Han Xiao)
[461b996f] - set subindices directly via access path (#488) (Johannes Messner)
Published by github-actions[bot] about 2 years ago
0.15.0
Release time: 2022-08-12 19:11:24
Subindices allow you to efficiently search through nested and multimodal Documents, at any nesting level, without loading them into memory.
This works by creating dedicated database indices using subindex_configs=
and searching through a specific subindex using on=
:
from docarray import dataclass, Document, DocumentArray
from docarray.typing import Image, Text

@dataclass
class MyDoc:
    image: Image
    paragraph: Text

da = DocumentArray(
    [
        Document(MyDoc(image='apple.png', paragraph='hello')),
        Document(MyDoc(image='apple.png', paragraph='world')),
    ],
    storage='annlite',
    subindex_configs={'@.[image]': None},  # add subindex for '@.[image]'
)
da['@.[image]'].embeddings = ...  # add image embeddings
da.find(on='@.[image]')  # find closest image; can also take '@c' etc.
Read more about this feature in our docs.
Subindices for all document stores (#456)
Elastic: allow kwargs in .extend() (#473)
Fixes to .plot() (#472, #468)
Published by github-actions[bot] about 2 years ago
0.14.11
Release time: 2022-08-11 20:09:20
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot
[930c75cb] - use hubble sdk for get_token (#485) (Han Xiao)
[f7b8eb89] - version: the next version will be 0.14.11 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.10
Release time: 2022-08-11 15:25:02
We'd like to thank all contributors for this new release! In particular,
Nan Wang, Alvin Prayuda, Han Xiao, Johannes Messner, Jina Dev Bot
[8f66a15e] - elastic: add bulk get operation (#478) (Alvin Prayuda)
[413af099] - elastic: add find kwargs for perf tuning (#481) (Alvin Prayuda)
[ec7dcce1] - subindex for all document stores (#456) (Johannes Messner)
[71274d2d] - retrieve MAX_ES_RETURNED_DOCS docs in bulk operations (#484) (Nan Wang)
[a43b498d] - use hubble sdk for push pull (#482) (Han Xiao)
[fc39c042] - version: the next version will be 0.14.10 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.9
Release time: 2022-08-09 20:59:47
We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot
[30e00322] - image: fix image padding on vertical direction (#480) (Han Xiao)
[23ce6518] - version: the next version will be 0.14.9 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.8
Release time: 2022-08-05 09:17:34
We'd like to thank all contributors for this new release! In particular,
Alvin Prayuda, Jina Dev Bot
[9f939e6f] - elastic: allow kwargs to be passed at extend (#473) (Alvin Prayuda)
[67f60cd7] - version: the next version will be 0.14.8 (Jina Dev Bot)
Published by github-actions[bot] about 2 years ago
0.14.7
Release time: 2022-08-04 12:53:30
We'd like to thank all contributors for this new release! In particular,
Johannes Messner, Jina Dev Bot
[6b4f6fc0] - keep offset2id updated when deleting in sqlite (#471) (Johannes Messner)
[505d7767] - version: the next version will be 0.14.7 (Jina Dev Bot)