deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

MPL-2.0 License

Downloads: 56K
Stars: 7.8K
Committers: 121


deeplake - v3.1.2 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • [DL-888] Dataset copying speedup and fixes (#2005) @FayazRahman
  • Do not hide S3 access errors (#1884) @daniel-falk
  • [DL-905] [DL-916] Consistent progressbar arg + example for decode_method (#2021) @FayazRahman

⚙️ Who Contributes

@FayazRahman and @daniel-falk

deeplake - v3.1.1 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • Mmdet integration (#2026) @adolkhan
  • Allow persistent workers in dataloader (#2028) @AbhinavTuli
  • [AL-2012] speedup pop element from dataset (#2024) @levongh
  • [AL-2036] remove tiled image extraction (#2017) @levongh
  • Handle repeated samples in shuffle (#2018) @AbhinavTuli
  • [DL-910] Tensorflow iteration fix (#2013) @FayazRahman

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @adolkhan and @levongh

deeplake - v3.1.0 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • [DL-896] pip install deeplake[enterprise] (#2008) @farizrahman4u
  • [AL-2017] Add decode method to Pytorch API (#1991) @AbhinavTuli
  • [DL-885] Fix iteration warnings (#1989) @FayazRahman
  • [CUS-35] Fix merging class labels when class names aren't populated (#2007) @AbhinavTuli
  • Allow np.array as sampler weights. Update docs. (#1999) @khustup
  • [DL-893] Fast UUID + speedup sample id tensor (#1988) @farizrahman4u
  • [AL-2024] Add MPL license to Deep Lake in Pypi (#1998) @AbhinavTuli

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @farizrahman4u and @khustup

deeplake - v3.0.18 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • Bump libdeeplake version to fix dataloader crash over multiple epochs (#2000) @AbhinavTuli
  • [DL-811] [DL-857] API reference updates (#1977) @FayazRahman

⚙️ Who Contributes

@AbhinavTuli and @FayazRahman

deeplake - v3.0.17 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • [CUS-32] Fix dataloader behaviour for json and list tensors (#1995) @AbhinavTuli
  • [CUS-30] Add support for bytes in json tensors (#1994) @AbhinavTuli
  • Add timeout to Pypi version check (#1996) @AbhinavTuli

⚙️ Who Contributes

@AbhinavTuli

deeplake - v3.0.16 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • Libdeeplake update to fix issue with linked tensors on certain systems (#1992) @levongh
  • [AL-1850] [CUS-29] Version control diff and merge improvements (#1862) @AbhinavTuli
  • Adds support for sampling. (#1987) @khustup
  • [DL-879] Improve download API (#1986) @FayazRahman
  • [AL-1992] [CUS-18] Fixes token expiration issue using hub:// datasets (#1983) @AbhinavTuli
  • Mesh & Point Cloud htype's docs (#1979) @adolkhan

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @adolkhan, @khustup and @levongh

deeplake - v3.0.15 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • Serve link creds for non deeplake datasets in ds.visualize (#1974) @khustup
  • [DL-790] Speedup extend (#1936) @farizrahman4u

⚙️ Who Contributes

@farizrahman4u and @khustup

deeplake - v3.0.14 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • [AL-2010] Fixes verification of linked samples during rechunking (#1980) @AbhinavTuli
  • No Wheels (fix for pip install on Windows) (#1976) @farizrahman4u
  • [AL-2011] Fixes a bug with popping samples (#1975) @AbhinavTuli
  • [AL-1964] Expose path for linked tensors (#1963) @AbhinavTuli
  • [DL-759] Deeplake connect (#1951) @ProgerDav

⚙️ Who Contributes

@AbhinavTuli, @ProgerDav and @farizrahman4u

deeplake - v3.0.13 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • Update libdeeplake version (#1970) @AbhinavTuli
  • Update shuffle buffer to handle bytes (#1968) @AbhinavTuli

⚙️ Who Contributes

@AbhinavTuli

deeplake - v3.0.12 🌈

Published by github-actions[bot] almost 2 years ago

🧭 What's Changed

  • Libdeeplake fixes and improvements (#1964) @AbhinavTuli
    • Greatly improves performance when working with compressed jpeg and png data
    • Experimental dataloader transforms now receive PIL images instead of numpy arrays, so the ToPILImage transform should no longer be included
    • Fixes deadlocking issue when multiple nested dataloaders are created
    • Fixed unexpected segmentation faults
    • Added wheels for CentOS
    • Added wheels for arm64 and x86_64 (fixed linking errors during lib import)
  • [DL-819] Add error messages related to user not being logged in (#1955) @adolkhan
  • [DL-804] Don't support group.info (#1960) @FayazRahman
  • [DL-782] Delete temp tensors in case append fails during transforms (#1924) @FayazRahman
  • Improves experimental dataloader performance for tensors with jpeg and png images (#1961) @AbhinavTuli
  • [AL-1999] [Bug fix] Info not being updated after using Deep Lake compute on dataset (#1956) @AbhinavTuli
  • Fixed shape polygon fix (#1959) @FayazRahman
  • [DL-821] Fix allowing commit on views (#1953) @farizrahman4u
  • [DL-814][CUS-14][CUS-17] Pytorch fixes (#1949) @farizrahman4u
  • [CUS-22] Update query and htypes api reference (#1948) @FayazRahman
  • [CUS-24] Fix polygons bug with fixed shape inputs (#1950) @farizrahman4u
  • [DL-756] Log loading creds except in transforms (#1937) @FayazRahman
  • [DL-706] Improve speed of materialization (#1902) @adolkhan
  • [AL-1990] Add shuffle argument to .shuffle for experimental dataloader (#1942) @levongh
  • [DL-726][DL-789] Ignore corrupt tensors + fetch_chunks for .data(), .text() etc (#1932) @farizrahman4u
  • [DL-798] Fix partial read skip for chunk compressed chunks (#1939) @farizrahman4u

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat, @farizrahman4u, @istranic and @levongh

deeplake - v3.0.10 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • libdeeplake upgrade (#1938) @davidbuniat
    • Query shape(image) bug fixed
    • Query regex for the contains function deployed. Example: SELECT * WHERE contains(labels, 'an') on imagenet returns all samples whose class names contain 'an'. Two wildcards are supported: * matches any number of characters (including 0) and ? matches exactly one character.
  • fix read for wav compressed audio (#1935) @gorinars
  • [DL-730] Make sure hub.list does not report the token to bugout (#1917) @adolkhan
  • Update Deep Lake version after release (#1934) @AbhinavTuli
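
The wildcard rules described for contains in the libdeeplake upgrade note above can be illustrated with a small standalone sketch. This is not the query engine's implementation (that code is closed source). It only mirrors the stated semantics with Python's re module. The helper names wildcard_to_regex and contains, the sample class names, and the assumption that the pattern may match anywhere inside the text are all hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression.

    '*' matches any number of characters (including 0), '?' matches
    exactly one character, and every other character is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def contains(text: str, pattern: str) -> bool:
    # Assumption: the pattern may match anywhere in the text,
    # which is how the 'an' example in the release note reads.
    return wildcard_to_regex(pattern).search(text) is not None


# Hypothetical class names standing in for dataset labels.
class_names = ["banana", "crane", "tiger", "swan"]
matches = [name for name in class_names if contains(name, "an")]
```

With these inputs, matches holds the names that contain 'an', and a pattern such as 's?an' only matches where exactly one character sits between 's' and 'an'.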

⚙️ Who Contributes

@AbhinavTuli, @adolkhan, @davidbuniat, @gorinars and [email protected]

deeplake - v3.0.9 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • Update libdeeplake version (#1933) @AbhinavTuli
  • [DL-764] API reference updates (#1929) @FayazRahman
  • Fix region issue with activeloop storage datasets (#1930) @AbhinavTuli
  • [DL-755] Specify transform kwargs in ds.pytorch call (#1925) @farizrahman4u
  • [DL-783] Rich compatibility (#1926) @farizrahman4u

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman and @farizrahman4u

deeplake - v3.0.8 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • libdeeplake update to fix memory issues (#1927) @AbhinavTuli
  • [DL-777] Polygons bug fix (#1922) @farizrahman4u
  • Variable local cache prefix (#1839) @GMW99
  • [DL-763] Locking fix (#1921) @farizrahman4u
  • [DL-701] Columnar views (#1912) @farizrahman4u

⚙️ Who Contributes

@AbhinavTuli, @GMW99 and @farizrahman4u

deeplake - v3.0.7 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • Updated libdeeplake version, removes torch as dependency, fixes issue with strings in dataloader (#1919) @AbhinavTuli
  • [DL-753] [DL-722] Fix appending linked data with verify=False (#1914) @FayazRahman
  • Allow tensorflow dataset to fetch chunks (#1887) @daniel-falk
  • [DL-754] Add reporting for W&B integration (#1918) @FayazRahman

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @daniel-falk, @davidbuniat and @mikayelh

deeplake - v3.0.6 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • Update libdeeplake version to fix issue with distributed mode (#1915) @AbhinavTuli
  • [AL-1967] Fixes issue with readonly mode error raised despite not trying to write to dataset (#1911) @AbhinavTuli

⚙️ Who Contributes

@AbhinavTuli and @davidbuniat

deeplake - v3.0.5 🌈

Published by farizrahman4u about 2 years ago

Introducing Deep Lake

We are more than excited to transition into Deep Lake, data lake for deep learning applications. Furthermore we released

Behind the scenes, there are 5 key stepping stones of Deep Lake.

  1. Version Control: Git for data
  2. Visualize: In-browser visualization engine
  3. Query: Rapid queries with Tensor Query language
  4. Materialize: Format native to deep learning
  5. Stream: Streaming Data Loaders

If you wonder...

  • Why did we rename Hub to Deep Lake?

Hub was originally a chunked array format that naturally evolved to include version control, a streaming engine, and query capabilities as we iterated with community members. The name was too generic to describe the tool and often led to confusion with dataset hubs. Inspired by A. Pinhassi's blog post, we renamed the package from hub to deeplake.

 > pip3 install deeplake

  • Where does Deep Lakehouse come into play?

The format, including versioning and lineage, is fully open-source. The query, streaming and visualization engines, built in C++, are still closed source, but they are accessible to all users through the Python interface. While committed to open-source principles, we plan to open-source the high-performance engines as they commoditize.

🧭 What's Changed

  • Update README.zh-cn.md (#1910) @tatevikh
  • Update README.md (#1909) @istranic
  • Staging 3.0.5 (#1908) @farizrahman4u
  • Tiling Fix (#1907) @farizrahman4u
  • 3.0.3 (#1906) @farizrahman4u
  • [DL-746] hub->deeplake (#1895) @farizrahman4u
  • [DL-747] API Reference updates: new compressions + new Htypes page (#1892) @FayazRahman
  • Tensor Query Language documentation (#1896) @FayazRahman
  • Added more file formats for compression (#1597) @aadityasinha-dotcom
  • Indra import fix (#1891) @farizrahman4u
  • API Reference updates (#1886) @FayazRahman
  • Update version to 2.8.6 (#1889) @AbhinavTuli

🐛 Bug Fixes

  • Passing token down (#1903) @ProgerDav

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @ProgerDav, @aadityasinha-dotcom, @artgish, @davidbuniat, @farizrahman4u, @istranic, @mikayelh and @tatevikh

deeplake - v2.8.5 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • [DL-717] Add installation instructions to API reference (#1882) @FayazRahman
  • [DL-702] API reference updates (#1883) @FayazRahman
  • [DL-711] Allow view optimization when read_only=True (#1865) @farizrahman4u
  • Fixes bug with is_sequence (#1880) @AbhinavTuli
  • [DL-714] Add Ellipsis support for indexing (#1878) @farizrahman4u
  • [DL-645] Fix memory leak in transforms (#1871) @adolkhan
  • [DL-715] Fix wandb integration path issue (#1879) @farizrahman4u
  • Add docstrings for experimental features (#1876) @levongh
  • [DL-693] Disable label sync for dataset copy transform (#1875) @FayazRahman
  • [DL-709] Docker build fix (#1860) @farizrahman4u
  • Improve indra error message in case of missing dependencies (#1873) @farizrahman4u
  • [DL-710] Fix locking issue with deepcopy (#1864) @farizrahman4u

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat, @farizrahman4u and @levongh

deeplake - v2.8.4 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • Fixes import issue on Python 3.10 (#1867) @adolkhan
  • Big speedup for experimental dataloader initialization (#1869) @AbhinavTuli
  • Adds docstrings for experimental features (#1868) @levongh

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat and @levongh

deeplake - v2.8.3 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • Fixes type mismatch for expiration (#1858) @levongh
  • Flag to disable wandb integration (#1863) @farizrahman4u
  • Fixes wandb+local datasets (#1861) @hakanardo
  • [DL-668] Make pytorch() work with views (#1855) @farizrahman4u
  • [AL-1949] Make experimental pytorch dataloader consistent with existing implementation (#1853) @AbhinavTuli
  • [DL-650] Better error handling when not passing a tensor name to ds.append (#1817) @adolkhan
  • Update docs URL in readme (#1857) @FayazRahman
  • Speedup conversion of hub storage datasets->deeplake for experimental features (#1856) @levongh
  • [DL-611] New API reference (#1830) @FayazRahman
  • Wandb update: report datasets created with deepcopy (#1848) @farizrahman4u
  • [Bugfix] 1828 raising UserNotLoggedInException when invalid path is provided (#1829) @adolkhan
  • [DL-655] Added min and max length options (#1841) @adolkhan

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat, @farizrahman4u, @hakanardo and @levongh

deeplake - v2.8.1 🌈

Published by github-actions[bot] about 2 years ago

🧭 What's Changed

  • Ensure that new format for chunk id isn't used for encoders with version <= 2.7.6 (#1850) @AbhinavTuli
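
A version gate like the one named in this fix can be sketched as follows. The actual check in the library is not shown in these notes, so the function names parse_version and uses_new_chunk_id_format are hypothetical, and the sketch assumes plain dotted numeric versions. The point it illustrates is comparing versions as integer tuples, since a string comparison would rank "2.10.0" below "2.7.6".

```python
# Threshold taken from the release note: encoders with version <= 2.7.6
# must keep the old chunk id format.
LAST_OLD_FORMAT_VERSION = (2, 7, 6)


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '2.8.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def uses_new_chunk_id_format(encoder_version: str) -> bool:
    # Hypothetical gate: only encoders newer than 2.7.6 may use the new format.
    return parse_version(encoder_version) > LAST_OLD_FORMAT_VERSION
```

For example, uses_new_chunk_id_format("2.7.6") is False while uses_new_chunk_id_format("2.10.0") is True.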

⚙️ Who Contributes

@AbhinavTuli and @davidbuniat