deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

MPL-2.0 License

Downloads
56K
Stars
7.8K
Committers
121

deeplake - v3.0.5 🌈

Published by farizrahman4u about 2 years ago

Introducing Deep Lake

We are more than excited to transition into Deep Lake, data lake for deep learning applications. Furthermore we released

Behind the scenes, these are the 5 key stepping stones of Deep Lake.

  1. Version Control: Git for data
  2. Visualize: In-browser visualization engine
  3. Query: Rapid queries with Tensor Query language
  4. Materialize: Format native to deep learning
  5. Stream: Streaming Data Loaders

If you wonder...

  • Why did we rename Hub to Deep Lake?

Hub was originally a chunked array format that naturally evolved to include version control, a streaming engine, and query capabilities as we iterated with community members. The name was too generic to describe the tool and often led to confusion with dataset hubs. Inspired by A. Pinhassi’s blog post, we renamed the package from hub to deeplake.

 > pip3 install deeplake

  • Where does Deep Lakehouse come into place?

The format, including versioning and lineage, is fully open-source. The query, streaming, and visualization engines, built in C++, are still closed-source, but they are accessible to all users through the Python interface. We remain committed to open-source principles and plan to open-source the high-performance engines as they commoditize.

🧭 What's Changed

  • Update README.zh-cn.md (#1910) @tatevikh
  • Update README.md (#1909) @istranic
  • Staging 3.0.5 (#1908) @farizrahman4u
  • Tiling Fix (#1907) @farizrahman4u
  • 3.0.3 (#1906) @farizrahman4u
  • [DL-746] hub->deeplake (#1895) @farizrahman4u
  • [DL-747] API Reference updates: new compressions + new Htypes page (#1892) @FayazRahman
  • Tensor Query Language documentation (#1896) @FayazRahman
  • Added more file formats for compression (#1597) @aadityasinha-dotcom
  • Indra import fix (#1891) @farizrahman4u
  • API Reference updates (#1886) @FayazRahman
  • Update version to 2.8.6 (#1889) @AbhinavTuli

🐛 Bug Fixes

  • Passing token down (#1903) @ProgerDav

⚙️ Who Contributes

@AbhinavTuli, @FayazRahman, @ProgerDav, @aadityasinha-dotcom, @artgish, @davidbuniat, @farizrahman4u, @istranic, @mikayelh and @tatevikh

deeplake - v2.2.1 🌈

Published by davidbuniat almost 3 years ago

🧭 What's Changed

  • Fix indexing and groups (#1426) @aliubimov
  • Tweak tile encoder format (#1435) @AbhinavTuli
  • Remove explicit shared memory conversion (#1437) @aliubimov
  • Fix tiling + pytorch (#1427) @AbhinavTuli
  • Distributed data parallel loader (#1412) @aliubimov
  • Rewrite query (#1415) @aliubimov
  • Fix RCE in video decompression (#1425) @farizrahman4u
  • Enable intellisense for hub compute functions (#1419) @AbhinavTuli
  • [Bug Fix] Stop accessing underlying storage for checking encoder existence (#1417) @AbhinavTuli
  • Video bug fix (#1349) @farizrahman4u
  • Fix windows pickling issue (#1421) @farizrahman4u
  • Delete old pytorch impl (#1420) @farizrahman4u
  • [AL-1610] Json VC Info (#1413) @farizrahman4u
  • removing hub_shm since it is no longer required (#1418) @gautamkrishnar
  • [AL-1613][AL-1549] Send events from Hub to Platform (#1406) @AbhinavTuli
  • [AL-1621] Add skip_ok to hub compute (#1409) @AbhinavTuli
  • [AL-1615][AL-1612] Speedup inplace transform (#1396) @AbhinavTuli
  • [Bug fix] Fix pytorch + version control (#1410) @AbhinavTuli
  • [AL-1631] Commit with custom hash (#1408) @farizrahman4u
  • Fix typo in README.md (#1411) @fabioperez
  • [AL-1579] Auto htype (#1370) @farizrahman4u

⚙️ Who Contributes

@AbhinavTuli, @aliubimov, @davidbuniat, @fabioperez, @farizrahman4u, @gautamkrishnar and @mikayelh

deeplake - Hub is in Beta!

Published by istranic over 3 years ago

What's New

  • Hub core was redesigned to enable blazing-fast dataset creation. You can create a Hub dataset faster than copy/pasting files on your local machine

Features

  • Super simple API
  • Easy creation of datasets and hosting on Activeloop Storage or S3
  • Rapid dataset streaming to any machine
  • Simple dataset integration to pytorch with no boilerplate code (Windows support will be added in the next release)

deeplake - Pre-Release 2.0.1-alpha

Published by imshashank over 3 years ago

Pre-release for Hub 2.0-alpha

deeplake - 2.0 Early Alpha

Published by nollied over 3 years ago

deeplake - 1.3.3 🚀

Published by Diveafall over 3 years ago

🧭 What's Changed

  • to_pytorch now supports a new argument (key_list) that passes only the listed tensors to it, which speeds up iteration when multiple extra tensors are present. (#715) @AbhinavTuli
  • Caching within to_pytorch has been improved for tensors with dynamic shapes (earlier it saved only the current sample in the cache) (#715) @AbhinavTuli
  • Added ability to store DatasetView as a new Dataset (#740) @AbhinavTuli
  • Introduces Windows and MacOS tests to circleci (#719) @haiyangdeperci
  • Benchmark restructuring and memory profiling (#642) @benchislett
  • changed default dtype of classlabel from uint16 to uint8 (#745) @AbhinavTuli
  • Updated humbug version (#728) @zomglings
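
To illustrate why a key_list argument can speed up iteration, here is a minimal pure-Python sketch: when only the requested keys are materialized, the remaining tensors are never read. The `iterate_samples` function and the loader callables are hypothetical stand-ins for illustration; this is not Hub's implementation of to_pytorch.

```python
def iterate_samples(tensors, num_samples, key_list=None):
    """Yield one dict per sample, loading only the requested tensors.

    `tensors` maps a tensor name to a callable that loads sample i
    (a hypothetical interface used only for this sketch).
    """
    keys = list(tensors) if key_list is None else list(key_list)
    unknown = [k for k in keys if k not in tensors]
    if unknown:
        raise KeyError(f"unknown tensors: {unknown}")
    for i in range(num_samples):
        yield {k: tensors[k](i) for k in keys}


# Count how many times each tensor is actually read.
loads = {"image": 0, "label": 0, "metadata": 0}


def make_loader(name):
    def load(i):
        loads[name] += 1
        return f"{name}-{i}"
    return load


tensors = {name: make_loader(name) for name in loads}

batch = list(iterate_samples(tensors, 3, key_list=["image", "label"]))
print(batch[0])
print(loads)  # "metadata" is never read because it is not in key_list
```

Omitting key_list in this sketch falls back to reading every tensor, which mirrors the behaviour the release note describes as slower when extra tensors are present.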

🗂️ Documentation

  • Add examples of dataset generation and modification using transforms, trainings with TensorFlow and PyTorch (#675) @kristinagrig06
  • Added code and testing notebook for running dataset transforms on a ray cluster. (#713) @kristinagrig06

🐛 Bug Fixes

  • Fixed an issue when overwriting transform datasets (#724) @AbhinavTuli

🔗 Dependency Updates

  • Bump boto3 from 1.17.41 to 1.17.43 (#742) @dependabot-preview
  • Bump boto3 from 1.17.40 to 1.17.41 (#734) @dependabot-preview
  • Bump torchvision from 0.9.0 to 0.9.1 (#720) @dependabot-preview
  • Bump boto3 from 1.17.39 to 1.17.40 (#730) @dependabot-preview
  • Bump boto3 from 1.17.36 to 1.17.39 (#726) @dependabot-preview
  • Bump tiledb from 0.8.5 to 0.8.6 (#725) @dependabot-preview

⚙️ Who Contributed

@AbhinavTuli, @Diveafall, @benchislett, @haiyangdeperci, @imshashank, @kristinagrig06 and @zomglings

deeplake - 1.3.2

Published by AbhinavTuli over 3 years ago

🚀 New

  • Auto infer-schema & auto-directory ingestion! (#696) @McCrearyD
  • Added a hello objectron notebook (#694) @haiyangdeperci
  • Added ability to specify region in S3 (#715) @kevinlu1211
  • CSV parsing added to hub.auto (#711) @dhiganthrao
  • Added genomelake hub backend benchmarks (#680) @DebadityaPal
  • Added unit test for utils.py (#668) @hakanbakacak

🧭 What's Changed

  • to_tensorflow now supports a new argument (key_list) that passes only the listed tensors to it, which speeds up iteration when multiple extra tensors are present. (#689) @AbhinavTuli
  • Caching within to_tensorflow has been improved for tensors with dynamic shapes (earlier it saved only the current sample in the cache) (#689) @AbhinavTuli
  • Adds the option to specify None as compressor while defining the schema (#689) @AbhinavTuli
  • Adds the ability to slice dynamically shaped tensors and obtain a list instead of iterating over them one by one. (#689) @AbhinavTuli
  • transform logic has been modified to work properly with multiple workers (#689) @AbhinavTuli
  • Added tags to usage and crash reports (#697) @zomglings
  • Added ipynb file with benchmark tests for dnafrag package (#676) @DebadityaPal
  • Relaxed hub requirements (#659) @haiyangdeperci
  • Updated Objectron dataset tensors from generic types to hub schema representations (#705) @haiyangdeperci

🐛 Bug Fixes

  • Removed mutable default args in client/base.py (#699) @TakshPanchal
  • Fixes windows environment encoding (#671) @haiyangdeperci
  • Fix/windows setup (#650) @haiyangdeperci
  • Fixed README links (#682) @DebadityaPal
  • Any dataset copy test that got interrupted midway through the test affected all subsequent test runs. This has now been fixed. (#689) @AbhinavTuli
  • Fixed issue with resize in mode='a' (#718) @kristinagrig06
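
The mutable-default-argument fix (#699) addresses a general Python pitfall: a default such as `{}` or `[]` is created once, when the function is defined, and is then shared by every call. The sketch below is illustrative only; the function names are hypothetical and are not taken from client/base.py.

```python
def add_header_buggy(key, value, headers={}):
    # The default dict is created once and reused by every call.
    headers[key] = value
    return headers


def add_header_fixed(key, value, headers=None):
    # Use None as a sentinel and build a fresh dict on each call.
    if headers is None:
        headers = {}
    headers[key] = value
    return headers


first = add_header_buggy("a", 1)
second = add_header_buggy("b", 2)
print(first is second)  # True: state leaks between unrelated calls

print(add_header_fixed("a", 1))
print(add_header_fixed("b", 2))
```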

🗂 Documentation

  • Russian translation for README (#656) @george-zakharov
  • Update schema docs (#654) @thisiseshan
  • Add Tutorial for Working with Text on Hub (#672) @dhiganthrao
  • include consent language in readme (#666) @mynameisvinn

🔗 Dependency Updates

  • Bumped humbug dependency version to ">=0.1.6" (#673) @zomglings
  • Update zarr requirement from <2.7,>=2.4 to >=2.4,<2.8 (#717) @dependabot-preview
  • Bump boto3 from 1.17.33 to 1.17.36 (#716) @dependabot-preview
  • Bump boto3 from 1.17.30 to 1.17.33 (#701) @dependabot-preview
  • Bump tensorflow from 2.4.0 to 2.4.1 (#706) @dependabot-preview
  • Bump sphinx from 3.5.2 to 3.5.3 (#707) @dependabot-preview
  • Bump tiledb from 0.7.6 to 0.8.5 (#703) @dependabot-preview
  • Bump flake8 from 3.8.4 to 3.9.0 (#686) @dependabot-preview
  • [Security] Bump tensorflow from 2.3.1 to 2.4.0 (#332) @dependabot-preview
  • Bump pytest-cov from 2.10.1 to 2.11.1 (#474) @dependabot-preview
  • Bump boto3 from 1.17.22 to 1.17.30 (#693) @dependabot-preview

⚙️ Who Contributed

@AbhinavTuli, @DebadityaPal, @McCrearyD, @TakshPanchal, @dependabot-preview, @dependabot-preview[bot], @dhiganthrao, @george-zakharov, @haiyangdeperci, @hakanbakacak, @kevinlu1211, @kristinagrig06, @madhucharan, @mynameisvinn, @thisiseshan, @zomglings

deeplake - 1.3.1

Published by AbhinavTuli over 3 years ago

Release notes are identical to those of 1.3.2 above.

deeplake - 1.3.0

Published by AbhinavTuli over 3 years ago

🧭 What's Changed

  1. Version Control has been added to Hub Datasets! (#610) @AbhinavTuli
  2. to_tensorflow now properly supports Text datasets (#658) @AbhinavTuli
  3. Hub crash and system information reports using Bugout (#624) @zomglings
  4. Added support for multiple BBox and Classlabel, instead of Sequences. (#658) @AbhinavTuli
  5. CLI name has been changed from hub to activeloop (#631) @haiyangdeperci
  6. Notebook example for creating dataset for object detection and instance segmentation added (#629) @haritsahm
  7. Tutorial for working with Audio Added (#592) @mynameisvinn

🚀 New

  1. Hub version command cli (#628) @sparkingdark
  2. Automatic Release Drafter added to repository (#598) @Anselmoo
  3. Improve Directory Structure of Examples (#630) @SauravMaheshkar
  4. Put zarr, tileDB, and hub benchmarks in one file (#534) @DebadityaPal
  5. Refactored Dataset Class (#576) @DebadityaPal
  6. Add Github Actions CI pipeline (#372) @ADI10HERO

🐛 Bug Fixes

  1. Removed Assertions from shape_detector.py and added exceptions (#616) @DebadityaPal
  2. Adds support for dataset views in sharded dataset (#557) @AbhinavTuli
  3. Advanced slicing added for Sharded Dataset (#558) @AbhinavTuli
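
Replacing assertions with exceptions (#616) matters in Python because `assert` statements are stripped when the interpreter runs with `-O`, so validation written as assertions can silently disappear. A minimal sketch of the pattern follows, using a hypothetical `check_shape` helper rather than the actual shape_detector.py code:

```python
def check_shape(shape, max_dims=4):
    """Validate a shape tuple with explicit exceptions instead of assert."""
    if not isinstance(shape, tuple):
        raise TypeError(f"shape must be a tuple, got {type(shape).__name__}")
    if len(shape) > max_dims:
        raise ValueError(f"shape has {len(shape)} dims, at most {max_dims} allowed")
    if any(not isinstance(d, int) or d < 0 for d in shape):
        raise ValueError(f"shape entries must be non-negative ints: {shape}")
    return shape


print(check_shape((28, 28, 3)))

try:
    check_shape([28, 28])  # a list, not a tuple
except TypeError as exc:
    print("rejected:", exc)
```

Unlike an assertion, these checks still run under `python -O`, and callers can catch the specific exception type.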

🗂 Documentation

  1. README added in Korean (#621) @HyeongminLEE
  2. README added in Bahasa Indonesia (#645) @haritsahm
  3. README added in French (#640) @MargauxMasson
  4. README added in Turkish (#608) @hakanbakacak
  5. Chinese Readme Proofread and Update (#613) @Cynthia7979
  6. Change ds.commit() to ds.flush() throughout in README.md (#619) @galbwe
  7. Added explanation for local file system to docs (#634) @McCrearyD
  8. Replaced commit() with flush() in documentation. (#604) @dhiganthrao
  9. Add MinIO to Data Storage docs (#605) @gabriel-milan
  10. Updated example notebooks with pip (#585) @MojammelHossain
  11. Typos fixed (#591) @dPacc

🔗 Dependency Updates

  1. Bump pytest from 6.2.1 to 6.2.2 (#496) @dependabot-preview
  2. Bump ray from 1.0.0 to 1.2.0 (#554) @dependabot-preview
  3. Bump boto3 from 1.16.39 to 1.17.20 (#646) @dependabot-preview

⚙️ Who Contributed

@ADI10HERO, @AbhinavTuli, @Anselmoo, @Cynthia7979, @DebadityaPal, @HyeongminLEE, @MargauxMasson, @McCrearyD, @MojammelHossain, @SauravMaheshkar, @dPacc, @davidbuniat, @dhiganthrao, @gabriel-milan, @galbwe, @haiyangdeperci, @hakanbakacak, @haritsahm, @imshashank, @mikayelh, @mynameisvinn, @sparkingdark and @zomglings

deeplake - 1.2.3

Published by AbhinavTuli over 3 years ago

Release Notes

  1. Reverting shape checks for Mask schema to maintain backward compatibility.

deeplake - 1.2.2

Published by AbhinavTuli over 3 years ago

Release Notes

  1. Hotfix for a bug that resulted in incorrect slicing of TensorView.

deeplake - 1.2.1

Published by AbhinavTuli over 3 years ago

Release Notes

  1. Dataset copying has been added allowing you to copy your own and other users' datasets easily. Datasets can be copied across gcs, s3, aws, local storage and hub storage. #454 (@AbhinavTuli)
  2. Many improvements to the benchmarks #508 #512 #531 #545 #550 (@haiyangdeperci @DebadityaPal)
  3. Development Roadmap added #511 (@mynameisvinn)
  4. Improved message for Hub transforms by displaying shard size #523 (@DebadityaPal)
  5. All windows have now been fixed. #528 (@AbhinavTuli)
  6. Hub dataset filtering has been overhauled and a section has been added for the same in the documentation #539 (@AbhinavTuli)
  7. to_tensorflow issues with Datasets containing Sequences (such as coco) have been fixed #540 (@AbhinavTuli)
  8. Adds get_label parameter to .compute() and .numpy(), to directly retrieve string label from ClassLabel #489 (@DebadityaPal)
  9. Tutorial added for using Hub with Hugging Face transformers #536 (@DebadityaPal)
  10. Some unit tests have now been parameterized to cover multiple datatypes #527 (@drewpotter)
  11. From directory function has been implemented to directly ingest categorical image data #459 (@sparkingdark)
  12. Example use case added for creating a Hub dataset for Deep Learning prediction of crop yield #559 (@MargauxMasson)
  13. MPL Headers have been added to source files #494 (@KrishnaChaitanya1)

deeplake - 1.2.0

Published by AbhinavTuli over 3 years ago

Release Notes

  1. Adds support for dataset filtering (#460)(@AbhinavTuli)
  2. Greatly improves to_tensorflow performance (#481) (@AbhinavTuli)
  3. Benchmarks added for Hub 1.x (#486) (@benchislett)
  4. Fixes a bug that caused issues on windows machines (#472)(@FayazRahman)
  5. Fixes a bug that caused issues with TF 2.4.0 (#478) (@DebadityaPal)
  6. Fixes docker build issue (#463) (@Darkborderman)
  7. Added Chinese readme (#458) (@EYH0602)
  8. Better automatic determination of Dataset mode depending on permissions (#466)(@edogrigqv2)
  9. CoLA dataset uploaded to Hub, upload script added to examples (#487)(@mynameisvinn)
  10. Fixes a bug with dataset slicing (#480) (@AbhinavTuli)
  11. Adds support for custom s3 endpoints (including MinIO) (#482) (@AbhinavTuli)
  12. Adds the ability to set a name to a dataset so it appears better on the visualizer (#468) (@AbhinavTuli)

deeplake - 1.1.3

Published by AbhinavTuli almost 4 years ago

Fixes an issue in to_pytorch when using a dataset that the user doesn't own.

deeplake -

Published by davidbuniat almost 4 years ago

Release Notes

  • Custom s3storage that is 5-10x faster than S3FS
  • Faster pytorch dataset with current chunk logic
  • Fixed caching with in-memory per process without LMDB
  • Better Exception handling for loading a dataset, shape and type checks, casting
  • Added examples, tutorials, and better GitHub issue handling
  • Add the opportunity to fill in additional information about the dataset such as description, license, citation
  • Native support with .compute() in the middle for nested tensors

Contributors include: @edogrigqv2 @AbhinavTuli @mynameisvinn @Anselmoo @sparkingdark @sanchitvj @Atom-101

deeplake - Release v1.0.7

Published by davidbuniat almost 4 years ago

Private dataset support
Improved error handling and exceptions
Test coverage increased from 73% to 80%
Various bug fixes
Transform speedup of ~2x, so the from_x converters work faster

deeplake - Version 1.0.6

Published by AbhinavTuli almost 4 years ago

Fixes some issues with segmentation and RAM issues in transform
