Async bulk data ingestion and querying in various document, graph, and vector databases via their Python clients
MIT License
Full Changelog: https://github.com/prrao87/db-hub-fastapi/compare/0.10.0...0.10.1
Published by prrao87 about 1 year ago
Full Changelog: https://github.com/prrao87/db-hub-fastapi/compare/0.9.2...0.10.0
Published by prrao87 about 1 year ago
Full Changelog: https://github.com/prrao87/db-hub-fastapi/compare/0.9.1...0.9.2
Published by prrao87 about 1 year ago
Improved the bulk indexer for Meilisearch and compared its performance with the sync version of the Python client. The async client clearly performs better, and the gap widens as the dataset grows.
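A minimal sketch of the async bulk-indexing pattern, assuming the `meilisearch-python-sdk` package's `AsyncClient`; the URL, master key, and `wines` index name below are placeholders, not necessarily the repo's actual values:

```python
import asyncio
from typing import Any, Iterator


def chunked(items: list[Any], size: int) -> Iterator[list[Any]]:
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i : i + size]


async def bulk_index(docs: list[dict], batch_size: int = 10_000) -> None:
    # Assumes the async client from the meilisearch-python-sdk package;
    # URL, key and index name are illustrative placeholders.
    from meilisearch_python_sdk import AsyncClient

    async with AsyncClient("http://localhost:7700", "masterKey") as client:
        index = client.index("wines")
        # Fire off one add_documents task per batch instead of awaiting each
        # sequentially -- this concurrency is where async beats the sync client.
        await asyncio.gather(
            *(index.add_documents(batch) for batch in chunked(docs, batch_size))
        )
```

The batching helper keeps each request payload bounded, while `asyncio.gather` lets all batch requests be in flight at once.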
Full Changelog: https://github.com/prrao87/db-hub-fastapi/compare/0.9.0...0.9.1
Published by prrao87 about 1 year ago
Finished the update to Pydantic v2.
Full Changelog: https://github.com/prrao87/db-hub-fastapi/compare/0.8.3...0.9.0
Published by prrao87 about 1 year ago
Full Changelog: https://github.com/prrao87/async-db-fastapi/compare/0.8.2...0.8.3
Published by prrao87 over 1 year ago
Full Changelog: https://github.com/prrao87/async-db-fastapi/compare/0.8.1...0.8.2
Published by prrao87 over 1 year ago
Full Changelog: https://github.com/prrao87/async-db-fastapi/compare/0.8.0...0.8.1
Published by prrao87 over 1 year ago
Full Changelog: https://github.com/prrao87/async-db-fastapi/compare/0.7.0...0.8.0
Published by prrao87 over 1 year ago
Published by prrao87 over 1 year ago
Added code for Qdrant, a vector database built in Rust. Includes:
- Bulk indexing of both the data and the associated vectors (sentence embeddings), generated with sentence-transformers, into Qdrant so that we can perform similarity search on phrases.
The sentence-transformers model used is multi-qa-distilbert-cos-v1. As per the docs, "This model was tuned for semantic search: Given a query/question, it can find relevant passages. It was trained on a large and diverse set of (question, answer) pairs."

ONNX does appear to utilize all available CPU cores when processing the text and generating the embeddings (the image below was generated on an AWS EC2 T2 Ubuntu instance with a single 4-core CPU).
On average, the entire wine reviews dataset of 129,971 reviews is vectorized and ingested into Qdrant in 34 minutes via the quantized ONNX model, as opposed to more than 1 hour for the regular sbert model downloaded from the sentence-transformers repo. The quantized ONNX model is also ~33% smaller in size than the original model.

- sbert model: processes roughly 51 items/sec
- onnxruntime model: processes roughly 92 items/sec

This amounts to a roughly 1.8x reduction in indexing time, with a ~26% smaller (quantized) model that loads and processes results faster. To verify that the embeddings from the quantized model are of similar quality, some example cosine similarities are shown below.
The following results are for the sentence-transformers/multi-qa-MiniLM-L6-cos-v1 model, which was built for semantic similarity tasks.
```
---
Loading vanilla sentence transformer model
---
Similarity between 'I'm very happy' and 'I am so glad': [0.74601071]
Similarity between 'I'm very happy' and 'I'm so sad': [0.6456476]
Similarity between 'I'm very happy' and 'My dog is missing': [0.09541589]
Similarity between 'I'm very happy' and 'The universe is so vast!': [0.27607652]
---
Loading quantized ONNX model
---
The ONNX file model_optimized_quantized.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
Similarity between 'I'm very happy' and 'I am so glad': [0.74153285]
Similarity between 'I'm very happy' and 'I'm so sad': [0.65299551]
Similarity between 'I'm very happy' and 'My dog is missing': [0.09312761]
Similarity between 'I'm very happy' and 'The universe is so vast!': [0.26112114]
```
As can be seen, the similarity scores are very close to the vanilla model's, while the quantized model is ~26% smaller and processes sentences much faster on the same CPU.
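The scores above are cosine similarities between the two sentences' embedding vectors. As a minimal, dependency-free sketch (pure Python, no sentence-transformers required), cosine similarity itself is just a normalized dot product:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Vectors pointing the same direction score ~1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ≈ 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In practice the vectors `a` and `b` would be the model's sentence embeddings, and a vector database like Qdrant performs this comparison at scale over the indexed vectors.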
Published by prrao87 over 1 year ago
srsly is a fast and lightweight JSON serialization library from Explosion. It cuts down on pip install time, and reduces the number of lines of code quite significantly.
- Use srsly to read gzipped JSONL.
- For Meilisearch, the settings specification is moved over to a settings.json file to keep things clean and easy to find, all in one place.
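The gzipped-JSONL reading that srsly wraps in a single call boils down to the following stdlib pattern (a sketch of the equivalent behavior, not the library's actual implementation; the file name and fields are made up):

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterator


def read_gzip_jsonl(path: Path) -> Iterator[dict]:
    """Lazily yield one JSON object per line from a gzipped JSONL file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Round-trip demo with a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "wines.jsonl.gz"
    with gzip.open(path, mode="wt", encoding="utf-8") as f:
        f.write('{"id": 1, "title": "wine one"}\n{"id": 2, "title": "wine two"}\n')
    docs = list(read_gzip_jsonl(path))
    print(len(docs))  # → 2
```

Reading lazily (one record per iteration) keeps memory flat even for large dumps, which matters when the dataset has six figures of records.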
Published by prrao87 over 1 year ago
This release contains updates and enhancements from #15 and #16.
#15 results in a ~4x reduction in indexing time for Meilisearch. The key changes are as follows:
- Process files concurrently (via concurrent.futures), avoiding sequential execution.
- aiofiles was also tried to process files in an async fashion, but the bottleneck seems to be with the validation in Pydantic, not with file I/O.
Published by prrao87 over 1 year ago
Published by prrao87 over 1 year ago
#8 adds Meilisearch, a fast and responsive search engine database written in Rust. Like the other databases in this repo, the async Python client is used to bulk-index the dataset into the db and async queries are used in FastAPI. The following tasks are implemented:
- Environment variables are specified via .env.example.
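A `.env.example` for a Meilisearch-backed service typically looks something like the fragment below; the variable names here are illustrative guesses, and the repo's actual file is the source of truth:

```
MEILI_URL=http://localhost:7700
MEILI_MASTER_KEY=changeme
MEILI_PORT=7700
```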
Published by prrao87 over 1 year ago
Includes updates from #5 and #6.
This release introduces Elasticsearch indexing and API code to the repo.
- Creates the wines alias and its associated index in Elasticsearch.
Published by prrao87 over 1 year ago
This release is for https://github.com/prrao87/async-db-fastapi/pull/4.
Published by prrao87 over 1 year ago
- Use uvloop to speed up the async event loop (the AsyncGraphDatabase driver already uses this).
Published by prrao87 over 1 year ago
This version adds support for ingesting the wine reviews dataset into Neo4j, a graph database, in an async fashion. In addition, it also provides a query API written in FastAPI that allows the user to send queries via available endpoints. As usual in FastAPI, the API is documented via OpenAPI specs.
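A minimal sketch of the async ingestion pattern, assuming the official `neo4j` driver's `AsyncGraphDatabase` and an UNWIND-based batch write; the Cypher query, `Wine` label, URI, and credentials are illustrative, not the repo's actual schema:

```python
# Illustrative Cypher: write a whole batch of records in one round trip
BATCH_QUERY = """
UNWIND $records AS rec
MERGE (w:Wine {id: rec.id})
SET w.title = rec.title
"""


def batched(records: list[dict], size: int) -> list[list[dict]]:
    """Split records into fixed-size batches for UNWIND writes."""
    return [records[i : i + size] for i in range(0, len(records), size)]


async def ingest(records: list[dict], batch_size: int = 1000) -> None:
    # Assumes the official async Neo4j Python driver; URI and credentials
    # below are placeholders.
    from neo4j import AsyncGraphDatabase

    driver = AsyncGraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password")
    )
    async with driver.session() as session:
        for batch in batched(records, batch_size):
            # Keyword arguments to run() become Cypher query parameters
            await session.run(BATCH_QUERY, records=batch)
    await driver.close()
```

Batching via UNWIND amortizes network round trips: one query writes a thousand nodes instead of a thousand queries writing one node each.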