pdf-fuzz

PoC: bulk search your PDF files using fuzzy text lookup.

How to

Requirements

  • Docker
  • docker-compose

Run this project

1. Clone the project and its submodules: run git clone --recurse-submodules https://github.com/HazemBZ/pdf-fuzz.

2. Drop a folder of PDF files into the pdf_fuzz_back/assets folder (fewer files means less processing time).

3. Spin up the containers: run docker-compose up.

4. Index the db with the PDF contents (from a second terminal, since docker-compose exec needs the backend container to be running): docker-compose exec backend bash -c "python manage.py reindex".
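
The steps above can be sketched as a small shell helper. This is only an illustration: run and quickstart are hypothetical names, ./my_pdfs is a placeholder for your own folder, and the sketch assumes the compose file sits at the root of the cloned repository. It starts the containers (detached, with -d) before reindexing, because docker-compose exec only works against a running container.

```shell
# Hypothetical helper: prints each command, and runs it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Hypothetical wrapper around the steps above.
# $1 is the folder holding your PDF files (placeholder default: ./my_pdfs).
quickstart() {
  pdf_dir="${1:-./my_pdfs}"
  run git clone --recurse-submodules https://github.com/HazemBZ/pdf-fuzz &&
  run cp -r "$pdf_dir" pdf-fuzz/pdf_fuzz_back/assets/ &&
  run cd pdf-fuzz &&
  # Assumption: docker-compose.yml lives at the repository root.
  run docker-compose up -d &&
  run docker-compose exec backend bash -c "python manage.py reindex"
}
```

Calling DRY_RUN=1 quickstart ./my_pdfs only prints the commands, which is a cheap way to check them before running quickstart ./my_pdfs for real.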

Update your PDF files

After changing the contents of pdf_fuzz_back/assets, reindex with: docker-compose exec backend bash -c "python manage.py reindex".

TODOs

  • v0 PoC.
  • CI/CD: docker-compose -> one click project spin up.
  • BE: ETL solution for text lookup -> Faster lookups, extract once use forever.
  • FE: Handle queries w/ React Query -> DX.
  • FE/BE: Files uploader -> QoL.
  • BE: Task Queue solution for files processing -> Separation of concerns.