BetterSearch is a desktop search tool that brings natural language to traditional file search. It allows you to ask questions directly to your laptop/PC and get detailed answers, without sending your files to the cloud. Currently in its alpha version, BetterSearch is available only for Windows machines, with plans to support Windows, macOS, and Linux in the full release.
Leveraging the powerful indexing features of existing search systems on Windows and macOS, BetterSearch performs on-the-fly indexing and updates its content index automatically as files are added, deleted, or modified. Users do not need to manually add files for querying.
BetterSearch employs two state-of-the-art models for embedding and querying: a vector embedding model that indexes your file contents, and SQLCoder, which translates questions about file properties into SQL queries.
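As a rough illustration of the querying path (the question is taken from the examples later in this README; the table and column names are invented for illustration, not BetterSearch's actual schema):

```python
# Rough illustration of the SQLCoder step: a natural-language question
# about file properties becomes a SQL query over the file-property index.
# The table and column names are invented for this illustration, not
# BetterSearch's actual schema.
question = "What are the three largest files on my system?"
generated_sql = """
SELECT path, size_bytes
FROM files
ORDER BY size_bytes DESC
LIMIT 3;
"""
```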
CAUTION: Only Windows is currently supported; BetterSearch will not work on a Linux or macOS installation.
Ensure you have Python >= 3.9 installed, either locally or in a virtual environment. I recommend creating a virtual environment using conda or venv.
Using conda:

```sh
git clone https://github.com/sandesh-bharadwaj/BetterSearch.git
cd BetterSearch
conda env create -f bettersearch_env.yml # Creates a conda environment called 'bettersearch' with all dependencies
conda activate bettersearch
```
To start the application, run:

```sh
python app.py
```
On the first few runs, BetterSearch will take time to download the necessary models and build the initial index (depending on your internet speed), so please be patient. You can speed up file indexing by starting the application, switching the `Compute Mode` setting to a GPU-based option (if you have a compatible Nvidia GPU), and then restarting the application. For more information, see Compute Mode.
Once the initial setup is complete, the application will start up much faster on subsequent launches.
BetterSearch can answer questions related to both file properties and file contents.
By default, BetterSearch uses the CPU-Only setting. However, GPU options are also available, and you can create custom configurations by modifying the respective JSON files.
- `CPU-Only` - Loads both the vector embedding model and SQLCoder on the CPU, using the OpenVINO-optimized version of SQLCoder available here. This setting consumes a significant amount of memory, so expect slower responses if you don't have sufficient RAM. (Tested and verified on an Intel i7-12800HX with 32GB of RAM.)
- `GPU VRAM < 10GB` - Loads SQLCoder on the GPU with 4-bit quantization. Requires at least 6GB of VRAM to work correctly. (Tested and verified on an Nvidia RTX 3070Ti.)
- `GPU VRAM < 16GB` - Loads SQLCoder on the GPU with 8-bit quantization. Requires at least 10GB of VRAM to work correctly. (Not tested; please report any bugs.)
- `GPU VRAM > 16GB` - Loads SQLCoder on the GPU with 16-bit precision. (Not tested; please report any bugs.)
Additionally, you can choose to load only the vector embedding model on the GPU while loading SQLCoder on the CPU, by setting `embd_model_device` to `cuda` instead of `cpu` in `cpu_only.json`. This configuration allows for fast file content indexing without requiring a powerful GPU to run SQLCoder.
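For instance, a minimal sketch of that edit (assuming `cpu_only.json` sits in your working directory; the file's other contents are left untouched):

```python
import json

# Load the existing CPU-only configuration.
with open("cpu_only.json") as f:
    config = json.load(f)

# Move only the vector embedding model to the GPU;
# SQLCoder stays on the CPU.
config["embd_model_device"] = "cuda"

with open("cpu_only.json", "w") as f:
    json.dump(config, f, indent=2)
```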
"How many files were modified after September 10, 2021?"
"What are the three largest files on my system?"
"What is the penalty for not wearing a seatbelt in a passenger vehicle in Massachusetts?" - Information available in the Massachusetts Driving Manual PDF on my local machine.
"Give me a brief summary of Sandesh's thesis during his MS at Boston University." - Information available in my resume. 😉
Llama-3 seems to be prone to strange errors and failures to follow the prompt due to quantization, and I have experienced the same when running BetterSearch in `CPU-Only` and `GPU VRAM < 10GB` modes. At the moment, there isn't a solution to this issue, but using the latter two GPU settings in `Compute Mode` should yield better results.
File content query results can be poor, due to the chunk size and chunk overlap settings. This can be improved through smarter indexing; a good reference point is Greg Kamradt's tutorial. Alternatively, it could be due to the nature of the embedding model being used, but extensive testing is required to confirm this.
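To see how chunk size and overlap interact, here is a minimal character-based chunker; this is a sketch of the general technique, not BetterSearch's actual indexing code:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks; neighbors share chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # how far the window advances per chunk
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

# A larger overlap preserves more context across chunk boundaries,
# at the cost of indexing more redundant text.
chunks = chunk_text("some long document " * 100)
```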
To create your own configuration files, follow the structure provided in any of the pre-defined configuration files and adjust them as needed.
"cache_dir/"
by default).use_cache
flag for generation models in HuggingFace Transformers. It is recommended to set this to true
always.4
)."better_search_content_db/"
by default)."cpu"
, "cuda"
)30
).500
).150
). It is recommended to keep this value between 10%-20%
of "chunk_size".500
, adjust according to your preference.)3
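As a starting point, here is a sketch of writing a custom configuration file. `embd_model_device`, `use_cache`, and `chunk_size` are named above; the remaining key names and the file name are assumptions for illustration, so prefer copying a pre-defined file:

```python
import json

# Sketch of a custom compute-mode configuration. embd_model_device,
# use_cache, and chunk_size are named in the settings list above;
# chunk_overlap and the output file name are assumptions for illustration.
custom_config = {
    "embd_model_device": "cuda",  # "cpu" or "cuda"
    "use_cache": True,            # HuggingFace Transformers use_cache flag
    "chunk_size": 500,            # characters per indexed chunk
    "chunk_overlap": 100,         # within the recommended 10-20% of chunk_size
}

with open("my_custom_config.json", "w") as f:
    json.dump(custom_config, f, indent=2)
```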
See the open issues for a full list of proposed features (and known issues).
If you have a suggestion that would improve this, please open an issue with the tag "enhancement". You can also fork the repo and create a pull request:

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

Your feedback is greatly appreciated! Don't forget to give the project a star! Thanks again!
Distributed under the AGPL-3.0 License. See `LICENSE` for more information.
Llama-3 is used under the Meta Llama-3 License. See `LLAMA-3-LICENSE` for more information.
Sandesh Bharadwaj - [email protected]
Project Link: https://github.com/sandesh-bharadwaj/BetterSearch