This RAG is designed to analyze text for potential hate speech using Llama and ChromaDB.

BSD-3-Clause license
This RAG is designed to analyze user inputs for potential hate speech using a combination of the large language model (LLM) Llama and a ChromaDB database. The LLM evaluates the input against a curated database of hate speech entries, determining whether the input should be classified as such. If the input is identified as hate speech and sufficiently distinct from existing entries in the database, it is stored for future reference. The script also ensures that redundant entries are avoided, maintaining the efficiency and relevance of the database.
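The storage decision described above ("identified as hate speech and sufficiently distinct from existing entries") can be sketched as a small rule. Note that `should_store` and the `distance_threshold` default below are hypothetical illustrations, not the script's actual names or values:

```python
def should_store(is_hate_speech: bool, nearest_distance: float,
                 distance_threshold: float = 0.3) -> bool:
    """Decide whether a classified input should be written back to the database.

    An entry is stored only if the LLM flagged it as hate speech AND it is
    sufficiently distinct from the closest existing entry, i.e. its distance
    exceeds a threshold. The threshold of 0.3 is a placeholder, not the value
    used by the script.
    """
    return bool(is_hate_speech and nearest_distance > distance_threshold)
```

Tuning such a threshold trades off database growth against redundancy: a lower value admits more near-duplicates, a higher value keeps only clearly new examples.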
> [!NOTE]
> This repository was created as part of a master's thesis at IU Internationale Hochschule.
Clone this Git repository and install Ollama.
Python:

```shell
pip install -r requirements/requirements.txt
```

Ollama:

```shell
ollama pull mannix/llama3.1-8b-abliterated
```
> [!IMPORTANT]
> Please use uncensored versions of Llama, such as Mannix Llama 3.1; otherwise, this script will not work properly.
Use one of the following scripts:

- `initial_load_csv.py` imports the CSV file stored in `/data` to update the Chroma database.
- `initial_load_ai.py` prompts Ollama to generate hate speech examples and updates the Chroma database.

Example:

```shell
python initial_load_csv.py
```
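For illustration, a minimal sketch of how a CSV initial load might turn rows into IDs and documents for a Chroma collection. The function name `load_entries` and the column name `text` are assumptions, since the actual layout of the file in `/data` is not shown here:

```python
import csv
import io


def load_entries(csv_text: str, text_column: str = "text"):
    """Parse CSV rows into (ids, documents) ready for a Chroma collection.

    The real initial_load_csv.py reads the file shipped in /data; the
    column name 'text' is assumed here purely for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = [row[text_column] for row in reader]
    ids = [f"entry_{i}" for i in range(len(documents))]
    return ids, documents
```

The resulting lists could then be handed to a Chroma collection, e.g. via its `add(ids=..., documents=...)` call.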
This Python script uses LangChain to combine the large language model Llama with a ChromaDB database: it reviews user inputs for group-based hostility (hate speech) and, if applicable, stores them. This expands the database with new, pertinent examples of hate speech while avoiding redundant or identical entries.
```shell
python main.py
```
1. **Initialization:** The LLM (`OllamaLLM`) and a ChromaDB collection (`hatespeech`) are initialized. The model is used to analyze user inputs, while the database is used to find similar entries and store new ones.
2. **Prompt Creation:** A prompt is built that instructs the model how to judge the user input against the retrieved context.
3. **Query and Context Matching:** A function, `retrieve_context_and_distances`, searches the ChromaDB for entries similar to the user input. It returns the matching documents together with their distances to the input.
4. **LLM Invocation:** The LLM is invoked with the prompt to decide whether the input constitutes hate speech.
5. **Database Storage:** If the input is classified as hate speech and is sufficiently distinct from existing entries, it is stored in the database.
6. **User Feedback:** The classification result is reported back to the user.
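To make the retrieval step concrete, here is a toy stand-in for `retrieve_context_and_distances` that runs without ChromaDB. The real script queries the `hatespeech` collection and gets embedding distances back; this sketch substitutes stdlib string similarity purely so the example is self-contained:

```python
from difflib import SequenceMatcher


def retrieve_context_and_distances(query, entries, n_results=3):
    """Toy stand-in for the script's ChromaDB lookup.

    Returns the entries most similar to the query together with a distance
    (0.0 = identical). The actual implementation uses embedding distances
    from the Chroma 'hatespeech' collection, not string similarity.
    """
    scored = [(1.0 - SequenceMatcher(None, query, e).ratio(), e)
              for e in entries]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    top = scored[:n_results]
    documents = [entry for _, entry in top]
    distances = [dist for dist, _ in top]
    return documents, distances
```

Downstream logic can then compare the smallest returned distance against a threshold to decide whether the input is new enough to store.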
> [!NOTE]
> This script is designed in German, but it will function consistently regardless of the language. You are welcome to modify the prompts, user outputs, and other language-specific elements.
v1.0.3 adds WebApp support: there is a new WebApp! To use the WebApp, follow these steps:
Run the `run.py` script: navigate to the `webapp` folder where `run.py` is located and execute:

```shell
python run.py
```
This command will start the web application, and you should keep this terminal window open and running while you interact with the web interface.
Once the `run.py` script is running, you can access the web application by opening a web browser and navigating to the appropriate URL, typically `http://localhost:5000`.
> [!NOTE]
> To make the WebApp suitable for production (available online), you have to meet all the requirements of a production environment, such as handling high traffic, managing sessions securely, and logging errors appropriately. Also consider putting a reverse proxy in front of the app.
Don't forget to have a look at the tools:

- `chroma_query.py`
- `chroma_inspector.py`
These scripts have been updated. I have kept the original versions in the `/old` folder because they were used during the writing of the thesis.

- `main_v1.py` is the first edition of the script. The new `main.py` is fully updated to include database writeback functionality and integration with Llama 3.1.
- `chroma_query_v1.py` is the initial version of the query script. The latest version in `/tools` includes console input functionality.