A real-time client and server for LLaMA.
I personally used the smallest 7B model on an Intel PC and a MacBook Pro; it is ~4.8 GB when quantized to 4-bit, or ~13 GB in full precision.
Note that LLaMA is not licensed for commercial use. From Meta's announcement:
> To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.
Apply for official access. Once approved, you will receive a unique download link.
Assuming you have the LLaMA checkpoints (☝️), set the `LLAMA_MODEL_PATH` and `LLAMA_MAIN` variables in `server.go`, then build and run the server:
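The actual values in `server.go` are not shown here; a minimal sketch of what these two variables might look like, assuming the server invokes a separately compiled LLaMA inference binary (both paths below are placeholders, not values from this repository):

```go
package main

// Hypothetical values; point these at wherever your checkpoint and
// inference binary actually live on disk.
var (
	LLAMA_MODEL_PATH = "/models/7B/ggml-model-q4_0.bin" // quantized 4-bit checkpoint (~4.8 GB)
	LLAMA_MAIN       = "/usr/local/bin/llama-main"      // compiled inference binary
)
```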
```sh
go build
./server
```
```sh
python3 -m pip install requests
python3 llama.py
```
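The interface of `llama.py` is not reproduced here; a minimal sketch of a `requests`-based client is shown below, assuming the server listens on localhost and accepts a raw prompt in a POST body (the port and payload shape are assumptions, not taken from this repository):

```python
import requests

# Hypothetical endpoint; the real host, port, and request format depend
# on how server.go is configured.
SERVER_URL = "http://localhost:8080"

def generate(prompt: str) -> str:
    """Send a prompt to the server and return the generated text."""
    resp = requests.post(SERVER_URL, data=prompt.encode("utf-8"))
    resp.raise_for_status()  # surface server-side errors to the caller
    return resp.text
```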