A Home Assistant integration to control your smart home using a local, self-hosted LLM
MIT License
This integration requires setting up Functionary's vLLM-based inference server. Refer to their Setup section for instructions on how to get the server up and running. The recommended model is meetkai/functionary-small-v3.2.
Functionary was chosen because it is one of the best LLMs for tool calling. Check out the Berkeley Function-Calling Leaderboard for more information.
The integration should work with any inference server that exposes an OpenAI-compatible API and supports tool calling, in combination with an LLM that also supports tool calling. However, such a setup has not yet been tested. Support for it will be added in the future; the current priority is adding more functionality.
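As a rough illustration of the interface involved (not code from this integration), the sketch below sends a tool-calling request to a local OpenAI-compatible endpoint using the official `openai` Python client. The base URL, port, and the `turn_on_light` tool definition are assumptions made for the example; adjust them to match your server.

```python
# Minimal sketch of an OpenAI-compatible tool-calling request.
# The base_url, port, and example tool are illustrative assumptions only.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of the local inference server
    api_key="not-needed-for-local-servers",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "turn_on_light",  # hypothetical tool, for illustration only
            "description": "Turn on a light in a given area.",
            "parameters": {
                "type": "object",
                "properties": {
                    "area": {
                        "type": "string",
                        "description": "The area containing the light.",
                    }
                },
                "required": ["area"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meetkai/functionary-small-v3.2",
    messages=[{"role": "user", "content": "Turn on the kitchen lights."}],
    tools=tools,
)

# A model with good tool-calling support should answer with a structured
# tool call rather than free-form text.
print(response.choices[0].message.tool_calls)
```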
The assistant currently supports the following device types and operations:
This integration is a work in progress and the list of features will continue to grow!
To install the AI Assistant integration to your Home Assistant instance, use this My button:
If the above My button doesn't work, you can also perform the following steps manually:
- Repository: `hemanthpai/hass-ai-assistant`
- Category: `Integration`
- Note: HACS does not "configure" the integration for you; you must add AI Assistant after installing it via HACS.
Options for AI Assistant can be set via the user interface, by taking the following steps:
Settings relating to the integration itself.
Option | Description |
---|---|
API Timeout | The maximum time, in seconds, to wait for a response from the API |
The starting text for the AI language model to generate new text from. This text can include information about your Home Assistant instance, devices, and areas, and is written using Home Assistant Templating.
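For example, a prompt template along these lines could enumerate your areas and their entities so the model knows what it can act on. This is only a minimal sketch using standard Home Assistant template functions (`areas()`, `area_name()`, `area_entities()`, `states()`); it is not the integration's default prompt.

```jinja2
You are an AI assistant that controls the smart home devices in this Home Assistant instance.
The following areas and entities are available:
{% for area_id in areas() %}
{{ area_name(area_id) }}:
{%- for entity_id in area_entities(area_id) %}
  - {{ entity_id }} (state: {{ states(entity_id) }})
{%- endfor %}
{% endfor %}
```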
The language model and additional parameters to fine-tune the responses.
Option | Description |
---|---|
Model | The model used to generate responses. |
Context Size | Sets the size of the context window used to generate the next token. |
Maximum Tokens | The maximum number of tokens that the AI model should generate in its completion of the prompt. |
Temperature | The temperature of the model. A higher value (e.g., 0.95) will lead to more unexpected results, while a lower value (e.g., 0.5) will produce more deterministic results. |
Top P | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. |
Discussions for this integration are held over on the Discussions page.
This integration was inspired by the Ollama Conversation integration, and early iterations were based on a fork of that repository. As development progressed, benchmarking showed that Functionary offered better success rates than Llama 3.1, Groq Llama, and Mistral Nemo. As a result, this integration diverged from Ollama Conversation in its use of vLLM/OpenAI-compatible APIs rather than an Ollama server. Despite the divergence, it still retains some of that project's code and design choices. An additional source of inspiration is the Home LLM integration.