Experiments in voice assisting...
MIT License
Gwen is an extensible voice assistant framework. It offers the following functionality:
Gwen is intended as an easily deployable speech-to-text and question-answering service. You write a client that interprets commands and handles audio output of the answers Google Assistant gives to questions.
Gwen has been tested on macOS as well as Raspbian on a Raspberry Pi. Support for Linux x86_64 is being worked on. Gwen currently does not work on Windows, as Snowboy is not supported there. Gwen currently only supports English.
Gwen was created as a means to control what data is being sent to Google. Depending on your reading of the Google Home data security and privacy policy, Google may gather any data the Google Home device can record "to make [their] services faster, smarter, more relevant, and more useful to you". By placing a Google Home device in your home and associating it with your Google account, Google may get to know you more up-close than you may want.
Another motivation for Gwen is to enable custom command processing for home automation, making the entire setup trivial for programmers.
Gwen tries to give you more control over your privacy by only sending the audio data to Google Assistant that is necessary. After detecting a hotword such as "Snowboy", which happens on-device without sending anything to the outside world, Gwen will send the subsequent audio stream from your microphone to the Google Assistant API. Google Assistant signals when it detects the end of your utterance, at which point no more audio is sent to Google's servers.
Note: You need to authorize a Google account to be used with Gwen. This means that Google will know who the audio Gwen sends comes from.
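The gating behaviour described above can be pictured as a two-state machine: audio is dropped until the hotword fires, then forwarded until the end of the utterance is signalled. The following is a minimal sketch of that idea only. The class `HotwordGate` and its method names are made up for illustration and are not taken from Gwen's sources.

```javascript
// Illustrative only: HotwordGate and its methods are hypothetical,
// they are not part of Gwen or of the Google Assistant API.
class HotwordGate {
  constructor(sendUpstream) {
    this.sendUpstream = sendUpstream; // callback that would forward audio to the remote service
    this.streaming = false;           // false: only local hotword detection is active
  }

  // Called when the on-device detector recognizes the hotword.
  onHotword() {
    this.streaming = true;
  }

  // Called when the remote service signals the end of the utterance.
  onEndOfUtterance() {
    this.streaming = false;
  }

  // Called for every recorded audio frame; forwards it only while streaming.
  onAudioFrame(frame) {
    if (!this.streaming) return false;
    this.sendUpstream(frame);
    return true;
  }
}

// Usage: only frames between the hotword and the end of the utterance are forwarded.
const sent = [];
const gate = new HotwordGate((frame) => sent.push(frame));
gate.onAudioFrame("frame-1"); // dropped, no hotword yet
gate.onHotword();
gate.onAudioFrame("frame-2"); // forwarded
gate.onEndOfUtterance();
gate.onAudioFrame("frame-3"); // dropped again
```

In this sketch the frames are plain strings to keep the example short; real audio frames would be binary buffers.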
Gwen requires the following software to be installed on your Mac or Raspberry Pi:
sudo apt-get install oracle-java8-jdk
sudo apt-get install libatlas-base-dev
Gwen requires a microphone as well as a set of speakers. For Linux/Raspberry Pi, ensure that your ALSA configuration is set up correctly. Gwen will use the default microphone and audio output.
With the prerequisites installed, you can download the latest build, then in the directory you downloaded the .jar file to, execute java -jar gwen-serverjar. As a rule of thumb, always start Gwen from the same directory as the .jar file.
You can stop Gwen at any time via CTRL+c on the command line.
Gwen has an integrated web interface to let you configure its various bits and pieces. The interface is exposed on port 8777. You can access the web interface via http://localhost:8777 on the same machine you started Gwen on.
Alternatively, e.g. if you run Gwen on a Raspberry Pi, simply use the IP address of the device in your LAN as the host name.
On the first run, Gwen requires you to set up a Google Developer project. Follow the Google Assistant instructions on how to set up the project and retrieve the project's clientId and clientSecret. Enter these credentials in the Gwen web interface.
Next, authorize Gwen to use your Google account. Click the link in the web interface, select your account, then paste the code you receive into the web interface.
If everything went right, Gwen will now start up and present you with a simple status page.
Upon successful installation, you can further customize Gwen.
A Gwen model consists of a unique name, a Snowboy hotword detector model and a type.
You can use any of the detector models available on the Snowboy website. Gwen supports both universal models (.umdl) that are speaker independent, as well as personal models, which are trained based on samples of your own voice. A detector model is then used to recognize when you speak a hotword.
The type specifies how your speech after the hotword should be interpreted. There are currently two types:
Command: Once the model's hotword was detected, subsequent audio input is sent to Google Assistant and a speech-to-text ...
Question: Once the model's hotword was detected, subsequent audio input is sent to Google Assistant and an audio ...
Gwen comes with two universal models, Alexa for questions, and Snowboy for commands. You can set up any number of models under the Models tab of the web interface.
All changes to the model configuration are saved to models.json. Model files you upload will be saved to the usermodels/ directory.
The Config tab in Gwen's web interface lets you modify various settings:
Play audio locally: Gwen can play back the answers by Google Assistant locally. If you disable this setting, you'll have to ...
Record stereo: Your voice will be recorded in stereo should your microphone support it. This option is a fallback solution ...
Send audio input: DANGER ZONE. This is only meant for debugging purposes, so your Gwen client is sent all the audio ...
TCP pub/sub port: the port to which a TCP Gwen client can connect.
Websocket pub/sub port: the port to which a Websocket client can connect.
Changing any of these settings will trigger an in-process restart of Gwen. Additionally, all configuration changes are saved to config.json.
Once you have set up and started the Gwen server as described above, you can write a client to react to various events. We provide a simple Kotlin implementation for a TCP-based Gwen client, as well as a JavaScript implementation for a Websocket-based Gwen client.
The general flow of a user interaction is as follows:
A client can react to the following events:
HOTWORD: sent when a model's hotword was detected. You'll receive the model's name and type. This is the ideal ...
COMMAND: sent when a command model's hotword was detected, and the user spoke the command. You'll receive the ...
QUESTION: sent when a question model's hotword was detected, and the user spoke the question. You'll receive the ...
QUESTION_ANSWER_AUDIO: sent by Google Assistant once a user question was received. Will usually result in multiple of ... LocalAudioPlayer#play()
QUESTION_END: sent when both the question and the answer have been processed.
AUDIO_INPUT: sent when the Send audio input flag was enabled in the configuration. Gwen will continuously stream the recorded ...
To implement a client in Kotlin, inherit from GwenClient, e.g.:
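    import java.awt.Desktop
    import java.net.URI

    // Connect to the Gwen server's pub/sub endpoint and react to its events.
    val client = object : GwenPubSubClient("localhost", 8778) {
        // A model's hotword was detected.
        override fun hotword(modelName: String, type: GwenModelType) {
        }
        // A command was spoken; open a browser if it matches.
        override fun command(modelName: String, text: String) {
            if (text.toLowerCase() == "procrastinate")
                Desktop.getDesktop().browse(URI("https://reddit.com"))
        }
        // A question was spoken.
        override fun questionStart(modelName: String, text: String) {
        }
        // A chunk of answer audio arrived.
        override fun questionAnswerAudio(modelName: String, audio: ByteArray) {
        }
        // Question and answer have been fully processed.
        override fun questionEnd(modelName: String) {
        }
        // Raw microphone audio (only with the Send audio input flag enabled).
        override fun audioInput(audio: ByteArray) {
        }
    }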
This Gwen client will help you procrastinate! And here it is in JavaScript, using Websockets:
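    var client = new GwenClient();
    // Connect to the Gwen server's Websocket endpoint and register event handlers.
    client.connect("localhost", 8779, {
        // A model's hotword was detected.
        onHotword: function(modelName, modelType) {
        },
        // A command was spoken; open a new window if it matches.
        onCommand: function(modelName, command) {
            if (command === "procrastinate") window.open("https://reddit.com");
        },
        // A question was spoken.
        onQuestionStart: function(modelName, question) {
        },
        // A chunk of answer audio arrived.
        onQuestionAnswerAudio: function(modelName, audio) {
        },
        // Question and answer have been fully processed.
        onQuestionEnd: function(modelName) {
        },
        // Raw microphone audio (only with the Send audio input flag enabled).
        onAudioInput: function(audio) {
        }
    });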
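Once a client handles more than one command, a chain of string comparisons gets unwieldy. One possible approach is a small lookup table keyed by the normalized command text. The sketch below is not part of the Gwen client library: `createCommandRouter` and `normalize` are hypothetical helpers, and the "lights on" phrase is a made-up example. It assumes the command handler receives the model name and the recognized text, as in the client examples above.

```javascript
// Hypothetical helpers, not part of the Gwen client library.

// Lower-cases the text, trims it and collapses runs of whitespace.
function normalize(text) {
  return String(text).trim().toLowerCase().replace(/\s+/g, " ");
}

// Builds a handler that looks up the spoken command in a phrase -> action table.
function createCommandRouter(routes) {
  const table = new Map();
  for (const [phrase, action] of Object.entries(routes)) {
    table.set(normalize(phrase), action);
  }
  return function route(modelName, command) {
    const action = table.get(normalize(command));
    if (!action) return false; // unknown command, let the caller decide what to do
    action(modelName, command);
    return true;
  };
}

// Usage: record which actions were triggered instead of performing them.
const triggered = [];
const route = createCommandRouter({
  "procrastinate": () => triggered.push("https://reddit.com"),
  "lights on": () => triggered.push("lights-on"),
});
const handledFirst = route("Snowboy", "  Procrastinate ");
const handledSecond = route("Snowboy", "Lights   ON");
const handledUnknown = route("Snowboy", "make coffee");
```

The returned `route` function has the same parameter order as the `onCommand` handler shown above, so it could be passed in its place. Its boolean return value makes it easy to add a fallback for unrecognized commands.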
If you don't want to use Kotlin (or any other JVM language) or JavaScript (or a compile-to-JavaScript language), you can easily implement the simple TCP protocol for the pub/sub server. Following the GwenClient implementation is the best protocol documentation at the moment, as the protocol might slightly change in upcoming releases.
Gwen uses Gradle as its build system.
To build both the Gwen server and client, execute the following command in a terminal in the Gwen project's root directory:
./gradlew dist
The resulting .jar files can be found in:
gwen-server/build/libs/gwen-server-<version>.jar
gwen-client/build/libs/gwen-client-<version>.jar
The .jar files are uber-jars and can simply be dropped into your project. If you do not want to create uber-jars, use:
./gradlew build
To build artifacts and deploy them to your local ~/.m2 repository, execute:
./gradlew uploadArchives
To deploy a SNAPSHOT build to Sonatype, use:
./gradlew -Psnapshot -PsonatypeUsername=<username> -PsonatypePassword=<password> uploadArchives
To deploy a release build to Sonatype, use:
./gradlew -Prelease -PsonatypeUsername=<username> -PsonatypePassword=<password> uploadArchives
You'll have to manually close the staging repository in the Sonatype web interface.
Gwen takes a few shortcuts that may result in security concerns if not handled properly.
... config.json. Re-use of these by ...
... Send audio input flag enabled.
... Send audio input flag enabled.
All communication with Google is secured via TLS.
There be dragons, use at your own risk, read the sources.
Please refer to https://github.com/badlogic/gwen/blob/master/CONTRIBUTING.md on how to contribute to Gwen.
Please refer to https://github.com/badlogic/gwen/blob/master/LICENSE for the full license text (MIT).
Feel free to ping me on Twitter (@badlogicgames).