audio_capture

This repository provides a set of ROS 2 packages for audio. It provides a Python implementation for capturing and playing audio data using PyAudio.

Installation

$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/audio_common.git
$ cd ~/ros2_ws
$ rosdep install --from-paths src --ignore-src -r -y
$ pip3 install -r audio_common/requirements.txt
$ colcon build

Docker

You can create a Docker image to test audio_common. Run the following command inside the audio_common directory.

$ docker build -t audio_common .

After the image is created, run a docker container with the following command.

$ docker run -it --device /dev/snd audio_common

As a shortcut, you can use the following command:

$ make docker_run

Nodes

audio_capturer_node

Node to obtain audio data from a microphone and publish it into the audio topic.

Parameters

format: Specifies the audio format to be used for capturing. Common values are pyaudio.paInt16 (16-bit format) or other formats supported by PyAudio. Default: pyaudio.paInt16
channels: The number of audio channels to capture. Typically, 1 for mono and 2 for stereo. Default: 1
rate: The sample rate, that is, how many samples per second are captured. Default: 16000
chunk: The number of audio frames in each chunk. Default: 4096
device: The ID of the audio input device. A value of -1 indicates that the default audio input device should be used. Default: -1
frame_id: An identifier for the audio frame. This can be useful for synchronizing audio data with other data streams. Default: ""
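
To make the relationship between rate, chunk, channels and format concrete, the sketch below computes how long one chunk lasts and how many bytes it occupies for the defaults listed above. It is an illustration, not the node's code. It assumes that chunk counts frames (one sample per channel), which is how PyAudio interprets its frames_per_buffer argument. The numeric format codes are copied from PyAudio's constants so the snippet runs without PyAudio installed, and chunk_stats is a hypothetical helper name.

```python
# Numeric values of PyAudio's sample-format constants (paFloat32, paInt32, ...).
# They are hard-coded here so the example does not need PyAudio installed.
PA_FLOAT32, PA_INT32, PA_INT24, PA_INT16, PA_INT8, PA_UINT8 = 1, 2, 4, 8, 16, 32

# Bytes occupied by one sample in each format.
BYTES_PER_SAMPLE = {
    PA_FLOAT32: 4,
    PA_INT32: 4,
    PA_INT24: 3,
    PA_INT16: 2,
    PA_INT8: 1,
    PA_UINT8: 1,
}


def chunk_stats(rate=16000, chunk=4096, channels=1, fmt=PA_INT16):
    """Return (duration in seconds, size in bytes) of one audio chunk.

    Assumption: `chunk` is a number of frames, where one frame holds
    one sample per channel.
    """
    duration_s = chunk / rate
    size_bytes = chunk * channels * BYTES_PER_SAMPLE[fmt]
    return duration_s, size_bytes


duration_s, size_bytes = chunk_stats()
print(f"{duration_s * 1000:.0f} ms per chunk, {size_bytes} bytes")
```

Under these assumptions the default configuration yields one message roughly every 256 ms. A smaller chunk lowers latency at the cost of more messages per second.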

ROS 2 Interfaces

audio: Topic to publish the audio data captured from the microphone. Type: audio_common_msgs/msg/AudioStamped

audio_player_node

Node to play the audio data obtained from the audio topic.

Parameters

channels: The number of audio channels to play. Typically, 1 for mono and 2 for stereo. Default: 1
device: The ID of the audio output device. A value of -1 indicates that the default audio output device should be used. Default: -1
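
To find an ID to pass as device, the available devices can be enumerated with PyAudio. The sketch below is an illustration and not the node's code: list_audio_devices and to_device_index are hypothetical helper names, and mapping -1 to None assumes the node leaves the choice of the default device to PyAudio.

```python
def list_audio_devices():
    """Return (index, name, max input channels, max output channels) per device."""
    import pyaudio  # imported lazily so to_device_index works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        devices = []
        for index in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(index)
            devices.append(
                (
                    index,
                    info["name"],
                    info["maxInputChannels"],
                    info["maxOutputChannels"],
                )
            )
        return devices
    finally:
        # Always release PortAudio resources, even if a query fails.
        pa.terminate()


def to_device_index(device):
    """Map the `device` parameter to a PyAudio device index.

    Assumption: -1 means "use the default device", which PyAudio
    expresses by receiving None as the device index.
    """
    return None if device == -1 else device
```

Calling list_audio_devices() on the target machine returns one tuple per device. Entries with a non-zero input channel count are candidates for the capturer, and entries with a non-zero output channel count are candidates for the player.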

ROS 2 Interfaces

audio: Topic subscriber that receives the audio data to be played. Type: audio_common_msgs/msg/AudioStamped

music_node

Node to play music from an audio file in WAV format.

Parameters

chunk_time: Duration of each audio chunk, in milliseconds. Default: 50
frame_id: An identifier for the audio frame. This can be useful for synchronizing audio data with other data streams. Default: ""
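
As an illustration of what chunk_time means for a WAV file, the sketch below converts a duration in milliseconds into a frame count and reads a file in chunks of that size using Python's standard wave module. How music_node reads files internally is not described here, so frames_per_chunk and read_wav_chunks are hypothetical helpers, and the test file is generated in memory.

```python
import io
import wave


def frames_per_chunk(sample_rate, chunk_time_ms=50):
    """Number of frames that span `chunk_time_ms` at the given sample rate."""
    return sample_rate * chunk_time_ms // 1000


def read_wav_chunks(wav_file, chunk_time_ms=50):
    """Yield raw byte chunks of about `chunk_time_ms` each from a WAV file."""
    with wave.open(wav_file, "rb") as wav:
        n_frames = frames_per_chunk(wav.getframerate(), chunk_time_ms)
        while True:
            data = wav.readframes(n_frames)
            if not data:
                break
            yield data


# Build one second of 16-bit mono silence at 44.1 kHz in memory to try it out.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 44100)
buffer.seek(0)

chunks = list(read_wav_chunks(buffer))
print(len(chunks), len(chunks[0]))
```

With the default of 50 ms, a 44.1 kHz file is split into chunks of 2205 frames, so one second of audio produces 20 chunks.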

ROS 2 Interfaces

audio: Topic publisher to send the audio data read from the audio file. Type: audio_common_msgs/msg/AudioStamped

tts_node

Node to generate audio from text (TTS).

Parameters

chunk: The number of audio frames in each chunk. Default: 4096
frame_id: An identifier for the audio frame. This can be useful for synchronizing audio data with other data streams. Default: ""

ROS 2 Interfaces

audio: Topic publisher to send the audio data generated by the TTS. Type: audio_common_msgs/msg/AudioStamped
say: Action to generate audio data from text. Type: audio_common_msgs/action/TTS

Demos

Audio Capturer/Player

$ ros2 run audio_common audio_capturer_node

$ ros2 run audio_common audio_player_node
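
The commands above start the nodes with their default parameters. Defaults can be overridden with the standard ROS 2 `--ros-args -p name:=value` syntax. The sketch below builds such a command line from a dictionary; ros2_run_command is a hypothetical helper, and the parameter values are examples only.

```python
import shlex


def ros2_run_command(package, executable, params=None):
    """Build a `ros2 run` command line with optional parameter overrides."""
    argv = ["ros2", "run", package, executable]
    if params:
        argv.append("--ros-args")
        for name, value in params.items():
            # Each override is passed as: -p <name>:=<value>
            argv += ["-p", f"{name}:={value}"]
    return argv


capturer = ros2_run_command(
    "audio_common", "audio_capturer_node", {"rate": 48000, "chunk": 2048}
)
print(shlex.join(capturer))
```

The resulting list can be passed to subprocess.run to start the node from a script, or the printed line can be pasted into a terminal.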

TTS

$ ros2 run audio_common tts_node

$ ros2 run audio_common audio_player_node

$ ros2 action send_goal /say audio_common_msgs/action/TTS "{'text': 'Hello World'}"
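
When the text to speak contains quotes or colons, writing the goal by hand becomes error-prone. The sketch below builds the same send_goal command from an arbitrary string. It assumes the ROS 2 CLI parses the goal argument as YAML, in which case a JSON object is accepted as well; say_goal_command is a hypothetical helper name.

```python
import json
import shlex


def say_goal_command(text):
    """Build the `ros2 action send_goal` command for the /say action."""
    # JSON is valid YAML, so json.dumps takes care of escaping quotes
    # and other special characters in the text.
    goal = json.dumps({"text": text})
    return [
        "ros2",
        "action",
        "send_goal",
        "/say",
        "audio_common_msgs/action/TTS",
        goal,
    ]


command = say_goal_command('Say "hi" at 10:30')
# shlex.join adds the shell quoting needed to paste the command in a terminal.
print(shlex.join(command))
```

Passing the list directly to subprocess.run avoids shell quoting altogether.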

Music Player

$ ros2 run audio_common music_node

$ ros2 run audio_common audio_player_node

$ ros2 service call /music_play audio_common_msgs/srv/MusicPlay "{audio: 'elevator'}"
