A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models
APACHE-2.0 License
This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link]
How to contribute (mostly identical to the huggingface/datasets contributing guide):
Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
Clone your fork to your local disk, and add the base repository as a remote:
git clone git@github.com:<your Github handle>/wav2vec-toolkit.git
cd wav2vec-toolkit
git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
Set up a development environment by creating a virtual environment and installing the package with its development extras and your language's requirements:
conda create -n env python=3.7 -y
conda activate env
pip install -e ".[dev]"
pip install -r languages/{YOUR_SPECIFIC_LANGUAGE}/requirements.txt
(If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)
Create a new branch to hold your development changes:
git checkout -b a-descriptive-name-for-my-changes
Do not work directly on the main branch.
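To avoid committing to the default branch by accident, a small check like the one below can be run before you start editing. This is a hypothetical helper (check_feature_branch is an illustrative name, not part of the repository); it treats both main and master as protected, since either may be the default branch name.

```shell
# Hypothetical helper: fail if HEAD points at a default branch.
check_feature_branch() {
    local branch
    # symbolic-ref works even before the first commit; --short strips "refs/heads/"
    branch=$(git symbolic-ref --short HEAD) || return 1
    case "$branch" in
        main|master)
            echo "On '$branch': run 'git checkout -b a-descriptive-name-for-my-changes' first." >&2
            return 1
            ;;
    esac
    echo "Working on branch '$branch'."
}
```

Run check_feature_branch from the repository root; it exits non-zero (and prints a hint) while you are still on main or master.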
Develop the features on your branch.
Format your code. Run black and isort with the following commands so that your new files match the project's style:
black --line-length 119 --target-version py36 src scripts languages
isort src scripts languages
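If you only want to confirm that your files are already formatted, without rewriting them, black's --check flag and isort's --check-only flag report problems through a non-zero exit status. The wrapper name check_format is hypothetical; it simply reuses the same arguments as the two commands above.

```shell
# Hypothetical wrapper: report formatting problems without modifying any files.
check_format() {
    local status=0
    # --check makes black exit non-zero if any file would be reformatted
    black --check --line-length 119 --target-version py36 src scripts languages || status=1
    # --check-only makes isort exit non-zero if any imports would be re-sorted
    isort --check-only src scripts languages || status=1
    return "$status"
}
```

Both tools always run, so a single call shows every black and isort complaint before you commit.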
Once you're happy with your implementation, stage your changes and commit them to record them locally:
git add .
git commit
It is a good idea to sync your copy of the code with the original repository regularly, so that you can quickly incorporate upstream changes:
git fetch upstream
git rebase upstream/main
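If the upstream remote was never added during cloning, git fetch upstream fails with a less helpful message. The hypothetical helper below (sync_with_upstream is an illustrative name) first checks for the remote with git remote get-url, then runs the same fetch and rebase as above.

```shell
# Hypothetical helper: rebase the current branch onto upstream/main.
sync_with_upstream() {
    # git remote get-url exits non-zero when the named remote does not exist
    if ! git remote get-url upstream > /dev/null 2>&1; then
        echo "No 'upstream' remote. Add it with:" >&2
        echo "  git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git" >&2
        return 1
    fi
    git fetch upstream && git rebase upstream/main
}
```

If the rebase stops on a conflict, resolve the files, git add them, and continue with git rebase --continue.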
Push the changes to your account using:
git push -u origin a-descriptive-name-for-my-changes
Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.