
Scripts for recreating the Replication Dataset for Fundamental Frequency Estimation. Part of the dissertation "Pitch of Voiced Speech in the Short-Time Fourier Transform". © 2020, Bastian Bechtold. All rights reserved.



This is part of the accepted dissertation "Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods", on the topic of "A Replication Dataset for Fundamental Frequency Estimation" (2020, Bastian Bechtold, Jade Hochschule & Carl von Ossietzky Universität Oldenburg, Germany).

Pitch Estimation Experiments

This directory contains programs that calculate pitch tracks for combinations of speech signals and noise signals.

Preparing the Corpora

Prior to running the pitch estimation experiments, you need to download the required speech and noise corpora.

Speech signals are taken from the following corpora:

  • CMU-ARCTIC (BSD licensed) [1]
  • FDA (free to download) [2]
  • KEELE and KEELE_mod (free for noncommercial use) [3]
  • MOCHA-TIMIT (free for noncommercial use) [4]
  • PTDB-TUG (ODBL license) [5]
  • TIMIT (commercial license, not included in downloads) [6]

Noise signals are taken from the corpora:

(License texts are included in the corpus files)

These corpora (except for TIMIT) can be obtained in one of two ways. Either download each corpus by running the shell script of the same name (e.g. for downloading the FDA corpus) and then assemble it into a JBOF dataset using the Python script of the same name (e.g. for the FDA corpus), or download the fully assembled JBOF datasets from and unzip them in this directory.

The KEELE_mod corpus is a modified version of the KEELE corpus, where recordings are cut into shorter pieces much like in all the other speech corpora.

The TIMIT corpus cannot be provided as a download, as it is not made available under a free license. If you have access to the TIMIT corpus, copy it into the directory TIMIT_orig, and the script can import it into a JBOF dataset like all the other corpora.

Running the Experiments

  • noisy speech computes pitch tracks for every PDA and speech in noise at various SNRs
  • synthetic computes pitch tracks for every PDA and tone complexes in noise at various SNRs
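
As a rough illustration of what "tone complexes in noise at various SNRs" means, the following sketch generates a harmonic tone complex and mixes it with noise at a requested signal-to-noise ratio. It is not taken from the experiment scripts: the function names, the equal-amplitude harmonics, the Gaussian noise, and the listed SNR values are all assumptions made for this example.

```python
import math
import random


def tone_complex(f0, samplerate, duration, n_harmonics=5):
    """Sum of equal-amplitude sine harmonics of f0 (assumed signal model)."""
    n_samples = round(samplerate * duration)
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / samplerate)
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so the signal-to-noise power ratio equals snr_db, then add."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    # Target noise power is signal power divided by 10^(snr_db / 10).
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tone = tone_complex(f0=200.0, samplerate=16000, duration=0.1)
noise = [rng.gauss(0.0, 1.0) for _ in range(len(tone))]

# Illustrative SNR values only, not necessarily those used in the experiments.
mixtures = {snr_db: mix_at_snr(tone, noise, snr_db) for snr_db in (20, 10, 0, -10)}
```

The same `mix_at_snr` function would apply to a speech recording in place of the tone complex. In the synthetic case the true fundamental frequency (`f0`) is known by construction.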

These experiments take about one year to compute on a single core of a 2019 CPU (less when using more cores). Their results are collected as a JBOF Dataframe in a "data" directory, and all intermediate tasks and return values are collected in an "experiment" directory.
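
The long run time comes from the number of combinations to evaluate. The sketch below shows how such an experiment grid could be enumerated and split into independent work lists, one per core. All names and values are placeholders, and the round-robin split is an assumption for illustration, not the scheduling used by the actual scripts.

```python
import itertools

# Placeholder names and values, for illustration only.
pdas = ["PDA_A", "PDA_B"]
speech_items = ["speech_001", "speech_002", "speech_003"]
noise_items = ["noise_white", "noise_babble"]
snrs_db = [-10, 0, 10, 20]

# One task per (PDA, speech, noise, SNR) combination.
tasks = list(itertools.product(pdas, speech_items, noise_items, snrs_db))


def split_tasks(tasks, n_workers):
    """Deal tasks round-robin into n_workers independent work lists."""
    return [tasks[i::n_workers] for i in range(n_workers)]


chunks = split_tasks(tasks, n_workers=4)
print(len(tasks), [len(chunk) for chunk in chunks])  # prints: 48 [12, 12, 12, 12]
```

Because the task count is the product of all four list lengths, adding one more PDA or SNR value multiplies the total work rather than adding to it.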

To run these scripts, the PDAs Python module needs to be installed. A linux64/Python3.6+ version of this module is available from as well. It requires Matlab and the following toolboxes: Curve Fitting Toolbox, Deep Learning Toolbox, Image Processing Toolbox, Parallel Computing Toolbox, Signal Processing Toolbox, Statistics and Machine Learning Toolbox, and Symbolic Math Toolbox.

The PDAs module includes the following fundamental frequency estimation algorithms:

These algorithms are included in their native programming language (Matlab for BANA, DNN, MBSC, NLS, NLS2, PEFAC, RAPT, RNN, SACC, SHR, SRH, STRAIGHT, SWIPE, YAAPT, and YIN; C for KALDI, PRAAT, and SAFE; Python for AMDF, AUTOC, CEP, CREPE, MAPS, and SIFT), and are adapted to a common Python interface. AMDF, AUTOC, CEP, and SIFT are our partial re-implementations, as no original source code could be found.
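
To give an idea of what a common Python interface over different estimators might look like, here is a minimal sketch. It does not reproduce the PDAs module's actual API: the function names, the `ESTIMATORS` registry, and the `(signal, samplerate) -> (times, pitches)` convention are assumptions. The estimator is a bare autocorrelation peak picker written for this example, not the AUTOC re-implementation mentioned above.

```python
import math


def autocorrelation_pitch(frame, samplerate, fmin=50.0, fmax=500.0):
    """Estimate f0 (Hz) of one frame from the lag with maximum autocorrelation."""
    min_lag = int(samplerate / fmax)
    max_lag = min(int(samplerate / fmin), len(frame) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return samplerate / best_lag


def track_pitch(signal, samplerate, estimator, frame_length=0.05, hop=0.01):
    """Run a single-frame estimator frame by frame; return (times, pitches)."""
    frame_size = round(frame_length * samplerate)
    hop_size = round(hop * samplerate)
    times, pitches = [], []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size]
        times.append((start + frame_size / 2) / samplerate)  # frame center in seconds
        pitches.append(estimator(frame, samplerate))
    return times, pitches


# Hypothetical registry: every estimator is called the same way.
ESTIMATORS = {"AUTOC_SKETCH": autocorrelation_pitch}

samplerate = 8000
signal = [math.sin(2 * math.pi * 200.0 * n / samplerate) for n in range(800)]
for name, estimator in ESTIMATORS.items():
    times, pitches = track_pitch(signal, samplerate, estimator)
```

With a uniform call signature like this, an experiment loop can iterate over all estimators without knowing whether a given algorithm is implemented in Matlab, C, or Python underneath.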

All source code in this repository is licensed under the terms of the GPLv3 license.


