A Python3 program for converting Japanese words and numbers into phonemes.
MIT License
jphones: A Japanese Phonetizer
jphones accepts tokens of Japanese (words or numbers) as input and returns an approximate phonetic transcription.
The tokens can be Kanji, Hiragana, Katakana, or Romaji. English words will be phonetized via grapheme-to-phoneme conversion.
Example usage:
import jphones as j2p
token = {'token': 'すごい', 'type': 'word'}
phonetizer = j2p.phonetizer.Phonetizer()
phonemes = phonetizer.get_phonemes(token)
print(phonemes)
# {'phonemes': ['s', 'u', 'g', 'o', 'i'], 'token': 'すごい', 'type': 'word'}
To install jphones:
$ pip3 install git+https://github.com/JRMeyer/jphones.git
jphones is built on the following Python dependencies:
The Convert-Numbers-to-Japanese script has been significantly changed and is included in this repo, renamed as num2kana.py, so there is no need to install it.
The japanese_numbers module has been modified to work with jphones and Python 3.
Install japanese_numbers from my fork:
$ pip3 install git+https://github.com/JRMeyer/japanese-numbers-python.git
Install pykakasi and its dependencies:
$ pip3 install six semidbm
$ pip3 install pykakasi
The main j2p.Phonetizer.get_phonemes() function expects tokens as Python dicts. Each dict should have two entries:
'token': 'unicode-char-string'
'type': 'word' or 'number'
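If your text has already been split into words (Japanese is not space-delimited, so this assumes an upstream segmenter has done that), building the input dicts can be sketched as below. make_token and tokens_from_segments are hypothetical helpers, not part of jphones; they only enforce the two-key shape described above. Tagging a segment as 'number' when it consists solely of decimal digits is an assumption about how you want to route numerals.

```python
from typing import Dict, List

VALID_TYPES = ("word", "number")


def make_token(text: str, token_type: str = "word") -> Dict[str, str]:
    """Build one token dict with 'token' and 'type' keys (hypothetical helper)."""
    if not isinstance(text, str) or not text:
        raise ValueError("token must be a non-empty string")
    if token_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {token_type!r}")
    return {"token": text, "type": token_type}


def tokens_from_segments(segments: List[str]) -> List[Dict[str, str]]:
    """Tag all-digit segments as 'number' and everything else as 'word'."""
    # str.isdecimal() accepts ASCII and full-width digits, but not kanji numerals like 二千.
    return [make_token(s, "number" if s.isdecimal() else "word") for s in segments]


tokens = tokens_from_segments(["すごい", "2024"])
print(tokens)
# [{'token': 'すごい', 'type': 'word'}, {'token': '2024', 'type': 'number'}]
```

Kanji numerals such as 二千 fall through to 'word' under this rule; pass them with make_token(text, "number") explicitly if you want them treated as numbers.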
j2p.Phonetizer.get_phonemes() returns the original token dictionary with one extra entry, 'phonemes', holding a list of phoneme strings:
'phonemes': ['p','h','o','n','e','m','e','s']
The phoneme set is very naive, but it should suffice for ASR systems built on phonetic decision trees. The phonemes map one-to-one onto the Hepburn romaji set.
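Because each result keeps its 'token' key next to the new 'phonemes' list, the output can be written into a pronunciation lexicon for ASR. The sketch below assumes a Kaldi-style lexicon.txt (one word followed by its space-separated phones per line); lexicon_lines is a hypothetical helper, not part of jphones. It drops entries whose phonemes are the 'NUM-TOO-LARGE' placeholder and checks both a bare string and a one-element list, since the exact shape of that placeholder is an assumption here.

```python
from typing import Dict, Iterable, List, Union

TokenResult = Dict[str, Union[str, List[str]]]
TOO_LARGE = "NUM-TOO-LARGE"


def lexicon_lines(results: Iterable[TokenResult]) -> List[str]:
    """Turn get_phonemes()-style result dicts into unique 'word p h o n e s' lines."""
    lines: List[str] = []
    seen = set()
    for result in results:
        phonemes = result.get("phonemes")
        # Skip missing phonemes and the too-large placeholder in either shape.
        if not phonemes or phonemes == TOO_LARGE or phonemes == [TOO_LARGE]:
            continue
        line = f"{result['token']} {' '.join(phonemes)}"
        if line not in seen:  # keep only the first occurrence of each entry
            seen.add(line)
            lines.append(line)
    return lines


example = [
    {"phonemes": ["s", "u", "g", "o", "i"], "token": "すごい", "type": "word"},
    {"phonemes": ["s", "u", "g", "o", "i"], "token": "すごい", "type": "word"},
    {"phonemes": "NUM-TOO-LARGE", "token": "12345", "type": "number"},
]
print("\n".join(lexicon_lines(example)))
# すごい s u g o i
```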
Currently jphones can only handle numbers up to 9999. Anything larger is returned with the phoneme string 'NUM-TOO-LARGE'.
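The 9999 ceiling lines up with how Japanese reads numbers: digits are grouped by 10 (juu), 100 (hyaku), and 1000 (sen), and 10,000 starts a new unit, man (万). The sketch below is not num2kana.py. It is a hypothetical, self-contained number_to_romaji function that spells 0 to 9999 in ASCII romaji (long vowels doubled, e.g. juu), applies the common sound changes (sanbyaku, roppyaku, happyaku, sanzen, hassen), and returns the same 'NUM-TOO-LARGE' marker for anything larger. Reading 4 as yon, 7 as nana, and 0 as zero are choices made for the sketch.

```python
ONES = {1: "ichi", 2: "ni", 3: "san", 4: "yon", 5: "go",
        6: "roku", 7: "nana", 8: "hachi", 9: "kyuu"}
# Irregular readings caused by sound changes; other digits use ONES + suffix.
HUNDREDS = {1: "hyaku", 3: "sanbyaku", 6: "roppyaku", 8: "happyaku"}
THOUSANDS = {1: "sen", 3: "sanzen", 8: "hassen"}
MAX_NUMBER = 9999  # 10,000 would need the next unit, man


def number_to_romaji(n: int) -> str:
    """Spell 0..9999 in romaji (hypothetical sketch, not num2kana.py)."""
    if n < 0:
        raise ValueError("negative numbers are not handled")
    if n > MAX_NUMBER:
        return "NUM-TOO-LARGE"
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, ones = divmod(rest, 10)
    parts = []
    if thousands:
        parts.append(THOUSANDS.get(thousands, ONES[thousands] + "sen"))
    if hundreds:
        parts.append(HUNDREDS.get(hundreds, ONES[hundreds] + "hyaku"))
    if tens:
        # 10 is just "juu", not "ichijuu".
        parts.append("juu" if tens == 1 else ONES[tens] + "juu")
    if ones:
        parts.append(ONES[ones])
    return "".join(parts)


print(number_to_romaji(2024))   # nisennijuuyon
print(number_to_romaji(10000))  # NUM-TOO-LARGE
```

Calling list() on the result, e.g. list(number_to_romaji(300)), gives one-letter items in the same style as the 'phonemes' lists above.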
Of the three dependencies, only Convert-Numbers-to-Japanese lacks an MIT License; that script has no explicit license.