Converting Mozc dictionary to MeCab dictionary for Kana-Kanji conversion (KKC)
WTFPL License
This repository provides scripts that convert the Mozc dictionary into a MeCab dictionary, so that MeCab can be used as a Kana-Kanji converter (KKC).
$ git clone --depth 1 https://github.com/ikegami-yukino/mecab-as-kkc.git
$ make
$ make install
or
$ cp -r mecab-as-kkc <target directory>/mecab-as-kkc
If you do not plan to add dictionary entries, we recommend running the following commands, which free about 160 MB of disk space.
$ rm `mecab-config --dicdir`/mecab-as-kkc/lex.csv
$ rm `mecab-config --dicdir`/mecab-as-kkc/matrix.def
To uninstall:
$ make uninstall
or
$ rm -r <target directory>/mecab-as-kkc
Usage example (the 5 best conversion candidates):
$ echo ここではきものをぬぎます | mecab -d `mecab-config --dicdir`/mecab-as-kkc -N 5
ここでは着物を脱ぎます
ここでは着物を脱ぎます
ここではきものを脱ぎます
ここではきものを脱ぎます
ここで履物を脱ぎます
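The N-best list above contains repeated candidates. If only the distinct conversions are wanted, the output can be filtered. The following is a minimal sketch; `unique_candidates` is a hypothetical helper name, not something this repository provides.

```shell
# unique_candidates is a hypothetical helper, not part of this repository.
# It drops repeated lines from stdin and keeps the first occurrence of each,
# preserving the original order.
unique_candidates() {
  awk '!seen[$0]++'
}

# Sample input copied from the N-best output above.
printf '%s\n' \
  ここでは着物を脱ぎます \
  ここでは着物を脱ぎます \
  ここではきものを脱ぎます \
  ここではきものを脱ぎます \
  ここで履物を脱ぎます | unique_candidates
```

In practice the helper would sit at the end of the mecab pipeline, for example `... | mecab -d "$(mecab-config --dicdir)/mecab-as-kkc" -N 5 | unique_candidates`.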
Entries can be added to lex.csv, one entry per line. Each line is formatted as follows:
めかぶ,670,1250,4000,和布蕪
From the left, the fields are the reading (Hiragana), left-context ID, right-context ID, cost, and word. In this example, the reading "めかぶ" is converted to the word "和布蕪".
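As an illustration of the five-field layout, the sketch below builds one lex.csv line and rejects obviously malformed input. `make_lex_entry` is a hypothetical helper, and the checks (integer IDs and cost, no commas in the reading or word) are assumptions of this sketch, not rules stated by this repository.

```shell
# make_lex_entry is a hypothetical helper, not part of this repository.
# Arguments: reading, left-context ID, right-context ID, cost, word.
make_lex_entry() {
  if [ "$#" -ne 5 ]; then
    echo "usage: make_lex_entry READING LEFT_ID RIGHT_ID COST WORD" >&2
    return 1
  fi
  # Assumption: context IDs are non-negative integers; the cost is an
  # integer with an optional leading minus sign.
  for n in "$2" "$3" "${4#-}"; do
    case "$n" in
      ''|*[!0-9]*) echo "not an integer: $n" >&2; return 1 ;;
    esac
  done
  # A comma inside the reading or the word would break the 5-column layout.
  case "$1$5" in
    *,*) echo "reading and word must not contain commas" >&2; return 1 ;;
  esac
  printf '%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}

make_lex_entry めかぶ 670 1250 4000 和布蕪
# prints: めかぶ,670,1250,4000,和布蕪
```

Redirecting the output with `>>` onto the lex.csv of the dictionary being edited appends the entry; the exact path depends on where the dictionary was copied or installed.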
The left-context ID and right-context ID are chosen from the mozc/src/data/dictionary_oss/id.def file.
Usually, the following context IDs are used:
1837 名詞,サ変接続,*,*,*,*,*
1847 名詞,一般,*,*,*,*,*
1895 名詞,代名詞,一般,*,*,*,*
1916 名詞,固有名詞,一般,*,*,*,*
1917 名詞,固有名詞,人名,一般,*,*,*
1918 名詞,固有名詞,人名,名,*,*,*
1919 名詞,固有名詞,人名,姓,*,*,*
1920 名詞,固有名詞,地域,一般,*,*,*
1921 名詞,固有名詞,地域,一般,*,*,府名
1922 名詞,固有名詞,地域,一般,*,*,県名
1923 名詞,固有名詞,地域,一般,*,*,都名
1924 名詞,固有名詞,地域,国,*,*,*
1925 名詞,固有名詞,組織,*,*,*,*
Note that choosing an appropriate context ID requires domain knowledge of the Japanese language.
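When searching for a context ID, it can help to filter the ID list by a part-of-speech prefix. The sketch below assumes each line has the form "ID POS" separated by a space, as in the list above; `lookup_context_id` is a hypothetical helper, and the temporary file only stands in for a local copy of id.def.

```shell
# lookup_context_id is a hypothetical helper, not part of this repository.
# Usage: lookup_context_id POS_PREFIX ID_DEF_FILE
# Prints every "ID POS" line whose POS field starts with POS_PREFIX.
lookup_context_id() {
  awk -v prefix="$1" 'index($2, prefix) == 1 { print }' "$2"
}

# Demo on three lines copied from the list above.
sample=$(mktemp)
printf '%s\n' \
  '1847 名詞,一般,*,*,*,*,*' \
  '1918 名詞,固有名詞,人名,名,*,*,*' \
  '1919 名詞,固有名詞,人名,姓,*,*,*' > "$sample"
lookup_context_id '名詞,固有名詞,人名' "$sample"
rm -f "$sample"
```

The demo prints the two personal-name lines (1918 and 1919) and skips the general-noun line.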
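To tune a cost value (a lower cost makes an entry more likely to be selected), edit it in lex.csv, then rebuild the dictionary and check the conversion result: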
$ `mecab-config --libexecdir`/mecab-dict-index -d mecab-as-kkc -o mecab-as-kkc
$ echo めかぶ | mecab -d `mecab-config --dicdir`/mecab-as-kkc
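Changing a cost means editing the fourth field of the matching line in lex.csv before rebuilding. The sketch below shows one way to do that edit with awk. `set_entry_cost` is a hypothetical helper, and the second sample line is made up purely for illustration.

```shell
# set_entry_cost is a hypothetical helper, not part of this repository.
# Usage: set_entry_cost READING WORD NEW_COST LEX_CSV
# Prints LEX_CSV to stdout with the cost (4th field) replaced on lines
# whose reading (1st field) and word (5th field) both match.
set_entry_cost() {
  awk -F, -v OFS=, -v reading="$1" -v word="$2" -v cost="$3" \
    '$1 == reading && $5 == word { $4 = cost } { print }' "$4"
}

# Demo: the first line is the example entry from this README;
# the second line is invented sample data.
tmp=$(mktemp)
printf '%s\n' 'めかぶ,670,1250,4000,和布蕪' 'めかぶ,1847,1847,5000,メカブ' > "$tmp"
set_entry_cost めかぶ 和布蕪 3000 "$tmp"
rm -f "$tmp"
```

The helper writes to stdout rather than editing in place, so the result can be inspected before it replaces the original file. After replacing lex.csv, the dictionary still has to be rebuilt with mecab-dict-index for the change to take effect.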
Currently, this repository supports neither Kana-Symbol nor Kana-Emoji conversion, because we do not know how to determine appropriate costs for them.
Contributions are welcome.
WTFPL
We thank the MeCab and Mozc projects, on which this repository relies.