kusari

Japanese random sentence generator based on a Markov chain.

Installation

$ gem install kusari

Usage

First, load the gem and create a generator instance:

require 'kusari'
generator = Kusari::Generator.new
# by default, the above statement is the same as:
#   generator = Kusari::Generator.new(3, "./ipadic")

Here the first argument, 3, is the N of the N-gram model used to build the tokenized word table; you can pass any number. The second argument, ./ipadic, is the path to the IPA dictionary, which the generator uses to parse Japanese strings.
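To make the role of N concrete, here is a minimal sketch of how an N-gram word table could be built from a sentence that has already been split into words. It is plain Ruby and does not use kusari; build_ngram_table and the hand-made token list are assumptions for illustration only, and the real tokenization (done with the IPA dictionary) and internal table format may differ.

```ruby
# Hypothetical helper, not part of kusari: build an N-gram table from a
# sentence that is already split into tokens.
def build_ngram_table(tokens, n = 3)
  table = Hash.new { |hash, key| hash[key] = [] }
  # Every window of n tokens adds one entry:
  # (first n-1 tokens) -> the token that followed them.
  tokens.each_cons(n) do |gram|
    table[gram[0...-1]] << gram[-1]
  end
  table
end

# Hand-made tokenization, assumed for the example.
tokens = %w[ネロ は 、 アルデンネ 生まれ の 少年 でし た 。]
table = build_ngram_table(tokens, 3)

p table[%w[ネロ は]]  # => ["、"]
```

With N = 3 each key holds two words, so a larger N tends to keep generated text closer to the reference sentences, while a smaller N allows more mixing.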

Next, add strings (the reference sentences for the Markov chain):

generator.add_string("ネロとパトラッシュは、この世で二人きりでした。")
generator.add_string("彼らは、実の兄弟よりも仲のよい大の親友でした。")
generator.add_string("ネロは、アルデンネ生まれの少年でした。")

You can also save the tokenized word table to a local file:

generator.save("tokenized_table.markov")

A saved table can be loaded again:

generator.load("tokenized_table.markov")

Finally, generate a random sentence:

generator.generate(140)
# => "ネロは、アルデンネ生まれの兄弟よりも仲のよい大の少年でした。"

The argument to the generate method sets the length limit of the generated sentence; for example, generator.generate(140) creates a sentence that can be posted on Twitter.
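The following sketch shows one way a length-limited Markov chain walk could work. It is not kusari's implementation: build_table, generate_sentence, the hand-made token lists, and the rule of stopping before the limit would be exceeded are all assumptions made for illustration.

```ruby
# Hypothetical sketch; none of these names come from kusari.

# Map each (n-1)-token state to the tokens observed after it.
def build_table(sentences, n)
  table = {}
  sentences.each do |tokens|
    tokens.each_cons(n) { |gram| (table[gram[0...-1]] ||= []) << gram[-1] }
  end
  table
end

# Random walk from `start`. Stops at a sentence end ("。"), at a dead end,
# or when the next token would push the sentence past `limit` characters.
def generate_sentence(table, start, limit, rng: Random.new)
  state = start.dup
  sentence = start.join
  loop do
    candidates = table[state]
    break if candidates.nil?
    word = candidates.sample(random: rng)
    break if sentence.length + word.length > limit
    sentence << word
    break if word == "。"
    state = state.drop(1) << word
  end
  sentence
end

# Hand-made tokenization, assumed for the example.
sentences = [
  %w[ネロ は 、 アルデンネ 生まれ の 少年 でし た 。],
  %w[彼ら は 、 実 の 兄弟 より も 仲 の よい 大 の 親友 でし た 。]
]
table = build_table(sentences, 3)

puts generate_sentence(table, %w[ネロ は], 140)
puts generate_sentence(table, %w[ネロ は], 10)
```

Because both reference sentences contain the words "は" and "、" in sequence, the walk can switch from one sentence to the other at that point, which is how a Markov chain produces sentences that appear in neither input.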

License

MIT