Generating fun Stack Exchange questions using Markov chains
GPL-3.0 License
For Debian and similar distributions, install with:
sudo apt-get install p7zip-full
git clone https://github.com/Findus23/se-simulator
cd se-simulator
git submodule init
git submodule update
pip install -r requirements.txt
se-simulator
Copy config.sample.py to config.py, fill in the database details and create a secret_key
Run create.py, which creates the database and fetches the list of SE sites
Run apply_colors.py (which should run really quickly)
Create the directories chains, download and raw (or symlinks to somewhere where more disk space is left)
Download the .7z files for the sites you want to generate (it's recommended to start with a file <100MB)
If a .7z file has a different name than the site has now, rename it
Run consume.py, which takes each file in raw/, unpacks it and extracts the needed content from the .xml files into new .jsonl files. It also writes the data of the file into the db, so it won't be imported again.
todb.py
shuffle.py
count.txt
server.py
http://127.0.0.1:5000/
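The consume.py step above turns the dump's .xml files into .jsonl files. As a rough illustration of that kind of conversion (not the project's actual code), the sketch below streams `<row>` elements with the standard library and writes one JSON object per question. The function name `posts_to_jsonl` and the choice of the `title` and `body` fields are assumptions made for this example; the `PostTypeId`, `Title` and `Body` attributes come from the public Stack Exchange dump format.

```python
import io
import json
import xml.etree.ElementTree as ET


def posts_to_jsonl(xml_source, jsonl_target):
    """Stream <row> elements and write one JSON line per question.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    written = 0
    # iterparse hands over each element once it is complete,
    # so the whole file never has to be parsed up front
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        # PostTypeId "1" marks a question, "2" an answer
        if elem.get("PostTypeId") == "1":
            record = {"title": elem.get("Title"), "body": elem.get("Body")}
            jsonl_target.write(json.dumps(record) + "\n")
            written += 1
        # drop the row's attributes once it has been handled
        elem.clear()
    return written


# Tiny in-memory stand-in for a Posts.xml file
sample = io.BytesIO(
    b"<posts>"
    b'<row Id="1" PostTypeId="1" Title="Why is the sky blue?" '
    b'Body="&lt;p&gt;Asking for a friend.&lt;/p&gt;" />'
    b'<row Id="2" PostTypeId="2" ParentId="1" '
    b'Body="&lt;p&gt;Rayleigh scattering.&lt;/p&gt;" />'
    b"</posts>"
)
out = io.StringIO()
count = posts_to_jsonl(sample, out)
print(count, out.getvalue(), end="")
```

With a real dump, `xml_source` could be a path to an unpacked .xml file and `jsonl_target` a file opened for writing; answers are skipped here only to keep the example short.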
app.py: needed for Flask
basemodel.py and models.py: peewee ORM
extra_data.py: manually collected colors of every site with a custom theme
markov.py: extending the great markovify library for my use case
parsexml.py: reading in the Stack Exchange dump XML files with no more than 40MB RAM usage
text_generator.py: everything that creates the content and handles the Markov chains
updater.py: probably not working anymore, checks for newer dump files
utils.py: everything else
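For readers unfamiliar with the idea behind the Markov chains mentioned above, here is a minimal sketch of a first-order, word-level chain using only the standard library. It is not the markovify API and not what markov.py or text_generator.py actually do; the names `build_chain` and `generate`, the `START`/`END` markers and the sample titles are all made up for this illustration.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking sentence boundaries
START, END = "<s>", "</s>"


def build_chain(sentences):
    """Map each word to the list of words seen directly after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=30):
    """Walk the chain from START until END or max_words is reached."""
    words = []
    current = START
    while len(words) < max_words:
        followers = chain.get(current)
        if not followers:
            break  # nothing was ever seen after this word
        current = rng.choice(followers)
        if current == END:
            break
        words.append(current)
    return " ".join(words)


titles = [
    "How do I exit Vim",
    "How do I center a div",
    "Why is my cat staring at me",
]
rng = random.Random(0)  # seeded so repeated runs give the same text
generated = generate(build_chain(titles), rng)
print(generated)
```

Because words that follow a given word more often appear more often in its list, picking uniformly from that list reproduces the observed transition frequencies. Every pair of neighbouring words in the output has occurred somewhere in the input, which is why the results look plausible locally while drifting off as a whole.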