lexique-experiments

Experiments with the Lexique French frequency database

French Vocabulary Frequency with Lexique

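Lexique 3 is a French word database from the Université de Savoie. SQLite 3 is a fast, local database. IPython is an interactive environment for working with and visualizing scientific data. This project loads the Lexique data into SQLite and explores it from an IPython notebook.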

I'm also merging in the French verb conjugation rules data set, which will provide detailed information about regular verbs.

Viewing the notebook online

A notebook with lots of interesting data is available via nbviewer.

Licenses

  • The data in the directory Lexique380 is distributed under a Creative
    Commons license, which you can find in that directory.
  • The conjugation rules in verbs-0-2-0.xml are in the public
    domain, according to the SourceForge project page where I found them.
  • The verb-prototypes.tsv file is generated using both the Lexique
    data and the conjugation rules.

Running it

You will need:

  • An environment which defaults to UTF-8 encoding.
  • The standard Unix/Linux command-line tools.
  • iconv
  • sqlite3
  • Python 2.7.3 or later.
  • pip
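
Before running anything, a quick check of these prerequisites can save some confusion. The sketch below is only an illustration and is not part of this repository: it assumes Python 3.3 or later (for `shutil.which`), the helper names `find_missing` and `uses_utf8` are made up for this example, and `make` is added to the list only because the build step invokes it. On some systems pip is installed as `pip3`, in which case it will be reported as missing.

```python
import locale
import shutil

# Tools named in the list above, plus `make`, which the build step uses.
REQUIRED_TOOLS = ["iconv", "sqlite3", "pip", "make"]


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def uses_utf8():
    """Report whether the preferred locale encoding is a spelling of UTF-8."""
    encoding = locale.getpreferredencoding(False)
    return encoding.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("UTF-8 locale:", "yes" if uses_utf8() else "no")
```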

Run the following commands from the command line:

# Install ipython and supporting libraries.
pip install -U pandas
pip install -U 'ipython[notebook]'
pip install -U brewer2mpl

# Generate our database from the raw Lexique data.
make

# Open up our interactive notebook in a web browser.
ipython notebook 'French Vocabulary Frequency with Lexique.ipynb'
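
Once the database exists, it can also be queried outside the notebook. The following is a minimal sketch of a frequency query, not the notebook's actual code. It builds a tiny in-memory stand-in rather than opening the generated file, because the real file name and schema are defined by the Makefile and are not described here. The table name `lexique` and the helper `top_words` are hypothetical, the column names are modelled on Lexique's `ortho`, `cgram` and `freqfilms2` fields, and the frequencies are placeholder values, not real Lexique figures. It assumes Python 3.

```python
import sqlite3

# In-memory stand-in for the database that `make` generates.
# Hypothetical table name; placeholder frequencies, not real Lexique data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lexique (ortho TEXT, cgram TEXT, freqfilms2 REAL)")
conn.executemany(
    "INSERT INTO lexique VALUES (?, ?, ?)",
    [
        ("de", "PRE", 300.0),
        ("être", "VER", 200.0),
        ("avoir", "VER", 150.0),
        ("chat", "NOM", 100.0),
    ],
)


def top_words(connection, cgram, limit=10):
    """Return (word, frequency) pairs for one grammatical category,
    most frequent first."""
    cursor = connection.execute(
        "SELECT ortho, freqfilms2 FROM lexique "
        "WHERE cgram = ? ORDER BY freqfilms2 DESC LIMIT ?",
        (cgram, limit),
    )
    return cursor.fetchall()


print(top_words(conn, "VER"))
```

To run the same kind of query against the real data, replace `":memory:"` with the path to the generated database and adjust the table and column names to match its schema. The same SQL string can be passed to `pandas.read_sql_query` to get a DataFrame instead of a list of tuples.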