counterix

Generating count-based Distributional Semantic Models

A small toolkit to generate count-based, PPMI-weighted, SVD-reduced Distributional Semantic Models.

Install

pip install counterix

or, after a git clone:

python3 setup.py install

Use

Generate

To generate a raw count matrix from a tokenized corpus, run:

counterix generate \
  --corpus /abs/path/to/corpus/txt/file \
  --min-count frequency_threshold \
  --win-size window_size

If the --output parameter is not set, the output files are saved to the directory containing the corpus.
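
For intuition about what a raw count model contains, here is a minimal sketch of window-based co-occurrence counting. It is not counterix's implementation: the function name count_cooccurrences, the symmetric window, and the choice to drop infrequent words before applying the window are all assumptions made for illustration.

```python
from collections import Counter


def count_cooccurrences(sentences, min_count=1, win_size=2):
    """Count symmetric word co-occurrences within a sliding window.

    sentences: list of token lists (an already tokenized corpus).
    min_count: words occurring fewer times than this are ignored.
    win_size:  number of context tokens considered on each side of a target.
    """
    # First pass: word frequencies, used to apply the frequency threshold.
    freqs = Counter(tok for sent in sentences for tok in sent)
    vocab = {word for word, freq in freqs.items() if freq >= min_count}

    # Second pass: count (target, context) pairs inside the window.
    # Assumption: infrequent words are removed before the window is applied.
    counts = Counter()
    for sent in sentences:
        tokens = [tok for tok in sent if tok in vocab]
        for i, target in enumerate(tokens):
            start = max(0, i - win_size)
            end = min(len(tokens), i + win_size + 1)
            for j in range(start, end):
                if j != i:
                    counts[(target, tokens[j])] += 1
    return counts


corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
counts = count_cooccurrences(corpus, min_count=2, win_size=1)
```

With min_count=2, only "the", "sat" and "on" survive the threshold, so every remaining pair is counted once per sentence. A real corpus would need a sparse matrix rather than a dictionary of pairs.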

Weigh

To apply PPMI weighting to a raw count model, run:

counterix weigh --model /abs/path/to/raw/count/npz/model
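
As a reminder of what PPMI weighting computes, the sketch below applies the standard definition, PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))), to a small dictionary of counts. The function name ppmi and the dictionary representation are hypothetical; counterix works on npz models and may differ in details such as the logarithm base or smoothing.

```python
import math
from collections import Counter


def ppmi(counts):
    """Return {(word, context): PPMI} for a dict of raw co-occurrence counts."""
    total = sum(counts.values())

    # Marginal counts for words (rows) and contexts (columns).
    row_sums = Counter()
    col_sums = Counter()
    for (word, context), n in counts.items():
        row_sums[word] += n
        col_sums[context] += n

    weighted = {}
    for (word, context), n in counts.items():
        if n == 0:
            continue  # PMI is undefined for zero counts; treat as 0.
        # P(w,c) / (P(w) * P(c)) simplifies to n * total / (row * col).
        pmi = math.log2(n * total / (row_sums[word] * col_sums[context]))
        if pmi > 0:  # Keep only positive values (the first "P" in PPMI).
            weighted[(word, context)] = pmi
    return weighted


raw = {
    ("cat", "purr"): 3,
    ("cat", "the"): 1,
    ("dog", "the"): 3,
    ("dog", "purr"): 1,
}
weighted = ppmi(raw)
```

Pairs that co-occur less often than chance, such as ("cat", "the") here, get a negative PMI and are dropped, which keeps the weighted matrix sparse.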

SVD

To apply SVD to a PPMI-weighted model, keeping k=10000 dimensions, run:

counterix svd \
  --model /abs/path/to/ppmi/npz/model \
  --dim 10000

To control the number of threads used during SVD, set the OMP_NUM_THREADS environment variable when running counterix, for example OMP_NUM_THREADS=1 for a single thread.
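
To illustrate the dimensionality reduction step, here is a small sketch that truncates a dense SVD with NumPy. It is an assumption-laden toy, not counterix's code: the helper reduce_with_svd is hypothetical, scaling the left singular vectors by the singular values is only one common convention, and a k=10000 decomposition of a large sparse matrix would call for a sparse truncated solver rather than numpy.linalg.svd.

```python
import numpy as np


def reduce_with_svd(matrix, dim):
    """Project the rows of `matrix` onto its top `dim` singular dimensions."""
    # Thin SVD: matrix = U @ diag(S) @ Vt, singular values sorted descending.
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(dim, len(s))
    # Assumption: word vectors are taken as U scaled by the singular values.
    return u[:, :k] * s[:k], s[:k]


# A toy symmetric matrix standing in for a PPMI-weighted model.
ppmi_matrix = np.array([
    [0.0, 1.2, 0.0, 0.4],
    [1.2, 0.0, 0.3, 0.0],
    [0.0, 0.3, 0.0, 2.1],
    [0.4, 0.0, 2.1, 0.0],
])
vectors, singular_values = reduce_with_svd(ppmi_matrix, dim=2)
```

Each row of vectors is then a 2-dimensional representation of the corresponding word, and singular_values holds the two largest singular values in descending order.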