counterix

A small toolkit for generating count-based Distributional Semantic Models, weighted with PPMI and reduced with SVD.

Install

pip install counterix

or, after a git clone:

python3 setup.py install

Use

Generate

To generate a raw count matrix from a tokenized corpus, run:

counterix generate \
  --corpus /abs/path/to/corpus/txt/file \
  --min-count frequency_threshold \
  --win-size window_size

If the --output parameter is not set, the output files will be saved to the corpus directory.
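Conceptually, a raw count matrix of this kind records, for every word that passes the frequency threshold, how often each other retained word occurs within win_size tokens of it. The plain-Python sketch below illustrates that idea; it is not counterix's own code. The function name count_cooccurrences, the symmetric window, and the choice to drop rare tokens without closing the gap they leave in the sentence are all assumptions made for illustration.

```python
from collections import Counter


def count_cooccurrences(sentences, min_count, win_size):
    """Hypothetical helper: build a vocabulary and symmetric window counts.

    sentences: iterable of token lists (one tokenized sentence each).
    min_count: tokens seen fewer times than this are dropped (assumption:
               they still occupy their position in the window).
    win_size:  number of context tokens considered on each side.
    """
    sentences = [list(s) for s in sentences]  # allow two passes over the data
    freqs = Counter(tok for sent in sentences for tok in sent)
    # Keep only tokens meeting the threshold; index them in sorted order.
    vocab = sorted(w for w, c in freqs.items() if c >= min_count)
    word2idx = {w: i for i, w in enumerate(vocab)}
    counts = Counter()  # (target_idx, context_idx) -> co-occurrence count
    for sent in sentences:
        for i, target in enumerate(sent):
            if target not in word2idx:
                continue
            start = max(0, i - win_size)
            end = min(len(sent), i + win_size + 1)
            for j in range(start, end):
                if j != i and sent[j] in word2idx:
                    counts[(word2idx[target], word2idx[sent[j]])] += 1
    return vocab, counts
```

With a corpus stored as one whitespace-tokenized sentence per line (also an assumption), the input could be built with `[line.split() for line in f]`.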

Weigh

To apply PPMI weighting to a raw count model, run:

counterix weigh --model /abs/path/to/raw/count/npz/model
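PPMI replaces each count n(w, c) with max(0, log(n(w, c) · N / (n(w) · n(c)))), where N is the total of all counts and n(w), n(c) are the row and column sums. The dense NumPy sketch below implements only that textbook formula. counterix reads and writes .npz models, and a realistic vocabulary would need a sparse representation; the tool may also apply refinements (such as context-distribution smoothing) that are not shown here.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI weighting of a dense co-occurrence count matrix (sketch)."""
    counts = np.asarray(counts, dtype=np.float64)
    total = counts.sum()                      # N
    row = counts.sum(axis=1, keepdims=True)   # n(w), one per target word
    col = counts.sum(axis=0, keepdims=True)   # n(c), one per context word
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    # Zero counts give -inf (or nan for empty rows/columns); map them to 0.
    pmi[~np.isfinite(pmi)] = 0.0
    # Clip negative associations to 0: the "positive" in PPMI.
    return np.maximum(pmi, 0.0)
```

A word pair that co-occurs exactly as often as independence predicts gets log(1) = 0, so only above-chance associations keep a positive weight.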

SVD

To apply SVD to a PPMI-weighted model, keeping the top k=10000 dimensions, run:

counterix svd \
  --model /abs/path/to/ppmi/npz/model \
  --dim 10000
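Truncated SVD factorizes the PPMI matrix as M ≈ U_k Σ_k V_kᵀ and keeps only the k largest singular values, so --dim plays the role of k. The NumPy sketch below computes a full dense SVD and slices it, which is only practical for small matrices; a large sparse model would call for an iterative solver such as scipy.sparse.linalg.svds. Which factor (U_k, U_k Σ_k, or another variant) counterix saves as word vectors is not stated here, so the helper name truncated_svd and its return values are illustrative only.

```python
import numpy as np


def truncated_svd(matrix, dim):
    """Keep the top-`dim` singular triplets of a dense matrix (sketch)."""
    u, s, vt = np.linalg.svd(np.asarray(matrix, dtype=np.float64),
                             full_matrices=False)
    # NumPy returns singular values in descending order, so slicing the
    # first `dim` components keeps the largest ones.
    return u[:, :dim], s[:dim], vt[:dim, :]
```

Multiplying the pieces back together, `(u * s) @ vt`, gives the best rank-k approximation of the input in the least-squares sense.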

To control the number of threads used during SVD, set the OMP_NUM_THREADS environment variable when running counterix, for example OMP_NUM_THREADS=1 to use a single thread.
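When counterix is launched from a Python script, the variable can be passed to the child process through subprocess. The helper below is a hypothetical sketch: env_with_threads is not part of counterix, and the model path in the usage comment is a placeholder. OMP_NUM_THREADS is typically read when the numerical libraries are loaded, so setting it in the child's environment, rather than changing it after import in the current process, is what reliably takes effect.

```python
import os


def env_with_threads(n_threads, base=None):
    """Return a copy of an environment mapping with OMP_NUM_THREADS set."""
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


# Hypothetical usage (placeholder path; flags as in the svd command above):
# import subprocess
# subprocess.run(
#     ["counterix", "svd", "--model", "/abs/path/to/ppmi/npz/model", "--dim", "10000"],
#     env=env_with_threads(1),
#     check=True,
# )
```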
