gensim-data

Data repository for pretrained NLP models and NLP corpora.

LGPL-2.1 License

Stars: 974

Ecosystems: Python

Related Projects

japanese-words-to-vectors

Word2vec (word to vectors) approach for the Japanese language using Gensim and MeCab.

04 Sep 2016 83

mtdata

A tool that locates, downloads, and extracts machine translation corpora

06 Apr 2020 139

magnitude

A fast, efficient universal vector embedding utility package.

24 Feb 2018 1,624

toolbox

A collection of tools, APIs and other resources to use in creative coding web projects.

20 Dec 2014 72

belb

Biomedical Entity Linking Benchmark

22 Aug 2023 9

TransCoder

Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf

10 Jul 2020 1,688

nlp

This repository recorded my NLP journey.

18 May 2018 1,073

word2vec-api

Simple web service providing a word embedding model

15 Jul 2014 1,431

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...

23 Feb 2024 1,061

gensim

Topic Modelling for Humans

10 Feb 2011 15,255

RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large langu...

14 Apr 2023 4,532

conceptnet-numberbatch

13 Jul 2015 1,286

LexicalRichness

A module to compute textual lexical richness (aka lexical diversity).

09 May 2018 90

code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Re...

24 Jul 2018 1,096