Data repository for pretrained NLP models and NLP corpora.
LGPL-2.1 License
Word2vec (word to vectors) approach for the Japanese language using Gensim and MeCab.
A tool that locates, downloads, and extracts machine translation corpora
A fast, efficient universal vector embedding utility package.
A collection of tools, APIs and other resources to use in creative coding web projects.
Biomedical Entity Linking Benchmark
Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf
This repository records my NLP journey.
Simple web service providing a word embedding model
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
Topic Modelling for Humans
The RedPajama-Data repository contains code for preparing large datasets for training large langu...
A module to compute textual lexical richness (aka lexical diversity).
TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Re...