subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
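subword-nmt learns its segmentation with byte pair encoding (BPE). Starting from single characters, it repeatedly merges the most frequent adjacent pair of symbols into a new symbol. The Python sketch below illustrates only that merge-learning loop, under simplifying assumptions. It is not subword-nmt's code: the function names `learn_bpe`, `get_pair_counts` and `merge_pair` are hypothetical, the `</w>` end-of-word marker is an illustrative choice, and for simplicity the sketch recounts every pair on each iteration instead of updating counts incrementally.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace each standalone occurrence of the symbols `a b` with `ab`."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we only match whole symbols, not parts of longer ones.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Hypothetical helper: return up to `num_merges` merge operations, in order."""
    # Each word becomes space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


if __name__ == "__main__":
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(counts, 5))
    # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
```

To segment new text, the learned merges would then be applied in the order they were learned, so frequent words end up as single symbols while rare words split into several smaller subword units.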

MIT License

Downloads: 14.1K

Stars: 2.1K


Ecosystems: Python

Commit Statistics

Total Commits: 114

Avg. Commits Per Committer: 1.0 (past year), 6.0 (all time)

Issue Statistics

Time to Close Issues: N/A (past year), about 1 month (all time)

Package Rankings

Top 1.62% on Pypi.org

Top 6.73% on Proxy.golang.org

Related Projects

GLM

GLM (General Language Model)

18 Mar 2021 3,170

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

16 Feb 2024 9,074

bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

04 Oct 2017 1,179

SimCR

Code for NAACL 2024 main conference paper "An Empirical Study of Consistency Regularization for E...

27 Aug 2023 5

TransCoder

Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf

10 Jul 2020 1,688

sentence-transformers

Multilingual Sentence & Image Embeddings with BERT

24 Jul 2019 13,925

news-translit-nmt

Training scripts and instructions how to reproduce our systems submitted to the NEWS 2018 Task on...

18 Jan 2019 4

text2vec

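text2vec, text to vector. A text vector representation tool that converts text into vector matrices, implementing Word2Vec, RankBM25, Sentence-BERT, CoSENT and other text representation and text similarity...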

12 Nov 2019 4,034

conceptnet-numberbatch

13 Jul 2015 1,286

wordfreq

Access a database of word frequencies, in various natural languages.

28 Oct 2013 1,362

sequence-to-sequence-from-scratch

Sequence to Sequence from Scratch Using Pytorch

30 Sep 2018 119

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

29 Aug 2017 29,423

HanLP

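Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing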

09 Oct 2014 33,640

awesome-text-summarization

A guide to tackling text summarization

04 Oct 2017 1,272

End-to-end-ASR-Pytorch

This is an open source project (formerly named Listen, Attend and Spell - PyTorch Implementation)...

08 Dec 2017 1,177