TreebankPreprocessing

Python scripts for preprocessing the Penn Treebank and Chinese Treebank

GPL-3.0 License

Stars: 162

Ecosystems: Python

Related Projects

nlu_datasets

Datasets for intent classification and entity extraction including converters.

20 Nov 2018 5

pytextclassifier

pytextclassifier is a toolkit for text classification. Text classification, LR, Xgboost, TextCNN, FastText, TextRNN, B...

28 Apr 2017 482

SmartLMVocabs

Improving Language Model Performance through Smart Vocabularies

22 Nov 2018 6

attend-copy-parse

Code for the paper attend, copy, parse - End-to-end information extraction from documents (https:...

23 Sep 2019 13

argumentation-management

Annotator combining different NLP pipelines.

28 Jun 2021 0

bert-pos

Part-of-speech tagging using BERT

03 Oct 2019 8

HarvestText

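Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods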

19 Nov 2018 2,391

jnlpba

Tools and resources related to the JNLPBA corpus

03 May 2016 5

subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

01 Sep 2015 2,146

spacy_conll

Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpi...

19 Dec 2018 72

multi-criteria-cws

Simple Solution for Multi-Criteria Chinese Word Segmentation

05 Dec 2017 300

lachesis

lachesis automates the segmentation of a transcript into closed captions

27 Dec 2016 32

gazp

Source code for Grounded Adaptation for Zero-shot Executable Semantic Parsing

01 Feb 2021 21

yle-corpus

Tools for working with the Yle corpus

10 Oct 2019 6

CS-Tacotron-Pytorch

PyTorch implementation of CS-Tacotron, a code-switching speech synthesis end-to-end generative TT...

11 Jan 2019 21