korean-wikipedia-corpus

A Korean Wikipedia corpus segmented into sentences. Download it from Releases or use it through tfds-korean.

Ecosystems: Python

Related Projects

korean-spacing-model

A model for Korean sentence spacing (removing and adding spaces). It is written so that you can train it yourself after preparing the data.

16 Sep 2020 54

yle-corpus

Tools for working with the Yle corpus

10 Oct 2019 6

top-open-subtitles-sentences

Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code

19 Oct 2022 17

namuwiki-corpus

A Namuwiki dataset segmented into sentences. Download it from Releases or through tfds-korean.

12 Jun 2021 16

AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

18 Sep 2023 4,242

SynthText

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation i...

11 Sep 2016 2,011

speech-to-text-wavenet

Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's...

14 Nov 2016 3,945

pytextclassifier

pytextclassifier is a toolkit for text classification. Text classification, LR, Xgboost, TextCNN, FastText, TextRNN, B...

28 Apr 2017 482

ailearning

AiLearning: data analysis + hands-on machine learning + linear algebra + PyTorch + NLTK + TF2

25 Feb 2017 38,884

tta

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

03 Feb 2021 13

wikiedits

Automatic extraction of edited sentences from text edition histories.

12 Apr 2014 77

GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

31 May 2019 7,448

FASPell

2019 SOTA spell checker for Simplified and Traditional Chinese: FASPell Chinese Spell Checker (Chinese spell check / Chinese spelling error detection / Chinese spelling correction)

26 Sep 2019 1,199

ECDICT

Free English to Chinese Dictionary Database

20 Mar 2017 5,918

keyword_extraction

Chinese text keyword extraction implemented in Python, using three methods: TF-IDF, TextRank, and Word2Vec word clustering.

23 Nov 2017 1,121