odsc

A project that aims to split all the open data of Riksdagen (the Swedish parliament) and other sources into sentences, creating an easily linkable dataset of sentences that can be referred to from Wikidata lexemes and other resources.

GPL-3.0 License

Stars

Committers

Ecosystems: Python

Commit Statistics

Total Commits

142

Total Committers

Avg. Commits Per Committer

47.33

Bot Commits

Issue Statistics

Total Pull Requests

Merged Pull Requests

Total Issues

Time to Close Issues

about 14 hours

Related Projects

zix_understandability-index

Get a pragmatic assessment of how understandable a German text is.

22 Aug 2024 6

spacy-udpipe

spaCy + UDPipe

25 Jul 2019 159

skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

16 Mar 2021 917

textacy

NLP, before and after spaCy

03 Feb 2016 2,208

sentence-embedding-evaluation-german

Basically SentEval with German language downstream tasks

08 Apr 2022 0

vocabsieve

Simple sentence mining tool for language learning

10 Jul 2021 372

LexDanNet

07 Dec 2023 0

nlu_datasets

Datasets for intent classification and entity extraction including converters.

20 Nov 2018 5

parler-tts

Inference and training library for high-quality TTS models.

13 Feb 2024 2,623

contextualSpellCheck

✔️ Contextual word checker for better suggestions

10 Apr 2020 395

LexSrt

The purpose of this script is to get all the senses for all the words in a SRT-file from Wikidata

05 Sep 2023 3

gensim-data

Data repository for pretrained NLP models and NLP corpora.

13 Oct 2017 974

summarizer

A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built ...

10 Feb 2019 269

top-open-subtitles-sentences

Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code

19 Oct 2022 17

SmartLMVocabs

Improving Language Model Performance through Smart Vocabularies

22 Nov 2018 6