Semantic Chunker
MIT License
A Python tool for splitting large Markdown files into smaller sections based on a specified token...
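A minimal sketch of the splitting approach described above: break the Markdown at heading boundaries, then pack sections into chunks that respect a token budget. The function name, the paragraph-level fallback for oversized sections, and the whitespace-based token count are illustrative assumptions, not the tool's actual API; a real implementation would likely count tokens with a proper tokenizer such as tiktoken.

```python
import re

def split_markdown(text: str, max_tokens: int = 200) -> list[str]:
    """Split Markdown into chunks of at most max_tokens tokens.

    Hypothetical sketch: sections are cut at heading lines and merged
    greedily; a section that alone exceeds the budget is further split
    on blank lines. Token counting is a naive whitespace split.
    """
    def count(s: str) -> int:
        return len(s.split())

    # Split at heading lines (#, ##, ...), keeping each heading
    # attached to the text that follows it.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6}\s)", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for section in sections:
        if count(current) + count(section) <= max_tokens:
            current += section  # still fits in the running chunk
            continue
        if current.strip():
            chunks.append(current)
        if count(section) <= max_tokens:
            current = section
            continue
        # Oversized section: fall back to splitting on paragraphs.
        part = ""
        for para in section.split("\n\n"):
            if count(part) + count(para) <= max_tokens:
                part += para + "\n\n"
            else:
                if part.strip():
                    chunks.append(part)
                part = para + "\n\n"
        current = part
    if current.strip():
        chunks.append(current)
    return chunks
```

Splitting at headings first keeps each chunk semantically coherent; the paragraph fallback only triggers when a single section would blow the budget on its own.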