Semantic Chunker
MIT License
Published by Goldziher 3 months ago
Yet another text augmentation Python package
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
A Python tool for splitting large Markdown files into smaller sections based on a specified token...
Aims to make JapaneseTokenizer as easy to use as possible
👖 Conformal Tights adds conformal prediction of coherent quantiles and intervals to any scikit-le...
A Python documentation linter that checks that the docstring description matches the definition....
Split strings into (character-based) k-shingles
Pre-commit settings for Python projects with pyproject.toml
A project template for Python packages with heavy use of GitHub Actions
Automatically execute code blocks within a Markdown file and update the output in-place
Installs the latest GE-Proton and installs non-Steam launchers under one Proton prefix folder and a...
A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse ra...
A template for Python projects. It uses Poetry for dependency management and tox for testing.
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
Annotator combining different NLP pipelines.