Research software engineer - NLP - AI - 🐧 Linux & open-source enthusiast - 🐍 Python/ 🌊C/C++ / 🦀 Rust / 🐚 Shell - 🔐 InfoSec - https://git.sr.ht/~proycon
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Python - Released: 06 Jul 2010 - 477
A minimalistic spaced-repetion vocabulary trainer (flashcards) for the terminal
Rust - Released: 18 Jun 2020 - 143
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl, this contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions
Python - Released: 24 Jun 2011 - 60
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
Python - Released: 06 Jul 2010 - 129
an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
Rust - Released: 19 Apr 2021 - 31
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations, a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
JavaScript - Released: 02 Jan 2014 - 108
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.
C++ - Released: 21 Sep 2013 - 123
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Cython - Released: 21 May 2014 - 29
An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Python - Released: 06 Sep 2018 - 18
A Python package for generating and working with codemeta
Python - Released: 16 Apr 2018 - 25
LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
Shell - Released: 20 May 2015 - 68
Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
Shell - Released: 05 Jan 2022 - 8
A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
Python - Released: 06 Sep 2018 - 10
My elaborate home automation configuration + scripts
Python - Released: 04 Aug 2017 - 20
Very small simple command-line interface for language detection using lingua-rs
Rust - Released: 16 Apr 2022 - 6
Generic Environment for Context-Aware Correction of Orthography
Python - Released: 08 Jan 2015 - 22
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (levenshtein).
Rust - Released: 14 Aug 2020 - 4
python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
Python - Released: 11 Feb 2013 - 18
FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
Python - Released: 12 Feb 2015 - 5
Use spaCy for NLP and output to the FoLiA XML format.
Python - Released: 23 Mar 2019 - 12