Maarten van Gompel

Research software engineer - NLP - AI - 🐧 Linux & open-source enthusiast - 🐍 Python / 🌊 C/C++ / 🦀 Rust / 🐚 Shell - 🔐 InfoSec - https://git.sr.ht/~proycon

Projects

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Python - Released: 06 Jul 2010 - 477
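
As an illustration of the basic tasks mentioned above (n-gram extraction and frequency lists), here is a minimal sketch using only the Python standard library. It does not use PyNLPl's own API; the function names `ngrams` and `frequency_list` are hypothetical and chosen for this example only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(items):
    """Count how often each item occurs (a simple frequency list)."""
    return Counter(items)


tokens = "to be or not to be".split()
bigrams = ngrams(tokens, 2)
freq = frequency_list(bigrams)

# The bigram ('to', 'be') occurs twice in this sentence.
print(freq.most_common(1))  # [(('to', 'be'), 2)]
```

A `Counter` over tuples keeps the sketch short; a real library would typically add smoothing, serialisation, and more efficient storage on top of this.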

vocage

A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal

Rust - Released: 18 Jun 2020 - 143
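
To give an idea of what spaced repetition means for flashcards, the sketch below implements a simple Leitner-style box scheme in Python. This is an assumption-laden illustration, not vocage's actual scheduling: the `Card` class, the `review` function, and the interval values in `INTERVALS` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days, one per box.
INTERVALS = [1, 3, 7, 14, 30]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0            # index into INTERVALS
    due: date = date.min    # a new card is due immediately


def review(card, correct, today):
    """Move the card up one box on a correct answer, back to box 0 otherwise,
    then schedule the next review using the interval of the new box."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


card = Card("huis", "house")
review(card, correct=True, today=date(2024, 1, 1))
print(card.box, card.due)
```

Cards that are answered correctly are shown less and less often, while a mistake sends the card back to the most frequent box.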

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions.

Python - Released: 24 Jun 2011 - 60

clam

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.

Python - Released: 06 Jul 2010 - 129

analiticcl

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction

Rust - Released: 19 Apr 2021 - 31
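
The general idea of fuzzy matching against a lexicon can be sketched with Python's standard `difflib` module. This is only an illustration of the task, not analiticcl's algorithm or API; the `LEXICON` contents, the `suggest` function, and the cutoff value are assumptions made for the example.

```python
import difflib

# Hypothetical lexicon of known-good word forms.
LEXICON = ["language", "linguistic", "annotation", "tokeniser", "correction"]


def suggest(word, lexicon=LEXICON, n=3, cutoff=0.7):
    """Return up to n lexicon entries whose similarity ratio to `word`
    is at least `cutoff`, best match first."""
    return difflib.get_close_matches(word, lexicon, n=n, cutoff=cutoff)


# A transposition error is mapped back to the intended word.
print(suggest("langauge"))
```

Comparing the input against every lexicon entry is fine for a toy lexicon but scales poorly; dedicated systems use indexing strategies to avoid the full scan.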

flat

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.

JavaScript - Released: 02 Jan 2014 - 108

colibri-core

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.

C++ - Released: 21 Sep 2013 - 123
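
To make the notion of a skipgram concrete, the following Python sketch extracts patterns of length n in which exactly one interior token is replaced by a gap of fixed size one. It is a simplified illustration and does not use Colibri Core's Python binding; the `skipgrams` function and the `GAP` placeholder are hypothetical names.

```python
# Hypothetical placeholder marking a gap of exactly one token.
GAP = "{*}"


def skipgrams(tokens, n):
    """Return all length-n patterns with one interior token replaced by GAP.

    The first and last token of each pattern are always kept, so n must
    be at least 3 for a gap to fit in between.
    """
    if n < 3:
        raise ValueError("n must be at least 3 to leave room for a gap")
    result = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        for j in range(1, n - 1):
            result.append(tuple(window[:j]) + (GAP,) + tuple(window[j + 1:]))
    return result


for pattern in skipgrams("to be or not".split(), 3):
    print(" ".join(pattern))
```

Patterns with several gaps or gaps of dynamic size follow the same principle but need more bookkeeping, which is where a memory-efficient implementation becomes important.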

timbl

TiMBL implements several memory-based learning algorithms.

C++ - Released: 05 Jun 2014 - 46

python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

Cython - Released: 21 May 2014 - 29

foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

Python - Released: 06 Sep 2018 - 18

codemetapy

A Python package for generating and working with codemeta

Python - Released: 16 Apr 2018 - 25

LaMachine

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script

Shell - Released: 20 May 2015 - 68

codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process

Shell - Released: 05 Jan 2022 - 8

foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.

Python - Released: 06 Sep 2018 - 10

dotfiles

My dotfiles

Shell - Released: 30 May 2013 - 23

deepfrog

An NLP-suite powered by deep learning

Rust - Released: 12 Feb 2020 - 19

homeassistant-config

My elaborate home automation configuration + scripts

Python - Released: 04 Aug 2017 - 20

unilangforum

UniLang Language Community - Forum

PHP - Released: 14 Mar 2015 - 8

lingua-cli

A very small, simple command-line interface for language detection using lingua-rs

Rust - Released: 16 Apr 2022 - 6

gecco

Generic Environment for Context-Aware Correction of Orthography

Python - Released: 08 Jan 2015 - 22

sesdiff

Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).

Rust - Released: 14 Aug 2020 - 4
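
For readers unfamiliar with the edit distance mentioned above, here is a minimal Python sketch of the Levenshtein distance using the classic dynamic-programming recurrence. It covers only the distance, not the edit script, and it is not Myers' algorithm nor sesdiff's implementation; the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table limits memory use to the length of the second string.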

python-timbl

python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.

Python - Released: 11 Feb 2013 - 18

foliadocserve

FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.

Python - Released: 12 Feb 2015 - 5

spacy2folia

Use spaCy for NLP and output to the FoLiA XML format.

Python - Released: 23 Mar 2019 - 12

lexmatch

Simple lexicon matcher against a text

Rust - Released: 20 Sep 2021 - 2