Natural language parser, for Latin-script languages, that produces nlcst.
This package exposes a parser that takes Latin-script natural language and produces a syntax tree.
If you want to handle natural language as syntax trees manually, use this.
Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.
Whether the text is Old English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job of tokenizing it.
For English and Dutch, you can instead use parse-english and parse-dutch.
It also works, to a lesser degree, for Latin-like scripts such as Cyrillic (“привет”), Georgian (“გამარჯობა”), and Armenian (“Բարեւ”).
This package is ESM only. In Node.js (version 16+), install with npm:
npm install parse-latin
In Deno with esm.sh:
import {ParseLatin} from 'https://esm.sh/parse-latin@7'
In browsers with esm.sh:
<script type="module">
import {ParseLatin} from 'https://esm.sh/parse-latin@7?bundle'
</script>
import {ParseLatin} from 'parse-latin'
import {inspect} from 'unist-util-inspect'
const tree = new ParseLatin().parse('A simple sentence.')
console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)
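To get a feel for working with such a tree by hand, here is a minimal sketch that walks it and collects the words. It does not call parse-latin: the tree is a hand-written copy of the output above with positions omitted, and the helpers `collect` and `toText` are hypothetical names made up for this illustration.

```javascript
// Hand-written copy of the tree shown above (positional info omitted).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: collect every node of a given type, depth first.
function collect(node, type, found = []) {
  if (node.type === type) found.push(node)
  if (node.children) {
    for (const child of node.children) collect(child, type, found)
  }
  return found
}

// Hypothetical helper: serialize a node by joining the values of its leaves.
function toText(node) {
  return 'value' in node ? node.value : node.children.map(toText).join('')
}

const words = collect(tree, 'WordNode').map(toText)
console.log(words) // [ 'A', 'simple', 'sentence' ]
```

In real code, existing utilities such as unist-util-visit and nlcst-to-string cover the same ground and are likely a better fit than hand-rolled helpers.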
This package exports the identifier ParseLatin.
There is no default export.
ParseLatin()
Create a new parser.
ParseLatin#parse(value)
Turn natural language into a syntax tree.
Parameters: value (string, optional), the text to parse.
Returns: the tree (RootNode).
👉 Note: The easiest way to see how parse-latin parses is to use the online parser demo, which shows the syntax tree corresponding to the typed text.
parse-latin splits text into white space, punctuation, symbol, and word tokens.
Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.
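Some punctuation marks are part of the word they occur in, such as non-profit, she’s, G.I., 11:00, N/A, &c, nineteenth- and…
Some periods do not mark the end of a sentence, such as 1., e.g., id.
Although periods, question marks, and exclamation marks (sometimes) end a sentence, that end might not occur directly after the mark, such as in .) and ."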
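The two stages described above (split into tokens, then merge) can be sketched roughly as follows. This is not parse-latin's implementation: the character classes are assumptions (word = Unicode letters, marks, and numbers; white space = `\s`; punctuation = Unicode punctuation; symbol = anything else), `tokenize` and `mergeInnerWordPunctuation` are hypothetical names, and only one of the many merging rules is shown.

```javascript
// Assumed character classes; parse-latin's own definitions may differ.
const tokenExpression =
  /([\p{L}\p{M}\p{N}]+)|(\s+)|(\p{P}+)|([^\p{L}\p{M}\p{N}\s\p{P}]+)/gu

// Stage 1: split a string into flat word, white space, punctuation,
// and symbol tokens.
function tokenize(value) {
  const tokens = []
  for (const match of value.matchAll(tokenExpression)) {
    let type = 'symbol'
    if (match[1] !== undefined) type = 'word'
    else if (match[2] !== undefined) type = 'whiteSpace'
    else if (match[3] !== undefined) type = 'punctuation'
    tokens.push({type, value: match[0]})
  }
  return tokens
}

// Stage 2 (one illustrative rule only): a single hyphen or apostrophe
// between two words becomes part of one larger word.
function mergeInnerWordPunctuation(tokens) {
  const result = []
  let index = 0
  while (index < tokens.length) {
    const token = tokens[index]
    const previous = result[result.length - 1]
    const next = tokens[index + 1]
    if (
      token.type === 'punctuation' &&
      /^[-'’]$/.test(token.value) &&
      previous && previous.type === 'word' &&
      next && next.type === 'word'
    ) {
      // Fold the mark and the following word into the previous word.
      previous.value += token.value + next.value
      index += 2
    } else {
      result.push({...token}) // copy, so the input tokens stay untouched
      index++
    }
  }
  return result
}

const tokens = tokenize('A non-profit costs $5.')
const merged = mergeInnerWordPunctuation(tokens)
console.log(merged.map((token) => token.type + ':' + JSON.stringify(token.value)))
```

Keeping the split and the merge as separate passes mirrors the description above: the first pass only looks at character classes, while context-dependent decisions happen afterwards on the token list.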
This package is fully typed with TypeScript. It exports no additional types.
Projects maintained by me are compatible with maintained versions of Node.js.
When I cut a new major release, I drop support for unmaintained versions of
Node.
This means I try to keep the current release line, parse-latin@^7, compatible with Node.js 16.
This package is safe.
parse-english
parse-dutch
Yes please! See How to Contribute to Open Source.