foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.

GPL-3.0 License

foliatools - v2.5.7 Latest Release

Published by proycon 3 months ago

[Maarten van Gompel]

  • folia2stam: do not include leading whitespace in token/structure offsets

[Ko van der Sloot]

  • foliaspec: added 1 line to the C++ code generation, enabling a light form of type introspection
  • foliaspec: "feat" should NOT be in the list of features that can be attributes
foliatools - v2.5.6

Published by proycon 8 months ago

  • folia2stam: added debug mode + fixed text untangling to properly take text delimiters into account
  • tei2folia: implemented --docid parameter to set document id #56
foliatools - v2.5.5

Published by proycon 9 months ago

[Maarten van Gompel]

  • folia2stam: added initial version of folia2stam conversion, using stam-python 0.4.0; this may still be subject to change
  • transcribedspeech2folia: added conversion script for transcribed speech with speaker diarisation output (plain text) to FoLiA
  • foliaspec: upgraded yaml loading
  • foliaspec2rdf: updated RDF conversion for FoLiA v2.5.3
  • removed support for Python 3.6

[Ko van der Sloot]

  • foliaspec: added XML processing instruction support for C++ only (Python parser doesn't support it)
foliatools - v2.5.4

Published by proycon about 3 years ago

  • [tei2folia] Fixed regression bug #46
  • [tei2folia] Implemented support for c and rs elements.
foliatools - v2.5.3

Published by proycon about 3 years ago

  • foliatextcontent: Implemented the ability to add offsets to existing elements #43
  • foliatextcontent: fix for processor in linkstrings (LanguageMachines/PICCL#63)
foliatools - v2.5.2

Published by proycon over 3 years ago

  • [foliavalidator] added a --fixinvalidreferences parameter to fix (=delete) invalid wref references (proycon/flat#174)
  • [folia2txt] implemented support for retrieving the original text of a document prior to any corrections #40
foliatools - v2.5.1

Published by proycon over 3 years ago

  • folia2columns: added option to extract sentences and paragraphs and support for extracting sequences of lemmas, pos, poshead, sense or phon when extracting paragraphs or sentences (thanks to Jelke Bloem) #36 #37
  • folia2columns: if outputting to a single file, no longer print header multiple times when processing multiple files (thanks to Jelke Bloem)
  • tei2folia: implemented an --intermediate option to dump intermediate output
  • tei2folia: can handle certain pseudo-TEI now #35
    • fix for table in pseudo-TEI paragraph
    • Accept teiTrim as a valid root element, despite this NOT being an actual valid TEI element!
  • foliaspec: added a mapping to the C++ properties (Ko van der Sloot)
  • foliahtml: updated folia2html test output
  • Python 3.5 is now deprecated; use 3.6 or higher
foliatools - v2.5.0

Published by proycon over 3 years ago

  • folia2html: Adapted whitespace handling to comply with FoLiA v2.5; this results in less pretty-printed HTML output that is truer to the original FoLiA #29
  • folia2html: removed broken -o option, just rely on redirection #27
  • folia2html: handle t-ref #32
  • folia2html: implemented t-hspace
  • folia2html: FoLiA comments should translate to html comments #33
foliatools - v2.4.9

Published by proycon over 3 years ago

Bugfix release; the previous release was premature.

foliatools - v2.4.8

Published by proycon over 3 years ago

folia2html: Implemented support for outputting based on other text classes #30

foliatools - v2.4.7

Published by proycon over 3 years ago

  • folia2html: translate t-hbr as a soft hyphen
  • folia2html: translate features on structural elements to css classes
  • folia2html: fix in translating t-str to css classes
  • foliasplit: prevent duplicate IDs in the root element
foliatools - v2.4.6

Published by proycon over 3 years ago

  • folia2html: Implemented support for rendering superscript/subscript #26
  • folia2html: Implemented the ability to add custom external CSS stylesheets #26
  • updated help info for fixunassignedprocessor procedure
foliatools - v2.4.4

Published by proycon almost 4 years ago

Minor bugfix release:

  • Fixes an issue in folia2salt (thanks to @parkervg)
foliatools - v2.4.3

Published by proycon almost 4 years ago

  • [foliatextcontent] Fixed and improved substring linking (adding markup elements that reference substrings); now also supports corrections #23
foliatools - v2.4.2

Published by proycon almost 4 years ago

Minor update:

  • [tei2folia] Better handling, detection and validation of IDs #22
foliatools - v2.4.1

Published by proycon almost 4 years ago

Major performance improvement in foliasplit.

foliatools - v2.4.0

Published by proycon almost 4 years ago

  • [rst2folia] implemented rubric handling
  • [foliasplit] Implemented a new tool to split a FoLiA document into multiple documents, based on a user's selection criteria. Also allows for linking from a parent document to external child documents. #20
  • [foliaerase] Fixed the inability to properly handle markup elements #21
foliatools - v2.3.2

Published by proycon almost 4 years ago

Bugfix release:

  • [rst2folia] Made more robust against failures #17
  • [rst2folia] support for conversion of containers (divs) from html #18
foliatools - v2.3.1

Published by proycon about 4 years ago

Bugfix release:

  • [txt2folia] Prevent adding empty text content (#14)
foliatools - v2.3.0

Published by proycon about 4 years ago

  • The tei2folia converter has been extended to support more of TEI
    • Implements conversion of tokens, sentences and simple linguistic annotation (@pos,@lemma,@join,@msd) (#12 #13)
    • Better document ID detection: prefer DOI, then ISSN, then ISBN, then DTADirName (specific to Deutsches Text Archiv), and fall back to an untyped ID while checking that it yields something sane. #12
    • Implemented conversion of the @norm attribute (not sure if this is entirely according to the TEI P5 spec, but Deutsches Text Archiv uses it).
    • Benefit from some of the newly allowed structural nestings in FoLiA v2.3
    • Implemented handling for tei:trailer and some other elements
    • Ignore styling that is wrapped around structural elements (for now)
    • Added extra sanity checks
  • foliavalidator now implements the ability to output to explicit form (proycon/folia#84). Explicit form is a more verbose XML serialisation that makes explicit in the output the assumptions that are usually implicit in FoLiA (such as defaults and element categories). This makes things easier for parsers that do not implement the full FoLiA logic. It is meant to be used as an alternative serialisation only where it makes sense (to support such third-party parsers).
  • Various fixes for foliatextcontent
  • implemented a first version of a FoLiA to Salt converter (proycon/folia#85). This is still in an experimental stage. Salt is a graph-based model that serves as the intermediate model in the Pepper conversion tool. In combination with Pepper, this folia2salt converter in theory allows users to convert FoLiA to formats such as TCF, Paula XML, ANNIS and many others.
  • Updated documentation with some more in-depth sections on foliavalidator, tei2folia and folia2salt
  • various foliaspec updates