folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this package contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.

GPL-3.0 License

Downloads: 2K
Stars: 60
Committers: 5


folia - v2.5.3 Latest Release

Published by proycon 9 months ago

  • added first version of FoLiA specification in RDF/SKOS #4 (may still be subject to change), with new RDF namespace
  • Added EtymologyAnnotation
folia - v2.5.1

Published by proycon about 3 years ago

  • This release fixes an issue in whitespace handling prior to linebreaks (#101). Whitespace prior to explicit linebreaks is insignificant.
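
The linebreak rule above can be illustrated with a minimal sketch. It is not the FoLiA library's implementation: the function name `text_with_linebreaks` is hypothetical, the sample is a simplified, namespace-free fragment, and only the single rule stated in this release note (whitespace directly before an explicit `<br/>` is dropped) is modelled.

```python
import xml.etree.ElementTree as ET


def text_with_linebreaks(t_elem):
    """Flatten a text element, dropping whitespace directly before each <br/>.

    Hypothetical helper for illustration; only direct <br/> children are handled.
    """
    parts = []
    pending = t_elem.text or ""
    for child in t_elem:
        if child.tag == "br":
            # Whitespace prior to an explicit linebreak is insignificant.
            parts.append(pending.rstrip())
            parts.append("\n")
            pending = child.tail or ""
        else:
            # Other markup: keep its text and tail unchanged.
            pending += "".join(child.itertext()) + (child.tail or "")
    parts.append(pending)
    return "".join(parts)


sample = ET.fromstring("<t>first line   <br/>second line</t>")
print(repr(text_with_linebreaks(sample)))  # 'first line\nsecond line'
```

The full whitespace rules are those in the official documentation; this sketch only shows why trailing spaces before a linebreak do not affect the resulting text.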
folia - v2.5.0

Published by proycon over 3 years ago

  • The main focus of this release is a reworked and strictly specified whitespace handling. Some changes to this end were already introduced in v2.4.1, but these did not solve the problem sufficiently. We have now implemented a strict interpretation of whitespace, documented here: https://folia.readthedocs.io/en/latest/text_annotation.html#whitespace . The new rules are applied to older documents as well, so this may lead to a different perspective on the text in certain cases, but validators will fall back to the old rules and not cause a hard validation failure if the new rules are not met, preserving backward compatibility. (#88)
  • Implemented processor tags, a simple tagging mechanism that gives users some handles to convey extra information they want to make available to certain processors. (#93)
  • Added a t-lang element (markup counterpart of the lang element)
folia - v2.4.2

Published by proycon almost 4 years ago

  • Predefine some subsets for style annotation #90
  • Allow features in markup annotation #89
  • Allow features in text content
  • Added extra documentation for handling leading/trailing whitespace #88
  • Allow for multiple foreign metadata nodes in FoLiA, even in 'native' mode #91
folia - v2.4.1

Published by proycon almost 4 years ago

  • Ignore all leading/trailing whitespace in text content #88
folia - v2.4.0

Published by proycon almost 4 years ago

  • Added modality annotation (#86). This is now also preferred for sentiment annotation (the dedicated sentiment annotation type is deprecated but remains for backward compatibility), as well as for other modalities such as negation, truthfulness, and doubt.
  • Added a simple set definition for geolocation and an example to the documentation (using metric annotation)
  • Minor backward-compatibility breaking change: renamed modalityfeature in coreference links to mod so it doesn't conflict with the new modality element. I've never seen anybody use this aspect of coreference linking in FoLiA yet, so it's a small risk I'm taking. Let me know if it causes issues for anybody.
  • Reintroduced and documented External annotation (#87), allowing you to separate child documents from parent documents whilst maintaining links.
folia - v2.3.0

Published by proycon about 4 years ago

  • Added the possibility of serialising FoLiA to explicit form. Explicit form is a more verbose XML serialisation that makes assumptions that are usually implicit in FoLiA (such as defaults and element categories) explicit in the output. This makes the job easier for parsers that do not implement the full FoLiA logic. It is meant to be used as an alternative serialisation only in cases where it makes sense (to support such third-party parsers). #84
  • Documentation and README updates:
    • added the new rust library, amended implementation list
  • Added new examples and fixed some existing examples
  • Some added flexibility in the nesting of certain structural elements:
    • allow Word directly under Division
    • allow Linebreaks in tables, figures and lists (outside of items, rows/cells), because these are sometimes used to denote pagebreaks in multi-span tables/figures/lists.
folia - v2.2.1

Published by proycon about 5 years ago

Minor update release:

  • added a syntactic movement example from proycon/flat#138
  • added FQL documentation on adding relations (proycon/foliapy#11)
  • allow AbstractInlineAnnotation in quote
folia - v2.2.0

Published by proycon about 5 years ago

  • The default text delimiter for <part> and <ref> is now space, but this can be overridden with the space="no" attribute. #61
  • Revised "AS ALTERNATIVE" directive in FQL
  • Various documentation, test and example updates
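
The delimiter change for `<part>` can be sketched as follows. This is an illustration under stated assumptions rather than the library's actual code: the function name `join_parts` is hypothetical, the XML is a simplified, namespace-free fragment, and `space="no"` is assumed to suppress the delimiter that would otherwise follow the element carrying it.

```python
import xml.etree.ElementTree as ET


def join_parts(parent):
    """Join the text of <part> children, using a space unless space="no".

    Hypothetical helper; assumes each <part> holds its text in a <t> child.
    """
    parts = parent.findall("part")
    out = []
    for index, part in enumerate(parts):
        out.append(part.findtext("t") or "")
        is_last = index == len(parts) - 1
        # The default delimiter is a space; space="no" overrides it.
        if not is_last and part.get("space", "yes") != "no":
            out.append(" ")
    return "".join(out)


default = ET.fromstring(
    "<w><part><t>note</t></part><part><t>book</t></part></w>"
)
glued = ET.fromstring(
    '<w><part space="no"><t>note</t></part><part><t>book</t></part></w>'
)
print(join_parts(default))  # note book
print(join_parts(glued))    # notebook
```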
folia - v2.1.0

Published by proycon over 5 years ago

  • Set definitions can now define constraints on the combination of classes/subsets used. #50
  • Added a t-ref element as a text markup counterpart to ref
  • Added flexibility for structure elements (more nesting options)
folia - v2.0.3

Published by proycon over 5 years ago

  • Added extra examples and fixed some others
  • Added set definition for Penn Treebank and Spacy Named Entities (thanks to @ErkanBasar)
  • Added ISO-639-3 set definition (and a link for backward compatibility) (LanguageMachines/ucto#67)
  • span roles do not carry independent processor information
  • reducing complexity: no attributes except ID allowed on annotation layers anymore
folia - v2.0.2

Published by proycon over 5 years ago

  • added attributes src and format to processor element
  • added some examples
  • added simple default set for phonetics
  • various documentation updates
  • various fixes
  • annotator/processor/annotatortype attribute on suggestion element is deprecated (only confidence and n are still allowed)
  • [specification] added a hidden property, used by the Hiddenword class
folia - v2.0.1

Published by proycon over 5 years ago

  • Minor fix for hyphenation annotation
  • README update and some documentation fixes
folia - v2.0.0

Published by proycon over 5 years ago

This is a major new release of FoLiA, which includes some breaking changes such as renamed elements. Nevertheless, the FoLiA libraries retain backward compatibility and can read FoLiA v1 (and v0) documents and upgrade them.

Points of general interest:

  • Completely revised the FoLiA documentation and turned it into a more formal specification, automatically drawn from the official specification, with automatically validated examples. Now available as a webpage hosted on https://folia.readthedocs.io (PDF still available too) #43
    • The documentation includes some guidelines on good FoLiA practices (arising from #70 and others)
  • Added proper support for provenance logging in FoLiA #46
  • Renamed alignment annotation to relation annotation #59
  • Ensured most examples are "sensible" #9
    • Extended tests using these examples, all examples are automatically tested now
  • The FoLiA tools are now split from the central FoLiA repository into a separate project at https://github.com/proycon/foliatools #55
    • Cleaner output without stack traces from FoLiA validator #44
  • Implemented the ability to add inline annotations on multi-word spans (group annotations) and solved related multi-word issues. These were previously reserved only for use with structural elements. #51
  • Revised the structure annotation hierarchy (i.e. which structural elements are allowed under which parents) on certain points #42
  • Implemented a hidden words annotation type, allowing a layer of implicit/empty/ghost words that can be referenced from span annotation. Needed e.g. for syntactic movement annotation. #58
  • Allow encoding of soft word breaks / hyphenation #66

More technical points:

  • Add support for provenance in FQL #60
  • Annotation declaration overhaul and handle missing set attribute in declarations #54
  • Explicitly forbid and prevent forward wrefs from span annotation #41
  • Apply space attribute more generically to multiple structure elements #61
  • Added a new property in the specification to detect tags that may be (or MUST be) used as Wrefs #63
  • Added a new property to distinguish folia:id (IDREF) from xml:id (ID) #64
  • Alias attribute does not propagate to RelaxNG schema yet #65

A new FoLiA library has been released (replacing the previous one in PyNLPl): https://github.com/proycon/foliapy/releases/tag/2.0.0
A new version of FoLiA tools has also been released: https://github.com/proycon/foliatools/releases/tag/2.0.0
You may also consult the FoLiA release plan (#68) for more information on upgrading and compatibility.

folia - v1.5.1.60

Published by proycon almost 7 years ago

  • [foliacorrect] Implemented --acceptsuggestion
folia - v1.5.1.59

Published by proycon almost 7 years ago

FoLiA v1.5.1

  • Minor update: set comment printable to false explicitly

FoLiA-Tools v1.5.1.59

  • Fixes in foliavalidator for directory processing
  • Prerelease of new foliaeval tool (still under construction)
folia - v1.5.0.57

Published by proycon about 7 years ago

FoLiA v1.5

  • Implemented text validation (#24); checks text redundancy and offsets
  • Added facilities for metadata on sub-parts of a document (#30)
  • Added support for aliases (short names) for set definitions (#31)
  • More liberal acceptance of Linebreak and Whitespace
  • Allow Paragraph and Part under ListItem
  • TextContent and PhonContent were erroneously disallowed under Part

Important note: Text validation is now the default for FoLiA v1.5+. This means that documents will be validated more strictly regarding their text content. Inconsistencies in text redundancy or offsets result in invalid FoLiA.
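
As a rough illustration of what an offset check involves, the sketch below compares each child's text against the parent's text at the declared offset. It is a simplified model and not the validator's code: the function name `check_offsets` and the `(text, offset)` tuple representation are assumptions made for the example.

```python
def check_offsets(parent_text, children):
    """Return error messages for children whose text does not occur in the
    parent text at the declared offset.

    Hypothetical helper; `children` is a list of (text, offset) tuples.
    """
    errors = []
    for text, offset in children:
        found = parent_text[offset:offset + len(text)]
        if found != text:
            errors.append(
                f"offset {offset}: expected {text!r}, found {found!r}"
            )
    return errors


sentence = "Hello world"
print(check_offsets(sentence, [("Hello", 0), ("world", 6)]))  # []
print(check_offsets(sentence, [("Hello", 0), ("world", 5)]))
```

The second call reports one error, because the text found at offset 5 starts with the space between the two words.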

FoLiA-Tools v1.5.0.57

  • Expanded rst2folia tool, better rst coverage and fixes
  • foliavalidator adapted for text checking (default for v1.5+, off for older)
folia - v1.4.3.56

Published by proycon about 7 years ago

FoLiA v1.4.3

  • Added textclass attribute to make relation between different text classes and annotations explicit (#29)

FoLiA-tools v1.4.3.56

  • Added foliaid tool
  • Fixes for foliamerge tool
folia - v1.4.2.55

Published by proycon over 7 years ago

FoLiA v1.4.2 release:

  • Allow more structural elements under
  • Space attribute simplified #28

FoLiA-tools v1.4.2.55:

  • foliavalidator: Implemented experimental text validation (warnings only for now)
  • Expanding foliamerge tool with support for merging as alternative
folia - v1.4.1.54

Published by proycon over 7 years ago

FoLiA v1.4.1 release:

  • Allow <w> directly under <listitem> (#26)
  • Fixes for linebreak (<br/>) as text markup (allow xlink attribute)

FoLiA-tools v1.4.1.54:

  • Updated foliavalidator to generate more extensive tracebacks
Package Rankings
Top 12.6% on Pypi.org
Project Status: Active – The project has reached a stable, usable state and is being actively developed.