A spaCy custom component that extracts and normalizes temporal expressions
MIT License
A spaCy custom component that extracts and normalizes dates and other temporal expressions.
pip install timexy
After installation, simply integrate the timexy component in any of your spaCy pipelines to extract and normalize dates and other temporal expressions:
import spacy
from timexy import Timexy
nlp = spacy.load("en_core_web_sm")
# Optionally add config if varying from default values
config = {
"kb_id_type": "timex3", # possible values: 'timex3'(default), 'timestamp'
"label": "timexy", # default: 'timexy'
"overwrite": False # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
doc = nlp("Today is the 10.10.2010. I was in Paris for six years.")
for e in doc.ents:
print(f"{e.text}\t{e.label_}\t{e.kb_id_}")
>>> 10.10.2010 timexy TIMEX3 type="DATE" value="2010-10-10T00:00:00"
>>> six years timexy TIMEX3 type="DURATION" value="P6Y"
Timexy allows the normalization of all temporal expressions to
The normalization is configured with the kb_id_type
config parameter:
config = {
"kb_id_type": "timex3", # possible values: 'timex3'(default), 'timestamp'
"label": "timexy", # default: 'timexy'
"overwrite": False # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
NOTE: Normalizing temporal expressions that are not concrete dates to timestamp is not viable. Therefore, all non-date temporal expressions are always normalized to timex3 regardless of the
kb_id_type
config.
Please refer to the contributing guidelines here.