Recognizers-Text

Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, date/time, etc. in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI, NL; partial support for JA, KO, AR, SV). Packages are available at: https://www.nuget.org/profiles/Recognizers.Text, https://www.npmjs.com/~recognizers.text

MIT License

Downloads: 3.1M · Stars: 1.7K · Committers: 134


Recognizers-Text - Release of Recognizers-Text NPM packages v1.3.1 (Latest Release)

Published by aurghob about 1 year ago

New release of the Recognizers-Text packages to NPM (https://www.npmjs.com/~recognizers.text). Version 1.3.1

Upgraded the lodash dependency to address a vulnerability in older versions.

Also released v1.3.2 of the date-time recognizers package, as its v1.3.1 was broken.

Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.7

Published by fxs130430 over 1 year ago

  • [.NET] - Support for NET6.0 Target Framework
  • [.NET] - Support for parametric Timeout to RegEx objects
  • [EN DateTimeV2] - Two-digit year improvements
  • [FR DateTimeV2] - Fix for the "midi" expression
  • [ES DateTimeV2] - Support for the pattern "del"
  • [EN DateTimeV2] - Support for the pattern [minutes] past [hour]
  • [IT DateTimeV2] - Fix for the time expression [minutes] minute alle [hour]
  • [FR DateTimeV2] - Improved coverage of expressions with "last" in French
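Two-digit years are ambiguous by nature: "23" could mean 1923 or 2023. The notes do not say which rule the recognizers apply, so the Python sketch below only illustrates one common approach, a sliding window around a reference date. The function name and the 50-year `window` default are hypothetical and not taken from the library.

```python
from datetime import date


def expand_two_digit_year(yy: int, reference: date, window: int = 50) -> int:
    """Expand a two-digit year to four digits using a sliding window.

    Hypothetical rule (not the library's): pick the year that lies in the
    interval (reference.year - window, reference.year + window].
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    candidate = reference.year // 100 * 100 + yy  # same century as the reference
    if candidate > reference.year + window:
        candidate -= 100  # too far in the future: use the previous century
    elif candidate <= reference.year - window:
        candidate += 100  # too far in the past: use the next century
    return candidate


print(expand_two_digit_year(99, date(2023, 6, 1)))  # 1999
print(expand_two_digit_year(5, date(2023, 6, 1)))   # 2005
```

A fixed pivot year is the other common choice; a sliding window simply keeps the ambiguity centered on "now".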
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.6

Published by fxs130430 almost 2 years ago

  • [DateTimeV2] Stack overflow bug fix in FR
  • [NumberRange] support added for PT
  • [NumberWithUnit] Mitigations for reported common false positives in EN
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.5

Published by fxs130430 about 2 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.8.5

Changes:

  • [EN DateTime V2] Improved support for date ranges and cases like "-2020-07-01-"
  • [EN DateTime V2] Fixed missing resolutions for some duration types
  • [EN Number] Added support for "nought" as 0
  • [EN DateTimeV2] Improved recognition of informal dates
  • [EN Currency] Added recognition of MUSD as currency
  • [JA DateTime V2] DateTimeModel initial support
  • [DE DateTime V2] Added support for the 'en' suffix in German dates containing ordinal numbers as words
  • [DE DateTime V2] Fixed inconsistent date recognition with ordinal numbers as words
  • [PT DateTime V2] Fixed failure to recognize times in the 'das n' format
  • [ZH DateTimeV2] Added support for DateTimePeriod patterns like "还剩5 分钟" ("5 minutes left")
  • [NL DateTime V2] Fixed multiple speech-related issues across date/time sub-types
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.4

Published by tellarin over 2 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.8.4

Changes:

  • Bug fix for USD Currency in Spanish language;
  • Bug fix for DateRange (without year) that would be mispredicted and resolved as a single date;
  • Added support for 30-hour clock form (commonly used in Japanese);
  • Bug fix for DateTimeV2 where temporal modifiers such as "before" were not recognized in multi-mentions like "from 2010 to 2018 or before 2000";
  • Bug fixes for DateTimeV2 and Currency addressing miscellaneous speech-related issues in ES and PT;
  • Bug fix for "ALL" wrongly picked as Currency in English;
  • Bug fix for Number in Japanese to recognize the Kanji zero;
  • Extending Number and Currency recognition in Japanese to include Kana along with existing Kanji and Arabic numerals;
  • Improving the recall on a few Dimension entities in Portuguese;
  • Making the interpretation of temporal modifiers "since" and "until" in Chinese consistent with English implementation;
  • Bug fix for ATT recognized as Currency in SV;
  • Bug fix for DateTimeV2 mentions in the form "day-of-week, date" not being recognized in IT;
  • Bug fix for DateTimeV2 to support the word "ad" in IT;
  • Adding support for "as soon as possible" in FR;
  • Adding support for mex$ as Currency;
  • Supporting "t" as an abbreviation for a ton in NumberWithUnits;
  • Bug fix for night terms in German;
  • Added "abd dolar" as Currency in Turkish;
  • Added the missing support for a few units and abbreviations in ES;
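The 30-hour clock extends the day past midnight, so "25:30" means 01:30 on the following day. As an illustration only (the function name and signature are hypothetical, not part of Recognizers-Text), resolving such a time in Python reduces to adding the hours and minutes to midnight of the base day:

```python
from datetime import datetime, timedelta


def resolve_30_hour_time(base_day: datetime, hour: int, minute: int = 0) -> datetime:
    """Resolve a 30-hour-clock time such as 25:30 against base_day.

    Hours 24-29 fall in the early morning of the next calendar day, so the
    time is computed as an offset from base_day's midnight.
    """
    if not (0 <= hour < 30 and 0 <= minute < 60):
        raise ValueError("expected 0 <= hour < 30 and 0 <= minute < 60")
    midnight = base_day.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hour, minutes=minute)


print(resolve_30_hour_time(datetime(2021, 3, 1), 25, 30))  # 2021-03-02 01:30:00
```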
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.3

Published by AhmedLeithy over 2 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.8.3

Changes:

  • Fixed bugs in EN Datetime including the “since” and “after” modifiers, “a current”, and “greater than” in datetime range and TimexProperty.ToString()
  • Fixed Timex for EN ordinals 11-13
  • Fixed relative ordinals across languages
  • Added Support for angles of rotation in multiple languages
  • Added support for informal use of degrees in ES and PT
  • Refinements of datetime and NumberWithUnits in PT
  • Fixed extraction of phrases like “5 e 45” in PT
  • Added dimensional units support in JP
  • Refinement of currency and datetime durations and parser in JP
  • Fixed number extraction in datetime and patterns like “is 30 or at least 30” in NumberRange for KO
  • Refinement of temperature and Number range in KO
  • Refinement of Datetime extraction and parsing in SV
  • Refinement of TimeZone extraction and parsing in SV
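English ordinals 11 to 13 are a classic edge case: they end in 1, 2, or 3 yet take the suffix "th" (11th, 12th, 13th, and likewise 111th to 113th). The notes do not describe the underlying Timex bug, so the following Python sketch only shows the suffix rule a parser has to respect; `expected_suffix` and `parse_ordinal` are hypothetical helpers, not library APIs.

```python
import re
from typing import Optional

# Digits followed by an English ordinal suffix, e.g. "21st" or "12th".
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)", re.IGNORECASE)


def expected_suffix(n: int) -> str:
    """Return the correct English ordinal suffix for n."""
    if 11 <= n % 100 <= 13:
        return "th"  # 11th, 12th, 13th (and 111th, 212th, ...) are irregular
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def parse_ordinal(token: str) -> Optional[int]:
    """Return the ordinal's value, or None if the token or its suffix is wrong."""
    match = _ORDINAL.fullmatch(token)
    if match is None:
        return None
    n = int(match.group(1))
    return n if match.group(2).lower() == expected_suffix(n) else None


print(parse_ordinal("12th"), parse_ordinal("12nd"), parse_ordinal("22nd"))  # 12 None 22
```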
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.2

Published by nawanas almost 3 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.8.2

Changes:

  • Fixed errors in number tokenization of digit/character sequence
  • Added support for common date format YYYYMMDD
  • Added support for unconventional scientific notation
  • Fixed ISO Week errors
  • Fixed fractional numbers in long form and large numbers expressed as fractions
  • Added support for micrograms as a dimension type
  • Fixed issues with emails including capitalization in sequence
  • Fixed recognition of relative range in years in EN|ES|PT|CN Datetime
  • Fixed fraction parsing from text in forms like “two out of one hundred” in EN Number
  • Fixed consistency of the units “%”, “percent”, and “percentage”, with appropriate tagging
  • Fixed proper handling of transcribed dates for EN Datetime
  • Merged Extractions in expressions like “Monday two weeks from now” in EN Datetime
  • Fixed consistency in extractions of forms [day abbreviation] [number] (Mon 13th) in EN Datetime
  • Fixed Resolution of “last week of this month” in EN Datetime
  • Fixed omission for “0” in French numbers
  • Fixed ambiguous spelled-out hours in FR Datetime
  • Fixed recognition of year when month is spelled out and using “de”
  • Fixed a stack overflow on date ranges starting with “=” in FR Datetime
  • Fixed failed recognition of time in the “n horas” format in PT Datetime
  • Fixed negative decimals in ZH Number
  • Improvements to ordinal and fraction recognition in SV Number
  • Refinements of ordinal, percentage and number recognition in KR Number
  • Added support for temperature in JP Number
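Compact YYYYMMDD dates such as "20210415" are easy to match with a regex but still need calendar validation. A minimal Python sketch, assuming the token has already been isolated (`parse_yyyymmdd` is a hypothetical name, not the library's API), can lean on `datetime.strptime`:

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(token: str) -> Optional[date]:
    """Parse a compact date like "20210415"; return None if it is not a real date."""
    if len(token) != 8 or not (token.isascii() and token.isdigit()):
        return None
    try:
        return datetime.strptime(token, "%Y%m%d").date()
    except ValueError:
        return None  # e.g. month 13 or February 30


print(parse_yyyymmdd("20210415"))  # 2021-04-15
print(parse_yyyymmdd("20210230"))  # None
```

Requiring exactly eight ASCII digits up front keeps the check strict, since `strptime` on its own accepts one-digit months and days in other formats.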
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.1

Published by nawanas almost 3 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.8.1

Changes

  • Added support for holiday-weekend mention patterns in Datetime
  • Fixed 2-digit years in month of year construction in EN/ES
  • Fixed recognition of "one morning" in EN
  • Fixed recognition of numerical month in ES/PT
  • Minor change to year suffix in ES to accept words like del or de
  • Fixed year recognition when month is spelled out in PT
  • Fixed times with am/pm modifier to return the specified time when using words like "morgens" in DE
  • Supporting Weihnachtsfeiertag in DE datetime
  • Support next/last modifiers for dates in DE with nächstem and letztem.
  • Fixed inconsistency in recognizing complex entities including "frühmorgens" in DE Datetime
  • Fixed missed dates surrounded by words such as "avant", "et", and "maintenant" in FR Datetime
  • Adding support for fünfzehn w/ umlaut in DE numbers
  • Adding support for units when surrounded by parenthesis/brackets
  • Bug fixes for expanded recognition of currency value in FR/ES/PT/US to include [currency name][currency symbol]
  • Fixed currency with "con" for decimals in ES
  • Support for quoted Text in SV
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.8.0

Published by tellarin over 3 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.8.0

Changes

  • DateTime support in Dutch;
  • Bug fix for merged DateTime entities in German where time + weekday range behaviour was inconsistent with English;
  • Support for language variations in German for day-of-week and time-of-day DateTime mentions;
  • Bug fix for reference year incorrectly assigned to timex in DatePeriod time expressions;
  • Bug fix for Time entities post-noon returning inconsistent extra resolution, as if ambiguous, in Chinese;
  • Support for "immer" as signal for recurring time (Set) mentions in German;
  • Bug fix for weekday + time-range mentions producing an invalid range in French DateTime;
  • Bug fix for "hasta"/"até" not properly supported as DateTime range modifier in Spanish and Portuguese;
  • Improved false positive filter rules for common cases in English Temperature, PhoneNumber, and DateTime;
  • German Holiday recognition coverage improvements;
  • Bug fix for relative past DateTime modifiers not always resolved correctly in French and Spanish;
  • Bug fix for Duration patterns wrongly normalized/resolved when number is missing in Dutch and English;
  • Bug fix for misinterpretation of a relative duration if prefixed by number in Portuguese, French, and Spanish;
  • Improved handling of "in" vs "within" in Spanish DateTime ranges;
  • Improved support for colloquial Date mentions in Portuguese and English;
  • Bug fix for overly aggressive merge of multiple Duration mentions with modifiers in German, Italian, Spanish;
  • Bug fix for "Jahr" + year number not properly recognized consistently between German and English;
  • Time parser refinements in Japanese.
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.7.0

Published by tellarin over 3 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.7.0

The project has reached over 2.5M package downloads on NuGet/npm/PyPI!

Changes

  • Bug fix for incorrect parsing in weekday-date formats in German, French, Portuguese, Spanish, and Italian DateTime;
  • Support for NumberRange in French and German languages;
  • Improved support for colloquial Date mentions in Portuguese and Spanish;
  • Support for early/late modifiers in German DatePeriod;
  • Bug fix in assigning subtype for Number with multipliers (e.g., "1.2b");
  • Bug fix for "à midi" not always correctly recognized as Time;
  • Bug fix for time-of-day entities recognized, but not resolved correctly in Spanish and Portuguese TimePeriod;
  • Extended support for Duration terms in English, Portuguese, and Spanish;
  • Extended support for expressions indicating the present moment in English DateTime;
  • Improved handling of cultures that use multiple Number formats/separators;
  • Improved support for merging date/time/timezone terms within brackets in DateTime;
  • Improved support for French relative Time mentions;
  • Added support for compound Currency entities in Portuguese, Spanish, French, German, and Italian;
  • Support for hyphen-connected Unit expressions in German;
  • German Holiday recognition coverage improvements;
  • QuotedText recognition integration into Sequence recognizers;
  • Fixed inconsistency between French and English DateTime entities with article connectors;
  • Korean support for Currency and Temperature units (extraction-only);
  • Partial Korean support for Dimension units (extraction-only);
  • Support Timezone resolution for time-of-day + time patterns (in Preview);
  • Timezone resolution fixes for US informal timezone names (Preview);
  • Bug fix for over parsing in combining date and time with timezone in English DateTime;
  • Improved resolution of 2-digit year mentions in DatePeriod;
  • Bug fix in Arabic Number recognizer to handle other culture-specific Unicode number separators;
  • Holiday parser refinements in Japanese.
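Arabic text may write numbers with Arabic-Indic digits (U+0660 to U+0669) and dedicated separators: U+066B ARABIC DECIMAL SEPARATOR and U+066C ARABIC THOUSANDS SEPARATOR. The notes do not list which separators the fix covers, so the following Python sketch (a hypothetical helper, not the recognizer's code) only shows the normalization idea using the standard `unicodedata` module:

```python
import unicodedata

ARABIC_DECIMAL_SEPARATOR = "\u066b"    # ARABIC DECIMAL SEPARATOR
ARABIC_THOUSANDS_SEPARATOR = "\u066c"  # ARABIC THOUSANDS SEPARATOR


def normalize_arabic_number(text: str) -> float:
    """Convert a number written with Arabic-Indic digits and separators to a float."""
    ascii_chars = []
    for ch in text:
        if ch == ARABIC_THOUSANDS_SEPARATOR:
            continue  # grouping separator carries no value
        if ch == ARABIC_DECIMAL_SEPARATOR:
            ascii_chars.append(".")
            continue
        digit = unicodedata.decimal(ch, None)  # works for any Unicode decimal digit
        if digit is None:
            raise ValueError(f"unexpected character {ch!r}")
        ascii_chars.append(str(digit))
    return float("".join(ascii_chars))


# 12,345.5 written with Arabic-Indic digits and Arabic separators
print(normalize_arabic_number("\u0661\u0662\u066c\u0663\u0664\u0665\u066b\u0665"))  # 12345.5
```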
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.6.0

Published by tellarin over 3 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.6.0

Changes

  • Extended support for date + time forms in Portuguese DateTime;
  • Extended support for unambiguous date formats in DateTime;
  • Add support for emoji skin tone modifiers across cultures in Choice recognizer;
  • Add support to handle common misspelled ordinals in English Date;
  • Additional support for non-standard speed units in English NumberWithUnit;
  • Support for bitcoin and its Unicode symbol as Currency;
  • Splitting clustered units into their separate Unit entries in English;
  • Bug fix resolving time-of-day modifiers in DateTime in French and Spanish;
  • Support for million/billion/trillion Number abbreviations in English Number;
  • Bug fix for false positive hours incorrectly extracted from float number in DateTime;
  • Improved performance in recognizing long Number forms in Japanese;
  • Bug fix in resolution for "anoche" in Spanish DateTime;
  • Improved support for merged timex of duration/datetimerange (e.g., "PT1H30M") processing in TimexLib;
  • Bug fix to leap year resolution and output format for invalid dates like "2/29/2019" in DateTime;
  • Bug fix in numbered week resolution (e.g., 2021-W02) in TimexLib;
  • Bug fix in Timex parsing across different cultures in TimexLib resolver.
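Two of the fixes above touch calendar arithmetic that the Python standard library can illustrate directly: an ISO week timex such as "2021-W02" denotes a Monday-to-Sunday span, and a date like 2/29/2019 is invalid because 2019 is not a leap year. The helpers below are hypothetical and not the TimexLib API; the half-open `[start, end)` range convention is an assumption made for the example.

```python
import re
from datetime import date, timedelta
from typing import Tuple

_ISO_WEEK_TIMEX = re.compile(r"(\d{4})-W(\d{2})")


def resolve_iso_week(timex: str) -> Tuple[date, date]:
    """Resolve a week timex like "2021-W02" to a half-open [Monday, next Monday) range."""
    match = _ISO_WEEK_TIMEX.fullmatch(timex)
    if match is None:
        raise ValueError(f"not an ISO week timex: {timex!r}")
    # fromisocalendar raises ValueError for weeks a year does not have (e.g. 2021-W53).
    start = date.fromisocalendar(int(match.group(1)), int(match.group(2)), 1)
    return start, start + timedelta(days=7)


def is_valid_date(year: int, month: int, day: int) -> bool:
    """True if the date exists; 2019-02-29 does not, since 2019 is not a leap year."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(resolve_iso_week("2021-W02"))  # (datetime.date(2021, 1, 11), datetime.date(2021, 1, 18))
print(is_valid_date(2019, 2, 29), is_valid_date(2020, 2, 29))  # False True
```

`date.fromisocalendar` requires Python 3.8 or later.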
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.5.0

Published by tellarin almost 4 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.5.0

The project has reached over 2M package downloads on NuGet/npm/PyPI!

Changes

  • Expanded coverage for DateTime mention formats across sub-types in Spanish;
  • Bug fix in Spanish NumberRange when multiple non-mergeable sub-ranges are present in input;
  • Bug fix in inconsistent normalization of Spanish Date ranges;
  • Bug fix in resolution for years spelled as words in English DateTime;
  • Bug fix in normalization and resolution of relative year mentions in German DateTime;
  • Bug fix for regression in Chinese DateTime handling "western formats";
  • Improved support for relative DateTime expressions like "el año anterior" ("the previous year") in Spanish;
  • Support for relative Holiday calculations in DateTime;
  • Revised support for Date ranges in Spanish (cleanup and new forms);
  • Revised support for large numbers and informal forms in Chinese;
  • Support for composite durations in Chinese DateTime;
  • Fix for overly aggressive entity merging in Spanish DateTime;
  • Improved handling of fractions and percentages in Chinese;
  • Improved handling of potentially ambiguous terms in Chinese Number and Dimension;
  • Support for Unicode vulgar fractions in .NET across western languages;
  • Bug fix for entity boundary issue in English recurrent dates (Set);
  • Bug fix in support for superscript 'a' and 'o' in Spanish Ordinal;
  • Support for informal abbreviations and prefixes in English Age;
  • Currency support in Swedish;
  • Performance improvements in Swedish Number;
  • Support for Chinese dynasties as Date ranges/periods;
  • Bug fix in Chinese DateTime support for year ranges and decades;
  • Expanded coverage for Date expressions in French (ongoing).
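Unicode encodes common vulgar fractions such as ½ (U+00BD) and ⅔ (U+2154) as single characters whose numeric value is available from the Unicode Character Database. A minimal Python sketch of reading them, including mixed numbers like "3½", assumes nothing about how the .NET recognizers do it internally; both function names are hypothetical.

```python
import unicodedata
from fractions import Fraction


def is_vulgar_fraction(ch: str) -> bool:
    """True for single characters such as '½' (VULGAR FRACTION ONE HALF)."""
    return len(ch) == 1 and unicodedata.name(ch, "").startswith("VULGAR FRACTION")


def parse_mixed_number(text: str) -> Fraction:
    """Parse "3½", "⅔" or "12" into an exact Fraction."""
    whole, frac = text, Fraction(0)
    if text and is_vulgar_fraction(text[-1]):
        whole = text[:-1]
        # unicodedata.numeric returns a float (0.666... for '⅔'); the vulgar
        # fraction characters (½, ⅓, ⅛, ⅒, ...) have denominators of at most 10,
        # so snapping to the nearest such fraction recovers the exact value.
        frac = Fraction(unicodedata.numeric(text[-1])).limit_denominator(10)
    return Fraction(int(whole) if whole else 0) + frac


print(parse_mixed_number("3½"), parse_mixed_number("⅔"))  # 7/2 2/3
```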
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.4.2

Published by tellarin almost 4 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.4.2

Changes

  • Support for Age, Temperature, and Dimension units in Swedish;
  • Improvements to handling informal DateTime ranges and Duration in French;
  • Improved recognition of relative ranges and periods in Spanish DateTime;
  • Support for DateTime entity mentions in the form "[n] [date-unit] from [datetime]";
  • Bug fix in handling Currency ISO codes before monetary amounts;
  • Improved parsing of fractions (including textual and unicode fractions) in Number recognizer;
  • Exposed sub-type information in extracted Dimension unit entities (weight, speed, etc.);
  • Bug fix handling recurring times (Set) referencing weekends in English;
  • Bug fix parsing entities in sentences with multiple DateTime ranges mentions in English;
  • Implemented handling of fractional Number term in German;
  • Improved support for fractional Number in English;
  • Implemented merging of compound Unit entities;
  • Improved handling of "start/end of" in DateTime ranges in English and Spanish;
  • Support for terms like "work day" and "work week" in German DateTime;
  • Bug fix for null resolutions during processing of certain Date ranges in Spanish;
  • Bug fixes in handling "quarter", "around", "now", "weekend", and abbreviated months in Spanish DateTime;
  • Bug fix handling whitespace as Date separator in French;
  • Refined coverage for modifiers (early/earlier/late/later, next/past) in Spanish DateTime;
  • Bug fix handling lists of years in DateTime;
  • Bug fix in disambiguating "morning"/"tomorrow" in German DateTime;
  • Bug fix for Set and Time entities being incorrectly mixed during recognition in German;
  • Bug fix for Holiday + Time not properly extracted and parsed in German;
  • Expanded coverage for Time range and time-of-day expressions in German;
  • Bug fix in parsing complex DateTime ranges in Spanish and English (e.g., "since A and not after B");
  • Improved handling of "more than" Number ranges in Chinese;
  • Support for "万" as multiplier in Chinese Number;
  • Bug fix handling suffix for "half" in Chinese Number.
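In Chinese, 万 denotes 10,000 and 亿 denotes 100,000,000, and the two stack (万亿 is 10^12). A minimal Python sketch of applying these multipliers to a digit string such as "1.5万" follows; the helper is hypothetical, not part of the library, and it ignores fully written Chinese numerals.

```python
from decimal import Decimal

# 万 = 10^4 and 亿 = 10^8; they compose multiplicatively (万亿 = 10^12).
CHINESE_MULTIPLIERS = {"万": 10**4, "亿": 10**8}


def parse_with_chinese_multiplier(text: str) -> Decimal:
    """Parse digits followed by optional 万/亿 suffixes, e.g. "1.5万" -> 15000."""
    text = text.strip()
    multiplier = 1
    while text and text[-1] in CHINESE_MULTIPLIERS:
        multiplier *= CHINESE_MULTIPLIERS[text[-1]]
        text = text[:-1]
    return Decimal(text) * multiplier


print(parse_with_chinese_multiplier("1.5万"))  # 15000.0
```

`Decimal` is used instead of `float` so that values like "1.5万" stay exact.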
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.4.1

Published by tellarin about 4 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.4.1

Changes

  • The recognizers are now .NET Core 3.1 compatible;
  • Improved support for relative Date Range in German;
  • Improved extraction coverage in Arabic Numbers;
  • Fix compound disjoint Number Range extraction bug in English and Spanish;
  • Hindi DateTime improvements with focus on ranges/periods;
  • Support for "fiscal year" in Spanish DateTime;
  • Resolution improvements for year Date Range in Spanish;
  • Better support for relative Holiday mentions;
  • Improved Holiday support for Dutch (coverage and fixes);
  • Support for part-of Date Range mentions (e.g., "by the end of this month") in English;
  • Support for "to/till date" as Date Range in English;
  • Bug fix in Chinese Currency parsing;
  • Extended Dutch Currency support;
  • Improved support for Numbers like "dozen" in French;
  • Bug fix in initialization of URL recognizer when no culture is specified;
  • Add strict/relaxed match and validation to E-mail recognition;
  • Bug fix where elided Numbers were wrongly extracted in Italian.
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.4.0

Published by tellarin over 4 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.4.0

In June 2020, the project reached 1.25M package downloads across platforms!

Changes

  • Improved recognition of Japanese Number (Cardinal), Ordinal, and Percentage;
  • Improved recognition of Dutch Number (Cardinal) and Ordinal;
  • Support for NumberRange in Dutch and Hindi;
  • Multiple improvements to Spanish NumberRange;
  • Support for the Indian numbering system in English Number;
  • Improved handling of relative modifiers in German DateTime;
  • Recognition of dialectal Time expressions in German;
  • Support for informal Time mentions in Portuguese;
  • Multiple refinements in DateTimeRange resolution using boundary context;
  • Support for anchored day of week in parsing relative week entities;
  • Improved recognition of Hindi Time, TimeRange, and Duration;
  • Improved handling of variants and gender in French Number;
  • Bug fix for weekday timexes in Portuguese DateTime;
  • Bug fix in German DateTime to properly handle merging weekday + time_of_day;
  • Improved parsing of fully written Date entities in Spanish and Portuguese;
  • Bug fix in Duration in Spanish and Portuguese;
  • Improvements to false positive extractions of *Ranges from phone numbers;
  • Improvements to false positive extractions of Unit and Time in Chinese and Japanese;
  • Fix in TimexRangeResolver to handle times with date constraints;
  • Improved handling of UTC reference times in TimexRangeResolver;
  • Improvements to handle Number false positives in Chinese;
  • Bug fix for parsing month + two-digit year in Portuguese, Spanish, and French DateTime;
  • Support for part of day in French DateTime;
  • Extensions to TimeZone handling of European forms and extra non-standard timezone names (in Preview);
  • Improved handling of modifiers like "end/beginning/middle" in year ranges;
  • Extended support for multipliers in handling Number/NumberRanges (e.g., "5k-20k", "20MM");
  • Improved handling of month and day of week abbreviations in French Date/DateRange;
  • Improved recognition of PhoneNumber corner cases;
  • Support for approximate DateTime in Spanish;
  • Support for "night" and "weekend" ranges in English DateRange;
  • Extension in Hindi Holiday to recognize additional lunar holidays;
  • Holiday fixes/extensions for Easter, Worker's/May day, Juneteenth, etc. resolution;
  • Bug fix in Portuguese parser for relative past Time;
  • Improved support for PRESENT_REF entities in German DateTime;
  • Fix to reduce false positives in French Unit;
  • Bug fix in French handling of "summer" in DateRange;
  • Support for multipliers/dividers in English Set;
  • Support for "weekdays" in English DateRange and Set;
  • Support for "business hours" in Spanish TimeRange;
  • Improvements to Hindi Set support;
  • Support for NumberRange in Japanese - Extraction-only;
  • Support for Number (Cardinal), Ordinal, Percentage, and NumberRange in Arabic - Extraction-only;
  • Support for Ordinal, Percentage, and NumberRange in Korean - Extraction-only.
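Multiplier suffixes like those in "5k-20k" and "20MM" (a finance shorthand for millions) scale the preceding number. The Python sketch below is a hypothetical illustration, not the recognizers' implementation: its suffix table is an assumption, and a real recognizer would also have to tell "m" apart from meters and similar units.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical suffix table; "MM" is the finance shorthand for millions.
MULTIPLIERS = {"k": 10**3, "m": 10**6, "mm": 10**6, "b": 10**9, "bn": 10**9}

_NUMBER = r"(\d+(?:\.\d+)?)\s*(mm|bn|k|m|b)?"
_SINGLE = re.compile(_NUMBER, re.IGNORECASE)
_RANGE = re.compile(_NUMBER + r"\s*-\s*" + _NUMBER, re.IGNORECASE)


def _scale(digits: str, suffix: Optional[str]) -> Decimal:
    """Apply the multiplier named by suffix (if any) to the digits."""
    value = Decimal(digits)
    return value * MULTIPLIERS[suffix.lower()] if suffix else value


def parse_number(text: str) -> Decimal:
    """Parse a single number with an optional multiplier, e.g. "20MM" or "1.2b"."""
    match = _SINGLE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    return _scale(*match.groups())


def parse_range(text: str) -> Tuple[Decimal, Decimal]:
    """Parse a range such as "5k-20k" into (low, high)."""
    match = _RANGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a range: {text!r}")
    low_digits, low_suffix, high_digits, high_suffix = match.groups()
    return _scale(low_digits, low_suffix), _scale(high_digits, high_suffix)


print(parse_range("5k-20k"))  # (Decimal('5000'), Decimal('20000'))
print(parse_number("20MM"))   # 20000000
```

In this sketch each bound keeps only its own suffix, so "5-20k" yields (5, 20000); whether the upper bound's multiplier should carry over to the lower one is a design decision a real recognizer has to make explicitly.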
Recognizers-Text - Release of Recognizers-Text Nuget packages v1.3.2

Published by tellarin over 4 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.3.2

Changes

  • Support for DateTime recognizers in Hindi;
  • Multiple refinements in Spanish and Portuguese DateTimeRange resolution;
  • Extension in Chinese to handle extra relative DateTimeRange scenarios and "≤" and "≥" unicode chars;
  • Bug fix in French causing false positives in Date extraction;
  • Bug fix in French causing incorrect span calculation for some extracted DateRange entities;
  • Text library now offers methods to convert span indexing between 'char-based' and 'text-element-based';
  • Extension in English DateTime to account for new non-standard written date forms.
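In .NET, string indices count UTF-16 code units ("chars"), while a text element as enumerated by `System.Globalization.StringInfo` can span several code units, as with an emoji outside the Basic Multilingual Plane. The notes do not name the new methods, so the Python sketch below only illustrates the conversion problem. It simplifies by treating each Unicode code point as one text element, whereas real text elements (grapheme clusters, e.g. an emoji with a skin-tone modifier) can span several code points. Both function names are hypothetical.

```python
def _utf16_width(ch: str) -> int:
    """Code points above U+FFFF need a surrogate pair (two UTF-16 code units)."""
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_to_codepoint_index(text: str, utf16_index: int) -> int:
    """Map a .NET-style (UTF-16 code unit) offset to a code point offset."""
    if utf16_index < 0:
        raise IndexError("offset out of range")
    units = 0
    for i, ch in enumerate(text):
        if units == utf16_index:
            return i
        units += _utf16_width(ch)
        if units > utf16_index:
            raise ValueError("offset points inside a surrogate pair")
    if units == utf16_index:
        return len(text)  # offset at the very end of the string
    raise IndexError("offset out of range")


def codepoint_to_utf16_index(text: str, index: int) -> int:
    """Map a code point offset back to a UTF-16 code unit offset."""
    if not 0 <= index <= len(text):
        raise IndexError("offset out of range")
    return sum(_utf16_width(ch) for ch in text[:index])


s = "I \U0001F600 it"  # the emoji occupies two UTF-16 code units
print(utf16_to_codepoint_index(s, 5), codepoint_to_utf16_index(s, 4))  # 4 5
```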
Recognizers-Text - Release of Recognizers-Text NPM packages v1.2.7

Published by tellarin over 4 years ago

New release of the Recognizers-Text packages to NPM (https://www.npmjs.com/~recognizers.text). Version 1.2.7

Recognizers-Text - Release of Recognizers-Text NPM packages v1.3.0

Published by tellarin over 4 years ago

New release of the Recognizers-Text packages to NPM (https://www.npmjs.com/~recognizers.text). Version 1.3.0

Major update to bring it closer to parity with .NET/NuGet.

Recognizers-Text - Release of Recognizers-Text Nuget packages v1.3.1

Published by tellarin over 4 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text).

Retracted due to incorrect nuget package dependency reference, please use packages version 1.3.2.

Recognizers-Text - Release of Recognizers-Text Nuget packages v1.3.0

Published by tellarin almost 5 years ago

New release of the Recognizers-Text packages to nuget.org (https://www.nuget.org/profiles/Recognizers.Text). Version 1.3.0

Changes

  • Support for Cardinal, Ordinal, Percent recognizers in Hindi;
  • Support for Age, Temperature, Dimension, Currency in Hindi;
  • Support for Choice recognizers in Turkish and Hindi;
  • Support for NumberRange recognizer in Turkish;
  • Multiple refinements to Turkish DateTime support across sub-types;
  • Improvements to German DateTime support for colloquial scenarios;
  • Bug fix in Spanish DateRange support for cases that cross year boundaries;
  • Bug fix in DateTime to avoid cases where time expression loses generality;
  • Extension to English recurring Set for scenarios like "every other ", "quarterly";
  • Extension to English to handle informal TimeRange and "week" mentions;
  • Extensions and bug fix in Ordinal recognition in Chinese;
  • Refinements to Currency parsing;
  • Improvement to time expression library for week-in-month scenarios;
  • Bug fix in DatePeriod extraction;
  • Multiple performance (latency) improvements;
  • Removal of support for .NET 4.5 and 4.5.2.