lingua-go

The most accurate natural language detection library for Go, suitable for short text and mixed-language text

Apache-2.0 License

Stars: 1.2K
Committers: 4


lingua-go - Lingua 1.4.0 Latest Release

Published by pemistahl about 1 year ago

Features

  • The new functions GetIsoCode639_1FromValue() and GetIsoCode639_3FromValue() have been introduced to return the proper IsoCode639_1 and IsoCode639_3 for a given name string. (#44)

Changes

  • The functions GetLanguageFromIsoCode639_1() and GetLanguageFromIsoCode639_3() now correctly return Unknown instead of -1 if no language can be found for the given ISO code. (#44)
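The general pattern behind this change, returning a named sentinel value rather than a bare -1, can be sketched as follows. This is not the library's code: the Language type, its constants, the byCode table and fromCode() are hypothetical stand-ins used only for illustration.

```go
package main

import "fmt"

// Language is a hypothetical enum-like type, not the library's own.
type Language int

const (
	Unknown Language = iota // sentinel returned when a lookup fails
	English
	German
)

// byCode is a hypothetical lookup table from two-letter codes to languages.
var byCode = map[string]Language{"en": English, "de": German}

// fromCode returns the language for a code, or the Unknown sentinel
// when the code is not in the table. Callers never see a raw -1.
func fromCode(code string) Language {
	if lang, ok := byCode[code]; ok {
		return lang
	}
	return Unknown
}

func main() {
	fmt.Println(fromCode("de") == German)
	fmt.Println(fromCode("xx") == Unknown)
}
```

A named sentinel lets callers compare against a documented constant instead of a magic number that is not a valid value of the type.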

Bug Fixes

  • The method LanguageDetector.DetectMultipleLanguagesOf() returned wrong values for start and end indices for texts consisting of only a single word. This has been fixed. (#43)

lingua-go - Lingua 1.3.4

Published by pemistahl over 1 year ago

Bug Fixes

  • When trying to detect multiple languages in a text consisting of only a single word, a panic occurred. This has been fixed. (#41)

lingua-go - Lingua 1.3.3

Published by pemistahl over 1 year ago

Bug Fixes

  • For long input texts, a panic occurred while computing the confidence values due to an accidental division by zero. This has been fixed. (#27)

lingua-go - Lingua 1.3.2

Published by pemistahl over 1 year ago

Improvements

  • After some internal optimizations, language detection is now approximately 20% to 30% faster. The speed improvement is greater for long input texts than for short ones.

lingua-go - Lingua 1.3.1

Published by pemistahl almost 2 years ago

Bug Fixes

  • For long input texts, an error occurred while computing the confidence values due to numerical underflow when converting probabilities. This has been fixed.
lingua-go - Lingua 1.3.0

Published by pemistahl almost 2 years ago

Improvements

  • The min-max normalization of the confidence values has been replaced with the softmax function. This gives more realistic probabilities. (#25)
lingua-go - Lingua 1.2.2

Published by pemistahl almost 2 years ago

Bug Fixes

  • Under certain circumstances, calling the method LanguageDetector.DetectMultipleLanguagesOf() caused an index error. This has been fixed.

lingua-go - Lingua 1.2.1

Published by pemistahl almost 2 years ago

Bug Fixes

  • A misconfiguration in a go.mod file caused errors when trying to download the library via the go get command. This has been fixed. Thanks to @BenStigsen for the pointer. (#23)

lingua-go - Lingua 1.2.0

Published by pemistahl almost 2 years ago

Features

  • The new method LanguageDetector.DetectMultipleLanguagesOf() has been introduced. It detects multiple languages in mixed-language text. (#9)

lingua-go - Lingua 1.1.1

Published by pemistahl almost 2 years ago

Documentation

  • Some documentation mistakes have been fixed and missing information has been added.

lingua-go - Lingua 1.1.0

Published by pemistahl almost 2 years ago

Features

  • The new method LanguageDetectorBuilder.WithLowAccuracyMode() has been introduced. By activating it, detection accuracy for short text is reduced in favor of a smaller memory footprint and faster detection performance. (#17)

  • The new method LanguageDetector.ComputeLanguageConfidence() has been introduced. It retrieves the confidence value for one specific language only, given the input text. (#19)

Improvements

  • The computation of the confidence values has been revised, and min-max normalization is now applied to them. The values are easier to compare because they behave more like real probabilities. (#16)

  • The language models are now serialized as protocol buffers instead of JSON. Thanks to this change, they are now loaded into memory twice as fast as before. (#22)

Bug Fixes

  • The unigram counts in the statistics engine were not retrieved correctly. This has been fixed, producing more correct detection results. (#14)

Compatibility

  • The lowest supported Go version is now 1.18. Older versions are no longer compatible with this library.

Miscellaneous

  • The library now has a fresh and colorful new logo. Why? Well, why not? (-:

lingua-go - Lingua 1.0.5

Published by pemistahl almost 3 years ago

Bug Fixes

  • The character â was erroneously not treated as a possible indicator for French.

Improvements

  • The dependencies on the other language detectors used for the accuracy comparisons were always downloaded together with the main library. They are only needed for updating the accuracy reports, so the cmd/ subdirectory now contains its own Go module that defines them, and they have been removed from the main library. Thanks to @dim and @BoeingX for identifying this problem. (#8)

lingua-go - Lingua 1.0.4

Published by pemistahl almost 3 years ago

Bug Fixes

  • It was possible to include lingua.Unknown in the set of input languages for building the language detector. It is only meant as a return value, so it is now automatically removed from the set of input languages. Thanks to @marians for identifying this problem. (#7)

lingua-go - Lingua 1.0.3

Published by pemistahl almost 3 years ago

Improvements

  • By replacing sync.Once with sync.Map for storing the language models at runtime, a large amount of code could be removed while preserving the same functionality. This makes the code significantly easier to maintain.
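The release note does not show the code involved. As a rough sketch of why a single sync.Map can stand in for many per-model sync.Once guards, the example below caches lazily loaded values by key. The model type, loadModel() and getModel() are hypothetical names, not the library's.

```go
package main

import (
	"fmt"
	"sync"
)

// model is a hypothetical stand-in for a loaded language model.
type model struct{ name string }

// models caches loaded models by name. One map replaces the need for a
// separate variable and sync.Once guard per model.
var models sync.Map

// loadModel simulates an expensive load, e.g. decoding an embedded file.
func loadModel(name string) *model {
	return &model{name: name}
}

// getModel returns the cached model for name, loading it on first use.
// If two goroutines race on the first load, both may call loadModel, but
// LoadOrStore ensures every caller ends up with the same stored value.
func getModel(name string) *model {
	if m, ok := models.Load(name); ok {
		return m.(*model)
	}
	m, _ := models.LoadOrStore(name, loadModel(name))
	return m.(*model)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			getModel("english")
		}()
	}
	wg.Wait()
	fmt.Println(getModel("english") == getModel("english"))
}
```

Unlike sync.Once, this pattern does not guarantee that the load runs exactly once under contention. It only guarantees that a single result is kept, which is acceptable when the load has no side effects.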
lingua-go - Lingua 1.0.2

Published by pemistahl about 3 years ago

Bug Fixes

  • In very rare cases, the language returned by the detector was non-deterministic. This has been fixed. Big thanks to @FilipAlexander for identifying this problem. (#6)

lingua-go - Lingua 1.0.1

Published by pemistahl over 3 years ago

Bug Fixes

  • The language models were not embedded into the compiled binary. This resulted in problems when trying to use Lingua within a Docker container, for instance. Big thanks to @dsxack for identifying this problem and providing a fix. (#2 #3)

lingua-go - Lingua 1.0.0

Published by pemistahl over 3 years ago

This is the very first release of the Go implementation of Lingua. Enjoy! :-)