Truly universal encoding detector in pure Python
MIT License
Published by Ousret 12 months ago
Published by Ousret about 1 year ago
The CLI can now be invoked via python -m charset_normalizer.cli or python -m charset_normalizer
Removed encoding.aliases entries, as they have no alias (#323)

Published by Ousret over 1 year ago
from_path no longer enforces PathLike as its first argument
is_binary, a new function that relies on the main capabilities and is optimized to detect binaries
enable_fallback, a new argument throughout from_bytes, from_path, and from_fp that allows deeper control over the detection (default True)

Published by Ousret over 1 year ago
should_rename_legacy, a new argument for the legacy function detect, which now disregards any new arguments without raising errors (PR #262)

Published by Ousret almost 2 years ago
Published by Ousret almost 2 years ago
language_threshold, a new argument in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)
md.py can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1
Removed first() and best() from CharsetMatch
Removed normalize
Removed chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
Removed unicodedata2
This is the last version (3.0.x) to support Python 3.6. We plan to drop it for 3.1.x.
Published by Ousret about 2 years ago
This is the last pre-release. If everything goes well, I will publish the stable tag.
language_threshold, a new argument in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio

Published by Ousret about 2 years ago
normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)
Removed first() and best() from CharsetMatch

Published by Ousret about 2 years ago
normalize is scheduled for removal in 3.0

Published by Ousret about 2 years ago
md.py can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1
Removed normalize
Removed chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
Removed unicodedata2
Published by Ousret over 2 years ago
Added --version (PR #194)
Deprecated unicodedata2, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

Published by Ousret over 2 years ago
Published by Ousret over 2 years ago
Published by Ousret almost 3 years ago
Published by Ousret almost 3 years ago
Setting explain to True (PR #146)

Published by Ousret almost 3 years ago
A NullHandler is attached by default, from @nmaynes (PR #135)
Setting explain to True will provisionally (bound to the function lifespan) add a specific stream handler (PR #135)
set_logging_handler, to configure a specific StreamHandler, from @nmaynes (PR #135)
CHANGELOG.md entries; the format is based on Keep a Changelog (PR #141)

Published by Ousret about 3 years ago
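The logging behavior above can be reproduced with stdlib logging alone; this sketch attaches a handler to the package logger under its assumed name "charset_normalizer", rather than going through set_logging_handler:

```python
import logging

# charset_normalizer logs under its package name; attaching our own
# StreamHandler makes its records visible without touching the root logger,
# which is exactly what the NullHandler default is meant to protect.
pkg_logger = logging.getLogger("charset_normalizer")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(name)s | %(levelname)s | %(message)s"))
pkg_logger.addHandler(handler)
pkg_logger.setLevel(logging.DEBUG)

pkg_logger.debug("handler configured")  # emitted through the new handler
```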
We arrived in a pretty stable state.
Changes:
SyntaxError (not an ASCII decoding error) for those trying to install this package using a non-supported Python version
This version pushes the detection coverage forward to 98%! https://github.com/Ousret/charset_normalizer/runs/3863881150
The great filter (the cannot-be-better-than ceiling) shall be 99% in conjunction with the current dataset, in future releases.
Published by Ousret about 3 years ago
Changes:
Published by Ousret about 3 years ago
Changes:
Internal: 🎨 The project now complies with flake8, mypy, isort and black to ensure better overall quality #81
Internal: 🎨 The MANIFEST.in was not exhaustive #78
Improvement: ✨ The BC-support with v1.x was improved; the old staticmethods are restored #82
Remove: 🔥 The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead #92
Improvement: ✨ The Unicode detection is slightly improved, see #93
Bugfix: 🐛 In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and mislead the mess detection #95
Bugfix: 🐛 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection #96
Improvement: 🎨 Add syntax sugar __bool__ for the results CharsetMatches list-container, see #91
This release pushes the detection coverage further, to 97%!
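The __bool__ sugar on the CharsetMatches container (#91) makes an empty result falsy; a minimal check, assuming charset_normalizer 2.0+ is installed:

```python
from charset_normalizer import from_bytes

results = from_bytes("hello world".encode("utf-8"))

# CharsetMatches is a list-like container; truth-testing it answers
# "was at least one plausible encoding found?"
if results:
    print("best guess:", results.best().encoding)
else:
    print("no match found")
```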