Pure JavaScript implementation of UTF-8 validation
MIT License
Intended as a drop-in replacement for utf-8-validate.
Most of the time and effort went into developing an extensive test suite (over 18k assertions).
Tests are run with mocha using the usual command:
npm test
Many non-obvious aspects of UTF-8 validation are tested, including:
To test other UTF-8 validation libraries, first install them:
cd test/others
npm install
cd ../..
and then run the tests for one library, e.g.:
npm test --lib=utf-8-validate
or:
npm test --lib=is-utf8
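A minimal sketch of how a test harness could pick up the `--lib` value. It assumes that npm exposes command-line config flags to scripts as `npm_config_*` environment variables. The helper name `pickLibraryName` and the `valid-8` fallback are hypothetical, chosen for illustration and not taken from this project's test code.

```javascript
// Hypothetical helper: decide which validator a test run should exercise.
// Assumption: `npm test --lib=is-utf8` reaches the script as the
// environment variable npm_config_lib.
function pickLibraryName(env) {
  // Fall back to this package when no --lib flag was given.
  return env.npm_config_lib || 'valid-8';
}

console.log('testing library:', pickLibraryName(process.env));
```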
Validation speed is measured during the test run. So far this validator is the fastest (this is not a joke!).
valid-8: 300 Mb/s (pure JavaScript)
utf-8-validate: 260 Mb/s (C++)
is-utf8: 110 Mb/s (also pure JavaScript)
Validation is simple:
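const valid8 = require('valid-8');
// Buffer.from replaces the deprecated `new Buffer(string)` constructor
if (!valid8(Buffer.from('你好,世界!'))) {
  // the buffer is not valid UTF-8; handle the error here
}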
For compatibility with utf-8-validate, an alias is set: valid8.Validation.isValidUTF8 === valid8.
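The alias above can be pictured with a small sketch. The validator below is a stand-in written for this example (a decode and re-encode round trip), not the valid-8 implementation. Only the shape of the alias, `Validation.isValidUTF8` pointing at the same function, comes from the text.

```javascript
// Stand-in validator (hypothetical): invalid bytes decode to U+FFFD,
// so re-encoding the decoded string only reproduces valid input.
function valid8(buf) {
  return Buffer.from(buf.toString('utf8'), 'utf8').equals(buf);
}

// The alias: code written against the Validation.isValidUTF8 style
// of call can use the same function unchanged.
valid8.Validation = { isValidUTF8: valid8 };

console.log(valid8.Validation.isValidUTF8 === valid8);
console.log(valid8.Validation.isValidUTF8(Buffer.from('abc')));
```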
By default, valid8 rejects surrogate code points (0xD800-0xDFFF) and code points
higher than 0x10FFFF, as the UTF-8 specification requires.
To let surrogates pass validation, set valid8.surrogates = true.
To allow long sequences (say, 5 or 6 bytes), set valid8.maxBytes to 5 or 6.
7-byte sequences are always rejected. By default valid8.maxBytes = 4; it can
also be set to 1, 2 or 3. E.g., set valid8.maxBytes = 2 to reject
Chinese ideograms (and many other symbols), which need 3 bytes each.
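To make these rules concrete, here is a minimal validator sketch that honours the same two settings. It is not the valid-8 source code: the function name `isValidUtf8Sketch` and the `opts` object are hypothetical, and the sketch assumes that the 0x10FFFF limit only applies while `maxBytes` is 4 or less, which the text above does not spell out.

```javascript
// Illustrative sketch only, NOT the valid-8 implementation.
// opts.surrogates and opts.maxBytes mirror the settings described above.
function isValidUtf8Sketch(buf, opts = {}) {
  const surrogates = opts.surrogates === true;
  const maxBytes = opts.maxBytes || 4;
  // Smallest code point that genuinely needs n bytes (index = n).
  const minCodePoint = [0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000];

  let i = 0;
  while (i < buf.length) {
    const lead = buf[i];
    let len, cp;
    if (lead < 0x80) { len = 1; cp = lead; }
    else if (lead < 0xC0) return false;              // stray continuation byte
    else if (lead < 0xE0) { len = 2; cp = lead & 0x1F; }
    else if (lead < 0xF0) { len = 3; cp = lead & 0x0F; }
    else if (lead < 0xF8) { len = 4; cp = lead & 0x07; }
    else if (lead < 0xFC) { len = 5; cp = lead & 0x03; }
    else if (lead < 0xFE) { len = 6; cp = lead & 0x01; }
    else return false;                               // 0xFE, 0xFF: always rejected

    // Sequence longer than allowed, or truncated at the end of the buffer.
    if (len > maxBytes || i + len > buf.length) return false;

    for (let k = 1; k < len; k++) {
      const b = buf[i + k];
      if ((b & 0xC0) !== 0x80) return false;         // not a continuation byte
      cp = cp * 64 + (b & 0x3F);
    }

    if (cp < minCodePoint[len]) return false;        // overlong encoding
    if (!surrogates && cp >= 0xD800 && cp <= 0xDFFF) return false;
    // Assumption: the upper limit is lifted once 5- or 6-byte forms are allowed.
    if (maxBytes <= 4 && cp > 0x10FFFF) return false;
    i += len;
  }
  return true;
}

const ideograms = Buffer.from('你好');               // 3 bytes per character
console.log(isValidUtf8Sketch(ideograms));                    // true
console.log(isValidUtf8Sketch(ideograms, { maxBytes: 2 }));   // false
```

The encoded surrogate `ED A0 80` (U+D800) is rejected by this sketch unless `surrogates: true` is passed, and the 5-byte sequence `F8 88 80 80 80` passes only with `maxBytes` set to 5 or 6.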