A lightweight C++ Unicode library
UNC is a lightweight and simple Unicode library.
I use it because I sometimes need something that just works for a specific set of purposes.
Currently the following operations are supported.
UNC handles errors by either dropping the offending characters or marking them as invalid.
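To make the two error strategies concrete, here is a minimal standalone sketch of a UTF-8 decoder that either drops malformed bytes or marks them as invalid. It is not UNC's implementation and does not use UNC's API: the names `OnError`, `kInvalid` and `decode_utf8_sketch` are hypothetical, and using U+FFFD as the "invalid" marker is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names for illustration only; none of this is UNC's API.
enum class OnError { Drop, MarkInvalid };

// Assumption: U+FFFD REPLACEMENT CHARACTER stands in for "marked as invalid".
constexpr char32_t kInvalid = 0xFFFD;

// Decode UTF-8 bytes into code points, applying `policy` to malformed input.
std::vector<char32_t> decode_utf8_sketch(const std::string& in, OnError policy) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;  // stays 0 if b0 cannot start a sequence
        char32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }

        // The sequence must fit in the input and use continuation bytes only.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (ok && (cp < min_cp[len] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }

        if (ok) {
            out.push_back(cp);
            i += len;
        } else {
            if (policy == OnError::MarkInvalid) out.push_back(kInvalid);
            i += 1;  // skip one byte and try to resynchronise
        }
    }
    return out;
}
```

With this sketch, the input "a", a stray 0xFF byte, "b" decodes to two code points under `OnError::Drop` and to three (with U+FFFD in the middle) under `OnError::MarkInvalid`.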
#include <iostream>
#include <string>
#include <unc/unc.hpp>
using namespace unc;
using namespace std;
int main() {
    /* the UTF-8 encoded string "åäö" */
    string s("\xc3\xa5\xc3\xa4\xc3\xb6");
    ustring us = decode<utf8>(s);
    cout << us.size() << endl; // prints: 3
    ustring ucase = uppercase(us);
    cout << encode<utf8>(ucase) << endl; // prints: ÅÄÖ
    cout << encode<codepoints>(ucase) << endl; // prints: <U+00c5><U+00c4><U+00d6>
}
If this is all you need, you are welcome to use UNC. It adds only around 300K (compared to many megabytes for ICU).
Just be aware that UNC is far from a drop-in replacement for ICU.
#> make -f Makefile.mf clean all
#> sudo make -f Makefile.mf install
#> cmake
#> make
Currently only a statically compiled library is built. Link against it with something similar to the following (assuming your system supports pkg-config and has bash).
#> g++ myapp.cpp $(pkg-config --libs --cflags unc) -o myapp
Use MinGW in an MSYS environment; it should contain everything you need in order to compile. It might be necessary to install Python and run the make command as follows.
#> make PYTHON=python.exe
The script tools.py parses the Unicode data files and generates the necessary databases. It requires Python.
Currently, the following databases are generated.
They are both stored in the uncdata library ready for linking.
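As a rough illustration of the kind of parsing such a generator has to do, the sketch below reads records in the UnicodeData.txt format (15 semicolon-separated fields, with the simple uppercase mapping in field 12) and collects them into a lookup table. This is an assumption-laden example in C++, not a port of tools.py: the function names are hypothetical, and whether UNC's databases are derived from this file or laid out like this is not stated here.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one UnicodeData.txt record on ';'.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, ';')) fields.push_back(field);
    // std::getline drops a trailing empty field, so restore it.
    if (!line.empty() && line.back() == ';') fields.push_back("");
    return fields;
}

// Hypothetical helper: record the simple uppercase mapping of one line,
// if it has one. Returns false when the record does not have 15 fields.
bool add_uppercase_mapping(const std::string& line,
                           std::map<char32_t, char32_t>& table) {
    const std::vector<std::string> f = split_fields(line);
    if (f.size() != 15 || f[0].empty()) return false;
    if (f[12].empty()) return true;  // no uppercase mapping for this code point
    const char32_t from = static_cast<char32_t>(std::stoul(f[0], nullptr, 16));
    const char32_t to = static_cast<char32_t>(std::stoul(f[12], nullptr, 16));
    table[from] = to;
    return true;
}
```

Feeding it the record for U+00E5 (å) adds the mapping to U+00C5 (Å), while the record for U+0041 (A) leaves the table unchanged because its uppercase field is empty.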