Generate a ChatRWKV weight file with `v2/convert_model.py` (in the ChatRWKV repo) and the strategy `cuda fp16`.

Generate a faster-rwkv weight file with `tools/convert_weight.py`. For example, `python3 tools/convert_weight.py RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096-converted-fp16.pth rwkv-4-1.5b-chntuned-fp16.fr`.
```shell
mkdir build
cd build
cmake -DFR_ENABLE_CUDA=ON -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
```
Run `./chat tokenizer_file_path weight_file_path "cuda fp16"`. For example, `./chat ../tokenizer_model ../rwkv-4-1.5b-chntuned-fp16.fr "cuda fp16"`.
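The third argument is a strategy string combining a backend and a dtype (the document uses `cuda fp16`, `cpu fp32` and `ncnn fp16`). As a hypothetical illustration of how such a string splits into its two parts (not faster-rwkv's actual parsing code):

```shell
# Illustration only: split a strategy string like "cuda fp16" into
# backend and dtype. Hypothetical helper, not part of faster-rwkv.
parse_strategy() {
  set -- $1            # word-split the strategy string
  echo "backend=$1 dtype=$2"
}

parse_strategy "cuda fp16"   # backend=cuda dtype=fp16
```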
Generate a ChatRWKV weight file with `v2/convert_model.py` (in the ChatRWKV repo) and the strategy `cuda fp32` or `cpu fp32`. Note that although we use fp32 here, the real dtype is determined in a later step.

Generate a faster-rwkv weight file with `tools/convert_weight.py`.
Export an ncnn model with `./export_ncnn <input_faster_rwkv_model_path> <output_path_prefix>`. You can download a pre-built `export_ncnn` from Releases if you are a Linux user, or build it yourself.
Download the pre-built Android AAR library from Releases, or run `aar/build_aar.sh` to build it yourself. For the path of the Android NDK and the toolchain file, please refer to the Android NDK docs.
```shell
mkdir build
cd build
cmake -DFR_ENABLE_NCNN=ON -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DANDROID_NDK=xxxx -DCMAKE_TOOLCHAIN_FILE=xxxx -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
```
Copy `chat` into the Android phone (by using adb or Termux).

Copy the `tokenizer_model` and the ncnn models (`.param`, `.bin` and `.config`) into the Android phone (by using adb or Termux).

Run `./chat tokenizer_model ncnn_models_basename "ncnn fp16"` in adb shell or Termux. For example, if the ncnn models are named `rwkv-4-chntuned-1.5b.param`, `rwkv-4-chntuned-1.5b.bin` and `rwkv-4-chntuned-1.5b.config`, the command should be `./chat tokenizer_model rwkv-4-chntuned-1.5b "ncnn fp16"`.
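Note that the second argument is the shared basename of the three ncnn files, not a full filename. A small sketch of deriving it from the `.param` file and assembling the command (a hypothetical helper for illustration, not part of faster-rwkv):

```shell
# Hypothetical helper: strip the .param extension to get the shared
# basename, then print the chat invocation. Illustration only.
chat_command() {
  base="${1%.param}"
  echo "./chat tokenizer_model $base \"ncnn fp16\""
}

chat_command rwkv-4-chntuned-1.5b.param
# ./chat tokenizer_model rwkv-4-chntuned-1.5b "ncnn fp16"
```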
Android System >= 9.0
RAM >= 4GB (for the 1.5B model)
No hard requirement for CPU; a more powerful CPU is simply faster.
Run one of the following commands in Termux to download the prebuilt executables and models automatically. The download script supports resuming partially downloaded files, so feel free to Ctrl-C and restart it if the speed is too slow.
Executables, the 1.5B CHNtuned int8 model, the 1.5B CHNtuned int4 model and the 0.1B world int8 model:

```shell
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 3
```

Executables, the 1.5B CHNtuned int4 model and the 0.1B world int8 model:

```shell
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 2
```

Executables and the 0.1B world int8 model:

```shell
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 1
```

Executables only:

```shell
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 0
```
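The trailing number selects which bundle gets downloaded. The mapping between the options above can be sketched as a case statement (an illustration of the choices listed here, not the download script's actual code):

```shell
# Illustration only: what each numeric argument to the download script
# selects, per the list above. Not the script's actual implementation.
describe_bundle() {
  case "$1" in
    0) echo "executables only" ;;
    1) echo "executables + 0.1B world int8" ;;
    2) echo "executables + 1.5B CHNtuned int4 + 0.1B world int8" ;;
    3) echo "executables + 1.5B CHNtuned int8 + 1.5B CHNtuned int4 + 0.1B world int8" ;;
    *) echo "unknown option" >&2; return 1 ;;
  esac
}

describe_bundle 2
# executables + 1.5B CHNtuned int4 + 0.1B world int8
```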
Install the `rwkv2onnx` python package with `pip install rwkv2onnx`.

Run `rwkv2onnx <input path> <output path> <ChatRWKV path>`. For example, `rwkv2onnx ~/RWKV-5-World-0.1B-v1-20230803-ctx4096.pth ~/RWKV-5-0.1B.onnx ~/ChatRWKV`.
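When scripting the export, the three positional arguments are just paths. A hedged sketch that wraps the invocation shown above (it only echoes the command instead of executing it, since the paths here are placeholders):

```shell
# Sketch: wrap the rwkv2onnx call documented above in a helper.
# Assumes the CLI signature rwkv2onnx <input> <output> <ChatRWKV path>;
# echoes rather than executes so placeholder paths are harmless.
run_rwkv2onnx() {
  echo rwkv2onnx "$1" "$2" "$3"
}

run_rwkv2onnx ~/RWKV-5-World-0.1B-v1-20230803-ctx4096.pth ~/RWKV-5-0.1B.onnx ~/ChatRWKV
```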