Convert Apple NeuralHash model for CSAM Detection to ONNX.
APACHE-2.0 License
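Apple NeuralHash is a perceptual hashing method for images based on neural networks. It tolerates image resizing and compression. The hashing steps are as follows:
1. Convert the image to RGB.
2. Resize the image to 360x360.
3. Normalize the RGB values to the [-1, 1] range.
4. Run inference on the NeuralHash model.
5. Compute the dot product of a 96x128 matrix with the resulting vector of 128 floats.
6. Apply a binary step to the resulting vector of 96 floats.
7. Convert the vector of 1.0 and 0.0 values to bits, yielding 96 bits of binary data.
In this project, we convert Apple's NeuralHash model to ONNX format. A demo script for testing the model is also included.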
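The normalization and the final matrix projection described above can be sketched in plain Python. This is only an illustration: the function names, the `>= 0` threshold used to turn floats into bits, and the row-per-output-bit layout of the matrix are assumptions, and the toy matrix stands in for the real 96x128 one. The nnhash.py script in this project remains the reference.

```python
def normalize_pixel(value):
    """Map an 8-bit channel value (0..255) to the [-1, 1] range."""
    return value / 255.0 * 2.0 - 1.0


def project_to_hash(seed_matrix, embedding):
    """Project the model output onto the seed matrix and return a hex hash.

    seed_matrix: list of rows (96 rows of 128 floats for NeuralHash).
    embedding:   list of floats produced by the model (128 for NeuralHash).
    """
    bits = ""
    for row in seed_matrix:
        dot = sum(w * x for w, x in zip(row, embedding))
        # Assumed thresholding: non-negative values become 1, negative become 0.
        bits += "1" if dot >= 0 else "0"
    # Four bits per hex digit; zero-pad so leading zero bits are kept.
    return "{:0{}x}".format(int(bits, 2), len(bits) // 4)


# Toy example with a 4x2 matrix instead of the real 96x128 one.
toy_matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
toy_embedding = [0.5, -0.5]
print(project_to_hash(toy_matrix, toy_embedding))
```

With a 96-row matrix the function returns 24 hex digits, which matches the length of the hashes printed by the demo script.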
Both macOS and Linux will work. The Linux examples in the following sections use Debian.
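An LZFSE decoder is required:
- macOS: install it by running brew install lzfse.
- Linux: build and install it from the lzfse source.
Python 3.6 and above should work. Install the following dependencies: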
pip install onnx coremltools
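You will need 4 files from a recent macOS or iOS build:
- neuralhash_128x96_seed1.dat
- NeuralHashv3b-current.espresso.net
- NeuralHashv3b-current.espresso.shape
- NeuralHashv3b-current.espresso.weights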
Option 1: From macOS or jailbroken iOS device (Recommended)
If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).
Option 2: From an iOS IPSW
Download an .ipsw of a recent iOS build (14.7+) from ipsw.me, then unpack it:
cd /path/to/ipsw/file
mkdir unpacked_ipsw
cd unpacked_ipsw
unzip ../*.ipsw
ls -lh
What you need is the largest .dmg file, for example 018-63036-003.dmg.
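Instead of reading the ls -lh output by eye, the largest image can be picked programmatically. The following is a small sketch; the helper name `largest_dmg` is hypothetical, and it assumes the .dmg files sit directly in the unpacked directory.

```python
from pathlib import Path


def largest_dmg(directory):
    """Return the largest .dmg file in `directory`, or None if there is none."""
    candidates = list(Path(directory).glob("*.dmg"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_size)


if __name__ == "__main__":
    # Assumes the current directory is the unpacked_ipsw directory.
    print(largest_dmg("."))
```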
# Build and install apfs-fuse
sudo apt install fuse libfuse3-dev bzip2 libbz2-dev cmake g++ git libattr1-dev zlib1g-dev
git clone https://github.com/sgan81/apfs-fuse.git
cd apfs-fuse
git submodule init
git submodule update
mkdir build
cd build
cmake ..
make
sudo make install
sudo ln -s /bin/fusermount /bin/fusermount3
# Mount image
mkdir rootfs
apfs-fuse 018-63036-003.dmg rootfs
The required files are under /System/Library/Frameworks/Vision.framework/ in the mounted path.
Put them under the same directory:
mkdir NeuralHash
cd NeuralHash
cp /System/Library/Frameworks/Vision.framework/Resources/NeuralHashv3b-current.espresso.* .
cp /System/Library/Frameworks/Vision.framework/Resources/neuralhash_128x96_seed1.dat .
Normally, compiled Core ML models store their structure in model.espresso.net and their shapes in model.espresso.shape, both in JSON. The NeuralHash model does the same, but its files are compressed with LZFSE. Decode them as follows:
dd if=NeuralHashv3b-current.espresso.net bs=4 skip=7 | lzfse -decode -o model.espresso.net
dd if=NeuralHashv3b-current.espresso.shape bs=4 skip=7 | lzfse -decode -o model.espresso.shape
cp NeuralHashv3b-current.espresso.weights model.espresso.weights
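The dd invocations above use bs=4 skip=7, so they drop the first 28 bytes of each file and pass the rest to the LZFSE decoder. A minimal Python sketch of the same header stripping follows. The function names are hypothetical, the magic check relies on the block magics used by the reference lzfse implementation (treated as an assumption here), and decompression itself is still left to the lzfse tool.

```python
HEADER_SIZE = 4 * 7  # dd bs=4 skip=7 skips 28 bytes

# Block magics from the reference lzfse implementation (assumed here).
LZFSE_MAGICS = (b"bvx-", b"bvx1", b"bvx2", b"bvxn", b"bvx$")


def strip_header(data, header_size=HEADER_SIZE):
    """Drop the leading header and return the remaining (compressed) payload."""
    if len(data) < header_size:
        raise ValueError("file is shorter than the expected header")
    return data[header_size:]


def looks_like_lzfse(payload):
    """Heuristic check that the payload starts with an LZFSE block magic."""
    return payload[:4] in LZFSE_MAGICS


# Synthetic example: 28 header bytes followed by a fake LZFSE block magic.
sample = bytes(28) + b"bvx2" + b"\x00" * 8
payload = strip_header(sample)
print(len(payload), looks_like_lzfse(payload))
```

A check like this can catch a wrong skip value before the decoder fails with a less obvious error.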
cd ..
git clone https://github.com/AsuharietYgvar/TNN.git
cd TNN
python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash
The resulting model is NeuralHash/model.onnx.
Netron is a good tool for inspecting the converted model.
To test the model, install the libraries required by the demo script:
pip install onnxruntime pillow
Then run nnhash.py on an image:
python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg
Example output:
ab14febaa837b6c1484c35e6
Note: The neural hash generated here might be a few bits off from one generated on an iOS device. This is expected, since different iOS devices generate slightly different hashes anyway. Neural networks rely on floating-point calculations, whose accuracy is highly dependent on the hardware. For smaller networks this makes no difference, but NeuralHash has 200+ layers, so the errors accumulate significantly.
Device | Hash |
---|---|
iPad Pro 10.5-inch | 2b186faa6b36ffcc4c4635e1 |
M1 Mac | 2b5c6faa6bb7bdcc4c4731a1 |
iOS Simulator | 2b5c6faa6bb6bdcc4c4731a1 |
ONNX Runtime | 2b5c6faa6bb6bdcc4c4735a1 |
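To quantify "a few bits off", the hashes in the table can be compared by Hamming distance. The sketch below uses the table values as-is; the helper name `hamming_distance` is only illustrative.

```python
def hamming_distance(hex_a, hex_b):
    """Count the differing bits between two equal-length hex strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


hashes = {
    "iPad Pro 10.5-inch": "2b186faa6b36ffcc4c4635e1",
    "M1 Mac": "2b5c6faa6bb7bdcc4c4731a1",
    "iOS Simulator": "2b5c6faa6bb6bdcc4c4731a1",
    "ONNX Runtime": "2b5c6faa6bb6bdcc4c4735a1",
}

reference = hashes["ONNX Runtime"]
for device, value in hashes.items():
    print(device, hamming_distance(value, reference))
```

For these values, the ONNX Runtime hash differs from the iOS Simulator hash by 1 bit, from the M1 Mac hash by 2 bits, and from the iPad Pro hash by 7 bits, out of 96.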