Benchmarking for Candle
Benchmarks comparing Candle against Torch, per operation and per device.
Clone the repository, then run one of the following commands, depending on the target device:
cargo run --release
cargo run --release --features cuda
cargo run --release --features metal

test | device | candle_time_per_pass | torch_time_per_pass | n | result |
---|---|---|---|---|---|
add | cuda | 19.496µs | 17.935µs | 100000 | ❌ Candle slower than Torch by 1.087x |
matmul | cuda | 259.859µs | 234.579µs | 100000 | ❌ Candle slower than Torch by 1.108x |
cublaslt_matmul | cuda | 136.350µs | 254.747µs | 100000 | ✅ Candle faster than Torch by 1.868x |
relu | cuda | 17.804µs | 22.853µs | 100000 | ✅ Candle faster than Torch by 1.284x |
gelu | cuda | 21.429µs | 10.031µs | 100000 | ❌ Candle slower than Torch by 2.136x |
silu | cuda | 20.767µs | 10.429µs | 100000 | ❌ Candle slower than Torch by 1.991x |
softmax | cuda | 22.955µs | 27.366µs | 100000 | ✅ Candle faster than Torch by 1.192x |
reshape | cuda | 0.123µs | 9.104µs | 100000 | ✅ Candle faster than Torch by 73.951x |
transpose | cuda | 0.094µs | 5.810µs | 100000 | ✅ Candle faster than Torch by 61.876x |
narrow | cuda | 0.113µs | 11.722µs | 100000 | ✅ Candle faster than Torch by 104.156x |

test | device | candle_time_per_pass | torch_time_per_pass | n | result |
---|---|---|---|---|---|
add | cpu | 133.909µs | 51.276µs | 1000 | ❌ Candle slower than Torch by 2.612x |
matmul | cpu | 4464.338µs | 4658.558µs | 1000 | ✅ Candle faster than Torch by 1.044x |
relu | cpu | 175.867µs | 50.372µs | 1000 | ❌ Candle slower than Torch by 3.491x |
gelu | cpu | 2374.970µs | 10.533µs | 1000 | ❌ Candle slower than Torch by 225.487x |
silu | cpu | 2115.940µs | 9.030µs | 1000 | ❌ Candle slower than Torch by 234.315x |
softmax | cpu | 1655.149µs | 266.818µs | 1000 | ❌ Candle slower than Torch by 6.203x |
reshape | cpu | 0.074µs | 13.229µs | 1000 | ✅ Candle faster than Torch by 179.150x |
transpose | cpu | 0.120µs | 8.362µs | 1000 | ✅ Candle faster than Torch by 69.404x |
narrow | cpu | 0.064µs | 12.545µs | 1000 | ✅ Candle faster than Torch by 197.377x |

test | device | candle_time_per_pass | torch_time_per_pass | n | result |
---|---|---|---|---|---|
add | cpu | 34.672µs | 49.539µs | 1000 | ✅ Candle faster than Torch by 1.429x |
matmul | cpu | 4718.913µs | 4705.201µs | 1000 | ❌ Candle slower than Torch by 1.003x |
relu | cpu | 257.947µs | 65.408µs | 1000 | ❌ Candle slower than Torch by 3.944x |
gelu | cpu | 2405.888µs | 12.113µs | 1000 | ❌ Candle slower than Torch by 198.618x |
silu | cpu | 523.669µs | 9.747µs | 1000 | ❌ Candle slower than Torch by 53.725x |
softmax | cpu | 1667.239µs | 272.704µs | 1000 | ❌ Candle slower than Torch by 6.114x |
reshape | cpu | 0.132µs | 18.616µs | 1000 | ✅ Candle faster than Torch by 141.064x |
transpose | cpu | 0.171µs | 7.215µs | 1000 | ✅ Candle faster than Torch by 42.137x |
narrow | cpu | 0.107µs | 13.318µs | 1000 | ✅ Candle faster than Torch by 124.090x |
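
The `result` column appears to be the ratio of the two per-pass times, with the larger time divided by the smaller. The sketch below reproduces that arithmetic. The helper names `time_per_pass` and `compare` are hypothetical and are not taken from this repository's source, and the assumption that each per-pass time is the total elapsed time divided by `n` is an inference from the column names.

```rust
use std::time::Duration;

/// Average time of a single pass, assuming the benchmark times `n` passes
/// in one go and divides the total (hypothetical helper).
/// Panics if `n` is zero, because `Duration / 0` panics.
fn time_per_pass(total: Duration, n: u32) -> Duration {
    total / n
}

/// Build a `result` cell from the two per-pass times (hypothetical helper).
/// The ratio is always reported as >= 1: slower time divided by faster time.
/// A tie is reported as "faster ... by 1.000x".
fn compare(candle: Duration, torch: Duration) -> String {
    let c = candle.as_secs_f64();
    let t = torch.as_secs_f64();
    if c <= t {
        format!("✅ Candle faster than Torch by {:.3}x", t / c)
    } else {
        format!("❌ Candle slower than Torch by {:.3}x", c / t)
    }
}
```

For example, calling `compare` with 19.496µs for Candle and 17.935µs for Torch yields the `add | cuda` verdict from the first table. The ratios for the near-zero rows (`reshape`, `transpose`, `narrow`) differ slightly when recomputed from the printed values, because the table shows times rounded to three decimals.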