Benchmarking mysql, mongodb, postgres and postgres + timescale.
The following project tries to benchmark the following methods on 3 database instances:
Each database instance is ran through docker and uses the non-default ports to avoid conflicts with local database instances.
NOTE. Create an issue if you see a mistake or have a suggestion.
The following methods are benchmarked:
The data format for all of the tables is the same (excluding the id field between mongodb and postgres implementations, and the name of the interval field in mysql).
{
"created_at": "2021-09-01T00:00:00Z", // TIMESTAMPTZ | mongodb ISODate
"updated_at": "2021-09-01T00:00:00Z", // TIMESTAMPTZ | mongodb ISODate
"start_time": "2021-09-01T00:00:00Z", // TIMESTAMPTZ | mongodb ISODate
"interval": TIME_IN_MILLISECONDS, // BIGINT (renamed to resolution in mysql)
"area": "area", // TEXT
"source": "source", // TEXT
"value": 0.0, // double precision
}
Upserts are done using an start_time
, interval
and area
filter.
The same data chunks get inserted into all of the databases.
# start docker in the background
sudo docker compose up -d
# run the go benchmarks
cd go
go test -benchmem -run=^$ -bench ^BenchmarkTimeseries$ timeseries-benchmark -v -count=1 -timeout=0
# reset docker (uninstall every image and container)
sudo docker stop $(sudo docker ps -aq)
sudo docker rm $(sudo docker ps -aq)
sudo docker rmi $(sudo docker images -q)
sudo docker rmi -f $(sudo docker images -q)
sudo docker volume rm $(docker volume ls -q)
go $ go test -benchmem -run=^$ -bench ^BenchmarkTimeseries$ timeseries-benchmark -v -count=1 -timeout=0
goos: darwin
goarch: arm64
pkg: timeseries-benchmark
BenchmarkTimeseries
BenchmarkTimeseries/mysql-insert-100000-rows-10 1 101650284500 ns/op 108107240 B/op 1900985 allocs/op
BenchmarkTimeseries/mongodb-insert-100000-rows-10 1 41199354291 ns/op 813491216 B/op 11302331 allocs/op
BenchmarkTimeseries/pg-ntv-insert-100000-rows-10 1 32611100375 ns/op 40034880 B/op 1300323 allocs/op
BenchmarkTimeseries/pg-tsc-insert-100000-rows-10 1 52285019042 ns/op 40052832 B/op 1300511 allocs/op
BenchmarkTimeseries/mysql-upsert-single-4000-rows-10 1 1865689000 ns/op 4321896 B/op 76004 allocs/op
BenchmarkTimeseries/mongodb-upsert-single-4000-rows-10 1 1112991375 ns/op 31069864 B/op 416178 allocs/op
BenchmarkTimeseries/pg-ntv-upsert-single-4000-rows-10 1 1378900750 ns/op 1600000 B/op 52000 allocs/op
BenchmarkTimeseries/pg-tsc-upsert-single-4000-rows-10 1 2096648666 ns/op 1600000 B/op 52000 allocs/op
BenchmarkTimeseries/mysql-upsert-bulk-4000-rows-10 2 757385021 ns/op 2529884 B/op 56023 allocs/op
BenchmarkTimeseries/mongodb-upsert-bulk-4000-rows-10 6 167808972 ns/op 14574984 B/op 176101 allocs/op
BenchmarkTimeseries/pg-ntv-upsert-bulk-4000-rows-10 19 59175759 ns/op 5768672 B/op 52042 allocs/op
BenchmarkTimeseries/pg-tsc-upsert-bulk-4000-rows-10 2 515012583 ns/op 5773888 B/op 52090 allocs/op
benchmark_test.go:90: * storage size for pg-tsc, 100000 rows, before compression: 23568
BenchmarkTimeseries/mysql-get-4000-10 376 3211083 ns/op 3031471 B/op 40051 allocs/op
BenchmarkTimeseries/mongodb-get-4000-10 85 12326166 ns/op 5018889 B/op 80200 allocs/op
BenchmarkTimeseries/pg-ntv-get-4000-10 510 2367323 ns/op 2934839 B/op 16033 allocs/op
BenchmarkTimeseries/pg-tsc-get-4000-10 309 3820294 ns/op 2934888 B/op 16033 allocs/op
benchmark_test.go:112: sleeping for 60 sec to get the correct mongodb collection storage size
benchmark_test.go:115: * storage size for 100000 rows
benchmark_test.go:122: - mysql: 10272 KB
benchmark_test.go:122: - mongodb: 7704 KB
benchmark_test.go:122: - pg-ntv: 20208 KB
benchmark_test.go:122: - pg-tsc: 8392 KB
PASS
ok timeseries-benchmark 306.979s
Notes:
resolution
instead of interval
because interval
is a reserved keyword.To get specific info about how long the queries took on the database level, i ran the read_test.sh
script post benchmarking. The results between native postgres and timescale are not great. Native postgres SELECT *
queries otperform timescale by at least 2x. All of the logs of the explain queries can be inspected in the file.
* postgres select with limit
Planning Time: 0.323 ms
Execution Time: 1.932 ms
(4 rows)
~ timescaledb version
default_version | installed_version
-----------------+-------------------
2.16.1 | 2.16.1
(1 row)
* timescaledb compressed -> select with limit
Planning Time: 63.068 ms
Execution Time: 7.791 ms
(293 rows)
* timescale decompressed -> select with limit
Planning Time: 16.949 ms
Execution Time: 3.828 ms
(75 rows)
# find the version of timescaledb
docker exec timeseries_timescaledb psql -U test -d timeseries_benchmark -c "SELECT default_version, installed_version FROM pg_available_extensions where name = 'timescaledb';"
# see size of table in timescale + timing of query
docker exec timeseries_timescaledb psql -U test -d timeseries_benchmark -c "SELECT pg_size_pretty(hypertable_size('data_objects')) AS total_size;"
docker exec timeseries_timescaledb psql -U test -d timeseries_benchmark -c "\d data_objects"
# see size of table in postgres + timing of query
docker exec timeseries_postgres psql -U test -d timeseries_benchmark -c "SELECT pg_size_pretty(pg_total_relation_size('data_objects')) AS total_size;"
docker exec timeseries_postgres psql -U test -d timeseries_benchmark -c "\d data_objects"
docker exec timeseries_postgres psql -U test -d timeseries_benchmark -c 'SELECT *
FROM timescaledb_information.dimensions
WHERE hypertable_name = 'metrics';'
SELECT default_version, installed_version FROM pg_available_extensions where name = 'timescaledb';
totalSize
value updates after some time, once the records are inserted. This is why there is a pause before reading the collection size.30 days
otperforms compression of 7 days
with a big margin.
SELECT pg_size_pretty(pg_total_relation_size($1)) AS total_size;
query does not return the correct. I found this out when the table size returned from this query did not change once i benchmarked the size on varying number of rows (1k -> 10k).DATETIME
instead of TIMESTAMP
because TIMESTAMP
has a range of 1970-2038
and DATETIME
has a range of 1000-9999
(Error 1292 (22007): Incorrect datetime value: '2038-01-19 04:00:00' for column 'start_time' at row 1).