chdb

chDB is an embedded OLAP SQL Engine 🚀 powered by ClickHouse

APACHE-2.0 License

Downloads
65.7K
Stars
1.7K
Committers
13


chdb - v2.0.4 Latest Release

Published by auxten about 1 month ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v2.0.3...v2.0.4

chdb - v2.0.3

Published by auxten about 1 month ago

What's Changed

New Contributors

Full Changelog: https://github.com/chdb-io/chdb/compare/v2.0.2...v2.0.3

chdb - v2.0.2

Published by auxten about 2 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v2.0.1...v2.0.2

chdb - v2.0.1

Published by auxten about 2 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v2.0.0b1...v2.0.1

chdb - v2.0.0b1

Published by auxten 4 months ago

What's Changed

Special thanks to @mneedham!

Full Changelog: https://github.com/chdb-io/chdb/compare/v2.0.0b0...v2.0.0b1

chdb - v2.0.0b0

Published by auxten 4 months ago

Highlight

  • 🚀 87x performance boost for queries on Pandas DataFrames; see Benchmark

  • ⬆️ Upgrade ClickHouse engine to 24.5

  • 🐍 Query a Pandas DataFrame, Arrow Table, dict, or any Python object directly!

import chdb
import pandas as pd
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()

Changes

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.4.1...v2.0.0b0

chdb - v1.4.1

Published by auxten 5 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.3.0...v1.4.1

chdb - v1.4.0

Published by auxten 5 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.3.0...v1.4.0

chdb - v1.3.0

Published by auxten 7 months ago

What's Changed

  • Add show for result by @auxten in https://github.com/chdb-io/chdb/pull/194
    # No need to capture the result and print it yourself:
    #   ret = chdb.query("SELECT 123")
    #   print(ret)
    # Just call show() on the result:
    chdb.query("SELECT 123").show()
    
  • Allow path in dbapi connect by @nevinpuri in https://github.com/chdb-io/chdb/pull/176
    from chdb import dbapi

    # Hypothetical directory for the persisted state (an assumed path).
    test_state_dir = "/tmp/chdb_dbapi_state"

    # First connection: create a table and insert one row.
    conn = dbapi.connect(path=test_state_dir)
    cur = conn.cursor()
    cur.execute("CREATE DATABASE e ENGINE = Atomic;")
    cur.execute(
        "CREATE TABLE e.hi (a String primary key, b Int32) Engine = MergeTree ORDER BY a;"
    )
    cur.execute("INSERT INTO e.hi (a, b) VALUES (%s, %s);", ["he", 32])
    cur.close()
    conn.close()

    # Second connection on the same path: the row should still be there.
    conn2 = dbapi.connect(path=test_state_dir)
    cur2 = conn2.cursor()
    cur2.execute("SELECT * FROM e.hi;")
    row = cur2.fetchone()
    assert row == ("he", 32)
    
  • SET statements now persist across queries in a chdb session, by @auxten in https://github.com/chdb-io/chdb/pull/207
    from chdb import session as chs
    se = chs.Session()
    se.query("SET input_format_csv_skip_first_lines = 1")
    se.query("SELECT * FROM `some dirty csv`").show()
    
  • Add test cases for materialize view by @auxten in https://github.com/chdb-io/chdb/pull/201
  • Fix DB-API test rerun issue by @auxten in https://github.com/chdb-io/chdb/pull/208

New Contributors

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.2.1...v1.3.0

chdb - v1.2.1

Published by auxten 9 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.2.0...v1.2.1

chdb - v1.2.0

Published by auxten 9 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.1.0...v1.2.0

chdb - v1.1.0

Published by auxten 10 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.0.2...v1.1.0

chdb - v1.0.2

Published by auxten 10 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v1.0.1...v1.0.2

chdb - v1.0.1

Published by auxten 10 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/chdb-io/chdb/commits/v1.0.1

chdb - v1.0.0

Published by lmangani 11 months ago

  • ⭐ chdb 1.0.0 (celebration release)

Celebrating the work of @auxten @lmangani @laodouya @nmreadelf and all the community members supporting the project ❤️


chdb - v1.0.0rc3

Published by auxten 11 months ago

chdb - v1.0.0rc2

Published by auxten 11 months ago

chdb - v1.0.0rc1

Published by auxten 11 months ago

What's Changed

Full Changelog: https://github.com/chdb-io/chdb/compare/v0.16.0rc2...v1.0.0rc1

chdb - v0.16.0rc2

Published by auxten 11 months ago

chdb - v0.16.0rc1

Published by auxten 11 months ago

chdb Release Summary

chdb 0.16 is based on ClickHouse 23.10.

Query Enhancements

  • Vector Addition:

    • python3 -m chdb "SELECT [1, 2, 3] + [4, 5, 6]"
  • Omit file() Function:

    • python3 -m chdb "SELECT * from '/home/Clickhouse/bench/hits_0.parquet' limit 10"
  • NumPy as Input Format:

    • NumPy files are supported as an input format, e.g. SELECT * FROM 'data.npy'.
  • Parquet Optimizations:

    • Writing Parquet files is 10x faster now that it is multi-threaded, and runs at almost the same speed as reading.
    • Parquet filter pushdown: when reading Parquet files, row groups (chunks of the file) are skipped based on the WHERE condition and the min/max values in each column.
    • Optimize reading small row groups by batching them together in Parquet.
  • Condition Pushdown for ORC:

    • ORC reads now use data skipping indices, similarly to Parquet.
  • PRQL Support:

    • Added support for PRQL as a query language.
  • urlCluster Function:

    • Add urlCluster table function.
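
The Parquet filter pushdown item above describes skipping row groups by their min/max statistics. As a rough illustration of that idea only, here is a minimal pure-Python sketch. `RowGroup` and `scan_greater_than` are hypothetical names invented for this example; they are not part of chdb or ClickHouse, and the engine's actual reader is far more involved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with column statistics."""
    min_value: int
    max_value: int
    rows: List[int]


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return rows with value > threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        # If even the largest value fails the predicate, no row in the
        # group can match, so the whole group is skipped without reading it.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v > threshold)
    return matches, groups_read


groups = [
    RowGroup(1, 10, [1, 5, 10]),
    RowGroup(11, 20, [11, 15, 20]),
    RowGroup(21, 30, [21, 25, 30]),
]

# Conceptually: SELECT value FROM t WHERE value > 18
matches, groups_read = scan_greater_than(groups, 18)
print(matches, groups_read)
```

In this toy data set the first row group is never touched, which is where the saving comes from: the more selective the WHERE condition and the better the data is clustered, the more groups can be skipped.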

New Features

  • Introducing arrayFold for applying a lambda function to multiple arrays.
  • Extended support for asynchronous inserts with external data via the native protocol.
  • Introduced function jsonMergePatch for merging JSON strings.
  • Continued support for Kusto Query Language dialect with Phase 1 implementation.
  • Introduced a new SQL function arrayRandomSample for sampling elements from an input array.
  • Added support for dropping cache for Protobuf format with SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf].
  • Conditions on arguments of a table with a space-filling curve in its key can now be used for indexing.
  • New setting force_optimize_projection_name checks that a projection is used in the query.
  • Added aggregation function lttb using the Largest-Triangle-Three-Buckets algorithm for downsampling data.
  • CHECK TABLE query has better performance and usability, supporting checking particular parts.
  • Introduced function byteSwap for reversing the bytes of unsigned integers.
  • Added functions formatQuery and formatQuerySingleLine for formatted SQL query output.
  • Introduced DWARF input format for reading debug symbols from an ELF file.
  • Introduced SHOW SETTING setting_name as a simpler version of SHOW SETTINGS.
  • Added fields substreams and filenames to the system.parts_columns table.
  • Introduced a setting create_table_empty_primary_key_by_default for default ORDER BY ().
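
For readers unfamiliar with the Largest-Triangle-Three-Buckets algorithm behind the lttb aggregation function listed above, the following is a minimal pure-Python sketch of the textbook algorithm. It is an illustration under assumptions, not ClickHouse's implementation: the function name `lttb` and its signature here are chosen for this example, and the input is assumed to be a list of (x, y) pairs sorted by x.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lttb(points: List[Point], n_out: int) -> List[Point]:
    """Downsample points (sorted by x) to n_out points, keeping visual shape."""
    n = len(points)
    if n_out >= n:
        return list(points)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")

    every = (n - 2) / (n_out - 2)  # bucket width, excluding first and last point
    sampled = [points[0]]          # the first point is always kept
    a = 0                          # index of the most recently selected point

    for i in range(n_out - 2):
        # Average of the next bucket serves as the third triangle corner.
        avg_start = int((i + 1) * every) + 1
        avg_end = min(int((i + 2) * every) + 1, n)
        count = avg_end - avg_start
        avg_x = sum(p[0] for p in points[avg_start:avg_end]) / count
        avg_y = sum(p[1] for p in points[avg_start:avg_end]) / count

        # Current bucket: pick the point forming the largest triangle with
        # the previously selected point and the next bucket's average.
        range_start = int(i * every) + 1
        range_end = int((i + 1) * every) + 1
        ax, ay = points[a]
        best_area = -1.0
        best_index = range_start
        for j in range(range_start, range_end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) * 0.5
            if area > best_area:
                best_area = area
                best_index = j
        sampled.append(points[best_index])
        a = best_index

    sampled.append(points[-1])     # the last point is always kept
    return sampled


# A flat series with a single spike: the spike should survive downsampling.
series = [(float(i), 0.0) for i in range(100)]
series[50] = (50.0, 100.0)
reduced = lttb(series, 10)
print(reduced)
```

The point of choosing by triangle area rather than, say, taking every k-th sample is that outliers and turning points tend to be kept, so a downsampled chart still looks like the original.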

Performance Improvements

  • Fixed contention on Context lock, significantly improving performance for short-running concurrent queries.
  • Improved the performance of inverted index creation by 30%.
  • Optimized memory consumption for external aggregation with many temporary files.
  • Added option query_plan_preserve_num_streams_after_window_functions to preserve the number of streams after evaluating window functions.
  • Released more streams if data is small, optimizing resource usage.
  • Optimized RoaringBitmaps before serialization.
  • Optimized inverted index posting lists to use the smallest possible representation.
  • Set a reasonable size for the marks cache for secondary indices by default.
  • Avoided unnecessary reconstruction of index granules when reading skip indexes.
  • Cached CAST function in set during execution to improve the performance of function IN when set element type doesn't match column type.
  • Improved write performance to EmbeddedRocksDB tables.
  • Improved overall resilience for ClickHouse in case of many parts within a partition.
  • Reduced memory consumption during loading of hierarchical dictionaries.
  • All dictionaries now support the setting dictionary_use_async_executor.
  • Prevented excessive memory usage when deserializing AggregateFunctionTopKGenericData.
  • Reduced CPU consumption for AsyncMetrics threads on a Keeper with lots of watches.
  • Experimental inverted indexes now do not store tokens with too many matches, saving space.
  • Improved write performance to hierarchical dictionaries.