octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

MPL-2.0 License

Downloads: 5 | Stars: 4.8K | Committers: 14


octosql - 🌊OctoSQL Streaming 🌊

Published by cube2222 about 4 years ago

Hey everybody!

This release is an almost complete rewrite of OctoSQL.

  • Changes all state to use local, transactional, on-disk storage (based on Badger).
  • Adds Temporal SQL
    • Watermarks
    • Triggers
    • Event Time
    • Early Results and Retractions
  • New datasources
    • Kafka
    • Apache Parquet Files
  • All datasources now work asynchronously from the actual processing.
  • Parallelism has been introduced to datasources (Kafka), distinct selects, GROUP BYs and joins, with shuffling (key-hashing) functionality included.
  • Stream Joins (which make joining files orders of magnitude faster than Lookup Joins)
  • Live-updating output tables
  • New Table Valued Functions
    • Maximum Difference Watermark Generator
    • Percentile Watermark Generator
  • Many new functions have been added
  • Common Table Expressions have been added (queries containing WITH clauses)
  • Telemetry (described in the README in depth)
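Shuffling by key hash is what lets stateful operators such as GROUP BY run in parallel: every record with a given key is routed to the same worker, so no per-key state is shared. A minimal Go sketch of the idea, using FNV-1a from the standard library as an assumed hash function (the notes don't say which hash OctoSQL uses); `partitionFor` and `countByKey` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// partitionFor maps a key to one of n partitions. Equal keys always land in
// the same partition, so per-key state never has to be shared between workers.
func partitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(n))
}

// countByKey is a hypothetical parallel "GROUP BY key, COUNT(*)": records are
// shuffled by key hash, each partition is counted by its own goroutine, and
// the disjoint per-partition results are merged.
func countByKey(keys []string, n int) map[string]int {
	parts := make([][]string, n)
	for _, k := range keys {
		p := partitionFor(k, n)
		parts[p] = append(parts[p], k)
	}
	results := make([]map[string]int, n)
	var wg sync.WaitGroup
	for i := range parts {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			counts := make(map[string]int)
			for _, k := range parts[i] {
				counts[k]++
			}
			results[i] = counts
		}(i)
	}
	wg.Wait()
	merged := make(map[string]int)
	for _, counts := range results {
		for k, c := range counts {
			merged[k] = c // a key never appears in two partitions, so no summing is needed
		}
	}
	return merged
}

func main() {
	counts := countByKey([]string{"alice", "bob", "alice", "carol", "bob", "alice"}, 4)
	fmt.Println(counts["alice"], counts["bob"], counts["carol"]) // 3 2 1
}
```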
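The notes don't say how Stream Joins are implemented. One standard technique that avoids issuing a lookup per record is a symmetric hash join: each side's records are stored by join key, and every new record probes the other side's store. The Go sketch below assumes an equality join on string keys and keeps state in in-memory maps, whereas this release keeps state in on-disk storage; `StreamJoin` and `Push` are hypothetical.

```go
package main

import "fmt"

// StreamJoin is a hypothetical symmetric hash join on string keys. Each side
// keeps every value it has seen, grouped by join key.
type StreamJoin struct {
	left, right map[string][]string
}

func NewStreamJoin() *StreamJoin {
	return &StreamJoin{left: map[string][]string{}, right: map[string][]string{}}
}

// Push stores a record from one side and returns the (left, right) pairs it
// completes by probing the other side's store.
func (j *StreamJoin) Push(fromLeft bool, key, val string) [][2]string {
	own, other := j.left, j.right
	if !fromLeft {
		own, other = j.right, j.left
	}
	own[key] = append(own[key], val)
	var out [][2]string
	for _, o := range other[key] {
		if fromLeft {
			out = append(out, [2]string{val, o})
		} else {
			out = append(out, [2]string{o, val})
		}
	}
	return out
}

func main() {
	j := NewStreamJoin()
	fmt.Println(j.Push(true, "1", "alice"))    // []
	fmt.Println(j.Push(false, "1", "order-A")) // [[alice order-A]]
	fmt.Println(j.Push(false, "1", "order-B")) // [[alice order-B]]
}
```

Both inputs are read once as streams, which is why this kind of join can beat a lookup join that queries the other source for every record.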


octosql - Excel support, TVFs and custom SQL parser!

Published by cube2222 about 5 years ago

Features

  • Excel file support.
  • Table Valued Functions: range and tumble for now (see the documentation).
  • Configurable separators for CSV files.
  • Support for CSV files with no header row.
  • Support for uppercase field names (they get lowercased).

Internals

  • We finally switched to a custom SQL parser (copied from Vitess), which allowed us to add TVFs and will allow further modifications to our SQL dialect.
  • Support for record metadata, including things like record retractions. This is important for further work on streaming support.
  • Record serialization, which will make it possible to use disk-based key-value state storage.
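The serialization format isn't described in these notes. As an illustration only, the Go sketch below uses the standard library's encoding/gob to turn a hypothetical `Record` carrying a retraction flag into bytes that could be stored as a value in a key-value store, and shows how a retraction undoes an earlier contribution to an aggregate. `Record`, `serialize`, `deserialize` and `applyToSum` are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Record is a hypothetical record with metadata: Retraction marks a record
// that undoes a previously emitted record with the same contents.
type Record struct {
	Key        string
	Value      int64
	Retraction bool
}

// serialize encodes a record into bytes suitable for a key-value store.
func serialize(r Record) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// deserialize is the inverse of serialize.
func deserialize(b []byte) (Record, error) {
	var r Record
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&r)
	return r, err
}

// applyToSum folds a record into a running sum, subtracting retractions.
func applyToSum(sum int64, r Record) int64 {
	if r.Retraction {
		return sum - r.Value
	}
	return sum + r.Value
}

func main() {
	in := Record{Key: "alice", Value: 42, Retraction: true}
	b, err := serialize(in)
	if err != nil {
		panic(err)
	}
	out, err := deserialize(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(out == in) // true
	sum := applyToSum(0, Record{Key: "a", Value: 10})
	sum = applyToSum(sum, Record{Key: "a", Value: 10, Retraction: true})
	fmt.Println(sum) // 0
}
```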

octosql - Ergonomics, bug fixes and performance improvements!

Published by cube2222 about 5 years ago

  • Fixing null handling and adding the coalesce and nullif functions for easier usage.
  • Fixing a bug where an empty password for postgres wouldn't work. ( thanks to @zknill )
  • Fixing an optimizer bug where a predicate containing a function like int(p.age) could be pushed down, but instead of being included in the query it would be handled using "?" placeholders, which would always be filled with null (as the wanted records are yet to come).
  • More understandable function argument errors.
  • Parallelizing and adding prefetch to lookup joins, resulting in over 10x higher throughput when joining a low-latency datasource (the difference should be much bigger with high-latency datasources).
  • Adding an execution configuration file section, currently containing the prefetch size for the aforementioned parallel lookup join.
  • Brand new logo. ( thanks to @styczynski )

octosql - Initial release!

Published by cube2222 over 5 years ago

We're now happy enough with OctoSQL to make it public!