octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

MPL-2.0 License

Downloads: 5
Stars: 4.8K
Committers: 14

octosql - v0.6.0

Published by github-actions[bot] over 2 years ago

Changelog

  • Add short-circuiting LIMIT and eagerly printed JSON and CSV formats.
  • Add support for plugins handling file extensions.
  • Add support for table options.
  • Treat empty fields as NULL in CSV datasource.
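Two of the changes above are easy to picture in isolation. The following is a minimal sketch in Python, not OctoSQL's actual Go implementation: `read_csv_rows`, `tracked` and `limit` are hypothetical names invented for this illustration. It shows empty CSV fields being mapped to a NULL-like value and a LIMIT that stops pulling rows from its source as soon as it has enough.

```python
import csv
import io
from itertools import islice

def read_csv_rows(text):
    """Yield one dict per CSV row, mapping empty fields to None (standing in for NULL)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {name: (value if value != "" else None) for name, value in row.items()}

pulled = []  # records every row actually requested from the source

def tracked(rows):
    """Pass rows through unchanged while noting how many were pulled."""
    for row in rows:
        pulled.append(row)
        yield row

def limit(rows, n):
    """Short-circuiting LIMIT: stop consuming the upstream iterator after n rows."""
    return islice(rows, n)

data = "name,age\nalice,30\nbob,\ncarol,41\n"
first_two = list(limit(tracked(read_csv_rows(data)), 2))
```

Because the pipeline is lazy, the third row (`carol`) is never read from the source, and the empty `age` for `bob` comes out as `None`.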

octosql - v0.5.0

Published by github-actions[bot] over 2 years ago

Changelog

  • Update to Go 1.18
  • Make the temporary plugin directory configurable.
  • Improve strict function typechecking, as well as list indexing typechecking.
  • Stop using zero-copy JSON decoding, as it is buggy.
  • Introduce much better object support, with new syntax for accessing object fields (object -> field_name).
  • Improve old plugin API Level error message.
  • Fix tempdir name used by Plugin executor.

octosql - v0.4.2

Published by github-actions[bot] almost 3 years ago

Changelog

  • f2f6b4d Fix two cases where the optimizer could break Expressions during predicate pushdown, causing crashes.

octosql - v0.4.1 - Fix Windows support

Published by github-actions[bot] almost 3 years ago

Changelog

  • 3f87e6e Fix handling of unix sockets on Windows:
    • Use Lstat instead of Stat to stat unix sockets.
    • Use custom dialer to omit gRPC resolvers, as they misunderstand Windows unix paths.
  • 8715db2 Fix version resolution with multiple default plugin datasource instances.
  • f3b4dec Fix plugin binaries lookup on Windows.

octosql - v0.4.0 - Plugins, Static Typing, Performance, and Embeddability

Published by github-actions[bot] almost 3 years ago

Hey there!

This release marks another ground-up rewrite of OctoSQL. There were a lot of problems with the previous designs of OctoSQL. Fortunately, those are now gone, and I hope there won't be any more rewrites in the future.

This is a big release, and there are major changes.

OctoSQL is now statically typed, so your queries get typechecked before being executed, and the optimizer is much more robust thanks to that. Schemas are automatically inferred from datasources, so static typing comes with no usability tradeoffs.

Datasources are now decoupled from the main repository of OctoSQL, as OctoSQL now uses a plugin model. You can create new datasources for OctoSQL and just add an entry to the plugin repository for OctoSQL to be able to use it. The plugin subsystem is also specifically designed not to compromise on performance, which gets us to the final point...

Performance. OctoSQL is now much (orders of magnitude much) faster. You can expect 100x speed improvements across most use cases.

Overall, OctoSQL is now more robust, faster, easier to create plugins for, and has static verification of executed queries.

P.S.: It's also more lightweight and fully in-memory now, so you can easily embed it into your own applications either as a dataflow engine, or a SQL execution engine.

octosql - 🌊OctoSQL Streaming 🌊

Published by cube2222 about 4 years ago

Hey everybody!

This release is an almost-rewrite of OctoSQL.

  • It changes all state to use local transactional on-disk storage (based on Badger).
  • Adds Temporal SQL
    • Watermarks
    • Triggers
    • Event Time
    • Early Results and Retractions
  • New datasources
    • Kafka
    • Apache Parquet Files
  • All datasources now work asynchronously to actual processing
  • Parallelism has been introduced to datasources (Kafka), distinct selects, group bys, and joins, with shuffling (key hashing) included.
  • Stream Joins (which make joining files orders of magnitude faster than Lookup Joins)
  • Live-updating output tables
  • New Table Valued Functions
    • Maximum Difference Watermark Generator
    • Percentile Watermark Generator
  • Many new functions have been added
  • Common Table Expressions have been added (queries containing "WITH" statements)
  • Telemetry (described in the README in depth)
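The key-hashing shuffle mentioned in the list above can be sketched in a few lines. This is an illustration of the general technique in Python, not OctoSQL's code: `partition_for` and `shuffle` are hypothetical names, and the choice of CRC32 as the hash is an assumption made only to keep the example deterministic.

```python
import zlib

def partition_for(key, partitions):
    """Map a key to a partition index using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % partitions

def shuffle(records, key_field, partitions):
    """Distribute records into buckets so that equal keys always share a bucket."""
    buckets = [[] for _ in range(partitions)]
    for record in records:
        index = partition_for(str(record[key_field]), partitions)
        buckets[index].append(record)
    return buckets

records = [
    {"user": "a", "n": 1},
    {"user": "b", "n": 2},
    {"user": "a", "n": 3},
]
buckets = shuffle(records, "user", 4)
```

Since every record with the same key lands in the same bucket, each bucket can be grouped or joined by an independent worker without coordinating with the others.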

octosql - Excel support, TVF's and custom SQL parser!

Published by cube2222 about 5 years ago

Features

  • Excel file support.
  • Table Valued Functions: range and tumble for now.
  • Configurable separators for CSV files.
  • Support for CSV files with no header row.
  • Support for uppercase field names (they get lowercased).
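As a rough picture of what a tumbling-window function does, here is a minimal Python sketch. It is not the OctoSQL implementation: the function signature, the integer timestamps, and the `window_start` / `window_end` field names are assumptions made for this example. Each record is assigned to exactly one fixed-size, non-overlapping window.

```python
def tumble(records, time_field, window_length):
    """Annotate each record with the fixed-size window its timestamp falls into."""
    for record in records:
        timestamp = record[time_field]
        # Round the timestamp down to the nearest multiple of the window length.
        window_start = timestamp - (timestamp % window_length)
        yield {
            **record,
            "window_start": window_start,
            "window_end": window_start + window_length,
        }

events = [{"t": 7, "v": "x"}, {"t": 10, "v": "y"}, {"t": 14, "v": "z"}]
windowed = list(tumble(events, "t", 5))
```

With a window length of 5, the event at `t=7` falls into the window starting at 5, while the events at `t=10` and `t=14` share the window starting at 10, so a later group by on `window_start` aggregates them together.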

Internals

  • We finally switched to a custom SQL parser (copied from vitess), which allowed us to add TVFs and will allow us to make other potential modifications to our SQL dialect.
  • Support for record metadata, including stuff like retracting records. This is important for further work on streaming support.
  • Record serialization, which will make it possible to use disk based key-value state storage.

octosql - Ergonomics, bug fixes and performance improvements!

Published by cube2222 over 5 years ago

  • Fixing null handling and adding functions for easier usage: coalesce and nullif.
  • Fixing a bug where an empty password for postgres wouldn't work. (thanks to @zknill)
  • Fixing an optimizer bug where a predicate containing a function, like int(p.age), could be pushed down but wasn't inlined into the query. It was instead handled using "?" placeholders, which would always be filled with null (as the wanted records are yet to come).
  • More understandable function argument errors.
  • Parallelizing and adding prefetch to lookup joins, resulting in an over 10x improvement in throughput when joining with a low-latency datasource (the difference should be much bigger with high-latency datasources).
  • Adding an execution configuration file section, currently containing the prefetch size for the aforementioned parallel lookup join.
  • Brand new logo. (thanks to @styczynski)
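For readers unfamiliar with the two null-handling helpers named above, the sketch below shows their conventional SQL semantics in Python, with `None` standing in for NULL. It assumes OctoSQL follows the standard definitions; the release notes do not spell out the exact behaviour or typing rules.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None

def nullif(a, b):
    """Return None when a equals b, otherwise return a."""
    return None if a == b else a

# Typical combination: treat an empty string as missing, then fall back to a default.
display_name = coalesce(nullif("", ""), "anonymous")
```

Here `nullif` turns the empty string into `None`, and `coalesce` then substitutes the fallback value.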

octosql - Initial release!

Published by cube2222 over 5 years ago

We're now happy enough with OctoSQL to make it public!