Run SQL over your Maven artifacts
MIT License
Run SQL over your Maven artifacts.
Slides from the talk "Talking SQL to Strangers".
camille-sql
allows you to explore Maven artifacts you have on your local hard drive.
Run the server:
$ bin/camille-server
Artifacts repository path: /Users/<user>/.m2/repository/
Running server on localhost:26727
...
The server understands PostreSQL wire protocol, so you can connect to it using standard psql
client:
$ PGPASSWORD=nopass psql "host=localhost port=26727 sslmode=disable"
psql (12.2, server 9.5.0)
Type "help" for help.
camille=>
As you can see, psql
is absolutely sure it talks to PostgreSQL version 9.5.0.
Now you have access to 2 tables: artifacts
and versions
. You can run any read-only SQL query: the server supports projections, filtering, grouping, joins, agg functions, sub-queries etc (pretty much all of SQL99).
Basic queries:
camille=> select * from artifacts limit 6;
uid | group_id | artifact_id | name | url
------------+--------------------------+--------------------------+---------------------+------------------------------------------
3227713579 | alandipert | desiderata | desiderata | https://github.com/alandipert/desiderata
3382955103 | aopalliance | aopalliance | AOP alliance | http://aopalliance.sourceforge.net
1507835947 | asm | asm-parent | ASM | http://asm.objectweb.org/
226341444 | backport-util-concurrent | backport-util-concurrent | Backport of JSR 166 | http://backport-jsr166.sourceforge.net/
1712481681 | biz.aQute | bndlib | BND Library | http://www.aQute.biz/Code/Bnd
2280883480 | biz.aQute.bnd | biz.aQute.bndlib | biz.aQute.bndlib | https://bnd.bndtools.org/
(6 rows)
camille=> select * from versions where filesize > 10000 limit 5;
uid | version | filesize | last_modified | sha1
------------+---------+----------+-------------------------+------------------------------------------
3345961009 | 1.3.2 | 337129 | 2019-07-04 23:36:26.464 | ff84d15cfeb0825935a170d7908fbfae00498050
1053708643 | 1.0.1 | 26514 | 2019-07-04 23:23:20.322 | 49c100caf72d658aca8e58bd74a4ba90fa2b0d70
2740841946 | 1.6.5 | 1034049 | 2019-07-05 05:37:10.953 | 7d18faf23df1a5c3a43613952e0e8a182664564b
925895164 | 0.4.4 | 42645 | 2020-02-01 06:45:59.599 | 2522f7f1b4bab169a2540406eb3eb71f7d6e3003
136773645 | 1.9 | 263965 | 2019-07-04 23:25:30.09 | 9ce04e34240f674bc72680f8b843b1457383161a
(5 rows)
Something more complicated:
camille=>
SELECT group_id, COUNT(*) AS n_files
FROM artifacts
LEFT JOIN versions ON artifacts.uid=versions.uid
GROUP BY group_id
ORDER BY n_files DESC
LIMIT 10;
group_id | n_files
--------------------------+---------
org.apache.flink | 391
org.apache.maven | 245
org.codehaus.plexus | 186
org.apache.hadoop | 121
org.apache.maven.doxia | 108
org.apache.maven.plugins | 82
io.netty | 67
org.apache.maven.shared | 65
org.apache.lucene | 64
org.apache.commons | 62
(10 rows)
The project is mainly done out of pure curiosity:
pgwire
) look likepgwire
as a Netty codecpgwire
messages (see pgwire
package). The tricky part of the codec is that very first message has a different structure compared to all following messages (from PostgreSQL documentation: because of purely historical reasons). Channel initializer creates pipeline with PgwireStartupMessageDecoder
that will eventually remove itself after the first message is succesfully processed.m2sql
package. It exposes JDBC connection, so the server uses standard java.sql
interface when talking to it (see documentation for Apache Avatica library).More details in the deck.
"Precision is the difference between a butcher and a surgeon" (tm)
Implemented optimization:
Work in progress:
This is the project made for fun. Feel free to implement whatever feature you want and just drop a PR here ;) See TODO list below if you need ideas on what could be helpful (or what is critically missing).
toByteBuf
method for each message type, the logic should be implemented in a single encoder with dynamic type-based dispatch\l
, show databases
, show tables
(need to register pg_catalog
to make this happen)pgwire
protocol has way more message types that are currently implementedinsert
, update
etc)group_id LIKE com.apache.%
predicate might be optimited by going directly to com/apache/
subfolder)DELETE
versionsCopyright © 2020 camille-sql
camille-sql
is licensed under the MIT license, available at MIT and also in the LICENSE file.