A simple, easy-to-use, high-performance big data governance engine built on Spark and Debezium. It targets unified batch/stream data integration and data analysis scenarios, and supports real-time CDC data capture, large-scale data synchronization, data modeling, and OLAP analysis.
| Type | Data Source | Batch (Read) | Batch (Write) | Stream (Read) | Stream (Write) | CDC (Read) | CDC (Write) |
|---|---|---|---|---|---|---|---|
| Relational DB | MySQL | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| | MariaDB | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| | PostgreSQL | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| | Oracle | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| | SQLServer | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| | DB2 | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| NoSQL | HBase | √ | √ | √ | √ | Insert | Insert, Delete, Update |
| | Phoenix | √ | √ | | √ | | Insert, Delete, Update |
| | MongoDB | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| Data Warehouse | Hive | √ | √ | | √ | | Insert |
| | StarRocks | √ | √ | | √ | | Insert, Delete, Update |
| | Doris | √ | √ | | √ | | Insert, Delete, Update |
| | ClickHouse | √ | √ | √ | √ | Insert, Delete, Update | Insert, Delete, Update |
| Message Queue | Kafka | √ | √ | √ | √ | Insert | Insert |
| Graph DB | Neo4j | √ | √ | √ | √ | Insert | Insert, Delete, Update |
| File | Text | √ | √ | √ | √ | Insert | Insert |
| | CSV | √ | √ | √ | √ | Insert | Insert |
| | Excel | √ | √ | √ | √ | Insert | Insert |
| | JSON | √ | √ | √ | √ | Insert | Insert |
| | ORC | √ | √ | √ | √ | Insert | Insert |
| | Parquet | √ | √ | √ | √ | Insert | Insert |
{
  "env": {
    "param": "hdfs://cluster/starks/params/test.json"
  },
  "source": [
    {
      "identifier": "ss001",
      "name": "User basic info table (existing data)",
      "type": "ORACLE",
      "dataset": "users_basic",
      "mode": "BATCH",
      "connection": {
        "url": "jdbc:oracle:thin:@//127.0.0.1:1521/XE",
        "driver": "oracle.jdbc.OracleDriver",
        "user": "system",
        "password": "system"
      }
    },
    {
      "identifier": "ss002",
      "name": "User detail table (updated in real time)",
      "type": "MYSQL",
      "dataset": "users_detail",
      "mode": "STREAM",
      "connection": {
        "url": "jdbc:mysql://127.0.0.1:3306/test",
        "driver": "com.mysql.cj.jdbc.Driver",
        "user": "root",
        "password": "root"
      }
    }
  ],
  "transform": [
    {
      "identifier": "tf001",
      "name": "Join user basic info with user detail info",
      "source": ["ss001", "ss002"],
      "sql": "select ss001.*, ss002.detail as detail from ss001 inner join ss002 on ss001.id = ss002.id",
      "transout": ["ts001"]
    }
  ],
  "transout": [
    {
      "identifier": "ts001",
      "transform": ["tf001"],
      "sink": ["sk001"]
    }
  ],
  "sink": [
    {
      "identifier": "sk001",
      "name": "Write to the HIVE warehouse via JDBC",
      "type": "HIVE",
      "dataset": "users_info",
      "mode": "APPEND",
      "connection": {
        "url": "jdbc:hive2://127.0.0.1:10000/test",
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "user": "hive"
      }
    }
  ]
}
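Read end to end, the rule file wires two sources into one transform and one sink: `ss001` (Oracle, batch) and `ss002` (MySQL, stream) are registered under their identifiers, joined by the SQL in `tf001`, and routed through `ts001` into the Hive sink `sk001`. The sketch below is only a rough, hand-written Spark equivalent of that flow, not Stark's actual implementation: for simplicity it reads both sources as plain JDBC batch snapshots (the real `ss002` is a streaming source) and writes to Hive through Spark's Hive support rather than the JDBC URL in `sk001`. The Oracle and MySQL JDBC drivers would need to be on the Spark classpath (for example via `--jars`).

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object StarkRuleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stark-rule-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // ss001: Oracle source, read here as a batch JDBC snapshot
    spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//127.0.0.1:1521/XE")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "users_basic")
      .option("user", "system")
      .option("password", "system")
      .load()
      .createOrReplaceTempView("ss001")

    // ss002: MySQL source, also read here as a batch snapshot
    // (the rule file declares it as a STREAM source)
    spark.read.format("jdbc")
      .option("url", "jdbc:mysql://127.0.0.1:3306/test")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "users_detail")
      .option("user", "root")
      .option("password", "root")
      .load()
      .createOrReplaceTempView("ss002")

    // tf001: the rule file's SQL, executed verbatim against the two views
    val joined = spark.sql(
      "select ss001.*, ss002.detail as detail from ss001 inner join ss002 on ss001.id = ss002.id")

    // sk001: append into the Hive table (assumes a `test` database exists
    // and the Hive metastore is reachable from this Spark session)
    joined.write.mode(SaveMode.Append).saveAsTable("test.users_info")

    spark.stop()
  }
}
```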
1. Edit the `rule.json` rule file in the root directory of `Stark-1.1.0-preview.jar`, filling in the connection details of the [MySQL/Oracle/PostgreSQL] data sources referenced in `source` and `sink`.
2. Upload `Stark-1.1.0-preview.jar` to the server (a Spark 3.x client must be installed; downloading it from the official site and extracting it is enough).
3. In the `$SPARK_HOME/bin` directory, run `spark-submit Stark-1.1.0-preview.jar` and wait for the job to finish.
4. Log in to the [MySQL/Oracle/PostgreSQL] database and inspect the output table specified by the `sink` node to verify that the data was collected successfully (or use the spark-shell check sketched below).
Note: the [preview] build can only run [batch] jobs against [MySQL/Oracle/PostgreSQL] data sources. To try the full feature set of the Stark engine, please reach out via the contact information below ↓↓↓