tispark

TiSpark is built for running Apache Spark on top of TiDB/TiKV

APACHE-2.0 License


tispark - TiSpark 2.3.0

Published by marsishandsome over 4 years ago

New Features

  • Support working with TiDB-3.1 and TiDB-4.0
  • Support working with TiFlash
  • Support directly writing data to TiKV

Fixes

  • Support new row format encoding #1492
  • Fix test: add TiFlash replica available check #1495
  • Fix string type pushdown #1500
tispark - TiSpark 2.3.0-rc.1

Published by marsishandsome over 4 years ago

  • Fix column name case bug #1487
  • Fix resolve current lock npe bug #1485
  • Fix resolve lock bug with tiflash #1483
tispark - TiSpark 2.3.0-rc

Published by marsishandsome over 4 years ago

New Features

  • Support working with TiDB-3.1 and TiDB-4.0
  • Support working with TiFlash
  • Support directly writing data to TiKV
tispark - TiSpark 2.1.9

Published by marsishandsome over 4 years ago

Fixes

  • Fix desc temp view #1328
  • Fix prefix index on blob #1334
  • Shade io.opencensus to resolve grpc conflict #1352
  • Fix partition table not being shown in show command #1374
  • Fix partition pruning when partition definition contains big integer #1385
  • Support TiDB-4.0 #1398
tispark - TiSpark 2.1.8

Published by marsishandsome almost 5 years ago

Fixes

  • Fix UnsupportedOperationException: using stream rather than removeIf #1303
tispark - TiSpark 2.1.7

Published by marsishandsome almost 5 years ago

Fixes

  • Add task retry if tikv is down #1207
  • Fix output offsets: add field type to Constant and ColumnRef when encoding proto #1231
  • Register udf ti_version for every sparksession #1258
  • Add timezone check #1275
  • Disable set, enum and bit pushed down #1242
tispark - TiSpark 2.1.6

Published by marsishandsome almost 5 years ago

Fixes

  • Fix TopN push down bug #1185
  • Consider nulls order in TopN pushdown #1187
  • Fix Stack Overflow Error when reading from partition table #1179
  • Fix parsing view table's json bug #1174
  • Fix No Matching column bug #1162
  • Fix behavior of estimateTableSize #845
  • Fix Bit Type default value bug #1148
  • Fix fastxml security alert #1127
  • Fix bug: TiSpark Catalog has 10-20s delay #1108
  • Fix reading data from TiDB in Spark Structured Streaming #1104
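
The two TopN fixes above concern pushing a sort-plus-limit down to the storage layer. A minimal sketch of why null ordering matters for that pushdown follows; it is a toy illustration, not TiSpark's implementation, and `top_n` and `can_push_down` are hypothetical names.

```python
def top_n(values, n, ascending=True, nulls_first=True):
    """Return the first n values under the given sort order and null placement."""
    non_null = sorted((v for v in values if v is not None), reverse=not ascending)
    nulls = [v for v in values if v is None]
    ordered = nulls + non_null if nulls_first else non_null + nulls
    return ordered[:n]


def can_push_down(requested_nulls_first, storage_nulls_first):
    """Assumed rule for this sketch: push TopN down only when the storage
    layer places nulls where the query asks for them."""
    return requested_nulls_first == storage_nulls_first


values = [3, None, 1, 2]
with_nulls_first = top_n(values, 2, nulls_first=True)   # [None, 1]
with_nulls_last = top_n(values, 2, nulls_first=False)   # [1, 2]
```

The same rows and the same limit give different results depending on where nulls sort, so a pushed-down TopN that ignores the requested null order can return the wrong rows.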
tispark - TiSpark 2.1.5

Published by marsishandsome about 5 years ago

Fixes

  • Remove useless scala and jackson dependencies #1079
  • Fix range partition throwing UnsupportedSyntaxException #1088
  • Make TiSpark able to read data from a hash partition table #1089
tispark - TiSpark 2.2.0

Published by marsishandsome about 5 years ago

New Features

  • Natively support writing data to TiKV (ACID) using Spark Data Source API

WARNING

DO NOT set spark.tispark.write.without_lock_table to true in a production environment (you may lose data).
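
A minimal sketch of what a write through the Spark Data Source API might look like from PySpark. The option names (`tidb.addr`, `tidb.port`, `tidb.user`, `tidb.password`, `spark.tispark.pd.addresses`, `database`, `table`) and the `tidb` format name are assumptions based on the TiSpark data source user guide and should be checked against the guide for your version. The addresses, credentials, and table names are placeholders, and `build_tidb_write_options` and `write_to_tidb` are hypothetical helpers.

```python
def build_tidb_write_options(database, table, tidb_addr="127.0.0.1",
                             tidb_port=4000, user="root", password="",
                             pd_addresses="127.0.0.1:2379"):
    """Collect connection and target options for a TiSpark write.

    All defaults are placeholders for a local test cluster.
    """
    return {
        "tidb.addr": tidb_addr,
        "tidb.port": str(tidb_port),
        "tidb.user": user,
        "tidb.password": password,
        "spark.tispark.pd.addresses": pd_addresses,
        "database": database,
        "table": table,
    }


def write_to_tidb(df, options):
    """Append a Spark DataFrame to the target table via the assumed "tidb" source."""
    df.write.format("tidb").options(**options).mode("append").save()
```

With a running cluster and a DataFrame `df`, the call would be `write_to_tidb(df, build_tidb_write_options("my_db", "my_table"))`. The sketch deliberately leaves spark.tispark.write.without_lock_table at its default, in line with the warning above.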

Improvements

  • Release one TiSpark jar (supporting both Spark-2.3.x and Spark-2.4.x) instead of two #933
  • Add spark version to TiSpark's udf ti_version #943
  • Bump grpc to 1.17 #982
  • Add retry mechanism for batchGet #986

Fixes

  • Catch UnsupportedSyntaxException when generating partition expressions #960
  • Fix TiSpark cannot read from a hash partition table #966
  • Prohibit extra index data type pushdown when doing index scan to avoid decoding extra column #995
  • Prohibit agg or groupby pushdown on double read #1004
tispark - TiSpark 2.1.4

Published by marsishandsome about 5 years ago

Fixes

  • Fix distinct without alias bug: disable pushdown aggregate with alias #1055
  • Fix reflection bug: pass in different arguments for a different version of same function #1037
tispark - TiSpark 2.1.3

Published by marsishandsome about 5 years ago

Fixes

  • Fix cost model in table scan #1023
  • Fix index scan bug #1024
  • Prohibit aggregate or group by pushdown on double read #1027
  • Fix reflection bug for HDP release #1017
  • Fix scala compiler version #1019
tispark - TiSpark 2.1.2

Published by marsishandsome about 5 years ago

Fixes

  • Fix improper response with region error #922
  • Fix view parsing problem #953
tispark - TiSpark 1.2.1

Published by marsishandsome over 5 years ago

TiSpark 1.2.1 is released!

TiSpark 1.2.1 is a bug fix release. The most important bug fixed in this release is https://github.com/pingcap/tispark/pull/899.

We recommend that all TiSpark 1.2 users upgrade to TiSpark 1.2.1.

Fixes

  • fix count error, if advanceNextResponse is empty, we should read next region (#899)
  • use fixed version of proto (#898)
tispark - TiSpark 2.1.1

Published by marsishandsome over 5 years ago

TiSpark 2.1.1 is released!

TiSpark 2.1.1 is a bug fix release. The most important bug fixed in this release is https://github.com/pingcap/tispark/pull/882.

We recommend that all TiSpark 2.1 users upgrade to TiSpark 2.1.1.

Fixes

  • Add TiDB/TiKV/PD version and Spark version supported for each latest major release (#804) (#887)
  • Fix incorrect timestamp of tidbMapDatabase (#862) (#885)
  • Fix column size estimation (#858) (#884)
  • fix count error, if advanceNextResponse is empty, we should read next region (#878) (#882)
  • use fixed version of proto instead of master branch (#843) (#850)
tispark - TiSpark 2.1

Published by birdstorm over 5 years ago

TiSpark 2.1 is released!

TiSpark 2.1 contains multiple fixes and refinements. It also provides support for Spark 2.3/2.4.

Features

  • Support range partition pruning (Beta) (#599)
  • Support show columns command (#614)
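
The idea behind range partition pruning can be sketched as follows: given the partition upper bounds and a predicate range, only partitions that can hold matching values need to be scanned. This is a toy illustration under simplifying assumptions (an integer partition column, sorted exclusive upper bounds as in VALUES LESS THAN), not TiSpark's implementation; `prune_range_partitions` is a hypothetical name.

```python
from bisect import bisect_right


def prune_range_partitions(upper_bounds, lo, hi):
    """Return indexes of partitions that may contain values in [lo, hi].

    upper_bounds[i] is the exclusive upper bound of partition i and the
    list must be sorted ascending. Partition i covers
    [upper_bounds[i - 1], upper_bounds[i]).
    """
    if lo > hi:
        return []
    # Number of bounds <= v is the index of the partition holding v.
    first = bisect_right(upper_bounds, lo)
    last = min(bisect_right(upper_bounds, hi), len(upper_bounds) - 1)
    return list(range(first, last + 1))


# Partitions: p0 < 10, p1 < 20, p2 < 30
bounds = [10, 20, 30]
to_scan = prune_range_partitions(bounds, 12, 18)
```

Here a predicate such as `12 <= col AND col <= 18` only needs partition index 1; a range that starts at or above the last bound matches no partition at all.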

Fixes

  • Fix build key ranges with xor expression (#576)
  • Fix cannot initialize pd if using ipv6 address (#587)
  • Fix default value bug (#596)
  • Fix possible IndexOutOfBoundException in KeyUtils (#597)
  • Fix outputOffset is incorrect when building DAGRequest (#615)
  • Fix incorrect implementation of Key.next() (#648)
  • Fix partition parser can't parse numerical value 0 (#651)
  • Fix prefix length may be larger than the value used. (#668)
  • Fix retry logic when scan meets a lock (#666)
  • Fix inconsistent timestamp (#676)
  • Fix tempView may be unresolved when applying timestamp to plan (#690)
  • Fix concurrent DAGRequest issue (#714)
  • Fix downgrade scan logic (#725)
  • Fix integer type default value should be parsed to long (#741)
  • Fix index scan on partition table (#735)
  • Fix KeyNotInRegion may occur when retrieving rows by handle (#755)
  • Fix encode value long max (#761)
  • Fix MatchErrorException that may occur when group by columns contain Unsigned BigInt (#780)
  • Fix IndexOutOfBoundException when trying to get pd member (#788)
tispark - TiSpark 2.0

Published by birdstorm over 5 years ago

TiSpark 2.0 is released!

TiSpark works with Spark 2.3 now and uses multiple new features such as Spark Extensions. The new document can be found at https://github.com/pingcap/tispark/blob/master/docs/userguide.md#demo .

Features

  • Work with Spark 2.3
  • Support use $database statement
  • Support show databases statement
  • Support show tables statement
  • No need to use TiContext.mapTiDBDatabase, use $database.$table to identify a table instead
  • Support data type SET and ENUM
  • Support data type YEAR
  • Support data type TIME
  • Support isolation level settings
  • Support describe table command
  • Support cache tables and uncache tables
  • Support reading from a TiDB partition table
  • Support using TiDB as metastore

Fixes

  • Fix JSON parsing (#491)
  • Fix count on empty table (#498)
  • Fix ScanIterator unable to read from adjacent empty regions (#519)
  • Fix possible NullPointerException when setting show_row_id true (#522)

Improved

  • Make ti version usable without selecting database (#545)
tispark - TiSpark 1.2

Published by birdstorm over 5 years ago

Fixes

  • Fixes compatibility with PDServer #480
tispark - TiSpark 1.1

Published by birdstorm about 6 years ago

Fixes multiple bugs:

  • Fix daylight saving time (DST) (#347)
  • Fix count(1) result is always 0 if subquery contains limit (#346)
  • Fix incorrect totalRowCount calculation (#353)
  • Fix request failing with Key not in region after retrying NotLeaderError (#354)
  • Fix ScanIterator logic where index may be out of bound (#357)
  • Fix tispark-sql dbName (#379)
  • Fix StoreNotMatch (#396)
  • Fix utf8 prefix index (#400)
  • Fix decimal decoding (#401)
  • Refactor not leader logic (#412)
  • Fix global temp view not visible in thriftserver (#437)

Adds:

  • Allow TiSpark to retrieve row id (#367)
  • Decode json to string (#417)

Improvements:

  • Improve PD connection issue's error log (#388)
  • Add DB prefix option for TiDB tables (#416)
tispark - TiSpark 1.0.1

Published by birdstorm about 6 years ago

  • Fix unsigned index
  • Compatible with TiDB before and since 48a42f
tispark - TiSpark 1.0 GA

Published by ilovesoup over 6 years ago

TiSpark provides distributed computing of TiDB data using Apache Spark.

  • Provide a gRPC communication framework to read data from TiKV
  • Provide encoding and decoding of TiKV component data and communication protocol
  • Provide calculation pushdown, which includes:
    • Aggregate pushdown
    • Predicate pushdown
    • TopN pushdown
    • Limit pushdown
  • Provide index related support
    • Transform predicate into Region key range or secondary index
    • Optimize Index Only queries
    • Adaptive downgrade index scan to table scan per region
  • Provide cost-based optimization
    • Support statistics
    • Select index
    • Estimate broadcast table cost
  • Provide support for multiple Spark interfaces
    • Support Spark Shell
    • Support ThriftServer/JDBC
    • Support Spark-SQL interaction
    • Support PySpark Shell
    • Support SparkR