Apache Pinot - A realtime distributed OLAP datastore
APACHE-2.0 License
This release comes with several features, SQL/UI/Perf enhancements, and bug fixes across areas ranging from the Multistage Query Engine to Ingestion, Storage format, SQL support, etc.
is_colocated_by_join_keys query option is reintroduced to ensure dynamic broadcast, which can also benefit from direct exchange optimization
is_colocated_by_join_keys hint is now required for making colocated joins
ArrayAgg aggregation function: supported types are BOOLEAN, INT, LONG, FLOAT (only in V1), DOUBLE, STRING, TIMESTAMP; in the multi-stage (V2) engine the FLOAT type is not supported. ArrayAgg(intCol, 'INT') returns ARRAY<INT>
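For instance, a minimal sketch (table and column names are hypothetical):
-- collect each store's order quantities into an ARRAY<INT>
SELECT storeId, ArrayAgg(quantity, 'INT') AS quantities
FROM orders
GROUP BY storeId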
Canonicalize SqlKind.OTHERS and SqlKind.OTHER_FUNCTIONS, and support concat as the || operator (#12025)
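For example, a sketch with hypothetical columns:
SELECT firstName || ' ' || lastName AS fullName
FROM myTable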
Capability for constant filter in QueryContext, with support for the server to handle it (#11956)
Tests for filter pushdown (#11994)
Enhancements to query plan tests (#11966)
Refactor PlanFragmenter to make the logic clear (#11912)
Observability enhancements to emit metrics for grpc request and multi-stage leaf stage (#11838)
pinot.server.query.log.maxRatePerSecond: query log max rate (QPS, default 10K)
pinot.server.query.log.droppedReportMaxRatePerSecond: dropped query log report max rate (QPS, default 1)
Security enhancement to add RBAC authorization checks for the multi-stage query engine (#11830)
Enhancement to leaf-stage execution stats NPE handling (#11805)
Enhancement to add a framework to back-propagate metadata across opChains (#11746)
Use of BinaryArray to wire proto for multi-stage engine bytes literal handling (#11738)
Enable dynamic broadcast for SEMI joins. Adds a fallback option to enable hash table join using joinOptions(join_strategy = 'hash_table') (#11696)
Improvements to dispatch exception handling (#11688)
Allow malformed dateTime string to return default value configurable in the function signature (#11258)
fromDateTime(colContainsMalformedStr, '<dateTimeFormat>', '<timezone>', <default_value>)
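A sketch of the signature in use (column, format and default value are illustrative):
-- rows where tsStr is malformed return the default value 0 instead of failing
SELECT fromDateTime(tsStr, 'yyyy-MM-dd''T''HH:mm:ss', 'UTC', 0) AS tsMillis
FROM myTable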
Improvement in multi-stage aggregation to directly store column index as identifier (#11617)
Perf optimization to avoid unnecessary rows conversion in aggregation (#11607)
Enhance SegmentPartitionMetadataManager to handle new segment (#11585)
Optimize mailbox info in query plan to reduce memory footprint (#12382)
Optimizations to query plan serialization (#12370)
Optimization for parallel execution of Ser/de stage plan (#12363)
Optimizations in query dispatch (#12358)
Perf optimization for group-by and join for single key scenario (#11630)
Set pinot.server.consumption.rate.limit to enable this feature
MergeRollupTask and RealtimeToOfflineSegmentsTask minion tasks support segmentMapperFileSizeThresholdInBytes to specify the threshold size:
"task": {
"taskTypeConfigsMap": {
"<task_name>": {
"segmentMapperFileSizeThresholdInBytes": "1000000000"
}
}
}
Adds support for deterministic and sticky routing for a query / table / broker. This setting would lead to the same server / set of servers (for MultiStageReplicaGroupSelector) being used for all queries of a given table.
Query option (takes precedence over fixed routing setting at table / broker config level)
SET "useFixedReplica"=true;
Table config (takes precedence over fixed routing setting at broker config level)
"routing": {
...
"useFixedReplica": true
}
Broker conf - pinot.broker.use.fixed.replica=true
Set dimensionTableConfig.errorOnDuplicatePrimaryKey=true to enable this behavior
forceCommit API accepts a comma separated list of partition names or consuming segment names
pinot.broker.instance.tags
Pulsar OAuth2 authentication configs in streamConfigs:
"stream.pulsar.issuerUrl": "https://auth.streamnative.cloud"
"stream.pulsar.credsFilePath": "file:///path/to/private_creds_file"
"stream.pulsar.audience": "urn:sn:pulsar:test:test-cluster"
lowDiskMode: default value is false.
SELECT ProductId, UserId, l2_distance(embedding, ARRAY[-0.0013143676,-0.011042999,...]) AS l2_dist, n_tokens, combined
FROM fineFoodReviews
WHERE VECTOR_SIMILARITY(embedding, ARRAY[-0.0013143676,-0.011042999,...], 5)
ORDER BY l2_dist ASC
LIMIT 10
VECTOR_SIMILARITY returns a double value; the first parameter is the embedding column and the second parameter is the search term embedding literal. VECTOR_SIMILARITY is a predicate: once topK is configured, the predicate returns topK rows per segment. If you use this index together with other predicates, you may not get the expected number of rows, since the records matching the other predicates might not be in the topK rows.
deletedKeysTTL removes deleted keys from the in-memory hashmap and marks the validDocID as invalid after the deletedKeysTTL threshold period, when deletedKeysTTL is set.
fieldConfigList: [
{
"name": "columnName",
"indexType": "TEXT",
"indexTypes": [
"TEXT"
],
"properties": {
"luceneAnalyzerClass": "org.apache.lucene.analysis.core.KeywordAnalyzer"
},
}
]
The text index uses the Lucene standardAnalyzer unless the luceneAnalyzerClass property is specified.
Murmur3 support with optional fields seed and variant for the hash in the functionConfig field of columnPartitionMap. Default value for seed is 0.
Added support for 2 variants of Murmur3: x86_32 and x64_32, configurable using the variant field in functionConfig. If no variant is provided, the x86_32 variant is kept, as it was part of the original implementation.
Examples of functionConfig:
"tableIndexConfig": {
..
"segmentPartitionConfig": {
"columnPartitionMap": {
"memberId": {
"functionName": "Murmur3",
"numPartitions": 3
},
..
}
}
Here there is no functionConfig configured, so the seed value will be 0 and the variant will be x86_32.
"tableIndexConfig": {
..
"segmentPartitionConfig": {
"columnPartitionMap": {
"memberId": {
"functionName": "Murmur3",
"numPartitions": 3,
"functionConfig": {
"seed": "9001"
},
},
..
}
}
Here the seed is configured as 9001, but as no variant is provided, x86_32 will be picked up.
"tableIndexConfig": {
..
"segmentPartitionConfig": {
"columnPartitionMap": {
"memberId": {
"functionName": "Murmur3",
"numPartitions": 3,
"functionConfig": {
"seed": "9001",
"variant": "x64_32"
},
},
..
}
}
Here the variant is mentioned, so Murmur3 will use the x64_32 variant with 9001 as the seed.
Note for users using Debezium with Murmur3 as the partitioning function on byte[], String or long[] columns: variant should be set to x64_32 and seed should be set to 9001.
Adds a new MV dictionary encoded forward index format that only stores the unique MV entries. This new index format can significantly reduce the index size when the MV entries repeat a lot. The new index format can be enabled during index creation, derived column creation, and segment reload.
To enable the new index format, set the compression codec in the FieldConfig:
{
"name": "myCol",
"encodingType": "DICTIONARY",
"compressionCodec": "MV_ENTRY_DICT"
}
Or use the new index JSON:
{
"name": "myCol",
"encodingType": "DICTIONARY",
"indexes": {
"forward": {
"dictIdCompressionType": "MV_ENTRY_DICT"
}
}
}
enableColumnBasedNullHandling defaults to false. When set to true, Pinot ignores TableConfig.IndexingConfig.nullHandlingEnabled, and columns are nullable if and only if FieldSpec.notNull is false, which is also the default value.
{
"schemaName": "blablabla",
"dimensionFieldSpecs": [
{
"dataType": "INT",
"name": "nullableField",
"notNull": false
},
{
"dataType": "INT",
"name": "notNullableField",
"notNull": true
},
{
"dataType": "INT",
"name": "defaultNullableField"
},
...
],
"enableColumnBasedNullHandling": true/false
}
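With column-based null handling enabled, nullable columns can then be filtered as usual, e.g. (sketch):
SELECT COUNT(*)
FROM myTable
WHERE nullableField IS NOT NULL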
outOfOrderRecordColumn: tracks whether each record is out-of-order (OOO) or not, and accordingly updates the corresponding column value to true / false.
skipUpsert
Star-tree aggregationConfigs accept a compressionCodec, which is useful when a functionColumnPairs entry has an output type of bytes, such as when you use distinctcountrawhll:
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"a",
"b",
"c"
],
"skipStarNodeCreationForDimensions": [],
"functionColumnPairs": [],
"aggregationConfigs": [
{
"columnName": "column1",
"aggregationFunction": "SUM",
"compressionCodec": "SNAPPY"
},
{
"columnName": "column2",
"aggregationFunction": "distinctcounthll",
"compressionCodec": "LZ4"
}
],
"maxLeafRecords": 10000
}
]
"instanceAssignmentConfigMap": {
"CONSUMING": {
"partitionSelector": "MIRROR_SERVER_SET_PARTITION_SELECTOR",
"replicaGroupPartitionConfig": { ... },
"tagPoolConfig": {
...
"tag": "mt1_REALTIME"
}
...
},
"COMPLETED": {
"partitionSelector": "MIRROR_SERVER_SET_PARTITION_SELECTOR",
"replicaGroupPartitionConfig": { ... },
"tagPoolConfig": {
...
"tag": "mt1_OFFLINE"
}
...
},
"instancePartitionsMap": {
"CONSUMING": "mt1_CONSUMING",
"COMPLETED": "mt1_OFFLINE"
},
Adds dimension as a valid option to table "type" in the /tables controller API
dropOutOfOrderRecord
Avoid skipUpsert for partial-upsert tables, as nulls start showing up for columns where a previous non-null was encountered and we don't know if it's an out-of-order event or not.
RebalanceChecker periodic task:
controller.rebalance.checker.frequencyPeriod: 5min by default; -1 to disable
controller.rebalanceChecker.initialDelayInSeconds: 2min+ by default
RebalanceConfig:
heartbeatIntervalInMs: 300_000 i.e. 5min
heartbeatTimeoutInMs: 3600_000 i.e. 1hr
maxAttempts: 3 by default, i.e. the original run plus two retries
retryInitialDelayInMs: 300_000 i.e. 5min, for exponential backoff with jitter
DELETE /tables/{tableName}/rebalance API to stop a rebalance. In comparison, POST /tables/{tableName}/rebalance was used to start one.
Support for UltraLogLog (#11835) aggregation functions (distinctCountULL and distinctCountRawULL)
Broadly there are two configs that will enable this feature:
Configs are available as queryOption, tableConfig and Broker config. The overriding order of priority is:
1. QueryOption -> maxServerResponseSizeBytes
2. QueryOption -> maxQueryResponseSizeBytes
3. TableConfig -> maxServerResponseSizeBytes
4. TableConfig -> maxQueryResponseSizeBytes
5. BrokerConfig -> pinot.broker.max.server.response.size.bytes
6. BrokerConfig -> pinot.broker.max.query.response.size.bytes
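For example, at query level, mirroring the SET syntax used for other query options (the value is illustrative):
SET "maxQueryResponseSizeBytes" = 104857600;
SELECT col1, COUNT(*) FROM myTable GROUP BY col1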
maxValueLength in jsonIndexConfig to restrict the length of values
VarByteChunkForwardIndexWriterV4
arrayIndexOfInt(int[] value, int valToFind)
arrayIndexOfLong(long[] value, long valToFind)
arrayIndexOfFloat(float[] value, float valToFind)
arrayIndexOfDouble(double[] value, double valToFind)
arrayIndexOfString(String[] value, String valToFind)
intersectIndices(int[] values1, int[] values2)
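A minimal usage sketch of the array position functions (hypothetical MV column):
-- position of the first occurrence of 7 in a multi-value INT column
SELECT arrayIndexOfInt(intMvCol, 7) AS firstIdx
FROM myTable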
FrequentStringsSketch and FrequentLongsSketch aggregation functions (#11098):
FREQUENTLONGSSKETCH(col, maxMapSize=256) -> Base64 encoded sketch object
FREQUENTSTRINGSSKETCH(col, maxMapSize=256) -> Base64 encoded sketch object
Table index API to get the aggregate index details of all segments for a table.
URL: /tables/{tableName}/indexes
Response format:
{
"totalSegments": 31,
"columnToIndexesCount":
{
"col1":
{
"dictionary": 31,
"bloom": 0,
"null": 0,
"forward": 31,
...
"inverted": 0,
"some-dynamically-injected-index-type": 31,
},
"col2":
{
...
}
...
}
controller.resource.rebalance.delay_ms
export PINOT_CONTROLLER_HOST=host
export PINOT_SERVER_PROPERTY_WHATEVER=whatever_property
export ANOTHER_VARIABLE=random
DISTINCTCOUNTHLLPLUS(some_id, 12)
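Used as a regular aggregation, e.g. (a sketch; the second argument tunes the HLL++ sketch precision):
SELECT DISTINCTCOUNTHLLPLUS(some_id, 12) AS approxDistinctIds
FROM myTable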
clpMatch is rewritten into a boolean expression on the columns of a CLP-encoded field; add org.apache.pinot.sql.parsers.rewriter.ClpRewriter to pinot.broker.query.rewriter.class.names.
DATETIMECONVERTWINDOWHOP function (#11773)
JSON_EXTRACT_INDEX transform function to leverage the json index for json value extraction (#11739); see the sketch after this list
GenerateData command support for generating data in JSON format (#11778)
toUUIDBytes and fromUUIDBytes (#11988)
DateTimeGenerator for DATE_TIME field type columns (#12206)
SegmentGenerationAndPushTask to push segments to a realtime table (#12084)
skipControllerCertValidation to skip controller cert validation in AddTableCommand (#11967)
jsonKey (#11890)
jsonKey is the GCP credential key in string format (either a plain string or a base64 encoded string). Refer to Creating and managing service account keys to download the keys.
columnMajorSegmentBuilderEnabled
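A sketch of JSON_EXTRACT_INDEX usage, as referenced in the list above (column and path are hypothetical; assumes a json index on the column):
SELECT jsonExtractIndex(jsonCol, '$.user.id', 'STRING') AS userId
FROM myTable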
pinot.server.query.log.maxRatePerSecond: query log max rate (QPS, default 10K)
pinot.server.query.log.droppedReportMaxRatePerSecond: dropped query log report max rate (QPS, default 1)
SlidingTimeWindowArrayReservoir in dropwizard metrics (#11695)
controller.realtime.segment.tmpFileAsyncDeletionEnabled (default false)
controller.realtime.segment.tmpFileRetentionInSeconds (default 3600)
minimizeDataMovement instance partition assignment strategy (#11953)
upsertPrimaryKeysCount metric reporting when a table is deleted (#12169)
CommonsConfigurationUtils (#12056)
ServerRoutingStatsManagerTest.testQuerySubmitAndCompletionStats (#12029)
partitionUpsertMetadataManager (#11964)
_segmentAssignmentStrategy in favor of SegmentsValidationAndRetentionConfig #11869
useMultistageEngine property reference in JsonAsyncHttpPinotClientTransportFactory (#11820)
isOptional instead of the deprecated hasOptional keyword (#11682)
This PR introduces a backward incompatibility for UpsertCompactionTask. Previously, we allowed configuring the compaction task without the snapshot enabled. We found that using in-memory based validDocIds is a bit dangerous, as it will not give us consistency (e.g. fetching the validDocIds bitmap while the server is restarting and updating validDocIds).
We now enforce enableSnapshot=true for UpsertCompactionTask unless the advanced customer wants to run the compaction with the in-memory validDocId bitmap.
{
"upsertConfig": {
"mode": "FULL",
"enableSnapshot": true
}
}
...
"task": {
"taskTypeConfigsMap": {
"UpsertCompactionTask": {
"schedule": "0 */5 * ? * *",
"bufferTimePeriod": "7d",
"invalidRecordsThresholdPercent": "30",
"invalidRecordsThresholdCount": "100000",
"invalidDocIdsType": "SNAPSHOT/IN_MEMORY/IN_MEMORY_WITH_DELETE"
}
}
}
Also, advanced users can configure invalidDocIdsType for UpsertCompactionTask:
snapshot: Default validDocIds type. This indicates that the validDocIds bitmap is loaded from the snapshot of the Pinot segment. UpsertConfig's enableSnapshot must be enabled for this type.
onHeap: the validDocIds bitmap will be fetched from the server.
onHeapWithDelete: the validDocIds bitmap will be fetched from the server. This also takes into account the deleted documents. UpsertConfig's deleteRecordColumn must be provided for this type.
allow.table.name.with.database (#12402)
TableDataManagerConfig (#12189)
CaseTransform function (#11721)
Migrate PinotConfiguration to commons-configuration2 (#11916)
Published by vvivekiyer 8 months ago
Published by saurabhd336 about 1 year ago
Support for Window Functions
Initial (phase 1) Query runtime for window functions with ORDER BY within the OVER() clause (#10449)
Support for the ranking ROW_NUMBER() window function (#10527, #10587)
Set Operations Support
Support SetOperations (UNION, INTERSECT, MINUS) compilation in query planner (#10535)
Timestamp and Date Operations
Support TIMESTAMP type and date ops functions (#11350)
Aggregate Functions
Support more aggregation functions that are currently implementable (#11208)
Support multi-value aggregation functions (#11216)
Support Sketch based functions (#)
Make Intermediate Stage Worker Assignment Tenant Aware (#10617)
Evaluate literal expressions during query parsing, enabling more efficient query execution. (#11438)
Added support for partition parallelism in partitioned table scans, allowing for more efficient data retrieval (#11266).
[multistage] Adding more tuple sketch scalar functions and integration tests (#11517)
Turn on v2 engine by default (#10543)
Introduced the ability to stream leaf stage blocks for more efficient data processing (#11472).
Early terminate SortOperator if there is a limit (#11334)
Implement ordering for SortExchange (#10408)
Table level Access Validation, QPS Quota, Phase Metrics for multistage queries (#10534)
Support partition based leaf stage processing (#11234)
Populate queryOption down to leaf (#10626)
Pushdown explain plan queries from the controller to the broker (#10505)
Enhanced the multi-stage group-by executor to support limiting the number of groups, improving query performance and resource utilization (#11424).
Improved resilience and reliability of the multi-stage join operator, now with added support for hash join right table protection (#11401).
Fix Predicate Pushdown by Using Rule Collection (#10409)
Try fixing mailbox cancel race condition (#10432)
Catch Throwable to Propagate Proper Error Message (#10438)
Fix tenant detection issues (#10546)
Handle Integer.MIN_VALUE in hashCode based FieldSelectionKeySelector (#10596)
Improve error message in case of non-existent table queried from the controller (#10599)
Derive SUM return type to be PostgreSQL compatible (#11151)
Add the ability to include new index types at runtime in Apache Pinot. This opens the ability of adding third party indexes, including proprietary indexes. More details here
NULL support for ORDER BY, DISTINCT, GROUP BY, value transform functions and filtering.
Support added to extend upserts and allow deleting records from a realtime table. The design details can be found here.
Adds a feature to preload segments from a table that uses the upsert snapshot feature. The segments with validDocIds snapshots can be preloaded in a more efficient manner to speed up the table loading (thus server restarts).
Adds support for specifying expiry TTL for upsert primary key metadata cleanup.
Adds a new minion task to compact segments belonging to a real-time table with upserts.
Adds new implementations of PinotDataBuffer that uses Unsafe java APIs and foreign memory APIs. Also added support for PinotDataBufferFactory to allow plugging in custom PinotDataBuffer implementations.
Allows overriding index configs at tier level, allowing for more flexible index configurations for different tiers.
Added new configuration options below which allow use of a bounded thread pool and allocate capacities for it.
pinot.broker.enable.bounded.http.async.executor
pinot.broker.http.async.executor.max.pool.size
pinot.broker.http.async.executor.core.pool.size
pinot.broker.http.async.executor.queue.size
This feature allows better management of broker resources.
Adds a parameter to queryOptions to drop the resultTable from the response. This mode can be used to troubleshoot a query (which may have sensitive data in the result) using metadata only.
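A sketch of how this can look, assuming the query option is named dropResults (check the docs for the exact option name):
SET "dropResults" = true;
SELECT sensitiveCol FROM myTable LIMIT 10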
In segment metadata and index map, store columns in alphabetical order so that the result is deterministic. Segments generated before/after this PR will have different CRC, so during the upgrade, we might get segments with different CRC from old and new consuming servers. For the segment consumed during the upgrade, some downloads might be needed.
Adds options to configure Helix timeouts:
external.view.dropped.max.wait.ms - The duration of time in milliseconds to wait for the external view to be dropped. Default - 20 minutes.
external.view.check.interval.ms - The period in milliseconds in which to ping ZK for the latest EV state.
This PR makes Pinot case insensitive by default, and removes the deprecated property enable.case.insensitive.pql
Fixed the substring query function definition
Support DateTimeGranularitySpec without explicitly setting size (#11057)
Published by walterddr over 1 year ago
Published by xiangfu0 over 1 year ago
AND filter predicate evaluation efficiency by @jasperjiaguo in https://github.com/apache/pinot/pull/9336
order by sorted ASC, unsorted and order by DESC cases by @gortiz in https://github.com/apache/pinot/pull/8979
usageHelp instead of deprecated help in picocli commands by @navina in https://github.com/apache/pinot/pull/9608
New join semantics support
New sql semantics support:
Performance enhancement
Published by xiangfu0 about 2 years ago
Apache Pinot 0.11.0 has introduced many new features to extend the query abilities, e.g. the multi-stage query engine enables Pinot to do distributed joins; more SQL syntax (DML support), query functions, and indexes (text index, timestamp index) are supported for new use cases. And as always, more integrations with other systems (e.g. Spark3, Flink).
Note: there is a major upgrade for Apache Helix to 1.0.4, so please make sure you upgrade the system in the order of:
Helix Controller -> Pinot Controller -> Pinot Broker -> Pinot server
The new multi-stage query engine (a.k.a V2 query engine) is designed to support more complex SQL semantics such as JOIN, OVER window, MATCH_RECOGNIZE and eventually, make Pinot support closer to full ANSI SQL semantics.
More to read: https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine
Pinot operators can pause realtime consumption of events while queries are being executed, and then resume consumption when ready to do so again.
More to read: https://medium.com/apache-pinot-developer-blog/pause-stream-consumption-on-apache-pinot-772a971ef403
The gapfilling functions allow users to interpolate data and perform powerful aggregations and data processing over time series data.
More to read: https://www.startree.ai/blog/gapfill-function-for-time-series-datasets-in-pinot
Long-awaited feature: segment generation on Spark 3.x.
Similar to the Spark Pinot connector, this allows Flink users to dump data from the Flink application to Pinot.
This feature allows better fine-grained control on pinot queries.
This allows users to have better query performance on the timestamp column for lower granularity. See: https://docs.pinot.apache.org/basics/indexing/timestamp-index
Wanna search text in realtime? The new text indexing engine in Pinot supports the following capabilities:
select * FROM foo where text_col LIKE 'a%'
select * from foo where text_col CONTAINS 'bar'
Read more: https://medium.com/@atri.jiit/text-search-time-series-style-681af37ba42e
Now you can use INSERT INTO [database.]table FROM FILE dataDirURI OPTION ( k=v ) [, OPTION (k=v)]* to load data into Pinot from a file using Minion. See: https://docs.pinot.apache.org/basics/data-import/from-query-console
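For example (bucket path and option are illustrative):
INSERT INTO "baseballStats"
FROM FILE 's3://my-bucket/baseballStats/rawdata/'
OPTION(taskName=myTask-s3)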
This feature supports enabling deduplication for realtime tables, via a top-level table config. At a high level, primaryKey (as defined in the table schema) hashes are stored into in-memory data structures, and each incoming row is validated against it. Duplicate rows are dropped.
The expectation while using this feature is for the stream to be partitioned by the primary key, strictReplicaGroup routing to be enabled, and the configured stream consumer type to be low level. These requirements are therefore mandated via table config API's input validations.
Pinot has resolved all the high-level vulnerability issues:
Published by sajjad-moradi over 2 years ago
This release introduces some great new features, performance enhancements, UI improvements, and bug fixes, which are described in detail in the following sections.
The release was cut from this commit fd9c58a.
The dependency graph for plug-and-play architecture that was introduced in release 0.3.0 has been extended and now it contains new nodes for Pinot Segment SPI.
DataBlockCache lookups (#8140)
LZ4 as default compression mode (#7797)
incubator (#8023)
Published by xiangfu0 almost 3 years ago
This is a bug fixing release that contains:
Update Log4j to 2.17.0 to address CVE-2021-45105 (#7933)
The release is based on the release 0.9.2 with the following cherry-picks:
93c0404da6bcbf9bf4e165f1e2cbba069abcc872
Published by xiangfu0 almost 3 years ago
This is a bug fixing release that contains:
The release is based on the release 0.9.1 with the following cherry-picks:
9ed6498cdf9d32a65ebcbcce9158acab64a8c0d7
50e1613503cd74b26cf78873efcbdd6e8516bd8f
767aa8abfb5bf085ba0a7ae5ff4024679f27816e
Published by xiangfu0 almost 3 years ago
This release fixes the major issue of CVE-2021-44228 and contains a major bug fix for the pinot admin exit code issue (https://github.com/apache/pinot/pull/7798).
The release is based on the release 0.9.0 with the following cherry-picks:
e44d2e46f2eaba5f75d789d92ce767fbee96feba
af2858aff26e169f348605e61d0c5e21ddd73dd9
Published by xiangfu0 almost 3 years ago
This release introduces a new feature, Segment Merge and Rollup, to simplify users' day to day operational work. A new metrics plugin is added to support dropwizard. As usual, there are new functionalities and many UI/performance improvements.
The release was cut from the following commit: 13c9ee9 and the following cherry-picks: 668b5e0, ee887b9
LinkedIn operates a large multi-tenant cluster that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.
To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.
At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release.
Major Changes:
This release also sees improvements to Pinot’s query console UI.
There have also been improvements and additions to Pinot’s SQL implementation.
This release contains many performance improvements; you may notice them in your day to day queries. Thanks to all the great contributions listed below:
LZ4_WITH_LENGTH chunk compression type (#7655)
BYTES data type (#7595)
Published by snleee about 3 years ago
This release introduced several awesome new features, including compatibility tests, enhanced complex type and Json support, partial upsert support, and new stream ingestion plugins (AWS Kinesis, Apache Pulsar). It contains a lot of query enhancements such as new timestamp
and boolean
type support and flexible numerical column comparison. It also includes many key bug fixes. See details below.
The release was cut from the following commit: fe83e95aa9124ee59787c580846793ff7456eaa5
and the following cherry-picks:
timeColumnTransformFunction is removed (backward-incompatible, but rollup is not supported anyway)
collectorType is replaced with mergeType
roundBucketTimePeriod and partitionBucketTimePeriod to configure the time bucket for round and partition
MinionEventObserverFactory is changed from org.apache.pinot.*.event.* to org.apache.pinot.*.plugin.minion.tasks.* (#6980)
pinot-minion-builtin-tasks module and package them into a shaded jar (#6618)
pinot.server.instance.reload.consumingSegment will be true by default (#7078)
pinot-kafka to pinot-json package (#7021)
PUT /schemas/{schemaName} will be blocked (#6737)
/tables/validateTableAndSchema deprecated in favor of the new configs/validate API, and introduced new APIs for /tableConfigs to operate on the realtime table config, offline table config and schema in one shot (#6840)
Published by xiangfu0 over 3 years ago
This release introduced several awesome new features, including JSON index, lookup-based join support, geospatial support, TLS support for pinot connections, and various performance optimizations and improvements. It also adds several new APIs to better manage the segments and upload data to offline table. It also contains many key bug fixes. See details below.
The release was cut from the following commit: 78152cdb2892cf8c2df5b8a4d04e2aa897333487
and the following cherry-picks:
queriesDisabled to check if queries are disabled or not (#6586)
jsonExtractKey and jsonExtractScalar functions (#6246) (#6594)
isolation.level to Kafka consumer (2.0) to ingest transactionally committed messages only (#6580)
RANGE, =, <, <=, >, >=, AND, OR (#6259)
${VAR_NAME:DEFAULT_VALUE} in Pinot table configs (#6271)
Pinot controller metrics prefix is fixed to add a missing dot (#6499). This is a backward-incompatible change that JMX query on controller metrics must be updated
Legacy group key delimiter (\t) was removed to be backward-compatible with release 0.5.0 (#6589)
Upgrade zookeeper version to 3.5.8 to fix ZOOKEEPER-2184: Zookeeper Client should re-resolve hosts when connection attempts fail. (#6558)
Add TLS-support for client-pinot and pinot-internode connections (#6418)
Upgrades to a TLS-enabled cluster can be performed safely and without downtime. To achieve a live-upgrade, go through the following steps:
Set controller.vip.protocol and controller.vip.port and update the configuration files of any ingestion jobs. Restart components a final time and verify that insecure ingress via http is not available anymore.
PQL endpoint on Broker is deprecated (#6607)
Use the broker endpoint /query/sql and the controller endpoint /sql
Published by jackjlli almost 4 years ago
This release introduced some excellent new features, including upsert, tiered storage, pinot-spark-connector, support of having clause, more validations on table config and schema, support of ordinals in GROUP BY and ORDER BY clause, array transform functions, adding push job type of segment metadata only mode, and some new APIs like updating instance tags, new health check endpoint. It also contains many key bug fixes. See details below.
The release was cut from the following commit:
e5c9bec
and the following cherry-picks:
Published by chenboat about 4 years ago
This release includes many new features on Pinot ingestion and connectors (e.g., support for filtering during ingestion which is configurable in table config; support for json during ingestion; proto buf input format support and a new Pinot JDBC client), query capability (e.g., a new GROOVY transform function UDF) and admin functions (a revamped Cluster Manager UI & Query Console UI). It also contains many key bug fixes. See details below.
The release was cut from the following commit:
d1b4586
and the following cherry-picks:
DistinctCountAggregationFunction
DictionaryBasedAggregationOperator for DistinctCount
heap.
Published by haibow over 4 years ago
This release introduced various new features, including the theta-sketch based distinct count aggregation function, an S3 filesystem plugin, a unified star-tree index implementation, deprecation of TimeFieldSpec in favor of DateTimeFieldSpec, etc. Miscellaneous refactoring, performance improvement and bug fixes were also included in this release. See details below.
The release was cut from this commit:
https://github.com/apache/incubator-pinot/commit/008be2db874dd1c0d7877ce712842abd818d89d1
with cherry-picking the following patches:
\" inside the json) (#5194)
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
Published by xiangfu0 over 4 years ago
The reason behind the architectural change from the previous release (0.2.0) to this release (0.3.0) is the possibility of extending Apache Pinot. The 0.2.0 release was not flexible enough to support new storage types or new stream types. Basically, inserting new functionality required changing too much code. Thus, the Pinot team went through an extensive refactoring and improvement of the source code.
For instance, the picture below shows the module dependencies of the 0.2.X or previous releases. If we wanted to support a new storage type, we would have had to change several modules. Pretty bad, huh?
In order to conquer this challenge, below major changes are made:
Now the architecture supports a plug-and-play fashion, where new tools can be supported with little and simple extensions, without affecting big chunks of code. Integrations with new streaming services and data formats can be developed in a much more simple and convenient way.
Below is current supported Pinot Plugins module structure:
2020/03/09 23:37:19.879 ERROR [HelixTaskExecutor] [CallbackProcessor@b808af5-pinot] [pinot-broker] [] Message cannot be processed: 78816abe-5288-4f08-88c0-f8aa596114fe, {CREATE_TIMESTAMP=1583797034542, MSG_ID=78816abe-5288-4f08-88c0-f8aa596114fe, MSG_STATE=unprocessable, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=fooBar_OFFLINE, RESOURCE_NAME=brokerResource, RETRY_COUNT=0, SRC_CLUSTER=pinot, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_hostname.domain,com_9000, TGT_NAME=Broker_hostname,domain.com_6998, TGT_SESSION_ID=f6e19a457b80db5, TIMEOUT=-1, segmentName=fooBar_559, tableName=fooBar_OFFLINE}{}{}
java.lang.UnsupportedOperationException: Unsupported user defined message sub type: REFRESH_SEGMENT
at org.apache.pinot.broker.broker.helix.TimeboundaryRefreshMessageHandlerFactory.createHandler(TimeboundaryRefreshMessageHandlerFactory.java:68) ~[pinot-broker-0.2.1172.jar:0.3.0-SNAPSHOT-c9d88e47e02d799dc334d7dd1446a38d9ce161a3]
at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:1096) ~[helix-core-0.9.1.509.jar:0.9.1.509]
at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:866) [helix-core-0.9.1.509.jar:0.9.1.509]
Published by mcvsubbu almost 5 years ago
Added support for Kafka 2.0.
Table rebalancer now supports a minimum number of serving replicas during rebalance.
Added support for UDF in filter predicates and selection.
Added support to use hex string as the representation of byte array for queries (#4041)
Added support for parquet reader (#3852)
Introduced interface stability and audience annotations (#4063)
Refactor HelixBrokerStarter to separate constructor and start() (#4100) - backwards incompatible
Admin tool for listing segments with invalid intervals for offline tables
Migrated to log4j2 (#4139)
Added simple avro msg decoder
Added support for passing headers in pinot client
Support transform functions with AVG aggregation function (#4557)
Configurations additions/changes
We are in the process of separating Helix and Pinot controllers, so that
administrators can have the option of running independent Helix
controllers and Pinot controllers.
We are in the process of moving towards supporting SQL query format and results.
We are in the process of separating instance and segment assignment using instance
pools to optimize the number of Helix state transitions in Pinot clusters with
thousands of tables.
Task management does not work correctly in this release, due to bugs in Helix.
We will upgrade to Helix 0.9.2 (or later) version to get this fixed.
You must upgrade to this release before moving onto newer versions of Pinot
release. The protocol between pinot-broker and pinot-server has been changed
and this release has the code to retain compatibility moving forward.
Skipping this release may (depending on your environment) cause query errors
if brokers are upgraded and servers are in the process of being upgraded.
As always, we recommend that you upgrade controllers first, and then brokers
and lastly the servers in order to have zero downtime in production clusters.
PR #4100 introduces a backwards incompatible change to pinot broker. If you
use the Java constructor on HelixBrokerStarter class, then you will face a
compilation error with this version. You will need to construct the object
and call start() method in order to start the broker.
PR #4139 introduces a backwards incompatible change for log4j configuration.
If you used a custom log4j configuration (log4j.xml), you need to write a new
log4j2 configuration (log4j2.xml). In addition, you may need to change the
arguments on the command line to start pinot components.
If you used pinot-admin command to start pinot components, you don't need any
change. If you used your own commands to start pinot components, you will
need to pass the new log4j2 config as a jvm parameter (i.e. substitute
-Dlog4j.configuration or -Dlog4j.configurationFile argument with
-Dlog4j2.configurationFile=log4j2.xml).
Published by snleee over 5 years ago
This is the first official release of Apache Pinot.
sun.misc, #3625)
We have added a Pinot filesystem abstraction that provides users the option to plug in their own storage backend. Currently, we support HDFS, NFS, and Azure Data Lake.
We have decoupled Pinot from Kafka for realtime ingestion and added the abstraction for Streams. Users can now add their own plugins to read from any pub-sub systems.
Pinot now can support byte[]
type natively. Also, Pinot can accept byte serialized TDigest object (com.tdunning.math.stats.TDigest
) and this can be queried to compute approximate percentiles as follows.
select percentileTDigest95(tDigestColumn) from myTable where ... group by... top N
Since this is the first official Apache release, these notes apply to people who have used Pinot in production by building from the source code.
Renamed packages from com.linkedin to org.apache.
ControllerRestApi and SegmentNameGenerator.
For the PartitionAwareRouting feature, we have changed the format of partition values in the segment metadata and ZK segment metadata (e.g. partitionRanges: [0 0], [1 1] -> partitions: 0, 1). The new code is backward compatible with the old format; however, old code will not be able to understand the new format in case of rollback.
Thanks to everyone who has contributed to Pinot.
We would also like to express our gratitude to our Apache mentors @kishoreg, @felixcheung, @jimjag, @olamy and @rvs.