Apache Druid: a high performance real-time analytics database.
Apache-2.0 License
Druid 0.9.1.1 contains only one change since Druid 0.9.1, #3204, which addresses a bug with the Coordinator web console. The full list of changes for the Druid 0.9.1 line is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed
Query time lookup (QTL) functionality has been substantially reworked in this release. Most users will need to update their configurations and queries.
The druid-namespace-lookup extension is now deprecated, and will be removed in a future version of Druid. Users should migrate to the new druid-lookups-cached-global extension. Both extensions can be loaded simultaneously to simplify migration. For details about migrating, see Transitioning to lookups-cached-global in the documentation.
Aside from the QTL changes, please note the following changes:
/druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myISO8601Interval}
REST endpoint is now deprecated. The new /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?kill=true
REST endpoint can be used instead. The standard Druid update process described by http://druid.io/docs/0.9.1.1/operations/rolling-updates.html should be followed for rolling updates.
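As a sketch of the migration (the coordinator address, datasource, and interval below are illustrative placeholders, not values from this release):

```shell
# Old vs. new kill endpoints; host, datasource, and interval are placeholders.
COORDINATOR="http://localhost:8081"
DATASOURCE="wikipedia"

# Deprecated form: interval passed as a query parameter.
OLD_URL="$COORDINATOR/druid/coordinator/v1/datasources/$DATASOURCE?kill=true&interval=2016-06-01/2016-06-02"

# New form: the interval moves into the path, with '/' replaced by '_'.
NEW_URL="$COORDINATOR/druid/coordinator/v1/datasources/$DATASOURCE/intervals/2016-06-01_2016-06-02?kill=true"

# Issue the kill with, e.g.: curl -XDELETE "$NEW_URL"
echo "$NEW_URL"
```

The '_' interval separator in the path is an assumption here; check the coordinator documentation for your version before scripting against it.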
Druid 0.9.1 is the first version to include the experimental Kafka indexing service, utilizing a new Kafka-type indexing task and a supervisor that runs within the Druid overlord. The Kafka indexing service provides an exactly-once ingestion guarantee and does not have the restriction of events requiring timestamps which fall within a window period. More details about this feature are available in the documentation: http://druid.io/docs/0.9.1.1/development/extensions-core/kafka-ingestion.html.
Note: The Kafka indexing service uses the Java Kafka consumer that was introduced in Kafka 0.9. As there were protocol changes made in this version, Kafka 0.9 consumers are not compatible with older brokers and you will need to ensure that your Kafka brokers are version 0.9 or better. Details on upgrading to the latest version of Kafka can be found here: http://kafka.apache.org/documentation.html#upgrade
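A supervisor is created by POSTing a spec to the overlord. The following is only a skeletal sketch — the datasource name, topic, broker address, and field values are placeholders; see the kafka-ingestion documentation linked above for the full spec:

```shell
# Write a skeletal Kafka supervisor spec (all values are illustrative only).
cat > kafka-supervisor.json <<'EOF'
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "metrics-kafka",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": { "dimensions": [] }
      }
    },
    "metricsSpec": [ { "type": "count", "name": "count" } ],
    "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": "NONE" }
  },
  "tuningConfig": { "type": "kafka" },
  "ioConfig": {
    "topic": "metrics",
    "consumerProperties": { "bootstrap.servers": "kafkabroker.example:9092" },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
EOF
# Submit to the overlord (assumed here to be at localhost:8090):
# curl -XPOST -H 'Content-Type: application/json' -d @kafka-supervisor.json \
#   http://localhost:8090/druid/indexer/v1/supervisor
echo "wrote kafka-supervisor.json"
```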
#2656 Supervisor for KafkaIndexTask
#2602 implement special distinctcount
#2220 Appenderators, DataSource metadata, KafkaIndexTask
#2424 Enabling datasource level authorization in Druid
#2410 statsd-emitter
#1576 [QTL] Query time lookup cluster wide config
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AFeature
#2972 Improved Segment Distribution (new cost function)
#2931 Optimize filter for timeseries, search, and select queries
#2753 More consistent empty-set filtering behavior on multi-value columns
#2727 BoundFilter optimizations, and related interface changes.
#2711 All Filters should work with FilteredAggregators
#2690 Allow filters to use extraction functions
#2577 Implement native in filter
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AImprovement
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ABug
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ADocumentation
Thanks to everyone who contributed to this release!
@acslk
@b-slim
@binlijin
@bjozet
@dclim
@drcrallen
@du00cs
@erikdubbelboer
@fjy
@gaodayue
@gianm
@guobingkun
@harshjain2
@himanshug
@jaehc
@javasoze
@jisookim0513
@jon-wei
@JonStrabala
@kilida
@lizhanhui
@michaelschiff
@mrijke
@navis
@nishantmonu51
@pdeva
@pjain1
@rasahner
@sascha-coenen
@se7entyse7en
@shekhargulati
@sirpkt
@skilledmonster
@spektom
@xvrl
@yuppie-flu
Druid 0.9.1 contains hundreds of performance improvements, stability improvements, and bug fixes from over 30 contributors. Major new features include an experimental Kafka Supervisor to support exactly-once consumption from Apache Kafka, support for cluster-wide query-time lookups (QTL), and an improved segment balancing algorithm.
The full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed
Query time lookup (QTL) functionality has been substantially reworked in this release. Most users will need to update their configurations and queries.
The druid-namespace-lookup extension is now deprecated, and will be removed in a future version of Druid. Users should migrate to the new druid-lookups-cached-global extension. Both extensions can be loaded simultaneously to simplify migration. For details about migrating, see Transitioning to lookups-cached-global in the documentation.
Aside from the QTL changes, please note the following changes:
/druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myISO8601Interval}
REST endpoint is now deprecated. The new /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?kill=true
REST endpoint can be used instead. The standard Druid update process described by http://druid.io/docs/0.9.1/operations/rolling-updates.html should be followed for rolling updates.
Druid 0.9.1 is the first version to include the experimental Kafka indexing service, utilizing a new Kafka-type indexing task and a supervisor that runs within the Druid overlord. The Kafka indexing service provides an exactly-once ingestion guarantee and does not have the restriction of events requiring timestamps which fall within a window period. More details about this feature are available in the documentation: http://druid.io/docs/0.9.1/development/extensions-core/kafka-ingestion.html.
Note: The Kafka indexing service uses the Java Kafka consumer that was introduced in Kafka 0.9. As there were protocol changes made in this version, Kafka 0.9 consumers are not compatible with older brokers and you will need to ensure that your Kafka brokers are version 0.9 or better. Details on upgrading to the latest version of Kafka can be found here: http://kafka.apache.org/documentation.html#upgrade
#2656 Supervisor for KafkaIndexTask
#2602 implement special distinctcount
#2220 Appenderators, DataSource metadata, KafkaIndexTask
#2424 Enabling datasource level authorization in Druid
#2410 statsd-emitter
#1576 [QTL] Query time lookup cluster wide config
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AFeature
#2972 Improved Segment Distribution (new cost function)
#2931 Optimize filter for timeseries, search, and select queries
#2753 More consistent empty-set filtering behavior on multi-value columns
#2727 BoundFilter optimizations, and related interface changes.
#2711 All Filters should work with FilteredAggregators
#2690 Allow filters to use extraction functions
#2577 Implement native in filter
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AImprovement
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ABug
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ADocumentation
Thanks to everyone who contributed to this release!
@acslk
@b-slim
@binlijin
@bjozet
@dclim
@drcrallen
@du00cs
@erikdubbelboer
@fjy
@gaodayue
@gianm
@guobingkun
@harshjain2
@himanshug
@jaehc
@javasoze
@jisookim0513
@jon-wei
@JonStrabala
@kilida
@lizhanhui
@michaelschiff
@mrijke
@navis
@nishantmonu51
@pdeva
@pjain1
@rasahner
@sascha-coenen
@se7entyse7en
@shekhargulati
@sirpkt
@skilledmonster
@spektom
@xvrl
@yuppie-flu
Published by gianm over 8 years ago
Druid 0.9.0 introduces an update to the extension system that requires configuration changes. There were additionally over 400 pull requests from 0.8.3 to 0.9.0. Below we highlight the more important changes in this release.
Full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed
In Druid 0.9, we have refactored the extension loading mechanism. The main reason behind this change is to make Druid load extensions from the local file system without having to download dependencies from the internet at runtime.
To learn all about the new extension loading mechanism, see Include extensions and Include Hadoop Dependencies. If you are impatient, here is the summary.
The following properties have been deprecated:
druid.extensions.coordinates
druid.extensions.remoteRepositories
druid.extensions.localRepository
druid.extensions.defaultVersion
Instead, specify druid.extensions.loadList, druid.extensions.directory and druid.extensions.hadoopDependenciesDir.
druid.extensions.loadList specifies the list of extensions that will be loaded by Druid at runtime. An example would be druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"].
druid.extensions.directory specifies the directory where all the extensions live. An example would be druid.extensions.directory=/xxx/extensions.
Note that the mysql-metadata-storage extension is not packaged in the Druid distribution due to licensing issues. You will have to manually download it from druid.io, decompress it, and then put it in the specified extensions directory.
druid.extensions.hadoopDependenciesDir specifies the directory where all the Hadoop dependencies live. An example would be druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies. Note: We didn't change the way of specifying which Hadoop version to use, so you just need to make sure the Hadoop version you want to use exists underneath /xxx/hadoop-dependencies.
You might now wonder whether you have to manually put extensions inside /xxx/extensions and /xxx/hadoop-dependencies. The answer is no; we have already created them for you. Download the latest Druid tarball at http://druid.io/downloads.html, unpack it, and you will see extensions and hadoop-dependencies folders there. Simply copy them to /xxx/extensions and /xxx/hadoop-dependencies respectively, and you are all set!
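Putting the summary together, the relevant lines of common.runtime.properties might look like this (the /xxx directory paths are placeholders, as above):

```properties
druid.extensions.directory=/xxx/extensions
druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies
druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"]
```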
If the extension or the Hadoop dependency you want to load is not included in the core extension, you can use pull-deps to download it to your extension directory.
If you want to load your own extension, you can first run mvn install to install it into your local repository, and then use pull-deps to download it to your extension directory.
Please feel free to leave any questions regarding the migration.
Extensions have now also been refactored in core and contrib extensions. Core extensions will be maintained by Druid committers and are packaged as part of the download tarball. Contrib extensions are community maintained and can be installed as needed. For more information, please see here.
Until Druid 0.8.x the order of dimensions given at indexing time did not affect the way data gets indexed. Rows would be ordered first by timestamp, then by dimension values, in lexicographical order of dimension names.
As of Druid 0.9.0, Druid respects the given dimension order and will order rows first by timestamp, then by dimension values, in the given dimension order.
This means segments may now vary in size depending on the order in which dimensions are given. Specifying a dimension with many unique values first may result in worse compression than specifying dimensions with repeating values first.
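For example, given a hypothetical dimensionsSpec, listing low-cardinality dimensions before high-cardinality ones will generally compress better:

```json
"dimensionsSpec": {
  "dimensions": ["country", "city", "user_id"]
}
```

Here country (few distinct values, many repeats) sorts ahead of user_id (nearly unique per row); the reverse order would tend to produce larger segments.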
As indicated in the 0.8.3 release notes, min/max aggregators have been removed in favor of doubleMin, doubleMax, longMin, and longMax aggregators.
If you have any issues starting up because of this, please see https://github.com/druid-io/druid/issues/2749
druid.indexer.task.baseDir and druid.indexer.task.baseTaskDir now default to using the standard Java temporary directory specified by the java.io.tmpdir system property, instead of /tmp.
Other issues to be aware of: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3A%22Release+Notes%22 and https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AIncompatible
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AFeature
#1719 Add Rackspace Cloud Files Deep Storage Extension
#1858 Support avro ingestion for realtime & hadoop batch indexing
#1873 add ability to express CONCAT as an extractionFn
#1921 Add docs and benchmark for JSON flattening parser
#1936 adding Upper/Lower Bound Filter
#1978 Graphite emitter
#1986 Preserve dimension order across indexes during ingestion
#2008 Regex search query
#2014 Support descending time ordering for time series query
#2043 Add dimension selector support for groupby/having filter
#2076 adding lower and upper extraction fn
#2209 support cascade execution of extraction filters in extraction dimension spec
#2221 Allow change minTopNThreshold per topN query
#2264 Adding custom mapper for json processing exception
#2271 time-descending result of select queries
#2258 acl for zookeeper is added
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AImprovement
#984 Use thread priorities (aka set nice values for background-like tasks)
#1638 Remove Maven client at runtime + Provide a way to load Druid extensions through local file system
#1728 Store AggregatorFactory[] in segment metadata
#1988 support multiple intervals in dataSource inputSpec
#2006 Preserve dimension order across indexes during ingestion
#2047 optimize InputRowSerde
#2075 Configurable value replacement on match failure for RegexExtractionFn
#2079 reduce bytearray copy to minimal optimize VSizeIndexedWriter
#2084 minor optimize IndexMerger's MMappedIndexRowIterable
#2094 Simplifying dimension merging
#2107 More efficient SegmentMetadataQuery
#2111 optimize create inverted indexes
#2138 build v9 directly
#2228 Improve heap usage for IncrementalIndex
#2261 Prioritize loading of segments based on segment interval
#2306 More specific null/empty str handling in IndexMerger
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ABug
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ADocumentation
#2100 doc update to make it easy to find how to do re-indexing or delta ingestion
#2186 Add intro developer docs
#2279 Some more multitenancy docs
#2364 Add more docs around timezone handling
#2216 Completely rework the Druid getting started process
Thanks to everyone who contributed to this release!
@fjy
@xvrl
@drcrallen
@pjain1
@chtefi
@liubin
@salsakran
@jaebinyo
@erikdubbelboer
@gianm
@bjozet
@navis
@AlexanderSaydakov
@himanshug
@guobingkun
@abbondanza
@binlijin
@rasahner
@jon-wei
@CHOIJAEHONG1
@loganlinn
@michaelschiff
@himank
@nishantmonu51
@sirpkt
@duilio
@pdeva
@KurtYoung
@mangesh-pardeshi
@dclim
@desaianuj
@stevemns
@b-slim
@cheddar
@jkukul
@AdrieanKhisbe
@liuqiyun
@codingwhatever
@clintropolis
@zhxiaogg
@rohitkochar
@itsmee
@Angelmmiguel
@noddi
@se7entyse7en
@zhaown
@genevien
Published by xvrl over 8 years ago
Set druid.selectors.coordinator.serviceName to your Coordinator's druid.service value (defaults to druid/coordinator) in the common.runtime.properties of all nodes. Realtime handoff will only work if this config is properly set. (See #2015)
druid.indexer.runner.javaOpts is now passed to the Peon as a property.
Thanks to all the contributors to this release!
@b-slim
@binlijin
@dclim
@drcrallen
@fjy
@gianm
@guobingkun
@himanshug
@nishantmonu51
@pjain1
@xvrl
Published by xvrl almost 9 years ago
If you are using union queries, please make sure to update broker nodes prior to updating any historical nodes, realtime nodes, or indexing service.
Otherwise, you can follow standard rolling update procedures.
Thanks to all the contributors to this release!
@anwenxu
@cheddar
@dclim
@drcrallen
@fjy
@gianm
@guobingkun
@Hailei
@himanshug
@jon-wei
@nishantmonu51
@pjain1
@potto007
@qix
@rasahner
@xvrl
Published by xvrl about 9 years ago
There should be no update concerns, and standard updating procedures can be followed for rolling updates.
Improved test coverage for indexing service, ingestion, and coordinator endpoints
The full list of changes can be found here
Special thanks to everyone that contributed (code, docs, etc.) to this release!
@drcrallen
@davideanastasia
@guobingkun
@himanshug
@michaelschiff
@fjy
@krismolendyke
@nishantmonu51
@rasahner
@xvrl
@gianm
@pjain1
@samjhecht
@solimant
@sherry-q
@ubercow
@zhaown
@mvfast
@mistercrunch
@pdeva
@KurtYoung
@onlychoice
@b-slim
@cheddar
@MarConSchneid
Published by drcrallen over 9 years ago
We recently introduced a backwards incompatible change to the schema Druid uses when it emits metrics. If you are not emitting Druid metrics to an http endpoint, the update procedure should be straightforward.
io.druid.server.metrics.ServerMonitor has been renamed to io.druid.server.metrics.HistoricalMetricsMonitor. You will need to update any configs that reference this class.
Published by fjy over 9 years ago
This release mainly introduces dimension compression and reworks the Druid documentation. There are no update concerns with this version of Druid.
Published by xvrl over 9 years ago
Group results by day of week, hour of day, etc.
We added support for time extraction functions where you can group by results based on anything DateTimeFormatter supports. For more details, see http://druid.io/docs/latest/DimensionSpecs.html#time-format-extraction-function .
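For example, a dimensionSpec grouping on day of week might look like the following sketch (the output name is arbitrary, and the format string can be any pattern DateTimeFormatter accepts):

```json
{
  "type": "extraction",
  "dimension": "__time",
  "outputName": "dayOfWeek",
  "extractionFn": { "type": "timeFormat", "format": "EEEE" }
}
```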
Audit rule and dynamic configuration changes
Druid now provides support for remembering why a rule or configuration change was made, and who made the change. Note that you must provide the author and comment fields yourself. The IP which issued the configuration change will be recorded by default. For more details, see headers "X-Druid-Author" and "X-Druid-Comment" on http://druid.io/docs/latest/Coordinator.html
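For example, a rule change carrying audit metadata might be issued as follows (the datasource and header values are illustrative):

```
POST /druid/coordinator/v1/rules/wikipedia HTTP/1.1
Content-Type: application/json
X-Druid-Author: alice
X-Druid-Comment: load 2 replicas in the hot tier
```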
Provide support for a password provider for the metadata store
This enables people to write a module extension which implements the logic for getting a password to the metadata store.
Enable servlet filters on Druid nodes
This enables people to write authentication filters for Druid requests.
Query parallelization on the broker for long interval queries
We’ve added the ability to break up a long interval query into multiple shorter interval queries that can be run in parallel. This should improve the performance of more expensive groupBys. For more details, see "chunkPeriod" on http://druid.io/docs/latest/Querying.html#query-context
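For example, a hedged sketch of a query context entry that splits a long interval into day-sized chunks to be run in parallel:

```json
"context": {
  "chunkPeriod": "P1D"
}
```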
Better schema exploration
The broker can now return the dimensions and metrics for a datasource broken down by interval.
Improved code coverage
We’ve added numerous unit tests to improve code coverage and will be tracking coverage in the future with Coveralls.
Additional ingestion metrics
Added additional metrics for failed persists and failed handoffs.
Configurable InputFormat for batch ingestion (#1177)
pagingSpec in a select query generates an obscure NPE (#1165). Thanks to friedhardware!
ignoreInvalidRows in reducer for Hadoop indexing
timeBoundary query on union datasources (#1243)
DruidSecondaryModule (#1245)
Published by xvrl over 9 years ago
New ingestion spec
Druid 0.7.0 requires a new ingestion spec format. Druid 0.6.172 supports both the old and new formats of ingestion and has scripts to convert from the old to the new format. This script can be run with 'tools convertSpec' using the same Main used to run Druid nodes. You can update your Druid cluster to 0.6.172, update your ingestion specs to the new format, and then update to Druid 0.7.0. If you update your cluster to Druid 0.7.0 directly, make sure your real-time ingestion pipeline understands the new spec.
MySQL is no longer the default metadata storage
Druid now defaults to embedding Apache Derby, which was chosen mainly for testability purposes. However, we do not recommend using Derby in production. For anything other than testing, please use MySQL or PostgreSQL metadata storage.
Configuration parameters for metadata storage were renamed from druid.db to druid.metadata.storage, and an additional druid.metadata.storage.type=<mysql|postgresql> is required to use anything other than Derby.
The convertProps tool can assist you in converting all 0.6.x properties to 0.7 properties.
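As an illustration of the rename (connection details below are placeholders):

```properties
# 0.6.x:
#   druid.db.connector.connectURI=jdbc:mysql://dbhost:3306/druid
# 0.7.x equivalent:
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://dbhost:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
```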
Druid is now case-sensitive
Druid column names are now case-sensitive. We previously tried to be case-insensitive for queries and case-preserving for data, but we decided to make this change as there were numerous bugs related to various casing problems.
If you are upgrading from version 0.6.x:
You can use the jsonLowerCase parseSpec to lower-case the data for you at ingestion time and maintain backwards compatibility. For all other parse specs, you will need to lower-case the metric/aggregator names if you were using mixed case before.
Batch segment announcement is now the default
Druid now uses batch segment announcement by default for all nodes. If you are already using batch segment announcement, you should be all set.
If you have not yet updated to using batch segments announcement, please read this guide in the forum on how to update your current 0.6.x cluster to use batch announcement first.
Kafka 0.7.x removed in favor of Kafka 0.8.x
If you are using Kafka 0.7, you will have to build the kafka-seven extension manually. It is commented out in the build, because Kafka 0.7 is not available in Maven Central. The Kafka 0.8 (kafka-eight) extension is unaffected.
Coordinator endpoint changes
Numerous coordinator endpoints have changed. Please refer to the coordinator documentation for what they are.
In particular:
/info on the coordinator has been removed.
/health on historical nodes has been removed.
Separate jar required for com.metamx.metrics.SysMonitor
If you currently have com.metamx.metrics.SysMonitor as part of your druid.monitoring.monitors configuration and would like to keep it, you will have to add the SIGAR library jar to your classpath.
Alternatively, you can simply remove com.metamx.metrics.SysMonitor if you do not rely on the sys/.* metrics.
We had to remove the direct dependency on SIGAR in order to move Druid artifacts to Maven Central, since SIGAR is currently not available there.
Update Procedure
If you are running a version of Druid older than 0.6.172, please upgrade to 0.6.172 first. See the 0.6.172 release notes for instructions.
In order to ensure a smooth rolling upgrade without downtime, nodes must be updated in the following order:
Long metric column support
Until now Druid stored all metrics as single precision floating point values, which could introduce rounding errors and unexpected results with queries using longSum aggregators, especially for groupBy queries.
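For example, an aggregator that sums a long-typed column exactly (the column names here are hypothetical):

```json
{ "type": "longSum", "name": "total_events", "fieldName": "events" }
```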
Pluggable metadata storage
MySQL, PostgreSQL, and Derby (for testing) are now supported out of the box. Derby supports only a single master and should not be used for high-availability production; use MySQL or PostgreSQL with failover for that.
Simplified data ingestion API
We have completely reworked Druid’s data ingestion API.
Switch compression for metric columns from LZF to LZ4
Initial performance tests show it may be between 15% and 25% faster, and results in segments about 3-5% smaller on typical data sets.
Configurable inverted bitmap indexes
Druid now supports Roaring Bitmaps in addition to the default Concise Bitmaps. Initial performance tests show Roaring may be up to 20% faster for certain types of queries, at the expense of segments being 20% larger on average.
Integration tests
We have added a set of integration tests that use Docker to spin up a Druid cluster to run a series of indexing and query tests.
New Druid Coordinator console
We introduced a new Druid console that should hopefully provide a better overview of the status of your cluster and be a bit more scalable if you have hundreds of thousands of segments. We plan to expand this console to provide more information about the current state of a Druid cluster.
Query Result Context
Result contexts can report errors during queries in the query headers. We are currently using this feature for internal retries, but hope to expand it to report more information back to clients.
Faster query speeds
Lots of speed improvements thanks to faster compression format, small optimizations in column structure, and optimizations of queries with multiple aggregations, as well as numerous groupBy query performance improvements. Overall, some queries can be up to twice as fast using the new index format.
Druid artifacts in Maven Central
Druid artifacts are now available in Maven Central to make your own builds and deployments easier.
Common Configuration File
Druid now has a common.runtime.properties where you can declare all global properties as well as all of your external dependencies. This avoids repeated configuration across multiple nodes and will hopefully make setting up a Druid cluster a little less painful.
Default host names, port and service names
Default host names, ports, and service names for all nodes means a lot less configuration is required upfront if you are happy with the defaults. It also means you can run all node types on a single machine without fiddling with port conflicts.
Druid column names are now case sensitive
Death to casing bugs. Be aware of the dangers of updating to 0.7.0 if you have mixed case columns and are using 0.6.x. See above for more details.
Query Retries
Druid will now automatically retry queries for certain classes of failures.
Background caching
For certain types of queries, especially those that involve distinct (hyperloglog) counts, this can improve performance by over 20%. Background caching is disabled by default.
Reduced coordinator memory usage
Reduced coordinator memory usage (by up to 50%). This fixes a problem where a coordinator would sometimes lose leadership due to frequent GCs.
Metrics can now be emitted to SSL endpoints
Additional AWS credentials support. Thanks @gnethercutt!
Additional persist and throttle metrics for real-time ingestion
This should help diagnose when real-time ingestion is being throttled and how long persists are taking. These metrics provide a good indication of when it is time to scale up real-time ingestion.
Broker initialization endpoint
Brokers now provide a status endpoint at /druid/broker/v1/loadstatus to indicate whether they are ready to be queried, making rolling upgrades / restarts easier.
druid.host should now support IPv6 addresses as well.
Set druid.coordinator.merge.on to false; 'false' is the default value of the config.
Published by xvrl over 9 years ago
Druid 0.6.172 fixes a few bugs to make the upgrade path towards Druid 0.7.0 seamless:
If you are not already running 0.6.171, please see the 0.6.171 release notes for important notes on the upgrade procedure.
Published by fjy over 9 years ago
Druid 0.6.171 is a stable bug-fix release, mainly meant to enable a less painful update to Druid 0.7.0. Going forward, we will be backporting fixes to 0.6.x as required for the community and continuing to develop major features on 0.7.x.
http://static.druid.io/artifacts/releases/druid-services-0.6.171-bin.tar.gz
Both this version and 0.7.0-RC1 provide much better out of the box support for PostgreSQL as a metadata store. In order to provide this functionality, we had to make some small changes to the way data is stored in metadata storage for MySQL setups.
Before updating to 0.6.171, please make sure that:
All Druid MySQL metadata tables are using UTF-8 encoding for all string/text columns,
The default character set for the Druid MySQL database has been changed to UTF-8.
Druid Coordinator and Overlord will refuse to start if the database default character set is not UTF-8.
To check column character encoding, use
SHOW CREATE TABLE <table>;
.
If the default table encoding is not UTF-8, or if any columns are encoded using anything other than UTF-8, you will need to convert those tables.
To check the database default encoding, use
SHOW VARIABLES LIKE 'character_set_database';
If you are not already using UTF-8 encoding for your columns, you can convert your tables and change the database default using the following commands. Please keep in mind that table conversion can take a while (order of minutes) and segment loading / handoff will be interrupted for the duration of the upgrade.
Make a backup of your database before performing the upgrade!
ALTER TABLE druid_config CONVERT TO CHARSET utf8;
ALTER TABLE druid_rules CONVERT TO CHARSET utf8;
ALTER TABLE druid_segments CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasks CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasklogs CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasklocks CONVERT TO CHARSET utf8;
-- replace druid with your Druid database name here
ALTER DATABASE druid DEFAULT CHARACTER SET utf8;
We introduced several query optimizations, mainly for topNs and HLLs
The overlord can now optionally choose what worker to send tasks to #904
Improved retry logic for realtime plumbers when handoffs fail during the final merge step
Published by fjy almost 10 years ago
Published by fjy about 10 years ago
Published by fjy over 10 years ago
Published by fjy over 10 years ago
We are pleased to announce a new Druid stable, version 0.6.73. New features include:
A production tested dimension cardinality estimation module
We recently open sourced our HyperLogLog module, described in bit.ly/1fIEpjM and bit.ly/1ebLnNI. Documentation has been added on how to use this module as an aggregator and as part of post aggregators.
Hash-based partitioning
We recently introduced a new sharding format for batch indexing. We use the HyperLogLog module to estimate the size of a data set and create partitions based on this size. In our tests, partitioning via this hash based method is both faster and leads to more evenly partitioned segments.
Cross-tier replication
We can now replicate segments across different tiers. This means that you can create a “hot” tier that loads a single copy of the data on more powerful hardware and a “cold” tier that loads another copy of the data on less powerful hardware. This can lead to significant reductions in infrastructure costs.
Nested GroupBy Queries
Thanks to an awesome contribution from Yuval Oren et al., we can now do multi-level aggregation with groupBys. More info here: https://groups.google.com/forum/#!topic/druid-development/8oL28iuC4Gw
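A nested groupBy wraps an inner groupBy as a query-type dataSource; the following is a minimal sketch with placeholder datasource and column names:

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "wikipedia",
      "granularity": "all",
      "dimensions": ["country", "city"],
      "aggregations": [ { "type": "longSum", "name": "edits", "fieldName": "count" } ],
      "intervals": ["2014-01-01/2014-02-01"]
    }
  },
  "granularity": "all",
  "dimensions": ["country"],
  "aggregations": [ { "type": "longSum", "name": "edits", "fieldName": "edits" } ],
  "intervals": ["2014-01-01/2014-02-01"]
}
```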
GroupBy memory improvements
We’ve made improvements as to how multi-threaded groupBy queries utilize memory. This should help reduce memory pressure on nodes with concurrent, expensive groupBy queries.
Real-time ingestion stability improvements
We’ve seen some stability issues with real-time ingestion with a high number of concurrent persists and have added smarter throttling to handle this type of workload.
Additional features
Things on our plate
Published by xvrl over 10 years ago
This is a small release with mainly stability and performance updates.
Published by xvrl over 10 years ago
When updating Druid with no downtime, we highly recommend updating historical nodes and real-time nodes before updating the broker layer. Changes in queries are typically compatible with an old broker version and a new historical node version, but not vice versa. Our recommended rolling update process is: