TLSv1.2 is the minimum TLS protocol version that newer Kudu clients are able to use for secure Kudu RPC. The newer clients are not able to communicate with servers built and run with OpenSSL versions prior to 1.0.1. If such a Kudu cluster is running on a deprecated OS version (e.g., RHEL/CentOS 6.4), the following options are available to work around the incompatibility:
Set --rpc_authentication=disabled and --rpc_encryption=disabled for all masters and tablet servers in the cluster to allow the new client to work with the old cluster.
TLSv1.2 is the minimum TLS protocol version that newer Kudu servers are able to use for secure Kudu RPC. The newer servers are not able to communicate using secure Kudu RPC with Kudu C++ client applications linked with a libkudu_client library built against OpenSSL versions prior to 1.0.1, or with Java client applications run with an outdated Java runtime that doesn't support TLSv1.2. The following options are available to work around this incompatibility:
Customize the --rpc_tls_min_protocol and --rpc_tls_ciphers flags on all masters and tablet servers in the cluster, setting --rpc_tls_min_protocol=TLSv1 and adding TLSv1-capable cipher suites (e.g. AES128-SHA and AES256-SHA) to the list.
Set --rpc_authentication=disabled and --rpc_encryption=disabled for all masters and tablet servers in the cluster to allow such Kudu clients to work with newer clusters.
Support for Python 2.x and Python 3.4 and earlier is deprecated and may be removed in the next minor release.
Kudu now supports encrypting data at rest. Kudu supports the AES-128-CTR, AES-192-CTR, and AES-256-CTR ciphers to encrypt data, and supports Apache Ranger KMS and Apache Hadoop KMS. See Data at rest for more details.
Kudu now supports range-specific hash schemas for tables. It's now possible to add ranges with their own unique hash schema independent of the table-wide hash schema. This can be done at table creation time and while altering the table. It’s controlled by the --enable_per_range_hash_schemas master flag, which is enabled by default (see KUDU-2671).
Kudu now supports soft-deleted tables. Kudu keeps a soft-deleted table aside for a period of time (a.k.a. reservation) without purging its data. The table can be restored (recalled) before its reservation expires. The reservation period can be customized via the Kudu client API upon soft-deleting the table. The default reservation period is controlled by the --default_deleted_table_reserve_seconds master flag. NOTE: As of the Kudu 1.17 release, the soft-delete functionality is not supported when HMS integration is enabled, but this should be addressed in a future release (see KUDU-3326).
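The reservation behaviour of soft-deleted tables can be pictured with a small toy model. This is only a sketch in plain Python, not the Kudu client API: the class SoftDeleteCatalog and its method names are hypothetical, and the current time is passed in explicitly so the example stays deterministic.

```python
class SoftDeleteCatalog:
    """Toy model of soft-deleted tables with a reservation period (hypothetical, not Kudu code)."""

    def __init__(self, default_reserve_seconds):
        # Plays the role of a cluster-wide default reservation period.
        self.default_reserve_seconds = default_reserve_seconds
        self.live = set()
        self.soft_deleted = {}  # table name -> time at which the reservation expires

    def create(self, name):
        self.live.add(name)

    def soft_delete(self, name, now, reserve_seconds=None):
        # The caller may override the default reservation period per table.
        if name not in self.live:
            raise KeyError(name)
        self.live.remove(name)
        period = self.default_reserve_seconds if reserve_seconds is None else reserve_seconds
        self.soft_deleted[name] = now + period

    def recall(self, name, now):
        # A table can be restored only while its reservation has not expired.
        expiry = self.soft_deleted.get(name)
        if expiry is None or now >= expiry:
            return False
        del self.soft_deleted[name]
        self.live.add(name)
        return True

    def purge_expired(self, now):
        # Tables whose reservation has expired are removed for good.
        expired = sorted(n for n, t in self.soft_deleted.items() if now >= t)
        for n in expired:
            del self.soft_deleted[n]
        return expired
```

In this model a table recalled before its expiry returns to the live set, while a table whose reservation has lapsed can only be purged.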
Introduced the Auto-Incrementing column. An auto-incrementing column is populated on the server side with a monotonically increasing counter. The counter is local to every tablet, i.e. each tablet has a separate auto-incrementing counter (see KUDU-1945).
Kudu now experimentally supports non-unique primary keys. When a table with a non-unique primary key is created, an Auto-Incrementing column named auto_incrementing_id is added automatically to the table as a key column. The non-unique key columns and the Auto-Incrementing column together form the effective primary key (see KUDU-1945).
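A toy model may help show how per-tablet counters make a non-unique key unique. The sketch below is plain Python, not Kudu code: the class name, the byte-sum partitioning function, and the counter starting at 1 are all assumptions made for illustration.

```python
from itertools import count


class ToyAutoIncTable:
    """Toy model: non-unique key plus a per-tablet auto-incrementing id (hypothetical)."""

    def __init__(self, num_tablets):
        self.num_tablets = num_tablets
        # One independent counter per tablet; starting at 1 is an assumption.
        self.counters = [count(1) for _ in range(num_tablets)]
        self.rows = {}  # (key, auto_incrementing_id) -> value

    def _tablet_for(self, key):
        # Stand-in for hash partitioning: deterministic, not Kudu's real hash.
        return sum(key.encode("utf-8")) % self.num_tablets

    def insert(self, key, value):
        tablet = self._tablet_for(key)
        auto_id = next(self.counters[tablet])
        # Equal keys always land in the same tablet, so (key, auto_id) is unique.
        self.rows[(key, auto_id)] = value
        return tablet, auto_id
```

Inserting the same key twice yields two distinct rows, and ids handed out by different tablets may coincide because each tablet counts on its own.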
Introduced the Immutable column. It's useful for representing a semantically constant entity (see KUDU-3353).
An experimental feature has been added to Kudu that allows it to automatically rebalance tablet leader replicas among tablet servers. The background task can be enabled by setting the --auto_leader_rebalancing_enabled flag on the Kudu masters. By default, the flag is set to 'false' (see KUDU-3390).
Introduced an experimental feature: authentication of Kudu client applications to Kudu servers using JSON Web Tokens (JWT). JWT-based authentication can be used as an alternative to Kerberos authentication for Kudu applications running at edge nodes where configuring Kerberos might be cumbersome. Similar to Kerberos credentials, a JWT is considered a primary client credential. The server-side capability of JWT-based authentication is controlled by the --enable_jwt_token_auth flag (set to 'false' by default). When the flag is set to 'true', a Kudu server is capable of authenticating Kudu clients using the JWT provided by the client during RPC connection negotiation. For its part, a Kudu client authenticates a Kudu server by verifying its TLS certificate. For the latter to succeed, the client should use the Kudu client API to add the cluster's IPKI CA certificate to the list of trusted certificates.
The C++ client scan token builder can now create multiple tokens per tablet. So, it's now possible to dynamically scale the set of readers/scanners fetching data from a Kudu table in parallel. To use this functionality, use the newly introduced SetSplitSizeBytes() method of the Kudu client API to specify how many bytes of data each token should scan (see KUDU-3393).
Kudu's default replica placement algorithm is now range- and table-aware to prevent hotspotting, unlike the old power-of-two-choices algorithm. New replicas from the same range are spread evenly across available tablet servers, and the table the range belongs to is used as a tiebreaker (see KUDU-3476).
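The range-first, table-second ordering can be illustrated with a short sketch. It is a simplified reading of the description above, not Kudu's actual placement code: the function name, the data layout, and the final tie-break on server id are assumptions.

```python
def pick_tablet_server(servers, replicas, table, range_id):
    """Pick a server for a new replica of (table, range_id).

    servers:  list of tablet server ids
    replicas: list of (server, table, range_id) tuples already placed
    """
    def load(server):
        same_range = sum(
            1 for s, t, r in replicas if s == server and t == table and r == range_id
        )
        same_table = sum(1 for s, t, r in replicas if s == server and t == table)
        # Fewest replicas of the same range wins; the table count breaks ties.
        # The server id is a last, arbitrary tie-break to keep the result deterministic.
        return (same_range, same_table, server)

    return min(servers, key=load)
```

For example, a server that already hosts a replica of the target range is avoided even if it hosts fewer replicas of the table overall.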
Statistics on various write operations are now available via the Kudu client API at the session level (see KUDU-3351, KUDU-3365).
Kudu now exposes all its metrics except for string gauges in Prometheus format via the embedded webserver's /metrics_prometheus endpoint (see KUDU-3375).
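A minimal sketch of reading that endpoint with the Python standard library follows. The host name is whatever your deployment uses, the default port 8051 assumes the master's web UI is on its usual port, and the parser is deliberately simple: it assumes plain "name value" sample lines without trailing timestamps.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse 'name{labels} value' sample lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # '# HELP' and '# TYPE' lines carry no sample value
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(host, port=8051):
    """Fetch and parse metrics from a Kudu server's embedded webserver."""
    url = f"http://{host}:{port}/metrics_prometheus"
    with urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

In practice a Prometheus server would scrape the endpoint directly; the parser here is only meant for quick ad hoc inspection.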
It’s now possible to deploy a Kudu cluster in an internal network (e.g. in a K8S environment) so that internal traffic (i.e. between tservers and masters) does not use advertised addresses, while Kudu clients running in external networks can still reach the cluster. This can be achieved by customizing the newly introduced --rpc_proxy_advertised_addresses and --rpc_proxied_addresses server flags. This might be useful in scenarios where a Kudu cluster is running in an internal network behind a firewall, Kudu clients are running on the other side of the firewall and use JWT to authenticate to Kudu servers, and the RPC traffic between the clients and the Kudu cluster is forwarded through a TCP/SOCKS proxy (see KUDU-3357).
It’s now possible to clean up metadata for deleted tables/tablets from Kudu master's in-memory map and the sys.catalog table. This is useful in reducing the memory consumption and bootstrap time for masters. This can be achieved by customizing the setting for the newly introduced --enable_metadata_cleanup_for_deleted_tables_and_tablets and --metadata_for_deleted_table_and_tablet_reserved_secs kudu-master flags.
It’s now possible to perform range rebalancing for a single table per run in the kudu cluster rebalance CLI tool by setting the newly introduced --enable_range_rebalancing tool flag. This is useful for addressing hot-spotting issues when too many tablet replicas from the same range (but different hash buckets) were placed on the same tablet server. The hot-spotting issue in tablet replica placement should be addressed in a follow-up release; see KUDU-3476 for details.
It’s now possible to compact log container metadata files at runtime. This is useful in reclaiming the disk space once the container becomes full. This feature can be turned on/off by customizing the setting for the newly introduced --log_container_metadata_runtime_compact kudu-tserver flag (see KUDU-3318).
New CLI tools kudu master/tserver set_flag_for_all are added to update flags for all masters and tablet servers in a Kudu cluster at once.
A new CLI tool kudu local_replica copy_from_local is added to copy tablet replicas' data at the filesystem level. It can be used when adding disks, for quick rebalancing of data between disks, or when migrating data from one data directory to another. The copied data is also stored more densely than the data in the old data directories.
A new CLI tool kudu diagnose parse_metrics is added to parse metrics out of diagnostic logs (see KUDU-2353).
A new CLI tool kudu local_replica tmeta delete_rowsets is added to delete rowsets from the tablet.
A sanity check has been added to detect wall clock jumps. It is controlled by the newly introduced --wall_clock_jump_detection and --wall_clock_jump_threshold_sec flags. This should help to address issues reported in KUDU-2906.
Reduced the memory consumption of tablet servers when there are frequent alter schema operations (see KUDU-3197).
Reduced memory consumption by implementing memory budgeting for RowSet merge compactions (i.e. CompactRowSetsOp maintenance operations). Several flags have been introduced; among them, the --rowset_compaction_memory_estimate_enabled flag indicates whether to check for the available memory necessary to run CompactRowSetsOp maintenance operations (see KUDU-3406).
Optimized evaluating in-list predicates based on RowSet PK bounds. A tablet server can now effectively skip rows when the predicate is on a non-prefix part of the primary key and the leading columns' cardinality is 1 (see KUDU-1644).
Sped up the kudu cluster rebalance CLI tool by running intra-location rebalancing in parallel for location-aware Kudu clusters. Theoretically, running intra-location rebalancing in parallel might shorten the runtime by a factor of N compared with running sequentially, where N is the number of locations in a Kudu cluster. This can be achieved by customizing the setting for the newly introduced --intra_location_rebalancing_concurrency flag.
Two new flags, --show_tablet_partition_info and --show_hash_partition_info, have been introduced for the kudu table list CLI tool to show the correspondence between partitions and tablet ids. It's also possible to choose the output format with the --list_table_output_format flag.
A new flag --create_table_replication_factor has been introduced for the kudu table copy CLI tool to specify the replication factor for the destination table.
A new flag --create_table_hash_bucket_nums has been introduced for the kudu table copy CLI tool to specify the number of hash buckets in each hash dimension for the destination table.
A new flag --tables has been introduced for the kudu master unsafe_rebuild CLI tool to rebuild the metadata of only the specified tables on the Kudu master; other tables are not affected.
A new flag --fault_tolerant has been introduced for the kudu table copy/scan and kudu perf table_scan CLI tools to make the scanner fault-tolerant and return results in primary key order per tablet.
A new flag --show_column_comment has been introduced for the kudu table describe CLI tool to show column comments.
A new flag --current_leader_uuid has been introduced for the kudu tablet leader_step_down CLI tool to conveniently step down the leader replica using a given UUID.
A new flag --use_readable_format has been introduced for the kudu local_replica dump rowset CLI tool to indicate whether to dump the primary key in human-readable format. In addition, another flag, --dump_primary_key_bounds_only, has been introduced for this tool to indicate whether to dump rowset primary key bounds only.
A new flag --tables has been introduced for the kudu local_replica delete CLI tool to conveniently delete multiple tablets by table name.
It’s now possible to specify owner and comment fields when using the kudu table create CLI tool to create tables.
It’s now possible to use the kudu local_replica copy_from_remote CLI tool to copy tablets in a batch.
It’s now possible to enable or disable the auto rebalancer at runtime by setting the --auto_rebalancing_enabled flag on the Kudu master.
It’s now possible for the kudu tserver/master get_flags CLI tool to filter flags even if the server side doesn’t support the flag filtering function (the latter applies to Kudu servers of releases prior to 1.12).
Added a CSP (Content Security Policy) header to prevent security scanners from flagging Kudu's web UI as vulnerable.
A separate section has been introduced on the /varz page of Kudu's web UI to list all non-default flags.
A separate section has been introduced on the /scans page of Kudu's web UI to show slow scans. It can be enabled by setting the --show_slow_scans flag for tablet servers. A scan is called 'slow' if it takes more time than defined by --slow_scanner_threshold_ms.
A new Data retained column has been introduced to the Non-running operations section on the /maintenance-manager page of Kudu's web UI to indicate the approximate amount of disk space that would be freed.
The default value of tablet history retention time (controlled by the --tablet_history_max_age_sec flag) on Kudu master has been reduced from 7 days to 5 minutes. It's not necessary to keep such a long history of the system tablet since masters always scan data at the latest available snapshot.
Kudu can now be built and run on Apple M chips and macOS 11, 12. As with prior releases, Kudu's support for macOS is experimental, and should only be used for development.
Fixed an issue where historical MVCC data older than the ancient history mark (configured by --tablet_history_max_age_sec) that had only DELETE operations wouldn't be compacted correctly. As a result, the ancient history data could not be GCed if the tablet had been created by Kudu servers of releases prior to 1.10 (those versions did not support live row counting) (see KUDU-3367).
Fixed an issue where the Kudu server could potentially crash on malicious negotiation attempts.
Fixed a bug where a Kudu tablet server started under an OS account that had no permission to access tablet metadata files would get stuck in the tablet bootstrapping phase (see KUDU-3419).
Fixed a bug in the C++ client where toggling SetFaultTolerant(false) would not work.
Fixed a bug in the C++ client where toggling KuduScanner::SetSelection() would not work.
Fixed a bug in the Java client where under certain conditions the same rows would be returned multiple times even if the scanner was configured to be fault-tolerant.
Fixed a bug in the Java client where the last propagated timestamp and resource metrics would not be updated in subsequent scan responses.
Fixed a bug in the Java client where it would not invalidate stale locations of the leader master.
Fixed a bug in the Kudu HMS client that was causing failures when scanning Kudu tables from Hive (see KUDU-3401).
Fixed a bug where the kudu table copy CLI tool would fail copying an unpartitioned table.
Fixed a bug where the kudu master unsafe_rebuild CLI tool would rebuild the system catalog with outdated schemas of tables that were unhealthy during the rebuild process.
Fixed a bug where kudu table copy failed to copy tables that had STRING, BINARY or VARCHAR type of columns in their range keys (see KUDU-3306).
Fixed a bug where the kudu table copy CLI tool would crash upon encountering an error while copying rows to the destination table. The tool now exits gracefully and provides additional information for troubleshooting in such a condition.
Fixed a bug where the kudu local_replica list CLI tool would crash if the --list_detail flag was enabled.
Fixed a bug where a subprocess running the Ranger client would crash when receiving an oversized message from the Kudu master. With the fix, each peer communicating via the Subprocess protocol now discards an oversized message, logs the issue, clears the channel, and is able to receive further messages after encountering such a condition.
Fixed a bug where a Kudu application linked with the kudu_client library would crash with SIGILL when running on a machine lacking SSE4.2 support (see KUDU-3248).
Fixed a bug where the subprocess crashed when receiving large messages from the Kudu master if the pipe became too full to transport the entire message in one go or if there was a delay in sending from the master (see KUDU-3489).
Kudu 1.17.0 is wire-compatible with previous versions of Kudu:
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.17 and versions earlier than 1.3:
The Kudu 1.17 Java client library is API- and ABI-compatible with Kudu 1.16. Applications written against Kudu 1.16 will compile and run against the Kudu 1.17 client library. Applications written against Kudu 1.17 will compile and run against the Kudu 1.16 client library unless they use the API newly introduced in Kudu 1.17.
The Kudu 1.17 C++ client is API- and ABI-forward-compatible with Kudu 1.16. Applications written and compiled against the Kudu 1.16 client library will run without modification against the Kudu 1.17 client library. Applications written and compiled against the Kudu 1.17 client library will run without modification against the Kudu 1.16 client library unless they use the API newly introduced in Kudu 1.17.
The Kudu 1.17 Python client is API-compatible with Kudu 1.16. Applications written against Kudu 1.16 will continue to run against the Kudu 1.17 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.17.0 includes contributions from 26 people, including 12 first-time contributors:
Published by attilabukor over 2 years ago
Clients can now require authentication and encryption instead of depending on server-side settings KUDU-1921.
Kudu Masters now automatically attempt to add themselves to an existing cluster if there is a healthy Raft quorum among Kudu Masters.
A new tool kudu master unsafe_rebuild is added to reconstruct the master catalog from tablet metadata collected from tablet servers. This can be used in emergencies to restore access to tables when all masters are unavailable.
A new tool kudu table set_replication_factor is added to alter the replication factor of a table. The tool immediately updates table metadata in the master, and the master will asynchronously effect the new replication factor. Progress can be monitored by running ksck.
It’s now possible to require a minimum replication factor for a Kudu table. This can be achieved by customizing the setting for the newly introduced --min_num_replicas kudu-master flag. For example, setting --min_num_replicas=3 requires every newly created table to have at least 3 replicas for each of its tablets, so there cannot be data loss when just a single tablet server in the cluster fails irrecoverably. For the sake of backward compatibility, --min_num_replicas is set to 1 by default.
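The reasoning behind the value 3 follows from majority-based (Raft) replication: a tablet stays available as long as a majority of its replicas survive. The sketch below is illustrative arithmetic only; the function names are hypothetical and the validation merely mimics the kind of check a minimum replication factor implies.

```python
def tolerated_failures(replication_factor):
    """Replicas that can be lost while a majority of them still survives."""
    return (replication_factor - 1) // 2


def validate_replication_factor(requested, min_num_replicas=1):
    """Reject a table whose replication factor is below the configured minimum."""
    if requested < min_num_replicas:
        raise ValueError(
            f"replication factor {requested} is below the minimum {min_num_replicas}"
        )
    return requested
```

With a single replica no failure is tolerated, whereas three replicas survive the loss of one tablet server.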
It’s now possible to track startup progress on the /startup page of the web UI. Metrics have also been added to track the overall server startup progress as well as the processing of the log block containers and the starting of the tablets KUDU-1959.
A new tool kudu table add_column is added to add columns to existing tables using the CLI KUDU-3339.
A new tool kudu tserver unregister is added to remove a dead tablet server from the cluster without restarting the masters KUDU-2915.
Kudu will now more aggressively fsync consensus-related metadata when metadata is configured to be on an XFS mount. This may lead to increased contention on the device that backs metadata, but will prevent corruption in the event of an outage KUDU-2195.
A clearer message is logged when the Ranger subprocess crashes, to specify a problem with the Ranger client.
Two new flags have been introduced for the kudu table scan and kudu perf table_scan CLI tools: --row_count_only and --report_scanner_stats. With these new flags, the above-mentioned CLI tools make it possible to issue scan requests equivalent to running “SELECT COUNT(1) FROM <table_name>” from impala-shell. These new provisions are useful in detecting and troubleshooting scan performance issues.
Added a replica selection configuration knob for the kudu table scan and kudu perf table_scan CLI tools: it’s controlled by the --replica_selection flag.
To improve security, the following flags are now marked as sensitive and will be redacted in the logs and WebUI when the redaction is enabled:
--webserver_private_key_file
--webserver_private_key_password_cmd
--webserver_password_file
The logic to select the effective time source when running with --time_source=auto has been updated. The builtin time source is auto-selected if a Kudu server runs with --time_source=auto in an environment where the instance detector isn't aware of dedicated NTP servers AND the --builtin_ntp_servers flag is set to a valid value. Otherwise, if the --builtin_ntp_servers flag is set to an empty or invalid value, the effective time source becomes system for platforms supporting the get_ntptime() API; on other platforms, the catch-all case selects system_unsync as the effective time source.
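The selection rules for --time_source=auto can be written down as a small decision function. This is a sketch of the description above, not Kudu's implementation: the function and parameter names are hypothetical, and the case where the instance detector does know about dedicated NTP servers is left unspecified because the text does not cover it.

```python
def effective_time_source(detector_knows_ntp, builtin_ntp_servers_valid, has_get_ntptime):
    """Return the effective time source for --time_source=auto, per the rules above."""
    if detector_knows_ntp:
        # Not described in the text above, so this sketch does not guess.
        return None
    if builtin_ntp_servers_valid:
        return "builtin"
    if has_get_ntptime:
        return "system"
    # Catch-all case for platforms without the API.
    return "system_unsync"
```

Writing the rules this way makes the precedence explicit: a valid --builtin_ntp_servers value is checked before platform support for the system time source.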
It is now possible to print or edit PBC files in batch using the kudu pbc CLI tool, and also to format its JSON input/output as “pretty”.
Client connection timeout is now configurable in the Java client KUDU-3240.
A new /healthz endpoint is now available on the kudu-master and tablet-server embedded web servers for liveness checks KUDU-3308.
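A liveness probe against this endpoint could look like the sketch below. It assumes that an HTTP 200 response means the server is alive, which the note above does not spell out; the function names are hypothetical, and the status fetcher is injectable so the logic can be exercised without a running server.

```python
from urllib.error import URLError
from urllib.request import urlopen


def http_status(url, timeout=2.0):
    """Return the HTTP status code for a GET request to url."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def is_alive(host, port, get_status=http_status):
    """Probe the /healthz endpoint; any error or non-200 status counts as not alive."""
    url = f"http://{host}:{port}/healthz"
    try:
        return get_status(url) == 200
    except (URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return False
```

A container orchestrator would typically issue the same request itself through an HTTP liveness probe rather than through a script.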
The Hive Metastore URI is now logged to the console when connecting via the kudu hms CLI tool KUDU-3189.
It is now possible to start up a master when there is an additional master address present in the master addresses flag KUDU-3311.
The table entity is now accessible in KuduWriteOperation in the C++ client, making it easier to understand errors on the client side KUDU-2623.
The rebalancer tool now doesn’t move replicas to tablet servers in maintenance mode KUDU-3328.
Improved the performance of the run length encoding (RLE).
Log4J used in the Ranger subprocess was upgraded to 2.17.1, which contains patches for several security vulnerabilities (CVE-2021-44832, CVE-2021-45105, CVE-2021-45046, and CVE-2021-44228).
Kudu servers previously crashed if hostnames became unresolvable via DNS (e.g. if the container hosting a server were destroyed). Such errors are now treated as transient and the lookups are retried periodically. See KUDU-75, KUDU-1620, and KUDU-1885 for more details.
Fixed an issue in Kudu Java client where concurrent flushing of data buffers could lead to errors reported as 'java.lang.AssertionError: This Deferred was already called' KUDU-3277.
Fixed a Kudu RPC negotiation issue when running with cyrus-sasl-gssapi-2.1.27-5 and newer versions of the RPM package. A failed RPC connection negotiation attempt would result in an error logged along with the full connection negotiation trace: Runtime error: SASL(-15): mechanism too weak for this user: Unable to find a callback: 32775 KUDU-3297.
Fixed a crash in kudu-master and kudu-tserver when running with a kernel where the getrandom(2) API is not available (versions of the Linux kernel prior to 3.17).
Fixed a bug which could lead to exhaustion of the address space for outgoing connections on a busy Kudu cluster KUDU-3352.
Fixed a bug in the Java client where a malformed tablet server ID in the scan token causes connection failures and timeouts in some cases KUDU-3349.
Fixed a bug where the rebalancer failed with the -ignored_tservers flag KUDU-3346.
Kudu 1.16.0 is wire-compatible with previous versions of Kudu:
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.16 and versions earlier than 1.3:
The Kudu 1.16 Java client library is API- and ABI-compatible with Kudu 1.15. Applications written against Kudu 1.15 will compile and run against the Kudu 1.16 client library and vice-versa.
The Kudu 1.16 C++ client is API- and ABI-forward-compatible with Kudu 1.15. Applications written and compiled against the Kudu 1.15 client library will run without modification against the Kudu 1.16 client library. Applications written and compiled against the Kudu 1.16 client library will run without modification against the Kudu 1.15 client library.
The Kudu 1.16 Python client is API-compatible with Kudu 1.15. Applications written against Kudu 1.15 will continue to run against the Kudu 1.16 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.16.0 includes contributions from 17 people, including 5 first-time contributors:
Thank you for your contributions!
Published by acelyc111 about 3 years ago
The kudu-mapreduce integration has been removed in the 1.15.0 release.
Kudu now experimentally supports multi-row transactions. Currently only INSERT and INSERT_IGNORE operations are supported.
See here for a design overview of this feature.
Kudu now supports Raft configuration change for Kudu masters and CLI tools for orchestrating addition and removal of masters in a Kudu cluster. These tools substantially simplify the process of migrating to multiple masters, recovering a dead master and removing masters from a Kudu cluster. For detailed steps, see the latest administration documentation. This feature is evolving and the steps to add, remove and recover masters may change in the future. See KUDU-2181 for details.
Kudu now supports table comments directly on Kudu tables which are automatically synchronized when the Hive Metastore integration is enabled. These comments can be added at table creation time and changed via table alteration.
Kudu now experimentally supports per-table size limits based on leader disk space usage or number of rows. When generating new authorization tokens, Masters will now consider the size limits and strip tokens of INSERT and UPDATE privileges if either limit is reached. To enable this feature, set the --enable_table_write_limit master flag; adjust the --table_disk_size_limit and --table_row_count_limit flags as desired or use the kudu table set_limit tool to set limits per table.
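The effect of the limits on authorization tokens can be sketched as a pure function. This is a toy model of the rule stated above, not the master's code: the function name, the privilege strings, and the treatment of "reached" as greater than or equal are assumptions.

```python
def token_privileges(privileges, disk_size, row_count,
                     disk_size_limit=None, row_count_limit=None):
    """Return the privileges to embed in a new token, given current table usage."""
    over_disk = disk_size_limit is not None and disk_size >= disk_size_limit
    over_rows = row_count_limit is not None and row_count >= row_count_limit
    if over_disk or over_rows:
        # Either limit being reached removes the ability to add or modify rows.
        return set(privileges) - {"INSERT", "UPDATE"}
    return set(privileges)
```

Note that in this model deletes and scans remain allowed once a limit is reached, which gives clients a way to bring a table back under its limit.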
It is now possible to change the Kerberos Service Principal Name using the --principal flag. The default SPN is still kudu/_HOST. Clients connecting to a cluster using a non-default SPN must set the sasl_protocol_name or saslProtocolName to match the SPN base (i.e. “kudu” if the SPN is “kudu/_HOST”) in the client builder or the Kudu CLI. See KUDU-1884 for details.
Kudu RPC now supports TLSv1.3. Kudu servers and clients automatically negotiate TLSv1.3 for Kudu RPC if OpenSSL (or Java runtime correspondingly) on each side supports TLSv1.3. If necessary, use the newly introduced flag --rpc_tls_ciphersuites to customize TLSv1.3-specific cipher suites at the server side. See KUDU-2871 for details.
TLS ciphers renegotiation for TLSv1.2 and prior protocol versions is now explicitly disabled. See KUDU-1926 for details.
The location assignment for Kudu clients is now disabled by default since it doesn’t bring a lot of benefits, but rather puts extra load on Kudu masters. This change reduces the load on Kudu masters, which is essential if many clients run in a cluster. To enable the location assignment for clients, override the default by setting --master_client_location_assignment_enabled=true for Kudu masters.
The behavior of the C++ client replica selection for closest replica, the default, was updated to match the behavior of the Java client. Instead of picking a random replica each time, a static value is used for each process ensuring that the selection remains deterministic and can benefit from better caching. See KUDU-3248 for details.
The Web UI /rpcz endpoint now displays information on whether an RPC connection is protected by TLS, and if so, provides information on the negotiated TLS cipher suite.
Tooling requests and C++ client requests bound for leader masters will now be retried in the event the masters cannot be reached.
Cluster tooling will now validate that the master argument contains no duplicate values. See KUDU-3226 for details.
The error message output by the Kudu Java client on an attempt to write into a non-existent table partition now contains the table’s name. See KUDU-3267 for details.
Fixed a bug in the Kudu tablet servers that could result in a crash when performing an incremental backup of rows that had many batches of updates. See KUDU-3291 for more details.
The Kudu Java client will now retry scans bound for tablets hosted on quiescing tablet servers at replicas on other tablet servers. See KUDU-3213 for more details.
Fixed a race between the scheduling of a maintenance op and the destruction of a tablet. This could previously lead to a crash. See KUDU-3268 for more details.
Fixed crash in Kudu C++ client introduced with KUDU-1802. See KUDU-3254 for details.
Fixed a bug in the Kudu Java client which manifested itself in AUTO_FLUSH_BACKGROUND sessions hanging in a call to the KuduSession.flush() method. Another sign of the bug was stuck data ingest workloads based on the Java client (e.g., kudu-spark applications) with a "java.lang.AssertionError: This Deferred was already called!" message in the logs. See KUDU-3277 for details.
Fixed a crash in Kudu servers due to the lack of the getrandom(2) system call in Linux kernel versions earlier than 3.17 by instead using /dev/random for uuid generation in the Boost library. The crash includes the following message in the logs: "terminate called after throwing an instance of 'boost::wrapexcept<boost::uuids::entropy_error>'". See the fix for a sample stack trace.
Kudu 1.15.0 is wire-compatible with previous versions of Kudu:
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.15 and versions earlier than 1.3:
The Kudu 1.15 Java client library is API- and ABI-compatible with Kudu 1.14. Applications written against Kudu 1.14 will compile and run against the Kudu 1.15 client library and vice-versa.
The Kudu 1.15 C++ client is API- and ABI-forward-compatible with Kudu 1.14. Applications written and compiled against the Kudu 1.14 client library will run without modification against the Kudu 1.15 client library. Applications written and compiled against the Kudu 1.15 client library will run without modification against the Kudu 1.14 client library.
The Kudu 1.15 Python client is API-compatible with Kudu 1.14. Applications written against Kudu 1.14 will continue to run against the Kudu 1.15 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.15.0 includes contributions from 12 people, including 2 first-time contributors:
Thank you for your contributions!
Published by granthenke over 3 years ago
Support for CentOS 6/RHEL 6, Ubuntu 14, Ubuntu 16, and Debian 8 platforms has been dropped
given they are at or near end-of-life. We will no longer validate these platforms as a
part of the release process, though patches will still be accepted going forward.
Developer support for OS X 10.10 Yosemite, OS X 10.11 El Capitan, and OS X 10.12 Sierra
has been dropped. We will no longer validate these versions as a part of the release
process, though patches will still be accepted going forward.
Support for Python 2.x and Python 3.4 and earlier is deprecated and may be
removed in the next minor release.
The kudu-mapreduce integration has been deprecated and may be removed in the next minor
release. Similar functionality and capabilities now exist via the Apache Spark, Apache Hive,
Apache Impala, and Apache NiFi integrations.
Full support for INSERT_IGNORE, UPDATE_IGNORE, and DELETE_IGNORE operations was added. The
INSERT_IGNORE operation will insert a row if one matching the key does not exist and ignore
the operation if one already exists. The UPDATE_IGNORE operation will update the row if one
matching the key exists and ignore the operation if one does not exist. The DELETE_IGNORE
operation will delete the row if one matching the key exists and ignore the operation if
one does not exist. These operations are particularly useful in situations where retries or
duplicate operations could occur and you do not want to handle the resulting errors
manually, or you do not want to cause unnecessary writes and compaction work as a result of
using the UPSERT operation.
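The semantics of the three IGNORE operations can be modelled with an in-memory dictionary. The sketch below is plain Python and not the Kudu client API; the class and method names are hypothetical, and the boolean return value (whether the operation was applied) is an illustration device.

```python
class ToyIgnoreTable:
    """In-memory stand-in for a table keyed by primary key (not the Kudu API)."""

    def __init__(self):
        self.rows = {}

    def insert_ignore(self, key, value):
        # Insert only if no row with this key exists; otherwise do nothing.
        if key in self.rows:
            return False
        self.rows[key] = value
        return True

    def update_ignore(self, key, value):
        # Update only if a row with this key exists; otherwise do nothing.
        if key not in self.rows:
            return False
        self.rows[key] = value
        return True

    def delete_ignore(self, key):
        # Delete only if a row with this key exists; otherwise do nothing.
        if key not in self.rows:
            return False
        del self.rows[key]
        return True

    def upsert(self, key, value):
        # Always writes, even when a retry repeats an earlier operation.
        self.rows[key] = value
        return True
```

Replaying any of the IGNORE calls after it has already succeeded leaves the table unchanged and raises no error, which is what makes them convenient for retried writes.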
The Java client can check if the cluster it is communicating with supports these operations
by calling the supportsIgnoreOperations() method on the KuduClient. See KUDU-1563 for more
details.
Spark 3 compatible JARs compiled for Scala 2.12 are now published for the Kudu Spark integration.
See KUDU-3202 for more details.
Every Kudu cluster now has an automatically generated cluster Id that can be used to uniquely
identify a cluster. The cluster Id is shown in the masters' web UI, the kudu master list tool,
and in master server logs. See KUDU-2574 for more details.
It is now possible to enforce that OpenSSL is initialized in FIPS approved mode in the servers
and the C++ client by setting the KUDU_REQUIRE_FIPS_MODE environment variable to “1”, “yes” or
“true”. See KUDU-3210 for more details.
Downloading the WAL data and data blocks when copying tablets to another tablet server is now
parallelized, resulting in much faster tablet copy operations. These operations occur when
recovering from a down tablet server or when running the cluster rebalancer. See
KUDU-1728 and KUDU-3214 for more details.
The HMS integration now supports multiple Kudu clusters associated with a single HMS
including Kudu clusters that do not have HMS synchronization enabled. This is possible,
because the Kudu master will now leverage the cluster Id to ignore notifications from
tables in a different cluster. Additionally, the HMS plugin will check if the Kudu cluster
associated with a table has HMS synchronization enabled. See
KUDU-3192 and KUDU-3187 for more details.
The HMS integration now supports gzipped HMS notifications. This is important in order to
support Hive 4 where the default encoder was changed to be the GzipJSONMessageEncoder. See
KUDU-3201 for more details.
Kudu will now fail tablet replicas that have been corrupted due to KUDU-2233 instead of
crashing the tablet server. If a healthy majority still exists, a new replica will be created
and the failed replica will be evicted and deleted. See
KUDU-3191 and KUDU-2233 for more details.
DeltaMemStores will now be flushed as long as any DMS in a tablet is older than the point
defined by --flush_threshold_secs, rather than flushing once every --flush_threshold_secs
period. This can reduce memory pressure under update- or delete-heavy workloads, and lower
tablet server restart times following such workloads. See
KUDU-3195 for more details.
The kudu perf loadgen CLI tool now supports UPSERT for storing the generated data into
the table. To switch to UPSERT for row operations (instead of the default INSERT), add the
--use_upsert command-line flag.
Users can now specify the level of parallelization when copying a tablet using the
kudu local_replica copy_from_remote CLI tool by passing the
--tablet_copy_download_threads_nums_per_session argument.
The Kudu Masters now discriminate between overlapped and exact duplicate key ranges when adding
new partitions, returning Status::AlreadyPresent() for exact range duplicates and
Status::InvalidArgument() for otherwise overlapped ones. In prior releases, the master
returned Status::InvalidArgument() both in case of duplicate and otherwise overlapped ranges.
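The distinction between duplicate and overlapping ranges can be sketched as follows. This is
an illustration only, not the master's implementation: classify_new_range is a hypothetical
helper, ranges are modeled as half-open (lower, upper) integer pairs, and the returned strings
merely echo the status names mentioned above.

```python
def classify_new_range(existing, new):
    """Classify a candidate range against already-present, non-overlapping ranges.

    Ranges are (lower, upper) tuples, lower-inclusive and upper-exclusive.
    """
    lower, upper = new
    if lower >= upper:
        raise ValueError("range must be non-empty")
    for cur_lower, cur_upper in existing:
        if (cur_lower, cur_upper) == (lower, upper):
            return "AlreadyPresent"    # exact duplicate of an existing range
        if lower < cur_upper and cur_lower < upper:
            return "InvalidArgument"   # overlaps without being identical
    return "OK"


ranges = [(0, 10), (20, 30)]
print(classify_new_range(ranges, (0, 10)))   # AlreadyPresent
print(classify_new_range(ranges, (5, 15)))   # InvalidArgument
print(classify_new_range(ranges, (10, 20)))  # OK
```

A caller that retries partition creation could treat the duplicate case as success while
still surfacing genuine overlaps as errors.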
The handling of an empty list of master addresses in the Kudu C++ client has been improved. In
prior releases, KuduClientBuilder::Build() would hang in ConnectToCluster() if no master
addresses were provided. Now, KuduClientBuilder::Build() instantly returns
Status::InvalidArgument() in such a case.
The connection negotiation timeout for the Kudu C++ client is now programmatically configurable.
To customize the connection negotiation timeout, use the newly introduced
KuduClientBuilder::connection_negotiation_timeout() method in the Kudu C++ client API.
All RPC-related kudu CLI tools now have a --negotiation_timeout_ms command-line flag to
control the client-side connection negotiation timeout. The default value for the new flag is
set to 3000 milliseconds for backward compatibility. Keep in mind that the total RPC timeout
includes the connection negotiation time, so in general it makes sense to bump --timeout_ms
along with --negotiation_timeout_ms by the same delta.
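The "same delta" advice amounts to simple arithmetic, sketched below. The helper names
bump_timeouts and as_flags are hypothetical, and the starting --timeout_ms value of 60000 is
only an illustrative number, not a documented default; the 3000 ms negotiation default is the
one stated above.

```python
DEFAULT_NEGOTIATION_TIMEOUT_MS = 3000  # default stated in the release note


def bump_timeouts(timeout_ms, new_negotiation_timeout_ms,
                  old_negotiation_timeout_ms=DEFAULT_NEGOTIATION_TIMEOUT_MS):
    """Raise the total RPC timeout by the same delta as the negotiation timeout."""
    delta = new_negotiation_timeout_ms - old_negotiation_timeout_ms
    return timeout_ms + delta, new_negotiation_timeout_ms


def as_flags(timeout_ms, negotiation_timeout_ms):
    """Render the two values as command-line flags."""
    return [f"--timeout_ms={timeout_ms}",
            f"--negotiation_timeout_ms={negotiation_timeout_ms}"]


# Illustrative values: 60000 ms total timeout, negotiation raised from 3000 to 10000 ms.
total, negotiation = bump_timeouts(timeout_ms=60000, new_negotiation_timeout_ms=10000)
print(" ".join(as_flags(total, negotiation)))
# --timeout_ms=67000 --negotiation_timeout_ms=10000
```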
Kudu now reports on slow SASL calls (i.e. calls taking more than 250 milliseconds to complete)
when connecting to a server. This helps diagnose issues like those described in
KUDU-3217.
MaintenanceManager now has a new histogram-based maintenance_op_find_best_candidate_duration
metric to capture the stats on how long it takes (in microseconds) to find the best maintenance
operation among available candidates. The newly introduced metric can help in diagnosing
conditions where MaintenanceManager seems to be lagging behind the rate of write operations in a busy
Kudu cluster with many replicas per tablet server.
The KuduScanToken Java API has been extended with a deserializeIntoScannerBuilder() method
that can be used to further customize generated tokens.
Logging of the error message produced when applying an op while a Java KuduSession is closed
has been throttled. See
KUDU-3012 for more details.
Added a new uptime metric for a Kudu server. The metric's value is reported as the length of
the time interval passed from the start of the server, in microseconds. Knowing the server's
uptime, it's easier to interpret and compare metrics reported by different Kudu servers.
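One way uptime can help when comparing servers is by normalizing counters to per-second
rates. The sketch below assumes only what is stated above, namely that uptime is reported in
microseconds; the helper names and the sample counter values are hypothetical.

```python
from datetime import timedelta


def uptime_to_timedelta(uptime_us):
    """Convert an uptime value in microseconds to a timedelta."""
    return timedelta(microseconds=uptime_us)


def per_second_rate(counter_value, uptime_us):
    """Average rate of a monotonically increasing counter over the server's lifetime."""
    if uptime_us <= 0:
        raise ValueError("uptime must be positive")
    return counter_value / (uptime_us / 1_000_000)


# Two hypothetical servers with the same counter value but different uptimes.
print(uptime_to_timedelta(3_600_000_000))    # 1:00:00
print(per_second_rate(7200, 3_600_000_000))  # 2.0
print(per_second_rate(7200, 7_200_000_000))  # 1.0
```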
Documentation for Kudu’s metrics is now automatically generated for each release and can be seen
here.
Fixed lock contention between MaintenanceManager op registration and the scheduling of new
maintenance ops. On particularly dense tablet servers, this contention was previously shown to
significantly slow down startup times. See
KUDU-3149 for more details.
Fixed lock contention between MaintenanceManager’s threads performing already scheduled
operations and the scheduler thread itself. This benefits clusters with heavy ingest/update
workloads that have many replicas per tablet server. See
KUDU-1954 (https://issues.apache.org/jira/browse/KUDU-1954) for more details.
Fixed a bug in the merge iterator that could result in a crash. This could surface as a crash
when performing ordered or differential scans, particularly when the underlying data contained
deletes and reinserts. See
KUDU-3108 for more details.
Fixed a heap-use-after-free bug in Kudu C++ client that might manifest itself when altering a
table to update the partitioning schema. See
KUDU-3238 for more details.
Fixed a bug where building scan tokens would result in a NullPointerException if a tablet not
found error occurred before generating the token. See
KUDU-3205 for more details.
Fixed a bug where a delete operation would fail if the row being deleted contained exactly
64 columns and all values were set on the row. See
KUDU-3198 for more details.
Fixed a bug where Slf4j classes were shaded into the Spark integration JARs. See
KUDU-3157 for more details.
Fixed a bug where the 'kudu hms fix' tool mistakenly reports non-matching master addresses
when the addresses are in fact canonically the same. See
KUDU-2884 for more details.
Kudu 1.14.0 is wire-compatible with previous versions of Kudu:
The authentication features introduced in Kudu 1.3 place the following limitations
on wire compatibility between Kudu 1.14 and versions earlier than 1.3:
The Kudu 1.14 Java client library is API- and ABI-compatible with Kudu 1.13. Applications
written against Kudu 1.13 will compile and run against the Kudu 1.14 client library and
vice-versa.
The Kudu 1.14 C++ client is API- and ABI-forward-compatible with Kudu 1.13.
Applications written and compiled against the Kudu 1.13 client library will run without
modification against the Kudu 1.14 client library. Applications written and compiled
against the Kudu 1.14 client library will run without modification against the Kudu 1.13
client library.
The Kudu 1.14 Python client is API-compatible with Kudu 1.13. Applications
written against Kudu 1.13 will continue to run against the Kudu 1.14 client
and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.14.0 includes contributions from 12 people, including 1 first-time
contributor:
Thank you for your contributions!
For full installation details, see Kudu Installation.
Published by attilabukor about 4 years ago
Support for Python 2.x and Python 3.4 and earlier is deprecated and may be removed in the next minor release.
The kudu-mapreduce integration has been deprecated and may be removed in the next minor release. Similar functionality and capabilities now exist via the Apache Spark, Apache Hive, Apache Impala, and Apache NiFi integrations.
Added table ownership support. All newly created tables are automatically owned by the user creating them. It is also possible to change the owner by altering the table. You can also assign privileges to table owners via Apache Ranger (see KUDU-3090).
An experimental feature has been added to Kudu that allows it to automatically rebalance tablet replicas among tablet servers. The background task can be enabled by setting the --auto_rebalancing_enabled flag on the Kudu masters. Before starting auto-rebalancing on an existing cluster, the CLI rebalancer tool should be run first (see KUDU-2780).
Bloom filter column predicate pushdown has been added to allow optimized execution of filters which match on a set of column values with a false-positive rate. Support for Impala queries utilizing Bloom filter predicates is available, yielding performance improvements of 19% to 30% in TPC-H benchmarks and around 41% improvement for distributed joins across large tables. Support for Spark is not yet available (see KUDU-2483).
AArch64-based (ARM) architectures are now supported including published Docker images.
The Java client now supports the columnar row format returned from the server transparently. Using this format can reduce the server CPU and size of the request over the network for scans. The columnar format can be enabled via the setRowDataFormat() method on the KuduScanner.
An experimental feature that can be enabled by setting the --enable_workload_score_for_perf_improvement_ops flag prioritizes flushing and compacting hot tablets.
Hive metastore synchronization now supports Hive 3 and later.
The Spark KuduContext accumulator metrics now track operation counts per table instead of cumulatively for all tables.
The kudu local_replica delete CLI tool now accepts multiple tablet identifiers. Along with the newly added --ignore_nonexistent flag, this helps with scripting scenarios when removing multiple tablet replicas from a particular Tablet Server.
The web UIs of both Masters and Tablet Servers now display the name of each service thread pool group on the /threadz page.
Introduced queue_overflow_rejections_ metrics for both Masters and Tablet Servers: the number of RPC requests of a particular type dropped due to RPC service queue overflow.
Introduced a CoDel-like queue control mechanism for the apply queue. This helps to avoid accumulating too many write requests and timing them out in case of seek-bound workloads (e.g., uniform random inserts). The newly introduced queue control mechanism is disabled by default. To enable it, set the Tablet Server’s --tablet_apply_pool_overload_threshold_ms flag to an appropriate value, e.g. 250 (see KUDU-1587).
Java client’s error collector can be resized (see KUDU-1422).
Calls to the Kudu master server are now drastically reduced when using scan tokens. Previously deserializing a scan token would result in a GetTableSchema request and potentially a GetTableLocations request. Now the table schema and location information is serialized into the scan token itself avoiding the need for any requests to the master when processing them.
The default size of Master’s RPC queue is now 100 (it was 50 in earlier releases). This is to optimize for use cases where a Kudu cluster has many clients working concurrently.
Masters now have an option to cache table location responses. This is targeted at Kudu clusters which have many clients working concurrently. By default, the caching of table location responses is disabled. To enable table location caching, set the proper capacity of the table location cache using the Master’s --table_locations_cache_capacity_mb flag (setting it to 0 disables the caching). An improvement of up to 17% in the GetTableLocations request rate was observed with caching enabled.
Removed lock contention on Raft consensus lock in Tablet Servers while processing a write request. This helps to avoid RPC queue overflows when handling concurrent write requests to the same tablet from multiple clients (see KUDU-2727).
Master’s performance for handling concurrent GetTableSchema requests has been improved. End-to-end tests indicated up to 15% improvement in sustained request rate for high concurrency scenarios.
Kudu servers now use protobuf Arena objects to perform all RPC request/response-related memory allocations. This gives a boost for overall RPC performance, and with further optimization the result request rate was increased significantly for certain methods. For example, the result request rate increased up to 25% for Master’s GetTabletLocations() RPC in case of highly concurrent scenarios (see KUDU-636).
Tablet Servers now use protobuf Arena for allocating Raft-related runtime structures. This results in substantial reduction of CPU cycles used and increases write throughput (see KUDU-636).
Tablet Servers now use protobuf Arena for allocating EncodedKeys to reduce allocator contention and improve memory locality (see KUDU-636).
Bloom filter predicate evaluation for scans can be computationally expensive. A heuristic has been added that verifies rejection rate of the supplied Bloom filter predicate below which the Bloom filter predicate is automatically disabled. This helped reduce regression observed with Bloom filter predicate in TPC-H benchmark query #9 (see KUDU-3140).
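The idea behind such a heuristic can be sketched as follows. This is not Kudu's implementation: the AdaptivePredicate class, the sampling window, and the threshold values are assumptions made for illustration, and the sketch presumes the predicate is purely an optimization, so rows that pass once it is disabled are still filtered later (for example by the join itself).

```python
class AdaptivePredicate:
    """Wraps an optional filter and turns it off when it rejects too few rows."""

    def __init__(self, predicate, min_rejection_rate=0.1, sample_size=1000):
        self.predicate = predicate
        self.min_rejection_rate = min_rejection_rate  # hypothetical threshold
        self.sample_size = sample_size                # hypothetical window length
        self.evaluated = 0
        self.rejected = 0
        self.disabled = False

    def accepts(self, value):
        if self.disabled:
            return True  # filter is off: every row passes through
        ok = self.predicate(value)
        self.evaluated += 1
        if not ok:
            self.rejected += 1
        if self.evaluated >= self.sample_size:
            # End of a sampling window: check whether the filter earns its cost.
            if self.rejected / self.evaluated < self.min_rejection_rate:
                self.disabled = True
            self.evaluated = 0
            self.rejected = 0
        return ok


# A filter that rejects nothing is switched off after the first window.
useless = AdaptivePredicate(lambda v: True, min_rejection_rate=0.1, sample_size=100)
for i in range(100):
    useless.accepts(i)
print(useless.disabled)  # True
```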
Improved scan performance of dictionary and plain-encoded string columns by avoiding copying them (see KUDU-2844).
Improved maintenance manager’s heuristics to prioritize larger memstores (see KUDU-3180).
Spark client’s KuduReadOptions now supports setting a snapshot timestamp for repeatable reads with READ_AT_SNAPSHOT consistency mode (see KUDU-3177).
Kudu scans now honor location assignments when multiple tablet servers are co-located with the client.
Fixed a bug that caused IllegalArgumentException to be thrown when trying to create a predicate for a DATE column in Kudu Java client (see KUDU-3152).
Fixed a potential race when multiple RPCs work on the same scanner object.
Kudu 1.13.0 is wire-compatible with previous versions of Kudu:
Kudu 1.13 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.
Rolling upgrade between Kudu 1.12 and Kudu 1.13 servers is believed to be possible though has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
Kudu 1.0 clients may connect to servers running Kudu 1.13 with the exception of the below-mentioned restrictions regarding secure clusters.
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.13 and versions earlier than 1.3:
If a Kudu 1.13 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.
If a Kudu 1.13 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
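The two rules above reduce to a single check, sketched here under the assumption that each setting takes one of the three values named in the text. The function name pre_1_3_client_can_connect is hypothetical and does not correspond to any Kudu API.

```python
VALID_SETTINGS = {"required", "optional", "disabled"}


def pre_1_3_client_can_connect(authentication, encryption):
    """Return True if a client older than Kudu 1.3 can connect to the cluster."""
    for setting in (authentication, encryption):
        if setting not in VALID_SETTINGS:
            raise ValueError(f"unknown setting: {setting!r}")
    # Either setting being "required" locks out clients older than 1.3.
    return "required" not in (authentication, encryption)


print(pre_1_3_client_can_connect("optional", "optional"))  # True
print(pre_1_3_client_can_connect("required", "optional"))  # False
print(pre_1_3_client_can_connect("disabled", "required"))  # False
```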
The Kudu 1.13 Java client library is API- and ABI-compatible with Kudu 1.12. Applications written against Kudu 1.12 will compile and run against the Kudu 1.13 client library and vice-versa.
The Kudu 1.13 C++ client is API- and ABI-forward-compatible with Kudu 1.12. Applications written and compiled against the Kudu 1.12 client library will run without modification against the Kudu 1.13 client library. Applications written and compiled against the Kudu 1.13 client library will run without modification against the Kudu 1.12 client library.
The Kudu 1.13 Python client is API-compatible with Kudu 1.12. Applications written against Kudu 1.12 will continue to run against the Kudu 1.13 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.13.0 includes contributions from 22 people, including 9 first-time contributors:
Jim Apple
Kevin J McCarthy
Li Zhiming
Mahesh Reddy
Romain Rigaux
RuiChen
Shuping Zhou
ningw
wenjie
For full installation details, see Kudu Installation.