smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

GPL-3.0 License

Stars
92
Committers
35

smart-data-lake - 2.0.2

Published by pgruetter almost 4 years ago

New Features

  • fail on configuration objects defined in multiple locations (#104)
  • automatic caching/release of DataFrames that are used multiple times (#112)
  • combine SubFeeds if multiple Actions write to the same DataObject (#280)
  • support save mode overwrite for JdbcTableDataObject (#229)
  • register UDFs for SQL and expressions from global configuration (#262)
  • new CustomPartitionMode (#268)
  • new CustomFileDataObject
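
As a rough illustration of the first item (#104), such a check can be thought of as collecting object ids per configuration location and failing when an id shows up in more than one place. The Python sketch below is not the SDL implementation; `find_duplicate_ids`, `assert_unique_ids`, the file names and the ids are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_ids(locations):
    """Return {object_id: [locations]} for ids defined in more than one location.

    `locations` maps a configuration location (for example a file name) to
    the object ids it defines. All names are hypothetical.
    """
    seen = defaultdict(list)
    for location, object_ids in locations.items():
        for object_id in object_ids:
            seen[object_id].append(location)
    return {oid: locs for oid, locs in seen.items() if len(locs) > 1}


def assert_unique_ids(locations):
    """Raise ValueError if any object id is defined in multiple locations."""
    duplicates = find_duplicate_ids(locations)
    if duplicates:
        details = "; ".join(
            f"{oid} defined in {', '.join(locs)}"
            for oid, locs in sorted(duplicates.items())
        )
        raise ValueError(
            f"Configuration objects defined in multiple locations: {details}"
        )
```

Failing early with the list of offending locations is one plausible design; silently letting the last definition win would hide the conflict.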

Minor Improvements & Bugfixes

  • don't include sdl-core in fat-jar of other modules (#269)
  • fix Kafka Avro with schemaregistry NotSerializableException for GenericDatumReader (#198)
  • refactor transform partition values (#264)
  • validate and order column names on JdbcTableDataObject.writeDataFrame (#190)
  • delete concerned partitions on empty overwrite (#275)
  • improve configuration exception message handling
  • update typesafe config version

smart-data-lake - 1.2.2

Published by pgruetter almost 4 years ago

New Features

  • fail on configuration objects defined in multiple locations (#104)
  • automatic caching/release of DataFrames that are used multiple times (#112)
  • combine SubFeeds if multiple Actions write to the same DataObject (#280)
  • support save mode overwrite for JdbcTableDataObject (#229)
  • register UDFs for SQL and expressions from global configuration (#262)
  • new CustomPartitionMode (#268)
  • new CustomFileDataObject

Minor Improvements & Bugfixes

  • don't include sdl-core in fat-jar of other modules (#269)
  • fix Kafka Avro with schemaregistry NotSerializableException for GenericDatumReader (#198)
  • refactor transform partition values (#264)
  • validate and order column names on JdbcTableDataObject.writeDataFrame (#190)
  • delete concerned partitions on empty overwrite (#275)
  • improve configuration exception message handling
  • update typesafe config version

smart-data-lake -

Published by zzeekk almost 4 years ago

Features and major bugfixes:

  • Fix: properties ${scala.minor.version} were not replaced in the pom on Maven Central for releases 1.2.0/2.0.0
  • WebserviceFileDataObject: Pass authHeader through CredentialsUtil.getCredentials
  • Add "virtual" partition columns for reading from JdbcTableDataObject (#229)

smart-data-lake -

Published by zzeekk almost 4 years ago

Features and major bugfixes:

  • Fix: properties ${scala.minor.version} were not replaced in the pom on Maven Central for releases 1.2.0/2.0.0
  • WebserviceFileDataObject: Pass authHeader through CredentialsUtil.getCredentials
  • Add "virtual" partition columns for reading from JdbcTableDataObject (#229)

smart-data-lake - 2.0.0 / Spark 3

Published by zzeekk almost 4 years ago

Enhancements and major bugfixes:

  • Modularization (#28)
  • Use dependency management (#83)
  • Support Spark 3.0 (#133) -> SDL version 2.0.x is on Spark 3.0, SDL version 1.2.x is on Spark 2.4.x
  • Ignore partition values filter for specific input DataObjects of CustomSparkAction (#129)
  • Extend execution mode fail condition to allow multiple conditions with description (#215)
  • Allow to specify schema for CustomDfCreator (#232)
  • Support views in Oracle JdbcTableDataObject (#191)
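
The version split stated in #133 (SDL 2.0.x on Spark 3.0, SDL 1.2.x on Spark 2.4.x) can be captured in a small lookup, for example when choosing a dependency in a build script. This is only a sketch based on that one sentence; the function name `spark_line_for_sdl` is hypothetical, and release lines not named in the note are rejected rather than guessed.

```python
# Mapping taken from the release note: SDL major.minor -> Spark release line.
SDL_TO_SPARK = {
    "2.0": "3.0",
    "1.2": "2.4",
}


def spark_line_for_sdl(sdl_version):
    """Return the Spark release line for an SDL version string such as '2.0.2'."""
    parts = sdl_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Expected a version like '2.0.0', got {sdl_version!r}")
    major_minor = ".".join(parts[:2])
    if major_minor not in SDL_TO_SPARK:
        # The note only covers 2.0.x and 1.2.x; do not guess for other lines.
        raise ValueError(f"No Spark mapping known for SDL {major_minor}.x")
    return SDL_TO_SPARK[major_minor]
```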

Minor Bugfixes: #201, #203, #213, #219, #158, #199, #182, #51

smart-data-lake - 1.2.0

Published by zzeekk almost 4 years ago

Enhancements and major bugfixes:

  • Modularization (#28)
  • Use dependency management (#83)
  • Support Spark 3.0 (#133) -> SDL version 2.0.x is on Spark 3.0, SDL version 1.2.x is on Spark 2.4.x
  • Ignore partition values filter for specific input DataObjects of CustomSparkAction (#129)
  • Extend execution mode fail condition to allow multiple conditions with description (#215)
  • Allow to specify schema for CustomDfCreator (#232)
  • Support views in Oracle JdbcTableDataObject (#191)

Minor Bugfixes: #201, #203, #213, #219, #158, #199, #182, #51

smart-data-lake - 1.1.1

Published by pgruetter about 4 years ago

Major bugfixes and enhancements:

  • Python Transformations (#130)
  • Partition values on initSubfeed (#161)
  • Abort run if failed run with different parameters exists (#163)
  • Do not stop after init if some actions are skipped (#168)
  • Fix unmatched case exception on deleteDataAfterRead (#173)
  • Dynamically add runId as DataFrame column / PartitionValue (#102, #187)
  • Filter partition values for other subFeeds to dataObjects partition columns (#180)
  • Fix TickTockHiveTable for recursiveInputs (#181)
  • Fix parallelism (#184)

Minor bugfixes and corrections:
See the following Pull Requests:
#164 check partitions existing
#167 improve detection of main input/output
#177 early validation of metricsFailCondition
#178 add initialization hook to StateListener

smart-data-lake - 1.1.0

Published by pgruetter about 4 years ago

New features:

  • Allow recursive inputs to CustomSparkAction (#150)
  • Create empty DataFrame according to schemaMin if table does not exist (#153)
  • Refactor executionmode (see PR #147 for details)
  • Refine Kafka schema registry conversion (see PR #154 for details)
  • Support config validation and dry-runs, support to start simulation run programmatically (Part of #120)
  • new StateListener interface for custom handler of current state/metrics (#156)

Breaking changes

  • removal of initExecutionMode attribute on Actions. This can now be achieved by applyCondition = "isNodeStart" (Part of PR #147)

smart-data-lake - 1.0.7

Published by pgruetter about 4 years ago

New features

  • incremental mode #44
  • SQL hooks before & after reading #141
  • Action metric check and fail #142

Bug fixes
various fixes and improvements #145

smart-data-lake - 1.0.6

Published by pgruetter about 4 years ago

New Features

  • KafkaDataObject / Streaming Execution Mode (#35)
  • Optionally include filename in dataframes from file sources (#76)
  • Path is now optional in HiveTableDataObject (#19)

Bug Fixes

  • various small bug fixes (#116, #119, #125, #128, #135)

smart-data-lake - 1.0.5

Published by zzeekk over 4 years ago

New Features

  • Recovery of failed runs (#61)

Bug Fixes

  • follow redirects in WebserviceDataObject (#100)
  • various small bugfixes (#117, #118)

smart-data-lake - 1.0.4

Published by zzeekk over 4 years ago

  • Use jdbc query option (#94)
  • Implement support for Oracle JDBC catalog (#91)
  • Refactor acl enforcement (#95)
  • Fix historization bugs (#105, #110)
  • various small bugfixes & improvements (#88, #89, #90, #98, #106, #108)

smart-data-lake - 1.0.3

Published by zzeekk over 4 years ago

New Features

  • Add support for token authentication in SplunkConnection (#74)
  • Refactor SplunkConnection, SftpFileRefConnection, JdbcTableConnection and JmsDataObject to use auth mode in configuration for authentication information -> existing configuration for these connection types must be updated
  • Implement configurable memory usage logging (configuration: global.memoryLogTimer)
  • Log duration & runtime metrics for exec phase per DataObject (#84)
  • Log action count per runtime state in final log message
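
The last item, an action count per runtime state in the final log message, amounts to a simple tally. The sketch below shows the idea in Python; it is not SDL code, and `summarize_states`, the action ids and the state names are hypothetical.

```python
from collections import Counter


def summarize_states(action_states):
    """Build a 'STATE=count' summary from a mapping of action id to runtime state."""
    counts = Counter(action_states.values())
    # Sort by state name so the summary is stable across runs.
    return ", ".join(f"{state}={n}" for state, n in sorted(counts.items()))
```

For instance, `summarize_states({"a": "SUCCEEDED", "b": "FAILED", "c": "SUCCEEDED"})` yields `FAILED=1, SUCCEEDED=2`.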

Bug fixes

  • Refactor Default/LocalSmartDataLakeBuilder, use LocalSmartDataLakeBuilder for the jar's mainClass, remove KerberosSmartDataLakeBuilder
  • Mask secret spark properties in log
  • Fix assert repartition filename empty

smart-data-lake -

Published by pgruetter over 4 years ago

  • Optimize KerberosSmartDataLakeBuilder (create Object for easier usage, use defined environment variables)

smart-data-lake -

Published by pgruetter over 4 years ago

  • Add step for test and deploy of scala 2.12 version
  • Prepare for maven central
  • Fix Github Actions