bentools-etl

PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.

MIT License

Downloads: 42.9K
Stars: 120
Committers: 3
bentools-etl - 4.0.1 Latest Release

Published by bpolaszek 11 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0...4.0.1

bentools-etl - 4.0

Published by bpolaszek 11 months ago

Hey folks! 👋

It's been more than 4 years since version 3 of bentools/etl was drafted. It never left alpha, mostly for lack of time but also, I have to admit, because of uncertainties about the design directions taken.

Introducing bentools/etl v4

PHP 8 and a lot of projects on my side came in between. I recently needed this library again, and I wanted to keep the good ideas of v3 while dropping the bad ones.

So I decided that a stable v3 will never see the light of day. Because lots of classes have been renamed and most of them became immutable, here's a brand new v4.

What's new?

  • This version requires PHP 8.2 as a minimum, is 100% covered by tests (this wasn't the case before), and uses PHPStan at its highest level to ensure type consistency. A GitHub Actions CI has also been set up.

  • It introduces a new EtlState object, which is instantiated at the beginning of the ETL process, and passed through the different steps and event listeners. The EtlExecutor (previously the Etl class) is no longer mutable, since it basically holds the Extractor, the Transformer and the Loader objects, fires events and provides you with the state you need with the EtlState.

  • The EtlState is mostly readonly, but you can still call $state->skip() to skip items, $state->stop() to stop the process, $state->flush() to request an early flush, and you can use the $state->context array to pass arbitrary data between the different steps and events during the whole workflow.

  • The EtlState object also has a nextTick method you can use to perform actions on the next iteration of the loop, for example to do something on an item after an early flush has been triggered.

  • Experimental ReactPHP support, so that you can process incoming data from streams / connections and perform periodic tasks in a long-running process.

  • Improved DX

  • 100% code coverage
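The state controls listed above (skip, stop, early flush, shared context) can be pictured with a small stand-alone sketch. This is not the library's code: `ToyState` and `toyRun()` are hypothetical names invented for illustration, and the exact ordering of skip/stop/flush inside bentools/etl may differ from what this toy loop assumes.

```php
<?php
// Hypothetical stand-in, NOT the library's EtlState: it only mimics the
// skip/stop/flush/context controls named in the list above.
final class ToyState
{
    /** @var array<string, mixed> arbitrary data shared across steps */
    public array $context = [];
    public bool $skipRequested = false;
    public bool $stopRequested = false;
    public bool $flushRequested = false;

    public function skip(): void { $this->skipRequested = true; }
    public function stop(): void { $this->stopRequested = true; }
    public function flush(): void { $this->flushRequested = true; }
}

/**
 * Toy loop: buffers items and flushes every $flushEvery items, on an early
 * flush request, and once at the end. Returns flushed batches and context.
 */
function toyRun(iterable $items, callable $onItem, int $flushEvery): array
{
    $state = new ToyState();
    $buffer = [];
    $batches = [];

    foreach ($items as $item) {
        // Skip and flush requests only apply to the current item.
        $state->skipRequested = false;
        $state->flushRequested = false;
        $onItem($item, $state);

        if ($state->stopRequested) {
            break; // assumption: the item that triggered stop() is not loaded
        }
        if (!$state->skipRequested) {
            $buffer[] = $item;
        }
        if ($buffer !== [] && ($state->flushRequested || count($buffer) >= $flushEvery)) {
            $batches[] = $buffer;
            $buffer = [];
        }
    }
    if ($buffer !== []) {
        $batches[] = $buffer; // final flush of whatever is still pending
    }

    return ['batches' => $batches, 'context' => $state->context];
}

$result = toyRun(
    [1, 2, 3, 4, 5, 6, 7, 8],
    function (int $n, ToyState $state): void {
        // The context array survives across items for the whole run.
        $state->context['seen'] = ($state->context['seen'] ?? 0) + 1;
        if ($n === 2) { $state->skip(); }
        if ($n === 3) { $state->flush(); }
        if ($n === 7) { $state->stop(); }
    },
    flushEvery: 3,
);

print_r($result);
```

In this toy run, item 2 is skipped, item 3 forces an early flush of a partial batch, and the process stops at item 7 before item 8 is ever seen.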

How does it work?

Here's an example of the new API:

city_english_name,city_local_name,country_iso_code,continent,population
"New York","New York",US,"North America",8537673
"Los Angeles","Los Angeles",US,"North America",39776830
Tokyo,東京,JP,Asia,13929286
...

use Bentools\ETL\EtlConfiguration;
use Bentools\ETL\EtlExecutor;
use Bentools\ETL\EventDispatcher\Event\LoadEvent;
use Bentools\ETL\Extractor\CSVExtractor;
use Bentools\ETL\Loader\JSONLoader;
use Bentools\ETL\Recipe\LoggerRecipe;
use Monolog\Logger;

$etl = (new EtlExecutor(options: new EtlConfiguration(flushEvery: 100)))
    ->extractFrom(new CSVExtractor(options: ['columns' => 'auto']))
    ->transformWith(function (array $city) {
        $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
        yield $city;
    })
    ->loadInto(new JSONLoader())
    ->onLoad(fn (LoadEvent $event) => print("Loaded city `{$event->item['slug']}`".PHP_EOL))
    ->withRecipe(new LoggerRecipe(new Logger('etl-logs')));

$report = $etl->process(
    source: 'file:///tmp/cities.csv',
    destination: 'file:///tmp/cities.json',
);

var_dump($report->output); // file:///tmp/cities.json

[
    {
        "city_english_name": "New York",
        "city_local_name": "New York",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 8537673,
        "slug": "new-york"
    },
    {
        "city_english_name": "Los Angeles",
        "city_local_name": "Los Angeles",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 39776830,
        "slug": "los-angeles"
    },
    {
        "city_english_name": "Tokyo",
        "city_local_name": "東京",
        "country_iso_code": "JP",
        "continent": "Asia",
        "population": 13929286,
        "slug": "tokyo"
    }
]
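To see what the transformer in the example does on its own, here is a sketch that runs the same closure outside the library, on the city names from the CSV above. The names `$addSlug`, `$rows` and `$slugs` are chosen for illustration only, and the `Generator` return type is an addition.

```php
<?php
// Same logic as the transformWith() closure above: add a slug, yield the row.
$addSlug = function (array $city): Generator {
    $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
    yield $city;
};

// Sample rows from the CSV above (only the column the closure reads).
$rows = [
    ['city_english_name' => 'New York'],
    ['city_english_name' => 'Los Angeles'],
    ['city_english_name' => 'Tokyo'],
];

$slugs = [];
foreach ($rows as $row) {
    // The closure is a generator, so it is iterated rather than called for a
    // return value; a generator could yield zero, one or several rows.
    foreach ($addSlug($row) as $transformed) {
        $slugs[] = $transformed['slug'];
    }
}

echo implode(PHP_EOL, $slugs), PHP_EOL;
```

This prints `new-york`, `los-angeles` and `tokyo`, matching the `slug` values in the JSON output.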

I hope you'll enjoy this release as much as I enjoyed coding it! 😃

bentools-etl - 4.0-alpha16

Published by bpolaszek 11 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha15...4.0-alpha16

bentools-etl - 4.0-alpha15

Published by bpolaszek 11 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha14...4.0-alpha15

bentools-etl - 4.0-alpha14

Published by bpolaszek 11 months ago

bentools-etl - 4.0-alpha13

Published by bpolaszek 11 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha12...4.0-alpha13

bentools-etl - 4.0-alpha12

Published by bpolaszek 11 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha11...4.0-alpha12

bentools-etl - 4.0-alpha11

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha10...4.0-alpha11

bentools-etl - 4.0-alpha10

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha9...4.0-alpha10

bentools-etl - 4.0-alpha9

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha8...4.0-alpha9

bentools-etl - 4.0-alpha8

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha7...4.0-alpha8

bentools-etl - 4.0-alpha7

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha6...4.0-alpha7

bentools-etl - 4.0-alpha6

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha5...4.0-alpha6

bentools-etl - 4.0-alpha5

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha4...4.0-alpha5

bentools-etl - 4.0-alpha4

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha3...4.0-alpha4

bentools-etl - 4.0-alpha3

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha2...4.0-alpha3

bentools-etl - 4.0-alpha2

Published by bpolaszek 12 months ago

What's Changed

Full Changelog: https://github.com/bpolaszek/bentools-etl/compare/4.0-alpha1...4.0-alpha2

bentools-etl - Version 4.0 on its way!

Published by bpolaszek about 1 year ago

Hey folks! 👋

It's been more than 4 years since version 3 of bentools/etl was drafted. It never left alpha, mostly for lack of time but also, I have to admit, because of uncertainties about the design directions taken.

Introducing bentools/etl v4

PHP 8 and a lot of projects on my side came in between. I recently needed this library again, and I wanted to keep the good ideas of v3 while dropping the bad ones.

So I decided that a stable v3 will never see the light of day. Because lots of classes have been renamed and most of them became immutable, here's a brand new v4.

What's new?

  • This version requires PHP 8.2 as a minimum, is 100% covered by tests (this wasn't the case before), and uses PHPStan at its highest level to ensure type consistency. A GitHub Actions CI has also been set up.

  • It introduces a new EtlState object, which is instantiated at the beginning of the ETL process, and passed through the different steps and event listeners. The EtlExecutor (previously the Etl class) is no longer mutable, since it basically holds the Extractor, the Transformer and the Loader objects, fires events and provides you with the state you need with the EtlState.

  • The EtlState is mostly readonly, but you can still call $state->skip() to skip items, $state->stop() to stop the process, $state->flush() to request an early flush, and you can use the $state->context array to pass arbitrary data between the different steps and events during the whole workflow.

How does it work?

Here's an example of the new API:

city_english_name,city_local_name,country_iso_code,continent,population
"New York","New York",US,"North America",8537673
"Los Angeles","Los Angeles",US,"North America",39776830
Tokyo,東京,JP,Asia,13929286
...

use Bentools\ETL\EtlConfiguration;
use Bentools\ETL\EtlExecutor;
use Bentools\ETL\EventDispatcher\Event\LoadEvent;
use Bentools\ETL\Extractor\CSVExtractor;
use Bentools\ETL\Loader\JSONLoader;
use Bentools\ETL\Recipe\LoggerRecipe;
use Monolog\Logger;

$etl = (new EtlExecutor(options: new EtlConfiguration(flushEvery: 100)))
    ->extractFrom(new CSVExtractor(options: ['columns' => 'auto']))
    ->transformWith(function (array $city) {
        $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
        yield $city;
    })
    ->loadInto(new JSONLoader())
    ->onLoad(fn (LoadEvent $event) => print("Loading city `{$event->item['slug']}`".PHP_EOL))
    ->withRecipe(new LoggerRecipe(new Logger('etl-logs')));

$report = $etl->process(
    source: 'file:///tmp/cities.csv',
    destination: 'file:///tmp/cities.json',
);

var_dump($report->output); // file:///tmp/cities.json

[
    {
        "city_english_name": "New York",
        "city_local_name": "New York",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 8537673,
        "slug": "new-york"
    },
    {
        "city_english_name": "Los Angeles",
        "city_local_name": "Los Angeles",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 39776830,
        "slug": "los-angeles"
    },
    {
        "city_english_name": "Tokyo",
        "city_local_name": "東京",
        "country_iso_code": "JP",
        "continent": "Asia",
        "population": 13929286,
        "slug": "tokyo"
    }
]

I hope you'll enjoy this release as much as I enjoyed coding it! 😃

bentools-etl -

Published by bpolaszek about 4 years ago

  • Etl::process() can now have no args
  • Refactored some constructors
  • Add early flush feature (loader can now flush on demand from ETL or from the event system, and will know if it's a partial flush or a full flush)
  • Handled Doctrine namespace change
  • PHP 7.4 support (sorry, we're late)
bentools-etl - Init loader hook

Published by bpolaszek over 5 years ago

  • Signature changes on some extractors / loaders (CSV, JSON).
  • It's now possible to pass arbitrary arguments to LoaderInterface::init(); they are processed just before the first item is loaded. These arbitrary, optional arguments are now part of the Etl::process() signature. This allows a single loader to have multiple options and/or targets at runtime, and to reset its state at each ETL process.
  • It's possible to hook on the loader.init event with EtlBuilder::onLoaderInit().
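The point of init-time arguments can be sketched without the library. Everything below is hypothetical: `ToyLoader` and `toyProcess()` are illustration-only names, not the real LoaderInterface or Etl::process() signatures. The sketch only assumes what the notes above say, namely that extra arguments are forwarded to init() just before the first item is loaded.

```php
<?php
// Hypothetical loader, NOT the library's LoaderInterface: it only shows why
// init-time arguments let one loader instance serve several targets.
final class ToyLoader
{
    /** @var array<string, list<mixed>> items written, grouped by target */
    public array $written = [];
    private ?string $target = null;
    /** @var list<mixed> items waiting to be committed */
    private array $pending = [];

    // Called once per run, before the first item: picks the target and
    // resets any state left over from a previous run.
    public function init(string $target): void
    {
        $this->target = $target;
        $this->pending = [];
    }

    public function load(mixed $item): void
    {
        $this->pending[] = $item;
    }

    public function commit(): void
    {
        $this->written[$this->target] = $this->pending;
        $this->pending = [];
    }
}

// Toy counterpart of a process() call that forwards extra arguments to init().
function toyProcess(iterable $items, ToyLoader $loader, mixed ...$initArgs): void
{
    $initialized = false;
    foreach ($items as $item) {
        if (!$initialized) {
            $loader->init(...$initArgs); // just before the first item
            $initialized = true;
        }
        $loader->load($item);
    }
    if ($initialized) {
        $loader->commit();
    }
}

// One loader instance, two runs, two different targets chosen at runtime.
$loader = new ToyLoader();
toyProcess(['a', 'b'], $loader, 'first.json');
toyProcess(['c'], $loader, 'second.json');

print_r($loader->written);
```

Because the target arrives through init() rather than the constructor, the same loader object can be reused across runs without leaking state from one run into the next.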
Package Rankings
Top 8.56% on Packagist.org