🧙 Build, run, and manage data pipelines for integrating and transforming data.
APACHE-2.0 License
Bot releases are hidden (Show)
Published by mattppal about 1 year ago
One of our top contributors @christopherscholz just delivered a huge feature! A completely streamlined dbt Block!
Here are some of the highlights:
Uses `dbt-core` directly, instead of calling it via a subprocess, which allows using all of dbt's functionality
Adds the `dbt seed` command
Updates to `dbt-core==1.4.7`
Refactors `DBTBlock` into `DBTBlockSQL` and `DBTBlockYAML`
There's lots to unpack in this one, so be sure to read more in the PR below and check out our updated docs.
by @christopherscholz in https://github.com/mage-ai/mage-ai/pull/3497
Google Cloud users rejoice! Mage already supports storing block output variables in S3, but thanks to contributor @luizarvo, you can now do the same in GCS!
Check out the PR for more details and read-up on implementation here.
by @luizarvo in https://github.com/mage-ai/mage-ai/pull/3597
Another community-led integration! Thank you @mohamad-balouza for adding a Tableau source for data integration pipelines!
by @mohamad-balouza in https://github.com/mage-ai/mage-ai/pull/3581
Last week, we rolled out a ton of new DuckDB functionality; this week, we're adding DuckDB loader and exporter templates! Be sure to check them out when building your new DuckDB pipelines! 😄
by @matrixstone in https://github.com/mage-ai/mage-ai/pull/3553
Exciting frontend improvements are coming your way! You can now retry all of a pipeline's incomplete block runs from the UI. This includes all block runs that do not have `completed` status.
`S3Storage` to store block output variables by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3559 and https://github.com/mage-ai/mage-ai/pull/3588
`dbt seed` requiring variables by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3579
`pipelineRowsSorted` when clearing search query by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3596
`condition_failed` check for dynamic blocks by @dy46 in https://github.com/mage-ai/mage-ai/pull/3595
`remote_variables_dir` for variable manager by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3562
`TIMESTAMP` in redshift convert by @RobinFrcd in https://github.com/mage-ai/mage-ai/pull/3567
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.26...0.9.28
Published by mattppal about 1 year ago
Folks, we've got some ducking magic going on in this release. You can now use DuckDB files inside Mage's SQL Blocks. 🥳 🦆 🪄
You can use data loaders to `CREATE` and `SELECT` from DuckDB tables, as well as write new data to DuckDB.
Check out our docs to get started today!
by @matrixstone in https://github.com/mage-ai/mage-ai/pull/3463
This is another huge feature— a complete overhaul of our Charts functionality!
There are 2 new charts dashboards: a dashboard for all your pipelines and a dashboard for each pipeline.
You can add charts of various types with different sources of data and use these dashboards for observability or for analytics.
There's a ton to unpack here, so be sure to read more in our docs.
by @tommydangerous
This one is a big quality of life improvement: Mage can now display datetimes in local timezones... No more UTC conversion! Just navigate to Settings > Workspace > Preferences to enable a new timezone!
by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3481
A big shoutout to @mazieb, who added an InfluxDB destination last week: you can now stream data from InfluxDB via Mage, too. Thanks for your hard work!
Read more in our docs here.
by @mazieb in https://github.com/mage-ai/mage-ai/pull/3430
Another frequently requested feature shipping this week, courtesy of @dy46: custom block-level logging!
You can now specify logging at the block-level by directly changing the logger settings:
@data_loader
def load_data(*args, **kwargs):
kwarg_logger = kwargs.get('logger')
kwarg_logger.info('Test logger info')
kwarg_logger.warning('Test logger warning')
kwarg_logger.error('Test logger error')
...
See more in our docs here.
by @dy46 in https://github.com/mage-ai/mage-ai/pull/3473
`zmq` context destroy by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3514
`os.path.join` error on non-POSIX systems by @christopherscholz in https://github.com/mage-ai/mage-ai/pull/3520
`dtype` `int` is not always cast as `int64` by @christopherscholz in https://github.com/mage-ai/mage-ai/pull/3522
`timestamp out of range` on Windows by @christopherscholz in https://github.com/mage-ai/mage-ai/pull/3519
`mage_ai` and `mage_integrations` by @christopherscholz in https://github.com/mage-ai/mage-ai/pull/3525
`Execute` pipeline action for streaming pipelines by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3492
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.23...0.9.26
Published by wangxiaoyou1993 about 1 year ago
📰 Hot off the press, you can now add and update block variables via the Mage UI!
Check out our docs to learn more about block variables.
by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3451
You can now pull PowerBI data into all your Mage projects!
A big shoutout to @mohamad-balouza for this awesome contribution. 🎉
Read more about the connection here.
by @mohamad-balouza in https://github.com/mage-ai/mage-ai/pull/3433
@mohamad-balouza was hard at work! You can also integrate Knowi data in your Mage projects! 🤯
Read more about the connection here.
by @mohamad-balouza in https://github.com/mage-ai/mage-ai/pull/3446
A big shoutout to @mazieb! You can now stream data to InfluxDB via Mage. Thanks for your hard work!
Read more in our docs here.
by @mazieb in https://github.com/mage-ai/mage-ai/pull/3378
`status` query from block runs request by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3444
`start_time` and `execution_date` in `should_schedule` by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3466
`local_python_force` executor type and configuring ECS executor launch type by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3447
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.21...0.9.23
Published by mattppal about 1 year ago
Mage now supports running the whole pipeline process in one AWS ECS task instead of running pipeline blocks in separate ECS tasks! This allows you to speed up pipeline execution in ECS tasks by saving ECS task startup time.
Here's an example pipeline `metadata.yaml`:
blocks:
- ...
- ...
executor_type: ecs
run_pipeline_in_one_process: true
name: example_pipeline
...
The ECS `executor_config` can also be configured at the pipeline level.
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3418
Postgres enthusiasts rejoice! You can now stream data directly to Postgres via streaming pipelines! 😳
Check out the docs for more information on this handy new destination.
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3423
You can now sort the Block Runs table by clicking on the column headers! Those of us who are passionate about having our ducks in a row are happy about this one! 🦆
by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3356
Bothered by that one run you'd rather forget? Individual runs can be dropped from the pipeline runs table, so you don't have to worry about them anymore!
by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3370
Much like Buzz Lightyear, we're headed "to infinity and beyond," but we get that your pipelines shouldn't be. This feature allows you to configure timeouts for both blocks and pipelines— if a run exceeds the timeout, it will be marked as failed.
by @dy46 in https://github.com/mage-ai/mage-ai/pull/3399
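Conceptually, enforcing a run-level timeout comes down to comparing a run's elapsed time against the configured limit. A hedged sketch of that idea, not Mage's actual scheduler code; `check_timeout` and its arguments are hypothetical names:

```python
from datetime import datetime, timedelta

def check_timeout(started_at, timeout_seconds, now=None):
    """Return 'failed' once the run has exceeded its configured timeout."""
    now = now or datetime.utcnow()
    if now - started_at > timedelta(seconds=timeout_seconds):
        return 'failed'
    return 'running'

# A run started at midnight with a 1-hour timeout, checked two hours later:
print(check_timeout(datetime(2023, 1, 1, 0, 0), 3600,
                    now=datetime(2023, 1, 1, 2, 0)))  # failed
```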
`nextjs` local build type error by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3389
`NULL` headers breaking API Source by @Luishfs in https://github.com/mage-ai/mage-ai/pull/3386
`setup_17.x` is no longer supported by @christopherscholz in https://github.com/mage-ai/mage-ai/pull/3405
`lsn` and `_mage_deleted_at` in initial log_based sync by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3394
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.19...0.9.21
Published by mattppal about 1 year ago
As a part of this release, we have some exciting new AI functionality: you can now generate pipelines and add inline comments using AI. 🤯
See the following links for documentation of the new functionality.
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3365 and https://github.com/mage-ai/mage-ai/pull/3359
Elasticsearch is now available as a streaming sink. 🥳🎉
A big thanks to @sujiplr for their contribution!
by @sujiplr in https://github.com/mage-ai/mage-ai/pull/3335
The GitHub API is now available as a data integration: you can pull in commits, changes, and more from the GitHub API!
by @mattppal in https://github.com/mage-ai/mage-ai/pull/3252
This is a big one for our dbt users out there! You can now use `{{ variables('...') }}` in dbt `profiles.yml`.
jaffle_shop:
  outputs:
    dev:
      dbname: postgres
      host: host.docker.internal
      port: 5432
      schema: "{{ variables('dbt_schema') }}"
  target: dev
That means pulling in custom Mage variables, directly!
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3337
Some great frontend improvements are going down! You can now sort pipelines on the dashboards, both with and without groups/filters enabled!
by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3327
Another awesome community contribution— this one also on the frontend. Thanks to @splatcollision, we now have inline documentation for our data integration sources!
Now, you can see exactly what you need, directly from the UI!
by @splatcollision in https://github.com/mage-ai/mage-ai/pull/3349
You might notice our docs have a new look! We've changed how we think about side-navs and tabs.
Our goal is to help you find what you need, faster. We hope you like it!
by @mattppal in https://github.com/mage-ai/mage-ai/pull/3324 and https://github.com/mage-ai/mage-ai/pull/3367
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.16...0.9.19
Published by mattppal about 1 year ago
A data product is any piece of data created by 1 or more blocks in a pipeline. For example, a block can create a data product that is an in-memory DataFrame, or a JSON serializable data structure, or a table in a database.
A global data product is a data product that can be referenced and used in any pipeline across the entire project. A global data product is entered into the global registry (`global_data_products.yaml`) under a unique ID (UUID) and it references an existing pipeline. Learn more here.
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3206
We now have some awesome block templates for our MySQL users out there!
Check them out:
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3294
In the `metadata.yml` of a standard batch pipeline, you can now configure running pipelines in a single process:
blocks:
...
run_pipeline_in_one_process: true
...
You may now also use `kwargs['context']` by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3280
`async_generate_block_with_description`: `upstream_blocks` param by @matrixstone in https://github.com/mage-ai/mage-ai/pull/3313
`oracledb` lib to mage-ai by @Luishfs in https://github.com/mage-ai/mage-ai/pull/3319
`base.py`: Add XML support for file read/write in S3, GCP, and other cloud storage providers by @adelcast in https://github.com/mage-ai/mage-ai/pull/3279
`created_at` property by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3317
`google_ads` source by @Luishfs in https://github.com/mage-ai/mage-ai/pull/3322
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.14...0.9.16
Published by mattppal about 1 year ago
Ahoy! There's a new connector on-board the Mage ship: sFTP. 🚢
by @Luishfs in https://github.com/mage-ai/mage-ai/pull/3214
Pipeline schedules can now be tagged to help categorize them in the UI. 🎉
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3222
😎 Manual pipeline runs can now be cancelled via the UI.
by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3236
`db_connection_url` by @dy46 in https://github.com/mage-ai/mage-ai/pull/3189
`dbt-snowflake` when password null by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3225
`frontend_dist_base_path` is created by @dy46 in https://github.com/mage-ai/mage-ai/pull/3224
`spark.jars`: use the file names and neglect paths by @csharplus in https://github.com/mage-ai/mage-ai/pull/3264
`dbt-bigquery` by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3273
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.11...0.9.14
Published by mattppal about 1 year ago
There's a new environment variable in town: `MAGE_BASE_PATH`! 🤠
Mage now supports adding a prefix to a Mage URL, i.e. `localhost:6789/my_prefix/`
by @dy46 in https://github.com/mage-ai/mage-ai/pull/3141
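To picture what the prefix does to routes, here's a minimal sketch; the `build_url` helper is hypothetical and not part of Mage's codebase:

```python
import os

def build_url(route, base_path=None):
    # Fall back to the MAGE_BASE_PATH environment variable when no
    # explicit prefix is given.
    base_path = base_path or os.environ.get('MAGE_BASE_PATH', '')
    prefix = '/' + base_path.strip('/') if base_path else ''
    return prefix + '/' + route.lstrip('/')

print(build_url('pipelines', base_path='my_prefix'))  # /my_prefix/pipelines
```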
`DROP TABLE` in raw SQL blocks
Raw SQL blocks can now drop tables! 🎉
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3184
Our MongoDB users will be happy about this one: MongoDB can now be accessed via a connection string, for example: `mongodb+srv://{username}:{password}@{host}`
Doc: https://docs.mage.ai/integrations/databases/MongoDB#add-credentials
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3188
Data integration pipelines just got another great destination— Salesforce!
by @Luishfs in https://github.com/mage-ai/mage-ai/pull/2772
`bool` data type conversion issue with the ClickHouse exporter by @csharplus in https://github.com/mage-ai/mage-ai/pull/3172
`DISABLE_TERMINAL` environment variable by @juancaven1988 in https://github.com/mage-ai/mage-ai/pull/3174
`frontend_dist_base_path_template` in package by @dy46 in https://github.com/mage-ai/mage-ai/pull/3178
`upload_kwargs` to allow overwriting an existing file or use any other options by @sumanshusamarora in https://github.com/mage-ai/mage-ai/pull/3148
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.10...0.9.11
Published by mattppal about 1 year ago
Block Creation
Document Generation
From the following PRs:
Leveraging `write_pandas` in the `snowflake-connector-python` library, this feature enhances the speed of batch uploads using Snowflake destinations 🤯 by @csharplus in https://github.com/mage-ai/mage-ai/pull/2896
Now, Mage can auto-remove logs after your retention period expires!
Configure `retention_period` in `logging_config`:
logging_config:
  retention_period: '15d'
Run command to delete old logs:
mage clean-old-logs k8s_project
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3139
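The retention idea boils down to deleting log files older than the configured period. A minimal sketch assuming mtime-based age; the function name and directory layout are hypothetical, not Mage's implementation:

```python
import os
import time

def clean_old_logs(log_dir, retention_days):
    """Delete files under log_dir whose mtime is older than retention_days."""
    cutoff = time.time() - retention_days * 86400
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```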
MongoDB is now supported as a destination! 🎉 by @Luishfs in https://github.com/mage-ai/mage-ai/pull/3084
It's now possible to configure concurrency at the pipeline level:
concurrency_config:
  block_run_limit: 1
  pipeline_run_limit: 1
Doc: https://docs.mage.ai/design/data-pipeline-management#pipeline-level-concurrency
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3112
Mage's UI has been improved to feature a new add-block flow! by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3094, https://github.com/mage-ai/mage-ai/pull/3074, & https://github.com/mage-ai/mage-ai/pull/3106
Mage now supports custom k8s executor configuration:
k8s_executor_config:
  service_account_name: mageai
  job_name_prefix: "{{ env_var('KUBE_NAMESPACE') }}"
  container_config:
    image: mageai/mageai:0.9.7
    env:
      - name: USER_CODE_PATH
        value: /home/src/k8s_project
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3127
`endpoint_url` in logger
You can now configure a custom `endpoint_url` in S3 loggers, allowing you to customize how messages are displayed!
logging_config:
  type: s3
  level: INFO
  destination_config:
    bucket: <bucket name>
    prefix: <prefix path>
    aws_access_key_id: <(optional) AWS access key ID>
    aws_secret_access_key: <(optional) AWS secret access key>
    endpoint_url: <(optional) custom endpoint url>
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3137
Text and HTML from block output is now rendered!
by @dy46 in https://github.com/mage-ai/mage-ai/pull/3079
ClickHouse is now supported as an integration destination! by @Luishfs in https://github.com/mage-ai/mage-ai/pull/3005
You can now set custom timeouts for all of your ECS tasks! by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3144
A single Postgres database can now support multiple Mage instances ✨ by @csharplus in https://github.com/mage-ai/mage-ai/pull/3070
`pymssql` dependency by @dy46 in https://github.com/mage-ai/mage-ai/pull/3114
`pipeline_scheduler` by @dy46 in https://github.com/mage-ai/mage-ai/pull/3102
`check_status` method bug fix to look at pipelines run between a specific time period by @sumanshusamarora in https://github.com/mage-ai/mage-ai/pull/3115
`pipeline_uuids` by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3090
`VariableResource` method by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3099
`io-config` by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3130
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.8...0.9.10
Published by mattppal about 1 year ago
With this release, Magers now have the option to create pipeline templates, then use those to populate new pipelines.
Additionally, you may now browse, create, and use custom block templates in your pipelines. 🎉
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3064, https://github.com/mage-ai/mage-ai/pull/3042 and https://github.com/mage-ai/mage-ai/pull/3065
Your pipeline filters and groups are now sticky— that means setting filters/groups will persist through your Mage session.
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3059
You can now run the web server and scheduler in separate containers, allowing for horizontal scaling of the scheduler! Read more in our docs.
Run scheduler only:
/app/run_app.sh mage start project_name --instance-type scheduler
Run web server only:
/app/run_app.sh mage start project_name --instance-type web_server
by @wangxiaoyou1993 in https://github.com/mage-ai/mage-ai/pull/3016
Data integration streams may now be executed in parallel!
by @dy46 in https://github.com/mage-ai/mage-ai/pull/1474
The secrets management backend can now handle multiple environments!
by @dy46 in https://github.com/mage-ai/mage-ai/pull/3000
Interval datetimes and durations are now returned in block variables. Check out our docs for more info!
by @tommydangerous in https://github.com/mage-ai/mage-ai/pull/3058 and https://github.com/mage-ai/mage-ai/pull/3068
`delete` and add documentation by @mattppal in https://github.com/mage-ai/mage-ai/pull/3012
`/files` route to backend server and fix TypeError in FileEditor by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3041
`Updated At` column to Pipelines list page by @johnson-mage in https://github.com/mage-ai/mage-ai/pull/3050
Full Changelog: https://github.com/mage-ai/mage-ai/compare/0.9.4...0.9.8
Published by thomaschung408 over 1 year ago
Mage now supports Azure Data Lake as a streaming destination!
Docs: https://docs.mage.ai/guides/streaming/destinations/azure_data_lake
Tags can now be applied to pipelines. Users can leverage the pipeline view to apply filters or group pipelines by tag.
You can now prefix your k8s executor jobs! Here’s an example k8s executor config file:
k8s_executor_config:
  job_name_prefix: data-prep
  resource_limits:
    cpu: 1000m
    memory: 2048Mi
  resource_requests:
    cpu: 500m
    memory: 1024Mi
  service_account_name: default
See the documentation for further details.
Mage no longer prints data integration settings in logs: a big win for security. 🔒
`azure_secret_var` syntax: docs.
`VARCHAR` instead of `VARCHAR(255)`: the MySQL loader now uses `TEXT` for strings to avoid truncation.
`COPY` step to reduce Docker build time.
Published by thomaschung408 over 1 year ago
You can now use Mage with multiple workspaces in the cloud. Mage has a built-in workspace manager that can be enabled in production. This feature is similar to multi-development environments, but some settings can be shared across workspaces. For example, the project owner can set workspace-level permissions for users. The additional features currently supported are:
Upcoming features:
Doc: https://docs.mage.ai/developing-in-the-cloud/workspaces/overview
Add "Overview" page to dashboard providing summary of pipeline run metrics and failures.
Support all Git operations through UI. Authenticate with GitHub then pull from a remote repository, push local changes to a remote repository, and create pull requests for a remote repository.
Doc: https://docs.mage.ai/production/data-sync/github
Added the `ENABLE_NEW_RELIC` environment variable to enable or disable New Relic monitoring.
Doc: https://docs.mage.ai/production/observability/newrelic
Enable signing in with Microsoft Active Directory account in Mage.
Doc: https://docs.mage.ai/production/authentication/microsoft
Doc: https://docs.mage.ai/production/authentication/overview#ldap
Added `LDAP_DEFAULT_ACCESS` so that the default access can be customized.
There are two ways to configure Mage to sync from Git on server start:
The "Sync on server start up" option in the Git settings UI
The `GIT_SYNC_ON_START` environment variable (options: 0 or 1)
Doc: https://docs.mage.ai/production/data-sync/git#git-settings-as-environment-variables
Shout out to Mohamad Balouza for his contribution of adding the Mode Analytics source to Mage data integration pipeline.
Support using the S3 source to connect to MinIO by configuring the `aws_endpoint` in the config.
Use `TIMESTAMP_TZ` as the column type for Snowflake datetime columns.
Added an OracleDB Data Loader block.
`Schema` was not properly set when checking table existence; `dbo` is now used as the default schema if no schema is set.
Shout out to Daesgar for his contribution of adding support for running ClickHouse DBT models in Mage.
Add a DBT block that can run any generic command
Turn on output to logs when running a single block in the notebook
When running a block in the notebook, provide an option to only run the upstream blocks that haven’t been executed successfully.
Change the color of a custom block from the UI.
Show what pipelines are using a particular block
Enhanced pipeline settings page and block settings page
Enhance dependency tree node to show callbacks, conditionals, and extensions
Save trigger from UI to code
Allow setting service account name for k8s executor
k8s_executor_config:
  resource_limits:
    cpu: 1000m
    memory: 2048Mi
  resource_requests:
    cpu: 500m
    memory: 1024Mi
  service_account_name: custom_service_account_name
Support customizing the timeout seconds in GCP cloud run config.
gcp_cloud_run_config:
  path_to_credentials_json_file: "/path/to/credentials_json_file"
  project_id: project_id
  timeout_seconds: 600
Check ECS task status after running the task.
Fixed `TypeError: cannot pickle '_thread.lock' object` in the deepcopy from `handle_batch_events_recursively`, falling back to the copy method.
Add json value macro. Example usage: `{{ json_value(aws_secret_var('test_secret_key_value'), 'k1') }}`
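Functionally, the macro behaves like pulling one key out of a JSON string. A hedged plain-Python re-implementation of that behavior (not Mage's actual macro code):

```python
import json

def json_value(json_string, key):
    # Parse the JSON payload (e.g. a secret's value) and return one key.
    return json.loads(json_string)[key]

print(json_value('{"k1": "v1", "k2": "v2"}', 'k1'))  # v1
```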
Allow slashes in block_uuid when downloading block output. The regex for the block output download endpoint would not capture block_uuids with slashes in them, so this fixes that.
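The fix is essentially a looser capture group. An illustrative before/after with hypothetical route patterns, not Mage's actual endpoint regex:

```python
import re

# Before: [^/]+ stops at the first slash, so nested block_uuids don't match.
old = re.compile(r'^/block_outputs/(?P<block_uuid>[^/]+)$')
# After: .+ captures the rest of the path, slashes included.
new = re.compile(r'^/block_outputs/(?P<block_uuid>.+)$')

path = '/block_outputs/dbt/model_a'
print(old.match(path))                      # None
print(new.match(path).group('block_uuid'))  # dbt/model_a
```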
Fix renaming block.
Fix user auth when disable notebook edits is enabled.
Allow `JWT_SECRET` to be modified via env var. The `JWT_SECRET` for encoding and decoding access tokens was hardcoded; the fix allows users to update it through an environment variable.
Hide duplicate shortcut items in editor context menu
When changing the name of a block or creating a new block, auto-create non-existent folders if the block name is using nested block names.
Fix trigger count in pipeline dashboard
Fix copy text for secrets
Fix git sync `asyncio` issue
Fix circular import when importing the `get_secret_value` method
Shorten branch name in the header. If the branch name is longer than 21 characters, show an ellipsis.
Replace hard-to-read dark blue font in code block output with much more legible yellow font.
Show error popup if error occurs when updating pipeline settings.
Update tree node when block status changes
Prevent sending notification multiple times for multiple block failures
Published by thomaschung408 over 1 year ago
Add conditional block to Mage. The conditional block is an "Add-on" block that can be added to an existing block within a pipeline. If the conditional block evaluates as False, the parent block will not be executed.
Doc: https://docs.mage.ai/development/blocks/conditionals/overview
For standard pipelines (not currently supported in integration or streaming pipelines), you can save the output of a block that has been run as a CSV file. You can save the block output in Pipeline Editor page or Block Runs page.
Doc: https://docs.mage.ai/orchestration/pipeline-runs/saving-block-output-as-csv
Mage supports customizing the Spark session for a pipeline by specifying the `spark_config` in the pipeline `metadata.yaml` file. The pipeline-level `spark_config` will override the project-level `spark_config` if specified.
Doc: https://docs.mage.ai/integrations/spark-pyspark#custom-spark-session-at-the-pipeline-level
Doc: https://github.com/mage-ai/mage-ai/tree/master/mage_integrations/mage_integrations/sources/api
Users can customize the notification templates of different channels (Slack, email, etc.) in the project metadata.yaml. Here are the supported variables that can be interpolated in the message templates: `execution_time`, `pipeline_run_url`, `pipeline_schedule_id`, `pipeline_schedule_name`, `pipeline_uuid`.
Example config in project's `metadata.yaml`:
notification_config:
  slack_config:
    webhook_url: "{{ env_var('MAGE_SLACK_WEBHOOK_URL') }}"
  message_templates:
    failure:
      details: >
        Failure to execute pipeline {pipeline_run_url}.
        Pipeline uuid: {pipeline_uuid}. Trigger name: {pipeline_schedule_name}.
        Test custom message.
Doc: https://docs.mage.ai/production/observability/alerting-slack#customize-message-templates
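The placeholders in the template above appear to follow Python `str.format` style; a small sketch of the interpolation, with made-up values for the variables:

```python
template = (
    'Failure to execute pipeline {pipeline_run_url}. '
    'Pipeline uuid: {pipeline_uuid}. Trigger name: {pipeline_schedule_name}.'
)
message = template.format(
    pipeline_run_url='http://localhost:6789/pipelines/demo/runs/1',
    pipeline_uuid='demo',
    pipeline_schedule_name='daily_trigger',
)
print(message)
```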
Mage stores orchestration data, user data, and secrets data in a database. In addition to SQLite and Postgres, Mage supports using MSSQL and MySQL as the database engine now.
MSSQL docs:
MySQL docs:
Mage supports connecting to MinIO and Wasabi by specifying the `AWS_ENDPOINT` field in the S3 config now.
Doc: https://docs.mage.ai/integrations/databases/S3#minio-support
To maximize block reuse, you can use dynamic and replica blocks in combination.
`CREATE SCHEMA IF NOT EXISTS` is not supported by MSSQL. Provided a default command in BaseSQL -> `build_create_schema_command`, and an overridden implementation in MSSQL -> `build_create_schema_command` containing compatible syntax. (Kudos to gjvanvuuren)
Fix `kwargs` passing so that RabbitMQ messages can be acknowledged correctly.
Published by thomaschung408 over 1 year ago
Support reusing same block multiple times in a single pipeline.
Doc: https://docs.mage.ai/design/blocks/replicate-blocks
Support running Spark code on Yarn cluster with Mage.
Doc: https://docs.mage.ai/integrations/spark-pyspark#hadoop-and-yarn-cluster-for-spark
Mage supports configuring automatic retry for block runs in the following ways:
Add `retry_config` to the project's `metadata.yaml`. This `retry_config` will be applied to all block runs.
Add `retry_config` to the block config in the pipeline's `metadata.yaml`. The block-level `retry_config` will override the global `retry_config`.
Example config:
retry_config:
  # Number of retry times
  retries: 0
  # Initial delay before retry. If exponential_backoff is true,
  # the delay time is multiplied by 2 for the next retry
  delay: 5
  # Maximum time between the first attempt and the last retry
  max_delay: 60
  # Whether to use exponential backoff retry
  exponential_backoff: true
Doc: https://docs.mage.ai/orchestration/pipeline-runs/retrying-block-runs#automatic-retry
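One plausible reading of the schedule implied by the config above: the delay starts at `delay`, doubles on each retry when `exponential_backoff` is true, and is capped by `max_delay`. A sketch of that reading, not Mage's scheduler code:

```python
def retry_delays(retries, delay, max_delay, exponential_backoff):
    """Compute the sequence of waits before each retry."""
    delays = []
    current = delay
    for _ in range(retries):
        delays.append(min(current, max_delay))
        if exponential_backoff:
            current *= 2  # double the delay for the next retry
    return delays

print(retry_delays(retries=4, delay=5, max_delay=60, exponential_backoff=True))
# [5, 10, 20, 40]
```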
When running a DBT block with language YAML, Mage interpolates and merges the user-defined `--vars` in the block's code into the variables that Mage automatically constructs.
--select demo/models --vars '{"demo_key": "demo_value", "date": 20230101}'
--select demo/models --vars {"demo_key":"demo_value","date":20230101}
--select demo/models --vars '{"global_var": {{ test_global_var }}, "env_var": {{ test_env_var }}}'
--select demo/models --vars {"refresh":{{page_refresh}},"env_var":{{env}}}
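The merge described above can be pictured as dict merging, where (in one plausible merge order) the user-defined vars win over Mage's automatically constructed ones on key clashes; the variable names and values here are hypothetical:

```python
import json

mage_vars = {'env': 'dev', 'execution_date': '20230101'}
user_vars = {'demo_key': 'demo_value', 'date': 20230101}

# Later entries win, so user-defined --vars override Mage's variables.
merged = {**mage_vars, **user_vars}
print(json.dumps(merged, separators=(',', ':')))
```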
Support `dbt_project.yml` custom project names and custom profile names that are different than the DBT folder name
Allow user to configure block to run DBT snapshot
Support using dynamic child blocks for SQL blocks
Doc: https://docs.mage.ai/design/blocks/dynamic-blocks#dynamic-sql-blocks
If your Mage app is deployed on Microsoft Azure with Mage’s terraform scripts, you can choose to launch separate Azure container instances to execute blocks.
mage start project_name --instance-type scheduler
mage start project_name --instance-type web_server
mage start project_name --instance-type server_and_scheduler
Support “Add”, “Rename”, “Move”, “Delete” operations on folders.
Allow specifying an `envs` value to apply triggers only in certain environments.
Example:
triggers:
- name: test_example_trigger_in_prod
  schedule_type: time
  schedule_interval: "@daily"
  start_time: 2023-01-01
  status: active
  envs:
  - prod
- name: test_example_trigger_in_dev
  schedule_type: time
  schedule_interval: "@hourly"
  start_time: 2023-03-01
  status: inactive
  settings:
    skip_if_previous_running: true
    allow_blocks_to_fail: true
  envs:
  - dev
Doc: https://docs.mage.ai/guides/triggers/configure-triggers-in-code#create-and-configure-triggers
Add indices to schedule models to speed up DB queries.
Fixed the “Too many open files” issue: added the `ULIMIT_NO_FILE` environment variable to increase the maximum number of open files in Mage deployed on AWS, GCP and Azure.
Fix `git_branch` resource blocking page loads. The `git clone` command could cause the entire app to hang if the host wasn't added to known hosts. The `git clone` command is updated to run as a separate process with a timeout, so it won't block the entire app if it's stuck.
Fix bug: when adding a block in between blocks in pipeline with two separate root nodes, the downstream connections are removed.
Fix DBT error `KeyError: 'file_path'`. Check for `file_path` before calling the `parse_attributes` method to avoid the KeyError.
Improve the coding experience when working with Snowflake data provider credentials. Allow more flexibility in Snowflake SQL block queries. Doc: https://docs.mage.ai/integrations/databases/Snowflake#methods-for-configuring-database-and-schema
Pass parent block’s output and variables to its callback blocks.
Fix missing input field and select field descriptions in charts.
Fix bug: Missing values template chart doesn’t render.
Convert `numpy.ndarray` to `list` if the column type is list when fetching input variables for blocks.
Fix runtime and global variables not available in the keyword arguments when executing block with upstream blocks from the edit pipeline page.
View full Changelog
Published by thomaschung408 over 1 year ago
Mage now supports more complex streaming pipelines: you can use more than one transformer and more than one sink in a streaming pipeline.
Here is an example streaming pipeline with multiple transformers and sinks.
Doc for streaming pipeline: https://docs.mage.ai/guides/streaming/overview
Allow using custom Spark configuration to create Spark session used in the pipeline.
```yaml
spark_config:
  # Application name
  app_name: 'my spark app'
  # Master URL to connect to
  # e.g., spark_master: 'spark://host:port', or spark_master: 'yarn'
  spark_master: 'local'
  # Executor environment variables
  # e.g., executor_env: {'PYTHONPATH': '/home/path'}
  executor_env: {}
  # Jar files to be uploaded to the cluster and added to the classpath
  # e.g., spark_jars: ['/home/path/example1.jar']
  spark_jars: []
  # Path where Spark is installed on worker nodes,
  # e.g. spark_home: '/usr/lib/spark'
  spark_home: null
  # List of key-value pairs to be set in SparkConf
  # e.g., others: {'spark.executor.memory': '4g', 'spark.executor.cores': '2'}
  others: {}
```
Doc for running PySpark pipeline: https://docs.mage.ai/integrations/spark-pyspark#standalone-spark-cluster
New data integration source DynamoDB is added.
Support `timestamptz` as the data type for datetime columns in the Postgres destination.
Improved Mage's file editor so that users can edit files without going into a pipeline.
Mage uses Polars to speed up writing block output (DataFrame) to disk, reducing the time of fetching and writing a DataFrame with 2 million rows from 90s to 15s.
Mage automatically adds a default `.gitignore` file when initializing a project:

```
.DS_Store
.file_versions
.gitkeep
.log
.logs/
.preferences.yaml
.variables/
__pycache__/
docker-compose.override.yml
logs/
mage-ai.db
mage_data/
secrets/
```
Fix `TypeError: Instance and class checks can only be used with @runtime protocols`.
View full Changelog
Published by thomaschung408 over 1 year ago
Add code templates to fetch data from and export data to MongoDB.
Example MongoDB config in `io_config.yaml`:

```yaml
version: 0.1.1
default:
  MONGODB_DATABASE: database
  MONGODB_HOST: host
  MONGODB_PASSWORD: password
  MONGODB_PORT: 27017
  MONGODB_COLLECTION: collection
  MONGODB_USER: user
```
Data loader template
Data exporter template
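Under the hood, a loader resolves these keys into a connection; as a stdlib-only illustration, the helper below assembles a standard MongoDB connection string from the config keys above (the helper name is hypothetical and this is not Mage's actual template code, which uses its MongoDB IO class):

```python
def build_mongodb_uri(cfg):
    """Assemble a standard MongoDB connection URI from io_config-style keys."""
    return 'mongodb://{user}:{pw}@{host}:{port}/{db}'.format(
        user=cfg['MONGODB_USER'],
        pw=cfg['MONGODB_PASSWORD'],
        host=cfg['MONGODB_HOST'],
        port=cfg['MONGODB_PORT'],
        db=cfg['MONGODB_DATABASE'],
    )
```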
Support `renv` for R blocks. `renv` is installed in the Mage Docker image by default, so users can use the `renv` package to manage R dependencies for their projects.
Doc for the `renv` package: https://cran.r-project.org/web/packages/renv/vignettes/renv.html
Support running streaming pipelines with the k8s executor to scale up streaming pipeline execution.
It can be configured in the pipeline's metadata.yaml with the `executor_type` field. Here is an example:
```yaml
blocks:
- ...
- ...
executor_count: 1
executor_type: k8s
name: test_streaming_kafka_kafka
uuid: test_streaming_kafka_kafka
```
When cancelling the pipeline run in Mage UI, Mage will kill the k8s job.
Support running Spark DBT models in Mage. Currently, only the connection method `session` is supported.
Follow this doc to set up the Spark environment in Mage. Follow the instructions in https://docs.mage.ai/tutorials/setup-dbt to set up DBT. Here is an example DBT Spark `profiles.yml`:
```yaml
spark_demo:
  target: dev
  outputs:
    dev:
      type: spark
      method: session
      schema: default
      host: local
```
Update the multi-development environment to go through the user authentication flow. Multi-development environment is used to manage development instances on cloud.
Doc for multi-development environment: https://docs.mage.ai/developing-in-the-cloud/cloud-dev-environments/overview
Shout out to Joseph Corrado for his contribution of adding pre-commit hooks to Mage to run code checks before committing and pushing the code.
Doc: https://github.com/mage-ai/mage-ai/blob/master/README_dev.md
Shout out to hjhdaniel for his contribution of adding the method for deleting secret keys to Mage.
Example code:

```python
from mage_ai.data_preparation.shared.secrets import delete_secret

delete_secret('secret_name')
```
If a block is selected in an integration pipeline to retry block runs, only the block runs for the selected block's stream will be run.
Mage now automatically retries blocks twice on failures (3 total attempts).
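The retry behavior is equivalent to a loop capped at three total attempts; a hedged sketch (the wrapper name and signature are illustrative, not Mage's scheduler API):

```python
import time

def run_with_retries(execute, max_attempts=3, delay_seconds=0.0):
    """Call execute(); on failure, retry until max_attempts total attempts
    have been made (1 initial try + 2 retries), then re-raise the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted all attempts; surface the failure
            time.sleep(delay_seconds)
```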
Display error popup with link to docs for “too many open files” error.
Fix the DBT block limit input field: the limit entered through the UI wasn't taking effect when previewing model results. A default limit of 1000 is now set.
Fix BigQuery table id issue for batch load.
Fix unique conflict handling for BigQuery batch load.
Remove startup_probe in GCP cloud run executor.
Fix run command for AWS and GCP job runs so that job run logs can be shown in Mage UI correctly.
Pass the block configuration to `kwargs` in the block method.
Fix SQL block execution when using different schemas between upstream block and current block.
View full Changelog
Published by thomaschung408 over 1 year ago
Support using Polars DataFrame in Mage blocks.
Shout out to Sergio Santiago for his contribution of integrating Opsgenie as an alerting option in Mage.
Doc: https://docs.mage.ai/production/observability/alerting-opsgenie
Add support for using batch load jobs instead of the query API in the BigQuery destination. You can enable it by setting `use_batch_load` to `true` in the BigQuery destination config.
When loading ~150MB of data to BigQuery, batch loading reduces the time from 1 hour to around 2 minutes (a ~30x speedup).
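In a data integration pipeline's BigQuery destination config, this might look like the following sketch; every key except `use_batch_load` is illustrative and should come from your existing destination config:

```yaml
# BigQuery destination config (illustrative keys)
config:
  project_id: my_project   # illustrative
  dataset: my_dataset      # illustrative
  use_batch_load: true     # enable batch load jobs instead of the query API
```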
io_config.yaml
Support using `git switch` to switch branches.
Add another value to the `DISABLE_NOTEBOOK_EDIT_ACCESS` environment variable to allow users to create secrets, create variables, and run blocks.
The available values are described in the doc:
Doc: https://docs.mage.ai/production/configuring-production-settings/overview#read-only-access
For standard Python pipelines, retry block runs from a selected block. The selected block and all downstream blocks will be re-run after clicking the `Retry from selected block` button.
Fix terminal user authentication. Update terminal authentication to happen on message.
Fix a potential authentication issue for the Google Cloud PubSub publisher client
Dependency graph improvements
DBT: support the `limit` property in the DBT block PUT request payload.
Retry pipeline run
Fix bug: when Mage fails to fetch a pipeline due to a backend exception, it doesn't show the actual error; it uses "undefined" in the pipeline URL instead, which makes the issue hard to debug.
Improve job scheduling: If jobs with QUEUED status are not in queue, re-enqueue them.
Pass `imagePullSecrets` to the k8s job when using `k8s` as the executor.
Fix streaming pipeline cancellation.
Fix the version of google-cloud-run package.
Fix query permissions for the block resource.
Catch `sqlalchemy.exc.InternalError` in the server and roll back the transaction.
View full Changelog
Published by thomaschung408 over 1 year ago
Added Markdown block to Pipeline Editor.
Doc: https://docs.mage.ai/guides/blocks/markdown-blocks
Doc: https://docs.mage.ai/production/data-sync/git#https-token-authentication
Doc: https://docs.mage.ai/development/blocks/callbacks/overview
Make callback block more generic and support it in data integration pipeline.
Keyword arguments available in data integration pipeline callback blocks: https://docs.mage.ai/development/blocks/callbacks/overview#data-integration-pipelines-only
Support bulk retrying pipeline runs for a pipeline.
Add a right-click context menu to rows on the pipeline list page for pipeline actions (e.g. rename).
When hovering over the left and right vertical navigation, expand it to show navigation titles, like BigQuery's UI.
Doc: https://docs.mage.ai/development/testing/great-expectations#json-object
Doc: https://docs.mage.ai/dbt/incremental-models
Shout out to André Ventura for his contribution of adding the Google Cloud Storage destination to data integration pipeline.
Shout out to Dhia Eddine Gharsallaoui again for his contribution of adding Druid data source to Mage.
Doc: https://docs.mage.ai/integrations/databases/Druid
Use the `COPY` command in the `mage_ai.io.postgres.Postgres` export method to speed up writing data to Postgres.
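The speedup comes from streaming rows as CSV through `COPY ... FROM STDIN` rather than issuing per-row INSERTs. A stdlib-only sketch of building the in-memory CSV buffer one would hand to a driver call such as psycopg2's `copy_expert` (helper name hypothetical, not Mage's code):

```python
import csv
import io

def rows_to_csv_buffer(rows):
    """Serialize rows into an in-memory CSV buffer suitable for
    COPY my_table FROM STDIN WITH CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)  # rewind so the driver reads from the start
    return buf
```

With psycopg2, for example, one would pass this buffer to `cursor.copy_expert('COPY my_table FROM STDIN WITH CSV', buf)`.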
Doc: https://docs.mage.ai/guides/streaming/sources/google-cloud-pubsub
Set the environment variable `DEFAULT_EXECUTOR_TYPE` to `k8s` to use the k8s executor by default for all pipelines. Doc: https://docs.mage.ai/production/configuring-production-settings/compute-resource#2-set-executor-type-and-customize-the-compute-resource-of-the-mage-executor
Add `k8s_executor_config` to the project's metadata.yaml to apply the config to all blocks that use the k8s executor in the project. Doc: https://docs.mage.ai/production/configuring-production-settings/compute-resource#kubernetes-executor
Allow specifying GPU resources in `k8s_executor_config`.
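A project-level sketch might look like the following; the key names should be verified against the compute-resource doc linked above, and all values are illustrative:

```yaml
# project metadata.yaml (illustrative values)
k8s_executor_config:
  resource_limits:
    cpu: 1000m
    memory: 2048Mi
  resource_requests:
    cpu: 500m
    memory: 1024Mi
```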
Fix service account permissions for creating Kubernetes jobs by not using `default` as the service account namespace in the Helm chart.
Doc for deploying with Helm: https://docs.mage.ai/production/deploying-to-cloud/using-helm
`MAGE_PUBLIC_HOST`.
View full Changelog
Published by thomaschung408 over 1 year ago
Provide a code template to trigger another pipeline from a block within a different pipeline.
Doc: https://docs.mage.ai/orchestration/triggers/trigger-pipeline
Doc: https://docs.mage.ai/guides/streaming/destinations/mongodb
Mage supports two ways to delete messages:
Doc: https://docs.mage.ai/guides/streaming/sources/amazon-sqs#message-deletion-method
Set the `executor_count` variable in the pipeline's metadata.yaml file to run multiple executors at the same time and scale streaming pipeline execution.
Doc: https://docs.mage.ai/guides/streaming/overview#run-pipeline-in-production
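In the pipeline's metadata.yaml, this is a single key (the value below is illustrative):

```yaml
# pipeline metadata.yaml
executor_count: 2  # run two executors concurrently for this streaming pipeline
```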
Added pagination to the Triggers and Block Runs pages.
After pulling code from a git repository to local, automatically install the libraries in requirements.txt so that pipelines can run without manual installation of packages.
Allow setting the table names for upstream blocks when using SQL blocks.
Add `connect_timeout` to PostgreSQL IO.
Add `location` to BigQuery IO.
Handle the `.sql` extension in the DBT model name if the user includes it (the `.sql` extension should not be included).
Add a trailing `.sql` suffix to emphasize that the `.sql` extension should not be included.
Fix the `onSuccess` callback logging issue.
Fix the `mage run` command: set repo_path before initializing the DB so that the correct db_connection_url can be obtained.
Fix `ModuleNotFoundError: No module named 'aws_secretsmanager_caching'` when running a pipeline from the command line.
View full Changelog