🧙 Build, run, and manage data pipelines for integrating and transforming data.
APACHE-2.0 License
Published by thomaschung408 over 1 year ago
Support using a SQL block to fetch data from, transform data in, and export data to ClickHouse.
Doc: https://docs.mage.ai/integrations/databases/ClickHouse
Support using a SQL block to fetch data from, transform data in, and export data to Trino.
Doc: https://docs.mage.ai/development/blocks/sql/trino
Enable Sentry integration to track and monitor exceptions in the Sentry dashboard.
Doc: https://docs.mage.ai/production/observability/sentry
Mage now supports dragging and dropping to re-order blocks in pipelines.
Support consuming messages from SQS queues in streaming pipelines.
Doc: https://docs.mage.ai/guides/streaming/sources/amazon-sqs
The dummy sink optionally prints each message and then discards it. This sink is useful when users want to trigger other pipelines or third-party services from the ingested data in a transformer.
Doc: https://docs.mage.ai/guides/streaming/destinations/dummy
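A minimal sink config sketch (the connector_type and print_msg keys follow the linked doc; treat them as assumptions):
connector_type: dummy
print_msg: true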
Add code templates to fetch data from and export data to Delta Lake.
Support writing unit tests for Mage pipelines that run in the CI/CD pipeline using mock data.
Doc: https://docs.mage.ai/development/testing/unit-tests
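For illustration, a minimal pytest-style sketch (the transform function and column names below are hypothetical, not Mage APIs); such a test calls the block's decorated function directly with mock data:
import pandas as pd

# Hypothetical block function under test; in a real project you would
# import it from the block's file in your Mage repo.
def transform(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    df['amount_doubled'] = df['amount'] * 2
    return df

def test_transform_doubles_amount():
    # Mock data standing in for the upstream block's output.
    mock_df = pd.DataFrame({'amount': [1, 2, 3]})
    result = transform(mock_df)
    assert result['amount_doubled'].tolist() == [2, 4, 6]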
Fix file deletion for files with a double .sql extension: previously, the wrong file could get deleted when deleting a file whose name ends in a double .sql extension.
Support +schema in the DBT profile.
Use the database and schema from io_config.yaml by default in SQL blocks.
Fix an issue that occurred when the Git feature is used.
View full Changelog
Published by thomaschung408 over 1 year ago
In addition to configuring triggers in the UI, Mage now also supports configuring triggers in code. Create a triggers.yaml file under your pipeline folder and enter the trigger configs. The triggers will automatically be synced to the DB and the trigger UI.
Doc: https://docs.mage.ai/guides/triggers/configure-triggers-in-code
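A minimal triggers.yaml sketch (the field names follow the linked doc; treat the exact values as illustrative):
triggers:
- name: example_daily_trigger
  schedule_type: time
  schedule_interval: '@daily'
  start_time: 2023-01-01 00:00:00
  status: active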
Shout out to Dhia Eddine Gharsallaoui for his contribution of centralizing the server logging and adding verbosity control. Users can control the verbosity level of the server logging by setting the SERVER_VERBOSITY environment variable. For example, set SERVER_VERBOSITY to ERROR to only print out errors.
Doc: https://docs.mage.ai/production/observability/logging#server-logging
Users can now customize the resources of the Kubernetes executor by adding executor_config to the block config in the pipeline's metadata.yaml.
Doc: https://docs.mage.ai/production/configuring-production-settings/compute-resource#kubernetes-executor
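A sketch of what the block config could look like; the resource_limits/resource_requests keys are taken from the linked doc and should be treated as illustrative:
blocks:
- uuid: example_transformer
  executor_type: k8s
  executor_config:
    resource_limits:
      cpu: 1000m
      memory: 2048Mi
    resource_requests:
      cpu: 500m
      memory: 1024Mi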
Handle deleted or renamed streams in data integration pipelines: if a previously selected stream was deleted or renamed, it will still appear in the SelectStreams modal but will automatically be deselected and flagged in red font as no longer available. The user needs to click "Confirm" to remove the deleted stream from the schema.
Use the cmd shell command on Windows instead of bash. Allow users to overwrite the shell command with the SHELL_COMMAND environment variable.
Support upserts when exporting to Postgres, e.g.:
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
    loader.export(
        df,
        schema_name,
        table_name,
        index=False,
        if_exists='append',
        allow_reserved_words=True,
        unique_conflict_method='UPDATE',
        unique_constraints=['col'],
    )
View full Changelog
Published by thomaschung408 over 1 year ago
The terminal experience is improved in this release, which adds new interactive features and boosts performance. Now, you can use the following interactive commands and more:
git add -p
dbt init demo
great_expectations init
Shout out to Luis Salomão for adding the Google Ads source.
Use DOUBLE PRECISION instead of DECIMAL as the column type for float/double numbers.
Add Amazon S3 as a streaming destination. Doc: https://docs.mage.ai/guides/streaming/destinations/amazon-s3
Enable the logging of custom exceptions in the transformer of a streaming pipeline. Here is an example code snippet:
from typing import Dict, List

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def transform(messages: List[Dict], *args, **kwargs):
    try:
        raise Exception('test')
    except Exception as err:
        kwargs['logger'].error('Test exception', error=err)
    return messages
Support cancelling a running streaming pipeline (when the pipeline is executed in the Pipeline Editor) even after the page is refreshed.
Shout out to Tim Ebben for adding the option to send alerts to Google Chat in the same way as Teams/Slack using a webhook.
Example config in the project's metadata.yaml:
notification_config:
  alert_on:
    - trigger_failure
    - trigger_passed_sla
  slack_config:
    webhook_url: ...
How to create webhook url: https://developers.google.com/chat/how-tos/webhooks#create_a_webhook
Prevent a user from editing a pipeline if it's stale. A pipeline can go stale if multiple tabs are open editing the same pipeline, or if multiple people edit the same pipeline at different times.
Fix bug: Code block scrolls out of view when focusing on the code block editor area and collapsing/expanding blocks within the code editor.
Fix bug: Sync UI is not updating the "rows processed" value.
Fix the path issue of running dynamic blocks on a Windows server.
Fix index out of range error in data integration transformer when filtering data in the transformer.
Fix issues with loading sample data in Google Sheets.
Fix chart blocks loading data.
Fix Git integration bugs:
Add preventive measures for saving a pipeline:
DBT block: fix the "Circular reference detected" issue with DBT variables.
SQL block fixes.
Add helper for using CRON syntax in trigger setting.
View full Changelog
Published by thomaschung408 over 1 year ago
Mage now supports GitHub/GitLab integration via the UI. You can perform Git actions from the UI.
Doc on setting up integration: https://docs.mage.ai/production/data-sync/git
Add Terraform templates for deploying Mage to ECS from a CodeCommit repo with AWS CodePipeline. This creates two separate CodePipelines: one for building a Docker image from a CodeCommit repository and pushing it to ECR, and another for reading from ECR and deploying to ECS.
Docs on using the terraform templates: https://docs.mage.ai/production/deploying-to-cloud/aws/code-pipeline
When you run Mage on AWS, instead of using hardcoded API keys, you can use the ECS task role to authenticate with AWS services.
Shout out to Bruno Gonzalez for his contribution of automatically opening Mage in a browser tab when using the mage start command on your laptop.
Github issue: https://github.com/mage-ai/mage-ai/issues/2233
When executing a block in the notebook and an error occurs, show the stack trace of the error without the custom code wrapper (which is not useful information).
MySQL
Commercetools
Outreach
Snowflake: support the disable_double_quotes option; when it is True, column names are not wrapped in double quotes.
Support async message consumption in streaming sources: set consume_method = SourceConsumeMethod.READ_ASYNC in your streaming source class, and the read_async method will be used.
Mage supports triggering pipelines on AWS events. Now, you can access the raw event data in a block method via kwargs['event']. This enhancement enables you to easily customize your pipelines based on the event trigger and handle the event data as needed within your pipeline code.
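For example, a data loader could branch on the raw event payload; a minimal sketch, assuming an S3-style event shape (the payload structure depends on your AWS event source):
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

@data_loader
def load_data(*args, **kwargs):
    # kwargs['event'] holds the raw AWS event that triggered this run.
    event = kwargs.get('event', {})
    # Illustrative: collect S3 object keys from an S3-style event payload.
    return [r['s3']['object']['key'] for r in event.get('Records', [])]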
Support referencing secrets via secret_var in the repo's metadata.yaml.
View full Changelog
Published by thomaschung408 over 1 year ago
Mage is now integrated with Great Expectations to test the data produced by pipeline blocks.
You can use all the expectations easily in your Mage pipeline to ensure your data quality.
Follow the doc to add expectations to your pipeline to run tests for your block output.
Added pipeline description.
Single click on a row no longer opens a pipeline. In order to open a pipeline now, users can double-click a row, click on the pipeline name, or click on the open folder icon at the end of the row.
Select a pipeline row to perform an action (e.g. clone, delete, rename, or edit description).
Users can click on the file icon under the Actions column to go directly to the pipeline's logs.
Added search bar which searches for text in the pipeline uuid, name, and description, and filters the pipelines that match.
The create, update, and delete actions are not accessible by Viewer roles.
Added badge in Filter button indicating number of filters applied.
Group pipelines by status or type.
Users can write raw SQL blocks and include only the INSERT statement; a CREATE TABLE statement isn't required anymore.
Users can now write SELECT statements using raw SQL in SQL blocks.
Find all supported SQL statements using raw SQL in this doc.
When using an SSH tunnel to connect to a Postgres database, SSH tunneling was originally supported in only one block run at a time due to port conflicts. Now Mage supports SSH tunneling in multiple blocks by finding an unused port to use as the local port. This feature is also supported in Python blocks when using the mage_ai.io.postgres module.
Shout out to Luis Salomão for his continuous contribution to Mage. The new source Pipedrive is available in Mage now.
Add a check for the size of the query, since large queries can potentially exceed the database's limit.
A sensor block is used to continuously evaluate a condition until it's met. Mage now has more sensor templates to check whether data has landed in an S3 bucket or SQL data warehouses.
Mage can connect to a standalone Spark cluster and run PySpark code on it. Set the environment variable SPARK_MASTER_HOST in your Mage container or instance; running PySpark code in a standard batch pipeline will then work automagically by executing the code in the remote Spark cluster.
Follow this doc to set up Mage to connect to a standalone Spark cluster.
Mage now automatically masks environment variable values with stars in terminal output or block output to prevent showing sensitive data in plaintext.
Improve streaming pipeline logging
Provide a working NGINX config to allow Mage WebSocket traffic:
location / {
    proxy_pass http://127.0.0.1:6789;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}
Fix raw SQL quote error.
Add documentation for developers to add a new source or sink to streaming pipelines: https://docs.mage.ai/guides/streaming/contributing
View full Changelog
Published by thomaschung408 over 1 year ago
You can configure Mage to not allow any edits to pipelines or blocks in the production environment. Users will only be able to create triggers and view the existing pipelines.
Doc: https://docs.mage.ai/production/configuring-production-settings/overview#read-only-access
Shout out to Dhia Eddine Gharsallaoui for his contribution of adding the LDAP authentication method to Mage. When LDAP authentication is enabled, users will need to provide their LDAP credentials to log in to the system. Once authenticated, Mage will use the authorization filter to determine the user's permissions based on their LDAP group membership.
Follow the guide to set up LDAP authentication.
Support running SQL Server DBT models in Mage.
Tutorial for setting up a DBT project in Mage: https://docs.mage.ai/tutorials/setup-dbt
Mage can now be deployed to Kubernetes with Helm: https://mage-ai.github.io/helm-charts/
How to install Mage Helm charts
helm repo add mageai https://mage-ai.github.io/helm-charts
helm install my-mageai mageai/mageai
To customize the mount volume for the Mage container, you'll need to customize values.yaml:
Get values.yaml with the command
helm show values mageai/mageai > values.yaml
Edit the volumes config in values.yaml to mount to your Mage project path.
Doc: https://docs.mage.ai/production/deploying-to-cloud/using-helm
When you run Mage and Spark in the same Kubernetes cluster, you can set the environment variable SPARK_MASTER_HOST to the URL of the master node of the Spark cluster in the Mage container. You'll then be able to connect Mage to your Spark cluster and execute PySpark code in Mage.
Follow this guide to use Mage with Spark in a Kubernetes cluster.
Add filtering (by status and type) for pipelines.
Support the "Allow blocks to fail" setting for pipelines with dynamic blocks.
Fix "sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back." in the API middleware.
View full Changelog
Published by thomaschung408 over 1 year ago
A Mage pipeline used to stop running if any of its block runs failed. A setting was added to continue running the pipeline even if a block in the pipeline fails during execution.
Check out the doc to learn about the additional settings of a trigger.
If you have your pipeline code stored in a remote repository on GitHub, you can sync your local project with the remote repository through Mage.
Follow the doc to set up the sync with Github.
Edit bookmark property values from the UI. Users can edit the bookmark values, which will be used as the bookmark for the next sync. The bookmark values will automatically update to the last record synced after the next sync completes. Check out the doc to learn how to edit bookmark property values.
Use TEXT instead of VARCHAR with a character limit as the column type in the Postgres destination.
Show a loader on a data integration pipeline while the list of sources and destinations is still loading.
Specify the Protobuf schema class path in the Kafka source config so that Mage can deserialize the Protobuf messages from Kafka.
Doc: https://docs.mage.ai/guides/streaming/sources/kafka#deserialize-message-with-protobuf-schema
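A sketch of the relevant part of the Kafka source config; the serde_config and schema_classpath key names are assumptions based on the linked doc:
serde_config:
  serialization_method: PROTOBUF
  schema_classpath: 'path.to.your.generated.SchemaClass'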
Add Kafka as a streaming destination. Doc: https://docs.mage.ai/guides/streaming/destinations/kafka
Mage doesn't directly stream data into Redshift. Instead, Mage can stream data to Kinesis. You can then configure streaming ingestion for your Amazon Redshift cluster and create a materialized view using SQL statements.
Doc: https://docs.mage.ai/guides/streaming/destinations/redshift
Add the button to cancel all running pipeline runs for a pipeline.
For the viewer role, don't show the edit options for the pipeline.
Show "Positional arguments for decorated function" preview for custom blocks.
Disable notebook keyboard shortcuts when typing in input fields in the sidekick
View full Changelog
Published by thomaschung408 over 1 year ago
Add callbacks to run after your block succeeds or fails. You can add a callback by clicking "Add callback" in the "More actions" menu of the block (the three-dot icon in the top right).
For more information about callbacks, check out the Mage documentation
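As a rough sketch of the callback block shape (the decorator arguments and function signature here are assumptions; see the Mage documentation for the exact API):
if 'callback' not in globals():
    from mage_ai.data_preparation.decorators import callback

@callback('success')
def on_success(parent_block_data, **kwargs):
    # Runs after the parent block finishes successfully.
    print('Parent block succeeded')

@callback('failure')
def on_failure(parent_block_data, **kwargs):
    # Runs after the parent block fails.
    print('Parent block failed')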
Show a preview of the total pipeline runs that will be created, and their timestamps, before starting a backfill.
Misc UX improvements on the backfills pages (e.g. disabling or hiding irrelevant items depending on backfill status, updating backfill table columns that previously weren't updating as needed).
Show pipeline editor main content header on Firefox. The header for the Pipeline Editor main content was hidden for Firefox browsers specifically (which prevented users from being able to change their pipeline names on Firefox).
Make the retry run popup fully visible. Fix issue with the Retry pipeline run button popup being cut off.
Add alert with details on how to allow clipboard paste in insecure contexts
Show canceling status only for pipeline run being canceled. When multiple runs were being canceled, the status for other runs was being updated to "canceling" even though those runs weren't being canceled.
Remove the table prop from the destination config. The table property is not needed in the data integration destination config templates when building integration pipelines through the UI, so it has been removed.
Update data loader, transformer, and data exporter templates to not require DataFrame.
Fix PyArrow issue
Fix data integration destination row syncing count
Fix emoji encode for BigQuery destination
Fix dask memory calculation issue
Fix NaN being displayed for the runtime value on the Syncs page.
Fix odd formatting of the Trigger edit page dropdowns (e.g. Frequency) on Windows.
Don't fall back to an empty pipeline when failing to read the pipeline yaml.
View full Changelog
Published by thomaschung408 over 1 year ago
User login and user-level permission control are supported in mage-ai version 0.8.0 and above.
Set the environment variable REQUIRE_USER_AUTHENTICATION to 1 to turn on user authentication.
Check out the doc to learn more about user authentication and permission control: https://docs.mage.ai/production/authentication/overview
New sources
New destinations
Full lists of available sources and destinations can be found here:
Improvements on existing sources and destinations
In various surfaces in Mage, you may be asked to input config for certain integrations, such as cloud databases or services. In these cases, you may need to input a password or an API key, but you don't want it to be shown in plain text. To get around this issue, we created a way to store your secrets in the Mage database.
Check out the doc to learn more about secrets management in Mage: https://docs.mage.ai/development/secrets/secrets
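Once a secret is stored, it can be read back in block code; a minimal sketch using get_secret_value per the linked doc ('db_password' is a hypothetical secret name):
from mage_ai.data_preparation.shared.secrets import get_secret_value

# Fetch a secret previously saved through the Mage secrets UI.
password = get_secret_value('db_password')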
Mage now supports limiting the number of concurrent block runs by customizing the queue config, which helps avoid the Mage server being overloaded by too many block runs. Users can configure the maximum number of concurrent block runs in the project's metadata.yaml via queue_config.
queue_config:
  concurrency: 100
Support running PySpark pipelines locally without custom code or settings.
If you have your Spark cluster running locally, you can build your standard batch pipeline with PySpark code, the same as other Python pipelines. Mage handles data passing between blocks automatically for Spark DataFrames. You can use kwargs['spark'] in Mage blocks to access the Spark session.
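A minimal sketch of a block using the injected Spark session (block and column names are illustrative):
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

@data_loader
def load_data(*args, **kwargs):
    # kwargs['spark'] is the Spark session Mage provides to the block.
    spark = kwargs['spark']
    return spark.createDataFrame(
        [(1, 'pipelines'), (2, 'blocks')],
        ['id', 'name'],
    )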
View full Changelog
Published by thomaschung408 over 1 year ago
New sources
Full lists of available sources and destinations can be found here:
Improvements on existing sources and destinations
Mage now supports building and running Spark pipelines with a remote Databricks Spark cluster.
Check out the guide to learn about how to use Databricks Spark cluster with Mage.
Shout out to Luis Salomão for his contribution of adding the RabbitMQ streaming source to Mage! Check out the doc to set up a streaming pipeline with a RabbitMQ source.
Support running Trino DBT models in Mage.
Support customizing the Kubernetes namespace of the executor with the KUBE_NAMESPACE environment variable.
Add a generic custom block that can run in a pipeline, optionally accept inputs, and optionally return outputs, but is not a data loader, data exporter, or transformer block.
View full Changelog
Published by thomaschung408 over 1 year ago
New sources
New destinations
Full lists of available sources and destinations can be found here:
Improvements on existing sources and destinations
Use the MERGE command in the Trino connector to handle conflicts.
Support query_max_length in the Trino connector to adjust batch size.
Mage has a newly revamped command line tool, with better formatting, clearer help commands, and more informative error messages. Kudos to community member @jlondonobo for the awesome contribution!
Stop generating mage_sources.yml files in different model subfolders: use only one sources file for all models instead of nesting them in subfolders.
Support runtime variables in the project's metadata.yaml file. https://docs.mage.ai/production/configuring-production-settings/runtime-variable#in-code
Besides storing logs on the local disk or AWS S3, we now add the option to store logs in GCP Cloud Storage by adding a logging config in the project's metadata.yaml like below:
logging_config:
  type: gcs
  level: INFO
  destination_config:
    path_to_credentials: <path to gcp credentials json file>
    bucket: <bucket name>
    prefix: <prefix path>
Check out the doc for details: https://docs.mage.ai/production/observability/logging#google-cloud-storage
Support loading data from multiple file directories with mage_ai.io.file.FileIO. Example:
from mage_ai.io.file import FileIO

# Load every file under the listed directories into a single DataFrame.
file_directories = ['default_repo/csvs']
FileIO().load(file_directories=file_directories)
View full Changelog
Published by thomaschung408 over 1 year ago
Mage launched a new backfill framework to make backfills a lot easier. Users can select a date range and date interval for the backfill. Mage will automatically create the pipeline runs within the date range and run them concurrently to backfill the data.
New sources
New destinations
Add Kinesis as a streaming source and destination (sink) for streaming pipelines. AWS credentials are configured via the following environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
Configure the executor type per block in the pipeline's metadata.yaml, e.g.:
blocks:
- uuid: example_data_loader
  type: data_loader
  upstream_blocks: []
  downstream_blocks: []
  executor_type: k8s
  ...
Mage can run monitors in Metaplane via API integration. Check out the guide to learn about how to run monitors in Metaplane and poll statuses of the monitors.
Support accessing variables via kwargs['key'] in test functions.
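A sketch of a block test using the @test decorator from Mage's block templates (the variable name accessed via kwargs is illustrative):
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test

@test
def test_output(output, *args, **kwargs):
    # Variables are reachable inside tests via kwargs['key'].
    env = kwargs.get('env')  # 'env' is a hypothetical variable name
    assert output is not None, 'The output is undefined'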
View full Changelog
Published by thomaschung408 over 1 year ago
New sources
Improvements on existing sources and destinations
Amazon S3 source: generate the _s3_last_modified column from the LastModified key, and enable the _s3_last_modified column as a bookmark property.
Support the search_pattern key and the table_configs key. https://github.com/mage-ai/mage-ai/blob/master/mage_integrations/mage_integrations/sources/amazon_s3/README.md
Add the _mage_deleted_at column to record the source row deletion time.
Support referencing AWS secrets in YAML config with {{ aws_secret_var('some_name_for_secret') }}. Here is the full guide: https://docs.mage.ai/production/configuring-production-settings/secrets#yaml
Full lists of available sources and destinations can be found here:
Customize alerts to send only when a pipeline fails or succeeds (or both) via the alert_on config:
notification_config:
  alert_on:
    - trigger_failure
    - trigger_passed_sla
    - trigger_success
Here are the guides for configuring the alerts
Besides using Terraform scripts to deploy Mage to the cloud, Mage now also supports managing AWS cloud resources using the AWS Cloud Development Kit in TypeScript.
Follow this guide to deploy Mage app to AWS using AWS CDK scripts.
Mage can orchestrate the sync jobs in Stitch via API integration. Check out the guide to learn about how to trigger the jobs in Stitch and poll statuses of the jobs.
Allow pressing the escape key to close the error message popup instead of having to click on the x button in the top right corner.
If a data integration source has multiple streams, select all streams with one click instead of individually selecting every single stream.
Make the pipeline runs pages (both the overall /pipeline-runs page and the trigger /pipelines/[trigger]/runs pages) more efficient by avoiding individual requests for pipeline schedules (i.e. triggers).
In order to avoid confusion when using the drag-and-drop feature to add dependencies between blocks in the dependency tree, the ports (white circles on blocks) on other blocks disappear while a drag is active. The dependency lines must be dragged from one block's port onto another block itself, not onto another block's port, which is what some users were doing previously.
Fix positioning of newly added blocks. Previously, when adding a new block with a custom block name, the block was added to the bottom of the pipeline; new blocks now appear immediately after the block from which they were added.
Popup error messages include both the stack trace and traceback to help with debugging (previously they did not include the traceback).
Update links to docs in code block comments (links were broken due to recent docs migration to a different platform).
View full Changelog