Open source on-call scheduling, automated escalations, and notifications so you never miss a critical alert
APACHE-2.0 License
Welcome to the unveiling of GoAlert v0.32.0! This release is a comprehensive package, featuring a range of exciting new features, critical bug fixes, and essential developer improvements to elevate your alert management experience.
The 'API Keys' feature is currently experimental. To explore this functionality, administrators must opt in by configuring GoAlert with EXPERIMENTAL=gql-api-keys. As an experimental feature, it may change, and your feedback is appreciated. Use caution in production, and expect updates as we refine and enhance its capabilities.
Explore an up-and-coming new feature of GoAlert while it's still in its experimental stage! Once activated, discover the new 'API Keys' option in the admin sidebar, allowing administrators to create and tailor access using GraphQL queries or mutations for a more secure and customizable integration.
More information on implementation details and the decisions around this new feature can be found in an ADR here.
listGQLFields query by @mastercactapus in https://github.com/target/goalert/pull/3276
allowedFields to query by @mastercactapus in https://github.com/target/goalert/pull/3411
query field doesn't work by @mastercactapus in https://github.com/target/goalert/pull/3491
Simplify your team's schedule with monthly handoffs! The new 'Monthly' option in Rotations enhances scheduling convenience, ensuring smooth handoffs once a month in GoAlert.
Elevate service monitoring with new comprehensive metrics in GoAlert. Admins can now keep an eye on service health, check for setup issues, and review notification channels—all in one place for better awareness and management.
Users can now opt in to subscribe to all shifts on a schedule in GoAlert. Get a complete view of your commitments and your teammates' for more comprehensive schedule management.
The user profile page now renders a calendar for all on-call schedule shifts (broken down by schedule)
We've made a number of QOL improvements, including a new splash screen, a fixed favicon, visible labels on services, and improved timestamp support in markdown.
There were a few new config options and tweaks added for admins, like requiring labels on services and closing stale alerts.
A number of bugs in the Temporary Schedules feature have been fixed, and some new features have been added! Soft limits for start and end dates provide better control, default values are now set accurately, and temporary schedules seamlessly merge as intended.
Schedules are now broken down by a default shift length, and while custom shifts remain supported, this intends to make it easier to create and manage even handoffs amongst team members.
Additionally, if you need to edit a temporary schedule, you will now be prompted to confirm the changes, so you know EXACTLY what is being changed before saving!
Lots of bugs were squashed in this update to improve the overall GoAlert experience! Enhancements include improved linter and formatting, optimized search functionalities, enhanced concurrency and performance, streamlined dependency management, meticulous timestamp handling, and robust error handling.
Introducing Architectural Decision Records (ADRs) in GoAlert! We now document key decisions around new and complex features, providing transparent insights into our development process. ADRs serve as a valuable resource for understanding the rationale behind design choices and enhancing collaboration within the GoAlert community.
Learn more about ADRs here.
Some of our current ADRs include decisions on tech migrations and the new API key architecture, and can be found here.
We've invested significant efforts in addressing technical debt by migrating to new technologies. This includes transitions to TypeScript, SQLc, URQL, React Suspense, and React Hooks.
We've fine-tuned user interactions, enriched logging, and polished alert management for an even smoother GoAlert experience. Notable updates include refined canonical zone names, the addition of an ExpFlag component for conditional rendering, and enhanced alert noise reduction.
We've started making significant progress towards plugin support in GoAlert. This doesn't yet have any user impact, but some new API endpoints and internal changes have been made to support this in the future.
destinationTypes and destinationFieldValidate queries by @mastercactapus in https://github.com/target/goalert/pull/3548
dest as a field on OnCallNotificationRule by @mastercactapus in https://github.com/target/goalert/pull/3659
Full Changelog: https://github.com/target/goalert/compare/v0.31.1...v0.32.0
Published by mastercactapus about 1 year ago
This point release fixes some regressions present in the v0.31.0 release. Additionally, containers and binaries for this version were built with Go 1.21.1.
Full Changelog: https://github.com/target/goalert/compare/v0.31.0...v0.31.1
Published by mastercactapus about 1 year ago
Unveiling the latest GoAlert update with enhanced alert tracking, seamless system migration, and more. Experience greater control and efficiency in your alert management.
Container image: goalert/goalert:v0.31.0
Users may now mark alerts as "noisy" and see the overall counts and escalations on the Alert Metrics page.
There is now a graph to visualize sent messages:
You can now migrate a live system from one DB server to another without downtime.
More information is available in the Switchover Guide.
Alerts created by email are now supported by pointing an MX record directly to GoAlert; documentation is available here.
Users and admins are now able to update passwords from the UI
createBasicAuth and updateBasicAuth for managing Basic credentials via GraphQL by @allending313 in https://github.com/target/goalert/pull/3062
Users can now add Slack direct messages as a contact method on their profile when Slack is enabled.
You can now set a global auto-close time for unacknowledged alerts under Maintenance on the Admin -> Config page.
Configuring the Twilio voice name and language from the admin panel is now possible.
Webhooks can now be set as a destination for scheduled on-call notifications and directly on an escalation policy step! Additionally, the service name is now included for alerts.
You can now escalate an alert by voice and SMS.
Slack usergroups can now automatically be updated by setting an on-call notification on a schedule!
It is now possible to get the alert and service ID when creating an alert via a Generic API key by setting the Accept header to application/json.
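As a rough illustration of the Accept header opt-in, the sketch below builds (but does not send) such a request. The endpoint path `/api/v2/generic/incoming`, the `token` query parameter, and the `summary` form field are assumptions not stated in these notes, and `goalert.example.com` is a placeholder; check the integration key documentation for your version before relying on them.

```python
import urllib.parse
import urllib.request


def build_create_alert_request(base_url: str, token: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a POST that asks for a JSON response."""
    # Assumed endpoint path and parameter names; not confirmed by the release notes.
    query = urllib.parse.urlencode({"token": token})
    url = f"{base_url.rstrip('/')}/api/v2/generic/incoming?{query}"
    body = urllib.parse.urlencode({"summary": summary}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    # The Accept header is what requests the JSON response with the IDs.
    req.add_header("Accept", "application/json")
    return req


req = build_create_alert_request("https://goalert.example.com", "TOKEN", "disk full")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it; the shape of the JSON body is not described in these notes, so the sketch stops short of parsing it.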
We hardened security, expanded maintenance mode, and fixed some cleanup issues.
We moved the add buttons on wide-screen devices from a floating-action button to a named button at the top to reduce confusion for new users.
We also made various improvements around accessibility, typography, and inputs around the application. Notably:
We've introduced a new Time component to normalize the display and accessibility of timestamps and durations across the application.
We squashed a whole lot of bugs! Most notably, those with long public URLs will no longer get errors when sending SMS messages, as multi-segment messages are now supported.
We made a lot of various improvements to the development tools and processes. Lots of tech-debt resolution, testing improvements, and documentation fixes!
We decided to end the experiment with GORM and instead adopt sqlc.
We added additional linting checks and fixes to our CI pipeline.
There is now support for experimental feature flags to allow breaking more extensive features into smaller pieces aiding in review and testing.
The list of current flags is available with goalert --list-experimental
useExpFlag by @mastercactapus in https://github.com/target/goalert/pull/2806
Various dependency updates including a switch to Yarn pnp making installs and switching branches much faster.
Lots of progress made on the TypeScript conversion in https://github.com/target/goalert/issues/2318
Full Changelog: https://github.com/target/goalert/compare/v0.30.1...v0.31.0
Published by mastercactapus almost 2 years ago
Minor release with bug and stability fixes for peak season.
Full Changelog: https://github.com/target/goalert/compare/v0.30.0...v0.30.1
Published by m17ch about 2 years ago
This version has many improvements and features that have been in the works for a while for users, admins, and developers!
In addition to the binary release below, this release is also available in container image form: goalert/goalert:v0.30.0
First and foremost, no breaking changes. Without configuration updates, the new version should behave like the old. However, the global Public URL config is now deprecated.
We added a new --public-url flag (or GOALERT_PUBLIC_URL env var), which is now the preferred way to configure the public address for things like auth, Twilio/Slack callbacks, SMS URLs, and redirects. You should test and update your environments accordingly. Most users can simply start GoAlert with --public-url set to their current Public URL config value; additional notes are below under "Migration Notes".
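For deployments that also use --http-prefix, the Migration Notes for this release ask that the same prefix be set as the path of the --public-url value. A minimal pre-deployment check might look like the sketch below; the function name is hypothetical and the trailing-slash handling is an assumption, not GoAlert's own validation logic.

```python
from urllib.parse import urlsplit


def public_url_matches_prefix(public_url: str, http_prefix: str) -> bool:
    """Return True when the path of public_url equals http_prefix.

    Trailing slashes are ignored on both sides (an assumption of this sketch).
    """
    path = urlsplit(public_url).path.rstrip("/")
    return path == http_prefix.rstrip("/")


print(public_url_matches_prefix("https://example.com/goalert", "/goalert"))
print(public_url_matches_prefix("https://example.com", "/goalert"))
```

The first call reports a match and the second does not, since the second URL has no path component.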
Things of note:
0.30.0 should be as safe as all other versions with no breaking/behavioral change without explicit opt-in.
--http-prefix flag. Using it will cause said deprecated settings to be ignored.
This was done to simplify fragile code that caused problems, particularly with various reverse-proxy setups and confusion around deployments behind a route prefix. Redirects and validation no longer use the host header to guess the environment and will always assume it is available at the configured URL.
Migration Notes
--http-prefix, just ensure the same prefix is set as the path of your --public-url value
A long time coming, we finally added some color to the application and dark-mode support! Changing the color for yourself from the new settings popover is now possible (can be useful for those that use multiple environments).
Slack account linking is finally supported without running the goalert-slack-email-sync tool. Linking allows a user to close or acknowledge an alert from Slack that previously needed to be maintained by an admin.
When a user attempts to interact with an alert for the first time in Slack, a private message will prompt them to link their account. After confirming in GoAlert, their initial action will be taken (e.g., close or ack), and their account will remain linked for future interactions.
Note: in response to feedback, we've updated the Slack notifier to only include the summary in messages, reducing channel noise.
We've made it possible to put a service in a temporary maintenance mode, which will still allow alerts to be created/acknowledged/closed but prevents escalations (including the first) so your phone isn't going off while you're working to fix an issue :)
We've added a metrics page for services so you can visualize alert counts, escalations, and time to acknowledge and close. It also allows you to export up to one year of alert data (including timing information) as a CSV.
You can find Alert Metrics under Quick Links on your service details page.
alert_metrics table. (#2177)
daily_alert_metrics table that aggregates metrics from alert_metrics. (#2272)
alerts query as the source of truth. Additionally, closedAt is added to the table and CSV export. (#2448)
Grafana Alerting is now supported, along with support for images!
ImageURL to alert details, when available, for Grafana-generated alerts. (#2468)
A new Message Logs page allows admins to view metadata about recently sent notifications.
Outgoing Logs view from the Admin page. (#2061)
debugMessages query that an admin can use to retrieve a list of recent messages from the outgoing_messages table. (#2052)
The new Alert Counts page will help admins find "busy" services and correlate high-alert events when necessary.
alert_logs table when running make regendb. (#2150)
resetdb to include randomly generated escalation logs. (#2430)
It is now possible to do a smooth key rotation with Twilio:
There is now a Dev link in the main navigation menu when starting in development mode (i.e., make start), giving quick access to tools and configuration for local development. In addition, there were several process improvements and paid tech debt with this release, such as a huge speed improvement with the switch to esbuild.
Some of these tools have been available and running with make start but the new dashboard will hopefully make it a lot easier to discover and use the bundled tools!
runproc tool, a simpler alternative to runjson. (#2080)
make start log output. (#2172)
build-env image to have the necessary dependencies to run Cypress tests, allowing all build steps to be run in the same container. (#2400)
slowproxy that can simulate network latency, jitter, and throughput constraints for testing. (#2307)
webpack-dev-server and the required proxy with a method that will serve files from the local filesystem or in memory. (#2246)
test/smoke (#2590)
--public-url flag and strictly validate URLs (#2421)
1 KiB. (#2073)
react-router-dom with wouter. (#2415)
application/json in incoming webhook. (#2161)
make start issue when restarting postgres (#2580)
SIZE= with make regendb to test various data sizes (#2580)
speedbump as a devtool to replace slowproxy (#2580)
encoding/xml (#2582)
Version Updates:
Published by dctalbot almost 3 years ago
This release includes several new features in addition to general maintenance and bug fixes.
OutlinedInput style in anticipation of MUIv5.
users:read.email permission scope. See the instructions on the Getting Started page for how to configure a Slack App using the generated manifest. (#2025)
auth_subjects table. This allows for Slack mentions e.g. in On-Call Notifications and interactive messages.
debugMessageStatus query provides the state of a message for a given provider message ID. (#1917)
Version Updates:
Published by dctalbot about 3 years ago
This release includes several new features in addition to general maintenance and bug fixes.
Many thanks to our new contributors!
@voromahery
@Voninkazo
@NateBigStone
@ganamavo
@wesley-dean-flexion
On-Call notifications
You can now have GoAlert notify Slack channels of on-call changes on a schedule. Notifications can be configured for any change, or at specific times of the day.
Other Improvements
Search behavior has changed to use a root-word-based search. For user name searching, a word prefix algorithm was implemented. More information is available in the PRs (#1860, #1872).
Actions have been moved from the top bar to MUI card actions
Calendar toolbar has been updated and loading status added to the calendar
Users may now add webhooks as contact methods. They can be used similarly to email, SMS, and voice. More information can be found under "Using Webhooks" under /docs within the GoAlert application.
Must be enabled in the Admin -> Config page.
Based on feedback in our community slack channel, we implemented some in-app user management solutions:
A new experimental gRPC API server is now included (disabled by default).
--listen-sysapi string (Experimental) Listen address:port for the system API (gRPC).
--sysapi-ca-file string (Experimental) Specifies a path to a PEM-encoded certificate(s) to authorize connections from plugin services.
--sysapi-cert-file string (Experimental) Specifies a path to a PEM-encoded certificate to use when connecting to plugin services.
--sysapi-key-file string (Experimental) Specifies a path to a PEM-encoded private key file use when connecting to plugin services.
Currently supported API methods are AuthSubjects and DeleteUser, allowing an external tool to enumerate and delete/clean up users (e.g., compare OIDC users with an Active Directory server).
Protobuf definition and certificate information can be found under pkg/sysapi
Version Updates:
Published by m17ch over 3 years ago
This release has a lot of long-anticipated awesome new features and improvements! Most of the work of this release was done under the covers to enable these as well as new features currently in the pipeline.
Thanks to our new contributors @kanish671 and @phoenix6561 for help with the React Hooks conversions Issue #923
Temporary schedules allow replacing the entire on-call schedule for a fixed timeframe. It should help situations where a team requires fine-grained control of on-call responsibility for a few days (e.g. peak week).
To get started using this feature, click the "Temp Sched" button from the schedule details page calendar:
A dialog will open allowing you to configure the time span of the schedule and associated shifts:
Temporary schedules will appear green on the calendar:
A new user contact method type EMAIL has been introduced. This allows an additional avenue for users to receive alert notifications.
The SMTP server details can be configured from the Admin page.
Schedules now appear at the top of the Edit Step dialog. Many users had trouble with overrides before realizing a rotation was used instead of schedule. This change was found to encourage users to use schedules (instead of rotations) when configuring a policy.
You can now enable Prometheus metrics at startup. This allows for monitoring some of the internals and health of GoAlert.
To enable, set the --listen-prometheus flag or the GOALERT_LISTEN_PROMETHEUS environment variable.
A new endpoint at <listen_address>/metrics will include:
GoAlert now implements bucketed throttling of notifications, per contact method. This is to avoid spamming devices due to misconfiguration or a noisy/broken healthcheck.
For reference, new rules are:
Message Type | Contact Method Type | Rules |
---|---|---|
All | All | 1 per minute |
Status | All | 1 per 3 minutes; 3 per 20 minutes (burst); 8 per 2 hours |
Alert | SMS | 5 per 15 minutes (burst); 11 per 1 hour; 21 per 3 hours |
Alert | Voice | 3 per 15 minutes (burst); 7 per 1 hour; 15 per 3 hours |
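To make the interaction between the rules concrete, here is a small sliding-window sketch that allows a send only when every rule for the contact method still has room. It is not GoAlert's implementation (the notes do not describe how the buckets are kept); the class name and the choice of a sliding window are assumptions, and the rule values combine the "All" row with the "Alert | SMS" row of the table.

```python
from collections import deque

# (max messages, window in seconds): the "All" rule plus the "Alert | SMS" rules.
ALERT_SMS_RULES = [(1, 60), (5, 15 * 60), (11, 60 * 60), (21, 3 * 60 * 60)]


class Throttle:
    """Allow a send only if every (limit, window) rule still has room."""

    def __init__(self, rules):
        self.rules = rules
        self.sent = deque()  # timestamps (seconds) of allowed sends, oldest first

    def allow(self, now: float) -> bool:
        longest = max(window for _, window in self.rules)
        # Forget sends that are older than the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        for limit, window in self.rules:
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= limit:
                return False  # this rule is exhausted, so the message waits
        self.sent.append(now)
        return True


throttle = Throttle(ALERT_SMS_RULES)
results = [throttle.allow(t) for t in (0, 30, 60, 120, 180, 240, 300)]
print(results)
```

In this run the send at 30 seconds is held back by the 1 per minute rule, and the send at 300 seconds is held back by the 5 per 15 minutes rule even though a full minute has passed since the previous message.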
/docs: Now supports rendering multiple markdown documents.
Version Updates:
Published by arurao almost 4 years ago
This release has a lot of bug fixes, cool new features, and hardening in prep for the peak season!
A version check has now been added to GoAlert's UI that will display a persistent snackbar notification when a new version is detected.
A new integration key type for Prometheus Alertmanager has been added.
The config page has a new condensed/summary view to make finding items easier:
Users may now manage active login sessions from the profile page, including the ability to easily end/cleanup ones no longer used.
(Admin note: Initial last access time will start as the time of your deployment when introducing this feature.)
A toggle was added to the alerts list filter to switch between displaying the duration since created or the full timestamp when created.
When adding a new contact method with a number that is already in use, users will now see more helpful messaging with a link to the existing user's profile.
The alert log status entries will now indicate failures such as: provider "Twilio" is disabled.
A new notices API has been added within the GraphQL schema for a Notice. Currently, a notice will be displayed if an escalation policy is unused. In the future, this will be expanded to other common issues.
A new status dialog has been added that displays real-time delivery details and the status of test notifications.
goimports command in Makefile so that using go run ... will automatically pull the tool, as well as using the correct version, rather than requiring the user to install it manually.
Published by mastercactapus about 4 years ago
This release primarily focused on stability & refactor work, however there have been a number of welcome improvements and helpful new features too!
While the /v1 endpoints have been deprecated for a while, there is now a switch in the admin config to allow disabling the old /v1/graphql endpoint. It will be disabled by default in a future version, and removed completely in a later one.
On wide displays, links have been moved to the side to better use available screen space.
There is now a Toolbox page under Admin with a tool that allows an admin to look up carrier information about a phone number.
This is part of a collection of tools added to handle infrequent, but problematic, carrier filtering of SMS messages.
The Send SMS tool can be used to test sending messages from alternate numbers, with or without URLs, etc., in order to detect whether carrier filtering is taking place and to find available workarounds (like disabling URLs in SMS messages, or using an alternate From number).
Admins now have the ability to override the From number for SMS messages per-carrier. This is helpful in the case that a specific carrier is blocking SMS message delivery and can be used as a temporary workaround without a new deployment or code change.
A new admin page was added to allow configuring and viewing current system limits.
The alert log has been enhanced to show current status of message delivery.
Issues like failures to send, or disabled contact methods, will also show up in the log.
Additionally, things like having nobody on call, or users without immediate rules, will be reflected in the log.
It is now possible for users to subscribe to their own shifts via a calendar subscription. This allows viewing when you're on call for a schedule from apps like Google Calendar or iCalendar.
Subscriptions may be accessed from your Profile page. Additionally, they can be created from any schedule's details page.
The create alert dialog has been reworked allowing for creating identical alerts across multiple services. This is intended to help operations teams that may need to notify multiple teams about a wide-impact incident.
sendit tool that is similar to ngrok that can be self hosted.
postgresql:// support to the waitfor tool
Published by mastercactapus almost 5 years ago
This hotfix release fixes a critical bug in the rotation/schedule calculation code that could cause an infinite loop (more context here).
Additionally, when trying to create an override that conflicts with an existing one, users will now receive a descriptive error, indicating the conflicting user, instead of a generic "unexpected error" message.
Published by mastercactapus almost 5 years ago
This release has a lot of bug fixes, features, and hardening in prep for the peak season!
No more SMS/call per-second during alert storms!!
When you have more than 1 scheduled message for a given contact method (e.g. 5 alerts pending for SMS), you will now get an SMS reading "Svc: 'foobar' has 5 unacked alerts" with an option to ack or close all. This also works for voice and Slack!
Additionally, you will only ever receive up to 1 message per 60 seconds for an individual contact method (confirmation/replies excluded).
You can enable bundling from the Admin page under the General section.
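The bundling behavior can be pictured with the short sketch below. It assumes that pending alerts are grouped per service for a single contact method and that a lone alert is sent unchanged; neither detail is spelled out in these notes, and the function name is hypothetical. Only the bundle text follows the example message quoted above.

```python
from collections import Counter


def bundle_pending(pending):
    """Collapse pending alerts for one contact method into outgoing messages.

    pending is a list of (service_name, summary) tuples in queue order.
    """
    counts = Counter(service for service, _ in pending)
    messages, bundled = [], set()
    for service, summary in pending:
        if counts[service] == 1:
            messages.append(summary)  # a single alert is sent as-is (assumption)
        elif service not in bundled:
            bundled.add(service)
            messages.append(f"Svc: '{service}' has {counts[service]} unacked alerts")
    return messages


pending = [("foobar", f"alert {i}") for i in range(5)] + [("other", "disk full")]
print(bundle_pending(pending))
```

Five queued alerts for one service become a single message, while the unrelated alert still goes out on its own.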
GoAlert now supports generating shorter URLs, optionally from an alternate domain.
You can enable ShortURL in the Admin page under the General section.
You can now have GoAlert automatically clean up closed alerts that are older than a configurable number of days.
This can be enabled by setting a non-zero value for Alert Cleanup Days in the Maintenance section of the Admin page.
We switched to the github.com/jackc/pgx driver to solve a swath of bugs and issues (e.g. where context deadlines were not respected) we ran into with the old driver. We also put a lot of work in to handle DB/connection hiccups more gracefully. In most cases, this means even if the DB is restarted, no requests will fail (assuming it comes back within a reasonable time).
More importantly, network edge cases that could cause a connection or query to hang (sometimes indefinitely) will now properly terminate and respect context deadlines using the new driver.
Heartbeat monitors are a new feature that will generate an Alert if they do not receive a POST request within the specified time period. This means you can have a cronjob checking in hourly, and if it fails, fails to run, or loses network access, GoAlert will still create an alert once the timeout is reached.
You can find the link on the Service Details page to manage these.
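The timeout logic behind a heartbeat monitor can be sketched as follows. This is only an illustration of the idea described above, not GoAlert's code: the class and method names are hypothetical, and the 90-minute timeout is an arbitrary value chosen to give an hourly job some slack.

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Tracks the last check-in and reports when the timeout has passed."""

    def __init__(self, timeout: timedelta, now: datetime):
        self.timeout = timeout
        self.last_seen = now

    def check_in(self, now: datetime) -> None:
        """Called for each POST received from the monitored job."""
        self.last_seen = now

    def is_expired(self, now: datetime) -> bool:
        """True once no check-in has arrived within the timeout."""
        return now - self.last_seen >= self.timeout


start = datetime(2020, 1, 1, 0, 0)
monitor = HeartbeatMonitor(timeout=timedelta(minutes=90), now=start)
monitor.check_in(start + timedelta(hours=1))          # hourly cron job reports in
print(monitor.is_expired(start + timedelta(hours=2)))  # only 60 minutes since check-in
print(monitor.is_expired(start + timedelta(hours=3)))  # 120 minutes: an alert is due
```

Because the check is driven by the absence of a request, it covers the job failing, never starting, or losing network access in the same way.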
We've introduced a default set of system limits for things like max rules on a schedule, max unacked alerts per service, etc. These are intended to put an upper bound on different resources.
They can be tweaked in the config_limits table. A page will be added to the Admin panel in a future version.
The logic around prioritizing messages has been replaced with a version that is unit-testable and flexible. For example, services that have not received any notifications will have their first notification prioritized above all else.
Priority is also re-calculated per-message making it fairer during alert storms.
Published by m17ch about 5 years ago
This release primarily focused on fixing bugs and barriers that were reported in v0.22.0.
goalert/goalert docker container now listens on the wildcard address by default
smoketest dep from generate to install (#58)
createAlert mutation (#37)
Published by mastercactapus over 5 years ago
Initial open source release!