When we enable developers to create page or API monitors, we want to automate as much of the code writing process as possible, so that developers can test the functionality they care about without getting bogged down in setup steps.
Most engineering organizations are shifting their monitoring and observability left, making developers part of the team that makes sure their service is always running and available. But we don’t expect our developers to become QA and Operations experts, so we need to enable their contributions with as much automation and default configuration as possible. The goal of this guide is to help create an easy process for all your engineers to contribute to monitoring. Just like any Checkly user, developers can still control every aspect of how a monitor runs, when it detects failure, and who gets notified of that failure. This isn’t about taking away control, just setting defaults and automation so that you don’t need to become a monitoring expert to contribute to your team’s monitoring.
This guide ties together material from our Playwright Learn articles, our blog, and our tutorial videos to help you create an environment where developers can add new page checks with minimal repetitive code.
This article will refer repeatedly to an ideal “developer” who understands their own service quite well but isn’t experienced in either QA or monitoring. They may not be an expert in Playwright, the automation framework we use to write our monitors, and they don’t have experience with specific configuration like retry logic or alert thresholds.
Though the rest of this guide will refer to our developer as the person we’re enabling, feel free to slot in anyone you want to enable to write monitors, without requiring that they become an expert in observability, monitoring, Checkly, or Playwright.
First off, our developer may not be very excited to log into a web interface to create new monitors. Since developers are used to running and deploying code from the command line, we want to make the Checkly CLI available to them. A full guide to the Checkly CLI is on our documentation site, but in general the process looks like this:
1. Log in to your Checkly account with `npx checkly login`.
2. Run `npm create checkly` to make a new project.
3. Run `npx checkly test`; the tool will scan the project for tests, run them as configured from real geographic locations, and give a local report on results.
4. Run `npx checkly deploy`, and your checks will show up for all users in the web interface.

For small teams just getting started with Checkly, this is all you need to do to harness the CLI for your process. As you grow, you’ll want a single source of truth for the code of all existing checks and, if necessary, a change review process. For that, you can use the Monitoring as Code model: store your checks and their configuration in a shared repository.
A great feature of Checkly is that any check can have fully custom settings for every aspect of how it runs: its own cadence, its own geographies, its own retry logic. This means that if, for example, you’re creating a check specifically to test localization, you can set just that check to run from dozens of geographies, even if most checks only need three or four.
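As a sketch of what per-check configuration looks like with the CLI constructs (the check name, locations, and entrypoint path here are all illustrative):

```ts
import { BrowserCheck, Frequency } from 'checkly/constructs'

// a hypothetical localization check that runs from many more
// locations than the rest of the project needs
new BrowserCheck('localization-check', {
  name: 'Localization smoke test',
  frequency: Frequency.EVERY_1H,
  locations: [
    'us-east-1', 'us-west-2', 'eu-west-1', 'eu-central-1',
    'ap-southeast-2', 'ap-northeast-1', 'sa-east-1',
  ],
  code: { entrypoint: './localization.spec.ts' },
})
```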
Of course, we don’t want to give our fresh developer over a dozen settings that she needs to set just to create her first check. New checks created either in the web UI or from your code environment via the CLI will default to using the global configuration. You can view and edit this global config in `checkly.config.ts`; you can see a reference on this configuration and its options under the `project` construct documentation. Some suggestions for default configuration (a sample config follows this list):
- Set a generous `RetryStrategy`! Some failures can look extremely worrying even if they happen only once, but it’s worth double-checking that the problem wasn’t entirely ephemeral. Full documentation of retry configuration is on our docs site.
- If you’re encouraging microservice developers or frontend engineers to add checks, consider setting the default frequency relatively low. It’s great to test edge cases or unusual scenarios, but someone without exposure to Operations is unlikely to need a check that runs every minute.
- Make sure your geographic `locations` reflect your userbase. You can use `runParallel: false` when testing from a large number of locations, so that checks run from a single location each cycle.
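Putting those suggestions together, a global config might look like the following minimal sketch; the project name, cadence, and locations are illustrative, so adapt them to your own project:

```ts
// checkly.config.ts
import { defineConfig } from 'checkly'
import { Frequency, RetryStrategyBuilder } from 'checkly/constructs'

export default defineConfig({
  projectName: 'Web Shop Monitoring', // illustrative
  logicalId: 'web-shop-monitoring',
  checks: {
    // a relatively low default cadence; individual checks can override it
    frequency: Frequency.EVERY_10M,
    // locations that reflect your userbase
    locations: ['us-east-1', 'eu-west-1', 'ap-southeast-2'],
    // cycle through locations one at a time instead of running all at once
    runParallel: false,
    // a generous retry strategy: up to three retries, spaced a minute apart
    retryStrategy: RetryStrategyBuilder.linearStrategy({
      baseBackoffSeconds: 60,
      maxRetries: 3,
      sameRegion: false,
    }),
  },
  cli: {
    runLocation: 'eu-west-1',
  },
})
```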
You may want to go beyond global settings for a larger or more complex team: when a group of engineers is working on checks that all share the same requirements, that’s a good reason to create a check group. Groups have their own set of variables and configuration that apply to all of their member checks. Checks can be added to a group either at the code level (by passing a `checkGroup` object to the config) or in the web UI by creating a group and clicking ‘Add Checks to Group’.
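In code, that might look like the following sketch (the group name, locations, and check details are illustrative):

```ts
import { CheckGroup, BrowserCheck } from 'checkly/constructs'

// shared configuration for every check in the group
const shopGroup = new CheckGroup('web-shop-group', {
  name: 'Web Shop',
  locations: ['us-east-1', 'eu-west-1'], // overrides the global locations
  tags: ['web-shop'],
})

// this check inherits the group's settings
new BrowserCheck('recent-orders-check', {
  name: 'Recent orders page',
  group: shopGroup,
  code: { entrypoint: './recent-orders.spec.ts' },
})
```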
An important thing to remember when working with a group of checks: group settings will override global settings, and individual check settings will override everything else.
While other guides discuss more sophisticated ways to share authentication across checks, it’s good to know how to share code for common repeated tasks with fixtures. Let’s imagine a web shop with a login modal that we need to fill out before doing further account actions. The goal of our check isn’t really to test this modal; it just needs to be filled out before the real testing, where we look at our demo user’s recent orders.
To open this modal and log in, we might use a few lines like the following sketch (the URL and selectors are illustrative):
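```ts
import { test, expect } from '@playwright/test'

test('recent orders', async ({ page }) => {
  await page.goto('https://danube-web.shop/') // illustrative shop URL
  // fill out the login modal before the real assertions
  await page.getByRole('button', { name: 'Log in' }).click()
  await page.getByLabel('Email').fill(process.env.USER_EMAIL!)
  await page.getByLabel('Password').fill(process.env.USER_PASSWORD!)
  await page.getByRole('button', { name: 'Sign in' }).click()
  // ...the actual goal of the check: the demo user's recent orders
  await page.getByRole('link', { name: 'My Orders' }).click()
  await expect(page.getByRole('heading', { name: 'Recent Orders' })).toBeVisible()
})
```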
This is a fine practice for a single test, but if we have dozens of tests that all require login as a first step, we’ll find ourselves copying and pasting this code over and over. If we ever tweak the login process, we’ll have to update that copied code in dozens of places. Not good!
Instead, we’d want to move this code into an extension of the Playwright `page` fixture, sketched below (again with illustrative selectors):
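```ts
import { test as base, expect } from '@playwright/test'

// override the built-in `page` fixture so every test starts logged in
const test = base.extend({
  page: async ({ page }, use) => {
    await page.goto('https://danube-web.shop/') // illustrative shop URL
    await page.getByRole('button', { name: 'Log in' }).click()
    await page.getByLabel('Email').fill(process.env.USER_EMAIL!)
    await page.getByLabel('Password').fill(process.env.USER_PASSWORD!)
    await page.getByRole('button', { name: 'Sign in' }).click()
    await use(page) // hand the authenticated page to each test
  },
})

// each test now receives an already-authenticated page
test('recent orders', async ({ page }) => {
  await page.getByRole('link', { name: 'My Orders' }).click()
  await expect(page.getByRole('heading', { name: 'Recent Orders' })).toBeVisible()
})
```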
In this example, we’ve put our fixture at the top of a single check’s file and shared it across the tests within that file. Next, we’ll share code across multiple files and run code with every check.
To see fixtures demonstrated, along with a step-by-step explanation of the code, check out Stefan’s tutorial video.
While it’s useful to package up tasks like logging in and share them across multiple tests, we may also want some code to run before and after every single check.
For these, we could again use core JavaScript/TypeScript features to accomplish the task. Here’s an example sketch (the helper names and file paths are illustrative):
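```ts
// helpers.ts — plain exported functions, no Playwright fixtures involved
import type { Page } from '@playwright/test'

export async function setUp(page: Page) {
  await page.goto('https://danube-web.shop/') // illustrative shop URL
}

export async function tearDown() {
  console.log(`check finished at ${new Date().toISOString()}`)
}
```

```ts
// my-check.spec.ts — every test file has to repeat this wiring
import { test } from '@playwright/test'
import { setUp, tearDown } from './helpers'

test.beforeEach(async ({ page }) => setUp(page))
test.afterEach(async () => tearDown())
```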
While it’s great to use native language features, this requires a fair amount of setup in each test file (the snippets above would have to be repeated in every new file), and our goal is to make creating new checks as easy as possible for developers. So let’s use Playwright’s fixtures to make this even easier for them. We can define an automatic fixture in a shared file, sketched below (the file and fixture names are illustrative):
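```ts
// fixtures.ts — an automatic fixture that wraps every test
import { test as base } from '@playwright/test'

export const test = base.extend<{ timing: void }>({
  timing: [
    async ({ page }, use, testInfo) => {
      await page.goto('https://danube-web.shop/') // runs before every test
      const start = Date.now()
      await use()
      // runs after every test
      console.log(`${testInfo.title} took ${Date.now() - start}ms`)
    },
    { auto: true }, // applied to every test without being referenced
  ],
})

export { expect } from '@playwright/test'
```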
Now the only change you’ll need to make to your test files is to have them import the `test` function from this fixture file, for example:
To see this demonstrated, along with a step-by-step explanation of the fixture code, check out Stefan’s tutorial video.
Here’s our complete YouTube playlist on fixtures.
Simplifying monitoring processes through automation and standardized practices allows developers to focus on what matters: ensuring their services function as intended. By leveraging tools like the Checkly CLI and adopting practices like Monitoring as Code, reusable fixtures, and standardized configurations, teams can involve developers in monitoring without overwhelming them with setup complexity or configuration details.
These strategies reduce repetitive work, enable consistency across tests, and ensure that monitoring integrates seamlessly into the development workflow. Ultimately, this approach promotes a culture of shared responsibility for reliability, making it easier for developers to contribute to the stability and observability of their systems without needing deep expertise in monitoring tools. With these methods, teams can improve collaboration and maintain a robust monitoring setup that scales with their needs.