Build a Monitoring and Alerting System with n8n: Uptime, Errors, and Competitor Tracking
I woke up at 3 AM to a client calling because their website had been down for four hours. Nobody on the team knew because they were all asleep. The staging server had eaten all the memory, the production database failover did not kick in, and the only reason anyone noticed was because a customer in a different time zone sent an angry email.
That was the last time I relied on hope as a monitoring strategy.
After that incident, I built a comprehensive monitoring and alerting system using n8n. It checks website uptime every five minutes, monitors API health endpoints, tracks error rates, watches SSL certificate expiration, monitors competitor pricing, and sends immediate alerts via Slack and email when anything goes wrong.
The system has been running for over a year across multiple client projects, and it has caught issues within minutes that would have otherwise gone undetected for hours. In this guide, I will show you how to build the same system.
Why n8n for Monitoring Instead of Dedicated Tools?
There are excellent dedicated monitoring tools out there — Datadog, New Relic, Pingdom, UptimeRobot. I use some of them alongside n8n. But n8n fills a specific niche that these tools miss:
Custom monitoring logic. Dedicated tools monitor standard metrics: is the server responding? What is the response time? But what about business logic monitoring? “Are we still getting new signups?” “Has the payment webhook fired in the last hour?” “Is the competitor’s pricing page showing a different price than yesterday?” n8n lets you monitor anything you can query via an API or web request.
Unified alerting. With dedicated tools, you end up with alerts from five different platforms going to different channels. n8n centralizes all monitoring into a single system with a single alerting configuration. One Slack channel, one escalation policy, one dashboard.
Cost. Monitoring tools charge per check, per host, or per metric. n8n lets you run unlimited checks at whatever frequency you want for a flat cost (or free if self-hosted). For a startup monitoring 20+ endpoints, the savings are significant.
Integration depth. When n8n detects an issue, it can do more than just send an alert. It can automatically restart a service via SSH, scale up a server via a cloud provider API, roll back a deployment, or create an incident ticket in your project management tool. Try doing that with UptimeRobot.
System Architecture
The monitoring system I built has four main components:
1. Uptime Monitoring — Check if websites and services are responding
2. API Health Checks — Validate that APIs are returning correct data
3. Error Tracking — Monitor application logs and error rates
4. Competitor Monitoring — Track competitor pricing, features, and availability
Each component runs on its own n8n workflow with its own schedule and alerting rules. Let me walk through each one.
Component 1: Website Uptime Monitoring
The uptime monitor is the simplest and most critical component. It checks whether your websites are accessible and responding within acceptable time limits.
The workflow. A Schedule Trigger node fires every 5 minutes. It triggers a series of HTTP Request nodes, one for each URL to monitor. I monitor the production website, the API endpoint, the admin dashboard, the documentation site, and any other critical URLs.
Each HTTP Request node is configured to time out after 10 seconds. I use the “Never Error” option so that failed requests do not stop the workflow — instead, the error is captured as output data that I can evaluate in the next node.
After each HTTP Request, an IF node checks three conditions:
– Did the request succeed (HTTP status code 200)?
– Was the response time under 3 seconds?
– Does the response body contain an expected string (a simple content verification)?
If any condition fails, the workflow branches to an alerting path.
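If you prefer, the three conditions can also live in a single Function node instead of an IF node. A minimal sketch — the 200-status, 3-second, and expected-string checks are from the text above; the function shape and input names are my own illustration of what the HTTP Request node exposes when "Never Error" is enabled:

```javascript
// Evaluate one uptime check: status code, response time, content match.
// Returns which conditions failed so the alert can say exactly what broke.
function evaluateCheck({ statusCode, responseTimeMs, body }, expectedText) {
  const failures = [];
  if (statusCode !== 200) failures.push(`bad status: ${statusCode}`);
  if (responseTimeMs >= 3000) failures.push(`slow response: ${responseTimeMs}ms`);
  if (!body || !body.includes(expectedText)) failures.push("expected content missing");
  return { healthy: failures.length === 0, failures };
}

// Example: a healthy check against a hypothetical page
const result = evaluateCheck(
  { statusCode: 200, responseTimeMs: 420, body: "<title>Acme Inc</title>" },
  "Acme Inc"
);
// result.healthy === true
```

Returning the list of failed conditions, rather than a plain boolean, keeps the alerting path downstream simple: the message can just join the failure reasons.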
Smart alerting (avoiding alert fatigue). A single failed check is not necessarily an outage — it could be a momentary network blip. I built a state management system to avoid false alarms.
When a check fails, the workflow writes the failure to a Google Sheet (or Redis, if you prefer speed) with a timestamp. Before sending an alert, it checks the sheet for the last 3 results for that URL. Only if the last 3 consecutive checks have failed does it send an alert. This means a real outage triggers an alert within 15 minutes (3 checks at 5-minute intervals), but a brief glitch does not.
Similarly, when a URL recovers after being down, the workflow sends a “recovery” notification so the team knows the issue has resolved.
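The suppress-until-three-failures and recovery logic can be sketched as a small pure function. The three-check threshold is from the text; the function name and the shape of the history records (as read back from the Google Sheet or Redis) are assumptions:

```javascript
// Decide whether this check result should produce an alert, a recovery
// notice, or nothing. `history` is the stored results for one URL,
// oldest first; `currentOk` is the result of the check that just ran.
const FAILURE_THRESHOLD = 3;

function decideNotification(history, currentOk) {
  const recent = history.slice(-FAILURE_THRESHOLD);
  const wasDown =
    recent.length === FAILURE_THRESHOLD && recent.every((r) => !r.ok);
  if (!currentOk) {
    // Count the current failure toward the streak.
    const streak = [...recent, { ok: false }].slice(-FAILURE_THRESHOLD);
    const nowDown =
      streak.length === FAILURE_THRESHOLD && streak.every((r) => !r.ok);
    // Alert exactly once: when the streak first reaches the threshold.
    return nowDown && !wasDown ? "alert" : "none";
  }
  return wasDown ? "recovery" : "none";
}
```

Note the "alert exactly once" detail: without it, a long outage would re-alert every 5 minutes, which is its own form of alert fatigue.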
Alert content. The Slack alert includes: the URL that is down, how long it has been down (calculated from the first failure timestamp), the HTTP status code (or “timeout” if the request timed out), the last successful response time, and a direct link to the monitoring dashboard.
For critical services, I also send an email alert to ensure it reaches the on-call person even if they are not checking Slack.
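Assembling the Slack payload is a one-node job. The fields (URL, downtime, status or "timeout", last good response time, dashboard link) are the ones listed above; the message formatting and the dashboard URL are placeholders of my own:

```javascript
// Build the Slack alert text from the stored failure state.
// `firstFailureAt` is the timestamp of the first failed check (ms epoch);
// a null statusCode means the request timed out.
function buildSlackAlert({ url, firstFailureAt, statusCode, lastGoodResponseMs }) {
  const downMinutes = Math.round((Date.now() - firstFailureAt) / 60000);
  return {
    text:
      `:rotating_light: *${url} is DOWN*\n` +
      `Down for: ~${downMinutes} min\n` +
      `Status: ${statusCode ?? "timeout"}\n` +
      `Last good response time: ${lastGoodResponseMs}ms\n` +
      `Dashboard: https://example.com/monitoring`, // placeholder link
  };
}
```

In n8n this object feeds straight into a Slack node (or an HTTP Request node pointed at an incoming webhook).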
Response time tracking. Even when sites are up, I track response times over time. The workflow logs each check’s response time in a Google Sheet with a timestamp. I use this data to spot performance degradation trends — if the average response time has increased by 50% over the last week, that is an early warning of an issue even if nothing has gone down yet.
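The 50%-over-a-week degradation check reduces to comparing two averages. The threshold is from the text; the function and variable names are illustrative, with the samples assumed to come from the logging sheet:

```javascript
// Compare average response times between two windows (e.g. last week
// vs. this week) and flag a 50%+ increase as degradation.
function responseTimeTrend(lastWeekMs, thisWeekMs) {
  const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const before = avg(lastWeekMs);
  const now = avg(thisWeekMs);
  return { before, now, degraded: now / before >= 1.5 };
}
```

Running this once a day in its own scheduled workflow keeps the trend check decoupled from the 5-minute uptime loop.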
Getting Started with n8n
Building a monitoring system from scratch might sound like a big project, but n8n makes it surprisingly straightforward. Try n8n here to get started with the cloud version, which is the easiest way to set up your first monitoring workflow without worrying about server management.
Component 2: API Health Checks
Uptime monitoring tells you if a server is responding, but it does not tell you if the API is actually working correctly. An API can return HTTP 200 while serving completely wrong data because of a database issue or a broken dependency.
API health checks go deeper by validating the actual response content.
Authentication check. The workflow calls the login endpoint with test credentials and verifies that it returns a valid authentication token. If authentication is broken, nothing else in the application works.
Data endpoint checks. For each critical API endpoint, the workflow sends a request and validates the response against expected criteria. For example, for a product listing API, it checks that the response is valid JSON, that it contains at least one product, that each product has the required fields (name, price, image URL), and that the price values are within a reasonable range (no products at $0 or $999,999).
I use a Function node to run these validations and produce a structured result object that indicates which checks passed and which failed.
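Here is a sketch of that Function-node validation for the product listing example. The four checks and the price bounds follow the text above; the field names (`name`, `price`, `imageUrl`) are illustrative, not a real schema:

```javascript
// Validate a product-listing API response. Returns a structured result
// indicating which checks passed, so the alert can name the failing one.
function validateProducts(rawBody) {
  const checks = {};
  let products;
  try {
    products = typeof rawBody === "string" ? JSON.parse(rawBody) : rawBody;
    checks.validJson = true;
  } catch {
    return { passed: false, checks: { validJson: false } };
  }
  checks.hasProducts = Array.isArray(products) && products.length > 0;
  checks.requiredFields =
    checks.hasProducts &&
    products.every((p) => p.name && p.price != null && p.imageUrl);
  checks.sanePrices =
    checks.hasProducts &&
    products.every((p) => p.price > 0 && p.price < 999999);
  return { passed: Object.values(checks).every(Boolean), checks };
}
```

The structured `checks` object is what makes this useful at 3 AM: "sanePrices failed" points you at the pricing table immediately, where a bare "health check failed" would not.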
Database connectivity check. The workflow calls an endpoint that queries the database and returns a simple result. If this check fails, it usually indicates a database connection issue. I specifically check both read and write operations by calling a test endpoint that writes a record and reads it back.
Third-party dependency checks. Modern applications depend on external services: payment processors, email providers, CDNs, authentication services. The workflow checks each critical dependency by calling a lightweight endpoint (usually a health or status endpoint) on each service. If Stripe’s API is down, I want to know about it before customers start complaining that they cannot check out.
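In n8n this is one HTTP Request node per dependency's health endpoint, followed by a Function node that summarizes the results. A sketch of that summary step — the service names and the 2xx-means-healthy rule are illustrative assumptions:

```javascript
// Summarize dependency health from the HTTP Request node outputs.
// Any non-2xx status counts as down.
function summarizeDependencies(results) {
  // results: [{ name, statusCode }], one entry per dependency check
  const down = results.filter(
    (r) => r.statusCode < 200 || r.statusCode >= 300
  );
  return { allHealthy: down.length === 0, down: down.map((r) => r.name) };
}

// Example outcome for a partial outage:
summarizeDependencies([
  { name: "Stripe", statusCode: 200 },
  { name: "Email provider", statusCode: 503 },
]);
// → { allHealthy: false, down: ["Email provider"] }
```

A dependency alert is worth routing to a different Slack channel than your own outage alerts, since the fix ("wait for Stripe" vs. "wake up the on-call engineer") is different.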
Scheduled execution. The API health check workflow runs on a Schedule Trigger every 10 minutes. This is less frequent than the uptime check because API health checks are more expensive (each run makes multiple API calls) and issues are less likely to be transient.
Component 3: Error Tracking and Log Monitoring
Even when everything is “up,” errors can be silently accumulating. A spike in 500 errors, a sudden increase in failed database queries, or a new type of exception that started appearing after the latest deployment — these issues need to be caught early.
Log aggregation approach. I connect n8n to application logs through two methods:
Method one: If the application uses a logging service like Sentry or LogRocket, I use the HTTP Request node to call their API and fetch recent error events. The workflow checks for any new errors since the last check, grouping them by error type and counting occurrences.
Method two: For applications that log to a file or a database, I use the SSH node to connect to the server and run a command that counts error-level log entries in the last 10 minutes. The command filters the log file for lines containing “ERROR” or “CRITICAL,” counts them, and returns the count.
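One way to wire that up: put the counting command in the SSH node's command field and parse its output in a Function node afterward. The command below is one possible shape (a systemd/journald setup with a placeholder unit name) — how you restrict to the last 10 minutes depends entirely on where and how your app logs, so treat it as an assumption:

```javascript
// Example command for the SSH node (assumes journald and a unit
// named "myapp"; for plain log files you'd filter by timestamp instead):
const command =
  'journalctl -u myapp --since "10 minutes ago" --no-pager | grep -cE "ERROR|CRITICAL"';

// The SSH node returns the command's stdout; parse it to a number.
// grep -c prints "0" when nothing matches, so empty/garbled output
// is the only case that needs a fallback.
function parseErrorCount(stdout) {
  const n = parseInt(String(stdout).trim(), 10);
  return Number.isNaN(n) ? 0 : n;
}
```

Keeping the parse step separate means the downstream anomaly logic only ever sees a clean integer, whatever the server happens to print.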
Anomaly detection. A simple error count is not very useful by itself. What matters is whether the error rate is abnormal. I built a basic anomaly detection system using a Function node:
The workflow maintains a rolling average of the error count for each 10-minute window over the last 24 hours. When the current error count exceeds 3x the rolling average, it triggers an alert. This accounts for natural fluctuations in error rates (e.g., higher traffic during business hours produces more errors) while catching genuine spikes.
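That spike check fits in a few lines of Function-node code. The 3x multiplier and the 24-hour rolling window are from the text; the minimum-baseline guard is my addition, to avoid dividing decisions by an all-zero quiet period:

```javascript
// Flag a spike when the current 10-minute error count exceeds 3x the
// rolling average. `recentCounts` holds the counts for the 10-minute
// windows from the last 24 hours (up to 144 values).
const SPIKE_MULTIPLIER = 3;

function isErrorSpike(recentCounts, currentCount) {
  if (recentCounts.length === 0) return false; // no baseline yet
  const avg = recentCounts.reduce((a, b) => a + b, 0) / recentCounts.length;
  // Guard (my addition): with an all-zero baseline, a single error would
  // otherwise count as an "infinite-x" spike.
  const baseline = Math.max(avg, 1);
  return currentCount > SPIKE_MULTIPLIER * baseline;
}
```

Because the baseline is a rolling 24-hour average, the business-hours/overnight fluctuation mentioned above is partially absorbed; a stricter variant would compare against the same time window on previous days.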
New error type detection. The workflow also maintains a list of known error types (stored in a Google Sheet). When a new error type appears that is not in the list, it triggers a separate alert specifically for new errors, since a previously unseen exception often points to a regression from the latest deployment.