Build Data Pipelines with n8n: A Practical ETL Tutorial for Non-Data Engineers
Data pipelines sound intimidating. The term conjures images of Spark clusters, Airflow DAGs, and data engineers writing complex SQL transformations. But here is the truth: most small and mid-sized businesses do not need enterprise-grade data infrastructure. What they need is a reliable way to pull data from one system, clean it up, and push it into another.
I am a startup consultant, not a data engineer. But I build data pipelines every week for my clients using n8n, and they work beautifully. From syncing Stripe payment data into Google Sheets for the finance team, to aggregating marketing metrics from five different platforms into a single dashboard, to migrating customer data between CRMs during a tool switch — n8n handles all of it.
In this tutorial, I will walk you through how to build practical data pipelines with n8n, using real examples from projects I have actually built. No Spark. No Kafka. No PhD required.
What Is a Data Pipeline (And Why Should You Care)?
A data pipeline is simply a series of steps that move data from a source to a destination, with some transformation in between. In the industry, this is called ETL: Extract, Transform, Load.
– Extract: Pull data from a source (API, database, file, spreadsheet)
– Transform: Clean, filter, restructure, or enrich the data
– Load: Push the processed data into a destination (database, spreadsheet, dashboard, another API)
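That is the whole idea. Here it is as a ten-line sketch, with placeholder URLs (nothing below is a real API):

```javascript
// A minimal, illustrative ETL run: pull orders, keep paid ones, push rows out.
const response = await fetch('https://api.example.com/orders');   // Extract
const orders = await response.json();
const rows = orders
  .filter(o => o.status === 'paid')                                // Transform
  .map(o => ({ id: o.id, total: o.amount / 100, date: o.created }));
await fetch('https://sheets.example.com/append', {                 // Load
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(rows),
});
```

Everything else in this tutorial is a variation on those three steps.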
You need data pipelines whenever your business data lives in multiple systems that do not talk to each other natively. Which is to say, always.
Your sales data is in HubSpot. Your billing data is in Stripe. Your support tickets are in Zendesk. Your marketing metrics are in Google Analytics, Facebook Ads Manager, and LinkedIn Campaign Manager. Your team wants a single dashboard that shows all of this together. That requires a data pipeline.
Why n8n for Data Pipelines?
I have used custom Python scripts, Apache Airflow, and even hand-rolled cron jobs connected to shell scripts. For startups and small teams, n8n is better than all of these for several reasons:
Visual debugging. When a pipeline breaks (and they do), n8n shows you exactly which node failed and what data it was processing. With a Python script, you are reading stack traces and adding print statements. With n8n, you click on the failed node and see the input and output data immediately.
No infrastructure to manage. Airflow requires a dedicated server, a database backend, and ongoing maintenance. n8n runs on a single server or in the cloud with zero ops overhead.
Built-in scheduling. n8n’s Schedule Trigger node lets you run pipelines at any interval — every minute, every hour, daily, weekly, or on specific days and times. No cron setup required.
Native integrations. n8n has nodes for most common data sources and destinations: databases (Postgres, MySQL, MongoDB), spreadsheets (Google Sheets, Airtable), APIs (HTTP Request node for anything), cloud storage (S3, Google Drive), and more.
Error handling. n8n supports try-catch-style error handling, retry logic, and alerting. When a pipeline fails, I get a Slack notification with the error details and can fix it immediately.
Real Example 1: Stripe to Google Sheets Financial Report
This is the most common data pipeline I build. The finance team wants a daily report of all Stripe transactions in a Google Sheet, broken down by product, currency, and status.
Extract phase. I start with a Schedule Trigger node set to run daily at 7 AM. It triggers an HTTP Request node that calls the Stripe API’s list charges endpoint, filtering for charges created in the last 24 hours. The Stripe API paginates results, so I use a Loop node that keeps fetching the next page until all charges are retrieved. Each API call returns up to 100 charges, and the loop handles pagination by passing the last charge ID as the starting_after parameter.
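Here is the pagination logic that loop implements, written as standalone JavaScript so the cursor mechanics are visible. STRIPE_SECRET_KEY is a placeholder for your API key; in n8n you would store it as a credential instead.

```javascript
// Fetch every charge created since `sinceUnixTs`, 100 at a time,
// using Stripe's cursor-based pagination (starting_after + has_more).
async function fetchAllCharges(sinceUnixTs) {
  const charges = [];
  let startingAfter = null;
  let hasMore = true;

  while (hasMore) {
    const params = new URLSearchParams({
      limit: '100',
      'created[gte]': String(sinceUnixTs),
    });
    if (startingAfter) params.set('starting_after', startingAfter);

    const res = await fetch(`https://api.stripe.com/v1/charges?${params}`, {
      headers: { Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}` },
    });
    const page = await res.json();

    charges.push(...page.data);
    hasMore = page.has_more;
    if (page.data.length > 0) {
      startingAfter = page.data.at(-1).id; // cursor for the next page
    }
  }
  return charges;
}
```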
Transform phase. The raw Stripe data includes dozens of fields I do not need. I use a Set node to extract only the relevant fields: charge ID, amount (converting from cents to dollars by dividing by 100), currency, customer email, product description, status, and timestamp. Stripe reports a charge status of succeeded, pending, or failed, plus a separate refunded flag; I collapse these into a single status field so downstream nodes only have to check one value.
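Written as Function-node code instead of a Set node, the extraction might look like this; the field names follow Stripe's charge object, and the status logic is the collapse just described:

```javascript
// Flatten each raw Stripe charge down to the fields the report needs.
return items.map(item => {
  const c = item.json;
  return {
    json: {
      chargeId: c.id,
      amount: c.amount / 100, // Stripe amounts are in cents
      currency: c.currency,
      customerEmail: c.billing_details?.email ?? null,
      description: c.description,
      // collapse Stripe's status + refunded flag into one field
      status: c.refunded ? 'refunded' : c.status,
      createdAt: c.created, // Unix timestamp; converted to local time later
    },
  };
});
```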
Next, I use a Function node to perform aggregations. I group the charges by product and calculate the total revenue, number of transactions, and average transaction value for each product. I also flag any failed charges or refunds for the finance team to review.
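A minimal sketch of that aggregation, assuming the flattened fields from the previous step and treating the charge description as the product name:

```javascript
// Group charges by product and compute revenue, count, and average.
const byProduct = {};
for (const { json: c } of items) {
  const key = c.description || 'unknown';
  byProduct[key] ??= { product: key, revenue: 0, count: 0 };
  byProduct[key].revenue += c.amount;
  byProduct[key].count += 1;
}
return Object.values(byProduct).map(p => ({
  json: { ...p, avgTransaction: p.revenue / p.count },
}));
```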
I add a DateTime node to convert the Unix timestamps from Stripe into human-readable dates in the team’s local time zone.
Finally, I use another Function node to calculate running totals: month-to-date revenue, comparison with the same day last month, and percentage change. This gives the finance team instant context.
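Sketched out, the running-total step looks like this, under the assumption that an earlier node attached mtdSoFar and sameDayLastMonth (read from the Summary sheet; both names are illustrative):

```javascript
// Compute today's total, month-to-date revenue, and percentage change.
const input = items[0].json;
const todayRevenue = input.totalRevenue;
const mtdRevenue = input.mtdSoFar + todayRevenue;
const pctChange = input.sameDayLastMonth
  ? ((todayRevenue - input.sameDayLastMonth) / input.sameDayLastMonth) * 100
  : null; // no baseline yet, e.g. the client's first month
return [{ json: { todayRevenue, mtdRevenue, pctChange } }];
```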
Load phase. The processed data goes into a Google Sheets node that appends rows to a “Daily Transactions” sheet. Each row contains the charge details, and a separate “Summary” sheet gets the aggregated metrics.
I also use a conditional branch that checks if any charges failed or were refunded. If so, it adds those to a separate “Needs Review” sheet and sends a Slack notification to the finance channel.
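The branch condition itself is trivial; expressed as Function-node code rather than an IF node, it would look something like:

```javascript
// Pass through only the charges the finance team needs to review.
return items.filter(({ json }) =>
  json.status === 'failed' || json.status === 'refunded'
);
```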
The result is a self-updating financial dashboard that the team opens every morning with fresh data, without anyone manually exporting CSVs from Stripe.
Getting Started with n8n
If you want to build data pipelines like these, try n8n here. The cloud version comes with all the database and API integrations pre-configured, so you can connect your data sources and start building pipelines immediately.
Real Example 2: Multi-Platform Marketing Dashboard
My second most-requested pipeline aggregates marketing data from multiple platforms into a single Airtable base that powers a dashboard.
The data sources are Google Analytics (website traffic), Facebook Ads Manager (ad spend and conversions), LinkedIn Campaign Manager (B2B ad performance), Mailchimp (email campaign metrics), and HubSpot (lead generation numbers).
Extract phase. I use five parallel branches, each pulling data from a different platform:
– Google Analytics: The Google Analytics node pulls pageviews, sessions, bounce rate, and conversion events for the previous day
– Facebook Ads: The HTTP Request node calls the Facebook Marketing API to get spend, impressions, clicks, and conversions for each active campaign (see the sketch after this list)
– LinkedIn Ads: Another HTTP Request node calls the LinkedIn Marketing API for similar metrics
– Mailchimp: The Mailchimp node retrieves campaign performance data (sends, opens, clicks, unsubscribes) for any campaign sent in the last 24 hours
– HubSpot: The HubSpot node pulls the number of new contacts, new deals, and deal values created yesterday
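To make one branch concrete, here is the Facebook Ads call sketched as plain JavaScript. The endpoint and field names come from the Facebook Marketing API's insights edge; AD_ACCOUNT_ID and FB_TOKEN are placeholders you would store as n8n credentials:

```javascript
// Pull yesterday's campaign-level metrics from the Facebook Marketing API.
// The API version is pinned here; use whichever version your account targets.
const url = new URL(
  `https://graph.facebook.com/v19.0/act_${process.env.AD_ACCOUNT_ID}/insights`
);
url.searchParams.set('level', 'campaign');
url.searchParams.set('fields', 'campaign_name,spend,impressions,clicks');
url.searchParams.set('date_preset', 'yesterday');
url.searchParams.set('access_token', process.env.FB_TOKEN);

const res = await fetch(url);
const { data } = await res.json(); // one row per campaign
```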
Each branch runs independently, so even if one API is slow or fails, the others continue processing. This is one of n8n’s strengths — parallel execution with independent error handling.
Transform phase. Each data source returns data in a completely different format. The transform phase normalizes everything into a consistent schema.
I use Set nodes on each branch to map the platform-specific fields to a common structure: date, platform, metric_name, metric_value, and campaign_name (where applicable).
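Written as Function-node code, the Facebook branch's mapping might look like this; date_start, spend, clicks, and impressions are the insights fields from the extract step, and the output matches the common schema above:

```javascript
// Fan each Facebook insights row out into one item per metric,
// matching the schema: date, platform, metric_name, metric_value.
return items.flatMap(({ json }) => {
  const base = {
    date: json.date_start,
    platform: 'facebook_ads',
    campaign_name: json.campaign_name,
  };
  return ['spend', 'clicks', 'impressions'].map(metric => ({
    json: { ...base, metric_name: metric, metric_value: Number(json[metric]) },
  }));
});
```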
A Merge node combines all five branches into a single data stream. Then a Function node calculates cross-platform metrics: total ad spend across all platforms, blended cost per lead (total spend divided by total new HubSpot contacts), and channel-level ROI estimates.
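The blended cost-per-lead calculation, assuming the merged stream uses that schema and the HubSpot branch reports new contacts under the (illustrative) metric name new_contacts:

```javascript
// Cross-platform rollup: total spend and blended cost per lead.
const rows = items.map(i => i.json);
const totalSpend = rows
  .filter(r => r.metric_name === 'spend')
  .reduce((sum, r) => sum + r.metric_value, 0);
const newContacts = rows
  .filter(r => r.platform === 'hubspot' && r.metric_name === 'new_contacts')
  .reduce((sum, r) => sum + r.metric_value, 0);
const blendedCpl = newContacts > 0 ? totalSpend / newContacts : null;
return [{ json: { totalSpend, newContacts, blendedCpl } }];
```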
Load phase. The normalized data is loaded into Airtable using the Airtable node. Each row in the Airtable base represents a single metric for a single day and platform. This structure makes it easy to build views, filters, and charts in Airtable’s interface.
I also push the summary metrics into a Slack message that goes out at 8 AM, giving the marketing team a quick daily digest without having to open Airtable.
The whole pipeline runs in under 2 minutes and gives the team a unified view of marketing performance that would take hours to compile manually.
Real Example 3: CRM Data Migration
When one of my clients switched from Salesforce to HubSpot, they needed to migrate over 15,000 contacts, 3,000 companies, and 8,000 deals. HubSpot's native migration tools covered the basics but choked on the custom fields, relationship mappings, and data cleanup this client needed.
I built a migration pipeline in n8n that handled the entire process.
Extract phase. I used the Salesforce node to bulk-export all contacts, companies, and deals. The Salesforce API returns data in batches, so I used a Loop node similar to the Stripe pagination example. For each object type (Contact, Company, Deal), I queried all fields including custom ones.
Transform phase. This was the most complex part. Salesforce and HubSpot use different field names, data types, and picklist values. I built a comprehensive mapping layer:
A Function node mapped Salesforce field names to HubSpot field names. For example, Salesforce’s “Account” became HubSpot’s “Company,” and the custom field “ARR_Band__c” became a HubSpot custom property “annual_revenue_band.”
Another Function node converted picklist values. Salesforce uses “Closed-Won” while HubSpot uses “closedwon.” Deal stages, lead statuses, and industry categories all needed mapping.
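A condensed sketch of both mapping nodes in one; the two example entries are the ones mentioned above, and the lowercase fallback is illustrative rather than a real HubSpot rule:

```javascript
// Map Salesforce field names and picklist values to HubSpot equivalents.
const FIELD_MAP = {
  ARR_Band__c: 'annual_revenue_band',
  // ...one entry per custom field
};
const VALUE_MAP = {
  'Closed-Won': 'closedwon',
  'Closed-Lost': 'closedlost',
  // ...deal stages, lead statuses, industry categories
};
return items.map(({ json: sf }) => {
  const hubspot = {};
  for (const [field, value] of Object.entries(sf)) {
    const hsField = FIELD_MAP[field] ?? field.toLowerCase();
    hubspot[hsField] = VALUE_MAP[value] ?? value;
  }
  return { json: hubspot };
});
```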
I also used a Function node to clean the data: trimming whitespace from names, normalizing phone numbers to E.164 format, removing duplicate email addresses, and flagging records with missing required fields.
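And the cleanup pass, as a sketch; the E.164 handling here is deliberately naive and US-biased, so a real migration would lean on a library such as libphonenumber:

```javascript
// Trim names, normalize phones, drop duplicate emails, flag incomplete records.
const seenEmails = new Set();
const cleaned = [];
for (const { json: c } of items) {
  const email = (c.email || '').trim().toLowerCase();
  if (email && seenEmails.has(email)) continue; // duplicate email: skip record
  if (email) seenEmails.add(email);

  const digits = (c.phone || '').replace(/\D/g, '');
  cleaned.push({
    json: {
      ...c,
      firstname: (c.firstname || '').trim(),
      lastname: (c.lastname || '').trim(),
      email,
      // naive E.164: assume a US number when only 10 digits are present
      phone: digits ? (digits.length === 10 ? `+1${digits}` : `+${digits}`) : '',
      needsReview: !email || !c.lastname, // missing required fields
    },
  });
}
return cleaned;
```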
For