Making informed marketing decisions requires an integrated view of data from multiple sources — Google Ads, Google Analytics, Google Search Console, and various custom systems.
Over the years, I’ve refined a data engineering process that unifies these disparate data streams into a single, cohesive platform. The result is a solution that not only simplifies data management but also enhances the decision-making capabilities of marketing teams.
A Unified Data Pipeline: The Core Architecture
At the heart of this solution is a robust data pipeline built on modern cloud-native technologies. The process begins with data ingestion, flows through centralized storage and transformation layers, and culminates in actionable analytics. Here’s a brief look at the architectural flow:
Data Ingestion: Airbyte extracts data from multiple marketing platforms.
Data Storage: All data is consolidated in Google BigQuery.
Data Transformation: dbt and Airflow work in tandem to prepare data for analysis.
Analytics: Looker visualizes the data for self-serve analytics.
Infrastructure: Airbyte and Airflow are deployed on Google Kubernetes Engine (GKE) using their free open-source (OSS) versions. We also use Terraform to manage the infrastructure, following infrastructure-as-code best practices.
Each component has been selected and configured to address the challenges of scalability, reliability, and ease of maintenance.
![Diagram illustrating a data pipeline from MySQL, PostgreSQL, Google Ads, and Google Analytics 4 sources to Airbyte for ingestion, then transformations across layered data zones, culminating in Looker for business intelligence. Airflow coordinates each step](https://framerusercontent.com/images/02Q02L61DhBq1qWtBjiAozNS8GQ.png)
Marketing Data Stack Architecture
Ingestion with Airbyte
The first step in building a reliable data pipeline is data ingestion. Airbyte serves as the engine that extracts data from various sources, including Google Ads, Google Analytics, Search Console, and custom APIs. Its flexibility makes adding new sources inexpensive: you pay only for compute, not per row or gigabyte ingested as with many managed ingestion tools.
![Airbyte dashboard showing multiple connections (name, source, destination, sync info) with toggles, a left navigation menu, and a ‘New Connection’ button.](https://framerusercontent.com/images/I21owT8fiY3HNiYcE4x1Xwgsg.png)
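Syncs are normally scheduled from the UI, but they can also be triggered programmatically, which is handy for event-driven ingestion. Below is a minimal Python sketch against the Airbyte OSS API; the host, port, and connection UUID are hypothetical placeholders.

```python
import requests

# Hypothetical placeholders: point these at your Airbyte OSS instance
# and one of the connections visible in the dashboard above.
AIRBYTE_API = "http://localhost:8000/api/v1"
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"

def trigger_sync(connection_id: str) -> dict:
    """Trigger a manual sync for one Airbyte connection and return the job metadata."""
    response = requests.post(
        f"{AIRBYTE_API}/connections/sync",
        json={"connectionId": connection_id},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["job"]

if __name__ == "__main__":
    job = trigger_sync(CONNECTION_ID)
    print(f"Started Airbyte sync job {job['id']} ({job['status']})")
```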
Centralized Storage in BigQuery
After extraction, data is funneled into Google BigQuery, a high-performance, serverless data warehouse. BigQuery was chosen for its ability to handle vast amounts of data, its fast query performance, and its native integration with the rest of Google Cloud Platform.
Having all data centralized in a single source of truth is the backbone of the architecture: it lets us provide the marketing teams with comprehensive reporting, so every decision can be made with the complete picture in view.
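To give a feel for what this single source of truth enables, here is a minimal sketch that queries consolidated ad spend with the official BigQuery Python client; the project, dataset, and column names are hypothetical.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Hypothetical project ID; authentication uses Application Default Credentials.
client = bigquery.Client(project="marketing-analytics")

# Hypothetical dataset/table landed by Airbyte from Google Ads.
query = """
    SELECT campaign_name, SUM(cost_micros) / 1e6 AS cost
    FROM `marketing-analytics.google_ads.campaign_stats`
    WHERE segment_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY campaign_name
    ORDER BY cost DESC
"""

# BigQuery runs the query serverlessly; we just iterate over the result rows.
for row in client.query(query).result():
    print(f"{row.campaign_name}: {row.cost:,.2f}")
```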
Data Transformation Using dbt and Airflow
Raw data rarely arrives in a format ready for immediate analysis. This is where data transformation comes into play. Using dbt (data build tool), we write clear, maintainable SQL models that cleanse and structure the data, turning raw inputs into a set of business-friendly models optimized for analytics.
![A four-stage data pipeline (Landing Zone, Staging Layer, Intermediate Layer, Curated Datamarts) shown with magnifying glass icons along the flow. Above each stage are Docker, Kubernetes, and the dbt logo.](https://framerusercontent.com/images/aLa3OSXQHIyHGdbVMDrvg89Z8zI.png)
Transformation Layers
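Each layer in the diagram above maps to a folder of dbt models, so the layers can be built and rebuilt selectively. As a sketch (assuming dbt-core 1.5+ and hypothetical folder names matching the diagram), dbt can also be invoked from Python rather than the CLI:

```python
# Programmatic dbt invocation, available since dbt-core 1.5.
from dbt.cli.main import dbtRunner, dbtRunnerResult

runner = dbtRunner()

# Build the layers in order; each selector is a (hypothetical) models/ subfolder.
for layer in ["staging", "intermediate", "marts"]:
    result: dbtRunnerResult = runner.invoke(["run", "--select", layer])
    if not result.success:
        raise RuntimeError(f"dbt run failed on layer '{layer}': {result.exception}")
```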
To orchestrate these transformations, we use Apache Airflow. It schedules and monitors the execution of the dbt jobs, ensuring that the entire transformation pipeline runs reliably. This combination of dbt and Airflow not only automates the process but also provides a clear audit trail of data changes, which is essential for maintaining data quality over time.
![A screenshot of an Airflow DAG Docs page displaying a bar chart of task runs on the left (with tasks like ‘get_dbt_mode_parameter’) and a color-coded DAG graph on the right, showing each task’s status and run history.](https://framerusercontent.com/images/aximEtMEc0NWORcqiJ8uH5jq14.png)
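As an illustration of the orchestration layer, a minimal daily DAG that runs the dbt project and then tests it could look like the sketch below (Airflow 2.4+; the DAG ID, schedule, and project path are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical schedule: run at 06:00 daily, after the Airbyte syncs have landed.
with DAG(
    dag_id="marketing_dbt_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/marketing && dbt run",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/marketing && dbt test",
    )

    # Only test the models once they have been (re)built.
    dbt_run >> dbt_test
```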
Delivering Insights with Looker
The final stage of the pipeline is delivering actionable insights to the marketing team through Looker. Looker connects directly to BigQuery and provides a self-service analytics platform where users can explore data, generate reports, and build interactive dashboards.
With Looker, the marketing team gains the flexibility to drill down into metrics and uncover trends without needing constant support from the data engineering team.
![Website analytics dashboard showing key metrics (users, sessions, page views, bookings, and appointment rate), alongside a line chart of sessions by month (October to January).](https://framerusercontent.com/images/lMW97MAhg1lnCDWvAg5q400wbjQ.png)
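Dashboards like the one above are the main interface, but the same governed data can also be pulled programmatically, for example to archive a report or feed another system. Here is a small sketch with the official Looker Python SDK, assuming credentials in a looker.ini file and a hypothetical Look ID:

```python
import looker_sdk  # pip install looker-sdk

# Reads API credentials from looker.ini (or LOOKERSDK_* environment variables).
sdk = looker_sdk.init40()

# Hypothetical Look ID: fetch the data behind a saved Look as CSV.
csv_report = sdk.run_look(look_id="42", result_format="csv")
print(csv_report[:500])
```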
Scalable Deployment with GKE and Terraform
Behind the scenes, the deployment of this entire system is managed using Google Kubernetes Engine (GKE) and Terraform. Airbyte and Airflow are containerized and deployed on GKE, which provides a scalable and resilient environment to handle fluctuating workloads.
Terraform plays a critical role in this setup by automating the provisioning of all Google Cloud resources. This Infrastructure as Code approach ensures that the deployment is consistent, reproducible, and easy to manage across different environments — from development to production.
![GKE + Terraform for Infrastructure](https://framerusercontent.com/images/eBU0YXBx39FdP5C6Im2CTlNyk5Q.png)
GKE + Terraform for Infrastructure
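As a rough sketch of what consistency across environments looks like in practice, the same Terraform configuration can be planned and applied per environment from a CI job, with one workspace per environment to keep state isolated. The directory layout and var files below are hypothetical, and the -or-create flag requires Terraform 1.4+.

```python
import subprocess

# Hypothetical environments, each with its own var file under infra/env/.
ENVIRONMENTS = ["dev", "staging", "prod"]

def terraform(*args: str) -> None:
    """Run a Terraform CLI command inside the (hypothetical) infra/ directory."""
    subprocess.run(["terraform", *args], cwd="infra", check=True)

terraform("init", "-input=false")

for env in ENVIRONMENTS:
    # One workspace per environment: identical configuration, isolated state.
    terraform("workspace", "select", "-or-create", env)
    terraform("plan", f"-var-file=env/{env}.tfvars", "-out=tfplan", "-input=false")
    terraform("apply", "-input=false", "tfplan")
```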
Conclusion
This architecture unifies data from Google Ads, Analytics, Search Console, and other sources into one streamlined pipeline. By leveraging Airbyte, BigQuery, dbt, Airflow, and Looker — and deploying on GKE with Terraform — we create a scalable, reliable system that delivers timely, actionable insights.
We have followed this integrated approach with many different teams; it has simplified their decision-making, improved ROI, and enabled them to pivot strategies quickly in response to real-time insights, ultimately driving sustained growth and operational excellence.
Thank you
If you enjoyed reading this article, stay tuned as we regularly publish technical articles on Looker and how to get the most out of it. Follow Astrafy on LinkedIn to be notified of the next article ;).
If you are looking for support on Looker, Data Stack, or Google Cloud solutions, feel free to reach out to us at sales@astrafy.io.