In a few months Universal Analytics will stop working. If you have already implemented your GA4 (Google Analytics 4) property then it’s a good start. But what about your data models ?

It could be challenging to start from scratch. That’s why we would like to propose one simple starting solution that in our opinion could fit many use cases and analysis for you and your team. This implementation does not answer every use case but it should definitely be one of the Google Analytics datamarts in your possession. Prior to any implementation, you should always do a workshop with the different business stakeholders to understand their needs; this will drive in the end your optimal Google Analytics models.

Nota bene: in this article we simplify the model a lot to explain easily how to build it. Contact us if you want to know more about the full dbt code.

Methodology

GA4 has an “event” granularity, with every “event” linked to a page. Our approach is to have a session / pages granularity like the example below:

Table showing website analytics data with columns for date, session ID, page location, and counts for multiple events. The date '2024-01-01' is consistent across three rows, with varying session IDs and page locations such as 'homepage' and 'contact_us', along with different numbers of events recorded

With this structure you will be able to add as many dimensions and measures you want per page (like channel, device, number of add_to_cart clicks, number of transactions and revenue). All of those dimensions and measures are linked to a specific page that’s why it seems relevant to have this type of granularity.

Implementation Steps

Let’s take an example, imagine you want to analyse basic metrics like number of sessions, number of users, and one click button event that we are going to call “book a meeting”. The resulting table we would like to have is this one:

Data table from a website analytics report for the date 2024-01-01, displaying columns for session_id, full_visitor_id, page_location, marketing channel, and a book_meeting action. The table records different sessions navigating to homepage and contact_us pages through direct and referral channels with an indication of whether a meeting was booked

The first step is to flatten the google analytics raw table.

Flat table

Let’s do it with fewer fields to simplify our example, but of course you can add as many fields as you want.

### int_flat_ga4_tmp
SELECT
   -- Timestamps
  , DATE(TIMESTAMP_SECONDS({{ unnest_key('event_params', 'ga_session_id', 'int_value') }})) AS date_date
  , TIMESTAMP_MICROS(event_timestamp) AS event_timestamp
   -- Ids
  , CONCAT(SPLIT(user_pseudo_id, '.')[OFFSET(0)], {{ unnest_key('event_params', 'ga_session_id', 'int_value') }}) AS session_id
   -- Pages
  , IFNULL({{ unnest_key('event_params', 'page_referrer', 'string_value') }}, "(not set)") AS page_referrer
  , {{ unnest_key('event_params', 'page_location', 'string_value') }} AS page_location
  , {{ unnest_key('event_params', 'page_title', 'string_value') }} AS page_title
   -- Acquisition
  , {{ channel_grouping('IFNULL(traffic_source.source, "(not set)")', 'IFNULL(traffic_source.medium, "(not set)")') }} AS channel_grouping
   -- Event
  , event_name
FROM {{ source('ga4', 'ga_raw') }}

To have this session / page granularity we need to identify the ranking of every page during a session and regroup events per page. Let’s do it this way:

### int_sessions_pages_rank
WITH sessions_pages_rank AS (
  SELECT
    CONCAT(SPLIT(user_pseudo_id, '.')[OFFSET(0)], {{ unnest_key('event_params', 'ga_session_id', 'int_value') }}) AS session_id
    , IFNULL({{ unnest_key('event_params', 'page_referrer', 'string_value') }}, "(not set)") AS page_referrer
    , {{ unnest_key('event_params', 'page_location', 'string_value') }} AS page_location
    , MIN(event_timestamp) AS event_id
  FROM {{ source('ga4', 'ga_raw') }}
  WHERE TRUE
  GROUP BY 1, 2, 3
)

SELECT
  session_id
  , page_referrer
  , page_location
  , ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_id) AS row_pages
FROM sessions_pages_rank
WHERE TRUE

Ok now we join this table to the flat_ga4:

### int_flat_ga4
SELECT
  fg.*
  , sp.row_pages
FROM {{ ref('int_flat_ga4_tmp') }} fg
LEFT JOIN {{ ref(int_sessions_pages_rank) }} sp ON
                                    fg.session_id = sp.session_id AND
                                    fg.page_location = sp.page_location AND
                                    fg.page_referrer = sp.page_referrer

It will give us the table below:

Table with analytics data including date, session ID, visitor ID, page location, channel type, events, and number of pages viewed on a website for January 1, 2024. Sessions show activity like page views and a book meeting event, via direct and referral channels, with varying numbers of pages viewed


This flat table serves as a foundational starting point for any desired granularity, whether it be at the session level, daily level, or focused on a particular event.

Session / Pages table

To create our final table that will be used by our BI tool, we need to group by session and row_pages. Remember, this row_pages field allows us to identify the ranking of every page during a session and regroup events per page.

### fc_sessions_pages
SELECT
   -- Primary Key
  , session_id
  , row_pages
   -- Date
  , MIN(date_date) AS date_date
   -- User
  , ANY_VALUE(full_visitor_id) AS full_visitor_id
   -- Page
  , ANY_VALUE(page_location) AS page_location
   -- Acquisition
  , ANY_VALUE(channel_grouping) AS channel_grouping
   -- Event
  , SUM(IF(event_name = "book meeting", 1, 0)) AS book_meeting
FROM {{ ref('int_flat_ga4') }} fg
GROUP BY 1, 2
Snapshot of a web analytics table from January 1, 2024, detailing session IDs, visitor IDs, page locations such as homepage and contact_us, marketing channels like direct and referral, and a column indicating the number of times a meeting was booked, with one instance of a booking on the contact_us page

That’s it! Now we have our final table. You can modify the code and add as many events and measures as you want. For our example we built some visualisations in Looker like the ones below.

Comprehensive dashboard view from Google Analytics 4 showing key website metrics including Users, Active Users, Sessions, Engaged Sessions, and Bookmeeting Sessions. There are pie charts for user acquisition by channel and device type, a line graph for number of sessions over several months, and a detailed channel grouping table with corresponding user and session numbers

Conclusion

Why this granularity? The “sessions / events” has some drawbacks because depending on your traffic and number of events you could end up with a huge table, and high storage / request cost. In the meantime, with the session / page granularity, a detailed table is present and we are still able to analyse what appends on a page for a specific session, and understand the journey of our users.

Thank you

For this example we simplified the way we build the different models. If you want to have the full code, go on our website and book a meeting with us 😉 We will be glad to discuss how we can implement the best Google Analytics data modelling to feed your business needs.

If you enjoyed reading this article, stay tuned as we regularly publish articles on Looker and how to get the most value of this tool. Follow Astrafy on LinkedIn to be notified for the next article ;).

If you are looking for support with your Looker implementation, advice on Data Stack or Google Cloud solutions, feel free to reach out to us at sales@astrafy.io.