GA4 vs BigQuery reporting is not equal. Why, and what are the main reasons?
A very common snag when discussing reports built on the GA4 export is the remark, “But the data is different from the GA4 user interface”, sometimes followed by a sentence like “it’s wrong, fix it”.
So in this article we will try to explain how these differences arise and what is actually correct.
GA4 vs BigQuery reporting: why is it not equal?
Data in the GA4 user interface is not equal to the data exported to BigQuery. Google Analytics 4 reports do not represent reality. Nothing changed in this respect with the advent of GA4: GA3 didn’t represent reality either, it was just a long-accepted fact. This article does not discuss previously known factors such as consent policies, browser limitations, blocking tools, etc.
Instead, it discusses the new reasons that come with general access to raw GA4 data. With GA3, raw data was only accessible to GA360 users, or to users who downloaded data from GA3 in small chunks and then pieced it together.
Reporting GA4 vs BigQuery: Main reasons
GA4 attribution is still a black box.
In GA3, everyone was used to last non-direct click attribution, or, in GA360, to a different model that we set up ourselves. In GA4, the attribution model is set from the beginning and its exact definition is still a black box, so it cannot be replicated one to one from the exported data. You therefore need to understand which attribution model your reporting uses.
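To make the difference between attribution models concrete, here is a minimal sketch in Python. It does not reproduce GA4's model, which is not public. It only contrasts plain last click with last non-direct click on a made-up path of touchpoints. The data and function names are illustrative assumptions.

```python
# Hypothetical touchpoints for one user, oldest first: (source, medium).
DIRECT = ("(direct)", "(none)")

touchpoints = [
    ("google", "cpc"),
    ("newsletter", "email"),
    DIRECT,
]


def last_click(path):
    """Credit the most recent touchpoint, whatever it is."""
    return path[-1] if path else None


def last_non_direct_click(path):
    """Credit the most recent touchpoint that is not direct traffic.

    Falls back to direct only when the path contains nothing else.
    """
    for touchpoint in reversed(path):
        if touchpoint != DIRECT:
            return touchpoint
    return path[-1] if path else None


print("last click:           ", last_click(touchpoints))
print("last non-direct click:", last_non_direct_click(touchpoints))
```

The same conversion is credited to direct traffic under one model and to the newsletter under the other, which is why two reports built on different models will not agree.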
Interface estimates some data.
Because exact counting is complex to process, Google estimates some data. It uses an improved version of the HyperLogLog algorithm, HLL++: https://en.wikipedia.org/wiki/HyperLogLog
While we know the algorithm that Google uses to estimate some metrics, unfortunately it is not possible to replicate it in BigQuery for all metrics, for the reason given in Google's documentation:
Note: For user counts, Google Analytics uses sparse precision value of 25. Since BigQuery sparse precision value is always precision + 5, the value will default to 19. Thus, this parameter will not match with Google Analytics UI when counting users. There will be a small difference in user count for cardinalities up to approximately 12,200.
More info: https://developers.google.com/analytics/blog/2022/hll?ref=ga4bigquery
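To show why an estimated count rarely equals the exact one, and why the precision setting matters, here is a minimal sketch of plain HyperLogLog in Python. It is not HLL++ and has no sparse representation, so it will not reproduce GA4 or BigQuery numbers. The hash function, the user IDs and the precision values are assumptions chosen for illustration.

```python
import hashlib
import math


def hll_estimate(items, precision=14):
    """Estimate the number of distinct items with plain HyperLogLog."""
    m = 1 << precision                  # number of registers
    registers = [0] * m
    tail_bits = 64 - precision
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # use 64 bits of the hash
        index = h >> tail_bits                  # first `precision` bits
        rest = h & ((1 << tail_bits) - 1)       # remaining bits
        # Rank: 1-based position of the leftmost 1-bit in `rest`.
        rank = tail_bits - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)    # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:        # small-range correction
        return m * math.log(m / zeros)
    return raw


# Hypothetical user IDs.
users = [f"user-{i}" for i in range(100_000)]
exact = len(set(users))
for p in (10, 14):
    estimate = hll_estimate(users, precision=p)
    print(f"precision {p}: estimate {estimate:.0f}, exact {exact}")
```

Two systems that run the same family of algorithm with different precision settings will return slightly different user counts, even on identical data.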
Google Signals
The BigQuery export may show more users than the interface if you have Google Signals enabled. Since this is personal information, Google is very protective of this data and does not provide it in the export. Google Signals attempts to combine data from multiple devices or “cookies” into one user by using additional information that the user provides to Google. This cannot be replicated one to one, but it is of course possible to use the user data you already have, whether in GA4 or in other systems.
Mismatch in the definition of metrics
The GA4 data model was built to survive for a while. GA3 came in 2012 and there are still plenty of people who would be thrilled if GA3 could stay around even though sites and apps have fundamentally shifted since 2012.
Until recently, there was no such thing as a session in GA4. Only under user pressure did Google decide to count visits in reports. However, it doesn’t count visits as we know them from GA3; it counts session_start events, which can slightly alter the data in some extreme situations.
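The effect of the session definition can be sketched on a few made-up events. The rows below are simplified assumptions: in the real export, ga_session_id is stored inside the event_params array, not as a top-level column. The sketch compares counting session_start events with counting distinct combinations of user_pseudo_id and ga_session_id.

```python
# Hypothetical, simplified export rows.
events = [
    {"user_pseudo_id": "A", "ga_session_id": 1, "event_name": "session_start"},
    {"user_pseudo_id": "A", "ga_session_id": 1, "event_name": "page_view"},
    {"user_pseudo_id": "B", "ga_session_id": 7, "event_name": "session_start"},
    {"user_pseudo_id": "B", "ga_session_id": 7, "event_name": "purchase"},
    # A session whose session_start event is missing from the queried range.
    {"user_pseudo_id": "C", "ga_session_id": 3, "event_name": "page_view"},
]


def count_session_starts(rows):
    """Sessions counted as the number of session_start events."""
    return sum(1 for row in rows if row["event_name"] == "session_start")


def count_distinct_sessions(rows):
    """Sessions counted as distinct (user, session id) pairs."""
    return len({(row["user_pseudo_id"], row["ga_session_id"]) for row in rows})


print("session_start events:", count_session_starts(events))
print("distinct sessions:   ", count_distinct_sessions(events))
```

Whichever definition you choose, use it consistently across all reports, and expect small differences from the interface.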
First_user_source_medium, session_source_medium and source_medium are three different dimensions, scoped to the user, the session and the event respectively.
Transaction revenue: GA4 stores revenue in USD by default. For other currencies, GA4 applies exchange rates, so the currency shown in GA4 may differ from the one used in reporting on the raw data.
More info: https://support.google.com/analytics/answer/9796179#zippy=%2Cin-this-article
Bad GoogleAds visit attribution
If a visit comes from a Google ad and carries a gclid in the URL, GA4 does not assign the google/cpc source/medium to this visit, but google/organic. When utm parameters are present, Google is not consistent: sometimes it assigns the visit correctly and sometimes not. There is currently no official statement on whether this will change or whether this is the desired state.
When google/organic is assigned, no campaign is assigned either. This can all be solved by using data from Google Ads, which contains every gclid and the campaign it belongs to. If you use multiple Google Ads accounts, the situation starts to get very complicated.
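The repair described above can be sketched as follows. Everything here is an illustrative assumption: the session dictionary, the landing_page field and the gclid_to_campaign mapping stand in for your own session table and for click data taken from Google Ads. They are not real export or API names.

```python
from urllib.parse import parse_qs, urlparse


def extract_gclid(url):
    """Return the gclid query parameter of a URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("gclid")
    return values[0] if values else None


def fix_attribution(session, gclid_to_campaign):
    """Relabel a session as google/cpc when its landing page has a gclid."""
    fixed = dict(session)  # do not mutate the input row
    gclid = extract_gclid(session["landing_page"])
    if gclid is not None:
        fixed["source"] = "google"
        fixed["medium"] = "cpc"
        # Keep the existing campaign when the gclid is unknown to Google Ads.
        fixed["campaign"] = gclid_to_campaign.get(gclid, session.get("campaign"))
    return fixed


# Hypothetical mapping built from Google Ads click data.
gclid_to_campaign = {"abc123": "spring_sale"}

session = {
    "landing_page": "https://example.com/?gclid=abc123&x=1",
    "source": "google",
    "medium": "organic",
    "campaign": None,
}

print(fix_attribution(session, gclid_to_campaign))
```

With several Google Ads accounts, the mapping has to be merged from all of them before this step, which is where most of the complexity comes from.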
Aggregation of segmented data
This is not a GA4 feature, but an old and well-known user error. Since it is more common than it should be, it is included in this article.
Because GA4 exports can be really large, it is common for some metrics to be pre-calculated. For example, if we precalculate the number of site users by day:
Monday 100 users
Tuesday 100 users
1+1 ≠ 2
The total number will usually not be 200 users, since the same user may have visited our site on different days, and if we add up the individual dates, we are counting one user multiple times.
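The same effect in a few lines of Python, with made-up user IDs and three users per day instead of 100:

```python
# Hypothetical sets of user IDs seen on each day.
daily_users = {
    "Monday": {"u1", "u2", "u3"},
    "Tuesday": {"u2", "u3", "u4"},
}

# Adding up precalculated daily counts counts returning users twice.
sum_of_daily_counts = sum(len(users) for users in daily_users.values())

# The correct total is the distinct count over the whole period.
total_distinct_users = len(set().union(*daily_users.values()))

print("sum of daily counts: ", sum_of_daily_counts)
print("distinct users total:", total_distinct_users)
```

Distinct counts have to be recalculated from the underlying user IDs for each period you report on. They cannot be obtained by summing smaller periods.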
Conclusion
Although GA4 is not perfect, don’t hang your head. Perfection was never its purpose. It is still the best tool on the market for addressing marketing performance, site trends, user trends, and much more. After all, it is crucial to consider whether you would fundamentally change your behaviour if someone told you that your site was read not by 120,986 people but by 120,362, or, in the extreme case, by 300,000 people, if the real figure had always been triple what you thought since the site’s inception. Of course, that doesn’t mean we shouldn’t address data quality, but in the case of GA4 data there is no need to look for absolute perfection.
If you have any questions about this topic, be sure to contact our professionals! We have free capacity and can help you make the most of your data.