AdBlockers hide the majority of visitors from web analytics
I used Mixpanel analytics to tell me how many people were reading my articles. Recently I implemented my own custom analytics, and I was astonished when it showed more than twice as many page views as Mixpanel. AdBlock was hiding sixty per cent of my site’s visitors from Mixpanel.
Illustration by Ljubica Petkovic
Everyone who publishes on the internet loves analytics. It’s a big part of the writing feedback process. That’s why I included analytics on this blog. I decided to use Mixpanel because I value my users’ privacy. As far as I was able to tell, Mixpanel does not use the captured user data for advertising. I only wanted to know that someone visited my blog; I didn’t need all the detailed user demographics you get from Google Analytics.
Mixpanel is a terrific tool, but the free plan only allows for a thousand monthly users. When I published my Zettelkasten article, I exceeded my limit by an order of magnitude. The plan that would handle small bursts of traffic like that would cost me $400 a month, clear overkill for my humble blog. So I decided to create my own custom analytics.
My custom analytics
I’ve seen articles on Hacker News about blog authors creating their custom analytics[1], and I thought: “Why shouldn’t I do the same?” The only requirement: If a page gets opened in a browser, I want to know about it. No funnels, no cookies, no demographics.
Every time someone loads a page on this blog, the browser sends a ping to my server. The server captures the ping as an nginx access log entry[2]. I store the IP address, User-Agent (the browser type), the page visited, and the referrer[3] if the browser shares it.
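To make the stored fields concrete, here is a minimal Python sketch that pulls the IP address, page, referrer, and User-Agent out of one access log line. It assumes nginx's default `combined` log format, which may differ from the format actually configured on this server. The sample line, the `/ping?page=...` URL, and the names `PageView` and `parse_line` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# nginx's default "combined" format (an assumption about the server config):
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


class PageView(NamedTuple):
    ip: str
    path: str
    referrer: Optional[str]  # None when the browser did not share it
    user_agent: str


def parse_line(line: str) -> Optional[PageView]:
    """Parse one combined-format log line; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    referrer = m["referrer"]
    return PageView(
        ip=m["ip"],
        path=m["path"],
        # nginx logs a missing header as "-"
        referrer=None if referrer in ("", "-") else referrer,
        user_agent=m["user_agent"],
    )


# Made-up example line (documentation IP range, hypothetical ping URL).
sample = (
    '203.0.113.7 - - [14/Jun/2020:10:12:01 +0000] '
    '"GET /ping?page=/posts/example HTTP/1.1" 204 0 '
    '"https://news.ycombinator.com/" "Mozilla/5.0 (X11; Linux x86_64)"'
)
print(parse_line(sample))
```

Keeping the parsed record this small mirrors the "no funnels, no cookies, no demographics" requirement: nothing beyond what the request itself carries is captured.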
I use goaccess to process the logs and get useful statistics. I’m planning on open-sourcing the whole setup once I have tested it extensively.
After building my custom analytics, I created one more test account on Mixpanel to compare the results, and I was astonished by the difference in page view numbers. The only explanation is that AdBlockers like uBlock Origin block the Mixpanel tracking.
The following numbers are for the period between 14th and 26th June 2020.
Mixpanel analytics captures only 40% of all page views on my blog.
Daily page views difference
Measuring method and interpretation
Both analytics measure the same thing: A browser opened a page on my blog. I looked into the access logs in my custom analytics, and I’m confident that the number of page views is realistic and captures real people visiting the site.
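One way to check that a page view count reflects people rather than bots is to drop requests whose User-Agent looks like a crawler before counting. The Python sketch below illustrates that idea only. The marker list and the function names are hypothetical, and this is not necessarily how goaccess or the setup described here detects crawlers.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical marker list; real crawler detection is more involved.
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def count_page_views(requests: Iterable[Tuple[str, str]]) -> Counter:
    """Count views per path, skipping requests that look like crawlers.

    Each request is a (path, user_agent) pair.
    """
    views = Counter()
    for path, user_agent in requests:
        if not is_crawler(user_agent):
            views[path] += 1
    return views


# Made-up requests for illustration.
requests = [
    ("/posts/a", "Mozilla/5.0 (X11; Linux x86_64) Firefox/77.0"),
    ("/posts/a", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/posts/b", "Mozilla/5.0 (Macintosh) Safari/605.1.15"),
]
print(count_page_views(requests))
```

A substring check like this is crude: it misses crawlers that imitate a browser User-Agent, so reading through the raw logs by hand remains a useful cross-check.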
Even though a sample of more than five thousand requests is large enough to make the gap hard to dismiss as noise, it can’t be extrapolated to just any website. The audience of this blog is tech-savvy, and the chance of them having an AdBlocker installed is much higher than in the general population.
The results are probably not specific to Mixpanel, and similarly decreased numbers can be expected for any other commercial analytics service.
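The percentages in this post all follow from one number, the 40% capture rate. As a small worked example in Python: if a tool sees only a fraction of the real page views, the hidden share is one minus that fraction, and the increase from counting everything is the reciprocal of the fraction minus one. The function name `undercount` is just an illustrative choice.

```python
from typing import Tuple


def undercount(capture_rate: float) -> Tuple[float, float]:
    """Return (hidden share, percentage increase when counting everything).

    capture_rate is the fraction of real page views the tool reports.
    """
    if not 0 < capture_rate <= 1:
        raise ValueError("capture_rate must be in (0, 1]")
    hidden_share = 1 - capture_rate
    increase_pct = (1 / capture_rate - 1) * 100
    return hidden_share, increase_pct


hidden, increase = undercount(0.40)
print(f"hidden: {hidden:.0%}, increase: {increase:.0f}%")
# prints "hidden: 60%, increase: 150%"
```

So a 40% capture rate means sixty per cent of views are hidden, the real total is 2.5 times the reported one, and measuring everything shows up as a 150% jump.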
The future of analytics
It’s fantastic that so many of my readers are using AdBlock to protect their privacy. I hope this trend will continue, and that it’s going to be increasingly hard to track complex user behaviour on the internet.
I was careful to gather only the absolute minimum of information about my readers. I only care about how many times each page gets viewed as feedback on my writing. Until privacy-aware analytics like simpleanalytics get whitelisted in AdBlock lists, using access logs is the only reliable way to get the full picture.
I’m going to keep using Mixpanel and my custom analytics side by side a bit longer. But once I exceed my limit in Mixpanel again, I’m just going to remove it and rely on my custom analytics. I got really excited when I saw that 150% jump in traffic as I started measuring all of it.
Update 2020-07-19: Adam Graham pointed out that it’s not clear whether Mixpanel gets affected by AdBlockers or not. I updated the lead paragraph for better clarity.
[1] Analytics without Google, My Messy Analytics Breakup
[2] An access log is a text file where each line represents one request to the server.
[3] The Referrer header is the most controversial information I store. I keep it because it’s great to know where this blog is mentioned, but it does help to track user behaviour. The header itself has security problems.
[4] (the difference between total and valid requests is the crawlers)