
Tracking broad data

Implementing analytics on this website

‘Are you all right?’
‘Yes, sir. I am attempting to fill a silent moment with non-relevant conversation.’
‘Ah. Small talk.’
‘Yes, sir. I have found that humans often use “small talk” during awkward moments. Therefore, I have written a new subroutine for that purpose. How did I do?’
‘Perhaps it was a little too non-relevant.’
—Picard and Data in Star Trek (S6 E18, Starship mine)

I have been on the fence about analytics for quite some time. Of late I have also come to realise that, like a lot of other things in life, this too is not a black-and-white issue. Consider yourself in the grey zone.

Analytics today has well-earned infamy thanks in large part to Google. The search giant continues to exploit analytics as a means of maximising ad revenue with a simple approach: you should advertise with us because we know who is more likely to want your product and we will show it to them.

If you put aside the notion that Google is analytics, the real fun with gathering data comes to the fore. Moreover, it becomes clear that analytics does not have to be privacy-invasive.

I am a huge proponent of protecting my privacy, and I intend to protect my readers’ privacy just the same. Unless you write to me yourself, I am not interested in prying into your identity, where you live, what you do, et cetera. However, I love numbers. And I am curious within limits. I want to know where people are visiting my website from (my limit is the country), where people are subscribing to my newsletter from (nothing personal, just the URL) and which articles people are reading more; not you or them, just people in general. And I want to know how long people are spending on my site and how many pages people are visiting when they do pop in. None of this is personally identifying information, or PII as it is more commonly known.

There are more serious reasons to track website visitors too. For example, some of the licensed elements I use on my website are limited to anywhere from 50,000 to 100,000 visitors per month. I am under this limit, but unless I have the numbers to show for it one way or another, who is to say? Once again, though, the important statistic here is simply the numbers and not the people. I am tracking numbers, not people.

Going a step further, I did not want to use a third-party service for this purpose. Third parties are not always unsafe, but if I could do it myself, would that not be safer and offer more assured privacy for my readers?

Plausible, Umami and Matomo

Plausible is an excellent analytics tool that I self-host on my server—the same one that delivers this website. So headcounts to this website are collected and stored on the same server as all the other stuff you see around this page. There is not even transient sharing of information with third parties. Plausible is also compliant with GDPR, CCPA and PECR because no PII is collected and no cookies are used. It has no user tracking, cross-site tracking (why would I even?), advertising or other insufferable habits like selling user data. Plausible has even been audited independently by a firm of lawyers.

Before I settled on Plausible I considered two other options: Matomo and Umami. Matomo is very powerful, even more than I need. But, while it had an option to not use cookies, it nevertheless still placed an essential cookie to do its job. I am not one of those who claim “cookies are evil” but I just prefer a clean website and if I can afford to not serve cookies, so much the better. So Matomo, while being the simplest to set up of the three I tried (it was quite literally just a copy–paste job), would not cut it for me.

Next I gave Umami a go. I liked Umami as well, with its similarly clean dashboard and straightforward set-up, but I noticed some concerns popping up around the web regarding its adherence to its many privacy promises. Umami claims GDPR compliance but, as Rich Lott pointed out on GitHub:

I think that while Umami markets itself as privacy-first/privacy-focussed, GDPR + CPPA compliant, its docs need to match. So if you have docs saying Look you can store the customer’s PII (personally identifiable information) this way then they need to be accompanied by a reminder that this may make your use case non compliant.

I am not too opposed to this in general because it does ultimately fall on the user of a tool if they choose to knowingly use a privacy-invasive feature, but should Umami claim without additional clarification that they are omnipotently protective of user privacy?

Similarly, Michael Malis pointed out on Hacker News: “From auditing the source code, this doesn’t seem to be the case. First, it claims it doesn’t use cookies, but it clearly uses localStorage to store a ‘sessionKey’”. This seems problematic to me, especially when competitors like Plausible use neither cookies nor localStorage. As Michael goes on to say, “Both Fathom and plausible generate a unique salt every day. By getting rid of the old salts, they’ve anonymized any data older than a day.”
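The daily-salt idea in that last quote can be sketched in a few lines of Python. This is an illustration only, not Plausible's or Fathom's actual code: the class name `DailySaltHasher`, the choice of SHA-256, the inputs that get hashed (site domain, IP address, user agent), the `|` separator and the 16-character truncation are all assumptions made for the example.

```python
import hashlib
import secrets
from datetime import date
from typing import Optional


class DailySaltHasher:
    """Hypothetical helper: derives visitor IDs that cannot be linked across days."""

    def __init__(self) -> None:
        self._salt_day: Optional[date] = None
        self._salt: bytes = b""

    def _current_salt(self, today: date) -> bytes:
        # On a new day, generate a fresh random salt and overwrite the old one.
        # Once the old salt is gone, yesterday's IDs can no longer be recomputed.
        if self._salt_day != today:
            self._salt = secrets.token_bytes(32)
            self._salt_day = today
        return self._salt

    def visitor_id(self, domain: str, ip: str, user_agent: str, today: date) -> str:
        # Assumed inputs: only the resulting hash is kept, never the raw IP.
        salt = self._current_salt(today)
        payload = "|".join((domain, ip, user_agent)).encode("utf-8")
        return hashlib.sha256(salt + payload).hexdigest()[:16]


if __name__ == "__main__":
    hasher = DailySaltHasher()
    monday = date(2025, 1, 6)
    tuesday = date(2025, 1, 7)
    same_day = hasher.visitor_id("example.com", "203.0.113.7", "Firefox", monday)
    same_again = hasher.visitor_id("example.com", "203.0.113.7", "Firefox", monday)
    next_day = hasher.visitor_id("example.com", "203.0.113.7", "Firefox", tuesday)
    print(same_day == same_again)  # same visitor, same day: IDs match
    print(same_day == next_day)    # new day, new salt: IDs no longer match
```

Within a single day the same visitor maps to the same ID, so unique visitors can still be counted. Once the salt is replaced there is no way to tie Tuesday's ID to Monday's, which is the anonymisation the quote describes.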

Another solution I was interested in was Offen, but I found their docs unpalatable. But I want to give a shoutout to Goat Counter and Pirate PX, both incredible solutions if you want simple headcounts, especially the latter.

Setting up Plausible

I took embarrassingly long to set up Plausible. In 2025 I moved to a VPS, which made all this possible while also reducing my monthly maintenance costs. Perhaps because they would rather have people pay for Plausible as a service (which is understandable), the ‘community edition’, as they call their open-source self-hosted option, is poorly documented for anything but a straight-up Docker installation.

I run Apache on Debian, so a reverse proxy was necessary, and the reverse-proxy configuration on the Plausible repo simply does not work. Luckily I had gone through Konstantin Tutsch’s brilliant guide to setting up Umami, which provided a working .conf file for reverse proxying with Apache, and I implemented the same for my Plausible set-up with minimal changes.

With this the analytics showed up but did not work. A bit of digging into the logs told me there was a websocket communication issue. It had also been reported on GitHub, with a solution that, as my luck would have it, was for nginx. After another hour of tinkering (in my defence, this is not my area of expertise) it turned out that the Apache solution was simpler: upgrade=websocket. Oh well.
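For readers in the same spot, a reverse-proxy block along the following lines shows where that parameter would go. It is a sketch under stated assumptions, not the configuration used on this server: the hostname `analytics.example.com` is a placeholder, the backend address `127.0.0.1:8000` assumes Plausible is listening on its usual local port, and the TLS directives are left out. As far as I know, the `upgrade` parameter on `ProxyPass` needs Apache 2.4.47 or later with `mod_proxy` and `mod_proxy_http` enabled.

```apache
<VirtualHost *:443>
    # Placeholder hostname; replace with the real analytics subdomain.
    ServerName analytics.example.com

    # TLS directives (SSLEngine, certificate paths) omitted from this sketch.

    # Pass the original Host header through to the backend.
    ProxyPreserveHost On

    # Assumed backend address. upgrade=websocket lets the same worker
    # handle websocket upgrade requests as well as plain HTTP.
    ProxyPass        / http://127.0.0.1:8000/ upgrade=websocket
    ProxyPassReverse / http://127.0.0.1:8000/
</VirtualHost>
```

On Debian, running `apachectl configtest` before reloading Apache is a cheap way to catch syntax slips in a block like this.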

It was great to put this together, and after a few days of miserably lurking around Stack Exchange and Server Fault I now feel like I can deploy any Docker container I put my mind to. I kid, of course: Docker is meant to be easy to deploy. But implementing analytics has proven encouraging for me. Knowing how many people are reading my essays makes me want to write a lot more, for better or worse.

06.01.25 technology
