Why Is Building a Privacy-First Application So Hard?

Privacy by design

I’m still working on PlaylistShare, hoping to release it to the world in 2024. Well, to the few friends who are interested.

Besides making the application usable, one of my goals is to put user privacy first. So during development, I try not to use cookies or other tracking technologies.

For those who don’t know, PlaylistShare is a simple music tracking application. You add an album with a URL to listen to it, and once you’ve listened to it, you can mark it as listened, add a comment for yourself (or for others), rate the album, and so on.

I just want to share some of my thoughts on the subject, and some of the choices I made. I don’t claim to hold the truth here; these are just some “organized” thoughts that someone might find interesting. If you want to talk about this, feel free to contact me on LinkedIn.

Analytics

Well, this section will be short: I don’t really care about analytics (I can hear SEO specialists yelling at me from here). I used Piwik (now Matomo) a lot in the past, never used Google Analytics, and I won’t start with my personal projects.

In this project, analytics simply isn’t a thing because, as I said one sentence earlier, I don’t care about analytics. Next.

The Captcha

When I built user registration, I wanted to protect it with some anti-bot measures. The go-to options would be reCaptcha or hCaptcha, but in a privacy-first application, reCaptcha is a clear no. hCaptcha, while better in this regard, is also a no, since it uses cookies and gathers a lot of information to work.

I also looked at Altcha, and at first I wanted to implement it. You can host it on your own server, so no third party is involved.
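Altcha relies on a proof-of-work challenge rather than on profiling visitors: the server hands out a salted hash of a secret number, and the browser has to find that number by brute force. Here is a minimal TypeScript sketch of that idea, assuming Node’s built-in `node:crypto` module. The `Challenge` shape, `HMAC_KEY` and the function names are hypothetical; this is not Altcha’s actual API or payload format.

```typescript
import { createHash, createHmac, randomBytes, randomInt, timingSafeEqual } from "node:crypto";

// Hypothetical server-side secret (would live in server config, never in the client).
const HMAC_KEY = "change-me";

// Hypothetical challenge shape, loosely modeled on the proof-of-work idea behind
// Altcha. This is NOT Altcha's real API or payload format.
interface Challenge {
  salt: string;      // random salt generated by our own server
  hash: string;      // SHA-256 (hex) of salt + secret number
  maxNumber: number; // upper bound of the search space
  signature: string; // HMAC of the hash, so clients cannot forge their own challenges
}

const sha256Hex = (input: string): string =>
  createHash("sha256").update(input).digest("hex");
const sign = (input: string): string =>
  createHmac("sha256", HMAC_KEY).update(input).digest("hex");

// Server: pick a secret number and only publish its salted hash.
function createChallenge(maxNumber = 50_000): Challenge {
  const salt = randomBytes(12).toString("hex");
  const secret = randomInt(0, maxNumber + 1); // upper bound is exclusive
  const hash = sha256Hex(salt + secret);
  return { salt, hash, maxNumber, signature: sign(hash) };
}

// Browser side (sketched with node:crypto for brevity): brute-force the number.
// The CPU time spent here is what slows bots down; no cookie is set and no
// third-party server is contacted.
function solveChallenge(c: Challenge): number | null {
  for (let n = 0; n <= c.maxNumber; n++) {
    if (sha256Hex(c.salt + n) === c.hash) return n;
  }
  return null;
}

// Server: checking an answer costs a signature check plus a single hash.
function verifySolution(c: Challenge, answer: number): boolean {
  const expected = Buffer.from(sign(c.hash), "hex");
  const given = Buffer.from(c.signature, "hex");
  const signedByUs = expected.length === given.length && timingSafeEqual(expected, given);
  return signedByUs && sha256Hex(c.salt + answer) === c.hash;
}
```

A real deployment would also expire challenges and remember solved ones to block replays, and in the browser the hashing would go through the Web Crypto API (`crypto.subtle.digest`) instead of `node:crypto`.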

In the end, though, I implemented what I think is an easier alternative: IconCaptcha. Like Altcha, it is a fully self-hosted captcha system. It was a bit tricky to integrate into my current stack (Laravel / ReactJS), but it works. I guess it’s less secure than Altcha, but for the current state of the application it is enough: it should block the main threats I expect at first, without any privacy trade-off.

Embedded iFrame

Then I wrote the Privacy Policy template for the application, including a “No cookies or tracking technologies” section. I admit it felt good to write. But then I thought further: I use embedded iframes from YouTube and Bandcamp. That’s a huge issue; it is a liability.

An iframe embedded from another website is a risk: it runs the provider’s code inside your page, and sooner or later it will use tracking technologies one way or another.

I didn’t dig very deep into this. I read some articles, blog posts and StackOverflow questions about it, like this one or this one.

For YouTube embeds, I’m currently considering the Invidious API. I’ve started looking at it, and it appears to involve no tracking.
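Invidious exposes a REST API whose `GET /api/v1/videos/:id` endpoint returns video metadata, including a `formatStreams` list of playable files. A minimal sketch, assuming that endpoint and only the few response fields typed below. The instance URL is hypothetical, and the `local=true` parameter (asking the instance to proxy the streams itself) is an assumption to check against the instance’s API documentation.

```typescript
// Minimal subset of fields assumed from the Invidious /api/v1/videos/:id response.
interface FormatStream {
  url: string;
  itag: string;
  type: string;    // MIME type, e.g. 'video/mp4; codecs="..."'
  quality: string; // e.g. "medium", "hd720"
}

interface InvidiousVideo {
  title: string;
  formatStreams: FormatStream[];
}

// Hypothetical instance; in practice a self-hosted or trusted public one.
const INSTANCE = "https://invidious.example.org";

// Build the API URL for one video.
function videoApiUrl(instance: string, videoId: string): string {
  const url = new URL(`/api/v1/videos/${encodeURIComponent(videoId)}`, instance);
  url.searchParams.set("local", "true"); // assumption: proxy streams through the instance
  return url.toString();
}

// Pick the first stream matching our quality preferences, else any stream.
function pickStream(
  video: InvidiousVideo,
  preferred: string[] = ["hd720", "medium"],
): FormatStream | undefined {
  for (const quality of preferred) {
    const hit = video.formatStreams.find((s) => s.quality === quality);
    if (hit) return hit;
  }
  return video.formatStreams[0];
}

// Fetch metadata; the browser only ever talks to the Invidious instance.
async function fetchVideo(videoId: string): Promise<InvidiousVideo> {
  const res = await fetch(videoApiUrl(INSTANCE, videoId));
  if (!res.ok) throw new Error(`Invidious responded with ${res.status}`);
  return (await res.json()) as InvidiousVideo;
}
```

The chosen stream’s `url` can then feed a plain `<video>` element instead of an iframe. Proxying matters here: without it, the stream URLs typically point to Google’s video servers, so the user’s browser would end up talking to Google after all.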

For Bandcamp, I’ve started checking where the streamed audio files come from so I can ditch the embedded iframe, though I’m not sure it will work forever.
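Bandcamp album pages have historically carried the album data as HTML-escaped JSON in a `data-tralbum` attribute, where each `trackinfo` entry has a `file` object keyed by format (for example `mp3-128`). A minimal sketch of extracting the stream URLs, assuming that undocumented markup, which Bandcamp can change at any time. `extractTracks` and `BandcampTrack` are hypothetical names.

```typescript
// Hypothetical output shape for one track.
interface BandcampTrack {
  title: string;
  streamUrl: string | null; // null when no stream is exposed for the track
}

// Decode the few HTML entities expected in the attribute value.
// "&amp;" is decoded last so that "&amp;quot;" correctly becomes "&quot;".
function decodeEntities(s: string): string {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Assumption: the album page contains data-tralbum="{...escaped JSON...}".
function extractTracks(html: string): BandcampTrack[] {
  const match = html.match(/data-tralbum="([^"]*)"/);
  if (!match) return []; // markup changed or not an album page
  const tralbum = JSON.parse(decodeEntities(match[1])) as {
    trackinfo?: { title?: string; file?: Record<string, string> | null }[];
  };
  return (tralbum.trackinfo ?? []).map((t) => ({
    title: t.title ?? "",
    streamUrl: t.file?.["mp3-128"] ?? null,
  }));
}
```

The page would have to be fetched server-side (from the Laravel backend, for instance), since the browser’s same-origin rules would most likely prevent reading it directly. And if the browser then plays the file straight from Bandcamp’s CDN, the user’s IP address still reaches Bandcamp; proxying the audio through your own server avoids that, at a bandwidth cost.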

Why is it this hard?

So okay, I want to build a privacy-first application, and it’s hard. But the question is: why is it so hard?

TL;DR: Solutions that track users are simple, easy to implement, secure, and they work. The others are the opposite: more complex, and when you need to interact with other websites, it is quite a journey.

Alternatives to reCaptcha or hCaptcha exist, but none is really easier to implement. And, hot take here, I guess none is really as secure. But from a privacy point of view, reCaptcha is the worst, and any solution hosted on a third-party server is a liability.

I spent 4 hours implementing IconCaptcha, and I’m sure reCaptcha would have taken 10 minutes. For me, user privacy is a strong requirement for this project. It’s a side project, so spending a lot of time on a captcha isn’t really an issue; I can afford it. For a company, 4 hours versus 10 minutes (roughly 24 times the effort) makes the choice obvious.

As for embedded iframes, it is easy to just paste the embed code, or even to reverse-engineer one and build it dynamically (that’s what I do for the Bandcamp embed). But if you put privacy first, an embedded iframe is code you don’t control. It’s a weak point: you have to trust the provider of the embed. And quite honestly, I don’t trust YouTube to take care of users’ privacy, and I trust Bandcamp less and less given recent events.
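Bandcamp’s embed code is an iframe whose `src` is `https://bandcamp.com/EmbeddedPlayer/` followed by slash-separated `key=value` options. A minimal sketch of building that URL dynamically, assuming the numeric album ID is already known and using only a few of the options Bandcamp’s own embed generator outputs. `bandcampEmbedUrl` and `EmbedOptions` are hypothetical names, and the default colors are just examples.

```typescript
// Hypothetical subset of the player options found in Bandcamp's generated embed code.
interface EmbedOptions {
  size?: "large" | "small";
  bgcol?: string;   // background color, hex without '#'
  linkcol?: string; // link color, hex without '#'
  tracklist?: boolean;
}

// Build the EmbeddedPlayer URL from a numeric album ID.
function bandcampEmbedUrl(albumId: string, opts: EmbedOptions = {}): string {
  // Album IDs are numeric; refuse anything else rather than building a broken URL.
  if (!/^\d+$/.test(albumId)) throw new Error(`unexpected album id: ${albumId}`);
  const parts = [
    `album=${albumId}`,
    `size=${opts.size ?? "large"}`,
    `bgcol=${opts.bgcol ?? "ffffff"}`,
    `linkcol=${opts.linkcol ?? "0687f5"}`,
    `tracklist=${opts.tracklist ?? false}`,
    "transparent=true",
  ];
  return `https://bandcamp.com/EmbeddedPlayer/${parts.join("/")}/`;
}
```

Attributes such as `referrerpolicy="no-referrer"` and `loading="lazy"` on the iframe limit what is sent and when, but once the iframe loads, it still runs Bandcamp’s code, which is exactly the trust problem described above.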

And here, there are few to no alternatives. For YouTube, there is Invidious, and maybe others (none come to mind right now), so it should be OK. For Bandcamp, there is none, or at least I didn’t find one. I guess I’ll have to make my own from scratch. But there is a catch: I will spend time and energy building my own embed by scraping data. Is it worth it? For me, yes: I will learn new stuff and perhaps build something cool that someone could use. For a company? A huge no. Spending time building something that can break at any time? Please.

My humble guess is that it’s hard because tracking is the most profitable thing on the web right now. GAFAM (or AAAMM now, I guess) make billions from our personal data, hidden behind the fact that their platforms are free (as in free beer). They have billions to invest in the infrastructure and workforce needed to build easy-to-implement, secure captchas, analytics and cool embedded content. And new proprietary platforms will eventually do the same to get a piece of the pie.

The FOSS community tries to compete here: it builds Invidious, FreeTube, PeerTube, Funkwhale, Mastodon, and the list keeps getting longer, which is really cool. But the content isn’t really there yet. Some creators are present on these platforms, but in the end, it is a niche. And most of the music I consume is on YouTube or Bandcamp, unfortunately.

In the end, the web development ecosystem pushes you to track your users: for analytics, sure, but also just to gather data about them, not necessarily for shady purposes. Sometimes it is genuinely to “improve the service”. For me, the real issue was, is and will be trust. You have to trust the companies, developers and providers that bring software and services to you.

Really short conclusion

So here I am, trying to build my humble application without any user tracking.

It’s hard, but fun. And in the end, the “No cookies or tracking technologies” section of the Privacy Policy alone is worth the time and energy for me.

Still, you just have to trust me on that.

I could argue that my source code is public, but how can you be sure the public source is the one running in production? You see the point; I won’t elaborate on it here. Maybe in another post.

I’m not telling you not to trust anyone; just be careful about what you give to the Internet, because you never know how it will be used.

Take care everyone.