<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>SurveillanceCapitalism &#8212; jolek78&#39;s blog</title>
    <link>https://jolek78.writeas.com/tag:SurveillanceCapitalism</link>
    <description>thoughts from a friendly human being</description>
    <pubDate>Sat, 16 May 2026 12:19:24 +0000</pubDate>
    <image>
      <url>https://i.snap.as/DEj7yFm4.png</url>
      <title>SurveillanceCapitalism &#8212; jolek78&#39;s blog</title>
      <link>https://jolek78.writeas.com/tag:SurveillanceCapitalism</link>
    </image>
    <item>
      <title>Guests on our own web</title>
      <link>https://jolek78.writeas.com/guests-on-our-own-web?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[A few months ago I spun up a new VPS on Linode, London datacentre. Nothing special - Debian, Nginx, a Let&#39;s Encrypt certificate, a domain I was going to use for my daily notes and my homelab experiments. No link posted anywhere, no entries in my feeds, no backlinks from the sites I run. Just a freshly assigned IP, from a subnet that a week earlier had belonged to someone else.]]&gt;</description>
      <content:encoded><![CDATA[<p>A few months ago I spun up a new VPS on <strong>Linode</strong>, London datacentre. Nothing special – <strong>Debian</strong>, <strong>Nginx</strong>, a <strong>Let&#39;s Encrypt</strong> certificate, a domain I was going to use for my daily notes and my homelab experiments. No link posted anywhere, no entries in my feeds, no backlinks from the sites I run. Just a freshly assigned IP, from a subnet that a week earlier had belonged to someone else.</p>



<p>The one thing I had configured carefully was the logs: nginx with an extended format, journald with audit, a few baseline <strong>fail2ban</strong> jails. I wanted to see what happens to a server that doesn&#39;t yet have a life, before I gave it one. Twenty-four hours later, I opened the logs. No humans. That was expected – I hadn&#39;t told anyone the domain. But there was already a small zoo of other presences. A <code>wget</code> from a Polish VPS with a phantom reverse DNS, the kind registered with a placeholder that never got updated. Three GETs, same resource, thirty-six second intervals. Then nothing. An SSH scan on port 80 – yes, an SSH scan on the HTTP port – written in Go, with a user-agent that claimed to be Mozilla/5.0 but was negotiating TLS the way only Go&#39;s crypto libraries do. <strong>VisionHeight</strong>, a commercial scanner that bills itself as ethical, mapped seven ports in two and a half minutes. <strong>Censys</strong> came through twice, identifying itself, leaving its own PTR and a link to its opt-out page. A <strong>Common Crawl</strong> crawler. <strong>GPTBot</strong>. <strong>ClaudeBot</strong>. <strong>AppleBot</strong>.</p>

<p>People: zero.</p>

<p>I spent the evening watching those logs the way you&#39;d watch a sequence of read-heads on a tape. It was like opening the door to a flat you&#39;d just rented and finding it already occupied by intruders. <em>This is a public network</em>, they seemed to be saying, <em>and nobody told you what public means</em>.</p>

<p>Since then I&#39;ve done what everyone does: I&#39;ve built defences. <strong>nftables</strong> to drop ASNs known for aggressive scanning. fail2ban with custom jails for nginx that recognise the patterns of the noisier scans – probes against <code>/wp-login.php</code> on a server that doesn&#39;t run <strong>WordPress</strong>, attempts at <code>/.env</code>, requests for phpMyAdmin paths that don&#39;t exist. <strong>GoAccess</strong> to visualise what little organic traffic remains once the rest is filtered out. An alert system over <strong>ntfy</strong> for out-of-band anomalies. It is routine – every sysadmin running a homelab has their own variant. But building it calmly, rather than as a patch on something that has already fallen over, is precisely what gets you to look at things that would otherwise scroll past, filtered away.</p>
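
<p>For the record, the triage behind those jails is nothing exotic. Here is a minimal sketch of the idea in Python, not my actual configuration: it assumes nginx&#39;s default combined log format, and the log location, path list and cut-off are only illustrative.</p>

<pre><code>#!/usr/bin/env python3
# Minimal log-triage sketch: count the IPs probing paths this server does not serve.
# Complements fail2ban rather than replacing it; paths, log location and threshold are illustrative.
from collections import Counter

PROBE_PATHS = ("/wp-login.php", "/.env", "/phpmyadmin", "/xmlrpc.php")
hits = Counter()

# Assumes nginx's default "combined" log format: the request path is the seventh field.
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        parts = line.split()
        if len(parts) > 6:
            ip, path = parts[0], parts[6]
            if any(path.lower().startswith(p) for p in PROBE_PATHS):
                hits[ip] += 1

# Anything beyond a handful of probes from one address is a candidate for a drop rule.
for ip, count in hits.most_common(20):
    print(ip, count)
</code></pre>

<p>fail2ban does the same job continuously and then acts on the result; the point of the sketch is only to show how little intelligence is needed to separate that kind of probe from a human reader.</p>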

<p>And while I was building it, a question came to mind, maybe a banal one, an extremely banal one: <em>who am I doing all this for</em>?</p>

<p>Not for the readers – those are few, almost none; they arrive via RSS, shared links, the occasional search engine. I was defending the server from a network that is predominantly non-human. I was configuring jails for scanners that don&#39;t know me, for crawlers that don&#39;t read me, for botnets that don&#39;t particularly mean me harm – they mean harm to anyone reachable on port 22 or 80.</p>

<h2 id="the-threshold-51-and-already-53" id="the-threshold-51-and-already-53">The threshold: 51% (and already 53%)</h2>

<p>In 2024, for the first time in ten years, bot-generated traffic surpassed human traffic on the internet. Fifty-one percent against forty-nine. The figure comes from <strong>Imperva</strong>&#39;s <em>Bad Bot Report</em>, 2025 edition, the twelfth in the annual series – the analysis is based on thirteen trillion requests blocked by their global mitigation network in 2024 alone. It is the number that best sums up where we have ended up.</p>

<p>The 2026 <em>Bad Bot Report</em>, published a few weeks ago with 2025 data, has updated the figure: 53% bots, 47% humans. Another point and a half lost in twelve months. It did not happen all at once. Here is the historical series, from 2015 onwards:</p>

<table>
<thead>
<tr>
<th>Year</th>
<th>Humans</th>
<th>Bad bots</th>
<th>Good bots</th>
</tr>
</thead>

<tbody>
<tr>
<td>2015</td>
<td>54%</td>
<td>27%</td>
<td>19%</td>
</tr>

<tr>
<td>2018</td>
<td>62%</td>
<td>22%</td>
<td>17%</td>
</tr>

<tr>
<td>2020</td>
<td>59%</td>
<td>26%</td>
<td>15%</td>
</tr>

<tr>
<td>2022</td>
<td>53%</td>
<td>30%</td>
<td>17%</td>
</tr>

<tr>
<td>2023</td>
<td>50%</td>
<td>32%</td>
<td>18%</td>
</tr>

<tr>
<td>2024</td>
<td>49%</td>
<td>37%</td>
<td>14%</td>
</tr>

<tr>
<td>2025</td>
<td>47%</td>
<td>n/a</td>
<td>n/a</td>
</tr>
</tbody>
</table>

<p><em>(Source: Imperva, Bad Bot Report 2025 and 2026)</em></p>

<p>Humans have lost seven percentage points in ten years. The erosion is slow and steady – a descending curve measured in years, not in months. Nobody cut a ribbon to announce <em>we have crossed the threshold</em>. It was a gradual shift of the axis, a median that moved while we were looking elsewhere. Meanwhile, the bad bots grew from 27% to 37%. Ten percentage points in ten years, all on the predatory side. Brute force, credential stuffing, data scraping, account takeover, API fraud. Imperva records that ATOs – <strong>Account Takeover Attacks</strong> – grew by 40% in 2024 alone, and in 2025 the financial sector absorbed 46% of all ATO incidents worldwide. And, <em>dulcis in fundo</em>, the “good bots” – <strong>Googlebot</strong>, <strong>Bingbot</strong>, the legitimate aggregators, the health checkers – went down. From 19% in 2015 to 14% in 2024. The indexing services that historically justified bandwidth consumption have lost ground: the network has become more automated, but in a direction that does not pay off for those who publish.</p>

<p><strong>Cloudflare</strong> confirms this with independent data. Their <em>Radar Year in Review 2025</em>, published at the end of December, reports that global internet traffic grew by 19% in 2025, and a substantial share of that growth is attributable to bots and AI crawlers. Googlebot still dominates – around 28% of verified traffic – but the new generation is gaining fast: <strong>OpenAI</strong>&#39;s GPTBot went from 4.7% in July 2024 to 11.7% in July 2025. ChatGPT-User, the bot that acts on explicit user command, recorded a year-on-year growth of 2,825% in request volume. That is not a typo. <strong>PerplexityBot</strong>, even more extreme: +157,490%.</p>

<p>The 51% threshold has to be read in this context. The curve has been rising for years, and 2024 is not the peak. The network we are using today is not the 2015 network with a few more bots: <em>it is a structurally different network, where humans have gone from being the main signal to being the background noise</em>.</p>

<h2 id="who-is-talking-in-this-network" id="who-is-talking-in-this-network">Who is talking in this network?</h2>

<p>“Bot” is not one thing. The presences in the logs belong to three families that do different jobs, have different economies, and put different kinds of pressure on the infrastructure. Three main categories, then.</p>

<p><strong>The cartographers.</strong> These are the scanners that map the entire IPv4 space – four billion three hundred million addresses – across all or nearly all known ports, and maintain queryable databases of exposed services. The founding project is <strong>ZMap</strong>, released in 2013 by a team at the <strong>University of Michigan</strong>. ZMap is a port scanner that can scan the entire IPv4 space on a single port in under 45 minutes from a single machine, from userspace, over a gigabit connection. Technically remarkable: it cuts by an order of magnitude the time needed to “see” all of the internet. Censys was built on top of ZMap, launched in 2015 by the same authors. Censys continuously scans IPv4, collects TLS certificates, service banners, software fingerprints, and keeps everything in a queryable commercial database. <strong>Shodan</strong>, founded in 2009 by <strong>John Matherly</strong>, is the conceptual predecessor: less polished technically, but longer-lived and more deeply rooted in sysadmin culture. <strong>Rapid7</strong>&#39;s Project Sonar, ZoomEye, Fofa, Netlas – all follow the same logic.</p>
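
<p>The arithmetic behind that figure is worth making explicit. A back-of-the-envelope sketch, with my own assumptions (one minimal SYN probe per address, no retries) rather than ZMap&#39;s published benchmarks:</p>

<pre><code># Back-of-the-envelope: what scanning all of IPv4 on one port implies.
# Assumes one 64-byte SYN frame per address and no retries; real scans add overhead.
ADDRESSES = 2**32          # about 4.29 billion IPv4 addresses
SCAN_SECONDS = 45 * 60     # the headline figure: under 45 minutes
FRAME_BITS = 64 * 8        # a minimal Ethernet frame carrying a TCP SYN

rate = ADDRESSES / SCAN_SECONDS
print(f"{rate:,.0f} probes per second")            # about 1.6 million packets/s
print(f"{rate * FRAME_BITS / 1e9:.2f} Gbit/s")     # about 0.81 Gbit/s: one gigabit uplink is enough
</code></pre>

<p>One machine, one port, one gigabit uplink: that is the whole economic point.</p>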

<p>In 2012, an anonymous researcher decided he wanted to take a census of the internet but did not have the bandwidth. He built an illegal botnet of compromised routers – the <strong>Carna Botnet</strong> – and ran the first Internet Census: he published the dataset online and openly admitted the offences. It remains a case study in the asymmetry between technical capability and legality – what Censys does today from a datacentre was, ten years ago, a federal crime in the United States. The scanners describe themselves as <em>ethical</em>. They respect <code>abuse@</code>, publish their methodology, exclude networks on request, leave identifiable PTRs. All true. But the data they produce – the complete, near-real-time map of what is exposed on the internet – is sold by subscription, and the clients include academic research and the surveillance industry, corporate threat intelligence and aspiring attackers with seventy-nine dollars a month for a base account. In 2014, at <strong>Def Con 22</strong>, researchers Dan Tentler, Paul McMillan and Robert Graham ran a live IPv4 scan on port 5900 looking for <strong>VNC</strong> servers without authentication. They found thirty thousand systems accessible without a password. Among them: two hydroelectric power stations, the cameras of a Czech casino, industrial control systems, ATMs, a caviar production plant. The map exists because producing it is cheap, and <em>who consults it – for what purposes, with what consequences – is a consequence of that price, not the reason for the project</em>.</p>

<p><strong>The extractors.</strong> These are the AI crawlers. They existed in embryonic form before too – Common Crawl for years, the indexing archives of search engines forever – but since November 2022, with the release of ChatGPT, they have changed in nature and in volume.</p>

<p>Cloudflare&#39;s data, collected from a fixed sample of clients to eliminate the growth bias, is explicit. Between July 2024 and July 2025:</p>
<ul><li>GPTBot (<strong>OpenAI</strong>): from 4.7% to 11.7% of total crawler traffic</li>
<li>ClaudeBot (<strong>Anthropic</strong>): from 6% to nearly 10%</li>
<li>Meta-ExternalAgent (<strong>Meta</strong>): from 0.9% to 7.5%</li>
<li>PerplexityBot (<strong>Perplexity</strong>): growth of 157,490%</li>
<li>Bytespider (<strong>ByteDance</strong>): declining, from 14.1% to 2.4%</li></ul>

<p>The most revealing figure is the composition by purpose. Cloudflare classifies AI crawling into three categories: <em>training</em> (data collection to train models), <em>search</em> (indexing for chat search), <em>user action</em> (visits on explicit user command). Over the past twelve months, 80% of AI crawling has been for training. 18% for search. 2% for user action. In the most recent six months the training share has risen further, to 82%. <em>The overwhelming majority of the work these bots do around the web does not, then, serve network mapping – it serves to extract content, process it, and turn it into training data for models that will then sell access or use the output to generate responses that compete with the originating site</em>.</p>

<p>Another Cloudflare metric measures the imbalance directly: the <em>crawl-to-refer ratio</em>, that is, how many requests a bot makes versus how much traffic it then sends back to the source site. In July 2025, Anthropic was crawling 38,000 pages for every human visitor it sent back – a clear improvement on the 286,000:1 ratio recorded in January of the same year, but still the most lopsided extreme among the major AI platforms. OpenAI in the same period was running at around 1,500:1. Perplexity 194:1. The economic model is asymmetric extraction: take a lot, give back little.</p>
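
<p>The metric itself is nothing more than a division, and you can compute it from your own logs if you can tell verified crawler hits apart from the human visits a platform refers back. A sketch, with illustrative figures:</p>

<pre><code># Crawl-to-refer ratio: requests a platform's crawler makes vs. human visits it sends back.
# The counts would come from your own log analysis; the figures below are illustrative.
def crawl_to_refer(crawler_requests: int, referred_visits: int) -> float:
    if referred_visits == 0:
        return float("inf")   # all take, no give
    return crawler_requests / referred_visits

print(crawl_to_refer(38_000, 1))   # an Anthropic-like month: 38,000 pages crawled per visitor referred
print(crawl_to_refer(1_500, 1))    # an OpenAI-like month: roughly 1,500:1
</code></pre>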

<p><strong>The parasites.</strong> These are the bad bots in the strict sense: 37% of total internet traffic in 2024. Thirteen trillion requests blocked by Imperva&#39;s network alone in that year.</p>

<p>Here the composition changes. Imperva observes that “simple” attacks – basic scripts, dictionary attacks, automated scans – grew from 40% to 45% in 2024. The report explicitly attributes this growth to the arrival of generative AI: tools like ChatGPT, <strong>Claude</strong>, <strong>Llama</strong> have lowered the technical barrier to writing a brute forcer, a credential stuffer, a malicious crawler. What ten years ago required Perl and John the Ripper today requires a prompt and ten minutes. 31% of total attacks recorded by Imperva fall into one of the twenty-one OWASP Automated Threats categories. 44% of advanced bot traffic attacks APIs, no longer web pages – because APIs expose business logic with fewer defences and more value. 21% of attacks use residential proxies: IP addresses belonging to real domestic connections, rented on the grey market, allowing the bot to blend in as legitimate user traffic. Geo-fencing, per-IP rate limiting, ASN blacklists – all useless against an attacker who routes traffic through a residential fibre line in Milan.</p>

<p>One detail demolishes a widespread myth. <em>Attackers usually do not want to take a site down. They want to use it</em>. A compromised site is worth more alive than dead: as a host for phishing, cryptocurrency mining, botnet command-and-control, traffic redirect for black-hat SEO, file storage for warez. When the site falls, the attacker has done something wrong – they have saturated resources, triggered detection, burned their foothold. <strong>Akamai</strong> regularly publishes reports that confirm this: the economic model of the malicious bot is the long stay, not the raid. This changes the reading of visible symptoms. If a site falls over with intermittent 502s, the structural explanation is almost always: saturation of a PHP-FPM pool due to medium-scale bot traffic, on infrastructure that was not dimensioned to absorb half the internet knocking at the same time. The political explanation – <em>they are attacking us to silence us</em> – is almost always false, because anyone who knows how to attack seriously does not let the site fall over.</p>

<h2 id="robots-txt-or-the-death-of-a-social-pact" id="robots-txt-or-the-death-of-a-social-pact">robots.txt, or the death of a social pact</h2>

<p>In June 1994 <strong>Martijn Koster</strong>, a Dutch sysadmin running the early web crawlers for <strong>ALIWEB</strong>, proposed a convention: a text file at the root of the site, <code>robots.txt</code>, in which the operator could declare which parts of their domain crawlers were kindly asked not to visit. No central authority would enforce it, no network protocol would verify it. <em>It was a gentleman&#39;s pact, full stop</em>. It worked because in the nineties crawlers were few, they were run by people who knew each other, and nobody had an economic interest strong enough to burn their reputation by ignoring a directive. For thirty years it held. Googlebot, Bingbot, Yandex, Common Crawl – all respected <code>robots.txt</code> as part of the basic etiquette of indexing. It was so established that the formal specification only arrived in 2022 (<strong>RFC 9309</strong>), decades after the daily practice. When the <strong>IETF</strong> standardised it, they did so to document a consolidated practice, not to create a new one.</p>
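
<p>The protocol is simple enough that Python&#39;s standard library has shipped a parser for it for decades. The sketch below, with an illustrative URL and user-agents, shows how a polite crawler is supposed to consult it, and also why it is only advisory: nothing stops a client from never running the check at all.</p>

<pre><code># How a well-behaved crawler is supposed to consult robots.txt (Python standard library).
# The check is purely advisory: an impolite client simply never runs it.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.org/robots.txt")   # illustrative URL
parser.read()

for agent in ("Googlebot", "GPTBot", "*"):
    allowed = parser.can_fetch(agent, "https://example.org/notes/")
    print(agent, "may fetch" if allowed else "is asked not to fetch")
</code></pre>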

<p>That pact, in the last three years, has been broken.</p>

<p><strong>Drew DeVault</strong>, founder of <strong>SourceHut</strong> – the niche git platform much loved by those who do not want to be on GitHub – published a post in March 2025 that became a manifesto, titled <em>Please stop externalising your costs directly into my face</em>. The piece describes, with technical coldness, the behaviour of LLM crawlers:</p>

<blockquote><p>they crawl everything they can find, robots.txt be damned, including expensive endpoints like git blame, every page of every git log, and every commit of every repository, and they do this using random User-Agents that overlap with end-users and come from tens of thousands of IP addresses – mostly residential, in unrelated subnets, each making no more than one HTTP request over any window we tried to measure – actively and maliciously adapting and blending in with legitimate traffic to evade any attempt at characterisation or blocking</p></blockquote>

<p>It is the description of a distributed denial-of-service attack carried out by companies that present themselves as legitimate consumers of bandwidth. SourceHut had to unilaterally block entire cloud providers – Google Cloud, Microsoft Azure – because it was the only viable defence.</p>

<p>The <strong>Wikimedia Foundation</strong>, in April 2025, published data that complements this. Since January 2024, the bandwidth consumed by media downloads on <strong>Wikimedia Commons</strong> has grown by 50%. The increase does not come from new human readers: it comes from AI scrapers vacuuming up the entire catalogue of 144 million open-licence files. Wikimedia has quantified it: 65% of the most expensive traffic hitting the central datacentres is bot-generated, even though bots account for only 35% of total pageviews. <em>The bots read in bulk</em> – they request obscure pages that the regional cache does not have, forcing the infrastructure to fetch them from the centre. A human reader costs little; an AI crawler costs a lot, and the cost-to-benefit ratio for the body hosting the content has become unsustainable. The Foundation has set as a 2025/2026 annual goal: “reduce by 20% the traffic generated by scrapers”. <em>An organisation that hosts the largest free encyclopaedia in the world is forced to invest engineering in repelling those who want to read it</em>.</p>

<p>The <strong>KDE</strong> project&#39;s GitLab went down temporarily because of a crawler coming from <strong>Alibaba</strong> IP ranges. <strong>GNOME</strong>&#39;s GitLab installed <strong>Anubis</strong>, a proof-of-work challenge written by <strong>Xe Iaso</strong> – on arrival at the page, the browser has to solve a small computational problem before the content is shown. It costs a human next to nothing; it costs dearly for a bot that has to solve millions of them a day. The numbers published by Bart Piotrowski, GNOME&#39;s sysadmin, after switching on Anubis: in two and a half hours, 81,000 total requests, of which only 3% made it through the proof-of-work. 97% were bots. Anubis&#39; default loading screen shows a girl in anime style – it is an explicitly provocative aesthetic choice by Iaso, who has said the point was to make the experience annoying for those using these tools to extract.</p>
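
<p>The mechanism behind that kind of challenge is old and simple. This is not Anubis&#39;s code (the real thing issues a challenge server-side and has the visitor&#39;s browser solve it in JavaScript), just the underlying idea, sketched in Python: find a nonce whose hash starts with enough zeros, cheap to verify once, expensive to produce millions of times a day.</p>

<pre><code># The proof-of-work idea behind tools like Anubis, reduced to its core.
# Not Anubis's actual code: the real thing issues the challenge server-side
# and has the visitor's browser solve it in JavaScript before the page is served.
import hashlib

def solve(challenge: str, difficulty: int = 4) -> int:
    """Find a nonce whose SHA-256 hex digest, with the challenge, starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith(prefix):
        nonce += 1
    return nonce

def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * difficulty)

nonce = solve("per-visitor-challenge")            # a fraction of a second for one human visit
print(verify("per-visitor-challenge", nonce))     # trivially cheap for the server to check
</code></pre>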

<p>Kevin Fenzi, who administers <strong>Fedora</strong>&#39;s infrastructure, has blocked traffic from entire countries. Drew DeVault, in the same post, writes:</p>

<blockquote><p>Every time I sit down for a beer with my friends and fellow sysadmins, it is not long before we start complaining about the bots and asking each other whether the other has found the definitive way to get rid of them. The desperation in these conversations is palpable</p></blockquote>

<p>It is the first-person chronicle of a technical community that has watched a thirty-year cooperative protocol break in thirty-six months.</p>

<p>Anthropic, OpenAI and the others publicly respond that they respect <code>robots.txt</code>. The sysadmins&#39; logs say otherwise. Cloudflare, in its December 2025 report, writes unambiguously that “crawling activity can be aggressive, often ignoring the directives found in robots.txt files”. The structural problem is simple: <em><code>robots.txt</code> never had an enforcement mechanism. It rested on reputation</em>. For those extracting data today to train AI models, the dataset is worth more than the reputation lost by ignoring it.</p>

<h2 id="what-it-means-for-those-who-publish" id="what-it-means-for-those-who-publish">What it means for those who publish</h2>

<p>For anyone running a small site – a blog, an online magazine, a collective&#39;s server, a personal homelab – the 51% (and more) figure translates into a daily operational reality that those who do not administer do not see. <em>A server receives, in proportion, the same kind of bot traffic as the New York Times</em>. Not the same volume, of course – but the same mix. GPTBot downloads <code>wp-content</code>, Censys maps the ports, some botnet tries credentials against three or four well-known WordPress endpoints. Even publishing three articles a month to a readership of two hundred people, you end up statistically anonymous, inside a scanning distribution that is uniform across all of IPv4.</p>

<p>This produces two effects.</p>

<p>The first is that <em>the technical barrier to publishing on one&#39;s own has grown</em>. In the 2000s it was enough to install WordPress on a shared host and forget about it. Today that model survives only if there is someone taking care of the maintenance – timely updates, well-curated plugins, robust passwords, offsite backups, monitoring. Without it, the site does not get attacked in a targeted way: it simply gets consumed by background pressure, like a cliff that erodes without any particular wave breaking on it.</p>

<p>The second is centralisation. The industry&#39;s response to the problem has been “managed everything”: Cloudflare in front of everything, managed WAFs, hosting with automatic protection, CDNs that absorb anomalous traffic. They work. But the price is that a large chunk of the web now passes through a single provider – <em>Cloudflare handles something like 20% of global HTTP requests</em> – and the small independent publisher who would like to remain small and independent has to choose between delegating their network to a commercial intermediary or standing out in the wind on their own.</p>

<p>On the defensive front there is a ferment of countermeasures – creative and desperate at the same time. Beyond Anubis, there are tar pits: <strong>Nepenthes</strong>, written by an anonymous developer who signs himself “Aaron”, responds to crawlers with infinite labyrinths of generated content – pages that link to other pages that link to others, all synthetic, all designed to consume the bot&#39;s resources without giving anything useful in return. Cloudflare has released a commercial equivalent, <strong>AI Labyrinth</strong>, which does the same thing serving irrelevant text to recognised crawlers. There is the community project <strong>ai.robots.txt</strong>, which maintains an up-to-date list of AI crawler user-agents and provides both a ready-made <code>robots.txt</code> and <code>.htaccess</code> rules to block them. <em>A small archipelago of individual countermeasures – effective in some cases, but also a symptom: the fight is site by site, sysadmin by sysadmin, because no higher level exists where the question can be resolved</em>.</p>
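
<p>At the application layer the same refusal takes a handful of lines. A sketch of a WSGI middleware, where the user-agent list is a small illustrative subset (the ai.robots.txt project maintains the real one):</p>

<pre><code># A minimal WSGI middleware that refuses known AI-crawler user-agents.
# The tuple below is a small illustrative subset; ai.robots.txt maintains the full list.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider", "Meta-ExternalAgent")

class BlockAICrawlers:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if any(bot in agent for bot in AI_CRAWLERS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawling declined.\n"]
        return self.app(environ, start_response)
</code></pre>

<p>Which also shows the limit: matching on user-agent only stops the crawlers that identify themselves, and, as the SourceHut logs suggest, the ones doing the most damage increasingly do not.</p>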

<p>Self-hosting is still possible. I do it myself, many others do. But it requires time, competence, continuous attention. <em>It has become a niche</em>. What in the 1990s was the normal way of being online is today an exception that needs to be justified – and maintained by hand.</p>

<p>We publish for human readers. But the infrastructure is shaped by bots. The visible web – the one humans see, navigate, read – is the surface tip of an iceberg made mostly of traffic invisible to the eyes and visible in the logs. The real web – the one the bots see – is all of IPv4, scanned in search of usable surfaces.</p>

<h2 id="guests-on-our-own-web" id="guests-on-our-own-web">Guests on our own web</h2>

<p>When <strong>Tim Berners-Lee</strong> described the World Wide Web in the early 1990s, he spoke of a space for connecting people: documents, ideas, knowledge, communities. The cyberlibertarian narrative of the years that followed – <strong>Barlow</strong>&#39;s <em>Declaration of the Independence of Cyberspace</em> in 1996, the Californian dream of the internet as individual emancipation from the hierarchies of the twentieth century – amplified that promise until it became myth. Thirty years later, one figure tells the story: in 2025, humans are 47% of internet traffic. <em>The majority is machines</em>. And 80% of the work of those machines is the extraction of value from pages that other humans have written, to be processed and sold as predictive, classificatory, generative capability.</p>

<p><strong>Lawrence Lessig</strong> saw it in 1999, in <em>Code and Other Laws of Cyberspace</em>. The thesis was simple: <em>code is law</em>. The technical architecture of a network is already political, because it determines what behaviours are possible. Changing the code – the protocols, the specifications, the design choices – means changing which practices are economically viable and which are not. TCP/IP says nothing about identity, and that is a political choice with thirty-year consequences. <code>robots.txt</code> was cooperative, and that is a political choice that has become a vulnerability. Those who have controlled the architecture – the <strong>ARPANET</strong> engineers first, the large infrastructure companies later – have already written the rules of the game, regardless of who won the elections or wrote the laws. Lessig has been repeating it for twenty-five years. It is happening now, on a global scale.</p>

<p><em>We are guests on our own web</em>. We have been for at least a decade, and for two years we have been statistically a minority. The rent we pay is in data extracted without our noticing, in attention consumed by content generated by those who have scraped ours, and in administration hours spent keeping in place infrastructure that is not designed for us. It is not a metaphor: it is an accounting that could be done line by line, if anyone felt like keeping it. The interesting question, then, is not <em>how we block the bots</em>: it is <em>what it means to publish and administer in an internet where the intended audience is no longer the majority of the recipients</em>. A question we should have asked ourselves a long time ago, and one that concerns not only technical operators, but anyone who considers the internet a common good – political, cultural, material.</p>

<hr/>

<h2 id="sources-and-further-reading" id="sources-and-further-reading">Sources and further reading</h2>

<p><strong>On bot traffic statistics and trends</strong></p>
<ul><li>Imperva (Thales) (2025). <em>2025 Bad Bot Report: The Rapid Rise of Bots and the Unseen Risk for Business</em>. Twelfth annual edition. The decade-long historical series, the headline 51% figure, composition by attack category, and estimates on residential proxies and account takeovers (ATO). Thirteen trillion bot requests blocked in 2024. <a href="https://www.imperva.com/resources/resource-library/reports/bad-bot-report/">https://www.imperva.com/resources/resource-library/reports/bad-bot-report/</a></li>
<li>Imperva (Thales) (2026). <em>2026 Bad Bot Report: Bad Bots in the Agentic Age</em>. Updated figures for 2025: 53% bots, 47% humans. <a href="https://www.imperva.com/blog/">https://www.imperva.com/blog/</a></li>
<li>Cloudflare Radar (2025). <em>2025 Year in Review: The rise of AI, post-quantum, and record-breaking DDoS attacks</em>. AI crawler composition by purpose (training/search/user action), GPTBot/ClaudeBot/Meta-ExternalAgent share, crawl-to-refer ratio by platform. Independent confirmation of the Imperva data from a completely different network angle. <a href="https://radar.cloudflare.com/year-in-review/2025">https://radar.cloudflare.com/year-in-review/2025</a></li>
<li>Cloudflare Blog (2025). <em>From Googlebot to GPTBot: who&#39;s crawling your site in 2025</em>. <a href="https://blog.cloudflare.com/">https://blog.cloudflare.com/</a></li></ul>

<p><strong>On the breakdown of cooperative protocols</strong></p>
<ul><li>DeVault, D. (2025). <em>Please stop externalising your costs directly into my face</em>. SourceHut blog, March 2025. The manifesto, in first person, of a sysadmin who watches the cooperative <code>robots.txt</code> pact break. Essential reading to understand what it means to administer a FOSS service under pressure from LLM crawlers. <a href="https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html">https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html</a></li>
<li>Wikimedia Foundation (2025). <em>How crawlers impact the operations of the Wikimedia projects</em>. Diff blog, April 2025. The internal data: 65% of the most resource-expensive traffic comes from bots, which account for 35% of pageviews. The most documented case of asymmetry between costs borne by the body hosting free content and benefits extracted by crawlers. <a href="https://diff.wikimedia.org/">https://diff.wikimedia.org/</a></li>
<li>Iaso, X. (2024–present). <em>Anubis (proof-of-work anti-AI-scraper)</em>. The concrete tool that GNOME, KDE and several other FOSS communities have adopted to defend public infrastructure from aggressive crawlers. Demonstrates that defence, today, is proof-of-work – that is, computational friction applied to those who want to read. <a href="https://anubis.techaro.lol/">https://anubis.techaro.lol/</a></li></ul>

<p><strong>On scanning infrastructure</strong></p>
<ul><li>Durumeric, Z., Adrian, D., Mirian, A., Bailey, M., Halderman, J. A. (2015). “A Search Engine Backed by Internet-Wide Scanning”. <em>Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS &#39;15)</em>. Founding paper of Censys. Describes how scanning IPv4 has become economically trivial. Essential technical reading to understand the discovery/defence asymmetry. <a href="https://zmap.io/">https://zmap.io/</a></li>
<li>Akamai (various years). <em>The Web Scraping Problem</em> and related Threat Intelligence reports. Economic model of the malicious bot as a parasitic <em>long stay</em>, not as a destroyer. Demolishes the common intuition that a site that falls over has been “attacked”: those who know how to attack well do not make anything fall over. <a href="https://www.akamai.com/blog/security">https://www.akamai.com/blog/security</a></li></ul>

<p><strong>On the political economy of digital infrastructure</strong></p>
<ul><li>Lessig, L. (1999, updated as <em>Code v2</em> in 2006). <em>Code and Other Laws of Cyberspace</em>. Basic Books. <em>Code is law</em>. The technical architecture of a network is already political because it defines what is possible. Twenty-five years later, the thesis is the single most useful conceptual tool for reading what is happening to <code>robots.txt</code>. <a href="http://codev2.cc/">http://codev2.cc/</a></li>
<li>Zuboff, S. (2019). <em>The Age of Surveillance Capitalism</em>. PublicAffairs. Framework of non-consensual extraction as the dominant economic model of Silicon Valley. To be read thinking that its thesis, written about behaviour, applies today one level deeper: to the textual raw material.</li>
<li>Crawford, K. (2021). <em>Atlas of AI</em>. Yale University Press. The materiality of AI as extractive asymmetry: mines, datacentres, underpaid human labour. I would add: your server.</li></ul>

<p><strong>Original protocol specifications</strong></p>
<ul><li>Postel, J. (ed.) (1981). <em>Internet Protocol</em>. RFC 791. The original IP specification, forty-five pages that never talk about identity. <a href="https://datatracker.ietf.org/doc/html/rfc791">https://datatracker.ietf.org/doc/html/rfc791</a></li>
<li>Koster, M., Illyes, G., Zeller, H., Sassman, L. (2022). <em>Robots Exclusion Protocol</em>. RFC 9309. The formal specification of <code>robots.txt</code>, arriving thirty years after the practice it codifies and already obsolete in practice. Worth rereading every so often to remember that today&#39;s internet is a palimpsest of hacks on top of a protocol conceived for a world that no longer exists. <a href="https://datatracker.ietf.org/doc/html/rfc9309">https://datatracker.ietf.org/doc/html/rfc9309</a></li></ul>

<p><a href="https://remark.as/p/jolek78/guests-on-our-own-web">Discuss...</a></p>

<p><a href="https://jolek78.writeas.com/tag:Bots" class="hashtag"><span>#</span><span class="p-category">Bots</span></a> <a href="https://jolek78.writeas.com/tag:AICrawlers" class="hashtag"><span>#</span><span class="p-category">AICrawlers</span></a> <a href="https://jolek78.writeas.com/tag:robotsTxt" class="hashtag"><span>#</span><span class="p-category">robotsTxt</span></a> <a href="https://jolek78.writeas.com/tag:DigitalSovereignty" class="hashtag"><span>#</span><span class="p-category">DigitalSovereignty</span></a> <a href="https://jolek78.writeas.com/tag:SelfHosting" class="hashtag"><span>#</span><span class="p-category">SelfHosting</span></a> <a href="https://jolek78.writeas.com/tag:Cloudflare" class="hashtag"><span>#</span><span class="p-category">Cloudflare</span></a> <a href="https://jolek78.writeas.com/tag:SurveillanceCapitalism" class="hashtag"><span>#</span><span class="p-category">SurveillanceCapitalism</span></a> <a href="https://jolek78.writeas.com/tag:FOSS" class="hashtag"><span>#</span><span class="p-category">FOSS</span></a> <a href="https://jolek78.writeas.com/tag:Internet" class="hashtag"><span>#</span><span class="p-category">Internet</span></a> <a href="https://jolek78.writeas.com/tag:SolarPunk" class="hashtag"><span>#</span><span class="p-category">SolarPunk</span></a> <a href="https://jolek78.writeas.com/tag:Writing" class="hashtag"><span>#</span><span class="p-category">Writing</span></a></p>

<div class="center">
· 🦣 <a href="https://fosstodon.org/@jolek78">Mastodon</a> · 📸 <a href="https://pixelfed.social/jolek78">Pixelfed</a> ·  📬 <a href="mailto:jolek78@jolek78.dev">Email</a> ·
· ☕ <a href="https://liberapay.com/jolek78">Support this work on Liberapay</a>
</div>
]]></content:encoded>
      <guid>https://jolek78.writeas.com/guests-on-our-own-web</guid>
      <pubDate>Sat, 16 May 2026 07:58:18 +0000</pubDate>
    </item>
    <item>
      <title>Reflections on an (impossible) escape from capitalism</title>
      <link>https://jolek78.writeas.com/reflections-on-an-impossible-escape-from-capitalism?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[It was an ordinary Friday evening. The parcel had arrived with the courier that morning, but I only opened it after dinner, with that silent ceremony I perform every time new hardware shows up - as if opening a box too quickly were a form of disrespect toward the object. Inside was a HUNSN 4K. Small, almost ridiculously small. A mini PC in a form factor that fit in the palm of a hand. I put it on the table, looked at it. Looked at it again. And then an uncomfortable thought occurred to me. I had ordered it from a Chinese reseller, paid with a credit card, through a completely traceable payment infrastructure, from one of the most centralised and surveilled commercial ecosystems in existence. To build a homelab that would let me escape centralised and surveilled ecosystems.&#xA;&#xA;!--more--&#xA;&#xA;The funny thing - funny in the sense that it makes you laugh, but badly - is that I&#39;m not alone. Every day, somewhere in the world, someone orders a mini PC, a Raspberry Pi, a managed Mikrotik switch, with the stated goal of taking back control of their digital life. They order it on Alibaba, pay with PayPal, wait for the courier. And they see nothing strange in any of this, because the contradiction is so structural it has become invisible. This article is an attempt to make it visible again. Without easy solutions, because I don&#39;t have any. And when have I ever…&#xA;&#xA;The Promise of the Homelab&#xA;&#xA;When, in 2019, I started self-hosting pretty much everything - Nextcloud (always on a Raspberry Pi, first RPi3 then RPi4), Jellyfin, Navidrome, FreshRSS, and about twenty-five other services on Proxmox LXC, each with its own isolated Docker daemon - I did it with a precise motivation: I wanted to know where my data lived, who could read it, and have the ability to switch it off myself if I ever felt like it. Not when a company decides to shut down a service, not when someone else changes the licence terms. Me. This came after a long period of reflection on myself, the work I was doing and still do, and the technological society I live in. It is an ideological choice before it is a technical one. Technology as a tool for autonomy rather than control; infrastructure as something you own instead of something that owns you. I hope no one is alarmed if I say that some of these reflections began with reading Theodore Kaczynski&#39;s Manifesto, before eventually landing, of course, on more authoritative sources.&#xA;&#xA;Yes, I&#39;m mad, but not quite that mad…&#xA;&#xA;When you pay a subscription to a cloud service, the transaction does not end the moment you authorise the electronic payment. Shoshana Zuboff, in The Age of Surveillance Capitalism, calls this mechanism behavioral surplus: the behavioural data extracted beyond what is needed to provide the service, then resold as predictive raw material.&#xA;&#xA;  Under the regime of surveillance capitalism, however, the first text does not stand alone; it trails a shadow close behind. The first text, full of promise, actually functions as the supply operation for the second text: the shadow text. Everything that we contribute to the first text, no matter how trivial or fleeting, becomes a target for surplus extraction. That surplus fills the pages of the second text. This one is hidden from our view: &#34;read only&#34; for surveillance capitalists. In this text our experience is dragooned as raw material to be accumulated and analyzed as means to others&#39; market ends. 
The shadow text is a burgeoning accumulation of behavioral surplus and its analyses, and it says more about us than we can know about ourselves. Worse still, it becomes increasingly difficult, and perhaps impossible, to refrain from contributing to the shadow text. It automatically feeds on our experience as we engage in the normal and necessary routines of social participation.&#xA;&#xA;You are not the customer of the system - you are its product. Your habits, your schedules, your preferences, your hesitations before clicking on something: all of this is collected, modelled, sold. The transaction is not monthly: it is continuous, invisible, and never ends as long as you use the service. With hardware, in principle, the transaction is one-time: you buy, you pay, it ends, it is yours. The disk is in your room, not on a server subject to government requests, security breaches, or business decisions that are nothing to do with you but impact your access to those services. This distinction - between a tool you use and a system that uses you - is the real stake of the homelab. It is not about saving money, it is not about performance. It is about who controls what.&#xA;&#xA;The problem is that building this infrastructure requires hardware, time, knowledge, and resources. The hardware comes from somewhere; the time, the knowledge, and the energy resources come from a privilege not granted to everyone.&#xA;&#xA;The Market I Hadn&#39;t Seen&#xA;&#xA;Search for &#34;mini PC homelab&#34; on any marketplace. What you find is a productive ecosystem that has exploded over the past five years in a way I honestly did not expect.&#xA;&#xA;MINISFORUM, Beelink, Trigkey, Geekom, GMKtec. Zimaboard, with its single-board aesthetic designed explicitly for those who want home racks. Raspberry Pi and the galaxy of clones - Orange Pi, Rock Pi, Banana Pi. Managed Mikrotik switches at accessible prices. 1U rack cases to mount under the desk. M.2 NVMe SSDs with TBW figures calculated for small-server workloads. Silent power supplies designed to run 24/7. A market built from scratch, that exists precisely because there is a community of people who want to run servers at home. r/homelab and r/selfhosted on Reddit have approximately 2.8 and 1.7 million members respectively - numbers publicly verifiable, and growing. YouTube is full of dedicated channels. There is an entire attention economy built around &#34;escaping&#34; the attention economy.&#xA;&#xA;But it is worth asking: who built this market, and why. MINISFORUM and Beelink do not exist out of ideological sympathy for the homelab movement. They exist because they identified a profitable segment and served it with industrial precision. Kate Crawford, in Atlas of AI, documents how technology supply chains follow niche demand with the same efficiency with which they follow mass demand: factories in Guangdong optimise production lines not for a worldview, but for a margin. The fact that the resulting product also satisfies an ideological need is, from the manufacturer&#39;s point of view, irrelevant.&#xA;&#xA;  The Victorian environmental disaster at the dawn of the global information society shows how the relations between technology and its materials, environments, and labor practices are interwoven. 
Just as Victorians precipitated ecological disaster for their early cables, so do contemporary mining and global supply chains further imperil the delicate ecological balance of our era.&#xA;&#xA;The mechanism had been described with theoretical precision back in 1999 by Luc Boltanski and Ève Chiapello in The New Spirit of Capitalism. Their thesis: capitalism is never defeated by criticism - it is incorporated. When a critique becomes widespread enough, the system absorbs it and transforms it into a market segment. The artistic critique of the 1960s - autonomy, authenticity, rejection of standardisation - became the marketing of the creative economy. The critique of digital centralisation - sovereignty, privacy, control - has become an online catalogue to browse through.&#xA;&#xA;Resistance has become a market segment. Every time someone buys a HUNSN to stop paying subscriptions to services they don&#39;t control, a factory in Guangdong sells a HUNSN. Capitalism has not been defeated - it has shifted (at least for a small slice of the population: the nerds, the hackers) the extraction point from subscriptions to hardware.&#xA;&#xA;The Accumulation Syndrome&#xA;&#xA;But there is a further level - more ridiculous and more personal - that homelab communities never discuss openly, yet anyone who has a homelab recognises immediately. The Raspberry Pi 4 bought &#34;for a project.&#34; The old ThinkPad kept because &#34;you never know.&#34; The 4TB disk salvaged from a decommissioned NAS - and &#34;it might come in handy.&#34; The second-hand switch picked up on eBay for eighteen euros because it was cheap and might be useful. The cables, the cables, the cables.&#xA;&#xA;r/homelab has a term for this: just in case hardware. It is the hardware of the imaginary future, of projects that only exist in your head, of configurations that one day - one day - you will finally test. In the meantime it occupies a shelf, draws current in standby, and generates a diffuse sense of possibility that is indistinguishable from the most classic consumerism. The underlying psychological mechanism has a precise name: compensatory consumption - consumption as a response to a perceived loss of autonomy or control. You buy hardware because buying hardware gives you the feeling of recovering agency over something. The aesthetic is different from traditional consumerism - no luxury logos, no recognisable status symbols - but the mechanism is identical.&#xA;&#xA;That said, there is a partially honest answer to all of this: the second-hand and refurbished market. The ThinkPad X230 on eBay, the Dell R720 server decommissioned from a datacentre, the disk from someone who upgraded their NAS. My ZFS NAS, to give one example, is a recycled old tower with four 1TB disks in RAIDZ - hardware that would otherwise have ended up in landfill, with a life cycle extended by years, without generating new production demand. It is closer to the ethics of repair than to compulsive buying. But it has its own internal contradiction: it requires even more technical competence than buying new - knowing how to assess wear, diagnose an unknown component, manage ten-year-old drivers. The barrier to entry rises further. And the refurbished market is itself now an organised commercial sector, with its own margins, its own platforms, its own pricing logic. It is not a clean way out. 
It is a less dirty way out.&#xA;&#xA;And then there is the energy question, which is usually ignored in homelab discussions and is instead the most uncomfortable of all - uncomfortable enough to deserve a more in-depth treatment later on. For now, suffice it to say: every machine on your shelf that &#34;draws current in standby&#34; is a line item in the energy bill that the homelab movement rarely accounts for.&#xA;&#xA;Not for Everyone. And It Should Not Be This Way.&#xA;&#xA;There is a second level of the paradox that is even more uncomfortable than the first. Building a homelab costs money - relatively little, but it costs. It requires physical space. It requires a decent connection. And it requires time. A lot of time. Not installation time - that is measurable, finite. The learning time that precedes everything else. To reach the point where you can build a functional infrastructure with Proxmox, LXC containers, centralised authentication, reverse proxy, automated backups - you need to have already spent years understanding how Linux works, how to reason about networks and permissions, how to read a log. I started with a Red Hat in 1997, and it took me almost thirty years to get where I am. I should know this. Yet it always escapes me. And that time did not fall from the sky. It is time I was able to dedicate because I had a certain kind of job, a certain stability, a certain amount of mental energy left at the end of the day. It is middle-class-with-a-stable-position time, not the time of someone working three warehouse shifts a week. Passion is not enough.&#xA;&#xA;Johan Söderberg documents this in Hacking Capitalism: the FOSS movement was born as resistance to capitalism, but reproduces within itself hierarchies of skill and merit that make it structurally exclusive. Freedom is technically available to anyone, but effective access requires resources distributed in anything but a democratic manner. Söderberg goes further than simply observing the exclusivity: the voluntary open source work produces use value - functioning software, documentation, community support - that capital then extracts as exchange value without remunerating those who produced it. Red Hat builds a billion-dollar company on a kernel written largely by volunteers. It is not just that not everyone can get in: it is that those who get in often work for someone without knowing it. The homelab inherits this problem and amplifies it.&#xA;&#xA;  The narrative of orthodox historical materialism corresponds with some very popular ideas in the computer underground. It is widely held that the infinite reproducibility of information made possible by computers (forces of production) has rendered intellectual property (relations of production, superstructure) obsolete. The storyline of post-industrial ideology is endorsed but with a different ending. Rather than culminating in global markets, technocracy and liberalism, as Daniel Bell and the futurists would have it; hackers are looking forward to a digital gift economy and high-tech anarchism. In a second turn of events, hackers have jumped on the distorted remains of Marxism presented in information-age literature, and, while missing out on the vocabulary, ended up promoting an upgraded Karl Kautsky-version of historical materialism.&#xA;&#xA;This is not a quirk of the homelab movement: it is a recurring structure in every technological wave. 
Langdon Winner, in his influential essay Do Artifacts Have Politics?, argued that technological choices are never neutral - they incorporate power structures, distribute access in non-random ways. Amateur radio in the 1920s, the personal computer in the 1980s, the internet in the 1990s: every time the promise was democratising, every time the actual distribution followed the lines of pre-existing privilege. Not out of malice, but out of structure. The irony is this: those who would most need digital autonomy - those who cannot afford subscriptions, those who live under governments that surveil communications, those most exposed to data collection - are exactly those least likely to be able to build a homelab. Not for lack of interest or intelligence. For lack of time, money, and years of privileged exposure to technology.&#xA;&#xA;Homelab communities do not usually talk about this. They talk about which mini PC to buy, how to optimise energy consumption, which distro to use as a base. The conversation about structural exclusivity exists, but at the margins - in Jacobin, in Logic Magazine, in EFF activism - while the centre of the discourse remains impermeable. It is not that no one speaks about it: it is that the peripheries speak about it, and the peripheries do not set the agenda. This entire conversation takes place in a room to which not everyone has a ticket. And those inside do not seem to find that particularly problematic.&#xA;&#xA;A Technological Cosplay?&#xA;&#xA;So is the whole thing a con? Is the homelab just anti-capitalist cosplay while you continue to fund the same supply chains? In part, yes.&#xA;&#xA;The HUNSN 4K was designed in China, assembled in China, shipped by container on ships burning bunker fuel. Global maritime transport is responsible for approximately 2.5% of global CO₂ emissions - a share that the IMO (International Maritime Organization) has been trying to reduce for years with slow progress and targets continually postponed. Then: distributed through Alibaba, paid with a credit card. Every piece of technology hardware carries an extractive chain that begins in lithium mines in Bolivia and cobalt mines in the Democratic Republic of the Congo, passes through factories in Guangdong, and ends in electronic waste processing centres in Ghana. The hardware travels that supply chain exactly like any other consumer device. Furthermore, hardware has a lifecycle. In five years the HUNSN 4K will be too slow, or it will break, or something will come out with energy efficiency too much better to ignore. And I will buy again. The mini PC market for homelabs depends on the obsolescence of previous purchases - exactly like any other consumer market.&#xA;&#xA;The critique of capitalism, when it is widespread enough, is not suppressed - it is incorporated. The system absorbs the values of resistance and transforms them into a market segment. Autonomy becomes a selling point. Decentralisation becomes a brand. The rebel who wanted to exit the system finds himself funding a new vertical of the same system, convinced he is making an ethical choice.&#xA;&#xA;The Counter-Shot&#xA;&#xA;But there is a structural difference that would be dishonest to ignore.&#xA;&#xA;When you pay a subscription to a cloud service, the cost is not just the monthly fee. It is the continuous cession of data, behaviours, habits. It is the behavioral surplus Zuboff talks about: you are not using a service, you are being used as raw material to train models, build profiles, sell advertising. 
The transaction never ends, in ways you often cannot see and cannot escape from as long as you use the service.&#xA;&#xA;With hardware, the transaction ends. The data stays on a physical disk in your room, not on a server subject to government requests, breaches, or business decisions that have nothing to do with you but impact your life. The software running on it - Proxmox, Debian, Nextcloud, Jellyfin - is open source; you can modify it. If something changes in a way you cannot accept, you can leave. This resilience has real value - but it is worth noting that it is asymmetric resilience: it works for those who have the skills to exercise it. For those who do not, the theoretical portability of their data from Nextcloud to something else requires exactly the same skills we have already identified as the barrier to entry. The freedom to leave is real. Access to that freedom, much less so.&#xA;&#xA;And then there is the energy question, which I have deferred long enough. The major hyperscalers - AWS, Google, Azure - operate with a PUE (Power Usage Effectiveness) between 1.1 and 1.2. For every watt of useful computation they dissipate barely 0.1–0.2 watts in heat and infrastructure. They have enormous economies of scale, optimised industrial cooling, significant investments in renewable energy, and above all: their servers run at very high utilisation rates. Almost always busy.&#xA;&#xA;A home homelab works in a radically different way. The machine runs 24/7 even when it is doing nothing - and for most of the time it is doing nothing. Navidrome serving three requests a day, FreshRSS fetching every hour, an LDAP container sitting listening without receiving connections. You are paying the energy cost of the infrastructure regardless of usage. The implicit PUE of a homelab, calculated honestly on the ratio between total consumption and actual workload, is much worse than that of a datacentre. IEA data (Data Centres and Data Transmission Networks, updated annually) shows that large cloud providers progressively improve energy efficiency thanks to economies of scale that no individual homelab can replicate. The flip side is that the same growth in demand that makes economies of scale possible negates the efficiency gains: Amazon&#39;s absolute emissions increased between 2023 and 2024 despite improved PUE. Efficiency improves. Total consumption grows anyway. This is Jevons&#39; Paradox: energy efficiency, instead of reducing consumption, increases it, because it lowers the marginal cost of use and stimulates demand that grows faster than the efficiency gains.&#xA;&#xA;  Note: The comparison is not as linear as the numbers suggest. PUE measures the internal efficiency of a datacentre, not the energy cost of the network traffic that data generates every time it leaves it - traffic that a homelab eliminates almost completely for internal services. Nor does it measure proportion: AWS is efficient at delivering services to millions of users, but that scale says nothing about the real cost of storing fifty gigabytes of personal data on a server designed for loads a thousand times greater. A HUNSN N100 in idle consumes less than 8 watts. 
The honest energy comparison is not homelab vs hyperscaler in the abstract - it is homelab vs proportional share of hyperscaler for your specific workload, a calculation that nobody can make with publicly available data.&#xA;&#xA;This does not automatically mean that the cloud is the ethically correct choice - the problem does not reduce to PUE, and surveillance has costs that are not measured in kilowatts. It means that anyone with SolarPunk values who chooses the homelab must reckon with a real contradiction: the choice of sovereignty may be, watt for watt, energetically more costly than the system one wants to escape. I have no clean answer, but ignoring the question would be dishonest. Söderberg acknowledges that the FOSS movement has produced concrete and undeniable gains - they simply are not enough, on their own, to subvert the dynamics of informational capitalism.&#xA;&#xA;In short: this is not a critique of the homelab, but it is a critique of the homelab presented as a sufficient revolutionary act.&#xA;&#xA;What Happens at Eleven PM - and Beyond&#xA;&#xA;That night, with the HUNSN 4K on the table, I pressed on. I installed Proxmox. I configured the network. I started bringing up containers one by one. And at some point - three hours had passed, I had three terminals open and was debugging nslcd to centralise LDAP authentication across all the containers - I realised something: I was doing all of this simply because I enjoyed it. Not to resist something. Not to advance an ideological agenda. Because there was a problem to solve and solving it gave me satisfaction. Mihaly Csikszentmihalyi describes this state in Flow as total absorption in a task calibrated to one&#39;s own competencies: time expands, attention narrows, awareness of context vanishes. It is not motivation - it is something more immediate. Debugging an authentication problem at eleven at night on a system I could have chosen not to build is, neuropsychologically, indistinguishable from pleasure. Not the satisfaction of having finished: the process itself. Moreover, for an AuDHD person like me, going into hyperfocus allows you to lose your sense of time entirely, and to literally escape from a world you viscerally loathe.&#xA;&#xA;Ah - you had not figured that out yet?&#xA;&#xA;When I had finished and closed everything, the satisfaction was still there. Along with a mildly uncomfortable awareness: I could probably have used a hosted service, lived just as well, and not lost three hours of a weeknight. But in the meantime I had understood how PAM worked, I had read documentation I had never opened before, I had implemented it on my homelab, I had learned something I hadn&#39;t known I wanted to know.&#xA;&#xA;And here the circle closes in a somewhat unsettling way. Söderberg speaks of voluntary open source work as the production of pure use value - the intrinsic pleasure of doing, understanding, building something that works. But it is exactly this use value that capital then extracts as exchange value: the competence I accumulate debugging LDAP at eleven at night is the same competence I bring to work the next day, that I put into articles like this one, that I share in communities where others use it to build their own homelabs. Technical pleasure is not neutral. It has a production chain. Not always visible, but real.&#xA;&#xA;This is what the homelab is, at least for me: a way of learning that produces, as a side effect, an infrastructure I control. The ideology is there, but it comes second. 
First comes the pleasure of understanding how something works. Or rather: ideology and pleasure are interchangeable, and often run in parallel - but this does not resolve any of the contradictions I described above. It leaves them all standing, in fact makes them stranger. Am I resisting capitalism, or am I just cultivating an expensive hobby with a political aesthetic?&#xA;&#xA;The Hacker Ethic&#xA;&#xA;The word &#34;hacker&#34; has had bad press for decades. In 1990s news bulletins it was a synonym for a hooded cybercriminal; in the jargon of security companies it became a marketing term to prepend to anything. Neither has much to do with what the word historically means. Steven Levy, in Hackers: Heroes of the Computer Revolution, reconstructs the culture that formed around the MIT and Stanford labs in the 1960s: a community of programmers for whom code was an aesthetic object, access to information a moral principle, and technical competence the only legitimate hierarchy. The principles Levy identifies as the &#34;hacker ethic&#34; are precise: access to computers - and to anything that can teach you how the world works - should be unlimited and total. All information should be free. Decentralised systems are preferable to centralised ones. Hackers should be judged by what they produce, not by titles, age, race, or position. You can create art and beauty with a computer.&#xA;&#xA;It is not a political manifesto in the traditional sense. It is something more visceral - a disposition toward the world, a way of standing before a system you do not yet understand: the correct response is to take it apart, understand how it works, and put it back together better than before.&#xA;&#xA;Pekka Himanen, in The Hacker Ethic and the Spirit of the Information Age - with a preface by Linus Torvalds and an epilogue by Manuel Castells, which already says something about the project&#39;s ambition - performs a more explicit theoretical operation. He builds the hacker ethic in direct opposition to the Protestant work ethic described by Max Weber: where Weber saw work as duty, discipline as virtue, and leisure as absence of production, Himanen identifies in the hacker a figure who works out of passion, considers play an integral part of work, and rejects the sharp separation between productive time and free time. The hacker does not work for money - money is a side effect, when it comes. They work because the problem is interesting. Because the elegant solution has value in itself. Because understanding how something works is, in and of itself, sufficient.&#xA;&#xA;  Hacker activity is also joyful. It often has its roots in playful explorations. Torvalds has described, in messages on the Net, how Linux began to expand from small experiments with the computer he had just acquired. In the same messages, he has explained his motivation for developing Linux by simply stating that &#34;it was/is fun working on it.&#34; Tim Berners-Lee, the man behind the Web, also describes how this creation began with experiments in linking what he called &#34;play programs.&#34; Wozniak relates how many characteristics of the Apple computer &#34;came from a game, and the fun features that were built in were only to do one pet project, which was to program … [a game called] Breakout and show it off at the club.&#34;&#xA;&#xA;Recognise something? I do. 
Those three hours debugging nslcd at eleven at night were not work in the Weberian sense - nobody was paying me, nobody had asked me to do it, there was no corporate objective to reach. They were hacking in the precise sense that Levy and Himanen describe: exploration motivated by curiosity, with the infrastructure as an object of study as much as of utility. The homelab is, culturally, a direct expression of the hacker ethic. It is no coincidence that homelab communities and open source communities overlap almost perfectly, that they use the same language, the same platforms, the same values. But here, as elsewhere in this article, the story gets complicated.&#xA;&#xA;The hacker ethic promises a pure meritocracy: you are judged by what you can do, not by who you are. It is an attractive idea. It is also, in practice, a partial fiction. Technical meritocracy presupposes that everyone starts from the same point - that skills are accessible to anyone who really wants to acquire them, that the time to acquire them is distributed equally, that mentorship networks and learning resources are available regardless of context. The homelab as hacker practice inherits both things: the genuine nature of curiosity as a driver, and structural exclusivity as an undeclared side effect. The pleasure of taking a system apart to understand how it works is real and should not be devalued. But that pleasure is available, in practice, to those who already have the ticket.&#xA;&#xA;Conclusions&#xA;&#xA;The HUNSN 4K runs, alongside the other &#34;little electronic contraptions,&#34; on a rack next to my armchair - the one where, at the end of the day, I indulge my guilty pleasure of reading a book in the company of my cats. Proxmox, the Nextcloud server, the ZFS NAS, a small MINISFORUM box running Ollama with some local open-weight LLM models, a Raspberry Pi 5 running the Tor Relay, and a HUNSN RJ15 with pfSense controlling incoming and outgoing traffic. An infrastructure, in short, that allows me to have something resembling digital sovereignty within the limits of the possible. The contradictions I have described do not resolve. They are held together, with effort, as any intellectually complex position on a complex system must be held together.&#xA;&#xA;The first: the market that made the accessible homelab possible is the same market the homelab is supposed to emancipate us from. If this explosion of affordable, efficient mini PCs had not happened - if capitalism had not decided to build exactly what we wanted - how many of us would have taken the same path? How much of our &#34;ethical choice&#34; depends on the existence of products designed and sold precisely for us?&#xA;&#xA;The second: does incorporated resistance truly lose its force, or does it remain resistance even when someone profits from it? Boltanski and Chiapello describe the incorporation mechanism, but do not argue that critique loses all effectiveness in the process. Perhaps the homelab is simultaneously a product of the system and a real, if partial, form of withdrawal from it. The two things are not mutually exclusive.&#xA;&#xA;The third: if digital autonomy requires decades of accumulated skills, enough free time to use them, and enough money to buy the hardware, are we building a democratic alternative? 
Or are we building an exclusive club with a rebel aesthetic, reproducing the same hierarchies of privilege it claims to want to fight?&#xA;&#xA;The fourth: the energy question has no clean answer, and Jevons&#39; Paradox makes it even more uncomfortable - because it works in both directions. The cloud improves efficiency and increases total consumption. A homelab consumes proportionally more, but does not fuel the demand that drives that total consumption upwards. Are we building digital sovereignty, or are we simply choosing where to position ourselves within a contradiction that cannot be resolved at the individual level?&#xA;&#xA;I don&#39;t know. But at least I know where my data is.&#xA;&#xA;Fun Fact&#xA;&#xA;This article was written in Markdown using a Flatnotes instance running as a CT container on Proxmox, while listening to a symphonic metal playlist served by Navidrome - another CT container - pulling OGG files from a ZFS NAS over an NFS share. The cited books were in EPUB format on Calibre Web. In the background, Nextcloud on a Raspberry Pi 4 was syncing and backing up everything. Spelling mistakes were corrected by Qwen2.5, an LLM model served by Ollama on the MINISFORUM box, accessible locally via oterm and Open WebUI. And all of this, controlled from a laptop running Linux.&#xA;&#xA;Coincidences? I don&#39;t think so.&#xA;&#xA;a href=&#34;https://remark.as/p/jolek78/reflections-on-an-impossible-escape-from-capitalism&#34;Discuss.../a&#xA;&#xA;#Homelab #SelfHosted #SurveillanceCapitalism #Privacy #OpenSource #HackerEthic #SolarPunk #DigitalSovereignty #FOSS #Linux #Writing&#xA;&#xA;div class=&#34;center&#34;&#xD;&#xA;· 🦣 a href=&#34;https://fosstodon.org/@jolek78&#34;Mastodon/a · 📸 a href=&#34;https://pixelfed.social/jolek78&#34;Pixelfed/a ·  📬 a href=&#34;mailto:jolek78@jolek78.dev&#34;Email/a ·&#xD;&#xA;· ☕ a href=&#34;https://liberapay.com/jolek78&#34;Support this work on Liberapay/a&#xD;&#xA;/div]]&gt;</description>
      <content:encoded><![CDATA[<p>It was an ordinary Friday evening. The parcel had arrived with the courier that morning, but I only opened it after dinner, with that silent ceremony I perform every time new hardware shows up – as if opening a box too quickly were a form of disrespect toward the object. Inside was a HUNSN 4K. Small, almost ridiculously small. A mini PC in a form factor that fit in the palm of a hand. I put it on the table, looked at it. Looked at it again. And then an uncomfortable thought occurred to me. I had ordered it from a Chinese reseller, paid with a credit card, through a completely traceable payment infrastructure, from one of the most centralised and surveilled commercial ecosystems in existence. To build a homelab that would let me escape centralised and surveilled ecosystems.</p>



<p>The funny thing – funny in the sense that it makes you laugh, but badly – is that I&#39;m not alone. Every day, somewhere in the world, someone orders a mini PC, a Raspberry Pi, a managed Mikrotik switch, with the stated goal of taking back control of their digital life. They order it on Alibaba, pay with PayPal, wait for the courier. And they see nothing strange in any of this, because the contradiction is so structural it has become invisible. This article is an attempt to make it visible again. Without easy solutions, because I don&#39;t have any. And when have I ever…</p>

<h2 id="the-promise-of-the-homelab" id="the-promise-of-the-homelab">The Promise of the Homelab</h2>

<p>When, in 2019, I started self-hosting pretty much everything – Nextcloud (always on a Raspberry Pi, first RPi3 then RPi4), Jellyfin, Navidrome, FreshRSS, and about twenty-five other services on Proxmox LXC, each with its own isolated Docker daemon – I did it with a precise motivation: I wanted to know where my data lived, who could read it, and have the ability to switch it off myself if I ever felt like it. Not when a company decides to shut down a service, not when someone else changes the licence terms. Me. This came after a long period of reflection on myself, the work I was doing and still do, and the technological society I live in. It is an ideological choice before it is a technical one. Technology as a tool for autonomy rather than control; infrastructure as something you own instead of something that owns you. I hope no one is alarmed if I say that some of these reflections began with reading Theodore Kaczynski&#39;s Manifesto, before eventually landing, of course, on more authoritative sources.</p>

<p>Yes, I&#39;m mad, but not quite that mad…</p>
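<p>Concretely, the pattern behind that stack is roughly the one sketched below: one unprivileged LXC per service, with nesting enabled so each container can run its own Docker daemon. The container ID, template and storage names here are illustrative placeholders, not my actual configuration.</p>

<pre><code># Sketch: one unprivileged Proxmox LXC with nesting, ready to host a Docker daemon.
# VM ID, template and storage names are placeholders.
pct create 201 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
    --hostname navidrome \
    --unprivileged 1 \
    --features nesting=1,keyctl=1 \
    --cores 2 --memory 1024 \
    --rootfs local-lvm:8 \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp
pct start 201
# then install Docker inside the container and run the service there
</code></pre>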

<p>When you pay a subscription to a cloud service, the transaction does not end the moment you authorise the electronic payment. Shoshana Zuboff, in <em>The Age of Surveillance Capitalism</em>, calls this mechanism <em>behavioral surplus</em>: the behavioural data extracted beyond what is needed to provide the service, then resold as predictive raw material.</p>

<blockquote><p>Under the regime of surveillance capitalism, however, the first text does not stand alone; it trails a shadow close behind. The first text, full of promise, actually functions as the supply operation for the second text: the shadow text. Everything that we contribute to the first text, no matter how trivial or fleeting, becomes a target for surplus extraction. That surplus fills the pages of the second text. This one is hidden from our view: “read only” for surveillance capitalists. In this text our experience is dragooned as raw material to be accumulated and analyzed as means to others&#39; market ends. The shadow text is a burgeoning accumulation of behavioral surplus and its analyses, and it says more about us than we can know about ourselves. Worse still, it becomes increasingly difficult, and perhaps impossible, to refrain from contributing to the shadow text. It automatically feeds on our experience as we engage in the normal and necessary routines of social participation.</p></blockquote>

<p>You are not the customer of the system – you are its product. Your habits, your schedules, your preferences, your hesitations before clicking on something: all of this is collected, modelled, sold. The transaction is not monthly: it is continuous, invisible, and never ends as long as you use the service. With hardware, in principle, the transaction is one-time: you buy, you pay, it ends, it is yours. The disk is in your room, not on a server subject to government requests, security breaches, or business decisions that are nothing to do with you but impact your access to those services. This distinction – between a tool you use and a system that uses you – is the real stake of the homelab. It is not about saving money, it is not about performance. It is about who controls what.</p>

<p>The problem is that building this infrastructure requires hardware, time, knowledge, and resources. The hardware comes from somewhere; the time, the knowledge, and the energy resources come from a privilege not granted to everyone.</p>

<h2 id="the-market-i-hadn-t-seen" id="the-market-i-hadn-t-seen">The Market I Hadn&#39;t Seen</h2>

<p>Search for “mini PC homelab” on any marketplace. What you find is a productive ecosystem that has exploded over the past five years in a way I honestly did not expect.</p>

<p>MINISFORUM, Beelink, Trigkey, Geekom, GMKtec. Zimaboard, with its single-board aesthetic designed explicitly for those who want home racks. Raspberry Pi and the galaxy of clones – Orange Pi, Rock Pi, Banana Pi. Managed Mikrotik switches at accessible prices. 1U rack cases to mount under the desk. M.2 NVMe SSDs with TBW figures calculated for small-server workloads. Silent power supplies designed to run 24/7. A market built from scratch, one that exists precisely because there is a community of people who want to run servers at home. r/homelab and r/selfhosted on Reddit have approximately 2.8 and 1.7 million members respectively – numbers publicly verifiable, and growing. YouTube is full of dedicated channels. There is an entire attention economy built around “escaping” the attention economy.</p>

<p>But it is worth asking: who built this market, and why. MINISFORUM and Beelink do not exist out of ideological sympathy for the homelab movement. They exist because they identified a profitable segment and served it with industrial precision. Kate Crawford, in <em>Atlas of AI</em>, documents how technology supply chains follow niche demand with the same efficiency with which they follow mass demand: factories in Guangdong optimise production lines not for a worldview, but for a margin. The fact that the resulting product also satisfies an ideological need is, from the manufacturer&#39;s point of view, irrelevant.</p>

<blockquote><p>The Victorian environmental disaster at the dawn of the global information society shows how the relations between technology and its materials, environments, and labor practices are interwoven. Just as Victorians precipitated ecological disaster for their early cables, so do contemporary mining and global supply chains further imperil the delicate ecological balance of our era.</p></blockquote>

<p>The mechanism had been described with theoretical precision back in 1999 by Luc Boltanski and Ève Chiapello in <em>The New Spirit of Capitalism</em>. Their thesis: capitalism is never defeated by criticism – it is incorporated. When a critique becomes widespread enough, the system absorbs it and transforms it into a market segment. The artistic critique of the 1960s – autonomy, authenticity, rejection of standardisation – became the marketing of the creative economy. The critique of digital centralisation – sovereignty, privacy, control – has become an online catalogue to browse through.</p>

<p>Resistance has become a market segment. Every time someone buys a HUNSN to stop paying subscriptions to services they don&#39;t control, a factory in Guangdong sells a HUNSN. Capitalism has not been defeated – it has shifted (at least for a small slice of the population: the nerds, the hackers) the extraction point from subscriptions to hardware.</p>

<h2 id="the-accumulation-syndrome" id="the-accumulation-syndrome">The Accumulation Syndrome</h2>

<p>But there is a further level – more ridiculous and more personal – that homelab communities never discuss openly, yet anyone who has a homelab recognises immediately. The Raspberry Pi 4 bought “for a project.” The old ThinkPad kept because “you never know.” The 4TB disk salvaged from a decommissioned NAS – and “it might come in handy.” The second-hand switch picked up on eBay for eighteen euros because it was cheap and might be useful. The cables, the cables, the cables.</p>

<p>r/homelab has a term for this: <em>just in case hardware</em>. It is the hardware of the imaginary future, of projects that only exist in your head, of configurations that one day – one day – you will finally test. In the meantime it occupies a shelf, draws current in standby, and generates a diffuse sense of possibility that is indistinguishable from the most classic consumerism. The underlying psychological mechanism has a precise name: <em>compensatory consumption</em> – consumption as a response to a perceived loss of autonomy or control. You buy hardware because buying hardware gives you the feeling of recovering agency over something. The aesthetic is different from traditional consumerism – no luxury logos, no recognisable status symbols – but the mechanism is identical.</p>

<p>That said, there is a partially honest answer to all of this: the second-hand and refurbished market. The ThinkPad X230 on eBay, the Dell R720 server decommissioned from a datacentre, the disk from someone who upgraded their NAS. My ZFS NAS, to give one example, is a recycled old tower with four 1TB disks in RAIDZ – hardware that would otherwise have ended up in landfill, with a life cycle extended by years, without generating new production demand. It is closer to the ethics of repair than to compulsive buying. But it has its own internal contradiction: it requires even more technical competence than buying new – knowing how to assess wear, diagnose an unknown component, manage ten-year-old drivers. The barrier to entry rises further. And the refurbished market is itself now an organised commercial sector, with its own margins, its own platforms, its own pricing logic. It is not a clean way out. It is a less dirty way out.</p>
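<p>For reference, a pool like that is a handful of commands. The sketch below assumes four blank disks with example device names – on real hardware you would use the stable <code>/dev/disk/by-id/</code> paths rather than <code>sda</code>…<code>sdd</code>.</p>

<pre><code># Sketch: a RAIDZ pool over four recycled 1TB disks (example device names)
zpool create -o ashift=12 tank raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd
zfs set compression=lz4 tank
zfs create tank/media     # dataset later exported over NFS to the other boxes
zpool status tank
</code></pre>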

<p>And then there is the energy question, which is usually ignored in homelab discussions and is instead the most uncomfortable of all – uncomfortable enough to deserve a more in-depth treatment later on. For now, suffice it to say: every machine on your shelf that “draws current in standby” is a line item in the energy bill that the homelab movement rarely accounts for.</p>

<h2 id="not-for-everyone-and-it-should-not-be-this-way" id="not-for-everyone-and-it-should-not-be-this-way">Not for Everyone. And It Should Not Be This Way.</h2>

<p>There is a second level of the paradox that is even more uncomfortable than the first. Building a homelab costs money – relatively little, but it costs. It requires physical space. It requires a decent connection. And it requires time. A lot of time. Not installation time – that is measurable, finite. The learning time that precedes everything else. To reach the point where you can build a functional infrastructure with Proxmox, LXC containers, centralised authentication, reverse proxy, automated backups – you need to have already spent years understanding how Linux works, how to reason about networks and permissions, how to read a log. I started with a Red Hat in 1997, and it took me almost thirty years to get where I am. I should know this. Yet it always escapes me. And that time did not fall from the sky. It is time I was able to dedicate because I had a certain kind of job, a certain stability, a certain amount of mental energy left at the end of the day. It is middle-class-with-a-stable-position time, not the time of someone working three warehouse shifts a week. Passion is not enough.</p>

<p>Johan Söderberg documents this in <em>Hacking Capitalism</em>: the FOSS movement was born as resistance to capitalism, but reproduces within itself hierarchies of skill and merit that make it structurally exclusive. Freedom is technically available to anyone, but effective access requires resources distributed in anything but a democratic manner. Söderberg goes further than simply observing the exclusivity: the voluntary open source work produces use value – functioning software, documentation, community support – that capital then extracts as <em>exchange value</em> without remunerating those who produced it. Red Hat builds a billion-dollar company on a kernel written largely by volunteers. It is not just that not everyone can get in: it is that those who get in often work for someone without knowing it. The homelab inherits this problem and amplifies it.</p>

<blockquote><p>The narrative of orthodox historical materialism corresponds with some very popular ideas in the computer underground. It is widely held that the infinite reproducibility of information made possible by computers (forces of production) has rendered intellectual property (relations of production, superstructure) obsolete. The storyline of post-industrial ideology is endorsed but with a different ending. Rather than culminating in global markets, technocracy and liberalism, as Daniel Bell and the futurists would have it; hackers are looking forward to a digital gift economy and high-tech anarchism. In a second turn of events, hackers have jumped on the distorted remains of Marxism presented in information-age literature, and, while missing out on the vocabulary, ended up promoting an upgraded Karl Kautsky-version of historical materialism.</p></blockquote>

<p>This is not a quirk of the homelab movement: it is a recurring structure in every technological wave. Langdon Winner, in his influential essay <em>Do Artifacts Have Politics?</em>, argued that technological choices are never neutral – they incorporate power structures, distribute access in non-random ways. Amateur radio in the 1920s, the personal computer in the 1980s, the internet in the 1990s: every time the promise was democratising, every time the actual distribution followed the lines of pre-existing privilege. Not out of malice, but out of structure. The irony is this: those who would most need digital autonomy – those who cannot afford subscriptions, those who live under governments that surveil communications, those most exposed to data collection – are exactly those least likely to be able to build a homelab. Not for lack of interest or intelligence. For lack of time, money, and years of privileged exposure to technology.</p>

<p>Homelab communities do not usually talk about this. They talk about which mini PC to buy, how to optimise energy consumption, which distro to use as a base. The conversation about structural exclusivity exists, but at the margins – in Jacobin, in Logic Magazine, in EFF activism – while the centre of the discourse remains impermeable. It is not that no one speaks about it: it is that the peripheries speak about it, and the peripheries do not set the agenda. This entire conversation takes place in a room to which not everyone has a ticket. And those inside do not seem to find that particularly problematic.</p>

<h2 id="a-technological-cosplay" id="a-technological-cosplay">A Technological Cosplay?</h2>

<p>So is the whole thing a con? Is the homelab just anti-capitalist cosplay while you continue to fund the same supply chains? In part, yes.</p>

<p>The HUNSN 4K was designed in China, assembled in China, shipped by container on ships burning bunker fuel. Global maritime transport is responsible for approximately 2.5% of global CO₂ emissions – a share that the IMO (International Maritime Organization) has been trying to reduce for years with slow progress and targets continually postponed. Then: distributed through Alibaba, paid with a credit card. Every piece of technology hardware carries an extractive chain that begins in lithium mines in Bolivia and cobalt mines in the Democratic Republic of the Congo, passes through factories in Guangdong, and ends in electronic waste processing centres in Ghana. The hardware travels that supply chain exactly like any other consumer device. Furthermore, hardware has a lifecycle. In five years the HUNSN 4K will be too slow, or it will break, or something will come out with energy efficiency too much better to ignore. And I will buy again. The mini PC market for homelabs depends on the obsolescence of previous purchases – exactly like any other consumer market.</p>

<p>The critique of capitalism, when it is widespread enough, is not suppressed – it is incorporated. The system absorbs the values of resistance and transforms them into a market segment. Autonomy becomes a selling point. Decentralisation becomes a brand. The rebel who wanted to exit the system finds himself funding a new vertical of the same system, convinced he is making an ethical choice.</p>

<h2 id="the-counter-shot" id="the-counter-shot">The Counter-Shot</h2>

<p>But there is a structural difference that would be dishonest to ignore.</p>

<p>When you pay a subscription to a cloud service, the cost is not just the monthly fee. It is the continuous surrender of data, behaviours, habits. It is the behavioral surplus Zuboff talks about: you are not using a service, you are being used as raw material to train models, build profiles, sell advertising. The transaction never ends: it continues in ways you often cannot see and cannot escape as long as you use the service.</p>

<p>With hardware, the transaction ends. The data stays on a physical disk in your room, not on a server subject to government requests, breaches, or business decisions that have nothing to do with you but impact your life. The software running on it – Proxmox, Debian, Nextcloud, Jellyfin – is open source; you can modify it. If something changes in a way you cannot accept, you can leave. This resilience has real value – but it is worth noting that it is asymmetric resilience: it works for those who have the skills to exercise it. For those who do not, the theoretical portability of their data from Nextcloud to something else requires exactly the same skills we have already identified as the barrier to entry. The freedom to leave is real. Access to that freedom, much less so.</p>

<p>And then there is the energy question, which I have deferred long enough. The major hyperscalers – AWS, Google, Azure – operate with a PUE (Power Usage Effectiveness) between 1.1 and 1.2. For every watt that goes into useful computation, they spend barely 0.1–0.2 additional watts on cooling and other infrastructure overhead. They have enormous economies of scale, optimised industrial cooling, significant investments in renewable energy, and above all: their servers run at very high utilisation rates. Almost always busy.</p>

<p>A homelab works in a radically different way. The machine runs 24/7 even when it is doing nothing – and for most of the time it is doing nothing. Navidrome serving three requests a day, FreshRSS fetching every hour, an LDAP container sitting listening without receiving connections. You are paying the energy cost of the infrastructure regardless of usage. The implicit PUE of a homelab, calculated honestly as the ratio between total consumption and actual workload, is much worse than that of a datacentre. IEA data (<em>Data Centres and Data Transmission Networks</em>, updated annually) shows that large cloud providers progressively improve energy efficiency thanks to economies of scale that no individual homelab can replicate. The flip side is that the same growth in demand that makes economies of scale possible negates the efficiency gains: Amazon&#39;s absolute emissions increased between 2023 and 2024 despite improved PUE. Efficiency improves. Total consumption grows anyway. This is Jevons&#39; Paradox: energy efficiency, instead of reducing consumption, increases it, because it lowers the marginal cost of use and stimulates demand that grows faster than the efficiency gains.</p>

<blockquote><p><em>Note: The comparison is not as linear as the numbers suggest. PUE measures the internal efficiency of a datacentre, not the energy cost of the network traffic that data generates every time it leaves it – traffic that a homelab eliminates almost completely for internal services. Nor does it measure proportion: AWS is efficient at delivering services to millions of users, but that scale says nothing about the real cost of storing fifty gigabytes of personal data on a server designed for loads a thousand times greater. A HUNSN N100 in idle consumes less than 8 watts. The honest energy comparison is not homelab vs hyperscaler in the abstract – it is homelab vs proportional share of hyperscaler for your specific workload, a calculation that nobody can make with publicly available data.</em></p></blockquote>
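
<p>To make that note concrete, here is the shape of the back-of-envelope calculation, sketched in Python. Only the 8-watt idle figure and the 1.1–1.2 PUE range come from above; the utilisation numbers and the “proportional share” of a cloud server are invented placeholders, because – as the note says – nobody can fill them in with publicly available data.</p>

<pre><code># Back-of-envelope sketch: homelab vs a proportional slice of a hyperscaler.
# Only the 8 W idle figure and the 1.1-1.2 PUE range come from the text above;
# every other number is an assumption, there to show the shape of the calculation.

HOURS_PER_YEAR = 24 * 365

# Homelab side
idle_watts = 8.0        # HUNSN N100 at idle
busy_watts = 25.0       # rough guess under load
busy_fraction = 0.05    # assume the box does real work about 5% of the time

avg_watts = busy_fraction * busy_watts + (1 - busy_fraction) * idle_watts
homelab_kwh_year = avg_watts * HOURS_PER_YEAR / 1000

# "Implicit PUE": total energy drawn vs energy spent on the actual workload
useful_kwh_year = busy_fraction * busy_watts * HOURS_PER_YEAR / 1000
implicit_pue = homelab_kwh_year / useful_kwh_year

# Hyperscaler side: your proportional share, which nobody can really measure
cloud_pue = 1.15          # midpoint of the quoted 1.1-1.2 range
cloud_share_watts = 2.0   # pure guess at the slice of a heavily shared server
cloud_kwh_year = cloud_share_watts * cloud_pue * HOURS_PER_YEAR / 1000

print(f"homelab: ~{homelab_kwh_year:.0f} kWh/year, implicit PUE ~{implicit_pue:.1f}")
print(f"cloud share (guessed): ~{cloud_kwh_year:.0f} kWh/year at PUE {cloud_pue}")
</code></pre>

<p>With these invented inputs the homelab comes out several times worse per unit of useful work, and yet its absolute draw remains tiny. Change the guesses and the gap moves, but the asymmetry of the comparison – a box that is mostly idle against a fleet that is almost always busy – does not.</p>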

<p>This does not automatically mean that the cloud is the ethically correct choice – the problem does not reduce to PUE, and surveillance has costs that are not measured in kilowatts. It means that anyone with SolarPunk values who chooses the homelab must reckon with a real contradiction: the choice of sovereignty may be, watt for watt, energetically more costly than the system one wants to escape. I have no clean answer, but ignoring the question would be dishonest. Söderberg acknowledges that the FOSS movement has produced concrete and undeniable gains – they simply are not enough, on their own, to subvert the dynamics of informational capitalism.</p>

<p>In short: this is not a critique of the homelab, but it is a critique of the homelab presented as a sufficient revolutionary act.</p>

<h2 id="what-happens-at-eleven-pm-and-beyond" id="what-happens-at-eleven-pm-and-beyond">What Happens at Eleven PM – and Beyond</h2>

<p>That night, with the HUNSN 4K on the table, I pressed on. I installed Proxmox. I configured the network. I started bringing up containers one by one. And at some point – three hours had passed, I had three terminals open and was debugging nslcd to centralise LDAP authentication across all the containers – I realised something: I was doing all of this simply because I enjoyed it. Not to resist something. Not to advance an ideological agenda. Because there was a problem to solve and solving it gave me satisfaction. Mihaly Csikszentmihalyi describes this state in <em>Flow</em> as total absorption in a task calibrated to one&#39;s own competencies: time expands, attention narrows, awareness of context vanishes. It is not motivation – it is something more immediate. Debugging an authentication problem at eleven at night on a system I could have chosen not to build is, neuropsychologically, indistinguishable from pleasure. Not the satisfaction of having finished: the process itself. Moreover, for an AuDHD person like me, going into hyperfocus allows me to lose my sense of time entirely, and to literally escape from a world I viscerally loathe.</p>

<p>Ah – you had not figured that out yet?</p>

<p>When I had finished and closed everything, the satisfaction was still there. Along with a mildly uncomfortable awareness: I could probably have used a hosted service, lived just as well, and not lost three hours of a weeknight. But in the meantime I had understood how PAM worked, I had read documentation I had never opened before, I had implemented it on my homelab, I had learned something I hadn&#39;t known I wanted to know.</p>

<p>And here the circle closes in a somewhat unsettling way. Söderberg speaks of voluntary open source work as the production of pure use value – the intrinsic pleasure of doing, understanding, building something that works. But it is exactly this use value that capital then extracts as exchange value: the competence I accumulate debugging LDAP at eleven at night is the same competence I bring to work the next day, that I put into articles like this one, that I share in communities where others use it to build their own homelabs. Technical pleasure is not neutral. It has a production chain. Not always visible, but real.</p>

<p>This is what the homelab is, at least for me: a way of learning that produces, as a side effect, an infrastructure I control. The ideology is there, but it comes second. First comes the pleasure of understanding how something works. Or rather: ideology and pleasure are interchangeable, and often run in parallel – but this does not resolve any of the contradictions I described above. It leaves them all standing, in fact makes them stranger. Am I resisting capitalism, or am I just cultivating an expensive hobby with a political aesthetic?</p>

<h2 id="the-hacker-ethic" id="the-hacker-ethic">The Hacker Ethic</h2>

<p>The word “hacker” has had bad press for decades. In 1990s news bulletins it was a synonym for a hooded cybercriminal; in the jargon of security companies it became a marketing term to prepend to anything. Neither has much to do with what the word historically means. Steven Levy, in <em>Hackers: Heroes of the Computer Revolution</em>, reconstructs the culture that formed around the MIT and Stanford labs in the 1960s: a community of programmers for whom code was an aesthetic object, access to information a moral principle, and technical competence the only legitimate hierarchy. The principles Levy identifies as the “hacker ethic” are precise: access to computers – and to anything that can teach you how the world works – should be unlimited and total. All information should be free. Decentralised systems are preferable to centralised ones. Hackers should be judged by what they produce, not by titles, age, race, or position. You can create art and beauty with a computer.</p>

<p>It is not a political manifesto in the traditional sense. It is something more visceral – a disposition toward the world, a way of standing before a system you do not yet understand: the correct response is to take it apart, understand how it works, and put it back together better than before.</p>

<p>Pekka Himanen, in <em>The Hacker Ethic and the Spirit of the Information Age</em> – with a preface by Linus Torvalds and an epilogue by Manuel Castells, which already says something about the project&#39;s ambition – performs a more explicit theoretical operation. He builds the hacker ethic in direct opposition to the Protestant work ethic described by Max Weber: where Weber saw work as duty, discipline as virtue, and leisure as absence of production, Himanen identifies in the hacker a figure who works out of passion, considers play an integral part of work, and rejects the sharp separation between productive time and free time. The hacker does not work for money – money is a side effect, when it comes. They work because the problem is interesting. Because the elegant solution has value in itself. Because understanding how something works is, in and of itself, sufficient.</p>

<blockquote><p>Hacker activity is also joyful. It often has its roots in playful explorations. Torvalds has described, in messages on the Net, how Linux began to expand from small experiments with the computer he had just acquired. In the same messages, he has explained his motivation for developing Linux by simply stating that “it was/is fun working on it.” Tim Berners-Lee, the man behind the Web, also describes how this creation began with experiments in linking what he called “play programs.” Wozniak relates how many characteristics of the Apple computer “came from a game, and the fun features that were built in were only to do one pet project, which was to program … [a game called] Breakout and show it off at the club.”</p></blockquote>

<p>Recognise something? I do. Those three hours debugging nslcd at eleven at night were not work in the Weberian sense – nobody was paying me, nobody had asked me to do it, there was no corporate objective to reach. They were hacking in the precise sense that Levy and Himanen describe: exploration motivated by curiosity, with the infrastructure as an object of study as much as of utility. The homelab is, culturally, a direct expression of the hacker ethic. It is no coincidence that homelab communities and open source communities overlap almost perfectly, that they use the same language, the same platforms, the same values. But here, as elsewhere in this article, the story gets complicated.</p>

<p>The hacker ethic promises a pure meritocracy: you are judged by what you can do, not by who you are. It is an attractive idea. It is also, in practice, a partial fiction. Technical meritocracy presupposes that everyone starts from the same point – that skills are accessible to anyone who really wants to acquire them, that the time to acquire them is distributed equally, that mentorship networks and learning resources are available regardless of context. None of these premises holds. The homelab as hacker practice inherits both things: the genuine nature of curiosity as a driver, and structural exclusivity as an undeclared side effect. The pleasure of taking a system apart to understand how it works is real and should not be devalued. But that pleasure is available, in practice, to those who already have the ticket.</p>

<h2 id="conclusions" id="conclusions">Conclusions</h2>

<p>The HUNSN 4K runs, alongside the other “little electronic contraptions,” on a rack next to my armchair – the one where, at the end of the day, I indulge my guilty pleasure of reading a book in the company of my cats. Proxmox, the Nextcloud server, the ZFS NAS, a small MINISFORUM box running Ollama with some local open-weight LLM models, a Raspberry Pi 5 running the Tor Relay, and a HUNSN RJ15 with pfSense controlling incoming and outgoing traffic. An infrastructure, in short, that allows me to have something resembling digital sovereignty within the limits of the possible. The contradictions I have described do not resolve. They are held together, with effort, as any intellectually complex position on a complex system must be held together.</p>

<p>The first: the market that made the accessible homelab possible is the same market the homelab is supposed to emancipate us from. If this explosion of affordable, efficient mini PCs had not happened – if capitalism had not decided to build exactly what we wanted – how many of us would have taken the same path? How much of our “ethical choice” depends on the existence of products designed and sold precisely for us?</p>

<p>The second: does incorporated resistance truly lose its force, or does it remain resistance even when someone profits from it? Boltanski and Chiapello describe the incorporation mechanism, but do not argue that critique loses all effectiveness in the process. Perhaps the homelab is simultaneously a product of the system and a real, if partial, form of withdrawal from it. The two things are not mutually exclusive.</p>

<p>The third: if digital autonomy requires decades of accumulated skills, enough free time to use them, and enough money to buy the hardware, are we building a democratic alternative? Or are we building an exclusive club with a rebel aesthetic, reproducing the same hierarchies of privilege it claims to want to fight?</p>

<p>The fourth: the energy question has no clean answer, and Jevons&#39; Paradox makes it even more uncomfortable – because it works in both directions. The cloud improves efficiency and increases total consumption. A homelab consumes proportionally more, but does not fuel the demand that drives that total consumption upwards. Are we building digital sovereignty, or are we simply choosing where to position ourselves within a contradiction that cannot be resolved at the individual level?</p>

<p>I don&#39;t know. But at least I know where my data is.</p>

<h2 id="fun-fact" id="fun-fact">Fun Fact</h2>

<p>This article was written in Markdown using a Flatnotes instance running as a CT container on Proxmox, while listening to a symphonic metal playlist served by Navidrome – another CT container – pulling OGG files from a ZFS NAS over an NFS share. The cited books were in EPUB format on Calibre Web. In the background, Nextcloud on a Raspberry Pi 4 was syncing and backing up everything. Spelling mistakes were corrected by Qwen2.5, an LLM model served by Ollama on the MINISFORUM box, accessible locally via oterm and Open WebUI. And all of this, controlled from a laptop running Linux.</p>
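
<p>For the curious, the same proof-reading pass can also be scripted against the local Ollama API instead of going through a chat interface. A minimal sketch, assuming the default port and a model pulled as qwen2.5 – an illustration, not a description of my actual workflow, which went through oterm and Open WebUI.</p>

<pre><code># Minimal sketch: ask a local Ollama instance to fix typos in a paragraph.
# Assumes Ollama is listening on its default port (11434) and that the model
# has been pulled beforehand, e.g. with: ollama pull qwen2.5
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def proofread(text, model="qwen2.5"):
    payload = {
        "model": model,
        "prompt": "Fix spelling and typos only, keep the wording intact:\n\n" + text,
        "stream": False,  # a single JSON reply instead of a token stream
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

print(proofread("Teh homelab runs queitly on a rack next to my armchair."))
</code></pre>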

<p>Coincidences? I don&#39;t think so.</p>

<p><a href="https://remark.as/p/jolek78/reflections-on-an-impossible-escape-from-capitalism">Discuss...</a></p>

<p><a href="https://jolek78.writeas.com/tag:Homelab" class="hashtag"><span>#</span><span class="p-category">Homelab</span></a> <a href="https://jolek78.writeas.com/tag:SelfHosted" class="hashtag"><span>#</span><span class="p-category">SelfHosted</span></a> <a href="https://jolek78.writeas.com/tag:SurveillanceCapitalism" class="hashtag"><span>#</span><span class="p-category">SurveillanceCapitalism</span></a> <a href="https://jolek78.writeas.com/tag:Privacy" class="hashtag"><span>#</span><span class="p-category">Privacy</span></a> <a href="https://jolek78.writeas.com/tag:OpenSource" class="hashtag"><span>#</span><span class="p-category">OpenSource</span></a> <a href="https://jolek78.writeas.com/tag:HackerEthic" class="hashtag"><span>#</span><span class="p-category">HackerEthic</span></a> <a href="https://jolek78.writeas.com/tag:SolarPunk" class="hashtag"><span>#</span><span class="p-category">SolarPunk</span></a> <a href="https://jolek78.writeas.com/tag:DigitalSovereignty" class="hashtag"><span>#</span><span class="p-category">DigitalSovereignty</span></a> <a href="https://jolek78.writeas.com/tag:FOSS" class="hashtag"><span>#</span><span class="p-category">FOSS</span></a> <a href="https://jolek78.writeas.com/tag:Linux" class="hashtag"><span>#</span><span class="p-category">Linux</span></a> <a href="https://jolek78.writeas.com/tag:Writing" class="hashtag"><span>#</span><span class="p-category">Writing</span></a></p>

<div class="center">
· 🦣 <a href="https://fosstodon.org/@jolek78">Mastodon</a> · 📸 <a href="https://pixelfed.social/jolek78">Pixelfed</a> ·  📬 <a href="mailto:jolek78@jolek78.dev">Email</a> ·
· ☕ <a href="https://liberapay.com/jolek78">Support this work on Liberapay</a>
</div>
]]></content:encoded>
      <guid>https://jolek78.writeas.com/reflections-on-an-impossible-escape-from-capitalism</guid>
      <pubDate>Sun, 05 Apr 2026 15:46:47 +0000</pubDate>
    </item>
  </channel>
</rss>