Web 4.0: The Pragmatic Internet

The web has evolved in waves, each building on the successes and failures of what came before. But we’re at a crossroads where incremental fixes can no longer address fundamental architectural problems. It’s time for a pragmatic reimagining of how the internet works.

Web n+1.0 #

Web 1.0 was static pages. Simple, clunky, but revolutionary: you could look things up and read them anywhere in the world. The read-only web served its purpose from 1989 to 2004, delivering information through basic HTML and file systems.

Web 2.0 gave us social interaction, platforms, and collaboration. It centralised everything, but it worked: the web became usable and participatory. From 2004 to 2010, user-generated content and social networking transformed how we use the internet.¹ The key was backward compatibility and clear value propositions that drove rapid adoption.

Web 3.0 promised decentralisation, semantic web standards, and blockchain-driven ownership. It failed catastrophically, but not for a single reason. Two competing visions emerged: Tim Berners-Lee’s semantic web using RDF and OWL, and the blockchain-based decentralised web. The semantic web achieved only 4 million domain adoptions by 2013 despite 20+ years of development,² failing due to what Aaron Swartz called the “formalising mindset of mathematics” creating standards “so abstract that few ever saw widespread adoption.”³ Meanwhile, blockchain projects show 90% failure rates⁴ with fundamental scalability limitations and user experience problems.

Our data is still not organised #

Here’s the reality check that current “Web 4.0” discussions miss entirely:

HTML is still the core document format, and it’s terrible at structured meaning
Websites vary wildly, chasing design fads and corporate identity that often give false signals about quality and trust
Some services talk to each other — maps, reviews, payments — but mostly it’s patchy and fragmented
Users are fatigued by inconsistent interfaces, and AI agents can’t work properly because every service exposes its data differently

The evidence is overwhelming. Over 80% of AI projects fail, twice the rate of non-AI IT projects,⁵ and for agents that depend on the web a large part of the problem is that LLMs cannot reliably interpret raw HTML: they are trained for linear text processing, while HTML follows a tree structure. Akamai reports that AI scraping now accounts for 600+ million requests daily,⁶ with low success rates due to data quality issues.

We were close once. Meta tags, Schema.org, and OpenGraph showed promise. Today, over 45 million web domains use Schema.org structured data,⁷ and the success stories are impressive: Rakuten saw a 270% traffic increase after implementing recipe schema,⁸ and National retail stores achieved 2% better click-through rates with product rich results.⁸ But adoption only happened when SEO or marketing had clear incentives. It never reshaped the fundamental web architecture.
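
For readers who have not seen it, Schema.org data is typically embedded in a page as a JSON-LD script block. The following is a minimal sketch, using only Python's standard library, of how a machine can read that block and ignore the surrounding markup. The sample page and its recipe values are invented for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Invented sample page: the visual markup is irrelevant to a machine reader.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Lemon Tart", "cookTime": "PT35M", "recipeYield": "8 servings"}
</script>
</head><body><div class="hero"><h1>Our famous tart!</h1></div></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
recipe = extractor.items[0]
print(recipe["@type"], recipe["name"], recipe["cookTime"])
```

The structured block gives the name and cooking time directly, with no guessing from headings or CSS classes.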

Yes, XKCD’s comic about “yet another standard” is still funny. But without a lighter, structured, intent-driven web, AI cannot be the interface layer we want it to be.

Web 4.0: Less about design, more about organisation #

Web 4.0 isn’t about more meaningless jargon or flashy features. It’s about finally organising the web’s data into consistent, machine-readable structures that work for humans, AI agents, and future interfaces we haven’t imagined yet.

This doesn’t mean saying goodbye to design and competition. Businesses still express their brand through logos, colours, tone, and imagery. But the heavy, clunky UX differentiation that makes users relearn every service disappears.

Instead, the web is built on seven core primitives:

1. Intents #

What a service actually does: book taxi, order food, read article, compare mortgages. Every service declares its capabilities in machine-readable format.
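
As a rough illustration of what a machine-readable declaration could look like, here is a minimal sketch. The `Intent` and `ServiceManifest` types, the intent names, and the example services are all hypothetical; nothing here is an existing standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    """One capability a service declares, e.g. 'book_taxi'."""
    name: str
    description: str
    required_inputs: tuple = ()


@dataclass
class ServiceManifest:
    """Hypothetical declaration a service would publish about itself."""
    service: str
    intents: list = field(default_factory=list)

    def supports(self, intent_name):
        return any(i.name == intent_name for i in self.intents)


def find_services(manifests, intent_name):
    """Return the names of all services that declare the given intent."""
    return [m.service for m in manifests if m.supports(intent_name)]


# Invented example services.
manifests = [
    ServiceManifest("CityCabs", [Intent("book_taxi", "Book a taxi", ("pickup", "dropoff"))]),
    ServiceManifest("NoodleBar", [Intent("order_food", "Order a meal", ("items",))]),
    ServiceManifest("RideNow", [Intent("book_taxi", "Book a taxi", ("pickup", "dropoff"))]),
]

print(find_services(manifests, "book_taxi"))  # ['CityCabs', 'RideNow']
```

An agent asked to "get me a taxi" could then discover candidate services by intent rather than by scraping each site.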

2. Contexts #

The structured information needed for decisions: menus with allergen data, stock levels, real-time pricing, product specifications, availability windows.
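
To make this concrete, here is a small sketch of a menu exposed as structured context, and the kind of decision it enables. The `MenuItem` fields and the sample menu are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuItem:
    name: str
    price_pence: int
    allergens: frozenset
    in_stock: bool


def safe_options(menu, avoid):
    """Names of in-stock items containing none of the allergens to avoid."""
    avoid = frozenset(avoid)
    return [
        item.name
        for item in menu
        if item.in_stock and not (item.allergens & avoid)
    ]


# Invented sample data.
menu = [
    MenuItem("Pad Thai", 1150, frozenset({"peanuts", "egg"}), True),
    MenuItem("Green Curry", 1250, frozenset({"fish"}), True),
    MenuItem("Tofu Stir Fry", 995, frozenset({"soy"}), False),
    MenuItem("Mango Sticky Rice", 650, frozenset(), True),
]

print(safe_options(menu, {"peanuts"}))  # ['Green Curry', 'Mango Sticky Rice']
```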

3. Actions #

How you actually engage: order, reserve, cancel, subscribe, compare. Each action has predictable inputs and outputs.
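
One way to read "predictable inputs and outputs" is that every action carries a spec that callers and services can both check. A minimal sketch under that assumption follows; `ActionSpec`, `invoke`, and the `reserve` action are hypothetical names, and `reserve_table` stands in for a real booking backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSpec:
    name: str
    inputs: dict    # field name -> expected Python type
    outputs: tuple  # field names every result must contain


@dataclass(frozen=True)
class ActionResult:
    ok: bool
    data: dict


def invoke(spec, handler, **kwargs):
    """Validate inputs against the spec, run the handler, check its outputs."""
    missing = [k for k in spec.inputs if k not in kwargs]
    if missing:
        return ActionResult(False, {"error": f"missing inputs: {missing}"})
    wrong = [k for k, t in spec.inputs.items() if not isinstance(kwargs[k], t)]
    if wrong:
        return ActionResult(False, {"error": f"wrong type for: {wrong}"})
    # Pass only the declared inputs through to the handler.
    data = handler(**{k: kwargs[k] for k in spec.inputs})
    absent = [k for k in spec.outputs if k not in data]
    if absent:
        return ActionResult(False, {"error": f"handler omitted outputs: {absent}"})
    return ActionResult(True, data)


RESERVE = ActionSpec(
    "reserve",
    inputs={"party_size": int, "time": str},
    outputs=("reservation_id", "status"),
)


def reserve_table(party_size, time):
    # Stand-in for a real booking backend.
    return {"reservation_id": f"R-{party_size}-{time}", "status": "confirmed"}


print(invoke(RESERVE, reserve_table, party_size=4, time="19:30"))
print(invoke(RESERVE, reserve_table, party_size=4))
```

The second call fails cleanly with a structured error instead of a service-specific failure page.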

4. Branding #

Controlled design hooks for identity and expression, without destroying consistency. Your logo, colours, and voice remain, but within a coherent framework.

5. Identity #

Federated identity signals that work everywhere. Use Apple ID, Google, Microsoft, or bring your own decentralised identity. Business verification works the same way. The global Identity and Access Management market projects growth from $17.80 billion (2023) to $61.74 billion (2032),⁹ indicating massive investment in solving these fragmented identity challenges.

6. Financial #

Abstracted payment processing. “Pay £20” works regardless of the underlying rails — card, bank transfer, digital wallet, or crypto. The online payment API market is growing from $200 million to $306.5 million by 2032¹⁰ as organisations seek unified payment solutions.
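
A sketch of what that abstraction might look like in code: the caller states an amount, and the choice of rail is negotiated underneath. The `Rail` protocol, the two example rails, and the `pay` function are hypothetical, and the `charge` methods only return a string where a real rail would move money.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str


class Rail(Protocol):
    name: str

    def charge(self, money: Money) -> str: ...


class CardRail:
    name = "card"

    def charge(self, money):
        return f"card:{money.amount} {money.currency}"


class BankTransferRail:
    name = "bank_transfer"

    def charge(self, money):
        return f"bank_transfer:{money.amount} {money.currency}"


def pay(money, merchant_rails, user_preference):
    """Charge via the user's most preferred rail that the merchant supports."""
    available = {rail.name: rail for rail in merchant_rails}
    for name in user_preference:
        if name in available:
            return name, available[name].charge(money)
    raise ValueError("no payment rail in common")


twenty_pounds = Money(Decimal("20.00"), "GBP")
rail, receipt = pay(
    twenty_pounds,
    merchant_rails=[CardRail(), BankTransferRail()],
    user_preference=["wallet", "bank_transfer", "card"],
)
print(rail, receipt)  # bank_transfer bank_transfer:20.00 GBP
```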

7. Contracts #

Every exchange leaves a structured audit trail: receipts, invoices, terms, subscriptions, warranties. Digital contracts with programmatic enforcement.
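
One possible shape for such an audit trail is a hash chain, where each record commits to the one before it so that later edits are detectable. This is an illustrative assumption rather than part of the proposal itself; the record fields are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash, record):
    body = {"prev": prev_hash, "record": record}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(trail, record):
    """Append a record that commits to the hash of the previous entry."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    trail.append({"prev": prev_hash, "record": record,
                  "hash": _digest(prev_hash, record)})
    return trail


def verify(trail):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in trail:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["prev"], entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append_record(trail, {"type": "receipt", "amount": "20.00", "currency": "GBP"})
append_record(trail, {"type": "warranty", "months": 24})
print(verify(trail))  # True
```

Changing the amount on the first receipt after the fact would make `verify` return `False`.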

A new trust model #

Right now, age verification is one of the biggest UX nightmares on the web. Governments demand it, websites struggle with clunky solutions, users lose privacy. 62% of European consumers abandon applications due to verification friction.¹¹

In Web 4.0, a site simply asks: “Is this user over 18?” and gets a trusted signal from your browser. That signal comes from your identity provider of choice. The site doesn’t see your date of birth, passport number, or gender. Just the minimum required: yes/no.

This privacy-preserving approach is becoming reality. Google’s 2025 open-source zero-knowledge proof solution¹² and France’s CNIL proof-of-concept¹³ demonstrate feasible implementations. Zero-knowledge proofs allow verification without revealing underlying data — a critical technology for privacy-first web architecture.

That’s the model for all sensitive data. Identity becomes a system of scoped trust exchanges, not endless form-filling. Need to verify income bracket for a mortgage quote? Your provider signals “Tier 3” without revealing your exact salary. Want age-restricted content? Get “18+” without sharing your birthday.
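
The exchange described above can be sketched as follows. The identity provider holds the raw attributes and releases only a signed yes/no or tier. Note the loud assumption: an HMAC with a shared demo key stands in for the signature here, which is not a zero-knowledge proof and would not be used in practice (a real system would rely on public-key signatures or ZK proofs). All class, function, and claim names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import date

SHARED_KEY = b"demo-only-key"  # assumption: stand-in for a real signature scheme


def _sign(claim):
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()


class IdentityProvider:
    """Holds raw attributes; only ever releases minimal, signed claims."""

    def __init__(self, date_of_birth, annual_income):
        self._dob = date_of_birth
        self._income = annual_income

    def over_age(self, years, today):
        had_birthday = (today.month, today.day) >= (self._dob.month, self._dob.day)
        age = today.year - self._dob.year - (0 if had_birthday else 1)
        claim = {"claim": f"age_over_{years}", "value": age >= years}
        return {**claim, "sig": _sign(claim)}

    def income_tier(self, thresholds):
        # Tier = number of ascending thresholds the income meets or exceeds.
        tier = sum(1 for t in thresholds if self._income >= t)
        claim = {"claim": "income_tier", "value": tier}
        return {**claim, "sig": _sign(claim)}


def site_accepts(signal):
    """The site checks the signature; it never sees the underlying data."""
    claim = {"claim": signal["claim"], "value": signal["value"]}
    return hmac.compare_digest(_sign(claim), signal["sig"])


idp = IdentityProvider(date(2000, 5, 17), annual_income=52_000)
age_signal = idp.over_age(18, today=date(2025, 1, 1))
tier_signal = idp.income_tier((15_000, 30_000, 50_000, 80_000))

print(age_signal["value"], site_accepts(age_signal))    # True True
print(tier_signal["value"], site_accepts(tier_signal))  # 3 True
```

The site receives three fields (claim name, value, signature) and nothing else: no birthday, no salary.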

Fair compensation for AI training #

Advertising doesn’t vanish in Web 4.0, but it’s reined in. Platforms, not sites, decide how ads are represented, preserving revenue streams while preventing dark patterns.

For AI training, Web 4.0 solves the looming fairness crisis. When AI systems want to use your content, they engage through structured content exchanges with fair compensation.

Cloudflare pioneered this approach in July 2025 with their “Content Independence Day” initiative,¹⁴ becoming the first infrastructure provider to block AI crawlers by default unless they have permission or pay compensation. Their “Pay Per Crawl” system¹⁵ allows publishers to set per-request pricing, with major publishers like The Atlantic and Associated Press endorsing the model. This isn’t theoretical: it’s happening now.
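
The core decision in any per-request pricing scheme is small enough to sketch. The following is not Cloudflare's API; `CrawlPolicy`, `respond`, and the crawler names are hypothetical. The only real-world element is the HTTP 402 Payment Required status code.

```python
from dataclasses import dataclass
from decimal import Decimal
from http import HTTPStatus


@dataclass(frozen=True)
class CrawlPolicy:
    """Hypothetical publisher policy for AI crawlers."""
    price_per_request: Decimal
    free_crawlers: frozenset = frozenset()


def respond(policy, crawler, max_price=None):
    """Return (HTTP status, amount charged) for one crawl request."""
    if crawler in policy.free_crawlers:
        return HTTPStatus.OK, Decimal("0")
    if max_price is None or max_price < policy.price_per_request:
        # The crawler has not agreed to pay enough: quote the price, serve nothing.
        return HTTPStatus.PAYMENT_REQUIRED, Decimal("0")
    return HTTPStatus.OK, policy.price_per_request


policy = CrawlPolicy(Decimal("0.01"), free_crawlers=frozenset({"archive-bot"}))

print(respond(policy, "archive-bot"))
print(respond(policy, "trainer-bot"))
print(respond(policy, "trainer-bot", max_price=Decimal("0.05")))
```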

Privacy as default architecture #

Most interactions don’t require full disclosure. You don’t need to reveal your gender to price a flight. You don’t need your home address to browse a restaurant menu.

In Web 4.0, your identity provider handles data exchange. Only necessary information flows through the system. Airlines get payment and passenger details only at checkout. Restaurants get delivery addresses only when you actually order.
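
A minimal sketch of stage-scoped disclosure, assuming a hypothetical table of which fields each stage of an interaction may see. The stage names, field names, and profile values are invented.

```python
# Hypothetical disclosure table: which profile fields each stage may receive.
DISCLOSURE = {
    "browse": frozenset(),
    "quote": frozenset({"travel_dates", "passenger_count"}),
    "checkout": frozenset({"travel_dates", "passenger_count",
                           "passenger_names", "payment_token"}),
}


def release(profile, stage):
    """Return only the fields the given stage is allowed to see."""
    allowed = DISCLOSURE[stage]
    return {key: value for key, value in profile.items() if key in allowed}


# Invented profile held by the user's identity provider.
profile = {
    "travel_dates": ("2025-09-01", "2025-09-08"),
    "passenger_count": 2,
    "passenger_names": ["A. Example", "B. Example"],
    "payment_token": "tok_demo",
    "home_address": "1 Example Street",
    "gender": "undisclosed",
}

print(sorted(release(profile, "browse")))    # []
print(sorted(release(profile, "quote")))     # ['passenger_count', 'travel_dates']
print(sorted(release(profile, "checkout")))
```

Fields such as `home_address` and `gender` never leave the provider because no stage in this table asks for them.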

This means less data hoarding, fewer breaches, deeper trust. The homomorphic encryption market reached $30.51 million in 2024¹⁶ with 74% of financial institutions deploying privacy-preserving technologies that enable computation without exposing underlying data.

Why does this matter? #

For users: One consistent experience, whether through a voice assistant, browser, or AR interface. No more relearning every service. Current voice assistants show the problem: Google Assistant accuracy declined from 98% (2017) to 88% (2019)¹⁷ despite technological advances, largely due to poor structured data integration.

For AI: Finally, a structured, universal way to interact with the real economy. Current approaches fail because each service requires custom scraping and interpretation.

For businesses: Easier integration, cheaper development, less duplicated effort. Organisations currently maintain dedicated teams for each platform combination among 63,000+ possible browser-platform-device variations.¹⁸

For competition: Comparison and aggregation services flourish because data is accessible and consistent. No more platform lock-in that generates billions for dominant players while stifling innovation.

For privacy: Share only what’s needed, when it’s needed, with cryptographic guarantees about data usage.

Learning from past failures #

The semantic web failed because it was led by ideology over execution. Tim Berners-Lee’s vision suffered from over-engineered standards that Aaron Swartz described as “uniformly scourges on the planet, offenses against hardworking programmers.”¹⁹ The standards were “so abstract that few ever saw widespread adoption.”²⁰

Web 4.0 learns from these failures. Instead of complex ontologies and theoretical frameworks, we propose seven concrete primitives that solve real problems. Instead of requiring wholesale replacement, existing Schema.org implementations can evolve incrementally into the Web 4.0 framework.

Current blockchain-based Web 3.0 approaches repeat the same mistakes — prioritising ideology over usability. The 90% project failure rate²¹ demonstrates what happens when technology leads rather than follows user needs.

The pragmatic path forward #

This structure doesn’t kill creativity. It forces creativity to flourish within constraints — the same way iOS design guidelines led to a generation of apps that were both functional and beautiful. Following iOS design principles leads to 30% increases in user engagement and 20% decreases in failed interactions.²²

Web 4.0 isn’t a radical reimagining that breaks everything. It’s the pragmatic evolution that the web desperately needs:

Built on proven foundations: Schema.org adoption proves structured data works when there are clear incentives
Addresses real pain points: Identity fragmentation, payment complexity, privacy violations, AI integration failures
Enables new possibilities: Voice interfaces, AI agents, augmented reality — all working with consistent, structured data
Preserves what works: Existing websites continue functioning while gaining structured capabilities

The web needs to work for the interfaces we’re building today and the ones we haven’t imagined yet. That requires moving beyond the current mess of fragmented APIs, inconsistent schemas, and privacy-hostile data collection toward a clean, organised, pragmatic foundation.

The point #

Web 3.0 failed because it was ideology over reality. Current “Web 4.0” hype is falling into the same trap, focusing on buzzwords instead of solving actual problems.

Our vision of Web 4.0 isn’t more meaningless jargon. It’s a web of finally organised data, structured into intents, contexts, actions, branding, identity, finance, and contracts. A web that works for whatever interface you prefer: voice assistants, AI agents, traditional browsers, or future interaction modes.

A web that is less heavy, more information-focused, privacy-first.

A web that actually works.

1. Chohan, Usman W. “Web 3.0: The Future Architecture of the Internet?” SSRN, 2022. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4037693
2. Berners-Lee, Tim. “Semantic Web Road map.” W3C Design Issues, 1998. https://www.w3.org/DesignIssues/Semantic.html
3. “Whatever Happened to the Semantic Web?” Two Bit History, May 27, 2018. https://twobithistory.org/2018/05/27/semantic-web.html
4. “Concept and Dimensions of Web 4.0.” ResearchGate, 2017. https://www.researchgate.net/publication/321366810_Concept_and_Dimensions_of_Web_40
5. “The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed.” RAND Corporation, 2024. https://www.rand.org/pubs/research_reports/RRA2680-1.html
6. “The Rise of the LLM AI Scrapers: What It Means for Bot Management.” Akamai, 2024. https://www.akamai.com/blog/security/rise-llm-ai-scrapers-bot-management
7. “Schema Markup Statistics 2025.” Amra & Elma, 2025. https://www.amraandelma.com/top-schema-markup-statistics-2025/
8. “Schema Success Stories: Using Structured Data to Boost Traffic.” Search Engine Journal, 2020. https://www.searchenginejournal.com/schema-success-stories-structured-data-boost-traffic/372734/
9. “DID and VC: Untangling Decentralized Identifiers and Verifiable Credentials for the Web of Trust.” ACM Digital Library, 2021. https://dl.acm.org/doi/fullHtml/10.1145/3446983.3446992
10. “Online Payment API Market to Reach $306.5 Million, Globally, by 2032 at 5.2% CAGR.” GlobeNewswire, August 5, 2024. https://www.globenewswire.com/news-release/2024/08/05/2924113/0/en/Online-Payment-API-Market-to-Reach-306-5-Million-Globally-by-2032-at-5-2-CAGR-Allied-Market-Research.html
11. “Age Verification: Methods & Compliance Explained.” Ondato Blog, 2024. https://ondato.com/blog/what-is-age-verification/
12. “Opening up ‘Zero-Knowledge Proof’ technology to promote privacy in age assurance.” Google Blog, July 2025. https://blog.google/technology/safety-security/opening-up-zero-knowledge-proof-technology-to-promote-privacy-in-age-assurance/
13. “Exploring Privacy-Preserving Age Verification: A Close Look at Zero-Knowledge Proofs.” New America, 2024. https://www.newamerica.org/oti/briefs/exploring-privacy-preserving-age-verification/
14. “Content Independence Day: no AI crawl without compensation!” Cloudflare Blog, July 1, 2025. https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/
15. “Introducing pay per crawl: enabling content owners to charge AI crawlers for access.” Cloudflare Blog, July 1, 2025. https://blog.cloudflare.com/introducing-pay-per-crawl/
16. “Homomorphic Encryption Market Size & Share Trends, 2033.” Global Growth Insights, 2024. https://www.globalgrowthinsights.com/market-reports/homomorphic-encryption-market-110929
17. “5 major flaws of Voice Assistant technology in 2022.” Fleksy Blog, 2022. https://www.fleksy.com/blog/5-major-flaws-of-voice-assistant-technology-in-2022/
18. “Understanding Browser and Device Fragmentation.” BrowserStack Blog, 2024. https://www.browserstack.com/blog/understanding-browser-os-and-device-fragmentation/
19. Cagle, Kurt. “Why the Semantic Web Has Failed.” LinkedIn, 2016. https://www.linkedin.com/pulse/why-semantic-web-has-failed-kurt-cagle
20. “Whatever Happened to the Semantic Web?” Two Bit History, May 27, 2018. https://twobithistory.org/2018/05/27/semantic-web.html
21. “Concept and Dimensions of Web 4.0.” ResearchGate, 2017. https://www.researchgate.net/publication/321366810_Concept_and_Dimensions_of_Web_40
22. “The Negative Impact of Mobile-First Web Design on Desktop.” Nielsen Norman Group, 2019. https://www.nngroup.com/articles/content-dispersion/