Govern, Licence, Charge: The Machine Traffic Playbook | MonetizationOS

Guides

May 21, 2026

6 min read

Govern, Licence, Charge: The Machine Traffic Playbook | MonetizationOS

Adam Townsend

Head of Growth

In this article

1
Introduction

TL;DR

Half of all traffic to publisher sites is now machine, and almost none of it pays. Robots.txt is voluntary and routinely ignored, terms of service are rarely enforced, and most publishers cannot even distinguish a search indexer from an AI training crawler at the point of access.
There are three commercial moves available: govern, licence, charge. Govern who gets through and on what terms, licence the access you grant rather than letting it happen unilaterally, and charge in a way that reflects what your content is worth to the systems consuming it.
Governance is impossible without instrumentation, and instrumentation has to happen at the edge. A WAF makes binary decisions, but a real entitlements layer makes a different commercial decision for every visitor category, enforced in milliseconds before content is served, which is the only point in the request lifecycle where machine traffic can actually be reached.
The licensing market is already forming. AP, Axel Springer, FT, Washington Post, Condé Nast, Schibsted and News Corp have signed deals with AI companies, but those first-generation arrangements were coarse because publisher infrastructure could not express finer terms, and the publishers who can express them next will define what the market becomes.
The pricing signal publishers haven't learned to read is scarcity. As AI-generated content floods the web, original primary-source reporting becomes the input AI systems literally cannot reproduce, which means machine access pricing will move upward over the next decade, not downward, and the publishers with infrastructure to participate will be the ones shaping that curve.

‍

Half the traffic hitting publisher infrastructure today is not human, and almost none of it is paying. That figure is not new and not contested, but what is starting to change is that the infrastructure to do something about it now exists, and the publishers using it are shaping what machine access pricing looks like for the rest of the industry.

There are three moves available: govern what gets through and on what terms, licence access to content that AI companies have been taking for free, and charge for that access in a way that reflects what the content is actually worth to the systems consuming it. The majority of digital publishers have not started on any of the three, and the reason has very little to do with commercial intent. They know machine traffic is growing and they understand their content has value, but the barrier is that their existing stacks were never built to express the decisions a real commercial strategy would require.

The machine is already reading your content, and the only question is who gets to set the terms.

Govern first

Governance means having a deliberate, enforceable policy about who gets access to your content, under what conditions, and through what mechanism, and right now most publishers have none. They have robots.txt files, which are voluntary and widely ignored by commercial crawlers, as Forbes and several other publishers documented in mid-2024 when they showed that Perplexity was bypassing their robots.txt directives entirely and reprinting article content verbatim in its responses. The file had told Perplexity to stay out but Perplexity ignored it, because the file had no enforcement mechanism, and neither did the publishers who relied on it. They have terms of service that prohibit scraping, expensive to enforce and rarely tested in court, and they have no systematic way to distinguish between a Googlebot crawl that supports their search visibility and an AI training crawler that depletes their content asset without compensation.

The starting point is instrumentation, because you cannot govern what you cannot see, and most publishers cannot see their machine traffic in any useful sense. They know requests are arriving, but they do not know which are search indexers, which are training crawlers, which are aggregators pulling content for redistribution, and which represent commercial relationships worth developing. Publishers who have done the analysis tend to find non-human requests accounting for 30 to 50% of total server load, while the ones who haven't are running their business on incomplete information. When the New York Times filed suit against OpenAI in December 2023, part of its complaint was that OpenAI had ingested millions of Times articles without authorisation or compensation, but the Times had no systematic visibility into what had been taken, in what volumes, or for which purposes, which is an instrumentation failure before it is a legal one.

Once you can see the traffic, you can make decisions, and those decisions are best expressed as entitlement rules configured by the business rather than hardcoded by engineering or 3rd party software. A search engine crawler gets free access to indexable content because its downstream discovery value justifies it, a known AI training pipeline gets access to a defined content tier at a licensed rate, an anonymous human visitor gets a metered number of articles, a paying subscriber gets full access, and an AI agent acting on behalf of a subscriber inherits that subscriber's entitlements. Each is a distinct commercial relationship, configured in advance and enforced automatically on every request, and this granularity is what separates a governance system from a firewall, because a firewall makes a binary decision, while an entitlement system makes a commercial one, and it makes a different commercial decision for every visitor category the business has defined.

Architecture is what makes that real. A governance layer operating at the CDN edge, in milliseconds, before a request reaches the origin server, intercepts machine traffic at the only point in the request lifecycle where it can actually be reached, and the decision happens before content is transmitted, which means content that is withheld has genuinely been withheld, not obscured behind a JavaScript overlay that machine visitors never render and not blocked after the origin has already done the compute work of serving the response. For machine traffic operating at API scale, client-side governance does not exist, and the edge is not a technical preference but the precondition for governance being real rather than theatrical, and for the access decisions a publisher makes to carry actual commercial weight.

Licence what you grant

Licensing becomes possible only once governance infrastructure exists, because a licensing framework formalises a relationship that currently happens informally and unilaterally. A training crawler takes your content because it can, whereas a licensing framework means it can only take your content because you have agreed to terms: a defined scope, a defined use, a defined period, and defined compensation.

The market is already there. The Associated Press signed with OpenAI in 2023, Axel Springer signed the same year, the Financial Times signed in May 2024, and the Washington Post, Condé Nast, Schibsted, News Corp and dozens of others have followed. These are early transactions in what will become a standard commercial category, and the publishers who moved first set the terms, while the publishers who wait negotiate from a progressively weaker position, or discover they have already been scraped and the moment for negotiation has passed.

Those first-generation deals gave the industry a proof of concept, but they were necessarily coarse, because a publisher without entitlement infrastructure granular enough to express conditional terms has little choice but to negotiate in large blocks. The AP's deal with OpenAI covered broad access to the AP's archive and real-time newswire on multi-year terms, and the AP could not meter individual article usage, separate training rights from inference rights, apply geographic restrictions, or know precisely what had been consumed and for which purposes. Those terms were not chosen freely but defined by the limit of what the publisher's infrastructure could actually express and enforce, and the pattern repeats across most of the early deals: significant value transferred, limited control retained.

A proper entitlements layer changes what is negotiable, opening up metered access to specific content tiers rather than the full archive, geographic restrictions that preserve exclusive arrangements in premium markets, time-limited licences that expire rather than becoming perpetual rights, use-case separation that prices read access, synthesis rights and training rights differently, and terms that scale with actual usage rather than a flat fee for unlimited consumption. This is the granular, intelligent control that protects a publisher's most valuable assets while still participating in the market, and none of it is expressible without infrastructure that has been built to enforce it.

A viable licensing programme begins with infrastructure capable of expressing and enforcing terms, not with a deal already in hand. The Really Simple Licensing standard, with its first stable specification published in late 2025 and edge-level enforcement through Cloudflare and Akamai, gives publishers a machine-readable way to declare their access terms to any crawler capable of reading them, while an entitlements system handles grants and revocations, tracks consumption against agreed terms, and connects to billing. The infrastructure exists, but most publishers have simply not assembled it.

The strongest counterargument is that individual publishers lack the leverage to negotiate with large AI companies, and that without collective action, licensing efforts will be circumvented, which is precisely why initiatives like SPUR exist, backed by the BBC, the Financial Times, the Guardian, Sky News and Telegraph Media Group, to build the shared technical standards that give publishers collective strength. But waiting for collective action before building individual infrastructure is the wrong sequence, because a publisher that has instrumented its traffic, classified its crawlers and built an entitlements layer can join a collective framework immediately when the moment arrives, while a publisher that has done none of this cannot move at all, regardless of how strong the collective position becomes.

Charge for what it's worth

When pricing machine access, the logic is different from pricing for human readers, because a human subscription optimises for engagement, habit formation and long-term retention, whereas a machine access fee optimises for data quality, exclusivity, update frequency and use case. A training crawler values depth of archive and breadth of coverage, a real-time inference system values recency and reliability, and an agent accessing content on behalf of a human subscriber creates a different commercial relationship again, all of which are different products at different price points that a single subscription tier cannot express.

There is also a structural reason why original journalism commands particular value as AI training data, and it strengthens over time rather than weakening. A 2024 paper in Nature by Shumailov and colleagues found that AI systems trained predominantly on AI-generated content begin to degrade, because the tails of the original data distribution disappear first, and then diversity erodes recursively as each model generation inherits the distortions of its predecessors. Original reporting, fact-checked and drawn from primary sources, is not reproducible by the systems that consume it, so the more AI-generated content floods the web, the scarcer authentic primary-source content becomes, and the more valuable the publishers who produce it become to the systems that need it. That scarcity is the pricing signal publishers have not yet learned to read, and it is the foundation on which the machine access market will price upward, not downward, over the next decade.

There is no settled market rate, and we will not pretend otherwise, but what is visible is a market in formation, with the bilateral deals multiplying, the standards creating enforcement mechanisms, and publishers with the infrastructure to participate shaping what the pricing norms become. The music industry spent decades building ASCAP and BMI precisely because individual rights holders could not negotiate with broadcast platforms at scale, and publishing is building equivalent infrastructure now, so the publishers who have functioning access and entitlements systems will be part of defining what fair compensation looks like, while the ones who have not will accept whatever terms emerge.

The infrastructure is the precondition

The honest answer to why most publishers have not started on any of this is not that they lack commercial ambition, but that the barrier has been infrastructure, and specifically the gap between what most publisher stacks can do and what governing, licensing and charging for machine access actually requires. Blocking and ignoring have both persisted not because they are good strategies but because neither requires anything of your existing architecture, since you can add an IP block to a WAF without touching your monetisation platform, and you can ignore the problem entirely without touching anything at all. The moment you want to do something different, make a real-time access decision for a specific crawler type, offer a licensed endpoint with usage tracking, or bill for machine access at a different rate from your human subscriptions, you discover that your current stack was never designed for any of it.

What the infrastructure needs to do is straightforward to describe: classify every visitor at the point of access, before content is served, in milliseconds; apply commercial rules to machine visitors with the same granularity it applies to human subscribers; deliver licensed access through authenticated endpoints; and connect usage to billing. None of this requires rebuilding editorial infrastructure or replacing existing monetisation platforms, but it does require an access and entitlements layer that sits between content and every incoming request, operates at the CDN edge, and can express the commercial decisions the business needs to make.

MonetizationOS is the only infrastructure layer that governs, licences and charges for both human and machine traffic through a single system, so a paying subscriber's entitlements and an AI training crawler's licensing terms are enforced by the same edge layer, visible in the same reporting, and managed through the same commercial interface. Legacy paywall platforms were built for human subscribers and have no native architecture for machine visitors, security and WAF tools were built to block traffic and have no commercial logic, and bot-specific point solutions address a narrow problem but cannot extend to the full access and entitlement management the moment requires. We built MonetizationOS as composable, extensible infrastructure precisely because no one knows which commercial models for machine access will dominate or what combinations will become standard, so whatever licensing framework a publisher negotiates, whatever standards emerge, and whatever the future of access looks like, the infrastructure to enforce those terms at the edge is already in place and does not require re-architecture to accommodate new models. Build once, adapt without rebuilding, and that is what strategic sovereignty over your content actually means in practice.

The bilateral deals are proof the market is there, and the standards are proof the commercial framework is forming. The machine is reading your content today, and whether any of it generates revenue depends on whether your stack can act on the decision to govern, licence and charge, which for most publishers, right now, it cannot, and that is the only thing in the way.

-------

MonetizationOS is edge-native infrastructure that governs and monetises every access request in real time, from human audiences to AI agents. One million free operations per month, no setup fees, deploys in hours. Get started at monetizationos.com.

‍