Cloudflare, one of the world’s largest internet infrastructure providers, has begun blocking AI web crawlers by default unless they receive direct permission from site owners.
The new policy reverses the longstanding practice under which AI developers could freely scrape the web to train large language models (LLMs).
A Default Block on AI Crawling
Previously, Cloudflare allowed website owners to opt out of AI crawling. Now, blocking is automatic. This reversal comes after more than 1 million customers chose to restrict AI bots under the former optional system.
AI vendors must now explicitly request permission to access content and state whether their purpose is training, inference or search.
“This long-awaited feature by Cloudflare is a true disaster for many GenAI vendors, which may be fatal to the current business models of GenAI,” said Dr Kolochenko, CEO at ImmuniWeb and a Fellow at the British Computer Society (BCS).
“This security feature will elegantly prevent data-greedy bots from unwarrantedly scraping human-created content without permission and without paying for it.”
A New Economic Model for Web Content
The updated policy introduces a “Pay Per Crawl” program, which lets a select group of publishers set pricing terms for AI crawlers. AI companies can then either pay for access to the content or be denied entry. This permission-based approach contrasts with the previous model, in which web scraping was governed by loosely enforced conventions such as robots.txt.
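To illustrate why robots.txt counts as loosely enforced, the minimal sketch below uses Python's standard urllib.robotparser module to evaluate a made-up robots.txt file. The user agent "ExampleAIBot", the example.com URL and the is_allowed helper are hypothetical names chosen for illustration; none of this represents Cloudflare's implementation. The point is that the check runs on the crawler's side, so a bot that never performs it is not stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is a voluntary, client-side check: nothing on the server
    enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print("ExampleAIBot allowed:", is_allowed("ExampleAIBot", url))
    print("SomeBrowser allowed:", is_allowed("SomeBrowser", url))
```

A well-behaved crawler would call a check like is_allowed before each request and skip disallowed pages. A network-level block of the kind the article describes does not depend on that cooperation.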
At the Axios Live event last week, Cloudflare CEO Matthew Prince emphasized the broader implications.
“If the internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone,” Prince explained.
“In sum, most GenAI vendors will soon face a tough reality: paying a fair price for high-quality training data while staying profitable. In view of the formidable competition emanating from China, many Western GenAI companies may simply quit the business as economically unviable,” Kolochenko added.
Legal Gray Areas and Social Media Exemptions
The legality of scraping remains murky. In May 2025, Irish and German regulators declined to block Meta from using Facebook and Instagram data to train its Llama model, despite opposition from privacy and consumer groups. The decisions highlight the gap between fast-moving technologies and slower regulatory systems.
“In some jurisdictions, a deliberate bypass of anti-bot protection and massive data scraping may constitute a criminal offense,” Kolochenko said, adding that breach of contract claims, not copyright, could pose the most serious legal threat to GenAI companies.