Your AI Stack Runs on the Commons

On Creative Commons, Humans Commons and the uncomfortable question behind every AI product.

Jun 09, 2026

For some time, I have been nurturing a rather niche dream: that we might one day have something like Creative Commons for the age of AI. I know this is not the sort of dream that makes for a good lifestyle magazine profile. Some people dream of houses by the sea, others of quitting everything and opening a vineyard in Tuscany. I apparently dream of licensing frameworks. There are worse things, I suppose, although perhaps not many that sound less glamorous at dinner.

The thought kept returning to me whenever I saw another heated discussion about AI training, copyright, open knowledge, scraping, public data, creator rights, fair use, text and data mining, and all the other terms that now regularly appear where people used to say: “I wrote this”, “I published this”, “please credit me”, or “could you ask first?”. What interested me was the cultural confusion beneath the surface. We seem to have lost the ability to describe different kinds of openness.

Creative Commons has put this very bluntly in its work on CC Signals: “AI is being built on the largest unregulated extraction of knowledge in history.” This sentence stayed with me because it names something many people have sensed, even if they did not yet have the words for it. The issue is not that knowledge circulates. Knowledge should circulate. The issue is that the scale, speed and asymmetry of this circulation have changed.

There is also a more personal reason why this sentence stayed with me. I am writing from a slightly uncomfortable place. I am a Wikipedian, a long-time admirer of Creative Commons and a believer in open knowledge. I am also an AI consultant and trainer. I spend part of my professional life helping people and organisations understand how to use AI sensibly, critically and productively.

So if I sound conflicted here, it is because I am.

I do not have the comfort of standing on one clean side of the argument. When I think as a Wikipedian, I feel protective of the fragile human systems that make knowledge available: contributors, editors, moderators, translators, community norms, citations, disputes, corrections, patient maintenance. When I think as someone working with AI, I can also see how powerful these tools can be for learning, accessibility, analysis, translation, discovery and everyday work. I do not want to defend extraction. I also do not want to pretend that AI can simply be wished away by people who care about culture and knowledge.

Perhaps this is why the ground feels as if it is shifting under both feet. The open-knowledge world was built on a certain generosity of circulation. The AI world is built on an enormous appetite for material. Somewhere between these two worlds sits a question I cannot avoid: is there still a common ground between openness and AI, or are we watching two logics become incompatible?

For many years, the open internet rested on an informal understanding. If something was available online, other people could read it, quote it, link to it, discuss it, criticise it, learn from it, sometimes translate it, sometimes remix it, sometimes build upon it, depending on the context and the licence. This understanding was never perfect, and anyone who has spent even five minutes around copyright debates knows that humans are remarkably gifted at turning “please credit the author” into a philosophical war. Still, the imagined user of openness was usually another human being.

That assumption no longer holds.

A person reading an essay is one thing. A teacher using an article in class is one thing. A volunteer translating a public-interest resource is one thing. A crawler absorbing enormous amounts of human work into a commercial machine-learning system, with no meaningful conversation about permission, attribution, compensation or future use, is something else. Scale changes the nature of the act. When use becomes extraction at industrial speed, the old language starts to feel inadequate.

This is why I was genuinely intrigued when I came across Humans Commons.

It is an attempt to create AI-era licences that distinguish between human use and non-human, automated or AI use. In its simplest form, the idea is disarmingly clear: a work can be available for humans under certain conditions, while remaining unavailable for AI training, scraping or automated processing unless explicit permission is granted. At the other end of the spectrum, a creator can say something much more permissive: humans, machines, crawlers, bots and AI systems may use this broadly, including for training and development, as long as the use is legal and does not fall under the licence's exclusions for harmful use. The interesting part is choice.

The Humans Commons spectrum runs from AI0 to AI100.

AI0 is described as human-only use, with no AI. Depending on the exact licence variant, humans may still be allowed to share, adapt or use the work under specific conditions, in a way that echoes Creative Commons-style thinking. AI systems, bots, crawlers, scrapers and other automated tools are excluded unless permission is given.

AI100, by contrast, is a broad, public-domain-style licence that permits copying, adapting, distributing, transforming and using material to train or develop AI models, including commercially. Even this permissive licence excludes harmful uses such as autonomous weapons or models designed to manipulate or deceive in ways that endanger people’s wellbeing.
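For readers who think more easily in code, the two ends of the spectrum as described above can be sketched as a small permission check. This is only an illustration of the distinction, not the Humans Commons licence text: the names `Licence`, `Request` and `is_use_allowed` are hypothetical, the intermediate levels between AI0 and AI100 are left out, and the conditions that may apply to human use under a given AI0 variant are not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Licence(Enum):
    """The two endpoints described in the text (hypothetical model)."""
    AI0 = "AI0"      # human-only end of the spectrum
    AI100 = "AI100"  # broadly permissive end of the spectrum


@dataclass(frozen=True)
class Request:
    """A simplified description of who wants to use a work, and how."""
    actor_is_human: bool               # False for bots, crawlers, scrapers, AI systems
    explicit_permission: bool = False  # the creator has granted this actor permission
    harmful_use: bool = False          # e.g. autonomous weapons, endangering manipulation


def is_use_allowed(licence: Licence, request: Request) -> bool:
    """Return True if the request fits the licence endpoint as summarised above."""
    if licence is Licence.AI0:
        # Humans may use the work; automated actors need explicit permission.
        return request.actor_is_human or request.explicit_permission
    if licence is Licence.AI100:
        # Anyone may use the work, including for training, unless the use is harmful.
        return not request.harmful_use
    raise ValueError(f"Unsupported licence: {licence}")


if __name__ == "__main__":
    crawler = Request(actor_is_human=False)
    print(is_use_allowed(Licence.AI0, crawler))    # crawler without permission
    print(is_use_allowed(Licence.AI100, crawler))  # same crawler, permissive licence
```

The point of the sketch is the asymmetry: under AI0 the default answer for an automated actor is no, and permission has to be granted; under AI100 the default answer is yes, and only harmful use is excluded.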

At first glance, this may sound like a technical licensing story. I do not think it is.

For me, Humans Commons is interesting because it points to a larger question: what kind of commons can survive AI-scale extraction?

Creative Commons’ current work on CC Signals asks a similar question from another angle. CC Signals tries to develop a framework through which creators and custodians of content can communicate expectations around AI reuse. Another sentence from that work deserves attention: “True openness means participation with fairness and respect, not unchecked extraction.” It is a useful reminder that openness itself is not the problem. Openness has supported Wikipedia, open educational resources, public-interest media, open archives, research communities and countless small acts of shared usefulness. It has also made it easier for the strongest technological actors to treat the web as an immense resource base.

This is the paradox of open. A locked-down web would be a disaster for education, research, civil society, independent writers, small organisations and anyone who cannot simply buy access to knowledge. The answer to extraction cannot be universal closure. Yet openness cannot become a trap either. It cannot mean that people and institutions who share generously become the easiest targets for automated appropriation.

This is where Open Future’s work becomes especially useful.

Open Future has been arguing that we need to think about AI and the commons together, and that public-interest AI requires more than access to whatever can be scraped. Their work on commons-based data governance suggests that datasets for AI training can be shared as a public good, provided they are governed as a commons and made resilient to extraction or capture by commercial interests. A commons is not a pile of things lying around for whoever has the largest bucket. It is a resource shaped by rules, stewardship, participation and responsibility.

The Wikimedia example makes this less abstract for me. Open Future’s summary of the Wikimedia CH roundtable described Wikipedia as reaching something like a “Peak Wikipedia” moment: still central to the knowledge ecosystem, increasingly consumed by machines, and at risk of becoming an invisible layer beneath AI-generated answers. The data points are striking: an 8% decrease in user traffic, combined with 50% growth in overall traffic attributed to bots.

AI is no longer only reading Wikipedia intensively. It is starting to replace Wikipedia as the interface through which many people encounter knowledge.

I read this with particular attention, perhaps because I know from the Wikimedia world how much human governance sits behind what many people experience as a simple answer on a screen. Wikipedia looks effortless only if one has never watched the effort. Behind an article there may be disputes, citations, reversions, talk pages, community norms, policy discussions, volunteer time, moderation, trust and stubbornness. A free encyclopedia is not produced by magic, although at times it may require similar levels of faith.

If human visits decline, bot activity rises, and AI tools increasingly mediate access to knowledge, the question is not merely whether Wikipedia has been cited or scraped. The deeper question is whether the feedback loop that keeps the knowledge commons alive is being weakened. If AI systems use the commons while reducing the visibility of the commons, then the problem becomes the sustainability of public knowledge itself.

There is another complication: openness does not affect everyone equally.

Open Future’s analysis of NOODL, an experimental tiered licence for African language datasets, makes this point clearly. Traditional open licensing often treats all users as formally equal. That sounds fair until we ask who has the capacity to benefit. A researcher working with limited resources and a multinational technology company may receive the same terms, with very different consequences. Equal access can still produce unequal outcomes.

This point feels especially relevant to AI. A public dataset is not encountered in the same way by every actor. For one community, it may be a tool for language preservation, research or local innovation. For a large company, it may become one more input into a product that returns little value to the people represented in the data. In such cases, a more equitable commons may need differentiated rules. That idea may offend those who prefer the older romance of openness, but perhaps the older romance was always too tidy.

The managerial lesson here is almost embarrassingly familiar. When the operating environment changes, old categories start producing poor decisions. In organisations, this happens whenever people keep using structures, job titles, incentives or reporting lines designed for a previous situation. In the knowledge commons, something similar is happening now. We are trying to govern AI-scale extraction with concepts built for a web where the imagined reuser was still largely human. No wonder the system feels strained.

I do not think licensing alone will solve this. It would be comforting to believe that a neat symbol on a webpage could make every crawler polite, every platform responsible and every AI company newly fascinated by moral philosophy. I am old enough, and have worked with enough organisations, to know that a written rule and an organisational behaviour are very often distant relatives who meet only at weddings. Enforcement will be difficult. Jurisdictions will differ. Bad actors will ignore licences. Good actors will still need legal departments.

Still, dismissing these experiments because they are imperfect would be a mistake.

The function of a new licence is not enforcement alone. It is also articulation. A licence gives people a way to express intent. It creates a visible signal. It teaches a distinction. It gives institutions something to adopt, discuss, improve and challenge. It may later become part of procurement, platform defaults, publishing workflows, repository settings, research governance and public policy.

Perhaps the future will not belong to one perfect licence.

It may belong instead to a messy ecosystem of licences, preference signals, technical standards, opt-outs, dataset registries, collective bargaining models, institutional policies, platform defaults, procurement requirements, public AI infrastructures and public pressure. Aesthetically, this is not pleasing. Social systems rarely are. But it may be more useful than waiting for a single elegant solution while extraction continues.

This is why I cannot write about Humans Commons, CC Signals or Open Future’s work as if I were observing them from a safe distance. I am implicated in the question. I want AI tools to become more useful, more accessible and more responsibly adopted. I also want the knowledge commons to survive the enthusiasm of those who see it mainly as input.

This is also why, when I work with clients, I try to make the commons visible rather than treating it as abstract background. At the very least, I encourage people to track and attribute sources whenever possible, to understand where their knowledge inputs come from, and to remember that Wikimedia is not a magical answer machine, but a human-made infrastructure worth supporting. This may sound modest. Perhaps it is. But small habits of attribution and care are a better starting point than pretending that shared knowledge simply appears whenever we need it.

Perhaps that is the uncomfortable common ground: AI worth building should not weaken the conditions that make shared knowledge possible. Open knowledge worth defending should evolve when the nature of reuse changes.

Both worlds will need better manners, better governance and less innocence about power.

The question is no longer whether knowledge should circulate. It must, if we care about education, culture, science, democracy and the everyday usefulness of the web. The question is whether we can design circulation in a way that does not exhaust the people, communities and institutions that make knowledge worth circulating.

Some of us still believe in open knowledge. Some of us also work with AI. Standing between these worlds is not especially comfortable, but perhaps it is a useful place from which to ask better questions.
