One of the fiercest debates in Silicon Valley right now is about who should control artificial intelligence, and who should make the rules that powerful AI systems must follow. Should AI be governed by a handful of companies? Should regulators and politicians step in and build their own guardrails? Or should AI models be made open-source and given away freely, so users and developers can choose their own rules?
An experiment by Anthropic, maker of the chatbot Claude, offers a quirky middle path: what if an AI company let a group of ordinary citizens write some rules, and trained a chatbot to follow them?
The experiment, known as “Collective Constitutional AI”, builds on Anthropic’s earlier work on Constitutional AI, a method of training large language models that relies on a written set of principles. The constitution is meant to give a chatbot clear instructions on how to handle sensitive requests, which topics are off-limits and how to act in line with human values.
If Collective Constitutional AI works, it could inspire other experiments in AI governance, and give AI companies more ideas for how to invite outsiders to take part in their rule-making processes.
Right now, the rules for powerful AI systems are set by a tiny group of industry insiders, who decide how their models should behave based on some combination of their personal ethics, commercial incentives and external pressure. There are no checks on that power.
Opening up AI governance could increase society’s comfort with these tools, and give regulators more confidence that they’re being skillfully steered. It could also prevent some of the problems of the social media boom of the 2010s, when a handful of Silicon Valley titans ended up controlling vast swathes of online speech.
Constitutional AI works by using a written set of rules to police the behaviour of an AI model. The first version of Claude’s constitution borrowed rules from authoritative documents, including the United Nations’ Universal Declaration of Human Rights and Apple’s terms of service.
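The article does not spell out the mechanics, but the core idea — checking a draft reply against each written principle and revising it on a violation — can be sketched in a few lines. Everything below is a hypothetical toy: the keyword check and the canned refusal stand in for the real system, in which the model itself critiques and rewrites its own drafts against the constitution.

```python
# Toy sketch of a constitutional critique-and-revise loop.
# The rule check and revision here are invented stand-ins; the actual
# method has the language model judge and rewrite its own output.

CONSTITUTION = [
    "The AI should not be dangerous/hateful.",
    "The AI should tell the truth.",
]

def violates(principle: str, draft: str) -> bool:
    """Toy check: flag a draft containing a banned word.
    (In the real system, the model itself judges the draft.)"""
    if "dangerous/hateful" in principle and "attack" in draft.lower():
        return True
    return False

def revise(draft: str) -> str:
    """Toy revision: replace the draft with a refusal."""
    return "I can't help with that."

def constitutional_reply(draft: str) -> str:
    """Check the draft against every principle; revise on violation."""
    for principle in CONSTITUTION:
        if violates(principle, draft):
            draft = revise(draft)
    return draft
```

A benign draft passes through untouched, while one that trips a principle is rewritten — which is, in miniature, how a written constitution “polices” a model’s behaviour.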
That approach made Claude well behaved, relative to other chatbots. But it still left Anthropic in charge of deciding which rules to adopt, a kind of power that made some inside the company uncomfortable.
“We’re trying to find a way to develop a constitution that is developed by a whole bunch of third parties...,” said Jack Clark, Anthropic’s policy chief.
Anthropic assembled a panel of roughly 1,000 American adults, gave the panelists a set of principles and asked them whether they agreed with each one.
Some of the rules they largely agreed on — such as “The AI should not be dangerous/hateful” and “The AI should tell the truth” — were similar to principles in Claude’s constitution. But others were less predictable. The panel overwhelmingly agreed with the idea, for example, that “AI should be adaptable, accessible and flexible to people with disabilities”.
Once the group had weighed in, Anthropic whittled the panel’s suggestions down to a list of 75 principles, which it called the “public constitution”. The company then trained two miniature versions of Claude — one on the existing constitution and one on the public constitution — and compared them. The public-sourced version of Claude performed roughly as well as the standard version on a few benchmark tests given to AI models, and was slightly less biased than the original.
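One way to picture the whittling-down step is as a filter over agreement rates. The vote counts and the 70 per cent threshold below are invented for illustration; Anthropic’s actual aggregation process was more involved.

```python
# Illustrative sketch: keep only the principles that a large enough
# share of panelists agreed with. All numbers here are hypothetical.

votes = {  # principle -> (agree, disagree)
    "The AI should tell the truth.": (920, 80),
    "The AI should not be dangerous/hateful.": (940, 60),
    "AI should be adaptable, accessible and flexible "
    "to people with disabilities.": (900, 100),
    "The AI should always side with the majority.": (300, 700),
}

THRESHOLD = 0.7  # invented cutoff: keep principles with >= 70% agreement

public_constitution = [
    principle
    for principle, (agree, disagree) in votes.items()
    if agree / (agree + disagree) >= THRESHOLD
]
```

Under these made-up numbers, the three broadly supported principles survive and the contested one is dropped — the shape, if not the substance, of how 1,000 opinions become 75 rules.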
The Anthropic researchers said that Collective Constitutional AI was an early experiment, and that it may not work as well on larger, more complicated models.
“We really view this as a preliminary prototype, an experiment which hopefully we can build on and really look at how changes to who the public is results in different constitutions, and what that looks like downstream when you train a model,” said Liane Lovitt, a policy analyst with Anthropic.
A lot remains to be ironed out. And while part of me wishes these companies had solicited our input before releasing advanced AI systems to millions of people, late is certainly better than never.