
Research leaders urge technology industry to monitor AI "thoughts"


AI researchers from OpenAI, Google DeepMind, Anthropic, and a broad coalition of companies and nonprofit groups are calling for deeper investigation into techniques for monitoring the so-called reasoning of AI models, in a position paper published Tuesday.

A key feature of AI reasoning models, such as OpenAI's o3 and DeepSeek's R1, is their chains of thought, or CoTs: an externalized process in which AI models work through problems, similar to how humans use a scratch pad to work through a difficult math question. Reasoning models are a core technology for powering AI agents, and the paper's authors argue that CoT monitoring could be a core method for keeping AI agents under control as they become more widespread and capable.
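To make the idea concrete, here is a minimal, hypothetical sketch of what CoT monitoring could look like in practice. The paper does not describe a specific implementation; the function names (generate_with_cot, monitor_cot) and the flag patterns below are illustrative assumptions only.

import re

# Hypothetical phrases a monitor might flag in a model's intermediate reasoning.
FLAG_PATTERNS = [
    r"delete .* logs",             # e.g. reasoning about covering its tracks
    r"mislead (the )?user",        # e.g. planning to deceive
]

def generate_with_cot(prompt: str) -> tuple[str, str]:
    """Stand-in for a reasoning-model call that returns (chain_of_thought, answer)."""
    cot = "The report has three sections; I will summarize each one in turn."
    answer = "Here is a summary of the quarterly report."
    return cot, answer

def monitor_cot(cot: str) -> list[str]:
    """Return any flagged patterns found in the chain of thought."""
    return [p for p in FLAG_PATTERNS if re.search(p, cot, flags=re.IGNORECASE)]

if __name__ == "__main__":
    cot, answer = generate_with_cot("Summarize the quarterly report.")
    flags = monitor_cot(cot)
    if flags:
        print("CoT monitor flagged:", flags)  # an overseer could block or escalate here
    else:
        print(answer)                         # benign reasoning, pass the answer through

In this toy setup the chain of thought is benign, so nothing is flagged; the point is simply that the intermediate reasoning, not just the final answer, is available for inspection.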

"CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions," said the researchers in the position paper. "Yet, there is no guarantee that the current degree of visibility will persist. We encourage the research community and frontier AI developers to make the best use of CoT monitoring and study how it can be preserved."

The position paper asks leading AI model developers to study what makes CoTs "monitorable", in other words, what factors can increase or decrease transparency into how AI models really arrive at answers. The paper's authors say CoT monitoring may be a key method for understanding AI reasoning models, but note that it could be fragile, cautioning against any intervention that could reduce its transparency or reliability.

The authors also call on AI model developers to track how monitorable CoTs remain over time, and to study how the method could one day be implemented as a safety measure.

Notable signatories of the paper include OpenAI chief research officer Mark Chen, Safe Superintelligence CEO Ilya Sutskever, Nobel laureate Geoffrey Hinton, Google DeepMind co-founder Shane Legg, xAI safety adviser Dan Hendrycks, and Thinking Machines co-founder John Schulman. First authors include leaders from the UK AI Security Institute and Apollo Research, and other signatories come from METR, Amazon, Meta, and UC Berkeley.

The paper marks a moment of unity among many of the AI industry's leaders in an attempt to boost research around AI safety. It comes at a time when tech companies are locked in fierce competition, which has led Meta to poach top researchers from OpenAI, Google DeepMind, and Anthropic with million-dollar offers. Some of the most highly sought-after researchers are those building AI agents and AI reasoning models.

"We're at this critical moment where we have this new chain-of-thought thing. It seems pretty useful, but it could go away in a few years if people don't really concentrate on it," said Bowen Baker, an OpenAI researcher who worked on the paper, in an interview with TechCrunch. "Publishing a position paper like this, to me, is a mechanism to get more research and attention on this topic before that happens."

OpenAI publicly released a preview of the first AI reasoning model, o1, in September 2024. In the months that followed, the tech industry was quick to release competitors with similar capabilities, with some models from Google DeepMind, xAI, and Anthropic showing even more advanced performance on benchmarks.

However, relatively little is understood about how AI reasoning models work. While AI labs have excelled at improving AI performance over the past year, that hasn't necessarily translated into a better understanding of how these models arrive at their answers.

Anthropic has been one of the industry's leaders in figuring out how AI models really work, a field called interpretability. Earlier this year, CEO Dario Amodei announced a commitment to crack open the black box of AI models by 2027 and to invest more in interpretability. He also called on OpenAI and Google DeepMind to research the topic more.

Anthropic's early research has indicated that CoTs may not be a fully reliable indication of how these models arrive at answers. At the same time, OpenAI researchers have said that CoT monitoring could one day become a reliable way to track alignment and safety in AI models.

Position papers like this aim to signal-boost and draw more attention to nascent areas of research, such as CoT monitoring. Companies like OpenAI, Google DeepMind, and Anthropic are already researching these topics, but it's possible this paper will encourage more funding and research into the space.
