Summary
Anthropic has decided to keep its most powerful new AI model, Claude Mythos Preview, away from the general public. The company made this choice after the model discovered thousands of security flaws in major computer systems and web browsers. Instead of a wide release, Anthropic is sharing the technology with a select group of tech giants and security experts to help fix these problems quietly. This move highlights the growing concern that advanced AI could be used as a dangerous tool for cyberattacks if it falls into the wrong hands.
Main Impact
The decision to keep this model private marks a major shift in how AI companies release their products. Usually, new models are launched for everyone to use, but Claude Mythos Preview is considered too risky for a standard release. By using a "controlled deployment" strategy, Anthropic is trying to ensure that the AI helps defend the internet rather than helping hackers attack it. This approach could become the new standard for the industry as AI tools become capable of finding and exploiting complex software bugs without human help.
Key Details
What Happened
Anthropic created an initiative called Project Glasswing to manage the use of this new model. They have partnered with some of the biggest names in technology, including Apple, Microsoft, Google, Amazon, and Cisco. These partners are using the AI to scan their software for "zero-day" vulnerabilities: security holes that were previously unknown even to the people who wrote the software. Because the AI can find these flaws so quickly, Anthropic believes it is safer to work directly with the companies that can fix them before the public ever learns they exist.
Important Numbers and Facts
Anthropic is putting significant resources into this safety effort. The company is providing $100 million in AI usage credits to its partners so they can use the model for security work, and it is donating a further $4 million in cash to organizations that maintain open-source software. The model has already proven its power by finding a bug in the OpenBSD operating system that had gone undetected for 27 years. In another case, it found a 17-year-old flaw in FreeBSD that would allow an attacker without a password to take full control of a server from anywhere in the world.
Background and Context
It is important to understand that Anthropic did not set out to build a "hacking" AI. The model became good at finding security flaws simply because it was trained to be better at coding and logical reasoning. As the AI got smarter at writing software, it naturally became better at spotting mistakes in software. However, the ability to find a mistake is very similar to the ability to break into a system. This is known as a "dual-use" problem, where a helpful tool can easily be turned into a weapon. Anthropic researchers noted that the model can even chain several small bugs together to build a single, far more dangerous attack.
Public or Industry Reaction
Leaders in the tech community have praised the move, especially those who work on free, open-source software. Jim Zemlin, the head of the Linux Foundation, explained that many people who maintain critical software do not have the money or staff to do deep security reviews. By giving these smaller groups access to powerful AI tools and funding, Anthropic is helping to protect the foundation of the internet. Government officials in the United States have also been briefed on the model's capabilities as they try to work out how AI will change the future of national security and digital warfare.
What This Means Going Forward
Anthropic does not plan to keep all its models private forever. They are working on new safety features that will be included in future versions, such as the upcoming Claude Opus model. The goal is to create "guardrails" that prevent the AI from helping with malicious activities while still allowing it to be useful for regular tasks. Other companies like OpenAI are following a similar path, treating their most advanced coding models with extra caution. This suggests that the most powerful AI tools of the future may only be available to verified organizations rather than the general public.
Final Take
The discovery of decades-old bugs by Claude Mythos Preview shows that our digital world is more fragile than we thought. While it is exciting that AI can help us find and fix these flaws, the risk of misuse is too high to ignore. Anthropic’s decision to prioritize safety over a flashy public launch is a responsible step in a world where AI is rapidly becoming more capable than the humans who created it.
Frequently Asked Questions
Why is Anthropic not releasing the Claude Mythos Preview model?
The model is so good at finding and exploiting security flaws that Anthropic fears it could be used for major cyberattacks if it were available to everyone. They are keeping it private to prevent it from being used by bad actors.
What is a zero-day vulnerability?
A zero-day vulnerability is a security flaw in software that the developers do not know about yet. It is called "zero-day" because the creators have had zero days to fix it, making it very dangerous if a hacker finds it first.
How is Anthropic helping the open-source community?
Anthropic is donating $4 million and providing free access to its AI tools for groups like the Linux Foundation and the Apache Software Foundation. This helps people who maintain free software find and fix bugs that they might have missed otherwise.