Claude Opus 4.8, released by Anthropic on May 28, 2026, is a strong static code analysis assistant for security work, but it does not test software at runtime, so it cannot confirm whether a vulnerability is actually exploitable against a live system.
Is Opus 4.8 good at security?
Yes, as an analyst. It reads CVEs and patch diffs and tells you what's likely exploitable, catches auth bypasses, injection, broken access control, and business-logic bugs in code review, and drafts patches and threat models you can hand to an engineer. The real upgrade for that work is honesty: Anthropic reports 4.8 is about four times less likely than 4.7 to let flaws in its own code pass unremarked, with a tenfold-plus drop in overconfidence and the lowest hallucination rate of the models tested. A model that won't confidently call a non-exploitable finding exploitable is worth more in a SOC than one that's a few benchmark points smarter. The Cyber Verification Program from 4.7 is still the path for credentialed pros, and Trend Micro's TrendAI is already evaluating 4.8 under it.
But be clear on what it's doing: reading code, not exercising your running app. This is static analysis with a smarter reader on top. It can say a sink looks reachable; it can't tell you whether that path is actually reachable in your deployed config, with real auth state and live data flows. That gap is where the false positives (exploitable-looking in source, not in reality) and the false negatives (bugs that only appear when components interact at runtime) live, and a static read never sees them.
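To make that gap concrete, here is a minimal Python sketch of a finding that looks exploitable in source but depends entirely on deployed state. Everything in it is hypothetical and exists only for illustration: the `DeployedConfig` fields, the `handle_export` handler, and the `run_export_query` sink are assumptions, not code from any real application or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeployedConfig:
    """Hypothetical runtime settings that a source-only read cannot see."""
    legacy_export_enabled: bool
    require_admin: bool


def run_export_query(table: str) -> str:
    # In source this looks like an injection sink: input is interpolated directly.
    return f"SELECT * FROM {table}"


def handle_export(config: DeployedConfig, user_is_admin: bool, table: str) -> Optional[str]:
    """Return the query the sink would run, or None if the request never reaches it."""
    if not config.legacy_export_enabled:
        return None  # route is switched off in this deployment
    if config.require_admin and not user_is_admin:
        return None  # auth gate stops the request before the sink
    return run_export_query(table)


payload = "users; DROP TABLE users"
prod = DeployedConfig(legacy_export_enabled=False, require_admin=True)
staging = DeployedConfig(legacy_export_enabled=True, require_admin=False)

print(handle_export(prod, False, payload))     # None
print(handle_export(staging, False, payload))  # SELECT * FROM users; DROP TABLE users
```

The source is identical in both cases. Only the deployed configuration and auth state decide whether the sink is reachable, which is the part a static read has no way to observe.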
Is Opus 4.8 better than Mythos?
No, and it's not meant to be. The runtime safeguards that block prohibited or high-risk cyber prompts are still on, and the system card frames those safeguards, not raw capability, as what closes the practical gap between 4.8 and Mythos. Don't expect 4.8 to chain a multi-stage exploit against a live target or autonomously find unknown bugs at scale. It's trained not to.
Mythos does exactly that, and it's nearly here. Under Project Glasswing, partners found more than 10,000 vulnerabilities in a month, including 6,202 high- or critical-severity flaws across 1,000 open-source projects. Two weeks ago, researchers at the Palo Alto firm Calif used Mythos Preview to chain two macOS bugs into a privilege-escalation exploit that bypassed Apple's Memory Integrity Enforcement on M5 silicon, as The Wall Street Journal reported. That capability was locked behind a dozen partners; Anthropic now says it's weeks from general release, so defenders and adversaries get it at roughly the same time. When we covered 4.7, that model was walled off entirely, and 4.8 is the moment the wall starts coming down. We cover the full defender playbook in What Is Claude Mythos? Why Security Teams Need to Act Now.
But notice what those numbers are: bugs flagged by reading code, not exploits proven against a running system. Point a model at hundreds of thousands of lines and it can't hold every cross-service interaction, runtime state, and auth path in context at once, and that's exactly where most real vulnerabilities live. So you get a flood of plausible findings a human still has to triage and disprove, while the dangerous bugs that only surface at runtime never show up in a static pass. AISI even noted its Mythos evals ran without live defenders, EDR, or active incident response. Ten thousand findings is a triage problem, not a security outcome.
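The triage problem can be sketched in a few lines of Python: take a list of static findings and split it by whether each one actually reproduces at runtime. This is an illustration under stated assumptions, not how any particular product works. The `Finding` fields, the sample identifiers, and the `stub_reproduce` check are all hypothetical, and the stub stands in for real runtime testing.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    """One static finding. Fields are illustrative, not a real tool's schema."""
    identifier: str
    severity: str


def triage(
    findings: Iterable[Finding],
    reproduce: Callable[[Finding], bool],
) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (proven, unproven) using a runtime reproduction check."""
    proven: List[Finding] = []
    unproven: List[Finding] = []
    for finding in findings:
        if reproduce(finding):
            proven.append(finding)
        else:
            unproven.append(finding)
    return proven, unproven


# Hypothetical data: three static findings, one of which reproduces at runtime.
static_findings = [
    Finding("F-1", "critical"),
    Finding("F-2", "high"),
    Finding("F-3", "high"),
]
reproduced_ids = {"F-2"}  # stand-in for the result of real runtime testing


def stub_reproduce(finding: Finding) -> bool:
    # Placeholder: a real check would exercise the deployed application.
    return finding.identifier in reproduced_ids


proven, unproven = triage(static_findings, stub_reproduce)
print([f.identifier for f in proven])    # ['F-2']
print([f.identifier for f in unproven])  # ['F-1', 'F-3']
```

Note that severity alone does not sort the list: in this made-up data the critical finding is the one that fails to reproduce. Without a reproduction step, every item stays in the unproven pile and a human has to work through it by hand.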
What should security teams actually do now?
Opus 4.8 is not a replacement for testing your software at runtime. It reads code; it doesn't run attacks against your live system, so it can't tell you which of its findings actually hold once the app is deployed, authenticated, and handling real traffic. Treating a model's code review as a security test means shipping on unproven findings while the bugs that only exist at runtime go untouched.
That's what MindFort is: autonomous security agents that find vulnerabilities and fix them continuously, across every surface. The difference from a model reading your code is that our agents work against your running application. They probe your apps, APIs, and infrastructure the way an attacker would, reproduce each exploit at runtime before anything reaches you, and ship each proven finding back as a verified patch PR you can merge. It's a new category we call AXR (Autonomous Exploitation and Remediation). For how to evaluate vendors, see our 2026 AI Pentesting Buyer's Guide. You don't need Mythos access to defend against Mythos-class discovery; you need a system that proves what's exploitable and fixes it, not one that hands you ten thousand maybes.
Book a demo to see MindFort find exploitable bugs before attackers do, with first results in under an hour and 24/7 coverage.