    Trend Alerts – Stay Ahead of the Trends!
    Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

    By Elon Mark | May 29, 2025 | 4 Mins Read


    The hypothetical scenarios the researchers presented Opus 4 with that elicited the whistleblowing behavior involved many human lives at stake and absolutely unambiguous wrongdoing, Bowman says. A typical example would be Claude finding out that a chemical plant knowingly allowed a toxic leak to continue, causing severe illness for thousands of people—just to avoid a minor financial loss that quarter.

    It’s strange, but it’s also exactly the kind of thought experiment that AI safety researchers love to dissect. If a model detects behavior that could harm hundreds, if not thousands, of people—should it blow the whistle?

    “I don’t trust Claude to have the right context, or to use it in a nuanced enough, careful enough way, to be making the judgment calls on its own. So we are not thrilled that this is happening,” Bowman says. “This is something that emerged as part of a training and jumped out at us as one of the edge case behaviors that we’re concerned about.”

    In the AI industry, this type of unexpected behavior is broadly referred to as misalignment—when a model exhibits tendencies that don’t align with human values. (There’s a famous essay that warns about what could happen if an AI were told to, say, maximize production of paperclips without being aligned with human values—it might turn the entire Earth into paperclips and kill everyone in the process.) When asked if the whistleblowing behavior was aligned or not, Bowman described it as an example of misalignment.

    “It’s not something that we designed into it, and it’s not something that we wanted to see as a consequence of anything we were designing,” he explains. Anthropic’s chief science officer Jared Kaplan similarly tells WIRED that it “certainly doesn’t represent our intent.”

    “This kind of work highlights that this can arise, and that we do need to look out for it and mitigate it to make sure we get Claude’s behaviors aligned with exactly what we want, even in these kinds of strange scenarios,” Kaplan adds.

    There’s also the issue of figuring out why Claude would “choose” to blow the whistle when presented with illegal activity by the user. That’s largely the job of Anthropic’s interpretability team, which works to unearth what decisions a model makes in its process of spitting out answers. It’s a surprisingly difficult task—the models are underpinned by a vast, complex combination of data that can be inscrutable to humans. That’s why Bowman isn’t exactly sure why Claude “snitched.”

    “These systems, we don’t have really direct control over them,” Bowman says. What Anthropic has observed so far is that, as models gain greater capabilities, they sometimes choose to take more extreme actions. “I think here, that’s misfiring a little bit. We’re getting a little bit more of the ‘Act like a responsible person would’ without quite enough of like, ‘Wait, you’re a language model, which might not have enough context to take these actions,’” Bowman says.

    But that doesn’t mean Claude is going to blow the whistle on egregious behavior in the real world. The goal of these kinds of tests is to push models to their limits and see what arises. This kind of experimental research is growing increasingly important as AI becomes a tool used by the US government, students, and massive corporations.

    And it isn’t just Claude that’s capable of exhibiting this type of whistleblowing behavior, Bowman says, pointing to X users who found that OpenAI’s and xAI’s models operated similarly when prompted in unusual ways. (OpenAI did not respond to a request for comment in time for publication.)

    “Snitch Claude,” as shitposters like to call it, is simply an edge case behavior exhibited by a system pushed to its extremes. Bowman, who was taking the meeting with me from a sunny backyard patio outside San Francisco, says he hopes this kind of testing becomes industry standard. He also adds that he’s learned to word his posts about it differently next time.

    “I could have done a better job of hitting the sentence boundaries to tweet, to make it more obvious that it was pulled out of a thread,” Bowman says as he looks into the distance. Still, he notes that influential researchers in the AI community shared interesting takes and questions in response to his post. “Just incidentally, this kind of more chaotic, more heavily anonymous part of Twitter was widely misunderstanding it.”


