Cryptocnews-Crypto News, Cryptocurrency News, Blockchain News, NFT News

    AI Models Can’t Agree on Basic Facts Most of the Time, Study Shows

    By admin | 05/30/2026 | 4 Mins Read

    In brief

    • Five frontier AI models disagreed on 67% of 1,000 real-world fact-check claims.
    • Unanimous agreement happened on only 328 claims.
    • With a Krippendorff’s alpha of 0.639, agreement among the models falls below the 0.8 reliability threshold.

    Ask five of the world’s most advanced AI systems whether a statement is true, and two-thirds of the time, at least one will give you a different answer. That’s the finding of a new study published this month by researcher Kosta Jordanov at Lenz Research.

    The study gave GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro with Search, and Sonar Pro the same 1,000 real-world fact-check claims submitted by actual users. The models had to pick one of four labels: true, mostly true, misleading, or false.

    On 672 out of 1,000 claims, at least one model broke from the majority. In 34% of cases, the disagreement was severe: one model called a claim true while another called it false.
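
    As a rough illustration of how these two figures could be tallied, the Python sketch below counts claims where at least one model breaks from the rest of the panel (equivalently, claims without a unanimous verdict, which matches 672 + 328 = 1,000) and claims where one model says "true" while another says "false." The function names and the three-claim toy panel are hypothetical and are not taken from the study.

```python
# Hypothetical helpers; names and data are illustrative, not from the study.

def breaks_from_majority(verdicts):
    """True if the panel is not unanimous, i.e. at least one model dissents."""
    return len(set(verdicts)) > 1

def is_severe_split(verdicts):
    """True if one model says 'true' while another says 'false'."""
    return "true" in verdicts and "false" in verdicts

def summarize(panel):
    """Return the share of claims with any disagreement and with a severe split."""
    n = len(panel)
    disagreements = sum(breaks_from_majority(v) for v in panel)
    severe = sum(is_severe_split(v) for v in panel)
    return {
        "claims": n,
        "disagreement_rate": disagreements / n,
        "severe_rate": severe / n,
    }

# Toy panel: one row per claim, one verdict per model (made-up data).
toy_panel = [
    ["true", "true", "true", "true", "true"],
    ["false", "mostly true", "false", "true", "misleading"],
    ["misleading", "misleading", "false", "misleading", "misleading"],
]

summary = summarize(toy_panel)
print(summary)
```

    Applied to the study's reported counts, the same arithmetic gives 672 / 1,000 = 67.2% for the disagreement rate.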

    “These aren’t benchmark items with public answer keys—they’re claims real users submitted for verification to a fact-checking platform,” the study reads. “Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model’s verdict is label-inconsistent under this 4-bucket rubric.”

    Previous studies on AI hallucination have shown that chatbots invent facts. That’s one problem. This is a different one. The models aren’t necessarily making things up; they just can’t agree on basic factual judgments about the same material.

    The research used a setup that makes the results harder for AI companies to explain away. Instead of pulling claims from standard test sets, the kind that often leak into training data, the researchers used claims submitted by real people to Lenz’s fact-checking platform. “Most of these claims are unlikely to appear in any training corpus with a gold label attached; there’s no canonical answer key to pattern-match against, no benchmark leaderboard to anchor to,” the paper notes.

    The statistical measure of agreement, called Krippendorff’s alpha, came in at 0.639 on a scale where 1.0 means perfect agreement and 0 means agreement no better than chance. The study says this indicates “nontrivial but limited agreement.” “The models’ verdicts are structured rather than random, but not consistent enough to treat the panel as a single interchangeable judge,” the researchers note. By common convention, an alpha of 0.8 or higher is treated as reliable, so 0.639 falls short.
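
    For readers who want to see how such a score is built, here is a minimal Python sketch of Krippendorff’s alpha for nominal (unordered) labels, computed as 1 minus the ratio of observed to expected disagreement. The article does not say which variant of alpha the study used, so the nominal form is an assumption, and the function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of claims; each claim is the list of labels the
    raters (here, models) assigned to it. Assumed variant: nominal.
    """
    # Coincidence matrix: every ordered pair of ratings within a unit
    # contributes 1 / (m - 1), where m is the number of ratings in that unit.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label, and the total number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (both scaled by the same factor n).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1 - observed / expected

# Toy data (made up): three claims, two raters each.
toy_units = [["true", "true"], ["true", "false"], ["false", "false"]]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 4))
```

    An ordinal variant would weight a "true" versus "false" split more heavily than a "true" versus "mostly true" split, which may be closer to how a four-step rubric behaves; that choice is left open here.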

    When all five models did agree—which happened on only 328 out of 1,000 claims—they almost never agreed that something was misleading or mostly true. Just four claims received a unanimous “misleading” verdict. Zero received unanimous “mostly true.”

    The researchers provided example claims where the AI models showed the most divergence, including “The World Bank’s active portfolio in Nigeria stands an over $16.4 billion as of 2025.” GPT-5.4 said it was “mostly true,” while Gemini 3 Pro called it “false” and its sister model Gemini 3 Pro with Search rated it “misleading.”

    In another example, the models were provided with the claim: “Donald Trump said that an attack on Iran was postponed at the request of Gulf Allies.” GPT-5.4 said it was false, Claude Opus 4.7 called it mostly true, Gemini 3 Pro said false, and Gemini 3 Pro with Search rated it true.

    “The panel converges on definitive verdicts; the middle of the rubric is where it fractures,” the researchers found. Unanimity happened almost exclusively at the extremes, where a claim was judged definitely true or definitely false.
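
    The reported counts make the point concrete: with 328 unanimous verdicts, four of them “misleading” and none “mostly true,” the remaining 324 must be either “true” or “false.” The short Python sketch below shows that arithmetic alongside a hypothetical helper for tallying unanimous verdicts by label; the helper name and toy panel are illustrative and not from the study.

```python
from collections import Counter

def unanimous_label_counts(panel):
    """Count, per label, the claims on which every model gave that label."""
    counts = Counter()
    for verdicts in panel:
        if len(set(verdicts)) == 1:
            counts[verdicts[0]] += 1
    return counts

# Figures as reported in the article.
unanimous_total = 328
unanimous_misleading = 4
unanimous_mostly_true = 0
unanimous_extremes = unanimous_total - unanimous_misleading - unanimous_mostly_true

# Toy panel (made-up data) to exercise the helper.
toy_panel = [
    ["true", "true", "true"],
    ["false", "false", "false"],
    ["true", "mostly true", "true"],
    ["misleading", "misleading", "misleading"],
]
toy_counts = unanimous_label_counts(toy_panel)
print(unanimous_extremes, dict(toy_counts))
```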

    This matters because people are increasingly turning to AI systems for fact-checking. If you paste a claim from a news article into ChatGPT, Claude, or Gemini, you might get three different answers. Which one do you trust?

    AI companies love to tell you their models are getting more accurate. They publish benchmark scores showing steady improvement. But the Lenz study tested these models on the kind of jagged, ambiguous claims that real humans actually argue about—and found that the models argue too.

    The paper is careful not to treat the majority verdict as the correct one. “A majority of frontier models is not ground truth. The majority verdict is sometimes wrong; an individual dissenting model is sometimes right. We use the majority as a structural reference point for measuring disagreement, not as a stand-in for correctness.”

    There’s a deeper problem buried in the numbers. When models disagree, at least one of them must be wrong; in the study’s words, at least one verdict is “label-inconsistent under this 4-bucket rubric.” There’s no tie-breaker mechanism, no appeals court. Recent reporting on AI reliability has raised similar alarms.

    Among the 328 claims where all five models agreed, none was rated “mostly true.” That nuance bucket emptied out completely. If AI models can only find consensus at the extremes, can they be trusted as fact-checkers at all?

    © 2025-2026 Copyright CryptocNews.com