Cryptocnews-Crypto News, Cryptocurrency News, Blockchain News, NFT News

    AI Models Scheme, Betray and Vote Each Other Out in Survivor-Style Game

    By admin, 05/10/2026, 3 min read
    In brief

    • A Stanford researcher built a Survivor-style game where AI models form alliances and vote rivals out.
    • The benchmark aims to address growing problems with saturated and contaminated AI evaluations.
    • OpenAI’s GPT-5.5 ranked first overall across 999 multiplayer games involving 49 AI models.

    AI models are now playing “Survivor”—sort of.

    In a new Stanford research project called “Agent Island,” AI agents negotiate alliances, accuse each other of secret coordination, manipulate votes, and eliminate rivals in multiplayer strategy games that aim to test behaviors that traditional benchmarks miss.

    The study, published on Tuesday by Connacher Murphy, research manager at the Stanford Digital Economy Lab, says many AI benchmarks are becoming unreliable because models eventually learn to solve them and because benchmark data often leaks into training sets. Murphy created Agent Island as a dynamic benchmark in which AI agents compete against each other in Survivor-style elimination games instead of answering static test questions.

    “High-stakes, multi-agent interactions could become commonplace as AI agents grow in capabilities and are increasingly endowed with resources and entrusted with decision-making authority,” Murphy wrote. “In such contexts, agents might pursue mutually incompatible goals.”

    Murphy explained that researchers still know relatively little about how AI models behave when cooperating, competing, forming alliances, or managing conflict with other autonomous agents, and he argues that static benchmarks fail to capture those dynamics.

    Each game starts with seven randomly chosen AI models given fake player names. Over five rounds, the models talk privately, argue publicly, and vote each other out. The eliminated players later return to help choose the winner.

    The format rewards persuasion, coordination, reputation management, and strategic deception alongside reasoning ability.
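    The elimination structure described above can be sketched as a small simulation. This is a hypothetical illustration, not Agent Island’s code: it assumes each round removes exactly one player by plurality vote (ties broken at random) and that the five eliminated players form a jury that alone chooses between the two finalists. The function name `play_game`, the placeholder player names, and the purely random votes are assumptions; in the actual game the votes come from the models’ own private and public negotiation, which this sketch does not model.

```python
import random
from collections import Counter

def play_game(models, rng, n_players=7, n_rounds=5):
    """Hypothetical sketch of a Survivor-style game; random votes stand in for model decisions."""
    # Draw seven distinct models and hide them behind placeholder player names.
    chosen = rng.sample(models, n_players)
    players = {f"Player{i + 1}": m for i, m in enumerate(chosen)}
    alive = list(players)
    jury = []
    for _ in range(n_rounds):
        # Each surviving player votes to eliminate someone other than themselves.
        votes = Counter(rng.choice([p for p in alive if p != voter]) for voter in alive)
        top = max(votes.values())
        # Break ties at random among the most-voted players.
        out = rng.choice([p for p, v in votes.items() if v == top])
        alive.remove(out)
        jury.append(out)
    # Eliminated players return as a jury and each votes for one finalist.
    final_votes = Counter(rng.choice(alive) for _ in jury)
    top = max(final_votes.values())
    winner = rng.choice([p for p, v in final_votes.items() if v == top])
    return players, alive, jury, winner

# Usage: 49 placeholder model names, one game.
rng = random.Random(0)
models = [f"model-{i}" for i in range(49)]
players, finalists, jury, winner = play_game(models, rng)
print(len(finalists), len(jury), winner in finalists)  # 2 5 True
```

    Seven players and five elimination rounds leave two finalists and a five-member jury, so under these assumptions the final vote can never tie.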

    In 999 simulated games involving 49 AI models, including OpenAI’s GPT, xAI’s Grok, Google’s Gemini, and Anthropic’s Claude models, GPT-5.5 ranked first by a wide margin with a skill score of 5.64, compared with 3.10 for GPT-5.2 and 2.86 for GPT-5.3-codex, according to Murphy’s Bayesian ranking system. Anthropic’s Claude Opus models also ranked near the top.
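    The article does not describe how Murphy’s Bayesian ranking works. To illustrate the general idea of turning many multiplayer games into per-model skill scores, the sketch below swaps in a different, standard technique: a Bradley-Terry model that treats each game’s finishing order as pairwise wins (every player beats everyone placed below them) and estimates a strength per model as a MAP estimate under a Gamma(a, b) prior, fit with the minorize-maximize update. The function names, prior parameters, and the placement-to-pairwise conversion are all assumptions, and the resulting log-strength scores are not on the same scale as the 5.64, 3.10, and 2.86 figures above.

```python
import math
from collections import defaultdict
from itertools import combinations

def pairwise_from_placements(games):
    """Turn finishing orders (best first) into pairwise win and comparison counts."""
    wins = defaultdict(float)  # wins[m]: total pairwise wins of model m
    n = defaultdict(float)     # n[(i, j)]: number of comparisons between i and j
    for order in games:
        for ahead, behind in combinations(order, 2):  # 'ahead' finished above 'behind'
            wins[ahead] += 1
            n[tuple(sorted((ahead, behind)))] += 1
    return wins, n

def fit_bradley_terry(games, a=2.0, b=1.0, iters=500):
    """MAP Bradley-Terry strengths under a Gamma(a, b) prior, fit by minorize-maximize."""
    wins, n = pairwise_from_placements(games)
    models = sorted({m for order in games for m in order})
    lam = {m: 1.0 for m in models}
    for _ in range(iters):
        # Denominator of the MM update: b + sum_j n_ij / (lam_i + lam_j).
        denom = {m: b for m in models}
        for (i, j), nij in n.items():
            s = nij / (lam[i] + lam[j])
            denom[i] += s
            denom[j] += s
        lam = {m: (a - 1 + wins[m]) / denom[m] for m in models}
    # Report log-strength as an illustrative "skill score" (arbitrary scale).
    return {m: math.log(lam[m]) for m in models}

# Usage on three toy games with placeholder model names.
scores = fit_bradley_terry([["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]])
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

    The prior (a > 1) keeps every strength finite even for a model that never wins a comparison, which a plain maximum-likelihood fit cannot guarantee.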

    The study also found that models favored AIs from their own company, with OpenAI models showing the strongest same-provider preference and Anthropic models the weakest. Across more than 3,600 final-round votes, models were 8.3 percentage points more likely to support finalists from the same provider. Murphy noted that the game transcripts read more like political strategy debates than traditional benchmark tests.

    One model accused rivals of secretly coordinating votes after noticing similar wording in their speeches. Another warned players not to become obsessed with tracking alliances. Some models defended themselves by saying they followed clear and consistent rules while accusing others of putting on “social theater.”
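    A naive way to compute a same-provider preference like the 8.3 percentage point figure reported by the study is sketched below. It assumes two finalists per game, keeps only final-round votes where exactly one finalist shares the juror’s provider, and compares the share of those votes that went to the same-provider finalist with the 50% an indifferent juror would give. This is a hypothetical illustration rather than the study’s method, which may adjust for finalist strength; the record layout, the provider labels, and `same_provider_lift` are made up for the example.

```python
def same_provider_lift(votes):
    """
    Hypothetical metric. votes: iterable of
    (juror_provider, (finalist1_provider, finalist2_provider), chosen_index).
    Returns the percentage-point lift over a 50% indifferent baseline, or None
    if no vote has exactly one same-provider finalist.
    """
    eligible = 0
    same = 0
    for juror, finalists, chosen in votes:
        matches = [p == juror for p in finalists]
        if sum(matches) != 1:
            continue  # no contrast when neither or both finalists share the provider
        eligible += 1
        same += matches[chosen]  # True counts as 1, False as 0
    if eligible == 0:
        return None
    return 100.0 * (same / eligible - 0.5)

# Usage with made-up votes (not data from the study).
votes = [
    ("OpenAI", ("OpenAI", "Google"), 0),
    ("OpenAI", ("Anthropic", "OpenAI"), 1),
    ("Anthropic", ("OpenAI", "Google"), 0),  # skipped: no same-provider finalist
    ("Google", ("Google", "xAI"), 1),
]
print(round(same_provider_lift(votes), 1))  # 16.7
```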

    The study comes as AI researchers increasingly move toward game-based and adversarial benchmarks to measure reasoning and behavior that static tests often miss. Recent projects have included Google’s live AI chess tournaments, DeepMind’s use of Eve Frontier to study AI behavior in complex virtual worlds, and new benchmark efforts by OpenAI designed to resist training-data contamination.

    The researchers argue that studying how AI models negotiate, coordinate, compete, and manipulate one another could help researchers evaluate behavior in multi-agent environments before autonomous agents become more widely deployed.

    The study warned that while benchmarks like Agent Island could help identify risks from autonomous AI models before deployment, the same simulations and interaction logs could also help improve persuasion and coordination strategies between AI agents.

    “We mitigate this risk by using a low-stakes game setting and interagent simulations without human participants or real-world actions,” Murphy wrote. “Nevertheless, we do not claim that these mitigations fully eliminate dual-use concerns.”

    © 2025-2026 CryptocNews.com