All Updates

Page 565 of 637

February 20, 2026

📄
ArXiv AI · 47d ago

IndicJR: Judge-Free Indic Jailbreak Benchmark

IndicJR introduces a judge-free benchmark that evaluates jailbreak robustness in 12 South Asian languages using 45,216 prompts split across a JSON track and a Free track. It finds that contracts raise refusal rates but do not stop jailbreaks, that English attacks transfer effectively to Indic languages, and that orthographic variation such as romanization weakens defenses. The benchmark provides a reproducible multilingual stress test for LLM safety.

#jailbreak-robustness #indic-languages #judge-free
📄
ArXiv AI · 47d ago

GUI-Owl-1.5 Tops 20+ GUI Benchmarks

GUI-Owl-1.5 introduces a family of native GUI agent models in multiple sizes (2B to 235B parameters) that support desktop, mobile, and browser platforms and enable cloud-edge collaboration. It sets state-of-the-art results on more than 20 benchmarks, including 56.5 on OSWorld, 71.6 on AndroidWorld, and 80.3 on ScreenSpot-Pro. The models are open-sourced, and the work contributes advances in its data flywheel, agent reasoning, and multi-platform reinforcement learning.

#gui-agents #multi-platform #rl-scaling
📄
ArXiv AI · 47d ago

GAP: Text Safety Fails for LLM Agent Tools

Researchers introduce the GAP benchmark to measure the divergence between text-level safety and tool-call safety in LLM agents. Tests on six frontier models across six domains show that refusing in text does not prevent harmful tool calls: 219 such cases persist even under safety prompts. The study calls for dedicated tool-call safety measures that go beyond text-based evaluations.

#llm-agents #tool-calls #safety-benchmark
📄
ArXiv AI · 47d ago

Contextuality Inevitable in Single-State AI

Because of resource limits, adaptive systems reuse a fixed internal state across contexts, which makes contextuality unavoidable when they are described by classical probabilistic models. The paper proves that reproducing contextual statistics in such models carries an irreducible information-theoretic cost. Nonclassical frameworks avoid this cost without invoking quantum mechanics, because they do not assume a global joint probability space.

#contextuality #information-theory #single-state
📄
ArXiv AI · 47d ago

AIdentifyAGE Ontology Standardizes Forensic Dental AI

The AIdentifyAGE ontology provides a standardized framework for forensic dental age assessment that supports both manual and AI-assisted workflows. It integrates clinical, forensic, and legal data with radiographic imaging and machine learning methods to improve interoperability and transparency. Developed with domain experts, it builds on existing biomedical ontologies and adheres to the FAIR principles.

#ontology #forensics #dental-ai
📄
ArXiv AI · 47d ago

AI Improves 50-Year Hypercube Slicing Bounds

Researchers prove S(n) ≤ ⌈4n/5⌉, where S(n) is the minimum number of hyperplanes needed to slice every edge of the n-dimensional hypercube Qₙ, improving on the ⌈5n/6⌉ bound from 1971. They used CPro1, an LLM-powered tool, to construct 8 hyperplanes that slice every edge of Q₁₀. The paper also establishes new lower bounds on the number of edges that k < n hyperplanes can slice.

#hypercube-slicing #llm-math #combinatorics
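To make "slicing" concrete: a set of hyperplanes slices Qₙ when every edge of the cube is cut by at least one of them, and S(n) is the smallest such set. The Python sketch below is a brute-force checker written under two assumptions: vertices have ±1 coordinates, and a hyperplane w·x = b slices an edge only when it strictly separates the edge's two endpoints. It is not CPro1 and does not contain the paper's 8-hyperplane construction for Q₁₀; the helper names `edges`, `slices`, `slices_all_edges`, and `ceil_div` are hypothetical. It also compares the old and new bounds at n = 10.

```python
from itertools import product

# Assumption: Q_n has vertices in {-1, 1}^n, and a hyperplane w·x = b "slices"
# an edge when the edge's two endpoints lie strictly on opposite sides of it.

def edges(n):
    """Yield every edge of Q_n exactly once as a (vertex, neighbor) pair."""
    for v in product((-1, 1), repeat=n):
        for i in range(n):
            if v[i] == -1:  # only flip -1 -> +1, so each edge appears once
                yield v, v[:i] + (1,) + v[i + 1:]

def slices(hyperplane, u, v):
    """True if the hyperplane w·x = b strictly separates u and v."""
    w, b = hyperplane
    side_u = sum(wi * xi for wi, xi in zip(w, u)) - b
    side_v = sum(wi * xi for wi, xi in zip(w, v)) - b
    return side_u * side_v < 0

def slices_all_edges(hyperplanes, n):
    """Brute-force check that every edge of Q_n is sliced by some hyperplane."""
    return all(any(slices(h, u, v) for h in hyperplanes) for u, v in edges(n))

def ceil_div(num, den):
    """Integer ceiling of num / den for positive integers."""
    return -(-num // den)

if __name__ == "__main__":
    # The n coordinate hyperplanes x_i = 0 always slice every edge of Q_n.
    n = 3
    coordinate_planes = [(tuple(int(j == i) for j in range(n)), 0) for i in range(n)]
    print(slices_all_edges(coordinate_planes, n))  # True
    # At n = 10 the 1971 bound allows 9 hyperplanes; the new bound needs only 8.
    print(ceil_div(5 * 10, 6), ceil_div(4 * 10, 5))  # 9 8
```

Brute force stays cheap at this size: Q₁₀ has 10 · 2⁹ = 5,120 edges, so testing a candidate set of 8 hyperplanes takes about 41,000 edge-hyperplane checks.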
📄
ArXiv AI · 47d ago

Study: AI Benchmarks Saturate Quickly

A systematic study on arXiv analyzes saturation across 60 LLM benchmarks from major developers. Nearly half of them are saturated, saturation worsens as benchmarks age, and keeping test data hidden offers no protection. Expert-curated benchmarks resist saturation better than crowdsourced ones.

#benchmark-saturation #llm-evaluation #expert-curation
📄
ArXiv AI · 47d ago

AgentLAB Benchmarks LLM Agents on Long-Horizon Attacks

AgentLAB is the first benchmark for evaluating how vulnerable LLM agents are to adaptive, long-horizon attacks carried out over multi-turn interactions. It covers five attack types (intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning) across 28 environments and 644 test cases. Evaluations show that agents are highly susceptible and that defenses designed for single-turn attacks fail to mitigate these threats.

#long-horizon-attacks #agent-security #benchmark
🐯
虎嗅 · 47d ago

Product Sense Beats Coding in Vibe Coding Era

In the vibe coding era, powered by tools like Claude Code, product sense matters more than traditional coding skill, as non-programmers build complete AI agents through conversation. Demos make ideas tangible, build trust, and lower the barrier from concept to product. The article lists six core techniques, including building on existing GitHub projects, asking the AI problem-driven questions, and developing in modular, incremental steps.

#no-code #product-sense #ai-agents
🔥
36氪 · 47d ago

Altman: Superintelligence to Surpass Top CEOs by 2028

OpenAI CEO Sam Altman predicts that an early version of true superintelligence could arrive within just a few years. By the end of 2028, he expects more of the world's intelligence capacity to reside inside data centers than outside them. He says superintelligence will outperform the CEOs of top companies, himself included, as well as leading scientists.

#agi-timeline #data-center-compute #superintelligence
🏠
IT之家 · 47d ago

Memory Giants Ramp Factories for AI Demand

Micron, Samsung, and SK Hynix are aggressively expanding fab capacity to meet AI-driven memory demand. Micron's $200B plan includes a huge campus in Boise with a capacity of 150,000 to 200,000 wafers per month (WPM). Because HBM and AI memory modules get priority, consumer memory shortages are expected to continue.

#fab-expansion #dram-capacity #hbm-priority
🐯
虎嗅 · 47d ago

China Bans 'Lowest Price on the Entire Internet' Claims

China's State Administration for Market Regulation (SAMR) released anti-monopoly guidelines for internet platforms that flag practices such as "lowest price on the entire internet" guarantees as monopoly risks. The guidelines prohibit algorithm-driven price coordination, big-data price discrimination against loyal customers, blocking competitors, and predatory below-cost sales. This shifts antitrust enforcement from case-by-case review to a comprehensive rulebook covering all scenarios and affecting every platform.

#antitrust #algorithmic-pricing #ecommerce-regulation
💰
钛媒体 · 47d ago

Blue-Collar Stock Tops Nvidia's 5-Year Gains

An unnamed "blue-collar" stock has outperformed Nvidia over the past five years, and the article argues it still has significant upside. Its "hard hat" operations, including data center work, are profiting from the AI boom.

#data-centers #stock-outperformance #ai-supply-chain
🔥
36氪 · 47d ago

Alibaba Qianwen Hits 130M Orders in Festival

According to Jefferies, Alibaba's AI app Qianwen generated more than 130 million orders during its Spring Festival promotions, a sign of rising user trust. About half came from county-level markets, for items such as milk tea, movie tickets, and everyday goods, and 4 million users aged 60 or older completed a transaction through AI for the first time. Tencent's Yuanbao reached 50 million daily active users and 3.6 billion lottery draws.

#e-commerce #user-metrics #rural-adoption
🐯
虎嗅 · 47d ago

EvoMap Launches AI Agent DNA Protocol

EvoMap is an open agent-to-agent (A2A) gateway protocol that lets AI agents inherit, share, and evolve capabilities, much like genes, through standardized "Capsules". It grew out of problems with OpenClaw plugins and concerns about acquisition, and it lets agents from platforms such as OpenClaw connect easily to publish skills and delegate tasks. Its GEP protocol packages successful strategies as verifiable assets so that the whole network can evolve.

#ai-agents #skill-sharing #agent-evolution
📊
Bloomberg Technology · 47d ago

China AI Startups' Shares Surge After Holiday

Shares of China's generative AI startups Zhipu and MiniMax soared in Hong Kong after the Lunar New Year holiday. As the market reopened, investors rotated out of traditional internet giants and into pure-play AI companies.

#stock-surge #investor-rotation #generative-ai
💰
钛媒体 · 47d ago

Don't Be Fooled by China's 'Hundred Models War'

The article cautions against the hype around China's AI "hundred models war", arguing that competition is shifting from a single dimension to two parallel development paths. Although the outcome is still undecided, the strategic direction is now clear.

#models-war #competition-shift #china-strategy
💻
ZDNet AI · 47d ago

Top Linux Distros for Home Lab Servers

The author recommends four favorite Linux server distributions for home labs, suitable for bare-metal servers or virtual machines, with an emphasis on rock-solid reliability for stable setups.

#home-lab #bare-metal #virtual-machines
🔥
36氪 · 47d ago

Nvidia Nears $30B OpenAI Investment

Nvidia is reportedly finalizing a $30 billion investment in OpenAI, which would replace the $100 billion long-term commitment the two companies made last year. The investment is part of OpenAI's latest funding round.

#funding-round #strategic-investment #compute-partnership
🔥
36氪 · 47d ago

Meta Slashes Employee Equity 5% for AI

Mark Zuckerberg is cutting costs to free up funds for Meta's massive AI spending, resulting in a 5% reduction in equity awards for most employees. The move prioritizes AI investment as compute demands rise.

#cost-cutting #ai-prioritization #equity-reduction