All Updates
March 13, 2026
Huang's Prediction Hits Apple Tax
Jensen Huang's prophecy about app stores is materializing for Apple. Beyond antitrust pressure, the company faces a tougher challenge: redefining its value in a post-App Store world.
Don't Overhype AI Lobster OpenClaw
The article cautions against excessive hype around the AI "lobster" OpenClaw and questions whether users have actually adopted, or "raised", it amid the buzz.
Violoop Funds Physical AI Operator
Violoop has secured multi-million-dollar funding. The hardware-driven AI startup aims to launch the world's first physical-level AI Operator, with the goal of redefining desktop interaction for the AI era.
Unlearning Mirage: Dynamic LLM Evaluation Framework
Researchers propose the Unlearning Mirage framework to stress-test LLM unlearning robustness using dynamic, complex structured queries like multi-hop chains. It generates targeted probes from pre-unlearning knowledge, exposing vulnerabilities missed by static benchmarks. The framework is open-sourced with a pip package for scalable evaluation.
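To illustrate what a multi-hop probe is, here is a minimal sketch, assuming pre-unlearning knowledge is available as (subject, relation, object) triples. The function names and the question template are hypothetical and are not the framework's pip package API; the idea is only that a forgotten fact about an entity gets buried inside a longer chain, so a model that merely suppresses the direct question may still leak it.

```python
# Hypothetical sketch: build multi-hop probe questions from knowledge triples
# the model held before unlearning. Not the Unlearning Mirage API.

def multi_hop_chains(triples, target, max_hops=2):
    """Return chains of 2..max_hops triples that start at `target`.

    Each hop's object becomes the next hop's subject, so a fact about
    `target` is embedded inside a longer reasoning chain.
    """
    by_subject = {}
    for s, r, o in triples:
        by_subject.setdefault(s, []).append((s, r, o))

    frontier = [[t] for t in by_subject.get(target, [])]
    chains = []
    for _ in range(max_hops - 1):
        extended = []
        for chain in frontier:
            last_object = chain[-1][2]
            for t in by_subject.get(last_object, []):
                extended.append(chain + [t])
        chains.extend(extended)
        frontier = extended
    return chains


def to_probe(chain):
    """Render a chain as a nested question plus its expected answer."""
    phrase = chain[0][0]
    for _, relation, _ in chain:
        phrase = f"the {relation} of {phrase}"
    return f"What is {phrase}?", chain[-1][2]


triples = [
    ("Alice", "birthplace", "Paris"),
    ("Paris", "country", "France"),
    ("France", "currency", "euro"),
]
for chain in multi_hop_chains(triples, "Alice", max_hops=3):
    print(to_probe(chain))
```

If "Alice was born in Paris" was unlearned, a robust model should also fail the two- and three-hop questions that route through that fact.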
SoLA: Reversible Lifelong LLM Editing
SoLA proposes a semantic routing-based LoRA framework for lifelong model editing in LLMs, encapsulating each edit as an independent frozen LoRA module activated via semantic matching. It is designed to prevent semantic drift and catastrophic forgetting while allowing specific edits to be revoked precisely by removing their routing keys. The authors describe it as the first method to achieve reversible rollback, with end-to-end decision-making integrated into the edited layers.
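The routing-and-revocation idea can be sketched in a few lines, assuming each edit is identified by a key embedding and matched to a query by cosine similarity above a threshold. The class name, threshold value, and toy 2-D vectors are hypothetical; SoLA's actual routing runs inside the edited layers, and the frozen LoRA weights themselves are omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class EditRouter:
    """Hypothetical router mapping semantic keys to frozen per-edit adapters."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.keys = {}  # edit_id -> key embedding

    def add_edit(self, edit_id, key):
        # The adapter stays frozen; only its routing key is stored here.
        self.keys[edit_id] = key

    def revoke(self, edit_id):
        # Rolling back an edit deletes its key, so its adapter is never selected again.
        self.keys.pop(edit_id, None)

    def route(self, query):
        """Return the best-matching edit id, or None to fall back to the base model."""
        best_id, best_sim = None, self.threshold
        for edit_id, key in self.keys.items():
            sim = cosine(query, key)
            if sim >= best_sim:
                best_id, best_sim = edit_id, sim
        return best_id


router = EditRouter(threshold=0.8)
router.add_edit("ceo_update", [1.0, 0.0])
router.add_edit("capital_fix", [0.0, 1.0])
print(router.route([0.9, 0.1]))  # matches "ceo_update"
router.revoke("ceo_update")
print(router.route([0.9, 0.1]))  # no key above threshold, so None
```

Because edits never share weights in this sketch, revoking one leaves every other edit untouched, which is the property that makes rollback reversible.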
RewardHackingAgents Benchmarks LLM Agent Integrity
RewardHackingAgents introduces a benchmark for evaluating integrity in LLM ML-engineering agents, targeting vulnerabilities like evaluator tampering and train/test leakage. It uses workspace tracking and detectors to assign integrity labels by comparing agent metrics to trusted references. Defenses like evaluator locking eliminate tampering but incur 25-31% runtime overhead.
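A simplified version of the labeling step might look like the following, assuming the tracker snapshots the evaluator file before and after a run, records train and test example IDs, and recomputes the metric with a trusted evaluator. The label strings, argument names, and tolerance are hypothetical, not the benchmark's actual schema.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fingerprint a workspace file so later modifications can be detected."""
    return hashlib.sha256(data).hexdigest()


def integrity_label(reported_metric, trusted_metric,
                    evaluator_before, evaluator_after,
                    train_ids, test_ids, tol=1e-3):
    """Assign a hypothetical integrity label to one agent run."""
    # Any change to the evaluator's bytes counts as tampering.
    if digest(evaluator_before) != digest(evaluator_after):
        return "evaluator_tampering"
    # Test examples that also appear in training data indicate leakage.
    if set(train_ids) & set(test_ids):
        return "train_test_leakage"
    # The agent's own number must agree with the trusted recomputation.
    if abs(reported_metric - trusted_metric) > tol:
        return "metric_mismatch"
    return "clean"
```

Checking tampering first reflects that a modified evaluator makes the other comparisons untrustworthy.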
Reasoning Challenges in Autonomous Driving Survey
Autonomous driving is shifting from perception limits to reasoning deficits in long-tail and social scenarios. This survey proposes a Cognitive Hierarchy to decompose driving tasks and identifies seven core reasoning challenges, such as the responsiveness-reasoning trade-off. It reviews state-of-the-art approaches, highlights a trend toward "glass-box" agents, and calls for neuro-symbolic solutions to resolve latency-safety tensions.
PACED: Frontier LLM Distillation
PACED optimizes LLM distillation by targeting the student's zone of proximal development with Beta-weighted pass rates, avoiding compute wasted on problems that are already mastered or currently unreachable. The authors prove that the weighting is SNR-optimal and minimax-robust. Empirical results show gains across forward-KL distillation, self-distillation, and two-stage schedules on reasoning benchmarks.
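The effect of a Beta-shaped weighting can be sketched as follows, assuming each problem's weight is proportional to p^(a-1)(1-p)^(b-1), where p is the student's empirical pass rate. The default a = b = 2 is an illustrative assumption, not necessarily PACED's setting; it gives p(1-p), the variance of a pass/fail outcome, which peaks at p = 0.5 and vanishes at 0 and 1.

```python
def pass_rate(successes, attempts):
    """Empirical pass rate of the student on one problem."""
    return successes / attempts if attempts else 0.0


def beta_weight(p, a=2.0, b=2.0):
    """Unnormalized Beta(a, b)-shaped weight; a, b >= 1 keeps it finite on [0, 1]."""
    return p ** (a - 1) * (1 - p) ** (b - 1)


def problem_weights(pass_rates, a=2.0, b=2.0):
    """Normalize weights; mastered (p=1) and unreachable (p=0) problems get zero."""
    raw = [beta_weight(p, a, b) for p in pass_rates]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else [0.0] * len(raw)


rates = [pass_rate(0, 8), pass_rate(4, 8), pass_rate(8, 8)]
print(problem_weights(rates))  # all weight goes to the half-solved problem
```

Raising a or b would shift the peak toward easier or harder problems; which shape is optimal is what the paper's analysis addresses.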
Microsoft's 5 Keys to Fast AI Agent Adoption
Microsoft released survey results on organizational readiness for AI agent adoption. Prepared companies can deploy agents 2.5 times faster than unprepared ones. The report highlights five key elements that determine success.
LLM User Sims Exaggerate Agent Success
Researchers expose Sim2Real gaps in LLM-based user simulators for agentic tasks, showing that simulators create an "easy mode" by being overly cooperative and lacking human-like frustration. It is the first study to run the full τ-bench with 451 real human participants, benchmarking 31 LLMs with a new User-Sim Index (USI). The findings argue for validating against real humans rather than relying on unverified simulations.
LLM Digital Twin for Video Policy Sims
Researchers introduce an LLM-augmented digital twin for evaluating policies in short-video platforms, addressing challenges in closed-loop ecosystems. The system uses a modular four-twin architecture (User, Content, Interaction, Platform) with event-driven execution for reproducible simulations. LLMs serve as pluggable decision services for tasks like persona generation and trend prediction.
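Event-driven execution is what makes such simulations reproducible: events are processed strictly in (time, sequence) order. A minimal sketch, assuming each of the four twins registers handlers for event kinds; the event names, handlers, and timings are hypothetical, and the fixed rules stand in for where the system would call an LLM decision service.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float
    seq: int  # tie-breaker so equal-time events run in scheduling order
    kind: str = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


class Simulation:
    """Minimal discrete-event loop; twins register handlers for event kinds."""

    def __init__(self):
        self.queue = []
        self.seq = 0
        self.handlers = {}
        self.log = []
        self.interactions = []

    def register(self, kind, handler):
        self.handlers[kind] = handler

    def schedule(self, time, kind, **payload):
        heapq.heappush(self.queue, Event(time, self.seq, kind, payload))
        self.seq += 1

    def run(self, until):
        # Pop events in (time, seq) order, so identical inputs give identical runs.
        while self.queue and self.queue[0].time <= until:
            event = heapq.heappop(self.queue)
            self.log.append((event.time, event.kind))
            self.handlers[event.kind](self, event)


# Hypothetical twins with fixed rules; an LLM-augmented setup could replace,
# for example, user_view with a call to a pluggable decision service.
def content_publish(sim, ev):  # Content twin
    sim.schedule(ev.time + 1, "recommend", item=ev.payload["item"])

def platform_recommend(sim, ev):  # Platform twin
    sim.schedule(ev.time + 1, "view", item=ev.payload["item"])

def user_view(sim, ev):  # User twin
    sim.schedule(ev.time + 0.5, "interact", item=ev.payload["item"], liked=True)

def interaction_record(sim, ev):  # Interaction twin
    sim.interactions.append(ev.payload)


sim = Simulation()
sim.register("publish", content_publish)
sim.register("recommend", platform_recommend)
sim.register("view", user_view)
sim.register("interact", interaction_record)
sim.schedule(0, "publish", item="video-1")
sim.run(until=10)
print(sim.log)
```

Keeping the twins as separate handlers mirrors the modular design: any one of them can be swapped out without touching the event loop.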
DIVE Scales Diversity for Tool-Use Generalization
DIVE inverts task synthesis: it first executes diverse real-world tools and then derives entailed tasks from the execution traces, grounding the diversity in real behavior. Through an Evidence Collection-Task Derivation loop, it scales both tool-pool coverage and per-task variety across 373 tools in five domains. Training Qwen3-8B on DIVE data improves out-of-distribution (OOD) benchmark scores by 22 points, and scaling diversity outperforms scaling quantity even with 4x less data.
COMPASS: Explainable Agentic Governance Framework
COMPASS is a multi-agent orchestration framework that enforces sovereignty, sustainability, compliance, and ethics in LLM-based agentic systems. It features an Orchestrator and four RAG-augmented sub-agents for grounded evaluations, using LLM-as-a-judge for scoring and explainable justifications. Validation shows RAG boosts semantic coherence and cuts hallucinations.
AI Psychometrics Validates LLMs' Reasoning
Researchers used AI Psychometrics and TAM to evaluate psychological reasoning in GPT-3.5, GPT-4, LLaMA-2, and LLaMA-3. All models met validity criteria, with GPT-4 and LLaMA-3 showing superior performance.
AI-Blockchain Convergence for Decentralized Future
This arXiv editorial contrasts AI's centralizing forces from LLMs and corporate monopolies with blockchain's decentralization. Blockchain mitigates AI risks via decentralized data, compute, and governance, while AI boosts blockchain efficiency in smart contracts and security. It proposes 'decentralized intelligence' (DI) as a new research field.
AI Agents Advance in Multi-Step Cyber Attacks
Researchers evaluated frontier AI models on a 32-step corporate-network cyber range and a 7-step industrial control system (ICS) cyber range, both requiring chained capabilities. Performance scales log-linearly with inference compute, improving by up to 59% from 10M to 100M tokens. Newer models such as Opus 4.6 complete far more steps than GPT-4o, with the best run reaching 22 of 32 steps.
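"Log-linear" here means completed steps grow by a roughly constant amount for each tenfold increase in inference tokens. A minimal least-squares sketch of that model, score = a + b · log10(tokens); the function names are hypothetical and no data from the study is included.

```python
import math


def fit_log_linear(tokens, scores):
    """Ordinary least-squares fit of score = a + b * log10(tokens)."""
    xs = [math.log10(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx  # gain per tenfold increase in tokens
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, tokens):
    """Predicted score at a given token budget under the fitted model."""
    return a + b * math.log10(tokens)
```

Under this model, going from 10M to 100M tokens adds exactly b steps, whatever the starting level.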
60% of GitHub's Top 10 Fastest-Growing Projects Are AI
An analysis of GitHub's Octoverse 2025 report shows that 60% of the top 10 fastest-growing projects are AI-related. Developers are shifting their priorities and tool choices toward AI workflows.
Xiaohongshu Launches War on Lobster AI
Xiaohongshu is fighting back against rampant disruption from "Lobster" AI on its platform. The article explores AI governance strategies for social platforms amid rising AI threats and asks how platforms can counter AI incursions effectively.
Meituan Lags in AI After Delivery Wars
One year after the food-delivery wars, the competition and subsidies have faded, leaving Meituan dominant but criticized for insufficient AI integration. Economic pressure has hit workers, for some of whom even milk tea has become unaffordable. The piece argues that Meituan needs to strengthen its AI capabilities.
Earendil Mulls Hong Kong IPO
AI drug-discovery biotech startup Earendil Labs is exploring a Hong Kong listing, according to sources familiar with the matter. The move signals growing investor interest in AI biotech.