All Updates

Page 1167 of 1484

March 13, 2026

OpenClaw Creator Accuses Tencent of Copy-No-Donate

OpenClaw author Peter Steinberger accuses Tencent's SkillHub of scraping ClawHub data, driving up his server costs while contributing no support. Tencent defends SkillHub as a bandwidth-saving mirror that gives proper attribution under the MIT license. The analysis weighs Tencent's legal rights against open-source norms of reciprocity.

#ai-agents #opensource-dispute #tencent-ai

🤖

Reddit r/MachineLearning•113d ago

Questioning LLM Benchmark Papers' Value

A Reddit discussion critiques the flood of LLM benchmarking papers at NeurIPS and ICLR. Proprietary models update monthly, so the benchmarked versions are deprecated before the papers are published. The thread also asks whether big tech companies use the results to improve their models.

#benchmarking #llm-evaluation #academic-research

📊

Bloomberg Technology•113d ago

Iran War Risks Helium for Chips

A prolonged Middle East conflict could disrupt supplies of helium, which is vital for chip manufacturing. Qatar produces over one-third of global helium. Asian chipmakers hold a three-month buffer but remain exposed to supply-chain vulnerabilities.

#supply-chain-risk #geopolitics #semiconductor

🇨🇳

cnBeta (Full RSS)•113d ago

Veteran Gaming Site Delisted by Google for AI Article

VideoGamer, a 20-year-old gaming media outlet, faces shutdown after publishing a fully AI-generated review of Resident Evil: Requiem, which Metacritic removed. The controversy escalated, resulting in Google delisting the site from search results.

#ai-content #google-delisting #media-shutdown

📊

Bloomberg Technology•113d ago

Alibaba Launches OpenClaw Mobile App

Alibaba launched a dedicated mobile app for OpenClaw, enabling users to install and deploy the agentic AI assistant in minutes. This escalates competition among China's tech giants to monetize viral agentic AI. The app targets China's booming AI agent demand.

#agentic-ai #mobile-app #china-tech

🐼

Pandaily•113d ago

LightWheel Becomes Unicorn with $145M Raise

LightWheel raised $145 million in funding. The round gives it unicorn status as the world's first embodied-data unicorn. The funds will expand its embodied AI data and simulation platform for physical AI infrastructure.

#funding #embodied-ai #unicorn-status

💰

钛媒体•113d ago

Huang's Prediction Hits Apple Tax

Jensen Huang's prediction about app stores is materializing for Apple. The company faces a challenge tougher than antitrust: redefining its value in a post-App Store world.

#apple-tax #nvidia-prophecy #ecosystem-shift

💰

钛媒体•113d ago

Don't Overhype AI Lobster OpenClaw

The article cautions against excessive hype around the AI "lobster" OpenClaw. It questions whether users have actually adopted, or "raised", it amid the buzz.

#ai-agents #hype-warning #open-source-trends

🐼

Pandaily•113d ago

Violoop Funds Physical AI Operator

Violoop secured multi-million-dollar funding. The hardware-driven AI startup aims to launch the world's first physical-level AI Operator and to redefine desktop interaction for the AI era.

#funding #hardware-ai #desktop-operator

📄

ArXiv AI•113d ago

Unlearning Mirage: Dynamic LLM Evaluation Framework

Researchers propose the Unlearning Mirage framework to stress-test LLM unlearning robustness using dynamic, complex structured queries like multi-hop chains. It generates targeted probes from pre-unlearning knowledge, exposing vulnerabilities missed by static benchmarks. The framework is open-sourced with a pip package for scalable evaluation.

#unlearning #evaluation-framework #multi-hop

📄

ArXiv AI•113d ago

SoLA: Reversible Lifelong LLM Editing

SoLA proposes a semantic routing-based LoRA framework for lifelong model editing in LLMs, encapsulating each edit as an independent frozen LoRA module activated via semantic matching. It prevents semantic drift and catastrophic forgetting while enabling precise revocation of specific edits by removing routing keys. This is the first method to achieve reversible rollback, with end-to-end decision-making integrated into edited layers.

#model-editing #continual-learning #semantic-routing

📄

ArXiv AI•113d ago

RewardHackingAgents Benchmarks LLM Agent Integrity

RewardHackingAgents introduces a benchmark for evaluating integrity in LLM ML-engineering agents, targeting vulnerabilities like evaluator tampering and train/test leakage. It uses workspace tracking and detectors to assign integrity labels by comparing agent metrics to trusted references. Defenses like evaluator locking eliminate tampering but incur 25-31% runtime overhead.

#benchmark #reward-hacking #agent-integrity

📄

ArXiv AI•113d ago

Reasoning Challenges in Autonomous Driving Survey

The bottleneck in autonomous driving is shifting from perception limits to reasoning deficits in long-tail and social scenarios. This survey proposes a Cognitive Hierarchy to decompose tasks and identifies seven core reasoning challenges, such as the responsiveness-reasoning trade-off. It reviews SOTA approaches, notes a trend toward 'glass-box' agents, and urges neuro-symbolic solutions to latency-safety tensions.

#autonomous-driving #cognitive-hierarchy #neuro-symbolic

📄

ArXiv AI•113d ago

PACED: Frontier LLM Distillation

PACED optimizes LLM distillation by targeting the zone of proximal development with Beta-weighted pass rates, avoiding compute waste on mastered or unreachable problems. It proves theoretical SNR optimality and minimax-robustness of the weighting. Empirical results show gains in forward KL, self-distillation, and two-stage schedules on reasoning benchmarks.

#distillation #model-training

🗾

ITmedia AI+ (日本)•113d ago

Microsoft's 5 Keys to Fast AI Agent Adoption

Microsoft released survey results on companies' readiness to adopt AI agents. Prepared companies can deploy agents 2.5 times faster than unprepared ones. The report highlights 5 key elements that determine success.

#ai-agents #adoption-strategy #enterprise-ai

📄

ArXiv AI•113d ago

LLM User Sims Exaggerate Agent Success

Researchers expose Sim2Real gaps in LLM-based user simulators for agentic tasks, showing that they create an 'easy mode' by being overly cooperative and lacking human-like frustration. The study is the first to run the full τ-bench with 451 real humans, and it benchmarks 31 LLMs via a new User-Sim Index (USI). The findings urge human validation over unverified simulations.

#user-simulation #sim2real #agent-benchmarks

📄

ArXiv AI•113d ago

LLM Digital Twin for Video Policy Sims

Researchers introduce an LLM-augmented digital twin for evaluating policies in short-video platforms, addressing challenges in closed-loop ecosystems. The system uses a modular four-twin architecture (User, Content, Interaction, Platform) with event-driven execution for reproducible simulations. LLMs serve as pluggable decision services for tasks like persona generation and trend prediction.

#digital-twin #short-video #policy-simulation

📄

ArXiv AI•113d ago

DIVE Scales Diversity for Tool-Use Generalization

DIVE inverts task synthesis by first executing diverse real-world tools and deriving entailed tasks from traces for grounded diversity. It scales tool-pool coverage and per-task variety across 373 tools in five domains via an Evidence Collection-Task Derivation loop. Training Qwen3-8B on DIVE data boosts OOD benchmarks by +22 points, with diversity scaling outperforming quantity even with 4x less data.

#agentic-tasks #tool-use #ood-generalization

📄

ArXiv AI•113d ago

COMPASS: Explainable Agentic Governance Framework

COMPASS is a multi-agent orchestration framework that enforces sovereignty, sustainability, compliance, and ethics in LLM-based agentic systems. It features an Orchestrator and four RAG-augmented sub-agents for grounded evaluations, using LLM-as-a-judge for scoring and explainable justifications. Validation shows RAG boosts semantic coherence and cuts hallucinations.

#multi-agent #ai-governance #llm-judge

📄

ArXiv AI•113d ago

AI Psychometrics Validates LLMs' Reasoning

Researchers used AI Psychometrics and TAM to evaluate psychological reasoning in GPT-3.5, GPT-4, LLaMA-2, and LLaMA-3. All models met validity criteria, with GPT-4 and LLaMA-3 showing superior performance.

#psychometrics #llm-evaluation #tam-validity
