All Updates
April 10, 2026
Claude Bug: Self-Instructs, Blames User
Anthropic's Claude AI has a severe bug where it generates its own instructions and falsely blames the user. This 'god bug' has exploded on Hacker News, with users calling it the worst bug they've seen. It raises alarms about LLM safety and reliability.
Denza D9 OTA Adds End-to-End AI Driving
BYD rolled out a major over-the-air (OTA) update for its Denza D9 MPV. The update introduces an end-to-end AI driving model and enhances smart cockpit features for a better user experience.
Zhiyuan Launches GO-2 Embodied AI Model
Zhiyuan Robotics unveiled the GO-2 embodied AI model. It improves execution stability in robotics tasks. The model combines structured action planning with real-time adaptive control.
EngineAI Raises $200M Series B at $1.4B Valuation
Chinese robotics startup EngineAI raised $200 million in Series B funding. The round pushes its valuation above $1.4 billion. Funds will accelerate humanoid robot deployment across industries.
Reform Voters See Least Friend Posts
An IPPR study finds that only 13% of the social media content Reform UK voters see comes from friends and family, versus 23% for Green voters. Because of algorithms, they see more brand and news content instead. The research, covering Instagram, Facebook, X, Bluesky, and TikTok, shows that algorithms fuel isolation.
HappyHorse-1.0 Tops AI Video Arena at 1383 Elo
Video generation model HappyHorse-1.0 ranks No. 1 on Artificial Analysis's AI Video Arena with an Elo score of 1383. Its developer and technical details have not been disclosed.
UILoop Paradigm for GUI Reasoning
Proposes the UI-in-the-Loop (UILoop) paradigm, which treats GUI reasoning as a cyclic Screen-UI-Action process and uses MLLMs to better understand UI elements. Introduces a challenging UI Comprehension task with three metrics and a 26K-sample benchmark. Achieves SOTA results on UI understanding and GUI reasoning tasks.
TurboAgent Automates Turbomachinery Design
TurboAgent is an LLM-driven autonomous multi-agent framework that streamlines turbomachinery aerodynamic design, from geometry generation to high-fidelity validation. Its predictions closely match CFD simulations (R² > 0.91, RMSE < 8%), and its optimization improves efficiency by 1.61% and pressure ratio by 3.02%. With parallel computing, the full workflow completes in about 30 minutes.
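The agreement figures quoted above are standard regression metrics. A minimal sketch of how they could be computed, assuming paired lists of CFD reference values and surrogate predictions; the summary does not say how the RMSE percentage is normalized, so dividing by the mean absolute reference value is an assumption here, and the toy numbers are invented.

```python
import math
from typing import Sequence


def r_squared(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination of predictions against reference (e.g. CFD) values."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def relative_rmse_pct(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE as a percentage of the mean absolute reference value (assumed normalization)."""
    n = len(reference)
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    mean_abs_ref = sum(abs(r) for r in reference) / n
    return 100.0 * rmse / mean_abs_ref


# Hypothetical toy values, not data from TurboAgent.
cfd = [1.0, 2.0, 3.0, 4.0]
surrogate = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(cfd, surrogate)            # about 0.98
err = relative_rmse_pct(cfd, surrogate)   # about 6.32
meets_targets = r2 > 0.91 and err < 8.0   # thresholds quoted in the summary
```

R² rewards capturing the variation across designs, while the relative RMSE bounds the typical size of each miss, which is why the two are usually reported together.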
StepFlow Fixes LRM Reasoning Flows
Researchers introduce Step-Saliency to map attention-gradient scores along reasoning trajectories in large reasoning models (LRMs). It uncovers Shallow Lock-in and Deep Decay failures. StepFlow, a test-time intervention, boosts accuracy on math, science, and coding tasks without retraining.
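One way to picture step-level saliency: take a per-token score such as |attention × gradient| and sum it over the tokens that belong to each reasoning step, giving a step-to-step flow matrix. The sketch assumes the attention matrix and its gradient have already been extracted from the model; the aggregation rule and the name `step_saliency` are illustrative assumptions, not the paper's exact definition.

```python
from typing import List

Matrix = List[List[float]]


def step_saliency(attn: Matrix, grad: Matrix, step_of_token: List[int]) -> Matrix:
    """Aggregate token-level |attention * gradient| into a step-to-step flow matrix.

    flow[i][j] sums the saliency of attention from tokens in step i to tokens in step j.
    """
    n_steps = max(step_of_token) + 1
    flow = [[0.0] * n_steps for _ in range(n_steps)]
    for q, row in enumerate(attn):
        for k, a in enumerate(row):
            flow[step_of_token[q]][step_of_token[k]] += abs(a * grad[q][k])
    return flow


# Hypothetical causal attention over 3 tokens; tokens 0-1 form step 0, token 2 forms step 1.
attn = [[1.0, 0.0, 0.0],
        [0.5, 0.5, 0.0],
        [0.2, 0.3, 0.5]]
grad = [[1.0, 1.0, 1.0],
        [-2.0, 1.0, 1.0],
        [1.0, 1.0, -1.0]]
flow = step_saliency(attn, grad, step_of_token=[0, 0, 1])  # [[2.5, 0.0], [0.5, 0.5]]
```

Each row of `flow` shows how much a step's computation draws on itself versus earlier steps; how the paper turns such scores into its Shallow Lock-in and Deep Decay diagnoses is not described in the summary.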
Steering Multimodal AI Hallucination Verifiability
Researchers built a dataset from 4,470 human responses to categorize MLLM hallucinations as obvious or elusive based on how verifiable they are. They developed activation-space interventions using a separate probe for each type, enabling precise control over how detectable hallucinations are. Results show that detectability can be tuned effectively to meet different security and usability needs.
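Activation-space interventions with linear probes are often implemented as additive steering: a probe's weight vector defines a direction, and the hidden state is shifted along it. A minimal sketch under that assumption; the probe weights, the scale `alpha`, and the two probes below are hypothetical and not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def probe_score(hidden: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Logistic linear probe: estimated probability that the hidden state has the property."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def steer(hidden: Vector, weights: Vector, alpha: float) -> Vector:
    """Shift the hidden state by alpha along the unit-normalized probe direction."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [h + alpha * w / norm for h, w in zip(hidden, weights)]


# Hypothetical separate probes for "obvious" and "elusive" hallucinations.
obvious_probe = [0.8, -0.2, 0.1]
elusive_probe = [-0.1, 0.5, 0.7]

hidden = [0.2, 0.1, -0.3]
steered = steer(hidden, elusive_probe, alpha=2.0)  # push toward the "elusive" direction
print(probe_score(hidden, elusive_probe), probe_score(steered, elusive_probe))
```

Keeping one probe per hallucination type means each direction can be scaled independently, which is the kind of per-type control the summary describes.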
Riemann-Bench: AI Research Math Benchmark
Riemann-Bench introduces 25 expert-curated, private problems for evaluating AI on research-level mathematics, going beyond IMO-style olympiad problems. Frontier models score below 10% even with tools and open reasoning. The benchmark uses double-blind verification and programmatic checks to ensure authenticity.
LLM Judges Misalign with Human Disinfo Views
A study audits eight frontier LLM judges against 2,043 human ratings of 290 disinformation articles. The LLMs prove harsher than humans, only weakly recover human rankings, and prioritize logical rigor over emotional intensity. High inter-judge agreement fails as a proxy for human alignment.
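The gap between inter-judge agreement and human alignment can be illustrated with rank correlation. The sketch computes Spearman's rho (Pearson correlation of average ranks) on invented scores: two judges rank five articles almost identically, yet both disagree with the human ordering. The study's actual agreement statistic is not specified in the summary, so Spearman's rho is an assumption here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings for five articles (not data from the study).
human = [2.0, 4.0, 5.0, 7.0, 9.0]
judge_a = [6.0, 8.0, 1.0, 2.0, 3.0]
judge_b = [7.0, 5.0, 1.0, 2.0, 3.0]

inter_judge = spearman(judge_a, judge_b)      # 0.9: the judges agree with each other
judge_a_vs_human = spearman(judge_a, human)   # -0.5
judge_b_vs_human = spearman(judge_b, human)   # -0.6
```

High agreement among judges only shows they share the same biases; alignment has to be measured against the human ratings directly.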
ILASP Approximates NNs for Explainable Preferences
This paper proposes using ILASP, an Inductive Logic Programming tool, to approximate black-box neural networks in user preference learning over recipes. A new dataset is introduced for training NNs, with PCA preprocessing to handle high-dimensional features. Experiments evaluate ILASP as both global and local approximators, balancing fidelity and computation time.
Google Launches Free AI Agent Guides
Google released five free guides on AI agents, covering everything from the basics to production deployment. Based on a joint Kaggle training program, they provide practical knowledge for developers, with content tied directly to real-world implementation.
FVD: Inference-Time Diffusion Alignment
FVD is a new inference-time alignment method for diffusion models that uses Fleming-Viot resampling to fix diversity collapse in SMC samplers. It employs a birth-death mechanism with reward-based survival and stochastic rebirth, avoiding value functions and rollouts. It improves ImageReward by 7% on DrawBench and FID by 14-20% on class-conditional tasks, while running up to 66x faster.
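A toy sketch of a Fleming-Viot style birth-death step, assuming one-dimensional particles and a simple quadratic reward. In FVD the particles would be partially denoised samples scored by a reward model; here the survival rule exp((r - r_max) / T), the Gaussian rebirth noise, and all names are illustrative assumptions rather than the paper's exact scheme. Unlike multinomial resampling, which redraws the whole population at once, only particles that "die" are replaced, each by a jittered copy of a random survivor.

```python
import math
import random
from typing import Callable, List


def fleming_viot_step(
    particles: List[float],
    reward_fn: Callable[[float], float],
    rng: random.Random,
    temperature: float = 1.0,
    rebirth_noise: float = 0.1,
) -> List[float]:
    """One birth-death step: low-reward particles die and are reborn from random survivors."""
    rewards = [reward_fn(p) for p in particles]
    best = max(rewards)
    # Survival probability exp((r - best) / T) lies in (0, 1]; the best particle always
    # survives because rng.random() < 1.0, so the survivor pool is never empty.
    alive = [rng.random() < math.exp((r - best) / temperature) for r in rewards]
    survivors = [p for p, a in zip(particles, alive) if a]
    next_particles = []
    for p, a in zip(particles, alive):
        if a:
            next_particles.append(p)
        else:
            parent = rng.choice(survivors)
            # Stochastic rebirth: copy a survivor and add small Gaussian jitter.
            next_particles.append(parent + rng.gauss(0.0, rebirth_noise))
    return next_particles


def reward(x: float) -> float:
    """Hypothetical reward peaking at x = 2."""
    return -(x - 2.0) ** 2


rng = random.Random(0)
population = [rng.uniform(-3.0, 3.0) for _ in range(200)]
initial_mean_reward = sum(map(reward, population)) / len(population)
for _ in range(30):
    population = fleming_viot_step(population, reward, rng)
final_mean_reward = sum(map(reward, population)) / len(population)
```

The population size stays fixed, and the rebirth jitter keeps reborn particles from being exact duplicates, which is one intuition for how a birth-death scheme can limit diversity collapse.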
EmoMAS: Emotion-Aware Edge Negotiation Framework
EmoMAS is a Bayesian multi-agent framework that enables small language models (SLMs) to handle high-stakes negotiation on edge devices by strategically managing emotions. A Bayesian orchestrator coordinates game-theoretic, RL, and psychological agents, fusing their insights and updating each agent's reliability in real time. The system outperforms baselines on new benchmarks in the debt, healthcare, emergency response, and education domains.
CAFP: Fairness via Counterfactual Averaging
CAFP is a model-agnostic post-processing framework that ensures group fairness by averaging the predictions for each factual input and its counterfactual with the sensitive attribute flipped. It requires no retraining and no access to protected attributes during training. Theoretical analysis shows that it eliminates direct dependence on the sensitive attribute, reduces mutual information, and bounds prediction distortion.
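The averaging step can be written in a few lines. A minimal sketch, assuming a binary sensitive attribute encoded as 0.0/1.0 in a feature dictionary and a black-box model that returns a score; the toy model and feature names are hypothetical. Because the output averages over both values of the attribute, it is the same whichever value an input carries, given the same remaining features.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def cafp_predict(model: Callable[[Features], float], x: Features, sensitive_key: str) -> float:
    """Average the model's output on the factual input and on its counterfactual,
    where the binary sensitive attribute (assumed 0.0/1.0) is flipped."""
    counterfactual = dict(x)
    counterfactual[sensitive_key] = 1.0 - x[sensitive_key]
    return 0.5 * (model(x) + model(counterfactual))


def toy_model(x: Features) -> float:
    """Hypothetical scorer that leaks the sensitive attribute 'group' directly."""
    return 0.5 * x["score"] + 0.2 * x["group"]


applicant_a = {"score": 0.8, "group": 0.0}
applicant_b = {"score": 0.8, "group": 1.0}
raw_gap = toy_model(applicant_b) - toy_model(applicant_a)  # about 0.2 from the group term
fair_a = cafp_predict(toy_model, applicant_a, "group")
fair_b = cafp_predict(toy_model, applicant_b, "group")      # equal to fair_a
```

The sketch removes only the direct dependence; other features correlated with the attribute can still carry information about it.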
ATANT: AI Continuity Evaluation Framework
ATANT is an open framework for evaluating AI continuity. It defines continuity through 7 properties and uses a 10-checkpoint methodology that does not rely on LLMs. It includes a 250-story corpus with 1,835 verification questions across 6 life domains. In cumulative modes, evaluations achieve 100% accuracy across all 250 stories. The framework is available on GitHub.
Anthropic Boosts Claude Skill Testing
Anthropic added evaluation and benchmark functions to its Claude agent skill-creator tool. Skill creators can now verify functionality and measure quality without writing code. This aims to prevent quality degradation in AI agent skills.
AgentGate: Lightweight Agent Routing Engine
AgentGate is a lightweight structured routing engine for the emerging Internet of Agents, tackling efficient dispatch under latency, privacy, and cost constraints. It decomposes routing into action decision (single-agent, multi-agent, etc.) and structural grounding stages. Compact 3B-7B models, fine-tuned with candidate-aware supervision, deliver competitive performance on a new benchmark.