All Updates

Page 247 of 929

April 13, 2026

📄
ArXiv AI • 22d ago

PilotBench: Safe Aviation AI Benchmark

PilotBench is a benchmark for evaluating LLMs on safety-critical flight trajectory and attitude prediction using 708 real-world aviation trajectories. It introduces Pilot-Score, weighting 60% accuracy and 40% instruction/safety compliance. LLMs excel in controllability but lag in precision versus traditional forecasters, especially in complex flight phases.

#benchmark #embodied-ai #aviation-safety
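
The 60/40 weighting in the PilotBench summary can be illustrated with a minimal sketch. It assumes Pilot-Score is a plain weighted sum of two components normalized to [0, 1]; the summary does not give the exact formula, and the function and parameter names here are hypothetical.

```python
def pilot_score(accuracy: float, compliance: float) -> float:
    """Hypothetical weighted sum: 60% accuracy, 40% instruction/safety compliance.

    Both inputs are assumed to be normalized to [0, 1]; the benchmark's real
    normalization is not described in the summary.
    """
    for name, value in (("accuracy", accuracy), ("compliance", compliance)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 0.6 * accuracy + 0.4 * compliance


# A model that follows instructions perfectly but is only half as accurate
# still loses most of the accuracy-weighted share of the score.
example = pilot_score(accuracy=0.5, compliance=1.0)
print(f"example score: {example:.2f}")
```

Under this assumed form, strong controllability cannot fully offset weak precision, which is consistent with the trade-off the summary describes.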
📄
ArXiv AI • 22d ago

PETITE: Tutor-Student Boosts LLM Coding

PETITE introduces tutor-student multi-agent interaction using the same LLM to enhance coding problem-solving. The student agent generates and refines code, while the tutor provides feedback without ground-truth access. It matches or exceeds SOTA methods like Self-Consistency on APPS benchmark with far fewer tokens.

#multi-agent #peer-tutoring #llm-improvement
📄
ArXiv AI • 22d ago

OpenKedge: Safe Agent Mutation Protocol

OpenKedge redefines AI agent mutations as governed processes via declarative intent proposals evaluated against system state and policies before execution. It enforces safety through execution contracts with bounded actions, resources, and time using ephemeral identities. The protocol introduces Intent-to-Execution Evidence Chain (IEEC) for cryptographic auditability, proven effective in multi-agent and cloud scenarios.

#agent-safety #execution-contracts #evidence-chains
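
One common way to make a sequence of records tamper-evident is a hash chain, sketched below. This is an assumption for illustration only: the summary does not say how the Intent-to-Execution Evidence Chain is constructed, and every name and payload field here is hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Append a record linked to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {"prev": prev_hash, "payload": payload,
              "hash": _digest(prev_hash, payload)}
    chain.append(record)
    return record


def verify(chain: list) -> bool:
    """Recompute every link; any edited payload or broken link fails the check."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical three-stage trail: intent, policy decision, execution.
chain = []
append_record(chain, {"stage": "intent", "action": "scale-service"})
append_record(chain, {"stage": "decision", "allowed": True})
append_record(chain, {"stage": "execution", "status": "done"})
print("chain valid:", verify(chain))
```

Because each hash covers its predecessor, changing an earlier record invalidates every later link, which is the property an audit trail of this kind relies on.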
📄
ArXiv AI • 22d ago

LOM-action: Auditable Enterprise AI Simulation

LOM-action equips enterprise AI with event-driven ontology simulation, where business events trigger graph mutations in a sandbox to evolve a scenario-valid simulation graph. Decisions are derived exclusively from this graph via a dual-mode skill and reasoning architecture, producing full audit logs. It achieves 93.82% accuracy and 98.74% F1, outperforming Doubao-1.8 and DeepSeek-V3.2 by 4x on F1.

#ontology-simulation #graph-mutation #audit-trail
📄
ArXiv AI • 22d ago

Linear Bounds for MSO Models via Decision Diagrams

Researchers extend Courcelle's theorem, proving MSO2 models with free variables can be represented by decision diagrams of parameterized linear size in treewidth or pathwidth. Upper bounds shown for SDD (treewidth) and OBDD (pathwidth); lower bound proves OBDD limitations on bounded treewidth graphs. Links MSO logic to knowledge representation.

#treewidth #decision-diagrams #courcelle-theorem
📄
ArXiv AI • 22d ago

Feedback Search Optimizes LLM Planning Domains

Researchers cast model-space reasoning as search over a feedback space to generate planning domains from natural language with LLMs. They augment the descriptions with symbolic feedback such as landmarks and outputs from the VAL validator. Heuristic search over the model space improves domain quality for practical deployment.

#planning-domains #heuristic-search #symbolic-feedback
📄
ArXiv AI • 22d ago

Artifacts as RL Agent External Memory

Researchers formalize the situated view of cognition in RL, framing the environment as functional memory via 'artifacts' that compress history information. Proofs show artifacts reduce memory needed for policies, corroborated by experiments where spatial path observations unintentionally lower memory requirements. This paves the way for exploiting environments as substitutes for internal memory.

#external-memory #situated-cognition
📄
ArXiv AI • 22d ago

Agents Sustain Marketing Gains Autonomously

Longitudinal case study analyzes agentic AI for marketing personalization over 11 months in a consumer app. Human-curated phase achieved highest engagement lift, while autonomous agents sustained positive gains from a fixed library. Supports hybrid human-agent model for scalable performance.

#agentic-ai #personalization #autonomous-agents
🗾
ITmedia AI+ (Japan) • 22d ago

SoftBank, NEC, Honda, Sony Form 1T-Param Physical AI Venture

SoftBank is reportedly teaming up with NEC, Honda, and Sony Group to establish a new company for AI foundation model development. The model targets 1 trillion parameters for physical AI. It focuses on integrating large-scale models with robots.

#embodied-ai #foundation-model #robotics-consortium
🤖
Reddit r/MachineLearning • 22d ago

ICML 2026 Review Deadline Sparks Outrage

ICML 2026 extended the deadline for reviewers' final justifications without allowing author-AC comments, frustrating authors. A reviewer raised new concerns about experiments and fairness after the rebuttal, putting strong papers at risk of rejection. The poster sees this as a major process mistake.

#conference-review #icml-2026 #deadline-mistake
🇭🇰
SCMP Technology • 22d ago

China Approves First L3 Self-Driving EVs

Chinese carmakers received approval in mid-December for EV models with Level 3 autonomous driving on public roads. They are preparing for mass production of 'hands-off' vehicles. New mandatory safety standards for autonomous vehicles are open for public comment, marking a pivotal year for self-driving tech in China.

#autonomous-driving #china-regulation #av-safety
🦙
Reddit r/LocalLLaMA • 22d ago

Gemma 4 Called Out for Lazy Web Search

User complains Gemma 4 26B MoE ignores extensive web search prompts and tools, sticking to one search max. Despite instructions and skills, it prefers internal knowledge over digging deeper. Contrasts with proactive Qwen 3.5 27B.

#tool-use #model-critique #web-search
🗾
ITmedia AI+ (Japan) • 22d ago

Unitree H1 Runs 36 km/h, Claims World Champion Level

Chinese robotics firm Unitree Robotics released a video of their headless humanoid robot H1 sprinting at a top speed of 36 km/h (10 m/s). The company claims this performance reaches 'world champion level' in humanoid robotics.

#humanoid-robot#robotics#locomotion
๐Ÿ 
ITไน‹ๅฎถโ€ข22d ago

Edge UI Revamp Embraces Copilot Round Corners

Microsoft Edge plans a 2026 UI overhaul with softer, larger rounded corners mirroring Copilot's design. Now managed by the AI team, it features iOS-like toggles and deeper Chromium alignment, reducing unique features. This unifies visuals across Microsoft's AI ecosystem despite performance critiques.

#ui-redesign #copilot-integration #browser-ui
🇬🇧
The Register - AI/ML • 22d ago

China's AI Plan for Lessons and Homework

China's National Data Administration published an action plan for AI in education to upskill citizens for AI adoption. The plan promotes AI for preparing school lessons and marking homework. It aims to integrate AI deeply into the education system.

#education-ai #china-policy #edtech
💰
钛媒体 • 22d ago

Humanoid Robots: Involution and Client Wars

A screenshot reveals intense involution, anxiety, and battles over resources in the humanoid robot sector. Rivals are aggressively trying to take all of Unitree's customers and bids.

#humanoid-robots #competition #involution
🇬🇧
The Guardian Technology • 22d ago

Meta AI Glasses: Review and Privacy Risks

Journalist Elle Hunt shares her month-long experience wearing Meta's AI-powered smart glasses in a podcast. She highlights transformative features for vision and hearing impairments alongside privacy concerns. Mark Zuckerberg describes them as 'personal super intelligence' for staying present.

#wearables #privacy #accessibility
🐼
Pandaily • 22d ago

Pony.ai Unveils Self-Evolving PonyWorld 2.0

Pony.ai has unveiled PonyWorld 2.0, enabling autonomous systems to self-diagnose and evolve. This platform redefines training methods for self-driving AI, marking a paradigm shift in the industry.

#autonomous-driving #simulation #self-evolving
🦙
Reddit r/LocalLLaMA • 22d ago

MiniMax-M2.7 NVFP4 Hits 2800 tok/s on 2x RTX PRO 6000

Benchmarks on 2x RTX PRO 6000 Blackwell (96GB) show MiniMax-M2.7 NVFP4 achieving 2800 tok/s at C=128, 127.7 tok/s at C=1. Prefill up to 17k tok/s at 8k ctx. Uses SGLang with TP=2, bf16 KV, no speculative decoding yet.

#blackwell-gpu #inference-bench #nvfp4-quant
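
The MiniMax-M2.7 throughput figures can be put in per-request terms with a short calculation. This sketch assumes that C denotes the number of concurrent requests and that the 2800 tok/s figure is aggregate decode throughput; the post summary does not spell out either point, and the names below are illustrative.

```python
# Figures quoted in the post; reading C as concurrent requests is an assumption.
AGGREGATE_TOK_S = 2800.0  # throughput reported at C=128
SINGLE_TOK_S = 127.7      # throughput reported at C=1
CONCURRENCY = 128


def per_stream_rate(aggregate_tok_s: float, concurrency: int) -> float:
    """Average tokens per second seen by each concurrent request."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tok_s / concurrency


per_stream = per_stream_rate(AGGREGATE_TOK_S, CONCURRENCY)
batching_gain = AGGREGATE_TOK_S / SINGLE_TOK_S

print(f"per-stream rate at C=128: {per_stream:.1f} tok/s")
print(f"aggregate gain over C=1: {batching_gain:.1f}x")
```

On these assumptions, each request at C=128 receives roughly 22 tok/s, about one sixth of the single-request rate, while total throughput rises by roughly a factor of 22.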
๐Ÿผ
Pandailyโ€ข22d ago

Tencent Cloud Launches QClaw V2 Multi-Agent Collaboration

Tencent Cloud rolled out QClaw V2, introducing multi-agent collaboration for consumer AI assistants. This enables agents to work together on tasks. Scalability and memory limitations remain key challenges.

#multi-agent #ai-assistants #cloud-infra