All Updates

Page 150 of 906

April 20, 2026

📄
ArXiv AI • 12d ago

Milkyway Evolves Agents for Future Predictions

Milkyway is a self-evolving LLM agent system that updates a persistent harness using internal feedback from repeated predictions on unresolved questions. It improves factor tracking, evidence gathering, and uncertainty handling without changing the base model. It achieves top scores on FutureX (60.90) and FutureWorld (77.96).

#prediction-agents #self-evolution #internal-feedback
📄
ArXiv AI • 12d ago

MCTS for Bilevel Agent Skills Optimization

Researchers introduce a bilevel optimization framework for LLM agent skills, using Monte Carlo Tree Search in the outer loop to determine skill structure and LLMs in the inner loop to refine content. This addresses the interdependence between structure and components in skill design. Experiments on an open-source Operations Research QA dataset show improved agent performance.

#agent-skills #bilevel-optimization #llm-agents
📄
ArXiv AI • 12d ago

LLM Reasoning: Latent, Not Chain-of-Thought

This position paper argues that LLM reasoning is best studied as latent-state trajectories rather than surface chain-of-thought (CoT). It formalizes three hypotheses separating latent states, explicit CoT, and serial compute, with evidence favoring latent trajectories (H1). It recommends focusing on latent dynamics and disentangling evaluations for better interpretability.

#reasoning #latent-states #interpretability
📄
ArXiv AI • 12d ago

LACE Enables Cross-Thread LLM Reasoning

LACE transforms isolated LLM reasoning into coordinated parallel processes via cross-thread attention, allowing paths to share insights and correct errors. It uses a synthetic data pipeline to train collaborative behavior absent in natural data. Experiments show over 7-point accuracy gains versus standard parallel search.

#synthetic-data
📄
ArXiv AI • 12d ago

KWBench: LLM Unprompted Problem Recognition Benchmark

KWBench introduces a benchmark for evaluating LLMs' ability to recognize problems in knowledge work scenarios without explicit prompts. It features 223 tasks from fields like acquisitions and fraud analysis, based on game-theoretic patterns, with a three-tier scoring rubric. Top models achieve only 27.9% pass rate unprompted, highlighting evaluation gaps.

#benchmark #llm-evaluation #knowledge-work
📄
ArXiv AI • 12d ago

GIST: Multimodal Spatial Grounding Topology

GIST transforms consumer-grade mobile point clouds into semantically annotated navigation topologies via 2D occupancy maps, topological layouts, and lightweight semantic overlays. It powers tasks like intent-driven semantic search, one-shot localization (1.04 m error), zone classification, and visually grounded instruction generation. Evaluations show superior performance over baselines and 80% navigation success with verbal cues.

#multimodal #spatial-grounding #embodied-ai
📄
ArXiv AI • 12d ago

DeepER-Med: Agentic AI for Medical Research

DeepER-Med introduces an agentic AI framework for evidence-based medical research with modules for planning, collaboration, and synthesis. It includes DeepER-MedQA, a dataset of 100 expert-level questions from real scenarios. The system outperforms top platforms in insight generation and aligns with clinical recommendations in 7 of 8 cases.

#agentic-ai #healthcare-ai #evidence-synthesis
🗾
ITmedia AI+ (Japan) • 12d ago

Claude Adds Context Sharing & Skills for Excel/PowerPoint

Anthropic enhanced Claude for Excel and Claude for PowerPoint with multi-file context sharing. The new 'skills' feature enables one-click execution of routine workflows. This eliminates the hassle of switching between Excel and PowerPoint.

#office-integration #workflow-automation #context-sharing
📄
ArXiv AI • 12d ago

Canadian AI Register Obscures Accountability

Canada launched its first Federal AI Register in November 2025, listing 409 AI systems. Analysis reveals 86% are internal efficiency tools, but the register obscures human discretion, training data, and uncertainty management. It frames AI as reliable tooling rather than contestable decision-making.

#ai-transparency #government-policy #accountability
📄
ArXiv AI • 12d ago

Algebraic Invariants Enhance LLM Reasoning

The paper presents a symbolic reasoning framework for LLMs that operationalizes abduction, deduction, and induction via Peirce's tripartite inference. It enforces consistency with Gamma Quintet invariants, including a Weakest Link bound to prevent error propagation in chains. The framework is verified through 100 properties and 16 fuzz tests over 10^5 cases, offering a benchmark foundation.
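
The summary names a Weakest Link bound but does not define it. As a minimal sketch, assuming the bound means that a chain's confidence may never exceed that of its least reliable step, it could look like the following. The function name `chain_confidence` and the [0, 1] confidence scale are illustrative assumptions, not taken from the paper.

```python
def chain_confidence(step_confidences):
    """Cap a reasoning chain's confidence at its weakest step.

    Assumption: each step carries a confidence in [0, 1], and the chain
    as a whole is no more reliable than its least reliable step.
    """
    steps = list(step_confidences)
    if not steps:
        raise ValueError("a chain needs at least one step")
    if any(not 0.0 <= c <= 1.0 for c in steps):
        raise ValueError("confidences must lie in [0, 1]")
    return min(steps)


# A three-step chain is bounded by its weakest step.
bound = chain_confidence([0.9, 0.6, 0.8])
```

Under this reading, adding a step can only lower or preserve the bound, which is the sense in which errors in one step cannot be hidden by stronger steps elsewhere in the chain.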

#abduction #deduction #induction
🇨🇳
TechNode • 12d ago

Xiaomi miclaw first to pass CAICT Claw eval

Xiaomi's miclaw mobile intelligent agent is among the first to pass the CAICT Claw evaluation for smartphone intelligent assistants. This regulatory approval validates advanced on-device AI systems. It is powered by Xiaomi's in-house MiMo large model.

#on-device-ai #china-regulation #intelligent-agent
📊
Bloomberg Technology • 12d ago

iQiyi Overhauls for Full AI Content Creation

iQiyi Inc. anticipates AI generating entire films and shows from scratch soon. This vision drives the streaming service's largest overhaul in 16 years. It marks a monumental shift in the content industry.

#ai-generated-content #streaming-overhaul #china
🤖
Reddit r/MachineLearning • 12d ago

SGOCR: Grounded OCR Dataset Pipeline Released

An independent researcher has released SGOCR, an open-source pipeline and V1 dataset for spatially grounded, OCR-focused VQA with rich metadata. The pipeline uses Nvidia nemotron-ocr-v2 for extraction, Gemma4/Qwen3-VL for anchors, and Gemini-2.5-flash for verification. It was developed via agentic loops and custom optimization for VLM training.

#vlm-dataset #ocr-pipeline #grounding
💰
钛媒体 (TMTPost) • 12d ago

Robot Makers Snub Honor's Half-Marathon Win

The entire internet celebrates Honor's victory in a half-marathon, but robotics developers remain unmoved. The article argues that embodied intelligence does not require superhuman speed like Usain Bolt's, prioritizing other capabilities instead.

#robotics #embodied-ai #half-marathon
🤖
Reddit r/MachineLearning • 12d ago

Vaultak: AI Agent Runtime Security & Risk Scoring

Vaultak provides runtime security for production AI agents, scoring risk in real time across five dimensions: action type, resource sensitivity, blast radius, frequency, and context deviation. It addresses failure modes such as unintended actions, PII leaks, and damaging loops with policy enforcement and rollback. It is open source on GitHub for discussion and integration.
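
The post lists the five dimensions but not how they are combined. As a hedged illustration only, a weighted average over normalized signals with a blocking threshold might look like this. The class `RiskSignals`, the functions `risk_score` and `decide`, the equal default weights, and the 0.7 threshold are all hypothetical and are not Vaultak's actual API.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class RiskSignals:
    """One normalized signal in [0, 1] per dimension named in the post."""
    action_type: float
    resource_sensitivity: float
    blast_radius: float
    frequency: float
    context_deviation: float


def risk_score(signals, weights=None):
    """Weighted average of the five signals (assumed combination rule)."""
    values = astuple(signals)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("signals must lie in [0, 1]")
    if weights is None:
        weights = (1.0,) * len(values)  # assumption: equal weighting
    if len(weights) != len(values) or sum(weights) <= 0:
        raise ValueError("need one weight per signal with a positive sum")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def decide(score, block_at=0.7):
    """Hypothetical policy: block the action at or above the threshold."""
    return "block" if score >= block_at else "allow"


signals = RiskSignals(0.5, 0.5, 0.5, 0.5, 0.5)
verdict = decide(risk_score(signals))
```

A real system would likely weight dimensions unequally, for example giving blast radius more influence than frequency, which is why the weights are left as a parameter here.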

#ai-agents #risk-scoring #policy-enforcement
🗾
ITmedia AI+ (Japan) • 12d ago

Anthropic's Metric: 30% AI-Resistant People Traits

Anthropic has published a new indicator revealing that about 30% of people are less affected by AI advancements. The report highlights common characteristics of these individuals and specific job types that remain resilient. This insight comes from ITmedia AI+ coverage.

#workforce-impact #ai-resilience #job-analysis
🐼
Pandaily • 12d ago

DJI Teases Agri Drone Launch Apr 21

DJI is set to launch a new product on April 21. It is widely expected to be the company's latest agricultural drone. The tease has sparked industry anticipation.

#agricultural-drone #drone-launch #precision-farming
🐼
Pandaily • 12d ago

Honor Expands to Robot Dog & Dexterous Hand

Honor is developing a new robotics lineup after its 'Lightning' humanoid. It includes a quadruped robot dog and dexterous hand. The effort targets consumer-ready embodied AI products.

#quadruped-robot #dexterous-hand #embodied-ai
💰
钛媒体 (TMTPost) • 12d ago

Honor Dominates Robot Marathon Top Six

Honor robots swept the top six positions in a robot marathon competition. This underscores big tech's engineering and systemic advantages. Startups like Unitree and Zhiyuan must urgently scale data and full-stack capabilities.

#robotics #embodied-ai #competition
🗾
ITmedia AI+ (Japan) • 12d ago

Universities Warn: Avoid Personal Data in Google AI Search

Rikkyo University and other institutions are warning their campuses not to input personal information into Google Search's AI mode. The data entered is stored in Google's database and used for AI training, which constitutes an information leakage risk.

#privacy-risk #data-leakage #ai-search