All Updates

Page 133 of 868

April 17, 2026

📄
ArXiv AI • 12d ago

Uncertainty Quantification for LRMs

A new method uses conformal prediction to quantify uncertainty in the reasoning traces and answers of large reasoning models (LRMs), with statistical guarantees. It also introduces a Shapley value framework that explains where the uncertainty originates by identifying key training examples and steps. Theoretical analyses and experiments confirm its effectiveness.

#explainable-ai #reasoning-models
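
For readers unfamiliar with conformal prediction, the sketch below shows the generic split-conformal recipe on a toy regression problem. It is not the paper's method, which applies the idea to reasoning traces; the `predict` function, the synthetic data, and the 90% target level are all assumptions chosen for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def predict(x):
    """Hypothetical stand-in for a trained model."""
    return 2.0 * x


def sample(n, rng):
    """Toy data: y = 2x + Gaussian noise (an assumption for illustration)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


rng = random.Random(0)
calibration = sample(500, rng)
test = sample(2000, rng)

# Nonconformity score: absolute residual on held-out calibration data.
scores = [abs(y - predict(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.1)

# Prediction interval for a new x: [predict(x) - q, predict(x) + q].
covered = sum(abs(y - predict(x)) <= q for x, y in test)
coverage = covered / len(test)
print(f"interval half-width: {q:.3f}, empirical coverage: {coverage:.3f}")
```

The guarantee is marginal: averaged over calibration and test draws, intervals of half-width `q` cover the true value with probability at least 1 - alpha, provided the data are exchangeable.
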
📄
ArXiv AI • 12d ago

SciFi: Safe Autonomous AI for Science

SciFi is a safe, lightweight agentic AI framework for autonomous scientific tasks. It features an isolated execution environment, a three-layer agent loop, and a self-assessing do-until mechanism for reliable operation. Together these enable end-to-end automation of structured tasks with various LLMs and minimal human input.

#agentic-ai #autonomous-workflow #scientific-ai
📄
ArXiv AI • 12d ago

RiskWebWorld: Realistic GUI Benchmark for E-commerce Risks

RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, featuring 1,513 tasks drawn from production pipelines across 8 domains. It includes challenges such as uncooperative websites and partial hijackments, and provides Gymnasium-compliant infrastructure for scalable evaluation and RL. In evaluations, top models reach 49.1% success, suggesting that scale matters more than zero-shot grounding.

#gui-agents #e-commerce #benchmark
📄
ArXiv AI • 12d ago

ReSS: Symbolic Scaffolds for Tabular Reasoning

The ReSS framework extracts decision paths from decision trees and uses them as symbolic scaffolds that guide LLMs toward faithful reasoning over tabular data. It builds a high-quality dataset for fine-tuning LLMs, augmented for better generalization. It achieves up to 10% gains on medical and financial benchmarks, with new faithfulness metrics.

#tabular-data #symbolic-reasoning #explainable-ai
📄
ArXiv AI • 12d ago

NuHF Claw: Risk-Aware AI for Nuclear Rooms

NuHF Claw introduces a risk-constrained cognitive agent framework for digital nuclear control rooms, addressing cognitive risks from soft-controls. It couples cognitive state inference with real-time probabilistic safety assessment to regulate autonomous behavior. Simulator tests show it anticipates cognitive degradation, constrains unsafe recommendations, and preserves human authority.

#cognitive-agents #nuclear-safety #llm-safety
📄
ArXiv AI • 12d ago

Measurable Errors in LM Agent Explore/Exploit

Researchers design controllable 2D grid environments with DAG tasks to measure exploration and exploitation errors in LM agents in a policy-agnostic way. Frontier models struggle, showing distinct failure modes, but reasoning models perform better, and engineering improves both skills. The code is open-sourced on GitHub.

#agent-benchmark #explore-exploit #embodied-ai
📄
ArXiv AI • 12d ago

LLM Chaos from Numerical Instability

Researchers reveal how floating-point rounding errors cause unpredictability in LLMs through chaotic propagation in Transformer layers. They identify an 'avalanche effect' in early layers and three distinct regimes: stable, chaotic, and signal-dominated. Findings are validated across datasets and model architectures.

#chaos #reliability #floating-point
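
The rounding effect behind this item can be seen in plain double-precision arithmetic. The sketch below is only an illustration of the underlying arithmetic, not the paper's analysis of Transformer layers; the specific numbers are chosen for the demonstration.

```python
import math

# Addition of doubles is not associative: regrouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Evaluation order can change a result completely when magnitudes differ.
big, small = 1e16, 1.0
absorbed = (big + small) - big  # small is rounded away before big cancels
kept = (big - big) + small      # big cancels first, so small survives
exact = math.fsum([big, small, -big])  # correctly rounded reference sum
print(absorbed, kept, exact)  # 0.0 1.0 1.0
```

Parallel reductions on accelerators may add the same terms in different orders from run to run, which is one way such discrepancies can enter a model's activations.
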
📄
ArXiv AI • 12d ago

LAMO: Scalable Lightweight GUI Agents

The LAMO framework equips lightweight MLLMs for GUI automation through multi-role orchestration and task scalability. It combines role-oriented data synthesis with two-stage training: Perplexity-Weighted Cross-Entropy SFT for knowledge distillation, plus RL for cooperative exploration. LAMO-3B supports both monolithic and multi-agent system (MAS) execution and excels as a plug-and-play executor paired with advanced planners.

#gui-agents #lightweight-mllms
📄
ArXiv AI • 12d ago

CONCORD: Privacy-Safe Always-Listening AI

CONCORD is a privacy-aware agent-to-agent (A2A) framework for proactive speech-based AI assistants. It captures only the owner's speech via real-time verification, producing one-sided transcripts. It recovers missing context through spatio-temporal resolution, gap detection, and minimal relationship-aware A2A queries, avoiding hallucinations. Evaluations show 91.4% gap detection recall, 96% relationship classification, and a 97% privacy true negative rate (TNR).

#privacy-aware #speech-ai #multi-agent
🗾
ITmedia AI+ (Japan) • 12d ago

AI Detects Customer Harassment in Talks

Plus Alpha Consulting launches AI Kasuhara Guard to detect customer harassment (kasuhara) in face-to-face service. It transcribes conversations via speech recognition, visualizes interactions, and secures evidence trails. This aids customer-facing businesses in Japan.

#speech-recognition #harassment-detection #customer-service
📄
ArXiv AI • 12d ago

Active Constraint Learning for Satellite Scheduling

Researchers introduce Conservative Constraint Acquisition (CCA) for optimizing Earth Observation satellite schedules under unknown operational constraints. Integrated into the Learn&Optimize framework, it interactively learns feasibility from a binary oracle while avoiding over-tightening. It outperforms baselines on synthetic instances of up to 50 tasks, using fewer queries and less time.

#satellite-scheduling
🗾
ITmedia AI+ (Japan) • 12d ago

JFTC Warns OS Providers on AI Exclusion

Japan's Fair Trade Commission released a generative AI market survey report on April 16, warning that smartphone OS providers restricting third-party AI app access may violate antitrust laws. It expressed concerns over US and Chinese giants potentially hindering fair competition for domestic firms in AI autonomous driving.

#antitrust #regulation #autonomous-driving
💰
TMTPost (钛媒体) • 12d ago

DeepSeek Cheaper as Cloud Prices Surge

AI inference costs dropped over 80% in 18 months, yet China's top three cloud providers announced price hikes in the same week. This signals a 2-3 year structural pricing battle. The article questions when this trend will end.

#compute-costs #china-clouds #price-battle
🗾
ITmedia AI+ (Japan) • 12d ago

Japan MOJ Forms Panel on AI Video and Voice Infringements

Japan's Ministry of Justice announced on April 17 a study group to address escalating unauthorized AI-generated videos and audio mimicking celebrities' faces and voices. It will clarify civil liabilities, infringement criteria, and damage claims based on current laws and precedents.

#deepfake #regulation #copyright
🐼
Pandaily • 12d ago

Manycore Tech HK Debut Hits $4.1B Valuation

Manycore Tech has debuted on the Hong Kong stock exchange, achieving a valuation surpassing HK$32B (about $4.1B USD). The company is backed by prominent investors Shunwei and IDG. This launch highlights spatial intelligence as AI's emerging frontier.

#spatial-intelligence #ipo #funding
💰
TMTPost (钛媒体) • 12d ago

QuantPai Brands as AI Species Pioneer

QuantPai has rebranded itself as the first 'Intelligent Species' stock. In an exclusive interview, its CTO, who holds an MIT robotics PhD, emphasizes enduring value creation as the basis for long-term success in AI and robotics.

#robotics #brand-upgrade #cto-interview
⚛️
QbitAI (量子位) • 12d ago

Spatial AI Stock Surges 171% on Debut

The first spatial intelligence stock surged 171% on its opening day. The company works in the sector backed by AI pioneer Li Feifei and is one of Hangzhou's 'six little dragons'. The article frames the debut as the start of the spatial intelligence era.

#spatial-ai #china-startups #ipo-surge
🐼
Pandaily • 12d ago

Alibaba Enters $40B Guzi Economy

Alibaba and China Literature are entering the booming “Guzi Economy,” valued at over $40 billion. This market is driven by Gen Z's passion for IP-based consumption. China's tech giants are racing to capture this massive opportunity.

#guzi-economy #ip-consumption #gen-z-market
🗾
ITmedia AI+ (Japan) • 12d ago

SoftBank Launches Sovereign LLM Service

SoftBank will launch an enterprise service in June built on the domestic LLM Sarashina. The service runs on Oracle services in SoftBank's own data center to ensure data sovereignty, enabling secure AI integration with confidential business information.

#sovereign-ai #data-sovereignty #japan-llm
🦙
Reddit r/LocalLLaMA • 12d ago

Barriers to Insider LLM Weight Leaks

A Reddit post asks what technical barriers prevent OpenAI or Anthropic engineers from exporting and leaking flagship model weights. It notes that NDAs exist but that LLMs are more portable than other software, and it cites the original Llama leak as precedent.

#insider-threat #model-security #weight-leak