All Updates

Page 356 of 893

March 30, 2026

📄
ArXiv AI • 31d ago

Sommelier: Open Pipeline for Full-Duplex SLMs

Sommelier introduces a scalable open-source pipeline for preprocessing multi-turn audio data tailored to full-duplex Speech Language Models. It tackles the lack of high-quality multi-speaker conversational datasets and handles issues such as overlapping speech, back-channeling, diarization errors, and ASR hallucinations. The pipeline targets a critical gap in building real-time, natural human-computer interaction systems.

#full-duplex #multi-turn #speech-preprocessing
📄
ArXiv AI • 31d ago

PAPO Stabilizes Rubric Training via Decoupled Normalization

Process-Aware Policy Optimization (PAPO) integrates rubric-based process evaluation into GRPO through decoupled advantage normalization. It targets two failure modes: outcome reward models (ORMs) score all correct responses identically, and process reward models (PRMs) are prone to reward hacking. PAPO combines an outcome advantage normalized over the whole group (A_out) with a process advantage normalized over correct responses only (A_proc). It outperforms baselines, reaching 51.3% on OlympiadBench versus 46.3% for ORM.
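The decoupled normalization described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of population standard deviation, the small `eps` term, and the additive combination of the two advantages are all assumptions, since the summary only says the two terms are "composed".

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Zero-mean, unit-variance normalization (assumed form, GRPO-style)."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def papo_style_advantages(outcomes, process_scores):
    """Hypothetical sketch of decoupled advantage normalization.

    outcomes:       1.0 for a correct final answer, 0.0 otherwise
    process_scores: rubric-based process score per response
    """
    # Outcome advantage: normalized over every response in the group.
    a_out = normalize(outcomes)

    # Process advantage: normalized over correct responses only;
    # incorrect responses receive no process term at all.
    correct = [i for i, o in enumerate(outcomes) if o > 0]
    a_proc = [0.0] * len(outcomes)
    normalized = normalize([process_scores[i] for i in correct])
    for i, a in zip(correct, normalized):
        a_proc[i] = a

    # Assumption: the two advantages are simply added.
    return [o + p for o, p in zip(a_out, a_proc)]


advantages = papo_style_advantages(
    outcomes=[1.0, 1.0, 0.0, 0.0],
    process_scores=[0.9, 0.5, 0.8, 0.1],
)
```

In this toy group, the two correct responses are separated by their process scores, while the incorrect response with a high process score (0.8) gains nothing from it. Under these assumptions, that is how a scheme of this kind would limit reward hacking on the process term.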

#rlhf #reward-modeling #policy-optimization
📄
ArXiv AI • 31d ago

LLM Fusion Builds Traceable Airport KGs

The paper presents a dual-stage framework that fuses symbolic Knowledge Engineering with LLMs to construct machine-readable Knowledge Graphs for Total Airport Management, addressing data silos. Document-level processing outperforms segment-based inference in recovering non-linear procedural dependencies. The framework ensures traceability through probabilistic discovery and deterministic source anchoring, and automates workflow synthesis from text.

#knowledge-graphs #prompt-engineering #traceability
📄
ArXiv AI • 31d ago

Lightweight AI Framework for PV Arc-Fault Detection

The LD-framework enables reliable DC arc-fault detection in PV systems despite spectral interference, hardware heterogeneity, and operating drifts. It achieves 0.9999 accuracy, 0% false trips across nuisance conditions, and effective cross-hardware transfer with minimal data. Cloud-edge collaboration supports long-term self-adaptation, recovering precision from 21% to 95% in field tests.

#arc-fault-detection #pv-systems #domain-adaptation
📄
ArXiv AI • 31d ago

GUIDE Fixes GUI Agent Domain Bias

GUIDE is a training-free, plug-and-play framework that resolves domain bias in GUI agents by retrieving expertise from web tutorial videos. It uses a subtitle-driven Video-RAG pipeline for three-stage retrieval and an automated annotation pipeline based on inverse dynamics to inject planning and grounding knowledge. Experiments on OSWorld show over 5% success rate improvements and fewer execution steps without model changes.

#gui-agents #video-rag #domain-bias
📄
ArXiv AI • 31d ago

DesignWeaver Boosts Novice T2I Product Design

DesignWeaver is an interface that aids novices in generating prompts for text-to-image models by extracting key product design dimensions from images into a selectable palette. Informed by a study of 12 expert designers, it promotes visual-guided exploration. A 52-novice evaluation showed more diverse, innovative designs but raised expectations beyond current model capabilities.

#product-design #prompt-engineering #generative-design
📄
ArXiv AI • 31d ago

CANGuard: Hybrid CNN-GRU for CAN Intrusion Detection

CANGuard introduces a spatio-temporal deep learning model combining CNN, GRU, and attention to detect DoS and spoofing attacks in vehicle CAN networks. Trained on the CICIoV2024 dataset, it outperforms state-of-the-art methods across key metrics like accuracy and F1-score. Ablation studies and SHAP analysis validate its components and feature importance.

#cybersecurity #can-bus #automotive-ai
📄
ArXiv AI • 31d ago

CADSmith: Multi-Agent CAD Gen with Validation

CADSmith is a multi-agent pipeline that generates CadQuery code from natural language and refines it through two nested loops: an inner loop that fixes execution errors and an outer loop that performs geometric validation using OpenCASCADE metrics and VLM visual assessment. It employs retrieval-augmented generation over API docs without fine-tuning. Benchmarks show a 100% execution rate, with F1 raised to 0.9846, IoU raised to 0.9629, and Chamfer Distance reduced to 0.74.
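The nested refinement structure described above might look roughly like the sketch below. It is only an illustration of the control flow: `generate`, `execute`, and `validate` are hypothetical callables standing in for the LLM agent, the CadQuery execution step, and the geometric/visual validation step, and the retry limits are assumed values not given in the summary.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callable signatures (not from the paper):
#   generate(prompt, feedback) -> candidate code
#   execute(code)              -> (ran_ok, error_message)
#   validate(code)             -> (geometry_ok, feedback_message)
Generate = Callable[[str, str], str]
Check = Callable[[str], Tuple[bool, str]]


def refine(
    prompt: str,
    generate: Generate,
    execute: Check,
    validate: Check,
    max_outer: int = 3,  # assumed budget for validation rounds
    max_inner: int = 3,  # assumed budget for execution repairs
) -> Optional[str]:
    """Return code that both executes and validates, or None if budgets run out."""
    feedback = ""
    for _ in range(max_outer):
        code = generate(prompt, feedback)

        # Inner loop: repair execution errors until the code runs.
        ran, error = execute(code)
        repairs = 0
        while not ran and repairs < max_inner:
            code = generate(prompt, error)
            ran, error = execute(code)
            repairs += 1
        if not ran:
            feedback = error
            continue

        # Outer loop: check the geometry of code that runs.
        ok, feedback = validate(code)
        if ok:
            return code
    return None
```

Keeping the execution check inside the validation loop means the more expensive geometric and visual checks only ever see code that already runs.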

#multi-agent #cad-generation #geometric-validation
📄
ArXiv AI • 31d ago

BeSafe-Bench Exposes AI Agent Safety Risks

BeSafe-Bench is a benchmark for evaluating the behavioral safety risks of situated AI agents in functional environments across web, mobile, embodied VLM, and VLA domains. It augments tasks with nine safety-critical risk categories and uses a hybrid evaluation that combines rule-based checks with an LLM-as-a-judge. Tests on 13 agents show that even the top performers complete fewer than 40% of tasks in a fully safe manner, highlighting the urgent need for safety alignment.

#agent-safety #safety-benchmark #embodied-agents
📄
ArXiv AI • 31d ago

AutoB2G: LLM-Driven Auto B2G Simulator

AutoB2G is an LLM-based agentic framework that automates building-grid co-simulation workflows using natural language descriptions. It extends CityLearn V2 for B2G interactions and employs the SOCIA framework with a DAG-structured codebase to guide LLM execution and refinement. Experiments show it coordinates simulations to boost grid performance metrics.

#agentic-framework #co-simulation #energy-management
📄
ArXiv AI • 31d ago

AIRA_2 Breaks AI Agent Bottlenecks

AIRA_2 tackles three bottlenecks in AI research agents: single-GPU limits, validation generalization gaps, and fixed LLM operators. It employs asynchronous multi-GPU workers, Hidden Consistent Evaluation, and ReAct agents for better throughput and reliability. It reaches a 71.8% percentile rank on MLE-bench-30 at 24 hours, rising to 76.0% at 72 hours, surpassing the prior SOTA.

#ai-agents #multi-gpu #benchmarks
📄
ArXiv AI • 31d ago

A-SelecT Automates DiT Timestep Selection

A-SelecT dynamically selects the most information-rich timestep from Diffusion Transformer (DiT) features in a single run. It eliminates exhaustive timestep searches and suboptimal feature selection, improving training efficiency. Experiments show superior performance on classification and segmentation benchmarks over prior diffusion-based methods.

#diffusion-models #timestep-selection
📊
Bloomberg Technology • 31d ago

Ex-OpenAI Kass Predicts AI Winners Boom

Zack Kass, former Head of Go-To-Market at OpenAI, says AI investment is still at an early stage but will create many winners. Kass now advises businesses on adopting AI. He spoke with Bloomberg at the Citi Macro Conference in Hong Kong.

#ai-investments #expert-opinion #business-adoption
💰
钛媒体 (TMTPost) • 31d ago

Space Computing Chain Now Fully Forming

Half a year after the initial hype, space computing has advanced significantly. A complete industry chain is emerging, from chips to in-orbit deployment. This marks key progress in orbital compute infrastructure.

#space-compute #satellite-ai #industry-chain
🔥
36氪 (36Kr) • 31d ago

China Boosts AI in Eco-Governance

China's Ministry of Ecology and Environment revealed on March 30 that AI and big data are deeply integrated into ecological monitoring and enforcement, delivering tangible results. This ushers environmental governance into an intelligent era.

#china-ai #govtech #sustainability
⚛️
量子位 (QbitAI) • 31d ago

Chinese World Model Tops Global Benchmarks

A Chinese-developed world model ranks first globally, far surpassing models from Google and Nvidia. It achieves near-perfect 3D accuracy scores. Its developer secured RMB 1 billion in its latest Pre-B funding round.

#world-model #3d-benchmarks #china-ai-funding
🗾
ITmedia AI+ (Japan) • 31d ago

Grok BBQ Images Spark Japan-US X Trend

Japanese and US X users are uniting in a viral movement sharing mouthwatering American-style BBQ images. It began with a Japanese user's AI-generated illustrations or photos, amplified by a new generative AI feature on X. Elon Musk expressed admiration for the phenomenon.

#viral-trend #cross-cultural #photorealism
🐯
虎嗅 (Huxiu) • 31d ago

Slow LLM Delays AI Responses Intentionally

Sam Lavigne released Slow LLM, an open-source Chrome extension and DNS tool that slows responses from ChatGPT, Claude, Grok, and Gemini to add friction and counter over-reliance on AI. It intercepts browser Fetch calls and trickles the data out, with the aim of promoting independent thinking. Lavigne developed it with Claude but finished it manually once the tool began slowing Claude's own responses.
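The "trickle" idea can be illustrated with a small generator. This is a Python analogue for illustration only, not the extension's code (which, per the summary, works by intercepting browser Fetch calls). The function name `trickle`, the piece size, and the delay are all assumptions.

```python
import time
from typing import Iterable, Iterator


def trickle(
    chunks: Iterable[bytes],
    piece_size: int = 4,     # assumed: bytes released per step
    delay_s: float = 0.05,   # assumed: pause before each piece
) -> Iterator[bytes]:
    """Re-emit a stream of chunks as small pieces, pausing before each one."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    for chunk in chunks:
        for start in range(0, len(chunk), piece_size):
            time.sleep(delay_s)  # the added friction
            yield chunk[start:start + piece_size]


# Usage: the content is unchanged, only its arrival is slowed.
for piece in trickle([b"Hello, ", b"world!"], piece_size=3, delay_s=0.01):
    print(piece.decode(), end="", flush=True)
print()
```

Because the wrapper only re-slices and delays the stream, the consumer eventually receives exactly the same bytes as it would have without it.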

#browser-extension #ai-ux #dependency
🇭🇰
SCMP Technology • 31d ago

DeepSeek 12-Hour Outage Hits Millions

Chinese AI startup DeepSeek endured a 12-hour outage from Sunday evening to Monday morning, cutting off hundreds of millions of users from its chatbot website and app. The Hangzhou-based company investigated the issue and issued fixes between 1am and 9am Monday. Complaints surged as rivals gained market share.

#outage #reliability #chinese-ai
⚛️
量子位 (QbitAI) • 31d ago

DeepSeek Web Upgrade Crashes 11hrs, New Model Teased

DeepSeek's web version rolled out a major upgrade that caused an 11-hour outage, which surged on trending-search lists. This follows a quiet period during the 'lobster era.' The event strongly suggests that a new model is imminent.

#web-platform #outage #model-release