March 30, 2026
Sommelier: Open Pipeline for Full-Duplex SLMs
Sommelier introduces a scalable open-source pipeline for preprocessing multi-turn audio data tailored for full-duplex Speech Language Models. It tackles the lack of high-quality multi-speaker conversational datasets and issues like overlapping speech, back-channeling, diarization errors, and ASR hallucinations. This bridges critical gaps in developing real-time natural human-computer interaction systems.
PAPO Stabilizes Rubric Training via Decoupled Normalization
Process-Aware Policy Optimization (PAPO) integrates rubric-based process evaluation into GRPO through decoupled advantage normalization. It targets two failure modes: outcome reward models (ORMs) score all correct responses uniformly, and process reward models (PRMs) are prone to reward hacking. PAPO composes a globally normalized outcome advantage (Aout) with a process advantage (Aproc) that is normalized over correct responses only. PAPO outperforms baselines, achieving 51.3% on OlympiadBench versus 46.3% for ORM.
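The decoupled normalization described above can be illustrated with a minimal sketch. It assumes binary outcome rewards, one rubric score per response, population standard deviation, and that the two advantages are combined by a weighted sum; the function names, the `weight` parameter, and the `EPS` constant are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev

EPS = 1e-6  # assumed small constant to avoid division by zero


def normalize(values):
    """Standardize values to zero mean and unit (population) std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def papo_style_advantages(outcome_rewards, process_scores, weight=1.0):
    """Combine a group-normalized outcome advantage with a process
    advantage normalized over correct responses only.

    The weighted sum is an assumption made for illustration.
    """
    # Outcome advantage: normalized over every response in the group.
    a_out = normalize(outcome_rewards)

    # Process advantage: normalized among correct responses only;
    # incorrect responses receive no process signal.
    correct = [i for i, r in enumerate(outcome_rewards) if r > 0]
    a_proc = [0.0] * len(outcome_rewards)
    if len(correct) > 1:
        normed = normalize([process_scores[i] for i in correct])
        for i, value in zip(correct, normed):
            a_proc[i] = value

    return [o + weight * p for o, p in zip(a_out, a_proc)]


# Two correct and two incorrect responses with different rubric scores.
advantages = papo_style_advantages([1, 1, 0, 0], [0.9, 0.5, 0.8, 0.2])
```

In this sketch the two correct responses receive different advantages because their rubric scores differ, while a high rubric score on an incorrect response earns nothing, which is one way to read how the decoupling limits reward hacking.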
LLM Fusion Builds Traceable Airport KGs
This work presents a dual-stage framework that fuses symbolic Knowledge Engineering with LLMs to construct machine-readable Knowledge Graphs for Total Airport Management, addressing data silos. Document-level processing outperforms segment-based inference in recovering non-linear procedural dependencies. The framework ensures traceability through probabilistic discovery and deterministic source anchoring, automating workflow synthesis from text.
Lightweight AI Framework for PV Arc-Fault Detection
The LD-framework enables reliable DC arc-fault detection in PV systems despite spectral interference, hardware heterogeneity, and operating drifts. It achieves 0.9999 accuracy, 0% false trips across nuisance conditions, and effective cross-hardware transfer with minimal data. Cloud-edge collaboration supports long-term self-adaptation, recovering precision from 21% to 95% in field tests.
GUIDE Fixes GUI Agent Domain Bias
GUIDE is a training-free, plug-and-play framework that resolves domain bias in GUI agents by retrieving expertise from web tutorial videos. It uses a subtitle-driven Video-RAG pipeline for three-stage retrieval and an automated annotation pipeline based on inverse dynamics to inject planning and grounding knowledge. Experiments on OSWorld show over 5% success rate improvements and fewer execution steps without model changes.
DesignWeaver Boosts Novice T2I Product Design
DesignWeaver is an interface that aids novices in generating prompts for text-to-image models by extracting key product design dimensions from images into a selectable palette. Informed by a study of 12 expert designers, it promotes visual-guided exploration. A 52-novice evaluation showed more diverse, innovative designs but raised expectations beyond current model capabilities.
CANGuard: Hybrid CNN-GRU for CAN Intrusion Detection
CANGuard introduces a spatio-temporal deep learning model combining CNN, GRU, and attention to detect DoS and spoofing attacks in vehicle CAN networks. Trained on the CICIoV2024 dataset, it outperforms state-of-the-art methods across key metrics like accuracy and F1-score. Ablation studies and SHAP analysis validate its components and feature importance.
CADSmith: Multi-Agent CAD Gen with Validation
CADSmith is a multi-agent pipeline that generates CadQuery code from natural language and refines it through nested loops: an inner loop for execution errors and an outer loop for geometric validation using OpenCASCADE metrics and VLM visual assessment. It employs retrieval-augmented generation over API docs without fine-tuning. Benchmarks show a 100% execution rate, with F1 raised to 0.9846, IoU raised to 0.9629, and Chamfer Distance reduced to 0.74.
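The nested refinement loops can be sketched generically. This is only an outline of the control flow under stated assumptions: `generate`, `execute`, and `validate` are hypothetical callables standing in for the code-generating agent, the CadQuery execution step, and the geometric/visual validator, and the retry limits are arbitrary.

```python
from typing import Callable, Optional, Tuple


def nested_refine(
    generate: Callable[[Optional[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    validate: Callable[[str], Tuple[bool, str]],
    max_outer: int = 3,
    max_inner: int = 3,
) -> Optional[str]:
    """Return code that both runs and passes validation, or None."""
    feedback: Optional[str] = None
    for _ in range(max_outer):
        code = generate(feedback)
        ran, output = False, "not executed"

        # Inner loop: regenerate until the code executes without error.
        for _ in range(max_inner):
            ran, output = execute(code)
            if ran:
                break
            code = generate("execution error: " + output)

        if not ran:
            # Execution never succeeded; carry the error to the next round.
            feedback = "execution error: " + output
            continue

        # Outer loop: check the executed result and feed back any mismatch.
        passed, feedback = validate(output)
        if passed:
            return code
    return None
```

Keeping the two loops separate means cheap execution failures are resolved before the more expensive validation step runs.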
BeSafe-Bench Exposes AI Agent Safety Risks
BeSafe-Bench introduces a benchmark for evaluating behavioral safety risks of situated AI agents in functional environments across web, mobile, embodied VLM, and VLA domains. It augments tasks with nine safety-critical risk categories and employs a hybrid evaluation using rule-based checks and LLM-as-a-judge. Tests on 13 agents show even top performers complete under 40% of tasks fully safely, highlighting urgent safety alignment needs.
AutoB2G: LLM-Driven Auto B2G Simulator
AutoB2G is an LLM-based agentic framework that automates building-grid co-simulation workflows using natural language descriptions. It extends CityLearn V2 for B2G interactions and employs the SOCIA framework with a DAG-structured codebase to guide LLM execution and refinement. Experiments show it coordinates simulations to boost grid performance metrics.
AIRA_2 Breaks AI Agent Bottlenecks
AIRA_2 tackles three bottlenecks in AI research agents: single-GPU limits, validation generalization gaps, and fixed LLM operators. It employs async multi-GPU workers, Hidden Consistent Evaluation, and ReAct agents for better throughput and reliability. It reaches a percentile rank of 71.8% on MLE-bench-30 at 24 hours, rising to 76.0% at 72 hours, surpassing the prior SOTA.
A-SelecT Automates DiT Timestep Selection
A-SelecT dynamically selects the most information-rich timestep from DiT transformer features in a single run. It eliminates exhaustive timestep searches and suboptimal feature selection, boosting training efficiency. Experiments show superior performance on classification and segmentation benchmarks over prior diffusion methods.
Ex-OpenAI Kass Predicts AI Winners Boom
Former OpenAI Head of Go-To-Market Zack Kass says AI investments are early but will create many winners. He is a top voice guiding businesses to harness AI. Bloomberg interviewed him at the Citi Macro Conference in Hong Kong.
Space Computing Chain Now Fully Forming
Half a year after the initial hype, space computing has advanced significantly. A complete industry chain is emerging, from chips to in-orbit deployment. This marks key progress in orbital compute infrastructure.
China Boosts AI in Eco-Governance
China's Ministry of Ecology and Environment revealed on March 30 that AI and big data are deeply integrated into ecological monitoring and enforcement, delivering tangible results. This ushers environmental governance into an intelligent era.
Chinese World Model Tops Global Benchmarks
A Chinese-developed world model ranks first globally, far surpassing models from Google and Nvidia. It achieves near-perfect 3D accuracy scores. The company behind it secured 1B RMB in its latest Pre-B funding round.
Grok BBQ Images Spark Japan-US X Trend
Japanese and US X users are uniting in a viral movement sharing mouthwatering American-style BBQ images. It began with a Japanese user's AI-generated illustrations or photos, amplified by a new generative AI feature on X. Elon Musk expressed admiration for the phenomenon.
Slow LLM Delays AI Responses Intentionally
Sam Lavigne released the open-source Slow LLM, a Chrome extension and DNS tool that slows responses from ChatGPT, Claude, Grok, and Gemini to add friction and combat AI over-reliance. It intercepts browser Fetch calls and trickles the data out, promoting independent thinking. It was developed with Claude but finished manually after the tool slowed its own development.
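The trickling idea can be shown with a short Python analogue. This is not the extension's code, which runs in the browser against Fetch responses; the `trickle` function, its parameters, and the injectable `sleep` argument are hypothetical and only demonstrate re-emitting a stream in small, delayed pieces.

```python
import time
from typing import Callable, Iterable, Iterator


def trickle(
    chunks: Iterable[bytes],
    piece_size: int = 4,
    delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Re-emit a byte stream in small pieces, pausing before each one."""
    for chunk in chunks:
        for start in range(0, len(chunk), piece_size):
            sleep(delay_s)  # the added friction
            yield chunk[start:start + piece_size]


# The content is unchanged; only its arrival is slowed.
pieces = list(trickle([b"hello world", b"!"], delay_s=0.0))
```

The payload arrives intact, so the only effect a reader would notice is the slower pace.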
DeepSeek 12-Hour Outage Hits Millions
Chinese AI startup DeepSeek endured a 12-hour outage from Sunday evening to Monday morning, cutting off hundreds of millions of users from its chatbot website and app. The Hangzhou-based company investigated the issue and issued fixes between 1am and 9am Monday. Complaints surged as rivals gained market share.
DeepSeek Web Upgrade Crashes 11hrs, New Model Teased
DeepSeek rolled out a major upgrade to its web version, causing an 11-hour outage that shot up trending-search lists. This follows a quiet period during the 'lobster era.' The event strongly suggests a new model is imminent.