All Updates

Page 746 of 751

February 12, 2026

📄
ArXiv AI • 66d ago

CLI-Gym Scales CLI Task Generation

CLI-Gym generates 1,655 CLI tasks via agentic environment inversion from Dockerfiles. It simulates histories to create buggy states and derives tasks with error messages. Fine-tuned LiberCoder boosts Terminal-Bench scores by 21.1%.

#research#cli-gym#v1
📄
ArXiv AI • 66d ago

C^2ROPE Advances 3D Multimodal Reasoning

C^2ROPE enhances Rotary Position Embedding for 3D Large Multimodal Models by addressing spatial locality loss and long-term attention decay. It introduces spatio-temporal continuous positional embeddings using triplet hybrid indices and Chebyshev Causal Masking. Evaluations show superior performance on 3D scene reasoning and VQA benchmarks.

#research#c2rope#v1
📄
ArXiv AI • 66d ago

BNRM Prevents Reward Hacking in RLHF

BNRM introduces Bayesian non-negative reward modeling to combat reward hacking in RLHF. It uses sparse latent factors for disentangled, debiased rewards. Scalable amortized VI enables end-to-end training on LLMs.

#research#bnrm#v1
📄
ArXiv AI • 66d ago

Blockwise Advantages for Multi-Objective RL

Introduces Blockwise Advantage Estimation for GRPO in structured generations, assigning per-objective advantages to avoid interference. Uses Outcome-Conditioned Baseline to estimate advantages without nested rollouts. Competitive on math tasks with uncertainty estimation.

#research#grpo#v1
📄
ArXiv AI • 66d ago

Benchmark Tests TSFMs on Energy Loads

Multi-dimensional zero-shot benchmark evaluates four TSFMs (including Chronos, Moirai, and TinyTimeMixer) against baselines on ERCOT data. Tests context sensitivity, calibration, and robustness to shifts such as COVID and Winter Storm. Top models hit MASE 0.31; Chronos-2 is the best calibrated.

#research#tsfm-benchmark#v1
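
For readers unfamiliar with the MASE figure quoted above, here is a minimal sketch of the standard metric: the forecast's mean absolute error divided by the in-sample error of a seasonal naive forecast. The function name `mase`, its parameters, and the toy numbers are assumptions made for illustration; they are not taken from the benchmark.

```python
from typing import Sequence


def mase(
    actual: Sequence[float],
    forecast: Sequence[float],
    history: Sequence[float],
    season: int = 1,
) -> float:
    """Mean absolute scaled error (illustrative helper, not the benchmark's code).

    The numerator is the forecast's mean absolute error. The denominator is
    the mean absolute error of a seasonal naive forecast on the history
    (each point predicted by the value `season` steps earlier).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if len(history) <= season:
        raise ValueError("history must be longer than the seasonal period")

    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [
        abs(history[i] - history[i - season]) for i in range(season, len(history))
    ]
    naive_mae = sum(naive_errors) / len(naive_errors)
    if naive_mae == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")
    return forecast_mae / naive_mae


# Toy data: the naive in-sample error is 2.0 and the forecast error is 1.0.
score = mase(actual=[18, 20], forecast=[17, 21], history=[10, 12, 14, 16])
```

A value below 1 means the forecast beats the naive baseline on average, which is how a score such as 0.31 is usually read.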
📄
ArXiv AI • 66d ago

Benchmark for Self-Evolving Coding LLMs

EvoCodeBench evaluates LLM-driven coding systems on self-evolution, efficiency, and human-comparable performance across languages. Tracks dynamics like solving time and improvements over iterations. Enables cross-language robustness analysis.

#research#evocodebench#v1
📄
ArXiv AI • 66d ago

Auto-Shaping Rewards for Robust Control

Proposes causal reward shaping from offline data for continuous RL under confounders. Derives tight value bounds via causal Bellman equation for PBRS. Outperforms SAC on benchmarks.

#research#reward-shaping#v1
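
PBRS in the summary above stands for potential-based reward shaping. The sketch below shows only the classic shaping rule, r' = r + γΦ(s') − Φ(s); it does not reproduce the paper's causal derivation of the potential. The `potential` function, the toy state space, and the convention of treating the terminal potential as zero are assumptions for illustration.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Classic potential-based shaping: r + gamma * phi(s') - phi(s).

    Assumption: the potential of a terminal state is taken to be zero,
    a common convention for episodic tasks.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def toy_potential(position: int) -> float:
    """Hypothetical potential: negative distance to a goal at position 10."""
    return -abs(10 - position)


# Moving from position 3 to 4 (closer to the goal) earns a shaping bonus.
bonus = shaped_reward(0.0, state=3, next_state=4, potential=toy_potential, gamma=1.0)
```

In the paper's setting the potential would presumably come from the value bounds it derives from offline data; any function of state can be plugged in here.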
📄
ArXiv AI • 66d ago

Authenticated Workflows Secure Agentic AI

Introduces authenticated workflows as a complete trust layer for enterprise agentic AI, protecting prompts, tools, data, and context. Enforces intent and integrity via cryptography and MAPL policy language. Integrates with nine AI frameworks for deterministic security.

#research#authenticated-workflows#v1
📄
ArXiv AI • 66d ago

AugVLA-3D Boosts VLA with Depth Augmentation

AugVLA-3D integrates depth estimation from RGB inputs via VGGT to enrich 3D features in vision-language-action models. An action assistant module ensures consistency with control tasks. It enhances generalization and robustness in complex 3D robotic environments.

#research#augvla-3d#v1
📄
ArXiv AI • 66d ago

AudioRouter Boosts LALMs via RL Tool Use

AudioRouter applies RL to teach large audio language models (LALMs) when to use external audio tools, improving fine-grained perception without heavy training. It optimizes a lightweight routing policy while freezing the base model. Achieves big gains on benchmarks with 600x less data than traditional methods.

#research#audiorouter#audio-ai
📄
ArXiv AI • 66d ago

Aletheia Powers Autonomous Math Research

Aletheia is a math research agent that generates, verifies, and revises solutions using an advanced version of Gemini Deep Think. It achieves milestones such as fully AI-generated papers, human-AI collaborations, and solutions to four open Erdős problems. The work proposes standards for quantifying AI autonomy in math.

#research#aletheia#v1
📄
ArXiv AI • 66d ago

AI-PACE Framework Boosts Medical AI Education

AI-PACE synthesizes literature to propose a framework for integrating AI into medical education across the learning continuum. It identifies key competencies, curricular approaches, and strategies emphasizing longitudinal integration and interdisciplinary collaboration. The framework balances technical fundamentals with clinical applications to prepare physicians for AI-enhanced healthcare.

#research#ai-pace#v1
📄
ArXiv AI • 66d ago

AI Fails Basic Arithmetic Despite Advanced Math Wins

Frontier AI models excel at advanced math but consistently fail at multi-digit integer addition. Errors stem primarily from operand misalignment or carry failures, which together explain most mistakes in top models such as Claude, GPT, and Gemini. The errors are linked to tokenization and to random carry failures.

#research#ai-rithmetic#v1
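
To make the "carry failure" category above concrete, here is a small sketch that checks whether a wrong answer matches what column-wise addition produces when every carry is dropped. This is an illustrative heuristic under stated assumptions, not the paper's error taxonomy or code; the function names and the label strings are hypothetical.

```python
def add_dropping_carries(a: int, b: int) -> int:
    """Add two non-negative integers column by column, discarding every carry."""
    if a < 0 or b < 0:
        raise ValueError("only non-negative integers are supported")
    result, place = 0, 1
    while a > 0 or b > 0:
        column_sum = (a % 10 + b % 10) % 10  # keep the digit, drop the carry
        result += column_sum * place
        a //= 10
        b //= 10
        place *= 10
    return result


def classify_addition_answer(a: int, b: int, answer: int) -> str:
    """Label a model's answer to a + b (hypothetical labels for illustration)."""
    if answer == a + b:
        return "correct"
    if answer == add_dropping_carries(a, b):
        return "carry_failure"
    return "other"


# 58 + 67: the units column gives 15 (digit 5), the tens column gives 11 (digit 1).
no_carry = add_dropping_carries(58, 67)
```

A real analysis would also need to handle partially dropped carries and misaligned operands, which this sketch deliberately leaves under "other".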
📄
ArXiv AI • 66d ago

AgentTrace Enables AI Agent Observability

AgentTrace instruments LLM agents for structured logging across operational, cognitive, and contextual traces. Provides runtime transparency for security and monitoring in high-stakes settings. Minimal overhead supports accountability and risk analysis.

#launch#agenttrace#v1
📄
ArXiv AI • 66d ago

Affordances Build Partial LLM World Models

Proves LLMs possess predictive partial-world models via task-agnostic affordances for intents. Introduces distribution-robust affordances for multi-task efficiency. Reduces search branching in robotics, outperforming full world models.

#research#affordance-models#v1
📄
ArXiv AI • 66d ago

Adversarial Threat Detection in Autonomous Driving

AD² analyzes the vulnerability of end-to-end driving agents such as Transfuser to physics, EMI, and digital attacks in CARLA. Driving scores drop by up to 99% under these threats. Proposes a lightweight attention-based detector for spatial-temporal consistency.

#research#ad2#v1
📄
ArXiv AI • 66d ago

Adapters Unlock Reliable Self-Interpretation

Lightweight adapters trained on interpretability artifacts enable reliable self-interpretation in frozen LMs. A simple scalar affine adapter outperforms baselines in feature labeling, topic identification, and implicit reasoning decoding. Gains scale with model size, driven mostly by learned bias.

#research#self-interpretation#v1
📄
ArXiv AI • 66d ago

ADAlign Auto-Adapts Graph Domains

ADAlign tackles graph domain adaptation by adaptively aligning discrepancies via Neural Spectral Discrepancy (NSD). Uses neural characteristic functions and minimax sampling without heuristics. Outperforms SOTA on 10 datasets with efficiency gains.

#research#adalign#v1
📄
ArXiv AI • 66d ago

1% Params Beat Full Fine-Tuning

CoLin introduces a 1% parameter low-rank complex adapter for vision foundation models. It resolves convergence issues in composite matrices with tailored loss. Surpasses full fine-tuning and delta-tuning on detection, segmentation, and classification.

#research#arxiv-ai#v1
📱
Ifanr (爱范儿) • 66d ago

AI Siri Before Cook Retires?

The article questions whether Apple's AI-upgraded Siri will launch before CEO Tim Cook retires. It emphasizes that while delays are tolerable, outright failure is unacceptable. This reflects ongoing uncertainty around Apple's AI assistant rollout.

#apple#ai-siri#voice-assistant