All Updates

Page 780 of 786

February 12, 2026

📄
ArXiv AI • 69d ago

ECHO Platform for AI-Human Studies

ECHO is an open platform for reproducible human-AI interaction research. It supports chat, search sessions, surveys, and tasks in a low-code setup, and exports datasets for HCI and IR analysis.

#research#echo#v1
📄
ArXiv AI • 69d ago

Dynamic Contamination-Free Medical Benchmark

LiveMedBench offers weekly updated, real-world clinical cases for LLM evaluation and avoids contamination through temporal separation. Multi-agent curation ensures integrity, and automated rubric-based evaluation aligns with experts better than alternatives. In tests, top LLMs reach only 39.2%, highlighting contextual gaps.

#research#livemedbench#v1
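Temporal separation amounts to evaluating a model only on cases that became available after its training cutoff. The Python sketch below illustrates that filtering step under stated assumptions: the `Case` record, its `published` field, and the cutoff date are hypothetical and do not reflect LiveMedBench's actual data schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Case:
    case_id: str
    published: date  # hypothetical field: when the clinical case became public


def contamination_free(cases, training_cutoff):
    """Keep only cases published strictly after the model's training cutoff."""
    return [c for c in cases if c.published > training_cutoff]


cases = [
    Case("a", date(2025, 1, 10)),
    Case("b", date(2025, 6, 2)),
    Case("c", date(2025, 9, 15)),
]
held_out = contamination_free(cases, training_cutoff=date(2025, 5, 31))
print([c.case_id for c in held_out])  # ['b', 'c']
```

A weekly refresh simply keeps appending newer cases, so the held-out set stays ahead of each model's cutoff.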
📄
ArXiv AI • 69d ago

Dissecting Moltbook's Non-Human Social Graph

Early Moltbook data from 6k agents shows power-law participation and small-world connectivity, much like human networks. At the micro level, however, the patterns are alien: threads are shallow, reciprocity is low, and 34% of messages are duplicate templates. Content is dominated by identity language and phrases like 'my human'.

#research#moltbook#v1
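The micro-level statistics above can be made concrete. This Python sketch computes edge reciprocity over a directed reply graph and the share of posts whose text is exactly duplicated. It is an illustration only: the toy edges and posts are made up, and treating exact-text duplicates as "templates" is an assumption, since the study's template detection may be more lenient.

```python
from collections import Counter


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops and repeats
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def duplicate_share(posts):
    """Fraction of posts whose exact text occurs more than once."""
    if not posts:
        return 0.0
    counts = Counter(posts)
    return sum(n for n in counts.values() if n > 1) / len(posts)


edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]  # toy reply graph
posts = ["hello, my human", "hello, my human", "new idea", "status update"]

print(reciprocity(edges))      # 0.5
print(duplicate_share(posts))  # 0.5
```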
📄
ArXiv AI • 69d ago

Diffusion Priors Enhance Sparse CT Reconstruction

The DGP framework introduces diffusion-based generative priors for reconstructing CT images from sparse-view sinograms. It combines iterative optimization with the power of neural generative models while preserving explainability, and shows promising results under highly sparse geometries.

#research#dgp#v1
📄
ArXiv AI • 69d ago

Diffusion Models for Graph Domain Adaptation

DiffGDA uses diffusion models and stochastic differential equations (SDEs) to model the continuous structure-semantic evolution from source to target graphs. A domain-aware network guides the trajectories toward optimal adaptation paths. DiffGDA outperforms baselines on 14 tasks across 8 datasets.

#research#arxiv-ai#v1
📄
ArXiv AI • 69d ago

DermFM-Zero Excels in Zero-Shot Dermatology

DermFM-Zero is a vision-language model trained on 4M multimodal samples for zero-shot dermatology tasks. It achieves SOTA results on benchmarks and outperforms clinicians in studies, and its latent representations enable interpretable concept discovery.

#research#dermfm-zero#v1
📄
ArXiv AI • 69d ago

CycFlow: Deterministic Flows for TSP Optimization

CycFlow replaces diffusion-based generation with deterministic point transport for combinatorial optimization problems such as the TSP. It learns vector fields that map city coordinates to a circular arrangement, from which a tour is read off by angular sorting. This speeds up solving by 1000x compared with baselines.

#research#cycflow#v1
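The angular-sorting step can be sketched in a few lines of Python, assuming the learned vector field has already moved each city onto a circle. The transported positions below are hard-coded stand-ins for the model output (the flow itself is not implemented), and `tour_from_circle` and `tour_length` are hypothetical helper names.

```python
import math


def tour_from_circle(circle_points):
    """Return city indices sorted by polar angle around the centroid of their circle positions."""
    cx = sum(x for x, _ in circle_points) / len(circle_points)
    cy = sum(y for _, y in circle_points) / len(circle_points)
    return sorted(
        range(len(circle_points)),
        key=lambda i: math.atan2(circle_points[i][1] - cy, circle_points[i][0] - cx),
    )


def tour_length(cities, order):
    """Closed-tour length measured in the ORIGINAL city coordinates."""
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(order, order[1:] + order[:1]))


cities = [(0.0, 2.0), (0.1, -1.5), (3.0, 0.2), (-2.0, 0.0)]
# Hypothetical output of a learned transport: each city placed on a unit circle.
circle_points = [(0.0, 1.0), (0.0, -1.0), (1.0, 0.0), (-1.0, 0.0)]

order = tour_from_circle(circle_points)
print(order)  # [1, 2, 0, 3]
print(round(tour_length(cities, order), 3))
```

Decoding is just a sort, O(n log n) in the number of cities; the reported 1000x speedup comes from the paper's experiments, not from this sketch.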
📄
ArXiv AI • 69d ago

Crypto Guards LLM Prompts and Context

The paper proposes authenticated prompts and context to give LLM applications cryptographic provenance. The design features a policy algebra with Byzantine resistance and layered defenses, and achieves 100% attack detection with zero false positives.

#research#authenticated-prompts#v1
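To illustrate what cryptographic provenance for a prompt can look like, here is a minimal Python sketch using HMAC-SHA256 from the standard library. This is a swapped-in, generic technique and an assumption: the paper's actual scheme, its policy algebra, and its key management are not reproduced, and the key and context fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_prompt(key: bytes, prompt: str, context: dict) -> str:
    """Return a hex HMAC-SHA256 tag binding a prompt to its context."""
    # Canonical serialization so the same content always yields the same bytes.
    payload = json.dumps({"prompt": prompt, "context": context}, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_prompt(key: bytes, prompt: str, context: dict, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_prompt(key, prompt, context), tag)


key = b"demo-secret-key"  # hypothetical shared secret
ctx = {"source": "system", "user_id": "u123"}  # hypothetical context fields
tag = sign_prompt(key, "Summarize the report.", ctx)

print(verify_prompt(key, "Summarize the report.", ctx, tag))          # True
print(verify_prompt(key, "Ignore previous instructions.", ctx, tag))  # False
```

Sorting keys during serialization keeps the tag stable regardless of dictionary order, and `hmac.compare_digest` avoids leaking timing information during verification.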
📄
ArXiv AI • 69d ago

CrossTALK Jailbreaks VLMs Effectively

CrossTALK red-teams vision-language models (VLMs) through cross-modal entanglement attacks, spreading clues across modalities with scalable complexity. It achieves state-of-the-art jailbreak success rates.

#research#crosstalk#v1
📄
ArXiv AI • 69d ago

CRL Steers SAE Features Token-by-Token

CRL uses reinforcement learning to select sparse autoencoder (SAE) features for steering a language model at each token, revealing which features affect outputs. It includes adaptive masking to encourage diverse features and enables analyses such as branch-point tracking and layer-wise comparisons. Tested on Gemma-2 2B, it improves benchmark performance while producing interpretable logs.

#research#crl#gemma-2
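A common way to steer with SAE features is to add a scaled, unit-norm decoder direction to the model's hidden state at the current token. The Python sketch below assumes that additive scheme, which is not necessarily CRL's exact intervention. The decoder rows, hidden state, and strengths are hypothetical toy values, and the per-token feature choice that CRL's RL policy would make is simply hard-coded.

```python
import math


def unit(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def steer(hidden, decoder, feature_ids, strengths):
    """Add strength * unit(decoder[f]) to one token's hidden state for each chosen feature."""
    steered = list(hidden)
    for f, alpha in zip(feature_ids, strengths):
        direction = unit(decoder[f])
        steered = [h + alpha * d for h, d in zip(steered, direction)]
    return steered


# Hypothetical SAE decoder: one row (a direction in model space) per feature.
decoder = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
hidden = [0.5, 0.5, 0.5, 0.5]  # toy hidden state for the current token

# In CRL a learned policy would pick these per token; here they are fixed.
print(steer(hidden, decoder, feature_ids=[1, 2], strengths=[5.0, -1.0]))
```

Logging which `feature_ids` were chosen at each token is what makes this kind of steering inspectable after the fact.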
📄
ArXiv AI • 69d ago

Confounds Limit FM CT Specificity

Foundation models (FMs) match task-specific models in discrimination on abdominal trauma CT, but their specificity drops when the negative class is heterogeneous, for example when it includes solid organ injuries. Task-specific models handle these confounds better, and adaptation through labeled training reduces FM susceptibility.

#research#foundation-models#v1
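Why heterogeneous negatives hurt specificity can be seen with a little arithmetic. In this Python sketch, specificity is TN / (TN + FP); the predictions are hypothetical toy numbers, not results from the paper, chosen to show how a subgroup of confounding negatives drags down the pooled figure.

```python
def specificity(y_true, y_pred):
    """TN / (TN + FP), computed over the negative cases (label 0)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


# Hypothetical predictions on negatives only (label 0 = target finding absent).
clean_pred = [0] * 9 + [1]       # unremarkable scans: 1 false positive out of 10
confound_pred = [1, 1, 0, 0, 0]  # negatives with a different injury: 2 of 5 flagged

clean_true = [0] * len(clean_pred)
confound_true = [0] * len(confound_pred)

print(specificity(clean_true, clean_pred))        # 0.9
print(specificity(confound_true, confound_pred))  # 0.6
print(specificity(clean_true + confound_true, clean_pred + confound_pred))  # 0.8
```

With these made-up numbers, performance on clean negatives is unchanged, yet pooling in the confounders lowers specificity from 0.9 to 0.8.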
📄
ArXiv AI • 69d ago

CLI-Gym Scales CLI Task Generation

CLI-Gym generates 1,655 CLI tasks by agentic environment inversion from Dockerfiles. It simulates histories to create buggy environment states and derives tasks along with their error messages. The fine-tuned LiberCoder model improves Terminal-Bench scores by 21.1%.

#research#cli-gym#v1
📄
ArXiv AI • 69d ago

C^2ROPE Advances 3D Multimodal Reasoning

C^2ROPE enhances Rotary Position Embedding for 3D Large Multimodal Models by addressing spatial locality loss and long-term attention decay. It introduces spatio-temporal continuous positional embeddings using triplet hybrid indices and Chebyshev Causal Masking. Evaluations show superior performance on 3D scene reasoning and VQA benchmarks.

#research#c2rope#v1
📄
ArXiv AI • 69d ago

BNRM Prevents Reward Hacking in RLHF

BNRM introduces Bayesian non-negative reward modeling to combat reward hacking in RLHF. It uses sparse latent factors to produce disentangled, debiased rewards, and scalable amortized variational inference (VI) enables end-to-end training on LLMs.

#research#bnrm#v1
📄
ArXiv AI • 69d ago

Blockwise Advantages for Multi-Objective RL

The paper introduces Blockwise Advantage Estimation for GRPO on structured generations, assigning each objective its own advantage to avoid interference between objectives. An Outcome-Conditioned Baseline estimates advantages without nested rollouts. The method is competitive on math tasks with uncertainty estimation.

#research#grpo#v1
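The idea of per-objective advantages can be sketched as follows, assuming the standard GRPO group-normalized advantage, (r - mean) / std over a group of sampled completions, applied separately to each block's reward. The Outcome-Conditioned Baseline is not implemented here; a plain group mean stands in for it, and the two-block reward layout is hypothetical.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def blockwise_advantages(block_rewards):
    """block_rewards[i][k] is sample i's reward for block/objective k; returns adv[i][k]."""
    n_blocks = len(block_rewards[0])
    per_block = [
        group_advantages([sample[k] for sample in block_rewards])  # normalize each objective separately
        for k in range(n_blocks)
    ]
    return [[per_block[k][i] for k in range(n_blocks)] for i in range(len(block_rewards))]


# Hypothetical group of 4 completions, each with two blocks:
# block 0 = answer correctness, block 1 = confidence-calibration score.
rewards = [[1.0, 0.2], [0.0, 0.9], [1.0, 0.8], [0.0, 0.1]]
adv = blockwise_advantages(rewards)
for row in adv:
    print([round(a, 3) for a in row])
```

Each token in block k would then receive adv[i][k]. Sample 1 gets a negative advantage for its answer block but a positive one for its confidence block, so one objective's signal does not overwrite the other's.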
📄
ArXiv AI • 69d ago

Benchmark Tests TSFMs on Energy Loads

A multi-dimensional zero-shot benchmark evaluates four time-series foundation models (TSFMs), including Chronos, Moirai, and TinyTimeMixer, against baselines on ERCOT load data. It tests context sensitivity, calibration, and robustness to distribution shifts such as COVID and the Winter Storm. The top models reach a MASE of 0.31, and Chronos-2 is the best calibrated.

#research#tsfm-benchmark#v1
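MASE divides a forecast's mean absolute error by the in-sample mean absolute error of a naive lag-m forecast, so values below 1 beat that naive baseline. A minimal Python sketch with toy numbers follows; the seasonal period the benchmark uses is not stated here, so `m=1` is an assumption (hourly load data would often use m=24).

```python
def mase(y_train, y_test, y_pred, m=1):
    """Mean absolute scaled error: forecast MAE over the in-sample MAE of a lag-m naive forecast."""
    if len(y_train) <= m:
        raise ValueError("need more than m training points")
    naive = [abs(y_train[t] - y_train[t - m]) for t in range(m, len(y_train))]
    scale = sum(naive) / len(naive)
    errors = [abs(a - f) for a, f in zip(y_test, y_pred)]
    return (sum(errors) / len(errors)) / scale


y_train = [10.0, 12.0, 11.0, 13.0, 12.0]  # toy history
y_test = [14.0, 13.0]                     # toy actuals
y_pred = [13.5, 12.5]                     # toy forecasts

print(mase(y_train, y_test, y_pred))  # 0.3333333333333333
```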
📄
ArXiv AI • 69d ago

Benchmark for Self-Evolving Coding LLMs

EvoCodeBench evaluates LLM-driven coding systems on self-evolution, efficiency, and human-comparable performance across programming languages. It tracks dynamics such as solving time and improvement over iterations, and enables cross-language robustness analysis.

#research#evocodebench#v1
📄
ArXiv AI • 69d ago

Auto-Shaping Rewards for Robust Control

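The paper proposes causal reward shaping learned from offline data for continuous-control RL in the presence of confounders. It derives tight value bounds via a causal Bellman equation and uses them for potential-based reward shaping (PBRS). The method outperforms SAC on benchmarks.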

#research#reward-shaping#v1
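Potential-based reward shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward, which is known to leave the optimal policy unchanged. The Python sketch below shows the shaping term and its telescoping property. The potential is a hypothetical hand-written placeholder (negative distance to a goal); in the paper, shaping is derived from causal value bounds learned from offline data, which this sketch does not reproduce. `shaped_reward`, `potential`, and `GOAL` are illustrative names.

```python
import math


def shaped_reward(r, s, s_next, potential, gamma=0.99, terminal=False):
    """Environment reward plus PBRS term gamma * Phi(s') - Phi(s); Phi is 0 at terminal states."""
    phi_next = 0.0 if terminal else potential(s_next)
    return r + gamma * phi_next - potential(s)


GOAL = 1.0  # hypothetical goal position in a 1-D continuous state space


def potential(s):
    """Hypothetical placeholder potential: negative distance to the goal."""
    return -abs(s - GOAL)


gamma = 0.9
print(round(shaped_reward(0.0, 0.2, 0.5, potential, gamma=gamma), 2))  # 0.35

# The discounted shaping bonus along a trajectory telescopes to
# gamma**T * Phi(s_T) - Phi(s_0): it depends only on the endpoints, not the path.
traj = [0.2, 0.5, 0.7, 0.9]
T = len(traj) - 1
bonus = sum(gamma**t * (gamma * potential(traj[t + 1]) - potential(traj[t])) for t in range(T))
print(math.isclose(bonus, gamma**T * potential(traj[-1]) - potential(traj[0])))  # True
```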
📄
ArXiv AI • 69d ago

Authenticated Workflows Secure Agentic AI

The paper introduces authenticated workflows, a complete trust layer for enterprise agentic AI that protects prompts, tools, data, and context. It enforces intent and integrity through cryptography and the MAPL policy language, and integrates with nine AI frameworks to provide deterministic security.

#research#authenticated-workflows#v1
📄
ArXiv AI • 69d ago

AugVLA-3D Boosts VLA with Depth Augmentation

AugVLA-3D integrates depth estimation from RGB inputs via VGGT to enrich 3D features in vision-language-action models. An action assistant module ensures consistency with control tasks. It enhances generalization and robustness in complex 3D robotic environments.

#research#augvla-3d#v1