All Updates

Page 897 of 907

February 12, 2026

📄
ArXiv AI • 79d ago

Stats Test Spots LLM Degradations

McNemar's test framework detects post-optimization LLM degradations via per-sample comparisons. Aggregates across benchmarks with controlled false positives. Flags 0.3% drops confidently.
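
A minimal sketch of the per-sample comparison behind McNemar's test, assuming each benchmark item is scored as correct or incorrect for both the baseline and the optimized model. The function name `mcnemar_exact` and the boolean-list inputs are illustrative assumptions; the summary does not say which variant of the test the paper uses or how it aggregates across benchmarks, so this shows only the standard exact two-sided test on one benchmark.

```python
from math import comb


def mcnemar_exact(baseline_correct, optimized_correct):
    """Exact two-sided McNemar test on paired per-sample outcomes.

    Hypothetical helper for illustration. Returns (b, c, p_value), where
    b counts samples only the baseline got right and c counts samples
    only the optimized model got right.
    """
    if len(baseline_correct) != len(optimized_correct):
        raise ValueError("both models must be scored on the same samples")

    pairs = list(zip(baseline_correct, optimized_correct))
    b = sum(1 for base, opt in pairs if base and not opt)
    c = sum(1 for base, opt in pairs if not base and opt)

    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0

    # Under the null hypothesis, b ~ Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data: 8 samples regress, 1 improves, 3 are unchanged.
baseline = [True] * 10 + [False] * 2
optimized = [False] * 8 + [True] * 2 + [True] + [False]
print(mcnemar_exact(baseline, optimized))  # (8, 1, 0.0390625)
```

Samples where both models agree drop out of the statistic, which is why a paired test can flag a small accuracy drop that a comparison of aggregate scores would miss.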

#research #llms #mcnemar
📄
ArXiv AI • 79d ago

Silence Boosts Collective Taste Judgment

Introduces Silence Routing framework for collective intelligence in taste domains using music preferences. Specifies when contributors should speak, report, or stay silent. Simulation shows accuracy gains over baselines only when silence is allowed.

#research #silence-routing #v1
📄
ArXiv AI • 79d ago

SigLIP Boosts Multi-Label ECG Classification

Adapts SigLIP contrastive learning with a Jaccard-based sigmoid loss for multi-label ECG classification using real-world data. Incorporates medical knowledge and techniques like higher embedding dimensions and random cropping. Per-label analysis identifies prediction challenges across ECG findings.

#research #siglip-ecg #v1
📄
ArXiv AI • 79d ago

Semantic Labels Enhance TPRA Retrieval

Explores semantic labeling for TPRA questionnaires using LLMs and hybrid SSSL. Compares direct labeling vs. clustering and propagation. Improves retrieval when labels are discriminative.

#research #sssl-pipeline #v1
📄
ArXiv AI • 79d ago

Self-Supervised SR Quality Assessor

Proposes no-reference IQA for real-world super-resolved images using content-free SSL. Pretrains multi-SR model representations via contrastive learning. Includes new SRMORSS dataset for pretext training.

#research #s3-riqa #v1
📄
ArXiv AI • 79d ago

ScratchWorld Tests GUI Agents

Introduces ScratchWorld benchmark with 83 tasks for multimodal GUI agents in Scratch. Uses primitive/composite modes and execution-based evaluation. Exposes reasoning-acting gaps in state-of-the-art agents.

#research #scratchworld #v1
📄
ArXiv AI • 79d ago

Safety Alignment for Omni-Modal LLMs

OmniSteer addresses cross-modality vulnerabilities in OLLMs using AdvBench-Omni dataset and modality-semantics decoupling. Uncovers mid-layer dissolution and extracts golden refusal vector via SVD. Boosts refusal rate to 91.2% while preserving capabilities.

#research #omnisteer #v1
📄
ArXiv AI • 79d ago

SAF Improves Parkinson's ECoG Prediction

Introduces first reproducible ECoG dataset from rat models for Parkinson's disease prediction. Swap-Adversarial Framework (SAF) uses channel swapping and domain-adversarial training to tackle inter-subject variability and HDLSS issues. Outperforms baselines in cross-subject, cross-session, and cross-dataset settings, generalizing to EEG.

#research #saf #v1
📄
ArXiv AI • 79d ago

RSHallu: Hallucination Eval for RS MLLMs

RSHallu studies hallucinations in remote-sensing MLLMs with a new taxonomy, benchmark, and dual-mode checker. Provides datasets for mitigation via training and plug-and-play strategies. Improves hallucination-free rates by up to 21% on RS tasks.

#research #rshallu #v1
📄
ArXiv AI • 79d ago

Robust Policy Optimization for Recommendations

DRPO tackles model collapse in off-policy generative recommendation via optimistic distributionally robust optimization. Proves hard filtering recovers high-quality data from noisy logs. Achieves SOTA on mixed-quality benchmarks.

#research #drpo #v1
📄
ArXiv AI • 79d ago

RLCER Evolves CoT Rubrics

RLCER reinforces chain-of-thought via self-evolving rubrics without human labels. Outperforms outcome-centric RLVR on reasoning tasks. Rubrics boost inference as prompts.

#research #rlcer #v1
📄
ArXiv AI • 79d ago

Rewiring Sparsifies Efficient GNNs

Explores adaptive rewiring and sparsification for scalable GNNs using Erdős-Rényi models. Tested on power grid N-1 analysis with GCN/GIN. Balances sparsity for generalization via tuning and early stopping.

#research #adaptive-rewiring #v1
📄
ArXiv AI • 79d ago

RealHD Dataset Detects AI Fake Images

RealHD offers 730k high-quality real and AI-generated images from advanced methods like text-to-image and inpainting. Addresses prior dataset flaws with diverse prompts, metadata, and masks. Includes lightweight noise entropy detection baseline with strong generalization.

#research #realhd #v1
📄
ArXiv AI • 79d ago

Quantum ICO Merges Sensing and Computation

Proposes quantum scheme using indefinite causal order (ICO) for integrated sensing and computation on one state. Agent superposes observation-then-compute and compute-then-observation orders. Achieves low losses in magnetic navigation tasks.

#research #ico-agent #v1
📄
ArXiv AI • 79d ago

Quadrupeds Cooperate for Super Jumps

Co-jump enables two quadrupeds to synchronize jumps of up to 1.5 m via MAPPO and curriculum learning, without communication. Achieves 144% height gain over solo robots using proprioception. Transfers from sim to hardware.

#research #arxiv-ai #v1
📄
ArXiv AI • 79d ago

μpscaling Optimizes Model Warm Starts

Proposes principled upscaling for model widths inspired by μP, with theory guaranteeing equivalence to widened versions. Extends μTransfer for hyperparameter scaling, avoiding costly retuning at larger sizes. Applicable to diverse architectures and optimizers with infinite-width analysis.

#research #pscaling #v1
📄
ArXiv AI • 79d ago

ProtoGLAD Enables Interpretable Graph Anomalies

ProtoGLAD detects graph-level anomalies by contrasting with nearest normal prototype graphs discovered via point-set kernels. It iteratively clusters normal graphs for unsupervised detection. Provides human-interpretable explanations outperforming black-box methods.

#research #protoglad #v1
📄
ArXiv AI • 79d ago

Privacy Shield for Mobile GUI Agents

Framework anonymizes sensitive UI data with type-preserving placeholders for cloud-based GUI agents. Detects PII across screenshots, XML, and instructions via layered architecture. Achieves top privacy-utility trade-off on benchmarks.

#research #gui-anonymizer #v1
📄
ArXiv AI • 79d ago

Privacy-Aware XR Collaboration Framework

PRISM-XR integrates multimodal LLMs for XR collaboration while filtering sensitive data from XR headset frames on edge servers. It features lightweight registration and customizable content-sharing for efficient synchronization. Evaluations show 90% accuracy in user requests and strong privacy protection.

#research #prism-xr #v1
📄
ArXiv AI • 79d ago

Police-Car Swerving Absorbs Traffic Jams

Proposes SVDD-JAD strategy mimicking police swerving to suppress stop-and-go waves via slow-in/fast-out maneuvers. Analyzes five key parameters measurable with roadside detectors. SUMO simulations confirm no secondary waves triggered.

#research #svdd-jad #v1