All Updates
February 12, 2026
Topology Meets NNs Under Uncertainty
Integrates neural networks, topological data analysis, and Bayesian methods for AI in military domains. Covers image, time-series, and graph applications such as fraud detection. Emphasizes robustness and interpretability.
Tokens Enable Emergent Resource Rationality
Inference-time scaling in language models leads to adaptive resource rationality without explicit cost rewards. Models shift from brute-force to analytic strategies as task complexity rises. LRMs remain robust on challenging functions such as XOR/XNOR, unlike IT models.
TokaMark Launches Fusion Plasma Benchmark
TokaMark standardizes AI evaluation on MAST tokamak data with unified multi-modal access and 14 tasks. Harmonizes formats, metadata, and protocols for reproducible comparisons. Includes baseline model; fully open-sourced for community use.
Text Boosts Multimodal Anomaly Detection
Text-guided framework enhances weakly supervised multimodal video anomaly detection. Employs in-context learning for anomaly text augmentation and multi-scale bottleneck Transformer for fusion. Achieves state-of-the-art on UCF-Crime and XD-Violence benchmarks.
δ_TCB Measures LLM Prediction Stability
Introduces δ_TCB metric to quantify LLM internal state robustness against perturbations, beyond traditional accuracy. Linked to output embedding geometry, it reveals prediction instabilities missed by perplexity. Correlates with prompt engineering in in-context learning.
Synthetic Underspecification for Agents
LHAW generates controllable underspecified long-horizon tasks by removing information across goals, constraints, inputs, and context. Validates via agent trials, classifying ambiguity impacts. Releases 285 task variants derived from existing benchmarks.
SynergyKGC Handles KG Heterogeneity
SynergyKGC fuses entity semantics with heterogeneous topologies via cross-modal synergy. Uses density-dependent anchoring and double-tower consistency. Improves KGC hit rates on benchmarks.
Step 3.5 Flash: Efficient Frontier AI
Step 3.5 Flash is a 196B MoE model with 11B active parameters for agentic tasks. Optimized with sliding-window attention and MTP-3 for low-latency inference. Matches frontier models on math, code, and agent benchmarks.
Stats Test Spots LLM Degradations
McNemar's test framework detects post-optimization LLM degradations via per-sample comparisons. Aggregates across benchmarks with controlled false positives. Confidently flags performance drops as small as 0.3%.
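The per-sample comparison at the core of this approach can be sketched with the classic exact McNemar test. This is an illustrative stdlib-only sketch, not the paper's implementation; the function name and return convention are assumptions.

```python
from math import comb

def mcnemar_exact(results_a, results_b):
    """Two-sided exact McNemar test on paired per-sample correctness.

    results_a, results_b: booleans (correct/incorrect) for the same
    samples under the baseline and the optimized model. Only the
    discordant pairs matter: b = baseline-only correct,
    c = optimized-only correct. Returns (b, c, p_value).
    """
    b = sum(1 for x, y in zip(results_a, results_b) if x and not y)
    c = sum(1 for x, y in zip(results_a, results_b) if not x and y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no disagreements: no evidence of change
    # Under H0 each discordant pair falls in b or c with probability 1/2,
    # so min(b, c) follows a Binomial(n, 0.5) tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Because the test conditions on discordant pairs, it can flag a small but systematic regression (many baseline-only wins, few optimized-only wins) even when overall accuracy barely moves.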
Silence Boosts Collective Taste Judgment
Introduces Silence Routing framework for collective intelligence in taste domains using music preferences. Specifies when contributors should speak, report, or stay silent. Simulation shows accuracy gains over baselines only when silence is allowed.
SigLIP Boosts Multi-Label ECG Classification
Adapts SigLIP contrastive learning with a Jaccard-based sigmoid loss for multi-label ECG classification using real-world data. Incorporates medical knowledge and techniques like higher embedding dimensions and random cropping. Per-label analysis identifies prediction challenges across ECG findings.
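One plausible reading of a "Jaccard-based sigmoid loss" is SigLIP's pairwise sigmoid objective with hard +1/-1 pair labels replaced by the Jaccard overlap of the two records' label sets. The sketch below encodes that interpretation with hypothetical names; the paper's exact formulation, temperature, and bias handling may differ.

```python
from math import exp, log

def jaccard(a, b):
    """Jaccard similarity of two label sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def jaccard_sigmoid_loss(logits, label_sets, temperature=1.0):
    """Pairwise sigmoid loss with Jaccard-soft targets.

    logits[i][j]: similarity score between ECG embedding i and text
    embedding j. The target for pair (i, j) is the Jaccard overlap of
    their label sets, so multi-label records that partially match are
    not treated as pure negatives.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        for j in range(n):
            t = jaccard(label_sets[i], label_sets[j])
            p = 1.0 / (1.0 + exp(-logits[i][j] * temperature))
            total += -(t * log(p) + (1 - t) * log(1 - p))
    return total / (n * n)
```

The soft target matters for ECG data, where findings co-occur: a record labeled {AF} and one labeled {AF, LBBB} get target 0.5 rather than 0.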
Semantic Labels Enhance TPRA Retrieval
Explores semantic labeling for TPRA questionnaires using LLMs and hybrid SSSL. Compares direct labeling vs. clustering and propagation. Improves retrieval when labels are discriminative.
Self-Supervised SR Quality Assessor
Proposes no-reference IQA for real-world super-resolved images using content-free SSL. Pretrains multi-SR model representations via contrastive learning. Includes new SRMORSS dataset for pretext training.
ScratchWorld Tests GUI Agents
Introduces ScratchWorld benchmark with 83 tasks for multimodal GUI agents in Scratch. Uses primitive/composite modes and execution-based evaluation. Exposes reasoning-acting gaps in state-of-the-art agents.
Safety Alignment for Omni-Modal LLMs
OmniSteer addresses cross-modality vulnerabilities in OLLMs using AdvBench-Omni dataset and modality-semantics decoupling. Uncovers mid-layer dissolution and extracts golden refusal vector via SVD. Boosts refusal rate to 91.2% while preserving capabilities.
SAF Improves Parkinson's ECoG Prediction
Introduces first reproducible ECoG dataset from rat models for Parkinson's disease prediction. Swap-Adversarial Framework (SAF) uses channel swapping and domain-adversarial training to tackle inter-subject variability and HDLSS issues. Outperforms baselines in cross-subject, cross-session, and cross-dataset settings, generalizing to EEG.
RSHallu: Hallucination Eval for RS MLLMs
RSHallu studies hallucinations in remote-sensing MLLMs with a new taxonomy, benchmark, and dual-mode checker. Provides datasets for mitigation via training and plug-and-play strategies. Improves hallucination-free rates by up to 21% on RS tasks.
Robust Policy Optimization for Recommendations
DRPO tackles model collapse in off-policy generative recommendation via optimistic distributionally robust optimization. Proves hard filtering recovers high-quality data from noisy logs. Achieves SOTA on mixed-quality benchmarks.
RLCER Evolves CoT Rubrics
RLCER reinforces chain-of-thought via self-evolving rubrics without human labels. Outperforms outcome-centric RLVR on reasoning tasks. Rubrics boost inference as prompts.
Rewiring Sparsifies Efficient GNNs
Explores adaptive rewiring and sparsification for scalable GNNs using Erdős-Rényi models. Tested on power grid N-1 analysis with GCN/GIN. Balances sparsity for generalization via tuning and early stopping.
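Erdős-Rényi-style sparsification itself reduces to keeping each edge independently with some probability. The stand-in below shows only that baseline mechanism, with a hypothetical name and a fixed-probability knob in place of the paper's adaptive rewiring:

```python
import random

def er_sparsify(edges, keep_prob, seed=0):
    """Erdős-Rényi-style sparsifier: retain each edge independently
    with probability keep_prob.

    keep_prob is the tuning knob: lower values cut message-passing
    cost but risk disconnecting the graph, mirroring the
    sparsity-vs-generalization trade-off discussed in the summary.
    A fixed seed makes the sampled subgraph reproducible.
    """
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < keep_prob]
```

For an N-1 contingency analysis on a power grid, keep_prob would be swept (with early stopping on validation loss) rather than fixed in advance.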