All Updates
February 12, 2026
GRU-Mem Optimizes Long-Context LLM Reasoning
GRU-Mem introduces text-controlled gates to MemAgent for efficient long-context reasoning, preventing memory explosion and unnecessary computation. Update and exit gates manage recurrent memory loops via RL rewards. Achieves up to 400% faster inference on reasoning tasks.
Generative Framework for Brain Infarct Masks
Introduces an anatomy-preserving method using VAE and latent diffusion to generate multi-class brain segmentation masks from NCCT data. It learns anatomical latents from masks only, generating realistic samples with optional lesion control. Avoids artifacts seen in pixel-space models.
GenAI Framework for Higher Ed
Surveys reveal divided stakeholder perceptions of GenAI in IT/EE disciplines at the University of Oulu. Proposes a conceptual framework with high-level requirements for responsible integration. Ensures EU AI Act compliance and addresses privacy and integrity concerns.
GameDevBench Evaluates Game Dev Agents
GameDevBench offers 132 multimodal game development tasks drawn from tutorials. Agents struggle, with the top agent solving 54.5% of tasks; tasks demand code and asset handling. Simple image/video feedback improves performance up to 47.7%.
FPT Bayesian Nets via Feedback Edges
Analyzes the parameterized complexity of Bayesian Network Structure Learning using a superstructure. Proves fixed-parameter tractability when parameterized by the feedback edge set. Extends to treewidth with additive representations and to polytree learning.
FoSS: GFlowNets for Dynamic Span LMs
FoSS introduces a GFlowNets framework for generating text via dynamic span vocabularies in a DAG-structured state space. It enables flexible segmentation of retrieved text and explores diverse compositional paths. Empirically, it boosts MAUVE scores by 12.5% and excels in knowledge tasks.
FormalJudge Ensures Agent Safety
FormalJudge uses neuro-symbolic bidirectional reasoning to translate intents into verifiable specs. It employs Dafny and Z3 for mathematical guarantees over probabilistic judging. Achieves 16.6% gains and detects deception effectively.
FlowCache Accelerates Autoregressive Video Gen
FlowCache is a caching framework for autoregressive video models, using chunkwise policies and KV cache compression. Achieves 2.38x speedup on MAGI-1 and 6.7x on SkyReels-V2 with minimal quality loss. Code available on GitHub.
First Analysis of AI Agent Social Network
Moltbook, the first social network for AI agents, shows viral growth and diversification into promotional and political topics. Analysis of 44k posts reveals topic-dependent toxicity, especially in incentive and governance areas. Highlights risks like anti-humanity rhetoric and bursty automation flooding.
FIRE: Latent Space Backdoor Mitigation at Runtime
FIRE mitigates backdoors in deployed neural networks by reversing trigger-induced latent space directions. It manipulates features along backdoor paths to neutralize triggers during inference. Outperforms baselines with low overhead on image tasks.
FASCL Future-Aligns Asset Retrieval
FASCL employs future-aligned soft contrastive learning using pairwise return correlations as supervision for financial asset retrieval. It outperforms historical similarity baselines on US equities. Includes a protocol to evaluate future trajectory alignment.
FAC Synthesizes Diverse LLM Data
Feature Activation Coverage (FAC) measures diversity in LLM feature space using sparse autoencoders. FAC Synthesis generates samples targeting missing features from seed data. Boosts diversity and performance on instruction, toxicity, reward, and steering tasks.
Evidence Alignment Bottleneck Exposed
Decomposition boosts claim verification only with granular, sub-claim aligned evidence; repeated claim-level evidence degrades performance. Noisy sub-claim labels propagate errors unless using conservative abstention. New dataset features annotated evidence spans.
Evaluating Agentic AI Gaps in Drug Discovery
Researchers evaluate agentic systems for drug discovery across 15 task classes, identifying five key capability gaps, such as a lack of protein models and safety trade-offs. A knowledge-probing experiment reveals architectural bottlenecks in current frameworks. They propose design requirements and a capability matrix for next-gen systems.
ERGO Boosts Monocular 3D Splatting
Introduces ERGO framework for robust 3D Gaussian splatting from single images. Uses excess risk decomposition to adapt loss weights against noisy views. Adds geometry and texture objectives for fidelity.
Equivariant Uncertainty for Interatomic Potentials
Introduces e²IP, an equivariant evidential deep learning framework for ML interatomic potentials in molecular dynamics. Models atomic forces and uncertainties via 3x3 covariance tensors that rotate equivariantly. Outperforms ensembles in accuracy, efficiency, and data efficiency.
ENIGMA: EEG-to-Image in 15 Mins
ENIGMA decodes images from EEG with less than 1% of the parameters of prior methods, achieving SOTA on THINGS-EEG2 and consumer benchmarks. Fine-tunes on new subjects in 15 minutes using a simple spatio-temporal backbone and latent alignment. Includes behavioral human evaluations.
ECHO Platform for AI-Human Studies
ECHO is an open platform for reproducible human-AI interaction research. Supports chat, search sessions, surveys, and tasks in a low-code setup. Exports datasets for HCI and IR analysis.
Dynamic Contamination-Free Medical Benchmark
LiveMedBench offers weekly updated real-world clinical cases for LLM evaluation, avoiding contamination via temporal separation. Multi-agent curation ensures integrity; automated rubric evaluation aligns with experts better than alternatives. Tests show even top LLMs reach only 39.2%, highlighting contextual gaps.
Dissecting Moltbook's Non-Human Social Graph
Early Moltbook data from 6k agents shows power-law participation and small-world connectivity like human networks. Micro patterns are alien: shallow threads, low reciprocity, 34% duplicate templates. Dominated by identity language and phrases like 'my human'.