All Updates
February 13, 2026
Surveying Multi-Agent Communication Paradigms
This survey frames multi-agent communication via the Five Ws, tracing the field's evolution from MARL's hand-designed protocols to emergent language and LLM-based systems. It highlights trade-offs in interpretability, scalability, and generalization across paradigms, and distills practical design patterns and open challenges for hybrid systems.
SemaPop: Semantic Population Synthesis
SemaPop uses LLMs for semantically conditioned population synthesis, deriving personas from survey data. Integrates with WGAN-GP for statistical alignment and behavioral realism. Achieves closer marginal and joint distribution matches while preserving diversity.
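WGAN-GP itself is standard machinery; below is a minimal sketch of its gradient-penalty term, the component SemaPop reportedly pairs with LLM-derived personas (the critic, tensor shapes, and lam=10 are generic conventions, not details from the paper):

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP term: push the critic's gradient norm toward 1 on points
    interpolated between real and generated samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    mixed = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(mixed)
    grads, = torch.autograd.grad(scores.sum(), mixed, create_graph=True)
    return lam * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```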
scPilot Automates Single-Cell Analysis
scPilot enables LLMs to reason over single-cell RNA-seq data using natural language and on-demand tools for annotation, trajectories, and TF targeting. Paired with the scBench benchmark, it shows gains such as an 11% accuracy lift from iterative reasoning. Transparent traces explain the biological insights.
SCF-RKL Advances Model Merging
SCF-RKL introduces sparse, distribution-aware model merging using reverse KL divergence to minimize interference. It selectively fuses complementary parameters, preserving stable representations and integrating new capabilities. Evaluations on 24 benchmarks show superior performance in reasoning, instruction following, and safety.
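The summary names two ingredients, reverse KL and sparse selective fusion; the sketch below shows both in isolation, with a plain magnitude mask standing in for SCF-RKL's distribution-aware selection rule:

```python
import torch
import torch.nn.functional as F

def reverse_kl(p_logits, q_logits):
    """KL(q || p): the mode-seeking direction, penalizing mass that q
    places where the reference distribution p has little."""
    log_p = F.log_softmax(p_logits, dim=-1)
    log_q = F.log_softmax(q_logits, dim=-1)
    return (log_q.exp() * (log_q - log_p)).sum(-1).mean()

def sparse_fuse(base, expert, keep=0.1):
    """Fuse only the largest task-vector entries into the base model
    (a simplified stand-in for the paper's selection criterion)."""
    delta = expert - base
    k = max(1, int(keep * delta.numel()))
    thresh = delta.abs().flatten().topk(k).values.min()
    return base + delta * (delta.abs() >= thresh)  # ties may keep extras
```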
Quark Medical Alignment Paradigm Launched
Quark Medical Alignment introduces a holistic, multi-dimensional paradigm for aligning large language models in high-stakes medical question answering. It decomposes alignment objectives into four categories and optimizes them in a closed loop driven by observable metrics, diagnosis, and rewards. A unified mechanism with Reference-Frozen Normalization and Tri-Factor Adaptive Dynamic Weighting resolves scale mismatches and optimization conflicts.
PhyNiKCE Boosts Autonomous CFD Reliability
PhyNiKCE introduces a neurosymbolic agentic framework to overcome LLM limitations in Computational Fluid Dynamics (CFD) simulations. It decouples neural planning from symbolic validation using a Constraint Satisfaction Problem approach to enforce physical laws. Validated on OpenFOAM tasks, it achieves 96% improvement over baselines while cutting self-correction loops by 59% and token use by 17%.
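The paper's constraint set is not spelled out in this summary; as a flavor of CSP-style symbolic validation, here is a toy pre-run feasibility check built from two genuine numerical-stability constraints (the CFL condition and the explicit diffusion number), with the function shape being our assumption:

```python
def validate_case(u_max, dx, dt, nu, courant_max=1.0):
    """Reject a CFD configuration that violates hard numerical/physical
    constraints before any solver run; returns violated constraints."""
    violations = []
    courant = u_max * dt / dx              # CFL number for advection
    if courant > courant_max:
        violations.append(f"CFL violated: Co={courant:.2f} > {courant_max}")
    if nu < 0:
        violations.append("negative viscosity is unphysical")
    diffusion = nu * dt / dx ** 2          # explicit-scheme stability bound
    if diffusion > 0.5:
        violations.append(f"diffusion number {diffusion:.2f} > 0.5")
    return violations                      # empty list means feasible
```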
PBSAI Multi-Agent AI Governance
PBSAI provides a reference architecture for securing enterprise AI estates with multi-agent systems. Organizes 12 domains via agent families, context envelopes, and output contracts. Aligns with the NIST AI RMF for SOC and hyperscale defense.
NMIPS: Neuro-Symbolic PDE Solver
NMIPS introduces a unified neuro-symbolic framework for solving PDE families with shared structures but varying parameters. It discovers interpretable analytical solutions via multifactorial optimization and affine transfer for efficiency. Experiments show up to 35.7% accuracy gains over baselines.
Measuring LLM Agent Behavioral Consistency
Study reveals LLM agents like Llama/GPT/Claude produce 2-4 unique action paths per 10 runs on HotpotQA, with inconsistency predicting failure. Consistent runs hit 80-92% accuracy vs. 25-60% for inconsistent ones. Variance traces back to early decisions such as the first search query.
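A sketch of how this kind of behavioral consistency can be measured, assuming each run is logged as a sequence of (action, argument) steps; the metric names are ours, not the paper's:

```python
from collections import Counter

def path_consistency(runs):
    """Count distinct action paths across runs and the share of runs
    that follow the modal (most common) path."""
    paths = Counter(tuple(run) for run in runs)
    modal_share = paths.most_common(1)[0][1] / len(runs)
    return len(paths), modal_share

# e.g. 10 runs of one HotpotQA question (actions are illustrative)
runs = [[("search", "author of X"), ("finish", "A")]] * 7 + \
       [[("search", "X novel"), ("search", "author"), ("finish", "B")]] * 3
unique_paths, share = path_consistency(runs)   # -> (2, 0.7)
```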
MaxExp Optimizes Multispecies Predictions
MaxExp is a decision-driven framework for binarizing probabilistic species distribution models into presence-absence maps by maximizing evaluation metrics. It requires no calibration data and outperforms thresholding methods, especially under class imbalance. SSE provides a simpler alternative using expected species richness.
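One concrete reading of "maximizing evaluation metrics without calibration data" is to optimize the metric's expectation under the model's own predicted probabilities; the sketch below does that for expected F1 (MaxExp targets evaluation metrics generally, so its exact objective may differ):

```python
import numpy as np

def maxexp_threshold(probs, grid=np.linspace(0.01, 0.99, 99)):
    """Pick the binarization threshold maximizing *expected* F1, treating
    the model's probabilities as the outcome distribution, so no
    held-out presence-absence data is needed."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = probs >= t
        tp = probs[pred].sum()           # expected true positives
        fp = (1.0 - probs[pred]).sum()   # expected false positives
        fn = probs[~pred].sum()          # expected false negatives
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```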
MathSpatial Exposes MLLMs' Spatial Reasoning Gap
MLLMs excel at perception but fail at mathematical spatial reasoning, scoring under 60% on tasks humans solve at 95% accuracy. MathSpatial introduces a framework with MathSpatial-Bench (2K problems), MathSpatial-Corpus (8K training examples), and MathSpatial-SRT for structured reasoning. Fine-tuning Qwen2.5-VL-7B achieves strong results with 25% fewer tokens.
MAPLE Boosts Multimodal RL Post-Training
MAPLE is a modality-aware ecosystem for post-training multimodal LLMs, comprising MAPLE-bench, MAPO optimization, and adaptive curricula. It stratifies training by modality needs to cut variance and speed convergence. It narrows uni-/multi-modal gaps by 30% and converges 3x faster.
LGS for Long-Term Physics Simulation
LGS uses a VAE latent space and Transformer dynamics for generalizable PDE simulation. An uncertainty knob and flow forcing stabilize long-horizon predictions. Pretrained on 2.5M trajectories across 12 PDE families.
INTENT: Budget Planning for Tool Agents
INTENT is an inference-time planner for budget-constrained LLM agents using costly tools. Leverages a hierarchical world model for intention-aware cost anticipation. Outperforms baselines on StableToolBench under budget constraints and price shifts.
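INTENT's hierarchical world model is presumably far richer, but the one-step greedy rule below illustrates the core idea of intention-aware cost anticipation: a predicted follow-up cost is priced into each tool choice before checking the budget (all names and numbers are illustrative):

```python
def pick_tool(candidates, budget):
    """Greedy budget-aware choice: rank tools by expected value per unit
    of anticipated cost (own price + predicted follow-up cost), and
    skip any whose anticipated total would exceed the remaining budget.
    candidates: [(name, expected_value, price, followup_cost)]."""
    best, best_score = None, float("-inf")
    for name, value, price, followup in candidates:
        total = price + followup
        if total > budget:
            continue                       # infeasible under the budget
        score = value / max(total, 1e-9)
        if score > best_score:
            best, best_score = name, score
    return best

tool = pick_tool([("web_search", 0.9, 0.02, 0.05),
                  ("paid_api", 1.0, 0.30, 0.10)], budget=0.25)
# -> "web_search": the pricier tool is anticipated to bust the budget
```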
Human-Inspired Learning for Adaptive Reasoning
Proposes a framework for continuous learning of internal reasoning processes in AI, unifying reasoning, action, reflection, and verification. It treats thinking trajectories as learning material for evolving cognitive structures during execution. Experiments show a 23.9% runtime reduction on sensor tasks.
GHOST Prunes Mamba2 Hidden States Efficiently
GHOST applies structured pruning to Mamba2 using forward-pass controllability and observability metrics, avoiding backpropagation. Achieves a 50% state reduction with a ~1-point perplexity rise on WikiText-2 across 130M-2.7B models. Code is available anonymously.
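GHOST's actual scores are controllability/observability quantities computed from forward passes only; as a much-simplified stand-in, the sketch below rates state channels by mean squared activation and keeps the top half:

```python
import torch

def prune_mask(hidden_states, keep_ratio=0.5):
    """hidden_states: (batch, time, d_state) activations collected from
    forward passes. Mean squared activation is a crude energy proxy for
    each state channel; keep the highest-scoring keep_ratio of them."""
    scores = hidden_states.pow(2).mean(dim=(0, 1))   # (d_state,)
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[scores.topk(k).indices] = True
    return mask                                      # True = keep channel
```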
Exposing Ground Truth Illusion in Annotations
Literature review critiques 'ground truth' in ML data annotation as a positivist fallacy that ignores human subjectivity. Analyzes 346 papers from top venues, revealing biases like anchoring and geographic hegemony. Proposes a roadmap for pluralistic infrastructures that embrace disagreement.
ERM Fixes Causal Rung Collapse in LLMs
New research identifies 'rung collapse' in LLMs, where models confuse associations with causal interventions, leading to flawed reasoning under distributional shifts. It proposes Epistemic Regret Minimization (ERM), a belief revision method that penalizes causal errors independently of task success. Experiments across six frontier LLMs show ERM recovers 53-59% of entrenched errors.
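The summary does not spell out ERM's update rule; one plausible reading is the classic multiplicative-weights scheme below, which down-weights causal hypotheses by accumulated regret regardless of whether the end task happened to succeed:

```python
import math

def erm_update(beliefs, regrets, eta=0.5):
    """Belief revision: shrink each hypothesis's weight by its regret,
    then renormalize to a distribution."""
    weights = {h: w * math.exp(-eta * regrets[h]) for h, w in beliefs.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

beliefs = {"X->Y": 0.5, "Y->X": 0.5}
# an intervention on X that changes Y refutes Y->X, whatever the task reward
beliefs = erm_update(beliefs, {"X->Y": 0.0, "Y->X": 1.0})
# mass shifts toward X->Y (~0.62 vs ~0.38)
```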
DrIGM Enables Robust Multi-Agent RL
DrIGM introduces a distributionally robust Individual-Global-Max (IGM) principle for MARL, ensuring decentralized actions stay aligned under uncertainty via robust value factorization. Compatible with VDN/QMIX/QTRAN without reward shaping. Boosts OOD performance in SustainGym and StarCraft.
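For orientation, the IGM property DrIGM robustifies is easiest to see in plain VDN, where the joint value is the sum of per-agent values, so the joint argmax decomposes into per-agent argmaxes (only vanilla VDN is shown; the robust layer is not reproduced):

```python
import torch

def vdn_qtot(per_agent_q, actions):
    """per_agent_q: list of (batch, n_actions) tensors; actions: list of
    (batch,) chosen-action indices. Q_tot = sum of chosen per-agent Qs,
    which is what makes decentralized argmaxes globally consistent."""
    chosen = [q.gather(1, a.unsqueeze(1)).squeeze(1)
              for q, a in zip(per_agent_q, actions)]
    return torch.stack(chosen).sum(dim=0)            # (batch,)
```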
Decision-Valued Maps in DecisionDB
Formalizes decision-valued maps that track how representation choices impact outcomes. DecisionDB logs, replays, and audits decisions using content-based IDs and write-once storage. Partitions the representation space into persistence regions.
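DecisionDB's schema is not given here; the sketch below shows only the two storage properties the summary names, content-based IDs and write-once semantics (the class name and file layout are assumptions):

```python
import hashlib, json, os

class DecisionLog:
    """Content-addressed, write-once decision records: the ID is the
    SHA-256 of the canonicalized record, so identical decisions dedupe
    and a stored record can never be silently rewritten."""
    def __init__(self, root="decisions"):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, record: dict) -> str:
        blob = json.dumps(record, sort_keys=True).encode()
        rid = hashlib.sha256(blob).hexdigest()
        path = os.path.join(self.root, rid + ".json")
        if not os.path.exists(path):       # write-once: never overwrite
            with open(path, "wb") as f:
                f.write(blob)
        return rid

    def get(self, rid: str) -> dict:
        with open(os.path.join(self.root, rid + ".json"), "rb") as f:
            return json.load(f)
```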