All Updates
February 13, 2026
Surveying Multi-Agent Communication Paradigms
This survey frames multi-agent communication via the Five Ws, tracing the field's evolution from MARL's hand-designed protocols to emergent language and LLM-based systems. It highlights trade-offs in interpretability, scalability, and generalization across paradigms, and distills practical design patterns and open challenges for hybrid systems.
SemaPop: Semantic Population Synthesis
SemaPop uses LLMs for semantically conditioned population synthesis, deriving personas from survey data. Integrates with WGAN-GP for statistical alignment and behavioral realism. Achieves closer marginal and joint distribution matches while preserving diversity.
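WGAN-GP itself is standard machinery; below is a minimal sketch of its gradient-penalty term, the component SemaPop reportedly pairs with LLM-derived personas (the critic, tensor shapes, and lam=10 are generic conventions, not details from the paper):

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP term: push the critic's gradient norm toward 1 on points
    interpolated between real and generated samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    mixed = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(mixed)
    grads, = torch.autograd.grad(scores.sum(), mixed, create_graph=True)
    return lam * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```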
scPilot Automates Single-Cell Analysis
scPilot enables LLMs to reason over single-cell RNA-seq data using natural language and on-demand tools for annotation, trajectories, and TF targeting. Paired with the scBench benchmark, it shows gains such as an 11% accuracy lift from iterative reasoning. Transparent traces explain the biological insights.
SCF-RKL Advances Model Merging
SCF-RKL introduces sparse, distribution-aware model merging using reverse KL divergence to minimize interference. It selectively fuses complementary parameters, preserving stable representations and integrating new capabilities. Evaluations on 24 benchmarks show superior performance in reasoning, instruction following, and safety.
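The summary names two ingredients, reverse KL and sparse selective fusion; the sketch below shows both in isolation, with a plain magnitude mask standing in for SCF-RKL's distribution-aware selection rule:

```python
import torch
import torch.nn.functional as F

def reverse_kl(p_logits, q_logits):
    """KL(q || p): the mode-seeking direction, penalizing mass that q
    places where the reference distribution p has little."""
    log_p = F.log_softmax(p_logits, dim=-1)
    log_q = F.log_softmax(q_logits, dim=-1)
    return (log_q.exp() * (log_q - log_p)).sum(-1).mean()

def sparse_fuse(base, expert, keep=0.1):
    """Fuse only the largest task-vector entries into the base model
    (a simplified stand-in for the paper's selection criterion)."""
    delta = expert - base
    k = max(1, int(keep * delta.numel()))
    thresh = delta.abs().flatten().topk(k).values.min()
    return base + delta * (delta.abs() >= thresh)  # ties may keep extras
```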
Quark Medical Alignment Paradigm Launched
Quark Medical Alignment introduces a holistic, multi-dimensional paradigm for aligning large language models in high-stakes medical question answering. It decomposes alignment objectives into four categories and optimizes them in a closed loop driven by observable metrics, diagnosis, and rewards. A unified mechanism with Reference-Frozen Normalization and Tri-Factor Adaptive Dynamic Weighting resolves scale mismatches and optimization conflicts.
PhyNiKCE Boosts Autonomous CFD Reliability
PhyNiKCE introduces a neurosymbolic agentic framework to overcome LLM limitations in Computational Fluid Dynamics (CFD) simulations. It decouples neural planning from symbolic validation using a Constraint Satisfaction Problem approach to enforce physical laws. Validated on OpenFOAM tasks, it achieves 96% improvement over baselines while cutting self-correction loops by 59% and token use by 17%.
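The paper's constraint set is not spelled out in this summary; as a flavor of CSP-style symbolic validation, here is a toy pre-run feasibility check built from two genuine numerical-stability constraints (the CFL condition and the explicit diffusion number), with the function shape being our assumption:

```python
def validate_case(u_max, dx, dt, nu, courant_max=1.0):
    """Reject a CFD configuration that violates hard numerical/physical
    constraints before any solver run; returns violated constraints."""
    violations = []
    courant = u_max * dt / dx              # CFL number for advection
    if courant > courant_max:
        violations.append(f"CFL violated: Co={courant:.2f} > {courant_max}")
    if nu < 0:
        violations.append("negative viscosity is unphysical")
    diffusion = nu * dt / dx ** 2          # explicit-scheme stability bound
    if diffusion > 0.5:
        violations.append(f"diffusion number {diffusion:.2f} > 0.5")
    return violations                      # empty list means feasible
```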
PBSAI Multi-Agent AI Governance
PBSAI provides a reference architecture for securing enterprise AI estates with multi-agent systems. Organizes 12 domains via agent families, context envelopes, and output contracts. Aligns with the NIST AI RMF for SOC and hyperscale defense.
NMIPS: Neuro-Symbolic PDE Solver
NMIPS introduces a unified neuro-symbolic framework for solving PDE families with shared structures but varying parameters. It discovers interpretable analytical solutions via multifactorial optimization and affine transfer for efficiency. Experiments show up to 35.7% accuracy gains over baselines.
Measuring LLM Agent Behavioral Consistency
Study reveals LLM agents like Llama/GPT/Claude produce 2-4 unique action paths per 10 runs on HotpotQA, with inconsistency predicting failure. Consistent runs hit 80-92% accuracy vs. 25-60% for inconsistent ones. Variance traces back to early decisions such as the first search query.
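A sketch of how this kind of behavioral consistency can be measured, assuming each run is logged as a sequence of (action, argument) steps; the metric names are ours, not the paper's:

```python
from collections import Counter

def path_consistency(runs):
    """Count distinct action paths across runs and the share of runs
    that follow the modal (most common) path."""
    paths = Counter(tuple(run) for run in runs)
    modal_share = paths.most_common(1)[0][1] / len(runs)
    return len(paths), modal_share

# e.g. 10 runs of one HotpotQA question (actions are illustrative)
runs = [[("search", "author of X"), ("finish", "A")]] * 7 + \
       [[("search", "X novel"), ("search", "author"), ("finish", "B")]] * 3
unique_paths, share = path_consistency(runs)   # -> (2, 0.7)
```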
MaxExp Optimizes Multispecies Predictions
MaxExp is a decision-driven framework for binarizing probabilistic species distribution models into presence-absence maps by maximizing evaluation metrics. It requires no calibration data and outperforms thresholding methods, especially under class imbalance. SSE provides a simpler alternative using expected species richness.
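One concrete reading of "maximizing evaluation metrics without calibration data" is to optimize the metric's expectation under the model's own predicted probabilities; the sketch below does that for expected F1 (MaxExp targets evaluation metrics generally, so its exact objective may differ):

```python
import numpy as np

def maxexp_threshold(probs, grid=np.linspace(0.01, 0.99, 99)):
    """Pick the binarization threshold maximizing *expected* F1, treating
    the model's probabilities as the outcome distribution, so no
    held-out presence-absence data is needed."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = probs >= t
        tp = probs[pred].sum()           # expected true positives
        fp = (1.0 - probs[pred]).sum()   # expected false positives
        fn = probs[~pred].sum()          # expected false negatives
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```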
MathSpatial Exposes MLLMs' Spatial Reasoning Gap
MLLMs excel at perception but fail at mathematical spatial reasoning, scoring under 60% on tasks humans solve at 95% accuracy. MathSpatial introduces a framework with MathSpatial-Bench (2K problems), MathSpatial-Corpus (8K training examples), and MathSpatial-SRT for structured reasoning. Fine-tuning Qwen2.5-VL-7B achieves strong results with 25% fewer tokens.
MAPLE Boosts Multimodal RL Post-Training
MAPLE is a modality-aware ecosystem for post-training multimodal LLMs, comprising MAPLE-bench, MAPO optimization, and adaptive curricula. It stratifies training by modality needs to cut variance and speed convergence. It narrows uni-/multi-modal gaps by 30% and converges 3x faster.
LGS for Long-Term Physics Simulation
LGS uses a VAE latent space and Transformer dynamics for generalizable PDE simulation. An uncertainty knob and flow forcing stabilize long-horizon predictions. Pretrained on 2.5M trajectories across 12 PDE families.
INTENT: Budget Planning for Tool Agents
INTENT is an inference-time planner for budget-constrained LLM agents using costly tools. Leverages a hierarchical world model for intention-aware cost anticipation. Outperforms baselines on StableToolBench under budget constraints and price shifts.
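INTENT's hierarchical world model is presumably far richer, but the one-step greedy rule below illustrates the core idea of intention-aware cost anticipation: a predicted follow-up cost is priced into each tool choice before checking the budget (all names and numbers are illustrative):

```python
def pick_tool(candidates, budget):
    """Greedy budget-aware choice: rank tools by expected value per unit
    of anticipated cost (own price + predicted follow-up cost), and
    skip any whose anticipated total would exceed the remaining budget.
    candidates: [(name, expected_value, price, followup_cost)]."""
    best, best_score = None, float("-inf")
    for name, value, price, followup in candidates:
        total = price + followup
        if total > budget:
            continue                       # infeasible under the budget
        score = value / max(total, 1e-9)
        if score > best_score:
            best, best_score = name, score
    return best

tool = pick_tool([("web_search", 0.9, 0.02, 0.05),
                  ("paid_api", 1.0, 0.30, 0.10)], budget=0.25)
# -> "web_search": the pricier tool is anticipated to bust the budget
```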
Human-Inspired Learning for Adaptive Reasoning
Proposes a framework for continuous learning of internal reasoning processes in AI, unifying reasoning, action, reflection, and verification. It treats thinking trajectories as learning material for evolving cognitive structures during execution. Experiments show a 23.9% runtime reduction on sensor tasks.
GHOST Prunes Mamba2 Hidden States Efficiently
GHOST applies structured pruning to Mamba2 using forward-pass controllability and observability metrics, avoiding backpropagation. Achieves a 50% state reduction with a ~1-point perplexity rise on WikiText-2 across 130M-2.7B models. Code is available anonymously.
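GHOST's actual scores are controllability/observability quantities computed from forward passes only; as a much-simplified stand-in, the sketch below rates state channels by mean squared activation and keeps the top half:

```python
import torch

def prune_mask(hidden_states, keep_ratio=0.5):
    """hidden_states: (batch, time, d_state) activations collected from
    forward passes. Mean squared activation is a crude energy proxy for
    each state channel; keep the highest-scoring keep_ratio of them."""
    scores = hidden_states.pow(2).mean(dim=(0, 1))   # (d_state,)
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[scores.topk(k).indices] = True
    return mask                                      # True = keep channel
```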
Exposing Ground Truth Illusion in Annotations
Literature review critiques 'ground truth' in ML data annotation as a positivist fallacy that ignores human subjectivity. Analyzes 346 papers from top venues, revealing biases like anchoring and geographic hegemony. Proposes a roadmap for pluralistic infrastructures that embrace disagreement.
ERM Fixes Causal Rung Collapse in LLMs
New research identifies 'rung collapse' in LLMs, where models confuse associations with causal interventions, leading to flawed reasoning under distributional shifts. It proposes Epistemic Regret Minimization (ERM), a belief revision method that penalizes causal errors independently of task success. Experiments across six frontier LLMs show ERM recovers 53-59% of entrenched errors.
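The summary does not spell out ERM's update rule; one plausible reading is the classic multiplicative-weights scheme below, which down-weights causal hypotheses by accumulated regret regardless of whether the end task happened to succeed:

```python
import math

def erm_update(beliefs, regrets, eta=0.5):
    """Belief revision: shrink each hypothesis's weight by its regret,
    then renormalize to a distribution."""
    weights = {h: w * math.exp(-eta * regrets[h]) for h, w in beliefs.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

beliefs = {"X->Y": 0.5, "Y->X": 0.5}
# an intervention on X that changes Y refutes Y->X, whatever the task reward
beliefs = erm_update(beliefs, {"X->Y": 0.0, "Y->X": 1.0})
# mass shifts toward X->Y (~0.62 vs ~0.38)
```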
DrIGM Enables Robust Multi-Agent RL
DrIGM introduces a distributionally robust Individual-Global-Max (IGM) principle for MARL, ensuring decentralized actions stay aligned under uncertainty via robust value factorization. Compatible with VDN/QMIX/QTRAN without reward shaping. Boosts OOD performance in SustainGym and StarCraft.
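For orientation, the IGM property DrIGM robustifies is easiest to see in plain VDN, where the joint value is the sum of per-agent values, so the joint argmax decomposes into per-agent argmaxes (only vanilla VDN is shown; the robust layer is not reproduced):

```python
import torch

def vdn_qtot(per_agent_q, actions):
    """per_agent_q: list of (batch, n_actions) tensors; actions: list of
    (batch,) chosen-action indices. Q_tot = sum of chosen per-agent Qs,
    which is what makes decentralized argmaxes globally consistent."""
    chosen = [q.gather(1, a.unsqueeze(1)).squeeze(1)
              for q, a in zip(per_agent_q, actions)]
    return torch.stack(chosen).sum(dim=0)            # (batch,)
```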
Decision-Valued Maps in DecisionDB
Formalizes decision-valued maps that track how representation choices impact outcomes. DecisionDB logs, replays, and audits decisions using content-based IDs and write-once storage. Partitions the representation space into persistence regions.
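DecisionDB's schema is not given here; the sketch below shows only the two storage properties the summary names, content-based IDs and write-once semantics (the class name and file layout are assumptions):

```python
import hashlib, json, os

class DecisionLog:
    """Content-addressed, write-once decision records: the ID is the
    SHA-256 of the canonicalized record, so identical decisions dedupe
    and a stored record can never be silently rewritten."""
    def __init__(self, root="decisions"):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, record: dict) -> str:
        blob = json.dumps(record, sort_keys=True).encode()
        rid = hashlib.sha256(blob).hexdigest()
        path = os.path.join(self.root, rid + ".json")
        if not os.path.exists(path):       # write-once: never overwrite
            with open(path, "wb") as f:
                f.write(blob)
        return rid

    def get(self, rid: str) -> dict:
        with open(os.path.join(self.root, rid + ".json"), "rb") as f:
            return json.load(f)
```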