All Updates
February 12, 2026
Large-Scale AI Social Simulation Launched
AIvilization v0 deploys a resource-constrained artificial society populated by unified LLM agents. It features hierarchical planning, adaptive profiles, and human steering for long-horizon autonomy, and it reproduces stylized facts of real markets such as wealth stratification.
LAP Achieves Zero-Shot Robot Embodiment Transfer
Language-Action Pre-training (LAP) represents robot actions in natural language, allowing zero-shot transfer across embodiments without fine-tuning. LAP-3B, a 3B-parameter VLA, delivers over 50% success on novel robots and tasks. The approach enables efficient adaptation and unifies action prediction with VQA.
LakeMLB Benchmarks ML in Data Lakes
LakeMLB is a benchmark for machine learning in data lakes, focusing on multi-table union and join scenarios with real datasets from government, finance, and other domains. It supports pre-training and augmentation strategies, evaluates tabular ML methods, and releases its datasets and code.
KSTER Attacks Reverse LLM Model Edits
KSTER exploits the low-rank updates made by locate-then-edit methods to recover edited data via spectral keyspace reconstruction and entropy-based prompt recovery. It achieves high success rates on multiple LLMs. A proposed defense, subspace camouflage, uses decoys to hide the edit fingerprints.
KPO Stabilizes LLM Policy Optimization
Online Causal Kalman Filtering models importance sampling (IS) ratios as evolving latent states to stabilize RL for LLMs. It smooths noise while preserving token-level structure and reports superior results on math reasoning datasets.
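To make the idea of treating a ratio as an evolving latent state concrete, here is a minimal sketch of a causal scalar Kalman filter applied to a noisy sequence. It is not the method from the paper: the random-walk state model, the function name `kalman_smooth`, the variances, and the sample values are all assumptions chosen for illustration.

```python
def kalman_smooth(observations, process_var=1e-3, obs_var=1e-1):
    """Causal scalar Kalman filter with a random-walk latent state.

    Assumed model (illustrative only):
      state_t = state_{t-1} + process noise
      obs_t   = state_t     + observation noise
    """
    estimate = observations[0]   # initialise from the first observation
    variance = obs_var           # initial uncertainty of that estimate
    smoothed = [estimate]
    for z in observations[1:]:
        # Predict: a random walk leaves the mean unchanged, uncertainty grows.
        variance += process_var
        # Update: blend prediction and new observation using the Kalman gain.
        gain = variance / (variance + obs_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


# Hypothetical noisy per-token ratios fluctuating around 1.0.
noisy_ratios = [1.0, 1.4, 0.7, 1.3, 0.8, 1.2]
filtered = kalman_smooth(noisy_ratios)
```

Because each output depends only on past and current observations, the filter can run online. A larger `obs_var` relative to `process_var` gives heavier smoothing.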
KG-Guided LLM for SSD Analysis
KORAL integrates LLMs with data and literature knowledge graphs for SSD diagnostics from fragmented telemetry. It provides descriptive, predictive, prescriptive, and what-if analysis with explainable insights, and it outperforms expert methods on production traces.
ImprovEvolve Boosts AlphaEvolve Solutions
ImprovEvolve enhances LLM-guided evolution by evolving programs that iteratively propose, improve, and perturb solutions. It achieves new state-of-the-art results on hexagon packing and autocorrelation inequality benchmarks, and it reduces the LLM's cognitive load through structured parameterization.
HZO Speeds Zeroth-Order Optimization
Hierarchical Zeroth-Order (HZO) optimization decomposes network depth to make zeroth-order (ZO) optimization efficient in DNNs. It reduces query complexity from O(ML^2) to O(ML log L) and matches backpropagation accuracy on CIFAR-10 and ImageNet.
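As a rough feel for the size of that reduction, the sketch below evaluates the two growth terms for one setting. It ignores constant factors, assumes L is the network depth and a base-2 logarithm, and treats M as a plain multiplicative factor since the summary does not define it; the function name `query_counts` and the sample values are hypothetical.

```python
import math


def query_counts(m, depth):
    """Return the (M * L^2, M * L * log2 L) growth terms, constants ignored."""
    flat = m * depth ** 2
    hierarchical = m * depth * math.log2(depth)
    return flat, hierarchical


# Illustrative values only: M = 10, L = 64.
flat, hier = query_counts(m=10, depth=64)
ratio = flat / hier  # equals L / log2(L) for any M
```

With these values the ratio is L / log2(L) = 64 / 6, so the gap widens as depth grows.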
Human Guidance Excels in Vibe Coding
The study presents an experimental framework comparing human-led, AI-led, and hybrid vibe coding groups. Humans deliver superior iterative instructions, preventing the performance collapse seen in AI-led groups. Hybrid groups do best when humans direct and AI evaluates.
Guide Transitions Orgs to Agentic AI
A practical framework for shifting organizations to agentic AI through domain-driven tasks and human-in-the-loop orchestration. It addresses challenges such as workflow ownership and scaling, and it emphasizes small, AI-augmented teams aligned with business goals.
GTR Enhances Time Series Forecasting
Global Temporal Retriever (GTR) is a plug-and-play module that extends the context of multivariate time series forecasting (MTSF) models through global pattern retrieval. It uses adaptive embeddings, dynamic alignment, and 2D convolution fusion. It reports state-of-the-art results on six datasets with low overhead, and the code is on GitHub.
GRU-Mem Optimizes Long-Context LLM Reasoning
GRU-Mem adds text-controlled gates to MemAgent for efficient long-context reasoning, preventing memory explosion and unnecessary computation. Update and exit gates, trained with RL rewards, manage the recurrent memory loop. It achieves up to 400% faster inference on reasoning tasks.
Generative Framework for Brain Infarct Masks
The paper introduces an anatomy-preserving method that uses a VAE and latent diffusion to generate multi-class brain segmentation masks from NCCT data. It learns anatomical latents from masks only and generates realistic samples with optional lesion control, avoiding the artifacts seen in pixel-space models.
GenAI Framework for Higher Ed
Surveys reveal divided stakeholder perceptions of GenAI in the IT/EE disciplines at the University of Oulu. The authors propose a conceptual framework with high-level requirements for responsible integration. It ensures EU AI Act compliance and addresses privacy and integrity concerns.
GameDevBench Evaluates Game Dev Agents
GameDevBench offers 132 multimodal game development tasks drawn from tutorials. Agents struggle, with the top agent solving 54.5% of tasks, which demand both code and asset handling. Simple image/video feedback improves performance up to 47.7%.
FPT Bayesian Nets via Feedback Edges
The paper analyzes the parameterized complexity of Bayesian Network Structure Learning when a superstructure is given. It proves fixed-parameter tractability when parameterized by the feedback edge set, and it extends the results to treewidth with additive representations and to polytree learning.
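For readers unfamiliar with the parameter, the feedback edge number of an undirected graph is the fewest edges whose removal leaves a forest, and it equals m - n + c for m edges, n nodes, and c connected components. The sketch below computes it with union-find; it only illustrates the parameter, not the learning algorithm from the paper. The function name and example graphs are hypothetical, and the input is assumed to be a simple undirected graph with nodes numbered from 0.

```python
def feedback_edge_number(num_nodes, edges):
    """Count edges that close a cycle; this equals m - n + c."""
    parent = list(range(num_nodes))

    def find(x):
        # Find the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    extra = 0
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            extra += 1          # endpoints already connected: edge closes a cycle
        else:
            parent[root_u] = root_v  # edge joins two components: part of the forest
    return extra


# A triangle with one pendant edge: removing one triangle edge leaves a tree.
triangle_plus_tail = feedback_edge_number(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
# A path is already a tree.
path = feedback_edge_number(3, [(0, 1), (1, 2)])
# The complete graph on 4 nodes has 6 edges; a spanning tree keeps 3.
k4 = feedback_edge_number(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
```

A small value means the superstructure is close to a tree, which is the regime where a fixed-parameter tractability result is most useful.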
FoSS: GFlowNets for Dynamic Span LMs
FoSS introduces a GFlowNets framework for generating text via dynamic span vocabularies in a DAG-structured state space. It enables flexible segmentation of retrieved text and explores diverse compositional paths. Empirically, it boosts MAUVE scores by 12.5% and excels in knowledge tasks.
FormalJudge Ensures Agent Safety
FormalJudge uses neuro-symbolic bidirectional reasoning to translate intents into verifiable specifications. It employs Dafny and Z3 to provide mathematical guarantees in place of probabilistic judging. It achieves gains of 16.6% and detects deception effectively.
FlowCache Accelerates Autoregressive Video Gen
FlowCache is a caching framework for autoregressive video models that uses chunkwise policies and KV cache compression. It achieves a 2.38x speedup on MAGI-1 and a 6.7x speedup on SkyReels-V2 with minimal quality loss. Code is available on GitHub.
First Analysis of AI Agent Social Network
Moltbook, the first social network for AI agents, shows viral growth and diversification into promotional and political topics. An analysis of 44k posts reveals topic-dependent toxicity, especially in incentive and governance areas. The analysis highlights risks such as anti-humanity rhetoric and bursty automated flooding.