All Updates

Page 593 of 612

February 13, 2026

🇨🇳
cnBeta (Full RSS) • 51d ago

OpenAI Launches Speedy Codex-Spark Model

OpenAI released GPT-5.3-Codex-Spark, a lightweight version of its Codex agentic coding tool. The slimmed-down model is optimized for very fast inference in rapid-iteration scenarios. It follows the latest full Codex model, released earlier this month.

#launch #openai #gpt-53-codex-spark
📄
ArXiv AI • 51d ago

Voxtral Realtime Streaming ASR

Voxtral Realtime achieves Whisper-level transcription quality at 480 ms latency through end-to-end streaming training. It uses a causal audio encoder and Ada RMS-Norm, was pretrained on 13 languages, and its model weights are released under the Apache 2.0 license.
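The summary names Ada RMS-Norm without defining it. As a rough illustration only, the sketch below shows a generic adaptive RMSNorm in which a small linear map (the hypothetical `w_scale`) turns a conditioning vector `cond` into per-channel gains. What Voxtral Realtime actually conditions on, and how it parameterizes the gain, are assumptions not taken from the paper.

```python
import math

def rms_norm(x, eps=1e-6):
    # Standard RMSNorm: scale by the root-mean-square of the vector (no mean subtraction).
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def ada_rms_norm(x, cond, w_scale, eps=1e-6):
    """Adaptive RMSNorm sketch: per-channel gains are predicted from a conditioning vector.

    Hypothetical parameterization: gain[i] = 1 + sum_j w_scale[i][j] * cond[j].
    With an all-zero w_scale this reduces to plain RMSNorm with unit gain.
    """
    normed = rms_norm(x, eps)
    gains = [1.0 + sum(w * c for w, c in zip(row, cond)) for row in w_scale]
    return [g * v for g, v in zip(gains, normed)]

x = [3.0, 4.0]
print(ada_rms_norm(x, cond=[1.0], w_scale=[[0.0], [0.0]]))   # same as rms_norm(x)
print(ada_rms_norm(x, cond=[1.0], w_scale=[[0.5], [-0.5]]))  # channel gains 1.5 and 0.5
```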

#research #voxtral #voxtral-realtime
📄
ArXiv AI • 51d ago

V2G Transforms CAD into Auditable Graphs

The V2G pipeline converts CAD diagrams into property graphs, capturing the component topology and connectivity that pixel-based MLLMs overlook. It delivers large accuracy gains on electrical-schematic compliance benchmarks on which leading MLLMs fail. The benchmark and code are released on GitHub for further research.
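To illustrate why a graph captures what pixels miss, here is a minimal sketch, assuming a schematic has already been parsed into components and wires, that checks one hypothetical compliance rule (no motor may connect to a power supply without passing through a fuse) as a graph-reachability query. The component names, types, and the rule itself are illustrative, not V2G's schema or benchmark.

```python
from collections import defaultdict

def build_graph(wires):
    # Undirected adjacency sets: one node per component, one edge per wire connection.
    adj = defaultdict(set)
    for a, b in wires:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def reachable_avoiding(adj, components, start, avoid_type):
    # Depth-first search that refuses to pass through nodes of `avoid_type`.
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen and components[nxt]["type"] != avoid_type:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def unfused_loads(components, wires):
    """Hypothetical rule: every motor must be separated from every power supply by a fuse."""
    adj = build_graph(wires)
    violations = set()
    for name, attrs in components.items():
        if attrs["type"] == "power_supply":
            reach = reachable_avoiding(adj, components, name, "fuse")
            violations |= {n for n in reach if components[n]["type"] == "motor"}
    return violations

components = {
    "PSU1": {"type": "power_supply"},
    "F1": {"type": "fuse"},
    "M1": {"type": "motor"},
    "GND": {"type": "ground"},
}
wires = [("PSU1", "F1"), ("F1", "M1"), ("M1", "GND")]
print(unfused_loads(components, wires))                     # set(): compliant
print(unfused_loads(components, wires + [("PSU1", "M1")]))  # {'M1'}: fuse bypassed
```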

#research #v2g-audit #graph-transformation
📄
ArXiv AI • 51d ago

TSR Boosts Multi-Turn RL for LLM Agents

TSR introduces trajectory-search rollouts to improve multi-turn reinforcement learning for LLM agents. It uses a lightweight tree-style search to find high-quality trajectories, which improves rollout generation and stabilizes training. It achieves performance gains of up to 15% on tasks such as Sokoban and WebShop.
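A lightweight tree-style rollout search can be sketched as a beam search over turns: each kept trajectory prefix is expanded by sampling several actions, and only the best-scoring prefixes survive. The toy number-line environment, the random stand-in policy, and the use of cumulative reward as the ranking score are all assumptions; this is not TSR's exact procedure.

```python
import random

GOAL = 5
ACTIONS = (-1, 1, 2)

def env_step(state, action):
    # Toy environment: move along a number line; reward is negative distance to GOAL.
    nxt = state + action
    return nxt, -abs(GOAL - nxt)

def random_policy(state, rng):
    # Stand-in for sampling an action from the LLM policy (ignores the state here).
    return rng.choice(ACTIONS)

def total_reward(traj):
    return sum(r for _, r in traj)

def tree_search_rollout(init_state, horizon, branch=3, beam=2, seed=0):
    """Expand each kept prefix `branch` times per turn and keep the best `beam` prefixes."""
    rng = random.Random(seed)
    frontier = [(init_state, [])]  # (state, [(action, reward), ...])
    for _ in range(horizon):
        children = []
        for state, traj in frontier:
            for _ in range(branch):
                action = random_policy(state, rng)
                nxt, reward = env_step(state, action)
                children.append((nxt, traj + [(action, reward)]))
        # Rank prefixes by return so far; a learned value estimate could be used instead.
        children.sort(key=lambda c: total_reward(c[1]), reverse=True)
        frontier = children[:beam]
    return frontier[0][1]

traj = tree_search_rollout(init_state=0, horizon=4)
print(traj, total_reward(traj))
```

Branching and beam width trade search cost against trajectory quality; setting both to 1 reduces this to an ordinary single rollout.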

#research #tsr #llm-agents
📄
ArXiv AI • 51d ago

TRACER Aggregates Risks in Agent Trajectories

TRACER is a trajectory-level uncertainty metric for tool-using agents that combines surprisal, repetition, and coherence signals using tail-focused aggregation. For failure prediction on τ²-bench, it improves AUROC by 37% and AUARC by 55%. Code and the benchmark are available on GitHub.
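Tail-focused aggregation means a trajectory's score is driven by its riskiest steps rather than by its average step. The sketch below assumes a hypothetical per-step risk (an unweighted sum of surprisal, repetition, and incoherence) and uses the mean of the top 20% of step risks, a CVaR-style statistic; TRACER's actual signals and aggregator may differ.

```python
import math

def step_risk(surprisal, repetition, incoherence, weights=(1.0, 1.0, 1.0)):
    # Hypothetical linear combination of the three per-step signals.
    ws, wr, wc = weights
    return ws * surprisal + wr * repetition + wc * incoherence

def tail_aggregate(risks, alpha=0.2):
    # Mean of the top alpha-fraction of step risks (a CVaR-style tail statistic).
    k = max(1, math.ceil(alpha * len(risks)))
    return sum(sorted(risks, reverse=True)[:k]) / k

risks = [0.1] * 9 + [2.0]        # one risky step in an otherwise calm trajectory
print(sum(risks) / len(risks))   # plain mean, about 0.29: the spike is diluted
print(tail_aggregate(risks))     # tail mean, about 1.05: the spike dominates
```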

#research #tracer #ai-agents
📄
ArXiv AI • 51d ago

ThinkRouter Boosts Reasoning Efficiency

ThinkRouter introduces confidence-aware routing between latent and discrete reasoning spaces to make AI reasoning more efficient. During low-confidence steps it switches to discrete tokens, reducing the noise that latent embeddings would otherwise introduce. Experiments show substantial accuracy gains on STEM and coding tasks alongside shorter outputs.
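The routing rule can be sketched for a single step: if the model's top next-token probability falls below a threshold, commit to that discrete token; otherwise pass a probability-weighted mixture of token embeddings forward as a latent "soft" token. The 0.6 threshold, the use of the top probability as the confidence measure, and the toy two-token vocabulary are assumptions.

```python
def route_step(probs, embeddings, threshold=0.6):
    """Confidence-aware routing for one reasoning step (sketch; threshold is hypothetical).

    probs: next-token distribution; embeddings: one vector per vocabulary entry.
    Returns (mode, token_id, vector fed to the next step).
    """
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] < threshold:
        # Low confidence: commit to a discrete token instead of a noisy soft mixture.
        return "discrete", top, embeddings[top]
    # High confidence: stay in latent space with a probability-weighted soft embedding.
    dim = len(embeddings[0])
    soft = [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]
    return "latent", None, soft

emb = [[1.0, 0.0], [0.0, 1.0]]
print(route_step([0.9, 0.1], emb))    # ('latent', None, [0.9, 0.1])
print(route_step([0.55, 0.45], emb))  # ('discrete', 0, [1.0, 0.0])
```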

#research #thinkrouter #ai-reasoning
📄
ArXiv AI • 51d ago

Text2GQL-Bench: New Graph Query Benchmark

Text2GQL-Bench is a unified benchmark for Text-to-Graph-Query-Language systems, with 178,184 question-query pairs across 13 domains and multiple graph query languages (GQLs). It includes a scalable dataset-generation framework and a multi-metric evaluation covering grammatical validity, similarity, semantic alignment, and execution accuracy. Evaluations show that LLMs struggle with ISO-GQL: zero-shot execution accuracy is only 4%, rising to 50% with 3-shot prompting and 45.1% with fine-tuning.
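Execution accuracy is typically computed by running the predicted and gold queries against the same graph and comparing their results. The sketch below assumes results are already available as lists of row tuples (with `None` marking a query that failed to execute) and compares them as multisets; Text2GQL-Bench's exact matching rules are not specified in the summary.

```python
from collections import Counter

def execution_match(pred_rows, gold_rows, ordered=False):
    """Compare the results of a predicted and a gold query.

    pred_rows is None when the predicted query failed to parse or execute.
    Unordered comparison treats results as multisets of rows.
    """
    if pred_rows is None:
        return False
    if ordered:
        return list(pred_rows) == list(gold_rows)
    return Counter(map(tuple, pred_rows)) == Counter(map(tuple, gold_rows))

def execution_accuracy(pairs):
    # pairs: list of (pred_rows, gold_rows), one per benchmark question.
    return sum(execution_match(p, g) for p, g in pairs) / len(pairs)

pairs = [
    ([("Alice",), ("Bob",)], [("Bob",), ("Alice",)]),  # same rows, different order: match
    ([("Alice",)], [("Alice",), ("Bob",)]),             # missing a row: no match
    (None, [("Carol",)]),                               # execution error: no match
    ([(3,)], [(3,)]),                                   # exact match
]
print(execution_accuracy(pairs))  # 0.5
```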

#research #text2gql-bench #graph-databases
📄
ArXiv AI • 51d ago

Surveying Multi-Agent Communication Paradigms

This survey frames multi-agent communication through the Five Ws, tracing its evolution from the hand-designed protocols of multi-agent reinforcement learning (MARL) to emergent language and LLM-based systems. It highlights trade-offs in interpretability, scalability, and generalization across these paradigms, and distills practical design patterns and open challenges for hybrid systems.

#research #arxiv #multi-agent
📄
ArXiv AI • 51d ago

SemaPop: Semantic Population Synthesis

SemaPop uses LLMs for semantically conditioned population synthesis, deriving personas from survey data. It integrates a WGAN-GP to achieve statistical alignment and behavioral realism, and it achieves closer matches to marginal and joint distributions while preserving diversity.
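One common way to check the marginal and joint fidelity of a synthetic population is to compare empirical distributions with total variation distance. This is an illustrative metric, not necessarily the one SemaPop reports, and the two tiny populations are made up; the example shows why joint matching is a stricter test than marginal matching.

```python
from collections import Counter

def distribution(records, attrs):
    # Empirical (joint) distribution over the given attributes; one attribute gives a marginal.
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    n = len(records)
    return {k: v / n for k, v in counts.items()}

def tvd(p, q):
    # Total variation distance: 0 means identical distributions, 1 means disjoint support.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

survey = [
    {"age": "young", "car": "yes"}, {"age": "young", "car": "no"},
    {"age": "old", "car": "yes"}, {"age": "old", "car": "yes"},
]
synthetic = [
    {"age": "young", "car": "yes"}, {"age": "young", "car": "yes"},
    {"age": "old", "car": "no"}, {"age": "old", "car": "yes"},
]
for attrs in (["age"], ["car"], ["age", "car"]):
    print(attrs, tvd(distribution(survey, attrs), distribution(synthetic, attrs)))
# Both marginals match exactly (0.0) while the joint distribution differs (0.5).
```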

#research #semapop #llms
📄
ArXiv AI • 51d ago

scPilot Automates Single-Cell Analysis

scPilot enables LLMs to reason over single-cell RNA-seq data using natural language and on-demand tools for annotation, trajectory inference, and transcription factor (TF) target analysis. Evaluated on the accompanying scBench benchmark, it shows gains such as an 11% accuracy lift from iterative reasoning, and its transparent reasoning traces explain the biological insights it reaches.
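On-demand tool use can be sketched as a registry of named tools plus an executor that runs a plan and records a trace. Here the plan is hard-coded where an LLM would normally produce it, and the single tool annotates clusters by marker-gene overlap; the marker panel, function names, and data are hypothetical and not scPilot's API.

```python
# Hypothetical marker-gene panel; real analyses use curated, tissue-specific markers.
MARKERS = {
    "T cell": {"CD3E", "CD3D"},
    "B cell": {"MS4A1", "CD79A"},
    "Monocyte": {"CD14", "LYZ"},
}

def annotate_clusters(top_genes):
    # Label each cluster by the cell type whose markers overlap its top genes the most.
    labels = {}
    for cluster, genes in top_genes.items():
        scores = {ct: len(markers & set(genes)) for ct, markers in MARKERS.items()}
        best = max(scores, key=scores.get)
        labels[cluster] = best if scores[best] > 0 else "unknown"
    return labels

TOOLS = {"annotate_clusters": annotate_clusters}

def run_plan(plan, tools=TOOLS):
    # Execute (tool_name, kwargs) steps, e.g. as emitted by an LLM, and keep a readable trace.
    trace = []
    for tool_name, kwargs in plan:
        result = tools[tool_name](**kwargs)
        trace.append({"tool": tool_name, "args": kwargs, "result": result})
    return trace

plan = [("annotate_clusters", {"top_genes": {
    "0": ["CD3E", "IL7R"], "1": ["MS4A1", "CD79A"], "2": ["MALAT1"]}})]
for step in run_plan(plan):
    print(step["tool"], step["result"])
# annotate_clusters {'0': 'T cell', '1': 'B cell', '2': 'unknown'}
```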

#research #scpilot #scbench
📄
ArXiv AI • 51d ago

SCF-RKL Advances Model Merging

SCF-RKL introduces sparse, distribution-aware model merging that uses reverse KL divergence to minimize interference between merged models. It selectively fuses complementary parameters, preserving stable representations while integrating new capabilities. Evaluations on 24 benchmarks show superior performance in reasoning, instruction following, and safety.
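Two ingredients from the summary can be illustrated separately: the reverse KL divergence, KL(model ‖ target), and sparse merging of a fine-tuned model's parameter delta into a base model. The merge below keeps the largest-magnitude deltas, a generic stand-in for SCF-RKL's distribution-aware selection; parameter vectors are plain lists and `keep_frac` is an assumed setting.

```python
import math

def reverse_kl(target, model):
    """KL(model || target) = sum_i model_i * log(model_i / target_i).

    "Reverse" relative to KL(target || model): it penalizes the model for putting
    mass where the target has little, which makes it mode-seeking.
    Assumes target_i > 0 wherever model_i > 0.
    """
    return sum(m * math.log(m / t) for m, t in zip(model, target) if m > 0)

def sparse_merge(base, finetuned, keep_frac=0.4):
    # Magnitude-based sparsification of the task vector (finetuned - base); a generic
    # stand-in for SCF-RKL's distribution-aware selection, which this does not reproduce.
    delta = [f - b for f, b in zip(finetuned, base)]
    k = max(1, round(keep_frac * len(delta)))
    keep = set(sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)[:k])
    return [b + (d if i in keep else 0.0) for i, (b, d) in enumerate(zip(base, delta))]

print(reverse_kl([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(reverse_kl([0.5, 0.5], [0.9, 0.1]))  # positive
print(sparse_merge([0.0] * 5, [0.1, -2.0, 0.05, 1.0, 0.0]))  # [0.0, -2.0, 0.0, 1.0, 0.0]
```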

#research #scf-rkl #ai
📄
ArXiv AI • 51d ago

Quark Medical Alignment Paradigm Launched

Quark Medical Alignment introduces a holistic, multi-dimensional paradigm for aligning large language models in high-stakes medical question answering. It decomposes alignment objectives into four categories and optimizes them in a closed loop of observable metrics, diagnosis, and rewards. A unified mechanism combining Reference-Frozen Normalization and Tri-Factor Adaptive Dynamic Weighting resolves scale mismatches and optimization conflicts between objectives.
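The scale-mismatch problem, and one way a frozen reference could address it, can be sketched as follows: each objective's reward is standardized with a mean and standard deviation computed once from a frozen reference policy, so objectives on very different raw scales contribute comparably. The objective names and weights are placeholders, the weights are fixed rather than adaptive, and this is only an assumption about how Reference-Frozen Normalization behaves.

```python
from statistics import mean, pstdev

def freeze_reference_stats(reference_rewards):
    # Per-objective (mean, std) computed once from a frozen reference policy, never updated.
    return {k: (mean(v), pstdev(v) or 1.0) for k, v in reference_rewards.items()}

def combined_reward(raw, ref_stats, weights):
    # Standardize each objective against the frozen reference, then take a weighted sum.
    return sum(weights[k] * (raw[k] - ref_stats[k][0]) / ref_stats[k][1] for k in raw)

# Placeholder objectives whose raw scales differ by a factor of 10.
ref = freeze_reference_stats({"obj_a": [1, 2, 3], "obj_b": [10, 20, 30]})
print(combined_reward({"obj_a": 3, "obj_b": 30}, ref, {"obj_a": 0.5, "obj_b": 0.5}))
# Each objective contributes the same standardized score despite the scale gap.
```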

#research #quark #medical-alignment
📄
ArXiv AI • 51d ago

PhyNiKCE Boosts Autonomous CFD Reliability

PhyNiKCE is a neurosymbolic agentic framework that addresses LLM limitations in Computational Fluid Dynamics (CFD) simulation. It decouples neural planning from symbolic validation, framing validation as a Constraint Satisfaction Problem that enforces physical laws. Validated on OpenFOAM tasks, it achieves a 96% improvement over baselines while cutting self-correction loops by 59% and token usage by 17%.
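Symbolic validation of a CFD setup can be sketched as checking a proposed case against explicit physical constraints before any simulation runs. The case fields are hypothetical, the Mach 0.3 incompressibility limit is a common rule of thumb, and the Reynolds cut-off is an assumed value; the solver names are real OpenFOAM incompressible solvers, but this is not PhyNiKCE's constraint model.

```python
# Hypothetical case schema and thresholds, for illustration only.
INCOMPRESSIBLE_SOLVERS = {"icoFoam", "simpleFoam", "pimpleFoam"}
MACH_LIMIT = 0.3       # common rule of thumb for treating a gas flow as incompressible
RE_TURBULENT = 4000.0  # hypothetical cut-off above which a laminar model is rejected

def check_case(case):
    """Return the list of violated physical constraints for a proposed CFD setup."""
    if case["nu"] <= 0 or case["speed_of_sound"] <= 0:
        return ["viscosity and speed of sound must be positive"]
    violations = []
    reynolds = case["velocity"] * case["length"] / case["nu"]
    mach = case["velocity"] / case["speed_of_sound"]
    if case["solver"] in INCOMPRESSIBLE_SOLVERS and mach > MACH_LIMIT:
        violations.append(f"Mach {mach:.2f} too high for incompressible solver {case['solver']}")
    if reynolds > RE_TURBULENT and case["turbulence_model"] == "laminar":
        violations.append(f"Re {reynolds:.0f} is turbulent but the case is laminar")
    return violations

air = {"velocity": 10.0, "length": 1.0, "nu": 1.5e-5, "speed_of_sound": 343.0,
       "solver": "simpleFoam", "turbulence_model": "kOmegaSST"}
print(check_case(air))                                     # []
print(check_case({**air, "turbulence_model": "laminar"}))  # Reynolds violation
print(check_case({**air, "velocity": 200.0}))              # Mach violation
```

Returning a list of named violations, rather than a single pass/fail flag, gives an agent concrete feedback to correct its plan.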

#research #phynikce #cfd
📄
ArXiv AI • 51d ago

PBSAI Multi-Agent AI Governance

PBSAI provides a reference architecture for securing enterprise AI estates with multi-agent systems. It organizes 12 domains through agent families, context envelopes, and output contracts, and it aligns with the NIST AI RMF for security operations center (SOC) and hyperscale defense.
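An output contract can be sketched as a typed schema that each agent's output must satisfy before downstream systems act on it. The `OutputContract` class, its fields, and the SOC-triage example are hypothetical illustrations, not PBSAI's specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutputContract:
    # Hypothetical contract: required fields with expected types, plus permitted actions.
    required: dict
    allowed_actions: frozenset

def validate(output, contract):
    """Return a list of contract violations for one agent output (empty list = accepted)."""
    errors = []
    for field, expected in contract.required.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    action = output.get("action")
    if action is not None and action not in contract.allowed_actions:
        errors.append(f"action not permitted: {action}")
    return errors

triage = OutputContract(
    required={"alert_id": str, "severity": int, "action": str},
    allowed_actions=frozenset({"escalate", "suppress", "annotate"}),
)
print(validate({"alert_id": "A-17", "severity": 3, "action": "escalate"}, triage))  # []
print(validate({"alert_id": "A-18", "severity": "high", "action": "delete_logs"}, triage))
```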

#research #pbsai #ai-governance
📄
ArXiv AI • 51d ago

NMIPS: Neuro-Symbolic PDE Solver

NMIPS is a unified neuro-symbolic framework for solving families of PDEs that share structure but differ in parameters. It discovers interpretable analytical solutions through multifactorial optimization and uses affine transfer to improve efficiency. Experiments show accuracy gains of up to 35.7% over baselines.
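To make "PDE family" and "affine transfer" concrete, consider the heat equation u_t = a·u_xx with varying diffusivity a. Given the analytical solution e^(-t)·sin(x) for a = 1, rescaling the time input by a yields e^(-a·t)·sin(x), the solution for any other member of the family. The sketch checks this with a finite-difference residual; it illustrates the idea only and is not NMIPS's search procedure.

```python
import math

def base_solution(x, t):
    # Analytical solution of u_t = u_xx (diffusivity a = 1) with u(x, 0) = sin(x).
    return math.exp(-t) * math.sin(x)

def transfer(a):
    # Affine transfer: rescaling the time input maps the a = 1 solution to diffusivity a.
    return lambda x, t: base_solution(x, a * t)

def heat_residual(u, a, x, t, h=1e-4):
    # Central finite differences for u_t - a * u_xx; near zero if u solves the PDE.
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_t - a * u_xx

a = 2.5
print(abs(heat_residual(transfer(a), a, x=0.7, t=0.3)))    # tiny: finite-difference error only
print(abs(heat_residual(base_solution, a, x=0.7, t=0.3)))  # about 0.72: wrong family member
```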

#research #arxiv #nmips
📄
ArXiv AI • 51d ago

Measuring LLM Agent Behavioral Consistency

A study finds that LLM agents based on Llama, GPT, and Claude produce 2 to 4 distinct action paths per 10 runs on HotpotQA, and that this inconsistency predicts failure: consistent runs reach 80-92% accuracy versus 25-60% for inconsistent ones. The variance traces back to early decisions such as the first search query.
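The two measurements in the study can be sketched directly: the number of distinct action sequences across repeated runs, and the first step at which runs diverge. The ReAct-style action strings below are made-up examples, not data from the paper.

```python
from collections import Counter

def path_counts(runs):
    # How often each distinct action sequence occurs across repeated runs of one task.
    return Counter(tuple(r) for r in runs)

def first_divergence(runs):
    # Index of the first step where runs disagree (None if all runs are identical).
    longest = max(len(r) for r in runs)
    for i in range(longest):
        if len({r[i] if i < len(r) else None for r in runs}) > 1:
            return i
    return None

runs = [
    ["search[Entity A]", "search[Entity B]", "finish[yes]"],
    ["search[Entity A]", "search[Entity B]", "finish[yes]"],
    ["search[Entity B]", "lookup[birthplace]", "finish[no]"],
]
print(len(path_counts(runs)))  # 2 distinct paths
print(first_divergence(runs))  # 0: the very first search query differs
```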

#research #llama #gpt
📄
ArXiv AI • 51d ago

MaxExp Optimizes Multispecies Predictions

MaxExp is a decision-driven framework that converts probabilistic species distribution model outputs into binary presence-absence maps by directly maximizing an evaluation metric. It requires no calibration data and outperforms threshold-based methods, especially under class imbalance. A simpler alternative, SSE, relies on expected species richness instead.
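The core idea, choosing the presence set that maximizes an expected metric, can be sketched for F1 at a single site. The sketch assumes species presences are independent given the model's probabilities, restricts candidates to the k most probable species, and computes expected F1 exactly by enumerating all outcomes (feasible only for short species lists). The `sse_topk` variant, which predicts as many species as the expected richness, is our reading of SSE and may not match the paper.

```python
from itertools import product

def expected_f1(pred, probs):
    """Exact expected F1 of predicted presence set `pred`, assuming independent presences."""
    total = 0.0
    for outcome in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for y, p in zip(outcome, probs):
            weight *= p if y else 1.0 - p
        true = {i for i, y in enumerate(outcome) if y}
        denom = len(pred) + len(true)
        # Convention: F1 = 1 when both the prediction and the truth are empty.
        f1 = 1.0 if denom == 0 else 2 * len(pred & true) / denom
        total += weight * f1
    return total

def ranked(probs):
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def maxexp_topk(probs):
    # Pick the top-k set (k = 0..n) with the highest expected F1.
    order = ranked(probs)
    candidates = [set(order[:k]) for k in range(len(probs) + 1)]
    return max(candidates, key=lambda s: expected_f1(s, probs))

def sse_topk(probs):
    # Predict as many species as the expected richness, i.e. round(sum of probabilities).
    return set(ranked(probs)[:round(sum(probs))])

probs = [0.9, 0.8, 0.1]
print(maxexp_topk(probs), round(expected_f1({0, 1}, probs), 4))  # {0, 1} 0.8746
print(sse_topk(probs))                                           # {0, 1}
```

No threshold or calibration set appears anywhere: the decision follows from the probabilities and the chosen metric alone.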

#research #maxexp #ecology
📄
ArXiv AI • 51d ago

MathSpatial Exposes MLLMs' Spatial Reasoning Gap

MLLMs excel at perception but struggle with mathematical spatial reasoning, scoring below 60% on tasks that humans solve with 95% accuracy. MathSpatial is a framework comprising MathSpatial-Bench (2K problems), MathSpatial-Corpus (8K training examples), and MathSpatial-SRT for structured reasoning. A fine-tuned Qwen2.5-VL-7B achieves strong results while using 25% fewer tokens.

#research #mathspatial #qwen