All Updates

#drone-swarm #voice-control #military-ai

SpaceX Enters Pentagon AI Drone Race

SpaceX and its xAI subsidiary are competing in a classified Pentagon program for voice-controlled autonomous drone swarms. Elon Musk's recent company merger thrusts them into AI-enabled weapons development. The move into military AI could provoke significant debate.

#memory-shortage #fab-expansion #ai-infra

Micron's $200B Factory Push Breaks AI Memory Bottleneck

Micron Technology plans a $200 billion investment in new factories to tackle the worst storage chip shortage in 40 years. The expansion targets AI memory constraints, powering data storage for smartphones, autos, laptops, and data centers.

#humanoid-robot #viral-video #chunwan-gala

Unitree Chunwan Robot Video Goes Viral Overseas

Unitree humanoid robots performed on the Spring Festival Gala stage. The official video from Unitree reached nearly 100,000 views in under 10 hours overseas. Overseas netizens expressed shock at the performance in comments.

36氪•48d ago

Spring Fest Box Office Hits 10B, AI Orders Surge

The 2026 Spring Festival box office, including pre-sales, surpassed 10 billion yuan by Feb 17. Qianwen data shows AI ticket purchases on Damai jumped 372x in two days, and AI orders from tier 3/4 cities rose 782x.

#ticket-orders #ai-adoption #tier34-cities

#autonomous-driving #fleet-operations #austin-deployment

Robotaxi 19% Availability After 8 Months

Tesla's Robotaxi service, launched in Austin 8 months ago, now has just 19% availability. This lags far behind Elon Musk's prior commitments. Multiple core operational metrics fall short of targets.

The Guardian Technology•48d ago

AI Climate Claims Branded Greenwashing

A report dismisses tech industry claims that AI can fix climate issues as greenwashing. Most of the cited benefits refer to traditional machine learning, not energy-intensive generative AI such as chatbots and image tools. The explosive growth of generative AI is fueling massive datacenter energy demands.

#research #generative-ai #climate

#automated-vehicles #nl-explanations #llm-ensemble

X-Blocks: Linguistic Blocks for AV Explanations

X-Blocks introduces a hierarchical framework analyzing natural language explanations for automated vehicles (AVs) at context, syntax, and lexicon levels. RACE, a multi-LLM ensemble with Chain-of-Thought and self-consistency, achieves 91.45% accuracy on the Berkeley DeepDrive-X dataset. It uncovers scenario-specific vocabulary patterns and reusable grammar families for explainable AI.
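
As a rough illustration of the self-consistency idea in a multi-model ensemble, the sketch below pools several sampled labels per model and keeps the most common one. The function name, model names, and labels are hypothetical; the summary does not describe how RACE actually aggregates its outputs.

```python
from collections import Counter

def majority_vote(samples_by_model):
    """Pool every sampled label and return (winning label, share of all votes)."""
    votes = Counter(
        label for labels in samples_by_model.values() for label in labels
    )
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())

# Hypothetical outputs: each model is sampled three times on one explanation.
samples = {
    "model_a": ["yield", "yield", "stop"],
    "model_b": ["yield", "stop", "yield"],
    "model_c": ["yield", "yield", "yield"],
}
label, share = majority_vote(samples)
print(label, round(share, 2))
```

The vote share can double as a crude confidence signal: a low share suggests the models disagree on that explanation.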

#benchmark-generation #data-augmentation #reasoning-evaluation

VeRA: Verified Reasoning Data Augmentation

VeRA converts benchmark problems into executable specifications—templates, generators, and verifiers—to create unlimited verified variants at near-zero cost. VeRA-E generates equivalent problems to detect memorization, while VeRA-H hardens tasks for fresh challenges. Evaluated on 16 frontier models and fully open-sourced.
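
To make the template, generator, and verifier split concrete, here is a minimal sketch of one benchmark problem expressed as an executable specification. The `ProblemSpec` class, its fields, and the toy word problem are assumptions for illustration, not VeRA's actual interface.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ProblemSpec:
    template: str                                         # text with named placeholders
    generator: Callable[[random.Random], Dict[str, int]]  # samples fresh parameters
    verifier: Callable[[Dict[str, int], int], bool]       # checks a candidate answer

    def variant(self, seed):
        """Return (question text, parameters) for a reproducible variant."""
        params = self.generator(random.Random(seed))
        return self.template.format(**params), params

# Hypothetical toy problem; each seed yields a new, automatically checkable variant.
spec = ProblemSpec(
    template="A train travels {speed} km/h for {hours} hours. How far does it go, in km?",
    generator=lambda rng: {"speed": rng.randint(40, 120), "hours": rng.randint(2, 9)},
    verifier=lambda p, answer: answer == p["speed"] * p["hours"],
)

question, params = spec.variant(seed=0)
print(question)
print(spec.verifier(params, params["speed"] * params["hours"]))
```

Because the verifier is code rather than a stored answer key, every generated variant comes with its own correctness check.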

#research #vera #ai-evaluation

VeRA: Scalable Verified Reasoning Data Augmentation

VeRA is a framework that transforms static benchmark problems into executable specifications for generating unlimited verified variants. It features VeRA-E for equivalent rewrites to detect memorization and VeRA-H for hardened tasks at intelligence frontiers. The tool is open-sourced with code and datasets after evaluating 16 frontier models.

#text-detection #std-dev #auroc

VaryBalance: Top LLM Text Detector

VaryBalance detects LLM-generated text by exploiting the observation that human-written texts differ more from their LLM-rewritten versions than LLM-generated texts do. It quantifies this difference with a mean standard deviation, which gives a robust distinction. Experiments show it outperforms state-of-the-art detectors such as Binoculars by up to 34.3% in AUROC across models and languages.

Trajectory-Dominant Pareto Optimization for Intelligence

AI systems stagnate in long-horizon adaptability due to trajectory-level Pareto traps, not data or capacity limits. The paper introduces Trajectory-Dominant Pareto Optimization, defining dominance over full trajectories, and Pareto traps as local optima blocking global paths. It proposes the Trap Escape Difficulty Index (TEDI) and a taxonomy to diagnose intelligence ceilings.

#pareto-traps #tedi-index

#research #sslogic #logical-reasoning

SSLogic Scales Logic via Agentic Synthesis

SSLogic is an agentic meta-synthesis framework that scales logical reasoning tasks at the family level using iterative Generate-Validate-Repair loops for Generator-Validator pairs. It features a Multi-Gate Validation Protocol with adversarial blind reviews by independent agents to ensure data reliability. Training on evolved data boosts benchmarks like SynLogic by +5.2 points.
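
The Generate-Validate-Repair loop can be pictured with a small control-flow sketch. Everything here (the function names, the toy task, and the single validation check) is hypothetical and far simpler than the multi-agent protocol the summary describes.

```python
def generate_validate_repair(generate, validate, repair, max_rounds=3):
    """Repair a candidate until validation passes or the round budget runs out.

    Returns (candidate, repairs_used), or (None, max_rounds) on failure.
    """
    candidate = generate()
    for repairs_used in range(max_rounds + 1):
        issues = validate(candidate)
        if not issues:
            return candidate, repairs_used
        candidate = repair(candidate, issues)
    return None, max_rounds

# Toy stand-ins for the generator, validator, and repair step.
def generate():
    return {"question": "If A implies B and A holds, does B hold?", "answer": None}

def validate(task):
    return ["missing answer"] if task["answer"] is None else []

def repair(task, issues):
    fixed = dict(task)
    if "missing answer" in issues:
        fixed["answer"] = "yes"
    return fixed

task, repairs = generate_validate_repair(generate, validate, repair)
print(task, repairs)
```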

#test-time-compute #parallel-clones #agentic-rl

SELFCEST: Learned Parallel Model Clones

SELFCEST equips base language models to spawn same-weight clones in parallel contexts via agentic reinforcement learning. It trains end-to-end with global task rewards and shared-parameter rollouts to allocate budgets across branches. This improves accuracy-cost Pareto frontiers on math reasoning and long-context QA benchmarks with OOD generalization.

#multimodal-llm #engineering-plots

PlotChain Benchmark for MLLM Plot Reading

PlotChain introduces a deterministic benchmark for evaluating multimodal LLMs on extracting quantitative values from engineering plots like Bode and FFT. It features 450 plots across 15 families with ground truth and checkpoint diagnostics for failure analysis. Top models score ~80% (Gemini 2.5 Pro leads), but frequency tasks remain weak.

#abstract-syntax-tree #first-order-logic #semantic-parsing

NL2LOGIC: 99% Accurate NL-to-FOL Translation

NL2LOGIC is a new framework using abstract syntax trees (AST) to translate natural language into first-order logic via large language models. It combines a recursive LLM semantic parser with an AST-guided generator for high syntactic accuracy and semantic faithfulness. Benchmarks show 99% syntactic accuracy, up to 30% semantic gains, and 31% reasoning improvement when integrated with Logic-LM.
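
One reason an AST-guided generator can reach high syntactic accuracy is that a formula rendered from a typed tree is well-formed by construction. The sketch below shows that idea with a tiny first-order-logic AST; the node classes and `render` function are illustrative assumptions, not NL2LOGIC's actual data structures.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Pred:
    name: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class BinOp:
    op: str  # one of "∧", "∨", "→"
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Quant:
    q: str  # "∀" or "∃"
    var: str
    body: "Formula"

Formula = Union[Pred, Not, BinOp, Quant]

def render(node: Formula) -> str:
    """Serialize an AST into a first-order-logic string with balanced parentheses."""
    if isinstance(node, Pred):
        return f"{node.name}({', '.join(node.args)})"
    if isinstance(node, Not):
        return f"¬{render(node.body)}"
    if isinstance(node, BinOp):
        return f"({render(node.left)} {node.op} {render(node.right)})"
    if isinstance(node, Quant):
        return f"{node.q}{node.var} {render(node.body)}"
    raise TypeError(f"unknown node type: {type(node).__name__}")

# Hand-built tree for "Every dog barks."; in a full system a parser would produce it.
ast = Quant("∀", "x", BinOp("→", Pred("Dog", ("x",)), Pred("Barks", ("x",))))
print(render(ast))  # ∀x (Dog(x) → Barks(x))
```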

#sub-agents #agentic-ai #personalization

MAPLE: Sub-Agent Design for AI Personalization

MAPLE addresses LLM agent limitations by separating memory, learning, and personalization into dedicated sub-agents. The Memory sub-agent manages storage and retrieval, Learning extracts insights asynchronously, and Personalization applies them in real time. It boosts personalization scores by 14.6% and raises trait incorporation from 45% to 75% on the MAPLE-Personas benchmark.

Lang2Act Boosts VLM Visual Reasoning with Emergent Tools

Lang2Act enhances Vision-Language Models (VLMs) with self-emergent linguistic toolchains for fine-grained visual perception in VRAG, avoiding rigid external tools and the information loss caused by image operations. It employs a two-stage RL framework: the first stage builds a reusable action toolbox, and the second exploits it for reasoning. It achieves performance gains of more than 4%; code is available on GitHub.

#visual-reasoning

#research #llms #hallucinations

Geometric Taxonomy of LLM Hallucinations

Researchers propose a geometric taxonomy classifying LLM hallucinations into three types: unfaithfulness, confabulation, and factual error. Hallucinations drawn from benchmarks are detected well within a domain but not across domains, while human-crafted confabulations enable a single global detection direction. Factual errors remain undetectable via embeddings due to distributional encoding limits.