February 18, 2026
Hybrid Abstention Boosts LLM Reliability
This arXiv paper introduces an adaptive abstention system for LLMs that dynamically adjusts safety thresholds using contextual signals like domain and user history. It features a multi-dimensional detection architecture with five parallel detectors in a hierarchical cascade, reducing latency and false positives. Evaluations show strong performance in sensitive domains like medical advice.
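The idea of a context-adjusted threshold feeding an early-exit cascade can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the threshold values, the `prior_flags` signal, and both toy detectors are hypothetical, and the five parallel detectors are simplified here to a short sequential cascade.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical base thresholds per domain; the summary gives no actual values.
BASE_THRESHOLD = {"medical": 0.30, "general": 0.60}


@dataclass
class Context:
    domain: str
    prior_flags: int  # assumed user-history signal: earlier flagged requests


def adaptive_threshold(ctx: Context) -> float:
    """Lower the abstention threshold for sensitive domains and flagged users."""
    base = BASE_THRESHOLD.get(ctx.domain, BASE_THRESHOLD["general"])
    penalty = 0.05 * min(ctx.prior_flags, 4)
    return max(0.10, base - penalty)


Detector = Callable[[str], float]  # returns a risk score in [0, 1]


def keyword_detector(text: str) -> float:
    # Toy detector: treats any mention of "dosage" as high risk.
    return 0.9 if "dosage" in text.lower() else 0.1


def length_detector(text: str) -> float:
    # Toy detector: long prompts get a moderately higher score.
    return 0.5 if len(text) > 200 else 0.2


def should_abstain(
    text: str, ctx: Context, detectors: List[Detector]
) -> Tuple[bool, int]:
    """Run detectors in order and stop at the first score at or above threshold.

    Returns (abstain?, number of detectors actually run).
    """
    threshold = adaptive_threshold(ctx)
    for i, detector in enumerate(detectors, start=1):
        if detector(text) >= threshold:
            return True, i
    return False, len(detectors)
```

Ordering the detectors from cheapest to most expensive is what would let an early exit save latency; the ordering and scoring above are placeholders for whatever the paper actually uses.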
EduEVAL-DB Dataset for AI Tutor Evaluation
EduEVAL-DB introduces a dataset of 854 explanations for 139 ScienceQA questions across K-12 subjects, with one human-teacher and six LLM-simulated teacher explanations. It features a pedagogical risk rubric covering factual correctness, depth, focus, appropriateness, and bias, annotated via semi-automatic expert review. Preliminary benchmarks compare Gemini 2.5 Pro against fine-tuned Llama 3.1 8B for risk detection on consumer hardware.
EAA Automates Microscopy with VLM Agents
Experiment Automation Agents (EAA) is a vision-language-model-driven system that automates complex microscopy workflows in materials characterization. It combines multimodal reasoning, tool actions, and long-term memory for autonomous or user-guided experiments. Demonstrated at the Advanced Photon Source, it handles focusing, feature search, and data acquisition to boost efficiency.
Common Belief Defies KD4: New Axioms
Contrary to common belief, common belief is not simply KD4 when individual beliefs are KD45: it retains the D and 4 properties but also satisfies shift-reflexivity, C(Cφ → φ). The paper proves that KD4 extended with this axiom is still incomplete and that one additional axiom, which depends on the number of agents, is required. This fully characterizes common belief, settling a long-open problem.
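For reference, the properties named above can be written in their standard modal-logic form, reading C as the common belief operator. These are the textbook schemas, given here as an aid; the additional agent-number-dependent axiom is not stated in the summary and is therefore not reproduced.

```latex
\begin{align*}
\textbf{K:}\quad & C(\varphi \to \psi) \to (C\varphi \to C\psi) \\
\textbf{D:}\quad & C\varphi \to \neg C\neg\varphi \\
\textbf{4:}\quad & C\varphi \to CC\varphi \\
\textbf{Shift-reflexivity:}\quad & C(C\varphi \to \varphi)
\end{align*}
```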
AI Predicts Invoice Dilution with Leakage-Free XGBoost & KAN
This arXiv paper proposes an AI/ML framework to predict invoice dilution in supply chain finance, mitigating non-credit risks and margin losses. It employs leakage-free two-stage XGBoost, Kolmogorov-Arnold Networks (KAN), and ensemble models trained on production data across nine transaction fields. The method supports real-time dynamic credit limits, reducing reliance on the buyer's irrevocable payment undertakings (IPU).
AgriWorld: LLM Agents for Verifiable Agri Reasoning
Researchers introduce AgriWorld, a Python execution environment with unified tools for geospatial queries, remote-sensing analytics, crop simulations, and agri predictors. The Agro-Reflective LLM agent uses an execute-observe-refine loop for multi-turn reasoning over agricultural data. Evaluated on the new AgroBench benchmark, it outperforms text-only and direct tool-use baselines.
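An execute-observe-refine loop of the kind described can be sketched in a few lines. This is an illustrative skeleton only: `run_snippet`, `execute_observe_refine`, and `stub_propose` are hypothetical names, the stub stands in for an LLM call, and none of AgriWorld's actual tools are modelled.

```python
import contextlib
import io
import traceback
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (code, observation) pairs from earlier turns


def run_snippet(code: str) -> Tuple[bool, str]:
    """Execute code in a fresh namespace; return (ok, stdout or error line)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
        return True, buffer.getvalue().strip()
    except Exception:
        # Keep only the final line of the traceback as the observation.
        return False, traceback.format_exc().strip().splitlines()[-1]


def execute_observe_refine(
    propose: Callable[[History], str], max_turns: int = 3
) -> Tuple[Optional[str], History]:
    """Ask for code, run it, feed the observation back, repeat until success."""
    history: History = []
    for _ in range(max_turns):
        code = propose(history)
        ok, observation = run_snippet(code)
        history.append((code, observation))
        if ok:
            return observation, history
    return None, history


def stub_propose(history: History) -> str:
    """Stand-in for an LLM: the first attempt is buggy, the second fixes it."""
    if not history:
        return "print(mean([2.0, 4.0]))"  # fails: mean is not imported
    return "from statistics import mean\nprint(mean([2.0, 4.0]))"


result, turns = execute_observe_refine(stub_propose)
print(result, len(turns))
```

In a real agent the `propose` callable would receive the error text from the previous turn as part of its prompt, which is what makes the loop verifiable rather than purely text-based.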
Steinberger's OpenClaw Vision as Constitution
Peter Steinberger left a VISION.md document before joining OpenAI, framing OpenClaw's future less as a roadmap and more as a constitution. The article delivers a line-by-line examination of its contents.
OpenClaw v2026.2.17: 1M Context + Sonnet 4.6
The OpenClaw v2026.2.17 release enables Anthropic's 1M token context window for Opus and Sonnet. It introduces Sonnet 4.6 support alongside extensive updates to iOS, Slack, Telegram, Discord, and cron systems.
Palo Alto CEO: AI Lags in Enterprise
Palo Alto Networks CEO Nikesh Arora reports minimal enterprise AI adoption, limited mainly to coding assistants. Business use trails consumer adoption by at least two years. The company acquired Koi to gear up for future AI developments.
Qwen-Code v0.10.4: Fixes & Region Support
Qwen-Code released v0.10.4 with a news banner announcing the Qwen3.5-Plus launch, fixes for sandbox user permissions in integration tests, and new support for Coding Plan Global/Intl regions. The release bumps the version from 0.10.3, and the full changelog is available.
Spain Probes X, Meta, TikTok on AI CSAM
Spain's government demands investigation into X, Meta, and TikTok for allegedly using AI to create and spread child sexual abuse material. PM Sanchez accuses platforms of harming children's rights and vows to end their impunity. This follows plans to ban under-16s from social media.
YouTube Recovers from Recommendation Outage
YouTube resolved a brief global outage caused by a recommendation system failure that prevented videos from appearing. The issue affected all platforms including YouTube.com, apps, Music, Kids, and TV. Peak reports hit over 320,000 in the US per Downdetector, with impacts in multiple countries.
GitHub Unveils Copilot CLI Command Cheat Sheet
GitHub has compiled and explained the slash commands for GitHub Copilot CLI in an official blog post. Developers can execute quick, repeatable actions in the terminal without switching to an editor or the web UI.
Gartner: Under 20 Companies to Deploy Humanoids in Production by 2028
Gartner predicts fewer than 20 companies will deploy humanoid robots in full production by 2028. The forecast focuses on manufacturing and supply chain sectors amid physical AI hype.
AI Adopted by Billions in One Chinese Spring Festival
Chinese AI apps Qianwen, Doubao, and Yuanbao surged during the 2026 Spring Festival via red envelope campaigns, logging billions of interactions and onboarding over 130 million new users, including elderly and lower-tier city residents. This achieved unprecedented adoption speed, faster than smartphones (5 years) or mobile payments (3 years). The event marked AI's shift from niche to mainstream through habit-forming incentives.
Snapdragon Chipsets Show 71-93% INT8 Accuracy Variance
The same INT8 ONNX model tested on five Snapdragon chipsets yields accuracy ranging from 93% (8 Gen 3) down to 71% (4 Gen 2), versus 94% in the cloud. The causes cited are NPU INT8 rounding differences, operator fusion variations, and CPU fallbacks on low-end chips. The result highlights the need for hardware-specific on-device testing.
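To see how rounding behaviour alone can shift INT8 outputs, the sketch below quantizes the same values under three rounding modes. It is purely illustrative: the summary does not say which modes any particular Snapdragon NPU uses, and the `quantize` helper, the scale, and the inputs are assumptions chosen so that every value lands exactly on a rounding tie.

```python
import math
from typing import List


def quantize(x: float, scale: float, mode: str) -> int:
    """Map a float to a signed 8-bit integer using the given rounding mode."""
    q = x / scale
    if mode == "half_even":
        r = round(q)  # Python's built-in round uses round-half-to-even
    elif mode == "half_away":
        r = int(math.copysign(math.floor(abs(q) + 0.5), q))
    elif mode == "truncate":
        r = math.trunc(q)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return max(-128, min(127, int(r)))  # clamp to the INT8 range


def quantize_all(values: List[float], scale: float, mode: str) -> List[int]:
    return [quantize(v, scale, mode) for v in values]


# Inputs chosen to sit exactly on ties (q = 0.5, 1.5, 2.5, -0.5) at scale 0.5.
values = [0.25, 0.75, 1.25, -0.25]
for mode in ("half_even", "half_away", "truncate"):
    print(mode, quantize_all(values, 0.5, mode))
# half_even [0, 2, 2, 0]
# half_away [1, 2, 3, -1]
# truncate  [0, 1, 2, 0]
```

Differences of one quantization step per value look small in isolation, but they can accumulate across layers, which is one plausible reading of why the same model file scores differently on different hardware.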
Galaxy Star Brain Enables Real Robot Deployment
Galaxy Universal transitions robots from stage performances to practical on-the-job use via its end-to-end large model, Galaxy Star Brain. A capable working robot debuted at this year's Spring Festival Gala. The piece highlights the model's strength in real-world applications.
Tesla Avoids CA Sales Ban on FSD Marketing
California DMV confirms Tesla complied with marketing rules for Autopilot and Full Self-Driving, avoiding a 30-day sales ban. This follows a December judge's ruling on exaggerated claims, with Tesla given 90 days for corrections. The company implemented required corrective measures.
California Probes xAI Grok Explicit Images
California AG Rob Bonta is launching an AI accountability program while investigating xAI's Grok for generating explicit pornographic images without consent, including potentially underage content. The office issued a cease-and-desist order last month amid global scrutiny. xAI deflects blame and still allows some sexualized content for paid users.
Qwen SDK TypeScript v0.1.5-preview.2 Released
Qwen released SDK TypeScript v0.1.5-preview.2, bundling CLI v0.10.2 with fixes for authentication, logging, and extension issues. New features include experimental skills settings, a redesigned CLI UI, and removal of the tiktoken dependency. Various docs updates and compatibility improvements round out the release.