All Updates

Page 231 of 923

April 14, 2026

📄
ArXiv AI • 20d ago

Pessimistic VGA for Bias-Free Multi-Criteria Ranking

This arXiv paper introduces new linear-programming-based Virtual Gap Analysis (VGA) models for handling biases and data diversity in multi-criteria analysis (MCA). It outlines a two-step pessimistic method that uses both cardinal and ordinal data to assess and prioritize alternatives, eliminating the least favorable ones. The approach scales well for use in decision support systems.

#linear-programming#multi-criteria#decision-making
📄
ArXiv AI • 20d ago

OpenFlo Automates Web UX with AI Agents

OpenFlo is an AI agent that simulates human behavior on websites for automated UX evaluation, producing reports based on the System Usability Scale (SUS), the Single Ease Question (SEQ), and Think Aloud protocols. Unlike DOM-based tools, it uses GUI grounding for robust end-to-end interactions. Its open-source code enables scalable usability testing for developers.

#ux-evaluation#web-agents#gui-grounding
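
SUS is a standard 10-item questionnaire answered on a 1-5 scale, and its conventional scoring rule is simple to reproduce. The sketch below applies that standard rule. It assumes OpenFlo's reports use the conventional SUS formula, which the summary does not confirm, and the helper name `sus_score` is hypothetical.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Conventional System Usability Scale score (0-100) from ten 1-5 answers.

    Hypothetical helper: assumes the standard SUS scoring rule.
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded: contribute (r - 1).
        # Even-numbered items are negatively worded: contribute (5 - r).
        total += (r - 1) if i % 2 == 1 else (5 - r)
    # Scale the 0-40 raw sum to the familiar 0-100 range.
    return total * 2.5


# Usage: all-neutral answers land exactly in the middle of the scale.
print(sus_score([3] * 10))  # 50.0
```

Alternating the direction of the items is why a plain average of the raw answers would be misleading: a "5" means strong agreement with a positive statement on odd items but with a negative one on even items.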
📄
ArXiv AI • 20d ago

OOWM: Object-Oriented World Modeling for Embodied AI

OOWM introduces a framework that structures embodied reasoning using object-oriented programming and UML diagrams, redefining world models as explicit symbolic tuples of states and transitions. It uses class diagrams to capture object hierarchies derived from visual perception and activity diagrams for executable planning. A three-stage training pipeline combining SFT and GRPO enables learning from sparse rewards, and the approach outperforms textual chain-of-thought (CoT) reasoning on the MRoom-30k benchmark.

#embodied-reasoning#robotic-planning#uml-modeling
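
The summary names GRPO without detail. The core idea of Group Relative Policy Optimization is to sample several outputs for the same prompt, score them, and use each output's reward relative to the rest of its group as the advantage, which removes the need for a learned critic. The sketch shows only that advantage computation, not OOWM's training pipeline; the function name `group_relative_advantages` is hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead. eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: a sparse binary reward over a group of four sampled plans.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
print(group_relative_advantages([0.0, 0.0, 0.0]))       # [0.0, 0.0, 0.0]
```

With sparse rewards, a group in which every sample fails gets all-zero advantages and so contributes no reward-driven policy-gradient signal, which is one reason a supervised (SFT) stage before GRPO is a common design choice.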
📄
ArXiv AI • 20d ago

MobiFlow: Real-World Mobile Agent Benchmark

MobiFlow is a new evaluation framework for mobile agents built on tasks from arbitrary third-party applications. It employs an efficient graph-construction algorithm based on multi-trajectory fusion to compress the state space and support dynamic interactions. Covering 20 apps and 240 tasks, it aligns more closely with human assessments than AndroidWorld does.

#mobile-agents#benchmarking#gui
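
A simplified picture of multi-trajectory fusion: several recorded runs of a task are merged into one transition graph in which identical screens become a single node, so shared prefixes are stored only once. This sketch assumes states can be compared by exact equality (for example, a hashable screen signature); how MobiFlow actually decides that two states are equivalent is not described in the summary, and the name `fuse_trajectories` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

# A trajectory is a list of (state, action, next_state) steps.
# States are assumed hashable, e.g. a canonical screen signature.
Step = Tuple[Hashable, str, Hashable]
Graph = Dict[Hashable, Set[Tuple[str, Hashable]]]


def fuse_trajectories(trajectories: List[List[Step]]) -> Graph:
    """Merge trajectories into one graph: equal states share a node and
    duplicate (action, next_state) edges are stored once."""
    graph: Graph = defaultdict(set)
    for trajectory in trajectories:
        for state, action, next_state in trajectory:
            graph[state].add((action, next_state))
            graph.setdefault(next_state, set())  # keep terminal states as nodes
    return dict(graph)


# Usage: two runs that share their first step.
t1 = [("home", "tap_search", "search"), ("search", "type_query", "results")]
t2 = [("home", "tap_search", "search"), ("search", "tap_history", "history")]
g = fuse_trajectories([t1, t2])
print(sorted(g))  # ['history', 'home', 'results', 'search']
```

Four recorded steps collapse into three distinct edges over four nodes; on real apps, where many runs revisit the same screens, this kind of merging is what keeps the graph small.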
📄
ArXiv AI • 20d ago

LABBench2: Tougher AI Biology Benchmark

LABBench2 introduces nearly 1,900 tasks to measure AI systems' real-world biology research capabilities, building on LAB-Bench with more realistic contexts. Frontier models have improved on earlier benchmarks but see accuracy drops of 26-46% on these tougher tasks. The dataset is available on Hugging Face and the evaluation harness on GitHub.

#benchmark#biology-ai#ai-evaluation
📄
ArXiv AI • 20d ago

Factorizing Formal Contexts via Necessity Operators

This arXiv paper analyzes a method for factorizing formal contexts into independent subcontexts using closures of necessity operators from possibility theory. It examines properties of set pairs that enable such factorizations in Boolean data settings. The approach is extended to fuzzy contexts to support efficient computation of subcontexts.

#formal-contexts#possibility-theory#fuzzy-logic
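
In possibility-theoretic formal concept analysis, the necessity operator maps a set of objects A to the attributes possessed only by objects in A, and dually maps a set of attributes B to the objects whose attributes all lie in B. When A and B map onto each other, no object in A has an attribute outside B and no object outside A has an attribute in B, so the context splits into two independent subcontexts. The sketch below checks that condition on a tiny Boolean context. It follows the textbook definitions rather than the paper's own algorithm, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class FormalContext:
    objects: FrozenSet[str]
    attributes: FrozenSet[str]
    incidence: FrozenSet[Tuple[str, str]]  # (object, attribute) pairs

    def necessity_of_objects(self, A: Set[str]) -> Set[str]:
        """Attributes m such that every object having m lies in A."""
        return {m for m in self.attributes
                if all(g in A for g in self.objects if (g, m) in self.incidence)}

    def necessity_of_attributes(self, B: Set[str]) -> Set[str]:
        """Objects g such that every attribute of g lies in B."""
        return {g for g in self.objects
                if all(m in B for m in self.attributes if (g, m) in self.incidence)}

    def splits_into_independent_parts(self, A: Set[str], B: Set[str]) -> bool:
        """True if (A, B) is a fixed pair of the necessity operators, which
        implies the incidence lies inside A x B and (G - A) x (M - B)."""
        return (self.necessity_of_objects(A) == set(B)
                and self.necessity_of_attributes(B) == set(A))


# Usage: g1 and g2 only use attributes a and b; g3 only uses c.
ctx = FormalContext(
    objects=frozenset({"g1", "g2", "g3"}),
    attributes=frozenset({"a", "b", "c"}),
    incidence=frozenset({("g1", "a"), ("g2", "a"), ("g2", "b"), ("g3", "c")}),
)
print(ctx.splits_into_independent_parts({"g1", "g2"}, {"a", "b"}))  # True
print(ctx.splits_into_independent_parts({"g1"}, {"a"}))             # False
```

The second pair fails because attribute a is also held by g2, which lies outside {g1}, so {g1} x {a} cannot be cut away as an independent block.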
📄
ArXiv AI • 20d ago

Explainable Planning for Hybrid Systems

This arXiv paper presents a comprehensive study of explainable AI planning (XAIP) for hybrid systems. It highlights applications in safety-critical domains such as self-driving cars, robotics, and healthcare, and addresses the growing need for explanations in automated planning as more tasks shift to AI-driven automation.

#explainable-ai#automated-planning#hybrid-systems
📄
ArXiv AI • 20d ago

Benchmark Humanizes Mobile GUI Agents

The paper introduces a 'Turing Test on Screen' benchmark that models agent detection as a MinMax optimization problem for minimizing behavioral divergence from humans. It collects a high-fidelity dataset of mobile touch dynamics, which reveals that vanilla LMM agents are easy to detect because of their unnatural kinematics. It establishes AHB with evaluation metrics and proposes humanization methods that achieve high imitability without sacrificing utility.

#gui-agents#humanization#touch-dynamics
🗾
ITmedia AI+ (Japan) • 20d ago

AWS Launches Risky OpenClaw AI Agent on Lightsail

AWS has made the open-source autonomous private AI agent OpenClaw available on its VPS service Amazon Lightsail. Users can run AI agents via browser to automate tasks like email management, web browsing, and file organization. Security considerations are emphasized due to potential risks.

#ai-agent#autonomous-agent#vps-hosting
📄
ArXiv AI • 20d ago

AHC: Meta-Learned Compression for MCU Detection

AHC is a meta-learning framework for adaptive compression that enables continual object detection on microcontrollers (MCUs) with under 100 KB of memory. It combines five-step MAML-based adaptation, hierarchical scale-aware compression, and dual-memory consolidation. It outperforms baselines on CORe50, TiROD, and PASCAL VOC and provides theoretical bounds on forgetting.

#continual-learning#model-compression#edge-ai
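
In MAML, a meta-learned initialization is adapted to a new task with a small, fixed number of gradient steps on that task's support data (the inner loop). The sketch shows only that five-step inner loop on a toy one-feature linear regressor in plain Python; the model, learning rate, data, and function names are hypothetical, and AHC's compression and dual-memory components are not modeled.

```python
from typing import List, Tuple

Params = Tuple[float, float]       # (weight, bias) of a toy linear model
Data = List[Tuple[float, float]]   # (x, y) pairs from the new task's support set


def mse_and_grad(params: Params, data: Data) -> Tuple[float, float, float]:
    """Mean squared error of y_hat = w*x + b and its gradient w.r.t. (w, b)."""
    w, b = params
    n = len(data)
    residuals = [(w * x + b - y, x) for x, y in data]
    loss = sum(r * r for r, _ in residuals) / n
    grad_w = sum(2 * r * x for r, x in residuals) / n
    grad_b = sum(2 * r for r, _ in residuals) / n
    return loss, grad_w, grad_b


def inner_loop_adapt(meta_init: Params, support: Data,
                     steps: int = 5, lr: float = 0.1) -> Params:
    """MAML inner loop: plain gradient steps starting from the meta-learned init."""
    w, b = meta_init
    for _ in range(steps):
        _, grad_w, grad_b = mse_and_grad((w, b), support)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Usage: adapt a hypothetical meta-initialization to a new task y = 2x + 1.
support = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)]
meta_init = (0.0, 0.0)
adapted = inner_loop_adapt(meta_init, support)
before, _, _ = mse_and_grad(meta_init, support)
after, _, _ = mse_and_grad(adapted, support)
print(f"support loss: {before:.3f} -> {after:.3f}")
```

In full MAML, an outer loop would update `meta_init` so that this short adaptation works well across many tasks; keeping the inner loop to a handful of steps is what makes on-device adaptation affordable on memory-constrained hardware.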
📄
ArXiv AI • 20d ago

Agentic PDE Exploration with Latent Models

This research couples multi-agent LLMs with latent foundation models (LFMs) to enable continuous exploration of PDE-governed phenomena such as fluid flows. LFMs provide compact, disentangled latent representations and act as fast surrogate simulators for arbitrary parameters. Applied to flow past tandem cylinders at a Reynolds number of Re = 500, the system autonomously discovers new scaling laws for displacement and momentum thickness.

#multi-agent#pde-simulation#scientific-discovery
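
For reference, the quantities named in this item have standard textbook definitions. The Reynolds number for flow past a cylinder is conventionally based on the free-stream velocity, the cylinder diameter D, and the kinematic viscosity nu; displacement and momentum thickness integrate the velocity deficit of a profile u(y) relative to a reference velocity U_e. The paper may define or normalize these differently for the tandem-cylinder wake.

```latex
\mathrm{Re} = \frac{U_\infty D}{\nu} = 500,
\qquad
\delta^{*} = \int_{0}^{\infty} \left(1 - \frac{u(y)}{U_e}\right) \mathrm{d}y,
\qquad
\theta = \int_{0}^{\infty} \frac{u(y)}{U_e}\left(1 - \frac{u(y)}{U_e}\right) \mathrm{d}y
```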
📄
ArXiv AI • 20d ago

7 Steps for AI Log Analysis

AI systems generate vast logs during interactions, and these logs are crucial for understanding model behavior and evaluating effectiveness. This arXiv paper proposes a standardized 7-step pipeline based on best practices, illustrated with code from the Inspect Scout library. It offers detailed guidance and highlights common pitfalls for reproducible analysis.

#log-analysis#ai-evaluation#best-practices
💰
TMTPost • 20d ago

Skip L3? Huawei Sticks, XPeng Targets L4

A debate is raging in China over whether to skip L3 autonomy and pursue L4 directly. Huawei sees L3 as an essential step amid ongoing pilot programs, while XPeng is leading the push toward L4. At the core of the debate is competition for policy incentives.

#autonomous-driving#av-strategy#china-policy
💰
TMTPost • 20d ago

StepFun & Moonshot Race to IPO

Chinese AI firms StepFun (Jieyue Xingchen) and Moonshot AI (Yuezhi Anmian) are racing toward IPOs. The race raises the question of who will become the next breakout success, the next 'China Lobster'. A listing would mark the start of even more intense competition.

#ipo#chinese-ai#startups
๐Ÿผ
Pandailyโ€ข20d ago

Kuaishou KroWork Tests Remote AI Coding

Kuaishou is testing KroWork, an AI coding assistant that supports remote task execution through messaging platforms and is built around asynchronous development workflows. It is currently in early testing on macOS for Apple silicon.

#remote-tasks#async-dev#apple-silicon
๐Ÿผ
Pandailyโ€ข20d ago

SenseTime Care U Expands AI to Homes

SenseTime's SenseAuto has launched Care U, an AI home companion that extends in-vehicle intelligence to household scenarios. It features cross-device memory and coordination across car, home, and office. The product unifies AI systems in multiple environments.

#home-ai#cross-device#vehicle-to-home
💰
TMTPost • 20d ago

ZTE Tackles AI's 'Deep Waters' for Enterprises

ZTE positions itself as a 'value contributor' in the AI era, one that does not overstep its role. It works with partners across industries to build resilient ecosystems, an approach aimed at driving widespread adoption of digital intelligence technologies in government and enterprise.

#digital-intelligence#ecosystem-building#gov-tech
💰
TMTPost • 20d ago

China's AI Unicorns Reach Maturity

Chinese AI unicorns are marking their 'coming of age'. AI models remain a key factor, but the competitive battlefield is evolving.

#china-ai#competition#maturity
⚛️
QbitAI • 20d ago

Qwen Agents Generate Excel from Chat

Qwen has introduced an Agent feature that generates and edits Excel files directly from conversational input, reshaping traditional spreadsheet workflows around AI agents.

#excel-agent#productivity#agentic-ai
💰
TMTPost • 20d ago

Altman's Home Torched: AI Luddites Resurge?

An arson attack targeted Sam Altman's residence. The incident has revived fears of a 'Luddite movement' in the AI era and raised questions about the sustainability of 'winner-takes-all' tech gains.

#luddite#ai-backlash#sam-altman