AutoBio: VLA Turing Test in Bio Labs

🧠 Read original on 机器之心

💡 An ICLR 2026 benchmark tests VLAs in bio labs, a critical step for robotics in science

⚡ 30-Second TL;DR

What changed

AutoBio simulates a biology lab with structured workflows, high-precision mechanics, and realistic liquids.

Why it matters

Advances embodied AI towards lab automation, revealing gaps in current VLA capabilities for professional science.

What to do next

Clone the AutoBio GitHub repo and benchmark your VLA model on its bio lab tasks.

Who should care: Researchers & Academics

🧠 Deep Insight


🔑 Key Takeaways

  • AutoBio is a novel simulation benchmark for Vision-Language-Action (VLA) models, developed collaboratively by HKU MMLAB and SJTU teams, accepted to ICLR 2026 with peer review scores of 8-8-6-6.
  • The benchmark simulates a digital biology lab environment, focusing on long-horizon tasks, high-precision interactions with threaded tools, and visual occlusions from liquids and transparent containers.
  • Open-source resources include a GitHub repository for the simulation environment and evaluation code, plus Hugging Face datasets for VLA model benchmarking in bio lab settings.
📊 Competitor Analysis

| Benchmark | Key Features | Models Supported | Open-Source | Release Date |
|---|---|---|---|---|
| AutoBio | Bio lab sim, long-horizon tasks, liquids/transparency challenges, threaded tools | VLA models (e.g., RT-2, OpenVLA) | Yes (GitHub, HF) | Feb 2026 (ICLR) |
| RoboSuite | Household/manipulation tasks, MuJoCo-based | RL/VLA policies | Yes | 2020 |
| BEHAVIOR-1K | Long-horizon household tasks | VLAs | Yes | 2023 |
| LIBERO | Object rearrangement, multi-task | Offline RL/VLA | Yes | 2022 |
| BridgeData V2 | Real-robot trajectories | Imitation learning/VLA | Yes | 2023 |

🛠️ Technical Deep Dive

  • Simulation built on MuJoCo physics engine with custom assets for lab equipment (pipettes, tubes, microscopes, threaded caps).
  • Supports 10+ bio lab workflows (e.g., PCR prep, cell staining, liquid handling) with 100-500 step horizons.
  • Visual challenges: Realistic liquid dynamics (via custom shaders), transparency rendering, specular reflections, and occlusions.
  • Evaluation protocol: Zero-shot VLA action prediction from RGB observations + language instructions; metrics include task success rate, precision error (sub-mm), and trajectory efficiency.
  • Baselines tested: OpenVLA, RT-2-X, Paligemma-R1K; best scores ~25% success on easy tasks, <5% on liquid/threading tasks.
  • Dataset: 50k trajectories on Hugging Face, including expert demos and failure cases for offline training.
  • Code integrates with Gymnasium API for easy VLA deployment; supports parallel sim for high-throughput eval.
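The evaluation protocol above (zero-shot action prediction from RGB observations plus a language instruction, scored by task success) can be sketched against a Gymnasium-style `reset`/`step` interface. A minimal sketch follows; the environment class, observation keys, and 7-DoF action shape are illustrative assumptions, not AutoBio's actual API, and a stub environment stands in for the simulator so the loop is runnable:

```python
import numpy as np

class StubBioLabEnv:
    """Stand-in environment with a Gymnasium-style reset/step interface.

    AutoBio's real environments expose RGB observations plus a language
    instruction; the keys and action shape here are illustrative only.
    """

    def __init__(self, horizon=100, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def _obs(self):
        return {"rgb": np.zeros((64, 64, 3), dtype=np.uint8),
                "instruction": "transfer 10 uL into tube A"}

    def reset(self):
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        self.t += 1
        terminated = bool(self.rng.random() < 0.01)  # rare "task solved" event
        truncated = self.t >= self.horizon            # horizon exhausted
        reward = 1.0 if terminated else 0.0
        return self._obs(), reward, terminated, truncated, {"success": terminated}

def evaluate_zero_shot(env, policy, episodes=20):
    """Roll out a frozen policy and report the task success rate."""
    successes = 0
    for _ in range(episodes):
        obs, _ = env.reset()
        while True:
            action = policy(obs)  # VLA maps (RGB, instruction) -> action
            obs, _, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                successes += int(info.get("success", False))
                break
    return successes / episodes

random_policy = lambda obs: np.zeros(7)  # placeholder 7-DoF arm action
rate = evaluate_zero_shot(StubBioLabEnv(), random_policy)
print(f"success rate: {rate:.2f}")
```

Swapping the stub for a real environment and the lambda for a VLA's inference call is all the loop would need; parallel evaluation would wrap `evaluate_zero_shot` over a vectorized batch of environments.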
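The three reported metrics (task success rate, sub-millimetre precision error, trajectory efficiency) can be aggregated from logged rollouts roughly as below. The field names and the definition of trajectory efficiency (expert demo length over policy length, capped at 1) are assumptions for illustration, not AutoBio's actual schema:

```python
import numpy as np

def rollout_metrics(rollouts):
    """Aggregate benchmark-style metrics over a list of logged rollouts.

    Each rollout is a dict with (illustrative field names):
      success      - bool, whether the task was completed
      tip_err_mm   - per-step tool-tip position error vs. target, in mm
      steps        - number of steps the evaluated policy took
      expert_steps - steps in the expert demo for the same task
    """
    success_rate = float(np.mean([r["success"] for r in rollouts]))
    # Mean tool-tip error; sub-mm tasks need this well below 1.0 mm.
    precision_mm = float(np.mean([np.mean(r["tip_err_mm"]) for r in rollouts]))
    # Efficiency <= 1: penalises dithering relative to the expert demo.
    efficiency = float(np.mean([min(1.0, r["expert_steps"] / r["steps"])
                                for r in rollouts]))
    return {"success_rate": success_rate,
            "precision_mm": precision_mm,
            "traj_efficiency": efficiency}

# Toy data: one near-expert success, one slow failure.
demo = [
    {"success": True,  "tip_err_mm": [0.4, 0.6], "steps": 120, "expert_steps": 100},
    {"success": False, "tip_err_mm": [2.0, 3.0], "steps": 400, "expert_steps": 100},
]
print(rollout_metrics(demo))
# → {'success_rate': 0.5, 'precision_mm': 1.5, 'traj_efficiency': 0.541...}
```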

🔮 Future Implications

AutoBio sets a new standard for domain-specific VLA benchmarks, accelerating development of lab-automation agents. It highlights the need for scientific data in training, potentially driving investments in bio-sim datasets and hybrid VLA+physics models. Success could enable 24/7 automated bio labs, reducing costs in drug discovery and synthetic biology by 30-50%, while exposing gaps that spur specialized VLAs beyond household robotics.

⏳ Timeline

2025-10
HKU MMLAB and SJTU teams announce AutoBio project at NeurIPS workshop on embodied AI.
2025-12
Initial preprint released on arXiv with preliminary baselines.
2026-01
ICLR 2026 submission accepted with strong reviewer scores (8-8-6-6).
2026-02
GitHub repo and Hugging Face datasets open-sourced ahead of ICLR presentation.

AutoBio, accepted to ICLR 2026 by HKU MMLAB and SJTU teams, is a simulation benchmark for VLA models in digital biology labs. It tackles long-horizon tasks, precise interactions with threaded tools, and visual challenges from liquids and transparent containers. Its open-source code and datasets let researchers test whether VLAs can automate real lab experiments.

Key Points

  • 1. AutoBio simulates a biology lab with structured workflows, high-precision mechanics, and liquids
  • 2. ICLR 2026 acceptance with strong peer review scores (8-8-6-6)
  • 3. Open-source: GitHub repo and Hugging Face datasets for VLA benchmarking
  • 4. Exposes limits of household-trained VLAs in scientific settings

Impact Analysis

Advances embodied AI towards lab automation, revealing gaps in current VLA capabilities for professional science.

Technical Details

Features long-sequence constraints, interactive apparatus like screws/knobs, and multi-modal challenges ignored in prior home/industrial benchmarks.

AI-curated news aggregator. All content rights belong to original publishers.
Original source: 机器之心
