Approaching.AI Launches ATaaS Token Production Platform

💡 Cut AI inference costs 20%+ via ATaaS heterogeneous-cluster optimizations & KV Cache breakthroughs
⚡ 30-Second TL;DR
What Changed
Approaching.AI launched the ATaaS platform on four core technologies — Liuhe heterogeneous inference 2.0, Yuebing KV cache, Shuangyi SLO simulation, and Wanxiang scaling — with the Liuhe engine alone cutting cluster costs by 20%+.
Why It Matters
ATaaS provides a benchmark for optimizing domestic AI compute, enabling cost-effective scaling amid Token demand surge. It bridges hardware-software gaps, reducing waste in large-scale deployments.
What To Do Next
Contact Approaching.AI to demo ATaaS for optimizing your heterogeneous inference clusters.
Who should care: Enterprise & Security Teams
🧠 Deep Insight
AI-generated analysis for this event.
🔑 Enhanced Key Takeaways
- Approaching.AI (趨境科技) is positioning ATaaS as a middleware layer specifically targeting the 'AI inference bottleneck' in data centers, aiming to bridge the gap between raw GPU hardware and high-concurrency application requirements.
- The platform's architecture is designed to support multi-model, multi-tenant environments, allowing data centers to dynamically allocate compute resources across different LLM workloads without manual reconfiguration.
- The company's strategy focuses on 'Token-as-a-Service' (ATaaS) as a business model, shifting the focus from selling hardware or software licenses to selling optimized, low-latency token throughput to enterprise clients.
📊 Competitor Analysis
| Feature | Approaching.AI (ATaaS) | vLLM (Open Source) | NVIDIA TensorRT-LLM |
|---|---|---|---|
| Core Focus | Heterogeneous cluster optimization | High-throughput serving | Hardware-specific acceleration |
| KV Cache Management | Proprietary 'Yuebing' (100-1000x) | PagedAttention | PagedAttention |
| Scalability | 10k-card elastic scaling | Limited by cluster size | Limited by cluster size |
| Pricing Model | ATaaS (Token-based) | Open Source (Free) | Hardware-bundled/Enterprise |
🛠️ Technical Deep Dive
- Liuhe Heterogeneous Inference 2.0: Utilizes a unified abstraction layer to manage mixed GPU architectures (e.g., H100s, A100s, and domestic chips) within a single cluster, reducing idle time caused by hardware fragmentation.
- Yuebing KV Cache: Implements a hierarchical memory management system that offloads KV cache to CPU memory or NVMe storage, effectively decoupling context window size from VRAM capacity.
- Shuangyi SLO Simulation: A predictive engine that models request latency and throughput at the operator level, allowing the scheduler to preemptively adjust resource allocation before SLO violations occur.
- Wanxiang Scaling: Employs a distributed communication protocol optimized for high-latency interconnects, enabling linear throughput scaling across 10,000+ GPUs.
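The hierarchical KV-cache idea behind Yuebing — demoting cold cache blocks to a larger, slower tier instead of dropping and recomputing them — can be illustrated with a minimal sketch. The `TieredKVCache` class, its tier sizes, and the use of plain Python dicts are all illustrative assumptions, not Approaching.AI's actual implementation:

```python
from collections import OrderedDict

class TieredKVCache:
    """Illustrative two-tier KV cache (hypothetical, not the Yuebing code):
    a small 'VRAM' tier backed by a larger 'host' tier standing in for CPU
    RAM or NVMe. Hot blocks stay in VRAM; least-recently-used blocks are
    demoted to the host tier rather than discarded, so long contexts are
    not bounded by VRAM capacity."""

    def __init__(self, vram_blocks: int):
        self.vram_blocks = vram_blocks
        self.vram = OrderedDict()   # block_id -> KV data, LRU-ordered
        self.host = {}              # overflow tier (CPU RAM / NVMe in practice)

    def put(self, block_id, kv):
        self.vram[block_id] = kv
        self.vram.move_to_end(block_id)           # mark as most recently used
        while len(self.vram) > self.vram_blocks:
            victim, data = self.vram.popitem(last=False)  # evict LRU block
            self.host[victim] = data                      # demote, don't drop

    def get(self, block_id):
        if block_id in self.vram:                 # VRAM hit
            self.vram.move_to_end(block_id)
            return self.vram[block_id]
        if block_id in self.host:                 # promote back into VRAM
            kv = self.host.pop(block_id)
            self.put(block_id, kv)
            return kv
        return None                               # true miss: must recompute
```

A real system would move tensors across PCIe/NVMe asynchronously and prefetch ahead of the decode loop, but the eviction/promotion logic follows the same shape.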
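The Shuangyi-style idea of predicting latency before an SLO violation occurs can also be reduced to a toy admission check. The function name, its parameters, and the linear per-token cost model are all assumptions for illustration; a production scheduler would model latency at the operator level as the source describes:

```python
def admit_request(queue_tokens: int, new_tokens: int,
                  ms_per_token: float, slo_ms: float) -> bool:
    """Toy SLO-aware admission check (hypothetical sketch): estimate the
    new request's completion latency from current queue depth and a flat
    per-token decode cost, and reject or reroute it if the prediction
    would breach the latency SLO."""
    predicted_ms = (queue_tokens + new_tokens) * ms_per_token
    return predicted_ms <= slo_ms
```

The point of running such a check preemptively is that resources can be reallocated or requests rerouted *before* latency targets are missed, instead of reacting after the fact.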
🔮 Future Implications
AI analysis grounded in cited sources
ATaaS will force a shift in data center pricing models from 'GPU-hour' to 'Token-per-dollar'.
By significantly increasing resource efficiency, the platform makes token-based billing more profitable and predictable for infrastructure providers than traditional hourly rentals.
Approaching.AI will likely pursue partnerships with domestic Chinese chip manufacturers.
The 'Liuhe' heterogeneous integration technology is specifically designed to integrate diverse hardware, which is a critical requirement for utilizing non-NVIDIA chips in the current supply-constrained market.
⏳ Timeline
2023-05
Approaching.AI (趨境科技) founded with a focus on AI infrastructure optimization.
2024-09
Company secures initial funding to develop high-efficiency inference middleware.
2026-03
Official launch of the ATaaS (AI Token as a Service) platform.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 雷峰网 ↗