
Approaching.AI Launches ATaaS Token Production Platform

Read original on 雷峰网

💡 Cut AI inference costs by 20%+ via ATaaS heterogeneous optimizations and KV Cache breakthroughs

⚡ 30-Second TL;DR

What Changed

Approaching.AI launched the ATaaS platform on four core technologies; its Liuhe heterogeneous inference engine 2.0 alone cuts cluster costs by more than 20%.

Why It Matters

ATaaS provides a benchmark for optimizing domestic AI compute, enabling cost-effective scaling amid Token demand surge. It bridges hardware-software gaps, reducing waste in large-scale deployments.

What To Do Next

Contact Approaching.AI to demo ATaaS for optimizing your heterogeneous inference clusters.

Who should care: Enterprise & Security Teams

🧠 Deep Insight

AI-generated analysis for this event.

🔑 Enhanced Key Takeaways

  • Approaching.AI (趨境科技) is positioning ATaaS as a middleware layer specifically targeting the 'AI inference bottleneck' in data centers, aiming to bridge the gap between raw GPU hardware and high-concurrency application requirements.
  • The platform's architecture is designed to support multi-model, multi-tenant environments, allowing data centers to dynamically allocate compute resources across different LLM workloads without manual reconfiguration.
  • The company's strategy focuses on 'Token-as-a-Service' (ATaaS) as a business model, shifting the focus from selling hardware or software licenses to selling optimized, low-latency token throughput to enterprise clients.
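Dynamic multi-tenant allocation of the kind described above can be sketched as weighted fair sharing of a GPU pool. This is a minimal illustrative sketch, not Approaching.AI's actual scheduler; all names and numbers are assumptions.

```python
# Hypothetical sketch of multi-tenant compute allocation: each tenant's LLM
# workload receives GPU capacity in proportion to a configured weight, and the
# split is recomputed as tenants change -- no manual reconfiguration needed.

def allocate_gpus(total_gpus: int, tenant_weights: dict[str, float]) -> dict[str, int]:
    """Split a GPU pool across tenants proportionally to their weights,
    using largest-remainder rounding so every GPU is assigned."""
    total_weight = sum(tenant_weights.values())
    shares = {t: total_gpus * w / total_weight for t, w in tenant_weights.items()}
    alloc = {t: int(s) for t, s in shares.items()}
    # Hand leftover GPUs to the tenants with the largest fractional remainder.
    leftover = total_gpus - sum(alloc.values())
    for t in sorted(shares, key=lambda t: shares[t] - alloc[t], reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc

if __name__ == "__main__":
    pool = allocate_gpus(100, {"chatbot": 3.0, "rag": 1.5, "batch-summarize": 0.5})
    print(pool)  # {'chatbot': 60, 'rag': 30, 'batch-summarize': 10}
```

A production scheduler would also account for model placement, VRAM footprints, and in-flight requests, but the proportional-share core is the same.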
📊 Competitor Analysis

| Feature | Approaching.AI (ATaaS) | vLLM (Open Source) | NVIDIA TensorRT-LLM |
| --- | --- | --- | --- |
| Core Focus | Heterogeneous cluster optimization | High-throughput serving | Hardware-specific acceleration |
| KV Cache Management | Proprietary 'Yuebing' (100-1000x) | PagedAttention | PagedAttention |
| Scalability | 10k-card elastic scaling | Limited by cluster size | Limited by cluster size |
| Pricing Model | ATaaS (Token-based) | Open Source (Free) | Hardware-bundled/Enterprise |

🛠️ Technical Deep Dive

  • Liuhe Heterogeneous Inference 2.0: Utilizes a unified abstraction layer to manage mixed GPU architectures (e.g., H100s, A100s, and domestic chips) within a single cluster, reducing idle time caused by hardware fragmentation.
  • Yuebing KV Cache: Implements a hierarchical memory management system that offloads KV cache to CPU memory or NVMe storage, effectively decoupling context window size from VRAM capacity.
  • Shuangyi SLO Simulation: A predictive engine that models request latency and throughput at the operator level, allowing the scheduler to preemptively adjust resource allocation before SLO violations occur.
  • Wanxiang Scaling: Employs a distributed communication protocol optimized for high-latency interconnects, enabling linear throughput scaling across 10,000+ GPUs.
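The hierarchical KV-cache offloading described for Yuebing can be sketched as a tiered LRU cache: hot KV blocks stay in VRAM, cooler ones spill to CPU RAM, and the coldest to NVMe, which is how context length gets decoupled from VRAM capacity. Yuebing itself is proprietary; this generic sketch only illustrates the tiering idea, with stubbed data and illustrative capacities.

```python
# Generic sketch of hierarchical KV-cache offloading across memory tiers.
# tiers[0] = VRAM, tiers[1] = CPU RAM, tiers[2] = NVMe; each tier is an
# LRU-ordered map from block id to the cached KV data (stubbed as strings).

from collections import OrderedDict

class TieredKVCache:
    def __init__(self, capacities: list[int]):
        self.capacities = capacities
        self.tiers: list[OrderedDict] = [OrderedDict() for _ in capacities]

    def put(self, block_id, kv_block, tier: int = 0):
        self.tiers[tier][block_id] = kv_block
        self.tiers[tier].move_to_end(block_id)
        # On overflow, evict the least-recently-used block to the next tier.
        if len(self.tiers[tier]) > self.capacities[tier]:
            victim, data = self.tiers[tier].popitem(last=False)
            if tier + 1 < len(self.tiers):
                self.put(victim, data, tier + 1)  # cascade toward NVMe

    def get(self, block_id):
        for cache in self.tiers:
            if block_id in cache:
                data = cache.pop(block_id)
                self.put(block_id, data, 0)  # promote back to VRAM on access
                return data
        return None  # not cached anywhere: must be recomputed

cache = TieredKVCache(capacities=[2, 4, 100])  # tiny sizes for illustration
for i in range(5):
    cache.put(f"blk{i}", f"kv{i}")
# blk3 and blk4 are hottest and remain in VRAM; earlier blocks spilled down.
```

Real systems track blocks at page granularity (as in PagedAttention) and overlap transfers with compute, but the promote-on-access, evict-downward pattern is the core mechanism.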

🔮 Future Implications

AI analysis grounded in cited sources.

  • ATaaS will force a shift in data center pricing models from 'GPU-hour' to 'Token-per-dollar'. By significantly increasing resource efficiency, the platform makes token-based billing more profitable and predictable for infrastructure providers than traditional hourly rentals.
  • Approaching.AI will likely pursue partnerships with domestic Chinese chip manufacturers. The 'Liuhe' heterogeneous integration technology is specifically designed to integrate diverse hardware, a critical requirement for utilizing non-NVIDIA chips in the current supply-constrained market.
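The GPU-hour vs. token-per-dollar argument reduces to simple arithmetic: at a fixed hourly rental rate, higher tokens-per-second directly lowers cost per token. A back-of-envelope sketch, using purely hypothetical prices and throughput figures (not numbers from the article):

```python
# Illustrative comparison of GPU-hour vs token-based billing economics.
# All numbers below are hypothetical assumptions for the sake of arithmetic.

gpu_hour_price = 2.00   # $/GPU-hour, assumed rental rate
baseline_tps = 1_000    # tokens/sec per GPU before optimization
optimized_tps = 1_250   # +25% throughput from inference optimization

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    """Dollars to produce 1M tokens on one GPU at the assumed hourly rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hour_price * 1_000_000 / tokens_per_hour

baseline = cost_per_million_tokens(baseline_tps)
optimized = cost_per_million_tokens(optimized_tps)
print(f"${baseline:.3f} vs ${optimized:.3f} per 1M tokens "
      f"({1 - optimized / baseline:.0%} cheaper)")
```

Under these assumptions a 25% throughput gain yields a 20% lower cost per token, which is why efficiency gains make per-token pricing predictable for the provider.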

Timeline

2023-05
Approaching.AI (趨境科技) founded with a focus on AI infrastructure optimization.
2024-09
Company secures initial funding to develop high-efficiency inference middleware.
2026-03
Official launch of the ATaaS (AI Token as a Service) platform.


AI-curated news aggregator. All content rights belong to original publishers.
Original source: 雷峰网