⚛️ 量子位
Alibaba's Versatile Model Hits Qwen App

💡Alibaba's Qwen counters Sora with an all-round content creation model; try it to gauge multimodal gains.
⚡ 30-Second TL;DR
What Changed
Qwen app introduces a versatile 'all-round acting' video generation model
Why It Matters
Alibaba intensifies competition in multimodal AI, potentially accelerating accessible content tools for creators. This could shift market dynamics in Chinese AI apps versus global leaders.
What To Do Next
Download the Qwen app and experiment with the new model for video content generation.
Who should care: Creators & Designers
🧠 Deep Insight
AI-generated analysis for this event.
🔑 Enhanced Key Takeaways
- The model, identified as Qwen-Video, utilizes a diffusion transformer architecture specifically optimized for long-duration, high-fidelity video generation with temporal consistency.
- Alibaba has integrated this model into the Qwen app to enable direct text-to-video and image-to-video workflows, bypassing the need for external video editing software.
- The release emphasizes 'controllable' generation, allowing users to influence camera movement and character motion, a key differentiator from the initial black-box approach of early Sora-like models.
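The 'controllable' generation described above implies that camera and character motion are passed as structured conditioning alongside the text prompt. The sketch below illustrates what such a request might look like; the field names, allowed camera moves, and value ranges are illustrative assumptions, not Qwen's actual API.

```python
# Hypothetical sketch of structured motion controls for text-to-video
# generation. All field names and ranges here are assumptions for
# illustration -- they are NOT Qwen-Video's real request schema.

ALLOWED_CAMERA_MOVES = {"static", "pan_left", "pan_right",
                        "zoom_in", "zoom_out", "orbit"}

def build_generation_request(prompt: str,
                             camera_move: str = "static",
                             motion_strength: float = 0.5,
                             num_frames: int = 120,
                             fps: int = 24) -> dict:
    """Assemble a text-to-video request with explicit motion controls."""
    if camera_move not in ALLOWED_CAMERA_MOVES:
        raise ValueError(f"unknown camera move: {camera_move}")
    if not 0.0 <= motion_strength <= 1.0:
        raise ValueError("motion_strength must be in [0, 1]")
    return {
        "prompt": prompt,
        "controls": {
            "camera_move": camera_move,
            # 0 = near-still scene, 1 = maximal character motion
            "motion_strength": motion_strength,
        },
        "num_frames": num_frames,
        "fps": fps,
        "duration_s": num_frames / fps,  # derived for readability
    }

req = build_generation_request("a cat walking through a neon city",
                               camera_move="pan_left")
print(req["duration_s"])  # 120 frames at 24 fps -> 5.0 seconds
```

The point of the sketch is the design choice itself: exposing motion as named, validated parameters rather than burying it in the prompt is what distinguishes "controllable" generation from the black-box prompting of early Sora-like models.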
📊 Competitor Analysis
| Feature | Qwen-Video (Alibaba) | Sora (OpenAI) | Kling AI (Kuaishou) |
|---|---|---|---|
| Primary Focus | Mobile-first content creation | High-fidelity cinematic generation | Realistic motion & physics |
| Pricing | Freemium/Token-based | Enterprise/API (Restricted) | Tiered Subscription |
| Benchmarks | High temporal consistency | High visual fidelity | High motion realism |
🛠️ Technical Deep Dive
- Architecture: Employs a latent diffusion transformer (DiT) framework designed to handle variable aspect ratios and frame rates natively.
- Temporal Modeling: Incorporates a 3D-VAE (Variational Autoencoder) to compress video data while maintaining spatial-temporal coherence across long sequences.
- Training Data: Leverages a proprietary dataset of high-quality, annotated video-text pairs, focusing on complex physical interactions and camera dynamics.
- Inference Optimization: Utilizes model quantization and kernel-level optimizations to enable real-time or near-real-time generation on mobile-accessible cloud infrastructure.
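The 3D-VAE compression step above can be illustrated with a back-of-the-envelope shape calculation: the encoder shrinks a raw pixel tensor into the latent grid the diffusion transformer actually denoises. The factors used here (4x temporal, 8x spatial, 16 latent channels) mirror common published video-VAE designs and are assumptions, not confirmed Qwen-Video hyperparameters.

```python
# Back-of-the-envelope sketch of 3D-VAE compression: a clip of shape
# (frames, height, width, 3) maps to a much smaller latent grid.
# Downsampling factors are typical of published video VAEs and are
# assumptions, not Qwen-Video's actual configuration.

def latent_shape(frames: int, height: int, width: int,
                 t_factor: int = 4, s_factor: int = 8,
                 latent_channels: int = 16) -> tuple:
    """Shape of the latent a 3D-VAE encoder would emit for this clip."""
    return (frames // t_factor,
            height // s_factor,
            width // s_factor,
            latent_channels)

def compression_ratio(frames: int, height: int, width: int, **kw) -> float:
    """How many raw RGB values map onto one latent value."""
    raw = frames * height * width * 3
    lt, lh, lw, lc = latent_shape(frames, height, width, **kw)
    return raw / (lt * lh * lw * lc)

# 5 seconds of 720p video at 24 fps:
print(latent_shape(120, 720, 1280))               # (30, 90, 160, 16)
print(compression_ratio(120, 720, 1280))          # 48.0x fewer values
```

Compressing temporally as well as spatially is what makes long sequences tractable: the transformer attends over ~48x fewer values per clip, and temporal coherence is enforced once in the VAE rather than re-learned frame by frame.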
🔮 Future Implications
AI analysis grounded in cited sources
Alibaba will expand Qwen-Video capabilities to include real-time video-to-video style transfer.
The current architecture's focus on temporal consistency provides a strong foundation for frame-by-frame style application without flickering.
The Qwen app will become a primary revenue driver for Alibaba's Cloud division through enterprise API integration.
By lowering the barrier to high-quality video production, Alibaba is positioning its cloud infrastructure as the backend for creative agencies and marketing firms.
⏳ Timeline
2023-08
Alibaba releases the first iteration of the Qwen (Tongyi Qianwen) open-source LLM series.
2024-09
Alibaba introduces Qwen-2.5, significantly improving reasoning and coding capabilities across the model family.
2025-06
Alibaba announces the integration of multimodal capabilities into the Qwen mobile application.
2026-04
Official launch of the 'all-round acting' video generation model within the Qwen app.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: 量子位 ↗