๐ŸผStalecollected in 3h

SkyReels-V4 Tops Global Text-to-Video Ranking

๐ŸผRead original on Pandaily

💡 SkyReels-V4 outranks Sora 2 and Veo 3.1 Fast on the Artificial Analysis Text-to-Video with Audio leaderboard, taking the global top spot.

⚡ 30-Second TL;DR

What Changed

Kunlun Tech's SkyReels-V4 ranks #1 on the Artificial Analysis Text-to-Video with Audio leaderboard.

Why It Matters

This milestone raises Kunlun Tech's profile in the competitive text-to-video market and puts pressure on incumbents such as OpenAI (Sora) and Google (Veo) to move faster. It also signals progress in multimodal generation and gives developers another state-of-the-art option for video generation with synchronized audio.

What To Do Next

Review SkyReels-V4's results on the Artificial Analysis leaderboard, then evaluate the model against your own text-to-video workloads.

Who should care: Developers & AI Engineers

🧠 Deep Insight

Web-grounded analysis with 7 cited sources.

🔑 Enhanced Key Takeaways

  • SkyReels-V4 achieves an Elo score of 1131 on the Artificial Analysis Text-to-Video with Audio leaderboard, ahead of Kling 3.0 1080p (Pro) at 1097.[3]
  • The model generates 1080p video at 32 FPS for clips of up to 15 seconds. It uses native audio-visual co-generation, including frame-perfect lip-syncing and sound-effect (SFX) alignment.[2]
  • Kunlun Tech, through its Skywork AI unit, uses a dual-stream Multimodal Diffusion Transformer architecture. Training starts with low-resolution text-to-image generation on 3 billion images and scales up to 1080p with multimodal inputs.[4]
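The leaderboard figures are Elo scores derived from head-to-head preference votes. The sketch below assumes they follow the standard Elo convention (a logistic curve on a 400-point scale). Artificial Analysis may fit its ratings differently, so treat this as an illustration. Under that assumption, a rating gap converts into an expected pairwise win rate. The function name `expected_preference` and the `LEADERBOARD` dictionary are illustrative only; the scores come from the takeaways and competitor table.

```python
def expected_preference(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head votes won by A over B, assuming the
    standard Elo model (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# "With audio" Elo scores as reported in the article (illustrative mapping).
LEADERBOARD = {
    "SkyReels-V4": 1131,
    "Kling 3.0 1080p (Pro)": 1097,
    "Veo 3.1 Fast": 1086,
}

if __name__ == "__main__":
    leader = LEADERBOARD["SkyReels-V4"]
    for name, score in LEADERBOARD.items():
        if name == "SkyReels-V4":
            continue  # skip comparing the leader with itself
        p = expected_preference(leader, score)
        print(f"SkyReels-V4 vs {name}: gap {leader - score}, expected win rate {p:.3f}")
```

Under this assumption, the 34-point lead over Kling 3.0 1080p (Pro) corresponds to an expected win rate of about 0.549. The 45-point lead over Veo 3.1 Fast corresponds to about 0.564. The top ranking therefore reflects a real but modest preference margin in blind pairwise comparisons.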
📊 Competitor Analysis
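| Model | Elo (with audio) | Elo (no audio) | Resolution / frame rate | Max length |
|---|---|---|---|---|
| SkyReels-V4 | 1131 | 1244 | 1080p / 32 FPS | 15 s |
| Kling 3.0 1080p (Pro) | 1097 | 1248 | 1080p | N/A |
| Veo 3.1 Fast | 1086 | N/A | N/A | N/A |
| Sora 2 | Below top 5 | N/A | N/A | N/A |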

🛠️ Technical Deep Dive

  • Uses a dual-stream Multimodal Diffusion Transformer (MM-DiT) that unifies video and audio synthesis, inpainting, and editing in a single framework.[4]
  • Training pipeline: stage 1 trains low-resolution text-to-image generation on 3 billion images, and training progresses through stage 6, which uses 1080p multimodal inputs. Audio is integrated via upsampling and fusion, with video sparse attention.[4]
  • Accepts multimodal inputs, including text, image, and mask references, for pixel-level control. It generates 1080p video at 32 FPS for up to 15 seconds with microsecond-level audio-video synchronization.[2]
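The sources describe the dual-stream MM-DiT only at a high level, and SkyReels-V4's code is not shown here. The sketch below therefore illustrates the generic joint-attention pattern that MM-DiT designs are known for: each modality keeps its own projection weights, but attention runs over the concatenated video and audio token sequence. It is an assumption-laden toy, not Kunlun Tech's implementation. It uses a single head, random weights, and tiny dimensions. It omits normalization, timestep modulation, MLPs, and the video sparse attention mentioned above. All names (`StreamWeights`, `joint_attention`, and so on) are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class StreamWeights:
    """Hypothetical Q/K/V projections owned by a single modality stream."""

    def __init__(self, dim: int, rng: random.Random):
        self.wq = rand_matrix(dim, dim, rng)
        self.wk = rand_matrix(dim, dim, rng)
        self.wv = rand_matrix(dim, dim, rng)

    def project(self, tokens: Matrix) -> tuple[Matrix, Matrix, Matrix]:
        return matmul(tokens, self.wq), matmul(tokens, self.wk), matmul(tokens, self.wv)


def joint_attention(video: Matrix, audio: Matrix,
                    video_w: StreamWeights, audio_w: StreamWeights) -> tuple[Matrix, Matrix]:
    # Dual-stream: each modality is projected with its own weights...
    qv, kv, vv = video_w.project(video)
    qa, ka, va = audio_w.project(audio)
    # ...but attention runs over the concatenated sequence, so video tokens
    # can attend to audio tokens and vice versa.
    q, k, v = qv + qa, kv + ka, vv + va
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    out = matmul(weights, v)
    # Split the joint output back into the two streams.
    n_video = len(video)
    return out[:n_video], out[n_video:]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    video_tokens = rand_matrix(6, dim, rng)  # e.g. 6 video patch tokens
    audio_tokens = rand_matrix(4, dim, rng)  # e.g. 4 audio tokens
    v_out, a_out = joint_attention(video_tokens, audio_tokens,
                                   StreamWeights(dim, rng), StreamWeights(dim, rng))
    print(len(v_out), len(a_out), len(v_out[0]))  # 6 4 8
```

Because the two streams share one attention operation, changing the video tokens changes the audio outputs, and vice versa. This is one plausible mechanism by which such an architecture can keep sound aligned with on-screen events. How SkyReels-V4 actually achieves its reported synchronization is not described in enough detail to confirm this.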

🔮 Future Implications
AI analysis grounded in cited sources

Integrating SkyReels-V4 into platforms such as Atlas Cloud is projected to cut post-production costs for short-form creators by 50%.
Native audio-visual co-generation removes the need for manual dubbing and audio layering. The model targets film post-production studios and marketing agencies with optimized inference costs.[2]
Kunlun Tech's ecosystem is projected to capture 20% of the global AI short-drama market by the end of 2026.
The ecosystem combines SkyReels with Mureka AI music, Skywork multimodal reasoning, and the DramaWave platform to deliver end-to-end applications.[1]

โณ Timeline

2025-02
Open-sourced SkyReels-V1, China's first AI video model for short dramas.
2025-04
Released SkyReels-V2, billed as the world's first infinite-length film generation model, built on Diffusion Forcing.
2026-01
Open-sourced SkyReels-V3, multi-subject video generation system.
2026-03
SkyReels-V4 tops Artificial Analysis Text-to-Video with Audio leaderboard.

AI-curated news aggregator. All content rights belong to original publishers.
Original source: Pandaily ↗