Seedance 2.0 — ByteDance's Cinematic AI Video Generator

Seedance 2.0 turns a single prompt into a 15-second cinematic clip with native audio, phoneme-accurate lip-sync, and director-level camera control. It is the multimodal AI video model that topped the Artificial Analysis leaderboard ahead of Veo 3, Sora 2, and Gen-4.5.

Drop a Reference Image for Seedance 2.0

The model accepts up to 9 reference images per generation. Upload a character sheet, a location photo, or a lighting reference — it pulls identity, style, and staging from what you drop in, then directs the scene around it.

Supports PNG, JPG, WebP up to 24MB
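The stated limits (up to 9 reference images; PNG, JPG, or WebP; 24MB, assumed here to apply per file) are easy to pre-check before uploading. A minimal client-side sketch — the helper name is illustrative, not part of any documented SDK:

```python
# Pre-flight check mirroring the stated upload limits: up to 9 reference
# images per generation, PNG/JPG/WebP, assumed 24 MB cap per file.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
MAX_IMAGES = 9
MAX_BYTES = 24 * 1024 * 1024  # 24 MB

def validate_references(files):
    """files: list of (filename, size_in_bytes) tuples."""
    if len(files) > MAX_IMAGES:
        raise ValueError(f"At most {MAX_IMAGES} reference images per generation")
    for name, size in files:
        ext = "." + name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"{name}: unsupported format (use PNG, JPG, or WebP)")
        if size > MAX_BYTES:
            raise ValueError(f"{name}: exceeds the 24 MB limit")
    return True
```

Running this before a render request fails fast on oversized or unsupported files instead of burning an upload round-trip.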

Pick Your Aspect Ratio

16:9 for cinematic, 9:16 for shorts, 1:1 for social. Every ratio renders at up to 4K.


Native Audio-Video in One Pass

It is the first mainstream video model that generates audio and video jointly — not as a post-processed layer. Footsteps splash in puddles with frame-accurate timing, cloth rustles when the wind blows, and a guitar string vibrates in sync with its note.

Prompt

Person walking through puddles in heavy rain, footsteps synchronized with splashing sounds, raindrops hitting umbrella in rhythm with audio, 4K quality, realistic water physics, cinematic atmosphere, perfect audio-visual timing.

Native audio sync

Director-Level Camera Control

Seedance 2.0 takes cinematographer vocabulary literally. Call a dolly-in, a rack focus, a Dutch angle, a whip pan — it executes. Multi-shot storytelling from a single prompt, so one 15-second render can feel like a cut sequence.

Prompt

Professional portrait of a young man in a rainy urban street at night, neon signs reflecting on wet pavement, atmospheric fog, shallow depth of field, cinematic bokeh, moody color palette, 4K ultra-detailed, film noir aesthetic.

Cinematic control

Phoneme-Level Lip-Sync in 8+ Languages

Drop a character portrait and a line of dialogue — the model animates mouth shapes at the phoneme level, not the word level. The result passes on close inspection in English, Mandarin, Japanese, Korean, Spanish, French, German, and more.

Prompt

Close-up shot of a woman speaking directly to camera, clear articulation of words, natural facial expressions during speech, perfect lip-sync with audio, 4K cinematic quality, professional interview lighting, authentic conversational tone.

Phoneme lip-sync

Physics That Hold Up

Fabric wrinkles the way real cloth does. Liquids refract. Particles obey gravity and wind independently. Trained on real-world footage, the model's physics survive slow-motion scrutiny that breaks other video models.

Prompt

Slow-motion shot of a red silk scarf being thrown into the air, floating gracefully with realistic fabric physics, gentle wind affecting movement, 4K quality, cinematic lighting with soft shadows, photorealistic material properties.

Real-world physics

9 Images + 3 Videos + 3 Audios per Generation

Seedance 2.0 takes richer reference payloads than any other public video model. Feed character sheets, location plates, existing footage, reference scores — it fuses them into a single coherent render instead of averaging them into mush.

Prompt

4K close-up of water being poured into a crystal glass, realistic liquid physics with surface tension, light refraction through water and glass, dynamic splashing, photorealistic transparency and reflections, cinematic lighting.

Multi-reference fusion

Topped Artificial Analysis in 2026

It hit Elo 1269 on Artificial Analysis's video-generation leaderboard in April 2026, ahead of Google Veo 3, OpenAI Sora 2, and Runway Gen-4.5. On SeedVideoBench-2.0 it leads text-to-video, image-to-video, and multimodal tasks.

Prompt

Cherry blossom petals falling in slow motion, realistic wind patterns affecting each petal differently, natural gravity and air resistance, 4K cinematic quality, soft bokeh background, spring atmosphere, photorealistic textures.

Benchmark leader

Why Creators Choose Seedance 2.0

Seedance 2.0 dropped the tradeoffs. Cinematic output, native audio, director-level control, and world-class physics — all in one model, in one pass, in one 15-second clip.

Seedance 2.0 Credit Plans

Every render costs credits based on duration and resolution. Pick the plan that matches how much you shoot. Credits roll over on subscription plans; one-time packs never expire.

Starter
$9.90/month

For solo creators testing the model.

Includes:

  • 2,950 credits per month
  • ~30 renders/month

Creator
$19.90/month

For working video creators.

Includes:

  • 6,500 credits per month
  • ~65 renders/month

Studio
$49.90/month

For agencies running at volume.

Includes:

  • 18,000 credits per month
  • ~180 renders/month
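The plan figures above imply roughly 100 credits per standard render (6,500 / 65 and 18,000 / 180). A quick arithmetic sketch — the flat per-render cost is an assumption for planning only, since actual cost varies with duration and resolution:

```python
# Approximate renders per month implied by each plan, assuming a flat
# ~100 credits per standard render (an assumption derived from the plan
# figures; real cost varies with duration and resolution).
CREDITS_PER_RENDER = 100

PLANS = {"Starter": 2950, "Creator": 6500, "Studio": 18000}

def approx_renders(monthly_credits, per_render=CREDITS_PER_RENDER):
    """Whole renders a credit balance covers at the assumed flat rate."""
    return monthly_credits // per_render

for name, credits in PLANS.items():
    print(f"{name}: ~{approx_renders(credits)} renders/month")
```

At the assumed rate, Starter works out to 29 renders, which matches the "~30" figure once rounding is taken into account.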

Seedance 2.0 FAQ

Quick answers about running Seedance 2.0 through our hosted gateway.

01

What is Seedance 2.0?

Seedance 2.0 is ByteDance's flagship multimodal video model, released February 2026. It generates up to 15 seconds with native audio, director-level camera control, and phoneme-level lip-sync in 8+ languages, all in a single forward pass. The model topped Artificial Analysis's leaderboard above Veo 3, Sora 2, and Gen-4.5.

02

Is Seedance 2.0 free to try?

You get starter credits on sign-up — enough to render your first clip without paying. After that, every render costs credits based on duration and resolution. Our credit plans start at $9.90/month.

03

Can I use Seedance 2.0 from the US?

Yes. ByteDance excluded the United States from the direct rollout through Dreamina. Our hosted gateway relays requests through supported regions, so US creators can use the full Seedance 2.0 API without a VPN or waitlist.

04

How long are Seedance 2.0 videos?

Each render is up to 15 seconds. Within that window the model can produce multiple shots with natural cuts and transitions, so the output feels like an edited sequence rather than a continuous take.

05

What inputs does Seedance 2.0 accept?

In a single pass, Seedance 2.0 accepts a text prompt plus up to 9 reference images, 3 video clips, and 3 audio clips. Character identity, location, camera style, and even ambient sound can all be seeded from references.
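Those per-generation caps (9 images, 3 videos, 3 audios) can be enforced when assembling a request. A sketch of a hypothetical payload builder — the field names and structure are illustrative, not the documented gateway API; only the limits come from this page:

```python
# Hypothetical request payload for a multi-reference generation call.
# Field names are illustrative assumptions; the per-kind limits
# (9 images, 3 videos, 3 audios) are the documented caps.
MAX_REFS = {"images": 9, "videos": 3, "audios": 3}

def build_payload(prompt, images=(), videos=(), audios=()):
    refs = {"images": list(images), "videos": list(videos), "audios": list(audios)}
    for kind, items in refs.items():
        if len(items) > MAX_REFS[kind]:
            raise ValueError(f"At most {MAX_REFS[kind]} reference {kind} per generation")
    return {"prompt": prompt, **refs}
```

Validating locally like this keeps a malformed request from ever reaching the gateway.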

06

Does Seedance 2.0 really generate audio?

Yes, and it is one of the defining features. Video and audio are generated jointly in one forward pass — not post-processed. Footsteps, dialogue, music, and ambient sound all land on the right frame because the model never separates them.

07

Is Seedance 2.0 safe for commercial use?

Renders you generate through our gateway are yours to use commercially under our terms of service. Seedance 2.0 has built-in content moderation; prompts that violate policy are rejected before compute.