1. Use cases

Video generation models generate dynamic video content from text or image descriptions. As the technology continues to advance, its applications are becoming increasingly widespread. Potential application areas include:

  1. Dynamic content generation: Video generation models can create dynamic visual content to describe and explain information.
  2. Multimodal intelligent interaction: Combining image and text inputs, video generation models can be used for more intelligent and interactive applications.
  3. Replacing or enhancing traditional visual technologies: Video generation models can replace or augment traditional machine vision technologies to solve more complex multimodal problems.

As the technology progresses, the multimodal capabilities of video generation models will integrate with vision-language models, driving their comprehensive application in intelligent interaction, automated content generation, and complex scenario simulation. Video generation models can also be combined with image generation models (image-to-video) to further expand their application range, enabling more diverse and richer visual content generation.

2. Usage recommendations

When writing prompts, pay attention to detailed, chronological descriptions of actions and scenes. Include specific actions, appearance, camera angles, and environmental details. All content should be written in a single paragraph, starting directly with the main action, and the description should be specific and precise. Imagine yourself as a director describing a shot script. Keep the prompt within 200 words.

To achieve the best results, structure your prompt as follows:

  • Start with a sentence describing the main action
    • Example: A woman with light skin, wearing a blue jacket and a black hat with a veil, first looks down and to her right, then raises her head back up as she speaks.
  • Add specific details about actions and gestures
    • Example: She first looks down and to her right, then raises her head back up as she speaks.
  • Precisely describe the appearance of the character/object
    • Example: She has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her blue jacket.
  • Include details about the background and environment
    • Example: The background is out of focus, but shows trees and people in period clothing.
  • Specify the camera angle and movement
    • Example: The camera remains stationary on her face as she speaks.
  • Describe lighting and color effects
    • Example: The scene is captured in real-life footage, with natural lighting and true-to-life colors.
  • Note any changes or sudden events
    • Example: A gust of wind blows through the trees, causing the woman's veil to flutter slightly.
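The structure above can be sketched as a small helper that assembles the individual sentences into the recommended single-paragraph prompt and enforces the 200-word limit. The function name and sentence list are illustrative only, not part of any official SDK:

```python
def build_prompt(parts):
    """Join sentence fragments into one single-paragraph prompt:
    main action first, then action, appearance, environment,
    camera, lighting, and event details."""
    prompt = " ".join(p.strip() for p in parts)
    word_count = len(prompt.split())
    if word_count > 200:
        raise ValueError(f"Prompt is {word_count} words; keep it within 200.")
    return prompt

# The example sentences from the structure above, in order.
parts = [
    "A woman with light skin, wearing a blue jacket and a black hat with a veil.",
    "She first looks down and to her right, then raises her head back up as she speaks.",
    "She has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her blue jacket.",
    "The background is out of focus, but shows trees and people in period clothing.",
    "The camera remains stationary on her face as she speaks.",
    "The scene is captured in real-life footage, with natural lighting and true-to-life colors.",
    "A gust of wind blows through the trees, causing the woman's veil to flutter slightly.",
]
prompt = build_prompt(parts)
```

Writing the fragments separately and joining them keeps each of the seven recommended elements easy to review before submission.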

Example of a video generated from the above prompt:

3. Try it out

You can try it out in the playground. You can also refer to the API Documentation to learn how to invoke the API.
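As a rough sketch of what an API request might contain, the snippet below builds a JSON request body for a text-to-video call. The field names here are assumptions for illustration; consult the API Documentation for the actual endpoint and parameters:

```python
import json

# Hypothetical request body for a text-to-video generation call.
# "model" uses one of the supported models listed in this document;
# the other field names are illustrative assumptions.
payload = {
    "model": "Wan-AI/Wan2.1-T2V-14B",
    "prompt": (
        "A woman with light skin, wearing a blue jacket and a black hat "
        "with a veil, first looks down and to her right, then raises "
        "her head back up as she speaks."
    ),
}
body = json.dumps(payload)
```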

4. Supported models

4.1 Text-to-video models

Currently supported text-to-video models:

  • tencent/HunyuanVideo
  • tencent/HunyuanVideo-HD
  • Wan-AI/Wan2.1-T2V-14B-Turbo
  • Wan-AI/Wan2.1-T2V-14B

tencent/HunyuanVideo-HD does not support API calls.

Image-to-video resolution: the resolution is automatically matched based on the aspect ratio of the user’s uploaded image:
  • 16:9 👉 1280×720
  • 9:16 👉 720×1280
  • 1:1 👉 960×960

For optimal generation results, we recommend using images with aspect ratios of 16:9 / 9:16 / 1:1 to generate videos.
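The automatic matching described above can be sketched as follows: the uploaded image’s aspect ratio is mapped to the closest of the three supported resolutions. This is an illustrative reimplementation, not the service’s actual matching code:

```python
# Supported aspect ratios and their output resolutions,
# as listed above: 16:9 -> 1280x720, 9:16 -> 720x1280, 1:1 -> 960x960.
SUPPORTED = {
    16 / 9: (1280, 720),
    9 / 16: (720, 1280),
    1.0: (960, 960),
}

def match_resolution(width, height):
    """Return the supported resolution whose aspect ratio is
    closest to that of the uploaded image."""
    ratio = width / height
    closest = min(SUPPORTED, key=lambda r: abs(r - ratio))
    return SUPPORTED[closest]
```

For example, a 1920x1080 upload matches 16:9 and would be generated at 1280x720; this is also why images already at 16:9, 9:16, or 1:1 give the best results.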

Note: The supported text-to-video models are subject to change. Please filter by the “Video” tag on the Models page to obtain the current list of supported models.