Image generation
1. Image generation model overview
The platform provides image generation models that can be used in two main ways: generating images directly based on prompts, or generating image variants based on existing images.
- Generating images based on text prompts
When using large text-to-image models, carefully crafted prompts can help generate higher-quality images. Here are some tips to improve the quality of generated images:
- Specific description: Try to describe the image content in detail. For example, if you want to generate a sunset beach scene, instead of just “beach sunset,” you can try “a serene beach with the sun setting, the sky turning orange and red, gentle waves lapping at the sand, and a small boat in the distance.”
- Emotion and atmosphere: Besides describing the content, you can also add descriptions of emotion or atmosphere, such as “cozy,” “mysterious,” or “energetic,” which help the model better understand the desired style.
- Style specification: If you have a specific art style preference, such as “impressionism” or “surrealism,” you can specify it in the prompt to make the generated image more likely to meet your expectations.
- Avoid vague words: Try to avoid abstract or vague words, such as “beautiful” or “good,” as these are difficult for the model to concretize and may result in images that differ significantly from your expectations.
- Use negatives: If you do not want certain elements in the image, you can use negative phrasing to exclude them. For example, “generate a beach sunset image without a boat.”
- Step-by-step input: For complex scenes, you can try building the prompt in stages: first generate a basic image, then adjust or add details as needed.
- Try different descriptions: Sometimes different descriptions of the same scene yield different results. Try describing the scene from different angles or with different words to see which gives the best result.
- Utilize model-specific features: Some models offer specific features or parameter adjustment options, such as adjusting the resolution or style strength of the generated image. Using these features properly can also help improve the quality of the generated image.
-
By using these methods, you can effectively improve the quality of images generated using large text-to-image models. However, since different models may have different characteristics and preferences, you may need to make appropriate adjustments based on the specific model’s features and feedback.
You can refer to the following examples:
A futuristic eco-friendly skyscraper in central Tokyo. The building incorporates lush vertical gardens on every floor, with cascading plants and trees lining glass terraces. Solar panels and wind turbines are integrated into the structure’s design, reflecting a sustainable future. The Tokyo Tower is visible in the background, contrasting the modern eco-architecture with traditional city landmarks.
An elegant snow leopard perched on a cliff in the Himalayan mountains, surrounded by swirling snow. The animal’s fur is intricately detailed with distinctive patterns and a thick winter coat. The scene captures the majesty and isolation of the leopard’s habitat, with mist and mountain peaks fading into the background.
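The tips above can be sketched as a small prompt-building helper. The structure below (subject, details, atmosphere, style) is an illustrative convention for organizing a prompt, not part of the platform's API:

```python
def build_prompt(subject, details=(), atmosphere=None, style=None):
    """Combine a concrete subject, supporting details, atmosphere,
    and an optional art style into a single comma-separated prompt."""
    parts = [subject]
    parts.extend(details)
    if atmosphere:
        parts.append(f"{atmosphere} atmosphere")
    if style:
        parts.append(f"in the style of {style}")
    return ", ".join(parts)

prompt = build_prompt(
    "a serene beach with the sun setting",
    details=("sky turning orange and red", "gentle waves lapping at the sand"),
    atmosphere="cozy",
    style="impressionism",
)
print(prompt)
```

Keeping the subject concrete and appending atmosphere and style as separate clauses follows the “specific description” and “style specification” tips above.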
- Generating image variants based on existing images
Some image generation models support generating variants of an existing image. In this case, you still need to provide an appropriate prompt to achieve the desired result; refer to the prompt guidance above.
2. Experience address
You can experience the image generation function by visiting Image Generation, or call it programmatically by referring to the API documentation.
Key parameters:
- image_size: Controls the image resolution. You can request custom resolutions when calling the API.
- num_inference_steps: Controls the number of inference steps. Some models produce better results when the step count is tuned. Models such as black-forest-labs/FLUX.1-schnell, Pro/black-forest-labs/FLUX.1-schnell, and stabilityai/stable-diffusion-3-5-large-turbo do not support adjusting the step count and use a fixed default of 4 steps.
- prompt_enhancement: Prompt enhancement switch. When enabled, the input prompt is enhanced before generation. If you want to quickly generate images from Chinese prompts, turn this on for better adaptation to Chinese.
- batch_size: The number of images generated per request. The default is 1; the maximum is 4.
- negative_prompt: Elements you do not want to appear in the image; use this to exclude unwanted content.
- seed: To generate the same image every time, set the seed to a fixed value.
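As a minimal sketch, the parameters above can be assembled into a request body like the one below. The field names mirror the parameter list in this section, but the helper function, its defaults, and the surrounding structure are illustrative assumptions; consult the API documentation for the actual endpoint, authentication, and request format:

```python
import json

def build_request(prompt, model, image_size="1024x1024",
                  num_inference_steps=20, batch_size=1,
                  negative_prompt=None, seed=None,
                  prompt_enhancement=False):
    """Assemble a JSON request body from the parameters described above.
    This is an illustrative sketch, not the platform's official client."""
    if not 1 <= batch_size <= 4:
        raise ValueError("batch_size must be between 1 and 4")
    body = {
        "model": model,
        "prompt": prompt,
        "image_size": image_size,
        "num_inference_steps": num_inference_steps,
        "batch_size": batch_size,
        "prompt_enhancement": prompt_enhancement,
    }
    if negative_prompt:
        body["negative_prompt"] = negative_prompt
    if seed is not None:
        body["seed"] = seed  # fixed seed -> reproducible output
    return json.dumps(body)

payload = build_request(
    "a beach at sunset, no boats",
    model="stabilityai/stable-diffusion-3-5-large-turbo",
    num_inference_steps=4,  # turbo models use a fixed step count of 4
    negative_prompt="boat",
    seed=42,
)
print(payload)
```

Note that negative_prompt and seed are only included when set, and batch_size is validated against the 1–4 range stated above.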
3. Image generation billing introduction
The platform offers two billing methods for image generation:
- Billing based on image size and inference steps, priced at ¥x/M px/Steps, i.e., ¥x per million pixels per step.
For example, to generate an image 1024 wide and 512 high with 4 inference steps using a model priced at ¥0.0032/M px/Steps (stabilityai/stable-diffusion-3-5-large-turbo), the cost of one image is (1024 × 512) / (1024 × 1024) × 4 × 0.0032 = 0.0064 yuan. Here, (1024 × 512) / (1024 × 1024) = 0.5 means a 1024 × 512 image is 0.5M pixels; the cost scales with both the pixel count and the number of steps.
- Billing based on the number of images, priced at ¥x/Image, i.e., ¥x per image.
For example, to generate an image 1024 wide and 512 high with 4 inference steps using a model priced at ¥0.37/Image (black-forest-labs/FLUX.1-pro), the cost of one image is ¥0.37, regardless of pixel size or step count.
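The two billing methods above can be expressed as simple cost functions. The function names are illustrative; the arithmetic follows the worked examples in this section:

```python
def cost_by_pixels(width, height, steps, price_per_mpx_step):
    """Billing by image size and inference steps:
    ¥ per million pixels per step."""
    megapixels = (width * height) / (1024 * 1024)
    return megapixels * steps * price_per_mpx_step

def cost_by_image(num_images, price_per_image):
    """Billing by image count: a flat price per image,
    independent of resolution and step count."""
    return num_images * price_per_image

# 1024 x 512 image, 4 steps, at ¥0.0032/M px/Steps:
print(cost_by_pixels(1024, 512, 4, 0.0032))  # 0.0064 yuan
# One image at ¥0.37/Image:
print(cost_by_image(1, 0.37))  # 0.37 yuan
```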
4. Supported models list
Currently supported image generation models:
- Text-to-image series:
  - black-forest-labs series:
    - black-forest-labs/FLUX.1-dev
    - black-forest-labs/FLUX.1-schnell
    - Pro/black-forest-labs/FLUX.1-schnell
    - black-forest-labs/FLUX.1-pro
  - stabilityai series:
    - stabilityai/stable-diffusion-3-5-large
    - stabilityai/stable-diffusion-3-5-large-turbo
    - stabilityai/stable-diffusion-3-medium
    - stabilityai/stable-diffusion-xl-base-1.0
    - stabilityai/stable-diffusion-2-1
  - deepseek-ai series:
    - deepseek-ai/Janus-Pro-7B (default output resolution: 384×384)
- Image-to-image series:
  - stabilityai series:
    - stabilityai/stable-diffusion-xl-base-1.0
    - stabilityai/stable-diffusion-2-1