Improving Sora Image-to-Video Consistency
Getting Sora's image-to-video generation to maintain a consistent style can be challenging. If the video unexpectedly changes style partway through, try the "recut and regenerate" technique: cut out the inconsistent portion, keep the good section, and regenerate from the end of it. Repeat until the clip stays consistent; the process works but requires patience.
Controlling LoRA Facial Expressions
LoRAs trained primarily on smiling faces tend to default to smiles. To get neutral expressions, caption the training images explicitly: tag smiling photos with "smiling face" and non-smiling ones with "neutral expression." This helps the model separate the expression from the identity instead of baking the smile into the character. At generation time, including "neutral face" in the prompt (even repeated) may also help.
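As a sketch, the captioning step can be automated. This assumes a kohya-style training layout where each image has a same-named `.txt` caption file; the tag values and file names are illustrative, not part of any specific trainer's API.

```python
from pathlib import Path

# Expression tags to prepend; adjust to your dataset (illustrative values).
EXPRESSION_TAGS = {"smile": "smiling face", "neutral": "neutral expression"}

def tag_caption(caption_path: Path, expression: str) -> None:
    """Prepend an expression tag to a kohya-style .txt caption file."""
    tag = EXPRESSION_TAGS[expression]
    text = caption_path.read_text().strip()
    if tag not in text:  # skip if already tagged, so re-runs are safe
        caption_path.write_text(f"{tag}, {text}" if text else tag)
```

Running this over the dataset before training gives every image an explicit expression tag, which is the separation the tip describes.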
Optimizing WAN Video Resolutions
For WAN 480p video models on limited VRAM (e.g., 12GB), resolutions such as 480x640 or 384x512 may perform better than the default 832x480. Also experiment with lower shift values (e.g., 4-5 instead of 9) and a reduced FPS, which can yield smoother results in short clips.
Structure-Preserving Style Transfer Workflow
A newer workflow combines an image generation model such as Flux with a Canny ControlNet. It transfers the style of a reference image onto the structure of a target image, preserving the target's original geometry while applying the new visual aesthetic.
Links:
- https://github.com/FotographerAI/Zen-style
- https://huggingface.co/spaces/fotographerai/Zen-Style-Shape
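The structure-extraction half of such a workflow can be sketched without the heavy models: the snippet below builds a binary edge map of the target image, the kind of conditioning a Canny ControlNet consumes. Pillow's FIND_EDGES filter stands in for a true Canny detector here, and the threshold is an arbitrary assumption.

```python
from PIL import Image, ImageFilter, ImageOps

def make_control_image(img: Image.Image, size=(512, 512), threshold=32) -> Image.Image:
    """Build a binary edge map to use as ControlNet conditioning.

    FIND_EDGES is a stand-in for a real Canny detector; the threshold
    value is an assumption to tune per image.
    """
    gray = ImageOps.grayscale(img.resize(size))
    edges = gray.filter(ImageFilter.FIND_EDGES)
    return edges.point(lambda p: 255 if p >= threshold else 0)
```

The resulting map would then be passed as the control input alongside a style prompt; the actual Flux + ControlNet wiring depends on the workflow in the linked repositories.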
Exploring the HiDream Model
Early user tests suggest the new HiDream model offers strong prompt adherence, possibly surpassing Flux, especially with multiple subjects, and it appears less restrictive about content. Note its mixture-of-experts (MoE) architecture, which may change fine-tuning approaches compared with models like SDXL or Flux.