Improving Sora Image-to-Video Consistency
Getting Sora's image-to-video generation to maintain a consistent style can be challenging. If the video unexpectedly changes style partway through, try the "recut and regenerate" technique: cut out the inconsistent portion, keep the good section, and regenerate from the end of it. Repeat until the clip stays consistent; the process works but requires patience.
Controlling LoRA Facial Expressions
LoRAs trained primarily on smiling faces tend to default to smiles. To get neutral expressions, caption the training images explicitly: tag smiling photos with "smiling face" and non-smiling ones with "neutral expression." This helps the model separate the expression from the identity instead of baking the smile into the character. At generation time, including "neutral face" in the prompt (even repeated) may also help.
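As a sketch, the captioning step can be automated. This assumes a kohya-style training layout where each image has a same-named `.txt` caption file; the tag values and file names are illustrative, not part of any specific trainer's API.

```python
from pathlib import Path

# Expression tags to prepend; adjust to your dataset (illustrative values).
EXPRESSION_TAGS = {"smile": "smiling face", "neutral": "neutral expression"}

def tag_caption(caption_path: Path, expression: str) -> None:
    """Prepend an expression tag to a kohya-style .txt caption file."""
    tag = EXPRESSION_TAGS[expression]
    text = caption_path.read_text().strip()
    if tag not in text:  # skip if already tagged, so re-runs are safe
        caption_path.write_text(f"{tag}, {text}" if text else tag)
```

Running this over the dataset before training gives every image an explicit expression tag, which is the separation the tip describes.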
Optimizing WAN Video Resolutions
For WAN 480p video models on limited VRAM (e.g., 12GB), resolutions such as 480x640 or 384x512 may perform better than the default 832x480. Also experiment with lower shift values (e.g., 4-5 instead of 9) and a reduced FPS, which can yield smoother results in short clips.
Structure-Preserving Style Transfer Workflow
A newer workflow combines an image generation model such as Flux with a Canny ControlNet. It transfers the style of a reference image onto the structure of a target image, preserving the target's original geometry while applying the new visual aesthetic.
Links:
- https://github.com/FotographerAI/Zen-style
- https://huggingface.co/spaces/fotographerai/Zen-Style-Shape
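The structure-extraction half of such a workflow can be sketched without the heavy models: the snippet below builds a binary edge map of the target image, the kind of conditioning a Canny ControlNet consumes. Pillow's FIND_EDGES filter stands in for a true Canny detector here, and the threshold is an arbitrary assumption.

```python
from PIL import Image, ImageFilter, ImageOps

def make_control_image(img: Image.Image, size=(512, 512), threshold=32) -> Image.Image:
    """Build a binary edge map to use as ControlNet conditioning.

    FIND_EDGES is a stand-in for a real Canny detector; the threshold
    value is an assumption to tune per image.
    """
    gray = ImageOps.grayscale(img.resize(size))
    edges = gray.filter(ImageFilter.FIND_EDGES)
    return edges.point(lambda p: 255 if p >= threshold else 0)
```

The resulting map would then be passed as the control input alongside a style prompt; the actual Flux + ControlNet wiring depends on the workflow in the linked repositories.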
Exploring the HiDream Model
Early user tests suggest the new HiDream model offers strong prompt adherence, possibly surpassing Flux, especially with multiple subjects, and it appears less restrictive about content. Note its mixture-of-experts (MoE) architecture, which may change fine-tuning approaches compared with models like SDXL or Flux.