FramePack Video Generation on Consumer GPUs
Generate longer videos on consumer GPUs with as little as 6 GB of VRAM. FramePack generates video section by section against a fixed-length frame context, which keeps memory use roughly constant and improves coherence over long clips. Attention kernels such as SageAttention or FlashAttention can provide additional speedups. Installation guides and a ComfyUI wrapper are available.
Links:
- https://github.com/lllyasviel/FramePack
- https://github.com/kijai/ComfyUI-FramePackWrapper
- https://github.com/woct0rdho/SageAttention
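The fixed-length context idea can be illustrated with a small sketch. This is a conceptual toy, not FramePack's actual code: the function name and the geometric compression schedule (each older frame gets 4x fewer tokens) are assumptions chosen to show why the total context stays bounded as the video grows.

```python
# Toy illustration of a FramePack-style fixed-length context (not the
# real implementation): the most recent frame keeps full resolution,
# and each older frame is compressed 4x more, so total context size
# stays bounded no matter how many frames precede the current one.

def context_budget(num_past_frames, full_tokens=1536, min_tokens=2):
    """Hypothetical token budget per past frame (age 0 = most recent)."""
    budgets = []
    for age in range(num_past_frames):
        tokens = max(full_tokens // (4 ** age), min_tokens)
        budgets.append(tokens)
    return budgets

for n in (1, 4, 16, 64):
    # Total context grows far slower than the frame count.
    print(n, sum(context_budget(n)))
```

With this schedule the total approaches a geometric-series bound near 2048 tokens (plus a small floor for very old frames), which is why memory use does not scale with video length.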
Wan 2.1 Model Updates and LoRA Training
The Wan 2.1 First-Last-Frame-to-Video (FLF2V) 14B model (720P) has been released. Because it was trained mainly on Chinese-language data, Chinese prompts may yield better results. LoRAs can be trained or merged with tools such as Musubi Tuner.
Links:
- https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P
- https://github.com/kijai/ComfyUI-WanVideoWrapper
- https://github.com/kohya-ss/musubi-tuner/blob/main/wan_generate_video.py
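Merging a LoRA into base weights, as the tools above do, comes down to the standard update W' = W + scale * (B @ A). The sketch below is a minimal pure-Python illustration of that formula; the shapes, values, and helper names are illustrative and not tied to Wan 2.1 or Musubi Tuner internals.

```python
# Conceptual LoRA merge: W' = W + scale * (B @ A), where A is the
# rank x in_features down-projection and B is the out_features x rank
# up-projection. Values here are toy examples.

def matmul(B, A):
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def merge_lora(W, A, B, scale=1.0):
    """Return W + scale * (B @ A), i.e. the merged full-rank weight."""
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Tiny example: 2x2 base weight with a rank-1 LoRA.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]      # down-projection (1 x 2)
B = [[1.0], [2.0]]    # up-projection  (2 x 1)
merged = merge_lora(W, A, B, scale=0.8)
print(merged)
```

Once merged, the low-rank adapter is baked into the weights, so inference runs at full speed with no extra LoRA modules loaded.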
New Character Consistency Models
InstantCharacter preserves a character's identity from a single reference image without per-character fine-tuning, building on the Flux model. Separately, FantasyTalking generates realistic talking portraits with coherent motion, well suited to animation. Papers and code are available for both projects.
Links:
- https://github.com/Tencent/InstantCharacter
- https://fantasy-amap.github.io/fantasy-talking/
- https://arxiv.org/abs/2504.04842
Advanced HiDream Configuration
Run HiDream efficiently using fp8 precision. For potentially less restricted outputs, pair it with an uncensored LLM text encoder (e.g., an abliterated Llama 3.1 8B in GGUF format) loaded via the GGUF Quad Clip Loader node in ComfyUI. This setup may require Triton.
Links:
- https://github.com/calcuis/gguf
- https://huggingface.co/calcuis/hidream-gguf/tree/main
- https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF/tree/main
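The fp8 recommendation is easy to sanity-check with back-of-envelope arithmetic: halving bytes per parameter halves weight memory. The parameter count below is an approximation for illustration; substitute your model's actual size.

```python
# Back-of-envelope weight memory at different precisions. The 17B
# parameter count is an approximation for HiDream; adjust as needed.

def weight_gib(params, bytes_per_param):
    """Weight-only memory footprint in GiB (ignores activations/KV)."""
    return params * bytes_per_param / 1024**3

PARAMS = 17e9
for name, nbytes in [("fp16/bf16", 2), ("fp8", 1)]:
    print(f"{name}: {weight_gib(PARAMS, nbytes):.1f} GiB")
```

Note this counts weights only; activations, text encoders, and the VAE add further VRAM on top, which is why GGUF-quantized encoder variants help on smaller cards.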
Dataset Preparation Strategies
For SD 1.5-style training, random cropping preserves native detail but risks the model learning cropped compositions. Simple resizing suits models like Flux, which can be trained at multiple resolutions. Augmentation helps small datasets, but transformations such as horizontal flipping can change an image's meaning (text, asymmetric features), so adjust tags accordingly.
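The two strategies above can be sketched as simple coordinate math. This is a minimal stdlib-only illustration (helper names are mine, not from any training tool): random cropping picks a full-detail window, while resizing scales the whole composition down.

```python
import random

# Sketch of the two preprocessing strategies: random crop keeps native
# pixel detail but discards composition; aspect-preserving resize keeps
# the whole composition at lower detail. Helper names are illustrative.

def random_crop_box(w, h, target):
    """Pick a random target x target crop window inside a w x h image."""
    if w < target or h < target:
        raise ValueError("image smaller than crop target")
    left = random.randint(0, w - target)
    top = random.randint(0, h - target)
    return (left, top, left + target, top + target)

def resize_dims(w, h, long_side):
    """Scale so the longer edge equals long_side, preserving aspect."""
    scale = long_side / max(w, h)
    return (round(w * scale), round(h * scale))

random.seed(0)
print(random_crop_box(1024, 768, 512))   # some 512x512 window
print(resize_dims(1024, 768, 512))       # -> (512, 384)
```

Flip augmentation would be a separate step applied after either path; as noted above, it should be skipped (or the captions edited) for images containing text or asymmetric features.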