Portable Text-Generation-WebUI Builds
Fully self-contained Text-Generation-WebUI builds using the llama.cpp backend are now available: no installation needed, just unzip and run (a minimal run sketch follows the links). Builds cover CUDA and CPU on Windows/Linux, and ARM64/x86_64 on macOS. They bundle portable Python (python-build-standalone) and communicate with the model through llama-server. A Vulkan backend is not included; it requires manually replacing the bundled llama-server executable with a Vulkan build from the llama.cpp releases.
Links:
- https://github.com/oobabooga/text-generation-webui/releases/
- https://github.com/astral-sh/python-build-standalone
- https://github.com/ggml-org/llama.cpp/releases
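A minimal run sketch, assuming the Linux CUDA build; the archive and start-script names below are illustrative, not taken from the release page — use the actual asset names there:

```sh
# Download the build matching your OS/accelerator from the releases page,
# then unzip and launch. Filenames here are hypothetical examples.
unzip textgen-portable-linux-cuda.zip
cd textgen-portable
./start_linux.sh   # starts the web UI; llama-server runs behind it
```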
Accelerated CPU Prompt Processing for MoE Models
The ik_llama.cpp fork demonstrates significantly faster prompt processing on CPU than mainline llama.cpp, reaching ~44 t/s for Llama-4-Scout Q5_K_M prompt evaluation versus ~21 t/s. Generation speed remains similar, so the gain mainly benefits large-context processing on CPU-bound setups (see the benchmark sketch after the links).
Links:
- https://github.com/ikawrakow/ik_llama.cpp
- https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/tree/main/Q5_K_M
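A hedged sketch for reproducing the comparison; the build steps assume the fork mirrors mainline llama.cpp's CMake flow, and the GGUF path is illustrative:

```sh
# Build the fork, then measure prompt-processing (pp) and generation (tg)
# throughput with llama-bench. Run the same command against a mainline
# llama.cpp build to compare.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build && cmake --build build --config Release -j
./build/bin/llama-bench -m /path/to/Llama-4-Scout-Q5_K_M.gguf -p 512 -n 128
```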
Magi-1 Autoregressive Video Generation
Sand-AI released Magi-1, an open-source autoregressive diffusion video model. It generates video chunk by chunk, enabling unbounded extension with temporal continuity, and claims precise control over timing and motion dynamics. It supports text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) modes. Reported VRAM requirements are high (e.g., ~640 GB cited for running the full model).
Llama 4 MoE Models on Consumer Vulkan GPUs
Running Llama 4 Maverick on consumer GPUs with limited VRAM via llama.cpp Vulkan is possible by skipping warmup (`--no-warmup`) and offloading expert weights to CPU with the `-ot` parameter (`.ffn_.*_exps.=CPU`). This avoids VRAM allocation errors during initialization, keeping the shared weights and selected MoE layers on the GPU, as sketched below.
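A minimal command sketch, assuming a llama.cpp Vulkan build; the model filename and the `-ngl` value are illustrative additions, not from the source:

```sh
# Skip warmup to avoid the init-time VRAM allocation failure, and route MoE
# expert FFN tensors to CPU so the shared weights fit on the GPU.
# The GGUF filename and -ngl value are hypothetical; adjust to your setup.
./llama-server -m Llama-4-Maverick-17B-128E-Instruct-Q4_K_M.gguf \
    --no-warmup -ot ".ffn_.*_exps.=CPU" -ngl 99
```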
DistillKit for Model Distillation
DistillKit offers logit-based (KL divergence on teacher outputs) and hidden-state-based knowledge distillation methods. Hidden-state distillation aligns intermediate-layer representations, which enables cross-architecture transfer. Memory requirements are higher than for standard SFT, and scaling support for >70B models is under development (a reference formulation of the logit loss follows).
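For reference, the standard temperature-scaled formulation of logit-based distillation; whether DistillKit implements exactly this form is an assumption:

```latex
% Standard logit-based KD loss (Hinton-style). z_t and z_s are teacher and
% student logits, T is the softmax temperature; the T^2 factor keeps gradient
% magnitudes comparable across temperatures. Assumed, not taken from
% DistillKit's source.
\mathcal{L}_{\mathrm{KD}} =
  T^{2}\,\mathrm{KL}\!\left(
    \mathrm{softmax}(z_t/T)\;\middle\|\;\mathrm{softmax}(z_s/T)
  \right)
```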