Gemma 3 QAT Performance & Stability
Recent benchmarks suggest Google's QAT Gemma 3 27B Q4 outperforms standard Q4 quants on GPQA Diamond while using less VRAM (16.4GB vs. 17.4GB+). However, the statistical significance of the results is questioned, and some LM Studio users report instability with QAT models, such as output loops and nonsensical words. A minimal loading sketch follows the links.
Links:
- https://huggingface.co/bartowski/google_gemma-3-27b-it-qat-GGUF
- https://huggingface.co/stduhpf/google-gemma-3-27b-it-qat-q4_0-gguf-small
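For readers wanting to try the QAT build locally, a minimal sketch assuming llama-cpp-python (the filename and settings are illustrative; pick the actual quant file from the repos above):

```python
# Minimal sketch: load a QAT Q4_0 GGUF with llama-cpp-python.
# Filename is illustrative; exact files vary by repo.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-27b-it-qat-q4_0.gguf",  # ~16.4GB per the benchmarks
    n_gpu_layers=-1,  # offload all layers to GPU
    n_ctx=8192,
)
out = llm("Explain quantization-aware training in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```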
Windsurf System Prompt & Function Leaks
Leaked Windsurf system prompts (dated 2025-04-20) expose internal tool structures (JSON/XML function definitions) and parameters such as the "Yap score", which modulates response verbosity (up to 8,192 words). These details offer insight for reverse engineering or for building custom agents that leverage similar patterns; a sketch of the pattern follows the links.
Links:
- https://github.com/dontriskit/awesome-ai-system-prompts/blob/main/windsurf/system-2025-04-20.md
- https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools
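For builders experimenting with similar patterns, a hypothetical sketch of a system prompt with a tunable verbosity budget in the spirit of the Yap score (the function name, tag, and wording below are illustrative assumptions, not taken from the leaked prompt):

```python
# Hypothetical pattern only: a system prompt that injects a verbosity
# budget, loosely modeled on the leaked "Yap score" idea.
def build_system_prompt(yap_score: int = 8192) -> str:
    return (
        "You are a coding agent with access to the tools defined below.\n"
        f'<verbosity budget_words="{yap_score}"/>\n'
        "Call tools using the provided JSON schema and keep prose "
        "within the verbosity budget."
    )

print(build_system_prompt(1024))
```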
Llama 4 Slow Prefill with Partial Offload
Users report slow prompt processing (prefill) relative to generation speed when running Llama 4 Maverick on llama.cpp with MoE expert tensors offloaded to CPU via -ot ".*ffn_.*_exps.*=CPU". PCIe bandwidth limitations (Gen3 vs. Gen4) and tensor placement (e.g., keeping .*attn.*=GPU) are being investigated; a ktransformers fork is mentioned as faster. A sketch of a full invocation follows.
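For reference, a sketch of assembling such an invocation from Python (the binary path and model filename are placeholders; the -ot pattern is the one discussed above):

```python
import subprocess

# Illustrative llama.cpp launch: offload all layers to GPU by default,
# but pin MoE expert FFN tensors to CPU with an override-tensor pattern.
cmd = [
    "./llama-server",
    "-m", "Llama-4-Maverick-Q4_K_M.gguf",  # placeholder filename
    "-ngl", "99",                          # offload all layers...
    "-ot", ".*ffn_.*_exps.*=CPU",          # ...except the expert FFNs
]
subprocess.run(cmd, check=True)
```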
Hardware Choices for Local LLMs
Discussions weigh VRAM capacity against raw compute. A used RTX 3090 (24GB) is often preferred over a new AMD RX 7900 XTX (24GB) or lower-VRAM current-gen cards due to the maturity of the CUDA ecosystem. Multi-GPU setups (2x/3x RTX 3060 12GB) are viable budget options for >30B models; a rough VRAM estimate is sketched below. The M1 Max (64GB) is favored over the M4 (32GB) for its larger unified memory.
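As a back-of-envelope check on why these capacities line up, a rough estimate (the ~0.6 bytes/parameter figure approximates a Q4_K_M quant; the flat overhead term stands in for KV cache and runtime buffers, which vary with context length):

```python
# Ballpark VRAM estimate for a Q4-quantized dense model; treat as rough.
def est_vram_gb(params_billions: float,
                bytes_per_param: float = 0.6,    # ~Q4_K_M (assumption)
                overhead_gb: float = 2.0) -> float:  # KV cache etc. (assumption)
    return params_billions * bytes_per_param + overhead_gb

print(est_vram_gb(32))  # ~21.2 GB: one 24GB card, or 2x12GB with splitting
```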
LlamaIndex Docstore & Tool Calling
Experienced users question the role and necessity of the docstore component in LlamaIndex when nodes are already stored in the vector store. Separately, techniques are sought to enable tool calling for LLMs lacking native support (e.g., Snowflake Cortex), potentially via structured output parsing within LlamaIndex; a generic sketch follows.
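A generic sketch of the structured-output approach, independent of LlamaIndex's own abstractions (tool names and prompt wording are illustrative): instruct the model to reply with a JSON tool call, then parse and dispatch it:

```python
import json
import re

# Illustrative tool registry; real tools would wrap actual functions.
TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}

SYSTEM_PROMPT = (
    "When a tool is needed, reply with JSON only, e.g. "
    '{"tool": "get_weather", "args": {"city": "Paris"}}.'
)

def dispatch(model_output: str) -> str:
    """Extract a JSON tool call from model text and execute it."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match is None:
        return model_output  # plain answer, no tool call detected
    call = json.loads(match.group(0))
    return TOOLS[call["tool"]](**call["args"])

print(dispatch('{"tool": "get_weather", "args": {"city": "Paris"}}'))
```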