## Performance

| Model | Quant | Size | RADV pp (t/s) | RADV tg (t/s) | AMDVLK pp (t/s) | AMDVLK tg (t/s) |
| --- | --- | --- | --- | --- | --- | --- |
| Gemma-4-E2B | UD-Q4_K_XL | 2.9 GiB | 3382 | 109 | 954 | 99 |
| Gemma-4-E4B | UD-Q4_K_XL | 4.7 GiB | 1828 | 59 | 491 | 57 |
| GPT-OSS-20B-Derestricted | MXFP4 | 11 GiB | 1405 | 77 | 1380 | 77 |
| Qwen3.5-4B | Q8_0 | 4 GiB | 1375 | 37.9 | 510 | 40.0 |
| Gemma-4-26B-A4B | Q8_0 | 25 GiB | 1303 | 47.7 | 790 | 47.1 |
| Qwen3-30B-Instruct-2507 | UD-Q4_K_XL | 16.5 GiB | 1143 | 92 | 936 | 93 |
| Nemotron-3-Nano-30B-A3B New | UD-Q4_K_XL | 21.3 GiB | 1106 | 68 | 776 | 65 |
| Qwen3.5-35B-A3B | Unsloth UD-Q4_K_XL | 21 GiB | 1017 | 59.8 | 686 | 60.4 |
| GLM-4.7-Flash New | UD-Q4_K_XL | 16.3 GiB | 990 | 73 | 529 | 70 |
| Qwen3.5-9B | UD-Q4_K_XL | 5.6 GiB | 972 | 35.7 | 289 | 35.2 |
| Nemotron-Cascade-2-30B-A3B | Q8_0 | 31 GiB | 968 | 54 | — | — |
| Kimi-Linear-48B-A3B New | Q4_K_M | 28 GiB | 789 | 72 | 574 | 72 |
| Ministral-3-14B New | UD-Q4_K_XL | 7.8 GiB | 696 | 25 | 173 | 25 |
| GPT-OSS-120B | MXFP4 | 59 GiB | 596 | 56 | 661 | 53 |
| Qwen3-Coder-Next-80B | MXFP4 | 41 GiB | 586 | 40 | 462 | 43 |
| Magistral-Small-2509 New | UD-Q4_K_XL | 13.5 GiB | 389 | 15 | 94 | 15 |
| Devstral-Small-2-24B New | UD-Q4_K_XL | 13.5 GiB | 382 | 15 | 94 | 15 |
| Qwen3.5-27B | UD-Q4_K_XL | 16 GiB | 310 | 12.1 | 86 | 11.9 |
| Qwen3.5-122B-A10B | Unsloth UD-Q4_K_XL | 72 GiB | 287 | 22.4 | 197 | 21.9 |
| Gemma-4-31B | Unsloth UD-Q4_K_XL (Apr 11) | 17.5 GiB | 261 | 11.1 | 70.8 | 11.1 |
| MiniMax-M2.5 New | Unsloth UD-Q3_K_XL | 94 GiB | 179 | 22 | 164 | 32 |
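The driver gap is easiest to read as a ratio. A quick sketch using a few pp figures from the table above (values copied verbatim; the choice of rows is mine):

```python
# Prompt-processing throughput in tokens/s as (RADV, AMDVLK), from the table above.
pp = {
    "Gemma-4-E2B":     (3382, 954),
    "GPT-OSS-20B":     (1405, 1380),
    "Ministral-3-14B": (696, 173),
    "GPT-OSS-120B":    (596, 661),   # the one AMDVLK pp win
}

for model, (radv, amdvlk) in pp.items():
    # Ratio > 1 means RADV is faster on prompt processing.
    print(f"{model}: RADV/AMDVLK pp ratio = {radv / amdvlk:.2f}")
```

The spread runs from near parity (GPT-OSS-20B) to roughly 4x (Ministral), with GPT-OSS-120B the lone row where the ratio dips below 1.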
## Quality

| Model | Writing /30 | LRU /10 | FastAPI /8 | LeetCode /59 | Polyglot /65 | Postgres /57 | Cassandra /56 | Combined /285 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemma-4-31B | 27 | 10 | 8 | 59 | 15 | 44 | 38 | 201 |
| Gemma-4-26B-A4B | 28 | 10 | 8 | 59 | 15 | 45 | 29 | 194 |
| Qwen3.5-122B-A10B | 29 | 10 | 8 | 59 | 13 | 36 | 34 | 192 |
| MiniMax-M2.5 New | 26 | 10 | 7 | 59 | 13 | 40 | 30 | 185 |
| GPT-OSS-120B | 20 | 10 | 8 | 59 | 14 | 40 | 31 | 182 |
| Qwen3.5-35B-A3B | 28 | 10 | 8 | 59 | 8 | 32 | 33 | 178 |
| Kimi-Linear-48B-A3B New | 30 | 10 | 8 | 57 | 22 | 26 | 24 | 177 |
| Qwen3.5-27B | 25 | 10 | 8 | 59 | 10 | 34 | 29 | 175 |
| Qwen3-30B-Instruct-2507 | 30 | 10 | 2 | 59 | 13 | 27 | 31 | 172 |
| Qwen3-Coder-Next-80B | 26 | 10 | 2 | 59 | 9 | 33 | 32 | 171 |
| Devstral-Small-2-24B New | 27 | 10 | 2 | 59 | 11 | 29 | 31 | 169 |
| GPT-OSS-20B-Derestricted | 13 | 10 | 8 | 59 | 14 | 37 | 23 | 164 |
| Gemma-4-E2B | 18 | 7 | 8 | 59 | 7 | 27 | 27 | 153 |
| Gemma-4-E4B | 23 | 7 | 8 | 59 | 3 | 26 | 26 | 152 |
| Ministral-3-14B New | 26 | 2 | 2 | 59 | 16 | 23 | 18 | 146 |
| Nemotron-Cascade-2-30B-A3B | 18 | 10 | 8 | 59 | 1 | 22 | 21 | 139 |
| Qwen3.5-9B | 20 | 10 | 0 | 51 | 5 | 28 | 22 | 136 |
| Qwen3.5-4B | 16 | 9 | 8 | 54 | 3 | 17 | 16 | 123 |
| Nemotron-3-Nano-30B-A3B New | 20 | 4 | 0 | 46 | 7 | 16 | 16 | 109 |
| Magistral-Small-2509 New | 20 | 0 | 8 | 30 | 2 | 12 | 35 | 107 |
| GLM-4.7-Flash New | 14 | 0 | 0 | 16 | 0 | 23 | 27 | 80 |
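The Combined column appears to be a plain row sum of the seven category scores. A quick check against the Gemma-4-31B row above:

```python
# Category scores for Gemma-4-31B, copied from the quality table.
scores = {
    "Writing": 27, "LRU": 10, "FastAPI": 8, "LeetCode": 59,
    "Polyglot": 15, "Postgres": 44, "Cassandra": 38,
}

combined = sum(scores.values())
print(combined)  # matches the 201 in the Combined column
```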
## Key Findings
- RADV leads prompt processing on nearly every model, from roughly 10% faster than AMDVLK (MiniMax-M2.5) to over 4× (Ministral, Magistral); GPT-OSS-20B is near parity. Token generation is typically a tie between drivers since it's bandwidth-bound. AMDVLK's only pp win is GPT-OSS-120B (661 vs 596 RADV).
- Kimi-Linear-48B is the new polyglot leader at 22/65, dethroning the GPT-OSS/Gemma trio (14/65). It's one of only two models to score on the Go rule engine at all, taking it 10/10 where Qwen3-30B manages 3/10. Linear attention gives it 72 t/s at 28 GiB.
- Devstral-Small-2 punches above its weight: a 24B dense coding model hitting 29/57 Postgres and 31/56 Cassandra alongside a perfect 10/10 LRU and 59/59 LeetCode. The 15 t/s generation speed hurts, but the quality-per-parameter is impressive.
- Nemotron-3-Nano collapses without thinking mode: with reasoning disabled it scored 0/57 Postgres, 0/56 Cassandra, and 0/59 LeetCode. The Mamba-2 hybrid architecture appears to need reasoning tokens enabled to produce structured output. Fast (1106 pp, 68 t/s tg), but unusable for coding or database tasks in no-think mode.
- Magistral-Small has the highest Cassandra score (35/56) of any non-Gemma model, beating Devstral (31) and Kimi (24). But it scores 0/10 LRU, 12/57 Postgres, and leaked its reasoning scaffold into the romance story. A specialist, not a generalist.
- GLM-4.7-Flash can't code. 0/10 LRU, 0/8 FastAPI, 16/59 LeetCode (extraction failures), 0/65 polyglot. But it handles database work (23/57 PG, 27/56 Cass) and generates at 73 t/s. A fast model with a very narrow skill set.
- MiniMax-M2.5 went from 0/10 coding to 10/10 after switching to Unsloth's UD-Q3_K_XL quant and their recommended sampling params (temp 1.0, top_p 0.95, min_p 0.01, top_k 40). At 94 GiB it's the largest model on disk, but 185/285 Combined puts it 4th overall. Perfect 10/10 on both PostgreSQL T2 optimization and Cassandra T2 anti-pattern detection. AMDVLK is its best backend at 32 t/s (vs 22 on RADV).
- Sampling params transform quality results. Unsloth-recommended params (presence_penalty 1.5, top_k 20, thinking mode via chat-template-kwargs) made the difference between 0/10 and 10/10 on several models.
- T2 (cron), T5 (FastAPI), and T6 (Sinatra) remain unsolved at 0/10 across all model runs. These single-shot challenges exceed current local model capability.
- Gemma 4 leads both database benchmarks. The 26B MoE tops PostgreSQL (45/57, with the dense 31B right behind at 44), and the 31B tops Cassandra (38/56). No other model breaks 40/57 on PG or 35/56 on Cass.
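For context on the LRU /10 column: the test asks for a fixed-capacity least-recently-used cache. The exact spec and harness are the benchmark's own; this is just an illustrative reference sketch, not any model's output:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least-recently-used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key: int) -> int:
        if key not in self._data:
            return -1
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: int, value: int) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU entry
```

`OrderedDict` keeps both operations O(1); models that reach for a plain list and linear scans tend to get dinged on exactly this kind of task.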
## Partial Results
These models were evaluated before the full benchmark suite was established: they ran the writing and LRU cache tests but not the complete battery. They are no longer on disk and are listed here for historical reference.
### Performance

| Model | Quant | Size | RADV pp (t/s) | RADV tg (t/s) | AMDVLK pp (t/s) | AMDVLK tg (t/s) |
| --- | --- | --- | --- | --- | --- | --- |
| Step3.5-Flash | IQ3_XS | 76 GiB | 237 | 32 | — | — |
| Nemotron-3-Super-120B-A12B | Unsloth UD-Q4_K_XL | 78 GiB | 196 | 10.2 | 139 | 9.86 |
### Quality

| Model | Writing /30 | LRU /10 |
| --- | --- | --- |
| Ling-Flash-2.0 | 26 | 2 |
| Nemotron-3-Super-120B-A12B | 25 | 10 |
| Devstral-2-123B | 25 | 2 |
| Solar-Open-100B | 21 | 0 |
| Mistral-Large-2411 | 20 | 2 |