Performance

| Model | Quant | Size | RADV pp (t/s) | RADV tg (t/s) | AMDVLK pp (t/s) | AMDVLK tg (t/s) |
|---|---|---|---|---|---|---|
| Gemma-4-E2B | UD-Q4_K_XL | 2.9 GiB | 3382 | 109 | 954 | 99 |
| Granite-4.1-3B | UD-Q4_K_XL | 2.0 GiB | 2278 | 88.7 | 652 | 88.1 |
| Gemma-4-E4B | UD-Q4_K_XL | 4.7 GiB | 1828 | 59 | 491 | 57 |
| GPT-OSS-20B-Derestricted | MXFP4 | 11 GiB | 1405 | 77 | 1380 | 77 |
| Qwen3.5-4B | Q8_0 | 4 GiB | 1375 | 37.9 | 510 | 40.0 |
| Gemma-4-26B-A4B | Q8_0 | 25 GiB | 1303 | 47.7 | 790 | 47.1 |
| Qwen3-30B-Instruct-2507 | UD-Q4_K_XL | 16.5 GiB | 1143 | 92 | 936 | 93 |
| Nemotron-3-Nano-30B-A3B | UD-Q4_K_XL | 21.3 GiB | 1106 | 68 | 776 | 65 |
| Nemotron-3-Nano-Omni-30B-A3B-Reasoning | UD-Q4_K_XL | 22.8 GiB | 1097 | 61.1 | 777 | 58.7 |
| Qwen3.6-35B-A3B | Unsloth UD-Q4_K_XL | 20.8 GiB | 1029 | 59.6 | 615 | 57.6 |
| Qwen3.5-35B-A3B | Unsloth UD-Q4_K_XL | 21 GiB | 1017 | 59.8 | 686 | 60.4 |
| GLM-4.7-Flash | UD-Q4_K_XL | 16.3 GiB | 990 | 73 | 529 | 70 |
| Qwen3.5-9B | UD-Q4_K_XL | 5.6 GiB | 972 | 35.7 | 289 | 35.2 |
| Nemotron-Cascade-2-30B-A3B | Q8_0 | 31 GiB | 968 | 54 | n/a | n/a |
| Granite-4.1-8B | UD-Q4_K_XL | 5.1 GiB | 936 | 38.6 | 265 | 38.7 |
| Kimi-Linear-48B-A3B | Q8_0 | 48.6 GiB | 746 | 52 | 570 | 53 |
| Gemma-4-12B | Unsloth UD-Q8_K_XL | 13.6 GiB | 716 | 14.0 | 199 | 14.0 |
| Ministral-3-14B | UD-Q4_K_XL | 7.8 GiB | 696 | 25 | 173 | 25 |
| GPT-OSS-120B | MXFP4 | 59 GiB | 596 | 56 | 661 | 53 |
| Qwen3-Coder-Next-80B | MXFP4 | 41 GiB | 586 | 40 | 462 | 43 |
| Magistral-Small-2509 | UD-Q4_K_XL | 13.5 GiB | 389 | 15 | 94 | 15 |
| Devstral-Small-2-24B | UD-Q4_K_XL | 13.5 GiB | 382 | 15 | 94 | 15 |
| Mistral-Small-4-119B | Unsloth UD-Q4_K_XL | 69 GiB | 363 | 40.3 | 313 | 39.2 |
| Qwen3.6-27B | Unsloth UD-Q4_K_XL | 16.4 GiB | 322 | 12.0 | 86 | 12.0 |
| Qwen3.5-27B | UD-Q4_K_XL | 16 GiB | 310 | 12.1 | 86 | 11.9 |
| Qwen3.5-122B-A10B | Unsloth UD-Q4_K_XL | 72 GiB | 287 | 22.4 | 197 | 21.9 |
| Granite-4.1-30B | UD-Q4_K_XL | 16.5 GiB | 275 | 11.8 | 71.2 | 11.9 |
| Gemma-4-31B | Unsloth UD-Q4_K_XL (Apr 11) | 17.5 GiB | 261 | 11.1 | 70.8 | 11.1 |
| MiniMax-M2.7 | Unsloth UD-IQ4_XS | 101 GiB | 181 | 27.0 | 179 | 24.8 |
| MiniMax-M2.5 | Unsloth UD-Q3_K_XL | 94 GiB | 179 | 22 | 164 | 32 |
| Mistral-Medium-3.5-128B | Unsloth UD-Q4_K_XL | 70.5 GiB | 62 | 2.9 | 17 | 2.9 |
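
One way to read the table is as a per-model ratio between the two drivers. The sketch below is illustrative only: it assumes that pp and tg are throughput figures where higher is better, and the helper name `speedup` is hypothetical, not part of any benchmark tooling. The three rows are copied from the table above, with `None` standing in for `n/a`.

```python
from typing import Optional


def speedup(radv: Optional[float], amdvlk: Optional[float]) -> Optional[float]:
    """Return RADV / AMDVLK, or None when either measurement is missing."""
    if radv is None or amdvlk is None or amdvlk == 0:
        return None
    return radv / amdvlk


# (model, RADV pp, AMDVLK pp) copied from three rows of the Performance table.
rows = [
    ("Gemma-4-E2B", 3382, 954),
    ("GPT-OSS-120B", 596, 661),
    ("Nemotron-Cascade-2-30B-A3B", 968, None),  # AMDVLK listed as n/a
]

for model, radv_pp, amdvlk_pp in rows:
    ratio = speedup(radv_pp, amdvlk_pp)
    label = "n/a" if ratio is None else f"{ratio:.2f}x"
    print(f"{model}: RADV/AMDVLK pp = {label}")
```

A ratio above 1 means RADV was faster on that row and a ratio below 1 means AMDVLK was faster. Rows with a missing measurement stay `n/a` rather than being treated as zero.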

Quality

The Tooling column is new and tracks automation reliability out of 65. It is scored separately and does not count toward the Combined total. Models marked with a hyphen have not run it yet.

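| Model | Writing /30 | LRU /10 | FastAPI /8 | LeetCode /59 | Polyglot /65 | Postgres /57 | Cassandra /56 | Tooling /65 | Combined /285 |
|---|---|---|---|---|---|---|---|---|---|
| Gemma-4-31B | 27 | 10 | 8 | 59 | 44 | 51 | 39 | - | 238 |
| Gemma-4-12B | 28 | 10 | 8 | 59 | 32 | 47 | 38 | 54 | 222 |
| Gemma-4-26B-A4B | 28 | 10 | 8 | 59 | 22 | 49 | 39 | - | 215 |
| Qwen3.6-27B | 30 | 10 | 8 | 59 | 30 | 44 | 32 | 60.5 | 213 |
| Mistral-Medium-3.5-128B | 30 | 7 | 8 | 59 | 30 | 39 | 34 | - | 207 |
| Qwen3.6-35B-A3B | 29 | 10 | 8 | 59 | 35 | 31 | 33 | 53.5 | 205 |
| Qwen3.5-122B-A10B | 29 | 10 | 8 | 59 | 13 | 36 | 37 | - | 192 |
| Qwen3.5-35B-A3B | 28 | 10 | 8 | 59 | 17 | 32 | 38 | - | 192 |
| Kimi-Linear-48B-A3B | 30 | 10 | 8 | 59 | 22 | 30 | 31 | - | 190 |
| Qwen3.5-27B | 25 | 10 | 8 | 59 | 21 | 34 | 29 | - | 186 |
| MiniMax-M2.5 | 26 | 10 | 7 | 59 | 13 | 40 | 30 | - | 185 |
| GPT-OSS-120B | 20 | 10 | 8 | 59 | 14 | 40 | 31 | - | 182 |
| MiniMax-M2.7 | 28 | 10 | 8 | 59 | 8 | 38 | 28 | - | 179 |
| Gemma-4-E4B | 23 | 7 | 8 | 59 | 13 | 35 | 34 | - | 179 |
| Mistral-Small-4-119B | 21 | 10 | 7 | 59 | 24 | 27 | 26 | - | 174 |
| Qwen3-30B-Instruct-2507 | 30 | 10 | 2 | 59 | 13 | 27 | 31 | - | 172 |
| Granite-4.1-30B | 28 | 10 | 2 | 59 | 10 | 32 | 31 | - | 172 |
| Qwen3-Coder-Next-80B | 26 | 10 | 2 | 59 | 9 | 33 | 32 | - | 171 |
| Devstral-Small-2-24B | 27 | 10 | 2 | 59 | 11 | 29 | 31 | - | 169 |
| GPT-OSS-20B-Derestricted | 13 | 10 | 8 | 59 | 14 | 37 | 23 | - | 164 |
| Gemma-4-E2B | 18 | 7 | 8 | 59 | 10 | 27 | 27 | - | 156 |
| Nemotron-3-Nano-Omni-30B-A3B-Reasoning | 24 | 8 | 2 | 59 | 6 | 31 | 18 | - | 148 |
| Ministral-3-14B | 26 | 2 | 2 | 59 | 16 | 23 | 18 | - | 146 |
| Qwen3.5-9B | 20 | 10 | 0 | 51 | 14 | 28 | 23 | - | 146 |
| Granite-4.1-8B | 18 | 4 | 4 | 59 | 6 | 26 | 24 | - | 141 |
| Nemotron-Cascade-2-30B-A3B | 18 | 10 | 8 | 59 | 1 | 22 | 21 | - | 139 |
| Qwen3.5-4B | 16 | 9 | 8 | 54 | 3 | 17 | 16 | - | 123 |
| Nemotron-3-Nano-30B-A3B | 20 | 4 | 0 | 46 | 7 | 16 | 16 | - | 109 |
| Magistral-Small-2509 | 20 | 0 | 8 | 30 | 2 | 12 | 35 | - | 107 |
| Granite-4.1-3B | 17 | 0 | 0 | 53 | 4 | 11 | 13 | - | 98 |
| GLM-4.7-Flash | 14 | 0 | 0 | 16 | 0 | 23 | 27 | - | 80 |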
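
The Combined column appears to be the plain sum of the seven non-Tooling scores, whose maxima add up to 285. The sketch below checks that reading against one row. It is a minimal illustration under that assumption, not the scoring script used for these results, and the names `MAX_SCORES` and `combined` are hypothetical.

```python
# Maximum score per category, taken from the Quality table's column headers.
# Tooling (/65) is left out because it does not count toward Combined.
MAX_SCORES = {
    "Writing": 30,
    "LRU": 10,
    "FastAPI": 8,
    "LeetCode": 59,
    "Polyglot": 65,
    "Postgres": 57,
    "Cassandra": 56,
}


def combined(scores: dict) -> float:
    """Sum the seven counted categories, ignoring Tooling if present."""
    for name, limit in MAX_SCORES.items():
        if not 0 <= scores[name] <= limit:
            raise ValueError(f"{name} score {scores[name]} is outside 0..{limit}")
    return sum(scores[name] for name in MAX_SCORES)


# Row for Gemma-4-12B, copied from the table above.
gemma_4_12b = {
    "Writing": 28, "LRU": 10, "FastAPI": 8, "LeetCode": 59,
    "Polyglot": 32, "Postgres": 47, "Cassandra": 38, "Tooling": 54,
}

print(sum(MAX_SCORES.values()))  # 285
print(combined(gemma_4_12b))     # 222
```

Keeping Tooling out of `MAX_SCORES` means a model that has not run the Tooling test (shown as a hyphen) still gets a Combined total comparable with every other row.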

Key Findings

Partial Results

These models were evaluated before the full benchmark suite was established. They ran the writing and LRU cache tests but not the complete battery. They are no longer on disk and are listed here for historical reference.

Performance

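| Model | Quant | Size | RADV pp (t/s) | RADV tg (t/s) | AMDVLK pp (t/s) | AMDVLK tg (t/s) |
|---|---|---|---|---|---|---|
| Step3.5-Flash | IQ3_XS | 76 GiB | 237 | 32 | n/a | n/a |
| Nemotron-3-Super-120B-A12B | Unsloth UD-Q4_K_XL | 78 GiB | 196 | 10.2 | 139 | 9.86 |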

Quality

| Model | Writing /30 | LRU /10 |
|---|---|---|
| Ling-Flash-2.0 | 26 | 2 |
| Nemotron-3-Super-120B-A12B | 25 | 10 |
| Devstral-2-123B | 25 | 2 |
| Solar-Open-100B | 21 | 0 |
| Mistral-Large-2411 | 20 | 2 |