## Performance

| Model | Quant | Size | RADV pp (t/s) | RADV tg (t/s) | AMDVLK pp (t/s) | AMDVLK tg (t/s) |
| --- | --- | --- | --- | --- | --- | --- |
| Gemma-4-E2B | UD-Q4_K_XL | 2.9 GiB | 3382 | 109 | 954 | 99 |
| Gemma-4-E4B | UD-Q4_K_XL | 4.7 GiB | 1828 | 59 | 491 | 57 |
| GPT-OSS-20B-Derestricted | MXFP4 | 11 GiB | 1405 | 77 | 1380 | 77 |
| Qwen3.5-4B | Q8_0 | 4 GiB | 1375 | 37.9 | 510 | 40.0 |
| Gemma-4-26B-A4B | Q8_0 | 25 GiB | 1303 | 47.7 | 790 | 47.1 |
| Qwen3-30B-Instruct-2507 | UD-Q4_K_XL | 16.5 GiB | 1143 | 92 | 936 | 93 |
| Nemotron-3-Nano-30B-A3B New | UD-Q4_K_XL | 21.3 GiB | 1106 | 68 | 776 | 65 |
| Qwen3.5-35B-A3B | Unsloth UD-Q4_K_XL | 21 GiB | 1017 | 59.8 | 686 | 60.4 |
| GLM-4.7-Flash New | UD-Q4_K_XL | 16.3 GiB | 990 | 73 | 529 | 70 |
| Qwen3.5-9B | UD-Q4_K_XL | 5.6 GiB | 972 | 35.7 | 289 | 35.2 |
| Nemotron-Cascade-2-30B-A3B | Q8_0 | 31 GiB | 968 | 54 | — | — |
| Kimi-Linear-48B-A3B New | Q4_K_M | 28 GiB | 789 | 72 | 574 | 72 |
| Ministral-3-14B New | UD-Q4_K_XL | 7.8 GiB | 696 | 25 | 173 | 25 |
| GPT-OSS-120B | MXFP4 | 59 GiB | 596 | 56 | 661 | 53 |
| Qwen3-Coder-Next-80B | MXFP4 | 41 GiB | 586 | 40 | 462 | 43 |
| Magistral-Small-2509 New | UD-Q4_K_XL | 13.5 GiB | 389 | 15 | 94 | 15 |
| Devstral-Small-2-24B New | UD-Q4_K_XL | 13.5 GiB | 382 | 15 | 94 | 15 |
| Qwen3.5-27B | UD-Q4_K_XL | 16 GiB | 310 | 12.1 | 86 | 11.9 |
| Qwen3.5-122B-A10B | Unsloth UD-Q4_K_XL | 72 GiB | 287 | 22.4 | 197 | 21.9 |
| Gemma-4-31B | Unsloth UD-Q4_K_XL (Apr 11) | 17.5 GiB | 261 | 11.1 | 70.8 | 11.1 |
| MiniMax-M2.5 New | Unsloth UD-Q3_K_XL | 94 GiB | 179 | 22 | 164 | 32 |
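The driver gap is easiest to read as a ratio. A quick sketch using a few pp figures from the table above (values copied verbatim; the choice of rows is mine):

```python
# Prompt-processing throughput in tokens/s as (RADV, AMDVLK), from the table above.
pp = {
    "Gemma-4-E2B":     (3382, 954),
    "GPT-OSS-20B":     (1405, 1380),
    "Ministral-3-14B": (696, 173),
    "GPT-OSS-120B":    (596, 661),   # the one AMDVLK pp win
}

for model, (radv, amdvlk) in pp.items():
    # Ratio > 1 means RADV is faster on prompt processing.
    print(f"{model}: RADV/AMDVLK pp ratio = {radv / amdvlk:.2f}")
```

The spread runs from near parity (GPT-OSS-20B) to roughly 4x (Ministral), with GPT-OSS-120B the lone row where the ratio dips below 1.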
## Quality

| Model | Writing /30 | LRU /10 | FastAPI /8 | LeetCode /59 | Polyglot /65 | Postgres /57 | Cassandra /56 | Combined /285 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemma-4-31B | 27 | 10 | 8 | 59 | 15 | 44 | 38 | 201 |
| Gemma-4-26B-A4B | 28 | 10 | 8 | 59 | 15 | 45 | 29 | 194 |
| Qwen3.5-122B-A10B | 29 | 10 | 8 | 59 | 13 | 36 | 34 | 192 |
| MiniMax-M2.5 New | 26 | 10 | 7 | 59 | 13 | 40 | 30 | 185 |
| GPT-OSS-120B | 20 | 10 | 8 | 59 | 14 | 40 | 31 | 182 |
| Qwen3.5-35B-A3B | 28 | 10 | 8 | 59 | 8 | 32 | 33 | 178 |
| Kimi-Linear-48B-A3B New | 30 | 10 | 8 | 57 | 22 | 26 | 24 | 177 |
| Qwen3.5-27B | 25 | 10 | 8 | 59 | 10 | 34 | 29 | 175 |
| Qwen3-30B-Instruct-2507 | 30 | 10 | 2 | 59 | 13 | 27 | 31 | 172 |
| Qwen3-Coder-Next-80B | 26 | 10 | 2 | 59 | 9 | 33 | 32 | 171 |
| Devstral-Small-2-24B New | 27 | 10 | 2 | 59 | 11 | 29 | 31 | 169 |
| GPT-OSS-20B-Derestricted | 13 | 10 | 8 | 59 | 14 | 37 | 23 | 164 |
| Gemma-4-E2B | 18 | 7 | 8 | 59 | 7 | 27 | 27 | 153 |
| Gemma-4-E4B | 23 | 7 | 8 | 59 | 3 | 26 | 26 | 152 |
| Ministral-3-14B New | 26 | 2 | 2 | 59 | 16 | 23 | 18 | 146 |
| Nemotron-Cascade-2-30B-A3B | 18 | 10 | 8 | 59 | 1 | 22 | 21 | 139 |
| Qwen3.5-9B | 20 | 10 | 0 | 51 | 5 | 28 | 22 | 136 |
| Qwen3.5-4B | 16 | 9 | 8 | 54 | 3 | 17 | 16 | 123 |
| Nemotron-3-Nano-30B-A3B New | 20 | 4 | 0 | 46 | 7 | 16 | 16 | 109 |
| Magistral-Small-2509 New | 20 | 0 | 8 | 30 | 2 | 12 | 35 | 107 |
| GLM-4.7-Flash New | 14 | 0 | 0 | 16 | 0 | 23 | 27 | 80 |
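The Combined column appears to be a plain row sum of the seven category scores. A quick check against the Gemma-4-31B row above:

```python
# Category scores for Gemma-4-31B, copied from the quality table.
scores = {
    "Writing": 27, "LRU": 10, "FastAPI": 8, "LeetCode": 59,
    "Polyglot": 15, "Postgres": 44, "Cassandra": 38,
}

combined = sum(scores.values())
print(combined)  # matches the 201 in the Combined column
```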
## Key Findings
- RADV leads prompt processing on nearly every model, from roughly 10% faster than AMDVLK (MiniMax-M2.5) to over 4× (Ministral, Magistral); GPT-OSS-20B is near parity. Token generation is typically a tie between drivers since it's bandwidth-bound. AMDVLK's only pp win is GPT-OSS-120B (661 vs 596 RADV).
- Kimi-Linear-48B is the new polyglot leader at 22/65, dethroning the GPT-OSS/Gemma trio (14/65). It's one of only two models to score on the Go rule engine at all, taking it 10/10 where Qwen3-30B manages 3/10. Linear attention gives it 72 t/s at 28 GiB.
- Devstral-Small-2 punches above its weight: a 24B dense coding model hitting 29/57 Postgres and 31/56 Cassandra alongside a perfect 10/10 LRU and 59/59 LeetCode. The 15 t/s generation speed hurts, but the quality-per-parameter is impressive.
- Nemotron-3-Nano collapses without thinking mode: with reasoning disabled it scored 0/57 Postgres, 0/56 Cassandra, and 0/59 LeetCode. The Mamba-2 hybrid architecture appears to need reasoning tokens enabled to produce structured output. Fast (1106 pp, 68 t/s tg), but unusable for coding or database tasks in no-think mode.
- Magistral-Small has the highest Cassandra score (35/56) of any non-Gemma model, beating Devstral (31) and Kimi (24). But it scores 0/10 LRU, 12/57 Postgres, and leaked its reasoning scaffold into the romance story. A specialist, not a generalist.
- GLM-4.7-Flash can't code. 0/10 LRU, 0/8 FastAPI, 16/59 LeetCode (extraction failures), 0/65 polyglot. But it handles database work (23/57 PG, 27/56 Cass) and generates at 73 t/s. A fast model with a very narrow skill set.
- MiniMax-M2.5 went from 0/10 coding to 10/10 after switching to Unsloth's UD-Q3_K_XL quant and their recommended sampling params (temp 1.0, top_p 0.95, min_p 0.01, top_k 40). At 94 GiB it's the largest model on disk, but 185/285 Combined puts it 4th overall. Perfect 10/10 on both PostgreSQL T2 optimization and Cassandra T2 anti-pattern detection. AMDVLK is its best backend at 32 t/s (vs 22 on RADV).
- Sampling params transform quality results. Unsloth-recommended params (presence_penalty 1.5, top_k 20, thinking mode via chat-template-kwargs) made the difference between 0/10 and 10/10 on several models.
- T2 (cron), T5 (FastAPI), and T6 (Sinatra) remain unsolved at 0/10 across all model runs. These single-shot challenges exceed current local model capability.
- Gemma 4 leads both database benchmarks. The 26B MoE tops PostgreSQL (45/57, with the dense 31B right behind at 44), and the 31B tops Cassandra (38/56). No other model breaks 40/57 on PG or 35/56 on Cass.
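For context on the LRU /10 column: the test asks for a fixed-capacity least-recently-used cache. The exact spec and harness are the benchmark's own; this is just an illustrative reference sketch, not any model's output:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least-recently-used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key: int) -> int:
        if key not in self._data:
            return -1
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: int, value: int) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU entry
```

`OrderedDict` keeps both operations O(1); models that reach for a plain list and linear scans tend to get dinged on exactly this kind of task.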
## Partial Results
These models were evaluated before the full benchmark suite was established: they ran the writing and LRU cache tests but not the complete battery. They are no longer on disk and are listed here for historical reference.
### Performance

| Model | Quant | Size | RADV pp (t/s) | RADV tg (t/s) | AMDVLK pp (t/s) | AMDVLK tg (t/s) |
| --- | --- | --- | --- | --- | --- | --- |
| Step3.5-Flash | IQ3_XS | 76 GiB | 237 | 32 | — | — |
| Nemotron-3-Super-120B-A12B | Unsloth UD-Q4_K_XL | 78 GiB | 196 | 10.2 | 139 | 9.86 |
### Quality

| Model | Writing /30 | LRU /10 |
| --- | --- | --- |
| Ling-Flash-2.0 | 26 | 2 |
| Nemotron-3-Super-120B-A12B | 25 | 10 |
| Devstral-2-123B | 25 | 2 |
| Solar-Open-100B | 21 | 0 |
| Mistral-Large-2411 | 20 | 2 |