eb585c0001
gpt-5.4-nano correctly discriminated complexity (1 vs. 10), while deepseek-v4-flash rated everything 1/10.