Observation: GPT-OSS produces more consistently high-quality outputs across model sizes, with scores ranging from 4 to 6, whereas GPT-2's scores range from 1 to 6.
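
To make the consistency claim concrete, the width of each score range can be compared directly. The sketch below is illustrative only: it uses just the range endpoints quoted in the observation (the per-size scores are not given here), and the names `score_spread` and `score_ranges` are hypothetical helpers, not part of any evaluation code.

```python
def score_spread(lowest: int, highest: int) -> int:
    """Return the width of a score range (highest minus lowest)."""
    if highest < lowest:
        raise ValueError("highest must be >= lowest")
    return highest - lowest


# Range endpoints taken from the observation; individual per-size
# scores are not available, so only the extremes are used (assumption).
score_ranges = {"GPT-OSS": (4, 6), "GPT-2": (1, 6)}

spreads = {
    model: score_spread(lowest, highest)
    for model, (lowest, highest) in score_ranges.items()
}

for model, spread in spreads.items():
    print(f"{model}: spread = {spread}")
```

A smaller spread means the scores vary less across model sizes. The range width is a coarse measure; with the full per-size scores, a standard deviation would give a more robust picture.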