GPT-5 Model Showdown: Cost, Speed and Pelican Artistry Revealed in Unconventional Benchmark
When OpenAI unveiled its GPT-5 model family during yesterday's livestream, developers immediately sought practical ways to test its capabilities. Following developer Simon Willison's unconventional benchmark of generating "a pelican riding a bicycle" as an SVG, one developer replicated the experiment across GPT-5's three tiers (nano, mini, and full), uncovering practical insights about cost, latency, and output quality.
The Testing Gauntlet
After contributing a GPT-5 integration to the open-source Dify platform (and noting that OpenAI has removed the temperature parameter from these models), the developer first had to clear organization verification requirements to access the higher-tier models. Once verified, the pelican challenge commenced, with striking results (the test call itself is sketched just after this list):
- GPT-5 nano ($0.0038): Generated a dense 9,386-token SVG in 40 seconds
- GPT-5 mini ($0.0054): Produced a leaner 2,692-token output in just 26 seconds
- GPT-5 full ($0.040): Crafted a premium 4,036-token SVG in 75 seconds
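For context, each test boils down to a single chat completion request. Below is a minimal sketch of such a call, assuming the official OpenAI Python SDK and model IDs of the form gpt-5-nano / gpt-5-mini / gpt-5; note that no temperature argument is passed, matching the parameter removal noted above.

```python
# Minimal sketch of one benchmark call, assuming the official OpenAI
# Python SDK and model IDs like "gpt-5-nano", "gpt-5-mini", "gpt-5".
# No temperature argument: per the article, GPT-5 no longer accepts one.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Generate an SVG of a pelican riding a bicycle"

def run_pelican_test(model: str) -> str:
    """Request the pelican SVG from one GPT-5 tier and return the raw text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.choices[0].message.content

for tier in ("gpt-5-nano", "gpt-5-mini", "gpt-5"):
    svg = run_pelican_test(tier)
    print(tier, len(svg), "characters")
```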
Quality vs. Efficiency Tradeoffs
*GPT-5 nano's bicycle-bound pelican (cost: $0.0038)*
The nano model's output contained background elements and rougher edges, while mini produced cleaner lines but retained environmental details. The full model delivered the most refined, minimalist SVG with no background – demonstrating how model scale impacts artistic precision.
*GPT-5 mini's streamlined version (cost: $0.0054)*
Counterintuitively, the mid-tier mini model finished faster than nano despite its larger size, primarily because it generated 71% fewer output tokens (2,692 vs. 9,386). Nano actually streamed faster, roughly 235 tokens per second versus mini's 104, but end-to-end latency is output length divided by throughput, so the terser response won. This highlights that output verbosity, not just model architecture, dramatically affects real-world latency and cost efficiency.
*GPT-5 full's premium output (cost: $0.040)*
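Those dollar figures are consistent with billing driven almost entirely by output tokens. A quick back-of-envelope check, assuming OpenAI's published launch prices for output ($0.40, $2.00, and $10.00 per million tokens for nano, mini, and full respectively), reproduces the reported costs and makes the throughput gap explicit:

```python
# Back-of-envelope check of the reported costs and latencies.
# Assumed output prices (USD per million tokens) are taken from
# OpenAI's launch pricing: nano $0.40, mini $2.00, full $10.00.
RUNS = {
    # tier: (output_tokens, seconds, price_per_million_tokens)
    "gpt-5-nano": (9_386, 40, 0.40),
    "gpt-5-mini": (2_692, 26, 2.00),
    "gpt-5":      (4_036, 75, 10.00),
}

for tier, (tokens, seconds, price) in RUNS.items():
    cost = tokens / 1_000_000 * price  # output cost dominates this workload
    throughput = tokens / seconds      # tokens generated per second
    print(f"{tier:10s}  ${cost:.4f}  {throughput:6.1f} tok/s")

# gpt-5-nano  $0.0038   234.7 tok/s  <- fastest stream, but most tokens
# gpt-5-mini  $0.0054   103.5 tok/s  <- fewest tokens, so lowest latency
# gpt-5       $0.0404    53.8 tok/s
```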
The Developer Calculus
These findings reveal three practical considerations for engineers (a toy selector sketched after the list makes the tradeoff concrete):
1. Token efficiency matters more than model size for latency-sensitive applications
2. Quality gains in the premium model come with a roughly 10x cost multiplier
3. Organization verification creates friction for accessing advanced tiers
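One way to operationalize this calculus is a simple constraint-based selector. The sketch below is purely illustrative: pick_tier is a hypothetical helper, and the per-tier numbers are lifted from the single pelican run above rather than real profiling data.

```python
# Toy tier selector: pick the cheapest GPT-5 tier that meets a latency
# budget. Per-tier figures come from the single pelican run above, so
# treat them as illustrative stand-ins for real profiling data.
TIERS = [
    # (name, observed cost in USD, observed latency in seconds)
    ("gpt-5-nano", 0.0038, 40),
    ("gpt-5-mini", 0.0054, 26),
    ("gpt-5",      0.0400, 75),
]

def pick_tier(max_seconds: float) -> str | None:
    """Return the cheapest tier whose observed latency fits the budget."""
    candidates = [t for t in TIERS if t[2] <= max_seconds]
    return min(candidates, key=lambda t: t[1])[0] if candidates else None

print(pick_tier(30))  # -> gpt-5-mini (only tier under 30 s)
print(pick_tier(60))  # -> gpt-5-nano (cheapest of nano and mini)
print(pick_tier(10))  # -> None (no tier fits the budget)
```

With real per-workload measurements, the same pattern extends naturally to cost ceilings or quality scores as additional constraints.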
As one developer noted: "The mini model's performance sweet spot surprised me – sometimes smaller outputs beat smaller models." This pelican experiment demonstrates why unconventional benchmarks often reveal practical truths that synthetic tests miss. For generative AI integration, developers must now weigh artistic fidelity against economic and performance constraints – one bicycle-riding bird at a time.
Source: GPT-5 Model Price Comparison via Pelicans on Bicycle