
AI image generation in 2026 — Gemini 3.1 Pro vs Flux vs SDXL — what I use and when

Four models, three price points, one case study (cover image for the blog). Showing what works for which use case, where the API hurts, and when it pays to go back to Sharp + Canvas.

·7 min read

After a year of experimenting with various image generation models, I have a fairly concrete ranking for different use cases. This isn't a leaderboard or a benchmark; it's a record of what I actually use and why, with concrete prices, gotchas, and the cases where no AI beat procedural generation.

Models I care about in 2026

| Model | Provider | Strength | Price (1 image) |
|---|---|---|---|
| Gemini 3.1 Pro Images | Google | Photo realism, text rendering | ~$0.040 |
| Flux 1.1 Pro | Black Forest Labs | Best composition, brand asset look | ~$0.055 |
| Flux schnell | Black Forest Labs (CF Workers AI) | Speed, free tier | $0 (CF) |
| SDXL Lightning | Stability AI (CF Workers AI) | Textures, abstract patterns | $0 (CF) |
| DALL-E 3 | OpenAI | Still good for illustration | ~$0.040 |
| Imagen 3 | Google | Premium photo realism | ~$0.030 |
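This small TypeScript sketch shows what the per-image prices mean at batch scale by multiplying them by an image count. The prices are copied from the table above and are approximate. The model keys are plain labels, not API identifiers, and `batchCost` is a hypothetical helper.

```typescript
// Approximate per-image prices (USD) copied from the table above; real prices change often.
const PRICE_PER_IMAGE: Record<string, number> = {
  "Gemini 3.1 Pro Images": 0.04,
  "Flux 1.1 Pro": 0.055,
  "Flux schnell (CF free tier)": 0,
  "SDXL Lightning (CF free tier)": 0,
  "DALL-E 3": 0.04,
  "Imagen 3": 0.03,
};

// Cost of generating `count` images with one model, rounded to whole cents.
function batchCost(model: string, count: number): number {
  const price = PRICE_PER_IMAGE[model];
  if (price === undefined) throw new Error(`Unknown model: ${model}`);
  return Math.round(price * count * 100) / 100;
}

// Print the cost of a 32-image batch (one cover per blog post) for every model.
for (const model of Object.keys(PRICE_PER_IMAGE)) {
  console.log(`${model}: $${batchCost(model, 32).toFixed(2)} for 32 images`);
}
```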

What I don't use:

  • Midjourney: community-first, no API, so it doesn't fit automation
  • Stable Diffusion (local): only if you run your own GPU (RTX 3070+) and need full freedom
  • Leonardo.ai: great UI, but expensive, and the API is inconsistent

Case study — cover images for the blog

I have 32 posts and wanted each one to have a unique cover. I tried three approaches:

Approach 1 — Flux 1.1 Pro via Replicate

Prompt template:

abstract dark technical illustration for a tech blog post about
"{title}". Color palette: deep black background (#0a0a0a) with neon
green accent (#00FF7F). Style: cyberpunk minimalism, code fragments
floating, terminal silhouettes, abstract data visualizations, 16:9
aspect ratio. No text, no people, no recognizable logos.
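Here is a minimal sketch of filling that template for each post. It assumes titles come straight from the post metadata. `buildCoverPrompt` is a hypothetical helper, and it only sanitizes double quotes and whitespace. That is one possible choice, not the only one.

```typescript
// The cover prompt template from above; {title} is the only placeholder.
const COVER_PROMPT_TEMPLATE = `abstract dark technical illustration for a tech blog post about
"{title}". Color palette: deep black background (#0a0a0a) with neon
green accent (#00FF7F). Style: cyberpunk minimalism, code fragments
floating, terminal silhouettes, abstract data visualizations, 16:9
aspect ratio. No text, no people, no recognizable logos.`;

// Fill the template for one post. Double quotes in the title become single quotes
// so they don't close the quoted "{title}" early, and runs of whitespace collapse
// to one space. The function replacer stops "$&"-style sequences in a title from
// being treated as replacement patterns.
function buildCoverPrompt(title: string): string {
  const safeTitle = title.replace(/"/g, "'").replace(/\s+/g, " ").trim();
  return COVER_PROMPT_TEMPLATE.replace("{title}", () => safeTitle);
}

console.log(buildCoverPrompt('Why "MCP" matters'));
```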

Results: visually great but stylistically inconsistent. Each image looks professional, but 32 together look like a gallery of different artists, not one brand.

Cost: 32 × $0.055 = $1.76, which is cheap.

Approach 2 — Gemini 3.1 Pro Images

Same prompt template, but better typography rendering (I tried adding the title inline in the image). Worse than Flux for abstraction, great for photo-realistic scenarios.

Best for: posts with a "human element" (e.g. "my workflow", "talking to a client"). Worse for: abstract tech.

Approach 3 — Sharp + Canvas, procedural (winning pick)

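After two days of generating images, I went back to procedural generation. Each slug maps to a deterministic seed, and the cover is built from that seed:

import sharp from "sharp";
import { createHash } from "crypto";
 
// Deterministic seed: the first 32 bits of the slug's MD5 hash.
function slugSeed(slug: string): number {
  return parseInt(createHash("md5").update(slug).digest("hex").slice(0, 8), 16);
}
 
// Escape XML special characters so titles containing &, < or quotes don't break the SVG.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
 
async function generateCover(slug: string, title: string) {
  const seed = slugSeed(slug);
  const hue = seed % 360; // 0-359, derived from the slug (collisions are possible)
  // The xmlns attribute lets librsvg (sharp's SVG renderer) recognize the elements;
  // trim() drops the leading newline before <svg>.
  const background = `
    <svg xmlns="http://www.w3.org/2000/svg" width="1200" height="630">
      <defs>
        <radialGradient id="g" cx="20%" cy="10%">
          <stop offset="0%" stop-color="hsl(${hue}, 80%, 40%)" stop-opacity="0.4"/>
          <stop offset="60%" stop-color="#0a0a0a" stop-opacity="0"/>
        </radialGradient>
        <!-- accent grid pattern -->
        <pattern id="grid" width="60" height="60" patternUnits="userSpaceOnUse">
          <path d="M 60 0 L 0 0 0 60" fill="none" stroke="rgba(255,255,255,0.04)"/>
        </pattern>
      </defs>
      <rect width="1200" height="630" fill="#0a0a0a"/>
      <rect width="1200" height="630" fill="url(#g)"/>
      <rect width="1200" height="630" fill="url(#grid)"/>
    </svg>`.trim();
  // The overlay needs an explicit size (no larger than the base image), otherwise the
  // text at y=350 gets clipped. Long titles are not wrapped.
  const overlay = `<svg xmlns="http://www.w3.org/2000/svg" width="1200" height="630">
    <text x="80" y="350" font-size="68" font-family="Inter Black" fill="white">${escapeXml(title)}</text>
  </svg>`;
  // The output directory must already exist.
  await sharp(Buffer.from(background))
    .composite([{ input: Buffer.from(overlay), top: 0, left: 0 }])
    .webp({ quality: 85 })
    .toFile(`public/blog/covers/${slug}.webp`);
}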

Cost: $0. Time: 0.4s per image. Consistency: 100%. Maintenance: none; covers can be regenerated on every build if I want.

Why procedural won:

  • Brand-consistent (every image shares DNA, colors, font, layout)
  • Reproducible (commit to git, regenerate locally)
  • Zero API cost
  • Title rendered as real text, not "drawn" by an image model
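The "reproducible" point holds because the seed is a pure function of the slug. This sketch shows how one seed could drive more than the hue. `hue` matches the generator above. `gradientX` and `gridSize` are hypothetical extra parameters, each taken from a different slice of bits of the same seed.

```typescript
import { createHash } from "crypto";

// Same seed function as the cover generator: the first 32 bits of the slug's MD5.
function slugSeed(slug: string): number {
  return parseInt(createHash("md5").update(slug).digest("hex").slice(0, 8), 16);
}

// Derive visual parameters from the seed. Only `hue` exists in the original generator;
// the other two are hypothetical knobs read from the high bits of the seed.
function coverParams(slug: string) {
  const seed = slugSeed(slug);
  return {
    hue: seed % 360,                               // 0..359
    gradientX: 10 + ((seed >>> 16) % 30),          // gradient centre in percent, 10..39
    gridSize: [40, 60, 80][(seed >>> 24) % 3],     // grid cell size in px
  };
}

// Same slug in, same parameters out, on any machine and in any build.
console.log(coverParams("ai-image-generation-2026"));
```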

Where procedural loses:

  • Photo-realistic content (Gemini wins by a mile)
  • Complex illustration (Flux)
  • Storytelling visuals (e.g. a product page mockup): AI only

API DX (developer experience) comparison

Ranked by how well they integrate:

1. Replicate: a REST API where you create a prediction, poll it by job ID, and get an output URL. All the models work there (Flux, SDXL, Gemini, custom community models).

curl -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"version": "...", "input": {"prompt": "..."}}'
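Here is the create-then-poll flow from item 1, sketched in TypeScript. It assumes Node 18+ (global `fetch`), the `POST /v1/predictions` and `GET /v1/predictions/{id}` endpoints, and the status values `starting`, `processing`, `succeeded`, `failed` and `canceled`. `runPrediction` and its options are hypothetical, and the version hash stays a placeholder that you pin yourself.

```typescript
type Fetch = typeof fetch;

interface Prediction {
  id: string;
  status: "starting" | "processing" | "succeeded" | "failed" | "canceled";
  output?: unknown;
  error?: unknown;
}

const PREDICTIONS_URL = "https://api.replicate.com/v1/predictions";

// Hypothetical helper: create a prediction for a pinned model version, then poll it
// until it reaches a terminal state. Returns the prediction's `output` (e.g. image URLs).
async function runPrediction(
  token: string,
  version: string, // the exact version hash, pinned so outputs don't drift
  input: Record<string, unknown>,
  opts: { fetchFn?: Fetch; pollMs?: number; maxPolls?: number } = {},
): Promise<unknown> {
  const { fetchFn = fetch, pollMs = 1000, maxPolls = 120 } = opts;
  // Replicate's current docs show a Bearer token in this header.
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };

  // 1. Create the prediction; the response carries its id and initial status.
  const created = await fetchFn(PREDICTIONS_URL, {
    method: "POST",
    headers,
    body: JSON.stringify({ version, input }),
  });
  if (!created.ok) throw new Error(`create failed: HTTP ${created.status}`);
  let prediction = (await created.json()) as Prediction;

  // 2. Poll GET /v1/predictions/{id} until success, failure, or the poll limit.
  for (let polls = 0; ; polls++) {
    if (prediction.status === "succeeded") return prediction.output;
    if (prediction.status === "failed" || prediction.status === "canceled") {
      throw new Error(`prediction ${prediction.status}: ${JSON.stringify(prediction.error)}`);
    }
    if (polls >= maxPolls) throw new Error("timed out waiting for prediction");
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    const res = await fetchFn(`${PREDICTIONS_URL}/${prediction.id}`, { headers });
    if (!res.ok) throw new Error(`poll failed: HTTP ${res.status}`);
    prediction = (await res.json()) as Prediction;
  }
}
```

Injecting `fetchFn` lets you test the polling loop without network access. The `maxPolls` cap stops a stuck job from blocking a build forever.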

2. Gemini API (Google AI Studio): JSON in, base64 out. Well documented, and easier than a full GCP setup for solo devs.
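In practice, "base64 out" means the image arrives inside the JSON. This sketch pulls every inline image out of a `generateContent` response. It assumes the REST shape `candidates[].content.parts[].inlineData { mimeType, data }`, so check the current Gemini API reference before relying on it. `extractImages` is a hypothetical helper.

```typescript
// Minimal (assumed) shape of the parts we care about in a generateContent response.
interface InlineData { mimeType?: string; data?: string }
interface Part { text?: string; inlineData?: InlineData }
interface GenerateContentResponse { candidates?: { content?: { parts?: Part[] } }[] }

interface ExtractedImage { mimeType: string; bytes: Buffer }

// Collect every inline image from the response as raw bytes; text parts are skipped.
function extractImages(res: GenerateContentResponse): ExtractedImage[] {
  const images: ExtractedImage[] = [];
  for (const candidate of res.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      const inline = part.inlineData;
      if (inline?.data) {
        images.push({
          mimeType: inline.mimeType ?? "application/octet-stream",
          bytes: Buffer.from(inline.data, "base64"), // decode base64 payload
        });
      }
    }
  }
  return images;
}
```

From there, `fs.writeFileSync("cover.png", images[0].bytes)` is enough to save the first image.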

3. Cloudflare Workers AI: the cheapest option (the free tier covers 10,000 Neurons/day), but it doesn't return image bytes via MCP (a known bug since 03/2026). The direct REST API works.
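This sketches the direct REST route for Workers AI. It assumes the `/accounts/{account_id}/ai/run/{model}` endpoint and the model IDs below, so verify both in Cloudflare's model catalog. As I understand it, Flux schnell answers with JSON containing a base64 `image`, while SDXL Lightning answers with raw image bytes. The reader therefore branches on `Content-Type`. All function names are hypothetical.

```typescript
// Assumed model IDs from Cloudflare's Workers AI catalog.
const CF_MODELS = {
  fluxSchnell: "@cf/black-forest-labs/flux-1-schnell",
  sdxlLightning: "@cf/bytedance/stable-diffusion-xl-lightning",
} as const;

function workersAiUrl(accountId: string, model: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
}

// Some models return JSON ({ result: { image: "<base64>" } }), others raw image bytes.
async function readImage(res: Response): Promise<Buffer> {
  if (!res.ok) throw new Error(`Workers AI error: HTTP ${res.status}`);
  const type = res.headers.get("content-type") ?? "";
  if (type.includes("application/json")) {
    const body = (await res.json()) as { result?: { image?: string } };
    if (!body.result?.image) throw new Error("no image in JSON response");
    return Buffer.from(body.result.image, "base64");
  }
  return Buffer.from(await res.arrayBuffer());
}

// Send one prompt to a Workers AI text-to-image model and return the image bytes.
async function generateWithWorkersAi(
  accountId: string,
  apiToken: string,
  model: string,
  prompt: string,
): Promise<Buffer> {
  const res = await fetch(workersAiUrl(accountId, model), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  return readImage(res);
}
```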

4. OpenAI Images API (DALL-E 3): works OK, but it only offers three fixed sizes (1024x1024, 1792x1024, 1024x1792) and is pricey for batch work. DALL-E rate limits are also much lower than GPT's.

Quality benchmark (no bias)

I gave the same prompt to 5 models and had 5 people score the results:

Prompt: "Modern minimalist landing page hero illustration for a
booking website. Dark theme. Mountain silhouette in background.
Subtle accent color: forest green. Photo-realistic but stylized.
1200x630."

Results (1-10):

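| Model | Composition | Color fidelity | "Brand feel" | Average |
|---|---|---|---|---|
| Flux 1.1 Pro | 9 | 8 | 8 | 8.3 |
| Gemini 3.1 Pro | 8 | 9 | 7 | 8.0 |
| DALL-E 3 | 7 | 7 | 7 | 7.0 |
| Flux schnell | 7 | 7 | 6 | 6.7 |
| SDXL | 6 | 6 | 5 | 5.7 |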

Flux wins, but Gemini is very close and about 27% cheaper ($0.040 vs $0.055 per image).
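The Average column is the plain mean of the three scores, and the price gap is measured relative to Flux's price. Here is a quick check of both, with the scores copied from the table above.

```typescript
type Scores = { composition: number; color: number; brandFeel: number };

// Scores from the benchmark table (1-10).
const benchmark: Record<string, Scores> = {
  "Flux 1.1 Pro": { composition: 9, color: 8, brandFeel: 8 },
  "Gemini 3.1 Pro": { composition: 8, color: 9, brandFeel: 7 },
  "DALL-E 3": { composition: 7, color: 7, brandFeel: 7 },
  "Flux schnell": { composition: 7, color: 7, brandFeel: 6 },
  SDXL: { composition: 6, color: 6, brandFeel: 5 },
};

// Arithmetic mean of the three criteria.
const average = (s: Scores): number => (s.composition + s.color + s.brandFeel) / 3;

// How much cheaper price `a` is than price `b`, as a percentage of `b`.
const percentCheaper = (a: number, b: number): number => ((b - a) / b) * 100;

for (const [model, s] of Object.entries(benchmark)) {
  console.log(`${model}: ${average(s).toFixed(1)}`);
}
console.log(`Gemini vs Flux 1.1 Pro: ${percentCheaper(0.04, 0.055).toFixed(1)}% cheaper`);
```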

When to use each — my decision tree

  • Hero image for a client landing page (mountain, premium) → Gemini 3.1 Pro ($0.04, photo-realistic)
  • Brand illustrations / 5 different images for a deck → Flux 1.1 Pro ($0.055, but top-tier consistency)
  • Internal mockups, drafts, brainstorming → Flux schnell (CF, free)
  • Abstract patterns / texture backgrounds → SDXL Lightning (CF, free)
  • Cover images for 32 blog posts (need consistency) → procedural Sharp ($0)
  • One-off social asset (Instagram post, banner) → Gemini or Flux, depending on style
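The decision tree above can be written as a lookup, which helps if a script picks the generator. This is a sketch. The use-case names and `pickGenerator` are hypothetical. Mapping the one-off social asset to Gemini for photo style and Flux otherwise is my reading of "depending on style".

```typescript
type UseCase =
  | "client-hero"
  | "brand-illustrations"
  | "draft"
  | "texture"
  | "blog-covers"
  | "social-one-off";

interface Choice { tool: string; costPerImage: number }

function pickGenerator(useCase: UseCase, style?: "photo" | "illustration"): Choice {
  switch (useCase) {
    case "client-hero": return { tool: "Gemini 3.1 Pro", costPerImage: 0.04 };
    case "brand-illustrations": return { tool: "Flux 1.1 Pro", costPerImage: 0.055 };
    case "draft": return { tool: "Flux schnell (CF)", costPerImage: 0 };
    case "texture": return { tool: "SDXL Lightning (CF)", costPerImage: 0 };
    case "blog-covers": return { tool: "procedural Sharp", costPerImage: 0 };
    case "social-one-off":
      // Assumption: photo style goes to Gemini; anything else (or no hint) goes to Flux.
      return style === "photo"
        ? { tool: "Gemini 3.1 Pro", costPerImage: 0.04 }
        : { tool: "Flux 1.1 Pro", costPerImage: 0.055 };
    default: {
      // Exhaustiveness check: adding a UseCase without a branch fails to compile.
      const unreachable: never = useCase;
      throw new Error(`unknown use case: ${unreachable}`);
    }
  }
}
```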

Warnings and things that surprised me

1. Brand colors don't always come out right. I used #00FF7F as the accent; Flux and Gemini render "neon green" as something similar but not identical. If you NEED an exact color → procedural overlay.

2. Text on images is still a joke. Every model struggles with it (Gemini 3 is the best of them, but still unreliable). If you need a title, generate the background and overlay the text via Sharp/Canvas.
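This sketch covers the overlay half of that advice. It builds a transparent SVG, the same size as the cover, that carries the title and an accent bar in the exact brand color. The XML escaping matters because a title containing `&` or `<` would otherwise produce invalid SVG. `titleOverlaySvg` and its layout values are hypothetical.

```typescript
// Escape XML special characters so titles like "Q&A <beta>" can't break the SVG.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Full-size transparent overlay: white title text plus an accent bar in the exact
// brand color, which an image model would only approximate.
function titleOverlaySvg(title: string, width = 1200, height = 630, accent = "#00FF7F"): string {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">
  <rect x="80" y="380" width="160" height="8" fill="${accent}"/>
  <text x="80" y="350" font-size="68" font-family="Inter, sans-serif" font-weight="900" fill="white">${escapeXml(title)}</text>
</svg>`;
}
```

With sharp, something like `sharp(background).resize(1200, 630).composite([{ input: Buffer.from(titleOverlaySvg(title)) }]).webp().toFile(out)` lays the overlay over an AI-generated background. Resizing first matters because sharp rejects an overlay larger than the base image.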

3. Model versioning in Replicate. Pin the specific model version hash in your API call; otherwise, when a new version drops, your images will start to look different. Also note that the hash shown in the docs may be stale.

4. Rate limits are more restrictive than they seem. DALL-E 3 allows 5 requests/min on Tier 1, so 32 images mean roughly 6.4 minutes (32 / 5) of waiting. Gemini and Replicate allow more (50-200/min).
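Below is the arithmetic behind that estimate, plus a minimal sequential throttle that keeps a batch under a requests-per-minute quota. It is a sketch that assumes one request per image and no retries. `minBatchMinutes` and `runThrottled` are hypothetical helpers.

```typescript
// Rough lower bound on wall-clock minutes for a batch capped at `rpm` requests/minute.
function minBatchMinutes(images: number, rpm: number): number {
  return images / rpm;
}

// Run tasks one after another, waiting 60_000 / rpm ms between the end of one task and
// the start of the next, so start times are at least that far apart and the quota holds.
// `sleep` is injectable so the throttle can be tested without real waiting.
async function runThrottled<T>(
  tasks: (() => Promise<T>)[],
  rpm: number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T[]> {
  const gapMs = 60_000 / rpm;
  const out: T[] = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(gapMs);
    out.push(await tasks[i]());
  }
  return out;
}

console.log(`${minBatchMinutes(32, 5)} min minimum for 32 images at 5 req/min`);
```

Run sequentially like this, a batch takes longer than the bound because each task's own duration adds to the gap. That is the price of never tripping the limit.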

5. Watermarking in Imagen 3 / Gemini. Google embeds SynthID, an invisible watermark, in generated images. It doesn't affect the look, but some clients prefer "clean" output; Flux doesn't add it.

What I do for clients

For small business landings (agritourism, apiary, salon):

  • Hero image: from stock (Unsplash, Pexels), often suffices and looks authentic
  • Decorative illustrations: Flux 1.1 Pro once, reused for the whole brandbook
  • Avatars/team photos: stock + AI fix-up (Photoshop Generative Fill)

For AI agents / personal projects:

  • Cover images: procedural (like this blog)
  • Demo videos: ScreenStudio + manual overlay
  • Logo/brand: external designer once, not AI (the difference is real)

For social posts:

  • Post graphics: Banner Maker (Sharp + template) or Gemini per post
  • OG images: dynamic Next.js og endpoint (with Tailwind-like styling)

AI image gen in 2026 is a grown-up toolkit. The model choice depends heavily on the use case, not on "which is best". For me the split is: Flux for brand assets, Gemini for photo realism, the CF free tier for drafts, and Sharp for everything that needs consistency on a $0 budget.

The simplest rule: how many times will I need an image like this? Once → AI. 32 times, with brand consistency → procedural.