Show HN: Image prompt game with multi-signal CLIP/HSV/HOG scoring

I originally built this as a small competitive game; it has since turned into a useful practice loop for prompt engineering.

  Core mechanic: the user sees a target image and writes a prompt, a model generates an image from that prompt, and we score how similar the result is to the target.

  Scoring uses multiple signals so one metric doesn’t dominate:

  1. Semantic alignment (CLIP)
  - user_prompt -> target_image (is the prompt conceptually aligned with the target?)
  - user_image -> target_image (is the generated result semantically aligned with the target?)

  2. Prompt faithfulness (CLIP)
  - user_prompt -> user_image (did generation actually follow the submitted prompt?)

  3. Color similarity
  - HSV histogram overlap (user_image vs target_image) for palette/tone distribution

  4. Structure similarity
  - HOG-lite gradient/orientation comparison (user_image vs target_image) for layout/edge composition
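  For readers who want something concrete, here is a rough sketch of what signals 3 and 4 could look like. It is not the game's actual code: the bin counts, the 2x2 cell grid, the use of histogram intersection as the "overlap", and every function name are assumptions made for illustration. It uses only the standard library, so it is slow on real images.

```python
import colorsys
import math


def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Normalized joint HSV histogram from (r, g, b) tuples in 0..255.

    The 8x4x4 bin layout is an assumption, not a value from the post.
    """
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
        count += 1
    return [c / count for c in hist] if count else hist


def histogram_overlap(a, b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(x, y) for x, y in zip(a, b))


def orientation_features(gray, cells=2, bins=8):
    """HOG-like descriptor: per-cell histograms of unsigned gradient
    orientation, weighted by gradient magnitude, L2-normalized overall.

    `gray` is a 2D list of intensities. Unlike full HOG there is no block
    normalization or bin interpolation, hence "lite".
    """
    height, width = len(gray), len(gray[0])
    feat = [0.0] * (cells * cells * bins)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            b = min(int(angle / math.pi * bins), bins - 1)
            cy = min(y * cells // height, cells - 1)
            cx = min(x * cells // width, cells - 1)
            feat[(cy * cells + cx) * bins + b] += mag
    norm = math.sqrt(sum(f * f for f in feat))
    return [f / norm for f in feat] if norm else feat


def structure_similarity(gray_a, gray_b):
    """Cosine similarity of the two descriptors (0..1, entries are non-negative)."""
    fa = orientation_features(gray_a)
    fb = orientation_features(gray_b)
    return sum(x * y for x, y in zip(fa, fb))
```

  Both functions return a value in 0..1, which makes them easy to feed into a weighted blend. An image with a vertical edge compared against one with a horizontal edge scores 0 on structure here, even if the two share the same palette.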

  Final score is a weighted blend (content signals weighted highest), normalized to player-facing points.
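  A minimal sketch of the blend step, under stated assumptions: the weights, the 1000-point scale, and all names below are hypothetical placeholders, and each signal is assumed to be already rescaled to 0..1 before blending. The CLIP signals are assumed to be cosine similarities between embeddings produced elsewhere (the embedding call itself is not shown).

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass(frozen=True)
class Signals:
    prompt_to_target: float  # CLIP: user_prompt -> target_image
    image_to_target: float   # CLIP: user_image -> target_image
    prompt_to_image: float   # CLIP: user_prompt -> user_image
    color: float             # HSV histogram overlap
    structure: float         # HOG-lite comparison


# Hypothetical weights; the post only says content signals are weighted highest.
WEIGHTS = {
    "prompt_to_target": 0.20,
    "image_to_target": 0.35,
    "prompt_to_image": 0.15,
    "color": 0.15,
    "structure": 0.15,
}
MAX_POINTS = 1000  # assumed player-facing scale


def clamp01(x):
    return max(0.0, min(1.0, x))


def final_points(signals, weights=WEIGHTS, max_points=MAX_POINTS):
    """Weighted average of the clamped signals, scaled to integer points."""
    total_weight = sum(weights.values())
    blended = sum(
        w * clamp01(getattr(signals, name)) for name, w in weights.items()
    ) / total_weight
    return round(blended * max_points)
```

  Dividing by the total weight means the weights do not have to sum to 1, so they can be tuned one at a time. Raw CLIP cosine similarities tend to occupy a narrow band, so some rescaling to 0..1 before this step is likely needed; how to calibrate that is left open here.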

  Why this approach:
  - CLIP-only can overrate semantically related but visually off outputs
  - color-only ignores structure/meaning
  - structure-only misses semantics/style
  - combining prompt-image and image-image signals reduced obvious false positives in ranking

  Stack:
  - Spring Boot backend
  - separate CLIP scoring container
  - external image generation service
  - Next.js frontend
  - PostgreSQL

  Would love technical feedback on:
  - metric weighting/calibration
  - known failure modes I should benchmark
  - alternatives to HOG-lite for fast structural scoring
