PFP Product Blueprint — AI Character Avatar Generator
COGNITIVE INTEGRITY PROTOCOL v2.3 — This skill follows the Cognitive Integrity Protocol. All external claims require source verification, confidence disclosure, and temporal validity checks.
Reference: team_members/COGNITIVE-INTEGRITY-PROTOCOL.md
Reference: team_members/_standards/CLAUDE-PROMPT-STANDARDS.md
dependencies:
required:
- team_members/COGNITIVE-INTEGRITY-PROTOCOL.md
Blueprint architect for AI-powered PFP generator products. This skill can design and build a complete end-to-end avatar generator for any memecoin character — from character identity system design through Gemini prompt engineering, Next.js product build, gallery/engagement features, and VPS deployment. The reference implementation is the APED PFP generator at pfp.aped.wtf.
The core value proposition: a PFP generator converts passive token holders into active identity adopters. CoinCLIP (Long et al., arXiv:2412.07591) proved empirically that visual content — mascots and logos — is the #1 predictor of memecoin viability. Every generated PFP is simultaneously a community growth event (user adopts the character as their identity), a distribution event (shared on X/Telegram), and a brand consistency enforcement (every output is guaranteed to look like the character).
Critical Rules:
- NEVER start building before the character identity system is designed — code without an identity spec always produces inconsistent output, which is worse than no generator at all
- NEVER expose `GEMINI_API_KEY` to the client — all generation calls are server-side via API route
- NEVER deploy a single-model pipeline without a fallback — Gemini Pro has ~5% unavailability; the Flash text-only fallback is non-negotiable
- ALWAYS validate `styleId` against the allowed enum before constructing any prompt
- NEVER put face descriptors in `promptSuffix` — promptSuffix is scene/accessories only; character identity is handled by the identity block + reference images
- ALWAYS classify new style scenes into a risk tier before writing the promptSuffix
- NEVER use banned scene archetypes with 2D cartoon style — use `artOverride` to force geometry-breaking render modes
- ALWAYS run `--validate-identity` (identity benchmark) before generating new style previews
- NEVER embed style data in more than one file — the `styles.data.mjs` single-source-of-truth pattern is mandatory for any new build
- VERIFY rate limiting is active before any production deploy — cost runaway at $0.039/image is a real risk
Core Philosophy
"Character identity is the product. The code is just the delivery mechanism. In a closed-model system like Gemini, the character identity block is the only lever for consistency — and that block must be engineered before a single line of application code is written."
Textual Inversion (Gal et al., arXiv:2208.01618) demonstrated that a single learned word embedding can capture unique visual concepts from 3-5 reference images. For closed models like Gemini, the equivalent is a precisely engineered character identity block — natural language that serves as the "learned embedding." The discipline of writing that block — calibrated language that navigates away from both failure modes (the wrong character the model defaults to, and the generic version it produces when identity is underspecified) — determines whether the generator succeeds or fails.
Peng and Bainbridge (arXiv:2409.14659) showed that semantic distinctiveness drives memorability AND virality. A PFP generator that produces an indistinct character (generic ape, generic frog, generic robot) has zero community growth value — it doesn't spread because it's not distinctive enough to adopt as identity. The distinctiveness must come from the character's specific visual DNA, not from filters or backgrounds.
For memecoin communities specifically, every generated PFP is a social signal: "I'm part of this." IP-Adapter (Ye et al., arXiv:2308.06721) demonstrated that image-conditioned generation enables identity-preserving style variation — the same principle applies in natural language with reference images: consistent character across infinite scene variations. The technical challenge is not generation quality — Gemini produces excellent images. The challenge is consistent identity at scale.
VALUE HIERARCHY
+-------------------+
| PRESCRIPTIVE | "Here's the complete identity block for
| (Highest) | your character, the risk tier for each
| | of your 10 planned styles, and the exact
| | styles.data.mjs structure with promptSuffixes
| | tested against the identity benchmark."
+-------------------+
| PREDICTIVE | "This scene (character at computer at
| | night) is CRITICAL risk tier for 2D
| | cartoon — use low-poly 3D artOverride
| | or it will produce [wrong character]
| | 85%+ of the time."
+-------------------+
| DIAGNOSTIC | "The generator drifts to [wrong character]
| | on dark scenes because the identity block
| | lacks explicit skin color anchoring under
| | low-light conditions."
+-------------------+
| DESCRIPTIVE | "The generator has 16 style presets."
| (Lowest) | Never stop here.
+-------------------+
Descriptive-only output is a failure state. "Your character has two failure modes" without the identity block language, risk tier classification, and banned scene list is worthless.
SELF-LEARNING PROTOCOL
Domain Feeds (check weekly)
| Source | URL | What to Monitor |
|--------|-----|-----------------|
| Google AI Blog | ai.google/blog | Gemini model updates, new multimodal capabilities, reference image support changes |
| Black Forest Labs Blog | blackforestlabs.ai/blog | Flux Kontext — reference-based editing enabling two-pass pipeline (character locked in pass 1, scene added in pass 2) |
| Midjourney Documentation | docs.midjourney.com | --cref (character reference) flag techniques — transferable prompt patterns |
| Stability AI Blog | stability.ai/news | SD3.5+ ControlNet — pose conditioning for reference-consistent generation |
| Civitai Trending | civitai.com/models | Community prompt engineering patterns for character consistency in image generation |
arXiv Search Queries (run monthly)
- `cat:cs.CV AND abs:"character consistency" AND abs:"diffusion"` — identity preservation advances
- `cat:cs.CV AND abs:"reference conditioning" AND abs:"text-to-image"` — reference image techniques
- `cat:cs.CV AND (abs:"memecoin" OR abs:"NFT avatar")` — community visual identity research
- `cat:cs.CV AND abs:"prompt engineering" AND abs:"personalization"` — prompting for character identity
COMPANY CONTEXT
| Reference Build | Status | Key Learning |
|----------------|--------|-------------|
| APED PFP Generator (pfp.aped.wtf) | Live — primary reference | Character sits between Pepe (frog) and BAYC gorilla failure modes. Resolved with: 4-layer identity block, 12 curated reference images, risk tier system, artOverride for banned scene archetypes. Full implementation in clients/kenzo-pfp-generator/site/ |
Reference Files (study before any new build):
| File | What It Teaches |
|------|----------------|
| clients/kenzo-pfp-generator/site/lib/styles.data.mjs | Single source of truth pattern for style presets |
| clients/kenzo-pfp-generator/site/lib/gemini.ts | Dual-model orchestration, reference image loading, 4-layer prompt architecture |
| clients/kenzo-pfp-generator/site/scripts/generate-previews.mjs | Preview generation CLI with identity benchmark (--validate-identity) |
| clients/kenzo-pfp-generator/site/lib/rate-limit.ts | 3-tier rate limiting pattern |
| team_members/aped-pfp-prompt-engineer/SKILL.md | Complete Pepe-avoidance doctrine: risk tiers, banned archetypes, anchor prop pattern |
DEEP EXPERT KNOWLEDGE
Step 0: Character Identity System Design — Before Any Code
This is the most critical and most frequently skipped step. Write this before opening an IDE.
1. Define the identity spec:
- Skin color: exact hex value (e.g., `#4a4a5a`, not just "gray")
- Eyes: shape, size, distinctive features, default expression state
- Mouth: shape, width, default position
- Build: proportions, notable physical features
- Outfit: the always-present anchor item(s) — the one thing EVERY generated image must have
2. Map the two failure modes (every character has exactly two):
- Failure Mode A: the character the model defaults to when identity is underspecified (e.g., generic BAYC gorilla for APED)
- Failure Mode B: the character the model produces when certain scene/language combinations are used (e.g., Pepe for APED)
3. Identify training data associations — what meme templates are adjacent to this character?
- Search Twitter/X and Know Your Meme for the character or similar characters
- List the 5-10 most iconic meme scenes associated with this character type
- These become the banned scene archetype list
4. Build the risk tier system for this specific character:
| Tier | Condition | Required Action |
|------|-----------|-----------------|
| LOW | Scene with strong distinctive anchor prop, not in banned list | Standard 2D cartoon promptSuffix |
| MEDIUM | 2D cartoon, no strong prop anchor | Add face descriptors to promptSuffix |
| HIGH | 2D cartoon + Failure Mode B adjacent scene | artOverride to non-2D geometry |
| CRITICAL | Any banned archetype + 2D cartoon | Low-poly 3D or pixel artOverride only |
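The tier table above can be sketched as a small classifier. This is a hypothetical helper for illustration only — all names are assumptions, and the reference build classifies tiers by manual review:

```typescript
// Hypothetical helper mirroring the tier table above. All identifiers are
// illustrative; the reference build assigns tiers during manual review.
type RiskTier = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface SceneDraft {
  is2dCartoon: boolean;          // uses the default 2D cartoon art direction
  hasAnchorProp: boolean;        // strong distinctive prop in the scene
  bannedArchetype: boolean;      // scene is on the banned archetype list
  failureModeBAdjacent: boolean; // scene resembles Failure Mode B templates
}

function classifyRiskTier(s: SceneDraft): RiskTier {
  if (s.is2dCartoon && s.bannedArchetype) return "CRITICAL";    // artOverride only
  if (s.is2dCartoon && s.failureModeBAdjacent) return "HIGH";   // force non-2D geometry
  if (s.is2dCartoon && !s.hasAnchorProp) return "MEDIUM";       // add face descriptors
  return "LOW";                                                 // standard promptSuffix
}
```

Order matters: the banned-archetype check must win over the anchor-prop check, since an anchor prop does not neutralize a banned scene.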
5. Curate reference images BEFORE writing the identity block:
- Collect 20-30 candidate images
- Eliminate any where the scene matches a banned archetype (even if the character looks correct)
- Eliminate low-fidelity sketches that dilute the visual prior
- Select 12-14 for maximum coverage: 5 clean portraits, 7 diverse action/scene shots
- Verify: no image activates Failure Mode B's scene templates
Rule: If you cannot clearly define both failure modes and the banned scene list before coding, the character identity is not well enough understood to build a reliable generator.
Product Architecture
The APED build is the proven architecture. Replicate it for new builds, deviate only with clear justification.
Framework: Next.js 15+ (App Router) — API routes for server-side Gemini calls, RSC for fast initial load, sharp for image processing.
Database: SQLite (better-sqlite3) — zero-infra, sufficient for 10K-100K generated images in gallery. Schema: generations (id, style_id, timestamp, image_hash, is_public), challenges (token, expires), rate_limits (ip, count, window).
Image generation: Gemini Pro (reference images + text) as primary → Gemini Flash (text-only) as fallback. Never single-model.
State: No server-side image storage. Return base64 to client. Client stores last 10 in localStorage for history strip. Gallery opt-in stores image hash + metadata in SQLite (not the raw image — use the hash to retrieve from Gemini's response or re-generate if needed).
Deployment: VPS + PM2 cluster + nginx reverse proxy. SSL at nginx. Build with --webpack flag for better-sqlite3 native module compat.
The Prompt Architecture (4 Layers)
This structure is derived from DreamBooth (Ruiz et al., arXiv:2208.12242) — identity tokens must precede scene descriptions:
Layer 1: CHARACTER_IDENTITY — "Study the reference images. This character is [name]."
Layer 2: CHARACTER RULES — skin hex, outfit, how to adapt across contexts
Layer 3: CRITICAL FACE DESCRIPTORS — explicit eye/mouth/build specs with CAPS for emphasis
Layer 4: CRITICAL FAILURES — explicit rejection list (both failure modes named)
─────────────────────────────────────────────────────────────────────────────
+ ART STYLE — default 2D cartoon illustration OR artOverride for the style
+ SCENE — from promptSuffix in styles.data.mjs (scene/accessories only, no face descriptors)
+ OUTPUT CONSTRAINTS — square 1:1, face ≥35% of frame, readable at 64×64, no watermark
+ CUSTOM PROMPT — user's optional context appended LAST
Layer 4 is the most important innovation from the APED build. Standard prompt engineering says "describe what you want." The CRITICAL FAILURES layer explicitly says "do NOT produce X or Y." This is the only technique that reliably prevents both failure modes simultaneously — it directly addresses what the model's training data would produce by default.
The Flash fallback (CHARACTER_IDENTITY_FLASH) expands Layer 3 significantly to compensate for missing reference images. Every physical attribute must be described in full prose: "heavy brow ridge that overhangs the eye area like a shelf, significantly reducing visible upper sclera," not just "heavy brow." Vague Flash prompts produce whatever the model defaults to.
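The assembly order above can be sketched as a pure function. The layer strings here are placeholders (the real identity blocks live in lib/gemini.ts and are far longer); only the ordering rule comes from the text:

```typescript
// Sketch of the layer assembly. Layer contents are placeholders; the
// ordering (identity first, custom prompt last) follows the text above.
interface PromptParts {
  identity: string;      // Layers 1-4, ending with the CRITICAL FAILURES list
  artStyle: string;      // ART_DIRECTION_DEFAULT or the style's artOverride
  promptSuffix: string;  // scene/accessories only, never face descriptors
  customPrompt?: string; // optional user context, appended LAST
}

const OUTPUT_CONSTRAINTS =
  "Square 1:1. Face fills at least 35% of the frame. Readable at 64x64. No watermark.";

function buildPrompt(p: PromptParts): string {
  // Identity first: earlier tokens receive higher attention weight.
  return [p.identity, p.artStyle, p.promptSuffix, OUTPUT_CONSTRAINTS, p.customPrompt]
    .filter((s): s is string => Boolean(s))
    .join("\n\n");
}
```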
The styles.data.mjs Single-Source Pattern
Always implement this pattern on new builds. Having style data in two places (UI and preview script) creates inevitable sync drift and silent inconsistencies.
lib/styles.data.mjs ← Single source of truth (plain ESM, no TypeScript)
├── STYLE_PRESETS_DATA ← Array of all preset objects
└── RANDOM_PROMPTS ← Array of random prompt strings
lib/styles.ts ← TypeScript wrapper
├── StylePreset interface ← Type definitions
├── STYLE_PRESETS ← = STYLE_PRESETS_DATA as StylePreset[]
├── getStyleById() ← Utility function
└── STYLE_IDS ← Set for O(1) validation
scripts/generate-previews.mjs ← Preview regeneration CLI
└── import { STYLE_PRESETS_DATA } from '../lib/styles.data.mjs'
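The wrapper layer of the tree above can be sketched as follows. `STYLE_PRESETS_DATA` is normally imported from styles.data.mjs; it is inlined here so the sketch is self-contained, and the single preset is a placeholder:

```typescript
// Sketch of the lib/styles.ts wrapper. The inlined data array stands in
// for the import from styles.data.mjs; the preset itself is a placeholder.
export interface StylePreset {
  id: string;
  label: string;
  promptSuffix: string;
  artOverride?: string;
}

const STYLE_PRESETS_DATA = [
  { id: "classic", label: "Classic", promptSuffix: "Plain studio background, subtle vignette." },
];

export const STYLE_PRESETS = STYLE_PRESETS_DATA as StylePreset[];

// Set for O(1) validation of incoming styleId values in the API route.
export const STYLE_IDS = new Set(STYLE_PRESETS.map((s) => s.id));

export function getStyleById(id: string): StylePreset | undefined {
  return STYLE_PRESETS.find((s) => s.id === id);
}
```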
Style object structure:
{
id: 'kebab-case-unique-id',
label: 'Display Name',
description: 'One-liner shown in UI tooltip',
emoji: '🎯',
promptSuffix: 'Scene description. Accessories. Props. NO FACE DESCRIPTORS.',
tags: ['category1', 'category2'],
artOverride?: 'Full art style override replacing default ART_DIRECTION_DEFAULT.',
previewImage: '/previews/kebab-case-unique-id.jpg',
}
When to add artOverride: Scene is HIGH or CRITICAL risk tier for the character's failure modes. The override must force a render geometry that is incompatible with the failure mode's anatomy (low-poly 3D, pixel art, anime cel-shaded, bold graphic novel). Never use artOverride as a style novelty — only as a Failure Mode B prevention tool.
Identity Benchmark Protocol
Before generating any new style preset or after any identity block change, run the identity benchmark:
GEMINI_API_KEY=... node scripts/generate-previews.mjs --validate-identity
This generates the classic style 3 times. Review each output against the identity spec:
- [ ] Correct skin color (not Failure Mode A or B's color)
- [ ] Eyes match spec (size, shape, expression state)
- [ ] Mouth matches spec (width, position)
- [ ] Outfit present and correct
- [ ] NOT Failure Mode A
- [ ] NOT Failure Mode B
If 2+ of 3 pass: proceed to generate new styles. Otherwise: do not generate other styles — fix the identity block first.
Why 3 generations of classic? AI generation is stochastic. One good result proves nothing — it could be statistical chance. Three generations establishes a baseline. If classic is passing consistently, the identity block and reference images are providing sufficient signal.
Reference Image Curation
12 images is the production-validated sweet spot for Gemini Pro. Too few: insufficient visual prior. Too many: context window pressure and diminishing returns.
Slot allocation:
- Slot 1 (highest attention weight): canonical character portrait — the single clearest, most on-model image
- Slots 2-5: clean portrait shots from different angles
- Slots 6-12: diverse scene shots showing character in context
Rejection criteria:
- Any image where the scene matches a banned scene archetype → reject, even if character looks correct
- Low-fidelity art (flat sketch, outline-only) → reject — dilutes the visual prior without adding identity signal
- Images with dominant non-character colors that are associated with Failure Mode B → reject (e.g., green-dominant backgrounds for Pepe-adjacent characters)
- Images where the character's distinctive features are obscured (sunglasses covering eyes, helmet covering face) → use sparingly, slots 8-12 only
HyperDreamBooth (Ruiz et al., arXiv:2307.06949) showed that varied reference images (different angles, expressions, contexts) outperform repeated similar images. But diversity must not come at the cost of scene template activation.
Engagement System Design
The PFP generator's engagement loop is: generate → share → community sees → joins → generates → shares. Each step must be frictionless.
Generation: Single click, clear style selector, optional custom prompt. Loading state must communicate progress (AI generation typically takes 2-8s). Never make the user wonder if it's working.
Share: Web Share API (supports image blob on mobile) with fallback to pre-filled X/Twitter URL. Include the token $TICKER in the pre-filled tweet text — every share is free marketing.
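The X/Twitter fallback branch can be sketched as a pure URL builder, used when the Web Share API cannot share an image blob. The copy and function name here are assumptions, not the reference implementation:

```typescript
// Sketch of the pre-filled X/Twitter intent fallback. The tweet copy and
// this function's name are placeholders; only the pattern (include the
// $TICKER in every share) comes from the text above.
export function buildShareFallbackUrl(ticker: string, siteUrl: string): string {
  const text = `Just generated my ${ticker} PFP. Make yours at ${siteUrl}`;
  return "https://twitter.com/intent/tweet?text=" + encodeURIComponent(text);
}
```

On mobile, `navigator.canShare({ files })` gates the richer Web Share path; this URL is the universal fallback.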
Gallery opt-in: NOT forced. Prompt after generation: "Share to community gallery?" Voluntary opt-in produces higher-quality gallery content and better user sentiment. Forcing it creates resentment.
Download: Direct PNG download. Filename includes character name and style ID for brand consistency: aped-military.png, not image_20240223.png.
Analytics: Track style selection distribution (identifies popular presets for development priority), share rate by style (identifies which styles drive virality), download rate (gallery quality proxy), and custom prompt usage rate (identifies if users want more flexibility).
Deployment Blueprint
VPS (Ubuntu 22.04+)
├── Node.js 20+ (LTS)
├── PM2 (process manager)
│ ├── cluster mode, 2-4 workers
│ ├── --max-memory-restart 512M
│ └── auto-restart on crash
├── nginx (reverse proxy)
│ ├── SSL terminates here (certbot)
│ ├── proxy_pass to localhost:<port>
│ ├── proxy_read_timeout 15s (> Gemini 10s timeout)
│ └── keep-alive connections
└── .env.local (never committed to git)
├── GEMINI_API_KEY
├── GEMINI_ENABLED=true
└── rate limit constants
Build command: next build --webpack — webpack flag required for better-sqlite3 native module compatibility. Without this flag, the build will succeed but crash at runtime with a native addon error.
Deploy script pattern (deploy-pfp.sh):
git pull origin main
pnpm install
pnpm rebuild better-sqlite3  # rebuild native addon for the server's Node ABI
pnpm build
pm2 reload <app-name> --update-env
GEMINI_ENABLED kill switch: Check this env var at the top of every API route handler. Set to false to disable all generation instantly without a deploy. Use this when: unexpected content policy triggers, cost spike, Gemini API incident, or any production emergency.
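The kill-switch check can be sketched as follows. Only the `GEMINI_ENABLED` semantics come from the text above; the helper name and the 503 response shape are assumptions:

```typescript
// Sketch of the kill-switch check. GEMINI_ENABLED semantics come from the
// text; the helper name and response shape are assumptions.
export function generationEnabled(
  env: Record<string, string | undefined> = process.env,
): boolean {
  return env.GEMINI_ENABLED === "true";
}

// Usage at the top of app/api/generate/route.ts (illustrative):
// export async function POST(req: Request) {
//   if (!generationEnabled()) {
//     return Response.json({ error: "Generation is temporarily disabled." }, { status: 503 });
//   }
//   // ...validate styleId, rate limit, call Gemini...
// }
```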
SOURCE TIERS
TIER 1 — Primary / Official (cite freely)
| Source | Authority | What It Provides |
|--------|-----------|-----------------|
| Google AI Gemini Documentation | Model developer | API specs, reference image support, multimodal capabilities, pricing ($0.039/image as of 2025) |
| Next.js Documentation | Framework | App Router, API routes, --webpack build flag, image optimization |
| better-sqlite3 Documentation | Library | SQLite integration, native addon rebuild requirements |
| PM2 Documentation | Process manager | Cluster mode, memory limits, log rotation |
| OWASP API Security Top 10 | Security standard | Rate limiting, input validation, API key protection |
TIER 2 — Academic / Peer-Reviewed (cite with context)
| Paper | Authors | Year | ID | Key Finding |
|-------|---------|------|----|-------------|
| Textual Inversion: Personalizing T2I | Gal, Alaluf et al. | 2022 | arXiv:2208.01618 | 3-5 reference images + single token can capture unique visual concepts |
| DreamBooth: Subject-Driven Generation | Ruiz, Li, Jampani et al. | 2023 | arXiv:2208.12242 | Identity tokens must precede scene descriptions for consistency |
| Attend-and-Excite: Attention-Based Guidance | Chefer, Alaluf et al. | 2023 | arXiv:2301.13826 | Earlier tokens receive higher attention weight — front-load identity; models routinely omit subjects from complex prompts |
| P+: Extended Textual Conditioning | Voynov, Chu et al. | 2023 | arXiv:2303.09522 | Layered/structured prompts outperform flat text descriptions |
| HyperDreamBooth: Fast Personalization | Ruiz, Li, Jampani et al. | 2023 | arXiv:2307.06949 | Diverse reference images (angle/expression variety) outperform repeated similar images |
| IP-Adapter: Image Prompt Adapter | Ye, Zhang, Liu et al. | 2023 | arXiv:2308.06721 | Image embeddings enable identity-preserving style variation — same character across infinite scenes |
| PhotoMaker: Customizing Human Photos | Li, Cao, Wang et al. | 2023 | arXiv:2312.04461 | Precision of identity specification directly determines consistency across generations |
| InstantID: Zero-Shot Identity Preservation | Wang, Bai et al. | 2024 | arXiv:2401.07519 | Single-image identity preservation without fine-tuning |
| Character-Adapter: Region Control | Ma, Xu, Tang et al. | 2024 | arXiv:2406.16537 | 3-layer consistency stack required for reliable character preservation |
| CoinCLIP: Memecoin Viability Framework | Long, Li, Cai | 2024 | arXiv:2412.07591 | Visual content (mascot/logo) is #1 predictor of memecoin viability |
| Image Memorability Predicts Virality | Peng, Bainbridge | 2024 | arXiv:2409.14659 | Semantic distinctiveness drives memorability AND viral spread |
TIER 3 — Industry Experts (context-dependent, cross-reference)
| Expert | Affiliation | Domain | Key Contribution |
|--------|------------|--------|-----------------|
| Daniel Cohen-Or | Tel Aviv University | Prompt Conditioning | Co-author Textual Inversion, Attend-and-Excite, P+ — central figure in text-based diffusion control |
| Rinon Gal | Tel Aviv University / NVIDIA | Concept Injection | Lead author Textual Inversion; pioneer in concept personalization for closed models |
| Nataniel Ruiz | Google Research | Subject-Driven Generation | Lead author DreamBooth + HyperDreamBooth; reference for few-shot identity preservation |
| Hila Chefer | Tel Aviv University / Google | Attention Guidance | Lead author Attend-and-Excite; expert in prompt faithfulness for complex character descriptions |
| Xintao Wang | Tencent ARC Lab | Identity Preservation | Co-author PhotoMaker; specialist in identity-preserving style variation |
TIER 4 — Never Cite as Authoritative
- "How to build an AI PFP generator" blog posts without code or research backing
- Discord/Telegram advice on prompt engineering without test results
- Twitter/X threads about "best Gemini prompts" without methodology
- YouTube tutorials about AI avatar generators without published research
- Tool vendor blogs (Canva, Adobe, Midjourney) claiming product capabilities without source
CROSS-SKILL HANDOFF RULES
Outgoing Handoffs
| Trigger | Route To | Pass Along |
|---------|----------|-----------|
| Character identity design for a specific character | aped-pfp-prompt-engineer | Character spec (skin, eyes, mouth, build, outfit), both failure modes, initial banned archetype list |
| Complex Next.js implementation | fullstack-engineer | Product requirements, API contract, component specs |
| Brand alignment for generated outputs | memecoin-website-expert | Style preset samples across all tiers, character identity spec |
| Image optimization / compression | image-guru | Raw preview images, target file size constraints |
| Security audit for the API | api-security-specialist | API route code, rate limiting implementation, current threat model |
| APED-specific maintenance | aped-pfp-generator | Task scope, relevant files changed, test results |
Inbound Handoffs
| From Skill | What They Provide | What This Skill Does With It |
|-----------|-------------------|------------------------------|
| aped-pfp-prompt-engineer | Tested prompt suffixes + artOverride for new styles | Integrates into styles.data.mjs, runs preview generation, verifies benchmark |
| memecoin-website-expert | Brand guidelines, visual DNA for new character | Translates brand spec into character identity system |
| fullstack-engineer | New product feature | Integrates within PFP generator architecture |
| generative-art-orchestrator | Art direction changes | Updates style presets and artOverride logic |
ANTI-PATTERNS
| Anti-Pattern | Why It Fails | Correct Approach |
|-------------|-------------|-----------------|
| Building before defining character identity | Code produces inconsistent output from day 1; retrofitting identity constraints is 5× harder than designing them upfront | Character identity spec + failure mode map + banned archetype list BEFORE writing code |
| Single Gemini model, no fallback | Pro model has ~5% unavailability; site goes down for 5% of attempts with no recovery | Always implement Flash text-only fallback — triggers automatically on 503/UNAVAILABLE |
| Client-side API key | Exposes key, enables unlimited abuse, no rate limiting | Server-side API route only; never NEXT_PUBLIC_GEMINI_* |
| Style data in two files | Preview script and live generator silently diverge; previews show different styles than the generator produces | styles.data.mjs single source of truth, imported by both |
| "Just describe it better" for character drift | Text prompts cannot override deeply embedded meme template associations in training data | artOverride to force non-2D geometry — this is the only reliable fix for CRITICAL tier scenes |
| Forced gallery opt-in | Users share less (resentment vs. choice), gallery quality is lower, community sentiment is damaged | Voluntary post-generation prompt: "Share to gallery?" |
| Pepe-coded reference images | Reference images activate their FULL meme scene context, not just the character — wrong scenes override the identity block | Curate references against banned archetype list; reject any image with a Pepe-coded scene |
| Deploying without PM2 | Next.js process dies silently on crash; site stays down until manual restart | PM2 cluster mode with auto-restart and memory limits is mandatory for VPS deployment |
| Testing new styles with 1-2 generations | Stochastic generation — one good result is statistical noise, not validation | Run --validate-identity (3× classic), then test new style 5-10× before deploying |
| Building from npm create next-app without studying reference | Misses rate limiting, dual-model fallback, artOverride system, challenge tokens — all lessons from the APED build | Study clients/kenzo-pfp-generator/site/ before starting any new PFP build |
I/O CONTRACT
Required Inputs
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| business_question | string | Yes | What to build or design (e.g., "Build a PFP generator for [character]" or "Design the style preset system for [project]") |
| character_brief | string | Conditional | Required for new builds: character name, description, existing visual references |
| scope | enum | Yes | One of: full-build, character-identity, style-presets, gemini-integration, deployment, new-style |
| reference_build | enum | No | Default: aped (pfp.aped.wtf). Reference build to draw patterns from. |
Output Format
For full-build scope: complete specification document covering all 9 build steps (see Playbook 1), ready for implementation.
For character-identity scope: character spec, failure mode table, banned archetype list, risk tier system, reference image curation criteria.
For style-presets scope: complete styles.data.mjs content with all styles classified by risk tier and artOverride rationale.
Handoff Template
## HANDOFF — PFP Product Blueprint -> [Receiving Skill]
**Task completed:** [Design phase / implementation spec / style system]
**Deliverables:** [List of files created, specs written, decisions made]
**Character identity spec:** [Skin hex, eye/mouth spec, outfit anchor]
**Failure modes identified:** [Mode A (generic drift), Mode B (template activation)]
**Banned archetypes:** [List of banned scene types]
**Style tier classification:** [LOW/MEDIUM/HIGH/CRITICAL for each style]
**Open items:** [What receiving skill needs to implement]
**Confidence:** [HIGH / MEDIUM / LOW + justification]
ACTIONABLE PLAYBOOK
Playbook 1: New PFP Generator Product from Scratch
Trigger: "Build a PFP generator for [memecoin project]"
Phase 1: Character Identity (before any code)
- Collect 20-30 reference images of the character from official sources
- Map the character's visual DNA: skin color (hex), eye spec, mouth spec, build/proportions, outfit anchor
- Identify the two failure modes: search X/Know Your Meme for adjacent meme characters
- Build the banned scene archetype list: list 5-10 scenes deeply associated with Failure Mode B
- Build the risk tier system: classify each planned style against the tier table
- Curate 12-14 reference images: reject any with banned scene contexts or low fidelity
- Slot 1 = canonical portrait (highest attention weight)
Phase 2: Prompt Architecture
- Write `CHARACTER_IDENTITY_PRO` (4-layer): study references → character rules → CRITICAL face descriptors → CRITICAL FAILURES
- Write `CHARACTER_IDENTITY_FLASH` (expanded Layer 3 to compensate for no reference images)
- Write `ART_DIRECTION_DEFAULT` (default 2D cartoon style for LOW/MEDIUM risk styles)
- Write `artOverride` blocks for each HIGH/CRITICAL risk style
Phase 3: Style Preset Library
- Write all style presets into `styles.data.mjs` with the risk tier noted in comments
- For each style: check against banned archetypes → add anchor prop → write promptSuffix (scene/accessories only) → add artOverride if HIGH/CRITICAL
Phase 4: Gemini Integration
- Implement `lib/gemini.ts` with the dual-model pattern (Pro + Flash fallback)
- `loadReferenceImages()`: filesystem read, base64, in-memory cache, path from `public/reference/`
- `buildPrompt()`: assemble 4 identity layers + ART_STYLE (default or artOverride) + SCENE + OUTPUT_CONSTRAINTS + custom prompt
- `tryProModel()`: 2 retries at 1.5s intervals, fall back to Flash on 503/UNAVAILABLE
- `extractImage()`: parse the Gemini response for inline image data
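The retry-then-fallback shape of the orchestration can be sketched SDK-independently by injecting the two generate functions. Retry count and delay follow the text (2 retries at 1.5s); the function name is an assumption, and error filtering is simplified — a real build would only fall back on 503/UNAVAILABLE-style errors:

```typescript
// Sketch of the Pro -> Flash orchestration. The generate functions are
// injected so the pattern is independent of any SDK; a real build would
// inspect the error and only fall back on 503/UNAVAILABLE.
export async function generateWithFallback(
  tryPro: () => Promise<string>,
  tryFlash: () => Promise<string>,
  retries = 2,
  delayMs = 1500,
): Promise<string> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await tryPro(); // Pro: reference images + text
    } catch {
      if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  return tryFlash(); // Flash text-only fallback: never ship without it
}
```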
Phase 5: API Route + Rate Limiting
- `app/api/generate/route.ts`: check GEMINI_ENABLED → validate styleId (enum) → sanitize customPrompt → rate limit check → Gemini call → return image
- `lib/rate-limit.ts`: 3-tier (per-IP 15/15min, burst 5/min, global hourly cap), in-memory with TTL, return `X-RateLimit-Remaining` header
- `lib/challenge.ts`: challenge token for bot protection (JS-generated, verified server-side)
- Error handling: all paths return a user-friendly message, no internal exposure
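The per-IP tier of the limiter can be sketched as a fixed window with a TTL. Limits follow the text (15 requests per 15 minutes); the function name is an assumption, and the real lib/rate-limit.ts layers a burst tier and a global hourly cap on top:

```typescript
// Sketch of the per-IP fixed-window tier only. Limits come from the text;
// the burst tier and global hourly cap are omitted for brevity.
const WINDOW_MS = 15 * 60 * 1000;
const LIMIT = 15;

interface Bucket { count: number; resetAt: number }
const buckets = new Map<string, Bucket>();

// Returns remaining quota (surface as X-RateLimit-Remaining), or -1 to reject.
export function checkRateLimit(ip: string, now = Date.now()): number {
  const b = buckets.get(ip);
  if (!b || now >= b.resetAt) {
    buckets.set(ip, { count: 1, resetAt: now + WINDOW_MS }); // fresh window
    return LIMIT - 1;
  }
  if (b.count >= LIMIT) return -1; // over quota: reject
  b.count += 1;
  return LIMIT - b.count;
}
```

Expired buckets are lazily replaced on the next request from that IP, which keeps the map bounded without a sweeper for typical traffic.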
Phase 6: Database + Gallery
- `lib/db.ts`: SQLite schema (generations, challenges, rate_limits)
- `app/api/gallery/`: gallery routes with opt-in write and public read
- `app/api/stats/`: aggregate stats endpoint
- `app/api/track/`: analytics event tracking (style selected, downloaded, shared)
Phase 7: Frontend
- `components/generator/style-selector.tsx`: grid of style tiles with emoji + label + description
- `components/generator/image-preview.tsx`: displays the generated image + loading state
- `components/generator/history-strip.tsx`: last 10 generations from localStorage
- `components/generator/download-button.tsx`: base64 → PNG blob, filename with character + style
- `components/generator/share-button.tsx`: Web Share API (mobile) + X pre-fill fallback
- `components/generator/gallery-opt-in.tsx`: voluntary post-generation prompt
Phase 8: Preview Generation
- `scripts/generate-previews.mjs`: mirror the Gemini pipeline (Flash model, import from styles.data.mjs)
- Verify the `--validate-identity` flag regenerates classic 3× for the benchmark
- Run `--dry-run` to verify the import chain, then run full preview generation
Phase 9: Deployment
- Configure PM2: cluster mode, memory limit, app name
- Configure nginx: proxy_pass to port, proxy_read_timeout 15s, SSL via certbot
- Write `deploy.sh`: git pull → pnpm install → rebuild better-sqlite3 native addon → pnpm build → pm2 reload --update-env
- Set `.env.local`: GEMINI_API_KEY, GEMINI_ENABLED=true, rate limit constants
- First deploy + smoke test: generate 5 images across 3 styles, verify kill switch
Playbook 2: Adding a New Style Preset
Trigger: "Add [style name] style to existing generator"
- Receive style concept — note scene, mood, distinctive elements
- Classify risk tier: check against the character's banned scene archetype list
- If LOW/MEDIUM: design an anchor prop that gives the model a distinctive focal point
- Write `promptSuffix`: scene + accessories + props only. NO face descriptors. NO character identity language.
- If HIGH/CRITICAL: write `artOverride` — select the geometry mode that cannot produce Failure Mode B anatomy
- Add complete style object to `lib/styles.data.mjs`
- Run `--validate-identity` first — verify classic benchmark passes
- Generate preview: `node scripts/generate-previews.mjs --style <id> --force`
- Review: character correct? Style coherent? Works at 64×64?
- If not passing: route to `aped-pfp-prompt-engineer` for iteration
- If passing: generate 5-10 more through live API to verify consistency
- Deploy
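The tiering decision in the playbook can be sketched as a rule over the banned archetype list. The archetype entries and keyword matching below are illustrative assumptions, not APED's actual data:

```typescript
type RiskTier = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Per-character banned scene archetype list (hypothetical entries)
const BANNED_ARCHETYPES = ['office desk', 'casino', 'keyboard'];

function classifyTier(sceneDescription: string): RiskTier {
  const scene = sceneDescription.toLowerCase();
  const hits = BANNED_ARCHETYPES.filter((a) => scene.includes(a)).length;
  if (hits >= 2) return 'CRITICAL'; // squarely inside a banned meme template
  if (hits === 1) return 'HIGH';    // adjacent to a banned template
  return 'LOW';                     // MEDIUM needs softer signals than keyword hits
}

// Per the playbook: every HIGH/CRITICAL style must ship an artOverride
// whose geometry cannot produce Failure Mode B anatomy
function requiresArtOverride(tier: RiskTier): boolean {
  return tier === 'HIGH' || tier === 'CRITICAL';
}
```

Real classification is a judgment call against the character's failure mode table; a keyword rule like this only makes the decision procedure explicit.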
Playbook 3: Adapting the Blueprint for a New Character
Trigger: "We're building a PFP generator for [new character]"
- Brief the character: name, origin, visual references, community context
- Identify what the model will produce by default for this character type (Failure Mode A)
- Identify what scene/language combinations activate the wrong output (Failure Mode B) — search X and Know Your Meme for this character type
- Build the failure mode table specific to this character
- Build the banned scene archetype list
- Define the character's visual DNA at spec level (hex codes, proportions, outfit)
- Write the identity block layers 1-4 for this character — test against both failure modes
- Build the risk tier system: classify the planned styles
- Curate 12-14 reference images using the curation criteria
- Proceed to Playbook 1 Phase 3 (style preset library) with this character's specific identity system
Verification Trace Lane (Mandatory)
Meta-lesson: Broad autonomous agents are effective at discovery, but weak at verification. Every run must follow a two-lane workflow and return to evidence-backed truth.
- Discovery lane
  - Generate candidate findings rapidly from code/runtime patterns, diff signals, and known risk checklists.
  - Tag each candidate with `confidence` (LOW/MEDIUM/HIGH), impacted asset, and a reproducibility hypothesis.
  - VERIFY: Candidate list is complete for the explicit scope boundary and does not include unscoped assumptions.
  - IF FAIL → pause and expand scope boundaries, then rerun discovery limited to the missing context.
- Verification lane (mandatory before any PASS/HOLD/FAIL)
  - For each candidate, execute/trace a reproducible path: exact file/route, command(s), input fixtures, observed outputs, and expected/actual deltas.
  - Evidence must be traceable to a source of truth (code, test output, log, config, deployment artifact, or runtime check).
  - Re-test at least once when confidence is HIGH or when a claim affects auth, money, secrets, or data integrity.
  - VERIFY: Each finding either has (a) concrete evidence, (b) an explicit unresolved assumption, or (c) is marked as speculative with a remediation plan.
  - IF FAIL → downgrade severity or mark an unresolved assumption instead of deleting the finding.
- Human-directed trace discipline
  - In non-interactive mode, unresolved context must be emitted as `assumptions_required` (explicitly scoped and prioritized).
  - In interactive mode, unresolved items must request direct user validation before the final recommendation.
  - VERIFY: Output includes a chain of custody linking input artifact → observation → conclusion for every non-speculative finding.
  - IF FAIL → do not finalize output; route to a `SELF-AUDIT-LESSONS`-compliant escalation with an explicit evidence gap list.
- Reporting contract
  - Distinguish `discovery_candidate` from `verified_finding` in reporting.
  - Never mark a candidate as closure-ready without verification evidence or an accepted assumption and owner.
  - VERIFY: Output includes what was verified, what was not verified, and why any gap remains.
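The reporting contract's candidate/finding distinction can be sketched as a data shape. Field names beyond `discovery_candidate` and `verified_finding` are assumptions for illustration:

```typescript
type Confidence = 'LOW' | 'MEDIUM' | 'HIGH';

interface Candidate {
  kind: 'discovery_candidate';
  claim: string;
  confidence: Confidence;
  impactedAsset: string;
}

interface VerifiedFinding extends Omit<Candidate, 'kind'> {
  kind: 'verified_finding';
  evidence: string[]; // chain of custody: artifact → observation → conclusion
}

// A candidate is promoted only when it carries traceable evidence;
// with none, it stays a candidate (downgrade, never silently delete).
function promote(c: Candidate, evidence: string[]): Candidate | VerifiedFinding {
  if (evidence.length === 0) return c;
  return { ...c, kind: 'verified_finding', evidence };
}
```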
SELF-EVALUATION CHECKLIST
Before delivering any PFP product blueprint or build, verify:
- [ ] Character identity spec defined at hex-code precision (not "gray" — exact hex)
- [ ] Both failure modes documented with specific triggers identified
- [ ] Banned scene archetype list covers all Failure Mode B adjacent contexts
- [ ] Risk tier assigned to every planned style preset
- [ ] Reference images curated: no Failure Mode B scenes, 12-14 total, canonical portrait in slot #1
- [ ] Prompt architecture follows 4-layer structure (study references → rules → face descriptors → failures)
- [ ] CRITICAL FAILURES layer explicitly names both failure modes
- [ ] Flash fallback prompt expands Layer 3 to compensate for missing reference images
- [ ] `styles.data.mjs` is single source of truth — not duplicated in any other file
- [ ] Every HIGH/CRITICAL style has `artOverride` with geometry that breaks Failure Mode B anatomy
- [ ] `--validate-identity` implemented in preview script (3× classic benchmark)
- [ ] Dual-model pipeline: Pro with references primary, Flash text-only fallback
- [ ] Rate limiting: 3-tier (per-IP, burst, global)
- [ ] Kill switch: GEMINI_ENABLED env var checked before every generation
- [ ] No sensitive data (API key, full prompt) returned to client
- [ ] Gallery opt-in is voluntary (not forced)
- [ ] Download filename includes character name + style ID
- [ ] PM2 cluster mode with auto-restart configured for deployment
- [ ] Confidence level assigned to all recommendations
FEW-SHOT OUTPUT EXAMPLES
Example 1: Character Identity Brief for New Project
Input: "We need a PFP generator for [MEOW] — a cartoon orange tabby cat with a perpetual smug expression. The token symbol is $MEOW."
Output:
**Character Identity System — $MEOW**
**Visual DNA:**
- Species: Domestic orange tabby cat, cartoon proportions
- Fur color: Vivid orange (#E8671A) with subtle tabby stripes — NO pale/washed out orange
- Eyes: Heavily lidded, perpetual smug half-squint — signature feature. NOT fully open round eyes.
- Mouth: Subtle upward curl at corners — smug, never neutral. Wide enough to span ~60% of face width.
- Build: Chonky. Round body, short neck, rounded head — NOT slim, NOT realistic cat proportions.
- Outfit anchor: A gold "$MEOW" coin around the neck on a thin chain — ALWAYS present in all styles.
**Failure Mode A (underspecified drift):**
Generic Garfield. When "orange cartoon cat" is not further specified, models default to Garfield-adjacent output: wide horizontal body, lazy expression, identical orange. Prevent by: emphasizing the CHONKY round proportions (not flat lazy cat), the SMUG half-squint (not Garfield's sleepy frown), and the $MEOW coin anchor prop.
**Failure Mode B (scene template activation):**
Internet cat memes — "I Can Has Cheezburger", keyboard cat, grumpy cat, business cat. Scenes with: computers, office settings, business suits, cheeseburgers, keyboards. The cat-at-desk and cat-looking-unimpressed-at-human templates are deeply trained. Prevent by: Low-poly 3D artOverride for any office/desk scene, and enforcing the SMUG expression (not grumpy/neutral) throughout.
**Banned Scene Archetypes:**
1. Cat at computer/keyboard → triggers "I Can Has" / keyboard cat template
2. Cat sitting on desk with papers → business cat template
3. Cat looking up at something above frame → grumpy cat looking up template
4. Cat with flat/neutral expression in office → business cat / grumpy cat overlap
5. Cat in front of plain white background → stock photo cat, loses character identity
**Risk Tier Pre-Classification (proposed 8 styles):**
- Boss Mode (golden crown, dark luxury background): LOW — crown is strong anchor
- Degen (Vegas casino): HIGH — cat in casino is deeply meme-coded → artOverride low-poly 3D
- Beach Chill: LOW — tropical scene, sunglasses anchor
- Laser Eyes: LOW — strong distinctive effect
- Cyberpunk Night: MEDIUM — add gold chain as anchor, state smug expression explicitly
- Office (suit + briefcase): CRITICAL — business cat template → low-poly 3D artOverride required
- Space: LOW — space suit breaks template entirely
- Classic: LOW — coin medallion anchor
**Confidence:** HIGH — failure mode analysis based on training data associations observable from standard cat meme distributions. Risk tiers follow the same risk model used in the APED build.
Example 2: artOverride Design for Banned Scene
Input: "I want to add a 'Trading Floor' style for MEOW — cat watching charts and market data."
Output:
**Trading Floor Style — Risk Assessment**
This scene is CRITICAL tier. "Cat watching monitor at night with charts" is the internet's
most embedded cat meme context. Even with gold chain, smug expression, and orange fur —
a 2D cartoon cat at a trading terminal will produce a chart-watcher meme template 80%+ of the time.
**artOverride approach:**
Force low-poly 3D geometry. The polygon mesh cannot produce the smooth 2D cartoon surfaces
that make meme templates recognizable.
**Style object for `styles.data.mjs`:**

```js
{
  id: 'trading-floor',
  label: 'Trading Floor',
  description: 'Market alpha detected',
  emoji: '📈',
  promptSuffix:
    'MEOW in a high-frequency trading floor environment. Multiple holographic ' +
    'screens showing green price charts, data streams, Bloomberg terminal aesthetics. ' +
    'A gold $MEOW coin medallion on a thin chain. Paw raised toward one of the screens. ' +
    'Portrait-bust angle, face dominating the frame.',
  artOverride:
    'Low-poly 3D render with chunky polygonal geometry — early PlayStation 2 era ' +
    'aesthetic. Hard polygon facets visible on the cat body, face, and arms. NOT 2D cartoon ' +
    'illustration. NOT cel-shaded 2D. Polygon geometry only. Trading floor environment also ' +
    'rendered in low-poly 3D with polygon holographic screens.',
  tags: ['crypto', 'trading'],
  previewImage: '/previews/trading-floor.jpg',
}
```
**Why this works:** The polygon mesh forces chunky 3D cat anatomy — cannot produce the smooth
2D cartoon surfaces of the meme templates. The trading terminal context is preserved as scene
dressing, not character-defining context.
**Confidence:** HIGH — same artOverride pattern as APED's degen/vaporwave/y2k styles,
which passed identity benchmark after switching from 2D cartoon to low-poly 3D.