An AI content engine that learns from a creator's own YouTube channel — what works, what resonates, and why — then generates niche-specific content tuned to their proven audience.
Before any wireframe, before any feature spec — I spent three weeks embedded in the creator workflow. User interviews, diary studies, channel audits, and a quantitative survey. This chapter is everything I found.
VeeFly had built a strong YouTube promotion platform with 56,000+ active creators. But there was a gap no one had addressed: the hours between "I have an idea" and "I'm ready to promote."
The task was to design an AI-powered content creation layer inside VeeFly — not just another GPT wrapper, but a system that understood each creator's unique channel, niche, and audience well enough to generate content that actually worked for them.
"I can amplify a video the moment it's ready. But getting to 'ready' still takes my creators 4 days. That's where we're losing them."— VeeFly Product Lead, Initial Brief
38 creator interviews · 6 marketing teams · 64-person survey · 14 channel analytics audits · Competitive benchmarking of 12 tools
Semi-structured 45-minute interviews. I followed creators through their actual production workflow — screen-sharing, thinking aloud, showing me where things fell apart.
Recruited across four segments: solo creators under 100K, growing channels 100K–500K, brand content teams, and educator-creators. Each session was recorded, transcribed, and coded against a taxonomy of friction points, workaround behaviours, and unmet needs.
"I have ChatGPT, Descript, TubeBuddy, and Google Docs open at the same time. By the time I've stitched them together, I've forgotten the energy I had when the idea hit. The AI doesn't know who I am — it gives me content that sounds like it was written for someone else's channel."
Tool fragmentation kills creative momentum. Generic AI outputs don't match his channel voice or audience, forcing full rewrites that negate any time saved.
"Every video sounds different. We have 2 years of channel data proving what works for our audience — but every AI tool ignores it completely. I have to manually brief the AI like it's a new intern every single time."
Lacks channel-context persistence. Each AI session starts from zero — wasting the institutional knowledge in 2 years of channel performance data.
"I know what I want to say. I just don't know how to make people watch past 2 minutes. I have 3 years of analytics sitting there but I have no idea what they're telling me. I feel like I'm leaving performance on the table."
Has deep subject expertise and rich channel history, but lacks the analytical capability to translate retention data into actionable content decisions.
"I've tried every AI tool. They all write like a generic content farm. My audience is really specific — they want a certain tone, certain depth. No tool has ever produced something I could publish without spending 2 hours fixing it first."
High niche specificity means generic outputs are unusable. The cost of editing AI output exceeds the cost of writing from scratch for her technically oriented audience.
"I had a great idea on Monday. The video wasn't out until Thursday. I spent Monday writing a script that wasn't good enough, Tuesday re-scripting, Wednesday recording, Thursday editing. The idea was stale by the time it published."
Idea-to-publish latency of 3–4 days. Scripting is the bottleneck — not recording or editing. First draft quality determines the entire downstream timeline.
64 creators across segments. 4-week diary study combined with a structured survey covering workflow friction, tool usage, AI adoption, and unmet needs.
The survey validated qualitative signals at scale. The standout finding: 83% of creators who had tried AI tools reported that the outputs required so much revision they questioned whether AI saved any time at all. The problem wasn't the AI's writing ability — it was that the AI had no idea who it was writing for.
83% of creators who tried AI tools said outputs required full rewrites: the AI had no context about their niche, voice, or audience expectations.
Said they needed YouTube-specific AI that understood narrative structure — hooks, retention loops, CTAs — not just generic text generation.
~50% of total production time was consumed by scripting alone, the single highest friction point in the entire video production workflow.
Average number of separate tools creators used per video. Every tool switch costs time, context, and the creative momentum built in the previous session.
After coding 38 interview transcripts and 64 survey responses, I mapped every friction point to six categories. These became the design brief.
The pattern was clear: creators weren't struggling with any single tool. They were struggling with a system that had no memory, no niche awareness, and no continuity across production phases. Each tool was an island.
Every AI session starts from zero. Years of channel data — what performed, what flopped, what their audience rewards — sits untouched in YouTube Studio.
3–4 days average video production time, with scripting accounting for ~50%. Most of that time is spent wrestling tools, not creating.
Separate tools for scripting, voice, and SEO. Each switch resets context, loses momentum, and introduces inconsistency.
Creators rejected AI outputs because they lacked niche specificity, channel voice, and understanding of what their specific audience responds to.
Creators have years of performance data but no system to translate it into content decisions. The data exists — the intelligence layer doesn't.
Time spent revising and stitching AI outputs was 3× the time it took to generate them. The hidden cost of poor context made AI slower than manual writing.
After synthesis, I converted every key pain point into a "How Might We" question. These became the design brief that guided every subsequent decision.
The framing process was deliberate. Bad HMWs lead to feature solutions. Good HMWs lead to system solutions. I rewrote each one until it was broad enough to invite creative approaches but specific enough to stay grounded in real creator behaviour.
How might we give the AI a memory of the creator's channel — so it already knows their niche, voice, and what works for their audience before a prompt is typed?
How might we surface the right content idea at the right moment — so creators never start from a blank canvas, but from a suggestion grounded in their own performance data?
How might we unify script, voice, and SEO into a single workflow — so the context built in one phase carries invisibly into the next, with no tool-switching friction?
How might we translate a creator's analytics into plain-language content strategy — so 3 years of performance data becomes actionable, not just informational?
How might we compress idea-to-first-draft to under 60 seconds — so the creative energy of a good idea isn't lost to production friction?
How might we make the AI feel like a collaborator who knows this creator — not a generic text generator that could be talking to anyone?
14 weeks. 3 structural concept explorations. 5 usability testing rounds. Every major decision below is shown alongside the alternative that was rejected — and the evidence that drove the choice.
Defining principles before ideating forces rigour. Every subsequent concept was evaluated against these. If a design couldn't justify itself against at least 3 of these, it went back to the drawing board.
The AI's first job is to ingest channel data. Every output must reflect what the system already knows — not what the creator had to explain.
Creative director, not vending machine. Every output step is a collaborative refinement — not a single-shot generation that the creator has to fix.
Channel niche, audience intent, and proven narrative patterns must travel through the entire workflow without the creator re-explaining anything.
Every output is shaped by YouTube's mechanics — hooks, retention curves, title psychology — calibrated to the creator's specific niche data.
All AI outputs are editable drafts. The UI must make editing faster and lower-effort than regenerating from scratch.
Personalised suggestions eliminate the blank-canvas problem. Creators always start from something — never from nothing.
"How do we build an AI that already knows the creator before they type a word?" — every phase was structured to answer a part of that question.
38 creator interviews, 6 team walkthroughs, competitive audit of 12 tools, 64-person survey, and 14 channel analytics audits. Synthesised into 6 pain point categories and 6 HMW questions that became the design brief.
Defined the intelligence layer's data model: what to ingest from the YouTube API, what signals to extract, and how those signals translate into personalised suggestions and content generation context. Wrote 6 design principles before touching wireframes. A sketch of what this data model might look like follows the phase timeline below.
Explored wizard-based, dashboard-based, and conversational UI models. A rapid prototype test (n=18) validated the conversational model with channel-aware suggestion chips as the highest-converting, lowest-friction option. Suggestion-to-session conversion was 3× higher than in the dashboard model.
Three rounds of usability testing. Each round surfaced a distinct class of problem: Round 1 — input structure; Round 2 — output editing mechanics; Round 3 — voiceover selection and context persistence. Major iterations documented below.
Detailed interaction specs covering suggestion ranking edge cases, new-channel onboarding (limited data handling), AI response streaming, and error state design. Post-launch 90-day cohort measurement.
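The case study doesn't publish Phase 2's schema, but the shape is worth pinning down. Below is a minimal Python sketch of what the intelligence layer's data model might look like; every name and field here is hypothetical, inferred from the signals described in this chapter rather than taken from VeeFly's production code.

```python
from dataclasses import dataclass, field

@dataclass
class VideoSignal:
    """Per-video signals extracted from channel history (hypothetical fields)."""
    video_id: str
    title: str
    topic_cluster: int            # label assigned by NLP title clustering
    views: int
    watch_time_minutes: float
    ctr: float                    # impressions click-through rate, 0-1
    avg_view_duration_pct: float  # fraction of the video watched on average, 0-1

@dataclass
class ChannelProfile:
    """The persistent channel 'memory' every AI session starts from."""
    channel_id: str
    niche_clusters: list[str] = field(default_factory=list)    # top topic labels
    tone_descriptors: list[str] = field(default_factory=list)  # mined from top scripts
    proven_hooks: list[str] = field(default_factory=list)      # hook formats that retain
    videos: list[VideoSignal] = field(default_factory=list)
```

The important design property is the second dataclass: it is computed once at channel-connect time and then rides along with every prompt, which is what makes the context persistence asked for in HMW 1 and HMW 3 possible.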
Below are the five most consequential design pivots. Each one shows the initial direction, the decision made, and the specific evidence that drove the change.
"The first version felt like using a very fast Google Form. We want it to feel like working with someone who gets you."— Usability Test Participant, Round 1
Show creators platform-wide trending YouTube topics — the same approach every competitor used. Low engineering cost, easy to justify with "people want popular ideas."
Connect to each creator's YouTube channel via API. Analyse their top-performing content, identify topic clusters with proven engagement, and rank suggestions by predicted performance for their specific audience.
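To make "ranked by predicted performance" concrete: a minimal sketch of one way such ranking could work, reusing the hypothetical ChannelProfile from the earlier sketch. The scoring formula and its weights are assumptions for illustration; the production ranking model was engineering's.

```python
from statistics import mean

def rank_suggestions(suggestions, profile):
    """Order candidate topics by the historical performance of the
    topic cluster each one falls into. Illustrative scoring only."""
    def cluster_score(cluster_id):
        vids = [v for v in profile.videos if v.topic_cluster == cluster_id]
        if not vids:
            return 0.0  # cold cluster: no history for this audience, rank last
        # Blend CTR and retention (both 0-1); the 60/40 weighting is a guess.
        return (0.6 * mean(v.ctr for v in vids)
                + 0.4 * mean(v.avg_view_duration_pct for v in vids))

    return sorted(suggestions,
                  key=lambda s: cluster_score(s.topic_cluster),
                  reverse=True)
```

The cold-cluster branch also hints at the new-channel onboarding problem noted in the delivery phase: with little or no history, every cluster scores zero and the system has to fall back to something less personalised.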
A traditional form with fields for Topic, Audience, Tone, and Keywords — mirroring what ChatGPT custom instructions and competitor tools used. Felt organised. Stakeholders liked the visual structure.
A single natural-language input field, with personalised chip suggestions below it. The channel intelligence layer means the AI already has the "form" filled in — tone, niche, audience. The creator just describes the idea.
Engineering proposed generating the complete 600–800 word script in a single API call. Simpler architecture, lower latency variation, cleaner state management.
Stream the script section by section — Hook → Setup → Part 1 → Part 2 → CTA — with each section editable before the next appears. Matches how creators actually think about and review scripts.
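A Python generator captures the interaction contract of the chosen design: each section arrives, the creator can edit it, and the accepted text feeds the context for the next section. The generate_section callable stands in for the model call; its signature is an assumption, not the production API.

```python
SECTIONS = ["Hook", "Setup", "Part 1", "Part 2", "CTA"]

def stream_script(idea, profile, generate_section):
    """Yield the script one section at a time; the caller sends back the
    (possibly edited) text before the next section is generated, so
    accepted edits shape everything downstream."""
    context = {"idea": idea, "profile": profile, "accepted": []}
    for name in SECTIONS:
        draft = generate_section(name, context)
        edited = yield name, draft                 # UI renders draft, collects edits
        context["accepted"].append(draft if edited is None else edited)
    return context["accepted"]

# Sketch of the calling convention (call_model is assumed):
#   gen = stream_script("heat pump myths", profile, call_model)
#   name, draft = next(gen)            # "Hook" arrives first
#   name, draft = gen.send(edited)     # accept/edit it, then the next section streams
```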
A tabbed navigation structure to organise the three output types. Clean, predictable, easy to build. Product team initially preferred this for its visual clarity.
All outputs — title, description, hashtags, script, voiceover player — in one continuous document. The mental model shifted from "three tools in one UI" to "one content package being built progressively."
A dropdown list of voice names (Rosey, Miley, David, Frank…). Engineering wanted this for its simplicity; the PM felt the names were memorable enough to become familiar on their own.
A modal grid showing avatar, name, personality trait (Friendly / Cheerful / Matured / Professional / Excited / Natural), and an inline play button to preview the voice before selecting.
This is the core system — the thing that makes VeeFly AI different from every other AI content tool. Before a creator types anything, the system has already ingested their channel, extracted performance signals, and built a model of what works for their specific audience.
The intelligence layer has three stages: ingest, analyse, surface. Each stage is invisible to the creator — but its output shapes every part of the experience they see.
The design challenge wasn't building the system — that was engineering's job. The design challenge was making the intelligence feel natural and trustworthy: showing creators that the AI knows them without making them feel surveilled, and surfacing confidence signals without overwhelming the interface.
Ingestion pulls upload history, titles, descriptions, tags, engagement metrics, and retention data from the YouTube API across the full channel history.
NLP clustering identifies topic niches. Retention analysis extracts hook formats and narrative structures. Engagement patterns reveal what this audience rewards.
Suggestions ranked by predicted performance. Generated content tuned to the creator's proven topic niche, narrative format, and audience tone.
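For the ingest stage specifically, the public plumbing is well known even though VeeFly's implementation isn't. A minimal sketch using the YouTube Data API v3 via google-api-python-client follows; note that retention curves and per-video CTR live in the owner-authorised YouTube Analytics API, which this sketch deliberately omits.

```python
from googleapiclient.discovery import build

def ingest_channel(api_key, channel_id, max_videos=200):
    """Pull upload history plus public metadata via the YouTube Data API v3.
    A sketch of the 'ingest' stage only; Analytics API calls are omitted."""
    yt = build("youtube", "v3", developerKey=api_key)

    # The channel's uploads playlist holds its full upload history.
    ch = yt.channels().list(part="contentDetails", id=channel_id).execute()
    uploads = ch["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

    video_ids, page = [], None
    while len(video_ids) < max_videos:
        resp = yt.playlistItems().list(
            part="contentDetails", playlistId=uploads,
            maxResults=50, pageToken=page).execute()
        video_ids += [i["contentDetails"]["videoId"] for i in resp["items"]]
        page = resp.get("nextPageToken")
        if not page:
            break

    # Titles, descriptions, tags, and engagement counts, 50 ids per call.
    videos = []
    for i in range(0, len(video_ids), 50):
        batch = yt.videos().list(
            part="snippet,statistics",
            id=",".join(video_ids[i:i + 50])).execute()
        videos += batch["items"]
    return videos
```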
Views, watch time, CTR and engagement rate per topic cluster reveal the content formula that works for this specific audience.
NLP clustering of titles surfaces the recurring themes that define the creator's niche territory and topic authority with their audience.
Narrative structure analysis — hook formats, section pacing, CTA placement — extracts the storytelling patterns this audience rewards with watch time.
Retention curve analysis combined with comment sentiment mining reveals the emotional triggers and content qualities driving loyal repeat viewing.
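Of the four signal extractors, title clustering is the most self-contained to illustrate. A plausible stand-in below uses TF-IDF over titles plus k-means (scikit-learn); the production NLP approach wasn't published, so treat the method choice and cluster count as assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_titles(titles, n_clusters=6):
    """Group video titles into topic clusters and name each cluster by
    its highest-weight terms. Illustrative stand-in for the NLP step."""
    vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vec.fit_transform(titles)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

    # Label each cluster with its top terms: the channel's "niche territory".
    terms = vec.get_feature_names_out()
    labels = {}
    for c in range(n_clusters):
        top = km.cluster_centers_[c].argsort()[::-1][:3]
        labels[c] = ", ".join(terms[t] for t in top)
    return km.labels_, labels
```

The per-video cluster labels this produces are exactly what the ranking sketch earlier keys on via topic_cluster.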
This is the data layer that makes every suggestion and every generated output personal. The analytics dashboard shows creators the same intelligence the AI is using — building transparency and trust in the system.
One of the key design decisions was making this data visible to creators — not hiding it behind the "magic." Showing creators their own performance data validated the AI's suggestions and increased trust in outputs by giving creators a reason to believe the suggestions were grounded in evidence, not guesswork.
Every screen — Onboarding, Channel Scanning, Home Dashboard, Content Output, Analytics — fully interactive. Use the navigation at the bottom to move between screens.
This prototype reflects the final design system: light-mode canvas, editorial typography (Fraunces for output, Inter for UI), a structured sidebar, and the full content generation flow from prompt to publish-ready package.
90-day post-launch measurement across a cohort of active creators, plus the four things this project permanently changed about how I approach designing AI products.
"This is the first AI tool that actually understood I was making a YouTube video for my audience — not just generating content for anyone."
"The suggestions alone cut our planning time in half. We went from 3-day turnarounds to publishing the same week we have the idea."
"VeeFly AI users with personalised suggestions active showed 2.4× higher weekly engagement compared to those without — it became our strongest retention lever."
Early stakeholder pressure pushed for video editing, thumbnail generation, and social scheduling. We pushed back — backed by research showing channel intelligence + script + voice + SEO covered 80% of creator friction.
"The temptation is always to add more. The discipline is knowing that adding thumbnail generation would have split engineering focus and delayed the intelligence layer by 6 weeks — and the intelligence layer was the whole product."— Post-launch retrospective note
Not lessons I read in a book. Lessons extracted from specific moments in this project where the evidence forced a belief update.
"Designing with AI is different from designing for AI. The AI is a collaborator, not a feature — and the interface has to reflect that relationship."— Personal reflection, post-launch retrospective
Asking creators to describe their niche every time was friction disguised as "personalisation." When the AI already knew from channel data, the experience transformed from "briefing a new intern" to "working with someone who genuinely gets your audience." The input box became a direction, not a specification. This was the single biggest UX unlock of the project — and it came from research, not from a design idea.
We couldn't eliminate AI response latency. But we could design around it. Progressive section streaming, skeleton states with thoughtful loading copy, and section-by-section reveals turned wait time into "watching the AI think" — which paradoxically increased perceived quality. Creators who saw section streaming reported higher satisfaction with the same underlying output quality as those who received it all at once.
Creators don't evaluate AI output holistically — they have trust checkpoints. A suggestion that matches their proven topic cluster. A hook that sounds like something they'd actually say. A title that reflects their audience's search behaviour. Designing for trust means identifying and optimising those specific validation moments — not just improving average output quality. This changed how I think about AI product metrics.
Stakeholder pressure to expand scope is normal. What's not normal is having the research evidence to push back credibly. Every feature deferred to V2 was deferred because the research showed it didn't address the core friction. "Our survey of 64 creators showed that scripting accounts for 50% of production friction — thumbnail generation doesn't appear in the top 5 pain points" is a conversation-ending answer. Research doesn't just inform design — it protects it.