Overview
YouTube AI Studio plugs into the YouTube Creator Studio upload flow and lets creators edit with natural language. Upload a video and the AI reads the transcript, detects the sentiment of each section, suggests cuts, generates captions, and produces original background music matched to the mood. The interface is built to be visually indistinguishable from the real YouTube Creator Studio. Built in a weekend with Diego Agustin and Aaron Zhang.
What we built
A full production web app, not a demo shell. Next.js and React on the front end, Supabase for auth and a PostgreSQL database, and in-browser video export through FFmpeg WebAssembly so nothing has to round-trip through a server. A creator uploads, the AI proposes an edit, and the export happens on the client.
- Gemini 1.5 Flash classifies transcript segments by mood and energy through the Generative Language API.
- Lyria 3 generates original, royalty-free background music per section based on those classifications.
- The mood classification is the bridge. Gemini labels the emotional arc of the video, and Lyria scores each section to match.
- Stack: Next.js, React, Supabase (auth + PostgreSQL), FFmpeg WebAssembly for in-browser export.
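The bridge between the two models can be sketched as a small typed function: parse a classification, check it against a fixed label set, and turn it into a music prompt. This is a minimal sketch, not the project's actual code. The label vocabularies, the JSON field names (`startSec`, `endSec`, `mood`, `energy`), and the prompt wording are all assumptions made for illustration, and the calls to Gemini and Lyria are left out entirely.

```typescript
// Hypothetical label sets: the writeup does not list the real vocabulary.
const MOODS = ["calm", "upbeat", "tense", "somber"] as const;
const ENERGIES = ["low", "medium", "high"] as const;

type Mood = (typeof MOODS)[number];
type Energy = (typeof ENERGIES)[number];

// Assumed shape of one classified transcript segment.
interface SegmentClassification {
  startSec: number;
  endSec: number;
  mood: Mood;
  energy: Energy;
}

// Narrow an unknown value to a member of a fixed label set.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).indexOf(value) !== -1;
}

// Parse one classifier response (assumed to be a JSON object) and throw on
// anything malformed or outside the fixed vocabulary.
function parseClassification(raw: string): SegmentClassification {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const { startSec, endSec, mood, energy } = data;
  if (typeof startSec !== "number" || typeof endSec !== "number" || endSec <= startSec) {
    throw new Error("invalid segment bounds");
  }
  if (!isOneOf(MOODS, mood)) throw new Error(`unknown mood: ${String(mood)}`);
  if (!isOneOf(ENERGIES, energy)) throw new Error(`unknown energy: ${String(energy)}`);
  return { startSec, endSec, mood, energy };
}

// Turn a validated classification into a text prompt for the music model.
function buildMusicPrompt(c: SegmentClassification): string {
  const seconds = Math.round(c.endSec - c.startSec);
  return `Instrumental background music, ${c.mood} mood, ${c.energy} energy, about ${seconds} seconds`;
}
```

Rejecting unknown labels at the boundary means the music step only ever sees values it knows how to handle, which is one way to keep a drifting label from propagating downstream.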
What broke
- Lyria 3 generation latency. Music generation was too slow to block on, so we moved it behind a background worker polling a job queue.
- Inconsistent Gemini labels. Early prompts produced mood labels that drifted between calls, which broke the music pipeline downstream. We fixed it with tighter prompts that force consistent, structured classifications.
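The background-worker pattern from the latency fix can be sketched as a queue plus a polling loop. This is a rough illustration under stated assumptions, not the project's implementation: `JobQueue` is an in-memory stand-in for whatever store actually holds the jobs, and `generate` is a hypothetical placeholder for the music generation call. The names, the status values, and the one-second idle delay are all invented for the example.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface MusicJob {
  id: number;
  prompt: string;
  status: JobStatus;
  resultUrl?: string;
  error?: string;
}

// In-memory stand-in for the job store (hypothetical; a real queue would be persisted).
class JobQueue {
  private jobs: MusicJob[] = [];
  private nextId = 1;

  // Add a job and return immediately, so the caller never blocks on generation.
  enqueue(prompt: string): MusicJob {
    const job: MusicJob = { id: this.nextId++, prompt, status: "queued" };
    this.jobs.push(job);
    return job;
  }

  // Claim the oldest queued job, marking it running so it is not picked up twice.
  claimNext(): MusicJob | undefined {
    const job = this.jobs.find((j) => j.status === "queued");
    if (job) job.status = "running";
    return job;
  }
}

// Placeholder signature for the slow generation call; resolves to a result URL.
type Generate = (prompt: string) => Promise<string>;

// One polling tick: claim a job if there is one, run it, record the outcome.
// Returns false when the queue had nothing to do.
async function pollOnce(queue: JobQueue, generate: Generate): Promise<boolean> {
  const job = queue.claimNext();
  if (!job) return false;
  try {
    job.resultUrl = await generate(job.prompt);
    job.status = "done";
  } catch (err) {
    job.status = "failed";
    job.error = err instanceof Error ? err.message : String(err);
  }
  return true;
}

// Polling loop: process jobs back to back, and sleep only when the queue is empty.
async function runWorker(
  queue: JobQueue,
  generate: Generate,
  idleMs: number = 1000,
  shouldStop: () => boolean = () => false
): Promise<void> {
  while (!shouldStop()) {
    const worked = await pollOnce(queue, generate);
    if (!worked) {
      await new Promise<void>((resolve) => setTimeout(resolve, idleMs));
    }
  }
}
```

In this sketch the upload flow only calls `enqueue`, and the UI can check the job's `status` later, so a slow generation never holds up the rest of the edit.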
Demo
Try it live at yt-ai-studio.vercel.app or read the code on GitHub.