Jun 12, 2026

A Video Production Pipeline Built Around a 15-Minute Constraint

How I designed a content system that produces videos quickly enough to be sustainable.

Goal

Create educational videos and shorts consistently without spending hours editing.

The original motivation was supporting content distribution for ClearFit, a career-fit project I was experimenting with. I wanted a workflow I could realistically sustain long-term.

Target:

  • less than 15 minutes of effort per video
  • good-quality AI voice narration
  • good-quality AI-generated imagery
  • visually engaging output
  • inexpensive to run locally (I was curious how much quality could be achieved with roughly 16GB of VRAM before reaching for paid APIs)

Approach

After testing several text-to-speech systems and image-generation workflows, I settled on a combination that produced acceptable quality on local hardware.

The central piece became a JSON specification describing the entire video.

Each file contains:

  • title
  • scenes
  • video thumbnail
  • narration
  • image prompts
  • animation instructions
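
To make the shape of such a file concrete, here is a minimal sketch of a specification and a small validator. The field names (`thumbnail`, `image_prompt`, `animation`, and so on), the placement of narration, image prompts, and animation instructions inside each scene, and the example values are all assumptions made for illustration. The essay does not show the real schema.

```python
import json

# Hypothetical field names; the real specification may be organized differently.
REQUIRED_TOP_LEVEL = ("title", "thumbnail", "scenes")
REQUIRED_PER_SCENE = ("narration", "image_prompt", "animation")

# Illustrative spec only: every value below is placeholder content.
EXAMPLE_SPEC = """
{
  "title": "Example video title",
  "thumbnail": {"image_prompt": "placeholder thumbnail prompt"},
  "scenes": [
    {
      "narration": "Placeholder narration for the first scene.",
      "image_prompt": "placeholder background prompt",
      "animation": {"type": "slow_zoom", "seconds": 6}
    },
    {
      "narration": "Placeholder narration for the second scene.",
      "image_prompt": "another placeholder prompt",
      "animation": {"type": "pan_left", "seconds": 5}
    }
  ]
}
"""


def validate_spec(spec):
    """Return a list of human-readable problems; an empty list means the spec looks usable."""
    problems = [
        f"missing top-level field: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in spec
    ]
    scenes = spec.get("scenes")
    if scenes is not None and len(scenes) == 0:
        problems.append("spec has no scenes")
    for index, scene in enumerate(scenes or []):
        for key in REQUIRED_PER_SCENE:
            if key not in scene:
                problems.append(f"scene {index}: missing field: {key}")
    return problems


if __name__ == "__main__":
    print(validate_spec(json.loads(EXAMPLE_SPEC)))
```

Checking the file before any generation step runs is cheap, and it keeps a typo in the JSON from surfacing only after minutes of local image or speech generation.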

System

The JSON acts as the source of truth.

From there, local scripts:

  • generate narration
  • generate images
  • animate backgrounds
  • assemble scenes
  • render the final video
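
The steps above can be sketched as a planner that turns one specification into an ordered list of tasks. This is only an illustration of the "JSON as source of truth" idea: the stage names, file extensions, and directory layout are assumptions, and the sketch deliberately stops short of calling any text-to-speech, image-generation, or rendering tool, since the essay does not name the ones used.

```python
from pathlib import PurePosixPath

# Hypothetical per-scene stages and output file types, in execution order.
PER_SCENE_STAGES = (
    ("narration", "wav"),   # generate narration
    ("image", "png"),       # generate images
    ("animation", "mp4"),   # animate backgrounds
    ("scene", "mp4"),       # assemble scenes
)


def plan_pipeline(spec, out_dir="build"):
    """Return (stage, output_path) tasks derived purely from the spec."""
    root = PurePosixPath(out_dir)
    tasks = []
    for stage, extension in PER_SCENE_STAGES:
        for index, _scene in enumerate(spec["scenes"]):
            path = root / stage / f"scene_{index:02d}.{extension}"
            tasks.append((stage, str(path)))
    # The final render depends on every assembled scene, so it comes last.
    tasks.append(("render", str(root / "final.mp4")))
    return tasks


if __name__ == "__main__":
    demo_spec = {"scenes": [{}, {}]}  # placeholder scenes
    for stage, path in plan_pipeline(demo_spec):
        print(f"{stage:10s} {path}")
```

Because every output path is derived from the specification, a script for one stage can skip work whose output file already exists, which is one way re-rendering after a small edit could stay fast.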

A single specification can produce both long-form videos and shorts.
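
One way this could work, sketched under stated assumptions: the same scene list is rendered through different profiles. The essay does not describe the actual mechanism, so the `in_short` flag, the `RenderProfile` type, and the frame sizes below (landscape 1920x1080 for long-form, portrait 1080x1920 for shorts) are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderProfile:
    """Hypothetical output settings applied to the same specification."""
    name: str
    width: int
    height: int
    only_flagged_scenes: bool


# Assumed frame sizes: landscape for long-form, portrait for shorts.
LONG_FORM = RenderProfile("long", 1920, 1080, only_flagged_scenes=False)
SHORT = RenderProfile("short", 1080, 1920, only_flagged_scenes=True)


def scenes_for(spec, profile):
    """Pick the scenes a profile should render, preserving their order."""
    if not profile.only_flagged_scenes:
        return list(spec["scenes"])
    # "in_short" is an assumed per-scene flag, not a field from the essay.
    return [scene for scene in spec["scenes"] if scene.get("in_short", False)]


if __name__ == "__main__":
    demo_spec = {
        "scenes": [
            {"narration": "intro", "in_short": True},
            {"narration": "detail"},
            {"narration": "takeaway", "in_short": True},
        ]
    }
    for profile in (LONG_FORM, SHORT):
        chosen = scenes_for(demo_spec, profile)
        print(profile.name, f"{profile.width}x{profile.height}", len(chosen), "scenes")
```

Keeping the format differences in a profile rather than in the specification means the narration and prompts are written once and reused for both outputs.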

Result

The workflow is now simple enough that a new video starts as a JSON file and can be rendered with a few commands.

More importantly, the production process fits the original constraint: creating content quickly, inexpensively, and with little ongoing effort.

