HyperFrames Explained: When HTML Becomes a Video Format (And What It Means for You)

13 min read
URLtoVideo Team
hyperframes, html to video, ai agents, video rendering, website to video, heygen

A few days ago, HeyGen dropped something called HyperFrames on GitHub, and the video tooling corner of the internet has been trying to figure out what to do about it ever since. The elevator pitch is strange enough to stop you mid-scroll: write HTML, render video, built for agents.

That is not a metaphor. You literally write an HTML file, run a CLI command, and a .mp4 file falls out the other side. No After Effects. No timeline. No keyframes dragged with a mouse. Just markup, a headless browser, and FFmpeg doing what FFmpeg does.

I've been building URL-to-video tooling long enough to be skeptical of "this changes everything" launches, and most of the time that skepticism is correct. HyperFrames is a little different, though — not because it's going to replace Premiere next Tuesday, but because it's pointing at something real: AI agents need a video format they can actually write, and HTML is the one they already speak fluently.

Let me walk you through what HyperFrames actually is, how its website-to-video pipeline works, where it shines, where it still feels rough, and — honestly — when you should use it versus just pasting a URL into URLtoVideo and calling it a day.

What HyperFrames Actually Is (No Hype Version)

Strip away the tagline and HyperFrames is three things glued together:

  1. A specification for describing video compositions in HTML. You get regular HTML tags plus a handful of data-* attributes that tell the renderer "this element enters at 0.5s, exits at 2.1s, uses this easing" — there's a sketch of what that looks like just after this list. That's it. No proprietary DSL, no React components, no .fcpxml files.
  2. A rendering engine that opens each composition in a headless Chromium, walks through frames at a fixed frame rate (30fps, 60fps, whatever you set), screenshots each one, and hands the sequence to FFmpeg for encoding.
  3. A set of "skills" — installable prompt packages for Claude Code, Cursor, Gemini CLI, and Codex — that teach your coding agent how to write correct compositions and well-timed GSAP animations.
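
To make the first part concrete, here's a minimal sketch of what a composition might look like. The attribute names below (data-enter, data-exit, data-ease) are illustrative guesses based on the description above, not the actual HyperFrames spec; check the official docs for the real vocabulary.

<!DOCTYPE html>
<html>
  <body>
    <!-- Hypothetical attribute names, for illustration only -->
    <h1 data-enter="0.5" data-exit="2.1" data-ease="power2.out">
      Launch day.
    </h1>
    <p data-enter="1.0" data-exit="2.1" data-ease="power2.out">
      Your product, in 25 seconds.
    </p>
  </body>
</html>

The point is the shape, not the names: timing lives in markup, so an agent can read, diff, and edit it like any other HTML.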

The third part is where it gets philosophically interesting. HeyGen isn't just shipping a library for humans. They're shipping a tool that agents are supposed to drive. The CLI is non-interactive. The compositions are plain text. The output is deterministic — run the same input twice, get the same bytes twice. Every decision seems oriented around "can a language model do this alone, reliably, at 2 a.m., without a human pressing anything?"

That's a real shift. Most video tools have spent twenty years adding buttons. HyperFrames is built around not having any.

The Technical Stack, Briefly

Under the hood you're looking at:

  • Node.js 22+ as the runtime
  • Puppeteer driving headless Chromium for frame capture
  • FFmpeg for encoding the frame sequence to MP4
  • GSAP as the default animation engine (though you can swap in Lottie, Three.js, D3, or basically anything that runs in a browser)
  • Apache 2.0 license, open source, 6.7k GitHub stars as of this writing and climbing fast

If you've ever rendered a video by scripting Puppeteer to screenshot a page over time and piping the result through ffmpeg -framerate 30 -i frame_%04d.png out.mp4, congratulations — you've built a janky version of HyperFrames. What HeyGen added is the plumbing that makes this approach actually usable: deterministic timing, frame adapters, a composition format that doesn't fall apart, and the agent skills that make an LLM capable of writing the HTML correctly on the first try.
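
If you've never built that janky version yourself, here's roughly what it looks like. The capture script is a hypothetical capture.js you'd write with Puppeteer; only the ffmpeg invocation is standard:

# capture.js (hypothetical): your own Puppeteer script that loads the page,
# advances the animation clock, and writes frame_0001.png, frame_0002.png, ...
node capture.js

# Standard ffmpeg step: stitch the numbered frames into an H.264 MP4
ffmpeg -framerate 30 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p out.mp4

Everything hard about this approach lives inside capture.js, which is exactly the part HyperFrames packages up for you.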

The Website-to-Video Pipeline, Unpacked

Here's the part that overlaps directly with what we do at URLtoVideo, so it's worth looking at carefully. HyperFrames ships a built-in workflow called "website to video" that takes a URL and a one-line creative brief and outputs a finished MP4.

The brief looks something like:

"25-second product launch. Apple keynote energy."

And the pipeline runs in seven stages:

Stage | Output Artifact | What Happens
1. Capture | Screenshots, fonts, colors | Puppeteer crawls the site and pulls brand tokens
2. Design | DESIGN.md | Agent summarizes brand voice and palette
3. Script | SCRIPT.md | Agent writes hook + story + proof + CTA
4. Storyboard | STORYBOARD.md | One beat per scene, with direction
5. Voiceover | narration.wav + transcript.json | TTS with word-level timestamps
6. Build | compositions/*.html | One animated HTML file per beat
7. Render | my-video.mp4 | Puppeteer + FFmpeg compile the final video

The payoff: a 25-second, 1920×1080 video file, roughly 7 MB, typically encoded in about 12 seconds of wall time. Every intermediate artifact is a plain-text file on your disk. Don't like beat four? Edit the markdown, rebuild that one composition, re-render.
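
To give a feel for what "edit the markdown" means in practice, a storyboard beat might look something like this. This is purely illustrative; the real STORYBOARD.md format is defined by the skill, and I'm not claiming it matches:

## Beat 4: Social proof (illustrative, not the real format)
- Duration: 4s
- Visual: three customer logos fade in, staggered
- VO: "Trusted by teams who ship every day."
- Direction: calm, confident, no camera movement

Change a line here, rebuild the one affected composition, and every other beat stays untouched.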

This is genuinely clever. It's also — and I want to be honest here — a lot. You need Node 22+, FFmpeg installed globally, a terminal, a coding agent like Claude Code or Cursor, and enough familiarity with the command line that npx skills add heygen-com/hyperframes doesn't make your eye twitch.

For a developer who's already deep in that ecosystem, it's a dream. For a marketer who just needs a scroll-through video of a landing page for LinkedIn by 3 p.m., it's a trip to the Apple Store to buy a new laptop you didn't need.

What HyperFrames Does Exceptionally Well

Credit where it's due. A few things about this project are genuinely excellent:

Determinism. Video tools almost never give you bit-for-bit reproducibility. HyperFrames does, because there's no GPU nondeterminism, no "the render queue got interrupted" — every frame is a screenshot of a headless browser at time t. Run it twice, get the same pixels. This matters a lot for CI pipelines and for AI agents that need to know their output is stable.
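
It's worth spelling out how you get that determinism, because it's the opposite of how browsers normally animate. I don't know HyperFrames' exact mechanism, but the standard trick is to stop the wall clock and have the renderer seek the animation timeline to each frame's timestamp before screenshotting. A minimal sketch with GSAP:

<script src="https://cdn.jsdelivr.net/npm/gsap@3/dist/gsap.min.js"></script>
<script>
  // Sketch only, not necessarily HyperFrames' implementation: playback never
  // follows real time, so frame N renders identically on every run,
  // regardless of machine speed or load.
  gsap.globalTimeline.pause(); // detach all animations from the wall clock

  // The renderer would call this before screenshotting each frame:
  function seekToFrame(frame, fps) {
    gsap.globalTimeline.seek(frame / fps);
  }
</script>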

The agent-first philosophy. If you've ever tried to get Claude Code to "just edit the video," you know the experience is terrible. LLMs don't know what a timeline is. They do know HTML. By choosing HTML as the video format, HyperFrames made the problem solvable by a language model, not just visible to one. That's a meaningful distinction.

Extensibility. The Frame Adapter pattern means you can mix GSAP for UI animation, Three.js for a 3D hero, Lottie for a branded intro, and D3 for data viz charts — all in the same render. Browser compatibility is doing a lot of heavy lifting here, and HeyGen had the sense not to fight it.

Scale economics. Once you've written a template composition, generating 10,000 personalized variants is just 10,000 cheap headless-browser runs. For programmatic video — think personalized onboarding videos, auto-generated weekly reports, dynamic ad creatives — the per-unit cost drops to pennies.
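
As a back-of-the-envelope sketch of what "personalized variants" means mechanically, here's one way to stamp out compositions from a template. The {{NAME}} placeholder and names.txt are my inventions for illustration, not HyperFrames features:

# template.html contains a {{NAME}} placeholder (hypothetical convention)
mkdir -p compositions
while read name; do
  sed "s/{{NAME}}/$name/g" template.html > "compositions/$name.html"
done < names.txt
# Each generated composition then goes through the same headless render step.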

Where HyperFrames Is Still Rough

Every new framework has sharp edges. Being honest about them is the only way this post is useful:

Setup friction is real. "Install Node 22, install FFmpeg, install an agent CLI, install the skills, clone the repo, initialize a project, configure your API keys for TTS" is a lot of yaks to shave before your first render. Compare that to pasting a URL into a text box and pressing a button.

Rendering is CPU-heavy on your machine. Puppeteer + FFmpeg is not free. A complex 60-second composition will heat up your laptop and noticeably slow everything else down. There's no cloud-render option in the box — you bring your own compute.

The output is only as good as the composition. If your agent writes a mediocre storyboard, you get a mediocre video, with the added problem that debugging it means reading generated HTML. The abstraction saves time only when the generation is good.

It's not a screen recorder. This is the biggest misunderstanding I've seen floating around online. HyperFrames does not record a website scrolling. It builds a new video using assets extracted from the website — logo, colors, text, screenshots. If what you actually want is a smooth scroll-through of the live site, HyperFrames is the wrong tool. Use a screen recorder, or use URLtoVideo, which is literally designed for that.

HyperFrames vs. URLtoVideo: Same Neighborhood, Different Houses

I want to be careful here because it would be easy to write a disingenuous comparison that makes our own product look better. Let me just state what these tools actually do and let you decide.

URLtoVideo (our tool)

  • Input: a URL
  • Output: an MP4 of the live website smoothly scrolling from top to bottom, at the resolution and aspect ratio you pick
  • Interface: a web form
  • Time to first video: about 30 seconds, no install
  • Use case: product demos, landing-page showcases, social teasers, investor decks, "look at our new site" Slack pings

The value prop is speed and zero setup. You paste, you wait, you download. It's a screen recording of the real page, not a stylized reinterpretation.

HyperFrames

  • Input: a URL, a creative brief, and a terminal
  • Output: a stylized, scripted, narrated video constructed from brand assets extracted from the site
  • Interface: CLI, plus an AI agent like Claude Code
  • Time to first video: 15–60 minutes on your first run (install + learn); a few minutes per video after that
  • Use case: promo videos, launch teasers, programmatic video at scale, anything where you want HeyGen-level polish and you control every frame

The value prop is control and programmatic generation. You're not recording a website; you're building a short film about it, with voiceover and beats and hero shots.

The Honest Comparison Table

Dimension | URLtoVideo | HyperFrames
Setup time | 0 minutes | 15–60 minutes
Technical skill required | None | Comfortable with CLI
Output style | Live scroll recording | Stylized composition with VO
Time per video | ~30 seconds | ~2–5 minutes
Determinism | Not critical | Guaranteed
Cost per video | Free / cheap | Local compute + TTS API
Best for | Quick demos, social, showcases | Promo videos, scale, automation
Editing after render | Download and edit elsewhere | Edit markdown, rebuild beats
Requires AI agent | No | Strongly recommended

If I had to summarize it in one line: URLtoVideo captures your website. HyperFrames reinterprets it.

Both are valid. They're not really competing — they're adjacent tools you'd pick up at different stages of the same marketing process.

When to Pick Which

A quick decision framework, because "it depends" is a useless answer:

Use URLtoVideo when:

  • You need a video today and have no time to install anything
  • You want to show the actual site, not a stylized version
  • Your audience cares about what the site is, not a narrative about the site
  • You're a marketer, a founder, or anyone without deep engineering time to spare
  • You need multiple aspect ratios of the same content (1080×1920, 1:1, 16:9) and want to generate them in one go

Use HyperFrames when:

  • You're building a video pipeline, not a one-off
  • You need ten thousand personalized variants of the same video
  • You want voiceover, scripted beats, and custom animation — not a recording
  • You have a coding agent in your workflow already and you want it to produce video artifacts
  • You care about version-controllable, diff-able video source files

Use both when:

  • You want a scripted promo video (HyperFrames) and a supporting "here's the live product" clip (URLtoVideo) for the same launch. Honestly, this is the case for most product launches.

Getting Started with HyperFrames (The Minimum Viable Path)

If reading this far has you curious enough to try it, here's the fastest path from zero to first render. I'm going to assume you're on macOS or Linux and already have a terminal open.

# Prereqs: Node 22+ and FFmpeg
node --version   # should be v22 or higher
ffmpeg -version  # any recent version

# Install HyperFrames skills into your Claude Code setup
npx skills add heygen-com/hyperframes

# Initialize a new video project
npx hyperframes init my-first-video
cd my-first-video

# Open in Claude Code (or Cursor, or your agent of choice)
# Then, in the agent:
/hyperframes Create a 15-second teaser for https://example.com with cinematic energy.

From there, the agent runs the seven-stage pipeline, writes the files to disk, and renders the MP4. You can preview individual compositions in a live-reloading browser window while the agent iterates, which is genuinely one of the nicest parts of the experience — you see your video take shape the same way you'd see a webpage take shape during development.

If the output isn't what you wanted, the first thing to do is not re-run the whole pipeline. Open STORYBOARD.md, edit the beat you don't like, and tell the agent to rebuild just that composition. It's iterating on video the way developers iterate on code, which was always the point.

The Bigger Picture: Programmatic Video Is Finally Here

Video production has been stuck in the "one person at a GUI" model since Avid shipped its first non-linear editor in 1989. Tools got better. The model didn't change. You still opened an app, dragged clips onto a timeline, and clicked export.

What HyperFrames represents — and what a few other projects are poking at right now — is the first serious attempt to make video a developer primitive. Text has always been a primitive: you can write it, diff it, version it, generate it, template it. Images got there a few years ago with generative models. Video has been holding out because the toolchain was locked behind pro apps and proprietary formats.

An HTML-based video format breaks that open. Every frontend developer is now, technically, a video engineer. Every AI agent that can write a landing page can now write a video. Every CI pipeline that deploys a website can now, in principle, publish an accompanying animated trailer. Whether that's useful for your business is a separate question. But the capability is suddenly just... there, and cheap.

The parallel I keep coming back to is early WebGL. In 2012 you could technically render a 3D scene in a browser. Almost nobody did, because the setup was painful and the audience was niche. A decade later you can't scroll a product page without tripping over a Three.js hero section. HyperFrames (and things like it) feel like they're at that 2012 moment for programmatic video: absolutely real, absolutely not yet mainstream, and probably going to look obvious in hindsight.

How This Fits Into Our Thinking at URLtoVideo

I want to be transparent about how we're reading this internally. HyperFrames isn't a competitor to what we do — it's a signal about where the market is heading. The signal is: video creation is splitting into two lanes.

The first lane is "I need a video, now, from this URL." That's the lane we're in. The value is speed and zero friction. We don't think that lane goes away. If anything, it gets more valuable as video becomes more essential to every marketing workflow.

The second lane is "I want to produce video programmatically, at scale, as part of an automated pipeline, probably driven by an AI agent." That's where HyperFrames lives, and it's where a lot of the interesting technical innovation is going to happen over the next 12 months.

We're watching that second lane carefully. If you read our guide on converting websites into professional videos, you already know we care about output quality and workflow fit. The long-term question for every tool in this space — ours included — is "where does the agent sit?" Is the human still driving? Is the agent driving and the human reviewing? Is there a human at all?

We don't have perfect answers. Nobody does yet. But HyperFrames is a real, useful data point, and if you're a developer who's curious about programmatic video, you should spend an afternoon with it.

Frequently Asked Questions

Is HyperFrames free? Yes. Apache 2.0 licensed, open source, no account required for the core framework. You'll pay for TTS if you use a commercial voice provider, and for any AI agent subscription you're already running, but the rendering itself is free and runs on your machine.

Do I need to use HeyGen to use HyperFrames? No. HeyGen built it and open-sourced it, but you can use HyperFrames entirely standalone without a HeyGen account. Nothing is phoned home.

Can HyperFrames record a scrolling website the way URLtoVideo does? Not really, and that's not what it's for. HyperFrames builds a new video using assets extracted from the site. If you want an actual screen recording of the live page scrolling, use URLtoVideo — it's built specifically for that.

What's the learning curve like if I've never used a CLI before? Steep enough that I'd recommend picking a different tool, honestly. HyperFrames assumes you're comfortable with Node, npm, terminals, and at least one coding agent. If you're not, the reward-to-effort ratio isn't there yet. Wait six months — tooling on top of HyperFrames will almost certainly smooth this over.

Will HyperFrames replace tools like Premiere or After Effects? No, and I don't think that's the goal. HyperFrames is for generative and automated video. Premiere is for human-driven creative work. They solve different problems. The confusion happens because they both output MP4s.

Can I use HyperFrames output in social media? Absolutely. It outputs standard MP4 files at whatever resolution and aspect ratio you specify. 1080×1920 for TikTok/Reels, 1:1 for feed posts, 16:9 for YouTube — all fine.

Does HyperFrames work offline? The rendering itself is fully local. If your agent needs to hit an LLM API or a TTS service, those steps need connectivity. But once the compositions are written, the render pipeline is local-only.

What happens if my composition has a bug? The linter catches most static issues. For runtime issues, you'll see them when you preview the composition in the browser (it's just HTML — DevTools works normally). This is actually a huge quality-of-life win versus traditional video tools where "debugging" means scrubbing through a timeline.

Final Thoughts

HyperFrames is the most interesting thing to happen to video tooling in a while, and it's interesting in a specific way: it made video something an AI agent can genuinely produce by itself. Not edit. Not tweak. Produce. That's the line that matters.

It's not going to replace every video tool. It's not even going to replace most video tools. What it's going to do is create a new category — programmatic, agent-driven, versionable video — that didn't really exist before, and that category is going to get bigger fast.

If you're a developer, go try it. An afternoon with the /hyperframes skill in Claude Code is well spent. If you're a marketer or a founder who needs a video from a URL this week, don't let the hype distract you — paste the URL here, press the button, go back to whatever you were doing. Different tools, different jobs.

And if you end up using both in the same campaign? Even better. That's kind of the whole point.

References

  1. HyperFrames on GitHub — HeyGen's open-source repository (April 2026)
  2. HyperFrames Website-to-Video Guide — Official HeyGen documentation
  3. HyperFrames official site — "HTML is now a video format"
  4. HeyGen HyperFrames: How Code is Killing Traditional Video Editing — DEV Community analysis
  5. How to Edit Video with an AI Agent Using HyperFrames — apidog tutorial
  6. GSAP (GreenSock Animation Platform) — Default animation engine used by HyperFrames
  7. The Ultimate Guide to Converting Websites into Professional Videos in 2025 — URLtoVideo Blog

Ready to Create Your First Video?

Start converting websites to professional videos in minutes

Try URLtoVideo Free