University of Pennsylvania researchers have developed a tool that converts web pages into scrolling videos using headless Chrome and ffmpeg, with integration for natural language control via Codex.
Researchers at the University of Pennsylvania have developed an innovative tool that captures web pages as scrolling videos, addressing the need for simple web page documentation and demonstration. The web-scroll-video project transforms static web content into dynamic MP4 videos by capturing viewport screenshots at fixed scroll offsets and streaming them through ffmpeg.

The tool solves a common problem for developers, designers, and content creators who need to document or demonstrate website functionality. Instead of manually recording screen captures or using complex video editing software, users can now programmatically generate videos of web pages with precise control over scrolling speed, resolution, and interactions.
"Creating videos of web pages has traditionally been a cumbersome process requiring screen recording software and manual editing," the project's documentation explains. "This tool automates the entire process by controlling a browser to scroll through a page while capturing frames at specified intervals."
The tool operates by launching headless Chrome with a temporary profile and DevTools enabled, opening the target URL, and capturing frames as it scrolls through the page. The frames are piped into ffmpeg to produce an H.264 MP4 video. By default, it creates 1080p video at 30 fps, but users can customize resolution, frame rate, and scroll speed through command-line options.
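The capture loop described above can be sketched in TypeScript. This is an illustration under stated assumptions, not the project's actual code: planScrollOffsets and ffmpegArgs are hypothetical names, scroll speed is assumed to be measured in CSS pixels per second, and frames are assumed to reach ffmpeg as PNG images on standard input. The ffmpeg flags themselves are standard ones for encoding piped images to H.264.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from the project.

/** Scroll position (in CSS pixels) for each captured frame, from top to bottom. */
function planScrollOffsets(
  pageHeight: number,
  viewportHeight: number,
  fps: number,
  pixelsPerSecond: number // assumed unit for scroll speed
): number[] {
  if (fps <= 0 || pixelsPerSecond <= 0) {
    throw new Error("fps and pixelsPerSecond must be positive");
  }
  // A page shorter than the viewport cannot scroll, so the distance is zero.
  const distance = Math.max(0, pageHeight - viewportHeight);
  const stepPerFrame = pixelsPerSecond / fps;
  const steps = Math.ceil(distance / stepPerFrame);
  const offsets: number[] = [];
  for (let i = 0; i <= steps; i++) {
    // Clamp so the final frame lands exactly on the bottom of the page.
    offsets.push(Math.min(distance, Math.round(i * stepPerFrame)));
  }
  return offsets;
}

/** Arguments for an ffmpeg process that reads piped image frames from stdin and writes an H.264 MP4. */
function ffmpegArgs(fps: number, outPath: string): string[] {
  return [
    "-y",                      // overwrite the output file if it exists
    "-f", "image2pipe",        // input is a stream of images on a pipe
    "-framerate", String(fps), // input frame rate
    "-i", "-",                 // read from standard input
    "-c:v", "libx264",         // encode as H.264
    "-pix_fmt", "yuv420p",     // pixel format most players accept
    outPath,
  ];
}

const offsets = planScrollOffsets(3080, 1080, 30, 300);
console.log(offsets.length, offsets[0], offsets[offsets.length - 1]);
console.log(["ffmpeg"].concat(ffmpegArgs(30, "out.mp4")).join(" "));
```

In a real run, each offset would be applied in the browser before a screenshot is taken, and the resulting image bytes would be written to the ffmpeg process's standard input in order.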
What makes this tool particularly interesting is its integration with Codex, an AI assistant that allows users to describe the video they want in plain English. For example, a user could request: "Make a 60 fps 1080p video of https://zamechek.com. Pause 1 second, click Blog, scroll slowly to the bottom and back to the top, click the first Keynote post, then scroll slowly to the bottom. Show the cursor." Codex would then generate a cue sheet and render the MP4 accordingly.
For more complex scenarios, the tool supports cue sheets that allow users to define sequences of actions including pauses, clicks, typing, zooms, and highlights. These cue sheets can be written as text files or JSON, making them easy to edit and version control. This approach enables the creation of sophisticated web page demonstrations with precise timing and interactions.
"The cue sheet system is particularly powerful because it keeps the video definition alongside the output file," the documentation notes. "This means revisions are straightforward - you just edit the cue file and re-render."
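To make the idea concrete, here is a minimal sketch of what a JSON cue sheet and its loader might look like. The action names follow the list above (pauses, clicks, typing, zooms, highlights, plus scrolling), but every field name, the Cue type, and the functions parseCueSheet and timedSeconds are assumptions made for illustration; the project's real schema may differ.

```typescript
// Hypothetical cue format: the action names follow the article, but every
// field name below is an assumption, not the project's actual schema.
type Cue =
  | { action: "pause"; seconds: number }
  | { action: "click"; target: string }
  | { action: "type"; target: string; text: string }
  | { action: "scroll"; to: "top" | "bottom"; seconds: number }
  | { action: "zoom"; factor: number }
  | { action: "highlight"; target: string };

const KNOWN_ACTIONS = ["pause", "click", "type", "scroll", "zoom", "highlight"];

/** Parse a JSON cue sheet, rejecting entries whose action is not recognized. */
function parseCueSheet(json: string): Cue[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("A cue sheet must be a JSON array");
  }
  return data.map((item: unknown, index: number) => {
    const action = (item as { action?: unknown } | null)?.action;
    if (typeof action !== "string" || KNOWN_ACTIONS.indexOf(action) === -1) {
      throw new Error("Cue " + (index + 1) + ": unknown action " + String(action));
    }
    return item as Cue; // field-level validation is omitted for brevity
  });
}

/** Seconds spent in cues that carry an explicit duration (pauses and scrolls). */
function timedSeconds(cues: Cue[]): number {
  let total = 0;
  for (const cue of cues) {
    if (cue.action === "pause" || cue.action === "scroll") {
      total += cue.seconds;
    }
  }
  return total;
}

const sheet = `[
  { "action": "pause", "seconds": 1 },
  { "action": "click", "target": "Blog" },
  { "action": "scroll", "to": "bottom", "seconds": 8 },
  { "action": "scroll", "to": "top", "seconds": 8 }
]`;
const cues = parseCueSheet(sheet);
console.log(cues.length, timedSeconds(cues));
```

Because a sheet like this is plain data, a revision is just a diff to the file: changing a pause from 1 second to 2 and re-rendering is enough to produce the new video.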
The project requires only Node.js 22 or newer, a Chromium-based browser (Google Chrome, Chromium, or Microsoft Edge), and ffmpeg. It needs no npm packages, which keeps it lightweight and easy to install. The tool can be installed manually or through the Codex skill installer, which can also check for and help install the dependencies.
For command-line users, the tool offers various options for customization:
- --script to run a cue sheet
- --out to specify the output path
- --width and --height for viewport dimensions
- --fps for frame rate
- --speed for scroll speed
- --duration to fit the full scroll into a specific time
- --cursor to show a rendered cursor
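The relationship between the speed and duration options comes down to simple arithmetic, sketched below. The names resolveSpeed and frameCount are hypothetical, speed is assumed to be in pixels per second, and the rule that a duration overrides an explicit speed is an assumption for this example rather than documented behavior.

```typescript
// Hypothetical sketch: the precedence rule and all names here are assumptions.
interface ScrollTiming {
  speed?: number;    // value of --speed, assumed to be pixels per second
  duration?: number; // value of --duration, seconds for the full scroll
}

/** Pick the scroll speed: a duration, if given, determines the speed. */
function resolveSpeed(
  scrollDistance: number,
  timing: ScrollTiming,
  defaultSpeed: number
): number {
  if (timing.duration !== undefined) {
    if (timing.duration <= 0) {
      throw new Error("--duration must be positive");
    }
    return scrollDistance / timing.duration;
  }
  return timing.speed !== undefined ? timing.speed : defaultSpeed;
}

/** Frames needed to cover the distance at a given speed, including the first frame. */
function frameCount(scrollDistance: number, speed: number, fps: number): number {
  if (speed <= 0 || fps <= 0) {
    throw new Error("speed and fps must be positive");
  }
  return Math.ceil((scrollDistance / speed) * fps) + 1;
}

// A page with 2000 px to scroll, fitted into 10 seconds at 30 fps.
const speed = resolveSpeed(2000, { duration: 10 }, 300);
console.log(speed, frameCount(2000, speed, 30));
```

Fitting a scroll to a duration is useful when the video must match a fixed slot, such as a narrated segment, regardless of how long the page is.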
The tool handles several common challenges in web page capture, including lazy-loaded content and varying page load times. It includes options to adjust warmup scroll steps and delay times after page loading to ensure complete content capture.
Error handling is another strength of the tool. When a cue step fails, it generates an error report and screenshot, helping users identify and fix issues. The error reports include the failure message, current URL, intended output path, and screenshot path.
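A report with those four fields could be modeled as follows. This is only a sketch: the CueErrorReport interface, its field names, the formatErrorReport function, and the sample values are all hypothetical, and the project's actual report layout may look different.

```typescript
// Hypothetical shape: the four fields mirror the article; names and layout are assumptions.
interface CueErrorReport {
  message: string;        // why the cue step failed
  url: string;            // page the browser was on at the time
  outputPath: string;     // where the MP4 would have been written
  screenshotPath: string; // screenshot taken at the moment of failure
}

/** Render the report as a short plain-text block, one field per line. */
function formatErrorReport(report: CueErrorReport): string {
  return [
    "Cue step failed: " + report.message,
    "Current URL: " + report.url,
    "Intended output: " + report.outputPath,
    "Screenshot: " + report.screenshotPath,
  ].join("\n");
}

// Sample values are invented for the example.
console.log(
  formatErrorReport({
    message: "No element matched the click target",
    url: "https://example.com/blog",
    outputPath: "demo.mp4",
    screenshotPath: "demo.error.png",
  })
);
```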
This project represents an interesting intersection of web automation, video processing, and AI-assisted workflow. By providing a simple way to convert web pages into videos with precise control, it opens up possibilities for automated documentation, website testing, and content creation.
The tool is particularly valuable for:
- Creating website demos for pitches or presentations
- Documenting web application features
- Generating tutorial content
- Archiving website content in video format
- Testing website behavior under different scroll conditions
The project is released under the MIT license, making it freely available for both personal and commercial use. The source code is available on GitHub, where users can report issues, request features, or contribute improvements.
As web-based applications become increasingly prevalent, tools that simplify the process of capturing and documenting web content will continue to gain importance. The web-scroll-video tool from Penn researchers provides a straightforward yet powerful solution to this problem, with the added benefit of AI-assisted operation for non-technical users.
