I built a small CLI tool called screenframes that records a portion of your screen and extracts a curated set of frames showing only meaningful visual changes. It's designed to give AI coding agents eyes.
The problem is straightforward: agents like Claude Code can't see what's happening on your screen. You can describe what you're looking at, or take screenshots by hand. Both are tedious and lossy. What you actually want is to record a chunk of your workflow and hand the agent a concise visual summary of what happened.
screenframes record opens the native macOS region selector. Pick an area, do your thing, press Escape to stop. You get back the video and a set of deduplicated frames — only the ones where something actually changed.
The pipeline is simple: record with macOS screencapture, sample at 10 fps with ffmpeg, auto-crop to the region where pixels actually changed between frames, run perceptual dedup with phash to throw away near-identical frames, downscale to 1280px wide, and generate a markdown manifest with timestamps. You end up with maybe 9 frames from a 12-second recording instead of all 120 (12 s at 10 fps).
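The dedup step is the interesting part. Here's a minimal stdlib-only sketch of the idea, using the simpler dHash variant rather than the actual phash, and operating on frames given as nested lists of grayscale values; the hash size and 6-bit threshold are illustrative choices, not screenframes' actual parameters:

```python
def dhash(pixels, hash_size=8):
    """Difference hash: sample a (hash_size+1) x hash_size grid from the
    frame, then compare each pixel to its right neighbor to build a
    64-bit fingerprint that survives small rendering differences."""
    h, w = len(pixels), len(pixels[0])
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            # Nearest-neighbor sampling from the source grid
            left = pixels[row * h // hash_size][col * w // (hash_size + 1)]
            right = pixels[row * h // hash_size][(col + 1) * w // (hash_size + 1)]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def dedup(frames, threshold=6):
    """Keep a frame's index only if its hash differs from the last kept
    frame's hash by more than `threshold` bits."""
    kept, last = [], None
    for i, frame in enumerate(frames):
        h = dhash(frame)
        if last is None or hamming(h, last) > threshold:
            kept.append(i)
            last = h
    return kept
```

Note the comparison is against the last *kept* frame, not the previous frame: a change that accumulates slowly across many frames still eventually crosses the threshold and gets kept, instead of being dropped one tiny step at a time.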
You can also skip the recording step entirely with screenframes extract path/to/video.mov if you already have a video from Cmd+Shift+5 or whatever.
The output is a timestamped directory with the cropped video, the frames, and the manifest. Drop the frames into a prompt and the agent has context it never had before.
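Generating the manifest is the easy part: since frames were sampled at a fixed rate, each kept frame's index maps directly back to a timestamp in the recording. A hypothetical sketch (the filename pattern and table layout here are my guesses, not screenframes' actual format):

```python
def write_manifest(frame_indices, fps=10):
    """Render a markdown table mapping kept frame files to mm:ss.s
    timestamps, derived from frame index / sampling rate."""
    lines = ["| frame | timestamp |", "|-------|-----------|"]
    for i in frame_indices:
        t = i / fps  # seconds into the recording
        lines.append(f"| frame_{i:04d}.png | {int(t // 60):02d}:{t % 60:04.1f} |")
    return "\n".join(lines)
```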
Python, MIT licensed, pipx install from the repo. Requires macOS and ffmpeg.