AI SMART CROP
Turn landscape podcasts, interviews, and streams into vertical clips with AI face tracking that follows the active speaker frame by frame.
DeepSkim uses a neural network (S3FD) to detect every face in every frame, then TalkNet audio-visual analysis to determine who is speaking. The 9:16 crop window automatically follows the active speaker with smooth, Gaussian-filtered motion — no jumpiness, no manual keyframing.
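The Gaussian smoothing idea can be illustrated in isolation. Below is a minimal sketch (not DeepSkim's actual code): per-frame face x-centers are convolved with a normalized Gaussian kernel, so when the detected speaker position jumps, the crop window glides instead of snapping. The function name `smooth_crop_centers` and the sigma value are illustrative assumptions.

```python
import numpy as np

def smooth_crop_centers(centers, sigma=5.0):
    """Gaussian-filter per-frame crop x-centers (illustrative sketch).

    A jump in the raw face position becomes a smooth glide in the
    returned trajectory, which is what drives the 9:16 crop window.
    """
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()  # normalize so values are a weighted average
    # Edge-pad so the start/end of the clip aren't pulled toward zero.
    padded = np.pad(np.asarray(centers, dtype=float), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Example: the detected speaker jumps from x=400 to x=1200 mid-clip.
raw = np.array([400.0] * 30 + [1200.0] * 30)
smooth = smooth_crop_centers(raw, sigma=5.0)
```

The smoothed trajectory stays the same length as the input, starts near 400, ends near 1200, and rises monotonically through the transition — no jump cut in the crop.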
When two people are talking — like in an interview or debate — DeepSkim automatically switches to a split-screen layout showing both speakers. It detects speaker turns and transitions smoothly between single-speaker and split views.
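Composing a split-screen frame is conceptually simple: crop one panel per speaker and stack them vertically into the 9:16 output. The sketch below shows that idea with plain NumPy slicing; the function names, panel dimensions, and the assumption that the source frame is larger than each panel are all illustrative, not DeepSkim's implementation.

```python
import numpy as np

def crop_panel(frame, cx, cy, w, h):
    """Crop a w*h window centered on (cx, cy), clamped to frame bounds."""
    H, W = frame.shape[:2]
    x0 = int(np.clip(cx - w // 2, 0, W - w))
    y0 = int(np.clip(cy - h // 2, 0, H - h))
    return frame[y0:y0 + h, x0:x0 + w]

def split_screen(frame, face_a, face_b, out_w=540, out_h=960):
    """Stack two half-height panels: speaker A on top, speaker B below."""
    half = out_h // 2
    top = crop_panel(frame, *face_a, out_w, half)
    bottom = crop_panel(frame, *face_b, out_w, half)
    return np.vstack([top, bottom])

# A 1080p landscape frame with one speaker on each side.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
out = split_screen(frame, face_a=(480, 540), face_b=(1440, 540))
```

The result is a 960x540 vertical frame with both speakers visible at once; a real pipeline would also resize each panel and animate the transition between layouts.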
When there are no faces on screen — like during screenshares, slides, or B-roll — DeepSkim uses a 4-tier fallback pipeline (scene detection, object detection, saliency mapping, optical flow) to keep the crop centered on the most interesting part of the frame.
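A tiered fallback like this amounts to running detectors in priority order and taking the first one that returns a result. Here is a minimal sketch of that dispatch logic with stub detectors standing in for scene detection, object detection, saliency mapping, and optical flow — the tier names and stub outputs are hypothetical, not DeepSkim's real detectors.

```python
import numpy as np

def pick_crop_center(frame, tiers):
    """Try each fallback tier in priority order; the first hit wins."""
    for name, detect in tiers:
        hit = detect(frame)
        if hit is not None:
            return name, hit
    # Last resort: the geometric center of the frame.
    h, w = frame.shape[:2]
    return "center", (w // 2, h // 2)

# Stub tiers for illustration; real detectors return a point of interest
# or None when they find nothing usable in the frame.
tiers = [
    ("scene",    lambda f: None),           # no fresh scene cut
    ("object",   lambda f: None),           # no confident object box
    ("saliency", lambda f: (1200, 400)),    # saliency map peak
    ("flow",     lambda f: (960, 540)),     # never reached here
]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
tier, center = pick_crop_center(frame, tiers)
```

On this frame the first two tiers miss, so the saliency tier supplies the crop center, and the optical-flow tier is never consulted — exactly the short-circuit behavior a priority pipeline wants.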