─── 🖊⋅🖊⋅🖊 ───

Happy new year! I’m trying to proof-of-concept something: let’s say there’s a series of technical talks, quality content, and they’re all recorded and posted online. But they get low watch time. What are ways in which we can reformat the content to make it easier to consume in a fast-moving world?

Idea 1: use AI to quickly generate multiple shorter clips

The goal is that a viewer can watch some clips casually, and decide to watch more if they have time and interest. Ideally, it’d be a video that is edited across multiple moments in a talk, but I’ll just start with clipping out continuous 30s or 1m of content.

After some initial brainstorming with Perplexity, the target workflow looks like this:

1. Download the video
2. Extract the audio with ffmpeg
3. Run the audio through whisperx to get timestamped subtitles in an .srt file (turns out this is a standard subtitle format!)
4. Feed the .srt to an AI to identify insightful portions to clip out
5. Use ffmpeg to clip out those portions based on the timestamps
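To make the subtitle step a bit more concrete, here is a minimal Python sketch of reading a subtitle file into (start, end, text) records, which is the shape you would want before asking an AI to pick timestamps. It assumes the plain SubRip layout (index line, timing line, text lines, blank line); the names Cue, to_seconds and parse_srt are made up for illustration and are not part of whisperx or ffmpeg.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


# SubRip timestamps look like 00:44:25,791 (some tools write a dot instead of a comma).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert a subtitle timestamp such as '00:44:25,791' to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not a subtitle timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse subtitle text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        # Multi-line subtitle text is joined into a single line.
        text = " ".join(line.strip() for line in lines[timing + 1:])
        cues.append(Cue(to_seconds(start), to_seconds(end), text))
    return cues
```

Reading the file with open(path, encoding="utf-8").read() and passing the result to parse_srt gives a list that is easy to print as "start - end: text" lines for a prompt.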

Licensing

Some important questions to ask:

are we allowed to repurpose the videos in this way? e.g. the speaker agreed to their likeness being used?
- future question to explore: digital privacy rights across different countries, e.g. the EU vs. Korea’s likeness-usage rules vs. U.S. media consent
For now, I’m using a lecture I downloaded from MIT OpenCourseWare. I looked around and the specific course doesn’t seem to have its own special license, so the default MIT OCW license (CC BY-NC-SA) applies. See: What are the requirements of use for MIT OpenCourseWare? (MIT OpenCourseWare)
- Deed - Attribution-NonCommercial-ShareAlike 4.0 International - Creative Commons
- Free to adapt, remix, etc., but only for non-commercial use, with attribution, and shared under the same license (ShareAlike).

FFmpeg

What a useful tool! I used it to extract my audio into an mp3 file, and used a simple command to clip out various timestamps. I even added a nifty little fade-out effect over the last 2 seconds, in both video and audio. See below.

ffmpeg -ss 00:44:25.791 -to 00:45:42.936 -i bio-lecture.mp4   -vf "fade=t=out:st=75:d=2"   -af "afade=t=out:st=75:d=2"   sugars.mp4
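The st=75 in that command is roughly the clip length (about 77.1s) minus the 2s fade. As far as I understand it, with -ss placed before -i the output timestamps start from zero, so the fade start is measured from the beginning of the clip rather than from the beginning of the lecture. Here is a small Python sketch that works that number out and builds the argument list. The helper names are made up, and it only builds the command, it doesn’t run ffmpeg.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS.mmm' timestamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def clip_command(src: str, start: str, end: str, dst: str, fade: float = 2.0) -> list[str]:
    """Build an ffmpeg command that cuts [start, end] and fades out at the end."""
    length = to_seconds(end) - to_seconds(start)
    if length <= fade:
        raise ValueError("clip is shorter than the fade-out")
    # Fade start is relative to the start of the clip, not the source video.
    fade_start = round(length - fade, 3)
    return [
        "ffmpeg", "-ss", start, "-to", end, "-i", src,
        "-vf", f"fade=t=out:st={fade_start}:d={fade}",
        "-af", f"afade=t=out:st={fade_start}:d={fade}",
        dst,
    ]


if __name__ == "__main__":
    cmd = clip_command("bio-lecture.mp4", "00:44:25.791", "00:45:42.936", "sugars.mp4")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Computing the fade start this way means the fade ends exactly at the end of the clip, whatever timestamps the AI hands back.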

whisperx

I’m on a GTX 1060 6GB, which is old and weak. I spent a lot of time fiddling around with versions and such. To get whisperx to work (WhisperX: Automatic Speech Recognition with Word-level Timestamps), I had to install these things:

CUDA
cuDNN
whisperx
ctranslate2
torch, torchaudio, torchvision
libcudnn

I learned that if in doubt, just delete your virtual environment and do a clean install. Also, WSL is so annoying.

There are some interesting options you can put into whisperX. I ran it like this:

whisperx audio.wav   --model large-v2   --batch_size 1   --compute_type int8 --language en --vad_method silero --device cpu

I eventually gave up on the GPU and swapped to CPU compute (--device cpu)
the large model (large-v2) seems to be OK
keep a low batch_size because of the weak GPU
compute_type can also only be int8 on my setup. The next step up would be float16.
silero vad is a voice activity detector. Cool! Basically, it detects where speech starts and stops, i.e. pauses and silence.

While running whisperX, the terminal output showed the transcription in 30-second chunks (btw, runtime wasn’t too bad: I got a chunk every 5s or so), so I was worried that the transcription was not fine-grained enough. Thankfully, after transcribing, it takes your content and spends some time doing alignment, so the final result is subtitles that are timestamped to approximately a sentence or two each, which is much more natural. In the future, I’ll try burning the subtitles back onto the video, so I can have a closed-captions video.
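Since the aligned subtitles are only a sentence or two each, one possible pre-processing step before asking an AI for 30s to 1m clips is to merge neighbouring subtitles into longer candidate windows. This is just a sketch of that idea, not something whisperx does for you: it assumes the subtitles are already loaded as (start, end, text) tuples in seconds, and group_cues is a made-up name.

```python
def group_cues(cues, max_len=60.0):
    """Greedily merge consecutive (start, end, text) cues into windows.

    A cue joins the current window as long as the window stays at or under
    max_len seconds; otherwise it starts a new window. A single cue longer
    than max_len is kept as its own window.
    """
    windows = []
    current = None
    for start, end, text in cues:
        if current is not None and end - current[0] <= max_len:
            # Extend the current window to cover this cue.
            current = (current[0], end, current[2] + " " + text)
        else:
            if current is not None:
                windows.append(current)
            current = (start, end, text)
    if current is not None:
        windows.append(current)
    return windows


if __name__ == "__main__":
    demo = [(0.0, 20.0, "a"), (20.0, 45.0, "b"), (45.0, 70.0, "c"), (70.0, 100.0, "d")]
    for window in group_cues(demo):
        print(window)
```

Each window already carries a start and end time, so whichever ones get picked can go straight into the ffmpeg clipping command.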

Idea 2: clip out the audio and post it as a podcast

Less work to do here, but the idea is to use ffmpeg to extract the audio again, and then post it on a podcast-type website. Then people can listen to it more passively, rather than watching the whole video. Hmm… who listens to podcasts these days? I do.
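For reference, here is a rough Python sketch of what the audio extraction command could look like for the two cases in this post: an mp3 for podcast hosting and a wav to feed into whisperx. The encoder settings are my assumptions, not requirements (VBR quality 2 for the mp3, 16 kHz mono for the wav since that is what Whisper-style models work with), and lecture-podcast.mp3 is a made-up file name.

```python
from pathlib import Path


def extract_audio_command(video: str, audio: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes audio only."""
    cmd = ["ffmpeg", "-i", video, "-vn"]  # -vn: no video in the output
    suffix = Path(audio).suffix.lower()
    if suffix == ".mp3":
        # Assumed podcast settings: LAME encoder, variable bitrate quality 2.
        cmd += ["-c:a", "libmp3lame", "-q:a", "2"]
    elif suffix == ".wav":
        # Assumed transcription settings: 16 kHz sample rate, one channel.
        cmd += ["-ar", "16000", "-ac", "1"]
    else:
        raise ValueError(f"unsupported audio extension: {suffix!r}")
    return cmd + [audio]


if __name__ == "__main__":
    print(" ".join(extract_audio_command("bio-lecture.mp4", "lecture-podcast.mp3")))
    print(" ".join(extract_audio_command("bio-lecture.mp4", "audio.wav")))
```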

This one is pretty good: GitHub - ad-aures/castopod: Castopod is an open-source hosting platform made for podcasters who want engage and interact with their audience. Synchronized read-only mirror of https://code.castopod.org/adaures/castopod

Misc other ideas

better website with a link to directly sign up for the mailing list, sort past content by tags, and make it searchable. I mean, if you’re going to keep doing this for years, you’ve got to have a robust way to store everything!
networking corner? e.g. if the speaker is a startup-y guy, have a startup networking corner. If it’s more like bio, have a bio networking corner.

One last thing you can do with ffmpeg

Make a slideshow/gif!

ffmpeg -f concat -safe 0 -i list.txt -r 30 -c:v libx264 -pix_fmt yuv420p output.mp4

You can define a file list.txt that basically looks like this:

file 'photo1.jpg'
duration 0.3
file 'photo2.jpg'
duration 0.3
file 'photo3.jpg'
duration 0.3

And it’ll make your video for you.
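If you have more than a handful of photos, typing list.txt by hand gets old fast. Here is a small Python sketch that generates it. One thing it does differently from the list above: it repeats the last file once without a duration, because the ffmpeg wiki mentions a quirk where the last image’s duration may otherwise be ignored. The function name is made up, and it assumes file names without single quotes in them (those would need escaping).

```python
def concat_list(images, seconds_per_image=0.3):
    """Return the text of a concat-demuxer list file for a slideshow."""
    lines = []
    for image in images:
        lines.append(f"file '{image}'")
        lines.append(f"duration {seconds_per_image}")
    if images:
        # Repeat the last image without a duration so its duration is honoured.
        lines.append(f"file '{images[-1]}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    photos = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]
    with open("list.txt", "w", encoding="utf-8") as handle:
        handle.write(concat_list(photos))
```

Swapping the hard-coded photos list for sorted(glob.glob("*.jpg")) would pick up every image in the folder in name order.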