FFmpeg 8.0 merges OpenAI "Whisper Filter" for automatic speech recognition, Vulkan AV1 encoding, & VP9 decoding

cm0002@piefed.world · 9 months ago


chrisbtoo@lemmy.ca · 9 months ago

Hopefully the speech recognition is better than whatever the fuck most online video platforms use for automatic subtitles at the moment.

pirateKaiser@sh.itjust.works · 9 months ago

I’ve built an app with Whisper; how ‘hit or miss’ it is depends almost entirely on the size of the model and the language. Even audio quality is a lesser factor in my experience. So, it depends…

Grass@sh.itjust.works · 9 months ago

has anyone compared vulkan av1 to nvenc or vaapi? too new still?

the_doktor@lemmy.zip · 9 months ago

It’s getting to the point where EVERYTHING has freaking AI slop in it and the only solution is to manually build an entire OS from scratch (LFS) and disable any damned AI slop any package has, while putting it all on a pre-AI era computer because I’m sure the HW and BIOS in every computer will have damned AI soon enough.

To hell with AI slop and to hell with anyone supporting that copyright-infringing, inaccurate, brain-dead, environmentally unfriendly pile of crap technology.

Björn@swg-empire.de · 9 months ago

I mean, here it is used optionally to help with accessibility. That is objectively a good use of AI.

Scoopta@programming.dev · 9 months ago

Also it’s running locally. I think the biggest problem with AI is the data harvesting and this is just not that

katy ✨@piefed.blahaj.zone · 9 months ago

ugh so what’s the alternative package to ffmpeg?

LunaChocken@programming.dev · 9 months ago

Good luck with that… ffmpeg is the de facto standard.

TonyOstrich@lemmy.world · 9 months ago

This is one of the actually decent uses of this model. I have used Whisper to transcribe phone calls, and just the other week I had to export the audio from a video I was working on and run Whisper on it to get subtitles for the video. It’s still not a set-it-and-forget-it solution, but correcting its small mistakes here and there is so much faster than manually transcribing the audio.
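For anyone wanting to try the same export step, here is a minimal sketch of it, assuming ffmpeg is on your PATH. The function names and file names are made up for illustration, and the 16 kHz mono 16-bit WAV target is an assumption about what your Whisper tooling wants (whisper.cpp expects that format; other front ends may accept more).

```python
import subprocess


def build_extract_audio_cmd(video_path, wav_path):
    """Build an ffmpeg command that drops the video and writes a WAV file.

    The output format (mono, 16 kHz, 16-bit PCM) is an assumption about
    what the speech recognition tool expects as input.
    """
    return [
        "ffmpeg",
        "-i", video_path,     # input video
        "-vn",                # ignore the video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_audio_cmd(video_path, wav_path), check=True)
```

Calling `extract_audio("talk.mp4", "talk.wav")` would then give you a file you can hand to Whisper separately.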

Given how modular ffmpeg is with the way the switches work, a user never has to interact with that portion of the application. I can technically use ffmpeg to transcode an mp3 without ever using the video components.
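To illustrate the point about switches, a rough sketch of an audio-only transcode follows. It assumes an ffmpeg build that includes the `libmp3lame` encoder; the function name, file names, and the 192k bitrate are placeholders, not recommendations.

```python
import subprocess


def build_audio_transcode_cmd(src, dst, bitrate="192k"):
    """Build an ffmpeg command that re-encodes audio to MP3 and nothing else.

    Assumes the ffmpeg build has libmp3lame enabled. No video, subtitle,
    or speech recognition options appear anywhere in the command.
    """
    return [
        "ffmpeg",
        "-i", src,             # input file
        "-vn",                 # skip any video (e.g. embedded cover art)
        "-c:a", "libmp3lame",  # MP3 encoder
        "-b:a", bitrate,       # target audio bitrate
        dst,
    ]


def transcode_audio(src, dst, bitrate="192k"):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_audio_transcode_cmd(src, dst, bitrate), check=True)
```

Nothing in that command touches the Whisper filter; it only comes into play if you ask for it explicitly.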
