In playing with the exact same use case, I was blown away by how well Gemini (2.5 Flash, IIRC) transcribed podcasts with speaker identification and handled common "overlaps" in conversations. I can't remember which local Ollama models I played with, but I was not very impressed.
Yeah, Gemini is really strong at speaker separation and handling overlaps.
I’m taking a local-first approach (privacy, offline, no cost), using Faster-Whisper