MacStories, a technology publication, has published a blog post about Apple's new Speech API that has drawn wide attention in the industry. In a transcription test on a 34-minute, 7GB 4K video file, Apple's new Speech API finished in just 45 seconds, well ahead of similar tools. By comparison, OpenAI's Whisper took 101 seconds, which means Apple's API needed about 55% less time and ran roughly 2.2 times faster.
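The 55% figure follows directly from the two reported times. The short sketch below reproduces the arithmetic; the function and variable names are illustrative, and only the 45-second, 101-second, and 34-minute values come from the article.

```python
def compare(fast_s: float, slow_s: float) -> tuple[float, float]:
    """Return (fraction of time saved, speedup factor) of fast vs. slow."""
    time_saved = (slow_s - fast_s) / slow_s
    speedup = slow_s / fast_s
    return time_saved, speedup


APPLE_S = 45          # Apple's Speech API, as reported
WHISPER_S = 101       # OpenAI's Whisper, as reported
VIDEO_S = 34 * 60     # length of the test video in seconds

saved, speedup = compare(APPLE_S, WHISPER_S)
print(f"time saved: {saved:.1%}")                       # 55.4%
print(f"speedup: {speedup:.2f}x")                       # 2.24x
print(f"real-time factor: {VIDEO_S / APPLE_S:.1f}x")    # 45.3x
```

Note that "55%" describes the reduction in elapsed time; expressed as a speed ratio, the same result is about 2.2 times faster.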
Apple announced the new Speech framework APIs at the 2025 Worldwide Developers Conference (WWDC). They include two modules: SpeechAnalyzer and SpeechTranscriber. The results suggest another step forward for Apple in speech processing, particularly in terms of speed.
For the test, MacStories used Yap, a tool built on the new modules, and compared it with other transcription tools on the same file. Yap finished in 45 seconds, the fastest of the tools tested. In contrast, MacWhisper (based on OpenAI's open-source Whisper speech transcription model) took 1 minute and 41 seconds, while VidCap required 1 minute and 55 seconds. MacWhisper running the older Large V2 model took 3 minutes and 55 seconds.
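To put the four results side by side, the sketch below converts the reported times to seconds and expresses each one relative to Yap. The tool labels and times are taken from the article; the helper name `to_seconds` is an illustrative choice.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' string into a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# Transcription times as reported in the article.
results = {
    "Yap": "0:45",
    "MacWhisper": "1:41",
    "VidCap": "1:55",
    "MacWhisper (V2)": "3:55",
}

baseline = to_seconds(results["Yap"])
# Sort from fastest to slowest.
ranking = sorted(results.items(), key=lambda item: to_seconds(item[1]))

for tool, reported in ranking:
    secs = to_seconds(reported)
    print(f"{tool:<16} {secs:>4d} s  {secs / baseline:.2f}x Yap's time")
```

On these figures, the slowest configuration takes a little over five times as long as Yap.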
Every tool tested made some errors on proper nouns such as "AppStories," but Yap, which runs on-device, was significantly faster, and that advantage compounds when handling multiple videos. For users who transcribe several videos each week, the time saved adds up.
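As a rough illustration of the weekly saving, the sketch below multiplies the per-video difference by a workload. The figure of 10 videos per week is a hypothetical assumption, not a number from the article, and the estimate assumes every video resembles the 34-minute test file.

```python
def weekly_saving_s(fast_s: int, slow_s: int, videos_per_week: int) -> int:
    """Seconds saved per week by using the faster tool for every video."""
    return (slow_s - fast_s) * videos_per_week


VIDEOS_PER_WEEK = 10  # hypothetical workload, not from the article

# 45 s (Yap) versus 101 s (MacWhisper), as reported for the test file.
saved_s = weekly_saving_s(45, 101, VIDEOS_PER_WEEK)
minutes, seconds = divmod(saved_s, 60)
print(f"{minutes} min {seconds} s saved per week")  # 9 min 20 s
```

Longer videos or a larger weekly volume scale the saving proportionally under the same assumptions.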
The rapid development of this technology is convenient for video content creators and opens the door to broader use cases. As AI technology continues to evolve, Apple may introduce further innovations in speech recognition that improve the user experience.
Key points:
🌟 Apple's new Speech API transcribes a 34-minute 4K video in just 45 seconds, surpassing competitors in speed.
⏱️ Compared to OpenAI Whisper, Apple's technology cuts transcription time by approximately 55% (45 seconds versus 101 seconds).
📈 Yap's speed advantage, delivered through on-device processing, compounds when processing multiple videos and saves users time.