As the volume of video data generated by global enterprises reaches unprecedented heights, handling "dark data" (footage that sits in storage for years, never viewed or analyzed) has become a new industry challenge. Recently, InfiniMind, a Tokyo startup founded by two former senior Google employees, announced it has raised $5.8 million in seed funding. The company is building new AI infrastructure aimed at transforming petabyte-scale raw video and audio into searchable, structured business intelligence data.

InfiniMind's two founders, Aza Kai and Hiraku Yanagita, worked at Google Japan for nearly ten years, focusing on cloud computing, machine learning, and video recommendation algorithms. Kai said that traditional video analysis is often limited to tagging individual frames, an approach that cannot capture complex narrative logic or causal relationships. Thanks to recent breakthroughs in vision-language models (VLMs), however, AI can now understand video content spanning hundreds of hours and accurately locate specific events or scenes within it.

InfiniMind has already launched TV Pulse, its real-time content analysis platform, in Japan, targeting the media and retail industries. It is also preparing to bring its flagship product, DeepFrame, to the international market. The platform can process videos up to 200 hours long and supports no-code integration. With its headquarters relocating to the United States, InfiniMind plans to use the new funds to expand its engineering team and accelerate R&D, helping global enterprises make better-informed decisions from their video data.

Key Points:

  • 🎞️ Focusing on "Dark Data": InfiniMind transforms petabyte-scale video assets that have never been analyzed into usable, structured business data.

  • 🚀 Ex-Google Founding Team: The founders previously led data solutions at Google Japan, and the company has raised $5.8 million in seed funding led by UTEC.

  • 🔍 Long-Video Understanding: Its core technology can process videos up to 200 hours long, accurately identifying scenes, speakers, and complex events, with significant cost advantages.