Kaltura, a leading video platform, is evolving from a "content container" into a "smart interactive interface." The NASDAQ-listed company recently announced the acquisition of eSelf.ai, an Israeli AI digital human company, for $27 million, and plans to integrate eSelf's real-time conversational virtual human technology into its enterprise video ecosystem. The move signals that Kaltura is no longer focused only on video storage and distribution, but is betting on "Video as an Interface" as the next paradigm for enterprise interaction.

Not just a moving mouth, but an AI agent that can see, hear, and speak clearly

eSelf.ai was founded in 2023 by Alan Bekker, co-founder of Voca (a company previously acquired by Snap), and CTO Eylon Shoshan. With only 15 team members, the company has deep expertise in three core technologies: voice-video generation, low-latency speech recognition, and screen understanding. Its virtual humans not only achieve realistic lip synchronization but can also "see" the user's screen content and respond to it in real time. For example, when a customer lingers on an insurance page, the digital human can proactively explain the product terms; in training scenarios, it can dynamically adjust the focus of its explanations based on what the trainee does in the interface.

Kaltura CEO Ron Yekutiel emphasized that the core value of the acquisition lies in eSelf's genuinely real-time, synchronized conversation capability, as opposed to the "pre-recorded audio + lip-syncing" pseudo-interaction common in the market. "We need an AI that can have two-way, dynamic, and context-aware conversations with users, not just a talking video clip," he said.

From an enterprise video platform to an AI experience engine

Kaltura currently serves over 800 global enterprise customers, including Amazon, Oracle, SAP, IBM, and several top financial institutions and universities. Its products include enterprise video portals, virtual classrooms, webinar systems, and TV streaming solutions. After acquiring eSelf, Kaltura will launch standalone AI agents that can be embedded in sales, customer service, and training scenarios, offering enterprises "full-stack video intelligence":

Front-end: Highly realistic digital humans as an interactive entry point;

Mid-platform: Connecting to enterprise systems such as CRM, knowledge bases, and LMS;

Back-end: Dynamically generating personalized responses based on user behavior and screen content.
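
The article does not describe how Kaltura or eSelf implement these layers, so the following is only a minimal Python sketch of how the three tiers could fit together. Every name in it (ScreenContext, KnowledgeBase, generate_response, avatar_say, the 10-second dwell threshold) is a hypothetical placeholder chosen for illustration, not part of any Kaltura or eSelf API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScreenContext:
    """Hypothetical snapshot of what the user is currently looking at."""
    page: str             # e.g. "insurance/terms"
    dwell_seconds: float  # how long the user has stayed on this page


@dataclass
class KnowledgeBase:
    """Stand-in for the mid-platform layer (CRM, knowledge base, LMS)."""
    articles: Dict[str, str] = field(default_factory=dict)

    def lookup(self, page: str) -> Optional[str]:
        # Return the stored explanation for a page, or None if there is none.
        return self.articles.get(page)


def generate_response(ctx: ScreenContext, kb: KnowledgeBase,
                      dwell_threshold: float = 10.0) -> Optional[str]:
    """Back-end layer: decide whether the agent should speak, and what to say."""
    if ctx.dwell_seconds < dwell_threshold:
        return None  # the user has not lingered long enough; stay quiet
    article = kb.lookup(ctx.page)
    if article is None:
        return "Can I help you with anything on this page?"
    return f"I see you are reading about {ctx.page}. {article}"


def avatar_say(text: str) -> str:
    """Front-end layer placeholder: a real system would drive speech and video."""
    return f"[avatar] {text}"
```

In this sketch, a user who has spent 12 seconds on a page that exists in the knowledge base would trigger a proactive explanation, while a user who has spent only 3 seconds would trigger nothing. A production system would replace the dwell-time rule with real screen understanding and the string output with generated speech and video.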

Yekutiel said that Kaltura's vision is to move video from "passive watching" to "active service." "We started with video, advanced to personalized video, and now, through eSelf, we give AI a face, eyes, ears, and mouth, enabling it to truly possess human-level expression and understanding," he said.

A clear strategy, and a denial of sale rumors

Despite recent media reports that Kaltura is seeking a sale at a valuation of $400 million to $500 million, Yekutiel explicitly denied it: "We have never been close to any transaction." On the contrary, this is the company's fourth strategic acquisition, following Tvinci, Rapt Media, and Newrow, and it demonstrates a continued commitment to investing in the integration of AI and video. Kaltura generated approximately $180 million in revenue in 2024, was profitable on both an Adjusted EBITDA and a cash flow basis, and has 600 employees.

With the eSelf team fully integrated, Kaltura plans to rapidly deploy conversational AI agents in high-value scenarios such as education, finance, healthcare, and e-commerce. When enterprise customer service is no longer just a chatbot, but a digital expert that can "look at you, understand you, and guide you," a turning point in human-computer interaction may be approaching.