As AI companies mature, the competition for high-quality data has become one of the fiercest battlegrounds in the industry, giving rise to companies like Mercor, Surge, and, most notably, Scale AI, founded by Alexandr Wang. Now that Wang has taken charge of Meta's AI business, many investors see an opening and are willing to fund companies with compelling new strategies for collecting training data.
Datacurve, a Y Combinator graduate, is one such company, focusing on high-quality data for software development. On Thursday, the company announced a $15 million Series A round led by Mark Goldberg of Chemistry, with participation from employees at DeepMind, Vercel, Anthropic, and OpenAI. The company previously raised a $2.7 million seed round, with investment from Balaji Srinivasan, the former CTO of Coinbase.
Datacurve uses a bounty-hunter system to attract skilled software engineers to complete the most difficult datasets. The company pays for these contributions and has distributed more than $1 million in bounties so far.
However, co-founder Serena Ge said that the biggest motivation is not money. For high-value services like software development, pay for data work will always be far lower than pay in a traditional job, so the company's most important advantage is a positive user experience.
"We treat this as a consumer product rather than a data annotation operation," Ge said. She added that the company has spent a lot of time thinking about how to optimize the platform so that it attracts and engages the people it wants.
This is especially important as the demands on post-training data grow more complex. Early models were trained on simple datasets, while today's AI products rely on complex reinforcement learning environments, which have to be built through specific and strategic data collection. As those environments become more sophisticated, data requirements are tightening in both quantity and quality, which may give high-quality data collection companies like Datacurve an advantage.
As an early-stage company, Datacurve is currently focused on software engineering, but Ge said the same model could also apply to fields such as finance, marketing, and even medicine.
Ge explained that the company is building infrastructure for post-training data collection that attracts and retains highly skilled people in their own fields.