Uber recently launched a service in India that lets its ride-sharing and food delivery drivers use their free time to complete data classification and information collection tasks through the app. Megha Yethadka, Global Head of Uber AI Solutions, announced the news on LinkedIn, noting that drivers may have idle time during the workday or want to earn extra income in the evening.
This new type of work includes reviewing photos, counting objects, classifying text, recording audio, and digitizing receipts, among other forms. Yethadka mentioned that these tasks will support Uber's global enterprise customers, helping them develop generative AI models or consumer applications.
Yethadka further said, "So far, these tasks have been completed by independent contractors outside the app. The initial results are very promising, and we look forward to expanding this service further." In the video she posted, she mentioned that this service could be rolled out globally.
Prabhjeet Singh, President of Uber India and South Asia, said that these new tasks have already been launched in 12 cities, and "tens of thousands of drivers" have started participating in what Uber calls "digital tasks."
Uber CEO Dara Khosrowshahi said on the company's August earnings call that Uber launched digital tasks because it has the core capability to assign tasks to earners around the world. "You will see a different type of earner, who will work on exciting AI developments around the world," Khosrowshahi said.
Separately, Uber announced on the same day that it operates a 350PB (petabyte) data lake and has been developing a tool called "HiveSync" to protect this data. The Uber Engineering team's announcement explained that Uber's data infrastructure previously ran across two data center regions for redundancy, but the second region went unused during normal operation, resulting in unnecessary costs.
Therefore, Uber launched the "Single Region Computing" (SRC) program, running all batch computing tasks within a single region and then replicating the data to the second region using HiveSync. HiveSync was developed by Uber starting in 2016 and now manages about 300PB of data stored in 800,000 Hive tables, replicating 8PB of data per day.
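To put those figures in perspective, the short Python sketch below works out what they imply on average. It is a back-of-the-envelope calculation, not anything from Uber's announcement: it assumes decimal petabytes (1 PB = 10^15 bytes) and replication spread evenly over 24 hours, and all variable names are illustrative.

```python
# Back-of-the-envelope arithmetic on the HiveSync figures quoted above.
# Assumptions (not from the source): decimal units (1 PB = 10**15 bytes)
# and replication traffic spread evenly across a full day.
PB = 10**15
GB = 10**9

managed_bytes = 300 * PB           # data managed by HiveSync
daily_replicated_bytes = 8 * PB    # data replicated per day
table_count = 800_000              # Hive tables under management

seconds_per_day = 24 * 60 * 60

# Average sustained replication throughput, in GB per second.
avg_throughput_gb_s = daily_replicated_bytes / seconds_per_day / GB

# Share of the managed data that is replicated each day.
daily_fraction = daily_replicated_bytes / managed_bytes

# Mean table size, in GB (real tables will vary widely around this).
avg_table_size_gb = managed_bytes / table_count / GB

print(f"Average throughput: {avg_throughput_gb_s:.1f} GB/s")
print(f"Daily share replicated: {daily_fraction:.1%}")
print(f"Mean table size: {avg_table_size_gb:.0f} GB")
```

Under these assumptions, the daily volume works out to roughly 93 GB per second sustained, or a little under 3% of the managed data each day. Actual traffic is likely burstier than an even average suggests.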
Uber said they plan to open source this replication service and continue developing new features to meet growing demands for scalability and low latency. HiveSync also plays an important role in Uber's process of migrating its batch data analytics and machine learning training systems to Google Cloud.
Key Points:
🌟 Uber launches data classification tasks for its drivers in India, supporting the development of AI models.
👥 Tens of thousands of drivers have already participated in this "digital task," available in 12 cities.
💾 Uber also revealed that it operates a 350PB data lake and introduced the data protection tool HiveSync.