Jonas Mueller, Curtis Northcutt, and Anish Athalye uncovered a startling truth during their doctoral research at MIT: Ten of the most widely used datasets for computer vision, natural language, and audio were riddled with errors. The mislabels were easy to find: a freight car was tagged as a mobile home, a mushroom and pepperoni pizza was labeled as dough, and keyboards were misidentified as space bars. The implications were significant: erroneous data jeopardizes technology’s reliability, leading to artificial intelligence that “hallucinates” or misinforms.
The trio didn’t resort to the Herculean task of meticulously sifting through the data by hand. Instead, they developed an algorithm to check the data’s accuracy, and the results were eye-catching. After they published an academic study and released their algorithm as open-source code, Silicon Valley tech companies began knocking on their door.
“We had so many companies reaching out for enterprise support and additional features that we decided to launch a business around it. Today it’s become a popular open-source library for data-centric AI used by tens of thousands of data scientists in a variety of companies.”
— Jonas Mueller
Their company, Cleanlab, provides software that automatically finds and fixes errors in real-world data. It streamlines the laborious, error-prone, and expensive process of cleaning up data, allowing businesses to confidently turn raw data into reliable models and insights, with no coding required. It also automatically labels raw data while pinpointing potential errors and redundancies.
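For readers curious how software can pinpoint potential label errors, the following is a minimal Python sketch of one idea from confident learning, the research behind the open-source library: compare each example's given label with the class a model is confident about, using a per-class threshold. It is an illustration under simplifying assumptions, not Cleanlab's actual implementation. The function name `flag_label_issues` and the toy data are hypothetical, and the predicted probabilities are assumed to come from some already-trained classifier.

```python
def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of integer class labels (possibly noisy).
    pred_probs: list of per-class probability lists from any classifier.
    Simplified illustration only; not the library's real algorithm.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the average probability the model assigns to
    # class k across the examples that are labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    flagged = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose probability clears that class's threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            # Flag the example when the confident class disagrees
            # with the label it was given.
            if best != y:
                flagged.append(i)
    return flagged


# Toy data: example 2 is labeled class 0, but the model strongly favors class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(flag_label_issues(labels, pred_probs))
```

In this toy run, only the third example is flagged for review. A real workflow would obtain the probabilities from out-of-sample model predictions and would have a person or a further automated step confirm each flagged item.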
Cleanlab recently raised a $25 million Series A. The company is helping to usher in the era of data-centric AI, a discipline focused on the systematic refinement of data to bolster AI systems. With inaccurate or unreliable data, AI suffers. With clean data, AI works far better.
Cleaning up datasets has real implications for companies large and small. For example, an e-commerce company relying on inaccurate or unreliable data may misclassify its products, leading to items landing on the wrong web pages and a poor customer experience.
Cleanlab joins the LIFT Labs Accelerator
In the fall of 2023, Cleanlab spent six weeks in the Comcast NBCUniversal LIFT Labs Accelerator: Enterprise AI. Along with nine other startups, Cleanlab worked with tech and business leaders at Comcast to understand the practical challenges and opportunities within a global media and technology company.
The Cleanlab team found it particularly exciting to learn which AI and data use cases matter most to leaders across Comcast.
“Learning from inside Comcast was an incredible experience and helped us better understand how we can deliver the most meaningful value to companies to improve business outcomes,” said Mueller. “The accelerator streamlined the process of meeting with Comcast data scientists and helped us understand where to focus our energy so our product continues to hit the mark.”
Cleanlab’s Mission: AI for All
Although enterprise solutions are a big part of Cleanlab’s roadmap, the company envisions a future where any business, regardless of size or sector, can use Cleanlab to improve its AI workflows without first climbing the steep learning curve of complex algorithms or acquiring coding expertise.
“Our goal is to democratize AI and make it so anybody can use it to solve complex, pressing problems,” said Mueller. “These datasets can have huge flaws that are obvious to experts but a big blocker for folks without formal training. We want our software to help a business analyst or a non-technical person to use the incredible power of AI to improve business outcomes.”