Artificial Intelligence

AI Helps Humans Annotate Biological Data in Real-Time

New human-augmenting AI labeling system may accelerate biomedical research.

Posted October 26, 2021 | Reviewed by Kaja Perina

Source: Seanbatty/Pixabay

Machine learning, a branch of artificial intelligence (AI), is rapidly being deployed in the biotech, medical, health care, and life science industries. Labeling the massive data sets needed to train deep learning algorithms is a time-consuming, challenging task for humans. A new study published in npj Digital Medicine demonstrates how a Human-Augmenting Labeling System (HALS) can reduce the manual work of labeling data by over 90 percent and increase the quality of biological data annotation.

“Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models,” wrote the study authors affiliated with Salesforce AI Research, Stanford University, University of California, San Francisco, and Amsterdam University Medical Centers. “In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale high-quality annotations for AI models.”

The imaging of biological tissues and cells is central to scientific and medical research in biology. Digital images of these specimens have enabled the training of AI deep learning algorithms, especially for computer vision in radiology, histology (the study of the microscopic anatomy of cells, tissues, and organs), and diagnostics.

Recent advances in AI in biomedical and scientific research are largely due to supervised machine learning, in which the training data is labeled, as opposed to unsupervised learning, in which the algorithms must identify patterns in the data set without labeled training data. For example, an AI algorithm for oncology may be trained using a microscopy image data set where the tumor images are annotated with labels such as malignant or benign. Furthermore, according to the researchers, just one microscopy image can contain a gigabyte of visual data, and annotation requires specialized knowledge, which makes it difficult to scale.

To solve this challenge, the researchers created HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI that is able to learn annotations from humans in real time. The solution consists of the open-source SlideRunner labeling interface, the PanNuke data set of over 200,000 annotated cells, a pre-trained ResNet-18 for image classification, and Coreset for active learning.
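To give a rough sense of how an active-learning step can decide which cells a human should label next, here is a minimal sketch of k-center greedy selection, one common way to build a core-set. It is an illustration under stated assumptions, not the study's implementation: the function name `k_center_greedy` is hypothetical, and the toy 2-D points stand in for the feature vectors that a network such as ResNet-18 would produce.

```python
import math


def k_center_greedy(features, labeled_idx, budget):
    """Return up to `budget` unlabeled indices, each as far as possible
    from every point that is already labeled or already selected.

    Hypothetical helper for illustration only; not taken from the HALS code.
    """
    covered = set(labeled_idx)
    # Distance from every point to its nearest covered point
    # (infinite when nothing has been labeled yet).
    min_dist = [
        min((math.dist(f, features[j]) for j in covered), default=math.inf)
        for f in features
    ]
    chosen = []
    for _ in range(min(budget, len(features) - len(covered))):
        # Ask the human about the point least represented so far.
        pick = max(
            (i for i in range(len(features)) if i not in covered),
            key=lambda i: min_dist[i],
        )
        chosen.append(pick)
        covered.add(pick)
        # The new pick may now be the nearest covered point for others.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[pick]))
    return chosen


# Toy feature vectors: two tight clusters and one point in between.
toy_features = [(0, 0), (0.1, 0), (10, 10), (10.1, 10), (5, 5)]
# Index 0 is already labeled; request two more annotations.
next_to_label = k_center_greedy(toy_features, labeled_idx=[0], budget=2)
print(next_to_label)  # -> [3, 4]
```

In this toy run the selection skips the near-duplicate of the labeled point and asks for one cell from the distant cluster and one from the middle, which is the intuition behind spending a pathologist's time only on examples that add new information.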

“Using a highly repetitive use-case—annotating cell types—and running experiments with seven pathologists—experts at the microscopic analysis of biological specimens—we demonstrate a manual work reduction of 90.60%, and an average data-quality boost of 4.34%, measured across four use-cases and two tissue stain types,” reported the scientists.