Microsoft and Paige have revealed the results of their collaboration to build what they present as the most advanced AI-powered cancer pathology tool. A research article published in Nature Medicine demonstrates how their “foundation model” for computational pathology, Virchow, can model the diverse patterns observed in pathology images because it was trained on over a million digitized slides, the most extensive set of its kind. Named in honor of Rudolf Virchow, a pioneer of modern pathology who proposed the first theory of cellular pathology, the model has best-in-class performance in biomarker prediction, cell identification, and pan-cancer detection.
But perhaps the most valuable aspect of Virchow is its pan-cancer detector. Built on a model trained on more data than any previous cancer diagnostic, it can perform at the level of tissue-specific clinical-grade models and outperform them in identifying some rare variants of cancer, all with less task-specific data and without any labels. Virchow's potential in high-impact applications where labeled training data are limited is therefore, in theory, unprecedented.
Applying computer vision to pathology
Digital histological preparations, also known as whole slide images (WSIs), are gradually replacing their analog counterparts in light microscopy examinations. They are the basis of efforts in computational pathology to aid disease diagnosis, characterization, and understanding through the application of artificial intelligence. In this developing field, where the first AI pathology system received FDA approval only in September 2021, a major aim is to extract from routine WSIs information that was previously out of reach, such as prognosis and therapeutic response.
These capabilities are possible because of the remarkable performance gains in computer vision. This area of AI, built for analyzing images, has been dramatically enhanced by the development of “foundation models,” which are self-supervised algorithms built on massive datasets. The significant upside of these models is that they do not require curated labels, which makes them generalizable and efficient in their use of training data. Building them, however, requires massive amounts of training data, at a scale not previously seen in computational pathology.
Virchow, a new “foundation” for cancer diagnostics
Last year, in early September, Microsoft and Paige, a provider of AI-driven pathology solutions dedicated to improving cancer research and treatment, partnered to develop a foundation model for clinical-grade computational pathology and the identification of rare cancers. The resulting AI model, Virchow, was trained using data from around 100,000 patients, equivalent to about 1.5 million H&E-stained WSIs obtained from Memorial Sloan Kettering Cancer Center (MSKCC). This dataset contains four- to ten-fold more images and 3,000 times more pixels than the datasets used to train commercially available AI models.
Virchow, which was initially previewed in a preprint in January of this year, was tested against various clinical-grade AI models. According to the Nature Medicine research article, Virchow’s performance generally matches these commercial models in pan-cancer detection and outperforms them in detecting rare cancers. This outcome is even more striking given that the pan-cancer model’s training dataset does not include the usual quality control and subpopulation enrichment of data and labels that are done for commercially available AI models.
Overall, Virchow unlocks the ability to accurately and precisely detect unusual histological variants of cancer and biomarker status, which is difficult to achieve with cancer- or biomarker-specific training due to the limited amount of associated training data. The results provide evidence that large-scale foundation models can be the basis for robust results in a new frontier of computational pathology.