A hallmark of NCI-designated cancer centers is that the science conducted at these institutions meets rigorous standards for transdisciplinary, state-of-the-art research. The resources and computing environments at these institutions encourage multidisciplinary teams to solve hard problems creatively and to implement solutions at a relatively fast pace. The NCI would like to leverage these scientifically rich environments. It invites teams of biomedical informaticians, data scientists, clinical researchers, and others to use Large (or Medium) Language Models (LLMs) to extract data relevant to cancer diagnosis, treatment, and related topics from unstructured clinical reports.
Large and medium language models (LLMs) are generative language models, typically made up of billions of parameters. They are trained on large quantities of unlabeled text using self-supervised or semi-supervised learning. LLMs and other generative technologies have made it possible for retrospective treatment and dosage data to contribute to models that aid modern treatment decisions. Specifically, Generative Pre-trained Transformer (GPT) technology can extract treatment and dosage data from unstructured clinical records. Similarly, LLMs could be used to extract cancer-relevant information from pathology reports. This extracted information can be analyzed alongside diagnosis, survival, outcome, and quality-of-life data to build a more complete picture of a cancer patient's journey. Solutions that focus on training medium language models for these tasks will also be accepted. The NCI maintains a list of common cancer treatments through the NCI Thesaurus.