AI Apps for Cancer Research

Center for Translational Data Science

Mentors

Aarti Venkat, Director of Clinical Informatics, Center for Translational Data Science

Anirudh Subramanyam, Senior Data Scientist & AI Research Professional, CTDS

About Us

The Center for Translational Data Science (CTDS) at the University of Chicago is a research center whose mission is to develop the discipline of translational data science and apply it to impactful problems in biology, medicine, healthcare, and the environment. We envision a world in which researchers have ready access to the data and tools they need to make data-driven discoveries that increase our scientific knowledge and improve quality of life. We architect ecosystems of large-scale commons of research data, computing resources, applications, tools, and services so that the broader research community can use data at scale to pursue scientific inquiry and accelerate discovery. Learn more at https://gdc.cancer.gov/, https://gen3.org/, https://stats.gen3.org/, and https://ctds.uchicago.edu/

Externship Description

AI apps for cancer research

The Center for Translational Data Science is a leader in building data commons, data meshes, and AI tools that support research in biology, medicine, and healthcare. It maintains the open source Gen3 data platform for developing and operating data commons and data meshes. Using Gen3, we operate some of the largest data commons in the world containing harmonized biomedical data. Over the past year, we have deployed the first “AI commons,” which showcases AI apps built over the Genomic Data Commons (GDC), the world’s largest cancer genomics data commons. For example, two of these AI apps, GDC cohort co-pilot and Query augmented generation (QAG), have been built to automatically infer cohorts and variant frequencies, respectively, from natural language queries. We have conducted initial pilot tests of the functionality of these tools. However, verification and validation checks still need to be put in place to ensure that the tools return accurate responses, and more extensive testing is needed.

If you are interested in AI, biomedical research, data sharing platforms, or quantitative biology, then this externship may give you a good grounding in those areas.

Specific Objectives

Externs will:

  • Deepen understanding of biomedical software systems
  • Work with the team to add automatic verification and evaluation frameworks for responses
  • Work with the team to expand the testing on cross-modality queries
  • Learn software development best practices and gain a deeper understanding of AI
  • Receive one-on-one instruction and attend weekly meetings, frequent check-ins, and discussions

Qualifications

Required:

  • Knowledge of cancer biology/oncology and informatics to understand the data, metadata and architecture of the GDC
  • Ability to formulate relevant natural language queries for tool testing
  • Python
  • Machine learning, deep learning, and large language models (LLMs)