In this project, we developed a predictive model to identify lung cancer cases using gene expression data provided by the NCI Genomic Data Commons (GDC), focusing on genes known to contribute to lung cancer. These genes were selected based on information from reputable biological databases, such as NCBI, the GDC, and UniProt, as well as published research on lung cancer. TPM (Transcripts Per Million) values for each of these genes were analyzed across approximately 2000 test samples, with each sample compared to all others to identify its closest match based on gene expression patterns.
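The closest-match comparison described above can be sketched as a nearest-neighbor search over per-sample TPM vectors. This is a minimal illustration, not the project's actual code: the distance metric (Euclidean), the `log2(TPM + 1)` transform, the function name `nearest_neighbor_labels`, and the toy values and labels are all assumptions introduced for the example.

```python
import numpy as np


def nearest_neighbor_labels(tpm, labels):
    """Return, for each sample, the label and index of its closest other sample.

    tpm    : array-like of shape (n_samples, n_genes) holding TPM values
    labels : sequence of n_samples class labels
    Assumption: similarity is Euclidean distance on log2(TPM + 1).
    """
    # Log-transform to dampen the effect of very highly expressed genes.
    x = np.log2(np.asarray(tpm, dtype=float) + 1.0)

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(x ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)

    # A sample must not be matched to itself.
    np.fill_diagonal(dist2, np.inf)

    nearest = np.argmin(dist2, axis=1)
    return np.asarray(labels)[nearest], nearest


# Toy data (hypothetical): 4 samples x 3 genes.
toy_tpm = [
    [100.0, 5.0, 0.0],
    [90.0, 6.0, 1.0],
    [2.0, 80.0, 40.0],
    [3.0, 75.0, 50.0],
]
toy_labels = ["tumor", "tumor", "normal", "normal"]

predicted, nearest_idx = nearest_neighbor_labels(toy_tpm, toy_labels)
accuracy = float(np.mean(predicted == np.asarray(toy_labels)))
```

Here each sample inherits the label of its nearest neighbor, so `accuracy` measures how often the closest match shares the sample's own label. With around 2000 samples the full distance matrix is small enough to hold in memory; a different metric (for example correlation distance) could be substituted if it fits the data better.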
The team:
Tech stack: