Projects

Project 3: Identification of Contaminant Using Hypothesis Testing in Marker Gene and Metagenomics Data (Poster)

Background: Measurements of microbial communities are affected by contaminant DNA sequences that are not truly present in the sample. Decontam was introduced to identify contaminant sequences with a classification procedure based on the pattern that contaminants appear at high frequencies in low-concentration samples. However, it provides no false discovery rate control and no clear guidance to help users choose an interpretable threshold.

Results: We propose a hypothesis testing procedure, Tcontam, that detects contaminants using statistical p-values and controls the false discovery rate with a multiple testing correction procedure. We confirmed the validity of Tcontam using simulation. In a human oral dataset, Tcontam reports contaminants with the false discovery rate under control and has a low chance of classifying sequences with small sample sizes as contaminants.
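
To illustrate the false discovery rate step, here is a minimal sketch of how per-sequence p-values could be turned into contaminant calls. The abstract does not say which multiple testing correction Tcontam uses, so the Benjamini-Hochberg procedure here is an assumption, and the function names `benjamini_hochberg` and `call_contaminants`, the threshold `alpha`, and the example p-values are hypothetical and not taken from Tcontam itself.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used only as a stand-in; the source does not
    name the correction procedure that Tcontam applies.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def call_contaminants(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag a sequence as a contaminant when its adjusted p-value <= alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Hypothetical p-values, one per sequence, for illustration only.
    example_p = [0.001, 0.02, 0.03, 0.5]
    print(benjamini_hochberg(example_p))
    print(call_contaminants(example_p, alpha=0.05))
```

Unlike a raw score cutoff, `alpha` in this sketch has a direct reading: it is the targeted upper bound on the expected proportion of false positives among the sequences flagged as contaminants.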
