About event
Comparing biological samples through sequencing is a core task in bioinformatics analyses such as variant detection, differential expression analysis, and epigenetic peak calling. Standard approaches to sample comparison typically rely on mapping newly sequenced reads to a reference genome. However, this process suffers from mapping and reference biases, which arise when reads fail to align in complex genomic regions or when the reference genome is too divergent from the sample because of intrinsic variability or misassemblies. To address these challenges, k-mer-based approaches have been proposed as an alternative. While these methods partially mitigate mapping-related issues, they often produce results that are difficult to interpret and still depend on mapping for downstream analysis. In this study, we propose kdiff, a k-mer-based approach for comparing biological samples in the context of a reference genome without relying on sequence alignment. Instead of mapping reads, kdiff identifies genomic regions of interest by detecting k-mers with differential abundances between samples (differential k-mers). We demonstrate that our method effectively detects copy number variants in cancer genomes and remains robust to reference genome misassemblies. Additionally, we illustrate its utility in confirming telomere locations in noisy yeast nanopore sequencing data. By leveraging fast k-mer counting tools, we obtain results comparable to those of standard alignment-based methods while significantly improving computational efficiency.
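For readers unfamiliar with the idea of differential k-mers, the following is a minimal Python sketch of the general concept, not the kdiff implementation. The function names (`count_kmers`, `differential_kmers`), the fold-change threshold, and the pseudocount are all assumptions made for illustration. The abstract does not specify how kdiff scores differential abundance, and a real analysis would use a dedicated k-mer counting tool rather than an in-memory counter.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(counts_a, counts_b, min_fold=2.0, pseudocount=1.0):
    """Return k-mers whose abundance ratio between two samples is at least
    min_fold in either direction.

    The fold-change rule and the pseudocount are illustrative assumptions,
    not the scoring used by kdiff.
    """
    result = {}
    for kmer in counts_a.keys() | counts_b.keys():
        # Counter returns 0 for k-mers absent from a sample.
        ratio = (counts_a[kmer] + pseudocount) / (counts_b[kmer] + pseudocount)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            result[kmer] = ratio
    return result


if __name__ == "__main__":
    # Toy reads: the first sample carries extra copies of one sequence.
    sample_a = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]
    sample_b = ["ACGTACGT", "TTTTGGGG"]
    diff = differential_kmers(count_kmers(sample_a, 4), count_kmers(sample_b, 4))
    for kmer, ratio in sorted(diff.items()):
        print(kmer, round(ratio, 2))
```

In this toy setting, a k-mer that is over-represented in one sample hints at a copy number difference. Locating such k-mers in a reference genome is the step that, according to the abstract, kdiff performs without read alignment.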
The lecture is part of the Bioinformatics Seminar Series, which invites exciting speakers working on current, state-of-the-art problems in bioinformatics.