Short read sequencing

Over the last decade, technological advances have made DNA sequencing a routine and cost-effective method in many fields of life sciences research. The dominant technology today generates billions of short sequences, called “reads”, each consisting of 75-250 bases (the letters that make up the DNA sequence). We build high-throughput analysis methods to process large volumes of reads in diverse DNA sequencing projects, from high-profile international cancer genome mapping initiatives to the generation of reference genomes for non-model species.
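
To make the idea of a read concrete, here is a minimal sketch that tallies read lengths. It assumes the reads are stored in the common four-line FASTQ layout, which the text above does not specify. The function name `read_lengths` and the two example records are made up for illustration.

```python
from collections import Counter
from io import StringIO


def read_lengths(handle):
    """Yield the length of each read in a four-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield len(line.strip())


# Two made-up reads, for illustration only (real reads are 75-250 bases long).
example = StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGTAC\n+\nIIIIII\n"
)

# Count how many reads there are of each length.
length_counts = Counter(read_lengths(example))
print(length_counts)
```

Because the function is a generator, it streams through the input one line at a time, which matters when a file holds billions of reads.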

Long read sequencing

New technologies are now becoming available that capture information on long stretches of the input DNA, as either long reads or linked reads. Long read platforms can sequence over 100,000 base pairs per read, though with a very high error rate and low throughput. Linked read platforms can associate multiple reads across similar lengths, although the data contain many gaps. When coupled with bioinformatics tools that can leverage the rich information they provide, these new sequencing platforms will open new frontiers in health research. We develop specialized tools that quickly, accurately, and efficiently map, assemble and analyse long and linked sequence reads.

Spruce genomics

Spruce trees are Canada’s most significant forest resource. Spruces produce high-quality wood and fibre that are widely used in industry, and as the dominant species of Canada’s forests, they provide essential local and global ecosystem services. We are collaborating with multiple organizations across Canada to build genomic resources for these species.

Antimicrobial peptide discovery

Bacteria can rapidly evolve resistance to antibiotics, presenting a growing and very dangerous problem. In a “post-antibiotic era”, bacterial diseases would once again be untreatable, and many standard treatments, including surgical operations, could become unusable. To boost the search for new treatment options, we are focusing, in a collaborative project, on short proteins called antimicrobial peptides (AMPs), which are produced naturally by various animal and plant species. These host defence proteins can protect against infection or reduce the harm caused by an existing infection.

Alternative polyadenylation

The length of a coding transcript’s tail end (3’ UTR) can affect its stability, transport and translation, with important regulatory consequences. Alternative cleavage of transcripts at the 3’ UTR end (alternative polyadenylation) is difficult to study, often requiring specialized and relatively expensive experimental methods. To address this, we are building bioinformatics tools that work with standard transcriptome sequencing (RNA-seq) datasets. We are using our tools to catalogue alternative polyadenylation in large cancer cohorts.
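
As a small illustration of why the choice of cleavage site matters, the sketch below computes the 3’ UTR length implied by each candidate site. It is not the group's method. The function name `utr_lengths`, the coordinate convention and the example positions are all assumptions made for this example.

```python
def utr_lengths(stop_codon_end, cleavage_sites):
    """Return the 3' UTR length implied by each cleavage site.

    Assumed convention: positions are 0-based coordinates on the spliced
    transcript, and stop_codon_end is the index just past the stop codon.
    Sites at or before the stop codon are ignored.
    """
    return [
        site - stop_codon_end
        for site in sorted(cleavage_sites)
        if site > stop_codon_end
    ]


# Hypothetical transcript with a proximal and a distal cleavage site.
proximal_and_distal = utr_lengths(stop_codon_end=1200, cleavage_sites=[2300, 1450])
print(proximal_and_distal)
```

Here the same gene would yield transcripts whose 3’ UTRs differ in length depending on which site is used.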

Clinical bioinformatics

Substantial advances in healthcare economics can be realized by developing genomics technologies that detect variations and mutations in DNA and RNA in a way that allows (i) effective preventative care and/or (ii) efficient diagnosis and treatment. One technology that will enable this vision is high-throughput DNA and RNA sequencing, but it requires proper downstream data analysis and interpretation. To address this challenge, we are building and validating bioinformatics pipelines for clinical use. In collaborative projects, we are deploying these pipelines for cancer care and for diagnosing rare genetic diseases.