ntStat manuscript published: k-mer characterization from raw sequencing data

We are pleased to announce that our manuscript, "ntStat: k-mer characterization using occurrence statistics in raw sequencing data", is now published in PLOS Computational Biology.

ntStat is a fast and memory-efficient toolkit for extracting k-mer occurrence statistics directly from raw sequencing reads. Using succinct Bloom filters, ntStat tracks both k-mer counts and depth, and models k-mer count histograms via evolutionary computation to infer key genomic properties de novo, including genome size, heterozygosity, and sequencing characteristics.
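To make the idea of k-mer counts and count histograms concrete, here is a minimal Python sketch. It is not ntStat's implementation: the class and function names (CountingBloomFilter, count_histogram, and so on) are hypothetical, the filter is a plain counting Bloom filter with 8-bit counters rather than the succinct structures described in the paper, and the histogram is built from exact counts for clarity.

```python
import hashlib
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(read, k):
    """Yield canonical k-mers from a read, skipping any that contain non-ACGT bases."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


class CountingBloomFilter:
    """Illustrative counting Bloom filter (hypothetical, not ntStat's data structure)."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = bytearray(size)  # 8-bit counters that saturate at 255

    def _positions(self, item):
        # Derive one position per seeded hash; a set avoids double-counting
        # when two hashes land on the same slot.
        positions = set()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            positions.add(int.from_bytes(digest, "little") % self.size)
        return positions

    def add(self, item):
        for p in self._positions(item):
            if self.counters[p] < 255:
                self.counters[p] += 1

    def count(self, item):
        # The minimum over all slots is an upper bound on the true count
        # (below saturation), since collisions can only inflate counters.
        return min(self.counters[p] for p in self._positions(item))


def count_histogram(reads, k):
    """Map each count value to the number of distinct k-mers with that count."""
    exact_counts = Counter(km for read in reads for km in kmers(read, k))
    return Counter(exact_counts.values())
```

In this sketch, count_histogram(["AAAAA"], 3) sees the single canonical k-mer AAA three times, so the histogram has one entry at count 3. A histogram like this, computed over a full read set, is the kind of input that model fitting would then work from.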

In benchmarks, ntStat delivers strong performance and reports more than 99.5% of k-mer counts correctly, while using less memory than existing tools and avoiding heavy disk usage. Its histogram analysis also enables accurate estimation of genomic parameters from both short- and long-read data.

ntStat on GitHub