In a healthcare setting, being able to access data quickly is vital. For example, a sepsis patient’s survival rate decreases by 4% for every hour we fail to diagnose the species causing the infection and and intervene with an appropriate antibiotic regimen.
Typical genomic analyses are too slow. You transport DNA samples from the collection point to a centralized facility to be sequenced and analyzed in a batch process, which can take weeks or even months. Recently, nanopore DNA sequencers have become commercially available that stream raw signal-level data as they are collected and provide immediate access to them. However, processing the data in real-time remains challenging, requiring substantial compute and storage resources, as well as a dedicated bioinformatician. Not only is the process still too slow, it’s also failure-prone, expensive, and doesn’t scale.
We recently built out a proof of concept for genomics researchers and bioinformatics developers that highlights the breadth and depth of Google Cloud’s data processing tools. In this presentation we describe a scalable, reliable, and cost effective end-to-end pipeline for fast DNA sequence analysis built on Google Cloud and this new class of nanopore DNA sequencers.