Saturday, 25 April 2015

Flashback to 2008 - the Illumina GA "classic"


Throughout my bioinformatics career I have been fortunate to be around early adopters of the latest sequencing technologies. Back in 2008 I worked in the Dept of Microbiology at Monash University, which ran the Micromon sequencing service. Micromon purchased Australia's first Illumina Genome Analyzer. This was not a GAIIx or even a GAII; it was what we now call a "GA Classic", not much more than an original Solexa instrument with upgraded rubber bands and some new stickers on it!

The data

The GA Classic supported single-end 36-bp reads when we got it; people before us only had 18-bp reads! The instrument had 8 lanes and produced 8 FASTQ files after 10 days of waiting. Each file was named s_N_sequence.txt, where N was the lane number.

I found one of the first runs we did, dated March 2009. This data was for the small bacterium Pasteurella multocida PM70. In one lane we got a whopping 7,462,936 reads (36 bp) totalling 268,665,696 bp (about 269 Mbp) with an average base quality of Q30. Here's the FastQC plot:

Data of this length and quality would probably be filtered out in most modern pipelines! But for us it was surprisingly sufficient (about 100x coverage) to get a decent de novo assembly out of an early version of Velvet (using k=29 and k=31) and annotate it with an ancient predecessor of Prokka. And science pressed on.
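For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the read count, total yield and rough coverage fit together. The function names and the genome size are my own assumptions for illustration, not something recorded with the original run: I have assumed a PM70 genome of roughly 2.26 Mbp, which puts the coverage a little above the round 100x figure.

```python
import io

# Lane files as named by the instrument: s_1_sequence.txt ... s_8_sequence.txt
LANE_FILES = [f"s_{n}_sequence.txt" for n in range(1, 9)]


def fastq_stats(handle):
    """Count reads and bases in a FASTQ stream with four lines per record."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


def coverage(total_bases, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


# Figures quoted in the post for one lane
READS = 7_462_936
READ_LENGTH = 36
TOTAL_BASES = READS * READ_LENGTH  # 268,665,696 bp

# ASSUMPTION: approximate PM70 genome size, not stated in the post
ASSUMED_GENOME_SIZE = 2_257_487

if __name__ == "__main__":
    # Tiny in-memory FASTQ standing in for a real lane file
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n")
    print(fastq_stats(demo))
    print(f"{TOTAL_BASES:,} bp, ~{coverage(TOTAL_BASES, ASSUMED_GENOME_SIZE):.0f}x")
```

To run it on a real lane you would open one of the names in LANE_FILES and pass the file handle to fastq_stats instead of the in-memory demo.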

Historical aside

Many of you will recall the rollercoaster changes in FASTQ quality encoding that Illumina put us through over the years, something that led to me writing the first version of the Wikipedia FASTQ page, which was continually improved and extended by the amazing genomics bioinformatics community that exists out there.
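If you never lived through the encoding changes, the gist is that the same quality score was written as a different ASCII character depending on the offset in use: 33 for Sanger-style files and 64 for the older Illumina pipelines. Here is a small Python sketch of decoding under either offset. The function names are hypothetical, the offset guess is only a common heuristic rather than a reliable detector, and it ignores the fact that the earliest Solexa files used a slightly different score scale as well as the +64 offset.

```python
def phred_scores(quality_string, offset):
    """Convert a FASTQ quality string to integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


def guess_offset(quality_string):
    """Heuristic guess at the offset (an assumption, not a guarantee).

    Characters below ASCII 59 cannot occur in the +64 encodings,
    so their presence points to +33. Otherwise we guess +64, which
    can be wrong for a high-quality +33 file.
    """
    if min(ord(ch) for ch in quality_string) < 59:
        return 33
    return 64


if __name__ == "__main__":
    print(phred_scores("II", 33))  # 'I' is ASCII 73
    print(phred_scores("hh", 64))  # 'h' is ASCII 104
    print(guess_offset("!5I"))
```

The same Q40 base shows up as 'I' in a +33 file and as 'h' in a +64 file, which is exactly why mixing up the encodings caused so much grief.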

Here are some other words to reminisce over: ELAND, Bustard, IPAR, MAQ, SSAHA.


Australia owes a lot to Micromon and Scott Coutts, the (extremely patient) scientist who had to figure out the original GA and then start all over again after every subsequent upgrade. He played a major role in training junior Illumina engineers, who cut their teeth on this originally somewhat unreliable instrument before it was upgraded to a GAII and then a GAIIx (and eventually replaced by a MiSeq and NextSeq).

1 comment:

  1. GA is classic. It's really amazing that in no more than ten years the technology has changed so much. Now the HiSeq 2500 is cheaper and quicker for whole genome sequencing, but this does not mean that the GA will be forgotten.