Saturday, 30 March 2013

Bacterial genome annotation systems


You get a bacterial isolate. You sequence it. You manage to get some contigs after mucking around with some de novo assembly software. Now what? Annotation, of course! Your FASTA file is teeming with lifeless chunks of bacterial DNA yearning to be adorned with insightfully labelled features, so they can get some more attention from you, and maybe even be reunited with some old friends in Genbank/ENA. If this sounds familiar, then this blog post is for you.

What is genome annotation?

Genome annotation is the process of identifying features of interest on a genome sequence. Some of the features relevant to bacterial genomes are protein-coding genes, non-coding RNAs, and operons. Features can have all sorts of useful information associated with them in addition to their genomic location and feature type. For example, a protein-coding gene annotation could include items such as the predicted protein product, whether it has a signal peptide, a gene abbreviation and an enzyme classification number. The accuracy and richness of a genome annotation are important, and sometimes critical, to downstream biological interpretation.
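To make the idea of a "feature" concrete, here is a minimal Python sketch of one annotation record and how it might be written out as a GFF3 line. The class name, the `source` value, and the example gene details are all placeholders I made up for illustration; real annotation systems carry far more information and handle attribute escaping properly.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One annotated feature (hypothetical illustration, not a real library class)."""
    seqid: str          # contig or replicon the feature sits on
    type: str           # e.g. "CDS", "tRNA", "rRNA"
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+" or "-"
    attributes: dict = field(default_factory=dict)  # product, gene name, etc.

    def to_gff3(self, source: str = "example") -> str:
        """Format the feature as one tab-separated, nine-column GFF3 line.

        Assumption: attribute values contain no ';', '=', ',' or tabs,
        so no percent-escaping is attempted here.
        """
        phase = "0" if self.type == "CDS" else "."
        attrs = ";".join(f"{k}={v}" for k, v in self.attributes.items()) or "."
        columns = [self.seqid, source, self.type, str(self.start),
                   str(self.end), ".", self.strand, phase, attrs]
        return "\t".join(columns)


# Placeholder values, purely for illustration
gene = Feature("contig1", "CDS", 3, 17, "+",
               {"ID": "gene0001", "gene": "dnaA",
                "product": "chromosomal replication initiator protein DnaA"})
print(gene.to_gff3())
```

The location, type and strand are fixed fields, while everything else goes into a free-form attribute dictionary; this mirrors how the richer items (product, gene abbreviation, EC number) vary from feature to feature.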

In the old days, a basic ORF finder would be run over the contigs. Then the truly dedicated curators would comb over the ORFs, trim back to good-looking start codon sites, delete spurious-looking ORFs, and so on. Later, gene prediction software and BLASTX helped bootstrap this process further. Now there are various "automatic annotation" systems which do a reasonably good job. Manual refinement of the automatic annotation can then be done using curation applications.
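For a feel of what a "basic ORF finder" does, here is a rough Python sketch. It is not any particular historical tool: it scans all six reading frames and reports the longest start-to-stop stretch for each stop codon. The start codon set (ATG, GTG, TTG), the `min_codons` threshold and all the names are my own assumptions for illustration. It makes no attempt at the things real gene predictors do, such as scoring ribosome binding sites or resolving overlapping calls.

```python
from typing import Iterator, NamedTuple

# Assumed codon sets: the common bacterial starts and the three standard stops
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    start: int   # 1-based, inclusive, in forward-strand coordinates
    end: int     # 1-based, inclusive, includes the stop codon
    strand: str  # "+" or "-"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> Iterator[Orf]:
    """Yield ORFs running from the first start codon in a frame to the next
    in-frame stop. min_codons counts the start codon but not the stop.
    ORFs that run off the end of the contig are not reported."""
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None  # 0-based index of the current start codon
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None:
                    if codon in START_CODONS:
                        orf_start = i
                elif codon in STOP_CODONS:
                    if (i - orf_start) // 3 >= min_codons:
                        if strand == "+":
                            yield Orf(orf_start + 1, i + 3, "+")
                        else:
                            # map reverse-strand indices back to the forward strand
                            yield Orf(n - (i + 3) + 1, n - orf_start, "-")
                    orf_start = None


toy_contig = "CC" + "ATG" + "AAA" * 3 + "TAA" + "G"
for orf in find_orfs(toy_contig, min_codons=3):
    print(orf)
```

Because it always takes the first start codon it sees, this tends to over-extend the 5' end of genes, which is exactly why curators had to trim ORFs back to better-looking start sites by hand.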

Below I list the tools I am aware of for performing and curating bacterial genome annotation. If I've missed any please let me know and I will add them.

Web submission systems

Standalone systems

Curation systems

  • Manatee (web interface + database backend)
  • Artemis (local app, can be combined with Chado SQL backend)
  • Apollo (local app, can be combined with Chado SQL backend)
  • Wasabi (my old non-public awful CGI/Perl/make mess that we still use internally)


Beware of systems claiming to do "microbial" annotation. Most of them are only designed for annotating bacteria. They will perform poorly on viruses, fungi and other microbes.


  1. Without having any personal experience with it, RATT, Rapid Annotation Transfer Tool, from the Sanger Institute, may be useful:

  2. There is a new version of Apollo that is entirely web-based

    It's also well-integrated with MAKER

  3. Thanks Chris, I had forgotten about WebApollo! I spoke with Scott Cain about it at BOSC last year, and he said it was progressing well. I need a replacement for my aging in-house Wasabi system (bacteria only) and I will go back and look at WebApollo. Thanks for commenting.

  4. Thanks Lex for reminding me of RATT. I also never tried it, but it's something I need to put on my TO-DO list. Three of my colleagues are coming to Oslo this year, but I'm not able to come. Perhaps we will meet one day.

  5. In Briefings in Bioinformatics, a survey on this topic - "The automatic annotation of bacterial genomes" - was published recently

  6. Thanks for the link to Mick and Emily's paper, Igor.

    (I intended to include it in the post but forgot.)

  7. Hi Torsten, what about GenDB annotation?

    1. I'm embarrassed to say I have never come across GenDB. I went to the website, and it looks interesting. However, I cannot download any software, I have to email someone to get access, and the Demo database link does not work. Is it being actively maintained? (The publication was 11 years ago.)

  8. Hi Torsten, I tried RAST, BASys and MAKER, but I think only RAST is working. Do you have any idea about the others?

  11. I have a question: can I do structural annotation in RAST or IMG? My impression is that these web servers are for functional analysis.

    Thanks for the info and the future help