A very brief guide to converting E. coli gene IDs

E. coli has been a model organism for long enough for many of its genes to go by several different names. This isn't a terrible problem as long as we keep some well-organised databases, but even those databases have their own unique identifiers. The issue is compounded by the fact that genetic loci in different strains of E. coli all get different identifiers as well.

Please avoid using common names when creating lists of E. coli genes! They're easy to remember, but they don't make consistent, unique database identifiers. As this example shows, at least nine different common names could be used for the same gene.

Many names for the same thing.

Here are a few easy, E. coli-specific ways to handle and convert nearly any gene ID you may find.

  • ecoli.txt - This is an actively maintained list hosted by UniProt. It lists ordered locus IDs (b codes and JW codes) and their corresponding accession numbers for Swiss-Prot, UniProt, and EcoGene, along with a few common names.
  • UniProt's ID mapping tool - Useful for converting UniProt IDs to NCBI Gene IDs and vice versa. The other conversions can be hit-or-miss, especially with databases like BioCyc.
  • EcoGene mapping tool - Useful for converting EcoGene IDs to other identifiers. EcoGene IDs start with EG; don't confuse them with EchoBASE IDs, which start with EB.
  • PIR ID mapping tool - Yes, another ID mapping tool, including some common E. coli databases like EcoGene.
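
Before picking one of the tools above, it can help to work out which kind of ID you are holding. The following is a minimal Python sketch that guesses the ID type from its shape. The function name classify_gene_id and the labels are hypothetical, and the digit counts for b codes, JW codes, EcoGene IDs, and EchoBASE IDs are assumptions based on how these IDs commonly look, not official specifications. The UniProt pattern follows the accession format UniProt documents.

```python
import re

# Each pattern is an assumption about the usual shape of the ID,
# not an official specification from the database in question.
ID_PATTERNS = [
    ("b_number", re.compile(r"b\d{4}")),    # e.g. b0002 (assumed: 4 digits)
    ("jw_code", re.compile(r"JW\d{4}")),    # e.g. JW0001 (assumed: 4 digits)
    ("ecogene", re.compile(r"EG\d{5}")),    # e.g. EG10998 (assumed: 5 digits)
    ("echobase", re.compile(r"EB\d{4}")),   # e.g. EB0991 (assumed: 4 digits)
    (
        "uniprot",
        re.compile(
            r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
            r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
        ),
    ),
]


def classify_gene_id(identifier):
    """Return a label for the first pattern that matches the whole ID."""
    text = identifier.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(text):
            return label
    # Common names such as thrA fall through to here.
    return "unknown"


if __name__ == "__main__":
    for example in ["b0002", "JW0001", "EG10998", "EB0991", "P00561", "thrA"]:
        print(example, classify_gene_id(example))
```

Anything that comes back as "unknown" is probably a common name and is worth converting to one of the database identifiers before it goes into a gene list.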