• Updated: 2016-09-06 09:18:44
  • File type: PDF
  • File version: latest
  • Summary:

Gene finding for microbial genome and metagenome assemblies

Gene finding expertise for CLC Genomics Workbench

Accurate identification of protein-coding regions in metagenomic sequences is challenging. The MetaGeneMark plugin relies on an innovative bootstrap-like approach to overcome the parameter estimation problem that affects conventional gene finding algorithms when contigs are short and lack genomic context.

GENE PROBE Inc., the inventors of MetaGeneMark, have developed and refined algorithms for gene prediction in short anonymous sequences for more than fifteen years. The MetaGeneMark plugin is optimized for gene finding in bacterial genomes and metagenomes.




Gene Finding made easy

  • Full automation without variable parameter settings. All necessary parameters are selected automatically from the sequence information of the input data set.

  • Combine the MetaGeneMark plugin with the Extract Annotation tool and the BLAST tools of the CLC Genomics Workbench.

Designed for a wide range of microbial data types

  • Gene finding is supported for microbial genomes, metagenomes, and even metagenomes containing sequences of bacterial, archaeal, and viral (phage) origin. 

  • The plugin handles datasets ranging from a single sequence or contig of a few hundred nucleotides up to metagenome assemblies with several gigabytes of sequence. 


*Zhu W., Lomsadze A. and Borodovsky M.
Ab initio gene identification in metagenomic sequences.
Nucleic Acids Research, 2010, Vol.38, No.12, e132, doi: 10.1093/nar/gkq275

MetaGeneMark (metagenomic gene caller with unsupervised estimation of model parameters, a customized version with extended functions) is an ab initio genomic sequence analysis tool designed to predict intronless protein-coding regions in novel metagenomic and metatranscriptomic sequences. High-order parameters of the statistical models of protein-coding and non-coding regions are determined for each individual sequence by a heuristic method that essentially reconstructs the genomic context of a given short sequence (Zhu et al., 2010*). The MetaGeneMark core code implements the Viterbi algorithm for a hidden semi-Markov model.
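To give a feel for the decoding step, here is a minimal sketch of Viterbi decoding in Python. It is a simplification under stated assumptions: it uses a plain two-state hidden Markov model (not the hidden semi-Markov model named above), and the states, probabilities, and the helper name `label` are made up for illustration rather than taken from MetaGeneMark.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable state path for a plain HMM (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy parameters (assumptions): "C" = coding-like, GC-rich; "N" = non-coding-like, AT-rich
STATES = ("C", "N")
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "C": {"C": math.log(0.9), "N": math.log(0.1)},
    "N": {"C": math.log(0.1), "N": math.log(0.9)},
}
LOG_EMIT = {
    "C": {b: math.log(p) for b, p in {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}.items()},
    "N": {b: math.log(p) for b, p in {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}.items()},
}

def label(seq):
    """Label each nucleotide of seq with the most probable toy state."""
    return "".join(viterbi(seq, STATES, LOG_START, LOG_TRANS, LOG_EMIT))

if __name__ == "__main__":
    print(label("GCGCGCATATAT"))  # CCCCCCNNNNNN
```

A hidden semi-Markov model additionally models how long each state lasts (for example, gene length distributions), which this sketch leaves out.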

Modes of analysis implemented in MetaGeneMark include:

1. gene prediction in prokaryotic or phage metagenomes as well as metatranscriptomes (Genetic code 11)

2. gene prediction in metagenomes of yeast-like eukaryotes (having intronless genes), eukaryotic viruses, as well as eukaryotic metatranscriptomes (Genetic code 1).
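The choice of genetic code mainly affects which codons may open a gene. The sketch below is a naive open reading frame (ORF) scan in Python, not MetaGeneMark's statistical model. It assumes a reduced start codon set for each mode (ATG, GTG, TTG for genetic code 11; ATG only for genetic code 1), and the names `find_orfs` and `min_codons` are hypothetical. The stop codons TAA, TAG, and TGA are shared by both genetic codes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # identical in genetic codes 1 and 11
START_CODONS = {
    11: {"ATG", "GTG", "TTG"},  # assumption: common prokaryotic starts only
    1: {"ATG"},                 # assumption: canonical start only
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_orfs(seq, genetic_code=11, min_codons=2):
    """Naive six-frame ORF scan.

    Returns (strand, start, end, orf) tuples. Coordinates are 0-based,
    end-exclusive, and refer to the strand that was scanned.
    """
    starts = START_CODONS[genetic_code]
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon in starts:
                    orf_start = i  # keep the first start codon in this frame
                elif orf_start is not None and codon in STOP_CODONS:
                    # count codons before the stop codon
                    if (i - orf_start) // 3 >= min_codons:
                        orfs.append((strand, orf_start, i + 3, s[orf_start:i + 3]))
                    orf_start = None
    return orfs

if __name__ == "__main__":
    # A GTG-initiated ORF is reported under code 11 but not under code 1
    print(find_orfs("GTGAAACCCTAA", genetic_code=11))
    print(find_orfs("GTGAAACCCTAA", genetic_code=1))
```

A real gene finder scores candidate ORFs with coding and non-coding models instead of accepting every start-to-stop stretch.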