Accurate identification of protein-coding regions in metagenomic sequences is challenging. The MetaGeneMark plugin relies on an innovative bootstrap-like approach to overcome the parameter estimation problem that conventional gene finding algorithms suffer from because contigs are short and lack genomic context.
GENE PROBE Inc., the inventors of MetaGeneMark, have developed and refined algorithms for gene prediction in short anonymous sequences for more than fifteen years. The MetaGeneMark plugin is optimized for gene finding in bacterial genomes and metagenomes.
Full automation without variable parameter settings. All necessary parameters are selected automatically based on the sequence information of the input data set.
Combine the MetaGeneMark plugin with the Extract Annotation tool and the BLAST tools of the CLC Genomics Workbench.
Gene finding is supported for microbial genomes and metagenomes, including metagenomes that contain sequences of bacterial, archaeal, and viral (phage) origin.
The plugin handles datasets ranging from a single sequence or contig of a few hundred nucleotides up to metagenome assemblies with several gigabytes of sequence.
MetaGeneMark (metagenomic gene caller with unsupervised estimation of model parameters, a customized version with extended functions) is an ab initio genomic sequence analysis tool designed to predict intronless protein-coding regions in novel metagenomic and metatranscriptomic sequences. High-order parameters of the statistical models of protein-coding and non-coding regions are determined for each individual sequence by a heuristic method that essentially reconstructs the genomic context of a given short sequence (Zhu et al., 2010*). The MetaGeneMark core code implements the Viterbi algorithm for a hidden semi-Markov model.
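To illustrate the decoding idea behind the Viterbi algorithm mentioned above, here is a minimal sketch for a plain two-state hidden Markov model. It is a simplification, not the MetaGeneMark implementation: a hidden semi-Markov model additionally models the duration spent in each state, which this sketch omits. The state names (`N` for non-coding, `C` for coding) and all probabilities are made-up toy values chosen only for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a plain HMM (log-space)."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy parameters (hypothetical): the "coding" state favors G/C, "non-coding" favors A/T.
STATES = ["N", "C"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "N": {"N": math.log(0.9), "C": math.log(0.1)},
    "C": {"N": math.log(0.1), "C": math.log(0.9)},
}
LOG_EMIT = {
    "N": {b: math.log(p) for b, p in {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1}.items()},
    "C": {b: math.log(p) for b, p in {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}.items()},
}

if __name__ == "__main__":
    labels = viterbi("AAAAGGGGGGAAAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print("".join(labels))
```

Working in log-space avoids numerical underflow on long sequences, which matters for contigs of realistic length.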
Modes of analysis implemented in MetaGeneMark include:
1. gene prediction in prokaryotic or phage metagenomes as well as metatranscriptomes (Genetic code 11)
2. gene prediction in metagenomes of yeast-like eukaryotes (having intronless genes), eukaryotic viruses, as well as eukaryotic metatranscriptomes (Genetic code 1).
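One practical difference between the two genetic codes is the set of permitted start codons. The sketch below is a naive forward-strand open reading frame scan, not MetaGeneMark's statistical method; it only shows how the choice of code can change which candidate regions exist at all. The function name `find_orfs` and the `min_codons` parameter are hypothetical, and the start codon sets are assumed from NCBI's translation tables 1 and 11.

```python
# Start codons per NCBI translation table (assumed from NCBI's genetic code listing).
START_CODONS = {
    1: {"ATG", "CTG", "TTG"},
    11: {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"},
}
# Tables 1 and 11 share the same stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, table=11, min_codons=3):
    """Return (start, end) pairs for forward-strand ORFs.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the start and stop codons.
    """
    seq = seq.upper()
    starts = START_CODONS[table]
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in starts:
                    start = i  # keep the first start codon in this frame
            elif codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    contig = "CCGTGAAACCCTAA"  # toy sequence with a GTG start in frame 2
    print("code 11:", find_orfs(contig, table=11))
    print("code 1: ", find_orfs(contig, table=1))
```

In this toy contig the only candidate begins with GTG, which is a start codon under genetic code 11 but not under genetic code 1, so the two modes report different results for the same input.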