Because it relies heavily on reads mapped with a gap as evidence for transcripts, the plugin is primarily developed for eukaryotic genomes. The proposed workflow for using the ab initio Transcript discovery plugin in combination with the existing RNA-Seq tool in the CLC Genomics Workbench is as follows:
· Run the large gap mapper using all your RNA-Seq reads and a genomic reference sequence.
· Run the transcript discovery algorithm on the resulting read mapping to predict transcripts and genes.
· Inspect the results and, if necessary, refine the settings and re-run the transcript discovery until it produces the desired result.
· Part of the result from the transcript discovery is a copy of the reference genome that includes the new transcript and gene annotations.
· Use this annotated copy as a common reference for measuring gene expression with the existing RNA-Seq tool in the Workbench.
If you have sequenced several samples that need to be compared, we suggest using the reads from all samples for the large gap mapping and subsequent transcript discovery. In this way, you can establish a common set of reference transcripts and genes that makes it possible to compare gene expression levels across samples (using the RNA-Seq tool in the CLC Genomics Workbench). The initial read mapping created by the large gap mapper is then no longer used and can be deleted, unless you wish to be able to go back and double-check the basis of the prediction.
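To illustrate why pooling reads from all samples helps, here is a minimal Python sketch that counts how many gapped reads support each candidate intron across samples. It is not the plugin's algorithm, and the Workbench does not necessarily expose data in this form: the SAM-style CIGAR strings (where `N` marks a gap), the function names `junctions` and `pooled_junction_support`, and the sample data are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: alignments are described by SAM-style CIGAR strings,
# where the N operation marks a large gap (a candidate intron).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def junctions(pos, cigar):
    """Return (gap_start, gap_end) for each N gap, 1-based inclusive."""
    found = []
    ref = pos  # current position on the reference
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            found.append((ref, ref + n - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref += n
    return found

def pooled_junction_support(samples):
    """Count gapped reads supporting each junction over all samples."""
    support = Counter()
    for reads in samples.values():
        for chrom, pos, cigar in reads:
            for start, end in junctions(pos, cigar):
                support[(chrom, start, end)] += 1
    return support

# Hypothetical reads: (chromosome, 1-based start, CIGAR)
samples = {
    "sample_A": [("chr1", 100, "50M200N50M"), ("chr1", 120, "30M200N70M")],
    "sample_B": [("chr1", 110, "40M200N60M"), ("chr1", 500, "100M")],
}
support = pooled_junction_support(samples)
for junction, count in sorted(support.items()):
    print(junction, count)
```

In this toy data, the same gap is supported by reads from both samples, so pooling gives it more evidence than either sample provides alone. Ungapped reads such as the last one contribute no junction evidence.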
The current release is a beta version with full functionality for single reads. If you have paired reads, they are treated as single reads. However, when you run subsequent RNA-Seq analysis to quantify expression across genes and transcripts, the full paired information is used.
Transcript Discovery 2.0 beta 16, released July 13, 2016
· Improved speed of Transcript Detection.
· Fixed bug where Transcript Detection could run out of stack space on large data sets.
· Transcript Detection now warns instead of failing when transcripts cannot be merged.