This summer we have an REU student from Amherst College who is sequencing Oenothera transcriptomes and learning both lab and computational techniques. I’ve dusted off the bioinformatics tutorials I made for our Pleurocarp Tree of Life meeting and given them some much-needed updates. I thought I would post them here in case they are useful to others. Some of the text is specific to our server (it’s called Treubia, after the liverwort genus), so some of the commands may need to be adjusted for your machine.
These tutorials assume you already have a bit of experience working on the command line. If not, check out my Intro To Command Line tutorial, which you can find on GitHub.
The first tutorial walks through the contents of an Illumina sequencing run, and guides you through how to quality check, trim, and gather basic statistics about your reads using FastQC and Trimmomatic.
Next, actually assembling the transcriptome using Trinity! The main website for Trinity has a good tutorial as well, but this expands on the tutorial with discussions about what the sequence names mean, gathering basic assembly statistics, and visualizing the assembly using IGV.
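To give a flavor of what "basic assembly statistics" can mean, here is a minimal Python sketch that computes N50 from a FASTA file of assembled transcripts. It is not taken from the tutorial; the function names `n50` and `fasta_lengths` are made up for illustration, and N50 is only one of several statistics you might report.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together hold at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no sequences at all


def fasta_lengths(lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length
```

To use it on a real assembly, open the FASTA file and pass the file handle in: `n50(list(fasta_lengths(open("Trinity.fasta"))))`, where `Trinity.fasta` is an assumed file name that you should swap for your own.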
Finally, a more advanced tutorial explains how to assess the transcriptome by comparing its sequences to known genomes using BLAST, how to translate your transcripts into proteins using TransDecoder, and how to annotate the transcriptome using Trinotate.
This tutorial also uses a Python script I wrote, “annotation_stats.py”, which takes BLAST results from your transcriptome against a proteome of your choice and compiles the transcriptome annotation statistics suggested by O’Neil and Emrich (Bioinformatics 2013). You can download “annotation_stats.py” from my GitHub site.
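If you are curious what summarizing BLAST results looks like in practice, here is a rough sketch. It is not “annotation_stats.py” and does not reproduce its statistics. It assumes BLAST tabular output (`-outfmt 6`, with the default twelve columns), and the function name `summarize_blast_hits` and the e-value cutoff of 1e-5 are my own choices for illustration.

```python
import csv


def summarize_blast_hits(lines, max_evalue=1e-5):
    """Summarize BLAST tabular output (-outfmt 6, default columns).

    Keeps the best-scoring hit per query below the e-value cutoff, then
    counts queries with a hit and the distinct subjects they matched.
    """
    best = {}  # query id -> (subject id, bitscore) of its best hit so far
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {
        "transcripts_with_hit": len(best),
        "unique_proteins_hit": len({subject for subject, _ in best.values()}),
    }
```

Pass it an open file handle for your BLAST output, and it returns a small dictionary of counts that you can print or write to a table.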
These tutorials were made in LaTeX, and if you want to modify them, please contact me and I can provide the source code!