Kingsley Dixon1, Jill Harrison2, Sandy Hetherington3, Joshua Mylne4 and Jingjing Zhang4.
1. Kings Park Science, Botanic Gardens and Parks Authority, Fraser Avenue, Perth 6005, Australia
2. School of Biological Sciences, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK.
3. Plant Sciences Department, South Parks Road, Oxford, OX2 3RB, UK
4. The University of Western Australia, School of Chemistry and Biochemistry & ARC Centre of Excellence in Plant Energy Biology, 35 Stirling Highway, Crawley, Perth 6009, Australia.
In October 2014 we reported transcriptome sequencing for the lycophytes Phylloglossum drummondii and Isoetes drummondii and the basal angiosperm representative Trithuria bibracteata (Figure 1) in a blog post (DOI: 10.13140/RG.2.1.4814.7445). We envisaged that the sequence data would be useful for gene discovery in systematic and evo-devo studies, because these lineages are sparsely sampled and occupy key taxonomic positions. Technical problems delayed the extraction of high-quality RNA, but these have now been overcome and the data are available from firstname.lastname@example.org.
Figure 1: Phylloglossum drummondii, Isoetes drummondii and Trithuria bibracteata samples used in RNA extraction.
RNA was extracted using a phenol and lithium chloride prep followed by a NucleoSpin clean-up. Sequencing libraries were generated from 300-1000 ng of purified total RNA using the TruSeq® Stranded Total RNA LT with Ribo-Zero Plant kit (Illumina). Approximately 100 million paired-end 151 bp Illumina raw reads were acquired for each species. Jingjing Zhang assembled the transcriptomes as described by Jayasena et al. (2014) using approximately 70 million clean reads for each species. Each species was assembled four times in CLC Genomics, each time with a different word size setting (20, 30, 40 or 60). The number of contigs assembled per transcriptome ranged from approximately 140,000-150,000 for word size 20 to 200,000-250,000 for word size 60.
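For readers who want to compare the four assemblies of a species, a minimal Python sketch of a per-assembly summary is given here. It assumes each assembly has been exported as a FASTA file of contigs. The function names and the choice of N50 as a summary statistic are illustrative assumptions and were not part of the analysis described above.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise_assembly(handle):
    """Return contig count, total length and N50 for one assembly.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half the total length.
    """
    lengths = sorted((len(seq) for _, seq in read_fasta(handle)), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "total_bp": total, "n50": n50}


# Hypothetical toy input standing in for an exported contig file.
demo_lines = ">contig_1\nACGTACGT\n>contig_2\nACGT\n>contig_3\nAC\n".splitlines()
demo_summary = summarise_assembly(demo_lines)
print(demo_summary)
```

In practice the same function could be applied to an open file handle for each word size, so that contig number and contiguity can be weighed against each other when choosing an assembly.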
Sandy Hetherington has undertaken a preliminary phylogenetic analysis of KNOX homeodomain genes, which demonstrates the utility of the data through the expected placement of gene homologues (Figure 2). The assemblies generated with word size 30 were used for the analysis. Protein coding regions were predicted using GeneMarkS-T (Tang et al., 2015). KNOX sequences were identified by a BLAST search using as queries KNOX proteins from Arabidopsis thaliana (At), Oryza sativa (Os), Selaginella moellendorffii (Sm), Physcomitrella patens (Pp), Chlamydomonas reinhardtii (Cr) and Ostreococcus tauri (Ot), based on the analysis by Mukherjee and colleagues (2009). Proteins were aligned using MAFFT (Katoh and Frith, 2012) and manually trimmed in BioEdit (Hall, 1999) to the 64 amino acids that constitute the conserved homeodomain. A maximum likelihood phylogenetic analysis was carried out in RAxML version 8.0.5 (Stamatakis, 2014) with the protein model PROTGAMMAAUTO and 1000 rapid bootstraps. The tree was rooted on the closely related Arabidopsis BEL protein At_BEL1.
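The trimming and tree-building steps can be sketched in Python for anyone wishing to repeat a similar analysis. This is a sketch under stated assumptions, not the procedure used here: the trimming was done by hand in BioEdit, so the column coordinates below are hypothetical, and the exact RAxML invocation was not recorded in this post. The binary name `raxmlHPC`, the random seeds, the file names and the use of `-f a` (rapid bootstrapping combined with a best-tree search) are all assumptions.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into an ordered {name: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def trim_alignment(records, start, end):
    """Keep alignment columns start..end (1-based, inclusive)."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError("sequences are not aligned to the same length")
    if start < 1 or end < start or (lengths and end > next(iter(lengths))):
        raise ValueError("column range falls outside the alignment")
    return {name: seq[start - 1:end] for name, seq in records.items()}


def raxml_command(alignment_path, run_name, outgroup="At_BEL1",
                  bootstraps=1000, seed=12345):
    """Build an argument list for a RAxML 8 rapid-bootstrap ML run.

    Assumed, not taken from the post: binary name, seeds and '-f a'.
    """
    return [
        "raxmlHPC",
        "-f", "a",                # rapid bootstrap plus best ML tree search
        "-m", "PROTGAMMAAUTO",    # protein model named in the text
        "-p", str(seed),          # parsimony random seed (assumed value)
        "-x", str(seed),          # rapid bootstrap random seed (assumed value)
        "-N", str(bootstraps),    # number of bootstrap replicates
        "-o", outgroup,           # root on the BEL1 outgroup
        "-s", alignment_path,
        "-n", run_name,
    ]


# Hypothetical toy alignment; real homeodomain columns must be found by eye.
toy = ">At_demo\n--MKRLA-\n>Pd_demo\nQQMKKLAT\n"
trimmed = trim_alignment(parse_fasta(toy), start=3, end=7)
command = raxml_command("knox_hd.fasta", "knox_hd")
print(trimmed)
print(" ".join(command))
```

The argument list is returned rather than executed so that it can be inspected first and then passed to `subprocess.run` on a machine where RAxML is installed.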
Figure 2: ML tree showing position of Phylloglossum drummondii (Pd), Isoetes drummondii (Id) and Trithuria bibracteata (Tb) contigs identified from transcriptomes.
We hope that the data will be useful to the evo-devo community and encourage potential users to get in touch with Josh for access.
Hall T. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41: 95–98.
Jayasena AS, Secco D, Bernath-Levin K, Berkowitz O, Whelan J, Mylne J. 2014. Next generation sequencing and de novo transcriptomics to study gene evolution. Plant Methods. DOI: 10.1186/1746-4811-10-34
Katoh K, Frith MC. 2012. Adding unaligned sequences into an existing alignment using MAFFT and LAST. Bioinformatics 28: 3144–3146.
Mukherjee K, Brocchieri L, Bürglin TR. 2009. A comprehensive classification and evolutionary analysis of plant homeobox genes. Molecular Biology and Evolution 26: 2775–2794.
Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313.
Tang S, Lomsadze A, Borodovsky M. 2015. Identification of protein coding regions in RNA transcripts. Nucleic Acids Research 43: 1–10.