Kingsley Dixon1, Jill Harrison2, Sandy Hetherington3, Joshua Mylne4 and Jingjing Zhang4.
1. Kings Park Science, Botanic Gardens and Parks Authority, Fraser Avenue, Perth 6005, Australia.
2. School of Biological Sciences, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK.
3. Plant Sciences Department, South Parks Road, Oxford, OX2 3RB, UK.
4. The University of Western Australia, School of Chemistry and Biochemistry & ARC Centre of Excellence in Plant Energy Biology, 35 Stirling Highway, Crawley, Perth 6009, Australia.
In October 2014 we reported transcriptome sequencing for the lycophytes Phylloglossum drummondii and Isoetes drummondii and the basal angiosperm representative Trithuria bibracteata (Figure 1) in a blog post (DOI: 10.13140/RG.2.1.4814.7445). We envisaged that sequence data would be useful for gene discovery in systematic and evo-devo studies because these lineages are sparsely sampled and occupy key taxonomic positions. Technical problems delayed the extraction of high-quality RNA, but these have now been overcome and the data are available from joshua.mylne@uwa.edu.au.
Figure 1: Phylloglossum drummondii, Isoetes drummondii and Trithuria bibracteata samples used for RNA extraction.
RNA was extracted using a phenol and lithium chloride preparation followed by a NucleoSpin clean-up. Sequencing libraries were generated from 300–1000 ng of purified total RNA using the TruSeq® Stranded Total RNA LT with Ribo-Zero Plant kit (Illumina). Approximately 100 million paired-end 151 bp raw Illumina reads were acquired for each species. Jingjing Zhang assembled the transcriptomes as described by Jayasena et al. (2014) using approximately 70 million clean reads for each species. Each species was assembled four times in CLC Genomics, each time with a different word size setting (20, 30, 40 and 60). The number of contigs assembled per transcriptome ranged from approximately 140,000–150,000 for word size 20 to 200,000–250,000 for word size 60.
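Contig count alone can be hard to interpret when comparing the four word-size assemblies, so a summary that also reports total length and N50 may be helpful. The following is a minimal sketch and not part of the pipeline described above: it assumes the assemblies have been exported as FASTA, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Return the N50: contigs at least this long hold half or more of all bases."""
    total = sum(lengths)
    running = 0
    for value in sorted(lengths, reverse=True):
        running += value
        if 2 * running >= total:
            return value
    return 0  # no contigs supplied


def summarise(lines: Iterable[str]) -> Dict[str, int]:
    """Summarise one assembly as contig count, total bases and N50."""
    lengths = list(fasta_lengths(lines))
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    # Toy input standing in for one exported assembly (hypothetical data).
    toy = [">c1", "ACGTACGTAC", ">c2", "ACGT", "AC", ">c3", "ACG"]
    print(summarise(toy))
```

For a real assembly, an open file handle can be passed directly to `summarise`, once per word size, and the resulting dictionaries compared side by side.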
Sandy Hetherington has undertaken a preliminary phylogenetic analysis of KNOX homeodomain genes, which demonstrates the utility of the data through the expected placement of gene homologues (Figure 2). The assemblies made with word size 30 were used for the analysis. Protein coding
regions were predicted using GeneMarkS-T (Tang et al., 2015). KNOX sequences were identified by a BLAST search using as queries KNOX proteins from Arabidopsis thaliana (At), Oryza sativa (Os), Selaginella moellendorffii (Sm), Physcomitrella patens (Pp), Chlamydomonas reinhardtii (Cr) and Ostreococcus tauri (Ot), based on the analysis by Mukherjee and colleagues (2009). Proteins were aligned using MAFFT (Katoh and Frith, 2012) and manually trimmed using BioEdit (Hall, 1999) to the 64 amino acids that constitute the conserved homeodomain. A maximum likelihood phylogenetic
analysis was carried out in RAxML version 8.0.5 (Stamatakis, 2014) with the PROTGAMMAAUTO protein model and 1000 rapid bootstraps. The tree was rooted on the closely related Arabidopsis BEL protein At_BEL1.
Figure 2: ML tree showing the position of Phylloglossum drummondii (Pd), Isoetes drummondii (Id) and Trithuria bibracteata (Tb) contigs identified from the transcriptomes.
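For readers who would like to script the trimming and tree-building steps rather than trim by hand, the sketch below shows one possible approach. It is an illustration under stated assumptions, not the procedure used here: the trimming in this analysis was done manually in BioEdit, the column offset passed to `trim_alignment` is hypothetical, the executable name `raxmlHPC` depends on how RAxML was built, and using `-f a` with `-o` is only one way to obtain 1000 rapid bootstraps and an At_BEL1 outgroup. The function only builds the command and does not run it.

```python
from typing import Dict, List


def trim_alignment(aligned: Dict[str, str], start: int, width: int = 64) -> Dict[str, str]:
    """Keep `width` alignment columns beginning at 0-based column `start`."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with sequences of equal length")
    if start < 0 or start + width > lengths.pop():
        raise ValueError("requested window falls outside the alignment")
    return {name: seq[start:start + width] for name, seq in aligned.items()}


def raxml_command(alignment_path: str, run_name: str, outgroup: str = "At_BEL1",
                  bootstraps: int = 1000, seed: int = 12345) -> List[str]:
    """Build (but do not run) a RAxML 8 command line for a rapid bootstrap analysis."""
    return [
        "raxmlHPC",              # assumed binary name; builds may differ
        "-f", "a",               # rapid bootstraps plus best-scoring ML tree search
        "-m", "PROTGAMMAAUTO",   # protein model named in the text
        "-p", str(seed),         # random seed for parsimony starting trees (arbitrary)
        "-x", str(seed),         # random seed for rapid bootstrapping (arbitrary)
        "-#", str(bootstraps),   # number of bootstrap replicates
        "-s", alignment_path,    # trimmed alignment file
        "-n", run_name,          # suffix for output files
        "-o", outgroup,          # outgroup taxon (assumed way of rooting)
    ]


if __name__ == "__main__":
    # Toy alignment (hypothetical): two flanking columns either side of a 64-column block.
    toy = {"At_BEL1": "--" + "K" * 64 + "--", "Pd_contig": "MM" + "R" * 64 + "MM"}
    trimmed = trim_alignment(toy, start=2)
    print({name: len(seq) for name, seq in trimmed.items()})
    print(" ".join(raxml_command("knox_hd.phy", "knox_hd")))
```

The returned list can be handed to `subprocess.run` once the trimmed alignment has been written to disk; the file and run names shown are placeholders.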
We hope that the data will be useful to the
evo-devo community and encourage potential users to get in touch with Josh for
access.
Hall T. 1999. BioEdit: a user-friendly biological sequence alignment
editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium
Series 41: 95–98.
Jayasena AS, Secco D, Bernath-Levin K, Berkowitz O, Whelan J, Mylne J. 2014. Next generation sequencing and de novo transcriptomics to study gene evolution. Plant Methods. DOI: 10.1186/1746-4811-10-34
Katoh K, Frith MC. 2012. Adding unaligned sequences into an existing alignment using
MAFFT and LAST. Bioinformatics 28: 3144–3146.
Mukherjee K, Brocchieri L, Bürglin TR. 2009. A comprehensive classification and evolutionary analysis of plant homeobox genes. Molecular Biology and Evolution 26: 2775–2794.
Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis
of large phylogenies. Bioinformatics 30: 1312–1313.
Tang S, Lomsadze A, Borodovsky M. 2015. Identification of protein coding regions in RNA transcripts. Nucleic Acids Research 43: 1–10.