Abstract Detail

Phylogenomics

Hearn, David [1], Hannon, Dylan [2], Poulsen, Travis [3], Cronn, Richard [4], Liston, Aaron [5].

RNA-Seq phylogenomic pipeline for non-model plants.

Next-generation RNA-Seq technologies provide phylogenomic data at an unprecedented scale for non-model organisms. We developed an automated analysis pipeline that focuses on the needs of plant systematists while minimizing the computational burden of NGS data. As input, the pipeline requires raw sequence read files from the Illumina platform, metadata about these files, one or more reference proteomes from species that are potentially distantly related to ingroup taxa, and a set of reference gene locus names whose homologs will be identified in the exomes of ingroup taxa. The pipeline checks the quality of sequence reads, trims low-quality regions, assembles the reads, translates the assembled contigs, generates models of homology among reference genes, queries the translated contigs for homologs of the reference genes, aligns the retrieved amino acid sequences, and reverse-translates the amino acid alignments back to nucleotide alignments. In addition to the concatenated alignments, alignments are provided for each of the loci separately for use by gene tree/species tree reconciliation software. We apply this pipeline to infer the phylogeny of 33 ingroup species in the non-model genus Adenia (Passifloraceae) along with 8 outgroup taxa from the rosid clade with fully sequenced genomes. Our analyses started with 1243 reference gene loci from Arabidopsis. This set was reduced to 477 loci after stringent filters removed loci with high sequence heterogeneity among species, low coverage across species, or high levels of within-genome duplication. The automated pipeline generated a 432,886-length, quality-masked, concatenated amino acid alignment with 75,798 parsimony-informative characters. This alignment provided the basis to infer maximum parsimony, maximum likelihood, and Bayesian estimates of phylogeny.
The inferred phylogenies provide high support for previous hypotheses of relationships in Adenia based on internal transcribed spacer sequence data, yet some nodes remain unresolved in clades with rapid rates of speciation. The methodology favors highly conserved loci, and we are investigating techniques to identify appropriate, quickly evolving loci from the RNA-Seq data to further resolve the few nodes with lower support. This approach is generally applicable to any plant group, is based on freely available software, and comes at a sample preparation and sequencing cost of $100-$200 per sample.
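
The reverse-translation step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `reverse_translate`, the toy taxon names, and the toy sequences are all hypothetical, and the sketch assumes each coding sequence is in frame and matches its protein residue for residue.

```python
def reverse_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment consumes the next codon of the
    coding sequence; each alignment gap becomes a three-character gap.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the aligned protein")

    codons = []
    pos = 0  # index of the next unused nucleotide in the coding sequence
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical toy data: two taxa, one of which has a single-residue gap.
protein_alignment = {"taxonA": "MK-L", "taxonB": "MKTL"}
coding_sequences = {"taxonA": "ATGAAACTG", "taxonB": "ATGAAGACCCTG"}

nucleotide_alignment = {
    name: reverse_translate(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}

for name, row in nucleotide_alignment.items():
    print(name, row)
```

Because gaps are inserted in multiples of three, the resulting nucleotide alignment keeps codons intact and has exactly three columns per amino acid column.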

Broader Impacts:

1 - Towson University, 8000 York Road, Towson, MD, 21252, USA
2 - Huntington Botanical Garden, 1151 Oxford Road, San Marino, CA, 91108, USA
3 - Towson University, Biological Sciences, 1800 Rambling Rdg Ln, Apt. 202, Pikesville, MD, 21209, USA
4 - USDA Forest Service, 3200 SW Jefferson Way, Corvallis, OR, 97330, USA
5 - Oregon State University, Department of Botany & Plant Pathology, 2082 Cordley Hall, Corvallis, OR, 97331-2902, USA

Keywords:
Adenia
RNA-Seq
Phylogenomics
Non-model species
Next-generation sequencing.

Presentation Type: Oral Paper: Papers for Topics
Session: 36
Location: Magnolia/Riverside Hilton
Date: Tuesday, July 30th, 2013
Time: 2:15 PM
Number: 36004
Abstract ID: 685
Candidate for Awards: Margaret Menzel Award