Targeted Enrichment of Large Gene Families for Phylogenetic Inference: Phylogeny and Molecular Evolution of Photosynthesis Genes in the Portullugo Clade (Caryophyllales)
Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the “portullugo” (Caryophyllales), a moderately sized lineage of flowering plants (˜ 2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75-218 loci across 74 taxa, with ˜ 50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.