CMU-CB-15-104
Computational Biology Department
School of Computer Science, Carnegie Mellon University




Comparative Genomics Reveals Forces Driving the Evolution
of Highly Iterated Palindrome-1 (HIP1) in Cyanobacteria

Minli Xu

March 2015

Ph.D. Thesis

CMU-CB-15-104.pdf


Keywords: Repetitive Sequence, Small Repeat, Cyanobacteria, Highly Iterated Palindrome-1, HIP1, Horizontal Gene Transfer, HGT, Tree Reconciliation, Notung, Temporal Feasibility

The Highly Iterated Palindrome-1 (HIP1) is a highly abundant octamer palindrome motif (5'-GCGATCGC-3') found in a wide range of cyanobacterial genomes from various habitats. In the most extreme genome, HIP1 frequency is as high as one occurrence per 350 nucleotides. This is rather astonishing considering that at this frequency, on average, every gene will be associated with more than one HIP1 motif. This high abundance is particularly intriguing, considering the important roles other repetitive motifs play in the regulation, maintenance, and evolution of prokaryotic genomes. However, although HIP1 was first identified in the early 1990s, its functional and molecular roles remain a mystery.

Here I present a comparative genomics investigation of the forces that maintain HIP1 abundance in 40 cyanobacterial genomes. My genome-scale survey of HIP1 enrichment, taking into account the background tri-nucleotide frequency in the genome, shows that HIP1 frequencies are up to 300 times higher than expected. Further analysis reveals that in alignments of divergent genomes, HIP1 motifs are more conserved than other octamer palindromes with the same GC content, used as a control. This conservation is not a byproduct of codon usage, since codons in HIP1 motifs are more conserved than the same codons found outside HIP1 motifs. HIP1 is also conserved on a broader scale. I predicted orthologs using the Notung software platform and compared enrichment of HIP1 motifs with control motifs across orthologous gene pairs. The similarity of HIP1 enrichment in orthologs is significantly higher than that of the control motifs. Taken together, my results provide the first evidence for the mechanism driving HIP1 prevalence. The observed conservation is consistent with selection acting to maintain HIP1 prevalence and rejects the hypothesis that HIP1 abundance is due to a neutral process, such as DNA repair. The evidence of selection thus suggests a functional role for HIP1. My analysis of the genome-wide spatial distribution of HIP1 suggests that the motif lacks periodicity, arguing against a role in supercoiling. The spatial distribution of HIP1 motifs in mRNA transcript data from Synechococcus sp. PCC 7942 reveals a significant 3' bias, which is suggestive of regulatory functions such as transcription termination and inhibition of exonucleolytic degradation. I conclude by discussing my findings in the context of cyanobacterial evolution and propose testable hypotheses for future work.
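
As an illustration of what an enrichment measure relative to background tri-nucleotide frequency could look like, the following is a minimal Python sketch. It is not the method used in the thesis, whose details are not given in this abstract. It assumes a second-order Markov-style estimate, in which the expected count of the octamer is the product of its overlapping tri-nucleotide counts divided by the product of its inner di-nucleotide counts. All function names are hypothetical. Because HIP1 is its own reverse complement, counting on one strand is assumed to be sufficient.

```python
from collections import Counter

HIP1 = "GCGATCGC"  # palindromic octamer: equal to its own reverse complement


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def kmer_counts(seq, k):
    """Count every k-mer (overlapping windows) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, motif):
    """Expected motif count from tri- and di-nucleotide counts.

    Assumption (not taken from the thesis): a second-order Markov-style
    estimate, E = prod(overlapping 3-mer counts) / prod(inner 2-mer counts).
    """
    tri = kmer_counts(seq, 3)
    di = kmer_counts(seq, 2)
    expected = 1.0
    for i in range(len(motif) - 2):      # all overlapping tri-nucleotides
        expected *= tri[motif[i:i + 3]]
    for i in range(1, len(motif) - 2):   # di-nucleotides shared by neighbours
        denom = di[motif[i:i + 2]]
        if denom == 0:
            return 0.0
        expected /= denom
    return expected


def enrichment(seq, motif=HIP1):
    """Observed / expected ratio; None when the expectation is zero."""
    expected = expected_count(seq, motif)
    if expected == 0:
        return None
    return count_overlapping(seq, motif) / expected
```

Applied to a genome sequence held as an uppercase string, `enrichment(genome)` returns the observed-to-expected ratio under this assumed background model, and `len(genome) / count_overlapping(genome, HIP1)` gives the average spacing between motif occurrences.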

195 pages

Thesis Committee:
Dannie Durand (Chair)
N. Luisa Hiller
Jeffrey Lawrence
Daniel Barker (University of St Andrews)

Robert F. Murphy, Head, Computational Biology Department
Andrew W. Moore, Dean, School of Computer Science


