I now have two lines of evidence supporting recombination in H. influenzae. First, I cannot resolve the phylogenetic relationships among strains (the tree is star-like), even with whole-genome sequences. Second, pairs of sites have independent evolutionary histories, which indicates mosaicism within a locus (determined using the Phi test of Bruen et al. 2006). A star phylogeny on its own can reflect recombination, population expansion, or both. Taken together with the independent evolution of pairs of sites, however, it is good evidence that recombination occurs between strains of H. influenzae. Still, despite the evidence from phylogenetic methods and the Phi test, I would like to see at least one direct signature of recombination in the sequences themselves.
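The logic of the Phi test can be made concrete with a simplified, Phi-like statistic. This is a sketch under stated assumptions, not PhiPack:

- It keeps only biallelic, parsimony-informative sites without gaps.
- It scores each pair of sites with the four-gamete test. For two biallelic sites this reduces to a 0/1 incompatibility score. Bruen et al. use a refined incompatibility score that also handles multistate sites.
- It averages the scores over pairs of sites within a window.
- It assesses significance by permuting the order of sites.

Under recombination, sites close together should be more compatible than random pairs of sites. The observed statistic should then fall in the low tail of the permutation distribution. The function names and the default window of 100 are illustrative choices, not taken from the original analysis.

```python
import random

def informative_biallelic_sites(alignment):
    """Columns with exactly two nucleotide states, each carried by >= 2 sequences.
    Columns containing gaps or ambiguity codes are skipped (a simplifying assumption)."""
    sites = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        states = set(column)
        if len(states) == 2 and states <= set("ACGT"):
            if all(column.count(s) >= 2 for s in states):
                sites.append(j)
    return sites

def incompatible(alignment, i, j):
    """Four-gamete test: two biallelic sites are incompatible if all four
    combinations of states occur, so no tree explains both without homoplasy."""
    return len({(seq[i], seq[j]) for seq in alignment}) == 4

def phi_like(alignment, sites, window):
    """Mean incompatibility over pairs of informative sites at most `window`
    positions apart in the ordered list `sites`."""
    total, n_pairs = 0, 0
    for a in range(len(sites)):
        for b in range(a + 1, min(a + window + 1, len(sites))):
            total += incompatible(alignment, sites[a], sites[b])
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

def phi_like_test(alignment, window=100, n_perm=1000, seed=0):
    """Permutation test: shuffling site order breaks any link between physical
    distance and compatibility. Low observed values suggest recombination."""
    rng = random.Random(seed)
    sites = informative_biallelic_sites(alignment)
    observed = phi_like(alignment, sites, window)
    as_low = 0
    for _ in range(n_perm):
        shuffled = sites[:]
        rng.shuffle(shuffled)
        if phi_like(alignment, shuffled, window) <= observed:
            as_low += 1
    return observed, (as_low + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy mosaic: sites 0-9 group strains 1+2 vs 3+4, sites 10-19 group 1+3 vs 2+4.
    aln = ["A" * 10 + "G" * 10, "A" * 10 + "T" * 10,
           "C" * 10 + "G" * 10, "C" * 10 + "T" * 10]
    print(phi_like_test(aln, window=1, n_perm=200))
```

In the toy alignment, only the pair of adjacent sites spanning the boundary between the two halves is incompatible. The observed value is therefore far lower than almost any shuffled ordering.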
To find one, and to quantify the number of recombination events that have occurred during the divergence of each strain, I have been using a program called GENECONV (Sawyer 1999). The program identifies regions of an alignment where a pair of strains is more similar than expected, given the distribution of polymorphism across the rest of the gene, locus, or alignment. These regions indicate allele exchange between the two strains in the alignment (or between a strain outside the alignment and both of them). I will refer to these as INs. The program also identifies regions where a single strain is more divergent from the others than expected, given the same distribution of polymorphism. These indicate allele exchange between a strain in the alignment and a strain outside it; I will refer to these as OUTs.
When I used a whole-genome alignment of 13 strains as input and counted the recombinant nucleotides in the output, I got very similar totals for every strain. For INs, the totals ranged from 444,214 nt to 538,306 nt per strain. That is how many nucleotides in a given strain were involved in a recombination event, as either donor or recipient (we can never know which). For OUTs, the totals ranged from 122,124 nt to 128,546 nt per strain. That is how many nucleotides in a given strain were involved in a recombination event, probably as the recipient. As a control, I randomized the order of nucleotides in the input alignment, and the program then found no recombinant fragments. And when I inspect the regions of the alignment flagged by the program, I do see evidence of recombination.
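The idea behind INs can be illustrated with a toy version of an inner-fragment search. This is not GENECONV's algorithm, which uses its own fragment scoring and permutation scheme. It is a minimal sketch that assumes the simplest case, in three steps:

1. For each pair of sequences, look only at sites that are polymorphic in the whole alignment.
2. Find the longest run of consecutive polymorphic sites at which the pair agrees.
3. Compare the longest run over all pairs with the same quantity after shuffling the order of the polymorphic sites.

Shuffling destroys any spatial clustering of shared states. That is presumably also why a randomized alignment, as in the control, yields no significant fragments. All function names are hypothetical.

```python
import random
from itertools import combinations

def polymorphic_sites(alignment):
    """Indices of alignment columns carrying more than one state."""
    return [j for j in range(len(alignment[0]))
            if len({seq[j] for seq in alignment}) > 1]

def longest_agreeing_run(seq_a, seq_b, sites):
    """Longest run of consecutive entries of `sites` at which two sequences agree.
    Returns (run_length, first_column, last_column); columns are None if no run."""
    best = (0, None, None)
    run_len, run_start = 0, 0
    for k, j in enumerate(sites):
        if seq_a[j] == seq_b[j]:
            if run_len == 0:
                run_start = k
            run_len += 1
            if run_len > best[0]:
                best = (run_len, sites[run_start], j)
        else:
            run_len = 0
    return best

def max_pair_run(alignment, sites):
    """Longest agreeing run over all pairs of sequences."""
    return max(longest_agreeing_run(a, b, sites)[0]
               for a, b in combinations(alignment, 2))

def inner_fragment_test(alignment, n_perm=1000, seed=0):
    """Compare the observed longest run with runs after shuffling the order of
    polymorphic sites, which destroys any spatial clustering of shared states."""
    rng = random.Random(seed)
    sites = polymorphic_sites(alignment)
    observed = max_pair_run(alignment, sites)
    as_long = 0
    for _ in range(n_perm):
        shuffled = sites[:]
        rng.shuffle(shuffled)
        if max_pair_run(alignment, shuffled) >= observed:
            as_long += 1
    return observed, (as_long + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy alignment of three strains: strains 1 and 2 share a tract of 10 sites.
    cols = ["AAC"] * 10 + ["ACA", "CAA", "ACG"] * 10
    aln = ["".join(col[i] for col in cols) for i in range(3)]
    print(inner_fragment_test(aln, n_perm=200))
```

In the toy alignment, each pair agrees at exactly 10 polymorphic sites. Only strains 1 and 2 have those sites clustered into one tract, so only their run is unusually long relative to the shuffled orderings.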
Taken at face value, these results mean that all strains have similar numbers of nucleotides that entered the genome through recombination. But I am bothered by how similar the numbers are across strains, and I therefore find the result hard to believe. I think I need to learn a lot more about how GENECONV works.
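One cheap sanity check on the per-strain totals is the tallying step itself. The sketch below is hypothetical. It assumes the fragments have already been parsed out of the GENECONV output into (strains, begin, end) tuples; GENECONV's actual output format is not reproduced here. The strain names are placeholders. The sketch compares a naive sum of fragment lengths with a tally that merges overlapping fragments, so each position counts only once per strain. Because one strain can appear in many significant pairs, overlapping IN fragments could inflate a naive total.

```python
from collections import defaultdict

def naive_nt_per_strain(fragments):
    """Sum fragment lengths per strain without merging overlaps."""
    totals = defaultdict(int)
    for strains, begin, end in fragments:
        for strain in strains:
            totals[strain] += end - begin + 1
    return dict(totals)

def merged_nt_per_strain(fragments):
    """Count each alignment position at most once per strain by merging
    overlapping or adjacent fragments (1-based, inclusive coordinates)."""
    intervals = defaultdict(list)
    for strains, begin, end in fragments:
        for strain in strains:
            intervals[strain].append((begin, end))
    totals = {}
    for strain, ivs in intervals.items():
        ivs.sort()
        total = 0
        cur_start, cur_end = ivs[0]
        for begin, end in ivs[1:]:
            if begin <= cur_end + 1:  # overlaps or abuts the current block
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = begin, end
        totals[strain] = total + cur_end - cur_start + 1
    return totals

if __name__ == "__main__":
    # Hypothetical fragments: an IN lists both strains of the pair, an OUT lists one strain.
    frags = [(("strainA", "strainB"), 1, 100),
             (("strainA", "strainC"), 51, 150),
             (("strainD",), 201, 250)]
    print(naive_nt_per_strain(frags))
    print(merged_nt_per_strain(frags))
```

Here strainA falls in two overlapping IN fragments. The naive sum gives it 200 nt, but only 150 distinct positions are involved. Merging per strain, rather than per fragment, is what matches the phrase "nucleotides in one particular strain".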