GENECONV takes pairs of aligned sequences and compares them to find regions that are more similar to each other than expected. These regions are considered candidate recombinant fragments. What worries me is how GENECONV determines the divergence expected between sequences. My output shows that, for some strains, over half of the genome is more similar than expected. To make matters worse, the strains with these large numbers of "recombinant" regions are the ones that phylogenetic analyses identify as close relatives. These results make me think that instead of finding recombinant fragments, GENECONV is finding regions of vertical descent.
It could be that these putative recombinant regions are fooling the phylogenetic reconstruction methods, so that these strains are placed together as close relatives only because they share so many regions through recombination. But if the majority of the genome is the result of sequence homogenization between two strains, aren't they each other's closest relatives anyway, whether through vertical or horizontal descent? Nevertheless, I need to sort out exactly what happens during a GENECONV run, starting with how it determines the similarity to be expected between two sequences. Does the method used to measure expected similarity take phylogenetic structure into account?
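
To make the worry concrete, here is a minimal Python sketch of what would happen if the "expected" similarity ignored phylogenetic structure. This is not GENECONV's test, and I am not assuming anything about how GENECONV actually computes its expectation. The sketch uses a deliberately naive expectation (the mean pairwise identity over all pairs in the alignment) and asks what fraction of sliding windows exceeds it for each pair. The function names, the window settings, and the toy sequences are all made up for illustration.

```python
from itertools import combinations

GAP = "-"


def identity(a, b):
    """Fraction of identical sites, skipping columns with a gap in either sequence."""
    compared = same = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        if x == y:
            same += 1
    return same / compared if compared else None


def window_identities(a, b, window, step):
    """Identity of each sliding window along an aligned pair."""
    values = []
    for start in range(0, max(len(a) - window, 0) + 1, step):
        value = identity(a[start:start + window], b[start:start + window])
        if value is not None:
            values.append(value)
    return values


def fraction_above_global(alignment, window=1000, step=500):
    """For each pair, return (overall identity, fraction of windows above the
    naive expectation). The expectation here is an ASSUMPTION for illustration:
    the mean identity over all pairs, with no phylogenetic correction."""
    pairs = list(combinations(sorted(alignment), 2))
    overall = {p: identity(alignment[p[0]], alignment[p[1]]) for p in pairs}
    valid = [v for v in overall.values() if v is not None]
    expected = sum(valid) / len(valid)

    result = {}
    for p in pairs:
        wins = window_identities(alignment[p[0]], alignment[p[1]], window, step)
        above = sum(1 for w in wins if w > expected)
        result[p] = (overall[p], above / len(wins) if wins else None)
    return expected, result


# Invented toy alignment: strainA and strainB are near-identical, strainC is divergent.
toy = {
    "strainA": "ACGTACGTACGT",
    "strainB": "ACGTACGTACGA",
    "strainC": "TCGAAGTTAGCT",
}

expected, result = fraction_above_global(toy, window=4, step=4)
print(f"naive expected identity: {expected:.3f}")
for pair, (overall, frac) in result.items():
    print(pair, f"overall={overall:.3f}", f"windows above expected={frac:.2f}")
```

On this toy alignment the closely related pair exceeds the naive expectation in every window, while the pairs involving the divergent strain never do, with no recombination anywhere. If the fraction of the genome that GENECONV flags in my real data tracks overall pairwise identity in the same way, that would point toward vertical descent rather than recombination as the explanation.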