First, the region preceding the start codon. I've aligned the region between the start codon of sxy and the start codon of the upstream gene, recA; this region is approximately 372 bp long. One of the most striking differences between strains is that Rd seems to have a 52 bp deletion about 185 bp upstream of the sxy start codon (I call this a deletion in Rd because it is unique to Rd, and the other 12 strains are almost identical in this region). According to an old paper that mapped the transcription start site and other juicy details of this region (Zulty and Barcak 1995), this stretch may be a binding site for the integration host factor (IHF) protein, which facilitates the integration of phage genomes. However, my not-so-thorough literature search on IHF binding sites in H. influenzae didn't turn up much, so I will leave this alone for now. The deletion is also so far from the start codon that I don't know whether it would affect transcription of sxy. Another noteworthy result: some strains differ in the sequence of their -35 site, which isn't that surprising, as the consensus for this site is not that strict, if I remember correctly.
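The "unique to one strain" reasoning can be checked mechanically. Here is a minimal sketch, assuming the alignment is available as a dictionary of equal-length gapped strings; the function name `private_gaps`, the strain labels `strainA` and `strainB`, and the toy sequences are all made up for illustration and are not the real sxy upstream region.

```python
def private_gaps(alignment, gap="-"):
    """Find runs of alignment columns where exactly one strain has a gap.

    `alignment` maps strain name -> aligned sequence (all the same length).
    Returns {strain: [(start, end), ...]} with 0-based, end-exclusive columns.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must all have the same length")

    runs = {}
    current = None  # (strain, start column) of the gap run being extended
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(length + 1):
        if col < length:
            gapped = [name for name in names if alignment[name][col] == gap]
        else:
            gapped = []
        owner = gapped[0] if len(gapped) == 1 else None
        if current is not None and current[0] != owner:
            runs.setdefault(current[0], []).append((current[1], col))
            current = None
        if owner is not None and current is None:
            current = (owner, col)
    return runs


# Toy alignment (hypothetical sequences): only Rd is gapped at columns 4-7.
toy_alignment = {
    "Rd":      "ACGT----ACGT",
    "strainA": "ACGTTTGAACGT",
    "strainB": "ACGTTTGAACGT",
}
print(private_gaps(toy_alignment))
```

A gap found in a single strain is only suggestive of a deletion in that strain; the alternative (an insertion in the common ancestor of all the others) needs a tree or an outgroup to rule out.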
Second, the coding region. Overall, there is very little variation between strains in the coding region: any two strains differ from one another at ~10 positions at most. But what is most interesting is that there seem to be two alleles segregating. Even more interesting, the majority of the nucleotides that define each allele are nonsynonymous (i.e. they would cause a different amino acid to be incorporated into the protein). At first this may not seem so surprising, as a gene has more nonsynonymous sites than synonymous ones; if mutations happen at random, they will most often hit a nonsynonymous site. But because most such mutations change the amino acid sequence, they are likely to be deleterious and to be removed from the population by natural selection. One could argue that there simply hasn't been enough time for selection to purge these nonsynonymous mutations from the H. influenzae population. But I know from the analysis of sxy and other genes that there has been plenty of time for selection to purge such mutations, if they are indeed harmful. Therefore, this finding raises several questions: are the two alleles functionally different? Are they associated with differences in competence? And has selection maintained both alleles, or is one on the rise?
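Both halves of that argument are easy to sketch in code. The block below is a rough illustration, not the analysis I actually ran: it uses the standard genetic code, treats every single-base change as equally likely (an assumption), and the function names and toy sequences are hypothetical. The first function estimates what fraction of random point mutations in sense codons would change the encoded residue; the second classifies codon differences between two aligned in-frame alleles as synonymous or nonsynonymous.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nonsynonymous_fraction():
    """Fraction of single-base changes to sense codons that alter the product.

    Changes that create a stop codon are counted as altering the product.
    Assumes all nine possible single-base changes per codon are equally likely.
    """
    changed = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] != aa:
                    changed += 1
    return changed / total


def classify_codon_differences(seq_a, seq_b):
    """Count differing codons between two aligned, in-frame coding sequences.

    Returns (synonymous, nonsynonymous). Codons containing gaps or ambiguous
    bases are skipped. Differences are counted per codon, not per nucleotide.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy alleles (made up, not sxy): AAA->AGA changes K to R, CTG->CTA keeps L.
allele_1 = "ATGAAACTGGGT"
allele_2 = "ATGAGACTAGGT"
print(classify_codon_differences(allele_1, allele_2))
print(round(nonsynonymous_fraction(), 2))
```

Counting per codon is the simplest choice for alleles that differ at only a handful of positions; a proper dN/dS estimate would also normalise by the number of synonymous and nonsynonymous sites.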
It would be great to address some of these questions. I have data for the second one, but the other two are a little more difficult, especially since sxy is not an easy gene to work with.