Tuesday, 16 April 2019

Ancestors

Figure 1
Introduction

We are still worrying at the problem of inheritance first noticed at reference 1: does Queen Elizabeth II carry genes from William the Conqueror, her well attested if distant ancestor? The outline of an argument that she does not, given by Reich on page 11 of reference 2, is plausible, but we have failed to join up the dots. We cannot move the argument from plausible to convincing. Not that this stopped us from moving rather further into this interesting book, on which we hope to report more fully in due course.

Nor are we satisfied with our argument at reference 1. For one, it assumes that chromosomes always splice between genes, moving genes around but not breaking them up. And while it is true that chromosomes do not splice at random (there are hot spots for splicing), we have found nothing which suggests that splicing does not cut across the genes of a chromosome. For two, it is an external argument, placing external limits on what can happen: a bit like saying that π is not a rational number, so no finite calculation, however clever, is going to come up with an exact value. We would prefer a more constructive argument, an argument which better illustrates the processes involved. But we have failed at formal analysis.

So over the past few days, we have resorted to simulation using Visual Basic (VB) in a small Microsoft Excel workbook, largely built by plagiarising some other bits of work, work which happened to contain useful machinery.

Once again, even allowing for the need to get all the bugs out of the simulation, we have been impressed with what simulation can do. A reasonably modern laptop can get through billions of calculations, enough, at least for present purposes, to give reassuringly stable results from (pseudo) random processes. This is much easier than formal analysis with paper and pencil, and something only available to amateurs for the last twenty years or so. We are reminded of a story about Soviet mathematicians and physicists being really good with paper and pencil because that was all they had. Our only real problem has been the behaviour of the Excel random number generator, which does not seem to be as random as one might have hoped; we have not taken the time to explore its behaviour in detail.

The model

We continue to neglect the possibility of mutation.

We continue to assume that parents are genetically independent, that two people who marry have no ancestors in common. A rather brave assumption in the case of our rather endogamous royal families, where there was, until recently at least, nowhere near enough outbreeding.

We next observe that if our humans have one copy of one chromosome, and each egg just takes one of the two available from the two parents, then not much mixing is going on and any particular human is most unlikely to have genetic material from any particular distant ancestor.

We next allow our humans to have two copies of the one chromosome and allow mixing at the time parents pass a chromosome to an egg, with this chromosome being built by splicing together one chunk from one copy and a complementary chunk from the other, as suggested in the left hand part of Figure 1 above. Any particular human is now much more likely to have genetic material from distant ancestors, with the price being that it may not be all that much.

We start with someone called Fred. The task is, working forward from Fred, to trace the fate of Fred’s two chromosomes in his various offspring, taking into account their mixing with chromosomes from large numbers of other people. For how many generations can one expect his descendants to actually carry some part of one of his chromosomes?

Features of our VB model:
  • Our humans have two copies of just one chromosome, considered as the unit real interval, [0,1]
  • At reproduction time, each parent builds a new version of that chromosome for inclusion in the egg by allowing exactly one splice between his or her two copies: the top part of one and the bottom part of the other (top left hand splice above), or vice-versa (top right hand splice above). Note that the numbers in that figure are there to suggest positions in the chromosome, without regard to the numbers of genes or anything else involved in real chromosomes
  • The position of the splice is uniformly distributed on the unit interval, using the VB Rnd function and Randomize statement. This improves on the assumption of reference 1 mentioned above
  • The direction of the splice is evenly distributed; that is to say evenly distributed between top left hand splices and top right hand splices
  • In any chromosome built by successive applications of this process, Fred’s chromosome will survive, if at all, as just one segment, possibly a small one, of our unit real interval, in just one of the two chromosomes of any particular individual. We call that segment the ancestral segment
  • We call the tracking of that ancestral segment through the generations, until it vanishes, an iteration
  • We have done many iterations, using the Excel random number generator to call the shots at each turn.
All this means, inter alia, that we can model the state of the chromosome of any of Fred’s descendants as a pair of reals in the unit interval: the start of the ancestral segment and the end of the ancestral segment. All we need to do is model the steady progression of the dilution of Fred’s genes, the remorseless shortening of the ancestral segment, as we move down through the generations. A massive simplification of the situation we started with, a simplification which can be modelled in just a few lines of VB code.

At any generation, looking to the next generation, we have just six cases to check: the two directions of the splice, each with the splice falling below, inside or above the ancestral segment.

In two of these cases, the ancestral segment vanishes, as is the case in the top right hand splice above. In two others, there is no change to the ancestral segment, as is the case in the top left hand splice above. In the last two, there is compromise and either a bit is lopped off the top of the ancestral segment (top right hand splice) or a bit is lopped off the bottom (top left hand splice), as can be seen by mentally sliding the green ancestral slice around the splice. It is this lopping which gives rise to the remorseless shortening mentioned above.
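The six cases can be sketched in a few lines. What follows is a minimal Python sketch, not the VB workbook itself: the function names `next_segment` and `random_step` and the flag `keep_low` are hypothetical, and which direction corresponds to the top left hand or top right hand splice of Figure 1 is left open.

```python
import random


def next_segment(start, end, x, keep_low):
    """Apply one splice at position x to the ancestral segment [start, end].

    keep_low=True: the child takes [0, x] from the chromosome carrying the
    ancestral segment; keep_low=False: the child takes [x, 1] from it.
    Returns the new (start, end), or None if the segment vanishes.
    (Names and conventions here are assumptions of this sketch.)
    """
    if keep_low:
        if x <= start:
            return None          # splice falls below the segment: vanishes
        if x >= end:
            return (start, end)  # splice falls above the segment: no change
        return (start, x)        # splice falls inside: upper end lopped off
    if x >= end:
        return None              # splice falls above the segment: vanishes
    if x <= start:
        return (start, end)      # splice falls below the segment: no change
    return (x, end)              # splice falls inside: lower end lopped off


def random_step(start, end, rng=random):
    """One generation: uniform splice position, evenly distributed direction."""
    return next_segment(start, end, rng.random(), rng.random() < 0.5)
```

Separating the deterministic case analysis from the random draw makes each of the six cases easy to check by hand, for example `next_segment(0.2, 0.6, 0.4, True)` lops the segment down to `(0.2, 0.4)`.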

Results

Results, with 10,000,000 iterations per run, seem robust, more or less identical to those with either 5,000,000 or 1,000,000 iterations. So the results of the simulation are clear enough, even if we have failed on the theory.

So, having generated 1,000,000 descendants, the table on the right hand side of Figure 1 above looks at the survival of Fred’s genes in those descendants. The rows of the table are the generations, corresponding in a rough way with time. The zero in line one says that all the children of Fred’s children carry some part of one of his chromosomes. This is the result of all those first generation children carrying a whole chromosome's worth of Fred’s material: the length of their ancestral segment is 1. So wherever the splices are made, all of Fred’s grandchildren will have some part of one of his chromosomes, with the length of the ancestral segment somewhere between 0 and 1, average 0.5. The line never stops there; the ancestral segment never vanishes there. But the ancestral segment vanishes in around a quarter of his great grandchildren. This is the meaning of line 2: 250,294 out of our 1,000,000 descendants of Fred.
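A table of this kind can be approximated with a short simulation. The sketch below is in Python rather than the VB of the workbook, and it uses Python's own random number generator, so its counts will not match Figure 1 exactly. The function names, the generation numbering (children 1, grandchildren 2, great grandchildren 3) and the `max_generations` guard are assumptions of the sketch.

```python
import random
from collections import Counter


def vanishing_generation(rng, max_generations=200):
    """Track one iteration; return the generation at which the segment vanishes.

    Generation 1 is Fred's child, who carries an ancestral segment of
    length 1. Numbering and the max_generations guard are assumptions.
    """
    start, end = 0.0, 1.0
    generation = 1
    while generation < max_generations:
        generation += 1
        x = rng.random()              # splice position, uniform on [0, 1)
        if rng.random() < 0.5:        # child takes [0, x]
            if x <= start:
                return generation     # segment vanishes
            end = min(end, x)         # unchanged, or upper end lopped off
        else:                         # child takes [x, 1]
            if x >= end:
                return generation     # segment vanishes
            start = max(start, x)     # unchanged, or lower end lopped off
    return max_generations


def tally(iterations, seed=0):
    """Count, over many iterations, the generation at which the segment vanishes."""
    rng = random.Random(seed)
    return Counter(vanishing_generation(rng) for _ in range(iterations))
```

With, say, `tally(200_000)`, the count for generation 2 should be zero and the count for generation 3 should come out close to a quarter of the iterations, in line with the table.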

The number of generations through which an ancestral segment survives appears to be capped at around 25, say 600 years or so allowing 25 years to the generation, rather less than what is needed to make it to Elizabeth II from William the Conqueror. Presumably it can go further, but only very rarely. So on this basis at least, the answer to the opening question seems to be no.

Figure 2
As an aside, we observe that the distribution of the length of the ancestral segment at its vanishing point is a decreasing one: if a is less than b, then the probability of length a is greater than that of length b, with this rule only breaking down at the very end, when numbers are very small. This is illustrated in Figure 2 left. We had expected the distribution to be smooth and orderly, but with the maximum occurring some way down, perhaps at a quarter, rather than at zero.
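The shape of this distribution can be checked independently. The following Python sketch (again not the VB workbook; the names `last_length` and `histogram` and the choice of ten equal bins are assumptions) records the length of the ancestral segment in the last generation before it vanishes and bins the results.

```python
import random


def last_length(rng):
    """Length of the ancestral segment in the last generation that carries it."""
    start, end = 0.0, 1.0             # Fred's child carries the whole interval
    while True:
        x = rng.random()              # splice position
        if rng.random() < 0.5:        # child takes [0, x]
            if x <= start:
                return end - start    # vanishes: report the length just lost
            end = min(end, x)
        else:                         # child takes [x, 1]
            if x >= end:
                return end - start
            start = max(start, x)


def histogram(iterations, bins=10, seed=0):
    """Counts of last lengths in equal-width bins over [0, 1]."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(iterations):
        index = min(int(last_length(rng) * bins), bins - 1)
        counts[index] += 1
    return counts
```

If the observation above is right, the counts returned by `histogram(100_000)` should fall away from the first bin to the last, with the largest count in the bin nearest zero.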

We were mindful that something odd turning up in the statistics is often a mistake. But diligent search has failed to reveal any such.

Notice also that the middle part of the table in Figure 1 extends downwards pretty much by division by two – which tells us that there is theory there, even if we can’t find it!
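A partial reading of this halving, offered as a sketch under the model's own assumptions and not as the missing theory. Write the ancestral segment as [s, e] with length L = e − s (symbols introduced here for illustration). The splice position is uniform on [0, 1] and each direction has probability one half, so the six cases have the following probabilities:

```latex
\begin{align*}
P(\text{vanishes})  &= \tfrac12\,s + \tfrac12\,(1-e) = \frac{1-L}{2},\\
P(\text{unchanged}) &= \tfrac12\,(1-e) + \tfrac12\,s = \frac{1-L}{2},\\
P(\text{lopped})    &= L .
\end{align*}
```

As the segment shortens, the chance of vanishing at the next generation approaches one half, which would be consistent with each row of the table being roughly half the one before. For grandchildren, where L is uniform on [0, 1], the average of (1 − L)/2 is one quarter, which agrees with the quarter of great grandchildren reported above.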

We suspect the rather too reliable gaps at the bottom of the table to be some artefact of Excel random numbers or rounding.

Notwithstanding, we then made the straightforward generalisation to 50 chromosomes, doing the same sums for each of them and taking the longevity of Fred’s genes as the maximum of the longevity of the ancestral segments in each of the 50 chromosomes. This does not seem to make very much difference, with the number of generations still being capped at around 30. So it remains most unlikely that our Queen is carrying any of William’s genes. She might be carrying genes which are the same as his, but that is not quite the same thing – rather, a nice philosophical point.
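The generalisation can be sketched as taking the maximum over several single-chromosome iterations. This is a Python sketch rather than the VB actually used; the function names are hypothetical, and the chromosomes are treated as independent, which is an assumption of the sketch.

```python
import random


def vanishing_generation(rng):
    """Generation at which one chromosome's ancestral segment vanishes.

    Generation 1 is Fred's child, carrying a segment of length 1.
    """
    start, end = 0.0, 1.0
    generation = 1
    while True:
        generation += 1
        x = rng.random()              # splice position
        if rng.random() < 0.5:        # child takes [0, x]
            if x <= start:
                return generation
            end = min(end, x)
        else:                         # child takes [x, 1]
            if x >= end:
                return generation
            start = max(start, x)


def longevity(rng, chromosomes=50):
    """Generation by which Fred's material has vanished from every chromosome.

    Chromosomes are simulated independently (an assumption of this sketch).
    """
    return max(vanishing_generation(rng) for _ in range(chromosomes))
```

Because the count of survivors roughly halves with each generation, taking the maximum over 50 chromosomes would be expected to add only a handful of generations, which fits the modest difference reported above.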

All of which is a tribute to the power of Visual Basic in Microsoft’s Excel on a laptop. We leave others to speculate on whether one of the other products around twenty years ago would have turned out as well, had they not been killed off by the power of Microsoft.

References

Reference 1: https://psmv4.blogspot.com/2019/03/counting-ancestors.html.

Reference 2: Who We Are and How We Got Here: Ancient DNA and the new science of the human past, David Reich, 2018.
