In July/August 2017 there was a bit of ‘disagreement’ on the social media platform Twitter, with some right-wing users attacking the BBC over an educational cartoon about Roman Britain, because some of the characters were shown with darker skin than others. Sides were taken in articles and postings by experts and others on the question of ethnic diversity in Roman Britain. The same issue has popped up again on occasion since.
Mary Beard, the classical historian, emphasised what is known about the diversity of the Roman presence in Britain, and came in for considerable abuse from the far-right. There was an intervention on the other side from Nassim Nicholas Taleb, the statistician, who made various assertions based on genetics. It’s not my purpose to discuss these arguments here: it does seem that both sides talked past each other to some extent, especially with regard to the meaning of ‘diversity’. You can find more about this here and here and here (and other articles).
There is just one particular point I am interested in here, and it arises from an intervention on Twitter from Taleb, which I responded to at the time.
@wmarybeard: this is indeed pretty accurate, there’s plenty of firm evidence for ethnic diversity in Roman Britain
@nntaleb: Historians believe their own BS. Where did the subsaharan genes evaporate? NorthAfricans were lightskinned.
Only “Aethiopians”, even then
@nntaleb: We have a clear idea of genetic distributions hence backward composition; genes better statisticians than historian hearsay bullshit
Trying to leave aside the left-right political positions and the racist motivations involved in the wider discussion, this particular argument seems to reflect an idea that supposed ‘hard science’ (in this case, statistical genetics) trumps ‘soft’ (history/archaeology). I do not want to get into discussion of ‘hardness’ and ‘softness’ here, but I shall try to analyse Taleb’s specific argument about genetic distributions.
There are three sets of data in this problem. One is the set of data on the genes of the living population of interest, and another is the set of genetic data on the historic (or prehistoric) population that the living population is being compared with. There is also another set of data: the written records, archaeological finds and all other items that comprise what Taleb dismisses as ‘hearsay bullshit’.
All of these sets have their own specific technical problems. The historical data certainly have problems of interpretation amongst themselves: dating of objects and authorship or precise meaning of documents, for example, may be uncertain.
But collecting genetic data also has considerable technical problems: for example, obtaining reliable data from living populations is much easier than it is from fossils or human remains, as there are typically fewer remains than there are available living humans. Also the DNA from human remains may have degraded over time or been contaminated with other DNA, such as from microbes, and needs careful separation.
There is an additional problem with the data from both living and past populations: we have to be reasonably sure that the sets we have are representative of the populations they are taken from, since in neither case can we analyse the DNA of every individual. The genome of an individual is ‘data’ or an ‘anecdote’ to the same extent as a single archaeological find or written record is. It doesn’t tell us anything about the population until it is put into context with all the other similar data. This is a particular case where statistics is used: it is a mathematical tool to help us make a (probabilistic) estimate of the genetic composition of the population that our data sets come from. This is not an automatic process: it involves making assumptions and using the right statistical method, so it has its own issues and uncertainties.
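To make the sampling point concrete, here is a minimal sketch in Python of how much uncertainty a small sample carries. It uses the Wilson score interval, a standard way of putting a confidence interval on a proportion. The sample sizes and carrier counts are invented for illustration, not taken from any real study, and the calculation assumes the genomes are a random sample of the population, which is exactly the assumption that is hardest to justify for ancient remains.

```python
import math

def wilson_interval(carriers, n, z=1.96):
    """Approximate 95% Wilson score interval for a population proportion.

    carriers: number of sampled individuals carrying the marker
    n:        total number of individuals sampled
    Assumes a simple random sample; it says nothing about sampling bias.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = carriers / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical figures: the same observed proportion (15%) in both samples.
ancient = wilson_interval(3, 20)       # 20 genomes from excavated remains
modern = wilson_interval(300, 2000)    # 2000 genomes from living volunteers

print("ancient sample: %.3f to %.3f" % ancient)
print("modern sample:  %.3f to %.3f" % modern)
```

Both samples show the same observed proportion, but the interval for the twenty ancient genomes is several times wider than the one for the two thousand modern genomes. And that is only the sampling error: if the remains come from one cemetery or one social class, no interval of this kind captures the resulting bias.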
We can think of the problem we are trying to solve here as a theory: how can we explain the genetic makeup of the modern population in terms of the genetic makeup of the past population at the period of history that is of interest? Looking at it this way, we can see that there is no ‘backward composition’ we can automatically use to derive one from the other, whether it uses statistics or any other mathematical techniques. The genetics of the modern population must depend on its history, as well as the scientific principles of genes. We need to know what happened to the population in the time that elapsed between the past and the present. Did certain groups migrate in or out of the population, was there mixing of different populations, was the population subjected to ethnic cleansing or genocide? These questions are very hard to settle. For example, to what extent did the invading Anglo-Saxons displace the British then settled in what is now England? Historians have good reason to believe that the Anglo-Saxons formed a new ruling class, and that some of the existing British were displaced by migration, but there is still dispute as to what proportions of the original population were killed or displaced.
So clearly we can’t ignore history here. The current genetic composition of a population is a result of both genetics and history. That history is part of the problem to be solved. Some of it will be attested by artefacts and documentation, some of it is purely hypothetical (and might be solved with assistance from the genetics). The point here is that the historical evidence cannot be dismissed: any theory, including reconstructing ancient populations, must reconcile all the relevant evidence, or it is simply inadequate. If it is contradicted by the evidence (and the evidence is not found to be defective), then the theory is false, and needs modification or replacement. The evidence here includes (at least) the available genetic data and the historical evidence, including written records and archaeological finds.
Note that I am not making claims about the actual history of Roman Britain here. I don’t have the necessary expertise in the technical fields. I am simply trying to analyse the problem, to see why genetics is not in itself adequate to the problem, and the history cannot be dismissed. Solving the problem necessarily needs the technical expertise of the geneticists, historians and any others with relevant knowledge.
To see why the history is essential, consider the following scenarios (not an exhaustive list) that might happen to a population of interest:
- The population remains isolated from any other.
- A small number of immigrants arrives, they seize power and become the ruling class, and eventually merge with the main population through interbreeding.
- A large number of immigrants arrives and merges with the main population through interbreeding.
- A foreign power occupies the country, using troops from other populations, but remains largely separate from the main population. In the end it (largely) leaves.
Each of these scenarios will leave its own genetic traces, as well as its own historical and archaeological evidence. Given the inadequacy of the evidence, the problem will probably never be finally settled, as there will always be anomalies, gaps in the data and unsolved questions.
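The four scenarios above can be sketched as a toy model in Python. Everything in it is an assumption made for illustration: a single genetic marker, a single pulse of mixing, no genetic drift or selection, and invented numbers for every frequency and admixture fraction. Under those assumptions the modern frequency of the marker is just a weighted average of the frequency in the original population and the frequency among the incomers.

```python
from dataclasses import dataclass

@dataclass
class History:
    name: str
    admixture: float      # share of modern ancestry from incomers (invented)
    incomer_freq: float   # marker frequency among the incomers (invented)

def modern_frequency(native_freq, history):
    """Forward model: weighted average of native and incomer frequencies."""
    return ((1 - history.admixture) * native_freq
            + history.admixture * history.incomer_freq)

NATIVE_FREQ = 0.02  # hypothetical marker frequency in the original population

scenarios = [
    History("isolated population", 0.0, 0.0),
    History("small ruling elite, later absorbed", 0.05, 0.60),
    History("large-scale immigration", 0.40, 0.0925),
    History("occupying army that stays separate and leaves", 0.0, 0.60),
]

for h in scenarios:
    print("%-48s %.4f" % (h.name, modern_frequency(NATIVE_FREQ, h)))
```

With these made-up numbers, the isolated population and the occupied one end up with the same modern frequency, because troops who do not interbreed leave no genetic trace however many of them there were. The small elite and the large immigration also end up with the same frequency as each other, because a few incomers with a common marker and many incomers with a rare one average out alike. Running the model forwards is easy; running it backwards from the modern frequency alone cannot tell these histories apart, which is where the written records and the archaeology come in.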
Statistics is a branch of mathematics. Strictly speaking, it is fit only for analysing distributions of pure numbers. Whenever statistics is used as a mathematical tool in solving questions about the real world, other restrictions apply. We are then not dealing with pure numbers but with physical entities (and the concepts we use to understand those entities). In the particular problem above, these entities include people and their genes. In the physical world, these entities are subject to other principles, including the human lifecycle and the physical processes of genetic combination.
Sadly, statisticians, however good they are at statistics, can lose sight of this fact, and claim authority for statistics that is simply not justified. A very good example of this is in the supposed debate over human-caused global warming, where statisticians have weighed in on the ‘sceptic’ side with statistical analyses that simply ignore the laws of physics. One statistician has called this ‘mathturbation’. Climate measurements are not simply numbers, they are properties (such as temperature or carbon dioxide concentration) of physical entities such as air or oceanic water, and our theories about them are part of physics.
In trying to understand things that happen in the real world, outside mathematical textbooks, you can’t ignore the technical experts and their knowledge: statistics may prove them wrong, but only when correctly applied to the subject data.
Nassim Nicholas Taleb
The involvement of Taleb in this debate was very strange. He seemed to glory in taking the part of extreme right-wing participants in (sometimes vicious) attacks on historians on Twitter. He derided Mary Beard’s academic credentials, and he frequently calls experts in other fields ‘bullshitters’. Yet his claim about genes and statistics reproduced above is itself bullshit, in the sense of not actually lying but giving the impression of having knowledge that he did not actually have.
Taleb’s background is as a statistician and a trader on financial markets. Financial markets are about the nearest thing in the real world to pure numbers, and mistakenly thinking that climate data are in some way similar to markets has led many people into error. Perhaps this confusion applies to other areas as well.
Taleb also seems to have a very thin skin, and has blocked me, and apparently many others who disagreed with him, on Twitter.