Demographic information for each of the world’s nations was obtained from the Joshua Project’s database. This database identifies and quantifies distinct cultural, linguistic, and religious groups of people within each country and further agglomerates these people into broader ‘people clusters’. Using these data, I formed people cluster inventories for each nation. I started with the most numerous people cluster within each country and continued adding clusters in descending order of size until I had reached at least 97% of that nation's total population. I will use the Afghan case to illustrate this process. The Pashtun form the largest people cluster in the country, at 43.33% of the population. Next comes the Persian cluster at 26.98%, and then the Uzbeks at 9.58%. I continued with this process until reaching at least 97% of the total population. It should be emphasized that the people cluster level brings together numerous heterogeneous ‘people’. The Persian cluster in Afghanistan, for instance, consists of the Qizilbash, Afghani and Tajiki Persians, non-Afghani and non-Tajiki Persians, Pahlavani, and Warduji. I selected the cluster level as my level of granularity due to the lack of available genetic data at finer scales. The research presented in this report may be updated and refined as more precise data become available.
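The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build the inventories: the function name `build_inventory` and the data layout are invented for the example, and only the first three Afghan percentages come from the text. The remaining clusters are hypothetical placeholders added so that the shares sum to 100%.

```python
def build_inventory(cluster_shares, threshold=97.0):
    """Pick clusters from largest to smallest until their combined
    share of the national population reaches `threshold` percent."""
    inventory = []
    covered = 0.0
    ranked = sorted(cluster_shares.items(), key=lambda kv: kv[1], reverse=True)
    for name, share in ranked:
        if covered >= threshold:
            break  # coverage target already met; stop adding clusters
        inventory.append(name)
        covered += share
    return inventory, covered


# Shares in percent of national population.
afghanistan = {
    "Pashtun": 43.33,        # from the text
    "Persian": 26.98,        # from the text
    "Uzbek": 9.58,           # from the text
    "Hypothetical D": 8.00,  # placeholder, not real data
    "Hypothetical E": 6.00,  # placeholder, not real data
    "Hypothetical F": 4.00,  # placeholder, not real data
    "Hypothetical G": 2.11,  # placeholder, not real data
}

inventory, covered = build_inventory(afghanistan)
```

With these placeholder figures the loop stops once the sixth cluster has been added, because the running total first reaches 97% at that point, and the smallest cluster is left out.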
I soon discovered that some 1.5 billion people around the world live in ‘non-homeland’ countries. (The term homeland is defined here simply as the country housing the largest population of a given cluster; it makes no assertion regarding native, indigenous, or ancestral claims.) For instance, 55.1 million Anglo-Celts live in the United Kingdom, while 44.9 million Anglo-Celts live outside of the United Kingdom. In order to account for these diaspora populations, I also include in each national inventory many people clusters whose numbers within that country may not rise to a major proportion, but whose global diaspora populations are substantial. For instance, the data show that only about 76,000 people belonging to the Tajik people cluster live in Afghanistan, while the global Tajik diaspora numbers some 1.7 million Tajiks living outside Tajikistan. I therefore include the small Tajik cluster in the Afghan inventory. Using this method of inclusion, I have accounted for 99.6% of the world’s population and 98.2% of the world’s diaspora populations in the study.
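The homeland definition and the diaspora inclusion rule can be illustrated with a short Python sketch. Several things here are assumptions made for the example: the function names, the data layout, and above all the `min_diaspora` cutoff, since the text says only that the diaspora must be "substantial" and gives no number. The 76,000 figure for Tajiks in Afghanistan and the 1.7 million diaspora total come from the text. The Tajikistan population and the combined "other countries" entry are placeholders chosen so that the diaspora sums to 1.7 million.

```python
def homeland_and_diaspora(pop_by_country):
    """Homeland = country with the cluster's largest population.
    Diaspora = everyone in the cluster living outside that country."""
    homeland = max(pop_by_country, key=pop_by_country.get)
    diaspora = sum(pop_by_country.values()) - pop_by_country[homeland]
    return homeland, diaspora


def diaspora_additions(country, inventory, clusters, min_diaspora):
    """List clusters present in `country`, not already in its inventory,
    whose global diaspora is at least `min_diaspora` (assumed cutoff)."""
    additions = []
    for name, pop_by_country in clusters.items():
        if name in inventory or pop_by_country.get(country, 0) == 0:
            continue
        homeland, diaspora = homeland_and_diaspora(pop_by_country)
        if homeland != country and diaspora >= min_diaspora:
            additions.append(name)
    return additions


clusters = {
    "Tajik": {
        "Tajikistan": 6_000_000,                 # placeholder, not real data
        "Afghanistan": 76_000,                   # from the text
        "Other countries (combined)": 1_624_000, # placeholder remainder
    },
}

tajik_homeland, tajik_diaspora = homeland_and_diaspora(clusters["Tajik"])
extra = diaspora_additions(
    "Afghanistan",
    ["Pashtun", "Persian", "Uzbek"],
    clusters,
    min_diaspora=1_000_000,  # hypothetical threshold
)
```

Under these assumptions the Tajik cluster is added to the Afghan inventory even though its local population is small, which mirrors the example given above.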
Having completed the selection of people clusters within each national inventory, I set about finding Y-DNA genetic data in the published literature for each people cluster, a monumental task (sources). I have limited the boundaries of this study to Y-DNA information due to time constraints. I am very interested in complementing the work done here with a comparable mtDNA data set. It will be fascinating to see how closely, or not, the mtDNA evidence follows the contours of the Y-DNA results, but this particular task will have to be left to others to complete. I was often able to locate in the literature a specific report of Y-DNA haplotypes for a given people cluster. Occasionally I found more than one report for a cluster. In such cases I could either combine the data to form a larger sample or select the latest and most comprehensive report among them. As any researcher in this infant field is aware, the past ten years have seen a variety of haplogroup naming systems. This nomenclature confusion has been largely resolved by the ISOGG standardization (this paper adopts the 2011 ISOGG standards). However, when the lack of contemporary data required the use of earlier publications (meaning those older than about 8 years), I was often confronted with the task of converting from one of several naming systems into the current ISOGG standard. Finally, the demographic and Y-DNA data were combined into country-by-country summaries, providing a haplogroup census for each nation. The summaries are displayed as charts on the individual country pages.
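One plausible way to combine the two data sets is a population-weighted average of each cluster's haplogroup frequencies. The text does not spell out the exact calculation, so the sketch below is an assumption about how such a census could be computed, not a description of the method used. The function name, the rescaling of cluster shares to sum to one (needed because an inventory may cover less than 100% of the population), the cluster names, and all frequencies are invented for illustration.

```python
from collections import defaultdict


def haplogroup_census(cluster_shares, cluster_haplogroups):
    """Weight each cluster's haplogroup frequencies by its share of the
    inventoried population and sum them into a national profile."""
    total_share = sum(cluster_shares.values())
    census = defaultdict(float)
    for cluster, share in cluster_shares.items():
        weight = share / total_share  # rescale so weights sum to 1
        for haplogroup, freq in cluster_haplogroups[cluster].items():
            census[haplogroup] += weight * freq
    return dict(census)


# Hypothetical two-cluster nation; all numbers are invented.
shares = {"Cluster A": 60.0, "Cluster B": 40.0}  # percent of population
frequencies = {
    "Cluster A": {"R1a": 0.50, "J2": 0.50},
    "Cluster B": {"R1a": 0.25, "C3": 0.75},
}

census = haplogroup_census(shares, frequencies)
```

Because each cluster's frequencies sum to one and the weights are rescaled, the resulting national frequencies also sum to one and can be charted directly.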
Please continue reading about the risks, assumptions, and potential errors inherent in these data.