Haplogroup Data

          Data Gaps: In some cases, I was unable to find a specific sampling of haplotypes for a given people cluster. In these cases I approximated the profile from the best available data. Sometimes this involved using a related cluster from a neighboring country whose population had been studied. For instance, if I had data for Pygmies in the Central African Republic but none for Burundi, I transferred these data from one country to the other. In other cases, for lack of known data, I had to use a haplogroup model covering a regional territory. For instance, certain Bantu clusters in central Africa are not reported, so I used a generic central African Bantu haplogroup identity. The specific sources used for each cluster are described here and the country-by-country calculations are included in the database for download. I have compiled a list of unknown Y-DNA haplogroup data on the Data Gaps page, in case any field genealogists out there are looking for an untapped population to sample or know of some completed study of which I am not aware.

          Sample Size and Randomness: Sample sizes vary from study to study. Some haplogroup information is reported on the basis of only a handful of individuals and thus may present an inaccurate picture of the true haplogroup profile of the population. The sample size, ‘n’, for each people cluster is included in the country-by-country calculation spreadsheets. The degree of randomness of the samples is also a potential source of statistical error.
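
To give a sense of how much uncertainty a small ‘n’ carries, the sketch below computes a Wilson score interval for an observed haplogroup frequency. This is an illustration only and not a calculation used in the study; the function name and the example counts are hypothetical, and the interval assumes a simple random sample, which the studies themselves may not satisfy.

```python
import math


def wilson_interval(carriers: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a haplogroup frequency.

    carriers: number of sampled men assigned to the haplogroup (hypothetical input)
    n:        total number of men sampled in the study
    z:        normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= carriers <= n:
        raise ValueError("carriers must lie between 0 and n")
    p = carriers / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# The same observed frequency (50%) from a small and a large sample.
small = wilson_interval(5, 10)
large = wilson_interval(500, 1000)
print(f"n=10:   {small[0]:.3f} to {small[1]:.3f}")
print(f"n=1000: {large[0]:.3f} to {large[1]:.3f}")
```

With ten individuals the interval spans roughly half of the possible range, while with a thousand it narrows to a few percentage points either side of the observed value.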

          Founder Effects/Distortions: There is an assumption made here that haplogroup identity for a given people cluster remains intact across migratory events. Once I determined the haplogroup profile for, say, the Greek cluster, I used the same profile wherever the Greek cluster appears. The assumption therefore is that the Greek cluster population of Australia shares the same haplogroup identity as the Greek cluster populations in Greece or in Mexico or anywhere else. This is not necessarily so, since the emigrants may not represent the entire haplogroup spectrum of the land of origin (i.e., a genetic bottleneck). Cultural or economic factors in the land of origin could lead to a higher proportion of certain sub-haplogroups migrating to another place and therefore founding a cluster exhibiting an entirely different proportion of haplotypes from that at home. In some cases I have attempted to incorporate this type of ‘founder effect’ into the data. For instance, since I know that most of the East Indian laborers brought by the British to the New World came from around Uttar Pradesh, India, I used data from that part of India when calculating for East Indian haplotypes in former British colonies in the New World.

          Lab Errors and Study Methodologies: Since I did no primary research on population haplogroups, this study is entirely dependent on the scientific literature. I have used a broad array of studies from many countries, some dating from as early as 1999, and there exists the possibility of laboratory error in their reported results. Likewise, I have depended on the methodologies used in these studies to select their sample subjects and to interpret the DNA data into haplogroup categories. As to the former, I have tried to opt for the largest and most robust sampling methodologies when given the luxury to choose between studies. As to haplogroup classification, I have made every effort possible to use the marker system (e.g., M60, P30, etc.) to confirm haplogroup designations. In studies using older classification systems I have updated the results according to the latest ISOGG tree, being careful to go no further than the original data will allow. So for instance, the Rosser (2000) classification scheme allows classification only to a certain level of granularity. When Rosser assigns a people cluster into his HG category 1, we can only interpret this as an R1b. The complete table of clade markers and their corresponding ISOGG letters is available here.

          Varying Levels of Haplogroup Granularity: As laboratory methods and classification trees have evolved over the past ten years, a more nuanced picture of sub-haplogroups has emerged. Today, we have complex trees involving numerous nested sub-haplogroups. This means that not all studies reach the same level of classification granularity. For instance, one study may have determined only that an individual haplotype belongs to the E1 group, while another may take its genetic testing all the way to the E1a1a1a1b level. The latter result means that the individual exhibits the E1 mutation plus a series of further mutations down to that sub-level. The former result could also be an E1a1a1a1b; it simply was not tested down to that level. What this means is that the comparison of sub-haplogrouping between studies is not possible at this time. In spite of this reality, it is still possible to analyze the data at the major haplogroup level: in the above example, both studies concur in the E and E1 mutations for these individuals. Therefore, there is enough cladistic consistency across studies to produce a global result at the major haplogroup or major sub-haplogroup level. As more testing is reported, greater data resolution should become possible.
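
The idea of comparing studies only at a shared level can be sketched as follows. This is a minimal illustration, not the procedure used to build the database: the function names are hypothetical, and the sketch assumes that all labels are long-form ISOGG names taken from the same version of the tree, so that a shorter label is an ancestor of any longer label that begins with it.

```python
import re
from collections import Counter

# A long-form label alternates levels: capital letter(s), digits, lowercase letter, ...
_LEVEL = re.compile(r"[A-Z]+|\d+|[a-z]+")


def split_levels(label: str) -> list[str]:
    """Split a label such as 'E1a1a1a1b' into its nested levels."""
    levels = _LEVEL.findall(label)
    if not levels or "".join(levels) != label:
        raise ValueError(f"unrecognised haplogroup label: {label!r}")
    return levels


def truncate(label: str, depth: int) -> str:
    """Cut a label back to at most `depth` levels (depth 1 is the major haplogroup)."""
    return "".join(split_levels(label)[:depth])


def is_compatible(a: str, b: str) -> bool:
    """True when one label could be a less deeply tested version of the other."""
    la, lb = split_levels(a), split_levels(b)
    shared = min(len(la), len(lb))
    return la[:shared] == lb[:shared]


def pooled_counts(labels: list[str], depth: int) -> Counter:
    """Count individuals from mixed-resolution studies at one common depth."""
    return Counter(truncate(label, depth) for label in labels)


print(truncate("E1a1a1a1b", 2))              # E1
print(is_compatible("E1", "E1a1a1a1b"))      # True
print(pooled_counts(["E1", "E1a1a1a1b", "R1b"], 1))
```

Truncating to the shallowest depth that every study reached discards detail from the better-resolved studies, which is the price of a consistent global result.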

Ecosocionomic Analyses

          Erroneous Association of Y-DNA with Race, Ethnicity, or Nationality: Y-DNA identity is not to be misconstrued as a surrogate for 'race', ethnicity, or nationality. Gender-linked markers are a completely distinct form of classification. One need only remember that the sex chromosomes form only a small fraction of the complete human genome and, as far as we know, have no direct impact on physiognomy. To understand this, consider a man from central Africa who passed on the B haplotype to a son whose mother was from Ethiopia. Now imagine that this son and all of his male-line descendants follow the same pattern (a B haplotype son has children with an Ethiopian woman). After countless generations a male descendant of this line probably looks like an Ethiopian, but his Y-DNA haplogroup is still the B prevalent in central Africa. This same scenario could be repeated between members of any racial or ethnic group, such that any haplogroup could theoretically be exhibited by members of any race, ethnicity, or nationality. For the same reason, members of the same 'race', ethnicity, or nationality could belong to different ancestral haplogroups, and often do. One need only observe the haplogroup diversity in any of the people clusters portrayed in this study to see the mechanism in action. People whom we could consider Scandinavian, for instance, are often members of the I or the R or the N haplogroups. As mentioned in the papers, the primary objective of this study is the exploration of associations between gender-linked ancestral markers and complex ecosocionomic indicators.

          Indicators and Values Equally Distributed Within Nations: When we use the haplogroup as the organizational principle for analysis, we are making the assumption that all haplogroups within a political group are equal participants in the underlying socioeconomic processes. Thus, if we say that the US, as a whole, has an ecological deficit of X and we know that 15% of the population in the US consists of haplogroup K, then we assume that 15% of X can be credited to haplogroup K. This type of assumption is common with ‘per capita’ type socioeconomic statistics and this study is no different in applying a 'per haplogroup' analysis.
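
The proportional crediting described above amounts to multiplying a national figure by each haplogroup's population share. The sketch below shows that arithmetic under stated assumptions: the function name is hypothetical, the national value of 100.0 is a placeholder standing in for X, and only the 15% share for haplogroup K comes from the example in the text.

```python
import math


def apportion(national_value: float, shares: dict[str, float]) -> dict[str, float]:
    """Credit a national indicator to haplogroups in proportion to population share.

    shares maps a haplogroup label to its fraction of the national population;
    the fractions are expected to sum to 1.
    """
    total = sum(shares.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"shares sum to {total}, expected 1.0")
    return {haplogroup: national_value * share for haplogroup, share in shares.items()}


# Placeholder national value; 'other' lumps together every haplogroup except K.
credited = apportion(100.0, {"K": 0.15, "other": 0.85})
print(credited)
```

Because every share is multiplied by the same national value, the credited amounts always add back up to that value; the method says nothing about whether the haplogroups actually differ in their contribution.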

          Non-Penetrance of Gender-Linked DNA: One of the hypotheses explored by this work is the relationship between ancestral haplogrouping statistics and other ecosocionomic statistics. The assumption, of course, is that gender-linked DNA is somehow able to ‘penetrate’ through to the level of phenotype, social structure, or behavior, either directly or indirectly. The null hypothesis is that gender-linked genes simply determine gender and then have no other impact on human attributes or aggregations. Studies exploring this hypothesis are beginning to appear, but the field is still nascent and mostly focused on physiological penetrance at this stage:

Association of gender-linked genes and immunity:

Yousefi S, et al. (2008) Catapult-like release of mitochondrial DNA by eosinophils contributes to antibacterial defense. Nature Medicine 14: 949-953. doi:10.1038/nm.1855

Association of gender-linked genes and specific neurons: 

Kimura K, et al. (2008) Fruitless and Doublesex Coordinate to Generate Male-Specific Neurons that Can Initiate Courtship. Neuron 59(5): 759-769. doi:10.1016/j.neuron.2008.06.007

Association of gender-linked genes and HLA typing: 

Thorsby E (2012) The Polynesian gene pool: an early contribution by Amerindians to Easter Island. Phil. Trans. R. Soc. B 367(1590): 812-819. doi:10.1098/rstb.2011.0319

Association of gender-linked genes and coronary artery disease: 

Mearns BM (2012) Coronary artery disease: Y chromosome link to CAD risk. Nature Reviews Cardiology 9: 187. doi:10.1038/nrcardio.2012.24

Association of genes with anthropology and genography: 

Novembre J, et al. (2008) Genes mirror geography within Europe. Nature 456: 98-101. doi:10.1038/nature07331

Association of cultural traits and socioeconomic indicators: 

Gorodnichenko Y, Roland G (2011) Individualism, innovation, and long-run growth. PNAS 108(Supp. 4): 21316-21319. doi:10.1073/pnas.1101933108

