Ethio Helix ኢትዮ:ሒሊክስ: Extensive Doctoral Thesis on Ethiopian Y and mtDNA

I was contacted earlier by Dr. Chris Plaster about a doctoral thesis on Ethiopian Y & mtDNA that was completed 2 years ago but had been embargoed until only about two months ago. As this is the first time I am coming across it, and since it is 204 pages long, I have not had a chance to go through it thoroughly. Suffice it to say that this is the most extensive work on Ethiopian NRY & mtDNA that I have seen to date, although the resolution leaves a lot to be desired. I will update this post as I read it more thoroughly over the next few days/weeks...

Variation in Y chromosome, mitochondrial DNA and labels of identity on Ethiopia

Some numbers and figures that caught my attention at first glance:

The Discussion section also has some interesting things to say, especially with respect to haplogroups A3b2 and J, but also about the remaining haplogroups found in Ethiopia.

UPDATE (11/27/2012) - Received higher-resolution results for a portion of the NRY data from Dr. Plaster; this work was carried out later and is not included in the thesis:

Link to Source Document

UPDATE2: Interactive Chart of Figure 3.2 (for improved legibility)
UPDATE3 (11/28/2012) - Analyzing Ethiopian E-M34 haplotypes.

One of the more curious results with respect to the NRY haplogroups found in this dataset is the high frequency (24.6%) of E-M34 in the Maale samples. Previously, Cruciani '04 (see here and here for details) had found E-M34 widespread in Ethiopia with a more Northerly concentration (Amhara - 24%, Ethiopian Jews - 14%, Oromo - 8%, Wolayta - 8%). This newer data, however, shows the opposite, i.e. a more Southerly concentration of E-M34 (Maale - 25%, Amhara - 13%, Oromo - 10%, Afar - 4%).

To explain the apparently lower diversity of Ethiopian E-M34 haplotypes relative to the ones found in the Near East, Cruciani '04 had also proposed that the lineage may have back-migrated to East Africa from the Near East, while not completely ruling out the possibility that it originated in East Africa.

Luckily, the Ethiopian E-M34 haplotypes found in this paper come with STR data: 23 independent 15-marker E-M34 haplotypes.

This gave me a chance to compare the diversity of these haplotypes with the non-Ethiopian E-M34 haplotypes that are available at the haplozone site. A compilation of 67-marker haplotypes for E1b1b1 and its subclades from this site can be downloaded from here. For this analysis only the E-M123 and E-M84 haplotypes are used.

The method I used to compare the haplotypes is the same as outlined in this previous blog post. The only difference is that I am constrained by the number of markers available to me, so I have used the following 14 markers to compare haplotype diversity/TMRCA:

DYS19 DYS388 DYS390 DYS391 DYS392 DYS393 DYS389I DYS389II DYS437 DYS438 DYS439 DYS448 DYS456 Y GATA H4

The marker DYS635 is unfortunately missing from the haplozone site and is therefore not included in the analysis.
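For readers unfamiliar with this kind of estimate, here is a minimal Python sketch of one common way to turn STR haplotypes into a TMRCA: take the median (or modal) repeat count at each marker as the putative ancestral haplotype, average the squared repeat differences from it, divide by the per-generation mutation rate, and multiply by the years per generation. This is not the script used for the results in this post; the function names, the toy haplotypes and the flat 0.002 mutation rate are all assumptions made purely for illustration.

```python
from statistics import median, mode


def ancestral_haplotype(haplotypes, how="median"):
    """Per-marker median or modal repeat count across all haplotypes."""
    pick = median if how == "median" else mode
    return [pick(column) for column in zip(*haplotypes)]


def tmrca_years(haplotypes, rates, years_per_generation=28, how="median"):
    """Average squared distance (ASD) from the putative ancestor,
    divided by the mutation rate per marker, averaged over markers."""
    ancestor = ancestral_haplotype(haplotypes, how)
    n = len(haplotypes)
    generations = []
    for m, (anc, rate) in enumerate(zip(ancestor, rates)):
        asd = sum((h[m] - anc) ** 2 for h in haplotypes) / n
        generations.append(asd / rate)
    return years_per_generation * sum(generations) / len(generations)


# Toy data: 4 haplotypes typed at 3 markers (made-up repeat counts).
toy = [[13, 12, 24], [13, 12, 25], [14, 12, 24], [13, 11, 24]]
# Assumed flat rate per marker per generation, NOT one of the published rate sets.
toy_rates = [0.002, 0.002, 0.002]

print(tmrca_years(toy, toy_rates, 28), tmrca_years(toy, toy_rates, 33))
```

Because the generation length only enters as a final multiplier in this sketch, the 33 years/generation estimate is simply 33/28 times the 28 years/generation one.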

The TMRCA results for the 3 different datasets (E-M34_Plaster, E-M123_Haplozone and E-M84_Haplozone) are as follows:

Dataset:E-M34_Plaster
Sample size:23
Years/Generation:28 - 33
TMRCA Range:4590 - 7558
Mean TMRCA:6055
Median TMRCA:5920
SD:836

Year/Generation =28 detailed:
finalsummary =

{
[1,2] = Chandler;14 Markers TMRCA(Median)--5920.7 TMRCA(Modal)--6412.9
[1,3] = Stafford;14 Markers TMRCA(Median)--5120.6 TMRCA(Modal)--5593.2
[1,4] = Burgarella_Navascues;14 Markers TMRCA(Median)--4988.7 TMRCA(Modal)--5635.7
[1,5] = Ballantyne;14 Markers TMRCA(Median)--4590.1 TMRCA(Modal)--4993.9
}
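As a quick sanity check on how the summary relates to the detail rows, the sketch below assumes that the estimate scales linearly with the generation length, so that the low end of the reported range is the smallest 28 years/generation value and the high end is the largest one rescaled to 33 years/generation. That assumption is mine, but the E-M34_Plaster numbers are consistent with it. The function name `tmrca_range` is hypothetical.

```python
# Year/Generation = 28 detail rows for E-M34_Plaster, copied from the block above:
# rate set -> (TMRCA with median ancestor, TMRCA with modal ancestor)
detail_28 = {
    "Chandler": (5920.7, 6412.9),
    "Stafford": (5120.6, 5593.2),
    "Burgarella_Navascues": (4988.7, 5635.7),
    "Ballantyne": (4590.1, 4993.9),
}


def tmrca_range(detail, low_gen=28, high_gen=33):
    """Smallest value at low_gen and largest value rescaled to high_gen."""
    values = [v for pair in detail.values() for v in pair]
    return min(values), max(values) * high_gen / low_gen


low, high = tmrca_range(detail_28)
print(int(low), int(high))  # truncated, to compare with the reported 4590 - 7558
```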

Dataset:E-M123_Haplozone
Sample size:129
Years/Generation:28 - 33
TMRCA Range:4120 - 6131
Mean TMRCA:5067
Median TMRCA:5147
SD:667

Year/Generation =28 detailed:
finalsummary =

{
[1,2] = Chandler;14 Markers TMRCA(Median)--5202.1 TMRCA(Modal)--5202.1
[1,3] = Stafford;14 Markers TMRCA(Median)--4405.3 TMRCA(Modal)--4405.3
[1,4] = Burgarella_Navascues;14 Markers TMRCA(Median)--4121 TMRCA(Modal)--4121
[1,5] = Ballantyne;14 Markers TMRCA(Median)--4330 TMRCA(Modal)--4330
}

Dataset:E-M84_Haplozone
Sample size:69
Years/Generation:28 - 33
TMRCA Range:3666 - 5124
Mean TMRCA:4458
Median TMRCA:4347
SD:483

Year/Generation =28 detailed:
finalsummary =

{
[1,2] = Chandler;14 Markers TMRCA(Median)--4347.9 TMRCA(Modal)--4347.9
[1,3] = Stafford;14 Markers TMRCA(Median)--3666.2 TMRCA(Modal)--3666.2
[1,4] = Burgarella_Navascues;14 Markers TMRCA(Median)--3885.7 TMRCA(Modal)--3885.7
[1,5] = Ballantyne;14 Markers TMRCA(Median)--4219.2 TMRCA(Modal)--4219.2
}

It is not necessary to get fixated on the absolute TMRCA numbers. What is more informative are the relative TMRCA numbers, since the mutation rates being used for all 3 datasets come from the same source. In addition, the absolute TMRCA is not very informative due to the low number of markers. For instance, if I use these same 14 markers to compute a mean TMRCA across all 4 mutation rate sets for the E-M35 balanced dataset, I get 7,038 YBP, whereas if I use 46 markers across all mutation rate sets I get a mean TMRCA of 11,984 YBP, and if I use 66 markers (but limited to the Chandler mutation rates) I get a mean TMRCA of 14,802 YBP. So a reasonable number of markers is needed before the absolute TMRCA starts to plateau at a meaningful number.

However, the relative TMRCAs clearly show the Ethiopian E-M34 haplotypes to be more diverse, and thus putatively older, than both the E-M84 and E-M123 haplotypes from haplozone, and that in itself is quite interesting.
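To make the relative comparison concrete, the following sketch divides the reported mean TMRCA of each dataset by that of a chosen reference dataset. The means are copied from the three summaries above; the helper name `relative_to` is hypothetical, and the ratios say nothing about the absolute ages.

```python
# Mean TMRCA per dataset, copied from the summaries above.
mean_tmrca = {
    "E-M34_Plaster": 6055,
    "E-M123_Haplozone": 5067,
    "E-M84_Haplozone": 4458,
}


def relative_to(reference, means):
    """Express every dataset's mean TMRCA as a multiple of the reference."""
    ref = means[reference]
    return {name: value / ref for name, value in means.items()}


rel = relative_to("E-M84_Haplozone", mean_tmrca)
for name, ratio in rel.items():
    print(f"{name}: {ratio:.2f}x")
```

Since all three datasets were run through the same markers and the same mutation rate sets, ratios like these are less sensitive to the choice of rates than the absolute figures are.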

UPDATE4 (11/29/2012) - Analyzing Ethiopian J-M267 haplotypes.

Similar to the above, I used the 48 J-M267 haplotypes from this paper to compare them with non-Ethiopian J-M267 haplotypes from the FTDNA projects database. The results were as follows:

Dataset:J-M267_Plaster
Sample size:48
Years/Generation:28 - 33
TMRCA Range:12188 - 21364
Mean TMRCA:15006
Median TMRCA:14448
SD:3108

Year/Generation =28 detailed:
finalsummary =

{
[1,1] = Chandler;14 Markers TMRCA(Median)--18128 TMRCA(Modal)--18128
[1,2] = Stafford;14 Markers TMRCA(Median)--12460 TMRCA(Modal)--12460
[1,3] = Burgarella_Navascues;14 Markers TMRCA(Median)--12331 TMRCA(Modal)--12331
[1,4] = Ballantyne;14 Markers TMRCA(Median)--12189 TMRCA(Modal)--12189
}

Dataset:J-M267_FTDNA
Sample size:573
Years/Generation:28 - 33
TMRCA Range:11288 - 31985
Mean TMRCA:17597
Median TMRCA:16324
SD:5654

Year/Generation =28 detailed:
finalsummary =

{
[1,1] = Chandler;14 Markers TMRCA(Median)--18873 TMRCA(Modal)--27139
[1,2] = Stafford;14 Markers TMRCA(Median)--11955 TMRCA(Modal)--16285
[1,3] = Burgarella_Navascues;14 Markers TMRCA(Median)--11289 TMRCA(Modal)--15253
[1,4] = Ballantyne;14 Markers TMRCA(Median)--12084 TMRCA(Modal)--16363
}

Note again that the mean TMRCA for the FTDNA dataset using the 66/46 marker sets was considerably lower (9901) than with the 14 markers above, again highlighting the impact of marker combination / number on the absolute TMRCA. However, going by the mean and median TMRCA above, the FTDNA J-M267 haplotypes are relatively more diverse than the Ethiopian haplotypes from the current paper (unlike the case for E-M34 above).
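For a closer look at where the difference between the two J-M267 datasets comes from, here is a small sketch that compares them row by row, i.e. per mutation rate set and per choice of putative ancestral haplotype. The values are copied from the two Year/Generation = 28 detail blocks above; the function name `ftdna_over_plaster` is hypothetical.

```python
# rate set -> (TMRCA with median ancestor, TMRCA with modal ancestor)
plaster = {
    "Chandler": (18128, 18128),
    "Stafford": (12460, 12460),
    "Burgarella_Navascues": (12331, 12331),
    "Ballantyne": (12189, 12189),
}
ftdna = {
    "Chandler": (18873, 27139),
    "Stafford": (11955, 16285),
    "Burgarella_Navascues": (11289, 15253),
    "Ballantyne": (12084, 16363),
}


def ftdna_over_plaster(ftdna_rows, plaster_rows):
    """Ratio FTDNA / Plaster for the median and modal runs of each rate set."""
    return {
        name: (ftdna_rows[name][0] / plaster_rows[name][0],
               ftdna_rows[name][1] / plaster_rows[name][1])
        for name in plaster_rows
    }


ratios = ftdna_over_plaster(ftdna, plaster)
for name, (median_ratio, modal_ratio) in ratios.items():
    print(f"{name:22s} median x{median_ratio:.2f}  modal x{modal_ratio:.2f}")
```

With these numbers the modal-ancestor ratios are all above 1, while the median-ancestor ratios sit close to 1 (slightly below it for three of the four rate sets), so the gap in the overall means comes mostly from the modal runs.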

Another interesting find in this paper with respect to the J-lineage is the reporting of one case of J (x M267, M172) in the Maale, a first such find in Ethiopia that I am aware of.

UPDATE5 (11/30/2012) - Analyzing Ethiopian E-V32 haplotypes.

To finalize the series of TMRCA calculations I have been doing, I performed the same calculations on the E-V32 dataset vs Haplozone. Interestingly, it seems as though the E-V32 lineages in Haplozone are older than the ones in the Plaster paper. Since we already know that E-V32 is for the most part restricted to Eastern Africa, a reasonable explanation is that (a) most of the Haplozone E-V32 haplotypes may have relatively recent East African ancestry, a possibility since a reasonable majority of the haplotypes are from the Arabian peninsula and the Near East, and/or (b) there are already a few East African (Somali) haplotypes within the E-V32_Haplozone dataset. (Note: the self-declared origins of the E-V32 haplotypes from haplozone were: 11 from the Near East (Qatar, UAE, Jordan, Saudi and Yemen), 2 from Africa (Egypt and Somalia) and 4 of unknown origin.)

Below is the summary of the TMRCA comparisons I have done thus far. Each bar within each dataset represents the mean TMRCA for the specified mutation rate set, with the years per generation set to 28 and 33 and the putative ancestral haplotype set to the median and modal repeats.

Also, note that the 72 E-M35_Plaster haplotypes are a composite of 18 E-V32, 4 E-V22, 23 E-M34, 1 E-M281 and 26 E-V6 haplotypes, whereas the 180 E-M35_Haplozone haplotypes are a composite of 20 E-V13, 20 E-V22, 20 E-V12, 60 E-M81 and 60 E-M123 haplotypes.

11 comments:

Maju, November 26, 2012 at 7:12 PM
Very promising. I'll try to get some time tomorrow to read it in depth.

So far I have stopped at the last two graphs and, contrary to what you said once, it does seem that Cushitic peoples are less influenced than Semitic ones by Eurasian lineages: even the Afar, who live closer to Arabia than the Amhara do, show less Eurasian genetics. It's not a simple story but it does seem like Semitics show yet another layer of Eurasian influx.

Obvious correlations:

Y-DNA K (J, T) - mtDNA N (M too but less obvious - see below).

Y-DNA E1b1a7 - mtDNA L2 (possibly some specific subclade(s))

Possibly Y-DNA E2 with some mtDNA L0/L1 subclade.

Regarding M (which would be all M1, which is so highly derived under M, i.e. M → M1'20'51 → M1, that it MUST be a returning lineage from Asia) it seems like it may have spread with African Y-DNA lineages, at least in Ethiopia. Otherwise we would be in the strange case in which the immigrant Y-DNA is smaller than the mtDNA - not unheard of (see North Africa for example) but that requires an explanation. Such an explanation may be a later expansion within Africa led by native African Y-DNA (for example that of early Afroasiatic languages in the Epipaleolithic) but I'm not sure how it fits Ethiopian prehistory.

andrew, November 27, 2012 at 6:29 PM
It would be most helpful to know the linguistic affiliations of the identified groups other than the Afar, Amhara, Anuak, Maale, and Oromo.

I'd also be quite interested to hear the gist of the discussion section's treatment of haplogroups A3b2 and J* - the J1/J2 piece pretty much speaks for itself.

The data look quite suitable for a principal components analysis chart (ideally including both mtDNA and Y-DNA in the same analysis) which might show some of the trends that would otherwise be less obvious.

Umi, January 3, 2013 at 12:10 PM
Interesting study. I am most surprised at the relatively high levels of J among some of the Omotic groups. This weakens the Ethiopian origin theory of Afro-Asiatic substantially. Perhaps future studies with deeper resolution will clarify the situation better.

Awale Ismail, January 22, 2015 at 9:27 AM
Where are the Oromo samples from? Are they from all over Ethiopia or?


Monday, November 26, 2012
