Recommendations of the Polish Speaking Working Group of the International Society for Forensic Genetics for forensic mitochondrial DNA testing

Tomasz Grzybowski; Ryszard Pawłowski; Tomasz Kupiec; Wojciech Branicki; Renata Jacewicz

doi:10.5114/amsik.2018.84532

eISSN: 1689-1716
ISSN: 0324-8267

Archiwum Medycyny Sądowej i Kryminologii/Archives of Forensic Medicine and Criminology

4/2018
vol. 68

Original paper

Arch Med Sadowej Kryminol 2018; 68 (4): 242–258

Online publish date: 2019/04/17

Introduction

Analysis of human mitochondrial DNA (mtDNA) variation is a well-established research strategy that has been used in population and evolutionary genetics since the mid-1980s. Its use in forensic genetics dates back to the first half of the 1990s. Today, in the era of population genomics, mtDNA studies cover not only fragments of the molecule but, increasingly, the whole molecule, providing a useful tool both for tracing the demographic history of populations at the level of maternal lineages [1] and for identification and inference of maternal ancestry for forensic purposes [2].
Sequencing the whole mitochondrial genome increases the discriminating power of mtDNA sequence comparisons, but it should always be remembered that mtDNA is a haploid marker and, strictly speaking, identifies a group of maternally related individuals. On the other hand, the high inter-population differentiation of mtDNA, observed both between continents and within smaller areas, allows the biogeographic ancestry of a maternal lineage to be predicted, which fits into the broader trend of contemporary predictive testing. The increasingly widespread use of mtDNA variation is supported by the rapid development of massively parallel sequencing (MPS) techniques, which complement, and sometimes even replace, traditional Sanger DNA sequencing.
The international scientific community has so far formulated numerous recommendations on the use of mtDNA analysis for forensic purposes. Their application has improved the quality of laboratory results and increased the reliability of population databases [3–5]. These guidelines are particularly important when the scope of mtDNA sequencing is extended and new laboratory approaches and data analysis methods are adopted.
This paper is the first part of the recommendations developed by the Polish Speaking Working Group of the International Society for Forensic Genetics (ISFG-PL), active since 2017, and prepared as part of the work of the Team for Standards and Assessment in Forensic Genetics (TSA). Their aim is to set standards, create a platform for exchanging experience, and disseminate knowledge of the applicable guidelines, which should be followed in all Polish laboratories performing tests for forensic purposes. The recommendations on mtDNA analysis in forensic genetics presented here were formulated on the basis of the expert experience of ISFG-PL members and the most recent ISFG recommendations.

Guidelines on the specific nature of the testing process

In forensic laboratory practice, mtDNA analysis is used mainly to identify small amounts of highly degraded biological material. This is favoured by the large numerical excess of mtDNA over nuclear DNA in the cell (from several dozen to nearly 7,000 molecules, depending on the tissue type, its energy demand and the age of the individual), and by the way the molecule is organised (circular and covalently closed). The sensitivity of analysis based on the polymerase chain reaction (PCR), together with the nature of the biological material and of mtDNA itself, means that contamination of test samples with exogenous material is much more likely to be detected for mtDNA than for nuclear DNA.
The contamination problem arising from the sensitivity of PCR affects both traditional mtDNA protocols based on Sanger sequencing (direct sequencing of PCR products) and newer massively parallel sequencing protocols, in which DNA libraries usually undergo clonal amplification. This makes rigorous preventive measures necessary, such as the use of rooms, laminar flow cabinets, laboratory equipment and reagents dedicated to this type of testing, filter tips, disposable clothing, spatial separation of pre- and post-amplification procedures, UV sterilisation, and treatment of work surfaces with sodium hypochlorite. In addition, the laboratory should maintain a system of continuous contamination monitoring that includes negative controls for DNA extraction and amplification, positive controls, and a maintained and updated collection of staff haplotypes. Reliable results can admittedly be obtained in the presence of low-level contamination [5], for example when the haplotype identified in the negative extraction control differs from the haplotype of the test samples, but the course of action after contamination is detected always depends on the context of the case and requires great caution in interpreting the results. Keeping a "database" of haplotypes resulting from contamination can help to monitor the problem over longer periods. This makes it easier to spot any patterns in the occurrence of contamination and, consequently, to establish its source.
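A collection of staff haplotypes can be checked programmatically whenever a negative control yields a sequence. The sketch below is a minimal illustration rather than a validated procedure: the function name `match_staff`, the staff identifiers and the example haplotypes are all hypothetical, and a haplotype is simplified to a set of variant strings relative to rCRS over a common sequence range.

```python
from typing import Dict, FrozenSet, List

# A haplotype is modelled as a set of differences from rCRS (an assumption
# made for this sketch), e.g. {"A263G", "-315.1C"}.
Haplotype = FrozenSet[str]


def match_staff(observed: Haplotype, staff: Dict[str, Haplotype]) -> List[str]:
    """Return the IDs of staff whose recorded haplotype equals the observed one."""
    return sorted(person for person, hap in staff.items() if hap == observed)


# Hypothetical elimination collection (illustrative values only).
staff_haplotypes = {
    "analyst_01": frozenset({"A263G", "-315.1C"}),
    "analyst_02": frozenset({"T16126C", "A263G", "-315.1C"}),
}

# Haplotype read from a negative extraction control (also hypothetical).
negative_control = frozenset({"T16126C", "A263G", "-315.1C"})
print(match_staff(negative_control, staff_haplotypes))
```

Exact equality is the simplest possible comparison; a real contaminant may be only partially recovered, so a laboratory would probably also want to flag near matches.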
The material subjected to mtDNA testing often resembles so-called ancient DNA, as was pointed out in the earliest publications on mtDNA testing for forensic purposes [3]. One example is the identification of skeletal remains that are several decades old or older. The ancient DNA research community has formulated numerous criteria for the reliability of genetic results, which largely overlap with the guidelines of forensic geneticists [6, 7]. Not all of them can be met in cases commissioned by judicial authorities; however, in particularly difficult cases of mass identification of skeletal remains it is possible to meet criteria such as independent testing and confirmation of the haplotypes of the most challenging samples in at least two laboratories. This requires coordination and cooperation between different centres.
At a time when mtDNA sequences were determined exclusively by the traditional Sanger method, recommendations were issued according to which haplotypes presented in test reports should be supported by raw data from more than one sequencing run for a given region, preferably from both strands. This recommendation is particularly important for sequences located downstream of homopolymeric tracts, which are difficult or impossible to read reliably from a single sequencing of one strand. In such cases, fragments downstream of unstable homopolymeric tracts may be sequenced with alternative primers [5]. With MPS techniques these recommendations are fulfilled more or less automatically, because consensus haplotypes are determined from sequence reads of individual molecules and the concept of coverage is applied to the whole mtDNA or a selected region.
It has been shown that the largest share of errors in mtDNA testing consists of various kinds of clerical errors [8]. It is therefore recommended that manual data handling be minimised; for example, instead of typing haplotypes into tables by hand, they should be exported from the software used for data analysis. An important requirement that helps to avoid such errors, as well as errors of interpretation, is independent analysis of the raw data by at least two experts. This criterion, which is part of good laboratory practice, is generally known and observed in testing laboratories accredited to PN-EN ISO/IEC 17025:2005. An equally important recommendation, mandatory for accredited laboratories, is regular participation in external proficiency tests (PT) or interlaboratory comparisons (ILC) dedicated to mtDNA analysis, such as GEDNAP (German DNA Profiling), GHEP-ISFG or CTS (Collaborative Testing Services) [5].
An issue indirectly related to clerical errors is the scope of mtDNA sequence analysis. It is worth noting that the practice, common in forensic genetics laboratories, of sequencing fragments of the control region from separate amplicons (e.g. HVS I: nucleotide positions 16024–16365; HVS II: 73–340; HVS III: 340–576) increases the number of analyses required for a given sample and may therefore increase the risk of errors. On the other hand, analysing mtDNA as smaller, overlapping fragments is often dictated by the quality of the material, especially when the DNA is very highly degraded. Analysis of smaller fragments also reduces the risk of revealing possible low-level contamination, since amplification of shorter DNA fragments usually requires fewer PCR cycles. Analysis of overlapping mtDNA sequences, in turn, serves as a check on the results obtained. Regardless of the number of PCR products analysed, extending the scope of mtDNA sequencing obviously increases the discriminating power of the results. With MPS methods, the constraints imposed by the quality of the material and the need to extend the sequencing range are easier to reconcile, thanks to numerous options for multiplexing and simultaneous sequence analysis of many amplicons [9].
Accordingly, the ISFG, while recommending a general extension of the sequencing range and minimisation of the number of amplicons in Sanger-based testing, has formulated a minimum requirement that, at least for population studies undertaken to build databases, the sequence of the entire control region (positions 16024–576) must be analysed. It should be stressed that EMPOP (www.empop.org), the population database commonly used in forensic casework, already holds data from complete mitochondrial genomes (256 complete haplotypes in the current release, v.4/R11), and their number will probably grow. A large amount of whole-genome data that passed quality control under EMPOP procedures was still obtained with the traditional Sanger method [10, 11].

Guidelines on the comparison and notation of mtDNA sequences

In its most recent recommendations [5], the ISFG upholds the long-standing practice of comparing mtDNA haplotypes against the rCRS reference sequence [12], whose current NCBI accession number is NC_012920. In rCRS, 11 positions were corrected relative to the CRS (Cambridge Reference Sequence) published in 1981 by Anderson et al. [13]. The rCRS is a haplotype of European origin representing the rare subhaplogroup H2a2a1. A few years ago, a discussion about a possible change of the reference sequence was triggered by the paper of Behar et al. [14], in which the authors proposed replacing it with the RSRS (Reconstructed Sapiens Reference Sequence), a reconstructed haplotype representing the hypothetical root of the phylogenetic tree of modern human mtDNA. The RSRS differs from the rCRS at 52 positions, 16 of them in the control region alone.
The proposal drew a response from both population and forensic geneticists, but the prevailing view was that rCRS should be retained as the reference sequence [15, 16]. It was argued, among other things, that although rCRS does indeed belong to the "young" branches of the phylogenetic tree and should not be treated as an "ancestral haplotype" or "wild-type sequence", replacing it with RSRS would cause enormous interpretive chaos, nomenclature misunderstandings, and the need to "translate" existing records in population databases into the new nomenclature. rCRS should therefore be treated solely as a reference haplotype against which test haplotypes are compared, and its choice is purely arbitrary [16]. It is nevertheless worth mentioning that the global mtDNA phylogenetic tree available online (PhyloTree), which is often used to verify the haplogroup status of test samples, has been based on RSRS as the reference sequence since build 14. A version based on rCRS is also available [17].
Test reports must always state the range over which the analysed sequences are compared with rCRS. Homoplasmic differences from the reference sequence are written in capital letters, with the rCRS state given as a prefix and the state of the test haplotype as a suffix to the sequence position (e.g. A263G). Deletions are denoted by the suffix "DEL", "del" or "–" (e.g. A249DEL), while insertions are conventionally recorded by placing the number of inserted nucleotides after a dot, 3' of the given sequence position (e.g. –309.1C, –309.2C). Heteroplasmic changes should be denoted with capital letters according to the IUPAC code (Y = C/T, R = A/G, etc., e.g. T16093Y), whereas lowercase letters are reserved for heteroplasmic mixtures (e.g. –309.1c denotes a heteroplasmic mixture of molecules with an inserted cytosine downstream of position 309 and molecules lacking this insertion).
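The notation rules above lend themselves to machine checking, which also supports the recommendation to avoid retyping haplotypes by hand. The following is a minimal sketch of a parser for single variants. The names `Variant` and `parse_variant` and the category labels are hypothetical, and the parser covers only the forms quoted in the text (substitution, deletion, insertion, IUPAC-coded heteroplasmy, lowercase insertion mixture); it is not an implementation of any existing tool.

```python
import re
from typing import NamedTuple, Optional

IUPAC_AMBIGUITY = set("RYKMSWBDHVN")  # codes that stand for more than one base
_DASHES = ("-", "\u2013")             # hyphen or en dash; both occur in print


class Variant(NamedTuple):
    ref: Optional[str]        # rCRS state; None for insertions
    position: int             # position in rCRS numbering
    ins_index: Optional[int]  # 1, 2, ... for insertions, otherwise None
    alt: str                  # state in the test haplotype
    kind: str                 # hypothetical category label


_PATTERN = re.compile(r"^([ACGT]|-|\u2013)?(\d+)(?:\.(\d+))?([A-Za-z]+|-|\u2013)$")


def parse_variant(text: str) -> Variant:
    """Parse one variant written relative to rCRS, e.g. A263G or -309.1C."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised variant: {text!r}")
    ref, pos, ins, alt = match.groups()
    ref = None if ref in _DASHES else ref
    if ins is not None:
        # Lowercase marks a mixture of molecules with and without the insertion.
        kind = "length_heteroplasmy" if alt.islower() else "insertion"
        return Variant(None, int(pos), int(ins), alt.upper(), kind)
    if alt in _DASHES or alt.upper() == "DEL":
        return Variant(ref, int(pos), None, "DEL", "deletion")
    if len(alt) != 1:
        raise ValueError(f"unrecognised variant: {text!r}")
    if alt in IUPAC_AMBIGUITY:
        return Variant(ref, int(pos), None, alt, "point_heteroplasmy")
    return Variant(ref, int(pos), None, alt.upper(), "substitution")


for example in ["A263G", "A249DEL", "-309.1C", "T16093Y", "-309.1c"]:
    print(example, "->", parse_variant(example).kind)
```

Accepting both the hyphen and the en dash is a deliberate choice here, because printed haplotypes and database exports do not always use the same character.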
Interpretive misunderstandings can arise from different ways of aligning the test haplotype to the reference sequence, especially within homopolymeric tracts. To prevent such problems, so-called phylogenetic alignment criteria were formulated, according to which mtDNA phylogeny generally takes precedence over maximum parsimony, i.e. the aim of representing haplotypes with the smallest possible number of differences from the reference sequence [18]. The point of reference is the PhyloTree version that uses rCRS as the reference sequence [17].
For example, under these criteria the control-region haplotype of one of the samples from the GEDNAP 54 proficiency test, belonging to haplogroup R0a, should be written as T16126C T16362C T58C –60.1T C64T T152C A263G –309.1C –309.2C –315.1C, and not, as maximum parsimony would dictate, as T16126C T16362C –57.1C C64T T152C A263G –309.1C –309.2C –315.1C. The preferred notation follows from the fact that in the phylogenetic tree the control-region transition T58C is one of the mutations defining haplogroup R0a'b, from which R0a derives (PhyloTree, build 17). Under the same criteria, when aligning sequences within the homopolymeric tracts in HVS I and HVS II, the transitions T16189C and T310C should always be documented accordingly, whereas length mutations of the short adenine tract preceding position 16184 are best represented as transversions (for example, the change C16184A is one of the mutations defining the East Asian subhaplogroup B4c2). The paper by Bandelt and Parson [18] provides a wealth of further representative examples of the phylogenetic approach to sequence alignment within homopolymeric tracts.

Guidelines on the interpretation of point and length heteroplasmy

With the rapid development of MPS techniques, which are already in use or being introduced in many forensic genetics laboratories, recommendations on the interpretation of point (sequence) and length heteroplasmy in mtDNA are becoming particularly important. With Sanger sequencing, point heteroplasmy is generally detectable from peak heights in electropherograms once the minor variant exceeds 10% of the major variant [19], whereas MPS at sufficiently high coverage allows heteroplasmy to be detected below 10% ([20] and references cited therein), and even well below that value, at about 1% [21].
An equally important issue is the number of heteroplasmic positions. In Sanger sequencing results, overlapping signals from two bases at several sequence positions, especially haplogroup-specific ones, most often indicate a mixture rather than true heteroplasmy. In MPS data, by contrast, the presence of a larger number of ambiguities at a level below 10% does not necessarily indicate a mixture or contamination. For example, in whole-genome data obtained on the Illumina platform at 20,000-fold coverage, heteroplasmy was documented at up to seven positions per individual when variants present at levels from 0.1% to 10% were considered [22]. On the other hand, extreme cases of a very large number of point heteroplasmies (up to 71) have been described which, after phylogenetic analysis, turned out to result from contamination and sample mix-up. One example is the data from 1,085 complete mitochondrial genomes from the 1000 Genomes Project, compiled by Ye et al. [23] and critically discussed by Just et al. [24] and Skonieczna et al. [25].
Unlike point heteroplasmy, which in most healthy individuals affects various positions that are variable in the mtDNA phylogeny but not haplogroup-specific [25], length heteroplasmy, which results from the instability of homopolymeric tracts, affects only certain fragments of the control region. Some of these fragments may contain haplogroup-specific sequence positions. In HVS I, length mutations mainly concern the fragment between nucleotides 16183 and 16194 in the presence of the T16189C transition, which creates an uninterrupted tract of 10 cytosine residues. This tract is unstable, so insertions of C residues downstream of position 16193 are often observed, producing a run of 11 to 14 C residues. The T16189C transition is diagnostic for some haplogroups, e.g. haplogroup X, which is rare but widely distributed in Europe. In HVS II, length heteroplasmy most often affects the fragment between nucleotides 302 and 316. The first part of this fragment, located upstream of position 310, is far more unstable than the part located downstream, so length heteroplasmy with the variants –309.1C, –309.2C or –309.3C is observed very often. Instability of the sequence downstream of position 310 is relatively rare unless a T>C transition occurs, producing a run of 13 C residues. With the T310C transition, which is diagnostic for the Central and Eastern European subhaplogroup U4a2, the C tract is often shortened; variants such as C313DEL, C314DEL or C315DEL are observed. Length heteroplasmy also occurs in the C tract located between nucleotides 568 and 573 (HVS III).
Sanger sequencing generally does not allow the number of variants to be determined precisely in cases of length heteroplasmy, but the major variant can be reported, which is what the ISFG recommends for building population databases [5]. If the exact number of variants has to be reported, fragment size analysis of amplicons generated from the relevant parts of the control region can be used, following the protocol proposed by Berger et al. [26]. The data cited above illustrate the importance of the general ISFG guidelines on both point and length heteroplasmy, on the basis of which laboratories should define their own criteria for interpreting these phenomena, depending on data quality, the specifics of the technology, and experience. For example, point heteroplasmy in MPS data obtained on the 454 Life Sciences platform by Skonieczna et al. [21, 25] was considered confirmed if the minor variant was observed in at least 20 distinct high-quality reads, at least 35% of the reads containing the minor variant came from both mtDNA strands, and the ratio of reads from the two strands was similar for the major and minor variants. According to the ISFG recommendations, neither type of heteroplasmy is grounds for excluding two otherwise identical haplotypes as coming from the same maternal lineage [5].
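Laboratory-specific criteria of this kind are easiest to apply consistently when they are written down as an explicit rule. The sketch below encodes the three example conditions quoted above under two stated assumptions, neither of which comes from the source: the 35% condition is read as "each strand contributes at least 35% of the minor-variant reads", and "similar" strand ratios are read as a difference of at most 0.20 between the forward-strand shares of the major and minor variants. The function name and parameter names are hypothetical.

```python
def is_confirmed_point_heteroplasmy(
    minor_fwd: int,
    minor_rev: int,
    major_fwd: int,
    major_rev: int,
    min_reads: int = 20,            # from the text: at least 20 reads
    min_strand_share: float = 0.35, # from the text, as interpreted above
    max_share_gap: float = 0.20,    # ASSUMPTION: tolerance for "similar" ratios
) -> bool:
    """Apply the three example criteria to high-quality read counts per strand."""
    minor_total = minor_fwd + minor_rev
    major_total = major_fwd + major_rev
    if minor_total < min_reads or major_total == 0:
        return False
    minor_fwd_share = minor_fwd / minor_total
    # Both strands must be adequately represented among minor-variant reads.
    if min(minor_fwd_share, 1 - minor_fwd_share) < min_strand_share:
        return False
    major_fwd_share = major_fwd / major_total
    # Strand balance of the minor variant should resemble that of the major one.
    return abs(major_fwd_share - minor_fwd_share) <= max_share_gap


# Hypothetical read counts: 22 minor-variant reads, well balanced across strands.
print(is_confirmed_point_heteroplasmy(12, 10, 500, 480))
```

Keeping the thresholds as parameters reflects the point made in the text: each laboratory is expected to set its own values according to its platform and data quality.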

Recommendations on data quality control

Both population data and DNA sequencing results obtained in specific cases in forensic genetics laboratories should undergo quality control using the available phylogenetic tools. A very simple yet effective check is to assign samples to haplogroups on the basis of diagnostic mutations. The PhyloTree mentioned above is useful here; it is updated on an ongoing basis as population study results are published in the scientific literature and databases [17]. Users less familiar with mtDNA phylogeny can use the automated haplogroup assignment tool (EMMA) offered by EMPOP, which relies on data from PhyloTree. The absence of diagnostic mutations expected for a given haplogroup may result from various laboratory and clerical errors, while the presence in a single haplotype of mutations diagnostic for different haplogroups most often results from contamination or sample mix-up.
EMPOP also provides a tool for detecting possible artefacts in a population data set, based on haplotype networks (quasi-median networks). It allows previously unobserved or unexpected mutations, which may be sequencing artefacts or the result of other errors in mtDNA analysis, to be picked up quickly.
The phylogenetic tools mentioned above make it possible to demonstrate errors in published data, not only in forensic studies but also in population and medical ones [27, 28]. mtDNA population study results accepted for publication in the official scientific journal of the ISFG, Forensic Science International: Genetics, are currently subject to mandatory quality control performed by the EMPOP team. The ISFG-PL group recommends that population databases used by Polish laboratories to assess the evidential value of mtDNA results undergo analogous quality control.

Recommendations on the use of population databases

When interpreting results, it must be taken into account that mtDNA is a haploid marker in which different sequence positions have different mutation rates. On the one hand, mtDNA profiles should therefore be treated as haplotypes; on the other, the relative evolutionary stability and the extreme instability of different sequence positions should be considered. It is therefore difficult to formulate absolutely rigid criteria, based solely on the number of sequence differences, for excluding two haplotypes as coming from the same maternal lineage [27].
According to the recommendations of the Scientific Working Group on DNA Analysis Methods (SWGDAM), an exclusion can be declared when there are two or more differences between the compared samples (with the exception of length heteroplasmy). The result is inconclusive, by contrast, if the haplotypes differ at one position, regardless of whether they share the same length variants between positions 302–310, or if they differ in this respect between positions 302–310 while all other sequence positions are concordant [4]. The authors of this paper consider these criteria debatable, especially the criterion of an inconclusive result based on the absence of a common length variant. The 302–310 region is very unstable, in contrast to the fairly stable 311–315 fragment; according to some authors, the 315.1C variant should even be taken into account in phylogenetic reconstructions [16]. Moreover, the SWGDAM recommendations do not consider differences within the homopolymeric tract between nucleotides 16183 and 16194 in the presence of the T16189C transition, nor do they make any distinction between length variants occurring on the background of very common and very rare haplotypes. Finally, they do not take into account the tissue origin of the samples: a single homoplasmic difference is more likely between a blood sample and a hair than between two blood samples [27]. Although complete data on the mutation rates of individual mtDNA positions are not yet available, a sound direction for interpretation would be to include the individual mutation rate in likelihood ratio (LR) calculations, as suggested by Salas et al. [27], especially for single-nucleotide differences between the samples [29]. In the case of single-nucleotide differences within the control region, it is also worth attempting to extend the scope of testing, guided, among other things, by information on the (sub)haplogroup status of the analysed samples, and targeting further tests at the relevant sequence positions [30].
According to the ISFG recommendations, haplotype frequencies should be estimated by searching databases that provide the most conservative assessment of the evidential value of the search result [5]. Because mtDNA shows strong geographic differentiation among populations, in practice this means that, when estimating the frequency of a given haplotype, the context of the case should be considered and a database from the geographic region from which the haplotype may originate should be used. For haplotypes belonging to haplogroups of European origin this could be, for example, the part of EMPOP covering West Eurasian populations. If the context of the case indicates that the control-region haplotype comes from a person from a Central or Eastern European population, use of a geographically appropriate database would be recommended, provided the laboratory has one. Searching the West Eurasian control-region database could also be helpful, given the very slight population stratification in this part of the world, at least at control-region resolution [31]. In every case, however, the laboratory should rationally justify its choice of population database [5]. Unstable sequence fragments characterised by length polymorphism (e.g. around positions 16189, 310, 460 and 573, and the "AC" repeats between 514 and 524) should not be considered when searching databases; in line with this recommendation, EMPOP offers the option of ignoring these positions. In the case of point heteroplasmy, neither variant should be excluded when searching the database [5].

Recommendations on estimating evidential value

There are several ways of presenting the evidential value of results obtained from a database search. The oldest method simply reports the number of observations of the haplotype in the database and calculates its frequency on that basis:

$$p = \frac{x}{n}$$
where p is the haplotype frequency, n is the number of haplotypes in the database, and x is the number of observations of the profile in a database of n profiles.
EMPOP allows this approach to be modified by adding one or two of the obtained haplotypes to the database [5]:

$$p = \frac{x + 1}{n + 1}$$
or:
$$p = \frac{x + 2}{n + 2}$$
where p is the haplotype frequency, n is the number of haplotypes in the database, and x is the number of observations of the profile in a database of n profiles.
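The three counting estimates differ only in how many copies of the observed haplotype are added to the database before counting, so they can be expressed with one small function. This is an illustrative sketch: the function name `counting_frequency` and the numbers in the example (3 observations among 1,000 haplotypes) are hypothetical.

```python
def counting_frequency(x: int, n: int, added: int = 0) -> float:
    """Counting estimate p = (x + added) / (n + added).

    x     -- observations of the haplotype in the database
    n     -- number of haplotypes in the database
    added -- 0, 1 or 2 copies of the casework haplotype added to the database
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    if added not in (0, 1, 2):
        raise ValueError("added must be 0, 1 or 2")
    return (x + added) / (n + added)


# Hypothetical search result: 3 matches in a database of 1,000 haplotypes.
for added in (0, 1, 2):
    print(added, counting_frequency(3, 1000, added))
```

Adding the casework haplotype also prevents a frequency of zero when the haplotype has never been observed, which would otherwise make the reciprocal undefined.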
It has long been pointed out that estimating frequency from the number of observations in a database has many drawbacks, especially for rare haplotypes [3]. To avoid underestimation, a more cautious approach was therefore proposed, which consists of calculating the upper bound of the 95% confidence interval [4, 5]:

$$p_0 = p + 1.96\sqrt{\frac{p(1 - p)}{n}}$$
where p is the frequency determined from the number of observations of the profile in a database of n profiles, and n is the number of haplotypes in the database.
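As a worked illustration of the normal-approximation bound, the sketch below evaluates p + 1.96·sqrt(p(1 − p)/n) for a hypothetical search result. The function name `upper_bound_normal` and the example counts (5 observations among 2,000 haplotypes) are assumptions made for the example only.

```python
import math


def upper_bound_normal(x: int, n: int, z: float = 1.96) -> float:
    """Upper bound p + z * sqrt(p * (1 - p) / n), with p = x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    p = x / n
    return p + z * math.sqrt(p * (1 - p) / n)


# Hypothetical search result: 5 matches in a database of 2,000 haplotypes.
p = 5 / 2000
print(f"counting estimate: {p:.5f}")
print(f"upper bound:       {upper_bound_normal(5, 2000):.5f}")

# With no observations the bound collapses to zero, which is why a separate
# treatment is needed for haplotypes that are absent from the database.
print(upper_bound_normal(0, 2000))
```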
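For very rare haplotypes, the upper bound of the 95% Clopper-Pearson confidence interval, p0, can be calculated [4]:

$$\sum_{k=0}^{x} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k} = 0.05$$
where n is the number of haplotypes in the database, x is the number of observations of the profile in a database of n profiles, and k = 0, 1, 2, …, x observations.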
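The Clopper-Pearson bound has no closed form for x > 0 and must be found numerically. The sketch below assumes the standard one-sided formulation, in which p0 is the value for which the probability of observing at most x matches among n haplotypes equals 0.05, with the sum running from k = 0 to x; it solves this by bisection using only the standard library. The function names and example counts are hypothetical, and the direct summation is intended for the small x typical of rare haplotypes.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), by direct summation (small x only)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x: int, n: int, alpha: float = 0.05,
                          tol: float = 1e-12) -> float:
    """One-sided upper bound p0 solving binom_cdf(x, n, p0) = alpha."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    if x == n:
        return 1.0
    lo, hi = x / n, 1.0  # the CDF exceeds alpha at lo and is 0 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # bound lies further to the right
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical search result: 1 match in a database of 2,000 haplotypes.
print(f"{clopper_pearson_upper(1, 2000):.6f}")
```

For x = 0 the equation reduces to (1 − p0)^n = 0.05, so the numerical result agrees with the closed form 1 − 0.05^(1/n).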
If the haplotype does not occur in the database, a very cautious estimate of the 95% confidence bound is recommended [3, 4], using the formula:

$$p_0 = 1 - \alpha^{1/n} = 1 - (0.05)^{1/n}$$
where α = 0.05 corresponds to the 95% confidence interval and n is the number of haplotypes in the database.
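For a haplotype that has never been observed, the bound depends only on the database size. The short sketch below evaluates 1 − α^(1/n) for a few hypothetical database sizes; the function name `upper_bound_unobserved` is an assumption made for the example.

```python
def upper_bound_unobserved(n: int, alpha: float = 0.05) -> float:
    """Upper bound 1 - alpha**(1/n) for a haplotype absent from n haplotypes."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 - alpha ** (1 / n)


# Hypothetical database sizes; the bound shrinks roughly as 3 / n.
for n in (100, 1000, 10000):
    print(n, f"{upper_bound_unobserved(n):.6f}")
```

The approximate 3/n behaviour follows from −ln(0.05) ≈ 3.0, which makes the effect of database size on the reportable frequency easy to anticipate.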
Once the mtDNA haplotype frequency has been estimated, the LR is calculated in the traditional way as the reciprocal of the frequency.
EMPOP also allows a probabilistic approach (the so-called kappa model) to be used in LR calculations; it predicts the frequency distribution of rare haplotypes from the proportion of so-called singletons, i.e. profiles that occur only once in the database [32]:

$$LR = \frac{n}{1 - \kappa}$$
where κ is the proportion of singletons in the database and n is the number of haplotypes in the database.
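The kappa calculation needs only the database size and the number of haplotypes seen exactly once. The sketch below computes both from a list of haplotype labels and returns n / (1 − κ). The function name `kappa_lr` and the toy database are hypothetical, and the sketch does not reproduce how EMPOP itself implements the model.

```python
from collections import Counter
from typing import List


def kappa_lr(database: List[str]) -> float:
    """LR = n / (1 - kappa), where kappa is the share of singletons among n."""
    n = len(database)
    if n == 0:
        raise ValueError("database is empty")
    counts = Counter(database)
    singletons = sum(1 for c in counts.values() if c == 1)
    kappa = singletons / n
    if kappa >= 1:
        raise ValueError("every haplotype is a singleton; LR is undefined")
    return n / (1 - kappa)


# Toy database of 10 haplotypes: A seen 3 times, B twice, C to G once each.
toy = ["A"] * 3 + ["B"] * 2 + ["C", "D", "E", "F", "G"]
print(kappa_lr(toy))  # 5 singletons, kappa = 0.5
```

The higher the share of singletons, the larger the LR for the same database size, which reflects how much unseen diversity the database suggests.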
Although the ISFG allows all of the above methods of estimating evidential value [5], the probabilistic approach (kappa model) is considered the best way of determining the strength of evidence for rare haplotypes and is recommended by ISFG-PL.
When calculating the LR in identification cases, it is possible under certain conditions to combine mtDNA results with the results of autosomal and Y-chromosome marker analyses, obtaining the LR values by multiplication [4, 5]. This approach rests on the assumption of independence (lack of association) between autosomal and haploid profiles in a given population, which has been verified experimentally (statistically) for some population samples [33, 34]. Even assuming independence, however, this procedure is justified only if the databases used to estimate haplotype and genotype frequencies cover populations of the same geographic origin, there are no grounds for inferring population structure (stratification), and the hypotheses put forward are the same for the different types of markers. It should also always be remembered that mtDNA results cannot distinguish between individuals from the same maternal lineage, so combining the evidential value of mtDNA with that of other markers makes no sense if the circumstances of the case indicate that different maternally related individuals should be considered [5, 35, 36]. ISFG-PL therefore recommends caution when presenting LR values obtained by multiplication. Particular attention should be paid to the hypotheses formulated when calculating the LR for different markers. If a laboratory decides to multiply LR values, the results must also be presented in the alternative form of the component LR values obtained for the individual markers [35].
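The requirement to report component values alongside any product can be built into the calculation itself. The sketch below multiplies per-marker LR values and always returns the components together with the product. The function name `combined_lr`, the marker labels and the LR values are hypothetical, and the code assumes the conditions described above (independence, matching hypotheses, same population) have already been verified by the analyst.

```python
from math import prod
from typing import Dict


def combined_lr(component_lrs: Dict[str, float]) -> Dict[str, float]:
    """Return the per-marker LRs together with their product.

    Multiplication is only meaningful if the markers are independent and the
    same pair of hypotheses was used for each; this function cannot check that.
    """
    if not component_lrs:
        raise ValueError("at least one component LR is required")
    if any(value <= 0 for value in component_lrs.values()):
        raise ValueError("LR values must be positive")
    report = dict(component_lrs)
    report["combined"] = prod(component_lrs.values())
    return report


# Hypothetical component LRs for one identification case.
report = combined_lr({"autosomal STR": 1e6, "Y-STR": 500.0, "mtDNA": 200.0})
for marker, value in report.items():
    print(f"{marker}: {value:.3g}")
```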

Conclusions

This study systematizes the existing recommendations on mtDNA analysis for forensic purposes and formulates interpretative guidelines that are especially relevant in view of changing laboratory methods. The recommendations developed by members of the ISFG-PL should be followed in all Polish laboratories performing forensic testing.
The authors declare no conflict of interest.

Introduction

Analysis of variation in human mitochondrial DNA (mtDNA) is a well-established genetic testing strategy which has been used in population and evolutionary genetics since the mid-1980s. The application of mtDNA tests in forensic genetics goes back to the mid-1990s. At present, in the era of population genomics, mtDNA testing is performed not only on fragments, but increasingly on the whole molecule. As a result, mtDNA tests are becoming a very useful tool both for the accurate tracking of the demographic history of the population at the maternal line level [1], and for the identification and determination of maternal lineage for forensic purposes [2]. Even though whole mitochondrial genome sequencing improves the identification value of mtDNA sequence comparisons, it should always be kept in mind that mtDNA is a haploid marker and, in the strict sense, allows the identification of maternal relatives. On the other hand, a high degree of interpopulation mtDNA variation which is observed both across the continents and in smaller areas within the continents allows the prediction of biogeographic maternal lineage, and thus fits into the wider trend of contemporary predictive genetic testing. The growing range of applications of mtDNA variation can be attributed to the rapid development of massively parallel sequencing (MPS) techniques which complement and in certain cases even replace traditional Sanger-type DNA sequencing.
With respect to the application of mtDNA analyses for forensic purposes, the international scientific community has formulated a number of recommendations to date. Following these guidelines has contributed to an improvement in the quality of test results obtained in laboratory practice, and an increase in the reliability of population databases [3–5]. The guidelines acquire a particular significance especially where forensic units expand the range of mtDNA sequencing, implementing new laboratory approaches and methods of data analysis.
This document is the first part of the recommendations developed by the Polish Speaking Working Group of the International Society for Forensic Genetics (ISFG-PL) which was established in 2017. The recommendations are an effect of studies conducted by the Team for Standards and Assessment in Forensic Genetics (TSA). The purpose of these efforts is to set standards, create a platform for the exchange of experiences, and disseminate knowledge of the existing guidelines which should be followed by all Polish laboratories performing forensic tests. The standards applicable to mtDNA analysis in forensic genetics which are presented in this document are based on the expert experience of members of the ISFG-PL and the recent guidelines issued by the ISFG.

Guidelines for the specificity of the testing process

In practice, forensic laboratories conduct mtDNA analyses primarily in cases involving the identification of biological material which is present in small amounts and highly degraded. Such applications are prevalent especially in view of the significant quantitative advantage of mtDNA over nuclear DNA in the cell (from several dozen to nearly 7,000 molecules, depending on the type of tissue and its energy requirements, and the age of the individual) and its organization (covalently closed circular molecule). Given the sensitivity of polymerase chain reaction (PCR)-based analysis, and the specificity of the biological material and the mtDNA itself, the identification of test sample contamination by external material is far more likely in mtDNA analyses than in nuclear DNA tests. The problem of contamination related to PCR sensitivity applies both to traditional protocols of mtDNA analysis by Sanger sequencing (direct sequencing of PCR products) and more recent protocols of massively parallel sequencing, where DNA libraries are most commonly generated by clonal amplification. Consequently, strict anti-contamination measures are required including, whenever feasible, dedicated rooms, laminar air flow cabinets, dedicated laboratory equipment and reagents, use of tips with filters, disposable clothing, spatial separation of pre- and post-amplification procedures, UV light sterilization, and treatment of working surfaces with sodium hypochlorite. In addition to prevention, every laboratory should maintain a system of continuous contamination monitoring including the use of negative DNA extraction and amplification controls, and positive controls, and keep a regularly updated database of haplotypes of all laboratory personnel. Although it is possible to obtain reliable test results at a low level of contamination [5], e.g. when the haplotype identified in a negative extraction control is different from the haplotypes of the test samples, the procedure to follow after identifying contamination depends on the context of a particular case, while the interpretation of results always requires utmost caution. It seems advisable to keep a database of haplotypes resulting from contamination in order to monitor the problem within longer time frames. The method makes it easier to identify possible regular patterns of contamination, and thus helps to determine its likely origin.
It is important to note that the specific nature of the material subjected to mtDNA testing in forensic laboratories often resembles the properties of the so-called ancient DNA tested in laboratories, which was highlighted in the earliest studies on mtDNA analyses for forensic purposes [3]. This happens, for example, in cases involving the identification of bone material from a few dozen years ago and more. It needs to be noted that the investigators of ancient DNA have formulated a series of reliability criteria for genetic analysis results. The criteria are largely consistent with the guidelines formulated by forensic geneticists [6, 7]. Not all reliability criteria can be met in all cases ordered by judicial bodies. However, in the context of mass identification of bone material in particularly difficult cases it seems possible to meet such criteria as independent testing and confirmation of haplotypes of the most challenging samples in at least two laboratories, which requires coordination and cooperation between various testing centres.
During a period when mtDNA sequences were determined exclusively using the traditional Sanger method, a set of recommendations was developed, based on which haplotypes presented in test reports should be reflected in raw data derived from more than one sequencing run performed for a given region, preferably from both strands. The recommendation is particularly important in the case of sequences located downstream of homopolymeric tracts, as their reliable reading from a single sequencing of one strand is either difficult or impossible. In such situations, it is acceptable to perform the sequencing of fragments downstream of unstable homopolymeric tracts using alternative primers [5]. The specific nature of MPS techniques means that the above recommendations are satisfied intrinsically, as consensus haplotypes are determined here based on the reading of single-molecule sequences, and the concept of coverage is used for the entire mtDNA or for a selected region.
It has been shown in a number of studies that the highest percentage of errors in mtDNA tests are typographical errors [8]. Therefore, as a rule, manual data processing should be minimized. For example, instead of manually entering haplotypes in tables, the recommended strategy is to export haplotypes from data analysis programmes. An important recommendation which contributes to the minimization of such errors and a range of misinterpretations is the need to ensure that raw data are analyzed independently by at least two experts. The criterion, a constitutive element of good laboratory practice, is generally known and adhered to in units accredited as testing laboratories in accordance with the PN-EN ISO/IEC 17025:2005 standard. An equally significant requirement, mandatory for accredited laboratories, is regular participation in external proficiency testing (PT) programmes or interlaboratory comparisons (ILC) devoted to mtDNA analysis, for example GEDNAP (German DNA Profiling), GHEP-ISFG or CTS (Collaborative Testing Services) [5].
An important issue indirectly related to the above-mentioned problem of typographical errors is the range of mtDNA sequence analysis. It should be noted that the practice of sequencing fragments of the control region from separate amplicons (for example – HVS I: 16024–16365 bp; HVS II: 73–340; HVS III: 340–576), which is widespread in forensic genetic laboratories, increases the number of analyses required for a given sample and, consequently, may result in a higher error risk. On the other hand, the analysis of mtDNA in the form of shorter overlapping fragments is often determined by the quality of the material, especially in cases involving very heavily degraded DNA. In such situations, analyzing shorter overlapping fragments also reduces the risk of identifying possible low-level contamination, since the amplification of shorter DNA fragments typically requires fewer PCR cycles. The analysis of overlapping mtDNA sequences also represents a specific control method for obtained results. Regardless of the number of PCR products analyzed, increasing the mtDNA sequencing range naturally improves the identification value of test results. With the use of MPS methods, restrictions concerning the quality of material and the need to increase the sequencing range are easier to reconcile, as there are a number of options for “multiplexing” and simultaneous multiplexed amplicon sequencing [9].
Considering the above factors, the ISFG, in its recommendation to increase the overall sequencing range and minimize the number of amplicons in Sanger sequencing, formulated a minimum requirement according to which at least in population genetic studies for forensic databasing purposes, the entire control region (16024–576 bp) should be sequenced. It needs to be stressed that the population database EMPOP (www.empop.org), which is widely used in forensic examinations, already contains full mitochondrial genome data (256 complete haplotypes in the current version of the database – v.4/R11), and their number is likely to increase. A large amount of full-genome data that have successfully passed quality control in accordance with the EMPOP procedures was obtained using the traditional Sanger method [10, 11].
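The control region coordinates quoted above (16024–576) wrap around the origin of the circular molecule, which is easy to mishandle when checking amplicon coverage. The following is a minimal Python sketch, not part of the ISFG guidelines, which assumes the rCRS length of 16,569 bp; the function names are hypothetical. It shows which part of the control region the three example HVS amplicons leave unsequenced:

```python
RCRS_LENGTH = 16569  # length of the rCRS (NC_012920) in base pairs


def positions(start, end, length=RCRS_LENGTH):
    """Return the set of positions in a range on a circular molecule.

    A range whose start is greater than its end (e.g. 16024-576)
    is taken to wrap around the origin.
    """
    if start <= end:
        return set(range(start, end + 1))
    return set(range(start, length + 1)) | set(range(1, end + 1))


def uncovered(target, amplicons):
    """Positions of the target range not covered by any amplicon."""
    covered = set()
    for start, end in amplicons:
        covered |= positions(start, end)
    return positions(*target) - covered


control_region = (16024, 576)
# HVS I, HVS II and HVS III ranges as quoted in the text
hvs_amplicons = [(16024, 16365), (73, 340), (340, 576)]

gap = uncovered(control_region, hvs_amplicons)
print(len(positions(*control_region)))  # 1122 positions in the control region
print(len(gap))                         # 276 positions outside HVS I-III
```

Under these assumptions the uncovered positions are 16366–16569 and 1–72, which illustrates why sequencing the entire control region adds information beyond the three hypervariable segments.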

Guidelines for comparison and notation of mtDNA sequences

In the most recent guidelines [5], the ISFG upholds the long-used practice of comparing mtDNA haplotypes against the reference sequence rCRS [12] with the current database accession number NCBI NC_012920. It has been revised in 11 positions in relation to the Cambridge Reference Sequence (CRS) published in 1981 by Anderson et al. [13]. The rCRS sequence is a haplotype of European origin representing the rare sub-haplogroup H2a2a1. Several years ago, a study by Behar et al. [14] provoked a debate by suggesting the replacement of rCRS by the so-called RSRS (Reconstructed Sapiens Reference Sequence), a reconstructed haplotype representing the hypothetical root of the Homo sapiens mtDNA phylogenetic tree. RSRS differs from rCRS in 52 positions, including 16 in the control region. Although the proposal attracted a considerable response from population and forensic geneticists, the opinions in favour of retaining the rCRS as a reference sequence [15, 16] prevailed. For example, it was argued that even though rCRS indeed belonged to the “young” branches of the phylogenetic tree and should not be considered as an “ancestral haplotype” or a “wild-type sequence”, switching from rCRS to a new reference sequence (RSRS) would lead to immense interpretative chaos, nomenclature misunderstandings, and the necessity to “translate” existing records in population databases in accordance with the new naming convention. Consequently, the rCRS should be regarded exclusively as a reference haplotype, against which test haplotypes should be compared, and its historical selection was purely arbitrary in nature [16]. Nevertheless it should be noted that the mtDNA phylogenetic tree available online (PhyloTree) and commonly used for the verification of haplogroup affiliation of test samples, has been based on the RSRS as the reference sequence since the Phylotree Build 14. However, a version based on the rCRS is also available [17].
Test reports should always state the range within which the test sequences are compared against the rCRS. Homoplasmic differences from the reference sequence are described using capital letters, with the rCRS status being notated as a prefix, and the status of the test haplotype as a suffix to the position of the sequence (e.g. A263G). Deletions should be indicated by the prefixes “DEL”, “del” or “–” (e.g. A249DEL). The standard notation of insertions consists of the number of nucleotides preceded by a full stop in 3’-direction from a given sequence position (e.g. –309.1C, –309.2C). Heteroplasmic changes should be designated with capital letters according to the IUPAC coding conventions (Y = C/T, R = A/G, etc.; for example T16093Y), while lower case letters are reserved for heteroplasmic mixtures (e.g. –309.1c refers to a heteroplasmic mixture of molecules with cytosine insertion after position 309 and devoid of the insertion).
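The notation rules above are regular enough to be checked automatically before haplotypes are exported or compared. The sketch below is one possible illustration in Python, not a tool described in the guidelines; the patterns, the `classify` function and the acceptance of an ASCII hyphen alongside the en dash are assumptions made for this example.

```python
import re

# The en dash (U+2013) is the prefix printed in the text; an ASCII hyphen
# is also accepted here as a convenience (an assumption of this sketch).
DASH = "\u2013-"
INSERTION = re.compile(rf"^[{DASH}](\d+)\.(\d+)([ACGTacgt])$")
DELETION = re.compile(rf"^[ACGT](\d+)(?:DEL|del|[{DASH}])$")
SUBSTITUTION = re.compile(r"^[ACGT](\d+)([ACGTRYKMSWBDHVN])$")


def classify(variant):
    """Return (kind, position) for a variant written against the rCRS."""
    m = INSERTION.match(variant)
    if m:
        # Lower case marks a heteroplasmic mixture with and without the insertion.
        kind = "length heteroplasmy" if m.group(3).islower() else "insertion"
        return kind, int(m.group(1))
    m = DELETION.match(variant)
    if m:
        return "deletion", int(m.group(1))
    m = SUBSTITUTION.match(variant)
    if m:
        # An IUPAC ambiguity code as the suffix denotes sequence heteroplasmy.
        kind = "substitution" if m.group(2) in "ACGT" else "sequence heteroplasmy"
        return kind, int(m.group(1))
    raise ValueError(f"unrecognised variant notation: {variant!r}")


for v in ["A263G", "A249DEL", "\u2013309.1C", "T16093Y", "\u2013309.1c"]:
    print(v, classify(v))
```

Rejecting anything that does not match one of the three patterns is a deliberate choice: a malformed entry is more useful as an error than as a silently accepted haplotype.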
However, the application of different methods for the alignment of the test haplotype to the reference sequence, especially within homopolymeric tracts, may result in misinterpretations. To prevent such problems, the so-called phylogenetic haplotype alignment criteria have been developed. Based on them, mtDNA phylogeny essentially takes precedence over the principle of maximum parsimony, i.e. the attempt to present haplotypes with the lowest possible number of differences relative to the reference sequence [18].
The point of reference here is PhyloTree in the version with rCRS as the reference sequence [17]. For example, according to the said criteria the control region haplotype in one of the samples from the GEDNAP Proficiency Test 54, belonging to the haplogroup R0a, should be notated as T16126C T16362C T58C –60.1T C64T T152C A263G –309.1C –309.2C –315.1C, and not as T16126C T16362C –57.1C C64T T152C A263G –309.1C –309.2C –315.1C, as the principle of maximum parsimony would require. The preferred notation resulted from the fact that in the phylogenetic tree the transition of the control region T58C is one of the mutations defining the haplogroup R0a’b from which R0a is derived (PhyloTree, Build 17). Based on the same criteria, the alignments of sequences within the homopolymeric tracts in HVS I and HVS II should always document the transitions T16189C and T310C, respectively, while length mutations in the short adenine stretch preceding the position 16184 should be represented in the form of transversions (e.g. C16184A is a change representing one of the mutations defining the Eastern Asian subhaplogroup B4c2).
A range of other examples of the phylogenetic approach to sequence alignments within homopolymeric tracts can be found in the study by Bandelt and Parson [18].

Guidelines for the interpretation of sequence and length heteroplasmy

In view of the rapid development of MPS sequencing techniques which are currently used or implemented in a number of forensic genetic laboratories, recommendations for the interpretation of sequence and length heteroplasmy in mtDNA acquire a particular importance. While in sequencing by the Sanger method sequence heteroplasmy is generally detectable by analyzing peak heights in electropherograms when the minority variant level exceeds 10% of the majority variant [19], MPS sequencing at a sufficiently high coverage allows the detection of heteroplasmy below 10% ([20] and references cited there), and even at a level significantly below this value, i.e. approximately 1% [21].
An equally important issue is the number of heteroplasmic positions. The presence of overlapping signals from two bases at several sequence positions (especially haplogroup-specific ones) in results obtained by Sanger sequencing usually signifies a mixture rather than actual heteroplasmy. In contrast, a greater number of ambiguities at a level below 10% in MPS data does not necessarily correspond to a mixture or contamination. For example, in full-genome data obtained using the Illumina platform with a coverage of 20,000×, heteroplasmy has been documented at a maximum of seven positions in a particular individual, taking into account variants occurring at levels of 0.1% to 10% [22]. On the other hand, there are extreme cases in which a great number of sequence heteroplasmies (up to 71) were reported in a given person and, following phylogenetic analysis, turned out to have been caused by contamination and sample mix-up. A representative example may be the data for 1,085 complete mitochondrial genomes from the 1000 Genomes Project collected by Ye et al. [23] and critically discussed by Just et al. [24], and Skonieczna et al. [25].
In contrast to sequence heteroplasmy, which in healthy individuals affects primarily various positions that exhibit variation in mtDNA phylogeny but are not haplogroup-specific [25], length heteroplasmy, resulting from the instability of homopolymeric tracts, affects preferentially specified fragments in the control region. Within some of these fragments, there may be haplogroup-specific sequence positions. In HVS I, length mutations affect primarily the fragment located between nucleotides 16183 and 16194 in the presence of transition T16189C, which leads to an uninterrupted stretch of 10 Cs. The stretch is unstable, which is why C insertions are often seen downstream of position 16193, leading to a stretch of 11 to 14 Cs. Transition T16189C is diagnostic for some haplogroups, e.g. for the haplogroup X which is uncommon but has a wide geographical distribution across Europe. In HVS II, length heteroplasmy typically affects a fragment located between nucleotides 302 and 316. The first part of this fragment, located upstream of position 310, is far more unstable than the part located downstream of this position, which is why length heteroplasmy with –309.1C, –309.2C or –309.3C variants is very often seen. Sequence instability downstream of position 310 is relatively rare, unless the position is affected by a T>C transition resulting in a stretch of 13 Cs. In the presence of transition T310C, which is diagnostic for the Central and Eastern European subhaplogroup U4a2, the C stretch is often shortened, with observed C313DEL, C314DEL or C315DEL variants. Length heteroplasmy is also observed within the C stretch between nucleotides 568 and 573 (HVS III). Sanger sequencing does not typically allow precise determination of the number of variants in cases of length heteroplasmy; however, it is acceptable to report the dominant variant, and the ISFG recommends this method for the creation of population databases [5].
In situations where the number of variants must be specified precisely, one can resort to analyzing the size of amplicons generated from the corresponding fragments of the control region, as set out in the protocol proposed by Berger et al. [26].
The data given above clearly demonstrate the importance of the ISFG general guidelines both for sequence and length heteroplasmy, based on which laboratories should explicitly define their criteria for the interpretation of these phenomena depending on the quality of data, specific features of the technology, and experience. For example, in MPS data obtained by Skonieczna et al. [21, 25] using the 454-Life Sciences platform, sequence heteroplasmy was considered to be confirmed when the minority variant was observed in at least 20 separate high-quality reads, at least 35% of the minor variant reads were from both mtDNA strands, and the quantitative ratio of the reads from both strands was similar for the majority and minority variants. According to the ISFG guidelines none of the heteroplasmy types constitutes a basis for excluding two identical haplotypes deriving from the same maternal lineage [5].
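To show how a laboratory might write down such criteria explicitly, the sketch below encodes the three conditions quoted above as a Python function. It is an illustration under stated assumptions, not the procedure of Skonieczna et al.: the 35% condition is read here as "each strand contributes at least 35% of the minority-variant reads", and the tolerance used to judge whether the strand ratios are "similar" (`max_ratio_difference`) is a hypothetical parameter that the source does not specify.

```python
def heteroplasmy_confirmed(minor_fwd, minor_rev, major_fwd, major_rev,
                           min_reads=20, min_strand_share=0.35,
                           max_ratio_difference=0.15):
    """Apply read-count criteria to a candidate heteroplasmic position.

    Arguments are counts of high-quality reads per strand for the
    minority and majority variants.
    """
    minor_total = minor_fwd + minor_rev
    major_total = major_fwd + major_rev
    # Criterion 1: enough independent reads support the minority variant.
    if minor_total < min_reads or major_total == 0:
        return False
    minor_fwd_share = minor_fwd / minor_total
    major_fwd_share = major_fwd / major_total
    # Criterion 2 (assumed reading): both strands are adequately represented.
    if min(minor_fwd_share, 1 - minor_fwd_share) < min_strand_share:
        return False
    # Criterion 3: similar strand ratio for both variants
    # (the tolerance is an assumption of this sketch).
    return abs(minor_fwd_share - major_fwd_share) <= max_ratio_difference


print(heteroplasmy_confirmed(15, 12, 500, 480))  # True
print(heteroplasmy_confirmed(25, 2, 500, 480))   # False: strand bias
```

Keeping the thresholds as named parameters makes it straightforward for a laboratory to document and adjust its own validated values.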

Recommendations for data quality control

Both population data and DNA sequencing results obtained in specific cases in the practice of forensic genetic laboratories should be subject to quality control based on available phylogenetic tools. A very simple, yet effective, control method involves determining the haplogroup affiliation of samples based on diagnostic mutations. Here, a useful tool is PhyloTree mentioned above, which is continuously updated, as new results of population studies are published in scientific journals and become available in databases [17]. For users who are less familiar with mtDNA phylogeny, the EMPOP database offers an automated haplogroup classification tool (EMMA) based on data included in PhyloTree. The absence of diagnostic mutations expected for a given haplogroup may be due to various types of laboratory and typographical errors, and the presence of diagnostic mutations for various haplogroups in a particular haplotype is most commonly a result of contamination or mix-up of samples. The EMPOP database is also provided with a tool for the search of possible artefacts in a set of population data based on quasi-median networks. The tool allows rapid identification of previously unobserved or unexpected mutations which may be sequencing artefacts or result from other errors in mtDNA analysis.
The application of these phylogenetic tools a posteriori offered a possibility to identify a number of errors in published data, and their usefulness is highlighted not only in forensic casework, but also in population and medical tests [27, 28]. It is worth noting that the results of mtDNA population studies which are currently submitted for publication in the official scientific journal of the ISFG – Forensic Science International: Genetics – are subject to prior mandatory quality control by the EMPOP database team. The ISFG-PL Group recommends that an analogous quality control scheme should also be applied to the population databases used by Polish laboratories for determining the evidential value of mtDNA test results.

Recommendations for the use of population databases

While interpreting test results, attention should be given to the fact that mtDNA is a haploid marker in which different sequence positions are characterized by different mutation rates. Consequently, mtDNA profiles should be considered as haplotypes on the one hand, but relative evolutionary stability and extreme instability of various sequence positions should be taken into account on the other. Therefore, it is difficult to formulate the criteria for excluding two haplotypes from the same maternal lineage in an absolutely rigid manner, based exclusively on the number of differences in sequences [27].
Based on Scientific Working Group on DNA Analysis Methods (SWGDAM) recommendations, exclusion occurs when test samples differ at two or more positions (with the exception of length heteroplasmy). The result is inconclusive if the haplotypes differ at a single position whether or not they share common length variants between positions 302–310, or differ in the length variant between positions 302–310, all other positions being concordant [4]. In the opinion of the authors of this study, the criteria given above are debatable, particularly the criterion applicable to the inconclusive result which is based on the lack of a common length variant. Region 302–310 is highly unstable, but 311–315 is a relatively stable fragment, and some authors argue that variant 315.1C should even be taken into account in phylogenetic reconstructions [16]. Also, SWGDAM’s recommendations do not address differences within the homopolymeric tract between nucleotides 16183 and 16194 in the presence of transition T16189C, and fail to differentiate between the length variants occurring against the background of very common and very rare haplotypes. Finally, they do not take into account the tissue origin of test samples. For example, the occurrence of a single homoplasmic difference between a blood sample and a hair sample is more likely than between two blood samples [27].
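For reference, the SWGDAM rule as summarized at the start of this paragraph can be stated in a few lines of Python. This is only a sketch of the counting rule that the authors regard as debatable; it deliberately ignores mutation rates, haplotype rarity and tissue origin. The function name and the label used when no difference is found are assumptions of this example.

```python
def swgdam_category(point_differences, length_variant_differs=False):
    """Classify a haplotype comparison by the counting rule summarized above.

    point_differences: number of differing positions, not counting
        length heteroplasmy.
    length_variant_differs: True if the samples differ in the length
        variant between positions 302-310.
    """
    if point_differences >= 2:
        return "exclusion"
    if point_differences == 1 or length_variant_differs:
        return "inconclusive"
    return "cannot exclude"  # label assumed for fully concordant haplotypes


print(swgdam_category(2))        # exclusion
print(swgdam_category(1))        # inconclusive
print(swgdam_category(0, True))  # inconclusive
```

The function takes only counts as input, which makes the limitation discussed above visible: two comparisons with the same count are treated identically regardless of which positions differ.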
Even though complete data on the mutation rates of different mtDNA positions are not yet available, the desired interpretation would be to take into consideration individual mutation rates in likelihood ratio (LR) calculations, especially in cases involving single-nucleotide differences between test samples [29], as suggested by Salas et al. [27]. In the case of single-nucleotide differences within the control region, in order to achieve the final resolution, an attempt can be made to expand the scope of testing based on information on the (sub)haplogroup affiliation of the test samples, and target further tests at appropriate sequence positions [30].
According to the ISFG guidelines, haplotype frequencies should be estimated preferably by searching databases that will ensure a maximally conservative interpretation of the evidential value of search results [5]. Since mtDNA occurring in populations is characterized by a high degree of geographical variation, in practice this recommendation means that the estimation of frequency of a particular haplotype should include the context of the case, and use a database from the geographical region from which the examined haplotype may originate. For example, in cases involving haplotypes belonging to haplogroups of European origin, this may be the section of the EMPOP database covering populations of Western Eurasia. If the context of a particular case suggests that the control region haplotype under analysis is that of a person belonging to a Central and Eastern European population, it would be advisable to use a geographically relevant database, if one is available in the laboratory. In this case, though, searching a control region database for Western Eurasia would be justified as well, since the degree of population stratification in this part of the world is very low, corresponding at least to the resolution level of the control region [31]. In all cases under study, however, the laboratory should be able to rationally justify the choice of a population database [5]. Unstable sequence fragments characterized by length polymorphism (e.g. adjacent to positions 16189, 310, 460, 573, “AC” repeats between positions 514 and 524) should not be considered during database searches. According to the guideline, the EMPOP database offers a possibility to ignore these positions. In instances of sequence heteroplasmy, however, none of the variants should be excluded during database searching [5].

Recommendations for the estimation of evidential value

There are several ways to present the evidential value of the results obtained through database search. The earliest approach was to simply specify the number of haplotype observations in the database, and to calculate haplotype frequency on that basis:

p = x / n
where: p – frequency of the haplotype, n – number of haplotypes in the database, x – number of observations of the profile in the database containing n profiles.
The EMPOP database gives a possibility to modify this approach by adding one or two obtained haplotypes to the database [5]:

$$p = \frac{x + 1}{n + 1}$$
or:

$$p = \frac{x + 2}{n + 2}$$
where: p – frequency of the haplotype, n – number of haplotypes in the database, x – number of observations of the profile in the database containing n profiles.
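The counting estimates differ only in how many haplotypes are added to both the observation count and the database size, so they can be expressed as one function. The following Python sketch uses a hypothetical function name and parameter (`added`), and the database size in the example is invented purely for illustration.

```python
def haplotype_frequency(x, n, added=0):
    """Counting estimate of haplotype frequency.

    x: observations of the haplotype in the database
    n: number of haplotypes in the database
    added: 0 for p = x/n, or 1 or 2 haplotypes added to the database
    """
    if added not in (0, 1, 2):
        raise ValueError("added must be 0, 1 or 2")
    return (x + added) / (n + added)


# A haplotype seen 3 times in a hypothetical database of 1000 haplotypes
for added in (0, 1, 2):
    print(added, haplotype_frequency(3, 1000, added))
```

Adding haplotypes guarantees a non-zero estimate even when the haplotype has never been observed (x = 0), which the plain x/n estimate does not.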
It has long been indicated that the estimation of frequency based on the number of observations in the database has a number of shortcomings in the case of rare haplotypes [3]. Consequently, to address the problem of underestimating the frequency of haplotypes that are under-represented in the database, a more cautious approach has been proposed, based on calculating the upper bound of the 95% confidence interval [4, 5]:

$$p_0 = p + 1.96\sqrt{\frac{p(1 - p)}{n}}$$
where: p_0 – upper bound of the 95% confidence interval; p – frequency determined on the basis of the number of observations of the profile in the database containing n profiles; n – number of haplotypes in the database.
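As a worked illustration of the normal-approximation bound, the Python sketch below evaluates it for an invented example of 10 observations in a database of 1000 haplotypes. The function name is hypothetical.

```python
import math


def upper_bound_normal(x, n, z=1.96):
    """Upper bound of the 95% confidence interval (normal approximation)."""
    p = x / n
    return p + z * math.sqrt(p * (1 - p) / n)


# 10 observations in a hypothetical database of 1000 haplotypes
print(round(upper_bound_normal(10, 1000), 6))  # 0.016167
```

For x = 0 the expression returns 0, which is why separate formulas are needed for very rare and unobserved haplotypes.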
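For very rare haplotypes, the Clopper-Pearson 95% confidence interval can be calculated [4]; its upper bound p_0 is the solution of:

$$\sum_{k=0}^{x} \binom{n}{k} p_0^{k} (1 - p_0)^{n - k} = 0.05$$
where: n – number of haplotypes in the database, x – number of observations of the profile in the database containing n profiles, k – summation index running over 0, 1, 2, …, x observations.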
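The Clopper-Pearson upper bound has no closed form for x > 0 and is found numerically. The sketch below is one possible Python illustration using bisection on the cumulative binomial probability. It assumes a tail probability of 0.05, chosen so that the result for x = 0 agrees with the formula 1 − α^(1/n); the function names are hypothetical.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); intended for small x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def clopper_pearson_upper(x, n, alpha=0.05, tol=1e-10):
    """Upper bound p0 solving P(X <= x | n, p0) = alpha by bisection."""
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The cumulative probability decreases as p grows, so a value
        # above alpha means the bound lies further to the right.
        if binomial_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical database of 1000 haplotypes
print(clopper_pearson_upper(0, 1000))
print(clopper_pearson_upper(1, 1000))
```

Bisection is used here only to keep the example free of external dependencies; statistical libraries typically obtain the same bound from a quantile of the beta distribution.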
If a haplotype is not found in the database, a very cautious method of estimating the 95% confidence interval [3, 4] is recommended, based on the following formula:

$$p_0 = 1 - \alpha^{1/n} = 1 - (0.05)^{1/n}$$
where: α = 0.05 – significance level corresponding to the 95% confidence interval, n – number of haplotypes in the database.
Following the estimation of frequency of the mtDNA haplotype, the LR value is calculated in a conventional manner as the inverse of frequency.
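A short numerical illustration of the last two steps, for a haplotype not found in a hypothetical database of 1000 haplotypes (the function names and the database size are assumptions of this sketch):

```python
def upper_bound_unobserved(n, alpha=0.05):
    """Upper 95% bound on the frequency of a haplotype absent from the database."""
    return 1 - alpha ** (1 / n)


def likelihood_ratio(frequency):
    """LR calculated as the inverse of the estimated haplotype frequency."""
    return 1 / frequency


p0 = upper_bound_unobserved(1000)
print(round(p0, 6))                   # 0.002991
print(round(likelihood_ratio(p0), 1)) # 334.3
```

The bound is close to 3/n, so the resulting LR is roughly one third of the database size, which shows how strongly the evidential value of an unobserved haplotype depends on the size of the database.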
The EMPOP database also has a functionality to incorporate the probabilistic approach in LR calculations (in the so-called kappa model), with prediction of the frequency distribution of rare haplotypes based on the percentage of the so-called singletons, i.e. profiles that occur in the database only once [32]:

$$\mathrm{LR} = \frac{n}{1 - \kappa}$$
where: κ – proportion of singletons in the database (expressed as a fraction), n – number of haplotypes in the database.
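The kappa formula can be illustrated on a toy database. The Python sketch below takes κ to be the number of singleton profiles divided by n. The haplotype labels are invented placeholders, and the function is a hypothetical illustration, not the EMPOP implementation.

```python
from collections import Counter


def kappa_lr(database):
    """LR = n / (1 - kappa) for a list of haplotypes (one entry per individual)."""
    n = len(database)
    counts = Counter(database)
    singletons = sum(1 for c in counts.values() if c == 1)
    kappa = singletons / n
    if kappa == 1:
        raise ValueError("every profile is a singleton; LR is undefined")
    return n / (1 - kappa)


# Toy database of 10 haplotypes, 4 of which occur only once
toy_database = ["h1", "h1", "h2", "h3", "h4", "h4", "h5", "h6", "h7", "h7"]
print(kappa_lr(toy_database))  # approximately 16.67
```

The more singletons a database contains, the larger the resulting LR, which reflects the expectation that a newly encountered haplotype is rare in a population where many haplotypes have been seen only once.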
While the ISFG considers all the above methods for estimating the evidential value as legitimate [5], the probabilistic approach (kappa model) may be the best way to determine the strength of evidence for rare haplotypes. Consequently, the approach is recommended by the ISFG-PL Group.
Under certain conditions, when calculating LR in identification tests, the results of mtDNA tests can be combined with the results of autosomal marker and Y chromosome analyses, producing LR values by means of multiplication [4, 5]. The approach is based on the assumption of independence (i.e. lack of association) between autosomal and haploid profiles in a given population, which has been verified experimentally (statistically) for some population samples [33, 34]. However, even under the assumption of independence, the procedure is only valid in situations where the databases used for estimating the frequency of haplotypes and genotypes include populations of the same geographical origin, there is no indication of population structure (stratification), and the adopted hypotheses are the same for different types of markers. Also, it must always be remembered that mtDNA results cannot differentiate between individuals from the same maternal lineage. Consequently, combining the evidential value provided by mtDNA and other markers brings no benefit if the circumstances of a particular case suggest that alternative hypotheses should consider different individuals related in the maternal line [5, 35, 36]. In view of the above considerations, the ISFG-PL Group recommends caution when presenting LR values obtained as a result of multiplication.
Particular care should be given to the hypotheses that are put forward in LR calculations for different markers. If a laboratory chooses to combine the LR values by multiplication, an alternative presentation of results in the form of constituents of LR values obtained for different markers is also obligatory [35].
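The reporting requirement above, a combined value accompanied by its components, can be sketched as follows in Python. The function name and the LR values are invented for illustration. The sketch assumes that the conditions for multiplication (independence, matching populations and identical hypotheses) have already been checked by the analyst.

```python
import math


def combined_report(component_lrs):
    """Return the component LRs together with their product."""
    return {
        "components": dict(component_lrs),
        "combined": math.prod(component_lrs.values()),
    }


# Invented LR values for three marker types
report = combined_report({"autosomal": 1000.0, "Y chromosome": 50.0, "mtDNA": 20.0})
print(report["combined"])    # 1000000.0
print(report["components"])
```

Returning the components together with the product keeps the individual contributions visible to the recipient of the report, in line with the recommendation.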

Conclusions

The present study systematizes existing recommendations in the field of mtDNA analysis for forensic purposes, and formulates interpretative guidelines which are especially relevant in view of recent developments in forensic laboratory methods. The recommendations developed by members of the ISFG-PL and presented in this document should be followed by all Polish laboratories conducting forensic testing.
The authors declare no conflict of interest.

References

1. Kivisild T. Maternal ancestry and population history from whole mitochondrial genomes. Investig Genet 2015; 6: 3.
2. Butler JM. The future of forensic DNA analysis. Philos Trans R Soc Lond B Biol Sci 2015; 370: 20140252.
3. Holland MM, Parsons T. Mitochondrial DNA sequence analysis – validation and use for forensic casework. Forensic Sci Rev 1999; 11: 21-50.
4. Scientific Working Group on DNA Analysis Methods (SWGDAM). Interpretation Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing Laboratories, 2013. https://www.swgdam.org/publications.
5. Parson W, Gusmão L, Hares DR, Irwin JA, Mayr WR, Morling N, Pokorak E, Prinz M, Salas A, Schneider PM, Parsons TJ. DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing. Forensic Sci Int Genet 2014; 13: 134-142.
6. Cooper A, Poinar HN. Ancient DNA: do it right or not at all. Science 2000; 289: 1139.
7. Llamas B, Valverde G, Fehren-Schmitz L, Weyrich LS, Cooper A, Haak W. From the field to the laboratory: Controlling DNA contamination in human ancient DNA research in the high-throughput sequencing era. Sci Technol Archaeol Res 2017; 3: 1-14.
8. Parson W, Brandstätter A, Alonso A, et al. The EDNAP mitochondrial DNA population database (EMPOP) collaborative exercises: organisation, results and perspectives. Forensic Sci Int 2004; 139: 215-226.
9. Parson W, Huber G, Moreno L, et al. Massively parallel sequencing of complete mitochondrial genomes from hair shaft samples. Forensic Sci Int Genet 2015; 15: 8-15.
10. Just RS, Scheible MK, Fast SA, et al. Full mtGenome reference data: development and characterization of 588 forensic-quality haplotypes representing three U.S. populations. Forensic Sci Int Genet 2015; 14: 141-155.
11. Malyarchuk B, Litvinov A, Derenko M, et al. Mitogenomic diversity in Russians and Poles. Forensic Sci Int Genet 2017; 30: 51-56.
12. Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 1999; 23: 147.
13. Anderson S, Bankier AT, Barrell BG, et al. Sequence and organization of the human mitochondrial genome. Nature 1981; 290: 457-465.
14. Behar DM, van Oven M, Rosset S, et al. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am J Hum Genet 2012; 90: 675-684.
15. Salas A, Coble M, Desmyter S, et al. A cautionary note on switching mitochondrial DNA reference sequences in forensic genetics. Forensic Sci Int Genet 2012; 6: e182-184.
16. Bandelt HJ, Kloss-Brandstätter A, Richards MB, Yao YG, Logan I. The case for the continuing use of the revised Cambridge Reference Sequence (rCRS) and the standardization of notation in human mitochondrial DNA studies. J Hum Genet 2014; 59: 66-77.
17. van Oven M, Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat 2009; 30: E386-E394.
18. Bandelt HJ, Parson W. Consistent treatment of length variants in the human mtDNA control region: a reappraisal. Int J Legal Med 2008; 122: 11-21.
19. Irwin JA, Saunier JL, Niederstätter H, et al. Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples. J Mol Evol 2009; 68: 516-527.
20. Just RS, Irwin JA, Parson W. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing. Forensic Sci Int Genet 2015; 18: 131-139.
21. Skonieczna K, Malyarchuk B, Jawień A, Marszałek A, Banaszkiewicz Z, Jarmocik P, Grzybowski T. Mitogenomic differences between the normal and tumor cells of colorectal cancer patients. Hum Mutat 2018; 39: 691-701.
22. Rebolledo-Jaramillo B, Su MS, Stoler N, et al. Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc Natl Acad Sci U S A 2014; 111: 15474-15479.
23. Ye K, Lu J, Ma F, Keinan A, Gu Z. Extensive pathogenicity of mitochondrial heteroplasmy in healthy human individuals. Proc Natl Acad Sci U S A 2014; 111: 10654-10659.
24. Just RS, Irwin JA, Parson W. Questioning the prevalence and reliability of human mitochondrial DNA heteroplasmy from massively parallel sequencing data. Proc Natl Acad Sci U S A 2014; 111: E4546-E4547.
25. Skonieczna K, Malyarchuk B, Jawień A, et al. Heteroplasmic substitutions in the entire mitochondrial genomes of human colon cells detected by ultra-deep 454 sequencing. Forensic Sci Int Genet 2015; 15: 16-20.
26. Berger C, Hatzer-Grubwieser P, Hohoff C, Parson W. Evaluating sequence-derived mtDNA length heteroplasmy by amplicon size analysis. Forensic Sci Int Genet 2011; 5: 142-145.
27. Salas A, Bandelt HJ, Macaulay V, Richards MB. Phylogeographic investigations: the role of trees in forensic genetics. Forensic Sci Int 2007; 168: 1-13.
28. Salas A, Carracedo A, Macaulay V, Richards M, Bandelt HJ. A practical guide to mitochondrial DNA error prevention in clinical, forensic, and population genetics. Biochem Biophys Res Commun 2005; 335: 891-899.
29. Grzybowski T, Malyarchuk BA, Bednarek J, Woźniak M, Papuga M, Stopińska K, Luczak S. Phylogeographic approach in the interpretation of mitochondrial DNA sequencing results in forensics. Arch Med Sadowej Kryminol 2006; 56: 191-197.
30. Skonieczna K, Bednarek J, Rogalla U, et al. The application of mitochondrial genomics to forensic investigations based on human mitochondrial DNA testing. Arch Med Sadowej Kryminol 2012; 62: 213-218.
31. Grzybowski T, Malyarchuk BA, Derenko MV, Perkova MA, Bednarek J, Woźniak M. Complex interactions of the Eastern and Western Slavic populations with other European groups as revealed by mitochondrial DNA analysis. Forensic Sci Int Genet 2007; 1: 141-147.
32. Brenner CH. Fundamental problem of forensic mathematics – the evidential value of a rare haplotype. Forensic Sci Int Genet 2010; 4: 281-291.
33. Walsh B, Redd AJ, Hammer MF. Joint match probabilities for Y chromosomal and autosomal markers. Forensic Sci Int 2008; 174: 234-238.
34. de Zoete J, Sjerps M, Meester R, Cator E. The combined evidential value of autosomal and Y-chromosomal DNA profiles obtained from the same sample. Int J Legal Med 2014; 128: 897-904.
35. Amorim A. A cautionary note on the evaluation of genetic evidence from uniparentally transmitted markers. Forensic Sci Int Genet 2008; 2: 376-378.
36. Gjertson DW, Brenner CH, Baur MP, et al. ISFG: Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 2007; 1: 223-231.

Copyright: © 2019 Polish Society of Forensic Medicine and Criminology (PTMSiK). This is an Open Access journal, all articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License (http://creativecommons.org/licenses/by-nc-sa/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material, provided the original work is properly cited and states its license.
