Charles University in Prague, Faculty of Science. Doctoral dissertation

Charles University in Prague
Faculty of Science
Doctoral dissertation
Single cell gene expression profiling and quality control
Mgr. David Švec
Supervisor: prof. Mikael Kubista, Ph.D.
Prague and Gothenburg,

To my dear Markéta

I declare that I have written this thesis independently and that I have cited all information sources and literature used. Prague,

Single cell gene expression profiling and quality control

David Švec
Institute of Biotechnology AS CR, Laboratory of Gene Expression, Prague, Czech Republic
TATAA Biocenter, Research & Development, Gothenburg, Sweden

Abstract: Gene expression profiling has become an exceedingly important tool for describing the occurrence of mRNA in tissue samples and even in single cells. It is most often used to characterize cell types, degree of differentiation, and pathology at the molecular level. In our newly established laboratory, we developed high-resolution qPCR tomography to show the distribution of tens of maternal mRNAs within a single oocyte. We demonstrated that the distribution of mRNAs plays an important role in the further development of the organism. For high-resolution qPCR tomography, in which one oocyte is divided into tens of samples and about fifty genes are studied in each sample, we optimized a dye-based protocol for the microfluidic high-throughput platform BioMark. The next step was to complement the molecular profile of the tens of most important genes with histological information on each selected tissue section using laser microdissection. As a model we used embryonic development of the mouse molar. Our goal was to describe the interaction of up to one hundred genes at different stages of development and at the single-cell level. This work also reviews the development of molecular tools for testing samples for contamination, genomic background, and RNA quality. Such tools facilitate the development of new analytical approaches and prove to be crucial quality controls for challenging studies of gene expression in time and space, not only at the single-cell level. Such studies are expected to accelerate the understanding of cell regulation and to identify new molecular targets for therapeutic use.

Keywords: real-time PCR, single-cell biology, single-cell gene expression, gene expression profiling, map of gene expression, qPCR tomography, high-throughput qPCR, quality control, RNA spike, DNA spike, genomic background, direct cell lysis

Single cell gene expression profiling and quality control

David Švec
Institute of Biotechnology AS CR, Laboratory of Gene Expression, Prague, Czech Republic
TATAA Biocenter, Research & Development, Gothenburg, Sweden

Abstract (Czech version): Gene expression profiling is a very important tool that provides information on the occurrence of mRNA in a tissue sample or in single cells. It is used to characterize cell types, determine the degree of differentiation, and study pathological states at the molecular level. In our newly established laboratory, we developed high-resolution qPCR tomography, which reveals the distribution of maternal mRNAs within a single oocyte. The distribution of mRNA in the embryo is important for understanding the subsequent development of the individual. qPCR tomography combines dividing the oocyte into tens of sections with quantifying the expression of up to fifty genes in each sample. This became possible only after the protocol for the high-throughput qPCR instrument BioMark was optimized. A further effort was to complement qPCR tomography with imaging of the tissue studied and its selection by laser microdissection. The goal was to create the first map of the mRNA distribution of tens of the most important genes during mouse molar development in space and time, describing gene interactions at individual stages and, where applicable, in selected cells. A further task of this dissertation is to present the development of molecular tools for checking contamination, genomic background, and mRNA quality in a sample. These tools greatly facilitate the development of methods for analyzing gene expression in time and space, including at the single-cell level. They can thus contribute to an easier understanding of regulatory processes between cells, which may lead to new diagnostic and therapeutic procedures.

Keywords: real-time PCR, single-cell biology, single-cell gene expression, gene expression profiling, qPCR tomography, high-throughput qPCR, quality control, RNA spike, DNA spike, genomic background, direct cell lysis

List of publications: The dissertation is based on the following works, listed chronologically by the time the topics were addressed:
I. Sindelka R., Sidova M., Svec D., Kubista M. Spatial expression profiles in the Xenopus laevis oocytes measured with qPCR tomography. Methods May;51(1):
II. Svec D., Rusnakova V., Korenkova V., Kubista M. Dye-Based High-Throughput qPCR in Microfluidic Platform BioMark; PCR Technology: Current Innovations, Third Edition, Editors: Tania Nolan, Steve Bustin, CRC Press (Taylor and Francis Group), Ch 23; 2013
III. Laurell H., Iacovoni JS., Abot A., Svec D., Maoret JJ., Arnal JF., Kubista M. Correction of RT-qPCR data for genomic DNA-derived signals with ValidPrime. Nucleic Acids Res Apr;40(7)
IV. Kubista M., Rusnakova V., Svec D., Sjögreen B., Tichopad A. GenEx: Data Analysis Software. Quantitative real-time PCR in applied microbiology; Editor: Martin Filion; Caister Academic Press;
V. Svec D., Andersson D., Pekny M., Kubista M., Ståhlberg A. Direct cell lysis for single-cell gene expression profiling; accepted in Frontiers in Molecular and Cellular Oncology, 22 October 2013; DOI: /fonc

CONTENTS:
LIST OF ABBREVIATIONS
INTRODUCTION
AIMS OF THE THESIS
LITERATURE REVIEW
  Why measure gene expression at the single-cell level?
  Methods for obtaining single cells for gene expression profiling
  Methods for single-cell gene expression profiling at the mRNA level
  Methods for single-cell gene expression profiling at the protein level
RESULTS AND DISCUSSION
  High-resolution subcellular qPCR tomography
  Microfluidic high-throughput analysis using nonspecific dyes
  qPCR tomography combined with laser microdissection
  Quality control of the experiment in single-cell gene expression profiling
  Inhibition and yield
  RNA and mRNA quality
  Genomic contamination ValidPrime
  Direct cell lysis
  Data analysis
GENERAL DISCUSSION
CONCLUSION AND OUTLOOK
ACKNOWLEDGEMENTS
REFERENCES

List of abbreviations
Cq - quantification cycle
BMP - bone morphogenetic protein
BSA - bovine serum albumin
ELISA - enzyme-linked immunosorbent assay
FACS - fluorescence-activated cell sorting
FFPE - formalin-fixed, paraffin-embedded
FGF - fibroblast growth factor
GTC - guanidinium thiocyanate
LNA - locked nucleic acid
LOD - limit of detection
LOQ - limit of quantification
MDA - multiple displacement amplification
MIQE - minimum information for publication of quantitative real-time PCR experiments
miRNA - micro RNA
ncRNA - non-coding RNA
RT - reverse transcription
SHH - sonic hedgehog
SNP - single nucleotide polymorphism
SOM - self-organizing maps
smFISH - single-molecule fluorescence in situ hybridization
PCA - principal component analysis
PNA - peptide nucleic acid
qPCR - quantitative polymerase chain reaction = real-time PCR
RIN - RNA integrity number
RQI - RNA quality indicator

1 Introduction

Gene expression profiling is a widely used approach [1, 2] that provides information on the occurrence of mRNA in a tissue sample or in individual cells. It is an exceptionally useful tool for describing fundamental cellular processes, diagnosing disease, and discovering new therapeutic targets. Such studies are typically performed on tissue samples comprising thousands of cells. Current trends show that the size of the sample studied is closely tied to its complexity and heterogeneity and, above all, to the kind of information that can be uncovered. Studies at the single-cell level can identify previously hidden minor cell populations and new regulatory mechanisms, which helps clarify their function and find new therapeutic options.

The most common techniques for measuring gene expression at the mRNA level are currently microarrays, quantitative PCR (qPCR), next-generation sequencing, and in situ hybridization together with in vivo imaging. Each of these methods has its advantages and limitations, and comparing results across methods is not always easy, or even possible. Biological processes in a multicellular organism are highly complex, and the current state of knowledge and technology allows us to study them only if we focus on one or two levels of information at a time. For example, in vivo methods based on microscopy and fluorescent probes can provide unique information on the localization of several mRNAs simultaneously, but the context of additional genes at the same place and time is usually lost. Microarrays often provide an overview of the whole transcriptome; however, reliable quantification requires considerable amounts of pure, high-quality RNA, which must be collected from the entire object or objects studied, so the valuable link to the localization of expression in a complex tissue is lost. RNA sequencing can provide the most comprehensive information on the current state of a cell, including the identification of unknown transcripts. Current next-generation sequencing already allows transcriptome analysis at the single-cell level, but reliable quantification from small input amounts of RNA and high costs remain serious obstacles. qPCR offers very precise quantification of mRNA for up to hundreds of genes from a single cell; however, to add the valuable information on the cell's position, it is usually necessary to preserve both high RNA quality and a clear histological image of the tissue at the time of cell collection.

The first step our laboratory took toward mapping gene expression in time and space was the development of subcellular qPCR tomography of Xenopus laevis oocytes in 2008 [3].

Continuing the qPCR tomography of the oocyte required extending the map to higher resolution and to more transcripts, which meant bringing into operation the high-throughput BioMark instrument with the available primer libraries, which are based on nonspecifically binding dyes such as SYBR Green. To determine the amount of mRNA for up to hundreds of genes and their interactions in a biological system in time and space, we then attempted to combine qPCR tomography with laser microdissection, which contributes information on cell histology and allows the object of study to be selected down to the level of individual cells. Embryonic development of the mouse molar served as our model. Technical obstacles soon became apparent, related to the quality and yield of RNA from tissue that must at the same time be preserved histologically. The number of samples and the small number of transcripts they contain also required optimization of the direct lysis protocol. Overcoming these obstacles required the development of new tools for controlling the pre-analytical phase of the experiment with respect to inhibition, loss of material, and RNA degradation. During my studies I attempted to overcome these obstacles.

2 Aims of the thesis

The main aims of my thesis can be summarized in the following points:
- to lay the foundations for studying single-cell gene expression in the newly established Laboratory of Gene Expression and to use new techniques to explain unknown relationships in RNA distribution and their significance for cellular processes and the whole organism
- to develop high-resolution qPCR tomography for studying mRNA distribution at the subcellular level, using the developmental model of Xenopus laevis oocytes
- to complement qPCR tomography with microscopic imaging and the ability to select the tissue studied, and to use it to build a spatiotemporal map of mRNA distribution in a model of embryonic mouse molar development
- in an academic and later a corporate setting, to develop tools for checking sample purity and the accuracy of mRNA quantification that make qPCR easier to use, including tools for data evaluation
- in a corporate setting, to find a way to lyse single cells directly and efficiently without purification, with regard to mRNA stability and compatibility with molecular methods

3 Literature review

3.1 Why measure gene expression at the single-cell level?

After the most sought-after genomes were sequenced (human in 2001, mouse in 2002, rat in 2004), a range of tools emerged that enabled high-throughput analysis of the molecular profile of various biological processes related to organism development, cell differentiation, or distinguishing normal tissue from cancerous tissue. Whether microarrays, sequencing, or PCR were used for genomic or transcriptomic studies, the samples mostly consisted of thousands to billions of cells of various types, with each cell contributing to the overall average. This approach suits some applications, such as comparing the output of genetically modified bacteria, where the average of the majority population is a desirable indicator of production efficiency. However, in studies of complex processes in tissues, which necessarily involve interactions among more or less abundant cell populations, this approach hides information on the diversity of the cell populations. A large part of the key information is lost, which prevents a complete and correct interpretation of what takes place at the molecular level during these processes (Figure 1). Examples of the usefulness of separating the contributions of different populations include the identification of cancer stem cells in a tumor [4-6], the detailed interaction of neurons, astrocytes, microglia, and oligodendrocytes in spinal cord injury [7-9], and the function of pancreatic α, β, and γ cells in the study and therapy of diabetes [10].

Figure 1: Limitations arising from the analysis of a cell mixture. If we observe increased expression of gene α in a mixture of cells compared with a control population, this may mean (1) an increase in all cells of the sample, (2) increased expression in only one cell type, or (3) proliferation of the cells that express the gene. Reproduced from [11].

This masking effect of cell pools can be avoided by analysis at the single-cell level. The first such studies appeared in 2002 [12] and showed that even seemingly homogeneous populations display a high level of variability in the number of mRNA transcripts present in individual cells [12-18]. A log-normal distribution of mRNA appears to be typical: most transcripts are concentrated in only a few cells of the population, while the remaining cells are in a resting state with almost none of the given transcript (Figure 2) [19, 20]. This is consistent with the theory of stochastic regulation of gene expression and transcriptional bursts.
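The skewed, log-normal picture described above can be illustrated with a small simulation. The following is only a sketch with made-up parameters (`mu`, `sigma`, and the helper names are assumptions chosen for illustration, not values from the cited studies). It shows why the geometric mean is a more representative "typical cell" than the arithmetic mean when a few cells hold most of the transcripts.

```python
import math
import random


def simulate_transcript_counts(n_cells, mu, sigma, seed=0):
    """Draw per-cell transcript levels from a log-normal distribution.

    mu and sigma are hypothetical parameters of the underlying normal
    distribution of ln(transcript level); they are not fitted to data.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_cells)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Average on the log scale, then transform back to the linear scale.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def share_of_top_cells(values, fraction=0.1):
    """Fraction of all transcripts held by the top `fraction` of cells."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


if __name__ == "__main__":
    counts = simulate_transcript_counts(n_cells=96, mu=3.0, sigma=1.2)
    print("arithmetic mean:", round(arithmetic_mean(counts), 1))
    print("geometric mean: ", round(geometric_mean(counts), 1))
    print("share held by top 10% of cells:", round(share_of_top_cells(counts), 2))
```

On a logarithmic scale such data look approximately symmetric, which is why the mean is taken there and transformed back.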

Figure 2: Example of a typical log-normal distribution of mRNA for beta-actin in 96 astrocytes from a primary culture, shown on a linear scale (left) and on a logarithmic scale (right), with the geometric mean marked, which characterizes the typical cell of the population.

Determining the average cell by modeling the transcript distribution then serves as a characteristic of the population and allows it to be distinguished from another population of seemingly identical cells (Figure 2) [21]. Many recent studies show the advantages of this approach, for example in identifying and characterizing very minor subpopulations of stem cells responsible for tissue renewal [4, 7, 8, 22-26], which opens the way to a better understanding of the complex regulatory processes of cell signaling and differentiation.

Regarding the theory of stochastic mRNA distribution and the dynamics of gene expression regulation, it should be added that genotype and external environmental factors need not be the only factors that produce different phenotypes. Consider a cell dividing into two identical daughter cells. During division, molecules are partitioned randomly, largely according to Brownian motion and the laws of statistical mechanics. The probability that both daughter cells receive the same number of molecules is very small. Moreover, even if each cell received one molecule of a given transcription factor, the molecule would take a different time to reach its target and trigger transcription, because Brownian motion in two different cells cannot be expected to be correlated. Both factors contribute to greater phenotypic variability between two isogenic cells [27] and can be described as intrinsic noise, which is random and uncorrelated between genes. Another study describes extrinsic noise, which includes regulatory factors for which correlation across multiple genes can be expected, such as a transcription factor or the amount of RNA polymerase. A simple experiment was performed in E. coli in which two different fluorescent proteins were fused to the same promoter. With extrinsic noise alone, all bacteria would have the same color, produced by the mixture of both proteins, and would differ only in intensity. With intrinsic noise, the correlation of the colors would be purely random. It was shown, however, that both types of noise act simultaneously and that their extent depends on the promoter [12].

A study of yeast using smFISH showed that some genes are regulated constitutively. They do not show high cell-to-cell variability in expression and can be described by a model of constant mRNA production and degradation. For some genes, however, a binary model was more appropriate, which assumes two promoter states: inactive, with no expression detected, and active, in which expression rises sharply [28]. These two states are also often associated with the closed or open form of chromatin for the given transcript [29, 30]. The binary model can be further subdivided according to the frequency and duration of the active and inactive phases (Figure 3). Genes falling into different categories have been found within a single eukaryotic cell, so the variability of a gene's expression in a population, at both the mRNA and the protein level, can be expected to be informative for understanding regulatory mechanisms [27].

Figure 3: Effect of gene expression regulation on its dynamics and on population variability. Typical mRNA distribution depending on the type of gene regulation: (I) a population with long intervals of active and inactive expression, resulting in clearly defined groups of active and inactive cells; (II) a population with short intervals of active and long intervals of inactive transcription, with a gradual decline in the number of mRNA molecules; (III) a relatively homogeneous population with continuous mRNA production combined with very short intervals of inactive expression. Reproduced from [27].
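The difference between constitutive expression and the binary (two-state) promoter model can be made concrete with a small stochastic simulation. The sketch below is an illustration only: it uses the Gillespie algorithm, which is my choice of simulation technique and is not taken from the cited studies, and all rate constants (`k_on`, `k_off`, `k_syn`, `k_deg`) are hypothetical. It compares the Fano factor (variance divided by mean) of mRNA counts across a simulated population, which is close to 1 for constitutive expression and much larger for bursty expression.

```python
import random
from statistics import mean, pvariance


def simulate_cell(k_on, k_off, k_syn, k_deg, t_end, rng, start_on=False):
    """Gillespie simulation of one cell; returns the mRNA count at t_end.

    The promoter switches on with rate k_on and off with rate k_off.
    mRNA is made at rate k_syn only while the promoter is on and each
    molecule is degraded at rate k_deg. All rates are hypothetical.
    """
    t, on, m = 0.0, start_on, 0
    while True:
        rates = [
            k_off if on else k_on,  # promoter switching
            k_syn if on else 0.0,   # synthesis of one mRNA
            k_deg * m,              # degradation of one mRNA
        ]
        total = sum(rates)
        if total == 0.0:
            return m
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            return m
        r = rng.random() * total     # choose which event happens
        if r < rates[0]:
            on = not on
        elif r < rates[0] + rates[1]:
            m += 1
        else:
            m -= 1


def simulate_population(n_cells, seed, **params):
    rng = random.Random(seed)
    return [simulate_cell(rng=rng, **params) for _ in range(n_cells)]


def fano_factor(counts):
    return pvariance(counts) / mean(counts)


# Bursty gene: rare, short active periods with strong synthesis.
BURSTY = dict(k_on=0.05, k_off=0.5, k_syn=20.0, k_deg=1.0, t_end=50.0)
# Constitutive gene: promoter always on, similar mean expression.
CONSTITUTIVE = dict(k_on=1.0, k_off=0.0, k_syn=1.82, k_deg=1.0,
                    t_end=50.0, start_on=True)

if __name__ == "__main__":
    bursty = simulate_population(400, seed=1, **BURSTY)
    constitutive = simulate_population(400, seed=2, **CONSTITUTIVE)
    print("bursty:       mean", mean(bursty), "Fano", fano_factor(bursty))
    print("constitutive: mean", mean(constitutive), "Fano", fano_factor(constitutive))
```

Varying `k_on` and `k_off` changes the length of the active and inactive phases and moves the simulated population between the regimes sketched in Figure 3.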
The coexistence of high transcript numbers of two or more genes in one cell at the same moment then very likely indicates their correlation, regulatory relationships, and fundamental biological information about a shared regulatory cascade. Modeling can also be used to infer whether gene X activates or inhibits the expression of gene Y, or whether both are regulated by gene Z. For example, if the expression maximum of gene X is detected before the expression maximum of gene Y, one can infer that gene X activates gene Y. If their maxima are detected simultaneously, they probably share another common activating element Z.

A recent study performed on the largest set of single cells to date (1440 B lymphocytes), which looked for a link between single-nucleotide polymorphisms (SNPs) and the dynamics of gene expression, shows that not only the overall expression level but also the frequency and size of expression bursts, together with the interconnectedness of gene expression, depend on the current condition of the cell and its surroundings, the cell cycle, and any substance administered. The study also compares the variability of mRNA expression between single cells within a sample and between individuals; the variability of gene expression turns out to be very similar across individuals and thus probably has characterizing value. Whereas single-cell analysis makes it possible, also in connection with SNPs, to determine the regulatory network of gene expression within a sample for a given biological phenomenon, traditional studies based on comparing pooled samples do not offer this possibility and usually do not observe pronounced changes in gene expression levels because of the many overlapping signals [31].

Although correlations between different genes at the mRNA or protein level can clarify regulatory relationships, the mRNA level does not always correlate with the amount of protein of the given gene. Studies show that there is also high heterogeneity at the protein level, in both amount and localization, across individual cells [32, 33]. In yeast, cases are known in which non-correlated transcripts produce proteins for highly synchronized complexes of the proteasome and subunits of RNA polymerase II [34]. Work in E. coli showed that a low correlation between mRNA and protein is the more common observation, which suggests that different regulatory systems exist for mRNA molecules and for proteins [35].
Simple functional examples of the advantages that such phenotypic variability can bring for cell survival and development include the response of bacteria to an antibiotic: at a certain dose most bacteria die, but a few with a high amount of the required product survive. The situation is analogous for chemotherapy in cancer cells [36], for which variability and the subsequent accumulation of oncogenic mutations bring advantages similar to Darwinian evolution [37, 38].

3.2 Methods for obtaining single cells for gene expression profiling

Single cells can be generated from most tissues by dissociation followed by enrichment for the cell types of interest, for example by FACS or microaspiration, but this procedure usually loses the context of the cell's environment and its localization. One way to prevent the loss of localization is laser microdissection. To obtain nucleic acids from single cells, it is generally advantageous to use direct lysis without washing steps in order to minimize loss of material.

3.2.1 FACS (used in this work)

Flow cytometry combined with fluorescence-based cell sorting (FACS) has been developed since the 1970s to automate the sorting of single cells based on selected markers, mainly surface proteins. It is the oldest method for selecting single cells: the cell suspension is focused into a thin, laminar stream of liquid that passes a light source (laser) and a detector, which measures selected parameters, whether nonspecific viability staining, DNA content, cell cycle, or specific surface markers. The laminar stream carrying the cells is broken into microdroplets, and it can be verified that a droplet contains exactly one cell. Based on the detector data, the droplet is given an electrostatic charge and its trajectory can be deflected as needed by an electric field between charged plates. With the latest instruments, a selected population can thus be obtained within an hour from tens of thousands of cells based on up to 17 different parameters [39]; the limit arises from spectral overlap and increased fluorescence background. Combining the approach with mass spectrometry and labeling antibodies with specific elements makes it possible to distinguish cells by up to three dozen factors [40, 41]. An undeniable advantage of FACS is that cells remain viable during sorting. Cells can be cultured further or directed into a well of a microtiter plate and lysed there directly for further molecular analysis without major loss of mRNA [42].

FACS therefore remains an indispensable tool for high-throughput single-cell analysis, including qPCR, microarrays, and sequencing. A disadvantage is the need to dissociate the tissue into a cell suspension, which makes it impossible to retain information on cell localization and often also brings problems with altered gene expression and the loss of a large number of cells. Operating the instrument and permeabilizing and labeling the cells usually require considerable experience, and the high purchase cost of a cell sorter, together with running costs and the purchase of antibodies, can also be limiting.

3.2.2 Laser microdissection (used in this work)

Laser microdissection (LMD) [43, 44] combines the advantages of in situ localization by microscopy with analysis of the transcriptome, and also of proteins, by molecular techniques. After the tissue is fixed, the method allows precise targeting of the cells of interest under direct visual control in the microscope and their selection for further analysis. With living cells, it can be used to select cells for further culture. Two main technological approaches are available. Laser cutting dissection typically uses an ultraviolet (UV) laser to cut out a photolabile membrane on which the tissue section is mounted; the piece of the section, together with the UV-labile foil, is then transferred into a prepared tube. The second approach is laser capture dissection, in which a thermolabile membrane above the sample is deformed by an infrared (IR) laser so that it adheres to the target tissue; the foil is then removed together with the selected tissue. For more precise selection down to single cells and for preserving RNA quality, the first variant, with a UV laser and UV-labile foil, is more suitable [45]. Laser microdissection is one of the methods that can be used to analyze isolated organelles, considering that its precision reaches the level of excising a metaphase chromosome [46].

The main limiting factor is the problematic yield and quality of RNA when the most commonly used formalin fixation (FFPE) is chosen, which is very popular in diagnostics for its simplicity and its exceptional ability to preserve histological features. The RNA is damaged by the formation of methylene bridges between nucleic acids and proteins, which damage the template for enzymatic processing [47]. The recommended fixation by rapid freezing gives better RNA quality and a higher yield, but histological features and tissue integrity are usually damaged by the formation of ice crystals.
Possible solutions include new tissue fixation protocols that preserve histology, RNA, proteins, and DNA [48], a protocol for teeth [49], or simple ethanol fixation immediately after the tissue is removed from -80 °C [50].

3.2.3 Microaspiration

Microaspiration of cells requires a microscope coupled to a micromanipulator, a vacuum source, and microtubes into which the tip of the glass capillary containing the target cell is usually broken off. A suspension of dissociated cells or tissue in an RNase-free solution is used most often; with certain modifications, fixed material can also be used. The advantage of the method is the very high precision and degree of control over the selection of a cell or even an organelle; measuring the membrane potential before the cell is collected can add further value. The disadvantage is that collecting hundreds of cells for gene expression profiling is technically demanding and, above all, time-consuming. When cell collection takes several hours, mRNA degradation and changes in the expression profile often occur.

3.3 Methods for single-cell gene expression profiling at the mRNA level

Given the size of a eukaryotic cell (10 µm) and the limited amount of RNA (on the order of picograms of total RNA, of which mRNA is only a part [51]), single-cell analysis is a technically demanding discipline. For some methods, however, the thousands of different active genes, represented by roughly one hundred thousand transcripts in a single cell, provide enough material for quantification and offer insight into the molecular characteristics of individual cells.

3.3.1 Microarray

Microarrays can detect and quantify the complete transcriptome, including non-coding RNA (ncRNA) and micro RNA (miRNA), by hybridizing a fluorescently labeled template to immobilized synthetic oligonucleotide probes, provided the template sequence is known. Microarrays usually need 1-2 µg of pure, high-quality RNA, far more than a single cell contains. For single-cell analysis, the RNA must therefore be amplified by PCR, T7, or isothermal amplification [42], a step that often restricts the choice of detection targets and can introduce quantification error [52]. Data analysis is relatively complex, and the method has a narrow dynamic range together with low sensitivity at low template concentrations. With thorough optimization and validation, it can be used to detect genes in a single cell [52-54]. The main disadvantages remain the cost per sample and the need to analyze up to hundreds of cells individually.

3.3.2 Sequencing

Second-generation sequencing (pyrosequencing = sequencing by synthesis; transcriptomic variant: massively parallel sequencing = RNA-seq) has fundamentally changed the study of genomes and transcriptomes and is now approaching single-cell analysis both technically and economically. It uses several principles (pyrosequencing, 454, Solexa, and SOLiD) [55]. It is the only technique that can identify unknown mutations and transcripts. For other technologies, such as microarrays, qPCR, and FISH, the sequence of the target transcript must first be mapped.

DNA sequencing is used to characterize mutations and deviations in the genome (copy number changes) in coding and non-coding regions, DNA methylation, and chromatin structure. A cell usually contains picograms of genomic DNA, whereas nanograms to micrograms are needed as input for sequencing. Whole-genome amplification is therefore necessary. At the time of writing, considerable effort is being devoted to developing an optimal whole-genome amplification method usable at the single-cell level and to moving away from the inadequate MDA [56]. Nevertheless, fascinating studies of intratumor heterogeneity [38, 57] at the single-cell level [5] are already known.

RNA sequencing (RNA-seq) is used to quantify gene expression by the number of sequence reads after they are assigned to a gene using reference coding databases. Typically, millions of reads are needed to identify new genes, splice variants, and exons. In addition, known and unknown miRNAs, ncRNAs, and splice variants can be sequenced and unknown transcripts discovered [42, 58]. In 2010 it was possible to sequence the transcriptomes of 16 single cells in 6 days (SOLiD) [59]; in 2012, 42 circulating tumor cells were sequenced (Illumina, > 20 million reads), and the dynamic range of transcriptome quantification was also measured. The study shows good correlation in the range of 100 pg to 1 ng. At 10 pg errors appear, most likely because amplification from a single cell introduces bias and nonspecific products into the analysis. Validation of RNA-seq results by qPCR is therefore often a necessary part of such experiments. The need for a protocol that amplifies the transcriptome more evenly, especially for weakly expressed genes, persists [58, 60].

A disadvantage of the method, as with microarrays, is that at a certain stage of the protocol second-generation sequencing uses PCR to amplify the material so that the fluorescence signal reaches a detectable level (Ion Torrent, which uses electronic detection, is an exception).
Another problem is DNA contamination of reagents, which is difficult to avoid and whose origin is hard to trace in de novo genome sequencing [42]. The user has to cope with large amounts of data and process up to 50 gigabases from a single experiment. De novo sequencing of the human genome by Sanger sequencing cost approximately 3 billion USD and took 10 years; today the cost is around 100,000 USD and the work takes a matter of days. High costs nevertheless remain an obstacle to analysing larger numbers of single cells. A promising and revolutionary solution for the future appears to be third-generation sequencing at the single-molecule level (Pacific Biosciences, Visigen, IBM with its DNA transistor, Nanopore, Helicos), which should reduce the per-sample cost to about 1,000 USD per genome and also enable direct analysis of RNA on an unlimited scale without the need for reverse transcription and preamplification [55, 61, 62]. Another very interesting application will be the typing of active alleles and the elucidation of epigenetic regulation [63]. Adoption in practice and resolution of the technical problems, including standardization and computationally demanding complex analysis, will still take several years, and even then PCR-based methods can be expected to keep an important role in the simple quantification of already identified target genes in many samples.

smFISH

For some time it has been possible to visualize mRNA in situ in its histological and morphological context using in situ hybridization (ISH) and, in fluorescence in situ hybridization (FISH), fluorescently labelled specific oligonucleotide probes, most often about 20 bases long. Development has reached single-cell resolution in smFISH, where many (48-96) probes bind to a single mRNA molecule and together produce a signal strong enough for its detection [13]. Among molecular methods, this is a direct way to visualize mRNA in the cell and to follow its regulation and movement. It requires only fixation of the material, without tissue dissociation, purification, reverse transcription or preamplification, which is an indisputable advantage [42]. Single-molecule FISH also allows fairly accurate quantification of a gene. The limiting factor is the number of targets, since multiplexing can distinguish a spectrum of only 3-5 different probes. With combinatorial probe labelling, 2^n - 1 genes can be detected at once, where n is the number of spectrally distinguishable channels; in practice the number of genes ranges between […]. Short target molecules such as miRNA can be a problem: detection is either impossible or possible only with specially modified nucleotides that bind the target molecule more strongly and specifically (LNA, PNA) [62, 64].

qPCR (used in this work)

The polymerase chain reaction (PCR) was invented in the 1980s and is used for amplification, detection and quantification of nucleic acids [65]. For quantification of mRNA, reverse transcription is combined with quantitative PCR (RT-qPCR). The method relies on a thermostable polymerase, a pair of specific primers and a DNA template which, under a repeating temperature protocol, is ideally amplified exactly twofold in each cycle, in the presence of a fluorescent reporter whose signal correlates in real time with the amount of DNA present.
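The ideal twofold amplification per cycle can be illustrated with a minimal sketch. The function names and the `efficiency` parameter are illustrative assumptions, not part of the thesis; `efficiency = 1.0` corresponds to perfect doubling, and real assays usually fall somewhat below it.

```python
def copies_after_cycles(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Template copies after `cycles` PCR cycles.

    efficiency = 1.0 means ideal doubling in every cycle (assumption).
    """
    return n0 * (1.0 + efficiency) ** cycles


def fold_difference(cq_a: float, cq_b: float, efficiency: float = 1.0) -> float:
    """Starting amount of sample A relative to sample B, from their Cq values.

    A sample that reaches the threshold earlier (lower Cq) started with more template.
    """
    return (1.0 + efficiency) ** (cq_b - cq_a)


if __name__ == "__main__":
    # One molecule amplified ideally for 30 cycles.
    print(copies_after_cycles(1, 30))
    # Sample A crosses the threshold 3 cycles earlier than sample B.
    print(fold_difference(20.0, 23.0))
```

Under ideal doubling, a difference of one cycle in Cq corresponds to a twofold difference in starting template, which is why Cq differences are read on a log2 scale.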

Of the methods described above, qPCR is probably the most widespread laboratory technique, with the lowest equipment requirements. PCR was applied to single cells as early as 1988 [66], and the first qPCR gene expression profiling of 5 genes across 169 single cells was performed in 2005 on pancreatic β-cells [19]. The drawbacks of the method are that analysis is restricted to already mapped genes and that the qPCR reactions must be compartmentalized: analysing 100 cells and 100 genes requires 100 × 100 = 10,000 reactions. On the other hand, this contributes to high flexibility in experimental design, since different numbers of genes can be combined fairly easily with different numbers of samples. An advantage of qPCR is the low cost per sample, which is key for studies with hundreds of cells. Another important parameter is its high sensitivity, in theory down to a single template molecule. The limit of quantification (LOQ) for qPCR, however, usually lies around […] template molecules if the commonly accepted limit of SD Cq 0.45 is applied (Figure 4).

Figure 4: Reproducibility of qPCR, expressed as the standard deviation of Cq values, as a function of the number of molecules present in the reaction volume, based on the Poisson distribution. An SD of 0.45 is usually considered acceptable for the limit of quantification, which corresponds to […] template molecules in the reaction. Adapted from [67].

A fundamental step towards analysing tens of cells and tens of genes at once was the development of high-throughput qPCR systems, including the BioMark instrument (Fluidigm), which was available in our laboratory as one of the first in Europe [68], and later other platforms including OpenArray (Life Technologies) [69-73] (Table 1).
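The relation between template copy number and Cq reproducibility shown in Figure 4 can be approximated with a short calculation. This is a minimal sketch under stated assumptions: Poisson sampling is the only source of variation, amplification is a perfect doubling, and reactions that receive no template are excluded. The function name `poisson_cq_sd` is hypothetical, and the result is not meant to reproduce the exact curve from [67], which also reflects experimental noise.

```python
import math


def poisson_cq_sd(mean_copies: float, efficiency: float = 1.0) -> float:
    """SD of Cq caused by Poisson sampling alone.

    Cq is taken as -log(k) / log(1 + efficiency) plus a constant, where k is the
    number of template copies that ended up in the reaction (k >= 1).
    """
    base = math.log(1.0 + efficiency)
    k_max = int(mean_copies + 12.0 * math.sqrt(mean_copies) + 30)
    log_p = -mean_copies  # log of the Poisson probability for k = 0
    weights, cqs = [], []
    for k in range(1, k_max + 1):
        # Recurrence: P(k) = P(k - 1) * mean / k, evaluated in log space.
        log_p += math.log(mean_copies) - math.log(k)
        weights.append(math.exp(log_p))
        cqs.append(-math.log(k) / base)
    total = sum(weights)
    mean_cq = sum(w * c for w, c in zip(weights, cqs)) / total
    var = sum(w * (c - mean_cq) ** 2 for w, c in zip(weights, cqs)) / total
    return math.sqrt(var)


if __name__ == "__main__":
    for copies in (5, 20, 100, 1000):
        print(copies, round(poisson_cq_sd(copies), 3))
```

The sampling contribution to SD(Cq) falls roughly as 1 / (ln 2 · sqrt(N)) for large copy numbers N, so low-copy reactions are inherently noisy regardless of instrument quality.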

Platform | Priming time | Loading time | Min. input/sample | Reaction volume | Detection | Loader | Launched
Fluidigm Dynamic Array (+Access Array*) | 11 min | 60 min | 5 µl | 10 nl | probe/dye | MX | spring 2007
Fluidigm Dynamic Array | 20 min | 95 min | 5 µl | 6.75 nl | probe/dye | HX | fall 2008
Fluidigm FR (Genotyping) | 11 min | 60 min | 5 µl | 8 nl | probe | WX | fall 2010
Fluidigm (Genotyping) | 10 min | 30 min | 4 µl | 8 nl | probe | RX | May 2011
Fluidigm Digital Array | 6 min | 40 min | 8 µl | 6 nl | probe/dye | MX | fall 2006
Fluidigm Digital Array | 30 min | 40 min | 4 µl | 0.85 nl | probe/dye | MX | spring 2009
Life Tech. OpenArray | | 30 min | 3-5 µl | 33 nl | probe/dye | AccuFill System | spring 2009
Wafergen SmartChip | | >10 min | […] nl | 100 nl | probe/dye | Nanodispenser | spring 2010
Roche LightCycler 1536 | | […] min | […] nl | […] nl | probe/dye | Innovadyne | summer 2009

The rows "number of samples", "assays per sample" and "qPCR reactions" are incomplete in the source. Surviving entries, in order: assays per sample: (reusable), 48 (reusable), (3x), single/multiplex, single/multiplex, (3x); qPCR reactions: 2304*, (3x).

Table 1: Overview of platforms for high-throughput qPCR analysis (systems based on microdroplet technology are not included).

Besides high-throughput analysis of tens of cells and genes, microfluidic chips are also suited to preparing single-cell samples: capture of exactly one cell, its lysis, reverse transcription and preamplification can all take place in a closed system that minimizes losses, with minimal reaction volumes in the picolitre to nanolitre range. The small number of transcripts is not heavily diluted, and the number of template molecules in each nanolitre qPCR is much more likely to be above the limit of quantification [74, 75], which makes the system more efficient and more accessible for single-cell analysis. Otherwise, for routine preparation of single-cell samples using FACS and 96-well microtiter plates, the template has to be preamplified to enrich it above the limit of quantification. Data analysis to identify subpopulations of samples and individual cells is multiparametric, most often using principal component analysis (PCA), hierarchical clustering and self-organizing maps (SOM).
More on data analysis can be found in the book chapter [76] GenEx: Data analysis software [67, 77]. The MIQE guidelines [78] attempt to standardize, through their recommendations, the broad base of qPCR users and applications and the very diverse market of reagents and instruments; they serve as a checklist for reporting all relevant information about a qPCR experiment when results are published. The sensitivity of the method is so high that, together with other methods that combine specificity for proteins with the ability to amplify the information by PCR [79], it allows DNA, RNA and protein levels to be measured in a single cell [80].

Digital PCR

Unlike conventional qPCR, digital PCR provides accurate measurement of absolute mRNA amounts without the need for a dilution (standard) curve. In principle, one sample is partitioned into up to several thousand smaller reactions, each ideally containing exactly one or no target molecule. After thermal cycling, only the reactions that contained a target molecule appear positive. Simple counting and the application of statistical corrections (e.g. a correction for the Poisson distribution) give the number of original cDNA molecules in the sample [81]. The growing number of platforms available for digital PCR, mainly microfluidic (BioMark, Fluidigm; QuantStudio, Life Technologies) or droplet-based (QX100, Bio-Rad; RainDrop, RainDance), makes the method increasingly suitable for clinical diagnostics as well [82, 83]. The main applications in which digital PCR shows its advantages are the detection of very rare mutations, the discrimination of chromosomal aberrations and the quantification of rare viral strains. Diluting and partitioning the sample removes background with a possibly very similar sequence, so digital PCR offers higher measurement specificity. The drawbacks so far are the relatively high purchase cost and the running cost per sample. This can be partly offset by broader multiplexing options compared with qPCR. As with FISH, a combination of channels can be used for one gene, so that a two-channel instrument (A, B) can measure three genes at once (gene 1: A, gene 2: B, gene 3: A+B). In the context of single-cell analysis, this is a very interesting and unique alternative, capable of distinguishing the expression products of individual alleles of a gene in a cell, including epigenetic information and regulation, for example by non-coding RNAs.
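The Poisson correction mentioned above for digital PCR can be sketched as follows. This is an illustrative calculation, not the procedure of any particular instrument: the function name `dpcr_copies` and the partition counts are hypothetical, and the sketch assumes that molecules distribute randomly and independently over equally sized partitions.

```python
import math


def dpcr_copies(positive: int, total: int) -> float:
    """Estimate the number of target molecules loaded into `total` partitions.

    A partition is positive if it received one or more molecules, so the mean
    number of molecules per partition is lambda = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; a fully positive run is saturated")
    lam = -math.log(1.0 - positive / total)
    return lam * total


if __name__ == "__main__":
    # Few positives: the correction is small, the count is close to the raw number.
    print(dpcr_copies(50, 1000))
    # Half of the partitions positive: many partitions held more than one molecule.
    print(dpcr_copies(500, 1000))
```

The corrected estimate is always at least as large as the raw count of positive partitions, and the gap grows as the sample becomes more concentrated, which is why heavily loaded runs need dilution.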
3.4 Methods applicable to single-cell gene expression profiling at the protein level

Although I did not address protein analysis in my work, I had the opportunity to work with immuno-qPCR and the proximity extension assay, and it is worthwhile to give an up-to-date overview, because proteins are the main functional and structural element of every cell and also the endpoint of the central dogma DNA-RNA-protein. It is very important to monitor their levels in single cells as well, in order to understand the cellular processes involved in various diseases, developmental processes and regulation. Measuring DNA and RNA helps to infer the amount of protein as the product, but it cannot determine the protein's location, concentration, post-translational modifications or interactions with other proteins. The most common methods for protein analysis are electrophoresis, immunochemical tools, chromatography and mass spectrometry. These methods, however, require considerable amounts of input material, obtainable only from large cell populations, which also determines the nature of the information gained from such a homogenized population. The greatest obstacles to analysis at the single-cell level are the very small amounts of protein in a single cell and their very wide diversity. Moreover, there are many types of proteomic analysis, such as post-translational modifications, protein-protein or protein-DNA interactions and quantification, and they cannot all be applied at once. Unlike for DNA, where amplification methods exist, there is no direct way to amplify the protein signal at the molecular level. If a protein is to be labelled with a molecular probe, most often an antibody, specificity and quantification depend heavily on the quality of that antibody. Limiting factors are the impermeability of the cell membrane to antibodies, the limited range of antibodies against an essentially unlimited range of antigens, and the very high cost of profiling the expression of a set of proteins. Nevertheless, a number of methods have been developed that can measure some components of the broad protein spectrum even in single cells.

Mass spectrometry

Thanks to continually improving mass spectrometry (MS) technology, this method has the potential to serve in future as a tool for quantifying the entire proteome of a single cell, even without labelling. At present the detection sensitivity for pure protein is in the femtomole range. Of the range of mass spectrometry techniques, those most often used to measure proteins in single cells are electrospray MS, laser desorption ionization (LDI-MS), secondary ion (SIMS) and matrix-assisted laser desorption (MALDI-MS) [36]. The most common limitation at present is the very low signal-to-background ratio for small amounts of individual proteins in a complex sample. Highly sensitive separation methods preceding MS, based on microfluidic electrophoresis, are of considerable help.

3.4.2 Antibody-based analysis

The use of antibodies to measure proteins is very popular owing to their fairly high specificity, sensitivity and relative ease of use. These parameters, however, depend on the quality of the antibody available for the given protein and organism. The selection of antibodies for the most common model organisms is fairly wide despite their demanding production, but the spectrum of proteins is wider still. One limit for functional studies using antibodies is their inability to penetrate living cells. For gene expression profiling of tens of genes, cost and the high background caused by non-specific binding of many antibodies can also be limiting. Several antibody-based methods can be used for single-cell analysis. For the analysis of intact tissue with cellular-level detail, an interesting method called CLARITY was recently developed. Cellular structures throughout the whole organ are first stabilized by formaldehyde cross-linking of the target proteins; the lipid membranes, which normally prevent antibodies from entering cells, are then removed electrophoretically, and the target proteins are labelled. This yields very detailed systematic maps of the neuronal network in epilepsy and other disorders [84]. In recent years the sensitivity of in vitro protein quantification has been improved by combining antibody labelling with PCR amplification. One option is immuno-qPCR, in which an immobilized primary antibody binds the target protein and, after washing, a secondary antibody is added that binds very specifically only to the immobilized target protein. The secondary antibody is covalently coupled to a DNA oligonucleotide beforehand. After another wash, the DNA oligonucleotide is quantified by qPCR, and its amount correlates with the amount of target protein [85]. Another variant is the proximity ligation assay (PLA) [86], and an even more advanced modification is the proximity extension assay (PEA), which simultaneously uses two antibodies specific to one target protein, each carrying a covalently attached oligonucleotide. The binding sites on the protein are close enough together to allow the 3' ends of the two oligonucleotides to overlap and form a double-stranded DNA segment that initiates PCR amplification with added primers [79]. Using PEA in combination with the microfluidic BioMark platform, 96 different protein markers can be quantified directly from 1 µl of blood serum. In general, these techniques linking antibody specificity with DNA amplification provide a dynamic range on the order of a hundred times wider than the most commonly used ELISA and allow DNA, RNA and protein to be measured from a single cell simultaneously [80].

3.4.3 In vivo protein probes

A widely used technique is the creation of transgenic models for functional studies, in which the target protein is genetically fused to a probe from the XFP protein family (X = green, red, blue, cyan or yellow fluorescent protein), allowing its targeted expression, localization and quantification [87]. One example application is Brainbow, which together with Cre-lox recombination [88] makes it possible to distinguish up to 90 differently coloured cells in a neuronal network. In vivo imaging has many very practical uses, including tracking cell movement, following proliferation during development and monitoring physiological processes. Drawbacks are the time-consuming genetic modification of the animal model and often also the size of the XFP protein, which frequently affects the behaviour of the cell. An alternative to antibodies may be smaller bioorthogonal probes, which use the cell's translational machinery for their incorporation [36]. The number of proteins that can be followed is limited by the width of the fluorescence spectrum and is usually 1 to […].

Optical tweezers combined with Raman spectroscopy

This method allows single cells and even individual organelles to be manipulated with a light beam. If the beam has a Gaussian profile, objects near the focus are directed towards its centre. These forces originate from photons: photons scattered through the object push it towards the focus, while photons reflected from the object push it away from the focus. The resulting net force holds the object in place with a strength directly proportional to the beam intensity. Once the particle is immobilized, it can be characterized with a laser beam through its Raman scattering spectrum, which reflects the composition of the cell. Each cell then shows a specific distribution of spectral maxima, similar to a fingerprint. It is difficult to identify individual proteins in such a scattering spectrum, but cells of similar composition can be characterized and, for example, cancer cells distinguished from normal ones [46]. The advantage is the absence of staining and labelling; a frequent drawback is that the studied cells are larger than the focus can accommodate.

4 Results and discussion

4.1 Subcellular qPCR tomography with high resolution

A highly sought-after goal in the study of biological systems is to obtain detailed structural and, at the same time, molecular information in the context of the whole system, which is usually very complex. In recent years great progress has been made in this area in the development of microscopy techniques for imaging transgenic or antibody-labelled targets and in modelling biological systems, whether based on serial sections of the sample or on the removal of membrane structures [84, 88-90]. These methods provide a fascinating view of, for example, brain structure down to the level of single cells and synapses, and allow functional observation in vivo. They are, however, not compatible with genome or transcriptome profiling, because the properties of fluorescent probes limit them to characterizing at most 5-6 genes in a single experiment, which is not enough for a deeper understanding of cellular processes. Measuring another 5-6 genes requires the time-consuming creation of a new transgenic model. For example, during the emergence of a primary tumour from normal tissue [91], but also within a single tumour, single-cell studies have observed genomic changes suggesting an extreme level of heterogeneity, which it is probably useful to correlate with phenotypic traits, cell localization and the relationship of cells to their surroundings [92]. In the newly established Laboratory of Gene Expression of the Institute of Biotechnology AS CR (2006), qPCR tomography was developed to study the role of the distribution of maternal mRNAs in the development of the organism, at the subcellular level in single oocytes of the amphibian Xenopus laevis. Several factors were decisive in choosing this model: undemanding husbandry, external fertilization and external embryonic development, ovulation controllable by hormone administration, a large number of nearly identical oocytes per ovulation, the size of the oocyte (1.3 mm) and the high total RNA content of a single cell (5 µg).

The biomolecules mRNA, miRNA and proteins, and their distribution in the oocyte during cleavage and partitioning into blastomeres, play a decisive role in early development. The amounts of these molecules determine the fate of individual cells and the formation of the basic body axes. The new method revealed a gradient in the distribution of individual mRNAs along the animal-vegetal axis of the oocyte and a division of the studied genes into groups according to their occurrence. One group comprised mRNAs distributed primarily in the animal part, a second group mRNAs of the vegetal part, and a third group mRNAs of the vegetal cortex. Using qPCR tomography that divided one cell into five parts and studied 18 genes, it was found that the localization of these key mRNAs is preserved during cleavage at least up to the 32-cell stage [3].

The next step in the development of qPCR tomography in our new laboratory was the characterization of more genes (31), capturing the distribution of maternal mRNA, of mitochondrial RNA to map the distribution of energy metabolism, and of nucleus-specific RNA, in finer sections, this time dividing oocytes (n=3) into 15 sections along the animal-vegetal axis. We found that most mRNA of animal genes is concentrated and stored near the nucleus and that commonly used reference genes (EF-1α, GAPDH, β-tubulin, α-actin) are localized very similarly to animal genes, which rules out their use for normalization of gene expression. Another very interesting finding was the high similarity of mRNA distribution between different oocytes, which is not usual for somatic cells and suggests an important role of mRNA localization in the oocyte for the further development of the organism. This work is described in Publication I, Spatial expression profiles in the Xenopus laevis oocytes measured with qPCR tomography.

Methods 51 (2010)

Spatial expression profiles in the Xenopus laevis oocytes measured with qPCR tomography

Radek Sindelka a, Monika Sidova b,c, David Svec b, Mikael Kubista b,d,*

a Whitehead Institute, Cambridge, USA
b Laboratory of Gene Expression, Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
c Charles University in Prague, Faculty of Science, Department of Cell Biology, Czech Republic
d TATAA Biocenter AB, Odinsgatan 28, Göteborg, Sweden
* Corresponding author. Address: Laboratory of Gene Expression, Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague, Czech Republic. E-mail address: mikael.kubista@tataa.com (M. Kubista).

Article history: Accepted 18 December 2009. Available online 4 January 2010.
Keywords: Xenopus; Oocyte; qPCR tomography; Expression profiling

Abstract

qPCR tomography was developed to study mRNA localization in complex biological samples that are embedded and cryo-sectioned. After total RNA extraction and reverse transcription, the spatial profiles of mRNAs and other functional RNAs were determined by qPCR. The Xenopus laevis oocyte was selected as model because of its large size (more than 1 mm) and large amount of total RNA (5 µg). Fifteen sections along the animal-vegetal axis were cut and prepared for quantification of 31 RNA targets using the high-throughput real-time RT-PCR (qPCR) BioMark platform. mRNAs were found to have two localization patterns, animal/central or vegetal. Because of the high resolution in sectioning, it was possible to distinguish two subgroups of the vegetal gene patterns: the germ plasm determinant pattern and the profile of other vegetal genes.
© 2010 Elsevier Inc. All rights reserved.

1. Introduction

The early development of a multicellular organism is a complex process in which each step must be precisely controlled. Each cell has a unique expression of genes, and the spatial gene expression patterns within the growing embryo depend on the cell's position. In situ hybridization, microarray analyses, RNase protection assays, RT-PCR, Illumina sequencing and real-time quantitative PCR (qPCR) are among the most frequently used methods for gene expression analysis. Each method has its limitations, and comparison of results between laboratories using different methods, conditions and protocols is difficult. For example, while whole mount in situ hybridization is standard for spatial analysis in developmental biology, its readout of gene expression is at best qualitative. On the other hand, high-throughput quantitative methods such as microarrays and Illumina sequencing can produce large amounts of quantitative expression data. However, spatial resolution is limited with these methods, since they require large amounts of pure and high quality RNA. qPCR is in between: quantification is highly precise and minute amounts of RNA can be used, which allows for spatial profiling. Tens of genes can be conveniently quantified in each sample.

Oogenesis in Xenopus laevis results in oocytes 1.3 mm in diameter [1]. Mature oocytes have differently colored hemispheres that specify the developmental animal-vegetal axis. Pigmentation of the animal hemisphere is caused by accumulation of melanosomes. The animal hemisphere also contains the germinal vesicle. The light color of the vegetal hemisphere is due to storage of yolk platelets [2]. mRNA molecules synthesized during oogenesis are termed maternal, because the template for their transcription is solely maternal chromosomes (reviewed in [3]). Two groups of maternal transcripts have been identified that differ in their spatial distribution in the Xenopus oocyte (reviewed in [4-6]). One group is called vegetal. These genes are expressed during the early stages of oogenesis and the corresponding mRNA molecules are actively transported from the nucleus to the vegetal pole. Two different transport pathways have been identified. DEADSouth, Xcad2, Xdazl and Xpat transcripts are transported by the early METRO (message transport organizer) pathway [7,8] and they accumulate in the primordial germ cells formed in later developmental stages. The second pathway is active later during oogenesis and transports mRNA molecules, such as VegT and Vg1, that are important for germ layer determination. The vegetal hemisphere gives rise to the endoderm cell layer in later stages of development and stimulates the formation of the mesodermal layer. mRNAs encoded by many maternal genes, including An1, An2, An3, Tcf-1, axin and Xpar1, localize predominantly in the animal hemisphere [4-6,9].

The distribution of mRNAs along the animal-vegetal axis in the Xenopus oocyte is critical for successful development. The second developmental axis, the dorsal-ventral, can be distinguished at the 4-cell stage in Xenopus. The sperm enters the oocyte at a random position in the animal pole. This is followed by the cortical rotation approximately 20 min after fertilization [10-12], during which the vegetal cortex layer moves approximately 30° in the direction opposite to the sperm's entry. This results in the accumulation of beta-catenin and some as yet unknown stabilizing factors in the future dorsal side of the embryo. Surprisingly, the cortical rotation does not significantly change the distribution of maternal mRNAs along the A-V axis [9].

In this paper we present the high resolution qPCR tomography technique for spatiotemporal analysis of mRNAs in the X. laevis oocyte. We present the entire procedure from sample preparation, embedding, slicing, RNA extraction, reverse transcription, preamplification and qPCR to data analysis. In our study, X. laevis oocytes were sectioned into 15 segments across the animal-vegetal (A-V) axis, and the levels of 31 maternal transcripts, mitochondrial RNAs and nucleus specific RNAs were quantified using the high-throughput real-time RT-PCR microfluidic BioMark platform.

2. Materials and methods

2.1. Embryo fixation and sectioning

Xenopus laevis females were stimulated with 500 U of hCG (human chorionic gonadotrophin) and kept overnight at room temperature. Oocytes were obtained by gentle squeezing. They were not treated with cysteine, which is a common procedure, because the treatment compromises the subsequent cryo-sectioning of the samples. Three oocytes from the same female were used in our study. The oocytes were embedded in a drop of optimum cutting temperature (OCT) compound on a pre-cooled dissection block (Microm HM560). The block was placed for 5 min at −20 °C in the cryostat chamber and the samples were dissected into 45 slices, each 30 µm thick, along the animal-vegetal (A-V) axis (Fig. 1).
Consecutive slices were pooled into fifteen tubes with three slices in each. Hence, the first tube contained the first three animal sections, the second tube contained sections 4-6, etc., and tube fifteen contained the last three vegetal sections.

Fig. 1. Spatial distribution of total RNA in the Xenopus laevis oocyte. Oocytes were sectioned across the A-V axis. Total RNA was extracted and quantified using the Nanodrop. Each bar represents a 90 µm section.

Extraction of total RNA and quality control

The RNeasy Micro kit (Qiagen) was used for total RNA extraction. RNeasy lysis buffer RLT (350 µl) with 1% mercaptoethanol and RNA carrier was added to each tube, and the tubes were immediately stored at −80 °C. After thawing, the samples were vortexed for at least 1 min in lysis buffer. The added mercaptoethanol and the long vortexing were found to be essential for efficient removal of inhibitors. Yolk platelets localized in the vegetal part of the oocyte were found to be a serious inhibitor of reverse transcription and qPCR. RNA concentrations were measured with the Nanodrop® ND1000 quantification system (Nanodrop Inc., Fig. 1). Quality of the extracted RNA was assessed by running 10 randomly selected samples on the capillary electrophoresis system Experion (Bio-Rad). Total RNA (50 ng) was denatured, loaded into the HighSense chip and run according to the manufacturer's protocol (Fig. 2). An 18S to 28S ribosomal RNA ratio of 1:2 indicated high quality RNA. High quality RNA is further supported by the absence of short RNA fragments that would indicate degradation.

Fig. 2. The quality of the extracted RNA was assessed with capillary electrophoresis (Experion, Bio-Rad). Total RNA (100 ng) from 11 sections was analyzed on the HighSense chip.

Reverse transcription

cDNA was produced starting with 30 ng of total RNA, 1.5 µl of a (1:1) mixture of 10 µM oligo-dT and 10 µM random hexamers, and water. Total reaction volume was 8.5 µl. After 10 min incubation at 72 °C, 100 U of MMLV reverse transcriptase (Promega), 12 U of RNasin (Promega), 5 nmol of dNTP and 2 µl of 5× buffer were added to a total volume of 11.8 µl. The mixture was incubated for 60 min at 37 °C. The cDNA product was diluted to 60 µl and stored at −20 °C.

Primer design and preamplification

Primers for the amplification of the 31 selected maternal genes were designed with Primer3. Primer specificity and assay efficiency were tested on control cDNA (a mixture of cDNA from different Xenopus developmental stages). Acceptance criteria were: specific amplification of control cDNA with a Cq lower than 35, a single dominant peak in the derivative of the melting curve, and no amplification of non-template controls. Preamplification was used to increase the number of template molecules. This is needed because the cDNA synthesis does not yield enough copies of the template molecules for them to be analyzed with confidence in parallel singleplex reactions. Preamplification PCR was run in 20 µl containing 4 µl of cDNA, 2 µl of a mixture of all forward and reverse primers (500 nM each), 10 µl of Sigma JumpStart mastermix (Sigma, customized product) and water. A CFX 96 cycler (Bio-Rad) was used for the preamplification with the cycling conditions: polymerase activation at 95 °C for 2 min, followed by 18 cycles (95 °C for 15 s, 59 °C for 1 min and 72 °C for 1 min). The product of the preamplification reaction was diluted to 80 µl and stored at −20 °C. The robustness of the preamplification was validated by comparing qPCR expression levels of the genes Xmam1 and Vg1 analyzed with and without preamplification. The relative expression of the two genes was similar with and without preamplification (Fig. 3).

High-throughput qPCR performed on BioMark system

The high-throughput microfluidic qPCR platform BioMark (Fluidigm) was used for qPCR analysis running the dynamic array. The sample reaction mixtures had a final volume of 5.3 µl made up of 1.2 µl preamplified cDNA, 0.6 µl of SYBR Green Sample Loading reagent (Fluidigm), 2.77 µl Sigma A mastermix (Sigma, custom product), […] µl of Chromophy, diluted 1:25 (TATAA Biocenter), […] µl of ROX (Invitrogen) and 0.1 µl of JumpStart DNA Taq polymerase (Sigma). The primer reaction mixtures had a final volume of 6 µl and were made up of 3 µl Assay Loading reagent (Fluidigm) and 3 µl of a mixture of reverse and forward primers (10 µM). The empty dynamic array was first primed with oil solution in the NanoFlex 4-IFC Controller (Fluidigm) to fill its control valves. Sample reaction mixtures were loaded into the sample wells, carefully avoiding any bubbles, and primer reaction mixtures were loaded into the assay wells. The dynamic array was then placed again into the NanoFlex 4-IFC Controller for loading and mixing.
The mixing takes place by diffusion between the reaction chambers filled with sample reagent and adjacent containers filled with the appropriate primer and probe mix, between which a channel is opened. After about 55 min the dynamic array was transferred to the BioMark. The qPCR cycling program was 3 min at 95 °C for activation of the hot-start enzyme, followed by 30 cycles of denaturation at 95 °C for 15 s, annealing at 60 °C for 20 s, and elongation at 72 °C for 20 s. Melting curve analysis was performed after the qPCR was completed, collecting fluorescence between 60 and 95 °C at 0.5 °C increments.

Data analysis

Specific amplification of each targeted cDNA was confirmed by melt curve analysis. Measured Cq values were exported from the BioMark platform software to Excel for data analysis. qPCR technical replicates of samples were averaged. Expression ratios were calculated by the delta Cq method normalized to the first animal tube (containing sections 1-3). This artificially sets the expression of all genes in the first segment to 1, relative to which all other segments are expressed. For each oocyte, and separately for every gene, the relative expressions in all segments were summed, and the data were divided by the sum. This resulted in the expression of each gene being presented as the percentage of its total expression. The percentage values for the three oocytes were finally averaged. The averaged values and associated standard deviations are presented. The standard deviations were in general very small and cannot be seen in the plot.

Fig. 3. Comparison of the spatial expression profiles of Vg1 (vegetal gene) and Xmam1 (animal gene) measured with and without preamplification.

3. Results

During optimization of sample collection and preparation for cryosectioning we found that extensive cysteine treatment, which is common in many protocols, makes oocytes fragile and lowers the quality of the RNA. Because of the large size (1.3 mm in diameter) of the X. laevis oocyte, different thicknesses of the cryostat sections are possible. We found that 30 µm sections, which is the maximum thickness possible with our cryostat (Microm), worked very well, and most oocytes gave the same number of sections. We also found that pre-incubation of the samples in the cryostat chamber for at least 10 min was important for smooth sectioning. Mercaptoethanol (1%) was critical not only for reducing the activity of RNases, but also for removal of the yolk and other lipids from the vegetal pole. Once the protocol was optimized, the variability in total RNA concentration between segments cut from different oocytes was very low, evidencing precise and reproducible sectioning and RNA extraction (Fig. 1). The quality of the extracted RNA was assessed with capillary electrophoresis using the Experion system (Bio-Rad), and indicated high quality RNA (Fig. 2).

The reaction chambers in the BioMark platform are only 10 nl and the loaded cDNA is automatically aliquoted into 48 chambers. For each chamber to contain sufficient cDNA for reliable quantification, the material must be preamplified. The performance of the preamplification was tested by comparing one vegetal gene (Vg1) and one animal gene (Xmam1) with and without preamplification. The relative expression of the two genes was the same, suggesting that preamplification introduces minimal bias (Fig. 3).

Two main spatial expression profiles were found for the 31 maternal genes studied. Most were abundant in the upper third of the oocyte, which is in the animal hemisphere, probably close to the oocyte nucleus. Genes frequently used for normalization of expression in X. laevis [13], such as EF-1alpha, GAPDH, beta-tubulin, alpha-actin and RNA polymerase II, had this localization. Mitochondrial cytochrome C mRNAs and U3 snoRNA, which is located in the nucleolus, were also localized to the animal hemisphere (Figs. 4 and 5). The high resolution of qPCR tomography allowed two subgroups within the vegetal genes to be distinguished (Fig. 6).
The spatial distribution of germ plasm determinants such as Xdazl, DEADSouth and Xcad2 shows a very steep gradient toward the vegetal pole, suggesting that these mRNAs are localized densely close to the pole itself [14]. Vg1, VegT, Otx1, Eg6 and Wnt11 are also localized vegetally, but with a less steep spatial gradient toward the pole.

Fig. 4. Spatial expression profiles of common reference genes, mitochondrial RNA, reflecting the distribution of mitochondria in the cell, and nucleolar RNA, reflecting the position of the nucleus, along the A–V axis of the X. laevis oocyte. Distributions of mRNAs coding for RNA polymerase II, mitochondrial cytochrome C, GAPDH, EF-1alpha, beta-tubulin, alpha-actin and U3 snoRNA were determined in 15 serial segments.

Fig. 5. Expression profiles of animal genes along the A–V axis of the X. laevis oocyte. The distribution of mRNAs coding for An1, An2, APC, axin, Est1, Fz7, Zp3 and ZPC (1) and Xmam1, XPar1, Tcf-3, Stat3, Oct60, GSK-3beta and FoxH1 (2) was determined in 15 serial segments.

Fig. 6. Expression profiles of vegetal genes along the A–V axis of the X. laevis oocyte. The distribution of mRNAs coding for the germ plasm factors DEADSouth, Xdazl and Xcad2 (in red) and the other vegetal genes Otx1, Eg6, Vg1, VegT and Wnt11 (in blue) was determined in 15 serial segments.

4. Conclusions

qPCR tomography was introduced by Sindelka et al. In the pioneer work, X. laevis oocytes were sectioned into five segments across the animal–vegetal axis and expression levels of 18 maternal genes were measured by qPCR. Two distinct expression patterns were found, which were referred to as animal and vegetal. In the presented study, we refine the qPCR tomography technique to increase its resolution to 15 segments. This allowed us to distinguish two subgroups of the vegetal genes.

The majority of mRNA molecules encoded by animal genes were found in sections 3–8. This is where the oocyte nucleus is expected to be located.
This may indicate that animal transcripts are localized in the nucleus or stored somewhere near it. Interestingly, common reference genes used in Xenopus expression studies, such as EF-1alpha, GAPDH, beta-tubulin and alpha-actin, have animal localization. Clearly, they would not be suitable for normalization of spatial expression in the Xenopus oocyte, and not even in early blastomere stages, where the original spatial distribution remains (unpublished data). The higher sensitivity, better specificity and wider dynamic range of qPCR tomography compared to other RNA-based methods allow spatial expression profiles to be measured with higher resolution than has previously been possible. Previous studies on single cell expression profiling have indicated large variations in the mRNA levels among cells [15,16]. This does not seem to apply to the Xenopus oocyte. The intracellular expression profiles seem to be highly conserved among the oocytes, indicating that mRNA spatial distribution is critical for early development.
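Because the common reference genes are themselves localized, a spatial profile cannot simply be normalized to one of them. One way to illustrate the alternative is to express each section as a fraction of the transcript's total over all sections. The sketch below is only an illustration under stated assumptions: the function names and the Cq values are hypothetical, the amplification efficiency is assumed to be 100% unless given, and this is not claimed to be the calculation used in the study.

```python
def relative_quantities(cq_values, efficiency=1.0):
    """Convert Cq values to relative template amounts.

    Assumes each cycle multiplies the product by (1 + efficiency),
    so a lower Cq means proportionally more starting template.
    """
    base = 1.0 + efficiency
    return [base ** (-cq) for cq in cq_values]


def spatial_profile(cq_values, efficiency=1.0):
    """Fraction of the total transcript amount found in each section."""
    quantities = relative_quantities(cq_values, efficiency)
    total = sum(quantities)
    return [q / total for q in quantities]


# Hypothetical Cq values for one gene in three consecutive sections.
example_cq = [25.0, 24.0, 23.0]
profile = spatial_profile(example_cq)
for section, fraction in enumerate(profile, start=1):
    print(f"section {section}: {fraction:.3f} of total")
```

With this representation every gene sums to one across the sections, so profiles of animal and vegetal genes can be compared without relying on a reference gene.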

R. Sindelka et al. / Methods 51 (2010)

Acknowledgments

This work was supported by the Grant Agency of the Academy of Sciences of the Czech Republic (IAA ) and research goal AV0Z granted by the Ministry of Education, Youth and Sports of the Czech Republic, together with support from the Grant Agency of the Czech Republic (GACR 301/09/1752).

References

[1] P. Chang, D. Perez-Mongiovi, E. Houliston, Microsc. Res. Tech. 44 (1999)
[2] M.V. Danilchik, J.C. Gerhart, Dev. Biol. 122 (1987)
[3] J. Heasman, Semin. Cell Dev. Biol. 17 (2006)
[4] M.L. King, T.J. Messitt, K.L. Mowry, Biol. Cell 97 (2005)
[5] Y. Zhou, M.L. King, IUBMB Life 56 (2004)
[6] K.L. Mowry, C.A. Cote, FASEB J. 13 (1999)
[7] H. MacArthur, M. Bubunenko, D.W. Houston, M.L. King, Mech. Dev. 84 (1999)
[8] H. MacArthur, D.W. Houston, M. Bubunenko, L. Mosquera, M.L. King, Mech. Dev. 95 (2000)
[9] R. Sindelka, J. Jonak, R. Hands, S.A. Bustin, M. Kubista, Nucleic Acids Res. 36 (2) (2008)
[10] J. Gerhart, M. Danilchik, T. Doniach, S. Roberts, B. Rowning, R. Stewart, Development 107 (1989)
[11] J.C. Gerhart, J.P. Vincent, S.R. Scharf, S.D. Black, R.L. Gimlich, M. Danilchik, Philos. Trans. R. Soc. Lond. B Biol. Sci. 307 (1984)
[12] J.P. Vincent, J.C. Gerhart, Dev. Biol. 123 (1987)
[13] R. Sindelka, Z. Ferjentsik, J. Jonak, Dev. Dyn. 235 (2006)
[14] F. Nishiumi, T. Komiya, K. Ikenishi, Dev. Growth Differ. 47 (1) (2005)
[15] M. Bengtsson, A. Ståhlberg, P. Rorsman, M. Kubista, Genome Res. 15 (2005)
[16] A. Tichopad, R. Kitchen, I. Riedmaier, C. Becker, A. Ståhlberg, M. Kubista, Clin. Chem. 55 (10) (2009)

4.2 Microfluidic high-throughput analysis using nonspecific dyes

In parallel with the work described above, in which the BioMark high-throughput qPCR instrument (Fluidigm) was used in our laboratory for the first time, we prepared a publication summarizing the optimization of qPCR with nonspecific dyes in the microfluidic environment of the BioMark chips. At the time our laboratory acquired this unique instrument, the only available protocols were for analysis with specific hydrolysis probes from the commercial TaqMan library (Life Technologies). Assays of this type offer high quality and specificity, thanks to the probe and to an advanced system of in silico validation against current sequence databases. However, TaqMan assays were not available for our model organism, Xenopus laevis, even at the time this dissertation was written. Further factors are the necessary validation of TaqMan assays on cDNA, the very narrow range of recommended RT-qPCR reagents within the Life Technologies portfolio, and the relatively high price of TaqMan assays. Our libraries of validated primers for Xenopus were built on the flexible use of nonspecific dyes of the SYBR Green type, which bind to double-stranded DNA during PCR. Optimizing a protocol for microfluidic analysis with fluorescent dyes that bind dsDNA nonspecifically was therefore a necessary precondition for the further development of the laboratory and of qPCR tomography.

As part of the optimization, carried out in cooperation with the Swedish company TATAA Biocenter, we selected a suitable master mix and screened new dyes. The most suitable candidate for the new type of analysis was a dye designated Chromofy. We found that SYBR Green, the most commonly used dye, is unsuitable for the BioMark platform in combination with the vast majority of master mixes, because it adsorbs to the large surface of the microfluidic channels made of polydimethylsiloxane (PDMS). Uniform qPCR conditions therefore cannot be established across the whole chip, which had a very negative effect on the analysis. After a year of testing, Chromofy proved to be unstable, in contrast to the very stable EvaGreen.

Another finding was the importance of the fluorescence intensity of nonspecific dsDNA dyes for capturing the amplification signal and distinguishing it from the fluorescent background. Fluorescence intensity depends on the nature of the dye itself and on the composition of the master mix, but above all on the correct temperature setting during the initial calibration of the CCD camera. Because the instrument software had been set up for TaqMan probes since its launch, and their fluorescent background hardly changes with temperature, the exposure time of the CCD camera was set during the initial calibration at room temperature. With any nonspecific dye, however, nonspecifically bound products form in the presence of primers at room temperature and generate a strong fluorescent signal. The camera is then adjusted to this signal before the thermal protocol starts, so that during the actual amplification the recording is underexposed and the qPCR amplification signal is lost. The reaction volume in the chip is 1000× smaller than in conventional systems, on the order of nanoliters, so the loss of signal intensity is accompanied by a significant loss of the amplification data needed to determine gene expression.

This work is the subject of publication II, "Dye-Based High-Throughput qPCR in Microfluidic Platform BioMark", in the book PCR Technology: Current Innovations, Third Edition. After a short and interesting internship at Fluidigm headquarters, which included testing of Chromofy, the BioMark software was modified in accordance with our findings.

23 Dye-Based High-Throughput qPCR in Microfluidic Platform BioMark

David Svec, Vendula Rusnakova, Vlasta Korenkova, and Mikael Kubista

Contents
23.1 Introduction
Materials and Methods
Results and Discussion
Calibration of Background Signal
Data Uniformity
Melting Curve Analysis
Conclusions
Acknowledgments
References

23.1 Introduction

In biological and medical studies, large-scale gene expression and transcriptome analyses are essential tools [1]. Based on the goals of the investigator, different methods such as RNA microarrays, massive parallel sequencing, or high-throughput real-time quantitative PCR (qPCR) can be used. There are currently four platforms available for high-throughput qPCR: the BioMark from Fluidigm, the OpenArray from Life Technologies, the LightCycler 1536 from Roche, and the SmartChip from Wafergen (Table 23.1). These instruments offer the broadest dynamic range, high versatility of experiment setup, low cost per sample, and the lowest demands on the amount of sample material, allowing analysis of minute sample amounts, including single cells [2,3]. One focus of our center is gene expression profiling of single cells, and we have used the BioMark as well as the OpenArray system. When we acquired the BioMark, most of our assay libraries were based on nonspecific dyes, so we explored using these established dye-based assays in microfluidic qPCR and extensively optimized the experimental protocol.

In this report, we focus on the use of the BioMark system, which uses microfluidic chips called dynamic arrays for gene expression profiling, digital arrays for digital PCR, and reusable genotyping arrays for single nucleotide polymorphism (SNP) applications. When using the dynamic array, each sample is mixed with each assay combinatorially in the integrated fluidic circuits (IFC), resulting in 48 × 48 = 2304 or 96 × 96 = 9216 qPCR reactions. The complete workflow takes 3–4 h and requires approximately 1 h of hands-on time.
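The combinatorial layout described above can be checked with a few lines of Python. This is a minimal sketch rather than anything taken from the chapter: the function names are made up for illustration, and it assumes one pipetting step per sample inlet and per assay inlet and that a conventional run fills every well of a microtiter plate.

```python
def reactions_per_array(n_samples: int, n_assays: int) -> int:
    """Every sample meets every assay once, so the count is a product."""
    return n_samples * n_assays


def inlets_to_fill(n_samples: int, n_assays: int) -> int:
    """Assumed: one pipetting step per sample inlet and per assay inlet."""
    return n_samples + n_assays


def plates_needed(n_reactions: int, wells_per_plate: int) -> int:
    """Number of fully used microtiter plates for the same reaction count."""
    return -(-n_reactions // wells_per_plate)  # ceiling division


for n in (48, 96):
    total = reactions_per_array(n, n)
    print(
        f"{n} x {n} array: {total} reactions from {inlets_to_fill(n, n)} inlets, "
        f"equivalent to {plates_needed(total, 96)} 96-well or "
        f"{plates_needed(total, 384)} 384-well plates"
    )
```

The point of the sketch is the scaling: the number of reactions grows with the product of samples and assays, while the manual loading grows only with their sum.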
When compared to conventional qPCR, the high-throughput platform requires 24 or 96 times less effort and time (considering 384- or 96-well blocks). Furthermore, when using the dynamic array, all the reactions are run in parallel under identical thermal and optical conditions; hence, no interplate calibration is needed for studies of up to 96 samples and less bias is introduced. The automatic mixing in the

Table 23.1 High-Throughput qPCR Platforms Available as of Autumn

Platforms: Fluidigm Dynamic Array (+Access Array a); Fluidigm Dynamic Array; Fluidigm FR (Genotyping); Fluidigm (Genotyping); Fluidigm Digital Array; Fluidigm Digital Array; Life Tech. OpenArray; Wafergen SmartChip; Roche LightCycler

Parameters: Priming time; Loading time; Number of samples; Assays per sample; qPCR reactions; Minimum input/sample; Reaction volume; Detection; Loader; Launched

Priming and loading times: 11 min 20 min 11 min 10 min 6 min 30 min min 95 min 60 min 30 min 40 min 40 min 30 min >10 min min
Samples, assays, and reactions: (reusable) (3 ) (reusable) 24 Single/multiplex Single/multiplex (3 ) a , (3 )
Minimum input/sample: μl 5 μl 5 μl 4 μl 8 μl 4 μl 3 5 μl
Reaction volume: nl nl 10 nl 6.75 nl 8 nl 8 nl 6 nl 0.85 nl 33 nl 100 nl nl
Detection: Probe/dye; Probe/dye; Probe; Probe; Probe/dye; Probe/dye; Probe/dye; Probe/dye; Probe/dye
Loader: MX; HX; WX; RX; MX; MX; AccuFill; Nanodispenser; Innovadyne™ System
Launched: Spring 2007; Fall 2008; Fall 2010; May 2011; Fall 2006; Spring 2009; Spring 2009; Spring 2010; Summer 2009

a The Access array is for downstream sequencing.

loading station (IFC controllers with type names MX, HX, WX, and RX, specific for particular arrays) dramatically reduces manual liquid handling. Only pipetting steps are needed to prepare 2304 qPCRs and to prepare 9216 qPCRs. Pipetting error can be further reduced by using automatic dispensers and multichannel pipettes, as the arrays are compatible with standard footprints. The cost saving on reagents is substantial, since the BioMark consumes 200 times less reagent and sample material than conventional microtiter plate-based instruments running 20 μl reactions. The volume of the sample mix pipetted into an array inlet is 5 μl. It is composed of master mix (including the dsDNA-binding dye), loading buffer, cDNA, and ROX, and is distributed into 48 or 96 reaction chambers. The reaction volume is 10 nl in the 48 × 48 dynamic array and 6.75 nl in the 96 × 96 dynamic array (Table 23.1).

Nanoliter-scale qPCR requires concentrated samples with a sufficient number of template molecules to allow homogeneous distribution into the reaction chambers. The reaction volumes vary somewhat among the microfluidic arrays, but are typically 2000 times smaller than conventional 20 μl qPCRs. In practice, this usually requires that the samples are preamplified [8]. For example, if 2 μl of cDNA containing 30 copies or more of a target gene is used in a conventional 20 μl qPCR, the reproducibility of technical replicates will be satisfactory. If the same sample is used in a dynamic array, however, each chamber will contain, on average, less than one copy. Only a few chambers will contain target, while most will be empty. Hence, the data would not be reproducible [9]. Poisson variability is higher in the BioMark than in conventional systems because low template concentration yields high Cq values (Figure 23.1).
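The sampling argument for preamplification can be made concrete with Poisson statistics. The following is a rough sketch under explicit assumptions: all 30 copies are taken to end up in the 5 μl sample mix, the chamber volume is 10 nl, template molecules distribute randomly, and preamplification is idealized as perfect doubling per cycle. The function names and the cycle number in the example are illustrative only.

```python
import math


def mean_copies_per_chamber(copies_in_mix, mix_volume_ul, chamber_volume_nl):
    """Expected number of template copies in one reaction chamber."""
    chamber_volume_ul = chamber_volume_nl / 1000.0
    return copies_in_mix * chamber_volume_ul / mix_volume_ul


def fraction_empty(mean_copies):
    """Poisson probability that a chamber receives no template at all."""
    return math.exp(-mean_copies)


# Assumed scenario: 30 copies in 5 ul of sample mix, 10 nl chambers.
lam = mean_copies_per_chamber(copies_in_mix=30, mix_volume_ul=5.0, chamber_volume_nl=10.0)
print(f"mean copies per chamber: {lam:.3f}")
print(f"fraction of empty chambers: {fraction_empty(lam):.3f}")

# Idealized preamplification: perfect doubling for an illustrative 14 cycles.
cycles = 14
lam_preamp = lam * 2 ** cycles
print(f"mean copies per chamber after {cycles} cycles: {lam_preamp:.0f}")
```

Under these assumptions most chambers receive no template without preamplification, whereas after preamplification every chamber contains hundreds of copies and the sampling noise becomes small relative to the signal.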
[10,11] When the first BioMark dynamic array was launched in 2007 [12], hydrolysis probes were used exclusively for gene expression analysis. The cost of a probe assay is typically 5–10 times that of primers only, which can add a substantial amount to the cost of entire profiling experiments. Testing and validating assays requires substantial resources when using high-throughput profiling. The assay must be tested to determine efficiency, sensitivity, and specificity [9]. Assays in which primer dimers are formed should be excluded, as they will result in unreliable data due to poor sensitivity and compromised reproducibility. This factor is frequently ignored by those using probes for template detection, who argue that nonspecific products do not generate signal. However, parallel amplifications compete for reagents, which leads to an underestimation of the specific target [13]. When using dye-based assays, melt profiles of PCR products can be recorded after each run at no additional cost, and these can be used to reveal the presence of competing reactions. Studies of SNPs and methylation patterns can be done at minimal cost by using a similar strategy but applying postamplification high-resolution melt analysis (HRM). If sequencing of the PCR product is required, the access array can harvest all PCR products in one pool, ready to be sequenced, instead of microinjecting the nanoliter-scale volumes one by one.

Figure 23.1 (See color insert.) Dependence of standard deviation (SD) on Cq in conventional and microfluidic qPCR systems. Panels: microfluidic high-throughput qPCR and 384-well conventional qPCR, each with a dye-based and a probe-based protocol; axes: SD versus Cq. In the conventional system, 10 μl qPCRs were run in a 384-well plate. The microfluidic system was a dynamic array with 10 nl qPCRs. The same samples and assays were analyzed, although for the BioMark the material was preamplified through 14 cycles.

There are some important technical differences between conventional systems using microtiter plates and the microfluidic platforms. The BioMark optical system uses a xenon lamp, optical filters (by default, three of the four filter positions are fitted with filters compatible with detection of ROX (6-carboxy-X-rhodamine), FAM (5-carboxyfluorescein), and VIC/JOE (6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein)), a charge-coupled device (CCD) 9-megapixel camera, and a cooling system [4]. The passive reference dye, ROX, is always used in the BioMark platform to identify the chambers and to normalize the signal (Figure 23.3). Before the start of cycling, the CCD camera is focused onto the wells. Then, the autoexposure function sets an exposure time for each fluorescence channel based on the background signal.
The objective is to set a baseline such that even a minute increase in fluorescence can be detected with high sensitivity by the CCD chip. The autoexposure calibration is performed by collecting several images with different exposure times. An optimum exposure time is calculated and is then used for the acquisition of qPCR signals. With increasing background, the exposure time used to register fluorescence during qPCR is shortened. If the exposure time during acquisition is too short, images are underexposed, which leads to poor-quality data and, in the worst-case scenario, to amplification that is not detected above the background. The calibration protocol dates back to the time when the BioMark was introduced in November 2006 [14], when only probe-based assays were used. It used a calibration temperature of 20 °C, and initially the calibration temperature was not adjustable. The calibration settings for Fluidigm's EvaGreen protocol, based on the use of TaqMan Gene Expression Master Mix (TGEMM, Life Technologies), use a different approach from that presented here. That approach requires a new detector profile to be created in the data collection software to artificially reprogram the sensitivity of the CCD. This setting is only optimal for the combination of EvaGreen and TGEMM and can lead to under- or overexposure of the CCD if used for other combinations of dye and master mix. Here, we present a new approach to calibrating the CCD in the BioMark that does not depend on the dye and master mix used, and that includes the use of the novel dye Chromofy.

Materials and Methods

Intensity of fluorescence, loading uniformity, and the performance of reporter dyes in the microfluidic qPCR assays using the dynamic arrays were tested. A GAPDH qPCR assay developed within the SPIDIA project, with efficiency >90% and no primer dimer formation in conventional qPCR, was used for tests on the BioMark platform with dye-based and probe-based detection and with several master mixes.
The same primer pair was used for all studies: human GAPDH assay targeting NM_ , 91 bp amplicon, Fwd primer: CCTCCACCTTTGACGCT, Rev primer: TTGCTGTAGCCAAATTCGTT, probe FAM-BHQ1: AGCTTGACAAAGTGGTCGTTGAGGGCAATG. Primers (MWG Operon) were desalted and the hydrolysis probe was HPLC purified (MWG Operon). Final concentrations in the reaction chamber were 400 nM of each primer, 200 nM hydrolysis probe, and the recommended amount of 2× DA Assay Loading Reagent (Fluidigm). The master mix used with the probe was ABI TaqMan Gene Expression Master Mix (TGEMM), according to Fluidigm's protocol. Reporter dyes were used

in KAPA SYBR Fast Universal mix and Finnzymes SYBR Flash mix. These mixes were supplied either with SYBR Green I or without SYBR Green I, in which case we added Chromofy or EvaGreen. Chromofy 5000× (TATAA Biocenter, 6.85 mM) was diluted with nuclease-free water to a 400× working stock. The final concentration in the reaction chamber was 6×. EvaGreen 20× (Biotium) was used at a final 1× concentration in qPCR. Tests were performed using Universal human cDNA (BioChain Institute Inc.) at different concentrations or after preamplification. Preamplification of 20 genes (SPIDIA) was carried out in Bio-Rad iQ Supermix, contained 5 μl of universal cDNA in a 50 μl reaction volume and 50 nM final concentration of each primer, and used an amplification protocol consisting of 3 min at 95 °C followed by 18 cycles of 20 s at 95 °C, 1 min at 58 °C, and 30 s at 72 °C. Test reactions consisting of 5 μl of sample mix contained 0.5 μl of 20× DA Sample Loading Reagent (Fluidigm) and 2.5 μl of master mix. Concentrations of ROX reference dye (50×, Invitrogen, stock 25 μM) in test arrays were 25, 50, or 125 nM in qPCR. The array was first primed in the NanoFlex 4-IFC Controller (Fluidigm); 5 μl of sample mix was then pipetted into each sample inlet and 5 μl of assay mix into each assay inlet. Technical replicates were performed on the assay as well as on the sample level to evaluate repeatability. All dynamic arrays were thermocycled using the BioMark Real-Time PCR System (Fluidigm) under the following conditions: the calibration temperature was manually set to 20 °C or 72 °C, which was made possible by a change in the Windows registry using a *.reg file provided by Fluidigm. The ROX/FAM-MGB detection channel was selected.
The temperature protocol was 600 s activation at 95 °C, followed by 40 cycles of either a two-step probe protocol (95 °C for 20 s, 60 °C for 60 s, with signal acquisition at 60 °C) or a three-step dye protocol (95 °C for 20 s, 58 °C for 20 s, and 72 °C for 30 s, with signal acquisition at 72 °C). Cycling was followed by melt curve analysis, with the amplicon heated from 65 °C to 95 °C in 0.5 °C increments and a 1 s wait between increments. Data were collected using the BioMark Data Collection software with separate protocols for the qPCR and for the melting analysis. Data were analyzed using the Fluidigm Real-Time PCR Analysis v and the Fluidigm Melting Curve Analysis v software. Linear baseline correction was used and the threshold was manually adjusted to be as close as possible to the background. A set of validated assays that had been designed for the EU project SPIDIA was used for comparison of the BioMark data with measurements performed on a conventional qPCR platform (Roche LightCycler 384). Images of PCR tubes with GAPDH PCR product were taken using a stereomicroscope with a green fluorescent protein filter. The melting temperature of the GAPDH product in KAPA SYBR mix is 81 °C.

Results and Discussion

When the BioMark system was installed in our lab, several high-throughput projects were planned based on panels of established dye-based assays. The first choice of dye was SYBR Green, which is the most popular qPCR reporter dye. However, SYBR Green I turned out to perform very poorly on the BioMark (Figure 23.3). After extensive screening of other dyes, we identified Chromofy as a strong alternative candidate. The performance of the assays (hydrolysis probes, Chromofy) was evaluated on standard human cDNA at different concentrations.
The uniformity of the loading process was tested using 48 assay replicates (GAPDH) on cDNA at different concentrations analyzed in sample triplicates.

Calibration of Background Signal

While DNA-binding dyes usually generate a higher plateau fluorescent signal than probes and a larger ΔRn value in conventional qPCR systems, we found the opposite to occur in the initial runs with SYBR Green on the BioMark. The poor signal had a major impact on data quality in terms of low reproducibility and poor sensitivity. We found that this was caused by inappropriate calibration settings of the CCD camera, which was optimized for probe-based detection. The calibration temperature was set to 20 °C by default. The background fluorescence of free probes does not change appreciably with temperature. Free dye itself shows negligible fluorescence, but the reaction mix contains primers and possibly other residual DNA to which the dye may bind. This bound dye fluoresces, and its intensity decreases with increasing temperature (Figure 23.2) [15,18]. The strong background signal of the dye at 20 °C during calibration results in a short exposure time during acquisition, which leads to underexposure during qPCR. To improve the quality of measured data in dye-based reads, we performed background calibration at elevated temperature. For the dyes, we found calibration at 72 °C to perform well (Figure 23.3). We were not able to test a higher calibration temperature because this is limited in the BioMark Data Collection software. The limit depends on the thermal protocol and, for reasons that are unclear, is set to the temperature at which signal is acquired. In most three-step PCR protocols, this is 72 °C, while for two-step probe-based PCR, acquisition is typically at 60 °C.

Figure 23.2 (See color insert.) Dye fluorescence in conventional qPCR at different temperatures. Left: two images show a tube with 10 μl SYBR Green master mix (KAPA Universal SYBR mix) in the presence of primers (400 nM per GAPDH primer) before cycling, at room temperature and at 72 °C. Right: two images show the tube after cycling, in the presence of PCR product, at room temperature and at 72 °C.

Figure 23.3 Camera images showing 12 qPCR chambers in a dynamic array. Top row: 20 °C calibration. Dye signal is strong during calibration and exposure time is set to 0.39 s. During acquisition at 72 °C, no signal is detected after one cycle, and even after 40 cycles, the signal is very weak. Second row: 72 °C calibration. Dye signal is weak during calibration and exposure time is set to 3.6 s. Already after one cycle, chambers are distinguishable, and after 40 cycles, strong signals develop. Third row: Using hydrolysis probes, fluorescence at calibration temperature is low and exposure time is set to 2.15 s. Bottom: Chromofy signal calibrated at 72 °C, which results in 1.94 s exposure time. ROX reference dye signal shows that fluorescence is independent of calibration temperature. First to fourth row concentration and exposure time: 125 nM, 0.33 s; 125 nM, 0.23 s; 25 nM, 2.15 s; 125 nM, s.

To demonstrate the effect of calibration temperature, three dynamic arrays with identical sample and assay setup were used, but signal was generated using either hydrolysis probes, SYBR Green I, or Chromofy (Table 23.2). For each array, autoexposure at 20 °C was performed. The run was then aborted and a new autoexposure was measured at 72 °C, followed by running qPCR. We observed that the exposure time was almost 10 times longer after calibrating at 72 °C than after calibrating at 20 °C. There was a slight decrease of exposure times when using hydrolysis probes and including ROX, which suggests that there was only a minor increase of background fluorescence from these labels with decreased temperature. ΔFAM is the difference between the initial raw fluorescence signal and the signal measured after the last cycle.
Raw fluorescence data measured in the reporter channel are normalized by dividing by the raw data of the passive ROX reference channel and are denoted Rn. Rn values and raw fluorescence data were accessed using the real-time PCR analysis software module (Fluidigm). The maximum values in initial and terminal cycles within the array were read.
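The two quantities defined above, Rn and ΔFAM, are simple to compute once per-cycle raw readings are available. The snippet below is a minimal sketch with made-up readings for one chamber; the function names are hypothetical and do not correspond to anything in the Fluidigm software.

```python
def normalized_reporter(reporter_raw, rox_raw):
    """Rn per cycle: raw reporter signal divided by raw ROX signal."""
    if len(reporter_raw) != len(rox_raw):
        raise ValueError("reporter and ROX traces must have equal length")
    return [f / r for f, r in zip(reporter_raw, rox_raw)]


def delta_raw(signal):
    """Difference between the last-cycle and the initial raw signal."""
    return signal[-1] - signal[0]


# Hypothetical raw readings (arbitrary units) for one chamber.
fam_raw = [1000.0, 1010.0, 1500.0, 4000.0]
rox_raw = [2000.0, 2000.0, 2000.0, 2000.0]

rn = normalized_reporter(fam_raw, rox_raw)
delta_fam = delta_raw(fam_raw)
print("Rn per cycle:", rn)
print("delta FAM:", delta_fam)
```

Because ROX does not take part in the reaction, dividing by it mainly removes chamber-to-chamber differences in optics and filling, which is why the same reference dye is required in every chamber.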

Table 23.2 Effect of Calibration Temperature on Detection Chemistry

Chemistries (column groups): Hydrolysis Probe, Mix A; EvaGreen, Mix A; SYBR Green, Mix B; Chromofy, Mix B
Values reported for each chemistry: ROX (s); FAM (s); ΔFAM Raw Signal; Max ΔRn
20 °C: x x x x x x x x
72 °C: 0.88 a; 0.37 a; 4000 a; 0.3 a , ,

Note: Identical setups of assays and samples are compared. Mix A = TGEMM, Mix B = KAPA Fast Universal.
a 60 °C.

(+)Rn represents fluorescence values for a reaction that contains all components, including template, while (−)Rn is the background reading for a negative sample. For positive reactions, the (−)Rn signal is obtained from the early cycles before fluorescence starts to increase. The normalized signal ΔRn = (+)Rn − (−)Rn is the fluorescence signal generated using a particular ROX concentration and run settings (exposure times). The qPCR was completed using calibration readings at 72 °C and data are presented in Table 23.2.

To demonstrate the higher sensitivity obtained with calibration at elevated temperature, we performed a second study using a different set of dynamic arrays (Table 23.3). One sample was analyzed using SYBR Green I as the reporter for a set of assays using different calibration temperatures in the qPCR (Table 23.4). Samples were analyzed in triplicate using 48 assays. This generated a total of 144 data points. The assays were loaded in all chips in the same order, which minimized the uniformity bias experienced with SYBR Green I (see below). We observed lower Cq values, reduced standard deviations of technical replicates, and higher call rates after calibration at 72 °C than after calibration at 20 °C.

Table 23.3 Effect of Calibration Temperature for Different qPCR Chemistries
Columns: Probe/Dye; Master Mix; Array; Calibration Temperature (°C); ROX (s); FAM (s); ΔFAM Raw Signal; Max ΔRn
Rows (Probe/Dye, Master Mix): Probe, A; Probe, A; SYBR Green, C; SYBR Green, C; SYBR Green, C; Chromofy, C; Chromofy, C
Note: Mix A = TGEMM, Mix C = Finnzymes SYBR Flash Mix.

Table 23.4 Sensitivity and Reproducibility
Columns: 20 °C Calibration; 60 °C Calibration; 72 °C Calibration
Rows: Mean SD of triplicates; Missing values (%)

Data Uniformity

Sample mixes are conventionally loaded into the IFCs from one edge of the array to fill all the reaction chambers. The surface of the polydimethylsiloxane (PDMS) [19] that contacts the sample mix is much greater than the surface of the polypropylene in conventional microtiter plates. To validate the dye-based protocol, a uniformity test was performed, with the identical GAPDH assay loaded into all 48 assay inlets. Finnzymes Dynamo master mix was used to test the performance of SYBR Green I in the first array (1 μl of 20× diluted universal cDNA per 5 μl sample mix) and Chromofy performance was tested on a second array (1 μl of 40× diluted universal cDNA per 5 μl sample mix). Both arrays were run using autoexposure at 72 °C. Figure 23.4 shows how Cq varies from the loading influx. For the SYBR Green I reporter, Cq increases by five cycles from the inlet to the last-filled reaction chamber. For Chromofy, the increase is much more modest (Table 23.5). This drift in Cq probably reflects adsorption of the reporter dye to the PDMS of the IFCs (Figure 23.5) [19]. After optimizing the calibration of the CCD and the reagent conditions of the Chromofy protocol, we compared its performance with the recommended probe protocol (Figure 23.6, Table 23.6).

Figure 23.4 Amplification curves (a; ΔRn versus cycle, with SD, for SYBR Green and Chromofy) and Cq values (b; Cq value versus assay inlet) showing uniformity of SYBR and Chromofy assays in arrays. All 48 assay inlets were loaded with GAPDH primers.

Table 23.5 Uniformity of SYBR and Chromofy Assays Run in a BioMark
Columns: Channel; Mean Cq (48 Replicates); SD (48 Replicates); ΔCq (Max − Min); ΔCq (Mean of Five Left Chambers − Mean of Five Right Chambers)
Rows: SYBR Green; Chromofy

Figure 23.5 CCD images showing selected qPCR chambers in ROX and FAM channels. Lines 1–3 are ROX signals and lines 4–6 show SYBR Green signal. Lines 7–9 show Chromofy signal and the remaining lines show hydrolysis probe signal. Blue label indicates inlets.

Figure 23.6 (See color insert.) Amplification curves (a; ΔRn versus cycle for hydrolysis probe, SYBR Green, and Chromofy) and heatmap (b) produced with the BioMark software. Blue squares represent high Cqs, yellow low Cqs. Lines 4–6 show SYBR Green signal, lines 7–9 show Chromofy, and the remaining lines show probe signal.
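The uniformity measures named in the column headings of Table 23.5 can be computed from a row of replicate Cq values as sketched below. This is an illustration only: the Cq values are synthetic (a linear drift of five cycles across 48 chambers, loosely mimicking the SYBR Green I behavior described in the text), the function name is made up, and the use of the sample standard deviation is an assumption.

```python
import statistics


def uniformity_metrics(cq_values, edge=5):
    """Summarize replicate Cq values ordered from left to right chamber."""
    left = cq_values[:edge]
    right = cq_values[-edge:]
    return {
        "mean_cq": statistics.mean(cq_values),
        "sd": statistics.stdev(cq_values),  # sample SD (assumed)
        "delta_cq_max_min": max(cq_values) - min(cq_values),
        "delta_cq_left_right": statistics.mean(left) - statistics.mean(right),
    }


# Synthetic example: Cq drifting linearly from 20 to 25 over 48 chambers.
drifting = [20.0 + 5.0 * i / 47 for i in range(48)]
metrics = uniformity_metrics(drifting)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A large max-minus-min spread combined with a left-right difference of similar size points to a systematic gradient across the array rather than random scatter, which is the pattern expected from progressive dye adsorption.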

Table 23.6 Reproducibility and Loading Uniformity of Probe- and Dye-Based Assays
Columns: Mean Cq (48 Replicates); SD (48 Replicates); ΔCq (Mean of Five Left Chambers − Mean of Five Right Chambers); ΔCq (Max − Min)
Rows: Probe; SYBR Green; Chromofy

Table 23.7 Effect of Calibration Temperature on Melt Curve Analysis
Columns: Probe/Dye; Master Mix; Array; Calibration Temperature (°C); ROX (s); FAM (s); Decrease of FAM Fluorescence dRn/dT; Average Tm (°C) GAPDH Product; SD (48 Replicates)
Rows (Probe/Dye, Master Mix): SYBR Green, C (N/A, N/A); SYBR Green, C (N/A, N/A); SYBR Green, C; Chromofy, C; Chromofy, B; Chromofy, C (a, 0.14 a)
Note: Mix A = TGEMM, Mix B = KAPA Fast Universal, Mix C = Finnzymes SYBR Flash Mix.
a 96 replicates.

The Chromofy assay shows reproducibility and loading uniformity comparable to those of the probe-based assay. The optimized conditions were, for the probe assay: μl of universal cDNA per 5 μl of sample mix; for the SYBR Green I assay: 0.05 μl cDNA per 5 μl sample mix; and for the Chromofy assay: 0.2 μl cDNA per 5 μl sample mix.

Melting Curve Analysis

Optimized calibration temperature also improves the readings of fluorescence during melt curve analysis. The BioMark data collection software has a surprising limit for the maximum calibration temperature that can be used: the temperature at which acquisition of the HRM signal starts. For example, if we run a melt curve analysis from 65 °C to 95 °C, the highest possible calibration temperature is 65 °C. The performance of HRM analysis under various conditions is summarized in Table 23.7. The melting temperature (Tm) is fairly homogeneous across the array when using the SYBR Green I or Chromofy reporter (Figure 23.7), demonstrating that it is not appreciably affected by the progressive dye adsorption revealed above. Poor reproducibility is only observed at the edges of the array and is most likely due to evaporation through the partially permeable PDMS caused by extensive heating during thermal cycling.
PCR products can be categorized by their melting profiles when using version 2.6 (and higher) of the Fluidigm Melting Curve Analysis software. This is useful for identifying qPCRs with nonspecific amplification products such as primer dimers. An example of this application on single-cell material is shown in Figure .
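The melt-curve readout discussed above amounts to locating the peak of the negative derivative of fluorescence with respect to temperature. The following is a minimal sketch, assuming fluorescence sampled on a regular temperature grid; the function name `melting_temperature` and the synthetic sigmoid curve are illustrative assumptions and are not part of the Fluidigm software.

```python
import math

def melting_temperature(temps, fluorescence):
    """Return the temperature at which -dF/dT (central difference) is largest."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic melt curve (hypothetical product): sigmoid drop centred at 84.0 degrees C,
# sampled every 0.5 degrees from 65 to 95 degrees C.
temps = [65.0 + 0.5 * k for k in range(61)]
signal = [1.0 / (1.0 + math.exp((t - 84.0) / 0.8)) for t in temps]
tm = melting_temperature(temps, signal)
```

A reaction whose peak falls well below the Tm of the expected product would be a candidate for nonspecific amplification, such as primer dimers.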

Figure 23.7 (See color insert.) Tm uniformity across the array (Tm in °C plotted against assay inlet for Chromofy mix C, SYBR Green mix C and Chromofy mix B). Mix B = KAPA Fast Universal, mix C = Finnzymes SYBR Flash mix.

Conclusions

In the BioMark IFC, the fluorescence reporter signal collected during qPCR is normalized to the signal from the passive reference dye ROX, based on the background signals collected before the onset of the experiment. In the standard BioMark protocol, the background signal is collected at ambient temperature. ROX is a negatively charged or uncharged dye at neutral pH that does not interact with DNA. It has a rigid structure and a constant, very high fluorescence quantum yield that is essentially independent of temperature. Hydrolysis probes are most frequently based on fluorescein (FAM) reporter and rhodamine (TAMRA) quencher dyes, which have similar properties. Therefore, when using hydrolysis probes, fluorescence intensity shows essentially no dependence on temperature. Since neither ROX nor hydrolysis probe fluorescence depends on temperature, the background signal can be recorded at ambient temperature even though fluorescence during PCR is recorded at a substantially higher temperature.

Most free DNA-binding dyes used as reporters in qPCR are cationic asymmetric cyanines that bind preferentially to double-stranded DNA but show residual binding to single-stranded DNA.18 The strong fluorescence upon binding arises because the dye is held in a tight conformation, usually in the groove of the double helix or squeezed between bases in intercalative mode. The tight binding restricts the internal motion within the dye that would otherwise dissipate excitation energy through nonradiative processes, giving rise to intense fluorescence. The tightness of the binding is temperature dependent.
At elevated temperatures, binding is looser, which increases nonradiative decay and thereby reduces fluorescence emission. The background fluorescence of asymmetric cyanine dyes such as SYBR Green I and Chromofy at ambient temperature is therefore much higher than their fluorescence at the temperatures used to record fluorescence during qPCR. As demonstrated in this report, calibrating the CCD at ambient temperature when using dye reporters results in poor quality of the fluorescence signals measured during the qPCR, and the performance can be dramatically improved by calibrating the CCD at high temperature. We also show that the common reporter dye SYBR Green I adsorbs to the PDMS of the IFC, resulting in a large drift of Cq values with distance from the inlet. The adsorption is much weaker with the dye Chromofy.

Figure 23.8 High-throughput melting analysis of PCR products (dRn/dT plotted against temperature). (a) Only specific PCR products formed. (b) Some reactions have formed aberrant PCR products characterized by lower Tm.

ACKNOWLEDGMENTS

This work was supported in part by the European Community Seventh Research Framework Programme project SPIDIA (Grant Agreement no. ) to the Biotechnology Institute of the Czech Academy of Sciences and the TATAA Biocenter, and in part by AV0Z granted by the Ministry of Youth, Education and Sports of the Czech Republic. David Svec is supported in part by a PhD fellowship from Charles University.

References

1. Tang, F., Lao, K. and Surani, M.A. Development and applications of single-cell transcriptome analysis. Nat Methods 8, S6–S11.
2. Stahlberg, A. et al. Defining cell populations with single-cell gene expression profiling: Correlations and identification of astrocyte subpopulations. Nucleic Acids Res 39, e24.
3. Stahlberg, A. and Bengtsson, M. Single-cell gene expression profiling using reverse transcription quantitative real-time PCR. Methods 50.
4. BioMark HD System.

5. OpenArray Real-Time PCR Platform.
6. The SmartChip Cycler.
7. LightCycler 1536 Real-Time PCR System.
8. Mengual, L., Burset, M., Marin-Aguilera, M., Ribal, M.J. and Alcaraz, A. Multiplex preamplification of specific cDNA targets prior to gene expression analysis by TaqMan arrays. BMC Res Notes 1, 21.
9. Bustin, S.A. et al. The MIQE guidelines: Minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55.
10. Rutledge, R.G. and Stewart, D. Assessing the performance capabilities of LRE-based assays for absolute quantitative real-time PCR. PLoS One 5.
11. Wang Z, S.J. Determination of target copy number of quantitative standards used in PCR-based diagnostic assays, in Gene Quantification (ed. F. F) (Birkhäuser, Boston, 1998).
12. Heid, C.A., Stevens, J., Livak, K.J. and Williams, P.M. Real time quantitative PCR. Genome Res 6.
13. Lind, K., Stahlberg, A., Zoric, N. and Kubista, M. Combining sequence-specific probes and DNA binding dyes in real-time PCR for specific nucleic acid quantification and melting curve analysis. Biotechniques 40.
14. Fluidigm's BioMark System Sales Surpass the Century Mark.
15. Wittwer, C.T., Herrmann, M.G., Moss, A.A. and Rasmussen, R.P. Continuous fluorescence monitoring of rapid cycle DNA amplification. Biotechniques 22.
16. Newby, D.T., Hadfield, T.L. and Roberto, F.F. Real-time PCR detection of Brucella abortus: A comparative study of SYBR green I, 5′-exonuclease, and hybridization probe assays. Appl Environ Microbiol 69.
17. Arikawa, E. et al. Cross-platform comparison of SYBR Green real-time PCR with TaqMan PCR, microarrays and other gene expression measurement technologies evaluated in the MicroArray Quality Control (MAQC) study. BMC Genomics 9, 328.
18. Jan Nygren, N.S. and Kubista, M. The interactions between the fluorescent dye thiazole orange and DNA. Biopolymers 46, 39–51, 1998.
19. Perkel, J.M. Microfluidics: Bringing new things to life science. in Science Magazine (AAAS OPMS, 2008).

4.3 qPCR tomography combined with laser microdissection

qPCR tomography had demonstrated highly sensitive gene expression profiling of up to a hundred genes simultaneously at subcellular resolution, including localization of the transcripts. To develop the method further, we attempted to complement it with image information about the studied object; the ability to select specifically the cells that proceed to molecular analysis also proved necessary. The aim was therefore to bring the method to a form that is broadly applicable to the study of cellular structures within tissues and that prevents the surroundings from distorting the molecular profile of the studied structure.

To create a map of gene expression in time and space we chose, for its relative simplicity, the very interesting model of embryonic development of the mouse molar, in collaboration with the Department of Teratology, Institute of Experimental Medicine AS CR. The mouse has a reduced dentition that in each quadrant consists of one incisor and three molars, separated by a gap (diastema). The arrangement of the teeth results from cell signalling between epithelium and mesenchyme, which is relatively highly conserved among vertebrates and is also responsible for patterning and morphology in other developmental processes [93]. Understanding and describing these processes is applicable to humans and should enable the creation of replacement teeth by tissue engineering. The mouse has no deciduous teeth apart from a rudimentary tooth in front of the incisors; the incisors grow continuously and therefore represent a useful model for studying stem cells and their differentiation.

The mouse molar model was chosen because of the relatively high level of current knowledge, which allows the results of our new method to be correlated with morphological data and with studies that mostly used in situ hybridization. Mouse tooth development passes through a series of well-defined stages: epithelial thickening, bud, cap and bell.
Thickening of the oral epithelium occurs around E11.5 (embryonic day) and is characterized by expression of the Shh gene, which triggers cell proliferation at the site of tooth development. The epithelium then invaginates into the adjacent neural crest-derived mesenchyme and forms a bud. According to the generally accepted theory, before E12 the signalling centre that initiates tooth development is located in the epithelium; after E12 this role is taken over by the mesenchyme, which begins to condense cells around the invaginating bud. On E13.5 the tooth bud is flattened at its tip and is surrounded by condensed dental mesenchyme that typically expresses the signalling molecules and transcription factors Bmp4, Msx1, Wnt and Pax9. Following this signalling, the enamel knot forms at the tip of the bud; it expresses further key factors, Shh, Fgf4, Bmp4 and Wnt10b, and the tooth thus passes into the cap stage [94].

We chose the tooth primordium of the first lower molar (M1) at the cap stage, with its prominent signalling centre, the enamel knot, as the first object for developing qPCR tomography combined with laser microdissection. The protocol (Figure 5) was assembled from the scientific and product literature available at the time [43, 92, 95], with the aim of optimizing it for the new application: serial sections with undamaged histology of the tooth primordium together with high-quality RNA that could be isolated simply and reliably from a large number of small samples, ideally by direct lysis. In parallel, qPCR primers were validated for specificity, sensitivity and efficiency for demanding use with samples of very low RNA concentration. In the new laboratory it was also necessary to develop a method for validating primer pairs (= qPCR assays) that in practice meets the requirements of single-cell analysis. Thirty primer pairs were validated using a template as similar as possible to the target molecules, i.e. mouse cDNA, by analysing a five-step, five-fold dilution series with four qPCR technical replicates per concentration, together with a genomic DNA control and a negative control. Validation used qPCR with SYBR Green detection and melting-temperature analysis of the product. The length of the PCR fragment was verified by agarose electrophoresis.

Figure 5: Workflow for qPCR tomography combined with LMD. From left: 1) snap-frozen embryonic jaw tissue is mounted in freezing medium and cut into 20 µm sections. 2) Sections are heat-mounted on a UV-labile membrane, fixed with 20% ammonium sulphate solution, stained with haematoxylin for 1 min and dehydrated in an alcohol series (70% for 1 min and 98% for 1 min). 3) and 4) The epithelial part of the M1 tooth primordium at the bud stage is selected and cut out with the Leica LMD system. 5) The presence of the excised tissue is confirmed, 70 µl of RLT lysis buffer is added and RNA is isolated according to the instructions of the RNeasy Micro kit (Qiagen). 6) Quantification of 30 genes by RT-qPCR (unpublished data).
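The dilution-series validation described above is commonly evaluated from the slope of the standard curve, with efficiency E = 10^(−1/slope) − 1. The sketch below assumes that standard relationship; the function name `pcr_efficiency` and the Cq values are made up for illustration and are not data from this work.

```python
import math

def pcr_efficiency(relative_conc, cq_values):
    """Least-squares fit of Cq against log10(concentration); returns (slope, efficiency)."""
    xs = [math.log10(c) for c in relative_conc]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    efficiency = 10 ** (-1.0 / slope) - 1.0  # 1.0 corresponds to 100 % (perfect doubling)
    return slope, efficiency

# Five-step, five-fold dilution series. The Cq values are hypothetical and were
# constructed so that every five-fold dilution adds log2(5) cycles.
conc = [1 / 5 ** k for k in range(5)]
cq = [20.0 + k * math.log2(5) for k in range(5)]
slope, eff = pcr_efficiency(conc, cq)
```

Technical replicates can be passed as repeated concentration entries, so that all four replicates per dilution step enter the fit.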

As mentioned in the introduction to LMD, the problematic step is tissue fixation, in our case of serial cryotome sections, which affects RNA quality and yield. RNA quality in the tissue sections was tested by microfluidic electrophoresis (Bio-Rad Experion), and after lengthy optimization of tissue fixation with 20% ammonium sulphate solution (RNAlater) we achieved RQI values for sections that were mounted on the UV-labile membrane for 0–15 minutes. The result of almost two years of effort is a preliminary map of gene expression in the dental epithelium (Figure 6) at the time when the M1 primordium is at the transition to the cap stage (ED13.5). We were able to detect increased expression of one of the main growth factors, Fgf4, in the region of the forming enamel knot [96], in association with the other key signalling molecules Shh and Edar. A new finding was the presence of Fgf8, whose expression had been detected by in situ hybridization at the latest at the bud-formation stage, when it is down-regulated [97].

The project (supported by the Grant Agency of Charles University) was suspended because of the technical and time demands of increasing resolution and reproducibility. Higher resolution, down to the level of single cells, depends on more effective tissue fixation. To discover new relationships with other genes, the study will need to be carried out at the stage of secondary enamel knot formation, which determines the formation of tooth cusps on the basis of Fgf4 and Fgf9 expression. At the time the project was suspended, our laboratory joined the SPIDIA consortium (SPIDIA.eu; Standardisation and improvement of generic pre-analytical tools and procedures for in-vitro diagnostics), where one of the goals of the collaborating laboratories and companies was to develop and validate a protocol for clinical use that preserves the histological and molecular features of tissue simultaneously [48, 98].
This protocol is the most likely candidate for use when the project is resumed. It could enable the preparation of fluorescently labelled samples, the selection of specific cells and their omics analysis at the population and single-cell level, providing broad molecular information in the context of localization and interaction with the surroundings. Successful application of molecular tomography is expected to speed up the description of relationships between phenotype and the expression of tens of genes, or of the whole transcriptome, and to open new therapeutic options, including tissue engineering of teeth.

Figure 6: Initial gene expression map of the mouse tooth at ED13.5, at the transition to the cap stage, obtained by qPCR tomography. (A) Expression levels of mRNAs of transcription factors, signalling molecules, receptors, growth factors and reference genes in antero-posterior orientation (sections labelled from A, frontal, to I, posterior). Green marks the group of genes whose mRNA was detected by in situ hybridization on frontal sections across the whole structure [99]; red, only in the middle, in the enamel knot region; black, in regions other than the enamel knot. Expression of the genes shown in grey has not been confirmed by in situ hybridization at this stage. The group of reference genes is shown in blue. The assay for ribosomal 18S was used for normalization. (B) Antero-posterior 3D reconstruction of the dental epithelium in side view. The 3D reconstruction was created with the Reconstruct software from outlines of the dental epithelium sections obtained during microdissection (unpublished data).

4.4 Quality control of the experiment in single-cell gene expression profiling

With the qPCR-LMD tomography project left unfinished and the SPIDIA project starting, my postgraduate study was directed in its third year towards the development of molecular tools that facilitate the development and validation of new methods, including molecular tomography, and that allow the following parameters to be monitored in experiments.

Inhibition and yield

Contaminants present in samples are known to inhibit enzymatic reactions and, in RT-qPCR, can strongly distort the results. One of the basic recommendations of the generally accepted MIQE guidelines [78] is therefore to test for the presence of inhibitors so that published qPCR results are credible. The simplest and most effective control is to add a known exogenous mRNA to the RNA sample and to compare the RT-qPCR result with a control sample containing the same amount of exogenous mRNA in pure water [100]. In the academic laboratory we developed a synthetic gene with a unique sequence that is not present in any organism. This universal exogenous mRNA, 1000 ribonucleotides long, carries a 3′ poly(A) tail and a 5′ cap so that it behaves as similarly as possible to native mRNA throughout any protocol. We also developed a DNA variant with the same sequence. Because it is a simple and effective tool, it became a relatively successful product, and at the time of writing we had developed, under the auspices of the company TATAA, a second synthetic gene so that two control elements can be combined at different stages of an experiment, for example to distinguish inhibition in RT from inhibition in qPCR when profiling single cells (Figure 7).

Figure 7: When there is a risk of RT-qPCR inhibition, it is generally recommended to use an exogenous control (RNA, DNA) to validate the protocol. In single-cell analysis the control is added to the lysis solution. Combining exogenous RNA and DNA makes it possible to distinguish inhibition in reverse transcription from inhibition in qPCR, and can reveal adsorption or other loss of material by comparison with a non-inhibited reference sample, which is usually pure water.

Using the first exogenous mRNA control, we found that inhibition by different substances and at different amounts affects reverse transcription and PCR amplification to a highly variable extent, even for an identical gene (= one mRNA molecule) targeted by different assays (= primer pairs) (Figure 8). This result suggests that reverse transcription proceeds with different efficiency not only for different mRNAs, which is known [101, 102], but also for different fragments within one mRNA. When an inhibitor is present, its effect therefore depends entirely on the particular qPCR assay and is very difficult to compensate for analytically after the fact.

Figure 8: Level of inhibition depending on the position of the qPCR assay within the mRNA molecule. Panels: (A) RT inhibition; (B) qPCR inhibition. Assays are located at the 5′ end, in the middle and at the 3′ end of the mRNA, each in L and M amplicon lengths. Inhibitors on the x-axis: ethanol 1%, ethanol 10%, formalin 0.01%, formalin 0.1%, GTC 0.4 mM, GTC 4 mM, phenol 0.01%, phenol 0.1%. The left y-axis shows the difference between the Cq value measured for the non-inhibited control sample (water) and the value for the sample with inhibitor, ΔCq (control − inhibitor); the right y-axis shows the relative degree of mRNA loss caused by inhibition (%), where the non-inhibited control represents 100% mRNA. The assay name indicates the position within the mRNA molecule and the PCR amplicon length (L, 350 base pairs; M, 150 base pairs); all assays were validated and showed 100% efficiency. The x-axis shows the type and final concentration of inhibitor in the 10 µl RT (A) or qPCR (B) reaction. A: inhibitors were added only to the reverse transcription (n = 3) and the product was diluted 100× for qPCR analysis (n = 2) to avoid carry-over inhibition. Each RT (GrandScript, TATAA Biocenter) also contained 10 ng of total mouse RNA, which served as a carrier to prevent possible loss of the exogenous control by adsorption. B: inhibitors were added to the qPCR (n = 6). GrandMaster SYBR Green mix (TATAA Biocenter) was used for qPCR (unpublished data). GTC = guanidinium thiocyanate.
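The two y-axes of Figure 8 are related by simple arithmetic: a Cq difference between control and inhibited sample converts into the percentage of signal remaining relative to the control. The following is a minimal sketch, assuming the stated 100% assay efficiency; the function name and the Cq values are hypothetical.

```python
def relative_signal_percent(cq_control, cq_inhibited, efficiency=1.0):
    """Signal of the inhibited sample as a percentage of the non-inhibited control."""
    delta_cq = cq_control - cq_inhibited  # negative when the inhibitor delays amplification
    return 100.0 * (1.0 + efficiency) ** delta_cq

# Hypothetical example: the spike in water gives Cq 22.0, with inhibitor Cq 24.0.
remaining = relative_signal_percent(cq_control=22.0, cq_inhibited=24.0)
unaffected = relative_signal_percent(cq_control=22.0, cq_inhibited=22.0)
```

With a two-cycle delay only a quarter of the control signal remains, which illustrates why even small Cq shifts correspond to large apparent losses of mRNA.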

The yield of RNA or DNA isolation can be measured, once possible inhibition has been excluded, by adding the exogenous control to stabilized total RNA, performing the isolation and comparing the result with a control representing 100% yield. For an example of using exogenous RNA and DNA to control single-cell gene expression profiling, see publication V, Direct cell lysis for single-cell gene expression profiling.

RNA and mRNA quality

With methods such as microarrays, next-generation sequencing and qPCR available, we can determine the amount of RNA fairly accurately, which allows mRNA, and possibly miRNA and other non-coding RNAs, to be used as biomarkers for disease diagnostics. The problem remains how to verify changes in expression that may have occurred between sample collection and molecular analysis. RNA is less stable than DNA because of the additional 2′ hydroxyl group on the ribose, which is a nucleophile and contributes to the intrinsic instability of the molecule. Several types of RNA degradation are known: a) random cleavage requires only an aqueous solution with magnesium cations, pH outside the range 2 to 9, UV radiation or high temperature; b) chemical modification of RNA functional groups is caused, for example, by formalin fixation and paraffin embedding of tissue (FFPE) [47]; c) enzymatic degradation can be caused by various types of 3′ and 5′ exo- or endonucleases.

At present the most widely used method for verifying RNA quality is microfluidic electrophoresis, which analyses the size of ribosomal RNA fragments and reports a quality indicator, most often RIN on a scale of 1–10 [103], where a higher number indicates intact RNA. If RNA from a sample shows a high number (usually 7, although the threshold is not fixed), the target mRNA is assumed to be unchanged as well. It has been shown several times, however, that degradation of different mRNAs and of rRNA follows different kinetics and rules, related to the function and localization of the RNA molecule [ ]. For applications such as single-cell profiling or LMD, electrophoretic methods cannot be used because too little material is available.
Using the exogenous RNA control described above, we attempted to develop a qPCR method that is sensitive to non-enzymatic types of degradation and relies mainly on differences in PCR amplicon length (Figure 9) to determine fragmentation or chemical damage of RNA, or of cDNA. The method should also be applicable to single-cell gene expression profiling [48, 107].
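The amplicon-length comparison can be sketched as relative quantification between a longer and a shorter assay, expressed as 2^ΔCq (longer − shorter) and referred to an untreated sample. The sketch assumes assay efficiencies close to 100%; the function name and the Cq values are hypothetical and only illustrate the arithmetic.

```python
def fragmentation_index(cq_long, cq_short, ref_cq_long, ref_cq_short):
    """2^(dCq treated - dCq reference), where dCq = Cq(longer) - Cq(shorter)."""
    delta_treated = cq_long - cq_short
    delta_reference = ref_cq_long - ref_cq_short
    return 2.0 ** (delta_treated - delta_reference)

# Hypothetical Cq values: heat-treated sample versus an untreated reference.
index = fragmentation_index(cq_long=26.5, cq_short=24.0,
                            ref_cq_long=24.5, ref_cq_short=24.0)
intact = fragmentation_index(24.5, 24.0, 24.5, 24.0)
```

An index of 1 means the long and short amplicons are affected equally; larger values indicate that the longer amplicon is lost preferentially, as expected for fragmented template.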

Figure 9: Layout of qPCR assays at the 3′ end, in the middle and at the 5′ end of the 1000-base exogenous mRNA control. For each of these regions, three assays were designed with lengths Short (75 bases), Medium (150 bases) and Long (350 bases), which share a common left primer. All assays were validated and showed efficiency close to 100% (unpublished data).

The result is the finding that sensitivity to fragmentation of the exogenous mRNA differs depending on the position of the assay. Using relative quantification of assays of different length in comparison with 20 °C, we determined that, with the given RT protocol (random hexamers, SuperScript III), the 3′ region of the mRNA is the most fragmented after 10 minutes of incubation at 95 °C (Figure 10).

Figure 10: Comparison of fragmentation within different parts of the exogenous mRNA control using qPCR assays with amplicons of different length, after incubation in neutral Tris buffer with EDTA for 10 minutes (unpublished data). Plot title: degree of fragmentation of exogenous mRNA in buffer at different temperatures for 10 min. y-axis: 2^ΔCq (longer − shorter). x-axis: 5′ medium/5′ short, 5′ long/5′ short, middle medium/middle short, middle long/middle short, 3′ medium/3′ short, 3′ long/3′ short. Series: Tris+EDTA pH 7 RH at 20 °C, 60 °C, 75 °C and 95 °C.

Genomic contamination: ValidPrime

For accurate gene expression analysis at the mRNA level, the measured Cq value must represent the amount of transcript in the sample. This requires the assay to be specific only for the target cDNA and not to amplify pseudogenes, primer dimers or genomic DNA, which often cannot be achieved because of the similarity of pseudogenes or the absence of introns in the target gene. Contamination of the sample with gDNA is a problem for most RNA purification protocols because of the very similar physicochemical properties of RNA and DNA. Because the level of DNA contamination often differs between samples, and each assay for a target gene also shows a different sensitivity to gDNA, amplification of genomic DNA is tested with an RT-minus control. This has the same composition as the usual RT reaction, but no reverse transcriptase is added. It is usually necessary to analyse an RT(−) control for every gene (m) and every sample (n), which doubles the number of qPCRs (m × n additional reactions) and substantially increases the cost of gene expression profiling, including at the single-cell level, where large numbers of samples must be analysed. A widely accepted threshold [78] is a Cq difference RT(−) − RT(+) of at least 5, which corresponds to gDNA contributing at most about 3% of the total signal. If the gDNA share is larger, the most common solution is to add a DNase step to the protocol.

Our aim (in the laboratory of the Swedish TATAA Biocenter, supported by the EduGlia programme) was to develop an assay that is specific for a single region of the genome that is present in only one copy and is not transcribed into mRNA in any form; the assay was named ValidPrime. This assay can be used to quantify genomes in a sample. With a standard of pure genomic DNA, the ratio of copy numbers for ValidPrime and the target gene can then be obtained. By combining information on the number of genome copies in the sample with the ratio of copy numbers within the genome, the genomic background in the sample can be subtracted mathematically, which makes gene expression analysis more accurate and lowers the cost of quality control for gDNA. With the ValidPrime technique, only one assay (ValidPrime) and one sample (gDNA standard) need to be added to the experiment, so the resulting number of gDNA controls is m + n + 1.
We applied ValidPrime to five cDNAs from different mouse tissues and also tested it by adding gDNA to total RNA. The analyses were performed on both a conventional and a high-throughput instrument. We showed that ValidPrime makes the determination more accurate and can generally be used to subtract a gDNA contamination share of up to 60% in a sample. When ValidPrime is used for the first time, it is recommended to verify the efficiency of the target assay and of ValidPrime with a standard curve covering at least a thousand-fold range of haploid genome amounts. The technique is described in publication III, Correction of RT-qPCR data for genomic DNA-derived signals with ValidPrime.
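The arithmetic behind this section can be sketched in a few lines: the number of control reactions in both designs, the gDNA share implied by an RT(−)/RT(+) Cq difference, and the subtraction of the gDNA-derived signal using the ValidPrime assay. The sketch assumes 100% amplification efficiency for all assays; the function and variable names are hypothetical, the Cq values are invented, and the code illustrates the principle described above rather than the published implementation.

```python
import math

def rt_minus_controls(m_genes, n_samples):
    """Number of gDNA control reactions: classic RT(-) design vs. ValidPrime design."""
    return m_genes * n_samples, m_genes + n_samples + 1

def gdna_share_from_rt_minus(cq_rt_plus, cq_rt_minus):
    """Fraction of the RT(+) signal explained by gDNA, assuming perfect doubling."""
    return 2.0 ** (cq_rt_plus - cq_rt_minus)

def validprime_correct(cq_goi_sample, cq_vpa_sample, cq_goi_gdna, cq_vpa_gdna):
    """Return (RNA-derived Cq, gDNA share of the signal) for one gene in one sample."""
    # Cq expected for the gene of interest if only the sample's gDNA were present:
    # the sample's ValidPrime Cq shifted by the assay offset measured on pure gDNA.
    cq_dna = cq_vpa_sample + (cq_goi_gdna - cq_vpa_gdna)
    total = 2.0 ** -cq_goi_sample
    dna = 2.0 ** -cq_dna
    if dna >= total:
        return None, 1.0  # the whole signal is explained by gDNA
    return -math.log2(total - dna), dna / total

classic, with_validprime = rt_minus_controls(96, 96)
share_at_5_cycles = gdna_share_from_rt_minus(25.0, 30.0)
cq_rna, share = validprime_correct(25.0, 27.0, 24.0, 24.0)
```

For a 96 × 96 experiment the classic design needs thousands of additional reactions, whereas the ValidPrime design needs fewer than two hundred, which is the cost argument made in the text.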

Published online 6 January 2012. Nucleic Acids Research, 2012, Vol. 40, No. 7, e51. doi: /nar/gkr1259

Correction of RT-qPCR data for genomic DNA-derived signals with ValidPrime

Henrik Laurell 1,*, Jason S. Iacovoni 1, Anne Abot 1, David Svec 2,3, Jean-José Maoret 1,4, Jean-François Arnal 1,5 and Mikael Kubista 2,3

1 Inserm/Université Paul Sabatier UMR1048, Institut des Maladies Métaboliques et Cardiovasculaires (I2MC), BP84225, Toulouse cedex 4, France, 2 Laboratory of Gene Expression, Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague, 3 TATAA Biocenter AB, Göteborg, Sweden, 4 Plateforme GeT (Génome et Transcriptome) du Génopole Toulouse, Toulouse, France and 5 Faculté de Médecine, Université de Toulouse III and CHU de Toulouse

Received September 17, 2011; Revised November 21, 2011; Accepted December 5, 2011

ABSTRACT

Genomic DNA (gDNA) contamination is an inherent problem during RNA purification that can lead to non-specific amplification and aberrant results in reverse transcription quantitative PCR (RT-qPCR). Currently, there is no alternative to RT(−) controls to evaluate the impact of the gDNA background on RT-PCR data. We propose a novel method (ValidPrime) that is more accurate than traditional RT(−) controls to test qPCR assays with respect to their sensitivity toward gDNA. ValidPrime measures the gDNA contribution using an optimized gDNA-specific ValidPrime assay (VPA) and gDNA reference sample(s). The VPA, targeting a nontranscribed locus, is used to measure the gDNA contents in RT(+) samples and the gDNA reference is used to normalize for GOI-specific differences in gDNA sensitivity. We demonstrate that the RNA-derived component of the signal can be accurately estimated and deduced from the total signal. ValidPrime corrects with high precision for both exogenous (spiked) and endogenous gDNA, contributing up to 60% of the total signal, while substantially reducing the number of required qPCR control reactions.
In conclusion, ValidPrime offers a cost-efficient alternative to RT(−) controls and accurately corrects for signals derived from gDNA in RT-qPCR.

INTRODUCTION

Accurate gene expression analysis by reverse transcription (RT) quantitative PCR (qPCR) requires assays with high specificity for the target cDNA/reference gene, collectively referred to herein as the Gene-Of-Interest (GOI). It is important to have negligible signal contribution from experimental artifacts, such as primer dimers and contaminating genomic DNA (gDNA). Traditionally, primer dimer formation is tested using a no template control (NTC) and gDNA contamination levels are measured with RT(−) controls [which differ from regular RT(+) reactions in that no reverse transcriptase is added]. Contamination of gDNA is an inherent problem during RNA purification due to the similar physicochemical properties of RNA and DNA. Since gDNA contamination levels are frequently not uniform between samples (1) and the sensitivity toward gDNA differs greatly between GOI assays, RT(−) controls are needed for each sample/assay pair, which substantially adds to the cost and labor in RT-qPCR profiling studies. A difference of at least five quantification cycles (Cq) between RT(+) and RT(−) reactions indicates that <3% of the total signal originates from gDNA, and is commonly used as a limit to ensure accurate estimation of GOI expression. Smaller differences typically call for DNase treatment of samples. The accuracy of gDNA background estimation, as measured with RT(−) reactions, is compromised because GOI assays, designed to amplify target transcripts, are used even though they are not optimized for gDNA amplification. Furthermore, intrinsic characteristics of RT(−) qPCRs that influence the result of the correction, such as amplification efficiencies, are difficult to assess.
In addition, as proposed theoretically (2) and shown experimentally (3,4), a low initial number of target molecules leads to a large variability between replicates, mainly due to stochastic effects. Altogether, this explains the low reproducibility frequently observed in RT(−) reactions.

qPCR assays can be either gDNA sensitive or insensitive. Whereas qPCR assays can be designed to be gDNA insensitive, such as those designed to target exons flanking a long intron or with primers that cross exon-exon junctions, qPCR assays for single-exon genes will readily amplify contaminating gDNA. The gDNA background signal increases further in the presence of multiple genomic copies or pseudogenes. The latter are particularly troublesome since they may originate from retrotransposons without introns that are amplified even with intron-spanning assays. Thus, there exists both variation in the degree of contamination between samples and large differences between assays in terms of their sensitivity to gDNA. Therefore, general methods of controlling and correcting for gDNA contamination are essential for accurate measurements of gene expression.

As an alternative to RT(−) reactions, we have developed a procedure that determines the impact of the gDNA contamination on the measured signal much more accurately and allows validation of qPCR primers with respect to their sensitivity toward gDNA. We show in proof-of-principle experiments that efficient background correction can be performed with gDNA contamination representing up to 60% of the total signal.

*To whom correspondence should be addressed. Email: henrik.laurell@inserm.fr

© The Author(s). Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

MATERIALS AND METHODS

Samples

All samples were from mouse (C57Bl/6J) tissues (kidney, liver, adipose tissue, uterus, peritoneal macrophages). All experimental procedures involving animals were performed in accordance with the principles and guidelines established by the National Institute of Medical Research (Inserm) and were approved by the local Animal Care and Use Committee. Prior to sampling, mice were anesthetized by intraperitoneal injection of ketamine (100 mg kg⁻¹) and xylazine (10 mg kg⁻¹). Tissues were snap-frozen in liquid nitrogen and stored at −80 °C. Isolation of peritoneal macrophages has been described elsewhere (5). Macrophages were in some cases treated with 20 ng/ml LPS ex vivo for 4 h prior to RNA extraction.
DNA extraction

C57Bl/6J mouse gDNA was extracted from whole blood using the PerfectPure DNA Blood Cell Kit, according to the recommended protocol (5 PRIME GmbH, Hamburg, Germany). Good results were also obtained with gDNA purified from mouse tails by phenol/chloroform extraction after Proteinase K digestion (6). The DNA concentration was determined spectroscopically (NanoDrop).

RNA extraction

Total RNA was extracted using a double purification protocol. Briefly, Trireagent (Sigma-Aldrich, St Louis, MO, USA) was added to the frozen tissue sample, which was homogenized in a Precellys 24 homogenizer (Bertin Technologies, France). After the extraction step, the supernatant was gently mixed with 1 vol of 70% ethanol and applied to a total RNA miniprep Genelute column, where it was washed and eluted following the instructions from the manufacturer (Sigma-Aldrich). The integrity and quality of the RNA were tested by capillary microelectrophoresis [MultiNA (Shimadzu) or Experion (BioRad)] and spectroscopically (NanoDrop). A fraction of the RNA was DNase treated using the DNA-free kit from Ambion. To avoid inhibition of the reverse transcriptase, the volume of DNase-treated RNA did not exceed 25% of the total volume during RT.

RT

Total RNA ( µg) was reverse transcribed in µl using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) with random hexamers. The reaction mixture was incubated for 10 min at 25 °C, 120 min at 37 °C and finally for 5 min at 85 °C, according to instructions from the manufacturer (Applied Biosystems). RT reactions were diluted 5- to 10-fold prior to qPCR.

Real-time qPCR

Conventional qPCR. All reactions (except when indicated) were performed in duplicate in 10 µl volumes using 20 ng reverse-transcribed total RNA in a StepOnePlus system (Applied Biosystems) with the SsoFast EvaGreen Supermix (BioRad) and an assay concentration of 300 nM, using the cycling parameters: 95 °C (20 s) followed by 40 cycles at 95 °C (3 s) and 60 °C (20 s).
Melting curve analysis: 95 °C (15 s); 60 °C (60 s) and a progressive increase up to 95 °C (0.5 °C/min). Analysis of the data was performed with the StepOne software v.2.2.

High-throughput qPCR. The Dynamic Arrays for the microfluidic BioMark™ system (Fluidigm Corporation, CA, USA) (7) were used to study gene expression in 6.5 ng cDNA from mouse peritoneal macrophages or mouse uterus, as described below.

Specific target amplification. Pre-amplification of cDNA (produced from 25 to 65 ng of total RNA) was performed in the StepOnePlus cycler (Applied Biosystems) [activation step at 95 °C for 10 min followed by 14 cycles of 95 °C (15 s) and 60 °C (4 min)] in a total volume of 5 µl in the presence of all primers at a concentration of 50 nM. After pre-amplification, 20 µl Low EDTA TE Buffer [10 mM Tris pH 8 (Ambion), 0.1 mM EDTA pH 8 (Sigma)] was added to each sample.

Sample Mix for BioMark analysis. The pre-sample mix contained 66.7% 2X TaqMan® Gene Expression Master Mix (Applied Biosystems), 6.67% 20X DNA Binding Dye Sample Loading Reagent (Fluidigm), 6.67% 20X EvaGreen™ (Biotium) and 20% Low EDTA TE Buffer. Sample mix was obtained by mixing 5.6 µl of the pre-sample mix with 1.9 µl of diluted cDNA.

Assay Mix. A quantity of 3.8 µl 2X Assay Loading Reagent (Fluidigm) and 1.9 µl Low EDTA TE Buffer were mixed with 1.9 µl of primers (20 µM of each forward and reverse primer).

qPCR conditions. After priming of the Dynamic Array in the NanoFlex™ 4-Integrated Fluidic Circuits

(IFC) Controller (Fluidigm), 5 µl of each sample and 5 µl of each assay mix were added to dedicated wells. The Dynamic Array was then placed again in the IFC Controller for loading and mixing under the following conditions: 50 °C (2 min); 70 °C (30 min) and 25 °C (10 min). The loaded Dynamic Array was transferred to the BioMark™ real-time PCR instrument. After initial incubation at 50 °C (2 min) and activation of the Hotstart enzyme at 95 °C (10 min), cycling was performed using 95 °C (15 s) and 60 °C (1 min) for 35 cycles, followed by melting curve analysis (1 °C/3 s).

Data analysis. Initial data analysis was performed with the Fluidigm real-time PCR analysis software v with linear derivative baseline correction and a quality correction set to

Design of ValidPrime assays

Intergenic regions in the mouse genome with no known transcriptional activity were selected using the UCSC genome browser. In total, 30 assays targeting 10 different regions on 5 chromosomes were designed using Primer-BLAST (NCBI). Amplification efficiencies were determined with a dilution series of gDNA ( haploid genome copies). PCR products were analyzed for purity by recording melting curves and by capillary micro-electrophoresis (MultiNA, Shimadzu), leading to the selection of five assays for limit of detection (LOD) and limit of quantification (LOQ) determination.

LOD and LOQ determination of ValidPrime assays

Five assays were selected for determination of LOD and LOQ using eight concentrations (0, 1, 2, 4, 8, 16, 32, 64 copies) in the presence of 50 ng/ml carrier yeast tRNA (Roche Molecular Biochemicals). Sequence information for the two best candidates, in terms of sensitivity and specificity, is provided in Supplementary Table S1. Except when stated otherwise, mVPA1 was used as the VPA.
GOI assay design and validation

Non-commercial GOI assays were either taken from previously published studies (5,8,9) or designed with the Primer-BLAST utility at NCBI. Sequences are reported in Supplementary Table S1. Specificity was evaluated by BLAST (mouse RefSeq database) during design and by in silico PCR (UCSC Genome Browser). Amplification efficiencies were evaluated in the BioMark system on dilution series of both cDNA and gDNA.

Exogenous gDNA spiking experiments

Quantities ranging from 50 to 5000 haploid genome copies (corresponding to ng gDNA) or water were added to 20 ng (StepOnePlus) or 6.5 ng (BioMark) cDNA. Non-spiked samples had low, but detectable gDNA levels. For the BioMark runs, the gDNA was added prior to the pre-amplification step. Genome copy number calculations were based on the NCBI m37 assembly of the C57Bl/6 mouse genome ( bp) assuming an average molecular weight of 660 g/mol/bp. The mass of a haploid mouse genome was thus estimated to be 2.98 pg.

Data analysis and statistics

$Cq_{DNA}$, $Cq_{RNA}$ and %DNA were calculated using the gh-validprime software. The GenEx software (v.5.3) was used for one-way ANOVA analysis and to calculate LOD. Data are presented as mean ± SD.

RESULTS

The ValidPrime method

We developed ValidPrime to estimate and correct for the gDNA contribution in RT(+) qPCR measurements in a more reliable manner than that afforded by RT(−) controls. We refer to the signal measured in an RT(+) qPCR as $Cq_{NA}$ (NA: Nucleic Acids) [Equation (1)], indicating contributions from RNA ($Cq_{RNA}$) as well as gDNA ($Cq_{DNA}$) as shown in Equation (2), expressed in relative quantities.

\[ Cq_{RT+} = Cq_{NA} \tag{1} \]

\[ 2^{-Cq_{NA}} = 2^{-Cq_{RNA}} + 2^{-Cq_{DNA}} \tag{2} \]

Traditionally, determination of the RNA component using RT(−) controls would be achieved using Equation (3). However, as detailed in the introduction, low reproducibility and other factors detract from the accuracy of this approach.
We propose that Equation (4), derived in Supplementary Figure S1A, provides an accurate solution provided that $Cq_{DNA}$ is estimated using ValidPrime, Equation (5), in which GOI refers to any transcribed GOI, including reference genes, studied in an RT-qPCR experiment. $Cq_{RNA}$ and $Cq_{DNA}$ refer to the signal contribution derived from RNA (cDNA) and DNA (gDNA), respectively, in an RT(+) sample.

\[ Cq_{RNA} = -\log_2\left(2^{-Cq_{RT+}} - 2^{-Cq_{RT-}}\right) \tag{3} \]

\[ Cq_{RNA} = -\log_2\left(2^{-Cq_{NA}} - 2^{-Cq_{DNA}}\right) \tag{4} \]

\[ Cq_{DNA} = Cq^{VPA}_{Sample} + Cq^{GOI}_{gDNA} - Cq^{VPA}_{gDNA} \tag{5} \]

For the determination of $Cq_{DNA}$ [Equation (5)], the gDNA contamination level in an RT(+) sample (referred to as Sample) is measured with a gDNA-specific ValidPrime assay (VPA) ($Cq^{VPA}_{Sample}$). The VPA targets a non-transcribed locus present in one copy per normal haploid genome. However, since the gDNA sensitivity can be highly variable between GOI assays, the capacity of the GOI assay to amplify gDNA is compared with that of the VPA. In ValidPrime, this difference is tested on purified gDNA, yielding the ΔCq component in Equation (5) ($Cq^{GOI}_{gDNA} - Cq^{VPA}_{gDNA}$). Despite a formulaic resemblance to the ΔΔCt equation developed by Livak and Schmittgen (10), these calculations are distinct (Supplementary Figure S1B). Figure 1 depicts a typical grid of qPCR data including the required controls for ValidPrime estimation of $Cq_{DNA}$ and the subsequent correction of $Cq_{NA}$ into $Cq_{RNA}$. Apart

from the GOI assays, which are specific for each study, the VPA has been added among the assays. In addition to samples 1–3, which correspond to any RT(+) samples in a qPCR study, one or several gDNA samples are added to the experimental design. The equations under the grid exemplify the calculations for GOI 1 in Sample 1. The gDNA contribution can also be expressed as a percentage of relative quantities [Equation (6)].

\[ \%DNA = \frac{2^{-Cq_{DNA}}}{2^{-Cq_{NA}}} \times 100 \tag{6} \]

Figure 1. ValidPrime: principles and exemplifying equations. ValidPrime uses the annotation $Cq_{NA}$ for the signal measured in an RT(+) qPCR sample, to which both nucleic acids, RNA and DNA, contribute, corresponding to $Cq_{RNA}$ and $Cq_{DNA}$ [Equation (2)]. The grid shows an example of an experimental design with 3 RT(+) samples and 3 GOI assays, plus the controls required for the ValidPrime estimation of $Cq_{DNA}$ and the subsequent correction of $Cq_{NA}$ to obtain $Cq_{RNA}$. The term GOI is used in ValidPrime for both target transcripts and reference genes, since the calculations are independent of the gene type. The VPA column contains the data obtained with the VPA and the gDNA row contains measurements using purified gDNA as a sample. The equations under the grid illustrate the determination of $Cq_{DNA}$, $Cq_{RNA}$ and %DNA for GOI 1 in sample 1 according to the color code in the grid.

Assay validation

In order to determine the accuracy of the ValidPrime method, we first designed and characterized candidate VPAs. Among 30 candidates from 10 different regions on five chromosomes, 26 amplified gDNA with efficiencies between 90 and 110%. Among the tested assays, mVPA1 (amplifying an 87-bp sequence in the qB region of chromosome 1) and mVPA5 (amplifying an 87-bp sequence in the qF region of chromosome 5) had the best characteristics in terms of sensitivity and specificity. LOD was 3.2 copies for mVPA1 (GenEx; cut-off Cq 37; 95% CI; mean of two determinations) and 3.7 copies for mVPA5 (GenEx; cut-off Cq 37; 95% CI) and the LOQ (SD < 45%) was 4 copies for both assays (Supplementary Figure S2). In four out of eight NTC reactions, a signal (Cq 38.1 ± 0.9) was detected with the mVPA5 assay, indicating formation of primer dimers. However, the primer dimer product was never observed in samples containing gDNA, as evaluated by melting curve analyses and by capillary micro-electrophoresis (MultiNA, Shimadzu).

Efficiency analysis for GOI assays was performed in the BioMark system. No amplification was observed in the NTC controls, except for Sprr2f (Cq 28.6), which was 10 cycles above the Cq measured in the sample with the lowest Sprr2f expression (Cq 18.5) and thus far more than the proposed accepted minimal difference of five cycles between NTC and RT(+) sample (11,12). The generally low Cq values obtained with the BioMark system are explained by the 14-cycle pre-amplification step used in this protocol. The amplification efficiency was similar between assays as measured with a cDNA dilution series (95.5 ± 6.1%; mean R²: ) and a gDNA dilution series for gDNA-sensitive assays (100.4 ± 7.7%; mean R²: ) (data not shown). All RNA samples used in the study had A260/A280 ratios between 1.9 and 2.0 (mean: 1.97); A260/A230 between 1.5 and 2.5 (mean: 2.13) and A260/A270 above 1.17 (mean: 1.23), where the latter tests for phenol contamination.

Equivalence between $Cq_{DNA}$ estimated with ValidPrime and RT(−) controls

We next verified that the $Cq_{DNA}$ values calculated with ValidPrime agree with those measured directly in RT(−) qPCRs. Since a direct comparison is difficult, due to the poor reproducibility of RT(−) controls (see above), the following test was performed: RT(+) and RT(−) samples from two different tissues were spiked with 0.30 ng of gDNA (approximately 100 haploid genome copies) and measured using three gDNA-sensitive GOI assays.
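The conversion between gDNA mass and haploid genome copies used for the spiking experiments can be sketched as below. This is a minimal illustration, assuming only the 2.98 pg haploid mouse genome mass stated in Materials and Methods; the function names are hypothetical and are not taken from the gh-validprime software.

```python
# Mass of one haploid mouse genome in picograms, as estimated in the text.
HAPLOID_MOUSE_GENOME_PG = 2.98


def copies_from_mass(mass_ng: float, genome_mass_pg: float = HAPLOID_MOUSE_GENOME_PG) -> float:
    """Convert a gDNA mass in nanograms into haploid genome copies."""
    return mass_ng * 1000.0 / genome_mass_pg  # 1 ng = 1000 pg


def mass_from_copies(copies: float, genome_mass_pg: float = HAPLOID_MOUSE_GENOME_PG) -> float:
    """Convert a number of haploid genome copies into a gDNA mass in nanograms."""
    return copies * genome_mass_pg / 1000.0


# The 0.30 ng spike described above corresponds to roughly 100 copies.
print(f"{copies_from_mass(0.30):.1f} copies in 0.30 ng")
# Mass range spanned by the 50 to 5000 copy spikes.
print(f"{mass_from_copies(50):.3f} to {mass_from_copies(5000):.1f} ng")
```

With the stated genome mass, 0.30 ng works out to about 100.7 copies, which matches the "approximately 100 haploid genome copies" quoted for the spike.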
The data in Figure 2 are ratios of relative quantities (RQ) between either the total signal ($Cq_{NA}$) in RT(+) reactions or the corresponding $Cq_{DNA}$ calculated by ValidPrime over the RQ in RT(−) reactions. As shown, tissue-dependent differences in the expression levels of the three target genes were observed [from 1.8- to 27-fold compared with RT(−) samples]. Independent of the expression level, the estimation by ValidPrime of the gDNA-derived signal levels ($Cq_{DNA}$) in RT(+) samples was in excellent agreement with the data from RT(−) samples, with the ratio of the relative quantities (1.20 ± 0.29) close to the theoretically expected value of 1.

Calculation of $Cq_{RNA}$ in RT(+) samples through the correction of signals derived from exogenously added gDNA

Given the good correlation between ValidPrime estimation of $Cq_{DNA}$ and RT(−) measurements, we next tested the accuracy of the calculation of the RNA-derived component $Cq_{RNA}$ in RT(+) samples using Equation (4). In a first set of experiments, different amounts of gDNA were

added to cDNA test samples with low, but detectable, endogenous gDNA levels. All 32 GOI assays were gDNA-sensitive (Supplementary Table S1) and had gDNA amplification efficiencies similar to the VPA (i.e. passed the ValidPrime high confidence criteria detailed in Supplementary Figure S3). Both the traditional StepOnePlus microtiter plate-based qPCR (Figure 3A) and the microfluidic BioMark system (7) (Figure 3B) were used to collect raw data ($Cq_{NA}$) as input for ValidPrime estimations of the RNA-derived signal ($Cq_{RNA}$). Samples were grouped according to the level of DNA contribution. Using ValidPrime, we could accurately estimate the RNA-derived signal ($Cq_{RNA}$) even in samples with elevated gDNA-derived signals. However, the correction was less precise when the gDNA background exceeded 60% of the total signal.

[Figure 2 plot: fold over RT(−) for Serpine1, Tgfb3 and Ch25h; NA and DNA bars; adipose tissue and kidney]

Figure 2. Equivalence between $Cq_{DNA}$ calculated by ValidPrime and RT(−) measurements. Fold ratios in linear scale ($2^{-Cq(RT+)}/2^{-Cq(RT-)}$) between either the total signal (NA) measured in spiked RT(+) reactions (dark bars) or the gDNA signal (DNA) estimated by ValidPrime (VP) from RT(+) reactions (light bars) compared to the signal in RT(−) reactions. Samples of 20 ng cDNA from adipose tissue (hatched bars) or from kidney were spiked with 0.30 ng gDNA to decrease the variability due to stochastic amplification observed in RT(−) reactions. Independently of the expression level of the three genes studied in RT(+) samples, the estimations by ValidPrime of the gDNA-derived signals in RT(+) were very similar to the signals measured in RT(−) reactions, as the ratio was close to 1 (illustrated by the red dashed line; mean 1.20 ± 0.29). Data are mean ± SD from two experiments in duplicate on the StepOnePlus.
The demonstration that ValidPrime can identify and correct for signals derived from exogenous DNA in experimental RT-qPCR samples, using two different qPCR platforms, was a first step toward a proof-of-principle. The correction is virtually independent of gene copy number, since it works well both for GOI assays targeting a single locus and for genes with multiple pseudogenes (Supplementary Figure S4).

Correction of signals derived from endogenous gDNA

In order to evaluate the capacity of ValidPrime to correct for endogenous gDNA present in typical RNA preparations, a different strategy was applied. We used a gDNA-sensitive and a gDNA-insensitive assay for each GOI, with comparable amplification efficiencies. Three genes (Il1b, Serpine1 and Chi3l3) expressed in mouse macrophages were chosen as targets. Using the BioMark system, qPCR data were collected from 81 RNA preparations and the ValidPrime correction was applied. Despite identical overall gDNA content, the impact of the gDNA on the total signal obtained with the gDNA-sensitive assays differed considerably between the three genes. When the impact was limited (i.e. low %DNA), as in the case of Il1b, the effect of the ValidPrime correction was modest (Supplementary Figure S5). With increasing %DNA, as observed for Serpine1 and Chi3l3 (Figure 4A), the result of the correction becomes clearer, even in log2 scale (Figure 4B). Theoretically, given identical amplification efficiencies for the two assays and the absence of gDNA amplification, the $Cq_{NA}$ data in the scatter plots in Figure 4B should fall on a straight line with a slope of 1. The presence of gDNA will contribute to the signal measured with gDNA-sensitive assays (x-axis) and the uncorrected $Cq_{NA}$ data will therefore produce a slope >1. Even though the impact of the correction differs for the three genes, the $Cq_{RNA}$ values estimated using ValidPrime restore linearity, especially for samples with a DNA contribution <60% (summarized in Figure 4C).
These data demonstrate that using ValidPrime, efficient correction of RT-qPCR data for the presence of endogenous gDNA is possible, as long as the DNA contribution to the total signal is <60%.

DISCUSSION

Since its invention in the early/mid 1990s (13,14), qPCR has undergone considerable methodological and technological advances (15). However, despite its direct impact on qPCR results, no alternative to RT(−) controls has, to our knowledge, been proposed to assess gDNA-derived contributions to the signals in RT-qPCR.
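The correction evaluated in the Results, Equations (4) to (6), can be sketched in a few lines. This is an illustrative sketch only, not the gh-validprime implementation: the function names are hypothetical and the Cq values in the example are invented for demonstration.

```python
import math


def estimate_cq_dna(cq_vpa_sample: float, cq_goi_gdna: float, cq_vpa_gdna: float) -> float:
    """Equation (5): Cq expected from gDNA alone for one GOI assay in one RT(+) sample."""
    return cq_vpa_sample + cq_goi_gdna - cq_vpa_gdna


def percent_dna(cq_na: float, cq_dna: float) -> float:
    """Equation (6): gDNA share of the total signal, in percent of relative quantities."""
    return 100.0 * 2.0 ** (-cq_dna) / 2.0 ** (-cq_na)


def correct_cq(cq_na: float, cq_dna: float) -> float:
    """Equation (4): Cq of the RNA-derived signal after removing the gDNA part."""
    rna_quantity = 2.0 ** (-cq_na) - 2.0 ** (-cq_dna)
    if rna_quantity <= 0.0:
        raise ValueError("estimated gDNA signal accounts for the whole measured signal")
    return -math.log2(rna_quantity)


# Hypothetical measurements (not data from the study):
cq_na = 25.0           # GOI assay on the RT(+) sample
cq_vpa_sample = 28.0   # VPA on the same RT(+) sample
cq_goi_gdna = 24.5     # GOI assay on purified gDNA
cq_vpa_gdna = 24.5     # VPA on purified gDNA

cq_dna = estimate_cq_dna(cq_vpa_sample, cq_goi_gdna, cq_vpa_gdna)
print(f"Cq_DNA = {cq_dna:.2f}")
print(f"%DNA   = {percent_dna(cq_na, cq_dna):.1f}")
print(f"Cq_RNA = {correct_cq(cq_na, cq_dna):.2f}")
```

In this invented example the gDNA signal lies three cycles behind the total signal, so it accounts for 12.5% of the relative quantity and the corrected $Cq_{RNA}$ is about 0.19 cycles higher than $Cq_{NA}$.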

[Figure 3 plots: (A) StepOnePlus and (B) BioMark; y-axis: fold over control; bars: CqNA (VP−) and CqRNA (VP+); groups by %DNA: no spike, <3%, 3–25%, 25–60%, 60–90%, with n per group]

Figure 3. Correction of exogenous (spiked) gDNA with ValidPrime. The data are presented in linear scale as fold ratio ($2^{-Cq}/2^{-Cq_{ref}}$), where $Cq_{ref}$ is the $Cq_{NA}$ measured on non-spiked controls and Cq refers to $Cq_{RNA}$ (light bars) or $Cq_{NA}$ (dark bars) depending on whether or not ValidPrime correction was applied (VP−/VP+). The data are grouped based on the impact of exogenous DNA, expressed as percentage of the total signal (%DNA) in each sample. Data were collected with either 17 GOI assays on a StepOnePlus (Applied Biosystems) using mVPA1 and mVPA5 (A), or with 19 assays on a BioMark (Fluidigm) using mVPA1 (B). All assays passed the high confidence ValidPrime criteria (Supplementary Figure S3). Data are presented as the mean ± SD, with (n) designating the number of samples in each group. cDNAs were from mouse kidney or liver for the StepOnePlus studies and mouse uterus for the BioMark study.

ValidPrime is a cost-efficient alternative to RT(−) controls to test for the presence of gDNA in samples. It is superior to RT(−) controls not only because of a higher accuracy, but also because fewer control reactions are required, eliminating the need for additional test reactions in the RT step. While the traditional approach for a study based on m samples and n genes requires m reverse transcription control reactions (RT−) and m × n extra qPCRs, ValidPrime only requires m + n + 1 control qPCRs and no RT(−) reactions (Table 1). As an example, in a BioMark Dynamic Array experiment, ValidPrime reduces the number of controls by >95%. ValidPrime is also the first method that proposes to correct for qPCR signals originating from contaminating gDNA.
It is possible that the lack of accuracy and low reproducibility generally observed in RT(−) reactions has previously restrained the development of a correction-based model similar to that proposed in Equation (3). The present study includes data obtained with cDNA from five different mouse tissues analyzed with two qPCR instrument platforms, providing support for the general validity of ValidPrime.

It is important not to confuse gDNA contamination levels with the actual contribution of gDNA to the total signal, herein expressed as %DNA [Equation (6)]. Indeed, we did not observe any correlation between gDNA levels (as estimated by qPCR with the VPA) and the total signal ($Cq_{NA}$) measured in RT(+) qPCR reactions with GOI assays (Supplementary Figure S6). However, as evidenced from the data shown in Figure 4A and Supplementary Figure S5A, there is a clear positive correlation between %DNA and $Cq_{NA}$ with the gDNA-sensitive assay, which demonstrates the increased impact of contaminating gDNA in samples with low GOI expression levels.

The primer design strategy also strongly influences the impact of gDNA on the qPCR signal. Given the multi-exonic nature of most eukaryotic genes (16), it is conceivable that gDNA-insensitive assays can be designed for most targets in vertebrates. Regardless of the primer design strategy, the inability of a GOI assay to amplify gDNA needs to be validated experimentally. ValidPrime offers this possibility. However, for certain targets it is impossible to design transcript-specific assays. This can be due to either the presence of intronless pseudogenes or the absence of introns in single-exon genes. In order to assure a good accuracy for the ValidPrime correction, these gDNA-sensitive assays should behave similarly to the VPA against gDNA.
In analogy with the comparative Ct method (or ΔΔCt method) (10), in which similar amplification efficiencies for the GOI and reference gene assays are presumed, estimation of $Cq_{RNA}$ in ValidPrime assumes similar efficiencies for the GOI and gDNA assays. When validated according to the Minimal Information for publication of Quantitative real-time PCR Experiments (MIQE) guidelines (17), gDNA-sensitive assays are in general perfectly compatible with ValidPrime. Nevertheless, when using a GOI assay for the first time with ValidPrime, and especially when Cq adjustment is requested, we recommend the inclusion of a gDNA dilution series with concentrations covering at least three log10 (e.g. haploid genomic copies). Consistent relation to the VPA across the dilution series indicates similar amplification efficiencies of the two assays, which sanctions Cq correction with high confidence (Supplementary Figure S3). For VPAs, as well as for high confidence GOI assays, we generally observed perfectly linear amplifications from 5 to haploid genomic copies (corresponding to ng) (Supplementary Figures S2 and S3). Even though it is possible that higher gDNA concentrations (i.e. >30 ng per reaction) could influence qPCR amplification efficiencies (18), such gDNA contamination levels are rarely, if ever, encountered in RT-qPCR experiments. Furthermore, we did not observe any differences in the VPA amplification between samples with purified gDNA

and mixed samples, spiked with cDNA or RNA (Supplementary Table S2).

[Figure 4 plots: (A) %DNA versus CqNA (gDNA-sensitive assay) for Serpine1 (%DNA mean: 17.0%, median: 9.7%) and Chi3l3 (%DNA mean: 41.8%, median: 37.9%); (B) Cq (gDNA-insensitive assay) versus Cq (gDNA-sensitive assay), showing CqNA (VP−), CqRNA (VP+) (<60 %DNA) and CqRNA (VP+) (60–90 %DNA); (C) tables for Serpine1 and Chi3l3 with columns Data type, Correction, Filter (%DNA), Slope, R² and n]

Figure 4. Correction of endogenous gDNA with ValidPrime. Comparison of results obtained with two assays targeting Serpine1 (left) or Chi3l3 (right) in cDNA prepared from mouse peritoneal macrophages and measured in the BioMark qPCR system. The gDNA-sensitive assays amplify both gDNA and cDNA, while the gDNA-insensitive assays only recognize the transcript. (A) Scatter plots showing the correlation between the %DNA [as defined in Equation (6)] and $Cq_{NA}$ data obtained with the gDNA-sensitive assays in each of 81 independent RNA preparations (means of duplicates). The positive correlation between %DNA and Cq illustrates the increasing impact of the gDNA contamination with decreasing total signal. Mean and median values refer to %DNA levels. (B) $Cq_{NA}$ data measured with the gDNA-insensitive assays plotted against the corresponding $Cq_{NA}$ data (dark blue) or ValidPrime-estimated $Cq_{RNA}$ (light blue and orange), obtained with the gDNA-sensitive assays. Samples with a DNA contribution of 60–90% are shown in orange and those with <60% in light blue. (C) Tables summarizing the effect of ValidPrime correction and data filtering on the slope and the coefficient of determination (R²).
Even though we consistently observed very low variability between replicates in VPA-gDNA amplifications over a wide range of initial gDNA concentrations (Supplementary Figure S2 and Supplementary Table S2), it is advisable to use 1–10 ng gDNA (i.e. ≈ haploid genome copies) per qPCR when only one gDNA concentration is included in the design. This range favors reliable and distinct gDNA amplification with the VPA and the high confidence gDNA-sensitive GOI assays. It also increases the confidence when verifying the absence of gDNA amplification with GOI assays that are presumed to be gDNA-insensitive. In this study, we used a maximal 0.3 SD for the ΔCq between VPA and GOI gDNA amplifications as the criterion for high confidence gDNA-sensitive GOI assays. Alternatively, an efficiency (E) based criterion can be

used. Indeed, similar results to those shown in Figure 3 were obtained when a maximal difference of 0.15 in E (defined as $10^{-1/slope} - 1$) was used as inclusion criterion (data not shown). If a gDNA-sensitive GOI assay has a suboptimal, but confidently estimated E and cannot be replaced with a better assay, Equation (7) (19), or equivalent (20), can be used to correct the $Cq_{NA}$. Procedures for confident determination of amplification efficiencies are described elsewhere (21).

\[ Cq_{NA}^{new} = Cq_{NA}^{old} \times \frac{\log(1+E)}{\log(2)} \tag{7} \]

Coherency of PCR product melting curve profiles from cDNA and gDNA samples should also be considered prior to $Cq_{RNA}$ calculations. If a GOI assay generates gDNA-specific products that are not observed in cDNA samples, $Cq_{RNA}$ adjustment of $Cq_{NA}$ will not be reliable and is not recommended or even needed. Electrophoresis-based analysis of PCR products is an alternative informative tool to verify that the same products are formed from cDNA and gDNA templates.

Table 1. ValidPrime reduces the number of required control reactions in RT-qPCR

[Table 1 body: number of controls as a function of the number of assays (n) and samples (m)]

Roman values indicate the traditional RT(−) strategy, (m × n) + m, and bold values indicate ValidPrime, (m + n + 1). ValidPrime replaces the need to perform RT(−) controls for all RT(+) reactions and substantially reduces the number of controls compared to a conventional setup. In an expression profiling experiment based on m samples and n assays, the RT(−) approach requires m RT(−) reactions followed by m × n qPCR controls, whereas ValidPrime only requires m + n + 1 controls. The numbers in the table are based on single measurements for both approaches. Even when p gDNA samples/concentrations are included in the experimental setup using ValidPrime, the number of control reactions [m + (p × n) + p] is still far lower than with the RT(−) approach.
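The control counts given in the Table 1 legend can be checked with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the 96 × 96 layout in the example is an assumed array size chosen for illustration, since the text does not state the dimensions of the Dynamic Array used.

```python
def traditional_controls(m: int, n: int) -> int:
    """RT(-) strategy: m RT(-) reactions plus m * n control qPCRs."""
    return m + m * n


def validprime_controls(m: int, n: int, p: int = 1) -> int:
    """ValidPrime: VPA on m samples, n GOI assays on p gDNA samples, VPA on p gDNA samples."""
    return m + p * n + p


def reduction_percent(m: int, n: int, p: int = 1) -> float:
    """Percentage of control reactions saved by ValidPrime relative to the RT(-) strategy."""
    return 100.0 * (1.0 - validprime_controls(m, n, p) / traditional_controls(m, n))


# Assumed example layout: 96 samples x 96 assays, one gDNA sample.
m, n = 96, 96
print(traditional_controls(m, n))        # RT(-) strategy
print(validprime_controls(m, n))         # ValidPrime with p = 1
print(f"{reduction_percent(m, n):.1f}")  # percent reduction
```

For this assumed layout the counts are 9312 versus 193 controls, a reduction of about 97.9%, which is consistent with the ">95%" figure quoted in the Discussion.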
Caution should also be taken if differences in ploidy are expected, such as in cancer biopsies, since the number of VPA and GOI targets per cell could vary between samples. However, homogeneous populations of aneuploid samples, such as cancer cell lines, can be analyzed with ValidPrime, given that the VPA and GOI target loci are each present in at least one copy per cell.

To make ValidPrime readily available, we have developed a software application (gh-validprime) (H. L. and J. I., manuscript in preparation) that is free of charge for academic use. ValidPrime $Cq_{RNA}$ calculation is also available within the data pre-processing workflow of the GenEx software (version 5.3). The gh-validprime software assigns grades to assays/samples based on the impact of the genomic background (Supplementary Figure S7). The gDNA-insensitive assays are classified as A+. Other assays are attributed the grades A, B, C and F, where the assignment is sample-dependent. While A (<3 %DNA) does not require correction, B and C samples (3–25 and 25–60 %DNA, respectively) are corrected, provided the assays pass the high confidence criteria. If the gDNA contribution exceeds 60%, correction is not recommended. RT(+) samples with gDNA concentrations below the limit of detection, in which the VPA fails to generate a signal, are attributed the grade A*. The default output from the ValidPrime software is either $Cq_{NA}$ (for A+ assays, A* and A samples), $Cq_{RNA}$ (for B and C samples) or HIGHDNA for F samples. The output data are ready for further pre-processing, such as normalization against reference genes.

The general ValidPrime workflow is summarized in Figure 5. The gDNA sensitivity and confidence evaluation of GOI assays can be performed independently, as outlined in Figure 5A, or together with RT(+) samples, which facilitates the specificity assessment. Figure 5B illustrates the flowchart for previously validated GOI assays.
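The grading scheme described above can be sketched as a small decision function. This is a hypothetical reading of the text, not the gh-validprime code: the function names are invented, the handling of values exactly at 3%, 25% and 60% is an assumption (the text does not say which side of each boundary is inclusive), and the high confidence check on the assay is left out.

```python
from typing import Optional


def grade(percent_dna: Optional[float], assay_gdna_sensitive: bool = True) -> str:
    """Assign a ValidPrime-style grade to one assay/sample combination.

    percent_dna is None when the VPA gave no signal in the sample.
    """
    if not assay_gdna_sensitive:
        return "A+"   # assay does not amplify gDNA
    if percent_dna is None:
        return "A*"   # gDNA below the detection limit of the VPA
    if percent_dna < 3.0:
        return "A"    # no correction needed
    if percent_dna < 25.0:
        return "B"    # corrected
    if percent_dna <= 60.0:
        return "C"    # corrected
    return "F"        # correction not recommended


def default_output(assigned_grade: str) -> str:
    """Name the value reported by default for a given grade, as described in the text."""
    if assigned_grade in ("A+", "A*", "A"):
        return "Cq_NA"
    if assigned_grade in ("B", "C"):
        return "Cq_RNA"
    return "HIGHDNA"


for value in (None, 1.0, 12.5, 40.0, 75.0):
    g = grade(value)
    print(value, g, default_output(g))
```

Keeping the grade and the reported value in separate functions mirrors the text, where the grade depends on the sample while the output type follows from the grade.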
The ValidPrime source code is available through the gh-validprime project. This software depends on the Qt framework and the GeneHuggers library. A Windows installer and test files are also available. ValidPrime assays targeting different species (including human, mouse and a general vertebrate) have been developed by the TATAA Biocenter.

CONCLUSION

ValidPrime provides, for the first time, the opportunity to correct reliably for gDNA background in qPCR. Correction is possible for any GOI assay that consistently amplifies gDNA, given that the DNA contribution does not exceed 60% of the signal. ValidPrime is superior to traditional RT(−) controls because of its higher accuracy and the lower number of controls required, which leads to substantial cost savings.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2, Supplementary Figures 1–7 and Supplementary References [1,10,17,22–25].

ACKNOWLEDGEMENTS

The authors are grateful for critical comments from Drs Pascal Martin, Sébastien Déjean, Coralie Fontaine and Anders Ståhlberg. Drs Geneviève Tavernier and Elodie Riant provided mouse blood and tissue samples and Hortense Bergès provided technical assistance during the isolation of mouse macrophages.

Figure 5. ValidPrime flowchart. (A) ValidPrime GOI assay validation. ValidPrime can be used as a reliable, cost-efficient alternative to RT(−) controls to survey gDNA background in RT-qPCR, and as a tool to determine the RNA-derived signal ($Cq_{RNA}$) in RT(+) qPCR reactions. To optimize its accuracy when $Cq_{RNA}$ calculation is desired, validation of GOI assays in gDNA samples is recommended, as outlined in (A). Asterisks indicate the efficiency evaluation and melting curve/electrophoresis-based analysis. This includes an evaluation of the gDNA sensitivity of GOI assays using dilution series with gDNA samples spanning at least three log10 in copy number. GOI assays that do not amplify gDNA are attributed the grade A+. The amplification of gDNA by high-confidence assays should be specific and with an efficiency similar to that of the VPA (see Discussion section and Supplementary Figure S3). For GOI assays with suboptimal, but confidently determined (17,21) efficiency, Equation (7) could be applied to adjust $Cq_{NA}$ data. To optimize specificity, there should also be consistency between the melting curves of PCR products in gDNA and cDNA samples. (B) $Cq_{RNA}$ calculation with ValidPrime-validated GOI assays. High confidence and A+ assays can be used with fewer gDNA samples for $Cq_{RNA}$ determination. It is recommended to confirm the absence of gDNA amplification at least once for A+ assays. Samples that do not contain sufficient gDNA to generate a signal with the VPA are attributed A*. As for gDNA-insensitive A+ assays, $Cq_{RNA}$ equals $Cq_{NA}$ (i.e. output = input) in A* samples, since the DNA-derived signal is negligible [see Equations (2 and 4)]. For gDNA-sensitive GOI assays, $Cq_{RNA}$ is calculated by a $Cq_{DNA}$-based correction of $Cq_{NA}$ using Equations (4 and 5). To minimize the risk of jeopardizing the accuracy of the $Cq_{RNA}$ estimation, it is not advisable to perform correction on samples where the DNA-derived signal exceeds 60%. The calculations are facilitated using the ValidPrime software. Details on additional assay/sample grading and data output formats employed by the software are provided in Supplementary Figure S7. The $Cq_{RNA}$ output data can be used for downstream data processing, such as normalization against reference genes.
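The efficiency-based adjustment mentioned in the caption, Equation (7), together with the definition of E from a standard-curve slope, can be sketched as follows. This is a minimal illustration with hypothetical function names and invented input values; it is not taken from the ValidPrime or GenEx software.

```python
import math


def efficiency_from_slope(slope: float) -> float:
    """E = 10**(-1/slope) - 1, where slope is Cq versus log10(template amount)."""
    return 10.0 ** (-1.0 / slope) - 1.0


def adjust_cq(cq_na: float, efficiency: float) -> float:
    """Equation (7): rescale a measured Cq to the value expected at E = 1 (perfect doubling)."""
    return cq_na * math.log(1.0 + efficiency) / math.log(2.0)


# A slope of -1/log10(2), about -3.32, corresponds to perfect doubling (E = 1).
ideal_slope = -1.0 / math.log10(2.0)
print(f"E at ideal slope: {efficiency_from_slope(ideal_slope):.3f}")

# Invented example: an assay with E = 0.9 and a measured Cq of 30.
print(f"adjusted Cq: {adjust_cq(30.0, 0.9):.2f}")
```

With E = 1 the adjustment leaves the Cq unchanged; with E = 0.9 a measured Cq of 30 is rescaled to about 27.78, which shows how strongly an efficiency mismatch can shift the value entering Equations (4) and (5).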

Nucleic Acids Research, 2012, Vol. 40, No. 7, e51

FUNDING

Institut National de la Santé et de la Recherche Médicale. Funding for the open access charge: Institut des Maladies Métaboliques et Cardiovasculaires, Toulouse, France.

Conflict of interest statement. M.K. and D.S. are employees of TATAA Biocenter AB, a commercial source for the ValidPrime assays.

4.4.4 Direct cell lysis

Accurate analysis of large numbers of samples that contain very small amounts of nucleic acids, as in single-cell profiling or molecular tomography, requires an isolation technique that makes the target molecules accessible without losses or introduced error. The technique should also be simple, suitable for lysing many cell types and, given the large number of samples analyzed, preferably amenable to automation. It is also important to use the smallest possible volume of solution, both to keep the template concentration as high as possible throughout the analysis and to prevent random distribution of the molecules (Figure 4). Direct lysis is such a step in the processing of single-cell samples: it allows efficient release of RNA and its transcription into cDNA without washing steps, simply by adding the required reagents to the sample tube. Efficient lysis can be achieved with chemical, mechanical or physical methods, e.g. sonication, freezing or heating. Even pure water often suffices for fairly efficient lysis, because its hypotonic effect bursts the cell membrane through osmotic pressure. More advanced lysis solutions usually contain a molecular carrier that binds to the tube surface and minimizes losses of rare mRNAs. They often also contain chemicals that disrupt the cell membrane [21] and a buffer that stabilizes the nucleic acids. Because there is no purification step, any chemical agents used must be compatible with the enzymatic reactions that follow.

In our study we tested 17 different agents to determine which of them lyse cells most efficiently without affecting RT-qPCR, and how they affect RNA stability. Using a dilution series of cells, we also compared the yield with that of a commonly available kit based on column washing. We used the following quality control tools: exogenous mRNA and DNA spikes, and ValidPrime. The results are described in Publication V, Direct cell lysis for single-cell gene expression profiling.

We found that the highest yield from the lysis of 32 cells was obtained by adding 1 mg/ml BSA to pure water. The analysis was not performed on individual cells, in order to avoid stochastic effects and to make the samples easier to compare through the population average. According to some literature sources, binding of RNA to BSA depends strongly on the pH and ionic composition of the solution [108]. Because BSA is known to support enzymatic reactions, PCR in particular, and to counteract their inhibition [ ], we compared the effect of the lysis chemicals using purified RNA. We verified that most of the yield-enhancing effect originates in the steps before reverse transcription, that is, in cell lysis. We also observed increased mRNA stability in the presence of BSA over time at room temperature and across several freeze-thaw cycles. A BSA-based lysis solution has already been used in several studies to increase the yield needed for analysis [7, 8].

Unpublished results for ValidPrime, an assay specific for genomic DNA, showed that although direct lysis is sufficient to obtain mRNA, it may not always be sufficient to make genomic DNA accessible, because genomic DNA is stored in the nucleus within the complex structure of chromatin (Figure 11), whereas the more aggressive agents of the column kit are more likely to disrupt the chromatin structure. This may play an important role in future epigenetic studies at the single-cell level, where only open regions of DNA can be expected to be detected.

Figure 11: Efficiency of genomic DNA isolation determined for direct lysis (BSA) and a column purification kit (Qiagen) using the ValidPrime qPCR assay. To construct the standard curve, we used astrocytes sorted by FACS into the respective cell numbers (n = 4) shown on the x-axis. The left y-axis represents the measured Cq values, while the right y-axis shows the distribution of positive technical qPCR replicates (unpublished data).
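
The share of positive technical replicates at low cell numbers can be read against a simple sampling model. The sketch below is an illustration only: it assumes that accessible template molecules are distributed among reactions according to Poisson statistics, which the text does not state, and the function names are hypothetical.

```python
import math


def fraction_positive(mean_copies_per_reaction):
    """Expected fraction of reactions holding at least one template molecule."""
    return 1.0 - math.exp(-mean_copies_per_reaction)


def mean_copies_from_fraction(positive_fraction):
    """Invert the Poisson relation; undefined when every replicate is positive."""
    if not 0.0 <= positive_fraction < 1.0:
        raise ValueError("positive_fraction must be in [0, 1)")
    return -math.log(1.0 - positive_fraction)
```

Under this assumption, an average of one accessible template molecule per reaction would leave roughly 37% of the replicates negative, which is why the fraction of positive replicates is informative at the lowest cell numbers.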

METHODS ARTICLE
Frontiers in Oncology, Molecular and Cellular Oncology, October 2013, Volume 3
published: xx October 2013
doi: /fonc

Direct cell lysis for single-cell gene expression profiling

David Svec 1,2 *, Daniel Andersson 3, Milos Pekny 3, Robert Sjöback 2, Mikael Kubista 1,2 and Anders Ståhlberg 2,3,4 *

1 Institute of Biotechnology AS CR, Prague, Czech Republic
2 TATAA Biocenter, Gothenburg, Sweden
3 Center for Brain Repair and Rehabilitation, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
4 Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden

Edited by: Tomer Kalisky, Bar-Ilan University, Israel
Reviewed by: Alexandre Arcaro, University of Bern, Switzerland; Stephan Von Gunten, University of Bern, Switzerland
*Correspondence: David Svec, TATAA Biocenter, Odinsgatan 28, Gothenburg, Sweden, david.svec@tataa.com; Anders Ståhlberg, Sahlgrenska Cancer Center, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden, anders.stahlberg@gu.se

Interest in analyzing single-cell and few-cell samples is rapidly increasing. Numerous extraction protocols to purify nucleic acids are available, but most of them compromise severely on yield to remove contaminants and are therefore not suitable for the analysis of samples containing only small numbers of transcripts. Here, we evaluate 17 direct cell lysis protocols for transcript yield and compatibility with downstream reverse transcription quantitative real-time PCR. Four endogenously expressed genes are assayed together with RNA and DNA spikes in the samples. We found bovine serum albumin (BSA) to be the best lysis agent, resulting in efficient cell lysis, high RNA stability, and enhanced reverse transcription efficiency. Furthermore, we found direct cell lysis with BSA superior to standard column based extraction methods when analyzing from 1 up to 512 mammalian cells. In conclusion, direct cell lysis protocols based on BSA can be applied with most cell collection methods and are compatible with most analytical workflows to analyze single-cells as well as samples composed of small numbers of cells.

Keywords: real-time PCR, single-cell biology, single-cell gene expression, RNA spike, DNA spike, cell lysis, direct lysis, RNA purification

INTRODUCTION

Gene expression profiling has traditionally been performed on rather large samples with plenty of material. However, tissues contain many cell types that respond differently to stimuli and environmental changes, which complicates interpretation. Many studies are confounded by the intrinsic heterogeneity of biological samples. With single-cell analysis this complexity is eliminated and the true response of each cell type can be studied (1, 2). Recent single-cell profiling studies have shown large variability in transcript levels among individual cells in seemingly homogeneous cell populations and have revealed previously unknown subpopulations (3, 4). Analysis of individual cells clearly opens up new possibilities to study biological processes such as cell transitions, signaling, differentiation, and proliferation (5, 6).

Reverse transcription quantitative real-time PCR (RT-qPCR) is the gold standard for gene expression profiling (7, 8). Through the implementation of the minimum information for publication of RT-qPCR experiments (MIQE) guidelines the technique has become robust and reliable (9). Usually samples composed of hundreds of thousands of cells are analyzed. These samples are lysed with strong chaotropic agents that release and protect nucleic acids, which are then purified using protocols that remove contaminants and substances that might interfere with downstream RT-qPCR (10, 11).
Common to these methods is that they include one or more washing steps that lead to losses. As we write, the catalog of known RNA types is growing, resulting in increased appreciation for the numerous biological functions carried out by RNA (12).

A typical single-cell contains rather few transcripts of most genes. Recent RNA sequencing data suggest there are some 22,000 mRNAs in a mouse embryonic stem cell and some 505,000 mRNAs in a mouse embryonic fibroblast. The top thousand transcripts are present in molecules per stem cell and molecules per fibroblast (13, 14). Clearly, when analyzing single-cells any loss during extraction caused by washing can introduce serious uncertainty and even total loss of some transcripts. Hence, classical purification protocols based on washing are not suitable for single-cell analysis (15, 16). A protocol based on a lysis medium that disrupts the cell membrane, makes RNA accessible for RT and maintains RNA integrity without inhibiting the downstream enzymatic reactions offers great advantages in quantitative single-cell gene expression profiling.

In this work we study lysis buffers that are suitable for small samples ( cells) and do not require washing. We test several lysis agents in use today, comparing lysis yield, reproducibility, and RNA stability. The effect on sample handling after cell lysis is another important parameter to consider, since the time from cell collection to storage can vary from minutes to hours. We also compare the sensitivity of direct cell lysis to traditional column based RNA extraction protocols, and we test how many cells can be analyzed without downstream inhibition. To assess yields and validate reproducibility we use RNA and DNA spikes (17).

MATERIALS AND METHODS

CELL CULTURES

Primary astrocytes were generated from postnatal day 0-1 mouse brains and prepared as described (4). The astrocytes were washed twice in PBS and treated with 0.25% Trypsin/EDTA (Invitrogen) for 2 min to dissociate cells.
The dissociated cells were kept on ice in either PBS supplemented with 2% BSA or in astrocyte culture medium until subsequent analysis. All experiments involving mice were conducted according to protocols approved by the Ethics Committee of the University of Gothenburg, Gothenburg, Sweden.

RNA AND DNA SPIKES

A Universal RNA Spike (TATAA Biocenter) was used to evaluate the performance of the lysis buffers and RT efficiencies. The RNA spike is about 1000 bases long and has a 5′ cap and a poly-A tail of approximately 200 bases to mimic eukaryotic mRNA. The spike sequence is not present in any known genome. A DNA spike (TATAA Biocenter) was used to determine the specific effect of the lysis protocols on DNA.

CELL LYSIS

Cells were sorted with a BD FACSAria (Becton Dickinson) into 96-well plates (Applied Biosystems) with 5 µl lysis buffer per well as described (2, 18). The following chemicals were evaluated (final lysis concentrations are shown): 7-deaza-2′-deoxyguanosine-5′-triphosphate lithium salt (100 µM, Sigma-Aldrich); betaine solution (4 M, Sigma-Aldrich); BSA (1-4 mg/ml, Fermentas); guanidine thiocyanate solution (GTC) (40-80 mM, Sigma-Aldrich); GenElute linear polyacrylamide (LPA) (50 ng/µl, Sigma-Aldrich); Igepal CA-630 (also known as Nonidet P-40, 0.5-4%, Sigma-Aldrich); polyinosinic acid potassium salt (50 ng/µl, Sigma-Aldrich); RNaseOUT (10 U/µl, Invitrogen); 2× reverse transcription buffer: 100 mM Tris-HCl (pH 8.3), 150 mM KCl, and 6 mM MgCl2 (Invitrogen); D-(+)-trehalose dihydrate (1 M, Sigma-Aldrich); yeast tRNA (50 ng/µl, Ambion); RT mix [2× RT buffer (Invitrogen), 5 µM random hexamers (Metabion), 5 µM oligo-dT (Metabion), 1 mM dNTP]; RT mix with BSA (2× RT buffer, 5 µM random hexamers, 5 µM oligo-dT, 1 mM dNTP, 1 mg/ml BSA); and RNase-free water (Gibco). The BSA (20 mg/ml) was supplied in 10 mM Tris (pH 7.4 at 25 °C), 100 mM KCl, 1 mM EDTA, and 50% v/v glycerol. For a detailed list of the chemicals used in the lysis buffers, see Table S1 in Supplementary Material. The DNA ( molecules) and RNA ( molecules) spikes were added to the lysis buffers. Lysed samples were frozen at -80 °C until RT. Each lysis was tested in four replicates.

To evaluate RNA stability in different lysis buffers we performed time course studies as well as freeze-thaw cycling experiments. Briefly, after cell dissociation the concentration was adjusted to 200 cells/µl. For each test 2.5 µl cell suspension was added to 47.5 µl lysis buffer and vortexed. For the time course study, samples were kept at room temperature for 0, 1, 2, and 6 h (n = 4). For the freeze-thaw cycling test, lysates were frozen at -80 °C for 20 min and then thawed at room temperature for 20 min. The freezing and thawing was repeated 1, 2, 3, or 6 times (n = 4). Two microliters of cell lysate was used for RT.

DNA AND RNA PURIFICATION

Total RNA was extracted using the RNeasy Micro kit without DNase treatment (Qiagen). Cells were sorted into 75 µl RLT buffer supplied with 2-mercaptoethanol (Sigma-Aldrich), RNA and DNA spikes, and 20 ng of poly-A carrier (Qiagen) per sample. Lysed cells were frozen at -80 °C until extraction, which was performed according to the manufacturer's instructions. All purified RNA (10 µl) was used in reverse transcription. Purification of PCR products for the DNA spike was performed with the Qiaquick PCR purification kit (Qiagen).

REVERSE TRANSCRIPTION

SuperScript III Reverse Transcriptase (Invitrogen) was used for reverse transcription. Directly lysed cells were incubated in 0.5 mM dNTP (Sigma-Aldrich), 2.5 µM oligo-dT (Metabion), and 2.5 µM random hexamers (Metabion) at 65 °C for 5 min and then chilled on ice. 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 5 mM dithiothreitol, 10 U RNaseOUT, and 50 U SuperScript III were added to a final volume of 10 µl (all Invitrogen). Final RT concentrations or amounts of the agents are shown. To be able to use all RNA from the column based extraction experiments, 20 µl RT reaction volumes were used in the comparison between direct lysis with 1 mg/ml BSA and column based extraction. The temperature profile was: 25 °C for 5 min, 50 °C for 60 min, and 55 °C for 15 min, followed by enzymatic inactivation by heating to 70 °C for 15 min. All cDNA samples were diluted with water to 60 µl prior to qPCR. The inhibitory effect of guanidine thiocyanate in RT was tested using 32 cells, comparing 10 µl RT reactions with 50 U SuperScript III to 20 µl RT reactions with 200 U SuperScript III (Figure S1 in Supplementary Material).

QUANTITATIVE REAL-TIME PCR

Quantitative real-time PCR (qPCR) was performed on the LightCycler480 (Roche Diagnostics) using SYBR Green I detection chemistry. To each reaction (10 µl) containing iQ SYBR Green Supermix (Bio-Rad) and 400 nM of each primer (Eurofins MWG Operon), 3 µl of diluted cDNA was added. Primer sequences are shown in Table S2 in Supplementary Material. The temperature protocol was 3 min at 95 °C followed by 45 cycles of amplification (95 °C for 20 s, 58 °C for 20 s, and 72 °C for 20 s). All samples were analyzed by melting curve analysis (60-95 °C at 0.1 °C continuous increments). Formation of PCR products of expected length was confirmed by agarose gel electrophoresis. Quantification cycle (Cq) values were obtained by the maximum second derivative method. All biological assays were designed to span introns and were checked by the BLAST algorithm for potential pseudogenes. For Gapdh, 308 potential pseudogenes were found. During assay validation, all primer pairs resulted in more than five cycles difference between the normal cDNA sample and the RT negative control that only contained genomic DNA. All qPCR assays were optimized to such an extent that primer-dimer signals never appeared within 45 cycles of amplification, and PCR efficiencies were %. Standard curves were analyzed with GenEx (MultiD Analyses). An interplate calibrator (TATAA Biocenter) was used to compensate for instrument variation between qPCR runs. All experiments were performed according to the Minimum Information for Publication of Quantitative Real-Time PCR Experiments guidelines (9).

RESULTS

OPTIMIZATION OF PURIFICATION-FREE LYSIS

We tested the following 17 conditions for direct cell lysis and RNA analysis by RT-qPCR in mammalian cells: water, water with RNA and DNA spikes, 100 µM 7-deaza-2′-deoxyguanosine-5′-triphosphate lithium salt (7-deaz GTP), 4 M betaine, 1 and 2 mg/ml bovine serum albumin (BSA), 40 and 80 mM GTC, 50 ng/µl GenElute LPA, 0.5 and 4% Igepal CA-630 (also known as Nonidet P-40), 50 ng/µl polyinosinic acid potassium salt (polyI), 10 U/µl RNaseOUT, 2× RT buffer, 1 M trehalose, 50 ng/µl yeast tRNA, and combinations of compounds: RT mix (2× RT buffer, 5 µM random hexamers, 5 µM oligo-dT, and 1 mM dNTP) and RT mix + BSA (2× RT buffer, 5 µM random hexamers, 5 µM oligo-dT, 1 mM dNTP, and 1 mg/ml BSA). For details, see Table S1 in Supplementary Material. The lysis agents can be divided into groups based on function: carriers [BSA (19-21), yeast tRNA (22), LPA (23), polyI (24), and 7-deaz GTP (25)], enzymatic enhancers [BSA, betaine (25-27), trehalose (28-30)], detergent [Igepal CA-630 (1)], and chaotropic agent [GTC (1, 31)]. Most lysis conditions act through osmosis (4, 8).

Each lysis protocol was evaluated on 32 primary astrocytes collected in 96-well plates using FACS (n = 4 for each condition). The rationale for analyzing 32 cells instead of single-cells in the comparison of conditions is to eliminate the effect of stochastic gene expression observed in single-cells, while keeping the number of cells sufficiently low to reflect the lysis performance of few cells (3, 32, 33). Two highly expressed genes (Gapdh and Vim) and two intermediately expressed genes (Dll1 and Jag1) were analyzed. It is common procedure to add spikes to biological samples, particularly when complex matrices are analyzed, to detect inhibition, which can strongly bias data (17, 34). To test for degradation, inhibition, and losses due to adsorption in the lysis step and downstream RT-qPCR, RNA and DNA spikes were added to all lysis media before the cell sorting. The idea of using an RNA as well as a DNA spike is to separate the interference in RT and qPCR.
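
The reasoning behind pairing the two spikes can be sketched numerically. The example below is an illustration only, not the authors' analysis: it assumes 100% PCR efficiency and that Cq shifts add up across steps, so that the shift of the DNA spike captures the qPCR-side effect and the remainder of the RNA spike shift is attributed to the steps before qPCR. The function name and the Cq shifts are hypothetical.

```python
def split_spike_effects(dcq_rna_spike, dcq_dna_spike):
    """Split a yield loss into qPCR-side and pre-qPCR (RT-side) fold effects.

    dcq_rna_spike : Cq shift of the RNA spike versus a reference condition
    dcq_dna_spike : Cq shift of the DNA spike versus the same reference
    Returns (qpcr_fold, pre_qpcr_fold, total_fold); values above 1 are losses.
    """
    dcq_pre_qpcr = dcq_rna_spike - dcq_dna_spike
    return 2.0 ** dcq_dna_spike, 2.0 ** dcq_pre_qpcr, 2.0 ** dcq_rna_spike


# Hypothetical condition: RNA spike delayed by 3 cycles, DNA spike by 1 cycle.
qpcr_fold, pre_qpcr_fold, total_fold = split_spike_effects(3.0, 1.0)
```

In this hypothetical case the 8-fold total loss seen with the RNA spike splits into a 2-fold qPCR-side effect and a 4-fold effect arising before qPCR.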
The RNA spike has a 3′ poly-A tail and a 5′ cap to mimic endogenous mRNA. Figure 1A shows Cq-values measured by RT-qPCR representing relative cDNA yields of Gapdh, Vim, Dll1, Jag1, and of the DNA and RNA spikes at the various tested conditions. Figure 1B shows that the use of 1 mg/ml BSA results in the highest average cDNA yields for Gapdh, Vim, Dll1, and Jag1. BSA at 2 mg/ml and 1 M trehalose perform almost as well. The effect of the lysis agent is substantial, as reflected by the difference of 5.9 cycles for Jag1 between 1 mg/ml BSA and 80 mM GTC. At 100% PCR efficiency this would correspond to a 58-fold difference in the measured Jag1 level. There is some variation in lysis yield with condition and also with transcript, but generally lysis was efficient with BSA. Another way to compare lysis is by the rate of positive qPCR reads for the target molecules. For the highly abundant Gapdh and Vim transcripts as well as for the two spikes all samples were positive, while for the low abundant transcripts Dll1 and Jag1 the rate of positive reads ranged from 25 to 100% (Table S3 in Supplementary Material).

The Cq-values measured for the DNA spike reflect the qPCR performance, including inhibition and any losses due to surface adsorption in the particular matrix. There is modest variation in yields (Figure 1). Notably, RNaseOUT is the agent inducing the lowest yield. For the RNA spike, which reflects the combined effect of the lysis matrix, RT, and qPCR, differences are larger. While most additives show modest variation from the RT mix, the yield dropped 7.3-fold (assuming 100% PCR efficiency) when using 80 mM GTC.

To separate the effect of the added agent on cell lysis from that on the RT-qPCR we also analyzed 5 ng of purified total RNA from the same cells with each lysis condition. Figure 2 shows the effect of the lysis agents on the RT-qPCR only for Gapdh, Vim, Dll1, Jag1, and the RNA and DNA spikes. Most lysis agents enhance the RT yield compared to the water control (Figure 2B); the exception was GTC (80 mM), which severely inhibited RT. The stimulatory effect of the lysis agents was to some degree gene (or rather assay) dependent, as can be seen by comparing the best and worst condition for Vim, which differed 2.2-fold (RT mix versus 10 U/µl RNaseOUT), and for Jag1, which differed 10.2-fold (4% Igepal CA-630 versus 80 mM GTC) (Table S4 in Supplementary Material). The cDNA yield depends on lysis efficiency, RNA integrity, RT primer access to target RNA, and the RT yield (35). The effect of lysis agents on cell lysis together with RT-qPCR (Figure 1B) does not correlate with the effect on RT-qPCR only (Figure 2B), indicating that cell lysis is the limiting step to obtain high cDNA yield. We have previously shown that lysis of cell aggregates may be improved by addition of GTC (1). That protocol, however, requires using more reverse transcriptase. In this study we used about half the amount, which leads to the severe inhibition by GTC we observe (Figure S1 in Supplementary Material).

COMPARISON OF RNA STABILITY IN DIFFERENT LYSIS BUFFERS

RNA stability in terms of decay and accessibility after collection and cell lysis is an important but rarely tested property of lysis buffers, as the time from cell collection and lysis to analysis may vary from minutes to hours. The sample handling may also require freezing/thawing steps. We tested the stability of RNA by keeping 500 lysed astrocytes in six different lysis conditions at room temperature (n = 4) for 1, 2, and 6 h. Astrocytes that were lysed and immediately reverse transcribed were used as control. Figure 3 shows the relative stabilities of Gapdh, Vim, Dll1, and Jag1 transcripts in water, 50 ng/µl yeast tRNA, 1-4 mg/ml BSA, and 1× RT buffer. The storage in BSA was superior. As expected, the amount of accessible transcripts decreased with time of storage at room temperature (Tables S5 and S6 in Supplementary Material). Notably, accessible Gapdh and Vim transcript levels decreased rapidly when using lysis conditions other than BSA, while Dll1 and Jag1 showed a more moderate decrease at all conditions. Consequently, RNA loss is gene dependent, which is in agreement with previous reports (36, 37).

Maintaining RNA stability throughout freeze/thaw cycles is most important when handling and storing nucleic acids. Figure 4 shows that 1-4 mg/ml BSA is superior to the other tested agents in maintaining RNA stability after 1, 2, 3, and 6 cycles of freezing/thawing. With BSA in the storage medium almost all mRNA remains available for analysis even after six freeze/thaw cycles, while with the other agents the mRNA is gradually lost.

SENSITIVITY, YIELD AND DYNAMIC RANGE OF RNA ANALYSIS WITH DIRECT CELL LYSIS COMPARED TO COLUMN BASED RNA PURIFICATION PROTOCOLS

To assess the sensitivity, yield and dynamic range of the optimal direct cell lysis protocol identified here (1 mg/ml BSA) we compared it to a standard protocol based on traditional spin-columns (RNeasy Micro kit, Qiagen). Expression of Gapdh, Vim, Dll1, Jag1, and the RNA

FIGURE 1 Evaluation of direct cell lysis protocols. (A) The lysis yields of Gapdh, Vim, Dll1, Jag1, DNA, and RNA spike compared at 17 lysis conditions. Thirty-two astrocytes were sorted for each condition. Relative cDNA yields are presented as Cq-values on the left y-axis and relative transcript numbers on the right y-axis. The relative transcript number is expressed in percentage compared to the optimal lysis condition for each gene, assuming 100% RT efficiency and 100% PCR efficiency. Data are shown as mean ± SD (n = 4). Missing data were excluded and are listed in Table S3 in Supplementary Material. (B) Mean cDNA yield of the transcripts. Expressions of Gapdh, Vim, Dll1, and Jag1 were averaged and are compared to the overall optimal lysis condition (1 mg/ml BSA). Data are shown as mean ± SD (n = 4). 7-deaz GTP, 7-deaza-2′-deoxyguanosine-5′-triphosphate lithium salt; GTC, guanidine thiocyanate; LPA, linear polyacrylamide; polyI, polyinosinic acid potassium salt; 2× RT buffer, 2× reverse transcription buffer; RT mix, 2× RT buffer, 5 µM random hexamers, 5 µM oligo-dT, and 1 mM dNTP.
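
The conversion from Cq-values to the relative transcript numbers plotted on the right y-axis can be written out as follows. This is a minimal sketch under the same assumption as the caption (100% efficiency, i.e., a doubling per cycle, unless another efficiency is passed in); the function names are hypothetical.

```python
import math


def relative_transcript_percent(cq, cq_optimal, efficiency=1.0):
    """Transcript number as a percentage of the optimal (lowest-Cq) condition."""
    return 100.0 * (1.0 + efficiency) ** (cq_optimal - cq)


def fold_difference(delta_cq, efficiency=1.0):
    """Fold difference in template corresponding to a Cq difference."""
    return (1.0 + efficiency) ** delta_cq


def cycles_for_fold(fold, efficiency=1.0):
    """Cq difference corresponding to a given fold difference."""
    return math.log(fold) / math.log(1.0 + efficiency)
```

For example, a condition that lags the optimal one by three cycles corresponds to 12.5% of its transcript number, and a 7.3-fold drop corresponds to a shift of roughly 2.9 cycles.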

FIGURE 2 Evaluation of direct cell lysis protocols on RT-qPCR. (A) The RT-qPCR yields of Gapdh, Vim, Dll1, Jag1, DNA, and RNA spike using 17 lysis conditions. Five nanograms of purified RNA was used in all RT reactions. Relative RT yields are presented as Cq-values on the left y-axis and relative transcript numbers on the right y-axis. The relative transcript number is expressed in percentage relative to the water control for each gene, assuming 100% RT efficiency and 100% PCR efficiency. Lysis conditions with Cq-values below that of the water control are RT enhancing agents, while conditions with higher Cq-values are inhibitory. Data are shown as mean ± SD (n = 4). Missing data were excluded and are shown in Table S4 in Supplementary Material. (B) Mean RT yield for Gapdh, Vim, Dll1, and Jag1. The relative transcript yield of each transcript was averaged and compared to the optimal RT-qPCR condition (RT mix). Data are shown as mean ± SD (n = 4). 7-deaz GTP, 7-deaza-2′-deoxyguanosine-5′-triphosphate lithium salt; GTC, guanidine thiocyanate; LPA, linear polyacrylamide; polyI, polyinosinic acid potassium salt; 2× RT buffer, 2× reverse transcription buffer; RT mix, 2× RT buffer, 5 µM random hexamers, 5 µM oligo-dT, and 1 mM dNTP.

FIGURE 3 mRNA accessibility over time. (A) mRNA accessibility over time in 1-4 mg/ml BSA, 50 ng/µl yeast tRNA, 1× RT buffer, and water. Five hundred astrocytes were lysed and kept at room temperature for 0, 1, 2, and 6 h. Cq-values are shown on the left y-axis and relative transcript numbers on the right y-axis. Relative transcript number is expressed in percentage compared to the 1 mg/ml BSA sample at 0 h, assuming 100% RT efficiency and 100% PCR efficiency. Data are shown as mean ± SD (n = 4). (B) Mean RNA accessibility of the transcripts. Expression of Gapdh, Vim, Dll1, and Jag1 were averaged and compared to the 1 mg/ml BSA condition at 0 h. (C) Percentage of positive data points. Missing data were excluded from subplots (A,B) and are shown in Table S5 in Supplementary Material. Four genes and four time points were analyzed per lysis condition. GTC, guanidine thiocyanate; 1× RT buffer, 1× reverse transcription buffer.
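
How much cell material each qPCR well receives in the stability experiments follows from the volumes given in Materials and Methods. The sketch below simply chains those dilutions; it assumes complete lysis and purely volumetric transfer, and the function and parameter names are hypothetical.

```python
def cell_equivalents(cells_per_ul=200.0, suspension_ul=2.5, lysis_buffer_ul=47.5,
                     lysate_into_rt_ul=2.0, cdna_final_ul=60.0, cdna_per_qpcr_ul=3.0):
    """Follow the cells through lysis, RT and qPCR using the stated volumes."""
    cells_lysed = cells_per_ul * suspension_ul                    # cells in the lysate
    lysate_conc = cells_lysed / (suspension_ul + lysis_buffer_ul)  # cells per µl lysate
    cells_in_rt = lysate_conc * lysate_into_rt_ul                  # cell equivalents in RT
    cells_per_qpcr = cells_in_rt * cdna_per_qpcr_ul / cdna_final_ul
    return cells_lysed, cells_in_rt, cells_per_qpcr
```

With the stated volumes this gives 500 lysed cells, 20 cell equivalents in each RT reaction and about one cell equivalent per qPCR well, so the measurements are effectively made at a single-cell level of input.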

FIGURE 4 mRNA during freeze/thaw cycling. (A) Comparison of RNA accessibility after freeze/thaw cycles in 1-4 mg/ml BSA, 50 ng/µl yeast tRNA, 1× RT buffer and water. Five hundred astrocytes were lysed, frozen at -80 °C and thawed at room temperature 1, 2, 3, or 6 times. Cq-values are shown on the left y-axis and relative transcript numbers on the right y-axis. Relative transcript number is expressed in percentage compared to the 1 mg/ml BSA sample thawed once, assuming 100% RT efficiency and 100% PCR efficiency. Data are shown as mean ± SD (n = 4). (B) Mean RNA accessibility of the transcripts. Expression of Gapdh, Vim, Dll1, and Jag1 were averaged and compared to the 1 mg/ml BSA sample thawed once. (C) Percentage of positive data points. Missing data were excluded from subplots (A,B) and are shown in Table S6 in Supplementary Material. Four genes and four different numbers of freeze/thaw cycles were analyzed per lysis condition. GTC, guanidine thiocyanate; 1× RT buffer, 1× reverse transcription buffer.
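
The per-condition summaries in panels (A) and (C) can be sketched as below: missing reads are excluded from the mean and SD and counted separately as the percentage of positive data points. This is an illustration, not the authors' script; it assumes the sample standard deviation, represents a missing read as `None`, and uses hypothetical names and Cq values.

```python
from statistics import mean, stdev


def summarize_replicates(cq_values):
    """Return (mean Cq, SD, percent positive) for one condition.

    cq_values: list of replicate Cq values, with None for a missing read.
    """
    detected = [cq for cq in cq_values if cq is not None]
    percent_positive = 100.0 * len(detected) / len(cq_values)
    avg = mean(detected) if detected else None
    sd = stdev(detected) if len(detected) > 1 else None
    return avg, sd, percent_positive


# Hypothetical replicate set (n = 4) with one missing read.
avg, sd, positive = summarize_replicates([30.0, 31.0, None, 32.0])
```

Here three of the four replicates are positive (75%), and the mean and SD are computed from those three values only.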

and DNA spikes were measured in FACS-sorted primary astrocytes performing a twofold dilution series ranging from a single cell to 2048 cells (n = 4 per step, in total 12 steps). Cells were sorted into 5 µl of either 1 mg/ml BSA or RNeasy Micro kit RLT buffer (supplemented with poly-A carrier and 2-mercaptoethanol). Figure 5 shows the yields and dynamic ranges of the direct cell lysis and of the column based extraction. With the direct cell lysis protocol, the yields of the endogenously expressed mRNAs are 3- to 15-fold higher when analyzing 256 cells and even higher when analyzing lower cell numbers (Table S7 in Supplementary Material). For the RNA and DNA spikes the yields with direct lysis were …- and 5.1-fold higher, respectively. This corresponds to more than 90% loss of RNA and 80% loss of DNA with the column based purification protocol.

Jag1 and Dll1 transcripts were present at low levels when few cells (<16 cells) were analyzed. Here, sensitivity can be assessed from the percentage of positive replicates (samples that gave rise to a specific PCR product, Figure 5A). This comparison shows that direct cell lysis based on 1 mg/ml BSA is superior to RNeasy Micro spin-columns. The dynamic range of the direct cell lysis is indicated by linear analysis in Figure 5A. The yields of Gapdh, Vim, Dll1, Jag1, and the RNA and DNA spikes start to decline when 512 or more astrocytes are analyzed.

FIGURE 5 Comparison of direct cell lysis and column based extraction. (A) One to 2048 cells, in twofold steps, were FACS-sorted and mRNA was extracted either by direct cell lysis or with RNeasy Micro columns for RT-qPCR analysis. The difference in Cq-values (left y-axis) between 1 mg/ml BSA and column based extraction reflects the difference in sensitivity between the two methods. Percentage of samples with detectable cDNA is plotted on the right y-axis. The dynamic range for the endogenously expressed genes is shown by linear curve fits. The "Spike only" control sample shows the effect of column based extraction of the spikes alone, without any cell material. Data are shown as mean ± SD (n = 4) (see also Table S7 in Supplementary Material). (B) Comparison of direct cell lysis to column based extraction for all transcripts. The mean difference in sensitivity is shown by averaging the expression of Gapdh, Vim, Dll1, and Jag1. The dotted line, at a value of one, indicates when column based extraction is as efficient as direct lysis. Missing data were replaced with a Cq-value of …
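The link between a fold difference in yield and the loss quoted in the text is simple arithmetic: if direct lysis yields f times more material, the column protocol retains 1/f of it and loses the rest. A small sketch, in which the function name is an assumption made for illustration:

```python
def percent_loss(fold_higher: float) -> float:
    """Percent of material lost by the lower-yield protocol.

    fold_higher is the yield ratio (better protocol / worse protocol).
    """
    if fold_higher <= 0:
        raise ValueError("fold difference must be positive")
    return 100.0 * (1.0 - 1.0 / fold_higher)


# The 5.1-fold difference reported for the DNA spike.
print(round(percent_loss(5.1), 1))   # 80.4, i.e. about 80% loss
# A 10-fold difference marks the 90% loss boundary.
print(round(percent_loss(10.0), 1))  # 90.0
```

Read the other way round, a loss of more than 90% implies a yield difference of more than 10-fold.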

DISCUSSION

Traditional recommendations and guidelines for purification of nucleic acids refer to direct cell lysis as being inhibitory and even impossible in many conditions (11). For most sample types this is true. But for samples of low complexity, such as cell cultures and single cells, direct cell lysis offers advantages: it eliminates losses due to washing and therefore results in higher yields. Other advantages are that the protocols are simple, fast, and cost efficient, which makes them very suitable for high-throughput applications (38, 39). For single-cell gene expression profiling studies, direct cell lysis is practically the only way to retrieve mRNA for reliable analysis. In fact, direct lysis can even be used for multi-analyte profiling, measuring RNA, DNA, and proteins from the same single cells (single-cell omniomics) (40). In a recent comparison of commercial direct lysis agents with extraction using spin-columns, superior yields were obtained with direct cell lysis for up to 1000 fibroblast cells (41).

We have evaluated 17 direct lysis protocols on astrocytes, comparing yield, RNA stability, and compatibility with downstream RT-qPCR. To test the proficiency of RT and qPCR we used RNA and DNA spikes. We found the best performance using 1 mg/ml BSA. BSA is a common enhancer in PCR (21, 42). Its mechanism of action is complex and includes acting as a carrier (19, 43), inhibiting proteinases (44), and sequestering inhibitors (20, 45). Here, we show that BSA also has advantageous properties in direct cell lysis and in keeping RNA accessible (Figure 1). Comparing Figures 1B and 2B, we conclude that the positive effect is highest when BSA is present during cell lysis. The enhancing effect of BSA on qPCR is usually thought of as a carrier effect, i.e., BSA adsorbs to the surfaces of the reaction container, reducing the number of nucleic acids that bind.
After BSA, yeast tRNA, which is a typical carrier molecule, performed best in the freeze/thaw cycling study (Figure 4). This suggests that BSA has some carrier properties, but other mechanisms must contribute and should be more important, since the effect of BSA on the RNA and DNA spikes was almost negligible (Figures 1 and 2). BSA is also known to reduce the effect of inhibitors in qPCR (20, 45), possibly by binding them (19, 43). However, our RNA and DNA spike data (Figure 1) suggest that inhibition is not important when small numbers of cells are analyzed, which is in accordance with previous observations (15).

Albumin is by far the most abundant protein in the circulatory system, being present at millimolar concentrations, and it accounts for 80% of the colloid osmotic blood pressure (46). It is a carrier of fatty acids in plasma and a very important buffer, stabilizing the blood pH. BSA showed stabilizing effects compared to water, RT buffer, and yeast tRNA in the room-temperature stability and freeze/thaw cycling studies (Figures 4 and 5), which may be related to the function it has in blood. The effects of BSA and tRNA as carriers are gene dependent. Some studies suggest that interactions between BSA and RNA are unlikely (47), while others suggest that the affinity of nucleic acids for BSA is highly pH and ionic strength dependent (19, 43).

Another issue in RNA purification protocols is degradation by ribonucleases (RNases) (48, 49). RNase inhibitors are often added to lysis buffers or to the RT reaction to prevent RNA degradation. In this study we used in vitro cultured primary astrocytes and found no improvement using RNase inhibitors (Figures 1 and 2), suggesting that enzymatic degradation by RNases is not important under our conditions. This is in accordance with our previous studies of various cell types and experimental setups (1, 2, 4, 40). RNases, including endonucleases and exonucleases, form a large family of RNA-degrading enzymes.
In eukaryotic cells mRNAs form ribonucleoprotein complexes with compact quaternary structures, in which the 3′ ends of the mRNAs are covered by proteins or embedded in secondary structures that protect them from intracellular RNases (50). Eukaryotic cells also produce the ribonuclease inhibitor protein (51). These protection mechanisms are likely to remain after mild direct lysis. Many RNases are also secreted, and human body fluids are very rich in RNases that are very active and extremely stable, but their functions are not well known (52). Extracellular RNases are most likely washed away while preparing dissociated cells for later cell collection. We speculate that the loss of mRNAs during storage and freeze/thaw cycling is due to self-hydrolysis of nucleic acids, aggregation, and adsorption rather than to RNase activity. Self-hydrolysis can be mitigated with an optimized buffer. Cells with a more rigid plasma membrane may require harsher lysis conditions for complete cell disruption, and these conditions need to be compatible with downstream analysis. The development of inhibition-resistant mutants of Taq polymerase allows direct amplification of DNA in blood and soil samples; such mutants may prove very useful in single-cell analysis when stronger detergents and salts are required (53, 54). In the future we expect direct lysis of larger numbers of cells, and perhaps even small tissue pieces, to be possible (55–57).

In conclusion, we show that direct lysis of single cells, and even of samples containing a few cells, can be reliably performed without losses in combination with RT-qPCR. We also show that additives such as BSA confer several advantageous properties on lysis buffers, including high lysis yield and a stabilizing effect on the mRNA.

AUTHOR CONTRIBUTION

Conceived and designed experiments: David Svec and Anders Ståhlberg. Performed the experiment: David Svec, Daniel Andersson, and Anders Ståhlberg. Analyzed the data: David Svec, Daniel Andersson, Robert Sjöback, Mikael Kubista, and Anders Ståhlberg.
Contributed reagents and material: Milos Pekny, Mikael Kubista, and Anders Ståhlberg. Wrote the paper: David Svec, Daniel Andersson, Milos Pekny, Robert Sjöback, Mikael Kubista, and Anders Ståhlberg.

ACKNOWLEDGMENTS

This work was supported by grants from ESF Functional Genomics Short Visit Grant Frontiers of Functional Genomics (David Svec), EU FP7 Program EduGlia (237956) (David Svec), ALF Gothenburg (11392), AFA Research Foundation, Hjärnfonden, Amlöv's Foundation, E. Jacobson's Donation Fund, NanoNet COST Action (BM1002), EU FP7 Program TargetBraIn (279017), Assar Gabrielssons Research Foundation, Johan Jansson Foundation for Cancer Research, Swedish Society for Medical Research, The Swedish Research Council (Anders Ståhlberg and Milos Pekny 11548), BioCARE National Strategic Research Program at University of Gothenburg, Wilhelm and Martina Lundgren Foundation for Scientific Research, the grant of Czech Ministry of Education ME10052, the research project AV0Z…, and GACR GA P303/10/….

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online on the article's abstract page at Frontiers in Oncology.

REFERENCES

1. Bengtsson M, Stahlberg A, Rorsman P, Kubista M. Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels. Genome Res (2005) 15.
2. Stahlberg A, Bengtsson M, Hemberg M, Semb H. Quantitative transcription factor analysis of undifferentiated single human embryonic stem cells. Clin Chem (2009) 55.
3. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol (2006) 4:e309.
4. Stahlberg A, Andersson D, Aurelius J, Faiz M, Pekna M, Kubista M, et al. Defining cell populations with single-cell gene expression profiling: correlations and identification of astrocyte subpopulations. Nucleic Acids Res (2011) 39:e24.
5. Sindelka R, Jonak J, Hands R, Bustin SA, Kubista M. Intracellular expression profiles measured by real-time PCR tomography in the Xenopus laevis oocyte. Nucleic Acids Res (2008) 36.
6. Benesova J, Rusnakova V, Honsa P, Pivonkova H, Dzamba D, Kubista M, et al. Distinct expression/function of potassium and chloride channels contributes to the diverse volume regulation in cortical astrocytes of GFAP/EGFP mice. PLoS One (2012) 7.
7. Kubista M, Andrade JM, Bengtsson M, Forootan A, Jonak J, Lind K, et al. The real-time polymerase chain reaction. Mol Aspects Med (2006) 27.
8. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using real-time RT-PCR. Nat Protoc (2006) 1.
9. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem (2009) 55.
10. Bustin SA. Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol (2002) 29.
11. Radstrom P, Knutsson R, Wolffs P, Lovenklev M, Lofstrom C. Pre-PCR processing: strategies to generate PCR-compatible samples. Mol Biotechnol (2004).
12. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. Landscape of transcription in human cells. Nature (2012) 489.
13. Islam S, Kjallquist U, Moliner A, Zajac P, Fan JB, Lonnerberg P, et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res (2011) 21.
14. Ramskold D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol (2012) 30.
15. Bengtsson M, Hemberg M, Rorsman P, Stahlberg A. Quantification of mRNA in single cells and modelling of RT-qPCR induced noise. BMC Mol Biol (2008) …:63.
16. Marshall LA, Wu LL, Babikian S, Bachman M, Santiago JG. Integrated printed circuit board device for cell lysis and nucleic acid extraction. Anal Chem (2012).
17. Nolan T, Hands RE, Ogunkolade W, Bustin SA. SPUD: a quantitative PCR assay for the detection of inhibitors in nucleic acid preparations. Anal Biochem (2006).
18. Stahlberg A, Bengtsson M. Single-cell gene expression profiling using reverse transcription quantitative real-time PCR. Methods (2010) 50.
19. Geselowitz DA, Neckers LM. Bovine serum albumin is a major oligonucleotide-binding protein found on the surface of cultured cells. Antisense Res Dev (1995).
20. Kreader CA. Relief of amplification inhibition in PCR with bovine serum albumin or T4 gene 32 protein. Appl Environ Microbiol (1996) 62.
21. Farell EM, Alexandre G. Bovine serum albumin further enhances the effects of organic solvents on increased yield of polymerase chain reaction of GC-rich templates. BMC Res Notes (2012) 5:257.
22. Wang QT, Xiao W, Mindrinos M, Davis R. Yeast tRNA as carrier in the isolation of microscale RNA for global amplification and expression profiling. Biotechniques (2002) 33(4):788, 790.
23. Sachdeva R, Simm M. Application of linear polyacrylamide coprecipitation of denatured templates for PCR amplification of ultra-rapidly reannealing DNA. Biotechniques (2011) 50.
24. Winslow SG, Henkart PA. Polyinosinic acid as a carrier in the microscale purification of total RNA. Nucleic Acids Res (1991) 19.
25. Musso M, Bocciardi R, Parodi S, Ravazzolo R, Ceccherini I. Betaine, dimethyl sulfoxide, and 7-deaza-dGTP, a powerful mixture for amplification of GC-rich DNA sequences. J Mol Diagn (2006) 8.
26. Kang J, Lee MS, Gorenstein DG. The enhancement of PCR amplification of a random sequence DNA library by DMSO and betaine: application to in vitro combinatorial selection of aptamers. J Biochem Biophys Methods (2005) 64.
27. Jensen MA, Fukushima M, Davis RW. DMSO and betaine greatly improve amplification of GC-rich constructs in de novo synthesis. PLoS One (2010) 5.
28. Carninci P, Nishiyama Y, Westover A, Itoh M, Nagaoka S, Sasaki N, et al. Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA. Proc Natl Acad Sci USA (1998) 95.
29. Spiess AN, Ivell R. A highly efficient method for long-chain cDNA synthesis using trehalose and betaine. Anal Biochem (2002) 301.
30. Horakova H, Polakovicova I, Shaik GM, Eitler J, Bugajev V, Draberova L, et al. 1,2-propanediol-trehalose mixture as a potent quantitative real-time PCR enhancer. BMC Biotechnol (2011) 11:41.
31. Mason PE, Neilson GW, Dempsey CE, Barnes AC, Cruickshank JM. The hydration structure of guanidinium and thiocyanate ions: implications for protein stability in aqueous solution. Proc Natl Acad Sci USA (2003) 100.
32. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science (2002) 297.
33. Stahlberg A, Rusnakova V, Forootan A, Anderova M, Kubista M. RT-qPCR workflow for single-cell data analysis. Methods (2012).
34. Stahlberg A, Aman P, Ridell B, Mostad P, Kubista M. Quantitative real-time PCR method for detection of B-lymphocyte monoclonality by comparison of kappa and lambda immunoglobulin light chain expression. Clin Chem (2003) 49.
35. Stahlberg A, Hakansson J, Xian X, Semb H, Kubista M. Properties of the reverse transcription reaction in mRNA quantification. Clin Chem (2004) 50.
36. Fleige S, Pfaffl MW. RNA integrity and the effect on the real-time qRT-PCR performance. Mol Aspects Med (2006) 27.
37. Fleige S, Walf V, Huch S, Prgomet C, Sehm J, Pfaffl MW. Comparison of relative mRNA quantification models and the impact of RNA integrity in quantitative real-time RT-PCR. Biotechnol Lett (2006) 28.
38. White AK, Vaninsberghe M, Petriv OI, Hamidi M, Sikorski D, Marra MA, et al. High-throughput microfluidic single-cell RT-qPCR. Proc Natl Acad Sci USA (2011) 108.
39. Mazutis L, Gilbert J, Ung WL, Weitz DA, Griffiths AD, Heyman JA. Single-cell analysis and sorting using droplet-based microfluidics. Nat Protoc (2013).
40. Stahlberg A, Thomsen C, Ruff D, Aman P. Quantitative PCR analysis of DNA, RNAs, and proteins in the same single cell. Clin Chem (2012).

41. Fox BC, Devonshire AS, Baradez MO, Marshall D, Foy CA. Comparison of reverse transcription-quantitative polymerase chain reaction methods and platforms for single cell gene expression analysis. Anal Biochem (2012).
42. Silvy M, Pic G, Gabert J, Picard C. Improvement of gene expression analysis by RQ-PCR technology: addition of BSA. Leukemia (2004) 18.
43. Arnedo A, Espuelas S, Irache JM. Albumin nanoparticles as carriers for a phosphodiester oligonucleotide. Int J Pharm (2002) 244.
44. Abu Al-Soud W, Radstrom P. Effects of amplification facilitators on diagnostic PCR in the presence of blood, feces, and meat. J Clin Microbiol (2000).
45. Wilson IG. Inhibition and facilitation of nucleic acid amplification. Appl Environ Microbiol (1997) 63.
46. Guyton AC, Hall JE. Textbook of Medical Physiology. Philadelphia: Elsevier Saunders (2006).
47. Rossetti S, Van Unen L, Sacchi N, Hoogeveen AT. Novel RNA-binding properties of the MTG chromatin regulatory proteins. BMC Mol Biol (2008) 9.
48. Sharova LV, Sharov AA, Nedorezov T, Piao Y, Shaik N, Ko MS. Database for mRNA half-life of genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res (2009) 16.
49. Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science (2010) 329.
50. Ibrahim H, Wilusz J, Wilusz CJ. RNA recognition by 3′-to-5′ exonucleases: the substrate perspective. Biochim Biophys Acta (2008) 1779.
51. Dickson KA, Haigis MC, Raines RT. Ribonuclease inhibitor: structure and function. Prog Nucleic Acid Res Mol Biol (2005) 80.
52. Arnold U, Schulenburg C, Schmidt D, Ulbrich-Hofmann R. Contribution of structural peculiarities of onconase to its high stability and folding kinetics. Biochemistry (2006) 45.
53. Abu Al-Soud W, Radstrom P. Capacity of nine thermostable DNA polymerases to mediate DNA amplification in the presence of PCR-inhibiting samples. Appl Environ Microbiol (1998) 64.
54. Kermekchiev MB, Kirilova LI, Vail EE, Barnes WM. Mutants of Taq DNA polymerase resistant to PCR inhibitors allow DNA amplification from whole blood and crude soil samples. Nucleic Acids Res (2009) 37:e40.
55. Ferrari BC, Power ML, Bergquist PL. Closed-tube DNA extraction using a thermostable proteinase is highly sensitive, capable of single parasite detection. Biotechnol Lett (2007) 29.
56. Zhang Z, Kermekchiev MB, Barnes WM. Direct DNA amplification from crude clinical samples using a PCR enhancer cocktail and novel mutants of Taq. J Mol Diagn (2010) 12.
57. Lounsbury JA, Coult N, Miranian DC, Cronk SM, Haverstick DM, Kinnon P, et al. An enzyme-based DNA preparation method for application to forensic biological samples and degraded stains. Forensic Sci Int Genet (2012) 6.

Conflict of Interest Statement: David Svec, Milos Pekny, Robert Sjöback, and Anders Ståhlberg declare stock ownership in TATAA Biocenter.

Received: 30 September 2013; paper pending published: 21 October 2013; accepted: … October 2013; published online: xx October 2013.

Citation: Svec D, Andersson D, Pekny M, Sjöback R, Kubista M and Ståhlberg A (2013) Direct cell lysis for single-cell gene expression profiling. Front. Oncol. 3:274.

This article was submitted to Molecular and Cellular Oncology, a section of the journal Frontiers in Oncology.

Copyright 2013 Svec, Andersson, Pekny, Sjöback, Kubista and Ståhlberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

4.4.5 Data analysis

Gene expression profiling experiments involve multiparametric testing of many genes, usually spread over several qPCR runs on high-throughput instruments, and they produce large data sets that must first be merged. Besides the key step of experimental design, which has to consider the number of technical replicates at every level of the analysis, it is also necessary to classify samples and genes (their type, groups, and replicates) systematically and without error, and to consider the correct combinations of normalization methods. As the amount of data grows, it has to be handled with tools that match the complexity of the results and allow clear interpretation. During the course of my studies we collaborated over a long period on improving analysis procedures and on implementing them in the GenEx software. GenEx contains tools for automatic import of data from the most commonly used instruments, including the information that defines samples, genes, groups, replicates, etc. In the data preparation phase it supports operations such as merging analyses from several experiments (inter-plate calibrator), removing missing data, normalization with reference genes and with an exogenous control (RNA and DNA spike I, II), calculating the arithmetic mean of replicates, removing the genomic DNA signal with ValidPrime, and rescaling. It then offers analyses ranging from the t-test through PCA and hierarchical clustering to neural networks. A basic description of the software and its advantages is given in publication IV, GenEx: Data Analysis Software; a general, recommended procedure for analysing single-cell data is described in the work of our colleagues Ståhlberg and Rusňáková [113].

4 GenEx: Data Analysis Software

Mikael Kubista, Vendula Rusnakova, David Svec, Björn Sjögreen and Ales Tichopad

Abstract

As the qPCR field advances, the design of experiments and the analysis of data are becoming more important and more challenging. Calculating the relative expression of a reporter gene to a reference gene in pairs of samples using the ΔΔCq method is no longer sufficient. Studies are now designed using multiple markers and nested levels, exploring or confirming the effect of multiple factors, occasionally in paired designs, etc. Proper handling of such data requires software that supports the planning and design of experiments as well as data analysis. Several software packages with these capabilities are emerging. This chapter describes some of the features of one of the most powerful of those: GenEx from MultiD Analysis.

Introduction

In the early days of quantitative real-time PCR (qPCR) the experiments performed were rather uncomplicated, involving only a small number of samples, and analysis could be performed using a standard spreadsheet such as Microsoft Excel. However, as the technology develops, more advanced experimental designs become common, with large data sets that often involve multiple plates or runs on high-throughput instruments. These analyses require more information on how the experiment was set up, in order to handle references, standards, and controls appropriately and to account correctly for the variance and covariance in the measured data. On some high-throughput instruments, even the transfer of data from the instrument to the analysis software is challenging, requiring user intervention with an obvious risk of accidentally changing or losing some data. In parallel with the instruments becoming more advanced and powerful, qPCR data analysis software is being developed. Today, this development is heavily supported by the instrument companies, who have realized their customers' needs.
The main qPCR software packages on the market today are GenEx from MultiD, which supports all leading qPCR instruments, Statminer from Integromics, which supports qPCR instruments from Life Technologies, and qbase plus from Biogazelle, which supports Bio-Rad instruments. Since GenEx is currently the only generic platform and is also among the most developed software packages, we will focus this chapter on it. Much of what is said, though, is general for qPCR analysis and applies also to most other software on the market.

Data arrangement

When groups are compared, data are classically arranged with the measured Cq values (explained variables) in columns headed with the experimental group label (explanatory variables). This arrangement provides an easy overview of the data (Fig. 4.1). However, it is not practical for advanced studies that may include more than one nominal factor or covariate (a variable of metric character, e.g. time, age, dose, etc.), multiple markers, replicate measurements, repeated measurements (same subject sampled repeatedly), multiplate measurements, etc. A more flexible approach is to arrange the data with samples as rows and all variables in columns (Fig. 4.1). This is the standard arrangement in most statistical software today.
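The rearrangement from one column per group to one row per sample can be sketched in a few lines. The example below is only an illustration in plain Python, not GenEx functionality; the Cq values, the sample names, and the helper name `to_long` are hypothetical.

```python
# Traditional layout: one column of Cq values per experimental group.
wide = {
    "Control": [24.1, 24.6, 23.9],
    "Treated": [22.0, 22.4],
}


def to_long(wide_table, marker="GOI"):
    """Return one row (dict) per sample, with one variable per column."""
    rows = []
    sample_no = 0
    for group, cq_values in wide_table.items():
        for cq in cq_values:
            sample_no += 1
            rows.append({"Sample": f"Sample {sample_no}",
                         marker: cq,
                         "Group": group})
    return rows


long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

In the row-per-sample form, a further marker, a covariate, or a replicate index is just one more key in each row, whereas the column-per-group form has no natural place for it.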

[Figure 4.1: left panel, Cq values (explained) listed under the column heads Control and Treated (explanatory); right panel, the same data rearranged with the columns Sample, GOI (explained) and Group (explanatory).]

Figure 4.1 Traditional arrangement of data with experimental group labels in column heads (left) and modern, generic arrangement of data with one variable per column (right).

The format is readily generalized to any number of markers, and additional columns and rows can be added that specify the experimental design and index samples, markers, plates, etc. In GenEx, these are referred to as classification columns and classification rows, and their labels start with #. In the example shown in Fig. 4.2, #Repeat indexes qPCR technical replicates (samples with the same index are replicates on the qPCR level). These are expected to be highly similar and shall be averaged during data pre-processing. #Treatment indexes treatment groups that eventually shall be compared using a statistical test. Finally, the study is paired, meaning that each subject received both treatments and a sample was collected after each treatment. Paired study designs are more powerful, because the pairing reduces confounding variation. This elevates the power of the test, so the experiment requires fewer subjects. The pairs can be, for example, samples collected from all subjects before treatment and a second set collected after treatment. They can also be positive and negative samples collected from the same subject or from genetically similar individuals such as siblings, identical twins or clones. A special type of paired study design is repeated sampling, used when there are more than two subsequent measurements. In general, the word paired is then replaced by repeated. Specialized statistical procedures are available to analyse repeated samplings.

Figure 4.2 Example of data arrangement in GenEx. The first column lists the samples. The second and third columns are measured Cq values. The fourth column indexes technical replicates, the fifth column indexes treatment groups, and the sixth column indexes paired samples. The bottom row identifies marker and reference genes.

Data import

The experimental design is defined in part when the experimental factors and covariates involved in the experiment are decided on, and in part when the samples and assays are mixed while being dispensed into the qPCR containers. This information is critical for proper analysis and mining of the measured data. This has been realized by several of the leading qPCR instrument and assay providers. The Roche LC480 software for Real-Time Ready custom and focus panels, for example, names all genes and samples, indicates reference genes, and specifies technical and biological replicates at various levels. This information is transferred to GenEx and arranged appropriately for downstream analysis using a wizard. A similar high-level, user-friendly solution is provided by Exiqon, who offer a customized version of GenEx with a powerful wizard to read data from their miRCURY LNA Universal RT microRNA PCR platform (www.exiqon.com/qpcr-software). On the BIOMARK microfluidic platform from Fluidigm, technical and biological replicates are indicated by the naming of assays and samples, and appropriate classification columns are created automatically. Data generated on other qPCR instruments can also be read by GenEx, including the Stratagene MX300X from Agilent, Realplex from Eppendorf, CFX96/384 from Bio-Rad, Eco from Illumina, and the many different qPCR platforms from Life Technologies. These efforts from the instrument manufacturers to transfer experimental design information automatically, or at least semi-automatically, into GenEx substantially simplify the pre-processing needed to prepare qPCR data for statistical analysis.

Data pre-processing

The days when the ΔΔCq method was sufficient to analyse qPCR data have passed.
Not that we are calculating differently; rather, the studies have become larger and the experiments more complex. In fact, for most studies performed today, it is not even possible to write a closed-form expression to calculate the resulting expression response. Instead, the measured data must be processed sequentially to account for the various aspects of the experiment. In particular, it is essential to correctly define the statistical unit (often referred to as a subject when organisms are used). To use common statistical methods, each unit should be associated with a single value for each variable. This value must, however, frequently be assembled from various measurements, i.e. responses of target and reference genes, estimated amplification efficiencies, etc. In order to integrate the process into a logical workflow, GenEx provides an intuitive wizard with the following sequential operations:

1. Interplate calibration. Many studies cannot be fitted in a single experimental run or for practical reasons have to be extended over time. qPCR instruments perform base-line correction and set the threshold separately for each run, which introduces a bias between the Cqs measured in different runs. This bias can be compensated for by performing a common amplification in all plates, where the same sample is analysed for a given assay. This sample is called the Inter-Plate Calibrator, IPC. Any variation in the measured Cqs of the IPC among runs reflects systematic variation due to instrument factors and should be compensated for. It is sufficient to run a single IPC for each channel in the instrument if a common threshold and base-line correction is used. It is not recommended to perform separate inter-plate calibrations for each target. Since every correction adds confounding variation to the data, unimportant corrections shall be avoided, as they may impair data quality rather than improve it. For the same reason, it is a good strategy to use a robust sample for the IPC and analyse it in replicates. In multiplate experiments, the runs and the inter-plate calibrators shall be indexed in classification columns.

2. Efficiency correction. If PCR efficiency has been estimated, the measured Cq values can be corrected to account for suboptimal amplification. Typically, PCR efficiencies are estimated from serial dilutions run separately. The PCR efficiencies may then be listed in a classification row for automatic correction in GenEx.

3. Normalize using spiking. PCR efficiency depends on the sample matrix. Usually it is assumed that the sample matrix, and thus the PCR efficiency, is constant. But occasionally there are variations, which can be tested for using an exogenous spike added to the samples. Differential expression of the spike between the test sample and a standard sample reflects the sample's specific inhibition and can to some degree be accounted for.

4. Normalize to sample amount. Measured Cq values depend on the sample input. This can be the sample volume processed, the amount of RNA used for reverse transcription, or the cell count. If sample input varies, data may have to be normalized. The sample input shall be indicated in a classification column.

5. Average qPCR replicates. If qPCR replicates are available, they shall be indexed in a classification column and their Cq values shall be averaged.

6. Correct for genomic DNA background. When quantifying RNA levels using RT-qPCR, the assays may also amplify genomic copies of the target if the DNase treatment used is insufficient. The amount of genomic background can be assessed either by measuring RT− controls or by using the ValidPrime approach. The contribution to Cq from the genomic background can be calculated and the Cqs corrected.

7. Normalize with reference genes. In expression studies, normalization to endogenous controls, such as stably expressed reference genes, is popular. With GenEx, we can normalize to any number of reference genes; we can even normalize sets of reporter genes to sets of reference genes to match the genes' properties such as expression levels, stabilities, distribution in tissues, etc. It is also possible to normalize to the mean expression of all the genes (global normalization). Optionally, reference genes can be indexed in a classification row for automatic processing. Normalization to the expression of reference genes corresponds to calculating ΔCq in the classical approach.

8. Average technical replicates. If additional technical replicates are available, such as RT, extraction, and sampling replicates, they shall be indexed in classification columns and averaged.

9. Normalize with reference sample(s). In some paired designs, systematic variation can be reduced by normalizing to the paired sample during pre-processing.

10. Relative quantities. An arbitrary reference level is selected (which corresponds to ΔΔCq in the classical approach) and data are converted to linear scale (2^(−ΔΔCq) in the classical approach). The reference level can be the most expressed sample, the least expressed sample, the mean expression of all the samples, the mean expression of a group of samples, or a percentage (sum of the expression in all samples set to 1). It is also possible to convert the Cq values to an arbitrary linear scale (2^(−Cq)).

11. Convert to log scale. For statistical analysis with parametric methods, the data shall be converted to logarithmic scale. Available options are log2, log10, ln, and log(x + 1).

Not all the steps in the workflow are needed, since some cancel the effect of others. The appropriate steps depend on the experimental design, the controls and references that are available, and the analysis that will be performed.

In addition to the pre-processing workflow, GenEx has correction for missing data. GenEx recognizes two types of missing data: random missing (failed experiments) and non-random missing (off-scale data). There is built-in handling of random missing data among technical replicates, which are replaced based on available information in the course of the pre-processing. This is very useful, since the missing information can be ignored and is automatically accounted for.
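Several of the workflow steps reduce to simple arithmetic on Cq values. The sketch below is not GenEx code and does not claim to reproduce its internal calculations; it is a minimal illustration, with hypothetical function names and made-up numbers, of inter-plate calibration, efficiency correction, genomic DNA background correction, reference gene normalization, and conversion to relative quantities. It assumes the usual model in which the amount of product grows by a factor (1 + E) per cycle.

```python
import math
from statistics import mean


def interplate_calibrate(cq, ipc_this_run, ipc_all_runs):
    """Step 1: subtract the run-specific offset seen in the inter-plate calibrator."""
    return cq - (ipc_this_run - mean(ipc_all_runs))


def efficiency_correct(cq, efficiency=1.0):
    """Step 2: rescale Cq to the value expected at 100% efficiency.

    efficiency is a fraction: 1.0 means perfect doubling, 0.9 means 90%.
    """
    return cq * math.log2(1.0 + efficiency)


def remove_gdna(cq_total, cq_gdna):
    """Step 6: subtract the genomic DNA share of the signal on the linear scale."""
    if cq_gdna <= cq_total:
        raise ValueError("genomic background is as large as the total signal")
    return -math.log2(2.0 ** -cq_total - 2.0 ** -cq_gdna)


def delta_cq(cq_target, cq_references):
    """Step 7: normalize to the mean Cq of the reference genes (gives dCq)."""
    return cq_target - mean(cq_references)


def relative_quantity(dcq, dcq_reference_level):
    """Step 10: linear-scale quantity relative to a chosen reference level."""
    return 2.0 ** -(dcq - dcq_reference_level)


# Hypothetical example: one sample measured in duplicate on run 2 of two runs.
ipc = {"run1": 20.0, "run2": 20.4}   # Cq of the inter-plate calibrator per run
goi_replicates = [25.4, 25.6]        # gene of interest, qPCR duplicates
ref_replicates = [18.2, 18.4]        # reference gene, qPCR duplicates

ipc_values = list(ipc.values())
# Steps 1 and 5: calibrate each replicate, then average the replicates.
goi = mean(interplate_calibrate(c, ipc["run2"], ipc_values) for c in goi_replicates)
ref = mean(interplate_calibrate(c, ipc["run2"], ipc_values) for c in ref_replicates)

dcq = delta_cq(goi, [ref])
rq = relative_quantity(dcq, dcq_reference_level=8.2)  # 8.2 is a made-up control level
log2_rq = math.log2(rq)                               # step 11: back to log scale
print(round(dcq, 3), round(rq, 3), round(log2_rq, 3))  # prints: 7.2 2.0 1.0
```

The steps are applied in the order of the numbered workflow; spiking, sample amount normalization, and the handling of missing data are left out of this sketch for brevity.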
There are also several means to handle non-random missing or off-scale data that are due to too low target amounts, which may bias the biological effect and invalidate the statistical inference in the majority of the statistical tests employed. Outliers in the data can be tested for based on standard deviation and the Grubbs' outlier test. The pre-processing of data is logged and stored in a log file.

Screening by correlation
Several companies, including Roche, Exiqon, Life Technologies, Lonza, Qiagen and TATAA Biocenter, offer pre-plated assays for smooth expression profiling and screening purposes. Data from those plates are readily read into GenEx. Rarely are all assays relevant for every study, and one strategy is to analyse a few representative samples of each kind in a pilot study to identify differentially expressed genes to be used in a larger downstream study. This is readily done using the GenEx scatter plot (Fig. 4.3). Replicate measurements can be compared to test the reproducibility between plates (top left), or to screen for differentially expressed genes under two conditions (top right, bottom). Correlations between genes' expressions can be quantified by calculating the Pearson or Spearman correlation coefficients. This is typically applied to a larger number of samples and has, for example, been used to reveal correlations between genes expressed in individual cells (Stahlberg et al., 2011).

Preparing the data for analysis
Groups for comparison are created using the GenEx Data Manager. Treatment groups or treatment factors in multifactorial studies, such as studies of the effect of gender or covariates such as age, time or drug load, can be indexed in classification columns and used to assign subjects into groups automatically. The groups are assigned colours and symbols for plotting. A neat feature is that colours and symbols can be set independently, which makes it possible to assign subjects to multiple groups and identify these in plots by the shape, size, and colour of the symbol. Even shades of colours can be used creatively to indicate various levels of covariates on an ordinal scale (e.g. a darker shade indicates a higher drug load).
In the Data Manager, subjects and genes can be removed temporarily from analysis to compare results based on analyses of subsets of data. Data can also be mean centred (subtraction of the mean value) or autoscaled (subtraction of the mean followed by division by the standard deviation) to change the weights of the genes/samples in analyses. This is particularly useful in expression profiling analysis, where genes having different expression levels can be assigned equal weights. For analyses that apply models based on measured data, such as the standard curve, reverse calibration, neural networks, self-organizing map, potential curves etc., samples (and genes) can be assigned either training or test status. Training data are used to create the model, which is applied to classify the test samples. The various analyses available in GenEx are listed in Table 4.1.

Standard curve and reverse calibration
Amounts of pathogens in field samples can be quantified using qPCR by comparing the measured Cqs of the field samples with those of standard samples by means of a standard curve. A representative data arrangement is shown in Fig. 4.4. In addition to the measured data, the concentration of the standards is given in a classification column. Additional classification columns can be used to index replicates and to identify the standard and test samples. Any technical replicates shall be averaged during pre-processing and the averaged Cq shall be considered a single data point. Independently prepared standards are treated as different data points, while replicate measurements of field samples are averaged and used as a single, more precise estimate. Groups are created in the GenEx Data Manager and assigned either test or training status. Samples can also be reversibly removed from analysis (Fig. 4.5). A confidence level is set for the analysis. The standard curve is the best straight-line fit of the Cqs measured for the standard samples to their concentration in logarithmic scale (Fig. 4.6).
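The straight-line fit described here can be sketched with ordinary least squares. The snippet below is an illustration only: the function names and the noise-free standards are hypothetical, and the efficiency is derived from the slope with the commonly used relation E = 10^(−1/slope) − 1, which the text does not spell out. It does not compute the confidence bands that GenEx reports.

```python
def fit_standard_curve(log10_conc, cq):
    """Ordinary least-squares fit of Cq = intercept + slope * log10(concentration)."""
    n = len(cq)
    mean_x = sum(log10_conc) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # Cq expected at log10(conc) = 0, i.e. one molecule
    return slope, intercept


def pcr_efficiency(slope):
    """PCR efficiency from the slope: E = 10^(-1/slope) - 1 (1.0 means 100%)."""
    return 10.0 ** (-1.0 / slope) - 1.0


# Hypothetical, noise-free standards spanning six logs of concentration.
log10_conc = [1, 2, 3, 4, 5, 6]
cq = [38.0 - 3.4 * x for x in log10_conc]

slope, intercept = fit_standard_curve(log10_conc, cq)
efficiency = pcr_efficiency(slope)
```

A slope of about −3.32 would correspond to perfect doubling per cycle; the shallower the slope in absolute terms, the higher the apparent efficiency.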
The fit is calculated using linear regression and defines the intercept, which is the Cq expected for a sample containing a single template molecule, and the slope. From the slope, the PCR efficiency is estimated. GenEx also calculates the uncertainties in the estimates of the slope and the intercept, which are reflected by the dashed lines in the plot as the Working-Hotelling area, and the confidence interval for the PCR efficiency (Fig. 4.6). It is essential to calculate the confidence information since it reflects the precision of the estimated efficiency. In this example, the precision of the estimated efficiency is quite high, because a large number (21) of standards was used and a wide concentration range was covered (6 logs). In the literature, we frequently see standard curves based on a substantially lower number of standards. The PCR efficiencies estimated from such standard curves are highly uncertain and any corrections made are unreliable. There are even cases in the literature where separate standard curves are measured for the same assay in different plates and the estimated PCR efficiencies are used to adjust the two sets of data. Such adjustment is nonsense, since any difference between the estimates of efficiency is caused by random variation in the measurements rather than any systematic difference between the runs.
Figure 4.3 Scatter plot comparing replicate measurements performed in separate plates (A) and two different conditions (B). Differentially expressed genes are tabulated (C).
The residual plot shows the deviations between the standard samples' measured Cq and the Cq predicted by the standard curve (Fig. 4.6). If the straight-line standard curve is adequate to model the data, residuals should fluctuate randomly. If the model is inadequate, runs of positive and negative residuals will be seen. GenEx performs a statistical test for the number of runs and, if they are too few, warns that the linear standard curve may be an inadequate model. Outliers are readily identified visually in the residual plot and GenEx further uses the Grubbs' test to support the removal of outliers. In general, no more than one outlier should be removed from a standard curve (Anonymous, 2003). If multiple outliers are indicated, the approach used is likely to be unstable and should be reviewed. When replicates are available, the residual plot also reveals if noise increases at low concentrations.
A reliable standard curve is critical for accurate estimation of the concentrations of field samples, which in GenEx are referred to as test samples. The estimates improve if the field samples are available in replicates that can be averaged to reduce confounding variation. Concentrations of the unknown samples are estimated by entering the standard curve at the measured Cq and reading out the log of the concentration on the x-axis (Fig. 4.7). The Working-Hotelling area, which reflects the prediction uncertainty, is wider than before because of the additional error contribution from the measured Cq. GenEx calculates confidence intervals for the estimated concentrations.
The confidence intervals are symmetric around the mean in logarithmic scale, while they are asymmetric around the mean in linear scale (Fig. 4.7). The uncertainty in the estimates is larger than most people think. Even though the standard curve in the example is based on seven concentrations of standards covering six logs, each measured in triplicate for a total of 21 readings, and the assay has 96% efficiency, the uncertainty in the estimated concentrations of the unknowns is substantial. For example, for Test 1, the estimated concentration is 46,700 copies, with a 95% confidence interval of 31,000 to 61,000 copies! With a less accurate standard curve, the precision in the estimated concentration would be even worse.
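The reverse calibration and the scale dependence of the confidence interval can be illustrated with a short sketch. The curve parameters, the measured Cq and the interval half-width below are hypothetical. In practice the half-width would come from the prediction uncertainty of the regression, which this snippet does not compute.

```python
def predict_log10_conc(cq, slope, intercept):
    """Reverse calibration: invert Cq = intercept + slope * log10(concentration)."""
    return (cq - intercept) / slope


def to_linear_interval(log10_estimate, half_width):
    """Map an interval that is symmetric in log10 scale onto the linear scale."""
    estimate = 10.0 ** log10_estimate
    lower = 10.0 ** (log10_estimate - half_width)
    upper = 10.0 ** (log10_estimate + half_width)
    return lower, estimate, upper


# Hypothetical standard-curve parameters, measured Cq and log10 half-width.
x = predict_log10_conc(cq=22.13, slope=-3.4, intercept=38.0)
lower, estimate, upper = to_linear_interval(x, half_width=0.1)

# Distances from the estimate to each bound on the linear scale.
below = estimate - lower
above = upper - estimate
```

Because exponentiation stretches the upper half of a log-symmetric interval more than the lower half, `above` is larger than `below` for any positive half-width.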

Table 4.1 GenEx versions. GenEx version comparison (columns: Standard, Pro, Enterprise)
Pre-processing of data: Interplate calibration; PCR efficiency correction; Normalize to sample amount; Normalize to reference genes/samples; Normalize to spike; Missing data handling and primer dimer correction; Relative quantities and fold changes
Finding optimal reference genes: geNorm; NormFinder
Plots: Scatterplots; Line plots; Bar plots; Box and whiskers plot
Principal component analysis: PCA; P-curve
Cluster analysis: Hierarchical clustering/dendrogram; Heatmap analysis
Networks: Self-organizing map (SOM); Artificial neural networks (ANN); Support vector machine (SVM)
Regression analysis: Standard curve; Reverse calibration; Limit of detection (LOD); Partial least square (PLS)
Three-way analysis: Trilinear decomposition
Statistics: Descriptive statistics; Parametric t-test; Non-parametric tests; One-way ANOVA; Two-way ANOVA; Nested ANOVA

Table 4.1 (continued) GenEx version comparison (columns: Standard, Pro, Enterprise)
Correlation: Spearman rank correlation coefficient; Pearson correlation coefficient
Experimental design: Sample size; Experimental design optimization

Limit of detection
The limit of detection (LOD) is the lowest amount of analyte in a sample that can be detected with (stated) probability, although perhaps not quantified as an exact value, with analyte here referring to the targeted nucleic acid (Anonymous, 1995, 2004). For classical tests, when a signal is measured against a background, LOD is estimated from the standard deviation of the blank readout at the standard curve intercept. This approach is, however, not applicable to qPCR, which, due to its real-time readout, gives no reading for a negative sample (Cq for a blank sample is formally infinity). Instead, for an analytical process that involves qPCR, LOD can be estimated from multiple standard curves (Burns and Valdivia, 2008). A minimum of six is recommended, and concentrations around the expected LOD should be assessed. The measured data are converted to binary format indicating positive and negative PCRs, and the fraction of positive calls at each concentration is calculated. LOD is the concentration at which replicates are positive at the stated rate (e.g. 95%). GenEx fits the measured positive rates at different concentrations to estimate the LOD (Fig. 4.8).

Selecting reference genes
With qPCR, the amount of target molecules in a sample is measured rather than their concentration. A large sample is expected to contain more target molecules than a small one, and to compensate for the effect of size, normalization must be applied. There are several options to normalize. A popular option in gene expression analysis is to normalize with reference genes, since this should not only compensate for variation in sample amount, but also for variations in extraction yield, reverse transcription efficiency, and RNA quality. In the early days of PCR, genes needed for basic housekeeping functions were thought to have stable expression and could serve as references. Experience has shown this is not always true, and before a gene is used as a reference, this assumption should be validated. The criteria for a good reference gene are that it has stable expression among samples and that its expression is independent of the treatment applied. Stability of expression is reflected by the standard deviation (SD) of biological replicates. However, we cannot just take a set of samples, measure the Cqs and calculate the SDs, because we do not know how to normalize the samples for this exercise. Of course, we can use the same amount of RNA in the analyses, but then, how do we know that all samples were extracted and reverse transcribed with the same yield, and that they have the same mRNA/total RNA ratio and the same RNA quality? Furthermore, if we were to assume that we can evaluate genes' expression stabilities based on samples normalized to the same amount of RNA, then we would have already decided that total RNA is the best norm. The gene selected based on minimum SD measured on samples having the same amount of RNA will be the gene that shows a variation that correlates the most with that of the total amount of RNA, and we may then as well normalize to the amount of RNA directly. If we suspect that the total amount of RNA is not the best norm, we have to identify optimum reference genes using a different strategy.
Figure 4.4 Data prepared for analysis using a standard curve. Classification columns indicate concentrations of standards and sample replicates, and index test and training samples.
Figure 4.5 Groups are created in the GenEx Data Manager, optionally using information in classification columns (A). Samples are assigned training or test status, and may be reversibly inactivated (B).
Figure 4.6 (A) qPCR standard curve (line) with uncertainty indicated (dashed lines). (B) Residual plot showing the deviations of the standard points from the fitted standard curve. (C) Estimated slope, intercept and PCR efficiency for the standard curve, including confidence intervals.
An appropriate approach to select reference genes is a special form of analysis of variance, which in the qPCR literature is best known through the tool NormFinder (Andersen et al., 2004). NormFinder is applied to a panel of candidate reference genes that is analysed in a set of representative samples. In essence, NormFinder calculates a global average expression of all the genes in all the samples, to which the individual genes are compared. Based on this comparison, the SD for each candidate reference gene is estimated. Furthermore, if the samples are from different treatment groups, NormFinder separates the variation into an intragroup and an intergroup contribution. Fig. 4.9 shows an example where reference genes were sought for an obesity study in mice, where wild-type mice and an obese strain were compared. The genes were selected from the TATAA reference gene panel ( products/gene-expression-assays-and-panels/ View-all-products.html), which was measured on seven representative mice from each strain. The intragroup variation estimated is the SD of the genes in the different treatment groups, while the intergroup variation is differential expression and sums to zero for every gene over all the groups. Good reference genes shall have low intragroup variation in all groups and negligible intergroup variation. A good strategy using NormFinder is to inspect the calculated intragroup and intergroup variations to identify any genes that appear regulated or exceedingly unstable and remove them from the data set using the GenEx Data Manager. NormFinder analysis is then repeated without considering the groups, since the remaining genes are not regulated. This produces a more robust result with a single SD estimate for each gene, based on which the genes are ranked (Fig. 4.10). The gene with the lowest SD is the optimum reference gene.
Figure 4.7 (A) Calibration by means of the standard curve to estimate the test sample's concentration and its confidence interval. (B) Summary of estimated concentrations including confidence intervals in logarithmic (left) and linear (right) scales.
Figure 4.8 Estimation of LOD by fitting the rate of positive calls as a function of concentration.
Figure 4.9 Estimated intergroup and intragroup variations with NormFinder of genes from the TATAA reference panels in representative brain samples of wild-type and obese mice.
GenEx also calculates the accumulated SD expected if multiple reference genes are used for normalization. If we use a larger number of reference genes, random variation among the genes' expression partially cancels, reducing the SD. Comparing the SD contributed by different numbers of reference genes selected based on stability, a minimum in the accumulated SD plot is obtained, indicating the number of reference genes that gives the lowest SD (Fig. 4.10). However, analysing more genes costs time and money, and one should consider the degree of improvement and the overall noise contributed by the reference genes when making a decision. In the example, the largest improvement is observed when including the second reference gene; including additional reference genes only slightly improves the result. Furthermore, the noise contribution from the best reference gene is only 0.05 cycles, and as little as 0.04 cycles when combining the two best reference genes. Considering that the repeatability of a qPCR instrument is rarely less than 0.1 cycle (estimated as SD of technical replicates), using more than one reference gene, and definitely more than two, will in this study not improve the quality of the data appreciably.
Using NormFinder, normalization with reference genes can be compared with normalization with total RNA by adding an extra column in the data sheet with the RNA concentrations per analysed sample in logarithmic scale (Fig. 4.11). The algorithm is ignorant of the nature of the variables, and will compare their variation. For the data in our example, normalization with total RNA is essentially as stable as normalization with PPIA, which is the single optimum reference gene here.
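The accumulated SD discussed above can be illustrated under a simple assumption: if the noise of the candidate genes is independent, the SD of the average of the k most stable genes is the square root of the summed variances divided by k. This is a sketch of that assumption only, with hypothetical SD values, and is not taken from the NormFinder or GenEx implementation.

```python
import math


def accumulated_sd(sds):
    """Accumulated SD when averaging the k most stable genes, for k = 1..n.

    Assumes independent noise between genes, so the SD of the mean of
    k genes is sqrt(sum of their variances) / k.
    """
    ranked = sorted(sds)  # most stable gene first
    result = []
    total_var = 0.0
    for k, sd in enumerate(ranked, start=1):
        total_var += sd ** 2
        result.append(math.sqrt(total_var) / k)
    return result


# Hypothetical SDs (in cycles) for five candidate reference genes.
acc = accumulated_sd([0.30, 0.05, 0.15, 0.06, 0.10])
best_n = acc.index(min(acc)) + 1  # number of genes giving the lowest accumulated SD
```

With these made-up values the minimum falls at two genes: adding a third, noisier gene contributes more variance than the extra averaging removes.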
In this study the samples analysed were flash-frozen biopsies from mouse brains, from which RNA of very high quality (RIN 8-9) was extracted. Our experience is that for samples with high quality RNA, normalization to the total amount of RNA is often as good as normalizing with a single reference gene. In samples of poor RNA integrity, or when expression may have been induced, normalization with reference genes is preferred.
An older method to identify good reference genes that is still being used is geNorm. It uses the same input data as NormFinder, but it does not consider groups; all samples are treated as being from a single population. geNorm sequentially eliminates the gene that shows the highest variation relative to all the other genes based on paired expression values in all the studied samples. The variability is reflected by a so-called M-value (Fig. 4.12). Because of the elimination process, geNorm cannot identify an optimum reference gene, and ends up suggesting a pair of genes that shows high correlation and should be suitable for normalization. The M-value is related to the SD, but as calculated, the M-values for the genes are based on different sample sizes and are therefore not strictly comparable. Furthermore, as the comparison of any individual candidate gene is performed towards a plurality of genes, assumed to resemble most closely the anticipated stable behaviour, it is prone to systematic failure where a group of co-regulated unstable genes may be involved in the analysis. Any such co-regulated complex of unstable genes may dominate over the stable genes and hence point at deviant genes as candidates. Usually, the gene rankings by geNorm and NormFinder are similar, which is reassuring. Should the rankings differ, there would be reason to suspect that the selection includes one or more regulated genes, and the result should be interpreted with caution.
Figure 4.10 SD (A) and accumulated SD (B) for the reference gene candidates in the TATAA reference gene panel in 14 brain samples from mice, estimated using NormFinder.
Figure 4.11 (A) Input data for comparison of normalization with reference genes to normalization with total amount of RNA using NormFinder. A column indexing the used total RNA concentrations in logarithmic scale has been added. (B) Output indicating that for these particular samples, total RNA normalization is almost as stable as normalization with the single best reference gene.
Figure 4.12 Ranking of reference gene candidates by sequential elimination using geNorm. The final two genes cannot be compared further.

Relative quantification
Treatment groups are readily compared visually in bar graphs using descriptive statistics (Fig. 4.13), and statistical comparison is made using ANOVA (one factor, two or more levels) or two-way ANOVA (two factors, two or more levels each) or, in the case of two groups, with either a t-test (paired/unpaired, one-tail/two-tail) or non-parametric tests (Mann-Whitney, Wilcoxon). The difference between the groups is shown in either linear or logarithmic scale and the confidence interval is indicated (Fig. 4.14). Note that the confidence interval of the differential expression is asymmetric when data are presented in linear scale. When the expression of many genes is compared, GenEx offers several means to control for false positives arising from multiple testing, including Bonferroni, Benjamini-Hochberg, Westfall and Young, and Benjamini-Yekutieli (Fig. 4.15).
Figure 4.13 Comparison of the expression of four genes in three groups. Bars indicate mean expression and the error bars either SD, SE or CI.
Figure 4.14 Comparison with t-test for the expression of four genes between two groups. (A) Descriptive statistics include normality test and P-values. (B) Bar graph showing the differential expression in linear scale. Note that the indicated confidence intervals are asymmetric around the means.
Figure 4.15 Comparison of the expression of multiple genes between two groups corrected for the high false discovery rate due to multiple testing. Columns 2 and 3 indicate results of the normality test and column 4 the differential expression in log scale. Column 5 indicates P-values calculated with the t-test. Grey indicates P-values considered significant based on Bonferroni correction, light grey indicates P-values below the stipulated uncorrected confidence level that are not significant with Bonferroni correction, and dark grey indicates P-values above the confidence threshold. The last three columns indicate P-values calculated using Benjamini-Hochberg, Westfall and Young, and Benjamini-Yekutieli corrections for the high false discovery rate.

Expression profiling
The t-test and ANOVA are univariate methods that analyse the expression of every gene separately, effectively assuming that the genes are expressed independently of each other. This is rarely the case; genes' expressions tend to be correlated. This correlation can be exploited in the analyses using multivariate statistical methods. GenEx offers several unsupervised as well as a selection of supervised methods to classify samples and categorize genes based on expression profiles. Unsupervised methods classify samples and genes based on the measured profiles only. They include classical hierarchical clustering combined with a heat map, which can be based on various clustering schemes including Ward's algorithm and several distance measures including the Euclidean distance and the magnitude of the Pearson correlation. While the Euclidean distance clusters genes based on similarities and considers up-regulation and down-regulation to be opposite, hence anti-correlated, distance based on the magnitude of the Pearson correlation considers up-regulation and down-regulation to be correlated. The latter is useful to classify, for example, genes that show the same temporal response to treatment independently of the genes being up- or down-regulated. The clustering is visualized in a dendrogram, which in GenEx can be mirrored in every node. Mirroring in a node changes the visual appearance of the dendrogram, producing an equivalent mathematical solution. A small self-organizing map (SOM) can be used to force classification into a defined small number of groups based on expression similarities. SOM can also be used to validate a classification model, based on the distribution of samples/genes in a large map. Principal component analysis (PCA) groups samples/genes based on correlated expression in reduced space. Fig. 4.16 shows an example of hierarchical clustering, SOM, and PCA of genes expressed during the development of the African clawed frog Xenopus laevis from the oocyte to the tadpole stage (Bergkvist et al., 2010). Hierarchical clustering of samples and genes can be combined, showing also the measured intensities in a heat map (Fig. 4.17). The appearance of the heat map can be changed in GenEx within equivalent mathematical solutions by mirroring the dendrograms in nodes.
Figure 4.16 Classification of genes expressed during early development of Xenopus laevis based on hierarchical clustering (A), SOM (B), and PCA (C).
Supervised methods require a training set of samples with known classification; for example, negative and positive samples, or short, medium and long term survivals. A model is developed based on the training set that can be used to classify new data (in GenEx called test data). The procedure is similar to a regression based on a standard curve, but here it is based on multiple genes and the model does not have to be linear. Supervised methods available in GenEx include partial least squares (PLS), which is used to calculate a single standard curve based on the expression of multiple genes to predict concentrations or other measures of test samples. Potential curves is a variant for prediction of new data based on PCA, and artificial neural network (ANN) and support vector machines (SVM) are multivariate non-linear methods to classify samples. Logistic regression, Probit, receiver operating characteristics (ROC), and survival analysis will soon also be available.

Experimental design
Designing experiments is more difficult than analysing results. In a well-designed experiment, confounding variation is minimized and the number of subjects is sufficient to obtain conclusive results. A good strategy is to perform a fully nested pilot study before specifying the test protocol for a larger study (Fig. 4.18; Tichopad et al., 2009).
Fig. 4.19 shows the result of a nested pilot study wherein three heifers were studied by collecting three blood samples from each; each sample was reverse transcribed in triplicate, and each cDNA was analysed in triplicate using qPCR. Using a nested ANOVA, the variation arising from the different experimental steps can be estimated and expressed either as standard deviations or as variance contributions (Kitchen et al., 2010). While β-actin and caspase-3 show generally low standard deviations in all steps, interleukin-1β and interferon levels varied substantially among the heifers. For liver samples, the picture was different, with the data evidencing large variation in the sampling step (Tichopad et al., 2009).
Figure 4.17 Classification of genes and samples visualized with dendrograms and a heat map indicating expression levels. The heat map appearance can be changed by mirroring in any of the dendrograms' nodes.
Figure 4.18 Nested experimental design. Three subjects (heifers) are tested; three samples are collected from each subject and extracted; the extracted material is reverse transcribed in triplicate. Each cDNA is analysed in triplicate by qPCR.
Knowing the costs associated with the different experimental steps, the follow-up study can be cost optimized. For example, for genes exhibiting SD = 0.1 cycle for the qPCR and RT steps, SD = 0.2 cycles for the sampling/extraction step, and SD = 1 cycle for the variation among the animals, and assuming a cost of 1 unit for the qPCR, 3 units for the RT, 10 units for sampling/extraction, and 100 units for each animal, with a total budget of 1000 units, the best we can do is to analyse eight animals, sample each animal once, perform RT in triplicate and qPCR in duplicate. The total standard error (SE) for this study is expected to be about 0.36 cycles (Fig. 4.20). Using the same tool, we also estimated the SD among animals analysed with a single sample collected from each animal, RT performed in triplicate and qPCR in duplicate to be 1.02 cycles. This can then be fed into a power analysis to estimate the number of animals needed to ensure a particular difference with certain confidence and power.
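The cost optimization above can be sketched as an exhaustive search, using the SDs, unit costs and budget quoted in the text. The model is an assumption: variances from the nested levels are taken to be independent and additive, and each cost is charged once per replicate at its level. The function names are hypothetical and this is not the GenEx algorithm.

```python
import math
from itertools import product

# SDs (cycles) and unit costs per level, as quoted in the example in the text.
SD = {"animal": 1.0, "sample": 0.2, "rt": 0.1, "qpcr": 0.1}
COST = {"animal": 100, "sample": 10, "rt": 3, "qpcr": 1}


def standard_error(n_a, n_s, n_rt, n_q):
    """SE of the grand mean in a fully nested design (independent levels assumed)."""
    var = (SD["animal"] ** 2 / n_a
           + SD["sample"] ** 2 / (n_a * n_s)
           + SD["rt"] ** 2 / (n_a * n_s * n_rt)
           + SD["qpcr"] ** 2 / (n_a * n_s * n_rt * n_q))
    return math.sqrt(var)


def total_cost(n_a, n_s, n_rt, n_q):
    """Cost of the design: every lower level is repeated within the level above it."""
    per_rt = COST["rt"] + n_q * COST["qpcr"]
    per_sample = COST["sample"] + n_rt * per_rt
    return n_a * (COST["animal"] + n_s * per_sample)


def best_design(budget, max_n=12):
    """Exhaustive search for the affordable design with the lowest SE."""
    candidates = (d for d in product(range(1, max_n + 1), repeat=4)
                  if total_cost(*d) <= budget)
    return min(candidates, key=lambda d: standard_error(*d))


design = best_design(1000)  # (animals, samples, RT replicates, qPCR replicates)
se = standard_error(*design)
# SD of a single animal's mean with 1 sample, RT in triplicate, qPCR in duplicate.
sd_between_animals = standard_error(1, 1, 3, 2)
```

Under these assumptions the search spends the budget mainly on animals, because the variation among animals dominates all other levels.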
If we accept a 5% false-positive rate (95% confidence) and a 5% false-negative rate (95% power), we can construct a graph showing how many subjects are needed to measure a particular difference due to treatment. For example, to measure a 2-fold difference (ΔCq of 1) we require under these criteria 15 animals (Fig. 4.20).

Conclusions
GenEx is the first professional software that offers the full palette of methods for qPCR data analysis and qPCR experimental design. The user interface is attractive and the graphical output is of high quality. The help file is adequate and the audiovisual tutorials based on example data are excellent. A big plus is the free support forum, where hundreds of GenEx users share experiences and advice (www.qpcrforum.com). The important support from virtually all leading qPCR instrument manufacturers is also most valuable.

Figure 4.19 Decomposition of the variance in a four-level nested study of blood samples collected from heifers. (A) Estimated standard deviations (SD) among heifers (grey), among replicate blood samplings (dark grey), among reverse transcription replicates (white), and among qPCR replicates (black). (B) Same data presented as variance contributions expressed in percentages.

Future trends
qPCR is rapidly becoming a broadly accepted platform, which leads to a need for standardization and quality control. This will require future software to be GLP, ISO, and Title 21 CFR Part 11 compliant. We also expect others to follow Roche and Exiqon, which already offer excellent wizards for transferring data and experimental design information to qPCR analysis software. Integration with Laboratory Information Management Systems (LIMS) is also expected.
