Understand the distinct patterns of selection in auto-immune diseases with ancient DNA data by the S-LDSC model
by
Yating Zeng
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
May 2023
Copyright 2023 Yating Zeng
Acknowledgments
I am grateful to everyone who helped me along the way.
First and most important is my mentor, Professor Steven Gazal. He taught me more than I could ever give him credit for here. He has shown me, by his own example, what a good researcher should be and what a kind person would do. He always tried his best to help me figure out my problems and taught me with great patience. I truly appreciate having met such a wonderful professor, who encouraged me and taught me a lot about research in the field of genetics.
I am also thankful to Professor Charleston Chiang and Professor Nicholas Mancuso, whose insightful feedback and constructive criticism challenged me to think critically and deeply about my research, pushing me to strive for excellence in my work.
Last but not least, every member of the Gazal Lab was always friendly and generous with support and encouragement. I am fortunate to have learned and worked in such a great environment.
Thank you all very much! The days we spent working together will be a very precious memory for me!
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Method
  2-1 DNA dataset of modern and ancient Europeans
  2-2 $F_{ST}$ computation
  2-3 Creation of $F_{ST}$ annotations capturing allele frequency changes at different time points
  2-4 Stratified LD Score regression
Chapter 3: Results
  3-1 Variants with recent allele frequency changes are enriched in important functional regions and in genes related to immunity
  3-2 Variants with recent allele frequency changes are enriched in the heritability of human diseases and complex traits
Chapter 4: Conclusion and discussion
References
List of Tables
Table 1: Time period and corresponding sample sizes used in this study
Table 2: Paired time periods used for enrichment analysis
Table 3: Genes regulated by SNPs with high $F_{ST}$ (present vs 4-5K) keeping MHC genes in the pathway analyses
Table 4: Genes regulated by SNPs with high $F_{ST}$ (present vs 4-5K) without keeping MHC genes in the pathway analyses
List of Figures
Figure 1: S-LDSC results for the trend of enrichment in 6 important functional regions across three different time scales
Figure 2: Disease heritability enrichment of SNPs with high $F_{ST}$
Abstract
While the human genome and human phenotypes have been shaped by around 5-6 million years of evolution (Carroll, 2003), it is unclear if and how they have been impacted by recent selection events (<10K years). Here, we investigated how recent changes in allele frequency have shaped the genetic architectures of human diseases and complex traits by leveraging time-dependent allele frequencies from 554 modern Europeans and 5,362 ancient Europeans, and results from 63 independent genome-wide association studies. We show that variants with high allele frequency differences between modern Europeans and Europeans living 4K years ago were enriched in functional variants, in variants targeting genes involved in immune function, and in the heritability of human diseases and complex traits (especially blood and immune phenotypes). These results strongly suggest that recent selection events have impacted human disease risks in Europeans.
Chapter 1: Introduction
Genome-wide association studies (GWAS) and analyses of components of heritability have demonstrated that common diseases and complex traits are overwhelmingly polygenic and pleiotropic (Abdellaoui et al., 2023; Visscher et al., 2012, 2017), motivating efforts to understand the processes underlying complex trait architectures. Recent studies have highlighted the impact of long-term negative selection on deleterious variants associated with human diseases and complex traits, and its key role in shaping the genetic architecture of these traits (Agarwala et al., 2013; Eyre-Walker, 2010; Fuchsberger et al., 2016; Gazal et al., 2018; Mancuso et al., 2016; O’Connor et al., 2019; Schoech et al., 2019; Simons et al., 2018; Yang et al., 2015; Zeng et al., 2018; Zuk et al., 2014). Over the past 10,000 years, significant changes in human population size, migration patterns, and cultural practices may all have imposed selection pressures and influenced the genetic makeup of modern humans (Hawks et al., 2007; Keinan & Clark, 2012; Laland et al., 2010; Pritchard et al., 2010). However, it is unclear whether genetic adaptation due to recent selective pressure events (i.e. <10K years ago) has also contributed to shaping the genetic architecture of human diseases and complex traits (i.e. resulting in significantly high heritability enrichments of the functional alleles), as well as when these events happened.
Recent advances in ancient DNA techniques have made it possible to generate enough genetic data to directly study recent selection by tracking allele frequency changes over time (Patterson et al., 2022). Here, we leveraged time-dependent allele frequencies from 554 modern Europeans and 5,362 ancient Europeans and results from 63 independent European GWAS to investigate how recent changes in allele frequency have shaped the genetic architectures of human diseases and complex traits in Europeans. We observed that variants with high allele frequency changes between now and 4K years ago were enriched in functional variants, in variants targeting genes involved in immune function, and in the heritability of human diseases and complex traits (especially blood and immune phenotypes). These results strongly suggest that recent selection events have impacted human disease risks in Europeans.
Chapter 2: Method
2-1 DNA dataset of modern and ancient Europeans
The (unpublished) dataset analyzed in this study was obtained through a collaboration with the David Reich laboratory (Patterson et al., 2022). It consists of 554 sequenced modern Europeans from the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015), and 5,362 genotyped ancient Europeans from 8 time periods (Table 1), with genotypes imputed over 9,081,148 SNPs. For each time period (present and 8 ancient periods), we obtained allele frequency estimates that were corrected for population structure by an unpublished method developed by the Reich lab.
Time period   Starting time   Ending time   Mean time   # of samples
Present       0               0             0           554
1-999         1               999           739         436
1K-2K         1,000           2,000         1,448       1,315
2K-3K         2,000           3,000         2,447       1,207
3K-4K         3,000           4,000         3,572       736
4K-5K         4,000           5,000         4,438       730
5K-6K         5,000           6,000         5,527       349
6K-7K         6,000           7,000         6,492       341
7K-8K         7,000           8,000         7,410       248
Table 1: Time period and corresponding sample sizes used in this study. Times are given in years before present.
2-2 $F_{ST}$ computation
In this study, we used $F_{ST}$ to quantify allele frequency changes between 2 time periods. $F_{ST}$ is a measure of gene differentiation between and within populations (Bhatia et al., 2013; Nei, 1986), which can be defined as
$$F_{ST} = \frac{D'_{ST}}{H_T},$$
where $D'_{ST}$ is the average gene diversity between populations and $H_T$ is the diversity of the two population samples.
Based on this model, the estimator for bi-allelic SNPs and two populations is
$$F_{ST} = \frac{(p_1 - p_2)^2}{2 p_{avg} (1 - p_{avg})},$$
where
$$p_{avg} = \frac{p_1 + p_2}{2},$$
and $p_1$ and $p_2$ are the allele frequencies in the two populations. Since we cannot observe $p_1$ and $p_2$, the observed sample allele frequencies $\hat{p}_1$ and $\hat{p}_2$ were used instead, with
$$Var(\hat{p}_1 - \hat{p}_2) \approx \left(2F_{ST} + \frac{1}{2N_1} + \frac{1}{2N_2}\right) p_{avg} (1 - p_{avg}),$$
where $N_i$ is the number of samples with valid genotypes from population $i$, for $i \in \{1, 2\}$.
Thus the estimator used in this research can be written as
$$F_{ST} = E\left(\frac{(\hat{p}_1 - \hat{p}_2)^2 - \left(\frac{1}{2N_1} + \frac{1}{2N_2}\right) p_{avg} (1 - p_{avg})}{2 p_{avg} (1 - p_{avg})}\right).$$
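For a single SNP, the sample-size-corrected estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `fst_estimate` and the example frequencies are hypothetical, and $p_{avg}$ is assumed to be computed from the observed sample frequencies.

```python
def fst_estimate(p1_hat, p2_hat, n1, n2):
    """Sample-size-corrected F_ST for one bi-allelic SNP and two populations.

    p1_hat, p2_hat: observed allele frequencies in the two samples.
    n1, n2: number of individuals with valid genotypes in each sample.
    Returns None when the average frequency is 0 or 1 (zero denominator).
    """
    p_avg = (p1_hat + p2_hat) / 2.0
    het = p_avg * (1.0 - p_avg)
    if het == 0.0:
        return None
    # Sampling-noise term: (1/(2*N1) + 1/(2*N2)) * p_avg * (1 - p_avg)
    noise = (1.0 / (2 * n1) + 1.0 / (2 * n2)) * het
    return ((p1_hat - p2_hat) ** 2 - noise) / (2.0 * het)


# Hypothetical SNP: frequency 0.30 in 554 modern and 0.20 in 730 ancient samples.
example = fst_estimate(0.30, 0.20, 554, 730)
```

Because the sampling-noise term is subtracted, a SNP with identical observed frequencies in both samples gets a slightly negative value, which is expected for this kind of correction.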
2-3 Creation of $F_{ST}$ annotations capturing allele frequency changes at different time points
A genome annotation is the assignment of a binary or continuous value to every genetic variant. In this study, we created annotations based on the allele frequency changes between two periods of time. We considered 3 different approaches, each comparing allele frequencies at 8 different time points using the $F_{ST}$ measure. For the first approach, we compared allele frequencies across adjacent millennial bins. For the second approach, we compared allele frequencies before and after a specific millennial time point. For the third approach, we compared allele frequencies between the present time and different millennial bins. In total, we defined 23 distinct $F_{ST}$ definitions (Table 2); the "Present vs. 1-999" comparison is shared by approaches 1 and 3.
Approach 1          Approach 2         Approach 3
Present vs. 1-999   Present vs. >0K    Present vs. 1-999
1-999 vs. 1K-2K     <1K vs. >1K        Present vs. 1K-2K
1K-2K vs. 2K-3K     <2K vs. >2K        Present vs. 2K-3K
2K-3K vs. 3K-4K     <3K vs. >3K        Present vs. 3K-4K
3K-4K vs. 4K-5K     <4K vs. >4K        Present vs. 4K-5K
4K-5K vs. 5K-6K     <5K vs. >5K        Present vs. 5K-6K
5K-6K vs. 6K-7K     <6K vs. >6K        Present vs. 6K-7K
6K-7K vs. 7K-8K     <7K vs. >7K        Present vs. 7K-8K
Table 2: Paired time periods used for enrichment analysis
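The three sets of comparisons in Table 2 can be enumerated programmatically, which also shows why its 24 cells yield 23 distinct definitions. The bin labels are taken from Table 2; the variable names are illustrative only.

```python
ANCIENT_BINS = ["1-999", "1K-2K", "2K-3K", "3K-4K", "4K-5K", "5K-6K", "6K-7K", "7K-8K"]
ALL_BINS = ["Present"] + ANCIENT_BINS

# Approach 1: each bin against the next older (adjacent) bin.
approach1 = [(ALL_BINS[i], ALL_BINS[i + 1]) for i in range(len(ANCIENT_BINS))]

# Approach 2: samples younger than a millennial cut point against older samples.
approach2 = [("Present", ">0K")] + [(f"<{k}K", f">{k}K") for k in range(1, 8)]

# Approach 3: the present against each ancient bin.
approach3 = [("Present", b) for b in ANCIENT_BINS]

# "Present vs. 1-999" appears in approaches 1 and 3, so the union has 23 entries.
definitions = set(approach1) | set(approach2) | set(approach3)
print(len(approach1), len(approach2), len(approach3), len(definitions))  # prints: 8 8 8 23
```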
Then, for each $F_{ST}$ definition, we defined its corresponding annotation by annotating SNPs with the top 1% $F_{ST}$ values within 10 different (European) MAF bins (the same MAF bins as the S-LDSC baseline-LD model (Finucane et al., 2015; Gazal et al., 2017)). We stratified $F_{ST}$ values by MAF because the magnitude of $F_{ST}$ depends on MAF: it is easier for a rare variant to have a high $F_{ST}$ than for a common variant.
To characterize the function of variants with high $F_{ST}$ values for each approach, we computed their functional enrichment and gene enrichment. We computed the functional enrichment of a given annotation A as
$$\text{Functional Enrichment} = \frac{N^{A}_{recent} / N_{recent}}{N_A / N},$$
where $N^{A}_{recent}$ is the number of common variants (excluding the MHC locus) with recent allele frequency changes that are in A, $N_{recent}$ is the number of common variants with recent allele frequency changes, $N_A$ is the number of common variants that are in A, and $N$ is the number of common variants. The standard errors of the enrichment were computed using the block jackknife procedure and were then used to derive the corresponding 95% CI. We computed the gene enrichment of variants in annotation A by linking these variants to their target genes using the cS2G method (Gazal et al., 2022), and by running Gene Ontology analyses with the R goseq package (Young et al., 2010).
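The annotation and functional-enrichment steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the default of 200 jackknife blocks is an assumption (the text does not give the number of blocks), SNPs are assumed to be ordered along the genome, and the 95% CI is the usual normal approximation.

```python
import math


def top_fraction_within_bins(fst, maf_bin, frac=0.01):
    """Return booleans flagging SNPs in the top `frac` of F_ST within their MAF bin."""
    members_by_bin = {}
    for idx, b in enumerate(maf_bin):
        members_by_bin.setdefault(b, []).append(idx)
    flags = [False] * len(fst)
    for members in members_by_bin.values():
        # Assumption: round to the nearest count and keep at least one SNP per bin.
        n_top = max(1, round(frac * len(members)))
        ranked = sorted(members, key=lambda i: fst[i], reverse=True)
        for i in ranked[:n_top]:
            flags[i] = True
    return flags


def functional_enrichment(recent, in_a):
    """(N_recent^A / N_recent) / (N_A / N) for two boolean lists over the same SNPs."""
    n = len(recent)
    n_recent = sum(recent)
    n_a = sum(in_a)
    n_recent_a = sum(1 for r, a in zip(recent, in_a) if r and a)
    return (n_recent_a / n_recent) / (n_a / n)


def block_jackknife(recent, in_a, n_blocks=200):
    """Enrichment, its leave-one-block-out jackknife SE, and a 95% CI.

    SNPs are assumed sorted by genomic position so that blocks are contiguous.
    """
    n = len(recent)
    bounds = [round(b * n / n_blocks) for b in range(n_blocks + 1)]
    full = functional_enrichment(recent, in_a)
    leave_one_out = []
    for b in range(n_blocks):
        lo, hi = bounds[b], bounds[b + 1]
        leave_one_out.append(
            functional_enrichment(recent[:lo] + recent[hi:], in_a[:lo] + in_a[hi:])
        )
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    return full, se, (full - 1.96 * se, full + 1.96 * se)
```

Ranking within each MAF bin, rather than genome-wide, keeps the annotation from being dominated by the frequency classes in which high $F_{ST}$ values are easiest to reach.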
2-4 Stratified LD Score regression
Stratified LD score regression (S-LDSC) is a method that uses GWAS summary statistics to partition heritability across overlapping binary and continuous annotations (Finucane et al., 2015; Gazal et al., 2017).
In this method, a sample of N individuals is assumed to have a vector of quantitative phenotypes $y = (y_1, ..., y_N)$, standardized to mean 0 and variance 1. The infinitesimal linear model is written as
$$y = X\beta + \varepsilon,$$
where $X$ is a standardized $N \times M$ matrix of genotypes, $\beta = (\beta_1, ..., \beta_M)$ is a vector of per-SNP effect sizes, and $\varepsilon = (\varepsilon_1, ..., \varepsilon_N)$ is a vector of residuals with mean 0 and variance $\sigma_e^2$. In this model, $\beta_j$ is assumed to have mean 0 and a variance depending on C continuous-valued annotations:
$$Var(\beta_j) = \sum_c a_c(j) \tau_c,$$
where $a_c(j)$ is the value of annotation $a_c$ at SNP j, and $\tau_c$ represents the contribution of annotation $a_c$ to per-SNP heritability, per unit of annotation (conditioned on all other annotations). Under this model,
$$E[\chi_j^2] = N \sum_c \tau_c \, l(j, c) + 1,$$
where
$$l(j, c) = \sum_k a_c(k) \, r_{jk}^2$$
is the LD score of SNP j with respect to annotation $a_c$, and $r_{jk}^2$ is the squared correlation between SNPs j and k (Finucane et al., 2015; Gazal et al., 2017). This equation allows us to estimate the values of $\tau_c$, that is $\hat{\tau}_c$, given a vector of $\chi_j^2$ statistics and the LD scores computed from a reference sample, which in this case was the European samples of the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015). Standard errors of $\hat{\tau}_c$ are estimated using a block jackknife procedure, and corresponding P-values using z scores.
The heritability enrichment of an annotation A is defined as the ratio between the proportion of heritability explained by SNPs in the annotation and the proportion of analyzed SNPs that are in the annotation (here set to 1%):
$$Enrichment(A) = \frac{\sum_{j \in A} \sum_c a_c(j) \tau_c \, / \, \sum_j \sum_c a_c(j) \tau_c}{M_A / M},$$
where $M_A$ is the number of common variants in A, and $M$ is the total number of common variants. Standard errors are estimated using a block jackknife procedure.
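To make the regression and the enrichment definition concrete, the sketch below computes LD scores from an annotation matrix and an $r^2$ matrix, estimates $\tau_c$ by least squares, and evaluates the enrichment ratio. It is a deliberately simplified illustration: it uses unweighted least squares with the intercept fixed at 1, as in the equation above, and omits the regression weights and other refinements of the published method. All function names are hypothetical.

```python
def ld_scores(annot, r2):
    """l(j, c) = sum_k a_c(k) * r2[j][k], where annot[k][c] holds a_c(k)."""
    m, n_annot = len(annot), len(annot[0])
    return [[sum(annot[k][c] * r2[j][k] for k in range(m)) for c in range(n_annot)]
            for j in range(m)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def fit_tau(chi2, ld, n_samples):
    """Unweighted least squares for E[chi2_j] = N * sum_c tau_c * l(j, c) + 1."""
    n_annot = len(ld[0])
    x = [[n_samples * v for v in row] for row in ld]
    y = [v - 1.0 for v in chi2]  # the intercept is fixed at 1
    xtx = [[sum(r[i] * r[k] for r in x) for k in range(n_annot)] for i in range(n_annot)]
    xty = [sum(r[i] * yj for r, yj in zip(x, y)) for i in range(n_annot)]
    return solve(xtx, xty)


def heritability_enrichment(annot, tau, col):
    """Share of heritability in binary annotation `col` divided by its share of SNPs."""
    per_snp = [sum(a * t for a, t in zip(row, tau)) for row in annot]
    h2_in_a = sum(v for v, row in zip(per_snp, annot) if row[col] == 1)
    m_a = sum(1 for row in annot if row[col] == 1)
    return (h2_in_a / sum(per_snp)) / (m_a / len(annot))
```

In a toy setting with noise-free $\chi^2$ statistics generated from known $\tau_c$ values, `fit_tau` recovers them exactly, which is a convenient sanity check before moving to real summary statistics.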
S-LDSC was run using European datasets with recommended settings (Finucane et al., 2015; Gazal et al., 2017). We ran S-LDSC on 63 independent European GWAS summary statistics (Gazal et al., 2022), removing the MHC locus, and performed random-effect meta-analyses on $\tau^*$ and enrichment estimates. We performed additional analyses on 11 blood and immune traits (White Blood Cell Count, Red Blood Cell Count, Platelet Count, Red Blood Cell Distribution Width, Eosinophils Count, Auto Immune Traits (Sure), Rheumatoid Arthritis, Ulcerative Colitis, Crohn's disease, Celiac Disease, and Systemic Lupus Erythematosus) and 9 brain disorders (Neuroticism, Bipolar Disorder and Schizophrenia, Insomnia, Major Depressive Disorder, Attention Deficit Hyperactivity Disorder, Bipolar Disorder vs. Schizophrenia, Anorexia, Alzheimer's Disease, and Autism Spectrum).
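The text specifies a random-effect meta-analysis across traits but not the estimator. The sketch below uses the DerSimonian-Laird estimator purely as an assumption to illustrate how per-trait estimates and their standard errors can be pooled; the function name and any input values are hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-trait estimates.

    Returns (pooled estimate, its standard error, between-trait variance).
    """
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * x for wi, x in zip(w, estimates)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (x - fixed) ** 2 for wi, x in zip(w, estimates))
    k = len(estimates)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    between_var = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight each trait by the inverse of its total (within + between) variance.
    w_star = [1.0 / (se ** 2 + between_var) for se in std_errors]
    pooled = sum(wi * x for wi, x in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, between_var
```

When the per-trait estimates agree closely, the between-trait variance is estimated as zero and the result coincides with a fixed-effect, inverse-variance-weighted mean.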
Chapter 3: Results
3-1 Variants with recent allele frequency changes are enriched in important functional regions and in genes related to immunity
We investigated the functional relevance of variants with recent allele frequency changes by looking at their enrichment in functional regions, and by performing gene set enrichment of the genes they are linked to.
Figure 1: S-LDSC results for the trend of enrichment in 6 important functional regions across 3 different time scales. Error bars represent 95% confidence intervals and were plotted when the functional enrichment was significantly greater than 1; corresponding errors were computed using the block jackknife procedure.
First, we observed that variants with recent allele frequency changes were extremely enriched in non-synonymous variants, enriched in synonymous and UTR variants, and depleted in repressed and intron variants (Figure 1). Functional enrichments were constant across time points both when using adjacent millennial bins (approach 1) and when comparing before and after a specific millennial time point (approach 2), but had low and high magnitudes, respectively (Figures 1A and 1B). When comparing allele frequencies between the present time and different millennial bins (approach 3), we observed that functional enrichments started being higher when comparing with periods more than 2K years ago (Figure 1C). Our results were consistent with the ones obtained on HapMap II $F_{ST}$ (Barreiro et al., 2008, Nat Genet), and were similar for $F_{ST}$ annotations using the top 0.5% and top 2% $F_{ST}$ (data not shown).
Second, we observed that variants with recent allele frequency changes tend to be linked to genes related to immunity (Table 3). We replicated these results when removing genes within the MHC locus (Table 4), showing that recent allele frequency changes impacted the whole genome and were not restricted to the MHC locus. In addition, genes previously reported to be under selection (Le et al., 2022; Mathieson et al., 2015) show similar enrichment in our study: in total, 14 of the 26 significant genes cited in these two papers were among the genes regulated by SNPs with high $F_{ST}$ (present vs 4-5K). Notably, 13 of the 16 significant genes reported by Mathieson et al. displayed similar enrichment in our findings.
Term                                          # of genes   P          fdrP
MHC protein complex                           18/22        3.36E-15   3.62E-11
MHC class II protein complex                  13/14        8.80E-13   4.74E-09
adaptive immune response                      90/391       1.85E-12   6.63E-09
peptide antigen binding                       16/23        2.04E-11   5.49E-08
interferon-gamma-mediated signaling pathway   33/87        2.69E-11   5.79E-08
Table 3: Genes regulated by SNPs with high $F_{ST}$ (present vs 4-5K) keeping MHC genes in the pathway analyses
Term                                          # of genes   P          fdrP
regulation of cell adhesion                   110/625      2.33E-08   2.50E-04
leukocyte cell-cell adhesion                  57/301       2.20E-08   5.80E-03
regulation of response to external stimulus   134/910      4.09E-06   6.25E-03
regulation of inflammatory response           56/308       6.79E-06   8.68E-03
lymphocyte activation                         97/613       7.30E-06   8.68E-03
Table 4: Genes regulated by SNPs with high $F_{ST}$ (present vs 4-5K) without keeping MHC genes in the pathway analyses
Altogether, our results show that recent allele frequency changes target variants that tend to be functional and genes involved in immune function, suggesting that our $F_{ST}$ annotations are more likely to capture selection events related to adaptation to new environments (such as pathogens) than genetic drift (i.e. the change of allele frequencies due to random processes).
3-2 Variants with recent allele frequency changes are enriched in the heritability of human diseases and complex traits
S-LDSC was applied to the 23 $F_{ST}$ annotations on 63 independent diseases and complex traits, and results were meta-analyzed across traits. We observed extremely significant heritability enrichment (1.88-fold, enrichment $P = 2 \times 10^{-5}$) for variants whose allele frequency changed ~4K years ago (Approach 3; Figure 2). We also observed significant conditional effects for their corresponding annotations ($\tau^*$ $P = 3 \times 10^{-9}$), implying that allele frequency changes bring new information to the heritability model beyond the functional annotations of the baseline-LD model. We observed no significant enrichment using annotations from Approach 1, and similar and significant enrichments using annotations from Approach 2 (Figure 2).
Figure 2: Disease heritability enrichment and the P-value of conditional effect of SNPs with high $F_{ST}$. Results were meta-analyzed across 63 independent traits (black), 11 blood and immune traits (red), and 9 brain disorders (blue).
Because our $F_{ST}$ annotations are more likely to capture selection events related to adaptation to new environments (Table 3), we performed new S-LDSC analyses restricted to 11 blood and immune traits. We observed higher heritability enrichments for variants whose allele frequency changed ~4K years ago than in the previous analyses (4.42-fold, enrichment $P = 3 \times 10^{-4}$, $\tau^*$ $P = 1 \times 10^{-8}$; see red dots in Figure 2). Interestingly, we also found significant results (for conditional effects only) for variants whose allele frequency changed ~2K years ago (2.11-fold, enrichment $P = 0.03$, $\tau^*$ $P = 2 \times 10^{-4}$), suggesting that recent changes might have impacted blood and immune traits.
Because the modern human brain has been evolving for more than 100K years (Neubauer et al., 2018), it is unlikely that brain function or brain-related phenotypes would undergo significant changes over 8,000 years, a relatively short period in evolutionary terms. Although small genetic variations may occur, they are unlikely to result in significant alterations to brain function or phenotype. We therefore expected that recent selection did not impact human brain function and brain-related phenotypes, and replicated our S-LDSC analyses on 9 brain disorders as a negative control. We observed no heritability enrichment for any of the $F_{ST}$ annotations (see blue dots in Figure 2).
Finally, while we obtained these conclusions using an arbitrary threshold of the top 1% $F_{ST}$, we note that our conclusions were replicated when using the top 0.5% $F_{ST}$ and top 2% $F_{ST}$ (data not shown).
Altogether, our results suggest that allele frequency changes appearing ~4,000 years ago impacted human diseases and complex traits, and that allele frequency changes appearing ~2,000 years ago impacted blood and immune traits.
Chapter 4: Conclusion and discussion
In this project, we showed that variants with high allele frequency differences between modern Europeans and Europeans living 4K years ago were enriched in functional variants and in variants targeting genes involved in immune function, suggesting that these differences are more likely to capture selection events related to adaptation to new environments (such as pathogens) than genetic drift. We next showed that these variants were enriched in human disease and complex trait heritability (especially blood and immune phenotypes), suggesting that recent selective pressures have impacted complex traits.
Our findings have several implications for downstream analyses. First, it will be of interest to validate that variants with high allele frequency differences in Europeans are enriched in heritability in East Asian populations, and to check whether or not these variants are subject to gene-by-environment interaction by comparing allele disease effect sizes (Shi et al., 2021). Second, it raises the question of whether recent allele frequency changes were driven by directional selection or stabilizing selection on human complex traits. We could test these two hypotheses by estimating mean polygenic risk scores and genetic variance (respectively) of different phenotypes using allele frequency estimates at different points in time. This would allow us to test the hygiene hypothesis, which suggests that adaptation of our immune system to new environments has increased our risk of autoimmune diseases. However, new methods are needed to obtain such estimates. Similarly, it also raises the question of whether recent allele frequency changes were driven by positive selection (i.e., an increase of the derived allele frequency) or negative selection (i.e., a decrease of the derived allele frequency) on those variants. Dissecting the direction of derived allele frequency changes through time would help to answer this question. Finally, it would be interesting to investigate the allele frequency changes of particular disease variants that have been fine-mapped to human phenotypes.
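As a rough illustration of the two quantities mentioned above, the textbook expressions for a population's mean polygenic score and additive genetic variance can be evaluated from allele frequencies and per-allele effect sizes. This is only a naive sketch: it assumes Hardy-Weinberg equilibrium, independent SNPs (no LD), and effect sizes known without error, which is the kind of simplification that the new methods called for above would need to avoid. The function names are hypothetical.

```python
def mean_prs(freqs, betas):
    """Expected polygenic score in a population: sum_j 2 * p_j * beta_j."""
    return sum(2.0 * p * b for p, b in zip(freqs, betas))


def genetic_variance(freqs, betas):
    """Additive genetic variance under HWE and no LD: sum_j 2 * p_j * (1 - p_j) * beta_j**2."""
    return sum(2.0 * p * (1.0 - p) * b * b for p, b in zip(freqs, betas))
```

Evaluating both functions with the same effect sizes but with allele frequencies from different time periods would give a trajectory of the mean (relevant to directional selection) and of the variance (relevant to stabilizing selection).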
Altogether, our results strongly suggest that recent selection events have impacted human disease risks in Europeans, and provide exciting opportunities for new research directions.
References
1000 Genomes Project Consortium, Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L., McCarthy, S., McVean, G. A., & Abecasis, G. R. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74.
Abdellaoui, A., Yengo, L., Verweij, K. J. H., & Visscher, P. M. (2023). 15 years of GWAS discovery: Realizing the promise. American Journal of Human Genetics. https://doi.org/10.1016/j.ajhg.2022.12.011
Agarwala, V., Flannick, J., Sunyaev, S., GoT2D Consortium, & Altshuler, D. (2013). Evaluating empirical bounds on complex disease genetic architecture. Nature Genetics, 45(12), 1418–1427.
Bhatia, G., Patterson, N., Sankararaman, S., & Price, A. L. (2013). Estimating and interpreting FST: the impact of rare variants. Genome Research, 23(9), 1514–1521.
Carroll, S. B. (2003). Genetics and the making of Homo sapiens. Nature, 422(6934), 849–857. https://doi.org/10.1038/nature01495
Eyre-Walker, A. (2010). Evolution in health and medicine Sackler colloquium: Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proceedings of the National Academy of Sciences of the United States of America, 107 Suppl 1, 1752–1756.
Finucane, H. K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., Anttila, V., Xu, H., Zang, C., Farh, K., Ripke, S., Day, F. R., ReproGen Consortium, Schizophrenia Working Group of the Psychiatric Genomics Consortium, RACI Consortium, Purcell, S., Stahl, E., Lindstrom, S., Perry, J. R. B., … Price, A. L. (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics, 47(11), 1228–1235.
Fuchsberger, C., Flannick, J., Teslovich, T. M., Mahajan, A., Agarwala, V., Gaulton, K. J., Ma, C., Fontanillas, P., Moutsianas, L., McCarthy, D. J., Rivas, M. A., Perry, J. R. B., Sim, X., Blackwell, T. W., Robertson, N. R., Rayner, N. W., Cingolani, P., Locke, A. E., Tajes, J. F., … McCarthy, M. I. (2016). The genetic architecture of type 2 diabetes. Nature, 536(7614), 41–47.
Gazal, S., Finucane, H. K., Furlotte, N. A., Loh, P.-R., Palamara, P. F., Liu, X., Schoech, A., Bulik-Sullivan, B., Neale, B. M., Gusev, A., & Price, A. L. (2017). Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nature Genetics, 49(10), 1421–1427.
Gazal, S., Loh, P.-R., Finucane, H. K., Ganna, A., Schoech, A., Sunyaev, S., & Price, A. L. (2018). Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nature Genetics, 50(11), 1600–1607.
Gazal, S., Weissbrod, O., Hormozdiari, F., Dey, K. K., Nasser, J., Jagadeesh, K. A., Weiner, D. J., Shi, H., Fulco, C. P., O’Connor, L. J., Pasaniuc, B., Engreitz, J. M., & Price, A. L. (2022). Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nature Genetics, 54(6), 827–836.
Hawks, J., Wang, E. T., Cochran, G. M., Harpending, H. C., & Moyzis, R. K. (2007). Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences, 104(52), 20753–20758. https://doi.org/10.1073/pnas.0707650104
Keinan, A., & Clark, A. G. (2012). Recent Explosive Human Population Growth Has Resulted in an Excess of Rare Genetic Variants. Science, 336(6082), 740–743. https://doi.org/10.1126/science.1217283
Laland, K. N., Odling-Smee, J., & Myles, S. (2010). How culture shaped the human genome: bringing genetics and the human sciences together. Nature Reviews. Genetics, 11(2), 137–148.
Le, M. K., Smith, O. S., Akbari, A., Harpak, A., Reich, D., & Narasimhan, V. M. (2022). 1,000 ancient genomes uncover 10,000 years of natural selection in Europe. bioRxiv: The Preprint Server for Biology. https://doi.org/10.1101/2022.08.24.505188
Mancuso, N., Rohland, N., Rand, K. A., Tandon, A., Allen, A., Quinque, D., Mallick, S., Li, H., Stram, A., Sheng, X., Kote-Jarai, Z., Easton, D. F., Eeles, R. A., PRACTICAL consortium, Le Marchand, L., Lubwama, A., Stram, D., Watya, S., Conti, D. V., … Reich, D. (2016). The contribution of rare variation to prostate cancer heritability. Nature Genetics, 48(1), 30–35.
Mathieson, I., Lazaridis, I., Rohland, N., Mallick, S., Patterson, N., Roodenberg, S. A., Harney, E., Stewardson, K., Fernandes, D., Novak, M., Sirak, K., Gamba, C., Jones, E. R., Llamas, B., Dryomov, S., Pickrell, J., Arsuaga, J. L., de Castro, J. M. B., Carbonell, E., … Reich, D. (2015). Genome-wide patterns of selection in 230 ancient Eurasians. Nature, 528(7583), 499–503.
Nei, M. (1986). Definition and estimation of fixation indices. Evolution; International Journal of Organic Evolution, 40(3), 643–645.
Neubauer, S., Hublin, J.-J., & Gunz, P. (2018). The evolution of modern human brain shape. Science Advances, 4(1). https://doi.org/10.1126/sciadv.aao5961
O’Connor, L. J., Schoech, A. P., Hormozdiari, F., Gazal, S., Patterson, N., & Price, A. L. (2019). Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. American Journal of Human Genetics, 105(3), 456–476.
17
Patterson, N., Isakov , M., Booth, T ., Büster , L., Fischer , C.-E., Olalde, I., Ringbauer , H., Akbari,
A., Cheronet, O., Bleasdale, M., Adamski, N., Altena, E., Bernardos, R., Brace, S.,
Broomandkhoshbacht, N., Callan, K., Candilio, F ., Culleton, B., Curtis, E., … Reich, D.
(2022). Lar ge-scale migration into Britain during the Middle to Late Bronze Age. Natur e ,
601(7894), 588–594.
Pritchard, J. K., Pickrell, J. K., & Coop, G. (2010). The genetics of human adaptation: hard
sweeps, soft sweeps, and polygenic adaptation. Curr ent Biology: CB , 20(4), R208–R215.
Schoech, A. P., Jordan, D. M., Loh, P.-R., Gazal, S., O’Connor, L. J., Balick, D. J., Palamara, P.
F., Finucane, H. K., Sunyaev, S. R., & Price, A. L. (2019). Quantification of
frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of
negative selection. Nature Communications, 10(1), 790.
Shi, H., Gazal, S., Kanai, M., Koch, E. M., Schoech, A. P., Siewert, K. M., Kim, S. S., Luo, Y.,
Amariuta, T., Huang, H., Okada, Y., Raychaudhuri, S., Sunyaev, S. R., & Price, A. L.
(2021). Population-specific causal disease effect sizes in functionally important regions
impacted by selection. Nature Communications, 12(1), 1098.
Simons, Y. B., Bullaughey, K., Hudson, R. R., & Sella, G. (2018). A population genetic
interpretation of GWAS findings for human quantitative traits. PLoS Biology, 16(3),
e2002985.
Visscher, P. M., Brown, M. A., McCarthy, M. I., & Yang, J. (2012). Five years of GWAS
discovery. American Journal of Human Genetics, 90(1), 7–24.
Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., & Yang, J.
(2017). 10 Years of GWAS Discovery: Biology, Function, and Translation. American
Journal of Human Genetics, 101(1), 5–22.
Yang, J., Bakshi, A., Zhu, Z., Hemani, G., Vinkhuyzen, A. A. E., Lee, S. H., Robinson, M. R.,
Perry, J. R. B., Nolte, I. M., van Vliet-Ostaptchouk, J. V., Snieder, H., LifeLines Cohort
Study, Esko, T., Milani, L., Mägi, R., Metspalu, A., Hamsten, A., Magnusson, P. K. E.,
Pedersen, N. L., … Visscher, P. M. (2015). Genetic variance estimation with imputed
variants finds negligible missing heritability for human height and body mass index. Nature
Genetics, 47(10), 1114–1120.
Young, M. D., Wakefield, M. J., Smyth, G. K., & Oshlack, A. (2010). Gene ontology analysis for
RNA-seq: accounting for selection bias. Genome Biology, 11(2), R14.
Zeng, J., de Vlaming, R., Wu, Y., Robinson, M. R., Lloyd-Jones, L. R., Yengo, L., Yap, C. X.,
Xue, A., Sidorenko, J., McRae, A. F., Powell, J. E., Montgomery, G. W., Metspalu, A.,
Esko, T., Gibson, G., Wray, N. R., Visscher, P. M., & Yang, J. (2018). Signatures of negative
selection in the genetic architecture of human complex traits. Nature Genetics, 50(5),
746–753.
Zuk, O., Schaffner, S. F., Samocha, K., Do, R., Hechter, E., Kathiresan, S., Daly, M. J., Neale, B.
M., Sunyaev, S. R., & Lander, E. S. (2014). Searching for missing heritability: designing
rare variant association studies. Proceedings of the National Academy of Sciences of the
United States of America, 111(4), E455–E464.
Abstract
While the human genome and human phenotypes have been shaped by around 5-6 million years of evolution (Carroll, 2003), it is unclear if and how they have been impacted by recent selection events (<10K years). Here, we investigated how recent changes in allele frequency have shaped the genetic architectures of human diseases and complex traits by leveraging time-dependent allele frequencies from 554 modern Europeans and 5,362 ancient Europeans, and results from 63 independent genome-wide association studies. We show that variants with high allele frequency differences between modern Europeans and Europeans living 4K years ago were enriched in functional variants, in variants targeting genes involved in immune function, and in heritability of human diseases and complex traits (especially within blood and immune phenotypes). These results strongly suggest that recent selection events have impacted human disease risks in Europeans.
Asset Metadata
Creator: Zeng, Yating (author)
Core Title: Understand the distinct patterns of selection in auto-immune diseases with ancient DNA data by the S-LDSC model
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biostatistics
Degree Conferral Date: 2023-05
Publication Date: 04/05/2023
Defense Date: 04/04/2023
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: auto-immune diseases, GWAS, meta, OAI-PMH Harvest, S-LDSC
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Gazal, Steven (committee chair), Chiang, Charleston (committee member), Mancuso, Nicholas (committee member)
Creator Email: yatingze@usc.edu,yatingzeng12@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC112932358
Unique identifier: UC112932358
Identifier: etd-ZengYating-11567.pdf (filename)
Legacy Identifier: etd-ZengYating-11567
Document Type: Thesis
Rights: Zeng, Yating
Internet Media Type: application/pdf
Type: texts
Source: 20230405-usctheses-batch-1016 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu