Identification of Differentially Connected Gene Expression Subnetworks
in Asthma Symptom  

by

XIAOWEI WU  


A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)


May 2020



Copyright 2020 XIAOWEI WU
Acknowledgments
I would like to thank Dr. Joshua Millstein for being my great mentor and for being
incredibly supportive and helpful. In addition, I would like to thank my committee members Dr.
Wendy Mack and Dr. Meredith Franklin for their expertise and guidance.
This work is dedicated to my parents for all of their endless love and support.

TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Abstract
1. Introduction
2. Methods
2.1 Asthma BRIDGE Data
2.2 Gene-Gene Interaction
2.3 FDR and Parameter-based Approach
2.4 Subnetworks Recognition
2.5 Gene-Set Enrichment Analysis
2.6 Statistical Analysis Tool
3. Results
3.1 Demographics of the Asthma BRIDGE Samples
3.2 Interaction Gene Pairs Results
3.3 Permutation-based FDR Results
3.4 Subnetwork Construction Results
3.5 Comparison with CAMP data
3.6 Gene-Set Enrichment Analysis
4. Discussion
Bibliography
Appendix
Table A1: GSEA Results

List of Tables
Table 1: Characteristics of the Asthmatic Participants in ABRIDGE and CAMP Datasets
Table 2: Characteristics of each node in asthmatic genes network

List of Figures
Figure 1: Comparison of distribution of marginal p-value in initial and average permuted data
Figure 2: Interaction p-value comparison for initial and average number of permutations
Figure 3: Examples of permutation results
Figure 4: 70 significant gene-gene interactions discovered under threshold=3.45 (100 permutations)
Figure 5: Significant results (FDR < 0.05) under different numbers of permutations
Figure 6: Network Structure
Figure 7: Replication Results in CAMP

Abstract  
Background: Gene subnetworks have been demonstrated to be involved in the pathogenesis of
complex genetic diseases such as asthma. However, it is still a significant challenge to construct
accurate subnetworks from gene expression network data.  
Methods: We developed a permutation-based, non-parametric method to identify differentially
connected subnetworks as well as indispensable genes within the overall network. The selection
of gene-gene interactions is based on a multivariable regression model and permutation-based
FDR approaches. We used Gephi to draw the gene network and conduct the network
analysis, and the KOBAS system to review Gene Ontology (GO) annotations for entries in
significant gene lists and to search for enrichment patterns. The analysis was based on the
ABRIDGE dataset, which includes 245 asthmatics and expression data for 18,388 genes.
Results: Forty-eight significant gene-gene interactions were discovered under an FDR threshold of
0.05. We identified five significant subnetworks, which are unique to the ABRIDGE expression data
since no similar pattern was discovered in the permuted data. However, we were unable to
replicate the network in the asthma comparison group.
Conclusion: GPR44, MYB, VLDLR, ALOX15, INDO, and OLIG2 play key roles among the
gene subnetworks we discovered. The hub nodes and skeleton substructures of subnetworks are
consistent with prior knowledge regarding asthma pathways. Enrichment analysis revealed that
the list of top genes is enriched with asthma genes. Our study illustrates the feasibility and
validity of this method, and it could be an alternative method for targeting subnetworks that are
associated with different symptoms in complex diseases.
1. Introduction  
Asthma, characterized by respiratory symptoms and reversible airflow obstruction, is a
chronic inflammatory disease of the airways as well as one of the most common chronic diseases
around the world[1]. Based on the Global Asthma Report (2018), nearly 300 million people
worldwide had been diagnosed with asthma by 2018, and the number of patients is likely to reach
400 million by 2025[2].
Accumulated data and research on asthma have built a strong foundation of associative, full-scale
gene network theory that describes the pathogenic mechanism of asthma, which has an
estimated heritability ranging from 35 to 70%[3]. For example, genome-wide association studies
(GWASs) have identified 36 asthma risk loci that met the statistical significance threshold and
were replicated successfully in several subsequent studies[4]. Hundreds of genes and gene-gene
interactions have been identified as significant contributors to the broad genetic heterogeneity in
asthma[5].
Gene-gene interaction indicates a relationship between two or more different gene loci,
which contribute to the same phenotype in a non-additive way[6]. Research has verified that
asthma susceptibility is impacted primarily through gene-gene interactions[7]. For instance, the
genetic interaction between IL4RA and IL13 towards bronchial hyperresponsiveness contributes
to asthma susceptibility[8]. The synergistic effect of the IL6 and IL6R genes modifies bronchodilator
drug responsiveness in asthma[9]. Since gene-gene interactions leading to asthma have also been
discovered in multiple studies, addressing such interactions may be critical for asthma
treatment[10].  
Analysis of asthma gene networks and subnetworks has become a major research topic in
recent years. A gene network is a collection of genes that interact with each other, and a gene
subnetwork is a separately identifiable part of a gene network.  A subnetwork that is constructed
of biologically significant genes is termed an active subnetwork[8]. Although gene
networks and specific gene interactions in asthma have been reported, searching for active
subnetworks, that is, sets of genes and interactions that carry out an essential biological function,
has been a challenge[11]. One reason is that the selection of active subnetworks is
computationally difficult[12], involving a massive number of gene tests; another is that generally
recognized methods require parametric assumptions[12]. Therefore, the understanding and
identification of subnetworks are constrained, especially when gene expression data are inadequate
and contain only a limited number of genes.
Various approaches have been developed to extract subnetworks from gene networks using
gene expression data. Mainstream methods include BioNet, jActiveModules, and PinnacleZ.
BioNet is an integrative analysis method that can target functional modules within gene-gene
interaction networks. BioNet scores each node in the network based on a beta-uniform
mixture model of the assigned p-values, ranks subnetworks by score, and chooses the
maximum-scoring one. At its core, BioNet formulates the search as a Prize-Collecting Steiner Tree
(PCST) problem and solves it with an integer linear programming algorithm. BioNet detects
differences among functional modules when the characteristic data and p-values are available[13].
Both jActiveModules and PinnacleZ are plugins for Cytoscape, an open-source
software platform for visualizing complex networks and integrating data. They provide
functions to search for active subnetworks that show statistically significant changes in expression
data across different conditions. jActiveModules comprises two algorithms, simulated
annealing and greedy search, whereas PinnacleZ uses only greedy search. Simulated annealing
is designed to search for the highest-scoring subnetwork, and greedy search extends a
subnetwork by adding the neighboring gene that maximizes a mutual information-based
function. However, jActiveModules is not applicable to extensive high-dimensional data because of
its computational complexity and large memory requirement[12]. These main existing
approaches to identifying active subnetworks are based on construction of the overall genetic
network under a parametric assumption and a ranking system. Under massive multiple testing
conditions, parametric tests rely heavily on the extreme tails of the distribution, which are not
consistent with real data[14]. It may also be necessary to discover latent differentially
connected subnetworks where the parameters of the gene expression distribution are unspecified.
In this study, we developed a new method to detect differentially connected gene
subnetworks. To address the challenges arising from parametric assumptions, we implemented
Millstein and Volfson’s FDR approach[14], a practical permutation-based method for calculating the FDR.
Permutation-based FDR provides a non-parametric way to adjust for multiple testing error,
where precision is proportional to the number of permutations. Gene expression data were re-assessed,
and the FDR was computed based on 100 permutations, in which the expression data were randomly
shuffled over the gene labels in the dataset, followed by re-computation of the significance level.
With an FDR significance threshold of 0.05 for all inferred network edges, that is, the gene-gene
interactions, our approach identified five differential subnetworks within the
ABRIDGE asthma genetic dataset. We applied Gephi to visualize the subnetworks in graph form.
Gephi is network visualization software applicable in various areas; one of its key
functions is displaying a spatialization process[15]. We also conducted a Gene Ontology (GO)
functional analysis to confirm the significance of our gene list. The results show the feasibility of
the new method. Moreover, the causal analytical model we introduced showed potential
usefulness in identifying subnetworks for therapeutic strategies, drug discovery and
development, and gene-oriented treatment.
2. Methods  
2.1 Asthma BRIDGE Data  
The Asthma Bio Repository for Integrative Genomic Exploration (Asthma BRIDGE) is an
NIH/NHLBI-supported initiative to develop a publicly accessible resource consisting of
lymphoblastoid cell lines from asthmatics and controls participating in genetic studies of
asthma[16]. The project includes a collection of 1,542 individuals with asthma with
comprehensive phenotype and genomic data[17][18][19]. The definition of asthma in ABRIDGE
is based on the presence of asthma symptoms, confirmed through a questionnaire, the
use of an inhaled bronchodilator more than once per week, or the use of daily asthma
medication for the six months before the interview. We used the tested asthma score together with
DNA methylation and genome-wide SNP data, which were obtained from the whole blood samples of
576 participants. Samples were randomized to avoid confounding by experimental batch. We excluded
samples with missing values in order to assemble the most informative data for this analysis. The
final data set included 245 asthmatic adults for whom we had complete data, including their full
gene expression data.
We used the Childhood Asthma Management Program (CAMP) as a comparison group to test
the validity of our ABRIDGE study results. CAMP is a multicenter, randomized, placebo-controlled,
double-blind trial established to study the long-term effects of commonly prescribed
asthma treatment regimens[20]. Participants, aged 16 to 26, were subsequently
followed for a mean of 4.3 years, with lung function studies and questionnaires at regular intervals.
We used 604 samples from the CAMP dataset in the current analysis. ABRIDGE and CAMP
used the same test for asthma severity level, which is the asthma
outcome (dependent variable) in our model.
2.2 Gene-Gene Interaction  
The three broad subnetwork-defining approaches introduced above (BioNet,
jActiveModules, and PinnacleZ) share two features: construction of a global gene network
from gene expression data, and a scoring function for identifying active subnetwork edges.
In contrast, our goal is to identify subnetworks based on significant edges, starting by
testing individual edges and narrowing them down according to statistical significance level. A
conventional linear regression-based approach was used to estimate gene-gene interactions in
this study. One strength of linear regression is that it offers a relatively easy way to interpret
significant edges between gene pairs. Linear regression is also well supported by
mathematical theory; numerous studies have demonstrated its credibility in
genetic interaction studies[21][22][23].
Before modeling the multivariable interactions, we conducted a marginal analysis to select
the statistically significant main-effect genes, with an alpha level of 0.05 as the cutoff for
significance. The rationale for applying marginal analysis is two-fold: (1) there were a total of
18,388 human genes in the ABRIDGE dataset, so creating gene pairs from the entire set and
testing their association would be impractical and computationally expensive; (2) under the
independence criterion, filtering out insignificant and low-expression genes improves the
detection sensitivity for differentially expressed genes[24]. With
fixed covariates, marginal effects show the magnitude of influence and the level of
significance of each single gene on the outcome, which is useful for reducing redundancy.
Suppose that a total of $P$ genes are measured on each of $n$ samples. Let $Y$ denote
the response vector of trait measures for the $n$ patients. Assume that the trait is modeled by
linear regression, denoted

$$Y = \beta_p X_p + \varepsilon_p, \qquad p = 1, \dots, P \tag{1}$$

where $\beta_p$ is the regression coefficient for gene $p$ (the $P$ coefficients together form a
$P$-dimensional vector), $X_p$ is the column for gene $p$ in the $n \times P$ design matrix of gene
expression data for the $n$ asthmatics, and $\varepsilon_p$ is the error vector. For generality and
simplicity, all features are centered, and adjusting covariates
and intercepts are not indicated in the formula. The covariates in the analysis included age, race,
gender, patient site, and three principal components (PCs).
At a significance level of 0.05, we identified the set of significant genes in the marginal
analysis and selected the 100 genes with the lowest p-values for further
analysis. Compared with carrying all genes significant at an alpha level of 0.05 into the next
steps, using the top 100 genes guarantees not only computational efficiency but also that the number
of tested gene pairs is consistent between the permuted and initial datasets.
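The screening step can be illustrated with a minimal sketch. It is not the code used in this study: it assumes a simple one-gene regression without the adjusting covariates, uses toy data, and the names `marginal_t` and `top_genes` are hypothetical. Because every gene is tested with the same degrees of freedom, ranking genes by the absolute t statistic gives the same order as ranking by two-sided p-value.

```python
import math
import random

def marginal_t(x, y):
    """t statistic for the slope in a simple regression of y on x (no covariates)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

def top_genes(expr, y, k):
    """expr maps gene name -> expression values (one per sample); keep the k strongest."""
    ranked = sorted(expr, key=lambda g: abs(marginal_t(expr[g], y)), reverse=True)
    return ranked[:k]

# Toy data (assumption): 20 noise genes plus one gene strongly tied to the outcome.
rng = random.Random(0)
n = 50
y = [rng.gauss(0, 1) for _ in range(n)]
expr = {f"gene{i}": [rng.gauss(0, 1) for _ in range(n)] for i in range(20)}
expr["signal"] = [v + rng.gauss(0, 0.1) for v in y]

selected = top_genes(expr, y, k=5)
```

Fixing `k` in advance, rather than keeping every gene below the alpha cutoff, is what keeps the number of downstream gene pairs identical across the observed and permuted datasets.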
In genome-wide studies, multivariable regression models are often used for the
detection of gene-gene interactions, which can be represented by a coefficient term[25]. We
created a multivariable model to estimate the significance level of all
gene pairs. Suppose that a total of 100 genes (p genes) are measured on each of n samples.
The interaction data were generated from this model:

$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon \tag{2}$$

The marginal analysis model and the interaction testing model are similar in that
both adjust for the same covariates and PCs. For the interaction models, we used two
gene variables, $X_1$ and $X_2$, which represent two specific genes from the 100 genes. We posited
the existence of an interaction ($X_1 X_2$) when relating each gene pair to the asthma outcome and
added the corresponding interaction variable. The resulting set of gene-gene interaction p-values
was computed and used in the subsequent subnetwork identification.
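A minimal sketch of the interaction test for one gene pair is given below. It makes several assumptions that are not from this study: the adjusting covariates are omitted, the data are simulated, the p-value uses a normal approximation to the t distribution (reasonable only for large n), and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    k = len(m)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def interaction_test(x1, x2, y):
    """Fit y ~ 1 + x1 + x2 + x1*x2 by least squares; return (beta3, t3, approx p3)."""
    n, k = len(y), 4
    X = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((v - sum(b * c for b, c in zip(beta, r))) ** 2 for r, v in zip(X, y))
    s2 = rss / (n - k)                      # residual variance
    t3 = beta[3] / math.sqrt(s2 * inv[3][3])
    p3 = 2 * (1 - NormalDist().cdf(abs(t3)))  # normal approximation (assumption)
    return beta[3], t3, p3

# Simulated pair with a true interaction coefficient of 0.5 (toy data).
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [rng.gauss(0, 1) for _ in range(200)]
y = [1 + 2 * a - b + 0.5 * a * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
b3, t3, p3 = interaction_test(x1, x2, y)

n_pairs = math.comb(100, 2)  # number of gene pairs among the 100 selected genes
```

Only the coefficient on the product term is tested; the two main-effect terms stay in the model so that the interaction estimate is not confounded with them.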
Although the multiple linear regression model is a statistically valid and straightforward
method, certain challenges require proper solutions. One challenge is that a parametric assumption
is required to estimate this model accurately. Another challenge is the substantial increase in type
I error under multiple genetic testing[23]. For the first challenge, permutation can be
used to select the significant genes and to evaluate the significance of the corresponding gene-pair
edges. A permutation test is a statistical significance test in which the distribution of the test
statistic under the null hypothesis is simulated by randomly permuting the data and recomputing the
test statistic. If the null hypothesis is true, changing the exposure would not affect the outcome[14].
As a result, a large number of test statistic values is generated under the null hypothesis by
repeated permutation analysis. For data that do not satisfy parametric assumptions, which applies to
most of the data analyzed here, permutation increases model robustness[26].
By randomly reassigning patient IDs in the gene expression data before merging with
the clinical data, we re-ran the whole analysis on the gene expression data while
maintaining the underlying correlation structure of the data. With all covariates unchanged, our
permutation process yielded 101 sets of gene-gene interaction p-values, one from the non-permuted
dataset and 100 from the permuted datasets, which were used for subnetwork identification in the next step.
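The permutation scheme described above can be sketched as follows. This is an illustration under assumed toy data, with hypothetical names: whole expression profiles are shuffled across patients, so each profile stays intact (preserving gene-gene correlation) while its link to the clinical record is broken.

```python
import random

def permute_expression(expr_rows, rng):
    """Reassign whole expression profiles to patients at random."""
    order = list(range(len(expr_rows)))
    rng.shuffle(order)
    return [expr_rows[i] for i in order]

# Toy data (assumption): 6 patients, 3 genes; clinical records are left untouched.
clinical = [{"id": f"P{i}", "score": 4 + i} for i in range(6)]
expr_rows = [[i, 10 * i, 100 * i] for i in range(6)]

rng = random.Random(42)
permuted_sets = [permute_expression(expr_rows, rng) for _ in range(100)]

# Each permuted set is paired row by row with the unchanged clinical data
# before the regression models are refitted.
merged_first = list(zip(clinical, permuted_sets[0]))
```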
2.3 FDR and Parameter-based Approach  
For the second challenge, we applied the False Discovery Rate (FDR) to adjust the
interaction p-values for multiplicity in testing. The FDR, proposed by Benjamini and Hochberg in
1995[27], describes the expected proportion of falsely rejected null hypotheses among all rejected
null hypotheses. The Benjamini and Hochberg (BH) FDR procedure is adaptable under different forms of
dependency and accurately controls the FDR at a prescribed level[28]. The FDR procedure
contributes to a significant gain in power and addresses multiple testing and large sample size
problems[29]. However, several researchers have raised concerns about the BH FDR procedure
regarding the uncertain effect of gene interactions under non-parametric circumstances[28][30].
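For reference, the BH adjustment can be written in a few lines. The sketch below is a generic implementation of the standard step-up procedure on made-up p-values, not code from this study; `bh_adjust` is a hypothetical name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

An interaction is declared significant at an FDR level of 0.05 when its adjusted p-value is at most 0.05.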
For our multivariable linear regression model, which faces multiple testing challenges,
estimating the null distributions accurately is the primary goal. Compared to the BH FDR
procedure, the Millstein and Volfson approach (JM FDR; R package "fdrci",
https://CRAN.R-project.org/package=fdrci) is a less conservative FDR approach, in which permutation
is used to estimate the FDR and confidence intervals for the FDR estimates. Although a
permutation-based approach is computationally intensive, it takes into account multiple testing and
dependencies among tests, owing to its flexibility and generality. JM FDR can also calculate
confidence intervals to estimate the precision of the FDR point estimate under increasingly
stringent FDR thresholds, providing an estimate of the reliability of the FDR results[31]. As a
sensitivity analysis, BH-adjusted interaction p-values for the non-permuted data were
computed for comparison with the JM FDR results.
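The idea behind a permutation-based FDR estimate can be sketched as below. This is a simplified point estimate only, written under stated assumptions: it divides the average number of permuted statistics passing a threshold by the number of observed statistics passing it, and it omits the confidence interval that the cited approach also reports. It is not the "fdrci" implementation, and the function name and toy values are hypothetical.

```python
def permutation_fdr(observed, permuted, threshold):
    """Estimate FDR at a threshold from observed and permuted test statistics.

    observed: list of statistics (larger = more extreme)
    permuted: list of lists, one list of statistics per permutation
    """
    s_obs = sum(1 for v in observed if v >= threshold)
    if s_obs == 0:
        return None  # no discoveries, so the ratio is undefined
    s_perm = sum(1 for perm in permuted for v in perm if v >= threshold)
    expected_false = s_perm / len(permuted)  # mean count per permuted dataset
    return min(1.0, expected_false / s_obs)

# Toy statistics (assumption): 3 observed discoveries, 1 exceedance in 4 permutations.
observed = [5.0, 4.2, 3.9, 1.0, 0.5, 0.2]
permuted = [
    [3.6, 1.0, 0.4, 0.3, 0.2, 0.1],
    [2.0, 1.5, 0.9, 0.3, 0.2, 0.1],
    [1.8, 1.1, 0.7, 0.6, 0.2, 0.1],
    [2.9, 0.8, 0.5, 0.4, 0.3, 0.1],
]
fdr = permutation_fdr(observed, permuted, threshold=3.5)
```

Evaluating the estimate over a grid of increasingly stringent thresholds gives the kind of curve from which a working threshold can be chosen.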
2.4 Subnetworks Recognition  
After 100 permutation analyses were conducted, with statistical significance defined by an
FDR threshold of 0.05 and an upper 95% CI bound less than 1, we defined all
interactions at least as extreme as the threshold as connected network components, which
provide the candidate edges of the subnetworks. We can therefore construct
subnetworks from individual edges rather than constructing the global network first and then
extracting active subnetworks based on a given score. To
select significant subnetworks, we considered a connected subnetwork valid if it contained more
than two nodes. The weight of an edge is defined as the negative log-transformed p-value.
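The edge-first construction can be sketched as a connected-components search over the significant edges. The example below is illustrative only: the gene names and p-values are made up, the base-10 logarithm is an assumption (the base of the log transformation is not stated), and `subnetworks` is a hypothetical name.

```python
import math
from collections import defaultdict

def subnetworks(edge_pvalues, threshold, min_nodes=3):
    """Keep edges whose weight passes the threshold; return components and weights.

    edge_pvalues maps (gene_a, gene_b) -> interaction p-value.
    """
    # Edge weight = negative log p-value (base 10 assumed here).
    weights = {e: -math.log10(p) for e, p in edge_pvalues.items()}
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                      # depth-first search from an unseen node
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) >= min_nodes:        # "more than two nodes"
            components.append(comp)
    return components, weights

# Hypothetical gene pairs and p-values.
edges = {("G1", "G2"): 1e-6, ("G2", "G3"): 1e-5, ("G4", "G5"): 1e-4, ("G5", "G6"): 0.2}
components, weights = subnetworks(edges, threshold=3.45)
```

Here the pair G4 and G5 passes the threshold but forms a two-node component, so it is not counted as a subnetwork.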
Gephi is network visualization software designed to facilitate the nonlinear process of
information discovery[15]. It performs a spatialization process that transforms the
network into a map. It computes network parameters such as the number of nodes and edges,
and it reports several properties of the network. To evaluate the clusters within the network and
the centrality of vertices in the genetic network graph, we calculated several properties with Gephi.
Modularity class separates clusters whose members interact more among themselves than with other
clusters within the network; degree counts the number of connections for each gene; betweenness
centrality represents the degree to which a node stands between other nodes and clusters. A node
with higher betweenness centrality has a strong influence over the flow of information between
different subgroups[15]. These properties help us understand how vital each node is and how
the subnetworks affect each other.
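To make these node properties concrete, the sketch below computes degree, closeness centrality, and unnormalized betweenness centrality (Brandes' algorithm) for a small undirected graph. It is a generic stdlib implementation, not Gephi's code. The example graph is the three-gene component IL5RA, GPR114, INDO, with its two edges inferred from the degrees reported in Table 2; closeness is computed within the node's own connected component, which is an assumption about the convention used.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, node):
    """Inverse of the mean distance to the other reachable nodes."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # w is reached via a shortest path through v
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each unordered pair was counted twice

adj = {"IL5RA": {"GPR114", "INDO"}, "GPR114": {"IL5RA"}, "INDO": {"IL5RA"}}
degree = {v: len(nbrs) for v, nbrs in adj.items()}
btw = betweenness(adj)
clo = {v: closeness(adj, v) for v in adj}
```

For this component the sketch gives degree 2, betweenness 1, and closeness 1 for IL5RA, and closeness 0.666667 for the two leaf genes, which agrees with the corresponding rows of Table 2.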
2.5 Gene-Set Enrichment Analysis  
Gene set enrichment analysis (GSEA) is a powerful method to find general trends in the
enormous lists of genes or proteins generated by many functional genomics techniques and
bioinformatics analyses[32]. Multiple databases have been created as source information for GSEA,
including the Gene Ontology (GO) knowledgebase, which is the world’s largest source of
information on the functions of genes[33]. GO supports all types of high-throughput experiments.
The GO knowledgebase is applied extensively in genetic studies because of its
vast number of GO categories, which can be tested for gene set enrichment[33].
KOBAS is a widely used GSEA tool, covering 5,945 species with incorporated
knowledge[34]. We applied KOBAS 3.0 (http://kobas.cbi.pku.edu.cn/kobas3/?t=1) to analyze the
significant genes (nodes) in our subnetworks by comparing their enrichment signature patterns to
the enrichment signatures of matched genes in the GO knowledgebase. Such analysis provides a
global visualization of critical regulatory differences between normal genes and asthma genes and
demonstrates the most significantly enriched functions in the selected set of genes.  
2.6 Statistical Analysis Tool  
Data were analyzed and plotted using R software v 3.2.3 (https://www.rproject.org/).
Network graphs were drawn with Gephi 0.9.2 (https://gephi.org/). KOBAS 3.0
(http://kobas.cbi.pku.edu.cn/kobas3) was used to perform gene-list enrichment tests.
3. Results  
We tested our method, with gene interactions identified and adjusted by the JM FDR
approach, to detect differentially connected subnetworks. Five analyses were performed: 1)
detection of significant interaction terms between pairs of genes selected by the
marginal analysis; 2) random permutation of the gene data and repetition of the calculation in
step 1, 100 times; 3) adjustment of the p-values by the JM FDR approach; 4)
construction of subnetworks and network structure analysis; 5) a gene-list
enrichment test of the significant genes.
3.1 Demographics of the Asthma BRIDGE Samples  
Two hundred forty-five persons with asthma participated in this study; full demographic
details of the research samples are presented in Table 1. Briefly, the average age was 22, ranging
from 11.7 to 31.6. Gender was evenly distributed, with 121 males and 124 females. The largest
race/ethnicity group was Hispanic, accounting for about 72% of the sample. Europeans
accounted for 14% of the sample, and the remaining participants were of other races.
The average values of the principal components (PC1, PC2, and PC3) were 4.81, 0.60, and
-1.17, respectively. The average asthma test score was 7.46. Over half received a high school or
college education and the majority reported incomes less than $30,000 per year. The characteristics
of the 604 CAMP participants are also listed in Table 1.

Table 1: Characteristics of the Asthmatic Participants in ABRIDGE and CAMP Datasets
Characteristic             ABRIDGE (n = 245)        CAMP (n = 604)
Age, mean (range)          22.0 (11.7, 31.6)        20.9 (16.5, 25.8)
Gender
  Male                     121 (49%)                376 (63%)
  Female                   124 (51%)                228 (37%)
Race
  Hispanic                 177 (72%)                59 (10%)
  European                 34 (14%)                 413 (68%)
  Other                    34 (14%)                 132 (22%)
Site Name
  CHS/USC                  107 (44%)
  MCCAS                    138 (56%)
  CAMP                                              604 (100%)
Principal Component 1      4.81 (-66.28, 72.65)     0.16 (-57.83, 79.40)
Principal Component 2      0.60 (-63.44, 48.94)     0.05 (-53.80, 49.64)
Principal Component 3      -1.17 (-36.74, 55.13)    0.134 (-31.491, 29.838)
Outcome*                   7.46 (4, 20)             1.35 (0, 25)
Numbers in parentheses indicate range or percentage.
*Outcome: test score for asthmatics diagnosed with asthma in the last 6 months.

3.2 Interaction Gene Pairs Results  
The number of significant genes at an alpha level of 0.05 in our non-permuted marginal
analysis was 872, which is close to the number we would expect for unbiased p-values under the
null hypothesis (0.05 * 18388 = 919). Figure 1 shows that the distribution of the average permuted
marginal p-value is closer to a normal distribution than the distribution of
the p-value of the initial data. The average permuted p-value was calculated by summing the
ordered p-values across permutations and dividing by the number of permutations.

Figure 1: Comparison of distribution of marginal p-value in initial and average permuted data
3.3 Permutation-based FDR Results  
Selecting the 100 genes with the lowest p-values yielded 4950 gene-gene
interaction tests in each observed and permuted gene expression dataset. In Figure 2, the
distribution of the average permuted p-value appears less extreme than the
initial p-value distribution. Figure 3 shows some examples from the 100
permutations; the permuted p-values approximate the range of test statistic values we
could have seen under the null hypothesis better than the initial distribution does.
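The count of 4950 tests is the number of unordered pairs that can be formed from 100 genes:

```latex
\[
\binom{100}{2} = \frac{100 \times 99}{2} = 4950
\]
```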

Figure 2: Interaction p-value comparison for initial and average number of permutations

Figure 3: Examples of permutation results
In the JM FDR method, the definition of the FDR and its confidence interval is supported by up and
downstream of each gene pair[14]. P-values from the multivariable linear regression model were
calculated and adjusted by the JM FDR approach. Using a stepwise increasing threshold, the
estimated FDR declined most sharply at the beginning, then gradually decreased towards a
minimum FDR (Figure 4). The corresponding confidence intervals on the FDR show the same
pattern, with narrower limits as the estimated FDR reached a minimum. Integer values shown
along the plot specify the number of gene interactions that are at least as extreme as the
thresholds specified on the horizontal axis, pointing out the edges we are looking for in the
subnetwork structures. Since more extreme values do not correspond to lower rates of false
discoveries and the downward trend in the estimated FDR ended at a
significance threshold of 3.45, we used 3.45 as the significance threshold to identify the most
statistically significant gene-gene interactions. A total of 70 interaction terms, involving
33 genes, passed this significance threshold of 3.45, with an FDR level lower than 0.05
and CI of (0.02570, 0.09907).

Figure 4: 70 significant gene-gene interactions discovered under threshold=3.45 (100 permutations)

Figure 5: Significant results (FDR < 0.05) under different numbers of permutations
Unexpectedly, we found that the JM FDR approach yielded only 70 valid
interaction terms at an FDR level of 0.05, whereas the BH FDR estimation method yielded 442
valid interactions. Previous research has shown that the JM FDR approach is a less conservative
method than BH FDR[14]. If the FDR estimate is effective and approximately unbiased, then we
would expect JM FDR to yield more edges than the BH FDR approach. We observed that the
number of discoveries stabilized when the number of permutations reached about 50
(Figure 5). One possible reason is that the permutation approach revealed the impact of
genotyping errors on rare variant association tests[35].
We also explored potential subnetworks using the permuted data, even though the
primary purpose of the permuted data was to calculate the FDR. Of the permuted datasets, 86%
showed no significant gene-gene interactions; among the 14% with
significant gene-gene interactions, only two significant subnetworks with more than two nodes
were identified. However, neither had a structure similar to the subnetworks identified using
the observed data. This result shows that the significant edges in the observed subnetworks are
unique to the ABRIDGE asthma gene dataset and demonstrates the disparity between the observed and
permuted data.
3.4 Subnetwork Construction Results  
Among all significant gene pairs found to be associated with asthma, 11 genes
formed the centers of the corresponding subnetworks: ALOX15, BACE2,
C10ORF33, EMR4, EMR4P, GPR44, IL5RA, LOC653381, MYB, OLIG2, and VLDLR. In Gephi, we
applied the Clockwise Rotate algorithm to get an overview of the network structure. We computed
several indexes, including degree, modularity, betweenness centrality, and closeness centrality
(Table 2). Our network has 24 nodes and 48 edges, with an average degree of 2.667. The
longest path length is 7 and the graph density is 0.116, which means that our general regulatory
network is a low-density network. Based on our observation and on separation by modularity
class, the network has five subnetworks.
Table 2: Characteristics of each node in asthmatic genes network
ID Source Group Degree Closeness Centrality* Betweenness Centrality
1 LOC653381 0 3 0.377358 53
2 BACE2 0 2 0.285714 19
3 GAPT 0 1 0.224719 0
4 SLC29A1 0 1 0.277778 0
5 OLIG2 1 6 0.5 75.066667
6 MYB 1 6 0.5 64.716667
7 C10ORF33 1 5 0.408163 25.216667
8 PRSS33 1 5 0.454545 21.533333
9 CEBPE 1 3 0.416667 5.333333
10 RHOBTB3 1 3 0.416667 5.333333
18
11 EMR1 1 1 0.294118 0
12 GPR44 2 7 0.465116 56.966667
13 CD24 2 4 0.434783 29.333333
14 EMR4P 2 3 0.377358 1.75
15 ALOX15 2 2 0.363636 0.75
16 EMR4 2 1 0.307692 0
17 GFOD1 2 1 0.322581 0
18 FLJ43093 2 1 0.322581 0
19 IL5RA 3 2 1 1
20 GPR114 3 1 0.666667 0
21 INDO 3 1 0.666667 0
22 VLDLR 4 3 0.363636 37
23 GADD45A 4 1 0.27027 0
24 EPN2 4 1 0.27027 0
  *Closeness centrality measures how close a node is to all other nodes it can reach in the
  network (the inverse of its average shortest-path distance to those nodes).
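The summary statistics reported for the network can be checked from the Degree column of Table 2.
The sketch below is a minimal illustration, assuming an undirected network without self-loops; the
variable names are ours and are not taken from Gephi. Under that assumption the degrees sum to 64,
which corresponds to 32 undirected edges and reproduces the reported average degree of 2.667 and
density of 0.116. The edge count of 48 quoted in the text may follow a different counting
convention, which the source does not specify.

```python
# Degree column of Table 2, in ID order (1 to 24).
degrees = [3, 2, 1, 1, 6, 6, 5, 5, 3, 3, 1, 7,
           4, 3, 2, 1, 1, 1, 2, 1, 1, 3, 1, 1]

n_nodes = len(degrees)

# Assumption: undirected graph, so every edge contributes to two node degrees.
n_edges = sum(degrees) // 2

average_degree = sum(degrees) / n_nodes

# Density of an undirected simple graph: edges present / edges possible.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print(n_nodes, n_edges, round(average_degree, 3), round(density, 3))
```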
Figure 6 is an affiliation network illustrating the interactions among the nodes. Node size
reflects the number of links a gene has, so that larger nodes mark genes whose degree exceeds the
average and that have a greater magnitude of affiliation exposure. Line thickness denotes the
strength of the co-affiliation. Node colors indicate modularity classes; modularity measures the
strength of the division of a network into modules. Networks with high modularity have dense
connections between the nodes within modules but sparse connections between nodes in different
modules.

Figure 6: Network Structure
Our network was divided into two connected components. The larger component contains four
of the five subnetworks: two main clusters located in the center of the network, flanked by two
small clusters. IL5RA, INDO, and GPR114 formed a second connected component, on the bottom
left, that was not connected to the other subnetworks. Among all the nodes, GPR44 has the highest
degree, with seven connections. OLIG2 and MYB, with six connections each, also have a higher
degree than the other genes. Given their higher betweenness centrality and degree compared to
other genes, these genes are likely the center genes of this network and play essential roles in
connecting the flow of information between different subgroups. The GPR44-CD24 edge has the
highest weight (18.29), followed by EMR4-CD24 (18.18) and ALOX15-GPR44 (17.38).
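As an illustration of how the closeness values in Table 2 arise, the sketch below computes
closeness centrality by breadth-first search for the small component formed by IL5RA, GPR114, and
INDO. The edge list is an assumption inferred from the degrees in Table 2 (IL5RA has degree 2, the
other two genes have degree 1), and closeness is normalized within the component, which is the
convention that matches the tabulated values of 1 and 0.666667. The function name is ours.

```python
from collections import deque

# Assumed adjacency for the three-gene component, inferred from Table 2 degrees.
component = {
    "IL5RA": ["GPR114", "INDO"],
    "GPR114": ["IL5RA"],
    "INDO": ["IL5RA"],
}

def closeness(graph, source):
    """Closeness of `source`: (reachable nodes) / (sum of shortest-path distances)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

for gene in component:
    print(gene, round(closeness(component, gene), 6))
```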
3.5 Comparison with CAMP data
Having identified the significant subnetworks, we tested whether they replicated in the
CAMP data. If the FDR estimate is approximately unbiased, and the two datasets share a degree
of similarity, we would expect about 95% of these discoveries to be true positives. Thus, we would
expect a sizable proportion of the significant gene pairs to replicate in the CAMP data.
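The expectation stated above is simple arithmetic. The sketch below assumes an FDR level of 0.05
(implied by the 95% figure) and uses the 70 significant gene pairs reported for ABRIDGE; it only
illustrates the expected split between true and false discoveries if the FDR estimate is unbiased,
and says nothing about the power to replicate them in CAMP.

```python
n_discoveries = 70   # significant gene pairs reported for ABRIDGE
fdr_level = 0.05     # assumed FDR level, implied by the 95% figure

expected_true = n_discoveries * (1 - fdr_level)
expected_false = n_discoveries * fdr_level

print(f"expected true positives:  {expected_true:.1f}")
print(f"expected false positives: {expected_false:.1f}")
```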

Figure 7: Replication Results in CAMP
However, we failed to reproduce the ABRIDGE results (Figure 7). With an alpha level of
0.05, none of the 70 gene pairs for which we observed significant results in the ABRIDGE data
showed significance in the CAMP data. The failure to reproduce our findings might be due to
differences in study populations. For example, ABRIDGE includes a larger proportion of Hispanic
participants (72%) than CAMP (10%) and a smaller proportion of males (49%) than CAMP
(63%). In addition, we only selected the 100 most significant genes in the ABRIDGE data to
construct the interaction network. We might obtain more significant results if we expanded the
number of significant genes selected.
3.6 Gene-Set Enrichment Analysis  
Across all analyses in the GO dataset, 149 unique GO terms showed enrichment with a
p-value less than 0.05. Enriched GO biological processes (p-value < 0.05 with FDR correction) for
the significant nodes (genes) associated with asthma score in the ABRIDGE data were identified
using the KOBAS 3.0 system. Among all significant GO biological processes, we observed that
several asthma-associated GO terms, such as positive regulation of developmental process, immune
response, inflammatory response, response to hypoxia, regulation of T cell proliferation, and
negative regulation of lymphocyte apoptotic process, were enriched in our list of significant
genes. Because the enrichment was driven by the asthma network we discovered as a whole, rather
than by individual genes, we believe that the significant genes we observed in ABRIDGE provide a
more pertinent understanding of the mechanisms underlying the development of asthma.
Additionally, the GO terms in the enrichment list were well populated by our
asthma-related genes. We observed that several genes were involved in a large number of enriched
GO biological processes. For example, VLDLR is annotated to 28 GO terms, such as positive
regulation of developmental process, cellular response to oxygen-containing compound, and primary
metabolic process; ALOX15 participates in 24 GO terms, including regulation of cell activation and
oxidoreductase activity; INDO participates in 21 GO terms, including response to external stimulus,
inflammatory response, and developmental process. Even though GPR44 is related to only 5 GO
terms, its strong associations with biological regulation, adenylate cyclase-inhibiting G
protein-coupled receptor signaling pathway, and response to external stimulus suggest a high
regulatory ability. (See Appendix Table A1)
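For readers unfamiliar with over-representation testing, the sketch below shows a hypergeometric
upper-tail test of the kind commonly used for GO enrichment. It is an illustration only: the
source does not state which test KOBAS applied here, the function name is ours, and the totals
used in the example call (a 24-gene input list and a 20,000-gene background) are hypothetical
values chosen for the example, not figures from this study.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical example: 2 of 24 input genes carry a term that annotates
# 38 of 20,000 background genes (both totals are assumptions).
p_value = hypergeom_upper_tail(k=2, n=24, K=38, N=20000)
print(f"enrichment p-value: {p_value:.6f}")
```

A small p-value indicates that the term appears in the input list more often than expected from
its frequency in the background; the resulting p-values are then corrected for multiple testing.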
4. Discussion  
Since the prevalence of asthma is increasing year by year, understanding the pathogenesis
of asthma has become increasingly important, especially from a gene-gene interaction angle[3].
Gene-subnetwork detection methods have tremendous potential to serve as a useful tool for
studying asthma, because tackling such a complex genetic disease requires an understanding of its
pathophysiology in order to improve practical diagnostic markers as well as therapeutic targets,
which are also the focus of subnetwork research[2].
Implementation of a multivariable regression model and adjustment of p-values by the JM
FDR approach allowed us to identify significant genetic correlations that satisfied the FDR
criterion. Gephi was used to model the network structure and to conduct network analysis based
on our detected gene edges. GO enrichment analysis helped us to target corresponding over-
represented GO terms using annotations for significant gene lists in this study.  
The statistical method we proposed and illustrated in this study offers a new approach for
identifying differentially connected genetic subnetworks. The method demonstrates its capability
and potential utility for handling the considerable challenges posed by massive disease gene
expression datasets. Its distinctive feature is that subnetworks can be constructed without
parametric assumptions, because of the use of the JM FDR approach. JM FDR provides a
novel method (R package "fdrci") in which permutations are used to estimate FDR and which
includes confidence intervals that bracket the FDR estimates in a non-parametric manner, which
can account for dependencies among tests. For weak effects where an FDR level of 0.05 may not be
achievable, CIs of FDR estimates provide some reference and play an especially important role in
predicting subnetworks that are built from weak effects.
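The idea of estimating FDR from permutations can be sketched as follows. This is a simplified
point estimate written for illustration, not the "fdrci" implementation: it omits the confidence
interval and any adjustment for the proportion of true null hypotheses, and the function name and
the toy p-values are hypothetical. The estimate is the average number of permuted-data tests
passing the threshold divided by the number of observed-data tests passing it.

```python
def permutation_fdr(observed_p, permuted_p_sets, threshold):
    """Simplified permutation-based FDR point estimate at a p-value threshold."""
    observed_hits = sum(p <= threshold for p in observed_p)
    if observed_hits == 0:
        return None  # no discoveries, so the FDR estimate is undefined
    null_hits = [sum(p <= threshold for p in perm) for perm in permuted_p_sets]
    mean_null_hits = sum(null_hits) / len(null_hits)
    return min(1.0, mean_null_hits / observed_hits)

# Toy data (hypothetical): one observed result set and two permuted sets.
observed = [0.001, 0.002, 0.2, 0.5]
permuted = [[0.001, 0.3, 0.4, 0.9], [0.1, 0.3, 0.4, 0.9]]
print(permutation_fdr(observed, permuted, threshold=0.01))
```

Because the permuted sets are generated from the same data, dependencies among tests are carried
into the null distribution, which is the property the text attributes to the JM approach.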
The multivariable linear model, which is relatively easy to implement yet supported by
well-characterized theory[37][38][39], performed well in identifying significant gene-gene
interactions when combined with the JM FDR approach. In contrast, previous methods that use only
p-values to identify differential expression of genes or gene pairs may lead to misinterpretation
of the original information in the context of large-scale significance testing. However, the
sensitivity analysis showed that, in this example, JM FDR was more stringent than BH FDR. Given
that JM FDR has been shown to be a less conservative method than BH, these results
contradicted our expectations. One possible explanation is a violation of the parametric
assumptions: JM FDR is non-parametric whereas BH FDR relies on parametric assumptions, so the
two approaches would be expected to differ when those assumptions do not hold. If JM FDR is
unbiased, another possible explanation is that permutations may increase the likelihood of
different types of errors, which would inflate the type I error rate and decrease power[35]. An
increase in the magnitude of errors in multiple genotypes can lead to an inflated type I error
rate and decreased power, while specific rare variant tests and study designs may be more robust
to genotype errors[35]. When researchers applied stratified FDR to rare variant data, they found
that their estimated FDR values tended to be much smaller than the true FDRs, likely due to a
massive number of tests leading to disequilibrium[36]. Even though we only highlighted 70
significant gene-gene interactions using a stringent significance threshold, it is clear from the
tight CIs bracketing FDR estimates at more permissive significance thresholds that substantially
more such interactions are implied by these results.
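For comparison with the permutation-based approach, the BH procedure[27] can be sketched in a few
lines. The function below is a minimal illustration with a name of our choosing and toy p-values
that are not from this study; it returns BH-adjusted p-values, which are compared against the
chosen FDR level.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Toy p-values (hypothetical).
toy_p = [0.01, 0.04, 0.03, 0.5]
print([round(p, 4) for p in benjamini_hochberg(toy_p)])
```

Unlike the permutation estimate, this adjustment uses only the observed p-values, so its validity
rests on those p-values being correctly calibrated under the null hypothesis.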
General network characteristics helped us to understand the connectivity, complexity, and
structure of our network, in which edges and nodes can convey multiple kinds of information. Based
on standard methods of interaction estimation, together with the criteria used to identify
significant subnetworks, we distinguished 24 candidate genes that were classified into five
different clusters. Among these, MYB, ALOX15, GPR44, INDO, and IL5RA have been shown to
occupy a key position in the susceptibility to asthma and asthma-related phenotypes
[37][38][39][40][41][42][43]. Even though most of the remaining genes have not been reported to
significantly influence asthma, we cannot overlook the possibility that these genes have a latent
influence on asthma and asthma-related disease, given the significant associations with known
asthma-related genes that we detected in this study. We believe that additional experimental
studies are needed in the future for the asthma-related genes we found in this
study.
Even though we did not successfully replicate the ABRIDGE results in the CAMP data, our
approach has demonstrated how combining false discovery rate estimation with an interaction model
can be informative in detecting subnetworks of asthma-related genes. Furthermore, our methods
have the potential to lend insight and make accurate predictions in a wide range of biological
applications.



 
Bibliography  
[1].Ober C, Yao TC. The genetics of asthma and allergic disease: a 21st century perspective. Immunol Rev.
2011;242:10–30.
[2].Network GA. The Global Asthma Report, Auckland, New Zealand. (2018).
[3].Nieminen MM, Kaprio J, Koskenvuo M. A population-based study of bronchial asthma in adult twin
pairs. Chest. 1991;100(1):70-5.
[4].Vicente, Cristina T., Joana A. Revez, and Manuel AR Ferreira. "Lessons from ten years of genome‐wide
association studies of asthma." Clinical & translational immunology 6.12 (2017): e165.
[5].Zhang Y, Moffatt MF, Cookson WO: Genetic and genomic approaches to asthma: new insights for the
origins. Curr Opin Pulm Med 2012; 18: 6–13.  
[6].Gilbert‐Diamond, Diane, and Jason H. Moore. "Analysis of gene‐gene interactions." Current protocols in
human genetics 70.1 (2011): 1-14.
[7].Duffy, David L., et al. "Genetics of Asthma and Hay Fever in Australian Twins." Am Rev Respir Dis
142 (1990): 1351-1358.
[8].Howard, Timothy D., et al. "Gene-gene interaction in asthma: IL4RA and IL13 in a Dutch population
with asthma." The American Journal of Human Genetics 70.1 (2002): 230-236.  
[9].Gaire, Raj K et al. “Discovery and analysis of consistent active sub-networks in cancers.” BMC
bioinformatics vol. 14 Suppl 2,Suppl 2 (2013): S7. doi:10.1186/1471-2105-14-S2-S7
[10].McGeachie MJ, Yates KP, Zhou X, Guo F, Sternberg AL, Van Natta ML, Wise RA, Szefler SJ, Sharma
S, Kho AT, et al.; CAMP Research Group. Genetics and genomics of longitudinal lung function patterns
in individuals with asthma. Am J Respir Crit Care Med 2016;194:1465–1474.
[11].Mitra K, Carvunis AR, Ramesh SK, Ideker T. Integrative approaches for
finding modular structure in biological networks. Nat Rev Genet. 2013;14(10):719–32.
[12].He, H., Lin, D., Zhang, J. et al. Comparison of statistical methods for subnetwork detection in the
integration of gene expression and protein interaction network. BMC Bioinformatics 18, 149 (2017).
https://doi.org/10.1186/s12859-017-1567-2
[13].Beisser D, Klau GW, Dandekar T, Muller T, Dittrich MT. BioNet: an R-Package for the functional
analysis of biological networks. Bioinformatics. 2010;26(8):1129–30.
[14].Millstein J, Volfson D. Computationally efficient permutation-based confidence interval estimation for
tail-area FDR. Front Genet. 2013;4:179. Published 2013 Sep 17. doi:10.3389/fgene.2013.00179
[15].Bastian, Mathieu, Sebastien Heymann, and Mathieu Jacomy. "Gephi: an open source software for
exploring and manipulating networks." Third international AAAI conference on weblogs and social
media. 2009.
[16].Croteau-Chonka DC, Qiu W, Martinez FD, Strunk RC, Lemanske RF Jr, Liu AH, Gilliland FD, Millstein
J, Gauderman WJ, Ober C, Krishnan JA, White SR, Naureckas ET, Nicolae DL, Barnes KC, London SJ,
Barraza-Villarreal A, Carey VJ, Weiss ST, Raby BA, Asthma BioRepository for Integrative Genomic
Exploration Consortium: Gene Expression Profiling in Blood Provides Reproducible Molecular Insights
into Asthma Control. Am J Respir Crit Care Med 2017; 195: 179–188.
[17].Torgerson DG, Ampleford EJ, Chiu GY, Gauderman WJ, Gignoux CR, Graves PE, et al; Mexico City
Childhood Asthma Study (MCAAS), Gilliland FD; Children’s Health Study (CHS) and HARBORS
Study, Burchard EG; Genetics of Asthma in Latino Americans (GALA) Study, Study of Genes-
Environment and Admixture in Latino Americans (GALA2) and Study of African Americans, Asthma,
Genes & Environments (SAGE), Martinez FD; Childhood Asthma Research and Education (CARE)
Network, Weiss ST; Childhood Asthma Management Program (CAMP), Williams LK; Study of Asthma
Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE), Barnes KC; Genetic
Research on Asthma in African Diaspora (GRAAD) Study, Ober C, Nicolae DL: Meta-analysis of
genome-wide association studies of asthma in ethnically diverse North American populations. Nat Genet
2011; 43: 887–892.
[18].Kothari PH, Qiu W, Croteau-Chonka DC, Martinez FD, Liu AH, Lemanske RF Jr, Ober C, Krishnan JA,
Nicolae DL, Barnes KC, London SJ, Barraza-Villarreal A, White SR, Naureckas ET, Millstein J,
Gauderman WJ, Gilliland FD, Carey VJ, Weiss ST, Raby BA, Asthma BioRepository for Integrative
Genomic Exploration Consortium: The role of local CpG DNA methylation in mediating the 17q21
asthma-susceptibility GSDMB/ORMDL3 expression quantitative trait locus. J Allergy Clin Immunol
2018; 141: 2282–2286.e6.
[19].Breton CV, Siegmund KD, Joubert BR, Wang XH, Qui WL, Carey V, Nystad W, Haberg SE, Ober C,
Nicolae D, Barnes KC, Martinez F, Liu A, Lemanske R, Strunk R, Weiss S, London S, Gilliland F, Raby
B, Consortium BRIDGE Consortium: Prenatal tobacco smoke exposure is associated with childhood
DNA CpG methylation. PLoS One 2014; 9:e99716.
[20].Brehm, John M., et al. "Serum vitamin D levels and severe asthma exacerbations in the Childhood
Asthma Management Program study." Journal of Allergy and Clinical Immunology 126.1 (2010): 52-58.
[21].Hamon, Rebecca Elisa; McLaughlin, Michael John; Gilkes, R. J.; Rate, A. W.; Zarcinas, Bernard
Alexander; Robertson, A.; Cozens, Gill; Radford, Nigel; Bettenay, L.  Geochemical indices allow
estimation of heavy metal background concentration in soils, Global Biogeochemical Cycles,2004;
18:GB1014.
[22].Asselbergs, F. W., et al. "Gender‐specific correlations of plasminogen activator inhibitor‐1 and tissue
plasminogen activator levels with cardiovascular disease‐related traits." Journal of Thrombosis and
Haemostasis 5.2 (2007): 313-320.
[23].Wahlsten, D. (1990). Insensitivity of the analysis of variance to heredity-environment interaction.
Behavioral and Brain Sciences, 13(1), 109-120. doi:10.1017/S0140525X00077797  
[24].Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high throughput
experiments. Proc Natl Acad Sci U S A. 2010;107(21):9546–9551. doi:10.1073/pnas.0914005107
[25].Renz, H., et al. "Allergic diseases, gene–environment interactions." Allergy 66 (2011): 10-12.
[26].Pattin, Kristine A., et al. "A computationally efficient hypothesis testing method for epistasis analysis
using multifactor dimensionality reduction." Genetic Epidemiology: The Official Publication of the
International Genetic Epidemiology Society 33.1 (2009): 87-94.
[27].Benjamini, Yoav, and Yosef Hochberg. “Controlling the False Discovery Rate: A Practical and Powerful
Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Methodological), vol.
57, no. 1, 1995, pp. 289–300. JSTOR, www.jstor.org/stable/2346101. Accessed 27 Feb. 2020.
[28].Kim, Kyung In, and Mark A. van de Wiel. "Effects of dependence in high-dimensional multiple testing
problems." BMC bioinformatics 9.1 (2008): 114.
[29].Yoav Benjamini and Daniel Yekutieli (2005): False Discovery Rate–Adjusted Multiple Confidence
Intervals for Selected Parameters, Journal of the American Statistical Association, 100:469, 71-81
[30].van Loon, Wouter, et al. "The Power of the Benjamini-Hochberg Procedure." (2017)
[31].Kogan, Vladimir, et al. "Genetic-Epigenetic Interactions in Asthma Revealed by a Genome-Wide Gene-
Centric Search." Human heredity 83.3 (2018): 130-152.
[32].Subramanian, Aravind, et al. "Gene set enrichment analysis: a knowledge-based approach for interpreting
genome-wide expression profiles." Proceedings of the National Academy of Sciences 102.43 (2005):
15545-15550.
[33].Ashburner, Michael, et al. "Gene ontology: tool for the unification of biology." Nature genetics 25.1
(2000): 25-29.
[34].Xie, C. et al. KOBAS 2.0: a web server for annotation and identification of enriched pathways and
diseases. Nucleic Acids Research 39, W316–W322 (2011).
[35].Cook K, Benitez A, Fu C, Tintle N. Evaluating the impact of genotype errors on rare variant tests of
association. Front Genet. 2014;5:62.
[36].Xu C, Ciampi A, Greenwood CM, UK10K Consortium. Exploring the potential benefits of stratified
false discovery rates for region-based testing of association with rare genetic variation. Front Genet.
2014;5:11.
[37].Wang X, Chen Z. Genomic Approach to Asthma. Shangai: Springer Nature Singapore Pte Ltd (2018).
[38].Maeda, Yukiko, et al. "Genetic impact of functional single nucleotide polymorphisms in the 3'-UTR
region of the chemoattractant receptor expressed on Th2 cells (CRTH2) gene on asthma and atopy in a
Japanese population." International archives of allergy and immunology 142.1 (2007): 51-58.
[39].Sekigawa, T., et al. "Gene-expression profiles in human nasal polyp tissues and identification of genetic
susceptibility in aspirin-intolerant asthma." Clinical & Experimental Allergy 39.7 (2009): 972-981.
[40].Hákonarson, Hákon, et al. "Allelic frequencies and patterns of single-nucleotide polymorphisms in
candidate genes for asthma and atopy in Iceland." American journal of respiratory and critical care
medicine 164.11 (2001): 2036-2044.
[41].Maeda, Yukiko, et al. "Indian Hedgehog produced by postnatal chondrocytes is essential for maintaining
a growth plate and trabecular bone." Proceedings of the National Academy of Sciences 104.15 (2007):
6382-6387.
[42].Moffatt, Miriam F., et al. "Genetic variants regulating ORMDL3 expression contribute to the risk of
childhood asthma." Nature 448.7152 (2007): 470-473.
[43].Song, Young-Sin, et al. "Effect of genetic polymorphism of ALOX15 on aspirin-exacerbated respiratory
disease." International archives of allergy and immunology 159.2 (2012): 157-161.









Appendix

Table A1: GSEA Results
#Term ID
Input
num
ber
Backgro
und
number
P-Value
Corrected P-
Value
Input Hyperlink
cellular
process
GO:0009
987
6 2484
0.0021284159
7937
0.057230692
6815
BACE2|VLDLR|EMR1|EPN2|RHOB
TB3|SLC29A1
http://amigo.geneontology.org/amigo/ter
m/GO:0009987
nucleobase-
containing
compound
metabolic
process
GO:0006
139
5 933
0.0001534073
55673
0.020193092
1201
GADD45A|MYB|SLC29A1|VLDLR|
CEBPE
http://amigo.geneontology.org/amigo/ter
m/GO:0006139
primary
metabolic
process
GO:0044
238
5 1578
0.0016608117
9461
0.057230692
6815
GADD45A|INDO|MYB|OLIG2|SLC2
9A1
http://amigo.geneontology.org/amigo/ter
m/GO:0044238
development
al process
GO:0032
502
4 932
0.0017459093
9079
0.057230692
6815
ALOX15|SLC29A1|CEBPE|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0032502
biological
regulation
GO:0065
007
4 1832
0.0185205652
089
0.073852569
7271
GPR44|GPR114|RHOBTB3|SLC29A
1
http://amigo.geneontology.org/amigo/ter
m/GO:0065007
positive
regulation of
development
al process
GO:0051
094
3 188
0.0001692158
55755
0.020193092
1201
INDO|VLDLR|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0051094
phosphorylati
on
GO:0016
310
3 360
0.0010987074
936
0.057230692
6815
GADD45A|IL5RA|VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0016310
organic
cyclic
compound
biosynthetic
process
GO:1901
362
3 689
0.0067731640
2978
0.057230692
6815
INDO|MYB|CEBPE
http://amigo.geneontology.org/amigo/ter
m/GO:1901362
transport
GO:0006
810
3 691
0.0068271419
5191
0.057230692
6815
ALOX15|CEBPE|EPN2
http://amigo.geneontology.org/amigo/ter
m/GO:0006810
positive
regulation of
cellular
process
GO:0048
522
3 768
0.0091046455
7831
0.059262965
7643
INDO|OLIG2|VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0048522
multicellular
organism
development
GO:0007
275
3 802
0.0102365279
64
0.062113169
6797
ALOX15|CEBPE|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0007275
intrinsic
component
of membrane
GO:0031
224
3 843
0.0117067665
195
0.066524165
3014
BACE2|VLDLR|SLC29A1
http://amigo.geneontology.org/amigo/ter
m/GO:0031224
macromolecu
le
biosynthetic
process
GO:0009
059
3 869
0.0126995726
145
0.068250184
9508
BACE2|VLDLR|CEBPE
http://amigo.geneontology.org/amigo/ter
m/GO:0009059
protein
metabolic
process
GO:0019
538
3 911
0.0144036828
866
0.069682682
0728
BACE2|MYB|ALOX15
http://amigo.geneontology.org/amigo/ter
m/GO:0019538
organonitrog
en compound
metabolic
process
GO:1901
564
3 1027
0.0197645521
551
0.075273507
1438
PRSS33|IL5RA|ALOX15
http://amigo.geneontology.org/amigo/ter
m/GO:1901564
cellular
response to
stimulus
GO:0051
716
3 1113
0.0243679994
701
0.077301825
5627
IL5RA|ALOX15|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0051716
positive
regulation of
lymphocyte
activation
GO:0051
251
2 38
0.0002366850
62328
0.021183313
0784
MYB|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0051251
response to
cytokine
GO:0034
097
2 137
0.0028130648
3782
0.057230692
6815
VLDLR|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0034097
30
cellular
response to
oxygen-
containing
compound
GO:1901
701
2 161
0.0038407458
2845
0.057230692
6815
VLDLR|CEBPE
http://amigo.geneontology.org/amigo/ter
m/GO:1901701
peptidyl-
amino acid
modification
GO:0018
193
2 174
0.0044599421
5136
0.057230692
6815
IL5RA|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0018193
positive
regulation of
cellular
component
organization
GO:0051
130
2 189
0.0052279834
9878
0.057230692
6815
MYB|VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0051130
regulation of
protein
phosphorylati
on
GO:0001
932
2 192
0.0053884036
9288
0.057230692
6815
GADD45A|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0001932
regulation of
protein
modification
process
GO:0031
399
2 235
0.0079317724
8876
0.057282740
3457
VLDLR|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0031399
positive
regulation of
cell
communicati
on
GO:0010
647
2 236
0.0079962553
2113
0.057282740
3457
ALOX15|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0010647
kinase
activity
GO:0016
301
2 241
0.0083222411
4431
0.057282740
3457
GADD45A|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0016301
positive
regulation of
multicellular
organismal
process
GO:0051
240
2 242
0.0083881507
8324
0.057282740
3457
INDO|VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0051240
molecular
function
regulator
GO:0098
772
2 307
0.0131684041
637
0.068250184
9508
IL5RA|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0098772
negative
regulation of
gene
expression
GO:0010
629
2 309
0.0133305934
433
0.068250184
9508
OLIG2|VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0010629
negative
regulation of
nitrogen
compound
metabolic
process
GO:0051
172
2 316
0.0139051595
159
0.069127465
8036
BACE2|OLIG2
http://amigo.geneontology.org/amigo/ter
m/GO:0051172
response to
external
stimulus
GO:0009
605
2 366
0.0183145410
219
0.073852569
7271
GPR44|INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0009605
regulation of
multicellular
organismal
process
GO:0051
239
2 428
0.0244926710
019
0.077301825
5627
INDO|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0051239
catalytic
activity,
acting on a
protein
GO:0140
096
2 488
0.0311712228
157
0.085568414
0026
IL5RA|CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0140096
regulation of
response to
stimulus
GO:0048
583
2 593
0.0443689714
064
0.100064442
348
INDO|ALOX15
http://amigo.geneontology.org/amigo/ter
m/GO:0048583
cellular
protein
modification
process
GO:0006
464
2 607
0.0462633569
727
0.102871315
504
MYB|VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0006464
regulation of
cellular
macromolecu
le
biosynthetic
process
GO:2000
112
2 619
0.0479109476
717
0.105303722
915
OLIG2|CEBPE
http://amigo.geneontology.org/amigo/ter
m/GO:2000112
31
endomembra
ne system
GO:0012
505
2 633
0.0498604588
027
0.106891620
83
BACE2|EPN2
http://amigo.geneontology.org/amigo/ter
m/GO:0012505
oxidoreducta
se activity,
acting on
single donors
with
incorporation
of molecular
oxygen,
incorporation
of two atoms
of oxygen
GO:0016
702
1 5
0.0034004373
2719
0.057230692
6815
ALOX15
http://amigo.geneontology.org/amigo/ter
m/GO:0016702
regulation of
histone H3-
K9
methylation
GO:0051
570
1 5
0.0034004373
2719
0.057230692
6815
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:0051570
negative
regulation of
interleukin-
10
production
GO:0032
693
1 5
0.0034004373
2719
0.057230692
6815
INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0032693
type 2
immune
response
GO:0042
092
1 6
0.0039661027
1364
0.057230692
6815
INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0042092
positive
regulation of
T-helper cell
differentiatio
n
GO:0045
624
1 6
0.0039661027
1364
0.057230692
6815
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:0045624
regulation of
transforming
growth factor
beta
production
GO:0071
634
1 7
0.0045314616
1713
0.057230692
6815
CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0071634
negative
regulation of
lymphocyte
apoptotic
process
GO:0070
229
1 7
0.0045314616
1713
0.057230692
6815
INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0070229
vesicle coat
GO:0030
120
1 7
0.0045314616
1713
0.057230692
6815
EPN2
http://amigo.geneontology.org/amigo/ter
m/GO:0030120
T-helper cell
differentiatio
n
GO:0042
093
1 7
0.0045314616
1713
0.057230692
6815
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:0042093
heterotypic
cell-cell
adhesion
GO:0034
113
1 7
0.0045314616
1713
0.057230692
6815
ALOX15
http://amigo.geneontology.org/amigo/ter
m/GO:0034113
positive
regulation of
interleukin-
12
production
GO:0032
735
1 7
0.0045314616
1713
0.057230692
6815
INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0032735
regulation of
glycoprotein
metabolic
process
GO:1903
018
1 7
0.0045314616
1713
0.057230692
6815
BACE2
http://amigo.geneontology.org/amigo/ter
m/GO:1903018
p38MAPK
cascade
GO:0038
066
1 7
0.0045314616
1713
0.057230692
6815
GADD45A
http://amigo.geneontology.org/amigo/ter
m/GO:0038066
regulation of
activated T
cell
proliferation
GO:0046
006
1 8
0.0050965141
9581
0.057230692
6815
CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0046006
lymphocyte
apoptotic
process
GO:0070
227
1 8
0.0050965141
9581
0.057230692
6815
INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0070227
32
B cell
activation
involved in
immune
response
GO:0002
312
1 8
0.0050965141
9581
0.057230692
6815
GAPT
http://amigo.geneontology.org/amigo/ter
m/GO:0002312
positive
regulation of
gliogenesis
GO:0014
015
1 9
0.0056612606
0775
0.057230692
6815
OLIG2
http://amigo.geneontology.org/amigo/ter
m/GO:0014015
ventral spinal
cord
development
GO:0021
517
1 9
0.0056612606
0775
0.057230692
6815
VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0021517
body fluid
secretion
GO:0007
589
1 10
0.0062257010
1096
0.057230692
6815
SLC29A1
http://amigo.geneontology.org/amigo/ter
m/GO:0007589
negative
regulation of
reproductive
process
GO:2000
242
1 10
0.0062257010
1096
0.057230692
6815
GPR44
http://amigo.geneontology.org/amigo/ter
m/GO:2000242
clathrin coat
GO:0030
118
1 11
0.0067898355
6335
0.057230692
6815
EPN2
http://amigo.geneontology.org/amigo/ter
m/GO:0030118
interleukin-
12
production
GO:0032
615
1 11
0.0067898355
6335
0.057230692
6815
INDO
http://amigo.geneontology.org/amigo/ter
m/GO:0032615
adenylate
cyclase-
inhibiting G
protein-
coupled
receptor
signaling
pathway
GO:0007
193
1 11
0.0067898355
6335
0.057230692
6815
GPR44
http://amigo.geneontology.org/amigo/ter
m/GO:0007193
regulation of
alpha-beta T
cell
activation
GO:0046
634
1 12
0.0073536644
2276
0.057230692
6815
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:0046634
B cell
proliferation
GO:0042
100
1 12
0.0073536644
2276
0.057230692
6815
GAPT
http://amigo.geneontology.org/amigo/ter
m/GO:0042100
CD4-
positive,
alpha-beta T
cell
activation
GO:0035
710
1 12
0.0073536644
2276
0.057230692
6815
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:0035710
calcium-
dependent
protein
binding
GO:0048
306
1 12
0.0073536644
2276
0.057230692
6815
VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0048306
signal
transduction
involved in
mitotic cell
cycle
checkpoint
GO:0072
413
1 12
0.0073536644
2276
0.057230692
6815
GADD45A
http://amigo.geneontology.org/amigo/ter
m/GO:0072413
histone
lysine
methylation
GO:0034
968
1 12
0.0073536644
2276
0.057230692
6815
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:0034968
carbohydrate
derivative
transport
GO:1901
264
1 13
0.0079171877
4696
0.057282740
3457
VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:1901264
negative
regulation of
cell cycle
G1/S phase
transition
GO:1902
807
1 14
0.0084804056
9363
0.057282740
3457
GADD45A
http://amigo.geneontology.org/amigo/ter
m/GO:1902807
regulation of
epithelial cell
differentiatio
n
GO:0030
856
1 14
0.0084804056
9363
0.057282740
3457
CD24
http://amigo.geneontology.org/amigo/ter
m/GO:0030856
cargo
receptor
activity
GO:0038
024
1 15
0.0090433184
2039
0.059262965
7643
VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0038024
33
positive
regulation of
leukocyte
differentiatio
n
GO:1902
107
1 17
0.0101682288
442
0.062113169
6797
MYB
http://amigo.geneontology.org/amigo/ter
m/GO:1902107
regulation of
dendrite
development
GO:0050
773
1 17
0.0101682288
442
0.062113169
6797
VLDLR
http://amigo.geneontology.org/amigo/ter
m/GO:0050773
positive regulation of leukocyte proliferation | GO:0070665 | 1 | 17 | 0.0101682288442 | 0.0621131696797 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0070665
cyclin-dependent protein kinase activity | GO:0097472 | 1 | 18 | 0.0107302268561 | 0.0629741182704 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0097472
cellular response to hypoxia | GO:0071456 | 1 | 18 | 0.0107302268561 | 0.0629741182704 | SLC29A1 | http://amigo.geneontology.org/amigo/term/GO:0071456
nephron tubule development | GO:0072080 | 1 | 19 | 0.0112919202778 | 0.065201733217 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0072080
tetrapyrrole binding | GO:0046906 | 1 | 22 | 0.0129751745732 | 0.0682501849508 | INDO | http://amigo.geneontology.org/amigo/term/GO:0046906
positive regulation of JNK cascade | GO:0046330 | 1 | 22 | 0.0129751745732 | 0.0682501849508 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0046330
regulation of histone modification | GO:0031056 | 1 | 23 | 0.0135356512053 | 0.0682501849508 | MYB | http://amigo.geneontology.org/amigo/term/GO:0031056
kinase regulator activity | GO:0019207 | 1 | 23 | 0.0135356512053 | 0.0682501849508 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0019207
cellular response to starvation | GO:0009267 | 1 | 23 | 0.0135356512053 | 0.0682501849508 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0009267
fatty acid derivative metabolic process | GO:1901568 | 1 | 24 | 0.0140958240326 | 0.0691274658036 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:1901568
nephron development | GO:0072006 | 1 | 26 | 0.0152152588997 | 0.0698341370011 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0072006
regulation of actin filament length | GO:0030832 | 1 | 26 | 0.0152152588997 | 0.0698341370011 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0030832
positive regulation of supramolecular fiber organization | GO:1902905 | 1 | 26 | 0.0152152588997 | 0.0698341370011 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:1902905
endocytic vesicle membrane | GO:0030666 | 1 | 26 | 0.0152152588997 | 0.0698341370011 | EPN2 | http://amigo.geneontology.org/amigo/term/GO:0030666
regulation of protein polymerization | GO:0032271 | 1 | 27 | 0.015774521253 | 0.070590982607 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0032271
cellular response to molecule of bacterial origin | GO:0071219 | 1 | 27 | 0.015774521253 | 0.070590982607 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0071219
leukocyte activation involved in immune response | GO:0002366 | 1 | 28 | 0.0163334804283 | 0.0711457046633 | GAPT | http://amigo.geneontology.org/amigo/term/GO:0002366
cellular response to lipopolysaccharide | GO:0071222 | 1 | 28 | 0.0163334804283 | 0.0711457046633 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0071222
positive regulation of ERK1 and ERK2 cascade | GO:0070374 | 1 | 29 | 0.0168921365821 | 0.0711457046633 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0070374
telencephalon development | GO:0021537 | 1 | 29 | 0.0168921365821 | 0.0711457046633 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0021537
isomerase activity | GO:0016853 | 1 | 31 | 0.0180085404511 | 0.0738525697271 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0016853
leukocyte proliferation | GO:0070661 | 1 | 32 | 0.0185662884789 | 0.0738525697271 | GAPT | http://amigo.geneontology.org/amigo/term/GO:0070661
regulation of ERK1 and ERK2 cascade | GO:0070372 | 1 | 32 | 0.0185662884789 | 0.0738525697271 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0070372
adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains | GO:0002460 | 1 | 33 | 0.0191237341106 | 0.0744162696913 | GAPT | http://amigo.geneontology.org/amigo/term/GO:0002460
cellular ketone metabolic process | GO:0042180 | 1 | 33 | 0.0191237341106 | 0.0744162696913 | INDO | http://amigo.geneontology.org/amigo/term/GO:0042180
negative regulation of protein kinase activity | GO:0006469 | 1 | 34 | 0.0196808775024 | 0.0752735071438 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0006469
cellular response to extracellular stimulus | GO:0031668 | 1 | 36 | 0.0207942581902 | 0.0756779950074 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0031668
mononuclear cell proliferation | GO:0032943 | 1 | 36 | 0.0207942581902 | 0.0756779950074 | GAPT | http://amigo.geneontology.org/amigo/term/GO:0032943
regulation of JNK cascade | GO:0046328 | 1 | 36 | 0.0207942581902 | 0.0756779950074 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0046328
hormone metabolic process | GO:0042445 | 1 | 37 | 0.0213504957982 | 0.0756779950074 | BACE2 | http://amigo.geneontology.org/amigo/term/GO:0042445
regulation of actin cytoskeleton organization | GO:0032956 | 1 | 37 | 0.0213504957982 | 0.0756779950074 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0032956
multi-multicellular organism process | GO:0044706 | 1 | 38 | 0.02190643179 | 0.0761408017557 | INDO | http://amigo.geneontology.org/amigo/term/GO:0044706
G2/M transition of mitotic cell cycle | GO:0000086 | 1 | 38 | 0.02190643179 | 0.0761408017557 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0000086
positive regulation of cytosolic calcium ion concentration | GO:0007204 | 1 | 39 | 0.0224620663215 | 0.0762984170213 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0007204
negative regulation of kinase activity | GO:0033673 | 1 | 40 | 0.0230173995483 | 0.0762984170213 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0033673
positive regulation of cell-cell adhesion | GO:0022409 | 1 | 40 | 0.0230173995483 | 0.0762984170213 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0022409
endocytic vesicle | GO:0030139 | 1 | 40 | 0.0230173995483 | 0.0762984170213 | EPN2 | http://amigo.geneontology.org/amigo/term/GO:0030139
amide binding | GO:0033218 | 1 | 44 | 0.0252357225198 | 0.0773018255627 | BACE2 | http://amigo.geneontology.org/amigo/term/GO:0033218
phagocytosis | GO:0006909 | 1 | 44 | 0.0252357225198 | 0.0773018255627 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0006909
postsynapse | GO:0098794 | 1 | 44 | 0.0252357225198 | 0.0773018255627 | SLC29A1 | http://amigo.geneontology.org/amigo/term/GO:0098794
regulation of mononuclear cell proliferation | GO:0032944 | 1 | 44 | 0.0252357225198 | 0.0773018255627 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0032944
positive regulation of MAP kinase activity | GO:0043406 | 1 | 45 | 0.0257895515558 | 0.0773018255627 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0043406
actin filament organization | GO:0007015 | 1 | 45 | 0.0257895515558 | 0.0773018255627 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0007015
signal transduction by p53 class mediator | GO:0072331 | 1 | 45 | 0.0257895515558 | 0.0773018255627 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0072331
positive regulation of leukocyte activation | GO:0002696 | 1 | 45 | 0.0257895515558 | 0.0773018255627 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0002696
apical plasma membrane | GO:0016324 | 1 | 46 | 0.0263430802197 | 0.0773018255627 | SLC29A1 | http://amigo.geneontology.org/amigo/term/GO:0016324
cellular response to abiotic stimulus | GO:0071214 | 1 | 46 | 0.0263430802197 | 0.0773018255627 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0071214
coated vesicle | GO:0030135 | 1 | 46 | 0.0263430802197 | 0.0773018255627 | EPN2 | http://amigo.geneontology.org/amigo/term/GO:0030135
regulation of inflammatory response | GO:0050727 | 1 | 47 | 0.0268963086666 | 0.0776522459891 | INDO | http://amigo.geneontology.org/amigo/term/GO:0050727
regulation of actin filament-based process | GO:0032970 | 1 | 47 | 0.0268963086666 | 0.0776522459891 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0032970
MAP kinase activity | GO:0004707 | 1 | 48 | 0.0274492370516 | 0.0786146149157 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0004707
cellular response to peptide | GO:1901653 | 1 | 52 | 0.0296579530712 | 0.082949587496 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:1901653
glycoprotein biosynthetic process | GO:0009101 | 1 | 52 | 0.0296579530712 | 0.082949587496 | BACE2 | http://amigo.geneontology.org/amigo/term/GO:0009101
regulation of T cell activation | GO:0050863 | 1 | 52 | 0.0296579530712 | 0.082949587496 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0050863
carboxylic acid biosynthetic process | GO:0046394 | 1 | 53 | 0.0302093834701 | 0.0838368936611 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0046394
adaptive immune response | GO:0002250 | 1 | 55 | 0.0313113470233 | 0.0855684140026 | GAPT | http://amigo.geneontology.org/amigo/term/GO:0002250
cellular response to external stimulus | GO:0071496 | 1 | 56 | 0.0318618804866 | 0.0859521279285 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0071496
positive regulation of cell development | GO:0010720 | 1 | 57 | 0.0324121152803 | 0.0859521279285 | OLIG2 | http://amigo.geneontology.org/amigo/term/GO:0010720
leukocyte cell-cell adhesion | GO:0007159 | 1 | 57 | 0.0324121152803 | 0.0859521279285 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0007159
response to molecule of bacterial origin | GO:0002237 | 1 | 57 | 0.0324121152803 | 0.0859521279285 | INDO | http://amigo.geneontology.org/amigo/term/GO:0002237
positive regulation of cell cycle | GO:0045787 | 1 | 61 | 0.0346100708441 | 0.0904409150523 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0045787
endocytosis | GO:0006897 | 1 | 63 | 0.0357072606168 | 0.0913085664345 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0006897
cellular amino acid metabolic process | GO:0006520 | 1 | 64 | 0.0362554090398 | 0.0914044819455 | INDO | http://amigo.geneontology.org/amigo/term/GO:0006520
RNA polymerase II cis-regulatory region sequence-specific DNA binding | GO:0000978 | 1 | 66 | 0.0373508137279 | 0.0922178711352 | MYB | http://amigo.geneontology.org/amigo/term/GO:0000978
immune response-regulating signaling pathway | GO:0002764 | 1 | 67 | 0.0378980703004 | 0.0922959807316 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0002764
regulation of protein complex assembly | GO:0043254 | 1 | 67 | 0.0378980703004 | 0.0922959807316 | ALOX15 | http://amigo.geneontology.org/amigo/term/GO:0043254
cell morphogenesis involved in neuron differentiation | GO:0048667 | 1 | 72 | 0.0406299015872 | 0.0969700317881 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0048667
small GTPase binding | GO:0031267 | 1 | 77 | 0.0433543276304 | 0.100064442348 | RHOBTB3 | http://amigo.geneontology.org/amigo/term/GO:0031267
Ras GTPase binding | GO:0017016 | 1 | 77 | 0.0433543276304 | 0.100064442348 | RHOBTB3 | http://amigo.geneontology.org/amigo/term/GO:0017016
heterocycle catabolic process | GO:0046700 | 1 | 78 | 0.0438983258933 | 0.100064442348 | INDO | http://amigo.geneontology.org/amigo/term/GO:0046700
cytokine-mediated signaling pathway | GO:0019221 | 1 | 79 | 0.044442028864 | 0.100064442348 | IL5RA | http://amigo.geneontology.org/amigo/term/GO:0019221
regulation of protein serine/threonine kinase activity | GO:0071900 | 1 | 79 | 0.044442028864 | 0.100064442348 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0071900
regulation of cell cycle process | GO:0010564 | 1 | 81 | 0.04552854954 | 0.101870129596 | GADD45A | http://amigo.geneontology.org/amigo/term/GO:0010564
small GTPase mediated signal transduction | GO:0007264 | 1 | 86 | 0.0482396942962 | 0.105303722915 | RHOBTB3 | http://amigo.geneontology.org/amigo/term/GO:0007264
response to bacterium | GO:0009617 | 1 | 89 | 0.0498628510574 | 0.10689162083 | VLDLR | http://amigo.geneontology.org/amigo/term/GO:0009617
enzyme activator activity | GO:0008047 | 1 | 89 | 0.0498628510574 | 0.10689162083 | CD24 | http://amigo.geneontology.org/amigo/term/GO:0008047
Asset Metadata
Creator Wu, Xiaowei (author) 
Core Title Identification of differentially connected gene expression subnetworks in asthma symptom 
Contributor Electronically uploaded by the author (provenance) 
School Keck School of Medicine 
Degree Master of Science 
Degree Program Biostatistics 
Publication Date 05/12/2020 
Defense Date 05/01/2020 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag asthma,false discovery rate,genes,interaction,network,OAI-PMH Harvest 
Language English
Advisor Millstein, Joshua (committee chair), Franklin, Meredith (committee member), Mack, Wendy (committee member) 
Creator Email xwu651@usc.edu 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c89-304764 
Unique identifier UC11665947 
Identifier etd-WuXiaowei-8488.pdf (filename),usctheses-c89-304764 (legacy record id) 
Legacy Identifier etd-WuXiaowei-8488.pdf 
Dmrecord 304764 
Document Type Thesis 
Rights Wu, Xiaowei 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the a... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Abstract Background: Gene subnetworks have been shown to be involved in the pathogenesis of complex genetic diseases such as asthma. However, constructing accurate subnetworks from gene expression network data remains a significant challenge. ❧ Methods: We developed a permutation-based, non-parametric method to identify differentially connected subnetworks as well as indispensable genes within the overall network. Gene-gene interactions were selected using a multivariable regression model and permutation-based FDR approaches. We used Gephi to draw the gene network and conduct the network analysis, and the KOBAS system to review Gene Ontology (GO) annotations for entries in significant gene lists and to search for enrichment patterns. The analysis was based on the ABRIDGE dataset, which includes 245 asthmatics and the corresponding expression data for 18,388 genes. ❧ Results: Forty-eight significant gene-gene interactions were discovered at an FDR threshold of 0.05. We identified five significant subnetworks, which are unique to the ABRIDGE expression data, since no similar pattern was found in the permuted data. However, we were unable to replicate the network in the asthma comparison group. ❧ Conclusion: GPR44, MYB, VLDLR, ALOX15, INDO, and OLIG2 play key roles among the gene subnetworks we discovered. The hub nodes and skeleton substructures of the subnetworks are consistent with prior knowledge of asthma pathways. Enrichment analysis revealed that the list of top genes is enriched with asthma genes. Our study illustrates the feasibility and validity of this method, which could serve as an alternative for targeting subnetworks associated with different symptoms in complex diseases.