Close
About
FAQ
Home
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Nonlinear modeling and machine learning methods for environmental epidemiology
(USC Thesis Other)
Nonlinear modeling and machine learning methods for environmental epidemiology
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
NONLINEAR MODELING AND MACHINE LEARNING METHODS FOR ENVIRONMENTAL EPIDEMIOLOGY By Noa Molshatzki ____________________________________________________________________________ A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (BIOSTATISTICS) August 2019 Copyright 2019 Noa Molshatzki ii TABLE OF CONTENTS LIST OF TABLES ........................................................................................................................ v LIST OF FIGURES ..................................................................................................................... vi OVERVIEW OF DISSERTATION ............................................................................................ 1 CHAPTER 1. OPTIMAL FLOW RATE SAMPLING DESIGNS FOR STUDIES WITH EXTENDED EXHALED NITRIC OXIDE ANALYSIS ........................................................... 5 ABSTRACT ................................................................................................................................ 5 INTRODUCTION ...................................................................................................................... 7 METHODS ................................................................................................................................. 9 Literature Review .................................................................................................................... 9 Optimal design theory in nonlinear regression .................................................................... 10 Theoretical approach to optimizing sampling design ........................................................... 11 Simulation study approach to evaluating optimal sampling design ..................................... 13 Validation of results using previously published data .......................................................... 14 Power to detect associations with NO parameters ............................................................... 15 RESULTS ................................................................................................................................. 16 Literature review ................................................................................................................... 16 Optimal sampling design ...................................................................................................... 17 Validation of results using previously published data .......................................................... 20 Power to detect associations with NO parameters ............................................................... 20 DISCUSSION ........................................................................................................................... 21 REFERENCES ......................................................................................................................... 28 TABLES AND FIGURES ........................................................................................................ 33 SUPPLEMENTARY DATA .................................................................................................... 39 Fisher’s information matrix and criteria for an optimal sampling design ........................... 49 R code for calculating the theoretical covariance ................................................................ 50 iii Difference between theoretical calculation of the covariance and the analytical method used in ‘nls’ ........................................................................................................................... 51 Detailed description of the simulation study on power to detect associations with NO parameters ............................................................................................................................ 51 REFERENCES ..................................................................................................................... 52 CHAPTER 2. TRANSFORM-BOTH-SIDES METHODS FOR NONLINEAR MIXED EFFECTS MODELS .................................................................................................................. 62 ABSTRACT .............................................................................................................................. 62 INTRODUCTION .................................................................................................................... 64 METHODS ............................................................................................................................... 66 Box-Cox transformation for linear model............................................................................. 66 Transform both sides (TBS) for the nonlinear mixed effect model ....................................... 67 ANOVA 2-step grid search method ....................................................................................... 69 Unified method ...................................................................................................................... 70 CHS data and data simulation .............................................................................................. 71 Transformation estimation .................................................................................................... 72 Transformation misspecification .......................................................................................... 73 RESULTS ................................................................................................................................. 74 Transformation estimation- simulation study ....................................................................... 74 Transformation misspecification .......................................................................................... 75 CHS data application ............................................................................................................ 75 DISCUSSION ........................................................................................................................... 76 REFERENCES ......................................................................................................................... 82 TABLES AND FIGURES ........................................................................................................ 85 SUPPLEMENTARY DATA .................................................................................................... 88 The Argatroban model .......................................................................................................... 89 iv CHAPTER 3. METHODS TO IDENTIFY KEY PREDICTORS AND 2-WAY INTERACTIONS USING GRADIENT BOOSTING MODELS IN LARGE HEALTHCARE DATA .............................................................................................................. 90 ABSTRACT .............................................................................................................................. 90 INTRODUCTION .................................................................................................................... 92 METHODS ............................................................................................................................... 94 Trees and Gradient boosting model ...................................................................................... 94 Importance summary statistics and visualization techniques ............................................... 95 BOTH-GBM analysis ............................................................................................................ 97 Data simulation ..................................................................................................................... 99 KPSC data ........................................................................................................................... 101 RESULTS ............................................................................................................................... 102 Simulation study .................................................................................................................. 102 KPSC data ........................................................................................................................... 104 DISCUSSION ......................................................................................................................... 106 REFERENCES ....................................................................................................................... 111 TABLES AND FIGURES ...................................................................................................... 116 SUPPLEMENTARY DATA .................................................................................................. 123 DISCUSSION ............................................................................................................................ 127 v LIST OF TABLES Table 1.1 Sample size required as a function of sampling design……………………………….33 Table S1.1 The 147 papers reporting on 150 voluntary flow rate sampling designs identified in the literature review. ..................................................................................................................... 39 Table 3.1 Gestational diabetes importance summary statistics and resampled p value………...116 Table 3.2 Gestational diabetes logistic regression, Train set…………………………………...117 Table S3.1 Gestational diabetes logistic regression, Test set…………………………………..123 vi LIST OF FIGURES Figure 1. Modeling of extended NO analysis ................................................................................. 4 Figure 1.1 Study selection for the literature review on target flow rate sampling designs........... 34 Figure 1.2 Theoretical relationship between flow rates in the discrete design space and FeNO. 35 Figure 1.3 Distribution of target flow rates identified in the literature review. ............................ 35 Figure 1.4 Impact of different NO parameter values (CHS values) ............................................. 36 Figure 1.5 Performance and Monte Carlo bias flow rate designs ................................................. 36 Figure 1.6 Flow rates of the 10 best four flow rate sampling designs. ......................................... 37 Figure 1.7 Optimality criteria performance assuming different numbers of target flow rates. .... 37 Figure 1.8 Optimality criteria performance under varying range of allowable flow rates. .......... 37 Figure 1.9 Validating the importance of low/high flow rates in parameter estimation using previously published data. ............................................................................................................ 38 Figure S1.1 Impact of different NO parameter values (literature review) .................................... 52 Figure 2.1 Transform both sides (TBS) parameter across estimation methods ............................ 85 Figure 2.2 Transformation misspecification for the FeNO model parameter estimates. .............. 86 Figure 2.3 Standardized residuals after applying transform both sides (TBS) on CHS data. ....... 87 Figure S2.1 FeNO vs. flow rates ................................................................................................... 88 Figure S2.2 Nonlinear fixed effect model fails to converge ......................................................... 88 Figure S2.3 Transformation misspecification for the Argatroban model parameter estimates. ... 89 Figure 3.1 BOTH-GBM approach identifying both important predictors and interactions for GBM with a binary outcome....................................................................................................... 118 Figure 3.2 Proportion of discovered effects in simulated data (r=0) .......................................... 119 Figure 3.3 Average running time ................................................................................................ 120 Figure 3.4 Partial dependence plots for marginal trend of important gestational diabetes mellitus determinants ................................................................................................................................ 121 Figure S3.1 Partial dependence plots for nonlinear effects in simulated data ............................ 124 vii Figure S3.2 Illustration of the bias toward continuous variables in reference null distribution . 124 Figure S3.3 Proportion of discovered effects in simulated data (r=0.5) ..................................... 125 Figure S3.4 Illustration of observed effect and reference null distribution effects. ................... 126 1 OVERVIEW OF DISSERTATION Recent technological developments are creating new opportunities and new data in the biomedical sciences that require the development and application of novel statistical methods. On one hand, technological/methodological advancements in biomarker assessment have created opportunities for personalized medicine and early disease detection. Biomarkers may be easier to assess (e.g., less expensive, less invasive) than other health endpoints. While this feature makes biomarkers attractive for practical reasons, statistical approaches might be necessary to extract maximal information from biomarker data. On the other hand, computerization of medical records generates large healthcare datasets that can be mined to study population-level health trends. It is not always possible to analyze these large datasets with traditional parametric statistical methods. Machine learning methods are being increasingly applied to large data in clinical or epidemiological settings where interpretation is of interest, despite the fact these methods were originally designed to optimize prediction. Methods for understanding these “black box” models are an open and active area of research. My dissertation is motivated by these two broad statistical challenges mentioned above, which have been encountered with data being used to detect associations between air pollution exposures and health outcomes in Southern California. In Chapters 1 and 2, I focus on a biomarker responsive to air pollution exposures. These projects meld a stochastic modeling framework with a deterministic physiological model that describes the production and dynamics of the biomarker in the human body. I aim to improve the statistical methods while keeping the biological model unchanged. In Chapter 1, I use a frequentist approach to derive optimal sampling protocols for the biomarkers. In Chapter 2, I use a Bayesian approach to estimate an optimal statistical model for the biomarker. In Chapter 3, I use a machine learning approach to 2 investigate methods for identifying important predictors and interactions, shifting focus from personal biomarker measurements to big data. Note that since Chapters 1-3 each present a distinct research question, each chapter is formatted as a stand-alone product and statistical notation may vary between chapters. In Chapters 1-2, I study stochastic models with an underlying deterministic component, motivated by multiple flow exhaled nitric oxide. Nonlinear regression models are composed of a deterministic, physically driven nonlinear equation plus additive random errors. A biomarker of airway inflammation, the fractional concentration of exhaled nitric oxide (FeNO), is an important biomarker for environmental epidemiology because it is responsive to air pollution exposures. Localized airway inflammation (airway vs. alveolar) can be non-invasively quantified using data on repeated FeNO maneuvers at multiple fixed exhalation flow rates (extended NO analysis). The closed form, steady-state solution to a system of differential equations on the dynamics of FeNO in the lower respiratory tract represents FeNO as a nonlinear function of flow rate and three NO parameters (NO parameters). I apply my methods to data from the Southern California Children's Health Study (CHS), a population-based cohort study of the long-term effects of air pollution on children’s respiratory health. In Chapter 1, I study optimal flow rate sampling designs for extended NO analysis. A growing number of studies use extended NO analysis (Figure 1, Step 0), but there is currently no standardized flow rate sampling design. Under a nonlinear regression model for estimating NO parameters for one person, I characterized optimal sampling designs by using a rigorous statistical theory-based approach. I examined the effects of NO parameter estimation uncertainties (caused by using different sampling designs) on the sample size required to detect 3 associations with potential determinants (Figure 1, Step 2). This work was published in 2017 in the ‘Journal of Breath Research’, the leading breath research journal.(1) In Chapter 2, I investigate a transformation-based approach to improve the normality and homoscedasticity assumptions of the nonlinear mixed effects model for multiple flow FeNO measured in multiple study participants. Departure from these assumptions can lead to incorrect inference. In nonlinear models, to preserve the physiological relationship between the outcome and model parameters, the transformation must be applied on both the outcome and the regression function. Due to lack of an analytical solution for the transformed model, the transformation is often chosen a priori. I introduce a data-based method for choosing the transformation, a novel unified Bayesian Markov chain Monte Carlo method for jointly estimating the transformation and the model parameters. To further understand the implications of transformation, I examine the effect of misspecified transformation on parameter estimation. In Chapter 3, I examine methods to identify whether gestational diabetes is associated with air pollution exposures and other covariates using a machine learning model on data from hospitals affiliated with Southern California Kaiser Permanente. For this large healthcare data application, generalized boosted regression models (GBMs) presented an opportunity to simultaneously assess a large number of possible predictors while allowing for nonlinearities and interactions without any pre-specified hypotheses. GBMs are designed to produce accurate predictions, but interpretations cannot be made directly. Importance of a variable (overall) and in interactions is summarized by so-called importance statistics, but these statistics are not accompanied by thresholds to distinguish between important and unimportant variables. These statistics also tend to prioritize predictors with numerous categories. I combine methods to both detect overall important predictors and predictors with important 2-way interactions by creating a 4 reference null distribution out of importance statistics measured under an altered outcome. Furthermore, I suggest a novel, simplified, faster, implementation of the null reference distribution approach to identify the importance of 2-way interactions. Finally, in the Discussion I summarize the major findings and contributions to statistical science from this dissertation and outline future research directions. Figure 1. Modeling of extended NO analysis 1. Molshatski N, Eckel SP. Optimal flow rate sampling designs for studies with extended exhaled nitric oxide analysis. Journal of Breath Research 2017;11(1):016012. 5 CHAPTER 1. OPTIMAL FLOW RATE SAMPLING DESIGNS FOR STUDIES WITH EXTENDED EXHALED NITRIC OXIDE ANALYSIS ABSTRACT INTRODUCTION: The fractional concentration of exhaled nitric oxide (FeNO) is a biomarker of airway inflammation. Repeat FeNO maneuvers at multiple fixed exhalation flow rates (extended NO analysis) can be used to estimate parameters quantifying proximal and distal sources of NO in mathematical models of lower respiratory tract NO. A growing number of studies use extended NO analysis, but there is no official standard flow rate sampling protocol. In this paper, we provide information for study planning by deriving theoretically optimal flow rate sampling designs. METHODS: First, we reviewed previously published designs. Then, under a nonlinear regression framework for estimating NO parameters in the steady-state two compartment model of NO, we identified unbiased optimal four flow rate designs (within the range of 10 to 400 ml/s) using theoretical derivations and simulation studies. Optimality criteria included NO parameter standard errors (SEs). A simulation study was used to estimate sample sizes required to detect associations with NO parameters estimated from studies with different designs. RESULTS: Most designs (77%) were unbiased. NO parameter SEs were smaller for designs with: more target flows, more replicate maneuvers per target flow, and a larger range of target flows. High flows were most important for estimating alveolar NO concentration, while low flows were most important for the proximal NO parameters. The Southern California Children's Health Study design (30, 50, 100 and 300 ml/s) had ≥1.8 fold larger SEs and required 1.1 to 3.2 6 fold more subjects to detect the association of a determinant with each NO parameter as compared to an optimal design of 10, 50, 100 and 400 ml/s. CONCLUSIONS: There is a class of reasonable flow rate sampling designs with good theoretical performance. In practice, designs should be selected to balance the tradeoffs between optimality and feasibility of the flow range and total number of maneuvers. 7 INTRODUCTION The fractional concentration of exhaled nitric oxide (FeNO) is a noninvasive biomarker of airway inflammation (1). FeNO is being increasingly assessed in clinical, occupational and environmental epidemiology studies (2-4). Online FeNO measurement has been standardized for clinical use at a 50 ml/s flow rate (5) to control for the inverse relationship between FeNO and exhalation flow rate (6, 7). At a 50 ml/s flow rate, FeNO is mostly from bronchial (proximal) airway sources (8). At higher flow rates, FeNO is an imperfect proxy for alveolar (peripheral) NO concentration (8-11). Information on parameters quantifying NO dynamics in the lower respiratory tract can be obtained using repeated FeNO maneuvers at multiple fixed exhalation flow rates, also referred to as “extended NO analysis” (10, 12, 13). The association between FeNO and flow rate (𝑉 ̇ ) is described by the following closed form solution to the steady-state two-compartment model (11): 𝐹𝑒𝑁𝑂 = + 𝐶 𝑁𝑂 − exp − ̇ (1) where the alveolar NO concentration (C ANO), airway tissue diffusing capacity (DawNO), and maximum airway flux (J′awNO equal to the product of DawNO and the mean airway tissue concentration of NO, CawNO) are subsequently referred to as the NO parameters. The NO parameters represent physical quantities that cannot be measured directly (14). A growing number of studies use extended NO analysis (11, 14, 15), but there is currently no standardized flow rate sampling design. Standardization of online FeNO assessment at 50 ml/s (FeNO50), based on thorough review of published material and expert consensus (5, 16, 17), has resulted in a large body of research. In extended NO analysis, the flow rate sampling design will likely be an important part of a standardized protocol. While some studies have 8 partially addressed the issue, there has been no comprehensive review of existing designs or statistical derivation of optimal designs. For the statistical derivation of an optimal design, it is important to note that FeNO is a nonlinear function of the NO parameters and flow rate in Equation 1. Nonlinear regression models are used for estimating parameters in models of biological response in a wide variety of fields (e.g., pharmacokinetics and pharmacodynamics). In these fields, it is common practice to use the statistical properties of nonlinear models to find optimal sampling designs (18). The optimal sampling design for estimation of nonlinear model parameters can be derived exactly using the Fisher information matrix. For example, Guedj et al (19) used this approach to derive optimal sampling designs for studies of hepatitis C viral kinetics. In general, the Fisher information matrix is a function of the sampling design and describes how much information the sampling design provides for parameter estimation. In particular, theoretical standard errors (SEs) of model parameters estimates can be calculated from the Fisher information matrix. In this paper, we studied optimal flow rate sampling designs for individual-level NO parameter estimation in extended NO analysis, under the assumption of a steady-state two compartment model. Our goals were twofold: (1) to conduct a comprehensive review of flow rate sampling designs in the literature and (2) to derive optimal sampling designs for extended NO analysis using a rigorous statistical theory-based approach, informed by results from the literature review. We used theoretical derivations of the Fisher information matrix and simulation studies to identify optimal, unbiased sampling designs and to evaluate the effects of various factors on the performance of these designs, including: the range of target flows considered, the number of target flows, and the number of replicate maneuvers per target flow. To provide practical context for our theoretical work, we validated our results using previously 9 published data and examined the impact of various designs on the sample size (number of study subjects) required to detect associations with NO parameters. METHODS Literature Review Relevant papers were identified on June 1, 2016 using the PubMed search: ((“multiple flow” OR “extended” OR “flow rates” OR “alveolar”)) AND (FeNO OR Exhaled nitric oxide OR “exhaled NO” OR “FE(NO)” OR “NO exchange”). All terms were categorized as “text word”. Filters for English language and humans were applied to all papers except those published after December 31, 2014 since recent papers may not yet have had these tags applied. A total of 335 studies were identified (Figure 1.1). Papers with access to English full text (N=327) were briefly reviewed (abstract and, if necessary, introduction/methods) and restricted to the 180 studies collecting multiple flow rate FeNO data. Papers not reporting specific target flow rates (e.g., reporting only the range of target flow rates) were excluded (N=19). When it was obvious that multiple papers reported on the same data, one representative paper was retained (N=14 repeats excluded). The remaining 147 papers [Table S1.1] reporting on 150 voluntary flow rate sampling designs were reviewed to record the following information: target flow rates, number of replicate maneuvers per flow rate, study participant age group (children only (≤ 18 years old); children and adults; adults only) and whether participants were healthy/recruited from the general population or had a specific condition/disease (included case only and case-control studies). Observed FeNO was typically used to estimate NO parameters using linear approximation models (20, 21) (these sampling designs typically restricted to high flow rates only) or by direct (22, 23) / approximate nonlinear models (15, 24). 10 Optimal design theory in nonlinear regression In the following, we motivate our theoretical approach of using optimal sampling design theory for nonlinear regression in the context of our extended NO analysis application. We will be choosing the optimal subset of flows—from a larger predetermined set of exhalation flow rates—which provide the most information for estimating NO parameters. We briefly describe how nonlinear regression can be used to estimate NO parameters and then highlight the key aspects of optimal design theory for nonlinear regression parameter estimation that help us to quantify the information provided by the sampling design and provide criteria for selecting an optimal design. The fundamental framework of regression can be represented as follows: 𝑌 = 𝑓 + 𝜀 (2) where 𝑗 indexes each observation (i.e., j indexes FeNO maneuver). The outcome 𝑌 equals the sum of a mean function (𝑓 ) and normally distributed error (𝜀 ) that is assumed to have mean zero and variance 𝜎 . For linear regression, 𝑓 is a linear function of model parameters. For nonlinear regression, 𝑓 is a nonlinear function of model parameters, as in the case of the deterministic, closed-form solution to the steady-state two compartment model (Equation 1). Based on previous work, we use nonlinear regression with a “log transform both sides” approach to better satisfy the assumption of equal variance and normally distributed errors (23). In this approach, 𝑌 = log 𝐹𝑒𝑁𝑂 and 𝑓 𝜃 , 𝑉 ̇ = log 𝐽 ′ 𝑁𝑂 𝐷 𝑁𝑂 + 𝐶 𝑁𝑂 − 𝐽 ′ 𝑁𝑂 𝐷 𝑁𝑂 𝑒𝑥𝑝 − 𝐷 𝑁𝑂 𝑉 ̇ 11 where “log” denotes the natural log, 𝜃 = (𝐶 𝑁𝑂 , 𝐽 ′ 𝑁𝑂 , 𝐷 𝑁𝑂 ) denotes the vector of NO parameters to be estimated, and 𝑉 ̇ denotes the flow rate. Note that our assumption of a constant error variance (𝜎 ) additive on the log scale is analogous to assuming reproducible maneuvers in that sense that logFeNO observed at any given flow rate has a variance of 𝜎 . The Supplementary Data presents a detailed overview of the Fisher information matrix and criteria for an optimal sampling design in nonlinear regression. Briefly, for linear and nonlinear regression, the inverse Fisher information matrix is the minimal theoretical covariance matrix of an unbiased estimator of model parameters (25, 26). The theoretical covariance matrix depends on the sampling design. Functions of the theoretical covariance matrix can be used as criteria for identifying an optimal sampling design (27). Specifically, the theoretical SE of the NO parameters can be obtained as elements on the diagonal of this matrix and used as NO parameter-specific criteria for an optimal design. The cube root of the determinant of this matrix (henceforth referred to as “D-Cov”) can be used as a global criterion for an optimal design. It is important to note that in nonlinear regression (unlike in linear regression), the theoretical covariance depends on the parameter values (18, 27, 28) so, theoretically, the optimal sampling design for extended NO analysis could vary depending on the values of the NO parameters. To avoid a circular argument—where the optimal design for parameter estimation depends on the value of these parameters—we compared optimal designs under several fixed sets of NO parameter values (27). Theoretical approach to optimizing sampling design The optimal sampling design was chosen from a discrete design space of eight flow rates identified from the literature review: 10, 20, 30, 50, 100, 200, 300 and 400 ml/s. Our work was 12 motivated by the Southern California Children's Health Study (CHS). The CHS is a large-scale prospective cohort study of school children originally designed to investigate the chronic effects of air pollution on children's respiratory health. In 2010, extended NO data were collected in 1640 children, ages 12-15, at target flow rates of 30, 50, 100 and 300 ml/s using Ecomedics Model CLD88-SP analyzers (23, 29). In this paper, we considered data from the 1507 children with ≥1 valid maneuver at each of the 4 target flow rates. In practice, the 50 ml/s flow rate was repeated three times (other flow rates were repeated only twice) for better comparability with earlier FeNO50 data collection in the CHS. For simplicity, we assume the CHS design has 2 replicate maneuvers per target flow for the theoretical derivations and simulation studies in this paper. Note that the CHS design is a subset of our discrete design space. Figure 1.2 describes the theoretical relationship between flow and FeNO according to the two-compartment model (Equation 1) at the 8 flow rates in our discrete design space, for a hypothetical person with the median NO parameter values observed in the CHS (C ANO=1.2 ppb, J’awNO=700 pl·s -1 , DawNO=12.3 pl·s -1 ·ppb -1 ) (23). For a given constraint on the sampling design (number of flow rates and number of replicate maneuvers at each flow rate) and set of NO parameter values, we calculated the theoretical covariance for every possible combination of flow rates from the discrete design space. For example, for a design with 4 flow rates, there are 70 (8 choose 4) different possible sampling schemes. We performed an exhaustive search across the set of 70 possible designs to identify the designs that minimized our performance criteria: the SE of each NO parameter as well as the global criterion “D-Cov”. Theoretically, changing the number of replicates (𝑟 ) at each target flow rate multiples each optimality criterion value by a fixed factor. Specifically, the theoretical impact of increasing from 𝑟 replicates to 𝑟 + 1 replicates is that each criterion (SE or D-Cov) is multiplied by 13 . For example, regardless of the sampling design, using 3 replicates instead of 2 will multiply the theoretical SE of each NO parameter by 0.82 or, equivalently, reduce each SE by 18%. Hence, unless otherwise stated, we use 2 replicates per flow rate in theoretical derivations since the ATS/ERS guidelines for FeNO50 recommend performing repeated, reproducible exhalations (5). Finally, the error variance, 𝜎 , has a fixed multiplicative effect on the Fisher information matrix. In the CHS data, it was previously observed that the within-subject variance of log(FeNO) between the target flows of 30 to 300 ml/s was approximately 𝜎 = 0.1 2 at each flow (23). In this paper, we extrapolate beyond the range of flows in the CHS, so we assume a slightly larger variance (0.15 2 ). Code for calculating the theoretical covariance using the freely available statistical software R can be found in the Supplemental Data. Simulation study approach to evaluating optimal sampling design As there is no analytical solution for parameter estimation in nonlinear models, the primary goal of the simulation studies was to identify biased sampling designs. We generated each simulated multiple flow dataset in the following manner. Theoretical FeNO was calculated based on a log transform-both-sides version of Equation 1, assuming a given set of NO parameter values and the target flow rates in the discrete flow rate design space. Observed log FeNO was calculated as theoretical log FeNO plus normally distributed random error for each replicate at each flow. Similar to the theoretical calculations, we assumed an error variance 𝜎 = 0.15 , 2 replicates per flow rate, and median CHS NO parameter values. For each flow rate sampling design, log transform both sides nonlinear regression was applied to 10,000 simulated multiple flow datasets using the R function ‘nls’ to obtain individual NO parameter estimates and SE. A short discussion on the difference between theoretical 14 calculation of the covariance matrix and the analytical method used in ‘nls’ can be found in the Supplemental Data. Monte Carlo estimates of bias (difference between the mean of parameter estimates across the 10,000 simulated datasets and the true parameter value) were calculated for each parameter. The design was defined to be unbiased for an NO parameter if the bias was < 5% of that parameter value (e.g., differences of less than 0.06 ppb for C ANO, 35 pl·s -1 for J’awNO and 0.62 pl·s -1 ·ppb -1 for DawNO, assuming the median CHS values for the NO parameters). Monte Carlo estimates of SE (median of sample parameter SE across the 10,000 datasets) were used to validate the theoretical derivations of the optimal designs, including the formula for the impact of increasing the number of replicate maneuvers, in small sample size scenarios where asymptotic (i.e., large sample size) theoretical assumptions may not apply (e.g., with 4 flow rates, the number of maneuvers per person is n=4 for 1 replicate and n=8 for 2 replicates). Validation of results using previously published data We evaluated the real-world impact of our theoretical results using previously published data on FeNO measured across an extremely wide range of flow rates. Briefly, Table 1 in Silkoff et al (22) provides data on FeNO measured at nine flow rates: 4.2, 8.5, 10.3, 17.2, 20.7, 38.2, 75.6, 850, and 1,550 ml/s (one maneuver per flow rate) in ten nonsmoking nonasthmatic subjects with no respiratory disease (8 male, ages 15 to 64). Using this data, we estimated log transform both sides nonlinear models for each subject using: (1) 6 different subsets of 4 flow rates, each subset consisting of the 3 highest flows available (75.6, 850, and 1,550 ml/s) plus one low flow (either 4.2, 8.5, 10.3, 17.2, 20.7 or 38.2 ml/s) and (2) 3 different subsets of 4 flow rates, each subset consisting of 3 low flows (10.3, 20.7, and 38.2 ml/s) plus one high flow (either 75.6, 850, or 15 1,550 ml/s). We compared SE values to determine the impact, in practice, of increasingly larger minimum or lower maximum flow rate values. Power to detect associations with NO parameters Many studies are interested in the association between localized airway inflammation (as assessed by estimated NO parameters) and potential determinants (e.g., disease status, medication use, or environmental exposure). A common approach is to first estimate NO parameters and then fit a second regression model that estimates the association (𝛽 ) of a specific estimated NO parameter (e.g., C ANO) and the potential determinant. Estimation of 𝛽 is affected by the uncertainty in NO parameter estimation, which varies for different sampling designs. Using a simulation study, we evaluated the impact of the flow rate sampling design on power/sample size considerations for studies relating a potential determinant X to estimated NO parameters. The simulation study is described in detail in the Supplemental Data. Briefly, we assumed that the true, unobservable NO parameter of interest (𝜃 ) was related to a continuous X using simple linear regression (𝜃 = 𝛼 + 𝛽𝑋 + 𝛿 ). For a given NO parameter of interest and sampling design, we simulated 10,000 datasets each containing multiple flow FeNO data from N hypothetical participants where the true NO parameter of interest was a function of X and then we: (1) estimated NO parameters for each of the N participants using log transform-both-sides nonlinear regression and (2) related the estimated NO parameter values to X in a simple linear regression and recorded the estimated 𝛽 and its associated p-value. Using results from the 10,000 datasets, we estimated the power as the percent of statistically significant 𝛽 (p<0.05) and calculated the mean estimated 𝛽 to investigate possible bias. This process was then repeated for other NO parameters and other sampling designs. Due to the skewed distributions observed in 16 practice, J’awNO and DawNO were log transformed in these simulations. We determined the sample size necessary to achieve 80% power in the CHS design (30, 50, 100 and 300 ml/s) and 5 other designs with different minimum or maximum flow rates: (1) 10, 50, 100 and 400 ml/s, (2) 20, 50, 100 and 400 ml/s, (3) 30, 50, 100 and 400 ml/s, (4) 10, 50, 100 and 300 ml/s and (5) 20, 50, 100 and 300 ml/s. All statistical analysis and data simulation was performed using R 3.2.2 (30). RESULTS Literature review Each of the 150 flow rate sampling designs identified in the literature review included 2 to 8 target flow rates, with the most common being 4 (47%). Most studies (70%) used linear approximation methods to estimate NO parameters, 15% used nonlinear methods and 15% did not use a steady-state two-compartment model. Figure 1.3 summarizes the distribution of the target flow rates in the 150 designs. The most common target flow rates were 50 (20%), 100 (21%) and 200 (16%) ml/s. The number of replicate maneuvers per target flow was reported in 65% of the designs. In all but three of these designs, participants were asked to perform 2-3 replicates. Overall, the total number of maneuvers required of participants ranged from 4 to 24 (estimated by multiplying the number of target flows by number of replicates per flow), with the most common being 12 (33% of the studies reporting number of replicates). The designs were mainly applied to studies of adults (75%) and participants with a specific disease or condition (82%). The maximum flow rate ranged from 46 to 950 ml/s, with the most common being 200 ml/s (32%). Designs for adults tended to use higher maximal flows compared to designs for children. For example, 26% of designs for adults used a maximal flow rate >300 ml/s as 17 compared to 11% of designs for children (chi-square p-value= 0.096). To investigate the minimum flow rates used in practice, it is necessary to divide the studies into those using linear vs. nonlinear estimation methods since the linear approximation is most valid at higher flows. The minimum flow rate for designs for linear estimation methods ranged from 10 to 50 ml/s (most common 50 ml/s, 72%) and the minimum flow rate for nonlinear estimation methods ranged from 5 to 30 ml/s (most common 10 ml/s, 48%). Among studies using nonlinear estimation methods, 81.2% of the studies in adults (n=16) and 57.1% of the studies in children (n=7) had a minimum flow rate <30 ml/s, although the difference was not statistically significant (chi-square p-value=0.487). Optimal sampling design Different NO parameter values gave rise to similar optimal four flow rate designs. This result held both for NO parameters based on extremes observed in the CHS (Figure 1.4) and for NO parameters observed in various diseases (from a review by Högman 2012 (14), Figure S1.1). In these figures, the 70 possible four flow rate sampling designs from the eight flow rate discrete sampling space were indexed by the performance of each design with “reference” NO parameters equal to the medians observed in the CHS (C ANO=1.2 ppb, J’awNO=700 pl·s -1 , DawNO=12.3 pl·s -1 ·ppb -1 ). For example, the design that minimized the SE of C ANO for median CHS NO parameter values (10, 30, 50, 400 ml/s flow rates) was the 1st and 3rd best design, respectively, for NO parameters typical of a subject with very high (> 90 th percentile) or very low (<10 th percentile) FeNO50. Despite the similarity of the ranking of the designs across NO parameter values, note that the values of the optimality criteria did depend on the NO parameter values, particularly for the SE of CANO, SE of J’awNO and D-Cov criteria. Specifically, the same 18 sampling design applied to two participants with different NO parameters (e.g., a typical participant with high FeNO50 in the CHS: CANO=1.6 ppb, J’awNO=3809.8 pl·s -1 , DawNO=16.9 pl·s -1 ·ppb -1 vs a participant with median CHS NO parameters) produced a ≥ 3-fold increase in the SE of CANO, SE of J’awNO, and D-Cov criteria but no change in the SE of D awNO. For the remainder of this paper, we assumed the median NO parameters in the CHS. Based on the simulation study, a small number of replicates (i.e., 1-2) at the target flow rates produced slightly smaller NO parameter SE than anticipated from the asymptotic theoretical results. While Pearson’s correlations between the theoretical SE and median of estimated SE of 10,000 simulations were approximately 1, the average ratio of theoretical SE to simulated SE was 1.5 for 1 replicate, 1.1 for 2 replicates and 1 for 3-5 replicates. As shown in Figure 1.5, most of the 70 possible four flow rate designs produced unbiased NO parameter estimates, despite the wide range of optimality criteria values. In particular, 77% of designs were unbiased for all three parameters while 81%, 100%, and 91% of designs were unbiased for CANO, J’awNO, and DawNO, respectively. The CHS flow rate sampling design (f in Figure 1.5) was unbiased for all parameters. Note that there was a set of unbiased designs with optimality criteria values similar to that of the optimal design for each criterion. Of the 70 possible four flow rate designs, the 10 most optimal designs for each criterion included similar sets of flow rates (Figure 1.6). The minimum possible flow rate (10 ml/s) was included in all but one design. The maximum possible flow rate (400 ml/s) was included in all 10 best designs for SE of C ANO and D-Cov. There was less consistency in the highest flow rate of the best designs for the SE of J’awNO and SE of DawNO (≥200 ml/s). A 50 ml/s flow rate appeared in ≥5 of the 10 best designs for each criterion. 19 We initially considered designs with four target flow rates. When we varied the number of target flow rates from 3 to 8 (note that 8 is the full discrete sampling space), we saw only modestly improved performance of optimal designs when increasing the number of target flows to more than 4 flows (Figure 1.7). When comparing the “full” design with 8 flow rates to the optimal design with 4 flow rates, the 4 flow rate design had 1.3 fold larger D-Cov, 1.2 fold larger CANO SE, 1.2 fold larger J’awNO SE and approximately the same DawNO SE. Note that for each criterion/number of flows combination, the optimal design included 10, 50 and 400 ml/s flow rates. Optimality criteria values were considerably impacted by the maximal and minimal flow rate of the given design. For each criterion, narrowing the range of flows reduced the performance of the design. For example, given four flow rate designs with 2 replicates, CHS sample design (30, 50, 100 and 300 ml/s) SE’s were 1.8 times larger for C ANO, 2.2 times larger for J’awNO, 4.3 times larger for DawNO and 1.8 times larger for D-Cov as compared to the optimal 2 replicate four flow rate design which had the maximal flow rate range (10, 50, 100, 400). The SE of CANO criterion was impacted more by the highest flow than by the lowest flow (Figure 1.8). When comparing the optimal design with flows ranging from 10 to 400 ml/s to the optimal design with flows ranging from 10 to 200 ml/s (i.e., decreasing the upper bound of the flows from 400 to 200 ml/s) the SE of C ANO increased 2-fold. On the other hand, increasing the lowest flow rate to 30 ml/s from 10 ml/s (and keeping the maximum flow at 400 ml/s) increased the SE of CANO by only 1.3-fold. The SE of J’awNO, SE of DawNO and D-Cov criteria were impacted more by the lowest flow than by the highest flow. Decreasing the highest flow rate to 200 ml/s from 400 ml/s (low flow rate= 10 ml/s) increased these criteria by only 1.2-fold (Figure 20 1.8). Increasing the lowest flow rate to 30 ml/s from 10 ml/s (high flow rate= 100 ml/s) increased these criteria by ≥1.6-fold (J’awNO 1.9-fold, DawNO 3.7-fold and D-Cov 1.6-fold). Validation of results using previously published data As shown in Figure 1.9, models estimated using data previously published in Silkoff (22) supported our theoretically derived results on the importance of including high/low flow rates for obtaining NO parameter estimates with minimal SE. Specifically, the SE of estimated airway sources of NO (J’awNO and DawNO) increased most dramatically when the low flow rate was increased, whereas the SE of the estimated alveolar concentration of NO (C ANO) increased most dramatically when the highest flow rate was decreased. Power to detect associations with NO parameters Table 1.1 demonstrates the sample sizes required to detect given effect sizes at 80% power for associations of a continuous potential determinant with each of the NO parameters under 6 different 4 flow rate designs (2 replicates/flow) with varying minimum and maximum flow rates across designs. Reducing the maximum flow rate from 400 to 300 ml/s had relatively little impact on the required sample sizes for all NO parameters. When the lowest flow rate was increased from 10 to 30 ml/s and the maximum flow rate was held fixed at 300 or 400 ml/s, the sample size required increased modestly for C ANO and logJ’awNO (1.1 to 1.2 times larger) and more dramatically for logDawNO (2.8 to 3.1 times larger). 21 DISCUSSION In this paper, we: (a) conducted a comprehensive literature review of the flow rates used in extended NO analysis studies and (b) used statistical theory and simulation to derive optimal flow rate sampling designs for extended NO analysis studies, based on the steady-state two- compartment model. In general, we found—for a given set of NO parameters, a given range of possible flow rates, and a given number of flows—there is a set of unbiased, near-optimal designs. We investigated the impact of various factors and design decisions (number of flow rates, range of flow rates, etc…) on the performance of the design. NO parameter values had little impact on the set of optimal flow rates, although they did impact the SE of the NO parameter estimates. The range of flows had a large impact on the design performance, with a high flow being most important for C ANO and a low flow being important for DawNO and J’awNO (particularly DawNO). Increasing the minimum flow dramatically increased the sample size required to detect the effect of a determinant on logD awNO, while decreasing the maximum flow had a much smaller impact on the sample size required for C ANO. Increasing the number of flows had a modest impact for designs with more than 4 flows. Finally, additional replicate maneuvers at each flow reduced the SE of NO parameter estimates, and we provided a formula for calculating these theoretical reductions. Flow rate sampling designs have been studied previously. George et al. 2004 (11) presented an overview of the ranges of flow rates that had been used to estimate various versions of the two compartment model. A handful of studies have examined the performance of different flow rate sampling designs, but none have used a rigorous statistical theory-based approach to derive optimal designs. Rather, these previous studies have each compared 2 to 4 different flow rate sampling designs using a specific sample of measured data to calculate and compare: (a) 22 predictions of FeNO at various flow rates (particularly flow rates not used in NO parameter estimation) (24, 31) and/or (b) estimated NO parameter values(24, 31-33). For example, predicted FeNO50 from NO parameters estimated using three flow rates (≤20, 100 and ≥350 ml/s) was deemed sufficiently close to observed FeNO50 when the lowest flow rate was 10 ml/s or 20 ml/s, but not 30 ml/s (31). Roy et al (32) compared NO parameter estimates from 5 measured flow rates (10, 30, 50, 100, 200 ml/s) to NO parameters estimated from subsets of the flow rates (4 highest: 30, 50, 100 and 200 or 4 lowest: 10, 30, 50, and 100 ml/s). When comparing the reference (5 flow rate design) to the 4 highest flow design or the 4 lowest flow design, parameter estimates were, respectively: 2 and 0.9 fold different for C ANO, 0.7 and 1.1 fold different for J’awNO, and had a median difference of 19.6 and -1.24 pl·s -1 ·ppb -1 for DawNO (p<0.001). In our simulation study, we found the same 4 flow rate designs (30, 50, 100 and 200 or 10, 30, 50, and 100) to be unbiased for J’awNO and DawNO and slightly biased for C ANO (1.05 fold difference in C ANO for both designs). The Roy et al (32) results may have differed from ours for several reasons, including: the use of different estimation methods (direct nonlinear least squares estimation of Equation 1 parameter vs. our “log transform both sides” approach to estimating Equation 1 parameters (23)), use of different reference (estimates based on 5 flow rates vs. true parameters used for the simulation), and different study populations with different NO parameter values (COPD patients vs. population-based sample of schoolchildren), and a relatively small sample size (50). Finally, Högman et al (33) found that similar NO parameter estimates were obtained using 11 flow rates (range of 5 to 500 ml/s) and a subset of only 3 flow rates (5, 100 and 500 ml/s) , all ranging between 5 and 500 ml/s. In practice, a flow rate sampling design should balance the need for extreme flows, both low and high (to obtain precise estimates of NO parameters and good power to detect 23 determinants of these parameters) and the need for flow rates that are feasible to perform in most study subjects or patients. At low flow rates, the duration of an exhalation sufficient to reach a plateau can be quite long. Without a maneuver of adequate duration for low flow maneuvers, it is doubtful that the correct plateau value for NO will have been achieved. For example, maneuvers of 10 or 20 ml/s in children require exhalations of long duration (7, 34) (e.g., mean time to plateau was 18 s for an 11 ml/s target flow in children (34)) and have been deemed to be “uncomfortable” for children (34-36) and adults (32, 37)). Difficulties in maintaining long duration exhalations were found to be unrelated to disease status in adults (37). In one study of children with asthma, 44% could not obtain flow rates <40 ml/s, the typical time needed to obtain a stable NO plateau was > 12 s, and younger patients had difficulties sustaining stable flow (38). However, another study of 6-18 year old healthy children reported that the children had no difficulties in sustaining steady exhalations long enough to attain a plateau at low flow rates (15 and 23 ml/s) (39). Future work on the feasibility of assessing low flow FeNO, particularly in children and other subgroups who may have difficulty with long maneuvers, is required to recommend a practical lowest flow rate. A previous study of 32 adults (a mix of healthy, asthmatics, and smokers/nonsmokers) compared a minimum low flow of 10, 20 or 30 ml/s and determined that 20 ml/s was the highest recommended low flow in the given study population with the given NO parameter estimation method (31), but this study did not specify the maneuver durations, did not include children, and included no discussion on the differences across various subgroups of the time to NO plateau and the quality of these plateaus. At higher flow rates, the duration of an exhalation is shortened by the limiting factor of pulmonary capacity, which varies by age, gender, disease status, etc. The ATS/ERS guidelines for FeNO50 require exhalation durations of ≥4 seconds for children <12 years and >6 seconds for 24 children >12 years and adults to allow the airway compartment to be washed out and a reasonable plateau achieved (5). A 4 second exhalation at 300 ml/s corresponds to an exhaled volume of 1.2 L, close to the vital capacity of a child’s lung. For example, the mean vital capacity of 10 year old children is 1.9 L in boys and 1.7 L in girls (40). Previous studies have reported children having difficulties performing high flow rate (>200 ml/s) maneuvers (35, 36, 38, 41) due to patients running out of air and progressive decrease in FeNO without a stable plateau (38), possibly due to the decrease in airway mucosal surface area available for diffusion of NO from airway wall to lumen during the exhalation. On average, CHS participants ages 12- 15 achieved a mean plateau flow rate of only 287 ml/s when targeting 300 ml/s (23). Taken together, these findings suggest that special consideration regarding the feasibility of very low and very high flow rates for children may be necessary. Future studies that evaluate feasibility of low flow maneuvers could also evaluate the feasibility of high flow maneuvers and recommend practical high flow rates. Based on our theoretical and simulation study results, high flow rates of 300 ml/s do not result in major reductions in precision or power as compared to 400 ml/s. Our findings regarding the impact of other factors on the performance of flow rate sampling designs also merit discussion. First, different NO parameter values gave rise to similar optimal sets of flow rates. This implies, for example, that the same set of flow rates could be used for healthy patients and for those with asthma. However, higher NO parameter values gave rise to larger SE for C ANO and J’awNO. Since patients with asthma tend to have higher J’ awNO than healthy controls, if it is desired to have similar SE across participants in a case-control study, patients with asthma could be requested to perform additional replicate maneuvers at each flow rate. Second, although in extended NO analysis it is not necessary to measure FeNO at 50 ml/s (since FeNO50 can be predicted using estimated NO parameters), it was interesting that 25 many optimal designs included a 50 ml/s flow rate. It is already recognized that 50 ml/s maneuvers are well-tolerated by most study participants, so it would be reasonable to recommend inclusion of a 50 ml/s flow rate in an extended NO analysis sampling protocol. This would also provide face-validity for comparisons of FeNO 50 obtained from extended NO analysis as compared to the standard FeNO50 protocol. Finally, the impact of the flow rate sampling design on the power to detect associations of a determinant with NO parameters differed by the NO parameter. If the focus of a given study is on J’ awNO or CANO, using a potentially more feasible restricted range of flows had a modest impact on the required sample size. However, if the focus of the study includes D awNO, a large sample size (or a low minimum flow rate) is crucial to ensure an adequately powered design. Our study has several strengths. Our work is based on an extensive literature search of previously published sampling designs. We provided a comprehensive summary of sampling design performance using theoretical derivations and simulation studies to estimate bias and to validate theoretical results. We examined design performance separately for each NO parameter and overall by the: number of flow rates, number of replicate maneuvers, and range of flow rates. Instead of comparing the performance of a few designs, we performed an exhaustive search across all possible discrete flow rates combination. We also examined the effect of different NO parameter values (taken from the literature) on the optimal designs. The NO parameter and variance values used in most theoretical calculations/simulations were based on the well- characterized CHS data. We validated our results on the importance of a low minimum/ high maximal flow rate using previously published data. In addition to the effect of sampling design on the two-compartment model parameters and SEs, we examine the number of subjects needed (power) to detect the association between a determinant and NO parameters. To our knowledge 26 this is the first study that uses theoretical derivation of Fisher Information matrix to assess design performance in the field of extended NO analysis. Our study also has several weaknesses. The steady-state two compartment mathematical model of NO exchange was assumed throughout our analysis and simulation. Our conclusions may not be valid in situations where this model is inadequate. Specifically, the steady-state two compartment model neglects axial diffusion for simplicity. Ignoring axial diffusion would likely cause the greatest problems at extremely low flow rates where convection may not dominate axial diffusion that might lead to underestimation of J’ awNO and DawNO (10, 42). Since in this study we observed that low flows were most important for estimation of DawNO, we anticipate that a study of optimal designs for a deterministic modeling approach that accounts for axial diffusion might provide different insights on the impact of the design on DawNO estimation. An additional limitation is that we restricted our search for an optimal design to a discrete flow rate sampling space. However, the choice of possible flow rate values was based on the flow rates used in previously published studies. An additional practical concern is that FeNO measurement instruments, such as the EcoMedics CLD88-SP analyzers used in the CHS, may have only a specific set of flow resistors available for purchase (although it would be straightforward to custom manufacture a resistor for any given flow rate). Note also that we optimized designs on the individual level and not on the population level. Using a population level approach can improve parameter estimation by pooling information from all subjects (10, 43). Future work could study optimal flow rate sampling designs in a population-level, nonlinear mixed effect model framework. Finally, despite our literature review, we lacked comprehensive information about the feasibility of low and high flows (particularly the required duration of low flow maneuvers) for various study populations. Such information, along with pre-specification of 27 the key NO parameters of interest, would be needed to provide a guide for researchers on the set of optimal flow rates for a given study. However, if researchers have an understanding of the range of flow rates possible in their study population, they can use results of this paper to guide their sampling design. In future work, an online tool could be designed that could take as inputs: (a) the anticipated feasible range of flow rates, (b) the anticipated mean NO parameters in a given study population, and (c) the desired effect sizes detectable at 80% power of a potential determinant on NO parameters (and whether all NO parameters were of interest). Then the tool could then output the theoretically optimal set of target flows and a simulation-based estimate of the required sample size. In conclusion, we recommend careful consideration of the proposed flow rate sampling design’s feasibility, performance, and power when designing extended NO analysis studies. There is a class of unbiased designs with good performance. Performance can be optimized by using a wider range of flow rates, sampling more flow rates and/or using more replicates for each maneuver. On the other hand, the feasibility of performing very high or low flow rate maneuvers or a large total number of maneuvers limits the designs that can be used in practice. Designs such as the one used in the CHS—with a slightly restricted range of flow rates—are reasonable to implement in larger studies of children. Studies primarily interested in estimating D awNO should attempt to measure the lowest flow rate feasible in that study population to obtain the highest precision and power. 28 REFERENCES 1. Kharitonov SA, Gonio F, Kelly C, et al. Reproducibility of exhaled nitric oxide measurements in healthy and asthmatic adults and children. European Respiratory Journal 2003;21(3):433-8. 2. Rodway GW, Choi J, Hoffman LA, et al. Exhaled nitric oxide in the diagnosis and management of asthma: clinical implications. Chronic Respiratory Disease 2009;6(1):19- 29. 3. Barnes PJ, Dweik RA, Gelb AF, et al. Exhaled nitric oxide in pulmonary diseases: a comprehensive review. CHEST Journal 2010;138(3):682-92. 4. Garcia-Rio F, Casitas R, Romero D. Utility of two-compartment models of exhaled nitric oxide in patients with asthma. Journal of Asthma 2011;48(4):329-34. 5. American Thoracic S, European Respiratory S. ATS/ERS recommendations for standardized procedures for the online and offline measurement of exhaled lower respiratory nitric oxide and nasal nitric oxide, 2005. American Journal of Respiratory and Critical Care Medicine 2005;171(8):912-30. 6. Högman M, Stromberg S, Schedin U, et al. Nitric oxide from the human respiratory tract efficiently quantified by standardized single breath measurements. Acta Physiologica Scandinavica 1997;159(4):345-6. 7. Silkoff PE, McClean PA, Slutsky AS, et al. Marked flow-dependence of exhaled nitric oxide using a new technique to exclude nasal nitric oxide. American Journal of Respiratory and Critical Care Medicine 1997;155(1):260-7. 8. George SC. How accurately should we estimate the anatomical source of exhaled nitric oxide? Journal of Applied Physiology 2008;104(4):909-11. 29 9. Eckel SP, Salam MT. Single high flow exhaled nitric oxide is an imperfect proxy for distal nitric oxide. Occupational and Environmental Medicine 2013;70(7):519-20. 10. Condorelli P, Shin HW, Aledia AS, et al. A simple technique to characterize proximal and peripheral nitric oxide exchange using constant flow exhalations and an axial diffusion model. Journal of Applied Physiology 2007;102(1):417-25. 11. George SC, Högman M, Permutt S, et al. Modeling pulmonary nitric oxide exchange. Journal of Applied Physiology 2004;96(3):831-9. 12. Suresh V, Shelley DA, Shin HW, et al. Effect of heterogeneous ventilation and nitric oxide production on exhaled nitric oxide profiles. Journal of Applied Physiology 2008;104(6):1743-52. 13. Tsoukias NM, George SC. A two-compartment model of pulmonary nitric oxide exchange dynamics. Journal of Applied Physiology (1985) 1998;85(2):653-66. 14. Högman M. Extended NO analysis in health and disease. Journal of Breath Research 2012;6(4):047103. 15. Högman M, Meriläinen P. Extended NO analysis in asthma. Journal of Breath Research 2007;1(2):024001. 16. Kharitonov S, Alving K, Barnes PJ. Exhaled and nasal nitric oxide measurements: recommendations. The European Respiratory Society Task Force. European Respiratory Journal 1997;10(7):1683-93. 17. Recommendations for standardized procedures for the on-line and off-line measurement of exhaled lower respiratory nitric oxide and nasal nitric oxide in adults and children- 1999. This official statement of the American Thoracic Society was adopted by the ATS 30 Board of Directors, July 1999. American Journal of Respiratory and Critical Care Medicine 1999;160(6):2104-17. 18. Fedorov VV, Leonov SL. Optimal design for nonlinear response models. Chapman & Hall/CRC Biostatistics Series. Boca Raton: Taylor & Francis, 2013:1 online resource (xxviii, 373 pages). 19. Guedj J, Bazzoli C, Neumann AU, et al. Design evaluation and optimization for models of hepatitis C viral dynamics. Statistics in Medicine 2011;30(10):1045-56. 20. Tsoukias NM, Tannous Z, Wilson AF, et al. Single-exhalation profiles of NO and CO2 in humans: effect of dynamically changing flow rate. Journal of Applied Physiology 1998;85(2):642-52. 21. Pietropaoli AP, Perillo IB, Torres A, et al. Simultaneous measurement of nitric oxide production by conducting and alveolar airways of humans. Journal of Applied Physiology 1999;87(4):1532-42. 22. Silkoff PE, Sylvester JT, Zamel N, et al. Airway nitric oxide diffusion in asthma: Role in pulmonary function and bronchial responsiveness. American Journal of Respiratory and Critical Care Medicine 2000;161(4 Pt 1):1218-28. 23. Eckel SP, Linn WS, Berhane K, et al. Estimation of parameters in the two-compartment model for exhaled nitric oxide. PloS One 2014;9(1):e85471. 24. Högman M, Holmkvist T, Wegener T, et al. Extended NO analysis applied to patients with COPD, allergic asthma and allergic rhinitis. Respiratory Medicine 2002;96(1):24- 30. 25. Rao CR. Information and the accuracy attainable in the estimation of statistical parameters. Calcutta Mathematical Society 1945;37:81–9. 31 26. Cram'er H. Mathematical Methods of Statistics. NJ, USA: Princeton University Press; 1946. 27. Pronzato L, Pázman A. Design of experiments in nonlinear models. Lecture Notes in Statistics 2013;212. 28. Atkinson AC, Donev AN, Tobias R. Optimum experimental designs, with SAS. Oxford ; New York: Oxford University Press; 2007. 29. Linn WS, Rappaport EB, Eckel SP, et al. Multiple-flow exhaled nitric oxide, allergy, and asthma in a population of older children. Pediatric Pulmonology 2013;48(9):885-96. 30. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2015. 31. Högman M, Thornadtsson A, Hedenstierna G, et al. A practical approach to the theoretical models to calculate NO parameters of the respiratory system. Journal of Breath Research 2014;8(1):016002. 32. Roy K, Borrill ZL, Starkey C, et al. Use of different exhaled nitric oxide multiple flow rate models in COPD. European Respiratory Journal 2007;29(4):651-9. 33. Högman M, Drca N, Ehrstedt C, et al. Exhaled nitric oxide partitioned into alveolar, lower airways and nasal contributions. Respiratory Medicine 2000;94(10):985-91. 34. Pedroletti C, Zetterquist W, Nordvall L, et al. Evaluation of exhaled nitric oxide in schoolchildren at different exhalation flow rates. Pediatric Research 2002;52(3):393-8. 35. Latzin P, Beck J, Griese M. Exhaled nitric oxide in healthy children: variability and a lack of correlation with atopy. Pediatr Allergy Immunol 2002;13(1):37-46. 36. Sepponen A, Lehtimaki L, Huhtala H, et al. Alveolar and bronchial nitric oxide output in healthy children. Pediatric Pulmonology 2008;43(12):1242-8. 32 37. Brindicci C, Ito K, Resta O, et al. Exhaled nitric oxide from lung periphery is increased in COPD. European Respiratory Journal 2005;26(1):52-9. 38. Mahut B, Delacourt C, De Blic J, et al. Increase in alveolar nitric oxide in the presence of symptoms in childhood asthma. CHEST Journal 2004;125(3):1012-8. 39. Kissoon N, Duckworth LJ, Blake KV, et al. Exhaled nitric oxide concentrations: online versus offline values in healthy children. Pediatric Pulmonology 2002;33(4):283-92. 40. Jones HE. The vital capacity of children. Archives of Disease in Childhood 1955;30(153):445-8. 41. Robroeks CM, Van Vliet D, Hendriks HJ, et al. Feasibility of exhaled nitric oxide measurements at various flow rates in children with asthma. Pediatric Allergy and Immunology 2010;21(1‐Part‐II):e222-e8. 42. Shin H-W, George SC. Impact of axial diffusion on nitric oxide exchange in the lungs. Journal of Applied Physiology 2002;93(6):2070-80. 43. Eckel S, Berhane K, Linn W, et al. Statistical Methods For Studying Determinants Of Airway And Alveolar Nitric Oxide. American Journal of Respiratory and Critical Care Medicine 2013;187:A5092. 33 TABLES AND FIGURES Table 1.1 Sample size required as a function of sampling design. Sample size required to achieve 80% power to detect the effect of a 1 SD increase in a continuous determinant on each NO parameter,* as a function of sampling design. Sampling design CANO J’awNO a DawNO a 10, 50, 100 and 400 ml/s 50 50 50 20, 50, 100 and 400 ml/s 52 51 82 30, 50, 100 and 400 ml/s 56 52 141 10, 50, 100 and 300 ml/s 51 50 52 20, 50, 100 and 300 ml/s 54 51 88 30, 50, 100 and 300 ml/s (CHS design) 61 53 160 a J’awNO and DawNO were log transformed. NO parameters were assumed to be the median for all CHS participants. * Effect sizes: C ANO= 0.5, J’ awNO a = 0.43 and D awNO a =0.52. 34 Figure 1.1 Study selection for the literature review on target flow rate sampling designs. PubMed search: (("multiple flow" OR "extended" OR "flow rates" OR “alveolar”)) AND (FeNO OR Exhaled nitric oxide OR "exhaled NO" OR "FE(NO)" OR "NO exchange") Date: June 1 2016 Filters: Humans, English (filters were not applied on search articles published after 2014) 35 Figure 1.2 Theoretical relationship between flow rates in the discrete design space and FeNO. * For NO parameters equal to the median across all CHS participants (C ANO=1.2 ppb, J’ awNO=700 pl·s -1 , D awNO=12.3 pl·s -1 ·ppb -1 ) Figure 1.3 Distribution of target flow rates identified in the literature review. * Flow rates used in our discrete design space 36 Figure 1.4 Impact of different NO parameter values (CHS values) Based on extreme NO parameter values observed in the CHS data * —on optimality criteria for the 70 possible four flow rate designs from the eight flow rate discrete sampling space. * NO parameters are assumed to be: the median for all CHS participants, C ANO=1.2 ppb, J’ awNO=700 pl·s -1 , DawNO=12.3 pl·s -1 ·ppb -1 (median CHS); the median for CHS participants with FeNO 50 > 90 th percentile (44.9 ppb), C ANO=1.6 ppb, J’ awNO=3809.8 pl·s -1 , DawNO=16.9 pl·s -1 ·ppb -1 (FeNO 50>90p); and the median for CHS participants with FeNO 50 < 10 th percentile (6.4 ppb), C ANO=1.1 ppb, J’ awNO=259.4 pl·s -1 , DawNO=8.0 pl·s -1 ·ppb -1 (FeNO 50<10p). Sampling designs are indexed according to a rank ordering of the performance under the reference of NO parameters equal to the median CHS. Figure 1.5 Performance and Monte Carlo bias flow rate designs Performance and Monte Carlo bias* of the 70 possible four flow rate designs from the eight flow rate discrete sampling space **. * Monte Carlo mean C ANO, J’ awNO and D awNO estimates <5% deviation of true parameter value ** NO parameters are assumed to be the median for all CHS participants: C ANO=1.2 ppb, J’ awNO=700 pl·s -1 , D awNO=12.3 pl·s -1 ·ppb -1 . 37 Figure 1.6 Flow rates of the 10 best four flow rate sampling designs. Out of the 70 possible four flow rate designs from the eight flow rate discrete sampling space, for each optimality criterion. * NO parameters are assumed to be the median for all CHS participants: C ANO=1.2 ppb, J’ awNO=700 pl·s -1 , D awNO=12.3 pl·s -1 ·ppb -1 . Figure 1.7 Optimality criteria performance assuming different numbers of target flow rates. Optimality criteria performance* of the 10 best (when applicable) sampling designs assuming different numbers of target flow rates. * NO parameters are assumed to be the median for all CHS participants: C ANO=1.2 ppb, J’ awNO=700 pl·s -1 , D awNO=12.3 pl·s -1 ·ppb -1 . Figure 1.8 Optimality criteria performance under varying range of allowable flow rates. Optimality criteria performance of the best four flow rate sampling design under the assumption of a varying range of allowable flow rates (minimum: 10 to 30 ml/s, maximum: 200 to 400 ml/s)*. * NO parameters are assumed to be the median for all CHS participants: C ANO=1.2 ppb, J’ awNO=700 pl·s -1 , D awNO=12.3 pl·s -1 ·ppb -1 . 38 Figure 1.9 Validating the importance of low/high flow rates in parameter estimation using previously published data. Validating the importance of low/high flow rates in parameter estimation,* using previously published data.** First row: SE of NO parameters estimated from 6 different subsets of 4 flow rates, where each subset consists of the highest 3 flow rates available (75.6, 850, and 1,550 ml/s) plus one low flow rate (x- axis). Second row: SE of NO parameters estimated from 3 different subsets of 4 flow rates, where each subset consists of the 3 low flow rates (10.3, 20.7, and 38.2 ml/s) plus one high flow rate (x- axis). Different plotting symbols represent different subjects. * Log transform both sides nonlinear models. Missing points are due to failures in model convergence. ** Table 1 in Silkoff et al 39 SUPPLEMENTARY DATA Table S1.1 The 147 papers reporting on 150 voluntary flow rate sampling designs identified in the literature review. Author, year Title Design (flow rates) Number of replicate maneuvers per flow rate Model used 1 Health condition 2 Age group 3 Akamatsu T, 2014 (1) Switching from salmeterol/fluticasone to formoterol/budesonide combinations improves peripheral airway/alveolar inflammation in asthma. 50, 100, 150, 200 and 250 ml/s 3 linear sick adults Bake B, 2014 (2) Effects of pollen season on central and peripheral nitric oxide production in subjects with pollen asthma. 50, 100 and 270 ml/s 2 linear both adults Barath S, 2013 (3) Diesel exhaust but not ozone increases fraction of exhaled nitric oxide in a randomized controlled experimental exposure study of healthy human subjects. 10, 50, 100 and 270 ml/s 2 no model estimation healthy adults Barregard L, 2008 (4) Experimental exposure to wood smoke: effects on airway inflammation and oxidative stress. 50, 100 and 270 ml/s 2 linear healthy adults Bazeghi N, 2011 (5) Exhaled nitric oxide measure using multiple flows in clinically relevant subgroups of COPD. 10, 30, 50, 100 and 200 ml/s 3 nonlinear sick adults Brindicci C, 2005 (6) Exhaled nitric oxide from lung periphery is increased in COPD. 10, 50, 100, 200 and 260 ml/s 2 nonlinear both adults Brindicci C, 2007 (7) Differential flow analysis of exhaled nitric oxide in patients with asthma of differing severity. 10, 50, 100, 200 and 260 ml/s NR nonlinear both adults Brindicci C, 2007 (8) Effect of an inducible nitric oxide synthase inhibitor on differential flow-exhaled nitric oxide in asthmatic patients and healthy volunteers. 10, 100, 200 and 260 ml/s NR nonlinear sick adults Cameli P, 2014 (9) Exhaled nitric oxide in interstitial lung diseases. 50, 100 and 150 ml/s 2 linear both adults Cameli P, 2015 (10) Exhaled nitric oxide and carbon monoxide in lung transplanted patients. 50, 100, 150 and 350 ml/s 2 linear both adults Cameli P, 2016 (11) Exhaled nitric oxide is not increased in pulmonary sarcoidosis. 50, 100 and 150 ml/s NR linear both adults Caspersen C, 2011 (12) Bronchial nitric oxide flux and alveolar nitric oxide concentration after exposure to hyperoxia. 30, 50, 100 and 250 ml/s 3 nonlinear both adults Cattoni I, 2013 (13) Mechanisms of decrease in fractional exhaled nitric oxide during acute bronchoconstriction 10, 50, 100, 150 and 250 ml/s 3 nonlinear sick adults Chinellato I, 2012 (14) Bronchial and alveolar nitric oxide in exercise- induced 10, 50, 100 and 200 ml/s NR linear sick children 40 bronchoconstriction in asthmatic children. Chládková J, 2012 (15) Alveolar concentration and bronchial flux of nitric oxide: two linear modeling methods evaluated in children and adolescents with allergic rhinitis and atopic asthma. 50, 100, 150, 200 and 250 ml/s 2 linear sick children Choi J, 2009 (16) Multiple flow rates measurement of exhaled nitric oxide in patients with sarcoidosis: a pilot feasibility study. 50, 100, 150, 200, 250, 300 and 400 ml/s NR linear both adults Cohen J, 2008 (17) Physiology of the small airways: A gender difference? 30, 50, 100 and 200 ml/s 3 linear sick adults Condorelli P, 2007 (18) A simple technique to characterize proximal and peripheral nitric oxide exchange using constant flow exhalations and an axial diffusion model. 100, 150, 200 and 250 ml/s 3 linear sick adults Degano B, 2009 (19) Nitric oxide production by the alveolar compartment of the lungs in cirrhotic patients. 50, 100 and 200 ml/s NR linear both adults Delclaux C, 2002 (20) Increased nitric oxide output from alveolar origin during liver cirrhosis versus bronchial source during asthma. 50, 100, 150, 200, 250 and 300 ml/s NR linear sick adults Deykin A, 2002 (21) Exhaled nitric oxide as a diagnostic test for asthma: online versus offline techniques and effect of flow rate. 50, 100, 200, 350 and 500 ml/s 3 no model estimation both adults Duong-Quy S, 2016 (22) Study of Exhaled Nitric Oxide in Subjects with Suspected Obstructive Sleep Apnea: A Pilot Study in Vietnam. 50, 100, 150 and 350 ml/s NR linear sick adults Duplain H, 2000 (23) Exhaled nitric oxide in high-altitude pulmonary edema: role in the regulation of pulmonary vascular tone and evidence for a role against inflammation. 70 and 90 ml/s 3 no model estimation healthy adults Eckel SP, 2016 (24) Traffic-related air pollution and alveolar nitric oxide in southern California children. 30, 50, 100 and 300 ml/s 2 (3 for 50 ml/s) nonlinear healthy children Foley SC, 2007 (25) Airway nitric oxide output is reduced in bronchiectasis. 100, 150, 200 and 250 ml/s 3 linear both adults Foresi A, 2007 (26) Alveolar-derived exhaled nitric oxide is reduced in obstructive sleep apnea syndrome. 50, 120, 190, 250 and 300 ml/s 3 linear both adults Fortuna AM, 2011 (27) Airway and alveolar nitric oxide measurements in obstructive sleep apnea syndrome. 10, 30, 50, 100 and 200 ml/s NR linear both adults Fritscher LG, 2009 (28) The effect of montelukast on exhaled nitric oxide of alveolar and bronchial origin in inhaled 50, 100, 150, 200 and 250 ml/s 3 linear sick adults 41 corticosteroid-treated asthma. Fujisawa T, 2013 (29) Alveolar nitric oxide concentration reflects peripheral airway obstruction in stable asthma. 50, 100, 150 and 200 ml/s 3 linear sick adults Furukawa K, 2011 (30) Increase of nitrosative stress in patients with eosinophilic pneumonia. 50, 100, 175 and 370 ml/s 2 linear both adults Gelb AF, 2006 (a) (31) Role of spirometry and exhaled nitric oxide to predict exacerbations in treated asthmatics. 100, 150 and 200 ml/s 3 linear sick adults Gelb AF, 2008 (a) (32) Effect of fluticasone 250 microg/salmeterol 50 microg and montelukast on exhaled nitric oxide in asthmatic patients. 100, 150 and 200 ml/s 3 linear sick adults Gelb AF, 2009 (a) (33) Role of add-on zileuton on total exhaled, large airway, and small airway/alveolar nitric oxide in moderate- severe persistent adult asthmatics on fluticasone 250 microg/Salmeterol 50 microg. 50, 100, 150 and 200 ml/s 3 linear sick adults Gelb AF, 2010 (34) Central and peripheral airway sites of nitric oxide gas exchange in COPD. 50, 100, 150 and 200 ml/s 3 linear sick adults Gelb AF, 2010 (a) (35) Central and peripheral airway/alveolar sites of exhaled nitric oxide in acute asthma. 50, 100, 150 and 200 ml/s 3 linear sick adults Gelb AF, 2011 (36) Increased nitric oxide concentrations in the small airway of older normal subjects. 50, 100, 150 and 200 ml/s 3 linear healthy adults Gelb AF, 2012 (a) (37) In moderate-to-severe asthma patients monitoring exhaled nitric oxide during exacerbation is not a good predictor of spirometric response to oral corticosteroid. 50, 100, 150 and 200 ml/s 3 linear sick adults Girgis RE, 2002 (38) Partitioning of alveolar and conducting airway nitric oxide in scleroderma lung disease. 50, 100, 150 and 200 ml/s NR linear both adults Girgis RE, 2005 (39) Decreased exhaled nitric oxide in pulmonary arterial hypertension: response to bosentan therapy. 18, 50, 100 and 250 ml/s NR nonlinear sick adults Heijkenskjöld- Rentzhog C, 2012 (40) The fraction of NO in exhaled air and estimates of alveolar NO in adolescents with asthma: methodological aspects. 50, 100, 200 and 300 ml/s 2 linear both both Heijkenskjöld- Rentzhog C, 2014 (41) Alveolar and exhaled NO in relation to asthma characteristics--effects of correction for axial diffusion. 50, 100, 200 and 300 ml/s 2 linear sick both Hemmingsson TE, 2012 (42) Effects of ambient pressure on pulmonary nitric oxide. 29, 47, 100 and 321 ml/s 2 linear healthy adults Hirano T, 2013 (43) Relationship between alveolar nitric oxide concentration in exhaled 50, 100, 175 and 370 ml/s 3 linear both adults 42 air and small airway function in COPD. Hirano T, 2013 (44) Persistent elevation of exhaled nitric oxide and modification of corticosteroid therapy in asthma. 50, 100, 175 and 370 ml/s 3 linear sick adults Hofer M, 2009 (45) Extended nitric oxide measurements in exhaled air of cystic fibrosis and healthy adults. 50, 100, 150 and 200 ml/s 3 linear both adults Högman M, 2002 (46) Extended NO analysis applied to patients with COPD, allergic asthma and allergic rhinitis. 5, 100 and 500 ml/s 3 nonlinear both adults Högman M, 2011 (47) Added value with extended NO analysis in atopy and asthma. 5, 50, 100 and 500 ml/s 3 nonlinear both adults Högman M, 2014 (48) A practical approach to the theoretical models to calculate NO parameters of the respiratory system. 10, 30, 50, 100, 150, 200, 250 and 350 ml/s 3 nonlinear healthy adults Hua-Huy T, 2010 (b) (49) Increased alveolar concentration of nitric oxide is related to serum- induced lung fibroblast proliferation in patients with systemic sclerosis. 50, 100, 150, 200 and 250 ml/s NR linear both adults Jöbsis Q, 2001 (50) Controlled low flow off line sampling of exhaled nitric oxide in children. 50, 100 and 150 ml/s 3 no model estimation healthy children Kanazawa H, 2010 (c) (51) Validity of measurement of two specific biomarkers for the assessment of small airways inflammation in asthma. 50, 100, 150 and 200 ml/s 3 linear both adults Karsten J, 2014 (52) Bedside monitoring of ventilation distribution and alveolar inflammation in community-acquired pneumonia. 50 and 200 ml/s NR no model estimation sick adults Keen C, 2010 (53) Low levels of exhaled nitric oxide are associated with impaired lung function in cystic fibrosis. 30, 50, 100, 150, 200 and 250 ml/s 3 nonlinear both both Keen C, 2011 (54) Small airway function, exhaled NO and airway hyper-responsiveness in paediatric asthma. 30, 50, 100, 150, 200, 250 and 300 ml/s 3 nonlinear both children Kerckx Y, 2008 (55) Airway contribution to alveolar nitric oxide in healthy subjects and stable asthma patients. 50, 175 and 300 ml/s NR linear both adults Kissoon N, 2000 (d) (56) FE(NO): relationship to exhalation rates and online versus bag collection in healthy adolescents. 4, 5, 7, 10, 15, 23, 31 and 46 ml/s 3 no model estimation healthy children Kissoon N, 2002 (d) (57) Exhaled nitric oxide concentrations: online versus offline values in healthy children. 15, 23, 31 and 46 ml/s 3 no model estimation healthy children Kobayashi D, 2011 (c) (58) Comparison of alveolar nitric oxide concentrations using two different methods for assessing small airways obstruction in asthma. 50, 100, 150 and 200 mL/s 3 linear both adults Larsson-Callerfelt AK, 2015 (59) iNOS affects matrix production in distal lung 50, 100, 200, 300 and 400 ml/s NR linear both adults 43 fibroblasts from patients with mild asthma. Latzin P, 2002 (60) Exhaled nitric oxide in healthy children: variability and a lack of correlation with atopy. Design 1: 45, 86, 184 and 237 ml/s Design 2: 20, 45, 86, 184 and 237 ml/s Design 3: 10, 20, 45, 86 and 184 ml/s 3 linear healthy children Lehouck A, 2010 (61) Alveolar and bronchial exhaled nitric oxide in chronic obstructive pulmonary disease. 50, 100 and 200 ml/s 2 linear both adults Lehtimäki L, 2000 (e) (62) Increased bronchial nitric oxide production in patients with asthma measured with a novel method of different exhalation flow rates. 40, 100, 175 and 370 ml/s NR no model estimation both adults Lehtimäki L, 2001 (e) (63) Extended exhaled NO measurement differentiates between alveolar and bronchial inflammation. 100, 175 and 370 ml/s NR linear both adults Lehtimäki L, 2001 (e) (64) Inhaled fluticasone decreases bronchial but not alveolar nitric oxide output in asthma. 100, 175 and 370 ml/s NR linear both adults Lehtimäki L, 2002 (e) (65) Increased alveolar nitric oxide concentration in asthmatic patients with nocturnal symptoms. 100, 175 and 370 ml/s NR linear both adults Lehtimäki L, 2005 (e) (66) Peripheral inflammation in patients with asthmatic symptoms but normal lung function. 100, 175 and 370 ml/s NR linear both adults Lehtimäki L, 2010 (67) Pulmonary inflammation in asbestos-exposed subjects with borderline parenchymal changes on HRCT. 50, 100, 200 and 300 ml/s 3 linear both adults Lehtimäki L, 2010 (68) Bronchial nitric oxide is related to symptom relief during fluticasone treatment in COPD. 50, 100, 200 and 300 ml/s NR linear sick adults Lehtonen H, 2007 (69) Increased alveolar nitric oxide concentration and high levels of leukotriene B(4) and 8-isoprostane in exhaled breath condensate in patients with asbestosis. 50, 100, 200 and 300 ml/s NR linear both adults Linkosalo L, 2008 (70) Increased bronchial NO output in severe atopic eczema in children and adolescents. 50, 100, 200 and 300 ml/s NR linear sick both Linkosalo L, 2012 (71) Relation of bronchial and alveolar nitric oxide to exercise-induced bronchoconstriction in atopic children and adolescents. 50, 100, 200 and 300 ml/s NR linear sick both Lopez V, 2011 (72) Natural pollen exposure increases the response plateau to adenosine 5'- monophosphate and bronchial but not alveolar nitric oxide in sensitized subjects. 100, 200 and 300 ml/s 2 linear both adults 44 Lunt A, 2015 (73) Airway and alveolar nitric oxide production, lung function and pulmonary blood flow in sickle cell disease. 50, 100, 150 and 350 ml/s NR linear healthy children Mahut B, 2006 (74) Validity criteria and comparison of analytical methods of flow- independent exhaled NO parameters. 100, 150, 200, 275 350 ml/s NR linear healthy adults Makinen T, 2009 (75) Bronchial diffusing capacity of nitric oxide is increased in patients with allergic rhinitis. 10, 20, 30, 50, 100, 200, 300 and 400 ml/s 3 nonlinear both adults Malinovschi A, 2009 (76) Basal and induced NO formation in the pharyngo- oral tract influences estimates of alveolar NO levels. 100, 200, 300 and 400 ml/s 2 linear healthy adults Malinovschi A, 2011 (77) Increased plasma and salivary nitrite and decreased bronchial contribution to exhaled NO in pulmonary arterial hypertension. 20, 50, 100, 200 and 300 ml/s NR linear both adults Malmberg LP, 2013 (78) Very low birth weight and respiratory outcome: association between airway inflammation and hyperresponsiveness. 10, 50, 100 and 200 ml/s 2 linear both children Mandon J, 2012 (79) Exhaled nitric oxide monitoring by quantum cascade laser: comparison with chemiluminescent and electrochemical sensors. 15, 50, 100 and 250 ml/s 1 nonlinear sick children Maniscalco M, 2015 (80) Low alveolar and bronchial nitric oxide in severe uncomplicated obesity. 50, 100 and 150 ml/s NR linear both adults Maniscalco M, 2015 (81) Extended analysis of exhaled and nasal nitric oxide for the evaluation of chronic cough. 50, 100 and 150 ml/s NR linear sick adults Matsumoto H, 2010 (82) Association of alveolar nitric oxide levels with pulmonary function and its reversibility in stable asthma. 50, 100 and 200 ml/s NR linear sick adults McCurdy MR, 2011 (83) Exhaled nitric oxide parameters and functional capacity in chronic obstructive pulmonary disease. 8.3, 33.3, 50, 100, 150 and 250 ml/s 3 nonlinear sick adults Menzies D, 2008 (84) Effect of aspirin on airway inflammation and pulmonary function in patients with persistent asthma. 50, 100 and 200 ml/s 3 linear sick adults Modig L, 2014 (85) Short-term exposure to ozone and levels of exhaled nitric oxide. 50 and 270 ml/s 3 no model estimation healthy adults Nicolini G, 2010 (86) Both bronchial and alveolar exhaled nitric oxide are reduced with extrafine beclomethasone dipropionate in asthma. 10, 50 , 100, 200 and 260 ml/s NR nonlinear sick adults Nihlberg K, 2010 (87) Altered matrix production in the distal airways of individuals with asthma. 50, 100, 200 and 400 ml/s NR linear both adults 45 Paraskakis E, 2006 (88) Measurement of bronchial and alveolar nitric oxide production in normal children and children with asthma. 50, 100, 200 and 260 ml/s NR linear both children Paraskakis E, 2007 (89) Nitric oxide production in PCD: possible evidence for differential nitric oxide synthase function. 50, 100, 200 and 260 ml/s NR linear both children Paredi P, 2014 (90) A novel approach to partition central and peripheral airway nitric oxide. 50, 100, 200 and 300 ml/s 2 linear both adults Pedroletti C, 2003 (91) Nitric oxide airway diffusing capacity and mucosal concentration in asthmatic schoolchildren. 10, 50, 100 and 500 ml/s 3 nonlinear both children Prieto L, 2013 (92) Comparison of 2 methods to correct for peripheral nitric oxide exchange in the lungs. 50, 100, 200 and 300 ml/s NR linear both adults Prieto L, 2013 (93) The effect of spirometry on bronchial and alveolar nitric oxide in subjects with asthma. 50, 100, 200 and 300 ml/s 2 linear sick adults Prieto L, 2015 (94) Effects of cigarette smoke on methacholine- and AMP-induced air trapping in asthmatics. 100, 200 and 300 ml/s 2 linear sick adults Puckett JL, 2010 (95) An elevated bronchodilator response predicts large airway inflammation in mild asthma 50, 100 and 200 ml/s 3 linear both children Puckett JL, 2010 (f) (96) Impact of analysis interval on the multiple exhalation flow technique to partition exhaled nitric oxide. 50, 100 and 200 ml/s 3 linear sick children Radhakrishnan DK, 2012 (97) Lower airway nitric oxide is increased in children with sickle cell disease. 30, 50, 100, 150 and 200 ml/s 3 linear both children Robroeks CM, 2008 (98) Comparison of the anti- inflammatory effects of extra-fine hydrofluoroalkane- beclomethasone vs fluticasone dry powder inhaler on exhaled inflammatory markers in childhood asthma. 50, 100 and 200 ml/s NR linear sick children Robroeks CM, 2010 (99) Feasibility of exhaled nitric oxide measurements at various flow rates in children with asthma. 30, 50, 100 and 200 ml/s 2 linear sick children Rolla G, 2004 (100) Source of exhaled nitric oxide in primary biliary cirrhosis. 50, 100, 150 and 200 ml/s NR linear both adults Rosa MJ, 2011 (101) Fractional exhaled nitric oxide exchange parameters among 9-year-old inner- city children. 50, 83 and 100 ml/s 2 (3 for 83 ml/s) linear healthy children Roy K, 2007 (102) Use of different exhaled nitric oxide multiple flow rate models in COPD. 10, 30, 50, 100 and 200 ml/s 3 nonlinear both adults Sanders SP, 2004 (103) Role of nasal nitric oxide in the resolution of experimental rhinovirus infection. 9, 27, 41 and 78 ml/s 3 no model estimation healthy adults 46 Sardón O, 2014 (104) Alveolar nitric oxide and its role in pediatric asthma control assessment. 50, 100 and 200 ml/s 3 linear both children Sauni R, 2012 (105) Increased alveolar nitric oxide and systemic inflammation markers in silica-exposed workers. 50, 100, 200, 300 and 400 ml/s NR linear both adults Scichilone N, 2013 (106) Exhaled nitric oxide is associated with cyclic changes in sexual hormones. 50, 150, 250 and 350 ml/s NR linear healthy adults Scichilone N, 2013 (107) Clinical and anti- inflammatory effects of ultra-short preseasonal vaccine to Parietaria in asthma. 50, 100, 150 and 350 ml/s NR linear sick adults Scichilone N, 2013 (108) Alveolar nitric oxide and asthma control in mild untreated asthma. 50, 150, 25 and 350 ml/s 3 linear sick adults Scichilone N, 2013 (109) Is health-related quality of life associated with upper and lower airway inflammation in asthmatics? 50, 150, 250 and 350 ml/s 3 linear sick adults Sehlstedt M, 2010 (110) Antioxidant airway responses following experimental exposure to wood smoke in man. 10, 50, 100 and 270 ml/s 3 no model estimation healthy adults Sepponen A, 2008 (111) Alveolar and bronchial nitric oxide output in healthy children. 10, 50, 100 and 200 ml/s 3 nonlinear healthy children Shelley, 2010 (f) (112) Quantifying proximal and distal sources of NO in asthma using a multicompartment model. 50, 100 and 200 ml/s 3 no model estimation sick children Shin HW, 2001 (113) Flow-independent nitric oxide exchange parameters in healthy adults. 50 and 250 ml/s 3 no model estimation using fixed flow rates healthy adults Shin HW, 2002 (114) Flow-independent nitric oxide exchange parameters in cystic fibrosis. 50 and 250 ml/s 5 no model estimation using fixed flow rates (used model using tidal breathing) both children Shin HW, 2003 (115) Impact of high-intensity exercise on nitric oxide exchange in healthy adults. 50 and 250 ml/s 3 no model estimation using fixed flow rates (used model using tidal breathing) healthy adults Shin HW, 2004 (116) Airway diffusing capacity of nitric oxide and steroid therapy in asthma. 50 and 250 ml/s 3 no model estimation using fixed flow rates (used model using tidal breathing) both adults Shinkai M, 2002 (117) Analysis of exhaled nitric oxide by the helium bolus method. 60, 120 and 240 ml/s NR no model estimation both adults Shirai T, 2013 (118) Respiratory mechanics and peripheral airway inflammation and dysfunction in asthma. 50, 100, 150, 200 and 250 ml/s NR linear sick adults Shoemark A, 2009 (119) Bronchial and peripheral airway nitric oxide in 50, 100 and 200 ml/s 2 linear sick adults 47 primary ciliary dyskinesia and bronchiectasis. Short PM, 2012 (120) Effects of extra-fine inhaled and oral corticosteroids on alveolar nitric oxide in COPD. 50, 100, 150 and 200 ml/s 2 linear sick adults Shorter JH, 2011 (121) Clinical study of multiple breath biomarkers of asthma and COPD (NO, CO(2), CO and N(2)O) by infrared laser spectroscopy. 10, 50, 100 and 333 ml/s 3 linear both both Spears M, 2011 (122) Bronchial nitric oxide flux (J'aw) is sensitive to oral corticosteroids in smokers with asthma. 30, 50, 100, 150, 200, 250 and 300 ml/s 3 nonlinear sick adults St Croix CM, 1999 (123) Assessment of nitric oxide formation during exercise. 46 and 950 ml/s 2 no model estimation healthy adults Suri R, 2007 (124) Alveolar, but not bronchial nitric oxide production is elevated in cystic fibrosis. 50, 100, 200 and 260 ml/s 3 linear both children Tajiri T, 2014 (125) Comprehensive efficacy of omalizumab for severe refractory asthma: a time- series observational study. 50, 100 and 200 ml/s NR linear sick adults Thornadtsson A, 2015 (126) Extended nitric oxide analysis may improve personalized anti- inflammatory treatment in asthmatic children with intermediate F(E)NO50. 16, 50, 90 and 230 ml/s 1 nonlinear sick children Tiev KP, 2009 (b) (127) Exhaled nitric oxide, but not serum nitrite and nitrate, is a marker of interstitial lung disease in systemic sclerosis. 100, 150 and 200 ml/s NR linear both adults Tiev KP, 2009 (b) (128) Diagnostic value of exhaled nitric oxide to detect interstitial lung disease in systemic sclerosis. 50, 100, 150 and 200 ml/s 2 linear sick adults Tiev KP, 2012 (b) (129) Alveolar concentration of nitric oxide predicts pulmonary function deterioration in scleroderma. 50, 100, 150 and 200 ml/s NR linear sick adults Tiev KP, 2013 (b) (130) High alveolar concentration of nitric oxide is associated with alveolitis in scleroderma. 100, 150 and 200 ml/s NR linear both adults Tiev KP, 2014 (b) (131) Exhaled NO predicts cyclophosphamide response in scleroderma- related lung disease. 50, 100, 150 and 200 ml/s NR linear sick adults Törnberg DC, 2002 (132) Nasal and oral contribution to inhaled and exhaled nitric oxide: a study in tracheotomized patients. 50, 100, 200 and 300 ml/s 2 no model estimation both adults Tufvesson E, 2013 (133) Increase of club cell (Clara) protein (CC16) in plasma and urine after exercise challenge in asthmatics and healthy controls, and correlations to exhaled breath temperature and exhaled nitric oxide. 50, 100, 200 and 300 ml/s 3 linear both adults van Amsterdam JG, 2003 (134) Flow dependency and off- line measurement of exhaled NO in children. Design 1: 8.3 and 350 ml/s (children) 2 no model estimation healthy children 48 Design 2: 8.3, 100, 120, 250 and 540 ml/s (adults) van de Kant KD, 2015 (135) The effect of body weight on distal airway function and airway inflammation. 30, 50, 100, 200 and 300 ml/s 2 nonlinear both adults Van Muylem A, 2010 (136) Acinar effect of inhaled steroids evidenced by exhaled nitric oxide. 50, 175 and 300 ml/s NR linear sick adults Volbeda F, 2012 (137) Clinical control of asthma associates with measures of airway inflammation. 30, 50, 100 and 200 ml/s 3 linear sick adults Walker WT, 2013 (138) Upper and lower airway nitric oxide levels in primary ciliary dyskinesia, cystic fibrosis and asthma. 50, 100, 200 and 250 ml/s 2 linear both children Wiel E, 2014 (139) Effects of small airway dysfunction on the clinical expression of asthma: a focus on asthma symptoms and bronchial hyper- responsiveness. 20, 50, 100 and 200 ml/s NR linear sick adults Williamson PA, 2009 (140) A proof-of-concept study to evaluate the antiinflammatory effects of a novel soluble cyclodextrin formulation of nebulized budesonide in patients with mild to moderate asthma. 50, 100 and 200 ml/s NR linear sick adults Williamson PA, 2011 (141) Assessment of small- airways disease using alveolar nitric oxide and impulse oscillometry in asthma and COPD. 50, 100 and 200 ml/s NR linear both adults Williamson PA, 2013 (142) Inhaled and systemic corticosteroid response in severe asthma assessed by alveolar nitric oxide: a randomized crossover pilot study of add-on therapy. 50, 100 and 200 ml/s 2 linear sick adults Wuttge DM, 2010 (143) Increased alveolar nitric oxide in early systemic sclerosis. 50, 100, 200 and 400 ml/s 3 linear both adults Yasui H, 2012 (144) Impact of add-on pranlukast in stable asthma; the additive effect on peripheral airway inflammation. 50, 100, 150 and 200 ml/s 3 linear sick adults Zacharasiewicz A, 2004 (145) Effect of inhalation times on exhaled NO. 50, 100 and 200 ml/s NR no model estimation both both Zetterquist W, 2002 (146) Exhaled carbon monoxide is not elevated in patients with asthma or cystic fibrosis. 75 and 150 ml/s 2 no model estimation both adults Zhao Y, 2012 (147) Characteristics of pulmonary inflammation in combined pulmonary fibrosis and emphysema. 10, 50, 100 and 200 ml/s 3 linear both adults NR= Not reported 1 Observed FeNO was used to estimate NO parameters using linear approximation models (148, 149) (these study designs typically restricted to high flow rates only) or by direct nonlinear (150, 151) or approximate nonlinear models (46, 152). In several cases, fixed flow rates were not used to estimate parameters in a steady-state two- compartment model (no model estimation). 2 Participants were healthy/recruited from the general population or had a specific condition/disease (included case only and case-control studies). 49 3 Age group: children only (≤ 18 years old) vs. children and adults vs. adults only (a), (b), (c), (d), (e), (f) and (g)- clusters of studies with similarities in flow rate sample design, recruitment and/or eligibility criteria, but no clear evidence that it was the same study subjects Fisher’s information matrix and criteria for an optimal sampling design In statistical estimation, evidence from observed data about unknown parameters is typically summarized using the likelihood function. The observed log likelihood function for each subject is: 𝑙 𝜃 , 𝑉 ̇ = − 𝑛 2 ln(2𝜋 ) − 𝑛 2 ln(𝜎 ) − 1 2𝜎 𝑌 − 𝑓 𝜃 , 𝑉 ̇ = 𝐶 − 1 2𝜎 𝑌 − 𝑓 𝜃 , 𝑉 ̇ 𝑌 − 𝑓 𝜃 , 𝑉 ̇ where 𝑛 is the total number of maneuvers measured (number of target flow rates times number of replicates at each flow), 𝑌 is a vector of observed outcomes for each maneuver 𝑌 , … , 𝑌 , 𝑓 𝜃 , 𝑉 ̇ is a vector of mean function values for each maneuver 𝑓 𝜃 , 𝑉 ̇ , … , 𝑓 𝜃 , 𝑉 ̇ , and 𝐶 denotes the “constant” terms of the log likelihood, which are constant in that they do not depend on 𝜃 . For a specific flow rate sampling design, 𝑑 , the Fisher information matrix, 𝐼 (𝜃 , 𝑑 ), is defined as the negative expectation of the second derivative—with respect to the model parameters 𝜃 —of the log-likelihood function: 𝐼 (𝜃 , 𝑑 ) = −𝐸 𝜕 𝑙 𝜃 , 𝑉 ̇ 𝜕𝜃𝜕𝜃 (E1) Assuming that observations are independent, the Fisher information is the sum of the information from the individual maneuvers (153). According to the Cramér–Rao inequality, the inverse Fisher information matrix is the minimal covariance matrix of an unbiased estimator (154, 155). The square root of each element on the principal diagonal of the inverse Fisher Information matrix, referred to henceforth as the theoretical covariance matrix, is the standard error (SE) of one of the models parameters (i.e., the NO parameters) and the off diagonal elements represent the covariance between each parameter pair. The determinant of the theoretical covariance matrix, Det(Covariance) = det(𝐼 (𝜃 , 𝑑 ) ), is proportional to the volume of the joint asymptotic confidence region of the parameters (153, 156, 157). A sampling design minimizing determinant of the covariance [Det(Covariance)] can be interpreted as an overall optimal design. Note that a design that maximizes the determinant of the Fisher Information matrix is commonly referred to as the “D optimal” design, but here we considered designs that minimize the determinant of the inverse Fisher Information matrix for interpretability. We defined the “D-Cov” criterion to be the square root of Det(Covariance) / , so it is on the same scale as the NO parameter SE. When the covariance matrix is diagonal, Det(Covariance) / is the geometric mean of the variance of the 3 NO parameters (157). 50 R code for calculating the theoretical covariance ########################################################### # function: derive theoretical covariance matrix components ########################################################### TheoryCov<-function(Ca,Jaw,Daw,ReplicatesNum,SD,TargetFlowNum,FlowRateGride) { # variance of the error, additive on a logFeNO scale sigma2<-SD^2 # replicated flow rates WorkGrid<-sort(rep(FlowRateGride,ReplicatesNum)) # each row represents the location of a flow rate combination FlowRateCombLocation<-t(combn(1:length(FlowRateGride),TargetFlowNum)) # set result table CombNum<-dim(FlowRateCombLocation)[1] # total number of flow rate combinations NAvec<-rep(NA,CombNum) # length of NA vector Results<-eval(parse(text=paste0("data.frame(CombNum=1:",CombNum,",", paste0("flow",1:TargetFlowNum,"=NAvec",collapse=","), ",DetCov=NAvec,JawSE=NAvec,DawSE=NAvec,CaSE=NAvec)"))) FeNO<-Jaw/Daw+(Ca-Jaw/Daw)*exp(-Daw/WorkGrid) # 2nd derivatives of the log-likelihood (transform both sides two-compartment model) model<-deriv(~-0.5*log(sigma2)- (log(FeNO)- log(Jaw/Daw + (Ca - Jaw/Daw)*exp(-Daw/WorkGrid)))^2/ (2*sigma2),c("Jaw","Daw","Ca"),hessian=T) # calculate hessian, conditional on different flow rates ConditionalHessian<-attr(eval(model), "hessian") ### cycle through flow rate combinations for (i in 1:CombNum) { # flow rate locations (no replicates) FlowRateCombLocation_i<-FlowRateCombLocation[i,] # save flow rate combination Results[i,2:(TargetFlowNum+1)]<-FlowRateGride[FlowRateCombLocation_i] # flow rate locations (with replicates) FlowRateComb<-which(WorkGrid%in%FlowRateGride[FlowRateCombLocation_i]) # Hessian for specific flow rate combination Hessian<-ConditionalHessian[FlowRateComb,,] # Information matrix Information<- -apply(Hessian, 3, colSums) # save det(Covariance), change scale to "SE" scale Results$DetCov[i]<-sqrt((1/det(Information))^(1/3)) # covariance matrix CovMat<-solve(Information) # save SE's Results$JawSE[i]<-sqrt(CovMat["Jaw","Jaw"]) Results$DawSE[i]<-sqrt(CovMat["Daw","Daw"]) Results$CaSE[i]<-sqrt(CovMat["Ca","Ca"]) } return(Results) } # example TheoryCov(Ca=1.2,Jaw=700,Daw=12.3,ReplicatesNum=2,SD=0.15, TargetFlowNum=4,FlowRateGride=c(10,20,30,50,100,200,300,400)) First 5 lines of the result: CombNum flow1 flow2 flow3 flow4 DetCov JawSE DawSE CaSE 1 1 10 20 30 50 7.067786 213.06696 7.665572 4.2154151 2 2 10 20 30 100 4.767254 124.70979 5.241203 1.3772627 3 3 10 20 30 200 3.664492 101.30075 4.599873 0.6084819 4 4 10 20 30 300 3.253707 95.14145 4.429319 0.4055565 5 5 10 20 30 400 3.028784 92.30200 4.350253 0.3126046 51 Difference between theoretical calculation of the covariance and the analytical method used in ‘nls’ Numerical methods are required to obtain parameter estimates because there is no closed-form solution for nonlinear model parameter estimation. The ‘nls’ function uses a Gauss-Newton algorithm method to find the parameters values (𝜃 ) that minimize the sum of squared residuals, ∑ 𝑌 − 𝑓 𝜃 , 𝑉 ̇ . Note that there is a difference between the theoretical calculation of the covariance and associated SEs presented earlier and the analytical method used in ‘nls’. Briefly, the observed information matrix (Equation E1): − ( , ̇ ) = − ∇ 𝑓 𝜃 , 𝑉 ̇ 𝑌 − 𝑓 𝜃 , 𝑉 ̇ + ∇ 𝑓 𝜃 , 𝑉 ̇ ∇ 𝑓 𝜃 , 𝑉 ̇ (E2) where ∇ 𝑓 𝜃 , 𝑉 ̇ = ( , ̇ ) , ∇ 𝑓 𝜃 , 𝑉 ̇ = ( , ̇ ) and all quantities in Equation E2 are evaluated at 𝜃 = 𝜃 . For linear regression models, the analytical solution requires the sum of the errors, 𝑌 − 𝑓 𝜃 , 𝑉 ̇ to equal zero thus reducing Equation E2 to − ( , ̇ ) = ∇ 𝑓 𝜃 , 𝑉 ̇ ∇ 𝑓 𝜃 , 𝑉 ̇ (E3) For nonlinear regression models, the sum of the errors may not equal zero (158). In ‘nls’, the inverse of Equation E3 is used as the observed covariance matrix. As previously reported, the nonlinear “log transform both sides” has excellent fit with median R 2 = 0.99 (151) so we expected that Equations E2 and E3 would produce similar values in practice. We confirmed this in our simulation studies. Detailed description of the simulation study on power to detect associations with NO parameters In this simulation study, we assumed the k th true NO parameter, 𝜃 , (𝑘 = 1, … ,3) was related to a continuous determinant 𝑋 : 𝜃 = 𝛼 + 𝛽 𝑋 + 𝛿 (E4) where 𝑖 indexes study participant, 𝑖 = 1, … 𝑁 , and 𝛿 are normally distributed errors. It is common practice for researchers to estimate 𝛽 using a model where 𝜃 is replaced with its estimate, 𝜃 . Unbiased flow rate sampling designs impact these studies through the uncertainty in NO parameter estimation. In an unbiased design, the estimated NO parameter equals the unobservable, true parameter plus error 𝜃 = 𝜃 + 𝜖 where the error depends in part on the design. For a given flow rate sampling design and NO parameter of interest (𝜃 ), we simulated 10,000 datasets each containing extended NO analysis data from N hypothetical participants. In a similar manner to the theoretical calculation, the error (Equation 2) variance was set to 𝜎 =0.15 2 , there were 2 replicates at each flow rate, and we assumed average NO parameter values across the participants equal to the median CHS NO parameter values. 𝜃 was a function of a continuous predictor 𝑋 and 𝛽 , the effect associated with a change in 1 SD unit of 𝑋 (Equation E4). The other 2 NO parameters remained fixed to the median CHS NO parameter values. The distribution of 𝑋 and 𝛿 was assumed to be normal with mean zero. The 𝛿 variance was set to 1 2 (similar to CHS estimates) and the 𝑋 variance was set to 0.1 2 . Then, we estimated NO parameters for each of the N participants using log transform-both-sides nonlinear regression (to achieve higher rates of nls convergence, logDawNO was constrained to be ≥ -1). Finally, we related 𝜃 to Xi in the N participants in a simple linear regression and recorded 𝛽 and its associated p-value. This process was then repeated for the other NO parameters and the other sampling designs. 52 Figure S1.1 Impact of different NO parameter values (literature review) Based on NO parameters observed in various diseases, from a literature review*—on optimality criteria for the 70 possible four flow rate designs from the eight flow rate discrete sampling space. * Based on the Högman 2012 review (159), NO parameters are assumed to be: the median for all CHS participants, CANO=1.2 ppb, J’awNO=700 pl/s, DawNO=12.3 pl·s -1 ·ppb -1 (median CHS); Keen, 2010 (53) (CF) CANO=1 ppb, J’awNO=429 pl/s, DawNO=14 pl·s -1 ·ppb -1 ; Pedroletti, 2003 (91) (Asthma) CANO=1.5 ppb, J’awNO=3500 pl/s, DawNO=26 pl·s -1 ·ppb -1 ; McCurdy, 2011 (83) (COPD) CANO=4.6 ppb, J’awNO=400 pl/s, DawNO=8.8 pl·s -1 ·ppb -1 . Sampling designs are indexed according to a rank ordering of the performance under the reference of NO parameters equal to the median CHS. REFERENCES 1. Akamatsu T, Shirai T, Kato M, et al. Switching from salmeterol/fluticasone to formoterol/budesonide combinations improves peripheral airway/alveolar inflammation in asthma. Pulmonary Pharmacology & Therapeutics 2014;27(1):52-6. 2. Bake B, Viklund E, Olin A-C. Effects of pollen season on central and peripheral nitric oxide production in subjects with pollen asthma. Respiratory Medicine 2014;108(9):1277-83. 3. Barath S, Mills NL, Adelroth E, et al. Diesel exhaust but not ozone increases fraction of exhaled nitric oxide in a randomized controlled experimental exposure study of healthy human subjects. Environmental Health 2013;12:36. 4. Barregard L, Sällsten G, Andersson L, et al. Experimental exposure to wood smoke: effects on airway inflammation and oxidative stress. Occupational and Environmental Medicine 2008;65(5):319-24. 5. Bazeghi N, Gerds TA, Budtz-Jørgensen E, et al. Exhaled nitric oxide measure using multiple flows in clinically relevant subgroups of COPD. Respiratory Medicine 2011;105(9):1338-44. 6. Brindicci C, Ito K, Resta O, et al. Exhaled nitric oxide from lung periphery is increased in COPD. European Respiratory Journal 2005;26(1):52-9. 7. Brindicci C, Ito K, Barnes PJ, et al. Differential flow analysis of exhaled nitric oxide in patients with asthma of differing severity. CHEST Journal 2007;131(5):1353-62. 8. Brindicci C, Ito K, Barnes PJ, et al. Effect of an inducible nitric oxide synthase inhibitor on differential flow-exhaled nitric oxide in asthmatic patients and healthy volunteers. CHEST Journal 2007;132(2):581-8. 9. Cameli P, Bargagli E, Refini R, et al. Exhaled nitric oxide in interstitial lung diseases. Respiratory Physiology & Neurobiology 2014;197:46-52. 10. Cameli P, Bargagli E, Fossi A, et al. Exhaled nitric oxide and carbon monoxide in lung transplanted patients. Respiratory Medicine 2015;109(9):1224-9. 11. Cameli P, Barbagli E, Rottoli P. Exhaled nitric oxide is not increased in pulmonary sarcoidosis. Sarcoidosis Vasculitis and Diffuse Lung Disease 2016;33(1):39-40. 53 12. Caspersen C, Stensrud T, Thorsen E. Bronchial nitric oxide flux and alveolar nitric oxide concentration after exposure to hyperoxia. Aviation, Space, and Environmental Medicine 2011;82(10):946-50. 13. Cattoni I, Guarnieri G, Tosetto A, et al. Mechanisms of decrease in fractional exhaled nitric oxide during acute bronchoconstriction. CHEST Journal 2013;143(5):1269-76. 14. Chinellato I, Piazza M, Peroni D, et al. Bronchial and alveolar nitric oxide in exercise‐ induced bronchoconstriction in asthmatic children. Clinical & Experimental Allergy 2012;42(8):1190-6. 15. Chládková J, Senkerík M, Havlínová Z, et al. Alveolar concentration and bronchial flux of nitric oxide: two linear modeling methods evaluated in children and adolescents with allergic rhinitis and atopic asthma. Pediatric Pulmonology 2012;47(11):1070-9. 16. Choi J, Hoffman LA, Sethi JM, et al. Multiple flow rates measurement of exhaled nitric oxide in patients with sarcoidosis: a pilot feasibility study. Sarcoidosis Vasculitis and Diffuse Lung Disease 2009;26(2):98. 17. Cohen J, Douma W, Ten Hacken N, et al. Physiology of the small airways: A gender difference? Respiratory Medicine 2008;102(9):1264-71. 18. Condorelli P, Shin HW, Aledia AS, et al. A simple technique to characterize proximal and peripheral nitric oxide exchange using constant flow exhalations and an axial diffusion model. Journal of Applied Physiology 2007;102(1):417-25. 19. Degano B, Mittaine M, Hervé P, et al. Nitric oxide production by the alveolar compartment of the lungs in cirrhotic patients. European Respiratory Journal 2009;34(1):138-44. 20. Delclaux C, Mahut B, Zerah-Lancner F, et al. Increased nitric oxide output from alveolar origin during liver cirrhosis versus bronchial source during asthma. American Journal of Respiratory and Critical Care Medicine 2002;165(3):332-7. 21. Deykin A, Massaro AF, Drazen JM, et al. Exhaled nitric oxide as a diagnostic test for asthma: online versus offline techniques and effect of flow rate. American Journal of Respiratory and Critical Care Medicine 2002;165(12):1597-601. 22. Duong-Quy S, Hua-Huy T, Tran-Mai-Thi H-T, et al. Study of Exhaled Nitric Oxide in Subjects with Suspected Obstructive Sleep Apnea: A Pilot Study in Vietnam. Pulmonary Medicine 2016;2016. 23. DUPLAIN H, SARTORI C, LEPORI M, et al. Exhaled nitric oxide in high-altitude pulmonary edema: role in the regulation of pulmonary vascular tone and evidence for a role against inflammation. American Journal of Respiratory and Critical Care Medicine 2000;162(1):221-4. 24. Eckel SP, Zhang Z, Habre R, et al. Traffic-related air pollution and alveolar nitric oxide in southern California children. European Respiratory Journal 2016:ERJ-01176-2015. 25. Foley SC, Hopkins NO, Fitzgerald MX, et al. Airway nitric oxide output is reduced in bronchiectasis. Respiratory Medicine 2007;101(7):1549-55. 26. Foresi A, Leone C, Olivieri D, et al. Alveolar-derived exhaled nitric oxide is reduced in obstructive sleep apnea syndrome. CHEST Journal 2007;132(3):860-7. 27. Fortuna A, Miralda R, Calaf N, et al. Airway and alveolar nitric oxide measurements in obstructive sleep apnea syndrome. Respiratory Medicine 2011;105(4):630-6. 28. Fritscher LG, Rodrigues MT, Zamel N, et al. The effect of montelukast on exhaled nitric oxide of alveolar and bronchial origin in inhaled corticosteroid-treated asthma. Respiratory Medicine 2009;103(2):296-300. 54 29. Fujisawa T, Yasui H, Akamatsu T, et al. Alveolar nitric oxide concentration reflects peripheral airway obstruction in stable asthma. Respirology 2013;18(3):522-7. 30. Furukawa K, Sugiura H, Matsunaga K, et al. Increase of nitrosative stress in patients with eosinophilic pneumonia. Respiratory Research 2011;12(1):1-11. 31. Gelb AF, Taylor CF, Shinar CM, et al. Role of spirometry and exhaled nitric oxide to predict exacerbations in treated asthmatics. CHEST Journal 2006;129(6):1492-9. 32. Gelb AF, Taylor CF, Shinar CM, et al. Effect of fluticasone 250 microg/salmeterol 50 microg and montelukast on exhaled nitric oxide in asthmatic patients. Canadian Respiratory Journal 2008;15(4):193-8. 33. Gelb AF, Taylor CF, Simmons M, et al. Role of add-on zileuton on total exhaled, large airway, and small airway/alveolar nitric oxide in moderate-severe persistent adult asthmatics on fluticasone 250μg/Salmeterol 50μg. Pulmonary Pharmacology & Therapeutics 2009;22(6):516-21. 34. Gelb AF, Taylor CF, Krishnan A, et al. Central and peripheral airway sites of nitric oxide gas exchange in COPD. CHEST Journal 2010;137(3):575-84. 35. Gelb AF, George SC, Silkoff PE, et al. Central and peripheral airway/alveolar sites of exhaled nitric oxide in acute asthma. Thorax 2010;65(7):619-25. 36. Gelb AF, George SC, Camacho F, et al. Increased nitric oxide concentrations in the small airway of older normal subjects. CHEST Journal 2011;139(2):368-75. 37. Gelb AF, Moridzadeh R, Singh DH, et al. In moderate-to-severe asthma patients monitoring exhaled nitric oxide during exacerbation is not a good predictor of spirometric response to oral corticosteroid. Journal of Allergy and Clinical Immunology 2012;129(6):1491-8. 38. Girgis RE, Gugnani MK, Abrams J, et al. Partitioning of alveolar and conducting airway nitric oxide in scleroderma lung disease. American Journal of Respiratory and Critical Care Medicine 2002;165(12):1587-91. 39. Girgis RE, Champion HC, Diette GB, et al. Decreased exhaled nitric oxide in pulmonary arterial hypertension: response to bosentan therapy. American Journal of Respiratory and Critical Care Medicine 2005;172(3):352-7. 40. Heijkenskjöld‐Rentzhog C, Alving K, Kalm‐Stephens P, et al. The fraction of NO in exhaled air and estimates of alveolar NO in adolescents with asthma: Methodological aspects. Pediatric Pulmonology 2012;47(10):941-9. 41. Heijkenskjöld‐Rentzhog C, Nordvall L, Janson C, et al. Alveolar and exhaled NO in relation to asthma characteristics–effects of correction for axial diffusion. Allergy 2014;69(8):1102-11. 42. Hemmingsson TE, Linnarsson D, Frostell C, et al. Effects of ambient pressure on pulmonary nitric oxide. Journal of Applied Physiology 2012;112(4):580-6. 43. Hirano T, Matsunaga K, Sugiura H, et al. Relationship between alveolar nitric oxide concentration in exhaled air and small airway function in COPD. Journal of Breath Research 2013;7(4):046002. 44. Hirano T, Matsunaga K, Sugiura H, et al. Persistent elevation of exhaled nitric oxide and modification of corticosteroid therapy in asthma. Respiratory Investigation 2013;51(2):84-91. 45. Hofer M, Mueller L, Rechsteiner T, et al. Extended nitric oxide measurements in exhaled air of cystic fibrosis and healthy adults. Lung 2009;187(5):307-13. 55 46. Högman M, Holmkvist T, Wegener T, et al. Extended NO analysis applied to patients with COPD, allergic asthma and allergic rhinitis. Respiratory Medicine 2002;96(1):24- 30. 47. Högman M, Malinovschi A, Norbäck D, et al. Added value with extended NO analysis in atopy and asthma. Clinical Physiology and Functional Imaging 2011;31(4):294-9. 48. Högman M, Thornadtsson A, Hedenstierna G, et al. A practical approach to the theoretical models to calculate NO parameters of the respiratory system. Journal of Breath Research 2014;8(1):016002. 49. Hua-Huy T, Tiev KP, Chéreau C, et al. Increased alveolar concentration of nitric oxide is related to serum-induced lung fibroblast proliferation in patients with systemic sclerosis. The Journal of Rheumatology 2010;37(8):1680-7. 50. Jöbsis Q, Raatgeep H, Hop W, et al. Controlled low flow off line sampling of exhaled nitric oxide in children. Thorax 2001;56(4):285-9. 51. Kanazawa H, Kyoh S, Asai K, et al. Validity of measurement of two specific biomarkers for the assessment of small airways inflammation in asthma. Journal of Asthma 2010;47(4):400-6. 52. Karsten J, Krabbe K, Heinze H, et al. Bedside monitoring of ventilation distribution and alveolar inflammation in community-acquired pneumonia. Journal of Clinical Monitoring and Computing 2014;28(4):403-8. 53. Keen C, Gustafsson P, Lindblad A, et al. Low levels of exhaled nitric oxide are associated with impaired lung function in cystic fibrosis. Pediatric Pulmonology 2010;45(3):241-8. 54. Keen C, Olin A-C, Wennergren G, et al. Small airway function, exhaled NO and airway hyper-responsiveness in paediatric asthma. Respiratory Medicine 2011;105(10):1476-84. 55. Kerckx Y, Michils A, Van Muylem A. Airway contribution to alveolar nitric oxide in healthy subjects and stable asthma patients. Journal of Applied Physiology 2008;104(4):918-24. 56. Kissoon N, Duckworth LJ, Blake KV, et al. FE NO: Relationship to exhalation rates and online versus bag collection in healthy adolescents. American Journal of Respiratory and Critical Care Medicine 2000;162(2):539-45. 57. Kissoon N, Duckworth LJ, Blake KV, et al. Exhaled nitric oxide concentrations: online versus offline values in healthy children. Pediatric Pulmonology 2002;33(4):283-92. 58. Kobayashi D, Tochino Y, Kanazawa H, et al. Comparison of alveolar nitric oxide concentrations using two different methods for assessing small airways obstruction in asthma. Respirology 2011;16(5):862-8. 59. Larsson-Callerfelt A-K, Weitoft M, Nihlberg K, et al. iNOS affects matrix production in distal lung fibroblasts from patients with mild asthma. Pulmonary Pharmacology & Therapeutics 2015;34:64-71. 60. Latzin P, Beck J, Griese M. Exhaled nitric oxide in healthy children: variability and a lack of correlation with atopy. Pediatr Allergy Immunol 2002;13(1):37-46. 61. Lehouck A, Carremans C, De Bent K, et al. Alveolar and bronchial exhaled nitric oxide in chronic obstructive pulmonary disease. Respiratory Medicine 2010;104(7):1020-6. 62. Lehtimäki L, Turjanmaa V, Kankaanranta H, et al. Increased bronchial nitric oxide production in patients with asthma measured with a novel method of different exhalation flow rates. Annals of Medicine 2000;32(6):417-23. 56 63. LEHTIMÄKI L, KANKAANRANTA H, SAARELAINEN S, et al. Extended exhaled NO measurement differentiates between alveolar and bronchial inflammation. American Journal of Respiratory and Critical Care Medicine 2001;163(7):1557-61. 64. Lehtimäki L, Kankaanranta H, Saarelainen S, et al. Inhaled fluticasone decreases bronchial but not alveolar nitric oxide output in asthma. European Respiratory Journal 2001;18(4):635-9. 65. Lehtimäki L, Kankaanranta H, Saarelainen S, et al. Increased alveolar nitric oxide concentration in asthmatic patients with nocturnal symptoms. European Respiratory Journal 2002;20(4):841-5. 66. Lehtimäki L, Kankaanranta H, Saarelainen S, et al. Peripheral inflammation in patients with asthmatic symptoms but normal lung function. Journal of Asthma 2005;42(7):605-9. 67. Lehtimäki L, Oksa P, Järvenpää R, et al. Pulmonary inflammation in asbestos-exposed subjects with borderline parenchymal changes on HRCT. Respiratory Medicine 2010;104(7):1042-9. 68. Lehtimäki L, Kankaanranta H, Saarelainen S, et al. Bronchial nitric oxide is related to symptom relief during fluticasone treatment in COPD. European Respiratory Journal 2010;35(1):72-8. 69. Lehtonen H, Oksa P, Lehtimäki L, et al. Increased alveolar nitric oxide concentration and high levels of leukotriene B4 and 8-isoprostane in exhaled breath condensate in patients with asbestosis. Thorax 2007;62(7):602-7. 70. Linkosalo L, Lehtimäki L, Laitinen J, et al. Increased bronchial NO output in severe atopic eczema in children and adolescents. Pediatric Allergy and Immunology 2008;19(5):426-32. 71. Linkosalo L, Lehtimäki L, Holm K, et al. Relation of bronchial and alveolar nitric oxide to exercise‐induced bronchoconstriction in atopic children and adolescents. Pediatric Allergy and Immunology 2012;23(4):360-6. 72. Lopez V, Prieto L, Perez-Frances C, et al. Natural pollen exposure increases the response plateau to adenosine 5'-monophosphate and bronchial but not alveolar nitric oxide in sensitized subjects. Respiration 2011;83(3):225-32. 73. Lunt A, Ahmed Ne, Rafferty GF, et al. Airway and alveolar nitric oxide production, lung function, and pulmonary blood flow in sickle cell disease. Pediatric Research 2015. 74. Mahut B, Louis B, Zerah-Lancner F, et al. Validity criteria and comparison of analytical methods of flow-independent exhaled NO parameters. Respiratory Physiology & Neurobiology 2006;153(2):148-56. 75. Mäkinen T, Lehtimäki L, Kinnunen H, et al. Bronchial diffusing capacity of nitric oxide is increased in patients with allergic rhinitis. International Archives of Allergy and Immunology 2008;148(2):154-60. 76. Malinovschi A, Janson C, Holm L, et al. Basal and induced NO formation in the pharyngo-oral tract influences estimates of alveolar NO levels. Journal of Applied Physiology 2009;106(2):513-9. 77. Malinovschi A, Henrohn D, Eriksson A, et al. Increased plasma and salivary nitrite and decreased bronchial contribution to exhaled NO in pulmonary arterial hypertension. European Journal of Clinical Investigation 2011;41(8):889-97. 78. Malmberg LP, Pelkonen AS, Malmström K, et al. Very low birth weight and respiratory outcome: association between airway inflammation and hyperresponsiveness. Annals of Allergy, Asthma & Immunology 2013;111(2):96-101. 57 79. Mandon J, Högman M, Merkus PJ, et al. Exhaled nitric oxide monitoring by quantum cascade laser: comparison with chemiluminescent and electrochemical sensors. Journal of Biomedical Optics 2012;17(1):017003. 80. Maniscalco M, Zedda A, Faraone S, et al. Low alveolar and bronchial nitric oxide in severe uncomplicated obesity. Obesity Research & Clinical Practice 2015;9(6):603-8. 81. Maniscalco M, Faraone S, Sofia M, et al. Extended analysis of exhaled and nasal nitric oxide for the evaluation of chronic cough. Respiratory Medicine 2015;109(8):970-4. 82. Matsumoto H, Niimi A, Jinnai M, et al. Association of alveolar nitric oxide levels with pulmonary function and its reversibility in stable asthma. Respiration 2010;81(4):311-7. 83. McCurdy MR, Sharafkhaneh A, Abdel-Monem H, et al. Exhaled nitric oxide parameters and functional capacity in chronic obstructive pulmonary disease. Journal of Breath Research 2011;5(1):016003. 84. Menzies D, Nair A, Meldrum KT, et al. Effect of aspirin on airway inflammation and pulmonary function in patients with persistent asthma. Journal of Allergy and Clinical Immunology 2008;121(5):1184-9. e4. 85. Modig L, Dahgam S, Olsson D, et al. Short-term exposure to ozone and levels of exhaled nitric oxide. Epidemiology 2014;25(1):79-87. 86. Nicolini G, Chetta A, Simonazzi A, et al. Both bronchial and alveolar exhaled nitric oxide are reduced with extrafine beclomethasone dipropionate in asthma. Presented at Allergy and Asthma Proceedings 2010. 87. Nihlberg K, Andersson-Sjöland A, Tufvesson E, et al. Altered matrix production in the distal airways of individuals with asthma. Thorax 2010;65(8):670-6. 88. Paraskakis E, Brindicci C, Fleming L, et al. Measurement of bronchial and alveolar nitric oxide production in normal children and children with asthma. American Journal of Respiratory and Critical Care Medicine 2006;174(3):260-7. 89. Paraskakis E, Zihlif N, Bush M. Nitric oxide production in PCD: possible evidence for differential nitric oxide synthase function. Pediatric Pulmonology 2007;42(10):876-80. 90. Paredi P, Kharitonov SA, Meah S, et al. A novel approach to partition central and peripheral airway nitric oxide. CHEST Journal 2014;145(1):113-9. 91. Pedroletti C, Högman M, Meriläinen P, et al. Nitric oxide airway diffusing capacity and mucosal concentration in asthmatic schoolchildren. Pediatric Research 2003;54(4):496- 501. 92. Prieto L, López V, Barato D, et al. Comparison of 2 Methods to Correct for Peripheral Nitric Oxide Exchange in the Lungs. Journal of Investigational Allergology and Clinical Immunology 2013;23(6):409-14. 93. Prieto L, Ruiz-Jimenez L, Marin J. The effect of spirometry on bronchial and alveolar nitric oxide in subjects with asthma. Journal of Asthma 2013;50(6):623-8. 94. Prieto L, Palop J, Llusar R, et al. Effects of cigarette smoke on methacholine-and AMP- induced air trapping in asthmatics. Journal of Asthma 2015;52(1):26-33. 95. Puckett JL, Taylor RW, Leu SY, et al. An elevated bronchodilator response predicts large airway inflammation in mild asthma. Pediatric Pulmonology 2010;45(2):174-81. 96. Puckett JL, Taylor RW, Galant SP, et al. Impact of analysis interval on the multiple exhalation flow technique to partition exhaled nitric oxide. Pediatric Pulmonology 2010;45(2):182-91. 58 97. Radhakrishnan DK, Bendiak GN, Mateos-Corral D, et al. Lower airway nitric oxide is increased in children with sickle cell disease. The Journal of Pediatrics 2012;160(1):93- 7. 98. Robroeks CM, van de Kant KD, van Vliet D, et al. Comparison of the anti-inflammatory effects of extra-fine hydrofluoroalkane-beclomethasone vs fluticasone dry powder inhaler on exhaled inflammatory markers in childhood asthma. Annals of Allergy, Asthma & Immunology 2008;100(6):601-7. 99. Robroeks CM, Van Vliet D, Hendriks HJ, et al. Feasibility of exhaled nitric oxide measurements at various flow rates in children with asthma. Pediatric Allergy and Immunology 2010;21(1‐Part‐II):e222-e8. 100. Rolla G, Brussino L, Scappaticci E, et al. Source of exhaled nitric oxide in primary biliary cirrhosis. CHEST Journal 2004;126(5):1546-51. 101. Rosa MJ, Divjan A, Hoepner L, et al. Fractional exhaled nitric oxide exchange parameters among 9‐year‐old inner‐city children. Pediatric Pulmonology 2011;46(1):83- 91. 102. Roy K, Borrill Z, Starkey C, et al. Use of different exhaled nitric oxide multiple flow rate models in COPD. European Respiratory Journal 2007;29(4):651-9. 103. Sanders SP, Proud D, Permutt S, et al. Role of nasal nitric oxide in the resolution of experimental rhinovirus infection. Journal of Allergy and Clinical Immunology 2004;113(4):697-702. 104. Sardón O, Corcuera P, Aldasoro A, et al. Alveolar nitric oxide and its role in pediatric asthma control assessment. BMC Pulmonary Medicine 2014;14(1):1. 105. Sauni R, Oksa P, Lehtimäki L, et al. Increased alveolar nitric oxide and systemic inflammation markers in silica-exposed workers. Occupational and Environmental Medicine 2012;69(4):256-60. 106. Scichilone N, Battaglia S, Braido F, et al. Exhaled nitric oxide is associated with cyclic changes in sexual hormones. Pulmonary Pharmacology & Therapeutics 2013;26(6):644- 8. 107. Scichilone N, Scalici V, Arrigo R, et al. Clinical and anti-inflammatory effects of ultra- short preseasonal vaccine to Parietari in asthma. Therapeutic Advances in Respiratory Disease 2013:1753465813476564. 108. Scichilone N, Battaglia S, Taormina S, et al. Alveolar nitric oxide and asthma control in mild untreated asthma. Journal of Allergy and Clinical Immunology 2013;131(6):1513-7. 109. Scichilone N, Braido F, Taormina S, et al. Is health-related quality of life associated with upper and lower airway inflammation in asthmatics? BioMed Research International 2013;2013. 110. Sehlstedt M, Dove R, Boman C, et al. Antioxidant airway responses following experimental exposure to wood smoke in man. Particle and Fibre Toxicology 2010;7(1):1. 111. Sepponen A, Lehtimäki L, Huhtala H, et al. Alveolar and bronchial nitric oxide output in healthy children. Pediatric Pulmonology 2008;43(12):1242-8. 112. Shelley DA, Puckett JL, George SC. Quantifying proximal and distal sources of NO in asthma using a multicompartment model. Journal of Applied Physiology 2010;108(4):821-9. 113. Shin H-W, Rose-Gottron CM, Perez F, et al. Flow-independent nitric oxide exchange parameters in healthy adults. Journal of Applied Physiology 2001;91(5):2173-81. 59 114. Shin H-W, Rose-Gottron CM, Sufi RS, et al. Flow-independent nitric oxide exchange parameters in cystic fibrosis. American Journal of Respiratory and Critical Care Medicine 2002;165(3):349-57. 115. Shin H-W, Rose-Gottron CM, Cooper DM, et al. Impact of high-intensity exercise on nitric oxide exchange in healthy adults. Medicine and Science in Sports and Exercise 2003;35(6):995-1003. 116. Shin H-W, Rose-Gottron CM, Cooper DM, et al. Airway diffusing capacity of nitric oxide and steroid therapy in asthma. Journal of Applied Physiology 2004;96(1):65-75. 117. Shinkai M, Suzuki S, Miyashita A, et al. Analysis of exhaled nitric oxide by the helium bolus method. CHEST Journal 2002;121(6):1847-52. 118. Shirai T, Mori K, Mikamo M, et al. Respiratory mechanics and peripheral airway inflammation and dysfunction in asthma. Clinical & Experimental Allergy 2013;43(5):521-6. 119. Shoemark A, Wilson R. Bronchial and peripheral airway nitric oxide in primary ciliary dyskinesia and bronchiectasis. Respiratory Medicine 2009;103(5):700-6. 120. Short PM, Williamson PA, Lipworth BJ. Effects of extra-fine inhaled and oral corticosteroids on alveolar nitric oxide in COPD. Lung 2012;190(4):395-401. 121. Shorter JH, Nelson DD, McManus JB, et al. Clinical study of multiple breath biomarkers of asthma and COPD (NO, CO2, CO and N2O) by infrared laser spectroscopy. Journal of Breath Research 2011;5(3):037108. 122. Spears M, Weir CJ, Smith AD, et al. Bronchial nitric oxide flux (J′ aw) is sensitive to oral corticosteroids in smokers with asthma. Respiratory Medicine 2011;105(12):1823-30. 123. ST. CROIX CM, Wetter TJ, Pegelow DF, et al. Assessment of nitric oxide formation during exercise. American Journal of Respiratory and Critical Care Medicine 1999;159(4):1125-33. 124. Suri R, Paraskakis E, Bush A. Alveolar, but not bronchial nitric oxide production is elevated in cystic fibrosis. Pediatric Pulmonology 2007;42(12):1215-21. 125. Tajiri T, Niimi A, Matsumoto H, et al. Comprehensive efficacy of omalizumab for severe refractory asthma: a time-series observational study. Annals of Allergy, Asthma & Immunology 2014;113(4):470-5. e2. 126. Thornadtsson A, Neerincx A, Högman M, et al. Extended nitric oxide analysis may improve personalized anti-inflammatory treatment in asthmatic children with intermediate FENO50. Journal of Breath Research 2015;9(4):047114. 127. Tiev KP, Le-Dong N-N, Duong-Quy S, et al. Exhaled nitric oxide, but not serum nitrite and nitrate, is a marker of interstitial lung disease in systemic sclerosis. Nitric Oxide 2009;20(3):200-6. 128. Tiev K, Coste J, Ziani M. Diagnostic value of exhaled nitric oxide to detect interstitial lung disease in systemic sclerosis. Sarcoidosis Vasculitis and Diffuse Lung Disease 2009;26(1):32-8. 129. Tiev KP, Hua-Huy T, Kettaneh A, et al. Alveolar concentration of nitric oxide predicts pulmonary function deterioration in scleroderma. Thorax 2012;67(2):157-63. 130. Tiev KP, Hua-Huy T, Rivière S, et al. High alveolar concentration of nitric oxide is associated with alveolitis in scleroderma. Nitric Oxide 2013;28:65-70. 131. Tiev KP, Rivière S, Hua-Huy T, et al. Exhaled NO predicts cyclophosphamide response in scleroderma-related lung disease. Nitric Oxide 2014;40:17-21. 60 132. Törnberg D, Marteus H, Schedin U, et al. Nasal and oral contribution to inhaled and exhaled nitric oxide: a study in tracheotomized patients. European Respiratory Journal 2002;19(5):859-64. 133. Tufvesson E, Svensson H, Ankerst J, et al. Increase of club cell (Clara) protein (CC16) in plasma and urine after exercise challenge in asthmatics and healthy controls, and correlations to exhaled breath temperature and exhaled nitric oxide. Respiratory Medicine 2013;107(11):1675-81. 134. Van Amsterdam J, Zanen P, Somer S, et al. Flow dependency and off‐line measurement of exhaled NO in children. Pediatric Allergy and Immunology 2003;14(4):266-71. 135. van de Kant KD, Paredi P, Meah S, et al. The effect of body weight on distal airway function and airway inflammation. Obesity Research & Clinical Practice 2015. 136. Van Muylem A, Kerckx Y, Michils A. Acinar effect of inhaled steroids evidenced by exhaled nitric oxide. Journal of Allergy and Clinical Immunology 2010;126(4):730-5. e2. 137. Volbeda F, Broekema M, Lodewijk ME, et al. Clinical control of asthma associates with measures of airway inflammation. Thorax 2012:thoraxjnl-2012-201861. 138. Walker W, Liew A, Harris A, et al. Upper and lower airway nitric oxide levels in primary ciliary dyskinesia, cystic fibrosis and asthma. Respiratory Medicine 2013;107(3):380-6. 139. Wiel E, Postma D, Molen T, et al. Effects of small airway dysfunction on the clinical expression of asthma: a focus on asthma symptoms and bronchial hyper‐responsiveness. Allergy 2014;69(12):1681-8. 140. Williamson PA, Menzies D, Nair A, et al. A proof-of-concept study to evaluate the anti- inflammatory effects of a novel soluble cyclodextrin formulation of nebulized budesonide in patients with mild to moderate asthma. Annals of Allergy, Asthma & Immunology 2009;102(2):161-7. 141. Williamson PA, Clearie K, Menzies D, et al. Assessment of small-airways disease using alveolar nitric oxide and impulse oscillometry in asthma and COPD. Lung 2011;189(2):121-9. 142. Williamson PA, Short PM, Vaidyanathan S, et al. Inhaled and systemic corticosteroid response in severe asthma assessed by alveolar nitric oxide: a randomized crossover pilot study of add‐on therapy. British Journal of Clinical Pharmacology 2013;75(1):93-102. 143. Wuttge D, Bozovic G, Hesselstrand R, et al. Increased alveolar nitric oxide in early systemic sclerosis. Clinical and Experimental Rheumatology 2010;28(5 Suppl 62):S5-9. 144. Yasui H, Fujisawa T, Inui N, et al. Impact of add-on pranlukast in stable asthma; the additive effect on peripheral airway inflammation. Respiratory Medicine 2012;106(4):508-14. 145. Zacharasiewicz A, Wilson N, Lex C, et al. Effect of inhalation times on exhaled NO. Pediatric Pulmonology 2004;38(4):335-8. 146. Zetterquist W, Marteus H, Johannesson M, et al. Exhaled carbon monoxide is not elevated in patients with asthma or cystic fibrosis. European Respiratory Journal 2002;20(1):92-9. 147. Zhao Y, Cui A, Wang F, et al. Characteristics of pulmonary inflammation in combined pulmonary fibrosis and emphysema. Chinese Medical Journal 2012;125(17):3015-21. 148. Tsoukias NM, Tannous Z, Wilson AF, et al. Single-exhalation profiles of NO and CO2 in humans: effect of dynamically changing flow rate. Journal of Applied Physiology 1998;85(2):642-52. 61 149. Pietropaoli AP, Perillo IB, Torres A, et al. Simultaneous measurement of nitric oxide production by conducting and alveolar airways of humans. Journal of Applied Physiology 1999;87(4):1532-42. 150. Silkoff PE, Sylvester JT, Zamel N, et al. Airway nitric oxide diffusion in asthma: Role in pulmonary function and bronchial responsiveness. American Journal of Respiratory and Critical Care Medicine 2000;161(4 Pt 1):1218-28. 151. Eckel SP, Linn WS, Berhane K, et al. Estimation of parameters in the two-compartment model for exhaled nitric oxide. PloS One 2014;9(1):e85471. 152. Högman M, Meriläinen P. Extended NO analysis in asthma. Journal of Breath Research 2007;1(2):024001. 153. Pronzato L, Pázman A. Design of experiments in nonlinear models. Lecture Notes in Statistics 2013;212. 154. Rao CR. Information and the accuracy attainable in the estimation of statistical parameters. Calcutta Mathematical Society 1945;37:81–9. 155. Cram'er H. Mathematical Methods of Statistics. NJ, USA: Princeton University Press; 1946. 156. Atkinson AC, Donev AN, Tobias R. Optimum experimental designs, with SAS. Oxford ; New York: Oxford University Press; 2007. 157. Fedorov VV, Leonov SL. Optimal design for nonlinear response models. Chapman & Hall/CRC Biostatistics Series. Boca Raton: Taylor & Francis, 2013:1 online resource (xxviii, 373 pages). 158. Ayyub BM, McCuen RH. Probability, statistics, and reliability for engineers and scientists. 3rd ed. Boca Raton, FL: CRC Press; 2011. 159. Högman M. Extended NO analysis in health and disease. Journal of Breath Research 2012;6(4):047103. 62 CHAPTER 2. TRANSFORM-BOTH-SIDES METHODS FOR NONLINEAR MIXED EFFECTS MODELS ABSTRACT For nonlinear mixed effects models, the maximum likelihood method for inference depends on assumptions of normality for the distributions of random effects and errors. In practice, nonlinear models are often applied to biological data (e.g., concentration measurements) that are right skewed. Hence modeling assumptions might not be satisfied due to skewed and/or heteroscedastic error distributions. A common solution is to transform both the response and the regression function, which preserves the theoretical relationship between the response and model parameters in these physical models. The transformation is commonly chosen a priori (e.g., natural log) but can also be estimated from the data using Box-Cox power transformation. Methods for estimating the transformation must overcome the lack of an analytical solution for the marginal likelihood in a nonlinear mixed effects model. Existing methods include: (1) a transform-both-sides nonlinear mixed effect model extension to Box-Cox estimation in linear models and (2) an approach estimating the transformation of only the response in an ANOVA model (applicable to only a subset of study designs). Both methods first identify the transformation parameter using grid search and then, conditionally, estimate the model parameters. We have developed a novel Bayesian Markov chain Monte Carlo unified (UNIIFIED method) approach to jointly estimate the transformation parameter and the model’s parameters. In a simulation study, we compared the performance and limitations of the new UNIIFIED method with existing methods. We also applied methods to data on exhaled nitric oxide measured at multiple exhalation flow rates in the Southern California Children’s Health 63 Study. In a second simulation study, we examined the impacts of transformation misspecification. We observed that the UNIIFIED method fit a wide range of study designs, had unbiased transformation parameter estimates, good convergence and reasonable run time. Misspecifying the transformation parameter biased some of the random effects variance parameters and increased standard error of some fixed effects. We observed that—compared to the estimated transformation parameter confidence intervals—there is a wider range of transformations with reasonable performance. Models estimated with the UNIFIED method had good performance and can be applied to all study designs, so we recommend the UNIFIED method to estimate the data driven transformation parameter in nonlinear mixed effects models. 64 INTRODUCTION The nonlinear model is a statistical regression approach that combines a theoretical, mechanistic- driven mean function and an additive error. Nonlinear models are popular in the field of pharmacokinetics. The theoretical model often uses anatomy, physiology, and biochemistry as the basis for differential equations describing the dynamics of a substance of interest throughout body compartments. The model’s parameters represent physically/biologically meaningful quantities. Nonlinear mixed effects models pool information across single experimental units (e.g., study participant) to gain power in estimating population mean parameters (fixed effects) while accounting for the variation between units or subjects (random effects). The maximum likelihood method for inference on fixed effects model parameters assumes normality and equal variance of the error term. Violations of these assumptions may lead to incorrect inference. In linear models, a power transformation of the outcome can be used to remove skewness and heteroscedasticity. The 𝜆 is often chosen with the Box-Cox method.(1) In nonlinear models, the scientific meaning of the parameters will be altered if only the outcome is transformed. Transforming both the outcome and the mean function preserves the physiological relationship between the outcome and model parameters on the original scale.(2, 3) The Box-Cox method can be extended to a transform-both-sides (TBS) scenario. However, TBS increases the complexity of model fitting compared with transforming only the outcome. Several methods have been proposed previously, but they are not readily implemented with off-the-shelf software. Hence, transformation is commonly chosen a-priori, e.g., natural log transformation.(4) Our work is motivated by data on exhaled nitric oxide (FeNO) from the Southern California Children's Health Study (CHS). The CHS is a large-scale longitudinal cohort designed to study the chronic effects of air pollution on children’s respiratory health. FeNO is a biomarker 65 of airway inflammation in the lower respiratory tract (5, 6) that has uses, amongst others, in monitoring asthma and revealing associations between air pollution and health effects. The nonlinear association between exhalation flow rate (𝑉 ̇ , ml/s) and FeNO (ppb) is described in a closed form solution to the steady-state two-compartment model (7) (FeNO model). 𝐹𝑒𝑁𝑂 = + 𝐶 − exp − ̇ (1) where the so-called “NO parameters” include the maximum airway flux (J′ aw), airway tissue diffusing capacity (Daw), and alveolar NO concentration (CA). The model describes a simple approximation of NO dynamics in the airway and alveolar compartments assuming a cylinder- shaped airway.(8) For the nonlinear FeNO model, it was previously demonstrated that an a priori log TBS approach produced unbiased parameters estimates and satisfied assumptions of normality and equal variance.(9) In this study, we focus on data-driven methods for determining the transformation in nonlinear mixed effect models. First, we introduce two existing methods and a novel unified Bayesian Markov chain Monte Carlo method (denoted UNIFIED) that simultaneously estimates transformation and model parameters. In simulation studies, we compare the three methods. We apply all three methods to FeNO data in the CHS (CHS data). Second, in a simulation study we investigated the impact of transformation misspecification on nonlinear mixed effect model parameters estimates (e.g., in the case of a misspecified a priori transformation or no transformation). We validate our results with simulations using another previously published model which is distinct from the FeNO model. 66 METHODS Box-Cox transformation for linear model In a linear model, the relationship between an outcome vector 𝒚 , mean regression function 𝑓 (. ) and a vector of additive error 𝜺 is described by: ℎ(𝒚 , 𝜆 ) = 𝑓 (𝑿 , 𝜷 ) + 𝜺 (2) where 𝑿 is a design matrix of explanatory variables, 𝜷 is a vector of regression parameters and the association between variables and outcome is assumed to be linear in 𝜷 . The transformed outcome, ℎ(𝒚 , 𝜆 ) is defined by the Box–Cox family of monotonic transformations(1) as: h(𝒚 , 𝜆 ) = 𝒚 − 1 𝜆 ⁄ 𝑓𝑜𝑟 𝜆 ≠ 0 log(𝒚 ) 𝑓𝑜𝑟 𝜆 = 0 (3) where 𝜆 is the transformation parameter and 𝒚 > 0. Log transformation (𝜆 = 0) and no transformation (𝜆 = 1) are special cases. On the transformed scale, the errors (𝜺 ) are assumed to be normally distributed with mean zero and constant variance 𝜎 across X levels, i.e. they satisfy the normality and homoscedastic assumptions. A standard method to find 𝜆 is to use a grid search over a range of k=1,…,K fixed values of 𝜆 (e.g. −2 ≤ 𝜆 ( ) ≤ 2). The change of variables rule is used to compare log-likelihoods for a given 𝜆 ( ) on the original 𝑦 scale 𝑓 (𝒚 ) = 𝑓 ( , ) ℎ 𝒚 , 𝜆 ( ) ℎ 𝒚 , 𝜆 ( ) (4) where ℎ 𝑦 , 𝜆 ( ) is the Jacobian, ℎ 𝒚 , 𝜆 ( ) = 𝑑 𝑑𝑦 ℎ 𝒚 , 𝜆 ( ) = 𝒚 ( ) 𝑓𝑜𝑟 𝜆 ( ) ≠ 0 1 𝒚 𝑓𝑜𝑟 𝜆 ( ) = 0 The likelihood 𝐿 is equal to: 67 𝐿 𝜷 , 𝜎 , 𝜆 ( ) = ( ) / 𝑒𝑥𝑝 − ℎ 𝒚 , 𝜆 ( ) − 𝑿𝜷 ℎ 𝒚 , 𝜆 ( ) − 𝑿𝜷 ∏ ℎ 𝑦 , 𝜆 ( ) (5) and the log-likelihood 𝑙 is: 𝑙 𝜷 , 𝜎 , 𝜆 ( ) = − 𝑙𝑜𝑔 (2𝜋 ) − 𝑙𝑜𝑔 (𝜎 ) − ℎ 𝒚 , 𝜆 ( ) − 𝑋𝛽 ℎ 𝒚 , 𝜆 ( ) − 𝑿𝜷 + ∑ 𝑙𝑜𝑔 ℎ 𝑦 , 𝜆 ( ) (6) The maximum log-likelihood (estimates are marked with ) 𝑙 𝜷 𝜆 ( ) , 𝜎 𝜆 ( ) , 𝜆 ( ) can be plotted versus 𝜆 ( ) . The 𝜆 is selected as the 𝜆 ( ) that maximizes the log-likelihood. For inference, it is common to use a profile likelihood approach to build a confidence interval. Specifically, note that for a fixed value 𝜆 , the log-likelihood ratio test (LRT) compares 𝐻 : 𝜆 = 𝜆 vs. 𝐻 : 𝜆 ≠ 𝜆 , by 𝐿𝑅𝑇 = 2 𝑙 𝜆 − 𝑙 (𝜆 ) (7) The LRT approximately follows a 𝜒 ,( ) distribution. By inverting this test, we can find values of 𝜆 that lie in the non-rejection region to construct a LRT 100(1 − 𝛼 )% confidence interval (CI) for 𝜆 : 𝜆 : 𝜆 > 𝑙 𝜆 − 𝜒 ,( ) (8) Hence, for the 95% CI of 𝜆 we can simply add a horizontal line to the plot at the log-likelihood value, 𝑙 𝜆 − 1.92. The LRT CI for 𝜆 is defined by the intersection between the log- likelihood curve and the horizontal line. Transform both sides (TBS) for the nonlinear mixed effect model In a TBS nonlinear mixed effect model, the relationship between 𝒚 , mean regression function 𝑓 (. ) and additive error 𝜺 is described by ℎ(𝒚 𝒊 , 𝜆 ) = ℎ(𝑓 (𝑿 𝒊 , 𝜷 𝒊 ), 𝜆 ) + 𝜺 𝒊 (9) 𝜷 𝒊 = 𝑨 𝒊 𝜷 + 𝒃 𝒊 68 where 𝒚 𝒊 = 𝑦 , … , 𝑦 is the 𝑗 -th outcome (𝑗 = 1, … , 𝑛 ) for the 𝑖 -th individual (𝑖 = 1, … , 𝑚 ) with design matrix 𝑿 𝒊 of explanatory variables. Here, the association between variables and outcome is nonlinear in the (𝑝 × 1) vector of participant-specific parameters 𝜷 . These individual parameters depend on fixed effect, 𝜷 and on 𝒃 𝒊 , a vector (𝑚 × 1) random effects. The random effects are assumed to come from a multivariate normal distribution with a mean vector 𝟎 and a (𝑝 × 𝑝 ) variance-covariance matrix D. Here, we will assume that the covariate information summarized in a design matrix 𝑨 𝒊 is equal to the identity matrix 𝑰 × . After applying the transformation ℎ(. ) on both the outcome and the mean regression function using 𝜆 (TBS parameter), we assume that errors 𝜺 𝒊 = 𝜀 , … , 𝜀 are independent and normally distributed with mean 0 and variance 𝜎 . Maximum likelihood estimation methods for mixed effect models are based on the marginal density of 𝒚 obtained by integrating out the unobserved random effects. 𝑝 (𝒚 𝒊 ) = ∫ 𝑝 (𝒚 𝒊 |𝒃 𝒊 ) 𝑝 (𝒃 𝒊 )𝑑𝒃 𝒊 (10) For TBS nonlinear mixed effect model 𝑝 (𝒚 𝒊 |𝒃 𝒊 ), the conditional density of 𝒚 given the random effects 𝒃 , and 𝑝 (𝑏 ), the marginal distribution of 𝒃 are defined as 𝑝 (𝒚 𝒊 |𝒃 𝒊 ) = (2𝜋𝜎 ) / 𝑒𝑥𝑝 − {ℎ(𝒚 𝒊 , 𝜆 ) − ℎ(𝒇 𝒊 , 𝜆 )} {ℎ(𝒚 𝒊 , 𝜆 ) − ℎ(𝒇 𝒊 , 𝜆 )} ∏ ℎ 𝑦 , 𝜆 (11) and 𝑝 (𝒃 𝒊 ) = |𝑫 | / (2𝜋 ) / 𝑒𝑥𝑝 − 𝑏 𝐷 𝑏 (12) with 𝑓 = 𝑓 (𝒙 𝒊𝟏 , 𝜷 𝒊 ), … , 𝑓 𝒙 𝒊𝒏 𝒊 , 𝜷 𝒊 . After plugging equations 11 and 12 into equation 10, the marginal distribution of 𝑦 is: 𝑝 (𝒚 𝒊 ) = (2𝜋𝜎 ) / ∏ ℎ 𝑦 , 𝜆 |𝑫 | / (2𝜋 ) / (13) 69 × 𝑒𝑥𝑝 − 1 2𝜎 {ℎ(𝒚 𝒊 , 𝜆 ) − ℎ(𝒇 𝒊 , 𝜆 )} {ℎ(𝒚 𝒊 , 𝜆 ) − ℎ(𝒇 𝒊 , 𝜆 )} − 1 2 𝒃 𝒊 𝑫 𝒃 𝒊 𝑑𝒃 𝒊 Because 𝑓 is nonlinear in the random effects, the marginal log-likelihood function (the integral in Equation 13), does not have a closed-form solution. Here we use the nlme R function to estimate an approximation to the marginal log likelihood (Equation 13).(10) The algorithm alternates until convergence between (1) a penalized nonlinear fixed effects least squares step and (2) maximum likelihood of linear mixed effects model step using a first-order Taylor expansion and Newton-Raphson algorithm.(11, 12) NLME, together with the Jacobian for change of variables rule (Equation 4) can be used to estimate the log-likelihood for a given 𝜆 ( ) . So, we can extend the grid search method from the linear model to the nonlinear mixed effect model (NLME-GS) and use the same approach to estimate 𝜆 and 95% LRT CI. ANOVA 2-step grid search method Latif et al(13) observed that the purpose of TBS is to improve the distribution of the error term and hence the transformation can be estimated considering the nonlinear mean function a nuisance. This can be done by replacing the nonlinear mixed effects model with an analysis of variance (ANOVA)-based model that estimates a different mean for each unique combination of study conditions (e.g., flow rate) and experimental units (e.g., participant). In the first step, 𝜆 is estimated using standard Box-Cox transformation for linear model on a full ANOVA model with (only) a transformed outcome: ℎ(𝑦 , 𝜆 ) = 𝜇 + 𝛿 (14) where 𝜇 is the mean of the transformed outcome corresponding to each combination of study conditions (𝑣 ) and experimental unit (𝑖 ). Note a change in notation, before we used 𝑗 = 1, … , 𝑛 to index a set of observations for subject 𝑖 . The ANOVA method requires a protocol of 𝑅 > 1 70 replicates at 𝑉 study conditions (with 𝑟 = 1, … , 𝑅 and 𝑣 = 1, … , 𝑉 ). If all subjects follow the same protocol, then 𝑛 = 𝑉𝑅 . The additive errors, 𝛿 , are assumed to be independent normally distributed with variance 𝜏 . If the nonlinear mixed effect model fits the data well, the distribution of 𝜀 (Equation 9) will be similar to the distribution of 𝛿 .(13) Hence, we can avoid convergence issues and find 𝜆 (and 95% LRT CI) using the standard Box-Cox grid search method/software to a saturated ANOVA model. In the second step, the nonlinear mixed effect model is estimated treating the 𝜆 as known. We will refer to this approach as ANOVA 2-step grid search (ANOVA-2GS) method. Unified method In our novel approach, rather than fixing the TBS parameter in a grid search, we propose a unified (UNIFIED method) Bayesian approach to estimate 𝜆 jointly with the other parameters of the nonlinear mixed effect model. In the Bayesian model, inference is made from a posterior distribution of parameters which is a combination of the likelihood of the observed data and the prior distributions of the parameters.(14) As opposed to the maximum likelihood approach that integrates out random effects and works with the marginal likelihood, the Bayesian framework treats the random effects as parameters to be estimated and works with the joint likelihood. A Bayesian Markov chain Monte Carlo (MCMC) algorithm is used to generate samples of parameters from the posterior distribution. The algorithm creates a chain of estimates by proposing new samples of parameters and then accepting or rejecting the proposal with some decision rule. This process is repeated until the chains of accepted proposals reach a stationary state. Our model is coded using the R interface for the Stan probabilistic modelling language.(15, 16) Stan uses Hamiltonian Monte Carlo to generate samples of parameters. The Hamiltonian 71 dynamics’ physical system allows for transitions to (almost) anywhere in the posterior (for further details Monnahan et al, (17)), minimizing the inefficient random walk which reduces autocorrelation and improves efficiency and running time.(15) To compare the Bayesian UNIFIED approach to the other frequentist methods, we define the mean of the posterior distribution to be the transformation parameter estimate and the standard deviation of the posterior distribution to be its standard error. The 95% credible interval for a parameter will be constructed by taking the 2.5 and 97.5 percentiles of the posterior distribution. Henceforth, the term CI will refer to both LRT confidence intervals and credible intervals. The MCMC process is usually repeated in several independent chains to assess mixing. The Gelman–Rubin diagnostic test (𝑅 ) assesses convergence of the chains by comparing the estimated between-chain and within-chain variances for each model parameter.(18, 19). Good mixing (convergence) is usually defined as 𝑅 ≤ 1.1.(14) CHS data and data simulation We used FeNO data collected between March and June 2010 as part of the most recent CHS cohort. Multiple exhalation flow rate FeNO maneuvers was collected in 1640 middle-school students (ages 12–15). Children were asked to perform 9 FeNO maneuvers at the following target flow rates: 50 ml/s (3 replicates) and 30, 100, and 300 ml/s (2 replicates each).(20, 21) As described previously (9), we considered data from 1507 children with ≥1 valid maneuver at each of the 4 target flow rates, for a total of 13,614 FeNO maneuvers. Participants did not always produce data that satisfied the number of replicates per target flow rate specified in the protocol. For example, 250 (16.7%) of the participants had one or more flow rates with only one maneuver 72 (no replicates). Actual flow rates differ from target flow rates especially for the highest flow rate of 300 ml/s (Figure S2.1). In simulation studies, a vector of theoretical FeNO for subject 𝑖 (𝒇 𝒊 ) were calculated from the FeNO model (Equation 1). For each simulation, 100 CHS data participants were sampled with replacement, and their vector of observed CHS (actual) flow rates (𝑽 ̇ 𝒊 ) were used for data generation. Due to the skewed distributions observed in practice, J’ aw and Daw were log transformed in 𝒇 𝒊 , with 𝛽 = (𝐶 , log𝐽 , 𝑙𝑜𝑔𝐷 ) and 𝒇 𝒊 defined as 𝒇 𝒊 = 𝑙𝑜𝑔𝐽 𝑙𝑜𝑔𝐷 + 𝐶 − 𝑙𝑜𝑔𝐽 𝑙𝑜𝑔𝐷 𝑒𝑥𝑝 − 𝑙𝑜𝑔𝐷 𝑽 ̇ 𝒊 Using quantities similar to those observed in the CHS data, participant-specific parameters were simulated as follows 𝜷 𝒊 ~𝑀𝑉𝑁 (𝜷 , 𝑫 ) 𝜷 = (1.2,6.7,2.7) 𝑫 = 0.4 0 −0.1 0 0.7 0.2 −0.1 0.2 0.3 Observed FeNO was calculated after adding an error, 𝜺 𝒊 ~𝑁 (0, 0.1 ) on the scale of 𝜆 : 𝑭𝒆𝑵𝑶 𝒊 = ℎ (ℎ(𝒇 𝒊 , 𝜆 ) + 𝜺 𝒊 ) were ℎ (. ) is the inverse transformation of 𝜆 . We set 𝜆 =0.2 for comparability with the (rounded) value observed in the CHS data. Overall, we simulated 1,000 such datasets. Transformation estimation We compared NLME-GS, ANOVA-2GS, and UNIFIED methods using simulated data by the FENO model and real FeNO data from the CHS. We recorded 𝜆 and the corresponding 95% CI for each method. Given 𝜆 ( ) , convergence of the nonlinear mixed effect model is not guaranteed 73 so there could be missing sections in the profile likelihood. NLME-GS failed to estimate 𝜆 if there were missing sections within the LRT CI. For the ANOVA-2GS, we applied a grid search on a linear model with main effects and interactions for all combinations of participants and target flow rates. Note that because the ANOVA-2GS require replicates of study conditions we used target flow rates (30, 50, 100 and 300 ml/s) instead the actual flow rates used to simulate the data. Originally, we had planned to use a grid between -2 and 2 (with increments of 0.01) for both the NLME-GS and ANOVA-2GS methods. Due to the slow running time for NLME-GS, we used a restricted grid search for this method (-0.3 to 0.8). For both grid search methods, we verified that the CI limits did not approach the grid search boundaries. For the UNIFIED method we allowed for 3 chains, each with total 600 iterations (300 iterations were for burn-in) and used randomly sampled initial values. Note that RStan requires many fewer iterations to reach convergence compared to other MCMC methods. The prior distributions were: 𝜷 𝒊 ~𝑀𝑉𝑁 (𝜷 , 𝑫 ) 𝜷 ~𝑢𝑛𝑖𝑓𝑜𝑟𝑚 (0,100) 𝑫 ~inv_wishart 3, 0.1 0 0 0 0.1 0 0 0 0.1 𝜎 ~𝑖𝑛𝑣 _𝑔𝑎𝑚𝑚𝑎 (.001, .001) 𝜆 ~𝑢𝑛𝑖𝑓𝑜𝑟𝑚 (−2,2) For the UNIFIED method, we defined convergence failure as 𝑅 ≥1.1. Transformation misspecification We examined the impact of transformation misspecification on model parameter estimates in the same simulated data as for transformation estimation. On each simulated dataset, we applied a 74 sequence of predefined 𝜆 ’s ranging from 0 to 1, with 𝜆 included. We used a simplified version of the UNIFIED estimation framework, without 𝜆 estimation, to estimate fixed effect parameters and elements of the random effects variance-covariance matrix (𝑫 ). We define the mean of the posterior distribution to be the parameters estimates and the standard deviation of the posterior distribution to be its standard errors. The distribution of the parameter estimated is presented in boxplots. Relative bias was defined as the difference between estimate and the true parameter divided by the true parameter. Relative bias will provide us with a scale for the magnitude of possible bias. All analysis was performed using R version 3.5.2.(22) RESULTS Transformation estimation- simulation study In the simulation study, the NLME-GS method failed to estimate 𝜆 in 8% of the simulated datasets. There were no convergence issues for the ANOVA-2GS and the UNIFIED methods. TBS parameter estimates for the NLME-GS and the UNIFIED methods were unbiased with similar distribution. The ANOVA-2GS method estimates were biased towards 0 (Figure 2.1.a). Both NLME-GS and UNIFIED methods had high coverage of the 𝜆 (94.8% and 95.6%, respectively). For both methods, 95% CI length was similar with minimal lower bound of 0.04 and maximal upper bound of 0.35. Run time was very fast (few seconds) for the ANOVA-2GS method (Figure 2.1.b). Even though a severely restricted grid search was conducted for NLME- GS, its run time was slower (median run time= 14.3 min) compared to the NLME-GS method (median run time= 7.6 min). The NLME-GS slow run time is due to increased lack of convergence of nonlinear mixed effect model as 𝜆 ( ) moves away from 𝜆 =0.2 (Figure S2.2). 75 Transformation misspecification Transformation misspecification did not produce bias in the estimation of (population parameter) fixed effects, but it did impact standard errors and random effect variances. As the magnitude of 𝜆 increased, the standard error for C A and logDaw increased (Figure 2.2.b), and the variance of the random effect for C A decreased (Figure 2.2.2.c). An unclear trend was observed for the covariance between CA and logDaw (Figure 2.2.d). Generally applying 𝜆 =0.2 had minimal bias compared to using larger transformation. Even though 0 was not included in the 95% CI for the transformation estimate, in practice, a 0 transformation (i.e., natural log) demonstrated similar and even better ability to estimate model parameters compare to 𝜆 =0.2. In addition, rates of convergence failure for 𝜆 =0, 0.2, and 0.4 were lower (1.2-3.5%) compared to 𝜆 =0.6, 0.8 and 1 (11-13%). We tested the impact of transformation misspecification, under a previously published Argatroban model(4) with two model parameters: logCl and logV (see ‘The Argatroban model’ in supplements). Similar to the FeNO model, for the Argatroban model misspecification of the transformation did not bias the fixed effects estimation, but standard errors and random effect variances were impacted. As the magnitude of the misspecified 𝜆 decreased, the standard error for logV (Figure S2.3.b) and logV random effects variance (Figure S2.3.c) increased and the covariance between logCl and logV decreased (Figure S2.3.d). This pattern was different than for the FeNO model. CHS data application When applying a TBS nonlinear mixed effect model to CHS data with 𝜆 =1 (i.e., no transformation), standardized residuals did not follow a normal distribution and were heavily 76 dependent on flow rate (Figure 2.3.a and 2.3.b, right panel). The NLME-GS and the UNIFIED methods applied to the CHS data gave similar (given the limitation of grid search resolution) 𝜆 estimates of 0.19 (95% CI: 0.17-0.20), for both methods. The ANOVA-2GS estimate was lower, 0.14 (95% CI: 0.13-0.15). When applying transformations of 𝜆 =0.2 or 𝜆 = 0 to the CHS data, standardized residuals much better satisfied assumptions of normality and homoscedasticity (Figure 2.3). DISCUSSION Violations of the normality or homoscedasticity assumptions may lead to incorrect inference in nonlinear mixed effects models. Transformation can be chosen a priori or estimated from the data using a TBS extension of the Box-Cox method. We demonstrate that, compared to other approaches, our novel Bayesian UNIFIED approach has reasonable run time, good convergence rates and produces unbiased estimates of 𝜆 . In this study, we presented the first careful consideration of the impact of transformation misspecification on model parameter estimates for nonlinear mixed effects models. Transformation misspecification resulted in biased estimates for some of the random effects variance parameters and wide standard errors for some of the fixed effects. However, there is a wider range (wider than suggested by the CI for 𝜆 ) of transformations with similar estimation performance. The NLME-GS method is a standard application of Box-Cox grid search for an approximating to the marginal log-likelihood of a nonlinear mixed effect model. It has no restrictions on the number of replicates at each study condition and can handle continuously varying study conditions (e.g., flow rates). For the FeNO model when 𝜆 ( ) was far from 𝜆 , we observed difficulty in reaching convergence which made the NLME-GS method slow and 77 computationally burdensome. Convergence problems motivated us to use a restricted grid search range (-0.3 to 0.8) compared to our original plan (-2 to 2). For a specific nonlinear mixed effect model application, we can improve convergence rates by trying different stating values, adding iterations and changing the convergence criterion.(11) However, these solutions are tailored to a specific data/model and can’t be applied in a grid search process where dozens/hundreds (in our case K=111) of models are estimated. Refinements/extensions of the basic NLME-GS presented here have been previously published, including: using grid search to find an initial estimates of 𝜆 and then improving this estimate using (a) Laplace's approximation of the marginal log- likelihood (4) or (b) an iterative gradient based process.(23) Because these methods are based on the NLME-GS approach they will share same issues discussed above. In addition, if the grid search is fine enough then any improvements in estimating 𝜆 are expected to be small (e.g., λ=0.5 after grid search stage and λ=0.5 after improvement (4)). The ANOVA-2GS method is a simple, intuitive approach. Because it is based on a linear model with a closed form solution to maximize the log-likelihood, there are no convergence issues and run time is very fast. Previously, this method has been demonstrated to have excellent performance for estimation of λ.(13) However, this method can’t be applied in scenarios where study conditions are not replicated (e.g., Argatroban model in Supplements). The FeNO observed in the CHS data represent an intermediate scenario where the study protocol has replicated target flow rates but in practice target flow rates were not necessarily reached and the actual flow rate achieved at each maneuver was recorded. For the FeNO model, we see that the ANOVA-2GS method produced biased λ estimates. Although the UNIFIED method was slower than the ANOVA-2GS method, it had unbiased estimates and good convergence rates. The UNIFIED method estimates λ jointly with 78 other model parameters. This method considers all transformations within the range defined in the prior distribution and doesn’t use a discretized grid search. We note that based on results for simulation study and CHS data (putting aside run time and convergence issues) the transformation estimates and CI from the NLME-GS and UNIFIED methods were similar (with similar CI), indicating that in this example joint estimation did not produce different results than an NLME-GS approach. In this study, we introduced a careful consideration of the impact of transformation misspecification on model parameter estimates for nonlinear mixed effects models. Previous publications typically compared model performance of a transformed model using 𝜆 with an a priori chosen transformation of 1(23), 0 (4), or both.(13) In Daimon et al (23), instead of assuming some 𝜆 they used an error structure that depends on the mean function to simulate data. After estimating λ, they observed that in the transformed model it was possible to obtain more precise model parameter estimates compared to an untransformed model. In their simulated data they demonstrated that heteroscedasticity of the observations can be remarkably improved with transformation, except for the poor fits at the extremes. Normality was not reached after transformation, but there was an improvement after transformation. Latif et al (13) mentioned that a data-based transformation estimation was preferable to an a priori 0 transformation in satisfying normality assumption. However, in practice the Q–Q normal plots after applying a 0 transformation and the data-based estimated transformation appeared very similar. Here, we apply a sequence of transformations, ranged from 0 to 1 (𝜆 included) in simulated data by the FeNO and the Argatroban models. This allow us to view trends in model parameters bias and to identify transformations with similar performance. Overall in the FeNO and Argatroban models, misspecification of transformation resulted in unbiased fixed effects, 79 biased estimates for some of the random effects variance parameters and large standard errors for some of the fixed effects. For data simulated by the FeNO model, misspecified transformations that were far from 𝜆 had convergence issues. Increasing/decreasing transformation size might have a different effect on different model components. In the same model, some transformations can reduce bias/improve accuracy for some parameters but not for others. The estimated transformation is probably the transformation that struck a balance between opposite effects. In practice we observed that the range of transformations with similar estimation performance exceeds that range suggested by estimated CI. Oberg et al,(4) demonstrated that when 0 transformation is used, instead of 𝜆 =0.5 the estimates of some model parameters were worsened. Here, for the same simulation setting, we see that transformations between 0.5 and 1 have similar estimation performance. In data simulated under the FeNO model, the estimated transformation (and 𝜆 ) was 0.2, but transformation of 0 had very similar and at times improved estimation performance. Moreover, in the CHS data, both transformations of 0 and 0.2 improved normality and homoscedasticity compared to a transformation of 1 (in a similar manner). In the FeNO model, we observed that the C A random effect variance was under estimated when we increased the pre-defined transformation (compared to the true value). Estimating C A is one of the primary goals of multiple flow FeNO analysis.(9) In nonlinear mixed effect models, in addition to estimation of fixed effects, we are also interested in identifying differences and variations among individuals, and to detect associations between individual estimates and some determinants. A common approach is to derive participant-specific parameters from a nonlinear mixed effect model and then to fit a second linear model that estimates the association between these participant-specific parameters and potential determinant. For example, a goal is to 80 estimate participant-specific CA and its association with air pollution exposure. An under- or over-estimation of the random effects variance could interfere with the ability to detect such associations. In addition, because the 0 transformation was previously suggested for the (participant-specific) nonlinear FeNO model(9) we had a specific interest in examining whether this approach extends to the (population-level) nonlinear mixed effect model. We demonstrate that for the FeNO model, a 0 transformation had very similar and at times, improved estimation performance to the estimated transformation of 0.2. This study has several limitations. We recommend using an MCMC method implemented using rstan. Working with the rstan package requires additional software installation beyond base R and usual R package installation. Additionally, in rstan the Bayesian model is written in a “compiled language” using a language other than R, which requires a short introduction. The MCMC algorithm can take time to reach a stationary state and convergence is not guaranteed. In practice, for the FeNO model we saw that compared to the NLME-GS, the UNIFIED method had good convergence rates and reasonable run time. This study has several strengths. The UNIFIED method had good convergence rates, reasonable run time, was flexible and does not require replicates of study conditions. Compared to the NLME-GS method the UNIFIED method had similar reasonable run time and good convergence rates. Compared to the ANOVA-2GS the UNIFIED is more flexible and does not require replicates of study conditions with had unbiased estimates of transformations scale. The CHS data is one of the largest datasets of multiple flow FeNO data. Previous work with this data (9, 24) used fixed target flow rates as defined in the study sample design protocol. For current analysis we sampled actual flow rates observed in the CHS data. We believe this simulation scheme better represent the actual exhalation flow variation in the population. We also validated 81 our FeNO model results for estimating the impact of transformation misclassification in another scenario, the Argatroban model. In summary, transformation can help improve normality and equal variance assumptions. We demonstrate that transformation misspecification affects the estimation of only some of the model parameters. Furthermore, there is a wide range (wider than the range suggested by CI) of 𝜆 ’s with similar estimation performance. However, without a theoretical explanation, the only way of learning which parameters are affected or what is the range of transformation with similar performance is to run an extensive simulation study. In practice, the researcher usually debates between using transformation of 1, 0 or to estimate the transformation from data. Here we present one example for which 0 transformation is reasonable and one for which 1 transformation is reasonable. In both cases, the estimated transformation had good estimation performance with no need of further investigation. Hence, our recommendation is to use the data to estimate transformation. From the models reviewed in this study the main gain from applying an appropriate transformation is unbiased estimates of random effects variances and more accurate fixed effects estimates. If the application does not have replicates of the study conditions, we recommend the UNIFIED method for transformation estimation. This method estimates transformation jointly with other model parameters and is unbiased with reasonable run time. The results of this study can be generalized to other nonlinear mixed effects models beyond the FeNO model. 82 REFERENCES 1. Box GE, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society Series B (Methodological) 1964:211-52. 2. Carroll RJ, Ruppert D. Power transformations when fitting theoretical models to data. Journal of the American Statistical Association 1984;79(386):321-8. 3. Snee RD. An alternative approach to fitting models when re-expression of the response is useful. Journal of Quality Technology 1986;18(4):211-25. 4. Oberg A, Davidian M. Estimating data transformations in nonlinear mixed effects models. Biometrics 2000;56(1):65-72. 5. Bucca C, Cicolin A, Guida G, et al. Exhaled nitric oxide (FENO) in non-pulmonary diseases. Journal of Breath Research 2012;6(2):027104. 6. Kharitonov S, Gonio F, Kelly C, et al. Reproducibility of exhaled nitric oxide measurements in healthy and asthmatic adults and children. European Respiratory Journal 2003;21(3):433-8. 7. George SC, Hogman M, Permutt S, et al. Modeling pulmonary nitric oxide exchange. Journal of Applied Physiology 2004;96(3):831-9. 8. Högman M. Extended NO analysis in health and disease. Journal of Breath Research 2012;6(4):047103. 9. Berhane K, Zhang Y, Salam MT, et al. Longitudinal effects of air pollution on exhaled nitric oxide: the Children's Health Study. Occupational and Environmental Medicine 2014;71(7):507-13. 10. Pinheiro J, Bates D, DebRoy S, et al. nlme: Linear and nonlinear mixed effects models. R Package Version 2013;3(1):111. 83 11. Pinheiro J, Bates D. Mixed-effects models in S and S-PLUS. Springer Science & Business Media; 2006. 12. Lindstrom MJ, Bates DM. Nonlinear mixed effects models for repeated measures data. Biometrics 1990:673-87. 13. Latif AM, Gilmour SG. Transform-both-sides nonlinear models for in vitro pharmacokinetic experiments. Statistical Methods in Medical Research 2014:0962280214544017. 14. Gelman A, Stern HS, Carlin JB, et al. Bayesian data analysis. Chapman and Hall/CRC; 2013. 15. Gelman A, Lee D, Guo J. Stan: A probabilistic programming language for Bayesian inference and optimization. Journal of Educational and Behavioral Statistics 2015;40(5):530-43. 16. Carpenter B, Gelman A, Hoffman MD, et al. Stan: A probabilistic programming language. Journal of Statistical Software 2017;76(1). 17. Monnahan CC, Thorson JT, Branch TA. Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods in Ecology and Evolution 2017;8(3):339-48. 18. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 1998;7(4):434-55. 19. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science 1992;7(4):457-72. 20. Linn WS, Rappaport EB, Eckel SP, et al. Multiple‐flow exhaled nitric oxide, allergy, and asthma in a population of older children. Pediatric Pulmonology 2013;48(9):885-96. 84 21. Linn WS, Rappaport EB, Berhane KT, et al. Extended exhaled nitric oxide analysis in field surveys of schoolchildren: a pilot test. Pediatric Pulmonology 2009;44(10):1033-42. 22. Team RC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2018. 23. Daimon T, Goto M. Power-transformation mixed effects model with applications to pharmacokinetics (Statistical Models for Biomedical Research). Journal of the Japanese Society of Computational Statistics 2003;15(2):135-50. 24. Molshatski N, Eckel SP. Optimal flow rate sampling designs for studies with extended exhaled nitric oxide analysis. Journal of Breath Research 2017;11(1):016012. 85 TABLES AND FIGURES Figure 2.1 Transform both sides (TBS) parameter across estimation methods a. Boxplots of TBS parameter estimates Horizontal blue line indicates 𝜆 =0.2 b. Boxplots of run time (min) For the NLME-GS method we considered a restricted range of 𝜆 (-0.3 to 0.8) compared to the UNIFIED and ANOVA-2GS methods (-2 to 2) 86 Figure 2.2 Transformation misspecification for the FeNO model parameter estimates. Boxplots of estimates across a sequence of predefined 𝜆 ranged between 0 and 1 Green boxplots borders indicate estimates for 𝜆 = 0.2; Horizontal blue line indicates true values used for simulation; When applicable relative bias is added as text (relative bias is not calculated for standard errors and COV(C A, logJ’ aw) that had true value of 0) a. Fixed effects b. Fixed effects standard error c. Random effect variance (diagonal elements of the variance-covariance matrix) d. Random effect covariance (off-diagonal elements of the variance-covariance matrix) 87 Figure 2.3 Standardized residuals after applying transform both sides (TBS) on CHS data. a. Q–Q normal plot of standardized residuals 𝜆 = 0 𝜆 = 0.2 𝜆 = 1 b. Distribution of standardized residuals by target flow rate 𝜆 = 0 𝜆 = 0.2 𝜆 = 1 88 SUPPLEMENTARY DATA Figure S2.1 FeNO vs. flow rates Vertical blue lines indicate target flow rates at 30, 50, 100 and 300 ml/s Figure S2.2 Nonlinear fixed effect model fails to converge For nlme grid search (NLME-GS) method Vertical line indicates 𝜆 =0.2 89 The Argatroban model The Argatroban model describes the effect of time (𝑡 ) on Argatroban concentration (𝑦 ),(1) with 𝑓 = 𝑅 𝐶𝑙 𝑒𝑥𝑝 − 𝐶𝑙 𝑉𝑜𝑙 𝑡 ∗ − 𝑒𝑥𝑝 − 𝐶𝑙 𝑉𝑜𝑙 𝑡 were 𝐶𝑙 and 𝑉𝑜𝑙 are fixed effects characterizing clearance and volume of distribution, respectively. We followed simulation described in Oberg et al,(1): 1000 simulated datasets each with 25 subjects assuming times: 𝑡 =(30, 60, 90, 115, 160, 200, 240, 245, 250, 260, 275, 295, 320) and 𝑡 ∗ = 0 for 𝑡 ≤ 𝑡 (study conditions were not replicated). With true fixed effects of 𝑡 = 240, 𝑅 = 4, 𝛽 = (𝑙𝑜𝑔𝐶𝑙 , 𝑙𝑜𝑔𝑉𝑜𝑙 ) = (−5.5, −1.9), 𝐷 = 0.15 0.02 0.02 0.01 and 𝜀 ~𝑁 (0, 1 ). The estimated TBS parameter from the Argatroban data 0.5 was set to 𝜆 . Figure S2.3 Transformation misspecification for the Argatroban model parameter estimates. Boxplots of estimates across a sequence of predefined 𝜆 ranged between 0 and 1 Green boxplots borders indicate estimates for 𝜆 = 0.5; Horizontal blue line indicates true values used for simulation. When applicable relative bias is added as text (relative bias is not calculated for standard errors) a. Fixed effects b. Fixed effects standard error c. Elements of the variance-covariance matrix 1. Oberg A, Davidian M. Estimating data transformations in nonlinear mixed effects models. Biometrics 2000;56(1):65-72. 90 CHAPTER 3. METHODS TO IDENTIFY KEY PREDICTORS AND 2-WAY INTERACTIONS USING GRADIENT BOOSTING MODELS IN LARGE HEALTHCARE DATA ABSTRACT A gradient boosting model (GBM) is an ensemble machine learning method that combines small sized trees into a single model with high predictive performance. GBM can handle a large number of predictor variables as inputs while automatically allowing for nonlinearities and interactions. It is a flexible approach that can be applied to explore large environmental healthcare datasets without a pre-defined hypothesis. A final model is composed of numerous trees, so it is not directly interpretable. Importance statistics and visualization methods are used to understand the key predictors. However, the importance statistics are biased towards predictors with many possible split points (e.g., continuous predictors) and come with no clear cutoff to separate important (true association with the outcome) and nonimportant effects. It has been previously suggested to account for predictor type by repeatedly estimating importance statistics under an altered outcome to create a reference null distribution. The method for producing a null distribution is different when assessing the overall importance of predictors versus the importance of interactions. In this work, we use a simulation study to (1) apply methods to identify both important predictors and interactions with GBM and (2) propose a novel improvement to the method for identifying interactions. We also analyzed data from Kaiser Permanente Southern California (KPSC) electronic medical records that has been linked with maternal residential air pollution exposures to detect determinants of gestational diabetes. In 91 simulated data, methods to identify important effects with GBM outperformed standard logistic regression by successfully detecting important predictors, interactions and nonlinearities while maintaining low false discovery. In addition, compared to the existing approach, our proposed method for detecting interactions was computationally efficient (short run time) and had good discovery performance. In KPSC data we identified known gestational diabetes risk factors, including potentially novel nonlinear trends and interaction terms, and we confirmed our findings using logistic regression. In conclusion, GBM is a promising model for analyzing large environmental healthcare data. Methods based on altered outcome are important tool for GBM interpretation. 92 INTRODUCTION Electronic medical records (EMR) have dramatically expanded the quantity and availability of healthcare data. These datasets, which can be quite large, can be combined with environmental information derived from residential addresses (e.g., assigned air pollutant exposures from nearby Environmental Protection Agency monitors) to address epidemiological research questions. These large datasets are an opportunity for hypothesis generation, attempting to reveal new associations between exposures and health outcomes without pre-defined hypotheses about potential associations that could exist in the data. Due to the large number of potential predictors, applications of traditional statistical methods would require model selection methods.(1, 2) These data tend to be messy, so other challenges include missingness, outliers, correlated predictors, and unknown nonlinear or interaction effects.(3-5) Machine learning methods are flexible, data-driven methods that offer a new approach to these challenges. This work is motivated by EMR of mother-child pairs collected in Kaiser Permanente Southern California (KPSC). KPSC data contain maternal: residential air pollution exposures, socio-demographics, and health information. Our outcome of interest was diagnosis of gestational diabetes mellitus (GDM) during pregnancy. A gradient boosting model (GBM) is a machine learning ensemble approach that learns the data by fitting a sequence of small-sized trees to the residuals of the previous tree(s).(6, 7) Each tree divides the data into a set of disjoint regions by applying a series of binary splits on the predictors. GBM can handle numerous predictors at a time, is resistant to outliers, and can accommodate missing data and account for nonlinearities.(8-10) GBM can simultaneously consider all possible interactions up to a predefined interaction level and allow for complex nonlinear interactions. Combining a large number of simple trees can result in dramatic 93 improvements in prediction performance compared to single tree-based models, like classification and regression trees.(11-13) A single tree is easily interpretable but this feature is lost for GBM, which usually contains hundreds or thousands of trees. Importance summary statistics and visualization techniques can be used to combine results across trees and indicate importance of predictor(s). Henceforth we will use “importance” as a GBM analog to statistical significance, indicating that an association occurs other than by chance. Available importance summary statistics describe the overall importance of a predictor and the importance of predictor(s) in interactions. Importance summary statistics provide only a relative scale for importance without a threshold, confidence intervals, or p-values. Similar to other tree-based methods (14, 15) the GBM algorithm tends to prefer predictors with many possible splits (continuous or categorical with many categories, referred hereafter as continuous predictors). It follows that comparison of importance summary statistics across different types of predictors can be misleading. Even though the bias toward continuous predictors is known, in practice many publications using GBM and other tree-based methods report on important effects without accounting for predictor type (e.g., continuous, categorical). A possible solution is to create a reference null distribution by repeatedly estimating importance statistics under an altered outcome. The null distribution approach has been applied previously for overall importance (16) and for predictors with important interactions (17) but not both. Because GDM is represented as a binary variable (binary outcomes are common in EMR), we will assume binary outcome in the method section and simulations study. However, this work can be easily extended to other outcome types. In this work, we suggest applying a null distribution approach to both identify overall important predictors and predictors with important 2-way interactions (BOTH-GBM), when 94 using GBM with binary outcome. Furthermore, we introduce a novel, less computationally intensive, application of the null reference distribution approach to identify predictors with important 2-way interactions. In a simulation study, we compared the ability to distinguish between important and nonimportant effects (discovery performance) between BOTH-GBM analysis and a logistic model. Specifically, for BOTH-GBM analysis, we compared performance of methods to identify predictors with important interactions. Our simulation was designed to allow for different predictor types, additive and 2-way interactions effects, different sample sizes, and different correlations of the predictors. Finally, we applied BOTH-GBM analysis on KPSC data. METHODS Trees and Gradient boosting model Decision trees (18) divide the data into a set of disjoint regions by applying a series of binary splits in the predictors. Predictors and split points are chosen hierarchically, conditioned on previous splits in a greedy manner to minimize prediction errors. Data splitting is repeated until some stopping criterion is reached (e.g., ≥10 observations in each terminal node). For a continuous outcome, the predicted value assigned to all observations within the same region is the sample mean of these observations (regression trees) and for a categorical outcome it is the mode (classification trees). Decision trees are intuitive and easy to interpret. They can accommodate missing data and are insensitive outliers.(19) Multiple splits on the same predictor can form a nonlinear trend and the hierarchical structure allows for interactions. However, trees tend to overfit and a small change in the data can result in a different model. Summaries of 95 overall importance of predictors in decision trees can be biased as trees will tend to prioritize continuous predictors, with many possible splits.(19) GBM is an adaptive numerical optimization machine learning approach that slowly learns the data by sequentially fitting small sized trees, each usually allowing for 1-6 splits.(6, 7, 18) In a forward process, GBM fits a new tree to residuals calculated from the series of previous trees. Existing trees are left unchanged, only fitted values and residuals are re-estimated. The method aims to minimize a loss function. For a binary outcome (our main interest), the standard loss function is the deviance a measure of the loss in predictive performance due to a suboptimal model. By fitting small trees to the residuals, we slowly improve fit in areas where it does not perform well. The final boosted model is a linear combination ensemble of all trees. GBM is more stable and accurate then a single tree model. The model has several tuning parameters. Each tree is based on a bag fraction (bg) of the dataset.(7) This stochastic process helps reduce the correlation between trees, reduces computing time, and improves predictive performance.(7) Interaction depth (id) controls interaction order by setting maximal number of splits in each ‘small’ sized tree. For example, id = 2 allow for a model with 2-way interactions and id = 1 for a model containing only additive effects. The contribution of each tree is further shrunk by a learning rate (lr), a small positive number typically between 0.01 and 0.001. The total number of trees (nt) depends on lr and should be large enough to allow the model to learn the data well but not too large to avoid overfitting. Importance summary statistics and visualization techniques Ensembling a large number of trees improves prediction performance at the expense of loss in interpretation. Importance summary statistics and visualizations techniques are required to 96 combine results across nt trees. Friedman’s relative influence (6) assesses overall importance of a predictor. It is based on the number of times a predictor is selected for splitting, weighted by the squared improvement of the model as a result of each split. Relative influence combines nonlinearities, additive effects and interaction effects and hence cannot be interpreted as the magnitude (or statistical significance) of the main effect in traditional models. To gain some understanding of the direction and shape of associations in GBM models, one can use partial dependence which is the effect of a subset of predictors (𝒙 𝒔 ) on the outcome after marginalizing out all other predictors (𝒙 𝑺 ).(6) Partial dependence (𝑃𝐷 ) can be estimated using: 𝑃𝐷 (𝒙 𝒔 ) = ∑ 𝐹 𝒙 𝒔 , 𝒙 𝑺 (𝒊 ) (1) where 𝐹 denotes GBM fitted values. The overall trend is estimated by iterating over N (i=1, …, N) data points. At each point i, partial dependence is evaluated by calling model predict function. Partial dependence plots are used to visualize marginal trend of a single predictor and interaction structure between (usually 2) predictors. Note that a limitation of partial dependence plots (which we do not attempt to remedy here) is that 𝒙 𝒔 might be correlated with 𝒙 𝑺 , and hence holding 𝒙 𝑺 fixed at observed values while varying 𝒙 𝒔 across the whole range may be unrealistic. Currently there are 2 existing methods to identify predictors with important interactions in GBM. Although these methods can be generalized to greater than 2-way interactions, here we limit ourselves to 2-way interactions between 𝑥 and 𝑥 , hereafter we will abbreviate ‘2-way interactions’ into ‘interaction’. The first method is based on Friedman’s “H” statistics.(20) Assuming 𝑥 and 𝑥 do not interact, their joint partial dependence function can be decomposed as: 𝑃𝐷 𝑥 , 𝑥 = 𝑃𝐷 (𝑥 ) + 𝑃𝐷 𝑥 (2) The H statistics is the square root of the proportion of the variance explained by the interaction: 97 𝐻 = ∑ ( ) , ( ) ( ) ( ) ∑ ( ) , ( ) (3) A downside is that calculation of the H statistics is computationally expensive. For each interaction, 𝑃𝐷 𝑥 , 𝑥 , 𝑃𝐷 (𝑥 ), and 𝑃𝐷 𝑥 are estimated by iterating over N data points which results in 3*N calls to the model prediction function.(21) For the second method, Elith et al(22) recommended creating a new dataset that includes all combinations of 𝑥 , 𝑥 levels (continuous predictors are categorized into intervals) with all other predictors set to their means (continuous predictors) or modes (categorical predictors). GBM that was trained on the real dataset is used to predict the outcome (𝑃𝑟𝑒𝑑𝑖𝑐𝑡 ) for the new dataset. Elith statistic is based on the following linear model 𝑃𝑟𝑒𝑑𝑖𝑐𝑡 = 𝑙𝑒𝑣𝑒𝑙𝑠 (𝑥 ) + 𝑙𝑒𝑣𝑒𝑙𝑠 (𝑥 ) + 𝑟 (4) where 𝑃𝑟𝑒𝑑𝑖𝑐𝑡 is calculated allowing interactions and the mean function includes only main effects. Hence, larger residuals indicate that an interaction term is required. Elith statistic is the mean squared residuals (𝑟 ) times 1000. Compared to the H statistics, Elith statistic is a simplified faster approach. For each interaction, Elith statistic requires only one call to the predict function. Originally it was recommended to categorize continuous predictors into intervals (20) of fixed length. Here, to reduce influence of extreme observations, we used a slightly modified version where we created intervals by quintiles. BOTH-GBM analysis The aforementioned importance statistics (RI ϵ [0, ∞), H statistic ϵ [0, 1], Elith statistic ϵ [0, ∞)) are relative measures, with larger values indicating greater importance. These summary statistics do not provide a threshold to distinguish between important and nonimportant effects. Even for 98 non-important predictors or interactions, these summary statistics will not necessarily be equal to zero. In addition, importance statistics from tree-based models favor continuous predictors. So, noninformative continuous predictors may incorrectly be identified as important. One solution is to create a reference null distribution by repeatedly estimating each importance statistic B times (b= 1, …, B) under an altered outcome (i.e., permuted or artificial, as described below). By altering the outcome, we can produce null distributions for importance statistics that account for predictor type and number of levels while preserving the dependence structure between predictors.(23) Based on the null reference distribution approach, we can obtain resampling- based “p-values” for a specific effect by calculating the fraction of null importance statistics larger than the observed importance statistics from the original dataset: Resampling p-value = ∑ , (5) For overall predictor importance, we create a reference null distribution of no predictors- outcome association by repeatedly estimating relative influence from a data with randomly permuted (15) outcome (RI-P approach). For interaction importance, the relevant null distribution assumes only additive effects. Permuting the outcome would destroy all associations with the outcome, including additive effects. Instead, Freidman et al (20) recommended creating a reference null distribution by repeatedly estimating H statistics from a dataset with artificial outcomes produced assuming only additive effects (H-A approach). We suggest a novel approach: to use artificial outcome concept but with the Elith statistic which is less computationally intensive than the H statistic (E-A approach). Using Equation 5 we can calculate resampled p value for each observed importance statistic from the corresponding reference null distribution. We will henceforth refer to the 99 approach of developing reference null distributions for both overall and interaction importance in GBM as “BOTH-GBM”. A description of BOTH-GBM approach is presented in Figure 3.1. Data simulation We used a simulation study to assess the ability to correctly identify predictors with true associations (discovery performances). Predictors (50) were sampled from a multivariate normal distribution with a mean vector of 0 and a variance-covariance matrix D. Each predictor’s variance was equal to 1 and all pairwise covariances/correlation between predictors were set to r. Half of the predictors (25) were further dichotomized at their means to create binary predictors. The log odds mean function was: log odds (Y=1)= log(p/(1-p)) =1+ 0.5 *(cont1+cont2+cont1*cont2) + 0.5 *(bin1+bin2+bin1*bin2) + 0.5 *(cont3+bin3+cont3*bin3) + 0.5 *cont4^2*bin4+ 0.5 *cont6^2+ 0.1 *(cont5+bin5) The inverse of the log odds (p) was used as probability weights to generate binary outcome. The prefix “cont” and “bin” denote continuous and binary predictors respectively. Only a subset of 11 predictors had a true association with the outcome: 6 important continuous predictors (cont6 had nonlinear effect) and 5 important binary predictors. Out of 1225 pairwise interactions (50 choose 2) there were only 4 important interactions (and the remaining 1221 were noise). Interactions combined different predictor types (cont1-cont2, cont3-bin3 and bin1-bin2) and had 100 nonlinear effects (cont4*bin4). For most true associations, we used moderate effect sizes of 0.5. Because predictor importance accounts for interactions, in practice predictor importance magnitude differ. We also included 2 additive effects with weak effect size of 0.1. We simulated 10*N (N=2000, 4000, 6000, 8000 or 10,000) observations and sampled without replacement N times to reach an outcome ratio of 1:9 (similar to GDM prevalence in KPSC data). All predictors were uncorrelated (r=0), or with moderate correlation of r=0.5. A total of 1000 datasets were simulated for each combination of r and N. For GBM estimation we used lr=0.01 and nt=2000. Resampled p values were calculated following the description of BOTH-GBM analysis in Figure 3.1 with B=100. Due to time constraints H-A resampled p values were calculated only for subset of 55 interactions (pairwise combinations of the 11 important predictors, 11 choose 2). This subset was chosen, because, important interaction is expected to occur only between 2 predictors identify as important. Partial dependence plots were used to visualize the nonlinear trend of cont6 and cont4*bin4. Discovery is the proportion of resampled p-values smaller than a threshold. For predictor importance we used standard threshold of 0.05. To avoid false positives, for interaction importance we used a more restricted threshold of 0.01. Average false discovery is the average proportion of false discoveries out of all non-important predictors (50-11=39) and interactions (1225-4=1221). We compered BOTH-GBM analysis discovery performance to classical logistic regression approach with variable selection. Significant predictors (p value<0.05) were selected from an intermediate logistic regression model that included all 50 predictors. The final model included selected predictor’s main effects and all corresponding pairwise interactions. Nonlinearities were not estimated in the logistic regression analysis. 101 KPSC data Data were from a retrospective longitudinal cohort study of singleton deliveries in KPSC hospitals between 1995 and 2009. As described previously,(24) maternal and child data were extracted from EMR. Our outcome of interest was diagnosis of gestational diabetes mellitus (GDM) during pregnancy. Socio-demographic and health predictors considered for the analysis included: maternal age at delivery, parity (0, 1 or >1), education (high school, some college or college and post), self-reported maternal race/ethnicity (White, Black, Hispanic, API-Asian Pacific Islander or Other), median family house hold income based on census tract of residence (quantitative discrete predictor with 5 levels), history of comorbidity (≥1 diagnosis of heart, lung, kidney, or liver disease; cancer), child’s sex (male or female), maternal pre-pregnancy body mass index (BMI), smoking (yes or no), KPSC service areas (14 areas) and birth year. Residential ambient air pollution exposures of PM2.5, PM10, NO2 and O3 were interpolated to birth address from regulatory air monitoring stations to estimate mother’s average exposure during 1 year/ 3 months before pregnancy and during the 1 st trimester of pregnancy.(25) Information on birth address distance to freeways and major roads (quantitative discrete predictor with 14 levels) were also included in the analysis. BMI and smoking were assessed starting in late 2006. For our analysis we considered a subset of 68,541 births collected between 2007 and 2009 with Southern California birth address. We excluded pregnancies of mothers with type 1 diabetes, pre-existing type 2 diabetes, and an early onset (before gestational week 13) of GDM. We randomly divided the KPSC data into 80% training (N=54,847) and 20% testing data N=13,694). We use the BRT package in R with 5-fold cross validation (22, 26) to identify the optimal number of trees given lr of 0.01, id=2 (nt=1300). Resampled p-values were calculated using BOTH-GBM analysis (as described in Figure 3.1) with B=100. For overall predictor 102 importance, RI-P resampled p-values were calculated for all 26 KPSC data predictors. For interaction importance, E-A resampled p values were calculated for all 325 pairwise interactions (26 choose 2). Due to time constraints, H statistics were not calculated for KPSC data. To rule out the possibility that results were driven by subtle differences caused by the stochastic processes imbedded in GBM, the observed GBM stage was repeated with 10 different random number generating seeds. We present median observed importance and maximal resample p- values of the 10 repeats. Partial dependence plots were used to visualize trends and possible nonlinearities of important effects. We evaluated the predictive performance of GBM using the area under the receiver operator characteristic curve (AUC) in the training and in the testing datasets.(27) Finally, we included important predictors, nonlinear trends and interactions identified in GBM analysis in logistic regression model. We used the Likelihood Ratio Test (LRT) to test the significance of adding interaction term. In the KPSC data, 8.5% of observations were missing at least one predictor variable. GBM accounts for missing data using surrogate splits, but missing data were omitted for logistic regression analysis. All analyses were performed using R version 3.5.2.(28) We used GBM as implemented in the gbm package in R with a Bernoulli distribution for the binary outcome.(29) RESULTS Simulation study The proportion of discovered important effects for uncorrelated predictors (r=0), as a function of sample size (N) is described in Figure 3.2. For BOTH-GBM analysis (Figure 3.2.a) most continuous and binary predictors have high discovery rates, even for the relatively small sample size of N=2000. The method poorly recovered both predictors with weak effects, cont5 and bin5. 103 Increasing sample size greatly improved the discovery of cont4, which was part of a nonlinear interaction with bin4 (Figure 3.2.a). GBM identified as important the cont6 nonlinear main effect and the complicated, nonlinear cont4*bin4 interaction, and partial dependence plots also capture the nonlinear trends (demonstrated for N=6000, r=0 in Figure S3.1). For both H-A and E-A approaches (Figure 3.2.a, right panel) the cont1*cont2 interaction had the highest discovery followed by the 2 types of continuous*binary interactions (all reached nearly 100% discovery for N≥6000). The bin1*bin2 interaction had the lowest discovery rate, which improved greatly with increased sample size. Compared with H-A, the E-A approach had a better discovery rate for the bin1*bin2 interaction. In the reference null distribution, importance statistics for predictors and interactions involving continuous predictors were much higher compared to those for binary predictors (demonstrated for N=6000, r=0 in Figure S3.2). Compared to BOTH-GBM analysis, predictor discovery rates for logistic regression was lower and depended more on sample size (Figure 3.2.b). Logistic regression poorly recovered both weak effects (cont5 and bin5), predictors with nonlinear trends (cont4 and cont6) and one of the binary predictors (bin1). For N≤4000 logistic regression (compared to GBM) had better discovery rates for cont1*cont2 and cont3*bin3 interactions but, had very low discovery for cont4*bin4 and bin1*bin2 interactions across N. False discovery was similar and low for BOTH- GBM analysis and logistic regression for noise predictors (GBM: 0.5-1.5% vs. logistic: 0.2%) and noise interactions (GBM: 0.7-0.8% vs. logistic: 0.5-0.8%). Generally, in the correlated predictors scenario (r=0.5 vs. r=0) discovery of poorly recovered predictors increased and interaction discovery decreased (Figure S3.3). False discovery for predictors in BOTH-GBM analysis was much higher (5.4-6.0%). False discovery 104 for other effects only slightly increased (logistic predictors: 0.2-0.3%, GBM interactions: 0.9- 1.0% and logistic interactions 0.4-0.9%). Average run time for the H-A approach was highly dependent on sample size and took substantially longer than the E-A approach (Figure 3.3.a). The running time for the GBM model itself also depended on sample size (Figure 3.3.b). According to the description of BOTH-GBM analysis in Figure 3.1, with B=100 we need to run 202 GBMs to detect overall important predictors and predictors with important interactions for one dataset. For example, given N=10000 (50 predictors; 2000 trees) total running time for GBMs is 39.0 sec*202=97.6 min. For GBMs running time we need to add the time to identify 55 interactions: 100.6 sec*101=169.3 min for H statistics (3770.8 min for 1225 interactions) and/or 2.1 sec*101=3.5 min for Elith approach (78.0 min for 1225 interactions). The correlation structure did not affect running time (data not shown). KPSC data Both training (N= 54,847) and testing datasets (N= 13,694) had 8.3% cases of GDM. GBM fitting was conducted on the training set and hence unless otherwise mentioned results are from the training set. GBM had good predictive performance with median AUC=0.721 in training and AUC=0.715 in testing set. Overall important predictors identified by RI-P approach were: maternal age, BMI, ethnicity, KPSC service areas, education and parity (Table 3.1.a). Interactions between maternal age and ethnicity and between maternal age and education were found important by E-A approach (Table 3.1.b). The predictor NO2, 1 st trimester (continuous) and the interaction between maternal age and BMI (continuous* continuous) ranked high according to observed summary statistics but not 105 according to resampled p-values. Examples for this bias towards continuous predictors is described in Figure S3.4. The high null distribution of NO 2, 1 st trimester compared to categorical education predictor is a result of different predictor type. Hence, similar observed relative influence values can indicate importance for education, but not for NO 2, 1 st trimester. Similarly, even though the maternal age and BMI interaction had a larger observed importance statistic the interaction between maternal age and ethnicity was important but the former was not. We also saw that there was (small) variance in the importance statistics across 10 model repeats with different seeds. The marginal trend of important effects is visualized in partial dependence plots (Figure 3.4). Trends were very similar across GBM repeats with different seeds. Older maternal age and higher BMI strongly increased the log odds of GDM. For BMI, the effect was attenuated for BMI>34. API is the ethnic group with largest GDM risk. Mothers with more education years and more past pregnancies had a lower log odds of GDM (Figure 3.4.a). The slope between maternal age and GDM log odds increased after age 24 for Hispanic as compared to White mothers. The slope between maternal age and GDM log odds was attenuated after age 25 for mothers with at least some college education compared to mothers with high school education (Figure 3.4.b). Similar results were observed in logistic regression fit to the training set, in terms of the important predictors, nonlinear trends for BMI and interactions identified in BOTH-GMB (Table 3.2). Logistic regression predictive performance was similar to the GBM model with AUC=0.717 in training and AUC=0.713 in testing set. An addition of NO 2, 1 st trimester and the interaction between maternal age and BMI (high summary statistics values but resampled p value indicate non-importance) was also not significant in logistic model (p value >0.25). Similar 106 trends with attenuated effects non-significant interactions were observed for logistic model applied on the testing set (Table S3.1). DISCUSSION GBM importance statistics are severely biased towards effects involving continuous predictors. This bias can be corrected by comparing observed importance statistics to a reference null distribution. Previously, these methods demonstrated good ability to identify only overall important predictors (16) or only predictors with important interactions.(17) To the best of our knowledge this is the first time BOTH-GMB analysis has been applied to detect both. In a simulation study BOTH-GMB analysis had good ability to identify important predictors, interactions and nonlinearities with low false discovery rates. For detecting predictors with important interactions, E-A and H-A approaches had similar discovery performance with an advantage for the E-A approach in detecting interactions between 2 binary predictors. In addition, the E-A approach had much faster run time. We compared the performance of BOTH-GMB analysis and standard logistic regression. Mansiaux et al (16) observed that GBM (among other machine learning methods), outperformed logistic regression framework to detect important predictors. Here, we demonstrate the advantage of using BOTH-GMB analysis over logistic regression in identifying both overall important predictors and predictors with important interactions. Logistic regression discovery of true effects was lower and depended more on sample size. Nonlinearities were not accounted for in logistic regression and, as expected, logistic regression could not detect effects with nonlinear trends. Logistic regression performance could be explained by the ‘one in ten rule’. In multivariate logistic regression, it is recommended that number of effects estimated will be no 107 more that than 10% of the number of events.(30) Hence, in our simulations (outcome ratio 1:9) the number of main effects and corresponding pairwise interactions should range between 20 to 100, depending on sample size. In practice, larger number of effects could have been selected. In our analysis P predictors were selected from intermediate logistic regression model. Total effects included in the final model were equal to P+(P choose 2). Logistic models with large number of total effects fail to be estimated (5%). Among estimated logistic regression models, the median number of total effects selected was 66 (IQR: 55-78) and could go as high as 190 effects. Our simulation study included different sample sizes, with N ranging from 2,000 to 10,000. Compared to our results, other studies demonstrated that a smaller sample size is sufficient to reach good discovery performance. Mansiaux et al (16) used a binary outcome and N=300 but, although their simulated data included interaction terms, discovery was assessed only for overall predictor importance. The differences in sample size could be explained by the fact that, as we observed, detecting interactions requires larger sample size. Lampa et al (17) used N=1,000 to discover important complex high order (>2) interactions, but their outcome was continuous. BOTH-GMB is based on altering the outcome and outcome type could potentially have effect on the ability to detect associations. We expect that altering continuous outcome will have greater power to detect associations. We repeated our simulation study for uncorrelated and moderately correlated data. Unlike other tree-based method, e.g., random forest that divides importance between highly correlated predictors (31), GBM should assign importance to just one of the correlated predictors. However, as was previously reported (17), we observed that correlated data affected discovery performance and increased false discovery. 108 In KPSC data, we managed to retrieve known GDM risk factors and obtained similar results in logistic regression. Advanced maternal age,(32) high pre-pregnancy BMI (32, 33) and Hispanic and API ethnicities(34) have all been reported to increase GDM risk. For parity, other studies report on lower GDM risk for women pregnant for the first time.(34) We observed similar trend for parity in an unadjusted logistic regression model, but for adjusted GBM and logistic regression models we observed the opposite trend. BMI is usually categorized (underweight, normal, overweight and obese), here for BMI on the continuous scale we observed nonlinear trend with attenuated effect for mothers with BMI>34. Gestational weight gain, especially during early pregnancy increase a woman’s risk of GDM.(35) Guidelines(36) for obesity during pregnancy is to restrict weight gain for obese mothers. It is possible that intervention to minimize weight gain for mothers with extremely high BMI reduced GDM risk. Unfortunately, we don’t have information on weight gain through GDM diagnosis. The interaction between maternal age and ethnicity suggests on a different trend for White vs. Hispanic mothers after age 24 and might be a novel finding. Increased GDM risk of mothers with high school education (compare to college education) is evident after the age of 25. Education for individuals up to mid-20’s could be just timing, whereas education status at older age could be an indicator for socioeconomic status. Association between air pollution exposure(37, 38) and increased risk of GDM was previously reported in a study that also used this KPSC data but included observations from earlier years (starting at 1999 instead of 2006).(25) We are not able to detect an association between air pollution exposure and GDM in our subset of the data. A possible explanation is that air pollution exposure levels (especially PM2.5 and NO2) decreased (25) and hence, effects did not exist or were too weak to identify. The hazard of using importance statistics without accounting for predictor type is demonstrated in 109 KPSC data analysis. We observed high importance for continuous predictors (e.g., air pollutants) that became nonimportant after accounting for the reference null distribution. This study had several limitations. Computational time constrained this study in 2 aspects. First, as we discussed before, H statistics have long run time and heavily depend on sample size. In our simulation study, we applied the H-A approach only for a subset of 55 interactions which allowed us to obtain discovery performance for all true interactions but limited our ability to estimate false discovery. Long computational time prevented us from applying the H-A approach to KPSC data. Second, GBM itself is very resource-intensive and takes a long time to run. Run time depends on sample size, number of predictors, and number of trees. Repeatedly altering the outcome for BOTH-GMB analysis is time-consuming. Another limitation is the uncertainty of GBM and the methods we applied to identify important effects. GBM is a stochastic process and result depend (to some extent) on the random seed specified. Hence, for GBM applied to KPSC data we used 10 GBM repeats with different seeds. In practice, the 10 observed summary statistics (examples in Figure S3.4) and partial dependence trends (Figure 3.4) were very similar. For reference null distributions, we used B=100 which resulted in crude resampled p-values (B=1000 or higher would have been preferred). We used maximal resampled p value to avoid false discovery of borderline values. This study had several strengths. An alternative approach to alter the outcome is to alter predictors.(39) However, altering the outcome is faster (we don’t need to repeat the process separately for each predictor) and has the advantage of maintaining the associations between predictors.(15) In addition, we used an elaborate simulation study to test performance of BOTH- GMB analysis vs. logistic regression and to compare H-A and E-A approaches. Our simulation included different effect sizes, different predictor types (continuous and binary), interactions 110 between predictors of different types and nonlinear trends. Finally, we applied our method on a large healthcare dataset which integrated EMR and air pollution exposure. Methods presented here for binary outcomes could be easily extended to other outcome types. In conclusion, GBMs are promising tools in analyzing environmental healthcare data. They automatically consider a large number of predictors, all interactions up to a prespecified level, and nonlinearities. GBM interpretation can greatly benefit from applying methods to identify predictors and interactions. In GBM, like other machine learning methods, the gain in predictive performance comes at the expense of interpretability. We observed that good discovery performance is achieved with BOTH-GMB analysis. However, resolving interpretability produced another issue. BOTH-GMB analysis is computationally intensive and might not be feasible for very large datasets. Using the Elith statistic in the process of identifying predictors with important interactions can reduce the computational burden while maintaining good discovery performance. Future research directions include: discovery performance using a larger number of predictors, generalizing to higher order interactions, testing different correlation structures, and extending to a different outcome type (e.g., continues outcome or binary rare outcome). 111 REFERENCES 1. Walter S, Tiemeier H. Variable selection: current practice in epidemiological studies. European Journal of Epidemiology 2009;24(12):733. 2. Peduzzi P, Concato J, Kemper E, et al. A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology 1996;49(12):1373-9. 3. Billionnet C, Sherrill D, Annesi-Maesano I. Estimating the health effects of exposure to multi-pollutant mixture. Annals of Epidemiology 2012;22(2):126-41. 4. Shah NH, Tenenbaum JD. FOCUS on translational bioinformatics: The coming age of data-driven medicine: translational bioinformatics' next frontier. Journal of the American Medical Informatics Association: JAMIA 2012;19(e1):e2. 5. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA 2013;309(13):1351-2. 6. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of Statistics 2001:1189-232. 7. Friedman JH. Stochastic gradient boosting. Computational Statistics & Data Analysis 2002;38(4):367-78. 8. Elith* J, H. Graham* C, P. Anderson R, et al. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 2006;29(2):129-51. 9. Leathwick J, Elith J, Francis M, et al. Variation in demersal fish species richness in the oceans surrounding New Zealand: an analysis using boosted regression trees. Marine Ecology Progress Series 2006;321:267-81. 112 10. Moisen GG, Freeman EA, Blackard JA, et al. Predicting tree species presence and basal area in Utah: a comparison of stochastic gradient boosting, generalized additive models, and tree-based methods. Ecological Modelling 2006;199(2):176-87. 11. James G, Witten D, Hastie T, et al. An introduction to statistical learning. Springer; 2013. 12. Austin PC, Lee DS, Steyerberg EW, et al. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble‐ based methods? Biometrical Journal 2012;54(5):657-73. 13. Austin PC, Tu JV, Ho JE, et al. Using methods from the data-mining and machine- learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes. Journal of Clinical Epidemiology 2013;66(4):398- 407. 14. Breiman L. Classification and regression trees. Routledge; 2017. 15. Altmann A, Toloşi L, Sander O, et al. Permutation importance: a corrected feature importance measure. Bioinformatics 2010;26(10):1340-7. 16. Mansiaux Y, Carrat F. Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections. BMC Medical Research Methodology 2014;14:99. 17. Lampa E, Lind L, Lind PM, et al. The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees. Environmental Health 2014;13:57. 113 18. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction, Springer Series in Statistics. Springer New York, 2009. 19. Breiman L, Friedman J, Olshen R, et al. Classification and regression trees. Wadsworth & Brooks. Cole Statistics/Probability Series 1984. 20. Friedman JH, Popescu BE. Predictive learning via rule ensembles. The Annals of Applied Statistics 2008;2(3):916-54. 21. Molnar C. Interpretable machine learning: A guide for making black box models explainable. Christoph Molnar, Leanpub 2018. 22. Elith J, Leathwick JR, Hastie T. A working guide to boosted regression trees. Journal of Animal Ecology 2008;77(4):802-13. 23. Steuer R, Kurths J, Daub CO, et al. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics 2002;18(suppl_2):S231-S40. 24. Xiang AH, Wang X, Martinez MP, et al. Association of maternal diabetes with autism in offspring. JAMA 2015;313(14):1425-34. 25. Jo H. Early life air pollution exposure, gestational diabetes mellitus, and autism spectrum disorder. Dissertation. University of Southern California; 2019. 26. Elith J, Leathwick J. Boosted Regression Trees for ecological modeling. R documentation Available at https://cran r-project org/web/packages/dismo/vignettes/brt pdf 2017. 27. Robin X, Turck N, Hainard A, et al. pROC: display and analyze ROC curves. R Package Version 2015;1. 28. Team RC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2018. 114 29. Ridgeway G. Generalized Boosted Models: A guide to the gbm package. Update 2007;1(1):2007. 30. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 1996;15(4):361-87. 31. Touw WG, Bayjanov JR, Overmars L, et al. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Briefings in Bioinformatics 2013;14(3):315-26. 32. Solomon CG, Willett WC, Carey VJ, et al. A prospective study of pregravid determinants of gestational diabetes mellitus. JAMA 1997;278(13):1078-83. 33. Torloni M, Betran A, Horta B, et al. Prepregnancy BMI and the risk of gestational diabetes: a systematic review of the literature with meta‐analysis. Obesity Reviews 2009;10(2):194-203. 34. Schwartz N, Nachum Z, Green MS. The prevalence of gestational diabetes mellitus recurrence—effect of ethnicity and parity: a metaanalysis. American Journal of Obstetrics and Gynecology 2015;213(3):310-7. 35. Hedderson MM, Gunderson EP, Ferrara A. Gestational weight gain and risk of gestational diabetes mellitus. Obstetrics and Gynecology 2010;115(3):597. 36. Buschur E, Kim C. Guidelines and interventions for obesity during pregnancy. International Journal of Gynecology & Obstetrics 2012;119(1):6-10. 37. Malmqvist E, Jakobsson K, Tinnerberg H, et al. Gestational diabetes and preeclampsia in association with air pollution at levels below current air quality guidelines. Environmental Health Perspectives 2013;121(4):488-93. 115 38. Robledo CA, Mendola P, Yeung E, et al. Preconception and early pregnancy air pollution exposures and risk of gestational diabetes mellitus. Environmental Research 2015;137:316-22. 39. Strobl C, Boulesteix AL, Zeileis A, et al. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 2007;8:25. 116 TABLES AND FIGURES Table 3.1 Gestational diabetes importance summary statistics and resampled p value Resampled p values were calculated following the description of BOTH-GMB analysis in Figure 3.1 with B=100. a. Overall important predictors Predictors are ordered by observed relative influence, only top 10 are presented (out of 26 predictors considered for the analysis) Predictor Observed relative influence * RI-P resampled p values ** Maternal age 2481.10 0.00 BMI 2264.20 0.00 Ethnicity 932.30 0.00 KPSC service areas 513.70 0.00 NO2, 1 st trimester 115.20 0.58 Education 112.00 0.00 Parity 86.30 0.01 O3, 1 st trimester 75.10 0.99 PM2.5, 1 st trimester 73.40 0.97 O3, 3 months pre-preg. 59.60 1.00 b. Predictors with important interactions Interactions are ordered by observed Elith, only top 10 are presented (out of 325 2-way interactions considered for the analysis) Interaction Observed Elith * E-A resampled p values ** Maternal age * BMI 4.2 0.31 Maternal age * Ethnicity 2.1 0.00 Ethnicity * BMI 1.2 0.20 KPSC service areas * BMI 0.7 0.10 Maternal age * KPSC service areas 0.4 0.65 KPSC service areas * Smoking 0.3 1.00 Maternal age * Education 0.2 0.00 Ethnicity * KPSC service areas 0.2 0.27 BMI * Smoking 0.1 1.00 Maternal age * Parity 0.1 1.00 * Median observed summary statistics of 10 GBM repeats with different seeds ** Maximal Resampled p values of 10 GBM repeats with different seeds RI-P= Relative influence permuted outcome approach E-A= Elith artificial outcome approach 117 Table 3.2 Gestational diabetes logistic regression, Train set Term OR P value Maternal age 1.095 <0.001 BMI 1.103 <0.001 BMI>34** 0.925 <0.001 Ethnicity White Ref. API 1.944 0.063 Black 0.698 0.374 Hispanic 0.528 0.018 Other 1.839 0.389 Education Highschool Ref. < College 1.811 0.011 >= College 1.784 0.040 Parity 0 Ref. 1 0.781 <0.001 2+ 0.760 <0.001 Maternal age and education Maternal age * Highschool Ref. Maternal age * < College 0.977 0.002 Maternal age* >= College 0.973 0.002 p*=0.001 Maternal age and ethnicity Maternal age * White Ref. Maternal age * Asian Pacific Islander 1.011 0.310 Maternal age * Black 1.007 0.569 Maternal age *Hispanic 1.028 0.001 Maternal age * Other 0.990 0.660 p*=0.006 Train set N=54,847; Model is adjusted for KPSC service areas * LRT p value for interaction ** Linear spline API= Asian Pacific Islander 118 Figure 3.1 BOTH-GBM approach identifying both important predictors and interactions for GBM with a binary outcome Observed predictor and interaction effects Apply GBM to estimate observed outcome allowing for 2-way interactions (id=2) Obtain: Relative influence H statistics and/or Elith statistics Null for overall effect of predictors Apply GBM to estimate randomly permuted (without replacement) outcome (id=2) Obtain: Relative influence Repeat B times and calculate RI-P resampled p-values Null for interaction(1) 1. Apply GBM to estimate observed outcome assuming only additive effects (id=1) 2. Based on model (1), calculate predicted additive probability for the i-th observation 3. Sample an artificial binary outcome using (2) as probability weights 4. Apply GBM to estimate artificial binary outcome (3) allowing for 2-way interactions (id =2) Obtain: H and/or Elith statistics Repeat stages (3) to (4) B times and calculate H-A and/or E-A resampled p-values RI-P= Relative influence permuted outcome approach E-A= Elith artificial outcome approach H-A= H statistic artificial outcome approach 119 Figure 3.2 Proportion of discovered effects in simulated data (r=0) A total of 1000 data were simulated for each N a. BOTH-GBM analysis Resampled p values were calculated following the description in Figure 3.1 with B=100. Predictor discovery is defined as resampled p value<0.05 and interaction discovery as p value <0.01. Predictor discovery Interaction discovery b. Logistic regression model Predictors and corresponding interactions were selected as described above. Predictor and interaction discovery are defined as resampled p value<0.05 Predictor discovery Interaction discovery Log odds for p mean function: log(p/(1-p)) =1+0.5*(cont1+cont2+cont1*cont2)+ 0.5*(bin1+bin2+bin1*bin2) +0.5*(cont3+bin3+cont3*bin3) +0.5*cont4^2*bin4+0.5*cont6^2+0.1*(cont5+bin5) 120 Figure 3.3 Average running time a. H and Elith statistics For a subset of 55 interactions b. Single GBM estimation For 50 predictors; 2000 trees Sample size Average time (sec) 2000 6.4 4000 12.6 6000 19.8 8000 28.8 10000 39.0 121 Figure 3.4 Partial dependence plots for marginal trend of important gestational diabetes mellitus determinants Different curves represent results from 10 observed GBM repeats with different seeds. For continuous predictors extreme values (<percentile 0.1% and >99.9%) are not plotted. Dashed vertical lines represent the 1% and 99% percentile. a. Partial dependence plots for overall important predictors Overall important predictors were identified in Table 3.2 (RI-P resampled p value <0.05). KPSC service areas is not plotted. For BMI, the vertical solid line at BMI=34 suggest a nonlinear effect 122 b. Partial dependence plots for important interactions Predictors with important interactions were identified in Table 3.2 (E-A resampled value <0.01). Vertical solid lines suggest an interaction Maternal age and ethnicity Maternal age and education 123 SUPPLEMENTARY DATA Table S3.1 Gestational diabetes logistic regression, Test set Term OR P value Maternal age 1.101 <0.001 BMI 1.097 <0.001 BMI>34** 0.930 <0.001 Ethnicity White Ref. API 3.951 0.053 Black 0.801 0.803 Hispanic 0.906 0.861 Other 0.214 0.349 Education Highschool Ref. < College 1.283 0.596 >= College 2.399 0.115 Parity 0 1 0.855 0.059 2+ 0.795 0.009 Maternal age and education Maternal age * Highschool Ref. Maternal age * < College 0.989 0.437 Maternal age* >= College 0.968 0.048 p*=0.140 Maternal age and ethnicity Maternal age * White Ref. Maternal age * Asian Pacific Islander 0.993 0.743 Maternal age * Black 0.999 0.961 Maternal age *Hispanic 1.016 0.365 Maternal age * Other 1.065 0.193 p*=0.471 Test set N=13,694; Model is adjusted for KPSC service areas * LRT p value for interaction ** Linear spline API= Asian Pacific Islander 124 Figure S3.1 Partial dependence plots for nonlinear effects in simulated data Different curves represent results from 10 simulated data (N=6000, r=0) Extreme values (<percentile 0.1% and >99.9%) are not plotted. Dashed vertical lines represent the 1% and 99% percentile. Cont6 nonlinear trend Cont4*bin4 nonlinear interaction Log odds for p mean function: log(p/(1-p)) =1+0.5*(cont1+cont2+cont1*cont2)+ 0.5*(bin1+bin2+bin1*bin2) +0.5*(cont3+bin3+cont3*bin3) +0.5*cont4^2*bin4+0.5*cont6^2+0.1*(cont5+bin5) Figure S3.2 Illustration of the bias toward continuous variables in reference null distribution Simulated data, N=6000, r=0 Overall important predictors Important interactions 125 Figure S3.3 Proportion of discovered effects in simulated data (r=0.5) A total of 1000 data were simulated for each N. a. BOTH-GBM analysis Resampled p values were calculated following the description in Figure 3.1 with B=100. Variable discovery is defined as p value<0.05 and interaction discovery as p value <0.01. Variable discovery Interaction discovery a. Logistic regression model Predictors and corresponding interactions were selected as described above. Predictor and interaction discovery are defined as p value<0.05 Variable discovery Interaction discovery Log odds for p mean function: log(p/(1-p)) =1+0.5*(cont1+cont2+cont1*cont2)+ 0.5*(bin1+bin2+bin1*bin2) +0.5*(cont3+bin3+cont3*bin3) +0.5*cont4^2*bin4+0.5*cont6^2+0.1*(cont5+bin5) 126 Figure S3.4 Illustration of observed effect and reference null distribution effects. Observed summary statistics of 10 GBM repeats with different seeds (blue dots) vs. boxplots of B=100 summary statistics estimated under a null (calculated following the description of BOTH-GBM analysis in Figure 3.1). a. Overall important predictors b. Important interactions RI-P= Relative influence permuted outcome approach E-A= Elith artificial outcome approach 127 DISCUSSION In the following section I will: contextualize my work, review my key findings, describe its importance, and outline future research directions. My first two chapters were motivated by data on the fractional concentration of exhaled nitric oxide (FeNO) collected as part of the Southern California Children’s Health Study (CHS). FeNO is a potential early marker for the respiratory response to air pollution exposures. Repeat FeNO maneuvers at several fixed exhalation flow rates (extended NO analysis) can be used to noninvasively evaluate “NO parameters” describing airway and alveolar sources of NO in a nonlinear regression model, based on the closed form steady-state solution to a mathematical model of lower respiratory tract NO dynamics. In Chapter 1, I derived optimal flow rate sampling designs for the problem of estimating NO parameters for a single participant, using nonlinear regression. I found that there is a class of unbiased flow rate sampling designs with near-optimal performance. As might be expected, optimal designs involved more data. They had a wider range of flow rates, larger number of flow rates sampled, and more replicates per flow rate. However, there are practical limits on the data that can be collected, especially for children who have a hard time performing maneuvers at the most extreme flow rates. I demonstrated that if one were to sample FeNO using a suboptimal design, then a larger number of subjects would be required to detect associations between NO parameters and a determinant (e.g., air pollution). The impacts of sampling design on NO parameter estimation (and the resultant impact on sample size required to detect associations with a determinant) varied across NO parameters. One design was not optimal for all NO parameters. In practice, a good sampling design should balance concerns about optimal estimation with the feasibility of conducting extreme flow rates and the resources/time available 128 to conduct a reasonable number of maneuvers. I demonstrated that given the practical considerations for a large-scale study of children, the CHS sampling design has good performance. This work is the first careful statistical planning of FeNO sample design and I plan to develop a shiny app web interface tool to evaluate the impact of a proposed sampling design on power. In the future, I could expand on this work by identifying optimal sampling design for a study population, when simultaneously estimating population-level and individual-level NO parameters in a nonlinear mixed effect model that pools information across individuals. In Chapter 2, I refined the statistical modeling approach linking the deterministic nonlinear FeNO model to the stochastic observations of FeNO data, in a population-based model setting. Specifically, I studied a data driven transformation of both the outcome and the mean function in a nonlinear mixed effect model. I demonstrated that transformation can improve the normality and homoscedasticity of the residuals and can improve parameter estimation. I developed a novel unified Bayesian Markov chain Monte Carlo (MCMC) method to jointly estimate the transformation parameter and the nonlinear mixed effect model parameters. This method had unbiased transformation parameter estimates, good convergence, and reasonable run time. I found that transformation misspecification affects the estimation of only some of the model’s parameters. Moreover, beside the estimated transformation parameter there is a wider range of transformations with analogous performance. Comparable transformations can include keeping the data on the original scale or a standard log transformation. In the CHS data, I observed that the previously used log transformation has similar performance to transformation estimated from the data. The flexible MCMC framework allows for extensions of this work by allowing the transformation to vary between individuals or groups (i.e., by adding a random or fixed effect to the transformation). 129 Results of Chapter 1 (sample design) and Chapter 2 (model transformation) share some common concepts that can be attributed to the nonlinear structure of the regression function. First, the impact of the sample design or model transformation was differential across NO parameters. For example, the NO parameter quantifies the maximum airway flux is highly correlated with one of the CHS sampled flow rate and hence was least sensitive to these choices. Second, there was no single optimal sample design or model transformation, but rather a range of reasonable choices, each with similar performance. Third, my novel statistical methodological developments and rigorous simulation studies provide justification for the methodological choices by our research team for FeNO sampling (CHS design of 3 maneuvers at 50 ml/s, and then 2 each at 30, 50, and 100 ml/s) and FeNO modeling (a priori log transformation), confirming that these choices are reasonable and have good performance. Lastly, my work can be generalized to other nonlinear models, for example, in the field of pharmacokinetics. In Chapter 3, I identified key predictors and interactions using gradient boosting (GBM) by constructing a reference null distribution out of importance statistics measured under an altered outcome. I demonstrated the bias in existing importance statistics towards predictors with numerous categories. The reference null distribution approach corrects for this bias and has a good ability to detect overall important predictors, predictors with important 2-way interactions and nonlinearities (compared to a traditional regression model). My contribution was a novel implementation of the null distribution approach for detection of 2-way interactions. Compared to an existing approach, my method had much shorter run time and similar (at times better) 2- way interaction discovery performance. In an application to data from 68,541 singleton deliveries in Kaiser Permanente Southern California (KPSC), considering 26 predictors and corresponding 325 pairwise interactions, I retrieved known determinants of gestational diabetes 130 mellitus and identified potentially novel nonlinearities and interactions. I confirmed my GBM findings using an analogous logistic regression, both applied to the training data (comparison of GBM and logistic regression methods) and holdout data (validation/generalization of findings). I did not identify associations with air pollution exposures, which are typically quite small in magnitude. However, due to desirable features of GBM I believe this project offer promising tools for future environmental epidemiological approaches to EMR data. Linkage of environmental exposures (based on residential address) to EMR data is a cost-effective method for studying large populations, complementary to more expensive prospective cohort studies. Methods discussed in this project could also be applied to other machine learning methods besides GBM. Because the reference null distribution method requires numerous estimations of GBMs and importance summary statistics, this method—even when using my novel implementation—is computationally intensive. As computation power increases, application to large datasets will become less burdensome. For now, computational burden can be decreased by data reduction by repeatedly resampling observations of selecting predictors. Future research direction could include: generalizing to higher order interactions, evaluating the impacts of different correlation structure within the data, and exploring how to quantify the “power” of this method. In summary, in this dissertation I developed and applied statistical methods to address a range of statistical challenges in modern environmental epidemiology—from personalized biomarker modeling to big data analytics—with an overarching goal of advancing methods for studying the health effects of air pollution. My novel statistical methods have the potential for applications in other areas, including pharmacokinetics (Chapters 1 and 2) and a wide array of fields (Chapter 3).
Abstract (if available)
Abstract
This dissertation suggests novel methods to address statistical challenges which have been encountered with data being used to detect associations between air pollution exposures and health outcomes in Southern California. The first challenge is to extract maximal information from biomarker data. Nonlinear regression models are composed of a deterministic, physically driven nonlinear equation plus additive random errors. The fractional concentration of exhaled nitric oxide (FeNO) is a biomarker of airway inflammation which is responsive to air pollution exposures. A closed form, steady-state solution model represents multiple flow exhaled FeNO as a nonlinear function of flow rate and NO parameters (FeNO model) and is used to localize airway inflammation (airway vs. alveolar). In Chapter 1, a frequentist approach is used to derive multiple flow exhaled FeNO optimal sampling protocols. In Chapter 2, a Bayesian approach is used to estimate an optimal statistical FeNO model. Methods are applied on Southern California Children's Health Study (CHS), a population-based cohort study of the chronic effects of air pollution on children’s respiratory health. For the second challenge the focus is shifting from personal biomarker measurements to big data analysis. Gradient Boosting Models (GBM) is a, tree-based machine learning method that automatically account for nonlinearities and interactions but are not directly interpretable. In Chapter 3, a machine learning approach is used to identify important predictors and interactions in GBM. Methods are applied on electronic medical records data from hospitals affiliated with Southern California Kaiser Permanente to identify whether gestational diabetes is associated with air pollution exposures and other covariates.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Bayesian models for a respiratory biomarker with an underlying deterministic model in population research
PDF
Genetic variation in inducible nitric oxide synthase promoter, residential traffic related air pollution and exhaled nitric oxide in children
PDF
Comparison of nonlinear mixed effect modeling methods for exhaled nitric oxide
PDF
Airway inflammation and respiratory health in the Southern California children's health study
PDF
Stochastic inference for deterministic systems: normality and beyond
PDF
Evaluation of new methods for estimating exposure to traffic-related pollution and early health effects for large population epidemiological studies
PDF
Bayesian model averaging methods for gene-environment interactions and admixture mapping
PDF
Machine-learning approaches for modeling of complex materials and media
Asset Metadata
Creator
Molshatzki, Noa
(author)
Core Title
Nonlinear modeling and machine learning methods for environmental epidemiology
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Biostatistics
Publication Date
08/14/2019
Defense Date
08/12/2019
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
Air pollution,biomarker,Box-Cox transformation,electronic medical records,exhaled nitric oxide,FeNO,Fisher information matrix,GBM,generalized boosted regression models,machine learning,Markov chain Monte Carlo,MCMC,nonlinear regression,OAI-PMH Harvest,sample design,transform-both-sides
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Eckel, Sandrah (
committee chair
), Gauderman, Jim (
committee member
), Marjoram, Paul (
committee member
), Xiang, Anny (
committee member
)
Creator Email
molshatz@usc.edu,noa.molshatzki@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-212162
Unique identifier
UC11663052
Identifier
etd-Molshatzki-7776.pdf (filename),usctheses-c89-212162 (legacy record id)
Legacy Identifier
etd-Molshatzki-7776.pdf
Dmrecord
212162
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Molshatzki, Noa
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
biomarker
Box-Cox transformation
electronic medical records
exhaled nitric oxide
FeNO
Fisher information matrix
GBM
generalized boosted regression models
machine learning
Markov chain Monte Carlo
MCMC
nonlinear regression
sample design
transform-both-sides