Top Read Articles

    Design Considerations for Vaccine Trials with a Special Focus on COVID-19 Vaccine Development
    Jie Chen and Naitee Ting
    Journal of Data Science    2020, 18 (3): 550-580.   DOI: 10.6339/JDS.202007_18(3).0020
    Abstract (305)   PDF (476KB) (320)
    The COVID-19 pandemic has triggered explosive activity in the search for cures, including vaccines against SARS-CoV-2 infection. As of April 30, 2020, there are at least 102 COVID-19 vaccine development programs worldwide, the majority of which are in preclinical development; five are in phase I trials and three are in phase I/II trials. Experts caution against rushing COVID-19 vaccine development, not only because knowledge about SARS-CoV-2 is lacking (albeit rapidly accumulating), but also because vaccine development is a complex, lengthy process with its own rules and timelines. Clinical trials are critically important in vaccine development, usually starting with small-scale phase I trials and gradually moving to the next phases (II and III) after the primary objectives are met. This paper is intended to provide an overview of design considerations for vaccine clinical trials, with a special focus on COVID-19 vaccine development. Given the current pandemic paradigm and the unique features of vaccine development, our recommendations for COVID-19 vaccine trials from a statistical design perspective include: (1) novel trial designs (e.g., master protocols) to expedite the simultaneous evaluation of multiple candidate vaccines or vaccine doses, (2) human challenge studies to accelerate clinical development, (3) adaptive design strategies (e.g., group sequential designs) for early termination due to futility, efficacy, and/or safety, (4) extensive modeling and simulation to characterize and establish long-term efficacy based on early-phase or short-term follow-up data, (5) safety evaluation as one of the primary focuses throughout all phases of clinical trials, (6) leveraging real-world data and evidence in vaccine trial design and analysis to establish vaccine effectiveness, and (7) global collaboration to form a joint development effort for more efficient use of resources and expertise, and for data sharing.
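The abstract recommends group sequential designs for early termination. This is a hedged illustration only and not the authors' design. The sketch evaluates the Lan-DeMets O'Brien-Fleming-type alpha-spending function, alpha(t) = 2 - 2 Phi(z_{1-alpha/2} / sqrt(t)). It shows how little two-sided type I error such a design spends at early interim looks. The function name and the interim information fractions are illustrative assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided type I error spent by information fraction t
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / t ** 0.5)


if __name__ == "__main__":
    # Hypothetical interim looks at 25%, 50%, 75% and 100% of the information.
    for frac in (0.25, 0.5, 0.75, 1.0):
        print(f"t = {frac:.2f}: cumulative alpha spent = {obf_alpha_spent(frac):.5f}")
```

The alpha available at look k is the increment alpha(t_k) - alpha(t_{k-1}). Turning those increments into actual z-boundaries requires the joint distribution of the sequential test statistics, which this sketch leaves out.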
    Gene Set Enrichment Analysis in RNA-Seq Data
    Chen-An Tsai and Pei-Hsun Li
    Journal of Data Science    2020, 18 (4): 632-648.   DOI: 10.6339/JDS.202010_18(4).0003
    Abstract (185)   PDF (674KB) (54)
    Editorial: A reformed Journal of Data Science for the era of data science
    Jun Yan
    Journal of Data Science    2020, 18 (3): 405-406.   DOI: 10.6339/JDS.202007_18(3).0001
    Abstract (154)   PDF (143KB) (122)
    [ Discussion Paper ] An Epidemiological Forecast Model and Software Assessing Interventions on COVID-19 Epidemic in China
    Lili Wang, Yiwang Zhou, Jie He, Bin Zhu, Fei Wang, Lu Tang, Marisa C. Eisenberg, Peter X.K. Song
    Journal of Data Science
    Accepted: 03 April 2020

    Abstract (146)   PDF (774KB) (147)
    We develop a health informatics toolbox that enables timely analysis and evaluation of the time-course dynamics of a range of infectious disease epidemics. As a case study, we examine the novel coronavirus (COVID-19) epidemic using publicly available data from the China CDC. This toolbox is built upon a hierarchical epidemiological model in which two observed time series of daily proportions of infected and removed cases are generated from the underlying infection dynamics governed by a Markov Susceptible-Infectious-Removed (SIR) infectious disease process. We extend the SIR model to incorporate various types of time-varying quarantine protocols, including government-level ‘macro’ isolation policies and community-level ‘micro’ social distancing (e.g., self-isolation and self-quarantine) measures. We develop a calibration procedure for under-reported infected cases. This toolbox provides forecasts, in both online and offline forms, as well as simulations of the overall dynamics of the epidemic. An R software package is made available to the public, and examples of its use are illustrated. Some possible extensions of our novel epidemiological models are discussed.
    [ Discussion Paper ] Tracking Reproductivity of COVID-19 Epidemic in China with Varying Coefficient SIR Model
    Haoxuan Sun, Yumou Qiu, Han Yan, Yaxuan Huang, Yuru Zhu, Jia Gu, Song Xi Chen
    Journal of Data Science
    Accepted: 23 April 2020

    Abstract (135)   PDF (1138KB) (128)
    We propose a varying coefficient Susceptible-Infected-Removal (vSIR) model that allows the infection and removal rates to change over time for the latest coronavirus (COVID-19) outbreak in China. The vSIR model, together with the proposed estimation procedures, allows one to track the reproductivity of COVID-19 through time and to assess the effectiveness of the control measures implemented since January 23, 2020. On that date the city of Wuhan was locked down, and an extremely high level of self-isolation in the population followed. Our study finds that the reproductivity of COVID-19 slowed significantly in the three weeks from January 27 to February 17, with 96.3% and 95.1% reductions in the effective reproduction numbers R among the 30 provinces and 15 Hubei cities, respectively. Predictions of the ending times and the total numbers of infections are made under three scenarios for the removal rates. The paper provides a timely model and associated estimation and prediction methods that may be applied in other countries to track, assess and predict the epidemic of COVID-19 or other infectious diseases.
    Data Visualization and Descriptive Analysis for Understanding Epidemiological Characteristics of COVID-19: A Case Study of a Dataset from January 22, 2020 to March 29, 2020
    Yasin Khadem Charvadeh and Grace Y. Yi
    Journal of Data Science    2020, 18 (3): 526-535.   DOI: 10.6339/JDS.202007_18(3).0018
    Abstract (121)   PDF (475KB) (82)
    COVID-19 is a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that was reported to be spreading among people in December 2019. Understanding the epidemiological features of COVID-19 is important for the ongoing global efforts to contain the virus. As a complement to the available work, in this article we analyze the Kaggle novel coronavirus dataset of 3397 patients dated from January 22, 2020 to March 29, 2020. We employ semiparametric and nonparametric survival models as well as text mining and data visualization techniques to examine the clinical manifestations and epidemiological features of COVID-19. Our analysis shows that: (i) the median incubation time is about 5 days, and older people tend to have a longer incubation period; (ii) the median time for infected people to recover is about 20 days, and the recovery time is significantly associated with age but not with gender; (iii) the fatality rate is higher for older infected patients than for younger patients.
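The abstract reports median incubation and recovery times from nonparametric survival models. Below is a minimal Kaplan-Meier (product-limit) sketch. It assumes right-censored durations in days, with an event indicator of 1 for an observed duration and 0 for a censored one. It is illustrative only and not the authors' code. The function names are hypothetical and the toy data are invented.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: returns (t, S(t)) at each distinct event time.
    events[k] is 1 if the duration was observed and 0 if right-censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    k = 0
    while k < len(data):
        t = data[k][0]
        deaths = censored = 0
        # Group all records tied at time t; censored ties stay in the risk set at t.
        while k < len(data) and data[k][0] == t:
            if data[k][1]:
                deaths += 1
            else:
                censored += 1
            k += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


def km_median(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Invented incubation times in days; 0 marks a censored record.
    days = [2, 3, 4, 4, 5, 6, 7, 9, 11, 14]
    observed = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
    print("median incubation (days):", km_median(kaplan_meier(days, observed)))
```

Covariate effects such as age would require a semiparametric model (for example Cox regression) on top of this, which the sketch does not attempt.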
    An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China (with discussion)
    Lili Wang, Yiwang Zhou, Jie He, Bin Zhu, Fei Wang, Lu Tang, Michael Kleinsasser, Daniel Barker, Marisa C. Eisenberg, and Peter X.K. Song
    Journal of Data Science    2020, 18 (3): 409-432.   DOI: 10.6339/JDS.202007_18(3).0003
    Abstract (115)   PDF (1444KB) (122)
    We develop a health informatics toolbox that enables timely analysis and evaluation of the time-course dynamics of a range of infectious disease epidemics. As a case study, we examine the novel coronavirus (COVID-19) epidemic using publicly available data from the China CDC. This toolbox is built upon a hierarchical epidemiological model in which two observed time series of daily proportions of infected and removed cases are generated from the underlying infection dynamics governed by a Markov Susceptible-Infectious-Removed (SIR) infectious disease process. We extend the SIR model to incorporate various types of time-varying quarantine protocols, including government-level ‘macro’ isolation policies and community-level ‘micro’ social distancing (e.g., self-isolation and self-quarantine) measures. We develop a calibration procedure for under-reported infected cases. This toolbox provides forecasts, in both online and offline forms, as well as simulations of the overall dynamics of the epidemic. An R software package is made available to the public, and examples of its use are illustrated. Some possible extensions of our novel epidemiological models are discussed.
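The abstract extends the SIR model with time-varying quarantine effects. The deterministic core of that idea can be sketched as follows. The sketch assumes the transmission rate is multiplied by a user-supplied modifier pi(t) in [0, 1]. The authors' toolbox is a Bayesian hierarchical model with observation layers and a calibration step, all of which this sketch omits. The function names, parameter values and the step-function pi(t) are hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float]  # proportions (susceptible, infectious, removed)


def sir_with_modifier(beta: float, gamma: float, pi: Callable[[float], float],
                      i0: float, days: int, steps_per_day: int = 10) -> List[State]:
    """Integrate dS/dt = -beta*pi(t)*S*I, dI/dt = beta*pi(t)*S*I - gamma*I,
    dR/dt = gamma*I with forward Euler; returns one state per day."""
    s, i, r = 1.0 - i0, i0, 0.0
    h = 1.0 / steps_per_day
    path = [(s, i, r)]
    for day in range(days):
        for k in range(steps_per_day):
            t = day + k * h
            new_inf = beta * pi(t) * s * i * h  # flow S -> I in this step
            new_rem = gamma * i * h             # flow I -> R in this step
            s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        path.append((s, i, r))
    return path


def no_intervention(t: float) -> float:
    return 1.0


def lockdown_on_day_20(t: float) -> float:
    # Hypothetical 'macro' isolation policy: transmission cut to 30% from day 20 on.
    return 1.0 if t < 20 else 0.3


if __name__ == "__main__":
    base = sir_with_modifier(0.5, 0.2, no_intervention, 1e-4, 200)
    ctrl = sir_with_modifier(0.5, 0.2, lockdown_on_day_20, 1e-4, 200)
    print("peak infectious, no intervention:", round(max(x[1] for x in base), 4))
    print("peak infectious, with lockdown:  ", round(max(x[1] for x in ctrl), 4))
```

Multiplying beta by pi(t), rather than editing beta directly, keeps the intrinsic transmissibility separate from the policy effect. Each quarantine protocol can then be expressed as its own pi(t).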
    Meta-Analysis of Several Epidemic Characteristics of COVID-19
    Panpan Zhang, Tiandong Wang, and Sharon X. Xie
    Journal of Data Science    2020, 18 (3): 536-549.   DOI: 10.6339/JDS.202007_18(3).0019
    Abstract (107)   PDF (376KB) (57)
    As the COVID-19 pandemic has strongly disrupted people’s daily work and life, a great amount of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics of COVID-19, namely the basic reproduction number, the incubation period, the serial interval and the epidemic doubling time. We collect relevant studies based on COVID-19 data from China and conduct a meta-analysis to obtain pooled estimates of the four metrics. From the summary results, we conclude that COVID-19 has stronger transmissibility than SARS, implying that stringent public health strategies are necessary.
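The paper pools published estimates of four epidemic metrics. The abstract does not say which pooling model was used. The sketch below therefore shows one common choice, the DerSimonian-Laird random-effects estimator, and assumes each study supplies a point estimate and its sampling variance. The function name is illustrative and the inputs are invented, not values from the paper.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(estimates: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pooled estimate, its standard error, tau^2) under the
    DerSimonian-Laird random-effects model."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                 # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    # Invented study-level estimates and variances, for illustration only.
    est, se, tau2 = dersimonian_laird([2.2, 3.1, 2.6, 3.8], [0.04, 0.09, 0.02, 0.16])
    print(f"pooled = {est:.3f}, 95% CI = ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}), tau^2 = {tau2:.4f}")
```

When tau^2 is estimated as zero, the random-effects weights reduce to the fixed-effect weights. For ratio-type quantities, pooling is often done on the log scale instead.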
    COVID-19 Fatality: A Cross-Sectional Study using Adaptive Lasso Penalized Sliced Inverse Regression
    Kaida Cai, Wenqing He, and Grace Y. Yi
    Journal of Data Science    2020, 18 (3): 483-494.   DOI: 10.6339/JDS.202007_18(3).0015
    Abstract (106)   PDF (406KB) (60)
    Coronavirus disease 2019 (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus 2, was declared a global pandemic by the World Health Organization on March 11, 2020. In this work, we conduct a cross-sectional study to investigate how the infection fatality rate (IFR) of COVID-19 may be associated with possible geographic or demographic features of the infected population. We employ a multiple index model in combination with sliced inverse regression to model the relationship between the IFR and possible risk factors. To select features associated with the infection fatality rate, we utilize an adaptive Lasso penalized sliced inverse regression method, which achieves variable selection and sufficient dimension reduction simultaneously, with unimportant features removed automatically. We apply the proposed method to conduct a cross-sectional study of COVID-19 data obtained at two time points of the outbreak.
    Editorial: Data science in action in response to the outbreak of COVID-19
    Dean Follmann, Peter X. K. Song, Hansheng Wang, and Jun Yan
    Journal of Data Science    2020, 18 (3): 407-408.   DOI: 10.6339/JDS.202007_18(3).0002
    Abstract (94)   PDF (195KB) (93)
    Tracking Reproductivity of COVID-19 Epidemic in China with Varying Coefficient SIR Model
    Haoxuan Sun, Yumou Qiu, Han Yan, Yaxuan Huang, Yuru Zhu, Jia Gu, and Song Xi Chen
    Journal of Data Science    2020, 18 (3): 455-472.   DOI: 10.6339/JDS.202007_18(3).0010
    Abstract (94)   PDF (894KB) (276)
    We propose a varying coefficient Susceptible-Infected-Removal (vSIR) model that allows the infection and removal rates to change over time for the latest coronavirus (COVID-19) outbreak in China. The vSIR model, together with the proposed estimation procedures, allows one to track the reproductivity of COVID-19 through time and to assess the effectiveness of the control measures implemented since January 23, 2020. On that date the city of Wuhan was locked down, and an extremely high level of self-isolation in the population followed. Our study finds that the reproductivity of COVID-19 slowed significantly in the three weeks from January 27 to February 17, with 96.3% and 95.1% reductions in the effective reproduction numbers R among the 30 provinces and 15 Hubei cities, respectively. Predictions of the ending times and the total numbers of infections are made under three scenarios for the removal rates. The paper provides a timely model and associated estimation and prediction methods that may be applied in other countries to track, assess and predict the epidemic of COVID-19 or other infectious diseases.
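The vSIR model lets the infection rate beta(t) and the removal rate gamma(t) vary over time. A crude plug-in version of that idea recovers beta_t and gamma_t from daily totals by forward differences. It assumes daily counts of susceptible (S), infected (I) and removed (R) individuals in a closed population of size N, following the discrete SIR recursions stated in the docstring. The paper's own estimation procedures are more refined; this is not them. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def plug_in_rates(S: Sequence[float], I: Sequence[float], R: Sequence[float],
                  N: float) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: for each day t, return (beta_t, gamma_t, effective
    reproduction number) implied by the discrete SIR recursions
        S[t+1] = S[t] - beta_t * S[t] * I[t] / N
        R[t+1] = R[t] + gamma_t * I[t]."""
    out = []
    for t in range(len(S) - 1):
        beta_t = (S[t] - S[t + 1]) * N / (S[t] * I[t])
        gamma_t = (R[t + 1] - R[t]) / I[t]
        r_eff = beta_t * S[t] / (N * gamma_t)  # equals new infections / new removals
        out.append((beta_t, gamma_t, r_eff))
    return out


def simulate(beta: float, gamma: float, N: float, i0: float, days: int):
    """Hypothetical helper: synthetic daily totals from the same recursions."""
    S, I, R = [N - i0], [i0], [0.0]
    for _ in range(days):
        new_inf = beta * S[-1] * I[-1] / N
        new_rem = gamma * I[-1]
        S.append(S[-1] - new_inf)
        I.append(I[-1] + new_inf - new_rem)
        R.append(R[-1] + new_rem)
    return S, I, R


if __name__ == "__main__":
    S, I, R = simulate(0.4, 0.1, 1e6, 10.0, 30)
    for day, (b, g, r) in enumerate(plug_in_rates(S, I, R, 1e6)[:3]):
        print(f"day {day}: beta = {b:.3f}, gamma = {g:.3f}, R_eff = {r:.2f}")
```

With real reported counts these raw ratios are very noisy. That noise is the motivation for smoothing the coefficients over time, as a varying coefficient model does.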
    A Meta Analysis for the Basic Reproduction Number of COVID-19 with Application in Evaluating the Effectiveness of Isolation Measures in Different Countries
    Jianghu (James) Dong, Yongdao Zhou, Ying Zhang, Thomas Flaherty, and Douglas Franz
    Journal of Data Science    2020, 18 (3): 496-510.   DOI: 10.6339/JDS.202007_18(3).0016
    Abstract (86)   PDF (1163KB) (69)
    COVID-19 is quickly spreading around the world and carries with it a significant threat to public health. This study applies meta-analysis to estimate the basic reproduction number (R0) more accurately, because prior estimates of R0 in the existing literature range broadly, from 1.95 to 6.47. Using meta-analysis techniques, we obtain a more robust estimate of R0, which is substantially larger than that provided by the World Health Organization (WHO). A susceptible-infectious-removed (SIR) model for new infection cases, based on the R0 from the meta-analysis, is proposed to estimate the effective reproduction number Rt. The curves of estimated Rt over time illustrate that the isolation measures enforced in China and South Korea were substantially more effective in controlling COVID-19 than the measures enacted early in both Italy and the United States. Finally, we present the daily standardized infection cases per million population over time across countries, which is a good index of the effectiveness of isolation measures in preventing COVID-19. This standardized infection count shows whether the current infection severity is beyond the national health capacity to care for patients.
    A New Generalized Lomax Model: Statistical Properties And Applications
    Mohamed Ibrahim and Haitham M. Yousof
    Journal of Data Science    2020, 18 (1): 190-217.   DOI: 10.6339/JDS.202001_18(1).0010
    Abstract (81)   PDF (1888KB) (43)
    In this paper, a new version of the Poisson Lomax distribution is proposed and studied. The new density is expressed as a linear mixture of Lomax densities. The failure rate function of the new model can be increasing-constant, increasing, U-shaped, decreasing, or upside-down-increasing. The statistical properties are derived, and four applications are provided to illustrate the importance of the new density. The method of maximum likelihood is used to estimate the unknown parameters of the new density. The new model provides an adequate fit.
    Application Of Statistical Control Charts To Detect Unusual Frequency Of Earthquake In The World
    Fariha Taskin, Mohammad Shahed Masud
    Journal of Data Science    2020, 18 (1): 44-55.   DOI: 10.6339/JDS.202001_18(1).0002
    Abstract (81)   PDF (437KB) (84)
    Earthquake activity has increased tremendously in recent years. This paper outlines an evaluation of Cumulative Sum (CUSUM) and Exponentially Weighted Moving Average (EWMA) charting techniques to determine whether the worldwide frequency of earthquakes is unusual. The worldwide frequency of earthquakes is considered over the period 1973 to 2016. Because the data are autocorrelated, a regular control chart such as the Shewhart control chart cannot be used to detect unusual earthquake frequency. An approach that has proved useful for autocorrelated data is to fit a time series model, such as an Autoregressive Integrated Moving Average (ARIMA) model, directly and to apply control charts to its residuals. The EWMA and CUSUM control charts detected unusual earthquake frequencies in 2012 and 2013, years in which the process is statistically out of control.
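The abstract applies EWMA and CUSUM charts to the residuals of a fitted time series model because the raw counts are autocorrelated. Below is a minimal sketch of that residual-chart idea. It assumes an AR(1) model fitted by least squares as a stand-in for the general ARIMA model used in the paper. It uses the usual time-varying EWMA limits with smoothing constant lambda = 0.2 and width L = 3; both values are hypothetical choices. The function names are illustrative.

```python
import statistics
from typing import List, Sequence, Tuple


def ar1_residuals(x: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]; returns (c, phi, residuals)."""
    lag, cur = list(x[:-1]), list(x[1:])
    m_lag, m_cur = statistics.fmean(lag), statistics.fmean(cur)
    sxy = sum((a - m_lag) * (b - m_cur) for a, b in zip(lag, cur))
    sxx = sum((a - m_lag) ** 2 for a in lag)
    phi = sxy / sxx
    c = m_cur - phi * m_lag
    return c, phi, [b - (c + phi * a) for a, b in zip(lag, cur)]


def ewma_signals(e: Sequence[float], sigma: float, lam: float = 0.2,
                 width: float = 3.0, target: float = 0.0) -> List[int]:
    """0-based indices where the EWMA of e falls outside its time-varying limits
    target +/- width * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2i)))."""
    z, flagged = target, []
    for i, value in enumerate(e, start=1):
        z = lam * value + (1.0 - lam) * z
        half = width * sigma * (lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))) ** 0.5
        if abs(z - target) > half:
            flagged.append(i - 1)
    return flagged


if __name__ == "__main__":
    # Invented yearly counts with a late upward shift, for illustration only.
    counts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20, 19, 21, 22, 30, 33, 31]
    _, _, res = ar1_residuals(counts)
    print("flagged residual positions:", ewma_signals(res, statistics.stdev(res)))
```

Estimating sigma from the same residuals that contain the shift inflates the limits. In practice, a phase I period without the suspected shift is often used to set sigma.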
    Discussion of “An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China”
    Shannon Gallagher
    Journal of Data Science    2020, 18 (3): 437-437.   DOI: 10.6339/JDS.202007_18(3).0005
    Abstract (81)   PDF (359KB) (41)
    The Poisson Burr X Inverse Rayleigh Distribution And Its Applications
    Rania H. M. Abdelkhalek
    Journal of Data Science    2020, 18 (1): 56-77.   DOI: 10.6339/JDS.202001_18(1).0003
    Abstract (69)   PDF (931KB) (48)
    A new flexible extension of the inverse Rayleigh model is proposed and studied. Some of its fundamental statistical properties are derived. We assess the performance of the maximum likelihood method via a simulation study. The importance of the new model is shown via three applications to real data sets. The new model performs much better than other important competing models.
    On Identification of High Risk Carriers of COVID-19 Using Masked Mobile Device Data 
    Da Huang, Xuening Zhu, Weidong Luo, Hao Yin, Jing Hong, Yu Chen, Jing Zhou, and Hansheng Wang
    Journal of Data Science    2020, 18 (Volume S1): 3-.   DOI: 10.6339/JDS.202012_18(S1).0002
    Abstract (69)   PDF (289KB) (29)
    On The Estimation Of The Shape Parameter Of A Symmetric Distribution
    Jennifer S.K. Chan, S.T. Boris Choy and Stephen G. Walker
    Journal of Data Science    2020, 18 (1): 78-100.   DOI: 10.6339/JDS.202001_18(1).0004
    Abstract (68)   PDF (1563KB) (32)
    The shape parameter of a symmetric probability distribution is often more difficult to estimate accurately than the location and scale parameters. In this paper, we suggest an intuitive but innovative matching quantile estimation method for this parameter. The proposed shape parameter estimate is obtained by setting its value to a level such that the central 1-1/n portion of the distribution just covers all n observations, while the location and scale parameters are estimated using existing methods such as maximum likelihood (ML). This hybrid estimator is proved to be consistent and is illustrated with two distributions, namely the Student-t and the Exponential Power. Simulation studies show that the hybrid method provides reasonably accurate estimates. In the presence of extreme observations, this method provides thicker tails than the full ML method and protects inference on the location and scale parameters. This feature of the hybrid method is also demonstrated in an empirical study using two real data sets.
    Discussion of “An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China”
    Tianjian Zhou and Yuan Ji
    Journal of Data Science    2020, 18 (3): 440-442.   DOI: 10.6339/JDS.202007_18(3).0007
    Abstract (68)   PDF (448KB) (46)   PDF (mobile) (448KB) (1)
    Rejoinder: An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China
    Lili Wang, Yiwang Zhou, Jie He, Bin Zhu, Fei Wang, Lu Tang, Michael Kleinsasser, Daniel Barker, Marisa C. Eisenberg, and Peter X.K. Song
    Journal of Data Science    2020, 18 (3): 446-454.   DOI: 10.6339/JDS.202007_18(3).0009
    Abstract (68)   PDF (1721KB) (96)
    Assessing the Impacts of Mutations to the Structure of COVID-19 Spike Protein via Sequential Monte Carlo
    Samuel W. K. Wong
    Journal of Data Science    2020, 18 (3): 511-525.   DOI: 10.6339/JDS.202007_18(3).0017
    Abstract (67)   PDF (3056KB) (34)
    Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. In this paper, we use a data science perspective to study the potential structural impacts due to ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this protein structure and complement laboratory efforts.
    Discussion of “An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China”
    Debangan Dey and Vadim Zipunnikov
    Journal of Data Science    2020, 18 (3): 433-436.   DOI: 10.6339/JDS.202007_18(3).0004
    Abstract (64)   PDF (474KB) (99)
    Statistical Inference For A Simple Step–Stress Model With Type–II Hybrid Censored Data From The Kumaraswamy Weibull Distribution
    R. E. Ibrahim and H. E. Semary
    Journal of Data Science    2020, 18 (1): 132-147.   DOI: 10.6339/JDS.202001_18(1).0007
    Abstract (63)   PDF (487KB) (42)
    In reliability and life-testing experiments, the researcher is often interested in the effects of extreme or varying stress factors on the lifetimes of experimental units. In this paper, a step-stress model is considered in which the life-testing experiment is terminated either at a pre-fixed time (say, Tm+1) or at a random time ensuring at least a specified number of failures (say, y out of n). Under this model, in which the data obtained are Type-II hybrid censored, the Kumaraswamy Weibull distribution is used for the underlying lifetimes. The maximum likelihood estimators (MLEs) of the parameters, assuming a cumulative exposure model, are derived. Confidence intervals for the parameters are also obtained. The hazard rate and reliability functions are estimated under the usual stress conditions. A Monte Carlo simulation is carried out to investigate the precision of the maximum likelihood estimates. An application to real data illustrates the properties of the maximum likelihood estimators.
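The abstract assumes a cumulative exposure model (CEM) for a simple step-stress test. Under a CEM with a stress change at time tau, the lifetime CDF is F1(t) for t < tau and F2(t - tau + s) for t >= tau. Here s solves F2(s) = F1(tau), so the exposure accumulated at the first stress level carries over. Below is a sketch under simplifying assumptions. It uses plain Weibull lifetimes at each stress level rather than the Kumaraswamy Weibull of the paper, and leaves out the censoring scheme and estimation entirely. Function names and parameter values are hypothetical.

```python
import math


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Weibull CDF 1 - exp(-(t/scale)^shape), zero for t <= 0."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def weibull_ppf(p: float, shape: float, scale: float) -> float:
    """Inverse of weibull_cdf for p in [0, 1)."""
    return scale * (-math.log1p(-p)) ** (1.0 / shape)


def step_stress_cdf(t: float, tau: float, shape1: float, scale1: float,
                    shape2: float, scale2: float) -> float:
    """CDF under a simple step-stress cumulative exposure model with
    Weibull(shape1, scale1) before tau and Weibull(shape2, scale2) after."""
    if t < tau:
        return weibull_cdf(t, shape1, scale1)
    # Equivalent age s at the higher stress that matches the exposure F1(tau).
    s = weibull_ppf(weibull_cdf(tau, shape1, scale1), shape2, scale2)
    return weibull_cdf(t - tau + s, shape2, scale2)


if __name__ == "__main__":
    # Hypothetical test: stress raised at tau = 10, higher stress shortens scale 20 -> 5.
    for t in (5.0, 10.0, 12.0, 15.0):
        print(f"F({t}) = {step_stress_cdf(t, 10.0, 2.0, 20.0, 2.0, 5.0):.4f}")
```

The equivalent-age shift makes the CDF continuous at tau. This continuity is the defining property of the cumulative exposure assumption.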
    Discussion of “Tracking reproductivity of COVID-19 epidemic in China with varying coefficient SIR model”
    Yukang Jiang, Jianbin Tan, Ting Tian, and Xueqin Wang
    Journal of Data Science    2020, 18 (3): 473-474.   DOI: 10.6339/JDS.202007_18(3).0011
    Abstract (63)   PDF (457KB) (36)
    Subsampled Data Based Alternative Regularized Estimators 
    Subir Ghosh, Gabriel Ruiz, and Brandon Wales
    Journal of Data Science    2020, 18 (2): 238-256.   DOI: 10.6339/JDS.202004_18(2).0002
    Abstract (61)   PDF (308KB) (48)
    In this paper, subsampling of the data is used as a way to learn about the influence of individual data points on inference for the parameters of a fitted logistic regression model. The alternative, alternative regularized, alternative regularized lasso, and alternative regularized ridge estimators are proposed for the parameter estimation of logistic regression models and are then compared with the maximum likelihood estimators. The proposed alternative regularized estimators are obtained by using a tuning parameter, whereas the proposed alternative estimators are not regularized. The proposed alternative regularized lasso estimators are standard lasso estimators averaged over subsets of groups, and the alternative regularized ridge estimators are likewise standard ridge estimators averaged over those subsets, where the number of subsets could be smaller than the number of parameters. The values of the tuning parameters are obtained so as to make the alternative regularized estimators very close to the maximum likelihood estimators. The process is explained with two real data sets as well as a simulation study. The alternative and alternative regularized estimators always have closed-form expressions in terms of the observations, which the maximum likelihood estimators lack. When the maximum likelihood estimators do not have closed-form expressions, the alternative regularized estimators thus obtained provide approximate closed-form expressions for them.
    Discussion of “Tracking reproductivity of COVID-19 epidemic in China with varying coefficient SIR model”
    Lu Tang
    Journal of Data Science    2020, 18 (3): 475-476.   DOI: 10.6339/JDS.202007_18(3).0012
    Abstract (61)   PDF (390KB) (38)
    Four Parameters Kumaraswamy Reciprocal Family Of Distributions
    Salma Omar Bleed
    Journal of Data Science    2020, 18 (1): 101-114.   DOI: 10.6339/JDS.202001_18(1).0005
    Abstract (59)   PDF (563KB) (22)
    In this paper, the Kumaraswamy reciprocal family of distributions is introduced as a new continuous model that approximates other probability models such as the reciprocal, beta, uniform, power function, exponential, negative exponential, Weibull, Rayleigh and Pareto distributions. Some fundamental distributional properties are studied, including the force of mortality, Mills ratio, Bowley skewness, Moors kurtosis, reversed hazard function, integrated hazard function, mean residual life, probability weighted moments, Bonferroni and Lorenz curves, and the Laplace-Stieltjes transform of the new distribution, together with maximum likelihood estimation of the parameters. Finally, four previously presented real data sets are used to illustrate the proposed estimators.
    Discussion of “An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China”
    Yifan Zhu and Ying Qing Chen
    Journal of Data Science    2020, 18 (3): 443-445.   DOI: 10.6339/JDS.202007_18(3).0008
    Abstract (59)   PDF (410KB) (38)
    Rejoinder: “Tracking Reproductivity of COVID-19 Epidemic with Varying Coefficient SIR Model”
    Haoxuan Sun, Yumou Qiu, Han Yan, Yaxuan Huang, Yuru Zhu, Jia Gu, and Song Xi Chen
    Journal of Data Science    2020, 18 (3): 480-482.   DOI: 10.6339/JDS.202007_18(3).0014
    Abstract (59)   PDF (488KB) (49)
    Discussion of “An epidemiological forecast model and software assessing interventions on the COVID-19 epidemic in China”
    Kelly R. Moran
    Journal of Data Science    2020, 18 (3): 438-439.   DOI: 10.6339/JDS.202007_18(3).0006
    Abstract (58)   PDF (345KB) (41)
    Quantifying Disease Severity Of Cystic Fibrosis Using Quantile Regression Methods
    Kameryn Denaro, Barbara A. Bailey, Douglas J. Conrad
    Journal of Data Science    2020, 18 (1): 148-160.   DOI: 10.6339/JDS.202001_18(1).0008
    Abstract (57)   PDF (421KB) (35)
    This article presents a classification of disease severity for patients with cystic fibrosis (CF). CF is a genetic disease that dramatically decreases life expectancy and quality of life. The disease is characterized by polymicrobial infections that lead to lung remodeling and airway mucus plugging. To quantify the disease severity of CF patients and compute a continuous severity index, quantile regression, rank scores, and corresponding normalized ranks are calculated for CF patients. Based on the rank scores calculated from the set of quantile regression models, a continuous severity index is computed for each CF patient and can be considered a robust estimate of CF disease severity.
    Discussion of “Tracking reproductivity of COVID-19 epidemic in China with varying coefficient SIR model”
    Lili Wang, Fei Wang, Yiwang Zhou, and Peter X.K. Song
    Journal of Data Science    2020, 18 (3): 477-479.   DOI: 10.6339/JDS.202007_18(3).0013
    Abstract (57)   PDF (514KB) (34)
    Editorial: Data Science in Action in Response to the Outbreak of COVID-19 in China
    Dean Follmann, Peter X. K. Song, Hansheng Wang, and Jun Yan
    Journal of Data Science   DOI: 10.6339/JDS.202007_18(S1).0001
    Abstract (55)   PDF (167KB) (53)
    The Real-time Effect of Public Health Interventions on the COVID-19 Epidemic in Hubei Province 
    Jiamin Liu, Ze Chen, Jianqiang Zhang, Yanyan Ouyang, Xu Guo, and Wangli Xu
    Journal of Data Science    2020, 18 (Volume S1): 61-.   DOI: 10.6339/JDS.202012_18(S1).0006
    Abstract (55)   PDF (643KB) (45)
    Incorporating Design Weights And Historical Data Into Model-Based Small-Area Estimation
    Hui Xie, Lawrence E. Barker and Deborah B. Rolka
    Journal of Data Science    2020, 18 (1): 115-131.   DOI: 10.6339/JDS.202001_18(1).0006
    Abstract (53)   PDF (689KB) (36)
    Bayesian hierarchical regression (BHR) is often used in small area estimation (SAE). BHR conditions on the samples. Therefore, when data are from a complex sample survey, neither the survey sampling design nor the survey weights are used. This can introduce bias and/or cause large variance. Further, if non-informative priors are used, BHR often requires combining multiple years of data to produce sample sizes that yield adequate precision; this can result in poor timeliness and can obscure trends. To address bias and variance, we propose a design-assisted, model-based approach for SAE that integrates adjusted sample weights. To address timeliness, we use historical data to define informative priors (power priors); this allows estimates to be derived from a single year of data. Using American Community Survey (ACS) data for validation, we applied the proposed method to Behavioral Risk Factor Surveillance System data and estimated the prevalence of disability for all U.S. counties. We show that our method can produce estimates that are both more timely than those from widely used alternatives and closer to the ACS direct estimates, particularly for low-data counties. Our method can be generalized to estimate the county-level prevalence of other health-related measures.
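The abstract uses historical data to build an informative power prior. In the simplest conjugate case, the prevalence is binomial with an initial Beta(a, b) prior. Raising the historical likelihood to a power delta in [0, 1] then gives a Beta posterior in closed form. Here delta = 0 discards the history and delta = 1 pools it fully. The sketch assumes only that toy setting, with no survey weights and no hierarchical structure, unlike the paper. The function names and counts are hypothetical.

```python
from typing import Tuple


def power_prior_posterior(y: int, n: int, y0: int, n0: int, delta: float,
                          a: float = 1.0, b: float = 1.0) -> Tuple[float, float]:
    """Beta posterior parameters for a binomial proportion when the historical
    likelihood (y0 successes out of n0) is discounted by the power delta."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    alpha_post = a + delta * y0 + y
    beta_post = b + delta * (n0 - y0) + (n - y)
    return alpha_post, beta_post


def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Invented counts: current year 12 of 80 respondents, history 150 of 1000.
    for d in (0.0, 0.5, 1.0):
        al, be = power_prior_posterior(12, 80, 150, 1000, d)
        print(f"delta = {d}: posterior mean = {beta_mean(al, be):.4f}")
```

The power delta effectively down-weights the historical sample to delta * n0 observations. This is how borrowing strength from past years can shorten the data window needed for adequate precision.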
    Extended Poisson-Frechet Distribution: Mathematical Properties and Applications to Survival and Repair Times
    M. S. Hamed
    Journal of Data Science    2020, 18 (2): 319-342.   DOI: 10.6339/JDS.202004_18(2).0006
    Abstract (52)   PDF (859KB) (30)
    In this paper, a new four-parameter zero-truncated Poisson Frechet distribution is defined and studied. Various structural mathematical properties of the proposed model, including ordinary moments, incomplete moments, generating functions, order statistics, and residual and reversed residual life functions, are investigated. The maximum likelihood method is used to estimate the model parameters. We assess the performance of the maximum likelihood method by means of a numerical simulation study. The new distribution is applied to two real data sets to illustrate its flexibility empirically.
    Log-Weighted Pareto Distribution And Its Statistical Properties
    Rasha Mohamed Mandouh, Mahmoud Abdel-ghaffar Mohamed
    Journal of Data Science    2020, 18 (1): 161-189.   DOI: 10.6339/JDS.202001_18(1).0009
    Abstract views: 51 | PDF (1112 KB) downloads: 127
    The Pareto distribution is a power-law probability distribution used to describe social-scientific, geophysical, actuarial, and many other types of observable phenomena. A new weighted Pareto distribution is proposed using a logarithmic weight function. Several statistical properties of the weighted Pareto distribution are derived and studied, including the cumulative distribution function; location measures such as the mode, median and mean; reliability measures such as the reliability function, the hazard and reversed hazard functions and the mean residual life; moments; shape indices such as the skewness and kurtosis coefficients; and order statistics. The distribution parameters are estimated using three methods: maximum likelihood, L-moments and the method of moments. A numerical simulation is carried out to validate the robustness of the proposed distribution, and the distribution is fitted to a real data set to show its importance in real-life applications.
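    As a sketch of the weighting construction, assume the logarithmic weight is w(x) = ln x (the abstract does not give the exact weight) and a Pareto baseline with shape α and scale x_m ≥ 1, so the weight is non-negative on the support. The weighted density is f_w(x) = w(x) f(x) / E[w(X)]. Because ln(X/x_m) is exponential with rate α for a Pareto variable, E[ln X] = ln x_m + 1/α, which gives a closed-form normalizer. The function below is hypothetical and illustrates this construction only.

```python
import math


def log_weighted_pareto_pdf(x, alpha, xm):
    """Pareto(alpha, xm) density reweighted by the assumed weight w(x) = ln(x).

    Requires xm >= 1 so that ln(x) >= 0 on the support x >= xm.
    """
    if xm < 1:
        raise ValueError("xm must be >= 1 so that ln(x) >= 0 on the support")
    if x < xm:
        return 0.0
    pareto_pdf = alpha * xm ** alpha / x ** (alpha + 1)
    # ln(X / xm) ~ Exponential(alpha), hence E[ln X] = ln(xm) + 1/alpha.
    normalizer = math.log(xm) + 1.0 / alpha
    return math.log(x) * pareto_pdf / normalizer
```

    Substituting x = x_m e^t turns the normalization integral into ∫(ln x_m + t) α e^(-αt) dt over t ≥ 0, which equals ln x_m + 1/α and confirms that f_w integrates to 1.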
    Investigating the Repeatability of the Extracted Factors in Relation to the Type of Rotation Used, and the Level of Random Error: A Simulation Study
    Dimitris Panaretos, George Tzavelas, Malvina Vamvakari, Demosthenes Panagiotakos
    Journal of Data Science    2020, 18 (2): 390-404.   DOI: 10.6339/JDS.202004_18(2).0010
    Abstract views: 48 | PDF (377 KB) downloads: 28
    Factor analysis (FA) is the most commonly used pattern recognition methodology in social and health research. One technique that may help FA better retrieve the true information is rotation of the information axes. The purpose of this study was to evaluate whether the choice of rotation type affects the repeatability of the patterns derived from FA under various levels of introduced random error, using simulated data from the standard normal distribution. Applying the non-orthogonal promax rotation produced more repeatable results than orthogonal rotation, irrespective of the level of random error introduced into the model.
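    The abstract does not say how repeatability was measured. One common measure, assumed here, is Tucker's congruence coefficient between loading vectors from two replicate analyses, φ = Σ x_i y_i / sqrt(Σ x_i² Σ y_i²). The sketch below computes it and matches each reference factor to its best-matching replicate factor, taking absolute values because factor sign and order are arbitrary after extraction and rotation. The function names are hypothetical.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def match_factors(ref, rep):
    """Best |congruence| of each reference factor against the replicate factors.

    ref and rep are lists of loading vectors, one vector per factor.
    """
    return [max(abs(congruence(f, g)) for g in rep) for f in ref]
```

    Averaging `match_factors` over many simulated replicates, separately for orthogonal and promax solutions, is one way to turn "repeatability" into a number that can be compared across rotation types.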
    Zografos Balakrishnan Power Lindley Distribution
    Noor Shahid, Rashida Khalil and Javeria Khokhar
    Journal of Data Science    2020, 18 (2): 279-298.   DOI: 10.6339/JDS.202004_18(2).0004
    Abstract views: 47 | PDF (701 KB) downloads: 43
    In this paper, the Zografos Balakrishnan Power Lindley (ZB-PL) distribution is obtained by generalizing the Power Lindley distribution with the technique of Zografos and Balakrishnan (2009). The density of upper record values arises as a special case of this technique. The probability density function (pdf), cumulative distribution function (cdf) and hazard rate function (hrf) of the proposed distribution are obtained. The pdf and cdf are expanded as linear combinations of the density and distribution functions of the Exponentiated Power Lindley (EPL) distribution, and this expansion is used to study further properties of the new distribution. Mathematical and statistical properties such as asymptotes, the quantile function, moments, the moment generating function (mgf), mean deviation, Rényi entropy and reliability are also discussed. The pdf, cdf and hrf are presented graphically for different parameter values. Finally, the maximum likelihood method is used to estimate the unknown parameters, and an application to a real data set is provided. The proposed distribution is observed to provide a better fit than many other useful distributions for the given data set.
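    As a sketch of the Zografos-Balakrishnan construction, assume the Power Lindley baseline with shape α and scale β, density g(x) = αβ²/(β + 1) (1 + x^α) x^(α-1) e^(-βx^α) and survival S(x) = (1 + βx^α/(β + 1)) e^(-βx^α). The ZB generator with shape a > 0 gives f(x) = g(x) [-log S(x)]^(a-1) / Γ(a); for integer a = n this is the density of the n-th upper record value. The parameter names and functions below are assumed notation, not necessarily the paper's.

```python
import math


def power_lindley_pdf(x, alpha, beta):
    """Power Lindley density g(x) for x > 0."""
    if x <= 0:
        return 0.0
    xa = x ** alpha
    return (alpha * beta ** 2 / (beta + 1)) * (1 + xa) * x ** (alpha - 1) * math.exp(-beta * xa)


def power_lindley_cum_hazard(x, alpha, beta):
    """-log S(x), with S(x) = (1 + beta*x^alpha/(beta+1)) * exp(-beta*x^alpha)."""
    xa = x ** alpha
    return beta * xa - math.log1p(beta * xa / (beta + 1))


def zb_power_lindley_pdf(x, a, alpha, beta):
    """ZB generator: f(x) = g(x) * [-log S(x)]^(a-1) / Gamma(a), shape a > 0."""
    if x <= 0:
        return 0.0
    h = power_lindley_cum_hazard(x, alpha, beta)
    return power_lindley_pdf(x, alpha, beta) * h ** (a - 1) / math.gamma(a)
```

    With `a = 1` the generator returns the baseline density unchanged, and for other values of `a` a numerical integral of `zb_power_lindley_pdf` over (0, ∞) should be close to 1.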
    Statistical Inference for K Exponential Populations Under Joint Progressive Type-I Censored Scheme
    O.E. Abo-Kasem and Mazen Nassar
    Journal of Data Science    2020, 18 (2): 376-389.   DOI: 10.6339/JDS.202004_18(2).0009
    Abstract views: 45 | PDF (478 KB) downloads: 17
    In this article, the maximum likelihood estimators of the parameters of k independent exponential populations are obtained under a joint progressive type-I censoring (JPC-I) scheme. Bayes estimators are also obtained under three different loss functions. Approximate confidence intervals, two bootstrap confidence intervals and Bayes credible intervals for the unknown parameters are discussed. A simulated data set and a real data set are analyzed to illustrate the theoretical results.
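    Parameterizing each exponential population by its mean θ_j (an assumption; the abstract does not fix the parameterization), each unit contributes its density if it fails and its survival function if it is removed or still on test at the termination time. The likelihood for population j is therefore proportional to θ_j^(-m_j) exp(-TTT_j/θ_j), where m_j is the number of observed failures and TTT_j the total time on test, so the MLE is θ̂_j = TTT_j/m_j whenever m_j ≥ 1. The sketch below assumes a hypothetical record format, one (population, time, failed) tuple per unit, and covers only the MLE step, not the Bayes or interval estimates.

```python
from collections import defaultdict


def exponential_mles(records):
    """MLEs of exponential means theta_j from right-censored unit records.

    records: iterable of (population, time, failed) tuples, one per unit, where
    time is the failure time (failed=True) or the removal/termination time
    (failed=False). theta_j_hat = total time on test / number of failures.
    """
    ttt = defaultdict(float)    # total time on test per population
    failures = defaultdict(int)  # observed failures per population
    for pop, time, failed in records:
        ttt[pop] += time
        if failed:
            failures[pop] += 1
    estimates = {}
    for pop in ttt:
        # The MLE does not exist for a population with no observed failures.
        estimates[pop] = ttt[pop] / failures[pop] if failures[pop] > 0 else None
    return estimates
```

    Populations with no observed failures are returned as `None`, reflecting that the likelihood has no finite maximizer in that case.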