Journal of Data Science, 2020, Vol. 18, Issue (3): 526-535. doi: 10.6339/JDS.202007_18(3).0018


Data Visualization and Descriptive Analysis for Understanding Epidemiological Characteristics of COVID-19: A Case Study of a Dataset from January 22, 2020 to March 29, 2020

Yasin Khadem Charvadeh and Grace Y. Yi 1, 2

  1 Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, Canada
  2 Department of Computer Science, University of Western Ontario, London, Ontario, Canada
  • Online: 2020-07-21 Published: 2020-07-22

Abstract: COVID-19 is a disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was first reported to be spreading among people in December 2019. Understanding the epidemiological features of COVID-19 is important for the ongoing global efforts to contain the virus. As a complement to the available work, in this article we analyze the Kaggle novel coronavirus dataset of 3397 patients, covering January 22, 2020 to March 29, 2020. We employ semiparametric and nonparametric survival models, as well as text mining and data visualization techniques, to examine the clinical manifestations and epidemiological features of COVID-19. Our analysis shows that: (i) the median incubation time is about 5 days, and older people tend to have a longer incubation period; (ii) the median time for infected people to recover is about 20 days, and the recovery time is significantly associated with age but not with gender; (iii) the fatality rate is higher for older infected patients than for younger patients.

Key words: incubation time, recovery time, risk factors, survival analysis, symptom onset, text mining
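
To illustrate how a median time such as the incubation or recovery time can be estimated from censored data, the following is a minimal sketch of the Kaplan-Meier estimator, a standard nonparametric survival estimator. The abstract does not state which nonparametric model the authors used, so the choice of Kaplan-Meier is an assumption, as are the function names `kaplan_meier` and `median_time`. The durations and censoring flags are hypothetical toy values and are not taken from the Kaggle dataset.

```python
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    durations: follow-up time per patient (e.g. days until recovery).
    observed:  True if the event was seen, False if the record is censored.
    """
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all records that share the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if events > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        # Both events and censored records leave the risk set after t.
        n_at_risk -= leaving
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5 (too much censoring)


# Hypothetical toy data: days to event, with two censored records.
toy_days = [2, 3, 4, 5, 5, 6, 7, 9, 12, 14]
toy_observed = [True, True, True, True, False, True, True, False, True, True]

curve = kaplan_meier(toy_days, toy_observed)
median = median_time(curve)
print(curve)
print("estimated median time:", median)
```

Censored records (patients whose event was not observed by the end of the data window) still count in the risk set until their last follow-up time, which is why this estimate can differ from the plain sample median of the observed times.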