Health Data Science - PHD








The PhD Program in Health Data Science trains the next generation of data science leaders for applications in public health and medicine. The program advances future leaders in health and biomedical data science by: (i) providing rigorous training in the fundamentals of health and biomedical data science, (ii) fostering innovative thinking for the design, conduct, analysis, and reporting of public health research studies, and (iii) providing practical training through real-world research opportunities at research centers and institutes directed by departmental faculty such as the Biostatistics Center (BSC), the Computational Biology Institute (CBI), and the Biostatistics and Epidemiology Consulting Service (BECS).

The PhD program consists of two concentrations: Biostatistics and Bioinformatics. Biostatistics is the science of designing, conducting, analyzing, and reporting studies aimed at advancing public health and medicine. Bioinformatics is the science of developing and applying computational algorithms and analysis methodologies to big biological data such as genetic sequences. Together they are foundational sciences for public health research and decision-making and essential to educating the next generation of leaders in health and biomedical data science.

The program takes advantage of the rich biostatistical and bioinformatics resources at GW and in the Nation’s Capital. Faculty in the Department of Biostatistics and Bioinformatics are engaged in a diverse research portfolio that includes areas such as diabetes, infectious diseases, mental health, maternal-fetal medicine, cardiovascular disease, emergency medicine, and oncology. Methodological interests of the faculty include the design and analysis of clinical trials, including group-sequential and adaptive designs, SMART trials, pragmatic trials, multiple testing, and benefit-risk evaluation; machine learning; meta-analysis; missing data; randomization tests; longitudinal data; the use of real-world data, including electronic medical records; and research in biostatistics education methodologies. The Washington, DC area is a hub for biostatisticians and bioinformaticians in government and industry, providing a rich source of adjunct faculty with relevant experience. Specifically, the National Institutes of Health (NIH) and the Food and Drug Administration (FDA) employ many experts in these disciplines, a number of whom have world-class reputations. Several leading biostatisticians from the NIH currently serve on doctoral committees and teach courses in the Milken Institute School of Public Health (GWSPH).

The program features a modernized applied curriculum, unique in its emphasis on cutting-edge data science techniques while retaining the rigor of traditional Biostatistics and Bioinformatics programs. The program prepares students to be independent researchers and effective collaborators in interdisciplinary studies.


GWSPH doctoral programs admit students for the Fall term each academic year. Applications are accepted beginning in August and are due no later than December 1 for the cohort matriculating in the following Fall term. Find GWSPH graduate admissions information here.

All applicants for the Biostatistics Concentration are required to submit current GRE scores (within five years of matriculation date). Applicants for the Bioinformatics Concentration are strongly encouraged to submit a GRE score.

Meeting the minimum requirements does not assure acceptance. Applicants must provide evidence of the completion of their undergraduate and/or graduate work before registration in GWSPH is permitted.



Concentration-Specific Prerequisites

Applied Biostatistics Concentration

  • Three semesters of calculus (through multivariable calculus)
  • A course in linear algebra
  • A course in undergraduate statistics

Additional advanced courses in mathematics and calculus-based probability are encouraged but not a requirement for admission.

Bioinformatics Concentration

  • A course in statistics
  • A course in introductory biology and/or a course in computer programming
  • Typically, an undergraduate major in either biology, statistics, mathematics, computer science, bioinformatics, and/or bioengineering

Transfer Credits

Graduate courses taken prior to admission while in non-degree status may not be transferable into GWSPH programs. The PhD program is designed to serve students coming directly from an undergraduate degree. Students completing a master’s degree prior to admission to the PhD degree program may be eligible to transfer up to 24 credits toward the PhD coursework requirements. Depending on how many transfer credits are accepted, at minimum, 48 credits of additional coursework and dissertation research will be required.

PhD Core Requirements

PUBH 6080 | Pathways to Public Health (0 credits)
PUBH 6421 | Responsible Conduct of Research (1 credit)
PUBH 6850 | Introduction to SAS for Public Health Research (1 credit)
PUBH 6851 | Introduction to R for Public Health Research (1 credit)
PUBH 6852 | Introduction to Python for Public Health Research (1 credit) 
PUBH 6860 | Principles of Bioinformatics (3 credits) 
PUBH 6886 | Statistical and Machine Learning for Public Health Research (3 credits)
PUBH 8099 | PhD Seminar: Cross Cutting Concepts in Public Health (1 credit) NOTE: In 23-24, PUBH 8099 was updated to PUBH 8001
PUBH 8870 | Statistical Inference for Public Health Research I* (3 credits)


SPH Course Descriptions

Biostatistics Concentration

PUBH 6866 | Principles of Clinical Trials (3 credits) 
PUBH 6869 | Principles of Biostatistical Consulting (1 credit)
PUBH 6879 | Propensity Score Methods for Causal Inference in Observational Studies (3 credits)
PUBH 6887 | Applied Longitudinal Data Analysis for Public Health Research (3 credits)
PUBH 8871 | Statistical Inference for Public Health Research II* (3 credits)
PUBH 8875 | Linear Models in Biostatistics* (3 credits) 
PUBH 8877 | Generalized Linear Models in Biostatistics* (3 credits)
PUBH 8878 | Statistical Genetics (3 credits)
PUBH 8879 | An Introduction to Causal Inference for Public Health Research (3 credits)
PUBH 8880 | Statistical Computing for Public Health Research (3 credits)
STAT 6227 | Survival Analysis (3 credits)



* Courses form the basis of the comprehensive exam for the Biostatistics concentration.

Bioinformatics Concentration

PUBH 6854 | Applied Computing in Health Data Science (3 credits)
PUBH 6859 | High Performance Cloud Computing (3 credits) 
PUBH 6861 | Public Health Genomics (3 credits) 
PUBH 6884 | Bioinformatics Algorithms and Data Structure (3 credits)
PUBH 8885 | Computational Biology (3 credits) 



Electives

  • Applied Biostatistics Concentration: 12 credits minimum
  • Bioinformatics Concentration:  18 credits minimum

For both concentrations, elective selections must include at least*:

  • 3 Credits in Biostatistics 
  • 3 Credits in Bioinformatics
  • 3 Credits in a Cognate Area

All students are expected to work with their Advisor in the selection of their Elective coursework.

*Pre-approved elective courses are shown in the program guide for each category. 


Research Requirements

GTAP** | Graduate Teaching Assistant Certification, including UNIV 0250 - Graduate Assistant Certification Course (1 credit) (both concentrations) (0 or 1 credit)
PUBH 8283 | Doctoral Biostatistics Consulting Practicum (Biostatistics concentration only) (2 credits)
PUBH 8413 | Research Leadership (both concentrations) (1 credit)

** This is a requirement for TAs.

PUBH 8999 | Dissertation Research (varies by concentration - 12 minimum credits)


Non-Academic Requirements

Professional Enhancement

Students in degree programs must participate in eight hours of Professional Enhancement. These activities may be Public Health-related lectures, seminars, or symposia related to your field of study.

Professional Enhancement activities supplement the rigorous academic curriculum of the SPH degree programs and help prepare students to participate actively in the professional community. You can learn more about opportunities for Professional Enhancement via the Milken Institute School of Public Health Listserv, through departmental communications, or by speaking with your advisor.

Students must submit a completed Professional Enhancement Form to the student records department.

Collaborative Institutional Training Initiative (CITI) Training

All students are required to complete the Basic CITI training module in Social and Behavioral Research prior to beginning the practicum.  This online training module for Social and Behavioral Researchers will help new students demonstrate and maintain sufficient knowledge of the ethical principles and regulatory requirements for protecting human subjects - key for any public health research.

Academic Integrity Quiz

All Milken Institute School of Public Health students are required to review the University’s Code of Academic Integrity and complete the GW Academic Integrity Activity.  This activity must be completed within 2 weeks of matriculation. Information on GWSPH Academic Integrity requirements can be found here.

Past Program Guides


Students in the PhD in Health and Biomedical Data Science program should refer to the guide from the year in which they matriculated into the program. For the current program guide, click the "PROGRAM GUIDE" button on the right-hand side of the page.

Program Guide 2023-2024
Program Guide 2022-2023
Program Guide 2021-2022


Students pursuing a PhD in Health Data Science have access to a world-class faculty with relevant expertise and diverse experience in all sectors of public health and medical research. Areas of interest and research experience for professors and lecturers in the program include: clinical trials, statistical modeling, machine learning, computing and software development, survival analysis, and finite population sampling, with applications in infectious diseases (including COVID-19, HIV, and bacterial superbug infections), mental health, diabetes, maternal-fetal medicine, and cardiovascular disease.  Learn about the Department of Biostatistics and Bioinformatics faculty here.

PhD in Biostatistics Students

Lizhao (Agnes) Ge
Start year: 2021

Lizhao was born and raised in Zhejiang, China. She came to the United States for undergraduate studies at the University of Iowa, where she obtained a BS in Mathematics, a BBA in Finance, and a minor in Music. She earned a Master of Applied Statistics from the Pennsylvania State University and worked there as a Statistical Consultant after graduation. She joined the Antibacterial Resistance Leadership Group (ARLG) at the George Washington University Biostatistics Center as a biostatistician in 2020 and started her PhD in Health and Biomedical Data Science (Applied Biostatistics track) in 2021. Her research interests include clinical trial design and applications of the Desirability of Outcome Ranking (DOOR) in biomedical studies.

Yijie He
Start year: 2021

Yijie was born in China. Before coming to the George Washington University, he received a BS in Bioengineering from the University of California San Diego and an MS in Biostatistics from Duke University. He is currently a PhD student in Health and Biomedical Data Science (Applied Biostatistics track) and also works at the George Washington University Biostatistics Center as a biostatistician. His current research interests include clinical trials, high-dimensional data, and data science.

Shiyu Shu
Start year: 2021

Shiyu (Richard) was born and raised in Dalian, China, and has been studying in the United States for the last 7 years. He obtained a BA in Mathematics and in Economics from Vassar College, during which he spent one semester as an exchange student at St Edmund Hall, Oxford University. He then received a Master of Statistical Practice from Carnegie Mellon University, and worked as a data analyst for a healthcare organization in rural Arizona during the peak of the COVID pandemic. The work experience motivated him to pursue a career in public health, and to continue his PhD studies in the Health and Biomedical Data Science program at GWU. He is currently a biostatistician working in the Diabetes Prevention Program team (DPP) at the Biostatistics Center, under the supervision of Dr. Marinella Temprosa. His current research interests include machine learning/data science, genomics data and survival analysis.

Shanshan Zhang
Start year: 2021

Shanshan was born and raised in China. She earned a Bachelor of Medicine and a Master of Science in Cell Biology from China Medical University. After coming to the United States in 2018, she shifted her interests to public health, since a doctor can save individuals, whereas a public health expert can save lives at the population level. She obtained a second graduate degree, an MS in Biostatistics, from the George Washington University. During the PhD program, Shanshan hopes to contribute to the field of public health, especially in the design and conduct of clinical trials, and to work as an outstanding biostatistician in the future.

Recent Publications:

Qiongfang Wu, Leizhen Xia, Lifeng Tian, Shanshan Zhang, Jialyu Huang. Hormonal replacement treatment for frozen-thawed embryo transfer with or without GnRH agonist pretreatment: a retrospective cohort study stratified by times of embryo implantation failures. Accepted by Frontiers in Endocrinology. 5 January 2022

Shanshan Zhang. Biostatistics in Clinical Decision Making: What can We Get from a 2× 2 Contingency Table. E3S Web of Conferences (Vol. 233). EDP Sciences. December 2020

Qiqiang Guo, Shanshan Wang, Shanshan Zhang, Hongde Xu, Xiaoman Li, et al. ATM-CHK2-Beclin 1 Axis Promotes Autophagy to Maintain ROS Homeostasis Under Oxidative Stress. The EMBO Journal, 39(10), e103111. 18 March 2020

PhD in Bioinformatics Students

Mahdi Baghbanzadeh
Start year: 2021

Mahdi Baghbanzadeh is a PhD student in the Health and Biomedical Data Science program at the Milken Institute School of Public Health at the George Washington University. Mahdi received his MS in Mathematical Statistics from Shiraz University in 2012 and his BS in Statistics from Shahid Beheshti University in 2010. Before joining GWU, he spent 7 years in analytical roles ranging from data analyst to senior data scientist at multiple companies. His research interests include applying machine learning algorithms to omics data, developing tools for genotype-phenotype association studies, and studying the effects of different medications on specific diseases.


Baghbanzadeh, Mostafa; Simeone, F. C.; Bowers, C. M.; Liao, K.-C.; Thuo, M. M.; Baghbanzadeh, Mahdi; Miller, M.; Carmichael, T. B.; Whitesides, G. M.* “Odd-even effects in charge transport across n-alkanethiolate-based SAMs” Journal of the American Chemical Society 2014, 136, 16919–16925.

Mahdi Baghbanzadeh, Dewesh Kumar, Sare I. Yavasoglu, Sydney Manning, Ahmad Ali Hanafi-Bojd, Hassan Ghasemzadeh, Ifthekar Sikder, Dilip Kumar, Nisha Murmu, Ubydul Haque* “Malaria Epidemics in India: Role of Climatic Condition and Control Measures” Science of the Total Environment 2020, 712, 136368.

Peeri, Noah C., Nistha Shrestha, Md Siddikur Rahman, Rafdzah Zaki, Zhengqi Tan, Saana Bibi, Mahdi Baghbanzadeh, Nasrin Aghamohammadi, Wenyi Zhang, and Ubydul Haque. “The SARS, MERS and novel coronavirus (COVID-19) epidemics, the newest and biggest global health threats: what lessons have we learned?” International Journal of Epidemiology 2020, 49, 717-726.

Md Siddikur Rahman, Ajlina Karamehic-Muratovic, Mahdi Baghbanzadeh, Miftahuzzannat Amrin, Sumaira Zafar, Nadia Nahrin Rahman, Sharifa Umma Shirina, Ubydul Haque, “Climate change and dengue fever knowledge, attitudes and practices in Bangladesh: a social media–based cross-sectional survey” Transactions of The Royal Society of Tropical Medicine and Hygiene 2021, 115, 85-93.

Nistha Shrestha, Muhammad Yousaf Shad, Osman Ulvi, Modasser Hossain Khan, Ajlina Karamehic-Muratovic, Uyen-Sa D.T. Nguyen, Mahdi Baghbanzadeh, Robert Wardrup, Nasrin Aghamohammadi, Diana Cervantes, Kh. Md Nahiduzzaman, Rafdzah Ahmed Zaki, Ubydul Haque, “The impact of COVID-19 on globalization” One Health 2020, 100180.

Osman Ulvi, Ajlina Karamehic-Muratovic, Mahdi Baghbanzadeh, Ateka Bashir, Jacob Smith, Ubydul Haque, “Social Media Use and Mental Health: A Global Analysis”, Epidemiologia, 2022, 3 (1), 11-25.

Ranojoy Chatterjee
Start year: 2021

Ranojoy is originally from Kolkata, India. He earned his B.Tech in Computer Science from WBUT and an MS in Computer Science from Kansas State University, specializing in recommendation systems using a multi-armed bandit approach. After graduation he worked at Bellwethr, Inc., developing a retention engine that was later patented by the company. After his brief stint in industry, he worked as a research specialist in Rahlab, developing machine learning tools for analyzing COVID-19 data. His current research interests are graph neural networks, single-cell data, and prediction systems in biomedical data science.


Amritphale, A., Chatterjee, R., Chatterjee, S. et al. Predictors of 30-Day Unplanned Readmission After Carotid Artery Stenting Using Artificial Intelligence. Adv Ther 38, 2954–2972 (2021).

Chow JH, Rahnavard A, Gomberg-Maitland M, Chatterjee R, et al. Association of Early Aspirin Use With In-Hospital Mortality in Patients With Moderate COVID-19. JAMA Netw Open. 2022;5(3):e223890. doi:10.1001/jamanetworkopen.2022.3890

Clark Gaylord
Start year: 2021

After receiving MS degrees in Mathematics and Statistics from the University of Virginia and Virginia Tech, respectively, Clark has had a career in information technology, network security, and research computing. Over the last 20 years, Clark has led the design and operation of many research computing and big data research systems, and is a consulting statistician on several research projects. While at Virginia Tech, Clark taught several courses in Statistics, Data Science, and Networking. A PhD candidate in GW's Health and Biomedical Data Science program (Bioinformatics track), Clark is also Director of Research Technology Services in GW IT.



Erika Hubbard
Start year: 2021

Erika was born and raised in Fairfax County, Virginia (NOVA) and earned her BSc in Biomedical Engineering with minor concentrations in Applied Mathematics and Engineering Business from the University of Virginia. Upon graduation she went on to intern and work for AMPEL BioSolutions, LLC in Charlottesville, VA, researching autoimmune and inflammatory diseases, primarily systemic lupus erythematosus (SLE). As a dual member of the systems biology and bioinformatics teams at AMPEL she developed an interest in leveraging genomics data to gain insights into mechanisms of autoimmune disease pathogenesis. She continues to work with AMPEL to study lupus and translate findings into novel clinical tools to further precision medicine. 

Recent Publications:

Hubbard EL, Pisetsky DS, Lipsky PE. Anti-RNP antibodies are associated with the interferon gene signature but not decreased complement levels in SLE. Ann Rheum Dis [Epub ahead of print: 3 Feb 2022].

Hubbard EL, Grammer AC, Lipsky PE. Transcriptomics data: pointing the way to subclassification and personalized medicine in systemic lupus erythematosus. Curr Opin Rheumatol. 2021 Nov 1;33(6):579-85.

Daamen AR, Bachali P, Owen KA, Kingsmore KM, Hubbard EL, Labonte AC, et al. Comprehensive transcriptomic analysis of COVID-19 blood, lung, and airway. Sci Rep. 2021 Mar 29;11(1):7052.

Hubbard EL, Catalina MD, Heuer S, Bachali P, Geraci NS, et al. Analysis of gene expression from systemic lupus erythematosus synovium reveals myeloid cell-driven pathogenesis of lupus arthritis. Sci Rep. 2020 Oct 15;10(1):17361.

Xinyang Zhang
Start year: 2021

Xinyang was born and raised in Jiangsu, China. Before coming to George Washington University, she obtained her MS in Data Informatics at the University of Southern California, Los Angeles. She is now pursuing her PhD in Health and Biomedical Data Science (Applied Bioinformatics track) and works at the Computational Biology Institute (CBI) as a Research Assistant. Her research interests include microbiome analysis, omics data for COVID-19, and the construction of reference-grade pathogen sequence databases.