I am a Lecturer (Assistant Professor) in Business Analytics at Queen Mary University of London, School of Business and Management, and a Fellow of the Software Sustainability Institute. My background is interdisciplinary. I hold a DPhil in Development Studies from the University of Oxford (Oxford Department of International Development), where I was also affiliated with the network science group at the Mathematical Institute. I have also held a lectureship in Computational Social Science at the University of Essex and postdoctoral positions at the Stanford University School of Medicine and the University of Chicago.
My research combines computational methods from social data science and network analysis with approaches from reproducible research and metascience to study the transparency, reproducibility, bias, and social impact of data-intensive research. My current focus is on evaluating and improving the transparency and reproducibility of applications of data science, artificial intelligence, and machine learning in the social and health sciences. In another stream of research, I use computational social science and network analysis to examine health-related misinformation, digital-health interventions, and inequality in network structures of global migration.
I teach data science with an emphasis on open, reproducible research and responsible analysis of real-world data. My open learning materials on Reproducible Data Science provide an accessible introduction to open-source research software, reproducible workflows (with Jupyter Notebook), hands-on coding (with Python and Markdown), and the data science techniques and skills needed to perform open, reproducible, and ethical data analysis.
DPhil in Development Studies, 2015
University of Oxford
MA in Sociological Research, 2010
University of Essex
[12 Sep 2022] I have joined the School of Business and Management at Queen Mary University of London as a Lecturer (Assistant Professor) in Business Analytics.
[13 Jan 2022] I have been awarded a Software Sustainability Institute Fellowship for 2022.
[7 Dec 2021] Our abstract Evaluating the Prevalence of Health-Related Misinformation across Multiple Social-Media Platforms (with Hamid Nejadghorban, Victoria Stensland, Joshua Hodgkin) was accepted for oral presentation at the BSA Virtual Annual Conference 2022: Building Equality and Justice Now, 20–22 April 2022.
[7 Dec 2021] Our chapter with Mason A. Porter on Migration networks: applications of network analysis to macroscale migration patterns is now published in McAuliffe, M. (Ed.) Research Handbook on International Migration and Digital Technology, Edward Elgar Publishing, 2021. An earlier preprint version of the chapter is available open access at arXiv:2002.10992. Data and computer code are available on OSF.
[3 Dec 2021] I gave a lightning talk Networked Marketplaces for Responsible Sharing of Research Data at the Association for Interdisciplinary Metaresearch and Open Science conference (aimos2021), 30 November – 3 December 2021.
[11 Nov 2021] I was invited to give a talk at the 2022 Toronto Workshop on Reproducibility (2022 TWR) hosted by the Canadian Statistical Sciences Institute (CANSSI) Ontario and the Data Sciences Institute at the University of Toronto, 23–25 February 2022.
[28 Oct 2021] I organised and participated in a multidisciplinary workshop panel Building Responsible Data Science Workflows: Transparency, Reproducibility, and Ethics by Design at the PyData Global 2021 conference, 28–30 October 2021.
[25 Oct 2021] I am a co-author on Naudet et al. Medical journal requirements for clinical trial data sharing: Ripe for improvement, published in PLOS Medicine.
[3 Sep 2021] I am invited faculty at the Research Transparency and Reproducibility Training (RT2), 23 August – 3 September 2021, organised by the Berkeley Initiative for Transparency in the Social Sciences. Together with teaching assistant Hamid Nejadghorban (PhD candidate in Economics, University of Essex), I provided a hands-on tutorial on Dynamic Documents with Jupyter Notebook for Reproducible Workflows. Materials and computer code from the training and our session are publicly available on OSF and GitHub repositories.
[16 Jun 2021] I gave a talk about my open learning resource Reproducible Data Science at the 2021 National Workshop on Data Science Education, 14–18 June 2021, organised by UC Berkeley’s Division of Computing, Data Science, and Society.
[9 Jun 2021] My learning resource Reproducible Data Science with Open-Source Python Tools and Real-World Data is available open access under a CC BY-SA 4.0 license.
Publications and preprints
Importance: The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but …
Open teaching materials