Key figures
22 students
Six teams
Three content experts
Two government partners
One winning team
Overview of the event
The first CBDRH Health Data Science datathon took place on Friday 26th May 2023. This event saw the participation of 22 HDS students working in six teams, five on campus and one hybrid online/on-campus team. The event focused on the Antiretroviral Therapy in HIV dataset, and teams were challenged to pose a research question and develop a solution using their health context expertise and analytic skills.
Five experts were on hand to guide the teams, helping them to shape their research questions and carry out their proposed solutions. These included applied researchers with content expertise in HIV medications and machine learning, and health informaticians from NSW Health Sydney Local Health District.
The datathon wasn't just about coding and data; it was also about enjoyment and camaraderie. The teams enjoyed good food and had loads of fun. The presentations from all teams were highly engaging, with everyone putting into practice the technical coding skills and health context expertise from the HDS program.
The day culminated in a prize ceremony to acknowledge the hard work of all the teams and, of course, to announce the winners! The winning team's breakthrough came from using neural networks to predict the success of current ART drug combinations for patients with HIV. The second-place team used survival analysis to answer the question, "What drug combination is most effective at achieving viral suppression?" The third-place team investigated the impact of dolutegravir (DTG) as a third-agent drug on time to viral suppression among active HIV patients on antiretroviral therapy (ART).
The event blended learning, collaboration, and fun, capturing the essence of data science in an enjoyable and rewarding environment.
The data
The Antiretroviral Therapy in HIV dataset comprises viral loads, CD4 counts, and drug regimen information for 8,916 patients with HIV. It is a synthetic dataset generated using generative adversarial networks (GANs). This approach produces realistically complex data, allowing users to prototype, evaluate, and compare machine learning algorithms without the usual constraints of patient privacy.
Alongside these clinical measures, the dataset includes demographic details and longitudinal records of drug combinations and CD4 counts. Common baseline drug regimens included tenofovir disoproxil and emtricitabine (FTC+TDF) and abacavir and lamivudine (3TC+ABC). Several of the teams chose to implement machine learning models to predict future CD4 outcomes based on previous CD4 values, demographics, and treatment information.
What our students said
I loved getting to work with really complicated data based on real situations, loved the info sharing sessions and the support given.
The open-ended nature of the competition meant that we got to see different uses for the dataset. I got to meet participants from other faculties.
The actual competition day itself was very rewarding and also very fun!
Picture gallery
Explore the images from the day below. Credit for all images to Cassandra Hannagan.