About

3 Levels of our Digital Humanities Project

1. Sources

To examine and analyze health outcomes, we sourced data from the California Department of Health Care Access and Information (HCAI), which provides data from hospital locations across California for the years 2010 to 2023. Our group manually grouped these locations into three regions (Northern California, California, and Southern California) so that they would align better with the air quality data.
The dataset above did not include air quality data, which was essential to our narrative. For this, we looked to the EPA. Its Air Data website provided years of air quality measurements collected by monitors across California. A very small proportion of values in this dataset were NA because no data had been provided to the EPA; these values were simply ignored in the final visualizations.
It is important to consider how and why this data was collected, and any silencing that may have occurred. Because wildfires often produce excess smoke in rural and northern California counties, those counties sometimes appear to be the most polluted. In reality, because of limited urbanization and extensive forest cover, mountainous regions tend to have comparatively safe air quality. Moreover, the location of monitors can silence groups: the gray areas where no monitors were present will naturally have less granular and less accurate measurements. This indicates a potential bias against rural and poorer groups.

2. Processing

We imported datasets on patient demographics for hospitals across California from 2010 to 2023. We standardized the necessary variable names so that the yearly datasets could be combined into a single dataset containing hospital names, addresses, and patient counts by race, gender, insurance, and diagnosis.
From this dataset, we created three new variables: “region”, “latitude”, and “longitude”. The “region” variable was based on a controversial proposal named the Division of California into Three States initiative. We adopted this proposal because it divides California into three regions by geographical location and population. Following the initiative, we assigned each hospital to Northern California, California, or Southern California.
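The region assignment can be sketched in Python with pandas. This is only an illustration, not the exact procedure we followed: the `county` column, the hospital names, and the `assign_region` helper are hypothetical, and the county lists reflect our reading of the three-states proposal, so they should be checked against the initiative text before reuse.

```python
import pandas as pd

# Counties in the two smaller proposed states, as we understand the
# initiative (an assumption to verify); every other county falls into
# Northern California.
CALIFORNIA = {
    "Los Angeles", "Ventura", "Santa Barbara",
    "San Luis Obispo", "Monterey", "San Benito",
}
SOUTHERN_CALIFORNIA = {
    "San Diego", "Orange", "Riverside", "San Bernardino", "Imperial",
    "Kern", "Kings", "Fresno", "Tulare", "Inyo", "Madera", "Mono",
}


def assign_region(county: str) -> str:
    """Map a county name to one of the three proposed regions."""
    name = county.strip().title()  # tolerate stray spaces and casing
    if name in CALIFORNIA:
        return "California"
    if name in SOUTHERN_CALIFORNIA:
        return "Southern California"
    # Note: misspelled county names also land here, so county names
    # should be validated beforehand.
    return "Northern California"


# Hypothetical rows standing in for the hospital dataset.
hospitals = pd.DataFrame({
    "hospital": ["Hospital A", "Hospital B", "Hospital C"],
    "county": ["Los Angeles", "San Diego", "Sacramento"],
})
hospitals["region"] = hospitals["county"].map(assign_region)
```

Keeping the mapping at the county level means the same lookup can be reused for any dataset that records a county.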
The “latitude” and “longitude” variables represent the geographical coordinates of each hospital listed in the dataset. To obtain them, we combined each hospital’s address fields into a single address and passed it to the Google Maps API, which returned the latitude and longitude. Outliers and inaccurate values were corrected manually to ensure the accuracy of the information.
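A minimal sketch of the geocoding step follows. It assumes the Google Geocoding web service was the part of the Google Maps API used, which the description above does not specify. The address fields, the helper names, the sample response values, and the bounding box are all hypothetical; no network request is made here.

```python
from urllib.parse import urlencode

# Google Geocoding web service endpoint (assumed to be the API used).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def full_address(street: str, city: str, zip_code: str) -> str:
    """Combine hypothetical address fields into one query string."""
    return f"{street}, {city}, CA {zip_code}"


def geocode_url(address: str, api_key: str) -> str:
    """Build the request URL; a real API key is needed to call it."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"


def extract_lat_lng(response: dict):
    """Pull (lat, lng) out of a parsed JSON response, or None on failure."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def in_california_bounds(lat: float, lng: float) -> bool:
    """Rough bounding box (approximate) to flag outliers for manual review."""
    return 32.5 <= lat <= 42.1 and -124.6 <= lng <= -114.0


# Made-up response in the shape the service returns.
sample_response = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 34.0, "lng": -118.0}}}],
}
coords = extract_lat_lng(sample_response)
needs_review = coords is None or not in_california_bounds(*coords)
```

Coordinates that fail the bounds check, or addresses that return no result, would be the ones to look up and replace by hand.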
After loading the datasets, we found that they were already very clean. Deepnote let us spot missing chunks, missing values, and incorrectly formatted data, and from there we used Python (with packages including pandas, numpy, and matplotlib) to handle them. We then combined the hospital dataset with the air quality dataset in RStudio using the tidyverse package. To ensure that values were attached correctly, we joined the datasets by matching the year and county of each air quality value to the year and county of each hospital record. With the combined data, we were able to create various visualizations in Tableau and Python.
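The join itself was done in RStudio, but the same idea can be sketched in Python with pandas for readers more familiar with that tool. The column names (`county`, `year`, `patients`, `median_aqi`) and all values below are made up for illustration.

```python
import pandas as pd

# Hypothetical hospital records: one row per hospital per year.
hospitals = pd.DataFrame({
    "hospital": ["Hospital A", "Hospital B", "Hospital A"],
    "county": ["Los Angeles", "Sacramento", "Los Angeles"],
    "year": [2010, 2010, 2011],
    "patients": [100, 50, 120],
})

# Hypothetical air quality records: one row per county per year.
# The None stands in for a measurement that was not reported.
air_quality = pd.DataFrame({
    "county": ["Los Angeles", "Sacramento", "Los Angeles"],
    "year": [2010, 2010, 2011],
    "median_aqi": [80.0, 45.0, None],
})

# Left join keeps every hospital row. validate raises an error if a
# county-year pair appears more than once in the air quality table,
# and indicator records which rows found a match.
merged = hospitals.merge(
    air_quality,
    on=["county", "year"],
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Hospital rows with no matching county-year in the air quality data.
unmatched = merged[merged["_merge"] == "left_only"]

# Rows with missing air quality values are left out of visualizations.
plot_ready = merged.dropna(subset=["median_aqi"])
```

Checking `unmatched` before plotting is a quick way to confirm that every hospital record found an air quality value for its county and year.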

3. Presentation

This website was created using WordPress and is hosted through UCLA’s Humanspace portal, a user-friendly hosting service designed specifically for students in the Digital Humanities Department.
After processing our sources, we turned to presenting our data in clear and concise visualizations. The tools we used were Tableau and Python. Given the large number of variables, we needed tools that could easily manage complex datasets. Tableau’s wide range of visualizations allowed us to explore several relationships, and Python’s customizability allowed us to create interactive visualizations that engage viewers.
We chose WordPress to build our website because it is both user- and client-friendly. Hyperlinks to data sources and bibliographic sites added another level of ease. Because we wanted to keep every group of readers in mind, we made sure that fonts were legible, that text and background colors contrasted well, and that titles were distinct, bolded, and much larger than body text. With these considerations in mind, we were able to create a website that is visually appealing and that prioritizes UI/UX for all potential readers.

Acknowledgments

Our project would not have been possible without Cameron Manning and his constant help, advice, and guidance throughout the process of creating it. Without him, we would be more lost than a ship without a compass. Through his support during office hours, discussion sections, and lecture meetings, our group gained confidence in our data visualization skills and developed a deeper understanding of our datasets.

Additionally, we want to express our appreciation for Professor Sabo, whose invaluable instruction was vital to all of our work. His teachings on Digital Humanities and beyond laid the foundations of this website and more.

Bibliography