4 Conclusion
In this report, we investigated a central question: what causes cancer death rates to increase more in certain areas of the USA than in others?
We found that the following variables had the most significant impact on cancer death rates: pctAgeOvr50, STIs, adltSmoking, adltObesity, pctBlack, pctWhite, pctFemale, and foodIns.
It is also noteworthy that we added pctAgeOvr50 to the model because the pctFemale variable initially had an abnormally large effect on death rates, large enough for us to flag it as a cause for concern.
However, after factoring in the age-based variable, we realized that the pctFemale variable was largely accounting for the effect age has on the death rates.
Adding pctAgeOvr50 and thereby controlling for age reduced some of the omitted variable bias in our model.
Yet, because the coefficient of pctFemale remains substantial, our model most likely still exhibits omitted variable bias.
This is a shortcoming of our research, and we encourage others to find data that can help explain what we are omitting.
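The mechanism described above can be illustrated with a minimal sketch on synthetic data. This is not our dataset or our fitted model: the variable names pct_age_over_50, pct_female, and death_rate, the helper ols, and every numeric value are assumptions chosen only for illustration. In the simulation, the female share has no direct effect on the death rate but is correlated with the age share, so leaving age out of the regression inflates the coefficient on the female share.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (assumed values, not the report's data).
# The age share drives the death rate; the female share is only
# correlated with the age share and has no direct effect.
pct_age_over_50 = rng.normal(35.0, 5.0, n)
pct_female = 50.0 + 0.3 * (pct_age_over_50 - 35.0) + rng.normal(0.0, 1.0, n)
death_rate = 100.0 + 2.0 * pct_age_over_50 + rng.normal(0.0, 5.0, n)


def ols(y, *columns):
    """Least-squares coefficients, intercept first (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Short regression: age is omitted, so pct_female absorbs its effect.
short = ols(death_rate, pct_female)

# Long regression: age is included as a control.
long = ols(death_rate, pct_female, pct_age_over_50)

print("pct_female coefficient without age control:", round(short[1], 2))
print("pct_female coefficient with age control:   ", round(long[1], 2))
print("pct_age_over_50 coefficient:               ", round(long[2], 2))
```

In this sketch the coefficient on the female share is large when age is omitted and falls close to zero once age is controlled for. In our model the pctFemale coefficient shrank but stayed substantial, which is the pattern we would expect if further relevant variables are still missing.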
Furthermore, in the future, we might want to investigate the effects of other factors on cancer death rates.
For example, pollution, water quality, and air quality could all be of interest, alongside less conventional factors such as primary healthcare providers in an area, average income, and numerous others worthy of exploration.