5 Appendix

5.1 ChatGPT

Me Okay, so in two of my variables we are missing the data for the year 2018. To fill that data in, I want you to take the average of each state from 2017, 2019, and 2020 to fill in the missing values for 2018.

ChatGPT To fill in missing data for the year 2018 by taking the average of corresponding values from 2017, 2019, and 2020 for each state, you can use the mutate() function along with case_when() to conditionally replace missing values. Here’s how you can do it:

R

# Assuming your data frame is named 'df' and the columns are named 'Year', 'State', 'Variable1', 'Variable2'
library(dplyr)
# Group by 'State' so each average uses only that state's rows
df_filled <- df %>%
  group_by(State) %>%
  mutate(
    Variable1 = case_when(
      # mean() with na.rm = TRUE still works if one of the three years is also missing
      is.na(Variable1) & Year == 2018 ~
        mean(Variable1[Year %in% c(2017, 2019, 2020)], na.rm = TRUE),
      TRUE ~ Variable1
    ),
    Variable2 = case_when(
      is.na(Variable2) & Year == 2018 ~
        mean(Variable2[Year %in% c(2017, 2019, 2020)], na.rm = TRUE),
      TRUE ~ Variable2
    )
  ) %>%
  ungroup()

In this code:

- We group the data by ‘State’ using group_by().
- We then use mutate() to create new versions of ‘Variable1’ and ‘Variable2’ using case_when().
- Inside case_when(), we check whether the value is missing for the year 2018 (is.na(Variable1) & Year == 2018). If so, we replace it with the average of the corresponding values from 2017, 2019, and 2020 for the same state.
- The TRUE ~ Variable1 and TRUE ~ Variable2 branches ensure that rows where the condition is not met keep their original values.
- Finally, ungroup() removes the grouping.

Adjust the column names and data frame name to match your dataset.
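A small self-contained check of the averaging step, as a sketch on a hypothetical toy data frame (the states, years, and values below are invented for illustration and are not the study data). It fills each state's missing 2018 value with the mean of that state's 2017, 2019, and 2020 values and leaves every other row unchanged.

```r
library(dplyr)

# Hypothetical toy data: two states, 2018 missing for both
toy <- data.frame(
  State = rep(c("A", "B"), each = 4),
  Year = rep(c(2017, 2018, 2019, 2020), times = 2),
  Variable1 = c(10, NA, 20, 30, 4, NA, 5, 6),
  stringsAsFactors = FALSE
)

toy_filled <- toy %>%
  group_by(State) %>%
  mutate(Variable1 = case_when(
    # mean of the same state's 2017, 2019 and 2020 values
    is.na(Variable1) & Year == 2018 ~
      mean(Variable1[Year %in% c(2017, 2019, 2020)], na.rm = TRUE),
    TRUE ~ Variable1
  )) %>%
  ungroup()

print(toy_filled)
```

Here state A's 2018 value becomes (10 + 20 + 30) / 3 = 20 and state B's becomes (4 + 5 + 6) / 3 = 5.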

Me Okay, so the variable Crude.Rate is split into two rows by gender (that variable is called Sex). How do I divide Crude.Rate by 4 in every row that has Female in the Sex column, and by 3 in every row that has Male?

ChatGPT You can achieve this using the mutate() function along with conditional statements. Here’s how you can do it:

R

# Load the dplyr package
library(dplyr)

# Assuming your data frame is named 'RPfNoDummies'
# Convert Crude.Rate to numeric type
RPfNoDummies <- RPfNoDummies %>%
  mutate(Crude.Rate = as.numeric(Crude.Rate))

# Divide Crude.Rate by 4 for rows where Sex is Female
RPfNoDummies <- RPfNoDummies %>%
  mutate(Crude.Rate = ifelse(Sex == "Female", Crude.Rate / 4, Crude.Rate))

# Divide Crude.Rate by 3 for rows where Sex is Male
RPfNoDummies <- RPfNoDummies %>%
  mutate(Crude.Rate = ifelse(Sex == "Male", Crude.Rate / 3, Crude.Rate))

This code first converts the “Crude.Rate” column to numeric type. Then it divides the “Crude.Rate” by 4 for rows where the “Sex” column is “Female” and divides by 3 for rows where the “Sex” column is “Male”. All other rows remain unchanged. Make sure to adjust the column names and conditions based on your actual data.
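The two ifelse() passes can also be written as a single mutate() with case_when(), which keeps both divisors side by side. This is a sketch on a hypothetical toy data frame (the rates are made up); it assumes Crude.Rate may arrive as text, as the as.numeric() step above suggests.

```r
library(dplyr)

# Hypothetical toy data; Crude.Rate stored as text, as it may be after import
toy <- data.frame(
  Sex = c("Female", "Male", "Female", "Male"),
  Crude.Rate = c("8", "9", "12", "15"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(
    Crude.Rate = as.numeric(Crude.Rate),
    Crude.Rate = case_when(
      Sex == "Female" ~ Crude.Rate / 4,  # Female rows divided by 4
      Sex == "Male" ~ Crude.Rate / 3,    # Male rows divided by 3
      TRUE ~ Crude.Rate                  # any other value left unchanged
    )
  )

print(toy)
```

Because mutate() evaluates its arguments in order, the second Crude.Rate expression already sees the numeric column produced by the first.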

Me Okay, this isn't dropping any rows right now:

# Dummies
RPdummies <- RPdummies %>%
  filter_all(any_vars(. != "Not Applicable"))
# No Dummies
RPNoDummies <- RPNoDummies %>%
  filter_all(any_vars(. != "Not Applicable"))

ChatGPT filter_all(any_vars(. != "Not Applicable")) keeps a row if at least one of its columns differs from “Not Applicable”, which is true for almost every row, so nothing gets dropped. To drop the rows in which any column contains “Not Applicable”, require that every column differs from it by using all_vars() instead:

R

# For RPdummies: keep a row only if no column equals "Not Applicable"
RPdummies <- RPdummies %>%
  filter_all(all_vars(. != "Not Applicable"))

# For RPNoDummies
RPNoDummies <- RPNoDummies %>%
  filter_all(all_vars(. != "Not Applicable"))

This code removes every row in which any column contains the value “Not Applicable” from both RPdummies and RPNoDummies. Make sure that “Not Applicable” exactly matches how these values are written in your data. Note that for a cell that is NA the comparison returns NA, and filter() drops such rows as well; if NA values are represented differently in your data, adjust the comparison accordingly.
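In current dplyr, filter_all() with any_vars()/all_vars() is superseded by if_any() and if_all(). The sketch below writes the same filter that way on a hypothetical toy data frame (columns and values invented). It uses %in% rather than !=, so a cell that is NA is simply not a match: rows that contain NA but no “Not Applicable” are kept, unlike with the all_vars() version.

```r
library(dplyr)

# Hypothetical toy data with "Not Applicable" in different columns
toy <- data.frame(
  State = c("A", "B", "C", "D"),
  Rate = c("1.5", "Not Applicable", "2.0", NA),
  Group = c("x", "y", "Not Applicable", "z"),
  stringsAsFactors = FALSE
)

# Drop every row in which at least one column equals "Not Applicable";
# NA %in% "Not Applicable" is FALSE, so row D (an NA, but no match) is kept
toy_clean <- toy %>%
  filter(!if_any(everything(), ~ .x %in% "Not Applicable"))

print(toy_clean)
```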

Me I want to change the colors of this:

ggplot(SexRpDf1, aes(x = Year, y = Deathsper100k, fill = Sex)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(x = "Year", y = "Number of Deaths per 100", fill = "Sex")

ChatGPT To change the colors of the bars in your bar plot, you can use the scale_fill_manual() function to manually specify the colors for each level of the Sex variable. Here’s how you can modify your code:

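R

# Load ggplot2 (provides ggplot() and scale_fill_manual())
library(ggplot2)
# Define custom colors: the names must match the levels of Sex exactly
my_colors <- c("Female" = "skyblue", "Male" = "pink")
# Plot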
ggplot(SexRpDf1, aes(x = Year, y = Deathsper100k, fill = Sex)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(x = "Year", y = "Number of Deaths per 100", fill = "Sex") +
  scale_fill_manual(values = my_colors)

In this code, my_colors is a named vector that assigns a color to each level of the Sex variable, and scale_fill_manual(values = my_colors) fills the bars with those colors. Adjust the colors in the my_colors vector to your preference.
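One way to confirm that each level received the intended color is to build the plot and inspect the computed fill values. This is a sketch on a hypothetical stand-in for SexRpDf1 (years and rates invented); it uses geom_col(), which is equivalent to geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical toy data standing in for SexRpDf1 (values invented)
toy <- data.frame(
  Year = rep(c(2019, 2020), each = 2),
  Sex = rep(c("Female", "Male"), times = 2),
  Deathsper100k = c(10, 12, 11, 13),
  stringsAsFactors = FALSE
)

# Named vector: each name must match a level of Sex exactly
my_colors <- c("Female" = "skyblue", "Male" = "pink")

p <- ggplot(toy, aes(x = Year, y = Deathsper100k, fill = Sex)) +
  geom_col(position = "dodge") +
  labs(x = "Year", y = "Deaths per 100k", fill = "Sex") +
  scale_fill_manual(values = my_colors)

# Inspect the fill color ggplot2 assigned to each bar
built <- ggplot_build(p)$data[[1]]
print(built[, c("x", "group", "fill")])
```

Because the colors are matched by name rather than by position, reordering the levels of Sex does not swap them.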

5.1.1 Cancer Statistics Source

Cancer Death %