Time Series Homework: Chapter 3 Lesson 2 Key

Please put your name here

Data

Code
# Packages used throughout this assignment
library(tidyverse)   # ggplot2, dplyr, purrr
library(tsibble)     # as_tsibble() for time series data
library(fable)       # ETS() models (also attaches fabletools)
library(pander)      # formatted tables

# Import the annual US mortality data
mortality <- rio::import("https://byuistats.github.io/timeseries/data/mortality_us.xlsx")

Questions

Question 1 - Context and Measurement (10 points)

The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data is measuring. Without context, even the simplest features of the data can be obscure and inscrutable. This homework assignment centers on the series below.

Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.

US Mortality

https://wonder.cdc.gov/wonder/help/cmf.html#

Note: The data is self-explanatory; don’t get lost in the documentation page.

Answer

Data Collection Process:

The CDC collects data on the number of deaths in the US in the Compressed Mortality File, a database that is updated annually.

Unit of Analysis:

Deaths per 100,000 people in the United States

Meaning:

Each observation records, for a single year, the US population, the number of deaths, and the resulting mortality rate per 100,000 people.
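
A quick structural look at the imported table confirms these fields (a minimal check; only the `mort_rate_per_100,000` column name is verified by the code used later in this assignment, the remaining names depend on the file):

Code
# Inspect the columns and types of the imported data frame
dplyr::glimpse(mortality)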

Question 2 - US Mortality Rate: Visualization (30 points)

a) Please plot the US Mortality Rate series (mortality per 100,000).
Answer
Code
ggplot(mortality, aes(x = year, y = `mort_rate_per_100,000`)) +
  geom_line(color = "forestgreen", linewidth = 1) +
  labs(
    title = "U.S. Mortality Rate per 100,000 (1968-2021)",
    x = "Year",
    y = "Mortality Rate per 100,000"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold")
  ) +
  ylim(750, max(mortality$`mort_rate_per_100,000`))  # Start the y-axis at 750

b) Use the exponential smoothing method to model the US Mortality Rate series. Use the smoothing parameter that R calculates by minimizing the SS1PE. Add it to the plot in 2a.
Answer
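Simple exponential smoothing estimates the level recursively as \(a_t = \alpha x_t + (1-\alpha)a_{t-1}\), with \(a_1 = x_1\), so the one-step-ahead forecast of \(x_t\) is \(a_{t-1}\). Minimizing the SS1PE means choosing \(\alpha\) to minimize \(\sum_{t=2}^{n}(x_t - a_{t-1})^2\); in fable, opt_crit = "amse" with nmse = 1 targets this same one-step criterion (up to the averaging constant).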
Code
mortality_tsibble <- mortality |>
  as_tsibble(index = year)

mortality_model <- mortality_tsibble |>
  model(
    Additive = ETS(`mort_rate_per_100,000` ~
                     error("A") +
                     trend("N") +    # no trend component: a pure EWMA
                     season("N"),    # no seasonal component (annual data)
                   opt_crit = "amse", nmse = 1)  # one-step criterion, i.e. minimize the SS1PE
  )

fitted_values <- fitted(mortality_model)

# Overlay the fitted values on the original series, with a legend
ggplot() +
  geom_line(data = mortality, aes(x = year, y = `mort_rate_per_100,000`, color = "Original Data"), linewidth = 1.5) +
  geom_line(data = fitted_values, aes(x = year, y = .fitted, color = "Fitted (Minimized SS1PE)"), linewidth = 1) +
  labs(
    title = "U.S. Mortality Rate per 100,000 Population (1968-2021)",
    x = "Year",
    y = "Mortality Rate per 100,000",
    caption = "Fitted Exponential Smoothing Model Added (Minimized SS1PE)"
  ) +
  theme_minimal(base_size = 15) +
  scale_color_manual(name = "Legend", values = c("Original Data" = "blue", "Fitted (Minimized SS1PE)" = "red"))

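To confirm which \(\alpha\) the optimizer actually selected, the estimated parameters can be printed from the fitted model object (a quick check using fabletools, assuming the mortality_model mable from above):

Code
# Print the estimated smoothing parameter (alpha) and initial level
report(mortality_model)
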
c) Please repeat the modeling above, but instead choose a smoothing parameter value \(\alpha=0.2\). Add it to the plot in 2a.
Answer
Code
mortality_model_alpha_0.2 <- mortality_tsibble |>
  model(
    Additive = ETS(`mort_rate_per_100,000` ~
                     error("A") +
                     trend("N", alpha = 0.2) +  # fix the smoothing parameter at 0.2
                     season("N"),               # still a pure EWMA
                   opt_crit = "amse", nmse = 1) # with alpha fixed, only the initial level is estimated
  )

fitted_values_alpha_0.2 <- fitted(mortality_model_alpha_0.2)

# Overlay the fitted values on the original series, with a legend
ggplot() +
  geom_line(data = mortality, aes(x = year, y = `mort_rate_per_100,000`, color = "Original Data"), linewidth = 1.5) +
  geom_line(data = fitted_values_alpha_0.2, aes(x = year, y = .fitted, color = "Fitted (Alpha = 0.2)"), linewidth = 1) +
  labs(
    title = "U.S. Mortality Rate per 100,000 Population (1968-2021)",
    x = "Year",
    y = "Mortality Rate per 100,000",
    caption = "Fitted Exponential Smoothing Model Added (Alpha = 0.2)"
  ) +
  theme_minimal(base_size = 15) +
  scale_color_manual(name = "Legend", values = c("Original Data" = "blue", "Fitted (Alpha = 0.2)" = "forestgreen"))

d) Please repeat the modeling above, but instead choose a smoothing parameter value \(\alpha=\frac{1}{n}\). Add it to the plot in 2a.
Answer
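Note why this choice produces such a smooth line: the running mean obeys the recursion \(\bar{x}_t = \tfrac{1}{t}x_t + \tfrac{t-1}{t}\bar{x}_{t-1}\), so fixing \(\alpha = \tfrac{1}{n}\) gives each new observation only a \(1/n\) share of the update and makes the EWMA behave much like a long-run average.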
Code
n <- nrow(mortality)  # number of annual observations
alpha <- 1 / n        # smoothing parameter alpha = 1/n

mortality_model_alpha_1_n <- mortality_tsibble |>
  model(
    Additive = ETS(`mort_rate_per_100,000` ~
                     error("A") +
                     trend("N", alpha = alpha) +  # fix alpha at 1/n; still a pure EWMA
                     season("N"),
                   opt_crit = "amse", nmse = 1)
  )

fitted_values_alpha_1_n <- fitted(mortality_model_alpha_1_n)

# Overlay the fitted values on the original series, with a legend
ggplot() +
  # Original data in blue
  geom_line(data = mortality, aes(x = year, y = `mort_rate_per_100,000`, color = "Original Data"), linewidth = 1) +

  # Fitted values from the exponential smoothing model in red
  geom_line(data = fitted_values_alpha_1_n, aes(x = year, y = .fitted, color = "Fitted with Alpha = 1/n"), linewidth = 1) +

  # Labels and title
  labs(
    title = "U.S. Mortality Rate per 100,000 Population (1968-2021)",
    x = "Year",
    y = "Mortality Rate per 100,000",
    caption = "Fitted Exponential Smoothing Model with Alpha = 1/n"
  ) +

  # Customize the theme
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    plot.caption = element_text(hjust = 0.5, face = "italic", color = "gray40"),

    # Keep the legend compact
    legend.text = element_text(size = 10),
    legend.title = element_text(size = 12),
    legend.key.size = unit(0.5, 'cm'),
    legend.spacing.x = unit(0.2, 'cm')
  ) +
  scale_color_manual(name = "Legend", values = c("Original Data" = "blue", "Fitted with Alpha = 1/n" = "red"))

e) Which of the smoothing parameter values you tried above works best for the series? Justify your answer.
Answer

Each of these parameter choices has different strengths and weaknesses. The SS1PE-optimized parameter puts its weight on recent observations rather than historical ones; it would be useful for forecasting the number of deaths in 2022, when Covid was still a factor. The smoothing parameter of 0.2 balances responsiveness and stability: the model captures recent changes without overreacting to random fluctuations, giving a smoother, more reliable forecast. The smoothing parameter of 1/n places the most emphasis on historical data, which makes longer-term trends visible. Remember, choosing the optimal smoothing parameter involves weighing the specific characteristics of the data, the purpose of the analysis, and potential external influences.
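
One way to put numbers behind this comparison is to inspect the in-sample one-step accuracy of the three models (a sketch, assuming the three mables fitted above are still in the environment; note that in-sample error necessarily favors the SS1PE-optimized \(\alpha\), so this supports rather than settles the judgment above):

Code
# Compare in-sample one-step accuracy across the three smoothing choices
bind_rows(
  accuracy(mortality_model)           |> mutate(choice = "SS1PE-optimal alpha"),
  accuracy(mortality_model_alpha_0.2) |> mutate(choice = "alpha = 0.2"),
  accuracy(mortality_model_alpha_1_n) |> mutate(choice = "alpha = 1/n")
) |>
  select(choice, RMSE, MAE)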

Question 3 - US Mortality Rate: Excess Mortality (30 points)

The jump in the last two years of the US Mortality Rate series is clearly the effect that Covid-19 had on mortality across the US.

a) Please calculate the excess mortality rate during 2020 and 2021 using the smoothing parameter values you employed in the previous question. Your results should be displayed in a table that is professionally formatted.
Answer
Code
# Extract each model's fitted (expected) rates for 2020 and 2021
fitted_covid_ss1pe     <- fitted_values |> filter(year %in% c(2020, 2021))
fitted_covid_alpha_0.2 <- fitted_values_alpha_0.2 |> filter(year %in% c(2020, 2021))
fitted_covid_alpha_1_n <- fitted_values_alpha_1_n |> filter(year %in% c(2020, 2021))

# Excess mortality rate = observed rate minus the model's expected rate
mortality_covid <- mortality |>
  filter(year %in% c(2020, 2021)) |>
  mutate(
    ss1pe     = `mort_rate_per_100,000` - fitted_covid_ss1pe$.fitted,
    alpha_0.2 = `mort_rate_per_100,000` - fitted_covid_alpha_0.2$.fitted,
    alpha_1_n = `mort_rate_per_100,000` - fitted_covid_alpha_1_n$.fitted
  )

# Display the results as a formatted table (pander is loaded at the top)
pander(
  mortality_covid |> select(year, `mort_rate_per_100,000`,
                            ss1pe, alpha_0.2, alpha_1_n),
  caption = "Mortality Data for 2020-2021 with Excess Values Calculated for Three Different Alphas"
)
Mortality Data for 2020-2021 with Excess Values Calculated for Three Different Alphas

year   mort_rate_per_100,000   ss1pe   alpha_0.2   alpha_1_n
2020   1027                    157.3   179.9       189.5
2021   1044                    16.82   160.7       203.7

b) What is the meaning of excess mortality as you calculated it? Please explain it to a general audience, as if you were being interviewed in a national news show.
Answer

Excess mortality refers to the number of deaths beyond what is typically expected in a specific period, based on historical data. It serves as an indicator of unusual circumstances, like pandemics, natural disasters, or severe flu seasons, showing if more people are dying than usual. This metric captures both direct effects, such as deaths from the primary cause (e.g., a virus), and indirect impacts, like deaths resulting from overwhelmed healthcare systems. By comparing observed deaths to expected baselines, experts can assess the overall impact of such events on public health.
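
In these units, the arithmetic behind that idea is simple: an excess rate is deaths per 100,000 people, so multiplying by the population and dividing by 100,000 turns it into an approximate count of excess deaths. A sketch of that conversion (the population column name is an assumption about the imported file; adjust it to match your data):

Code
# Convert excess rates (per 100,000) into approximate excess death counts.
# NOTE: `population` is an assumed column name in the imported file.
mortality_covid |>
  mutate(approx_excess_deaths = alpha_0.2 / 100000 * population) |>
  select(year, alpha_0.2, approx_excess_deaths)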

c) Which excess mortality rate estimate do you think is correct? Some commentators would suggest you are choosing the parameters to defend your political goals, not to conduct a scientific analysis. Please comment on the difficulty of choosing parameter values to estimate the death toll of Covid-19.
Answer

An alpha of 0.2 strikes a good balance: the historical data shapes the expected value without recent fluctuations having too much impact. Selecting parameter values in time series analysis, such as the smoothing parameter \(\alpha\) in exponential smoothing, is genuinely difficult because it requires balancing modeling assumptions, data characteristics, and methodological choices. Judgments such as how much weight to place on historical data can vary considerably with the analyst's opinion of and familiarity with the series, so it is important to understand the big picture of the situation. For the same reason, it is important to be transparent about which parameters were chosen and how one arrived at them, in order to establish credibility that the purpose is scientific analysis rather than advocacy. The alpha of 0.2 can be defended here because it looks mostly at what we have seen historically while still taking recent changes in mortality rates into account, and it shows a large discrepancy between what we expected based on past years and what occurred: Covid did have a large effect on mortality rates.
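
The sensitivity that makes this debate possible is easy to demonstrate: refit the same pure-EWMA specification over a grid of \(\alpha\) values and watch the 2020-2021 excess estimates move. A sketch (it reuses mortality_tsibble from above; the grid values are arbitrary):

Code
# Refit the EWMA over several alphas and recompute the excess mortality rates
observed <- mortality |> filter(year %in% c(2020, 2021))

excess_by_alpha <- purrr::map_dfr(c(0.05, 0.2, 0.5, 0.9), function(a) {
  fit <- mortality_tsibble |>
    model(EWMA = ETS(`mort_rate_per_100,000` ~
                       error("A") + trend("N", alpha = a) + season("N")))
  fitted(fit) |>
    as_tibble() |>
    filter(year %in% c(2020, 2021)) |>
    left_join(observed, by = "year") |>
    transmute(alpha = a, year, excess = `mort_rate_per_100,000` - .fitted)
})

excess_by_alpha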

Rubric

Question 1: Context and Measurement
Mastery (10): The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for the requested time series. Clear and comprehensive explanations are provided.
Incomplete (0): The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series.

Question 2a: Mortality Plot
Mastery (5): The student accurately plots the US Mortality Rate series in R, ensuring clear labeling and a descriptive title for easy interpretation. The visualization effectively presents the data with distinguishable points or lines and appropriate formatting. Additionally, the R code is well-commented, providing clear explanations of each step and maintaining readability.
Incomplete (0): The student encounters challenges in plotting the US Mortality Rate series in R. The plot may lack essential labels or a descriptive title, making it difficult to interpret. Additionally, the visualization might be unclear or cluttered, and the R code may lack sufficient comments, hindering understanding of the process. Overall, improvement is needed in effectively plotting time series data in R.

Question 2b: Smoothing SS1PE
Mastery (5): Students use R to compute exponential smoothing for modeling the US Mortality Rate series. Their R code is well-commented, providing clear explanations for each step of the process, ensuring transparency in the computational process.
Incomplete (0): Students may encounter challenges in implementing exponential smoothing in R, resulting in incomplete or ineffective computations. Their R code might lack sufficient comments, hindering clarity in understanding the computational process.

Question 2c: Smoothing a=0.2
Mastery (5): Students employ R to repeat the modeling of the US Mortality Rate series using a specified smoothing parameter value of alpha=0.2. Their R code is well-commented, providing clear explanations for each step of the process, ensuring transparency in the computational process.
Incomplete (0): Students may encounter challenges in implementing exponential smoothing in R, resulting in incomplete or ineffective computations. Their R code might lack sufficient comments, hindering clarity in understanding the computational process.

Question 2d: Smoothing a=1/n
Mastery (5): Students employ R to repeat the modeling of the US Mortality Rate series using a specified smoothing parameter value of alpha=1/n. Their R code is well-commented, providing clear explanations for each step of the process, ensuring transparency in the computational process.
Incomplete (0): Students may encounter challenges in implementing exponential smoothing in R, resulting in incomplete or ineffective computations. Their R code might lack sufficient comments, hindering clarity in understanding the computational process.

Question 2e: Evaluation of Parameter Choice
Mastery (10): Students justify their choice of parameter in the context of the underlying factors affecting US Mortality Rates evident in the data. The students evidence their understanding of the implications of the values of the smoothing parameter. Students show they have done some background research into the data to answer the question.
Incomplete (0): Students fail to adequately justify their choice of parameter in relation to the underlying factors affecting US Mortality Rates evident in the data. They may lack evidence of understanding the implications of the values of the smoothing parameter or fail to demonstrate how these implications relate to the context of the data. Additionally, they may show limited evidence of background research into the data to support their justification, indicating a lack of depth in their analysis.

Question 3a: Excess Mortality Table
Mastery (10): Students accurately calculate the excess mortality rate during 2020 and 2021 using the smoothing parameter values employed in the previous question. They present their results in a table, clearly displaying the excess mortality rate for each year alongside the corresponding smoothing parameter values. The table is well-labeled and easy to interpret.
Incomplete (0): Students demonstrate inaccuracies in calculating the excess mortality rate during 2020 and 2021. Their presentation of the results in a table may lack clarity and professionalism, with issues such as unclear labeling, inconsistent formatting, or difficulty in interpreting the information provided. Additionally, they may overlook important details or fail to include all necessary information in the table, making it challenging for readers to understand the table.

Question 3b: Excess Mortality
Mastery (10): Explanations effectively convey the meaning of excess mortality to a general audience, avoiding technical terms and providing a clear, accessible description. They define excess mortality as the number of deaths observed during a specific period compared to what would be expected based on historical data.
Incomplete (0): Responses may struggle to explain excess mortality clearly to a general audience, potentially using technical language or lacking coherence. They may fail to provide relatable examples or context, making it difficult for the audience to understand the concept and its significance.

Question 3c: Evaluation of assumptions used for inference
Mastery (10): Responses address the challenge of selecting parameter values to make inferences in time series. They provide a comprehensive analysis, considering factors like modeling assumptions and methodological variations that influence parameter selection. Explanations highlight the need for transparent reporting to ensure robust and reliable estimates in professional discourse.
Incomplete (0): Responses may lack depth or clarity in addressing the challenge of selecting parameter values for making inferences in time series. They may overlook key factors influencing parameter selection, such as data quality or specific characteristics of the time series data. Additionally, they may not effectively consider the impact of modeling assumptions or methodological variations on parameter selection. Furthermore, they may fail to emphasize the importance of transparent reporting in ensuring the reliability and validity of estimates, potentially resulting in a lack of confidence in the conclusions drawn from the analysis.

Total Points: 70