Time Series Homework: Chapter 1 Lesson 2
Please_put_your_name_here
Data
# Weather data for Rexburg
rex_temp <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather.csv")
Questions
Question 1: Context and Measurement (10 points)
The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data measure. Without context, even the simplest features of the data can be obscure and inscrutable. This homework assignment centers on the series below.
Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.
a) Rexburg, ID Daily High Temperatures
Question 2: Visualization (5 points)
Please plot the Rexburg Daily Temperature series, choosing the range and frequency that make the data most readable. Use appropriate axis labels, units, and captions.
Question 3: Additive Decomposition - Manual Approach (25 points)
This exercise will guide you through all the steps to conduct an additive decomposition of the Rexburg Daily Temperature Series. The first step is to aggregate the daily series to a monthly frequency to simplify the calculations. The code below accomplishes this task.
a) Please use AI to comment and explain the steps of the code below. Replace the code below with the fully explained code.
# Weather data for Rexburg
monthly_tsibble <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather.csv") |>
  mutate(date2 = ymd(dates)) |>
  mutate(year_month = yearmonth(date2)) |>
  as_tibble() |>
  group_by(year_month) |>
  summarize(average_daily_high_temp = mean(rexburg_airport_high)) |>
  ungroup() |>
  as_tsibble(index = year_month)
view(monthly_tsibble)
Commented
# Step 1: Import the CSV file containing Rexburg weather data using rio::import
# 'rio' is a versatile library for data import/export; the URL directly imports the CSV file.
monthly_tsibble <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather.csv") |>
# Step 2: Use mutate to create a new column 'date2', which is a properly formatted date object.
# 'ymd' is a lubridate function that converts a string to a Date object in Year-Month-Day format.
mutate(date2 = ymd(dates)) |>
# Step 3: Create another column 'year_month' using mutate, which extracts the year and month
# from the 'date2' column, and turns it into a year-month object (used for time series analysis).
# 'yearmonth' is a tsibble function that creates a year-month representation of the date.
mutate(year_month = yearmonth(date2)) |>
# Step 4: Group the data by 'year_month'. Grouping is necessary because we want to calculate
# the average daily high temperature for each year-month period.
group_by(year_month) |>
# Step 5: Summarize the data by calculating the average daily high temperature for each month.
# The mean function is applied to the 'rexburg_airport_high' column, and the result is stored in 'average_daily_high_temp'.
summarize(average_daily_high_temp = mean(rexburg_airport_high)) |>
# Step 6: Ungroup the data after summarization.
# Ungrouping is done to remove the group structure from the data, which is no longer needed.
ungroup() |>
# Step 7: Convert the resulting data frame into a tsibble (time series tibble).
# A tsibble is a special type of tibble used in time series analysis.
# The 'year_month' column is set as the time index for this tsibble.
as_tsibble(index = year_month)
# Step 8: View the resulting tsibble to inspect the data.
# 'view' is used to open the tsibble in a viewer for easy exploration.
view(monthly_tsibble)
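The group-and-average idea behind the pipeline above can be checked on a tiny data set where the answer is known in advance. The sketch below uses base R only and a synthetic data frame; the column name `high` and the values are assumptions made for illustration, not the Rexburg data.

```r
# Synthetic daily highs (NOT the Rexburg data) for January-March 2021.
# Each day's value is 10 times its month number, so the monthly means are known.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
daily <- data.frame(dates = dates,
                    high = as.numeric(format(dates, "%m")) * 10)

# Label each day with its year-month, then average within each label.
daily$year_month <- format(daily$dates, "%Y-%m")
monthly <- aggregate(high ~ year_month, data = daily, FUN = mean)
monthly
```

The result has one row per month, which mirrors what `group_by()` followed by `summarize()` produces in the tidyverse version.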
b) Please calculate the centered moving average of the Rexburg Monthly Temperature Series. Plot the series.
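As a minimal sketch of how a centered moving average works for monthly data, the example below applies a 13-term window (half weight on the two end months) to a synthetic series. The series, its trend, and its seasonal amplitude are assumptions chosen for illustration; they are not the Rexburg data, and your course notebook may compute the average with different functions.

```r
# Synthetic monthly series (NOT the Rexburg data): linear trend plus a 12-month cycle.
n <- 48
t_idx <- seq_len(n)
x <- 40 + 0.1 * t_idx + 25 * sin(2 * pi * t_idx / 12)

# Weights for a centered 12-month moving average: half weight on the two
# end months and full weight on the 11 months between them, divided by 12.
cma_weights <- c(0.5, rep(1, 11), 0.5) / 12

# sides = 2 centers the window on each month; the first and last 6 values are NA.
cma <- as.numeric(stats::filter(x, cma_weights, sides = 2))

# Plot the original series with the centered moving average on top.
plot(t_idx, x, type = "l", col = "grey50",
     xlab = "Month index", ylab = "Value")
lines(t_idx, cma, lwd = 2)
```

Because the window spans exactly one seasonal cycle, the seasonal swings average out and only the smooth trend remains.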
c) Please calculate the seasonally adjusted series. Plot the series.
d) Please calculate the random component. Please plot the series.
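The remaining manual steps of an additive decomposition can be sketched as follows: estimate the trend, average the detrended values by calendar month, center those averages, and subtract. This is an illustration on synthetic data under the additive model (observed = trend + seasonal + random); the variable names and the simulated series are assumptions, not the Rexburg data or the notebook's exact code.

```r
# Synthetic monthly series (NOT the Rexburg data): 4 full years starting in January.
set.seed(1)
n <- 48
t_idx <- seq_len(n)
month <- rep(1:12, times = n / 12)
x <- 40 + 0.1 * t_idx + 25 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 2)

# Trend estimate: centered 12-month moving average (NA for the first and last 6 months).
trend <- as.numeric(stats::filter(x, c(0.5, rep(1, 11), 0.5) / 12, sides = 2))

# Detrend, then average the detrended values by calendar month.
detrended <- x - trend
monthly_means <- tapply(detrended, month, mean, na.rm = TRUE)

# Center the monthly means so the 12 seasonal effects sum to zero.
seasonal_effect <- monthly_means - mean(monthly_means)
seasonal <- as.numeric(seasonal_effect[month])

# Seasonally adjusted series: observed minus the seasonal component.
seasonally_adjusted <- x - seasonal

# Random component: observed minus trend minus seasonal.
random <- x - trend - seasonal
```

The random component inherits the missing values at both ends of the trend estimate, so it is 12 observations shorter than the original series.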
Question 4: Additive Decomposition - R’s Decompose (15 points)
a) Please use the additive decomposition model described in the Time Series notebook to decompose the Rexburg Monthly Temperature Series. Plot the algorithm’s output.
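For orientation, base R's `decompose()` performs a classical additive decomposition on a `ts` object. The sketch below runs it on a synthetic monthly series; the start date and simulated values are assumptions for illustration, and the Time Series notebook may use a different function (for example, `feasts::classical_decomposition()`), whose output is analogous.

```r
# Synthetic monthly series (NOT the Rexburg data).
set.seed(1)
n <- 48
t_idx <- seq_len(n)
x <- 40 + 0.1 * t_idx + 25 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 2)

# decompose() needs a ts object with a known seasonal frequency (12 for monthly data).
# The start date below is arbitrary.
x_ts <- ts(x, start = c(2000, 1), frequency = 12)

# Additive classical decomposition: observed = trend + seasonal + random.
dec <- decompose(x_ts, type = "additive")

# One panel each for the observed series, trend, seasonal, and random components.
plot(dec)
```

The returned object stores the components as `dec$trend`, `dec$seasonal`, and `dec$random`, and the 12 seasonal effects as `dec$figure`.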
b) You can also decompose without aggregating the data series. The following code completes the additive decomposition with the original data set. Please use AI to comment and explain how to accomplish the task. Replace the code below with the fully explained code.
# Weather data for Rexburg
daily_tsibble <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather.csv") |>
  mutate(year_month_day = ymd(dates)) |>
  select(-imputed, -dates) |>
  # as_tibble() |>
  as_tsibble(index = year_month_day)
daily_tsibble %>% head
daily_decompose <- daily_tsibble |>
  model(feasts::classical_decomposition(rexburg_airport_high ~ season(365.25),
                                        type = "add")) |>
  components()
daily_decompose |> autoplot(.vars = random)
Commented
# Import weather data for Rexburg from a CSV file hosted online.
daily_tsibble <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather.csv") |>
# Convert the 'dates' column to a proper date format using the `ymd()` function.
mutate(year_month_day = ymd(dates)) |>
# Remove the 'imputed' and original 'dates' columns. We're using the new date column.
select(-imputed, -dates) |>
# Convert the data frame to a tsibble object, specifying 'year_month_day' as the time index.
# This is crucial for time series analysis functions to work correctly. A tsibble is a special
# data frame for time series analysis. It records the time index and its interval (here, one day), which many time series operations rely on.
as_tsibble(index = year_month_day)
# Display the first few rows of the tsibble to inspect the data.
daily_tsibble %>% head()
# A tsibble: 6 x 2 [1D]
  rexburg_airport_high year_month_day
                 <int> <date>
1                   30 1999-01-02
2                   25 1999-01-03
3                   26 1999-01-04
4                   29 1999-01-05
5                   32 1999-01-06
6                   31 1999-01-07
# Decompose the time series into its components: trend, seasonality, and remainder.
daily_decompose <- daily_tsibble |>
# Fit a classical decomposition model to the 'rexburg_airport_high' variable (daily high temperatures).
# `classical_decomposition()` function from the `feasts` package is used for this purpose.
# - `season(365.25)` specifies the length of the seasonal period (365.25 days for a year, accounting for leap years).
# - `type = "add"` indicates that an additive decomposition model should be used (Trend + Seasonality + Remainder = Observed).
# An additive model is appropriate when the magnitude of the seasonal fluctuations does not change with the level of the series. If
# the seasonal fluctuations change with the level of the series, a multiplicative model would be more appropriate.
model(feasts::classical_decomposition(rexburg_airport_high ~ season(365.25), type = "add")) |>
# Extract the decomposed components (trend, seasonality, and remainder) from the model.
components()
# Plot the 'random' component (remainder) of the decomposition. The remainder is what's left
# after the trend and seasonality have been removed. It should ideally be random noise, centered
# around zero, with no obvious patterns. If there are patterns in the remainder, it suggests
# that the model has not captured all of the structure in the data.
daily_decompose |> autoplot(.vars = random)
Warning: Removed 364 rows containing missing values or values outside the scale range
(`geom_line()`).
# You could also plot other components:
# daily_decompose |> autoplot(.vars = trend) # Plot the trend component.
# daily_decompose |> autoplot(.vars = seasonal) # Plot the seasonal component.
# daily_decompose |> autoplot() # Plot all components and the original data.
Question 5: Seasonally Adjusted Series - Analysis (20 points)
a) Justify why we use the additive decomposition model to seasonally adjust the Rexburg Daily Temperature series.
b) You calculated the random component of the series using three different procedures. Are the random component series the same? Are there patterns that are similar across all the random component series? Propose an explanation for the commonalities.
Rubric
Criteria | Mastery (10) | Incomplete (0)
Question 1: Context and Measurement | The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for the requested time series. Clear and comprehensive explanations are provided. | The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series.
Criteria | Mastery (5) | Incomplete (0)
Question 2: Visualization | Chooses a reasonable manual range for the Rexburg Daily Temperature series, providing a readable plot that captures the essential data trends. Creates a plot with accurate and clear axis labels, appropriate units, and a caption that enhances the understanding of the Rexburg Daily Temperature series. | Attempts manual range selection, but with significant issues impacting the readability of the plot. The chosen range may obscure important data trends, demonstrating a limited understanding of graphical representation. Fails to include axis labels, units, or captions, leaving the visual representation and interpretation incomplete.
Criteria | Mastery (5) | Incomplete (0)
Question 3a: Monthly Aggregation | The code has been updated with comments and clear explanations of what each command and function does. The student shows they understand the intuition behind the procedure. | The code has not been updated, or the comments and explanations do not provide enough evidence that the student understands the code.
Criteria | Mastery (5) | Incomplete (0)
Question 3b: Centered Moving Average | Correctly calculates the centered moving average. Clearly presents the results with well-labeled axes, titles, and a properly formatted plot. | Incorrectly calculates the centered moving average or omits it entirely. The plot is either missing or poorly presented, lacking clear labels, titles, or proper formatting, making it difficult to interpret the results.
Criteria | Mastery (10) | Incomplete (0)
Question 3c: Seasonally Adjusted Means Series | Correctly calculates the seasonally adjusted means series using an appropriate method. Produces a clear, accurate plot of the seasonally adjusted time series with well-labeled axes and titles. | Incorrect calculation or missing/incorrect plot. Plot lacks essential elements like labels or titles, or fails to represent the seasonally adjusted series.
Criteria | Mastery (5) | Incomplete (0)
Question 3d: Random Component Series | Correctly calculates the random component of the series by removing the trend and seasonal components. Produces a clear, accurate plot with well-labeled axes, titles, and proper formatting. | Incorrectly calculates the random component, omits steps (e.g., does not remove trend/seasonality), or the plot is unclear or missing essential elements like labels and titles.
Criteria | Mastery (10) | Incomplete (0)
Question 4a: Decompose Monthly | Correctly applies the additive decomposition model to the Rexburg Monthly Temperature Series. Clearly presents the trend, seasonal, and random components in well-labeled plots, with appropriate titles, axis labels, and formatting. | Fails to correctly apply the additive decomposition model, resulting in incorrect or incomplete separation of the trend, seasonal, and random components. The plot is either missing or poorly presented, lacking proper labels, titles, or clear distinction between components.
Criteria | Mastery (5) | Incomplete (0)
Question 4b: Decompose Daily | The code has been updated with comments and clear explanations of what each command and function does. The student shows they understand the intuition behind the procedure. | The code has not been updated, or the comments and explanations do not provide enough evidence that the student understands the code.
Criteria | Mastery (10) | Incomplete (0)
Question 5a: Modeling Justification | Clearly differentiates between the multiplicative and additive model assumptions, and shows how the series best matches the additive model’s assumptions. | It’s not clear that the student understands the difference between the additive and multiplicative models or their assumptions.
Criteria | Mastery (10) | Incomplete (0)
Question 5b: Random Component Analysis | Compares the random components derived from three different procedures, accurately stating whether they are the same or different. Identifies and explains any observed similarities or differences. Proposes a logical, well-reasoned explanation for why similar patterns might exist in the random components, using relevant time series concepts (e.g., common noise factors, model assumptions). | Fails to correctly compare the random components, or provides an unclear or inaccurate comparison. Fails to provide a clear or accurate explanation for the commonalities, or provides irrelevant reasoning.
Total Points | 75