Time Series Homework: Chapter 4 Lesson 1 Key

Please_put_your_name_here

Questions

Question 1 - Gaussian White Noise (30 points)

a) Simulate a realization of Gaussian White Noise and plot it. Use 50 points and \(\sigma^2=2\).

Answer

# Packages used throughout this key
library(dplyr)     # %>%
library(ggplot2)   # ggplot()
library(tsibble)   # as_tsibble()
library(feasts)    # classical_decomposition()
library(fable)     # ETS()

set.seed(1234)

# 1. Simulate 50 observations from N(0, sigma^2 = 2)
mu <- 0
sigma_squared <- 2
n <- 50

# Generate white noise data (rnorm() takes the standard deviation, not the variance)
data <- rnorm(n, mean = mu, sd = sqrt(sigma_squared))

# 2. Create a time series object with weekly seasonality (frequency = 7)
# Treating the data as daily, a frequency of 7 gives the decomposition a weekly period
ts_data <- ts(data, frequency = 7)

# 3. Convert the ts object to a tsibble
# as_tsibble() builds the time index from the ts frequency
tsibble_data <- as_tsibble(ts_data)

# 4. Plot the Gaussian white noise
ggplot(tsibble_data, aes(x = index, y = value)) +
  geom_line(color = "blue") +
  geom_point(color = "darkgreen") +
  labs(
    title = "Simulated Gaussian White Noise Without Gaps",
    x = "Time",
    y = "Value"
  ) +
  theme_minimal()

b) Estimate the mean and variance. Why don’t they match exactly with the parameter values you used to create it?

Answer

sample_mean <- mean(tsibble_data$value)
print(paste("Sample Mean:", round(sample_mean, 4)))

[1] "Sample Mean: -0.6407"

sample_variance <- var(tsibble_data$value)
print(paste("Sample Variance:", round(sample_variance, 4)))

[1] "Sample Variance: 1.5666"

The sample mean and variance do not match the parameters exactly because they are estimates computed from a finite sample of 50 observations, not from the whole population. Each observation includes random variation, so any single realization gives estimates that scatter around the true values \(\mu=0\) and \(\sigma^2=2\). The scatter shrinks as the sample size grows: the standard error of the sample mean is \(\sqrt{\sigma^2/n}=\sqrt{2/50}=0.2\), so with only 50 points the estimate can easily miss zero by several tenths.
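The sampling variability described above can be checked by repeating the simulation many times. The following is a minimal sketch, not part of the original key: the seed and the replication count `n_rep` are arbitrary assumptions. It compares the spread of the simulated sample means with the theoretical standard error \(\sqrt{2/50}=0.2\).

```r
# Illustrative sketch (assumed seed and replication count, not from the key)
set.seed(42)
sigma_squared <- 2
n <- 50
n_rep <- 5000

# Each column is one simulated Gaussian white noise series of length n
sims <- matrix(rnorm(n * n_rep, mean = 0, sd = sqrt(sigma_squared)), nrow = n)

sample_means <- colMeans(sims)
sample_vars  <- apply(sims, 2, var)

# Theoretical standard error of the sample mean: sqrt(sigma^2 / n) = 0.2
theoretical_se <- sqrt(sigma_squared / n)

c(
  theoretical_se = theoretical_se,
  empirical_se   = sd(sample_means),   # should be close to 0.2
  mean_of_vars   = mean(sample_vars)   # should be close to 2, since var() is unbiased
)
```

Individual estimates vary from run to run, but their averages settle near the true parameters, which is what unbiasedness means.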

c) Demonstrate that this process is second-order stationary. Apply a statistical test to assess if the mean is zero. Use correlograms to show the covariance and correlation functions for lags up to 10. Explain your findings.

Answer

acf(tsibble_data$value, plot = TRUE, type = "covariance", lag.max = 10)
acf(tsibble_data$value, plot = TRUE, type = "correlation", lag.max = 10)
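The question also asks for a test of whether the mean is zero. One way to do that is a one-sample t-test, sketched below together with the approximate white noise bounds \(\pm 1.96/\sqrt{n}\) for the sample autocorrelations. This is an illustration under assumptions, not the key's official answer: it regenerates the series with the same seed and call as part (a), which should reproduce the same values, and the names `x`, `mean_test`, `rho_hat`, and `bound` are introduced here.

```r
# Illustrative sketch: regenerate the series the same way as in part (a)
set.seed(1234)
x <- rnorm(50, mean = 0, sd = sqrt(2))

# One-sample t-test of H0: mean = 0
mean_test <- t.test(x, mu = 0)
mean_test

# Sample autocorrelations for lags 1..10 (drop lag 0, which is always 1)
rho_hat <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]

# Approximate 95% bounds under white noise: +/- 1.96 / sqrt(n)
bound <- qnorm(0.975) / sqrt(length(x))
data.frame(lag = 1:10, rho_hat = round(rho_hat, 3), outside = abs(rho_hat) > bound)
```

Using the values reported in part (b), the t statistic is \(-0.6407/\sqrt{1.5666/50}\approx -3.6\), which lies beyond the 5% critical value of about \(\pm 2.01\) for 49 degrees of freedom. The generating process has mean zero by construction, so a rejection for this seed is a Type I error of the kind expected in roughly 5% of samples. Stationarity is a property of the process, which has constant mean 0, constant variance 2, and zero autocovariance at every nonzero lag.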

d) Use the classical decomposition on your simulated series, plot the results.

Answer

decomp <- tsibble_data %>%
  model(
    classical_decomposition(value, type = "additive")
  ) %>%
  components()

# Plot the decomposed components
autoplot(decomp) +
  labs(
    title = "Additive Decomposition of Gaussian White Noise Series",
    y = "Value"
  ) +
  theme_minimal()

e) Please evaluate the practical significance of the estimates of the trend and seasonal components.

Answer

Because this is simulated Gaussian white noise, the series is purely stochastic. The mean is constant at 0, so there is no trend, and the theoretical autocorrelation is 0 at every nonzero lag, so there is no seasonality. The algorithm will still estimate both components. The estimated trend is not flat, but it should hover around 0, the mean of the distribution the data came from. The decomposition assumes an underlying trend even though none exists, and the moving-average smoother used to estimate it turns runs of random noise into small-scale apparent trends.

Likewise, Gaussian white noise has no periodic or seasonal pattern, yet the algorithm reports a seasonal component. The decomposition imposes a seasonal structure on inherently random data, so the visible seasonal pattern is an artifact of the algorithm and has no practical significance.

If the decomposition were faithful, the random component would closely resemble the original series, since all of the variation is random. Instead, the random component looks noticeably different, because the algorithm has subtracted the spurious trend and seasonal estimates from the data.
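To put numbers on how much of the noise the decomposition absorbs, the spread of each estimated component can be compared with the spread of the series. The sketch below is only an illustration: it uses base R's `decompose()` in place of the key's `classical_decomposition()`, and the seed is an arbitrary assumption.

```r
# Illustrative sketch using base R decompose(); the seed is an arbitrary assumption
set.seed(2024)
noise <- ts(rnorm(50, mean = 0, sd = sqrt(2)), frequency = 7)

dec <- decompose(noise, type = "additive")

# Spread of each estimated component, on the scale of the data
c(
  sd_series   = sd(noise),
  sd_trend    = sd(dec$trend, na.rm = TRUE),   # moving average leaves NAs at both ends
  sd_seasonal = sd(dec$seasonal),
  sd_random   = sd(dec$random, na.rm = TRUE)
)
```

The trend here is a 7-term moving average of white noise, so its variance is roughly \(\sigma^2/7\) even though the true trend is exactly zero. Nonzero trend and seasonal spreads in this output therefore reflect smoothed noise, not structure in the data.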

Extra Credit

Question 2 - Random Walks (55 points)

a) Simulate a realization of a white noise process using the Exponential distribution. If \(s_t\) is the realization from an exponentially distributed random variable, use the white noise \(w_t=s_t-\lambda\) for your simulation. The length of the series is 500 points and \(\lambda=1\). Please plot your simulation.

Answer

# Set parameters
lambda <- 1
n <- 500

# Exponential draws with rate lambda have mean 1/lambda,
# which equals lambda here because lambda = 1
s_t <- rexp(n, rate = lambda)
w_t <- s_t - lambda  # centered (mean-zero) white noise

# Plot the white noise series
plot(w_t, type = "l", main = "Simulated White Noise Process", ylab = "w_t", xlab = "Time")

b) Superimpose a histogram of the Gaussian white noise simulation from Question 1 on a histogram of the Exponential white noise. Please compare and contrast the two distributions. Your chart should look close to Figure 1 below. Hint: research exponential random variables and their right-tail properties.

![Figure 1](https://github.com/byuistats/timeseries/raw/master/images/Overlayed_Histograms_HW4_1.png)

Answer

# Exponential White Noise Parameters
lambda <- 1
n_exp <- 500
s_t <- rexp(n_exp, rate = lambda)
w_t_exp <- s_t - lambda  # Centered Exponential White Noise

# Gaussian White Noise Parameters
n_gauss <- 500
sigma2 <- 2
sigma <- sqrt(sigma2)

w_t_gauss <- rnorm(n_gauss, mean = 0, sd = sigma)

# Stack both samples in long format, one row per draw, labeled by distribution
df <- data.frame(
  value = c(w_t_exp, w_t_gauss),
  dist = factor(rep(c("Exponential", "Gaussian"),
                    times = c(length(w_t_exp), length(w_t_gauss))))
)

# after_stat(density) replaces the deprecated ..density.. notation
ggplot(df, aes(x = value, fill = dist, color = dist)) +
  geom_histogram(aes(y = after_stat(density)), position = "identity", bins = 15, alpha = 0.3) +
  scale_fill_manual(values = c("Exponential" = rgb(1, 0, 0, 0.3), 
                               "Gaussian" = rgb(0, 0, 1, 0.3))) +
  scale_color_manual(values = c("Exponential" = "darkred", 
                                "Gaussian" = "blue")) +
  labs(title = "Overlayed Histograms: Gaussian vs Exponential Distribution",
       x = "Value", y = "Density",
       fill = "Distribution", color = "Distribution") +
  xlim(c(-3, 4)) + ylim(c(0, 0.7)) +
  theme_minimal()
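The comparison the question asks for can be backed with a few summary statistics. The sketch below is an illustration, not part of the original key: the seed, the larger sample size, and the helper `skewness()` are assumptions introduced here. In theory the centered Exponential(1) noise has mean 0, variance 1, skewness 2, and a hard lower bound at -1, while the Gaussian noise has mean 0, variance 2, skewness 0, and no bound on either side.

```r
# Illustrative sketch; seed and sample size are assumptions chosen for stability
set.seed(99)
n <- 10000
w_exp   <- rexp(n, rate = 1) - 1              # centered exponential: support [-1, Inf)
w_gauss <- rnorm(n, mean = 0, sd = sqrt(2))   # Gaussian with variance 2

# Sample skewness: third central moment over the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

summary_stats <- data.frame(
  dist     = c("Exponential", "Gaussian"),
  mean     = c(mean(w_exp), mean(w_gauss)),
  variance = c(var(w_exp), var(w_gauss)),
  skewness = c(skewness(w_exp), skewness(w_gauss)),
  minimum  = c(min(w_exp), min(w_gauss))
)
summary_stats
```

Both series are white noise with mean zero, but the exponential version is strongly right-skewed with a long right tail and nothing below -1, whereas the Gaussian version is symmetric and more spread out.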

c) Create a random walk series using the Exponential DWN simulations. Please plot the series.

Answer

# Generate the random walk series by taking the cumulative sum of w_t
random_walk <- cumsum(w_t)

# Plot the random walk series
plot(random_walk, type = "l", main = "Random Walk Series", ylab = "Random Walk", xlab = "Time")
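The cumulative sum implements the random walk recursion \(x_t = x_{t-1} + w_t\) with \(x_1 = w_1\). As a quick check of that relationship, the sketch below (the seed and variable names are assumptions) differences a simulated walk to recover the noise and compares the lag-1 sample autocorrelation of the walk with that of its differences.

```r
# Illustrative sketch; the seed is an arbitrary assumption
set.seed(7)
w <- rexp(500, rate = 1) - 1      # centered exponential white noise
x <- cumsum(w)                    # random walk: x_t = x_{t-1} + w_t, with x_1 = w_1

# First differences undo the cumulative sum
recovered <- diff(x)
all.equal(recovered, w[-1])

# Lag-1 sample autocorrelation: near 1 for the walk, near 0 for its differences
c(
  walk  = acf(x, lag.max = 1, plot = FALSE)$acf[2],
  diffs = acf(recovered, lag.max = 1, plot = FALSE)$acf[2]
)
```

The strong persistence of the walk and the lack of it in the differences is why a decomposition can appear to find a trend in a series that is only accumulated noise.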

d) Use the Holt-Winters decomposition method to estimate the trend, seasonal, and random components of the series.

Answer

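# Treat the walk as monthly data (frequency = 12) so a seasonal term can be fit
random_walk_ts <- ts(random_walk, frequency = 12) %>% as_tsibble()

# Additive Holt-Winters written as ETS(A,A,A); opt_crit = "amse" with nmse = 1
# minimizes the average one-step-ahead squared error, i.e. the SS1PE criterion
hw_model <- random_walk_ts %>%
  model(
    Additive = ETS(
      value ~ error("A") + trend("A") + season("A"),
      opt_crit = "amse",
      nmse = 1
    )
  )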

report(hw_model)

Series: value 
Model: ETS(A,A,A) 
  Smoothing parameters:
    alpha = 0.9503298 
    beta  = 0.0001031695 
    gamma = 0.02856599 

  Initial states:
     l[0]       b[0]       s[0]      s[-1]      s[-2]       s[-3]       s[-4]
 2.477575 -0.0102033 -0.1772354 -0.1363246 0.02429963 -0.03235896 -0.05994879
      s[-5]     s[-6]      s[-7]     s[-8]      s[-9]     s[-10]    s[-11]
 0.05676777 0.3267906 -0.2972415 0.1992056 -0.1795977 0.03721112 0.2384321

  sigma^2:  1.171

     AIC     AICc      BIC 
3203.978 3205.248 3275.626

autoplot(components(hw_model))

e) Please evaluate the practical significance of the estimates for the parameters of the model using the default algorithm settings (minimize the SS1PE).

Answer

The estimated Holt-Winters parameters, \(\alpha \approx 0.95\), \(\beta \approx 0.0001\), and \(\gamma \approx 0.03\), are sensible for a random walk built from exponential discrete white noise. The high \(\alpha\) means the level update puts almost all of its weight on the most recent observation, which is characteristic of a random walk, where the best forecast of the next value is essentially the current value. The near-zero \(\beta\) and \(\gamma\) mean the slope and seasonal estimates are barely updated from their initial states, which fits a series with no persistent deterministic trend (any apparent drift is absorbed by the level) and no seasonality. In effect, the model simplifies itself to something close to simple exponential smoothing with a large smoothing weight. Your estimates will differ from run to run; expect \(\alpha\) between 0.94 and 0.99, and \(\beta\) and \(\gamma\) each between 0.0001 and 0.03.
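The claim that a high \(\alpha\) suits a random walk can be illustrated without the full Holt-Winters machinery. The sketch below is a simplified stand-in, not the fitted model: it uses simple exponential smoothing (no trend or seasonal terms), and the seed and the helper `ses_forecast()` are hypothetical names introduced here. It compares one-step forecast errors for \(\alpha = 0.95\), \(\alpha = 0.30\), and the naive forecast.

```r
# Illustrative sketch; the seed and the helper ses_forecast() are hypothetical
set.seed(11)
x <- cumsum(rexp(500, rate = 1) - 1)   # random walk from centered exponential noise

# One-step-ahead forecasts from simple exponential smoothing:
# level_t = alpha * x_t + (1 - alpha) * level_{t-1}; the forecast of x_{t+1} is level_t
ses_forecast <- function(x, alpha) {
  level <- numeric(length(x))
  level[1] <- x[1]
  for (t in 2:length(x)) {
    level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
  }
  level[-length(x)]   # forecasts for x[2], ..., x[n]
}

actual <- x[-1]
naive  <- x[-length(x)]                # naive forecast: previous observation

rmse <- function(f) sqrt(mean((actual - f)^2))
c(
  naive    = rmse(naive),
  ses_0.95 = rmse(ses_forecast(x, alpha = 0.95)),
  ses_0.30 = rmse(ses_forecast(x, alpha = 0.30))
)
```

For a random walk the one-step error variance of this smoother is \(\sigma^2 / (1 - (1-\alpha)^2)\), so \(\alpha = 0.95\) is almost as good as the naive forecast while \(\alpha = 0.30\) roughly doubles the error variance. That is why the optimizer pushes \(\alpha\) toward 1.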

Rubric

	Mastery (5)	Incomplete (0)
Question 1a: Gaussian White Noise	They effectively plot the simulated realization of Gaussian White Noise in R using plotting functions like plot() or ggplot2, ensuring proper labeling of the axes, a title, and any other necessary elements for clear visualization. Code is well-commented.	Students struggle to simulate Gaussian White Noise in R or effectively plot the simulation, potentially due to errors in coding or misunderstandings of the process. Their R code may lack sufficient comments, making it challenging for others to understand or reproduce the results. The plot may lack clarity or proper presentation, hindering interpretation, and may omit necessary elements such as axis labels or a title.
	Mastery (10)	Incomplete (0)
Question 1b: Sample second order properties	Students demonstrate an understanding of statistical concepts such as variance, covariance, and autocorrelation, and use appropriate methods or functions to estimate these properties in their chosen statistical software. Proficient explanations may highlight factors such as sample size, random variability, or measurement error, which can introduce discrepancies between estimated properties and the parameters set during data creation.	Students fail to estimate the second-order properties of the sample data or provide a coherent explanation for discrepancies with the parameters set during data creation. They may demonstrate a limited understanding of statistical concepts or use inappropriate methods for estimation. Additionally, they may overlook key factors contributing to discrepancies, providing vague or incomplete explanations. Overall, their analysis may lack depth or clarity, hindering comprehension of the reasons behind the observed differences.
	Mastery (5)	Incomplete (0)
Question 1c: Decomposition	Student applies the classical decomposition to the simulated series, accurately separating and plotting the components (trend, seasonal, and random) with clear labels and appropriate formatting.	Fails to apply classical decomposition or does not plot the decomposed components clearly. Missing labels or poor formatting make the plot difficult to interpret.
	Mastery (10)	Incomplete (0)
Question 1d: Interpretation and Analysis	Clearly evaluates the trend and seasonal component estimates. Provides a well-reasoned evaluation of whether decomposition is appropriate for the series.	Fails to accurately interpret the trend and seasonal components, providing minimal or incorrect insights into the data patterns. Lacks a clear or well-supported evaluation of the appropriateness of decomposition, with reasoning that is vague, unsupported, or does not address the model assumptions.
	Mastery (10)	Incomplete (0)
Question 2a: Simulation and Plot	Correctly simulates a realization of white noise by generating 500 points from an Exponential distribution, and presents the results in a clear plot with appropriate axis labels and units.	Fails to correctly simulate the white noise process, does not use the specified parameters, or presents a plot that lacks clarity, labels, or units.
	Mastery (10)	Incomplete (0)
Question 2b: Histograms	Responses superimpose a histogram of the Gaussian White Noise simulation from Question 1 alongside the Exponential White Noise. They compare and contrast the two distributions, demonstrating an understanding of statistical concepts such as mean and variance. Proficient analyses may include research findings on the mean and variance of an exponential random variable, providing context for the comparison. The resulting chart closely resembles Figure 1, showing clear distinctions between the distributions and offering insightful commentary on their similarities and differences.	Responses may struggle to accurately superimpose the histograms of Gaussian White Noise and Exponential White Noise or provide a coherent comparison between the two distributions. They may lack understanding of statistical concepts such as mean and variance or fail to research the properties of an exponential random variable. The resulting chart may lack clarity or proper presentation, hindering interpretation of the distributions. Additionally, they may overlook key differences between the distributions or provide vague or superficial comparisons, indicating a limited understanding of the underlying concepts.
	Mastery (10)	Incomplete (0)
Question 2c: Simulate a Random Walk	Correctly generates a random walk series using Exponential DWN simulations and accurately plots the series with clear axis labels and units.	Fails to correctly generate the random walk series, does not use Exponential DWN as specified, or presents a plot that lacks clarity, labels, or appropriate units.
	Mastery (5)	Incomplete (0)
Question 2d: Holt Winters Decomposition	Applies the Holt-Winters decomposition method to estimate the trend, seasonal, and random components, presenting results that clearly reflect each component’s behavior within the series.	Fails to apply the Holt-Winters decomposition, produces inaccurate component estimates, or presents results that are unclear or incorrectly labeled.
	Mastery (20)	Incomplete (0)
Question 2e: Holt Winters Decomposition Evaluation	Provides a thoughtful evaluation of whether decomposition is appropriate for a random walk, considering the characteristics of the series.	Fails to interpret the parameter estimates or provides an incomplete or incorrect explanation of their significance. Does not evaluate decomposition validity for a random walk or provides unsupported reasoning.
Total Points	85