Time Series Homework: Chapter 4 Lesson 4

Please_put_your_name_here

Data

exuseu <- rio::import("https://byuistats.github.io/timeseries/data/exuseu.csv")

Questions

Question 1 - Context and Measurement (5 points)

The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data is measuring. Without context, even the simplest features of data can be obscure and inscrutable. This homework assignment will center around the series below.

Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.

a) Dollar-Euro Exchange Rate

FRED

Answer

Data Collection Process: The data is collected daily (business days) and reflects the foreign exchange rate at which one euro can be exchanged for U.S. dollars. The data is sourced from the Federal Reserve or other financial institutions, using averages of currency trading rates from major banks.

Unit of Analysis: The unit of analysis is the daily exchange rate, expressed as the number of U.S. dollars per one euro.

Meaning: Each observation represents the average exchange rate between the dollar and the euro for a specific business day, providing insight into the currencies' relative strength and market conditions on that date.

Question 2 - Sneaky AR(2) (50 points)

a) Simulate an AR(2) process with \(n=100, \mu=0,\alpha_1=0.25, \alpha_2=0.1, \sigma^2=1\)

Answer
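# Packages used by the code in this document
library(tidyverse)  # tibble, dplyr, ggplot2
library(lubridate)  # mdy()
library(tsibble)    # as_tsibble()
library(fable)      # AR(), model(), tidy(), autoplot()

set.seed(123)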
n_rep <- 100
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}

dat_ts |> 
  autoplot(.vars = x) +
    labs(
      x = "Time",
      y = "Simulated Time Series",
      title = paste("Simulated Values from an AR(2) Process with Mean", alpha0)
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5)
    )

b) Is the AR(2) process stationary? Please use the characteristic equation method and report your results in a table.

Answer
abs(polyroot(c(1,-0.25,-0.1)))
[1] 2.150368 4.650368

Yes, it is stationary because the absolute values of the solutions of the characteristic equation (about 2.15 and 4.65) are both greater than 1.
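Since the question asks for a table, the roots can also be collected in a small data frame. The following is a minimal sketch in base R; the column names (real, imaginary, modulus, outside_unit_circle) are illustrative choices and not part of the assignment.

```r
# Characteristic polynomial of the AR(2) process: 1 - 0.25 x - 0.1 x^2.
# polyroot() expects the coefficients in increasing order of power.
roots <- polyroot(c(1, -0.25, -0.1))

# One row per root; the process is stationary when every modulus exceeds 1.
root_table <- data.frame(
  real = round(Re(roots), 4),
  imaginary = round(Im(roots), 4),
  modulus = round(Mod(roots), 4),
  outside_unit_circle = Mod(roots) > 1
)
print(root_table)
```

Both rows should show a modulus above 1, matching the values reported by abs(polyroot(...)).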

c) Let R decide how many lags between 1 and 9 to fit the simulated series. Try it a few times. How often were the model parameters inside the estimate’s confidence intervals?

Answer
set.seed(123)
n_rep <- 100
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}
fit_ar <- dat_ts |>
    model(AR(x ~ order(1:9))) 
ci_summary1 <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )


set.seed(164)
n_rep <- 100
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}
fit_ar <- dat_ts |>
    model(AR(x ~ order(1:9))) 
ci_summary2 <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )


set.seed(198)
n_rep <- 100
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}
fit_ar <- dat_ts |>
    model(AR(x ~ order(1:9))) 
ci_summary3 <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )
ci_summary1
# A tibble: 1 × 8
  .model             term  estimate std.error statistic p.value  lower upper
  <chr>              <chr>    <dbl>     <dbl>     <dbl>   <dbl>  <dbl> <dbl>
1 AR(x ~ order(1:9)) ar1      0.259    0.0972      2.66 0.00904 0.0644 0.453
ci_summary2
# A tibble: 3 × 8
  .model             term  estimate std.error statistic p.value   lower  upper
  <chr>              <chr>    <dbl>     <dbl>     <dbl>   <dbl>   <dbl>  <dbl>
1 AR(x ~ order(1:9)) ar1     0.171     0.0988     1.73   0.0871 -0.0268 0.368 
2 AR(x ~ order(1:9)) ar2     0.0871    0.0996     0.874  0.384  -0.112  0.286 
3 AR(x ~ order(1:9)) ar3    -0.126     0.0981    -1.29   0.202  -0.322  0.0701
ci_summary3
# A tibble: 1 × 8
  .model             term  estimate std.error statistic p.value  lower upper
  <chr>              <chr>    <dbl>     <dbl>     <dbl>   <dbl>  <dbl> <dbl>
1 AR(x ~ order(1:9)) ar1     0.0513    0.0984     0.521   0.603 -0.146 0.248

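None of the three fits selected an AR(2) process: seeds 123 and 198 produced AR(1) models and seed 164 produced an AR(3) model. The true parameters were usually, but not always, inside the intervals. For seed 123, the ar1 interval (0.064, 0.453) contains 0.25. For seed 164, all three intervals contain the true values (0.25, 0.1, and 0 for the third lag). For seed 198, the ar1 interval (-0.146, 0.248) narrowly misses 0.25. However, your answers may differ depending on the seeds you set.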
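Three seeds give only a small sample of what the order selection does. The sketch below repeats the experiment over many seeds and tabulates the selected order. It is an illustration under stated assumptions: it swaps in base R's stats::ar() (Yule-Walker with AIC selection) in place of fable's AR(), so the selected orders need not match the fits above, and stats::ar() may also select order 0. The helper names simulate_ar2 and selected_order and the seed range 1:200 are hypothetical choices.

```r
# Simulate the same AR(2) recursion as above (mean 0, starting values 0).
# A recursive filter computes x[t] = w[t] + alpha1 * x[t-1] + alpha2 * x[t-2].
simulate_ar2 <- function(seed, n, alpha1 = 0.25, alpha2 = 0.1, sigma = 1) {
  set.seed(seed)
  w <- rnorm(n, sd = sigma)
  as.numeric(stats::filter(w, filter = c(alpha1, alpha2), method = "recursive"))
}

# Order chosen by AIC among 0..max_order (assumption: stats::ar, not fable::AR).
selected_order <- function(x, max_order = 9) {
  as.integer(stats::ar(x, order.max = max_order, aic = TRUE)$order)
}

seeds <- 1:200
orders_n100 <- vapply(
  seeds,
  function(s) selected_order(simulate_ar2(s, n = 100)),
  integer(1)
)

# How often each order was selected when n = 100.
print(table(orders_n100))
```

Changing n = 100 to n = 1000 in the call to simulate_ar2() gives the corresponding tabulation for the larger sample size.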

d) Increase n to 1000. Repeat part c. Did the R routine obtain better estimates?

Answer
set.seed(123)
n_rep <- 1000
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}
fit_ar <- dat_ts |>
    model(AR(x ~ order(1:9))) 
ci_summary1 <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )
set.seed(164)
n_rep <- 1000
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}
fit_ar <- dat_ts |>
    model(AR(x ~ order(1:9))) 
ci_summary2 <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )
set.seed(198)
n_rep <- 1000
alpha0 <- 0
alpha1 <- 0.25
alpha2 <- 0.1
sigma_sqr <- 1

dat_ts <- tibble(w = rnorm(n = n_rep, sd = sqrt(sigma_sqr))) |>
    mutate(
      index = 1:n(),
      x = 0
    ) |>
    tsibble::as_tsibble(index = index)

# Simulate x values
dat_ts$x[1] <- alpha0 + dat_ts$w[1]
dat_ts$x[2] <- alpha0 + alpha1 * ( dat_ts$x[1] - alpha0 ) + dat_ts$w[2]
for (i in 3:nrow(dat_ts)) {
  dat_ts$x[i] <- alpha0 + 
    alpha1 * ( dat_ts$x[i-1] - alpha0 ) + 
    alpha2 * ( dat_ts$x[i-2] - alpha0 ) + 
    dat_ts$w[i]
}
fit_ar <- dat_ts |>
    model(AR(x ~ order(1:9))) 
ci_summary3 <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )
ci_summary1
# A tibble: 2 × 8
  .model             term  estimate std.error statistic  p.value  lower upper
  <chr>              <chr>    <dbl>     <dbl>     <dbl>    <dbl>  <dbl> <dbl>
1 AR(x ~ order(1:9)) ar1     0.219     0.0316      6.94 6.89e-12 0.156  0.282
2 AR(x ~ order(1:9)) ar2     0.0732    0.0316      2.32 2.06e- 2 0.0101 0.136
ci_summary2
# A tibble: 3 × 8
  .model             term  estimate std.error statistic  p.value    lower  upper
  <chr>              <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>  <dbl>
1 AR(x ~ order(1:9)) ar1     0.273     0.0316      8.63 2.43e-17  0.209   0.336 
2 AR(x ~ order(1:9)) ar2     0.0725    0.0327      2.22 2.66e- 2  0.00723 0.138 
3 AR(x ~ order(1:9)) ar3    -0.0503    0.0316     -1.59 1.12e- 1 -0.113   0.0129
ci_summary3
# A tibble: 2 × 8
  .model             term  estimate std.error statistic  p.value  lower upper
  <chr>              <chr>    <dbl>     <dbl>     <dbl>    <dbl>  <dbl> <dbl>
1 AR(x ~ order(1:9)) ar1      0.279    0.0315      8.84 4.13e-18 0.216  0.342
2 AR(x ~ order(1:9)) ar2      0.109    0.0315      3.45 5.85e- 4 0.0456 0.171

The estimates have improved: the standard errors fell from about 0.10 to about 0.03, so the confidence intervals are much narrower. In all three fits, the true values 0.25 and 0.1 lie within the intervals for ar1 and ar2. With more information, R identified the model more accurately, selecting an AR(2) model for two of the three seeds. However, the fit for seed 164 still selected an AR(3) model, although its ar3 interval contains 0.
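The gain in precision is roughly what a 1/sqrt(n) rule of thumb predicts. The sketch below is a back-of-the-envelope check, not part of the fitted output: it assumes the large-sample approximation that the standard error of an AR coefficient estimate is about 1/sqrt(n) when the true coefficients are small, as they are here.

```r
# Rule-of-thumb standard error for the two sample sizes used above.
n <- c(100, 1000)
approx_se <- 1 / sqrt(n)

# Approximate half-width of the "estimate +/- 2 * std.error" interval.
half_width <- 2 * approx_se

comparison <- data.frame(
  n = n,
  approx_se = round(approx_se, 4),
  half_width = round(half_width, 4)
)
print(comparison)
```

The approximate values, 0.1 for n = 100 and 0.0316 for n = 1000, are close to the std.error column in the two sets of tables.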

e) How much trust should you have in R’s model fitting routine when you don’t know what the data generating process is?

Answer

You should exercise great caution when using R’s model fitting routine if you do not know the data generating process. R will try to optimize the model without evaluating the context behind the data. Because of this, it is up to you to evaluate the series and decide which model should be used, by performing exploratory analysis and gaining domain knowledge about the series.

Question 3 - Stationarity and AR(p) fitting (45 points)

A key assumption when fitting AR(p) models is that the time series data is stationary. Stationarity ensures that the statistical properties of the series remain constant over time, so our estimates of the mean, variance, and temporal relationships are valid. Non-stationary data, characterized by trends or a changing mean and variance, can lead to unreliable parameter estimates and inaccurate forecasts when using AR(p) models. A common remedy is differencing, particularly first differencing. First differencing can transform a non-stationary series into a stationary one, so that AR(p) models can be applied with more confidence and accuracy.

a) Plot the Dollar-Euro Exchange Rate series, its correlogram, and a partial correlogram. Is the series stationary?

Answer
us_ts <- exuseu |>
  mutate(date = mdy(date))|>
  as_tsibble(index = date)
ggplot(data = us_ts, aes(x = date, y = exuseu))+
  geom_line()+
  labs(x ="Date",
       y = "Dollar-Euro Exchange Rate")

acf(us_ts$exuseu)

pacf(us_ts$exuseu)

The data appears to be non-stationary, as the correlogram decays only slowly as the lag increases. There appears to be a large spike at lag 1 in the partial correlogram, suggesting that an AR(1) process may underlie the data.
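The link between a slowly decaying correlogram and non-stationarity can be seen on simulated data. The sketch below uses a simulated random walk, not the exchange rate series; the seed, sample size, and number of lags are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 1000

# A random walk is the cumulative sum of white noise: x_t = x_{t-1} + w_t.
w <- rnorm(n)
random_walk <- cumsum(w)

# Sample autocorrelations at lags 1 to 20 (the first element of $acf is lag 0).
acf_walk <- as.numeric(acf(random_walk, lag.max = 20, plot = FALSE)$acf)[-1]
acf_diff <- as.numeric(acf(diff(random_walk), lag.max = 20, plot = FALSE)$acf)[-1]

# The random walk shows autocorrelations near 1 that decay slowly;
# its first difference is white noise, with autocorrelations near 0.
print(round(head(acf_walk, 5), 3))
print(round(head(acf_diff, 5), 3))
```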

b) Fit an AR(p) model to the data by letting R choose how many lags to use. Please report your results in a table.

Answer
fit_ar <- us_ts |>
    model(AR(exuseu ~ order(1:9))) 
ci_summary <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )
ci_summary
# A tibble: 5 × 8
  .model          term  estimate std.error statistic   p.value    lower    upper
  <chr>           <chr>    <dbl>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>
1 AR(exuseu ~ or… cons…  0.00480   0.00261      1.84 6.64e-  2 -4.25e-4  0.0100 
2 AR(exuseu ~ or… ar1    1.26      0.0276      45.5  4.54e-272  1.20e+0  1.31   
3 AR(exuseu ~ or… ar2   -0.326     0.0442      -7.37 2.94e- 13 -4.14e-1 -0.238  
4 AR(exuseu ~ or… ar3    0.129     0.0442       2.93 3.48e-  3  4.10e-2  0.218  
5 AR(exuseu ~ or… ar4   -0.0640    0.0276      -2.32 2.05e-  2 -1.19e-1 -0.00884

c) Is the fitted model stationary? Would you use the fitted model as the basis for forecasting?

Answer
abs(polyroot(c(1,-1.257,0.326,-0.129,0.064)))
[1] 1.005448 2.624603 2.624603 2.255970

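The fitted model is technically stationary, since every root of the characteristic equation has an absolute value greater than 1. However, the smallest root (about 1.005) is barely outside the unit circle, so the model is very close to non-stationary. This model is unfit for forecasting because the underlying data appears to be non-stationary. It also does not match what we observe in the partial correlogram, and therefore needs more scrutiny.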
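The sign convention in the polyroot() call is easy to get wrong, so it can help to wrap it in a function. The following is a minimal sketch; the helper name ar_root_moduli and the 5% near-unit-root threshold are hypothetical choices, not a standard test.

```r
# Hypothetical helper: moduli of the characteristic roots of an AR(p) model
# x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + w_t.
ar_root_moduli <- function(ar_coefs) {
  # Characteristic polynomial: 1 - a_1 x - ... - a_p x^p (note the sign flip).
  Mod(polyroot(c(1, -ar_coefs)))
}

# Rounded AR(4) estimates, as used in the polyroot() call above.
ar4_coefs <- c(1.257, -0.326, 0.129, -0.064)
moduli <- ar_root_moduli(ar4_coefs)

# Illustrative threshold (an assumption): flag roots within 5% of the unit circle.
near_unit_root <- moduli < 1.05

root_check <- data.frame(
  modulus = round(moduli, 4),
  stationary = moduli > 1,
  near_unit_root = near_unit_root
)
print(root_check)
```

Every root passes the stationarity check, but one of them is flagged as close to the unit circle, which is the reason for caution.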

d) Take the first-difference of the Dollar-Euro Exchange Rate series.

Answer
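# dplyr::lag() is named explicitly: stats::lag() does not shift the values of a plain vector
us_ts$diff <- us_ts$exuseu - dplyr::lag(us_ts$exuseu)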
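As a quick check on what the first difference contains, here is a minimal sketch in base R. The four values in x are made up for illustration and are not taken from the exchange rate data.

```r
# Hypothetical values, for illustration only.
x <- c(1.10, 1.12, 1.09, 1.15)

# diff() returns x[t] - x[t-1] for t = 2, ..., n, so it is one element shorter.
# Padding with NA keeps the result aligned with the original rows.
first_diff <- c(NA, diff(x))
print(first_diff)
```

The leading NA is why the later code wraps the differenced column in na.omit().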

e) Plot the differenced series, its correlogram, and a partial correlogram. Is the series stationary?

Answer
ggplot(data = us_ts, aes(x = date, y = diff))+
  geom_line()+
  labs(x = "Date",
       y = "First difference of US-Euro Exchange Rate")

acf(na.omit(us_ts$diff))

pacf(na.omit(us_ts$diff))

By taking a first difference of the series, we get a series that appears to be stationary. When evaluating the correlogram we no longer see the same decay that we originally saw as lags increased.

f) Fit an AR(p) model to the data by letting R choose how many lags to use. Please report your results in a table.

Answer
fit_ar <- na.omit(us_ts) |>
    model(AR(diff ~ order(1:9))) 
ci_summary <- tidy(fit_ar) |>
    mutate(
        lower = estimate - 2 * std.error,
        upper = estimate + 2 * std.error
    )
ci_summary
# A tibble: 3 × 8
  .model            term  estimate std.error statistic  p.value    lower   upper
  <chr>             <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>   <dbl>
1 AR(diff ~ order(… ar1     0.259     0.0276      9.39 2.61e-20  0.204    0.314 
2 AR(diff ~ order(… ar2    -0.0675    0.0284     -2.37 1.78e- 2 -0.124   -0.0106
3 AR(diff ~ order(… ar3     0.0614    0.0276      2.23 2.62e- 2  0.00621  0.117 

g) Is the fitted model stationary? Would you use the fitted model as the basis for forecasting?

Answer
abs(polyroot(c(1,-0.259,0.067,-0.061)))
[1] 2.317880 2.659436 2.659436

The fitted model is stationary, as the absolute values of all roots of the characteristic equation are greater than 1. The partial correlogram suggests that the first lag is significant. The third lag also appears significant, which is reflected in the model R selects. However, the case for an AR(3) model is questionable, since we would expect about 5% of lags to fall outside the bounds by chance alone. For this reason, the AR(3) fit should not be trusted uncritically, and an AR(1) model would be more appropriate.
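The 5% figure can be illustrated with simulated white noise, where no lag is truly significant. The sketch below is an illustration only; the seed, series length, and number of lags are arbitrary choices, and the bounds are the usual approximate 95% limits of plus or minus 1.96/sqrt(n).

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 1000
n_lags <- 30

# White noise: every true partial autocorrelation is zero.
w <- rnorm(n)
pacf_vals <- as.numeric(pacf(w, lag.max = n_lags, plot = FALSE)$acf)

# Approximate 95% bounds drawn on a partial correlogram.
bound <- qnorm(0.975) / sqrt(n)

# Number of lags that fall outside the bounds purely by chance.
n_outside <- sum(abs(pacf_vals) > bound)
print(n_outside)
print(n_outside / n_lags)
```

On average about 5% of the lags land outside the bounds, so an isolated spike at a higher lag is weak evidence for a higher-order model.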

Rubric

Criteria | Mastery (5) | Incomplete (0)
Question 1: Context and Measurement | The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for the requested time series. Clear and comprehensive explanations are provided. | The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series.
 | Mastery (5) | Incomplete (0)
Question 2a: Simulation 1 | Students correctly simulate the AR(2) process with specified parameters using well-commented code in R. | Students attempt to simulate the AR(2) process but encounter errors in parameter specification or fail to produce the correct dataset length. Code clarity may be lacking.
 | Mastery (5) | Incomplete (0)
Question 2b: Stationarity | Students use well-commented R code to determine the stationarity of the AR(2) process using the characteristic equation method. | Students may make errors in implementing or interpreting the characteristic equation method. Results reporting may lack clarity, and code comments may be insufficient.
 | Mastery (10) | Incomplete (0)
Question 2c: AR(p) fitting | Students utilize well-commented R code to fit AR models with varying lag numbers to the simulated series. Their discussion of the frequency of estimates matching model parameters is clear and easy to understand. | Students may make errors in specifying lag ranges or interpreting results. Reporting of parameter confidence intervals may be incomplete, and code comments may lack clarity.
 | Mastery (10) | Incomplete (0)
Question 2d: Increased n | Students correctly increase the sample size to 1000 and repeat the procedure from part c with well-commented code. They compare the results with those obtained using a smaller sample size. | Students may fail to adequately compare results or assess estimate quality when increasing the sample size. Analysis may be incomplete or inaccurate, and code comments may lack clarity.
 | Mastery (20) | Incomplete (0)
Question 2e: Evaluation | Students provide a thoughtful analysis of the reliability of R’s model fitting routine in unknown data generating process situations. They discuss potential limitations and considerations when relying on automated model fitting algorithms. The response shows the students understand the relationship between sample size and the precision of estimates. | Students may provide a superficial analysis or overlook important considerations when discussing R’s model fitting routine reliability. Analysis may lack depth or accuracy, and code comments may be insufficient.
 | Mastery (5) | Incomplete (0)
Question 3a: Plot Original Series | Students create high-quality plots of the Dollar-Euro Exchange Rate series, its correlogram, and a partial correlogram, ensuring clarity and appropriate labeling. They analyze the plots to determine stationarity, providing clear explanations supported by the visual evidence. | Students produce plots of the Dollar-Euro Exchange Rate series, its correlogram, and a partial correlogram but with lower quality or clarity, lacking appropriate labeling or detail. Analysis of stationarity may be incomplete or inaccurate, lacking clear explanations supported by the visual evidence.
 | Mastery (5) | Incomplete (0)
Question 3b: AR(p) fitting | Students use R to fit an AR(p) model to the data, allowing R to determine the optimal number of lags. They report the results in a clear and organized table format, including parameter estimates and statistical significance. | Students encounter difficulties in fitting the AR(p) model or fail to properly report the results in a table format. Inaccuracies or omissions in parameter estimates and statistical significance reporting may be present.
 | Mastery (5) | Incomplete (0)
Question 3c: Fitted Model Stationarity | Students assess the stationarity of the fitted AR(p) model using the absolute value of the roots of the characteristic equation. They use the polyroot() function. Students make a reasoned judgment regarding the model’s suitability for forecasting, considering its stationarity and the sensitivity of the forecast to changes in parameter values that may be statistically but not practically significant. | Students provide an incomplete or inaccurate assessment of the stationarity of the fitted AR(p) model. Their judgment regarding the model’s suitability for forecasting may lack justification or clarity.
 | Mastery (5) | Incomplete (0)
Question 3d: First Difference | Students correctly compute the first difference of the Dollar-Euro Exchange Rate series in R. | Students encounter errors or inaccuracies in computing the first difference, resulting in incorrect or incomplete data transformation.
 | Mastery (5) | Incomplete (0)
Question 3e: Plot Difference | Students create high-quality plots of the differenced series, its correlogram, and a partial correlogram, ensuring clarity and appropriate labeling. They analyze the plots to determine stationarity, providing clear explanations supported by the visual evidence. | Students produce plots of the differenced series, its correlogram, and a partial correlogram but with lower quality or clarity, lacking appropriate labeling or detail. Analysis of stationarity may be incomplete or inaccurate, lacking clear explanations supported by the visual evidence.
 | Mastery (5) | Incomplete (0)
Question 3f: AR(p) fitting of first difference | Students use R to fit an AR(p) model to the differenced data, allowing R to determine the optimal number of lags. They report the results in a clear and organized table format, including parameter estimates and statistical significance. | Students encounter difficulties in fitting the AR(p) model to the differenced data or fail to properly report the results in a table format. Inaccuracies or omissions in parameter estimates and statistical significance reporting may be present.
 | Mastery (15) | Incomplete (0)
Question 3g: Stationarity and Evaluation | Students assess the stationarity of the fitted AR(p) model on the differenced data using the characteristic equation of the parameter estimates. They make a reasoned judgment regarding the model’s suitability for forecasting, considering its stationarity and the magnitude and significance of the coefficient estimates. | Students provide an incomplete or inaccurate assessment of the stationarity of the fitted AR(p) model on the differenced data. Their judgment regarding the model’s suitability for forecasting may lack justification or clarity.




Total Points 100