Time Series Homework: Chapter 4 Lesson 2 Key

Please_put_your_name_here

Questions

Question 1 - Random Walks: Stocks (30 points)

Modify the code used to get the prices of McDonald’s stock to download closing stock prices for a different publicly-traded company over a time period of your choice.

a) Create a time plot of the daily closing stock prices.

Answer
# Set symbol and date range
symbol <- "TSLA"
company <- "Tesla"
date_start <- "2020-07-01"
date_end <- "2024-01-01"

# Fetch stock prices (can be used to get new data)
stock_df <- tq_get(symbol, from = date_start, to = date_end, get = "stock.prices")

# Transform data into tsibble
stock_ts <- stock_df %>%
  mutate(
    dates = date, 
    value = adjusted
  ) %>%
  dplyr::select(dates, value) %>%
  as_tibble() %>% 
  arrange(dates) |>
  mutate(diff = value - lag(value)) |>
  as_tsibble(index = dates, key = NULL)

plot_ly(stock_ts, x = ~dates, y = ~value, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = paste0("Dates (", format(ymd(date_start), "%d/%m/%Y"), " to ", format(ymd(date_end), "%d/%m/%Y"), ")" ) ),
    yaxis = list(title = "Closing Price (US$)"),
    title = paste0("Time Plot of ", symbol, " Daily Closing Price")
  )

b) Create a correlogram of the daily closing stock prices. Is there evidence that daily closing stock prices follows a random walk? Please explain.

Answer
acf(stock_ts$value, plot=TRUE, type = "correlation", lag.max = 25)

There is strong evidence suggesting a random walk. The correlogram of daily closing stock prices typically shows an autocorrelation value very close to 1 at the first lag (following the perfect correlation of 1 at lag 0), and this high correlation then decays very slowly across subsequent lags. This pattern of slow decay is typical of a non-stationary time series, which is characteristic of a random walk. In such a process, each day’s price is essentially the previous day’s price plus an unpredictable random component, meaning the series has a ‘long memory’ where past shocks are carried forward. Consequently, while the current price level is highly correlated with recent past prices, this structure implies that the changes in price (the returns) are largely random and unpredictable based on past price movements alone.

c) Take the first difference of your daily closing stock prices and plot the resulting time series.

Answer
# Take the first difference
stock_ts_diff <- stock_ts %>%
  mutate(first_diff = difference(value, 1)) |> 
  filter(!is.na(first_diff))

# Plot the first-differenced series
plot_ly(stock_ts_diff, x = ~dates, y = ~first_diff, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = "Dates"),
    yaxis = list(title = "First Difference of Closing Price (US$)"),
    title = "Time Plot of First-Differenced Tesla Daily Closing Prices"
  )

d) Create two charts, the first is a correlogram of the first-difference of daily closing stock prices. The second is a histogram of the difference in the stock prices and superimpose the corresponding normal density.

Answer
acf(stock_ts$diff |> na.omit(), plot=TRUE, type = "correlation", lag.max = 25)

# Histogram of differences in stock prices
stock_ts |>
  mutate(
    density = dnorm(diff, mean(stock_ts$diff, na.rm = TRUE), sd(stock_ts$diff, na.rm = TRUE))
  ) |>
  ggplot(aes(x = diff)) +
    geom_histogram(aes(y = after_stat(density)),
        color = "white", fill = "#56B4E9", binwidth = 5) +
    geom_line(aes(x = diff, y = density)) +
    theme_bw() +
    labs(
      x = "Difference",
      y = "Frequency",
      title = "Histogram of Difference in the Closing Stock Prices"
    ) +
    theme(
      plot.title = element_text(hjust = 0.5)
    )

e) Using your results from parts c and d, is there evidence that the first difference of daily closing stock prices follows a white noise process? Please explain.

Answer

This appears to follow a white noise process. We can see this with the histogram of these differences showing a normal distribution centered around zero. The ACF plot of the first differences of the daily closing stock prices also provides strong evidence that this differenced series largely follows a white noise process. This is because after the definitional spike of 1 at lag 0, almost all subsequent autocorrelation values fall within the blue dashed significance bands, indicating that they are not statistically different from zero. This shows a lack of significant correlation, which is the defining characteristic of white noise. The observation of a few lags potentially nearing or slightly exceeding significance bounds is not uncommon due to statistical error, random chance or minor deviations from a pure white noise process.

Question 2 - Random Walks with drift and exponentially weighted slopes: Stocks (20 points)

Using the daily closing stock prices series of the previous question. Can you find evidence that there is drift in the series.

a) Please calculate the mean and standard deviation of the first-difference daily closing stock prices series of the previous question.

Answer
# Calculate mean and standard deviation
diff_mean <- mean(stock_ts_diff$first_diff, na.rm = TRUE)
diff_sd <- sd(stock_ts_diff$first_diff, na.rm = TRUE)

# Display the results in a table
results <- tibble(
  Statistic = c("Mean", "Standard Deviation"),
  Value = c(diff_mean, diff_sd)
)

kable(results, caption = "Mean and Standard Deviation of First-Differenced Tesla Closing Prices") %>%
  kable_styling(full_width = FALSE)
Mean and Standard Deviation of First-Differenced Tesla Closing Prices
Statistic Value
Mean 0.1975432
Standard Deviation 9.0797725

b) Please provide statistical evidence for the need to include a drift component in our random walk model.

Answer
# Given values
n <- 880
alpha <- 0.05

# Calculate standard error
SE <- diff_sd / sqrt(n)

# Calculate critical t-value
t_critical <- qt(1 - alpha/2, df = n - 1)

# Calculate confidence interval
CI_lower <- diff_mean - t_critical * SE
CI_upper <- diff_mean + t_critical * SE

# Create a tibble to display the results
ci_results <- tibble(
  `Statistic` = c("Mean", "Lower 95% CI", "Upper 95% CI"),
  `Value` = c(diff_mean, CI_lower, CI_upper)
)

kable(ci_results, caption = "95% Confidence Interval for the Mean of First-Differenced Tesla Closing Prices") %>%
  kable_styling(full_width = FALSE)
95% Confidence Interval for the Mean of First-Differenced Tesla Closing Prices
Statistic Value
Mean 0.1975432
Lower 95% CI -0.4031879
Upper 95% CI 0.7982743

d) Based on the results, is there a justification to adding an exponentially weighted slope vs adding a drift parameter? Please explain.

Answer

The confidence interval contains zero, showing that there is no evidence to support adding in a drift parameter. Since the underlying process appears to be random, adding an exponentially weighted slope would not be advised as it is more likely to capture random shocks rather than a persistent and predictable underlying trend.

Rubric

Criteria Mastery (5) Incomplete (0)
Question 1a: Series Plot Responses create a time plot of the daily closing stock prices, effectively representing the data over time. The plot is well-constructed, with clear labeling of the axes, a title, and appropriate formatting to enhance readability. Proficient submissions ensure that the time plot accurately reflects the temporal trends and patterns present in the daily closing stock prices, providing a clear visual representation of the data. The plot may lack clarity or proper presentation, making it difficult to interpret the trends or patterns in the data. Additionally, they may fail to include necessary elements such as axis labels, a title, or appropriate formatting, hindering the readability of the plot. Overall, their representation of the data may be incomplete or insufficient, indicating a need for improvement in data visualization skills.
Mastery (10) Incomplete (0)
Question 1b: Series Correlation Analysis Responses demonstrate a clear understanding of the concept of a random walk and its implications for autocorrelation patterns.They provide clear explanations supported by statistical evidence from the correlogram, discussing how the observed autocorrelation patterns align with the characteristics of a random walk. Students demonstrate a limited understanding of the concept of a random walk or fail to connect the observed autocorrelation patterns to its implications. Their analysis may lack depth or coherence, providing vague or incorrect explanations for the presence or absence of evidence supporting a random walk. Additionally, they may overlook key features or patterns in the correlogram, hindering their ability to draw meaningful conclusions about the nature of the time series data.
Mastery (5) Incomplete (0)
Question 1c: First Difference Plot Responses create a time plot of the first-differences daily closing stock prices, effectively representing the data over time. The plot is well-constructed, with clear labeling of the axes, a title, and appropriate formatting to enhance readability. Proficient submissions ensure that the time plot accurately reflects the temporal trends and patterns present in the daily closing stock prices, providing a clear visual representation of the data. The plot may lack clarity or proper presentation, making it difficult to interpret the trends or patterns in the data. Additionally, they may fail to include necessary elements such as axis labels, a title, or appropriate formatting, hindering the readability of the plot. Overall, their representation of the data may be incomplete or insufficient, indicating a need for improvement in data visualization skills.
Mastery (5) Incomplete (0)
Question 1d: First Difference Properties Students create two charts as specified: a correlogram of the first-difference of daily closing stock prices and a histogram of the difference in stock prices with the corresponding normal density superimposed. The correlogram effectively examines the autocorrelation structure of the differenced series, providing insights into the stationarity and serial dependence of the data. The histogram and superimposed normal density accurately represent the distribution of the differences in stock prices, allowing for visual comparison between the empirical distribution and the theoretical normal distribution. Students encounter difficulties in understanding the concept of differencing or density estimation, leading to inaccuracies in the visual representation of the data. Their analysis of the correlogram and histogram may lack depth or coherence, providing vague or incorrect interpretations of the autocorrelation structure or distribution of the differences in stock prices. Additionally, they may fail to effectively superimpose the normal density on the histogram or overlook key features or patterns in the visualizations.
Mastery (10) Incomplete (0)
Question 1e: Model Identification Student interpret the autocorrelation patterns in the correlogram, focusing on the presence or absence of significant autocorrelation at lag 1 and higher lags. They interpret the shape and centering of the histogram, considering whether it resembles a normal distribution, discussing how the observed patterns align with the expectations for a random walk process. Students provide incomplete or superficial interpretations of the autocorrelation patterns or histogram shape, failing to connect them to the characteristics of a random walk process. They overlook key features or patterns in the visualizations, indicating a limited understanding of the underlying concepts.
Mastery (0) Incomplete (0)
Question 2a: Summary Statistics Students accurately calculate the mean and standard deviation of the first-difference daily closing stock prices series, with well-commented code. Submission have calculation errors or lack sufficient comments in the code, making it hard to follow. They might struggle to provide accurate values or clear explanations, indicating a need for improvement in both mathematical and coding skills.
Mastery (10) Incomplete (0)
Question 2b: Evidence of Drift Students offer statistical evidence supporting the necessity of including a drift component in the random walk model. They conduct a hypothesis test correctly The hypothesis test is built incorrectly, or the evidence collecter is not sufficient to make a statistical statement.
Mastery (10) Incomplete (0)
Question 2c: Interpretation Students interpret the hypothesis test correctly and use the correct language to describe their results Responses lack clear statistical evidence or fail to effectively justify the inclusion of a drift component in the random walk model. They might offer vague or unsupported assertions about the presence of a trend in the data without conducting appropriate statistical analysis. The interpretation of the confidence interval is incorrect
Total Points 50