Time Series Homework: Chapter 2 Lesson 3 Key

Please_put_your_name_here

Data

ind_prod <- rio::import("https://byuistats.github.io/timeseries/data/ind_prod_us.csv")

Questions

Question 1 - Context and Measurement (10 points)

The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data is measuring. Without context, even the simplest features of data can be obscure and inscrutable. This homework assignment centers on the series below.

Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.

Total US Industrial Production Index

https://fred.stlouisfed.org/series/IPB50001N

Answer

Description: The Federal Reserve’s monthly index of industrial production and related capacity indexes and capacity utilization rates cover three major sectors: manufacturing, mining, and electric and gas utilities. Together with construction, these sectors account for the majority of the variation in national output over the course of the business cycle. The industrial production (IP) index measures the real output of relevant establishments located in the United States, regardless of their ownership, but does not include those located in U.S. territories.

The index provides detailed insights into structural developments within the economy. By capturing fluctuations in production across the industrial sector, the index helps illuminate broader trends and economic cycles, serving as a key indicator for assessing industrial activity in the U.S. economy.

Data Collection Process: The data is collected monthly by the Federal Reserve, which aggregates information from a variety of sources across the industrial sector. These sources include surveys, reports from utilities and manufacturing facilities, and mining production statistics. The process involves adjusting for real output changes rather than price variations, ensuring that the index reflects true production levels.

Unit of Analysis: The unit of analysis is an index where 2017 is set as the base year (2017 = 100). This standardization allows for comparison over time by indicating relative changes in industrial production since 2017. Each observation represents the level of production output for a given month as compared to the base year.

Meaning of Each Observation: Each monthly observation in the series reflects the aggregated real output of the U.S. industrial sector, encompassing manufacturing, mining, and utilities. The index value indicates how current production compares to the levels in 2017. For example, an index value of 105 means that industrial production is 5% higher than in 2017.
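
The index arithmetic can be checked with a short R sketch. The values below are hypothetical and are not taken from the actual series; the only thing carried over from the answer is that the base-year level equals 100.

```r
# Hypothetical index values (2017 = 100); NOT observations from the real series
index_values <- c(95, 100, 105, 112.5)

# Percent difference of each value from the base-year level of 100
pct_vs_base <- (index_values / 100 - 1) * 100
pct_vs_base
```

A value of 105 maps to +5, matching the interpretation above, and a value below 100 maps to a negative percentage.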

Question 2 - Total US Industrial Production: Correlogram (20 points)

a) Please plot a correlogram of the US Industrial Production Index
Answer
acf(ind_prod$ind_prod_indx, plot=TRUE, type = "correlation", lag.max = 100)

b) Please identify evidence of any trend or seasonal component using the correlogram. Please justify your findings.
Answer

Based on the correlogram of the US Industrial Production Index, there is evidence of a trend due to the slow decline in autocorrelation values across many lags, indicating persistent relationships over time. The high autocorrelation values gradually decrease, which is typical of a trending series. However, there is no clear evidence of a seasonal pattern, as the autocorrelation function (ACF) does not show repeating spikes at regular intervals. Once the trend is removed, underlying seasonal patterns may be revealed.
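
A minimal sketch of how a trend can hide seasonality in a correlogram, using simulated data rather than the industrial production series. The trend slope, seasonal amplitude, and noise level are all illustrative assumptions, and the trend is removed here with a simple straight-line fit.

```r
set.seed(123)

# Simulated monthly series (NOT the industrial production data):
# linear trend + 12-month seasonal cycle + white noise
n <- 240
t <- seq_len(n)
y <- 0.5 * t + 5 * sin(2 * pi * t / 12) + rnorm(n)

# Autocorrelations of the raw series
acf_raw <- acf(y, lag.max = 36, plot = FALSE)$acf

# Remove a fitted straight-line trend, then recompute the autocorrelations
detrended <- residuals(lm(y ~ t))
acf_detrended <- acf(detrended, lag.max = 36, plot = FALSE)$acf

# Element k + 1 holds lag k because element 1 is lag 0
lags <- c(1, 6, 12)
data.frame(
  lag = lags,
  raw = round(acf_raw[lags + 1], 2),
  detrended = round(acf_detrended[lags + 1], 2)
)
```

In the raw series the autocorrelations should stay large and positive at every lag shown. After detrending, the value at lag 6 should turn strongly negative and the value at lag 12 strongly positive, which is the repeating pattern a seasonal component produces.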

Question 3 - Total US Industrial Production: Decomposition (10 points)

a) Please plot a decomposition of the US Industrial Production Index series. Include the original series, trend, seasonal variation, and random component.
Answer

Additive

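library(tidyverse)  # dplyr verbs and ggplot2
library(lubridate)  # mdy()
library(tsibble)    # yearmonth(), as_tsibble()
library(fabletools) # model(), components()
library(feasts)     # classical_decomposition()

# Import the data and convert it to a monthly tsibble
ind_prod_ts <- rio::import("https://byuistats.github.io/timeseries/data/ind_prod_us.csv") |>
  mutate(
    dates = yearmonth(mdy(date)),
    value = ind_prod_indx
  ) |>
  dplyr::select(dates, value) |>
  as_tsibble(index = dates)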
ind_prod_decompose <- ind_prod_ts |>
    model(feasts::classical_decomposition(value,
        type = "add"))  |>
    components()

autoplot(ind_prod_decompose)
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_line()`).

Multiplicative

ind_prod_decompose <- ind_prod_ts |>
    model(feasts::classical_decomposition(value,
        type = "multi"))  |>
    components()

autoplot(ind_prod_decompose)
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_line()`).

b) Justify your choice of decomposition model (additive vs multiplicative)
Answer

Judged visually by the random component alone, the additive model seems to be the better fit: its random component appears to have roughly constant variance, so it looks “regular,” which we have discussed as one of the key checks of fit. On the other hand, the series is increasing and the variance of the seasonal component grows over time, which points to the multiplicative model. In this case, context trumps any visual heuristic. The series represents the industrial production of the growing US economy. The erratic random component of the multiplicative model captures unexpected economic events such as the Great Depression, from the late 1920s through the 1930s. Looking closely, you can also see other unexpected events, such as World War II in the 1940s and the 2008 housing crash, which cannot be seen in the additive model’s random component. These historical events transformed the US industrial sector and are a key driver of the variation in the series. The multiplicative model shows these fluctuations in their true magnitude. The random component reveals irregularities and unexpected variations beyond the trend and seasonality, which may indicate unique economic shocks or market disruptions that are essential for understanding short-term volatility.
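
The claim that a seasonal swing growing with the level points to a multiplicative structure can be illustrated with simulated data. This is a sketch under stated assumptions: the growth rate, seasonal amplitude, and noise level are invented, and the series is built as level times seasonal factor times noise. On the log scale such a series becomes additive, so the yearly swing should stop growing.

```r
set.seed(42)

# Simulated multiplicative monthly series (NOT the industrial production data)
n_years <- 20
t <- seq_len(12 * n_years)
level <- 50 * exp(0.01 * t)                      # growing level
seasonal <- 1 + 0.2 * sin(2 * pi * t / 12)       # seasonal factor around 1
noise <- exp(rnorm(length(t), sd = 0.005))       # small multiplicative noise
y <- level * seasonal * noise
year <- rep(seq_len(n_years), each = 12)

# Within-year swing (max minus min) on the raw and log scales
swing_raw <- as.numeric(tapply(y, year, function(x) max(x) - min(x)))
swing_log <- as.numeric(tapply(log(y), year, function(x) max(x) - min(x)))

# Ratio of the last year's swing to the first year's swing
c(raw = swing_raw[n_years] / swing_raw[1],
  log = swing_log[n_years] / swing_log[1])
```

The raw ratio should be far above 1 because the swing scales with the level, while the log ratio should stay close to 1. A roughly constant swing on the log scale is one informal sign that a multiplicative decomposition suits the original series.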

Question 4 - Total US Industrial Production: Stationary Series (20 points)

a) Please plot a correlogram of the random component of the US Industrial Production Index
Answer
# Step 1: Import the data and convert to time series format
ind_prod_ts <- rio::import("https://byuistats.github.io/timeseries/data/ind_prod_us.csv") |>
  mutate(
    dates = yearmonth(mdy(date)),
    value = ind_prod_indx
  ) |> 
  dplyr::select(dates, value) |>
  as_tsibble(index = dates)

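# Step 2: Decompose the series with a classical additive model
ind_prod_decompose <- ind_prod_ts |>
    model(feasts::classical_decomposition(value,
        type = "add"))  |>
    components()

# Step 3: Plot the correlogram of the random component, dropping the
# missing values at the ends of the series
acf(ind_prod_decompose$random |> na.omit(), plot=TRUE, type = "correlation", lag.max = 25)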

b) Please interpret the correlogram of the random component of the US Industrial Production Index. Include descriptions of the statistical and practical significance of the results. Be careful to justify the cases when a statistically significant correlation is not practically significant.
Answer

The correlogram of the random component of the US Industrial Production Index indicates that it largely resembles white noise, with most autocorrelations being statistically insignificant. There are statistically significant spikes at lags 1, 12, and 24; however, it is likely that only lag 1 has practical significance. The spikes at lags 12 and 24, which correspond to a yearly cycle, warrant further investigation, as they might indicate residual seasonality not fully captured by the decomposition.
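
The gap between statistical and practical significance can be made concrete with the approximate 95% bounds drawn on a correlogram, which are plus or minus 1.96 divided by the square root of the number of observations. The sample size and the autocorrelation below are hypothetical values chosen for illustration, not numbers read from the plot.

```r
# Hypothetical inputs; neither is taken from the actual series or correlogram
n_obs <- 1200   # assumed number of monthly observations
r <- 0.08       # assumed autocorrelation at some lag

# Approximate 95% bound for the autocorrelations of white noise
bound <- qnorm(0.975) / sqrt(n_obs)

# Statistically significant if it falls outside the bound
is_significant <- abs(r) > bound

# Share of variance that a linear prediction from that lag would explain
share_explained <- r^2

c(bound = bound, is_significant = is_significant, share_explained = share_explained)
```

With these assumed numbers the bound is about 0.057, so a correlation of 0.08 counts as statistically significant even though it accounts for well under 1% of the variance. Long series make the bounds narrow, which is why a spike can cross them without mattering in practice.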

Question 5 - US Industrial Production Index: Introspection (20 points)

a) Why is it important to remove trend and seasonal variation before plotting and analyzing correlograms?
Answer

Removing trend and seasonal variation before analyzing correlograms is crucial because these patterns can mask underlying relationships and lead to spurious correlations. Trends and seasonality introduce strong autocorrelation at certain lags, overwhelming any subtle correlations in the underlying data. By removing these dominant patterns, we isolate the stationary component of the time series, allowing us to accurately assess the true relationships between data points at different lags and identify any remaining autocorrelation that might indicate the need for more complex modeling. This ensures that the analysis focuses on the underlying stochastic process rather than predictable cyclical fluctuations.
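
As a rough illustration of how a trend can dominate a correlogram, the sketch below simulates a random walk with drift, whose steps are independent by construction. It removes the trend by differencing, which is a different technique from the classical decomposition used in this assignment, and every parameter is an assumption.

```r
set.seed(2024)

# Simulated random walk with drift (NOT the real data): each step is an
# independent draw, but the accumulated level trends upward
steps <- 0.2 + rnorm(500)
level <- cumsum(steps)

# Lag-1 autocorrelation of the trending levels and of the differenced series
acf_level <- acf(level, lag.max = 1, plot = FALSE)$acf[2]
acf_diff <- acf(diff(level), lag.max = 1, plot = FALSE)$acf[2]

# Approximate 95% bound for white noise, based on the differenced length
bound <- qnorm(0.975) / sqrt(length(level) - 1)

round(c(level = acf_level, differenced = acf_diff, bound = bound), 3)
```

The levels should show a lag-1 autocorrelation close to 1 even though the underlying steps are independent, while the differenced series should show a value near zero. The strong correlation in the levels comes from the trend, not from the stochastic part.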

b) Please speculate on the importance of autocorrelation analysis of the random component of time series data for its modeling and investigation.
Answer

Analyzing the autocorrelation of the random component is important and useful because it indicates whether our model needs adjusting, whether the correct model is being used, and what is missing. The other components can mislead our interpretation: trends cause long, persistent autocorrelation across many lags, while seasonal variations create cyclical patterns that can obscure the underlying nature of the randomness. Removing these components as well as we can gives a clearer picture of whether the random component is independent (i.e., white noise) or whether any remaining patterns need to be addressed in the model. This ensures that the analysis reflects true underlying relationships, rather than being influenced by predictable components like trend or seasonality.
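
One way to supplement a visual check for white noise is a portmanteau test such as the Ljung-Box test, available in base R as Box.test(). This test is not part of the assignment and is offered only as a sketch. The two series below are simulated stand-ins for a random component, one truly independent and one with leftover AR(1) structure, and the lag of 24 is an arbitrary choice.

```r
set.seed(99)

# Two simulated "random components" (hypothetical, for illustration only)
white_noise <- rnorm(300)
ar1_series <- as.numeric(arima.sim(model = list(ar = 0.6), n = 300))

# Ljung-Box test: the null hypothesis is no autocorrelation up to the chosen lag
lb_white <- Box.test(white_noise, lag = 24, type = "Ljung-Box")
lb_ar1 <- Box.test(ar1_series, lag = 24, type = "Ljung-Box")

c(white_noise = lb_white$p.value, ar1 = lb_ar1$p.value)
```

A very small p-value, as expected for the AR(1) series, suggests that structure remains in the random component and the model may need another term. A large p-value is consistent with white noise but does not prove it.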

Rubric

Question 1: Context and Measurement
Mastery (10): The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for the requested time series. Clear and comprehensive explanations are provided.
Incomplete (0): The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series.

Question 2a: Correlogram
Mastery (5): The student plots a correlogram of the time series requested. The plot accurately displays autocorrelation values at various lags. If code is used, it is well-commented, providing clarity on the plotting process. The labels, title, and legends are appropriate and match the quality of the illustrations in the Time Series notebook.
Incomplete (0): The student attempts to plot a correlogram of the time series requested but encounters significant errors or lacks clarity in their plot. If code is used, it may lack sufficient commenting or coherence, making it challenging to understand the plotting process. Overall, the plot may lack detail or accuracy, highlighting areas for improvement in time series visualization skills.

Question 2b: Interpretation
Mastery (15): The student effectively interprets the correlogram to identify evidence of trend or seasonal components in the time series. Their description matches the textbook description on page 37.
Incomplete (0): The student attempts to interpret the correlogram but encounters errors or lacks clarity in their analysis. There may be inaccuracies in interpreting autocorrelation values or misinterpretation of the findings, indicating a limited understanding of correlogram analysis techniques. Overall, the justification for findings may lack depth or accuracy.

Question 3a: Decomposition
Mastery (5): The student plots a decomposition of the US Industrial Production Index series, including the original series, trend, seasonal variation, and random component. The code is well-commented, providing clarity on the decomposition process. The labels, title, and legends are appropriate and enhance the understanding of the plot, matching the quality of illustrations in the Time Series notebook.
Incomplete (0): The student attempts to plot a decomposition of the US Industrial Production Index series but encounters significant errors or lacks clarity in their plot. The code lacks sufficient commenting or coherence, making it challenging to understand the decomposition process. Overall, the plot may lack detail or accuracy.

Question 3b: Modeling Justification
Mastery (5): Provides a well-reasoned justification for choosing either the additive or multiplicative decomposition model, clearly explaining how the data’s characteristics (e.g., seasonality, trend) influence the choice.
Incomplete (0): Fails to provide a clear or logical justification, or the explanation is incorrect or unsupported by the data’s characteristics.

Question 4a: Correlogram of random component
Mastery (5): The student plots a correlogram of the time series requested. The plot accurately displays autocorrelation values at various lags. If code is used, it is well-commented, providing clarity on the plotting process. The labels, title, and legends are appropriate and match the quality of the illustrations in the Time Series notebook.
Incomplete (0): The student attempts to plot a correlogram of the time series requested but encounters significant errors or lacks clarity in their plot. If code is used, it may lack sufficient commenting or coherence, making it challenging to understand the plotting process. Overall, the plot may lack detail or accuracy, highlighting areas for improvement in time series visualization skills.

Question 4b: Interpretation
Mastery (15): Clearly interprets the correlogram, explaining the statistical significance of correlations and addressing practical significance. Provides well-reasoned justification for when statistically significant correlations are not practically important.
Incomplete (0): Fails to interpret the correlogram accurately, does not explain statistical or practical significance clearly, or provides weak justification for distinguishing between statistical and practical significance.

Question 5a: Introspection
Mastery (10): The student explains the importance of removing trend and seasonal variation before analyzing correlograms, showing an understanding of stationarity assumptions in time series analysis. They recognize that trend and seasonality violate these assumptions, potentially distorting autocorrelation patterns.
Incomplete (0): The student attempts to explain the importance of removing trend and seasonal variation before analyzing correlograms but may struggle with clarity or accuracy. Their understanding of stationarity assumptions in time series analysis might be limited, leading to inconsistencies or inaccuracies. Overall, their explanation may lack depth, indicating areas for improvement in understanding preprocessing steps and stationarity assumptions.

Question 5b: Introspection
Mastery (10): The student effectively speculates on the importance of autocorrelation analysis of the random component in time series data for modeling, investigation, and forecasting. Their discussion shows understanding of the topics we have already covered in class. The submission shows effort.
Incomplete (0): Overall, their explanation may lack depth or clarity.

Total Points: 80