Time Series Homework: Chapter 2 Lesson 2

Please_put_your_name_here

Data

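# Monthly Manufacturers' Materials and Supplies Inventories (FRED series UMTMMI), in millions of dollars
manu_inv <- rio::import("https://byuistats.github.io/timeseries/data/manu_mat_invent.csv")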

Questions

Question 1 - Context and Measurement (10 points)

The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data is measuring. Without context, even the simplest features of data can be obscure and inscrutable. This homework assignment will center around the series below.

Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.

Manufacturers’ Materials and Supplies Inventories

https://fred.stlouisfed.org/series/UMTMMI

Answer

Data Collection Process: The Manufacturers’ Shipments, Inventories, and Orders (M3) Survey is conducted by the U.S. Census Bureau. This survey collects data from manufacturers across a broad range of industries in the United States. The goal is to measure the value of shipments, inventories, and unfilled orders, along with new orders received. The data is gathered directly from manufacturers on a monthly basis and is reported in millions of dollars. It provides critical information regarding industrial activity and business conditions.

Unit of Analysis: Each observation describes the U.S. manufacturing sector as a whole for one month, and values are measured in millions of dollars. The series captures one specific measure of manufacturing activity, the value of materials and supplies held in inventory, reported as of the end of each month and reflecting conditions in the manufacturing industry at that point.

Meaning of Each Observation: Each observation is the total value (in millions of dollars) of the materials and supplies held in inventory by U.S. manufacturers at the end of a particular month. For context, the M3 survey tracks several broader categories:

Shipments: The total dollar value of products shipped by manufacturers.

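Inventories: The value of products held in inventory by manufacturers, reported by stage of fabrication (materials and supplies, work in process, and finished goods). This series is the materials and supplies component.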

Orders: New orders received and unfilled orders for manufactured goods.

This data helps to assess the current state and future trends in the U.S. manufacturing sector.

Question 2 - Manufacturer’s Inventory: Autocorrelation and autocovariance (10 points)

a) Please calculate the list of autocorrelation and autocovariance values for the Manufacturer’s Inventory series.

Answer

Here is simple code for the autocovariances (as in the textbook):

# Autocovariances 
acf(manu_inv$manu_inv, plot=FALSE, type = "covariance")


Autocovariances of series 'manu_inv$manu_inv', by lag

       0        1        2        3        4        5        6        7 
2.39e+09 2.36e+09 2.33e+09 2.30e+09 2.27e+09 2.24e+09 2.20e+09 2.16e+09 
       8        9       10       11       12       13       14       15 
2.13e+09 2.09e+09 2.05e+09 2.01e+09 1.97e+09 1.93e+09 1.89e+09 1.85e+09 
      16       17       18       19       20       21       22       23 
1.81e+09 1.77e+09 1.73e+09 1.69e+09 1.65e+09 1.61e+09 1.58e+09 1.54e+09 
      24       25 
1.51e+09 1.48e+09
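For reference, `acf(type = "covariance")` estimates the lag-\(k\) autocovariance as \(c_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t-\bar{x})(x_{t+k}-\bar{x})\). The sketch below computes that formula by hand and compares it with `acf()`. The helper `sample_acvf()` is hypothetical, and a simulated random walk stands in for the inventory series so the example runs without downloading data. Because the divisor is \(n\) rather than \(n-1\), the lag-0 value is slightly smaller than what `var()` returns.

```r
# Hypothetical helper: sample autocovariances at lags 0..max_lag,
# using the same divisor n as stats::acf()
sample_acvf <- function(x, max_lag) {
  n <- length(x)
  xc <- x - mean(x)                       # deviations from the sample mean
  sapply(0:max_lag, function(k) {
    sum(xc[1:(n - k)] * xc[(1 + k):n]) / n
  })
}

# Simulated stand-in series (a random walk), NOT the inventory data
set.seed(42)
x <- cumsum(rnorm(200))

manual  <- sample_acvf(x, max_lag = 5)
builtin <- acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]

all.equal(manual, builtin)   # TRUE: same estimator as acf()
var(x) / manual[1]           # equals n / (n - 1) = 200 / 199
```

The same \(n\) versus \(n-1\) divisor explains why the variances reported in Question 3 are slightly larger than the lag-0 autocovariances printed for each half.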

Here is a different approach that presents the autocorrelations in a tabular format:

# broom::tidy() converts the list returned by acf() into a data frame with
# columns lag and acf (similar to pander); knitr::kable() then displays and
# captions that data frame as a markdown table.

# Autocorrelations Table
knitr::kable(broom::tidy(acf(manu_inv$manu_inv, plot = FALSE, type = "correlation")),
             caption = "Autocorrelations", digits = 3)

Autocorrelations
lag	acf
0	1.000
1	0.988
2	0.976
3	0.963
4	0.949
5	0.934
6	0.920
7	0.904
8	0.889
9	0.873
10	0.856
11	0.840
12	0.824
13	0.807
14	0.789
15	0.772
16	0.755
17	0.738
18	0.721
19	0.704
20	0.689
21	0.674
22	0.659
23	0.645
24	0.632
25	0.619

b) If autocovariance and autocorrelation are trying to evaluate a similar linear relationship across time in our series, why do we get different values for autocorrelation and autocovariance at the same lag?

Answer

The reason we get different values for autocorrelation and autocovariance at the same lag, even though both evaluate the same linear relationship across time, is normalization.

The autocovariance depends on the size and scale of the data. It is measured in the squared units of the series (here, squared millions of dollars), so a series with large values has correspondingly large autocovariances, which is why the autocovariances above are on the order of \(10^9\).

The autocorrelation, by contrast, divides the autocovariance at lag \(k\) by the autocovariance at lag 0 (the variance), which makes it a dimensionless number between -1 and 1. This puts every lag, and every dataset, on a common scale. So while both measure the linear relationship at a given lag, autocovariance reflects the absolute magnitude of that relationship, while autocorrelation reflects its relative strength on a standardized scale.
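As a quick check on this normalization, the sample autocorrelation is \(r_k = c_k / c_0\). The sketch below divides a few of the rounded autocovariances printed in part (a) by the lag-0 value, then rescales a simulated stand-in series (not the inventory data) by 1000 to show that autocovariances change with the units while autocorrelations do not. The choice of lags and the factor of 1000 are illustrative assumptions.

```r
# Rounded autocovariances printed in part (a) at lags 0, 1, 12 and 25
c_k <- c(lag0 = 2.39e9, lag1 = 2.36e9, lag12 = 1.97e9, lag25 = 1.48e9)

# r_k = c_k / c_0: dividing by the lag-0 autocovariance removes the units
r_k <- c_k / c_k[["lag0"]]
round(r_k, 3)   # 1.000 0.987 0.824 0.619

# Rescaling a simulated stand-in series by 1000 (e.g., millions -> thousands of dollars)
set.seed(1)
x <- cumsum(rnorm(100))
acvf_ratio <- acf(1000 * x, lag.max = 5, type = "covariance", plot = FALSE)$acf /
  acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf
acf_diff <- acf(1000 * x, lag.max = 5, plot = FALSE)$acf -
  acf(x, lag.max = 5, plot = FALSE)$acf
range(acvf_ratio)    # 1e6 at every lag: autocovariance scales with units squared
max(abs(acf_diff))   # essentially 0: autocorrelation is unit-free
```

The ratios match the autocorrelation table at lags 12 and 25; the lag-1 value comes out as 0.987 rather than 0.988 only because the printed autocovariances are rounded to three significant figures.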

Question 3 - Manufacturer’s Inventory: Stationary (20 points)

Weak stationarity is a form of stationarity important for the analysis of time series data. A time series is said to be weakly stationary if its statistical properties such as mean, variance, and autocovariance are constant over time. Here are the key components of weak stationarity:

Constant Mean: The mean of the time series remains constant over time. This doesn’t necessarily mean that the time series is centered around zero; it just implies that the average value remains the same throughout the observed period.

Constant Variance: The variance of the time series is uniform across all time points. Like the mean, this doesn’t imply that the variance must be zero, just that it doesn’t change systematically with time.

Constant Autocovariance: The autocovariance between any two observations of the time series depends only on the time lag between them and not on the absolute positions of the observations in time. This implies that the dependence structure of the time series remains constant over time.
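In symbols, the three conditions above can be written as follows, using \(\mu\) and \(\sigma^2\) in the same sense as the variance function in part (c) and \(\gamma_k\) (a label introduced here) for the autocovariance at lag \(k\). Note that \(\gamma_0 = \sigma^2\).

```latex
\begin{aligned}
E[x_t] &= \mu && \text{for all } t \\
\sigma^2(t) = E[(x_t - \mu)^2] &= \sigma^2 && \text{for all } t \\
\operatorname{Cov}(x_t, x_{t+k}) = E[(x_t - \mu)(x_{t+k} - \mu)] &= \gamma_k && \text{for all } t \text{; depends only on the lag } k
\end{aligned}
```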

a) Please split the time series into two halves according to the date recorded: the earlier half of the data and the later half of the data. Calculate the mean, variance, and autocovariance for each half. Note: the split doesn’t need to be exactly half; an approximate midpoint is sufficient.

Answer

# Split the series at the median date: early half vs. late half
median_date <- median(manu_inv$date)

df_early <- manu_inv |>
  dplyr::filter(date <= median_date)

df_late <- manu_inv |>
  dplyr::filter(date > median_date)

# Early half: mean, variance (divisor n - 1), and autocovariances (divisor n)
mean_early <- mean(df_early$manu_inv)
variance_early <- var(df_early$manu_inv)
acf_early <- acf(df_early$manu_inv, type = "covariance", plot = FALSE)

# Late half: the same three summaries
mean_late <- mean(df_late$manu_inv)
variance_late <- var(df_late$manu_inv)
acf_late <- acf(df_late$manu_inv, type = "covariance", plot = FALSE)

# Print the early-half mean and variance
cat("Early Half:\n",
    "Mean:", mean_early, "\n",
    "Variance:", variance_early, "\n")

Early Half:
 Mean: 187241.2 
 Variance: 2356124713

acf_early


Autocovariances of series 'df_early$manu_inv', by lag

       0        1        2        3        4        5        6        7 
2.34e+09 2.28e+09 2.22e+09 2.15e+09 2.07e+09 2.00e+09 1.92e+09 1.84e+09 
       8        9       10       11       12       13       14       15 
1.76e+09 1.69e+09 1.61e+09 1.55e+09 1.49e+09 1.42e+09 1.36e+09 1.33e+09 
      16       17       18       19       20       21       22 
1.29e+09 1.26e+09 1.23e+09 1.21e+09 1.18e+09 1.16e+09 1.13e+09

cat("Late Half:\n",
    "Mean:", mean_late, "\n",
    "Variance:", variance_late, "\n")

Late Half:
 Mean: 188964.2 
 Variance: 2453863580

acf_late


Autocovariances of series 'df_late$manu_inv', by lag

       0        1        2        3        4        5        6        7 
2.44e+09 2.38e+09 2.31e+09 2.24e+09 2.17e+09 2.10e+09 2.02e+09 1.94e+09 
       8        9       10       11       12       13       14       15 
1.86e+09 1.77e+09 1.69e+09 1.61e+09 1.54e+09 1.49e+09 1.43e+09 1.39e+09 
      16       17       18       19       20       21       22 
1.34e+09 1.30e+09 1.27e+09 1.24e+09 1.22e+09 1.19e+09 1.17e+09
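One way to present these results more compactly is a single table with one column per half. The sketch below assumes the series is a numeric vector already sorted by date and splits it at the index midpoint rather than the median date; the function name `compare_halves()` is hypothetical. It reports autocorrelations instead of autocovariances so the dependence structure of the two halves can be compared without the small difference in variance getting in the way.

```r
# Hypothetical helper: split a date-ordered series at its index midpoint and
# tabulate mean, variance, and lag-1..max_lag autocorrelations for each half
compare_halves <- function(x, max_lag = 3) {
  mid <- floor(length(x) / 2)
  halves <- list(early = x[1:mid], late = x[(mid + 1):length(x)])
  sapply(halves, function(h) {
    r <- acf(h, lag.max = max_lag, plot = FALSE)$acf[-1, 1, 1]  # drop lag 0
    c(mean = mean(h), variance = var(h),
      setNames(r, paste0("acf_lag", seq_len(max_lag))))
  })
}

# Demo on simulated white noise (a stand-in, NOT the inventory data)
set.seed(7)
x <- rnorm(200)
out <- compare_halves(x, max_lag = 3)
round(out, 3)   # rows: mean, variance, acf_lag1..acf_lag3; columns: early, late
```

With the homework data this would be called as `compare_halves(manu_inv$manu_inv, max_lag = 12)`, assuming the rows are sorted by date; for monthly data without gaps, the index midpoint should match the median-date split to within one observation.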

b) Is there evidence to suggest that the Manufacturer’s Inventory series is weakly stationary?

Answer

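Because the mean, variance, and autocovariance values are relatively consistent across the two halves of the data, it would be reasonable to state that this series exhibits weak stationarity. The means differ by less than 1% (187,241 vs. 188,964), the variances by about 4% (2.36 × 10^9 vs. 2.45 × 10^9), and the autocovariances of both halves decay at a similar rate from lag 0 to lag 22. While there are small variations, they are not large enough to clearly indicate non-stationarity, although a split-half comparison like this is informal evidence rather than a formal test.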
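To put numbers on "relatively consistent", the sketch below uses only the summary statistics reported in part (a) to express the change between halves in percent and in standard-deviation units. These are descriptive comparisons, not a hypothesis test.

```r
# Summary statistics reported in part (a) for each half of the series
early <- c(mean = 187241.2, variance = 2356124713)
late  <- c(mean = 188964.2, variance = 2453863580)

# Percent change from the early half to the late half
pct_change <- round(100 * (late - early) / early, 1)
pct_change                               # mean 0.9, variance 4.1

# Difference in means, in units of the early-half standard deviation
z <- (late[["mean"]] - early[["mean"]]) / sqrt(early[["variance"]])
round(z, 3)                              # about 0.035
```

A shift in the mean of roughly 0.035 standard deviations is small relative to the spread of the series.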

c) The variance function for a time series, \(\sigma^2(t)=E[(x_t-\mu)^2]\), is defined for the entire ensemble. Why is determining whether a time series has constant variance so difficult using sample data?

Answer

The core difficulty in assessing constant variance lies in distinguishing between true changes in the variance function over time (heteroscedasticity) and random fluctuations inherent in a single realization, even if the underlying variance function is constant (homoscedasticity).

Here are two ways to explain this in greater detail:

The variance function is defined over the ensemble of possible time series realizations: \(\sigma^2(t)\) is the expected squared deviation from the mean at time t, averaged across all possible time series that could be generated by the underlying process. However, in practice, we are almost always limited to observing only a single realization of the time series. This poses a significant challenge when trying to determine if the variance is constant.

Even if the true ensemble variance is constant over time, a single observed realization will exhibit random fluctuations. These fluctuations are inherent to the stochastic nature of the time series. When we calculate the sample variance from our single realization at different time points (or over different time windows), these sample variances will vary due to this inherent randomness, even if the underlying ensemble variance is constant.

Therefore, determining if the ensemble variance is truly constant using only sample data from a single realization is difficult because:

We are estimating an ensemble property from a single sample path: Sample statistics from one realization are estimates of ensemble properties, and these estimates are subject to sampling variability.
Random fluctuations within the realization can mimic changes in variance: Apparent changes in sample variance over time might simply be due to the inherent randomness of the specific realization we observed, rather than actual changes in the underlying ensemble variance function.
Limited data within a single realization: Even with a long time series, we still only have one observation at each time point from the ensemble perspective. This contrasts with traditional statistical settings where we have multiple independent samples from a population.
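A small simulation can make the ensemble idea concrete. The sketch below assumes a Gaussian white-noise process with a known, constant variance of 4 (an illustrative choice, not the inventory series). With many simulated realizations we can estimate \(\sigma^2(t)\) at each fixed t across the ensemble; with only one realization we can compute variances over time windows, and those wander even though the true variance never changes.

```r
set.seed(123)
n_time <- 240                     # length of each realization (e.g., 20 years of months)
n_real <- 1000                    # number of realizations in the simulated ensemble
sigma2 <- 4                       # true, constant variance of the process

# Each column is one realization of Gaussian white noise with variance sigma2
ensemble <- matrix(rnorm(n_time * n_real, sd = sqrt(sigma2)), nrow = n_time)

# Ensemble view: variance across realizations at each fixed time t (mean is known to be 0)
ens_var <- apply(ensemble, 1, function(row) mean(row^2))
range(ens_var)                    # every value stays close to 4

# Single-realization view: sample variance in consecutive 24-month windows
one_path <- ensemble[, 1]
window_id <- rep(seq_len(n_time / 24), each = 24)
win_var <- tapply(one_path, window_id, var)
round(win_var, 2)                 # fluctuates noticeably around 4
```

In real data only the second view is available, which is exactly why apparent changes in sample variance are hard to interpret.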

The variance function describes the expected spread of values across the entire ensemble of possible time series at time t. It’s a property of the underlying process generating the time series, not just one instance of it. The problem arises because we only ever get to see one version, or realization, of this time series.

Imagine trying to determine whether the variability of tree heights is the same across different areas of a forest. The “ensemble” is like considering all possible forests that could exist under similar conditions. We only get to walk through one actual forest and take measurements.

Even if the spread of tree heights across all possible forests is the same everywhere, in our specific forest we’ll find patches where heights are tightly clustered and patches where they vary widely, just due to random chance and the natural variability of tree growth. Similarly, in a time series, even if the true variance is constant, the sample variance calculated from our single realization will fluctuate randomly over time.

This makes it hard to tell if observed changes in sample variance are:

Real changes in the underlying ensemble variance: Meaning the process generating the time series is actually changing its variability.
Just random fluctuations: Meaning the variance is constant, and the changes we see are simply due to the inherent randomness of the single time series we happened to observe.

Because we only have one realization, it’s challenging to separate these two possibilities without employing statistical methods designed to account for this single-realization problem.

Rubric

Criteria	Mastery (10)	Incomplete (0)
Question 1: Context and Measurement	The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for the requested time series. Clear and comprehensive explanations are provided.	The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series.
	Mastery (5)	Incomplete (0)
Question 2a: Autocorrelation and Autocovariance	The student correctly computes the autocorrelation and autocovariance values for the Manufacturer’s Inventory series using R. The R code is well-commented and structured, facilitating understanding of each step in the calculation process. Results are presented clearly.	The student attempts to compute autocorrelation and autocovariance values for the Manufacturer’s Inventory series, but significant errors are present in the computations. The R code lacks clear documentation, with unclear or missing comments that hinder comprehension of the calculation process. Presentation of results may be confusing or incomplete, making it challenging to interpret the autocorrelation and autocovariance values accurately.
	Mastery (5)	Incomplete (0)
Question 2b: Theoretical understanding	The student provides a clear and accurate explanation of why different values are obtained for the same lag of the autocorrelation and autocovariance estimates. The explanation demonstrates a solid understanding of the underlying concepts.	The student attempts to explain why different values are obtained for the same lag of the autocorrelation and autocovariance estimates but does so with significant inaccuracies or lack of clarity. The explanation lacks coherence or fails to address key differences between autocorrelation and autocovariance adequately.
	Mastery (5)	Incomplete (0)
Question 3a: Stationarity Calculations	The student accurately splits the dataset into two parts and calculates the mean, variance, and autocovariance for each part using R. The R code is well-commented, providing clear explanations of the steps taken to perform the analysis. The calculated statistics are presented clearly, aiding interpretation of the results, and the student shows a solid understanding of the concepts involved in analyzing time series data.	The student attempts to split the dataset into two parts and calculate the mean, variance, and autocovariance for each part using R, but does so with significant errors or inaccuracies. The R code lacks clear and sufficient commenting, making it difficult to understand the steps taken in the analysis. The calculated statistics may be presented poorly or inaccurately, indicating a limited understanding of the concepts involved in analyzing time series data.
	Mastery (5)	Incomplete (0)
Question 3b: Evaluation	The student assesses whether there is evidence to suggest that the Manufacturer’s Inventory series is weakly stationary. The analysis is supported by clear and concise explanations, demonstrating a solid understanding of the concept of weak stationarity.	The student attempts to assess whether the Manufacturer’s Inventory series is weakly stationary but does so with significant errors or lacks clarity in their analysis. There may be inaccuracies in the methodology or misinterpretation of results, indicating a limited understanding of weak stationarity.
	Mastery (10)	Incomplete (0)
Question 3c: Evaluation	The student understands the definition and application of a time series variance function to an ensemble.	The submission doesn’t provide enough evidence of understanding of the definition and application of the variance function.
Total Points	40