Time Series Homework: Chapter 2 Lesson 1

Please_put_your_name_here

Data

GET SERIES FROM FRED???

Code
okuns <- rio::import("https://byuistats.github.io/timeseries/data/outputgap_and_cyclical_unemp.xlsx")

gs_night <- rio::import("https://byuistats.github.io/timeseries/data/nightstand-sweat.csv")

Questions

Question 1 - Context and Measurement (10 points)

The first step of any time series analysis is gathering context. You cannot properly analyze data without knowing what the data is measuring. Without context, the most simple features of data can be obscure and inscrutable. This homework assignment will center around the series below.

Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.

a) Output Gap

https://chat.openai.com/share/122aaad9-2be6-43ec-b58a-e1858305b401

Answer
General Knowledge

Definition: The output gap is the difference between the actual output of an economy (real GDP) and its potential output at full employment (potential GDP). It reflects how far an economy is from its ideal output level.

Positive output gap: Indicates that the economy is overperforming, possibly leading to inflation.

Negative output gap: Indicates underperformance, meaning the economy is operating below its potential, which is common during recessions.

Required Information

Data Collection Process: The output gap is typically calculated using national accounts data, particularly real GDP. The potential GDP is often estimated through statistical methods such as the Hodrick-Prescott filter or by using structural models of production. This series is calculated quarterly.

Unit of Analysis: The unit of analysis is the difference of Real GDP and Potential GDP, often as a percentage of potential GDP.

Meaning of an Observation: The value of a single observation is interpreted as a particular years quarterly output gap (usually as a percentage).

b) Cyclical Unemployment

https://chat.openai.com/share/7d6bf187-41d0-42c3-98bc-d02ea1bd5b80

Answer

Data Collection Process: This is typically derived from labor force surveys, where the unemployment rate is measured. The natural rate of unemployment is estimated using long-term trends in labor market data and removed leaving behind the cyclical unemployment rates.

Unit of Analysis: The unit of analysis for cyclical unemployment is individual workers who are unemployed due to a downturn in the economy. It is typically expressed as a percentage of the total labor force.

Meaning of Each Observation: Each observation in the cyclical unemployment time series represents the percentage of the labor force that is unemployed due to cyclical economic factors at a given time.

Question 2 - Covariance and Correlation: Okun’s Law (30) points)

Okun’s Law is an empirical relationship defined as a negative correlation between the Output Gap and Cyclical Unemployment. If the economy is expanding, businesses are producing more, and unemployment tends to decrease. Conversely, during economic contractions or recessions, output shrinks, leading to an increase in unemployment.

Please use the data okuns to test whether Okun’s Law applies to the US from 1960 to 2021.

a) Please create a scatter plot of the Output Gap in the x-axis and Cyclical Unemployment in the y-axis.
Answer
Code
# Scatter plot of Output Gap vs Cyclical Unemployment
ggplot(okuns, aes(x = `Output Gap`, y = `Cyclical Unemployment`)) +
  geom_point(color = "blue") +
  labs(title = "Scatter Plot of Output Gap vs Cyclical Unemployment",
       x = "Output Gap",
       y = "Cyclical Unemployment") +
  theme_minimal() +
  theme(panel.grid.major = element_line(color = "gray", linetype = "dashed"))

b) Please calculate the covariance and correlation coefficient between the Output Gap and Cyclical Unemployment.
Answer
Code
# Covariance
covariance <- cov(okuns$`Output Gap`, okuns$`Cyclical Unemployment`)

# Correlation coefficient
correlation <- cor(okuns$`Output Gap`, okuns$`Cyclical Unemployment`)

# Print results
cat("Covariance:", covariance, "\nCorrelation Coefficient:", correlation)
Covariance: -3.531172 
Correlation Coefficient: -0.8861978
Code
#cat("Correlation Coefficient:", correlation, "\n")
c) Evaluate Okun’s law. Is there sufficient evidence to suggest an empirical relationship between the Output Gap and Cyclical Unemployment?
Answer

Okun’s law is the principle that higher output of an economy correlates with lower unemployment. The correlation coeffecient gives us a good idea of the strength and direction of the linear relationship. Therefore, a strong negative correlation would support Okun’s law. Here we can see in the scatterplot that there is a downwards trend in our y as we move from low to high x values. This suggests Okun’s law is true, we do observe a negatively correlated empirical relationship between the output gap and cyclical unemployment. When we calculate the correlation between the two variables, we have a strong negative correlation of -0.89. This suggests there is strong evidence for Okun’s law in this particular timeframe.

Question 3 - Correlation vs Causation: Spurious Searches (30 points)

“Every single person who confuses correlation and causation ends up dying.”

Hannah Fry

Please use the data gs_night to evaluate the empirical relationship between the Google search terms for night sweats and nightstand.

a) Please create a scatter plot of the night sweat search index in the x-axis and nightstand search index in the y-axis.
Answer
Code
ggplot(gs_night, aes(x = `night sweat`, y = nightstand)) +
  geom_point(color = "blue") +
  labs(title = "Scatter Plot of Night Sweat Search Index vs Nightstand Search Index",
       x = "Night Sweat Search Index",
       y = "Nightstand Search Index") +
  theme_minimal() +
  theme(panel.grid.major = element_line(color = "gray", linetype = "dashed"))

b) Please calculate the covariance and correlation between the night sweat search index and the nightstand search index.
Answer
Code
# Covariance
covariance_night <- cov(gs_night$`night sweat`, gs_night$nightstand)

# Correlation coefficient
correlation_night <- cor(gs_night$`night sweat`, gs_night$nightstand)

# Print results
cat("Covariance:", covariance_night, "\nCorrelation Coefficient:", correlation_night)
Covariance: 271.05 
Correlation Coefficient: 0.7871518
Code
#cat("Correlation Coefficient:", correlation, "\n")
c) Please suggest a scenario where an increase in the frequency of searches for nightstand will cause higher searches for the term night sweats?
Answer

Generally these will be somewhat implausible reasons and scenarios, this is OK. As long as there is a situation presented credit is given for completion

Example: Often, people use their nightstands to keep personal items like water or fans near their beds. As they rearrange their bedrooms and monitor their sleep more closely, they might realize they’re experiencing night sweats and, as a result, search for reasons or solutions. Therefore, an increase in “nightstand” searches due to temperature control, might indirectly affect searches for “night sweats”.

d) Please suggest a scenario where an increase in the frequency of searches for night sweats will cause higher searches for the term nightstand?
Answer

Generally these will be somewhat implausible reasons and scenarios, this is OK. As long as there is a situation presented credit is given for completion.

Example: People experiencing frequent night sweats may look for ways to keep essential items like water, cooling fans, or extra towels close to their bed for comfort and relief. This could then lead them to search for a nightstand that offers them a way to manage their night sweats by allowing them to access these items easily during the night.

e) Please suggest an event or reason that would result in the correlation between searches for night sweats and searches for the term nightstand?
Answer

This will either be correct or incorrect, credit is not given simply for completion unlike parts c & d. 

Correct

Usually google search will autocomplete or suggest autocompletions as you begin searching. Given ‘Nightsweats’ & ‘Nightstand’ both begin with the same first 6 characters it is highly likely that people may be attempting to search for one or the other and accidentally accepted the wrong autocorrection as they typed out their search query.

Incorrect

Perhaps, one event or reason that would result in the correlation between searches is an event like a heatwave or seasonal weather changes. During periods of high temperatures, it is more difficult to manage temperatures within the home at night if insufficient air conditioning or if more people experience night sweats as their bodies react to the heat. This discomfort may prompt them to search for ways to mitigate the heat and manage night sweats. I frankly would buy a night stand if I didn’t have one and was experiencing heat sweats every night. I would want to have easy access to a glass of water or have a fan positioned correctly to keep me cool.

f) Are any of your suggestions plausible? What does that teach you about the difference between correlation and causation? Please elaborate.
Answer

Suggestions

The student should correctly state that while their suggestions are technically possible, they are implausible (with the exception of google search autocorrection if mentioned)

Correlation vs. Causation:

Correlation simply means that two variables tend to move together—either increasing or decreasing simultaneously—but it doesn’t tell us why this happens. Causation, on the other hand, implies that changes in one variable directly cause changes in another.

In this case, while the data shows a strong correlation between searches for night sweats and nightstand, this does not mean that one causes the other. As seen in the evaluation of the scenarios, plausible explanations for the correlation often involve a third external factor (e.g., google autocorrect) influencing both search terms. This highlights the importance of not assuming causality just because two variables are correlated. In reality, correlation can often result from unrelated factors, coincidences, or underlying variables that drive both outcomes.

Rubric

Criteria Ratings
Complete (10)
Question 1: Context and Measurement The student demonstrates a clear understanding of the context for each data series. The explanation includes details about the data collection process, unit of analysis, and the meaning of each observation.
Complete (5)
Question 2a: Scatter Plot The scatter plot is appropriately titled, and all elements, including the plot itself, axis labels, and title, are clearly labeled. The scatter plot matches or exceeds the quality of the scatter plots shown in lecture.
Complete (5)
Question 2b: Covariance and Correlation Calculation The student effectively utilizes R to correctly calculate covariance and correlation.
Complete (10)
Question 2c:Interpretation and Evaluation The student provides a clear and accurate interpretation of the correlation between the variables. The response demostrates understanding of how the correlation coefficient quantifies the strength and direction of the relationship between the variables.
Complete (10)
Question 2c: Okun’s Law The discussion of Okun’s law makes clear the student understands how to use statistical evidence to evaluate scientific claims in the context of an academic field.
Complete (5)
Question 3a: Scatter Plot The scatter plot is appropriately titled, and all elements, including the plot itself, axis labels, and title, are clearly labeled. The scatter plot matches or exceeds the quality of the scatter plots shown in lecture.
Complete (5)
Question 3b: Covariance and Correlation Calculation The student effectively utilizes R to correctly calculate covariance and correlation.
Complete (10)
Question 3(c,d,e):Scenarios The scenarios suggested are clear and concise. The responses show an honest attempt at thinking of a connection between the variables according to the prompt.
Complete (10)
Question 3: Plausiblity The student provides a critical evaluation is provided regarding the plausibility of the suggested scenarios and events. The evaluation includes a clear and comprehensive explanation on the difference between correlation and causation.
Total Points 70