Plots, Trends, and Seasonal Variation

Chapter 1: Lesson 2

Learning Outcomes

Use technical language to describe the main features of time series data

Define time series analysis
Define time series
Define sampling interval
Define serial dependence or autocorrelation
Define a time series trend
Define seasonal variation
Define cycle
Differentiate between deterministic and stochastic trends

Plot time series data to visualize trends, seasonal patterns, and potential outliers

Plot a “ts” object
Plot the estimated trend of a time series by computing the mean across one full period

Preparation

Read Sections 1.1-1.4

Learning Journal Exchange (15 min)

Review another student’s journal
What would you add to your learning journal after reading your partner’s?
What would you recommend your partner add to their learning journal?
Sign the Learning Journal review sheet for your peer

Vocabulary and Nomenclature Matching Activity (5 min)

Check Your Understanding

Working with a partner, match the definitions on the left with the terms on the right.

Vocabulary Matching

A figure with time on the horizontal axis and the value of a random variable on the vertical axis
A systematic change in a time series that does not appear to be periodic
Repeated pattern within each year (or any other fixed time period)
Repeated pattern that does not correspond to some fixed natural period
Observations in which values are related to lagged observations of the same variable
Random trend that does not follow a discernible or predictable pattern
Can be modeled with mathematical functions, facilitating the long-term prediction of the behavior

Cycle
Correlated (Serially Dependent) Data
Deterministic Trend
Seasonal Variation
Stochastic Trend
Time Plot
Trend

Comparison of Deterministic and Stochastic Time Series (10 min)

Stochastic Time Series

The following app illustrates a few realizations of a stochastic time series.
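A few realizations of a stochastic time series can also be simulated directly in R. The sketch below is only an illustration: it assumes a Gaussian random walk as the stochastic process (the model behind the app is not stated here), and the names `n_obs`, `n_real`, and `walks`, along with the seed, are hypothetical choices.

```r
# Simulate a few realizations of a stochastic time series.
# ASSUMPTION: a Gaussian random walk, x_t = x_{t-1} + w_t, stands in for the
# process shown in the app; n_obs, n_real, and the seed are illustrative.
set.seed(123)
n_obs  <- 100   # length of each realization
n_real <- 5     # number of realizations

# each column holds one realization: the cumulative sum of white noise
walks <- replicate(n_real, cumsum(rnorm(n_obs)))

# overlay all realizations on a single time plot
matplot(walks, type = "l", lty = 1,
        xlab = "Time", ylab = "Value",
        main = "Realizations of a Random Walk")
```

Changing the seed produces a different set of paths from the same process, which is worth keeping in mind when considering the question that follows.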

If a stochastic time series displays an upward trend, can we conclude that trend will continue in the same direction? Why or why not?

Deterministic Time Series

The figure below illustrates realizations of a deterministic time series. The data fluctuate around a sine curve.
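A series of this kind can be sketched in R by adding white noise to a sine curve. The amplitude, period, noise level, and the names `signal` and `series` are assumptions made for illustration and are not taken from the figure.

```r
# Simulate realizations of a deterministic time series: a fixed sine curve
# plus independent noise.
# ASSUMPTION: the period (50 steps) and noise sd (0.3) are illustrative.
set.seed(123)
n_obs  <- 100
n_real <- 5
t_idx  <- seq_len(n_obs)

# deterministic component shared by every realization
signal <- sin(2 * pi * t_idx / 50)

# each column is the same sine curve with its own white-noise errors
series <- replicate(n_real, signal + rnorm(n_obs, sd = 0.3))

# plot the noisy realizations in grey and the underlying curve in black
matplot(t_idx, series, type = "l", lty = 1, col = "grey50",
        xlab = "Time", ylab = "Value",
        main = "Realizations Around a Sine Curve")
lines(t_idx, signal, lwd = 2)
```

Unlike the random walk case, every realization here stays close to the same underlying curve, so the long-run shape is the same no matter which seed is used.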

Class Activity: Importing Data and Creating a tsibble Object (5 min)

Recall the Google Trends data for the term “chocolate” from the last lesson. The cleaned data are available in the file chocolate.csv at the following address:

https://byuistats.github.io/timeseries/data/chocolate.csv

Import the Data

Use the code below to import the chocolate data and convert it into a time series (tsibble) object.

# load packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load("tsibble", "fable",
               "feasts", "tsibbledata",
               "fable.prophet", "tidyverse",
               "patchwork", "rio")

# read in the data from a csv
chocolate_month <- rio::import("https://byuistats.github.io/timeseries/data/chocolate.csv")

# define the first date in the time series
start_date <- lubridate::ymd("2004-01-01")  

# create a sequence of dates, one month apart, starting with start_date
date_seq <- seq(start_date,
                start_date + months(nrow(chocolate_month)-1),
                by = "1 months")

# create a tibble including variables dates, year, month, value
chocolate_tibble <- tibble(
  dates = date_seq,
  year = lubridate::year(date_seq),        # gets the year part of the date
  month = lubridate::month(date_seq),      # gets the month
  value = pull(chocolate_month, chocolate) # gets the value of the ts 
)

# create a tsibble where the index variable is the year/month
chocolate_month_ts <- chocolate_tibble |>
  mutate(index = tsibble::yearmonth(dates)) |>
  as_tsibble(index = index)

# generate the ts plot
choc_plot <- autoplot(chocolate_month_ts, .vars = value) +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for 'Chocolate'"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

choc_plot

Explore R commands summarizing time series data

Check Your Understanding

What does each of the following R commands give us?
- head(chocolate_month_ts, 1)
- tail(chocolate_month_ts, 1)
- guess_frequency(chocolate_month_ts$index)

Estimating the Trend: Annual Aggregation (10 min)

To help visualize what is happening with a time series, we can simply aggregate the data in the time series to the annual level by computing the mean of the observations in a given year. This can make it easier to spot a trend.
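The arithmetic behind annual aggregation can be checked on a small example. The sketch below uses made-up monthly values (not the chocolate data) and base R's `tapply()`; the data frame `toy` and its columns are hypothetical names chosen for illustration.

```r
# ASSUMPTION: toy data with made-up values, used only to show the arithmetic.
toy <- data.frame(
  year  = rep(2004:2005, each = 12),   # two full years
  month = rep(1:12, times = 2),        # twelve months per year
  value = c(1:12, 13:24)               # monthly observations
)

# mean of the twelve monthly values within each year
annual_means <- tapply(toy$value, toy$year, mean)
annual_means
# 2004: mean of 1..12  = 6.5
# 2005: mean of 13..24 = 18.5
```

Each annual value replaces twelve monthly observations with their average, so variation within the year is smoothed away and only the year-to-year movement remains.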

For the chocolate data, when we average the data for each year, we get:

Aggregation

chocolate_annual_ts <- summarise(
    index_by(chocolate_month_ts, year), 
    value = mean(value)
  ) 

#chocolate_annual_ts

Table

chocolate_annual_ts |>
  as.data.frame() |>
  concat_partial_table(nrow_head = 6, nrow_tail= 3, decimals = 4) |>
  display_table()

year	value
2004	35.5
2005	41.75
2006	41.5833
2007	43.1667
2008	41.6667
2009	43.5
⋮	⋮
2021	55.0833
2022	59.5
2023	60.1667

The first plot is the time series plot of the raw data, and the second plot is a time series plot of the annual means.

# monthly plot
mp <- autoplot(chocolate_month_ts, .vars = value) +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for 'Chocolate'"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

# yearly plot
yp <- autoplot(chocolate_annual_ts, .vars = value) +
  labs(
    x = "Year",
    y = "Searches",
    title = "Mean Annual Google Searches for 'Chocolate'"
  ) +
  scale_x_continuous(breaks = seq(2004, max(chocolate_month_ts$year), by = 2)) +
  theme(plot.title = element_text(hjust = 0.5))

mp / yp

If you want to superimpose these plots, it would make sense to align the mean value for the year with the middle of the year. Here is a plot superimposing the annual mean aligned with July 1 (in blue) on the values of the time series (in black).

chocolate_annual_ts <- summarise(
    index_by(chocolate_month_ts, year), 
    value = mean(value)
  ) |>
  mutate(index = tsibble::yearmonth( mdy(paste0("7/1/",year)) )) |>
  as_tsibble(index = index)

# combined plot
autoplot(chocolate_month_ts, .vars = value) +
  geom_line(data = chocolate_annual_ts, 
            aes(x = index, y = value), 
            color = "#56B4E9") +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for 'Chocolate'"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

Check Your Understanding

What do the annually-aggregated data tell us about the trend?
What do you observe in the trend?
What do you suspect is causing this trend?

Homework Preview (5 min)

Review upcoming homework assignment
Clarify questions

Homework

Download Assignment

homework_1_2.qmd

Matching

1. A figure with time on the horizontal axis and the value of a random variable on the vertical axis	F. Time Plot
2. A systematic change in a time series that does not appear to be periodic	G. Trend
3. Repeated pattern within each year (or any other fixed time period)	D. Seasonal Variation
4. Repeated pattern that does not correspond to some fixed natural period	A. Cycle
5. Observations in which values are related to lagged observations of the same variable	B. Correlated (Serially Dependent) Data
6. Random trend that does not follow a discernible or predictable pattern	E. Stochastic Trend
7. Can be modeled with mathematical functions, facilitating the long-term prediction of the behavior	C. Deterministic Trend