tsibble, graphics and decomposition

Introduction

The point of this workshop is to familiarize you with the fpp3 package and especially the tsibble object. This is the most important R structure of this course, so we will spend some time getting to know it better. The tsibble is the time series equivalent of a tibble object in the tidyverse: a data frame indexed by a time variable. To create a tsibble, we need to specify which column in the data is the time variable (the index variable). Here is an example, creating a tsibble object from data on the temperature in Bergen:

library(fpp3) 
library(tidyverse)
# read the data: 
temp <- read_csv2(
  "https://raw.githubusercontent.com/holleland/BAN430/master/data/bergen_temp.csv")
temp
# A tibble: 516 × 2
   date       temp 
   <chr>      <chr>
 1 01.10.2021 12.9 
 2 02.10.2021 11.7 
 3 03.10.2021 13.6 
 4 04.10.2021 11.7 
 5 05.10.2021 11   
 6 06.10.2021 10   
 7 07.10.2021 11.1 
 8 08.10.2021 14.2 
 9 09.10.2021 13.8 
10 10.10.2021 11.3 
# ℹ 506 more rows
# Convert date column from character to date and temp to numeric:
temp <- temp %>% 
  mutate(date = as.Date(date, format = "%d.%m.%Y"),
         temp = as.numeric(temp))
temp
# A tibble: 516 × 2
   date        temp
   <date>     <dbl>
 1 2021-10-01  12.9
 2 2021-10-02  11.7
 3 2021-10-03  13.6
 4 2021-10-04  11.7
 5 2021-10-05  11  
 6 2021-10-06  10  
 7 2021-10-07  11.1
 8 2021-10-08  14.2
 9 2021-10-09  13.8
10 2021-10-10  11.3
# ℹ 506 more rows
# Creating a tsibble
temp.ts <- temp %>% as_tsibble(index = date)
temp.ts
# A tsibble: 516 x 2 [1D]
   date        temp
   <date>     <dbl>
 1 2021-10-01  12.9
 2 2021-10-02  11.7
 3 2021-10-03  13.6
 4 2021-10-04  11.7
 5 2021-10-05  11  
 6 2021-10-06  10  
 7 2021-10-07  11.1
 8 2021-10-08  14.2
 9 2021-10-09  13.8
10 2021-10-10  11.3
# ℹ 506 more rows
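
The role of the index can also be seen on a small made-up example that needs no download. The sketch below assumes a hypothetical data frame toy (not the Bergen data) in which one day is left out, and uses index_var(), interval(), has_gaps() and fill_gaps() from the tsibble package to inspect the index and to make the missing day explicit:

```r
library(tsibble)

# Hypothetical daily data (made up for illustration): 2021-10-06 is missing
toy <- data.frame(
  date  = as.Date("2021-10-01") + c(0:4, 6:10),
  value = c(12.9, 11.7, 13.6, 11.7, 11.0, 11.1, 14.2, 13.8, 11.3, 10.5)
)

# Declare the date column as the index variable
toy_ts <- as_tsibble(toy, index = date)

index_var(toy_ts)  # name of the index column
interval(toy_ts)   # spacing between observations detected from the index
has_gaps(toy_ts)   # one-row tibble; .gaps tells whether time points are missing

# Insert the missing day as an explicit row with value = NA
toy_full <- fill_gaps(toy_ts)
toy_full
```

Checking for gaps is worth doing before plotting or modelling, since several functions used later in the course expect a regular series without implicit missing time points.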

Once we have made our tsibble object, the fpp3 package (or really the packages it loads, such as tsibble and feasts) provides many useful functions we can apply to that object, such as plotting functions. For instance, we can create a time plot of the Bergen temperatures with the following code:

temp.ts %>% autoplot(temp)

This is a simple time series with only one variable (the daily mean temperature in Bergen). Often the data we study will consist of multiple time series. Then we need to provide information on which columns identify the individual time series. These columns are called the key variables of the tsibble object. Let us consider an example with temperature in Bergen, Oslo, Trondheim and Stavanger.

citytemp <- read_csv2(
  "https://raw.githubusercontent.com/holleland/BAN430/master/data//citytemp.csv")
citytemp.ts <- citytemp %>% 
  mutate(date = as.Date(date, format = "%d.%m.%Y")) %>% 
  filter(!is.na(date)) %>%  # Remove NA-values
  # Create tsibble object
  as_tsibble(index = date, 
             key = c("name","station"))
citytemp.ts
# A tsibble: 34,930 x 4 [1D]
# Key:       name, station [4]
   name             station date       meanTemp
   <chr>            <chr>   <date>        <dbl>
 1 Bergen - Florida SN50540 2000-01-01      6.3
 2 Bergen - Florida SN50540 2000-01-02      6.3
 3 Bergen - Florida SN50540 2000-01-03      6.7
 4 Bergen - Florida SN50540 2000-01-04      4.6
 5 Bergen - Florida SN50540 2000-01-05      4.6
 6 Bergen - Florida SN50540 2000-01-06      6.5
 7 Bergen - Florida SN50540 2000-01-07      6.3
 8 Bergen - Florida SN50540 2000-01-08      6.4
 9 Bergen - Florida SN50540 2000-01-09      4.1
10 Bergen - Florida SN50540 2000-01-10      5.3
# ℹ 34,920 more rows
# Name the variable to plot explicitly (one line per key)
citytemp.ts %>% autoplot(meanTemp)
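
As a minimal sketch of what the key does, the example below builds a small made-up data set (the column city and all values are assumptions, not the citytemp data) in which the same dates occur once per city. It uses is_duplicated(), key_vars() and n_keys() from the tsibble package to show that the index alone does not identify the rows, while the index together with the key does:

```r
library(tsibble)

# Hypothetical data: two made-up cities observed on the same three days
toy <- data.frame(
  city     = rep(c("A", "B"), each = 3),
  date     = rep(as.Date("2000-01-01") + 0:2, times = 2),
  meanTemp = c(6.3, 6.3, 6.7, -1.2, 0.4, 1.1)
)

# Each date appears twice, so the index alone is not unique ...
dup_without_key <- is_duplicated(toy, index = date)

# ... but every (city, date) combination appears only once
dup_with_key <- is_duplicated(toy, key = city, index = date)

# Declaring city as the key gives one time series per city
toy_ts <- as_tsibble(toy, index = date, key = city)
key_vars(toy_ts)  # names of the key columns
n_keys(toy_ts)    # number of individual time series
```

If as_tsibble() complains about duplicated rows, checking the data with is_duplicated() or duplicates() in this way usually reveals which key column is missing.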

Exercises

  1. Run the code above on your own. We will use the same data sets later, but for now, just check that you get the same figures.

  2. Do Exercise 3 from chapter 2. All the code you need is given in the exercise. Note that you do not need to download the csv file; you can load it directly into R using this link (replacing the first line of code):

tute1 <- readr::read_csv("https://bit.ly/fpptute1")

  3. Do Exercise 4.

# install.packages("USgas")
library(USgas)
head(us_total)
  year   state      y
1 1997 Alabama 324158
2 1998 Alabama 329134
3 1999 Alabama 337270
4 2000 Alabama 353614
5 2001 Alabama 332693
6 2002 Alabama 379343

  4. Continue with the temperature data from the largest cities in Norway. Create a
     a. time plot,
     b. seasonal plot,
     c. seasonal subseries plot.

  5. Aggregate the temperature time series for the 4 largest cities in Norway from daily to weekly and monthly average temperatures. Create two illustrative figures for each.

  6. Temperatures in the Norwegian cities all follow the same seasonality (cold in winter, warm in summer). Create a graphic illustrating the correlation between these time series. Hint: GGally::ggpairs().

  7. Create an autocorrelation plot of the temperature data for Bergen using default settings. Increase the maximum number of lags to 400. Interpret the latter plot. Is this a stationary time series?

  8. Create a tsibble consisting of white noise (uncorrelated variables) of length 100. Create an autocorrelation plot for the time series you have simulated. Interpret the plot. Is this a stationary time series?

  9. The Norwegian government has decided to work towards a goal of installing 30 GW of offshore wind power. The two locations chosen for the first wind parks are called Sørlige Nordsjø 2 and Utsira Nord. On Canvas, you find derived power production from modelled wind speed at these two locations on an hourly time scale for 5 years.

     a. Using the offshore wind power data, create illustrative figures for Utsira Nord and Sørlige Nordsjø 2 for different time scale aggregates (hourly, daily, weekly, monthly).
     b. Can you detect a trend, cycle or seasonal pattern based on your figures?
     c. What about the relationship between the two locations? Is the dependence linear?
     d. Does your answer in (c) depend on the time scale you use?
     e. If you were to decide where to build the first wind farm solely based on the data you have, which would you choose and why? Discuss with your neighbors.

# Hint: 
wind <- readRDS("OffshoreWindtwoLocationsFiveYears.rds")

  10. Create a tsibble containing the daily wind power data from Sørlige Nordsjø 2. Decompose the time series into three components, trend-cycle T_t, season S_t and remainder R_t, using a suitable decomposition method. Why did you choose the method you did? Are there any seasonal or trend patterns in the data?

  11. The data used in this exercise is the wholesale and retail sales index from Statistics Norway. More specifically, the data is an index for Retail trade, except of motor vehicles and motorcycles. The data run from January 2000 to the latest month currently available from Statistics Norway.

     a. Load the data, e.g. using read.csv2 as indicated below. Convert the month column to a yearmonth type and wholesale to a tsibble.

wholesale  <- read.csv2(
  "https://raw.githubusercontent.com/holleland/BAN430/master/data/wholesale_and_retails_index_norway.csv", 
  sep = ";")
head(wholesale,3)
    month wholesale_and_retail_sales_index
1 2000M01                             50.5
2 2000M02                             49.3
3 2000M03                             53.8

     b. Make a time plot.
     c. Decompose the time series using the classical, X-11, SEATS and STL methods. Can you detect any prominent differences between the methods?
     d. Try adjusting the trend and season windows of the STL. What happens? (The default values are 21 and 11, respectively.)
     e. Using your method of choice, plot the seasonally adjusted time series.
     f. Using your method of choice, plot the detrended series.
     g. Optional: Implement your own additive classical decomposition on this example (solution: see exercise 6).

  12. Exercises from chapter 3: 1-3

Additional recommended exercises from chapter 2: 5, 7, 9 and 10
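
As a possible starting point for the aggregation and decomposition exercises above, here is a minimal sketch on simulated data. The series sim, its sine-shaped seasonality and the window values are assumptions chosen for illustration; the sketch shows one way to aggregate a daily tsibble to monthly means with index_by() and to run an STL decomposition, and is not a solution for the temperature, wind or retail data:

```r
library(fpp3)

set.seed(1)
n <- 3 * 365  # roughly three years of daily observations

# Simulated daily series: yearly sine wave plus noise (made up for illustration)
sim <- tsibble(
  date  = as.Date("2020-01-01") + 0:(n - 1),
  value = 10 * sin(2 * pi * (0:(n - 1)) / 365.25) + rnorm(n),
  index = date
)

# Aggregate from daily to monthly averages:
# index_by() groups by the new, coarser index; summarise() averages within it
monthly <- sim %>%
  index_by(month = yearmonth(date)) %>%
  summarise(value = mean(value))

# STL decomposition of the monthly series with explicit window sizes
dcmp <- monthly %>%
  model(stl = STL(value ~ trend(window = 21) + season(window = 11))) %>%
  components()

dcmp                      # columns include trend, remainder and season_adjust
# dcmp %>% autoplot()     # uncomment to plot all components
```

Replacing yearmonth() by yearweek() in index_by() gives weekly averages in the same way, and changing the two window values shows how smooth the estimated trend and seasonal components become.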