Rcatch22

R package for calculation of 22 CAnonical Time-series CHaracteristics

Downloads
283
Stars
22
Committers
3

output: rmarkdown::github_document

Rcatch22

R package for the calculation of 22 CAnonical Time-series CHaracteristics. The package is an efficient implementation that calculates time-series features coded in C.

knitr::opts_chunk$set(
comment = NA, fig.width = 7, fig.height = 5, cache = FALSE)

Installation

You can install the stable version of Rcatch22 from CRAN using the following:

install.packages("Rcatch22")

You can install the development version of Rcatch22 from GitHub using the following:

devtools::install_github("hendersontrent/Rcatch22")

You might also be interested in a related R package called theft (Tools for Handling Extraction of Features from Time series) which provides standardised access to Rcatch22 and 5 other feature sets (including 3 feature sets from Python libraries) for a total of ~1,200 features. theft also includes extensive functionality for processing and analysing time-series features, including automatic time-series classification, top performing feature identification, and a range of statistical data visualisations.

Wiki

Please open the included vignette within an R environment or visit the detailed Rcatch22 Wiki for information and tutorials.

Computational performance

With features coded in C, Rcatch22 is highly computationally efficient, scaling nearly linearly with time-series size. Computation time in seconds for a range of time series lengths is presented below.

library(dplyr)
library(purrr)
library(ggplot2)
library(microbenchmark)
library(Rcatch22)

set.seed(123)
lengths <- c(10^1, 10^2, 10^3, 10^4, 10^5)

generate_sims <- function(n){
  
  tmp <- c(cumsum(rnorm(n, mean = 0, sd = 1)))
  
  sim <- as.data.frame(summary(microbenchmark(catch22_all(tmp), times = 10, unit = "s"))) %>%
    dplyr::select(c(mean)) %>%
    mutate(ts_length = n)
  
  return(sim)
}

sims <- lengths %>%
  purrr::map_df(~ generate_sims(n = .x))
# Draw graphic

sims %>%
  ggplot() +
  geom_smooth(aes(x = ts_length, y = mean), formula = y ~ x, method = "lm", 
              se = FALSE, size = 1.15, colour = "#1f78b4") +
  geom_point(aes(x = ts_length, y = mean), size = 3.5, colour = "#1f78b4") +
  labs(x = "Time series length",
       y = "Computation time (s)") +
  scale_x_log10(limits = c(1e1, 1e5),
                breaks = scales::trans_breaks("log10", function(x) 10^x),
                labels = scales::trans_format("log10", scales::math_format(10^.x))) +
  scale_y_log10(breaks = scales::trans_breaks("log10", function(x) 10^x),
                labels = scales::trans_format("log10", scales::math_format(10^.x))) +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        legend.position = "bottom",
        axis.text = element_text(size = 12))

catch24

An option to include the mean and standard deviation as features in addition to catch22 is available through setting the catch24 argument to TRUE:

features <- catch22_all(x, catch24 = TRUE)

Citation

A DOI is provided at the top of this README. Alternatively, the package can be cited using the following:

citation("Rcatch22")

Please also cite the original catch22 paper: