This notebook uses tidyverse and other related packages and simulated example data for (equity index) returns and local return factors.

In [29]:

```
cv.packs <- c("tidyverse", "lubridate", "xts", "rsample", "recipes", "parsnip", "yardstick", "workflows", "tune", "dials")
for (pack in cv.packs) require(pack, character.only = TRUE)
```

In [30]:

```
set.seed(110)
dv.meds <- seq(as.Date("2010-02-01"), length = 126, by = "1 month") - 1 # date vector of month-end dates
cv.cas <- c("EUR", "USD", "JPY", "CAD", "GBP", "AUD") # character vector of currency area names
cv.fac <- str_c("FACT", 1:9) # character vector of factor names
cv.alf <- outer(cv.cas, cv.fac, str_c, sep = "_") %>% as.vector() %>% sort() # character vector of all factor series names
nm <- length(dv.meds)
nc <- length(cv.cas)
na <- length(cv.alf)
tbl.fac <- matrix(rnorm(nm * na, 0, 1), nm, na) %>%
as_tibble %>% set_names(cv.alf)
tbl.eqr <- matrix(NA, nm, nc) %>% as_tibble %>% set_names(str_c(cv.cas, "_EQ_XR"))
for (ca in cv.cas) { # simulate equity returns related to local factors
tbl.eqr[str_c(ca, "_EQ_XR")] <- as.matrix(tbl.fac[str_c(ca, cv.fac, sep = "_")]) %*% rnorm(9, 0, 1) + rnorm(nm, 0.5, 5)
}
tbm <- bind_cols(as_tibble(dv.meds) %>% set_names("date"), tbl.eqr, tbl.fac) # simulated tibble data set
tbm %>% print()
```

`tidyr`

package¶The purpose of the `tidyr`

package is to make a dataset tidy, organizing it in a standardized form. Standardization facilitates subsequent operations, estimation and analysis, particularly for the other packages of the "tidyverse", such as `dplyr`

or `ggplot2`

. A tidy data set is data table that meets the following conditions:

__Each column represents one variable__, whose values belong to a single attribute across observational units.__Each row represents one observation__of these variables; row values all are attributes of this observation.__There is only one type of observational units__(e.g. years, months, cross-sections) per table. If one wishes to investigate data sets with different observational units, for example times series and panels, one should use different tables.

Examples of "messy data" are variables that are stored in both rows and columns and column headers that are values rather than variables.

Pivoting makes tables longer and wider, by combining or splitting the contents of columns. In the context of macroeconomic and financial analysis this serves common tasks, such as converting wide data formats (that are akin to spreadsheets and used in xts/zoo objects) to long data formats (as prevalent in SQL) and vice versa.

In [3]:

```
tbm.lon <- tbm %>% tidyr::pivot_longer(-date, names_to = "rtype", values_to = "rval") # create "long" data frame
print(tbm.lon)
```

In [4]:

```
tbm.wid <- tbm.lon %>% tidyr::pivot_wider(names_from = "rtype", values_from = "rval") # long format to wide
xdm.wid <- xts(tbm.wid %>% select(-date), tbm.wid$date) # convert to xts
str(xdm.wid) %>% print()
```

Nesting creates a **list-column** of data frames. Nesting is implicitly a summarising operation: it __produces one row for each group defined by the non-nested columns__. This is useful in conjunction with other summaries that work with whole datasets, most notably models.

The function `tidyr::nest(.data, ...)`

takes a dataframe (`.data`

) and a tidy-selected group of columns to nest (`...`

), in the form `new_col = c(col1, col2,...)`

. The right hand side can be any tidy select expression.

In [5]:

```
tbm.eqr <- tbm %>%
mutate(year = year(date)) %>%
nest(data = -year) %>% # data list column that contains one data tibble per year
tail(5) %>% print()
```

`dplyr`

package¶The `dplyr`

package is one of the key workhorse modules in R. Its purpose is to __transform data tables efficiently__. Transformations include [1] grouping and summarizing cases (rows), [2] extracting, arranging and adding rows, [3] extracting, changing and adding new variables (columns), and [4] combining tables through "mutating joins" or simple row bindings.

The basic object used by the `dplyr`

package is the **tidy tibble**. Financial and economic time series data frames need to be converted to this class. Importantly, __tidy data do not use row names__. Information stored in row names, such as dates in zoo and xts objects, should be be added as an explicit variable in the tibble.

The grouping and summarising functions of `dplyr`

allow swift construction of tables of grouped summary statistics, such as mean, standard deviations, benchmark correlation and so forth. Such tables combine observed cases by a grouping criterion (such as a time period, a particular market state or value of an economic trend) and display the summary statistics' value of a chosen group of variables (for example asset class returns).

The benefit of `dplyr`

for this purpose are [1] simple human readable code, [3] a wide range of grouping criteria based on any of the columns in the tibble, and [3] easy selection of variables (columns) by use of `dplyr::select()`

or `dplyr::across()`

in conjunction with helper functions. Data selection can be based on unquoted data variables, character vectors or regular expressions. Moreover, the `dplyr::arrange()`

function makes in easy to rank columns on the summary table.

In [6]:

```
tbm %>% group_by(year(date)) %>% # actual grouping function
select(date | ends_with("EQ_XR")) %>%
summarise_if(is.numeric, median, na.rm = T) %>% # median of all numeric columns is calculated by year group
slice_tail(n = 3) %>% print()
```

Summaries can also be two-dimensional, i.e. reach over column and row groups. The function `dplyr::rowwise()`

allows to __compute on a data frame one row at a time__. One can use it to display summaries or perform mutations “by row”. This can make it particularly useful in dataframes with list-columns. Like `dplyr::group_by()`

the function `dplyr::rowwise()`

does not change what the data looks like; it __changes how dplyr verbs operate on the data__.

In [7]:

```
tbm %>% mutate(year = year(date)) %>%
rowwise() %>% # creates modified copy of tibble
mutate(USD_BIGFACT = mean(USD_FACT1:USD_FACT9)) %>% # calculation across rows
select(year | date | USD_BIGFACT) %>%
group_by(year) %>% # actual grouping function
summarise(USD_BIGFACT = mean(USD_BIGFACT, na.rm = T), .groups = "drop") %>%
slice_tail(n = 5) %>% print()
```

The `dplyr`

functions allow transforming groups or panels of variables in one single operation with transparanet code.

Columnwise operations with the `dplyr::across(.cols,.fns)`

function __perform one and the same operation on multiple columns__ of a table. These operations are particularly important inside `summarise()`

and `mutate()`

and now supersede **scoped functions** (functions that end on `_if`

, `_at`

or `_all`

). __Columnwise opeprations allow summarising or mutating data panels in one go__. Since one can apply a whole list of functions to a panel in one go it is a very powerful method for performing many operations with a few short lines of code.

- The first argument,
`.cols`

, selects the columns to operate on. It uses tidy selection (like`select()`

) to pick variables by position, name, and type. One can also use a character vector, for example in conjunction with`dplyr::all_of()`

. - The second argument,
`.fns`

, is a function or list of functions to apply to each column. The argument also accepts a`purrr`

style formula (or list of formulas) like`~ .x / 2`

.

Reference: https://dplyr.tidyverse.org/articles/colwise.html

In [8]:

```
tbm %>% summarise(across(USD_FACT1:USD_FACT9, ~mean(.x, na.rm = TRUE))) %>% print() # apply function across columns
```

In [9]:

```
l.fs <- list(sign = ~sign(.x), abs = ~abs(.x)) # list of functions
tbm.mod <- tbm %>%
transmute(across(c(EUR_EQ_XR, USD_EQ_XR), l.fs)) # apply list of functions over muliple columns
print(slice_tail(tbm.mod, n = 5))
```

`purrr`

package¶The tidyverse package `purrr`

enhances R’s functional programming (FP) toolkit. It offers a complete and consistent set of tools for applying functions to data structures, working with lists in general, reducing lists, and modifying function behaviour.

The family of `map()`

functions allows to replace many standard loops with code that is more succinct and easier to read. It generalizes the application of all sorts of operations to sets of variables in a dataframe. The `map`

function family is similar to the `apply`

function family in base R, but offers more variation and is more consistent in the format of output values.
In general, `purrr::map()`

returns a list or data frame. However, type of object that is returned can be determined by choosing map functions with appropriate suffix.

The simplest function `purrr::map(.x, .f,...)`

transforms its input by applying a function (`.f`

) to each element of a list, dataframe or atomic vector (`.x`

) and returning an object of the same length as the input. Put simple, __the function `.f` iterates over each element of the list `.x`__. The map functions also use the `...`

(“dot dot dot”) argument to pass along additional arguments to `.f`

at each interative call. Multiple arguments can be passed along using commas to separate them.

Since dataframes are lists of vectors, the `purrr::map()`

function is suitable for operating on dataframe columns. Thus, the function can be applied to selected columns.

In [10]:

```
tbm %>% select(starts_with(c("EUR", "USD")) & ends_with("FACT1")) %>% map(summary)
```

The standard `purrr::map(x.=list, .f = function)`

syntax works often. However, for cases __where one must specify how the list is used a more explicit version is required__. An explicit syntax can be based on a one-sided function, called a **mapper**, and takes the form `purrr::map(list, ~function(.x))`

. It uses the argument `.x`

, `.`

, or `..1`

to denote where the list element goes inside the function.

Mappers are short-hand one-sided functions that are often used like lambda functions. However, __mappers can also be named and reused__, very much like regular functions, by using the `purrr::as_mapper()`

function. Applied in an assignment statement this function gives **reusable mappers**.

In [11]:

```
returnLast12Months <- as_mapper(~sum(tail(.x, 12))) # reusable mapper
tbm %>% select(ends_with("EQ_XR")) %>% map_dbl(returnLast12Months) %>% print() # apply to subset of data table
```

The power and benefit of the `purrr:map()`

family becomes evident when considering the diversity of some other members:

- The functional
`purrr::map2(.x, .y, .f, ...)`

is used to iterate over two lists (`.x`

, and`.y`

) at the same time. This means it applies the function to pairs of arguments. - The functional
`purrr::pmap(.l, .f)`

is used to iterate over a**master list**, i.e. a list of lists or list of vectors (`.l`

). In particular, thus functional applies the function`.f.`

to parallel sets of inner list elements inside the masterlist.__Instead of using `.x` or `.y` this functional uses the sublist names as arguments__. - The functional
`purrr::invoke_map(.f, .x,...)`

invokes a list of functions`.f`

for a argument list`.x`

that is either of length 1 or of the same length as`.f`

.

In [12]:

```
v.rxr <- tbm %>% select(USD_EQ_XR, EUR_EQ_XR, JPY_EQ_XR) %>%
pmap_dbl(~..1 - 0.5 * (..2 + ..3)) # apply mapper with specific role of each series in expression
str(v.rxr)
```

`broom`

package¶The `broom`

package addresses the **model representation problem** in R. Mathematical models have shared notation and community standards. By contrast, __trained R models follow few community standards__. Information on trained R models is collected in lists of different shapes. Extracting specific information across models is therefore not straightforward. For example, R model class probabilities have typically different interfaces.

The `broom`

package provides a standardized way of representing trained models. In particular, it delivers __three S3 generics__ to extract trained model information:

`broom::tidy()`

returns a standardized tibble with consistent column names that provides information about fited coefficients`broom::glance()`

always returns a one-row tibble with consistent column names of goodness of fit measures`broom::augment()`

adds information about observations to a dataset (such as fitted values and standard errors of predictions) and to get predictions on new data.

The output of the above broom functions is always a tibble. Also, it never has rownames (which would prevent dataframes with duplicates from being stackable). Most importantly, __column names are kept consistent__, so that they can be combined across different models. These features make sure that model output is "tidy", i.e. easily usable for further work, and also stackable.

While `broom`

is useful for summarizing the result of a single analysis in a consistent format, it is really __designed for high-"throughput" applications__ that combine results from multiple analyses.

The function `broom::tidy(x,...)`

__summarizes the results of an estimation or test in a tidy tibble__, where each row contains a category of information. For regression models, these results typically are information sets related to coefficients. Standardized representation is particularly useful for joints with other similar estimations, as well as quick inspections and custom visualizations.

In [13]:

```
mod.ols <- lm(USD_EQ_XR ~ USD_FACT1 + USD_FACT2 + USD_FACT3, tbm)
tbl.olt <- broom::tidy(mod.ols) # standard tibble for linear coefficient estimation
print(tbl.olt)
```

In [14]:

```
mod.nls <- nls(USD_EQ_XR ~ c + b1 * USD_FACT1 + b2/USD_FACT2 + b3/USD_FACT3, tbm, start = list(c = 0, b1 = 1, b2 = 1, b3 = 1))
tbl.nlt <- broom::tidy(mod.nls) # standardized tibble for non-linear coefficient estimation
print(tbl.nlt)
```

The function `broom::glance()`

returns a tibble with exactly one row that contains model-specific goodness of fitness measures and related statistics. This is useful to check for model misspecification and to compare a variety models.
Goodness-of-fit __measures differ across models but their names are standardized__ so that comparisons can focus on the common ones.

In [15]:

```
map_lm <- as_mapper(~lm(as.formula(str_c("USD_EQ_XR ~ ", str_c("USD_FACT", 1:.x) %>% str_c(collapse = " + "))), tbm))
l.mods <- map(1:9, map_lm) # creates list of linear regression models with incremental factor number
tbl.cmp <- purrr::map_df(l.mods, broom::glance, .id = "model") %>% arrange(AIC) # ranked tibble of model fits
print(tbl.cmp)
```

`lubridate`

package¶The `lubridate`

package facilitates work and coding with data and date-time objects. In particular, it supports the parsing of date-times (i.e. converting strings or numbers into proper date-time objects), getting and setting components of date-time objects, rounding date-times and - in particular - mathematical operations with date-times, based on consistent timelines. This means operations are robust to time zones, leap days, daylight savings times, and other time-related idiosyncracies.

__Periods track how much the clock moves forward, regardless of what time actually passes__. They ignore time line irregularities. One adds or subtracts periods in order to create records at specific clock times, irrespective of how much time has actually passed. Base R makes periods with functions that bear the neame of the time unit pluralized, such as `years(x=1)`

, `months(x)`

or `weeks(x)`

. More generally `lubridate::period(num, units)`

is an automation-friendly period constructor that can handle various types of periods.

In [16]:

```
l.units <- list(10, 20, 30)
l.freqs <- list("months", "weeks", "days")
per <- map2(l.units, l.freqs, ~period(.x, units = .y)) %>% reduce(`+`) %>% print() # add up list of different period types
print(lubridate::ymd("2020-06-08") + 2 * per) # periods can be added to a date and get new date
```

Durations track the __passage of physical time__, which deviates from clock time when irregularities occur. One usually adds or subtracts durations to model physical processes, like battery life, but they also matter for trading rule and the assessment of the impact of economic-financial shocks on price dynamics. Durations are stored as seconds, which is the only time unit with consistent length. The `difftime`

objects in base R are a form of duration.

Lubridate makes periods with functions that bear the name of the period prefixed with a `d`

, such as `lubridate::dhours()`

, `lubridate::ddays()`

, or `lubridate::years()`

. All of these are considered to have a fixed number of seconds. There is no `lubridate::dmonths()`

, because calendar months cannot be standardized.

In [17]:

```
print(ymd("2020-01-01") + years(1)) # adding one year period
print(ymd("2020-01-01") + dyears(1)) # adding one year duration (in leap year does not reach the next year)
```

Intervals represent clock time passage (like periods) on a particular part of the timeline. Unlike periods, __intervals are defined by specific start and end times, rather than an amount of time__. If summer time sets clocks forward an interval of one hour arises, even if no time has passed. Technically, intervals represent the spaces in boundaries.

Intervals can be created with the `lubridate::interval(dt1, dt2)`

function or the `dt1 %--% dt2`

operator, both of which take date-time objects as arguments. Note that the interval between two units is one unit. For example, in the case of days the function counts from and to the same point within a day.

In [18]:

```
itv1 <- interval(ymd("2019-01-01"), ymd("2020-01-01"))
print(str(itv1))
itv2 <- dmy("1 Jan 2020") %--% dmy("1 Jan 2021")
print(itv2)
```

There is a range of useful operations that can be performed with intervals:

- the
`dt %within% itv`

operator checks if a date (or interval) falls within an interval (useful for blacklisting); `lubridate::int_start(int)`

and`lubridate::int_end(int)`

get and set first and last dates;`lubridate::int_length(int)`

gives the interval length in seconds;`lubridate::int_aligns(int1, int2)`

,`lubridate::int_overlaps(int1, int2)`

check if intervals have common boundary/overlap;`lubridate_shift(int, by)`

shift an interval along the timeline by an interval;`int_diff(times)`

turns the date-times in a vector into intervals;

In [19]:

```
itv <- interval(ymd("2019-01-01"), ymd("2020-01-01"))
ymd("2019-02-02") %within% itv %>% print # check if date falls into interval
interval(ymd("2019-09-01"), ymd("2020-05-01")) %within% itv %>% print # envelopment needs to be complete to give TRUE
```

`stringr`

package¶The `stringr`

package makes working with strings in R easier, particularly when this work involves regular expressions (regex). The package functions facilitate [1] detecting string pattern matches, [2] subsetting character vectors, [3] managing the length of strings, [4] changing strings by rules, [5] splitting and joining strings, and [6] ordering strings. This is a great benefit for workflows with financial and economic data structures, where time series are created in groups with references to their names across countries or markets.

The functions of the `stringr`

package have the following consistent features:

- All functions start with
`str_`

. - The first argument is a character vector.
- Many functions have a
`pattern`

argument that accepts regex.

The function `stringr::str_detect(cv, pattern)`

checks if the strings in the character vector `cv`

contain a specific `pattern`

. It __returns a logical vector__ of the same length as the input character vector, with TRUE for elements that contain the `pattern`

and FALSE otherwise. Analogously, `stringr::str_which(cv, pattern)`

returns the indexes of strings that match the pattern. These functions are particularly powerful with regular expressions. However, if a regular expression is not needed, the `stringr::fixed()`

function specifies that a pattern is a fixed string. This can yield substantial speed ups. Detecting pattern matches is highly useful in operating with data series in larger data sets by name.

In [20]:

```
str_which(names(tbm), "(^EUR|^USD).+XR$") %>% print() # indexes of strings beginning with EUR/USD that contain XR at the end.
```

`tidymodels`

packages¶Tidymodels packages provide tidyverse-friendly interfaces for model implementation. They have taken the role of the tidyverse’s machine learning toolkit. The tidymodels packages do not implement statistical models themselves. Rather they focus on facilitating the surrounding tasks:

- data sample splitting for validation (
`rsample`

), - data pre-processing (
`recipes`

), - standardized model training and out-of-sample predictions (
`parsnip`

), - creating and tuning hyperparameters
`(tune`

and`dials`

), - model validation with appropriate metrics (
`yardstick`

), and - bundling processes (
`workflows`

).

These critical functions are usually performed in sequence and give an efficient standardized workflow.

The `rsample package`

is useful for both random and time (ordered) splits.

- The function
`rsample::initial_split(data, prop = 3/4,...)`

creates a single binary random split of the data into a training set and testing set. - For time series
`rsample::initial_time_split(data, prop = 3/4, lag = 0, ...)`

does the same, but takes the first proportion of the sample for training, instead of a random selection. - Multi-fold cross-validation randomly splits the data into v groups of roughly equal size (called “folds”). The
`rsample::vfold_cv()`

function does such random splitting. - For time series
`rsample::rolling_origin(data, initial = 5, assess = 1, cumulative = TRUE, skip = 0, lag = 0,...)`

resamples consecutively across time.

All of the above give `rset`

objects that denote **splits**. The splits can be used after being passed as arguments to `rsample::training()`

and `rsample::testing()`

functions. The functions extract actual training and testing **samples**.

A popular standard of data partioning is an initial split into (broad) training and test sets and a - subsequent - split of the broad training set into various (core) training and validation folds. The latter typically serves model tuning, i.e. the selection of the model with the best tuning parameters.

In [21]:

```
tbm.usd <- tbm %>% select(starts_with("USD"))
rset.in <- rsample::initial_split(tbm.usd, prop = .7) # initial 70%/30% split object
tbl.trn <- rsample::training(rset.in) # broad training set
tbl.tst <- rsample::testing(rset.in) # test set
rset.cv <- rsample::vfold_cv(tbl.trn) # cross-validation folds of the broad training set
rset.cv %>% print() # for illustration
```

The `recipes`

package provides an interface that __facilitates data pre-processing for a specific estimation formula__. It specifies the role of each variable as an outcome or predictor variable. And, for intuition, the package functions are named after cooking actions.

The function `recipe(data, form,...)`

creates a `recipe`

object based principally on the model formula `form`

. The initial `recipe`

object becomes the basis for subsequent transformations. Each data transformation is called a **step**. The package offers functions that correspond to specific types of steps, such as normalization or elimination of highly correlated variables, each prefixed with `step_`

. There are nearly one hundred step functions available. Information on the scope of available step functions can be found here.

__A step can be applied to a specific variable, groups of variables, or all variables__. For the purpose of selection, the `recipes`

package offers special selector functions. Thus, `has_role()`

, `all_predictors()`

, and `all_outcomes()`

select variables according to their role in the estimation formula. Similarly, `has_type()`

, `all_numeric()`

, and `all_nominal()`

are used to select columns based on data type. Moreover, __select helpers from the tidyselect package can be used__, just as in `dplyr::select()`

, including `starts_with()`

, `contains()`

and so forth.

After specifying the transformations with step functions, `prep()`

executes the transformations on top of the data that is supplied. This __produces a modified recipe whose step objects have been updated__ with the required quantities.

In [22]:

```
rec.usd <- recipes::recipe(USD_EQ_XR ~ ., data = tbl.trn) %>%
recipes::step_corr(all_predictors(), threshold = 0.9) %>% # correlation filter on predictors
recipes::step_center(all_predictors()) %>% # centering
recipes::step_scale(all_predictors()) %>% # scaling to SD 1
print(rec.usd)
```

After a `recipe`

object has been created for a training data set, __testing data can be transformed in exactly the same way but out of sample__ by using the `bake()`

function. For a recipe with at least one pre-processing operation that has been trained by `prep.recipe()`

computations are applied to the new data.

The `parsnip`

package focuses on __standardizing model interfaces and return values__. When using `parsnip`

, one does not have to remember each interface and unique argument names. This facilitates coding across R packages. Information on the scope of supported models can be found here.

As model type that typically requires tunining is elastic net based on the `glmnet`

package. In a tidyverse workflow one calls such a model through the `parsnip`

interface, while leaving open the tuning parameters, i.e the penalty for regressors and the weight of LASSO versus ridge regression.

In [23]:

```
mod.elnet <- parsnip::linear_reg(penalty = tune::tune(), mixture = tune::tune()) %>%
parsnip::set_engine("glmnet")
print(mod.elnet)
```

A **workflow** is __an object that can bundle together pre-processing, modeling, and post-processing__. For example, a `recipe`

object and a `parsnip`

model can be combined into a workflow.

In [24]:

```
wfl.elnet <- workflows::workflow() %>%
workflows::add_recipe(rec.usd) %>% # add formula-based recipe to workflow
workflows::add_model(mod.elnet) # add general model type to workflow
```

The goal of the `tune`

package is to facilitate hyperparameter tuning for the tidymodels packages. It relies heavily on `recipes`

, `parsnip`

, and `dials`

. Model parameter tuning is accomplished by training models with different specification and testing the predictive success. By running a "grid" (various combinations) of parameter values we can build hundreds or thousands of models. Validation will then identify the "best" model, i.e. the model with optimized tuning parameters.

Here we use the `tune::tune_grid()`

function based on the pre-set workflow and the cross-validation folds to obtain tuning results for a grid that is being created (here by an integer). The chosen metric for evaluation here is the root mean-squared error.

In [25]:

```
trs.elnet <- wfl.elnet %>% # create tuning result object
tune_grid(resamples = rset.cv, grid = 10, metrics = yardstick::metric_set(yardstick::rmse))
trs.elnet %>% tune::collect_metrics() %>% print() # view summary of tuning results.
```

A tibble with the optimized parameters can be extracted trough `tune::select_best()`

.

In [26]:

```
tbl.ops <- trs.elnet %>% tune::select_best(metric = "rmse") # select best specification
print(tbl.ops) # tibble with optimized parameters
```

With `tune::finalize_workflow()`

and `parsnip::fit()`

one can then refit the the optimally tuned model based on the whole broad training sample.

In [27]:

```
wfl.reg <- wfl.elnet %>%
tune::finalize_workflow(tbl.ops) %>% # attach the best tuning parameters to the model
parsnip::fit(data = tbl.trn) # fit the final model to the training data
```

The refitted optimizal model can then be applied to the test set, allowing a validation of the overall learning method out of sample.

The tidymodels package `yardstick`

provides tidy characterizations of model performance. The `yardstick::metrics(data, truth, estimate,...)`

function produces common performance measures of a model. It will automatically choose metrics appropriate for the given type of model. The function expects a tibble with columns that contains the actual results (`truth`

) and what the model predicted (`estimate`

).

In [28]:

```
wfl.reg %>%
predict(new_data = tbl.tst) %>% # predict test set
bind_cols(tbl.tst, .) %>% # combine with actual
select(USD_EQ_XR, .pred) %>%
yardstick::metrics(USD_EQ_XR, .pred) %>% # validate
print()
```