Calculate change in predicted response over diversity gradient
gradient_change_data.Rd
Helper function for creating the data to visualise a scatter-plot of the
response over a diversity gradient. The "richness" and "evenness"
diversity gradients are currently supported. The average (predicted) response
is calculated from all communities present at a given level of the
chosen diversity gradient in `data`. The output of this
function can be passed to the gradient_change_plot function to visualise results.
Arguments
- data
A data-frame consisting of variable proportions and any other necessary variables to make predictions from `model` or `coefficients`.
- prop
A vector identifying the column-names or indices of the columns containing the variable proportions in `data`.
- add_var
A list or data-frame specifying values for additional predictor variables in the model that are independent of the compositional predictor variables. This is useful for comparing predictions across different values of a non-compositional variable. If specified as a list, it is expanded to show a plot for each unique combination of the values specified, whereas if specified as a data-frame, one plot is generated for each row in the data and the plots are arranged in a grid according to the values specified in `nrow` and `ncol`.
- gradient
Diversity gradient to show on the X-axis, one of "richness" or "evenness". Defaults to "richness". See `Details` for more information.
- prediction
A logical value indicating whether to pass the final data to the `add_prediction` function and append the predictions to the data. Default value is TRUE, but often it would be desirable to make additional changes to the data before making any predictions, so the user can set this to FALSE and manually call the `add_prediction` function.
- ...
Arguments passed on to
add_prediction
model
A regression model object which will be used to make predictions for the observations in `data`. Will override `coefficients` if specified.
coefficients
If a regression model is not available (or can't be fit in R), the regression coefficients from a model fit in some other language can be used to calculate predictions. However, the user must ensure there is an appropriate one-to-one positional mapping between the data columns and the coefficient values. Further, a variance-covariance matrix of the coefficients must be provided in the `vcov` parameter if the associated confidence interval for the predictions is needed; without it, confidence/prediction intervals cannot be calculated using this method.
vcov
If regression coefficients are specified, then the variance-covariance matrix of the coefficients can be specified here to calculate the associated confidence interval around each prediction. Failure to do so would result in no confidence intervals being returned. Ensure `coefficients` and `vcov` have the same positional mapping with the data.
coeff_cols
If `coefficients` are specified and a one-to-one positional mapping between the data columns and the coefficient vector is not present, character strings or numeric indices can be specified here to reorder the data columns and match each coefficient value to its respective data column. See the "Use model coefficients for prediction" section in examples.
conf.level
The confidence level for calculating confidence/prediction intervals. Default is 0.95.
interval
Type of interval to calculate:
- "none" (default)
No interval to be calculated.
- "confidence"
Calculate a confidence interval.
- "prediction"
Calculate a prediction interval.
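The relationship between `coefficients`, `vcov` and the positional mapping of data columns can be illustrated with a minimal base R sketch. It is not the package's implementation: it uses an ordinary `lm` fit on the built-in `mtcars` data purely as a stand-in, and it assumes a t-based confidence interval, which may differ from what `add_prediction` does internally. The point is only that each prediction is the row of data multiplied by the coefficient vector, so the columns must line up with the coefficients, and that interval widths need the variance-covariance matrix.

```r
# Stand-in model (assumption: any linear model would do for this sketch)
fit  <- lm(mpg ~ wt + hp, data = mtcars)
beta <- coef(fit)   # coefficient vector: (Intercept), wt, hp
V    <- vcov(fit)   # variance-covariance matrix of the coefficients

# Data columns arranged in the SAME order as the coefficients
X <- cbind(`(Intercept)` = 1, wt = mtcars$wt, hp = mtcars$hp)[1:4, ]

# Prediction: each row of the data times the coefficient vector
pred <- as.vector(X %*% beta)

# Standard error of each prediction: sqrt(diag(X V X'))
se <- sqrt(rowSums((X %*% V) * X))

# 95% confidence interval (t-based here, as used by predict.lm)
crit  <- qt(0.975, df = df.residual(fit))
lower <- pred - crit * se
upper <- pred + crit * se

# Cross-check against predict() on the same rows
ref <- predict(fit, newdata = mtcars[1:4, ],
               interval = "confidence", level = 0.95)
all.equal(pred, unname(ref[, "fit"]))
all.equal(lower, unname(ref[, "lwr"]))
```

If the columns of `X` were in a different order from `beta`, the products would silently pair the wrong values, which is the situation `coeff_cols` is meant to address.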
Value
The data-frame with the following columns appended at the end
- .Richness
The richness (number of non-zero compositional variables) within each observation.
- .Evenness
The evenness (metric quantifying the relative abundance of each compositional variable) within each observation.
- .Gradient
A character string defining the diversity gradient used for averaging the response.
- .add_str_ID
An identifier column for grouping the cartesian product of all additional columns specified in `add_var` parameter (if `add_var` is specified).
- .Pred
The predicted response for each observation.
- .Lower
The lower limit of the prediction/confidence interval for each observation.
- .Upper
The upper limit of the prediction/confidence interval for each observation.
- .Avg
The averaged value of the predicted response for each unique value of the selected diversity gradient.
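As a rough illustration of how `.Avg` relates to `.Pred`, the sketch below averages made-up predictions within each level of a richness gradient using base R. The data values are hypothetical and the grouping shown here (by gradient value only) is an assumption based on the column description above, not a copy of the function's internals.

```r
# Hypothetical predictions for five communities (values are made up)
toy <- data.frame(
  .Richness = c(1, 1, 2, 2, 4),
  .Pred     = c(10, 12, 15, 17, 20)
)

# Average the predictions within each level of the gradient and
# repeat the group mean on every row of that group
toy$.Avg <- ave(toy$.Pred, toy$.Richness, FUN = mean)
toy$.Avg
# 11 11 16 16 20
```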
Details
Currently two diversity gradients are supported:
Richness: A metric describing the number of non-zero compositional variables in an observation.
Evenness: A metric quantifying the relative abundances of all compositional variables in an observation. Defined as $$(2s/(s-1)) \sum_{i, j = 1; i < j}^{s}{p_i * p_j}$$ where \(s\) is the total number of compositional variables and \(p_i\) and \(p_j\) are the proportions of the variables \(i\) and \(j\). See Kirwan et al., 2007 <doi:10.1890/08-1684.1> and Kirwan et al., 2009 <doi:10.1890/08-1684.1> for more information.
Here's a small example of how these metrics are calculated for a few observations. Suppose we have four compositional variables (i.e. \(s = 4\)) and have the following three observations
A = (0.5, 0.5, 0, 0)
B = (0.25, 0.25, 0.25, 0.25)
C = (1, 0, 0, 0)
The richness values for these three observations would be as follows
A = 2 (Since two of the four compositional variables were non-zero)
B = 4 (Since all four compositional variables were non-zero)
C = 1 (Since one of the four compositional variables was non-zero)
The evenness values would be calculated as follows
A = \(2*4/(4-1)*(0.5*0.5+0.5*0+0.5*0+0.5*0+0.5*0+0*0) = 0.67\)
B = \(2*4/(4-1)*(0.25*0.25+0.25*0.25+0.25*0.25+0.25*0.25+0.25*0.25+0.25*0.25) = 1\)
C = \(2*4/(4-1)*(1*0+1*0+1*0+0*0+0*0+0*0) = 0\)
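The worked example above can be reproduced with a short base R sketch. The helper names `richness` and `evenness` are hypothetical and defined here only for illustration; they are not functions exported by the package. The sketch assumes at least two compositional variables (\(s \geq 2\)), since the formula divides by \(s - 1\).

```r
# Richness: number of non-zero proportions in an observation
richness <- function(p) sum(p > 0)

# Evenness: (2s / (s - 1)) * sum over all pairs i < j of p_i * p_j
evenness <- function(p) {
  s <- length(p)
  pairs <- combn(s, 2)   # 2 x choose(s, 2) matrix of index pairs
  (2 * s / (s - 1)) * sum(p[pairs[1, ]] * p[pairs[2, ]])
}

# The three observations from the example
obs <- list(
  A = c(0.5, 0.5, 0, 0),
  B = c(0.25, 0.25, 0.25, 0.25),
  C = c(1, 0, 0, 0)
)

rich <- sapply(obs, richness)
even <- round(sapply(obs, evenness), 2)
rich   # A = 2, B = 4, C = 1
even   # A = 0.67, B = 1, C = 0
```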
Examples
library(DImodels)
library(dplyr)
## Load data
data(sim2)
## Fit model
mod <- glm(response ~ 0 + (p1 + p2 + p3 + p4)^2, data = sim2)
## Create data
## By default response would be averaged on the basis of richness
head(gradient_change_data(data = sim2,
prop = c("p1", "p2", "p3", "p4"),
model = mod))
#> ✔ Finished data preparation
#> # A tibble: 6 × 12
#> community block p1 p2 p3 p4 response .Richness .Evenness .Gradient
#> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1 1 0.7 0.1 0.1 0.1 20.2 4 0.64 .Richness
#> 2 1 2 0.7 0.1 0.1 0.1 20.1 4 0.64 .Richness
#> 3 1 3 0.7 0.1 0.1 0.1 20.9 4 0.64 .Richness
#> 4 1 4 0.7 0.1 0.1 0.1 17.0 4 0.64 .Richness
#> 5 2 1 0.1 0.7 0.1 0.1 17.2 4 0.64 .Richness
#> 6 2 2 0.1 0.7 0.1 0.1 19.9 4 0.64 .Richness
#> # ℹ 2 more variables: .Pred <dbl>, .Avg <dbl>
## Average response with respect to evenness
head(gradient_change_data(data = sim2,
prop = c("p1", "p2", "p3", "p4"),
model = mod,
gradient = "evenness"))
#> ✔ Finished data preparation
#> # A tibble: 6 × 12
#> community block p1 p2 p3 p4 response .Richness .Evenness .Gradient
#> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1 1 0.7 0.1 0.1 0.1 20.2 4 0.64 .Evenness
#> 2 1 2 0.7 0.1 0.1 0.1 20.1 4 0.64 .Evenness
#> 3 1 3 0.7 0.1 0.1 0.1 20.9 4 0.64 .Evenness
#> 4 1 4 0.7 0.1 0.1 0.1 17.0 4 0.64 .Evenness
#> 5 2 1 0.1 0.7 0.1 0.1 17.2 4 0.64 .Evenness
#> 6 2 2 0.1 0.7 0.1 0.1 19.9 4 0.64 .Evenness
#> # ℹ 2 more variables: .Pred <dbl>, .Avg <dbl>
## Additional variables can also be added to the data by either specifying
## them directly in the `data` or by using the `add_var` argument
## Refit model
sim2$block <- as.numeric(sim2$block)
new_mod <- update(mod, ~. + block, data = sim2)
## This model has block so we can either specify block in the data
subset_data <- sim2[c(1,5,9,11), 2:6]
subset_data
#> block p1 p2 p3 p4
#> 1 1 0.7 0.1 0.1 0.1
#> 5 1 0.1 0.7 0.1 0.1
#> 9 1 0.1 0.1 0.7 0.1
#> 11 3 0.1 0.1 0.7 0.1
head(gradient_change_data(data = subset_data,
prop = c("p1", "p2", "p3", "p4"),
model = mod,
gradient = "evenness"))
#> ✔ Finished data preparation
#> # A tibble: 4 × 10
#> block p1 p2 p3 p4 .Richness .Evenness .Gradient .Pred .Avg
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 1 0.7 0.1 0.1 0.1 4 0.64 .Evenness 18.4 17.3
#> 2 1 0.1 0.7 0.1 0.1 4 0.64 .Evenness 17.5 17.3
#> 3 1 0.1 0.1 0.7 0.1 4 0.64 .Evenness 16.6 17.3
#> 4 3 0.1 0.1 0.7 0.1 4 0.64 .Evenness 16.6 17.3
## Or we could add the variable using `add_var`
subset_data <- sim2[c(1,5,9,11), 3:6]
subset_data
#> p1 p2 p3 p4
#> 1 0.7 0.1 0.1 0.1
#> 5 0.1 0.7 0.1 0.1
#> 9 0.1 0.1 0.7 0.1
#> 11 0.1 0.1 0.7 0.1
head(gradient_change_data(data = subset_data,
prop = c("p1", "p2", "p3", "p4"),
model = new_mod,
gradient = "evenness",
add_var = list(block = c(1, 2))))
#> ✔ Finished data preparation
#> # A tibble: 6 × 11
#> p1 p2 p3 p4 block .add_str_ID .Richness .Evenness .Gradient .Pred
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 0.7 0.1 0.1 0.1 1 block: 1 4 0.64 .Evenness 19.1
#> 2 0.1 0.7 0.1 0.1 1 block: 1 4 0.64 .Evenness 18.2
#> 3 0.1 0.1 0.7 0.1 1 block: 1 4 0.64 .Evenness 17.3
#> 4 0.1 0.1 0.7 0.1 1 block: 1 4 0.64 .Evenness 17.3
#> 5 0.7 0.1 0.1 0.1 2 block: 2 4 0.64 .Evenness 18.7
#> 6 0.1 0.7 0.1 0.1 2 block: 2 4 0.64 .Evenness 17.8
#> # ℹ 1 more variable: .Avg <dbl>
## The benefit of specifying the variable this way is that we now have an
## ID column called `.add_str_ID`, which could be used to create a
## separate plot for each value of the additional variable
## Model coefficients can also be used, but then the user would have
## to specify the data with all columns corresponding to each coefficient
coef_data <- sim2 %>%
mutate(`p1:p2` = p1*p2, `p1:p3` = p1*p3, `p1:p4` = p1*p4,
`p2:p3` = p2*p3, `p2:p4` = p2*p4, `p3:p4` = p3*p4) %>%
select(p1, p2, p3, p4,
`p1:p2`, `p1:p3`, `p1:p4`,
`p2:p3`, `p2:p4`, `p3:p4`) %>%
slice(1,5,9,11)
print(coef_data)
#> p1 p2 p3 p4 p1:p2 p1:p3 p1:p4 p2:p3 p2:p4 p3:p4
#> 1 0.7 0.1 0.1 0.1 0.07 0.07 0.07 0.01 0.01 0.01
#> 5 0.1 0.7 0.1 0.1 0.07 0.01 0.01 0.07 0.07 0.01
#> 9 0.1 0.1 0.7 0.1 0.01 0.07 0.01 0.07 0.01 0.07
#> 11 0.1 0.1 0.7 0.1 0.01 0.07 0.01 0.07 0.01 0.07
print(mod$coefficients)
#> p1 p2 p3 p4 p1:p2 p1:p3 p1:p4 p2:p3
#> 10.699426 10.228917 8.939289 8.532857 33.894874 37.552444 32.720996 26.739691
#> p2:p4 p3:p4
#> 33.188799 27.771368
gradient_change_data(data = coef_data,
prop = c("p1", "p2", "p3", "p4"),
gradient = "evenness",
coefficients = mod$coefficients,
interval = "none")
#> ✔ Finished data preparation
#> # A tibble: 4 × 15
#> p1 p2 p3 p4 `p1:p2` `p1:p3` `p1:p4` `p2:p3` `p2:p4` `p3:p4`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.7 0.1 0.1 0.1 0.07 0.07 0.07 0.01 0.01 0.01
#> 2 0.1 0.7 0.1 0.1 0.07 0.01 0.01 0.07 0.07 0.01
#> 3 0.1 0.1 0.7 0.1 0.01 0.07 0.01 0.07 0.01 0.07
#> 4 0.1 0.1 0.7 0.1 0.01 0.07 0.01 0.07 0.01 0.07
#> # ℹ 5 more variables: .Richness <dbl>, .Evenness <dbl>, .Gradient <chr>,
#> # .Pred <dbl>, .Avg <dbl>