An Introduction to metaconfoundr

The metaconfoundr package is a toolkit for visualizing confounding control in a set of studies included in a meta-analysis. In this approach, a set of domain experts agree on the variables that must be controlled to address confounding properly for a given scientific question. Then, for each confounder, every study is described as adequately controlled, inadequately controlled, or controlled with some concerns (see the vignette on evaluating studies and setting up your data). metaconfoundr visualizes these relationships using heatmaps and traffic light plots.

metaconfoundr() standardizes data for use in mc_heatmap() and mc_trafficlight(). Let’s look at an example with an included data set, ipi. These data represent 11 analyses (retrospective cohorts and sibling-matched designs) that evaluate the association between short interpregnancy interval (<6 months versus 18-23 months) and risk of preterm birth (<37 weeks gestation), along with the adequacy of confounder control in each analysis.

Using metaconfoundr() on ipi does some data wrangling to get it into the shape expected by the plotting functions:

library(metaconfoundr)

# for later examples
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

metaconfoundr(ipi)
#> # A tibble: 407 × 5
#>    construct         variable     is_confounder study        control_quality
#>    <chr>             <chr>        <chr>         <chr>        <ord>          
#>  1 Sociodemographics Maternal age Y             Zhu_2001a    adequate       
#>  2 Sociodemographics Maternal age Y             Zhu_2001b    adequate       
#>  3 Sociodemographics Maternal age Y             Zhu_1999     adequate       
#>  4 Sociodemographics Maternal age Y             Smith_2003   adequate       
#>  5 Sociodemographics Maternal age Y             Shachar_2016 adequate       
#>  6 Sociodemographics Maternal age Y             Salihu_2012a adequate       
#>  7 Sociodemographics Maternal age Y             Salihu_2012b adequate       
#>  8 Sociodemographics Maternal age Y             Hanley_2017  adequate       
#>  9 Sociodemographics Maternal age Y             deWeger_2011 adequate       
#> 10 Sociodemographics Maternal age Y             Coo_2017     adequate       
#> # … with 397 more rows

The vignette on evaluating studies has more detail, but in brief, the goal is to create a data frame with five columns and one row for each combination of confounder and study. The columns are construct, the domain to which a confounder belongs (e.g., “Sociodemographics”); variable, the name of the variable (e.g., “Maternal age”); is_confounder, an indicator of whether the variable is a confounder; study, the name of the study (or another unique ID); and control_quality, an indicator of the level of control for a confounder. control_quality is one of “adequate”, “some concerns”, or “inadequate”. metaconfoundr attempts to detect the layout of your data automatically, but you have full control (see ?mc_detect_layout). You can also specify the data in this format manually.
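
As a minimal sketch of what a manually specified data frame in this layout could look like, the base R example below builds two rows by hand. The study names are hypothetical, and the ordering of the control_quality levels is an assumption based on the `<ord>` column type shown above, not something confirmed here.

```r
# Hypothetical two-row example of the five-column long layout.
# Assumption: control_quality is an ordered factor with these three levels.
manual_df <- data.frame(
  construct = c("Sociodemographics", "Sociodemographics"),
  variable = c("Maternal age", "Maternal age"),
  is_confounder = c("Y", "Y"),
  study = c("study_1", "study_2"),  # hypothetical study IDs
  control_quality = factor(
    c("adequate", "some concerns"),
    levels = c("adequate", "some concerns", "inadequate"),
    ordered = TRUE
  ),
  stringsAsFactors = FALSE
)

str(manual_df)
```

Each row describes one confounder in one study, which is the unit the plotting functions work with.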

Data that you provide to metaconfoundr() can be in one of two basic formats: long or wide. With the long format, metaconfoundr() assumes that the five columns match the above layout and standardizes them. If there are more than five columns, metaconfoundr() treats the additional columns as studies (i.e., the data are in wide format) and automatically transforms them to the format expected by the metaconfoundr plotting functions. ipi has a wide cousin, ipi_wide, which metaconfoundr() can prepare seamlessly:

ipi_wide
#> # A tibble: 37 × 14
#>    construct      factor confo…¹ Zhu_2…² Zhu_2…³ Zhu_1…⁴ Smith…⁵ Shach…⁶ Salih…⁷
#>    <chr>          <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 Sociodemograp… Mater… Y             2       2       2       2       2       2
#>  2 Sociodemograp… Race/… Y             2       2       2       0       2       2
#>  3 Sociodemograp… Marit… Y             2       2       2       2       0       2
#>  4 Sociodemograp… Pater… Y             0       0       0       0       0       0
#>  5 Sociodemograp… Geogr… Y             0       0       2       0       0       0
#>  6 Socioeconomics SES c… Y             0       0       0       2       0       0
#>  7 Socioeconomics Incom… Y             0       0       0       0       0       0
#>  8 Socioeconomics Educa… Y             2       2       2       0       2       2
#>  9 Socioeconomics Insur… Y             0       0       0       0       2       0
#> 10 Reproductive … Prior… Y             2       2       2       2       2       2
#> # … with 27 more rows, 5 more variables: Salihu_2012b <dbl>, Hanley_2017 <dbl>,
#> #   deWeger_2011 <dbl>, Coo_2017 <dbl>, Ball_2014 <dbl>, and abbreviated
#> #   variable names ¹​confounder_y_n, ²​Zhu_2001a, ³​Zhu_2001b, ⁴​Zhu_1999,
#> #   ⁵​Smith_2003, ⁶​Shachar_2016, ⁷​Salihu_2012a

metaconfoundr(ipi_wide)
#> # A tibble: 407 × 5
#>    construct         variable     is_confounder study        control_quality
#>    <chr>             <chr>        <chr>         <chr>        <ord>          
#>  1 Sociodemographics Maternal age Y             Zhu_2001a    adequate       
#>  2 Sociodemographics Maternal age Y             Zhu_2001b    adequate       
#>  3 Sociodemographics Maternal age Y             Zhu_1999     adequate       
#>  4 Sociodemographics Maternal age Y             Smith_2003   adequate       
#>  5 Sociodemographics Maternal age Y             Shachar_2016 adequate       
#>  6 Sociodemographics Maternal age Y             Salihu_2012a adequate       
#>  7 Sociodemographics Maternal age Y             Salihu_2012b adequate       
#>  8 Sociodemographics Maternal age Y             Hanley_2017  adequate       
#>  9 Sociodemographics Maternal age Y             deWeger_2011 adequate       
#> 10 Sociodemographics Maternal age Y             Coo_2017     adequate       
#> # … with 397 more rows
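
To make the wide-to-long step more concrete, here is a rough base R sketch of a reshape of this kind. It is not metaconfoundr's internal implementation. The toy data and study names are hypothetical, and the mapping of the numeric codes is partly an assumption: the output above shows 2 corresponding to “adequate”, while 1 as “some concerns” and 0 as “inadequate” are guesses.

```r
# Toy wide data: three descriptive columns plus one column per (hypothetical) study.
wide <- data.frame(
  construct = c("Sociodemographics", "Socioeconomics"),
  factor = c("Maternal age", "Education"),
  confounder_y_n = c("Y", "Y"),
  study_1 = c(2, 0),
  study_2 = c(2, 1),
  stringsAsFactors = FALSE
)

# Every column that is not descriptive is treated as a study.
study_cols <- setdiff(names(wide), c("construct", "factor", "confounder_y_n"))

# Stack one block of rows per study.
long <- do.call(rbind, lapply(study_cols, function(s) {
  data.frame(
    construct = wide$construct,
    variable = wide$factor,
    is_confounder = wide$confounder_y_n,
    study = s,
    control_quality = wide[[s]],
    stringsAsFactors = FALSE
  )
}))

# Assumed coding: 2 = adequate, 1 = some concerns, 0 = inadequate.
long$control_quality <- factor(
  long$control_quality,
  levels = c(2, 1, 0),
  labels = c("adequate", "some concerns", "inadequate"),
  ordered = TRUE
)

long
```

The result has one row per confounder and study, so two confounders across two studies yield four rows, just as 37 confounders across 11 studies yield the 407 rows shown above.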

Creating plots

The primary goal of metaconfoundr is to visualize confounding control for a set of studies in a meta-analysis. The two main plotting functions are mc_heatmap() and mc_trafficlight(), which both accept data prepared by metaconfoundr().

mc_ipi <- metaconfoundr(ipi)
mc_heatmap(mc_ipi)

mc_trafficlight(mc_ipi)

Customizing plots

These results are ggplots and can thus be customized like any other plot from ggplot2.

wrap_labeller <- function(x) stringr::str_wrap(x, 10)

mc_heatmap(mc_ipi) + 
  facet_constructs(labeller = as_labeller(wrap_labeller)) + 
  theme_mc() + 
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5),
    strip.text = element_text(face = "bold")
  )

metaconfoundr also supports adding Cochrane-like symbols and colors to plots with geoms and scales. Note that these colors are not colorblind-friendly.

mc_trafficlight(mc_ipi) + 
  geom_cochrane() + 
  scale_fill_cochrane() + 
  theme_mc() + 
  guides(x = guide_axis(n.dodge = 3)) # dodge axis text rather than rotate

It’s also possible to sort plots by how well a confounder is controlled over all the studies included. See ?score_control for more information on available algorithms by which to sort confounders.

mc_heatmap(mc_ipi, sort = TRUE) + 
  theme_mc() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
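
To give a sense of what sorting confounders by control could mean, the sketch below ranks variables by their mean numeric score across studies. This is only one illustrative scoring rule and is not claimed to match any of the algorithms documented in ?score_control. The data, study names, and the 0/1/2 coding are assumptions.

```r
# Hypothetical scores: 2 = adequate, 1 = some concerns, 0 = inadequate (assumed coding).
quality <- data.frame(
  variable = rep(c("Maternal age", "Income", "Education"), each = 3),
  study = rep(c("study_1", "study_2", "study_3"), times = 3),
  score = c(2, 2, 2,  0, 0, 1,  2, 0, 2),
  stringsAsFactors = FALSE
)

# Average score per confounder across all studies.
mean_score <- vapply(split(quality$score, quality$variable), mean, numeric(1))

# Best-controlled confounders first.
ranked <- sort(mean_score, decreasing = TRUE)
ranked
```

Here “Maternal age” ranks first because every study controls for it adequately, while “Income” ranks last.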

Summarizing confounder control

In addition to visualizing all possible confounders, metaconfoundr supports evaluating confounders at the domain level. For instance, if we feel ipi has three core areas of confounding, we can specify which variables a study must control for to account adequately for each domain. These three domains are sociodemographics, socioeconomics, and reproductive history. We’ll say that controlling for maternal age, race/ethnicity, and marital status is sufficient to control for sociodemographics; either socioeconomic status alone, or insurance status together with education, is adequate for socioeconomics; and prior pregnancy outcomes are enough to control for reproductive history. We can specify these rules using boolean logic that refers to confounders in the variable column of our data:

summary_df <- summarize_control_quality(
  metaconfoundr(ipi),
  Sociodemographics = `Maternal age` & `Race/ethnicity` & `Marital status`,
  Socioeconomics = `SES category` | Insurance & Education,
  "Reproductive Hx" = `Prior pregnancy outcome`
)

summary_df
#> # A tibble: 44 × 4
#>    study     variable          control_quality construct
#>    <chr>     <fct>             <ord>           <fct>    
#>  1 Zhu_2001a overall           some concerns   overall  
#>  2 Zhu_2001a Sociodemographics adequate        domains  
#>  3 Zhu_2001a Socioeconomics    inadequate      domains  
#>  4 Zhu_2001a Reproductive Hx   adequate        domains  
#>  5 Zhu_2001b overall           some concerns   overall  
#>  6 Zhu_2001b Sociodemographics adequate        domains  
#>  7 Zhu_2001b Socioeconomics    inadequate      domains  
#>  8 Zhu_2001b Reproductive Hx   adequate        domains  
#>  9 Zhu_1999  overall           some concerns   overall  
#> 10 Zhu_1999  Sociodemographics adequate        domains  
#> # … with 34 more rows
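
The Socioeconomics rule above depends on R's operator precedence: `&` binds more tightly than `|`, so the rule reads as SES category, or both insurance and education. The sketch below illustrates this with plain logical values standing in for "adequately controlled"; the variable names are hypothetical, and how the package combines the three quality levels is not shown here.

```r
# Hypothetical study: SES is controlled, insurance and education are not.
ses <- TRUE
insurance <- FALSE
education <- FALSE

# Without parentheses, & is evaluated before |.
implicit <- ses | insurance & education

# Equivalent grouping, written out explicitly.
explicit <- ses | (insurance & education)

# A different rule: either SES or insurance, and also education.
alternative <- (ses | insurance) & education

c(implicit = implicit, explicit = explicit, alternative = alternative)
```

The first two expressions agree, while the third gives a different answer for this study, so adding parentheses to a rule is a cheap way to make the intended grouping explicit.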

Summarizing control quality creates a more straightforward visualization. You can also visualize just the overall control quality of a study by using the domains = FALSE argument in summarize_control_quality().

mc_heatmap(summary_df) +
  theme_mc() + 
  theme(legend.position = "right") +
  guides(x = guide_axis(n.dodge = 2))