The data_color() function

The data_color() function can be used without any supplied arguments to colorize a gt table. Let’s do this with the exibble dataset:

exibble |>
  gt() |>
  data_color()
num        char        fctr   date        time   datetime          currency   row    group
1.111e-01  apricot     one    2015-01-15  13:35  2018-01-01 02:22  49.950     row_1  grp_a
2.222e+00  banana      two    2015-02-15  14:40  2018-02-02 14:33  17.950     row_2  grp_a
3.333e+01  coconut     three  2015-03-15  15:45  2018-03-03 03:44  1.390      row_3  grp_a
4.444e+02  durian      four   2015-04-15  16:50  2018-04-04 15:55  65100.000  row_4  grp_a
5.550e+03  NA          five   2015-05-15  17:55  2018-05-05 04:00  1325.810   row_5  grp_b
NA         fig         six    2015-06-15  NA     2018-06-06 16:11  13.255     row_6  grp_b
7.770e+05  grapefruit  seven  NA          19:10  2018-07-07 05:22  NA         row_7  grp_b
8.880e+06  honeydew    eight  2015-08-15  20:20  NA                0.440      row_8  grp_b

What’s happened is that data_color() applies background colors to all cells of every column, using the default palette in R (accessed through palette()). The default method for applying color is "auto", where numeric values use the "numeric" method and character or factor values use the "factor" method. The text color is modified automatically to maximize contrast (since autocolor_text is TRUE by default).

You can use any of the available method keywords and gt will only apply color to the compatible values. Let’s use the "numeric" method and supply palette values of "red" and "green".

exibble |>
  gt() |>
  data_color(
    method = "numeric",
    palette = c("red", "green")
  )
num        char        fctr   date        time   datetime          currency   row    group
1.111e-01  apricot     one    2015-01-15  13:35  2018-01-01 02:22  49.950     row_1  grp_a
2.222e+00  banana      two    2015-02-15  14:40  2018-02-02 14:33  17.950     row_2  grp_a
3.333e+01  coconut     three  2015-03-15  15:45  2018-03-03 03:44  1.390      row_3  grp_a
4.444e+02  durian      four   2015-04-15  16:50  2018-04-04 15:55  65100.000  row_4  grp_a
5.550e+03  NA          five   2015-05-15  17:55  2018-05-05 04:00  1325.810   row_5  grp_b
NA         fig         six    2015-06-15  NA     2018-06-06 16:11  13.255     row_6  grp_b
7.770e+05  grapefruit  seven  NA          19:10  2018-07-07 05:22  NA         row_7  grp_b
8.880e+06  honeydew    eight  2015-08-15  20:20  NA                0.440      row_8  grp_b

With those options in place, we see that only the numeric columns num and currency received color treatments. Moreover, within each column the palette colors were mapped to the lower and upper limits of that column’s data, and interpolated colors were used for the values in between.
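The mapping of limits to palette ends can be sketched outside of gt with scales::col_numeric(). This is only an illustration of the interpolation idea, assuming the scales package is installed; it is not a statement of how data_color() works internally. The values are the non-missing entries of the currency column shown above.

```r
library(scales)

# Non-missing values of the currency column (copied from the table above)
currency <- c(49.95, 17.95, 1.39, 65100, 1325.81, 13.255, 0.44)

# Build a color-mapping function whose limits are the data's own range
map_color <- col_numeric(
  palette = c("red", "green"),
  domain = range(currency)
)

# One hex color per value: the minimum lands at the "red" end of the
# palette, the maximum at the "green" end, and the rest fall in between
colors <- map_color(currency)
data.frame(currency, colors)
```

Because 65100 is far larger than every other value, the remaining values all sit very close to the red end of the scale, which is one reason for setting the limits manually.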

We can constrain the cells to which coloring will be applied with the columns and rows arguments. Further to this, we can manually set the limits of the data with the domain argument (which is preferable in most cases). Here, the domain will be set as domain = c(0, 50).

exibble |>
  gt() |>
  data_color(
    columns = currency,
    rows = currency < 50,
    method = "numeric",
    palette = c("red", "green"),
    domain = c(0, 50)
  )
num        char        fctr   date        time   datetime          currency   row    group
1.111e-01  apricot     one    2015-01-15  13:35  2018-01-01 02:22  49.950     row_1  grp_a
2.222e+00  banana      two    2015-02-15  14:40  2018-02-02 14:33  17.950     row_2  grp_a
3.333e+01  coconut     three  2015-03-15  15:45  2018-03-03 03:44  1.390      row_3  grp_a
4.444e+02  durian      four   2015-04-15  16:50  2018-04-04 15:55  65100.000  row_4  grp_a
5.550e+03  NA          five   2015-05-15  17:55  2018-05-05 04:00  1325.810   row_5  grp_b
NA         fig         six    2015-06-15  NA     2018-06-06 16:11  13.255     row_6  grp_b
7.770e+05  grapefruit  seven  NA          19:10  2018-07-07 05:22  NA         row_7  grp_b
8.880e+06  honeydew    eight  2015-08-15  20:20  NA                0.440      row_8  grp_b
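As a rough illustration of what fixed limits mean, the sketch below calls scales::col_numeric() directly with the same palette and domain. It only shows how that scales function treats values inside and outside c(0, 50); that data_color() behaves the same way internally is an assumption here, not something this example establishes.

```r
library(scales)

# Same palette and fixed limits as in the data_color() call above
map_color <- col_numeric(
  palette = c("red", "green"),
  domain = c(0, 50)
)

# Two values inside the domain and one far outside it
values <- c(1.39, 49.95, 65100)

# col_numeric() warns about out-of-domain values and returns its
# `na.color` (default "#808080") for them; the warning is silenced here
colors <- suppressWarnings(map_color(values))
colors
```

Values above 50 have no place on a 0-to-50 scale, which fits with the example also restricting rows to currency < 50.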

We can use any of the palettes available in the RColorBrewer and viridis packages. Let’s make a new gt table from a subset of the countrypops dataset. Then, through data_color(), we’ll apply coloring to the population column with the "numeric" method, use a domain between 2.5 and 3.4 million, and specify palette = "viridis".

countrypops |>
  dplyr::filter(country_name == "Mongolia") |>
  dplyr::select(-contains("code")) |>
  tail(10) |>
  gt() |>
  data_color(
    columns = population,
    method = "numeric",
    palette = "viridis",
    domain = c(2.5E6, 3.4E6)
  )
country_name year population
Mongolia 2012 2792349
Mongolia 2013 2845153
Mongolia 2014 2902823
Mongolia 2015 2964749
Mongolia 2016 3029555
Mongolia 2017 3096030
Mongolia 2018 3163991
Mongolia 2019 3232430
Mongolia 2020 3294335
Mongolia 2021 3347782

We can alternatively use the fn argument for supplying the scales-based function scales::col_numeric(). That function call will itself return a function (which is what the fn argument actually requires) that takes a vector of numeric values and returns color values. Here is the more complex version of the code that returns the same table as in the previous example.

countrypops |>
  dplyr::filter(country_name == "Mongolia") |>
  dplyr::select(-contains("code")) |>
  tail(10) |>
  gt() |>
  data_color(
    columns = population,
    fn = scales::col_numeric(
      palette = "viridis",
      domain = c(2.5E6, 3.4E6)
    )
  )
country_name year population
Mongolia 2012 2792349
Mongolia 2013 2845153
Mongolia 2014 2902823
Mongolia 2015 2964749
Mongolia 2016 3029555
Mongolia 2017 3096030
Mongolia 2018 3163991
Mongolia 2019 3232430
Mongolia 2020 3294335
Mongolia 2021 3347782

Using your own function in fn can be very useful if you want to make use of specialized arguments in the scales col_*() functions. You could even supply your own specialized function for performing complex colorizing treatments!
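As a minimal sketch of such a custom function, here is a hypothetical helper named threshold_colors() (it is not part of gt or scales) that takes a numeric vector and returns one color string per value. The cutoff of 3 million and the two hex colors are arbitrary choices made for illustration.

```r
# Hypothetical helper: one color below the cutoff, another at or above it
threshold_colors <- function(x, cutoff = 3e6) {
  ifelse(x < cutoff, "#FDE725", "#440154")
}

# A few population values for Mongolia, copied from the table above
population <- c(2792349, 2845153, 2902823, 2964749, 3029555)

# Returns a character vector with one hex color per input value
threshold_colors(population)
```

A function of this shape should be usable as fn = threshold_colors in place of the scales::col_numeric() call, though missing values would need separate handling since ifelse() passes NA through unchanged.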

The data_color() function has a way to apply colorization indirectly to other columns. That is, you can apply colors to a column different from the one used to generate those specific colors. The trick is to use the target_columns argument. Let’s do this with a more complete countrypops-based table example.

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR")) |>
  dplyr::filter(year %% 10 == 0) |>
  dplyr::select(-contains("code")) |>
  dplyr::mutate(color = "") |>
  gt(groupname_col = "country_name") |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    target_columns = color,
    method = "numeric",
    palette = "viridis",
    domain = c(4E7, 7E7)
  ) |>
  cols_label(
    year = "",
    population = "Population",
    color = ""
  ) |>
  opt_vertical_padding(scale = 0.65)
Population
France
1960 46,649,927
1970 51,724,116
1980 55,052,582
1990 58,044,701
2000 60,921,384
2010 65,030,575
2020 67,571,107
United Kingdom
1960 52,400,000
1970 55,663,250
1980 56,314,216
1990 57,247,586
2000 58,892,514
2010 62,766,365
2020 67,081,000

When specifying a single column in columns we can use as many target_columns values as we want. Let’s make another countrypops-based table where we map the generated colors from the year column to all columns in the table. This time, the palette used is "inferno" (also from the viridis package).

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR", "ITA")) |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(year %% 5 == 0) |>
  tidyr::pivot_wider(
    names_from = "country_name",
    values_from = "population"
  ) |>
  gt() |>
  fmt_integer(columns = c(everything(), -year)) |>
  cols_width(
    year ~ px(80),
    everything() ~ px(160)
  ) |>
  opt_all_caps() |>
  opt_vertical_padding(scale = 0.75) |>
  opt_horizontal_padding(scale = 3) |>
  data_color(
    columns = year,
    target_columns = everything(),
    palette = "inferno"
  ) |>
  tab_options(
    table_body.hlines.style = "none",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    table_body.border.bottom.color = "black"
  )
year France United Kingdom Italy
1960 46,649,927 52,400,000 50,199,700
1965 49,282,756 54,348,050 52,112,350
1970 51,724,116 55,663,250 53,821,850
1975 53,715,733 56,225,800 55,441,001
1980 55,052,582 56,314,216 56,433,883
1985 56,569,195 56,550,268 56,593,071
1990 58,044,701 57,247,586 56,719,240
1995 59,543,659 58,019,030 56,844,303
2000 60,921,384 58,892,514 56,942,108
2005 63,188,395 60,401,206 57,969,484
2010 65,030,575 62,766,365 59,277,417
2015 66,548,272 65,116,219 60,730,582
2020 67,571,107 67,081,000 59,438,851

Now, it’s time to use pizzaplace to create a gt table. The color palette to be used is the "ggsci::red_material" one (it’s in the ggsci R package but also obtainable from the paletteer package). Colorization will be applied to the sold and income columns. We don’t have to specify those in columns because those are the only columns in the table. Also, the domain is not set here. We’ll use the bounds of the available data in each column.

pizzaplace |>
  dplyr::group_by(type, size) |>
  dplyr::summarize(
    sold = dplyr::n(),
    income = sum(price),
    .groups = "drop_last"
  ) |>
  dplyr::group_by(type) |>
  dplyr::mutate(f_sold = sold / sum(sold)) |>
  dplyr::mutate(size = factor(
    size, levels = c("S", "M", "L", "XL", "XXL"))
  ) |>
  dplyr::arrange(type, size) |>
  gt(
    rowname_col = "size",
    groupname_col = "type"
  ) |>
  fmt_percent(
    columns = f_sold,
    decimals = 1
  ) |>
  cols_merge(
    columns = c(size, f_sold),
    pattern = "{1} ({2})"
  ) |>
  cols_align(align = "left", columns = stub()) |>
  data_color(
    method = "numeric",
    palette = "ggsci::red_material"
  )
sold income
chicken
S (20.1%) 2224 28356.00
M (35.2%) 3894 65224.50
L (44.6%) 4932 102339.00
classic
S (41.2%) 6139 69870.25
M (27.6%) 4112 60581.75
L (27.3%) 4057 74518.50
XL (3.7%) 552 14076.00
XXL (0.2%) 28 1006.60
supreme
S (28.2%) 3377 47463.50
M (33.8%) 4046 66475.00
L (38.1%) 4564 94258.50
veggie
S (22.9%) 2663 32386.75
M (30.8%) 3583 57101.00
L (46.4%) 5403 104202.70

Colorization can occur in a row-wise manner. The key to making that happen is to use direction = "row". Let’s use the sza dataset to make a gt table. Then, color will be applied to values across each ‘month’ of data in that table. This is useful when not setting a domain, since the bounds of each row will be captured and each cell will be colored relative to the range of its own row. The palette is "PuOr" from the RColorBrewer package (only the name is required here).

sza |>
  dplyr::filter(latitude == 20 & tst <= "1200") |>
  dplyr::select(-latitude) |>
  dplyr::filter(!is.na(sza)) |>
  tidyr::spread(key = "tst", value = sza) |>
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  data_color(
    direction = "row",
    palette = "PuOr",
    na_color = "white"
  )
month  0530  0600  0630  0700  0730  0800  0830  0900  0930  1000  1030  1100  1130  1200
jan                      84.9  78.7  72.7  66.1  61.5  56.5  52.1  48.3  45.5  43.6  43.0
feb                88.9  82.5  75.8  69.6  63.3  57.7  52.2  47.4  43.1  40.0  37.8  37.2
mar                85.7  78.8  72.0  65.2  58.6  52.3  46.2  40.5  35.5  31.4  28.6  27.7
apr          88.5  81.5  74.4  67.4  60.3  53.4  46.5  39.7  33.2  26.9  21.3  17.2  15.5
may          85.0  78.2  71.2  64.3  57.2  50.2  43.2  36.1  29.1  26.1  15.2   8.8   5.0
jun    89.2  82.7  76.0  69.3  62.5  55.7  48.8  41.9  35.0  28.1  21.1  14.2   7.3   2.0
jul    88.8  82.3  75.7  69.1  62.3  55.5  48.7  41.8  35.0  28.1  21.2  14.3   7.7   3.1
aug          83.8  77.1  70.2  63.3  56.4  49.4  42.4  35.4  28.3  21.3  14.3   7.3   1.9
sep          87.2  80.2  73.2  66.1  59.1  52.1  45.1  38.1  31.3  24.7  18.6  13.7  11.6
oct                84.1  77.1  70.2  63.3  56.5  49.9  43.5  37.5  32.0  27.4  24.3  23.1
nov                87.8  81.3  74.5  68.3  61.8  56.0  50.2  45.3  40.7  37.4  35.1  34.4
dec                      84.3  78.0  71.8  66.1  60.5  55.6  50.9  47.2  44.2  42.4  41.8

Notice that na_color = "white" was used, and this avoids the appearance of gray cells for the missing values (we also removed the "NA" text with sub_missing(), opting for empty strings).
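The row-wise idea can be illustrated without gt by computing a separate color scale for each row. The sketch below uses scales::col_numeric() on a small hand-made matrix holding the jan and jun values at four of the times shown above. The helper color_row() is hypothetical, and the code illustrates per-row limits only; it is not meant to reproduce what data_color() does internally.

```r
library(scales)

# Values for jan and jun at four times (copied from the table above);
# NA marks the missing jan value at 0530
sza_small <- rbind(
  jan = c(NA, 84.9, 52.1, 43.0),
  jun = c(89.2, 69.3, 28.1, 2.0)
)
colnames(sza_small) <- c("0530", "0700", "1000", "1200")

# Hypothetical helper: color one row using that row's own range as the domain
color_row <- function(x) {
  col_numeric(
    palette = "PuOr",
    domain = range(x, na.rm = TRUE),
    na.color = "#FFFFFF"
  )(x)
}

# apply() returns one column per row of the input, so transpose back
row_colors <- t(apply(sza_small, 1, color_row))
dimnames(row_colors) <- dimnames(sza_small)
row_colors
```

The smallest value in each row (43.0 for jan, 2.0 for jun) receives the same color even though the two values are far apart, which is what coloring relative to each row's range means. The missing jan value gets the white na.color.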