The gt package – gt-data

The `data_color()` function

The data_color() function can be used without any supplied arguments to colorize a gt table. Let’s do this with the exibble dataset:

exibble |>
  gt() |>
  data_color()

num	char	fctr	date	time	datetime	currency	row	group
1.111e-01	apricot	one	2015-01-15	13:35	2018-01-01 02:22	49.950	row_1	grp_a
2.222e+00	banana	two	2015-02-15	14:40	2018-02-02 14:33	17.950	row_2	grp_a
3.333e+01	coconut	three	2015-03-15	15:45	2018-03-03 03:44	1.390	row_3	grp_a
4.444e+02	durian	four	2015-04-15	16:50	2018-04-04 15:55	65100.000	row_4	grp_a
5.550e+03	NA	five	2015-05-15	17:55	2018-05-05 04:00	1325.810	row_5	grp_b
NA	fig	six	2015-06-15	NA	2018-06-06 16:11	13.255	row_6	grp_b
7.770e+05	grapefruit	seven	NA	19:10	2018-07-07 05:22	NA	row_7	grp_b
8.880e+06	honeydew	eight	2015-08-15	20:20	NA	0.440	row_8	grp_b

What’s happened is that data_color() applies background colors to all cells of every column using R’s default palette (accessed through palette()). The default method for applying color is "auto", where numeric values use the "numeric" method and character or factor values use the "factor" method. The text color is adjusted automatically to maximize contrast with the background (since autocolor_text is TRUE by default).
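
To see which colors that default corresponds to in a given session, the palette can be inspected directly. This is only a quick base R check; the exact colors depend on the R version and on whether the palette has been changed during the session.

```r
# Colors that data_color() falls back on when no palette is supplied
# (per the description above; the vector differs between R versions)
default_cols <- grDevices::palette()
default_cols
```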

You can use any of the available method keywords and gt will only apply color to the compatible values. Let’s use the "numeric" method and supply palette values of "red" and "green".

exibble |>
  gt() |>
  data_color(
    method = "numeric",
    palette = c("red", "green")
  )

num	char	fctr	date	time	datetime	currency	row	group
1.111e-01	apricot	one	2015-01-15	13:35	2018-01-01 02:22	49.950	row_1	grp_a
2.222e+00	banana	two	2015-02-15	14:40	2018-02-02 14:33	17.950	row_2	grp_a
3.333e+01	coconut	three	2015-03-15	15:45	2018-03-03 03:44	1.390	row_3	grp_a
4.444e+02	durian	four	2015-04-15	16:50	2018-04-04 15:55	65100.000	row_4	grp_a
5.550e+03	NA	five	2015-05-15	17:55	2018-05-05 04:00	1325.810	row_5	grp_b
NA	fig	six	2015-06-15	NA	2018-06-06 16:11	13.255	row_6	grp_b
7.770e+05	grapefruit	seven	NA	19:10	2018-07-07 05:22	NA	row_7	grp_b
8.880e+06	honeydew	eight	2015-08-15	20:20	NA	0.440	row_8	grp_b

With those options in place we see that only the numeric columns num and currency received color treatments. Moreover, the palette colors were mapped to the lower and upper limits of the data in each column; interpolated colors were used for the values in between the numeric limits of the two columns.
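
The mapping can be sketched outside of gt with scales::col_numeric(). This assumes the "numeric" method behaves like that function when the domain is a column's own range; the values are copied from the currency column (with the NA dropped), and pal_fn is just an illustrative name.

```r
# Values from the `currency` column of exibble (NA dropped for clarity)
currency <- c(49.95, 17.95, 1.39, 65100, 1325.81, 13.255, 0.44)

# Build a color-mapping function whose domain is the column's own range
pal_fn <- scales::col_numeric(
  palette = c("red", "green"),
  domain = range(currency)
)

# The minimum maps to the red end, the maximum to the green end,
# and everything else to an interpolated color in between
colors <- pal_fn(currency)
data.frame(currency, colors)
```

Because 65100 is so much larger than the other values, nearly all of the remaining cells end up close to the red end of the scale, which is one reason to set the domain explicitly.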

We can constrain the cells to which coloring will be applied with the columns and rows arguments. Further to this, we can manually set the limits of the data with the domain argument (which is preferable in most cases). Here, the domain will be set as domain = c(0, 50).

exibble |>
  gt() |>
  data_color(
    columns = currency,
    rows = currency < 50,
    method = "numeric",
    palette = c("red", "green"),
    domain = c(0, 50)
  )

num	char	fctr	date	time	datetime	currency	row	group
1.111e-01	apricot	one	2015-01-15	13:35	2018-01-01 02:22	49.950	row_1	grp_a
2.222e+00	banana	two	2015-02-15	14:40	2018-02-02 14:33	17.950	row_2	grp_a
3.333e+01	coconut	three	2015-03-15	15:45	2018-03-03 03:44	1.390	row_3	grp_a
4.444e+02	durian	four	2015-04-15	16:50	2018-04-04 15:55	65100.000	row_4	grp_a
5.550e+03	NA	five	2015-05-15	17:55	2018-05-05 04:00	1325.810	row_5	grp_b
NA	fig	six	2015-06-15	NA	2018-06-06 16:11	13.255	row_6	grp_b
7.770e+05	grapefruit	seven	NA	19:10	2018-07-07 05:22	NA	row_7	grp_b
8.880e+06	honeydew	eight	2015-08-15	20:20	NA	0.440	row_8	grp_b
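
The rows filter matters here because values outside the domain have no position on the color scale. A minimal sketch with scales::col_numeric() shows what that function does on its own (gt has its own na_color argument and may handle this differently); pal_fn is an illustrative name.

```r
# Same palette and domain as in the data_color() call above
pal_fn <- scales::col_numeric(
  palette = c("red", "green"),
  domain = c(0, 50)
)

# Values inside the domain get interpolated colors
in_domain <- pal_fn(c(0.44, 17.95, 49.95))

# A value outside the domain triggers a warning in scales and falls
# back to the function's NA color (the scales default is "#808080")
out_of_domain <- suppressWarnings(pal_fn(65100))

in_domain
out_of_domain
```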

We can use any of the palettes available in the RColorBrewer and viridis packages. Let’s make a new gt table from a subset of the countrypops dataset. Then, through data_color(), we’ll apply coloring to the population column with the "numeric" method, use a domain between 2.5 and 3.4 million, and specify palette = "viridis".

countrypops |>
  dplyr::filter(country_name == "Mongolia") |>
  dplyr::select(-contains("code")) |>
  tail(10) |>
  gt() |>
  data_color(
    columns = population,
    method = "numeric",
    palette = "viridis",
    domain = c(2.5E6, 3.4E6)
  )

country_name	year	population
Mongolia	2012	2792349
Mongolia	2013	2845153
Mongolia	2014	2902823
Mongolia	2015	2964749
Mongolia	2016	3029555
Mongolia	2017	3096030
Mongolia	2018	3163991
Mongolia	2019	3232430
Mongolia	2020	3294335
Mongolia	2021	3347782

We can alternatively use the fn argument for supplying the scales-based function scales::col_numeric(). That function call will itself return a function (which is what the fn argument actually requires) that takes a vector of numeric values and returns color values. Here is the more complex version of the code that returns the same table as in the previous example.

countrypops |>
  dplyr::filter(country_name == "Mongolia") |>
  dplyr::select(-contains("code")) |>
  tail(10) |>
  gt() |>
  data_color(
    columns = population,
    fn = scales::col_numeric(
      palette = "viridis",
      domain = c(2.5E6, 3.4E6)
    )
  )

country_name	year	population
Mongolia	2012	2792349
Mongolia	2013	2845153
Mongolia	2014	2902823
Mongolia	2015	2964749
Mongolia	2016	3029555
Mongolia	2017	3096030
Mongolia	2018	3163991
Mongolia	2019	3232430
Mongolia	2020	3294335
Mongolia	2021	3347782

Using your own function in fn can be very useful if you want to make use of specialized arguments in the scales col_*() functions. You could even supply your own specialized function for performing complex colorizing treatments!
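
As one illustration of such specialized arguments, a binned palette function could be built with scales::col_bin() and explicit break points. The break values and the name binned_fn are made up for this sketch; the three test values are taken from the Mongolia table above.

```r
# col_bin() returns a function, just like col_numeric() does
binned_fn <- scales::col_bin(
  palette = "viridis",
  domain = c(2.5E6, 3.4E6),
  bins = c(2.5E6, 2.8E6, 3.1E6, 3.4E6)  # hypothetical break points
)

is.function(binned_fn)

# One population value from each of the three bins
binned_cols <- binned_fn(c(2792349, 3029555, 3347782))
binned_cols
```

Passing fn = binned_fn in place of the scales::col_numeric() call in the previous pipeline should then color the population column in three discrete steps rather than a continuous gradient.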

The data_color() function has a way to apply colorization indirectly to other columns. That is, you can apply colors to a column different from the one used to generate those specific colors. The trick is to use the target_columns argument. Let’s do this with a more complete countrypops-based table example.

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR")) |>
  dplyr::filter(year %% 10 == 0) |>
  dplyr::select(-contains("code")) |>
  dplyr::mutate(color = "") |>
  gt(groupname_col = "country_name") |>
  fmt_integer(columns = population) |>
  data_color(
    columns = population,
    target_columns = color,
    method = "numeric",
    palette = "viridis",
    domain = c(4E7, 7E7)
  ) |>
  cols_label(
    year = "",
    population = "Population",
    color = ""
  ) |>
  opt_vertical_padding(scale = 0.65)

	Population
France
1960	46,649,927
1970	51,724,116
1980	55,052,582
1990	58,044,701
2000	60,921,384
2010	65,030,575
2020	67,571,107
United Kingdom
1960	52,400,000
1970	55,663,250
1980	56,314,216
1990	57,247,586
2000	58,892,514
2010	62,766,365
2020	67,081,000

When specifying a single column in columns we can use as many target_columns values as we want. Let’s make another countrypops-based table where we map the generated colors from the year column to all columns in the table. This time, the palette used is "inferno" (also from the viridis package).

countrypops |>
  dplyr::filter(country_code_3 %in% c("FRA", "GBR", "ITA")) |>
  dplyr::select(-contains("code")) |>
  dplyr::filter(year %% 5 == 0) |>
  tidyr::pivot_wider(
    names_from = "country_name",
    values_from = "population"
  ) |>
  gt() |>
  fmt_integer(columns = c(everything(), -year)) |>
  cols_width(
    year ~ px(80),
    everything() ~ px(160)
  ) |>
  opt_all_caps() |>
  opt_vertical_padding(scale = 0.75) |>
  opt_horizontal_padding(scale = 3) |>
  data_color(
    columns = year,
    target_columns = everything(),
    palette = "inferno"
  ) |>
  tab_options(
    table_body.hlines.style = "none",
    column_labels.border.top.color = "black",
    column_labels.border.bottom.color = "black",
    table_body.border.bottom.color = "black"
  )

year	France	United Kingdom	Italy
1960	46,649,927	52,400,000	50,199,700
1965	49,282,756	54,348,050	52,112,350
1970	51,724,116	55,663,250	53,821,850
1975	53,715,733	56,225,800	55,441,001
1980	55,052,582	56,314,216	56,433,883
1985	56,569,195	56,550,268	56,593,071
1990	58,044,701	57,247,586	56,719,240
1995	59,543,659	58,019,030	56,844,303
2000	60,921,384	58,892,514	56,942,108
2005	63,188,395	60,401,206	57,969,484
2010	65,030,575	62,766,365	59,277,417
2015	66,548,272	65,116,219	60,730,582
2020	67,571,107	67,081,000	59,438,851

Now it’s time to use pizzaplace to create a gt table. The color palette used is "ggsci::red_material" (it comes from the ggsci R package but is also obtainable through the paletteer package). Colorization will be applied to the sold and income columns. We don’t have to specify those in columns because they are the only columns in the table. Also, the domain is not set here, so the bounds of the available data in each column are used.

pizzaplace |>
  dplyr::group_by(type, size) |>
  dplyr::summarize(
    sold = dplyr::n(),
    income = sum(price),
    .groups = "drop_last"
  ) |>
  dplyr::group_by(type) |>
  dplyr::mutate(f_sold = sold / sum(sold)) |>
  dplyr::mutate(size = factor(
    size, levels = c("S", "M", "L", "XL", "XXL"))
  ) |>
  dplyr::arrange(type, size) |>
  gt(
    rowname_col = "size",
    groupname_col = "type"
  ) |>
  fmt_percent(
    columns = f_sold,
    decimals = 1
  ) |>
  cols_merge(
    columns = c(size, f_sold),
    pattern = "{1} ({2})"
  ) |>
  cols_align(align = "left", columns = stub()) |>
  data_color(
    method = "numeric",
    palette = "ggsci::red_material"
  )

	sold	income
chicken
S (20.1%)	2224	28356.00
M (35.2%)	3894	65224.50
L (44.6%)	4932	102339.00
classic
S (41.2%)	6139	69870.25
M (27.6%)	4112	60581.75
L (27.3%)	4057	74518.50
XL (3.7%)	552	14076.00
XXL (0.2%)	28	1006.60
supreme
S (28.2%)	3377	47463.50
M (33.8%)	4046	66475.00
L (38.1%)	4564	94258.50
veggie
S (22.9%)	2663	32386.75
M (30.8%)	3583	57101.00
L (46.4%)	5403	104202.70
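
To check which colors a "package::palette" name refers to, the palette can be looked up with paletteer directly. This is a small sketch that assumes the paletteer package is installed; red_material is just a local variable name.

```r
# Look up the discrete palette by its "package::palette" name
red_material <- paletteer::paletteer_d("ggsci::red_material")

# Hex codes of the palette colors, in palette order
as.character(red_material)
```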

Colorization can occur in a row-wise manner. The key to making that happen is to use direction = "row". Let’s use the sza dataset to make a gt table. Then, color will be applied to values across each ‘month’ of data in that table. This is useful when not setting a domain, as the bounds of each row will be captured, coloring each cell relative to the range of its own row. The palette is "PuOr" from the RColorBrewer package (only the name is required here).

sza |>
  dplyr::filter(latitude == 20 & tst <= "1200") |>
  dplyr::select(-latitude) |>
  dplyr::filter(!is.na(sza)) |>
  tidyr::spread(key = "tst", value = sza) |>
  gt(rowname_col = "month") |>
  sub_missing(missing_text = "") |>
  data_color(
    direction = "row",
    palette = "PuOr",
    na_color = "white"
  )

	0530	0600	0630	0700	0730	0800	0830	0900	0930	1000	1030	1100	1130	1200
jan				84.9	78.7	72.7	66.1	61.5	56.5	52.1	48.3	45.5	43.6	43.0
feb			88.9	82.5	75.8	69.6	63.3	57.7	52.2	47.4	43.1	40.0	37.8	37.2
mar			85.7	78.8	72.0	65.2	58.6	52.3	46.2	40.5	35.5	31.4	28.6	27.7
apr		88.5	81.5	74.4	67.4	60.3	53.4	46.5	39.7	33.2	26.9	21.3	17.2	15.5
may		85.0	78.2	71.2	64.3	57.2	50.2	43.2	36.1	29.1	26.1	15.2	8.8	5.0
jun	89.2	82.7	76.0	69.3	62.5	55.7	48.8	41.9	35.0	28.1	21.1	14.2	7.3	2.0
jul	88.8	82.3	75.7	69.1	62.3	55.5	48.7	41.8	35.0	28.1	21.2	14.3	7.7	3.1
aug		83.8	77.1	70.2	63.3	56.4	49.4	42.4	35.4	28.3	21.3	14.3	7.3	1.9
sep		87.2	80.2	73.2	66.1	59.1	52.1	45.1	38.1	31.3	24.7	18.6	13.7	11.6
oct			84.1	77.1	70.2	63.3	56.5	49.9	43.5	37.5	32.0	27.4	24.3	23.1
nov			87.8	81.3	74.5	68.3	61.8	56.0	50.2	45.3	40.7	37.4	35.1	34.4
dec				84.3	78.0	71.8	66.1	60.5	55.6	50.9	47.2	44.2	42.4	41.8

Notice that na_color = "white" was used, and this avoids the appearance of gray cells for the missing values (we also removed the "NA" text with sub_missing(), opting for empty strings).
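
The effect of taking the bounds row by row can be sketched with scales::col_numeric() applied to each row separately. This is an approximation of the idea rather than gt's internal code; the six values are taken from the jan and jun rows above (columns 0700, 0830, and 1200), and sza_rows and row_colors are illustrative names.

```r
# Two rows of the table, three time points each
sza_rows <- rbind(
  jan = c(84.9, 66.1, 43.0),
  jun = c(69.3, 48.8, 2.0)
)

# Build a separate color function per row, using that row's own range
row_colors <- t(apply(sza_rows, 1, function(row_vals) {
  scales::col_numeric(
    palette = "PuOr",
    domain = range(row_vals)
  )(row_vals)
}))

row_colors
```

Even though 84.9 and 69.3 are different values, each is the maximum of its row, so both land on the same end of the palette.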
