Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - ggplot2 identical scales (non-continuous) on both sides

Goal

Use ggplot2 (latest version) to produce a graph that duplicates the x- or y-axis on both sides of the plot, where the scale is not continuous.

Minimal Reprex

# Example data
dat1 <- tibble::tibble(x = c(rep("a", 50), rep("b", 50)), 
                       y = runif(100))

# Standard boxplot
p1 <- ggplot2::ggplot(dat1) +
    ggplot2::geom_boxplot(ggplot2::aes(x = x, y = y))

When the scale is continuous, this is easy to do with an identity transformation (clearly one-to-one).

# This works
p1 + ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .))

However, when the scale is not continuous, this doesn't work, as other scale_* functions don't have a sec.axis argument (which makes sense).

# This doesn't work
p1 + ggplot2::scale_x_discrete(sec.axis = ggplot2::sec_axis(~ .))

Error in discrete_scale(c("x", "xmin", "xmax", "xend"), "position_d",  : 
  unused argument (sec.axis = <environment>)

I also tried using the position argument in the scale_* functions, but this doesn't work either.

# This doesn't work either
p1 + ggplot2::scale_x_discrete(position = c("top", "bottom"))

Error in match.arg(position, c("left", "right", "top", "bottom")) : 
  'arg' must be of length 1

Edit

For clarity, I was hoping to duplicate the x- or y-axis where the scale is anything, not just discrete (a factor variable). I just used a discrete variable in the minimal reprex for simplicity.

For example, this issue arises in a context where the non-continuous scale is datetime or time format.


1 Reply


Duplicating (and modifying) discrete axis in ggplot2

You can adapt the answer linked above by putting the same labels on both sides. As for "you can convert anything non-continuous to a factor, but that's even more inelegant!" from your comment above: that's what a non-continuous axis is, so I'm not sure why that would be a problem for you.

TL;DR: Use as.numeric(...) for your categorical aesthetic and manually supply the labels from the original data, using scale_*_continuous(..., sec.axis = sec_axis(~ ., ...)).
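As a rough illustration of the TL;DR, here is a minimal sketch applied to the boxplot from the question. It assumes a reasonably recent ggplot2; the names `lv` and `p`, the seed, and the use of a plain data.frame are my own choices for the example, not part of the original answer.

```r
library(ggplot2)

set.seed(1)  # illustrative seed, not from the original post
dat1 <- data.frame(x = factor(rep(c("a", "b"), each = 50)),
                   y = runif(100))

# Keep the factor levels so they can be reused as axis labels
lv <- levels(dat1$x)

p <- ggplot(dat1, aes(x = as.numeric(x), y = y, group = x)) +
  geom_boxplot() +
  scale_x_continuous(
    name     = "x",
    breaks   = seq_along(lv),  # one break per factor level: 1, 2, ...
    labels   = lv,             # show the original category names
    sec.axis = dup_axis()      # mirror the axis on the opposite side
  )

p
```

The `group = x` aesthetic is there because the x position is now numeric, so the boxes are no longer grouped by category automatically.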


Edited to update:

I happened to look back through this thread and saw that it was asked for dates and times. That means the question is worded incorrectly: dates and times are continuous, not discrete. Discrete scales are factors. Dates and times are ordered continuous scales. Under the hood, they're just the number of days (for dates) or seconds (for datetimes) since "1970-01-01".
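A quick base R check of that claim, using the same start date and time as the examples in this answer. The variable names are only for illustration.

```r
d  <- as.Date("2017-08-01")
tm <- as.POSIXct("2017-08-01 23:58:00", tz = "UTC")

days_since_epoch <- as.numeric(d)   # 17379 days since 1970-01-01
secs_since_epoch <- as.numeric(tm)  # 1501631880 seconds since 1970-01-01 00:00:00 UTC

# Round trip: the numbers convert back to the same date and datetime
d_back  <- as.Date(days_since_epoch, origin = "1970-01-01")
tm_back <- as.POSIXct(secs_since_epoch, origin = "1970-01-01", tz = "UTC")
```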

At the time of writing, scale_x_date will indeed throw an error if you try to pass a sec.axis argument, even if it's dup_axis(). To work around this, you convert your dates/times to numbers, and then fool the scale using labels. While this requires a bit of fiddling, it's not too complicated.

library(ggplot2)
library(lubridate)
library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(tm = ymd("2017-08-01") + 0:10,
             y = cumsum(rnorm(length(tm)))) %>% 
  mutate(tm_num = as.numeric(tm)) 
df

# A tibble: 11 x 3
           tm          y tm_num
       <date>      <dbl>  <dbl>
 1 2017-08-01 -2.0948146  17379
 2 2017-08-02 -2.6020691  17380
 3 2017-08-03 -3.8940781  17381
 4 2017-08-04 -2.7807154  17382
 5 2017-08-05 -2.9451685  17383
 6 2017-08-06 -3.3355426  17384
 7 2017-08-07 -1.9664428  17385
 8 2017-08-08 -0.8501699  17386
 9 2017-08-09 -1.7481911  17387
10 2017-08-10 -1.3203246  17388
11 2017-08-11 -2.5487692  17389

I just made a simple vector of 11 days (0 to 10) added to "2017-08-01". If you run as.numeric on that, you get the number of days since the beginning of the Unix epoch (see ?lubridate::as_date).

df %>% 
  ggplot(aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = function(limits) {
                       seq(floor(limits[1]), ceiling(limits[2]), 
                           by = as.numeric(as_date(days(2))))
                       },
                     labels = function(breaks) {as_date(breaks)})

When you plot tm_num against y, it's treated just like normal numbers, and you can use scale_x_continuous(sec.axis = dup_axis(), ...). Then you have to figure out how many breaks you want and how to label them.

The breaks = is a function that takes the limits of the data and calculates nice-looking breaks. First you round the limits to make sure you get integers (dates don't work well with non-integers). Then you generate a sequence with your desired step (here days(2)). You could use weeks(1), months(3), or another period; see ?lubridate::days. Under the hood, days(x) generates a number of seconds (86400 per day, 604800 per week, etc.), as_date converts that into a number of days since the Unix epoch, and as.numeric converts it back to an integer.

The labels = is a function that takes the sequence of integers we just generated and converts those back to displayable dates.
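To see what these two functions do outside of a plot, here is a small base R sketch. The function names `date_breaks_2d` and `date_labels` are hypothetical, the step is hard-coded to 2 days instead of going through lubridate, and the limits are only roughly what a continuous scale would pass for the 11-day example (the data range padded by half a day on each side).

```r
# Breaks: round the limits outward to whole days, then step by 2 days
date_breaks_2d <- function(limits) {
  seq(floor(limits[1]), ceiling(limits[2]), by = 2)
}

# Labels: turn day counts back into printable dates
date_labels <- function(breaks) {
  format(as.Date(breaks, origin = "1970-01-01"))
}

limits <- c(17378.5, 17389.5)  # assumed limits, in days since the epoch
brks <- date_breaks_2d(limits)
lbls <- date_labels(brks)

print(data.frame(brks, lbls))
```

Every break is a whole number of days, so each label corresponds to exactly one calendar date.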

This also works with times instead of dates. While dates are integer days, times are integer seconds (either since the Unix epoch, for datetimes, or since midnight, for times).

Let's say you had some observations that were on the scale of minutes, not days.

The code would be similar, with a few tweaks:

df <- tibble(tm = ymd_hms("2017-08-01 23:58:00") + 60*0:10,
             y = cumsum(rnorm(length(tm)))) %>% 
  mutate(tm_num = as.numeric(tm)) 
df

# A tibble: 11 x 3
                    tm        y     tm_num
                <dttm>    <dbl>      <dbl>
 1 2017-08-01 23:58:00 1.375275 1501631880
 2 2017-08-01 23:59:00 2.373565 1501631940
 3 2017-08-02 00:00:00 3.650167 1501632000
 4 2017-08-02 00:01:00 2.578420 1501632060
 5 2017-08-02 00:02:00 5.155688 1501632120
 6 2017-08-02 00:03:00 4.022228 1501632180
 7 2017-08-02 00:04:00 4.776145 1501632240
 8 2017-08-02 00:05:00 4.917420 1501632300
 9 2017-08-02 00:06:00 4.513710 1501632360
10 2017-08-02 00:07:00 4.134294 1501632420
11 2017-08-02 00:08:00 3.142898 1501632480

df %>% 
  ggplot(aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = function(limits) {
                       seq(floor(limits[1] / 60) * 60, ceiling(limits[2] / 60) * 60, 
                           by = as.numeric(as_datetime(minutes(2))))
                       },
                     labels = function(breaks) {
                       stamp("Jan 1,\n0:00:00", orders = "md hms")(as_datetime(breaks))
                       })

Here I updated the dummy data to span 11 minutes from just before midnight to just after midnight. In breaks = I modified it to make sure I got an integer number of minutes to create breaks on, changed as_date to as_datetime, and used minutes(2) to make a break every two minutes. In labels = I added a functional stamp(...)(...), which creates a nice format to display.
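The rounding to whole minutes can be checked on its own with base R. This is only a sketch: `minute_breaks` and `minute_labels` are hypothetical helper names, the limits are an assumed padded range around the 11-minute example, and the label format is a plain strftime pattern rather than the stamp() format used above.

```r
# Breaks: round the limits outward to whole minutes, then step by every_min minutes
minute_breaks <- function(limits, every_min = 2) {
  seq(floor(limits[1] / 60) * 60,
      ceiling(limits[2] / 60) * 60,
      by = every_min * 60)
}

# Labels: turn seconds since the epoch back into printable datetimes (UTC)
minute_labels <- function(breaks) {
  format(as.POSIXct(breaks, origin = "1970-01-01", tz = "UTC"),
         "%m-%d %H:%M:%S")
}

limits <- c(1501631850, 1501632510)  # assumed limits, in seconds since the epoch
brks <- minute_breaks(limits)
lbls <- minute_labels(brks)

print(data.frame(brks, lbls))
```

Dividing by 60 before rounding and multiplying back afterwards is what guarantees that every break falls exactly on a minute boundary.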

Finally, just times.

df <- tibble(tm = milliseconds(1234567 + 0:10),
             y = cumsum(rnorm(length(tm)))) %>% 
  mutate(tm_num = as.numeric(tm)) 

df
# A tibble: 11 x 3
             tm          y   tm_num
   <S4: Period>      <dbl>    <dbl>
 1    1234.567S  0.2136745 1234.567
 2    1234.568S -0.6376908 1234.568
 3    1234.569S -1.1080997 1234.569
 4     1234.57S -0.4219645 1234.570
 5    1234.571S -2.7579118 1234.571
 6    1234.572S -1.6626674 1234.572
 7    1234.573S -3.2298175 1234.573
 8    1234.574S -3.2078864 1234.574
 9    1234.575S -3.3982454 1234.575
10    1234.576S -2.1051759 1234.576
11    1234.577S -1.9163266 1234.577

df %>% 
  ggplot(aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = function(limits) {
                       seq(limits[1], limits[2], 
                           by = as.numeric(milliseconds(3)))
                       },
                     labels = function(breaks) {format((as_datetime(breaks)),
                                                       format = "%H:%M:%OS3")})

Here we've got an observation every millisecond for 11 milliseconds, starting at t = 20min34.567sec. So in breaks = we dispense with any rounding, since we don't want integers now. Then we use breaks every milliseconds(3). Then labels = needs to be formatted to accept decimal seconds; the "%OS3" means 3 digits of decimals for the seconds place (it can accept up to 6, see ?strptime).
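A small base R sketch of the "%OS3" formatting on its own. The break values here are assumptions chosen for the illustration: they are exactly representable as binary fractions, so floating-point error cannot change the last displayed digit.

```r
# Seconds since midnight, with fractional parts that are exact in binary
brks <- c(1234.500, 1234.625, 1234.750)

# Treat the numbers as seconds since the epoch (UTC) and print
# hours, minutes, and seconds with three decimal places
lbls <- format(as.POSIXct(brks, origin = "1970-01-01", tz = "UTC"),
               "%H:%M:%OS3")

print(lbls)
```

With arbitrary millisecond values the last digit may occasionally differ by one from what you expect, so it is worth checking the labels against the data.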

Is all of this worth it? Probably not, unless you really really want a duplicated time axis. I'll probably post this as an issue on the ggplot2 GitHub, because dup_axis should "just work" with datetimes.

