-
Notifications
You must be signed in to change notification settings - Fork 175
/
ch13.Rmd
1457 lines (976 loc) · 60.2 KB
/
ch13.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
output:
bookdown::html_document2:
fig_caption: yes
editor_options:
chunk_output_type: console
---
```{r echo = FALSE, cache = FALSE}
# This block needs cache=FALSE to set fig.width and fig.height, and have those
# persist across cached builds.
source("utils.R", local = TRUE)
knitr::opts_chunk$set(fig.width=3.5, fig.height=3.5)
```
Miscellaneous Graphs {#CHAPTER-MISCGRAPH}
====================
There are many, many ways of visualizing data, and sometimes things don't fit into nice, tidy categories. This chapter shows how to make some of these other visualizations.
Making a Correlation Matrix {#RECIPE-MISCGRAPH-CORRMATRIX}
---------------------------
### Problem
You want to make a graphical correlation matrix.
### Solution
We'll look at the `mtcars` data set:
```{r}
mtcars
```
First, generate the numerical correlation matrix using `cor`. This will generate correlation coefficients for each pair of columns:
```{r}
mcor <- cor(mtcars)
# Print mcor and round to 2 digits
round(mcor, digits = 2)
```
If there are any columns that you don't want used for correlations (for example, a column of names), you should exclude them. If there are any `NA` cells in the original data, the resulting correlation matrix will have `NA` values. To deal with this, you will probably want to use the argument `use="complete.obs"` or `use="pairwise.complete.obs"`.
To plot the correlation matrix (Figure \@ref(fig:FIG-MISCGRAPH-CORRMATRIX-BASIC)), we'll use the corrplot package:
```{r FIG-MISCGRAPH-CORRMATRIX-BASIC, fig.cap="A correlation matrix", fig.width=7, fig.height=7, message=FALSE}
# If needed, first install with install.packages("corrplot")
library(corrplot)
corrplot(mcor)
```
### Discussion
The `corrplot()` function has many, many options. Here is an example of how to make a correlation matrix with colored squares and black labels, rotated 45 degrees along the top (Figure \@ref(fig:FIG-MISCGRAPH-CORRMATRIX-SHADE)):
```{r FIG-MISCGRAPH-CORRMATRIX-SHADE, fig.cap="Correlation matrix with colored squares and black, rotated labels", fig.width=7, fig.height=7}
corrplot(mcor, method = "shade", shade.col = NA, tl.col = "black", tl.srt = 45)
```
It may also be helpful to display labels representing the correlation coefficient on each square in the matrix. In this example we'll make a lighter palette so that the text is readable, and we'll remove the color legend, since it's redundant. We'll also order the items so that correlated items are closer together, using the `order="AOE"` (angular order of eigenvectors) option. The result is shown in Figure \@ref(fig:FIG-MISCGRAPH-CORRMATRIX-SHADE-NUM):
```{r FIG-MISCGRAPH-CORRMATRIX-SHADE-NUM, fig.cap="Correlation matrix with correlation coefficients and no legend", fig.width=7, fig.height=7}
# Generate a lighter palette
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(mcor, method = "shade", shade.col = NA, tl.col = "black", tl.srt = 45,
col = col(200), addCoef.col = "black", cl.pos = "n", order = "AOE")
```
Like many other standalone plotting functions, `corrplot()` has its own menagerie of options, which can't all be illustrated here. Table \@ref(tab:TABLE-OPTIONS-FOR-CORRPLOT) lists some useful options.
Table: (\#tab:TABLE-OPTIONS-FOR-CORRPLOT) Options for `corrplot()`
+-----------------------------------+-----------------------------------+
| Option | Description |
+===================================+===================================+
| `type={"lower" | "upper"}` | Only use the lower or upper |
| | triangle |
+-----------------------------------+-----------------------------------+
| `diag=FALSE` | Don't show values on the |
| | diagonal |
+-----------------------------------+-----------------------------------+
| `addshade="all"` | Add lines indicating the |
| | direction of the correlation |
+-----------------------------------+-----------------------------------+
| `shade.col=NA` | Hide correlation direction lines |
+-----------------------------------+-----------------------------------+
| `method="shade"` | Use colored squares |
+-----------------------------------+-----------------------------------+
| `method="ellipse"` | Use ellipses |
+-----------------------------------+-----------------------------------+
| `addCoef.col="*color*"` | Add correlation coefficients, in |
| | *color* |
+-----------------------------------+-----------------------------------+
| `tl.srt="*number*"` | Specify the rotation angle for |
| | top labels |
+-----------------------------------+-----------------------------------+
| `tl.col="*color*"` | Specify the label color |
+-----------------------------------+-----------------------------------+
| `order={"AOE" | "FPC" | "hclust"}`| Sort labels using angular order |
| | of eigenvectors, first principal |
| | component, or hierarchical |
| | clustering |
+-----------------------------------+-----------------------------------+
### See Also
To create a scatter plot matrix, see Recipe \@ref(RECIPE-SCATTER-SPLOM).
For more on subsetting data, see Recipe \@ref(RECIPE-DATAPREP-SUBSET).
Plotting a Function {#RECIPE-MISCGRAPH-FUNCTION}
-------------------
### Problem
You want to plot a function.
### Solution
Use `stat_function()`. It's also necessary to give ggplot a dummy data frame so that it will get the proper *x* range. In this example we'll use `dnorm()`, which gives the density of the normal distribution (Figure \@ref(fig:FIG-MISCGRAPH-FUNCTION), left):
```{r FIG-MISCGRAPH-FUNCTION-1, eval=FALSE}
# The data frame is only used for setting the range
p <- ggplot(data.frame(x = c(-3, 3)), aes(x = x))
p + stat_function(fun = dnorm)
```
### Discussion
Some functions take additional arguments. For example, `dt()`, the function for the density of the *t*-distribution, takes a parameter for degrees of freedom (Figure \@ref(fig:FIG-MISCGRAPH-FUNCTION), right). These additional arguments can be passed to the function by putting them in a list and giving the list to args:
```{r FIG-MISCGRAPH-FUNCTION-2, eval=FALSE}
p + stat_function(fun = dt, args = list(df = 2))
```
(ref:cap-FIG-MISCGRAPH-FUNCTION) The normal distribution (left); The *t*-distribution with `df=2` (right)
```{r FIG-MISCGRAPH-FUNCTION, ref.label=c("FIG-MISCGRAPH-FUNCTION-1", "FIG-MISCGRAPH-FUNCTION-2"), echo=FALSE, fig.show="hold", fig.cap="(ref:cap-FIG-MISCGRAPH-FUNCTION)"}
```
It's also possible to define your own functions. It should take an *x* value for its first argument, and it should return a *y* value. In this example, we'll define a sigmoid function (Figure \@ref(fig:FIG-MISCGRAPH-FUNCTION-CUSTOM)):
```{r FIG-MISCGRAPH-FUNCTION-CUSTOM, fig.cap="A user-defined function"}
myfun <- function(xvar) {
1 / (1 + exp(-xvar + 10))
}
ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
stat_function(fun = myfun)
```
By default, the function is calculated at 101 points along the *x* range. If you have a rapidly fluctuating function, you may be able to see the individual segments. To smooth out the curve, pass a larger value of n to `stat_function()`, as in `stat_function(fun=myfun, n=200)`.
### See Also
For plotting predicted values from model objects (such as `lm` and `glm`), see Recipe \@ref(RECIPE-SCATTER-FITLINES-MODEL).
Shading a Subregion Under a Function Curve {#RECIPE-MISCGRAPH-FUNCTION-SHADE}
------------------------------------------
### Problem
You want to shade part of the area under a function curve.
### Solution
Define a new wrapper function around your curve function, and replace out-of-range values with `NA`, as shown in Figure \@ref(fig:FIG-MISCGRAPH-FUNCTION-SHADE):
```{r FIG-MISCGRAPH-FUNCTION-SHADE, fig.cap="Function curve with a shaded region"}
# Return dnorm(x) for 0 < x < 2, and NA for all other x
dnorm_limit <- function(x) {
y <- dnorm(x)
y[x < 0 | x > 2] <- NA
return(y)
}
# ggplot() with dummy data
p <- ggplot(data.frame(x = c(-3, 3)), aes(x = x))
p +
stat_function(fun = dnorm_limit, geom = "area", fill = "blue", alpha = 0.2) +
stat_function(fun = dnorm)
```
Remember that what gets passed to this function is a vector, not individual values. If this function operated on single elements at a time, it might make sense to use an `if`--`else` statement to decide what to return, conditional on the value of `x`. But that won't work here, since `x` is a vector with many values.
### Discussion
R has first-class functions, and we can write a function that returns a *closure*-that is, we can program a function to program another function.
This function will allow you to pass in a function, a minimum value, and a maximum value. Values outside the range will again be returned with `NA`:
```{r}
limitRange <- function(fun, min, max) {
function(x) {
y <- fun(x)
y[x < min | x > max] <- NA
return(y)
}
}
```
Now we can call this function to create another function -- one that is effectively the same as the `dnorm_limit()` function used earlier:
```{r}
# This returns a function
dlimit <- limitRange(dnorm, 0, 2)
# Now we'll try out the new function -- it only returns values for inputs
# between 0 and 2
dlimit(-2:4)
```
We can use `limitRange()` to create a function that is passed to `stat_function()`:
```{r eval=FALSE}
p +
stat_function(fun = dnorm) +
stat_function(fun = limitRange(dnorm, 0, 2), geom = "area", fill = "blue",
alpha = 0.2)
```
The `limitRange()` function can be used with any function, not just `dnorm()`, to create a range-limited version of that function. The result of all this is that instead of having to write functions with different hard-coded values for each situation that arises, we can write one function and simply pass it different arguments depending on the situation.
If you look very, very closely at the graph in Figure \@ref(fig:FIG-MISCGRAPH-FUNCTION-SHADE), you may see that the shaded region does not align exactly with the range we specified. This is because ggplot2 does a numeric approximation by calculating values at fixed intervals, and these intervals may not fall exactly within the specified range. As in Recipe \@ref(RECIPE-MISCGRAPH-FUNCTION), we can improve the approximation by increasing the number of interpolated values with `stat_function(n = 200)`.
Creating a Network Graph {#RECIPE-MISCGRAPH-GRAPH}
------------------------
### Problem
You want to create a network graph.
### Solution
Use the igraph package. To create a graph, pass a vector containing pairs of items to `graph()`, then plot the resulting object (Figure \@ref(fig:FIG-MISCGRAPH-GRAPH-BASIC)):
```{r FIG-MISCGRAPH-GRAPH-BASIC, fig.show="hold", fig.cap="A directed graph (left); An undirected graph, with no vertex labels (right)", fig.width=4.5, fig.height=4.5}
# May need to install first, with install.packages("igraph")
library(igraph)
# Specify edges for a directed graph
gd <- graph(c(1,2, 2,3, 2,4, 1,4, 5,5, 3,6))
plot(gd)
# For an undirected graph
gu <- graph(c(1,2, 2,3, 2,4, 1,4, 5,5, 3,6), directed = FALSE)
# No labels
plot(gu, vertex.label = NA)
```
This is the structure of each of the graph objects:
```{r}
gd
gu
```
### Discussion
In a network graph, the position of the nodes is unspecified by the data, and they're placed randomly. To make the output repeatable, you can set the random seed before making the plot. You can try different random numbers until you get a result that you like:
```{r eval=FALSE}
set.seed(229)
plot(gu)
```
It's also possible to create a graph from a data frame. The first two columns of the data frame are used, and each row specifies a connection between two nodes. In the next example (Figure \@ref(fig:FIG-MISCGRAPH-GRAPH-DIRECTED)), we'll use the `madmen2` data set, which has this structure. We'll also use the Fruchterman-Reingold layout algorithm. The idea is that all the nodes have a magnetic repulsion from one another, but the edges between nodes act as springs, pulling the nodes together:
```{r FIG-MISCGRAPH-GRAPH-DIRECTED, fig.cap="A directed graph from a data frame, with the Fruchterman-Reingold algorithm", fig.width=5, fig.height=5}
library(gcookbook) # For the data set
madmen2
# Create a graph object from the data set
g <- graph.data.frame(madmen2, directed=TRUE)
# Remove unnecessary margins
par(mar = c(0, 0, 0, 0))
plot(g, layout = layout.fruchterman.reingold, vertex.size = 8,
edge.arrow.size = 0.5, vertex.label = NA)
```
It's also possible to make a directed graph from a data frame. The `madmen` data set has only one row for each pairing, since direction doesn't matter for an undirected graph. This time we'll use a circle layout (Figure \@ref(fig:FIG-MISCGRAPH-GRAPH-CIRCLE)):
```{r FIG-MISCGRAPH-GRAPH-CIRCLE, fig.cap="A circular undirected graph from a data frame"}
g <- graph.data.frame(madmen, directed = FALSE)
par(mar = c(0, 0, 0, 0)) # Remove unnecessary margins
plot(g, layout = layout.circle, vertex.size = 8, vertex.label = NA)
```
### See Also
For more information about the available output options, see `?plot.igraph`. Also see `?igraph::layout` for layout options.
An alternative to igraph is Rgraphviz, which a frontend for Graphviz, an open-source library for visualizing graphs. It works better with labels and makes it easier to create graphs with a controlled layout, but it can be a bit challenging to install. Rgraphviz is available through the Bioconductor repository system.
Using Text Labels in a Network Graph {#RECIPE-MISCGRAPH-GRAPH-LABEL}
------------------------------------
### Problem
You want to use text labels in a network graph.
### Solution
The vertices/nodes may have names, but these names are not used as labels by default. To set the labels, pass in a vector of names to `vertex.label` (Figure \@ref(fig:FIG-MISCGRAPH-GRAPH-LABEL)):
```{r FIG-MISCGRAPH-GRAPH-LABEL, fig.cap="A network graph with labels", fig.width=6, fig.height=6}
library(igraph)
library(gcookbook) # For the data set
# Copy madmen and drop every other row
m <- madmen[1:nrow(madmen) %% 2 == 1, ]
g <- graph.data.frame(m, directed=FALSE) # Print out the names of each vertex
V(g)$name
plot(g, layout=layout.fruchterman.reingold,
vertex.size = 4, # Smaller nodes
vertex.label = V(g)$name, # Set the labels
vertex.label.cex = 0.8, # Slightly smaller font
vertex.label.dist = 0.4, # Offset the labels
vertex.label.color = "black")
```
### Discussion
Another way to achieve the same effect is to modify the plot object, instead of passing in the values as arguments to `plot()`. To do this, use `V()$xxx<-` instead of passing a value to a `vertex.xxx` argument. For example, this will result in the same output as the previous code:
```{r, eval=FALSE}
# This is equivalent to the preceding code
V(g)$size <- 4
V(g)$label <- V(g)$name
V(g)$label.cex <- 0.8
V(g)$label.dist <- 0.4
V(g)$label.color <- "black"
# Set a property of the entire graph
g$layout <- layout.fruchterman.reingold
plot(g)
```
The properties of the edges can also be set, either with the `E()` function or by passing values to `edge.xxx` arguments (Figure \@ref(fig:FIG-MISCGRAPH-GRAPH-LABEL-EDGE)):
```{r echo=FALSE}
# This block duplicates the settings above; they get consumed when plot() is
# called, but we want them to apply to the next block.
V(g)$size <- 4
V(g)$label <- V(g)$name
V(g)$label.cex <- 0.8
V(g)$label.dist <- 0.4
V(g)$label.color <- "black"
# Set a property of the entire graph
g$layout <- layout.fruchterman.reingold
```
```{r FIG-MISCGRAPH-GRAPH-LABEL-EDGE, fig.cap="A network graph with labeled and colored edges", fig.width=6, fig.height=6}
# View the edges
E(g)
# Set some of the labels to "M"
E(g)[c(2,11,19)]$label <- "M"
# Set color of all to grey, and then color a few red
E(g)$color <- "grey70"
E(g)[c(2,11,19)]$color <- "red"
plot(g)
```
### See Also
See `?igraph.plotting` for more information about graphical parameters in igraph.
Creating a Heat Map {#RECIPE-MISCGRAPH-HEATMAP}
-------------------
### Problem
You want to make a heat map.
### Solution
Use `geom_tile()` or `geom_raster()` and map a continuous variable to `fill`. We'll use the `presidents` data set, which is a time series object rather than a data frame:
```{r}
presidents
str(presidents)
```
We'll first convert it to a format that is usable by ggplot: a data frame with columns that are numeric:
```{r}
pres_rating <- data.frame(
rating = as.numeric(presidents),
year = as.numeric(floor(time(presidents))),
quarter = as.numeric(cycle(presidents))
)
pres_rating
```
Now we can make the plot using `geom_tile()` or `geom_raster()` (Figure \@ref(fig:FIG-MISCGRAPH-HEATMAP)). Simply map one variable to `x`, one to `y`, and one to `fill`:
```{r eval=FALSE}
# Base plot
p <- ggplot(pres_rating, aes(x = year, y = quarter, fill = rating))
# Using geom_tile()
p + geom_tile()
# Using geom_raster() - looks the same, but a little more efficient
p + geom_raster()
```
(ref:cap-FIG-MISCGRAPH-HEATMAP) A heat map-the grey squares represent `NA`s in the data
```{r FIG-MISCGRAPH-HEATMAP, echo=FALSE, fig.cap="(ref:cap-FIG-MISCGRAPH-HEATMAP)", fig.width=13, fig.height=2}
# The above code block is just for display because it would make two plots when
# we just need one.
p <- ggplot(pres_rating, aes(x = year, y = quarter, fill = rating))
p +
geom_tile()
```
> **Note**
>
> The results with `geom_tile()` and `geom_raster()` *should* look the same, but in practice they might appear different. See Recipe \@ref(RECIPE-DISTRIBUTION-DENSITY2D) for more information about this issue.
### Discussion
To better convey useful information, you may want to customize the appearance of the heat map. With this example, we'll reverse the y-axis so that it progresses from top to bottom, and we'll add tick marks every four years along the x-axis, to correspond with each presidential term. For the x and y scales, we remove the padding by using `expand=c(0, 0)`. We'll also change the color scale using `scale_fill_gradient2()`, which lets you specify a midpoint color and the two colors at the low and high ends (Figure \@ref(fig:FIG-MISCGRAPH-HEATMAP-CUSTOM)):
```{r FIG-MISCGRAPH-HEATMAP-CUSTOM, fig.cap="A heat map with customized appearance", fig.width=13, fig.height=2}
p +
geom_tile() +
scale_x_continuous(breaks = seq(1940, 1976, by = 4), expand = c(0, 0)) +
scale_y_reverse(expand = c(0, 0)) +
scale_fill_gradient2(midpoint = 50, mid = "grey70", limits = c(0, 100))
```
### See Also
If you want to use a different color palette, see Recipe \@ref(RECIPE-COLORS-PALETTE-CONTINUOUS).
Creating a Three-Dimensional Scatter Plot {#RECIPE-MISCGRAPH-3D-SCATTER}
-----------------------------------------
### Problem
You want to create a three-dimensional scatter plot.
### Solution
We'll use the rgl package, which provides an interface to the OpenGL graphics library for 3D graphics. To create a 3D scatter plot, as in Figure \@ref(fig:FIG-MISCGRAPH-3D-SCATTER), use `plot3d()` and pass in a data frame where the first three columns represent *x*, *y*, and *z* coordinates, or pass in three vectors representing the *x*, *y*, and *z* coordinates.
```{r eval=FALSE}
# You may need to install first, with install.packages("rgl")
library(rgl)
plot3d(mtcars$wt, mtcars$disp, mtcars$mpg, type = "s", size = 0.75, lit = FALSE)
```
```{r FIG-MISCGRAPH-3D-SCATTER, echo=FALSE, fig.cap="A 3D scatter plot"}
knitr::include_graphics("fig/FIG-MISCGRAPH-3D-SCATTER.png")
```
> **Note**
>
> If you are using a Mac, you will need to install XQuartz before you can use rgl. You can download it from <https://www.xquartz.org/>.
Viewers can rotate the image by clicking and dragging with the mouse, and zoom in and out with the scroll wheel.
> **Note**
>
> By default, `plot3d()` uses square points, which do not appear properly when saving to a PDF. For improved appearance, the example above uses `type="s"` for spherical points, made them smaller with `size=0.75`, and turned off the 3D lighting with `lit=FALSE` (otherwise they look like shiny spheres).
### Discussion
Three-dimensional scatter plots can be difficult to interpret, so it's often better to use a two-dimensional representation of the data. That said, there are things that can help make a 3D scatter plot easier to understand.
In Figure \@ref(fig:FIG-MISCGRAPH-3D-SCATTER2), we'll add vertical segments to help give a sense of the spatial positions of the points:
```{r eval=FALSE}
# Function to interleave the elements of two vectors
interleave <- function(v1, v2) as.vector(rbind(v1,v2))
# Plot the points
plot3d(mtcars$wt, mtcars$disp, mtcars$mpg,
xlab = "Weight", ylab = "Displacement", zlab = "MPG",
size = .75, type = "s", lit = FALSE)
# Add the segments
segments3d(interleave(mtcars$wt, mtcars$wt),
interleave(mtcars$disp, mtcars$disp),
interleave(mtcars$mpg, min(mtcars$mpg)),
alpha = 0.4, col = "blue")
```
```{r FIG-MISCGRAPH-3D-SCATTER2, echo=FALSE, fig.cap="A 3D scatter plot with vertical lines for each point"}
knitr::include_graphics("fig/FIG-MISCGRAPH-3D-SCATTER2.png")
```
It's possible to tweak the appearance of the background and the axes. In Figure \@ref(fig:FIG-MISCGRAPH-3D-SCATTER3), we change the number of tick marks and add tick marks and axis labels to the specified sides:
```{r eval=FALSE}
# Make plot without axis ticks or labels
plot3d(mtcars$wt, mtcars$disp, mtcars$mpg,
xlab = "", ylab = "", zlab = "",
axes = FALSE,
size = .75, type = "s", lit = FALSE)
segments3d(interleave(mtcars$wt, mtcars$wt),
interleave(mtcars$disp, mtcars$disp),
interleave(mtcars$mpg, min(mtcars$mpg)),
alpha = 0.4, col = "blue")
# Draw the box.
rgl.bbox(color = "grey50", # grey60 surface and black text
emission = "grey50", # emission color is grey50
xlen = 0, ylen = 0, zlen = 0) # Don't add tick marks
# Set default color of future objects to black
rgl.material(color = "black")
# Add axes to specific sides. Possible values are "x--", "x-+", "x+-", and "x++".
axes3d(edges = c("x--", "y+-", "z--"),
ntick = 6, # Attempt 6 tick marks on each side
cex = .75) # Smaller font
# Add axis labels. 'line' specifies how far to set the label from the axis.
mtext3d("Weight", edge = "x--", line = 2)
mtext3d("Displacement", edge = "y+-", line = 3)
mtext3d("MPG", edge = "z--", line = 3)
```
```{r FIG-MISCGRAPH-3D-SCATTER3, echo=FALSE, fig.show="hold", fig.cap="3D scatter plot with axis ticks and labels repositioned"}
knitr::include_graphics("fig/FIG-MISCGRAPH-3D-SCATTER3.png")
```
### See Also
See `?plot3d` for more options for controlling the output.
Adding a Prediction Surface to a Three-Dimensional Plot {#RECIPE-MISCGRAPH-3D-SCATTER-MODEL}
-------------------------------------------------------
### Problem
You want to add a surface of predicted value to a three-dimensional scatter plot.
### Solution
First we need to define some utility functions for generating the predicted values from a model object:
```{r eval=FALSE}
# Given a model, predict zvar from xvar and yvar
# Defaults to range of x and y variables, and a 16x16 grid
predictgrid <- function(model, xvar, yvar, zvar, res = 16, type = NULL) {
# Find the range of the predictor variable. This works for lm and glm
# and some others, but may require customization for others.
xrange <- range(model$model[[xvar]])
yrange <- range(model$model[[yvar]])
newdata <- expand.grid(x = seq(xrange[1], xrange[2], length.out = res),
y = seq(yrange[1], yrange[2], length.out = res))
names(newdata) <- c(xvar, yvar)
newdata[[zvar]] <- predict(model, newdata = newdata, type = type)
newdata
}
# Convert long-style data frame with x, y, and z vars into a list
# with x and y as row/column values, and z as a matrix.
df2mat <- function(p, xvar = NULL, yvar = NULL, zvar = NULL) {
if (is.null(xvar)) xvar <- names(p)[1]
if (is.null(yvar)) yvar <- names(p)[2]
if (is.null(zvar)) zvar <- names(p)[3]
x <- unique(p[[xvar]])
y <- unique(p[[yvar]])
z <- matrix(p[[zvar]], nrow = length(y), ncol = length(x))
m <- list(x, y, z)
names(m) <- c(xvar, yvar, zvar)
m
}
# Function to interleave the elements of two vectors
interleave <- function(v1, v2) as.vector(rbind(v1,v2))
```
With these utility functions defined, we can make a linear model from the data and plot it as a mesh along with the data, using the `surface3d()` function, as shown in Figure \@ref(fig:FIG-MISCGRAPH-3D-SCATTER-MODEL):
```{r eval=FALSE}
library(rgl)
# Make a copy of the data set
m <- mtcars
# Generate a linear model
mod <- lm(mpg ~ wt + disp + wt:disp, data = m)
# Get predicted values of mpg from wt and disp
m$pred_mpg <- predict(mod)
# Get predicted mpg from a grid of wt and disp
mpgrid_df <- predictgrid(mod, "wt", "disp", "mpg")
mpgrid_list <- df2mat(mpgrid_df)
# Make the plot with the data points
plot3d(m$wt, m$disp, m$mpg, type = "s", size = 0.5, lit = FALSE)
# Add the corresponding predicted points (smaller)
spheres3d(m$wt, m$disp, m$pred_mpg, alpha = 0.4, type = "s", size = 0.5, lit = FALSE)
# Add line segments showing the error
segments3d(interleave(m$wt, m$wt),
interleave(m$disp, m$disp),
interleave(m$mpg, m$pred_mpg),
alpha = 0.4, col = "red")
# Add the mesh of predicted values
surface3d(mpgrid_list$wt, mpgrid_list$disp, mpgrid_list$mpg,
alpha = 0.4, front = "lines", back = "lines")
```
```{r FIG-MISCGRAPH-3D-SCATTER-MODEL, echo=FALSE, fig.cap="A 3D scatter plot with a prediction surface"}
knitr::include_graphics("fig/FIG-MISCGRAPH-3D-SCATTER-MODEL.png")
```
### Discussion
We can tweak the appearance of the graph, as shown in Figure \@ref(fig:FIG-MISCGRAPH-3D-SCATTER-MODEL2). We'll add each of the components of the graph separately:
```{r eval=FALSE}
plot3d(mtcars$wt, mtcars$disp, mtcars$mpg,
xlab = "", ylab = "", zlab = "",
axes = FALSE,
size = .5, type = "s", lit = FALSE)
# Add the corresponding predicted points (smaller)
spheres3d(m$wt, m$disp, m$pred_mpg, alpha = 0.4, type = "s", size = 0.5, lit = FALSE)
# Add line segments showing the error
segments3d(interleave(m$wt, m$wt),
interleave(m$disp, m$disp),
interleave(m$mpg, m$pred_mpg),
alpha = 0.4, col = "red")
# Add the mesh of predicted values
surface3d(mpgrid_list$wt, mpgrid_list$disp, mpgrid_list$mpg,
alpha = 0.4, front = "lines", back = "lines")
# Draw the box
rgl.bbox(color = "grey50", # grey60 surface and black text
emission = "grey50", # emission color is grey50
xlen = 0, ylen = 0, zlen = 0) # Don't add tick marks
# Set default color of future objects to black
rgl.material(color = "black")
# Add axes to specific sides. Possible values are "x--", "x-+", "x+-", and "x++".
axes3d(edges = c("x--", "y+-", "z--"),
ntick = 6, # Attempt 6 tick marks on each side
cex = .75) # Smaller font
# Add axis labels. 'line' specifies how far to set the label from the axis.
mtext3d("Weight", edge = "x--", line = 2)
mtext3d("Displacement", edge = "y+-", line = 3)
mtext3d("MPG", edge = "z--", line = 3)
```
```{r FIG-MISCGRAPH-3D-SCATTER-MODEL2, echo=FALSE, fig.cap="Three-dimensional scatter plot with customized appearance"}
knitr::include_graphics("fig/FIG-MISCGRAPH-3D-SCATTER-MODEL2.png")
```
### See Also
For more on changing the appearance of the surface, see `?rgl.material`.
Saving a Three-Dimensional Plot {#RECIPE-MISCGRAPH-3D-SAVE}
-------------------------------
### Problem
You want to save a three-dimensional plot created with the rgl package.
### Solution
To save a bitmap image of a plot created with rgl, use `rgl.snapshot()`. This will capture the exact image that is on the screen:
```{r eval=FALSE}
library(rgl)
plot3d(mtcars$wt, mtcars$disp, mtcars$mpg, type = "s", size = 0.75, lit = FALSE)
rgl.snapshot('3dplot.png', fmt = 'png')
```
You can also use `rgl.postscript()` to save a PostScript or PDF file:
```{r eval=FALSE}
rgl.postscript('3dplot.pdf', fmt = 'pdf')
rgl.postscript('3dplot.ps', fmt = 'ps')
```
PostScript and PDF output does not support many features of the OpenGL library on which rgl is based. For example, it does not support transparency, and the sizes of objects such as points and lines may not be the same as what appears on the screen.
### Discussion
To make the output more repeatable, you can save your current viewpoint and restore it later:
```{r eval=FALSE}
# Save the current viewpoint
view <- par3d("userMatrix")
# Restore the saved viewpoint
par3d(userMatrix = view)
```
To save view in a script, you can use `dput()`, then copy and paste the output into your script:
```{r eval=FALSE}
dput(view)
```
Once you have the text representation of the `userMatrix`, add the following to your script:
```{r eval=FALSE}
view <- structure(c(0.907931625843048, 0.267511069774628, -0.322642296552658,
0, -0.410978674888611, 0.417272746562958, -0.810543060302734,
0, -0.0821993798017502, 0.868516683578491, 0.488796472549438,
0, 0, 0, 0, 1), .Dim = c(4L, 4L))
par3d(userMatrix = view)
```
Animating a Three-Dimensional Plot {#RECIPE-MISCGRAPH-3D-ANIMATE}
----------------------------------
### Problem
You want to animate a three-dimensional plot by moving the viewpoint around the plot.
### Solution
Rotating a 3D plot can provide a more complete view of the data. To animate a 3D plot, use `play3d()` with `spin3d()`:
```{r eval=FALSE}
library(rgl)
plot3d(mtcars$wt, mtcars$disp, mtcars$mpg, type = "s", size = 0.75, lit = FALSE)
play3d(spin3d())
```
### Discussion
By default, the graph will be rotated on the *z* (vertical) axis, until you send a break command to R.
You can change the rotation axis, rotation speed, and duration:
```{r eval=FALSE}
# Spin on x-axis, at 4 rpm, for 20 seconds
play3d(spin3d(axis = c(1,0,0), rpm = 4), duration = 20)
```
To save the movie, use the `movie3d()` function in the same way as `play3d()`. It will generate a series of *.png* files, one for each frame, and then attempt to combine them into a single animated *.gif* file using the convert program from the ImageMagick image utility.
This will spin the plot once in 15 seconds, at 50 frames per second:
```{r eval=FALSE}
# Spin on z axis, at 4 rpm, for 15 seconds
movie3d(spin3d(axis = c(0,0,1), rpm = 4), duration = 15, fps = 50)
```
The output file will be saved in a temporary directory, and the name will be printed on the R console.
If you don't want to use ImageMagick to convert the output to a *.gif*, you can specify `convert=FALSE` and then convert the series of *.png* files to a movie using some other utility.
Creating a Dendrogram {#RECIPE-MISCGRAPH-DENDROGRAM}
---------------------
### Problem
You want to make a dendrogram to show how items are clustered.
### Solution
Use `hclust()` and plot the output from it. This can require a fair bit of data preprocessing. For this example, we'll first take a subset of the `countries` data set from the year 2009. For simplicity, we'll also drop all rows that contain an `NA`, and then select a random 25 of the remaining rows:
```{r}
library(dplyr)
library(tidyr) # For drop_na function
library(gcookbook) # For the data set
# Set random seed to make random operation below repeatable
set.seed(392)
c2 <- countries %>%
filter(Year == 2009) %>% # Get data from year 2009
drop_na() %>% # Drop rows that have any NA values
sample_n(25) # Select 25 random rows
c2
```
Notice that the row names (the first column) are essentially random numbers, since the rows were selected randomly. We need to do a few more things to the data before making a dendrogram from it. First, we need to set the *row names*-right now there's a column called `Name`, but the row names are those random numbers (we don't often use row names, but for the `hclust()` function they're essential). Next, we'll need to drop all the columns that aren't values used for clustering. These columns are `Name`, `Code`, and `Year`:
```{r}
rownames(c2) <- c2$Name
c2 <- c2[, 4:7]
c2
```
The values for `GDP` are several orders of magnitude larger than the values for, say, `infmortality.` Because of this, the effect of `infmortality` on the clustering will be negligible compared to the effect of `GDP.` This probably isn't what we want. To address this issue, we'll scale the data:
```{r}
c3 <- scale(c2)
c3
```
By default the `scale()` function scales each column relative to its standard deviation, but other methods may be used.
Finally, we're ready to make the dendrogram, as shown in Figure \@ref(fig:FIG-MISCGRAPH-DENDROGRAM):
```{r FIG-MISCGRAPH-DENDROGRAM, fig.show="hold", fig.cap="A dendrogram (left); With text aligned (right)", fig.width=6, fig.height=5}
hc <- hclust(dist(c3))
# Make the dendrogram
plot(hc)
# With text aligned
plot(hc, hang = -1)
```
### Discussion
A cluster analysis is simply a way of assigning points to groups in an *n*-dimensional space (four dimensions, in this example). A hierarchical cluster analysis divides each group into two smaller groups, and can be represented with the dendrograms in this recipe. There are many different parameters you can control in the hierarchical cluster analysis process, and there may not be a single "right" way to do it for your data.
First, we normalized the data using `scale()` with its default settings. You can scale your data differently, or not at all. (With this data set, *not* scaling the data will lead to GDP overwhelming the other variables, as shown in Figure \@ref(fig:FIG-MISCGRAPH-DENDROGRAM-UNSCALED).)
```{r FIG-MISCGRAPH-DENDROGRAM-UNSCALED, fig.cap="Dendrogram with unscaled data-notice the much larger Height values, which are largely due to the unscaled GDP values", fig.width=6, fig.height=5}
hc_unscaled <- hclust(dist(c2))
plot(hc_unscaled)
```
For the distance calculation, we used the default method, "euclidean", which calculates the Euclidean distance between the points. The other possible methods are "maximum", "manhattan", "canberra", "binary", and "minkowski".
The `hclust()` function provides several methods for performing the cluster analysis. The default is "complete"; the other possible methods are "ward", "single", "average", "mcquitty", "median", and "centroid".
### See Also
See `?hclust` for more information about the different clustering methods.
Creating a Vector Field {#RECIPE-MISCGRAPH-VECTORFIELD}
-----------------------
### Problem
You want to make a vector field.
### Solution
Use `geom_segment()`. For this example, we'll use the `isabel` data set:
```{r}
library(gcookbook) # For the isabel data set
isabel
```
`x` and `y` are the longitude and latitude, respectively, and `z` is the height in kilometers. The `vx`, `vy`, and `vz` values are the wind speed components in each of these directions, in meters per second, and speed is the wind speed.
The height (`z`) ranges from 0.035 km to 18.035 km. For this example, we'll just use the lowest slice of data.
To draw the vectors (Figure \@ref(fig:FIG-MISCGRAPH-VECTOR)), we'll use `geom_segment()`. Each segment has a starting point and an ending point. We'll use the `x` and `y` values as the starting points for each segment, then add a fraction of the `vx` and `vy` values to get the end points for each segment. If we didn't scale down these values, the lines would be much too long:
```{r FIG-MISCGRAPH-VECTOR, fig.cap="First attempt at a vector field. The resolution of the data is too high, but it does hint at some interesting patterns not visible in graphs with a lower data resolution", fig.width=5, fig.height=5, warning=FALSE}
islice <- filter(isabel, z == min(z))
ggplot(islice, aes(x = x, y = y)) +
geom_segment(aes(xend = x + vx/50, yend = y + vy/50),
size = 0.25) # Make the line segments 0.25 mm thick
```
This vector field has two problems: the data is at too high a resolution to read, and the segments do not have arrows indicating the direction of the flow. To reduce the resolution of the data, we'll define a function `every_n()` that keeps one out of every `n` values in the data and drops the rest:
```{r}
# Take a slice where z is equal to the minimum value of z
islice <- filter(isabel, z == min(z))
# Keep 1 out of every 'by' values in vector x
every_n <- function(x, by = 2) {
x <- sort(x)
x[seq(1, length(x), by = by)]
}
# Keep 1 of every 4 values in x and y
keepx <- every_n(unique(isabel$x), by = 4)
keepy <- every_n(unique(isabel$y), by = 4)
# Keep only those rows where x value is in keepx and y value is in keepy
islicesub <- filter(islice, x %in% keepx & y %in% keepy)
```
Now that we've taken a subset of the data, we can plot it, with arrowheads, as shown in Figure \@ref(fig:FIG-MISCGRAPH-VECTORFIELD-ARROW):
```{r FIG-MISCGRAPH-VECTORFIELD-ARROW, fig.cap="Vector field with arrowheads", fig.width=5, fig.height=5, warning=FALSE}
# Need to load grid for arrow() function
library(grid)
# Make the plot with the subset, and use an arrowhead 0.1 cm long
ggplot(islicesub, aes(x = x, y = y)) +
geom_segment(aes(xend = x+vx/50, yend = y+vy/50),
arrow = arrow(length = unit(0.1, "cm")), size = 0.25)
```
### Discussion
One effect of arrowheads is that short vectors appear with more ink than is proportional to their length. This could somewhat distort the interpretation of the data. To mitigate this effect, it may also be useful to map the speed to other properties, like `size` (line thickness), `alpha`, or `colour`. Here, we'll map speed to alpha (Figure \@ref(fig:FIG-MISCGRAPH-VECTORFIELD-AES), left):
```{r FIG-MISCGRAPH-VECTORFIELD-AES-1, eval=FALSE}
# The existing 'speed' column includes the z component. We'll calculate
# speedxy, the horizontal speed.
islicesub$speedxy <- sqrt(islicesub$vx^2 + islicesub$vy^2)
# Map speed to alpha
ggplot(islicesub, aes(x = x, y = y)) +
geom_segment(aes(xend = x+vx/50, yend = y+vy/50, alpha = speed),
arrow = arrow(length = unit(0.1,"cm")), size = 0.6)
```
Next, we'll map speed to colour. We'll also add a map of the United States and zoom in on the area of interest, as shown in the graph on the right in Figure \@ref(fig:FIG-MISCGRAPH-VECTORFIELD-AES),using `coord_cartesian()` (without this, the entire USA will be displayed):
```{r FIG-MISCGRAPH-VECTORFIELD-AES-2, eval=FALSE}
# Get USA map data
usa <- map_data("usa")
# Map speed to colour, and set go from "grey80" to "darkred"
ggplot(islicesub, aes(x = x, y = y)) +
geom_segment(aes(xend = x+vx/50, yend = y+vy/50, colour = speed),
arrow = arrow(length = unit(0.1,"cm")), size = 0.6) +
scale_colour_continuous(low = "grey80", high = "darkred") +
geom_path(aes(x = long, y = lat, group = group), data = usa) +
coord_cartesian(xlim = range(islicesub$x), ylim = range(islicesub$y))
```
```{r FIG-MISCGRAPH-VECTORFIELD-AES, ref.label=c("FIG-MISCGRAPH-VECTORFIELD-AES-1", "FIG-MISCGRAPH-VECTORFIELD-AES-2"), echo=FALSE, message=FALSE, fig.show="hold", fig.cap="Vector field with speed mapped to alpha (left); With speed mapped to colour (right)", fig.width=6, fig.height=5, warning=FALSE}
```
The `isabel` data set has three-dimensional data, so we can also make a faceted graph of the data, as shown in Figure \@ref(fig:FIG-MISCGRAPH-VECTORFIELD-FACET). Because each facet is small, we will use a sparser subset than before:
(ref:cap-FIG-MISCGRAPH-VECTORFIELD-FACET) Vector field of wind speeds, faceted on `z`
```{r FIG-MISCGRAPH-VECTORFIELD-FACET, fig.cap="(ref:cap-FIG-MISCGRAPH-VECTORFIELD-FACET)", fig.width=9, fig.height=6, warning=FALSE}
# Keep 1 out of every 5 values in x and y, and 1 in 2 values in z
keepx <- every_n(unique(isabel$x), by = 5)
keepy <- every_n(unique(isabel$y), by = 5)
keepz <- every_n(unique(isabel$z), by = 2)
isub <- filter(isabel, x %in% keepx & y %in% keepy & z %in% keepz)