## Why ggvis?

ggvis is an awesome data visualization package for R. It builds graphics with a syntax similar to ggplot2 and produces rich interactive plots, much like Shiny. Because the syntax is highly structured, it's easy to learn and to use.

## Basic ggvis Syntax

### Basic Components

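ggvis recreates the grammar of graphics. The key syntax pipes a dataset into `ggvis()` and then into a layer function: `<data> %>% ggvis(~<x>, ~<y>, <property> := <value>) %>% layer_<marks>()`.

This pattern has 4 components: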

$$\text{Graphic} = \text{Data} + \text{Coordinate System} + \text{Properties} + \text{Marks}$$

For example, using built-in dataset mtcars:
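A minimal sketch of the four components. The choice of `hp` and `mpg` as coordinates and of a red fill are assumptions for illustration:

```r
library(ggvis)

# Data: mtcars; coordinates: ~hp, ~mpg; property: fill; marks: points
p_basic <- mtcars %>%
  ggvis(~hp, ~mpg, fill := "red") %>%
  layer_points()

# Print p_basic to render the plot
```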

Note that the coordinates and properties can be moved into layer_<marks>(), and that ggvis() can generate a plot without layer_<marks>(). Both points are covered in detail in the following parts.

### Global vs. Local Declaration & Multiple Layers

ggvis allows multiple layers to be overlaid. When you put the coordinates and properties in ggvis(), you declare them globally. That means the coordinates and properties are shared by all the layer_<marks>() calls that follow.
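A sketch of a global declaration, using the coordinates and stroke color discussed in this section:

```r
library(ggvis)

# ~hp, ~mpg and stroke are declared once in ggvis(),
# so both layers inherit them
p_global <- mtcars %>%
  ggvis(~hp, ~mpg, stroke := "blue") %>%
  layer_points() %>%
  layer_smooths()

# Print p_global to render the plot
```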

We specify ~hp, ~mpg, stroke := "blue" in ggvis(), so they apply to all the layers: stroke sets the border color of the points in layer_points() and the color of the smooth line in layer_smooths(). By default the fill color of points is black.

When we do that locally, we put the properties in each layer:
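A sketch of a local declaration. The particular colors are assumptions chosen for illustration:

```r
library(ggvis)

# Coordinates stay global; stroke and fill are set per layer
p_local <- mtcars %>%
  ggvis(~hp, ~mpg) %>%
  layer_points(stroke := "blue", fill := "red") %>%
  layer_smooths(stroke := "red")

# Print p_local to render the plot
```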

The properties are declared locally. Both layers use stroke, but the property works separately in the 2 layers, and the fill set on the points has no impact on layer_smooths(). Why do we keep ~hp, ~mpg in ggvis()? Because both layers share them, so declaring them once in ggvis() avoids repeating them in each layer.

### Assignment Symbols = & :=

The most important symbols are = and :=. You can think of them as “mapping” and “setting”.

There are 2 spaces when plotting something: a data space and a visual space. For example, in the visual space colors are expressed as HTML color codes, RGB color codes, etc. If you supply a variable for fill using = (normally preceded by a tilde ~, which makes ggvis treat the name as a variable), ggvis first maps the variable's values onto a color scale before plotting.

If you pass a quoted string with :=, ggvis reads it as a raw value.

Setting vs. mapping applies only to properties, not to parameters.
For a parameter, use = with the value directly.
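The three cases side by side, as a sketch (the variables and values are assumptions for illustration):

```r
library(ggvis)

# Mapping (=): the values of cyl are mapped onto a color scale
p_map <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>%
  layer_points()

# Setting (:=): "red" is used as a raw color value
p_set <- mtcars %>%
  ggvis(~wt, ~mpg, fill := "red") %>%
  layer_points()

# Parameter: span is not a visual property, so plain = with a value
p_param <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_smooths(span = 0.5)
```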

Besides, %>% comes from the magrittr package and is widely used in dplyr. It is the chaining (pipe) operator and makes the code more readable.

## Layers & Properties

If you don’t specify a layer type, ggvis uses layer_guess() to pick one for you (see ?ggvis::layer_guess). Beyond the magic of layer_guess(), I’d strongly recommend learning the more specific layers. So far we have shown only a couple of properties. The basic layers and properties are listed below (columns are layer_<marks> functions, rows are properties):

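| layer_ | bars | boxplots | densities | freqpolys | histograms | lines | paths | points |
|---|---|---|---|---|---|---|---|---|
| x / x2 | O | O | O | O | O | O | O | O |
| y / y2 | O | O | O | O | O | O | O | O |
| width | O | O | X | O | O | X | X | O |
| opacity | O | O | O | O | O | O | O | O |
| fill | O | O | O | O | O | O | O | O |
| fillOpacity | O | O | O | O | O | O | O | O |
| stroke | O | O | O | O | O | O | O | O |
| strokeWidth | O | O | O | O | O | O | O | O |
| strokeOpacity | O | O | O | O | O | O | O | O |
| size | X | O | X | X | X | X | X | O |
| shape | X | X | X | X | X | X | X | O |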

Where O means supported, X means not supported.

### Barchart, Histogram & Frequency Polygon

A bar chart shows the count at each unique x value, whereas a histogram counts within bins along ranges of x. Bar charts and histograms both have a width argument. However, in the former it sets the column width, while in the latter it sets the bin width used to group the continuous data on the x-axis.

A frequency polygon treats continuous data with the same logic as a histogram, but uses a line to describe how the frequency changes across ranges. Notice that I use fillOpacity instead of opacity in the third plot. That means the transparency is applied only to the fill, not to the stroke. By default nothing is filled under the curve of a frequency polygon; since we specify a fill, the region is filled with transparent red.
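A sketch of the three plots discussed above. The variables, widths and opacity value are assumptions for illustration:

```r
library(ggvis)

# Bar chart: one bar per unique value of cyl; width is the bar width
p_bar <- mtcars %>%
  ggvis(~cyl) %>%
  layer_bars(width = 0.5)

# Histogram: width is the bin width on the x-axis
p_hist <- mtcars %>%
  ggvis(~mpg) %>%
  layer_histograms(width = 5)

# Frequency polygon: fillOpacity fades the fill only, not the stroke
p_freq <- mtcars %>%
  ggvis(~mpg) %>%
  layer_freqpolys(width = 5, fill := "red", fillOpacity := 0.3)
```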

### Boxplot

width is also a parameter of layer_boxplots(); the default value is 0.9. This parameter specifies the width of the boxes, and therefore the distance between groups. Besides the usual properties such as fill and stroke, you can set the size of the outliers.

Currently layer_boxplots() seems to have a small problem in ggvis (version 0.4.2) when the value of size is modified: the whiskers move but the boxes don’t. Hopefully a package update will fix this.
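A minimal boxplot sketch, assuming mpg grouped by cyl and an illustrative width and fill:

```r
library(ggvis)

# One box per level of cyl; width narrows the boxes from the default
p_box <- mtcars %>%
  ggvis(~factor(cyl), ~mpg) %>%
  layer_boxplots(width = 0.5, fill := "lightblue")
```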

### Density Plot

Density plots provide another way to display the distribution of a single variable. A density plot uses a line to display the density of a variable at each point in its range. You can think of a density plot as a continuous version of a histogram with a different y scale (although this is not exactly accurate).
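A sketch of a density plot without the shaded area. The `faithful` dataset and the `waiting` variable are assumptions for illustration:

```r
library(ggvis)

# area = FALSE draws only the curve, so fill has nothing to color
p_dens <- faithful %>%
  ggvis(~waiting) %>%
  layer_densities(area = FALSE, fill := "red")
```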

You can use the area parameter to decide whether a shaded region is drawn under the curve. In the chunk above, even though “red” is assigned to fill, nothing appears under the density curve because area is switched off (the default setting draws a grey shaded area).

### Lines & Paths

Firstly we compare these 2 layers:

The output of layer_paths() looks chaotic, but it is not. It connects the points in row order, from the very first record to the last. Let’s reorder the dataset.

Now layer_paths() shows the same trend as layer_lines(). layer_paths() is more powerful for geographical plots.

Line and path plots can also use the fill property.
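A sketch of the comparison and of the reordering step, assuming `wt` and `mpg` as the plotted variables:

```r
library(ggvis)

# layer_lines() sorts by x before connecting the points
p_lines <- mtcars %>% ggvis(~wt, ~mpg) %>% layer_lines()

# layer_paths() connects the points in row order
p_paths <- mtcars %>% ggvis(~wt, ~mpg) %>% layer_paths()

# Sort the rows by wt first, then draw the path
sorted_cars <- mtcars[order(mtcars$wt), ]
p_paths_sorted <- sorted_cars %>% ggvis(~wt, ~mpg) %>% layer_paths()
```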

### Scatterplot

We’ve already used layer_points() many times. Note that there are several possible values for the shape of points: "circle" (default), "square", "diamond", "cross", "triangle-up" and "triangle-down".

factor() converts cyl from numeric to categorical data, which makes the plot clearer.
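A scatterplot sketch with a non-default shape and a categorical fill (the variable and shape choices are assumptions):

```r
library(ggvis)

# shape is set to a raw value; fill is mapped from the categorical cyl
p_scatter <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~factor(cyl), shape := "diamond") %>%
  layer_points()
```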

### Special Layers: Model Prediction & Smooths

layer_model_predictions() fits a model to the data and draws it with layer_paths() and, optionally, layer_ribbons(). layer_smooths() is a special case of layering model predictions where the model is a smooth loess curve whose smoothness is controlled by the span parameter. Both use the same properties as layer_paths() and layer_lines().

If you don’t specify the formula argument in layer_model_predictions(), ggvis will guess it based on the input in the global data space of ggvis().
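A sketch of both layers, assuming a linear model of `mpg` on `wt` and an illustrative span:

```r
library(ggvis)

# Linear model with an explicit formula and a standard-error ribbon
p_lm <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  layer_model_predictions(model = "lm", formula = mpg ~ wt, se = TRUE)

# Loess smooth; a smaller span follows the data more closely
p_loess <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  layer_smooths(span = 0.5, se = TRUE)
```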

There are more properties and layer_<marks>() functions; you can explore them by typing ?ggvis::marks and ??ggvis::layer_.

## Layers Equivalence

### Model Prediction, Smooths & Densities

Some layers can be built in another way. Consider how layer_model_predictions() is processed:

1. Estimate the model based on the dataset;
2. Compute the predicted data;
3. Plot the predicted line.

In ggvis, compute_model_prediction() performs the first 2 steps.

The function returns 2 columns pred_ and resp_.
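A sketch of the two-step version, assuming a linear model of `mpg` on `wt`:

```r
library(ggvis)

# Steps 1 and 2: fit the model and compute the predictions
pred <- mtcars %>% compute_model_prediction(mpg ~ wt, model = "lm")

# Step 3: plot the predicted line from pred_ and resp_
p_pred <- pred %>%
  ggvis(~pred_, ~resp_) %>%
  layer_paths()
```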

The chunk returns the same line as layer_model_predictions() did before. layer_smooths() and layer_densities() can be split into 2 steps in the same way.

And you can easily find the corresponding relations as below:

| High-level layer_ | compute_ | Low-level layer_ |
|---|---|---|
| model_predictions | model_predictions | paths |
| smooths | smooths | paths |
| histograms | bin | rects |
| densities | density | lines |
| bar | count/tabulate/stack/align | rects |
| boxplots | boxplot/stack | rects |

### Equivalence of Histograms

compute_bin() returns 5 columns, where count_ is the count in each interval, and xmin_ and xmax_ are the left and right borders of the interval. We can use these 3 columns to plot the rectangles for the intervals.
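A sketch of a hand-built histogram, assuming `mpg` binned with a width of 5:

```r
library(ggvis)

# Bin mpg; the result has count_, x_, xmin_, xmax_ and width_
binned <- mtcars %>% compute_bin(~mpg, width = 5)

# One rectangle per bin, from 0 up to the bin count
p_rects <- binned %>%
  ggvis(x = ~xmin_, x2 = ~xmax_, y = ~count_, y2 = 0) %>%
  layer_rects()
```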

### Equivalence of Density Plot

compute_density() returns pred_ and resp_; connecting them with layer_lines() reproduces the exact density curve.
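A sketch of the same idea for densities, assuming the `waiting` variable of the `faithful` dataset:

```r
library(ggvis)

# Estimate the density; the result has pred_ (x) and resp_ (density)
dens <- faithful %>% compute_density(~waiting)

# Connect the estimated points with a line
p_dens_line <- dens %>%
  ggvis(~pred_, ~resp_) %>%
  layer_lines()
```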

### Equivalence of Barchart

compute_count() and compute_tabulate() return only 2 columns: x_, the value, and count_, the number of records at that value. compute_align() then sets the lower and upper bounds of each interval.

In compute_align(), length sets the length of an interval. It’s equivalent to width in layer_bars().

How does compute_stack() work?

compute_stack() generates 3 new variables from the source data: group__, stack_upr_ and stack_lwr_. The latter 2 variables give the upper and lower y coordinates of each stack.
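The steps can be chained by hand as a sketch. Counting `cyl` and using a bar length of 0.5 are assumptions for illustration:

```r
library(ggvis)

# Step 1: count the records at each unique value of cyl (x_ and count_)
counted <- mtcars %>% compute_count(~cyl)

# Step 2: add xmin_ and xmax_, giving each bar a total length of 0.5
aligned <- counted %>% compute_align(~x_, length = 0.5)

# Step 3: add stack_lwr_ and stack_upr_, the lower and upper y of each bar
stacked <- aligned %>% compute_stack(stack_var = ~count_, group_var = ~x_)

# Step 4: draw the rectangles
p_bars <- stacked %>%
  ggvis(x = ~xmin_, x2 = ~xmax_, y = ~stack_upr_, y2 = ~stack_lwr_) %>%
  layer_rects()
```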

Try another classic stacked barchart:

Building a layer from its compute_ and low-level layer_ steps helps you understand how the high-level layer is generated, and gives you a way to grab the key data used for plotting.

### Working with dplyr Package

ggvis works well with the dplyr package. For example, to recreate the plot above:

group_by() also works for multiple variables.
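A sketch of grouping with dplyr before plotting. This is an illustrative grouped plot with assumed variables, not necessarily the one referred to above:

```r
library(ggvis)
library(dplyr)

# Group by one variable: one smooth line per level of cyl
p_grouped <- mtcars %>%
  group_by(cyl) %>%
  ggvis(~wt, ~mpg, stroke = ~factor(cyl)) %>%
  layer_smooths()

# Grouping by several variables uses the same call
by_two <- mtcars %>% group_by(cyl, am)
```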

## Interactive Output

ggvis enables interactive HTML output by a series of input_<form_controls>() functions. There are 7 input widgets:

• input_checkbox() creates an interactive checkbox;
• input_select(), input_checkboxgroup() and input_radiobuttons() create interactive controls to select one or more options from a list;
• input_slider() creates an interactive slider;
• input_numeric() and input_text() create an interactive numeric or text input box.

### Single Checkbox

You can also split the plotting steps if the input clause is too long.
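A sketch of both styles. The labels and the hypothetical helper `pick_model` are assumptions for illustration:

```r
library(ggvis)

# Inline: the checkbox toggles the confidence band of the smooth
p_check <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_smooths(se = input_checkbox(label = "Confidence interval"))

# Split: build the input first, then use it in the plot
# pick_model turns the checkbox state into a model name
pick_model <- function(val) if (val) "loess" else "lm"
model_type <- input_checkbox(label = "Use flexible curve", map = pick_model)

p_model <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_model_predictions(model = model_type)
```

The controls only respond when the plot is printed from a running R session.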

### Selection from A List

input_select() and input_checkboxgroup() allow multiple choices. input_radiobuttons() allows only one option to be picked from the list.

The map argument should be a function that takes a single argument and returns a modified value. When you map a variable name to a property, remember to use = instead of :=.
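A sketch of a dropdown and of radio buttons. The choices and labels are assumptions for illustration:

```r
library(ggvis)

# Dropdown: as.name turns the chosen string into a variable name,
# so the property is mapped with = rather than set with :=
p_select <- iris %>%
  ggvis(x = input_select(c("Petal.Width", "Sepal.Length"), map = as.name)) %>%
  layer_points(y = ~Petal.Length, fill = ~Species)

# Radio buttons: the chosen string is a raw color, so it is set with :=
p_radio <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points(
    fill := input_radiobuttons(c("Red" = "red", "Green" = "green"),
                               label = "Colors")
  )
```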

### Slider

A slider is quite useful for controlling a continuous argument, for example, the binwidth of a histogram:
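A sketch of a slider driving the bin width, with an assumed range and step:

```r
library(ggvis)

# width is a parameter, so the slider is passed with plain =
p_slider <- mtcars %>%
  ggvis(~wt) %>%
  layer_histograms(
    width = input_slider(0.1, 2, value = 0.5, step = 0.1, label = "Bin width")
  )
```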

### Numeric & Text Input Box

To control the binwidth, you could also directly assign a value with input_numeric().

To control the fill color, you could use input_text().
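A sketch of both input boxes. The initial values and labels are assumptions for illustration:

```r
library(ggvis)

# Numeric box: type the bin width directly
p_numeric <- mtcars %>%
  ggvis(~mpg) %>%
  layer_histograms(width = input_numeric(value = 2, label = "Bin width"))

# Text box: type a color name; it is a raw value, so use :=
p_text <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points(fill := input_text(value = "red", label = "Fill color"))
```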

## Axes, Legends & Scales

### Axes

We use add_axis() to adjust the axis.

The first argument specifies which axis to adjust, horizontal ("x") or vertical ("y"). Use title to name the axes. You can add more details:

Compare the 2 plots carefully and you will see what those arguments do.
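A sketch of a plain and a more detailed version. The titles and tick settings are assumptions for illustration:

```r
library(ggvis)

# Titles only
p_axis <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  add_axis("x", title = "Weight") %>%
  add_axis("y", title = "Miles per gallon")

# More details: axis position, tick values, minor ticks, tick length
p_axis_detail <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  add_axis("x", title = "Weight", orient = "top", values = c(2, 3, 4, 5)) %>%
  add_axis("y", title = "Miles per gallon", subdivide = 4,
           tick_size_major = 10)
```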

### Legends

add_legend() works similarly to add_axis(), except that it alters the legend of a plot. Instead of specifying which axis to change, you have to specify the property you want to add to the legend. For example:

By default ggvis creates a separate legend for each property that you use. To combine them into a single legend, you just need to feed add_legend() a vector of property names as its first argument. The code below creates a combined legend for 3 properties: fill, shape and size.
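A sketch of a titled legend and of a combined legend. For brevity it combines only two properties (size and fill), and the variable choices are assumptions:

```r
library(ggvis)

# A single legend with a custom title
p_legend <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>%
  layer_points() %>%
  add_legend("fill", title = "Cylinders")

# One combined legend for two properties mapped from the same variable
p_legend_combined <- mtcars %>%
  ggvis(~wt, ~mpg, size = ~cyl, fill = ~cyl) %>%
  layer_points() %>%
  add_legend(c("size", "fill"))
```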

### Scales

ggvis provides several different functions for creating scales: scale_datetime(), scale_logical(), scale_nominal(), scale_numeric(), scale_singular(). Each maps a different type of data input to the visual properties that ggvis uses.
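A sketch of two numeric color scales, assuming `disp` is mapped to both fill and stroke:

```r
library(ggvis)

# disp drives both colors; each property gets its own color range
p_scale_num <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~disp, stroke = ~disp, strokeWidth := 2) %>%
  layer_points() %>%
  scale_numeric("fill", range = c("red", "yellow")) %>%
  scale_numeric("stroke", range = c("darkred", "orange"))
```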

The chunk above maps the values of disp onto a color range from red to yellow for the fill color, and from darkred to orange for the stroke color.

The chunk below maps a categorical variable to fill. cyl has 3 unique values so we can provide a range of length 3 with the color names.
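A sketch of a nominal scale. The three color names are assumptions for illustration:

```r
library(ggvis)

# One color per level of cyl (4, 6 and 8)
p_scale_nom <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>%
  layer_points() %>%
  scale_nominal("fill", range = c("purple", "blue", "green"))
```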

You can adjust any visual property in your graph with a scale (not just color). For example you can specify the opacity and the domain of axes.

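scale_numeric("y", domain = c(0, NA)) fixes the lower limit of the y-axis at 0 and leaves the upper limit to be determined from the data.

Be aware: ggvis interactivity cannot be displayed in a static HTML file converted from .Rmd by knitr, because the input controls need a running R session.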
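A sketch combining an opacity scale with an axis domain. Mapping `hp` to opacity and the range values are assumptions:

```r
library(ggvis)

# Opacity runs from 0.2 to 1 across hp; the y-axis starts at 0
p_scale_misc <- mtcars %>%
  ggvis(~wt, ~mpg, fill = ~factor(cyl), opacity = ~hp) %>%
  layer_points() %>%
  scale_numeric("opacity", range = c(0.2, 1)) %>%
  scale_numeric("y", domain = c(0, NA))
```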