You are currently browsing the monthly archive for April 2015.

In RStudio v0.99 we’ve made a major investment in R source code analysis. This work has significantly improved code completion and, in the latest preview release, enables a new inline code diagnostics feature that highlights various issues in your R code as you edit.

For example, here we’re getting a diagnostic that notes that there is an extra parenthesis:


Here the diagnostic indicates that we’ve forgotten a comma within a shiny UI definition:


This diagnostic flags an unknown parameter to a function call:


This diagnostic indicates that we’ve referenced a variable that doesn’t exist and suggests a fix based on another variable in scope:


A wide variety of diagnostics are supported, including optional diagnostics for code style issues (e.g. the inclusion of unnecessary whitespace). Diagnostics are also available for several other languages including C/C++, JavaScript, HTML, and CSS.

Configuring Diagnostics

By default, code in the current source file is checked whenever it is saved and whenever the keyboard has been idle for a period of time. You can tweak this behavior using the Code -> Diagnostics options:


Note that several of the available diagnostics are disabled by default. This is because we’re in the process of refining their behavior to eliminate “false positives” where correct code is flagged as having a problem. We’ll continue to improve these diagnostics and enable them by default when we feel they are ready.

Trying it Out

You can try out the new code diagnostics by downloading the latest preview release of RStudio. This feature is a work in progress and we’re particularly interested in feedback on how well it works. Please also let us know if there are common coding problems which you think we should add new diagnostics for. We hope you try out the preview and let us know how we can make it better.


I’m pleased to announce that the first version of xml2 is now available on CRAN. Xml2 is a wrapper around the comprehensive libxml2 C library that makes it easier to work with XML and HTML in R:

  • Read XML and HTML with read_xml() and read_html().
  • Navigate the tree with xml_children(), xml_siblings() and xml_parent(). Alternatively, use xpath to jump directly to the nodes you’re interested in with xml_find_one() and xml_find_all(). Get the full path to a node with xml_path().
  • Extract various components of a node with xml_text(), xml_attrs(), xml_attr(), and xml_name().
  • Convert to list with as_list().
  • Where appropriate, functions support namespaces with a global url -> prefix lookup table. See xml_ns() for more details.
  • Convert relative urls to absolute with url_absolute(), and transform in the opposite direction with url_relative(). Escape and unescape special characters with url_escape() and url_unescape().
  • Support for modifying and creating xml documents is planned for a future version.

This package owes a debt of gratitude to Duncan Temple Lang, whose XML package has made it possible to use XML with R for almost 15 years!


You can install it by running:

install.packages("xml2")
(If you’re on a Mac, you might need to wait a couple of days – CRAN is busy rebuilding all the packages for R 3.2.0 so it’s running a bit behind.)

Here’s a small example working with an inline XML document:

library(xml2)
x <- read_xml("<foo>
  <bar>text <baz id = 'a' /></bar>
  <bar>2</bar>
  <baz id = 'b' />
</foo>")

xml_name(x)
#> [1] "foo"
xml_children(x)
#> {xml_nodeset (3)}
#> [1] <bar>text <baz id="a"/></bar>
#> [2] <bar>2</bar>
#> [3] <baz id="b"/>

# Find all baz nodes anywhere in the document
baz <- xml_find_all(x, ".//baz")
baz
#> {xml_nodeset (2)}
#> [1] <baz id="a"/>
#> [2] <baz id="b"/>
xml_path(baz)
#> [1] "/foo/bar[1]/baz" "/foo/baz"
xml_attr(baz, "id")
#> [1] "a" "b"
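The HTML and URL helpers listed above can be combined in much the same way. Here is a minimal sketch, assuming a made-up inline page and a placeholder base URL (http://example.com/docs/ is not a real resource), that pulls out every link and resolves it against the base:

```r
library(xml2)

# Hypothetical inline page; the links and the base URL are placeholders.
page <- read_html("<html><body>
  <a href='guide.html'>Guide</a>
  <a href='/api/index.html'>API</a>
</body></html>")

# Find every <a> node, then extract its text and its href attribute
links <- xml_find_all(page, ".//a")
link_text <- xml_text(links)
link_href <- xml_attr(links, "href")

# Resolve the relative hrefs against the (assumed) base URL
link_abs <- url_absolute(link_href, "http://example.com/docs/")
```

Because the extractor functions are vectorised over a nodeset, there is no need for an explicit loop over the links.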


Xml2 is still under active development. If you notice any problems (including crashes), please try the development version, and if that doesn’t work, file an issue.

I’m pleased to announce that the first version of readxl is now available on CRAN. Readxl makes it easy to get tabular data out of Excel. It:

  • Supports both the legacy .xls format and the modern xml-based .xlsx format. .xls support is made possible by the libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the insanely fast RapidXML C++ library.
  • Has no external dependencies so it’s easy to use on all platforms.
  • Re-encodes non-ASCII characters to UTF-8.
  • Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.
  • Automatically drops blank columns.
  • Returns output with class c("tbl_df", "tbl", "data.frame") so if you also use dplyr you’ll get an enhanced print method (i.e. you’ll see just the first ten rows, not the first 10,000!).

You can install it by running:

install.packages("readxl")
There’s not really much to say about how to use it:

library(readxl)

# Use an Excel file included in the package
sample <- system.file("extdata", "datasets.xlsx", package = "readxl")

# Read by position
head(read_excel(sample, 2))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# Or by name:
excel_sheets(sample)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"
head(read_excel(sample, "mtcars"))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

You can see the documentation for more info on the col_names, col_types and na arguments.
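As a rough illustration of those arguments, the sketch below reads the iris sheet from the same bundled workbook with explicit column types. The particular choices (treating the string "NA" as missing, and the type vector) are assumptions made for the example rather than recommended settings:

```r
library(readxl)

sample <- system.file("extdata", "datasets.xlsx", package = "readxl")

iris_xl <- read_excel(
  sample,
  sheet = "iris",
  col_names = TRUE,                      # first row holds the column names
  col_types = c("numeric", "numeric", "numeric", "numeric", "text"),
  na = "NA"                              # cells containing "NA" become missing
)

str(iris_xl)
```

Spelling out col_types is mostly useful when the automatic guess picks the wrong type for a column.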

Readxl is still under active development. If you have problems loading a dataset, please try the development version, and if that doesn’t work, file an issue.

The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides rich facilities for charting time-series data in R, including:

  • Automatically plots xts time-series objects (or objects convertible to xts).
  • Rich interactive features including zoom/pan and series/point highlighting.
  • Highly configurable axis and series display (including optional 2nd Y-axis).
  • Display upper/lower bars (e.g. prediction intervals) around series.
  • Various graph overlays including shaded regions, event lines, and annotations.
  • Use at the R console just like conventional R plots (via RStudio Viewer).
  • Embeddable within R Markdown documents and Shiny web applications.

The dygraphs package is available on CRAN now and can be installed with:

install.packages("dygraphs")

Here are some examples of interactive time series visualizations you can create with only a line or two of R code (the screenshots are static, click them to see the interactive version).

Panning and Zooming

This code adds a range selector that can be used to pan and zoom around the series data:

library(dygraphs)
dygraph(nhtemp, main = "New Haven Temperatures") %>%
  dyRangeSelector()


Point Highlighting

When you hover over the time-series the values of all points at the location of the mouse are shown in the legend:

lungDeaths <- cbind(ldeaths, mdeaths, fdeaths)
dygraph(lungDeaths, main = "Deaths from Lung Disease (UK)") %>%
  dyOptions(colors = RColorBrewer::brewer.pal(3, "Set2"))


Shading and Annotations

There are a wide variety of tools available to annotate time series. Here we demonstrate creating shaded regions:

dygraph(nhtemp, main="New Haven Temperatures") %>% 
  dySeries(label="Temp (F)", color="black") %>%
  dyShading(from="1920-1-1", to="1930-1-1", color="#FFE6E6") %>%
  dyShading(from="1940-1-1", to="1950-1-1", color="#CCEBD6")


You can find additional examples and documentation on the dygraphs for R website.
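The upper/lower bars mentioned in the feature list can be sketched along the same lines. This example assumes a Holt-Winters forecast of the built-in ldeaths series as the source of the prediction interval; any model that yields lower, fitted, and upper columns would work equally well:

```r
library(dygraphs)

# Fit a Holt-Winters model and forecast 36 months with a prediction interval
hw <- HoltWinters(ldeaths)
p <- predict(hw, n.ahead = 36, prediction.interval = TRUE)

# Combine observed and predicted values into one multi-column series;
# cbind() names the forecast columns p.fit, p.upr and p.lwr
all_series <- cbind(ldeaths, p)

g <- dygraph(all_series, main = "Deaths from Lung Disease (UK)") %>%
  dySeries("ldeaths", label = "Actual") %>%
  dySeries(c("p.lwr", "p.fit", "p.upr"), label = "Predicted")

g
```

Passing three column names to dySeries() tells dygraphs to draw the middle one as the line and shade the band between the outer two.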

Bringing JavaScript to R

One of the reasons we are excited about dygraphs is that it takes a mature and feature-rich visualization library formerly only accessible to web developers and makes it available to all R users.

This is part of a larger trend enabled by the htmlwidgets package, and we expect that more and more libraries like dygraphs will emerge over the coming months to bring the best of JavaScript data visualization to R.


Over the past several years the Rcpp package has become an indispensable tool for creating high-performance R code. Its power and ease of use have made C++ a natural second language for many R users. There are over 400 packages on CRAN and Bioconductor that depend on Rcpp and it is now the most downloaded R package.

In RStudio v0.99 we have added extensive additional tools to make working with Rcpp more pleasant, productive, and robust. These include:

  • Code completion
  • Source diagnostics as you edit
  • Code snippets
  • Auto-indentation
  • Navigable list of compilation errors
  • Code navigation (go to definition)

We think these features will go a long way to helping even more R users succeed with Rcpp. You can try the new features out now by downloading the RStudio Preview Release.

Code Completion

RStudio v0.99 includes comprehensive code completion for C++ based on Clang (the same underlying engine used by Xcode and many other C/C++ tools):


Completions are provided for the C++ language, Rcpp, and any other libraries you have imported.


As you edit C++ source files RStudio uses Clang to scan your code looking for errors, incomplete code, or other conditions worthy of warnings or informational notes. For example:


Diagnostics alert you to the possibility of subtle problems and flag outright incorrect code as early as possible, substantially reducing iteration/debugging time.

Interactive C++

Rcpp includes some nifty tools to help make working with C++ code just as simple and straightforward as working with R code. You can “source” C++ code into R just like you’d source an R script (no need to deal with Makefiles or build systems). Here’s a Gibbs Sampler implemented with Rcpp:


We can make this function available to R by simply sourcing the C++ file (much like we’d source an R script):

gibbs(100, 10)

Thanks to the abstractions provided by Rcpp, the code implementing the Gibbs Sampler in C++ is nearly identical to the code you’d write in R, but runs 20 times faster. RStudio includes full support for Rcpp’s sourceCpp via the Source button and Ctrl+Shift+Enter keyboard shortcut.
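The C++ source from the screenshot is not reproduced here, so the following is only a sketch of what such a workflow might look like. It assumes the widely circulated bivariate Gibbs sampler example (a gamma draw for x followed by a normal draw for y) and passes the C++ code inline via the code argument of sourceCpp() rather than from a file:

```r
library(Rcpp)

# Assumed implementation: a common textbook Gibbs sampler, not necessarily
# the exact code shown in the original post.
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gibbs(int N, int thin) {
  NumericMatrix mat(N, 2);
  double x = 0, y = 0;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < thin; j++) {
      x = R::rgamma(3.0, 1.0 / (y * y + 4));
      y = R::rnorm(1.0 / (x + 1), 1.0 / sqrt(2 * x + 2));
    }
    mat(i, 0) = x;
    mat(i, 1) = y;
  }
  return mat;
}
')

# The exported function is now callable from R like any other function
draws <- gibbs(100, 10)
dim(draws)
```

In everyday use the C++ code would more likely live in its own .cpp file, which is the case the Source button handles.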

Try it Out

If you are new to C++ or Rcpp you might be surprised at how easy it is to get started. There are lots of great resources available, including:

You can give the new Rcpp features a try now by downloading the RStudio Preview Release. If you run into problems or have feedback on how we could make things better let us know on our Support Forum.

We’re getting close to shipping the next version of RStudio (v0.99) and this week will continue our series of posts describing the major new features of the release (previous posts have already covered code completion, the revamped data viewer, and improvements to vim mode). Note that if you want to try out any of the new features now you can do so by downloading the RStudio Preview Release.

Code Snippets

Code snippets are text macros that are used for quickly inserting common snippets of code. For example, the fun snippet inserts an R function definition:

[Screenshot: Insert Snippet]

If you select the snippet from the completion list it will be inserted along with several text placeholders which you can fill in by typing and then pressing Tab to advance to the next placeholder:


Other useful snippets include:

  • lib, req, and source for the library, require, and source functions
  • df and mat for defining data frames and matrices
  • if, el, and ei for conditional expressions
  • apply, lapply, sapply, etc. for the apply family of functions
  • sc, sm, and sg for defining S4 classes/methods.

Snippets are a great way to automate inserting common/boilerplate code and are available for R, C/C++, JavaScript, and several other languages.

Inserting Snippets

As illustrated above, code snippets show up alongside other code completion results and can be inserted by picking them from the completion list. By default the completion list will show up automatically when you pause typing for 250 milliseconds and can also be manually activated via the Tab key. In addition, if you have typed the character sequence for a snippet and want to insert it immediately (without going through the completion list) you can press Shift+Tab.

Customizing Snippets

You can edit the built-in snippet definitions and even add snippets of your own via the Edit Snippets button in Global Options -> Code:

[Screenshot: Edit Snippets]

Custom snippets are defined using the snippet keyword. The contents of the snippet should be indented below using the <tab> key (rather than with spaces). Variables can be defined using the form ${1:varname}. For example, here’s the definition of the setGeneric snippet:

snippet sg
	setGeneric("${1:generic}", function(${2:x, ...}) {
		standardGeneric("${1:generic}")
	})

Once you’ve customized snippets for a given language they are written into the ~/.R/snippets directory. For example, the customized versions of R and C/C++ snippets are written to:


You can edit these files directly to customize snippet definitions or you can use the Edit Snippets dialog as described above. If you need to move custom snippet definitions to another system then simply place them in ~/.R/snippets and they’ll be used in preference to the built-in snippet definitions.

Try it Out

You can give code snippets a try now by downloading the RStudio Preview Release. If you run into problems or have feedback on how we could make things better let us know on our Support Forum.

I’m pleased to announce that readr is now available on CRAN. Readr makes it easy to read many types of tabular data:

  • Delimited files with read_delim(), read_csv(), read_tsv(), and read_csv2().
  • Fixed width files with read_fwf() and read_table().
  • Web log files with read_log().

You can install it by running:

install.packages("readr")
Compared to the equivalent base functions, readr functions are around 10x faster. They’re also easier to use because they’re more consistent, they produce data frames that are easier to use (no more stringsAsFactors = FALSE!), they have a more flexible column specification, and any parsing problems are recorded in a data frame. Each of these features is described in more detail below.


All readr functions work the same way. There are four important arguments:

  • file gives the file to read; a url or local path. A local path can point to a zipped, bzipped, xzipped, or gzipped file – it’ll be automatically uncompressed in memory before reading. You can also pass in a connection or a raw vector.

    For small examples, you can also supply literal data: if file contains a new line, then the data will be read directly from the string. Thanks to data.table for this great idea!

    read_csv("x,y\n1,2\n3,4")
    #>   x y
    #> 1 1 2
    #> 2 3 4
  • col_names: describes the column names (equivalent to header in base R). It has three possible values:
    • TRUE will use the first row of data as column names.
    • FALSE will number the columns sequentially.
    • A character vector to use as column names.
  • col_types: overrides the default column types (equivalent to colClasses in base R). More on that below.
  • progress: By default, readr will display a progress bar if the estimated loading time is greater than 5 seconds. Use progress = FALSE to suppress the progress indicator.
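To make the col_names options concrete, here is a small sketch that uses literal data (so no file is needed). The values are made up for illustration, and progress = FALSE is passed only to keep the output quiet:

```r
library(readr)

# Default: the first row of the data supplies the column names
with_header <- read_csv("x,y\n1,2\n3,4", progress = FALSE)

# No header row in the data: supply the names as a character vector
named <- read_csv("1,2\n3,4", col_names = c("x", "y"), progress = FALSE)

# No header row and no names: the columns are numbered sequentially
numbered <- read_csv("1,2\n3,4", col_names = FALSE, progress = FALSE)

names(with_header)
names(named)
names(numbered)
```

Note that with col_names = FALSE or a character vector, every row of the input is treated as data.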


The output has been designed to make your life easier:

  • Characters are never automatically converted to factors (i.e. no more stringsAsFactors = FALSE!).
  • Column names are left as is, not munged into valid R identifiers (i.e. there is no check.names = TRUE). Use backticks to refer to variables with unusual names, e.g. df$`Income ($000)`.
  • The output has class c("tbl_df", "tbl", "data.frame") so if you also use dplyr you’ll get an enhanced print method (i.e. you’ll see just the first ten rows, not the first 10,000!).
  • Row names are never set.

Column types

Readr heuristically inspects the first 100 rows to guess the type of each column. This is not perfect, but it’s fast and it’s a reasonable start. Readr can automatically detect these column types:

  • col_logical() [l], contains only T, F, TRUE or FALSE.
  • col_integer() [i], integers.
  • col_double() [d], doubles.
  • col_euro_double() [e], “Euro” doubles that use , as the decimal separator.
  • col_date() [D], Y-m-d dates.
  • col_datetime() [T], ISO8601 date times.
  • col_character() [c], everything else.

You can manually specify other column types:

  • col_skip() [_], don’t import this column.
  • col_date(format) and col_datetime(format, tz), dates or date times parsed with given format string. Dates and times are rather complex, so they’re described in more detail in the next section.
  • col_numeric() [n], a sloppy numeric parser that ignores everything apart from 0-9, - and . (this is useful for parsing currency data).
  • col_factor(levels, ordered), parse a fixed set of known values into an (optionally ordered) factor.

There are two ways to override the default choices with the col_types argument:

  • Use a compact string: "dc__d". Each letter corresponds to a column so this specification means: read first column as double, second as character, skip the next two and read the last column as a double. (There’s no way to use this form with column types that need parameters.)
  • With a (named) list of col objects:
    read_csv("iris.csv", col_types = list(
      Sepal.Length = col_double(),
      Sepal.Width = col_double(),
      Petal.Length = col_double(),
      Petal.Width = col_double(),
      Species = col_factor(c("setosa", "versicolor", "virginica"))
    ))

    Any omitted columns will be parsed automatically, so the previous call is equivalent to:

    read_csv("iris.csv", col_types = list(
      Species = col_factor(c("setosa", "versicolor", "virginica"))
    ))
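The compact string form can be tried out in the same way. This sketch applies the "dc__d" specification described above to a made-up five-column literal CSV, so the column names and values are purely illustrative:

```r
library(readr)

# Five columns: double, character, two skipped, double
csv <- "a,b,c,d,e\n1.5,x,10,20,2.5\n3.5,y,30,40,4.5\n"

df <- read_csv(csv, col_types = "dc__d")

# Only the columns that were not skipped remain
names(df)
```

The skipped columns c and d are dropped entirely rather than being read and discarded afterwards.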

Dates and times

One of the most helpful features of readr is its ability to import dates and date times. It can automatically recognise the following formats:

  • Dates in year-month-day form: 2001-10-20 or 2010/10/15 (or any non-numeric separator). It can’t automatically recognise dates in m/d/y or d/m/y format because they’re ambiguous: is 02/01/2015 the 2nd of January or the 1st of February?
  • Date times in ISO8601 form: e.g. 2001-02-03 04:05:06.07 -0800, 20010203 040506, 20010203 etc. I don’t support every possible variant yet, so please let me know if it doesn’t work for your data (more details in ?parse_datetime).

If your dates are in another format, don’t despair. You can use col_date() and col_datetime() to explicitly specify a format string. Readr implements its own strptime() equivalent which supports the following format strings:

  • Year: %Y (4 digits). %y (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
  • Month: %m (2 digits), %b (abbreviated name in current locale), %B (full name in current locale).
  • Day: %d (2 digits), %e (optional leading space)
  • Hour: %H
  • Minutes: %M
  • Seconds: %S (integer seconds), %OS (partial seconds)
  • Time zone: %Z (as name, e.g. America/Chicago), %z (as offset from UTC, e.g. +0800)
  • Non-digits: %. skips one non-digit character, %* skips any number of non-digit characters.
  • Shortcuts: %D = %m/%d/%y, %F = %Y-%m-%d, %R = %H:%M, %T = %H:%M:%S, %x = %y/%m/%d.

To practice parsing date times without having to load the file each time, you can use parse_datetime() and parse_date():

parse_date("2015-10-10")
#> [1] "2015-10-10"
parse_datetime("2015-10-10 15:14")
#> [1] "2015-10-10 15:14:00 UTC"

parse_date("02/01/2015", "%m/%d/%Y")
#> [1] "2015-02-01"
parse_date("02/01/2015", "%d/%m/%Y")
#> [1] "2015-01-02"
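Once a format string works with the parse functions, the same string can be handed to col_date() or col_datetime() when reading a file. The following is a minimal sketch with invented day/month/year data; the column names day and value are assumptions for the example:

```r
library(readr)

# Try the format string on a single value first
dt <- parse_datetime("10/02/2015 14:30", "%d/%m/%Y %H:%M")

# Then reuse the same style of format string in a column specification
df <- read_csv(
  "day,value\n02/01/2015,1\n03/01/2015,2\n",
  col_types = list(
    day = col_date("%d/%m/%Y"),
    value = col_integer()
  )
)

df$day
```

Here 02/01/2015 is read as the 2nd of January because the format string puts the day first.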


If there are any problems parsing the file, the read_* function will throw a warning telling you how many problems there are. You can then use the problems() function to access a data frame that gives information about each problem:

csv <- "x,y
1,a
b,2
"

df <- read_csv(csv, col_types = "ii")
#> Warning: 2 problems parsing literal data. See problems(...) for more
#> details.
problems(df)
#>   row col   expected actual
#> 1   1   2 an integer      a
#> 2   2   1 an integer      b
df
#>    x  y
#> 1  1 NA
#> 2 NA  2

Helper functions

Readr also provides a handful of other useful functions:

  • read_lines() works the same way as readLines(), but is a lot faster.
  • read_file() reads a complete file into a string.
  • type_convert() attempts to coerce all character columns to their appropriate type. This is useful if you need to do some manual munging (e.g. with regular expressions) to turn strings into numbers. It uses the same rules as the read_* functions.
  • write_csv() writes a data frame out to a csv file. It’s quite a bit faster than write.csv() and it never writes row.names. It also escapes " embedded in strings in a way that read_csv() can read.
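A short sketch of how these helpers fit together, assuming a tiny made-up data frame whose columns all start out as character, and a temporary file as the output location:

```r
library(readr)

# Character columns, e.g. after some manual string munging
raw <- data.frame(
  x = c("1", "2"),
  y = c("a", "b"),
  stringsAsFactors = FALSE
)

# Re-guess the column types using the same rules as the read_* functions
converted <- type_convert(raw)

# Write the result out, then read it back as lines and as a single string
path <- tempfile(fileext = ".csv")
write_csv(converted, path)

lines <- read_lines(path)
text <- read_file(path)
```

After conversion, x is numeric while y stays character, and the written file has a header line followed by one line per row.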


Readr is still under very active development. If you have problems loading a dataset, please try the development version, and if that doesn’t work, file an issue.


Action buttons can be tricky to use in Shiny because they work differently than other widgets. Widgets like sliders and select boxes maintain a value that is easy to use in your code. But the value of an action button is arbitrary. What should you do with it? Did you know that you should almost always call the value of an action button from observeEvent() or eventReactive()?
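As a rough sketch of that advice, the app below uses both functions with a single button. The input and output ids ("go" and "summary") and the random sample are invented for illustration and are not taken from the article:

```r
library(shiny)

ui <- fluidPage(
  actionButton("go", "Draw sample"),
  verbatimTextOutput("summary")
)

server <- function(input, output) {
  # eventReactive() recomputes its value only when the button is clicked
  sample_data <- eventReactive(input$go, {
    rnorm(100)
  })

  # observeEvent() runs a side effect each time the button is clicked
  observeEvent(input$go, {
    message("Button clicked ", input$go, " time(s)")
  })

  output$summary <- renderPrint(summary(sample_data()))
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app interactively
```

In both cases the button's value is used only as a trigger; the code never depends on what the number actually is.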

The newest article at the Shiny Development Center explains how action buttons work, and it provides five useful patterns for working with action buttons. These patterns also work well with action links.

Read the article here.