You are currently browsing the category archive for the ‘Packages’ category.


RStudio has teamed up with Datacamp to create a new, interactive way to learn dplyr. Dplyr is an R package that provides a fast, intuitive way to transform data sets with R. It introduces five functions, optimized in C++, that can handle ~90% of data manipulation tasks. These functions are lightning fast, which lets you accomplish more things—with more data—than you could otherwise. They are also designed to be intuitive and easy to learn, which makes R more user friendly. But this is just the beginning. Dplyr also automates groupwise operations in R, provides a standard syntax for accessing and manipulating database data with R, and much more.

In the course, you will learn how to use dplyr to

  • select() variables and filter() observations from your data in a targeted way
  • arrange() observations within your data set by value
  • derive new variables from your data with mutate()
  • create summary statistics with summarise()
  • perform groupwise operations with group_by()
  • use the dplyr syntax to access data stored in a database outside of R.

You will also practice using the tbl data structure and the new pipe operator in R, %>%.

The course is taught by Garrett Grolemund, RStudio’s Master Instructor, and is organized around Datacamp’s interactive interface. You will receive expert instruction in short, clear videos as you work through a series of progressive exercises. As you work, the Datacamp interface will provide immediate feedback and hints, alerting you when you do something wrong and rewarding you when you do something right. The course is designed to take about 4 hours and requires only a basic familiarity with R.

This is the first course in a RStudio datacamp track that will cover dplyr, ggvis, rmarkdown, and the RStudio IDE. To enroll, visit the datacamp dplyr portal.

I’m very pleased to announce that dplyr 0.3 is now available from CRAN. Get the latest version by running:


There are four major new features:

  • Four new high-level verbs: distinct(), slice(), rename(), and transmute().
  • Three new helper functions between, count(), and data_frame().
  • More flexible join specifications.
  • Support for row-based set operations.

There are two new features of interest to developers. They make it easier to write packages that use dplyr:

  • It’s now much easier to program with dplyr (using standard evaluation).
  • Improved database backends.

I describe each of these in turn below.

New verbs

distinct() returns distinct (unique) rows of a table:

# Find all origin-destination pairs
flights %>% 
  select(origin, dest) %>%
#> Source: local data frame [224 x 2]
#>    origin dest
#> 1     EWR  IAH
#> 2     LGA  IAH
#> 3     JFK  MIA
#> 4     JFK  BQN
#> 5     LGA  ATL
#> ..    ...  ...

slice() allows you to select rows by position. It includes positive integers and drops negative integers:

# Get the first flight to each destination
flights %>% 
  group_by(dest) %>%
#> Source: local data frame [105 x 16]
#> Groups: dest
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013    10   1     1955        -6     2213       -35      B6  N554JB
#> 2  2013    10   1     1149       -10     1245       -14      B6  N346JB
#> 3  2013     1   1     1315        -2     1413       -10      EV  N13538
#> 4  2013     7   6     1629        14     1954         1      UA  N587UA
#> 5  2013     1   1      554        -6      812       -25      DL  N668DN
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

transmute() and rename() are variants of mutate() and select(). Transmute drops all columns that you didn’t specifically mention, rename() keeps all columns that you didn’t specifically mention. They complete this table:

Drop others Keep others
Rename & reorder variables select() rename()
Compute new variables transmute() mutate()

New helpers

data_frame(), contributed by Kevin Ushey, is a nice way to create data frames:

  • It never changes the type of its inputs (i.e. no more stringsAsFactors = FALSE!)
    data.frame(x = letters) %>% sapply(class)
    #>        x 
    #> "factor"
    data_frame(x = letters) %>% sapply(class)
    #>           x 
    #> "character"
  • Or the names of variables:
    data.frame(`crazy name` = 1) %>% names()
    #> [1] ""
    data_frame(`crazy name` = 1) %>% names()
    #> [1] "crazy name"
  • It evaluates its arguments lazyily and in order:
    data_frame(x = 1:5, y = x ^ 2)
    #> Source: local data frame [5 x 2]
    #>   x  y
    #> 1 1  1
    #> 2 2  4
    #> 3 3  9
    #> 4 4 16
    #> 5 5 25
  • It adds tbl_df() class to output, never adds row.names(), and only recycles vectors of length 1 (recycling is a frequent source of bugs in my experience).

The count() function wraps up the common combination of group_by() and summarise():

# How many flights to each destination?
flights %>% count(dest)
#> Source: local data frame [105 x 2]
#>    dest     n
#> 1   ABQ   254
#> 2   ACK   265
#> 3   ALB   439
#> 4   ANC     8
#> 5   ATL 17215
#> ..  ...   ...

# Which planes flew the most?
flights %>% count(tailnum, sort = TRUE)
#> Source: local data frame [4,044 x 2]
#>    tailnum    n
#> 1          2512
#> 2   N725MQ  575
#> 3   N722MQ  513
#> 4   N723MQ  507
#> 5   N711MQ  486
#> ..     ...  ...

# What's the total carrying capacity of planes by year of purchase
planes %>% count(year, wt = seats)
#> Source: local data frame [47 x 2]
#>    year   n
#> 1  1956 102
#> 2  1959  18
#> 3  1963  10
#> 4  1965 149
#> 5  1967   9
#> ..  ... ...

Better joins

You can now join by different variables in each table:

narrow <- flights %>% select(origin, dest, year:day)

# Add destination airport metadata
narrow %>% left_join(airports, c("dest" = "faa"))
#> Source: local data frame [336,776 x 11]
#>    dest origin year month day                            name   lat    lon
#> 1   IAH    EWR 2013     1   1    George Bush Intercontinental 29.98 -95.34
#> 2   IAH    LGA 2013     1   1    George Bush Intercontinental 29.98 -95.34
#> 3   MIA    JFK 2013     1   1                      Miami Intl 25.79 -80.29
#> 4   BQN    JFK 2013     1   1                              NA    NA     NA
#> 5   ATL    LGA 2013     1   1 Hartsfield Jackson Atlanta Intl 33.64 -84.43
#> ..  ...    ...  ...   ... ...                             ...   ...    ...
#> Variables not shown: alt (int), tz (dbl), dst (chr)

# Add origin airport metadata
narrow %>% left_join(airports, c("origin" = "faa"))
#> Source: local data frame [336,776 x 11]
#>    origin dest year month day                name   lat    lon alt tz dst
#> 1     EWR  IAH 2013     1   1 Newark Liberty Intl 40.69 -74.17  18 -5   A
#> 2     LGA  IAH 2013     1   1          La Guardia 40.78 -73.87  22 -5   A
#> 3     JFK  MIA 2013     1   1 John F Kennedy Intl 40.64 -73.78  13 -5   A
#> 4     JFK  BQN 2013     1   1 John F Kennedy Intl 40.64 -73.78  13 -5   A
#> 5     LGA  ATL 2013     1   1          La Guardia 40.78 -73.87  22 -5   A
#> ..    ...  ...  ...   ... ...                 ...   ...    ... ... .. ...

(right_join() and outer_join() implementations are planned for dplyr 0.4.)

Set operations

You can use intersect(), union() and setdiff() with data frames, data tables and databases:

jfk_planes <- flights %>% 
  filter(origin == "JFK") %>% 
  select(tailnum) %>% 
lga_planes <- flights %>% 
  filter(origin == "LGA") %>% 
  select(tailnum) %>% 

# Planes that fly out of either JGK or LGA
nrow(union(jfk_planes, lga_planes))
#> [1] 3592

# Planes that fly out of both JFK and LGA
nrow(intersect(jfk_planes, lga_planes))
#> [1] 1311

# Planes that fly out JGK but not LGA
nrow(setdiff(jfk_planes, lga_planes))
#> [1] 647

Programming with dplyr

You can now program with dplyr – every function that uses non-standard evaluation (NSE) also has a standard evaluation (SE) twin that ends in _. For example, the SE version of filter() is called filter_(). The SE version of each function has similar arguments, but they must be explicitly “quoted”. Usually the best way to do this is to use ~:

airport <- "ANC"
# NSE version
filter(flights, dest == airport)
#> Source: local data frame [8 x 16]
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     7   6     1629        14     1954         1      UA  N587UA
#> 2  2013     7  13     1618         3     1955         2      UA  N572UA
#> 3  2013     7  20     1618         3     2003        10      UA  N567UA
#> 4  2013     7  27     1617         2     1906       -47      UA  N559UA
#> 5  2013     8   3     1615         0     2003        10      UA  N572UA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

# Equivalent SE code:
criteria <- ~dest == airport
filter_(flights, criteria)
#> Source: local data frame [8 x 16]
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     7   6     1629        14     1954         1      UA  N587UA
#> 2  2013     7  13     1618         3     1955         2      UA  N572UA
#> 3  2013     7  20     1618         3     2003        10      UA  N567UA
#> 4  2013     7  27     1617         2     1906       -47      UA  N559UA
#> 5  2013     8   3     1615         0     2003        10      UA  N572UA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

To learn more, read the Non-standard evaluation vignette. This new approach is powered by the lazyeval package which provides all the tools needed to implement NSE consistently and correctly. I now understand how to implement NSE consistently and correctly, and I’ll be using the same approach everywhere.

Database backends

The database backend system has been completely overhauled in order to make it possible to add backends in other packages, and to support a much wider range of databases. If you’re interested in implementing a new dplyr backend, please check out vignette("new-sql-backend") – it’s really not that much work.

The first package to take advantage of this system is MonetDB.R, which now provides the MonetDB backend for dplyr.

Other changes

As well as the big new features described here, dplyr 0.3 also fixes many bugs and makes numerous minor improvements. See the release notes for a complete list of the changes.

Shiny v0.10.2 has been released to CRAN. To install it:


This version of Shiny requires R 3.0.0 or higher (note the current version of R is 3.1.1). R 2.15.x is no longer supported.

Here are the most prominent changes:

  • File uploading via fileInput() now works for Internet Explorer 8 and 9. Note, however, that IE 8/9 do not support multiple files from a single file input. If you need to upload multiple files, you must use one file input for each file. Unlike in modern web browsers, no progress bar will display when uploading files in IE 8/9.
  • Shiny now supports single-file applications: instead of needing two separate files, server.R and ui.R, you can now create an application with single file named app.R. This also makes it easier to distribute example Shiny code, because you can run an entire app by simply copying and pasting the code for a single-file app into the R console. Here’s a simple example of a single-file app:
    ## app.R
    server <- function(input, output) {
      output$distPlot <- renderPlot({
        hist(rnorm(input$obs), col = 'darkgray', border = 'white')
    ui <- shinyUI(fluidPage(
          sliderInput("obs", "Number of observations:",
                      min = 10, max = 500, value = 100)
    shinyApp(ui = ui, server = server)

    See the single-file app article for more.

  • We’ve added progress bars, which allow you to indicate to users that something is happening when there’s a long-running computation. The progress bar will show at the top of the browser window, as shown here:progress
    Read the progress bar article for more.
  • We’ve upgraded the DataTables Javascript library from 1.9.4 to 1.10.2. We’ve tried to support backward compatibility as much as possible, but this might be a breaking change if you’ve customized the DataTables options in your apps. This is because some option names have changed; for example, aLengthMenu has been renamed to lengthMenu. Please read the article on DataTables on the Shiny website for more information about updating Shiny apps that use DataTables 1.9.4.

In addition to the changes listed above, there are some smaller updates:

  • Searching in DataTables is case-insensitive and the search strings are not treated as regular expressions by default now. If you want case-sensitive searching or regular expressions, you can use the configuration options search$caseInsensitive and search$regex, e.g. renderDataTable(..., options = list(search = list(caseInsensitve = FALSE, regex = TRUE))).
  • Shiny has switched from reference classes to R6.
  • Reactive log performance has been greatly improved.
  • Exported createWebDependency. It takes an htmltools::htmlDependency object and makes it available over Shiny’s built-in web server.
  • Custom output bindings can now render htmltools::htmlDependency objects at runtime using Shiny.renderDependencies().

Please read the NEWS file for a complete list of changes, and let us know if you have any comments or questions.

Devtools 1.6 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. Learn more at You can get the latest version with:


We’ve made a lot of improvements to the install and release process:

  • Installation functions now default to build_vignettes = FALSE, and only install required dependencies (not suggested). They also store a lot of useful metadata.
  • install_github() got a lot of love. install_github("user/repo") is now the preferred way to install a package from github (older forms with explicit username parameter are now deprecated). You can supply the host argument to install packages from a local github enterprise installation. You can get the latest release with user/repo@*release.
  • session_info() uses package installation metdata to show you exactly how every package was installed (locally, from CRAN, from github, …)
  • release() uses new webform-based submission process for CRAN, as implemented in submit_cran().
  • You can add arbitrary extra questions to release() by defining a function release_questions() in your package. It should return a character vector of questions to ask.

We’ve also added a number of functions to make it easy to get started with various aspects of the package development:

  • use_data() adds data to a package, either in data/ (external data) or in R/sysdata.rda (internal data). use_data_raw() sets up data-raw/ for your reproducible data generation scripts.
  • use_package() sets dependencies and reminds you how to use them.
  • use_rcpp() gets you ready to use Rcpp.
  • use_testthat() sets up testing infrastructure with testthat.
  • use_travis() adds a .travis.yml file and tells you how to get started with travis ci.
  • use_vignette() creates a draft vignette using Rmarkdown.

There were many other minor improvements and bug fixes. See the release notes for complete list of changes.

testthat 0.9 is now available on CRAN. Testthat makes it easy to turn the informal testing that you’re already doing into formal automated tests. Learn more at

This version of testthat has four important new features that bring testthat up to speed with unit testing frameworks in other languages:

  • You can skip() tests with an informative message, if their prerequisites are not available. This is particularly use for CRAN packages, since tests only have a limited amount of time to run. Use skip_on_cran() skip selected tests when run on CRAN.
    test_that("a complicated simulation takes a long time", {
  • Experiment with behaviour driven development with the new describe() function contributed by Dirk Schumacher:
    describe("matrix()", {
      it("can be multiplied by a scalar", {
        m1 <- matrix(1:4, 2, 2)
        m2 <- m1 * 2
        expect_equivalent(matrix(1:4 * 2, 2, 2), m2)
  • Use with_mock() to “mock” functions, replacing slow, resource intensive or inconsistent functions with your own quick approximations. This is particularly useful when you want to test functions that call web APIs without being connected to the internet. Contributed by Kirill Müller.
  • Sometimes it’s difficult to figure out exactly what a function should return and instead you just want to make sure that it returned the same thing as the last time you ran it. A new expectation, expect_equal_to_reference(), makes this easy to do. Contributed by Jon Clayden.

Other changes of note: auto_test_package() is working again (and uses devtools::load_all() to load the code), random praise has been re-enabled (after being accidentally disabled), and expect_identical() works better with R-devel. See the release notes for complete list of changes.

Packrat is now available on CRAN, with version 0.4.1-1! Packrat is an R package that helps you manage your project’s R package dependencies in an isolated, reproducible and portable way.

Install packrat from CRAN with:


In particular, this release provides better support for local repositories. Local repositories are just folders containing package sources (currently as folders themselves).

One can now specify local repositories on a per-project basis by using:

packrat::set_opts(local.repos = <pathToRepo>)

and then using


to install that package from the local repository.

There is also experimental support for a global ‘cache’ of packages, which can be used across projects. If you wish to enable this feature, you can use (note that it is disabled by default):

packrat::set_opts(use.cache = TRUE)

in each project where you would utilize the cache.

By doing so, if one project installs or uses e.g. Shiny 0.10.1 for CRAN, and another version uses that same package, packrat will look up the installed version of that package in the cache — this should greatly speed up project initialization for new projects that use projects overlapping with other packrat projects with large, overlapping dependency chains.

In addition, this release provides a number of usability improvements and bug fixes for Windows.

Please visit for more information and a guide to getting started with Packrat.

httr 0.5 is now available on CRAN. The httr packages makes it easy to talk to web APIs from R. Learn more in the quick start vignette.

This release is mostly bug fixes and minor improvements, but there is one major new feature: you can now save response bodies directly to disk.

# Download the latest version of rstudio for windows
url <- ""
GET(url, write_disk(basename(url)), progress())

There is also some preliminary support for HTTP caching (see cache_info() and rerequest()). See the release notes for complete details.

R Markdown is a framework for writing versatile, reproducible reports from R. With R Markdown, you write a simple plain text report and then render it to create polished output. You can:

  1. Transform your file into a pdf, html, or Microsoft Word document—even a slideshow—at the click of a button.
  2. Embed R code into your report. When you render the file, R will run the code and insert its results into your report. Use this feature to add graphs and tables to your report: if your data ever changes, you can update your figures by re-rendering the report.
  3. Make interactive documents and slideshows. Your report becomes interactive when you embed Shiny code.

We’ve created a cheat sheet to help you master R Markdown. Download your copy here. You can also learn more about R Markdown at and Introduction to R Markdown.


Shiny v0.10.1 has been released to CRAN. You can either install it from a CRAN mirror, or update it if you have installed a previous version.

install.packages('shiny', repos = '')
# or update your installed packages
# update.packages(ask = FALSE, repos = '')

The most prominent change in this patch release is that we added full Unicode support on Windows. Shiny apps running on Windows must use the UTF-8 encoding for ui.R and server.R (also the optional global.R,, and DESCRIPTION) if they contain non-ASCII characters. See this article for details and examples:

Chinese characters in a shiny app

Chinese characters in a shiny app

Please note although we require UTF-8 for the app components, UTF-8 is not a general requirement for any other files. If you read/write text files in an app, you are free to use any encoding you want, e.g. you can readLines('foo.txt', encoding = 'Windows-1252'). The article above has explained it in detail.

Other changes include:

  • runGitHub() also allows the 'username/repo' syntax now, which is equivalent to runGitHub('repo', 'username'). (#427)
  • navbarPage() now accepts a windowTitle parameter to set the web browser page title to something other than the title displayed in the navbar.
  • Added an inline argument to textOutput(), imageOutput(), plotOutput(), and htmlOutput(). When inline = TRUE, these outputs will be put in span() instead of the default div(). This occurs automatically when these outputs are created via the inline expressions (e.g. `r renderText(expr)`) in R Markdown documents. See an R Markdown example at (#512)
  • Added support for option groups in the select/selectize inputs. When the choices argument for selectInput()/selectizeInput() is a list of sub-lists and any sub-list is of length greater than 1, the HTML tag <optgroup> will be used. See an example at here (#542)

Please let us know if you have any comments or questions.

httr 0.4 is now available on CRAN. The httr packages makes it easy to talk to web APIs from R.

The most important new features are two new vignettes to help you get started and to help you make wrappers for web APIs. Other important improvements include:

  • New headers() and cookies() functions to extract headers and cookies from responses. status_code() returns HTTP status codes.
  • POST() (and PUT(), and PATCH()) now have an encode argument that determine how the body is encoded. Valid values are “multipart”, “form” or “json”, and the multipart argument is now deprecated.
  • GET(..., progress()) will display a progress bar, useful if you’re doing large uploads or downloads.
  • verbose() gives you considerably more control over degree of verbosity, and defaults have been selected to be more helpful for the most common cases.
  • NULL query parameters are now dropped automatically.

There are number of other minor improvements and bug fixes, as described by the release notes.


Get every new post delivered to your Inbox.

Join 693 other followers