I’m very pleased to announce that dplyr 0.3 is now available from CRAN. Get the latest version by running:

install.packages("dplyr")

There are four major new features:

  • Four new high-level verbs: distinct(), slice(), rename(), and transmute().
  • Three new helper functions: between(), count(), and data_frame().
  • More flexible join specifications.
  • Support for row-based set operations.

There are two new features of interest to developers. They make it easier to write packages that use dplyr:

  • It’s now much easier to program with dplyr (using standard evaluation).
  • Improved database backends.

I describe each of these in turn below.

New verbs

distinct() returns distinct (unique) rows of a table:

library(nycflights13)
# Find all origin-destination pairs
flights %>% 
  select(origin, dest) %>%
  distinct()
#> Source: local data frame [224 x 2]
#> 
#>    origin dest
#> 1     EWR  IAH
#> 2     LGA  IAH
#> 3     JFK  MIA
#> 4     JFK  BQN
#> 5     LGA  ATL
#> ..    ...  ...

slice() allows you to select rows by position: positive integers keep the rows at those positions, and negative integers drop them:

# Get the first flight to each destination
flights %>% 
  group_by(dest) %>%
  slice(1)
#> Source: local data frame [105 x 16]
#> Groups: dest
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013    10   1     1955        -6     2213       -35      B6  N554JB
#> 2  2013    10   1     1149       -10     1245       -14      B6  N346JB
#> 3  2013     1   1     1315        -2     1413       -10      EV  N13538
#> 4  2013     7   6     1629        14     1954         1      UA  N587UA
#> 5  2013     1   1      554        -6      812       -25      DL  N668DN
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)
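Negative positions drop rows rather than keeping them; a minimal sketch on a toy data frame:

```r
library(dplyr)

df <- data.frame(x = 1:5)

# Drop the first row; keep rows 2 through 5
slice(df, -1)
```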

transmute() and rename() are variants of mutate() and select(): transmute() drops all columns that you don't explicitly mention, while rename() keeps all columns that you don't explicitly mention. Together with mutate() and select(), they complete this table:

                             Drop others    Keep others
Rename & reorder variables   select()       rename()
Compute new variables        transmute()    mutate()
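To make the table concrete, here's a minimal sketch on a toy data frame (the data frame and column names are made up for illustration):

```r
library(dplyr)

df <- data.frame(a = 1:3, b = 4:6)

# transmute() keeps only the variables it computes
transmute(df, total = a + b)

# rename() keeps every column, changing only the names you mention
rename(df, first = a)
```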

New helpers

data_frame(), contributed by Kevin Ushey, is a nice way to create data frames:

  • It never changes the type of its inputs (i.e. no more stringsAsFactors = FALSE!)
    data.frame(x = letters) %>% sapply(class)
    #>        x 
    #> "factor"
    data_frame(x = letters) %>% sapply(class)
    #>           x 
    #> "character"
  • Nor does it change the names of variables:
    data.frame(`crazy name` = 1) %>% names()
    #> [1] "crazy.name"
    data_frame(`crazy name` = 1) %>% names()
    #> [1] "crazy name"
  • It evaluates its arguments lazily and in order:
    data_frame(x = 1:5, y = x ^ 2)
    #> Source: local data frame [5 x 2]
    #> 
    #>   x  y
    #> 1 1  1
    #> 2 2  4
    #> 3 3  9
    #> 4 4 16
    #> 5 5 25
  • It adds the tbl_df class to its output, never adds row names, and only recycles vectors of length 1 (recycling is a frequent source of bugs in my experience).
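For instance, the recycling rule means a length-1 value is repeated while anything longer must match exactly. (Note: in current versions of dplyr/tibble, data_frame() has been superseded by tibble(), so this sketch may emit a deprecation warning.)

```r
library(dplyr)

# A length-1 value is recycled to the length of the other columns...
data_frame(x = 1:3, y = 0)

# ...but a length-2 value is an error, where data.frame() would
# silently recycle it:
# data_frame(x = 1:4, y = 1:2)  # errors
```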

The count() function wraps up the common combination of group_by() and summarise():

# How many flights to each destination?
flights %>% count(dest)
#> Source: local data frame [105 x 2]
#> 
#>    dest     n
#> 1   ABQ   254
#> 2   ACK   265
#> 3   ALB   439
#> 4   ANC     8
#> 5   ATL 17215
#> ..  ...   ...
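The example above is shorthand for the grouped summary it replaces; a minimal sketch of the equivalence on a toy data frame:

```r
library(dplyr)

df <- data.frame(g = c("a", "a", "b"))

count(df, g)          # shorthand...

df %>%                # ...for this pipeline
  group_by(g) %>%
  summarise(n = n())
```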

# Which planes flew the most?
flights %>% count(tailnum, sort = TRUE)
#> Source: local data frame [4,044 x 2]
#> 
#>    tailnum    n
#> 1          2512
#> 2   N725MQ  575
#> 3   N722MQ  513
#> 4   N723MQ  507
#> 5   N711MQ  486
#> ..     ...  ...

# What's the total carrying capacity of planes by year of purchase?
planes %>% count(year, wt = seats)
#> Source: local data frame [47 x 2]
#> 
#>    year   n
#> 1  1956 102
#> 2  1959  18
#> 3  1963  10
#> 4  1965 149
#> 5  1967   9
#> ..  ... ...
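The third new helper, between(), is a shortcut for the range check x >= left & x <= right; a minimal sketch:

```r
library(dplyr)

x <- c(1, 5, 10)
between(x, 2, 9)
#> [1] FALSE  TRUE FALSE
```

It's particularly handy inside filter(), where it reads more clearly than the two-sided comparison it replaces.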

Better joins

You can now join by different variables in each table:

narrow <- flights %>% select(origin, dest, year:day)

# Add destination airport metadata
narrow %>% left_join(airports, c("dest" = "faa"))
#> Source: local data frame [336,776 x 11]
#> 
#>    dest origin year month day                            name   lat    lon
#> 1   IAH    EWR 2013     1   1    George Bush Intercontinental 29.98 -95.34
#> 2   IAH    LGA 2013     1   1    George Bush Intercontinental 29.98 -95.34
#> 3   MIA    JFK 2013     1   1                      Miami Intl 25.79 -80.29
#> 4   BQN    JFK 2013     1   1                              NA    NA     NA
#> 5   ATL    LGA 2013     1   1 Hartsfield Jackson Atlanta Intl 33.64 -84.43
#> ..  ...    ...  ...   ... ...                             ...   ...    ...
#> Variables not shown: alt (int), tz (dbl), dst (chr)

# Add origin airport metadata
narrow %>% left_join(airports, c("origin" = "faa"))
#> Source: local data frame [336,776 x 11]
#> 
#>    origin dest year month day                name   lat    lon alt tz dst
#> 1     EWR  IAH 2013     1   1 Newark Liberty Intl 40.69 -74.17  18 -5   A
#> 2     LGA  IAH 2013     1   1          La Guardia 40.78 -73.87  22 -5   A
#> 3     JFK  MIA 2013     1   1 John F Kennedy Intl 40.64 -73.78  13 -5   A
#> 4     JFK  BQN 2013     1   1 John F Kennedy Intl 40.64 -73.78  13 -5   A
#> 5     LGA  ATL 2013     1   1          La Guardia 40.78 -73.87  22 -5   A
#> ..    ...  ...  ...   ... ...                 ...   ...    ... ... .. ...

(right_join() and outer_join() implementations are planned for dplyr 0.4.)

Set operations

You can use intersect(), union() and setdiff() with data frames, data tables and databases:

jfk_planes <- flights %>% 
  filter(origin == "JFK") %>% 
  select(tailnum) %>% 
  distinct()
lga_planes <- flights %>% 
  filter(origin == "LGA") %>% 
  select(tailnum) %>% 
  distinct()

# Planes that fly out of either JFK or LGA
nrow(union(jfk_planes, lga_planes))
#> [1] 3592

# Planes that fly out of both JFK and LGA
nrow(intersect(jfk_planes, lga_planes))
#> [1] 1311

# Planes that fly out of JFK but not LGA
nrow(setdiff(jfk_planes, lga_planes))
#> [1] 647

Programming with dplyr

You can now program with dplyr – every function that uses non-standard evaluation (NSE) also has a standard evaluation (SE) twin that ends in _. For example, the SE version of filter() is called filter_(). The SE version of each function has similar arguments, but they must be explicitly “quoted”. Usually the best way to do this is to use ~:

airport <- "ANC"
# NSE version
filter(flights, dest == airport)
#> Source: local data frame [8 x 16]
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     7   6     1629        14     1954         1      UA  N587UA
#> 2  2013     7  13     1618         3     1955         2      UA  N572UA
#> 3  2013     7  20     1618         3     2003        10      UA  N567UA
#> 4  2013     7  27     1617         2     1906       -47      UA  N559UA
#> 5  2013     8   3     1615         0     2003        10      UA  N572UA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

# Equivalent SE code:
criteria <- ~dest == airport
filter_(flights, criteria)
#> Source: local data frame [8 x 16]
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     7   6     1629        14     1954         1      UA  N587UA
#> 2  2013     7  13     1618         3     1955         2      UA  N572UA
#> 3  2013     7  20     1618         3     2003        10      UA  N567UA
#> 4  2013     7  27     1617         2     1906       -47      UA  N559UA
#> 5  2013     8   3     1615         0     2003        10      UA  N572UA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)
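The SE verbs make it possible to write your own wrapper functions. Here's a hypothetical helper (the function name and the lazyeval::interp() usage are illustrative, not from the post):

```r
library(dplyr)
library(lazyeval)

# A reusable filter: the column name is fixed, the value is a parameter
flights_to <- function(df, dest_code) {
  filter_(df, interp(~ dest == x, x = dest_code))
}

toy <- data.frame(dest = c("ANC", "JFK", "ANC"))
flights_to(toy, "ANC")  # keeps the two ANC rows
```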

To learn more, read the Non-standard evaluation vignette. This new approach is powered by the lazyeval package, which provides all the tools needed to implement NSE consistently and correctly; I'll be using the same approach in all my packages going forward.

Database backends

The database backend system has been completely overhauled in order to make it possible to add backends in other packages, and to support a much wider range of databases. If you’re interested in implementing a new dplyr backend, please check out vignette("new-sql-backend") – it’s really not that much work.

The first package to take advantage of this system is MonetDB.R, which now provides the MonetDB backend for dplyr.

Other changes

As well as the big new features described here, dplyr 0.3 also fixes many bugs and makes numerous minor improvements. See the release notes for a complete list of the changes.

Shiny v0.10.2 has been released to CRAN. To install it:

install.packages('shiny')

This version of Shiny requires R 3.0.0 or higher (note the current version of R is 3.1.1). R 2.15.x is no longer supported.

Here are the most prominent changes:

  • File uploading via fileInput() now works for Internet Explorer 8 and 9. Note, however, that IE 8/9 do not support multiple files from a single file input. If you need to upload multiple files, you must use one file input for each file. Unlike in modern web browsers, no progress bar will display when uploading files in IE 8/9.
  • Shiny now supports single-file applications: instead of needing two separate files, server.R and ui.R, you can now create an application with a single file named app.R. This also makes it easier to distribute example Shiny code, because you can run an entire app by simply copying and pasting its code into the R console. Here’s a simple example of a single-file app:
    ## app.R
    server <- function(input, output) {
      output$distPlot <- renderPlot({
        hist(rnorm(input$obs), col = 'darkgray', border = 'white')
      })
    }
    
    ui <- shinyUI(fluidPage(
      sidebarLayout(
        sidebarPanel(
          sliderInput("obs", "Number of observations:",
                      min = 10, max = 500, value = 100)
        ),
        mainPanel(plotOutput("distPlot"))
      )
    ))
    
    shinyApp(ui = ui, server = server)
    

    See the single-file app article for more.

  • We’ve added progress bars, which allow you to indicate to users that something is happening when there’s a long-running computation. The progress bar shows at the top of the browser window.
    Read the progress bar article for more.
  • We’ve upgraded the DataTables Javascript library from 1.9.4 to 1.10.2. We’ve tried to support backward compatibility as much as possible, but this might be a breaking change if you’ve customized the DataTables options in your apps. This is because some option names have changed; for example, aLengthMenu has been renamed to lengthMenu. Please read the article on DataTables on the Shiny website for more information about updating Shiny apps that use DataTables 1.9.4.

In addition to the changes listed above, there are some smaller updates:

  • Searching in DataTables is now case-insensitive by default, and search strings are no longer treated as regular expressions. If you want case-sensitive searching or regular expressions, use the configuration options search$caseInsensitive and search$regex, e.g. renderDataTable(..., options = list(search = list(caseInsensitive = FALSE, regex = TRUE))).
  • Shiny has switched from reference classes to R6.
  • Reactive log performance has been greatly improved.
  • Exported createWebDependency. It takes an htmltools::htmlDependency object and makes it available over Shiny’s built-in web server.
  • Custom output bindings can now render htmltools::htmlDependency objects at runtime using Shiny.renderDependencies().

Please read the NEWS file for a complete list of changes, and let us know if you have any comments or questions.

Devtools 1.6 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. Learn more at http://r-pkgs.had.co.nz/. You can get the latest version with:

install.packages("devtools")

We’ve made a lot of improvements to the install and release process:

  • Installation functions now default to build_vignettes = FALSE, and only install required dependencies (not suggested). They also store a lot of useful metadata.
  • install_github() got a lot of love. install_github("user/repo") is now the preferred way to install a package from github (older forms with an explicit username parameter are deprecated). You can supply the host argument to install packages from a local GitHub Enterprise installation, and you can get the latest release with user/repo@*release.
  • session_info() uses package installation metadata to show you exactly how every package was installed (locally, from CRAN, from github, …).
  • release() uses new webform-based submission process for CRAN, as implemented in submit_cran().
  • You can add arbitrary extra questions to release() by defining a function release_questions() in your package. It should return a character vector of questions to ask.
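For example, a package could define something like this (the questions are made up for illustration):

```r
# In your package's R/ directory: release() will ask these questions
# in addition to its standard checklist
release_questions <- function() {
  c(
    "Have you re-run the benchmarks?",
    "Have you updated the screenshots in the README?"
  )
}
```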

We’ve also added a number of functions to make it easy to get started with various aspects of the package development:

  • use_data() adds data to a package, either in data/ (external data) or in R/sysdata.rda (internal data). use_data_raw() sets up data-raw/ for your reproducible data generation scripts.
  • use_package() sets dependencies and reminds you how to use them.
  • use_rcpp() gets you ready to use Rcpp.
  • use_testthat() sets up testing infrastructure with testthat.
  • use_travis() adds a .travis.yml file and tells you how to get started with travis ci.
  • use_vignette() creates a draft vignette using Rmarkdown.

There were many other minor improvements and bug fixes. See the release notes for a complete list of changes.

Are you headed to Strata? It’s just around the corner!

We particularly hope to see you at R Day on October 15, where we will cover a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from Hadley Wickham, Winston Chang, Garrett Grolemund, J.J. Allaire, and Yihui Xie who are all working on fascinating new ways to keep the R ecosystem apace of the challenges facing those who work with data.

If you plan to stay for the full Strata Conference+Hadoop World be sure to look us up in the Innovator Pavilion booth P14 during the Expo Hall hours. We’ll have the latest books from RStudio authors and “shiny” t-shirts to win. Share with us what you’re doing with RStudio and get your product and company questions answered by RStudio employees.

See you in New York City!

ShinyApps.io dashboard

Some of the most innovative Shiny apps share data across user sessions. Some apps share the results of one session to use in future sessions, others track user characteristics over time and make them available as part of the app.

This level of sophistication creates tricky design choices when you host your app on a server. A nimble server will open new instances of your app to speed up performance, or relaunch your app on a bigger server when it becomes popular. How should you ensure that your app can find and use its data trail along the way?

Shiny Server developer Jeff Allen explains the best ways to share data between sessions in Share data across sessions with ShinyApps.io, a new article at the Shiny Dev Center.

Registration is now open for the next Master R Development workshop led by Hadley Wickham, author of over 30 R packages and the Advanced R book. The workshop will be held on January 19 and 20th in the San Francisco bay area.

The workshop is a two day course on advanced R practices and package development. You’ll learn the three main paradigms of R programming: functional programming, object oriented programming and metaprogramming, as well as how to make R packages, the key to well-documented, well-tested and easily-distributed R code.

To learn more, or register visit rstudio-sfbay.eventbrite.com.

testthat 0.9 is now available on CRAN. Testthat makes it easy to turn the informal testing that you’re already doing into formal automated tests. Learn more at http://r-pkgs.had.co.nz/tests.html

This version of testthat has four important new features that bring testthat up to speed with unit testing frameworks in other languages:

  • You can skip() tests with an informative message if their prerequisites are not available. This is particularly useful for CRAN packages, since tests only have a limited amount of time to run. Use skip_on_cran() to skip selected tests when run on CRAN.
    test_that("a complicated simulation takes a long time", {
      skip_on_cran()
    
      ...
    })
  • Experiment with behaviour driven development with the new describe() function contributed by Dirk Schumacher:
    describe("matrix()", {
      it("can be multiplied by a scalar", {
        m1 <- matrix(1:4, 2, 2)
        m2 <- m1 * 2
        expect_equivalent(matrix(1:4 * 2, 2, 2), m2)
      })
    })
  • Use with_mock() to “mock” functions, replacing slow, resource intensive or inconsistent functions with your own quick approximations. This is particularly useful when you want to test functions that call web APIs without being connected to the internet. Contributed by Kirill Müller.
  • Sometimes it’s difficult to figure out exactly what a function should return and instead you just want to make sure that it returned the same thing as the last time you ran it. A new expectation, expect_equal_to_reference(), makes this easy to do. Contributed by Jon Clayden.
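A hypothetical use of expect_equal_to_reference() (the file name is illustrative; on the first run the reference file is created, and on later runs the result is compared against it):

```r
library(testthat)

test_that("column means are stable", {
  expect_equal_to_reference(colMeans(mtcars), "mtcars-means.rds")
})
```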

Other changes of note: auto_test_package() is working again (and now uses devtools::load_all() to load the code), random praise has been re-enabled (after being accidentally disabled), and expect_identical() works better with R-devel. See the release notes for a complete list of changes.

Want to see who is using your Shiny apps and what they are doing while they are there?

Google Analytics is a popular way to track traffic to your website. With Google Analytics, you can see what sort of person comes to your website, where they arrive from, and what they do while they are there.

Since Shiny apps are web pages, you can also use Google Analytics to keep an eye on who visits your app and how they use it.

Add Google Analytics to a Shiny app is a new article at the Shiny dev center that will show you how to set up analytics for a Shiny app. Some knowledge of jQuery is required.

Packrat is now available on CRAN, with version 0.4.1-1! Packrat is an R package that helps you manage your project’s R package dependencies in an isolated, reproducible and portable way.

Install packrat from CRAN with:

install.packages("packrat")

In particular, this release provides better support for local repositories. Local repositories are just folders containing package sources (currently, each package as an unpacked source folder).

One can now specify local repositories on a per-project basis by using:

packrat::set_opts(local.repos = <pathToRepo>)

and then using

packrat::install_local(<pkgName>)

to install that package from the local repository.

There is also experimental support for a global ‘cache’ of packages, which can be used across projects. If you wish to enable this feature, you can use (note that it is disabled by default):

packrat::set_opts(use.cache = TRUE)

in each project where you would utilize the cache.

By doing so, if one project installs e.g. Shiny 0.10.1 from CRAN, and another project uses that same package version, packrat will look up the installed copy in the cache rather than reinstalling it. This should greatly speed up initialization for new projects whose dependency chains overlap with those of other packrat projects.

In addition, this release provides a number of usability improvements and bug fixes for Windows.

Please visit rstudio.github.io/packrat for more information and a guide to getting started with Packrat.

httr 0.5 is now available on CRAN. The httr package makes it easy to talk to web APIs from R. Learn more in the quick start vignette.

This release is mostly bug fixes and minor improvements, but there is one major new feature: you can now save response bodies directly to disk.

library(httr)
# Download the latest version of rstudio for windows
url <- "http://download1.rstudio.org/RStudio-0.98.1049.exe"
GET(url, write_disk(basename(url)), progress())

There is also some preliminary support for HTTP caching (see cache_info() and rerequest()). See the release notes for complete details.
