
RStudio is pleased to notify account holders of recent updates to shinyapps.io.

Note: Action is required if your Shiny application URL includes internal.shinyapps.io

What’s New?

We have updated the authentication and invitation system to improve the user experience, security, and extensibility for anyone with private applications. You may have already noticed some changes to the authentication flow for your applications if you are a Standard or Professional account holder.

As a part of these changes, we have eliminated the IFRAME and the associated RStudio branding, except for customers using custom domains where the IFRAME is still required.

For customers on free plans, we will replace the RStudio branding bar with a softer, less intrusive branding overlay.

Possible Action Required

If, like most accounts, you have used the URL provided by shinyapps.io for your Shiny applications, no action is needed. Your applications will simply benefit from the improvements.

If your Shiny application URL begins with internal.shinyapps.io, you must change it.

To complete the update, we will shut down all internal.shinyapps.io URLs on March 2, 2016. If you have publicly linked your application to internal.shinyapps.io, or if you have embedded applications on your website by referring directly to the internal.shinyapps.io URL, you must change your links to the URL shown in the shinyapps.io dashboard for your application.

Relatively few accounts are affected, and no action is required for most shinyapps.io users. If you have questions, please contact shinyapps-support@rstudio.com.

Thank you all for your help and thanks for using shinyapps.io!

The RStudio shinyapps.io Team

We’re pleased to announce that a new release of RStudio (v0.99.878) is available for download now. Highlights of this release include:

  • RStudio Addins
  • R Markdown outline view and inline chunk execution
  • Multiple source windows
  • Customizable keyboard shortcuts
  • Emacs keybindings
  • RStudio Server Pro enhancements: multiple concurrent R sessions, multiple R versions, and project sharing

There are lots of other small improvements across the product; check out the release notes for full details.

RStudio Addins

RStudio Addins provide a mechanism for executing custom R functions interactively from within the RStudio IDE, either through keyboard shortcuts or through the Addins menu. Coupled with the rstudioapi package, addins let you write R code that interacts with and modifies the contents of documents open in RStudio.

An addin can be as simple as a function that inserts a commonly used snippet of text, and as complex as a Shiny application that accepts input from the user and uses it to transform the contents of the active editor. The sky is the limit!
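
For instance, a minimal text-insertion addin might look like the sketch below (the function name is made up for illustration; in a package it would be registered through a small inst/rstudio/addins.dcf file):

# Hypothetical addin: insert " %>% " at the cursor position.
# Assumes the rstudioapi package is installed.
insertPipeAddin <- function() {
  rstudioapi::insertText(" %>% ")
}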

Here’s an example of an addin that enables interactive subsetting of a data frame with a live preview:

[Animated screenshot: the subset addin in action]

This addin is implemented using a Shiny Gadget (see the source code for more details). RStudio Addins are distributed as R packages; once you’ve installed an R package that contains addins, they’ll immediately become available within RStudio.
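
To give a feel for the shape of a gadget-based addin, here is a rough sketch (this is not the subset addin’s source; the example is invented and assumes the shiny, miniUI, and rstudioapi packages):

library(shiny)
library(miniUI)

# Hypothetical addin: upper-case the current editor selection via a tiny gadget.
uppercaseAddin <- function() {
  ui <- miniPage(
    gadgetTitleBar("Uppercase selection"),
    miniContentPanel("Press Done to upper-case the selected text.")
  )
  server <- function(input, output, session) {
    observeEvent(input$done, {
      ctx <- rstudioapi::getActiveDocumentContext()
      sel <- ctx$selection[[1]]
      rstudioapi::modifyRange(sel$range, toupper(sel$text), id = ctx$id)
      stopApp()
    })
  }
  runGadget(ui, server)
}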

You can learn more about using and developing addins here: http://rstudio.github.io/rstudioaddins/.

R Markdown

We’ve made a number of improvements to R Markdown authoring. There’s now an optional outline view that enables quick navigation across larger documents:

[Screenshot: the R Markdown outline view]

We’ve also added inline UI to code chunks for running individual chunks, running all previous chunks, and specifying various commonly used knit options:

[Screenshot: inline chunk execution and options UI]

Multiple Source Windows

There are two ways to open a new source window:

  • Pop out an editor: click the Show in New Window button in any source editor tab.
  • Tear off a pane: drag a tab out of the main window and onto the desktop; a new source window will be opened where you dropped the tab.

You can have as many source windows open as you like. Each source window has its own set of tabs; these tabs are independent of the tabs in RStudio’s main source pane.

Customizable Keyboard Shortcuts

You can now customize keyboard shortcuts in RStudio — you can bind keys to execute RStudio application commands, editor commands, or even user-defined R functions.

Open the keyboard shortcut editor by clicking Tools -> Modify Keyboard Shortcuts...

This will present a dialog that enables remapping of all available editor commands (commands that affect the current document’s contents, or the current selection) and RStudio commands (commands whose actions are scoped beyond just the current editor).

Emacs Keybindings

We’ve introduced a new keybindings mode to go along with the default bindings and Vim bindings already supported. Emacs mode provides a base set of keybindings for navigation and selection, including:

  • C-p, C-n, C-b and C-f to move the cursor up, down, left and right by characters,
  • M-b, M-f to move left and right by words,
  • C-a, C-e to navigate to the start or end of the line,
  • C-k to ‘kill’ to end of line, and C-y to ‘yank’ the last kill,
  • C-s, C-r to initiate an Emacs-style incremental search (forward / reverse),
  • C-Space to set/unset mark, and C-w to kill the marked region.

There are some additional keybindings that Emacs Speaks Statistics (ESS) users might find familiar:

  • C-c C-v displays help for the object under the cursor,
  • C-c C-n evaluates the current line / selection,
  • C-x b allows you to visit another file,
  • M-C-a moves the cursor to the beginning of the current function,
  • M-C-e moves to the end of the current function,
  • C-c C-f evaluates the current function.

We’ve also introduced a number of keybindings that allow you to interact with the IDE as you might normally do in Emacs:

  • C-x C-n to create a new document,
  • C-x C-f to find / open an existing document,
  • C-x C-s to save the current document,
  • C-x k to close the current file.

RStudio Server Pro

We’ve introduced a number of significant enhancements to RStudio Server Pro in this release, including:

  • The ability to open multiple concurrent R sessions. Multiple concurrent sessions are useful for running multiple analyses in parallel and for switching between different tasks.
  • Flexible use of multiple R versions on the same server. This is useful when you have some analysts or projects that require older versions of R or R packages and some that require newer versions.
  • Project sharing for easy collaboration within workgroups. When you share a project, RStudio Server securely grants other users access to the project, and when multiple users are active in the project at once, you can see each other’s activity and work together in a shared editor.

See the updated RStudio Server Pro page for additional details, including a set of videos which demonstrate the new features.

Try it Out

RStudio v0.99.878 is available for download now. We hope you enjoy the new release and, as always, please let us know how it’s working and what else we can do to make the product better.


On May 19 and 20, 2016, Hadley Wickham will teach his two-day Master R Developer Workshop in the centrally located European city of Amsterdam.

Are you ready to upgrade your R skills?  Register soon to secure your seat.

For the convenience of those who may travel to the workshop, it will be held at the Hotel NH Amsterdam Schiphol Airport.

Hadley teaches a few workshops each year and this is the only one planned for Europe. They are very popular and hotel rooms are limited. Please register soon.

We look forward to seeing you in May!

We are pleased to announce version 1.0.0 of the memoise package is now available on CRAN. Memoization stores the value of a function call and returns the cached result when the function is called again with the same arguments.

The following function computes Fibonacci numbers and illustrates the usefulness of memoization. Because the function is defined recursively, intermediate results can be looked up rather than recalculated at each level of recursion, which reduces the runtime drastically. When the memoised function is called again with the same argument, the final result is simply returned from the cache, so no measurable execution time is recorded.

library(memoise)

fib <- function(n) {
  if (n < 2) {
    return(n)
  } else {
    return(fib(n-1) + fib(n-2))
  }
}
system.time(x <- fib(30))
#>    user  system elapsed 
#>   4.454   0.010   4.472
fib <- memoise(fib)
system.time(y <- fib(30))
#>    user  system elapsed 
#>   0.004   0.000   0.004
system.time(z <- fib(30))
#>    user  system elapsed 
#>       0       0       0
all.equal(x, y)
#> [1] TRUE
all.equal(x, z)
#> [1] TRUE

Memoization is also very useful for storing queries to external resources, such as network APIs and databases.
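
For instance, a rough sketch of caching a web request might look like this (the wrapper is illustrative and assumes the httr package):

library(memoise)
library(httr)

# Repeated calls with the same URL return the cached body instead of
# hitting the network again.
cached_get <- memoise(function(url) content(GET(url)))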

Improvements in this release make memoised functions much nicer to use interactively. Memoised functions now have a print method which outputs the original function definition rather than the memoization code.

mem_sum <- memoise(sum)
mem_sum
#> Memoised Function:
#> function (..., na.rm = FALSE)  .Primitive("sum")

Memoised functions now forward their arguments from the original function rather than simply passing them with .... This allows autocompletion to work transparently for memoised functions and also fixes a bug related to non-constant default arguments. [1]

mem_scan <- memoise(scan)
args(mem_scan)
#> function (file = "", what = double(), nmax = -1L, n = -1L, sep = "", 
#>     quote = if (identical(sep, "\n")) "" else "'\"", dec = ".", 
#>     skip = 0L, nlines = 0L, na.strings = "NA", flush = FALSE, 
#>     fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE, 
#>     multi.line = TRUE, comment.char = "", allowEscapes = FALSE, 
#>     fileEncoding = "", encoding = "unknown", text, skipNul = FALSE) 
#> NULL

Memoisation can now depend on external variables aside from the function arguments. This feature can be used in a variety of ways, such as invalidating the memoisation when a new package is attached.

mem_f <- memoise(runif, ~search())
mem_f(2)
#> [1] 0.009113091 0.988083122
mem_f(2)
#> [1] 0.009113091 0.988083122
library(ggplot2)
mem_f(2)
#> [1] 0.89150566 0.01128355

Or invalidating the memoisation after a given amount of time has elapsed. A timeout() helper function is provided to make this feature easier to use.

mem_f <- memoise(runif, ~timeout(10))
mem_f(2)
#> [1] 0.6935329 0.3584699
mem_f(2)
#> [1] 0.6935329 0.3584699
Sys.sleep(10)
mem_f(2)
#> [1] 0.2008418 0.4538413

Many thanks for this release go to Kirill Müller, who wrote the argument-forwarding implementation and added comprehensive tests to the package. [2, 3]

See the release notes for a complete list of changes.

I’m pleased to announce tidyr 0.4.0. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette. Install it with:

install.packages("tidyr")

There are two big features in this release: support for nested data frames, and improved tools for turning implicit missing values into explicit missing values. These are described in detail below. As well as these big features, all tidyr verbs now handle grouped_df objects created by dplyr, gather() makes a character key column (instead of a factor), and there are lots of other minor fixes and improvements. Please see the release notes for a complete list of changes.
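
As a quick illustration of the gather() change (the tiny data frame here is made up):

library(tidyr)

df <- data.frame(x = 1, y = 2)
gathered <- gather(df, key, value, x, y)
class(gathered$key)
#> [1] "character"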

Nested data frames

nest() and unnest() have been overhauled to support a new way of structuring your data: the nested data frame. In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.

For example, take the gapminder dataset:

library(tidyr)
library(gapminder)
library(dplyr)

gapminder
#> Source: local data frame [1,704 x 6]
#> 
#>        country continent  year lifeExp      pop gdpPercap
#>         (fctr)    (fctr) (int)   (dbl)    (int)     (dbl)
#> 1  Afghanistan      Asia  1952    28.8  8425333       779
#> 2  Afghanistan      Asia  1957    30.3  9240934       821
#> 3  Afghanistan      Asia  1962    32.0 10267083       853
#> 4  Afghanistan      Asia  1967    34.0 11537966       836
#> 5  Afghanistan      Asia  1972    36.1 13079460       740
#> 6  Afghanistan      Asia  1977    38.4 14880372       786
#> 7  Afghanistan      Asia  1982    39.9 12881816       978
#> 8  Afghanistan      Asia  1987    40.8 13867957       852
#> ..         ...       ...   ...     ...      ...       ...

We can plot the trend in life expectancy for each country:

library(ggplot2)

ggplot(gapminder, aes(year, lifeExp)) +
  geom_line(aes(group = country))

[Plot: life expectancy over time, one line per country]

But it’s hard to see what’s going on because of all the overplotting. One interesting solution is to summarise each country with a linear model. To do that most naturally, you want one data frame for each country. nest() creates this structure:

by_country <- gapminder %>% 
  group_by(continent, country) %>% 
  nest()

by_country
#> Source: local data frame [142 x 3]
#> 
#>    continent     country            data
#>       (fctr)      (fctr)          (list)
#> 1       Asia Afghanistan <tbl_df [12,4]>
#> 2     Europe     Albania <tbl_df [12,4]>
#> 3     Africa     Algeria <tbl_df [12,4]>
#> 4     Africa      Angola <tbl_df [12,4]>
#> 5   Americas   Argentina <tbl_df [12,4]>
#> 6    Oceania   Australia <tbl_df [12,4]>
#> 7     Europe     Austria <tbl_df [12,4]>
#> 8       Asia     Bahrain <tbl_df [12,4]>
#> ..       ...         ...             ...

The intriguing thing about this data frame is that it now contains one row per group, and to store the original data we have a new data column, a list of data frames. If we look at the first one, we can see that it contains the complete data for Afghanistan (sans grouping columns):

by_country$data[[1]]
#> Source: local data frame [12 x 4]
#> 
#>     year lifeExp      pop gdpPercap
#>    (int)   (dbl)    (int)     (dbl)
#> 1   1952    43.1  9279525      2449
#> 2   1957    45.7 10270856      3014
#> 3   1962    48.3 11000948      2551
#> 4   1967    51.4 12760499      3247
#> 5   1972    54.5 14760787      4183
#> 6   1977    58.0 17152804      4910
#> 7   1982    61.4 20033753      5745
#> 8   1987    65.8 23254956      5681
#> ..   ...     ...      ...       ...

This form is natural because there are other vectors where you’ll have one value per country. For example, we could fit a linear model to each country with purrr:

by_country <- by_country %>% 
  mutate(model = purrr::map(data, ~ lm(lifeExp ~ year, data = .)))
by_country
#> Source: local data frame [142 x 4]
#> 
#>    continent     country            data   model
#>       (fctr)      (fctr)          (list)  (list)
#> 1       Asia Afghanistan <tbl_df [12,4]> <S3:lm>
#> 2     Europe     Albania <tbl_df [12,4]> <S3:lm>
#> 3     Africa     Algeria <tbl_df [12,4]> <S3:lm>
#> 4     Africa      Angola <tbl_df [12,4]> <S3:lm>
#> 5   Americas   Argentina <tbl_df [12,4]> <S3:lm>
#> 6    Oceania   Australia <tbl_df [12,4]> <S3:lm>
#> 7     Europe     Austria <tbl_df [12,4]> <S3:lm>
#> 8       Asia     Bahrain <tbl_df [12,4]> <S3:lm>
#> ..       ...         ...             ...     ...

Because we used mutate(), we get an extra column containing one linear model per country.

It might seem unnatural to store a list of linear models in a data frame. However, I think it is actually a really convenient and powerful strategy because it allows you to keep related vectors together. If you filter or arrange the vector of models, there’s no way for the other components to get out of sync.
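
For example, something like the following keeps the data and model columns for the retained countries in lockstep (output omitted):

# The data and model list-columns travel with their rows.
by_country %>% filter(continent == "Oceania")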

nest() got us into this form; unnest() gets us out. You give it the list-columns that you want unnested, and tidyr automatically repeats the grouping columns. Unnesting the data gets us back to the original form:

by_country %>% unnest(data)
#> Source: local data frame [1,704 x 6]
#> 
#>    continent     country  year lifeExp      pop gdpPercap
#>       (fctr)      (fctr) (int)   (dbl)    (int)     (dbl)
#> 1       Asia Afghanistan  1952    43.1  9279525      2449
#> 2       Asia Afghanistan  1957    45.7 10270856      3014
#> 3       Asia Afghanistan  1962    48.3 11000948      2551
#> 4       Asia Afghanistan  1967    51.4 12760499      3247
#> 5       Asia Afghanistan  1972    54.5 14760787      4183
#> 6       Asia Afghanistan  1977    58.0 17152804      4910
#> 7       Asia Afghanistan  1982    61.4 20033753      5745
#> 8       Asia Afghanistan  1987    65.8 23254956      5681
#> ..       ...         ...   ...     ...      ...       ...

When working with models, unnesting is particularly useful when you combine it with broom to extract model summaries:

# Extract model summaries:
by_country %>% unnest(model %>% purrr::map(broom::glance))
#> Source: local data frame [142 x 15]
#> 
#>    continent     country            data   model r.squared
#>       (fctr)      (fctr)          (list)  (list)     (dbl)
#> 1       Asia Afghanistan <tbl_df [12,4]> <S3:lm>     0.985
#> 2     Europe     Albania <tbl_df [12,4]> <S3:lm>     0.888
#> 3     Africa     Algeria <tbl_df [12,4]> <S3:lm>     0.967
#> 4     Africa      Angola <tbl_df [12,4]> <S3:lm>     0.034
#> 5   Americas   Argentina <tbl_df [12,4]> <S3:lm>     0.919
#> 6    Oceania   Australia <tbl_df [12,4]> <S3:lm>     0.766
#> 7     Europe     Austria <tbl_df [12,4]> <S3:lm>     0.680
#> 8       Asia     Bahrain <tbl_df [12,4]> <S3:lm>     0.493
#> ..       ...         ...             ...     ...       ...
#> Variables not shown: adj.r.squared (dbl), sigma (dbl),
#>   statistic (dbl), p.value (dbl), df (int), logLik (dbl),
#>   AIC (dbl), BIC (dbl), deviance (dbl), df.residual (int).

# Extract coefficients:
by_country %>% unnest(model %>% purrr::map(broom::tidy))
#> Source: local data frame [284 x 7]
#> 
#>    continent     country        term  estimate std.error
#>       (fctr)      (fctr)       (chr)     (dbl)     (dbl)
#> 1       Asia Afghanistan (Intercept) -1.07e+03   43.8022
#> 2       Asia Afghanistan        year  5.69e-01    0.0221
#> 3     Europe     Albania (Intercept) -3.77e+02   46.5834
#> 4     Europe     Albania        year  2.09e-01    0.0235
#> 5     Africa     Algeria (Intercept) -6.13e+02   38.8918
#> 6     Africa     Algeria        year  3.34e-01    0.0196
#> 7     Africa      Angola (Intercept) -6.55e+01  202.3625
#> 8     Africa      Angola        year  6.07e-02    0.1022
#> ..       ...         ...         ...       ...       ...
#> Variables not shown: statistic (dbl), p.value (dbl).

# Extract residuals etc:
by_country %>% unnest(model %>% purrr::map(broom::augment))
#> Source: local data frame [1,704 x 11]
#> 
#>    continent     country lifeExp  year .fitted .se.fit
#>       (fctr)      (fctr)   (dbl) (int)   (dbl)   (dbl)
#> 1       Asia Afghanistan    43.1  1952    43.4   0.718
#> 2       Asia Afghanistan    45.7  1957    46.2   0.627
#> 3       Asia Afghanistan    48.3  1962    49.1   0.544
#> 4       Asia Afghanistan    51.4  1967    51.9   0.472
#> 5       Asia Afghanistan    54.5  1972    54.8   0.416
#> 6       Asia Afghanistan    58.0  1977    57.6   0.386
#> 7       Asia Afghanistan    61.4  1982    60.5   0.386
#> 8       Asia Afghanistan    65.8  1987    63.3   0.416
#> ..       ...         ...     ...   ...     ...     ...
#> Variables not shown: .resid (dbl), .hat (dbl), .sigma
#>   (dbl), .cooksd (dbl), .std.resid (dbl).

I think storing multiple models in a data frame is a powerful and convenient technique, and I plan to write more about it in the future.

Expanding

The complete() function lets you turn implicit missing values into explicit missing values. For example, imagine you’ve collected some data once a year, but unfortunately some of it has gone missing:

resources <- frame_data(
  ~year, ~metric, ~value,
  1999, "coal", 100,
  2001, "coal", 50,
  2001, "steel", 200
)
resources
#> Source: local data frame [3 x 3]
#> 
#>    year metric value
#>   (dbl)  (chr) (dbl)
#> 1  1999   coal   100
#> 2  2001   coal    50
#> 3  2001  steel   200

Here the value for steel in 1999 is implicitly missing: it’s simply absent from the data frame. We can use complete() to make this missing row explicit, adding that combination of the variables and inserting a placeholder NA:

resources %>% complete(year, metric)
#> Source: local data frame [4 x 3]
#> 
#>    year metric value
#>   (dbl)  (chr) (dbl)
#> 1  1999   coal   100
#> 2  1999  steel    NA
#> 3  2001   coal    50
#> 4  2001  steel   200

With complete() you’re not limited to combinations that exist in the data. For example, here we know that there should be data for every year, so we can use the full_seq() function to generate every year over the range of the data:

resources %>% complete(year = full_seq(year, 1L), metric)
#> Source: local data frame [6 x 3]
#> 
#>    year metric value
#>   (dbl)  (chr) (dbl)
#> 1  1999   coal   100
#> 2  1999  steel    NA
#> 3  2000   coal    NA
#> 4  2000  steel    NA
#> 5  2001   coal    50
#> 6  2001  steel   200

In other scenarios, you may not want to generate the full set of combinations. For example, imagine you have an experiment where each person is assigned one treatment. You don’t want to expand the combinations of person and treatment, but you do want to make sure every person has all replicates. You can use nesting() to prevent the full Cartesian product from being generated:

experiment <- data_frame(
  person = rep(c("Alex", "Robert", "Sam"), c(3, 2, 1)),
  trt  = rep(c("a", "b", "a"), c(3, 2, 1)),
  rep = c(1, 2, 3, 1, 2, 1),
  measurment_1 = runif(6),
  measurment_2 = runif(6)
)
experiment
#> Source: local data frame [6 x 5]
#> 
#>   person   trt   rep measurment_1 measurment_2
#>    (chr) (chr) (dbl)        (dbl)        (dbl)
#> 1   Alex     a     1       0.7161        0.927
#> 2   Alex     a     2       0.3231        0.942
#> 3   Alex     a     3       0.4548        0.668
#> 4 Robert     b     1       0.0356        0.667
#> 5 Robert     b     2       0.5081        0.143
#> 6    Sam     a     1       0.6917        0.753

experiment %>% complete(nesting(person, trt), rep)
#> Source: local data frame [9 x 5]
#> 
#>    person   trt   rep measurment_1 measurment_2
#>     (chr) (chr) (dbl)        (dbl)        (dbl)
#> 1    Alex     a     1       0.7161        0.927
#> 2    Alex     a     2       0.3231        0.942
#> 3    Alex     a     3       0.4548        0.668
#> 4  Robert     b     1       0.0356        0.667
#> 5  Robert     b     2       0.5081        0.143
#> 6  Robert     b     3           NA           NA
#> 7     Sam     a     1       0.6917        0.753
#> 8     Sam     a     2           NA           NA
#> ..    ...   ...   ...          ...          ...

httr 1.1.0 is now available on CRAN. The httr package makes it easy to talk to web APIs from R. Learn more in the quick start vignette.

Install the latest version with:

install.packages("httr")

When writing this blog post I discovered that I had forgotten to announce httr 1.0.0. That was a major release marking the transition from the RCurl package to the curl package, a modern binding to libcurl written by Jeroen Ooms. This makes httr more reliable, less likely to leak memory, and prevents the diabolical “easy handle already used in multi handle” error.

httr 1.1.0 includes a couple of new features:

  • stop_for_status(), warn_for_status() and (new) message_for_status() replace the old message argument with a new task argument that optionally describes the current task. This allows API wrappers to provide more informative error messages on failure.

  • http_error() replaces url_ok() and url_successful(). http_error() more clearly conveys intent and works with URLs, responses, and status codes. (Both changes are sketched briefly after this list.)
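
Roughly, the new functions can be used like this (the URL is illustrative, and the exact error text may differ):

library(httr)

r <- GET("https://httpbin.org/status/404")
http_error(r)                                   # TRUE for 4xx and 5xx responses
stop_for_status(r, task = "download user data") # error message mentions the task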

Otherwise, OAuth support continues to improve thanks to support from the community:

  • Nathan Goulding added RSA-SHA1 signature support to oauth1.0_token(). He also fixed bugs in oauth_service_token() and improved the caching behaviour of refresh_oauth2.0(). This makes httr easier to use with Google’s service accounts.

  • Graham Parsons added support for HTTP basic authentication to oauth2.0_token() with the use_basic_auth argument. This is now the default method used when retrieving a token.

  • Daniel Lockau implemented user_params which allows you to pass arbitrary additional parameters to the token access endpoint when acquiring or refreshing a token. This allows you to use httr with Microsoft Azure. He also wrote a demo so you can see exactly how this works.

To see the full list of changes, please read the release notes for 1.0.0 and 1.1.0.

Devtools 1.10.0 is now available on CRAN. Devtools makes package building so easy that a package can become your default way to organise code, data, documentation, and tests. You can learn more about creating your own package in R packages. Install devtools with:

install.packages("devtools")

This version is mostly a collection of bug fixes and minor improvements. For example:

  • Devtools uses a new strategy for detecting Rtools on Windows: we now only check for Rtools if you need to load_all() or build() a package with compiled code. This should make life easier for most Windows users.
  • Package installation received a lot of tweaks from the community. Devtools now makes use of the Additional_repositories field, which is useful if you’re using drat for non-CRAN packages. install_github() is now lazy and won’t reinstall if the currently installed version is the same as the one on GitHub. Local installs now add git and GitHub metadata, if available.
  • use_news_md() adds a (very) basic NEWS.md template. CRAN now accepts NEWS.md files, so release() warns if you’ve previously added it to .Rbuildignore.
  • use_mit_license() writes the necessary infrastructure to declare that your package is MIT licensed (in a CRAN-compliant way).
  • check(cran = TRUE) automatically adds --run-donttest, as this is a de facto CRAN standard. (A short usage sketch of these helpers follows this list.)
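
A rough usage sketch of these helpers (run from within a package’s source directory; output omitted):

library(devtools)

use_news_md()       # drop in a basic NEWS.md template
use_mit_license()   # declare an MIT license in a CRAN-compliant way
check(cran = TRUE)  # also runs \donttest{} examples, matching CRAN practice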

To see the full list of changes, please read the release notes.