Purrr is a new package that fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr. Like many of my recent packages, it works with magrittr to allow you to express complex operations by combining simple pieces in a standard way.

Install it with:

install.packages("purrr")

Purrr wouldn’t be possible without Lionel Henry. He wrote a lot of the package and his insightful comments helped me rapidly iterate towards a stable, useful, and understandable package.

Map functions

The core of purrr is a set of functions for manipulating vectors (atomic vectors, lists, and data frames). The goal is similar to dplyr: help you tackle the most common 90% of data manipulation challenges. But where dplyr focusses on data frames, purrr focusses on vectors. For example, the following code splits the built-in mtcars dataset up by number of cylinders (using the base split() function), fits a linear model to each piece, summarises each model, then extracts the \(R^2\):

mtcars %>%
  split(.$cyl) %>%
  map(~lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
#>     4     6     8 
#> 0.509 0.465 0.423

The first argument to all map functions is the vector to operate on. The second argument, .f, specifies what to do with each piece. It can be:

  • A function, like summary().
  • A formula, which is converted to an anonymous function, so that ~ lm(mpg ~ wt, data = .) is shorthand for function(x) lm(mpg ~ wt, data = x).
  • A string or number, which is used to extract components, i.e. "r.squared" is shorthand for function(x) x[["r.squared"]] and 1 is shorthand for function(x) x[[1]].
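For example, here is a small sketch of the extraction shorthands (the data is invented for illustration):

```r
library(purrr)

x <- list(
  list(name = "a", score = 0.5),
  list(name = "b", score = 0.9)
)

# String shorthand: extract the "name" component from each element
map_chr(x, "name")
#> [1] "a" "b"

# Number shorthand: extract the second component from each element
map_dbl(x, 2)
#> [1] 0.5 0.9
```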

Map functions come in a few different variations based on their inputs and output:

  • map() takes a vector (list or atomic vector) and returns a list. map_lgl(), map_int(), map_dbl(), and map_chr() take a vector and return an atomic vector. flatmap() works similarly, but allows the function to return arbitrary length vectors.
  • map_if() only applies .f to those elements of the list where .p is true. For example, the following snippet converts factors into characters:
    iris %>% map_if(is.factor, as.character) %>% str()
    #> 'data.frame':    150 obs. of  5 variables:
    #>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    #>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    #>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    #>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    #>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...

    map_at() works similarly but instead of working with a logical vector or predicate function, it works with an integer vector of element positions.

  • map2() takes a pair of lists and iterates through them in parallel:
    map2(1:3, 2:4, c)
    #> [[1]]
    #> [1] 1 2
    #> [[2]]
    #> [1] 2 3
    #> [[3]]
    #> [1] 3 4
    map2(1:3, 2:4, ~ .x * (.y - 1))
    #> [[1]]
    #> [1] 1
    #> [[2]]
    #> [1] 4
    #> [[3]]
    #> [1] 9

    map3() does the same thing for three lists, and map_n() does it in general.

  • invoke(), invoke_lgl(), invoke_int(), invoke_dbl(), and invoke_chr() take a list of functions, and call each one with the supplied arguments:
    list(m1 = mean, m2 = median) %>%
      invoke_dbl(rcauchy(100))
    #>    m1    m2 
    #> 9.765 0.117
  • walk() takes a vector, calls a function on each piece, and returns its original input. It’s useful for functions called for their side-effects; it returns the input so you can use it in a pipe.
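As a quick sketch of walk() mid-pipe (example invented here, not from the post):

```r
library(purrr)

1:3 %>%
  walk(~ cat("processing", .x, "\n")) %>%  # called only for its side-effect
  map_dbl(~ .x * 10)                       # the original input flows through
#> processing 1 
#> processing 2 
#> processing 3 
#> [1] 10 20 30
```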

Purrr and dplyr

I’m becoming increasingly enamoured with list-columns in data frames. The following example combines purrr and dplyr to generate 100 random test-training splits in order to compute an unbiased estimate of prediction quality. These tools are still experimental (and currently need quite a bit of extra scaffolding), but I think the basic approach is really appealing.

random_group <- function(n, probs) {
  probs <- probs / sum(probs)
  g <- findInterval(seq(0, 1, length = n), c(0, cumsum(probs)),
    rightmost.closed = TRUE)
  names(probs)[sample(g)]
}

partition <- function(df, n, probs) {
  n %>% 
    replicate(split(df, random_group(nrow(df), probs)), FALSE) %>%
    zip_n() %>%
    dplyr::as_data_frame()
}

msd <- function(x, y) sqrt(mean((x - y) ^ 2))

# Generate 100 random test-training splits
cv <- mtcars %>%
  partition(100, c(training = 0.8, test = 0.2)) %>% 
  dplyr::mutate(
    # Fit the model
    model = map(training, ~ lm(mpg ~ wt, data = .)),
    # Make predictions on test data
    pred = map2(model, test, predict),
    # Calculate mean squared difference
    diff = map2(pred, test %>% map("mpg"), msd) %>% flatten()
  )
cv
#> Source: local data frame [100 x 5]
#>                   test             training   model     pred  diff
#>                 (list)               (list)  (list)   (list) (dbl)
#> 1  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.70
#> 2  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.03
#> 3  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.29
#> 4  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  4.88
#> 5  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.20
#> 6  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  4.68
#> 7  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.39
#> 8  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.82
#> 9  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.56
#> 10 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.40
#> ..                 ...                  ...     ...      ...   ...
mean(cv$diff)
#> [1] 3.22

Other functions

There are too many other pieces of purrr to describe in detail here. A few of the most useful functions are noted below:

  • zip_n() allows you to turn a list of lists “inside-out”:
    x <- list(list(a = 1, b = 2), list(a = 2, b = 1))
    x %>% str()
    #> List of 2
    #>  $ :List of 2
    #>   ..$ a: num 1
    #>   ..$ b: num 2
    #>  $ :List of 2
    #>   ..$ a: num 2
    #>   ..$ b: num 1
    x %>%
      zip_n() %>%
      str()
    #> List of 2
    #>  $ a:List of 2
    #>   ..$ : num 1
    #>   ..$ : num 2
    #>  $ b:List of 2
    #>   ..$ : num 2
    #>   ..$ : num 1
    x %>%
      zip_n(.simplify = TRUE) %>%
      str()
    #> List of 2
    #>  $ a: num [1:2] 1 2
    #>  $ b: num [1:2] 2 1
  • keep() and discard() allow you to filter a vector based on a predicate function. compact() is a helpful wrapper that throws away empty elements of a list.
    1:10 %>% keep(~. %% 2 == 0)
    #> [1]  2  4  6  8 10
    1:10 %>% discard(~. %% 2 == 0)
    #> [1] 1 3 5 7 9
    list(list(x = TRUE, y = 10), list(x = FALSE, y = 20)) %>%
      keep("x") %>% 
      str()
    #> List of 1
    #>  $ :List of 2
    #>   ..$ x: logi TRUE
    #>   ..$ y: num 10
    list(NULL, 1:3, NULL, 7) %>% 
      compact() %>%
      str()
    #> List of 2
    #>  $ : int [1:3] 1 2 3
    #>  $ : num 7
  • lift() (and friends) allow you to convert a function that takes multiple arguments into a function that takes a list. It helps you compose functions by lifting their domain from one kind of input to another. The domain can be changed to and from a list (l), a vector (v) and dots (d).
  • cross2(), cross3() and cross_n() allow you to create the Cartesian product of the inputs (with optional filtering).
  • A number of functions let you manipulate functions: negate(), compose(), partial().
  • A complete set of predicate functions provides predictable versions of the is.* functions: is_logical(), is_list(), is_bare_double(), is_scalar_character(), etc.
  • Other functions wrap existing base R functions into the consistent design of purrr: replicate() -> rerun(), Reduce() -> reduce(), Find() -> detect(), Position() -> detect_index().
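A quick sketch of those base-R equivalents in action (examples invented here):

```r
library(purrr)

# reduce() collapses a list with a binary function, like Reduce()
1:4 %>% reduce(`+`)
#> [1] 10

# detect() returns the first element matching a predicate, like Find()
3:10 %>% detect(~ . > 5)
#> [1] 6

# detect_index() returns its position, like Position()
3:10 %>% detect_index(~ . > 5)
#> [1] 4
```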

Design philosophy

The goal of purrr is not to try and turn R into Haskell: it does not implement currying, destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R.

  • Anonymous functions are verbose in R, so we provide two convenient shorthands. For predicate functions, ~ .x + 1 is equivalent to function(.x) .x + 1. For chains of transformations, . %>% f() %>% g() is equivalent to function(x) g(f(x)).
  • R is weakly typed, so we can implement a general zip_n(), rather than having to specialise on the number of arguments. That said, we still provide map2() and map3() since it’s useful to clearly separate which arguments are vectorised over. Functions are designed to be output type-stable (respecting Postel’s law) so you can rely on the output being as you expect.
  • R has named arguments, so instead of providing different functions for minor variations (e.g. detect() and detectLast()) we use named arguments.
  • Instead of currying, we use ... to pass in extra arguments. Arguments of purrr functions always start with . to avoid matching to the arguments of .f passed in via ....
  • Instead of point free style, use the pipe, %>%, to write code that can be read from left to right.
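A sketch of what output type-stability buys you in practice (example invented here): the suffixed variants always return the type they promise, unlike sapply(), whose return type depends on its input:

```r
library(purrr)

map(1:3, ~ .x ^ 2)      # always a list
map_dbl(1:3, ~ .x ^ 2)  # always a double vector
#> [1] 1 4 9

# An element that can't be coerced to double is an error,
# not a silently different output type:
# map_dbl(list(1, "a"), ~ .x)  # error
```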

I’m pleased to announce rvest 0.3.0 is now available on CRAN. Rvest makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with pipes so that you can express complex operations by composing simple pieces. Install it with:

install.packages("rvest")

What’s new

The biggest change in this version is that rvest now uses the xml2 package instead of XML. This makes rvest much simpler, eliminates memory leaks, and should improve performance a little.

A number of functions have changed names to improve consistency with other packages: most importantly html() is now read_html(), and html_tag() is now html_name(). The old versions still work, but are deprecated and will be removed in rvest 0.4.0.

html_node() now throws an error if there are no matches, and a warning if there’s more than one match. I think this should make it more likely to fail clearly when the structure of the page changes. If you don’t want this behaviour, use html_nodes().
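For instance, a small sketch of the difference, using inline HTML (not from the announcement):

```r
library(rvest)

page <- read_html("<div><p>first</p><p>second</p></div>")

# html_nodes() returns every match, however many there are
page %>% html_nodes("p") %>% html_text()
#> [1] "first"  "second"

# html_node() expects a single match; per this release it errors on
# zero matches and warns (returning the first) when there are several
page %>% html_node("p") %>% html_text()
```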

There were a number of other bug fixes and minor improvements as described in the release notes.

RStudio will again teach the new essentials for doing (big) data science in R at this year’s Strata NYC conference, September 29 2015 (http://strataconf.com/big-data-conference-ny-2015/public/schedule/detail/44154).  You will learn from Garrett Grolemund, Yihui Xie, and Nathan Stephens who are all working on fascinating new ways to keep the R ecosystem apace of the challenges facing those who work with data.

Topics include:

  • R Quickstart: Wrangle, transform, and visualize data
    Instructor: Garrett Grolemund (90 minutes)
  • Work with Big Data in R
    Instructor: Nathan Stephens (90 minutes)
  • Reproducible Reports with Big Data
    Instructor: Yihui Xie (90 minutes)
  • Interactive Shiny Applications built on Big Data
    Instructor: Garrett Grolemund (90 minutes)

If you plan to stay for the full Strata Conference+Hadoop World be sure to look us up at booth 633 during the Expo Hall hours. We’ll have the latest books from RStudio authors and “shiny” t-shirts to win. Share with us what you’re doing with RStudio and get your product and company questions answered by RStudio employees.

See you in New York City! (http://strataconf.com/big-data-conference-ny-2015)

Devtools 1.9.1 is now available on CRAN. Devtools makes package building so easy a package can become your default way to organise code, data, and documentation. You can learn more about developing packages in R packages, my book about package development that’s freely available online.

Get the latest version of devtools with:

install.packages("devtools")

There are three major improvements that I contributed:

  • check() is now much closer to what CRAN does – it passes on --as-cran to R CMD check, using an env var to turn off the incoming CRAN checks. These are turned off because they’re slow (they have to retrieve data from CRAN), and are not necessary except just prior to release (so release() turns them back on again).
  • install_deps() now automatically upgrades out of date dependencies. This is typically what you want when you’re working on a development version of a package: otherwise you can get an unpleasant surprise when you go to submit your package to CRAN and discover it doesn’t work with the latest version of its dependencies. To suppress this behaviour, set upgrade_dependencies = FALSE.
  • revdep_check() received a number of tweaks that I’ve found helpful when preparing my packages for CRAN:
    • Suggested dependencies of the revdeps are installed by default.
    • The NOT_CRAN env var is set to false so tests that are skipped on CRAN are also skipped for you.
    • The RGL_USE_NULL env var is set to true to stop rgl windows from popping up during testing.
    • All revdep sources are downloaded at the start of the checks. This makes life a bit easier if you’re on a flaky internet connection.

But like many recent devtools releases, most of the coolest new features have been contributed by the community:

  • Jim Hester implemented experimental remote dependencies for install(). You can now tell devtools where to find dependencies with a remotes field:
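    For example (the package reference here is purely illustrative), a DESCRIPTION might contain:

    ```
    Imports:
        testthat
    Remotes:
        hadley/testthat
    ```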

    The default allows you to refer to github repos, but you can easily add deps from any of the other sources that devtools supports: see vignette("dependencies") for more details.

    Support for installing development dependencies is still experimental so we appreciate any feedback.

  • Jenny Bryan considerably improved the existing GitHub integration. use_github() now pushes to the newly created GitHub repo, and sets a remote tracking branch. It also populates the URL and BugReports fields of your DESCRIPTION.
  • Kirill Müller contributed many bug fixes, minor improvements and test cases.

See the release notes for complete bug fixes and other minor changes.

tidyr 0.3.0 is now available on CRAN. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has variables in columns and observations in rows, and is described in more detail in the tidy data vignette. Install tidyr with:


tidyr contains four new verbs: fill(), replace_na(), complete(), and unnest(), and lots of smaller bug fixes and improvements.


fill()

The new fill() function fills in missing observations from the last non-missing value. This is useful if you’re getting data from Excel users who haven’t read Karl Broman’s excellent data organisation guide and leave cells blank to indicate that the previous value should be carried forward:

df <- dplyr::data_frame(
  year = c(2015, NA, NA, NA), 
  trt = c("A", NA, "B", NA)
)
df
#> Source: local data frame [4 x 2]
#>    year   trt
#>   (dbl) (chr)
#> 1  2015     A
#> 2    NA    NA
#> 3    NA     B
#> 4    NA    NA
df %>% fill(year, trt)
#> Source: local data frame [4 x 2]
#>    year   trt
#>   (dbl) (chr)
#> 1  2015     A
#> 2  2015     A
#> 3  2015     B
#> 4  2015     B

replace_na() and complete()

replace_na() makes it easy to replace missing values on a column-by-column basis:

df <- dplyr::data_frame(
  x = c(1, 2, NA), 
  y = c("a", NA, "b")
)
df %>% replace_na(list(x = 0, y = "unknown"))
#> Source: local data frame [3 x 2]
#>       x       y
#>   (dbl)   (chr)
#> 1     1       a
#> 2     2 unknown
#> 3     0       b

It is particularly useful when called from complete(), which makes it easy to fill in missing combinations of your data:

df <- dplyr::data_frame(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)
df
#> Source: local data frame [3 x 5]
#>   group item_id item_name value1 value2
#>   (dbl)   (dbl)     (chr)  (int)  (int)
#> 1     1       1         a      1      4
#> 2     2       2         b      2      5
#> 3     1       2         b      3      6

df %>% complete(group, c(item_id, item_name))
#> Source: local data frame [4 x 5]
#>   group item_id item_name value1 value2
#>   (dbl)   (dbl)     (chr)  (int)  (int)
#> 1     1       1         a      1      4
#> 2     1       2         b      3      6
#> 3     2       1         a     NA     NA
#> 4     2       2         b      2      5

df %>% complete(
  group, c(item_id, item_name), 
  fill = list(value1 = 0)
)
#> Source: local data frame [4 x 5]
#>   group item_id item_name value1 value2
#>   (dbl)   (dbl)     (chr)  (dbl)  (int)
#> 1     1       1         a      1      4
#> 2     1       2         b      3      6
#> 3     2       1         a      0     NA
#> 4     2       2         b      2      5

Note how I’ve grouped item_id and item_name together with c(item_id, item_name). This treats them as nested, not crossed, so we don’t get every combination of group, item_id and item_name, as we would otherwise:

df %>% complete(group, item_id, item_name)
#> Source: local data frame [8 x 5]
#>    group item_id item_name value1 value2
#>    (dbl)   (dbl)     (chr)  (int)  (int)
#> 1      1       1         a      1      4
#> 2      1       1         b     NA     NA
#> 3      1       2         a     NA     NA
#> 4      1       2         b      3      6
#> 5      2       1         a     NA     NA
#> ..   ...     ...       ...    ...    ...

Read more about this behaviour in ?expand.


unnest()

unnest() is out of beta, and is now ready to help you unnest columns that are lists of vectors. This can occur when you have hierarchical data that’s been collapsed into a string:

df <- dplyr::data_frame(x = 1:2, y = c("1,2", "3,4,5,6,7"))
df
#> Source: local data frame [2 x 2]
#>       x         y
#>   (int)     (chr)
#> 1     1       1,2
#> 2     2 3,4,5,6,7

df %>% 
  dplyr::mutate(y = strsplit(y, ","))
#> Source: local data frame [2 x 2]
#>       x        y
#>   (int)   (list)
#> 1     1 <chr[2]>
#> 2     2 <chr[5]>

df %>% 
  dplyr::mutate(y = strsplit(y, ",")) %>%
  unnest(y)
#> Source: local data frame [7 x 2]
#>        x     y
#>    (int) (chr)
#> 1      1     1
#> 2      1     2
#> 3      2     3
#> 4      2     4
#> 5      2     5
#> ..   ...   ...

unnest() also works on columns that are lists of data frames. This is admittedly esoteric, but I think it might be useful when you’re generating pairs of test-training splits. I’m still thinking about this idea, so look for more examples and better support across my packages in the future.
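A minimal sketch of that idea, with invented data (not from the announcement):

```r
library(dplyr)
library(tidyr)

df <- dplyr::data_frame(
  id = 1:2,
  sub = list(
    dplyr::data_frame(x = 1:2),
    dplyr::data_frame(x = 3:4)
  )
)

# One output row for every row of the nested data frames,
# with id duplicated as needed
df %>% unnest(sub)
```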

Minor improvements

There were 13 minor improvements and bug fixes. The most important are listed below. To read about the rest, please consult the release notes.

  • %>% is re-exported from magrittr: this means that you no longer need to load dplyr or magrittr if you want to use the pipe.
  • extract() and separate() now return multiple NA columns for NA inputs:
    df <- dplyr::data_frame(x = c("a-b", NA, "c-d"))
    df %>% separate(x, c("x", "y"), "-")
    #> Source: local data frame [3 x 2]
    #>       x     y
    #>   (chr) (chr)
    #> 1     a     b
    #> 2    NA    NA
    #> 3     c     d
  • separate() gains finer control if there are too few matches:
    df <- dplyr::data_frame(x = c("a-b-c", "a-c"))
    df %>% separate(x, c("x", "y", "z"), "-")
    #> Warning: Too few values at 1 locations: 2
    #> Source: local data frame [2 x 3]
    #>       x     y     z
    #>   (chr) (chr) (chr)
    #> 1     a     b     c
    #> 2     a     c    NA
    df %>% separate(x, c("x", "y", "z"), "-", fill = "right")
    #> Source: local data frame [2 x 3]
    #>       x     y     z
    #>   (chr) (chr) (chr)
    #> 1     a     b     c
    #> 2     a     c    NA
    df %>% separate(x, c("x", "y", "z"), "-", fill = "left")
    #> Source: local data frame [2 x 3]
    #>       x     y     z
    #>   (chr) (chr) (chr)
    #> 1     a     b     c
    #> 2    NA     a     c

    This complements the support for too many matches:

    df <- dplyr::data_frame(x = c("a-b-c", "a-c"))
    df %>% separate(x, c("x", "y"), "-")
    #> Warning: Too many values at 1 locations: 1
    #> Source: local data frame [2 x 2]
    #>       x     y
    #>   (chr) (chr)
    #> 1     a     b
    #> 2     a     c
    df %>% separate(x, c("x", "y"), "-", extra = "merge")
    #> Source: local data frame [2 x 2]
    #>       x     y
    #>   (chr) (chr)
    #> 1     a   b-c
    #> 2     a     c
    df %>% separate(x, c("x", "y"), "-", extra = "drop")
    #> Source: local data frame [2 x 2]
    #>       x     y
    #>   (chr) (chr)
    #> 1     a     b
    #> 2     a     c
  • tidyr no longer depends on reshape2. This should fix issues when you load reshape and tidyr at the same time. It also frees tidyr to evolve in a different direction to the more general reshape2.

dplyr 0.4.3 includes over 30 minor improvements and bug fixes, which are described in detail in the release notes. Here I wanted to draw your attention to five small, but important, changes:

  • mutate() no longer randomly crashes! (Sorry it took us so long to fix this – I know it’s been causing a lot of pain.)
  • dplyr now has much better support for non-ASCII column names. It’s probably not perfect, but should be a lot better than previous versions.
  • When printing a tbl_df, you now see the types of all columns, not just those that don’t fit on the screen:
    data_frame(x = 1:3, y = letters[x], z = factor(y))
    #> Source: local data frame [3 x 3]
    #>       x     y      z
    #>   (int) (chr) (fctr)
    #> 1     1     a      a
    #> 2     2     b      b
    #> 3     3     c      c
  • bind_rows() gains a .id argument. When supplied, it creates a new column that gives the name of each data frame:
    a <- data_frame(x = 1, y = "a")
    b <- data_frame(x = 2, y = "c")
    bind_rows(a = a, b = b)
    #> Source: local data frame [2 x 2]
    #>       x     y
    #>   (dbl) (chr)
    #> 1     1     a
    #> 2     2     c
    bind_rows(a = a, b = b, .id = "source")
    #> Source: local data frame [2 x 3]
    #>   source     x     y
    #>    (chr) (dbl) (chr)
    #> 1      a     1     a
    #> 2      b     2     c
    # Or equivalently
    bind_rows(list(a = a, b = b), .id = "source")
    #> Source: local data frame [2 x 3]
    #>   source     x     y
    #>    (chr) (dbl) (chr)
    #> 1      a     1     a
    #> 2      b     2     c
  • dplyr is now more forgiving of unknown attributes. All functions should now copy column attributes from the input to the output, instead of complaining. Additionally, arrange(), filter(), slice(), and summarise() preserve attributes of the data frame itself.

Five months ago we launched shinyapps.io. Since then, more than 25,000 accounts have been created and countless Shiny applications have been deployed. It’s incredibly exciting to see!

It’s also given us lots of data and feedback on how we can make shinyapps.io better. Today, we’re happy to tell you about some changes to our subscription plans that we hope will make shinyapps.io an even better experience for Shiny developers and their application users.

New Starter Plan – More active hours and apps, less money
For many people the price difference between the Free and the Basic plan was too much. We heard you. Effective today there is a new Starter Plan for only $9 per month or $100 per year. The Starter Plan has the same features as the Free plan but allows 100 active hours per month and up to 25 applications. It’s perfect for the active Shiny developer on a budget!

More Active Hours for Basic, Standard, and Professional Plans
Once you’re up and running with Shiny we want to make sure even the most prolific developers and popular applications have the active hours they need. Today we’re doubling the number of active hours per month for the Basic (now 500), Standard (now 2,000), and Professional (now 10,000) plans. In practice, very few accounts exceeded the old limits for these plans but now you can be sure your needs are covered.

New Performance Boost features for the Basic Plan
In addition to supporting multiple R worker processes per application, which keeps your application responsive as more people use it, we’ve added more memory (up to 8GB) on Basic plans and above. While the data shows that most applications work fine without these enhancements, if you expect many users at the same time or your application is memory or CPU intensive, the Basic Plan has the performance boost you need. The Basic plan also allows unlimited applications and 500 active hours per month.

Traditionally, the mechanisms for obtaining R and related software have used standard HTTP connections. This isn’t ideal though, as without a secure (HTTPS) connection there is less assurance that you are downloading code from a legitimate source rather than from another server posing as one.

Recently there have been a number of changes that make it easier to use HTTPS for installing R, RStudio, and packages from CRAN:

  1. Downloads of R from the main CRAN website now use HTTPS;

  2. Downloads of RStudio from our website now use HTTPS; and

  3. It is now possible to install packages from CRAN over HTTPS.

There are a number of ways to ensure that installation of packages from CRAN is performed using HTTPS. The most recent version of R (v3.2.2) makes this the default behavior. The most recent version of RStudio (v0.99.473) also attempts to configure secure downloads from CRAN by default (even for older versions of R). Finally, any version of R or RStudio can use secure HTTPS downloads by making some configuration changes as described in the Secure Package Downloads for R article in our Knowledge Base.

Configuring Secure Connections to CRAN

While the simplest way to ensure secure connections to CRAN is to run the updated versions mentioned above, it’s important to note that it is not necessary to upgrade R or RStudio to achieve this end. Rather, two configuration changes can be made:

  1. The R download.file.method option needs to specify a method that is capable of HTTPS; and
  2. The CRAN mirror you are using must be capable of HTTPS connections (not all of them are).
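Concretely, the two changes can be made in a .Rprofile along these lines (the mirror URL is just one HTTPS-capable example):

```r
# 1. Choose an HTTPS-capable download method
options(download.file.method = "libcurl")

# 2. Use a CRAN mirror that supports HTTPS connections
options(repos = c(CRAN = "https://cran.rstudio.com/"))
```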

The specifics of the required changes for various products, platforms, and versions of R are described in-depth in the Secure Package Downloads for R article in our Knowledge Base.

Recommendations for RStudio Users

We’ve made several changes to RStudio IDE to ensure that HTTPS connections are used throughout the product:

  1. The default download.file.method option is set to an HTTPS compatible method (with a warning displayed if a secure method can’t be set);
  2. The configured CRAN mirror is tested for HTTPS compatibility and a warning is displayed if the mirror doesn’t support HTTPS;
  3. HTTPS is used for user selection of a non-default CRAN mirror;
  4. HTTPS is used for in-product documentation links;
  5. HTTPS is used when checking for updated versions of RStudio (applies to desktop version only); and
  6. HTTPS is used when downloading Rtools (applies to desktop version only).

If you are running RStudio on the desktop we strongly recommend that you update to the latest version (v0.99.473).

Recommendations for Server Administrators

If you are running RStudio Server it’s possible to make the most important security enhancements by changing your configuration rather than updating to a new version. The Secure Package Downloads for R article in our Knowledge Base provides documentation on how do this.

In this case in-product documentation links and user selection of a non-default CRAN mirror will continue to use HTTP rather than HTTPS however these are less pressing concerns than CRAN package installation. If you’d like these functions to also be performed over HTTPS then you should upgrade your server to the latest version of RStudio.

If you are running Shiny Server we recommend that you modify your configuration to support HTTPS package downloads as described in the Secure Package Downloads for R article.

The Joint Statistics Meetings starting August 8 is the biggest meetup for statisticians in the world. Navigating the sheer quantity of interesting talks is challenging – there can be up to 50 sessions going on at a time!

To prepare for Seattle, we asked RStudio’s Chief Data Scientist Hadley Wickham for his top session picks. Here are 9 talks, admittedly biased towards R, graphics, and education, that really interested him and might interest you, too.

Check out these talks

Undergraduate Curriculum: The Pathway to Sustainable Growth in Our Discipline
Sunday, 1400-1550, CC-607
Statistics with Computing in the Evolving Undergraduate Curriculum
Sunday, 1600-1750
In these back to back sessions, learn how statisticians are rising to the challenge of teaching computing and big(ger) data.

Recent Advances in Interactive Graphics for Data Analysis
Monday, 1030-1220, CC-608
This is an exciting session discussing innovations in interactive visualisation, and it’s telling that all of them connect with R. Hadley will be speaking about ggvis in this session.

Preparing Students to Work in Industry
Monday, 1400-1550, CC-4C4
If you’re a student about to graduate, we bet there will be some interesting discussion for you here.

Stat computing and graphics mixer
Monday, 1800-2000, S-Ravenna
This is advertised as a business meeting, but don’t be confused. It’s the premier social event for anyone interested in computing or visualisation!

Statistical Computing and Graphics Student Paper Competition
Tuesday, 0830-1020, CC-308
Hear this year’s winners of the student paper award talk about visualising phylogenetic data, multiple change point detection, capture-recapture data and teaching intro stat with R. All four talks come with accompanying R packages!

The Statistics Identity Crisis: Are We Really Data Scientists?
Tuesday, 0830-1020, CC-609
This session, organised by Jeffrey Leek, features an all-star cast of Alyssa Frazee, Chris Volinsky, Lance Waller, and Jenny Bryan.

Doing Good with Data Viz
Wednesday, 0830-1020, CC-2B
Hear Jake Porway, Patrick Ball, and Dino Citraro talk about using data to do good. Of all the sessions you shouldn’t miss, this is the one you really shouldn’t miss. (But unfortunately it conflicts with another great session – you have difficult choices ahead.)

Statistics at Scale: Applications from Tech Companies
Wednesday, 0830-1020, CC-204
Hilary Parker has organised a fantastic session where you’ll learn how companies like Etsy, Microsoft, and Facebook do statistics at scale. Get there early because this session is going to be PACKED!

There are hundreds of sessions at the JSM, so no doubt we’ve missed other great ones. If you think we’ve missed a “don’t miss” session, please add it to the comments so others can find it.

Visit RStudio at booth #435

In between session times you’ll find the RStudio team hanging out at booth #435 in the expo center (at the far back on the right). Please stop by and say hi! We’ll have stickers to show your R pride and printed copies of many of our cheatsheets.

See you in Seattle!

We’ve added a new articles section to the R Markdown development center at rmarkdown.rstudio.com/articles.html. Here you can find expert advice and tips on how to use R Markdown efficiently.

In one of the first articles, Richard Layton of graphdoctor.com explains the best tips for using R Markdown to generate Microsoft Word documents. You’ll learn how to

  • set Word styles
  • tweak margins
  • handle relative paths
  • make better tables
  • add bibliographies, and more

Check it out, and then check back often.

