RStudio is planning a new Master R Developer Workshop to be taught by Hadley Wickham in the San Francisco Bay Area on January 19-20. This will be the same workshop that Hadley is teaching in September in New York City to a sold out audience.

If you did not get a chance to register for the NYC workshop but wished to, consider attending the January Bay Area workshop. We will open registration once we have planned out all of the event details. If you would like to be notified when registration opens, leave a contact address here.

The Joint Statistical Meetings (JSM) start this weekend! We wanted to let you know we’ll be there. Be sure to check out these sessions from RStudio and friends:

Sunday, August 3

  • 4:00 PM: A Web Application for Efficient Analysis of Peptide Libraries: Eric Hare*+ and Timo Sieber and Heike Hofmann
  • 4:00 PM: Gravicom: A Web-Based Tool for Community Detection in Networks: Andrea Kaplan*+ and Heike Hofmann and Daniel Nordman

Monday, August 4

  • 8:35 AM: Preparing Students for Big Data Using R and Rstudio: Randall Pruim
  • 8:55 AM: Thinking with Data in the Second Course: Nicholas J. Horton and Ben S. Baumer and Hadley Wickham
  • 8:55 AM: Doing Reproducible Research Unconsciously: Higher Standard, but Less Work: Yihui Xie
  • 10:30 AM: Interactive Web Application with Shiny: Bharat Bahadur
  • 2:00 PM: Interactive Web Application with Shiny: Bharat Bahadur

Tuesday, August 5

  • 2:00PM: Give Me an Old Computer, a Blank DVD, and an Internet Connection and I’ll Give You World-Class Analytics: Ty Henkaline

Wednesday, August 6

  • 10:35 Shiny: Easy Web Applications in R:Joseph Cheng
  • 11:00 AM: ggvis: Moving Toward a Grammar of Interactive Graphics: Hadley Wickham

For even more talks on R we thought Joseph Rickert’s “Data Scientists and R Users Guide to the JSM” was excellent. Click here to see it. http://blog.revolutionanalytics.com/2014/07/a-data-scientists-and-r-users-guide-to-the-jsm.html

While you’re at the conference, please come by our exhibition area (Booth #112) to say hello. J.J., Hadley and other members of the team will be there. We’ve got enough space to talk about your plans for R and how RStudio Server Pro and Shiny Server Pro can provide enterprise-ready support and scalability for your RStudio IDE and Shiny Server deployments.

We hope to see you there!

R Markdown is a framework for writing versatile, reproducible reports from R. With R Markdown, you write a simple plain text report and then render it to create polished output. You can:

  1. Transform your file into a pdf, html, or Microsoft Word document—even a slideshow—at the click of a button.
  2. Embed R code into your report. When you render the file, R will run the code and insert its results into your report. Use this feature to add graphs and tables to your report: if your data ever changes, you can update your figures by re-rendering the report.
  3. Make interactive documents and slideshows. Your report becomes interactive when you embed Shiny code.

We’ve created a cheat sheet to help you master R Markdown. Download your copy here. You can also learn more about R Markdown at rmarkdown.rstudio.com and Introduction to R Markdown.

RM-cheatsheet

Shiny v0.10.1 has been released to CRAN. You can either install it from a CRAN mirror, or update it if you have installed a previous version.

install.packages('shiny', repos = 'http://cran.rstudio.com')
# or update your installed packages
# update.packages(ask = FALSE, repos = 'http://cran.rstudio.com')

The most prominent change in this patch release is that we added full Unicode support on Windows. Shiny apps running on Windows must use the UTF-8 encoding for ui.R and server.R (also the optional global.R, README.md, and DESCRIPTION) if they contain non-ASCII characters. See this article for details and examples: http://shiny.rstudio.com/articles/unicode.html

Chinese characters in a shiny app

Chinese characters in a shiny app

Please note although we require UTF-8 for the app components, UTF-8 is not a general requirement for any other files. If you read/write text files in an app, you are free to use any encoding you want, e.g. you can readLines('foo.txt', encoding = 'Windows-1252'). The article above has explained it in detail.

Other changes include:

  • runGitHub() also allows the 'username/repo' syntax now, which is equivalent to runGitHub('repo', 'username'). (#427)
  • navbarPage() now accepts a windowTitle parameter to set the web browser page title to something other than the title displayed in the navbar.
  • Added an inline argument to textOutput(), imageOutput(), plotOutput(), and htmlOutput(). When inline = TRUE, these outputs will be put in span() instead of the default div(). This occurs automatically when these outputs are created via the inline expressions (e.g. `r renderText(expr)`) in R Markdown documents. See an R Markdown example at http://shiny.rstudio.com/gallery/inline-output.html (#512)
  • Added support for option groups in the select/selectize inputs. When the choices argument for selectInput()/selectizeInput() is a list of sub-lists and any sub-list is of length greater than 1, the HTML tag <optgroup> will be used. See an example at here (#542)

Please let us know if you have any comments or questions.

httr 0.4 is now available on CRAN. The httr packages makes it easy to talk to web APIs from R.

The most important new features are two new vignettes to help you get started and to help you make wrappers for web APIs. Other important improvements include:

  • New headers() and cookies() functions to extract headers and cookies from responses. status_code() returns HTTP status codes.
  • POST() (and PUT(), and PATCH()) now have an encode argument that determine how the body is encoded. Valid values are “multipart”, “form” or “json”, and the multipart argument is now deprecated.
  • GET(..., progress()) will display a progress bar, useful if you’re doing large uploads or downloads.
  • verbose() gives you considerably more control over degree of verbosity, and defaults have been selected to be more helpful for the most common cases.
  • NULL query parameters are now dropped automatically.

There are number of other minor improvements and bug fixes, as described by the release notes.

RStudio is very pleased to announce the general availability of Shiny Server Pro 1.2.

Download a free 45 day evaluation of Shiny Server Pro 1.2

Shiny Server Pro 1.2 adds support for R Markdown Interactive Documents in addition to Shiny applications. Learn more about Interactive Documents by registering for the Reproducible Reporting webinar August 13 and Interactive Reporting webinar September 3.

We are excited about the new ways in which you can now share your data analysis in Shiny Server Pro along with the security, management and performance tuning capabilities you and your IT teams need to scale.

Uncover all the features of Shiny Server Pro 1.2 in the updated Shiny Server admin guide…then give it a try!

I’ve released four new data packages to CRAN: babynames, fueleconomy, nasaweather and nycflights13. The goal of these packages is to provide some interesting, and relatively large, datasets to demonstrate various data analysis challenges in R. The package source code (on github, linked above) is fully reproducible so that you can see some data tidying in action, or make your own modifications to the data.

Below, I’ve listed the primary dataset found in each package. Most packages also include a number of supplementary datasets that provide additional information. Check out the docs for more details.

  • babynames::babynames: US baby name data for each year from 1880 to 2013, the number of children of each sex given each name. All names used 5 or more times are included. 1,792,091 rows, 5 columns (year, sex, name, n, prop). (Source: Social security administration).
  • fueleconomy::vehicles: Fuel economy data for all cars sold in the US from 1984 to 2015. 33,442 rows, 12 variables. (Source: Environmental protection agency)
  • nasaweather::atmos: Data from the 2006 ASA data expo. Contains monthly atmospheric measurements from Jan 1995 to Dec 2000 on 24 x 24 grid over Central America. 41,472 observations, 11 variables. (Source: ASA data expo)
  • nycflights13::flights: This package contains information about all flights that departed from NYC (i.e., EWR, JFK and LGA) in 2013: 336,776 flights with 16 variables. To help understand what causes delays, it also includes a number of other useful datasets: weather, planes, airports, airlines. (Source: Bureau of transportation statistics)

NB: since the datasets are large, I’ve tagged each data frame with the tbl_df class. If you don’t use dplyr, this has no effect. If you do use dplyr, this ensures that you won’t accidentally print thousands of rows of data. Instead, you’ll just see the first 10 rows and as many columns as will fit on screen. This makes interactive exploration much easier.

library(dplyr)
library(nycflights13)
flights
#> Source: local data frame [336,776 x 16]
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     1   1      517         2      830        11      UA  N14228
#> 2  2013     1   1      533         4      850        20      UA  N24211
#> 3  2013     1   1      542         2      923        33      AA  N619AA
#> 4  2013     1   1      544        -1     1004       -18      B6  N804JB
#> 5  2013     1   1      554        -6      812       -25      DL  N668DN
#> 6  2013     1   1      554        -4      740        12      UA  N39463
#> 7  2013     1   1      555        -5      913        19      B6  N516JB
#> 8  2013     1   1      557        -3      709       -14      EV  N829AS
#> 9  2013     1   1      557        -3      838        -8      B6  N593JB
#> 10 2013     1   1      558        -2      753         8      AA  N3ALAA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

We’re excited to announce a new release of Packrat, a tool for making R projects more isolated and reproducible by managing their package dependencies.

This release brings a number of exciting features to Packrat that significantly improve the user experience:

  • Automatic snapshots ensure that new packages installed in your project library are automatically tracked by Packrat.
  • Bundle and share your projects with packrat::bundle() and packrat::unbundle() — whether you want to freeze an analysis, or exchange it for collaboration with colleagues.
  • Packrat mode can now be turned on and off at will, allowing you to navigate between different Packrat projects in a single R session. Use packrat::on() to activate Packrat in the current directory, and packrat::off() to turn it off.
  • Local repositories (ie, directories containing R package sources) can now be specified for projects, allowing local source packages to be used in a Packrat project alongside CRAN, BioConductor and GitHub packages (see this and more with ?"packrat-options").

In addition, Packrat is now tightly integrated with the RStudio IDE, making it easier to manage project dependencies than ever. Download today’s RStudio IDE 0.98.978 release and try it out!

Packrat RStudio package pane integration

You can install the latest version of Packrat from GitHub with:

    devtools::install_github("rstudio/packrat")

Packrat will be coming to CRAN soon as well.

If you try it, we’d love to get your feedback. Leave a comment here or post in the packrat-discuss Google group.

 

tidyr is new package that makes it easy to “tidy” your data. Tidy data is data that’s easy to work with: it’s easy to munge (with dplyr), visualise (with ggplot2 or ggvis) and model (with R’s hundreds of modelling packages). The two most important properties of tidy data are:

  • Each column is a variable.
  • Each row is an observation.

Arranging your data in this way makes it easier to work with because you have a consistent way of referring to variables (as column names) and observations (as row indices). When use tidy data and tidy tools, you spend less time worrying about how to feed the output from one function into the input of another, and more time answering your questions about the data.

To tidy messy data, you first identify the variables in your dataset, then use the tools provided by tidyr to move them into columns. tidyr provides three main functions for tidying your messy data: gather(), separate() and spread().

gather() takes multiple columns, and gathers them into key-value pairs: it makes “wide” data longer. Other names for gather include melt (reshape2), pivot (spreadsheets) and fold (databases). Here’s an example how you might use gather() on a made-up dataset. In this experiment we’ve given three people two different drugs and recorded their heart rate:

library(tidyr)
library(dplyr)

messy <- data.frame(
  name = c("Wilbur", "Petunia", "Gregory"),
  a = c(67, 80, 64),
  b = c(56, 90, 50)
)
messy
#>      name  a  b
#> 1  Wilbur 67 56
#> 2 Petunia 80 90
#> 3 Gregory 64 50

We have three variables (name, drug and heartrate), but only name is currently in a column. We use gather() to gather the a and b columns into key-value pairs of drug and heartrate:

messy %>%
  gather(drug, heartrate, a:b)
#>      name drug heartrate
#> 1  Wilbur    a        67
#> 2 Petunia    a        80
#> 3 Gregory    a        64
#> 4  Wilbur    b        56
#> 5 Petunia    b        90
#> 6 Gregory    b        50

Sometimes two variables are clumped together in one column. separate() allows you to tease them apart (extract() works similarly but uses regexp groups instead of a splitting pattern or position). Take this example from stackoverflow (modified slightly for brevity). We have some measurements of how much time people spend on their phones, measured at two locations (work and home), at two times. Each person has been randomly assigned to either treatment or control.

set.seed(10)
messy <- data.frame(
  id = 1:4,
  trt = sample(rep(c('control', 'treatment'), each = 2)),
  work.T1 = runif(4),
  home.T1 = runif(4),
  work.T2 = runif(4),
  home.T2 = runif(4)
)

To tidy this data, we first use gather() to turn columns work.T1, home.T1, work.T2 and home.T2 into a key-value pair of key and time. (Only the first eight rows are shown to save space.)

tidier <- messy %>%
  gather(key, time, -id, -trt)
tidier %>% head(8)
#>   id       trt     key    time
#> 1  1 treatment work.T1 0.08514
#> 2  2   control work.T1 0.22544
#> 3  3 treatment work.T1 0.27453
#> 4  4   control work.T1 0.27231
#> 5  1 treatment home.T1 0.61583
#> 6  2   control home.T1 0.42967
#> 7  3 treatment home.T1 0.65166
#> 8  4   control home.T1 0.56774

Next we use separate() to split the key into location and time, using a regular expression to describe the character that separates them.

tidy <- tidier %>%
  separate(key, into = c("location", "time"), sep = "\\.") 
tidy %>% head(8)
#>   id       trt location time    time
#> 1  1 treatment     work   T1 0.08514
#> 2  2   control     work   T1 0.22544
#> 3  3 treatment     work   T1 0.27453
#> 4  4   control     work   T1 0.27231
#> 5  1 treatment     home   T1 0.61583
#> 6  2   control     home   T1 0.42967
#> 7  3 treatment     home   T1 0.65166
#> 8  4   control     home   T1 0.56774

The last tool, spread(), takes two columns (a key-value pair) and spreads them in to multiple columns, making “long” data wider. Spread is known by other names in other places: it’s cast in reshape2, unpivot in spreadsheets and unfold in databases. spread() is used when you have variables that form rows instead of columns. You need spread() less frequently than gather() or separate() so to learn more, check out the documentation and the demos.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It’s designed specifically for tidying data, not general reshaping. In particular, existing methods only work for data frames, and tidyr never aggregates. This makes each function in tidyr simpler: each function does one thing well. For more complicated operations you can string together multiple simple tidyr and dplyr functions with %>%.

You can learn more about the underlying principles in my tidy data paper. To see more examples of data tidying, read the vignette, vignette("tidy-data"), or check out the demos, demo(package = "tidyr"). Alternatively, check out some of the great stackoverflow answers that use tidyr. Keep up-to-date with development at http://github.com/hadley/tidyr, report bugs at http://github.com/hadley/tidyr/issues and get help with data manipulation challenges at https://groups.google.com/group/manipulatr. If you ask a question specifically about tidyr on stackoverflow, please tag it with tidyr and I’ll make sure to read it.

We’ve added a new section of articles to the Shiny Development Center. These articles explain how to create interactive documents with Shiny and R Markdown.

You’ll learn how to

  • Use R Markdown to create reproducible, dynamic reports. R Markdown offers one of the most efficient workflows for writing up your R results.

  • Create interactive documents and slideshows by embedding Shiny elements into an R Markdown report. The Shiny + R Markdown combo does more than just enhance your reports; R Markdown provides one of the quickest ways to make light weight Shiny apps.

  • Take advantage of RStudio’s built in features that support R Markdown

interactive-articles.001

Learn more at shiny.rstudio.com/articles

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 652 other followers

RStudio is an affiliated project of the Foundation for Open Access Statistics

Follow

Get every new post delivered to your Inbox.

Join 652 other followers