You are currently browsing the monthly archive for January 2015.
Sometimes the universe surprises us. In this case, it was in a good way and we genuinely appreciated it.
Earlier this week we learned that the Infoworld Testing Center staff selected RStudio as one of 32 recipients of the 2015 Technology of the Year Award.
We thought it was cool because it was completely unsolicited, we’re in very good company (some of our favorite technologies like Docker, Github, node.js…even my Dell XPS 15 Touch!…were also award winners) and the description of our products was surprisingly elegant – simple and accurate.
We know Infoworld wouldn’t have known about us if our customers hadn’t brought us to their attention.
Great news for Shiny and R Markdown enthusiasts!
An Interactive Reporting Workshop with Shiny and R Markdown is coming to a city near you. Act fast as only 20 seats are available for each workshop.
You can find out more / register by clicking on the link for your city!
| East Coast | West Coast |
|---|---|
| March 2 – Washington, DC | April 15 – Los Angeles, CA |
| March 4 – New York, NY | April 17 – San Francisco, CA |
| March 6 – Boston, MA | April 20 – Seattle, WA |
You’ll want to take this workshop if…
You have some experience working with R already. You should have written a number of functions, and be comfortable with R’s basic data structures (vectors, matrices, arrays, lists, and data frames).
You will learn from…
The workshop is taught by Garrett Grolemund. Garrett is the Editor-in-Chief of shiny.rstudio.com, the development center for the Shiny R package. He is also the author of Hands-On Programming with R as well as Data Science with R, a forthcoming book by O’Reilly Media. Garrett works as a Data Scientist and Chief Instructor for RStudio, Inc.
Shiny version 0.11 is available now! Notable changes include:
- Shiny has migrated from Bootstrap 2 to Bootstrap 3 for its web front end. More on this below.
- The old jsliders have been replaced with ion.rangeSlider. These sliders look better, are easier for users to interact with, and support updating more fields from the server side.
- There is a new passwordInput() which can be used to create password fields.
- New observeEvent() and eventReactive() functions greatly streamline the use of actionButton and other inputs that act more like events than reactive inputs.
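As a rough illustration of the last two items, here is a minimal sketch of an app that combines passwordInput(), actionButton() and eventReactive(). The input and output IDs ("pw", "go", "status") are made up for this example, and the app is only built, not launched.

```r
library(shiny)

# Hypothetical IDs ("pw", "go", "status") chosen only for illustration
ui <- fluidPage(
  passwordInput("pw", "Password"),
  actionButton("go", "Check"),
  textOutput("status")
)

server <- function(input, output, session) {
  # eventReactive() recomputes only when the button is clicked,
  # not on every keystroke in the password field
  pw_length <- eventReactive(input$go, {
    nchar(input$pw)
  })

  output$status <- renderText({
    paste("Password length at last click:", pw_length())
  })
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

Because the reactive depends on the button rather than on the text field, the displayed value changes only when the user clicks.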
For a full set of changes, see the NEWS file. To install, run install.packages("shiny").
We’ve also posted an article with notes on upgrading to 0.11.
Bootstrap 3 migration
In all versions of Shiny prior to 0.11, Shiny has used the Bootstrap 2 framework for its web front-end. Shiny generates HTML that is structured to work with Bootstrap, and this makes it easy to create pages with sidebars, tabs, dropdown menus, mobile device support, and so on.
The Bootstrap development team stopped development on the Bootstrap 2 series after version 2.3.2, which was released over a year ago, and has since focused their efforts on Bootstrap 3. The new version of Bootstrap builds on many of the same underlying ideas, but it also has many small changes – for example, many of the CSS class names have changed.
In Shiny 0.11, we’ve moved to Bootstrap 3. For most Shiny users, the transition will be seamless; the only differences you’ll see are slight changes to fonts and spacing.
If, however, you customized any of your code to use features specific to Bootstrap 2, then you may need to update your code to work with Bootstrap 3 (see the Bootstrap migration guide for details). If you don’t want to update your code right away, you can use the shinybootstrap2 package for backward compatibility with Bootstrap 2 – using it requires adding just two lines of code. If you do use shinybootstrap2, we suggest using it just as an interim solution until you update your code for Bootstrap 3, because Shiny development going forward will use Bootstrap 3.
Why is Shiny moving to Bootstrap 3? One reason is support: as mentioned earlier, Bootstrap 2 is no longer developed and is no longer supported. Another reason is that there is a dynamic community of actively developed Bootstrap 3 themes. (Themes for Bootstrap 2 also exist, but there is less development activity.) Using these themes will allow you to customize the appearance of a Shiny app so that it doesn’t just look like… a Shiny app.
We’ve also created a package that makes it easy to use Bootstrap themes: shinythemes. Here’s an example using the included Flatly theme:
See the shinythemes site for more screenshots and instructions on how to use it.
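For readers who want to try a theme right away, the following is a minimal sketch, assuming the shinythemes package is installed. The page contents (a slider and a histogram) and their IDs are placeholders invented for this example; only the theme argument matters here.

```r
library(shiny)
library(shinythemes)

# shinytheme("flatly") returns the path to the bundled Flatly stylesheet;
# passing it as `theme` swaps it in for the default Bootstrap CSS
ui <- fluidPage(
  theme = shinytheme("flatly"),
  titlePanel("Themed app"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n", "Observations", min = 10, max = 100, value = 50)
    ),
    mainPanel(plotOutput("hist"))
  )
)

server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui, server)
```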
The shinydashboard package is still under development, but feel free to try it out and give us feedback.
As R users know, we’re continuously improving the RStudio IDE. This includes RStudio Server Pro, where organizations who want to deploy the IDE at scale will find a growing set of features recently enhanced for them.
If you’re not already familiar with RStudio Server Pro, here’s an updated summary page and a comparison to RStudio Server worth checking out. Or you can skip all of that and download a free 45-day evaluation right now!
WHAT’S NEW IN RSTUDIO SERVER PRO (v0.98.1091)
Naturally, the latest RStudio Server Pro has all of the new features found in the open source server version of the RStudio IDE. They include improvements to R Markdown document and Shiny app creation, making R package development easier, better debugging and source editing, and support for Internet Explorer 10 and 11 and RHEL 7.
Recently, we added even more powerful features exclusively for RStudio Server Pro:
- Load balancing based on factors you control. Load balancing ensures R users are automatically assigned to the best available server in a cluster.
- Flexible resource allocation by user or group. Now you can allocate cores, set scheduler priority, control the version(s) of R and enforce memory and CPU limits.
- New security enhancements. Leverage PAM to issue Kerberos tickets, move Google Accounts support to OAuth 2.0, and allow administrators to disable access to various features.
For a more in-depth list of what’s changed, make sure to read the RStudio Server Pro admin guide.
THE RSTUDIO SERVER PRO BASICS
In addition to the newest features above there are many more that make RStudio Server Pro an upgrade to the open source IDE. Here’s a quick list:
- An administrative dashboard that provides insight into active sessions, server health, and monitoring of system-wide and per-user performance and resources
- Authentication using system accounts, ActiveDirectory, LDAP, or Google Accounts
- Full support for the Pluggable Authentication Module (PAM)
- HTTP enhancements add support for SSL and keep-alive for improved performance
- Ability to restrict access to the server by IP
- Customizable server health checks
- Suspend, terminate, or assume control of user sessions for assistance and troubleshooting
That’s a lot to discover! Please download the newest version of RStudio Server Pro and as always let us know how it’s working and what else you’d like to see.
I’m very pleased to announce that dplyr 0.4.0 is now available from CRAN. Get the latest version by running install.packages("dplyr").
dplyr 0.4.0 includes over 80 minor improvements and bug fixes, which are described in detail in the release notes. Here I want to draw your attention to two areas that have particularly improved since dplyr 0.3: two-table verbs and data frame support.
Two-table verbs
dplyr now has full support for all two-table verbs provided by SQL:
- Mutating joins, which add new variables to one table from matching rows in another: inner_join(), left_join(), right_join(), full_join(). (Support for non-equi joins is planned for dplyr 0.5.0.)
- Filtering joins, which filter observations from one table based on whether or not they match an observation in the other table: semi_join(), anti_join().
- Set operations, which combine the observations in two data sets as if they were set elements: intersect(), union(), setdiff().
Together, these verbs should allow you to solve 95% of data manipulation problems that involve multiple tables. If any of the concepts are unfamiliar to you, I highly recommend reading the two-table vignette (and if you still don’t understand, please let me know so I can make it better.)
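To make the three families concrete, here is a small sketch using two made-up tables (band and plays are invented for this example and do not come from the vignette). It uses plain data frames so that it does not depend on any particular dplyr version.

```r
library(dplyr)

# Hypothetical example tables, invented for illustration
band <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
plays <- data.frame(
  name = c("John", "Paul", "Keith"),
  instrument = c("guitar", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# Mutating joins: add columns from `plays` to matching rows of `band`
inner <- inner_join(band, plays, by = "name")  # only names found in both
left  <- left_join(band, plays, by = "name")   # every row of band
full  <- full_join(band, plays, by = "name")   # every name from either table

# Filtering joins: keep or drop rows of `band`, without adding columns
semi <- semi_join(band, plays, by = "name")    # rows with a match in plays
anti <- anti_join(band, plays, by = "name")    # rows without a match

# Set operations: treat whole rows as set elements
a <- data.frame(x = 1:3)
b <- data.frame(x = 2:4)
common <- intersect(a, b)  # rows in both
either <- union(a, b)      # distinct rows from both
only_a <- setdiff(a, b)    # rows in a but not in b
```

Note that the filtering joins never duplicate rows of the first table, which is what distinguishes semi_join() from inner_join() when the second table has repeated keys.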
dplyr wraps data frames in a tbl_df class. These objects are structured in exactly the same way as regular data frames, but their behaviour has been tweaked a little to make them easier to work with. The new data_frames vignette describes how dplyr works with data frames in general, and below I highlight some of the features new in 0.4.0.
The biggest difference is printing: print.tbl_df() doesn’t try to print 10,000 rows! Printing got a lot of love in dplyr 0.4 and now:
- All print() methods invisibly return their input so you can interleave print() statements into a pipeline to see interim results.
- If you’ve managed to produce a 0-row data frame, dplyr won’t try to print the data, but will tell you the column names and types:
    data_frame(x = numeric(), y = character())
    #> Source: local data frame [0 x 2]
    #>
    #> Variables not shown: x (dbl), y (chr)
- dplyr never prints row names since no dplyr method is guaranteed to preserve them:
    df <- data.frame(x = c(a = 1, b = 2, c = 3))
    df
    #>   x
    #> a 1
    #> b 2
    #> c 3

    df %>% tbl_df()
    #> Source: local data frame [3 x 1]
    #>
    #>   x
    #> 1 1
    #> 2 2
    #> 3 3
I don’t think using row names is a good idea because it violates one of the principles of tidy data: every variable should be stored in the same way.
To make life a bit easier if you do have row names, you can use the new add_rownames() to turn your row names into a proper variable:

    df %>% add_rownames()
    #>   rowname x
    #> 1       a 1
    #> 2       b 2
    #> 3       c 3
(But you’re better off never creating them in the first place.)
- options(dplyr.print_max) is now 20, so dplyr will never print more than 20 rows of data (previously it was 100). The best way to see more rows of data is to use View().
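As a small sketch of the first point, a print() call can be dropped into the middle of a pipeline to inspect an intermediate result. The choice of mtcars and the column name mean_mpg are just for illustration.

```r
library(dplyr)

# print() shows the grouped data and then hands it on unchanged,
# so the pipeline continues as if the print() step were not there
result <- mtcars %>%
  group_by(cyl) %>%
  print() %>%
  summarise(mean_mpg = mean(mpg))
```

Removing the print() line leaves result unchanged, which makes this a convenient way to debug a long chain of verbs.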
Coercing lists to data frames
When you have a list of vectors of equal length that you want to turn into a data frame, dplyr provides as_data_frame() as a simple alternative to as.data.frame(). as_data_frame() is considerably faster than as.data.frame() because it does much less:
    l <- replicate(26, sample(100), simplify = FALSE)
    names(l) <- letters
    microbenchmark::microbenchmark(
      as_data_frame(l),
      as.data.frame(l)
    )
    #> Unit: microseconds
    #>              expr      min        lq   median        uq      max neval
    #>  as_data_frame(l)  101.856  112.0615  124.855  143.0965  254.193   100
    #>  as.data.frame(l) 1402.075 1466.6365 1511.644 1635.1205 3007.299   100
It’s difficult to precisely describe what as.data.frame(x) does, but it’s similar to do.call(cbind, lapply(x, data.frame)) – it coerces each component to a data frame and then cbind()s them all together.
The speed of as.data.frame() is not usually a bottleneck in interactive use, but can be a problem when combining thousands of lists into one tidy data frame (this is common when working with data stored in JSON or XML).
Binding rows and columns
dplyr now provides bind_rows() and bind_cols() for binding data frames together. Compared to rbind() and cbind(), the functions:
- Accept either individual data frames, or a list of data frames:
    a <- data_frame(x = 1:5)
    b <- data_frame(x = 6:10)

    bind_rows(a, b)
    #> Source: local data frame [10 x 1]
    #>
    #>    x
    #> 1  1
    #> 2  2
    #> 3  3
    #> 4  4
    #> 5  5
    #> .. .

    bind_rows(list(a, b))
    #> Source: local data frame [10 x 1]
    #>
    #>    x
    #> 1  1
    #> 2  2
    #> 3  3
    #> 4  4
    #> 5  5
    #> .. .
- If x is a list of data frames, bind_rows(x) is equivalent to do.call(rbind, x).
- Are much faster:
    dfs <- replicate(100, data_frame(x = runif(100)), simplify = FALSE)
    microbenchmark::microbenchmark(
      do.call("rbind", dfs),
      bind_rows(dfs)
    )
    #> Unit: microseconds
    #>                   expr      min        lq   median        uq       max
    #>  do.call("rbind", dfs) 5344.660 6605.3805 6964.236 7693.8465 43457.061
    #>         bind_rows(dfs)  240.342  262.0845  317.582  346.6465  2345.832
    #>  neval
    #>    100
    #>    100
(Generally you should avoid bind_cols() in favour of a join; otherwise check carefully that the rows are in a compatible order.)
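The warning about bind_cols() is easiest to see with a toy example. In this sketch the two tables (people and scores) are invented for illustration and deliberately list their rows in different orders.

```r
library(dplyr)

# Hypothetical tables whose rows are NOT in the same order
people <- data.frame(
  id = c(1, 2, 3),
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
scores <- data.frame(id = c(3, 1, 2), score = c(30, 10, 20))

# bind_cols() pairs rows purely by position, so the scores end up
# attached to the wrong people
by_position <- bind_cols(people, scores["score"])

# A join pairs rows by key, so the row order in either table does not matter
by_key <- left_join(people, scores, by = "id")
```

Here by_position gives person 1 a score of 30, while by_key correctly gives 10.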
Data frames are usually made up of a list of atomic vectors that all have the same length. However, it’s also possible to have a variable that’s a list, which I call a list-variable. Because of data.frame()’s complex coercion rules, the easiest way to create a data frame containing a list-column is with data_frame():
    data_frame(x = 1, y = list(1), z = list(list(1:5, "a", "b")))
    #> Source: local data frame [1 x 3]
    #>
    #>   x     y      z
    #> 1 1 <dbl> <list>
Note how list-variables are printed: a list-variable could contain a lot of data, so dplyr only shows a brief summary of the contents. List-variables are useful for:
- Working with summary functions that return more than one value:
    qs <- mtcars %>%
      group_by(cyl) %>%
      summarise(y = list(quantile(mpg)))

    # Unnest input to collapse into rows
    qs %>% tidyr::unnest(y)
    #> Source: local data frame [15 x 2]
    #>
    #>    cyl    y
    #> 1    4 21.4
    #> 2    4 22.8
    #> 3    4 26.0
    #> 4    4 30.4
    #> 5    4 33.9
    #> .. ...  ...

    # To extract individual elements into columns, wrap the result in rowwise()
    # then use summarise(); y[2] and y[4] are the 25% and 75% quantiles
    qs %>%
      rowwise() %>%
      summarise(q25 = y[2], q75 = y[4])
    #> Source: local data frame [3 x 2]
    #>
    #>     q25   q75
    #> 1 22.80 30.40
    #> 2 18.65 21.00
    #> 3 14.40 16.25
- Keeping associated data frames and models together:
    by_cyl <- split(mtcars, mtcars$cyl)
    models <- lapply(by_cyl, lm, formula = mpg ~ wt)

    data_frame(cyl = c(4, 6, 8), data = by_cyl, model = models)
    #> Source: local data frame [3 x 3]
    #>
    #>   cyl            data   model
    #> 1   4 <S3:data.frame> <S3:lm>
    #> 2   6 <S3:data.frame> <S3:lm>
    #> 3   8 <S3:data.frame> <S3:lm>
dplyr’s support for list-variables continues to mature. In 0.4.0, you can join and row bind list-variables and you can create them in summarise and mutate.
My vision of list-variables is still partial and incomplete, but I’m convinced that they will make pipeable APIs for modelling much easier. See the draft lowliner package for more explorations in this direction.
My colleague, Garrett, helped me make a cheat sheet that summarizes the data wrangling features of dplyr 0.4.0. You can download it from RStudio’s new gallery of R cheat sheets.
Jeroen Ooms and I are very pleased to announce a new version of RMySQL, the R package that allows you to talk to MySQL (and MariaDB) databases. We have taken over maintenance from Jeffrey Horner, who has done a great job of maintaining the package over the last few years, but no longer has time to look after it. Thanks for all your hard work, Jeff!
    library(DBI)
    # Connect to a public database that I'm running on Google's
    # cloud SQL service. It contains a copy of the data in the
    # datasets package.
    con <- dbConnect(RMySQL::MySQL(),
      username = "public",
      password = "F60RUsyiG579PeKdCH",
      host = "126.96.36.199",
      port = 3306,
      dbname = "datasets"
    )

    # Run a query
    dbGetQuery(con, "SELECT * FROM mtcars WHERE cyl = 4 AND mpg < 23")
    #>       row_names  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    #> 1    Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    #> 2      Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    #> 3 Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    #> 4    Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

    # It's polite to let the database know when you're done
    dbDisconnect(con)
    #> [1] TRUE
It’s generally a bad idea to put passwords in your code, so instead of typing them directly, you can create a file called ~/.my.cnf that contains:

    [cloudSQL]
    username=public
    password=F60RUsyiG579PeKdCH
    host=188.8.131.52
    port=3306
    database=datasets
Then you can connect with:
con <- dbConnect(RMySQL::MySQL(), group = "cloudSQL")
Changes in this release
RMySQL 0.10.0 is mostly a cleanup release. RMySQL is one of the oldest packages on CRAN, and according to the timestamps, it is older than many recommended packages, and only slightly younger than MASS! That explains why a facelift was well overdue.
The most important change is an improvement to the build process so that CRAN binaries are now available for Windows and OS X Mavericks. This should make your life much easier if you’re on one of these platforms. We’d love your feedback on the new build scripts. There have been many problems in the past, so we’d like to know that this client works well across platforms and versions of MySQL server.
Otherwise, the changes update RMySQL for DBI 0.3 compatibility:
- mysql*() functions are no longer exported. Please use the corresponding DBI generics instead.
- RMySQL gains transaction support with dbBegin(), dbCommit(), and dbRollback(). (But note that MySQL does not allow data definition language statements to be rolled back.)
- Added method for dbFetch(). Please use this instead of fetch(). dbFetch() now returns a 0-row data frame (instead of a 0-col data frame) if there are no results.
- Added methods for dbIsValid(). Please use these instead of isIdCurrent().
- dbWriteTable() has been rewritten. It uses a better quoting strategy, throws errors on failure, and automatically adds row names only if they’re strings. (NB: dbWriteTable() also has a method that allows you to load files directly from disk – this is likely to be faster if your file is one of the formats supported.)
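To show the shape of the transaction generics, here is a minimal sketch. So that it runs without a MySQL server, it swaps in an in-memory SQLite database via the RSQLite package; with RMySQL you would instead obtain con from dbConnect(RMySQL::MySQL(), ...). The accounts table is invented for this example, and dbExecute() is assumed to be available, which requires a DBI release newer than 0.3.

```r
library(DBI)

# Stand-in backend (an assumption, not RMySQL): in-memory SQLite
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table used only for this illustration
dbExecute(con, "CREATE TABLE accounts (id INTEGER, balance REAL)")
dbExecute(con, "INSERT INTO accounts VALUES (1, 100), (2, 50)")

# Start a transaction, make a change, then decide whether to keep it
dbBegin(con)
dbExecute(con, "UPDATE accounts SET balance = balance - 500 WHERE id = 1")

overdrawn <- dbGetQuery(
  con, "SELECT COUNT(*) AS n FROM accounts WHERE balance < 0"
)$n > 0

if (overdrawn) {
  dbRollback(con)  # undo the update
} else {
  dbCommit(con)    # make the update permanent
}

final <- dbGetQuery(con, "SELECT balance FROM accounts WHERE id = 1")$balance
dbDisconnect(con)
```

Because the update would overdraw the account, the transaction is rolled back and the balance is left at its original value.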
For a complete list of changes, please see the full release notes.
As you might have noticed, ggplot2 recently turned 1.0.0. This release incorporated a handful of new features and bug fixes, but most importantly reflects that ggplot2 is now a mature plotting system and it will not change significantly in the future.
This does not mean ggplot2 is dead! The ggplot2 community is rich and vibrant and the number of packages that build on top of ggplot2 continues to grow. We are committed to maintaining ggplot2 so that you can continue to rely on it for years to come.
The ggplot2 book
Since ggplot2 is now stable, and the ggplot2 book is over five years old and rather out of date, I’m also happy to announce that I’m working on a second edition. I’ll be ably assisted in this endeavour by Carson Sievert, who’s so far done a great job of converting the source to Rmd and updating many of the examples to work with ggplot2 1.0.0. In the coming months we’ll be rewriting the data chapter to reflect modern best practices (e.g. tidyr and dplyr), and adding sections about new features.
We’d love your help! The source code for the book is available on GitHub. If you’ve spotted any mistakes in the first edition that you’d like to correct, we’d really appreciate a pull request. If there’s a particular section of the book that you think needs an update (or is just plain missing), please let us know by filing an issue. Unfortunately we can’t turn the book into a free website because of my agreement with the publisher, but at least you can now easily get to the source.
RStudio is happy to announce the availability of the shinyapps.io beta.
Shinyapps.io is an easy to use, secure, and scalable hosted service already being used by thousands of professionals and students to deploy Shiny applications on the web. Today we are releasing a significant upgrade as we transition from alpha to beta, the final step before general availability (GA) later this quarter.
New Feature Highlights in shinyapps.io beta
- Secure and manage authorized users with support for new authentication systems, including Google, GitHub, or a shinyapps.io account.
- Tune application performance by controlling the resources available. Run multiple R processes per application instance and add application instances.
- Track performance metrics and simplify application management in a new shinyapps.io dashboard. See an application’s active connections, CPU, memory, and network usage. Review application logs, start, stop, restart, rebuild and archive applications all from one convenient place.
During the beta period, these and all other features in shinyapps.io are available at no charge. At the end of the beta, users may subscribe to a plan of their choice or transition their applications to the free plan.
If you do not already have an account, we encourage you, and anyone developing Shiny applications, to try the shinyapps.io beta. We appreciate any and all feedback on our features or proposed packaging and pricing.
Happy New Year!