RStudio is excited to announce the general availability (GA) of shinyapps.io.

Shinyapps.io is an easy-to-use, secure, and scalable hosted service already used by thousands of professionals and students to deploy Shiny applications on the web. Effective today, shinyapps.io has completed beta testing and is generally available as a commercial service for anyone.

As regular readers of our blog know, Shiny is a popular free and open source R package from RStudio that simplifies the creation of interactive web applications, dashboards, and reports. Until today, Shiny Server and Shiny Server Pro were the most popular ways to share Shiny apps. Now there is a commercially supported alternative for individuals and groups who don’t have the time or resources to install and manage their own servers.

We want to thank the nearly 8,000 people who created at least one Shiny app and deployed it on shinyapps.io during its extensive alpha and beta testing phases! Your willingness to give us feedback, and to bear with us as we continuously added to its capabilities, improved the service for everyone.

If you develop Shiny applications in R and haven’t yet created a shinyapps.io account, we hope you’ll give it a try soon! We did our best to keep the pricing simple and predictable, with Free, Basic, Standard, and Professional plans. Each paid plan has features and functionality that we think will appeal to different users and can be purchased with a credit card by month or year. You can learn more about shinyapps.io pricing plans and product features on our website.

We hope to see your Shiny app on shinyapps.io soon!

RStudio’s data viewer provides a quick way to look at the contents of data frames and other column-based data in your R environment. You invoke it by clicking on the grid icon in the Environment pane, or at the console by typing View(mydata).

[Screenshot: grid icon in the Environment pane]

As part of the RStudio Preview Release, we’ve completely overhauled RStudio’s data viewer with modern features provided in part by a new interface built on DataTables.

No Row Limit

While the data viewer in 0.98 was limited to the first 1,000 rows, you can now view all the rows of your data set. RStudio loads just the portion of the data you’re looking at into the user interface, so things won’t get sluggish even when you’re working with large data sets.

[Screenshot: data viewer with no row limit]

We’ve also added fixed column headers, and support for column labels imported from SPSS and other systems.

Sorting and Filtering

RStudio isn’t designed to act like a spreadsheet, but sometimes it’s helpful to do a quick sort or filter to get some idea of the data’s characteristics before moving into reproducible data analysis. Towards that end, we’ve built some basic sorting and filtering into the new data viewer.

Sorting

Click a column once to sort data in ascending order, and again to sort in descending order. For instance, how big is the biggest diamond?

[Screenshot: sorting a column in descending order]

To clear all sorts and filters on the data, click the upper-left column header.

Filtering

Click the new Filter button to enter Filter mode, then click the white filter value box to filter a column. You might, for instance, want to look only at smaller diamonds:

[Screenshot: filtering a numeric column]

Not all data types can be filtered; at the moment, you can filter only numeric types, characters, and factors.

You can also stack filters; for instance, let’s further restrict this view to small diamonds with a Very Good cut:

[Screenshot: filtering on a factor column]

Full-Text Search

You can search the full text of your data frame using the new Search box in the upper right. This is useful for finding specific records; for instance, how many people named John were born in 2013?

[Screenshot: full-text search in the data viewer]

Live Update

If you invoke the data viewer on a variable as in View(mydata), the data viewer will (in most cases) automatically refresh whenever data in the variable changes.

You can use this feature to watch data change as you manipulate it. It continues to work even when the data viewer is popped out, a configuration that combines well with multi-monitor setups.
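
For example, here is a minimal sketch of live update in action (the diamonds data set from ggplot2 is used purely for illustration):

# Open the viewer on a data frame, then modify the variable at the console;
# the open viewer (in most cases) refreshes automatically.
library(ggplot2)
mydata <- head(diamonds, 100)
View(mydata)

# Adding a column at the console is picked up by the open viewer.
mydata$price_per_carat <- mydata$price / mydata$carat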

We hope these improvements help you understand your data more quickly and easily. Try out the RStudio Preview Release and let us know what you think!

RStudio’s code editor includes a set of lightweight Vim key bindings. You can turn these on in Tools | Global Options | Code | Editing:

[Screenshot: enabling Vim key bindings in Global Options]

For those not familiar, Vim is a popular text editor built to enable efficient text editing. It can take some practice and dedication to master Vim-style editing, but those who have done so typically swear by it. RStudio’s “vim mode” enables the use of many of the most common keyboard operations from Vim right inside RStudio.

As part of the 0.99 preview release, we’ve included an upgraded version of the ACE editor, which has a completely revamped Vim mode. This mode extends the range of Vim key bindings that are supported, and implements a number of Vim “power features” that go beyond basic text motions and editing. These include:

  • Vertical block selection via Ctrl + V. This integrates with the new multiple cursor support in ACE and allows you to type in multiple lines at once.
  • Macro playback and recording, using q{register} / @{register}.
  • Marks, which allow you to drop markers in your source and quickly jump back to them later.
  • A selection of Ex commands, such as :wq and :%s, that allow you to perform editor operations as you would in native Vim.
  • Fast in-file search with e.g. / and *, and support for JavaScript regular expressions.

We’ve also added a Vim quick reference card to the IDE that you can bring up at any time to show the supported key bindings. To see it, switch your editor to Vim mode (as described above) and type :help in Command mode.

[Screenshot: Vim quick reference card]

Whether you’re a Vim novice or power user, we hope these improvements make the RStudio IDE’s editor a more productive and enjoyable environment for you. You can try the new Vim features out now by downloading the RStudio Preview Release.

We’re busy at work on the next version of RStudio (v0.99), and this week we will be blogging about some of its noteworthy new features. If you want to try any of them out now, you can do so by downloading the RStudio Preview Release.

The first feature to highlight is a fully revamped implementation of code completion for R. We’ve always supported a limited form of completion; however, (a) it only worked on objects in the global environment, and (b) it only worked when expressly requested via the Tab key. As a result, not nearly enough users discovered or benefited from code completion. In this release, code completion is much more comprehensive.

Smarter Completion Engine

Previously, RStudio only completed variables that already existed in the global environment. Now completion is based on source code analysis, so it is provided even for objects that haven’t yet been evaluated:

[Screenshot: completions inferred from the source document]

Completions are also provided for a wide variety of specialized contexts, including dimension names in [ and [[:

[Screenshot: dimension name completions inside single and double brackets]

RStudio now provides completions for function arguments within function chains using magrittr’s %>% operator, e.g. in dplyr data transformation pipelines. Extending this behavior, we also provide the appropriate completions for the various ‘verbs’ used by dplyr:

[Screenshots: completions in a magrittr pipeline and inside dplyr verbs]
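
For instance, in a pipeline like the following (a minimal sketch using dplyr and the built-in mtcars data), column names such as cyl and mpg are offered as completions inside filter() and summarise():

library(dplyr)

mtcars %>%
  filter(cyl == 4) %>%            # column names completed inside filter()
  summarise(avg_mpg = mean(mpg))  # ...and inside summarise()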

In addition, certain functions, such as library() and require(), take package names as arguments. RStudio automatically infers whether a particular function expects a package name and provides package names as completion results:

[Screenshot: package name completions in library()]

Completion is now also S3 and S4 aware. If RStudio is able to determine which method a particular function call will be dispatched to, it will attempt to retrieve completions from that method. For example, the sort.default() method provides an extra argument, na.last, that is not available in the sort() generic. RStudio will provide completions for that argument if S3 dispatch would choose sort.default().

[Screenshot: S3-aware argument completions]
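
To make the example concrete, na.last is accepted (and completed) here because sort() dispatches to sort.default() for a plain numeric vector:

sort(c(3, NA, 1), na.last = TRUE)
#> [1]  1  3 NA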

Beyond what’s described above, there are many more places where completions are now provided:

  • For Shiny applications, completions for ui.R + server.R pairs
  • Completions for knitr options, e.g. in opts_chunk$get(), are now supplied
  • Completions for dynamic symbols within .C, .Call, .Fortran, .External

Additional Enhancements

Always On Completion

Previously, RStudio only displayed completions “on demand” in response to the Tab key. Now, RStudio proactively displays completions after a $ or ::, as well as after a period of typing inactivity. All of this behavior is configurable via the new completion options panel:

[Screenshot: code completion options panel]

File Completions

When within an RStudio project, completions will be applied recursively to all file names matching the current token. The enclosing parent directory is printed on the right:

[Screenshot: file name completions within a project]

Fuzzy Narrowing

Got a completion with an excessively long name, perhaps a Bioconductor package or a long variable or function name? RStudio now applies ‘fuzzy narrowing’ to the completion list by checking whether what you’ve typed matches a ‘subsequence’ of each completion. By subsequence, we mean a sequence of characters that need not be adjacent within the completion, so that, for example, ‘fpse’ could match ‘file_path_sans_extension’. We hope that users will quickly become accustomed to this behavior and find it very useful.

[Screenshot: fuzzy narrowing of the completion list]
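
For example, the target of the ‘fpse’ example above really is that long; fuzzy narrowing makes such names reachable with just a few keystrokes:

# Typing 'fpse' is enough to narrow the completion list to this function
# from the tools package.
tools::file_path_sans_extension("analysis.Rmd")
#> [1] "analysis"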

Trying it Out

We think that the new completion features make for a qualitatively better experience of writing R code for beginning and expert users alike. You can give the new features a try now by downloading the RStudio Preview Release. If you run into problems or have feedback on how we could make things better, let us know on our Support Forum.


I’m very pleased to announce that Epoch.com has stepped up as a sponsor for the RMySQL package.

For the last 20 years, Epoch.com has built its Internet Payment Service Provider infrastructure on open source software. Their data team, led by Szilard Pafka, PhD, has been using R for nearly a decade, developing cutting-edge data visualization, machine learning and other analytical applications. According to Epoch, “We have always believed in the value of R and in the importance of contributing to the open source community.”

This sort of sponsorship is very important to me. While I already spend most of my time working on R packages, I don’t have the skills to fix every problem. Sponsorship allows me to hire outside experts. In this case, Epoch.com’s sponsorship allowed me to work with Jeroen Ooms to improve the build system for RMySQL so that a CRAN binary is available for every platform.

Is your company interested in sponsoring other infrastructure work that benefits the whole R community? If so, please get in touch.

Sometimes the universe surprises us. In this case, it was in a good way and we genuinely appreciated it.

Earlier this week we learned that the InfoWorld Test Center staff selected RStudio as one of 32 recipients of the 2015 Technology of the Year Award.

We thought it was cool because it was completely unsolicited, we’re in very good company (some of our favorite technologies, like Docker, GitHub, node.js…even my Dell XPS 15 Touch!…were also award winners), and the description of our products was surprisingly elegant – simple and accurate.

We know InfoWorld wouldn’t have known about us if our customers hadn’t brought us to their attention.

Thank you.

[Image: InfoWorld 2015 Technology of the Year Award]

Great news for Shiny and R Markdown enthusiasts!

An Interactive Reporting Workshop with Shiny and R Markdown is coming to a city near you. Act fast, as only 20 seats are available for each workshop.

You can find out more / register by clicking on the link for your city!

East Coast
  • March 2 – Washington, DC
  • March 4 – New York, NY
  • March 6 – Boston, MA

West Coast
  • April 15 – Los Angeles, CA
  • April 17 – San Francisco, CA
  • April 20 – Seattle, WA

You’ll want to take this workshop if…

You have some experience working with R already. You should have written a number of functions, and be comfortable with R’s basic data structures (vectors, matrices, arrays, lists, and data frames).

You will learn from…

The workshop is taught by Garrett Grolemund. Garrett is the Editor-in-Chief of shiny.rstudio.com, the development center for the Shiny R package. He is also the author of Hands-On Programming with R as well as Data Science with R, a forthcoming book from O’Reilly Media. Garrett works as a Data Scientist and Chief Instructor for RStudio, Inc.

Shiny version 0.11 is available now! Notable changes include:

  • Shiny has migrated from Bootstrap 2 to Bootstrap 3 for its web front end. More on this below.
  • The old jsliders have been replaced with ion.rangeSlider. These sliders look better, are easier for users to interact with, and support updating more fields from the server side.
  • There is a new passwordInput() which can be used to create password fields.
  • New observeEvent() and eventReactive() functions greatly streamline the use of actionButton and other inputs that act more like events than reactive inputs (see the sketch below).

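Here is a minimal sketch (not from the release notes) combining passwordInput() with eventReactive(), so the output updates only when the button is clicked:

library(shiny)

ui <- fluidPage(
  passwordInput("pw", "Password"),
  actionButton("go", "Check length"),
  textOutput("len")
)

server <- function(input, output) {
  # eventReactive(): recompute only when the button is clicked,
  # not on every keystroke in the password field
  pw_len <- eventReactive(input$go, nchar(input$pw))
  output$len <- renderText(paste("Characters:", pw_len()))
}

shinyApp(ui, server)
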
For a full set of changes, see the NEWS file. To install, run:

install.packages("shiny")

We’ve also posted an article with notes on upgrading to 0.11.

Bootstrap 3 migration

In all versions of Shiny prior to 0.11, Shiny used the Bootstrap 2 framework for its web front end. Shiny generates HTML that is structured to work with Bootstrap, which makes it easy to create pages with sidebars, tabs, dropdown menus, mobile device support, and so on.

The Bootstrap development team stopped work on the Bootstrap 2 series after version 2.3.2, released over a year ago, and has since focused its efforts on Bootstrap 3. The new version of Bootstrap builds on many of the same underlying ideas, but it also has many small changes – for example, many of the CSS class names have changed.

In Shiny 0.11, we’ve moved to Bootstrap 3. For most Shiny users, the transition will be seamless; the only differences you’ll see are slight changes to fonts and spacing.

If, however, you customized any of your code to use features specific to Bootstrap 2, then you may need to update your code to work with Bootstrap 3 (see the Bootstrap migration guide for details). If you don’t want to update your code right away, you can use the shinybootstrap2 package for backward compatibility with Bootstrap 2 – using it requires adding just two lines of code. If you do use shinybootstrap2, we suggest using it just as an interim solution until you update your code for Bootstrap 3, because Shiny development going forward will use Bootstrap 3.
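
Those two lines look roughly like this (a sketch based on the shinybootstrap2 documentation; check the package README for the exact pattern):

library(shiny)
library(shinybootstrap2)

# Wrap the existing Bootstrap 2-era app in withBootstrap2()
withBootstrap2({
  shinyApp(
    ui = fluidPage(h1("Legacy app")),
    server = function(input, output) { }
  )
})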

Why is Shiny moving to Bootstrap 3? One reason is support: as mentioned earlier, Bootstrap 2 is no longer developed or supported. Another reason is that there is a dynamic community of actively developed Bootstrap 3 themes. (Themes for Bootstrap 2 also exist, but there is less development activity.) Using these themes allows you to customize the appearance of a Shiny app so that it doesn’t just look like… a Shiny app.

We’ve also created a package that makes it easy to use Bootstrap themes: shinythemes. Here’s an example using the included Flatly theme:

[Screenshot: a Shiny app using the Flatly theme]
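
Applying a theme is typically a one-line change (a minimal sketch assuming the shinytheme() helper from the package):

library(shiny)
library(shinythemes)

ui <- fluidPage(
  theme = shinytheme("flatly"),   # swap in any bundled Bootstrap 3 theme
  titlePanel("Themed app"),
  sidebarLayout(
    sidebarPanel(sliderInput("n", "Observations", 1, 100, 50)),
    mainPanel(plotOutput("hist"))
  )
)

server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

shinyApp(ui, server)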

See the shinythemes site for more screenshots and instructions on how to use it.

We’re also working on shinydashboard, a package that makes it easy to create dashboards. Here’s an example dashboard that also uses the leaflet package.

[Screenshot: example shinydashboard app using the leaflet package]

The shinydashboard package is still under development, but feel free to try it out and give us feedback.

As R users know, we’re continuously improving the RStudio IDE. This includes RStudio Server Pro, where organizations that want to deploy the IDE at scale will find a growing set of recently enhanced features.

If you’re not already familiar with RStudio Server Pro, here’s an updated summary page and a comparison to RStudio Server worth checking out. Or you can skip all of that and download a free 45-day evaluation right now!

WHAT’S NEW IN RSTUDIO SERVER PRO (v0.98.1091)

Naturally, the latest RStudio Server Pro has all of the new features found in the open source server version of the RStudio IDE. These include improvements to R Markdown document and Shiny app creation, easier R package development, better debugging and source editing, and support for Internet Explorer 10 and 11 and RHEL 7.

Recently, we added even more powerful features exclusively for RStudio Server Pro:

  • Load balancing based on factors you control. Load balancing ensures R users are automatically assigned to the best available server in a cluster.
  • Flexible resource allocation by user or group. Now you can allocate cores, set scheduler priority, control the version(s) of R and enforce memory and CPU limits.
  • New security enhancements. PAM can now be leveraged to issue Kerberos tickets, Google Accounts support has moved to OAuth 2.0, and administrators can disable access to various features.

For a more in-depth look at everything that’s changed, make sure to read the RStudio Server Pro admin guide.

THE RSTUDIO SERVER PRO BASICS

In addition to the newest features above, there are many more that make RStudio Server Pro an upgrade from the open source IDE. Here’s a quick list:

  • An administrative dashboard that provides insight into active sessions, server health, and monitoring of system-wide and per-user performance and resources
  • Authentication using system accounts, ActiveDirectory, LDAP, or Google Accounts
  • Full support for the Pluggable Authentication Module (PAM)
  • HTTP enhancements add support for SSL and keep-alive for improved performance
  • Ability to restrict access to the server by IP
  • Customizable server health checks
  • Suspend, terminate, or assume control of user sessions for assistance and troubleshooting

That’s a lot to discover! Please download the newest version of RStudio Server Pro and as always let us know how it’s working and what else you’d like to see.

I’m very pleased to announce that dplyr 0.4.0 is now available from CRAN. Get the latest version by running:

install.packages("dplyr")

dplyr 0.4.0 includes over 80 minor improvements and bug fixes, which are described in detail in the release notes. Here I wanted to draw your attention to two areas that have particularly improved since dplyr 0.3, two-table verbs and data frame support.

Two-table verbs

dplyr now has full support for all two-table verbs provided by SQL:

  • Mutating joins, which add new variables to one table from matching rows in another: inner_join(), left_join(), right_join(), full_join(). (Support for non-equi joins is planned for dplyr 0.5.0.)
  • Filtering joins, which filter observations from one table based on whether or not they match an observation in the other table: semi_join(), anti_join().
  • Set operations, which combine the observations in two data sets as if they were set elements: intersect(), union(), setdiff().

Together, these verbs should allow you to solve 95% of data manipulation problems that involve multiple tables. If any of the concepts are unfamiliar to you, I highly recommend reading the two-table vignette (and if you still don’t understand, please let me know so I can make it better.)
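
As a quick illustration (made-up data; a minimal sketch, not taken from the vignette):

library(dplyr)

band <- data_frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instruments <- data_frame(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Mutating join: adds the 'plays' variable, keeping every row of 'band'
band %>% left_join(instruments, by = "name")

# Filtering join: keeps only rows of 'band' with a match in 'instruments'
band %>% semi_join(instruments, by = "name")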

Data frames

dplyr wraps data frames in a tbl_df class. These objects are structured in exactly the same way as regular data frames, but their behaviour has been tweaked a little to make them easier to work with. The new data_frames vignette describes how dplyr works with data frames in general, and below I highlight some of the features new in 0.4.0.

Printing

The biggest difference is printing: print.tbl_df() doesn’t try to print 10,000 rows! Printing got a lot of love in dplyr 0.4, and now:

  • All print() methods invisibly return their input so you can interleave print() statements into a pipeline to see interim results.
  • If you’ve managed to produce a 0-row data frame, dplyr won’t try to print the data, but will tell you the column names and types:
    data_frame(x = numeric(), y = character())
    #> Source: local data frame [0 x 2]
    #> 
    #> Variables not shown: x (dbl), y (chr)
  • dplyr never prints row names since no dplyr method is guaranteed to preserve them:
    df <- data.frame(x = c(a = 1, b = 2, c = 3))
    df
    #>   x
    #> a 1
    #> b 2
    #> c 3
    df %>% tbl_df()
    #> Source: local data frame [3 x 1]
    #> 
    #>   x
    #> 1 1
    #> 2 2
    #> 3 3

    I don’t think using row names is a good idea because it violates one of the principles of tidy data: every variable should be stored in the same way.

    To make life a bit easier if you do have row names, you can use the new add_rownames() to turn your row names into a proper variable:

    df %>% 
      add_rownames()
    #>   rowname x
    #> 1       a 1
    #> 2       b 2
    #> 3       c 3

    (But you’re better off never creating them in the first place.)

  • options(dplyr.print_max) is now 20, so dplyr will never print more than 20 rows of data (previously it was 100). The best way to see more rows of data is to use View().

Coercing lists to data frames

When you have a list of vectors of equal length that you want to turn into a data frame, dplyr provides as_data_frame() as a simple alternative to as.data.frame(). as_data_frame() is considerably faster than as.data.frame() because it does much less:

l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters
microbenchmark::microbenchmark(
  as_data_frame(l),
  as.data.frame(l)
)
#> Unit: microseconds
#>              expr      min        lq   median        uq      max neval
#>  as_data_frame(l)  101.856  112.0615  124.855  143.0965  254.193   100
#>  as.data.frame(l) 1402.075 1466.6365 1511.644 1635.1205 3007.299   100

It’s difficult to precisely describe what as.data.frame(x) does, but it’s similar to do.call(cbind, lapply(x, data.frame)) – it coerces each component to a data frame and then cbind()s them all together.

The speed of as.data.frame() is not usually a bottleneck in interactive use, but it can be a problem when combining thousands of lists into one tidy data frame (this is common when working with data stored in JSON or XML).

Binding rows and columns

dplyr now provides bind_rows() and bind_cols() for binding data frames together. Compared to rbind() and cbind(), the functions:

  • Accept either individual data frames, or a list of data frames:
    a <- data_frame(x = 1:5)
    b <- data_frame(x = 6:10)
    
    bind_rows(a, b)
    #> Source: local data frame [10 x 1]
    #> 
    #>    x
    #> 1  1
    #> 2  2
    #> 3  3
    #> 4  4
    #> 5  5
    #> .. .
    bind_rows(list(a, b))
    #> Source: local data frame [10 x 1]
    #> 
    #>    x
    #> 1  1
    #> 2  2
    #> 3  3
    #> 4  4
    #> 5  5
    #> .. .

    If x is a list of data frames, bind_rows(x) is equivalent to do.call(rbind, x).

  • Are much faster:
    dfs <- replicate(100, data_frame(x = runif(100)), simplify = FALSE)
    microbenchmark::microbenchmark(
      do.call("rbind", dfs),
      bind_rows(dfs)
    )
    #> Unit: microseconds
    #>                   expr      min        lq   median        uq       max
    #>  do.call("rbind", dfs) 5344.660 6605.3805 6964.236 7693.8465 43457.061
    #>         bind_rows(dfs)  240.342  262.0845  317.582  346.6465  2345.832
    #>  neval
    #>    100
    #>    100

(Generally you should avoid bind_cols() in favour of a join; otherwise check carefully that the rows are in a compatible order).

List-variables

Data frames are usually made up of a list of atomic vectors that all have the same length. However, it’s also possible to have a variable that’s a list, which I call a list-variable. Because of data.frame()’s complex coercion rules, the easiest way to create a data frame containing a list-variable is with data_frame():

data_frame(x = 1, y = list(1), z = list(list(1:5, "a", "b")))
#> Source: local data frame [1 x 3]
#> 
#>   x        y         z
#> 1 1 <dbl[1]> <list[3]>

Note how list-variables are printed: a list-variable could contain a lot of data, so dplyr only shows a brief summary of the contents. List-variables are useful for:

  • Working with summary functions that return more than one value:
    qs <- mtcars %>%
      group_by(cyl) %>%
      summarise(y = list(quantile(mpg)))
    
    # Unnest the output to collapse the quantiles into rows
    qs %>% tidyr::unnest(y)
    #> Source: local data frame [15 x 2]
    #> 
    #>    cyl    y
    #> 1    4 21.4
    #> 2    4 22.8
    #> 3    4 26.0
    #> 4    4 30.4
    #> 5    4 33.9
    #> .. ...  ...
    
    # To extract individual elements into columns, wrap the result in rowwise()
    # then use summarise()
    qs %>% 
      rowwise() %>% 
      summarise(q25 = y[2], q75 = y[4])
    #> Source: local data frame [3 x 2]
    #> 
    #>     q25   q75
    #> 1 22.80 30.40
    #> 2 18.65 21.00
    #> 3 14.40 16.25
  • Keeping associated data frames and models together:
    by_cyl <- split(mtcars, mtcars$cyl)
    models <- lapply(by_cyl, lm, formula = mpg ~ wt)
    
    data_frame(cyl = c(4, 6, 8), data = by_cyl, model = models)
    #> Source: local data frame [3 x 3]
    #> 
    #>   cyl            data   model
    #> 1   4 <S3:data.frame> <S3:lm>
    #> 2   6 <S3:data.frame> <S3:lm>
    #> 3   8 <S3:data.frame> <S3:lm>

dplyr’s support for list-variables continues to mature. In 0.4.0, you can join and row bind list-variables and you can create them in summarise and mutate.

My vision of list-variables is still partial and incomplete, but I’m convinced that they will make pipeable APIs for modelling much easier. See the draft lowliner package for more explorations in this direction.

Bonus

My colleague, Garrett, helped me make a cheat sheet that summarizes the data wrangling features of dplyr 0.4.0. You can download it from RStudio’s new gallery of R cheat sheets.

[Image: Data Wrangling cheat sheet]
