You are currently browsing the category archive for the ‘News’ category.

If big data is your thing, you use R, and you’re headed to Strata + Hadoop World in San Jose March 13 & 14th, you can experience in person how easy and practical it is to analyze big data with R and Spark.

In a beginner level talk by RStudio’s Edgar Ruiz and an intermediate level  workshop by Win-Vector’s John Mount, we cover the spectrum: What R is, what Spark is, how Sparklyr works, and what is required to set up and tune a Spark cluster. You’ll also learn practical applications including: how to quickly set up a local Spark instance, store big data in Spark and then connect to the data with R, use R to apply machine-learning algorithms to big data stored in Spark, and filter and aggregate big data stored in Spark and then import the results into R for analysis and visualization.

2:40pm–3:20pm Wednesday, March 15, 2017
Sparklyr: An R interface for Apache Spark
Edgar Ruiz (RStudio)
Primary topic: Spark & beyond
Location: LL21 C/D
Level: Beginner
Secondary topics: R

1:30pm–5:00pm Tuesday, March 14, 2017
Modeling big data with R, sparklyr, and Apache Spark
John Mount (Win-Vector LLC)
Primary topic: Data science & advanced analytics
Location: LL21 C/D
Level: Intermediate
Secondary topics: R

While you’re  at the conference be sure to look us up in the Innovator’s Pavilion – booth number P8 during the Expo Hall hours. We’ll have the latest books from RStudio authors, t-shirts to win, demonstrations of RStudio Connect and RStudio Server Pro and, of course, stickers and cheatsheets. Share with us what you’re doing with RStudio and get your product and company questions answered by RStudio employees.

See you in San Jose! (https://conferences.oreilly.com/strata/strata-ca)

We’re excited to announce the latest release of RStudio Connect: version 1.4.2. This release includes a number of notable features including an overhauled interface for parameterized R Markdown reports.

Enhanced Parameterized R Markdown Reports

Enhanced Parameterized R Markdown Reports

The most notable feature in this release is the ability to publish parameterized R Markdown reports that are easier for anyone to customize. If you’re unfamiliar, parameterized R Markdown reports allow you to inject input parameters into your R Markdown document to alter what analysis the report performs. The parameters of your R Markdown report are now visible on the left-hand sidebar, allowing users to easily tweak the inputs to the document and quickly view the output in the browser.

Users even have the opportunity to create private versions of the report which they can schedule to run again, email, or save and revisit in the browser. Of course, you can continue to use the wide variety of output formats (notebooks, dashboards, books, and others) while using parameterized R Markdown.

In addition to the parameterized report overhaul, there are some other notable features included in this release.

  • Content private by default – Content is set to private (“Just Me”) by default. Users can still change the visibility of their content before publishing, as before.
  • Execute R as the authenticated viewer – You can now choose to have some applications execute their underlying R process as the authenticated viewer currently looking at the app. This allows applications to access any data or resource that the associated user has access to on the server. Requires PAM authentication. More details here.

Other important features include:

  • Show progress indicator when updating a report.
  • Users can now filter content to include only items that they can edit or view.
  • Users now only count against the named user license limit after they log in for the first time.
  • Added support for global “System Messages” that can display an HTML message to your users on the landing pages. Details here.
  • Updated packrat to gain more transparency on package build errors.
  • Updated the list of SSL ciphers to correspond with modern best-practices.

If you haven’t yet had a chance to download and try RStudio Connect we encourage you to do so. RStudio Connect is the best way to share all the work that you do in R (Shiny apps, R Markdown documents, plots, dashboards, etc.) with collaborators, colleagues, or customers.

You can find more details or download a 45 day evaluation of the product at https://www.rstudio.com/products/connect/. Additional resources can be found below.

We’re thrilled to officially introduce the newest product in RStudio’s product lineup: RStudio Connect.

You can download a free 45-day trial of it here.

RStudio Connect is a new publishing platform for all the work your teams do in R. It provides a single destination for your Shiny applications, R Markdown documents, interactive HTML widgets, static plots, and more.

RStudio Connect Settings

RStudio Connect isn’t just for R users. Now anyone can interact with custom built analytical data products developed by R users without having to program in R themselves. Team members can receive updated reports built on the same models/forecasts which can be configured to be rebuilt and distributed on a scheduled basis. RStudio Connect is designed to bring the power of data science to your entire enterprise.

RStudio Connect empowers analysts to share and manage the content they’ve created in R. Users of the RStudio IDE can publish content to RStudio Connect with the click of a button and immediately be able to manage that content from a user-friendly web application: setting access controls and performance settings and viewing the logs of the associated R processes on the server.

Deploying content from the RStudio IDE into RStudio Connect

RStudio Connect is on-premises software that you can install on a server behind your firewall ensuring that your data and R applications never have to leave your organization’s control. We integrate with many enterprise authentication platform including LDAP/Active Directory, Google OAuth, PAM, and proxied authentication. We also provide an option to use an internal username/password system complete with user self-sign-up.

RStudio Connect Admin Metrics

RStudio Connect has been in Beta for almost a year. We’ve had hundreds of customers validate and help us improve the software in that time. In November, we made RStudio Connect generally available without significant fanfare and began to work with Beta participants and existing RStudio customers eager to move it into their production environments. We are pleased that innovative early customers, like AdRoll, have already successfully introduced RStudio Connect into their data science process.

“At AdRoll, we have used the open source version of Shiny Server for years to great success but deploying apps always served as a barrier for new users. With RStudio Connect’s push button deployment from the RStudio IDE, the number of shiny devs has grown tremendously both in engineering and across teams completely new to shiny like finance and marketing. It’s been really powerful for those just getting started to be able to go from developing locally to sharing apps with others in just seconds.”

– Bryan Galvin, Senior Data Scientist, AdRoll

We invite you to take a look at RStudio Connect today, too!

You can find more details or download a 45 day evaluation of the product at https://www.rstudio.com/products/connect/. Additional resources can be found below.

On October 12, RStudio launched R Views with great enthusiasm. R Views is a new blog for R users about the R Community and the R Language. Under the care of editor-in-chief and new RStudio ambassador-at-large, Joseph Rickert, R Views provides a new perspective on R and RStudio that we like to think will become essential reading for you.

You may have read an R Views post already. In the first, widely syndicated, post, Joseph interviewed J.J. Allaire, RStudio’s founder, CEO and most prolific software developer. Later posts by Mine Cetinkaya-Rundel on Highcharts and thoughtful book reviews, new R package picks, and a primer on Naive Bayes from Joseph rounded out the first month. Each post was entirely different from anything you could have read here, on what we now call our Developer Blog at rstudio.org.

Fortunately, you don’t have to choose. Each has its purpose. Our Developer Blog is the place to go for RStudio news. You’ll find product announcements, events, and company happenings – like the announcement of a new blog – right here. R Views is about R in action. You’ll find stories and solutions and opinions that we hope will educate and challenge you.

Subscribe to each and stay up to date on all things R and RStudio!

Thanks for making R and RStudio part of your data science experience and for supporting our work.

Today we’re very pleased to announce the availability of RStudio Version 1.0! Version 1.0 is our 10th major release since the initial launch in February 2011 (see the full release history below), and our biggest ever! Highlights include:

  • Authoring tools for R Notebooks.
  • Integrated support for the sparklyr package (R interface to Spark).
  • Performance profiling via integration with the profvis package.
  • Enhanced data import tools based on the readr, readxl and haven packages.
  • Authoring tools for R Markdown websites and the bookdown package.
  • Many other miscellaneous enhancements and bug fixes.

We hope you download version 1.0 now and as always let us know what you think.

R Notebooks

R Notebooks add a powerful notebook authoring engine to R Markdown. Notebook interfaces for data analysis have compelling advantages including the close association of code and output and the ability to intersperse narrative with computation. Notebooks are also an excellent tool for teaching and a convenient way to share analyses.

Interactive R Markdown

As an authoring format, R Markdown bears many similarities to traditional notebooks like Jupyter and Beaker. However, code in notebooks is typically executed interactively, one cell at a time, whereas code in R Markdown documents is typically executed in batch.

R Notebooks bring the interactive model of execution to your R Markdown documents, giving you the capability to work quickly and iteratively in a notebook interface without leaving behind the plain-text tools, compatibility with version control, and production-quality output you’ve come to rely on from R Markdown.

Iterate Quickly

In a typical R Markdown document, you must re-knit the document to see your changes, which can take some time if it contains non-trivial computations. R Notebooks, however, let you run code and see the results in the document immediately. They can include just about any kind of content R produces, including console output, plots, data frames, and interactive HTML widgets.

screen-shot-2016-09-20-at-4-16-47-pm

You can see the progress of the code as it runs:

screen-shot-2016-09-21-at-10-52-02-am

You can preview the results of individual inline expressions, too:

notebook-inline-output

Even your LaTeX equations render in real-time as you type:

notebook-mathjax

This focused mode of interaction doesn’t require you to keep the console, viewer, or output panes open. Everything you need is at your fingertips in the editor, reducing distractions and helping you concentrate on your analysis. When you’re done, you’ll have a formatted, reproducible record of what you’ve accomplished, with plenty of context, perfect for your own records or sharing with others.

Spark with sparklyr

The sparklyr package is a new R interface for Apache Spark. RStudio now includes integrated support for Spark and the sparklyr package, including tools for:

  • Creating and managing Spark connections
  • Browsing the tables and columns of Spark DataFrames
  • Previewing the first 1,000 rows of Spark DataFrames

Once you’ve installed the sparklyr package, you should find a new Spark pane within the IDE. This pane includes a New Connection dialog which can be used to make connections to local or remote Spark instances:

Once you’ve connected to Spark you’ll be able to browse the tables contained within the Spark cluster:

The Spark DataFrame preview uses the standard RStudio data viewer:

Profiling with profvis

“How can I make my code faster?”

If you write R code, then you’ve probably asked yourself this question. A profiler is an important tool for doing this: it records how the computer spends its time, and once you know that, you can focus on the slow parts to make them faster.

RStudio now includes integrated support for profiling R code and for visualizing profiling data. R itself has long had a built-in profiler, and now it’s easier than ever to use the profiler and interpret the results.

To profile code with RStudio, select it in the editor, and then click on Profile -> Profile Selected Line(s). R will run that code with the profiler turned on, and then open up an interactive visualization.

In the visualization, there are two main parts: on top, there is the code with information about the amount of time spent executing each line, and on the bottom there is a flame graph, which shows what R was doing over time. In the flame graph, the horizontal direction represents time, moving from left to right, and the vertical direction represents the call stack, which are the functions that are currently being called. (Each time a function calls another function, it goes on top of the stack, and when a function exits, it is removed from the stack.)

profile.png

The Data tab contains a call tree, showing which function calls are most expensive:

Profiling data pane

Armed with this information, you’ll know what parts of your code to focus on to speed things up!

Data Import

RStudio now integrates with the readr, readxl, and haven packages to provide comprehensive tools for importing data from many text file formats, Excel worksheets, as well as SAS, Stata, and SPSS data files. The tools are focused on interactively refining an import then providing the code required to reproduce the import on new datasets.

For example, here’s the workflow we would use to import the Excel worksheet at http://www.fns.usda.gov/sites/default/files/pd/slsummar.xls.

First provide the dataset URL and review the import in preview mode (notice that this file contains two tables and as a result requires the first few rows to be removed):

We can clean this up by skipping 6 rows from this file and unchecking the “First Row as Names” checkbox:

The file is looking better but some columns are being displayed as strings when they are clearly numerical data. We can fix this by selecting “numeric” from the column drop-down:

The final step is to click “Import” to run the code displayed under “Code Preview” and import the data into R. The code is executed within the console and imported dataset is displayed automatically:

Note that rather than executing the import we could have just copied and pasted the import code and included it within any R script.

RStudio Release History

We started working on RStudio in November of 2008 (8 years ago!) and had our first public release in February of 2011. Here are highlights of the various releases through the years:

Version Date Highlights
0.92 Feb 2011
  • Initial public release
0.93 Apr 2011
  • Interactive plotting with manipulate
  • Source editor themes
  • Configurable workspace layout
0.94 Jun 2011
  • Enhanced plot export
  • Enhanced package installation and management
  • Enhanced history management
0.95 Jan 2012
  • RStudio project system
  • Code navigation (typeahead search, go to definition)
  • Version control integration (Git and Subversion)
0.96 May 2012
  • Enhanced authoring for Sweave
  • Web publishing with R Markdown
  • Code folding and many other editing enhancements
0.97 Oct 2012
  • Package development tools
  • Vim editing mode
  • More intelligent R auto-indentation
0.98 Dec 2013
  • Interactive debugging tools
  • Enhanced environment pane
  • Viewer pane for web content / htmlwidgets
0.98b Jun 2014
  • R Markdown v2 (publish to PDF, Word, and more)
  • Integrated tools for Shiny application development
  • Editor support for XML, SQL, Python, and Bash
0.99 May 2015
  • Data viewer with support for large datasets, filtering, searching, and sorting
  • Major enhancements to R and C/C++ code completion and inline code diagnostics
  • Multiple cursors, tab re-ordering, enhanced Vim mode
0.99b Feb 2016
  • Emacs editing mode
  • Multi-window source editing
  • Customizable keyboard shortcuts
  • RStudio Addins
1.0 Nov 2016
  • Authoring tools for R Notebooks
  • Integrated support for sparklyr (R interface to Spark)
  • Enhanced data import tools
  • Performance profiling via integration with profvis

The RStudio Release History page on our support website provides a complete history of all major and minor point releases.

 

We’re excited today to announce sparklyr, a new package that provides an interface between R and Apache Spark.

Over the past couple of years we’ve heard time and time again that people want a native dplyr interface to Spark, so we built one! sparklyr also provides interfaces to Spark’s distributed machine learning algorithms and much more. Highlights include:

  • Interactively manipulate Spark data using both dplyr and SQL (via DBI).
  • Filter and aggregate Spark datasets then bring them into R for analysis and visualization.
  • Orchestrate distributed machine learning from R using either Spark MLlib or H2O SparkingWater.
  • Create extensions that call the full Spark API and provide interfaces to Spark packages.
  • Integrated support for establishing Spark connections and browsing Spark data frames within the RStudio IDE.

We’re also excited to be working with several industry partners. IBM is incorporating sparklyr into their Data Science Experience, Cloudera is working with us to ensure that sparklyr meets the requirements of their enterprise customers, and H2O has provided an integration between sparklyr and H2O Sparkling Water.

Getting Started

You can install sparklyr from CRAN as follows:

install.packages("sparklyr")

You should also install a local version of Spark for development purposes:

library(sparklyr)
spark_install(version = "1.6.2")

If you use the RStudio IDE, you should also download the latest preview release of the IDE which includes several enhancements for interacting with Spark.

Extensive documentation and examples are available at http://spark.rstudio.com.

Connecting to Spark

You can connect to both local instances of Spark as well as remote Spark clusters. Here we’ll connect to a local instance of Spark:

library(sparklyr)
sc <- spark_connect(master = "local")

The returned Spark connection (sc) provides a remote dplyr data source to the Spark cluster.

Reading Data

You can copy R data frames into Spark using the dplyr copy_to function (more typically though you’ll read data within the Spark cluster using the spark_read family of functions). For the examples below we’ll copy some datasets from R into Spark (note that you may need to install the nycflights13 and Lahman packages in order to execute this code):

library(dplyr)
iris_tbl <- copy_to(sc, iris)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
batting_tbl <- copy_to(sc, Lahman::Batting, "batting")

Using dplyr

We can now use all of the available dplyr verbs against the tables within the cluster. Here’s a simple filtering example:

# filter by departure delay
flights_tbl %>% filter(dep_delay == 2)

Introduction to dplyr provides additional dplyr examples you can try. For example, consider the last example from the tutorial which plots data on flight delays:

delay <- flights_tbl %>% 
  group_by(tailnum) %>%
  summarise(count = n(), dist = mean(distance), delay = mean(arr_delay)) %>%
  filter(count > 20, dist < 2000, !is.na(delay)) %>%
  collect()

# plot delays
library(ggplot2)
ggplot(delay, aes(dist, delay)) +
  geom_point(aes(size = count), alpha = 1/2) +
  geom_smooth() +
  scale_size_area(max_size = 2)
Note that while the dplyr functions shown above look identical to the ones you use with R data frames, with sparklyr they use Spark as their back end and execute remotely in the cluster.

Window Functions

dplyr window functions are also supported, for example:

batting_tbl %>%
  select(playerID, yearID, teamID, G, AB:H) %>%
  arrange(playerID, yearID, teamID) %>%
  group_by(playerID) %>%
  filter(min_rank(desc(H)) <= 2 & H > 0)

For additional documentation on using dplyr with Spark see the dplyr section of the sparklyr website.

Using SQL

It’s also possible to execute SQL queries directly against tables within a Spark cluster. The spark_connection object implements a DBI interface for Spark, so you can use dbGetQuery to execute SQL and return the result as an R data frame:

library(DBI)
iris_preview <- dbGetQuery(sc, "SELECT * FROM iris LIMIT 10")

Machine Learning

You can orchestrate machine learning algorithms in a Spark cluster via either Spark MLlib or via the H2O Sparkling Water extension package. Both provide a set of high-level APIs built on top of DataFrames that help you create and tune machine learning workflows.

Spark MLlib

In this example we’ll use ml_linear_regression to fit a linear regression model. We’ll use the built-in mtcars dataset, and see if we can predict a car’s fuel consumption (mpg) based on its weight (wt) and the number of cylinders the engine contains (cyl). We’ll assume in each case that the relationship between mpg and each of our features is linear.

# copy mtcars into spark
mtcars_tbl <- copy_to(sc, mtcars)

# transform our data set, and then partition into 'training', 'test'
partitions <- mtcars_tbl %>%
  filter(hp >= 100) %>%
  mutate(cyl8 = cyl == 8) %>%
  sdf_partition(training = 0.5, test = 0.5, seed = 1099)

# fit a linear model to the training dataset
fit <- partitions$training %>%
  ml_linear_regression(response = "mpg", features = c("wt", "cyl"))

For linear regression models produced by Spark, we can use summary() to learn a bit more about the quality of our fit, and the statistical significance of each of our predictors.

summary(fit)

Spark machine learning supports a wide array of algorithms and feature transformations, and as illustrated above it’s easy to chain these functions together with dplyr pipelines. To learn more see the Spark MLlib section of the sparklyr website.

H2O Sparkling Water

Let’s walk the same mtcars example, but in this case use H2O’s machine learning algorithms via the H2O Sparkling Water extension. The dplyr code used to prepare the data is the same, but after partitioning into test and training data we call h2o.glm rather than ml_linear_regression:

# convert to h20_frame (uses the same underlying rdd)
training <- as_h2o_frame(partitions$training)
test <- as_h2o_frame(partitions$test)

# fit a linear model to the training dataset
fit <- h2o.glm(x = c("wt", "cyl"),
               y = "mpg",
               training_frame = training,
               lamda_search = TRUE)

# inspect the model
print(fit)

For linear regression models produced by H2O, we can use either print() or summary() to learn a bit more about the quality of our fit. The summary() method returns some extra information about scoring history and variable importance.

To learn more see the H2O Sparkling Water section of the sparklyr website.

Extensions

The facilities used internally by sparklyr for its dplyr and machine learning interfaces are available to extension packages. Since Spark is a general purpose cluster computing system there are many potential applications for extensions (e.g. interfaces to custom machine learning pipelines, interfaces to 3rd party Spark packages, etc.).

The sas7bdat extension enables parallel reading of SAS datasets in the sas7bdat format into Spark data frames. The rsparkling extension provides a bridge between sparklyr and H2O’s Sparkling Water.

We’re excited to see what other sparklyr extensions the R community creates. To learn more see the Extensions section of the sparklyr website.

RStudio IDE

The latest RStudio Preview Release of the RStudio IDE includes integrated support for Spark and the sparklyr package, including tools for:

  • Creating and managing Spark connections
  • Browsing the tables and columns of Spark DataFrames
  • Previewing the first 1,000 rows of Spark DataFrames

Once you’ve installed the sparklyr package, you should find a new Spark pane within the IDE. This pane includes a New Connection dialog which can be used to make connections to local or remote Spark instances:

Once you’ve connected to Spark you’ll be able to browse the tables contained within the Spark cluster:

The Spark DataFrame preview uses the standard RStudio data viewer:

The RStudio IDE features for sparklyr are available now as part of the RStudio Preview Release. The final version of RStudio IDE that includes integrated support for sparklyr will ship within the next few weeks.

Partners

We’re very pleased to be joined in this announcement by IBM, Cloudera, and H2O, who are working with us to ensure that sparklyr meets the requirements of enterprise customers and is easy to integrate with current and future deployments of Spark.

IBM

“With our latest contributions to Apache Spark and the release of sparklyr, we continue to emphasize R as a primary data science language within the Spark community. Additionally, we are making plans to include sparklyr in Data Science Experience to provide the tools data scientists are comfortable with to help them bring business-changing insights to their companies faster,” said Ritika Gunnar, vice president of Offering Management, IBM Analytics.

Cloudera

“At Cloudera, data science is one of the most popular use cases we see for Apache Spark as a core part of the Apache Hadoop ecosystem, yet the lack of a compelling R experience has limited data scientists’ access to available data and compute,” said Charles Zedlewski, vice president, Products at Cloudera. “We are excited to partner with RStudio to help bring sparklyr to the enterprise, so that data scientists and IT teams alike can get more value from their existing skills and infrastructure, all with the security, governance, and management our customers expect.”

H2O

“At H2O.ai, we’ve been focused on bringing the best of breed open source machine learning to data scientists working in R & Python. However, the lack of robust tooling in the R ecosystem for interfacing with Apache Spark has made it difficult for the R community to take advantage of the distributed data processing capabilities of Apache Spark.

We’re excited to work with RStudio to bring the ease of use of dplyr and the distributed machine learning algorithms from H2O’s Sparkling Water to the R community via the sparklyr & rsparkling packages”

A new Shiny release is upon us! There are many new exciting features, bug fixes, and library updates. We’ll just highlight the most important changes here, but you can browse through the full changelog for details. This will likely be the last release before shiny 1.0, so get out your party hats!

To install it, you can just run:

install.packages("shiny")

Bookmarkable state

Shiny now supports bookmarkable state: users can save the state of an application and get a URL which will restore the application with that state. There are two types of bookmarking: encoding the state in a URL, and saving the state to the server. With an encoded state, the entire state of the application is contained in the URL’s query string. You can see this in action with this app: https://gallery.shinyapps.io/113-bookmarking-url/. An example of a bookmark URL for this app is https://gallery.shinyapps.io/113-bookmarking-url/?inputs&n=200. When the state is saved to the server, the URL might look something like: https://gallery.shinyapps.io/bookmark-saved/?state_id=d80625dc681e913a (note that this URL is not for an active app).

Important note: Saved-to-server bookmarking currently works with Shiny Server Open Source. Support on Shiny Server Pro, RStudio Connect, and shinyapps.io is under development and testing. However, URL-encoded bookmarking works on all hosting platforms.

See this article to get started with bookmarkable state. There is also an advanced-level article, and a modules article that details how to use bookmarking in conjunction with modules.

Notifications

Shiny can now display notifications on the client browser by using the showNotification() function. Use this demo app to play around with the notification API. For more, see our article about notifications.

Progress indicators

If your Shiny app contains computations that take a long time to complete, a progress bar can improve the user experience by communicating how far along the computation is, and how much is left. Progress bars were added in Shiny 0.10.2. In Shiny 0.14, we’ve changed them to use the notifications system, which gives them a different look.

Important note: If you were already using progress bars and had customized them with your own CSS, you can add the style = "old" argument to your withProgress() call (or Progress$new()). This will result in the same appearance as before. You can also call shinyOptions(progress.style = "old") in your app’s server function to make all progress indicators use the old styling.

To see new progress bars in action, see this app in the gallery. You can also learn more about them in here.

Modal windows

Shiny has now built-in support for displaying modal dialogs like the one below (live app here):

Modal dialog

To learn more about modal dialogs in Shiny, read the article about them.

insertUI and removeUI

Sometimes in a Shiny app, arbitrary HTML UI may need to be created on-the-fly in response to user input. The existing uiOutput and renderUI functions let you continue using reactive logic to call UI functions and make the results appear in a predetermined place in the UI. The insertUI and removeUI functions, which are used in the server code, allow you to use imperative logic to add and remove arbitrary chunks of HTML (all independent from one another), as many times as you want, whenever you want, wherever you want. This option may be more convenient when you want to, for example, add a new model to your app each time the user selects a different option (and leave previous models unchanged, rather than substitute the previous one for the latest one).

See this simple demo app of how one could use insertUI and removeUI to insert and remove text elements using a queue. Also see this other app that demonstrates how to insert and remove a few common Shiny input objects. Finally, this app shows how to dynamically insert modules using insertUI.

For more, read our article about dynamic UI generation and the reference documentation about insertUI and removeUI.

Documentation for connecting to an external database

Many Shiny users have asked about best practices for accessing external databases from their Shiny applications. Although database access has long been possible using various database connector packages in R, it can be challenging to use them robustly in the dynamic environment that Shiny provides. So far, it has been mostly up to application authors to find the appropriate database drivers and to discover how to manage the database connections within an application. In order to demystify this process, we wrote a series of articles (first one here) that covers the basics of connecting to an external database, as well as some security precautions to keep in mind (e.g. how to avoid SQL injection attacks).

There are a few packages that you should look at if you’re using a relational database in a Shiny app: the dplyr and DBI packages (both featured in the article linked to above), and the brand new pool package, which provides a further layer of abstraction to make it easier and safer to use either DBI or dplyr. pool is not yet on CRAN. In particular, pool will take care of managing connections, preventing memory leaks, and ensuring the best performance. See this pool basics article and the more advanced-level article if you’re feeling adventurous! (Both of these articles contain Shiny app examples that use DBI to connect to an external MySQL database.) If you are more comfortable with dplyr than DBI, don’t miss the article about the integration of pool and dplyr.

If you’re new to databases in the Shiny world, we recommend using dplyr and pool if possible. If you need greater control than dplyr offers (for example, if you need to modify data in the database or use transactions), then use DBI and pool. The pool package was introduced to make your life easier, but in no way constrains you, so we don’t envision any situation in which you’d be better off not using it. The only caveat is that pool is not yet on CRAN, so you may prefer to wait for that.

Others

There are many more minor features, small improvements, and bug fixes than we can cover here, so we’ll just mention a few of the more noteworthy ones. (For more, you can see the full changelog.).

  • Error Sanitization: you now have the option to sanitize error messages; in other words, the content of the original error message can be suppressed so that it doesn’t leak any sensitive information. To sanitize errors everywhere in your app, just add options(shiny.sanitize.errors = TRUE) somewhere in your app. Read this article for more, or play with the demo app.
  • Code Diagnostics: if there is an error parsing ui.R, server.R, app.R, or global.R, Shiny will search the code for missing commas, extra commas, and unmatched braces, parens, and brackets, and will print out messages pointing out those problems. (#1126)
  • Reactlog visualization: by default, the showReactLog() function (which brings up the reactive graph) also displays the time that each reactive and observer were active for:

    reactlog

    Additionally, to organize the graph, you can now drag any of the nodes to a specific position and leave it there.

  • Nicer-looking tables: we’ve made tables generated with renderTable() look cleaner and more modern. While this won’t break any older code, the finished look of your table will be quite a bit different, as the following image shows:

    render-table

    For more, read our short article about this update, experiment with all the new features in this demo app, or check out the reference documentation.

The JSM conference in Chicago, July 31 thru August 4, 2016, is one of the largest to be found on statistics, with many terrific talks for R users. We’ve listed some of the sessions that we’re particularly excited about below. These include talks from RStudio employees, like Hadley Wickham, Yihui Xie, Mine Cetinkaya-Rundel, Garrett Grolemund, and Joe Cheng, but also include a bunch of other talks about R that we think look interesting.

When you’re not in one of the sessions below, please visit us in the exhibition area, booth #126-128. We’ll have copies of all our cheat sheets and stickers, and it’s a great place to learn about the other stuff we’ve been working on lately:  from Sparklyr and R Markdown Notebooks to the latest in RStudio Server Pro, Shiny Server Pro, shinyapps.io, RStudio Connect (beta) and more!

Another great place to chat with people interested in R is the Statistical Computing and Graphics Mixer at 6pm on Monday in the Hilton Stevens Salon A4. It’s advertised as a business meeting in the program, but don’t let that put you off – it’s open to all.

SUNDAY

Session 21: Statistical Computing and Graphics Student Awards
Sunday, July 31, 2016 : 2:00 PM to 3:50 PM, CC-W175b

Session 47 Making the Most of R Tools
Hadley Wickham, RStudio (Discussant)
Sunday, July 31, 2016: 4:00 PM to 4:50 PM, CC-W183b

Thinking with Data Using R and RStudio: Powerful Idioms for Analysts
Nicholas Jon Horton, Amherst College; Randall Pruim, Calvin College ; Daniel Kaplan, Macalester College
Transform Your Workflow and Deliverables with Shiny and R Markdown
Garrett Grolemund, RStudio

Session 54 Recent Advances in Information Visualization
Yihui Xie, RStudio (organizer)
Sunday, July 31, 2016: 4:00 PM to 4:50 PM, CC-W183c

Session 85 Reproducibility Promotes Transparency, Efficiency, and Aesthetics
Richard Schwinn
Sunday, July 31, 2016 : 5:35 PM to 5:50 PM, CC-W176a

Session 88 Communicate Better with R, R Markdown, and Shiny
Garrett Grolemund, RStudio (Poster Session)
Sunday, July 31, 2016: 6:00 PM to 8:00 PM, CC-Hall F1 West

MONDAY

Session 106  Linked Brushing in R
Hadley Wickham, RStudio
Monday, August 1, 2016 : 8:35 AM to 8:55 AM, CC-W196b

Session 127 R Tools for Statistical Computing
Monday, August 1, 2016 : 8:30 AM to 10:20 AM, CC-W196c

8:35 AM The Biglasso Package: Extending Lasso Model Fitting to Big Data in R — Yaohui Zeng, University of Iowa ; Patrick Breheny, University of Iowa
8:50 AM Independent Sampling for a Spatial Model with Incomplete Data — Harsimran Somal, University of Iowa ; Mary Kathryn Cowles, University of Iowa
9:05 AM Introduction to the TextmineR Package for R — Thomas Jones, Impact Research
9:20 AM Vector-Generalized Time Series Models — Victor Miranda Soberanis, University of Auckland ; Thomas Yee, University of Auckland
9:35 AM New Computational Approaches to Large/Complex Mixed Effects Models — Norman Matloff, University of California at Davis
9:50 AM Broom: An R Package for Converting Statistical Modeling Objects Into Tidy Data Frames — David G. Robinson, Stack Overflow
10:05 AM Exact Parametric and Nonparametric Likelihood-Ratio Tests for Two-Sample Comparisons — Yang Zhao, SUNY Buffalo ; Albert Vexler, SUNY Buffalo ; Alan Hutson, SUNY Buffalo ; Xiwei Chen, SUNY Buffalo

Session 270 Automated Analytics and Data Dashboards for Evaluating the Impacts of Educational Technologies
Daniel Stanhope and Joyce Yu and Karly Rectanus
Monday, August 1, 2016 : 3:05 PM to 3:50 PM, CC-Hall F1 West

TUESDAY

Session 276 Statistical Tools for Clinical Neuroimaging
Ciprian Crainiceanu
Tuesday, August 2, 2016 : 7:00 AM to 8:15 AM, CC-W375a

Session 332 Doing More with Data in and Outside the Undergraduate Classroom
Mine Cetinkaya-Rundel, Duke University (organizer)
Tuesday, August 2, 2016 : 10:30 AM to 12:20 PM, CC-W184bc

Session 407 Interactive Visualizations and Web Applications for Analytics
Tuesday, August 2, 2016 : 2:00 PM to 3:50 PM, CC-W179a

2:05 PM Radiant: A Platform-Independent Browser-Based Interface for Business Analytics in R — Vincent Nijs, Rady School of Management
2:20 PM Rbokeh: An R Interface to the Bokeh Plotting Library — Ryan Hafen, Hafen Consulting
2:35 PM Composable Linked Interactive Visualizations in R with Htmlwidgets and Shiny — Joseph Cheng, RStudio
2:50 PM Papayar: A Better Interactive Neuroimage Plotter in R — John Muschelli, The Johns Hopkins University
3:05 PM Interactive and Dynamic Web-Based Graphics for Data Analysis — Carson Sievert, Iowa State University
3:20 PM HTML Widgets: Interactive Visualizations from R Made Easy! — Yihui Xie, RStudio ; Ramnath Vaidyanathan, Alteryx

WEDNESDAY

Session 475  Steps Toward Reproducible Research
Yihui Xie, RStudio  (Discussant)
Wednesday, August 3, 2016 : 8:30 AM to 10:20 AM, CC-W196c

8:35 AM Reproducibility for All and Our Love/Hate Relationship with Spreadsheets — Jennifer Bryan, University of British Columbia
8:55 AM Steps Toward Reproducible Research — Karl W. Broman, University of Wisconsin – Madison
9:15 AM Enough with Trickle-Down Reproducibility: Scientists, Open This Gate! Scientists, Tear Down This Wall! — Karthik Ram, University of California at Berkeley
9:35 AM Integrating Reproducibility into the Undergraduate Statistics Curriculum — Mine Cetinkaya-Rundel, Duke University

Session 581 Mining Text in R
David Marchette, Naval Surface Warfare Center
Wednesday, August 3, 2016 : 2:05 PM to 2:40 PM, CC-W180

THURSDAY

Session 696 Statistics for Social Good
Hadley Wickham, RStudio (Chair)
Thursday, August 4, 2016 : 10:30 AM to 12:20 PM, CC-W179a

Session 694 Web Application Teaching Tools for Statistics Using R and Shiny
Jimmy Doi and Gail Potter and Jimmy Wong and Irvin Alcaraz and Peter Chi
Thursday, August 4, 2016 : 11:05 AM to 11:20 AM, CC-W192a

Following our initial and very gratifying Shiny Developer Conference this past January, which sold out in a few days, RStudio is very excited to announce a new and bigger conference today!

rstudio::conf, the conference about all things R and RStudio, will take place January 13 and 14, 2017 in Orlando, Florida. The conference will feature talks and tutorials from popular RStudio data scientists and developers like Hadley Wickham, Yihui Xie, Joe Cheng, Winston Chang, Garrett Grolemund, and J.J. Allaire, along with lightning talks from RStudio partners and customers.

Preceding the conference, on January 11 and 12, RStudio will offer two days of optional training. Training attendees can choose from Hadley Wickham’s Master R training, a new Intermediate Shiny workshop from Shiny creator Joe Cheng or a new workshop from Garrett Grolemund that is based on his soon-to-be-published book with Hadley: Introduction to Data Science with R.

rstudio::conf is for R and RStudio users who want to learn how to write better shiny applications in a better way, explore all the new capabilities of the R Markdown authoring framework, apply R to big data and work effectively with Spark, understand the RStudio toolchain for data science with R, discover best practices and tips for coding with RStudio, and investigate enterprise scale development and deployment practices and tools, including the new RStudio Connect.

Not to be missed, RStudio has also reserved Universal Studio’s The Wizarding World of Harry Potter on Friday night, January 13, for the exclusive use of conference attendees!

Conference attendance is limited to 400. Training is limited to 70 students for each of the three 2-day workshops. All seats are are available on a first-come, first-serve basis.

Please go to http://www.rstudio.com/conference to purchase.

We hope to see you in Florida at rstudio::conf 2017!

For questions or issues registering, please email conf@rstudio.com. To ask about sponsorship opportunities contact anne@rstudio.com.

UseR! 2016 has arrived and the RStudio team is at Stanford to share our newest products and latest enhancements to Shiny, R Markdown, dplyr, and more. Here’s a quick snapshot of RStudio related sessions. We hope to see you in as many of them as you can attend!

Monday June 27

Morning Tutorials

Afternoon Tutorials

Afternoon short talks moderated by Hadley Wickham

Tuesday June 28

Wednesday June 29

Thursday June 30

Stop by the booth!
Don’t miss our table in the exhibition area during the conference. Come talk to us about your plans for R and learn how RStudio Server Pro and Shiny Server Pro can provide enterprise-ready support and scalability for your RStudio IDE and Shiny deployments.

Note: Although UseR! is sold out, arrangements have been made to stream the keynote talks from https://aka.ms/user2016conference. Video recordings of the other sessions (where permitted by speakers) will be made available by UseR! organizers after the conference.