
RStudio will again teach the new essentials for doing (big) data science in R at this year’s Strata NYC conference, September 29, 2015. You will learn from Garrett Grolemund, Yihui Xie, and Nathan Stephens, who are all working on fascinating new ways to help the R ecosystem keep pace with the challenges facing those who work with data.

Topics include:

  • R Quickstart: Wrangle, transform, and visualize data
    Instructor: Garrett Grolemund (90 minutes)
  • Work with Big Data in R
    Instructor: Nathan Stephens (90 minutes)
  • Reproducible Reports with Big Data
    Instructor: Yihui Xie (90 minutes)
  • Interactive Shiny Applications built on Big Data
    Instructor: Garrett Grolemund (90 minutes)

If you plan to stay for the full Strata Conference+Hadoop World be sure to look us up at booth 633 during the Expo Hall hours. We’ll have the latest books from RStudio authors and “shiny” t-shirts to win. Share with us what you’re doing with RStudio and get your product and company questions answered by RStudio employees.

See you in New York City!

Traditionally, the mechanisms for obtaining R and related software have used standard HTTP connections. This isn’t ideal though, as without a secure (HTTPS) connection there is less assurance that you are downloading code from a legitimate source rather than from another server posing as one.

Recently there have been a number of changes that make it easier to use HTTPS for installing R, RStudio, and packages from CRAN:

  1. Downloads of R from the main CRAN website now use HTTPS;

  2. Downloads of RStudio from our website now use HTTPS; and

  3. It is now possible to install packages from CRAN over HTTPS.

There are a number of ways to ensure that installation of packages from CRAN is performed using HTTPS. The most recent version of R (v3.2.2) makes this the default behavior. The most recent version of RStudio (v0.99.473) also attempts to configure secure downloads from CRAN by default (even for older versions of R). Finally, any version of R or RStudio can use secure HTTPS downloads by making some configuration changes as described in the Secure Package Downloads for R article in our Knowledge Base.

Configuring Secure Connections to CRAN

While the simplest way to ensure secure connections to CRAN is to run the updated versions mentioned above, it’s important to note that it is not necessary to upgrade R or RStudio to achieve this end. Rather, two configuration changes can be made:

  1. The R download.file.method option needs to specify a method that is capable of HTTPS; and
  2. The CRAN mirror you are using must be capable of HTTPS connections (not all of them are).

The specifics of the required changes for various products, platforms, and versions of R are described in-depth in the Secure Package Downloads for R article in our Knowledge Base.
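
As a rough sketch of the two changes listed above, the following could be run at the console or placed in an .Rprofile. The choice of "libcurl" (and "wininet" on Windows) and the cran.rstudio.com mirror URL are assumptions made for illustration; the Knowledge Base article remains the place to check which method suits a given platform and R version.

```r
# Assumption: this R build supports the chosen method ("libcurl" needs
# R 3.2 or later built with libcurl support; "wininet" is Windows-only).
secure_method <- if (.Platform$OS.type == "windows") "wininet" else "libcurl"

# 1. Use a download method that is capable of HTTPS.
options(download.file.method = secure_method)

# 2. Point at a CRAN mirror that serves HTTPS (assumed mirror URL).
options(repos = c(CRAN = "https://cran.rstudio.com/"))

# Inspect the resulting settings.
getOption("download.file.method")
getOption("repos")
```

Setting both options in one place keeps the download method and the mirror consistent, since either one alone is not enough to get an HTTPS download.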

Recommendations for RStudio Users

We’ve made several changes to RStudio IDE to ensure that HTTPS connections are used throughout the product:

  1. The default download.file.method option is set to an HTTPS compatible method (with a warning displayed if a secure method can’t be set);
  2. The configured CRAN mirror is tested for HTTPS compatibility and a warning is displayed if the mirror doesn’t support HTTPS;
  3. HTTPS is used for user selection of a non-default CRAN mirror;
  4. HTTPS is used for in-product documentation links;
  5. HTTPS is used when checking for updated versions of RStudio (applies to desktop version only); and
  6. HTTPS is used when downloading Rtools (applies to desktop version only).
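
The mirror check in item 2 can be approximated by hand in any R session. The helper below is a hypothetical sketch, not RStudio's actual implementation: `insecure_repos` is an invented name, and it only inspects the URL scheme rather than testing whether the server really answers over HTTPS.

```r
# Hypothetical helper: return the configured repositories whose URL does
# not start with "https://". An unset CRAN entry ("@CRAN@") is reported
# too, since no secure mirror has been chosen yet.
insecure_repos <- function(repos = getOption("repos")) {
  repos <- unlist(repos)
  repos[!grepl("^https://", repos)]
}

# Example with one plain-HTTP and one HTTPS repository (illustrative URLs).
insecure_repos(c(CRAN    = "http://cran.r-project.org",
                 RStudio = "https://cran.rstudio.com/"))
```

Calling `insecure_repos()` with no arguments checks whatever the current session is configured to use.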

If you are running RStudio on the desktop we strongly recommend that you update to the latest version (v0.99.473).

Recommendations for Server Administrators

If you are running RStudio Server it’s possible to make the most important security enhancements by changing your configuration rather than updating to a new version. The Secure Package Downloads for R article in our Knowledge Base provides documentation on how to do this.

In this case, in-product documentation links and user selection of a non-default CRAN mirror will continue to use HTTP rather than HTTPS; however, these are less pressing concerns than CRAN package installation. If you’d like these functions to also be performed over HTTPS then you should upgrade your server to the latest version of RStudio.

If you are running Shiny Server we recommend that you modify your configuration to support HTTPS package downloads as described in the Secure Package Downloads for R article.

Today’s guest post is written by Vincent Warmerdam of GoDataDriven and is reposted with Vincent’s permission. You can learn more about how to use SparkR with RStudio at the 2015 EARL Conference in Boston, November 2-4, where Vincent will be speaking live.

This document contains a tutorial on how to provision a spark cluster with RStudio. You will need a machine that can run bash scripts and a functioning account on AWS. Note that this tutorial is meant for Spark 1.4.0. Future versions will most likely be provisioned in another way but this should be good enough to help you get started. At the end of this tutorial you will have a fully provisioned spark cluster that allows you to handle simple dataframe operations on gigabytes of data within RStudio.

AWS prep

Make sure you have an AWS account with billing. Next make sure that you have downloaded your .pem files and that you have your keys ready.

Spark Startup

Next go and get spark locally on your machine from the spark homepage. It’s a pretty big blob. Unzip it once it is downloaded, then go to the ec2 folder in the spark folder. Run the following command from the command line.

./spark-ec2 \
--key-pair=spark-df \
--identity-file=/Users/code/Downloads/spark-df.pem \
--region=eu-west-1 \
-s 1 \
--instance-type c3.2xlarge \
launch mysparkr

This script will use your keys to connect to amazon and set up a spark standalone cluster for you. You can specify what type of machines you want to use as well as how many and where on amazon. You will only need to wait until everything is installed, which can take up to 10 minutes. More info can be found here.

When the command signals that it is done, you can ssh into your machine via the command line.

./spark-ec2 -k spark-df -i /Users/code/Downloads/spark-df.pem --region=eu-west-1 login mysparkr

Once you are in your amazon machine you can immediately run SparkR from the terminal.

chmod u+w /root/spark/

As just a toy example, you should be able to confirm that the following code already works.

ddf <- createDataFrame(sqlContext, faithful) 

This ddf dataframe is no ordinary dataframe object. It is a distributed dataframe, one that can be distributed across a network of workers such that we could query it for parallelized commands through spark.
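
To get a feel for the difference, here is a small sketch of a few operations on the distributed dataframe. It assumes you are still inside the SparkR shell from the previous step, where `sqlContext` already exists, and it uses functions from the SparkR 1.4 API; it will not run in a plain R session without Spark.

```r
# Run inside the SparkR shell, where `sqlContext` is already defined.
ddf <- createDataFrame(sqlContext, faithful)

printSchema(ddf)                       # column names and types
count(ddf)                             # row count, computed by spark
head(filter(ddf, ddf$waiting < 50))    # filtering happens on the cluster

# Bring the (small) result back as an ordinary local data.frame.
local_faithful <- collect(ddf)
class(local_faithful)
```

The operations before `collect` are executed by spark, while everything after it works on a regular R object on the master node.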

Spark UI

This R command you have just run launches a spark job. Spark has a webui so you can keep track of the cluster. To visit the web-ui, first confirm on what IP-address the master node is via this command:


You can now visit the webui via your browser.


From here you can view anything you may want to know about your spark clusters (like executor status, job process and even a DAG visualisation).

This is a good moment to stand still and realize that this in its own right is already very cool. We can start up a spark cluster in 15 minutes and use R to control it. We can specify how many servers we need by only changing a number on the command line and without any real developer effort we gain access to all this parallelizing power.

Still, working from a terminal might not be too productive. We’d prefer to work with a GUI and we would like some basic plotting functionality when working with data. So let’s install RStudio and get some tools connected.

RStudio setup

Get out of the SparkR shell by entering q(). Next, download and install RStudio.

sudo yum install --nogpgcheck -y rstudio-server-rhel-0.99.446-x86_64.rpm
rstudio-server restart

While this is installing, make sure the TCP connection on the 8787 port is open in the AWS security group setting for the master node. A recommended setting is to only allow access from your ip.

Then, add a user that can access RStudio. We make sure that this user can also access all the RStudio files.

adduser analyst
passwd analyst

You also need to do this (the details of why are a bit involved). These edits need to be made because the analyst user doesn’t have root permissions.
chmod a+w /mnt/spark
chmod a+w /mnt2/spark
sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/ > /root/spark/conf/
mv /root/spark/conf/ /root/spark/conf/
ulimit -n 1000000

Once you know the IP address of the master node, point the browser to <master-ip-adr>:8787. Then log in as analyst.

Loading data from S3

Let’s confirm that we can now play with the RStudio stack by downloading some libraries and having it run against data that lives on S3.

small_file = "s3n://<AWS-ID>:<AWS-SECRET-KEY>@<bucket_name>/data.json"
dist_df <- read.df(sqlContext, small_file, "json") %>% cache

This dist_df is now a distributed dataframe, which has a different API than the normal R dataframe but is similar to dplyr.

head(summarize(groupBy(dist_df, dist_df$type), count = n(dist_df$auc)))

Also, we can install magrittr to make our code look a lot nicer.

local_df <- dist_df %>% 
  groupBy(dist_df$type) %>% 
  summarize(count = n(dist_df$id)) %>% 
  collect

The collect method pulls the distributed dataframe back into a normal dataframe on a single machine so you can use plotting methods on it again and use R as you would normally. A common use case would be to use spark to sample or aggregate a large dataset which can then be further explored in R.
Again, if you want to view the spark ui for these jobs you can just go to:


A more complete stack

Unfortunately this stack has an old version of R (we need version 3.2 to get the newest version of ggplot2/dplyr). Also, as of right now there isn’t support for the machine learning libraries yet. These are known issues at the moment and version 1.5 should show some fixes. Version 1.5 will also feature RStudio installation as part of the ec2 stack.
Another issue is that the namespace of dplyr currently conflicts with SparkR; time will tell how this gets resolved. The same goes for other data features like windowing functions and more elaborate data types.

Killing the cluster

When you are done with the cluster, you only need to exit the ssh connection and run the following command:
./spark-ec2 -k spark-df -i /Users/code/Downloads/spark-df.pem --region=eu-west-1 destroy mysparkr


The economics of spark are very interesting. We only pay amazon for the time that we are using Spark as a compute engine. All other times we’d only pay for S3. This means that if we analyse for 8 hours, we’d only pay for 8 hours. Spark is also very flexible in that it allows us to continue coding in R (or python or scala) without having to learn multiple domain specific languages or frameworks like in hadoop. Spark makes big data really simple again.
This document is meant to help you get started with Spark and RStudio but in a production environment there are a few things you still need to account for:

  • security, our web connection is not done through https, so even though we are telling amazon to only accept connections from our ip, we may be at risk if there is a man in the middle listening.
  • multiple users, this setup will work fine for a single user but if multiple users are working on such a cluster you may need to rethink some steps with regards to user groups, file access and resource management.
  • privacy, this setup works well for ec2 but if you have sensitive, private user data then you may need to do this on premise because the data cannot leave your own datacenter. Most install steps would be the same but the initial installation of Spark would require the most work. See the docs for more information.

Spark is an amazing tool, expect more features in the future.

Possible Gotcha


It can happen that the ec2 script hangs in the Waiting for cluster to enter 'ssh-ready' state part. This can happen if you use amazon a lot. To prevent this you may want to remove some lines in ~/.ssh/known_hosts. More info here. Another option is to add the following lines to your ~/.ssh/config file.

# AWS EC2 public hostnames (changing IPs)
Host * 
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null

“Master” R in Washington DC this September!

Join RStudio Chief Data Scientist Hadley Wickham at the AMA – Executive Conference Center in Arlington, VA on September 14 and 15, 2015 for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers.

It will be at least another year before Hadley returns to teach his class on the East Coast, so don’t miss this opportunity to learn from him in person. The venue is conveniently located next to Ronald Reagan Washington National Airport and a short distance from the Metro. Attendance is limited. Past events have sold out.

Register today!

We’re pleased to announce that the final version of RStudio v0.99 is available for download now. Highlights of the release include:

  • A new data viewer with support for large datasets, filtering, searching, and sorting.
  • Complete overhaul of R code completion with many new features and capabilities.
  • The source editor now provides code diagnostics (errors, warnings, etc.) as you work.
  • User customizable code snippets for automating common editing tasks.
  • Tools for Rcpp: completion, diagnostics, code navigation, find usages, and automatic indentation.
  • Many additional source editor improvements including multiple cursors, tab re-ordering, and several new themes.
  • An enhanced Vim mode with visual block selection, macros, marks, and subset of : commands.

There are also lots of smaller improvements and bug fixes across the product. Check out the v0.99 release notes for details on all of the changes.

Data Viewer

We’ve completely overhauled the data viewer with many new capabilities including live update, sorting and filtering, full text searching, and no row limit on viewed datasets.


See the data viewer documentation for more details.

Code Completion

Previously RStudio only completed variables that already existed in the global environment. Now completion is done based on source code analysis so is provided even for objects that haven’t been fully evaluated:


Completions are also provided for a wide variety of specialized contexts including dimension names in [ and [[:


Code Diagnostics

We’ve added a new inline code diagnostics feature that highlights various issues in your R code as you edit.

For example, here we’re getting a diagnostic that notes that there is an extra parenthesis:


Here the diagnostic indicates that we’ve forgotten a comma within a shiny UI definition:


A wide variety of diagnostics are supported, including optional diagnostics for code style issues (e.g. the inclusion of unnecessary whitespace). Diagnostics are also available for several other languages including C/C++, JavaScript, HTML, and CSS. See the code diagnostics documentation for additional details.

Code Snippets

Code snippets are text macros that are used for quickly inserting common snippets of code. For example, the fun snippet inserts an R function definition:


If you select the snippet from the completion list it will be inserted along with several text placeholders which you can fill in by typing and then pressing Tab to advance to the next placeholder:


Other useful snippets include:

  • lib, req, and source for the library, require, and source functions
  • df and mat for defining data frames and matrices
  • if, el, and ei for conditional expressions
  • apply, lapply, sapply, etc. for the apply family of functions
  • sc, sm, and sg for defining S4 classes/methods.

See the code snippets documentation for additional details.

Try it Out

RStudio v0.99 is available for download now. We hope you enjoy the new release and as always please let us know how it’s working and what else we can do to make the product better.

Join RStudio Chief Data Scientist Hadley Wickham at the University of Illinois at Chicago, on Wednesday May 27th & 28th for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers.

As of this post, the workshop is two-thirds sold out. If you’re in or near Chicago and want to boost your R programming skills, this is Hadley’s only Central US public workshop planned for 2015.

Register here:

We’ve blogged previously about various improvements we’ve made to the source editor in RStudio v0.99 including enhanced code completion, snippets, diagnostics, and an improved Vim mode. Besides these larger scale features we’ve made lots of smaller improvements that we also wanted to highlight. You can try out all of these features now in the RStudio v0.99 preview release.

Multiple Cursors

You can now create and use multiple cursors within RStudio. Multiple cursors can be created in a variety of ways:

  • Press Ctrl + Alt + {Up/Down} to create a new cursor in the pressed direction,
  • Press Ctrl + Alt + Shift + {Direction} to move a second cursor in the specified direction,
  • Use Alt and drag with the mouse to create a rectangular selection,
  • Use Alt + Shift and click to create a rectangular selection from the current cursor position to the clicked position.

RStudio also makes use of multiple cursors in its Find / Replace toolbar now. After entering a search term, if you press the All button, all items matching that search term are selected.


You can then begin typing to replace each match with a new term—each matched entry will be updated as you type.

Rearrangeable Tabs

You can (finally!) move tabs around in the Source pane by clicking and dragging. In the below example, the file ‘file_4.R’ is currently selected and being dragged into place.


New, Improved Editor Themes

A number of new editor themes have been added to RStudio, and older editor themes have been tweaked to ensure that brackets are given a distinct color from text for further legibility.


Select / Expand Selection

You can use Ctrl + Shift + E to select everything within the nearest pair of opening and closing brackets, or use Ctrl + Alt + Shift + E to expand the selection up to the next closing bracket.


Fuzzy Navigation

You can use CTRL + . to quickly navigate between files and symbols within a project. Previously, this search utility performed prefix matching, and so it was difficult to use with long file / symbol names. Now, the CTRL + . navigator uses fuzzy matching to narrow the candidate set down based on subsequence matching, which makes it easier to navigate when many files share a common prefix—for example, to test- files for a project managing its tests with testthat.


Insert Roxygen Skeleton

RStudio now provides a means for inserting a Roxygen documentation skeleton above functions. The skeleton generator is smart enough to understand plain R functions, as well as S4 generics, methods and classes—it will automatically fill in documentation for available parameters and slots.
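
To give a sense of the result, the sketch below shows roughly what an inserted skeleton looks like above a small function. The function `scale_values` and its arguments are hypothetical, and the exact set of tags RStudio emits may differ between versions.

```r
#' Title
#'
#' @param x
#' @param center
#'
#' @return
#' @export
#'
#' @examples
scale_values <- function(x, center = TRUE) {
  # Subtract the mean when `center` is TRUE, otherwise return x unchanged.
  if (center) x - mean(x) else x
}

scale_values(c(1, 2, 3))
```

The skeleton only supplies the structure: one `@param` entry per argument plus the usual tags, which you then fill in by hand.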


More Languages

We’ve also added syntax highlighting modes for many new languages including Clojure, CoffeeScript, C#, Graphviz, Go, Groovy, Haskell, Java, Julia, Lisp, Lua, Matlab, Perl, Ruby, Rust, Scala, and Stan. There’s also some basic keyword and text based code completion for several languages including JavaScript, HTML, CSS, Python, and SQL.

Try it Out

You can try out all of the new editor features by downloading the latest preview release of RStudio. As always, let us know how the new features are working as well as what else you’d like to see us do.

Soon after the announcement of htmlwidgets, Rich Iannone released the DiagrammeR package, which makes it easy to generate graph and flowchart diagrams using text in a Markdown-like syntax. The package is very flexible and powerful, and includes:

  1. Rendering of Graphviz graph visualizations (via viz.js)
  2. Creating diagrams and flowcharts using mermaid.js
  3. Facilities for mapping R objects into graphs, diagrams, and flowcharts.

We’re very excited about the prospect of creating sophisticated diagrams using an easy to author plain-text syntax, and built some special authoring support for DiagrammeR into RStudio v0.99 (which you can download a preview release of now).

Graphviz Meets R

If you aren’t familiar with Graphviz, it’s a tool for rendering DOT (a plain text graph description language). DOT draws directed graphs as hierarchies. Its features include well-tuned layout algorithms for placing nodes and edge splines, edge labels, “record” shapes with “ports” for drawing data structures, and cluster layouts.

DiagrammeR can render any DOT script. For example, with the following source file (“”):


You can render the diagram with:



Since the diagram is an htmlwidget it can be used at the R console, within R Markdown documents, and within Shiny applications. Within RStudio you can preview a Graphviz or mermaid source file the same way you source an R script via the Preview button or the Ctrl+Shift+Enter keyboard shortcut.

This simple example only scratches the surface of what’s possible, see the DiagrammeR Graphviz documentation for more details and examples.
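
As a minimal sketch of rendering DOT from R, the snippet below passes a small DOT script as a string to DiagrammeR's `grViz` function. The graph itself (`boxes`, with nodes A, B and C) is a made-up example rather than the source file referred to above, and the call is guarded so the code still runs when DiagrammeR is not installed.

```r
# A small, made-up DOT script: three boxes connected in a chain.
dot_source <- "
digraph boxes {
  node [shape = box]
  A -> B
  B -> C
}
"

# Render it as an htmlwidget if the DiagrammeR package is available.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  DiagrammeR::grViz(dot_source)
}
```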

Diagrams with mermaid.js

Support for mermaid.js in DiagrammeR enables you to create several other diagram types not supported by Graphviz. For example, here’s the code required to create a sequence diagram:


You can render the diagram with:



See the DiagrammeR mermaid.js documentation for additional details.
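
For a rough idea of what this looks like in practice, the sketch below defines a tiny sequence diagram in mermaid syntax and renders it with DiagrammeR's `mermaid` function. The participants and messages are invented for illustration, and the call is guarded so the code still runs when DiagrammeR is not installed.

```r
# A made-up sequence diagram with two participants.
sequence_source <- "
sequenceDiagram
  Alice->>Bob: Hello Bob, how are you?
  Bob-->>Alice: Fine, thanks!
"

# Render it as an htmlwidget if the DiagrammeR package is available.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  DiagrammeR::mermaid(sequence_source)
}
```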

Generating Diagrams from R Code

Both of the examples above illustrate creating diagrams by direct editing of DOT and mermaid scripts. The latest version of DiagrammeR (v0.6, just released to CRAN) also includes facilities for generating diagrams from R code. This can be done in a couple of ways:

  1. Using text substitution, whereby you create placeholders within the diagram script and substitute their values from R objects. See the documentation on Graphviz Substitution for more details.
  2. Using the graphviz_graph function you can specify nodes and edges directly using a data frame.

Future versions of DiagrammeR are expected to include additional features to support direct generation of diagrams from R.

Publishing with DiagrammeR

Diagrams created with DiagrammeR act a lot like R plots; however, there’s an important difference: they are rendered as HTML content rather than using an R graphics device. This has the following implications for how they can be published and re-used:

  1. Within RStudio you can save diagrams as an image (PNG, BMP, etc.) or copy them to clipboard for re-use in other applications.
  2. For a more reproducible workflow, diagrams can be embedded within R Markdown documents just like plots (all of the required HTML and JS is automatically included). Note that because the diagrams depend on HTML and JavaScript for rendering they can only be used in HTML based output formats (they don’t work in PDFs or MS Word documents).
  3. From within RStudio you can also publish diagrams to RPubs or save them as standalone web pages.


See the DiagrammeR documentation on I/O for additional details.

Try it Out

To get started with DiagrammeR check out the excellent collection of demos and documentation on the project website. To take advantage of the new RStudio features that support DiagrammeR you should download the latest RStudio v0.99 Preview Release.




In RStudio v0.99 we’ve made a major investment in R source code analysis. This work resulted in significant improvements in code completion, and in the latest preview release enables a new inline code diagnostics feature that highlights various issues in your R code as you edit.

For example, here we’re getting a diagnostic that notes that there is an extra parenthesis:


Here the diagnostic indicates that we’ve forgotten a comma within a shiny UI definition:


This diagnostic flags an unknown parameter to a function call:


This diagnostic indicates that we’ve referenced a variable that doesn’t exist and suggests a fix based on another variable in scope:


A wide variety of diagnostics are supported, including optional diagnostics for code style issues (e.g. the inclusion of unnecessary whitespace). Diagnostics are also available for several other languages including C/C++, JavaScript, HTML, and CSS.

Configuring Diagnostics

By default, code in the current source file is checked whenever it is saved, as well as if the keyboard is idle for a period of time. You can tweak this behavior using the Code -> Diagnostics options:


Note that several of the available diagnostics are disabled by default. This is because we’re in the process of refining their behavior to eliminate “false positives” where correct code is flagged as having a problem. We’ll continue to improve these diagnostics and enable them by default when we feel they are ready.

Trying it Out

You can try out the new code diagnostics by downloading the latest preview release of RStudio. This feature is a work in progress and we’re particularly interested in feedback on how well it works. Please also let us know if there are common coding problems which you think we should add new diagnostics for. We hope you try out the preview and let us know how we can make it better.


Over the past several years the Rcpp package has become an indispensable tool for creating high-performance R code. Its power and ease of use have made C++ a natural second language for many R users. There are over 400 packages on CRAN and Bioconductor that depend on Rcpp and it is now the most downloaded R package.

In RStudio v0.99 we have added extensive additional tools to make working with Rcpp more pleasant, productive, and robust. These include:

  • Code completion
  • Source diagnostics as you edit
  • Code snippets
  • Auto-indentation
  • Navigable list of compilation errors
  • Code navigation (go to definition)

We think these features will go a long way to helping even more R users succeed with Rcpp. You can try the new features out now by downloading the RStudio Preview Release.

Code Completion

RStudio v0.99 includes comprehensive code completion for C++ based on Clang (the same underlying engine used by Xcode and many other C/C++ tools):


Completions are provided for the C++ language, Rcpp, and any other libraries you have imported.


As you edit C++ source files RStudio uses Clang to scan your code looking for errors, incomplete code, or other conditions worthy of warnings or informational notes. For example:


Diagnostics alert you to the possibility of subtle problems and flag outright incorrect code as early as possible, substantially reducing iteration/debugging time.

Interactive C++

Rcpp includes some nifty tools to help make working with C++ code just as simple and straightforward as working with R code. You can “source” C++ code into R just like you’d source an R script (no need to deal with Makefiles or build systems). Here’s a Gibbs Sampler implemented with Rcpp:


We can make this function available to R by simply sourcing the C++ file (much like we’d source an R script):

gibbs(100, 10)

Thanks to the abstractions provided by Rcpp, the code implementing the Gibbs Sampler in C++ is nearly identical to the code you’d write in R, but runs 20 times faster. RStudio includes full support for Rcpp’s sourceCpp via the Source button and Ctrl+Shift+Enter keyboard shortcut.
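
For comparison, here is a sketch of what a plain R version of such a sampler might look like. The C++ listing is only available as an image, so the target distribution and parameterisation below are assumptions (a commonly used bivariate example), and `gibbs_r` is a hypothetical name chosen to avoid clashing with the Rcpp function.

```r
# Gibbs sampler in plain R for an assumed bivariate target where
#   x | y ~ Gamma(shape = 3, rate = y^2 + 4)
#   y | x ~ Normal(mean = 1 / (x + 1), sd = 1 / sqrt(2 * (x + 1)))
# N is the number of samples kept; thin is the number of inner updates
# performed between kept samples.
gibbs_r <- function(N, thin) {
  mat <- matrix(0, nrow = N, ncol = 2)
  x <- 0
  y <- 0
  for (i in seq_len(N)) {
    for (j in seq_len(thin)) {
      x <- rgamma(1, shape = 3, rate = y * y + 4)
      y <- rnorm(1, mean = 1 / (x + 1), sd = 1 / sqrt(2 * (x + 1)))
    }
    mat[i, ] <- c(x, y)
  }
  mat
}

samples <- gibbs_r(100, 10)
dim(samples)
```

Wrapping both versions in `system.time()` is a simple way to compare them on your own machine; the speed-up you observe will depend on N, thin, and your setup.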

Try it Out

If you are new to C++ or Rcpp you might be surprised at how easy it is to get started. There are lots of great resources available, including:

You can give the new Rcpp features a try now by downloading the RStudio Preview Release. If you run into problems or have feedback on how we could make things better let us know on our Support Forum.

