We’re pleased to announce that a new release of RStudio (v0.99.878) is available for download now. Highlights of this release include:

There are lots of other small improvements across the product; check out the release notes for full details.

RStudio Addins

RStudio Addins provide a mechanism for executing custom R functions interactively from within the RStudio IDE—either through keyboard shortcuts, or through the Addins menu. Coupled with the rstudioapi package, users can now write R code to interact with and modify the contents of documents open in RStudio.

An addin can be as simple as a function that inserts a commonly used snippet of text, or as complex as a Shiny application that accepts input from the user and uses it to transform the contents of the active editor. The sky is the limit!
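
To make the simple end of that spectrum concrete, here is a minimal sketch of an addin that inserts a timestamp comment at the cursor. The function names timestamp_comment and insert_timestamp_addin are made up for illustration; the only rstudioapi call used is insertText(). The sketch assumes the code lives in an R package and is run from within RStudio.

```r
# Pure helper: build the text to insert (hypothetical name).
timestamp_comment <- function(time = Sys.time()) {
  paste0("# Last edited: ", format(time, "%Y-%m-%d %H:%M"))
}

# Addin entry point (hypothetical name): inserts the comment at the
# current cursor position of the active document. Requires RStudio.
insert_timestamp_addin <- function() {
  rstudioapi::insertText(timestamp_comment())
}
```

For the function to show up in the Addins menu, it would also need to be registered in the package's inst/rstudio/addins.dcf file; the addins documentation linked below describes the registration format.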

Here’s an example of an addin that enables interactive subsetting of a data frame with live preview:

subset-addin

This addin is implemented using a Shiny Gadget (see the source code for more details). RStudio Addins are distributed as R packages. Once you’ve installed an R package that contains addins, they’ll immediately become available within RStudio.

You can learn more about using and developing addins here: http://rstudio.github.io/rstudioaddins/.

R Markdown

We’ve made a number of improvements to R Markdown authoring. There’s now an optional outline view that enables quick navigation across larger documents:

Screen Shot 2015-12-22 at 9.27.34 AM

We’ve also added inline UI to code chunks for running individual chunks, running all previous chunks, and specifying various commonly used knit options:

Screen Shot 2015-12-22 at 9.30.11 AM

Multiple Source Windows

There are two ways to open a new source window:

Pop out an editor: click the Show in New Window button in any source editor tab.

Tear off a pane: drag a tab out of the main window and onto the desktop; a new source window will be opened where you dropped the tab.

You can have as many source windows open as you like. Each source window has its own set of tabs; these tabs are independent of the tabs in RStudio’s main source pane.

Customizable Keyboard Shortcuts

You can now customize keyboard shortcuts in RStudio — you can bind keys to execute RStudio application commands, editor commands, or even user-defined R functions.

To access the keyboard shortcuts, click Tools -> Modify Keyboard Shortcuts...

This will present a dialog that enables remapping of all available editor commands (commands that affect the current document’s contents, or the current selection) and RStudio commands (commands whose actions are scoped beyond just the current editor).

Emacs Keybindings

We’ve introduced a new keybindings mode to go along with the default bindings and Vim bindings already supported. Emacs mode provides a base set of keybindings for navigation and selection, including:

  • C-p, C-n, C-b and C-f to move the cursor up, down, left and right by characters,
  • M-b, M-f to move left and right by words,
  • C-a, C-e to navigate to the start, or end, of line,
  • C-k to ‘kill’ to end of line, and C-y to ‘yank’ the last kill,
  • C-s, C-r to initiate an Emacs-style incremental search (forward / reverse),
  • C-Space to set/unset mark, and C-w to kill the marked region.

There are some additional keybindings that Emacs Speaks Statistics (ESS) users might find familiar:

  • C-c C-v displays help for the object under the cursor,
  • C-c C-n evaluates the current line / selection,
  • C-x b allows you to visit another file,
  • M-C-a moves the cursor to the beginning of the current function,
  • M-C-e moves to the end of the current function,
  • C-c C-f evaluates the current function.

We’ve also introduced a number of keybindings that allow you to interact with the IDE as you might normally do in Emacs:

  • C-x C-n to create a new document,
  • C-x C-f to find / open an existing document,
  • C-x C-s to save the current document,
  • C-x k to close the current file.

RStudio Server Pro

We’ve introduced a number of significant enhancements to RStudio Server Pro in this release, including:

  • The ability to open multiple concurrent R sessions. Multiple concurrent sessions are useful for running multiple analyses in parallel and for switching between different tasks.
  • Flexible use of multiple R versions on the same server. This is useful when you have some analysts or projects that require older versions of R or R packages and some that require newer versions.
  • Project sharing for easy collaboration within workgroups. When you share a project, RStudio Server securely grants other users access to the project, and when multiple users are active in the project at once, you can see each other’s activity and work together in a shared editor.

See the updated RStudio Server Pro page for additional details, including a set of videos which demonstrate the new features.

Try it Out

RStudio v0.99.878 is available for download now. We hope you enjoy the new release and, as always, please let us know how it’s working and what else we can do to make the product better.

On May 19 and 20, 2016, Hadley Wickham will teach his two-day Master R Developer Workshop in the centrally located European city of Amsterdam.

Are you ready to upgrade your R skills?  Register soon to secure your seat.

For the convenience of those who may travel to the workshop, it will be held at the Hotel NH Amsterdam Schiphol Airport.

Hadley teaches a few workshops each year and this is the only one planned for Europe. They are very popular and hotel rooms are limited. Please register soon.

We look forward to seeing you in the month of May!

Are you ready to upgrade your R skills?  Register soon to secure your seat.

On January 28 and 29, 2016, Hadley Wickham will teach his popular Master R Developer Workshop at the Westin San Francisco Airport.  The workshop is offered only 3 times a year and the San Francisco class is already nearly 50% full. This is the only Master R Developer Workshop Hadley is planning for the US West Coast in 2016.

We look forward to seeing you there!

The RStudio IDE is bursting with capabilities and features. Do you know how to use them all? Tomorrow, we begin an “RStudio Essentials” webinar series. This will be the perfect way to learn how to use the IDE to its fullest. The series is broken into six sections, always on a Wednesday at 11 a.m. EDT:

Each webinar will be 30 minutes long, which will make them easy to attend. If you miss a live webinar or want to review them, recorded versions will be available to registrants. Register here.

p.s. Don’t forget that you can watch many useful past webinars at our webinars archive.

RStudio will again teach the new essentials for doing (big) data science in R at this year’s Strata NYC conference, September 29, 2015 (http://strataconf.com/big-data-conference-ny-2015/public/schedule/detail/44154). You will learn from Garrett Grolemund, Yihui Xie, and Nathan Stephens, who are all working on fascinating new ways to keep the R ecosystem apace with the challenges facing those who work with data.

Topics include:

  • R Quickstart: Wrangle, transform, and visualize data
    Instructor: Garrett Grolemund (90 minutes)
  • Work with Big Data in R
    Instructor: Nathan Stephens (90 minutes)
  • Reproducible Reports with Big Data
    Instructor: Yihui Xie (90 minutes)
  • Interactive Shiny Applications built on Big Data
    Instructor: Garrett Grolemund (90 minutes)

If you plan to stay for the full Strata Conference+Hadoop World, be sure to look us up at booth 633 during the Expo Hall hours. We’ll have the latest books from RStudio authors and “shiny” t-shirts to win. Share with us what you’re doing with RStudio and get your product and company questions answered by RStudio employees.

See you in New York City! (http://strataconf.com/big-data-conference-ny-2015)

Traditionally, the mechanisms for obtaining R and related software have used standard HTTP connections. This isn’t ideal though, as without a secure (HTTPS) connection there is less assurance that you are downloading code from a legitimate source rather than from another server posing as one.

Recently there have been a number of changes that make it easier to use HTTPS for installing R, RStudio, and packages from CRAN:

  1. Downloads of R from the main CRAN website now use HTTPS;

  2. Downloads of RStudio from our website now use HTTPS; and

  3. It is now possible to install packages from CRAN over HTTPS.

There are a number of ways to ensure that installation of packages from CRAN is performed using HTTPS. The most recent version of R (v3.2.2) makes this the default behavior. The most recent version of RStudio (v0.99.473) also attempts to configure secure downloads from CRAN by default (even for older versions of R). Finally, any version of R or RStudio can use secure HTTPS downloads by making some configuration changes as described in the Secure Package Downloads for R article in our Knowledge Base.

Configuring Secure Connections to CRAN

While the simplest way to ensure secure connections to CRAN is to run the updated versions mentioned above, it’s important to note that it is not necessary to upgrade R or RStudio to achieve this end. Rather, two configuration changes can be made:

  1. The R download.file.method option needs to specify a method that is capable of HTTPS; and
  2. The CRAN mirror you are using must be capable of HTTPS connections (not all of them are).

The specifics of the required changes for various products, platforms, and versions of R are described in-depth in the Secure Package Downloads for R article in our Knowledge Base.
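
As a rough sketch of the two changes listed above, something like the following could go in an .Rprofile. It assumes R 3.2 or later built with libcurl support, and it uses https://cran.rstudio.com/ purely as an example of an HTTPS-capable mirror; neither choice is taken from the Knowledge Base article, which remains the reference for platform-specific details.

```r
# Sketch of the two configuration changes; suitable for an .Rprofile.
# Assumption: R >= 3.2 built with libcurl support.
if (isTRUE(unname(capabilities("libcurl")))) {
  # 1. Use a download method that is capable of HTTPS.
  options(download.file.method = "libcurl")
}

# 2. Point at a CRAN mirror that accepts HTTPS connections.
#    (https://cran.rstudio.com/ is used here as an example mirror.)
options(repos = c(CRAN = "https://cran.rstudio.com/"))
```

On builds without libcurl the first option is left untouched, so another HTTPS-capable method would have to be chosen for that platform.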

Recommendations for RStudio Users

We’ve made several changes to RStudio IDE to ensure that HTTPS connections are used throughout the product:

  1. The default download.file.method option is set to an HTTPS compatible method (with a warning displayed if a secure method can’t be set);
  2. The configured CRAN mirror is tested for HTTPS compatibility and a warning is displayed if the mirror doesn’t support HTTPS;
  3. HTTPS is used for user selection of a non-default CRAN mirror;
  4. HTTPS is used for in-product documentation links;
  5. HTTPS is used when checking for updated versions of RStudio (applies to desktop version only); and
  6. HTTPS is used when downloading Rtools (applies to desktop version only).

If you are running RStudio on the desktop we strongly recommend that you update to the latest version (v0.99.473).

Recommendations for Server Administrators

If you are running RStudio Server it’s possible to make the most important security enhancements by changing your configuration rather than updating to a new version. The Secure Package Downloads for R article in our Knowledge Base provides documentation on how to do this.

In this case, in-product documentation links and user selection of a non-default CRAN mirror will continue to use HTTP rather than HTTPS; however, these are less pressing concerns than CRAN package installation. If you’d like these functions to also be performed over HTTPS, then you should upgrade your server to the latest version of RStudio.

If you are running Shiny Server we recommend that you modify your configuration to support HTTPS package downloads as described in the Secure Package Downloads for R article.

Today’s guest post is written by Vincent Warmerdam of GoDataDriven and is reposted with Vincent’s permission from blog.godatadriven.com. You can learn more about how to use SparkR with RStudio at the 2015 EARL Conference in Boston November 2-4, where Vincent will be speaking live.

This document contains a tutorial on how to provision a spark cluster with RStudio. You will need a machine that can run bash scripts and a functioning account on AWS. Note that this tutorial is meant for Spark 1.4.0. Future versions will most likely be provisioned in another way but this should be good enough to help you get started. At the end of this tutorial you will have a fully provisioned spark cluster that allows you to handle simple dataframe operations on gigabytes of data within RStudio.

AWS prep

Make sure you have an AWS account with billing. Next make sure that you have downloaded your .pem files and that you have your keys ready.

Spark Startup

Next go and get spark locally on your machine from the spark homepage. It’s a pretty big blob. Once it is downloaded, unzip it and go to the ec2 folder inside the spark folder. Run the following command from the command line.

./spark-ec2 \
--key-pair=spark-df \
--identity-file=/Users/code/Downloads/spark-df.pem \
--region=eu-west-1 \
-s 1 \
--instance-type c3.2xlarge \
launch mysparkr

This script will use your keys to connect to amazon and set up a spark standalone cluster for you. You can specify what type of machines you want to use as well as how many and where on amazon. You will only need to wait until everything is installed, which can take up to 10 minutes. More info can be found here.

When the command signals that it is done, you can ssh into your machine via the command line.

./spark-ec2 -k spark-df -i /Users/code/Downloads/spark-df.pem --region=eu-west-1 login mysparkr

Once you are in your amazon machine you can immediately run SparkR from the terminal.

chmod u+w /root/spark/
./spark/bin/sparkR 

As just a toy example, you should be able to confirm that the following code already works.

ddf <- createDataFrame(sqlContext, faithful) 
head(ddf)
printSchema(ddf)

This ddf dataframe is no ordinary dataframe object. It is a distributed dataframe, one that can be spread across a network of workers so that spark can run queries against it in parallel.
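
To give a feel for what such queries look like, here is a small sketch that filters and selects columns on the cluster before pulling the result back. It assumes the SparkR 1.4 shell started above, where sqlContext is already defined; the function name short_waits and the 50-minute cutoff are made up for illustration.

```r
# Assumes SparkR 1.4 is available and `sqlContext` comes from the SparkR shell.
# `short_waits` is a hypothetical helper, not part of SparkR.
short_waits <- function(sqlContext, max_wait = 50) {
  # Distribute the built-in `faithful` dataset, as in the toy example.
  ddf <- SparkR::createDataFrame(sqlContext, faithful)

  # filter() and select() are evaluated by spark on the cluster.
  subset_ddf <- SparkR::filter(ddf, ddf$waiting < max_wait)
  subset_ddf <- SparkR::select(subset_ddf, "eruptions", "waiting")

  # collect() returns an ordinary local data.frame.
  SparkR::collect(subset_ddf)
}

# In the SparkR shell: head(short_waits(sqlContext))
```

Defining the function does not touch the cluster; spark only does work once it is called with a live sqlContext.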

Spark UI

The R command you have just run launches a spark job. Spark has a webui so you can keep track of the cluster. To visit the web-ui, first find the IP address of the master node via this command:

curl icanhazip.com

You can now visit the webui via your browser.

<master-node-ip>:4040

From here you can view anything you may want to know about your spark clusters (like executor status, job progress and even a DAG visualisation).

This is a good moment to stand still and realize that this in its own right is already very cool. We can start up a spark cluster in 15 minutes and use R to control it. We can specify how many servers we need by only changing a number on the command line and without any real developer effort we gain access to all this parallelizing power.
Still, working from a terminal might not be too productive. We’d prefer to work with a GUI and we would like some basic plotting functionality when working with data. So let’s install RStudio and get some tools connected.

RStudio setup

Get out of the SparkR shell by entering q(). Next, download and install RStudio.

wget http://download2.rstudio.org/rstudio-server-rhel-0.99.446-x86_64.rpm
sudo yum install --nogpgcheck -y rstudio-server-rhel-0.99.446-x86_64.rpm
rstudio-server restart

While this is installing, make sure the TCP connection on port 8787 is open in the AWS security group settings for the master node. A recommended setting is to only allow access from your IP.

Then, add a user that can access RStudio. We make sure that this user can also access all the RStudio files.

adduser analyst
passwd analyst

You also need to run the following commands (the details of why are a bit involved). These edits need to be made because the analyst user doesn’t have root permissions.

# let non-root users write to spark's local directories
chmod a+w /mnt/spark
chmod a+w /mnt2/spark
# comment out the ulimit line in spark-env.sh
sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
# raise the open-file limit for the current shell
ulimit -n 1000000

When this is done, point the browser to <master-ip-adr>:8787. Then log in as analyst.

Loading data from S3

Let’s confirm that we can now play with the RStudio stack by downloading some libraries and having it run against data that lives on S3.

small_file <- "s3n://<AWS-ID>:<AWS-SECRET-KEY>@<bucket_name>/data.json"
dist_df <- cache(read.df(sqlContext, small_file, "json"))

This dist_df is now a distributed dataframe, which has a different api than the normal R dataframe but is similar to dplyr.

head(summarize(groupBy(dist_df, dist_df$type), count = n(dist_df$auc)))

Also, we can install magrittr to make our code look a lot nicer.

library(magrittr)  # provides the %>% pipe

local_df <- dist_df %>% 
  groupBy(dist_df$type) %>% 
  summarize(count = n(dist_df$id)) %>% 
  collect

The collect method pulls the distributed dataframe back into a normal dataframe on a single machine so you can use plotting methods on it again and use R as you would normally. A common use case would be to use spark to sample or aggregate a large dataset which can then be further explored in R.
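
Once the data is local, nothing spark-specific is needed any more. The sketch below uses the built-in faithful dataset as a stand-in for a collected result (on a real cluster the first line would be a collect() call), so it runs without a cluster.

```r
# Stand-in for a collected result; on a cluster: local_df <- collect(ddf)
local_df <- faithful

# local_df is a plain data.frame, so base R summaries and plots apply.
waiting_summary <- summary(local_df$waiting)
print(waiting_summary)
hist(local_df$waiting,
     main = "Waiting time between eruptions",
     xlab = "Waiting time (minutes)")
```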
Again, if you want to view the spark ui for these jobs you can just go to:

<master-node-ip>:4040

A more complete stack

Unfortunately this stack has an old version of R (we need version 3.2 to get the newest version of ggplot2/dplyr). As of right now there isn’t support for the machine learning libraries either. These are known issues at the moment and version 1.5 should show some fixes. Version 1.5 will also feature RStudio installation as part of the ec2 stack.

Another issue is that the namespace of dplyr currently conflicts with SparkR; time will tell how this gets resolved. The same goes for other data features, such as windowing functions and more elaborate data types.

Killing the cluster

When you are done with the cluster, you only need to exit the ssh connection and run the following command:

./spark-ec2 -k spark-df -i /Users/code/Downloads/spark-df.pem --region=eu-west-1 destroy mysparkr

Conclusion

The economics of spark are very interesting. We only pay amazon for the time that we are using Spark as a compute engine. All other times we’d only pay for S3. This means that if we analyse for 8 hours, we’d only pay for 8 hours. Spark is also very flexible in that it allows us to continue coding in R (or python or scala) without having to learn multiple domain specific languages or frameworks like in hadoop. Spark makes big data really simple again.
This document is meant to help you get started with Spark and RStudio but in a production environment there are a few things you still need to account for:

  • security: our web connection does not go over https, so even though we are telling amazon to only accept our ip, we may be at risk if there is a man in the middle listening.
  • multiple users: this setup will work fine for a single user, but if multiple users are working on such a cluster you may need to rethink some steps with regard to user groups, file access and resource management.
  • privacy: this setup works well for ec2, but if you have sensitive, private user data then you may need to do this on premise because the data cannot leave your own datacenter. Most install steps would be the same but the initial installation of Spark would require the most work. See the docs for more information.

Spark is an amazing tool; expect more features in the future.

Possible Gotcha

Hanging

It can happen that the ec2 script hangs in the Waiting for cluster to enter 'ssh-ready' state part. This can happen if you use amazon a lot. To prevent this you may want to remove some lines in ~/.ssh/known_hosts. More info here. Another option is to add the following lines to your ~/.ssh/config file.

# AWS EC2 public hostnames (changing IPs)
Host *.compute.amazonaws.com 
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null

“Master” R in Washington DC this September!

Join RStudio Chief Data Scientist Hadley Wickham at the AMA – Executive Conference Center in Arlington, VA on September 14 and 15, 2015 for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers.

It will be at least another year before Hadley returns to teach his class on the East Coast, so don’t miss this opportunity to learn from him in person. The venue is conveniently located next to Ronald Reagan Washington National Airport and a short distance from the Metro. Attendance is limited. Past events have sold out.

Register today!

We’re pleased to announce that the final version of RStudio v0.99 is available for download now. Highlights of the release include:

  • A new data viewer with support for large datasets, filtering, searching, and sorting.
  • Complete overhaul of R code completion with many new features and capabilities.
  • The source editor now provides code diagnostics (errors, warnings, etc.) as you work.
  • User customizable code snippets for automating common editing tasks.
  • Tools for Rcpp: completion, diagnostics, code navigation, find usages, and automatic indentation.
  • Many additional source editor improvements including multiple cursors, tab re-ordering, and several new themes.
  • An enhanced Vim mode with visual block selection, macros, marks, and a subset of : commands.

There are also lots of smaller improvements and bug fixes across the product. Check out the v0.99 release notes for details on all of the changes.

Data Viewer

We’ve completely overhauled the data viewer with many new capabilities including live update, sorting and filtering, full text searching, and no row limit on viewed datasets.

data-viewer

See the data viewer documentation for more details.

Code Completion

Previously RStudio only completed variables that already existed in the global environment. Now completion is based on source code analysis, so it is provided even for objects that haven’t been fully evaluated:

completion-scopes

Completions are also provided for a wide variety of specialized contexts including dimension names in [ and [[:

completion-bracket

Code Diagnostics

We’ve added a new inline code diagnostics feature that highlights various issues in your R code as you edit.

For example, here we’re getting a diagnostic that notes that there is an extra parenthesis:

Screen Shot 2015-04-08 at 12.04.14 PM

Here the diagnostic indicates that we’ve forgotten a comma within a shiny UI definition:

diagnostics-comma

A wide variety of diagnostics are supported, including optional diagnostics for code style issues (e.g. the inclusion of unnecessary whitespace). Diagnostics are also available for several other languages including C/C++, JavaScript, HTML, and CSS. See the code diagnostics documentation for additional details.

Code Snippets

Code snippets are text macros that are used for quickly inserting common snippets of code. For example, the fun snippet inserts an R function definition:

Insert Snippet

If you select the snippet from the completion list it will be inserted along with several text placeholders which you can fill in by typing and then pressing Tab to advance to the next placeholder:

Screen Shot 2015-04-07 at 10.44.39 AM

Other useful snippets include:

  • lib, req, and source for the library, require, and source functions
  • df and mat for defining data frames and matrices
  • if, el, and ei for conditional expressions
  • apply, lapply, sapply, etc. for the apply family of functions
  • sc, sm, and sg for defining S4 classes/methods.

See the code snippets documentation for additional details.

Try it Out

RStudio v0.99 is available for download now. We hope you enjoy the new release and, as always, please let us know how it’s working and what else we can do to make the product better.

Join RStudio Chief Data Scientist Hadley Wickham at the University of Illinois at Chicago, on Wednesday May 27th & 28th for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers.

As of this post, the workshop is two-thirds sold out. If you’re in or near Chicago and want to boost your R programming skills, this is Hadley’s only Central US public workshop planned for 2015.

Register here: https://rstudio-chicago.eventbrite.com
