We’re getting close to shipping the next version of RStudio (v0.99) and this week we’ll continue our series of posts describing the major new features of the release (previous posts have already covered code completion, the revamped data viewer, and improvements to vim mode). Note that if you want to try out any of the new features now you can do so by downloading the RStudio Preview Release.

Code Snippets

Code snippets are text macros used to quickly insert common chunks of code. For example, the fun snippet inserts an R function definition:

[Screenshot: inserting the fun snippet from the completion list]

If you select the snippet from the completion list it will be inserted along with several text placeholders which you can fill in by typing and then pressing Tab to advance to the next placeholder:

[Screenshot: filling in snippet placeholders]

Other useful snippets include:

  • lib, req, and source for the library, require, and source functions
  • df and mat for defining data frames and matrices
  • if, el, and ei for conditional expressions
  • apply, lapply, sapply, etc. for the apply family of functions
  • sc, sm, and sg for defining S4 classes/methods.

Snippets are a great way to automate inserting common/boilerplate code and are available for R, C/C++, JavaScript, and several other languages.

Inserting Snippets

As illustrated above, code snippets show up alongside other code completion results and can be inserted by picking them from the completion list. By default the completion list will show up automatically when you pause typing for 250 milliseconds and can also be manually activated via the Tab key. In addition, if you have typed the character sequence for a snippet and want to insert it immediately (without going through the completion list) you can press Shift+Tab.

Customizing Snippets

You can edit the built-in snippet definitions and even add snippets of your own via the Edit Snippets button in Global Options -> Code:

[Screenshot: the Edit Snippets dialog]

Custom snippets are defined using the snippet keyword. The contents of the snippet should be indented below using the <tab> key (rather than with spaces). Variables can be defined using the form ${1:varname}. For example, here’s the definition of the setGeneric snippet:

snippet sg
  setGeneric("${1:generic}", function(${2:x, ...}) {
    standardGeneric("${1:generic}")
  })

Once you’ve customized snippets for a given language they are written into the ~/.R/snippets directory. For example, the customized versions of R and C/C++ snippets are written to:

~/.R/snippets/r.snippets
~/.R/snippets/c_cpp.snippets

You can edit these files directly to customize snippet definitions or you can use the Edit Snippets dialog as described above. If you need to move custom snippet definitions to another system then simply place them in ~/.R/snippets and they’ll be used in preference to the built-in snippet definitions.
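
For example, here is a minimal sketch (not from the original post) of copying a customized snippet file into place from R; the source path exported/r.snippets is only a placeholder:

# Create the snippets directory if needed, then copy the customized file in.
# "exported/r.snippets" is a hypothetical path to the file you brought over.
dir.create("~/.R/snippets", recursive = TRUE, showWarnings = FALSE)
file.copy("exported/r.snippets", "~/.R/snippets/r.snippets", overwrite = TRUE)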

Try it Out

You can give code snippets a try now by downloading the RStudio Preview Release. If you run into problems or have feedback on how we could make things better let us know on our Support Forum.

I’m pleased to announce that readr is now available on CRAN. Readr makes it easy to read many types of tabular data:

  • Delimited files with read_delim(), read_csv(), read_tsv(), and read_csv2().
  • Fixed width files with read_fwf(), and read_table().
  • Web log files with read_log().

You can install it by running:

install.packages("readr")

Compared to the equivalent base functions, readr functions are around 10x faster. They’re also easier to use because they’re more consistent, they produce data frames that are easier to use (no more stringsAsFactors = FALSE!), they have a more flexible column specification, and any parsing problems are recorded in a data frame. Each of these features is described in more detail below.

Input

All readr functions work the same way. There are four important arguments:

  • file gives the file to read; a URL or local path. A local path can point to a zipped, bzipped, xzipped, or gzipped file – it’ll be automatically uncompressed in memory before reading. You can also pass in a connection or a raw vector.

    For small examples, you can also supply literal data: if file contains a new line, then the data will be read directly from the string. Thanks to data.table for this great idea!

    library(readr)
    read_csv("x,y\n1,2\n3,4")
    #>   x y
    #> 1 1 2
    #> 2 3 4
  • col_names: describes the column names (equivalent to header in base R). It has three possible values:
    • TRUE will use the first row of data as column names.
    • FALSE will number the columns sequentially.
    • A character vector to use as column names.
  • col_types: overrides the default column types (equivalent to colClasses in base R). More on that below.
  • progress: By default, readr will display a progress bar if the estimated loading time is greater than 5 seconds. Use progress = FALSE to suppress the progress indicator.
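
For instance, a small sketch (not from the original post) combining these arguments on literal data: the first row here is data rather than a header, so we supply column names and turn the progress bar off.

library(readr)

# No header row in the literal data, so name the columns explicitly.
read_csv("1,2\n3,4", col_names = c("x", "y"), progress = FALSE)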

Output

The output has been designed to make your life easier:

  • Characters are never automatically converted to factors (i.e. no more stringsAsFactors = FALSE!).
  • Column names are left as is, not munged into valid R identifiers (i.e. there is no check.names = TRUE). Use backticks to refer to variables with unusual names, e.g. df$`Income ($000)`.
  • The output has class c("tbl_df", "tbl", "data.frame") so if you also use dplyr you’ll get an enhanced print method (i.e. you’ll see just the first ten rows, not the first 10,000!).
  • Row names are never set.
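
As a quick illustration of the last few points (assuming readr is installed), the unusual column name below is kept exactly as written in the data and can be referred to with backticks:

library(readr)

df <- read_csv("Income ($000),Region\n100,West\n250,East")
names(df)           # "Income ($000)" and "Region", not munged
df$`Income ($000)`  # backticks let you refer to the unusual name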

Column types

Readr heuristically inspects the first 100 rows to guess the type of each column. This is not perfect, but it’s fast and it’s a reasonable start. Readr can automatically detect these column types:

  • col_logical() [l], contains only T, F, TRUE or FALSE.
  • col_integer() [i], integers.
  • col_double() [d], doubles.
  • col_euro_double() [e], “Euro” doubles that use , as the decimal separator.
  • col_date() [D], dates in Y-m-d form.
  • col_datetime() [T], ISO8601 date times.
  • col_character() [c], everything else.
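
Here is a small sketch of the guessing in action on literal data; the types noted in the comment are what the rules above should produce, but check str() on your own data to confirm.

library(readr)

df <- read_csv("a,b,c,d\n1,2.5,TRUE,2015-04-01\n2,3.5,FALSE,2015-04-02")
str(df)  # a: integer, b: double, c: logical, d: Date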

You can manually specify other column types:

  • col_skip() [_], don’t import this column.
  • col_date(format) and col_datetime(format, tz), dates or date times parsed with given format string. Dates and times are rather complex, so they’re described in more detail in the next section.
  • col_numeric() [n], a sloppy numeric parser that ignores everything apart from 0-9, - and . (this is useful for parsing currency data).
  • col_factor(levels, ordered), parse a fixed set of known values into a (optionally ordered) factor.

There are two ways to override the default choices with the col_types argument:

  • Use a compact string: "dc__d". Each letter corresponds to one column, so this specification means: read the first column as a double, the second as a character, skip the next two, and read the last column as a double. (There’s no way to use this form with column types that need parameters.)
  • With a (named) list of col objects:
    read_csv("iris.csv", col_types = list(
      Sepal.Length = col_double(),
      Sepal.Width = col_double(),
      Petal.Length = col_double(),
      Petal.Width = col_double(),
      Species = col_factor(c("setosa", "versicolor", "virginica"))
    ))

    Any omitted columns will be parsed automatically, so the previous call is equivalent to:

    read_csv("iris.csv", col_types = list(
      Species = col_factor(c("setosa", "versicolor", "virginica"))
    )

Dates and times

One of the most helpful features of readr is its ability to import dates and date times. It can automatically recognise the following formats:

  • Dates in year-month-day form: 2001-10-20 or 2010/10/15 (or any non-numeric separator). It can’t automatically recognise dates in m/d/y or d/m/y format because they’re ambiguous: is 02/01/2015 the 2nd of January or the 1st of February?
  • Date times in ISO8601 form: e.g. 2001-02-03 04:05:06.07 -0800, 20010203 040506, 20010203, etc. I don’t support every possible variant yet, so please let me know if it doesn’t work for your data (more details in ?parse_datetime).

If your dates are in another format, don’t despair. You can use col_date() and col_datetime() to explicitly specify a format string. Readr implements its own strptime() equivalent which supports the following format strings:

  • Year: %Y (4 digits). %y (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
  • Month: %m (2 digits), %b (abbreviated name in current locale), %B (full name in current locale).
  • Day: %d (2 digits), %e (optional leading space)
  • Hour: %H
  • Minutes: %M
  • Seconds: %S (integer seconds), %OS (partial seconds)
  • Time zone: %Z (as name, e.g. America/Chicago), %z (as offset from UTC, e.g. +0800)
  • Non-digits: %. skips one non-digit character, %* skips any number of non-digit characters.
  • Shortcuts: %D = %m/%d/%y, %F = %Y-%m-%d, %R = %H:%M, %T = %H:%M:%S, %x = %y/%m/%d.

To practice parsing date times without having to load a file each time, you can use parse_datetime() and parse_date():

parse_date("2015-10-10")
#> [1] "2015-10-10"
parse_datetime("2015-10-10 15:14")
#> [1] "2015-10-10 15:14:00 UTC"

parse_date("02/01/2015", "%m/%d/%Y")
#> [1] "2015-02-01"
parse_date("02/01/2015", "%d/%m/%Y")
#> [1] "2015-01-02"

Problems

If there are any problems parsing the file, the read_ function will throw a warning telling you how many problems there are. You can then use the problems() function to access a data frame that gives information about each problem:

csv <- "x,y
1,a
b,2
"

df <- read_csv(csv, col_types = "ii")
#> Warning: 2 problems parsing literal data. See problems(...) for more
#> details.
problems(df)
#>   row col   expected actual
#> 1   1   2 an integer      a
#> 2   2   1 an integer      b
df
#>    x  y
#> 1  1 NA
#> 2 NA  2

Helper functions

Readr also provides a handful of other useful functions:

  • read_lines() works the same way as readLines(), but is a lot faster.
  • read_file() reads a complete file into a string.
  • type_convert() attempts to coerce all character columns to their appropriate type. This is useful if you need to do some manual munging (e.g. with regular expressions) to turn strings into numbers. It uses the same rules as the read_* functions.
  • write_csv() writes a data frame out to a csv file. It’s quite a bit faster than write.csv() and it never writes row.names. It also escapes " embedded in strings in a way that read_csv() can read.
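
A short sketch of the last two helpers in combination (the data frame and temporary file are made up for illustration):

library(readr)

# Strings that should really be numbers, e.g. after manual munging.
df <- data.frame(x = c("1", "2"), y = c("a", "b"), stringsAsFactors = FALSE)
str(type_convert(df))  # x becomes an integer, y stays character

# Round-trip through write_csv()/read_csv(); no row names are written.
tmp <- tempfile(fileext = ".csv")
write_csv(type_convert(df), tmp)
read_csv(tmp)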

Development

Readr is still under very active development. If you have problems loading a dataset, please try the development version, and if that doesn’t work, file an issue.

Action Buttons

Action buttons can be tricky to use in Shiny because they work differently than other widgets. Widgets like sliders and select boxes maintain a value that is easy to use in your code. But the value of an action button is arbitrary. What should you do with it? Did you know that you should almost always call the value of an action button from observeEvent() or eventReactive()?
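
Here is a minimal sketch of that idea (this is not the article’s code, and the input/output names are made up): the button’s value is never used directly, it only triggers eventReactive().

library(shiny)

ui <- fluidPage(
  actionButton("go", "Sample"),
  verbatimTextOutput("result")
)

server <- function(input, output) {
  # Nothing is computed until the button is clicked; each click re-runs it.
  sampled <- eventReactive(input$go, {
    rnorm(5)
  })
  output$result <- renderPrint(sampled())
}

shinyApp(ui, server)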

The newest article at the Shiny Development Center explains how action buttons work, and it provides five useful patterns for working with action buttons. These patterns also work well with action links.

Read the article here.

Data Visualization Cheatsheet

We’ve added a new cheatsheet to our collection. Data Visualization with ggplot2 describes how to build a plot with ggplot2 and the grammar of graphics. You will find helpful reminders of how to use:

  • geoms
  • stats
  • scales
  • coordinate systems
  • facets
  • position adjustments
  • legends, and
  • themes

The cheatsheet also documents tips on zooming.

Download the cheatsheet here.

Bonus – Frans van Dunné of Innovate Online has provided Spanish translations of the Data Wrangling, R Markdown, Shiny, and Package Development cheatsheets. Download them at the bottom of the cheatsheet gallery.

Package Development Cheatsheet

We’ve added a new cheatsheet to our collection! Package Development with devtools will help you find the most useful functions for building packages in R. The cheatsheet will walk you through the steps of building a package, including:

  • Setting up the package structure
  • Adding a DESCRIPTION file
  • Writing code
  • Writing tests
  • Writing documentation with roxygen
  • Adding data sets
  • Building a NAMESPACE, and
  • Including vignettes

The sheet focuses on Hadley Wickham’s devtools package, and it is a useful supplement to Hadley’s book R Packages, which you can read online at r-pkgs.had.co.nz.

Download the sheet here.

Bonus – Vivian Zhang of SupStat Analytics has kindly translated the existing Data Wrangling, R Markdown, and Shiny cheatsheets into Chinese. You can download the translations at the bottom of the cheatsheet gallery.

I’m pleased to announce that the new haven package is now available on CRAN. Haven makes it easy to read data from SAS, SPSS, and Stata. Haven has the same goal as the foreign package, but it:

  • Can read binary SAS7BDAT files.
  • Can read Stata 13 files.
  • Always returns a data frame.

(Haven also has experimental support for writing SPSS and Stata data. This still has some rough edges but please try it out and report any problems that you find.)

Haven is a binding to the excellent ReadStat C library by Evan Miller. Haven wouldn’t be possible without his hard work – thanks Evan! I’d also like to thank Matt Shotwell, who spent a lot of time reverse engineering the SAS binary data format, and Dennis Fisher, who tested the SAS code with thousands of SAS files.

Usage

Using haven is easy:

  • Install it, install.packages("haven"),
  • Load it, library(haven),
  • Then pick the appropriate read function:
    • SAS: read_sas()
    • SPSS: read_sav() or read_por()
    • Stata: read_dta().

These functions only need the path to the file. (read_sas() optionally also takes the path to a catalog file.)
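
For example (the file paths below are made up):

library(haven)

sas_df   <- read_sas("data/measurements.sas7bdat")
spss_df  <- read_sav("data/survey.sav")
stata_df <- read_dta("data/panel.dta")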

Output

All functions return a data frame:

  • The output also has class tbl_df which will improve the default print method (to only show the first ten rows and the variables that fit on one screen) if you have dplyr loaded. If you don’t use dplyr, it has no effect.
  • Variable labels are attached as an attribute to each variable. These are not printed (because they tend to be long), but if you have a preview version of RStudio, you’ll see them in the revamped viewer pane.
  • Missing values in numeric variables should be seamlessly converted. Missing values in character variables are converted to the empty string, "": if you want to convert them to missing values, use zap_empty().
  • Dates are converted into Dates, and date times into POSIXcts. Time variables are read into a new class called hms, which represents an offset in seconds from midnight. It has print() and format() methods to nicely display times, but otherwise behaves like an integer vector.
  • Variables with labelled values are turned into a new labelled class, as described next.

Labelled variables

SAS, Stata and SPSS all have the notion of a “labelled” variable. These are similar to factors, but:

  • Integer, numeric and character vectors can be labelled.
  • Not every value must be associated with a label.

Factors, by contrast, are always integers and every integer value must be associated with a label.

Haven provides a labelled class to model these objects. It doesn’t implement any common methods, but instead focuses on ways to turn a labelled variable into a standard R variable:

  • as_factor(): turns labelled integers into factors. Any values that don’t have a label associated with them will become a missing value. (NB: there’s no way to make as.factor() work with labelled variables, so you’ll need to use this new function.)
  • zap_labels(): turns any labelled values into missing values. This deals with the common pattern where you have a continuous variable whose missing values are indicated by sentinel values. (See the short sketch after this list.)
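
Here is a small sketch of both functions on hand-built labelled vectors; the values and labels are invented, and the behaviour noted in the comments follows the descriptions above:

library(haven)

x <- labelled(c(1, 2, 1, 9), c(Male = 1, Female = 2, Refused = 9))
as_factor(x)   # labels become factor levels

y <- labelled(c(25, 31, 9, 40), c(Refused = 9))
zap_labels(y)  # the labelled sentinel value 9 becomes NA; other values are kept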

If you have a use case that’s not covered by these functions, please let me know.

Development

Haven is still under very active development. If you have problems loading a dataset, please try the development version, and if that doesn’t work, file an issue.

RStudio is excited to announce the general availability (GA) of shinyapps.io.

Shinyapps.io is an easy-to-use, secure, and scalable hosted service already being used by thousands of professionals and students to deploy Shiny applications on the web. Effective today, shinyapps.io has completed beta testing and is generally available as a commercial service for anyone.

As regular readers of our blog know, Shiny is a popular free and open source R package from RStudio that simplifies the creation of interactive web applications, dashboards, and reports. Until today, Shiny Server and Shiny Server Pro were the most popular ways to share shiny apps. Now, there is a commercially supported alternative for individuals and groups who don’t have the time or resources to install and manage their own servers.

We want to thank the nearly 8,000 people who created at least one shiny app and deployed it on shinyapps.io during its extensive alpha and beta testing phases! The service was improved for everyone because of your willingness to give us feedback and bear with us as we continuously added to its capabilities.

For R users developing Shiny applications who haven’t yet created a shinyapps.io account, we hope you’ll give it a try soon! We did our best to keep the pricing simple and predictable with Free, Basic, Standard, and Professional plans. Each paid plan has features and functionality that we think will appeal to different users and can be purchased with a credit card by month or year. You can learn more about shinyapps.io pricing plans and product features on our website.

We hope to see your shiny app on shinyapps.io soon!

RStudio’s data viewer provides a quick way to look at the contents of data frames and other column-based data in your R environment. You invoke it by clicking on the grid icon in the Environment pane, or at the console by typing View(mydata).

[Screenshot: the grid icon in the Environment pane]

As part of the RStudio Preview Release, we’ve completely overhauled RStudio’s data viewer with modern features provided in part by a new interface built on DataTables.

No Row Limit

While the data viewer in 0.98 was limited to the first 1,000 rows, you can now view all the rows of your data set. RStudio loads just the portion of the data you’re looking at into the user interface, so things won’t get sluggish even when you’re working with large data sets.

[Screenshot: viewing all rows of a large data set]

We’ve also added fixed column headers, and support for column labels imported from SPSS and other systems.

Sorting and Filtering

RStudio isn’t designed to act like a spreadsheet, but sometimes it’s helpful to do a quick sort or filter to get some idea of the data’s characteristics before moving into reproducible data analysis. Towards that end, we’ve built some basic sorting and filtering into the new data viewer.

Sorting

Click a column once to sort data in ascending order, and again to sort in descending order. For instance, how big is the biggest diamond?

[Screenshot: sorting a column]

To clear all sorts and filters on the data, click the upper-left column header.

Filtering

Click the new Filter button to enter Filter mode, then click the white filter value box to filter a column. You might, for instance, want to look only at smaller diamonds:

[Screenshot: filtering a numeric column]

Not all data types can be filtered; at the moment, you can filter only numeric types, characters, and factors.

You can also stack filters; for instance, let’s further restrict this view to small diamonds with a Very Good cut:

[Screenshot: filtering on a factor column]

Full-Text Search

You can search the full text of your data frame using the new Search box in the upper right. This is useful for finding specific records; for instance, how many people named John were born in 2013?

[Screenshot: full-text search]

Live Update

If you invoke the data viewer on a variable as in View(mydata), the data viewer will (in most cases) automatically refresh whenever data in the variable changes.

You can use this feature to watch data change as you manipulate it. It continues to work even when the data viewer is popped out, a configuration that combines well with multi-monitor setups.
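
For example, a tiny sketch using a built-in data set:

mydata <- mtcars
View(mydata)                       # opens the data viewer

mydata$kpl <- mydata$mpg * 0.4251  # the viewer should refresh to show the new column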

We hope these improvements help you understand your data more quickly and easily. Try out the RStudio Preview Release and let us know what you think!

RStudio’s code editor includes a set of lightweight Vim key bindings. You can turn these on in Tools | Global Options | Code | Editing:

[Screenshot: enabling Vim key bindings in Global Options]

For those not familiar, Vim is a popular text editor built to enable efficient text editing. It can take some practice and dedication to master Vim-style editing, but those who have done so typically swear by it. RStudio’s “vim mode” enables the use of many of the most common keyboard operations from Vim right inside RStudio.

As part of the 0.99 preview release, we’ve included an upgraded version of the ACE editor, which has a completely revamped Vim mode. This mode extends the range of Vim key bindings that are supported, and implements a number of Vim “power features” that go beyond basic text motions and editing. These include:

  • Vertical block selection via Ctrl + V. This integrates with the new multiple cursor support in ACE and allows you to type in multiple lines at once.
  • Macro playback and recording, using q{register} / @{register}.
  • Marks, which allow you to drop markers in your source and jump back to them quickly later.
  • A selection of Ex commands, such as :wq and :%s that allow you to perform editor operations as you would in native Vim.
  • Fast in-file search with e.g. / and *, and support for JavaScript regular expressions.

We’ve also added a Vim quick reference card to the IDE that you can bring up at any time to show the supported key bindings. To see it, switch your editor to Vim mode (as described above) and type :help in Command mode.

[Screenshot: the Vim quick reference card]

Whether you’re a Vim novice or power user, we hope these improvements make the RStudio IDE’s editor a more productive and enjoyable environment for you. You can try the new Vim features out now by downloading the RStudio Preview Release.

We’re busy at work on the next version of RStudio (v0.99) and this week we’ll be blogging about some of the noteworthy new features. If you want to try out any of the new features now you can do so by downloading the RStudio Preview Release.

The first feature to highlight is a fully revamped implementation of code completion for R. We’ve always supported a limited form of completion; however, (a) it only worked on objects in the global environment, and (b) it only worked when expressly requested via the Tab key. As a result, not nearly enough users discovered or benefited from code completion. In this release code completion is much more comprehensive.

Smarter Completion Engine

Previously, RStudio only completed variables that already existed in the global environment. Now completion is based on source code analysis, so completions are provided even for objects that haven’t been fully evaluated:

[Screenshot: completions for objects inferred from source code]

Completions are also provided for a wide variety of specialized contexts including dimension names in [ and [[:

[Screenshot: dimension name completions inside single and double brackets]

RStudio now provides completions for function arguments within function chains using magrittr’s %>% operator, e.g. for dplyr data transformation pipelines. Extending this behavior, we also provide the appropriate completions for the various ‘verbs’ used by dplyr:

[Screenshots: completions within %>% pipelines and for dplyr verbs]

In addition, certain functions, such as library() and require(), expect package names for completions. RStudio automatically infers whether a particular function expects a package name and provides those names as completion results:

[Screenshot: package name completions in library()]

Completion is now also S3 and S4 aware. If RStudio is able to determine which method a particular function call will be dispatched to, it will attempt to retrieve completions from that method. For example, the sort.default() method provides an extra argument, na.last, not available in the sort() generic. RStudio will provide completions for that argument if S3 dispatch would choose sort.default():

[Screenshot: S3 method argument completions]
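
For reference, here is the base R behaviour that example refers to; na.last is accepted because dispatch goes to sort.default():

sort(c(3, NA, 1), na.last = TRUE)
#> [1]  1  3 NA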

Beyond what’s described above there are lots more new places where completions are provided:

  • For Shiny applications, completions for ui.R + server.R pairs
  • Completions for knitr options, e.g. in opts_chunk$get(), are now supplied
  • Completions for dynamic symbols within .C, .Call, .Fortran, .External

Additional Enhancements

Always On Completion

Previously RStudio only displayed completions “on-demand” in response to the tab key. Now, RStudio will proactively display completions after a $ or :: as well as after a period of typing inactivity. All of this behavior is configurable via the new completion options panel:

[Screenshot: the completion options panel]

File Completions

When within an RStudio project, completions will be applied recursively to all file names matching the current token. The enclosing parent directory is printed on the right:

[Screenshot: file name completions within a project]

Fuzzy Narrowing

Got a completion with an excessively long name, perhaps a particularly long-named Bioconductor package, or another lengthy variable or function name? RStudio now uses ‘fuzzy narrowing’ on the completion list: it checks whether the text you’ve typed matches a ‘subsequence’ of each completion. By subsequence, we mean a sequence of characters that need not be adjacent within the completion, so that, for example, ‘fpse’ can match ‘file_path_sans_extension’. We hope that users will quickly become accustomed to this behavior and find it very useful.

[Screenshot: fuzzy narrowing of the completion list]

Trying it Out

We think that the new completion features make for a qualitatively better experience of writing R code for beginning and expert users alike.  You can give the new features a try now by downloading the RStudio Preview Release.  If you run into problems or have feedback on how we could make things better let us know on our Support Forum.

 
