You are currently browsing the category archive for the ‘Packages’ category.

I’m very pleased to announce that Epoch.com has stepped up as a sponsor for the RMySQL package.

For the last 20 years, Epoch.com has built its Internet Payment Service Provider infrastructure on open source software. Their data team, led by Szilard Pafka, PhD, has been using R for nearly a decade, developing cutting-edge data visualization, machine learning and other analytical applications. According to Epoch, “We have always believed in the value of R and in the importance of contributing to the open source community.”

This sort of sponsorship is very important to me. While I already spend most of my time working on R packages, I don’t have the skills to fix every problem. Sponsorship allows me to hire outside experts. In this case, Epoch.com’s sponsorship allowed me to work with Jeroen Ooms to improve the build system for RMySQL so that a CRAN binary is available for every platform.

Is your company interested in sponsoring other infrastructure work that benefits the whole R community? If so, please get in touch.

Shiny version 0.11 is available now! Notable changes include:

  • Shiny has migrated from Bootstrap 2 to Bootstrap 3 for its web front end. More on this below.
  • The old jsliders have been replaced with ion.rangeSlider. These sliders look better, are easier for users to interact with, and support updating more fields from the server side.
  • There is a new passwordInput() which can be used to create password fields.
  • New observeEvent() and eventReactive() functions greatly streamline the use of actionButton and other inputs that act more like events than reactive inputs.

For a full set of changes, see the NEWS file. To install, run:

install.packages("shiny")

We’ve also posted an article with notes on upgrading to 0.11.

Bootstrap 3 migration

In all versions of Shiny prior to 0.11, Shiny has used the Bootstrap 2 framework for its web front-end. Shiny generates HTML that is structured to work with Bootstrap, and this makes it easy to create pages with sidebars, tabs, dropdown menus, mobile device support, and so on.

The Bootstrap development team stopped development on the Bootstrap 2 series after version 2.3.2, which was released over a year ago, and has since focused their efforts on Bootstrap 3. The new version of Bootstrap builds on many of the same underlying ideas, but it also has many small changes – for example, many of the CSS class names have changed.

In Shiny 0.11, we’ve moved to Bootstrap 3. For most Shiny users, the transition will be seamless; the only differences you’ll see are slight changes to fonts and spacing.

If, however, you customized any of your code to use features specific to Bootstrap 2, then you may need to update your code to work with Bootstrap 3 (see the Bootstrap migration guide for details). If you don’t want to update your code right away, you can use the shinybootstrap2 package for backward compatibility with Bootstrap 2 – using it requires adding just two lines of code. If you do use shinybootstrap2, we suggest using it just as an interim solution until you update your code for Bootstrap 3, because Shiny development going forward will use Bootstrap 3.

Why is Shiny moving to Bootstrap 3? One reason is support: as mentioned earlier, Bootstrap 2 is no longer developed and is no longer supported. Another reason is that there is dynamic community of actively-developed Bootstrap 3 themes. (Themes for Bootstrap 2 also exist, but there is less development activity.) Using these themes will allow you to customize the appearance of a Shiny app so that it doesn’t just look like… a Shiny app.

We’ve also created a package that make it easy to use Bootstrap themes: shinythemes. Here’s an example using the included Flatly theme: flatly

See the shinythemes site for more screenshots and instructions on how to use it.

We’re also working on shinydashboard, a package that makes it easy to create dashboards. Here’s an example dashboard that also uses the leaflet package.

buses

The shinydashboard package still under development, but feel free to try it out and give us feedback.

Jeroen Ooms and I are very pleased to announce a new version of RMySQL, the R package that allows you to talk to MySQL (and MariaDB) databases. We have taken over maintenance from Jeffrey Horner, who has done a great job of maintaining the package of the last few years, but no longer has time to look after it. Thanks for all your hard work Jeff!

Using RMySQL

library(DBI)

# Connect to a public database that I'm running on Google's 
# cloud SQL service. It contains a copy of the data in the
# datasets package.
con <-  dbConnect(RMySQL::MySQL(), 
  username = "public", 
  password = "F60RUsyiG579PeKdCH",
  host = "173.194.227.144", 
  port = 3306, 
  dbname = "datasets"
)

# Run a query
dbGetQuery(con, "SELECT * FROM mtcars WHERE cyl = 4 AND mpg < 23")
#>       row_names  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1    Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 2      Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 3 Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 4    Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

# It's polite to let the database know when you're done
dbDisconnect(con)
#> [1] TRUE

It’s generally a bad idea to put passwords in your code, so instead of typing them directly, you can create a file called ~/.my.cnf that contains

[cloudSQL]
username=public
password=F60RUsyiG579PeKdCH
host=173.194.227.144
port=3306
database=datasets

Then you can connect with:

con <-  dbConnect(RMySQL::MySQL(), group = "cloudSQL")

Changes in this release

RMySQL 0.10.0 is mostly a cleanup release. RMySQL is one of the oldest packages on CRAN, and according to the timestamps, it is older than many recommended packages, and only slightly younger than MASS! That explains why a facelift was well overdue.

The most important change is an improvement to the build process so that CRAN binaries are now available for Windows and OS X Mavericks. This should make your life much easier if you’re on one of these platforms. We’d love your feedback on the new build scripts. There have been many problems in the past, so we’d like to know that this client works well across platforms and versions of MySQL server.

Otherwise, the changes update RMySQL for DBI 0.3 compatibility:

  • Internal mysql*() functions are no longer exported. Please use the corresponding DBI generics instead.
  • RMySQL gains transaction support with dbBegin(), dbCommit(), and dbRollback(). (But note that MySQL does not allow data definition language statements to be rolled back.)
  • Added method for dbFetch(). Please use this instead of fetch(). dbFetch() now returns a 0-row data frame (instead of an 0-col data frame) if there are no results.
  • Added methods for dbIsValid(). Please use these instead of isIdCurrent().
  • dbWriteTable() has been rewritten. It uses a better quoting strategy, throws errors on failure, and only automatically adds row names only if they’re strings. (NB: dbWriteTable() also has a method that allows you load files directly from disk – this is likely to be faster if your file is one of the formats supported.)

For a complete list of changes, please see the full release notes.

ggplot2 1.0.0

As you might have noticed, ggplot2 recently turned 1.0.0. This release incorporated a handful of new features and bug fixes, but most importantly reflects that ggplot2 is now a mature plotting system and it will not change significantly in the future.

This does not mean ggplot2 is dead! The ggplot2 community is rich and vibrant and the number of packages that build on top of ggplot2 continues to grow. We are committed to maintaining ggplot2 so that you can continue to rely on it for years to come.

The ggplot2 book

Since ggplot2 is now stable, and the ggplot2 book is over five years old and rather out of date, I’m also happy to announce that I’m working on a second edition. I’ll be ably assisted in this endeavour by Carson Sievert, who’s so far done a great job of converting the source to Rmd and updating many of the examples to work with ggplot2 1.0.0. In the coming months we’ll be rewriting the data chapter to reflect modern best practices (e.g. tidyr and dplyr), and adding sections about new features.

We’d love your help! The source code for the book is available on github. If you’ve spotted any mistakes in the first edition that you’d like to correct, we’d really appreciate a pull request. If there’s a particular section of the book that you think needs an update (or is just plain missing), please let us know by filing an issue. Unfortunately we can’t turn the book into a free website because of my agreement with the publisher, but at least you can now get easily get to the source.

Give yourself the gift of “mastering” R to start 2015!

Join RStudio Chief Data Scientist Hadley Wickham at the Westin San Francisco on January 19 and 20 for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers.

As of this post, the workshop is two-thirds sold out. If you’re in or near California and want to boost your R programming skills, this is Hadley’s only West Coast public workshop planned for 2015.

Register here: http://rstudio-sfbay.eventbrite.com/

Today we’re excited to announce htmlwidgets, a new framework that brings the best of JavaScript data visualization libraries to R. There are already several packages that take advantage of the framework (leaflet, dygraphs, networkD3, DataTables, and rthreejs) with hopefully many more to come.

An htmlwidget works just like an R plot except it produces an interactive web visualization. A line or two of R code is all it takes to produce a D3 graphic or Leaflet map. Widgets can be used at the R console as well as embedded in R Markdown reports and Shiny web applications. Here’s an example of using leaflet directly from the R console:

rconsole.2x

When printed at the console the leaflet widget displays in the RStudio Viewer pane. All of the tools typically available for plots are also available for widgets, including history, zooming, and export to file/clipboard (note that when not running within RStudio widgets will display in an external web browser).

Here’s the same widget in an R Markdown report. Widgets automatically print as HTML within R Markdown documents and even respect the default knitr figure width and height.

rmarkdown.2x

Widgets also provide Shiny output bindings so can be easily used within web applications. Here’s the same widget in a Shiny application:

shiny.2x

Bringing JavaScript to R

The htmlwidgets framework is a collaboration between Ramnath Vaidyanathan (rCharts), Kenton Russell (Timely Portfolio), and RStudio. We’ve all spent countless hours creating bindings between R and the web and were motivated to create a framework that made this as easy as possible for all R developers.

There are a plethora of libraries available that create attractive and fully interactive data visualizations for the web. However, the programming interface to these libraries is JavaScript, which places them outside the reach of nearly all statisticians and analysts. htmlwidgets makes it extremely straightforward to create an R interface for any JavaScript library.

Here are a few widget libraries that have been built so far:

  • leaflet, a library for creating dynamic maps that support panning and zooming, with various annotations like markers, polygons, and popups.
  • dygraphs, which provides rich facilities for charting time-series data and includes support for many interactive features including series/point highlighting, zooming, and panning.
  • networkD3, a library for creating D3 network graphs including force directed networks, Sankey diagrams, and Reingold-Tilford tree networks.
  • DataTables, which displays R matrices or data frames as interactive HTML tables that support filtering, pagination, and sorting.
  • rthreejs, which features 3D scatterplots and globes based on WebGL.

All of these libraries combine visualization with direct interactivity, enabling users to explore data dynamically. For example, time-series visualizations created with dygraphs allow dynamic panning and zooming:

NewHavenTemps

Learning More

To learn more about the framework and see a showcase of the available widgets in action check out the htmlwidgets web site. To learn more about building your own widgets, install the htmlwidgets package from CRAN and check out the developer documentation.

 

tidyr 0.2.0 is now available on CRAN. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has variables in columns and observations in rows, and is described in more detail in the tidy data vignette. Install tidyr with:

install.packages("tidyr")

There are three important additions to tidyr 0.2.0:

  • expand() is a wrapper around expand.grid() that allows you to generate all possible combinations of two or more variables. In conjunction with dplyr::left_join(), this makes it easy to fill in missing rows of data.
    sales <- dplyr::data_frame(
      year = rep(c(2012, 2013), c(4, 2)),
      quarter = c(1, 2, 3, 4, 2, 3), 
      sales = sample(6) * 100
    )
    
    # Missing sales data for 2013 Q1 & Q4
    sales
    #> Source: local data frame [6 x 3]
    #> 
    #>   year quarter sales
    #> 1 2012       1   400
    #> 2 2012       2   200
    #> 3 2012       3   500
    #> 4 2012       4   600
    #> 5 2013       2   300
    #> 6 2013       3   100
    
    # Missing values are now explicit
    sales %>% 
      expand(year, quarter) %>%
      dplyr::left_join(sales)
    #> Joining by: c("year", "quarter")
    #> Source: local data frame [8 x 3]
    #> 
    #>   year quarter sales
    #> 1 2012       1   400
    #> 2 2012       2   200
    #> 3 2012       3   500
    #> 4 2012       4   600
    #> 5 2013       1    NA
    #> 6 2013       2   300
    #> 7 2013       3   100
    #> 8 2013       4    NA
  • In the process of data tidying, it’s sometimes useful to have a column of a data frame that is a list of vectors. unnest() lets you simplify that column back down to an atomic vector, duplicating the original rows as needed. (NB: If you’re working with data frames containing lists, I highly recommend using dplyr’s tbl_df, which will display list-columns in a way that makes their structure more clear. Use dplyr::data_frame() to create a data frame wrapped with the tbl_df class.)
    raw <- dplyr::data_frame(
      x = 1:3,
      y = c("a", "d,e,f", "g,h")
    )
    # y is character vector containing comma separated strings
    raw
    #> Source: local data frame [3 x 2]
    #> 
    #>   x     y
    #> 1 1     a
    #> 2 2 d,e,f
    #> 3 3   g,h
    
    # y is a list of character vectors
    as_list <- raw %>% mutate(y = strsplit(y, ","))
    as_list
    #> Source: local data frame [3 x 2]
    #> 
    #>   x        y
    #> 1 1 <chr[1]>
    #> 2 2 <chr[3]>
    #> 3 3 <chr[2]>
    
    # y is a character vector; rows are duplicated as needed
    as_list %>% unnest(y)
    #> Source: local data frame [6 x 2]
    #> 
    #>   x y
    #> 1 1 a
    #> 2 2 d
    #> 3 2 e
    #> 4 2 f
    #> 5 3 g
    #> 6 3 h
  • separate() has a new extra argument that allows you to control what happens if a column doesn’t always split into the same number of pieces.
    raw %>% separate(y, c("trt", "B"), ",")
    #> Error: Values not split into 2 pieces at 1, 2
    raw %>% separate(y, c("trt", "B"), ",", extra = "drop")
    #> Source: local data frame [3 x 3]
    #> 
    #>   x trt  B
    #> 1 1   a NA
    #> 2 2   d  e
    #> 3 3   g  h
    raw %>% separate(y, c("trt", "B"), ",", extra = "merge")
    #> Source: local data frame [3 x 3]
    #> 
    #>   x trt   B
    #> 1 1   a  NA
    #> 2 2   d e,f
    #> 3 3   g   h

To read about the other minor changes and bug fixes, please consult the release notes.

reshape2 1.4.1

There’s also a new version of reshape2, 1.4.1. It includes three bug fixes for melt.data.frame() contributed by Kevin Ushey. Read all about them on the release notes and install it with:

install.packages("reshape2")

(Posted on behalf of Stefan Milton Bache)

Sometimes it’s the small things that make a big difference. For me, the introduction of our awkward looking friend, %>%, was one such little thing. I’d never suspected that it would have such an impact on the way quite a few people think and write R (including my own), or that pies would be baked (see here) and t-shirts printed (e.g. here) in honor of the successful three-char-long and slightly overweight operator. Of course a big part of the success is the very fruitful relationship with dplyr and its powerful verbs.

Quite some time went by without any changes to the CRAN version of magrittr. But many ideas have been evaluated and tested, and now we are happy to finally bring an update which brings both some optimization and a few nifty features — we hope that we have managed to strike a balance between simplicity and usefulness and that you will benefit from this update. You can install it now with:

install.packages("magrittr")

The underlying evaluation model is more coherent in this release; this makes the new features more natural extensions and improves performance somewhat. Below I’ll recap some of the important new features, which include functional sequences, a few specialized supplementary operators and better lambda syntax.

Functional sequences

The basic (pseudo) usage of the pipe operator goes something like this:

awesome_data <-
  raw_interesting_data %>%
  transform(somehow) %>%
  filter(the_good_parts) %>%
  finalize

This statement has three parts: an input, an output, and a sequence transformations. That’s suprisingly close to the definition of a function, so in magrittr is really just a convenient way of of defining and applying a function.
A new really useful feature of magrittr 1.5 makes that explicit: you can use %>% to not only produce values but also to produce functions (or functional sequences)! It’s really all the same, except sometimes the function is applied instantly and produces a result, and sometimes it is not, in which case the function itself is returned. In this case, there is no initial value, so we replace that with the dot placeholder. Here is how:

mae <- . %>% abs %>% mean(na.rm = TRUE)
mae(rnorm(10))
#> [1] 0.5605

That’s equivalent to:

mae <- function(x) {
  mean(abs(x), na.rm = TRUE)
}

Even for a short function, this is more compact, and is easier to read as it is defined linearly from left to right.
There are some really cool use cases for this: functionals! Consider how clean it is to pass a function to lapply or aggregate!

info <-
  files %>%
  lapply(. %>% read_file %>% extract(the_goodies))

Functions made this way can be indexed with [ to get a new function containing only a subset of the steps.

Lambda expressions

The new version makes it clearer that each step is really just a single-statement body of a unary function. What if we need a little more than one command to make a satisfactory “step” in a chain? Before, one might either define a function outside the chain, or even anonymously inside the chain, enclosing the entire definition in parentheses. Now extending that one command is like extending a standard one-command function: enclose whatever you’d like in braces, and that’s it:

value %>%
  foo %>% {
    x <- bar(.)
    y <- baz(.)
    x * y
  } %>%
  and_whatever

As usual, the name of the argument to that unary function is ..

Nested function calls

In this release the dot (.) will work also in nested function calls on the right-hand side, e.g.:

1:5 %>% 
  paste(letters[.])
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"

When you use . inside a function call, it’s used in addition to, not instead of, . at the top-level. For example, the previous command is equivalent to:

1:5 %>% 
  paste(., letters[.])
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"

If you don’t want this behaviour, wrap the function call in {:

1:5 %>% {
  paste(letters[.])
}
#> [1] "a" "b" "c" "d" "e"

A few of %>%’s friends

We also introduce a few operators. These are supplementary operators that just make some situations more comfortable.
The tee operator, %T>%, enables temporary branching in a pipeline to apply a few side-effect commands to the current value, like plotting or logging, and is inspired by the Unix tee command. The only difference to %>% is that %T>% returns the left-hand side rather than the result of applying the right-hand side:

value %>%
  transform %T>%
  plot %>%
  transform(even_more)

This is a shortcut for:

value %>%
  transform %>%
  { plot(.); . } %>%
  transform(even_more)

because plot() doesn’t normally return anything that can be piped along!
The exposition operator, %$%, is a wrapper around with(),
which makes it easy to refer to the variables inside a data frame:

mtcars %$%
  plot(mpg, wt)

Finally, we also have %<>%, the compound assignment pipe operator. This must be the first operator in the chain, and it will assign the result of the pipeline to the left-hand side name or expression. It’s purpose is to shorten expressions like this:

data$some_variable <-
  data$some_variable %>%
  transform

and turn them into something like this:

data$some_variable %<>%
  transform

Even a small example like x %<>% sort has its appeal!
In summary there is a few new things to get to know; but magrittr is like it always was. Just a little coolr!

rvest is new package that makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces. Install it with:

install.packages("rvest")

rvest in action

To see rvest in action, imagine we’d like to scrape some information about The Lego Movie from IMDB. We start by downloading and parsing the file with html():

library(rvest)
lego_movie <- html("http://www.imdb.com/title/tt1490017/")

To extract the rating, we start with selectorgadget to figure out which css selector matches the data we want: strong span. (If you haven’t heard of selectorgadget, make sure to read vignette("selectorgadget") – it’s the easiest way to determine which selector extracts the data that you’re interested in.) We use html_node() to find the first node that matches that selector, extract its contents with html_text(), and convert it to numeric with as.numeric():

lego_movie %>% 
  html_node("strong span") %>%
  html_text() %>%
  as.numeric()
#> [1] 7.9

We use a similar process to extract the cast, using html_nodes() to find all nodes that match the selector:

lego_movie %>%
  html_nodes("#titleCast .itemprop span") %>%
  html_text()
#>  [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"    
#>  [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
#>  [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson" 
#> [10] "Will Ferrell"    "Will Forte"      "Dave Franco"    
#> [13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"

The titles and authors of recent message board postings are stored in a the third table on the page. We can use html_node() and [[ to find it, then coerce it to a data frame with html_table():

lego_movie %>%
  html_nodes("table") %>%
  .[[3]] %>%
  html_table()
#>                                              X 1            NA
#> 1 this movie is very very deep and philosophical   mrdoctor524
#> 2 This got an 8.0 and Wizard of Oz got an 8.1...  marr-justinm
#> 3                         Discouraging Building?       Laestig
#> 4                              LEGO - the plural      neil-476
#> 5                                 Academy Awards   browncoatjw
#> 6                    what was the funniest part? actionjacksin

Other important functions

  • If you prefer, you can use xpath selectors instead of css: html_nodes(doc, xpath = "//table//td")).

  • Extract the tag names with html_tag(), text with html_text(), a single attribute with html_attr() or all attributes with html_attrs().

  • Detect and repair text encoding problems with guess_encoding() and repair_encoding().

  • Navigate around a website as if you’re in a browser with html_session(), jump_to(), follow_link(), back(), and forward(). Extract, modify and submit forms with html_form(), set_values() and submit_form(). (This is still a work in progress, so I’d love your feedback.)

To see these functions in action, check out package demos with demo(package = "rvest").

I’m very pleased to announce a new version of RSQLite 1.0.0. RSQLite is the easiest way to use SQL database from R:

library(DBI)
# Create an ephemeral in-memory RSQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Copy in the buit-in mtcars data frame
dbWriteTable(con, "mtcars", mtcars, row.names = FALSE)
#> [1] TRUE

# Fetch all results from a query:
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4 AND mpg < 23")
dbFetch(res)
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 3 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 4 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
dbClearResult(res)
#> [1] TRUE

# Or fetch them a chunk at a time
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
while(!dbHasCompleted(res)){
  chunk <- dbFetch(res, n = 10)
  print(nrow(chunk))
}
#> [1] 10
#> [1] 1
dbClearResult(res)
#> [1] TRUE

# Good practice to disconnect from the database when you're done
dbDisconnect(con)
#> [1] TRUE

RSQLite 1.0.0 is mostly a cleanup release. This means a lot of old functions have been deprecated and removed:

  • idIsValid() is deprecated; use dbIsValid() instead. dbBeginTransaction() is deprecated; use dbBegin() instead. Use dbFetch() instead of fetch().
  • dbBuildTableDefinition() is now sqliteBuildTableDefinition() (to avoid implying that it’s a DBI generic).
  • Internal sqlite*() functions are no longer exported (#20). safe.write() is no longer exported.

It also includes a few minor improvements and bug fixes. The most important are:

  • Inlined RSQLite.extfuns – use initExtension() to load the many useful extension functions.
  • Methods no longer automatically clone the connection is there is an open result set. This was implemented inconsistently in a handful of places. RSQLite is now more forgiving if you forget to close a result set – it will close it for you, with a warning. It’s still good practice to clean up after yourself with dbClearResults(), but you don’t have to.
  • dbBegin(), dbCommit() and dbRollback() throw errors on failure, rather than returning FALSE. They all gain a name argument to specify named savepoints.
  • dbWriteTable() has been rewritten. It uses a better quoting strategy, throws errors on failure, and only automatically adds row names only if they’re strings. (NB: dbWriteTable() also has a method that allows you load files directly from disk.)

For a complete list of changes, please see the full release notes.

Follow

Get every new post delivered to your Inbox.

Join 12,397 other followers