You are currently browsing hadleywickham’s articles.

httr 0.6.0 is now available on CRAN. The httr packages makes it easy to talk to web APIs from R. Learn more in the quick start vignette.

This release is mostly bug fixes and minor improvements. The most important are:

  • handle_reset(), which allows you to reset the default handle if you get the error “easy handle already used in multi handle”.
  • write_stream() which lets you process the response from a server as a stream of raw vectors (#143).
  • VERB() allows to you send a request with a custom http verb.
  • brew_dr() checks for common problems. It currently checks if your libcurl uses NSS. This is unlikely to work so it gives you some advice on how to fix the problem (thanks to Dirk Eddelbuettel for debugging this problem and suggesting a remedy).
  • Added support for Google OAuth2 service accounts. (#119, thanks to help from @siddharthab). See ?oauth_service_token for details.

I’ve also switched from RC to R6 (which should make it easier to extend OAuth classes for non-standard OAuth implementations), and tweaked the use of the backend SSL certificate details bundled with httr. See the release notes for complete details.

tidyr 0.2.0 is now available on CRAN. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has variables in columns and observations in rows, and is described in more detail in the tidy data vignette. Install tidyr with:

install.packages("tidyr")

There are three important additions to tidyr 0.2.0:

  • expand() is a wrapper around expand.grid() that allows you to generate all possible combinations of two or more variables. In conjunction with dplyr::left_join(), this makes it easy to fill in missing rows of data.
    sales <- dplyr::data_frame(
      year = rep(c(2012, 2013), c(4, 2)),
      quarter = c(1, 2, 3, 4, 2, 3), 
      sales = sample(6) * 100
    )
    
    # Missing sales data for 2013 Q1 & Q4
    sales
    #> Source: local data frame [6 x 3]
    #> 
    #>   year quarter sales
    #> 1 2012       1   400
    #> 2 2012       2   200
    #> 3 2012       3   500
    #> 4 2012       4   600
    #> 5 2013       2   300
    #> 6 2013       3   100
    
    # Missing values are now explicit
    sales %>% 
      expand(year, quarter) %>%
      dplyr::left_join(sales)
    #> Joining by: c("year", "quarter")
    #> Source: local data frame [8 x 3]
    #> 
    #>   year quarter sales
    #> 1 2012       1   400
    #> 2 2012       2   200
    #> 3 2012       3   500
    #> 4 2012       4   600
    #> 5 2013       1    NA
    #> 6 2013       2   300
    #> 7 2013       3   100
    #> 8 2013       4    NA
  • In the process of data tidying, it’s sometimes useful to have a column of a data frame that is a list of vectors. unnest() lets you simplify that column back down to an atomic vector, duplicating the original rows as needed. (NB: If you’re working with data frames containing lists, I highly recommend using dplyr’s tbl_df, which will display list-columns in a way that makes their structure more clear. Use dplyr::data_frame() to create a data frame wrapped with the tbl_df class.)
    raw <- dplyr::data_frame(
      x = 1:3,
      y = c("a", "d,e,f", "g,h")
    )
    # y is character vector containing comma separated strings
    raw
    #> Source: local data frame [3 x 2]
    #> 
    #>   x     y
    #> 1 1     a
    #> 2 2 d,e,f
    #> 3 3   g,h
    
    # y is a list of character vectors
    as_list <- raw %>% mutate(y = strsplit(y, ","))
    as_list
    #> Source: local data frame [3 x 2]
    #> 
    #>   x        y
    #> 1 1 <chr[1]>
    #> 2 2 <chr[3]>
    #> 3 3 <chr[2]>
    
    # y is a character vector; rows are duplicated as needed
    as_list %>% unnest(y)
    #> Source: local data frame [6 x 2]
    #> 
    #>   x y
    #> 1 1 a
    #> 2 2 d
    #> 3 2 e
    #> 4 2 f
    #> 5 3 g
    #> 6 3 h
  • separate() has a new extra argument that allows you to control what happens if a column doesn’t always split into the same number of pieces.
    raw %>% separate(y, c("trt", "B"), ",")
    #> Error: Values not split into 2 pieces at 1, 2
    raw %>% separate(y, c("trt", "B"), ",", extra = "drop")
    #> Source: local data frame [3 x 3]
    #> 
    #>   x trt  B
    #> 1 1   a NA
    #> 2 2   d  e
    #> 3 3   g  h
    raw %>% separate(y, c("trt", "B"), ",", extra = "merge")
    #> Source: local data frame [3 x 3]
    #> 
    #>   x trt   B
    #> 1 1   a  NA
    #> 2 2   d e,f
    #> 3 3   g   h

To read about the other minor changes and bug fixes, please consult the release notes.

reshape2 1.4.1

There’s also a new version of reshape2, 1.4.1. It includes three bug fixes for melt.data.frame() contributed by Kevin Ushey. Read all about them on the release notes and install it with:

install.packages("reshape2")

(Posted on behalf of Stefan Milton Bache)

Sometimes it’s the small things that make a big difference. For me, the introduction of our awkward looking friend, %>%, was one such little thing. I’d never suspected that it would have such an impact on the way quite a few people think and write R (including my own), or that pies would be baked (see here) and t-shirts printed (e.g. here) in honor of the successful three-char-long and slightly overweight operator. Of course a big part of the success is the very fruitful relationship with dplyr and its powerful verbs.

Quite some time went by without any changes to the CRAN version of magrittr. But many ideas have been evaluated and tested, and now we are happy to finally bring an update which brings both some optimization and a few nifty features — we hope that we have managed to strike a balance between simplicity and usefulness and that you will benefit from this update. You can install it now with:

install.packages("magrittr")

The underlying evaluation model is more coherent in this release; this makes the new features more natural extensions and improves performance somewhat. Below I’ll recap some of the important new features, which include functional sequences, a few specialized supplementary operators and better lambda syntax.

Functional sequences

The basic (pseudo) usage of the pipe operator goes something like this:

awesome_data <-
  raw_interesting_data %>%
  transform(somehow) %>%
  filter(the_good_parts) %>%
  finalize

This statement has three parts: an input, an output, and a sequence transformations. That’s suprisingly close to the definition of a function, so in magrittr is really just a convenient way of of defining and applying a function.
A new really useful feature of magrittr 1.5 makes that explicit: you can use %>% to not only produce values but also to produce functions (or functional sequences)! It’s really all the same, except sometimes the function is applied instantly and produces a result, and sometimes it is not, in which case the function itself is returned. In this case, there is no initial value, so we replace that with the dot placeholder. Here is how:

mae <- . %>% abs %>% mean(na.rm = TRUE)
mae(rnorm(10))
#> [1] 0.5605

That’s equivalent to:

mae <- function(x) {
  mean(abs(x), na.rm = TRUE)
}

Even for a short function, this is more compact, and is easier to read as it is defined linearly from left to right.
There are some really cool use cases for this: functionals! Consider how clean it is to pass a function to lapply or aggregate!

info <-
  files %>%
  lapply(. %>% read_file %>% extract(the_goodies))

Functions made this way can be indexed with [ to get a new function containing only a subset of the steps.

Lambda expressions

The new version makes it clearer that each step is really just a single-statement body of a unary function. What if we need a little more than one command to make a satisfactory “step” in a chain? Before, one might either define a function outside the chain, or even anonymously inside the chain, enclosing the entire definition in parentheses. Now extending that one command is like extending a standard one-command function: enclose whatever you’d like in braces, and that’s it:

value %>%
  foo %>% {
    x <- bar(.)
    y <- baz(.)
    x * y
  } %>%
  and_whatever

As usual, the name of the argument to that unary function is ..

Nested function calls

In this release the dot (.) will work also in nested function calls on the right-hand side, e.g.:

1:5 %>% 
  paste(letters[.])
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"

When you use . inside a function call, it’s used in addition to, not instead of, . at the top-level. For example, the previous command is equivalent to:

1:5 %>% 
  paste(., letters[.])
#> [1] "1 a" "2 b" "3 c" "4 d" "5 e"

If you don’t want this behaviour, wrap the function call in {:

1:5 %>% {
  paste(letters[.])
}
#> [1] "a" "b" "c" "d" "e"

A few of %>%’s friends

We also introduce a few operators. These are supplementary operators that just make some situations more comfortable.
The tee operator, %T>%, enables temporary branching in a pipeline to apply a few side-effect commands to the current value, like plotting or logging, and is inspired by the Unix tee command. The only difference to %>% is that %T>% returns the left-hand side rather than the result of applying the right-hand side:

value %>%
  transform %T>%
  plot %>%
  transform(even_more)

This is a shortcut for:

value %>%
  transform %>%
  { plot(.); . } %>%
  transform(even_more)

because plot() doesn’t normally return anything that can be piped along!
The exposition operator, %$%, is a wrapper around with(),
which makes it easy to refer to the variables inside a data frame:

mtcars %$%
  plot(mpg, wt)

Finally, we also have %<>%, the compound assignment pipe operator. This must be the first operator in the chain, and it will assign the result of the pipeline to the left-hand side name or expression. It’s purpose is to shorten expressions like this:

data$some_variable <-
  data$some_variable %>%
  transform

and turn them into something like this:

data$some_variable %<>%
  transform

Even a small example like x %<>% sort has its appeal!
In summary there is a few new things to get to know; but magrittr is like it always was. Just a little coolr!

rvest is new package that makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces. Install it with:

install.packages("rvest")

rvest in action

To see rvest in action, imagine we’d like to scrape some information about The Lego Movie from IMDB. We start by downloading and parsing the file with html():

library(rvest)
lego_movie <- html("http://www.imdb.com/title/tt1490017/")

To extract the rating, we start with selectorgadget to figure out which css selector matches the data we want: strong span. (If you haven’t heard of selectorgadget, make sure to read vignette("selectorgadget") – it’s the easiest way to determine which selector extracts the data that you’re interested in.) We use html_node() to find the first node that matches that selector, extract its contents with html_text(), and convert it to numeric with as.numeric():

lego_movie %>% 
  html_node("strong span") %>%
  html_text() %>%
  as.numeric()
#> [1] 7.9

We use a similar process to extract the cast, using html_nodes() to find all nodes that match the selector:

lego_movie %>%
  html_nodes("#titleCast .itemprop span") %>%
  html_text()
#>  [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"    
#>  [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
#>  [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson" 
#> [10] "Will Ferrell"    "Will Forte"      "Dave Franco"    
#> [13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"

The titles and authors of recent message board postings are stored in a the third table on the page. We can use html_node() and [[ to find it, then coerce it to a data frame with html_table():

lego_movie %>%
  html_nodes("table") %>%
  .[[3]] %>%
  html_table()
#>                                              X 1            NA
#> 1 this movie is very very deep and philosophical   mrdoctor524
#> 2 This got an 8.0 and Wizard of Oz got an 8.1...  marr-justinm
#> 3                         Discouraging Building?       Laestig
#> 4                              LEGO - the plural      neil-476
#> 5                                 Academy Awards   browncoatjw
#> 6                    what was the funniest part? actionjacksin

Other important functions

  • If you prefer, you can use xpath selectors instead of css: html_nodes(doc, xpath = "//table//td")).

  • Extract the tag names with html_tag(), text with html_text(), a single attribute with html_attr() or all attributes with html_attrs().

  • Detect and repair text encoding problems with guess_encoding() and repair_encoding().

  • Navigate around a website as if you’re in a browser with html_session(), jump_to(), follow_link(), back(), and forward(). Extract, modify and submit forms with html_form(), set_values() and submit_form(). (This is still a work in progress, so I’d love your feedback.)

To see these functions in action, check out package demos with demo(package = "rvest").

I’m very pleased to announce a new version of RSQLite 1.0.0. RSQLite is the easiest way to use SQL database from R:

library(DBI)
# Create an ephemeral in-memory RSQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Copy in the buit-in mtcars data frame
dbWriteTable(con, "mtcars", mtcars, row.names = FALSE)
#> [1] TRUE

# Fetch all results from a query:
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4 AND mpg < 23")
dbFetch(res)
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 3 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 4 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
dbClearResult(res)
#> [1] TRUE

# Or fetch them a chunk at a time
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
while(!dbHasCompleted(res)){
  chunk <- dbFetch(res, n = 10)
  print(nrow(chunk))
}
#> [1] 10
#> [1] 1
dbClearResult(res)
#> [1] TRUE

# Good practice to disconnect from the database when you're done
dbDisconnect(con)
#> [1] TRUE

RSQLite 1.0.0 is mostly a cleanup release. This means a lot of old functions have been deprecated and removed:

  • idIsValid() is deprecated; use dbIsValid() instead. dbBeginTransaction() is deprecated; use dbBegin() instead. Use dbFetch() instead of fetch().
  • dbBuildTableDefinition() is now sqliteBuildTableDefinition() (to avoid implying that it’s a DBI generic).
  • Internal sqlite*() functions are no longer exported (#20). safe.write() is no longer exported.

It also includes a few minor improvements and bug fixes. The most important are:

  • Inlined RSQLite.extfuns – use initExtension() to load the many useful extension functions.
  • Methods no longer automatically clone the connection is there is an open result set. This was implemented inconsistently in a handful of places. RSQLite is now more forgiving if you forget to close a result set – it will close it for you, with a warning. It’s still good practice to clean up after yourself with dbClearResults(), but you don’t have to.
  • dbBegin(), dbCommit() and dbRollback() throw errors on failure, rather than returning FALSE. They all gain a name argument to specify named savepoints.
  • dbWriteTable() has been rewritten. It uses a better quoting strategy, throws errors on failure, and only automatically adds row names only if they’re strings. (NB: dbWriteTable() also has a method that allows you load files directly from disk.)

For a complete list of changes, please see the full release notes.

I’m very pleased to announce that dplyr 0.3 is now available from CRAN. Get the latest version by running:

install.packages("dplyr")

There are four major new features:

  • Four new high-level verbs: distinct(), slice(), rename(), and transmute().
  • Three new helper functions between, count(), and data_frame().
  • More flexible join specifications.
  • Support for row-based set operations.

There are two new features of interest to developers. They make it easier to write packages that use dplyr:

  • It’s now much easier to program with dplyr (using standard evaluation).
  • Improved database backends.

I describe each of these in turn below.

New verbs

distinct() returns distinct (unique) rows of a table:

library(nycflights13)
# Find all origin-destination pairs
flights %>% 
  select(origin, dest) %>%
  distinct()
#> Source: local data frame [224 x 2]
#> 
#>    origin dest
#> 1     EWR  IAH
#> 2     LGA  IAH
#> 3     JFK  MIA
#> 4     JFK  BQN
#> 5     LGA  ATL
#> ..    ...  ...

slice() allows you to select rows by position. It includes positive integers and drops negative integers:

# Get the first flight to each destination
flights %>% 
  group_by(dest) %>%
  slice(1)
#> Source: local data frame [105 x 16]
#> Groups: dest
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013    10   1     1955        -6     2213       -35      B6  N554JB
#> 2  2013    10   1     1149       -10     1245       -14      B6  N346JB
#> 3  2013     1   1     1315        -2     1413       -10      EV  N13538
#> 4  2013     7   6     1629        14     1954         1      UA  N587UA
#> 5  2013     1   1      554        -6      812       -25      DL  N668DN
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

transmute() and rename() are variants of mutate() and select(). Transmute drops all columns that you didn’t specifically mention, rename() keeps all columns that you didn’t specifically mention. They complete this table:

Drop others Keep others
Rename & reorder variables select() rename()
Compute new variables transmute() mutate()

New helpers

data_frame(), contributed by Kevin Ushey, is a nice way to create data frames:

  • It never changes the type of its inputs (i.e. no more stringsAsFactors = FALSE!)
    data.frame(x = letters) %>% sapply(class)
    #>        x 
    #> "factor"
    data_frame(x = letters) %>% sapply(class)
    #>           x 
    #> "character"
  • Or the names of variables:
    data.frame(`crazy name` = 1) %>% names()
    #> [1] "crazy.name"
    data_frame(`crazy name` = 1) %>% names()
    #> [1] "crazy name"
  • It evaluates its arguments lazyily and in order:
    data_frame(x = 1:5, y = x ^ 2)
    #> Source: local data frame [5 x 2]
    #> 
    #>   x  y
    #> 1 1  1
    #> 2 2  4
    #> 3 3  9
    #> 4 4 16
    #> 5 5 25
  • It adds tbl_df() class to output, never adds row.names(), and only recycles vectors of length 1 (recycling is a frequent source of bugs in my experience).

The count() function wraps up the common combination of group_by() and summarise():

# How many flights to each destination?
flights %>% count(dest)
#> Source: local data frame [105 x 2]
#> 
#>    dest     n
#> 1   ABQ   254
#> 2   ACK   265
#> 3   ALB   439
#> 4   ANC     8
#> 5   ATL 17215
#> ..  ...   ...

# Which planes flew the most?
flights %>% count(tailnum, sort = TRUE)
#> Source: local data frame [4,044 x 2]
#> 
#>    tailnum    n
#> 1          2512
#> 2   N725MQ  575
#> 3   N722MQ  513
#> 4   N723MQ  507
#> 5   N711MQ  486
#> ..     ...  ...

# What's the total carrying capacity of planes by year of purchase
planes %>% count(year, wt = seats)
#> Source: local data frame [47 x 2]
#> 
#>    year   n
#> 1  1956 102
#> 2  1959  18
#> 3  1963  10
#> 4  1965 149
#> 5  1967   9
#> ..  ... ...

Better joins

You can now join by different variables in each table:

narrow <- flights %>% select(origin, dest, year:day)

# Add destination airport metadata
narrow %>% left_join(airports, c("dest" = "faa"))
#> Source: local data frame [336,776 x 11]
#> 
#>    dest origin year month day                            name   lat    lon
#> 1   IAH    EWR 2013     1   1    George Bush Intercontinental 29.98 -95.34
#> 2   IAH    LGA 2013     1   1    George Bush Intercontinental 29.98 -95.34
#> 3   MIA    JFK 2013     1   1                      Miami Intl 25.79 -80.29
#> 4   BQN    JFK 2013     1   1                              NA    NA     NA
#> 5   ATL    LGA 2013     1   1 Hartsfield Jackson Atlanta Intl 33.64 -84.43
#> ..  ...    ...  ...   ... ...                             ...   ...    ...
#> Variables not shown: alt (int), tz (dbl), dst (chr)

# Add origin airport metadata
narrow %>% left_join(airports, c("origin" = "faa"))
#> Source: local data frame [336,776 x 11]
#> 
#>    origin dest year month day                name   lat    lon alt tz dst
#> 1     EWR  IAH 2013     1   1 Newark Liberty Intl 40.69 -74.17  18 -5   A
#> 2     LGA  IAH 2013     1   1          La Guardia 40.78 -73.87  22 -5   A
#> 3     JFK  MIA 2013     1   1 John F Kennedy Intl 40.64 -73.78  13 -5   A
#> 4     JFK  BQN 2013     1   1 John F Kennedy Intl 40.64 -73.78  13 -5   A
#> 5     LGA  ATL 2013     1   1          La Guardia 40.78 -73.87  22 -5   A
#> ..    ...  ...  ...   ... ...                 ...   ...    ... ... .. ...

(right_join() and outer_join() implementations are planned for dplyr 0.4.)

Set operations

You can use intersect(), union() and setdiff() with data frames, data tables and databases:

jfk_planes <- flights %>% 
  filter(origin == "JFK") %>% 
  select(tailnum) %>% 
  distinct()
lga_planes <- flights %>% 
  filter(origin == "LGA") %>% 
  select(tailnum) %>% 
  distinct()

# Planes that fly out of either JGK or LGA
nrow(union(jfk_planes, lga_planes))
#> [1] 3592

# Planes that fly out of both JFK and LGA
nrow(intersect(jfk_planes, lga_planes))
#> [1] 1311

# Planes that fly out JGK but not LGA
nrow(setdiff(jfk_planes, lga_planes))
#> [1] 647

Programming with dplyr

You can now program with dplyr – every function that uses non-standard evaluation (NSE) also has a standard evaluation (SE) twin that ends in _. For example, the SE version of filter() is called filter_(). The SE version of each function has similar arguments, but they must be explicitly “quoted”. Usually the best way to do this is to use ~:

airport <- "ANC"
# NSE version
filter(flights, dest == airport)
#> Source: local data frame [8 x 16]
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     7   6     1629        14     1954         1      UA  N587UA
#> 2  2013     7  13     1618         3     1955         2      UA  N572UA
#> 3  2013     7  20     1618         3     2003        10      UA  N567UA
#> 4  2013     7  27     1617         2     1906       -47      UA  N559UA
#> 5  2013     8   3     1615         0     2003        10      UA  N572UA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

# Equivalent SE code:
criteria <- ~dest == airport
filter_(flights, criteria)
#> Source: local data frame [8 x 16]
#> 
#>    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> 1  2013     7   6     1629        14     1954         1      UA  N587UA
#> 2  2013     7  13     1618         3     1955         2      UA  N572UA
#> 3  2013     7  20     1618         3     2003        10      UA  N567UA
#> 4  2013     7  27     1617         2     1906       -47      UA  N559UA
#> 5  2013     8   3     1615         0     2003        10      UA  N572UA
#> ..  ...   ... ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)

To learn more, read the Non-standard evaluation vignette. This new approach is powered by the lazyeval package which provides all the tools needed to implement NSE consistently and correctly. I now understand how to implement NSE consistently and correctly, and I’ll be using the same approach everywhere.

Database backends

The database backend system has been completely overhauled in order to make it possible to add backends in other packages, and to support a much wider range of databases. If you’re interested in implementing a new dplyr backend, please check out vignette("new-sql-backend") – it’s really not that much work.

The first package to take advantage of this system is MonetDB.R, which now provides the MonetDB backend for dplyr.

Other changes

As well as the big new features described here, dplyr 0.3 also fixes many bugs and makes numerous minor improvements. See the release notes for a complete list of the changes.

Devtools 1.6 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. Learn more at http://r-pkgs.had.co.nz/. You can get the latest version with:

install.packages("devtools")

We’ve made a lot of improvements to the install and release process:

  • Installation functions now default to build_vignettes = FALSE, and only install required dependencies (not suggested). They also store a lot of useful metadata.
  • install_github() got a lot of love. install_github("user/repo") is now the preferred way to install a package from github (older forms with explicit username parameter are now deprecated). You can supply the host argument to install packages from a local github enterprise installation. You can get the latest release with user/repo@*release.
  • session_info() uses package installation metdata to show you exactly how every package was installed (locally, from CRAN, from github, …)
  • release() uses new webform-based submission process for CRAN, as implemented in submit_cran().
  • You can add arbitrary extra questions to release() by defining a function release_questions() in your package. It should return a character vector of questions to ask.

We’ve also added a number of functions to make it easy to get started with various aspects of the package development:

  • use_data() adds data to a package, either in data/ (external data) or in R/sysdata.rda (internal data). use_data_raw() sets up data-raw/ for your reproducible data generation scripts.
  • use_package() sets dependencies and reminds you how to use them.
  • use_rcpp() gets you ready to use Rcpp.
  • use_testthat() sets up testing infrastructure with testthat.
  • use_travis() adds a .travis.yml file and tells you how to get started with travis ci.
  • use_vignette() creates a draft vignette using Rmarkdown.

There were many other minor improvements and bug fixes. See the release notes for complete list of changes.

testthat 0.9 is now available on CRAN. Testthat makes it easy to turn the informal testing that you’re already doing into formal automated tests. Learn more at http://r-pkgs.had.co.nz/tests.html

This version of testthat has four important new features that bring testthat up to speed with unit testing frameworks in other languages:

  • You can skip() tests with an informative message, if their prerequisites are not available. This is particularly use for CRAN packages, since tests only have a limited amount of time to run. Use skip_on_cran() skip selected tests when run on CRAN.
    test_that("a complicated simulation takes a long time", {
      skip_on_cran()
    
      ...
    })
  • Experiment with behaviour driven development with the new describe() function contributed by Dirk Schumacher:
    describe("matrix()", {
      it("can be multiplied by a scalar", {
        m1 <- matrix(1:4, 2, 2)
        m2 <- m1 * 2
        expect_equivalent(matrix(1:4 * 2, 2, 2), m2)
      })
    })
  • Use with_mock() to “mock” functions, replacing slow, resource intensive or inconsistent functions with your own quick approximations. This is particularly useful when you want to test functions that call web APIs without being connected to the internet. Contributed by Kirill Müller.
  • Sometimes it’s difficult to figure out exactly what a function should return and instead you just want to make sure that it returned the same thing as the last time you ran it. A new expectation, expect_equal_to_reference(), makes this easy to do. Contributed by Jon Clayden.

Other changes of note: auto_test_package() is working again (and uses devtools::load_all() to load the code), random praise has been re-enabled (after being accidentally disabled), and expect_identical() works better with R-devel. See the release notes for complete list of changes.

httr 0.5 is now available on CRAN. The httr packages makes it easy to talk to web APIs from R. Learn more in the quick start vignette.

This release is mostly bug fixes and minor improvements, but there is one major new feature: you can now save response bodies directly to disk.

library(httr)
# Download the latest version of rstudio for windows
url <- "http://download1.rstudio.org/RStudio-0.98.1049.exe"
GET(url, write_disk(basename(url)), progress())

There is also some preliminary support for HTTP caching (see cache_info() and rerequest()). See the release notes for complete details.

httr 0.4 is now available on CRAN. The httr packages makes it easy to talk to web APIs from R.

The most important new features are two new vignettes to help you get started and to help you make wrappers for web APIs. Other important improvements include:

  • New headers() and cookies() functions to extract headers and cookies from responses. status_code() returns HTTP status codes.
  • POST() (and PUT(), and PATCH()) now have an encode argument that determine how the body is encoded. Valid values are “multipart”, “form” or “json”, and the multipart argument is now deprecated.
  • GET(..., progress()) will display a progress bar, useful if you’re doing large uploads or downloads.
  • verbose() gives you considerably more control over degree of verbosity, and defaults have been selected to be more helpful for the most common cases.
  • NULL query parameters are now dropped automatically.

There are number of other minor improvements and bug fixes, as described by the release notes.

Follow

Get every new post delivered to your Inbox.

Join 12,259 other followers