You are currently browsing Jim Hester’s articles.

Today we are pleased to release version 1.1.1 of xml2. xml2 makes it easy to read, create, and modify XML with R. You can install it with:

install.packages("xml2")

As well as fixing many bugs, this release:

  • Makes it easier to create and modify XML
  • Improves roundtrip support between XML and lists
  • Adds support for XML validation and XSLT transformations.

You can see a full list of changes in the release notes. This is the first release maintained by Jim Hester.

Creating and modifying XML

xml2 has been overhauled with a set of methods to make generating and modifying XML easier:

  • xml_new_root() can be used to create a new document and root node simultaneously.
    xml_new_root("x") %>%
      xml_add_child("y") %>%
      xml_root()
    #> {xml_document}
    #> <x>
    #> [1] <y/>
  • New xml_set_text(), xml_set_name(), xml_set_attr(), and xml_set_attrs() make it easy to modify nodes within a pipeline.
    x <- read_xml("<a>
        <b />
        <c><b/></c>
      </a>")
    x
    #> {xml_document}
    #> <a>
    #> [1] <b/>
    #> [2] <c>\n  <b/>\n</c>
    
    x %>% 
      xml_find_all(".//b") %>% 
      xml_set_name("banana") %>% 
      xml_set_attr("oldname", "b")
    x
    #> {xml_document}
    #> <a>
    #> [1] <banana oldname="b"/>
    #> [2] <c>\n  <banana oldname="b"/>\n</c>
  • New xml_add_parent() makes it easy to insert a node as the parent of an existing node.

  • You can create more esoteric node types with xml_comment() (comments), xml_cdata() (CDATA nodes), and xml_dtd() (DTDs).
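
As a rough sketch of the last two points, the following wraps an existing node in a new parent and then appends a comment and a CDATA node to the root. The element names (`a`, `b`, `wrapper`) and the node contents are made up for illustration.

```r
library(xml2)

# A small document to modify; the element names are illustrative only.
doc <- read_xml("<a><b/></a>")

# Insert a new <wrapper> element as the parent of the existing <b> node.
b <- xml_find_first(doc, ".//b")
xml_add_parent(b, "wrapper")

# Append a comment and a CDATA node as children of the root.
xml_add_child(doc, xml_comment("added by xml_comment()"))
xml_add_child(doc, xml_cdata("<not parsed>"))

# Inspect the serialized result.
cat(as.character(doc))
```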

Coercion to and from R Lists

xml2 1.1.1 improves support for converting to and from R lists, thanks in part to work by Peter Foley and Jenny Bryan. In particular, xml2 now supports preserving the root node name as well as saving all XML attributes as R attributes. These changes allow you to convert most XML documents to and from R lists with as_list() and as_xml_document() without loss of data.

x <- read_xml("<fruits><apple color = 'red' /></fruits>")
x
#> {xml_document}
#> <fruits>
#> [1] <apple color="red"/>
as_list(x)
#> $apple
#> list()
#> attr(,"color")
#> [1] "red"
as_xml_document(as_list(x))
#> {xml_document}
#> <apple color="red">

XML validation and xslt

xml2 1.1.1 also adds support for XML validation, thanks to Jeroen Ooms. Simply read the document and schema files and call xml_validate().

doc <- read_xml(system.file("extdata/order-doc.xml", package = "xml2"))
schema <- read_xml(system.file("extdata/order-schema.xml", package = "xml2"))
xml_validate(doc, schema)
#> [1] TRUE
#> attr(,"errors")
#> character(0)
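
When validation fails, the "errors" attribute carries the validator's messages. The sketch below illustrates this with a tiny hypothetical schema and two made-up documents written inline, rather than the example files shipped with the package.

```r
library(xml2)

# A tiny hypothetical XSD: an <order> must contain exactly one <item>.
schema <- read_xml('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>')

# One document that follows the schema and one that does not.
good <- read_xml("<order><item>apple</item></order>")
bad  <- read_xml("<order><price>1</price></order>")

ok_good <- xml_validate(good, schema)
ok_bad  <- xml_validate(bad, schema)

# as.logical() drops the "errors" attribute and leaves the plain result.
as.logical(ok_good)
as.logical(ok_bad)

# The validator's messages for the failing document.
attr(ok_bad, "errors")
```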

Jeroen also released the first xml2 extension package in conjunction with xml2 1.1.1, xslt. xslt allows one to apply XSLT (Extensible Stylesheet Language Transformations) to XML documents, which are great for transforming XML data into other formats such as HTML.

Support for building R projects on Travis has recently undergone improvements which we hope will make it an even better tool for the R community. Feature highlights include:

  • Support for Travis’ container-based infrastructure.
  • Package dependency caching (on the container-based builds).
  • Building with multiple R versions (R-devel, R-release (3.2.3) and R-oldrel (3.1.3)).
  • Log filtering to improve readability and hide less relevant information.
  • Updated dependencies: TeX Live (2015) and pandoc (1.15.2).

See the Travis documentation on building an R project for complete details on the available options.

Using the container-based infrastructure with package caching is now recommended for nearly all projects. There are more compute and network resources available for container-based builds, which means they start processing in less time and run faster. With package caching, package installation is comparable to or faster than using binary packages.

A minimal .travis.yml file that is suitable for most cases is

language: r
sudo: false
cache: packages

New packages can omit sudo: false, as it is the default for new repositories. However, older repositories will have to explicitly set sudo: false to use the container-based infrastructure.

If your package depends on development packages that are not on CRAN (such as packages hosted on GitHub), we recommend using the Remotes: field in your package DESCRIPTION file. This allows your package and its dependencies to be installed easily by devtools::install_github() as well as on Travis. It is generally no longer necessary to use r_github_packages, r_packages, r_binary_packages, etc., as this can be handled with Remotes.

If you need system dependencies, first check to see if they’re available with the APT addon and include them in your .travis.yml. This will allow you to install them without sudo and still use the container-based infrastructure.

addons:
  apt:
    packages:
      - libv8-dev

We hope these improvements will make your use of Travis with R simple and useful. Please file any issues found at https://github.com/travis-ci/travis-ci/issues and mention @craigcitro, @hadley and @jimhester in the issue.

We are pleased to announce that version 1.0.0 of the memoise package is now available on CRAN. Memoization stores the value of a function call and returns the cached result when the function is called again with the same arguments.

The following function computes Fibonacci numbers and illustrates the usefulness of memoization. Because the function definition is recursive, the intermediate results can be looked up rather than recalculated at each level of recursion, which reduces the runtime drastically. The last time the memoised function is called the final result can simply be returned, so no measurable execution time is recorded.

library(memoise)

fib <- function(n) {
  if (n < 2) {
    return(n)
  } else {
    return(fib(n-1) + fib(n-2))
  }
}
system.time(x <- fib(30))
#>    user  system elapsed 
#>   4.454   0.010   4.472
fib <- memoise(fib)
system.time(y <- fib(30))
#>    user  system elapsed 
#>   0.004   0.000   0.004
system.time(z <- fib(30))
#>    user  system elapsed 
#>       0       0       0
all.equal(x, y)
#> [1] TRUE
all.equal(x, z)
#> [1] TRUE
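
To make the mechanism concrete, here is a minimal hand-rolled sketch of memoisation in base R. It is not how memoise is implemented: the helper `simple_memoise()` is hypothetical, handles only a single scalar argument, and uses the argument's character form as the cache key, whereas memoise is far more general.

```r
# Hand-rolled memoisation for single-argument functions (illustration only).
simple_memoise <- function(f) {
  # A private environment acts as the cache, keyed by the argument value.
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      # First call with this argument: compute and store the result.
      assign(key, f(x), envir = cache)
    }
    # Return the stored result.
    get(key, envir = cache, inherits = FALSE)
  }
}

# The recursive calls go through fib2 itself, so every intermediate
# result is computed once and then looked up.
fib2 <- simple_memoise(function(n) {
  if (n < 2) n else fib2(n - 1) + fib2(n - 2)
})

fib2(30)
```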

Memoization is also very useful for storing queries to external resources, such as network APIs and databases.
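
As a sketch of that use case, the example below counts how often the underlying function actually runs. `fetch_record()` is a hypothetical stand-in for a slow network or database query, with `Sys.sleep()` simulating the latency.

```r
library(memoise)

# Counts how many times the underlying function really runs.
calls <- 0

# Hypothetical stand-in for a slow network or database query.
fetch_record <- function(id) {
  calls <<- calls + 1
  Sys.sleep(0.1)
  paste("record", id)
}

mem_fetch <- memoise(fetch_record)

mem_fetch(1)  # runs fetch_record()
mem_fetch(1)  # same argument: served from the cache
mem_fetch(2)  # new argument: runs fetch_record() again
calls_before_forget <- calls

# forget() clears the cache, so the next call runs the function again.
forget(mem_fetch)
mem_fetch(1)
calls_after_forget <- calls
```

Repeated calls with the same argument return immediately without touching the slow resource until the cache is cleared.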

Improvements in this release make memoised functions much nicer to use interactively. Memoised functions now have a print method which outputs the original function definition rather than the memoization code.

mem_sum <- memoise(sum)
mem_sum
#> Memoised Function:
#> function (..., na.rm = FALSE)  .Primitive("sum")

Memoised functions now forward their arguments from the original function rather than simply passing them with .... This allows autocompletion to work transparently for memoised functions and also fixes a bug related to non-constant default arguments. [1]

mem_scan <- memoise(scan)
args(mem_scan)
#> function (file = "", what = double(), nmax = -1L, n = -1L, sep = "", 
#>     quote = if (identical(sep, "\n")) "" else "'\"", dec = ".", 
#>     skip = 0L, nlines = 0L, na.strings = "NA", flush = FALSE, 
#>     fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE, 
#>     multi.line = TRUE, comment.char = "", allowEscapes = FALSE, 
#>     fileEncoding = "", encoding = "unknown", text, skipNul = FALSE) 
#> NULL

Memoisation can now depend on external variables aside from the function arguments. This feature can be used in a variety of ways, such as invalidating the memoisation when a new package is attached.

mem_f <- memoise(runif, ~search())
mem_f(2)
#> [1] 0.009113091 0.988083122
mem_f(2)
#> [1] 0.009113091 0.988083122
library(ggplot2)
mem_f(2)
#> [1] 0.89150566 0.01128355

Or invalidating the memoisation after a given amount of time has elapsed. A timeout() helper function is provided to make this feature easier to use.

mem_f <- memoise(runif, ~timeout(10))
mem_f(2)
#> [1] 0.6935329 0.3584699
mem_f(2)
#> [1] 0.6935329 0.3584699
Sys.sleep(10)
mem_f(2)
#> [1] 0.2008418 0.4538413

Many thanks for this release go to Kirill Müller, who wrote the argument forwarding implementation and added comprehensive tests to the package. [2, 3]

See the release notes for a complete list of changes.