You are currently browsing the monthly archive for January 2016.
Shiny 0.13.0 is now available on CRAN! This release has some of the most exciting features we’ve shipped since the first version of Shiny. Highlights include:
- Shiny Gadgets
- HTML templates
- Shiny modules
- Error stack traces
- Checking for missing inputs
For a comprehensive list of changes, see the NEWS file.
To install the new version from CRAN, run:
install.packages("shiny")
Read on for details about these new features!
A common theme over the last few decades was that we could afford to simply sit back and let computer (hardware) engineers take care of increases in computing speed thanks to Moore’s law. That same line of thought now frequently points out that we are getting closer and closer to the physical limits of what Moore’s law can do for us.
So the new best hope is (and has been) parallel processing. Even our smartphones have multiple cores, and most if not all retail PCs now possess two, four or more cores. Real computers, aka somewhat decent servers, can be had with 24, 32 or more cores as well, and all that is before we even consider GPU coprocessors or other upcoming changes.
Sometimes our tasks are embarrassingly simple, as is the case with many data-parallel jobs: we can use higher-level operations such as those offered by the base R package parallel to spawn multiple processing tasks and gather the results. Dirk covered all this in some detail in previous talks on High Performance Computing with R (and you can also consult the CRAN Task View on High Performance Computing with R).
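As a rough illustration of such a data-parallel job, here is a minimal sketch using mclapply() from the base parallel package. The squaring function is a hypothetical stand-in for real per-element work, and because mclapply() relies on forking, the sketch falls back to a single core on Windows.

```r
library(parallel)

# Hypothetical stand-in for an expensive, independent per-element task
slow_square <- function(i) {
  i^2
}

# Forking is unavailable on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each element is processed independently; results come back as a list
results <- mclapply(1:8, slow_square, mc.cores = n_cores)
squares <- unlist(results)
squares
#> [1]  1  4  9 16 25 36 49 64
```

On Windows, makeCluster() together with parLapply() is the usual portable alternative to forking.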
But sometimes we cannot use data-parallel approaches, and then we have to redesign our algorithms, which is really hard. R itself has been relying on the (fairly mature) OpenMP standard for some of its operations. Luke Tierney’s keynote at the 2014 R/Finance conference mentioned some of the issues related to OpenMP, which works really well on Linux but currently not so well on other platforms. R is expected to make wider use of it in future versions once compiler support for OpenMP on Windows and OS X improves.
In the meantime, the RcppParallel package provides a complete toolkit for creating portable, high-performance parallel algorithms without requiring direct manipulation of operating system threads. RcppParallel includes:
- Intel Threading Building Blocks (v4.3), a C++ library for task parallelism with a wide variety of parallel algorithms and data structures (Windows, OS X, Linux, and Solaris x86 only).
- TinyThread, a C++ library for portable use of operating system threads.
- RVector and RMatrix wrapper classes for safe and convenient access to R data structures in a multi-threaded environment.
- High-level parallel functions (parallelFor and parallelReduce) that use Intel TBB as a back-end on systems that support it and TinyThread on other platforms.
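Roughly speaking, a reduction like parallelReduce splits the input range into pieces, accumulates each piece separately, and then joins the partial results. The sketch below mimics that split/accumulate/join pattern serially in plain R; sum_range(), join() and chunked_sum() are hypothetical names, not part of RcppParallel, whose real workers are written in C++ and run the pieces on separate threads.

```r
# Hypothetical "worker" pieces mirroring the split/accumulate/join idea
sum_range <- function(x, begin, end) sum(x[begin:end])  # accumulate one range
join <- function(a, b) a + b                            # combine two partial sums

chunked_sum <- function(x, n_chunks = 4) {
  # Split 1..length(x) into contiguous index ranges, one per would-be thread
  groups <- split(seq_along(x), cut(seq_along(x), n_chunks, labels = FALSE))
  # Accumulate each range independently (these calls could run in parallel)
  partials <- lapply(groups, function(idx) sum_range(x, min(idx), max(idx)))
  # Join the partial results into the final answer
  Reduce(join, partials)
}

chunked_sum(as.numeric(1:1000))
#> [1] 500500
```

Because addition is associative, the order in which partial sums are joined does not change the result, which is what makes this kind of reduction safe to parallelize.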
RcppParallel is available on CRAN now, and several packages including dbmss, gaston, markovchain, rPref, SpatPCA, StMoSim, and text2vec are already taking advantage of it (you can read more about the text2vec implementation here).
In addition, the Rcpp Gallery includes several pieces demonstrating the use of RcppParallel, including:
- A parallel matrix transformation
- A parallel vector summation
- A parallel inner product
- A parallel distance matrix calculation
All four are interesting and demonstrate different aspects of parallel computing via RcppParallel. But the last article is key: it shows how a particular matrix distance metric (which is missing from R) can be implemented serially, both in plain R and via Rcpp. The fastest implementation, however, uses both Rcpp and RcppParallel and thereby achieves a truly impressive speed gain, as the gains from using compiled code (via Rcpp) and from using a parallel algorithm (via RcppParallel) are multiplicative. On a couple of four-core machines the RcppParallel version was between 200 and 300 times faster than the R version.
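For orientation, a serial plain-R baseline of this kind might look like the sketch below. It assumes the metric is the Jensen-Shannon distance between the rows of a matrix whose rows are probability distributions; that choice, and the helper names kld() and js_distance(), are assumptions for illustration rather than the Gallery article’s code. The double loop over row pairs is the part that compiled and parallel versions target.

```r
# Kullback-Leibler divergence of p from q, skipping terms where p or q is 0
# (with q = midpoint of p and another row, q > 0 wherever p > 0)
kld <- function(p, q) {
  keep <- p > 0 & q > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Assumed metric: Jensen-Shannon distance between every pair of rows
js_distance <- function(mat) {
  n <- nrow(mat)
  res <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      m <- (mat[i, ] + mat[j, ]) / 2          # midpoint distribution
      d <- 0.5 * (kld(mat[i, ], m) + kld(mat[j, ], m))
      res[j, i] <- sqrt(d)
      res[i, j] <- res[j, i]                  # distance matrix is symmetric
    }
  }
  res
}

mat <- rbind(c(1, 0), c(0, 1), c(0.5, 0.5))
d <- js_distance(mat)
d
```

Each (i, j) pair is independent of every other pair, which is why the outer loop can be split across threads without changing the result.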
Exciting times for parallel programming in R! To learn more head over to the RcppParallel package and start playing.
I’m pleased to announce purrr 0.2.0. Purrr fills in the missing pieces in R’s functional programming tools, and is designed to make your pure (and now) type-stable functions purrr.
I’m still working out exactly what purrr should do, and how it compares to existing functions in base R, dplyr, and tidyr. One main insight that has affected much of the current version is that functions designed for programming should be type-stable. Type-stability is an idea brought to my attention by Julia. Even though functions in R and Julia can return different types of output, by and large, you should strive to make functions that always return the same type of data structure. This makes functions more robust to varying input, and makes them easier to reason about (and in Julia, to optimise). (But not every function can be type-stable – how could
Purrr 0.2.0 adds type-stable alternatives for maps, flattens, and try(), as described below. There were a lot of other minor improvements, bug fixes, and a number of deprecations. Please see the release notes for a complete list of changes.
Type-stable maps
A map is a function that calls another function on each element of a vector. Map functions in base R are the “applys”, such as lapply(), sapply(), and vapply().
lapply() is type-stable: no matter what the inputs are, the output is always a list.
sapply() is not type-stable: it can return different types of output depending on the input. The following code shows a simple (if somewhat contrived) example of sapply() returning either a vector, a matrix, or a list, depending on its inputs:
library(purrr)  # also provides the %>% pipe

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

df[1:4] %>% sapply(class) %>% str()
#> List of 4
#>  $ a: chr "integer"
#>  $ b: chr "numeric"
#>  $ y: chr [1:2] "POSIXct" "POSIXt"
#>  $ z: chr [1:2] "ordered" "factor"

df[1:2] %>% sapply(class) %>% str()
#>  Named chr [1:2] "integer" "numeric"
#>  - attr(*, "names")= chr [1:2] "a" "b"

df[3:4] %>% sapply(class) %>% str()
#>  chr [1:2, 1:2] "POSIXct" "POSIXt" "ordered" "factor"
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:2] "y" "z"
This behaviour makes sapply() appropriate for interactive use, since it usually guesses correctly and gives a useful data structure. It’s not appropriate for use in package or production code because if the input isn’t what you expect, it won’t fail, and will instead return an unexpected data structure. This typically causes an error further along the process, so you get a confusing error message and it’s difficult to isolate the root cause.
Base R has a type-stable version of sapply(): vapply(). It takes an additional argument that determines what the output will be. purrr takes a different approach. Instead of one function that does it all, purrr has multiple functions, one for each common type of output: map_lgl(), map_int(), map_dbl(), map_chr(), and map_df(). These either produce the specified type of output or throw an error. This forces you to deal with the problem right away:
df[1:4] %>% map_chr(class)
#> Error: Result 3 is not a length 1 atomic vector

df[1:4] %>% map_chr(~ paste(class(.), collapse = "/"))
#>                a                b                y                z 
#>        "integer"        "numeric" "POSIXct/POSIXt" "ordered/factor" 
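Base R’s vapply(), mentioned above, gives a similar guarantee through its template argument. A small sketch with the same data frame; the tryCatch() wrapper around the failing call is only there so the example runs to completion:

```r
df <- data.frame(a = 1L, b = 1.5, y = Sys.time(), z = ordered(1))

# The template character(1) says: every result must be a single string
classes <- vapply(df[1:2], class, character(1))
classes
#>         a         b 
#> "integer" "numeric" 

# Columns y and z have two classes each, so vapply() signals an error
# instead of quietly returning a matrix or a list as sapply() would
outcome <- tryCatch(
  vapply(df[1:4], class, character(1)),
  error = function(e) "error"
)
outcome
#> [1] "error"

# Collapsing multi-class results restores one string per column
joined <- vapply(df[1:4], function(col) paste(class(col), collapse = "/"), character(1))
```

The difference from purrr is mostly ergonomic: vapply() takes the output type as an argument, while purrr encodes it in the function name.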
Other variants of map() have similar suffixes. For example, map2() allows you to iterate over two vectors in parallel:
x <- list(1, 3, 5)
y <- list(2, 4, 6)
map2(x, y, c)
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 3 4
#> 
#> [[3]]
#> [1] 5 6
map2() always returns a list. If you want to add together the corresponding values and store the result as a double vector, you can use map2_dbl():
map2_dbl(x, y, `+`)
#> [1]  3  7 11
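To make the type-stable guarantee concrete, here is a hypothetical base-R sketch of a map2_dbl()-style helper built on vapply(). Unlike purrr’s function, this sketch simply requires both inputs to have the same length and does not accept the formula shorthand. Base R’s Map() is the closest built-in analogue of the list-returning map2().

```r
# Hypothetical type-stable pairwise map: exactly one double per pair, or an error
map2_dbl_sketch <- function(.x, .y, .f) {
  stopifnot(length(.x) == length(.y))
  vapply(seq_along(.x), function(i) .f(.x[[i]], .y[[i]]), numeric(1))
}

x <- list(1, 3, 5)
y <- list(2, 4, 6)
map2_dbl_sketch(x, y, `+`)
#> [1]  3  7 11

# Map() pairs elements the same way but always returns a list, like map2()
pairs <- Map(c, x, y)
```

Passing a function such as c to map2_dbl_sketch() would fail, because each call returns two values instead of one.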
Another map variant is invoke_map(), which takes a list of functions and a list of arguments. It also has type-stable suffixes:
spread <- list(sd = sd, iqr = IQR, mad = mad)
x <- rnorm(100)
invoke_map_dbl(spread, x = x)
#>        sd       iqr       mad 
#> 0.9121309 1.2515807 0.9774154 
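Without purrr, the same call-each-function pattern can be sketched with vapply(), which keeps the names from spread and insists on exactly one double per function. A fixed vector replaces rnorm() here so the values are predictable:

```r
spread <- list(sd = sd, iqr = IQR, mad = mad)
x <- c(1, 2, 3, 4, 100)

# Call every function on the same data; each must return a single double
stats <- vapply(spread, function(f) f(x), numeric(1))
stats
```

For this vector the IQR is 2 and the (scaled) MAD is 1.4826, while the standard deviation is inflated by the outlier, which is the usual reason for comparing these measures side by side.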
Another situation when type-stability is important is flattening a nested list into a simpler data structure. Base R has unlist(), but it’s dangerous because it always succeeds. As an alternative, purrr provides flatten() and type-specific variants such as flatten_int():
x <- list(1L, 2:3, 4L)
x %>% str()
#> List of 3
#>  $ : int 1
#>  $ : int [1:2] 2 3
#>  $ : int 4

x %>% flatten() %>% str()
#> List of 4
#>  $ : int 1
#>  $ : int 2
#>  $ : int 3
#>  $ : int 4

x %>% flatten_int() %>% str()
#>  int [1:4] 1 2 3 4
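The danger with unlist() is easy to demonstrate: given mixed types, it silently coerces everything to a common type. Below is a hypothetical flatten_int_sketch() that refuses to do that; it is a simplified stand-in for the idea behind flatten_int(), not purrr’s implementation.

```r
x <- list(1L, 2:3, 4L)
unlist(x)
#> [1] 1 2 3 4

# Mixed types: unlist() quietly coerces everything to character
mixed <- unlist(list(1L, "a", TRUE))
mixed
#> [1] "1"    "a"    "TRUE"

# Hypothetical type-stable flatten: fail loudly unless every element is integer
flatten_int_sketch <- function(.x) {
  is_int <- vapply(.x, is.integer, logical(1))
  if (!all(is_int)) {
    stop("Element ", which(!is_int)[1], " is not an integer vector", call. = FALSE)
  }
  unlist(.x, use.names = FALSE)
}

flatten_int_sketch(x)
#> [1] 1 2 3 4
```

Calling flatten_int_sketch(list(1L, "a")) stops with an error that names the offending element, rather than handing back a character vector.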
Another function in base R that is not type-stable is try(). It ensures that an expression always succeeds, either returning the original value or the error message:
str(try(log(10)))
#>  num 2.3

str(try(log("a"), silent = TRUE))
#> Class 'try-error'  atomic [1:1] Error in log("a") : non-numeric argument to mathematical function
#> 
#>   ..- attr(*, "condition")=List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language log("a")
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
safely() is a type-stable version of try(). It always returns a list of two elements, the result and the error, and one of them will always be NULL:
safely(log)(10)
#> $result
#> [1] 2.302585
#> 
#> $error
#> NULL

safely(log)("a")
#> $result
#> NULL
#> 
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>
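The idea behind safely() can be sketched in a few lines of base R with tryCatch(). safely_sketch() is a hypothetical, simplified function operator: it takes a function and returns a new function whose result always has the same two-element shape. It is not purrr’s actual implementation.

```r
# Hypothetical function operator: wrap .f so that calling it never throws
safely_sketch <- function(.f, otherwise = NULL) {
  function(...) {
    tryCatch(
      # Success: the value, plus an explicit NULL error
      list(result = .f(...), error = NULL),
      # Failure: the fallback value, plus the captured condition object
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

safe_log <- safely_sketch(log)
ok <- safe_log(10)    # result holds log(10), error is NULL
bad <- safe_log("a")  # result is NULL, error holds the condition
str(ok)
```

Returning the condition object rather than just its message keeps the call and class information available for later inspection.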
safely() takes a function as input and returns a “safe” function, a function that never throws an error. A powerful technique is to use safely() and map() together to attempt an operation on each element of a list:
safe_log <- safely(log)
x <- list(10, "a", 5)
log_x <- x %>% map(safe_log)

str(log_x)
#> List of 3
#>  $ :List of 2
#>   ..$ result: num 2.3
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>  $ :List of 2
#>   ..$ result: num 1.61
#>   ..$ error : NULL
This output is slightly inconvenient because you’d rather have a list of three results and another list of three errors. You can use the new transpose() function to switch the order of the first and second levels in the hierarchy:
log_x %>% transpose() %>% str()
#> List of 2
#>  $ result:List of 3
#>   ..$ : num 2.3
#>   ..$ : NULL
#>   ..$ : num 1.61
#>  $ error :List of 3
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>   ..$ : NULL
This makes it easy to extract the inputs where the original function failed, or to keep just the successful results:
results <- x %>% map(safe_log) %>% transpose()

(ok <- results$error %>% map_lgl(is_null))
#> [1]  TRUE FALSE  TRUE

(bad_inputs <- x %>% discard(ok))
#> [[1]]
#> [1] "a"

(successes <- results$result %>% keep(ok) %>% flatten_dbl())
#> [1] 2.302585 1.609438
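The transpose-then-filter pattern does not depend on purrr itself. Here is a base-R sketch: transpose_sketch() is a hypothetical, simplified two-level transpose that assumes every inner list has the same names as the first one, and safe_log() is a hand-rolled tryCatch() wrapper standing in for safely(log).

```r
# Hypothetical stand-in for safely(log)
safe_log <- function(x) {
  tryCatch(
    list(result = log(x), error = NULL),
    error = function(e) list(result = NULL, error = e)
  )
}

# Simplified transpose: a list of records becomes a record of lists
transpose_sketch <- function(.l) {
  fields <- names(.l[[1]])
  # lapply() keeps NULL entries, so positions stay aligned with the input
  out <- lapply(fields, function(f) lapply(.l, function(el) el[[f]]))
  names(out) <- fields
  out
}

x <- list(10, "a", 5)
results <- transpose_sketch(lapply(x, safe_log))

ok <- vapply(results$error, is.null, logical(1))
bad_inputs <- x[!ok]
successes <- unlist(results$result[ok])
successes
#> [1] 2.302585 1.609438
```

Keeping the NULL placeholders is what lets the logical vector ok line up with the original inputs.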