
Leaflet 1.1.0 is now available on CRAN! The Leaflet package is a tidy wrapper for the Leaflet.js mapping library, and makes it incredibly easy to generate interactive maps based on spatial data you have in R.


This release was nearly a year in the making, and includes many important new features.

  • Easily add textual labels on markers, polygons, etc., either on hover or statically
  • Highlight polygons, lines, circles, and rectangles on hover
  • Markers can now be configured with a variety of colors and icons, via integration with Leaflet.awesome-markers
  • Built-in support for many types of objects from sf, a new way of representing spatial data in R (all basic sf/sfc/sfg types except MULTIPOINT and GEOMETRYCOLLECTION are directly supported)
  • Projections other than Web Mercator are now supported via Proj4Leaflet
  • Color palette functions now natively support viridis palettes; use "viridis", "magma", "inferno", or "plasma" as the palette argument
  • Discrete color palette functions (colorBin, colorQuantile, and colorFactor) work much better with color brewer palettes
  • Integration with several Leaflet.js utility plugins
  • Data with NA points or zero rows no longer causes errors
  • Support for linked brushing and filtering, via Crosstalk (more about this to come in another blog post)
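As a rough sketch of how several of these features combine, the snippet below builds a small map with hover labels, hover highlighting, a viridis palette and an awesome-markers icon. The three points and their values are made up for illustration. The styling choices (circle radius, colors, the "info-sign" glyphicon) are arbitrary assumptions, not recommended defaults.

```r
library(leaflet)

# Hypothetical sample data: three points with a numeric value to colour by
pts <- data.frame(
  name  = c("San Francisco", "Oakland", "San Jose"),
  lng   = c(-122.42, -122.27, -121.89),
  lat   = c(37.77, 37.80, 37.34),
  value = c(10, 25, 40)
)

# Continuous palette built from one of the natively supported viridis palettes
pal <- colorNumeric("viridis", domain = pts$value)

m <- leaflet(pts) %>%
  addTiles() %>%
  # Circles coloured by value, labelled on hover and highlighted on hover
  addCircles(
    lng = ~lng, lat = ~lat, radius = 5000,
    color = ~pal(value), fillOpacity = 0.7,
    label = ~name,
    highlightOptions = highlightOptions(weight = 5, color = "white",
                                        bringToFront = TRUE)
  ) %>%
  # Markers styled through the Leaflet.awesome-markers integration
  addAwesomeMarkers(
    lng = ~lng, lat = ~lat,
    icon = awesomeIcons(icon = "info-sign", library = "glyphicon",
                        markerColor = "red", iconColor = "white")
  ) %>%
  addLegend(pal = pal, values = ~value, title = "Value")
```

Printing m in an interactive session (or in the RStudio viewer) displays the map.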

Many thanks to @bhaskarvk who contributed much of the code for this release.

Going forward, our intention is to prevent any more Leaflet.js plugins from accreting in the core leaflet package. Instead, we have made it possible to write third-party R packages that extend leaflet (though the process for doing this is not documented yet). In the meantime, Bhaskar has started developing his own leaflet.extras package; it already supports several plugins, for everything from animated markers to heatmaps.

If big data is your thing, you use R, and you’re headed to Strata + Hadoop World in San Jose March 13 & 14th, you can experience in person how easy and practical it is to analyze big data with R and Spark.

In a beginner-level talk by RStudio’s Edgar Ruiz and an intermediate-level workshop by Win-Vector’s John Mount, we cover the spectrum: what R is, what Spark is, how sparklyr works, and what is required to set up and tune a Spark cluster. You’ll also learn practical applications, including how to quickly set up a local Spark instance, store big data in Spark and then connect to the data with R, use R to apply machine-learning algorithms to big data stored in Spark, and filter and aggregate big data stored in Spark and then import the results into R for analysis and visualization.

2:40pm–3:20pm Wednesday, March 15, 2017
Sparklyr: An R interface for Apache Spark
Edgar Ruiz (RStudio)
Primary topic: Spark & beyond
Location: LL21 C/D
Level: Beginner
Secondary topics: R

1:30pm–5:00pm Tuesday, March 14, 2017
Modeling big data with R, sparklyr, and Apache Spark
John Mount (Win-Vector LLC)
Primary topic: Data science & advanced analytics
Location: LL21 C/D
Level: Intermediate
Secondary topics: R

While you’re at the conference, be sure to look us up in the Innovator’s Pavilion (booth number P8) during Expo Hall hours. We’ll have the latest books from RStudio authors, t-shirts to win, demonstrations of RStudio Connect and RStudio Server Pro and, of course, stickers and cheatsheets. Share with us what you’re doing with RStudio and get your product and company questions answered by RStudio employees.

See you in San Jose! (https://conferences.oreilly.com/strata/strata-ca)

roxygen2 6.0.0 is now available on CRAN. roxygen2 helps you document your packages by turning specially formatted inline comments into R’s standard Rd format. It automates everything that can be automated, and provides helpers for sharing documentation between topics. Learn more at http://r-pkgs.had.co.nz/man.html. Install the latest version with:

install.packages("roxygen2")

There are two headline features in this version of roxygen2:

  • Markdown support.
  • Improved documentation inheritance.

These are described in detail below.

This release also included many minor improvements and bug fixes. For a full list of changes, please see the release notes. A big thanks to all the contributors to this release: @dlebauer, @fmichonneau, @gaborcsardi, @HenrikBengtsson, @jefferis, @jeroenooms, @jimhester, @kevinushey, @krlmlr, @LiNk-NY, @lorenzwalthert, @maxheld83, @nteetor, @shrektan, @yutannihilation

Markdown

Thanks to the hard work of Gabor Csardi you can now write roxygen2 comments in markdown. While we have tried to make markdown mode as backward compatible as possible, there are a few cases where you will need to make some minor changes. For this reason, you’ll need to explicitly opt-in to markdown support. There are two ways to do so:

  • Add Roxygen: list(markdown = TRUE) to your DESCRIPTION to turn it on everywhere.
  • Add @md to individual roxygen blocks to enable for selected topics.

roxygen2’s markdown dialect supports inline formatting (bold, italics, code), lists (numbered and bulleted), and a number of helpful link shortcuts:

  • [func()]: links to a function in the current package, and is translated to \code{\link[=func]{func()}}.
  • [object]: links to an object in the current package, and is translated to \link{object}.
  • [link text][object]: links to an object with custom text, and is translated to \link[=object]{link text}.

Similarly, you can link to functions and objects in other packages with [pkg::func()], [pkg::object], and [link text][pkg::object]. For a complete list of syntax, and for how to handle common problems, please see vignette("markdown").
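For illustration, here is a hypothetical function documented with markdown roxygen comments that uses inline formatting, a bulleted list and the link shortcuts described above. It is only a sketch: add_one() is an invented name, and the block opts in per topic with @md rather than through the DESCRIPTION setting.

```r
#' Add one to a number
#'
#' A **minimal** example of markdown in roxygen comments. It uses
#' `inline code`, a bulleted list, and link shortcuts:
#'
#' * [add_one()] links to a function in this package;
#' * [base::round()] links to a function in another package;
#' * [rounding][base::round] links with custom text.
#'
#' @param x A numeric vector.
#' @return `x` plus one.
#' @md
#' @export
add_one <- function(x) {
  x + 1
}
```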

To convert an existing roxygen2 package to use markdown, try https://github.com/r-pkgs/roxygen2md. Happy markdown-ing!

Improved inheritance

Writing documentation is challenging because you want to reduce duplication as much as possible (so you don’t accidentally end up with inconsistent documentation), but you don’t want the user to have to follow a spider’s web of cross-references. This version of roxygen2 provides more support for writing documentation in one place and then reusing it in multiple topics.

The new @inherit tag allows you to inherit parameters, return, references, title, description, details, sections, and seealso from another topic. @inherit my_fun will inherit everything; @inherit my_fun return params will allow you to inherit specified components. @inherit fun sections will inherit all sections; if you’d like to inherit a single section, you can use @inheritSection fun title. You can also inherit from a topic in another package with @inherit pkg::fun.
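A minimal sketch of @inherit with two hypothetical functions follows. trimmed_mean_na() reuses the parameter and return-value documentation of trimmed_mean() instead of repeating it. Both functions are invented for this example.

```r
#' Trimmed mean of a numeric vector
#'
#' @param x A numeric vector.
#' @param trim Fraction (0 to 0.5) of observations trimmed from each end.
#' @return A single number.
#' @export
trimmed_mean <- function(x, trim = 0.1) {
  mean(x, trim = trim)
}

#' Trimmed mean, ignoring missing values
#'
#' Only the title and this description are written here; the @param and
#' @return entries are copied from trimmed_mean() by @inherit.
#'
#' @inherit trimmed_mean return params
#' @export
trimmed_mean_na <- function(x, trim = 0.1) {
  mean(x, trim = trim, na.rm = TRUE)
}
```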

Another new tag is @inheritDotParams, which allows you to automatically generate parameter documentation for ... for the common case where you pass ... on to another function. The documentation generated is similar to the style used in ?plot and will eventually be incorporated into RStudio’s autocomplete. When you pass along ... you often override some arguments, so the tag has a flexible specification:

  • @inheritDotParams foo takes all parameters from foo().
  • @inheritDotParams foo a b e:h takes parameters a and b, and all parameters between e and h.
  • @inheritDotParams foo -x -y takes all parameters except for x and y.
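The sketch below shows the exclusion form with a hypothetical pair of functions. round_abs() forwards ... to round_to(), and @inheritDotParams round_to -x documents digits and na_to_zero while skipping x, which round_abs() supplies itself. Both function names are invented for illustration.

```r
#' Round numbers to a given precision
#'
#' @param x A numeric vector.
#' @param digits Number of decimal places.
#' @param na_to_zero Replace missing values with zero before rounding?
#' @export
round_to <- function(x, digits = 0, na_to_zero = FALSE) {
  if (na_to_zero) x[is.na(x)] <- 0
  round(x, digits)
}

#' Round the absolute values of numbers
#'
#' @param x A numeric vector.
#' @inheritDotParams round_to -x
#' @export
round_abs <- function(x, ...) {
  # x is passed explicitly, so only digits and na_to_zero travel via ...
  round_to(abs(x), ...)
}
```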

All the @inherit tags (including the existing @inheritParams) now work recursively, so you can inherit from a function that inherited from elsewhere.

If you want to generate a basic package documentation page (accessible from package?packagename and ?packagename), you can document the special sentinel value "_PACKAGE". It automatically uses the title, description, authors, url and bug reports fields from the DESCRIPTION. The simplest approach is to do this:

#' @keywords internal
"_PACKAGE"

It only includes what’s already in the DESCRIPTION, but it will typically be easier for R users to access.

Today we are pleased to release version 1.1.1 of xml2. xml2 makes it easy to read, create, and modify XML with R. You can install it with:

install.packages("xml2")

As well as fixing many bugs, this release:

  • Makes it easier to create and modify XML
  • Improves roundtrip support between XML and lists
  • Adds support for XML validation and XSLT transformations.

You can see a full list of changes in the release notes. This is the first release maintained by Jim Hester.

Creating and modifying XML

xml2 has been overhauled with a set of methods to make generating and modifying XML easier:

  • xml_new_root() can be used to create a new document and root node simultaneously.
    xml_new_root("x") %>%
      xml_add_child("y") %>%
      xml_root()
    #> {xml_document}
    #> <x>
    #> [1] <y/>
  • New xml_set_text(), xml_set_name(), xml_set_attr(), and xml_set_attrs() make it easy to modify nodes within a pipeline.
    x <- read_xml("<a>
        <b />
        <c><b/></c>
      </a>")
    x
    #> {xml_document}
    #> <a>
    #> [1] <b/>
    #> [2] <c>\n  <b/>\n</c>
    
    x %>% 
      xml_find_all(".//b") %>% 
      xml_set_name("banana") %>% 
      xml_set_attr("oldname", "b")
    x
    #> {xml_document}
    #> <a>
    #> [1] <banana oldname="b"/>
    #> [2] <c>\n  <banana oldname="b"/>\n</c>
  • New xml_add_parent() makes it easy to insert a node as the parent of an existing node.

  • You can create more esoteric node types with xml_comment() (comments), xml_cdata() (CDATA nodes), and xml_dtd() (DTDs).
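The following short sketch combines xml_add_parent(), xml_comment() and xml_cdata() on a toy document. The element names and comment text are arbitrary, and the exact whitespace of the serialized output may differ.

```r
library(xml2)

# Toy document for illustration
x <- read_xml("<a><b/></a>")
root <- xml_root(x)

# Insert a new <wrapper> element between <a> and its existing child <b/>
b <- xml_find_first(x, "//b")
xml_add_parent(b, "wrapper")

# Append a comment node and a CDATA section to the root element
xml_add_child(root, xml_comment("generated by xml2"))
xml_add_child(root, xml_cdata("<not parsed as markup>"))

# Serialize the modified document
s <- as.character(x)
cat(s)
```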

Coercion to and from R Lists

xml2 1.1.1 improves support for converting to and from R lists, thanks in part to work by Peter Foley and Jenny Bryan. In particular, xml2 now supports preserving the root node name as well as saving all XML attributes as R attributes. These changes allow you to convert most XML documents to and from R lists with as_list() and as_xml_document() without loss of data.

x <- read_xml("<fruits><apple color = 'red' /></fruits>")
x
#> {xml_document}
#> <fruits>
#> [1] <apple color="red"/>
as_list(x)
#> $apple
#> list()
#> attr(,"color")
#> [1] "red"
as_xml_document(as_list(x))
#> {xml_document}
#> <apple color="red">

XML validation and xslt

xml2 1.1.1 also adds support for XML validation, thanks to Jeroen Ooms. Simply read the document and schema files and call xml_validate().

doc <- read_xml(system.file("extdata/order-doc.xml", package = "xml2"))
schema <- read_xml(system.file("extdata/order-schema.xml", package = "xml2"))
xml_validate(doc, schema)
#> [1] TRUE
#> attr(,"errors")
#> character(0)
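The sketch below shows what a failing validation looks like. It validates two small documents against a hypothetical inline XML Schema that requires a positive integer. The schema and documents are invented for illustration. The messages stored in the "errors" attribute come from libxml2, and their exact wording may vary.

```r
library(xml2)

# Hypothetical schema: a single <qty> element holding a positive integer
schema <- read_xml('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="qty" type="xs:positiveInteger"/>
</xs:schema>')

good <- read_xml("<qty>3</qty>")
bad  <- read_xml("<qty>three</qty>")

# Each call returns TRUE or FALSE, with any messages in attr(, "errors")
ok_good <- xml_validate(good, schema)
ok_bad  <- xml_validate(bad, schema)

ok_bad
attr(ok_bad, "errors")
```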

Alongside xml2 1.1.1, Jeroen also released xslt, the first xml2 extension package. xslt allows you to apply XSLT (Extensible Stylesheet Language Transformations) to XML documents, which is great for transforming XML data into other formats such as HTML.

We’re happy to announce that version 0.5 of the sparklyr package is now available on CRAN. The new version comes with many improvements over the first release, including:

  • Extended dplyr support by implementing: do() and n_distinct().
  • New functions including sdf_quantile(), ft_tokenizer() and ft_regex_tokenizer().
  • Improved compatibility: sparklyr now respects the value of the ‘na.action’ R option and supports dim(), nrow() and ncol().
  • Experimental support for Livy to enable clients, including RStudio, to connect remotely to Apache Spark.
  • Improved connections by simplifying initialization and providing error diagnostics.
  • Certified sparklyr, RStudio Server Pro and Shiny Server Pro with Cloudera.
  • Updated spark.rstudio.com with new deployment examples and a sparklyr cheatsheet.

Additional changes and improvements can be found in the sparklyr NEWS file.

For questions or feedback, please feel free to open a sparklyr GitHub issue or a sparklyr Stack Overflow question.

Extended dplyr support

sparklyr 0.5 adds support for n_distinct() as a faster and more concise equivalent of length(unique(x)). It also adds support for do() as a convenient way to perform multiple serial computations over a group_by() operation:

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

by_cyl <- group_by(mtcars_tbl, cyl)
fit_sparklyr <- by_cyl %>% 
   do(mod = ml_linear_regression(mpg ~ disp, data = .))

# display results
fit_sparklyr$mod

In this case, . represents a Spark DataFrame, which allows us to perform operations at scale (like this linear regression) for a small set of groups. However, since each group operation is performed sequentially, it is not recommended to use do() with a large number of groups. The code above performs multiple linear regressions with the following output:

[[1]]
Call: ml_linear_regression(mpg ~ disp, data = .)

Coefficients:
 (Intercept)         disp 
19.081987419  0.003605119 

[[2]]
Call: ml_linear_regression(mpg ~ disp, data = .)

Coefficients:
(Intercept)        disp 
 40.8719553  -0.1351418 

[[3]]
Call: ml_linear_regression(mpg ~ disp, data = .)

Coefficients:
(Intercept)        disp 
22.03279891 -0.01963409 
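n_distinct() also works in an ordinary dplyr pipeline. The sketch below counts the distinct cyl values in the same mtcars data. It assumes a local Spark installation (for example via spark_install()) and opens its own connection so that it runs on its own.

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is installed locally, e.g. via spark_install()
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# n_distinct() is translated to SQL and evaluated inside Spark;
# collect() brings the one-row result back into R
res <- mtcars_tbl %>%
  summarise(n_cyl = n_distinct(cyl)) %>%
  collect()
res

spark_disconnect(sc)
```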

It’s worth mentioning that while sparklyr provides comprehensive support for dplyr, dplyr is not strictly required while using sparklyr. For instance, one can make use of DBI without dplyr as follows:

library(sparklyr)
library(DBI)

sc <- spark_connect(master = "local")
sdf_copy_to(sc, iris)
dbGetQuery(sc, "SELECT * FROM iris LIMIT 5")
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

New functions

The new sdf_quantile() function computes approximate quantiles (to within a given relative error). The new ft_tokenizer() and ft_regex_tokenizer() functions split strings on whitespace or on regular-expression patterns, respectively.
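Here is a minimal sketch of sdf_quantile() on the mtcars data. It assumes a local Spark installation. The probabilities and the relative.error value of 0.01 are arbitrary choices; smaller relative errors give more accurate quantiles at a higher computational cost.

```r
library(sparklyr)

# Assumes Spark is installed locally, e.g. via spark_install()
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# Approximate quartiles of mpg, computed inside Spark
q <- sdf_quantile(mtcars_tbl, "mpg",
                  probabilities = c(0.25, 0.5, 0.75),
                  relative.error = 0.01)
q

spark_disconnect(sc)
```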

For example, ft_tokenizer() can be used as follows:

library(sparklyr)
library(janeaustenr)
library(dplyr)

sc <- spark_connect(master = "local")
austen_tbl <- copy_to(sc, austen_books(), overwrite = TRUE)

austen_tbl %>%
  na.omit() %>%
  ft_tokenizer(input.col = "text", output.col = "tokens") %>%
  head(4)

Which produces the following output:

                   text                book     tokens
                  <chr>               <chr>     <list>
1 SENSE AND SENSIBILITY Sense & Sensibility <list [3]>
2                       Sense & Sensibility <list [1]>
3        by Jane Austen Sense & Sensibility <list [3]>
4                       Sense & Sensibility <list [1]>

Tokens can be further processed through, for instance, HashingTF.

Improved compatibility

‘na.action’ is a parameter accepted as part of the ‘ml.options’ argument, which defaults to getOption("na.action", "na.omit"). This allows sparklyr to match R's behavior when processing NA records. For instance, the following linear model drops NA records appropriately:

library(sparklyr)
library(dplyr)
library(nycflights13)

sc <- spark_connect(master = "local")
flights_tbl <- copy_to(sc, flights)

ml_linear_regression(
  flights_tbl,
  response = "dep_delay",
  features = c("arr_delay", "arr_time"))
* Dropped 9430 rows with 'na.omit' (336776 => 327346)
Call: ml_linear_regression(flights_tbl, response = "dep_delay",
                           features = c("arr_delay", "arr_time"))

Coefficients:
 (Intercept)    arr_delay     arr_time 
6.1001212994 0.8210307947 0.0005284729

In addition, dim(), nrow() and ncol() are now supported against Spark DataFrames.
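The following sketch assumes a local Spark installation. Called on a copied iris table, dim(), nrow() and ncol() should report the same shape as the local data frame, with the row count computed by Spark.

```r
library(sparklyr)

# Assumes Spark is installed locally, e.g. via spark_install()
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

# Shape of the Spark DataFrame, returned to R as ordinary numbers
d <- dim(iris_tbl)
n_rows <- nrow(iris_tbl)
n_cols <- ncol(iris_tbl)
d

spark_disconnect(sc)
```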

Livy connections

Livy, “An Open Source REST Service for Apache Spark (Apache License)”, is now available in sparklyr 0.5 as an experimental feature. Among many scenarios, this enables connections from the RStudio desktop to Apache Spark when Livy is available and correctly configured in the remote cluster.

Livy running locally

To work with Livy locally, sparklyr provides livy_install(), which installs Livy in your local environment, similar to spark_install(). Livy is a service that enables remote connections to Apache Spark, so it needs to be started with livy_service_start(). Once the service is running, point spark_connect() at the running service and use method = "livy"; sparklyr can then be used as usual. A short example follows:

livy_install()
livy_service_start()

sc <- spark_connect(master = "http://localhost:8998",
                    method = "livy")
copy_to(sc, iris)

spark_disconnect(sc)
livy_service_stop()

Livy running in HDInsight

Microsoft Azure supports Apache Spark clusters configured with Livy and protected with basic authentication in HDInsight clusters. To use sparklyr with HDInsight clusters through Livy, first create the HDInsight cluster with Spark support:

Figure: Creating a Spark cluster in Microsoft Azure HDInsight

Once the cluster is created, you can connect with sparklyr as follows:

library(sparklyr)
library(dplyr)

config <- livy_config(user = "admin", password = "password")
sc <- spark_connect(master = "https://dm.azurehdinsight.net/livy/",
                    method = "livy",
                    config = config)

copy_to(sc, iris)

From a desktop running RStudio, the remote connection looks like this:

Figure: RStudio connected to Spark on Azure HDInsight

Improved connections

sparklyr 0.5 no longer requires internet connectivity to download additional Apache Spark packages. This enables connections in secure clusters that do not have internet access or while on the go.

Some community members reported a generic “Ports file does not exists” error while connecting with sparklyr 0.4. In 0.5, we’ve deprecated the ports file and improved error reporting. For instance, the following invalid connection example throws a descriptive error. The error includes the spark-submit parameters and logging information to help troubleshoot connection issues:

> library(sparklyr)
> sc <- spark_connect(master = "local",
                      config = list("sparklyr.gateway.port" = "0"))
Error in force(code) : 
  Failed while connecting to sparklyr to port (0) for sessionid (5305): 
  Gateway in port (0) did not respond.
  Path: /spark-1.6.2-bin-hadoop2.6/bin/spark-submit
  Parameters: --class, sparklyr.Backend, 'sparklyr-1.6-2.10.jar', 0, 5305


---- Output Log ----
16/12/12 12:42:35 INFO sparklyr: Session (5305) starting

---- Error Log ----

Additional technical details can be found in the sparklyr gateway socket pull request.

Cloudera certification

sparklyr 0.4, sparklyr 0.5, RStudio Server Pro 1.0 and Shiny Server Pro 1.5 went through Cloudera’s certification process and are now certified with Cloudera. Among various benefits, authentication features like Kerberos have been tested and validated against secured clusters.

For more information see Cloudera’s partner listings.

We have released the R package bookdown (v0.3) to CRAN. It may be old news to some users, but we are happy to make an official announcement today. To install the package from CRAN, run:

install.packages("bookdown")

The bookdown package provides an easier way to write books and technical publications than traditional tools such as LaTeX and Word. It inherits the simple syntax of R Markdown and its flexibility for data analysis. It also extends R Markdown for technical writing, so that you can make better use of document elements such as figures, tables, equations, theorems, citations, and references. As in LaTeX, you can number and cross-reference these elements with bookdown.

Want to Master R? There’s no better time or place than Hadley Wickham’s workshop on December 12th and 13th at the Cliftons in Melbourne, VIC, Australia.

Register here: https://www.eventbrite.com/e/master-r-developer-workshop-melbourne-tickets-22546200292   (Note: Prices are in $US and VAT is not collected)

Discounts are still available for academics (students or faculty) and for 5 or more attendees from any organization. Email training@rstudio.com if you have any questions about the workshop that you don’t find answered on the registration page.

Hadley has no Master R Workshops planned in the region for 2017 and his next one with availability won’t be until September in San Francisco. If you’ve always wanted to take Master R but haven’t found the time, Melbourne, the second most fun city in the world, is the place to go!

P.S. We’ve arranged a “happy hour” reception after class on Monday the 12th. Be sure to set aside an hour or so after the first day to talk to your classmates and Hadley about what’s happening in R.

Today we are pleased to release a new version of svglite. This release fixes many bugs, includes new documentation vignettes, and improves font support.

You can install svglite with:

install.packages("svglite")

Font handling

Fonts are tricky with SVG because they are needed at two stages:

  • When creating the SVG file, the fonts are needed in order to correctly measure the amount of space each character occupies. This is particularly important for plots that use plotmath.
  • When drawing the SVG file on screen, the fonts are needed to draw each character correctly.

For the best display, you need to have the same fonts installed on both the computer that generates the SVG file and the computer that draws it. By default, svglite uses fonts that are installed on pretty much every computer. svglite’s font support is now much more flexible thanks to two new arguments: system_fonts and user_fonts.

  1. system_fonts allows you to specify the name of a font installed on your computer. This is useful, for example, if you’d like to use a font with better CJK support:
    library(svglite)
    svglite("Rplots.svg", system_fonts = list(sans = "Arial Unicode MS"))
    plot.new()
    text(0.5, 0.5, "正規分布")
    dev.off()
  2. user_fonts allows you to specify a font installed in an R package (like fontquiver). This is needed if you want to generate identical plots across different operating systems. It is used in the upcoming vdiffr package, which provides graphical unit tests.

For more details, see vignette("fonts").

Text scaling

This update also fixes many bugs. The most important fix is that text is now properly scaled within the plot, and we provide a vignette that describes the details: vignette("scaling"). It documents, for instance, how to include an svglite graphic in a web page with the figure text consistently scaled with the surrounding text.

Find a full list of changes in the release notes.

It’s nearly summeRtime in Australia! Join RStudio Chief Data Scientist Hadley Wickham for his popular Master R workshop in Melbourne.

Register here:  https://www.eventbrite.com/e/master-r-developer-workshop-melbourne-tickets-22546200292

Melbourne will be Hadley’s first and only scheduled Master R workshop in Australia. Whether you live or work nearby or you just need one more good reason to visit Melbourne in the Southern Hemisphere spring, consider joining him at the Cliftons Melbourne on December 12th and 13th. It’s a rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers.

Hadley’s workshops usually sell out. This is his final Master R in 2016 and he has no plans to offer another in the area in 2017. If you’re an active R user and have been meaning to take this class, now is the perfect time to do it!

We look forward to seeing you in Melbourne!

rstudio::conf 2017, the conference on all things R and RStudio, is only 90 days away. Now is the time to claim your spot or grab one of the few remaining seats at Training Days – including the new Tidyverse workshop.

REGISTER NOW

Whether you’re already registered or still working on it, we’re delighted today to announce the full conference schedule, so that you can plan your days in Florida.

rstudio::conf 2017 takes place January 12-14 at the Gaylord Resorts in Kissimmee, Florida. There are over 30 talks and tutorials to choose from that are sure to accelerate your productivity in R and RStudio. In addition to the highlights below, topics include the latest news on R Notebooks, sparklyr, profiling, the tidyverse, Shiny, R Markdown, htmlwidgets, data access and the new enterprise-scale publishing capabilities of RStudio Connect.

Schedule Highlights

Keynotes
– Hadley Wickham, Chief Scientist, RStudio: Data Science in the Tidyverse
– Andrew Flowers, Economics Writer, FiveThirtyEight: Finding and Telling Stories with R
– J.J. Allaire, Software Engineer, CEO & Founder: RStudio Past, Present and Future

Tutorials
– Winston Chang, Software Engineer, RStudio: Building Dashboards with Shiny
– Charlotte Wickham, Oregon State University: Happy R Users Purrr
– Yihui Xie, Software Engineer, RStudio: Advanced R Markdown
– Jenny Bryan, University of British Columbia: Happy Git and GitHub for the UseR

Featured Speakers
– Max Kuhn, Senior Director Non-Clinical Statistics, Pfizer
– Dirk Eddelbuettel, Ketchum Trading: Extending R with C++: A Brief Introduction to Rcpp
– Hilary Parker, Stitch Fix: Opinionated Analysis Development
– Bryan Lewis, Paradigm4: Fun with htmlwidgets
– Ryan Hafen, Hafen Consulting: Interactive plotting with rbokeh and crosstalk
– Julia Silge, Datassist: Text mining, the tidy way
– Bob Rudis, Rapid7: Writing readable code with pipes

Featured Talk
– Joseph Rickert, R Ambassador, RStudio: R’s Role in Data Science

Be sure to visit https://www.rstudio.com/conference/ for the full schedule and latest updates and don’t forget to download the RStudio conference app to help you plan your days in detail.

Special Reminder: When you register, make sure you purchase your ticket for Friday evening at Universal’s Wizarding World of Harry Potter. The park is reserved exclusively for rstudio::conf attendees. It’s an extraordinary experience we’re sure you’ll enjoy!

We appreciate our sponsors and exhibitors!