You are currently browsing the monthly archive for November 2016.
Want to Master R? There’s no better time or place than Hadley Wickham’s workshop on December 12th and 13th at the Cliftons in Melbourne, VIC, Australia.
Register here: https://www.eventbrite.com/e/master-r-developer-workshop-melbourne-tickets-22546200292 (Note: Prices are in $US and VAT is not collected)
Discounts are still available for academics (students or faculty) and for 5 or more attendees from any organization. Email email@example.com if you have any questions about the workshop that you don’t find answered on the registration page.
Hadley has no Master R Workshops planned in the region for 2017 and his next one with availability won’t be until September in San Francisco. If you’ve always wanted to take Master R but haven’t found the time, Melbourne, the second most fun city in the world, is the place to go!
P.S. We’ve arranged a “happy hour” reception after class on Monday the 12th. Be sure to set aside an hour or so after the first day to talk to your classmates and Hadley about what’s happening in R.
I’m very pleased to announce ggplot2 2.2.0. It includes four major new features:
- Subtitles and captions.
- A large rewrite of the facetting system.
- Improved theme options.
- Better stacking.
It also includes numerous bug fixes and minor improvements, as described in the release notes.
The majority of this work was carried out by Thomas Pedersen, who I was lucky to have as my “ggplot2 intern” this summer. Make sure to check out his other visualisation packages: ggraph, ggforce, and tweenr.
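As a quick illustration of the first feature in the list above, subtitles and captions are set through labs() alongside the title. This is a minimal sketch using the mpg dataset that ships with ggplot2; the label text is made up for the example.

```r
library(ggplot2)

# Title, subtitle, and caption are all supplied through labs().
# The label text is illustrative only.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  labs(
    title = "Engine size and highway mileage",
    subtitle = "Each point is one car model in the mpg dataset",
    caption = "Data: the mpg dataset bundled with ggplot2"
  )

p
```

The subtitle is drawn under the title, and the caption is drawn below the plot.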
Install ggplot2 with:
install.packages("ggplot2")
The facet and layout implementation has been moved to ggproto and received a large rewrite and refactoring. This will allow others to create their own facetting systems, as described in vignette("extending-ggplot2"). Along with the rewrite, a number of features and improvements have been added, most notably:
- You can now use functions in facetting formulas, thanks to Dan Ruderman.
ggplot(diamonds, aes(carat, price)) + geom_hex(bins = 20) + facet_wrap(~cut_number(depth, 6))
- Axes are now drawn under the panels in facet_wrap() when the rectangle is not completely filled.
ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_wrap(~class)
- You can set the position of the axes with the position argument:
ggplot(mpg, aes(displ, hwy)) + geom_point() + scale_x_continuous(position = "top") + scale_y_continuous(position = "right")
- You can display a secondary axis that is a one-to-one transformation of the primary axis with the sec.axis argument:
ggplot(mpg, aes(displ, hwy)) + geom_point() + scale_y_continuous( "mpg (US)", sec.axis = sec_axis(~ . * 1.20, name = "mpg (UK)") )
- Strips can be placed on any side, and the placement with respect to axes can be controlled with the strip.placement theme option:
ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_wrap(~ drv, strip.position = "bottom") + theme( strip.placement = "outside", strip.background = element_blank(), strip.text = element_text(face = "bold") ) + xlab(NULL)
- The theme() function now has named arguments, so autocomplete and documentation suggestions are vastly improved.
- Blank elements can now be overridden again, so you get the expected behavior when you set an element that a theme had blanked out.
- element_line() gets an arrow argument that lets you put arrows on axes.
arrow <- arrow(length = unit(0.4, "cm"), type = "closed")
ggplot(mpg, aes(displ, hwy)) + geom_point() + theme_minimal() + theme( axis.line = element_line(arrow = arrow) )
- Control of legend styling has been improved. The whole legend area can be aligned with the plot area and a box can be drawn around all legends:
ggplot(mpg, aes(displ, hwy, shape = drv, colour = fl)) + geom_point() + theme( legend.justification = "top", legend.box = "horizontal", legend.box.margin = margin(3, 3, 3, 3, "mm"), legend.margin = margin(), legend.box.background = element_rect(colour = "grey50") )
- panel.margin and legend.margin have been renamed to panel.spacing and legend.spacing respectively, as this better indicates their roles. A new legend.margin actually controls the margin around each legend.
- When computing the height of titles, ggplot2 now includes the height of the descenders (i.e. the bits of letters such as y that hang underneath). This improves the margins around titles, particularly the y axis label. I have also very slightly increased the inner margins of axis titles, and removed the outer margins.
- The default themes have been tweaked by Jean-Olivier Irisson, making them better match theme_grey().
position_stack() and position_fill() now stack values in the reverse order of the grouping, which makes the default stack order match the legend.
library(dplyr)
avg_price <- diamonds %>%
  group_by(cut, color) %>%
  summarise(price = mean(price)) %>%
  ungroup() %>%
  mutate(price_rel = price - mean(price))
ggplot(avg_price) + geom_col(aes(x = cut, y = price, fill = color))
(Note also the new geom_col(), which is short-hand for geom_bar(stat = "identity"), contributed by Bob Rudis.)
If you want to stack in the opposite order, try fct_rev() from the forcats package:
ggplot(avg_price) + geom_col(aes(x = cut, y = price, fill = fct_rev(color)))
Additionally, you can now stack negative values:
ggplot(avg_price) + geom_col(aes(x = cut, y = price_rel, fill = color))
The overall ordering cannot necessarily be matched in the presence of negative values, but the ordering on either side of the x-axis will match.
Labels can also be stacked, but the default position is suboptimal:
series <- data.frame(
  time = c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)),
  type = rep(c('a', 'b', 'c', 'd'), 4),
  value = rpois(16, 10)
)
ggplot(series, aes(time, value, group = type)) + geom_area(aes(fill = type)) + geom_text(aes(label = type), position = "stack")
You can improve the position with the vjust parameter. A vjust of 0.5 will center the labels inside the corresponding area:
ggplot(series, aes(time, value, group = type)) + geom_area(aes(fill = type)) + geom_text(aes(label = type), position = position_stack(vjust = 0.5))
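The same stacking rules apply to position_fill(), and both position functions also take a reverse argument as an alternative to reordering the factor. Here is a minimal sketch using a small made-up data frame (sales, region, product, and units are all hypothetical names):

```r
library(ggplot2)

# Hypothetical data: two products sold in two regions.
sales <- data.frame(
  region = rep(c("north", "south"), each = 2),
  product = rep(c("a", "b"), times = 2),
  units = c(10, 30, 20, 20)
)

# position = "fill" stacks the bars and rescales each one to a height of 1,
# so the plot shows proportions rather than totals.
p_fill <- ggplot(sales, aes(region, units, fill = product)) +
  geom_col(position = "fill")

# reverse = TRUE flips the default stacking order without touching the data.
p_reversed <- ggplot(sales, aes(region, units, fill = product)) +
  geom_col(position = position_stack(reverse = TRUE))

p_fill
p_reversed
```

With reverse = TRUE the first group is stacked at the bottom of each bar instead of the top.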
Today we are pleased to release a new version of svglite. This release fixes many bugs, includes new documentation vignettes, and improves font support.
You can install svglite with:
install.packages("svglite")
Fonts are tricky with SVG because they are needed at two stages:
- When creating the SVG file, the fonts are needed in order to correctly measure the amount of space each character occupies. This is particularly important for plots that use plotmath.
- When drawing the SVG file on screen, the fonts are needed to draw each character correctly.
For the best display, that means you need to have the same fonts installed on both the computer that generates the SVG file and the computer that draws it. By default, svglite uses fonts that are installed on pretty much every computer. svglite’s font support is now much more flexible thanks to two new arguments:
- system_fonts allows you to specify the name of a font installed on your computer. This is useful, for example, if you’d like to use a font with better CJK support:
svglite("Rplots.svg", system_fonts = list(sans = "Arial Unicode MS"))
plot.new()
text(0.5, 0.5, "正規分布")
dev.off()
- user_fonts allows you to specify a font installed in an R package (like fontquiver). This is needed if you want to generate identical plots across different operating systems, and is used in the upcoming vdiffr package, which provides graphical unit tests.
For more details, see vignette("fonts").
This update also fixes many bugs. The most important is that text is now properly scaled within the plot, and we provide a vignette that describes the details: vignette("scaling"). It documents, for instance, how to include an svglite graphic in a web page with the figure text consistently scaled with the surrounding text.
Find a full list of changes in the release notes.
On October 12, RStudio launched R Views with great enthusiasm. R Views is a new blog for R users about the R Community and the R Language. Under the care of editor-in-chief and new RStudio ambassador-at-large, Joseph Rickert, R Views provides a new perspective on R and RStudio that we like to think will become essential reading for you.
You may have read an R Views post already. In the first, widely syndicated, post, Joseph interviewed J.J. Allaire, RStudio’s founder, CEO and most prolific software developer. Later posts by Mine Cetinkaya-Rundel on Highcharts and thoughtful book reviews, new R package picks, and a primer on Naive Bayes from Joseph rounded out the first month. Each post was entirely different from anything you could have read here, on what we now call our Developer Blog at rstudio.org.
Fortunately, you don’t have to choose. Each has its purpose. Our Developer Blog is the place to go for RStudio news. You’ll find product announcements, events, and company happenings – like the announcement of a new blog – right here. R Views is about R in action. You’ll find stories and solutions and opinions that we hope will educate and challenge you.
Subscribe to each and stay up to date on all things R and RStudio!
Thanks for making R and RStudio part of your data science experience and for supporting our work.
The Shiny Server 1.5.x release family upgrades our underlying Node.js engine from 0.10.47 to 6.9.1. The impetus for this change was not stability or performance; rather, the 0.10.x release family has reached the end of its life.
We highly recommend that you test on a staging server before upgrading production Shiny Server 1.4.x machines to 1.5. You should always do this for any production-critical software, but it’s particularly important for this release, due to the magnitude of changes to Node.js that we’ve absorbed in one big gulp. (We’ve done thorough end-to-end testing of this release, but there’s no substitute for testing with your own apps, on your own servers.)
Some small bug fixes are also included in this release. See the release notes for more details.
The beginning of the end for Ubuntu 12.04 and Red Hat 5
While we still support Ubuntu 12.04 and Red Hat 5 today, we’ll be moving on from these very old releases in a few months. Both of these distributions will reach end-of-life in April 2017, and will stop receiving bug fixes and security fixes from their vendors at that time. If you’re using Shiny Server with one of these platforms, we recommend that you start planning your upgrade.
Today we’re very pleased to announce the availability of RStudio Version 1.0! Version 1.0 is our 10th major release since the initial launch in February 2011 (see the full release history below), and our biggest ever! Highlights include:
- Authoring tools for R Notebooks.
- Integrated support for the sparklyr package (R interface to Spark).
- Performance profiling via integration with the profvis package.
- Enhanced data import tools based on the readr, readxl and haven packages.
- Authoring tools for R Markdown websites and the bookdown package.
- Many other miscellaneous enhancements and bug fixes.
R Notebooks add a powerful notebook authoring engine to R Markdown. Notebook interfaces for data analysis have compelling advantages including the close association of code and output and the ability to intersperse narrative with computation. Notebooks are also an excellent tool for teaching and a convenient way to share analyses.
Interactive R Markdown
As an authoring format, R Markdown bears many similarities to traditional notebooks like Jupyter and Beaker. However, code in notebooks is typically executed interactively, one cell at a time, whereas code in R Markdown documents is typically executed in batch.
R Notebooks bring the interactive model of execution to your R Markdown documents, giving you the capability to work quickly and iteratively in a notebook interface without leaving behind the plain-text tools, compatibility with version control, and production-quality output you’ve come to rely on from R Markdown.
In a typical R Markdown document, you must re-knit the document to see your changes, which can take some time if it contains non-trivial computations. R Notebooks, however, let you run code and see the results in the document immediately. They can include just about any kind of content R produces, including console output, plots, data frames, and interactive HTML widgets.
You can see the progress of the code as it runs:
You can preview the results of individual inline expressions, too:
Even your LaTeX equations render in real-time as you type:
This focused mode of interaction doesn’t require you to keep the console, viewer, or output panes open. Everything you need is at your fingertips in the editor, reducing distractions and helping you concentrate on your analysis. When you’re done, you’ll have a formatted, reproducible record of what you’ve accomplished, with plenty of context, perfect for your own records or sharing with others.
Spark with sparklyr
The sparklyr package is a new R interface for Apache Spark. RStudio now includes integrated support for Spark and the sparklyr package, including tools for:
- Creating and managing Spark connections
- Browsing the tables and columns of Spark DataFrames
- Previewing the first 1,000 rows of Spark DataFrames
Once you’ve installed the sparklyr package, you should find a new Spark pane within the IDE. This pane includes a New Connection dialog which can be used to make connections to local or remote Spark instances:
Once you’ve connected to Spark you’ll be able to browse the tables contained within the Spark cluster:
The Spark DataFrame preview uses the standard RStudio data viewer:
Profiling with profvis
“How can I make my code faster?”
If you write R code, then you’ve probably asked yourself this question. A profiler is an important tool for doing this: it records how the computer spends its time, and once you know that, you can focus on the slow parts to make them faster.
RStudio now includes integrated support for profiling R code and for visualizing profiling data. R itself has long had a built-in profiler, and now it’s easier than ever to use the profiler and interpret the results.
To profile code with RStudio, select it in the editor, and then click on Profile -> Profile Selected Line(s). R will run that code with the profiler turned on, and then open up an interactive visualization.
In the visualization, there are two main parts: on top, there is the code with information about the amount of time spent executing each line, and on the bottom there is a flame graph, which shows what R was doing over time. In the flame graph, the horizontal direction represents time, moving from left to right, and the vertical direction represents the call stack, which is the set of functions currently being called. (Each time a function calls another function, it goes on top of the stack, and when a function exits, it is removed from the stack.)
The Data tab contains a call tree, showing which function calls are most expensive:
Armed with this information, you’ll know what parts of your code to focus on to speed things up!
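The built-in profiler mentioned above can also be driven directly from the console with Rprof() and summaryRprof(), which is handy outside the IDE. The sketch below profiles a deliberately slow, hypothetical function (slow_mean) and prints the functions that used the most time; the exact timings will differ from machine to machine.

```r
# A deliberately slow mean: an explicit loop instead of the vectorised mean().
# slow_mean is a hypothetical function written only for this example.
slow_mean <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total / length(x)
}

profile_file <- tempfile(fileext = ".out")

# Start the sampling profiler, run the workload, then stop the profiler.
Rprof(profile_file, interval = 0.01)
for (i in seq_len(200)) {
  slow_mean(runif(1e5))
}
Rprof(NULL)

# Summarise the samples; by.self ranks functions by time spent in their own code.
profile_summary <- summaryRprof(profile_file)
head(profile_summary$by.self)
```

The by.total element of the same summary includes time spent in callees, which is closer to what the flame graph shows for each level of the call stack.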
RStudio now integrates with the readr, readxl, and haven packages to provide comprehensive tools for importing data from many text file formats, Excel worksheets, as well as SAS, Stata, and SPSS data files. The tools are focused on interactively refining an import then providing the code required to reproduce the import on new datasets.
For example, here’s the workflow we would use to import the Excel worksheet at http://www.fns.usda.gov/sites/default/files/pd/slsummar.xls.
First provide the dataset URL and review the import in preview mode (notice that this file contains two tables and as a result requires the first few rows to be removed):
We can clean this up by skipping 6 rows from this file and unchecking the “First Row as Names” checkbox:
The file is looking better but some columns are being displayed as strings when they are clearly numerical data. We can fix this by selecting “numeric” from the column drop-down:
The final step is to click “Import” to run the code displayed under “Code Preview” and import the data into R. The code is executed within the console and the imported dataset is displayed automatically:
Note that rather than executing the import we could have just copied and pasted the import code and included it within any R script.
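The generated import code is ordinary readr, readxl, or haven code. As a rough sketch of the same three adjustments made above (skipping leading rows, turning off first-row names, and forcing numeric columns), here is a readr version run against a small made-up CSV file rather than the USDA worksheet. The file contents and the name meals are assumptions for illustration, and the code RStudio generates for a given file will differ.

```r
library(readr)

# Hypothetical file: two title rows above the table, and no header row.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Program summary",
  "Fiscal years",
  "2014,30.5,12.66",
  "2015,30.5,13.00"
), csv_path)

meals <- read_csv(
  csv_path,
  skip = 2,                                   # drop the rows above the table
  col_names = FALSE,                          # the first remaining row is data
  col_types = cols(.default = col_double())   # treat every column as numeric
)

meals
```

Because col_names = FALSE, readr names the columns X1, X2, and X3; they can be renamed afterwards or supplied as a character vector in col_names.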
RStudio Release History
We started working on RStudio in November of 2008 (8 years ago!) and had our first public release in February of 2011. Here are highlights of the various releases through the years:
The RStudio Release History page on our support website provides a complete history of all major and minor point releases.