As you may have already gathered: I love R. It’s invaluable as a tool for data analysis. People complain that it is a weird programming language and, well, it is. But it’s more than just a language – it’s a whole environment for doing data-centric research.
The greatest advantage of R, however, is that it’s a research community. It’s a network of some of the best statisticians, computer scientists and data analysts in the world who want to share their knowledge with others. Because of this, there is a vast array of add-on packages available, enabling you to do just about anything to your data from within R. There’s also a whole load of free books, tutorials and documentation on methods for R users.
Open source software like R is an important step towards the ideal of open science – opening up the whole scientific process so that everyone can scrutinise it and learn from it. But there’s more to open science than just using open source software and publishing in open access journals. To reach a truly open science we need to make the data, methods and results available and accessible for everyone. We also need to communicate the research to people in other fields and the general public.
A number of recent developments extend R from being an environment for data analysis to one for sharing code and research results – a platform for open science.
The R function Sweave has been around for a while and can be used to produce PDF documents containing R code and outputs. This makes it relatively simple to produce documents like Jari Oksanen’s brilliant tutorial for ordination methods using the vegan package. The functionality of Sweave has recently been extended by the knitr package.
Sweave and knitr produce great-looking PDFs, but they’re a bit of a leap for those of us who aren’t familiar with LaTeX. It would also be great to publish to a more interactive and distributable format, say HTML. The recently released markdown package for R does just that: it produces great-looking HTML from a simple plain-text Markdown file. RStudio, who make the nice IDE of the same name for R, have even launched a new website for R users to share these HTML scripts with one another!
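To give a feel for the workflow, here is a minimal sketch of going from a plain text file with embedded R code to an HTML page. It assumes the knitr and markdown packages are installed, and the file names are made up for illustration. I've used knit() and markdownToHTML(), which is how I understand the two packages fit together; newer versions of the markdown package may prefer a different function, so treat this as a starting point rather than the definitive recipe.

```r
# Sketch only: assumes the knitr and markdown packages are installed.
library(knitr)
library(markdown)

# Illustrative file names in a temporary directory.
work_dir  <- tempdir()
rmd_file  <- file.path(work_dir, "example.Rmd")
md_file   <- file.path(work_dir, "example.md")
html_file <- file.path(work_dir, "example.html")

# Build the three-backtick chunk delimiter programmatically so it can be
# written into the file from inside this script.
fence <- strrep("`", 3)

# A tiny R Markdown document: a title, a sentence, and one R chunk.
writeLines(c(
  "A minimal example",
  "=================",
  "",
  "Some plain text, followed by a chunk of R code:",
  "",
  paste0(fence, "{r}"),
  "summary(cars$speed)",
  fence
), rmd_file)

# Step 1: knitr runs the R chunk and writes plain Markdown,
# with the code and its output embedded.
knit(rmd_file, output = md_file, quiet = TRUE)

# Step 2: the markdown package converts that Markdown file to HTML.
markdownToHTML(md_file, output = html_file)
```

Opening example.html in a browser should show the text, the R code and its output together on one page. The point is that the whole thing starts from a single readable text file.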
These tools offer a great way of creating highly readable and flexible versions of R scripts and sharing them with the world. This should make it even easier to share data analysis methods and move us towards more open science. It will be interesting to see how these tools tie in with the more traditional methods of science publishing. Will appendices containing all the code and plots needed to carry out the analysis start appearing with published articles? Will it herald a move towards open notebook science for data scientists?
As I mentioned above, open science isn’t just about sharing more of the scientific process among scientists. It’s also about sharing the process, results and implications of our research with the wider public.
Slideshow presentations are the way most scientists are used to sharing their research and, despite how many terrible slideshows we’ve all seen, they can be a great format for telling a story and illustrating it with graphics. PowerPoint is still the presentation software of choice for most scientists, but other programs are gaining ground. HTML5 is starting to look like a great option for creating presentations, and it has greater flexibility than PowerPoint when it comes to embedding different types of media, particularly interactive charts.
And of course you can now create HTML5 presentations from R markdown files. The slidify package looks like a really good way of doing this. Here’s a nice example of an HTML5 presentation created from R (hit F11 to make it full screen). And here’s an interactive data visualisation in HTML5 to whet your appetite.