<- 1.234
x
x#> [1] 1.234
options(OutDec = ",")
x#> [1] 1,234
10 Cross-platform paper cuts
Before we dive into the biggest challenges of working on another machine, I want to quickly cover a bunch of annoying paper cuts. None of these are particularly hard to work around, but they’re likely to be confusing the first time you encounter them and they continue to be a minor nuisance long into your production journey.
These issues are particularly likely to surprise you when you first move from running R code on your laptop (usually Windows or Mac) to Linux. This might happen because you’ve joined a company that provides a central development environment on Linux or you’re deploying production code onto a Linux server.
10.1 Windows vs Mac/Linux
There are a few differences specifically between Windows and Linux that you need to know about. If you’re using a Mac you can skip this section.
10.1.1 Paths
There are three main differences when it comes to handling paths on Windows: the path separator, the meaning of ~
, and case-sensitivity.
In Windows, you generally use \
to separate directories in a path. This is a pain in R because \
is the string escape character, so whenever you’re typing a path, you have to use \\
. On Linux (and Mac), you use /
to separate directories, and \
doesn’t work. The easiest way around this is to get in the habit of using /
; this works everywhere and is easier to type 😄. Alternatively, you can avoid ever typing a directory separator by using a tool like file.path()
, or even better, fs::path()
. We highly recommend using the fs package for path manipulations as it standardizes every input to use /
and ensures that your paths never have multiple /
or trailing /
.
Another path issue is the meaning of ~
. On Linux and Mac, this is a shortcut to your home directory, i.e., the directory in which your desktop and documents directories live. On Windows, however, ~
points to your documents directory. You can again avoid this problem by using fs, which uses the standard meaning of ~
. This is unlikely to affect your production code (since it should be self-contained), but it’s good to be aware of.
The final path issue that you might run into is that paths on Linux are generally case-sensitive, while paths on Windows are not. If you’re ever scratching your head wondering why your production script can’t find a file that’s very obviously there, start by double-checking the case! (The most common way this affects me is when I refer to foo.r
but I have actually created foo.R
.)
Absolute paths look a little different depending on your operating system: on Windows, they start with a drive letter (e.g., C:
) or two backslashes (e.g., \\servername
), but on Mac/Linux, they start with a slash “/” (e.g., /Users/hadley
). Fortunately, this should rarely be a problem for production scripts since they aren’t going to work anyway unless you assume that your production environment has exactly the same directory configuration as your development environment.
10.1.2 Line endings
Windows uses \r\n
(CRLF) as a line-ending character and Linux uses \n
(LF). Generally, most R functions automatically switch between the two forms as needed1, but if for some reason this becomes a problem, you might try the brio package which always uses \r\n
when writing file.
Git also provides some tools to deal with this problem: on your Windows machine you can configure git to automatically convert CRLF to LF:
git config --global core.autocrlf true
GitHub provides some nice documentation with more details if you’re interested.
10.1.3 UTF-8 encoding
One paper cut has been eliminated in recent versions of R (4.2 or greater) or recent versions of Windows (Windows 10 or later): character encoding. Now Windows uses UTF-8 just the same as Mac and Linux. You can learn more about this challenge and how it was overcome on the R Core blog.
10.2 Installing R
On Mac and Windows, it’s pretty straightforward to know how to install R: you go to https://www.r-project.org and download the appropriate installer. How do you install R on Linux? There are two ways:
- Using your system package manager, e.g. with
sudo yum install R
. - Using rig.
We recommend rig for four reasons:
It makes it easy to install any version of R.
It makes it easy to install multiple versions of R and easily switch between them.
It sets you up for package installation success by installing pak and picking good defaults for your CRAN mirrors (more on that in Chapter 12).
On Linux, it installs R binaries built by https://github.com/rstudio/r-builds and these binaries also include backported fixes for any major CVEs.
The main downside to using rig is that installing it the first time a bit more work, and you’ll need to follow the instructions on its website. But once you’ve got it, it’s really easy to install new versions of R:
rig install release
Or change your default R back to an older version:
rig default 3.6
rig also works great on Windows and Mac and we recommend you use it whenever you want to install R.
10.3 Locales
The system locale defines the regional settings that affect how certain types of data are display and processed by R. It includes things like your time zone and your language, which affects how strings are sorted, how upper and lower case work, how dates are displayed and parsed, and how numbers are displayed.
You almost certainly have your laptop set up with your current time zone and whatever regional settings make the most for you. When you run your code on a Linux server, it’s likely to be in a default state where the time zone is UTC and the language is English. Although these problems probably won’t affect you that much, it’s important to know which R functions are likely to give you different results locally and on the server:
When you convert a date-time to a string (e.g. by printing it,
format()
ing it, or pass it toas.character())
it will use the system time zone, which is likely to be UTC on a server. To make the time easier to understand you might want to supply thetz
that you work in. You can find the name of the timezone that your laptop uses by runningSys.timezone()
and learn more about timezone names in general in R4DS.Whenever you
sort()
,order()
, orrank()
a character vector, R will use the current locale to determine the sorting order. On Linux server likely to default to C ordering, which orders strings by their underlying numeric representation. This is unlikely to be what you want, even for English, because it sorts upper case and lower case letters far apart. You can instead usestringr::str_sort()
,stringr::str_order()
, andstringr::str_rank()
which all take an explicit locale argument.When you create a new
factor()
, it creates the levels from the sorted unique values. Because sorting varies (as above) this means that your factor levels might vary, and because factor levels define the contrasts this means that coefficients of models can differ (but not their predictions). Instead you can explicitly supply thelevels
or useforcats::fct()
which uses the unique values in the order that they appear.toupper()
andtolower()
can vary based on the current locale. For example, Turkish has a dotless i, ı, which is the lower case form of I. There are relatively few languages where this matters2 but it’s worth knowing about the problem and the solution: switchingstringr::str_to_upper()
andstringr::str_to_lower()
.strptime()
, which parses dates and times, relies on the current locale for day of week (%a
/%A)
and month name (%b
/%B
). For example, English has Monday and January, but French has lundi and janvier, and Korean has 일요일 and 1월. If you’re parsing date-times and need to control which language is used, you can usereadr::parse_date()
,lubridate::ymd()
and friends, orclock::date_parse()
. All of these functions take an explicit locale argument.
Finally, note the OutDec
option which determines what character is used for the decimal place:
This doesn’t affect parsing, so is less likely to cause problems, but you may need to explicit set it if numbers are not correctly formatted in your output.
10.4 Plots
10.4.1 Graphics devices
If you are producing PNG graphics, be aware that the underlying implementation of the png()
graphics device varies from platform to platform. That means the rendering of your plots is going to be a little different when rendering them on Linux (but you may find they look better than on Windows, where the default png device is not so good). Most of the time this isn’t too important, but if you really care about the details of your plots it’s worth knowing how to do better.
Fortunately the solution is easy: use the ragg package. As well as creating identical plots on every platform, it’s also faster, provides advanced text rendering (including right-to-left text and emoji), consistently high-quality graphics rendering, and more convenient access to system fonts. How to use it depends on how you’re creating your plots:
If you’re manually creating plots, switch from
png()
toragg::agg_png()
.For ggplot2,
ggplot2::ggsave()
will use ragg if it’s installed.For RStudio, follow the advice at https://ragg.r-lib.org/#use-ragg-in-rstudio.
For knitr, change the default plotting device by including the following code in your setup chunk:
knitr::opts_chunk$set(dev = "ragg_png")
.For Shiny,
plotOutput()
will use use ragg if it’s installed.
If you want ggplot2 and Shiny to use ragg in your production environment, you’ll need to explicitly add it as a dependency by including requireNamespace(ragg)
somewhere in your code. That will ensure it gets captured in your manifest and installed in your production environment.
Alternatively, if you need vector plots, you can try svglite. This is similarly designed to produce identical plots on every platform, but instead of using the raster .png
format, it uses the vector .svg
format. For plots with relatively few elements this can produce files that are both higher quality and have a smaller size.
10.4.2 Fonts
Fonts are more likely to make a difference to the rendering of your plot than the graphics because different operating systems come with different fonts. This will inevitably crop up, even if you use ragg, since the default fonts ("sans"
, "serif"
, "mono"
, and "symbol"
) are mapped to different fonts. But if you decide to be explicit about the fonts by provide their full name, you’ll run into a different problem: custom fonts probably aren’t installed in your production environment. And while you probably know how to install a font on your laptop, installing installing fonts on a Linux server is a different challenge entirely.
There is no perfect solution at the moment. You can ask your system administrator to install fonts, but this is far from nimble. If you use ragg or svglite you can make use of systemfonts ability to register fonts not installed on the system and provide the font files as part of the deployment. We plan to improve on this process in the future so that fonts can be declared in your code and automatically fetched if missing, but we are not there yet.
This is the primary difference between text and binary mode connections in R: when writing to a text mode connection any
\n
is automatically converted to\r\n
on Windows.↩︎Mostly because there are relatively few languages that have both upper and lower case letters and different rules to English, but the differing rules for Turkish did cause a real bug in ggplot2!↩︎