Efficient R Programming: A Practical Guide To Smarter Programming


EfficientRProgramming
APracticalGuidetoSmarterProgramming
ColinGillespieandRobinLovelace
EfficientRProgramming
byColinGillespieandRobinLovelace
Copyright©2017ColinGillespie,RobinLovelace.Allrightsreserved.
PrintedintheUnitedStatesofAmerica.
PublishedbyO’ReillyMedia,Inc.,1005GravensteinHighwayNorth,Sebastopol,CA95472.
O’Reillybooksmaybepurchasedforeducational,business,orsalespromotionaluse.Online
editionsarealsoavailableformosttitles(http://oreilly.com/safari).Formoreinformation,
contactourcorporate/institutionalsalesdepartment:800-998-9938orcorporate@oreilly.com.
Editor:NicoleTache
ProductionEditor:NicholasAdams
Copyeditor:GillianMcGarvey
Proofreader:ChristinaEdwards
Indexer:WordCoIndexingServices
InteriorDesigner:DavidFutato
CoverDesigner:RandyComer
Illustrator:RebeccaDemarest
December2016:FirstEdition
RevisionHistoryfortheFirstEdition
2016-11-29:FirstRelease
Seehttp://oreilly.com/catalog/errata.csp?isbn=9781491950784forreleasedetails.
TheOReillylogoisaregisteredtrademarkofO’ReillyMedia,Inc.EfficientRProgramming,
thecoverimage,andrelatedtradedressaretrademarksofOReillyMedia,Inc.
Whilethepublisherandtheauthorshaveusedgoodfaitheffortstoensurethattheinformation
andinstructionscontainedinthisworkareaccurate,thepublisherandtheauthorsdisclaimall
responsibilityforerrorsoromissions,includingwithoutlimitationresponsibilityfor
damagesresultingfromtheuseoforrelianceonthiswork.Useoftheinformationand
instructionscontainedinthisworkisatyourownrisk.Ifanycodesamplesorother
technologythisworkcontainsordescribesissubjecttoopensourcelicensesorthe
intellectualpropertyrightsofothers,itisyourresponsibilitytoensurethatyourusethereof
complieswithsuchlicensesand/orrights.
978-1-491-95078-4
[LSI]
Preface
EfficientRProgrammingisaboutincreasingtheamountofworkyoucandowithRinagiven
amountoftime.Itsaboutbothcomputationalandprogrammerefficiency.Therearemany
excellentRresourcesabouttopicssuchasvisualization(e.g.,Chang2012),datascience(e.g.,
GrolemundandWickham2016),andpackagedevelopment(e.g.,Wickham2015).Thereare
evenmoreresourcesonhowtouseRinparticulardomains,includingBayesianstatistics,
machinelearning,andgeographicinformationsystems.However,thereareveryfewunified
resourcesonhowtosimplymakeRworkeffectively.Hints,tips,anddecadesofcommunity
knowledgeonthesubjectarescatteredacrosshundredsofinternetpages,emailthreads,and
discussionforums,makingitchallengingforRuserstounderstandhowtowriteefficient
code.
Inourteachingwehavefoundthatthisissueappliestobeginnersandexperiencedusersalike.
WhetheritsaquestionofunderstandinghowtouseRsvectorobjectstoavoidforloops,
knowinghowtosetupyour.Rprofileand.Renvironfiles,ortheabilitytoharnessR’s
excellentC++interfacetodotheheavylifting,theconceptofefficiencyiskey.Thebookaims
todistilltips,warnings,andtricksofthetradeintoasingle,cohesivewholethatprovidesa
usefulresourcetoRprogrammersofallstripesforyearstocome.
Thecontentofthebookreflectsthequestionsthatourstudentsfromarangeofdisciplines,
skilllevels,andindustrieshaveaskedovertheyearstomaketheirRworkfaster.Howtoset
upmysystemoptimallyforRprogrammingwork?Howcanoneapplygeneralprinciples
fromcomputerscience(suchasdonotrepeatyourself,akaDRY)tothespecificsofanR
script?HowcanRcodebeincorporatedintoanefficientworkflow,includingproject
inception,collaboration,andwrite-up?Andhowcanonequicklylearnhowtousenew
packagesandfunctions?
Thebookanswersthesequestionsandmorein10self-containedchapters.Eachchapterstarts
withthebasicsandgetsprogressivelymoreadvanced,sothereissomethingforeveryonein
eachone.WhilemoreadvancedtopicssuchasparallelprogrammingandC++maynotbe
immediatelyrelevanttoRbeginners,thebookhelpstonavigateRsinfamouslysteeplearning
curvewithacommitmenttostartingslowandbuildingonstrongfoundations.Thuseven
experiencedRusersarelikelytofindpreviouslyhiddengemsofadvice.Whileteachingthis
material,wecommonlyhear“Whydidn’tanyonetellmethatbefore?”
Efficientprogrammingshouldnotbeseenasanoptionalextra,andtheimportanceof
efficiencygrowswiththesizeofprojectsanddatasets.Infact,thisbookwasdevisedwhile
teachingacoursecalledRforBigData,whenitquicklybecameapparentthatifyouwantto
workwithlargedatasets,yourcodemustworkefficiently.Evenwithsmalldatasets,efficient
codethatisbothfasttowriteandfasttorunisavitalcomponentofsuccessfulRprojects.We
foundthattheconceptofefficientprogrammingisimportantinallbranchesoftheR
community.WhetheryouareasporadicuserofR(e.g.,foritsunbeatablerangeofstatistical
packages),lookingtodevelopapackage,orworkingonalargecollaborativeprojectin
whichefficiencyismission-critical,codeefficiencywillhaveamajorimpactonyour
productivity.
Ultimately,efficiencyisaboutgettingmoreoutputforlessworkinput.Totaketheanalogyof
acar,wouldyouratherdrive1,000kmonasingletank(orasinglechargeofbatteries)or
refuelaheavy,clunky,uglycarevery50km?Orwouldyouprefertochooseanaltogether
moreefficientvehicleandcycle?Inthesameway,efficientRcodeisbetterthaninefficientR
codeinalmosteveryway:itiseasiertoread,write,run,share,andmaintain.Thisbook
cannotprovidealltheanswersabouthowtoproducesuchcode,butitcertainlycanprovide
ideas,examplecode,andtipstomakeastartintherightdirectionoftravel.
ConventionsUsedinThisBook
Thefollowingtypographicalconventionsareusedinthisbook:
Italic
Indicatesnewterms,URLs,emailaddresses,filenames,andfileextensions.
Bold
IndicatesthenamesofRpackages.
Constantwidth
Usedforprogramlistings,aswellaswithinparagraphstorefertoprogramelements
suchasvariableorfunctionnames,databases,datatypes,environmentvariables,
statements,andkeywords.
Constantwidthbold
Showscommandsorothertextthatshouldbetypedliterallybytheuser.
Constantwidthitalic
Showstextthatshouldbereplacedwithuser-suppliedvaluesorbyvaluesdeterminedby
context.
T IP
Thiselementsignifiesatiporsuggestion.
NOT E
Thiselementsignifiesageneralnote.
WARNING
Thiselementindicatesawarningorcaution.
UsingCodeExamples
Supplementalmaterial(codeexamples,exercises,etc.)isavailablefordownloadat
https://github.com/csgillespie/efficient.
Thisbookisheretohelpyougetyourjobdone.Ingeneral,ifexamplecodeisofferedwith
thisbook,youmayuseitinyourprogramsanddocumentation.Youdonotneedtocontactus
forpermissionunlessyou’rereproducingasignificantportionofthecode.Forexample,
writingaprogramthatusesseveralchunksofcodefromthisbookdoesnotrequire
permission.SellingordistributingaCD-ROMofexamplesfromO’Reillybooksdoes
requirepermission.Answeringaquestionbycitingthisbookandquotingexamplecodedoes
notrequirepermission.Incorporatingasignificantamountofexamplecodefromthisbook
intoyourproductsdocumentationdoesrequirepermission.
Weappreciate,butdonotrequire,attribution.Anattributionusuallyincludesthetitle,author,
publisher,andISBN.Forexample:“EfficientRProgrammingbyColinGillespieandRobin
Lovelace(O’Reilly).Copyright2017ColinGillespie,RobinLovelace,978-1-491-95078-4.
Ifyoufeelyouruseofcodeexamplesfallsoutsidefairuseorthepermissiongivenabove,
feelfreetocontactusatpermissions@oreilly.com.
OReillySafari
Safari(formerlySafariBooksOnline)isamembership-basedtrainingandreference
platformforenterprise,government,educators,andindividuals.
Membershaveaccesstothousandsofbooks,trainingvideos,LearningPaths,interactive
tutorials,andcuratedplaylistsfromover250publishers,includingOReillyMedia,Harvard
BusinessReview,PrenticeHallProfessional,Addison-WesleyProfessional,MicrosoftPress,
Sams,Que,PeachpitPress,Adobe,FocalPress,CiscoPress,JohnWiley&Sons,Syngress,
MorganKaufmann,IBMRedbooks,Packt,AdobePress,FTPress,Apress,Manning,New
Riders,McGraw-Hill,Jones&Bartlett,andCourseTechnology,amongothers.
Formoreinformation,pleasevisithttp://oreilly.com/safari.
HowtoContactUs
Pleaseaddresscommentsandquestionsconcerningthisbooktothepublisher:
O’ReillyMedia,Inc.
1005GravensteinHighwayNorth
Sebastopol,CA95472
800-998-9938(intheUnitedStatesorCanada)
707-829-0515(internationalorlocal)
707-829-0104(fax)
Wehaveawebpageforthisbook,wherewelisterrata,examples,andanyadditional
information.Youcanaccessthispageathttp://bit.ly/efficient-r-programming.
Tocommentorasktechnicalquestionsaboutthisbook,sendemailto
bookquestions@oreilly.com.
Formoreinformationaboutourbooks,courses,conferences,andnews,seeourwebsiteat
http://www.oreilly.com.
FindusonFacebook:http://facebook.com/oreilly
FollowusonTwitter:http://twitter.com/oreillymedia
WatchusonYouTube:http://www.youtube.com/oreillymedia
Acknowledgments
Thisbookwaswrittenintheopen,andmanypeoplecontributedpullrequeststofixminor
problems.SpecialthanksgoestoOReillywhoallowedthisprocessandeveryonewho
contributedviaGitHub:@Delvis,@richelbilderbeek,@adamryczkowski,@CSJCampbell,
@tktan,@nachti,ConorLawless,@timcdlucas,DirkEddelbuettel,@wolfganglederer,
@HenrikBengtsson,@giocomai,and@daattali.
Manythanksalsotothedetailedfeedbackfromthetechnicalreviewers,RichardCottonand
GarrettGrolemund.
Colin
ToEsther,Nathan,andNiamh.Thanksforyourpatience.
Robin
ThankstomyhousematesinCornerstoneHousingCooperativeforputtingupwithmebeing
antisocialwhileinbookmode.ToeveryoneattheUniversityofLeedsforencouragingmeto
pursueprojectsoutsidetheusualacademicpursuitsofjournalarticlesandconferences.And
thankstoeveryoneinvolvedinthecommunityofopensourcedevelopers,users,and
communicatorswhomadeallthispossible.
Chapter1.Introduction
Thischapterdescribesthewiderangeofpeoplethisbookwaswrittenfor,intermsofRand
programmingexperience,andhowyoucangetthemostoutofit.Anyonesettingoutto
improveefficiencyshouldhaveanunderstandingofpreciselywhattheymeanbytheterm,
andthisisdiscussedwithreferencetoalgorithmicandprogrammerefficiencyin“WhatIs
Efficiency?”,andwithreferencetoRinparticularin“WhatIsEfficientRProgramming?on
thesamepage.Itmayseemobvious,butit’salsoworththinkingaboutwhyanyonewould
botherwithefficientcodenowthatpowerfulcomputersarecheapandaccessible.Thisis
coveredin“WhyEfficiency?.
ThisbookhappilyisnotcompletelyR-specific.NonR–programmingskillsthatareneeded
forefficientRprogramming,whichyouwilldevelopduringthecourseoffollowingthis
book,arecoveredin“Cross-TransferableSkillsforEfficiency”.Atypicallyforabookabout
programming,thissectionintroducestouchtypingandconsistency,cross-transferableskills
thatshouldimproveyourefficiencybeyondprogramming.However,thisisfirstand
foremostabookaboutprogramminganditwouldn’tbesowithoutcodeexamplesinevery
chapter.Despitebeingmoreconceptualanddiscursive,thisopeningchapterisnoexception:
itspenultimatesection(“BenchmarkingandProfiling)describestwoessentialtoolsinthe
efficientRprogrammerstoolboxandhowtousethemwithacoupleofillustrativeexamples.
Thefinalthingtosayattheoutsetishowtousethisbookinconjunctionwiththebook’s
associatedpackageanditssourcecode.Thisiscoveredin“BookResources.
Prerequisites
Asemphasizedinthenextsection,itsusefultoruncodeandexperimentasyouread.This
section,foundatthebeginningofeachchapter,ensuresthatyouhavethenecessarypackages
foreachchapter.Theprerequisitesforthischapterare:
AworkinginstallationofRonyourcomputer(see“InstallingandUpdatingRStudio).
Installandloadthemicrobenchmark,profvis,andggplot2packages(see“InstallingR
Packagesfortipsoninstallingpackagesandkeepingthemup-to-date).Youcanensure
thatthesepackagesareinstalledbyloadingthemasfollows:
library("microbenchmark")
library("profvis")
library("ggplot2")
Theprerequisitesneededtorunthecodecontainedintheentirebookarecoveredin“Book
Resources”attheendofthischapter.
WhoThisBookIsforandHowtoUseIt
ThisbookisforanyonewhowantstomaketheirRcodefastertotype,fastertorun,and
morescalable.TheseconsiderationsgenerallycomeafterlearningtheverybasicsofRfor
dataanalysis;weassumeyouareeitheraccustomedtoRorproficientatprogrammingin
otherlanguages,althoughthisbookcouldstillbeofuseforbeginners.Thusthebookshould
beusefultopeoplewitharangeofskilllevels,whocanbroadlybedividedintothreegroups:
ForprogrammerswithlittleexperiencewithR
ThisbookwillhelpyounavigatethequirksofRtomakeitworkefficiently:itiseasyto
writeslowRcodeifyoutreatitasifitwereanotherlanguage.
ForRuserswithlittleexperienceinprogramming
Thisbookwillshowyoumanyconceptsandtricksofthetrade,someofwhichare
borrowedfromcomputerscience,thatwillmakeyourworkmoretimeeffective.
ForRbeginnerswithlittleexperienceinprogramming
Thisbookcansteeryoutogetthingsright(oratleastlesswrong)attheoutset.Bad
habitsareeasytogainbuthardtolose.Readingthisbookattheoutsetofyour
programmingcareercouldsaveyoumanyhoursinthefuturesearchingthewebfor
issuescoveredinthisbook.
Identifyingwhichgroupyoubestfitintowillhelpyougetthemostoutofit.Foreveryone,we
recommendreadingEfficientRProgrammingwhileyouhaveanactiveRprojectonthego,
whetheritsacollaborativetaskatworkorsimplyapersonalprojectathome.Why?The
scopeofthisbookiswiderthanmostprogrammingtextbooks(Chapter4coversproject
management,forexample)andworkingonaprojectoutsidetheconfinesofitwillhelpput
theconcepts,recommendations,andcodeintopractice.Goingdirectlyfromwordsintoaction
inthiswaywillhelpensurethattheinformationisconsolidated:learnbydoing.
Ifyou’reanRnoviceandfitintothefinalcategory,werecommendthatthisactiveRproject
notbeanimportantdeliverable,butanotherRresource.Thoughthisbookisgeneric,itis
likelythatyourusageofRwillbelargelydomain-specific.Forthisreason,werecommend
readingitalongsideteachingmaterialinyourchosenarea.Furthermore,weadvocatethatall
readersusethisbookalongsideotherRresourcessuchasthenumerousvignettes,tutorials,
andonlinearticlesthattheRcommunityhasproduced(describedinthefollowingtip).Ata
bareminimum,youshouldbefamiliarwithdataframes,looping,andsimpleplots,whichyou
willlearnfromtheseresources.
RESOURCES FOR LEARNING R
There are many places to find generic and domain-specific R teaching materials. For complete beginners, there are a
number of introductory resources, such as the excellent Student’s Guide to R and the more technical IcebreakeR tutorial.
R also comes preinstalled with guidance, revealed by entering help.start() into the R console, including the classic
official guide An Introduction to R, which is excellent, but daunting to many. Entering vignette() will display a list of
guides packaged within your R installation (and hence do not require an internet connection). To see the vignette for a
specific topic, just enter the vignette’s name into the same command; for example, vignette(package = "dplyr",
"introduction") displays the introductory vignette for the dplyr package.
Another early port of call should be the Comprehensive R Archive Network (CRAN) website. The Contributed
Documentation page contains a list of contributed resources, mainly tutorials, on subjects ranging from map making to
econometrics. The new bookdown website contains a list of complete (or near complete) books that cover domains such
as R for Data Science and Authoring Books with R Markdown. We recommend keeping your eye on the R-o-sphere via
the R-Bloggers website, popular Twitter feeds, and CRAN-affiliated email lists for up-to-date materials that can be used
in conjunction with this book.
WhatIsEfficiency?
Ineverydaylife,efficiencyroughlymeansworkingwell.Anefficientvehiclegoesfarwithout
guzzlinggas.Anefficientworkergetsthejobdonefastwithoutstress.Andanefficientlight
shinesbrightlywithaminimumofenergyconsumption.Inthisfinalsense,efficiency(η)has
aformaldefinitionastheratioofworkdone(W,lightoutput)peruniteffort(Q,energy
consumptioninthiscase):
Howdoesthistranslateintoprogramming?Efficientcodecanbedefinednarrowlyor
broadly.Thefirst,morenarrowdefinitionisalgorithmicefficiency:howfastthecomputer
canundertakeapieceofworkgivenaparticularpieceofcode.Thisconceptdatesbacktothe
veryoriginsofcomputing,asillustratedbythefollowingquotebyAdaLovelace(1842)in
hernotesontheworkofCharlesBabbage:
Inalmosteverycomputationagreatvarietyofarrangementsforthesuccessionofthe
processesispossible,andvariousconsiderationsmustinfluencetheselectionsamongst
themforthepurposesofacalculatingengine.Oneessentialobjectistochoosethat
arrangementwhichshalltendtoreducetoaminimumthetimenecessaryforcompleting
thecalculation.
Thesecond,broaderdefinitionofefficientcomputingisprogrammerproductivity.Thisisthe
amountofusefulworkaperson(notacomputer)candoperunittime.Itmaybepossibleto
rewriteyourcodebaseinCtomakeit100timesfaster.Butifthistakes100humanhours,it
maynotbeworthit.Computerscanchugawaydayandnight.Peoplecannot.Human
productivityisthesubjectofChapter4.
Bytheendofthisbook,youshouldknowhowtowritecodethatisefficientfromboth
algorithmicandproductivityperspectives.Efficientcodeisalsoconcise,elegant,andeasyto
maintain,whichisvitalwhenworkingonlargeprojects.Butthisraisesthewiderquestion:
whatisdifferentaboutefficientRcodecomparedwithefficientcodeinanyotherlanguage?
WhatIsEfficientRProgramming?
TheissueflaggedbyAdaofhavingagreatvarietyofwaystosolveaproblemiskeyto
understandinghowefficientRprogrammingdiffersfromefficientprogramminginother
languages.Risnotoriousforallowinguserstosolveproblemsinmanyways.Thisisdueto
R’sinherentflexibility,inwhichalmost“anythingcanbemodifiedafteritiscreated”
(Wickham2014).Rsinventors,RossIhakaandRobertGentleman,designedittobethisway:
acellinadataframecanbeselectedinmultiplewaysinbaseRalone(threeofwhichare
illustratedlaterinthischapter,in“BenchmarkingExample”).Thisisusefulbecauseitallows
programmerstousethelanguageasbestsuitstheirneeds,butitcanbeconfusingforpeople
lookingfortherightwayofdoingthingsandcancauseinefficienciesifyoudon’tfully
understandthelanguage.
R’snotorietyforbeingabletosolveaprobleminmultiplewayshasgrownwiththe
proliferationofcommunity-contributedpackages.Inthisbook,wefocusonthebestwayof
solvingproblemsfromanefficiencyperspective.Oftenitisinstructivetodiscoverwhya
certainwayofdoingthingsisfasterthanotherways.However,ifyouraimissimplytoget
stuffdone,youonlyneedtoknowwhatislikelytobethemostefficientway.Inthisway,Rs
flexibilitycanbeinefficient:althoughitmaybeeasiertofindawayofsolvinganygiven
probleminRthanotherlanguages,solvingtheproblemwithRmaymakeithardertofindthe
bestwaytosolvethatproblem,astherearesomany.Thisbooktacklesthisissueheadonby
recommendingwhatwebelievearethemostefficientapproaches.Wehopeyoutrustour
views,basedonyearsofusingandteachingR,butwealsohopethatyouchallengethemat
timesandtestthemwithbenchmarksifyoususpectthere’sabetterwayofdoingthings
(thankstoR’sflexibilityandabilitytointerfacewithotherlanguages,theremaywellbe).
ItiswellknownthatRcodecanlackalgorithmicefficiencycomparedwithlow-level
languagesforcertaintasks,especiallyifitwaswrittenbysomeonewhodoesn’tfully
understandthelanguage.ButitisworthhighlightingthenumerouswaysthatRencourages
andguidesefficiency,especiallyprogrammerefficiency:
Risnotcompiled,butitcallscompiledcode.Thismeansthatyougetthebestofboth
worlds:thankfully,Rremovesthelaboriousstageofcompilingyourcodebeforebeing
abletorunit,butprovidesimpressivespeedgainsbycallingcompiledC,FORTRAN,
andotherlanguagebehindthescenes.
Risafunctionalandobject-orientatedlanguage(Wickham2014).Thismeansthatitis
possibletowritecomplexandflexiblefunctionsinRthatgetahugeamountofwork
donewithasinglelineofcode.
RusesRAMformemory.Thismayseemobvious,butitsworthsaying:RAMismuch
fasterthananyharddisksystem.Comparedwithdatabases,Risthereforeveryfastat
commondatamanipulation,processing,andmodelingoperations.RAMisnowcheaper
thanever,meaningthepotentialdownsidesofthisfeaturearefurtherawaythanever.
Rissupportedbyexcellentintegrateddevelopmentenvironments(IDEs).The
environmentinwhichyouprogramcanhaveahugeimpactonprogrammerefficiency
asitcanprovidehelpquickly,allowforinteractiveplotting,andallowyourRprojectsto
betightlyintegratedwithotheraspectsofyourprojectsuchasfilemanagement,version
management,andinteractivevisualizationsystems,asdiscussedinRStudio.
Rhasastrongusercommunity.Thisboostsefficiencybecauseifyouencountera
problemthathasnotyetbeensolved,youcansimplyaskthecommunity.Ifitisanew,
clearlystated,andreproduciblequestionaskedonapopularforumsuchasStack
OverfloworanappropriateRlist,youarelikelytogetaresponsefromanaccomplished
Rprogrammerwithinminutes.Theobviousbenefitofthiscrowd-sourcedsupport
systemisthattheefficiencybenefitsoftheanswerwill,fromthatmomenton,be
availabletoeveryone.
EfficientRprogrammingistheimplementationofefficientprogrammingpracticesinR.All
languagesaredifferent,soefficientRcodedoesnotlooklikeefficientcodeinanother
language.Manypackageshavebeenoptimizedforperformanceso,forsomeoperations,
achievingmaximumcomputationalefficiencymaysimplybeacaseofselectingthe
appropriatepackageandusingitcorrectly.TherearemanywaystogetthesameresultinR,
andsomeareveryslow.Therefore,notwritingslowcodeshouldbeprioritizedoverwriting
fastcode.
Returningtotheanalogyofthetwocarssketchedinthepreface,efficientRprogrammingfor
someusecasescansimplymeantradinginyourold,heavy,gas-guzzlingSUVfunctionfora
lightweightvelomobile.Thesearchforoptimalperformanceoftenhasdiminishingreturns,
soitisimportanttofindbottlenecksinyourcodetoprioritizeworkformaximumincreases
incomputationalefficiency.LinkingbacktoR’snotorietyasaflexiblelanguage,efficientR
programmingcanbeinterpretedasfindingasolutionthatisfastenoughintermsof
computationalefficiencybutasfastaspossibleintermsofprogrammerefficiency.Afterall,
youandyourcoworkersprobablyhavebetterandmorevaluablethingstodooutsidework,
soitisimportantthatyougetthejobdonequicklyandtaketimeoffforotherinteresting
pursuits.
WhyEfficiency?
Computersarealwaysgettingmorepowerful.Doesthisnotreducetheneedforefficient
computing?Theanswerissimple:no.InanageofBigDataandstagnatingcomputerclock
speeds(seeChapter8),computationalbottlenecksaremorelikelythaneverbeforetohamper
yourwork.Anefficientprogrammercan“solvemorecomplextasks,askmoreambitious
questions,andincludemoresophisticatedanalysesintheirresearch”(Visseretal.2015).
Aconcreteexampleillustratestheimportanceofefficiencyinmission-criticalsituations.
RobinwasworkingonatightcontractfortheUKsDepartmentforTransporttobuildthe
PropensitytoCycleTool,anonlineapplicationthathadtobereadyfornationaldeployment
inlessthanfourmonths.Forthiswork,hedevelopedthefunctionline2route()inthe
stplanrpackagetogenerateroutesviathe(CycleStreets)API.Hundredsofthousandsof
routeswereneeded,but,tohisdismay,codeslowedtoastandstillafteronlyafewthousand
routes.Thisendangeredthecontract.Aftereliminatingotherissuesandviacodeprofiling
(coveredinCodeProfiling),itwasfoundthattheslowdownwasduetoabugin
line2route():itsufferedfromthevectorgrowingproblem,discussedinMemory
Allocation”.
Thesolutionwassimple.Asinglecommitmadeline2route()morethantentimesfasterand
substantiallyshorter.Thispotentiallysavedtheprojectfromfailure.Themoralofthisstoryis
thatefficientprogrammingisnotmerelyadesirableskill—itcanbeessential.
Therearemanyconceptsandskillsthatarelanguage-agnostic.Muchoftheknowledge
impartedinthisbookshouldberelevanttoprogramminginotherlanguages(andother
technicalactivitiesbeyondprogramming).Therearestrongreasonsforfocusingon
efficiencyinonelanguage,however.InR,simplyusingreplacementfunctionsfroma
differentpackagecangreatlyimproveefficiency,asdiscussedinrelationtoreadingtextfiles
inChapter5.Thislevelofdetail,withreproducibleexamples,wouldnotbepossibleina
general-purposeefficientprogrammingbook.Skillsforefficientworking,whichapply
beyondRprogramming,arecoveredinthenextsection.
Cross-TransferableSkillsforEfficiency
ThemeaningofefficientRcode,asopposedtogenericefficientcode,shouldbeclearfrom
readingtheprecedingtwosections.However,thatdoesnotmeanthattheskillsandconcepts
coveredinthisbookarenottransferabletootherlanguagesandnon-programmingtasks.
Likewise,workingonthesecross-transferableskillswillimproveyourRprogramming(as
wellasotheraspectsofyourworkinglife).Twooftheseskillsareespeciallyimportant:touch
typinganduseofaconsistentstyle.
TouchTyping
Theothersideoftheefficiencycoinisprogrammerefficiency.Therearemanythingsthat
willhelpincreasetheproductivityofyouandyourcollaborators,notleastfollowingthe
adviceofPhilippJanertto“thinkmore,workless”(Janert2010).Theevidencesuggeststhat
gooddiet,physicalactivity,plentyofsleep,andahealthywork-lifebalancecanallboostyour
speedandeffectivenessatwork(Jensen2011;Pereiraetal.2015;Grant,Spurgeon,and
Wallace2013).
Whilewerecommendthatthereaderreflectonthisevidenceandtheirownwell-being,thisis
notaself-helpbook.Itisabookaboutprogramming.However,thereisone
nonprogrammingskillthatcanhaveahugeimpactonproductivity:touchtyping.Thisskill
canberelativelypainlesstolearn,andcanhaveahugeimpactonyourabilitytowrite,
modify,andtestRcodequickly.Learningtotouchtypeproperlywillpayoffinsmall
incrementsthroughouttherestofyourprogramminglife(ofcourse,thebenefitsarenot
constrainedtoRprogramming).
Thekeydifferencebetweenatouchtypistandsomeonewhoconstantlylooksdownatthe
keyboard,orwhousesonlytwoorthreefingersfortyping,ishandplacement.Touchtyping
involvespositioningyourhandsonthekeyboardwitheachfingerofbothhandstouchingor
hoveringoveraspecificletter(Figure1-1).Thistakestimeandsomedisciplinetolearn.
Fortunatelytherearemanyresourcesthatwillhelpyougetinthehabitearly,includingthe
opensourcesoftwareprojectsKlavaroandTypeFaster.
Figure1-1.Thestartingpositionfortouchtyping,withthefingersoverthehomekeys.Source:Wikipediaunderthe
CreativeCommonslicense.
ConsistentStyleandCodeConventions
Gettingintothehabitofclearandconsistentstylewhenwritinganything,beitcodeorpoetry,
willhavebenefitsinmanyotherprojects,programmingornon-programming.Asoutlinedin
“CodingStyle”,styleistosomeextentapersonalpreference.However,itisworthnotingthe
conventionsweuseattheoutsetofthisbook,tomaximizeitsreadability.Throughoutthis
bookweuseaconsistentsetofconventionstorefertocode.
Packagenamesareinbold,e.g.,dplyr.
Functionsareinacodefont,followedbyparentheses,likeplot()ormedian().
OtherRobjects,suchasdataorfunctionarguments,areinacodefontwithout
parentheses,likexandname.
Occasionally,we’llhighlightthepackageofthefunctionusingtwocolons,like
microbenchmark::microbenchmark().Notethatthisnotationcanbeefficientifyouonly
needtouseapackagesfunctiononce,asitavoidsattachingthepackage.
TheconceptsofbenchmarkingandprofilingarenotR-specific.However,theyaredoneina
particularwayinR,asoutlinedinthenextsection.
BenchmarkingandProfiling
Benchmarkingandprofilingarekeytoefficientprogramming,especiallyinR.
Benchmarkingistheprocessoftestingtheperformanceofspecificoperationsrepeatedly.
Profilinginvolvesrunningmanylinesofcodetofindbottlenecks.Botharevitalfor
understandingefficiency,andweusethemthroughoutthebook.Theircentralitytoefficient
programmingpracticemeanstheymustbecoveredinthisintroductorychapter,despitebeing
seenbymanyasanintermediateoradvancedRprogrammingtopic.
Insomeways,benchmarkscanbeseenasthebuildingblocksofprofiles.Profilingcanbe
understoodasautomaticallyrunningmanybenchmarksforeverylineinascriptand
comparingtheresultslinebyline.Becausebenchmarksaresmaller,easier,andmore
modular,wecoverthemfirst.
Benchmarking
Modifyingelementsfromonebenchmarktothenextandrecordingtheresultsafterthe
modificationenablesustodeterminethefastestpieceofcode.Benchmarkingisimportantin
theefficientprogrammerstoolkit:youmaythinkthatyourcodeisfasterthanmine,but
benchmarkingallowsyoutoproveit.Theeasiestwaytobenchmarkafunctionistouse
system.time().However,itisimportanttorememberthatwearetakingasample.We
wouldn’texpectasinglepersoninLondontoberepresentativeoftheentireUKpopulation;
similarly,asinglebenchmarkprovidesuswithasingleobservationonourfunction’s
behavior.Therefore,we’llneedtorepeatthetimingmanytimeswithaloop.
Analternativewayofbenchmarkingisviatheflexiblemicrobenchmarkpackage.Thisallows
ustoeasilyruneachfunctionmultipletimes(bydefault,100)inordertodetectmicrosecond
differencesincodeperformance.Wethengetaconvenientsummaryoftheresults:the
minimum/maximumandlower/upperquartiles,andthemean/mediantimes.Wesuggest
focusingonthemediantimetogetafeelforthestandardtimeandthequartilestounderstand
thevariability.
BenchmarkingExample
Agoodexampleistestingdifferentmethodstolookupasinglevalueinadataframe.Note
thateachargumentinthefollowingbenchmarkisatermtobeevaluated(formulti-line
benchmarks,thetermtobeevaluatedcanbesurroundedbycurlybrackets,{}).
library("microbenchmark")
df=data.frame(v=1:4,name=letters[1:4])
microbenchmark(df[3,2],df[3,"name"],df$name[3])
#Unit:microseconds
#exprminlqmeanmedianuqmaxnevalcld
#df[3,2]17.9918.9620.1619.3819.7735.14100b
#df[3,"name"]17.9719.1321.4519.6420.1574.00100b
#df$name[3]12.4813.8115.8114.4815.1467.24100a
Theresultssummarizehowlongeachquerytook:theminimum(min);lowerandupper
quartiles(lqanduq,respectively);andthemean,median,andmaximum(max)foreachofthe
numberofevaluations(neval,withthedefaultvalueof100usedinthiscase).cldreportsthe
relativerankofeachrowintheformofcompactletterdisplay:inthiscase,df$name[3]
performsbest,witharankofaandameantimeofaround25%lowerthantheothertwo
functions.
Whenusingmicrobenchmark(),youshouldpaycarefulattentiontotheunits.Intheprevious
example,eachfunctioncalltakesapproximately20microseconds,implyingaround50,000
functioncallscouldbedoneinasecond.Whencomparingquickfunctions,thestandardunits
are:
milliseconds(ms)
Onethousandfunctionstakesasecond;
microseconds(µs)
onemillionfunctioncallstakesasecond;
nanoseconds(ns)
onebillioncallstakesasecond.
Wecansettheunitswewanttousewiththeunitargument(e.g.,theresultsarereportedin
secondsifwesetunit="s").
Whenthinkingaboutcomputationalefficiency,thereare(atleast)twoinmeasures:
Relativetime
df$name[3]is25%fasterthandf[3,"name"];
Absolutetime
df$name[3]isfivemicrosecondsfasterthandf[3,"name"].
Bothmeasuresareuseful,butitisimportantnottoforgettheunderlyingtimescale.Itmakes
littlesensetooptimizeafunctionthattakesmicrosecondsifthereareoperationsthattake
secondstocompleteinyourcode.
Profiling
Benchmarking generally tests the execution time of one function against another. Profiling,
on the other hand, is about testing large chunks of code.
It is difficult to overemphasize the importance of profiling for efficient R programming.
Without a profile of what took longest, you will have only a vague idea of why your code is
taking so long to run. The following example (which generates Figure 1-2, an image of
ice-sheet retreat from 1985 to 2015) shows how profiling can be used to identify bottlenecks in
your R scripts:
library("profvis")
profvis(expr = {
  # Stage 1: load packages
  # library("rnoaa") # not necessary as data pre-saved
  library("ggplot2")
  # Stage 2: load and process data
  out = readRDS("extdata/out-ice.Rds")
  # bind_rows() supersedes the deprecated rbind_all()
  df = dplyr::bind_rows(out, .id = "Year")
  # Stage 3: visualize output
  ggplot(df, aes(long, lat, group = paste(group, Year))) +
    geom_path(aes(colour = Year))
  ggsave("figures/icesheet-test.png")
}, interval = 0.01, prof_output = "ice-prof")
The results of this profiling exercise are displayed in Figure 1-3.
Formoreinformationaboutprofilingandbenchmarking,pleaserefertotheOptimisingcode
chapterinAdvancedRbyHadleyWickham(CRCPress),and“CodeProfilinginthisbook.
Werecommendreadingtheseadditionalresourceswhileperformingbenchmarksand
profilesonyourowncode,perhapsbasedonthefollowingexercises.
Figure1-2.VisualizationofNorthPoleice-sheetdecline,generatedusingthecodeprofiledusingtheprofvispackage
Figure1-3.ProfilingresultsofloadingandplottingNASAdataonice-sheetretreat
Exercises
Consider the following benchmark to evaluate different functions for calculating the
cumulative sum of all the whole numbers from 1 to 100:
x = 1:100 # initiate vector to cumulatively sum
# Method 1: with a for loop (10 lines)
cs_for = function(x) {
  for (i in x) {
    if (i == 1) {
      xc = x[i]
    } else {
      xc = c(xc, sum(x[1:i]))
    }
  }
  xc
}
# Method 2: with apply (3 lines)
cs_apply = function(x) {
  sapply(x, function(x) sum(1:x))
}
# Method 3: cumsum (1 line, not shown)
microbenchmark(cs_for(x), cs_apply(x), cumsum(x))
#> Unit: nanoseconds
#>         expr    min     lq   mean median     uq    max neval
#>    cs_for(x) 248145 316292 386893 370505 436382 697258   100
#>  cs_apply(x) 157610 198157 255241 233324 306013 478394   100
#>    cumsum(x)    561   1131   1796   1422   2075  18284   100
1. Which method is fastest and how many times faster is it?
2. Run the same benchmark, but with the results reported in seconds, on a vector of all
the whole numbers from 1 to 50,000. Hint: also use the argument times = 1 so that
each command is only run once to ensure that the results complete (even with a single
evaluation, the benchmark may take up to or more than a minute to complete,
depending on your system). Does the relative time difference increase or decrease?
By how much?
3. Test how long the different methods for subsetting the data frame df, presented in
“Benchmarking Example”, take on your computer. Is it faster or slower at subsetting
than the computer on which this book was compiled?
4. Use system.time() and a for() loop to test how long it takes to perform the
subsetting operation 50,000 times. Before testing this, do you think it will be more or
less than one second for each subsetting method? Hint: the test for the first method is
shown in the following code:
# Test how long it takes to subset the data frame 50,000 times:
system.time(
  for (i in 1:50000) {
    df[3, 2]
  }
)
5. Bonus exercise: try profiling a section of code you have written using profvis.
Where are the bottlenecks? Were they where you expected?
BookResources
RPackage
ThisbookhasanassociatedRpackagethatcontainsdatasetsandfunctionsreferencedinthe
book.ThepackageishostedonGitHubandcanbeinstalledusingthedevtoolspackage:
devtools::install_github("csgillespie/efficient")
Thepackagealsocontainssolutions(asvignettes)totheexercisesfoundinthisbook.They
canbebrowsedwiththefollowingcommand:
browseVignettes(package="efficient")
Thefollowingcommandwillinstallallpackagesusedtogeneratethisbook:
devtools::install_github("csgillespie/efficientR")
OnlineVersion
WearegratefultoOReillyforallowingustodevelopthisbookonline.Theonlineversion
constitutesasubstantialadditionalresourcetosupplementthisbook,andwillcontinueto
evolveinbetweenreprintsofthephysicalbook.Thebook’scodealsorepresentsasubstantial
learningopportunityinitselfasitwaswrittenusingRMarkdownandthebookdownpackage,
allowingustoruntheRcodeeachtimewecompilethebooktoensurethatitworks,and
allowingotherstocontributetoitslongevity.Toeditthischapter,forexample,simply
navigatetohttps://github.com/csgillespie/efficientR/edit/master/01-introduction.Rmdwhile
loggedintoaGitHubaccount.Thefullsourceofthebookisavailableat
https://github.com/csgillespie/efficientRwherewewelcomecomments/questionsontheIssue
TrackerandPullRequests.
References
Wickham, Hadley. 2014a. Advanced R. CRC Press.
Visser, Marco D., Sean M. McMahon, Cory Merow, Philip M. Dixon, Sydne Record, and Eelke
Jongejans. 2015. “Speeding Up Ecological and Evolutionary Computations in R; Essentials of
High Performance Computing for Biologists.” Edited by Francis Ouellette. PLOS
Computational Biology 11 (3): e1004140. doi:10.1371/journal.pcbi.1004140.
Janert, Philipp K. 2010. Data Analysis with Open Source Tools. O’Reilly Media.
Jensen, Jørgen Dejgård. 2011. “Can Worksite Nutritional Interventions Improve Productivity
and Firm Profitability? A Literature Review.” Perspectives in Public Health 131 (4). SAGE
Publications: 184–92.
Pereira, Michelle Jessica, Brooke Kaye Coombes, Tracy Anne Comans, and Venerina
Johnston. 2015. “The Impact of Onsite Workplace Health-Enhancing Physical Activity
Interventions on Worker Productivity: A Systematic Review.” Occupational and
Environmental Medicine 72 (6). BMJ Publishing Group Ltd: 401–12.
Grant, Christine A, Louise M Wallace, and Peter C Spurgeon. 2013. “An Exploration of the
Psychological Factors Affecting Remote E-Worker’s Job Effectiveness, Well-Being and
Work-Life Balance.” Employee Relations 35 (5). Emerald Group Publishing Limited: 527–46.
Chapter2.EfficientSetup
Anefficientcomputersetupisanalogoustoawell-tunedvehicle.Itscomponentsworkin
harmony.Itiswellserviced.Itsfast!
Thischapterdescribesthesetupthatwillenableaproductiveworkflow.Itexploreshowthe
operatingsystem,Rversion,startupfiles,andIDEcanmakeyourRworkfaster.
Understandingandattimeschangingthesesetupoptionscanhavemanyadditionalbenefits.
That’swhywecoverthematthisearlystage(hardwareiscoveredinChapter3).Bytheendof
thischapter,youshouldunderstandhowtosetupyourcomputerandRinstallationfor
optimalefficiency.Itcoversthefollowingtopics:
Randtheoperatingsystems
SystemmonitoringonLinux,Mac,andWindows
Rversion
HowtokeepyourbaseRinstallationandpackagesup-to-date
Rstart-up
Howandwhytoadjustyour.Rprofileand.Renvironfiles
RStudio
AnIDEtoboostyourprogrammingproductivity
BLASandalternativeRinterpreters
LooksatwaystomakeRfaster
Efficientprogrammingismorethanaseriesoftips:thereisnosubstituteforin-depth
understanding.However,tohelprememberthekeymessagesburiedamongthedetails,each
chapterfromnowoncontainsaTopFiveTipssectionafterthepre-requisites.
Prerequisites
Onlyonepackageneedstobeinstalledtorunthecodeinthischapter:
library("benchmarkme")
TopFiveTipsforanEfficientRSetup
1. Usesystemmonitoringtoidentifybottlenecksinyourhardware/code.
2. KeepyourRinstallationandpackagesup-to-date.
3. MakeuseofRStudiospowerfulautocompletioncapabilitiesandshortcuts.
4. StoreAPIkeysinthe.Renvironfile.
5. ConsiderchangingyourBLASlibrary.
OperatingSystem
Rsupportsallthreemajoroperatingsystem(OS)types:Linux,Mac,andWindows.1Ris
platform-independent,althoughtherearesomeOS-specificquirks,suchasinrelationtofile-
pathnotation(see“TheLocationofStartupFiles”).
BasicOS-specificinformationcanbequeriedfromwithinRusingSys.info():
Sys.info()
#>sysnamereleasemachineuser
#>"Linux""4.2.0-35-generic""x86_64""robin"
TranslatedintoEnglish,theprecedingoutputmeansthatRisrunningona64-bit(x86_64)
Linuxdistribution(4.2.0-35-genericistheLinuxversion)andthatthecurrentuserisrobin.
Fourotherpiecesofinformation(notshown)arealsoproducedbythecommand,the
meaningofwhichiswelldocumentedinahelpfilerevealedbyentering?Sys.infointheR
console.
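Because Sys.info() returns a named character vector, individual fields can be picked out and used in code. The following is a small sketch using only base R; the variable names are our own:

```r
info = Sys.info()

# Fields can be extracted by name
os_name = info[["sysname"]] # e.g., "Linux", "Darwin", or "Windows"
machine = info[["machine"]] # e.g., "x86_64"

# A pointer size of 8 bytes indicates a 64-bit build of R
is_64bit = .Machine$sizeof.pointer == 8

cat("OS:", os_name, "| machine:", machine, "| 64-bit R:", is_64bit, "\n")
```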
TIP
The assertive.reflection package can be used to report additional information about your computer’s operating
system and R setup with functions for asserting operating system and other system characteristics. The assert_*()
functions work by testing the truth of the statement and erroring if the statement is untrue. On a Linux system
assert_is_linux() will run silently, whereas assert_is_windows() will cause an error. The package can also
test for the IDE you are using (e.g., assert_is_rstudio()), the capabilities of R
(assert_r_has_libcurl_capability(), etc.), and what OS tools are available (e.g.,
assert_r_can_compile_code()). These functions can be useful for running code that is designed only to run on
one type of setup.
OperatingSystemandResourceMonitoring
Minordifferencesaside,Rscomputationalefficiencyisbroadlythesameacrossdifferent
operatingsystems.2Beyondthe32-bitversus64-bitissue(coveredinChapter3)andprocess
forking(coveredinChapter7)anotherOS-relatedissuetoconsiderisexternaldependencies:
programsthatRpackagesdependon.Sometimesexternalpackagedependenciesmustbe
installedmanually(i.e.,notusinginstall.packages()).ThisisespeciallycommononUnix-
basedsystems(LinuxandMac).OnDebian-basedoperatingsystemssuchasUbuntu,manyR
packagescanbeinstalledattheOSleveltoensurethatexternaldependenciesarealso
installed(see“InstallingRPackageswithDependencies”).
ResourcemonitoringistheprocessofcheckingthestatusofkeyOSvariables.For
computationallyintensivework,itissensibletomonitorsystemresourcesinthisway.
Resourcemonitoringcanhelpidentifycomputationalbottlenecks.AlongsideRprofiling
functionssuchasprofvis(see“CodeProfiling),systemmonitoringprovidesausefultool
forunderstandinghowRisperforminginrelationtovariablesreportingtheOSstate,suchas
howmuchRAMisinuse,whichrelatestothewiderquestionofwhethermoreisneeded
(coveredinChapter3).
CPUresourcesallocatedovertimeisanothercommonOSvariablethatisworthmonitoring.
Abasicusecaseistocheckwhetheryourcodeisrunninginparallel(seeFigure2-1),and
whetherthereisspareCPUcapacityontheOSthatcouldbeharnessedbyparallelcode.
Figure2-1.Outputfromasystemmonitor(gnome-system-monitorrunningonUbuntu)showingtheresourcesconsumedby
runningthecodepresentedinthesecondoftheExercisesattheendofthissection.ThefirstincreasesRAMuse,thesecond
issingle-threaded,andthethirdismultithreaded.
Systemmonitoringisacomplextopicthatspillsoverintosystemadministrationandserver
management.Fortunately,therearemanytoolsdesignedtoeasemonitoringonallmajor
operatingsystems.
OnLinux,theshellcommandtopdisplayskeyresourceusefiguresformost
distributions.htopandGnome’sSystemMonitor(gnome-system-monitor;seeFigure2-
1)aremorerefinedalternatives,whichusecommand-lineandgraphicaluserinterfaces,
respectively.Anumberofoptions,suchasnethogs,monitorinternetusage.
OnMac,theActivityMonitorprovidessimilarfunctionality.Thiscanbeinitiatedfrom
theUtilitiesfolderinLaunchpad.
OnWindows,theTaskManagerprovideskeyinformationonRAMandCPUuseby
process.ThiscanbestartedinmodernWindowsversionsbypressingCtrl-Alt-Delorby
clickingthetaskbarandStartTaskManager.
Exercises
1. What is the exact version of your computer’s operating system?
2. Start an activity monitor, then execute the following code chunk. In it, lapply() (or
its parallel version, mclapply()) is used to apply the function median() over every
column in the data frame object X (see “The Apply Family” for more on the apply
family of functions). The reason this works is that a data frame is really a list of
vectors, with each vector forming a column. How do the system output log results on
your system compare to those presented in Figure 2-1?
# Note: uses 2+ GB RAM and takes several seconds depending on hardware
# 1: Create large dataset
X = as.data.frame(matrix(rnorm(1e8), nrow = 1e7))
# 2: Find the median of each column using a single core
r1 = lapply(X, median)
# 3: Find the median of each column using many cores
r2 = parallel::mclapply(X, median)
NOTE
mclapply only works in parallel on Mac and Linux. In Chapter 7 you’ll learn about the equivalent
function parLapply() that works in parallel on Windows.
3. What do you notice regarding CPU usage, RAM, and system time during and after
each of the three operations?
4. Bonus question: how would the results change depending on operating system?
RVersion
ItisimportanttobeawarethatRisanevolvingsoftwareproject,whosebehaviorchanges
overtime.Ingeneral,baseRisveryconservativeaboutmakingchangesthatbreakbackwards
compatibility.However,packagesoccasionallychangesubstantiallyfromonereleasetothe
next;typicallyitdependsontheageofthepackage.Formostusecases,werecommend
alwaysusingthemostup-to-dateversionofRandpackagessoyouhavethelatestcode.In
somecircumstances(e.g.,onaproductionserverorworkinginateam),youmay
alternativelywanttousespecificversionsthathavebeentestedtoensurestability.Keeping
packagesup-to-dateisdesirablebecausenewcodetendstobemoreefficient,intuitive,robust,
andfeature-rich.Thissectionexplainshow.
T IP
PreviousRversionscanbeinstalledfromCRAN’sarchiveorpreviousRreleases.Thebinaryversionsforall
OSescanbefoundatcran.r-project.org/bin/.TodownloadbinaryversionsforUbuntuWily,forexample,see
https://cran.r-project.org/bin/linux/ubuntu/wily/.TopinspecificversionsofRpackagesyoucanusethepackrat
package.FormoreonpinningRversionsandRpackages,seethefollowingarticlesonRStudio’swebsite:Using-
Different-Versions-of-Randrstudio.github.io/packrat/.
InstallingR
ThemethodofinstallingRvariesforWindows,Linux,andMac.
OnWindows,asingle.exefile(hostedatcran.r-project.org/bin/windows/base/)willinstall
thebaseRpackage.
OnaMac,thelatestversionshouldbeinstalledbydownloadingthe.pkgfileshostedat
https://cran.r-project.org/bin/macosx/.
OnLinux,theinstallationmethoddependsonthedistributionofLinuxinstalled,thoughthe
principlesarethesame.We’llcoverhowtoinstallRonDebian-basedsystems,withlinksat
theendfordetailsonotherLinuxdistributions.ThefirststageistoaddtheCRANrepository
toensurethatthelatestversionisinstalled.IfyouarerunningUbuntu16.04,forexample,
appendthefollowinglinetothefile/etc/apt/sources.list:
debhttp://cran.rstudio.com/bin/linux/ubuntuxenial/
http://cran.rstudio.comisthemirror(whichcanbereplacedbyanyofthoselistedat
https://cran.r-project.org/mirrors.html)andxenialistherelease.SeetheDebianandUbuntu
installationpagesonCRANforfurtherdetails.
Oncetheappropriaterepositoryhasbeenaddedandthesystemupdated(e.g.,withsudoapt-
getupdate),r-baseandotherr-packagescanbeinstalledusingtheaptsystem.The
followingtwocommands,forexample,wouldinstallthebaseRpackage(abarebonesinstall)
andthepackageRCurl,whichhasanexternaldependency:
sudoapt-getinstallr-cran-base#installbaseR
sudoapt-getinstallr-cran-rcurl#installthercurlpackage
apt-cachesearch"^r-.*"|sortwilldisplayallRpackagesthatcanbeinstalledfromapt
inDebian-basedsystems.InFedora-basedsystems,theequivalentcommandisyumlistR-
\*.
Typicaloutputfromthesecondcommandisillustratedinthefollowingexample:
Thefollowingextrapackageswillbeinstalled:
libcurl3-nss
ThefollowingNEWpackageswillbeinstalled
libcurl3-nssr-cran-rcurl
0toupgrade,2tonewlyinstall,0toremoveand16nottoupgrade.
Needtoget699kBofarchives.
Afterthisoperation,2,132kBofadditionaldiskspacewillbeused.
Doyouwanttocontinue?[Y/n]
Furtherdetailsareprovidedathttps://cran.r-project.org/bin/linux/forDebian,Redhat,and
SuseOSs.RalsoworksonFreeBSDandotherUnix-basedsystems.3
OnceRisinstalled,itshouldbekeptup-to-date.
Updating R
R is a mature and stable language, so well-written code in base R should work on most
versions. However, it is important to keep your R version relatively up-to-date for the
following reasons:
Bug fixes are introduced in each version, making errors less likely.
Performance enhancements are made from one version to the next, meaning your code
may run faster in later versions.
Many R packages only work on recent versions of R.
Release notes with details on each of these issues are hosted at
https://cran.r-project.org/src/base/NEWS. R release versions have three components corresponding to
major.minor.patch changes. Generally, two or three patches are released before the next
minor increment; each patch is released roughly every three months. R 3.2, for example, has
consisted of three versions: 3.2.0, 3.2.1, and 3.2.2.
OnUbuntu-basedsystems,newversionsofRshouldbeautomaticallydetectedthrough
thesoftwaremanagementsystem,andcanbeinstalledwithapt-getupgrade.
OnMac,thelatestversionshouldbeinstalledbytheuserfromthe.pkgfilesmentioned
previously.
OnWindows,theinstallrpackagemakesupdatingeasy:
#checkandinstallthelatestRversion
installr::updateR()
Forinformationaboutchangestoexpectinthenextversion,youcansubscribetoR’sNEWS
RSSfeed.Itsagoodwayofkeepingup-to-date.
InstallingRPackages
Largeprojectsmayneedseveralpackagestobeinstalled.Inthiscase,therequiredpackages
canbeinstalledatonce.Usingtheexampleofpackagesforhandlingspatialdata,thiscanbe
donequicklyandconciselywiththefollowingcode:
pkgs=c("raster","leaflet","rgeos")#packagenames
install.packages(pkgs)
Inthepreviouscode,alltherequiredpackagesareinstalledwithtwo—notthree—lines,
whichreducestyping.Notethatwecannowreusethepkgsobjecttoloadthemall:
inst=lapply(pkgs,library,character.only=TRUE)#loadthem
Inthepreviouscode,library(pkg[i])isexecutedforeverypackagestoredinthetextstring
vector.Weuselibrary()hereinsteadofrequire()becausetheformerproducesanerrorif
thepackageisnotavailable.
Loadingallpackagesatthebeginningofascriptisgoodpracticeasitensuresthatall
dependencieshavebeeninstalledbeforetimeisspentexecutingcode.Storingpackagenames
inacharactervectorobjectsuchaspkgsisalsousefulbecauseitallowsustoreferbackto
themagainandagain.
InstallingRPackageswithDependencies
Somepackageshaveexternaldependencies(i.e.,theycalllibrariesoutsideR).OnUnix-like
systems,thesearebestinstalledontotheoperatingsystem,bypassinginstall.packages.This
willensurethatthenecessarydependenciesareinstalledandsetupcorrectlyalongsidetheR
package.OnDebian-baseddistributionssuchasUbuntu,forexample,packageswithnames
startingwithr-cran-canbesearchedforandinstalledasfollows(seehttps://cran.r-
project.org/bin/linux/ubuntu/foralistofthese):
apt-cachesearchr-cran-#searchforavailablecranDebianpackages
sudoapt-get-installr-cran-rgdal#installthergdalpackage(withdependencies)
OnWindows,theinstallrpackagehelpsmanageandupdateRpackageswithsystem-level
dependencies.Forexample,theRtoolspackageforcompilingC/C++codeonWindowscan
beinstalledwiththefollowingcommand:
installr::install.rtools()
Updating R Packages
An efficient R setup will contain up-to-date packages. This can be done for all packages by
using:
update.packages()
The default for this function is for the ask argument to be set to TRUE, giving control over
what is downloaded onto your system. This is generally desirable because updating dozens of
large packages can consume a large proportion of available system resources.
TIP
To update packages automatically, you can add the line utils::update.packages(ask = FALSE) to the .Last
function in the .Rprofile startup file (see the next section for more on .Rprofile). Thanks to Richard Cotton for this
tip.
An even more interactive method for updating packages in R is provided by RStudio via
Tools → Check for Package Updates. Many such time-saving tricks are enabled by RStudio,
as described in “Installing and Updating RStudio”. Next (after the exercises), we take a look at
how to configure R using startup files.
Exercises
1. What version of R are you using? Is it the most up-to-date?
2. Do any of your packages need updating?
RStartup
EverytimeRstarts,acoupleoffilescriptsarerunbydefault,asdocumentedin?Startup.
Thissectionexplainshowtocustomizethesefiles,allowingyoutosaveAPIkeysorload
frequentlyusedfunctions.Beforelearninghowtomodifythesefiles,we’lltakealookathow
toignorethem,withRsstartuparguments.Ifyouwanttoturncustomsetupon,itsusefulto
beabletoturnitoff(e.g.,fordebugging).
T IP
SomeofR’sstartupargumentscanbecontrolledinteractivelyinRStudio.SeetheonlinehelpfileCustomizing
RStudioformoreonthis.
RStartupArguments
AnumberofargumentsthatrelatetostartupcanbeappendedtotheRstartupcommand(Rina
shellenvironment).Thefollowingareparticularlyimportant:
--no-environand--no-init
TellRtoonlylookforstartupfiles(describedinthenextsection)inthecurrentworking
directory.
--no-restore
TellsRnottoloadafilecalled.RData(thedefaultnameforRsessionfiles)thatmaybe
presentinthecurrentworkingdirectory.
--no-save
TellsRnottoasktheuseriftheywanttosaveobjectssavedinRAMwhenthesessionis
endedwithq().
AddingeachofthesewillmakeRloadslightlyfaster,meaningthatslightlylessuserinputis
neededwhenyouquit.R’sdefaultsettingofloadingdatafromthelastsessionautomaticallyis
potentiallyproblematicinthiscontext.SeeAppendixBofAnIntroductiontoRformore
startuparguments.
TIP
A concise way to load a vanilla version of R with all of the preceding options enabled is with an option of the
same name:
R --vanilla
An Overview of R’s Startup Files
Two files are read each time R starts (unless one of the command-line options outlined
previously is used):
.Renviron
The primary purpose of which is to set environment variables. These tell R where to find
external programs, and can hold user-specific information that needs to be kept secret,
typically API keys.
.Rprofile
A plain text file (which is always called .Rprofile, hence its name) that simply runs lines
of R code every time R starts. If you want R to check for package updates each time it
starts (as explained in the previous section), you simply add the relevant line somewhere
in this file.
When R starts (unless it was launched with --no-environ), it first searches for .Renviron and
then .Rprofile, in that order. Although .Renviron is searched for first, we will look at .Rprofile
first as it is simpler and, for many setup tasks, more frequently useful. Both files can exist in
three directories on your computer.
WARNING
Modification of R’s startup files should not be taken lightly. This is an advanced topic. If you modify your startup
files in the wrong way, it can cause problems: a seemingly innocent call to setwd() in .Rprofile, for example, will
break devtools build and check functions.
Proceed with caution and, if you mess things up, just delete the offending files!
The Location of Startup Files
Confusingly, multiple versions of startup files can exist on the same computer, only one of
which will be used per session. Note also that these files should only be changed with caution
and if you know what you are doing. This is because they can make your R version behave
differently than other R installations, potentially reducing the reproducibility of your code.
Files in three folders are important in this process:
R_HOME
The directory in which R is installed. The etc subdirectory can contain startup files read
early on in the startup process. Find out where your R_HOME is with the R.home()
command.
HOME
The user’s home directory. Typically, this is /home/username on Unix machines or
C:\Users\username on Windows (since Windows 7). Ask R where your home directory is
with Sys.getenv("HOME").
R’s current working directory
This is reported by getwd().
It is important to know the location of the .Rprofile and .Renviron setup files that are being
used out of these three options. R only uses one .Rprofile and one .Renviron in any session; if
you have an .Rprofile file in your current project, R will ignore .Rprofile in R_HOME and HOME.
Likewise, .Rprofile in HOME overrides .Rprofile in R_HOME. The same applies to .Renviron: you
should remember that adding project-specific environment variables with .Renviron will
deactivate other .Renviron files.
Tocreateaproject-specificstartupscript,simplycreatean.Rprofilefileintheprojectsroot
directoryandstartaddingRcode(e.g.,viafile.edit(".Rprofile")).Rememberthatthis
willmake.Rprofileinthehomedirectorybeignored.Thefollowingcommandswillopen
your.RprofilefromwithinanReditor:
file.edit("~/.Rprofile")#edit.RprofileinHOME
file.edit(".Rprofile")#editproject-specific.Rprofile
WARNING
FilepathsprovidedbyWindowsoperatingsystemswillnotalwaysworkinR.Specifically,ifyouuseapaththat
containssinglebackslashes,suchasC:\\DATA\\data.csv,asprovidedbyWindows,thiswillgeneratetheerror:
Error:unexpectedinputin"C:\\".Toovercomethisissue,Rprovidestwofunctions,file.path()and
normalizePath().Theformercanbeusedtospecifyfilelocationswithouthavingtousesymbolstorepresent
relativefilepaths,asfollows:file.path("C:","DATA","data.csv").Thelattertakesanyinputstringfora
filenameandoutputsatextstringthatisstandard(canonical)fortheoperatingsystem.
normalizePath("C:/DATA/data.csv"),forexample,outputsC:\\DATA\\data.csvonaWindowsmachinebut
C:/DATA/data.csvonUnix-basedplatforms.Notethatonlythelatterwouldworkonbothplatforms,sostandard
Unixfilepathnotationissafeforalloperatingsystems.
Editingthe.Renvironfileinthesamelocationswillhavethesameeffect.Thefollowingcode
willcreateauser-specific.Renvironfile(whereAPIkeysandothercross-projectenvironment
variablescanbestored)withoutoverwritinganyexistingfile.
user_renviron=path.expand(file.path("~",".Renviron"))
file.edit(user_renviron)#openwithanothertexteditorifthisfails
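To illustrate how values stored in such a file reach your code, the sketch below writes a temporary file in .Renviron format and loads it with readRenviron(). The variable name MY_API_KEY and its value are hypothetical, and a temporary file is used so that your real .Renviron is left untouched:

```r
# Write a temporary file in .Renviron format: one NAME=value pair per line
tmp_renviron = tempfile(fileext = ".Renviron")
writeLines("MY_API_KEY=abc123", tmp_renviron) # hypothetical key and value

# Load the variables into the current session, as R does at startup
readRenviron(tmp_renviron)

# Retrieve the value in code without hard-coding the secret in a script
api_key = Sys.getenv("MY_API_KEY")
api_key

# Unset variables return "" by default, which makes missing keys easy to detect
Sys.getenv("SOME_UNSET_VARIABLE_NAME") == ""
```

Keeping the key in an environment variable means scripts can be shared or committed to version control without exposing it.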
TIP
The pathological package can help find where .Rprofile and .Renviron files are located on your system, thanks
to the os_path() function. The output of example(Startup) is also instructive.
The location, contents, and uses of each are outlined in more detail in the next section.
The .Rprofile File
By default, R looks for and runs .Rprofile files in the three locations described previously, in
a specific order. .Rprofile files are simply R scripts that run each time R runs. They can be
found within R_HOME, HOME, and the project’s home directory (reported by getwd()). To check if
you have a site-wide .Rprofile, which will run for all users on startup, run:
site_path = R.home(component = "home")
fname = file.path(site_path, "etc", "Rprofile.site")
file.exists(fname)
The preceding code checks for the presence of Rprofile.site in that directory. As outlined
previously, the .Rprofile located in your home directory is user-specific. Again, we can test
whether this file exists using:
file.exists("~/.Rprofile")
We can use R to create and edit .Rprofile (warning: do not overwrite your previous .Rprofile;
we suggest you try a project-specific .Rprofile first):
file.edit("~/.Rprofile")
Example.RprofileFile
Example2-1providesatasteofwhatgoesinto.Rprofile.NotethatthisissimplyausualR
script,butwithanunusualname.Thebestwaytounderstandwhatisgoingonistocreatethis
samescript,saveitas.Rprofileinyourcurrentworkingdirectory,andthenrestartyourR
sessiontoobservewhatchanges.TorestartyourRsessionfromwithinRStudio,youcan
clickSession→RestartRorusethekeyboardshortcutCtrl-Shift-F10.
Example2-1.Examplecontentsof.Rprofile
#Afunwelcomemessage
message("HiRobin,welcometoR")
#CustomizetheRpromptthatprefixeseverycommand
#(use""forablankprompt)
options(prompt="R4geo>")
Letsquicklyexplaineachlineofcode.Thefirstsimplyprintsamessageintheconsoleeach
timeanewRsessionisstarted.Thelattermodifiestheconsolepromptintheconsole(setto>
bydefault).Notethatsimplyaddingmorelinestothe.Rprofilewillsetmorefeatures.An
importantaspectof.Rprofile(and.Renviron)isthateachlineisrunonceandonlyoncefor
eachRsession.Thatmeansthattheoptionssetwithin.Rprofilecaneasilybechangedduring
thesession.Thefollowingcommandrunmidsession,forexample,willreturnthedefault
prompt:
options(prompt=">")
Moredetailsontheseandotherpotentiallyuseful.Rprofileoptionsaredescribed
subsequently.Formoresuggestionsofusefulstartupsettings,seeexamplesin
help("Startup")andonlineresourcessuchasthoseatstatmethods.net.ThehelppagesforR
options(accessiblewith?options)arealsoworthareadbeforewritingyourown.Rprofile.
Everbeenfrustratedbyunwanted+symbolsthatpreventcopiedandpastedmultilinefunctions
fromworking?Thesepotentiallyannoying+scanbeeradicatedbyaddingoptions(continue
="")toyour.Rprofile.
Settingoptions
Thefunctionoptionsusedpreviouslycontainsanumberofdefaultsettings.Executing
options()providesagoodindicationofwhatcanbeconfigured.Thesettingsthatcanbe
configuredwithoptions()areoftenrelatedtopersonalpreference(withfewimplicationsfor
reproducibility)sothe.Rprofileinyourhomedirectoryisasensibleplacestosetthemifyou
wantthemtobesetforallyourprojectsthathavenoproject-specific.Rprofilefile.Other
illustrativeoptionsareshownhere:
#Withacustomizedprompt
options(prompt="R>",digits=4,show.signif.stars=FALSE,continue="")
#Withalongerpromptandempty'continue'indent(defaultis"+")
options(prompt="R4Geo>",digits=3,continue="")
Thefirstoptionchangesfourdefaultoptionsinasingleline:
TheRprompt,fromtheboring>totheexcitingR>
Thenumberofdigitsdisplayed
Removingthestarsaftersignificantp-values
Removingthe+inmultilinefunctions
Trytoavoidaddingoptionsthatmakeyourcodenonportabletothestartupfile.Forexample,
addingoptions(stringsAsFactors=FALSE)toyourstartupscripthasadditionaleffectsfor
read.table()andrelatedfunctions,includingread.csv(),makingthemconverttextstrings
intocharactersratherthanintofactors,asisthedefault.Thismaybeusefulforyou,butitcan
alsomakeyourcodelessportable,sobewarned.
SettingtheCRANmirror
ToavoidsettingtheCRANmirroreachtimeyouruninstall.packages(),youcan
permanentlysetthemirrorinyour.Rprofile.
#`local`createsanew,emptyenvironment
#Thisavoidspolluting.GlobalEnvwiththeobjectr
local({
r=getOption("repos")
r["CRAN"]="https://cran.rstudio.com/"
options(repos=r)
})
TheRStudiomirrorisavirtualmachinerunbyAmazon’sEC2service,anditsyncswiththe
mainCRANmirrorinAustriaonceperday.SinceRStudioisusingAmazon’sCloudFront,
therepositoryisautomaticallydistributedaroundtheworld,sonomatterwhereyouareinthe
world,thedatadoesn’tneedtotravelveryfar,andisthereforefasttodownload.
Thefortunespackage
Thissectionillustratesthepowerof.Rprofilecustomizationwithreferencetoapackagethat
wasdevelopedforfun.Thefollowingcodecouldeasilybealteredtoautomaticallyconnectto
adatabase,ortoensurethatthelatestpackageshavebeendownloaded.
Thefortunespackagecontainsanumberofmemorablequotes,calledRfortunes,thatthe
communityhascollectedovermanyyears.Eachfortunehasanumber.Togetfortunenumber
50,forexample,enter:
fortunes::fortune(50)
#>
#> To paraphrase provocatively, 'machine learning is statistics minus any
#> checking of models and assumptions'.
#>    -- Brian D. Ripley (about the difference between machine learning and
#>       statistics)
#>       useR! 2004, Vienna (May 2004)
ItiseasytomakeRprintoutoneofthesenuggetsoftrutheachtimeyoustartasessionby
addingthefollowingto.Rprofile:
if(interactive())
try(fortunes::fortune(),silent=TRUE)
Theinteractive()functiontestswhetherRisbeingusedinteractivelyinaterminal.The
fortune()functioniscalledwithintry().Ifthefortunespackageisnotavailable,weavoid
raisinganerrorandmoveon.Byusing::,weavoidaddingthefortunespackagetoourlist
ofattachedpackages.
TIP
Typing search() gives the list of attached packages. By using fortunes::fortune(), we avoid adding the
fortunes package to that list. The function .Last(), if it exists in the .Rprofile, is always run at the end of the
session. We can use it to install the fortunes package if needed. To load the package, we use require(), because
if the package isn’t installed, the require() function returns FALSE and raises a warning.
.Last=function(){
cond=suppressWarnings(!require(fortunes,quietly=TRUE))
if(cond)
try(install.packages("fortunes"),silent=TRUE)
message("Goodbyeat",date(),"\n")
}
Usefulfunctions
Youcanuse.Rprofiletodefinenewhelperfunctionsorredefineexistingonessothatthey’re
fastertotype.Forexample,wecouldloadthefollowingtwofunctionsforexaminingdata
frames:
#ht==headtail
ht=function(d,n=6)rbind(head(d,n),tail(d,n))
#Showthefirst5rows&first5columnsofadataframe
hh=function(d)d[1:5,1:5]
andafunctionforsettinganiceplottingwindow:
nice_par=function(mar=c(3,3,2,1),mgp=c(2,0.4,0),tck=-0.01,
cex.axis=0.9,las=1,mfrow=c(1,1),...){
par(mar=mar,mgp=mgp,tck=tck,cex.axis=cex.axis,las=las,
mfrow=mfrow,...)
}
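To see what the two data frame helpers return, they can be tried on the built-in mtcars dataset (32 rows, 11 columns). The sketch below repeats the definitions so that it runs on its own. It also adds a hypothetical variant, hh_safe, that copes with data frames smaller than 5 by 5, which the original hh() does not.

```r
ht = function(d, n = 6) rbind(head(d, n), tail(d, n))
hh = function(d) d[1:5, 1:5]

dim(ht(mtcars, n = 2))   # first 2 and last 2 rows
#> [1]  4 11
dim(hh(mtcars))          # top-left corner
#> [1] 5 5

# Hypothetical variant: never index beyond the available rows or columns
hh_safe = function(d, n = 5) {
  d[seq_len(min(n, nrow(d))), seq_len(min(n, ncol(d))), drop = FALSE]
}
dim(hh_safe(data.frame(a = 1:2)))
#> [1] 2 1
```

Whether the extra guard is worth the typing is a matter of taste; for quick interactive use, the shorter hh() is often enough.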
Notethatthesefunctionsareforpersonaluseandareunlikelytointerferewithcodefrom
otherpeople.Forthisreason,evenifyouuseacertainpackageeveryday,wedon’t
recommendloadingitinyour.Rprofile.Shorteninglongfunctionnamesforinteractive(but
notreproduciblecodewriting)isanotheroptionforusing.Rprofiletoincreaseefficiency.If
youfrequentlyuseView(),forexample,youmaybeabletosavetimebyreferringtoitin
abbreviatedform.Thisisillustratedinthefollowinglineofcode,whichmakesitfasterto
viewdatasets(althoughwithIDE-drivenautocompletion,outlinedinthenextsection,thetime
savingsisless).
v=utils::View
Alsobewareofthedangersofloadingmanyfunctionsbydefaultasitmaymakeyourcode
lessportable.Anotherpotentiallyusefulsettingtochangein.RprofileisRscurrentworking
directory.IfyouwantRtoautomaticallysettheworkingdirectorytotheRfolderofyour
project,forexample,youwouldaddthefollowinglineofcodetotheproject-specific
.Rprofile:
setwd("R")
Creatinghiddenenvironmentswith.Rprofile
Beyondmakingyourcodelessportable,anotherdownsideofputtingfunctionsinyour
.Rprofileisthatitcanclutterupyourworkspace:whenyourunthels()command,your
.Rprofilefunctionswillappear.Also,ifyourunrm(list=ls()),yourfunctionswillbe
deleted.Oneneattricktoovercomethisissueistousehiddenobjectsandenvironments.
Whenanobjectnamestartswith.,bydefaultitdoesn’tappearintheoutputofthels()
function:
.obj=1
".obj"%in%ls()
#>[1]FALSE
Thisconceptalsoworkswithenvironments.Inthe.Rprofilefile,wecancreateahidden
environment:
.env=new.env()
Andthenaddfunctionstothisenvironment:
.env$ht=function(d,n=6)rbind(head(d,n),tail(d,n))
Attheendofthe.Rprofilefile,weuseattach,whichmakesitpossibletorefertoobjectsin
theenvironmentbytheirnamesalone:
attach(.env)
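The three steps can be put together and checked in one go. The following sketch repeats the code above and then inspects the result: the environment is hidden from a plain ls(), it appears on the search path under the name .env, and ht() can be called by name.

```r
.env = new.env()
.env$ht = function(d, n = 6) rbind(head(d, n), tail(d, n))
attach(.env)

".env" %in% ls()                   # hidden from the default listing
".env" %in% ls(all.names = TRUE)   # visible when hidden names are requested
".env" %in% search()               # attached to the search path
exists("ht")                       # the function is found by name alone
```

Because ht lives in the attached environment rather than in the workspace, clearing the workspace with rm(list = ls()) leaves it in place.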
The.RenvironFile
The.Renvironfileisusedtostoresystemvariables.Itfollowsasimilarstartuproutinetothe
.Rprofilefile:Rfirstlooksforaglobal.Renvironfile,thenforlocalversions.Atypicaluseof
the.RenvironfileistospecifytheR_LIBSpath,whichdetermineswherenewpackagesare
installed:
# Linux
R_LIBS=~/R/library
# Windows
R_LIBS=C:/R/library
Aftersettingthis,install.packages()savespackagesinthedirectoryspecifiedbyR_LIBS.
Thelocationofthisdirectorycanbereferredbacktosubsequentlyasfollows:
Sys.getenv("R_LIBS")
AllcurrentlystoredenvironmentvariablescanbeseenbycallingSys.getenv()withno
arguments.Notethatmanyenvironmentvariablesarealreadypresetanddonotneedtobe
specifiedin.Renviron.HOME,forexample,whichcanbeseenwithSys.getenv("HOME"),is
takenfromtheoperatingsystemslistofenvironmentvariables.Alistofthemostimportant
environmentvariablesthatcanaffectR’sbehaviorisdocumentedinthelittle-knownhelp
pagehelp("environmentvariables").
Tosetorunsetanenvironmentvariableforthedurationofasession,usethefollowing
commands:
Sys.setenv("TEST" = "test-string") # set an environment variable for the session
Sys.unsetenv("TEST") # unset it
Anothercommonuseof.RenvironistostoreAPIkeysandauthenticationtokensthatwillbe
availablefromonesessiontoanother.4Acommonusecaseissettingtheenvironment
variableGITHUB_PAT,whichwillbedetectedbythedevtoolspackageviathefunction
github_pat().Totakeanotherexample,thefollowinglinein.RenvironsetstheZEIT_KEY
environmentvariable,whichisusedinthediezeitpackage:
ZEIT_KEY=PUT_YOUR_KEY_HERE
YouwillneedtosigninandstartanewRsessionfortheenvironmentvariable(accessedby
Sys.getenv())tobevisible.TotestiftheexampleAPIkeyhasbeensuccessfullyaddedasan
environmentvariable,runthefollowing:
Sys.getenv("ZEIT_KEY")
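Sys.getenv() returns an empty string for a variable that is not set, which is easy to overlook. The following sketch wraps the lookup in a hypothetical helper, get_key, that fails with a clear message instead. To keep the example self-contained, it writes a temporary file in .Renviron format and loads it with readRenviron(), which reads such a file into the current session. The variable name DEMO_KEY is made up so that no real key is touched.

```r
# Hypothetical helper: fetch an environment variable or stop with a clear error
get_key = function(name) {
  value = Sys.getenv(name, unset = "")
  if (!nzchar(value))
    stop("Environment variable ", name, " is not set; check your .Renviron",
         call. = FALSE)
  value
}

# A temporary file stands in for .Renviron; DEMO_KEY is a made-up name
tmp = tempfile()
writeLines("DEMO_KEY=PUT_YOUR_KEY_HERE", tmp)
readRenviron(tmp)   # load the variables into the current session
get_key("DEMO_KEY")
#> [1] "PUT_YOUR_KEY_HERE"
unlink(tmp)
```

Calling get_key() with the name of a variable that is not set stops with an error, so a missing key is caught where it is requested rather than later, inside an API call.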
Usingthe.RenvironfileforstoringsettingssuchaslibrarypathsandAPIkeysisefficient
becauseitreducestheneedtoupdateyoursettingsforeveryRsession.Furthermore,thesame
.Renvironfilewillworkacrossdifferentplatforms,sokeepitstoredsafely.
Example.Renvironfile
My.Renvironfilehasgrownovertheyears.Ioftenswitchbetweenmydesktopandlaptop
computers,sotomaintainaconsistentworkingenvironment,Ihavethesame.Renvironfile
onallofmymachines.AswellascontaininganR_LIBSentryandsomeAPIkeys,my
.Renvironhasafewotherlines:
TMPDIR=/data/R_tmp/
When R is running, it creates temporary copies. On my work machine, the default
directory is a network drive.
R_COMPILE_PKGS=3
Byte compile all packages (covered in Chapter 3).
R_LIBS_SITE=/usr/lib/R/site-library:/usr/lib/R/library
Iexplicitlystatewheretolookforpackages.Myuniversityhasasitewidedir