R News
The Newsletter of the R Project, Volume 3/2, October 2003
Editorial
by Friedrich Leisch
Welcome to a new issue of R News, focussing on
graphics and user interfaces. Paul Murrell’s grid
graphics system offers a modern infrastructure for R
graphics. As of R-1.8.0 it is part of the base distribu-
tion but, until now, grid graphics functions could not
be mixed with traditional (base) plotting functions.
In this newsletter Paul describes his gridBase pack-
age which allows mixing of grid and base graphics.
Marc Schwartz, known as a frequent poster of answers
on the mailing list r-help, has been invited to con-
tribute to the R Help Desk. He presents “An Intro-
duction to Using R’s Base Graphics”.
The Bioconductor column was written by Colin
Smith and shows a web-based interface for microar-
ray analysis. Angelo Mineo describes normalp, a
new package for the general error distribution. In the
second part of my mini-series on Sweave I demon-
strate how users can interact with package vignettes
and how package authors can add such documents
to their packages.
To close the circle, a version of Paul’s article is a
vignette in package gridBase, so if you want to play
with the code examples, the easiest way is to use R’s
vignette tools. We hope to see vignettes used more
frequently in the future, as they are a simple, effec-
tive way of delivering code examples to users.
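For example, a minimal sketch of how a user can interact with the gridBase vignette from within R (the vignette name used here is an assumption; list the available names first):
# List the vignettes shipped with an installed package,
# open one, and extract its R code for experimentation
vignette(package = "gridBase")
vignette("gridBase", package = "gridBase")
edit(vignette("gridBase", package = "gridBase"))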
R 1.8.0 was released more than two weeks ago.
The list of new features is long; see “Changes in R”
for detailed release information. I will draw atten-
tion to a “minor” but rather emotional point that has
sparked heated discussions on the mailing lists in the
past. Release 1.8.0 marks the end of the use of the un-
derscore as an assignment operator in the R dialect of
the S language. That is, x_1 is now a syntax error.
A new column in R News lists the new members
of the R Foundation for Statistical Computing. R has
become a mature and valuable tool and we would
like to ensure its continued development and the de-
velopment of future innovations in software for sta-
tistical and computational research. We hope to at-
tract sufficient funding to make these goals realities.
Listing members in our newsletter is one (small) way
of thanking them for their support.
Friedrich Leisch
Technische Universität Wien, Austria
Friedrich.Leisch@R-project.org
Contents of this issue:
Editorial
R Help Desk
Integrating grid Graphics Output with Base Graphics Output
A New Package for the General Error Distribution
Web-based Microarray Analysis using Bioconductor
Sweave, Part II: Package Vignettes
R Foundation News
Recent Events
Book Reviews
Changes in R 1.8.0
Changes on CRAN
Crossword Solution
Correction to “Building Microsoft Windows Versions of R and R packages under Intel Linux”
R Help Desk
An Introduction to Using R’s Base Graphics
Marc Schwartz
Preface
As the use of R grows dramatically, an increas-
ingly diverse base of users will begin their explo-
ration of R’s programmatic approach to graphics.
Some new users will start without prior experience
generating statistical graphics using coded functions
(i.e., they may have used GUI-based “point-and-click”
or “drag-and-drop” graphic processes) and/or they
may be overwhelmed by the vast array (pardon
the pun) of graphic and plotting functions in R.
This transition can not only present a steep learning
curve, but can perhaps, by itself, become a barrier to
using R entirely, which would be an unfortunate out-
come.
R has essentially two separate core plotting en-
vironments in the default (base plus ‘recommended
package’) installation. The first is the extensive set
of base graphic functions and the second is the com-
bination of the grid (Murrell, 2002) and lattice pack-
ages (Sarkar, 2002), which together provide for ex-
tensive Trellis conditioning plots and related stan-
dardized functionality. For the purpose of this intro-
duction, I shall focus exclusively on the former.
The key advantages of a programmatic plotting
approach are much finer control over the plotting
process and, importantly, reproducibility. Days,
weeks or even months later, you can return to re-
use your same code with the same data to achieve
the same output. Ultimately, productivity is also en-
hanced because, once created, a single plotting func-
tion can be called quickly, generating one or an entire
series of graphics in a largely automated fashion.
R has a large number of “high” and “low” level
plotting functions that can be used, combined and
extended for specific purposes. This extensibility en-
ables R to meet a wide spectrum of needs, as demon-
strated by the number of contributed packages on
CRAN that include additional specialized plotting
functionality.
The breadth of base plotting functions is usually
quite satisfactory for many applications. In conjunc-
tion with R’s innate ability to deal with data in vec-
torized structures and by using differing ‘methods’,
one can further reduce the need for lengthy, repeti-
tive and complex code. In many cases, entire data
structures (i.e., a linear model object) can be passed as
a single argument to a single plotting function, creat-
ing a default plot or series of plots.
Further, where default plot settings are perhaps
inappropriate for a given task, these can be ad-
justed to your liking and/or disabled. The base
graphic can be enhanced by using various lower
level plotting functions to add data points, lines,
curves, shapes, titles, legends and text annotations.
Formatted complex mathematical formulae (Murrell
and Ihaka, 2000; Ligges, 2002) can also be included
where required.
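As a small illustration (not from the original article) of such annotation, a plotmath expression() can be passed to most text arguments:
# Mathematical annotation via plotmath expressions
plot(1:10, 1:10,
     main = expression(hat(beta) == (X^T * X)^{-1} * X^T * y),
     xlab = expression(x[i]), ylab = expression(sqrt(sigma^2)))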
If a graphics ‘device’ is not explicitly opened
by the user, R’s high level plotting functions will
open the default device (see ?Devices) specified by
options("device"). In an interactive session, this is
typically the screen. However, one can also open an
alternative device such as a bitmap (e.g., PNG/JPEG) or
a PostScript/PDF file for publishing and/or presen-
tation. I will focus on using the screen here, since
the particulars concerning other devices can be plat-
form specific. Note that if you intend to create plots
for output to something other than the screen, then
you must explicitly open the intended device. Dif-
ferences between the screen and the alternate device
can be quite significant in terms of the resultant plot
output. For example, you can spend a lot of time cre-
ating the screen version of a plot, only to find out it
looks quite different in a PostScript file.
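For instance, a minimal sketch (not part of the original article; the file names are arbitrary) of writing the same plot to file devices instead of the screen:
set.seed(42)
dat <- data.frame(x = rnorm(50), y = rnorm(50))
# Open a PNG file device explicitly, draw, then close it so
# the file is actually written to disk
png("scatter.png", width = 480, height = 480)
plot(dat$x, dat$y, main = "Written to a file device")
dev.off()
# The same plot as a PDF for publication
pdf("scatter.pdf", width = 6, height = 6)
plot(dat$x, dat$y, main = "Written to a file device")
dev.off()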
Various parameters of the figure and plot regions
within a device can be set in advance by the use of
the par() function before calling the initial plot func-
tion. Others can be set as named arguments to the
plot functions. Options set by par() affect all graph-
ics; options set in a graphics call affect only that call.
(See ?par and ?plot.default for some additional
details).
It is possible to divide the overall graphic device
into a row/column grid of figures and create individ-
ual plots within each grid section (e.g., a matrix of scat-
terplots like a pairs() plot) or create a graphic that
contains different plot types (e.g., a scatterplot with
boxplots placed on the x and y axes). For more in-
formation, see ?layout, ?split.screen and the graphic
parameters ‘mfcol’ and ‘mfrow’ in ?par.
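A short sketch of the par("mfrow") approach (not from the original article):
# Divide the device into a 2-by-2 grid of figures; each high
# level plot call advances to the next cell
op <- par(mfrow = c(2, 2))
hist(rnorm(100), main = "Histogram")
boxplot(rnorm(100), main = "Boxplot")
plot(rnorm(25), rnorm(25), main = "Scatterplot")
barplot(table(sample(LETTERS[1:3], 30, replace = TRUE)), main = "Barplot")
# Restore the previous settings
par(op)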
For additional details regarding graphic devices,
parameters and other considerations, please review
“Graphical Procedures” (Ch. 12) in “An Introduc-
tion to R” (Venables, Smith and R Core, 2003) and
“Graphics” (Ch. 4) in “Modern Applied Statistics
with S” (Venables and Ripley, 2002).
Let’s Get Plotting
In this limited space, it is not possible to cover all
the combinations and permutations possible with R’s
base graphics functionality (which could be a thick
book in its own right). Thus, I will put forth a fi-
nite set of practical examples that cover a modest
range of base plots and enhancements. For each plot,
we will create some simple data to work with, cre-
ate a basic plot using a standard function to demon-
strate default behavior and then enhance the base
plot with additional detail. The included graphic for
each will show the final result. I recommend that you
consult the R help system for each function (using
?FunctionName) to better understand the syntax of
each function call and how each argument impacts
the resultant output.
Scatterplot with a regression line and confidence / prediction intervals

[Figure: “Linear Regression Plot” — scatterplot of x vals against y vals showing the fitted line, confidence bands and prediction bands.]
The plot() function is a generic graphing func-
tion that can accept a variety of data structures
through specific defined ‘methods’. Frequently,
these arguments are numeric vectors representing
the two-dimensional (x,y) coordinate pairs of points
and/or lines to display. If you want to get a feel
for the breadth of plotting methods available use
methods(plot).
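For instance (a minimal sketch, not part of the original article):
# List the plot() methods registered in the current session and
# let method dispatch pick the right one for an object's class
methods(plot)
plot(density(rnorm(100)))            # dispatched to plot.density()
plot(factor(c("a", "b", "b", "c")))  # dispatched to plot.factor()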
In the next example we first create a series of sim-
ple plots (not shown) then create the more complex
scatterplot shown above. To do this we create an x-y
scatterplot using type = "n" so that the axis ranges
are established, but nothing is plotted initially. We
then add the data points, the axes, a fitted regression
line, and confidence and prediction intervals for the
regression model:
# Create our data
set.seed(1)
x <- runif(50, -2, 2)
set.seed(2)
y <- x + rnorm(50)
# Create the model object
mod <- lm(y ~ x)
# Plot the data and add a regression line
# using default plot() behavior
plot(x, y)
abline(mod)
# Plot the model object, going through a
# sequence of diagnostic plots. See ?plot.lm
plot(mod)
# Create prediction values and confidence limits
# using a new dataframe of x values, noting the
# colnames need to match your model term names.
newData <- data.frame(x = seq(min(x), max(x),
by = (max(x) - min(x)) / 49))
pred.lim <- predict(mod, newdata = newData,
interval = "prediction")
conf.lim <- predict(mod, newdata = newData,
interval = "confidence")
# Function to color plot region
color.pr <- function(color = "white")
{
usr <- par("usr")
if (par("xlog"))
usr[1:2] <- 10 ^ usr[1:2]
if (par("ylog"))
usr[3:4] <- 10 ^ usr[3:4]
rect(usr[1], usr[3], usr[2], usr[4],
col = color)
}
# Color the plot background
par(bg = "blue")
# Define margins to enable space for labels
par(mar = c(5, 6, 5, 3) + 0.1)
# Create the plot. Do not plot the data points
# and axes to allow us to define them our way
plot(x, y, xlab = "x vals", ylab = "y vals",
type = "n", col.lab = "yellow", font.lab = 2,
cex.lab = 1.5, axes = FALSE, cex.main = 2,
main = "Linear Regression Plot",
col.main = "yellow", xlim = c(-2.1, 2.1),
ylim = range(y, pred.lim, na.rm = TRUE))
# Color the plot region white
color.pr("white")
# Plot the data points
points(x, y, pch = 21, bg = "yellow", cex=1.25)
# Draw the fitted regression line and the
# prediction and confidence intervals
matlines(newData$x, pred.lim, lty = c(1, 4, 4),
lwd = 2, col = c("black", "red", "red"))
matlines(newData$x, conf.lim, lty = c(1, 3, 3),
lwd = 2, col = c("black", "green4", "green4"))
# Draw the X and Y axes, respectively
axis(1, at = -2:2, col = "white",
col.axis = "white", lwd = 2)
axis(2, at = pretty(range(y), 3), las = 1,
col = "white", col.axis = "white", lwd = 2)
# Draw the legend
legend(-2, max(pred.lim, na.rm = TRUE),
legend = c("Fitted Line", "Confidence Bands",
"Prediction Bands"),
lty = c(1, 3, 4), lwd = 2,
col = c("black", "green4", "red"),
horiz = FALSE, cex = 0.9, bg = "gray95")
# Put a box around the plot
box(lwd = 2)
Barplot with confidence intervals and additional annotation

[Figure: Barplot “Incidence of Event By Group” (Total N = 819) showing % incidence (+/− 95% CI) for groups A (n = 126), B (n = 409) and C (n = 284), bar values 4.8%, 2.7% and 2.8%, p values against the benchmark, and a benchmark line at 4.5%.]
barplot() can draw essentially three types of
plots with either vertical or horizontal bars (using
the argument horiz = TRUE / FALSE). The first is
a series of individual bars where the height argu-
ment (which defines the bar values) is a simple vec-
tor. The second is a series of stacked multi-segment
bars where height is a matrix and beside = FALSE.
The third is a series of grouped bars where height is
a matrix and beside = TRUE. In the second and third
cases, each column of the matrix height represents
either the values of the bar segments in each stacked
bar, or the values of the individual bars in each bar
group, respectively.
barplot() returns either a vector or a matrix
(when beside = TRUE) of bar midpoints that can be
assigned to a variable (e.g., mp <- barplot(...)). You
can use this information to locate bar midpoints for
text and/or line placement. To locate the midpoint
of bar groups, use colMeans(mp) to enable the place-
ment of a bar group label.
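A minimal sketch of that idea (separate from the worked example below; the data are made up):
# Grouped barplot: mp is a matrix of bar midpoints, one column
# per group of bars
ht <- matrix(c(3, 5, 2, 6, 4, 7), nrow = 2)
mp <- barplot(ht, beside = TRUE)
# colMeans(mp) gives one midpoint per group; label the groups there
mtext(c("A", "B", "C"), side = 1, line = 1, at = colMeans(mp), font = 2)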
Here we will create a vertical barplot, with each
of the three bars representing a proportion. We will
add binomial confidence intervals and p values from
binom.test() using a ‘benchmark’ value that will be
plotted. We will label the y axis with percentages
(prop * 100), add bar values above the top of each
bar and put sample sizes centered below each bar un-
der the x axis.
# Create our data
A <- data.frame(Event = c(rep("Yes", 6),
rep("No", 120)), Group = "A")
B <- data.frame(Event = c(rep("Yes", 11),
rep("No", 398)), Group = "B")
C <- data.frame(Event = c(rep("Yes", 8),
rep("No", 276)), Group = "C")
BarData <- rbind(A, B, C)
attach(BarData)
# Create initial ’default’ barplots
barplot(table(Group))
barplot(table(Group), horiz = TRUE)
barplot(table(Event, Group))
barplot(table(Event, Group), beside = TRUE)
# Let’s get our summary data from the dataframe
table.data <- table(Event, Group)
# Get sample sizes
n <- as.vector(colSums(table.data))
# Get number of "Yes" events
events <- as.vector(table.data["Yes", ])
# Proportion of "Yes" events
prop.events <- events / n
# Group names from table dimnames
Group.Names <- dimnames(table.data)$Group
# Define our benchmark value
benchmark <- 0.045
# Get binomial confidence limits and p values
stats <- mapply(binom.test, x = events, n = n,
p = benchmark)
# ci[, 1] = lower and ci[, 2] = upper
ci <- matrix(unlist(stats["conf.int", ]),
ncol = 2, byrow = TRUE)
p.val <- unlist(stats["p.value", ])
# Define Y axis range to include CI’s and
# space for a legend in the upper LH corner
YMax <- max(ci[, 2]) * 1.25
# Define margins to enable space for labels
par(mar = c(5, 6, 5, 3) + 0.1)
# Do the barplot, saving bar midpoints in MidPts
MidPts <- barplot(prop.events, space = 1,
axes = FALSE, axisnames = FALSE,
ylim = c(0, YMax))
# Define formatted Y axis labels using
# axTicks() and draw the Y Axis and label
YLabels <- paste(formatC(axTicks(2) * 100,
format = "f", digits = 1),
"%", sep = "")
YAxisLab <- "% Incidence (+/-95% CI)"
axis(2, labels = YLabels, at = axTicks(2),
las = 1)
mtext(YAxisLab, side = 2, adj = 0.5,
line = 4.5, cex = 1.1, font = 2)
# Draw the X axis using Group Names at bar
# midpoints
axis(1, labels = Group.Names, at = MidPts,
font = 2, cex.axis = 1.25)
# Draw Sample Sizes and p Values below Group
# Names
mtext(n, side = 1, line = 2, at = MidPts,
cex = 0.9)
p.val.text <- paste("p = ",
formatC(p.val, format = "f", digits = 4),
sep = "")
mtext(p.val.text, side = 1, line = 3,
at = MidPts, cex = 0.9)
# Place formatted bar values above the left edge
# of each bar so that CI lines do not go through
# numbers. Left edge = MidPts - (’width’ / 2)
bar.vals <- paste(formatC(
prop.events * 100, format = "f", digits=1),
"%", sep = "")
text(MidPts - 0.5, prop.events, cex = 0.9,
labels = bar.vals, adj = c(0, -0.5), font=1)
# Draw confidence intervals, first drawing
# vertical line segments and then upper and
# lower horizontal boundary segments
segments(MidPts, ci[, 1], MidPts, ci[, 2],
lty = "solid", lwd = 2)
segments(MidPts - 0.25, ci[, 1],
MidPts + 0.25, ci[, 1], lty = "solid", lwd=2)
segments(MidPts - 0.25, ci[, 2],
MidPts + 0.25, ci[, 2], lty = "solid", lwd=2)
# Plot benchmark line
abline(h = benchmark, lty = "dotdash",
lwd = 2, col = "blue")
# Draw legend
legend(1, YMax * 0.95, lty = "dotdash",
legend = "Benchmark Value: 4.5%", lwd = 2,
col = "blue", horiz = FALSE, cex = 0.9,
bg = "gray95")
# Draw title and sub-title
mtext("Incidence of Event By Group", side = 3,
line = 3, cex = 1.5, font = 2)
mtext(paste("Total N = ", sum(n), sep = ""),
side = 3, line = 1, cex = 1, font = 2)
# Put box around plot
box()
detach(BarData)
Paired Boxplots with outliers colored and median / mean values labeled

[Figure: Boxplots “Distribution of ’Measure’ by ’Group’” for groups A1, B1, A2, B2 (n = 135, 175, 250, 500), with outliers colored and mean and median values labeled.]
J.W. Tukey’s Box-Whisker plots (Tukey, 1977) are
a quick and easy way to visually review and com-
pare the distributions of continuous variables. For
some descriptive information on the structure and
interpretation of these plots including additional ref-
erences, see ?boxplot.stats.
Here we will generate continuous measures in
four groups. We will generate default plots and then
enhance the layout of the plot to visually group the
data and to annotate it with key labels.
# Create our data
set.seed(1)
A1 <- data.frame(Group = "A1",
Measure = rnorm(135, 5))
set.seed(2)
A2 <- data.frame(Group = "A2",
Measure = rgamma(250, 3))
set.seed(3)
B1 <- data.frame(Group = "B1",
Measure = rnorm(175, 7.5))
set.seed(4)
B2 <- data.frame(Group = "B2",
Measure = rgamma(500, 5))
BPData <- rbind(A1, A2, B1, B2)
attach(BPData)
# Create default boxplots
boxplot(Measure)
boxplot(Measure, horizontal = TRUE)
boxplot(Measure ~ Group)
# Adjust Group factor levels to put A1 / B1
# and A2 / B2 pairs together
Group <- factor(Group,
levels = c("A1", "B1", "A2", "B2"))
# Show default boxplot with re-grouping
boxplot(Measure ~ Group)
# Define the boxplot midpoints to separate
# the pairs of plots
at <- c(1.25, 1.75, 3.25, 3.75)
# Draw boxplot, returning boxplot stats in S
# which will contain summary data for each Group.
# See ?boxplot.stats
S <- boxplot(Measure ~ Group, boxwex = 0.25,
col = c("orange", "yellow"), notch = TRUE,
at = at, axes = FALSE)
# Draw thicker green lines for median values
# When notch = TRUE, median width = boxwex / 2
segments(at - 0.0625, S$stats[3, ],
at + 0.0625, S$stats[3, ],
lwd = 2, col = "darkgreen")
# Get Group means and plot them using a
# diamond plot symbol
means <- by(Measure, Group, mean)
points(at, means, pch = 23, cex = 0.75,
bg = "red")
# Color outlier values using x,y positions from S
points(at[S$group], S$out, pch = 21, bg="blue")
# Draw Y axis, rotating labels to horiz
axis(2, las = 1)
# Draw X Axis Group Labels
axis(1, at = at, labels = S$names,
cex.axis = 1.5, font.axis = 2)
mtext(S$n, side = 1, at = at, line = 3)
# Draw Mean values to the left edge of each
# boxplot
text(at - 0.125, means, labels = formatC(
means, format = "f", digits = 1),
pos = 2, cex = 0.9, col = "red")
# Draw Median values to the right edge of
# each boxplot
text(at + 0.125, S$stats[3, ],
labels = formatC(S$stats[3, ], format = "f",
digits = 1),
pos = 4, cex = 0.9, col = "darkgreen")
# Draw a box around plot
box()
# Add title and legend
title("Distribution of ’Measure’ by ’Group’",
cex.main = 1.5)
legend(0.5, max(Measure),
legend = c("Mean", "Median"),
fill = c("red", "darkgreen"))
detach(BPData)
Additional Resources
For additional information on using R’s plotting
functionality, see: Venables, Smith and R Core (2003);
Venables and Ripley (2002); Fox (2002); Dalgaard
(2002). In addition, Uwe Ligges’ recent R News ar-
ticle (Ligges, 2003) provides excellent insights into
how best to utilize R’s documentation and help re-
sources.
If you are in need of expert guidance on creating
analytic graphics, such as the pros and cons of using
particular graphic formats and their impact on the
interpretation of your data, two critically important
references are “Visualizing Data” (Cleveland, 1993)
and “The Elements of Graphing Data” (Cleveland,
1994).
Bibliography
Cleveland, W. S. (1993): Visualizing Data. Summit,
NJ: Hobart Press.
Cleveland, W. S. (1994): The Elements of Graphing
Data. Summit, NJ: Hobart Press, revised edition.
Dalgaard, P. (2002): Introductory Statistics with R.
New York: Springer-Verlag.
Fox, J. (2002): An R and S-PLUS Companion to Applied
Regression. Thousand Oaks: Sage.
Ligges, U. (2002): R Help Desk – Automation of
Mathematical Annotation in Plots. R News, 2 (3),
32–34. ISSN 1609-3631. URL http://CRAN.R-project.org/doc/Rnews/.
Ligges, U. (2003): R Help Desk – Getting Help – R’s
Help Facilities and Manuals. R News, 3 (1), 26–28.
ISSN 1609-3631. URL http://CRAN.R-project.org/doc/Rnews/.
Murrell, P. (2002): The grid Graphics Package. R
News, 2 (2), 14–19. ISSN 1609-3631. URL http://CRAN.R-project.org/doc/Rnews/.
Murrell, P. and Ihaka, R. (2000): An Approach to Pro-
viding Mathematical Annotation in Plots. Journal
of Computational and Graphical Statistics, 9 (3), 582–599.
Sarkar, D. (2002): Lattice: An Implementation of Trel-
lis Graphics in R. R News, 2 (2), 19–23. ISSN
1609-3631. URL http://CRAN.R-project.org/doc/Rnews/.
Tukey, J. (1977): Exploratory Data Analysis. Reading,
MA: Addison-Wesley.
Venables, W. N. and Ripley, B. D. (2002): Modern Ap-
plied Statistics with S. New York: Springer-Verlag,
4th edition.
Venables, W. N., Smith, D. M. and the R Develop-
ment Core Team (2003): An Introduction to R.
URL http://CRAN.R-project.org/doc/manuals.html.
Marc Schwartz
MedAnalytics, Inc., Minneapolis, Minnesota, USA
MSchwartz@MedAnalytics.com
Integrating grid Graphics Output
with Base Graphics Output
by Paul Murrell
Introduction
The grid graphics package (Murrell, 2002) is much
more powerful than the standard R graphics system
(hereafter “base graphics”) when it comes to com-
bining and arranging graphical elements. It is pos-
sible to create a greater variety of graphs more easily
with grid (see, for example, Deepayan Sarkar’s lat-
tice package (Sarkar, 2002)). However, there are very
many plots based on base graphics (e.g., biplots) that
have not been implemented in grid, and the task of
reimplementing these in grid is extremely daunting.
It would be nice to be able to combine the ready-
made base plots with the sophisticated arrangement
features of grid.
This document describes the gridBase package
which provides some support for combining grid
and base graphics output.
Annotating base graphics
using grid
The gridBase package provides the baseViewports()
function, which supports adding grid output to a
base graphics plot. This function creates a set of grid
viewports that correspond to the current base plot.
These allow simple operations such as adding lines
and text using grid’s units to locate them relative to
a wide variety of coordinate systems, or something
more complex involving pushing further grid view-
ports.
baseViewports() returns a list of three grid view-
ports. The first corresponds to the base “inner” re-
gion. This viewport is relative to the entire device;
it only makes sense to push this viewport from the
“top level” (i.e., only when no other viewports have
been pushed). The second viewport corresponds to
the base “figure” region and is relative to the inner
region; it only makes sense to push it after the “in-
ner” viewport has been pushed. The third viewport
corresponds to the base “plot” region and is relative
to the figure region; it only makes sense to push it af-
ter the other two viewports have been pushed in the
correct order.
[Figure 1: Annotating a base plot with grid.text() — a barplot with rotated margin labels “one” through “ten”.]
A simple application of this facility involves
adding text to the margins of a base plot at an arbi-
trary orientation. The base function mtext() allows
text to be located in terms of a number of lines away
from the plot region, but only at rotations of 0 or 90
degrees. The base text() function allows arbitrary
rotations, but only locates text relative to the user co-
ordinate system in effect in the plot region (which is
inconvenient for locating text in the margins of the
plot). By contrast, the grid function grid.text() al-
lows arbitrary rotations and can be used in any grid
viewport. In the following code we first create a base
plot, leaving off the tick labels.
> midpts <- barplot(1:10, axes = FALSE)
> axis(2)
> axis(1, at = midpts, labels = FALSE)
Next we use baseViewports() to create grid view-
ports that correspond to the base plot and we push
those viewports1.
> vps <- baseViewports()
> par(new = TRUE)
> push.viewport(vps$inner, vps$figure,
+ vps$plot)
Finally, we draw rotated labels using grid.text()
(and pop the viewports to clean up after ourselves).
The final plot is shown in Figure 1.
1The par(new=TRUE) is necessary currently because the first grid action will try to move to a new page; it should be possible to remove
this step in future versions of R.
> grid.text(c("one", "two", "three",
+ "four", "five", "six", "seven",
+ "eight", "nine", "ten"),
+ x = unit(midpts, "native"),
+ y = unit(-1, "lines"), just = "right",
+ rot = 60)
> pop.viewport(3)
The next example is a bit more complicated be-
cause it involves embedding grid viewports within
a base graphics plot. The dataset is a snapshot of
wind speed, wind direction, and temperature at sev-
eral weather stations in the South China Sea, south
west of Japan2. grid is used to produce novel plot-
ting symbols for a standard base plot.
First of all, we need to define the novel plotting
symbol. This consists of a dot at the data location,
with a thermometer extending “below” and an ar-
row extending “above”. The thermometer is used to
encode temperature and the arrow is used to indicate
wind speed (both scaled to [0, 1]).
> novelsym <- function(speed,
+ temp, width = unit(3, "mm"),
+ length = unit(0.5, "inches")) {
+ grid.rect(height = length,
+ y = 0.5, just = "top",
+ width = width,
+ gp = gpar(fill = "white"))
+ grid.rect(height = temp *
+ length, y = unit(0.5,
+ "npc") - length, width = width,
+ just = "bottom",
+ gp = gpar(fill = "grey"))
+ grid.arrows(x = 0.5,
+ y = unit.c(unit(0.5, "npc"),
+ unit(0.5, "npc") +
+ speed * length),
+ length = unit(3, "mm"),
+ type = "closed",
+ gp = gpar(fill = "black"))
+ grid.points(unit(0.5, "npc"),
+ unit(0.5, "npc"), size = unit(2,
+ "mm"), pch = 16)
+ }
Now we read in the data and generate a base plot,
but plot no points.
> chinasea <- read.table("chinasea.txt",
+ header = TRUE)
> plot(chinasea$lat, chinasea$long,
+ type = "n", xlab = "latitude",
+ ylab = "longitude",
+ main = "China Sea ...")
Now we use baseViewports() to align a grid view-
port with the plot region, and draw the symbols by
creating a grid viewport per (x, y) location (we rotate
the viewport to represent the wind direction). The fi-
nal plot is shown in Figure 2.
> speed <- 0.8 * chinasea$speed/14 +
+ 0.2
> temp <- chinasea$temp/40
> vps <- baseViewports()
> par(new = TRUE)
> push.viewport(vps$inner, vps$figure,
+ vps$plot)
> for (i in 1:25) {
+ push.viewport(viewport(
+ x = unit(chinasea$lat[i],
+ "native"),
+ y = unit(chinasea$long[i],
+ "native"),
+ angle = chinasea$dir[i]))
+ novelsym(speed[i], temp[i])
+ pop.viewport()
+ }
> pop.viewport(3)
[Figure 2: Using grid to draw novel symbols on a base plot — “China Sea Wind Speed/Direction and Temperature”, with longitude plotted against latitude.]
Embedding base graphics plots in
grid viewports
gridBase provides several functions for adding base
graphics output to grid output. There are three func-
tions that allow base plotting regions to be aligned
with the current grid viewport; this makes it possi-
ble to draw one or more base graphics plots within a
grid viewport. The fourth function provides a set of
2Obtained from the CODIAC web site: http://www.joss.ucar.edu/codiac/codiac-www.html. The file chinasea.txt is in the
gridBase/doc directory.
graphical parameter settings so that base par() set-
tings can be made to correspond to some of3 the cur-
rent grid graphical parameter settings.
The first three functions are gridOMI(),
gridFIG(), and gridPLT(). They return the appro-
priate par() values for setting the base “inner”, “fig-
ure”, and “plot” regions, respectively.
The main usefulness of these functions is to allow
you to create a complex layout using grid and then
draw a base plot within relevant elements of that lay-
out. The following example uses this idea to create
a lattice plot where the panels contain dendrograms
drawn using base graphics functions4.
First of all, we create a dendrogram and cut it into
four subtrees5.
> library(mva)
> data(USArrests)
> hc <- hclust(dist(USArrests),
+ "ave")
> dend1 <- as.dendrogram(hc)
> dend2 <- cut(dend1, h = 70)
Now we create some dummy variables which corre-
spond to the four subtrees.
> x <- 1:4
> y <- 1:4
> height <- factor(round(unlist(
+ lapply(dend2$lower,
+ attr, "height"))))
Next we define a lattice panel function to draw the
dendrograms. The first thing this panel function
does is push a viewport that is smaller than the view-
port lattice creates for the panel; the purpose is to en-
sure there is enough room for the labels on the den-
drogram. The space variable contains a measure of
the length of the longest label. The panel function
then calls gridPLT() and makes the base plot region
correspond to the viewport we have just pushed. Fi-
nally, we call the base plot() function to draw the
dendrogram (and pop the viewport we pushed)6.
> space <- max(unit(rep(1, 50),
+ "strwidth",
+ as.list(rownames(USArrests))))
> dendpanel <- function(x, y,
+ subscripts, ...) {
+ push.viewport(viewport(y = space,
+ width = 0.9, height = unit(0.9,
+ "npc") - space,
+ just = "bottom"))
+ grid.rect(gp = gpar(col = "grey",
+ lwd = 5))
+ par(plt = gridPLT(), new = TRUE,
+ ps = 10)
+ plot(dend2$lower[[subscripts]],
+ axes = FALSE)
+ pop.viewport()
+ }
Finally, we draw a lattice xyplot, using lattice to set
up the arrangement of panels and strips and our
panel function to draw a base dendrogram in each
panel. The final plot is shown in Figure 3.
> library(lattice)
> xyplot(y ~ x | height, subscripts = TRUE,
+ xlab = "", ylab = "",
+ strip = function(...) {
+ strip.default(style = 4,
+ ...)
+ }, scales = list(draw = FALSE),
+ panel = dendpanel)
[Figure 3: Adding base dendrograms to a lattice plot — four panels (subtree heights 39, 44, 45, 55), each containing a base dendrogram of US states.]
The gridPLT() function is useful for embedding
just the plot region of a base graphics function (i.e.,
without labels and axes; another example of this us-
age is given in the next section). If labelling and axes
are to be included it will make more sense to use
gridFIG(). The gridOMI() function has pretty much
the same effect as gridFIG() except that it allows for
the possibility of embedding multiple base plots at
once. In the following code, a lattice plot is placed
alongside base diagnostic plots arranged in a 2-by-2
array.
We use the data from page 93 of “An Introduc-
tion to Generalized Linear Models” (Annette Dob-
son, 1990).
3Only lwd, lty and col are available so far. More should be available in future versions.
4Recall that lattice is built on grid so the panel region in a lattice plot is a grid viewport.
5The data and cluster analysis are copied from the example in help(plot.dendrogram).
6The grid.rect() call is just to show the extent of the extra viewport we pushed.
> counts <- c(18, 17, 15, 20,
+ 10, 20, 25, 13, 12)
> outcome <- gl(3, 1, 9)
> treatment <- gl(3, 3)
We create two regions using grid viewports; the left
region is for the lattice plot and the right region is
for the diagnostic plots. There is a middle column of
1cm to provide a gap between the two regions.
> push.viewport(viewport(
+ layout = grid.layout(1,
+ 3, widths = unit(rep(1,
+ 3), c("null", "cm",
+ "null")))))
We draw a lattice plot in the left region.
> push.viewport(viewport(
+ layout.pos.col = 1))
> library(lattice)
> bwplot <- bwplot(counts ~ outcome |
+ treatment)
> print(bwplot, newpage = FALSE)
> pop.viewport()
We draw the diagnostic plots in the right region.
Here we use gridOMI() to set the base inner re-
gion and par(mfrow) and par(mfg) to insert multi-
ple plots7. The final plot is shown in Figure 4.
> push.viewport(viewport(layout.pos.col = 3))
> glm.D93 <- glm(counts ~ outcome +
+ treatment, family = poisson())
> par(omi = gridOMI(), mfrow = c(2,
+ 2), new = TRUE)
> par(cex = 0.5, mar = c(5, 4,
+ 1, 2))
> par(mfg = c(1, 1))
> plot(glm.D93, caption = "",
+ ask = FALSE)
> pop.viewport(2)
Notice that because there is only ever one cur-
rent grid viewport, it only makes sense to use one
of gridOMI(), gridFIG(), or gridPLT(). In other
words, it only makes sense to align either the inner
region, or the figure region, or the plot region with
the current grid viewport.
A more complex example
We will now look at a reasonably complex exam-
ple involving embedding base graphics within grid
viewports which are themselves embedded within a
base plot. This example is motivated by the follow-
ing problem8:
I am looking at a way of plotting a se-
ries of pie charts at specified locations on
an existing plot. The size of the pie chart
would be proportional to the magnitude of
the total value of each vector (x) and the
values in x are displayed as the areas of
pie slices.
First of all, we construct some fake data, consist-
ing of four (x, y) values, and four (z1, z2) values:
> x <- c(0.88, 1, 0.67, 0.34)
> y <- c(0.87, 0.43, 0.04, 0.94)
> z <- matrix(runif(4 * 2), ncol = 2)
Before we start any plotting, we save the current
par() settings so that at the end we can “undo” some
of the complicated settings that we need to apply.
> oldpar <- par(no.readonly = TRUE)
Now we do a standard base plot of the (x, y) values,
but do not plot anything at these locations (we’re just
setting up the user coordinate system).
> plot(x, y, xlim = c(-0.2, 1.2),
+ ylim = c(-0.2, 1.2), type = "n")
Now we make use of baseViewports. This will cre-
ate a list of grid viewports that correspond to the
inner, figure, and plot regions set up by the base
plot. By pushing these viewports, we establish a grid
viewport that aligns exactly with the plot region cre-
ated by the base plot, including a (grid) “native” co-
ordinate system that matches the (base) user coordi-
nate system9.
> vps <- baseViewports()
> par(new = TRUE)
> push.viewport(vps$inner, vps$figure,
+ vps$plot)
> grid.segments(x0 = unit(c(rep(0,
+ 4), x), rep(c("npc", "native"),
+ each = 4)), x1 = unit(c(x,
+ x), rep("native", 8)), y0 = unit(c(y,
+ rep(0, 4)), rep(c("native",
+ "npc"), each = 4)), y1 = unit(c(y,
+ y), rep("native", 8)),
+ gp = gpar(lty = "dashed",
+ col = "grey"))
Before we draw the pie charts, we need to perform
a couple of calculations to determine their size. In
this case, we specify that the largest pie will be 1" in
diameter and the others will be a proportion of that
size based on $\sum_i z_{.i} / \max(\sum_i z_{.i})$.
7We use par(mfrow) to specify the 2-by-2 array and par(mfg) to start at position (1, 1) in the array.
8This description is from an email to R-help from Adam Langley, 18 July 2003.
9The grid.segments call is just drawing some dashed lines to show that the pie charts we end up with are centred correctly at the
appropriate (x, y) locations.
[Figure 4: Drawing multiple base plots within a grid viewport — a lattice bwplot of counts by outcome and treatment alongside a 2-by-2 array of glm diagnostic plots (residuals vs fitted values, normal Q-Q, standardized residuals vs predicted values, and Cook’s distance).]
> maxpiesize <- unit(1, "inches")
> totals <- apply(z, 1, sum)
> sizemult <- totals/max(totals)
We now enter a loop to draw a pie at each (x, y) loca-
tion representing the corresponding (z1, z2) values.
The first step is to create a grid viewport at the (x,y)
location, then we use gridPLT() to set the base plot
region to correspond to the grid viewport. With that
done, we can use the base pie function to draw a pie
chart within the grid viewport10.
> for (i in 1:4) {
+ push.viewport(viewport(x = unit(x[i],
+ "native"), y = unit(y[i],
+ "native"), width = sizemult[i] *
+ maxpiesize, height = sizemult[i] *
+ maxpiesize))
+ grid.rect(gp = gpar(col = "grey",
+ fill = "white", lty = "dashed"))
+ par(plt = gridPLT(), new = TRUE)
+ pie(z[i, ], radius = 1,
+ labels = rep("", 2))
+ pop.viewport()
+ }
Finally, we clean up after ourselves by popping the
grid viewports and restoring the initial par settings.
> pop.viewport(3)
> par(oldpar)
The final plot is shown in Figure 5.
[Figure 5: Base pie charts drawn within grid viewports, which are embedded within a base plot of x and y.]
Problems and limitations
The functions provided by the gridBase package al-
low the user to mix output from two quite different
graphics systems and there are limits to how much
the systems can be combined. It is important that
users are aware that they are mixing two not wholly
compatible systems (which is why these functions
are provided in a separate package) and it is of course
important to know what the limitations are:
• The gridBase functions attempt to match grid
10We draw a grid.rect with a dashed border just to show the extent of each grid viewport. It is crucial that we again call par(new=TRUE)
so that we do not move on to a new page.
graphics settings with base graphics settings
(and vice versa). This is only possible under
certain conditions. For a start, it is only possi-
ble if the device size does not change. If these
functions are used to draw into a window and
the window is then resized, the base and grid set-
tings will almost certainly no longer match and
the graph will become a complete mess. This
also applies to copying output between devices
of different sizes.
• It is not possible to embed base graphics output
within a grid viewport that is rotated.
• There are certain base graphics functions which
modify settings like par(omi) and par(fig)
themselves (e.g., coplot()). Output from these
functions may not embed properly within grid
viewports.
• grid output cannot be saved and restored so
any attempts to save a mixture of grid and base
output are likely to end in disappointment.
Summary
The functions in the gridBase package provide a sim-
ple mechanism for combining base graphics output
with grid graphics output for static, fixed-size plots.
This is not a full integration of the two graphics sys-
tems, but it does provide a useful bridge between the
existing large body of base graphics functions and
the powerful new features of grid.
Availability
The grid package is now part of the base distri-
bution of R (from R version 1.8.0). Additional in-
formation on grid is available from: http://www.
stat.auckland.ac.nz/~paul/grid/grid.html. The
gridBase package is available from CRAN (e.g.,
http://cran.us.r-project.org).
Bibliography
P. Murrell. The grid graphics package. R News, 2(2):
14–19, June 2002. URL http://CRAN.R-project.org/doc/Rnews/.
D. Sarkar. Lattice. R News, 2(2):19–23, June 2002. URL
http://CRAN.R-project.org/doc/Rnews/.
Paul Murrell
University of Auckland, NZ
paul@stat.auckland.ac.nz
A New Package for the General Error
Distribution
The normalp package
Angelo M. Mineo
Introduction
The General Error Distribution, whose first formula-
tion could be ascribed to the Russian mathematician
Subbotin (1923), is a general distribution for random
errors. To derive this random error distribution, Sub-
botin extended the two axioms used by Gauss to de-
rive the usual normal (Gaussian) error distribution,
by generalizing the first one. Subbotin used the fol-
lowing axioms:
1. The probability of an error ε depends only on
the magnitude of the error itself and can be ex-
pressed by a function ϕ(ε) with continuous
first derivative almost everywhere.
2. The most likely value of a quantity, for which
direct measurements xi are available, must not
depend on the adopted unit of measure.
In this way Subbotin obtains the probability distribu-
tion with the following density function:
$$\varphi(\varepsilon) = \frac{mh}{2\,\Gamma(1/m)} \exp\left[-h^m |\varepsilon|^m\right]$$
with −∞ < ε < +∞, h > 0 and m ≥ 1. This dis-
tribution is also known as Exponential Power Distri-
bution and it has been used, for example, by Box and
Tiao (1992) in Bayesian inference. In the Italian sta-
tistical literature, a different parametrization of this
distribution has been derived by Lunetta (1963), who
followed the procedure introduced by Pearson (1895)
to derive new probability distributions, solving this
differential equation
$$\frac{d\log f}{dx} = p\,\frac{\log f - \log a}{x - c}$$
and obtaining a distribution with the following prob-
ability density function
$$f(x) = \frac{1}{2\sigma_p\, p^{1/p}\,\Gamma(1+1/p)} \exp\left(-\frac{|x-\mu|^p}{p\,\sigma_p^p}\right)$$
with −∞ < x < +∞ and −∞ < µ < +∞, σp > 0
and p ≥ 1. This distribution is known as the order
p normal distribution (Vianelli, 1963). It is easy to
see how this distribution is characterized by three
parameters: µ is the location parameter, σp is the
scale parameter and p is the structure parameter. By
changing the structure parameter p, we can recog-
nize some known probability distributions: for ex-
ample, for p = 1 we have the Laplace distribution,
for p = 2 we have the normal (Gaussian) distribu-
tion, for p → +∞ we have the uniform distribution.
A graphical description of some normal of order p
curves is in figure 1 (this plot has been made with
the command graphnp() of the package normalp).
[Figure 1: Normal of order p curves (density f(x) against x for p = 1, 1.5, 2, 3).]
In this paper we present the main functions of the
package normalp and some examples of their use.
The normalp functions
The package contains the four classical functions
dealing with the computation of the density func-
tion, the distribution function, the quantiles and the
generation of pseudo-random observations from an
order p normal distribution. Some examples related
to the use of these commands are the following:
> dnormp(3, mu = 0, sigmap = 1, p = 1.5,
+ log = FALSE)
[1] 0.01323032
> pnormp(0.5, mu = 2, sigmap = 3, p = 1.5)
[1] 0.3071983
> qnormp(0.3071983, mu = 2, sigmap = 3,
+ p = 1.5)
[1] 0.5
> rnormp(6, mu = 2, sigmap = 5, p = 2.5)
[1] 3.941597 -1.943872 -2.498598
[4] 1.869880 6.709037 14.873287
In case of generation of pseudo-random numbers we
have implemented two methods: one, faster, based
on the relationship linking an order p normal distri-
bution and a gamma distribution (see Lunetta, 1963),
and one based on the generalization of the Marsaglia
(1964) method to generate pseudo-random numbers
from a normal distribution. Chiodi (1986) describes
how the representation of the order p normal distri-
bution as a generalization of a normal (Gaussian) dis-
tribution can be used for simulation of random vari-
ates.
Another group of functions concerns the estima-
tion of the order p normal distribution parameters.
To estimate the structure parameter p, an estimation
method based on an index of kurtosis is used; in par-
ticular, the function estimatep() formulates an esti-
mate of p based on the index VI given by
$$VI = \frac{\sqrt{\mu_2}}{\mu_1} = \frac{\sqrt{\Gamma(1/p)\,\Gamma(3/p)}}{\Gamma(2/p)},$$
by comparing its theoretical value and the empirical
value computed on the sample. For a comparison
between this estimation method and others based on
the likelihood approach see Mineo (2003). With the
function kurtosis() it is possible to compute the
theoretical values of, besides VI, β2 and βp given by
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\Gamma(1/p)\,\Gamma(5/p)}{[\Gamma(3/p)]^2}, \qquad
\beta_p = \frac{\mu_{2p}}{\mu_p^2} = p + 1.$$
Moreover, it is possible to compute the empirical val-
ues of these indexes given by
$$\widehat{VI} = \frac{\sqrt{n\sum_{i=1}^{n}(x_i-M)^2}}{\sum_{i=1}^{n}|x_i-M|},$$
$$\hat{\beta}_2 = \frac{n\sum_{i=1}^{n}(x_i-M)^4}{\left[\sum_{i=1}^{n}(x_i-M)^2\right]^2},$$
$$\hat{\beta}_p = \frac{n\sum_{i=1}^{n}|x_i-M|^{2p}}{\left[\sum_{i=1}^{n}|x_i-M|^p\right]^2}.$$
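For instance, a brief sketch of estimating the parameters from a single sample (the argument names of estimatep() and paramp() used below are assumptions; see their help pages):
> x <- rnormp(200, mu = 2, sigmap = 3, p = 3)
> estimatep(x, mu = mean(x)) # kurtosis-based estimate of p
> paramp(x) # estimates of mu, sigma_p and p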
Concerning the estimation of the location param-
eter µ and the scale parameter σp, we have used
the maximum likelihood method, conditional on
the estimate of p that we obtain from the function
estimatep(). The function we have to use in this
case is paramp(). We have also implemented a func-
tion simul.mp(), that allows a simulation study to
verify the behavior of the estimators used for the es-
timation of the parameters µ, σp and p. The com-
pared estimators are: the arithmetic mean and the
maximum likelihood estimator for the location pa-
rameter µ, the standard deviation and the maximum
likelihood estimator for the scale parameter σp; for
the structure parameter p we used the estimation
method implemented by estimatep(). Through the
function plot.simul.mp() it is possible to see graph-
ically the behavior of the estimators. A possible use
of the function simul.mp() is the following:
> res <- simul.mp(n = 30, m = 1000, mu = 2,
+ sigmap = 3, p = 3)
> res
Mean Mp Sd
Mean 1.9954033 1.9991151 2.60598964
Variance 0.2351292 0.2849199 0.08791664
Sp p
Mean 2.9348828 3.415554
Variance 0.5481126 7.753024
N. samples with a difficult convergence: 26
> plot(res)
The command plot(res) will produce a histogram
for every set of estimates created by the function
simul.mp(). In figure 2 we have the histogram for
p̂. For more details see Mineo (1995-a).
[Figure 2: Histogram of p̂ obtained with the command plot.simul.mp(res).]
It is also possible to estimate linear regression
models when we make the hypothesis of random er-
rors distributed according to an order p normal dis-
tribution. The function we have to use in this case
is lmp(), which we can use like the function lm()
from the base package. In fact, the function lmp()
returns a list with all the most important results
drawn from a linear regression model with errors
distributed as a normal of order p curve; moreover,
it returns an object that can form the argument of the
functions summary.lmp() and plot.lmp(): the func-
tion summary.lmp() returns a summary of the main
obtained results, while the function plot.lmp() re-
turns a set of graphs that in some way reproduces the
analysis of residuals that we usually conduct when
we estimate a linear regression model with errors
distributed as a normal (Gaussian) distribution.
[Figure 3: Plot of the data considered in the data frame cyclist (distance against space).]
To show an example of use of these functions, we
considered a data set reported in Devore (2000). In
this data set (see figure 3) the distance between a cy-
clist and a passing car (variable distance) and the
distance between the centre line and the cyclist in the
bike lane (variable space) have been recorded for each
of ten streets; by considering the variable distance
as a dependent variable and the variable space as an
independent variable, we produce the following ex-
ample:
> data(ex12.21, package = "Devore5")
> res <- lmp(distance ~ space,
+ data = ex12.21)
> summary(res)
Call:
lmp(formula = distance ~ space,
data = ex12.21)
Residuals:
Min 1Q Median 3Q Max
-0.7467 -0.5202 0.0045 0.3560 0.8363
Coefficients:
(Intercept) space
-2.4075 0.6761
Estimate of p
1.353972
Power deviation of order p: 0.6111
> plot(res)
In figure 4 we show one of the four graphs that we
have obtained with the command plot(res).
[Figure 4: First graph obtained by using the command plot.lmp(res) — residuals vs fitted values.]
Also for a linear regression model with errors
distributed as an order p normal distribution we
have implemented a set of functions that allow a
simulation study to test graphically the suitabil-
ity of the estimators used. The main function is
simul.lmp(); besides this function, we have im-
plemented the functions summary.simul.lmp() and
plot.simul.lmp() that allow respectively to visual-
ize a summary of the results obtained from the func-
tion simul.lmp() and to show graphically the be-
havior of the produced estimates. A possible use of
these functions is the following:
> res <- simul.lmp(10, 500, 1, data = 1.5,
+ int = 1, sigmap = 1, p = 3, lp = FALSE)
> summary(res)
Results:
(intercept) x1
Mean 0.9959485 1.497519
Variance 0.5104569 1.577187
Sp p
Mean 0.90508039 3.196839
Variance 0.04555003 11.735883
Coefficients: (intercept) x1
1.0 1.5
Formula: y ~ +x1
Number of samples: 500
Value of p: 3
N. of samples with problems on convergence 10
> plot(res)
In figure 5 the result of plot(res) is shown. For
more details see Mineo (1995-b).
[Figure 5: Histograms of the intercept, x1, Sp and p estimates obtained with the command plot.simul.lmp(res).]
Besides the described functions, we have im-
plemented two graphical functions. The command
graphnp() allows visualization of up to five different
order p normal distributions: this is the command
used to obtain the plot in figure 1. The command
qqnormp() allows drawing a Quantile-Quantile plot
to check graphically if a set of observations follows a
particular order p normal distribution. Close to this
function is the command qqlinep() that sketches a
line passing through the first and the third quartile
of the theoretical order p normal distribution, line
sketched on a normal of order p Q-Q plot derived
with the command qqnormp(). In figure 6 there is a
graph produced by using these two functions.
[Figure 6: Normal of order p Q-Q plot (p = 3).]
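A minimal sketch of their use (the argument names are assumptions; see the corresponding help pages):
> y <- rnormp(100, mu = 0, sigmap = 1, p = 3)
> qqnormp(y, p = 3) # Q-Q plot against an order 3 normal distribution
> qqlinep(y, p = 3) # line through the first and third quartiles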
Conclusion
In this article we have described the use of the new
package normalp, that implements some useful com-
mands where we have observations drawn from an
order p normal distribution, also known as the general
error distribution. The implemented functions deal
essentially with estimation problems for linear re-
gression models, besides some commands that gen-
eralize graphical tools already implemented in the
package base, related to observations distributed as
a normal (Gaussian) distribution. In the near future
we shall work on the computational improvement of
the code and on the implementation of other com-
mands to make this package still more complete.
Bibliography
G.E.P. Box and G.C. Tiao. Bayesian inference in statis-
tical analysis. Wiley, New York, 1992. First edition
for Addison-Wesley, 1973.
M. Chiodi. Procedures for generating pseudo-
random numbers from a normal distribution of or-
der p (p>1). Statistica Applicata, 1:7-26, 1986.
J.L. Devore. Probability and Statistics for Engineering
and the Sciences (5th edition). Duxbury, California,
2000.
G. Lunetta. Di una generalizzazione dello schema
della curva normale. Annali della Facoltà di Econo-
mia e Commercio dell’Università di Palermo, 17:237-
244, 1963.
G. Marsaglia and T.A. Bray. A convenient method for
generating normal variables. SIAM rev., 6:260-264,
1964.
A.M. Mineo. Stima dei parametri di intensità e di
scala di una curva normale di ordine p (p incog-
nito). Annali della Facoltà di Economia dell’Università
di Palermo (Area Statistico-Matematica), 49:125-159,
1995-a.
A.M. Mineo. Stima dei parametri di regressione
lineare semplice quando gli errori seguono una
distribuzione normale di ordine p (p incognito).
Annali della Facoltà di Economia dell’Università di
Palermo (Area Statistico-Matematica), 49:161-186,
1995-b.
A.M. Mineo. On the estimation of the structure pa-
rameter of a normal distribution of order p. To ap-
pear on Statistica, 2003.
K. Pearson. Contributions to the mathematical the-
ory of evolution. II. Skew variation in homo-
geneous material. Philosophical Transactions of the
Royal Society of London (A), 186:343-414, 1895.
M.T. Subbotin. On the law of frequency of errors.
Matematicheskii Sbornik, 31:296-301, 1923.
S. Vianelli. La misura della variabilità condizionata
in uno schema generale delle curve normali di fre-
quenza. Statistica, 23:447-474, 1963.
Angelo M. Mineo
University of Palermo, Italy
elio.mineo@dssm.unipa.it
Web-based Microarray Analysis using
Bioconductor
by Colin A. Smith
Introduction
The Bioconductor project is an open source effort
which leverages R to develop infrastructure and al-
gorithms for the analysis of biological data, in par-
ticular microarray data. Many features of R, includ-
ing a package-based distribution model, rapid proto-
typing, and selective migration to high performance
implementations lend themselves to the distributed
development model which the Bioconductor project
uses. Users also benefit from the flexible command
line environment which allows integration of the
available packages in unique ways suited to individ-
ual problems.
However, features to one individual may be road-
blocks to another. The use of microarrays for gene
expression profiling and other applications is grow-
ing rapidly. Many biologists who perform these ex-
periments lack the programming experience of the
typical R user and would strongly object to using a
command line interface for their analysis.
Here we present an overview of a web-based in-
terface that attempts to address some of the difficul-
ties facing individuals wishing to use Bioconductor
for their microarray data analysis. It is distributed as
the webbioc package available on the Bioconductor
web site at http://www.bioconductor.org/.
Target audience and interface goals
While targeted at many user types, the web interface
is designed for the lowest common denominator of
microarray users, e.g. a biologist with little computer
savvy and basic, but not extensive, statistical knowl-
edge in areas pertinent to microarray analysis. Note
that although this is the lowest level user targeted by
the web interface, this interface also caters to power
users by exposing finer details and allows flexibility
within the preconstructed workflows.
This article presents only the first version of a
Bioconductor web interface. With luck, more R and
Perl hackers will see fit to add interfaces for more as-
pects of microarray analysis (e.g. two-color cDNA
data preprocessing). To help maintain quality and
provide future direction, a number of user interface
goals have been established.
•Ease of use. Using the web interface, the user
should not need to know how to use either
a command line interface or the R language.
Depending on the technical experience of the
user, R tends to have a somewhat steep learn-
ing curve. The web interface has a short learn-
ing curve and should be usable by any biolo-
gist.
•Ease of installation. After initial installation by a
system administrator on a single system, there
is no need to install additional software on user
computers. Installing and maintaining an R in-
stallation with all the Bioconductor packages
can be a daunting task, often suited to a system
administrator. Using the web interface, only
one such installation needs to be maintained.
•Discoverability. Graphical user interfaces are
significantly more discoverable than command
line interfaces. That is, a user browsing around
a software package is much more likely to dis-
cover and use new features if they are graphi-
cally presented. Additionally, a unified user in-
terface for the different Bioconductor packages
can help add a degree of cohesiveness to what
is otherwise a disparate collection of functions,
each with a different interface. Ideally, a user
should be able to start using the web interface
without reading any external documentation.
•Documentation. Embedding context-sensitive
online help into the interface helps first-time
users make good decisions about which statis-
tical approaches to take. Because of its power,
Bioconductor includes a myriad of options for
analysis. Helping the novice statistician wade
through that pool of choices is an important as-
pect of the web interface.
Another aspect of the target audience is the de-
ployment platform. The web interface is written in
Perl, R, and shell scripts. It requires a Unix-based
operating system. Windows is not supported. It
also uses Netpbm and optionally PBS. For further in-
formation, see the webbioc vignette at http://www.
bioconductor.org/viglistingindex.html.
User-visible implementation deci-
sions
There are a number of existing efforts to create web
interfaces for R, most notably Rweb, which presents
the command line environment directly to the user.
See http://www.math.montana.edu/Rweb/. The Bio-
conductor web interface, on the other hand, entirely
abstracts the command line away from the user. This
results in an entirely different set of design decisions.
The first choice made was the means by which
data is input and handled within the system. In an
R session, data is instantiated as variables which the
user can use and manipulate. However, in the web
interface, the user does not see variables associated
with an R session but rather individual files which
hold datasets, such as raw data, preprocessed data,
and analysis result tables.
Different stages of microarray analysis are di-
vided into individual modules. Each module leads
the user through a series of steps to gather process-
ing parameters. When ready, the system creates
an R script which it either runs locally or submits
to a computer cluster using the Portable Batch Sys-
tem. Any objects to be passed to another module are
saved in individual files.
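As a rough sketch of this design (all function, object, and file names below are invented; the real scripts are generated by the Perl layer with the user's parameters filled in), the R script run for one module might end by saving its result, which the next module then reloads:

# hypothetical script generated for a preprocessing module
load("uploaded.Rda")                  # object written by the upload step
result <- preprocess(uploaded.data)   # 'preprocess' stands in for the real steps
save(result, file = "preprocessed.Rda")

# a later module, e.g. differential expression, would then begin with
load("preprocessed.Rda")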
Another decision which directly impacts the user
experience is that the system does not maintain ac-
counts for individual users. Instead, it uses the con-
cept of a uniquely identified session. When a user
first starts using the web interface, a session is cre-
ated which holds all the uploaded and processed
data. The system provides the user with a session
token comprised of a random string of letters and
numbers. The token allows the user to return to their
session at a future date.
This offers a number of advantages: 1) At the
discretion of the local system administrator, the web
analysis resource can be offered as either a public or
a private resource. Such a restriction can be made at
the web-server level rather than the code level. 2) It
allows rapid development of the web interface with-
out being bogged down in the implementation or in-
tegration of a user infrastructure. 3) As opposed to
having no session whatsoever, this allows a user to
input data only once. Raw data files are often quite
large. Uploading multiple copies of such datasets for
each change in parameters is not desirable.
Lastly, the web interface brings the idea of design-
by-contract used in the Bioconductor project down
to the package level. That is, individual interface
modules are responsible for a specific stage or type
of analysis. Modules may take the user through any
number of steps as long as they use standard input
and output formats. This allows the system to grow
larger and more powerful over time without making
individual components more complex than is neces-
sary to fulfill their function.
Analysis workflow
The web interface is currently limited to process-
ing data from microarray experiments based on the
Affymetrix GeneChip platform. It does however
handle an entire workflow going from raw intensity
values through to annotated lists of differentially ex-
pressed genes.
Affymetrix microarray data comes in the form of
CEL files containing intensity values for individual
probes. Because all processing is done server-side,
that data must first be transferred with the Upload
Manager. While raw data files can each be over ten
megabytes, today’s fast ethernet networks provide
very acceptable performance, with file transfers tak-
ing only a few seconds.
Figure 1: Upload Manager
Affymetrix preprocessing is handled by the affy
Bioconductor package. The core functionality of that
package is captured by only a handful of functions
and thus lends itself to a simple web interface. The
user may choose either the built-in high performance
function for RMA or a custom expression measure.
The custom expression measure also uses additional
plug-in methods from the vsn and gcrma packages,
which leverage the modular design of affy.
Figure 2: Affymetrix Data Preprocessing
There are a number of methods for identifying
differentially expressed genes. The web interface
currently uses basic statistical tests (t-tests, F-tests,
Figure 3: Differential Expression and Multiple Testing
etc.) combined with multiple testing procedures for
error control of many hypotheses. These are im-
plemented in the multtest package. Additionally,
the web interface automatically produces a number
of diagnostic plots common to microarray analysis.
Those include M vs. A (log fold change vs. overall
expression) and normal quantile-quantile plots.
The web interface completes the workflow by
producing tables with integrated results and meta-
data annotation. Where appropriate, the annota-
tion links to other online databases including a chro-
mosome viewer, PubMed abstracts, Gene Ontology
trees, and biochemical pathway schematics. The
metadata presentation is handled by annaffy, an-
other Bioconductor package.
In addition to presenting metadata, the web inter-
face provides facilities for searching that metadata.
For instance, it is trivial to map a set of GenBank
accession numbers onto a set of Affymetrix probe-
set ids or find all genes in a given Gene Ontology
branch. This assists biologists in making specific
hypotheses about differential gene expression while
maintaining strong control over the error rate.
Lastly, because the web interface stores inter-
mediate data as R objects, users of Bioconductor
through either the command line or web interface
can easily exchange data back and forth. Data ex-
change is currently limited to exprSet objects, which
is the standard class for microarray data in Biocon-
ductor. Future development of the interface should
yield more data exchange options enabling novel col-
laboration between R users and non-users alike.
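As a minimal sketch of such an exchange (the file name, the object name eset, and the class labels are invented here; the calls assume the Biobase and multtest packages are installed), a command-line user could pick up a preprocessed exprSet saved by the web interface and repeat the testing step described above:

library(Biobase)    # provides the exprSet class
library(multtest)
load("eset.Rda")    # hypothetical file holding the exprSet 'eset'
cl <- c(0, 0, 0, 1, 1, 1)    # made-up two-group class labels
tstat <- mt.teststat(exprs(eset), classlabel = cl, test = "t")
rawp <- 2 * pt(-abs(tstat), df = length(cl) - 2)    # two-sided p-values
adjp <- mt.rawp2adjp(rawp, proc = "BH")             # Benjamini-Hochberg adjustment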
Final thoughts
An important consideration worthy of discussion is
the inherent lack of flexibility in graphical user inter-
faces. The R command line interface does not box
one into pre-scripted actions in the way that the web
interface does. It allows one to exercise much more
creativity in analysis and take more non-linear ap-
proaches. In the GUI, trivial questions may be impos-
sible to answer simply because of unforeseen limita-
tions.
There are, however, a number of strengths in the
web interface beyond what is available in R. The
aforementioned interface goals are good examples of
this. Additionally, the web interface can help reduce
errors by distilling long series of typed commands
into simple point-and-click options. All actions and
parameters are tracked in a log for verification and
quality control.
Secondly, the web interface easily integrates into
existing institutional bioinformatics resources. The
web has been widely leveraged to bring univer-
Figure 4: Annotated Results and Online Database Links
sally accessible interfaces to common command-line
bioinformatics tools. The system presented here can
sit right next to those tools on a web site. Because it
already uses PBS for dispatching computational jobs,
the web interface can take advantage of existing com-
puter clusters built for genomic search tools, such as
BLAST, and can scale to many simultaneous users.
The web interface has been deployed and is cur-
rently in use by two research groups. One group is
split between institutions located in different states.
They use common session tokens and collaborate by
sharing data and analysis results over the web.
Lastly, Bioconductor has implementations of a
number of algorithms not otherwise freely available.
Some newer algorithms have been exclusively imple-
mented in Bioconductor packages. The web interface
helps bring such innovations to the mainstream. It
may even whet the appetite of some users, convincing
them to take the plunge and learn R.
Colin A. Smith
NASA Center for Computational Astrobiology and Fun-
damental Biology
webbioc@colinsmith.org
Sweave, Part II: Package Vignettes
Reading, writing and interacting with R package
primers in Sweave format.
by Friedrich Leisch
This is the second article in a two-part miniseries on
Sweave (Leisch, 2002a), a tool that allows embed-
ding of R code in LaTeX documents so that code, re-
sults, and descriptions are presented in a consistent
way. The first article (Leisch, 2002b) introduced the
Sweave file format and the R functions to process it,
and demonstrated how to use Sweave as a report-
ing tool for literate statistical practice. This article
will concentrate on how to use files in Sweave format
as primers or manuals for R packages, so that users
have direct access to the code examples shown and
so that all code can be checked automatically for syn-
tax errors and for consistency of the usage descrip-
tion with the implementation.
R package documentation
The main vehicles for documenting R packages are
the help files, which are the sources, written in R doc-
umentation (Rd) format, of what you see after calling
help() on a topic. These files, which are divided into
sections, typically contain code in just two sections:
usage and examples. All examples in the R help files
are, by default, required to be executable such that
the user can copy & paste the code to a running R
process using
• the mouse,
• keyboard shortcuts if running R inside Emacs
with ESS, or
• R’s example() function.
Examples should be flagged as non-executable only
if there are good reasons for this, e.g. because they
require user interactivity like identify() and hence
cannot be executed in batch mode.
The tools for package quality control, available
through the R CMD check command (on Windows,
R CMD xxx is Rcmd xxx), test if all the
examples in the package can be successfully exe-
cuted. Furthermore, the code in the usage section is
compared with the actual implementation to check
for inconsistencies and for missing documentation.
The Rd file format was designed for refer-
ence documentation on single R objects (functions,
classes, data sets, . . . ). It is not intended for demon-
strating the interaction of multiple functions in a
package. For this task we have developed the con-
cept of package vignettes — short to medium-sized
documents explaining parts or all of the functionality
of a package in a more informal tone than the strict
format of reference help pages.
Reading vignettes
Books like Venables and Ripley (2002) that describe
the use of S for data analysis typically contain a mix-
ture of documentation, code, and output. Short doc-
uments in this style are ideally suited to explaining
the functionality of a package to new users. The di-
rectory ‘inst/doc’ of an R source package may con-
tain such package documentation in any format, al-
though we recommend PDF files because of their
platform independence.
We call a user guide located in ‘inst/doc’ a vi-
gnette only when it is a document from which the
user can extract the R code and interact with it.
Sweave is the only format for such documents that
is currently supported by R; there may be others in
the future. In short: every vignette is a user guide,
but not every user guide is a vignette.
Command line interface
Starting with R version 1.8.0 there is support in base
R for listing and viewing package vignettes. The
vignette() function works similarly to data() and
demo(). If no argument is given, a list of all vignettes
in all installed packages is returned — see the exam-
ple R session in Figure 1.
Do not be surprised if this list is rather short or
even empty on your computer. At present only a
few of the packages on CRAN have vignettes. For
Bioconductor we have made package vignettes a re-
quirement and thus all Bioconductor packages pro-
vide one or more vignettes.
Following the title of each vignette listed in Fig-
ure 1, you will see in parentheses a list of the formats
that are available. In Figure 1 all the vignettes are
available in both source and PDF format. To view
the strucchange-intro vignette, all you need to do
is to issue
R> vignette("strucchange-intro")
and the PDF file is opened in your favorite PDF
reader (exactly which PDF reader depends on the
platform that you use). If the source file for a vignette
is available, one can easily extract the code shown
in the vignette, although we have not yet fully auto-
mated the procedure. First we get the full path to the
vignette directory
R> vigdir =
+ system.file("doc", package="strucchange")
and then we examine the names of the files it con-
tains
R> vignette()
Vignettes in package 'AnnBuilder':

AnnBuilder          AnnBuilder Basic (source, pdf)
HowTo               AnnBuilder HowTo (source, pdf)

Vignettes in package 'Biobase':

Biobase             Biobase Primer (source, pdf)
Bioconductor        Howto Bioconductor (source, pdf)
HowTo               HowTo HowTo (source, pdf)
esApply             esApply Introduction (source, pdf)
...

Vignettes in package 'strucchange':

strucchange-intro   strucchange: An R Package for Testing for
                    Structural Change in Linear Regression Models
                    (source, pdf)
...

Figure 1: Usage of vignette() to list available package vignettes.
R> list.files(vigdir)
[1] "00Index.dcf"
[2] "strucchange-intro.R"
[3] "strucchange-intro.Rnw"
[4] "strucchange-intro.pdf"
[5] "strucchange-intro.tex"
File ‘strucchange-intro.Rnw’ is the original Sweave
file, ‘strucchange-intro.R’ has the extracted R code for
all code chunks and could now be executed using
source() or opened in your favorite editor. If the
‘.R’ file is not available, we can create it in the cur-
rent working directory by
R> library("tools")
R> vig = listFilesWithType(vigdir, "vignette")
R> Stangle(vig[1])
Writing to file strucchange-intro.R
where listFilesWithType() returns the full path to
all files in vigdir that have type "vignette", i.e., an
extension marking them as Sweave files.
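The extracted file can then be run directly from R, for example by

R> source("strucchange-intro.R", echo = TRUE)

which executes the code chunks in the order in which they appear in the vignette (assuming the packages used by the vignette, here strucchange, are installed).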
Graphical user interface
The simplest way to access vignettes is probably
through the HTML help system. If you execute
help.start() and click your way to a package con-
taining vignettes, then you will see, at the beginning
of the package’s table of contents, a link to an index
of all the available vignettes. In addition there is a
link to the directory containing the vignettes so that,
for example, you could use your browser to examine
the source files.
A more advanced interface to package vignettes
is available in the Bioconductor package tkWid-
gets, available from http://www.bioconductor.org.
Function vExplorer() lists all available vignettes in
a nice point & click menu. For example, after select-
ing the strucchange vignette, the upper left window
shown in Figure 2 is opened. The PDF version of
the vignette can be opened by clicking on the “View
PDF” button. Each code chunk of the vignette has
a button on the left side of the window; clicking on
the button shows the code in the "R Source Code"
text field. The code can be executed and the resulting
output is shown in the “Results of Execution” area.
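If you want to try this yourself, the explorer is started with two commands (assuming the tkWidgets package and its dependencies are installed):

R> library("tkWidgets")
R> vExplorer()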
The most powerful feature of this kind of inter-
face is that the S code in the source code field can
be modified by the user, e.g., to try variations of the
pre-fabricated examples. To modify the example, one
simply edits the code in the "R Source Code" area and
presses the "Execute Code" button again.
Dynamic statistical documents and their user in-
terfaces are an open research area, see also Buttrey
et al. (2001) and Sawitzki (2002) for other approaches.
Writing vignettes
Once the Sweave file is written (we cannot do that for
you), it is almost trivial to include it in an R package
and make it available to users by the tools described
above. Say you have written a file ‘foo.Rnw’ to be
used as a vignette for package foo. First you need to
add some meta-information to the file along the lines
of
% \VignetteIndexEntry{An R Package for ...}
% \VignetteDepends{foo, bar, ...}
% \VignetteKeyword{kwd1}
% \VignetteKeyword{kwd2}
Figure 2: Screenshot of vExplorer() showing the vignette from package strucchange: main controls for code
chunk execution (upper left), currently active R graphics window (lower left) and a pdf viewer (right).
All of these should be in LaTeX comments as we have
not defined them as proper LaTeX commands. The in-
dex entry is used for the listings of vignette() or
vExplorer(); frequently it is the same as the title
of the document (or an abbreviated version thereof).
Note that it is directly used in text and HTML files
and hence should not contain any TeX markup. The
dependency information is analogous to the Depends
field of a package ‘DESCRIPTION’ file and lists pack-
ages needed to execute the code in the vignette. The
list of \VignetteXXX meta-information specifications
will probably get longer in the near future, especially
for versioning etc.
Once this is done, all you have to do is create
a subdirectory ‘inst/doc’ in your package source tree
and copy ‘foo.Rnw’ to it. All the rest is taken care of
by the R package management system, e.g.
•R CMD check will extract the code from the vi-
gnette and test that it can be executed success-
fully.
•R CMD build will run Sweave() and pdflatex
on the vignette to create the PDF version.
• The package installation mechanism creates an
index of all vignettes in the package and links
it into the HTML help system.
Note that even code chunks with option
eval=FALSE are tested by R CMD check; if you want
to include code that should not be tested in a vi-
gnette, move it to a normal LaTeX verbatim environ-
ment. The reason for this policy is that users
should be able to rely on code examples being exe-
cutable exactly as shown in the vignette.
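As a small illustration (the function name is invented), the first of the following two blocks in a vignette source file is not evaluated by Sweave() but is still extracted and tested by R CMD check, whereas the second is ignored by the R tools:

<<eval=FALSE>>=
x <- expensiveSimulation(runs = 10000)
@

\begin{verbatim}
x <- expensiveSimulation(runs = 10000)
\end{verbatim}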
By including the PDF version in the package
sources it is not necessary that the vignettes can be
compiled at install time, i.e., the package author can
use private LaTeX extensions or BibTeX files. Only the
R code inside the vignettes is part of the checking
procedure; typesetting manuals is not part of pack-
age quality control.
For more details see the manual “Writing R Ex-
tensions”, which features a section on package vi-
gnettes.
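If you want to inspect the typeset document or the extracted code before building the package, you can also run the Sweave tools by hand; a minimal sketch, using the file name from above:

R> library("tools")
R> Sweave("foo.Rnw")    # writes foo.tex plus any generated figures
R> Stangle("foo.Rnw")   # writes foo.R with the extracted code chunks

The resulting ‘foo.tex’ is then processed with latex or pdflatex in the usual way.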
In general it is assumed that package authors
run R CMD build on their machine (and may safely
assume that only they do that). R CMD check
on the other hand should be runnable by ev-
erybody, e.g., CRAN runs a check on all 250+
packages on a daily basis (the results are avail-
able at http://cran.r-project.org/src/contrib/
checkSummary.html). Bioconductor has opted for a
stricter policy such that even building packages (in-
cluding running latex on vignettes) should be re-
producible on every machine which has the neces-
sary tools installed.
Acknowledgements
vignette() and most of R CMD check were written
by Kurt Hornik. vExplorer() and its helper func-
tions were written by Jeff Gentry and Jianhua Zhang
as part of the Bioconductor project. I want to thank
them and Robert Gentleman for helpful ideas and
discussions.
Bibliography
S. E. Buttrey, D. Nolan, and D. T. Lang. An environ-
ment for creating interactive statistical documents.
In E. J. Wegman, A. Braverman, A. Goodman, and
P. Smyth, editors, Computing Science and Statistics,
volume 33. Interface Foundation of North Amer-
ica, Fairfax Station, VA, USA, 2001.
F. Leisch. Sweave: Dynamic generation of statistical
reports using literate data analysis. In W. Härdle
and B. Rönz, editors, Compstat 2002 — Proceedings
in Computational Statistics, pages 575–580. Physika
Verlag, Heidelberg, Germany, 2002a. ISBN
3-7908-1517-9. URL
http://www.ci.tuwien.ac.at/~leisch/Sweave.
F. Leisch. Sweave, part I: Mixing R and LaTeX. R
News, 2(3):28–31, December 2002b. URL
http://CRAN.R-project.org/doc/Rnews/.
G. Sawitzki. Keeping statistics alive in documents.
Computational Statistics, 17:65–88, 2002.
W. N. Venables and B. D. Ripley. Modern Applied
Statistics with S. Fourth Edition. Springer, 2002. ISBN
0-387-95457-0. URL
http://www.stats.ox.ac.uk/pub/MASS4/.
Friedrich Leisch
Institut für Statistik & Wahrscheinlichkeitstheorie
Technische Universität Wien, Austria
Friedrich.Leisch@R-project.org
R Foundation News
by Bettina Grün and Friedrich Leisch
New benefactors
• Department of Statistics, Brigham Young
University, Utah, USA
• Institute of Mathematical Statistics (IMS),
Ohio, USA
• MedAnalytics, Inc., Minnesota, USA
New supporting institutions
• Astra Zeneca R&D Mölndal, Mölndal, Sweden
• Baxter AG, Vienna, Austria
• Boehringer Ingelheim Austria GmbH, Vienna,
Austria
• Department of Economics, Stockholm Univer-
sity, Sweden
• Department of Statistics, University of
Wisconsin-Madison, Wisconsin, USA
• Lehrstuhl für Rechnerorientierte Statistik und
Datenanalyse, University of Augsburg, Ger-
many
• Spotfire, Massachusetts, USA
New supporting members
Klaus Abberger (Germany)
Luc Anselin (USA)
Anestis Antoniadis (France)
Carlos Enrique Carleos Artime (Spain)
Ricardo Azevedo (USA)
Pierluigi Ballabeni (Switzerland)
Saghir Bashir (UK)
Marcel Baumgartner (Switzerland)
Hans Werner Borchers (Germany)
Rollin Brant (Canada)
Alessandra R. Brazzale (Italy)
Karl W. Broman (USA)
Robert Burrows (USA)
Federico Calboli (Italy)
Charles M. Cleland (USA)
Jorge de la Vega Góngora (Mexico)
Jan de Leeuw (USA)
Ramón Diaz-Uriarte (Spain)
Dubravko Dolić (Germany)
Dirk Eddelbuettel (USA)
Stephen Eglen (UK)
John Fox (Canada)
Simon Gatehouse (Australia)
Stefano Guazzetti (Italy)
Frank Harrell (USA)
Pascal Heus (USA)
Paul Hewson (UK)
Giles Heywood (UK)
Johannes Hüsing (Germany)
Rafael Irizarry (USA)
David A. James (USA)
Landon Jensen (USA)
Diego Kuonen (Switzerland)
Manuel Castéjon Limas (Spain)
Greg Louis (Canada)
Clifford E. Lunneborg (USA)
John Marsland (UK)
Andrew D. Martin (USA)
Gordon M. Morrison (UK)
Rashid Nassar (USA)
Vadim Ogranovich (USA)
John C. Paolillo (USA)
Thomas Petzoldt (Germany)
Bernhard Pfaff (Germany)
Jonas Ranneby (USA)
Gus Rashid (USA)
Greg Ridgeway (USA)
Jeffrey Rosenthal (Canada)
Claude Rubinson (USA)
Ingo Ruczinski (USA)
Erik A. Sauleau (France)
Martin Schlather (Germany)
Michael Scroggie (Australia)
Frederik von Ameln (Switzerland)
Scott R. Waichler (USA)
Rainer Walke (Germany)
Ko-Kang Kevin Wang (New Zealand)
Stefan Werner (Finland)
Victor D. Zurkowski (Canada)
New ordinary members
Roger Bivand (Norway)
Bill Venables (Australia)
Bettina Grün & Friedrich Leisch
Technische Universität Wien, Austria
Bettina.Gruen@ci.tuwien.ac.at
Friedrich.Leisch@R-project.org
Recent Events
Statistical Computing 2003 at
Reisensburg
The Reisensburg meeting has become a regular at-
traction for those interested in Computational Statis-
tics. It is organized by three special interest groups
(Computational Statistics of the Biometric Society - DR,
Statistical Analysis Systems of the German Association
of Medical Informatics, Biometry and Epidemiology
GMDS, and Classification and Data Analysis in the bio-
sciences of the Gesellschaft für Klassifikation GfKl)
and it takes place near Ulm, Germany, in beautiful
Reisensburg castle, situated above the river Danube.
The main topics of this conference are fixed one
year in advance by the members of the working
groups. The organizers take great care that there is
sufficient time for discussion after the talks and at
the famous Reisensburg bartizan round tables.
Recent developments of statistical software have
been a major topic of previous meetings. Merits and
miseries of the various packages were discussed in
depth. This has changed. Discussion of the large
packages played a minor role this year, and R was
featured in many presentations. F. Bretz, T. Hothorn
and P. Westfall gave an overview on the multcomp
package for multiple comparisons. F. Leisch intro-
duced flexmix, a framework for fitting discrete mix-
tures of regression models.
As we all know, a lot has still to be done in R
to support advanced visualization and interactivity.
A. Zeileis, D. Meyer and K. Hornik demonstrated
visualizations using mosaic plots in R. S. Urbanek
showed Java based interactive graphics for R and
H. Hoffmann demonstrated what can be done in
other environments, using visualizations for condi-
tional distributions as an example and tools derived
from the Augsburg Dada collection.
The list of speakers and topics is too long
to be repeated here in full. Program and ab-
stracts are available from http://www.dkfz.de/
biostatistics/Reisensburg2003/.
The 2003 meeting highlighted the analysis of ge-
nomic data. Robert Gentleman presented a keynote
session on exploring and visualizing genomic data.
Subsequent sessions covered methodological as-
pects, in particular techniques for combining clas-
sifiers or variables and methods related to machine
learning. On the more applied side, topics included,
among others, C. Ittrich and A. Benner addressing
the role of microarrays in clinical trials, and U. Mans-
mann discussing simulation techniques for microar-
ray experiments. E. Brunner (Göttingen) used the
opportunity to demonstrate how classical statistical
analysis of the design of experiments may be applied
in this field to give concise answers instead of vague
conjectures.
High dimensional observations, combined with
very low sample sizes, are a well known peculiar-
ity of genomic data. Another peculiarity comes from
the strong dependence between the observed data.
The data refer to gene activities, and these are only
an aspect of the metabolic and regulatory dynam-
ics of a cell. Little is known about how to include
the knowledge of metabolic pathways and the re-
sulting dependencies in a statistical analysis. Us-
ing statistical inference from genomic data to identify
metabolic or regulatory structures is largely an open
task. F. Markowetz and R. Spang studied the effect
of perturbations on reconstructing network struc-
ture; C. Becker and S. Kuhnt addressed robustness
in graphical modelling. From the application side,
A. v. Heydebreck reported on estimation of onco-
genic tree models and W. Huber talked about iden-
tification of protein domain combinations.
The next Reisensburg working conference will
take place June 27–30, 2004. By the time you read this
article, the call for papers should have been issued.
The main topics will be: applications of machine
learning; statistical analysis of graphs/networks; sta-
tistical software; bioinformatics; exploration of large
data sets.
Till then, working groups in cooperation with
the special research unit in Erlangen will organize
a workshop on Ensemble Learning in Erlangen,
January 23–24, 2004. Stay tuned, and see http://www.imbe.
med.uni-erlangen.de/links/EnsembleWS/.
Günther Sawitzki
Universität Heidelberg
gs@statlab.uni-heidelberg.de
Statistical Inference, Computing
and Visualization for Graphs
On August 1–2, 2003, a workshop on using graphs
in statistical data analysis took place at Stanford
University. Quoting the workshop homepage
at http://www.research.att.com/~volinsky/
Graphs/Workshop.html: “Graphs have become an in-
creasingly popular way of representing data in many dif-
ferent domains, including telecommunications research,
genomics and bioinformatics, epidemiology, computer
networks and web connectivity, social networks, mar-
keting and statistical graphical models. Analyzing these
data effectively depends on contributions from the areas of
data representation, algorithms, visualization (both static
and interactive), statistical modeling (including graphical
models) and inference. Each of these areas has its own
language for describing graphs, and its own favorite tools
and methods. Our goal for the workshop is to explore
synergies that may exist between these different areas of
research.”
It was very interesting to see the number of dif-
ferent areas of applied data analysis in which graphs
(structures with nodes and edges) are used. There
are differences, most notably the sizes of the graphs,
ranging from a dozen nodes to several millions,
which has an impact on “natural” and efficient com-
putations. However, we also identified common-
alities, and having a central infrastructure in R for
representing graphs and performing common oper-
ations will certainly help to prevent reinventing the
wheel several times.
The Bioconductor project has started to provide
this infrastructure with the graph package and in-
terfaces to standard libraries for graph computations
and visualization (Rgraphviz,RBGL, . . . ). Develop-
ment versions of ggobi also have support for graphs
that can be tightly linked to R. If you are interested
in learning more about the workshop, you can down-
load the slides for any of the presentations from the
workshop homepage.
Finally, I want to thank the organizers for the
great job they did in organizing the workshop; both
the scientific program and the social atmosphere
made it a pleasure to participate.
Friedrich Leisch
Technische Universität Wien, Austria
Friedrich.Leisch@R-project.org
JSM 2003
At the 2003 Joint Statistical Meetings in San Fran-
cisco, an invited session was organized that is of par-
ticular interest to the R community. Jan de Leeuw
from the University of California, Los Angeles, led off
the session with a talk on “The State of Statis-
tical Software” (http://gifi.stat.ucla.edu/pub/
jsm03.pdf). He began with an overview of types
of statistical software one might use for activities
such as consulting, teaching and research, providing
some history and thoughts for the future along the
way. Luke Tierney, from the University of Iowa, spoke
on “Some New Language Features of R” (http://
www.stat.uiowa.edu/~luke/talks/jsm03.pdf) fo-
cussing on namespaces, code analysis tools, excep-
tion handling and byte compilation. Duncan Tem-
ple Lang from Bell Laboratories spoke on “Con-
necting Scientific Software” (http://cm.bell-labs.
com/stat/duncan/Talks/JSM2003). The talk dealt
with connecting other software packages to R, with
particular attention to R DCOM services. The dis-
cussant, Wolfgang Hartmann from SAS, provided an
industry perspective (see http://www.cmat.pair.
com/wolfgang/jsm03.pdf) comparing the features
of different software, commercial and open-source,
with specific attention to R.
Balasubramanian Narasimhan
Stanford University, CA, USA
naras@stat.stanford.edu
gR 2003
On 17–20 September 2003, Aalborg University
hosted a workshop bringing together people from
many communities working with graphical models.
The common interest is development of a package
for R, supporting the use of graphical models for
data analysis. The workshop followed up on the gR
initiative described by Steffen Lauritzen in R News
2/3.
The workshop provided a kaleidoscope of ap-
plications as well as insight into experiences of deal-
ing with practical graphical models. The applica-
tions presented were from the areas of epidemiol-
ogy, geostatistics, genetics, bioinformatics and ma-
chine learning.
The wide range of applications and methodology
showed that a unifying software package for graphi-
cal models must be widely extensible and flexible —
utilizing a variety of data formats, model specifica-
tions and estimation algorithms. The package should
also provide an attractive user interface that aids in
working with complex models interactively.
Development of a gR-package is evolving at
many levels. Some ’old’ stand-alone programs
are being ported to R packages (CoCoR, BRugs),
some are being interfaced (mimR, JAGS, BugsR),
while others have been developed in R (ggm, deal,
GRAPPA).
Experiences from other existing packages can in-
spire the gR project. For example, the Bayes Net
Toolbox for Matlab includes many features that gR
will include. Intel is currently re-implementing the
Bayes Net Toolbox in C++ (called Probability Net-
work Library, PNL) and plans a December 2003 re-
lease, expected to be open source. An R interface to
PNL could be a possibility.
During the workshop an outline of a package gr-
base with basic elements was discussed and thought
to become a common ground for extensions. Impor-
tant features were to separate data, model and infer-
ence. The grbase package will include
• support for a variety of data formats, e.g. as a
list of cases, a dataframe or a database connec-
tion. It should also be possible to work with a
model without data.
• a general model language capable of specify-
ing e.g. (block-) recursive graphical models and
BUGS models.
• a variety of representation forms for graphs, e.g.
using/extending the graph package from Bio-
conductor.
• a graphics system, for interactively working
with models. For example using R-Tcl/Tk,
Rggobi or the R-interface to Graphviz.
• an analyzing unit that combines data and
model with the possibility of using different in-
ference algorithms in the analyzing step.
A minimal version of grbase is planned for January
2004.
An invited session concerned with the gR de-
velopments is being planned for the Joint Statistical
Meetings in Toronto, 8–12 August 2004.
See http://www.math.auc.dk/gr/gr2003/ for
more information about the workshop and related
links, including links to the aforementioned soft-
ware.
Acknowledgments The gR-2003 workshop was
supported by the Danish National Research Founda-
tion Network in Mathematical Physics and Stochas-
tics - MaPhySto. The Danish activities of the gR
project are supported by the Danish Natural Science
Research Council.
Claus Dethlefsen
Aalborg University, Denmark
dethlef@math.auc.dk
Book Reviews
John Maindonald and John Braun:
Data Analysis and Graphics Using
R — An Example-based Approach
Cambridge University Press, Cambridge, United
Kingdom, 2003
362 pages, ISBN 0-521-81336-0
http://cbis.anu.edu/DAAG/
http://www.stats.uwo.ca/DAAG/
The aim of the book is to describe the ideas and
concepts of many statistical methodologies that are
widely used in applications, by demonstrating the
use of R on a number of examples. Most examples in
the book use genuine data collected by the authors in
their combined several decades of statistical consult-
ing experience. The authors see the book as a com-
panion to other books that include more mathemati-
cal treatments of the relevant theory, and they avoid
mathematical notation and mathematical description
of statistical methods. The book is aimed at both sci-
entists and students interested in practical data anal-
ysis. Data and new R functions used in the book are
included in the DAAG package available from the
authors’ web sites and through the Comprehensive
R Archive Network (CRAN).
The book begins with a nice summary of the con-
tents of the twelve chapters of the book. Chapter 1,
A Brief Introduction to R, provides enough informa-
tion on using R to get the reader started. Chapter 2,
Style of Data Analysis, demonstrates with many ex-
amples the use of R to carry out basic exploratory
data analysis involving both graphical and numeri-
cal summaries of data. The authors not only describe
how to create graphs and plots but also show the
reader what to look for in the data summaries and
how to interpret the summaries in the context of each
particular example. Chapter 3, Statistical Models, de-
scribes the authors’ view on the importance of mod-
els as a framework for statistical analysis. Chapter 4,
An Introduction to Formal Inference, introduces the ba-
sic ideas of random sampling and sampling distribu-
tions of statistics necessary to understand confidence
intervals and hypothesis testing. It also includes
chi-square tests for contingency tables and one-way
ANOVA.
The next several chapters demonstrate the use of
R to analyze data using linear models. Chapter 5,
Regression with a Single Predictor, Chapter 6, Multi-
ple Linear Regression, Chapter 7, Exploiting the Linear
Model Framework, and Chapter 8, Logistic Regression
and Other Generalized Linear Models, use increasingly
complex models to lead the reader through several
examples of practical data analysis.
The next three chapters discuss more specialized
topics that arise frequently in practice. Chapter 9,
Multi-level Models, Time Series, and Repeated Measures,
goes through examples that use more complicated
error structures than examples found in previous
chapters. Chapter 10, Tree-based Classification and Re-
gression Trees, provides an introduction to tree-based
regression and classification modeling. Chapter 11,
Multivariate Data Exploration and Discrimination, de-
scribes both principal components analysis and dis-
criminant analysis.
The final chapter, Chapter 12, The R System — Ad-
ditional Topics, is a far more detailed introduction to
R than that contained in the initial chapters. It is also
intended as a reference to the earlier chapters.
The book is a primer on the nuts-and-bolts use
of R for the types of statistical analysis that arise
commonly in statistical practice, and it also teaches
the reader to think statistically when interpreting the
results of an analysis. The strength of the book is
in the extensive examples of practical data analysis
with complete examples of the R code necessary to
carry out the analyses. Short R commands appear
on nearly every page of the book and longer R code
examples appear frequently as footnotes.
I would strongly recommend the book to scien-
tists who have already had a regression or a linear
models course and who wish to learn to use R. How-
ever, my recommendation has a few caveats. The
first chapter of the book takes the reader through an
introduction to R that has the potential to be a little
frustrating for a reader with no prior R experience.
For example, the first plotting command given is
plot(ACT ~ Year, data=ACTpop, pch=16)
The meaning of the pch=16 option is described and
the option data=ACTpop is self-evident, but the syn-
tax ACT ~ Year is not explained and is potentially
confusing to an R beginner who does not automat-
ically translate ~ into “is modeled by”. Page 5 gives
the advice to create a new workspace before exper-
imenting with R functions, but provides no details
on how one actually does this. Most examples of R
code in the book do contain adequate descriptions,
but there are a number of exceptions.
A second caveat is that the descriptions of statis-
tical methods are an adequate refresher, but are inad-
equate as a primary source of information. The au-
thors indicate clearly that the book is meant to com-
plement other books in the presentation of, and the
mathematical description of, statistical methods. I
agree that the book would not work well as a stand-
alone text book for a course on statistical modeling.
However, it is also not short and I would hesitate to
require students to buy it in addition to another com-
prehensive textbook. The scope of the book is greater
than simply serving as a companion book for teach-
ing R.
Despite my hesitation to use this book in teach-
ing, I give it a strong recommendation to the scien-
tist or data analyst who wishes an easy-to-read and
an understandable reference on the use of R for prac-
tical data analysis.
Bret Larget
University of Wisconsin—Madison
larget@stat.wisc.edu
Changes in R 1.8.0
by the R Core Team
MacOS changes
• As from this release there is only one R port
for the Macintosh, which runs only on MacOS
X. (The ‘Carbon’ port has been discontinued,
and the ‘Darwin’ port is part of the new ver-
sion.) The current version can be run either as a
command-line application or as an ‘Aqua’ con-
sole. There is a ‘Quartz’ device quartz(), and
the download and installation of both source
and binary packages is supported from the
Aqua console. Those CRAN and BioC pack-
ages which build under MacOS X have binary
versions updated daily.
User-visible changes
• The defaults for glm.control(epsilon=1e-8,
maxit=25) have been tightened: this will pro-
duce more accurate results, slightly slower.
• sub, gsub, grep, regexpr, chartr, tolower, toup-
per, substr, substring, abbreviate and strsplit
now handle missing values differently from
"NA".
• Saving data containing name space references
no longer warns about name spaces possibly
being unavailable on load.
• On Unix-like systems interrupt signals now set
a flag that is checked periodically rather than
calling longjmp from the signal handler. This is
analogous to the behavior on Windows. This
reduces responsiveness to interrupts but pre-
vents bugs caused by interrupting computa-
tions in a way that leaves the system in an in-
consistent state. It also reduces the number of
system calls, which can speed up computations
on some platforms and make R more usable
with systems like Mosix.
Changes to the language
• Error and warning handling has been mod-
ified to incorporate a flexible condition han-
dling mechanism. See the online documen-
tation of tryCatch() and signalCondition().
Code that does not use these new facilities
should remain unaffected.
• A triple colon operator can be used to access
values of internal variables in a name space (i.e.
a:::b is the value of the internal variable b in
name space a).
• Non-syntactic variable names can now be
specified by inclusion between backticks
`Like This`. The deparse() code has been
changed to output non-syntactical names with
this convention, when they occur as operands
in expressions. This is controlled by a backtick
argument, which is by default TRUE for com-
posite expressions and FALSE for single sym-
bols. This should give minimal interference
with existing code.
• Variables in formulae can be quoted by
backticks, and such formulae can be used
in the common model-fitting functions.
terms.formula() will quote (by backticks)
non-syntactic names in its "term.labels" at-
tribute. [Note that other code using terms ob-
jects may expect syntactic names and/or not
accept quoted names: such code will still work
if the new feature is not used.]
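Two small examples may make the condition handling and backtick items above more concrete (all object names are made up):

# catch the warning from log(-1) and return NA instead
res <- tryCatch(log(-1),
                warning = function(w) {
                  cat("caught:", conditionMessage(w), "\n")
                  NA
                })

# non-syntactic names quoted by backticks, also inside a formula
`petal width` <- rnorm(20)
`petal length` <- 2 * `petal width` + rnorm(20)
fit <- lm(`petal length` ~ `petal width`)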
New features
• New function bquote() does partial substitu-
tion like LISP backquote.
•capture.output() takes arbitrary connections
for file argument.
•contr.poly() has a new scores argument to
use as the base set for the polynomials.
• cor() has a new argument ‘method =
c("pearson", "spearman", "kendall")’ as
cor.test() did forever. The two rank based
measures do work with all three missing value
strategies.
• New utility function cov2cor() to convert a co-
variance matrix into a correlation matrix.
•cut.POSIXt() now allows ‘breaks’ to be more
general intervals as allowed for the ‘by’ argu-
ment to seq.POSIXt().
•data() now has an envir argument.
•det() uses an LU decomposition and LA-
PACK. The method argument to det() no
longer has any effect.
•dev.control() now accepts enable as well as
inhibit. (Wishlist PR#3424)
• *, - and / work more generally on "difftime"
objects, which now have a diff() method.
•dt(*, ncp = V) is now implemented, thanks
to Claus Ekstroem.
•dump() only quotes object names in the file
where necessary.
• eval() of a promise forces the promise.
•file.path() now returns an empty character
vector if given at least one zero-length argu-
ment.
•format() and hence print() make an effort to
handle corrupt data frames, with a warning.
•format.info() now also works with ‘nsmall’
in analogy with format.default().
•gamma(n) is very slightly more precise for inte-
ger n in 11:50.
• ? and help() will accept more un-quoted ar-
guments, e.g. NULL.
• The ? operator has new forms for querying
documentation on S4 methods. See the online
documentation.
• New argument frame.plot = axes (==
TRUE) for filled.contour().
• New argument fixed = TRUE for grep() and
regexpr() to avoid the need to escape strings
to match.
•grep(x, ..., value = TRUE) preserves
names of x.
•hist.POSIXt() can now pass arguments to
hist.default()
•legend() and symbols() now make use of
xy.coords() and accept a wider range of co-
ordinate specifications.
• Added function library.dynam.unload() to
call dyn.unload() on a loaded DLL and tidy
up. This is called for all the standard packages
in namespaces with DLLs if their namespaces
are unloaded.
•lm(singular.ok = FALSE) is now imple-
mented.
• Empty lm() and glm() fits are now handled
by the normal code: there are no methods for
classes "lm.null" and "glm.null". Zero-rank
fits are handled consistently.
•make.names() has improvements, and there
is a new auxiliary function make.unique().
(Based on code contributed by Tom Minka,
since converted to a .Internal function.) In
particular make.names() now recognises that
names beginning with a dot are valid and that
reserved words are not.
•methods() has a print method which as-
terisks functions which are not user-visible.
methods(class = "foo") now lists non-
visible functions, and checks that there is a
matching generic.
•model.matrix() now warns when it removes
the response from the rhs of the formula: that
this happens is now documented on its help
page.
• New option "locatorBell" to control the con-
firmation beep during the use of locator()
and identify().
• New option("scipen") provides some user
control over the printing of numbers in fixed-
point or exponential notation. (Contributed by
David Brahm.)
•plot.formula() now accepts horizontal=TRUE
and works correctly when boxplots are pro-
duced. (Wishlist PR#1207) The code has been
much simplified and corrected.
•polygon() and rect() now interpret density <
0 or NA to mean filling (by colour) is desired:
this allows filling and shading to be mixed in
one call, e.g. from legend().
• The predict() methods for classes lm, glm,
mlm and lqs take a ‘na.action’ argument that
controls how missing values in ‘newdata’ are
handled (and defaults to predicting NA). [Pre-
viously the value of getOption("na.action")
was used and this by default omitted cases
with missing values, even if set to ‘na.exclude’.]
•print.summary.glm() now reports omit-
ted coefficients in the same way as
print.summary.lm(), and both show them as
NAs in the table of coefficients.
•print.table() has a new argument
‘zero.print’ and is now documented.
•rank(x, na.last = "keep") now preserves
NAs in ‘x’, and the argument ‘ties.method’ al-
lows non-averaging ranks to be used in the pres-
ence of ties.
•read.table()’s ’as.is’ argument can be charac-
ter, naming columns not to be converted.
•rep() is now a generic function, with de-
fault, POSIXct and POSIXlt methods. For ef-
ficiency, the base code uses rep.int() rather
than rep() where possible.
• New function replicate() for repeated eval-
uation of expression and collection of results,
wrapping a common use of sapply() for sim-
ulation purposes.
•rev() is now a generic function, with default
and dendrogram methods.
•serialize() and unserialize() functions are
available for low-level serialization to connec-
tions.
•socketSelect() allows waiting on multiple
sockets.
•sort(method = "quick", decreasing =
TRUE) is now implemented.
• sort.list() has methods "quick" (a wrapper
for sort(method = "quick", index.return
= TRUE)) and "radix" (a very fast method for
small integers). The default "shell" method
works faster on long vectors with many ties.
•stripchart() now has ‘log’, ‘add’ and ‘at’ ar-
guments.
• strsplit(x, *) now preserves names() but
won’t work for non-character ‘x’ anymore (it
formerly used as.character(x), destroying
names(x)).
•textConnection() now has a local argument
for use with output connections. local = TRUE
means the variable containing the output is as-
signed in the frame of the caller.
• Using UseMethod() with more than two argu-
ments now gives a warning (as R-lang.texi has
long claimed it did).
• New function vignette() for viewing or list-
ing vignettes.
•which.min(x) and which.max(x) now pre-
serve names.
•xy.coords() coerces "POSIXt" objects to
"POSIXct", allowing lines, etc. to be added to
plot.POSIXlt() plots.
•.Machine has a new entry, sizeof.pointer.
•.Random.seed is only looked for and stored in
the user’s workspace. Previously the first place
a variable of that name was found on the search
path was used.
• Subscripting for data.frames has been rational-
ized:
–Using a single argument now ignores any
‘drop’ argument (with a warning). Previ-
ously using ‘drop’ inhibited list-like sub-
scripting.
–adf$name <- value now checks for the
correct length of ‘value’, replicating a
whole number of times if needed.
–adf[j] <- value and adf[[j]] <-
value did not convert character vectors to
factors, but adf[,j] <- value did. Now
none do. Nor is a list ‘value’ coerced to
a data frame (thereby coercing character
elements to factors).
–Where replicating the replacement value
a whole number of times will produce
the right number of values, this is always
done (rather than some times but not oth-
ers).
–Replacement list values can include
NULL elements.
–Subsetting a data frame can no longer pro-
duce duplicate column names.
–Subsetting with drop=TRUE no longer
sometimes drops dimensions on matrix or
data frame columns of the data frame.
–Attributes are no longer stripped when re-
placing part of a column.
–Columns added in replacement opera-
tions will always be named, using the
names of a list value if appropriate.
–as.data.frame.list() did not cope with
list names such as ‘check.rows’, and for-
matting/printing data frames with such
column names now works.
–Row names in extraction are still made
unique, but without forcing them to be
syntactic names.
–adf[x] <- list() failed if x was of
length zero.
• Setting dimnames to a factor now coerces to
character, as S does. (Earlier versions of R used
the internal codes.)
• When coercion of a list fails, a meaningful error
message is given.
• Adding to NULL with [[ ]] generates a list if
more than one element is added (as S does).
• There is a new command-line flag ‘--args’
that causes the rest of the command line to
be skipped (but recorded in commandArgs() for
further processing).
• S4 generic functions and method dispatch have
been modified to make the generic functions
more self-contained (e.g., usable in apply-type
operations) and potentially to speed dispatch.
• The data editor is no longer limited to 65535
rows, and will be substantially faster for large
numbers of columns.
• Standalone Rmath now has a get_seed func-
tion as requested (PR#3160).
• GC timing is not enabled until the first call
to gc.time(); it can be disabled by call-
ing gc.time(FALSE). This can speed up the
garbage collector and reduce system calls on
some platforms.
Standard packages
• New package ’mle’. This is a simple package to
find maximum likelihood estimates, and per-
form likelihood profiling and approximate con-
fidence limits based upon it. A well-behaved
likelihood function is assumed, and it is the re-
sponsibility of the user to gauge the applicabil-
ity of the asymptotic theory. This package is
based on S4 methods and classes.
• Changes in package ’mva’:
–factanal() now returns the test statis-
tic and P-value formerly computed in the
print method.
–heatmap() has many more arguments,
partly thanks to Wolfgang Huber and
Andy Liaw.
–Arguments ‘unit’ and ‘hmin’ of
plclust() are now implemented.
–prcomp() now accepts complex matrices,
and there is a biplot() method for its out-
put (in the real case).
–dendrograms are slightly better docu-
mented, methods working with "label",
not "text" attribute. New rev() method
for dendrograms.
–plot.dendrogram() has an explicit
‘frame.plot’ argument defaulting to
FALSE (instead of an implicit one default-
ing to TRUE).
• Changes in package ’tcltk’:
–The package is now in a namespace. To
remove it you will now need to use
unloadNamespace("tcltk").
–The interface to Tcl has been made much
more efficient by evaluating Tcl com-
mands via a vector of Tcl objects rather
than by constructing the string represen-
tation.
–An interface to Tcl arrays has been intro-
duced.
–as.tclObj() has gained a ‘drop’ argu-
ment to resolve an ambiguity for vectors
of length one.
• Changes in package ’tools’:
–Utilities for testing and listing files, ma-
nipulating file paths, and delimited pat-
tern matching are now exported.
–Functions
checkAssignFuns()
checkDocArgs()
checkMethods()
have been renamed to
checkReplaceFuns()
checkDocFiles()
checkS3methods()
to give better descriptions of what they
do.
–R itself is now used for analyzing the
markup in the \usage sections. Hence in
particular, replacement functions or S3 re-
placement methods are no longer ignored.
–checkDocFiles() now also determines
’over-documented’ arguments which are
given in the \arguments section but not in
\usage.
–checkDocStyle() and checkS3Methods()
now know about internal S3 generics and
S3 group generics.
–S4 classes and methods are included in the
QC tests. Warnings will be issued from
undoc() for classes and methods defined
but not documented. Default methods
automatically generated from nongeneric
functions do not need to be documented.
–New (experimental) functions
codocClasses()
codocData()
for code/documentation consistency
checking for S4 classes and data sets.
• Changes in package ’ts’:
–arima.sim() now checks for inconsis-
tent order specification (as requested in
PR#3495: it was previously documented
not to).
–decompose() has a new argument ‘filter’.
–HoltWinters() has new arguments ‘op-
tim.start’ and ‘optim.control’, and returns
more components in the fitted values. The
plot method allows ‘ylim’ to be set.
–plot.ts() has a new argument ‘nc’ con-
trolling the number of columns (with de-
fault the old behaviour for plot.mts).
–StructTS() now allows the first value of
the series to be missing (although it is bet-
ter to omit leading NAs). (PR#3990)
Using packages
•library() has a pos argument, controlling
where the package is attached (defaulting to
pos=2 as before).
•require() now maintains a list of required
packages in the toplevel environment (typ-
ically, .GlobalEnv). Two features use this:
detach() now warns if a package is detached
that is required by an attached package, and
packages that install with saved images no
longer need to use require() in the .First as
well as in the main source.
• Packages with name spaces can now be in-
stalled using ‘--save’.
• Packages that use S4 classes and methods
should now work with or without saved im-
ages (saved images are still recommended for
efficiency), writing setMethod(), etc. calls
with the default for argument ‘where’. The
topenv() function and sys.source() have
been changed correspondingly. See the online
help.
• Users can specify in the DESCRIPTION file the
collation order for files in the R source directory
of a package.
R documentation format
• New logical markup commands for empha-
sizing (\strong) and quoting (\sQuote and
\dQuote) text, for indicating the usage of an
S4 method (\S4method), and for indicating spe-
cific kinds of text (\acronym, \cite, \command,
\dfn, \env, \kbd, \option, \pkg, \samp, \var).
• New markup \preformatted for pre-
formatted blocks of text (like example but
within another section). (Based on a contri-
bution by Greg Warnes.)
• New markup \concept for concept index en-
tries for use by help.search().
• Rdconv now produces more informative out-
put from the special \method{GENERIC}{CLASS}
markup for indicating the usage of S3 methods,
providing the CLASS info in a comment.
•\dontrun sections are now marked within
comments in the user-readable versions of the
converted help pages.
•\dontshow is now the preferred name for
\testonly.
Installation changes
• The zlib code in the sources is used unless the
external version found is at least version 1.1.4
(up from 1.1.3).
• The regression checks now have to be passed
exactly, except those depending on recom-
mended packages (which cannot be assumed
to be present).
• The target make check-all now runs R CMD
check on all the recommended packages (and
not just runs their examples).
• There are new macros DYLIB_* for building
dynamic libraries, and these are used for the
dynamic Rmath library (which was previously
built as a shared object).
• If a system function log1p is found, it is tested
for accuracy and if inadequate the substitute
function in src/nmath is used, with name
remapped to Rlog1p. (Apparently needed on
OpenBSD/NetBSD.)
C-level facilities
• There is a new installed header file
R_ext/Parse.h which allows R_ParseVector to
be called by those writing extensions. (Note
that the interface is changed from that used
in the unexported header Parse.h in earlier
versions, and is not guaranteed to remain un-
changed.)
• The header R_ext/Mathlib.h has been re-
moved. It was replaced by Rmath.h in R 1.2.0.
• PREXPR has been replaced by two macros,
PREXPR for obtaining the expression and
PRCODE for obtaining the code for use in
eval. The macro BODY_EXPR has been added
for use with closures. For a closure with a
byte compiled body, the macro BODY_EXPR
returns the expression that was compiled; if the
body is not compiled then the body is returned.
This is to support byte compilation.
• Internal support for executing byte compiled
code has been added. A compiler for produc-
ing byte compiled code will be made available
separately and should become part of a future
R release.
• On Unix-like systems calls to the popen() and
system() C library functions now go through
R_popen and R_system. On Mac OS X these
suspend SIGALRM interrupts around the li-
brary call. (Related to PR#1140.)
Utilities
• R CMD check accepts "ORPHANED" as pack-
age maintainer. Package maintainers can now
officially orphan a package, i.e., resign from
maintaining a package.
• R CMD INSTALL (Unix only) is now ’safe’: if
the attempt to install a package fails, leftovers
are removed. If the package was already in-
stalled, the old version is restored.
• R CMD build excludes possible (obsolete) data
and vignette indices in DCF format (and hence
also no longer rebuilds them).
• R CMD check now tests whether file names are
valid across file systems and supported oper-
ating system platforms. There is some support
for code/documentation consistency checking
for data sets and S4 classes. Replacement func-
tions and S3 methods in \usage sections are no
longer ignored.
• R CMD Rdindex has been removed.
Deprecated & defunct
• The assignment operator ‘_’ has been removed.
• printNoClass() is defunct.
• The classic MacOS port is no longer supported,
and its files have been removed from the
sources.
• The deprecated argument ’white’ of parse()
has been removed.
• Methods pacf/plot.mts() have been re-
moved and their functionality incorporated
into pacf.default/plot.ts().
• print.coefmat() is deprecated in favour of
printCoefmat() (which is identical apart from
the default for na.print which is changed from
"" to "NA", and better handling of the 0-rank
case where all coefficients are missing).
• codes() and codes<-() are deprecated, as almost all uses misunderstood what they actually do.
• The use of multi-argument return() calls is deprecated: use a (named) list instead (see the sketch at the end of this list).
• anovalist.lm (replaced in 1.2.0) is now depre-
cated.
• The ‘-’ and Ops methods for POSIX[cl]t objects are removed: the POSIXt methods have been used since 1.3.0.
• glm.fit.null(), lm.fit.null() and lm.wfit.null() are deprecated.
• Classes "lm.null" and "glm.null" are deprecated
and all of their methods have been removed.
• Method weights.lm(), a copy of
weights.default(), has been removed.
• print.atomic() is now deprecated.
• The back-compatibility entry point Rf_log1p in
standalone Rmath has been removed.
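As a minimal illustration of the return() deprecation noted above (the function fit_summary() is invented for this sketch):

# Deprecated style (pre-1.8.0): return(coefficients, residuals)
# Preferred style: return a named list and extract its components.
fit_summary <- function(y, x) {
    fit <- lm(y ~ x)
    list(coefficients = coef(fit), residuals = residuals(fit))
}
res <- fit_summary(rnorm(10), rnorm(10))
res$coefficients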
Changes on CRAN
by Kurt Hornik and Friedrich Leisch
New contributed packages
DAAG various data sets used in examples and exer-
cises in the book Maindonald, J.H. and Braun,
W.J. (2003) "Data Analysis and Graphics Using
R". By John Maindonald and W. John Braun.
Devore6 Data sets and sample analyses from Jay
L. Devore (2003), "Probability and Statistics
for Engineering and the Sciences (6th ed)",
Duxbury. Original by Jay L. Devore, modifi-
cations by Douglas Bates.
Hmisc The Hmisc library contains many functions
useful for data analysis, high-level graph-
ics, utility operations, functions for computing
sample size and power, importing datasets, im-
puting missing values, advanced table making,
variable clustering, character string manipula-
tion, conversion of S objects to LaTeX code, and
recoding variables. By Frank E Harrell Jr, with
contributions from many other users.
HyperbolicDist This package includes the basic
functions for the hyperbolic distribution: prob-
ability density function, distribution function,
quantile function, a routine for generating ob-
servations from the hyperbolic, and a function
for fitting the hyperbolic distribution to data.
By David Scott.
VaR A set of methods for calculation of Value at Risk
(VaR). By Talgat Daniyarov.
bim Functions to sample and interpret Bayesian
QTL using MCMC. By Brian S. Yandell, Hao
Wu.
boolean A procedure for testing Boolean hypothe-
ses. By Bear F. Braumoeller, Jacob Kline.
cat Analysis of categorical-variable datasets with missing values. Original by Joseph L. Schafer. Ported
to R by Ted Harding and Fernando Tusell.
classPP PP Indices using class information. By Eun-
kyung Lee.
clines Calculates contour lines. By Paul Murrell.
diptest Compute Hartigan’s dip test statistic for
unimodality. By Martin Maechler, based
on Fortran and S-plus from Dario Ringach
(NYU.edu).
eha A package for survival and event history analy-
sis. By Göran Broström.
emme2 This package includes functions to read and
write to an EMME/2 databank. By Ben Stabler.
exactLoglinTest Monte Carlo and MCMC goodness
of fit tests for log-linear models. By Brian Caffo.
flexmix FlexMix implements a general framework
for finite mixtures of regression models using
the EM algorithm. FlexMix provides the E-
step and all data handling, while the M-step
can be supplied by the user to easily define
new models. Existing drivers implement mix-
tures of standard linear models, generalized
linear models and model-based clustering. By
Friedrich Leisch.
forward Forward search approach to robust analy-
sis in linear and generalized linear regression
models. Originally written for S-Plus by Kjell Konis and Marco Riani; ported to R by Luca Scrucca.
fpc Fuzzy and crisp fixed point cluster analysis
based on Mahalanobis distance and linear re-
gression fixed point clusters. Semi-explorative,
semi-model-based clustering methods, operat-
ing on n*p data, do not need prespecification of
number of clusters, produce overlapping clus-
ters. Discriminant projections separate groups
optimally, used to visualize the separation of
groupings. Corresponding plot methods. Clus-
terwise linear regression by normal mixture
modeling. By Christian Hennig.
ftnonpar The package contains R functions to perform the methods in nonparametric regression and density estimation described in: Davies, P. L. and Kovac, A. (2001) Local Extremes, Runs, Strings and Multiresolution (with discussion), Annals of Statistics, 29, 1-65; Davies, P. L. and Kovac, A. (2003) Densities, Spectral Densities and Modality; and Davies, P. L. (1995) Data features, Statistica Neerlandica, 49, 185-245. By Laurie Davies and Arne Kovac.
ggm Functions for defining directed acyclic graphs
and undirected graphs, finding induced graphs
and fitting Gaussian Markov models. By Gio-
vanni M. Marchetti.
gridBase Integration of base and grid graphics. By
Paul Murrell.
its The its package contains an S4 class for handling
irregular time series. By Portfolio & Risk Advi-
sory Group, Commerzbank Securities.
linprog This package can be used to solve Linear
Programming / Linear Optimization problems
by using the simplex algorithm. By Arne Hen-
ningsen.
lme4 Fit linear and generalized linear mixed-effects
models. By Douglas Bates and Saikat DebRoy.
lmeSplines Add smoothing spline modelling capa-
bility to nlme. Fit smoothing spline terms
in Gaussian linear and nonlinear mixed-effects
models. By Rod Ball.
logistf Firth’s bias reduced logistic regression ap-
proach with penalized profile likelihood based
confidence intervals for parameter estimates.
By Meinhard Ploner, Daniela Dunkler, Harry
Southworth, Georg Heinze.
mapdata Supplement to maps package, providing
the larger and/or higher-resolution databases.
Original S code by Richard A. Becker and Allan
R. Wilks. R version by Ray Brownrigg.
maps Display of maps. Projection code and larger
maps are in separate packages (mapproj and
mapdata). Original S code by Richard A.
Becker and Allan R. Wilks. R version by
Ray Brownrigg. Enhancements by Thomas P
Minka.
maptools Set of tools for manipulating and reading
geographic data, in particular ESRI shapefiles.
By Nicholas J. Lewin-Koh, modified by Roger
Bivand; C code used from shapelib.
merror N methods are used to measure each of n
items. This data is used to estimate the accu-
racy and precision of the methods. Maximum
likelihood estimation is used for the precision
estimates. By Richard A. Bilonick.
mmlcr Mixed-mode latent class regression (also
known as mixed-mode mixture model regres-
sion or mixed-mode mixture regression mod-
els) which can handle both longitudinal and
one-time responses, although it is created with
longitudinal data in mind. By Steve Buyske.
mvnormtest Generalization of Shapiro-Wilk test for
multivariate variables. By Slawomir Jarek.
negenes Estimating the number of essential genes in
a genome on the basis of data from a random
transposon mutagenesis experiment, through
the use of a Gibbs sampler. By Karl W Broman.
nlmeODE This package combines the odesolve and
nlme packages for mixed-effects modelling us-
ing differential equations. By Christoffer W.
Tornoe.
nortest Five omnibus tests for the composite hy-
pothesis of normality. By Juergen Gross.
nprq Nonparametric and sparse quantile regression
methods. By Roger Koenker and Pin Ng.
orientlib Representations, conversions and display
of orientation SO(3) data. See the orientlib help
topic for details. By Duncan Murdoch.
pps The pps package contains functions to select
samples using PPS (probability proportional
to size) sampling. It also includes a function
for stratified simple random sampling, a func-
tion to compute joint inclusion probabilities for
Sampford’s method of PPS sampling, and a few
utility functions. By Jack G. Gambino.
prabclus Distance based parametric bootstrap tests
for clustering, mainly thought for presence-
absence data (clustering of species distribution
maps). Jaccard and Kulczynski distance mea-
sures, clustering of MDS scores, and nearest
neighbor based noise detection (R port of Byers
and Raftery’s (1998) "NNclean"). Main func-
tions are prabtest (for testing), prabclust (for
clustering), prabinit (for preparing the data)
and NNclean (for noise detection). The help-
pages for prabtest and prabclust contain simple
standard executions. By Christian Hennig.
psy Kappa, ICC, Cronbach alpha, screeplot, PCA
and related methods. By Bruno Falissard.
rqmcmb2 Markov Chain Marginal Bootstrap for
Quantile Regression. A resampling method for
inference in quantile regression. Suitable for
modest to large data sets. By Maria Kochergin-
sky, Xuming He.
sca Simple Component Analysis often provides
much more interpretable components than
Principal Components (PCA) without losing
too much. By Valentin Rousson and Martin
Maechler.
seacarb Calculates parameters of the seawater car-
bonate system. By Aurelien Proye and Jean-
Pierre Gattuso.
seao Software for simple evolutionary algorithms.
For all factors (genes) included, one can set the
lowest and highest values as well as the num-
ber of levels (alleles) or the step. An initial gen-
eration can be calculated in several ways and
following generations are calculated based on
a parent generation which can be constructed
using other, already calculated generations or
new generations (as long as the format is ok).
By Kurt Sys.
seao.gui Graphical interface for the seao package. All functions can be called separately, but there is also a function which can call all the other functions. The functions called through this graphical interface do not have the same flexibility as the functions called from the command line. This may change in the future, although I doubt it. By Kurt Sys.
segmented Functions to estimate break-points of
segmented relationships in regression models
(GLMs). By Vito M. R. Muggeo.
shapefiles Functions to read and write ESRI shape-
files. By Ben Stabler.
shapes Routines for the statistical analysis of
shapes. In particular, the package provides
routines for Procrustes analysis, displaying
shapes and principal components, testing for
mean shape difference, thin-plate spline trans-
formation grids and edge superimposition
methods. By Ian Dryden.
simpleboot Simple bootstrap routines. By Roger D.
Peng.
smoothSurv This package contains primarily a
function to fit a regression model with possi-
bly right, left or interval censored observations
and with the error distribution expressed as a
mixture of G-splines. Core part of the computa-
tion is done in compiled C++ written using the
Scythe Statistical Library Version 0.3. By Arnost
Komarek.
tapiR Tools for accessing online UK House of Com-
mons voting data, and datasets for the parlia-
ments 1992-97, 1997-2001 and 2001-now. By
David Firth and Arthur Spirling.
udunits This package provides an R interface to
the Unidata udunits library routines, which
can convert quantities between various units.
Units are indicated by human-readable strings,
such as "m/s", "J", "kg", or "in". Routines
for converting any quantity in known units to
other compatible units are provided. Of partic-
ular use are the time and calendar conversion
routines. Calendar dates are given with units
such as "days since 1900-01-01", for example.
Values with this unit can be converted to nor-
mal, readable calendar dates. This will let you
find that "32018 days since 1900-01-01" is ac-
tually 31 Aug 1987. These routines follow the
library’s C interface, so consult that section of
Unidata’s udunits manual for reference. Here
are some example formatted units strings that
can be used: "10 kilogram.meters/seconds2",
"10 kg-m/sec2", "(PI radian)2", "degF", "degC",
"100rpm", "geopotential meters", "33 feet wa-
ter". Note that the udunits library must already
be installed on your machine for this package
to work. By David Pierce.
Other changes
• Package grid is a base package in R 1.8.0.
• Package GeneSOM was renamed to som.
Kurt Hornik
Wirtschaftsuniversität Wien, Austria
Kurt.Hornik@R-project.org
Friedrich Leisch
Technische Universität Wien, Austria
Friedrich.Leisch@R-project.org
Crossword Solution
by Barry Rowlingson
Unfortunately nobody got the crossword in the last
issue of R News (Vol. 3/1) exactly right, but one of
my clues had an alternate solution which could not
be eliminated as “wrong”. I’ll therefore draw a name
from the RNG hat:
And the winner is:
> sample(c("Rolf","Saikat","Simon"))[1]
[1] "Simon"
So a well-travelled (to the DSC-03 and back) 50
Euro note will be on its way to Simon Fear.
The solution, with some explanations, is
also available at http://www.maths.lancs.ac.uk/~rowlings/Crossword/.
Barry Rowlingson
Lancaster University, UK
B.Rowlingson@lancaster.ac.uk
[The filled-in crossword grid appears at this point in the PDF version of the newsletter; the complete solution, with explanations, is available at the URL given above.]
Correction to “Building Microsoft
Windows Versions of R and R packages
under Intel Linux”
by Jun Yan and A.J. Rossini
Unfortunately, due to an inexcusable oversight on
our part, we failed to be crystal clear in our article
Yan and Rossini (2003) that all the described steps
and the Makefile were summarized from several doc-
uments in the R sources (R Development Core Team,
2003).
These documents are INSTALL, readme.package,
and Makefile under the directory src/gnuwin32/ in
the R source. We intended to automate and illustrate
those steps by presenting an explicit example, hop-
ing that it might save people’s time. However, confu-
sion has been caused and inquiries have been raised
on the R-help mailing list. We apologize for the con-
fusion and claim sole responsibility. In addition, we
clarify that the final credit should go to the R Devel-
opment Core Team.
Bibliography
R Development Core Team. R: A language and envi-
ronment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria, 2003. URL
http://www.R-project.org. ISBN 3-900051-00-3.
J. Yan and A. Rossini. Building Microsoft Windows
versions of R and R packages under Intel Linux.
R News, 3(1):15–17, June 2003. URL http://CRAN.R-project.org/doc/Rnews/.
Jun Yan
University of Iowa, U.S.A.
jyan@stat.uiowa.edu
A.J. Rossini
University of Washington, U.S.A.
rossini@u.washington.edu
Editor-in-Chief:
Friedrich Leisch
Institut für Statistik und Wahrscheinlichkeitstheorie
Technische Universität Wien
Wiedner Hauptstraße 8-10/1071
A-1040 Wien, Austria
Editorial Board:
Douglas Bates and Thomas Lumley.
Editor Programmer’s Niche:
Bill Venables
Editor Help Desk:
Uwe Ligges
Email of editors and editorial board:
firstname.lastname@R-project.org
R News is a publication of the R Foundation for Statistical Computing; communications regarding this publication should be addressed to the editors. All
articles are copyrighted by the respective authors.
Please send submissions to regular columns to the
respective column editor, all other submissions to
the editor-in-chief or another member of the edi-
torial board (more detailed submission instructions
can be found on the R homepage).
R Project Homepage:
http://www.R-project.org/
This newsletter is available online at
http://CRAN.R-project.org/doc/Rnews/