SPF R Users Guide


SPF-R User’s Guide


Introduction

The following guide describes an automation tool that helps to develop and assess

Safety Performance Functions (SPFs). SPFs can be straightforward to develop. The process

requires a database of roadway segments (or intersections) containing segment length, number

of crashes, and traffic volumes for each site. A generalized linear model using negative binomial

regression is used to create an equation that relates observed crashes to traffic volume and

length (as well as other independent variables, if desired). Statistical packages such as SPSS,

SAS, Stata, and R Studio perform this regression easily with built-in tools. The process can also

be achieved in Microsoft Excel using solver or custom functions.

The above-mentioned tools are simple enough to generate an SPF manually but can be

cumbersome when trying to improve model development, which requires several iterations

while filtering the roadway dataset. Moreover, the creation of CURE Plots requires several steps

and a considerable amount of overhead for large databases. FHWA’s Calibrator tool readily

generates CURE Plots but is separate from the SPF development. This separation necessitates

several intermediate and repetitive steps.

The program “R Studio” can be used to simplify and streamline the SPF development

and assessment process for large datasets, and code was written to automate the entire

process. The following sections describe each section of the R Code – named “SPF-R.” The

source code is available on GitHub at: http://github.com/irkgreen/SPF-R. The code can be

modified as needed and meaningful changes may be committed to the GitHub repository so

that other safety professionals can benefit from the enhancements. GitHub is an online,

collaborative tool that allows anyone to download the source code and contribute.

The code requires an input file in CSV-format containing roadway segments or

intersections. Each record must contain, at a minimum, traffic volume (major and minor for

intersections), length (for roadway segments), and crashes. Optionally, the input file can

contain data about the roadway (shoulder width, lane width, curvature, etc.) and crash counts

by severity.

By default, SPF-R develops an SPF based on the input file using the model form shown in

Equation 1. A CURE Plot, scatter plot, and an Excel document containing the model parameters

and data are all saved to a folder defined by the user. The following sections describe how to use

and modify SPF-R.

SPF-R Prerequisites

The above referenced source code was intended for use with R Studio. However, it may

work with other installations of R. A separate installation of Rtools as well as four R packages are

required. The following list describes the required tools:

• R Studio - https://www.rstudio.com/products/rstudio/download/

• Rtools - https://cran.r-project.org/bin/windows/Rtools/ 1

• Required packages: knitr, ggplot2, openxlsx, installr

An analyst may download and install both R Studio and Rtools from the links provided.

To install the required packages, the user can choose Tools > Install Packages from the R Studio

menu and enter the comma-separated list of packages described above. R Studio provides

sufficient error messaging to help with most installation errors.
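The same check can be sketched from the R Console. The snippet below only reports which of the required packages are not yet installed and prints the command to install them; it assumes the Excel package is the one published on CRAN as openxlsx, and the variable names are illustrative rather than part of SPF-R.

```r
# Packages listed in the prerequisites (openxlsx is assumed to be the Excel package)
required_pkgs <- c("knitr", "ggplot2", "openxlsx", "installr")

# Check each package without attaching it to the search path
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  # Print the command rather than running it, so nothing is installed silently
  message("Run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```

Note that installr is intended for Windows, so it may be reported as missing on other platforms.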

SPF-R Code Description

The following describes the purpose of each section of R-code and provides advice on

modification of code for other uses. Line numbers from the February 15, 2017 “commit” on

GitHub will be used as references. A “commit” is an upload to the repository. It is likely that the

repository will be modified after the release of this document; therefore, please refer to the

SHA hash b376201f1765f3fe3b0adadbbdd794db267c2cde.

Lines 1-17

The first few lines disable echo, clear the workspace, load libraries, and store the version

number. The workspace is cleared to simplify debugging as the previous workspace memory
can make it difficult to isolate errors. That said, this line can be removed if the user intends to
use previously stored data (warning – clearing the workspace will delete R Studio’s stored data).
Edit the version number as needed; however, the other lines should stay unchanged. Editing
the version is important so that results are tied to a specific version of SPF-R if changes are
made.

1 When installing Rtools, make sure that the box is checked to have the installer edit your PATH.

Lines 19-27

This code is used to specify an alternate location for the Windows User’s folder. For

most users, the default is sufficient. However, an alternate user folder can be hardcoded using

the computer’s name as shown in lines 21 and 23. This folder is a base folder for

input data as described below.

Lines 29-50

This section is used to map the data columns (from the input file – discussed below) to

the variables used to develop an SPF. You must specify a data column for TotalColumn,

AADTColumn, and LengthColumn. These columns represent the total crashes, traffic volume,

and length, respectively, for each site. The total crashes at each site could be for all crashes or a

specific crash type. TotalColumn must be used if only one specific crash severity is being

analyzed (e.g. fatal only crashes). However, if SPFs are to be developed for more than one

severity type then the KABCO columns can be used to simplify the SPF development process. In

this case, the input dataset must include a column for each severity type. For example, you can

develop SPFs for five severity types by using the following mappings:

• TotalColumn = "Total" #The title of the column containing All Crashes (KABCO)

• KABCColumn = "KABC" #The title of the column containing KABC Crashes

• KABColumn = "KAB" #The title of the column containing KAB Crashes

• KAColumn = "KA" #The title of the column containing KA Crashes

• KColumn = "Fatal" #The title of the column containing K Only Crashes

Spaces should be avoided in all column names; however, you can replace spaces with a period:

"Total.Crashes"

Classes can be used if your dataset contains more than one group of roadway segments

or intersection types. For example, the dataset may contain several districts across a state. SPF-

R can be used to build a separate SPF for each district. The mapped ClassColumn must contain a

positive integer (e.g. district number). The lowest and highest integers must be defined with

ClassStart and ClassEnd. Gaps in the range should be avoided. For instance, a dataset might

include data for two highway types: rural, 2-lane roads and urban 4-lane divided roads. In this

dataset, all of the rural, 2-lane roads could be coded as HighwayType = 1 and the others as

HighwayType =2. ClassColumn would be set to “HighwayType” with ClassStart = 1 and ClassEnd

= 2.

The CSVPath variable is used to set the location of the input CSV file. This file must

contain all of the fields mapped above. The CSV must have a title row. The location is relative to

the folder set in line 26. Notice that R uses forward slashes (“/”) for file paths.
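A minimal sketch of how the input location and title row behave is shown below. It assumes the file is read with read.csv (SPF-R may use a different reader), uses a temporary folder as a stand-in for the base folder, and all file and folder names are hypothetical. It also illustrates why a title such as "Total Crashes" is mapped as "Total.Crashes".

```r
# Stand-ins for the SPF-R settings (all paths here are hypothetical)
UserFolder <- tempdir()                 # plays the role of the base folder from line 26
CSVPath    <- "SPF_Input/segments.csv"  # relative location, written with forward slashes

# file.path() joins folder and file with "/" on every platform
InputFile <- file.path(UserFolder, CSVPath)
dir.create(dirname(InputFile), recursive = TRUE, showWarnings = FALSE)

# Create a small CSV with a title row; one title contains a space
writeLines(c("Total Crashes,AADT,Length",
             "3,5200,0.8",
             "0,1100,1.5"), InputFile)

# read.csv() treats the first row as titles and, by default, converts
# them to syntactic names, so "Total Crashes" becomes "Total.Crashes"
data <- read.csv(InputFile)
print(names(data))

# The column mapping therefore uses the period form of the title
TotalColumn   <- "Total.Crashes"
total_crashes <- sum(data[[TotalColumn]])
```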

The OutputProject_Base is used to define the name of the output folder. The

myFilter_Base is used to apply a global filter to the data. Generally, it is good practice to specify

that traffic volume and length are both greater than zero to avoid errors in the regression. You

can reference a field in two ways:

• Directly – data$FieldName where FieldName is the name of the field in the input CSV

• Using pre-defined variables – data[[VariableName]] where VariableName is TotalColumn

or another previously defined field (ideal for dynamic assignment of a variable

throughout the code)
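The two ways of referencing a field can be combined in one filter. The sketch below assumes the filter is stored as text and evaluated with eval(parse()), which is one common way to apply a text filter in R and may differ from what SPF-R does internally; the dataset and column titles are made up for illustration.

```r
# Small illustrative dataset (column titles are hypothetical)
data <- data.frame(AADT   = c(0, 1200, 5400, 8000),
                   Length = c(0.5, 0, 1.2, 0.7),
                   Total  = c(0, 1, 4, 2))

# Column mappings, as set in the main settings section
AADTColumn   <- "AADT"
LengthColumn <- "Length"

# Base filter stored as text: one field by variable, one field directly
myFilter_Base <- "data[[AADTColumn]] > 0 & data$Length > 0"

# Assumed mechanism: evaluate the text to a logical vector, then subset the rows
keep  <- eval(parse(text = myFilter_Base))
data2 <- data[keep, ]
print(data2)
```

Only the sites with both a positive traffic volume and a positive length remain in data2, which avoids taking the natural log of zero later on.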

It is important to change the OutputProject_Base anytime the myFilter_Base is changed.

This will ensure that the modified SPF is saved to another folder instead of overwriting the

previous analysis. There is little warning about overwriting folders or files.

The InputData_Base is used to uniquely identify the analysis type. It is recommended

that the crash time period and crash type are described in this text string. This description will

be included in the output file. Lastly, initTheta is used to specify a starting point for the

overdispersion parameter. This can be adjusted if the regression model is not able to converge.

R Code uses Theta as opposed to k for the overdispersion parameter. Theta is the reciprocal of k.

Lines 52-55

These comments simply show examples of advanced filters using AND (&) and OR (|)

operators. Notice that the placement of parentheses is important in developing filters. Text

string filters require the use of a single quote (apostrophe). R uses a single equal sign (=) to set a

variable, but double equal signs (==) to set a filter to an exact match (as opposed to an

inequality such as greater than).
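To illustrate why parentheses matter, the sketch below applies two filters that differ only in grouping. The column titles (AreaType, Lanes) are hypothetical, and the eval(parse()) step is an assumed way of applying a text filter, not necessarily the one used in SPF-R.

```r
# Hypothetical dataset with a text field and two numeric fields
data <- data.frame(AADT     = c(900, 15000, 12000, 4500),
                   AreaType = c("Rural", "Urban", "Rural", "Rural"),
                   Lanes    = c(2, 4, 4, 2))

# Text values are wrapped in single quotes because the filter itself is in double quotes
filterA <- "(data$AreaType == 'Rural' & data$Lanes == 2) | data$AADT > 10000"
filterB <- "data$AreaType == 'Rural' & (data$Lanes == 2 | data$AADT > 10000)"

keepA <- eval(parse(text = filterA))
keepB <- eval(parse(text = filterB))

# filterA keeps the urban high-volume site; filterB does not
print(data[keepA, ])
print(data[keepB, ])
```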

Lines 57-92

These lines simply check for the input dataset and attempt to bind the data. A flag is set

to TRUE, if successful.

Lines 94-193

This section represents the main function to develop the model – RunSPF. These

statements are not actually executed until called upon later in the code. This may seem a bit

counterintuitive, but these lines will be explained in a later section.

Line 196

This line merely checks that the input dataset (CSV) was bound successfully. The

following lines will not execute if unsuccessful.

Lines 198-213

This section checks if the user has defined a column of classes. If a class column is set,

then the remaining code will loop through each class. In each loop, a filter will be added to the

base filter limiting the dataset to class i where i is the current class. If no class is defined, then

no filter is applied and the loop is only executed once.
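The class loop can be sketched as follows. This is an illustration of the idea under stated assumptions, not the SPF-R source: the dataset is invented, the filter is assumed to be evaluated as text, and the call to RunSPF is left as a comment.

```r
# Hypothetical dataset with three districts
data <- data.frame(District = c(1, 1, 2, 3, 3, 3),
                   AADT     = c(500, 800, 1500, 2500, 0, 4000))

myFilter_Base <- "data$AADT > 0"
ClassColumn   <- "District"
ClassStart    <- 1
ClassEnd      <- 3

sites_per_class <- integer(0)
for (i in ClassStart:ClassEnd) {
  # Add the class condition to the base filter for this pass of the loop
  myFilter <- paste0("(", myFilter_Base, ") & data[[ClassColumn]] == ", i)
  data2 <- data[eval(parse(text = myFilter)), ]

  # Record how many sites are available for this class
  sites_per_class[as.character(i)] <- nrow(data2)

  # RunSPF would be called here using the class-specific filter
}
print(sites_per_class)
```

A gap in the class range would produce an empty data2 for that class, which is why gaps should be avoided.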

Lines 215-222

This section represents the primary SPF initialization. Three variables are temporarily

assigned to identify the crash column, the input dataset description, and the output folder. The

RunSPF function is executed using the temporarily assigned variables. Lastly, a message is

printed indicating that this code has completed.

Lines 227-272

This section executes the same code as in the previous section; however, the variables

are changed to reference the predefined severity columns, if enabled. The same three variables

are used but this time the crash columns are assigned accordingly. Similarly, the severity type is

indicated in the description variables.

Lines 94-193 (revisited)

This section develops the SPF and creates the output files. It should be more intuitive

now that the other sections have been explained. This function uses temporary variables such

that it can be called several times throughout the code. Care has been taken to make all of the

inputs and outputs generic. Line numbers are indicated where appropriate below.

A filter is applied using data from the base filter (line 43) and using a defined class (line

208), if applicable (line 97). This new data table is then sorted by the traffic volume column (line

100). The crash column is set to a variable to be used in negative binomial model development

(line 103). A generalized linear model is used to compute the regression parameters. The

natural log is used to generalize the functional form of the SPF so that the parameters are

coefficients instead of exponents. As such, the natural log of traffic volume and length are

computed (lines 104-105). Optionally, length can be calculated directly from beginning and

ending points; however, segments with a length of zero will cause an error in the SPF function.

It is therefore recommended that length is included in the input file so that a simple filter can

be applied. Theta is initialized on line 45. An effort was made to group all user-defined settings

into a few sections of the code.

Line 112 executes regression based on the SPF model form. This code can be altered to support

other model forms. A few notes about the syntax:

• The variable to the left of the tilde (~) is the dependent variable – crashes.

• The plus sign is used to separate the independent variables. These are variables that are
affected by a regression parameter as an exponent (e.g. $AADT^{b}$ or $e^{SW \cdot b}$).

• Any additional independent variables need to be added to lines 104-105 so that the
column titles are mapped to variables to be used in the glm.nb function.

• A natural log transformation must be computed for any variables lacking the exponent

(Euler’s number, e). Traffic volume (AADT) typically requires this transformation as

shown in Equation 1. Variable names that have been transformed should start with “ln”

to indicate the transformation.

• Advanced users can modify the code to include interaction terms

• Offset() is used to isolate variables that are not affected by a regression parameter (e.g.

Length). These variables should also be transformed using the natural log. Although the

current edition of the HSM (AASHTO, 2010) treats length this way, there is some recent

evidence that Length should be modeled similar to AADT. In this case offset() can simply

be removed from the R code.
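The syntax notes above can be tried on simulated data. The sketch below is not taken from SPF-R: it generates crashes from a known SPF (the parameter values -7, 0.8, and theta = 2 are arbitrary assumptions) and then fits the typical model form with glm.nb from the MASS package, using length as an offset.

```r
library(MASS)  # provides glm.nb()

set.seed(42)
n      <- 2000
AADT   <- round(exp(runif(n, log(500), log(20000))))  # simulated traffic volumes
Length <- runif(n, 0.1, 2)                            # simulated segment lengths (miles)

# Simulate crashes from a known SPF: L * exp(-7) * AADT^0.8, with theta = 2
mu    <- Length * exp(-7) * AADT^0.8
crash <- rnbinom(n, size = 2, mu = mu)

# Natural log transformations, named with the "ln" prefix
lnADT <- log(AADT)
lnL   <- log(Length)

# Dependent variable on the left of the tilde; length enters as an offset
SPF <- glm.nb(crash ~ lnADT + offset(lnL))

print(coef(SPF))   # intercept and AADT exponent
print(SPF$theta)   # overdispersion parameter as reported by R
```

The fitted coefficients should land close to the values used in the simulation, which is a quick way to confirm that a formula expresses the intended model form.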

The following table lists common SPF model forms and their R Code syntax.

Table F-1. Various SPF Forms and the Corresponding R Code Syntax

Description | Functional Form** | R Code
Typical | $N = L \cdot e^{a} \cdot AADT^{b}$ | SPF=glm.nb(crash~lnADT+offset(lnL))
Alternate | $N = e^{a} \cdot AADT^{b} \cdot L^{c}$ | SPF=glm.nb(crash~lnADT+lnL)
HSM | $N = AADT \cdot L \cdot 365 \cdot 10^{-6} \cdot e^{a}$ | SPF=glm.nb(crash~offset(HSM*))
Intersection | $N = e^{a} \cdot AADT_{major}^{b_1} \cdot AADT_{minor}^{b_2}$ | SPF=glm.nb(crash~lnADT1+lnADT2)
Shoulder | $N = L \cdot e^{a} \cdot AADT^{b} \cdot e^{b_2 \cdot SW}$ | SPF=glm.nb(crash~lnADT+SW+offset(lnL))
Interaction | $N = L \cdot e^{a} \cdot AADT^{b} \cdot e^{b_2 \cdot SW + b_3 \cdot LW + b_4 \cdot SW \cdot LW}$ | SPF=glm.nb(crash~lnADT+SW+LW+SW*LW+offset(lnL))

*HSM = log(data2[[AADTColumn]]*data2[[LengthColumn]]*365*10^-6)

**N = predicted crashes, L = segment length, LW = lane width, SW = shoulder width
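One detail of R's formula syntax is worth checking for the Interaction row: in a formula, SW*LW already expands to both main effects plus the interaction term. The short sketch below (with made-up widths) compares the model terms generated by the shorter and longer spellings.

```r
# Hypothetical shoulder and lane widths (feet)
d <- data.frame(SW = c(2, 4, 6), LW = c(10, 11, 12))

# Column names of the design matrix show which terms each formula generates
cols_short <- colnames(model.matrix(~ SW * LW, data = d))
cols_long  <- colnames(model.matrix(~ SW + LW + SW * LW, data = d))

print(cols_short)
print(identical(cols_short, cols_long))
```

Both spellings produce the same terms, so either can be used in the glm.nb call; the interaction coefficient is reported under the name "SW:LW".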

Terms that are in exponential functional form (such as $e^{a}$ and $e^{SW \cdot b_2}$) do not require a

transformation; however, length, power functions (such as $AADT^{b}$), and any other terms require

a natural log transformation. Transformation is required so that the exponents (a, b, b2) can be

treated as coefficients and computed using linear regression. Consider the following

transformation:

$N = L \cdot e^{a} \cdot AADT^{b}$

$\ln(N) = \ln(L \cdot e^{a} \cdot AADT^{b})$

$\ln(N) = \ln(L) + \ln(e^{a}) + \ln(AADT^{b})$ *

$\ln(N) = \ln(L) + a + b \cdot \ln(AADT)$

where,

$\ln(AADT^{b}) = b \cdot \ln(AADT)$

$\ln(e) = 1$

*natural log identity

Notice that a and b can now be computed using linear regression with ln(L) as an offset.

In this model form a is the intercept and b is the regression coefficient for AADT. The same

transformation can be applied to other model forms using the same natural log identities. All

natural log transformations must be computed in the section of code starting at line 104.

Moreover, additional parameters (such as b1 and b2) must be referenced in the output section

near line 167 as discussed later.
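The transformation can be verified numerically. The sketch below uses arbitrary assumed values for a and b and confirms that the multiplicative form and the log-linear form give the same prediction.

```r
# Assumed parameter values, for illustration only
a <- -7.0   # intercept
b <- 0.8    # AADT exponent

AADT <- c(1000, 5000, 20000)
L    <- c(0.5, 1.0, 2.0)

# Multiplicative form: L * e^a * AADT^b
direct <- L * exp(a) * AADT^b

# Log-linear form: exponentiate ln(L) + a + b * ln(AADT)
log_form <- exp(log(L) + a + b * log(AADT))

print(all.equal(direct, log_form))
```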

More complicated model forms can also be used. In this case, it is advisable to check the

R-code syntax using Excel. This is easily accomplished by calculating the prediction using the

intended model form from within Excel. From here, the independent variables and model

parameters can be referenced directly. The resulting prediction can be compared to the fitted

result provided by R – conveniently stored in Excel as well. A perfect match (to several

decimals) confirms that the model form was properly converted. For example, consider the

fatal and injury SPF for two-lane rural road by Bauer and Harwood as described in the SPF

Development Guide:

$N_{FI} = \exp\left[b_0 + b_1 \ln(AADT) + b_2 G + b_3 \ln\left(2 \cdot \frac{5730}{R}\right) I_{HC} + b_4 \frac{1}{R} \cdot \frac{1}{L_C} \cdot I_{HC}\right]$

The equivalent R syntax for this model is:

#Point to variables

crash=data2[[CrashColumn]]

lnADT=log(data2[[AADTColumn]])

IHC=data2$IHC

ln2CD=ifelse(data2$CURVEDEG == 0 ,0,log(2*data2$CURVEDEG)*data2$IHC) # omit if DegreeOfCurve is zero**

G=data2$G

CD_L=data2$CURVEDEG*data2$IHC/(5730*data2[[LengthColumn]])

init.theta = initTheta

#################################################################

SPF=glm.nb(crash~lnADT+G+ln2CD+CD_L)

#################################################################

(Recall that CurveDegree=5730/R)

A variable dispersion can also be used but it requires an additional library. This library

will require significant modifications to the remainder of the code, however. The creation of

CURE plots, scatter plots, and SPF metrics are all based on the glm output format. While some

of the code might work, much of it will require adjustments. As an alternative, these lines can

be commented out and a manual summary can be used to view the model results. The

following code shows the essential lines required to employ a variable dispersion.

library(gnlm)

#Point to variables

crash=data2[[CrashColumn]]

lnADT=log(data2[[AADTColumn]])

lnL=log(data2[[LengthColumn]])

SPF = gnlr(crash, dist="negative binomial", mu=~exp(a+b*lnADT+c*lnL), shape=~(const+b1*lnL), pmu=list(a=0,b=0,c=0),

pshape=c(0,0))

It should be noted that the results of this methodology have been compared to another

statistical package (Stata) and there are some discrepancies. The resulting parameters differ

slightly (likely variations in the way they are estimated) but not enough to change the

predictions. More importantly, the signs of the parameters are opposite. This may imply there is

a bug in R’s gnlm library (the results from Stata are more intuitive and are likely correct).

Validation should be used with other statistical packages before employing this feature. This

was observed when both reported parameters were found to be negative in Stata. While this

was consistent, it was not exhaustively tested and may not apply in all cases.

Line 116 adds the SPF predictions, residuals, and cumulative residuals to the recently

sorted table. The SPF prediction is simply the predicted crashes using the fitted SPF for each

record in the dataset. The residuals are the difference between the actual crash experience and

the prediction.
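As a small worked example of these three columns, the sketch below uses made-up observed and predicted values (the column names are illustrative and may differ from those in SPF-R).

```r
# Hypothetical observed crashes and SPF predictions for four sites
dataout <- data.frame(AADT      = c(8000, 1200, 3500, 500),
                      Observed  = c(5, 0, 3, 1),
                      Predicted = c(4.2, 0.9, 2.1, 0.4))

# Sort by traffic volume before accumulating
dataout <- dataout[order(dataout$AADT), ]

# Residual = observed minus predicted; cumulative residual = running total
dataout$Residuals <- dataout$Observed - dataout$Predicted
dataout$CumulRes  <- cumsum(dataout$Residuals)

print(dataout)
```

The order of the rows matters: the cumulative residuals are only meaningful for the variable the table was sorted by.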

The next section (lines 118-146) calculates the information needed to create the CURE

Plot. The CURE Plot is a scatter plot of the cumulative residuals versus a sorted variable

(typically traffic volume). The standard deviation of the cumulative residuals is used to create upper and lower

bounds at the 95% confidence level (±1.96 standard deviations). This section also flags road

segments that are outside of the bounds so that the Percent CURE Deviation (PCD) can be

computed. The ggplot2 library is used to generate the CURE plot and add labels. The resulting

graph is saved as a PNG file to the output folder.

CURE plots can also be generated for other variables. To accomplish this, the data must

be sorted by the variable of choice. It is common for length to be used in CURE plots as well as

traffic volume. The following code shows how to implement this change (the references to

LengthColumn and the "Length" labels can be changed to reference a variable other than length).

#sort by Length

data3 <- dataout[ order(dataout[[LengthColumn]]),]

#add new cumul

dataout2 <- cbind(data3,CumulRes2=cumsum(data3$Residuals))

#calculate data for CURE plot

datalimits2 <- data.frame(dataout2$Residuals)

datalimits2["Length"] <- NA

datalimits2$Length <- dataout2[[LengthColumn]]

datalimits2["CumulRes"] <- NA

datalimits2$CumulRes <- dataout2$CumulRes2

datalimits2["Squared_Res"] <- NA

datalimits2$Squared_Res <- datalimits2$dataout2.Residuals^2

datalimits2["CumulSqRes"] <- NA

datalimits2$CumulSqRes <- cumsum(datalimits2$Squared_Res)

datalimits2["SigmaSum"] <- NA

datalimits2$SigmaSum <- sqrt(datalimits2$CumulSqRes)

datalimits2["StdDev"] <- NA

datalimits2$StdDev <- datalimits2$SigmaSum*sqrt(1-datalimits2$CumulSqRes/sum(datalimits2$Squared_Res))

datalimits2["UpperLimit"] <- NA

datalimits2$UpperLimit <- datalimits2$StdDev * 1.96

datalimits2["LowerLimit"] <- NA

datalimits2$LowerLimit <- datalimits2$StdDev * (-1.96)

datalimits2["Per_CURE"] <- NA

datalimits2$Per_CURE <-

ifelse(datalimits2$CumulRes<=datalimits2$UpperLimit,ifelse(datalimits2$CumulRes>=datalimits2$LowerLimit,1,0),0)

#create CURE plot

CUREPlot2 <- ggplot(datalimits2, aes(x = Length)) +

geom_point(aes(y = UpperLimit, col = "Upper")) +

geom_point(aes(y = LowerLimit, col = "Lower")) +

geom_point(aes(y = CumulRes, col = "CumulRes")) +

ggtitle("CURE Plot") +

labs(x="Length",y="Cumulative Residuals",color="variable")

#save this plot explicitly rather than relying on the most recent plot

ggsave(filename=paste0(OutPath,OutputProject,"_CURE_L.png"), plot=CUREPlot2)

The same library is used to plot traffic volume versus crashes (actual) per mile (lines

148-154). The SPF predictions are also divided by segment length and plotted to visualize the

SPF model. This plot indicates the relative amount of dispersion in the data and is saved to the

output folder as a PNG. The scatter plot will include a curve represented by points that

describes the shape of the SPF normalized by length. When additional variables are added to

the SPF, this curve is obfuscated as each point is affected by more than just AADT (such as lane

or shoulder width). In this case it would be more appropriate to plot the SPF at various

combinations of the additional variables (e.g. SPFs for lane width of 9 feet, 10 feet, and 11 feet);

each with a slightly different shape. This can be added to the output but was beyond the scope

of this guide.

The next section (lines 156-170) calculates basic descriptive statistics about the data

such as total crashes, mileage, and number of records. Goodness-of-fit measures are also

calculated so that similar models can be compared and improved:

• An equivalent analog to R-squared does not exist for negative binomial regression;

however, a pseudo-R-squared can be computed.

• PCD is calculated by computing the percentage of segments that are outside of the

upper and lower confidence bands from the CURE Plot.

• The Maximum Absolute CURE Deviation is simply the largest (positive or negative)

cumulative residual. As described earlier, this can be useful in outlier and data error

detection.

• Lastly, the Mean Absolute Deviation (MAD) is computed as the average of the absolute

values of the residuals.
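The three CURE-based measures can be sketched in a few lines. The function below is an illustration rather than the SPF-R source: the name cure_metrics and the example residuals are hypothetical, and the limits follow the same standard deviation formula used in the CURE plot code shown earlier, with 1.96 as the 95% multiplier.

```r
# residuals: observed minus predicted, already sorted by the CURE variable
cure_metrics <- function(residuals) {
  cum_res <- cumsum(residuals)
  cum_sq  <- cumsum(residuals^2)
  total   <- cum_sq[length(cum_sq)]

  # Standard deviation of the cumulative residuals at each site
  std_dev <- sqrt(cum_sq) * sqrt(1 - cum_sq / total)

  # A site is flagged when its cumulative residual falls outside +/- 1.96 std. dev.
  outside <- abs(cum_res) > 1.96 * std_dev

  list(PCD  = 100 * mean(outside),   # percent of sites outside the limits
       MACD = max(abs(cum_res)),     # largest absolute cumulative residual
       MAD  = mean(abs(residuals)))  # mean absolute residual
}

# Hypothetical residuals for four sites
metrics <- cure_metrics(c(2, -1, -2, 1))
print(metrics)
```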

These metrics are stored into three arrays including the metric name, the value, and a

description. The descriptions, in many cases, include helpful comments such as if higher or

lower values are preferred or if there are recommended limits. For instance, the HSM has

recommendations for the number of crashes per year and miles in a network for SPF

development. It is important to note that these arrays must be altered if there are any changes

to the SPF functional form (as described in Table F-1). That is, if a minor AADT is added to the

SPF then the corresponding regression coefficient must also be added to the three arrays. The

coefficient is referenced using the following code:

coef(summary(SPF))["VariableName","Estimate"]

The term “VariableName” must be replaced with the variable used in line 112 that

corresponds to the coefficient. For instance, the following three lines of code would be used to

report the five regression coefficients described in Equation 2 (the altered and added code is

underlined).

datametrics <- data.frame(Values = c(Sample,Mileage,Crashes,RSquared,PCD,MACD,MAD,SPF$theta,
coef(summary(SPF))["(Intercept)","Estimate"],coef(summary(SPF))["lnADT","Estimate"],
coef(summary(SPF))["G","Estimate"],coef(summary(SPF))["ln2CD","Estimate"],
coef(summary(SPF))["CD_L","Estimate"], SPF$SE.theta, SPF$aic, "", "", ""))

datametrics$Notes <- c("100-200 intersections*","100-200 miles*","300 crashes per year*",
"Higher values preferred","Less than 5%","Smaller values preferred","Smaller values preferred",
"Higher values preferred","(b0)","(b1)","(b2)","(b3)","(b4)", "", "",
myFilter, InputData,"*As recommended by FHWA-SA-14-004")

attr(datametrics, "row.names") <- c("Sample","Length","Crashes","R2","PCD","MACD","MAD","Theta",
"Intercept","lnADT","G","ln2CD","CD_L","StdErr","AIC",
"Filter","Input Data","")

Care must be taken to ensure that each line is altered similarly such that each array

reports the data in the same order.

The next section (lines 172-180) calculates the Potential for Crash Reduction (PCR) using

the Empirical Bayes (EB) method as outlined in the HSM. The equation for the Empirical Bayes

estimate is:

EB[N] = w * E[N] + (1 - w) N

where:

EB[N] = EB estimate of expected crashes at the site

E[N] = predicted number of crashes at the site based on the SPF

N = number of observed crashes at the site

w = weight equation defined as: 1 / [1 + (E[N]/θ)]

θ = over-dispersion parameter (reciprocal of k)

It should be noted that R terminology and the above methodology differ slightly from

the HSM. R reports the over-dispersion parameter as theta which is the reciprocal of k as

designated by the HSM and most other statistical packages (SPSS, SAS, etc.). Also, the input files

used for SPF development are typically created for a five-year period. That is, there is one

record per segment with a single traffic volume and an aggregated total of crashes for the

entire period. As such, there is no need to total the predicted number of crashes as shown in

the HSM in equation 3-10.

The EB estimate is a critical step in the network screening process as it addresses

regression-to-the-mean bias. An analyst may be tempted to compare the observed crashes (N)

to the prediction from the SPF (E[N]); however, this can potentially be misleading if the

observed crashes are uncharacteristically high or low. The EB estimate instead approximates the

expected crashes by blending the observed and predicted values using the above weight equation.

PCR is then calculated by the following equation:

PCR = EB[N] - E[N]
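A short numerical sketch of the EB and PCR equations follows. The theta value and the observed and predicted crashes are invented for illustration; in practice theta comes from the fitted model and the predictions from the SPF.

```r
# Assumed inputs for three sites
theta     <- 2.0                # overdispersion parameter reported by R (reciprocal of k)
observed  <- c(12, 3, 0)        # N: observed crashes
predicted <- c(6.0, 4.0, 1.5)   # E[N]: SPF predictions

# Weight: w = 1 / (1 + E[N] / theta)
w <- 1 / (1 + predicted / theta)

# Empirical Bayes estimate: weighted blend of prediction and observation
EB <- w * predicted + (1 - w) * observed

# Potential for Crash Reduction
PCR <- EB - predicted

print(data.frame(observed, predicted, w, EB, PCR))
```

For the first site the weight is 0.25, so the EB estimate (10.5) sits closer to the observed 12 crashes than to the predicted 6, giving a PCR of 4.5. Sites with negative PCR are performing better than the SPF predicts.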

This number represents the potential benefit that can be expected if the target crash

type is addressed such that the segment of roadway (or intersection) is to become more like

the average segment in the road type. That is, if an SPF was developed for lane departure

crashes and a PCR at a site was calculated to be 20.6 crashes, then installing rumble stripes

could be expected to eliminate nearly 21 crashes over a 5-year period. A Crash Modification

Factor (CMF) could be used to quantify this reduction in crashes based on a specific

countermeasure.

The final section (lines 182-192) creates an Excel file with the metrics and goodness-of-

fit information. Original input data along with all site-specific data (e.g. PCR, weight, SPF

prediction, etc.) are also written out to the same Excel document in a separate sheet.

Configuring and Running SPF-R

The SPF development tool can easily be configured to work for a variety of SPF models. Filters

can be applied to develop SPFs for specific crash types or to change the roadway geometry. In

addition, classes can be used to develop SPFs for several subsets of data. The following is a

summary of the lines that are typically changed:

• Line 17 – Version number – It is good practice to update this number to indicate significant

changes to the code base (please consider sharing any advancements on GitHub as

well).

• Line 26 – User folder – This variable is based on the current Windows User’s folder. This

is helpful as this path is different for every user.

• Lines 30-45 – Main Settings – As discussed earlier, these settings specify column names,

classes, severity outputs, main filter, and the input path (line 41). The input path can be

hard coded and will ignore the User Folder if convenient (e.g. CSVPath =

"C:/Temp/Input.csv").

• Line 112 – SPF Model Form – This line allows the user to specify a different model form.

Be sure to add statements under line 102 if any additional variables are added to the

model. For instance, a variable for the natural log of traffic volume on the minor

approach would need to be added if you were developing an intersection SPF.

Generally, all other sections of the code should remain unchanged.

Once configured, a user simply executes the script using Code>Run Region>Run All (or

using the hotkey Ctrl+Alt+R). The code includes several printed statements that will appear in

the Console that can help with debugging. The following figure shows a typical R Studio layout.

SPF-R Output

After a successful execution, a folder called “R_SPFs” will be created in the designated

output folder (a warning that this folder already exists will appear after successive

executions). In this folder, a project folder will be created containing three files: an Excel

workbook with two worksheets, an image of a crash scatter plot, and an image of the CURE

Plot. Windows Explorer provides an easy way to view the output quickly if the thumbnails are

enlarged as shown below.

R Studio is able to process a large database with several classes (recall that classes are

groups of roadway segments or intersections) resulting in several SPFs in just a few minutes (on

a modern computer at the time of writing this paper). In fact, typical SPF development takes

only a few seconds.

Conclusions

The SPF development tool presented above is useful when trying to improve SPF

development. The effect that the roadway network’s heterogeneity has on SPF development

can be quickly explored by simply adjusting the output folder (line 42) and the base filter (line

43). Consider the following example:

• Base condition #1

o OutputProject_Base = "BC1-SW_2_LW_9"

o myFilter_Base = "data$SHLDWID == 2 & data$LANEWID == 9"

• Base condition #2

o OutputProject_Base = "BC2-SW_3_LW_10"

o myFilter_Base = "data$SHLDWID == 3 & data$LANEWID == 10"

In the above example, two SPFs can quickly be developed for the same roadway

network but for different specifications for shoulder and lane widths. Each SPF will be saved to

separate folders, named accordingly. The CURE Plots can be compared and further assessment

can be performed by opening the respective Excel files. Sample sizes and goodness-of-fit

measures can be compared as well to decide which SPF is more appropriate for the dataset.

The CURE plots provide a quick and visual screening process while other goodness-of-fit

measures allow the user to objectively compare SPFs.

Resources

The following resources offer information on SPF development and calibration.

• The Highway Safety Manual, First Edition

• NCHRP Project 20-7 (Task 332): User’s Guide to Develop Highway Safety Manual Safety

Performance Function (SPF) Calibration Factors.

• SPF Decision Guide: SPF Calibration vs. SPF Development.

o https://safety.fhwa.dot.gov/rsdp/downloads/spf_decision_guide_final.pdf

• SPF Development Guide: Developing Jurisdiction-Specific SPFs.

o https://safety.fhwa.dot.gov/rsdp/downloads/spf_development_guide_final.pdf

• The Art of Regression Modeling in Road Safety by Ezra Hauer

o http://www.springer.com/us/book/9783319125282
