Data Standards Reference Handbook (Beta Release) 2019 01 Sf Publishing Manual

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 90

DownloadData Standards Reference Handbook (Beta Release) 2019-01-01 Sf  Publishing Manual
Open PDF In BrowserView PDF
Table of Contents
Introduction

1.1

Data Structure and Formats
Data Structure and Formats

2.1

Column Headers & Order

2.1.1

Date and Time

2.1.2

Text

2.1.3

Numeric

2.1.4

Location (coordinates)

2.1.5

Location (addresses)

2.1.6

Standard Reference Data
Reference Data Overview

3.1

Reference: General Admin

3.2

Department Names and Codes
Reference: Demographics

3.2.1
3.3

Sexual Orientation & Gender Identity

3.3.1

Race and Ethnicity

3.3.2

City and County of San Francisco
San Francisco Recommended Standard
Appendices
Department of Public Health’s Ethnicity Guidelines

3.3.2.1
3.3.2.1.1
3.3.2.1.1.1
3.3.2.1.2

State of California

3.3.2.2

Federal Government

3.3.2.3

Reference: Basemap

3.4

Overview

3.4.1

Parcels

3.4.2

Building Footprints

3.4.3

1

Address Numbers

3.4.4

Street Names

3.4.5

Street Suffix Abbreviations

3.4.6

Street Centerlines and Nodes

3.4.7

Reference: Boundaries

3.5

Census

3.5.1

Neighborhoods

3.5.2

Supervisor Districts

3.5.3

Zoning Use Districts

3.5.4

Appendix
Reserved Column Names

4.1

Reference Data Index

4.2

Contributing

4.3

Acknowledgements

4.4

License

4.5

2

Introduction

Data Standards Reference Handbook (Beta
Release)
DataSF operates the City and County of San Francisco's official open data portal. We are
documenting standards to make data more useful and consistent across the City at scale.
This document serves several purposes:
1. Introduce more consistency within the open data publishing process
2. Provide a single enduring, open document to help onboard new staff
3. Provide a reference for data publishers and users
4. Clarify departmental stewardship of certain reference data
We lean heavily on existing precedent where available. The scope includes:
1. Formats and data structure
2. Common reference standards (lists) that are useful across datasets and departments
We are not including domain-specific standards here like Open311 or LIVES which have
their own documentation and communities. We are also not using this to propose new
domain specific standards.
Throughout this guide we reference standard names and lists, please refer to the appendix
for reserved column names and an index of reference data.

3

Data Structure and Formats

Data Structure and Formats
This section covers format and structure standards for datasets being shared with others.
These standards are designed to make sure that field level information is shared as
consistently as possible to minimize:
1. Errors
2. Rework
3. Repetitive questions
Many thanks to Singapore's Open Data Program for providing a Data Quality Guide for
Tabular Data the bulk of which made its way into this chapter with some additions and
modifications.

4

Column Headers & Order

Column Headers
Only use alphanumeric or these 3 special characters: period (.), dash (-), and
underscore (_)
Ampersand (&) should be replaced by “and” if needed
Each must be unique
Can’t have two headers called "duration"
Units of measure should be omitted
Units can and should be provided with the data dictionary
Keep short (less than 30 characters)
A full description can and should be provided with the data dictionary

Column Order
Unique identifiers should be in the left-most column if applicable
Date and time variables should be in the first column for time series data
Fixed or classified variables should be ordered with the highest-level variable on the left
and most granular variable on the right, for example
311 cases: service_name, service_subtype, service_details
Police incidents: category, descript
Observed variables should always be on the rightmost columns, these are measured
variables often numeric, for example:
Duration
Number of Units
Number of Stories
Year Built
People Served

Is anything wrong, unclear, missing?
Leave a comment.

5

Date and Time

Date and Time
Based on ISO8601, an international standard for representing date and time. We chose
the "extended format" with the hyphens because it is more human readable.
Compare 2016-01-01 to 20160101
All date and time variables must be local time (UTC -8hrs Pacific Standard Time UTC
-7hrs Pacific Daylight Savings Time) unless specified.

Date variables
Interval

Column name

Format

Range of values

Example

Annual

year

YYYY

YYYY: 1776 onwards

2015

Monthly

month

YYYY-MM

MM: 01 to 12

2015-01

Daily

date

YYYY-MM-DD

DD: 01 to 31

2015-01-01

Weekly

week

YYYY-[W]WW

[W]WW: W01 to W52

2015-W01

Quarterly

quarter

YYYY-[Q]Q

[Q]Q: Q1 to Q4

2015-Q1

Half-yearly

half_year

YYYY-[H]H

[H]H: H1 or H2

2015-H1

For fiscal periods, prefix “fiscal_” to column name
Interval

Column name

Format

Example

Fiscal, annual

fiscal_year

YYYY

2015

Fiscal, monthly

fiscal_month

YYYY-MM

2015-01

Fiscal, quarterly

fiscal_quarter

YYYY-[Q]Q

2015-Q1

Fiscal, half-yearly

fiscal_half_year

YYYY-[H]H

2015-H1

Fiscal year start date must be indicated in the data dictionary
e.g. The fiscal year starts on July 1 and ends on June 30 for the City and County of
San Francisco

Date-time and time variables
ISO 8601 uses 24 hour clock system in hh:mm:ss format (do not use AM or PM)
e.g. 13:00 is equivalent to 1:00 PM

6

Date and Time

Type
Date +
time

Time only

Column
name

Format

date_time

time

Example

YYYY-MM-DD[T]hh:mm

2015-01-01T13:00

or YYYY-MMDD[T]hh:mm:ss

2015-0101T13:00:00

hh:mm

13:00

or hh:mm:ss

13:00:00

Specify the timezone if it is not local time (UTC -8hrs Pacific Standard Time UTC -7hrs
Pacific Daylight Savings Time):
Type

Column
name

Date +
time

date_time

Format

Example

YYYY-MMDD[T]hh:mm+hh:mm

2015-0101T12:00+00:00

or YYYY-MMDD[T]hh:mm:ss+hh:mm:ss

2015-0101T12:00:00+00:00:00

Date and time extracts
In certain cases you may want to provide a single variable representing the number or name
of an individual date component, a day, a month, etc. There's no requirement to provide
these, but follow this guidance:

7

Date and Time

Extract

Column
name

Type

Range of values

Year

year_num

integer

any valid year

Month

month_num

integer

1 to 12

Month
Name

month_name

string

January, February, March, April, May, June,
July, August, September, October, November,
December

Week of
Year

woy_num

integer

0 to 51

Day

day_num

integer

1 to 31 (varies by month)

Day of
Week

dow_num

integer

0 to 6

Day of
Week
Name

dow_name

string

Monday, Tuesday, Wednesday, Thursday,
Friday, Saturday, Sunday

Hour

hour_num

integer

0 to 23

Minute

minute_num

integer

0 to 59

Second

second_num

integer

0 to 59

These can often be automatically extracted from a valid ISO-8601 date, for example the
open data portal enables querying a dataset with these date extract functions:
date_extract_d() - extracts the day from a date as an integer
date_extract_dow() - extracts the day of week as an integer between 0 and 6 (inclusive)
date_extract_hh() - extracts the hour of the day as an integer between 0 and 23
(inclusive)
date_extract_m() - extracts the month as an integer
date_extract_mm() - extracts the minute from the time as an integer
date_extract_ss() - extracts the second from the time as an integer
date_extract_woy() - extracts the week of the year as an integer between 0 and 51
(inclusive)
date_extract_y() - extracts the year as an integer

Durations
Durations can be automatically calculated if you provide a separate start and end period in
your dataset. If you also want to provide a duration, please:
Provide the milliseconds between the start and end period (include the duration unit in

8

Date and Time

the data dictionary)
Milliseconds can be rolled up to other time intervals
Use duration in your column name but prepend with a useful descriptor, e.g:
flight_duration
response_duration
dwell_time_duration
travel_duration
Do not duplicate any of the duration column names per the guidance on columns
Note: ISO 8601 does have separate guidance on duration formatting, but we find this
more cumbersome than just calculating milliseconds between a period for which there
are many standard programming libraries.

Is anything wrong, unclear, missing?
Leave a comment.

9

Text

Text
UTF-8 encoding should be used
This ensures that special characters can be decoded by users
No line breaks within cells
This can break parsing in software like Excel, introducing data integrity issues
There are many ways to remove and detect line breaks, but this can vary based on
how you're extracting data

Considerations for categorical variables
Please maintain consistency with canonical and standard reference lists
This helps with analysis across departments and data systems
Common reference lists are provided within this document, including the departmental
steward of the list where applicable and links to the data

Character case
Text should be presented in the easiest to interpret/read format where appropriate.
Title case
Address String
Categories when either the source system presents them this way or it is easy to
interpret from the source
Upper case
Acronyms - e.g - PSA (Park Service Area)
States - e.g. CA
Lower case
Categories when the source system presents them in caps and there's no way to
interpret them to title case
Research suggests lower case as opposed to uppercase is easier to read for humans
and just as useful to machines, note exceptions above

Is anything wrong, unclear, missing?

10

Text

Leave a comment.

11

Numeric

Numeric
No commas
e.g. "1000" instead of "1,000"
No units of measurement
Units should be in metadata instead
Express as full number where possible
e.g. "1200000" instead of "1.2" (million)
If rounded, indicate in metadata
No rounding if possible
Give raw numbers as far as possible
If rounding is needed, try to provide at least 2 decimal places of precision
Percentages can be expressed as either a proportion out of 1 or 100.
e.g. 20% can be expressed as 20 or 0.2
The representation of percentages must be consistent throughout your dataset
(e.g. among different percentage fields)
Agencies must indicate how percentages are expressed in the data dictionary

Is anything wrong, unclear, missing?
Leave a comment.

12

Location (coordinates)

Location (coordinates)
Coordinates in EPSG 4326 or EPSG 2227
Only EPSG 4326 coordinates can be mapped within the open data portal
Should be represented in two columns
EPSG 4326:

latitude

EPSG 2227:

x_coord

and
and

longitude

or

y_coord

Note: all EPSG 4326 coordinates will be loaded into the open data portal to support
mapping and presented in an additional single location column there called
the_geom

. EPSG 2227 coordinates will be represented as the two original columns

In positive/negative floating point
e.g.

latitude

: 37.761146;

longitude

: -122.436235

EPSG should be indicated in metadata

Is anything wrong, unclear, missing?
Leave a comment.

13

Location (addresses)

Location (addresses)
Why valid addresses matter
Consistent formatting of valid addresses is important for accurately mapping and
referencing geographic information
A poorly formed address could end up mapping to the wrong geographic reference or
not at all, reducing the usefulness of the data
Poorly formed addresses can make cleanup of data labor intensive and result in
reporting errors where geography (neighborhoods, census, etc.) is concerned
Poorly formed addresses could also result in additional costs because of things like:
Undeliverable/returned mail
Failure to apply benefits to recipients appropriately based on geography
Poor routing of vehicles or people in the field

Address formatting
Addresses should be output with the level of detail relevant to the data
e.g. permits can be applied down to the sub-address level
If providing addresses in a complete string, make sure the addresses are well formed
and consistent for easy parsing, for example:
741 Ellis Street, Unit 5, San Francisco, CA 94109
901 Bayshore Boulevard, Unit 209, San Francisco, CA 94124
When providing multiple addresses within a dataset, prepend your column names with
the type of address
e.g. address vs. mailing_address (see Registered Businesses dataset)
Where appropriate, use a valid Enterprise Address System address
EAS addresses capture addresses input by DBI staff, see the section on address
numbers for more detail

Address elements
Below are some common elements of an address (but not all)
Not all addresses will have all elements
Address granularity will be driven by the business need, so not all systems will collect

14

Location (addresses)

every element
Note: systems can be designed to validate or lookup addresses on entry,
minimizing error
Make sure the individual elements of an address line up with the guidance below
You can publish addresses as either single strings or break into separate fields
Note: this guidance is provided to promote consistency across the bulk of shared
tabular datasets and not as a comprehensive guide to address standards. For a
comprehensive standard on addressing, see the Federal Geographic Data Committee
(FGDC) United States Thoroughfare, Landmark, and Postal Address Data Standard
Element

From Address
Number

To Address
Number

Address
Number Prefix

Address
Number

Address
Number Suffix

Street Name
Pre Modifier

Data
Type

Definition

Valid Values

Numeric

First part of a range: 10001100 Main Street, San
Francisco, CA 94102

For each street
centerline on the right
side: rt_fadd ; on the
left side: lf_fadd

Numeric

Second part of a range: 10001500 Main Street, San
Francisco, CA 94102

For each street
centerline on the right
side: rt_fadd ; on the
left side: lf_fadd

Numeric

The portion of the Complete
Address Number that
precedes the Address
Number itself: B315 Main
Street, San Francisco, CA
94102

Official address
numbers available
through the Enterprise
Address System as

The numeric identifier for a
land parcel, house, building,
or other location along a
thoroughfare or within a
community: 315A Main Street,
San Francisco, CA 94102

Official address
numbers available
through the Enterprise
Address System as

The portion of the Complete
Address Number that follows
the Address Number itself:
315 A Main Street, San
Francisco, CA 94102

Official address
numbers available
through the Enterprise
Address System as

Numeric

Text

Text

A word or phrase in a
Complete Street Name that 1.
Precedes and modifies the
Street Name, but is separated
from it by a Street Name Pre
Type or a Street Name Pre
Directional or both, or 2. Is
placed outside the Street

address_number

address_number

address_number_suffix

Official list of street
names maintained by

15

Location (addresses)

Pre Modifier

Street Name
Predirectional

Street Name
Pretype

Street Name

Street Name
Posttype

Street Name
Postdirectional

Name so that the Street
Name can be used in creating
a sorted (alphabetical or
alphanumeric) list of street
names.: 315A Old Main
Street, San Francisco, CA
94102

Public Works

Text

A word preceding the street
name that indicates the
directional taken by the
thoroughfare from an arbitrary
starting point, or the sector
where it is located: 315A East
Main Street, San Francisco,
CA 94102

Official list of street
names maintained by
Public Works

Text

A word or phrase that
precedes the Street Name
and identifies a type of
thoroughfare in a Complete
Street Name: US Route 101,
San Francisco, CA

Official list of street
names maintained by
Public Works

Text

The portion of the Complete
Street Name that identifies
the particular thoroughfare (as
opposed to the Street Name
Pre Modifier, Street Name
Post Modifier, Street Name
Pre Directional, Street Name
Post Directional, Street Name
Pre Type, Street Name Post
Type, and Separator Element
(if any) in the Complete Street
Name.): 315A Main Street,
San Francisco, CA 94102

Official list of street
names maintained by
Public Works

Text

A word or phrase that follows
the Street Name and
identifies a type of
thoroughfare in a Complete
Street Name: 315A Main
Street, San Francisco, CA
94102

Official list of street
names maintained by
Public Works

Text

A word following the street
name that indicates the
directional taken by the
thoroughfare from an arbitrary
starting point, or the sector
where it is located: 315A Main
Street East, San Francisco,
CA 94102

Official list of street
names maintained by
Public Works

16

Location (addresses)

Text

A word or phrase in a
Complete Street Name that
follows and modifies the
Street Name, but is separated
from it by a Street Name Post
Type or a Street Name Post
Directional or both: 315A Main
Street Extended, San
Francisco, CA 94102

Official list of street
names maintained by
Public Works

Text

The type of occupancy to
which the associated
Occupancy Identifier applies.
(Building, Wing, Floor,
Apartment, etc. are types to
which the Identifier refers.):
315A Main Street, Apt 2, San
Francisco, CA 94102

There is no complete
reference of
subaddresses (aka
units) at the time. You
can refer to Enterprise
Address System
addresses with units
for a partial list.

Occupancy
Identifier

Text

The letters, numbers, words,
or combination thereof used
to distinguish different
subaddresses of the same
type when several occur
within the same feature: 315A
Main Street, Apt 2, San
Francisco, CA 94102

There is no complete
reference of
subaddresses (aka
units) at the time. You
can refer to Enterprise
Address System
addresses with units
for a partial list.

City

Text

The city the address sits
within: 315A Main Street, San
Francisco, CA 94102

Text

The names of the US states
and state equivalents: the fifty
US states, the District of
Columbia, and all U.S.
territories and outlying
possessions. A state (or
equivalent) is "a primary
governmental division of the
United States." The names
may be spelled out in full or
represented by their two-letter
USPS or ANSI abbreviation:
315A Main Street, San
Francisco, CA 94102

Recommend using
standard
abbreviations. Spell
out if you can do so
without introducing
misspellings (e.g using
validated entry).

Numeric

A system of 5-digit codes that
identifies the individual Post
Office or metropolitan area
delivery station associated
with an address: 315A Main
Street, San Francisco, CA
94102

Note, zip codes are not
actually boundaries,
but are defined by
routes. A list of valid
San Francisco
zipcodes can be
downloaded here.

Street Name
Post Modifier

Occupancy
Type

State Name

ZIP code

17

Location (addresses)

ZIP+4

Numeric

A 4-digit extension of the 5digit Zip Code (preceded by a
hyphen) that, in conjunction
with the Zip Code, identifies a
specific range of USPS
delivery addresses: 315A
Main Street, San Francisco,
CA 94102-1212

Note, zip codes are not
actually boundaries,
but are defined by
routes. A list of valid
San Francisco
zipcodes can be
downloaded here.

Is anything wrong, unclear, missing?
Leave a comment.

18

Reference Data Overview

Reference Data Overview
Reference data generally refers to an authoratative list of permissible values to be used in
other data. It may also refer to standards of collection methods against different lists as often
is the case with demographic information.
Reference data, unlike transactional data, will change less frequently and will often have a
controlled process for changes; for example, the addition of official addresses is controlled
through the permitting process.
These pages are designed to improve discoverability and documentation of some of the
most common references used across city data. This should be useful to data users, but
also data publishers as they make decisions about how to disseminate data. Additionally,
this can be used by those developing new systems.
The following pages are grouped into several related sections:
1. General Admin. Reference lists used in the administration of City business. For
example, in the City financial system.
2. Demographics. Reference lists used to capture demographic information in systems or
on surveys.
3. Basemap. References generated or used in the production of basemap data including
parcel numbers, street names, and address numbers.
4. Boundaries. References that refer to common boundaries like census areas,
neighborhoods and supervisor districts.

19

Reference: General Admin

Reference: General Admin
This section covers any references that are used in the administration of City business that
don't fall into the other reference categories. For example, categories used in the financial
system of record.

20

Department Names and Codes

Department Names and Codes
Definition
The City and County of San Francisco is made up of many organizations that perform work
and deliver services according to the charter and administrative codes of the City.
These organizations have common names, but also codes that are used in accounting for
the work and services performed. The Controller's Office maintains these codes in the
Executive Information System (EIS) where department staff maintain records related to
spending, revenue and budget among other things. Other enterprise systems use these
codes to link administrative data among departments.

Reference
Dataset

Department
Code List

Description and Constraints
These department codes are maintained in
the City's Financial System of Record.
Department Groups, Divisions, Sections,
Units, Sub Units and Departments are nested
in the dataset from left to right. Each nested
unit has both a code and an associated
name.
The dataset represents a flattened tree
(hierarchy) so that each leaf on the tree has
it's own row. Thus certain rows will have
repeated codes across columns. Data
changes as needed.

Reference Columns

Nested (right to left):
department_group_code
division_code
section_code
unit_code
sub_unit_code
department_code

21

Reference: Demographics

Reference: Demographics
Where standards or references exist for demographic information, we include those here.
Currently this covers Sexual Orientation & Gender Identity.

22

Sexual Orientation & Gender Identity

Sexual Orientation and Gender Identity
Below are standards on how to collect sexual orientation and gender identity (SOGI) If your
department does not have a standard, you are encouraged to use one of the standards
below.

San Francisco Standards
Administrative Code Chapter 104
Chapter 104 of the San Francisco code requires the collection of SOGI data by select
departments consistent with Department of Public Health guidelines. View the code for the
full set of requirements. Below is an overview.

Description of Standard
Select departments are required to solicit SOGI data consistent with Department of Public
Health's Policies and Procedures:
"Sexual Orientation Guidelines: Principles for Collecting, Coding, and Reporting Identity
Data," reissued on September 2, 2014
"Sex and Gender Guidelines: Principles for Collecting, Coding, and Reporting Identity
Data," reissued on September 2, 2014
or any successor Policies and Procedures
Sexual Orientation Guidelines
Below is a brief and incomplete excerpt - please review the full set of guidelines before using
this standard.

23

Sexual Orientation & Gender Identity

When collecting data on sexual orientation, the following format should be followed:
Selection of sexual orientation identity should be limited to one answer choice.
How do you describe your sexual orientation or sexual identity? (Check one)
a. Straight / Heterosexual
b. Bisexual
c. Gay / Lesbian / Same-Gender Loving
d. Questioning / Unsure
e. Not listed. Please specify: ________________________
f. Decline to answer
And for internal use only (not to be listed as an option to the individual):
g. Not Asked
h. Incomplete / Missing data
Sex and Gender Guidelines
Below is a brief and incomplete excerpt - please review the full set of guidelines before using
this standard.
Two questions should be used together to identify sex and gender. You should ask
these two questions, together as follows and in this order, to acquire sex and gender
demographics about both the person’s present gender identity and his or her history.
1. What is your gender? (Check one that best describes your current gender identity.)
i. (1) Male
ii. (2) Female
iii. (3) Trans Male
iv. (4) Trans Female
v. (5) Genderqueer / Gender Non-binary
vi. (6) Not listed, please specify____________________
vii. Survey forms would include options 1-6. Coding should also allow for options
7 and 8
i. (7) Declined / Not stated
ii. (8) Question Not Asked
2. What was your sex at birth? (Check one)
i. (1) Male
ii. (2) Female
iii. Survey forms would include options 1-2. Coding should also allow options 3
and 4
iv. (3) Declined / Not stated
v. (4) Question Not Asked

24

Sexual Orientation & Gender Identity

Definitions
"Gender Identity" means a person's gender as designated by that person. A person's gender
identity shall be determined based on the individual's stated gender identity, without regard
to whether the self-identified gender accords with the individual's physical appearance,
surgical history, genitalia, legal sex, sex assigned at birth, or name and sex as it appears in
medical records, and without regard to any contrary statement by any other person,
including a family member, conservator, or legal representative. An individual who lacks the
present ability to communicate his or her gender identity shall retain the gender identity used
by that individual prior to losing his or her expressive capacity.
From Section [3304.1] (c) of the Police Code
"Sexual orientation" shall mean the status of being lesbian, gay, bisexual or heterosexual.
From Section [12B.1] (c) of the Administrative Code.

Who must Comply
The following departments must comply with Administrative Code Chapter 104. See the
code for details on exceptions, the official list of required departments, and other
requirements. Other departments may use this standard as helpful.
Department of Public Health
Department of Human Services
Department of Aging and Adult Services
Department of Children, Youth and their Families
Department of Homelessness and Supportive Housing
Mayor's Office of Housing and Community Development.Requirements
These departments are also required to flow down the standard to their contractors and
service providers. The code provides more detail on these provisions.

Authority
San Francisco Administrative Code Chapter 104: Collection of Sexual Orientation and
Gender Identity Data.

California Standards

25

Sexual Orientation & Gender Identity

At this time we do now know of any California standards. However, CA Government code
8310.8 (cited as the Lesbian, Gay, Bisexual, and Transgender Disparities Reduction Act)
requires certain state departments (listed below) to collect SOGI data. It does not however
specify how they should collect that data or even that they should do it consistently. These
state agencies may flow down SOGI data collection requirements to your department for
purposes of state data collection.
(a) (1) This section shall only apply to the following state departments:
(A) The State Department of Health Care Services.
(B) The State Department of Public Health.
(C) The State Department of Social Services.
(D) The California Department of Aging.

26

Race and Ethnicity

Race and Ethnicity
Background and Overview
Background
The concepts of race and ethnicity are not concrete. They represent social-political
constructs that evolve over time and are subject to the perceptions of self and others. View a
timeline of changes in race and ethnicity in the US Census from 1790-2010.
As a result, there is no perfect standard for race and ethnicity. The standardization of race
and ethnicity data represents a tension between (1) collecting race and ethnicity data to
maximize opportunities to self-identify, self-describe, or place oneself within a group that
feels welcoming and right, and (2) collecting data that decision makers and the public can
use effectively to advance social justice and civil rights.
The changing nature of society’s understanding of race and ethnicity presents an ongoing
challenge to how it is captured.
At this time, San Francisco does not have a standard, required method for collecting race
and ethnicity data. As a result, methods vary not only by department but by program or data
system. Methods in place may be an artifact of reporting expectations, system defaults or
historic decisions.
The purpose of this section is to provide guidance as follows:
Define a recommended standard given the latest research and testing on race and
ethnicity data collection methods and to promote consistent data collection over time
Provide information on other standards that are available and may be used as
alternatives to the recommended standard

Overview of Standards
Below is an overview of the race and ethnicity data standards covered in this section.

27

Race and Ethnicity

Jurisdiction

Title

Who should use

City and
County of
San
Francisco

San Francisco
Recommended
Standard

Departments should comply with this standard
unless they face conflicting requirements. Note that
external reporting fields are not requirements.
Appendix E provides a rationale for these
recommendations.

City and
County of
San
Francisco

Department of
Public Health’s
Ethnicity
Guidelines

All new data collection systems purchased or
designed for or by the Department of Public Health.

State of
California

Racial and
Identity
Profiling Act of
2015
Regulations

The Police Department is required by state law to
use this in the context of collecting data on stops.
This standard is not required for other types of data
collection, including in the Police Department, and
may not be appropriate as it was designed to
capture perceived race/ethnicity.

Federal
Government

Standards for
the
Classification
of Federal
Data on Race
and Ethnicity
(rev. 1997)

This standard does not face City Departments. This
is included for reference as this may flow down to
City departments via federal reporting requirements.

28

City and County of San Francisco

City and County of San Francisco
San Francisco does not have a citywide standard. We include a recommended standard and
the existing guidance from the Department of Public Health.

29

City and County of San Francisco

San Francisco Recommended Standard
Description of Standard
The purpose of this data standard is to support the consistent collection, maintenance and
reporting of data on race and ethnicity across Departments. Consistent race and ethnicity
data will:
Improve our ability to track and compare differences across City services and programs
Help inform policy and procedural changes to reduce disparities across City services
and programs
The categories in this standard come from the Census 2015 National Content Test and like
the Census are not genetically, anthropologically, or scientifically based. Instead the
categories represent a socio-political construct. The Census 2015 National Content Test
consisted of a sample of 1.2 million households making it the largest and most thorough
testing and validation of detailed racial and ethnic categories. This standard relies heavily on
this research as well extensive testing done by the OMB Tabulation Working Group for the
1997 race/ethnicity standard.

Standard or Guideline
Collection Protocol
1. Self-identification preferred. Respect for individual dignity should guide the processes
and methods for collecting data on race and ethnicity. Use self-identification when
feasible and practical. If self-identification is not feasible or practical at the point of
collection, departments should provide a later opportunity for individuals to self-identify.
i. Exception. When collecting data for purposes of understanding bias in
perceptions, use perceived race and ethnicity. For example, data collection on
stops must use perceived race and ethnicity.
2. Multiple selections must be allowed. Respondents or data collectors must be allowed
to select more than one response.
3. Refusal to answer. If the respondent does not answer the race/ethnicity question, the
interviewer may repeat the question and response options. If the respondent fails to
respond to the question, the interviewer may infer a response (based upon observation
or information provided by another source).
4. Training. If staff will be collecting data verbally per this standard, Departments should

30

City and County of San Francisco

develop and implement standard training.

Question Format
Below are formats you should use when collecting race and ethnicity data. The formats
below address:
Ability to collect multiple values. Not all systems are able to collect multiple
selections for a single field value. Use the formats as follows:
Format A. Use this format if your system allows for the selection of multiple values.
Most modern systems should be able to accommodate this.
Format B. Use this format if your system is unable to select multiple values.
Option to collect detailed data. Under each format option or via subsequent
questions, you can collect additional details on subgroups. Each detailed option must
roll up into a one of the 7 standard groups (1). See Appendix C for suggested options.
(1) Refer to 2015 National Content Test Race and Ethnicity Analysis Report. February
28, 2017. Matthews, Kelly et all. Pages 200-282 for roll up guidance

Format A. Multi-Select
Field name (1)

Race and ethnicity

Question prompt (2)

Paper data collection: Mark all that apply
Electronic data collection: Select all that apply

Options and order (3)

White
Asian
Hispanic, Latino, or Spanish
Black or African American
Middle Eastern or Northern African
Native Hawaiian or Other Pacific Islander
American Indian or Alaska Native

Format

Multi-select checkbox. See Appendix B for examples.

(1-2) This terminology was tested in the Census 2015 National Content Test.
(3) Order based on population of San Francisco MSA.

Format B: Single Select
If you cannot use a multi-select option, this format consists of the same field collected at
least twice as follows.

31

City and County of San Francisco

Field name

Race and ethnicity 1

Question prompt

Paper data collection: Mark which one that applies
Electronic data collection: Select which one that applies

Options and order

White
Asian
Hispanic, Latino, or Spanish
Black or African American
Middle Eastern or Northern African
Native Hawaiian or Other Pacific Islander
American Indian or Alaska Native

Format

Radio button. See Appendix B for examples.

Field name

Race and ethnicity 2

Question prompt

If applicable, mark an additional race/ethnicity
Paper data collection: Mark which one that applies
Electronic data collection: Select which one that applies
White
Asian
Hispanic, Latino, or Spanish
Black or African American
Middle Eastern or Northern African
Native Hawaiian or Other Pacific Islander
American Indian or Alaska Native

Options and order

Format

Radio button. See Appendix B for examples.

Reporting
At a minimum, you should calculate the following estimates when reporting on race and
ethnicity data.
Each race and ethnicity alone. This table will provide a Census compatible table that
sums to 100%. To create this table, report the following groups:
White alone
Asian alone
Hispanic, Latino, or Spanish alone
Black or African American alone
American Indian or Alaska Native alone
Middle Eastern or Northern African alone
Native Hawaiian or Other Pacific Islander alone
Two or more races

32

City and County of San Francisco

Each race and ethnicity plus some other race. This table will sum to more than
100%. To create this table, report the following groups:
White plus any other race and ethnicity
Asian plus any other race and ethnicity
Hispanic, Latino, or Spanish plus any other race and ethnicity
Black or African American plus any other race and ethnicity
American Indian or Alaska Native plus any other race and ethnicity
Middle Eastern or Northern African plus any other race and ethnicity
Native Hawaiian or Other Pacific Islander plus any other race and ethnicity

Mapping and Transformations
You may need to map your race and ethnicity data for the purposes of matching how this
data is reported by other jurisdictions, surveys or even historical data your department may
have collected. When doing mapping and transformations, you will have to address three
core issues:
1. Mapping to a standard that does not allow for multi-select
2. Mapping to a standard that used two separate questions for race and ethnicity
3. Mapping to a standard that uses different groups or categories
The rules below break out by case depending on the destination system or standard. The
mapping tables provide detailed specifications on how to meet these. Appendix F provides
more background on these rules. Appendix A provides details on how to do this mapping.

Case 1. Mapping to a combined question format with multiselect options
In Case 1, the only issue that would come up would be different categories. The most
common differences should be mapped as follows. If you come across additional ones, feel
free to reach out to us for guidance.
1. Middle Eastern or North African missing. Map to White as per Census designation.
(1)
2. Native Hawaiian or Other Pacific Islander missing. Map to Asian. (2)
3. Any other missing categories missing. Use ‘Other’ or ‘Some Other Race’ or
‘Unknown’ when available.

33

City and County of San Francisco

(1) 2015 National Content Test Race and Ethnicity Analysis Report. February 28, 2017.
Matthews, Kelly et al. Pages 200-282.
(2) Tabulation Working Group. December 15, 2000. Provisional Guidance on the
Implementation of the 1997 Standards for Federal Data on Race and Ethnicity Ch. 5
Section B.1 p 88.

Case 2. Mapping to a combined question format with
single-select option
Our standard allows for multi-selection. If you have to report to an external system that only
allows one value, use the following rules for records with multiple selections. Appendix A
provides details on how to do this mapping:
1. Missing categories. Refer to Case 1 rules if your categories do not match.
2. More than 1 selected, “Hispanic, Latino, or Spanish” selected. If one of the values
is Hispanic, report the respondent as Hispanic regardless of what other selections are
made. For example, if someone selects Hispanic and Asian, you would map them to
Hispanic.
i. If the destination standard does not have Hispanic, Latino, or Spanish as an option
use the other response to report it.
3. More than 1 selected, “Hispanic, Latino, or Spanish” NOT selected. Apply “Largest
Group other than White” rule. Map the respondent to the largest of the group as
represented in the San Francisco Bay Area general population unless that race is
White. For example, if someone selects White and Asian, report them as Asian.The
order from largest to smallest is determined using population estimates for race and
ethnic groups (when available) for the San Francisco Metropolitan Statistical Area (see
Appendix D):
i. White
ii. Asian
iii. Hispanic, Latino, or Spanish
iv. Black or African American
v. Middle Eastern or North African
vi. Native Hawaiian or Other Pacific Islander
vii. American Indian or Alaska Native
4. Exceptions to 2 and 3. If an option for multi-race exists, map multi-selections to that
option.

Case 3. Mapping to a separate question format with multiselect option

34

City and County of San Francisco

Some external standards will separate race and ethnicity into two separate fields, with
ethnicity designated for Hispanic, Latino, or Spanish, and still allow for multiple selections
under the race field. Use the following rules in this case.
1. Missing categories. Refer to Case 1 rules if your categories do not match.
2. “Hispanic, Latino, or Spanish” selected. Record ethnicity as Hispanic, Latino, or
Spanish or equivalent and:
i. If other race/ethnicities selected, record under race
ii. If no other selected, record as Unknown or Other
3. More than 1 selected, “Hispanic, Latino, or Spanish” NOT selected. Record each
selection in the destination standard using the Case 1 rules as needed.

Case 4. Mapping to a separate question format with singleselect option
Like Case 3, race and ethnicity are two separate fields, with ethnicity designated for
Hispanic, Latino, or Spanish. However, you may only select one option under the race field.
Use the following rules in this case. Appendix A provides details on how to do this mapping.
1. Missing categories. Refer to Case 1 rules if your categories do not match.
2. “Hispanic, Latino, or Spanish” selected. Record ethnicity as Hispanic, Latino, or
Spanish or equivalent and:
i. If another race/ethnicity selected, record that under race. If more than 1 additional
race/ethnicity selected, use rule 3 below.
ii. If no other selected, record as Unknown or Other
3. More than 1 selected, Hispanic, Latino, or Spanish NOT selected. Apply “Largest
Group other than White” rule. Map the respondent to the largest of the group as
represented in the San Francisco Bay Area general population unless that race is
White. For example, if someone selects White and Asian, report them as Asian.The
order from largest to smallest is determined using population estimates for the race
alone values (when available) for the San Francisco Metropolitan Statistical Area (see
Appendix D):
i. White
ii. Asian
iii. Hispanic, Latino, or Spanish
iv. Black or African American
v. Middle Eastern or North African
vi. Native Hawaiian or Other Pacific Islander
vii. American Indian or Alaska Native
4. Exception to 3. If an option for multi-race exists, map multi-selections to that option.

35

City and County of San Francisco

Definitions
Race and ethnicity data collections should include the following minimum categories and
definitions.(1)
(1) Definitions from Census 2015 National Content Test.

36

City and County of San Francisco

Category

Definition

American
Indian or
Alaska
Native

The category “American Indian or Alaska Native” includes all individuals
who identify with any of the original peoples of North and South America
(including Central America) and who maintain tribal affiliation or community
attachment. It includes people who identify as “American Indian” or “Alaska
Native” and includes groups such as Navajo Nation, Blackfeet Tribe,
Mayan, Aztec, Native Village of Barrow Inupiat Traditional Government,
Nome Eskimo Community, etc.

Asian

The category “Asian” includes all individuals who identify with one or more
nationalities or ethnic groups originating in the Far East, Southeast Asia, or
the Indian subcontinent. Examples of these groups include, but are not
limited to, Chinese, Filipino, Asian Indian, Vietnamese, Korean, and
Japanese. The category also includes groups such as Pakistani,
Cambodian, Hmong, Thai, Bengali, Mien, etc.

Black or
African
American

The category “Black or African American” includes all individuals who
identify with one or more nationalities or ethnic groups originating in any of
the black racial groups of Africa. Examples of these groups include, but
are not limited to, African American, Jamaican, Haitian, Nigerian,
Ethiopian, and Somali. The category also includes groups such as
Ghanaian, South African, Barbadian, Kenyan, Liberian, Bahamian, etc.

Hispanic,
Latino, or
Spanish

The category “Hispanic, Latino, or Spanish” includes all individuals who
identify with one or more nationalities or ethnic groups originating in
Mexico, Puerto Rico, Cuba, Central and South American, and other
Spanish cultures. Examples of these groups include, but are not limited to,
Mexican or Mexican American, Puerto Rican, Cuban, Salvadoran,
Dominican, and Colombian. The category also includes groups such as
Guatemalan, Honduran, Spaniard, Ecuadorian, Peruvian, Venezuelan, etc.

Middle
Eastern
or
Northern
African

The category “Middle Eastern or North African” includes all individuals who
identify with one or more nationalities or ethnic groups originating in the
Middle East or North Africa. Examples of these groups include, but are not
limited to, Lebanese, Iranian, Egyptian, Syrian, Moroccan, and Algerian.
The category also includes groups such as Israeli, Iraqi, Tunisian,
Chaldean, Assyrian, Kurdish, etc.

Native
Hawaiian
or Other
Pacific
Islander

The category “Native Hawaiian or Other Pacific Islander” includes all
individuals who identify with one or more nationalities or ethnic groups
originating in Hawaii, Guam, Samoa, or other Pacific Islands. Examples of
these groups include, but are not limited to, Native Hawaiian, Samoan,
Chamorro, Tongan, Fijian, and Marshallese. The category also includes
groups such as Palauan, Tahitian, Chuukese, Pohnpeian, Saipanese,
Yapese, etc.

White

The category “White” includes all individuals who identify with one or more
nationalities or ethnic groups originating in Europe. Examples of these
groups include, but are not limited to, German, Irish, English, Italian,
Polish, and French. The category also includes groups such as Scottish,
Norwegian, Dutch, Slavic, Cajun, Roma, etc.

37

City and County of San Francisco

Who must comply
Departments should comply with this standard unless they face conflicting requirements.
Note that external reporting fields are not requirements. Your data can be transformed to
meet external reporting fields if they are different from this standard. Review the section on
transformations and mapping.

Authority
San Francisco Administrative Code Chapter 22D: Open Data Policy Section 22D.2(b)(7).

38

City and County of San Francisco

Appendices
Appendix A. Mapping Crosswalk
The mapping to other data standards crosswalk provides crosswalks from the San Francisco
Recommended Standard to 4 different reporting options that do not allow the preservation of
a respondents multiple race/ethnicity designations:
Mapping to a combined question format with single-select option (Case 2)
Variation A: Without option of ‘Two or More Races’
Variation B: With option of ‘Two or More Races’
Mapping to a separate question format with single-select option (Case 4)
Variation A: Without option of ‘Two or More Races’
Variation B: With option of ‘Two or More Races’

Appendix B. Example Question Formats
Please view this google form for example question formats for implementing Format's A and
B.

Appendix C. Detailed Categories
Under this standard, departments have the discretion to collect additional detail on
subgroups within each category as long as the values roll up into one of the seven values in
this standard.
Below we provide the main categories with detailed subgroup options from two sources:
The Census 2015 National Content Test
An analysis of San Francisco MSA race and ethnicity estimates
Departments should only collect detailed subgroup data to the degree it is useful for
delivering, providing or evaluating programs and services. For example, a department may
want additional subgroup detail for one category but not for others. Many departments may
find that the seven main categories are sufficient. Appendix B includes example question
formats when collecting detailed data.

39

City and County of San Francisco

For additional roll up guidance: Refer to 2015 National Content Test Race and Ethnicity
Analysis Report. February 28, 2017. Matthews, Kelly et all. Pages 200-282.
Main Category

Census 2020 Categories
Based on US Population

Detailed Categories Based
on SF MSA distribution (1)

German
Irish
English
Italian
Polish
French
Write in

Irish
German
English
Italian
Russian
French
Scottish
Portuguese
Polish
Swedish
Norwegian
Write in

Hispanic, Latino, or
Spanish

Mexican or Mexican
American
Puerto Rican
Cuban
Salvadoran
Dominican
Columbian
Write in

Mexican or Mexican
American
Salvadoran
Guatemalan
Nicaraguan
Puerto Rican
Spaniard
Peruvian
Honduran
Cuban
Columbian
Write in

Black or African
American

African American
Jamaican
Haitian
Nigerian
Ethiopian
Somali
Write in

African American
Nigerian
Ethiopian
Jamaican
Eritrean
Haitian
Somali
Write in

White

Asian

Chinese
Filipino
Asian Indian
Vietnamese
Korean
Japanese
Write in

Chinese
Filipino
Asian Indian
Vietnamese
Korean
Japanese
Taiwanese
Thai
Laotian
Cambodian

40

City and County of San Francisco

Write In
American Indian
Alaska Native
Central or South
American Indian
Write in

American Indian
Alaska Native
Central or South
American Indian
Write in

Middle Eastern or
North African

Lebanese
Iranian
Egyptian
Syrian
Moroccan
Algerian
Write in

Iranian
Armenian
Arab
Lebanese
Palestinian
Turkish
Egyptian
Israeli
Yemeni
Algerian
Write in

Native Hawaiian or
Other Pacific Islander

Native Hawaiian
Samoan
Chamorro
Tongan
Fijian
Marshallese
Write in

Native Hawaiian
Samoan
Chamorro
Tongan
Fijian
Marshallese
Write in

American Indian or
Alaska Native

(1) Determined by analyzing weighted population counts for either ancestry, tribe
(American Indian or Alaska Native), detailed hispanic information (Hispanic) or detailed
race information (Asian) information for respondents in SF Metropolitan Statistical Area.
Each main race/ethnicity category was analyzed in isolation for all respondents who
identified as that category (either alone or in combination with another main
race/ethnicity category) using IPUMS provided flags, except for MENA which currently
has no flag. MENA was determined by finding the weighted population rank of MENA
valid ancestry values. Detailed Categories assigned to Main Categories based Census
2020 proposed mapping (see page 200-282 at 2015 National Content Test Race and
Ethnicity Analysis Report. February 28, 2017. Matthews, Kelly et al).

Appendix D. San Francisco MSA Race and
Ethnicity Estimates
The table below provides a weighted population estimate by race and ethnicity.

41

City and County of San Francisco

Appendix E. Rationale for Recommended
Standard
In the absence of a citywide standard, we relied on the following to inform a citywide
recommended standard:
The results of the Department of Public Health’s research that resulted in department
wide race and ethnicity guidelines released in 2011
The large scale, random assignment testing conducted by the US Census in 2010 and
2015 to compare alternative question formats (see overview of research)

Combined or separate questions
A standard for race and ethnicity must address a key design choice: should race and
ethnicity be asked as separate (one question for ethnicity, i.e. Hispanic or Latino, and
another for race) or combined questions?
Repeated testing by the Census showed that a combined question format yielded data of the
highest quality. This is consistent with DPH’s recommendation to use a combined question
format.
Below is an excerpt from the US Census 2015 National Content Test (NCT):

42

City and County of San Francisco

“The 2015 NCT research demonstrates that a question format that combines race and
ethnicity into one question results in more accurate reporting and dramatically lower
item nonresponse compared to the two separate questions on Hispanic origin and on
race. In addition, with a new combined question design approach which employed
multiple detailed checkboxes to help collect the reporting of detailed groups, the NCT
research successfully demonstrated how an innovative approach could collect data for
myriad groups across our nation’s diverse population. By combining the race and
Hispanic origin questions into 84 one question on race/ethnicity, the research has
shown that Hispanics can better find themselves among the race and ethnicity
categories.” (Census, 2015)
In addition, responses to combined question format can be mapped to any external reporting
requirements that are structured using separate questions.
As a result, the recommended standard for San Francisco combines race and ethnicity into
a single question using terminology and language tested in the 2015 National Content Test.
To address data mapping concerns, the standard also provides guidance and tools for
external reporting and data mapping.

Inclusion of a new category, MENA
The Census tests also explored including a category for Middle Eastern North African
(MENA), a group that historically is included in the “white” category. The results concluded
that the Census should include MENA:
“The NCT research findings show that the use of a distinct MENA category elicits higher
quality data; and people who identify as MENA use the MENA category when it is
available, whereas they have trouble identifying as only MENA when no category is
available.” (Census, 2015)
As a result, the recommended standard includes a category for MENA using terminology
and language tested in the 2015 National Content Test. The standard also provides
guidance and tools for external reporting and data mapping.

Census decision to not make changes for 2020
Despite the results of testing related to the topics above, the Census is not making changes
for the 2020 Census. This decision is controversial (this article provides some background
on the decision). Despite this decision, we are moving forward with the recommended
standard because:
San Francisco data collection does not operate under the same climate as federal
decision making

43

City and County of San Francisco

The combined question format and inclusion of MENA has generated better response
rates and better quality data in repeated testing
Our one example of a local standard (DPH's race/ethnicity) uses combined as a result
of their extensive process of analysis and community engagement
The existing federal standard already provides for a method for collecting using both
combined and separate formats
External comparisons can be mapped and most reporting already requires mapping the
census data to obtain accurate comparisons for Hispanic, Alone

Multiple Selection must be allowed
A study from the Census ranked California as 2nd highest state for those selecting two or
more races in the 2010 census. The census has historically captured race/ethnicity
information via options that allowed the respondent to select more than one. Likewise the
1997 OMB Race & Ethnicity standard calls for the use of multiple selection.
Any modern data system is capable of capturing multiple selections. For older systems
limited to single select, it is possible to capture the equivalent information via two or more
instances of a single select question.

Detailed Race/Ethnicity subgroups left to discretion of
departments
This standard should not be interpreted as discouraging or limiting the collection of detailed
race/ethnicity information. For certain purposes it is desirable to collect more detailed
race/ethnicity sub-group information. Different departments and offices will have different
sub-groups that are relevant to their work or may be needed for internal or external reporting
(ex. detailed asian ethnicities).
Similar to the Department of Public Health’s 2011 race and ethnicity guidelines, collection of
detailed race/ethnicity information is permitted as long as the values can be rolled up into
one the the 7 values in this standard.

Appendix F. Background on Mapping and
Transformation Rules
The San Francisco Recommended Standard provides rules for mapping and transforming
the standard to external or historical data collection methodologies. Below we provide
background on the Largest Group other than White rule.

44

City and County of San Francisco

Most state and federal reporting systems request data ‘as is’ via electronic transfer and will
handle aggregation (and the associated decisions) themselves. The reasoning behind this
mirrors the reasoning for this standard; it provides the reporting agency with the most
detailed data available as well as ensuring a consistent aggregation method across the
various state and local jurisdictions.
Challenge: how to map multi-select to a single value. In cases where the department
has to perform the aggregation several challenges appear when aggregating from multiselect racial/ethnicity categories to often a single race/ethnicity value. For example if a
respondent selected White and Black, or Asian and Hispanic as their races, which option do
you report?
Federal working group identified multiple methods. Considerable thought and testing
went into such questions during the shift to allowing multiple race selections in the 1997
OMB Race Ethnicity Standard. In 2000 the OMB Tabulation Working Group released
guidance on best practices for transforming multi-race data to single race reporting
standards. They presented options that ranged in complexity with each containing pros and
cons.
Deterministic whole assignment methods should be used. The federal working group
identified two main approaches:
Deterministic Whole Assignment methods which are fixed rules for assigning
race/ethnicity values
Probabilistic and Fractional Assignment methods which rely on statistical estimation
The probabilistic and fractional assignment methods are much more complex to implement
and to explain, particularly on a local scale. Given a review of the options and in consultation
with experts, we recommend using Deterministic Whole Assignment methods. We identified
three options suited to the purposes of the this standard. The three Deterministic Whole
Assignment methodologies for when there are 2 or more races selected are:
Smallest Group. The smallest of the 2 races in the general population is the one
reported.
Largest Group other than White. The largest of the 2 races in the general population is
the one reported unless that race is white.
Largest Group. The largest of the 2 races in the general population is the one reported.
The table below provides examples to illustrate the methods using the makeup of the
population in San Francisco. Given the unique demographics of San Francisco, the
preferred option is ‘Largest Group other than White’ to ensure adequate representation by
non-white groups.

45

City and County of San Francisco

Race
and
ethnicity
1

Race and
ethnicity 2

Smallest group

Largest group
other than White

Largest
group

White

American Indian
or Alaska Native

American Indian
or Alaska Native

American Indian
or Alaska Native

White

White

Asian

Asian

Asian

White

White

Black or African
American

Black or African
American

Black or African
American

White

White

Hispanic, Latino,
or Spanish

Hispanic, Latino,
or Spanish

Hispanic, Latino,
or Spanish

White

White

Middle Eastern or
Northern African

Middle Eastern or
Northern African

Middle Eastern or
Northern African

White

White

Native Hawaiian
or Other Pacific
Islander

Native Hawaiian
or Other Pacific
Islander

Native Hawaiian
or Other Pacific
Islander

White

Asian

American Indian
or Alaska Native

American Indian
or Alaska Native

Asian

Asian

Asian

Black or African
American

Black or African
American

Asian

Asian

Asian

Hispanic, Latino,
or Spanish

Hispanic, Latino,
or Spanish

Asian

Asian

Asian

Middle Eastern or
Northern African

Middle Eastern or
Northern African

Asian

Asian

Asian

Native Hawaiian
or Other Pacific
Islander

Native Hawaiian
or Other Pacific
Islander

Asian

Asian

Asian

White

Asian

Asian

White

46

City and County of San Francisco

Department of Public Health’s Ethnicity
Guidelines
Description of Standard
Below is an excerpt from the DPH guidelines:
“These guidelines were developed by SFDPH Community Programs epidemiologists,
researchers, and analysts who share concerns regarding the collection, coding,
reporting, interpretation, and use of social identity indicators. To monitor health
outcomes and intervene on behaviors that are the underlying causes of disease and
injuries, SFDPH must be able to incorporate changing definitions, relevance, and
boundaries that individuals, communities, programs and/or institutions use to identify
themselves and others.
These guidelines address the following key issues concerning race and ethnicity:
1. Desire for consistency in grouping or categorizing of race and ethnicity data across
time and data regimes.
2. Need for flexibility to accommodate many different existing data collection
practices.
3. Lack of clarity in the meaning and use of terms defining race and ethnicity.”

Standard or Guideline
The full guidelines include details on how to collect and report the data. Below are excerpts:
A single set of common mutually-exclusive core ethnicity categories that are aligned
with state and federal minimum reporting categories should be used.
Persons who select more than one ethnicity should be given the opportunity to also
select their primary ethnicity.
Ethnicity data should be minimally reported by these core categories and definitions.

Definitions
African American/ Black. A person having origins in any of the black ethnic groups of

47

City and County of San Francisco

Africa
Asian. A person having origins in any of the original peoples of the Far East, Southeast
Asia (including Philippines), or the Indian subcontinent
Native Hawaiian or Other Pacific Islander (NHOPI). A person having origins in any of
the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands
Native American. A person having origins in any of the original peoples of North
America, Central America, or South America
Latino/a. A person having origins in Mexico, Central America, South America, Puerto
Rico, or Cuba
White. A person having origins in any of the original peoples of Europe, the Middle East,
or North Africa
Multi-ethnic. A person having origins in more than one of the other core categories
specified.
“Other” should not be an option under the Core categories, for all ethnicities fall under
one of the above seven options.

Who must comply
“All new data collection systems purchased or designed for or by the Department of
Public Health that will be used to track the ethnicity of patients, clients, participants, or
other cohorts must have the ability to track ethnicity in accordance with these
guidelines. Additionally, reporting of collected data should also adhere to these
guidelines whenever possible, recognizing third party reporting requirements may be in
conflict.”

Authority
San Francisco Department of Public Health, Central Administration, “Principles for
Collecting, Coding, and Reporting Social Identity Data – Ethnicity Guidelines (COM3)”.

48

State of California

State of California
We know of only one California standard that faces a City Department. If you know of others,
regardless of whether or not they apply to City Departments, please contact someone at
DataSF.

Racial and Identity Profiling Act of 2015
Regulations
Description of Standard
Under the California Racial and Identity Profiling Act of 2015 (AB 953), state and local law
enforcement agencies must collect data regarding stops of individuals, including perceived
demographic information on the person stopped. They must report this data to the California
Attorney General's Office.
As part of this law, the California Attorney General’s Office issued regulations that detail how
stops data must be collected. This data standard includes data on race and ethnicity. Below
is the race and ethnicity excerpt from the state regulations. The full standard is available
online.
Caution: This data standard only applies to the Police Department in the context of
stops data. As a result, the data standard requires perception of race and ethnicity and
the standard reflects this in the categories used. This is due to the purpose of the data
collection, including to identify potential bias. In contrast, other standards rely on selfidentification, which typically leads to different categories

Standard or Guideline
Below is an excerpt from the standard.

49

State of California

“Perceived Race or Ethnicity of Person Stopped” refers to the officer’s perception of the
race or ethnicity of the person stopped. When reporting this data element, the officer
shall make his or her determination of the person’s race or ethnicity based on personal
observation only. The officer shall not ask the person stopped his or her race or
ethnicity, or ask questions
or make comments or statements designed to elicit this information.
When reporting this data element, the officer shall select all of the following data values
that apply:
1. Asian
2. Black/African American
3. Hispanic/Latino(a)
4. Middle Eastern or South Asian
5. Native American
6. Pacific Islander
7. White
Example: If a person appears to be both Black and Latino(a), the officer shall select
both “Black/African American” and “Hispanic/Latino(a).”

Definitions
“Asian” refers to a person having origins in any of the original peoples of the Far East or
Southeast Asia, including for example, Cambodia, China, Japan, Korea, Malaysia, the
Philippine Islands, Thailand, and Vietnam, but who does not fall within the definition of
“Middle Eastern or South Asian” or “Pacific Islander.”
“Black/African American” refers to a person having origins in any of the Black racial
groups of Africa.
“Hispanic/Latino(a)” refers to a person of Mexican, Puerto Rican, Cuban, Central or
South American, or other Spanish culture or origin, regardless of race.
“Middle Eastern or South Asian” refers to a person of Arabic, Israeli, Iranian, Indian,
Pakistani, Bangladeshi, Sri Lankan, Nepali, Bhutanese, Maldivian, or Afghan origin.
“Native American” refers to a person having origins in any of the original peoples of
North, Central, and South America.
“Pacific Islander” refers to a person having origins in any of the original peoples of
Hawaii, Guam, Samoa, or other Pacific Islands, but who does not fall within the
definition of “Middle Eastern or South Asian” or “Asian.”
“White” refers to a person of Caucasian descent having origins in any of the original
peoples of Europe and Eastern Europe.

Who must comply
50

State of California

The Police Department in the context of collecting data on stops. This standard is not
required for other types of data collection and may not be appropriate as it was designed to
capture perceived race/ethnicity.

Authority
State of California Government Code Title 2 Section 12525.5.

51

Federal Government

Federal Government
Standards for the Classification of Federal Data
on Race and Ethnicity (rev. 1997)
Description of Standard
The Office of Management and Budget sets standards for the collection of race and ethnicity
data used for federal government purposes.
The current OMB definition is from 1997 per the Standards for the Classification of Federal
Data on Race and Ethnicity (rev. 1997). The standards represent minimum requirements;
agencies can, and do, go beyond these minimum standards but they must be able to
aggregate data to the OMB’s defined categories.
These standards
“were developed in cooperation with Federal agencies to provide consistent data on
race and ethnicity throughout the Federal Government. Development of the data
standards stemmed in large measure from new responsibilities to enforce civil rights
laws. Data were needed to monitor equal access in housing, education, employment,
and other areas, for populations that historically had experienced discrimination and
differential treatment because of their race or ethnicity. The standards are used not only
in the decennial census (which provides the data for the "denominator" for many
measures), but also in household surveys, on administrative forms (e.g., school
registration and mortgage lending applications), and in medical and other research. The
categories represent a social-political construct designed for collecting data on the race
and ethnicity of broad population groups in this country, and are not anthropologically or
scientifically based.”
OMB initiated a process in 2016-17 to revisit the standard due to limitations with the existing
standard. While a notice came out in March of 2017, we do not know of any additional steps.

Standard or Guideline
Below is an excerpt containing the bulk of the standard. Read the full standard online.

52

Federal Government

This classification provides a minimum standard for maintaining, collecting, and
presenting data on race and ethnicity for all Federal reporting purposes. The categories
in this classification are social-political constructs and should not be interpreted as
being scientific or anthropological in nature. They are not to be used as determinants of
eligibility for participation in any Federal program. The standards have been developed
to provide a common language for uniformity and comparability in the collection and
use of data on race and ethnicity by Federal agencies.
The standards provide two formats that may be used for data on race and ethnicity.
Self-reporting or self-identification using two separate questions is the preferred method
for collecting data on race and ethnicity. In situations where self-reporting is not
practicable or feasible, the combined format may be used.
Respondents shall be offered the option of selecting one or more racial designations.
Recommended forms for the instruction accompanying the multiple response question
are "Mark one or more" and "Select one or more."
In no case shall the provisions of the standards be construed to limit the collection of
data to the categories described above. The collection of greater detail is encouraged;
however, any collection that uses more detail shall be organized in such a way that the
additional categories can be aggregated into these minimum categories for data on
race and ethnicity.

Definitions
The minimum categories for data on race and ethnicity for Federal statistics, program
administrative reporting, and civil rights compliance reporting are defined as follows:
American Indian or Alaska Native. A person having origins in any of the original peoples
of North and South America (including Central America), and who maintains tribal
affiliation or community attachment.
Asian. A person having origins in any of the original peoples of the Far East, Southeast
Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan,
Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Black or African American. A person having origins in any of the black racial groups of
Africa. Terms such as "Haitian" or "Negro" can be used in addition to "Black or African
American."
Hispanic or Latino. A person of Cuban, Mexican, Puerto Rican, Cuban, South or Central
American, or other Spanish culture or origin, regardless of race. The term, "Spanish
origin," can be used in addition to "Hispanic or Latino."
Native Hawaiian or Other Pacific Islander. A person having origins in any of the original
peoples of Hawaii, Guam, Samoa, or other Pacific Islands.

53

Federal Government

White. A person having origins in any of the original peoples of Europe, the Middle East,
or North Africa.

Who must Comply
This standard does not face City Departments. Below is the compliance requirement at the
federal level:
“The new standards will be used by the Bureau of the Census in the 2000 decennial
census. Other Federal programs should adopt the standards as soon as possible, but
not later than January 1, 2003, for use in household surveys, administrative forms and
records, and other data collections.”

Authority
Executive Office of the President, Office of Management and Budget (OMB), Office of
Information and Regulatory Affairs.

54

Reference: Basemap

Reference: Basemap
A basemap is most often associated with a visual representation of base geography (streets,
buildings, parks, etc.) upon which other elements may be mapped. The base layers on that
map help the user orient themselves within space.
In this section, we lay out some component basemap pieces that form core reference data.
The underlying data can be used in more than just developing a visual reference map.
We start with an overview of how the pieces fit together. Understanding this can help
you when linking and referencing data across multiple department datasets.
Then for each basemap component we provide:
A definition
Visual illustration of the concept
Authority under which it is collected
Primary or authoritative uses
Accepted values
And summary of supporting reference data

55

Overview

Basemap Overview
A location reference (addressing) data model
Each component is described in this section individually, but there are important
relationships among them.
Let's start with three core components:
Parcels. The most common unit of reference for City data, the parcel defines the
physical extent of land ownership. It is the outcome of a regulated land subdivision
process.
Address Numbers. As an outcome of permitting, new address numbers are assigned
to each entry from the street per rules specified in the Building Codes.
Building Footprints. Building footprints represent a physical structure in 2D extents.
These are not formally digitized and added to a reference during the development
process.
Note: at the time, because buildings are not updated as development occurs, there will
be missing data.

Illustration
The following illustrates the relationship among the components.

56

Overview

1. 1 Parcel, 1 Building. This is often the case in areas like the Financial District or other
densely populated office districts.
2. 1 Parcel, Many Buildings. This occurs in neighborhoods with accessory dwelling units
or detached buildings.
3. Many Parcels, 1 Building. This occurs when a building is subdivided into different
ownership.
4. 1 Parcel, No Buildings. While rare, this does happen in the case of parking lots, vacant
lots and some parks.
5. No parcels, 1 Building. Buildings can be built in the right of way (e.g. on a median)
where parcels don't exist. This happens rarely.
In all cases, a single address can never be associated with multiple buildings or multiple
parcels.

Relationship table and conceptual diagram
The table and diagram below explain the relationship among the 3 core components above.

57

Overview

From
Address
Number

Address
Number

Parcel

Parcel

Building

Building

To

Relationship

Notes

Parcel

An address
number is
related to 0
or 1 parcel

An address number may only occasionally fall
in the right of way where there is no parcel

Building

An address
number is
related to 0
or 1 building

In some cases an address will be assigned to
a lot with no physical structure

Address
Number

A parcel has
0 or many
address
numbers

When a parcel is first created through
subdivision, it may have no addresses
associated with it yet

Building

A parcel has
0 or many
buildings

A parcel doesn't have to have a building on it

Address
Number

A building
has 1 or
many
address
numbers

Per Building Code, once a building is
approved, it will have at least 1 entrance
address if not more*

Parcel

A building is
in 0 or many
parcels

Buildings may actually exist in the roadway
(e.g. a public works toolshed) and not sit on a
parcel at all. Most buildings sit within 1 or
many parcels.

*Note: The relationship between buildings and address numbers is conceptual at the
time. Staff create address points in the Enterprise Addressing System (EAS) within the
parcel but not in reference to the building. In cases where there is one building on one
parcel, the address point may fall within the building footprint, but there's not an
explicitly modeled relationship across all buildings.

58

Overview

Relationship to streets
Each of these components relates to one or more streets. A street centerline has a unique
identifier called a Centerline Node Network ID.
A building or parcel will relate to at least one street segment
Parcels or buildings with streets on either side will have 2 related segments
Corner buildings and parcels will also relate to two segments
Depending on the size and shape of the parcel or building, they could be fronted on
several or all sides by street segments
Most parcels or buildings on the interior of a block will relate to a single street
segment
An address number can only be associated with a single street segment
When the City assigns an address number they do so along a street segment
within an allowed range

A note on historical data
Streets, parcels, buildings, addresses all change over time. Historical City data could
reference things that don't currently exist. In each of the next pages, references include both
current and historical where available. You can also browse the reference index.

59

Parcels

Parcels
Definition
A parcel is a piece of land or a lot (real property) identified by a unique Assessor Parcel
Number (APN)
The APN is comprised of a block number and a lot number
Block number format: 4 numerical digits + 1 optional letter character (0012A)
Lot number format: 3 numerical digits + 1 optional letter character (037B)
Blocks are groupings of lots which are usually contiguous and usually bounded by
streets or other features on all sides
Blocks can be discontiguous and split by other blocks or streets
The City is broken up into over 6,000 blocks and over 200,000 individual lots
Note: You will see reference to

mapblklot

in some City data. This is to reference a 1:M

relationship of vertical parcels to a base parcel; e.g. condo or timeshare lots.
The practice of representing a vertical lot digitally is to duplicate and "stack" the base
parcel for each vertical lot in the building, assigning each a unique
The

mapblklot

mapblklot

is the reference to the base APN. So

blklot

blklot

number.

will be unique, while

will duplicate across vertical lots.

Illustration

60

Parcels

Block 0117 above is bounded:
On the North and South by Union and Green Streets
On the East and West by Stockton and Powell Streets
Columbus Avenue bisects it, but both sides are still part of the same block
The block is subdivided into lots numbered from 001 through 021
A full Assessor Parcel Number would be the concatenation of the block and lot
Blocks are 4 digits with an optional letter suffix - 117 becomes 0117
Lots are 3 digits with an optional letter suffix - 4 becomes 004
The full APN for lot 4 in block 117 is 0117004
These are recorded in paper maps in the Office of the Assessor Recorder and digitized

Authority
Recordation of final parcel maps happens with the Office of the Assessor-Recorder
Before recordation, subdivision maps are approved by the County Surveyor, the Public
Works Director and the Board of Supervisors
More information about the subdivision process and related codes on the Public
Works website

Use
61

Parcels

Assessor Parcel Numbers are used to tie deeds and legal records to property
Assessor Parcel Numbers used to assess and collect taxes on land and improvements
As a common administrative identifier for a number of processes like permitting

Accepted values
Must be provided in a dataset as 2 separate fields:
Block as

blk

or

block

or

block_num

- must have 4 numeric digits and an

optional letter suffix
Lot as

lot

or

lot_num

- must have 3 numeric digits and an optional letter suffix

When representing the fully qualified APN as a single field:
Name the column either

apn

or

assessor_parcel_number

or

blklot

or

block_and_lot

Concatenate the block and lot values together
Do not separate the block and lot number with space or other characters
0585012D instead of 0585/012D
Do not prepend with additional text like

APN

or

Block and Lot Number

Current parcels and corresponding identifiers in the current subdivision parcels below
Historic parcels and corresponding identifiers in the recorded parcel geography below
(note limitations)

Reference Datasets

62

Parcels

Lot
Column

APN
Column

block_num

lot_num

blklot

Recorded
Parcel
Geography
with
Transaction
Date
History

These are the current and
historic parcels with recorded
dates. Historic parcels only go
back to about 1995 with some
exceptions. Useful for tying
historic administrative records to
a location. The geography can
be used as reference but should
not be used for anything
requiring precision.

block_num

lot_num

blklot

San
Francisco
Assessor
Blocks

Just the blocks without lots

block_num

Dataset

Description and Constraints

Current
Subdivision
Parcels

These are the current active
recorded parcels. The
geography can be used as
reference but should not be
used for anything requiring
precision.

Block
Column

N/A

N/A

Is anything wrong, unclear, missing?
Leave a comment.

63

Building Footprints

Building Footprints
Definition
The extent of a building in 2 dimensional space
Includes a unique identifier and other information derived from LIDAR (e.g. max height)

Illustration

On left: Oblique view of Green St facing north between Columbus and Powell ( Imagery:
© 2017 Google; Left Panel Map Data: © 2017 Google)
On right: building footprints for the same block

Authority
SFGIS in the Department of Technology manages data collection and processing from
LIDAR
LIDAR data is provided by a third-party and is updated every ???
From this data, SFGIS derives the footprints and assigns unique identifiers as well
as additional derived statistics about the building (e.g. min, max and median height)
Information about buildings is captured by other departments including Building
Inspection, SF Environment, SF Planning and the City's Real Estate Division among
others.
Building footprints do not include administrative data about a building
They can be related to administrative data spatially and via unique identifiers

Use

64

Building Footprints

To relate other administrative records to a structure
To clarify among administrative datasets what specific structure is being referenced
To improve the addressing model so that address numbers reference a building, not just
a parcel

Accepted values
Footprints are not currently updated as new buildings are constructed
For those buildings constructed before 2010, you can use the unique identifier
sf16_bldgid

Reference Datasets
Dataset

Description and Constraints

Reference
Columns

Building
Footprints

The footprint extents are collapsed from an earlier 3D
building model provided by Pictometry of 2010, and have
been refined from a version of building masses publicly
available on the open data portal for over two years. The
building masses were manually split with reference to
parcel lines, but using vertices from the building mass
wherever possible. These split footprints correspond
closely to individual structures even where there are
common walls; the goal of the splitting process was to
divide the building mass wherever there was likely to be
a firewall. An arbitrary identifier was assigned based on a
descending sort of building area for 177,023 footprints.
The centroid of each footprint was used to join a property
identifier from a draft of the San Francisco Enterprise GIS
Program's cartographic base, which provides continuous
coverage with distinct right-of-way areas as well as
selected nearby parcels from adjacent counties.

unique
identifier for
footprint
mblr for
reference to
property
identifiers
including
parcels and
right of way

sf16_bldgid

Is anything wrong, unclear, missing?
Leave a comment.

65

Address Numbers

Address Numbers
Definition
Per Administrative Bulletin 035 (AB-035) in the San Francisco Building Codes:
All primary entrances from the street to all buildings and all direct entrances from the
street to separate tenant spaces or dwelling units shall be numbered

Illustration

Illustration of right side of Green Street between Columbus Ave and Powell St
100 valid address numbers on this segment from 600 to 699
Even adddresses on right, odds on left
Each address corresponds to an entrance from the street
Note buildings at the rear of the building facing the street have entryways from the
street (e.g. 656A, 658A, 664A, and 666A)
Numbers can be assigned where there is no building, but they must be associated
66

Address Numbers

with a parcel
e.g. the parking lot at 626 Green St

Authority
The official street numbers are assigned by the Department of Building Inspection
Building Official prior to permits for new structures according to the procedure in AB-035

Use
To identify addresses where precision is a requirement
As a location identifier for a number of citywide business processes including noticing,
permitting, business registrations, etc.

Accepted values
Street numbers are assigned according to rules laid out in AB-035, these specify:
The start and end point of address assignment
How many addresses are allocated between intersections and where that differs
Where even and odd numbers are assigned
Authorized City staff enter address numbers in the Enterprise Addressing System
according to these rules
Note on Units: The City records unit numbers for condos to support tying property
records for deeds, property taxes and other business processes. There is no formal
requirement to record the units in rental buildings.

Reference Datasets

67

Address Numbers

Street Number
Column

Dataset

Description and Constraints

Addresses
Enterprise
Addressing
System

The EAS is the system of record for DBI when
assigning official addresses. Associated coordinates
are most often associated with the center of a parcel
or close to it, rather than at the door or entry. This still
allows associations, but it means that in certain
cases a building footprint cannot be spatially
matched via intersection or "point in polygon" with it's
address(es).

address_number

Addresses
with Units Enterprise
Addressing
System

Same general limitations as the Addresses dataset
above, but also includes sub-addresses like units.
Unit numbers are formally referenced for condos
because the City records these for the purposes of
tying deeds and other property records to a specific
unit and owner. Rental units are not formally
recorded by the City.

address_number

Is anything wrong, unclear, missing?
Leave a comment.

68

Street Names

Street Names
Definition
The official name assigned to a segment of street or right-of-way that is legislated
through the subdivision process and/or Board of Supervisors
Street names are generally established when streets are created as a result of the
development / subdivision of land codified in the City's subdivision codes
Renaming streets can be initiated by members of the public or the Board of
Supervisors according to the process documented by Public Works
Note: The above only applies to city-owned public streets

Illustration

Above is the street sign for Jack Kerouac Alley (formerly Adler Alley). On the street sign,
both names are present for five years following a name change.

Authority
New street names assigned during the development / subdivision of land
Recordation of final parcel maps including new streets happens with the Office of
the Assessor-Recorder
Before recordation, subdivision maps are approved by the County Surveyor, the
Public Works Director and the Board of Supervisors
Part of the process defined in the City Subdivision Codes
Renaming of streets requires:

69

Street Names

Petition with signatures submitted to Public Works for review with a submittal fee
The resolution referred to the Clerk of the Board of Supervisors
A Public Hearing at the Land Use and Economic Development Committee
Board of Supervisors approval
Mayor's signature

Use
For official base maps to label the streets properly
As a component part of a full address (see address formatting guidance)
To validate against user submitted address data (e.g. in a form online)

Accepted values
Official street names are maintained in the City's Official Basemap updated by Public
Works staff
The full list of valid City street names is available in the street names dataset

Reference Datasets
Dataset

Description and Constraints

Reference Columns
fullstreetname

Street Names

Contains a list of officially valid street
names contained in the City's Basemap

composed of streetname
& streettype &
postdirection

San
Francisco
Basemap
Street
Centerlines

A geographic reference of the all
basemap streets including a number of
street components like the valid name

streetname
street

&

composed of
st_type

Is anything wrong, unclear, missing?
Leave a comment.

70

Street Suffix Abbreviations

Street Suffix Abbreviations
Definition
A street suffix is a word that follows the name of the street describing its type (e.g.
Street, Avenue, Road)
Suffix abbreviations are shortened forms standardized by the United States Postal
Service (USPS), these are the ones the City uses as well

Authority
The USPS sets standards for addresses for consistency across the delivery of mail
These are documented in USPS Publication 28: Postal Addressing Standards

Use
When writing or recording a short form of a full street name
1500 Market Street to 1500 Market St

Accepted values
Standard street suffix abbreviations available online under Publication 28, Appendx C1
Best not to encode with a period at the end
e.g. ST or St not St.

Is anything wrong, unclear, missing?
Leave a comment.

71

Street Centerlines and Nodes

Street Centerlines and Nodes
Definition
Street centerlines are lines that represent a network of streets
They are aligned generally to the center of a street
They are meant to model the street network and thus have no width or area
They have a length component
Street nodes are the endpoints of a street centerline and represent intersections
A node shared among multiple intersecting street segments is an intersection
Each node and centerline segment will have a unique Centerline Node Network (CNN)
identifier
The collection of Centerline Node Network identifiers are collectively known as "CNNs"

Illustration

72

Street Centerlines and Nodes

Shows 3 streets (Stockton, Green and Columbus) at a point of intersection
Each segment sits between two nodes
A segment ends where it intersects with another segment OR at the physical end of
a street (a dead end)
Some segments will start and end at the same node
Each segment and node has a CNN identifier pictured above
Segments share the same node where they intersect
Node ID 25352000 in the middle is shared by 6 segments

Authority
The management of streets falls to different jurisdictions within the City
Public Works manages and maintains the majority of streets within the City
The remaining are managed and maintained by other entities like Caltrans, Presidio
Trust National Park and Parks & Recreation, a summary of miles of streets by
jurisdiction is available on the open data portal
Basemap data including streets from various jurisdictions is maintained by Public Works

73

Street Centerlines and Nodes

Use
Centerline Node Network IDs (CNNs) are referenced in many datasets throughout the
City (including but not limited to permits and inspections, project management and asset
management systems)
Used to enhance data by adding location attributes, allowing disparate datasets to be
mapped as well as compared for analysis
To model the transportation network

Accepted Values
Every centerline and node will have a unique Centerline Node Network (CNN) identifier
cnn

as a number

cnntext

as a text string

CNN IDs (CNN) may be used in secondary columns as reference
For example:
f_node_cnn

and

t_node_cnn

to indicate from and to nodes

When referencing a CNN, include clear definition in the data dictionary, and include
cnn

in the column name

Valid IDs are in the reference datasets below

Reference Datasets

74

Street Centerlines and Nodes

Dataset

Description and Constraints

Reference
Columns

List of
Streets and
Intersections

A list of street segments and intersections sorted
by street name and ascending address number.
This data set is based on the City's GIS basemap
and contains CNN id numbers for each record.

cnn as number
For segments:
from_cnn and
to_cnn define
the node IDs at
each end

San
Francisco
Basemap
Street
Centerlines

A geographic reference of the all basemap streets
including centerline node network identifiers and
jurisdictions

as number
cnntext as text
f_node_cnn as
the starting
(from) node ID
t_node_cnn as
the ending (to)
node ID

Street
Segment
and
Intersection
(CNN)
Change Log

A list of Street Segment and Intersection (CNN)
changes including new, dropped, realigned,
divided and split records.

cnn

oldcnn

as

number
newcnn

as

number

Is anything wrong, unclear, missing?
Leave a comment.

75

Reference: Boundaries

Reference: Boundaries
Common boundary references are used in numerous City datasets. This section distills
some of the most common references. These include:
Census
Neighborhoods
Supervisor Districts
Zoning Use Districts

76

Census

Census Boundaries
Census data is available from the Federal Census Bureau. For certain City administrative
datasets, we assign census boundaries to make linking these to Census data easier.
For census boundary IDs we present the full ID starting with State ID and going down to the
most granular ID represented by the field (e.g. tract, block or block group). The full IDs are
presented as strings, not numbers. You can learn more about geographic boundaries and
identifiers on the Census website. The full IDs are constructed in the following order:
State FIPS Code (2 digit) > County FIPS code (3 digit) > Tract ID (6 digit) > Blockgro
up ID (1 digit) > Block ID (4 digits, but first digit is the same as Blockgroup ID)

On City datasets with a Census geography column, we only represent the ID for the most
granular geography appropriate to the data. For example, if we publish down to the Census
block, we don't include a separate column for blockgroup or tract. One can derive these from
the full ID because of the nesting relationship mentioned above.
Census
Boundary

Example ID

Label

State

06

California

County

06075

San Francisco County, California

Census
Tract

06075010100

Census Tract 101, San Francisco County,
California

Census
Blockgroup

060750101001

Block Group 1, Census Tract 101, San Francisco
County, California

Census
Block

060750101001000

Block 1000, Block Group 1, Census Tract 101,
San Francisco County, California

77

Neighborhoods

Neighborhoods
The City's Open Data Program provides the Analysis Neighborhoods as the primary
neighborhood district boundary on automated datasets. We also provide other neighborhood
boundaries when appropriate.
The table below includes:
the name and link to each of the neighborhood districts
the human readable column name used on the open data portal
the application programming interface (API) name
the shortname used when there are character limits (e.g. in shapefile formats)
the number of districts included in the dataset
a quick link to download a CSV of just the boundary names (without geometry)

Dataset

Column Name
(Human
Readable)

API Name

Short Name

Analysis
Neighborhoods

Neighborhooods
- Analysis
Boundaries

neighborhoods_analysis_boundaries

NBHDANA

Neighborhood
Groups

Neighborhoods
- Group
Boundaries

neighborhoods_group_boundaries

NBHDGRP

SF Realtor
Neighborhoods

Neighborhoods
- Realtor
Boundaries

neighborhoods_realtor_boundaries

NBHDSFRA

SFFind
Neighborhoods

Neighborhoods
- SFFind
boundaries

neighborhoods_sffind_boundaries

NBHDSFFIND

Note: Datasets published before we codified this practice may not reflect the above. We are
actively improving existing datasets on a rolling basis. Please consult the data dictionary and
other related documentation under the dataset's About tab. If it's still unclear, contact
DataSF, and we'll be happy to help.

78

Supervisor Districts

Supervisor Districts
Definition
There are 11 members of the Board of Supervisors, each representing a geographic
district.

Illustration
Other Fields

Supervisor District

...

1

...

2

...

3

...

4

...

5

...

6

...

7

...

8

...

9

...

10

...

11

Use
Primarily used for reporting by supervisor district

Accepted values
Column name should be

Supervisor District

or

supervisor_district

Values between 1 and 11 (integer)
Acceptable ways to indicate no district include:
null
-1

or

meaning the field has no value
0

79

Supervisor Districts

Indicate how no district is represented in your data dictionary
For example, not all 311 cases have a location and won't have an associated
district

Reference Datasets
Dataset

Description and
Constraints

Reference Columns

Current
Supervisor
Districts

Supervisor Districts as of the
2012 redistricting

- number of district
(integer 1 through 11)
supervisor

80

Zoning Use Districts

Zoning Use Districts
Definition
Zoning regulations govern how land can be used in various geographic areas called
"zoning use districts" (also known as "zoning," "zones" or "use districts").
Zoning regulations may:
govern sizes and shapes of buildings
limit the number of units or apartments that can exists on a property
require the accommodation of car parking off of the street
set controls on planting street trees under certain circumstances
specify how late a business can remain open at night

Illustration

Each part of the City is divided into zones that correspond to regulations in the Planning
Code
Get a higher resolution PDF version of the map above provided by Planning

81

Zoning Use Districts

Authority
Zoning regulations are set out in the San Francisco Planning Code and modified
through legislation
The Planning Department enforces zoning compliance

Use
For understanding what is permitted, conditional and not permitted when building in San
Francisco

Reference
Reference

Description and Constraints

Reference
Columns

Zoning
Districts

The Zoning Districts are a component of the
Zoning Map which in turn is a key component of
the San Francisco Planning Code.

url links to the
district definition in
the planning codes
zoning is the
district code

Planning
Code

The official Zoning Map can be found in the San
Francisco Planning Code on the links under
ZONING MAPS on the left navigation column).

N/A

82

Reserved Column Names

Appendix: Reserved Column Names
The following column names should be used only if they adhere to the definitions in this
guide:
analysis_neighborhood
date
date_time
fiscal_half_year
fiscal_month
fiscal_quarter
fiscal_year
half_year
latitude
longitude
month
quarter
supervisor_district
time
week
x_coord
y_coord
year
zip_code

Is anything wrong, unclear, missing?
Leave a comment.

83

Reference Data Index

Appendix: Reference Data Index
Below is a table of the reference datasets mentioned in this document. View all the
reference data below in the open data portal.
Dataset

Description and
Constraints

Addresses Enterprise
Addressing
System

The EAS is the system of
record for DBI when
assigning official addresses.
Coordinates are most often
associated with the center of
a parcel or close to it, rather
than at the door or entry.
This still allows associations,
but it means that in certain
cases a building footprint
cannot be spatially matched
via intersection or "point in
polygon" with it's
address(es).

address_number

Address
Numbers

Addresses
with Units Enterprise
Addressing
System

Same general limitations as
the Addresses dataset
above, but also includes
sub-addresses like units.
Unit numbers are formally
referenced for condos
because the City records
these for the purposes of
properly tying deeds and
other property records to a
specific unit and owner.
Rental units are not formally
recorded by the City.

address_number

Address
Numbers

Reference Columns

Page(s)

The footprint extents are
collapsed from an earlier 3D
building model provided by
Pictometry of 2010, and
have been refined from a
version of building masses
publicly available on the
open data portal for over two
years. The building masses
were manually split with
reference to parcel lines, but
using vertices from the
building mass wherever

84

Reference Data Index

Building
Footprints

Department
Code List

possible. These split
footprints correspond closely
to individual structures even
where there are common
walls; the goal of the splitting
process was to divide the
building mass wherever
there was likely to be a
firewall. An arbitrary
identifier was assigned
based on a descending sort
of building area for 177,023
footprints. The centroid of
each footprint was used to
join a property identifier from
a draft of the San Francisco
Enterprise GIS Program's
cartographic base, which
provides continuous
coverage with distinct rightof-way areas as well as
selected nearby parcels
from adjacent counties.
These department codes are
maintained in the City's
Financial System of Record.
Department Groups,
Divisions, Sections, Units,
Sub Units and Departments
are nested in the dataset
from left to right. Each
nested unit has both a code
and an associated name.

Building
Footprints

Nested (right to left):
department_group_code
division_code
section_code
unit_code
sub_unit_code
department_code

Department
Names and
Codes

These are the current active
recorded parcels. The
geography can be used as
reference but should not be
used for anything requiring
precision.

block_num
blklot

Parcels

A list of street segments and
intersections sorted by street
name and ascending

cnn

The dataset represents a
flattened tree (hierarchy) so
that each leaf on the tree
has it's own row. Thus
certain rows will have
repeated codes across
columns. Data changes as
needed.

Current
Subdivision
Parcels

unique
identifier for footprint
mblr for reference to
property identifiers
including parcels and
right of way
sf16_bldgid

lot_num

as number

85

Reference Data Index

List of
Streets and
Intersections

address number. This data
set is based on the City's
GIS basemap and contains
CNN id numbers for each
record.

For segments:
from_cnn and to_cnn
define the node IDs at
each end

Recorded
Parcel
Geography
with
Transaction
Date History

These are the current and
historic parcels with
recorded dates. Historic
parcels only go back to
about 1995 with some
exceptions. Useful for tying
historic administrative
records to a location. The
geography can be used as
reference but should not be
used for anything requiring
precision.

block_num
blklot

San
Francisco
Assessor
Blocks

Just the blocks without lots

block_num

lot_num

as number
as text
f_node_cnn as the
starting (from) node ID
t_node_cnn as the
ending (to) node ID

Street
Centerlines
and Nodes

Parcels

Parcels

cnn

cnntext

San
Francisco
Basemap
Street
Centerlines

A geographic reference of
the all basemap streets
including centerline node
network identifiers and
jurisdictions and street
names by segment

fullstreetname

composed of
streetname &

Street
Centerlines
and Nodes
& Street
Names

streettype

Street
Names
Street
Segment
and
Intersection
(CNN)
Change Log

Contains a list of officially
valid street names contained
in the City's Basemap
A list of Street Segment and
Intersection (CNN) changes
including new, dropped,
realigned, divided and split
records.

fullstreetname

composed of
streetname &

Street
Names

streettype

oldcnn
newcnn

as number
as number

Street
Centerlines
and Nodes

86

Contributing

Appendix: Contributing
All of this documentation is open source and available to edit on GitHub. If you see
something that you can contribute, submit a pull request with your edits! To make this easy
you can click the "Edit this page" link at the top of the web docs.
The docs are all written in GitHub Flavored Markdown. If you've used GitHub, it's pretty likely
you've encountered it before. You can become a pro in a few minutes by reading their GFM
Documentation page.

Organizing Files
You'll notice that the GitHub Repo is in a logical structure. Each of the major sections is a
folder. For example the 'Reference: Basemap' pages are in the folder

basemap

in the top

level of the repository.
Some of the chapters are split into multiple sections to help break up the content and make it
easier to digest. You can easily see how chapters are laid out by looking at the

SUMMARY.md

file. This convention helps keep chapters together in the file system and easy to view either
directly on github or gitbook.

Table of Contents
You'll find the table of contents in the SUMMARY.md file. It's a nested list of markdown links.
You can link to a file simply by putting the filename (including the extension) inside the link
target.

Introduction Page
This is the root README.md file. It's intent is to give the reader an elevator pitch of what this
document is about is and why we think it is useful.

Send a Pull Request
So that's it. You make your edits, keep your files and the Table of Contents organized, and
send us a pull request.

Enjoy the Offline Docs

87

Contributing

Moments after your edits are merged, they will be automatically published to the web, as a
downloadable PDF, .mobi file (Kindle compatible), and ePub file (iBooks compatible).

88

Acknowledgements

Appendix: Acknowledgements
Many thanks to Singapore's Open Data Program for providing a Data Quality Guide for
Tabular Data the bulk of which made its way into the chapter Data Structure and Formats
with some additions and modifications.

89

License

Appendix: License

90



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : Yes
XMP Toolkit                     : Adobe XMP Core 5.6-c016 91.163616, 2018/10/29-16:58:49
Format                          : application/pdf
Description                     : A work in progress, DataSF is establishing publishing standards to increase the quality and consistency of datasets across the City.
Title                           : Data Standards Reference Handbook (Beta Release)
Subject                         : 
Publisher                       : GitBook
Creator                         : datasf
Language                        : en
Metadata Date                   : 2019:04:06 21:18:23-05:00
Modify Date                     : 2019:04:06 21:18:23-05:00
Timestamp                       : 2018:07:09 17:17:22.373610+00:00
Document ID                     : uuid:a5dc7b28-f311-0948-b761-155039afcf9f
Instance ID                     : uuid:0f3d43d0-0e36-c347-9201-9fba2ab515c0
Page Count                      : 90
Author                          : datasf
Create Date                     : 2018:07:09 17:17:28+00:00
Producer                        : calibre 2.57.1 [http://calibre-ebook.com]
EXIF Metadata provided by EXIF.tools

Navigation menu