Open Data Metadata Guide 2019 01 Bloomberg

Categories
As an open data portal grows beyond 25-30 datasets, visitors are better served if they can
browse for data by subject matter or theme. These categories are short, at most a word or
two, and allow related data to be grouped together. Categories also empower visitors to
explore available data for inspiration, rather than requiring them to use a search tool to find
something specific.
Creating categories is often a fundamental step when implementing an open data portal.
Categories do not need to be permanent; it makes sense to have three to four categories for
a small number of datasets, and re-evaluate them on an annual basis as more data is
published. Most mature open data portals have 8 to 12 categories. Having too many might
mean that the categories are not broad enough. Having too few, especially when combined
with a large number of datasets, might mean that the categories are too broad and less
helpful for visitors.
Although there is no consistent set of categories between open data portals, the following
are quite common and might serve as a starting point: Business, Education, Environment,
Finance, Health, Human (or Social) Services, Property, Public Safety, Recreation, and
Transportation. A librarian or information architect can provide insight and assistance when
creating or revising the list of categories.

Dataset Metadata
Without dataset metadata, a catalog of published data could not exist. Many open data
portals include the necessary tools to create dataset metadata when publishing new data.
Some open data portals automatically update the metadata when editing datasets. Each
dataset you publish will include many of the following metadata elements.

Basic Elements
Basic metadata elements provide the most important pieces of information to help visitors
find data and determine if it is what they need. Many of these items will appear directly in
catalog navigation pages or search results.
Title (or Name): Human-readable name for the data. It should be in plain English and
include sufficient detail to facilitate search and discovery. Acronyms should be avoided.
Description: Human-readable description (e.g., an abstract) with sufficient detail to
enable a user to quickly understand whether the asset is of interest.
Category (or Theme): Main thematic category of the dataset, usually chosen from a
predefined list. Refer to the Categories section of this guide for more information. Some
open data portals limit a dataset to one category; others allow multiple.
Keywords (or Tags): Generally single words which help visitors discover the data.
Include terms that both technical and non-technical users would use. Keywords can also
be used by recommendation engines to help visitors discover similar datasets.
Modification Date: The most recent date on which the dataset was changed, updated,
or modified.
Contact Information: The name and email address of the publisher of a dataset.
License: Datasets on open data portals are often available in the public domain with no
restrictions on reuse (usually this is noted in the site’s Terms of Service or Data Policy).
However, there may be circumstances where a specific dataset is offered under a
different license.
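
The basic elements above can be thought of as one record per dataset. The following is a minimal sketch in Python, not the data model of any particular portal: the class name, attribute names, default license value, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetMetadata:
    """Basic metadata elements for one dataset (hypothetical structure)."""
    title: str
    description: str
    categories: list[str]   # some portals allow only one category
    keywords: list[str]
    modified: date          # most recent change to the dataset
    contact_name: str
    contact_email: str
    license: str = "Public Domain"  # assumed default; check the site's Data Policy


def missing_basic_elements(meta: DatasetMetadata) -> list[str]:
    """Return the names of basic elements that were left empty."""
    checks = {
        "title": meta.title.strip(),
        "description": meta.description.strip(),
        "categories": meta.categories,
        "keywords": meta.keywords,
        "contact_name": meta.contact_name.strip(),
        "contact_email": meta.contact_email.strip(),
        "license": meta.license.strip(),
    }
    return [name for name, value in checks.items() if not value]


# Made-up sample values, for illustration only.
example = DatasetMetadata(
    title="Street Tree Inventory",
    description="Location and species of every city-maintained street tree.",
    categories=["Environment"],
    keywords=[],
    modified=date(2019, 1, 15),
    contact_name="Parks Department",
    contact_email="parks@example.org",
)
print(missing_basic_elements(example))  # ['keywords']
```

A check like this could run before publication so that a dataset never reaches the catalog without the elements visitors rely on for search and discovery.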

Advanced Elements
Advanced metadata elements provide helpful information that allows third-party software to
consume both data catalogs and datasets. These items might not appear in catalog
navigation pages or search results, but allow for sharing with other open data portals and
search engines.
Frequency: The frequency with which the dataset is updated, in plain English. For
example, “Never,” “Hourly,” “Daily,” “Weekdays,” “Weekly,” “Semi-monthly,” “Monthly,”
“Quarterly,” “Semi-annually,” “Annually,” etc. This helps visitors know how often they
should check for new data, and is particularly valuable for software programmers who
may set up automatic downloads.
Temporal Coverage: The range of time included in this dataset. This may reflect a
general range for all the records, or may reflect the earliest and latest dates from
records in the data.
Spatial Coverage: The geographic area for which this dataset is relevant. A place
name - particularly one associated with clear boundaries - is most commonly used. If
the dataset includes geospatial information, spatial coverage can represent a bounding
rectangle or polygon of all the geography contained within it, though this is uncommon.
Refer to Appendix A for sample dataset metadata.
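
When temporal and spatial coverage are meant to reflect the records themselves, both can be derived from the data. The sketch below assumes each record carries a date and a longitude/latitude pair; the record layout, function names, and sample values are hypothetical.

```python
from datetime import date

# Hypothetical records: (observation date, longitude, latitude)
records = [
    (date(2018, 3, 1), -76.61, 39.29),
    (date(2018, 7, 15), -76.58, 39.31),
    (date(2017, 11, 30), -76.65, 39.27),
]


def temporal_coverage(rows):
    """Return the earliest and latest dates found in the records."""
    dates = [row[0] for row in rows]
    return min(dates), max(dates)


def bounding_box(rows):
    """Return (min_lon, min_lat, max_lon, max_lat) enclosing every record."""
    lons = [row[1] for row in rows]
    lats = [row[2] for row in rows]
    return min(lons), min(lats), max(lons), max(lats)


start, end = temporal_coverage(records)
print(start.isoformat(), end.isoformat())  # 2017-11-30 2018-07-15
print(bounding_box(records))               # (-76.65, 39.27, -76.58, 39.31)
```

A bounding rectangle is the simplest geometric form of spatial coverage. As noted above, a place name is more commonly used.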

Column Metadata
Although column metadata is often limited or left out entirely, it is very helpful to data
consumers who frequently work with, write software for, or analyze datasets. Column
metadata attributes provide important details about the data which the column contains.
Many open data portals include the necessary tools to create column metadata when
publishing new data.
Name: Human-readable name of the column. It should be in plain English and usually a
word, or a few words at the most.
Description: Human-readable description of the column’s contents. This description
should include how values in this column are created or updated; address any data
quality concerns, such as unexpected or unusual values; and explain any meanings
which might be stored as codes. Codes are often used for record classification, and are
more frequent in source data systems designed for limited storage space.
Data Type: Specifying a data type helps improve the consistency and quality of data.
Common data types are text, numbers, dates/times, booleans (yes/no or true/false), and
geometry (points, lines, polygons). Some open data portals will prevent records from
being added or updated if the type of a value is incorrect.
Required: Specifying whether a value is required in the column for every row in the
table helps improve the quality of data. Some open data portals will prevent records
from being added or updated if the column is marked as required but no value was
provided.
Machine Name: Machine-readable version of the column’s Name. This is often a copy
of the Name, with changes that make it suitable for computer software to use. These
changes may include replacing spaces with underscores (or removing them entirely),
applying camel-case, and/or ensuring it is unique from other column names.
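
The Machine Name changes described above might be sketched as follows. This is one possible convention, not the rule used by any particular portal: lowercasing, the underscore separator, the numeric suffix for duplicates, and the "column" fallback are all assumptions.

```python
import re


def machine_name(name: str, taken: set[str]) -> str:
    """Derive a machine-readable column name that is not already in `taken`."""
    # Lowercase, then collapse every run of non-alphanumeric characters
    # (spaces, punctuation) into a single underscore.
    base = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_") or "column"
    candidate = base
    suffix = 2
    # Append a numeric suffix until the name is unique among the columns.
    while candidate in taken:
        candidate = f"{base}_{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate


taken = set()
print(machine_name("Date of Birth", taken))       # date_of_birth
print(machine_name("Date of birth", taken))       # date_of_birth_2
print(machine_name("Zip Code (5-digit)", taken))  # zip_code_5_digit
```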
Refer to Appendix B for sample column metadata.
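
The Data Type and Required attributes described above can drive a simple check before a record is added or updated. The following sketch assumes rows arrive as dictionaries keyed by machine name; the class, function, and sample columns are hypothetical and do not reflect how any specific portal enforces these rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ColumnMetadata:
    """Column metadata attributes (hypothetical structure)."""
    name: str
    machine_name: str
    description: str
    data_type: type
    required: bool = False


def validate_row(row: dict, columns: list[ColumnMetadata]) -> list[str]:
    """Return a list of problems found in one row; empty if the row is valid."""
    problems = []
    for col in columns:
        value = row.get(col.machine_name)
        if value is None or value == "":
            if col.required:
                problems.append(f"{col.machine_name}: required value is missing")
            continue  # nothing further to check for an empty value
        if not isinstance(value, col.data_type):
            problems.append(
                f"{col.machine_name}: expected {col.data_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Made-up columns for a building permits table.
columns = [
    ColumnMetadata("Permit Number", "permit_number", "Identifier assigned at intake.", str, True),
    ColumnMetadata("Issue Date", "issue_date", "Date the permit was issued.", date, True),
    ColumnMetadata("Fee", "fee", "Fee charged, in dollars.", float),
]

print(validate_row({"permit_number": "", "issue_date": "2019-01-15"}, columns))
# ['permit_number: required value is missing', 'issue_date: expected date, got str']
```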

Additional Resources
This metadata guide is based upon current best practices in the US open government data
sector. The following are additional resources which may be helpful for greater detail and
guidance:
Project Open Data Metadata Schema
Dublin Core
Federal Geographic Data Committee - Metadata
The Open Metadata Handbook

Appendix A: Sample Dataset Metadata
Standard Dataset Fields
The U.S. federal government has created the Project Open Data metadata schema standard
to implement the federal open data policy. The Project Open Data schema is based on the
international DCAT metadata schema used by open data programs around the world and
has been mapped to many standards. The Project Open Data schema must be presented
as a JSON file to be ingested by Data.gov. This schema is natively available from many open
data portal providers, including Azavea, Esri Open Data, NuCivic's DKAN, OpenGov, and
Socrata. It is easily added to CKAN sites with an extension, or can be generated on an ad
hoc basis with these tools.
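
As a rough illustration of presenting metadata as JSON, the sketch below builds a single dataset entry. The title, description, and keyword fields appear in the table that follows; the modified and contactPoint fields (with fn and hasEmail) reflect one reading of the Project Open Data schema and should be checked against the published schema before use. The function name and input keys are hypothetical, and the output is not a complete or validated catalog file.

```python
import json


def to_pod_dataset(meta: dict) -> dict:
    """Map a portal-side metadata record to a Project Open Data style entry."""
    return {
        "title": meta["title"],
        "description": meta["description"],
        "keyword": list(meta["keywords"]),
        # Assumed field names; verify against the published schema.
        "modified": meta["modified"],
        "contactPoint": {
            "fn": meta["contact_name"],
            "hasEmail": "mailto:" + meta["contact_email"],
        },
    }


# Made-up sample values, for illustration only.
sample = {
    "title": "Street Tree Inventory",
    "description": "Location and species of every city-maintained street tree.",
    "keywords": ["trees", "forestry"],
    "modified": "2019-01-15",
    "contact_name": "Parks Department",
    "contact_email": "parks@example.org",
}

print(json.dumps(to_pod_dataset(sample), indent=2))
```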
Field | Label | Definition | Required
title | Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | Always
description | Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | Always
keyword
