Open Data Metadata Guide 2019 01 Bloomberg
Open Data Metadata Guide
Metadata - descriptive information about data - is critical to helping visitors find and use
published data effectively. Good metadata reduces the need for visitors to seek personal
assistance, helps prevent misinterpretation of data, and encourages higher data quality.

Metadata is generally divided into two types:
Metadata that provides an overview of the data. This kind of metadata helps people find
the data through internet searches, while navigating your portal, or even while
navigating other data portals which might include your catalog.
Metadata that provides details about specific parts of your data. This kind of metadata
enables people to use your data effectively, by helping them understand the various
elements it includes and potential limitations.

This guide avoids the details of specific technologies; however, it takes into account existing
(and emerging) national and international standards for metadata. Refer to the Additional
Resources section for more information.

To assist cities in advancing open data programs in their own communities, the Center for
Government Excellence at Johns Hopkins University, a partner in the What Works Cities
initiative, has created this metadata guide. By learning from the experiences of other cities
and following Center-developed best practices, cities will have a greater understanding of
metadata and be well on their way to national leadership in open government data.
Categories
As an open data portal grows beyond 25-30 datasets, it is more helpful to visitors if they can
browse for data by a subject matter or theme. These categories are short, at most a word or
two, and allow related data to be grouped together. Categories also empower visitors to
explore available data for inspiration, rather than requiring them to use a search tool to find
something specific.

Creating categories is often a fundamental step when implementing an open data portal.
Categories do not need to be permanent; it makes sense to have three to four categories for
a small number of datasets, and re-evaluate them on an annual basis as more data is
published. Most mature open data portals have 8 to 12 categories. Having too many might
mean that the categories are not broad enough. Having too few, especially when combined
with a large number of datasets, might mean that the categories are too broad and less
helpful for visitors.

Although there is no consistent set of categories between open data portals, the following
are quite common and might serve as a starting point: Business, Education, Environment,
Finance, Health, Human (or Social) Services, Property, Public Safety, Recreation, and
Transportation. A librarian or information architect can provide insight and assistance when
creating or revising the list of categories.
Dataset Metadata
Without dataset metadata, a catalog of published data could not exist. Many open data
portals include the necessary tools to create dataset metadata when publishing new data.
Some open data portals automatically update the metadata when editing datasets. Each
dataset you publish will include many of the following metadata elements.

Basic Elements
Basic metadata elements provide the most important pieces of information to help visitors
find data and determine if it is what they need. Many of these items will appear directly in
catalog navigation pages or search results.

Title (or Name): Human-readable name for the data. It should be in plain English and
include sufficient detail to facilitate search and discovery. Acronyms should be avoided.
Description: Human-readable description (e.g., an abstract) with sufficient detail to
enable a user to quickly understand whether the asset is of interest.
Category (or Theme): Main thematic category of the dataset, usually chosen from a
predefined list. Refer to the Categories section of this guide for more information. Some
open data portals limit a dataset to one category; others allow multiple.
Keywords (or Tags): Tags (or keywords) are generally single words which help visitors
discover the data; please include terms that would be used by technical and
non-technical users. Keywords can also be used by recommendation engines to help visitors
discover similar datasets.
Modification Date: The most recent date on which the dataset was changed, updated,
or modified.
Contact Information: The name and email address of the publisher of a dataset.
License: Often datasets on open data portals are available in the public domain with no
restrictions on reuse (usually this is noted in the site’s Terms of Service or Data Policy);
however, there may be circumstances where a specific dataset is offered using a
different license.

Advanced Elements
Advanced metadata elements provide helpful information that allows third-party software to
consume both data catalogs and datasets. These items might not appear in catalog
navigation pages or search results, but allow for sharing with other open data portals and
search engines.

Frequency: The frequency with which the dataset is updated, in plain English. For
example, “Never,” “Hourly,” “Daily,” “Weekdays,” “Weekly,” “Semi-monthly,” “Monthly,”
“Quarterly,” “Semi-annually,” “Annually,” etc. This helps visitors know how often they
should check for new data, and is particularly valuable for software programmers who
may set up automatic downloads.
Temporal Coverage: The range of time included in this dataset. This may reflect a
general range for all the records, or may reflect the earliest and latest dates from
records in the data.
Spatial Coverage: The geographic area for which this dataset is relevant. A place
name - particularly one associated with clear boundaries - is most commonly used. If
the dataset includes geospatial information, spatial coverage can represent a bounding
rectangle or polygon of all the geography contained within it, though this is uncommon.

Refer to Appendix A for sample dataset metadata.
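
The basic and advanced elements above can be pictured as one record per dataset. The following is a minimal Python sketch, not any portal's actual data model: the DatasetMetadata class, its field names, the coverage_from_records helper, and every sample value are hypothetical and chosen only for illustration. It also shows one way temporal coverage might be derived from the earliest and latest dates found in the records.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class DatasetMetadata:
    """Hypothetical record holding the elements described in this section."""
    # Basic elements
    title: str
    description: str
    category: str
    keywords: List[str]
    modified: date
    contact_name: str
    contact_email: str
    license: str = "Public Domain"  # assumed default; portals may differ
    # Advanced elements
    frequency: Optional[str] = None  # plain English, e.g. "Monthly"
    temporal_coverage: Optional[Tuple[date, date]] = None
    spatial_coverage: Optional[str] = None  # most commonly a place name


def coverage_from_records(record_dates: List[date]) -> Optional[Tuple[date, date]]:
    """Return (earliest, latest) date among the records, or None if there are none."""
    if not record_dates:
        return None
    return min(record_dates), max(record_dates)


# Illustrative values only; they do not come from a real catalog.
inspection_dates = [date(2018, 3, 14), date(2017, 1, 2), date(2018, 11, 30)]

sample = DatasetMetadata(
    title="Restaurant Inspections",
    description="Results of routine health inspections of food establishments.",
    category="Health",
    keywords=["restaurants", "inspections", "food safety"],
    modified=date(2019, 1, 15),
    contact_name="Open Data Team",
    contact_email="opendata@example.org",
    frequency="Monthly",
    temporal_coverage=coverage_from_records(inspection_dates),
    spatial_coverage="Example City",
)

print(sample.title, sample.temporal_coverage)
```

Deriving the range from the records is only one of the two options the guide mentions; a publisher may instead state a general range for the whole dataset.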
Column Metadata
Although column metadata is often limited or left out entirely, it is very helpful to data
consumers who frequently work with, write software for, or analyze datasets. Column
metadata attributes provide important details about the data which the column contains.
Many open data portals include the necessary tools to create column metadata when
publishing new data.

Name: Human-readable name of the column. It should be in plain English and usually a
word, or a few words at the most.
Description: Human-readable description of the column’s contents. This description
should include how values in this column are created or updated; address any data
quality concerns, such as unexpected or unusual values; and explain any meanings
which might be stored as codes, often used for record classification, and more frequent
in source data systems designed for limited storage space.
Data Type: Specifying a data type helps improve the consistency and quality of data.
Common data types are text, numbers, dates/times, booleans (yes/no or true/false), and
geometry (points, lines, polygons). Some open data portals will prevent records from
being added or updated if the type of a value is incorrect.
Required: Specifying whether a value is required in the column for every row in the
table helps improve the quality of data. Some open data portals will prevent records
from being added or updated if the column is marked as required but the datum was not
included.
Machine Name: Machine-readable version of the column’s Name. This is often a copy
of the Name, with changes that make it suitable for computer software to use. These
changes may include replacing spaces with underscores (or removing them entirely),
applying camel-case, and/or ensuring it is unique from other column names.

Refer to Appendix B for sample column metadata.
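
As a rough illustration of the Machine Name, Data Type, and Required attributes described above, the sketch below derives a machine name from a column Name and checks one row against column metadata. It is an assumption-laden example, not how any particular portal behaves: the ColumnMetadata class, make_machine_name, validate_row, the lower-case-with-underscores convention, and the sample columns are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Set


def make_machine_name(name: str, taken: Set[str]) -> str:
    """Lower-case the Name, turn runs of other characters into underscores,
    and append a numeric suffix if the result is already in use."""
    base = re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_") or "column"
    candidate, suffix = base, 2
    while candidate in taken:
        candidate = f"{base}_{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate


@dataclass
class ColumnMetadata:
    """Hypothetical holder for the column attributes listed in this section."""
    name: str
    description: str
    data_type: type      # simplified: a Python type stands in for the portal's type
    required: bool
    machine_name: str


def validate_row(row: Dict[str, Any], columns: List[ColumnMetadata]) -> List[str]:
    """Return a list of problems found in one row; an empty list means it passed."""
    problems = []
    for col in columns:
        value = row.get(col.machine_name)
        if value is None:
            if col.required:
                problems.append(f"{col.machine_name}: required value is missing")
            continue
        if not isinstance(value, col.data_type):
            problems.append(
                f"{col.machine_name}: expected {col.data_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Illustrative columns only.
taken: Set[str] = set()
columns = [
    ColumnMetadata("Business Name", "Name on the operating permit.", str, True,
                   make_machine_name("Business Name", taken)),
    ColumnMetadata("Inspection Score", "Score from 0 to 100.", int, False,
                   make_machine_name("Inspection Score", taken)),
]

print(validate_row({"business_name": None, "inspection_score": "92"}, columns))
```

A portal that enforces these attributes would reject such a row at publish time; the sketch only reports the problems so a publisher could review them first.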
Additional Resources
This metadata guide is based upon current best practices in the US open government data
sector. The following are additional resources which may be helpful for greater detail and
guidance:
Project Open Data Metadata Schema
Dublin Core
Federal Geographic Data Committee - Metadata
The Open Metadata Handbook
Appendix A: Sample Dataset Metadata
Standard Dataset Fields
The U.S. federal government has created the Project Open Data metadata schema standard
to implement the federal open data policy. The Project Open Data schema is based on the
international DCAT metadata schema used by open data programs around the world and
has been mapped to many standards. The Project Open Data schema must be presented
as a JSON file to be ingested by Data.gov. This schema is natively available with many open
data portal providers including: Azavea, Esri Open Data, NuCivic's DKAN, OpenGov, and
Socrata, and is easily added to CKAN sites with an extension or can be generated on an ad
hoc basis with these tools.
Field | Label | Definition | Required
title | Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | Always
description | Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | Always
keyword | Tags | Tags (or keywords) help users discover your dataset; please include terms that would be used by technical and non-technical users. | Always
modified | Last Update | Most recent date on which the dataset was changed, updated or modified. | Always
publisher | Publisher | The publishing entity and optionally their parent organization(s). | Always
contactPoint | Contact Name and Email | Contact person's name and email for the asset. | Always
identifier | Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | Always
accessLevel | Public Access Level | The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public). | Always
license | License | The license or non-license (i.e. Public Domain) status with which the dataset or API has been published. See Open Licenses for more information. | If-Applicable
rights | Rights | This may include information regarding access or restrictions based on privacy, security, or other policies. This should also serve as an explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. Text, 255 characters. | If-Applicable
spatial | Spatial | The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place. | If-Applicable
temporal | Temporal | The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). | If-Applicable
distribution | Distribution | A container for the array of Distribution objects. See Dataset Distribution Fields below for details. | If-Applicable
@type | Metadata Type | IRI for the JSON-LD data type. This should be dcat:Dataset for each Dataset. | No
accrualPeriodicity | Frequency | The frequency with which dataset is published. | No
conformsTo | Data Standard | URI used to identify a standardized specification the dataset conforms to. | No
describedBy | Data Dictionary | URL to the data dictionary for the dataset. Note that documentation other than a data dictionary can be referenced using Related Documents (references). | No
describedByType | Data Dictionary Type | The machine-readable file format (IANA Media Type also known as MIME Type) of the dataset's Data Dictionary (describedBy). | No
isPartOf | Collection | The collection of which the dataset is a subset. | No
issued | Release Date | Date of formal issuance. | No
language | Language | The language of the dataset. | No
landingPage | Homepage URL | This field is not intended for an agency's homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users can be directed to for all resources tied to the dataset. | No
references | Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | No
theme | Category | Main thematic category of the dataset. | No
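
To make the table above more concrete, here is a hedged Python sketch that builds one dataset entry as a dictionary, checks that every field marked "Always" is present, and serializes it to JSON. The check_dataset helper and all sample values are hypothetical. The nested shapes used for publisher and contactPoint (and the list form of theme) reflect the published Project Open Data v1.1 schema as best recalled here, and should be confirmed against that schema before being relied on.

```python
import json
from typing import Any, Dict, List

# Fields marked "Always" in the Standard Dataset Fields table.
ALWAYS_REQUIRED = [
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
]

# The three accessLevel choices named in the table.
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def check_dataset(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [
        f"missing required field: {name}"
        for name in ALWAYS_REQUIRED
        if not entry.get(name)
    ]
    level = entry.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"unknown accessLevel: {level}")
    return problems


# Illustrative entry; values are invented and nested shapes are assumptions.
sample_entry = {
    "@type": "dcat:Dataset",
    "title": "Restaurant Inspections",
    "description": "Results of routine health inspections of food establishments.",
    "keyword": ["restaurants", "inspections", "food safety"],
    "modified": "2019-01-15",
    "publisher": {"@type": "org:Organization", "name": "Example City Health Department"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Open Data Team",
        "hasEmail": "mailto:opendata@example.org",
    },
    "identifier": "example-city-restaurant-inspections",
    "accessLevel": "public",
    "theme": ["Health"],
}

print(check_dataset(sample_entry))
print(json.dumps(sample_entry, indent=2))
```

This only tests for presence and for the accessLevel vocabulary; it is not a substitute for validating against the full schema.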
Federal Dataset Fields
U.S. federal requirements also include the following metadata fields. You should
consider requiring local department codes, systems of record, and associated IT spending if
helpful for your open data catalog. If you do not have unique government-wide codes related
to these areas, you might consider creating those.
Field | Label | Definition | Required
bureauCode | Bureau Code | Federal agencies, combined agency and bureau code from OMB Circular A-11, Appendix C (PDF, CSV) in the format of 015:11. | Always
programCode | Program Code | Federal agencies, list the primary program related to this data asset, from the Federal Program Inventory. Use the format of 015:001. | Always
dataQuality | Data Quality | Whether the dataset meets the agency's Information Quality Guidelines (true/false). | No
primaryITInvestmentUII | Primary IT Investment UII | For linking a dataset with an IT Unique Investment Identifier (UII). | No
systemOfRecords | System of Records | If the system is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset. | No
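
The two code formats shown in the table (015:11 and 015:001) lend themselves to a simple format check. The sketch below assumes, based only on those examples, that a bureau code is three digits, a colon, and two digits, and that a program code is three digits, a colon, and three digits. The function names are hypothetical, and a city defining its own department codes would substitute its own patterns.

```python
import re

# Patterns inferred from the example values in the table; treat them as assumptions.
BUREAU_CODE = re.compile(r"[0-9]{3}:[0-9]{2}")
PROGRAM_CODE = re.compile(r"[0-9]{3}:[0-9]{3}")


def is_bureau_code(value: str) -> bool:
    """True if the whole string looks like 015:11."""
    return BUREAU_CODE.fullmatch(value) is not None


def is_program_code(value: str) -> bool:
    """True if the whole string looks like 015:001."""
    return PROGRAM_CODE.fullmatch(value) is not None


for candidate in ["015:11", "15:11", "015:001"]:
    print(candidate, is_bureau_code(candidate), is_program_code(candidate))
```

A format check like this catches typing mistakes but cannot confirm that a code actually exists in the source list it is drawn from.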
Appendix B: Sample Column Metadata