Open Data Metadata Guide 2019 01 Bloomberg
Table of Contents

Introduction
Categories
Dataset Metadata
Column Metadata
Additional Resources
Appendix A: Sample Dataset Metadata
Appendix B: Sample Column Metadata

Introduction

Open Data Metadata Guide

Metadata - descriptive information about data - is critical to helping visitors find and use published data effectively. Good metadata reduces the need for visitors to seek personal assistance, helps prevent misinterpretation of data, and encourages higher data quality.

Metadata is generally divided into two types:

- Metadata that provides an overview of the data. This kind of metadata helps people find the data through internet searches, while navigating your portal, or even while navigating other data portals which might include your catalog.
- Metadata that provides details about specific parts of your data. This kind of metadata enables people to use your data effectively, by helping them understand the various elements it includes and potential limitations.

This guide avoids the details of specific technologies; however, it takes into account existing (and emerging) national and international standards for metadata. Refer to the Additional Resources section for more information.

To assist cities in advancing open data programs in their own communities, the Center for Government Excellence at Johns Hopkins University, a partner in the What Works Cities initiative, has created this metadata guide. By learning from the experiences of other cities and following Center-developed best practices, cities will have a greater understanding of metadata and be well on their way to national leadership in open government data.

Categories

As an open data portal grows beyond 25-30 datasets, it is more helpful to visitors if they can browse for data by a subject matter or theme. These categories are short, at most a word or two, and allow related data to be grouped together.
Categories also empower visitors to explore available data for inspiration, rather than requiring them to use a search tool to find something specific.

Creating categories is often a fundamental step when implementing an open data portal. Categories do not need to be permanent; it makes sense to start with three or four categories for a small number of datasets and re-evaluate them annually as more data is published.

Most mature open data portals have 8 to 12 categories. Having too many might mean that the categories are not broad enough. Having too few, especially when combined with a large number of datasets, might mean that the categories are too broad and less helpful for visitors.

Although there is no consistent set of categories between open data portals, the following are quite common and might serve as a starting point: Business, Education, Environment, Finance, Health, Human (or Social) Services, Property, Public Safety, Recreation, and Transportation.

A librarian or information architect can provide insight and assistance when creating or revising the list of categories.

Dataset Metadata

Without dataset metadata, a catalog of published data could not exist. Many open data portals include the necessary tools to create dataset metadata when publishing new data. Some open data portals automatically update the metadata when datasets are edited. Each dataset you publish will include many of the following metadata elements.

Basic Elements

Basic metadata elements provide the most important pieces of information to help visitors find data and determine if it is what they need. Many of these items will appear directly in catalog navigation pages or search results.

Title (or Name): Human-readable name for the data. It should be in plain English and include sufficient detail to facilitate search and discovery. Acronyms should be avoided.
Description: Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest.

Category (or Theme): Main thematic category of the dataset, usually chosen from a predefined list. Refer to the Categories section of this guide for more information. Some open data portals limit a dataset to one category; others allow multiple.

Keywords (or Tags): Tags (or keywords) are generally single words which help visitors discover the data; please include terms that would be used by technical and non-technical users. Keywords can also be used by recommendation engines to help visitors discover similar datasets.

Modification Date: The most recent date on which the dataset was changed, updated, or modified.

Contact Information: The name and email address of the publisher of a dataset.

License: Datasets on open data portals are often available in the public domain with no restrictions on reuse (usually this is noted in the site’s Terms of Service or Data Policy); however, there may be circumstances where a specific dataset is offered under a different license.

Advanced Elements

Advanced metadata elements provide helpful information that allows third-party software to consume both data catalogs and datasets. These items might not appear in catalog navigation pages or search results, but allow for sharing with other open data portals and search engines.

Frequency: The frequency with which the dataset is updated, in plain English. For example, “Never,” “Hourly,” “Daily,” “Weekdays,” “Weekly,” “Semi-monthly,” “Monthly,” “Quarterly,” “Semi-annually,” “Annually,” etc. This helps visitors know how often they should check for new data, and is particularly valuable for software programmers who may set up automatic downloads.

Temporal Coverage: The range of time included in this dataset.
This may reflect a general range for all the records, or may reflect the earliest and latest dates from records in the data.

Spatial Coverage: The geographic area for which this dataset is relevant. A place name - particularly one associated with clear boundaries - is most commonly used. If the dataset includes geospatial information, spatial coverage can represent a bounding rectangle or polygon of all the geography contained within it, though this is uncommon.

Refer to Appendix A for sample dataset metadata.

Column Metadata

Although column metadata is often limited or left out entirely, it is very helpful to data consumers who frequently work with, write software for, or analyze datasets. Column metadata attributes provide important details about the data which the column contains. Many open data portals include the necessary tools to create column metadata when publishing new data.

Name: Human-readable name of the column. It should be in plain English and usually a word, or a few words at the most.

Description: Human-readable description of the column’s contents. This description should include how values in this column are created or updated; address any data quality concerns, such as unexpected or unusual values; and explain any meanings which might be stored as codes. Codes are often used for record classification and are more frequent in source data systems designed for limited storage space.

Data Type: Specifying a data type helps improve the consistency and quality of data. Common data types are text, numbers, dates/times, booleans (yes/no or true/false), and geometry (points, lines, polygons). Some open data portals will prevent records from being added or updated if the type of a value is incorrect.

Required: Specifying whether a value is required in the column for every row in the table helps improve the quality of data.
Some open data portals will prevent records from being added or updated if the column is marked as required but no value was supplied.

Machine Name: Machine-readable version of the column’s Name. This is often a copy of the Name, with changes that make it suitable for computer software to use. These changes may include replacing spaces with underscores (or removing them entirely), applying camel-case, and/or ensuring it is unique from other column names.

Refer to Appendix B for sample column metadata.

Additional Resources

This metadata guide is based upon current best practices in the US open government data sector. The following are additional resources which may be helpful for greater detail and guidance:

Project Open Data Metadata Schema
Dublin Core
Federal Geographic Data Committee - Metadata
The Open Metadata Handbook

Appendix A: Sample Dataset Metadata

Standard Dataset Fields

The U.S. federal government has created the Project Open Data metadata schema standard to implement the federal open data policy. The Project Open Data schema is based on the international DCAT metadata schema used by open data programs around the world and has been mapped to many standards. The Project Open Data schema must be presented as a JSON file to be ingested by Data.gov. This schema is natively available from many open data portal providers, including Azavea, Esri Open Data, NuCivic's DKAN, OpenGov, and Socrata. It is easily added to CKAN sites with an extension, or can be generated on an ad hoc basis with these tools.

Field | Label | Definition | Required
title | Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | Always
description | Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | Always
keyword | Tags | Tags (or keywords) help users discover your dataset; please include terms that would be used by technical and non-technical users. | Always
modified | Last Update | Most recent date on which the dataset was changed, updated or modified. | Always
publisher | Publisher | The publishing entity and optionally their parent organization(s). | Always
contactPoint | Contact Name and Email | Contact person's name and email for the asset. | Always
identifier | Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | Always
accessLevel | Public Access Level | The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public). | Always
license | License | The license or non-license (i.e. Public Domain) status with which the dataset or API has been published. See Open Licenses for more information. | If applicable
rights | Rights | This may include information regarding access or restrictions based on privacy, security, or other policies. This should also serve as an explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. Text, 255 characters. | If applicable
spatial | Spatial | The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place. | If applicable
temporal | Temporal | The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). | If applicable
distribution | Distribution | A container for the array of Distribution objects. See Dataset Distribution Fields below for details. | If applicable
@type | Metadata Type | IRI for the JSON-LD data type. This should be dcat:Dataset for each Dataset. | No
accrualPeriodicity | Frequency | The frequency with which the dataset is published. | No
conformsTo | Data Standard | URI used to identify a standardized specification the dataset conforms to. | No
describedBy | Data Dictionary | URL to the data dictionary for the dataset. Note that documentation other than a data dictionary can be referenced using Related Documents (references). | No
describedByType | Data Dictionary Type | The machine-readable file format (IANA Media Type also known as MIME Type) of the dataset's Data Dictionary (describedBy). | No
isPartOf | Collection | The collection of which the dataset is a subset. | No
issued | Release Date | Date of formal issuance. | No
language | Language | The language of the dataset. | No
landingPage | Homepage URL | This field is not intended for an agency's homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users can be directed to for all resources tied to the dataset. | No
references | Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | No
theme | Category | Main thematic category of the dataset. | No

Federal Dataset Fields

The U.S. federal government also requires the following metadata fields. You should consider requiring local department codes, systems of record, and associated IT spending if helpful for your open data catalog. If you do not have unique government-wide codes related to these areas, you might consider creating them.

Field | Label | Definition | Required
bureauCode (USG) | Bureau Code | Federal agencies, combined agency and bureau code from OMB Circular A-11, Appendix C (PDF, CSV) in the format of 015:11. | Always
programCode (USG) | Program Code | Federal agencies, list the primary program related to this data asset, from the Federal Program Inventory. Use the format of 015:001. | Always
dataQuality (USG) | Data Quality | Whether the dataset meets the agency's Information Quality Guidelines (true/false). | No
primaryITInvestmentUII (USG) | Primary IT Investment UII | For linking a dataset with an IT Unique Investment Identifier (UII). | No
systemOfRecords (USG) | System of Records | If the system is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset. | No

Appendix B: Sample Column Metadata
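
As an illustration of the attributes described in the Column Metadata section (Name, Description, Data Type, Required, Machine Name), the following is a minimal Python sketch, not a sample supplied by the guide. The "Building Permits" columns, the machine_name and check_row helpers, the dictionary keys, and the lowercase-with-underscores naming rule are all hypothetical choices made for this example; an actual portal may derive machine names and enforce types differently.

```python
import re
from datetime import date

# Python types accepted for each of the guide's common data types
# (geometry is left out to keep the sketch short).
PYTHON_TYPES = {"text": str, "number": (int, float), "date": date, "boolean": bool}


def machine_name(name, taken):
    """Derive a unique, machine-readable name from a human-readable column Name."""
    # Lowercase, then turn each run of spaces or punctuation into one underscore.
    base = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_") or "column"
    candidate, suffix = base, 2
    # Add a numeric suffix until the name differs from every other column name.
    while candidate in taken:
        candidate = f"{base}_{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate


def check_row(row, columns):
    """List the problems a portal might report for one record (assumed behavior)."""
    problems = []
    for col in columns:
        value = row.get(col["machine_name"])
        if value is None:
            if col["required"]:
                problems.append(f"{col['name']}: required value is missing")
            continue
        expected = PYTHON_TYPES[col["data_type"]]
        # bool is a subclass of int in Python, so reject it for number columns.
        is_bool_as_number = col["data_type"] == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            problems.append(f"{col['name']}: expected {col['data_type']}")
    return problems


# Hypothetical column metadata for an illustrative "Building Permits" dataset.
columns = [
    {"name": "Permit Number", "data_type": "text", "required": True,
     "description": "Identifier assigned by the permitting system when an application is filed."},
    {"name": "Issue Date", "data_type": "date", "required": False,
     "description": "Date the permit was issued; empty while an application is pending."},
    {"name": "Estimated Cost ($)", "data_type": "number", "required": False,
     "description": "Applicant's estimate in U.S. dollars; 0 may mean 'not provided'."},
]
taken = set()
for col in columns:
    col["machine_name"] = machine_name(col["name"], taken)

print([col["machine_name"] for col in columns])
print(check_row({"issue_date": date(2019, 1, 15), "estimated_cost": "12000"}, columns))
```

The sample record is missing its required permit number and stores the cost as text, so check_row reports both. Keeping the human-readable Name separate from the Machine Name lets the label change later without breaking software that reads the data.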
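
For the Standard Dataset Fields listed in Appendix A, a dataset record might be assembled and checked along the following lines. This is a sketch under stated assumptions: every value is invented, the find_problems helper is hypothetical, and the shapes used for publisher and contactPoint (name, fn, hasEmail) and the ISO 8601 strings used for temporal and accrualPeriodicity reflect one reading of the Project Open Data schema that should be confirmed against the schema itself before use.

```python
import json

# Fields marked "Always" in the Standard Dataset Fields table.
ALWAYS_REQUIRED = ("title", "description", "keyword", "modified", "publisher",
                   "contactPoint", "identifier", "accessLevel")
# The three accessLevel choices named in the table.
ACCESS_LEVELS = ("public", "restricted public", "non-public")


def find_problems(dataset):
    """Return human-readable problems with a dataset metadata record."""
    problems = [f"missing required field: {field}"
                for field in ALWAYS_REQUIRED if not dataset.get(field)]
    level = dataset.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"unknown accessLevel: {level}")
    return problems


# Hypothetical record; every value below is invented for illustration.
dataset = {
    "@type": "dcat:Dataset",
    "title": "Building Permits Issued",
    "description": "One record per building permit issued by the city since 2015.",
    "keyword": ["permits", "construction", "buildings"],
    "modified": "2019-01-15",
    "publisher": {"name": "City of Exampleville"},          # assumed object shape
    "contactPoint": {"fn": "Open Data Team",                # assumed object shape
                     "hasEmail": "mailto:opendata@example.gov"},
    "identifier": "exampleville-building-permits",
    "accessLevel": "public",
    "theme": ["Property"],
    "temporal": "2015-01-01/2018-12-31",                    # assumed ISO 8601 interval
    "accrualPeriodicity": "R/P1M",                          # assumed ISO 8601 code for monthly
}

print(find_problems(dataset))
print(json.dumps(dataset, indent=2))
```

Running the check before publishing catches a record that omits an "Always" field or uses an access level outside the three listed choices; the If applicable and optional fields are deliberately left unchecked here.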
Document Information

Title: Open Data - Metadata Guide
Author: Center for Government Excellence
Publisher: GitBook
Language: en
Created: 2016-12-21
Modified: 2019-04-06