Spreadsheet To XML: Using The PACSCL Finding Aids Guide
User Manual:
Open the PDF directly: View PDF .
Page Count: 15
Download | |
Open PDF In Browser | View PDF |
PHILADELPHIA AREA CONSORTIUM OF SPECIAL COLLECTIONS LIBRARIES Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet Hidden Collections Processing Project Holly Mengel 3/1/2012 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Table of Contents Introduction ......................................................................................................................... 3 Familiarize Yourself with the Spreadsheet ......................................................................... 3 Instructions for using the PACSCL Finding Aids Spreadsheet .......................................... 6 2 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Introduction The PACSCL Finding Aids spreadsheet is a tool designed to import container lists into the Archivists’ Toolkit. This spreadsheet can be used for original data entry or for converting electronic legacy finding aids to EAD finding aids. Regardless of the starting point, careful attention to data entry is important. This spreadsheet was designed and shared by Matt Herbison, an archivist at the Drexel University College of Medicine Legacy Center. Technical expertise is not necessary to take advantage of this tool, however, working knowledge of MS Excel and EAD is helpful. Access to XML editing software, such as oXygen, XMetal or Dreamweaver, is also useful. XML editing software enables users to “validate” and edit XML code prior to import into the Archivists’ Toolkit. This guide demonstrates the mechanics of the PACSCL Finding Aids Spreadsheet and uses an extremely basic example. Those employing this guide for legacy conversions will find that few legacy finding aids are this simple or straightforward. For original data entry, follow instructions from Step 4 (page 6). Familiarize Yourself with the Spreadsheet Before beginning, open both the Archivists’ Toolkit and a blank version of the spreadsheet (available at public.herbison.org/ead). The illustrations on the following pages show how the columns in the spreadsheet map directly to fields in the Archivists’ Toolkit. 3 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 4 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Notice: A spreadsheet with the data keyed into the wrong fields may import into the Archivists’ Toolkit. However, in order for the data to import into the proper fields and display correctly, it is necessary for the data to be placed into the exact columns that map to the fields in the Archivists’ Toolkit. Notice: For instructions on proper data entry into Archivists’ Toolkit fields, please see the PACSCL/CLIR Hidden Collections Processing Project, 2009-2012 Archivists’ Toolkit Guide available at http://clir.pacscl.org/project-documentation/ 5 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Instructions for using the PACSCL Finding Aids Spreadsheet Step 1: When starting with an electronic, MS Word document, copy and paste container list of an electronic finding aid into Notepad++. (If starting with an MS Excel document, follow instructions from Step 3). Notepad++ can be downloaded, free of charge, at http://notepad-plus-plus.org/download/ Original electronic finding aid: Original electronic finding aid, after being pasted into Notepad ++: (The arrows below represent tabs. The format of the original document pasted into Notepad++ will dictate where tabs exist. Each new document pasted into Notepad++ will look different.) 6 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Step 2: Separate folder titles, folder dates, box, box number, folder, and folder number with tabs. This will create a text delimited file that can be read and saved by most database and spreadsheet programs. Notice: There are specific numbers of tabs between items, so that when pasting into the spreadsheet, fields fall into specific columns. In this example, data before the first tab will fall into the first column (title); data following one tab (shown with an arrow) will fall into the second column (date expression), etc. In this example, there were no bulk dates in the original finding aid, thus tabs for bulk dates are not included here, but will need to be compensated for in step 4. Take note that in Box 2 Folder 5, there is a tab between 1925 and 1935. In the original finding aid, this is written 1925-1935, but because it is an “inclusive date” in the Archivists’ Toolkit, it will fall into two separate columns “date begin” and “date end” in the PACSCL Finding Aids Spreadsheet. Hint: Mistakes will be obvious after pasting into MS Excel. For example, if dates are in the title field, go back to the Notepad ++ document and add tabs as necessary. 7 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 8 Step 3: Copy and paste text de-limited document into MS Excel, correct spelling and format titles and dates for DACS compliance (use regular expressions if you know how!). If original finding aid is an MS Excel document, start with this step. Notice: The spreadsheet may not format all data correctly. Check dates and box and folder numbers. PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 9 Step 4: Copy columns into appropriate columns in the PACSCL Finding Aids Spreadsheet. Hint: Make sure that data matches up: do several spot checks! Hint: This is where, in supplied example, the compensation for bulk dates occurs, Notice that columns G and H are empty. Hint: Make sure that instance levels 1 and 2 are next to each other. Step 5: Add level (series, subseries, file) to create hierarchy, and instance type (text, graphic material) Notice: There are 3 levels of hierarchy to work with (see red arrow): 1 is series, 2 is subseries, 3 is file. All three levels are not required, but hierarchy is needed for a successful import. Type the number in column B and the appropriate level type automatically populates column A. PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 10 Step 6: Once hierarchy, data entry, etc. are correct, click on the worksheet titled “EAD Skeleton to Copy.” Fill in the pink box (in column A) with the name of the collection. Click on Column D, copy, and then paste into a new Notepad++ document. In Notepad ++, data will look like this: Notice: The collection title is included. Notice: There is a place to paste the container list. PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 11 Next, returning to the PACSCL Finding Aids Spreadsheet, click on the worksheet entitled “EAD Result to Copy” (pictured on the following page). Click on column A and copy. In Notepad++, delete [[[PASTE CONTAINER LIST HERE]]], and then paste information from column A into that space. Hint: If the word “REF” appears in column A, something has gone wrong. Frequently, this can be fixed by saving the data, and opening a clean version of the PACSCL Finding Aids Spreadsheet. Once the clean version is opened, paste the data back into the spreadsheet. If the problem persists, check data entry! Notepad++ should look like this: Notice: The container list is here. PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Finally, save your document as an xml. Step 7: Prior to import into the Archivists’ Toolkit, open the XML document in an XML editor (if there are issues with the XML code, the document will not import into Archivists’ Toolkit). In the editing software, “validate” the XML code in order to pinpoint issues in coding and make necessary edits. Common problems to look for: • Ampersands need to be coded as & • If a “date expression” was placed within the first column of “inclusive dates,” the finding aid will not validate. • Diacritics may be an issue • If the container list is very long, sometime the spreadsheet has difficulty closing all the open tags. Check out the end of the finding aid and should be present almost at the bottom. If it is not, close the tags (see below): Notice: The tag is present. 12 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Step 8: After fixing any mistakes, save the file. Then open the Archivists’ Toolkit, click on Resources, then click on Import on the menu bar along the top of the page, and then, from the drop-down menu, click on Import EAD. See illustration on the following page: Depending upon the size of the finding aid, the import could take a few seconds or several minutes. After the import is complete, double click on Resources so that the new file loads. It will always be titled “Import of [name of the collection].” 13 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Step 9: Add collection level information. Change the following fields (remember to be DACs compliant): Title Date Resource Identifier Extent Language of the collection (only if it is not English) For a DACs compliant finding aid, add: Abstract Biographical/historical note Scope and content note Creator of the collection If the original finding aid has these components, simply copy and paste them into the appropriate fields in AT. 14 PACSCL Hidden Collections Processing Project, 2009-2012 Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet, March 2012 Step 10: To get the collection into the PACSCL Finding Aids Site, export the EAD. First save the xml file to the test web folder, and the next day, check for accuracy on the test site (which is not publicly available to those without the url). Don’t forget to: Spell check Make certain that hierarchy is correct (it is more difficult to see hierarchy mistakes in the spreadsheet than in the Archivists’ Toolkit) If OCR was used to make the finding aid a searchable, electronic document, plan to read the finding aid, word for word. Make all corrections to the finding aid in the Archivists’ Toolkit. Re-export the final version of the finding aid and save the xml file to the production web folder. The next day, it should appear on the PACSCL Findings Aids Site (findingaids.pacscl.org) The container list of the finding should look like this: 15
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 15 Language : en-US Tagged PDF : Yes Title : Spreadsheet to XML: Using the PACSCL Finding Aids Spreadsheet Author : Holly Mengel Subject : Hidden Collections Processing Project Creator : Microsoft® Office Word 2007 Create Date : 2012:03:06 13:30:03 Modify Date : 2012:03:06 13:30:03 Producer : Microsoft® Office Word 2007EXIF Metadata provided by EXIF.tools