Friday in Chicago started with coffee with Christian Dupont from Atlas Systems, followed by Session 302: “Practical Approaches to Born-Digital Records: What Works Today.” The session was packed…standing-room only (some archivists quipped that we must have broken fire codes with the number of people sitting on the floor)! Chris Prom from U Illinois, Urbana-Champaign, moderated the excellent panel on practical solutions for dealing with born-digital archival collections. Suzanne Belovari of Tufts referred to the AIMS project (which sponsored the workshop I attended on Tuesday) and the Personal Archives in Digital Media (paradigm) project, which offers an excellent “Workbook on digital private papers” and “Guidelines for creators of personal archives.” She also referenced the research of Catherine Marshall of the Center for the Study of Digital Libraries at Texas A&M, who has posted her research and papers on personal digital archives on her website. All of the speakers referred to Chris Prom’s Practical E-Records blog, which includes many guidelines and tools to help archivists deal with born-digital material.
Ben Goldman of U Wyoming, who wrote an excellent piece in RB&M entitled “Bridging the Gap: Taking Practical Steps Toward Managing Born-Digital Collections in Manuscript Repositories,” talked about basic steps for dealing with electronic records, including network storage, virus checking, format information, generating checksums, and capturing descriptive metadata. He uses Enterprise Checker for virus checking, Duke DataAccessioner to generate checksums, and a Word doc or spreadsheet to track actions taken for individual files. Melissa Salrin of U Illinois, Urbana-Champaign spoke about her use of a program called Firefly to detect social security numbers in files, TreeSize Pro to identify file types, and a process through which she ensures that the files are read-only when moved. She urged the audience to document every step of the transfer process and reminded us that “people use and create files electronically as inefficiently as analog.” Laura Carroll, formerly of Emory, talked about the famous Salman Rushdie digital archives, noting that donor restrictions helped shape their workflow for dealing with Rushdie’s born-digital material. The material is now available on a secure Fedora repository. Seth Shaw from Duke spoke about DataAccessioner (see previous posts) but mostly spoke eloquently, in what promises to be a historic speech, about the need to “do something, even if it isn’t perfect.”
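For anyone who wants to try the checksum and read-only steps before adopting a dedicated tool, here is a minimal sketch in Python. It is not how DataAccessioner or any of the panelists' tools work internally. The function names, the choice of SHA-256, and the idea of keeping the manifest as a simple dictionary are all my own assumptions for illustration.

```python
import hashlib
import stat
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of one file, read in chunks to spare memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = file_checksum(path, algorithm)
    return manifest


def make_read_only(root):
    """Clear the write permission bits on every file under root."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for path in Path(root).rglob("*"):
        if path.is_file():
            path.chmod(path.stat().st_mode & ~write_bits)


def changed_files(root, manifest, algorithm="sha256"):
    """List files that were added, removed, or altered since the manifest was made."""
    current = build_manifest(root, algorithm)
    names = set(manifest) | set(current)
    return sorted(name for name in names if manifest.get(name) != current.get(name))
```

A transfer could then run `build_manifest` on the donor media, copy the files, call `make_read_only` on the copy, and confirm that `changed_files` returns an empty list, recording each of those actions in the same spreadsheet used to document the transfer.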
After lunch, I attended Session 410: “The Archivists’ Toolkit: Innovative Uses and Collaborations.” The session highlighted interesting collaborations and experiments with AT, and the most interesting was by Adrianna Del Collo of the Met, who found a way to convert folder-level inventories into XML for import into AT. Following the session, I was invited last-minute to a meeting of the “Processing Metrics Collaborative,” led by Emily Novak Gustainis of Harvard. The small meeting included two brief presentations by Emily Walters of NC State and Adrienne Pruitt of the Free Library of Philadelphia, both of whom have experimented with Gustainis’ Processing Metrics Database, which is an exciting tool to help archivists track statistical information about archival processing timing and costs. Walters also mentioned NC State’s new tool called Steady, which allows archivists to take container list spreadsheets and easily convert them into XML stub documents for import into AT. Walters used the PMD to track supply costs and time, while Pruitt used the database to help with grant applications. Everyone noted that metrics should be used to compare collections, processing levels, and collection needs, taking special care to note that metrics should NOT be used to compare people. The average processing rate at NC State for their architectural material was 4 linear feet per hour, while it was 2 linear feet per hour for folder lists at Princeton (as noted by meeting participant Christie Petersen).
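To give a sense of what converting a container list into an XML stub involves, here is a rough sketch in Python. It is neither Steady's implementation nor Del Collo's method, neither of which I have seen the code for. The column names (`box`, `folder`, `title`, `date`) are hypothetical, and the output is only an EAD-style `<dsc>` fragment with one `<c01>` per spreadsheet row, not a complete or validated finding aid.

```python
import csv
import io
import xml.etree.ElementTree as ET


def container_list_to_dsc(csv_text):
    """Turn a CSV container list into an EAD-style <dsc> element.

    Assumes hypothetical column headers: box, folder, title, date.
    """
    dsc = ET.Element("dsc", {"type": "combined"})
    for row in csv.DictReader(io.StringIO(csv_text)):
        component = ET.SubElement(dsc, "c01", {"level": "file"})
        did = ET.SubElement(component, "did")
        ET.SubElement(did, "container", {"type": "box"}).text = row["box"].strip()
        ET.SubElement(did, "container", {"type": "folder"}).text = row["folder"].strip()
        ET.SubElement(did, "unittitle").text = row["title"].strip()
        date = (row.get("date") or "").strip()
        if date:  # leave out <unitdate> when the spreadsheet cell is empty
            ET.SubElement(did, "unitdate").text = date
    return dsc


if __name__ == "__main__":
    sample = (
        "box,folder,title,date\n"
        "1,1,Correspondence,1950-1955\n"
        "1,2,Photographs,\n"
    )
    print(ET.tostring(container_list_to_dsc(sample), encoding="unicode"))
```

Saving the spreadsheet as CSV keeps the sketch free of third-party libraries. A real import would still need the rest of the EAD document wrapped around this fragment and a check against whatever AT expects.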
On Saturday morning I woke up early to prepare for my session, Session 503: “Exposing Hidden Collections Through Consortia and Collaboration.” I was honored and proud to chair the session with distinguished speakers Holly Mengel of the Philadelphia Area Consortium of Special Collections Libraries, Nick Graham of the North Carolina Digital Heritage Center, and Sherri Berger of the California Digital Library. The panelists defined and explored the exposure of hidden collections, from local/practical projects to regional/service-based projects. Each spoke about levels of “hidden-ness” and the decision-making process of choosing partners and service recipients. It was a joy to listen to and facilitate presentations by archivists with such inspirational projects.
After my session, I attended Session 605: “Acquiring Organizational Records in a Social Media World: Documentation Strategies in the Facebook Era.” The focus on documenting student groups is very appealing, since documenting student life is one of the greatest challenges for university archivists. Most of the speakers recommended web archiving for Twitter and Facebook, which was not a new idea to me. However, Jackie Esposito of Penn State suggested a new strategy for documenting student organizations, which focuses on capture/recapture of social media sites and direct conversations with student groups, including the requirement that every group have a student archivist or historian. Jackie taught an “Archives 101” class to these students on weeknights after 7 pm early in the fall, and made sure to follow up with student groups before graduation.
After lunch, I went to Session 702: “Return on Investment: Metadata, Metrics, and Management.” All I can say about the session is…wow. Joyce Chapman of TRLN (formerly an NC State Library Fellow) spoke about her research into ROI (return on investment) for manual metadata enhancement and a project to understand researcher expectations of finding aids. The first project addressed the challenge of measuring value in a nonprofit (which cannot measure value via sales as for-profit organizations do) through A/B testing of enhancements made to photographic metadata by cataloging staff. Her testing found that page views for enhanced metadata records were quadruple those of unenhanced records, a staggering statistic. Web analytics found that 28% of search strings for their photographs included names, which were only added to enhanced records. In terms of cataloger time, their goal was 5 minutes per image but the average was 7 minutes of metadata work per image. Her project documentation is available online. Her other study examined how successfully academic researchers discover material within finding aids, using behavior, perception, and rank information. The sections, in order from most to least useful for researchers, were: collection inventory, abstract, subjects, scope and contents, and biography/history. The abstract was looked at first in 60% of user tests. Users did not know the difference between abstract and scope and contents notes; in fact, 64% of users did not read the scope and contents note at all after reading the abstract! Researchers explained that their reason for ignoring the biography/history note was a lack of trust in the information, since biographies/histories do not tend to include footnotes and the notes are impossible to cite.
Emily Novak Gustainis from Harvard talked about her processing metrics database, as mentioned in the paragraph about the “Processing Metrics Collaborative” meeting. Her reasoning behind metrics was simple: it is hard to change something until you know what you are doing. Her database tracks 38 aspects of archival processing, including timing and processing levels. She repeated that you cannot compare people, only collections; however, an employee report showed that a permanent processing archivist was spending only 20% of his time processing, so her team was able to use this information to adjust staff responsibilities.
Adrian Turner from the California Digital Library talked about the Uncovering California Environmental Collections (UCEC) project, a CLIR-funded grant project to help process environmental collections across the state. While metrics were not built into the project, the group thought that they would be beneficial for it. In another project, the UC Next Generation Technical Services initiative found 71,000 feet in backlogs, and developed tactics for collection-level records in EAD and Archivists’ Toolkit using minimal processing techniques. Through information gathering in a Google Docs spreadsheet, they found no discernible difference between date ranges, personal papers, and record groups processed through their project. They found processing rates of 1 linear foot per hour for series-level arrangement and description and 4-6 linear feet per hour for folder-level arrangement and description. He recommended formally incorporating metrics into project plans and creating a shared methodology for processing levels.
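The arithmetic behind rates like these is simple enough to sketch in a few lines of Python. This is only a back-of-the-envelope illustration, not the UC group's methodology or the Processing Metrics Database. The function names are my own, and applying the reported rates to the 71,000-foot backlog is my own hypothetical exercise rather than a figure from the talk.

```python
def processing_rate(linear_feet, hours):
    """Linear feet processed per hour for one collection."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return linear_feet / hours


def estimated_hours(linear_feet, feet_per_hour):
    """Hours needed to process a backlog at a given rate."""
    if feet_per_hour <= 0:
        raise ValueError("feet_per_hour must be positive")
    return linear_feet / feet_per_hour


def estimate_range(linear_feet, slow_rate, fast_rate):
    """Return (fewest hours, most hours) for a range of rates."""
    return (
        estimated_hours(linear_feet, fast_rate),
        estimated_hours(linear_feet, slow_rate),
    )


if __name__ == "__main__":
    backlog_feet = 71000  # backlog size reported in the talk
    # Hypothetical: apply the reported rates to the whole backlog.
    print(estimated_hours(backlog_feet, 1))
    print(estimate_range(backlog_feet, 4, 6))
```

Estimates like these describe collections and processing levels, not individual staff members, in keeping with the caution repeated throughout the sessions.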
I had to head out for Midway before Q&A started to catch the train in time for my return flight, which thankfully wasn’t canceled because of Hurricane Irene. As the train passed through Chicago, I found myself thinking about the energizing and inspiring projects, tools, and theory that come from attending SAA…and how much I look forward to SAA 2012.
(Cross-posted to ZSR Professional Development blog.)