eIFL Case Studies on Low Cost Digitisation Projects: Final Report
Prepared by
Edited by
- Introduction, Methodology and Acknowledgements
- Low Cost Digitisation: Sample Projects
- Digitisation Projects and Costs
- Financing Digitisation Projects
- Results of Digitisation Projects and Access
- Digitisation Projects and Standards
- Sustainability of Digitised Collections and Access
Introduction, Methodology and Acknowledgements
This report summarises the digitisation experiences of several eIFL countries. Although there are probably many more examples of digitisation in eIFL countries, this report includes only those where the country responded to the survey. The main objective of this study was to raise awareness of best practice digitisation projects that are (1) affordable, (2) easily managed at the technical and organisational level, (3) sustainable, and (4) able to help eIFL countries preserve and promote their local content online. Libraries in eIFL countries with digitisation projects were asked to complete a survey about the aims of their projects. The libraries returned the completed surveys together with additional relevant information, such as pictures illustrating the digitisation (scanning) process. The respondents were then interviewed briefly by phone or Skype about their survey answers. The survey questions were selected from existing digitisation study questionnaires. We thank the Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, and Ann Gow for the use of the survey instruments they developed for the 2001 NINCH study, from which a number of the eIFL survey questions were derived.
The section on cost issues was inspired by the NINCH Good Practice Guide [NINCH], the iMARK Information Management Resource Kit Module on “Digitization and Digital Libraries” [iMARK] and “From paper to collection” (Loots et al., 2004). Thanks also to Patricia Liebetrau, Geoffrey Salanje and Tigran Zargaryan for their early feedback and for pre-testing the survey, and to Jan Andrzej Nikisch, who reviewed my draft texts very thoroughly and made valuable suggestions. Thanks also to everyone from the Low Cost Digitisation projects who answered the questionnaire and, in some cases, had further conversations with the author. Their enthusiasm and detailed information, often provided under tight deadlines, are greatly appreciated.
Repke de Vries
Executive Summary
Libraries in eIFL countries face the same challenges as any other library in the world: how to meet new demands for digital content while maintaining responsibility for older, physical collections. Digitisation and digital libraries are invaluable in meeting these challenges. Effective digitisation projects and digital library implementations require equipment, human resources and expertise, each of which involves a different type of financial investment. Outsourcing is an option, but usually at a higher cost. After digitisation is completed, the library must also develop solutions and policies for access, and take steps to ensure sustainability. The case studies in this report on Low Cost Digitisation projects collectively seek to answer one question: how can libraries in eIFL countries manage digitisation projects given the cost and policy requirements? The study has three outcomes: (1) the full project reports from each of the countries that responded to the questionnaire, which are primarily for internal use, (2) this Final Report with its highlights and conclusions, and (3) a short Frequently Asked Questions (FAQ) section (Appendix B of this report).
The questionnaire used for this study is available from the eIFL.net website (www.eifl.net).
The emerging picture
Of the 47 eIFL.net countries, 20 were approached for information about low cost digitisation projects, and 13 of these responded. In addition, 3 countries responded voluntarily to the initial call. The remaining 24 countries were not approached, either because their projects exceeded the Low Cost Digitisation project criteria or because they were still at the beginning stages of their projects. Libraries responded to the survey between late 2008 and early 2009. They represent most regions of the eIFL network: Eastern Europe, Central Asia and Africa. The conditions under which they digitise differ not only from region to region but also from those in the United States and Western Europe. The differences relate to the availability of government financing, the adequacy of library budgets, the maturity of outsourcing solutions and, for in-house projects, the ease of access to digitisation expertise, scanning equipment and support. With the exception of the countries that chose outsourcing, all cases report in-house digitisation through the re-deployment and retraining of staff. The libraries acquired digitising equipment and software. Apart from a few digitised collections of photographs, maps and audio files, most of the digitised collections consisted of handwritten or printed material, mixed materials covering “everything issued in a country about a country,” and publications related to university research and teaching (such as research papers, dissertations, theses and curriculum literature). Preservation needs, together with new forms of open electronic access, drove many of the libraries to digitise manuscripts, rare books and older print journals. The scanning of contemporary print material (especially in Africa, where access was limited to the university network) was driven by the need to provide easier access and multiple access points.
After scanning, printed text can be converted from page images to full text using optical character recognition (OCR), but only two countries reported using this approach. One project found that OCR could not cope with the country’s scripts, so it reverted to manual keying-in by volunteers. Most of the scanning equipment the countries used was low- to mid-range and was manufactured for the consumer and office markets. The few projects that digitised rare materials used more specialised high-end scanners. The scanning equipment and related information technology could almost never have been financed from the libraries’ budgets or from national funding programs. The countries typically had to rely heavily on external funding and donations from a diverse array of initiatives and organisations, including the British Library’s “Endangered Archives,” UNESCO, the World Bank and the Association of African Universities. The libraries largely avoided personnel costs through a combination of in-house digitisation and part-time re-deployment of regular staff. Nevertheless, some projects reported full digitisation teams with a part-time director, a metadata specialist, technical support staff, a curator, a photographer or evaluation specialist, and several (2 to 3) digitisers or scanner operators. Initial training for these new duties, and the building of expertise, were managed in many creative ways to avoid costs. All of the reported projects were relatively young (all were from the last five years). A number are extending the duration of their projects as a cost-saving measure. A few participate in international digitisation programs. In one case, libraries worked together as a consortium on digitisation and digital libraries. Another country reported cooperation with neighbouring countries. However, most countries report that their digitisation projects take place in relative isolation.
Any digitisation project is a complex series of steps involving many technical and other decisions. Guidelines such as the “IFLA Digitization Guidelines” offer assistance with these issues, but almost none of the cases reported using such guidelines. Once a project is finished, the sustainability of the new digital collection, and of access to it, requires attention and a budget. Most libraries recognise these sustainability issues and plan to update their user interface, their file formats and sometimes their metadata scheme. More than half also report that they are developing a preservation strategy for their digitised content or already have one in place.
Conclusion and recommendations
The message from the reports seems to be captured by the following quotes: “we cannot afford to wait with digitisation because some of our collections are in bad condition or because we want to open up our cultural heritage to a new and broader audience or because teaching and research need much more convenient electronic copies of publications.” “Therefore we [tried] a Do-It-Your-Self approach with our own staff and … not too expensive equipment.” “Maybe the results are scanned images only and don’t meet some of the standards … but we need digital content and need it quickly: our users and our collections are asking for it.” Against this background, this case study confirms that Low Cost Digitisation is an option that can provide good results. The major hurdle is the initial cost of scanning equipment and related information technology. Many funding schemes are already helping to overcome this obstacle. Mediation by eIFL between funding programs and digitisation projects could be of further assistance. Facilitating the initial training of digitisation teams could be another valuable form of eIFL assistance. The relative isolation in which projects are carried out is perhaps inherent to the do-it-yourself approach, but it makes the work less effective.
As one example in the study shows, local collaboration through a consortium makes the digitisation job easier. Setting up such collaboration could be a topic for eIFL-organised workshops. It is remarkable how closely some of the ideas behind these Low Cost Digitisation projects parallel a 2007 United States forum discussion published as “Shifting Gears – Gearing Up to Get into The Flow.” The message boils down to this: “vast quantities of digitized primary materials will trump a few superbly crafted [special] collections.” Fortunately, the libraries in eIFL.net countries represented in this study have been putting this principle into practice and continue to do so.
Low Cost Digitisation: Sample Projects
Note. Although there are probably many more examples of digitisation in eIFL countries, this report includes only those where the country responded to the survey.
Armenia. The Fundamental Scientific Library (FSL) of the National Academy of Sciences started a two-year digitisation project that will scan 15,000 pages from a single collection of early printed books and periodicals that includes 400 rare books and 1,500 journal issues. The project was supported by a GBP 47,000 grant from the British Library’s “Endangered Archives Programme.” The one-time grant enabled the purchase of high-end scanning equipment, a graphics workstation and 2 TB of disk space, as well as the training of the team’s metadata specialist and digitisers by external consultants. The digitised collection will be openly accessible through a Greenstone digital library. The project will end in 2010, and plans are already being made to create a separate budget line for digitisation in the Fundamental Scientific Library’s annual budget.
Bulgaria. The University, the City of Sofia and the public libraries joined in a two-year digitisation effort that was fully funded by UNESCO and completed in 2006.
As a result, significant parts of their serials collections were digitised, with the work outsourced to a Bulgarian company.
Ethiopia. In 2008 the Addis Ababa University Computer Center undertook an ambitious program to digitise all of the Ethiopian collection materials held centrally by the university. The program uses a mid-range scanner and OCR software donated by the Association of African Universities. It is funded through the regular budget, and the work is performed by regular staff. OCR produces fully searchable text, which is displayed next to the scanned page images.
Georgia. Works of Georgian classical literature and of modern Georgian writers are being digitised under a National Parliamentary Library program begun ten years ago. The digitisation of Georgian Ancient and Medieval Manuscripts is an ongoing project of the National Centre for Manuscripts, and the National Scientific Library started digitising its collections in 2008. A common element of all these programs is the keying-in of texts (often by volunteers) rather than digitising to page images or using OCR.
Ghana. In 2006 the University of Cape Coast Library began digitisation in close cooperation with a number of external partners that either assist the library with training or support it financially. The available digitisation funds roughly equal the library’s annual budget. The two main collections being digitised are Rare Books (for reasons of preservation and public access) and dissertations and theses (to benefit teaching, learning and research).
Kyrgyzstan. The Central Scientific and Medical Library of the …
Malawi. A digitisation project at Bunda College Library (BCL) at the University of Malawi covered about 1,000 printed books, serials, documents and printed illustrations, with an average of 100 pages per item. The project started in November 2007 and ended in December 2008.
The primary audience for the project was students, faculty, researchers and the general public, and its purpose was to preserve material and extend access to it. The project’s USD 20,000 came from external funding and was used partly for scanning equipment, but largely to establish a centre of expertise at BCL, including training and the implementation of the Greenstone digital library software. Librarians and archivists from Malawi, Tanzania and Mozambique were trained. BCL is now sharing its experience with other libraries in Malawi and giving them support. New digitisation projects will be financed from local resources.
Poland. Nine regional digital libraries are now in operation throughout Poland. The first was launched in 2001 by the Poznan Foundation of Scientific Libraries consortium and became operational in 2002. The regional Wielkopolska Digital Library is the virtual gateway to the content. The project has 4.5 FTE staff and an annual budget of €200,000 for digitisation, computer hardware and software (written in-house in partnership with the Poznan Supercomputing and Networking Centre), maintenance, training and access. The participating libraries carry out the digitisation themselves: some use high-end scanning equipment they already owned, and others use equipment from the consortium (including simple scanners for low cost digitisation). The purposes of the project include preservation, wider access to cultural heritage, and the creation of resources for teaching and learning. The source material includes many types of printed and handwritten material. Continuous user feedback helps adjust the content to meet users’ expectations. So far the Poznan consortium has scanned 10,000,000 pages over the past five years, and access statistics show that the collections are in high demand.
Serbia. The Digital Library Department of the National Library of Serbia started its digitisation program in 2003.
Funding to build working expertise came from a number of sources, including donations. As of 2008, 600,000 documents in 70 collections had been digitised. The primary purposes were preservation and public access to collections selected for their historic and cultural value. The source material included printed and rare books, handwritten documents and sound recordings. Digital content is available on the web and in other formats such as CDs, and access is open both to the general public and to Slavistics specialists and cartographers. The digitised material is in high and growing demand. The current library budget includes €25,000 for equipment and 7 staff members, redeployed from other library departments, who work with volunteer scanner operators.
South Africa. The South African Music Archive Project (SAMAP) is being produced under the auspices of Digital Innovation South Africa (DISA), with some of the work performed in partnership with the International Library of African Music (ILAM) at Rhodes University. The digitised materials include sound recordings from analogue tapes and gramophone records. SAMAP provides web access to much of South Africa’s music heritage in order to “promote multidisciplinary research in the field of popular music and culture” and to give new global audiences open access to hidden, “politically sensitive or subversive” music treasures from the past.
Tajikistan. The Rare Book Digital Collections of the Central Scientific Library of the Academy of Sciences of the Republic of Tajikistan is an ongoing digitisation program with two goals: preservation of the rare books collection and electronic access for research and teaching. Access is provided on the web using Greenstone software and on CDs. One particular project, carried out in 2004-2005, scanned 15,000 pages of rare books, photographs, maps, etc.
The project included material from four different rare book collections, among them a “Language and People” collection that includes the “Russian-Tajik Dictionary,” published in 1899 (the first Russian-Tajik dictionary in the history of Tajik linguistics), and “A travers le Turkestan Russe,” published in 1902 (which contains pictures of the everyday life and history of the people of this region). The total budget of USD 3,000 was externally funded by the U.S. Ambassador’s Fund for Cultural Preservation.
Uzbekistan. In 2005 the Uzbekistan National Library embarked on an in-house digitisation program for its entire collection of 20,000 rare books, manuscripts and photographic prints, as well as its repository of 7,000 dissertations. At the start, digitisation was a new experience for the library, and equipment had to be acquired. UNESCO donated a high-end scanner suitable for rare books and manuscripts. Today the digitisation program has a budget of USD 25,000, held as a separate line in the Uzbekistan National Library’s annual budget of USD 800,000. As of 2008, about 22,000 pages and 1,500 photos had been digitised from 1,100 rare books and about 500 dissertations. In the next phase of the project, 30,000 more pages will be scanned. The primary purposes of the digitisation program were preservation and expanded access for both subject specialists and the general public. Anyone can search the online catalogue. Access to the digital content is free and unrestricted on the National Library’s premises, but regulated and fee-based from outside the library. Plans are underway to provide open access to some collections in the future. As a spin-off of the original project, a “Republican Center of Digitalisation” will be established to serve the digitisation needs of both the National Library and other Uzbekistan libraries.
Digitisation Projects and Costs
Overview of Low Cost Digitisation Projects
The thirteen projects varied in cost factors, budget, staffing, expertise, equipment and the amount of content digitised. In Table 1, salaries “on project” refers to temporary staff hired for the duration of the project and paid from the project budget, and “re-deployed” refers to existing library staff assigned full- or part-time to the project for its duration.
Table 1: Project Duration, Budget, Cost Factors and Results
Equipment, Facilities and Software Costs
Table 2 describes the resources needed to digitise the collections (equipment, computers, software and other facilities), expanding on the information in Table 1.
Table 2: Resources Needed, Available and Costs
Human Resources
Whether a project uses specialised scanning equipment or a simple scanner, and whether it is carried out in dedicated office space with reliable electricity or in a spare room with a power generator, every scanned page and every item of metadata must be created and added by human effort. These projects require teamwork to develop skills and expertise, which is often achieved by re-deploying existing staff rather than by recruiting staff already skilled in digitisation work. This choice saves on separate salary costs but introduces training as a cost factor. Tables 3 to 6 report on teams, training needs and training costs.
Table 3: Type and Number of Project Staff, and Time Devoted
In each cell, the number before the semicolon is the number of people assigned to the project, and the number after the semicolon is the percentage of their time devoted to the project (u = unknown).
Notes:
Table 4: Training Needs
“x” = training needed; “ntn” = no training needed; “N/A” = not applicable.
Notes:
Table 5: How Training Was Organised
“x” = training provided; “N/A” = not applicable. Note that “learning on the job” is a soft cost, as is “independent study.” Using external consultants to provide in-house training, or sending staff to external courses, is likely to be a direct cost for the project.
Notes:
Table 6: Staff Training Costs
Financing Digitisation Projects
Funding must be secured before a digitisation project can begin. Many cost factors can be met through alternative means, such as staff re-deployment, learning from colleagues or on the job, and the use of free and open source software. Other costs, however, such as scanning equipment, involve out-of-pocket expenses (see Tables 1 and 2 above). Such equipment is a direct expense that, when ordered from abroad, may have to be paid for in foreign currency. After the initial purchase, there may also be costs for specialised maintenance (for high-end scanners) or replacement (for low-end scanners). The choice of scanner or other equipment, such as sound recording equipment, is determined by the collection being digitised. Buying a less expensive scanner can mean a loss of quality or functionality, or more time spent scanning. In some cases less expensive equipment may not be an option, for example when digitising rare books in poor condition, or in projects with many pages or prints that must be scanned in a short period of time. Almost all projects considered outsourcing the scanning process, but many reported that this option was too expensive or unavailable in their country, or that the collection was too valuable to be allowed to leave the library or archive. The two projects that did outsource were in Bulgaria (which was then able to shift costs from equipment and training to financing the outsourcing) and Georgia (where most digitisation is done by keying in texts from the originals rather than by scanning, and the work is performed by volunteers).
Table 7: Funding Sources at Project Start and Over Time
Results of Digitisation Projects and Access
The summary of examples from the eIFL.net countries explained the “why,” “what” and “for whom” of the projects. Most of the projects reported that the most important reason for digitisation was preservation, followed closely by easier and wider access. The target audiences included the general public (for access to the historical and cultural collections) and students and faculty at schools and universities (for teaching and learning resources). The “number of pages scanned” reported in Table 1 is one indicator of the scale of these projects’ accomplishments, but the following measures are more important: how access is provided, whether users can browse on a website, whether there are links from other resources (such as the online catalogue, or OPAC), and whether there are alternative forms of access (such as distribution on CDs).
Table 8: Means of User Access to the Digitised Content
Table 9: Level of Use of Digitised Collections (Selected Data Available)
When text is digitised, there are advantages in not only scanning it to an image but also creating searchable full text. Full text can be created with Optical Character Recognition (OCR) software, with varying degrees of accuracy, or by manually re-keying the text from the original. Unlike full text, scanned images can be found only through their metadata.
Table 10: Use of OCR or Manual Re-keying to Create Full Text
Digitisation Projects and Standards
Digitisation must meet certain standards to provide useful and true digital representations of the physical original. “Useful” means that the scanned images of print material can be used for other purposes, such as conversion into searchable full text through OCR. A “true” representation is one in which the image accurately represents the original: for example, researchers studying a digitised copy of a very rare handwritten illustrated manuscript should be able to obtain an image that is equivalent to the original. A number of published guides and guidelines help to ensure “true and useful digital representations.”
Table 11: Types of Guides or Guidelines to Digitise Documents
Sustainability of Digitised Collections and Access
Sustaining digitised collections and access to them requires careful attention, a sufficient budget and ongoing work. Long-term access requires a digital preservation strategy that provides for timely migration to new media, upgrading of metadata as standards change, and continuous review to ensure that the needs of different user groups are still met as technology and demands change.
Table 12: Provision for Future Updates
Conclusion
Many libraries in eIFL countries have undertaken or are planning digitisation projects. This report on case studies of low cost digitisation projects brings together some of their experiences. Underlying this report are the full answers to each of the surveys, each of which forms a mini-report in its own right. This Final Report is simply a selection, annotation and interpretation of that original material. Based on the data provided by the reporting institutions, the following tentative conclusions can be drawn:
- For the most part, in-house digitisation projects are relatively recent undertakings, but they seem successful and here to stay. Digital libraries will increasingly be built with digital content from the libraries’ own collections.
- There seem to be two motivations for digitisation: preservation of physical collections in poor condition, and the wish to provide users with more readily accessible local digital content. The value of the latter can be demonstrated by usage statistics. Some of the cases also show that both goals can be achieved effectively in the same project.
- A digitisation project often does not have to “break the bank,” but some types of material require equipment so expensive that only outside funding or donations can kick-start the project.
- Kick-starting a project requires the library to have sufficient funds and human resources, or to find creative ways to cover costs. For some specific equipment, it may be worth investigating centralised purchase and installation of streamlined equipment, or other ways of making the funding and equipment available.
- An attractive and resourceful way to get more digitisation done, or to give a project an easier start, is to foster collaboration among libraries through consortia and through centres of expertise that provide training and support, or lend equipment, to first-time digitisation projects.
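You can make the point that a project need not “break the bank” more concrete by working out the cost per page. The sketch below uses two budgets reported in the sample projects. Tajikistan spent USD 3,000 on 15,000 pages, and Armenia received GBP 47,000 for a planned 15,000 pages. The sketch divides each budget by its page count. Treat it as a rough illustration only:
- It ignores re-deployed staff time, volunteer work and equipment that outlives the project.
- It does not convert between currencies.
- The names `Project` and `cost_per_page` are hypothetical and used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Budget and page count as reported for one sample project."""
    country: str
    budget: float   # total direct budget in the stated currency
    currency: str   # currencies are kept as reported; no conversion is done
    pages: int      # pages scanned (or planned)


def cost_per_page(project: Project) -> float:
    """Hypothetical helper: direct budget divided by number of pages.

    In-kind costs (re-deployed staff, volunteers) are not included,
    so the result understates the full cost per page.
    """
    if project.pages <= 0:
        raise ValueError("page count must be positive")
    return project.budget / project.pages


# Figures taken from the sample project descriptions in this report.
projects = [
    Project("Tajikistan", 3_000, "USD", 15_000),
    # Armenia: planned pages; the grant also bought high-end equipment.
    Project("Armenia", 47_000, "GBP", 15_000),
]

for p in projects:
    print(f"{p.country}: {cost_per_page(p):.2f} {p.currency} per page")
```

Running it prints `Tajikistan: 0.20 USD per page` and `Armenia: 3.13 GBP per page`. Part of the gap plausibly reflects what the Armenian grant also paid for: high-end scanning equipment, a workstation, storage and external training. A fair comparison would also need currency conversion and an estimate of staff time.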
Appendix A: List of eIFL Countries Contacted
Notes:
Appendix B: Frequently Asked Questions (FAQ) About Low Cost Digitisation Projects
Q: I have a physical library collection that I want to make digital. Where do I start?
A: The first step is to educate yourself about the issues involved. You may find the following digitisation guidelines helpful:
- IFLA & UNESCO [www.ifla.org]
- PULMAN [pulmanweb.org/DGMs/]
- DISA [www.disa.ukzn.ac.za]
- the NINCH Good Practice Guide (NINCH) [www.nyu.edu/its/humanities/ninchguide/]
- the iMARK Information Management Resource Kit Module “Digitization and Digital Libraries” [www.imarkgroup.org], also available on CD-ROM in a number of languages
- “From paper to collection” (Loots et al., 2004) [greenstonesupport.iimk.ac.in]
Talking to other librarians who are geographically close to you and share your language should also be very helpful. The next step is to develop a plan of action; the iMARK Module, for example, provides many useful exercises and checklists. Finally, get started. The most important lesson learned from all of the projects we studied is this: if you can finance your scanner and manage the initial training, apply that knowledge as soon as possible so that you can learn by doing.
Q: How can I finance my digitisation project?
A: Most of the projects we studied were financed through one or more of the following means: (1) obtaining external funding (e.g., grants, foundations, special ministry funding) for scanning equipment and IT-related investments; (2) covering staff costs through re-deployment or volunteer work; (3) using additional funds from the regular budget to support training; and (4) asking colleagues at nearby institutions to provide training.
Q: Do most libraries work together on digitisation or by themselves?
A: The experience varies by country. Poland, for example, demonstrated the advantages of regional cooperation, which enabled cost sharing, central coordination of arrangements and centralised access to the digital library.
Uzbekistan is seeking to establish a digitisation centre at the national level, which would benefit all libraries. In South Africa, DISA is an example of a group that was able to share expertise, project management and hosting. Nonetheless, in many of the other projects the libraries worked on their own. Further information about participation in international programmes is available from the British Library’s “Endangered Archives.”
Q: I know which of my collections I should digitise. Should I outsource this work or do it in-house?
A: Most of the projects we studied chose to do the work in-house. The most frequently cited reasons were: (1) there were no outsourcing facilities in the country; (2) the collections being digitised were too fragile or special to leave the library; (3) the out-of-pocket costs of outsourcing exceeded the available budget; and (4) although the work might take longer, the project could be extended over time and completed by in-house staff. There were two instances of outsourcing in the study: Bulgaria outsourced the scanning of its journal collection, and Georgia outsourced to volunteers the typing and input of the texts of old manuscripts.
Q: What is OCR? What does it do, and should I try it? Are there alternative solutions?
A: After a page has been scanned, Optical Character Recognition (OCR) software can read the scanned image of the text and produce full text, as if it had been typed. The resulting full text can be indexed and searched. However, OCR can usually read only fairly standard typefaces, so it cannot always handle local fonts or handwritten material. Even with standard typefaces, the accuracy of the conversion varies with the quality of the OCR software.
The use of OCR also requires additional steps to convert the scanned image to text and to proofread the result, but without OCR the only searchable text will be whatever metadata the library creates. In this study, the libraries in Ethiopia, Ghana and Poland used OCR software. If you wish to generate full text without OCR, the alternative is to re-type the text (or to outsource that work to another organisation). Georgia reported on its experience with knowledgeable volunteers keying in rare books. This process can be very time-consuming and can also introduce errors. You can sometimes outsource this work (particularly for English-language text), but no outsourcing of this kind was reported in the survey.
Q: Which equipment and software do I need?
A: The following table provides an overview, for the countries that participated in this study, of the materials they digitised, the approach they took and the equipment and IT infrastructure they used.