Devil in the detail for landmark National Library of Australia project
NLA's project to digitise Australian newspapers and make them available online required both specialised knowledge and equipment.
The National Library of Australia's $10 million landmark project to digitise historic Australian newspapers and make them freely available online required both specialised knowledge and equipment.
Some of the newspapers had been microfilmed in the 1960s before microfilming standards were in place, which meant the quality was not ideal.
"So we have some lovely images of newspaper pages with people's thumbs on them as they were holding the newspapers down while it was being microfilmed," National Library of Australia digitisation and photography director Cathy Pilgrim said.
The National Library of Australia published an open tender in 2008 for the historic program, and South Australian document management group Scan Conversion Services was one of two Australian companies selected.
Ms Pilgrim said SCS met quality expectations when it came to automated enhancement of damaged pages, particularly for newspapers from the 1800s that had already started to deteriorate when they were microfilmed. "They (SCS) have high-end scanning equipment that makes it relatively easy to convert microfilm into digital images," she said. "They also had significant expertise and had done this type of work a lot across a whole range of industries."
SCS adapted specialised software packages and wrote some bespoke software to maximise the efficiency and quality of the image capturing process. The result was a highly automated process with extensive built-in quality assurance procedures and an overall increase in productivity of more than 50 per cent.
The Canberra-based National Library, which has about 430 staff, collects and provides access to Australian documentary heritage across a wide range of formats, including books, journals, pictures, manuscripts and music collections, as well as websites and electronic content.
"Our aim is to collect, store and maintain that documentary heritage and also provide access to it in the widest possible way," Ms Pilgrim said.
The National Library undertook the first phase of the Australian Newspapers Digitisation Program over four years, concluding in June 2011. It is digitising newspapers published up to 1954, starting with the earliest newspaper, the Sydney Gazette and New South Wales Advertiser of 1803.
As of today, more than 5.4 million newspaper pages, which equates to about 54 million articles, are available to research and browse via the National Library's Trove discovery service (http://trove.nla.gov.au/newspaper).
Ms Pilgrim said the project has been beneficial to genealogists and family historians, as well as the education sector. The articles are freely available to anyone and include coverage of major events such as Gallipoli, the Great Depression and the Eureka Stockade.
"In the past, people wanting to use newspapers had to go to their library and had to get out microfilm and spend days scrolling through reams of microfilm to find what they are looking for," she said. "By digitising and bringing all of the keywords together into one database, you can be typing on your keyboard at home, the office or wherever, and can access this sort of information."
The Australian Newspapers Digitisation Program, which is a collaborative project with state and territory libraries, has now moved into its fifth year.
"We started off the program with the aim to do one major daily newspaper from each Australian state and territory," she said. "As the program has progressed, we are now providing access to over 150 individual titles and more recently we are starting to work on regional newspaper titles."
To sustain the program, the National Library is taking on digitisation work on behalf of other organisations and delivering the content through a national framework.
Offshore contractors perform the optical character recognition (OCR), which converts each page image into text so that the words can be loaded into a database for keyword searching. Offshore contractors also carry out content analysis.
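For readers curious how OCR text can support keyword searching, the idea can be sketched in a few lines of Python. This is an illustration only and not a description of how Trove is built: the function names (`tokenize`, `build_index`, `search`), the article IDs and the sample text are all hypothetical. It assumes each article's OCR text is already available as a plain string and builds a simple inverted index, a lookup from each word to the articles that contain it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Map each word to the set of article IDs whose OCR text contains it."""
    index = defaultdict(set)
    for article_id, ocr_text in articles.items():
        for word in tokenize(ocr_text):
            index[word].add(article_id)
    return index


def search(index, query):
    """Return the IDs of articles that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from a copy so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up OCR output, for illustration only.
articles = {
    "a1": "Troops landed at Gallipoli at dawn",
    "a2": "Miners gathered at the Eureka Stockade",
    "a3": "Letters from Gallipoli reach Sydney",
}
index = build_index(articles)
print(search(index, "Gallipoli"))
```

Because the index is built from the OCR text, a misread word on a damaged page would simply not be found by a search for the correct spelling, which is one reason the quality of the scanned images matters.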
Ms Pilgrim said the Trove service was the first newspaper digitisation service in the world to enable users to also contribute to the content. "So users can add a subject tag to articles, they can add a comment about an article and one of the innovative things is that they can correct that electronically translated text or the OCR text we are delivering to users."