Web Magazine for Information Professionals

Digitisation: Do We Have a Strategy?

David Pearson suggests that the library sector should find a mechanism to put digitisation high on the agenda.

The notion that we are living through times of great change in the communication of information and the transmission of texts is a truism which will bring a weary look to most professionals with any kind of involvement in the area. The digital age, the information age, the electronic age – we’ve all heard these terms so many times and have sat through innumerable discussions, and seen even more documents, trying to sort out what it all means. There are almost as many views on the likely pace of change and the shape of the landscape 10 or 20 years from now as there are librarians to hold forth on the subject. Perhaps this helps to explain why the library community as a whole seems to be in such a rudderless state regarding the creation of digital content; no shortage of action, but no overall sense of direction. I am talking here about digitisation of our documentary heritage, that vast mass of books, archives and other media which fill our library shelves today.

The fundamental truth is that digital technology, including its associated communication networks, has radically altered the way in which texts can be accessed. And for the majority of users, texts (or other forms of content, such as images) are more important than books. Discussions about books, and about how wonderful and essential they are, tend to blur the distinction between texts and books. “Book” is often used as a synonym for “text”; when Milton enthused about the importance of a good book as “the precious life-blood of a master spirit, embalmed and treasured up on purpose to a life beyond life”, he actually meant a good text: a collection of words put together so as to inspire, educate, communicate … all the things that texts are capable of. The numerous other writers throughout history who have expounded on the qualities of books have usually been using the term in the same way.

This is not to say that books are not in themselves potentially wonderful objects; their form is a masterpiece of design and they have considerable aesthetic potential, something which has been exploited by printers, typefounders and bookbinders down the ages. As historic artefacts, they may also have great value; book historians are increasingly alert to the lessons that annotated or otherwise personalised copies of books have to offer, and to the subtler arguments about the effect which the physical format has on the delivery of the message.

These points are, however, something of a sideshow to what most users see as the primary purpose of books, both new and old: to be the containers of texts. Books were invented so that people could get at texts, and libraries followed as a logical extension of the theme. People want the texts that are contained in books, and libraries present an efficient way of serving popular need by bringing whole collections of books together in one place. People go to libraries because they hold the texts they need, and because they make many more potentially relevant texts available besides.

The digital revolution has the power to run a coach and horses through all of this. If a text is held electronically on a server connected to the Internet, it becomes available anywhere in the world through a few keystrokes; there is no need to go to a library. The importance of texts remains unchanged, and is likely to remain so until a method of communication is developed which does not rely on words. “Not marble, nor the gilded monuments of princes, shall outlive this powerful rime”, but the absolute need for books as the containers of those rimes may disappear.

There are numerous arguments marshalled against this drift, most of them spurious in the long run. It is frequently observed that more and more books are published with each passing year, with no sign yet that they are going the way of the horse after the arrival of the motor car. People like books; they are familiar and comfortable with them. Our own generation is never likely to move away from that, but we should be wary when predicting the future. The academic scientific community (unlike its counterpart in the humanities) has already largely converted to the philosophy that if it’s not on the net, or at least online, it doesn’t exist, and that trend will surely grow. Many people point to the clumsy and unsatisfactory nature of the technology – slow transmission, ugly presentation, the difficulty of reading large amounts of text from a screen rather than a piece of paper. These are all true today, but will they be equally true in 20 years’ time, given the pace of technological development? People worry about the problems of digital preservation, and whether the e-texts of today will still be readable ten years from now. We all know some standard horror stories to illustrate this, but again, can we really believe that this will be other than a transient problem?

Whether we like it or not, the technology already exists to convert the world’s entire documentary heritage to digital form, although the money to do it is not presently forthcoming. Those who do make use of digital surrogates, or born-digital material, will generally recognise the benefits of such ready access, though they may still be reading the results on a paper print-out. If you wish to check a reference to the Gentleman’s Magazine of 1750, there is now no need to go to a library and call up what is almost certainly a closed-access item. A few keystrokes to reach the site of the Internet Library of Early Journals, an e-lib project, will produce the text on screen, and you can print off as many pages as you like without arguing with conservation-steered librarians about photocopying fragile material.[1]

How have we responded to the creation of this potential? There is, as stated earlier, no shortage of activity. We have debated endlessly what it will all mean and whether the death of the book really is round the corner or not; as long ago as 1992, the librarian of the Houghton Library at Harvard, speaking at a colloquium on the future of rare books librarianship, was willing to predict that “every book printed in a western language before 1800” would be available in full-text digital form within a 25 to 50 year timeframe.[2] Sir Anthony Kenny, however, writing the foreword to the British Library’s Towards the digital library (1998), was robust in wishing to “dismiss the fantasies of those who say that in the 21st century all information will be stored and transmitted electronically”.[3] Much effort has been directed towards defining the concept of the hybrid library, with its mixture of print and electronic resources.

We have also, collectively, poured a great deal of energy and expenditure into projects of all kinds to develop digital access. Much work has gone into creating web-accessible electronic catalogues, as a logical first step, but these are only the foothills of digital librarianship compared with the bigger challenge of creating full-text digital content. This is where the picture is much more fragmented. Umpteen small- to medium-scale textual digitisation projects have been carried out, some by libraries, some by academic groupings and some by commercial publishers. Selected highlights of particular collections have been put on the web, along with complete texts of (particularly) literary works. The rationale behind these projects varies; there is a sense of much dipping of toes into a new and sexy pool, where people don’t want to be left out but at the same time don’t want to (or can’t) invest too much resource. For libraries, particularly, the driver is often the availability of funding streams, be it JISC (for e-lib) or NOF, creating the danger that priorities are set according to the varied philosophies of different funders rather than from a longer-range strategic perspective. As the 1999 report on Scoping the future of the University of Oxford’s digital library collections put it, “most of the initiatives have been undertaken in isolation, coming up with different answers to the same questions, or suffering from the familiar problem of reinventing the wheel.”[4]

The consequence is a chaotic mass of digital content with few guiding principles and not a little duplication. A quick search on the web produced, in a matter of minutes, five different freely available full-text versions of Bleak House; there are probably more. There is no end of digitised title pages, images and bits of text mounted on umpteen library websites all over the world, sometimes as the legacy of particular exhibitions, sometimes as tasters and testers of what lies in the collections. And there are resources which, one suspects, rarely connect with the people who might need them; I wonder how many people who search for John Ray’s Catalogus plantarum (London, 1670) in library catalogues around the country know that a complete digital surrogate is available on Gallica, the digital library database of the Bibliothèque Nationale?[5]

The really important ideas in librarianship have tended to be the big and simple ones, aimed at comprehensiveness and inclusiveness of coverage. The short-title catalogues, mapping the total published output of national cultures within defined periods, have become interdisciplinary cornerstones of research. The MARC format, in a different kind of way, is another example. From Alexandria onwards, the most important libraries have been the ones which held the literature of a subject or a nation as comprehensively as possible under one roof. As publications have proliferated and Panizzian visions of gathering it all together in one place have become ever more impracticable, the principle has lived on in notions like Universal Bibliographic Control, or a Distributed National Resource – co-ordinated systems which hang on to the notion of being able to access everything in a joined-up way. The digital environment offers new possibilities of building comprehensive collections virtually, in a way that we now accept is physically impossible in any one location, but the library sector seems to be all over the place when it comes to recognising this challenge and picking it up. It is, rather, in the commercial sector that there is more evidence of this kind of thinking. A full-text digital version of the entire English-language published output down to 1700 is under active development by ProQuest (Early English Books Online), a company which also now has the Chadwyck-Healey Literature Online database under its wing, with full-text versions of over a quarter of a million literary texts. JSTOR, which digitises entire periodical runs within defined date spans, is not strictly commercial, being run as an independent not-for-profit organisation.

It may be objected that this is all as it should be: it is the job of libraries to house material and make it accessible, not to publish it in facsimile, which is where the commercial operations rightly take their place. This may be so, but digital technology introduces subtle but major changes to the traditional model. It makes it possible for the facsimile publisher ultimately to cut out the library altogether and deal directly with the users (on terms of its own choosing), because the publisher becomes the holder of the material in the way that libraries used to be. Such developments may be no more than inevitable and healthy aspects of economic evolution; but librarians, before they don their turkey hats and vote for Christmas, should reflect that they are the custodians of the documentary heritage, that they sit on the stuff which researchers want, and that they should perhaps be playing a more active role in steering the development of digital content. A further argument recognises that many libraries, including the nationals, the public libraries, and (increasingly) other libraries in the education arena which are being encouraged to recognise their wider community potential, have a basic rationale which is about making material available. They exist partly in order to be reservoirs of material, and partly to create a service which can control access to it for maximum public benefit. If we believe that it is in the public interest to have national collections like the British Library funded from the public purse, because they make the books freely available for the good of all, translating that vision to the digital age implies that we should be making the digital successor similarly available pro bono publico.

If further proof is needed of the dangers of allowing control over access to documentary material to slip too far into the commercial sector, we might contemplate the movement to liberate scientific journals. “There is growing concern among scientists that research results are controlled by an increasingly small number of publishers who have great control over the marketplace”, as William Hersh recently wrote in Nature.[6] As librarians who have to subscribe to the major groupings of scientific journals know only too well, a situation has developed in which a few publishers have an economic stranglehold: scientists produce research which is published in journals which have to be bought, at ever more crippling subscription rates, by the institutions where many of the scientists are themselves based. Copyright over this material has also been cunningly controlled by the publishers, who have often required authors to sign their rights away. The establishment of new groupings like SPARC and PubMed, trying to bring control back into the community that creates the work and make it freely available on the Internet for wider public good, is a heartening response, but the battle will not be easily won and there is much ground to make up.

The thoughts marshalled in this paper have evolved as part of the process of trying to decide, from the standpoint of the Wellcome Library (an independent research library, part of the Wellcome Trust, with rich and diverse holdings), where effort should most usefully be directed in developing a digitisation programme. We have been working with the Higher Education Digitisation Service (HEDS) to seek some properly planned answers, and one of the conclusions which came through strongly from their survey, though not one which caused any surprise, was that the most useful digitisation projects are the big ones which take in a readily understood body of material in a comprehensive way, things like EEBO and JSTOR. The itty-bitty ones, creating little digital islands here and there, are much less useful. The study also pointed to a surprising lack of hard published evidence on the success or failure of particular digitisation projects.

What is really woefully apparent in all of this is the absence of any agreed national strategy to provide a context for decision making. However large a project any particular organisation or consortium undertakes, it will always be one contribution to a bigger picture, and whatever that jigsaw is, we need it to be completed; we don’t want two people trying to fit the same piece and we don’t want half the frame and a bit of one corner. “How much better it would be if there was a shared and centrally managed programme [for digitising texts] across the nation”, as Ray Lester wrote in the SCONUL Newsletter in autumn 2000.[7]

The fundamental principles of such a strategy are not hard to envisage. They would begin with an agreed position on the proportion of the documentary heritage which the nation wishes to see digitised within a target timeframe of (say) the next twenty years. That might be literally everything (an ambitious target) or, more realistically, every text (not every edition), perhaps further narrowed by discipline or other criteria. The target could be further refined by recognising higher and lower priorities within the overall framework, and a similar statement would need to be drawn up for archives. The resulting document would provide guidance for anyone undertaking digitisation work on their own holdings, and for agencies that may be funding such work. The vision then calls for a central database (which may of course be virtual, not actual) whose metadata holds direct links to the URLs of digital surrogates; for the national printed archive down to the beginning of the 19th century, the ESTC file comes naturally to mind, although this is not all-inclusive. The French Gallica database, which uses not only digital files created by the BNF but also ones created by a number of partner institutions, seems to offer the beginnings of a model, and something in advance of anything currently available in the public domain in the English-speaking world.

Where might we look for such a strategy to be developed? Much of the cutting-edge work in electronic library developments in recent years has been focussed in the higher education sector, where of course many of the significant research libraries also live, but such libraries are individually tied to local rather than national priorities. Groupings such as CURL, or RLG, or LIBER might be better able to take an active interest, though they have limits on both resources and influence. The national library is an obvious place to look, but the BL has always been ambivalently placed when it comes to recommending, let alone setting, national policies, an area where Re:source might expect to have more of a voice. There are other big quasi-national libraries which are not tied to a specific academic community, and with which some of the thoughts expressed here may strike a chord, but they are not a group able to force action alone. And of course we also have a plethora of other bodies like UKOLN or DNER working on aspects of electronic library developments, with complementary but slightly differing agendas, none of which has a remit to tackle the big-picture issue outlined here. The new Research Support Libraries Group has expressed an interest in this area – Brian Follett, the Chairman of the Group, has stated that their terms of reference include “to seek a national strategy for digitising existing collections of primary research material” – but the issue is not yet reflected in the Group’s minutes, as published on their website.[8]

The analysis offered in this paper may be over-simplistic on a number of counts, and it certainly makes a number of assumptions. Firstly, it assumes that the technology is now sufficiently mature for it to be worth investing money and effort in creating digital surrogates which will be of lasting worth, an assumption which I think is sound but which may require further thought. It also assumes that the kind of funding model in which libraries are maintained at public cost for public good is a permanent feature of the landscape – that society will continue to believe that it is right for the nation’s citizens to have free access to documentary materials, in the way that a national library provides it, a national databank being a logical corollary in a digital age. This is harder to predict in a world of public-private partnership, and one in which universities are increasingly being expected to act more like businesses. It also makes no mention of the issue of copyright, a huge concern for modern texts and born-digital material, but less of a showstopper if we concentrate on the vast pre-20th-century heritage.

The idea of a national strategy may seem impossibly challenging, or even presumptuously dictatorial, and it may make unwarranted assumptions about priorities for librarians today. But every time I reject it all as folly and pipe dreams, I am forced back to an unavoidable feeling that the present unsatisfactory situation can only get worse and more confused if nothing happens. We will pour yet more money into fragmented activities, or start constructing huge stable blocks next door to Mr Benz’s nice new motor car factory – is this not the danger of schemes like Full Disclosure or UKNUC, putting great effort into mapping the physical whereabouts of texts when, a generation down the line, people will no longer need to know?

I believe there is an urgent need for the library sector to get its act together and for us to find a mechanism to put digitisation of the documentary heritage, and a strategy for achieving it, high on the agenda. Making it all happen is not something that libraries can achieve single-handedly, but inertia will lead to regret in due course. It is one of the great visionary challenges for the present professional generation.


  1. http://www.bodley.ox.ac.uk/ilej.
  2. Richard Wendorf (ed.), Rare book and manuscript libraries in the twenty-first century, Cambridge (MA), 1993, p. 11.
  3. Anthony Kenny, “Foreword” in L. Carpenter et al. (eds.), Towards the digital library, London, 1998, pp. 5-9, p. 5.
  4. http://www.bodley.ox.ac.uk/scoping (see section 7.1).
  5. http://gallica.bnf.fr/scripts/ConsultationTout.exe?E=0&O=N098508.
  6. William Hersh, “The way of the future?”, Nature 413 (18 October 2001), p. 680.
  7. Ray Lester, “A few irreverent thoughts on ‘digitisation’ …”, SCONUL Newsletter 20 (Autumn 2000), pp. 5-7.
  8. Brian Follett, “Just how are we going to satisfy our research customers”, LIBER Quarterly 11 (2001), pp. 218-223, p. 222. The Group’s minutes are accessible on their website, http://www.rslg.ac.uk.

Author Details

David Pearson
Wellcome Library
Email: d.pearson@wellcome.ac.uk