Friday, August 12, 2016

Bichitra: The Making of a Tagore Website

Sukanta Chaudhuri (with others)

            There are, in the world, more native speakers of Bengali than of Russian, Japanese, German, French, or Italian. There is only one Bengali writer who has won the Nobel Prize for Literature, but the archive of his writings is larger than Shakespeare's, Goethe's, Proust's, or Faulkner's. His name is Rabindranath Tagore: poet, novelist, historian, dramatist, painter, sculptor, composer, educator, translator. His archive of manuscripts and printed works, amounting to over 140,000 pages, is the first major writer's archive to be entirely (almost) digitized and posted to the Internet--"almost" because 40 rare books out of 450 and 300 journal items out of 3,200 could not (yet) be obtained for reproduction. The virtual archive was accomplished in two years by a team of 30-plus researchers and computer programmers funded primarily by the Indian government, which found itself justly proud of its Nobel Laureate on the occasion of his 150th birthday in 2011.
            How they did it and why you should care is the subject of a new book, Bichitra: The Making of a Tagore Website, by the project director Sukanta Chaudhuri. Readers of Chaudhuri's book, The Metaphysics of Text, are familiar with his elegant and clear prose, his attention to detail, his self-effacing grace, and his incredible stamina. Most of the world needs this book because we don't know Tagore well enough, we don't know Bengali, and we don't know how to build or use virtual archives. The onus is on us, but Bichitra, the book, makes it easy to find out.
            The first step is to understand the importance and achievements of Tagore himself.  He is a recognized world figure, but few will know that his works (he wrote in both Bengali and English) exist in multiple versions.  Sometimes he turned a play into a novel or vice versa, or he incorporated poems into novels or other works.  Sometimes his works were both collected and anthologized under his supervision, for which he made changes.  Sometimes he wrote the same work (more or less) in both Bengali and English.   But more often he was discovering new things to say with his already written works--he changed his mind--or finding a better way to say what he originally thought.    The richness of Tagore's archive for the study of the genesis of thought and of literary works is unsurpassed by any writer anywhere.  That is why it is called Bichitra, the various, the curious, the bizarre. 
           Obviously a reader needs more than just this book to explain Bichitra, the website.  One needs to be able to work one's way around in the archive.   So, there are tools: search engines and a concordance engine bring Tagore's words and subjects together.  A bibliography with links to every form of each work aggregates the related materials.  A collation program identifies the variants in the different forms of each work.
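            To give a rough sense of what a collation tool does, here is a minimal sketch in Python. It is not the Bichitra collation program, which the book describes and which has to handle Bengali script and many versions at once; it merely compares two versions word by word using the standard library's difflib. The function name collate and the two sample lines are invented for illustration.

```python
import difflib


def collate(base: str, other: str) -> list[tuple[str, str, str]]:
    """Return (kind, reading_in_base, reading_in_other) for each variant.

    Splitting on whitespace is a simplifying assumption; a real collation
    tool needs script-aware tokenization and must align many witnesses.
    """
    a, b = base.split(), other.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":  # keep only the places where the versions differ
            variants.append((kind, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Two invented versions of one line (not Tagore's text).
version_a = "the sky is full of stars tonight"
version_b = "the sky is filled with stars"

for kind, left, right in collate(version_a, version_b):
    print(f"{kind}: '{left}' -> '{right}'")
```

            Even this toy version shows the shape of the output a reader of variants wants: the places where the versions differ, with the reading of each version side by side.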
            It is an archive, not an edition. At one point Chaudhuri modestly calls it a "mere archive" to explain why the site does not explain the genetic process or explicate the significance of textual variants--except for a few examples to show the potential. He rightly points out that that would be a major project in itself. The site enables that kind of work; it does not do it for us. There is nothing "mere" about this archive. For the first time, persons interested in Tagore can read any one of dozens of versions of his works, can read rare works, can read works in the context of collections of Tagore's works or as originally printed, and can read either the images of original publications or the transcripts made of them so that they can be searched by computer. And readers can read manuscripts not only of works that were (mostly) published, but also of versions that were never published.
            Suppose, however, you are not interested in Tagore. You can still learn much about the Bengali language and its particular difficulties for keyboards, printing presses, and software for searching and collating. Even questions about fonts receive careful attention. In the absence of adequate software environments for major literary virtual archives (even for Roman alphabet languages), the Bichitra project invented its own standards for imaging, for transcriptions, and for collations. Everyone with a large text project confronts the delight and disaster of OCR (Optical Character Recognition), which even at 98% accuracy produces an average of two errors per 100 characters (counting spaces), or 40 to 50 errors per page. And OCR is of no use at all for manuscripts, which have to be transcribed manually. Bichitra represents major accomplishments of interest to digital humanists everywhere--if they can just overcome their lack of interest in Tagore or Bengali. Ignorance is a comfortably debilitating condition, bliss--sort of.
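            The OCR arithmetic is easy to check. The short Python sketch below works it out; the page sizes of 2,000 and 2,500 characters are assumptions, chosen because they are what the figure of 40 to 50 errors per page implies at 98% accuracy, and the function name is invented for illustration.

```python
def expected_ocr_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of wrong characters on one page.

    accuracy is the per-character accuracy as a fraction (0.98 for 98%).
    """
    return chars_per_page * (1.0 - accuracy)


# Assumed page sizes in characters, counting spaces.
for chars in (2000, 2500):
    errors = round(expected_ocr_errors(chars, 0.98))
    print(f"{chars} characters at 98% accuracy: about {errors} errors")
```

            Across an archive of tens of thousands of printed pages, that per-page rate is why OCR output still needs careful proofreading.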
            For me the major accomplishment of the Tagore archive is the images of (almost) every version of every work.  Digital collections of transcriptions are not archives, regardless of what anyone may claim for them.  A transcription is a copy, a reset copy.  It is different from its source text in every character because it is a copy susceptible to error at every character; it is not the original, it is not the same.   Of course, a digital image is a copy also, but it is at least visually accurate.  No one says that a picture of a person is the person.  None should say that a picture of a book is the book.  But digitally, images are as close as technology can get to providing surrogates for the material originals.   Bichitra's crown jewels are its images.  No institution has all the documents; but in this website they are collected, photographed, and mounted.  That is not only great for Tagore studies, but for all aspiring digital archives.  The process, the cameras, the lighting, the negotiations for permissions to photograph, and the alternatives for storing, archiving and displaying images are all so complex that anyone wanting to create a sophisticated archive website will learn much from the Bichitra experience.   But it is so much more.  Images cannot be searched, analyzed or collated.  For these operations transcriptions are needed.  Bichitra provides them. 
             Those last three words were so easy to write.  Over 47 thousand pages of manuscript made transcription anything but easy.  The chapter on manuscript transcription is easily the longest and most interesting because it deals so openly and sensibly with an extremely complex problem.  Most readers will soon get over their unfamiliarity with the language as they get deeper and deeper into considerations of what every manuscript transcriber has experienced.   Transcription is detective work, interpretive work, philosophical work, and practical work.  Before the end of the day, decisions have to be made about how to proceed.  Tagore was a rapid writer and inexhaustible reviser.  Some of his assistants learned to emulate his hand.  Is it a nightmare or a fertile field?  Chaudhuri seems to know that it is the former but he treats it as the latter.
            Every project director and every technical officer and computer science partner on a digital archive project will benefit from reading chapters 6 through 9 in particular.  Chapters 6, 7, and 8 do not shy from technical detail but even technically challenged textual scholars should have no difficulty understanding them.  
            They recount first the task of organizing the file structures required to keep track of hundreds of thousands of individual files of transcriptions and images.   The project team devised a new content management system because there was none to hand adequate for the job. The description of Tagore’s tangled bibliography is merely prelude to describing the organizational system that brought digital order to it.  Next they tackle the job of providing indexing and search capabilities to the website. Third, they describe the construction and function of a collation program that will handle Bengali language and multiple versions.   These three back-end systems and tools represent a formidable accomplishment; given the time in which it was done it is like a miracle.  
            Chapter 9 describes the front end--the user interface design and functions.  Given the intricate and orderly content management system, display of content for the user is potentially infinitely malleable.  The achieved system is not perfect but it is more than a very good beginning.  Nevertheless, the project was launched at a significantly high plateau of achievement.
       Chapter 10 treats the entire project as a good start and addresses three areas for improvement: additions to the content, improvements to the internal synchronization of images and transcriptions, and additional analytical tools and uses for the content. The project thus fulfills the expectations of modern modular project structures, rejecting the intricate monoliths of early electronic projects. It is extendible.
      The book begins and ends with acknowledgements to those who constructed or supported the project.  It is fitting that this description of so large a project, with such high standards, should begin and end so.  It takes a village to build a digital archive.