Bichitra:
The Making of a Tagore Website
Sukanta Chaudhuri (with others)
There are, in the world, more native
speakers of Bengali than of Russian, Japanese, German, French, or Italian. There is only one Bengali writer who has won
the Nobel Prize for Literature, but the archive of his writings is larger than
Shakespeare's, Goethe's, Proust's, or Faulkner's. His name is Rabindranath Tagore, poet,
novelist, historian, dramatist, painter, sculptor, composer, educator,
translator. His archive of manuscripts
and printed works, amounting to over 140,000 pages, is the first major writer's
archive to be entirely (almost) digitized and posted to the Internet--"almost"
because 40 rare books out of 450 books and 300 out of 3,200 journal items could
not (yet) be obtained for reproduction. The
virtual archive was accomplished in two years by a team of 30-plus researchers
and computer programmers funded primarily by the Indian government, which found
itself justly proud of its Nobel Laureate on the occasion of his 150th birthday
in 2011.
How they did it and why you should care is the subject of a new book, Bichitra: The Making of a Tagore Website, by
the project director Sukanta Chaudhuri.
Readers of Chaudhuri's book, The
Metaphysics of Text, are familiar with his elegant and clear prose, his
attention to detail, his self-effacing grace, and his incredible stamina. Most of the world needs this book because we
don't know Tagore well enough, we don't know Bengali, and we don't know how to build
or use virtual archives. The onus is on
us but Bichitra, the book, makes it
easy to find out.
The first step is to understand the
importance and achievements of Tagore himself.
He is a recognized world figure, but few will know that his works (he
wrote in both Bengali and English) exist in multiple versions. Sometimes he turned a play into a novel or
vice versa, or he incorporated poems into novels or other works. Sometimes his works were both collected and
anthologized under his supervision, for which he made changes. Sometimes he wrote the same work (more or
less) in both Bengali and English. But
more often he was discovering new things to say with his already written
works--he changed his mind--or finding a better way to say what he originally
thought. The richness of Tagore's
archive for the study of the genesis of thought and of literary works is
unsurpassed by any writer anywhere. That
is why it is called Bichitra, the various, the curious, the bizarre.
Obviously
a reader needs more than just this book to explain Bichitra, the website. One needs to be able to work one's way around
in the archive. So, there are tools:
search engines and a concordance engine bring Tagore's words and subjects
together. A bibliography with links to
every form of each work aggregates the related materials. A collation program identifies the variants
in the different forms of each work.
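
To make the idea of a concordance engine concrete, here is a minimal keyword-in-context sketch in Python. It is not the Bichitra implementation, which the book describes in its own terms; the function name `build_concordance`, the `width` parameter, and the two sample sentences are invented for illustration. The tokenizer is an assumption that suits only the English sample: Python's `\w` may split Bengali words at vowel signs, so a real Bengali concordance would need script-aware tokenization.

```python
import re
from collections import defaultdict

def build_concordance(texts, width=2):
    """Map each word to a list of (version, context) pairs.

    `texts` maps a version label to its plain text; the context keeps up
    to `width` words on either side of the keyword.
    """
    index = defaultdict(list)
    for version, text in texts.items():
        # Naive tokenizer: adequate for the English sample only (assumption).
        words = re.findall(r"\w+", text.lower())
        for i, word in enumerate(words):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            index[word].append((version, f"{left} [{word}] {right}".strip()))
    return index

# Invented sample versions, not quotations from Tagore.
versions = {
    "draft": "the river runs down to the sea",
    "print": "the old river runs slowly to the sea",
}
concordance = build_concordance(versions)
for version, context in concordance["river"]:
    print(version, "|", context)
```

Each hit carries the label of the version it came from, which is the property that lets a concordance bring the same word together across different forms of a work.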
It is an archive, not an
edition. At one point Chaudhuri modestly
calls it a "mere archive" to explain why the site does not explain
the genetic process or explicate the significance of textual variants--except
for a few examples to show the potentials.
He rightly points out that doing so would be a major project in itself. The site enables that kind of work; it does
not do it for us. There is nothing
"mere" about this archive. For
the first time, persons interested in Tagore can read any one of dozens of
versions of his works, can read rare works, can read works in the context of
collections of Tagore's works or as originally printed, can read the images of
original publications or the transcripts made of them in order to be computer
searchable. And readers can read
manuscripts, mostly of published works but also of versions that were never
published.
Suppose, however, that you are not
interested in Tagore: you can still learn much about the Bengali language and its
particular difficulties for keyboards, printing presses, and software for
searching and collating. Even questions
about fonts receive careful attention.
In the absence of adequate software environments for major literary
virtual archives (even for Roman alphabet languages), the Bichitra project
invented its own standards for imaging, for transcriptions, and for
collations. Everyone with a large text
project confronts the delight and disaster of OCR (Optical Character
Recognition), which even at 98% accuracy produces an average of two errors per 100
characters (counting spaces), or 40 to 50 errors per page. And OCR is of no use
at all for manuscripts, which have to be transcribed manually. Bichitra represents major accomplishments of
interest to digital humanists everywhere--if they can just overcome their lack
of interest in Tagore or Bengali.
Ignorance is a comfortably debilitating condition, bliss--sort of.
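
The OCR arithmetic above can be checked with a short Python sketch. The page lengths of 2,000 and 2,500 characters are an assumption: they are simply the sizes implied by 40 to 50 errors per page at two errors per 100 characters, not figures reported in the book. The function name `expected_errors` is likewise made up for this example.

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of misread characters on one page."""
    return chars_per_page * (1.0 - accuracy)

# Assumed page lengths (characters, counting spaces); see the note above.
for chars in (2000, 2500):
    print(chars, "characters:", round(expected_errors(chars, 0.98)), "errors")
```

The point of the calculation is that an accuracy figure that sounds high still leaves dozens of corrections on every page.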
For me the major accomplishment of
the Tagore archive is the images of
(almost) every version of every work.
Digital collections of transcriptions
are not archives, regardless of what anyone may claim for them. A transcription is a copy, a reset copy. It is different from its source text in every
character because it is a copy susceptible to error at every character; it is
not the original, it is not the same.
Of course, a digital image is a copy also, but it is at least visually
accurate. No one says that a picture of
a person is the person. None should say
that a picture of a book is the book.
But digitally, images are as close as technology can get to providing
surrogates for the material originals.
Bichitra's crown jewels are its images.
No institution has all the documents; but in this website they are
collected, photographed, and mounted.
That is great not only for Tagore studies but for all aspiring digital
archives. The process, the cameras, the
lighting, the negotiations for permissions to photograph, and the alternatives
for storing, archiving and displaying images are all so complex that anyone
wanting to create a sophisticated archive website will learn much from the
Bichitra experience. But it is so much
more. Images cannot be searched,
analyzed or collated. For these operations
transcriptions are needed. Bichitra
provides them.
Those last three words were so easy to write. Over 47,000 pages of manuscript made
transcription anything but easy. The
chapter on manuscript transcription is easily the longest and most interesting
because it deals so openly and sensibly with an extremely complex problem. Most readers will soon get over their
unfamiliarity with the language as they get deeper and deeper into
considerations of what every manuscript transcriber has experienced. Transcription is detective work,
interpretive work, philosophical work, and practical work. Before the end of the day, decisions have to
be made about how to proceed. Tagore was
a rapid writer and inexhaustible reviser.
Some of his assistants learned to emulate his hand. Is it a nightmare or a fertile field? Chaudhuri seems to know that it is the former
but he treats it as the latter.
Every project director and every
technical officer and computer science partner on a digital archive project
will benefit from reading chapters 6 through 9 in particular. Chapters 6,
7, and 8 do not shy from technical detail but even technically challenged
textual scholars should have no difficulty understanding them.
They recount first the task of
organizing the file structures required to keep track of hundreds of thousands
of individual files of transcriptions and images. The project team
devised a new content management system because there was none to hand adequate
for the job. The description of Tagore’s tangled bibliography is merely prelude
to describing the organizational system that brought digital order to it.
Next they tackle the job of providing indexing and search capabilities to
the website. Third, they describe the construction and function of a collation
program that will handle Bengali language and multiple versions.
These three back-end systems and tools represent a formidable
accomplishment; given the time in which it was done it is like a miracle.
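
For readers who have not met a collation program, the following Python sketch shows the basic operation on two versions of a line. It uses the standard library's `difflib.SequenceMatcher`, which is a stand-in chosen for brevity and not the algorithm the Bichitra team built; the function name `collate` and the sample sentences are invented. Splitting on whitespace is an assumption that works for any script, Bengali included, but it leaves punctuation attached to words, and it compares only two versions at a time where the project's program handles many.

```python
from difflib import SequenceMatcher

def collate(base, witness):
    """List word-level variants between a base text and one witness.

    Each variant is (kind, base_words, witness_words), where kind is
    'replace', 'insert', or 'delete'.
    """
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the places where the versions differ
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants

# Invented sample versions, not quotations from Tagore.
for variant in collate("the river runs down to the sea",
                       "the old river runs slowly to the sea"):
    print(variant)
```

The output records one insertion ("old") and one substitution ("down" becomes "slowly"), which is the kind of variant list a collation tool reports for each pair of versions.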
Chapter 9 describes the front end--the
user interface design and functions. Given the intricate and orderly
content management system, display of content for the user is potentially
infinitely malleable. The achieved system is not perfect, but it is more
than a very good beginning: the project was launched at a
significantly high plateau of achievement.
Chapter
10 treats the entire project as a good start and addresses three areas for
improvement: additions to the content, improvements to the internal
synchronization of images and transcriptions, and additional analytical tools
and uses for the content. The project,
thus, fulfills the expectations of modern modular project structures, rejecting
the intricate monoliths of early electronic projects. It is extendible.
The
book begins and ends with acknowledgements to those who constructed or
supported the project. It is fitting that this description of so large a
project, with such high standards, should begin and end so. It takes a
village to build a digital archive.