Friday, June 30, 2006

A Thousand Words or So

by Paul B. Wiener

All definitions of information may have only one thing in common: they assume it is recognizable. One definition I especially like says: it is a measure of how surprising something is. True, that comes from a protein engineering glossary at the Biomedical Centre in Uppsala, Sweden, but that doesn’t make it any less true. I have a special affection for surprising information because most of it can’t be searched for, and when it’s found, much of it can’t be described. Often it’s not even recognized until its effects are felt, like childhood. This makes someone like me, a librarian with a limited memory and a healthy skepticism about truth, feel a little less guilty for not always being able to answer the questions people ask me. Oh, I can find answers easily enough. Answers are the easy part of information. I just have to rephrase their questions sometimes, to make the answers work. How do you teach people to enjoy being surprised: by giving them what they expect?

One way is by visiting web sites with names that provide no clues about their content. And when the site appears, what’s visible also isn’t much of a clue. Not right away.

One such site is Dredge, produced by a group of electronic media students in our own Department of Art. (And fittingly, you won’t find it listed anywhere by using the anemic search engine on Stony Brook’s Home Page either, though I’m sure that wasn’t the students’ intention.) Virtually nothing on Dredge’s front page tells you what it is – a display site for artistic productions and experiments. Instead, you have to look for the hyperlinks across a large dark, teasing space until you decide that those things down there must be the links. The search process is prompted by the page design. Your “visual brain” is forced to overrule your “textual brain.” Many clueless sites are like Dredge, visually stunning, often textless pages made by people in the arts - photographers, cartoonists, graphics freaks, website designers, Adobe acrobats, hashers, neo-cartographers, poets, topologists - even scientists and farsighted young entrepreneurs like Alex Tew, who invented the Million Dollar Home Page. These people love the challenge of picturing those famous thousand words before you can even think of them. Think of the information gained as similar to what we learn from our dreams, even though they don’t have subtitles.

Another quirky graphic artist’s page that lets you decide how to find its informational content is Leif Parsons’ Page. And another is Ian Timourian’s Mandalabrot.net, which focuses on fractals, visual remixes, generative design, and many of the other new forms of art made possible by computer technology. The site Visual Complexity studies the visual display of information by, well, showing it off. It stuns you with its opening page that presents hundreds of unexplained proprietary information design templates. It offers some verbal encouragement too: “Functional visualizations are more than innovative statistical analyses and computational algorithms. They must make sense to the user and require a visual language system that uses colour, shape, line, hierarchy and composition to communicate clearly and appropriately, much like the alphabetic and character-based languages used worldwide between humans.”

Even political forums on the web can score points without using words, animating information to appeal to the newer kinds of “information literacy”. Many web sites and blogs, like GPrime.net, Molecular Expressions, and An Atlas of Cyberspaces, use text to introduce links to the latest text-free games, special effects photography, cartography, optical illusions, digital video and flash animations. And let’s not forget the latest craze, the ever-teasing YouTube. These sites offer learning experiences that can be visually instructive far more quickly than they can be explained – or justified - in words.

Information as most librarians and scholars know it uses symbols – not only words, but numbers, formulae, marks – as well as color and sound, to communicate and document experience. More importantly, most librarians use language to describe the symbols. Symbols called “words” tag ideas, facts, events, people, experiences, memories, feelings, observations. We use them to organize various attributes and similarities. The world thus described is sometimes called “recorded history,” sometimes “science,” sometimes “reality,” and sometimes “searchable.” What do we do about the information that cannot be so described - the recorded stuff that can only be perceived non-verbally? Do we translate it into words? How many translations (copies, messengers, media, generations, reproductions) will records survive before they lose “authenticity”, whatever that is? This problem long intrigued the philosopher Walter Benjamin, who applied it to works of art. But he wrote about it before the internet existed. One answer seems to be: some records survive translation better than other records, and better than most works of art.

There’s a fascinating website that draws attention to the paradoxical fact that art reproduced on the web, because it is “lighted from within,” is sometimes more beautiful than the original thing. Before you leave, take a look at Bibliodyssey, a blog about the beauty of book illustration, old and new. Strange concept - book illustration - isn’t it? Who needs it, especially today, when screens illuminate words everywhere? Illustrating books seemed normal enough once, but here you realize that you are celebrating “books” by reading a blog (there were no “blogs” three years ago) on a computer screen, no doubt using a Windows-based GUI (coded in letters, numbers and signals), and looking at a digital image – one which presumably can never degrade (since a digital fact weighs no more than an idea and can be dog-eared indefinitely) unless electrons themselves disappear into time…. Here’s a lovely image of a drawing (click!) that someone scanned from a 500-year-old manuscript, whose maker once had it painted there by an artist – at this moment the “real one” probably sits unknown and untouchable on a shelf in an ancient library in Rouen, a library morphing into a museum. If most information isn’t art, is it still subject to the kind of degradation that copying and translating from any medium produces? And if the image (or the book, or the movie: remember Fahrenheit 451?) remains in my memory long after the printed and digitized one disappears, will it still be authentic?

Wednesday, June 07, 2006

Truth without Consequences

Google has introduced one of those new search applications that’s just the kind of thing that makes many librarians distrust Googled information. It’s called Google Trends and it provides context-free information that seems to have no bearing outside the arcane world of search itself. But no one ever accused Google of humility. This new engine is a seductive toy that at this stage promises much more than it delivers. Here’s how it works: you search a word or short (parenthesized) phrase, put a comma after it, then select another (and another three, if you like) to compare it to. Google Trends then tells you how relatively frequently those terms were searched over a few years (on a graph with no scales), as well as what countries or cities they were searched in, and in what language (based on the web version of Google). Some frequency highlights are correlated to current events, but other than that no explanations are provided, or even suggested, as to why something is trending in any direction. Or as to how many searches the trend is based on – 866 or 435,205. Some students may welcome this, for who can doubt that a reason exists for a trend? And a halfway decent student can find reasons much faster than he can find facts.

What are the reasons for interesting trends? A search comparing the terms “stony brook,” “sunysb,” “stonybrook” and “stony brook university” shows us that the single word stonybrook is the second search term of choice used in the US, and that mostly English speakers use it, that India is by far the place where the second highest number of searches came from, that Chinese was easily the second most used Google site for this search, that the phrase Stony Brook is used everywhere much more than the other terms seeking “Stonybrookness”, and that outside the campus community, almost no one ever searches with the term sunysb. You can spend hours discovering similar fascinating, puzzling and potentially meaningless trends by comparing all kinds of things, places, names, even numbers. For example, try comparing “0, zero and nothing.” Or “da Vinci, da Vinci Code, Dan Brown and Leonardo.” Which of these five concepts do you think is most searched: “truth, fact, fiction, information and myth?”

۞ I forget just how I came upon this next site, The Athanasius Kircher Image Gallery, at the Stanford University Library. Probably it was randomly from a blog listing. Here are some literally fabulous rarely-seen illustrations of rarely-seen phenomena created by the great, albeit unknown, 17th-century German Jesuit polymath. What, you never heard of him? You’ll need to download the DjVu plug-in to see his work. But why were they digitized? The site has impeccable bona fides and links to some fascinating pages about the man, like The Athanasius Kircher Project at Stanford University, The Correspondence of Athanasius Kircher, The Societa Italiana di Storia della Scienza in Florence, Project Director Michael John Gorman, a lecturer in the history of science (at one time) at Stanford and now a scholar living in Dublin, Ireland, the program of a 2001 Colloquium on Kircher, a gloss on an Exhibition of his Baroque Encyclopedia, an article about Kircher scholarship from the Chronicle, and Wikipedia’s inevitable page on the man. And of course there’s The Proceedings of the Athanasius Kircher Society, which renders the obscurity behind Kircher’s genius into blog-like clarity.

About the only thing I haven’t been able to find is something that explains how his genius impacted the world - though he did make an appearance in an Umberto Eco novel and corresponded with over 750 people. Most of the web sites about Kircher seem full of superficial details. I suppose it takes a certain kind of genius to publish and illustrate dozens of books on many major and arcane subjects in his lifetime, but where’s the rest of him? I haven’t been able to see any of his actual correspondence online (server problems), but there’s no doubt that the images displayed in the Gallery are exotic, deliberately impossible, funny, skillfully drawn and suggestive. It is good to know that we can find images of his Musurgical Ark or his Tarantula and the Musical Antidote to its Poison, even if we aren’t told what they mean. Much of Kircher’s quoted text is untranslated, vague, given no context, written in strange symbols, or dependent on sources that only a grant could verify. In fact, it all would make more sense if we knew it was a hoax, but it isn’t. Still, you won’t find these images on a proprietary database. Who would expect you to pay to see his works? No one. Kircher’s fascinating inaccessibility is being paradoxically extended by something libraries will always do well: preserving the virginity of information for its marriage to scholarship.

۞ Statistics don’t lie, they just never tell the whole truth. What reference librarian hasn’t had a student come up and say something like “I need statistics on how many unmarried Japanese men under 40, over 5’8” and under 150 pounds, living abroad and earning less than $50,000 a year missed the train to work because they were shaving on Tuesday?” While there are librarians who attempt to look for answers to such questions, using the usual resources, I rarely do. Instead of telling them the correct answer - 7,387 (J. Ethnic Shaving Sociometrics) - which they never believe, I try to explain that there simply can’t be statistics for such facts. What I can’t say is that I refuse to spend months of searching, analyzing, computing, cross-linking, interpreting and translating to find out that there is no answer. But they wouldn’t accept that either. Why should they, when everyone knows there are fantastic sites offering statistics about baseball, the census, libraries, pregnancy, literacy, jobs, mortality rates, crime, food and movies? Many excellent sites gather tens of thousands of statistical studies and let you believe you can search them. Go ahead, make their day. Have you ever looked at StateMaster or Statistical Universe? If the numbers are there, someone has probably crunched them. But that’s the catch: where do the numbers come from?

It simply doesn’t occur to most people that no one sits around at a desk all day at the center of the world, keeps statistical track of everything happening in the universe at a given moment and relates it to everything else that happens. Only one being does that. For the rest of us, there’s not enough time to track what occurs in time. Maybe Google is working on it. One of the strange lessons of information science is that there are statistics for everything under the sun but what you are looking for. Absorb this lesson, and statistics become less intimidating, almost recreational. Hank Aaron has the all-time home run record? Yes, and he also grounded into double plays – 328 – more than any other player in history – except Cal Ripken, Jr. We know this because there are such things as double plays, innings, outs, and rules in a game called baseball. When consequences rule life, statistics will always come to bat. There is such a thing as an infant death, as box office receipts, as employment rates, as book circulation, as starvation. And then there’s everything else. But what about those expatriated Japanese men who shave every Tuesday? Surely they exist, surely they matter. But where are the statistics? Wasn’t anybody watching these guys? When all the information in the world in all languages - from books, articles, indexes, statistics, reports, libraries, institutes, laboratories and think tanks - is linked, will librarians be able to answer such questions? Of course not.

Paul B. Wiener

1 June 2006