My goals for DigiWriMo have been very modest. I’ve never aimed to hit the 50,000 words / month mark, really, or even the 1,500 words / day, thanks largely to my teaching and job-hunting schedule for the term. About 70% of the academic jobs I am applying to have deadlines this month, which coincides nicely (how did they know?) with three heavy grading cycles. My schedule puts me on campus about 8-7 four days a week, with very little “me” time during those hours. And other than a partial reprieve this weekend, my month is organized so that I have anywhere from 50-100 papers awaiting grading at any given point. It’s been profoundly difficult to switch out of grading mode and into profound thoughts / experimental writing mode when I finally reach a stopping point on the weekends. Oy.
That doesn’t mean that I’m not keeping the gears turning / working on other projects and ideas. Rather than focus on word count, I decided to use the month to encourage myself to actively experiment with web-based tools that I’ve only ever dabbled with, or have perused but never interacted with, such as Wordle, Storify, and Tumblr. I’ll blog about / link to those experiments at various points, and I’m leaning towards using Storify to curate some of these experiments going forward. As more and more digital humanities folks are working with / in Google+, I’m going to finish developing a profile there and see what comes of it, although I should point out that my reluctance in that department largely comes from the fact that my techno-phobic mother, whom I love very much, is already on Google+, and I have NO idea what to make of that.
For those who are unfamiliar with Wordle, it’s a handy little web tool that produces visual word clouds out of texts that you input directly. Simply type your chosen text into a box, or provide a weblink to a source text, and Wordle crunches the numbers behind the scenes, producing an image where the most frequently occurring words in that selection dominate the visual map, while the lesser-used words fade into the background. The result is a sometimes-startling visual representation of the way specific linguistic signifiers dominate a given text.
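For anyone curious about the counting step, here is a minimal sketch in Python of the kind of tally a word cloud depends on. I do not know how Wordle itself does its counting, so the function name `word_frequencies` and the rule for what counts as a "word" are my own assumptions for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=10):
    """Count how often each word occurs, ignoring case (illustrative helper)."""
    # Assumption: a "word" is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    # most_common returns (word, count) pairs, largest counts first.
    return Counter(words).most_common(top_n)

sample = "To be, or not to be, that is the question."
print(word_frequencies(sample, top_n=3))
```

A word cloud then only has to draw each word at a size proportional to its count.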
Wordle has long seemed to me to be an excellent way to use quantifiable data to create more visually interesting / visually rich representations of language. In an effort to combine Wordle with a computational knowledge engine, I first went to Wolfram Alpha, which, as I noted in a post last spring, has indexed the complete works of Shakespeare. I ran a search on Hamlet and took note of the top ten most frequently capitalized words: “I,” “O,” “Hamlet,” “I’ll,” “Who,” “Laertes,” “Horatio,” “Denmark,” “England,” and “Ophelia.” As I had not yet used Wordle’s advanced functions, which allow you to input words with numeric values representing their frequency within a given text, I did a little math (yes, I know…I survived, though, don’t worry) to first quarter the total word count of the play, then applied a preserved-ratio figure for each word, in order to reduce the number of times I would have to physically repeat each word in a given text in order to maintain the frequency. The visual result was quite startling:
“I” leaps out of the image, dominating the screen in every formation that I looked at. Not that this is altogether so surprising — after all, this is the play about the character who is always talking about himself (although the first person pronoun does not appear in his famous soliloquy). However, I was really not convinced that I was working with the most interesting or even useful data set. After plugging a couple of other plays (Macbeth, All’s Well That Ends Well) into Wolfram Alpha and getting similar results, I felt that these data sets might be more useful when comparing the frequency and percentage of “I” or the proper names of characters, between playwrights in the period — in other words, what are the frequencies and percentages across the works of Jonson in comparison to Shakespeare? Since Wolfram Alpha does not yet have Jonson’s works indexed (and why not, I’d like to know!), this did not really look feasible.
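For readers who would like to reproduce the ratio-preserving shortcut I used to build that first cloud, here is a rough sketch. The function names and the divisor of 4 are assumptions that mirror my "quartering" approach, and the counts are placeholder numbers for illustration, not the actual figures from Wolfram Alpha.

```python
def scale_counts(counts, divisor=4):
    """Divide every count by the same number so the ratios stay roughly intact."""
    # max(1, ...) keeps rare words from vanishing after rounding.
    return {word: max(1, round(n / divisor)) for word, n in counts.items()}

def build_cloud_input(scaled):
    """Repeat each word as many times as its scaled count, ready to paste."""
    return " ".join(" ".join([word] * n) for word, n in scaled.items())

# Placeholder numbers for illustration only, not real counts from Hamlet.
example_counts = {"I": 400, "O": 80, "Hamlet": 60}
scaled = scale_counts(example_counts)
print(scaled)
print(len(build_cloud_input(scaled).split()), "words to paste")
```

Because every count is divided by the same number, the relative sizes in the cloud should stay about the same while the pasted text shrinks to a quarter of its length.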
Aha! Much more interesting results (perhaps)! The most prominent words are all proper nouns, with “Hamlet” being the most dominant name, followed in this instance by (it would seem) Gertrude or Polonius. Again, this stands to reason: after all, this is the act that ends when Hamlet mistakenly kills Polonius in Gertrude’s bedroom, following an intense conversation over her possible complicity in the death of “Old Denmark,” Hamlet Sr. But again, as I looked over the graph, I realized that these results actually signaled a problem with the data set. Simply cutting-and-pasting the electronic text necessarily included all of the dialogue markers, meaning every time Hamlet uttered a single line, his name appeared in the text. The results are still significant (Hamlet is doing most of the talking in the act, and when he’s not, he is being named as the subject of discussion / interest in other conversations) but not particularly startling.
One more try, then. This time, I painstakingly deleted all dialogue markers, all stage directions — everything that was not pure dialogue. This was the result:
So there we have it — the most frequently occurring word in act three of Hamlet is “Lord.” And again, as I reflected on the content of the data set, the scenes in the third act, and the general language of the play, this was hardly surprising. All but a very small percentage of the play takes place at Elsinore, and the dialogue tends to occur between Hamlet and his compatriots (who address him as “my lord”), between Claudius and his courtiers / underlings, who address him likewise, with some exchanges between Hamlet, Claudius, and Gertrude as well.
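The cleanup I did by hand (deleting dialogue markers and stage directions before counting) could probably be automated. The sketch below assumes a particular e-text layout, with speaker labels in capitals followed by a period at the start of a line and stage directions in square brackets. Your source text may well be formatted differently, and the sample is a toy snippet in that assumed format, not a faithful transcription of any edition.

```python
import re
from collections import Counter

def strip_markup(play_text):
    """Remove bracketed stage directions and leading speaker labels."""
    # Assumption: stage directions appear in [square brackets].
    text = re.sub(r"\[[^\]]*\]", " ", play_text)
    # Assumption: a speaker label is an all-caps name plus a period at the
    # start of a line, e.g. "HAMLET." or "QUEEN GERTRUDE."
    return re.sub(r"^\s*[A-Z][A-Z ]*\.\s*", "", text, flags=re.MULTILINE)

def top_words(text, n=3):
    """Return the n most frequent words in the cleaned text."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(n)

sample = """HAMLET. Now, mother, what's the matter?
QUEEN GERTRUDE. Hamlet, thou hast thy father much offended.
[Polonius hides behind the arras]
POLONIUS. My lord, the queen would speak with you."""

cleaned = strip_markup(sample)
print(cleaned)
print(top_words(cleaned))
```

Note that the name "Hamlet" survives where Gertrude actually speaks it, which is the point: only the labels and directions are dropped. A pattern this simple would also misfire on any line of dialogue that happens to begin with a capitalized word and a period, so I would still spot-check the output.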
Ultimately, I went into this experiment hoping to see how Wolfram Alpha could be combined with a tool such as Wordle to create a visual map representing the high frequency occurrence of a unique and pointedly interesting signifier — all right, I’ll admit it, I wanted to see something pointing towards Hamlet’s vaunted interiority and penchant for brooding on melancholy visions of reality. And perhaps, if / when I take the time to strip the entire play of everything but dialogue and create a new word cloud, such a signifier may become more prominent, although I suspect the results would actually be quite banal. But I did accomplish my goal of using these two web-based tools to conduct a form of research and experimentation that I generally do not practice, in this case on a text that I know extremely well. This might produce material for a more traditional scholarly paper, examining the relationship between courtly manners and sincerity in the way Hamlet and Claudius interact with other characters in the play; however, my strongest urge at this time is to sit back down and reread the play with this fresh perspective in mind. In short, the word clouds have provided me with a new lens through which to consider the play.
I’ll close with a reflection on the way textual searches and quantitative data sets have the potential to reveal new patterns while also highlighting potentially old patterns that would not be easily visible through basic textual comparison. A few years ago, a job candidate presented an excerpt from their research project, arguing that the relationship between two seemingly mundane words in a variety of sixteenth-century texts, both literary and non-literary, pointed to the existence of a unique cultural attitude about certain physical properties of common household items of the period (I’m being deliberately vague here to keep identities private). At the end of the talk, which received a lot of praise, this person was asked what triggered their investigation, and they responded by noting that they kept seeing these words show up together in searches of online archives and databases, prompting them to wonder whether or not there was a worthwhile relationship there to pursue. It was interesting, they noted, that technology might have enabled their research to discover something that might have a lasting impact on the way scholars of the period understood something vital about popular culture of the time, which might not otherwise have come to light. The response from the audience was that technology may also have created a false relationship: search engines are not perfect, after all, and even the best are ultimately only able to find materials that pertain to the initial query, initiating a sort of Heisenberg effect in which intertextual relationships may seem to occur only because search queries request separate data sets that otherwise would never be paired together.
In my opinion, both perspectives are meritorious, although I tend to side with the final opinion that data-driven searches do not always reveal information that is meaningful, at least in the way we, as literary scholars and researchers, tend to think of textual relationships as being meaningful. If there is a simple takeaway here, I believe it to be this: technology has incredible potential to quite literally change the way we look at literary texts, but this does not mean abandoning our search for meaning in these texts. Instead, I suggest that we pause to reconsider what our criteria are for considering our results to be meaningful.