What does the story Alice’s Adventures in Wonderland “look” like to a digital text analyzer? Let’s find out using the free text-mining tool Voyant!
Digital text analysis tools give an overview of a document or a corpus by tracking the words in the text, revealing patterns that are difficult or too time-consuming to spot through close reading alone. Voyant provides a set of over 20 text analysis and visualization tools; it is open source, free, web-based and easy to use. In this Research Bridge post, we will explore some of the tools using a simple example.
Alice, White Rabbit, and The Mad Hatter
In the story of Alice’s Adventures in Wonderland, which words appear the most? When do different characters show up in the text? I uploaded the story to Voyant, using the text from Project Gutenberg. For a quick glance at the word frequencies, see the word cloud created by the Cirrus tool in Voyant.
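To make the idea of a word-frequency count concrete, here is a minimal Python sketch of the kind of counting that sits behind a word cloud. It is not Voyant's own code: the function name `word_frequencies`, the short sample text and the tiny stopword list are all made up for illustration (Voyant ships its own, much longer stopword lists).

```python
import re
from collections import Counter

# A tiny, made-up stopword list for illustration only.
STOPWORDS = {"the", "and", "a", "to", "of", "she", "it", "was", "in", "said"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word occurs, ignoring case and stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Hypothetical sample text, standing in for the full story.
sample = (
    "Alice was beginning to get very tired of sitting by her sister. "
    "The Rabbit said nothing, and Alice ran after the Rabbit. "
    "Alice fell down the rabbit-hole."
)

freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A word cloud then simply draws the most frequent words in larger type.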
Another tool, Bubblelines, shows word frequencies across successive segments of the text. See how the White Rabbit (the second line from the bottom) pops up at the beginning, the middle and the end, while the Cheshire Cat appears in the middle of the section with the Queen.
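The idea of tracking a word through segments of a text can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how Voyant computes its charts: the helper `segment_counts` is hypothetical, it handles single words only, and the sample "text" is a made-up string of character names.

```python
import re

def segment_counts(text, term, segments=10):
    """Split the text into equal-sized runs of words and count `term` in each."""
    words = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(words) // segments))  # ceiling division
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    return [chunk.count(term.lower()) for chunk in chunks]

# Hypothetical sample: 12 words, split into 4 segments of 3 words each.
sample = "rabbit alice alice queen alice cat queen queen rabbit alice queen rabbit"

for name in ("rabbit", "cat", "queen"):
    print(name, segment_counts(sample, name, segments=4))
```

Each list holds one count per segment, so a run of zeros followed by a burst of higher numbers corresponds to a character who is absent early on and then enters the story.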
The Collocates Graph displays terms that appear in close proximity to each other as a network. You can mouse over a term to see its frequency and collocate counts, or right-click a term to remove it. Try this out in the embedded Voyant snippet on this page.
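For readers curious about what "appearing in close proximity" means in practice, the following Python sketch counts the words that fall within a small window around a keyword. It is an illustration only: the function `collocates`, the window size of 2 and the sample sentences are assumptions made for this example, and Voyant's own settings and method may differ.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count words occurring within `window` words on either side of `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            start = max(0, i - window)
            # Words just before and just after this occurrence of the keyword.
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Hypothetical sample text.
sample = "The Queen shouted at Alice. Alice looked at the Queen and the King."

queen_collocates = collocates(sample, "queen")
print(queen_collocates.most_common())
```

In a network view, each keyword and each of its collocates becomes a node, and the counts determine which links are drawn.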
The Voyant-Tools Interface
On the Voyant entry page, you can upload one or more text documents, or enter URLs. Once you upload a document or a corpus, Voyant shows you a default view of five tools. To switch to another tool, use the menu at the top of a tool pane (the circled part). You can bookmark or export the URL of the Voyant page, and export the results from any tool pane.
Apart from the few tools explored here, other tools can analyze your text in various ways and present the results in a variety of graphs. Many of them are particularly helpful when your corpus contains multiple documents. For example, RezoViz shows connections between people, places and organizations that co-occur in different documents. Trends is a line graph showing the distribution of a word’s occurrences across a corpus or within a document. These tools can help you find patterns across or within documents. For instance, you could use Voyant Tools to compare word use across the literary works of one author or several authors, or to trace changes in vocabulary over time in historical documents.
If you have any interesting ways to use Voyant, or know of other analytic tools you want to share with fellow researchers at HKUST, write to us at Research Support Services!
— By Gabi Wong, Library
last modified November 6, 2019