Friday, April 27, 2012

Text analysis with WordSmith

What you produce:


A text analysis of State of the Union addresses.
Resource: Text Analysis and Visualization Lib Guide:

What you need:


1. Text Analysis applications:


a) Voyant/Voyeur: web-based, incorporates analysis and visualization.
b) WordSmith: desktop application; export the results to Excel to visualize the data.
c) Excel

2. Texts:


a) Presidency Project: Obama & Clinton.
b) Project Gutenberg's SOU corpus: 1790-2001
c) Download all of these here: http://zoe.ats.ucla.edu/wordsmith_files.zip

How to proceed:



1. Let's look at an example of a text-analysis problem: TV ads for boys vs girls.
2. What would you need to perform this type of comparison?
3. Now, let's say you want to compare Obama State of the Union Addresses with Bush's. Design the experiment.

Step-by-Step in Voyant.


1. Obama vs Bush in Voyant.


2. What are the limitations of Voyant?
3. What do we gain by using WordSmith instead?
a) Customize the stop-list
b) Larger corpus
c) Other languages
d) Smarter texts: we can add tags, lemmatize, and do more preprocessing.
e) Collocates, phrases, etc.
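
To make "collocates" and "custom stop-list" concrete outside of either tool, here is a minimal Python sketch. It is not how WordSmith or Voyant work internally; the tokenizer, the window size, the sample sentence, and the tiny stop-list are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, window=4, stop_words=frozenset()):
    """Count the words that occur within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        # Skip anything on the (custom) stop-list.
        counts.update(w for w in neighbours if w not in stop_words)
    return counts

# Hypothetical sample sentence and stop-list, not taken from a real address.
sample = "The state of our union is strong. Our union endures because our people endure."
stop_words = {"the", "of", "is", "our"}
result = collocates(tokenize(sample), "union", window=2, stop_words=stop_words)
print(result.most_common())
```

Changing `stop_words` or `window` changes which neighbours are counted, which is the kind of control item a) refers to.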

Step-by-Step in WordSmith.


WordList:
We need a large reference corpus to compare the individual texts against.
1. Wordlist for the batch
2. Wordlist for individual texts.
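
For anyone who wants to see what a word list amounts to, here is a rough Python sketch that builds one frequency list per text and one for the whole batch. The two short strings stand in for the downloaded files and the file names are hypothetical; WordSmith's own WordList tool offers far more than this.

```python
import re
from collections import Counter

def word_list(text):
    """Return a frequency list (word -> count) for one text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Placeholder texts; in practice these would be read from the .txt files.
texts = {
    "speech_a.txt": "Jobs and the economy",
    "speech_b.txt": "The economy and security",
}

# 2. One word list per individual text.
per_text = {name: word_list(body) for name, body in texts.items()}

# 1. One word list for the batch: the sum of the individual lists.
batch = sum(per_text.values(), Counter())

print(batch.most_common(3))
```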

StopList

Keywords:
1. Zip file of the individual texts.
2. Wordlist of the large (combined) batch.

(It won't find that many keywords because the texts are pretty similar. We need a bigger reference corpus.)
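
The idea behind keywords can be sketched in Python with a log-likelihood score, one common keyness statistic. This is an illustration of the general technique, not a copy of WordSmith's KeyWords procedure or its settings, and the word counts below are made up for the example rather than taken from the addresses.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word with frequency a in a study text of c tokens
    and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, min_freq=2):
    """Rank words that are relatively more frequent in `study` than in `reference`."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        if a < min_freq:
            continue
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up counts, for illustration only.
study = Counter({"health": 8, "the": 50, "jobs": 6})
reference = Counter({"health": 4, "the": 600, "jobs": 60, "war": 76})
ranked = keywords(study, reference)
print(ranked)
```

When a word has the same relative frequency in both word lists the score is 0, so a reference corpus that looks too much like the texts under study leaves few words standing out.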

Keywords II.
1. Bush & Obama (individual texts)
2. Larger corpus: all State of the Union addresses.

That generates some interesting keywords.
The output is in Excel, which can be turned into a chart.
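
If the keyword scores come from a script rather than from WordSmith, a CSV file is an easy way to get them into Excel or a Google spreadsheet for charting. A minimal sketch follows; the column names and the two rows are placeholders, not real results.

```python
import csv
import io

def write_keywords_csv(rows, handle):
    """Write (word, score) pairs as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["word", "keyness"])
    for word, score in rows:
        writer.writerow([word, f"{score:.2f}"])

# Placeholder rows, for illustration only.
rows = [("health", 25.9), ("jobs", 0.11)]

# An in-memory buffer is used here; to write a real file instead, use
# open("keywords.csv", "w", newline="") as the handle.
buffer = io.StringIO()
write_keywords_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```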

Hint: Publish gadget is here.
Hint: Google spreadsheet is here.
