Friday, September 21, 2012

Embedded KML viewer

The data and the visualization are from Jennifer Berdan's project, which she developed during an independent study with me (Spring 2012, SCAND 596; and Winter 2013, DH 596).

The file loads slowly, so be patient. If you move the time-slider from past to present, the icons that turn yellow indicate institutions that became co-ed.

Jennifer presented her work at the Women’s History in the Digital World conference, March 22-23, 2013, at Bryn Mawr College:

Friday, May 4, 2012

Topic modeling

What will you need?

1. MALLET GUI (it has been downloaded and installed on the classroom computers)
2. Texts:
a) State of the Union Addresses: 2005-2012
Download a zip file from here.

b) Random selection of 22 Eighteenth-Century texts
Download a second zip file from here.

Sample results

Example of results from a topic modeling approach to the State of the Union Addresses, 2005-2012: topics and their contributions. The full spreadsheet is here.


or, for DH201, here. Network diagram of topics in Bush vs. Obama.

Friday, April 27, 2012

Text analysis with WordSmith

What you produce:

a text analysis of State of the Union Addresses
Resource: Text Analysis and Visualization Lib Guide:

What you need:

1. Text Analysis applications:

a) Voyant/Voyeur: web-based, incorporates analysis and visualization.
b) WordSmith: desktop application, export as Excel to visualize the data.
c) Excel

2. Texts:

a) Presidency Project: Obama & Clinton.
b) Project Gutenberg's SOU corpus: 1790-2001
c) download all of these here:

How to proceed:

1. Let's look at an example of a text-analysis problem: TV ads for boys vs girls.
2. What would you need to perform this type of comparison?
3. Now, let's say you want to compare Obama State of the Union Addresses with Bush's. Design the experiment.

Step-by-Step in Voyant.

1. Obama vs Bush in Voyant.

2. What are the limitations of Voyant?
3. What do we gain by using WordSmith instead?
a) Customize the stop-list
b) Larger corpus
c) Other languages
d) Smarter texts -- we can add tags/lemmatize, do more massaging.
e) collocates, phrases, etc.
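
To make points (a) and (e) concrete, here is a minimal Python sketch of what a custom stop-list and a collocate count do. It is not how WordSmith works internally; the stop-list, the function names, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-list: extend it with words that are uninformative for your corpus.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "we", "our", "that", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count tokens, skipping anything on the stop-list."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def collocates(text, node, window=2):
    """Count the words that occur within `window` tokens of the node word."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Made-up sample sentence, not taken from any address.
sample = "We will rebuild the economy. The economy of our nation is strong."
freqs = word_frequencies(sample)
near_economy = collocates(sample, "economy", window=1)
print(freqs.most_common(3))
print(near_economy.most_common(3))
```

Changing STOP_WORDS changes which words rise to the top of the list, which is the kind of control a fixed, built-in stop-list does not give you.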

Step-by-Step in WordSmith.

We need a reference corpus (large) to compare to individual texts.
1. Wordlist for the batch
2. Wordlist for individual texts.


1. zip file of the individual texts
2. wordlist of the large (combined) batch.

(It won't find that many keywords because the texts are pretty similar. We need a bigger reference corpus.)

Keywords II.
1. Bush & Obama (individual texts)
2. Larger corpus: All state of the union addresses.

That generates some interesting Keywords.
Output is in Excel -- that can be turned into a chart.
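
If you are curious what a keyword comparison computes, the sketch below scores each word in a study text against a reference corpus with a log-likelihood statistic, a measure commonly used for keyness. Whether it matches the exact settings WordSmith uses in class is an assumption, and the function names and toy word lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness of a word seen `a` times in a study corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:  # treat 0 * log(0) as 0
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def keywords(study_tokens, reference_tokens, top=10):
    """Return the words that are relatively more frequent in the study text,
    ranked by log-likelihood (highest first)."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy token lists, invented for illustration.
study = ["jobs"] * 5 + ["america"] * 5
reference = ["america"] * 50 + ["war"] * 50
print(keywords(study, reference))
```

This also shows why similar texts yield few keywords: when a word has the same relative frequency in both corpora, its score is zero.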

Hint: Publish gadget is here.
Hint: Google spreadsheet is here.

Sunday, April 22, 2012

Network Analysis with Gephi

What you will produce: 

  • A network visualization: online version

    What you need

    • SOFTWARE: Gephi (free, open-source, works on Mac/Windows, combines analysis and visualization)
    • DATA: Two Data sets
      • Facebook (a choice)
      •  Download Icelandic manuscript network (CSV file)

    Steps for creating your own data from Facebook

     (your own account or a group you belong to)
    • Sign in to a Facebook account
    • Go to Netvizz:
    • Create your "personal friend network."
    • (Skip the checkbox in Step 1.)  This may take a while. 
    • Save the .gdf file
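
Before opening the .gdf file in Gephi, it can help to peek inside it. The Python sketch below assumes the usual GDF layout: a `nodedef>` header line followed by node rows, then an `edgedef>` header line followed by edge rows, with `node1` and `node2` as the edge columns. The sample data and the function name are made up; check your own file's header lines, since its columns may differ.

```python
import csv
import io
from collections import Counter


def parse_gdf(text):
    """Split GDF text into a list of node records and a list of edge records."""
    nodes, edges = [], []
    section, columns = None, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if row[0].startswith(("nodedef>", "edgedef>")):
            section = row[0][:7]   # "nodedef" or "edgedef"
            row[0] = row[0][8:]    # drop the "nodedef>" / "edgedef>" prefix
            # Each header cell looks like "name VARCHAR"; keep only the name.
            columns = [cell.split()[0] for cell in row if cell.strip()]
            continue
        record = dict(zip(columns, row))
        if section == "nodedef":
            nodes.append(record)
        elif section == "edgedef":
            edges.append(record)
    return nodes, edges


# Invented three-person network, for illustration only.
sample_gdf = """
nodedef>name VARCHAR,label VARCHAR
1,Anna
2,Bjorn
3,Chris
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
1,3
"""

nodes, edges = parse_gdf(sample_gdf)
degree = Counter()
for edge in edges:
    degree[edge["node1"]] += 1
    degree[edge["node2"]] += 1
print(len(nodes), "nodes,", len(edges), "edges")
print(degree.most_common(1))
```

The degree count here is the same number Gephi reports as "Degree" for an undirected network, so it is a quick sanity check on what you exported.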

    Workshop slides are here, or DH250 Slides here

    Feedback form is here

Thursday, April 12, 2012

Text analysis and visualization

Many Eyes
you will need: an account at ManyEyes

FIND: a (machine-readable) text
State of the Union Addresses: American Presidency Project

MASSAGE: clean it up, if need be

UPLOAD: copy and paste
We will make five visualizations:
  1. Word Cloud,
  2. Word Tree,
  3. Phrase Net,
  4. Tag Cloud,
  5. and then a second Tag Cloud that compares 2009 and 2012.
We will embed one of those visualizations in our blog.
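
The MASSAGE step above is usually done by hand in this workshop, but if you saved an address as a web page it can be scripted. Here is a minimal sketch, using only Python's standard library, that strips tags, scripts, and stray whitespace; the class name, function name, and sample page are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_html(html):
    """Return the page's text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps headings and paragraphs apart; it can also
    # split a word that has inline markup in the middle of it.
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()


# Made-up page fragment, for illustration only.
page = (
    "<html><body><h1>Address</h1>"
    "<p>My fellow Americans,\n we meet tonight.</p>"
    "<script>var x = 1;</script></body></html>"
)
plain = clean_html(page)
print(plain)
```

The cleaned string is what you would copy and paste in the UPLOAD step.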

What if I want to compare more than two texts?
(e.g. 2009, 2010, 2011, 2012)



We will upload four texts, and compare them.
We will explore some other Voyeur visualizations, e.g. Bubblelines.
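
Comparing several texts comes down to one number per text: how often a term occurs relative to the text's length. The sketch below computes that per 1,000 words for each year. It is an illustration of the idea, not of how Voyeur draws its charts, and the tiny "addresses" are invented placeholders.

```python
import re


def relative_frequency(text, term):
    """Occurrences of `term` per 1,000 tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(term.lower()) / len(tokens)


def trend(corpus, term):
    """Map each label (e.g. a year) to the term's relative frequency."""
    return {label: relative_frequency(text, term)
            for label, text in sorted(corpus.items())}


# Placeholder texts, not real addresses.
corpus = {
    "2009": "jobs jobs economy recovery",
    "2010": "jobs economy economy war",
}
jobs_trend = trend(corpus, "jobs")
print(jobs_trend)
```

Normalizing by length matters because the addresses differ in word count, so raw counts alone would favor the longer speeches.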

FIND: a (machine-readable) text(s)
Voyeur accepts a URL, plain-text, HTML, XML, and (some) PDFs.
State of the Union Addresses: American Presidency Project

MASSAGE: or download these


Other activities:

1. Project Gutenberg's State of the Union Addresses corpus. Use the URL for the HTML version and Voyant will upload it.