Treecloud of the 50 most frequent words
in Obama's presidential campaign speeches.
TreeCloud has a new website at treecloud.org!
In December 2007, Jean Véronis
published on his blog
a word cloud whose words were arranged on a tree reflecting their semantic proximity
within a corpus of newspaper articles.
The program below,
TreeCloud, builds such
tree clouds
for any text. Such pictures serve several purposes:
quick visualization of the overall content of a text (article, speech, report, book, ...),
discourse analysis, and comparison of texts through their tree clouds.
To learn more, please watch
this slidecast.
For the use of tree clouds in literary analysis, you can read
this article (in French).
If you use TreeCloud, please cite this page or:
Philippe Gambette, Jean Véronis:
Visualising a Text with a Tree Cloud,
IFCS'09 (supplementary material).
For feedback, questions, feature requests or bug reports about TreeCloud, please
contact me,
or leave a message on Jean Véronis's blog or
mine.
Downloads
- TreeCloud 1.3 (13/12/2009):
You can download the archive
Treecloud.zip and decompress it in any folder. To use it, you need
Python 2.x, Java, and
SplitsTree 4.10 on your system. The archive contains the following files:
- Treecloud.exe,
a graphical user interface for Windows.
- Treecloud.py,
the main Python script.
- TreecloudFunctions.py,
a Python library of functions used by Treecloud.py.
- the TreeCloud manual,
which explains how to install and use the program.
- English, French (adapted from the Dico stoplist)
and German stoplists; you can find other stoplists here.
- HISTORY.txt: changelog
- COPYING.txt: GPL License
- to visualize the location in the text of the words which appear in the tree cloud,
you can use AntConc
(especially the "concordance plot" tab).
- Cooccurrence,
an optimized C program by Jean-Charles Bontemps, which computes the nexus file
of cooccurrences very quickly. He also provides
a web interface
to create tree clouds, in beta version.
- User-friendly TreeCloud 0.6 for Windows
(Delphi 6 sources;
see in particular UPGMA and EqualAngle here,
in the functions UPGMA, sortLeaves, computeDrawing and draw;
changelog).
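To illustrate what counting cooccurrences involves, here is a minimal Python sketch. It is not the code of Cooccurrence or of Treecloud.py: the sliding-window scheme, the function name cooccurrence_counts and the width parameter are assumptions made for the example, and it stops at raw counts without writing a nexus file.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tokens, words, width):
    """Count, for each pair of target words, the sliding windows containing both.

    tokens: list of words of the text, in order.
    words:  the words kept for the tree cloud (hypothetical parameter).
    width:  window size in tokens (hypothetical parameter).
    """
    words = set(words)
    counts = Counter()
    # A text shorter than the window is treated as a single window.
    for start in range(max(1, len(tokens) - width + 1)):
        window = tokens[start:start + width]
        present = sorted(words.intersection(window))
        # Pairs are stored in alphabetical order, so (a, b) and (b, a) coincide.
        counts.update(combinations(present, 2))
    return counts
```

For example, cooccurrence_counts("a b c a b".split(), {"a", "b", "c"}, 2) counts the pair ("a", "b") twice, since it appears in the first and in the last window.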
Next versions
I'm seriously considering the following ideas for the next version of TreeCloud,
as they are easy to implement:
- output a regular tag cloud instead of just providing an input file for
TagCloud Builder,
- multiple concordance: given a set of words as parameters, TreeCloud retrieves
their context (x words before, x words after) in the text, outputs it to a file,
and also uses that file as input to build a tree cloud focused on this set of words
(idea suggested by David Barrowcliff),
- list of word changes: define sets of words which are each replaced by
a single word, for example "love", "passion" and "feeling" are all replaced by "love"
in a preprocessing step (idea suggested by Delphine and other users).
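The last two ideas can be sketched in a few lines of Python. This is only an illustration under assumptions, not code from Treecloud.py: the function names tokenize, apply_replacements and extract_contexts, the regular-expression tokenizer and the dictionary format for the word groups are all hypothetical.

```python
import re


def tokenize(text):
    """Split a text into lowercase word tokens (simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def apply_replacements(tokens, groups):
    """Replace each variant by its canonical word.

    groups maps a canonical word to the words it should absorb,
    e.g. {"love": ["passion", "feeling"]} (hypothetical format).
    """
    mapping = {variant: canonical
               for canonical, variants in groups.items()
               for variant in variants}
    return [mapping.get(token, token) for token in tokens]


def extract_contexts(tokens, targets, x):
    """Return, for each occurrence of a target word, the x tokens before it,
    the word itself and the x tokens after it."""
    targets = set(targets)
    contexts = []
    for i, token in enumerate(tokens):
        if token in targets:
            contexts.append(tokens[max(0, i - x):i + x + 1])
    return contexts
```

A possible use is to run the replacement step first, then extract the contexts and join them into a new text that is given to TreeCloud as input. Note that overlapping contexts are returned separately here, so some words may be counted twice; whether to merge them is a design choice left open.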
I will solve the following problems when I find a solution:
- avoid using SplitsTree to compute the tree,
and use another tree reconstruction algorithm,
- implement the tree reconstruction algorithm by
Barthélémy and Luong,
- avoid using SplitsTree to visualize the tree (maybe
Scriptree),
- compute cooccurrence according to the hypergeometric model
(needs approximations to compute binomials).
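As a rough sketch of the hypergeometric idea, the function below computes the probability of seeing at least k joint occurrences by chance. It rests on assumptions not stated on this page: the text is cut into N windows, one word appears in K of them, the other in n of them, and both appear together in k of them. It also relies on math.comb, which needs Python 3.8 or later (whereas TreeCloud 1.3 targets Python 2.x), and it uses exact integer binomials, which may be too slow for large corpora; this is where approximations would come in.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X following a hypergeometric law with parameters (N, K, n).

    N: total number of windows (assumed unit of cooccurrence).
    K: number of windows containing the first word.
    n: number of windows containing the second word.
    k: number of windows containing both words.
    """
    lower = max(k, 0, n + K - N)   # smallest feasible overlap that is >= k
    upper = min(K, n)              # largest feasible overlap
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(lower, upper + 1))
    # Exact integers are divided only once, at the end.
    return favourable / comb(N, n)
```

A small value means the two words appear together more often than independence would suggest, so it could be turned into a small distance between them. For instance, hypergeom_tail(10, 5, 5, 5) equals 1/252.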