My thesis research involved recruiting several dozen students to
annotate lecture passages via a web form interface, developing a method
for deriving a gold standard from conflicting annotations, adapting two
segmentation programs to produce hierarchical segmentations, and
proposing a statistical measure suited to the peculiarities of
hierarchical discourse segmentation.

Lucien Carroll. Evaluation of Hierarchical Discourse Segmentation of
Expository Speech. MA thesis carried out under the supervision of Rob
Malouf and Eniko Csomay. Presented at the 29th Linguistics Students
Association Colloquium at SDSU, April 8, 2006. slides

Abstract: There is a large body of literature describing work in
linear discourse segmentation, especially of news
data, and some work describing algorithms for hierarchical discourse
segmentation. However, little work has been done on segmenting more
conversational genres, and even less on evaluating hierarchical
segmentation. I describe a method for compiling a gold standard for
tree segmentation of expository monolog, and I propose an error metric.
I then evaluate two hierarchical segmentation algorithms with that
metric. The segmentation algorithms both perform quite poorly on this
language variety, but one of the two is shown to be significantly
better than baseline segmentations.