Adaptive multimodal natural language generation

Speech, prosody & corpus-based methods

This subproject is devoted to context-dependent speech synthesis and text-to-text natural language generation. The work on speech synthesis concentrates on improving prosody prediction, that is, the placement of pitch accents and of prosodic boundaries of various kinds. The work on language generation concentrates on sentence fusion, for example combining partial answers, possibly from different QA engines, into more complete and focused answers.

For speech synthesis, the NeXTeNS text-to-speech system for Dutch was integrated into the IMIX demonstrator. NeXTeNS has been improved in many ways: for example, text preprocessing was made more robust against ill-formed input, and morphological analysis was added to improve grapheme-to-phoneme conversion. Work on exploiting the output of the Alpino parser (used for language analysis within IMIX) is in progress. Beyond the original plan, a talking head (RUTH) was ported from English to Dutch, interfaced with NeXTeNS, and added to the IMIX demonstrator. Marsi (2004) reports evaluation results for prosody prediction in NeXTeNS, and Marsi & van Rooden (2007) present experiments on the expression of (un)certainty using RUTH.

To facilitate text-to-text generation, we have developed a special-purpose graphical annotation tool (Gadget) for manually aligning dependency analyses. With this tool, we have created a first parallel monolingual (Dutch) corpus. On the basis of this corpus, we have developed a fully functional prototype of a sentence fusion module. It takes two sentences as input, performs linguistic analysis, automatically aligns the nodes of the corresponding dependency graphs, classifies the semantic relations between aligned nodes, and generates new variants that are restatements, generalisations or specifications of the original input sentences. The tree alignment and semantic classification were evaluated on the corpus using cross-validation; the surface generation output was evaluated by human judges. First results from this research are described in Marsi & Krahmer (2005a, b). Since aligning dependency trees is very similar to recognizing equivalence and entailment, we have used this technique to participate in the Second and Third Recognizing Textual Entailment Challenges (Marsi et al. 2006; 2007).
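
The alignment and classification steps of such a module can be pictured with a small sketch. This is not the project's implementation: the `Node` structure, the Jaccard word-overlap score, the 0.5 threshold, the relation labels and the English toy sentences are all assumptions made for illustration. The actual prototype works on Alpino dependency graphs for Dutch and was developed on the manually aligned corpus. The sketch only shows the general idea of pairing nodes from two dependency trees and labelling how each pair is related.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """A node in a toy dependency tree (hypothetical structure)."""
    word: str
    children: List["Node"] = field(default_factory=list)

    def subtree(self) -> Iterator["Node"]:
        """Yield this node followed by all of its descendants."""
        yield self
        for child in self.children:
            yield from child.subtree()

    def words(self) -> FrozenSet[str]:
        """Lower-cased set of words covered by this node's subtree."""
        return frozenset(n.word.lower() for n in self.subtree())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Word overlap between two subtrees (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def classify(a: FrozenSet[str], b: FrozenSet[str]) -> Optional[str]:
    """Crude relation label based only on word-set inclusion."""
    if a == b:
        return "equals"
    if a < b:
        return "generalizes"  # a says less than b
    if a > b:
        return "specifies"    # a says more than b
    if a & b:
        return "intersects"
    return None


def align(tree_a: Node, tree_b: Node,
          threshold: float = 0.5) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each node of tree_a with its most similar node in tree_b."""
    nodes_b = list(tree_b.subtree())
    result = []
    for na in tree_a.subtree():
        wa = na.words()
        best = max(nodes_b, key=lambda nb: jaccard(wa, nb.words()))
        if jaccard(wa, best.words()) >= threshold:
            result.append((na.word, best.word, classify(wa, best.words())))
    return result


# Toy input: "the cat sleeps" versus "the black cat sleeps".
tree_a = Node("sleeps", [Node("cat", [Node("the")])])
tree_b = Node("sleeps", [Node("cat", [Node("the"), Node("black")])])

for pair in align(tree_a, tree_b):
    print(pair)
```

In this toy case the two `sleeps` nodes and the two `cat` nodes are aligned with the label `generalizes`, because the first sentence leaves out "black", and the two `the` nodes are aligned with `equals`. A generation step could then choose, per aligned pair, which side to realise in order to produce a more general or a more specific variant.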