Language generation in D2S

The Language Generation Module (LGM) of D2S takes data as its input. In GoalGetter, the input is a table listing the main events in a soccer match; in DYD, the input consists of data on a particular composition by Mozart. The LGM expresses the input data using a collection of so-called syntactic templates.

Put informally, syntactic templates are syntactic parse trees of sentences in which some positions are left open as slots. These slots are filled with expressions for variable information. Below we see a (simplified) example of a template. It is the English translation of a template from GoalGetter, which may be used to generate sentences of the form <time> <player> had <player's> <Nth> goal noted, where variable expressions may be filled in to express the time, the player, and the number of the goal.

Example template
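To make the idea of a tree with slots concrete, here is a minimal Python sketch. It is not the D2S representation: the class names Node and Slot, the function realize, the category labels and the filler values (including the player name) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str  # hypothetical slot name, e.g. "player"


@dataclass(frozen=True)
class Node:
    label: str       # syntactic category, e.g. "S", "NP", "VP"
    children: tuple  # child Nodes, Slots, or literal words (str)


def realize(tree, fillers):
    """Walk the tree left to right and return its words, filling each slot."""
    if isinstance(tree, str):
        return [tree]
    if isinstance(tree, Slot):
        return [fillers[tree.name]]
    words = []
    for child in tree.children:
        words.extend(realize(child, fillers))
    return words


# Simplified tree for: <time> <player> had <player's> <Nth> goal noted
template = Node("S", (
    Slot("time"),
    Node("NP", (Slot("player"),)),
    Node("VP", (
        "had",
        Node("NP", (Slot("possessive"), Slot("ordinal"), "goal")),
        "noted",
    )),
))

# Invented filler values, purely for illustration.
sentence = " ".join(realize(template, {
    "time": "In the 30th minute",
    "player": "Jansen",
    "possessive": "his",
    "ordinal": "second",
}))
```

Keeping the slots as leaves of a tree, rather than as gaps in a flat string, preserves the syntactic structure of the sentence while the variable parts are filled in.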

The template is associated with a certain topic, in this case game-course (that is, goal scoring). Other topics in GoalGetter are general and cards. During generation, templates sharing the same topic are grouped together in one paragraph, which helps keep the generated text coherent.
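The grouping by topic can be sketched as follows. This is an illustration under assumptions, not the LGM's own procedure: the function group_into_paragraphs is hypothetical, it groups sentences after the fact rather than during generation, and the sentences are placeholders.

```python
def group_into_paragraphs(tagged_sentences):
    """Collect sentences by topic, keeping topics in order of first appearance."""
    by_topic = {}
    for topic, sentence in tagged_sentences:
        by_topic.setdefault(topic, []).append(sentence)
    return [" ".join(sentences) for sentences in by_topic.values()]


# Placeholder sentences tagged with the three GoalGetter topics.
tagged = [
    ("general", "<sentence about the teams>"),
    ("game-course", "<sentence about the first goal>"),
    ("game-course", "<sentence about the second goal>"),
    ("cards", "<sentence about a yellow card>"),
]

paragraphs = group_into_paragraphs(tagged)
```

Each element of paragraphs then holds the sentences of a single topic, so the two game-course sentences end up next to each other.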

Each template also contains a list of conditions which specify when the template can be used properly. The conditions on the example template state that it can only be used if the goal in question has not yet been mentioned, thus ensuring that the same information will not be given more than once. Additionally, the goal must not be the player's first goal in this match, and it must not be an own goal. Finally, the teams must already have been mentioned, to ensure that the input information is presented in a natural order, while still allowing for some variation.
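The four conditions on the example template can be read as predicates over the goal and the preceding discourse. The sketch below assumes a hypothetical Goal record and Context object; none of these names or fields come from D2S itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Goal:
    goal_id: int
    player: str
    nth_for_player: int  # 1 for the player's first goal in this match
    own_goal: bool = False


@dataclass
class Context:
    """Hypothetical record of what the text has already said."""
    mentioned_goals: set = field(default_factory=set)
    teams_mentioned: bool = False


# One predicate per condition described above.
CONDITIONS = [
    lambda goal, ctx: goal.goal_id not in ctx.mentioned_goals,  # not yet mentioned
    lambda goal, ctx: goal.nth_for_player > 1,                  # not the first goal
    lambda goal, ctx: not goal.own_goal,                        # not an own goal
    lambda goal, ctx: ctx.teams_mentioned,                      # teams introduced
]


def applicable(goal, ctx):
    """The template may be used only if every condition holds."""
    return all(condition(goal, ctx) for condition in CONDITIONS)
```

For instance, applicable(Goal(2, "Jansen", 2), Context(teams_mentioned=True)) holds, but it stops holding once goal 2 has been added to mentioned_goals.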

In contrast to most other NLG systems, the LGM has no separate module which determines the structure of the output text in advance. In the LGM, the ordering of the sentences in the text is determined during generation and fully depends on the local conditions on the templates, taking the preceding discourse into account. Whenever there are multiple possibilities that are equally suitable given the context, a random choice is made. This strategy is used both for template choice and for the filling of the template slots, thus achieving maximal variation in the generated texts. For descriptions of the actual generation algorithm, we refer the reader to the bibliography.
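The strategy described here, with no text plan, local conditions and random choice among equally suitable options, can be illustrated with a toy loop. This is not the actual LGM algorithm, which is described in the bibliography; the dictionary keys, the generate function and the placeholder sentences are all assumptions.

```python
import random


def generate(templates, rng):
    """Pick templates one at a time until none is applicable any more."""
    mentioned = set()  # what the preceding discourse has conveyed so far
    sentences = []
    while True:
        candidates = [t for t in templates if t["condition"](mentioned)]
        if not candidates:
            break
        chosen = rng.choice(candidates)                   # random template choice
        sentences.append(rng.choice(chosen["variants"]))  # random slot filling
        mentioned.add(chosen["conveys"])
    return sentences


# Hypothetical templates; each condition inspects only the discourse so far.
TEMPLATES = [
    {"conveys": "teams",
     "condition": lambda m: "teams" not in m,
     "variants": ["<teams, wording A>", "<teams, wording B>"]},
    {"conveys": "goal",
     "condition": lambda m: "teams" in m and "goal" not in m,
     "variants": ["<goal, wording A>", "<goal, wording B>"]},
    {"conveys": "cards",
     "condition": lambda m: "teams" in m and "cards" not in m,
     "variants": ["<cards, wording A>"]},
]

text = generate(TEMPLATES, random.Random(42))
```

Sentence order is never planned here: the teams sentence always comes first only because the other conditions require it, while the order of the goal and cards sentences depends on the random choice. Passing in a seeded random.Random makes a run reproducible.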
