Language Analysis for Dialogue Management in a Theatre Information & Booking System

T. Andernach†, H. ter Doest†, R. op den Akker†, G. van der Hoeven†, S.P. van de Burgt‡, J. Schaake† & A. Nijholt†
†University of Twente, Department of Computer Science
‡Royal PTT Nederland
Parlevink Research Group
PO Box 217, 7500 AE Enschede
The Netherlands

Email: {andernac, terdoest}@cs.utwente.nl
Tel.: +31 53.893740

Abstract

SCHISMA is a joint research project of KPN (Royal PTT Nederland) and the University of Twente. The project aims at providing a natural language dialogue system that interfaces to a database containing information about theatre performances in a certain city or region. The interface should make it possible to ask about performances in general, to focus on a specific performance and, if desired, to make a reservation for it. Research so far has concentrated on various aspects of realising such a theatre information and booking system. Among these aspects are the building of a Wizard of Oz environment for the acquisition of a corpus of dialogues for this domain, analysis and tagging of the dialogue corpus, recognition of domain-specific concepts (actors, authors, plays, dates, etc.), syntactic analysis and dialogue modelling. The emphasis in this paper is on the analysis of individual utterances at various levels. Most important for the project are the short- and medium-term goals of delivering prototype systems that allow demonstration of the system and evaluation of the design choices. Because of these goals the project does not strive to incorporate advanced but isolated research results on discourse models and on syntactic and semantic analysis. Rather, we investigate how to identify user preferences and how to embed systems like these in a more comprehensive environment of information and transaction services.

1. INTRODUCTION

SCHISMA is a collaborative project of the University of Twente and KPN Research, the R&D department of Royal PTT Nederland. The groups involved are the Parlevink group of the Department of Computer Science of the University of Twente and the Speech and Language group at KPN Research. The aim of the project is to develop a prototype of a natural language dialogue system that allows users to obtain information about theatre performances and to book seats for such performances. The language used by the system prototype is Dutch. The project has the support of the Dutch theatre de Twentse Schouwburg in Enschede. Equally important to developing a prototype is the goal of gaining deeper insight into the problems one encounters in the process of building natural language dialogue systems. The most important questions are: what is a good method for building such a dialogue system, what are the necessary steps in such a process, and what parts of the system can be re-used for other domains or applications as well? Hence, the project can best be characterised as a software engineering exercise with a strong emphasis on the use of language engineering tools. Although we intend to develop a system that allows spoken dialogues, we have chosen to restrict the first prototype to typed dialogues. The key issue in developing the prototype will be the construction of a dialogue manager and the introduction of a satisfactory notion of dialogue state. In the (near) future, the modality of speech will be added. This paper, however, will not focus on the dialogue aspects but on the analysis of single utterances. For a full understanding of an utterance the context of the dialogue must be known, but preliminary analysis steps at various levels are needed before the correct interpretation in that context can be given. The paper discusses a number of such steps: error correction and determination of word boundaries; recognition of special phrases and sentence elements (like punctuation marks, denotations for numbers and dates, and names of players and performances); robust determination of (simple) syntactic structure; and analysis of communicative function. In Figure 1, a global view of the architecture is given.

The paper is organised as follows. In section 2 we discuss the component of the architecture that is currently responsible for morphological analysis, recognition of domain concepts and error handling (MAF). Section 3 is concerned with the parsing component of the system, and section 4 with the experiments carried out to obtain dialogues and with the speech act analysis of the utterances in the corpus thus obtained.

In Van der Hoeven et al. (1995) the dialogue aspects, and in particular the view on dialogue state in the SCHISMA project, are discussed. In a future paper more attention will be given to the different approaches that the project follows in this respect (e.g., an update semantics approach and an approach to dialogue modelling with finite state automata). Cf. Schaake & Kruijff (1994) and Bos (1995).

2. MORPHOLOGICAL ANALYSIS AND FAILURE HANDLING (MAF)

2.1 Introduction

The input of the MAF module is the character string typed in by the client. The MAF module is best seen as the pre-processor of the SCHISMA system. It handles typing errors and detects certain types of phrases (proper names that occur in the database, date and time phrases, number names, etc.). The output of the MAF module is a wordgraph. By wordgraph we mean a directed graph having as its nodes word boundaries, indicating the start and/or end of words in the input string. Nodes are labelled with indices on the word level, that is, a node has as its index the number of identified words to the left of the node. Edges (index1, index2) are allowed only if index1 < index2. In addition, edges are labelled with a value m that indicates the quality of the recognition (and possibly correction) performed.

On the implementation level this means that the MAF module outputs a collection of items (rd,m), where rd is a 3-tuple (fstruct, index1, index2), fstruct is a PATR-II feature structure (see Shieber, 1986), index1 and index2 are word-level indices as discussed above, and m is a value indicating the plausibility of rd as a representation of (part of) the input string.
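To make this concrete, the sketch below shows one possible Python representation of such items; the field names, feature values and plausibility figures are our own illustration (the actual MAF module works with PATR-II feature structures), and the reading shown corresponds to the example discussed later in this section.

# Illustrative sketch of the MAF output items (rd, m); all names and values
# are assumptions, not the actual SCHISMA data structures.
from dataclasses import dataclass

@dataclass
class WordGraphEdge:
    fstruct: dict   # stands in for a PATR-II feature structure
    index1: int     # word-level start node (number of words to its left)
    index2: int     # word-level end node; index1 < index2 must hold
    m: float        # plausibility of the recognition (and possible correction)

# One reading of 'Ik wil de veertiende graag naar Verde'
reading = [
    WordGraphEdge({"lex": "ik"}, 0, 1, 1.0),
    WordGraphEdge({"lex": "wil"}, 1, 2, 0.9),                       # corrected from 'w il'
    WordGraphEdge({"tag": "DATE", "value": 14}, 2, 4, 0.95),        # 'de veertiende'
    WordGraphEdge({"lex": "graag"}, 4, 5, 1.0),
    WordGraphEdge({"lex": "naar"}, 5, 6, 1.0),
    WordGraphEdge({"tag": "PROPER", "value": "Verdi"}, 6, 7, 0.8),  # corrected from 'Verde'
]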

Internally, the input string is represented as a character graph; intuitively, a character graph is the same concept as a wordgraph, with the distinction that its nodes are now on the character level. The architecture of the MAF module as depicted in Figure 2 should now be understood as follows: the error correcting component ERROR outputs a character graph that is provided to the tagging modules PROPER, NUMBER, DATE and TIME and to the MORPH/LEX module. In addition, the ERROR module labels the edges of the graph it outputs with the plausibility measure and with the string to which it maps the substring represented by the edge.

For performing the error correction, ERROR has access to a large dictionary (typically 200,000 words). The tagging modules look for phrases in the input string that carry particularly important information for the dialogue; the aim here is in particular the detection of proper names referring to database items, of phrases representing dates and times, and of number names. For detecting proper names referring to the database, the PROPER module needs access to the SCHISMA database. MORPH/LEX creates items for the parser out of the tag information provided by the taggers, and it searches for words that appear in the domain-specific lexicon and for which domain-dependent semantic information is recorded there.

An example of a character graph tagged for special phrases now follows. Suppose the string typed in by the client is 'Ik wil de veertiende graag naar Verde'. Then, if we denote edges by brackets, the following is a reading of the input:

[Ik] [w il]wil [DATE de veertiende] [graag] [naar] [PROPER Verde]Verdi
(On the 14th I would like to go to Verdi)

where corrections (if any) immediately follow the closing brackets. Another reading of the string would recognise 'de veertiende' as

[de] [NUMBER veertiende]

In section 2.2 the error correcting module will be discussed. A more precise description of the taggers will be given in section 2.3 and in section 2.4 the MORPH/LEX component is defined. In section 2.5 some words are devoted to implementation issues.

2.2 Error Correction

As we have postponed the development of a spoken interface to the SCHISMA system, we can concentrate here on the analysis of keyboard input. Clearly, the analysis of typed input is somewhat simpler than spoken language recognition. However, a typed interface introduces the challenge of handling the typing errors the client makes. In detecting and correcting typing errors we have to use knowledge of which character sequences may and which may not appear.

Roughly, there are two approaches to this from an engineering point of view: the integrated approach and the pre-processor approach. In the integrated approach the recognition of tokens (lexical items, number names, etc.; see the discussion below) is done simultaneously with the error correction.

The pre-processor approach requires the introduction of generic knowledge of which character sequences definitely may and which may not occur. It makes use of knowledge of which trigrams of characters and which triphones (trigrams of phonemes) are viable in the Dutch language (given a dictionary of words that may occur). Using these trigrams, substrings of the input string are compared to words in the dictionary. We refer to Vosse (1994) for details on this error-correction method.
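As an illustration of the general idea (not of Vosse's actual algorithm), the Python sketch below flags tokens that contain character trigrams never seen in a toy dictionary and proposes nearby dictionary words as corrections; the dictionary, the scoring and the plausibility values are assumptions.

# Toy trigram-based error detection and correction; dictionary, thresholds
# and plausibility values are assumptions for illustration only.
from difflib import get_close_matches

dictionary = {"ik", "wil", "graag", "naar", "verdi", "veertiende", "opera"}
# all character trigrams occurring in the dictionary, with boundary markers
valid_trigrams = {("#" + w + "#")[i:i + 3] for w in dictionary for i in range(len(w))}

def looks_suspicious(token):
    """A token is suspicious if it contains a trigram never seen in the dictionary."""
    padded = "#" + token.lower() + "#"
    return any(padded[i:i + 3] not in valid_trigrams for i in range(len(padded) - 2))

def correct(token):
    """Return candidate corrections together with a crude plausibility value m."""
    if not looks_suspicious(token):
        return [(token, 1.0)]
    candidates = get_close_matches(token.lower(), sorted(dictionary), n=1, cutoff=0.6)
    return [(c, 0.8) for c in candidates] or [(token, 0.1)]

print(correct("Verde"))   # [('verdi', 0.8)]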

For reasons of compositionality we have chosen the latter approach: in order to keep the MAF project divisible into subprojects that can be worked on separately by several people, this option offers the best possibility to partition the MAF component into a number of components that have a clear input/output specification and can thus be developed and implemented separately.

2.3 Tagging Modules

Clearly, the choice for an error correcting pre-processor is a design decision that has consequences for the architecture of the MAF component. The most important implication is that the components following the pre-processor, whatever they are (see below for discussion), have clear input/output relations and may perform their task sequentially as well as in parallel (in contrast to the integrated approach, where this freedom does not exist).

The components described below are in fact specialised taggers; each of them looks for a specific type of phrase. If a tagger finds the type of phrase it is looking for, the phrase is tagged and output to the post-processor MORPH/LEX. In general the output of the taggers is of the form (TAG, value, left, right), where TAG indicates what type of phrase has been found, value is an integer value that codifies the contents of the phrase, and left and right are indices in the input string. The capitalised literals are also the tags used by the tagging modules; they correspond to the module names in the architecture of MAF as given in Figure 2.

The order in which these analysers are applied to the pre-processed/corrected input does not influence the final output of MAF. As a consequence we can visualise them in the MAF component as if they act in parallel.
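The Python sketch below illustrates the (TAG, value, left, right) output format with a toy NUMBER and DATE tagger; the recognised vocabulary and the value encoding are assumptions, not the actual SCHISMA modules.

# Toy specialised taggers emitting (TAG, value, left, right) tuples over
# word indices; the vocabulary below is a small assumption.
ORDINALS = {"eerste": 1, "tweede": 2, "veertiende": 14}

def number_tagger(words):
    """Tag Dutch ordinal number names in a tokenised reading."""
    return [("NUMBER", ORDINALS[w.lower()], i, i + 1)
            for i, w in enumerate(words) if w.lower() in ORDINALS]

def date_tagger(words):
    """Tag 'de <ordinal>' phrases as dates, encoding the day of month as value."""
    return [("DATE", ORDINALS[words[i + 1].lower()], i, i + 2)
            for i in range(len(words) - 1)
            if words[i].lower() == "de" and words[i + 1].lower() in ORDINALS]

words = "Ik wil de veertiende graag naar Verdi".split()
print(date_tagger(words))    # [('DATE', 14, 2, 4)]
print(number_tagger(words))  # [('NUMBER', 14, 3, 4)]

The two outputs correspond to the two alternative readings of 'de veertiende' discussed in section 2.1.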

2.4 The MORPH/LEX Module

The MORPH/LEX module acts as the post-processor of the MAF module. The functionality of the module is best characterised by the following two main activities:
  1. search the lexicon for the words that are provided by the ERROR module through the wordgraph; before the lexicon is actually searched, a morphological analysis is performed that accounts for the inflection of verbs, adjectives, nouns, etc.; strings for which the lexicon does not contain an entry are assigned an 'unknown' category and a prototype feature structure;
  2. accept the tuples (TAG, value, left, right) generated by the taggers described in the previous section and construct PATR feature structures out of them; in these feature structures the TAG and value variables are given special treatment, and they are provided to the parser.
The output of MORPH/LEX is again a wordgraph. The edges of the graph are now labelled with the category of the word and a feature structure.

The lexicon is a rather small list of words (around 3,000 entries) that are highly domain-dependent and have important semantics for the domain; for each of these words a PATR feature structure is supplied in the lexicon. We refer to section 3 for a discussion of typical feature structures used for representing domain semantics.
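A minimal sketch of what such lexicon entries and the lookup with an 'unknown' fallback might look like is given below; the attribute names and the use of plain Python dictionaries instead of PATR feature structures are assumptions, and morphological analysis is left out.

# Illustrative domain lexicon entries; 'cat', 'kind' and 'sem' are assumed
# attribute names standing in for the PATR feature structures of the lexicon.
LEXICON = {
    "opera":      {"cat": "NOUN", "kind": "PERFORMANCE", "sem": "OPERA"},
    "vandaag":    {"cat": "ADV",  "kind": "TIME",        "sem": "TODAY"},
    "schouwburg": {"cat": "NOUN", "kind": "LOCATION",    "sem": "THIS_THEATRE"},
}

def lookup(word):
    """Return the lexical feature structure, or the default category WORD for unknown words."""
    return LEXICON.get(word.lower(), {"cat": "WORD", "lex": word})

print(lookup("vandaag"))  # {'cat': 'ADV', 'kind': 'TIME', 'sem': 'TODAY'}
print(lookup("Purper"))   # {'cat': 'WORD', 'lex': 'Purper'}

The kind-feature shown here is used for disambiguation during parsing, as discussed in section 3.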

2.5 Implementation Report

Currently the modules PROPER, ERROR and MORPH/LEX are available. PROPER and MORPH/LEX have been developed by us, while ERROR has been adapted from the source code of Vosse (1994). The other modules, DATE, TIME and NUMBER, are still under construction.

3. PARSING

As was explained in the previous section, the output of MAF is a wordgraph, i.e. a compact structure representing a number of alternative 'readings' of the token sequence typed by the user. For the next module, PARS, each of these readings corresponds to a sequence of pairs (rd,m), in which rd is a 3-tuple (fstruct, index1, index2) and fstruct is a (complex) feature structure. This feature structure belongs to a word or a sequence of words recognised by MAF in the input token sequence (for instance 'van het Hek', 'veertien', or 'opera'). Here, index1 and index2 are numbers indicating the position of the word or words in the input sequence. The second component m of the pair (rd,m) is a number indicating how well the word or word sequence that rd refers to matches the token sequence.

PARS analyses each reading delivered by MAF independently of the other readings. PARS is essentially a chart parser for context-free unification grammars (cfug). For each reading it ideally outputs one analysis, or several alternatives in case of ambiguities. An analysis is a feature structure representing the syntactic/semantic structure of the reading according to the grammar and the set of feature structures in the reading. The feature structures in this set are initially put on the chart of the chart parser. For an overview of the (head/left-corner) chart parser for unification grammars developed in the Parlevink Research Group, see Veldhuijzen van Zanten & Op den Akker (1994).
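The Python sketch below illustrates how a chart can be seeded with the categories of one reading and then closed under context-free rules. It is a naive bottom-up recogniser with invented categories and rules, not the head/left-corner unification parser referred to above, and it ignores feature structures altogether.

# Toy bottom-up recogniser over one wordgraph reading; categories and rules
# are assumptions, the real PARS uses a cfug with feature unification.
from collections import defaultdict

def parse(reading, rules, n):
    """reading: list of (category, start, end); rules: list of (lhs, rhs); n: last node."""
    # passive edges grouped by start node: start -> set of (category, end)
    edges = defaultdict(set)
    for cat, i, j in reading:
        edges[i].add((cat, j))
    changed = True
    while changed:                       # close the chart under the rules
        changed = False
        for lhs, rhs in rules:
            for start in range(n + 1):
                for end in list(chain_ends(edges, rhs, start)):
                    if (lhs, end) not in edges[start]:
                        edges[start].add((lhs, end))
                        changed = True
    return edges

def chain_ends(edges, rhs, start):
    """Yield end nodes of contiguous edge chains matching the category list rhs."""
    if not rhs:
        yield start
        return
    for cat, mid in list(edges[start]):
        if cat == rhs[0]:
            yield from chain_ends(edges, rhs[1:], mid)

# seed the chart with the categories MAF assigned to 'welke opera's zijn er vandaag'
reading = [("WHICH", 0, 1), ("NOUN", 1, 2), ("VERB", 2, 3), ("ER", 3, 4), ("TLPSE", 4, 5)]
rules = [("PPSE", ["NOUN"]), ("S", ["WHICH", "PPSE", "VERB", "ER", "TLPSE"])]
chart = parse(reading, rules, 5)
print(("S", 5) in chart[0])   # True: an S spanning the whole reading was found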

It should be remarked that parsing in the context of a natural language dialogue system should be robust. Robustness means, as far as natural language processing is concerned, filtering the relevant information from the reading. In this context we are not interested in a linguistically sound and complete grammar/parser for Dutch; the syntactic structure of a reading is only of interest in so far as it reflects the semantic/pragmatic meaning of the reading. Therefore the cfug has been developed on the basis of an analysis of a dialogue corpus obtained from Wizard of Oz experiments. In particular, the user/client utterances in these dialogues have been subjected to different kinds of analyses for this purpose. These investigations answered the question of how clients communicate (in Dutch) with the theatre information system: how they phrase their questions and wishes ('information about musicals for children', 'are there still tickets available for this evening?', 'no, thanks'), and by what kind of phrases they refer to things like dates ('next week', 'this evening', 'the 17th of November'), performances ('the opera of Verdi', 'the performance in which ...'), tickets ('four tickets for ...'), etc.

For the most important (domain-dependent) phrases and the 'sentence' structures that occur in the corpus we have developed a cfug together with a lexicon of words with feature structures. In the lexicon, words, nouns in particular, have a kind-feature that is used for disambiguation by means of unification. In developing the grammar we have striven to assign one analysis to a sequence of words if it has only one meaning. This is quite hard to accomplish, especially if the reading contains words that have been given the default category WORD by MAF, which happens if a word is unknown. During parsing this default category can be 'lifted' to a more specific category (for instance to PROPNAME, using a grammar rule PROPNAME --> WORD); however, no feature structure is (of course) associated with such an unknown word, so disambiguation cannot be achieved by unification failure. The grammar currently contains about 100 context-free rules.
The grammar contains rules like

S -> WHICH PPSE VERB WRDS TLPSE[1] WRDS TLPSE[2]

accompanied by a set of feature rules for building the feature structure of S, the 'sentence' symbol. Here, PPSE is a performance phrase and TLPSE is a time or location phrase; WRDS stands for an arbitrarily long sequence of words. The sentence 'welke opera's zijn er eigenlijk vandaag in jullie schouwburg' and the semantically equivalent 'welke opera's zijn er in jullie schouwburg vandaag' ('which operas are there (...) {today, in your theatre}') result in the following syntactic/semantic feature structure:

[mood: 'WHICHQ';
 object: [main: ['OPERA'];
          restrictions: [restr1: 'TODAY';
                         restr2: 'THIS_THEATRE']
         ]
]

The mood feature indicates the sentence type; in this case it is a WHICH question. The object feature contains two feature structures: main has as its value the kind of performance, and restrictions has as its value a list of conditions on the value of main. The values of restr1 and restr2 have been obtained from the lexical entries of 'vandaag', 'jullie' and 'schouwburg', together with the grammar rules for the category TLPSE of time/location phrases. Since 'vandaag' has kind-feature 'TIME' whereas 'schouwburg' has kind-feature 'LOCATION' in the lexicon, it is possible by means of unification to fill the semantic features of the time and location phrases into the proper slots of the feature structure. The resulting feature structure is interpreted pragmatically by the dialogue manager, using information about the status of the dialogue at the time the user uttered the sentence.
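The Python sketch below illustrates this kind-driven slot filling. The feature names mirror the example above, but the procedure itself (simple dictionary construction instead of full PATR unification) is only an approximation of what PARS does.

# Sketch of placing the semantics of time- and location-phrases in the
# restriction slots of the sentence feature structure; an approximation,
# not the actual unification-based mechanism of PARS.
def build_sentence_fs(main_sem, tlpse_structures):
    restrictions = {}
    for n, tlpse in enumerate(tlpse_structures, start=1):
        # the kind-feature (TIME / LOCATION) licenses the phrase as a restriction
        if tlpse.get("kind") in ("TIME", "LOCATION"):
            restrictions[f"restr{n}"] = tlpse["sem"]
    return {"mood": "WHICHQ",
            "object": {"main": [main_sem], "restrictions": restrictions}}

# 'welke opera's zijn er vandaag in jullie schouwburg'
fs = build_sentence_fs("OPERA", [{"kind": "TIME", "sem": "TODAY"},
                                 {"kind": "LOCATION", "sem": "THIS_THEATRE"}])
print(fs["object"]["restrictions"])   # {'restr1': 'TODAY', 'restr2': 'THIS_THEATRE'}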

Robust analysis is necessary in order to cope with elliptical phrases ('ja' (yes), 'doe maar' (go ahead), 'nog iets leuks te doen vanavond?' (anything nice to do tonight?)) that occur frequently in the corpus. Especially for this kind of utterance it is hard, on the basis of a grammar alone, to say anything about the meaning of the reading. In such cases PARS may deliver only a sequence of annotated words, without any syntactic structure, to the dialogue manager.

In the first prototype of the system that we are currently developing, the actual status of the dialogue does not dynamically influence the parsing process in any way. PARS leaves it to the dialogue manager to select the (most likely) function of the client utterance; in this selection process the manager uses the status of the dialogue. Only then is the function of the client utterance in the dialogue determined. If PARS assigns two different analyses to a reading, it leaves it to the manager to select the most likely one, again using knowledge about the status of the dialogue. In other words, PARS analyses each sentence in isolation and does not determine its pragmatic meaning or communicative function. In the next section we discuss the analysis of user utterances as communicative functions in a dialogue.

4. ANALYSING THE DIALOGUE CORPUS

4.1 Wizard of Oz Experiments

Given the current state of the art in natural language man-machine communication, it is still unfeasible to investigate human behaviour with fully integrated natural language understanding systems in a real environment, as such systems are not available.

Therefore, we looked for alternative techniques. One of the most common techniques adopted for the design of man-machine interfaces is the elicitation of man-machine dialogues in which the role of the machine is simulated. In these so-called Wizard of Oz experiments, subjects interact with a machine without knowing that the turns of the machine are produced by a so-called Wizard. Experiments of this kind can be of great value for the design of a dialogue system because of the insight they give into the (especially linguistic) behaviour of people while they are talking to machines. Moreover, these experiments give us insight into how the Wizard selects the appropriate information from the utterances typed by the user and what actions he has to perform in order to select the best response.

Student subjects were confronted with the system and were asked to use it to perform the task described in a scenario. They were not informed that some of the system's tasks were performed by a human being. We collected 64 dialogues in a pilot experiment; these dialogues are taken as the starting point for a first implementation of the dialogue system.

4.2 Speech Act Analysis

An utterance in a dialogue has a form (the utterance itself), contents (the proposition expressed) and a communicative function. Traditionally, more attention has been paid to the first two; in a dialogue, however, the communicative function of an utterance must be determined to allow the other agent to react in a proper way. In the Speech Act approach, five basic functions of utterances are distinguished. In our corpus, assertions about the world, expressions of the speaker and questions of the speaker are the only communicative functions that occur. We consider both assertions about the world and expressions of the speaker to supply information to the hearer. Therefore, we distinguish two main communicative functions: requests and offers (see also Wachtel, 1986). In the SCHISMA domain, we distinguish actions, information and truth values. Combining these with the former two, we get:
  1. action requests
  2. action offers
  3. information requests
  4. information offers
  5. truth value requests
  6. truth value offers
(1) and (2) can have domain instantiations like 'reserve' or dialogue control instantiations like 'thank' or 'greet'; (3) and (4) can have domain instantiations like 'performance' or 'actor'; and (5) and (6) can have the values 'yes', 'no' or 'unknown'.
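The six combinations can be captured in a small data type, as in the Python sketch below; the class and attribute names are our own and merely mirror the classification given above.

# The six communicative functions as combinations of {request, offer} with
# {action, information, truth value}; names are illustrative only.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Direction(Enum):
    REQUEST = "request"
    OFFER = "offer"

class Content(Enum):
    ACTION = "action"             # e.g. 'reserve', or dialogue control: 'thank', 'greet'
    INFORMATION = "information"   # e.g. 'performance', 'actor'
    TRUTH_VALUE = "truth value"   # 'yes', 'no' or 'unknown'

@dataclass
class CommunicativeFunction:
    direction: Direction
    content: Content
    instantiation: Optional[str] = None   # domain or dialogue control instantiation

reserve_request = CommunicativeFunction(Direction.REQUEST, Content.ACTION, "reserve")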

The communicative function of utterances plays a crucial role in the course of a dialogue, and the form of an utterance is the basis for determining that function. Concerning the form of utterances we are especially interested in the sentence type, the first form feature we use for determining the communicative function. The following table is used to determine the sentence type of utterances:
Type           verb 2nd/1st   subject   special
declarative    2nd            +         -
imperative     1st            -         imp. verb form
y/n question   1st            +         -
wh question    2nd            +         fronted wh-term
The column labelled verb 2nd/1st indicates whether the finite verb is in second or in first sentence position; the column subject indicates the presence of a subject, and the column special indicates type-specific features.
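The Python sketch below shows how a sentence type could be assigned from these form features; the extraction of the features themselves (finite verb position, subject, wh-term) is assumed to be provided by the parser and is not modelled here.

# Sketch of sentence type assignment from the form features in the table above.
def sentence_type(verb_position, has_subject, imperative_verb_form=False,
                  fronted_wh_term=False):
    if verb_position == 2 and has_subject and fronted_wh_term:
        return "wh question"
    if verb_position == 2 and has_subject:
        return "declarative"
    if verb_position == 1 and not has_subject and imperative_verb_form:
        return "imperative"
    if verb_position == 1 and has_subject:
        return "y/n question"
    return "utterance"   # fallback type for everything not covered by the table (see below)

# 'Treedt Purper op?' -> finite verb in first position, subject present
print(sentence_type(verb_position=1, has_subject=True))   # y/n question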

A special sentence type, utterance, is introduced for all utterances that cannot be assigned a sentence type according to the table. The second form feature that plays an important role is the presence of a wh-word; often, this feature indicates a wh-question. There are, however, utterances with wh-words in which the verb is not in second position:

(1) Nou, waar ik met m'n gezin kom te zitten?
(Well, where will my family and I be seated?)

A third form feature is the presence of a question mark. From the corpus it appears that:

A problem for determining the communicative function of utterances is the form-function dichotomy: corresponding utterance forms may have different functions, and corresponding functions can be realised by different forms. Examples (4) and (5), taken from the SCHISMA corpus, should make this clearer:

(4) Kan ik drie kaartjes reserveren voor de eerste rij?
(Can I reserve three tickets in the first row?)

(5) Treedt Purper op?
(Does Purper perform?)

Literally, (4) and (5) are yes/no questions about the speaker's ability to reserve tickets and about the performance of a group, respectively. (4), however, is an indirect request to reserve tickets, and (5) an indirect request for more details about the performance of Purper. In fact, we can paraphrase (4) and (5) as:

(4)a. If it is possible to reserve three tickets in the first row, do it for me.
(5)a. If Purper performs, tell me more about it.

In case the antecedent of the conditional turns out to be false, it suffices to react with a mere 'no', although a reason for the negative answer would be welcome. In case it is true, however, utterances like (4) and (5) are often followed by an implicit affirmation in the form of a follow-up question or a follow-up supply of information. So, in order for the hearer to react properly, the function of (4) and (5) as a request must be recognised.

We think that superficial linguistic clues in the utterance are both psychologically relevant and operationally useful in this respect. Example (6) from the corpus shows some of these clues:

(6) Ik wil graag 4 plaatsen, bij voorkeur op de eerste rij.
(I would like to have 4 tickets, preferably in the first row)

First, (6) has declarative word order, which could indicate an informative communicative function. Several other clues, however, point to another communicative function, such as the combination of a first person pronoun with a finite verb expressing a wish, which indicates a request.

Usually, 'graag' can be translated as 'very much'. In dialogues, however, 'graag' often functions as a particle: it adds information about the preference of the speaker. 'Graag' supports (strengthens) a wish for information or action; this wish can be implicit (e.g. in the form of an (implicit or explicit) confirmation or choice) or explicit in the form of a 'wish marker', e.g. the verb 'wil'.

A robust method for assigning communicative functions to utterances should account for the fact that the number of clues can vary considerably per utterance and can even be zero. We may, for instance, be able to infer that an utterance is a request without being able to distinguish whether it is a request for information or a request for action. Therefore, we use a taxonomy of speech acts: the higher an utterance is classified in the taxonomy, the more general the speech act is. Very domain-specific speech acts can be included at a low level in the taxonomy.
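The Python sketch below illustrates such a taxonomy and how a specific act generalises to higher levels; the node names are illustrative and do not reproduce the actual SCHISMA taxonomy.

# Sketch of a speech act taxonomy in which an utterance can be classified at
# any level of generality; node names are illustrative only.
TAXONOMY = {
    "speech act": ["request", "offer"],
    "request": ["action request", "information request", "truth value request"],
    "offer": ["action offer", "information offer", "truth value offer"],
    "action request": ["reserve", "greet", "thank"],   # domain / dialogue control
}

def generalise(act):
    """Return the path from a (possibly very specific) act up to the root."""
    parents = {child: parent for parent, children in TAXONOMY.items()
               for child in children}
    path = [act]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path

print(generalise("reserve"))
# ['reserve', 'action request', 'request', 'speech act']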

Following Hinkelman (1990) we use rules to determine, for a given input utterance, a range of possible partial speech act interpretations. (7) is an example of the kind of rules given by Hinkelman (1990); it is applicable to (4) above.

(7)  (S  MOOD    YES-NO-Q
         VOICE   ACT
         SUBJ    (NP HEAD ik)
         AUXS    {kan}
         MAIN-V  +action)       =>   ((REQUEST-ACT ACTION)
                                      (SPEECH-ACT))

Both the structure on the left-hand side and the structure on the right-hand side of the arrow contain features with their values. The rule is applicable if the structure on the left matches (a substructure of) the structure yielded by PARS. The right-hand side of the rule is a disjunction of partial descriptions of communicative functions.
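A sketch of how such a rule could be applied is given below; dictionary subsumption stands in for feature structure matching, and the feature names are simplified renderings of those in (7).

# Sketch of applying a Hinkelman-style rule: if the pattern subsumes the
# feature structure produced by PARS, the rule's partial interpretations apply.
def subsumes(pattern, fs):
    """True if every feature/value pair in the pattern also occurs in fs."""
    for key, value in pattern.items():
        if isinstance(value, dict):
            if not isinstance(fs.get(key), dict) or not subsumes(value, fs[key]):
                return False
        elif fs.get(key) != value:
            return False
    return True

# rule (7): first-person yes/no question with auxiliary 'kan' and an action verb
rule7 = ({"mood": "YES-NO-Q", "voice": "ACT",
          "subj": {"head": "ik"}, "aux": "kan", "main_v": "+action"},
         ["REQUEST-ACT ACTION", "SPEECH-ACT"])

# assumed PARS output for (4) 'Kan ik drie kaartjes reserveren voor de eerste rij?'
fs4 = {"mood": "YES-NO-Q", "voice": "ACT", "subj": {"head": "ik"},
       "aux": "kan", "main_v": "+action", "object": {"main": "TICKETS"}}

pattern, interpretations = rule7
print(interpretations if subsumes(pattern, fs4) else [])
# ['REQUEST-ACT ACTION', 'SPEECH-ACT']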

4.3 Tagging: A Systematic Way of Information Disclosure

One way of systematically disclosing the wide variety of information for dialogue management present in the corpus is tagging: certain characteristics of words, utterances or sequences of utterances are annotated in such a way that common features can be found by statistically processing these annotations.

We are interested in several aspects of utterances: the form of utterances (e.g. word order, domain concepts, clue words, wh-words, topic-focus structure), their function (requesting or providing information), the concepts either explicitly mentioned or implicitly intended, and the function of these concepts with regard to the state of the database. Thus, tagging can give us information about the state of the database and about what the client knows, wants to know or do, and wants the server to do.

Another way of exploiting a tagged corpus is to use the generalisations derived from a test corpus for predicting (characteristics of) utterances in new dialogue sessions. This presupposes a statistical component that applies the rules learned from the test corpus when necessary.

In the SCHISMA project various ways of tagging are explored. These different ways are considered complementary. They deal with, among other things, the communicative function of utterances, ways to identify topic and focus (leading to knowledge about thematic progression) and the possibility of using finite state automata to model dialogues.

5. FURTHER RESEARCH AND DEVELOPMENT

5.1 Near Future

The first step to be taken towards a more elaborate system is to incorporate MAF in the Wizard environment. Next, PARS will be incorporated in this environment, so that the Wizard is confronted with the analyses from PARS instead of with the user input directly (or the readings from MAF). On the basis of the experiments with this extended Wizard environment, the next step in the development of the grammar can be made.

At the same time we aim at a more realistic set of data. In our main experiment, to be conducted in the near future, dialogue sessions will be held in a real-world environment; the subjects will be occasional users of the system, unfamiliar with the interface but seeking information and having some knowledge of the domain. The theatre de Twentse Schouwburg, which is currently experimenting with voice response systems, will probably give us the opportunity to collect these dialogues.

5.2 Parallel Approaches

Until now not all aspects of the SCHISMA system have received equal attention in building a first SCHISMA prototype. The tasks of the dialogue manager and the generation part of the system are under-developed; in the Wizard of Oz experiments these are precisely the parts of the system that are simulated. On the one hand we plan to continue research with an emphasis on dialogue modelling and on integrating the research results with other tools and methods that help to automate the present Wizard's tasks; on the other hand we plan to deliver from time to time systems (obviously, without a Wizard) with improved performance on the theatre information and booking task. A system built with the commercial tool Natural Language is a first example: with this system questions can be asked and answers can be given, but there is no dialogue; cf. Komen (1995). A first extension that will be worked on is a theatre information and booking system that conducts a dialogue using task frames and dialogue handling methods similar to those used in the Automatic Inquiry System (AIS) developed by Philips Research Laboratories in Aachen; see, e.g., Aust et al. (1995). It is expected that these parallel approaches will lead to a fruitful interaction between desires and possibilities and will allow the incorporation of research results in useful products as soon as these results become available.

5.3 Further Plans

Apart from continuing the lines of research mentioned above, we would like to pay more attention to the embedding of systems like SCHISMA in a more comprehensive infrastructure. For that reason a project proposal has been written in which we analyse which aspects of natural language have to be modelled in order to meet the expectation that natural language interfaces can provide wide public access to electronic information and transaction services. The long-term goal is not only to specify a functional design for the interface of the theatre information and booking system we are developing, but also to draw more general conclusions with respect to the accessibility of information and transaction services to be located on the so-called Electronic Highway. The primary problem one faces there is the identification of the information needs and communicative behaviour of as large and diverse a population as the general public, which, moreover, is not very well organised or represented by existing organisations and whose needs and behaviour are not well articulated. In this research we plan to apply the so-called Consumer Constructive Technology Assessment methodology to natural-language-accessible electronic information and transaction systems.

Bibliography

Alexandersson, J., et al. A robust and efficient three-layered dialogue component for a speech-to-speech translation system. Proc. EACL, Dublin, 1995.

Andry, F., E. Bilange, F. Charpentier, K. Choukri, M. Ponamale, and S. Soudoplatoff. Computerised simulation tools for the design of an oral dialogue system. Report, 1990.

Andernach, T., G. Deville, and L. Mortier. The design of a real world Wizard of Oz experiment for a speech driven telephone directory information system. In Proceedings of Eurospeech, 1993.

Aust, H. and M. Oerder. Dialogue control in automatic inquiry systems. To appear in: Spoken Dialogue Systems. Workshop, Vigsø, Denmark, 1995.

Austin, J.L. How to do Things with Words. The William James Lectures delivered at Harvard University in 1955. Edited by J.O. Urmson. Harvard University Press, Cambridge (Mass.), 1962.

Bos, R. Modelling dialogues with finite automata in SCHISMA. Report R&D-SV-95-144. KPN Research, Leidschendam, March 1995.

Boves, L., J. Landsbergen, R. Scha & G. van Noord. Priority Programme Language and Speech Technology, to appear.

Dahlbäck, N. and A. Jönsson. A system for studying human-computer dialogues in natural language. Research report, NLP-LAB IDA, Linköping University, Linköping, Sweden, December 1986.

Fraser, N.M. and G.M. Gilbert. Simulating Speech Systems. Computer Speech and Language, vol. 5, 1991, pp. 81-99.

Hinkelman, E.A. Linguistic and Pragmatic Constraints on Utterance Interpretation. Ph.D. Thesis, University of Rochester, Rochester, May 1990.

Hoeven, G.F. van der, J.A. Andernach, S.P. van de Burgt, G-J.M. Kruijff, A. Nijholt, J. Schaake, and F.M.G. de Jong. SCHISMA: A Natural Language Accessible Theatre Information and Booking System. To appear in Proceedings of the First International Workshop on Applications of Natural Language to Databases (NLDB 95), Versailles, 1995.

Komen, E. Evaluation of Natural Language for the Schisma domain. Memoranda Informatica 95-14, 1995.

Schaake, J. and G.-J. M. Kruijff. Information states based analysis of dialogues. Proceedings CLIN '94 (Computational Linguistics in the Netherlands), University of Twente, 1994.

Searle, J.R. Speech Acts. An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, 1969.

Shieber, S.M. An Introduction to Unification-based Approaches to Grammar. Center for the Study of Language and Information, Stanford, CA, USA, 1986.

Veldhuijzen van Zanten, G. and R. op den Akker. Developing natural language interfaces; A test case. Proceedings Workshop on Language Technology (TWLT 8), L. Boves & A. Nijholt (eds.), University of Twente, 1994.

Vosse, T.G. The Word Connection. Ph.D. Thesis, Rijksuniversiteit Leiden, Neslia Paniculata, 1994.

Wachtel, T. Pragmatic Sensitivity in NL Interfaces and the Structure of Conversation. Proceedings of COLING, Bonn, 1986, pp. 35-41.