Email: {andernac, terdoest}@cs.utwente.nl
Tel.: +31 53.893740
The paper is organised as follows. In section 2 we discuss the component of the architecture that is currently responsible for morphological analysis, recognition of domain concepts and error handling (MAF). Section 3 is concerned with the parsing component of the system, and section 4 with experiments to obtain dialogues and with the speech act analysis of utterances in the corpus thus obtained.
Van der Hoeven et al. (1995) discuss the dialogue aspects of the SCHISMA project, in particular its view on dialogue state. A future paper will give more attention to the different approaches the project follows in this respect (e.g., an update semantics approach and an approach to dialogue modelling with finite state automata). Cf. Schaake & Kruijff (1994) and Bos (1995).
On the implementation level this means that the MAF module has as output a collection of items (rd,m) where rd is a 3-tuple (fstruct,index1,index2), fstruct a PATR-II feature structure (see Shieber, 1986), index1 and index2 indices on the word-level as discussed above, and m is a value indicating the plausibility of rd as a representation of (part of) the input string.
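As an illustration only (the names and the simplified feature structure below are ours, not the system's), such an output item could be represented as follows:

```python
from dataclasses import dataclass

# Hypothetical sketch of a MAF output item (rd, m): rd is a 3-tuple
# (fstruct, index1, index2) and m is the plausibility of rd as a
# representation of (part of) the input string.
@dataclass
class Rd:
    fstruct: dict   # PATR-II feature structure, simplified here to a dict
    index1: int     # left word-level index
    index2: int     # right word-level index

def maf_item(fstruct, index1, index2, m):
    """Pair a reading rd = (fstruct, index1, index2) with plausibility m."""
    return (Rd(fstruct, index1, index2), m)

rd, m = maf_item({"cat": "PROPNAME", "sem": "VERDI"}, 6, 7, 0.9)
```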
Internally, the input string is represented as a charactergraph; intuitively, a charactergraph is the same concept as a wordgraph, with the distinction that nodes are now on the character level. The architecture of the MAF module as depicted in Figure 2 should now be understood as follows: the error correcting component ERROR outputs a charactergraph that is provided to the tagging modules PROPER, NUMBER, DATE and TIME and to the MORPH/LEX module. In addition, the ERROR module labels each edge of the graph it outputs with a plausibility measure and with the string to which it maps the substring represented by that edge.
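Such a charactergraph with labelled edges could be sketched as follows (a hypothetical illustration; the class and its representation are ours):

```python
# Hypothetical sketch of a charactergraph as ERROR outputs it: nodes are
# character positions; each edge records the substring it spans, the string
# ERROR maps that substring to, and a plausibility measure.
class CharGraph:
    def __init__(self):
        self.edges = []  # (start, end, substring, mapped_to, plausibility)

    def add_edge(self, start, end, substring, mapped_to=None, plausibility=1.0):
        self.edges.append((start, end, substring, mapped_to or substring, plausibility))

    def outgoing(self, node):
        """All edges leaving a given character position."""
        return [e for e in self.edges if e[0] == node]

g = CharGraph()
g.add_edge(0, 2, "Ik")                                       # unchanged substring
g.add_edge(3, 7, "w il", mapped_to="wil", plausibility=0.8)  # corrected edge
```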
For performing the error correction, ERROR has access to a large dictionary (typically 200,000 words). The tagging modules look for phrases in the input string that contain particularly important information for the dialogue; the aim here is especially the detection of proper names referring to database items, of phrases representing date and time, and of number names. For detecting proper names referring to the database, the PROPER module needs access to the SCHISMA database. MORPH/LEX creates items for the parser out of the tag information provided by the taggers, and it searches for words that appear in the domain-specific lexicon and for which domain-dependent semantic information is recorded in it.
An example of a charactergraph tagged for special phrases now follows. Suppose the string typed in by the client is Ik w il de veertiende graag naar Verde; then, if we denote edges by brackets, the following is a reading of the input:
[Ik] [w il]wil [DATE de veertiende] [graag] [naar] [PROPER Verde]Verdi
(On the 14th I would like to go to Verdi)
where corrections (if any) immediately follow closing brackets. Another reading of the string would be to recognize de veertiende as
[de] [NUMBER veertiende]
In section 2.2 the error correcting module will be discussed. A more precise description of the taggers will be given in section 2.3 and in section 2.4 the MORPH/LEX component is defined. In section 2.5 some words are devoted to implementation issues.
Roughly, there are two approaches to this from an engineering point of view: the integrated approach and the pre-processor approach. In the integrated approach, recognition of tokens (lexical items, number names, etc.; see the discussion below) is done simultaneously with the error correction.
The pre-processor approach requires generic knowledge of which character sequences may and may not occur. It makes use of the character trigrams and triphones (trigrams of phonemes) that are viable in the Dutch language (given a dictionary of words that may occur). Using these trigrams, substrings of the input string are compared to words in the dictionary. We refer to Vosse (1994) for details on this error-correction method.
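The trigram idea can be sketched as follows (a minimal illustration with an invented toy dictionary; Vosse's actual error-correction method is considerably more elaborate):

```python
# Sketch of trigram viability: a substring is a plausible word candidate
# only if every character trigram in it occurs in some dictionary word.
# Padding with '#' captures word-initial and word-final trigrams.
def trigrams(word):
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def build_trigram_set(dictionary):
    viable = set()
    for w in dictionary:
        viable |= trigrams(w)
    return viable

def is_viable(candidate, viable):
    return trigrams(candidate) <= viable

viable = build_trigram_set(["wil", "de", "veertiende"])  # toy dictionary
is_viable("wil", viable)   # all trigrams occur in dictionary words
is_viable("xq", viable)    # no dictionary word yields these trigrams
```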
For reasons of compositionality we have chosen the latter approach: for the MAF project to remain divisible into subprojects that can be worked on separately by several people, this option offers the best possibility to partition the MAF component into a number of components that have a clear input/output specification and can thus be developed and implemented separately.
The components described below are in fact specialised taggers; each of them looks for a special type of phrase, and if a tagger finds the type of phrase it is looking for, the phrase is tagged and output to the post-processor MORPH/LEX. In general the output of the taggers is of the form (TAG, value, left, right), where TAG indicates what type of phrase is found, value is an integer value that codifies the contents of the phrase, and left and right are indices in the input string. The capitalised literals are also the tags that are used by the tagging modules; they correspond to module names in the architecture of MAF as given in Figure 2.
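As an illustration, a tagger in the spirit of the DATE module might produce 4-tuples of this form (the code, the regular expression and the toy ordinal table below are our own simplifications):

```python
import re

# Hypothetical sketch of a DATE tagger: it emits (TAG, value, left, right),
# where value codifies the phrase (here: the day number) and left/right
# are indices into the input string.
ORDINALS = {"veertiende": 14, "vijftiende": 15}  # toy table

def tag_dates(text):
    tags = []
    for m in re.finditer(r"\bde (\w+)", text):
        day = ORDINALS.get(m.group(1))
        if day is not None:
            tags.append(("DATE", day, m.start(), m.end()))
    return tags

tag_dates("Ik wil de veertiende graag naar Verdi")
# one DATE tag spanning 'de veertiende'
```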
The lexicon is a rather small list of words (around 3000 entries) that are highly domain-dependent and have important semantics for the domain; for each of the words a PATR feature structure is supplied in the lexicon. We refer to Section 3 for a discussion on typical feature structures used for representing domain semantics.
PARS analyses each reading delivered by MAF independently of the other readings. PARS is essentially a chart parser for context-free unification grammars (cfug). For each reading it ideally outputs one analysis, or, in case of ambiguities, several alternative analyses. An analysis is a feature structure representing the syntactic/semantic structure of the reading according to the grammar and the set of feature structures in the reading. The feature structures in this set are initially put on the chart of the chart parser. For an overview of the (head/left-corner) chart parser for unification grammars developed in the Parlevink Research Group, see Veldhuijzen van Zanten & Op den Akker (1994).
It should be remarked that parsing in the context of a natural language dialogue system should be robust. Robustness means, as far as natural language processing is concerned, filtering the relevant information from the reading. In this context we are not interested in a linguistically sound and complete grammar/parser for Dutch: the syntactic structure of a reading is only of interest insofar as it reflects the semantic/pragmatic meaning of the reading. Therefore the cfug was developed on the basis of an analysis of a dialogue corpus obtained from Wizard of Oz experiments. In particular, the user/client utterances in these dialogues have been subjected to different kinds of analyses for this purpose. These investigations answered the question of how clients communicate (in Dutch) with the theatre-information system: how they phrase their questions and wishes ('information about musicals for children', 'are there still tickets available for this evening?', 'no, thanks'), and by what kinds of phrases they refer to things like dates ('next week', 'this evening', 'the 17th of November'), performances ('the opera of Verdi', 'the performance in which ...'), tickets ('four tickets for ...'), etc.
For the most important (domain-dependent) phrases and the 'sentence' structures that occur in the corpus we have developed a cfug together with a lexicon of words with feature structures. In the lexicon, words, nouns in particular, have a kind-feature that is used for disambiguation by means of unification. In developing the grammar we have striven to assign one analysis to a sequence of words if there is only one meaning. This is quite hard to accomplish, especially if the reading contains words that have obtained the default category WORD from MAF. This happens if the word is unknown. During parsing this default category can be 'lifted' to a more specific category (for instance to PROPNAME, using a grammar rule PROPNAME --> WORD); however, no feature structure is of course associated with this unknown word, so disambiguation cannot be done by unification failure. The grammar currently contains about 100 context-free rules.
The grammar contains rules like
S -> WHICH PPSE VERB WRDS TLPSE[1] WRDS TLPSE[2]
accompanied by a set of feature rules for building the feature structure of S, the 'sentence' symbol. Here, PPSE is a performance phrase and TLPSE is a time or location phrase; WRDS stands for an arbitrarily long sequence of words. The sentence 'welke opera's zijn er eigenlijk vandaag in jullie schouwburg' and the semantically equivalent 'welke opera's zijn er in jullie schouwburg vandaag' ('which operas are there (...) {today, in your theatre}') result in the following syntactic/semantic feature structure
[mood: 'WHICHQ'; object: [main: ['OPERA']; restrictions: [restr1: 'TODAY'; restr2: 'THIS_THEATRE' ] ] ]
The mood feature indicates sentence type; in this case it is a WHICH question. The object-feature has as value two feature structures: main has as value the kind of performance and restrictions has as value a list of conditions on the value of main. The values of restr1 and restr2 have been obtained from the lexical entries of 'vandaag', 'jullie', and 'schouwburg' together with the grammar-rules for the category TLPSE of time/location-phrases. Since 'vandaag' has kind-feature 'TIME' whereas 'schouwburg' has kind-feature 'LOCATION' in the lexicon, it is possible by means of unification to fill the semantic-features of the time- and location-phrase in the proper slots of the feature-structure. The resulting feature-structure is pragmatically interpreted by the dialogue-manager using the information about the status of the dialogue at the time the user has uttered this sentence.
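The role of the kind-feature in ruling out wrong analyses can be illustrated with a minimal unification function over flat feature structures (a sketch in our own notation, far simpler than PATR-II):

```python
# Minimal unification of flat feature structures (a sketch, not PATR-II):
# unification fails when the same feature carries conflicting atomic
# values, which is how kind-features rule out wrong analyses.
def unify(f1, f2):
    result = dict(f1)
    for feat, val in f2.items():
        if feat in result and result[feat] != val:
            return None  # unification failure: incompatible values
        result[feat] = val
    return result

# A time slot expecting kind TIME accepts 'vandaag' but not 'schouwburg':
unify({"kind": "TIME"}, {"kind": "TIME", "sem": "TODAY"})             # succeeds
unify({"kind": "TIME"}, {"kind": "LOCATION", "sem": "THIS_THEATRE"})  # fails (None)
```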
Robust analysis is necessary in order to cope with elliptic phrases ('ja', 'doe maar', 'nog iets leuks te doen vanavond?') that frequently occur in the corpus. Especially for these kinds of utterances it is hard to say anything about the meaning of the reading on the basis of a grammar alone. In such cases PARS may deliver only a sequence of annotated words, without any syntactic structure, to the dialogue manager.
In the first prototype of the system we are currently developing, the actual status of the dialogue does not dynamically influence the parsing process. PARS leaves it to the dialogue manager to select the (most likely) function of the client utterance. In this process of selection the manager uses the status of the dialogue; only then is the function of the client utterance in the dialogue determined. If PARS assigns two different analyses to a reading, it leaves it to the manager to select the most likely one using knowledge about the status of the dialogue. In other words, PARS analyses each sentence in isolation and does not determine its pragmatic meaning or communicative function. In the next section we will discuss the analysis of user utterances as communicative functions in a dialogue.
Therefore, we looked for alternative techniques. One of the most common techniques adopted for the design of man-machine interfaces is the elicitation of man-machine dialogues in which the role of the machine is simulated. In these so-called Wizard of Oz experiments, subjects interact with a machine without knowing that the turns of the machine are simulated by a so-called Wizard. Such experiments can be of great value for the design of a dialogue system because of the insight they give into the (especially linguistic) behaviour of people while they are talking with machines. Moreover, these experiments will give us insight into how the Wizard selects the appropriate information from the utterances typed by the user and what actions he has to perform in order to select the best response.
Student subjects were confronted with the system and were asked to use the system to perform the task described in a scenario. They were not informed about the fact that some of the system tasks were performed by a human being. We collected 64 dialogues in a pilot experiment; these dialogues are taken as the starting point for a first implementation of the dialogue system.
The communicative function of utterances plays a crucial role in the course of dialogues, and the form of an utterance is the basis for determining that function. Concerning the form of utterances, we are especially interested in the sentence type, the first form feature we use for determining the communicative function. The following table is used to determine the sentence type of utterances:
Type         | verb position | subject | special
------------ | ------------- | ------- | ---------------
declarative  | 2nd           | +       | -
imperative   | 1st           | -       | imp. verb form
y/n question | 1st           | +       | -
wh question  | 2nd           | +       | fronted wh-term
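Read as a decision procedure, the table can be sketched as follows (a hypothetical illustration; extracting verb position, subject presence and special markers from the utterance is assumed to happen elsewhere):

```python
# Hypothetical sketch of the sentence-type table as a classifier.
def sentence_type(verb_pos, has_subject, special=None):
    if verb_pos == 2 and has_subject and special == "fronted wh-term":
        return "wh question"
    if verb_pos == 2 and has_subject and special is None:
        return "declarative"
    if verb_pos == 1 and has_subject and special is None:
        return "y/n question"
    if verb_pos == 1 and not has_subject and special == "imp. verb form":
        return "imperative"
    return "special"  # fallback for utterances the table does not cover

sentence_type(1, True)                     # verb first, subject present
sentence_type(2, True, "fronted wh-term")  # verb second, fronted wh-term
```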
A special sentence type is introduced for all utterances that cannot be assigned a sentence type according to the table. The second form feature which plays an important role is the presence of a wh-word. Often, this feature indicates a wh-question. There are, however, utterances with wh-words in which the verb is not in second position:
(1) Nou, waar ik met m'n gezin kom te zitten?
(Now, where can me and my family have a seat?)
A third form feature is the presence of a question mark. From the corpus it appears that question marks are not used consistently:
(2) Wanneer is Silicone Kitty
(When does Silicone Kitty play)
(3) 2 zei ik toch?
(I said 2, didn't I?)
(4) Kan ik drie kaartjes reserveren voor de eerste rij?
(Can I reserve three tickets at the first row?)
(5) Treedt Purper op?
(Does Purper perform?)
Literally, (4) and (5) are yes/no questions about the speaker's ability to reserve tickets and about the performance of a group, respectively. (4) however, is an indirect request for reserving tickets and (5) for giving more details about the performance of Purper. In fact, we can paraphrase (4) and (5) as:
(4)a. If it is possible to reserve three tickets at the first row, do it for me.
(5)a. If Purper performs, tell me more about it.
In case the antecedent of the conditional appears to be false, it suffices to react with a mere 'no', although a reason for the negative answer would be welcome. In case it is true, however, utterances like (4) and (5) are often followed by an implicit affirmation in the form of a follow-up question or a follow-up supply of information. So, in order for the hearer to react properly, the function of (4) and (5) as a request must be recognised.
We think that superficial linguistic clues in the utterance are both psychologically relevant and operationally useful in this respect. Example (6) from the corpus shows some of these clues:
(6) Ik wil graag 4 plaatsen, bij voorkeur op de eerste rij.
(I would like to have 4 tickets, preferably at the first row)
First, (6) has a declarative word order, which could indicate an informative communicative function. Several other clues, however, give rise to another communicative function, like the combination of the first person pronoun with a finite verb expressing a wish, which indicates a request.
Usually, 'graag' can be translated as 'very much'. In dialogues however, 'graag' often functions as a particle; it adds some information about the preference of the speaker. 'Graag' supports (strengthens) the wish for information or action; this wish can be implicit (e.g. in the form of a (implicit or explicit) confirmation or choice) or explicit in the form of a 'wish marker', e.g. the verb 'wil'.
A robust method of assigning communicative functions to utterances should account for the fact that the number of clues can vary considerably per utterance and can even be zero. We may be able to infer, for instance, that an utterance is a request without being able to distinguish whether it is a request for information or a request for action. Therefore, we use a taxonomy of speech acts: the higher an utterance is classified in the taxonomy, the more general the speech act is. Very domain-specific speech acts can be included at a low level in the taxonomy.
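A fragment of such a taxonomy could be sketched as follows (the act names below are invented for illustration; depth in the tree corresponds to specificity, so classification can stop at any level when clues run out):

```python
# Hypothetical fragment of a speech act taxonomy: more general acts sit
# higher, domain-specific acts sit at the leaves.
TAXONOMY = {
    "speech-act": {
        "request": {
            "request-information": {},
            "request-action": {"reserve-tickets": {}},  # domain-specific leaf
        },
        "inform": {},
    }
}

def generality(taxonomy, act, depth=0):
    """Depth of an act in the taxonomy; a smaller depth means more general."""
    for name, children in taxonomy.items():
        if name == act:
            return depth
        d = generality(children, act, depth + 1)
        if d is not None:
            return d
    return None

generality(TAXONOMY, "request")          # shallow: a general act
generality(TAXONOMY, "reserve-tickets")  # deep: a domain-specific act
```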
Following Hinkelman (1990) we will use rules to determine for a certain input utterance a range of possible partial speech act interpretations. (7) is an example of the kind of rules given by Hinkelman (1990) and applicable to (4) above.
(7) (S MOOD YES-NO-Q VOICE ACT SUBJ (NP HEAD ik) AUXS {kan} MAIN-V +action) => ((REQUEST-ACT ACTION) (SPEECH-ACT))

Both the structure at the left hand side and that at the right hand side of the arrow contain features with their values. This rule is applicable if the structure at the left matches (a substructure of) the structure yielded by PARS. The right hand side of the rule is a disjunction of partial descriptions of communicative functions.
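Rule application in the style of (7) amounts to matching the left-hand feature structure against a substructure of the parse result; a minimal sketch follows (our own simplified dict representation, not Hinkelman's formalism):

```python
# Sketch of rule application: a rule fires when its left-hand feature
# structure matches a substructure of the structure yielded by the parser.
def matches(pattern, structure):
    """True if every feature in pattern occurs in structure with the same
    value, recursing into nested feature structures."""
    for feat, val in pattern.items():
        if feat not in structure:
            return False
        if isinstance(val, dict):
            if not (isinstance(structure[feat], dict)
                    and matches(val, structure[feat])):
                return False
        elif structure[feat] != val:
            return False
    return True

rule_lhs = {"MOOD": "YES-NO-Q", "VOICE": "ACT", "SUBJ": {"HEAD": "ik"}}
rule_rhs = ["REQUEST-ACT ACTION", "SPEECH-ACT"]  # disjunction of partial functions
parse    = {"MOOD": "YES-NO-Q", "VOICE": "ACT",
            "SUBJ": {"HEAD": "ik"}, "MAIN-V": "reserveren"}

interpretations = rule_rhs if matches(rule_lhs, parse) else []
```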
We are interested in several aspects of utterances; the form of utterances (e.g. word order, domain concepts, clue words, wh-words, topic focus structure), their function (requesting, providing information), the concepts either explicitly mentioned or implicitly intended and the function of these concepts with regard to the state of the database. Thus, tagging can give us information about the state of the database, what the client knows, wants to know or do and wants the server to do.
Another way of exploiting a tagged corpus is to use the generalisations from a test corpus for predicting (characteristics of) utterances in new dialogue sessions. This presupposes a statistical component which applies the rules learned from the test corpus when necessary.
In the SCHISMA project various ways of tagging are explored. These different ways are considered to be complementary. They deal with, among others, the communicative function of utterances, ways to identify topic and focus (leading to knowledge about thematic progression) and the possibility to use finite state automata to model dialogues.
At the same time we aim at a more realistic set of data. In our main experiment, to be executed in the near future, dialogue sessions will be held in a real-world environment; the subjects will be occasional users of the system, unfamiliar with the interface but seeking information and having some knowledge of the domain. The theatre De Twentse Schouwburg, currently experimenting with voice response systems, will probably give us the opportunity to collect these dialogues.
Alexandersson, J., et al. A robust and efficient three-layered dialogue component for a speech-to-speech translation system. Proc. EACL, Dublin, 1995.
Andry, F., E. Bilange, F. Charpentier, K. Choukri, M. Ponamale, and S. Soudoplatoff. Computerised simulation tools for the design of an oral dialogue system. Report, 1990.
Andernach, T., G. Deville, and L. Mortier. The design of a real world Wizard of Oz experiment for a speech driven telephone directory information system. In Proceedings of Eurospeech, 1993.
Aust, H. and M. Oerder. Dialogue control in automatic inquiry systems. To appear in: Spoken Dialogue Systems. Workshop, Vigsø, Denmark, 1995.
Austin, J.L. How to do Things with Words. The William James Lectures delivered at Harvard University in 1955. Edited by J.O. Urmson. Harvard University Press, Cambridge (Mass.), 1962.
Bos, R. Modelling dialogues with finite automata in SCHISMA. Report R&D-SV-95-144. KPN Research, Leidschendam, March 1995.
Boves, L., J. Landsbergen, R. Scha & G. van Noord. Priority Programme Language and Speech Technology, to appear.
Dahlbäck, N. and A. Jönsson. A system for studying human-computer dialogues in natural language. Research report, NLP-LAB IDA Linköping University, Linköping Sweden, December 1986.
Fraser, N.M. and G.M. Gilbert. Simulating Speech Systems. Computer, Speech and Language, vol. 5, 1991, pp. 81-99.
Hinkelman, E.A. Linguistic and Pragmatic Constraints on Utterance Interpretation. Ph.D. Thesis, University of Rochester, Rochester, May 1990.
Hoeven, G.F. van der, J.A. Andernach, S.P. van de Burgt, G-J.M. Kruijff, A. Nijholt, J. Schaake, and F.M.G. de Jong. SCHISMA: A Natural Language Accessible Theatre Information and Booking System. To appear in Proceedings of the First International Workshop on Applications of Natural Language to Databases (NLDB 95), Versailles, 1995.
Komen, E. Evaluation of Natural Language for the Schisma domain. Memoranda Informatica 95-14, 1995.
Schaake, J. and G.-J. M. Kruijff. Information states based analysis of dialogues. Proceedings CLIN '94 (Computational Linguistics in the Netherlands), University of Twente, 1994.
Searle, J.R. Speech Acts. An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, 1969.
Shieber, S.M. An Introduction to Unification-based Approaches to Grammar. Center for the Study of Language and Information, Stanford, CA, USA, 1986.
Veldhuijzen van Zanten, G. and R. op den Akker. Developing natural language interfaces; A test case. Proceedings Workshop on Language Technology (TWLT 8), L. Boves & A. Nijholt (eds.), University of Twente, 1994.
Vosse, T.G. The Word Connection. Rijksuniversiteit Leiden, Ph.D. Thesis, Neslia Paniculata, 1994.
Wachtel, T. Pragmatic Sensitivity in NL Interfaces and the Structure of Conversation. Proceedings of COLING, 1986, Bonn, 35-41.