Intelligent tutoring systems with conversational dialogue

Intelligent tutoring systems (ITSs) are clearly one of the successful enterprises in AI. There is a long list of ITSs that have been tested on humans and have proven to facilitate learning. There are well-tested tutors of algebra, geometry, and computer languages (such as PACT [Koedinger et al. 1997]); physics (such as ANDES [Gertner and VanLehn 2000; VanLehn 1996]); and electronics (such as SHERLOCK [Lesgold et al. 1992]).

These ITSs use a variety of computational modules that are familiar to those of us in the world of AI: production systems, Bayesian networks, schema templates, theorem proving, and explanatory reasoning. According to current estimates, this arsenal of sophisticated computational modules inherited from AI produces learning gains of approximately .3 to 1.0 standard deviation units compared with students learning the same content in a classroom (Corbett et al. 1999).

The next generation of ITSs is expected to go one step further by adopting conversational interfaces. The tutor will speak to the student through an animated agent that has synthesized speech, facial expressions, and gestures, in addition to the normal business of having the computer display text, graphics, and animation. Animated conversational agents have now been developed to the point that they can be integrated with ITSs (Cassell and Thorisson 1999; Johnson, Rickel, and Lester 2000; Lester et al. 1999). Learners will be able to type in their responses in English in addition to the conventional point and click. Recent developments in computational linguistics (Jurafsky and Martin 2000) have made it a realistic goal to have computers comprehend language, at least to an extent where the ITS can respond with something relevant and useful. Speech recognition would be highly desirable, of course, provided it is also reliable.

At this point, we are uncertain whether conversational interfaces will produce incremental gains in learning over and above the existing ITSs (Corbett et al. 1999). However, there are reasons for optimism. One reason is that human tutors produce impressive learning gains (between .4 and 2.3 standard deviation units over classroom teachers), even though the vast majority of tutors in a school system have modest domain knowledge, have no training in pedagogical techniques, and rarely use the sophisticated tutoring strategies of ITSs (Cohen, Kulik, and Kulik 1982; Graesser, Person, and Magliano 1995).

A second reason is that there are at least two success cases, namely, the AUTOTUTOR and ATLAS systems that we discuss in this article. AUTOTUTOR (Graesser et al. 1999) is a fully automated computer tutor that has tutored approximately 200 college students in an introductory course in computer literacy. An early version of AUTOTUTOR improved learning by .5 standard deviation units (that is, about half a letter grade) when compared to a control condition where students reread yoked chapters in the book. ATLAS (VanLehn et al. 2000) is a computer tutor for college physics that focuses on improving students' conceptual knowledge. In a recent pilot evaluation, students who used ATLAS scored .9 standard deviation units higher than students who used a similar tutoring system that did not use natural language dialogues. Thus, it appears that something about conversational dialogue plays an important role in learning. We believe that the most effective tutoring systems of the future will be a hybrid between normal conversational patterns and the ideal pedagogical strategies in the ITS enterprise.

This article describes some of the tutoring systems that we are developing to simulate conversational dialogue. We begin with AUTOTUTOR. Then we describe a series of physics tutors that range from a conventional ITS (the ANDES tutor) to agents that attempt to comprehend natural language and plan dialogue moves (ATLAS and WHY2).


AUTOTUTOR

The Tutoring Research Group (TRG) at the University of Memphis developed AUTOTUTOR to simulate the dialogue patterns of typical human tutors (Graesser et al. 1999; Person et al. 2001). AUTOTUTOR tries to comprehend student contributions and simulate dialogue moves of either normal (unskilled) tutors or sophisticated tutors. AUTOTUTOR is currently being developed for college students who are taking an introductory course in computer literacy. These students learn the fundamentals of computer hardware, the operating system, and the Internet.

Figure 1 is a screen shot that illustrates the interface of AUTOTUTOR. The left window has a talking head that acts as a dialogue partner with the learner. The talking head delivers AUTOTUTOR's dialogue moves with synthesized speech, intonation, facial expressions, nods, and gestures. The major question (or problem) that the learner is working on is both spoken by AUTOTUTOR and printed at the top of the screen. The major questions are generated systematically from a curriculum script, a module that we discuss later. AUTOTUTOR's major questions are not the fill-in-the-blank, true-false, or multiple-choice questions that are so popular in the U.S. educational system. Instead, the questions invite lengthy explanations and deep reasoning (such as why, how, and what-if questions). The goal is to encourage students to articulate lengthier answers that exhibit deep reasoning rather than deliver short snippets of shallow knowledge. A continuous multiturn tutorial dialogue unfolds between AUTOTUTOR and the learner during the course of answering a deep-reasoning question; counting both the learner's and AUTOTUTOR's turns, it typically takes 10 to 30 turns to answer a single question from the curriculum script. The learner types in his or her contributions during the exchange by keyboard, as reflected in the bottom window. For some topics, as in figure 1, there are graphic displays and animation, with components that AUTOTUTOR points to. AUTOTUTOR was designed to be a good conversational partner that comprehends, speaks, points, and displays emotions, all in a coordinated fashion.
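To make the curriculum-script mechanism concrete, the following is a minimal sketch of how a tutor might track which aspects of a good answer the learner has covered and choose a next dialogue move (acknowledge and pump, prompt, or assert). All class and function names here are illustrative assumptions for exposition; they are not AUTOTUTOR's actual implementation, and the keyword overlap stands in for AUTOTUTOR's far richer language comprehension.

```python
# Hypothetical sketch of a curriculum-script-driven tutoring loop.
# Names (Expectation, CurriculumItem, next_move) are invented for
# illustration; AUTOTUTOR's real modules differ substantially.

from dataclasses import dataclass


@dataclass
class Expectation:
    """One aspect of a good answer the tutor wants the learner to cover."""
    keywords: set        # content words that signal coverage
    prompt: str          # fill-in-the-blank prompt if the learner stalls
    assertion: str       # full statement the tutor can state in a review
    covered: bool = False


@dataclass
class CurriculumItem:
    """A deep-reasoning question plus the expectations of a good answer."""
    question: str
    expectations: list


def score_coverage(contribution: str, expectation: Expectation) -> float:
    """Crude keyword overlap standing in for semantic matching."""
    words = set(contribution.lower().split())
    return len(words & expectation.keywords) / len(expectation.keywords)


def next_move(item: CurriculumItem, contribution: str) -> str:
    """Pick a dialogue move based on what the learner just said."""
    # Did the contribution cover a new expectation? Acknowledge and pump.
    for exp in item.expectations:
        if not exp.covered and score_coverage(contribution, exp) >= 0.5:
            exp.covered = True
            return "Yeah. What else?"
    # Nothing new was covered: prompt for the first uncovered expectation.
    for exp in item.expectations:
        if not exp.covered:
            return exp.prompt
    # Everything covered: summarize, as in TUTOR-30 of figure 2.
    return "Let's review. " + " ".join(e.assertion for e in item.expectations)
```

Even this toy loop reproduces the shape of the dialogue in figure 2: a pump after a good contribution, a prompt after a stalled one, and a summary once all expectations are covered over many turns.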


An Example AUTOTUTOR-Learner Dialogue

Figure 2 shows a dialogue between a college student and AUTOTUTOR. Prior to this question, the student had been asked, and had attempted to answer, six previous questions about the Internet. The Internet was the macrotopic, and students were tutored by answering several deep-reasoning questions about it. It should be noted that this is not a fabricated toy conversation; it is a bona fide dialogue from our corpus of approximately 200 AUTOTUTOR-student dialogues in a computer literacy course.

Figure 2. An Example Tutorial Dialogue between AUTOTUTOR
and a Learner.

TUTOR-1: Alright, let's go on. You want to take photos and
send them over the Internet. Consider this problem. What
hardware do you need to take photos and send them over
the Internet?
STUDENT-1: First of all you need to have access to
the internet.
TUTOR-2: What else?
STUDENT-2: a camera
TUTOR-3: Yeah. For what type of connection do you need
a network card?
STUDENT-3: network card
TUTOR-4: Okay. Now, if you have access to the Internet
through a network card, then your connection is ...
STUDENT-4: I don't know.
TUTOR-5: Is direct. Okay. A network card is needed if
you have a direct connection to the Internet. Okay. How
does the user get hooked up to the internet?
STUDENT-5: by a plug.
TUTOR-6: Once again, a network card is needed if you
have a direct connection to the Internet. Okay, what does
a modem do?
STUDENT-6: connects network card and internet
TUTOR-7: Okay. Try this. A modem is required when
there is ...
STUDENT-7: a connection
TUTOR-8: a dial up connection.
TUTOR-30: Let's review. To send your photos on the
Internet, you need either a digital camera or a regular
camera to take the photos. If you use a regular camera,
you need a scanner to scan them onto a computer disk.
If you have a direct connection to the Internet, then you
need a network card. A modem is needed if you have a
dial up connection.