International Journal on Artificial Intelligence Tools
© World Scientific Publishing Company

DETERMINING THE DIMENSIONS OF VARIABLES IN PHYSICS ALGEBRAIC EQUATIONS

C.W. Liew
Department of Computer Science, Lafayette College, Easton, PA 18042
firstname.lastname@example.org

Joel A. Shapiro
Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854-8019
email@example.com

D.E. Smith
Department of Computer Science, Rutgers University, Piscataway, NJ 08854-8019
firstname.lastname@example.org

Received 12 July 2004
Revised (15 December 2004)
Accepted (Day Month Year)

This paper describes work on methods that evaluate algebraic solutions to word problems in physics. Many current tutoring systems rely on substantial scaffolding and consequently require students to completely describe every variable used in the solution. A heuristic, based on constraint propagation, capable of inferring the description of variables (i.e., the possible dimensions and physics concepts) is shown to be highly reliable on three real-world data sets, one covering a few problems with a small number of student answers and two others covering a large class of problems (∼100) with a large number of student answers (∼11,000). The heuristic uniquely determines the dimensions of all the variables in 91–92% of the equation sets. By asking the student for dimension information about one variable, an additional 3% of the sets can be determined. An intelligent tutoring system (ITS) can use this heuristic to reason about a student's answers even when the scaffolding and context are removed.

Keywords: Intelligent tutoring systems; constraint propagation; units

1. Introduction

In teaching problem solving, Intelligent Tutoring Systems (ITS) often employ a rigid and explicit framework to guide the student along a predetermined sequence of steps. This mechanism, called scaffolding [16, 2, 9], is pedagogically sound and beneficial to beginning students in the subject, because it helps them learn how to systematically analyze a complex problem. After some experience, students internalize these steps, and the best pedagogy changes from requiring explicit demonstration of each and every step to allowing the most basic of these steps to be performed implicitly. In fact, continuing to require explicit demonstration of these basic steps often frustrates the students, making the task more tedious than instructional. At some point, the scaffolding should be relaxed by the tutoring system.

Removing the scaffolding puts a greater burden on both the student and the tutoring system. The student must do more on their own without feedback from the tutor, and the system must now interpret answers that may be in a different sequence or may have incorporated some basic assumptions. This is especially true when there are many ways to specify the solution, as when there are multiple equation sets that correctly describe the physics of the problem, represented in many equivalent forms with differing numbers of equations and variables. In addition, students can use any one of many different variable names to refer to a single physical property.
Tutoring systems must be able to infer the properties that are referred to by the variables in a set of equations before they can evaluate the correctness of the equations.

This paper describes our continuing work on developing approaches that reduce an ITS's reliance on scaffolding. In particular, we examine issues of identifying the meaning of variables in equation sets that solve college-level introductory physics problems. Our initial techniques worked for a small set of problems and on a small number of students. Subsequent analyses of a larger corpus showed that improvements were needed, and the matching techniques were extended as described in Section 4.1.1. The improved technique uniquely determines the dimensions of all the variables in 91–92% of the sets of equations. By asking for dimension information about one variable, an additional 3% of the sets could be determined. Earlier descriptions of this work can be found in Refs. 6, 7, 10, 11, 8, 9. Some of the results have been reported in Refs. 6, 11, 9.

These analyses show that a physics tutoring system can relax scaffolding and still reliably and robustly determine the dimensions of the variables used in the equations. This knowledge can in turn be used to identify the physical quantities corresponding to student-chosen variables used in their equations, and to associate them with a canonical solution set of variables and equations. We plan to build on these results to identify the physical concepts employed by students to solve a problem and to provide effective feedback when irrelevant concepts are used, or relevant concepts are omitted or used incorrectly.

2. Algebraic Physics Problems

Physics uses sets of algebraic equations to specify the interrelations of a set of physical quantities. One of the main differences between generic algebraic equations and algebraic equations describing a relationship in physics is that the latter must be dimensionally consistent.
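To make the notion of dimensional consistency concrete, here is a minimal Python sketch for illustration only; it is not the authors' implementation. It assumes each dimension is stored as a tuple of exponents in the order (distance, mass, time, charge, temperature), the ordering the paper introduces in Section 4.1. The helper names `times` and `plus` are hypothetical.

```python
# Exponent tuples in the assumed order (distance, mass, time, charge, temperature).
MASS = (0, 1, 0, 0, 0)
ACCELERATION = (1, 0, -2, 0, 0)   # m / s^2
FORCE = (1, 1, -2, 0, 0)          # kg * m / s^2


def times(a, b):
    """Multiplying two quantities adds their dimension exponents."""
    return tuple(x + y for x, y in zip(a, b))


def plus(a, b):
    """Adding (or equating) two quantities requires identical dimensions."""
    if a != b:
        raise ValueError(f"dimension mismatch: {a} vs {b}")
    return a


# F = m * a is consistent: both sides carry the dimensions of force.
consistent = plus(FORCE, times(MASS, ACCELERATION)) == FORCE

# Adding a force to an acceleration is rejected.
try:
    plus(FORCE, ACCELERATION)
    mixed_ok = True
except ValueError:
    mixed_ok = False
```

When every variable's dimension is known in advance, the check reduces to this kind of bookkeeping; the harder case, where dimensions must be inferred, is the subject of the rest of the paper.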
Two algebraic equations in physics are shown below.

$$T - m_1 * g = m_1 * a_1 \qquad (1)$$

$$a_1 = -a_2 \qquad (2)$$

Algebraically speaking, these equations could be added to one another to form a new equation. Physically speaking, equations describe the constraints between quantities given by the laws of physics, while variables represent physical properties of an object or a system of objects. Consequently, each of the variables, constants, terms, expressions, and even equations has specific dimensions and can only be combined using dimensionally consistent operations. For example, equation 1 is likely to have the dimensions of force (i.e., $\mathrm{kg \cdot m/s^2}$) while equation 2 would have dimensions of acceleration (i.e., $\mathrm{m/s^2}$). It would be incorrect to add these equations, since that operation would violate dimensional consistency.

As a first step in deciding if an algebraic equation in physics is correct, a system can check if the equation is dimensionally consistent. This is analogous to verifying that the syntax of a program is correct by verifying that the type of each variable is consistent with the operations on that variable. This is a straightforward check if the meanings of the variables are known; if not, it provides constraints on what they can signify.

2.1. Issues in Removing the Scaffolding

Removing the scaffolding imposes an additional computational requirement on tutoring systems. We illustrate this with an example problem based on Atwood's machine, a pulley with two masses, $m_1$ and $m_2$, hanging at either end, as shown in Figure 1.

Fig. 1. Atwood's Machine

A common problem based on Atwood's machine asks the student for the equation(s) that would determine the acceleration of the mass $m_1$, assuming that $m_1$ and $m_2$ are not equal. Equations 3 through 6 represent one solution to the problem using variable set (i) in Figure 2.

Fig. 2. Different variable sets describing the solution (panels (i), (ii), and (iii)).

Forces/Acceleration on Block 1: $T_1 - m_1 * g = m_1 * a_1 \qquad (3)$
Forces/Acceleration on Block 2: $T_2 - m_2 * g = m_2 * a_2 \qquad (4)$
Tension in rope: $T_1 = T_2 \qquad (5)$
Acceleration of connected blocks: $a_1 = -a_2 \qquad (6)$

From a pedagogical standpoint, physics instructors teach beginning students that the steps involved in solving problems of this type are:

(1) variable definition: Each variable is defined with the object(s) and properties to which it refers. In some cases, the time when this variable is applicable is also defined.
(2) identification of physics laws: Each applicable physics law, e.g., force balance or conservation of momentum, must be identified and the objects to which it applies must be specified.
(3) instantiation of physics laws: The general physics laws are stated as equations with "textbook" variables. Each variable specified in the first step is substituted as appropriate for the textbook variables. The result is an equation or system of equations sufficient to solve for all unknowns in the current problem.
(4) solving the equation set: The algebraic manipulations are performed to solve for the required variables. Our tutor does not address this step.

As students become accustomed to the vocabulary of the domain, they start using problem solving "shortcuts". Instead of defining each variable explicitly, the students select from a dictionary of well-known physics variables to represent the properties that they desire. For example, in Newtonian mechanics, variables beginning with m typically represent masses, variables beginning with a typically represent accelerations, and variables beginning with T may represent tensions (i.e., forces). Thus the naming of a variable implicitly specifies possible dimensions or properties, and the subscripts of each variable specify the object(s) to which the variable refers. For example, $m_1$ and $a_1$ would refer to the mass and acceleration of the same object, while $p_{1,t_1}$ might refer to the momentum of object 1 at time $t_1$.

When the scaffolding is removed, the tutoring system must be able to determine the context of the system of equations. For example, rather than describe the Atwood's machine problem using equations 3 through 6, the student might choose to use a single variable $a$ to represent acceleration and a single $T$ for the tension, implicitly using the principle that equates $T_1$ and $T_2$, and the constraint $a_1 = -a_2$, which comes from the fixed length of the cord. Variable set (ii) in Figure 2 identifies the variables used with such an approach. The resulting equations are shown below.

$$T - m_1 * g = m_1 * a \qquad (7)$$

$$T - m_2 * g = -m_2 * a \qquad (8)$$

The tutor must determine that (1) the variable $a$ has the dimensions of acceleration ($\mathrm{m/s^2}$), (2) the single variable is mapped to the acceleration of object 1, and (3) the acceleration of the other object is replaced by an algebraic substitution using Eq. 6. The system must make similar determinations for the tensions.

The two issues that a tutoring system must address when scaffolding is removed are (1) identification of the dimensions and therefore the properties of each variable and (2) identification of the object(s) that the variables refer to. In this paper, we focus on the first issue, that of determining the dimensions of each variable. Our preliminary work in addressing the second issue, that of mapping the variables to objects, is described in Ref. 11.

3. Prior Work

Checking for dimensional consistency is an important first step for a physics tutoring system, as it can then focus on reasoning about dimensionally correct equations only. Existing systems, e.g., ANDES [3, 14, 4] and PHYSICS-TUTOR [6], require that the dimensions of each variable and constant be known a priori, either through a knowledge base of variables and constants or by having the student define them. Once these dimensions are known, it is a fairly simple step to determine if the equation is dimensionally consistent by using standard dimensional analysis.

There are many systems that use constraint propagation to ensure consistency of values of variables. Examples of such systems include VEXED [15] and OPIS [13]. Their use of constraint propagation is similar to ours except that they propagate values and not dimensions.

There has also been some work done on adding dimension specifications to programming languages to support compile-time [12, 5] and run-time [1] detection of dimension errors. These systems are similar to strongly typed programming languages where every variable has to be defined and has a type. Our system is analogous to a weakly typed language where variables are partially defined on first use and their types are inferred from the context.

4. Dimension Check Algorithm

In an earlier paper [10], we described an approach for determining the dimensions of every variable in an algebraic equation. The earlier version of the technique combined the use of a knowledge base of commonly used physics variables and constants with constraint propagation. A constraint graph is built in which variables in the equation are instantiated as leaf nodes, and operators (e.g., +, -, *, /, =) and functions (e.g., cos, sin, tan) are instantiated as internal nodes. The value at each node represents the set of possible dimensions for that node.
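A constraint tree of this kind can be sketched in a few dozen lines of Python. The sketch below is illustrative only and is not the authors' implementation. It assumes dimensions are exponent tuples ordered (distance, mass, time, charge, temperature), as in Section 4.1, and it uses a toy knowledge base `KB` with made-up candidate sets keyed by the first letter of a variable name. The names `Node`, `restrict`, `propagate`, and `solve` are hypothetical.

```python
# Exponent tuples ordered (distance, mass, time, charge, temperature).
LENGTH = (1, 0, 0, 0, 0)
MASS = (0, 1, 0, 0, 0)
TIME = (0, 0, 1, 0, 0)
TEMPERATURE = (0, 0, 0, 0, 1)
ACCEL = (1, 0, -2, 0, 0)
FORCE = (1, 1, -2, 0, 0)
ENERGY = (2, 1, -2, 0, 0)

# Hypothetical toy knowledge base keyed by the first letter of a variable name.
KB = {
    "T": {FORCE, TIME, ENERGY, TEMPERATURE},
    "m": {MASS, LENGTH},
    "a": {ACCEL, LENGTH},
    "g": {ACCEL},
}


class Node:
    """A leaf (name is set) or a binary operator node (op, left, right are set)."""

    def __init__(self, op=None, left=None, right=None, name=None):
        self.op, self.left, self.right, self.name = op, left, right, name
        known = KB.get(name[0]) if name else None
        # dims is a set of candidate tuples; None means "nothing known yet".
        self.dims = set(known) if known else None


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def combos(f, xs, ys):
    """All results of f over two candidate sets, or None if either is unknown."""
    if xs is None or ys is None:
        return None
    return {f(x, y) for x in xs for y in ys}


def restrict(node, allowed):
    """Drop candidates of node that are not in allowed; report any change."""
    if allowed is None:
        return False
    new = set(allowed) if node.dims is None else node.dims & allowed
    changed = new != node.dims
    node.dims = new
    return changed


def propagate(node):
    """One bottom-up pass over the tree; returns True if any set changed."""
    if node.name:
        return False
    l, r = node.left, node.right
    changed = propagate(l) | propagate(r)
    if node.op in ("+", "-", "="):      # node and children share one dimension
        changed |= restrict(node, l.dims) | restrict(node, r.dims)
        changed |= restrict(l, node.dims) | restrict(r, node.dims)
    elif node.op == "*":                # exponents add
        changed |= restrict(node, combos(add, l.dims, r.dims))
        changed |= restrict(l, combos(sub, node.dims, r.dims))
        changed |= restrict(r, combos(sub, node.dims, l.dims))
    elif node.op == "/":                # exponents subtract
        changed |= restrict(node, combos(sub, l.dims, r.dims))
        changed |= restrict(l, combos(add, node.dims, r.dims))
        changed |= restrict(r, combos(sub, l.dims, node.dims))
    return changed


def leaves(node):
    return [node] if node.name else leaves(node.left) + leaves(node.right)


def solve(root):
    """Apply operator constraints, then identity constraints, to a fixed point."""
    while True:
        while propagate(root):
            pass
        changed = False
        for a in leaves(root):          # identity: same name, same candidates
            for b in leaves(root):
                if a.name == b.name:
                    changed |= restrict(a, b.dims)
        if not changed:
            return {leaf.name: leaf.dims for leaf in leaves(root)}


def var(name):
    return Node(name=name)


# T1 - m1 * g = m1 * a1
tree = Node("=",
            Node("-", var("T1"), Node("*", var("m1"), var("g"))),
            Node("*", var("m1"), var("a1")))
result = solve(tree)
print(result)
```

With this toy knowledge base, each candidate set in `result` narrows to a single tuple, with `T1` resolved to a force. An empty set at any node would indicate a dimensional inconsistency. Candidate sets only ever shrink, so the loop terminates.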
The algorithm uses a knowledge base of commonly used physics variables and constants with constraint propagation. The knowledge base determines the possible dimensions of each variable. There are usually multiple possible dimensions for a variable. For example, p can properly be used to represent a momentum [$\mathrm{kg \cdot m/s}$], an object distance [m] in optics, a pressure [$\mathrm{kg/(m \cdot s^2)}$], or an electric dipole moment [$\mathrm{C \cdot m}$]. It might also be a variable which should have been called P, representing a power [$\mathrm{kg \cdot m^2/s^3}$], a probability [dimensionless], or a probability per unit distance or volume [$\mathrm{m^{-1}}$ or $\mathrm{m^{-3}}$].

Constraint propagation is used to propagate dimension information to other terms and literals to (1) infer dimension information and (2) determine dimensional consistency. The algorithm can take partial information about the dimensions of a variable and combine that with knowledge of operators and functions (which are just operators) to completely determine dimensions. In essence, knowledge, even incomplete knowledge, propagates from one part of the equation to another. This permits the algorithm to reason about dimensional consistency when the variables are not explicitly defined.

This section describes how the algorithm checks for dimensional consistency in equations. The checks are performed in a series of steps as described below:

(1) set up the constraint tree for each equation.
(2) using established naming conventions, determine the possible dimensions of the variables and thus the dimensions of the leaf nodes in the constraint tree.
(3) propagate values in the constraint tree and determine the consistency of each tree.
(4) enforce consistency constraints on the dimensions of all leaf nodes corresponding to a single variable, and propagate the constraints throughout the resulting overall graph.
(5) generate feedback to the user.

4.1. Setting Up The Graph of Constraints

The system establishes the constraint graph by setting up a binary constraint tree for each equation and represents the dimensional possibilities for each node. Each interior node in the graph represents an operator (e.g., =, +, -, *, /) or a function (e.g., sin, cos). The leaves of the tree represent each instance of a variable or a constant in the equation. If a variable occurs twice in one equation or in two separate equations, there will be two separate nodes in the constraint network labeled with that variable. This allows our system to work with multiple instantiations of a variable and, if the equation set is found to be inconsistent, to pinpoint the specific instance that is at fault.

To maintain consistency for each variable within the system of equations, an identity constraint is added. These constraints connect all nodes that are instances of the same variable, even if they occur in different equations, restricting these nodes to have the same set of dimensions.

The edges of the constraint graph connect nodes that affect each other directly. The value at each node represents the set of dimensional values that are consistent with the values of the nodes connected to it. Each member of the set is a five-tuple specifying the exponents of each dimension. The tuple is ordered as <distance, mass, time, charge, temperature>. For example, a dimensionless variable has a value <0, 0, 0, 0, 0>, a variable m for mass will have a value of <0, 1, 0, 0, 0>, and a variable a for acceleration (i.e., $\mathrm{m/s^2}$) will have a value of <1, 0, -2, 0, 0>.

4.1.1. Initializing Values

Once the constraint trees have been constructed, the system attempts to obtain initial dimension values for the leaf nodes, i.e., the variables and constants. The algorithm uses a knowledge base of commonly used variables and constants along with their dimensions. It may be that there is either no mapping available or that there is more than one mapping. In these cases, the system will either leave the specific dimension blank or set the node to reflect that more than one dimension is possible.

Figure 3(a) shows an example of a constraint tree after the leaf nodes have been initialized with possible values. The information in the knowledge base shows that T typically represents tension with dimensions [$\mathrm{kg \cdot m/s^2}$], or time with dimension [s], or kinetic energy with dimension [$\mathrm{kg \cdot m^2/s^2}$], or temperature with dimension [K], and therefore assigns a set of these four dimension possibilities to the variable $T_1$.

The knowledge base supports three types of matches on variable names. Each entry in the knowledge base consists of (1) a string, (2) a set of dimensions, (3) a category, and (4) a type of match. The three types of matches are:

• prefix match: Any variable name whose prefix matches the string of an entry in the prefix knowledge base will have the associated set of dimensions as a possibility. For example, the variable alp will prefix match with the entry a and will have dimensions associated with acceleration as one of the valid possibilities.
• pre-emptive match: Any variable name whose prefix matches the string of an entry in the pre-emptive knowledge base will pre-empt any prefix matches.
  For example, the variable alpha1 will pre-emptively match with alpha and have radians as one of the possible dimensions. This match will also remove acceleration (and any other prefix matches) from the list of possibilities.
• exact match: Any variable name that matches exactly with the string of an entry in the exact-match knowledge base will have the associated set of dimensions. This match overrides and excludes all other matches. For example, the variable G will have the dimensions of the universal gravitational constant and the match will remove all prefix or pre-emptive matches with G.
  The variable G1, however, will not be an exact match.

The extended matching capability provided us with ways to specify preferences amongst the different possible matches for a variable and proved reliable and robust. Students could employ non-standard variables and make discovery of dimensions difficult or impossible; however, our experience has shown that this is seldom the case and that students almost always follow the conventions established in the field and used in texts and by instructors.

Fig. 3. Constraints Propagating through the Tree for $T_1 - m_1 * g = m_1 * a_1$: (a) initial values; (b) constraints propagated to * nodes; (c) constraints propagated upwards; (d) fully resolved.

4.2. Propagating Constraints

Once the initial dimension possibilities for the leaf nodes have been established, dimension information is propagated to determine if each equation is dimensionally correct. The system uses a few simple rules to propagate and infer dimensions. The rules for reasoning about dimensions are listed below:

(1) If a node represents an additive operator (+, -, =), then the dimension of this node and its children must be the same.
(2) If a node represents a trigonometric function (sin, cos), then the node and its child are dimensionless.
(3) If a node represents a multiplication operator (*), then the dimension of this node is the component-by-component sum of the dimensions of its children. Figure 3(b) shows the effect of propagating dimensions at leaf nodes to the two interior nodes representing multiplication (i.e., labeled with a *).
(4) If a node represents a division operator (/), then the dimension of this node is the result of subtracting the dimension (component by component) of the right child (i.e., the denominator) from the dimension of the left child (i.e., the numerator).
(5) The dimension possibilities of each node are repeatedly checked to assure that all possible dimensions are consistent with the possible dimensions of their parents and children.
    If one of the possible dimensions is not consistent, it is removed. This process is repeated until a fixed point is found or until an inconsistency has been revealed.
(6) Nodes with unknown dimensions acquire them as necessary to maintain dimensional consistency.

Figure 3(c) shows the first step of propagating constraints to the nodes labeled - and =. The constraints associated with these nodes require that the node and its children have the same dimension and therefore tightly constrain the overall tree. Figure 3(d) shows the final state after all constraints have been propagated to a fixed point. When fully resolved, there is a single assignment of dimensions to variables consistent with the equation, and $T_1$ is identified as a tension [$\mathrm{kg \cdot m/s^2}$].

Once all constraints due to mathematical operators have been satisfied, the system proceeds to impose identity constraints. If an inconsistency is found while processing mathematical operators, this can be brought to the student's attention. Identity constraints are introduced only after the other constraints have been satisfied. Delaying the application of identity constraints guarantees that local sources of inconsistency will be identified early in the evaluation. Consistency constraints are iteratively applied until the entire graph is stable or an inconsistency is found.

5. Evaluation

The algorithm was evaluated on three data sets. Each data set has slightly different characteristics and can be described as follows:

• The Lafayette Data Set: The data set consists of approximately 350 answers to four physics problems from 88 different students in an introductory physics course for engineers and science majors at Lafayette College.
  The problems varied in difficulty from the example discussed in this paper (Section 2.1) to a problem involving an accelerating pulley. There were no restrictions on the types of variables that the students could use, although they were discouraged from performing algebraic simplification of their equations. The students were given the questions and asked to write their answers on sheets of paper that were later transcribed into an electronic form. In addition, the students were not required to define or explain any of the variables that they chose to use.
• The ANDES 2000 Data Set: The ANDES system is also a tutoring system for introductory college-level physics. It has a large database of problem types and is in current use at the United States Naval Academy. Logs of student answers and tutor responses have been maintained since the initial introduction of the ANDES system. We extracted the student answers from one semester (Fall 2000) and used them to evaluate our system. The key features of this data set (and of the ANDES system) are:
  – large database of problems and problem types: The ANDES system has a repository of approximately one hundred problems. These problems are much more diverse than the ones used to generate the Lafayette data set.
  – large number of equation sets: The analyzed ANDES data contained 9,865 equation sets in 6,000 logs. These logs were created by many students, each of whom worked on many problems. Our analysis does not group equation sets by either student or problem but rather treats all 9,865 equation sets as a single corpus. The system recorded answers, including partial answers, making the number of equation sets larger than the number of logs. Many of these equation sets contain incomplete answers, i.e., the student did not enter all the equations needed to solve the problem. We accepted any student equation that was correct, that is, consistent with the problem as stated, and otherwise imposed no constraints on the equations accepted.
  – variables are explicitly defined before use: The ANDES framework requires that the students define all variables before they can be used in equations and provides a graphical interface to help them with this step. Our analysis does not use this information, but the fact that the student was required to give it may have affected the inputs.
  – use of numeric values: The questions in ANDES are given in terms of explicit numerical quantities and require numeric answers. While students were strongly encouraged to generate complete algebraic solutions before substituting numeric values to arrive at the answer, students frequently use numeric values in place of variables at earlier stages. In this data set, units were not required, so all numbers were treated as having unknown dimensions.
• The ANDES 2001 Data Set: ANDES 2001 was an enhanced version of ANDES that allowed the specification of dimensions for constants. The ANDES 2001 data set is very similar to the ANDES 2000 data set, but it contains dimension information on student-supplied constants.

The different properties of the three data sets allowed us to evaluate the performance with (1) unconstrained user input (Lafayette), (2) a large class of problems (Andes 2000, Andes 2001) and (3) hints from the student (Andes 2001).

5.1. Experimental Results

We used the data from the experiments to evaluate the technique along several directions.

• Correctness: Since our technique is based on a heuristic match using a knowledge base, one important question is "How often does the technique return an incorrect answer?"
• Effectiveness: Our initial goal was to remove the need for students to explicitly identify every variable that they used. The effectiveness is determined by the number of equation sets where the technique could uniquely determine the dimensions of all variables. We also measured in how many cases one clarification from the student would have sufficed.
• Generality: Our earlier work was evaluated on a small number of problems. The later set of experiments uses a much larger set of problems and hence tests a much wider set of variable types. The experiments should also determine what types of problems are problematic for the technique.
• Robustness: How well will the technique perform on incomplete sets of equations? The data from both the Lafayette and ANDES experiments includes incomplete equations submitted by the students. If the technique does not work well on incomplete sets of equations, then the system would not be able to provide feedback to a student who needed help to generate the remaining equations.

5.1.1. Results from the Lafayette Data Set

Dimensional inconsistencies occurred in approximately 15% of the students' answers and the errors were all detected by our algorithm. Our original algorithm failed to disambiguate only 5% of the submitted answers (two to three answers for each problem). A tutor would need to ask the student a question about the meaning of a variable to disambiguate. The evaluation of this data set showed that the technique was correct, effective and robust for this small sample of answers. It was robust in that the students were not constrained in the type of answers they could write down (see footnote a).

5.1.2. Results from the ANDES 2000 Data Set

An initial evaluation showed problems that were not revealed with the smaller Lafayette data set. Many equation sets had more than one set of possible dimension assignments for the set of variables.
We observed that because we were using possible concepts from all of physics, including electricity and magnetism and modern physics (which were not covered in the Andes problems), the range of choices of dimensionality was often very large and the constraints were often insufficient to uniquely determine the correct choice.

This problem was addressed by (1) splitting the knowledge base into broad subfields of relevance and (2) adding a more powerful three-level matching capability, as described in Section 4.1.1, to the knowledge base. The knowledge base was partitioned into major subfields, such as Newtonian mechanics, electricity and magnetism, and modern physics, and the ANDES problems were annotated to specify that they were problems in Newtonian mechanics.

The results are shown in Table 1 (see footnote b). We found that in 80.5% of the equation sets the dimensionality of all variables was uniquely determined. In 3.2% of the cases we found that exactly one variable was ambiguous, so that with at most one clarifying question to the student we could uniquely determine the dimension of all variables in 83.8% of the cases. Of the remaining 16.2% of the cases, 13.9% had more than one ambiguous variable and 2.4% were found to be dimensionally inconsistent. The variable-matching knowledge base that we used had 109 entries and contained information covering all of Newtonian mechanics, the area from which the analyzed corpus was obtained.

Table 1. Evaluation of the ANDES 2000 data.

Equation Set Property              Number in Corpus   Percent of Corpus
No ambiguous variables             8022               80.5%
One ambiguous variable             320                3.2%
Two or more ambiguous variables    1381               13.9%
Inconsistent dimensions            237                2.4%

Footnote a. Students were free to use any naming convention for variables and were not asked to define them. If a definition was given, it was ignored in our analysis. Standard naming conventions were employed on all student submissions even though this was not required.

Footnote b. These results differ from those reported in Ref. 9, because in the interim we discovered several acceptable variable assignments, such as A for area and V for volume, which had been inadvertently left out of our knowledge base. Because Andes is not case sensitive, these possibilities, though little or never used by the Andes students, introduce additional ambiguities when students used v and a for velocity and acceleration.

As described earlier, the ANDES system permits the students to use numeric values in place of variables. In 2000, Andes did not check that correct dimensional information was included. Thus a student might enter 9.8 instead of g for the acceleration of gravity. Consequently, constants can sometimes have unstated dimensions, and the system has to treat each constant initially as having all dimension possibilities instead of as dimensionless constants. This proved to be a source of much of the ambiguity.

Thus we see that our technique was correct, reasonably effective, general and robust on a large set of problems over a large group of students. But we expected that we could do better if we could make use of dimensional information provided by the students, such as in an equation like $g = 9.8\ \mathrm{m/s^2}$. We tested our method on equation sets which contained only correct equations, as we were not prepared to examine by hand whether a failure to find the dimensionality of the variables was due to our inadequacy or to student mistakes. Andes in 2000 did not check units, but Andes in the fall of 2001 did. Thus we turned to the 2001 Andes corpus.

5.1.3. Evaluation of Fall 2001 Data Set

The experimental results from the ANDES 2001 data set were very informative.
The main difference between this data set and the one from fall 2000 was Andes' additional capability of analyzing dimension specifications. Students were asked to specify the dimensions of constants in their equations. For this analysis, we used the students' dimensions whenever provided. The summary results (Table 2) show that using the dimensions provided by the student improved the rate of unambiguous determination from 80.5% to 92.0%, and over 95% of the sets could be resolved with at most one question.

Table 2. Evaluation of the ANDES 2001 data.

Equation Set Property              Number in Corpus   Percent of Corpus
No ambiguous variables                         9737               92.0%
One ambiguous variable                          319                3.0%
Two or more ambiguous variables                 325                3.1%
Inconsistent dimensions                         200                1.9%

To check that this improvement over 2000 was due to student-supplied dimensions, we ran our method on the 2001 logs in the 2000 mode, ignoring user-supplied dimensions and ignoring all variables which occur only in statements giving their numerical value, such as $L = 2.3\ \mathrm{m}$. (If we ignore user-supplied units, we can extract no information about L from such a statement, so if L appears nowhere else in the equations, it is irrelevant and indeterminate.)

We evaluated the performance of our methods based on whether (1) user-specified dimensions were used or ignored and (2) the system used a knowledge base that mapped variables to all physics quantities or only to concepts from Newtonian mechanics. The results are shown in Table 3.

Table 3. Breakdown of results from the ANDES 2001 data.
KB       User Dimensions   No amb var      1 amb var     ≥ 2 amb var     Inconsistent
full     no                6998 (70.9%)    189 (1.9%)    2629 (26.6%)     52 (0.5%)
newton   no                7958 (80.7%)    213 (2.2%)    1612 (16.2%)     83 (0.8%)
full     yes               9663 (91.3%)    339 (3.2%)     426 (4.0%)     152 (1.4%)
newton   yes               9737 (92.0%)    319 (3.0%)     325 (3.1%)     200 (1.9%)

Without user-specified dimensions, the technique performed nearly identically on the 2001 data set and on the 2000 set. Not surprisingly, we see that when students are required to include dimensions or when the class of problems is restricted to a specific domain of physics (i.e., Newtonian mechanics) our methods give unique dimensional identifications more often. Further, we see that user-supplied dimensions are much more effective at disambiguating than is restricting the knowledge base to one domain of physics.

From the third column in Table 3, giving the fractions which our methods unambiguously resolve, we see that only 70.9% of the corpus is unambiguous when the knowledge base for all of physics is used and no dimension information is provided by the student. When the knowledge base is restricted to only consider Newtonian mechanics, and user dimensions are accepted, this increases by 21.1 percentage points to 92.0%. Almost all of this increase (i.e., 20.4 out of a total of 21.1 percentage points) can be obtained using only user-supplied dimensions. This indicates that the dimensionality of variables can be inferred without knowing the domain of physics to which the problem belongs, and that user-supplied dimensions are a key component of an ITS working with algebraic physics equations.

It is also instructive to examine the 200 equation sets (2% of the corpus) which Andes accepted as correct but which our heuristic methods declared inconsistent. We examined these by hand and found they can be partitioned into four groups.

58 inconsistencies were caused by equations that Andes accepted though instructors would have marked them as wrong.
Two such examples are $v = 0\ \mathrm{m/s^2}$ and $4.5\ \mathrm{J} = 0.5\ \mathrm{kg} * 3\ \mathrm{m/s^2}$. Our system rejects the first equation since v cannot represent a concept with dimensions $\mathrm{m/s^2}$ (acceleration). Andes ignores any dimension information when the numeric value is zero, therefore ignores the dimension information following the 0, and accepts this equation. Our system rejects the second equation since energy (i.e., J) does not have the dimensions of $\mathrm{kg \cdot m/s^2}$. Andes interprets the input as $(0.5\ \mathrm{kg}) \times (3\ \mathrm{m/s})^2$ (notice the location of the additional parentheses) and accepts this equation as well.

60 inconsistencies were caused by equation sets with ambiguous inputs for which our parser found a correct, but unintended, parse. Our parser is a standard deterministic parser, and when an input is ambiguous (i.e., has more than one parse) it considers only one of the possible parses. Andes found a different, dimensionally consistent, parse than our system did. Had our parser found the parse Andes used, the equation sets would have been determined to be consistent. This indicates an inadequacy in our parser, but not in our method of determining the meaning of variables.

68 inconsistent sets are attributable to decisions made by our method regarding students' choice of variable names for angles. Many of these may have been caused by the awkward handling of Greek letters in the Andes interface.

The remaining 14 inconsistencies of the 9727 sets in the corpus were caused by inexplicable choices, such as the student who used j for distance and g for speed.

From our analysis, a tutor based on our methods would resolve a student's intent on 99% of the submissions and require a dialogue on less than 1% of the submissions. Of course such dialogues would discourage students from using unconventional name choices, which is, overall, a good thing.

6. Discussion and Future Work

We see that dimensional consistency allows a tutor to recognize the types of student variables in a large fraction of equation sets. A tutor, however, needs to provide useful feedback when presented with erroneous responses, and in particular, to point to the source of dimensional inconsistency. As the student answers are sets of equations, this is not always straightforward. For example, consider the student solution $L = I\omega$; $I = \frac{1}{6} m L^2$. The first equation has only one consistent dimensional interpretation, with L an angular momentum [$\mathrm{kg \cdot m^2/s}$]. If this conclusion is inserted into the second equation, it will be declared dimensionally inconsistent, even though that equation is correct if L instead represents the length of the side of the rotating square. We would like a tutor to point to the inconsistent use of L rather than to say that either equation is inconsistent.

The solution is to propagate information in two steps, first only within each equation until the system is quiescent and then between equations. This allows the system to isolate errors within equations before errors of inconsistency between equations. There are similar problems with inconsistencies of usage between different terms of a single equation. We intend to evaluate the effectiveness of these heuristics and to develop additional ones as necessary to ensure that users are pointed in the right direction when they make mistakes.

7. Conclusion

This paper has shown how domain knowledge combined with heuristic constraint propagation can be used to determine the context and implicit information contained in student answers, specifically the dimensions of variables in systems of equations. This approach has been tested and evaluated on answers from students at two institutions. The results show that the technique uniquely determined the dimensions of all the variables in 91–92% of the sets of equations.
By asking for dimension information about one variable, an additional 3% of the sets can be determined.

Scaffolding is a technique that is useful and helpful to beginning students. After some experience, students would benefit from having the scaffolding removed. The experiments validate the hypothesis that our technique allows us to remove the scaffolding from a physics tutoring system and still determine the dimensions of the variables used in the equations. A tutorial system that accepts equations from students without requiring them to explicitly define each variable used can make very effective use of physical dimensionality constraints and standard variable naming conventions. These can help identify the physical quantities corresponding to each student-chosen variable name. The heuristics rarely lead to mistaken assumptions, and in most cases completely determine the dimensionality subclass to which each variable belongs. This encourages us to continue to the next step: given a problem statement and thereby a canonical set of variables relevant to the problem, to associate each student variable with the physics concept it represents first, and then to try to determine the corresponding member of the canonical set.

8. Acknowledgments

We are grateful to Kurt VanLehn and the Andes group for making the ANDES logs available to us, and to Anders Weinstein for answering questions about the ANDES implementation. Jim Appenzeller and Dave Santin provided invaluable help in the implementation and evaluation of the system.

References

1. Cunis, R. A package for handling units of measure in Lisp. ACM Lisp Pointers 5, 2 (1992).
2. Fretz, E. B., Wu, H.-K., Zhang, B., Krajcik, J. S., and Soloway, E. A further investigation of scaffolding design and use in a dynamic modeling tool. In Proceedings of the American Education Research Association Conference (2002).
3. Gertner, A., and VanLehn, K. Andes: A coached problem solving environment for physics. In Proceedings, 5th International Conference, ITS 2000 (Montreal, Canada, June 2000), Springer.
4. Gertner, A. S. Providing feedback to equation entries in an intelligent tutoring system for physics. In Proceedings of the 4th International Conference on Intelligent Tutoring Systems (1998).
5. Hilfinger, P. N. An Ada package for dimensional analysis. ACM Transactions on Programming Languages and Systems 10, 2 (1988), 189–203.
6. Liew, C., Shapiro, J. A., and Smith, D. Reasoning about algebraic answers in physics. In Proceedings of Twelfth International Florida AI Research Society Conference (1999), pp. 167–171.
7. Liew, C., Shapiro, J. A., and Smith, D. What is wrong with this equation? Error detection and feedback with physics equations. In Proceedings of Thirteenth International Florida AI Research Society Conference (2000).
8. Liew, C., Shapiro, J. A., and Smith, D. Identification of variables in model tracing tutors. In Proceedings of 11th International Conference on AI in Education 2003 (July 2003), IOS Press, pp. 464–466.
9. Liew, C., Shapiro, J. A., and Smith, D. Inferring the context for evaluating physics algebraic equations when the scaffolding is removed. In Proceedings of Seventeenth International Florida AI Research Society Conference (2004).
10. Liew, C., and Smith, D. Checking for dimensional correctness in physics equations. In Proceedings of Fifteenth International Florida AI Research Society Conference (2002).
11. Liew, C. W., and Smith, D. E. Reasoning about systems of physics equations. In Intelligent Tutoring Systems: ITS 2002 (2002), Cerri, Gouardères, and Paraguaçu, Eds., Lecture Notes in Computer Science, Springer-Verlag. (LNCS 2363).
12. Novak, G. S. Conversion of units of measurement. IEEE Transactions on Software Engineering 21, 8 (August 1995), 651–661.
13. Ow, P. S., and Smith, S. F. Towards an opportunistic scheduling system. In Proceedings of 19th Hawaii International Conference on System Sciences (1986).
14. Schulze, K., Shelby, R., Treacy, D., Wintersgill, M., VanLehn, K., and Gertner, A. Andes: An intelligent tutor for classical physics. The Journal of Electronic Publishing 6, 1 (2000).
15. Steinberg, L. Design as refinement plus constraint propagation: The VEXED experience. In Proceedings of AAAI-87 (1987).
16. VanLehn, K., Freedman, R., Jordan, P., Murray, C., Osan, R., Ringenberg, M., Rosé, C., Schulze, K., Shelby, R., Treacy, D., Weinstein, A., and Wintersgill, M. Fading and deepening: The next steps for Andes and other model-tracing tutors. In Proceedings of Fifth International Conference on Intelligent Tutoring Systems (ITS) (2000), G. Gauthier, C. Frasson, and K. VanLehn, Eds.
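
The two-step idea from Section 6 (resolve each equation on its own first, then compare conclusions between equations) can be illustrated with a small sketch. This is not the authors' implementation: it swaps their heuristic constraint propagation for brute-force enumeration, and the tiny knowledge base `KB`, the candidate concepts in it, and the helper names `term_dim`, `consistent_concepts` and `diagnose` are all hypothetical. Equations are restricted to the form monomial = monomial, numeric factors such as 1/6 are treated as dimensionless and dropped, and `w` stands in for ω.

```python
from itertools import product

# Hypothetical mini knowledge base: variable name -> {concept: (kg, m, s) exponents}.
KB = {
    "L": {"length": (0, 1, 0), "angular momentum": (1, 2, -1)},
    "I": {"moment of inertia": (1, 2, 0), "intensity": (1, 0, -3)},
    "w": {"angular velocity": (0, 0, -1), "weight": (1, 1, -2)},
    "m": {"mass": (1, 0, 0)},
}

def term_dim(term, assignment):
    """Dimension of a monomial {var: power} under an assignment {var: concept}."""
    total = (0, 0, 0)
    for var, power in term.items():
        dim = KB[var][assignment[var]]
        total = tuple(t + power * d for t, d in zip(total, dim))
    return total

def consistent_concepts(equation):
    """Step 1: concepts each variable may take so that this one equation balances."""
    lhs, rhs = equation
    variables = sorted(set(lhs) | set(rhs))
    allowed = {v: set() for v in variables}
    for choice in product(*(KB[v] for v in variables)):
        assignment = dict(zip(variables, choice))
        if term_dim(lhs, assignment) == term_dim(rhs, assignment):
            for v in variables:
                allowed[v].add(assignment[v])
    return allowed

def diagnose(equations):
    """Step 2: intersect per-equation conclusions and report where they clash."""
    per_eq = [consistent_concepts(eq) for eq in equations]
    # Equations with no balancing assignment at all are wrong in isolation.
    bad_equations = [i for i, a in enumerate(per_eq)
                     if any(not concepts for concepts in a.values())]
    merged = {}
    for i, allowed in enumerate(per_eq):
        if i in bad_equations:
            continue  # do not let a broken equation pollute the comparison
        for var, concepts in allowed.items():
            merged[var] = merged[var] & concepts if var in merged else set(concepts)
    clashing = sorted(v for v, concepts in merged.items() if not concepts)
    return bad_equations, clashing, merged

# Student solution from Section 6: L = I*w and I = (1/6)*m*L^2.
EQUATIONS = [
    ({"L": 1}, {"I": 1, "w": 1}),
    ({"I": 1}, {"m": 1, "L": 2}),
]

if __name__ == "__main__":
    bad, clashing, merged = diagnose(EQUATIONS)
    print("equations inconsistent on their own:", bad)
    print("variables used inconsistently between equations:", clashing)
```

Under these assumptions each equation balances on its own, but only with L read as an angular momentum in the first and as a length in the second, so the sketch flags the variable L rather than either equation, which is the kind of feedback Section 6 asks for. Enumeration is exponential in the number of variables per equation; it is used here only to keep the example short.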