Abstract
1. Introduction

To the best of our knowledge, very few tools are dedicated to finding pedagogically relevant information in student work. Many tools only produce statistics on marks, on good and bad answers, or on mistakes. The C.H.I.C. software [1] produces more sophisticated information for teachers. It rests on implicative statistical analysis, an approach we have not considered here, which produces different kinds of associations from those found by the association rules algorithm included in TADA-Ed. The work by Bisson et al. [2] on the statistical analysis of students' behavior in algebra shows that valuable information can be extracted; it focuses on the analysis of students' interactions with the system. Similarly, web usage mining has been applied to eLearning [3]. CourseVis [4] is a fairly sophisticated visualisation tool that allows teachers to explore student data. Our aim is a tool that can exploit students' interactions and web logs, as well as data with richer semantic information such as the relevance of an action, the evaluation of its correctness and, where applicable, the type of mistake made. Of course, one may object that a number of excellent data mining platforms exist, both commercial (such as [5]) and open source (such as [6]). However, they are not tailored to the needs of education. Firstly, these platforms are too complex for teachers and educators to use: their features go well beyond the scope of what a teacher may want to do. The cognitive load of operating the program itself is therefore too great for them to be practical, as it reduces the cognitive resources users have left to actually perform their task [7]. Secondly, and most importantly, we found that the mining queries we wanted to run were not possible for the following reasons:
The paper is organized as follows. Section 2 presents the tool. Section 3 presents examples of its features and of their use with student work obtained from a tool used in our undergraduate teaching. We have anonymised the data (i.e. given students fictitious names), simplified it (i.e. deleted some fine-grained information) and used generic terms, so that the reader can understand the data without knowing the tool. Section 4 concludes the paper.
2. The Tool

2.1 General presentation

The bottom left frame contains classification and clustering algorithms (presently k-means, hierarchical clustering and decision trees). These three algorithms have been adapted from the open source data mining library Weka [6]. The aim of k-means and hierarchical clustering is to group elements that are similar according to specific criteria defined by the user. The aim of a decision tree classifier is to classify an element using a model learned from a subset of other elements. For instance, it may be used to predict a student's mark from the work stored in the database. These windows are linked together. The results of the k-means algorithm are displayed in the Points View window. Clicking a point (or selecting a rectangle of points) in the Points View shows their attributes in the message area. Clicking a bar in the Graph View window colors the corresponding points in the Points View window.
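
To illustrate the k-means grouping mentioned above, here is a minimal from-scratch sketch in Python. It is not the Weka-derived implementation used in TADA-Ed. The function name `kmeans`, the initialisation (the first k points serve as starting centroids) and the two-attribute student vectors (exercises attempted, mistakes made) are all assumptions made for the example.

```python
from math import dist
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means: return a cluster label for each point and the final centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start (an assumption): the first k points are the initial centroids.
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each student vector joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no student changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical student vectors: (exercises attempted, mistakes made).
students = {"ann": (2.0, 30.0), "bob": (1.0, 41.0), "cid": (9.0, 5.0), "dee": (8.0, 4.0)}
labels, centroids = kmeans(list(students.values()), k=2)
print(dict(zip(students, labels)))
```

Because every student is first reduced to a single vector, the clustering can only compare students; the preprocessing described in Section 2.2 is what produces such vectors.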
|
In our variant of A-priori, which takes sequence into account, the rule A, B → C reads as follows: if students make mistake A, followed by mistake B, then they later make mistake C. Note also that an Administration menu contains all the facilities related to administering the software: filters (selection of a subset of the data) and preprocessing of the database for the classification/clustering module and the association rules module. As explained earlier, one of TADA-Ed's distinctive features is its preprocessing functionality, which we now describe.
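
The ordered reading of A, B → C can be made concrete with a small sketch. It assumes, hypothetically, that each student's mistakes are available as a single time-ordered list. A student supports the rule when A, B and C occur in that order, possibly with other mistakes in between. The names `is_subsequence` and `sequential_rule`, and the support and confidence definitions used here, are illustrative and do not describe TADA-Ed's implementation.

```python
from typing import Dict, List, Sequence, Tuple


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `sequence` in the same order, gaps allowed."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the first match, so order is enforced.
    return all(item in it for item in pattern)


def sequential_rule(history: Dict[str, List[str]], antecedent: Sequence[str],
                    consequent: str) -> Tuple[float, float]:
    """Support and confidence of the ordered rule antecedent -> consequent.

    support    = share of students whose history contains the antecedent, then the consequent
    confidence = students matching the whole rule / students matching the antecedent
    """
    if not history:
        return 0.0, 0.0
    full = list(antecedent) + [consequent]
    n_ante = sum(is_subsequence(antecedent, seq) for seq in history.values())
    n_full = sum(is_subsequence(full, seq) for seq in history.values())
    return n_full / len(history), (n_full / n_ante if n_ante else 0.0)


# Hypothetical time-ordered mistake histories, one list per student.
history = {
    "s1": ["A", "x", "B", "C"],
    "s2": ["B", "A", "C"],
    "s3": ["A", "B"],
    "s4": ["A", "B", "y", "C"],
}
support, confidence = sequential_rule(history, ["A", "B"], "C")
print(support, confidence)
```

In this toy data, student s2 does not support the rule: B occurs before A, so the order the rule requires is not met.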
|
2.2 Preprocessing

Suppose the teacher wants to run a clustering algorithm to group students according to the mistakes they made. The data may contain several entries for each student: one entry per mistake. For the clustering algorithm to work, these entries must be gathered into a single vector per student, so that students can be "compared". The purpose of preprocessing is to gather the entries for each value of a chosen attribute, and to collect and transform the information carried by the other attributes. The types of preprocessing available in TADA-Ed are shown in Table 1. Importantly, in TADA-Ed each attribute can have a different preprocessing type. Some powerful data mining platforms offer some of these preprocessing features (nominal and total number) but apply them to all the attributes of the table.
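
A minimal sketch of per-attribute preprocessing follows. The two modes and their semantics are assumptions rather than a description of Table 1. `"nominal"` is taken to produce one 0/1 column per distinct value (did the student ever have it?), and `"total"` one column per value counting its occurrences. The column names `login`, `mistake` and `concept` are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def preprocess(rows: List[Row], key: str, modes: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Gather all rows sharing a value of `key` into one feature vector per value.

    Each other attribute has its own mode (hypothetical semantics):
      "nominal" -> one 0/1 column per distinct value: did it ever occur for this key value?
      "total"   -> one column per distinct value: how many times it occurred.
    """
    for attr, mode in modes.items():
        if mode not in ("nominal", "total"):
            raise ValueError(f"unknown mode {mode!r} for attribute {attr!r}")
    # Every vector gets the same columns, so the vectors can be compared by a clustering algorithm.
    columns = [f"{attr}={v}" for attr in modes for v in sorted({r[attr] for r in rows})]
    vectors = {k: dict.fromkeys(columns, 0) for k in sorted({r[key] for r in rows})}
    for r in rows:
        vec = vectors[r[key]]
        for attr, mode in modes.items():
            col = f"{attr}={r[attr]}"
            vec[col] = 1 if mode == "nominal" else vec[col] + 1
    return vectors


# Hypothetical rows of a mistake table: one row per mistake.
rows = [
    {"login": "ann", "mistake": "M1", "concept": "join"},
    {"login": "ann", "mistake": "M1", "concept": "select"},
    {"login": "bob", "mistake": "M2", "concept": "join"},
]
vectors = preprocess(rows, key="login", modes={"mistake": "total", "concept": "nominal"})
print(vectors["ann"])
```

Passing a separate mode for each attribute is the point of the sketch: a platform that applies one transformation to the whole table cannot count mistakes and flag concepts in the same pass.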
|
3. Application

We will illustrate the use of TADA-Ed through different mining experiments.
|
3.1 Unfinished exercises

First, we look at all the mistakes made by students in exercises that were never finished. To do this, we take the Mistake table, shown in Table 2, and filter it to keep only the rows where the finish date is empty. The result is a sub-table of Mistake containing only the mistakes that were serious enough to prevent students from completing the exercise.
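
The filter described above amounts to a row selection. The sketch below uses made-up rows and hypothetical column names (`login`, `exercise`, `mistake`, `finish_date`), and treats both `None` and an empty string as an empty finish date. It then counts how often each mistake occurs in the resulting sub-table, which is the kind of tally a mistake-frequency chart is built on.

```python
from collections import Counter
from typing import Dict, List, Optional

# Made-up rows of the Mistake table; the column names are assumptions, not TADA-Ed's schema.
mistakes: List[Dict[str, Optional[str]]] = [
    {"login": "ann", "exercise": "E1", "mistake": "M1", "finish_date": None},
    {"login": "ann", "exercise": "E2", "mistake": "M2", "finish_date": "2005-03-14"},
    {"login": "bob", "exercise": "E1", "mistake": "M1", "finish_date": ""},
    {"login": "bob", "exercise": "E3", "mistake": "M3", "finish_date": None},
]


def unfinished(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Keep only the rows whose finish date is empty (None or an empty string)."""
    return [r for r in rows if not r["finish_date"]]


sub_table = unfinished(mistakes)
# How often each mistake occurs in exercises that were never finished.
freq = Counter(r["mistake"] for r in sub_table)
print(freq.most_common())
```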
|
Figure 2. Mistake frequency
|
Figure 3. Frequency of concepts used erroneously
|
Figure 6. Cluster representation (login with concept); clusters are shown in color.
|
Figure 6 shows a cluster that is interesting from a teacher's perspective: cluster 4 may need immediate action. It contains students who have made mistakes with concepts in all classes and who have attempted few exercises (1.82 on average), yet have made the most mistakes (38.64 on average). Looking back at Figure 5, at least one concept seems to be difficult for them: the one drawn as an almost horizontal purple line. Almost all students in that group have made a mistake with that concept.
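
The per-cluster averages quoted above (exercises attempted and mistakes made) are simple group means. The sketch below computes them from a cluster label per login and a `(exercises attempted, mistakes made)` pair per student. The function name is hypothetical and the data is invented; it does not reproduce the figures reported above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_means(labels: Dict[str, int],
                  stats: Dict[str, Tuple[float, float]]) -> Dict[int, Tuple[float, ...]]:
    """Mean of each statistic over the students of each cluster."""
    groups: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for login, cluster in labels.items():
        groups[cluster].append(stats[login])
    # zip(*rows) turns the list of per-student tuples into one column per statistic.
    return {c: tuple(sum(col) / len(rows) for col in zip(*rows)) for c, rows in groups.items()}


# Invented data: cluster label and (exercises attempted, mistakes made) per login.
labels = {"ann": 4, "bob": 4, "cid": 1}
stats = {"ann": (2, 40), "bob": (1, 30), "cid": (9, 5)}
print(cluster_means(labels, stats))
```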
|
3.2 Classification/prediction of marks according to mistakes |
|
The tree is shown in Figure 8: |
|
In addition, each student's login appears in a pop-up menu at the bottom of the window; selecting one (here, Phil) colors that student's path in red. Phil therefore belongs to the group of students who made fewer than 9.5 mistakes (but more than 4.5) and who logged in fewer than 4.2 times. His exam mark was around 3.72 (out of 7). This tree can now be saved and used to predict future marks. If bad marks are predicted, students may be warned. The path in the tree shows how their work with the tool leads to the prediction, so students may reflect on it and take remedial action.
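
To show how a saved tree yields both a prediction and a highlighted path, here is a sketch that encodes only the branch described for Phil. The order of the tests, the `Leaf` and `Split` classes and Phil's input values are assumptions. Branches the text does not describe are left as `None`, so the function returns no mark for students who fall into them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    mark: float  # predicted exam mark


@dataclass
class Split:
    attribute: str
    threshold: float
    below: "Node"  # branch followed when value < threshold
    above: "Node"  # branch followed when value >= threshold


# None stands for a branch the text does not describe.
Node = Union[Leaf, Split, None]


def predict(node: Node, student: Dict[str, float]) -> Tuple[Optional[float], List[str]]:
    """Walk the tree; return the predicted mark (None if unknown) and the tests on the path."""
    path: List[str] = []
    while isinstance(node, Split):
        if student[node.attribute] < node.threshold:
            path.append(f"{node.attribute} < {node.threshold}")
            node = node.below
        else:
            path.append(f"{node.attribute} >= {node.threshold}")
            node = node.above
    return (node.mark if isinstance(node, Leaf) else None), path


# Only Phil's branch, using the thresholds and mark quoted in the text; test order is assumed.
tree = Split("mistakes", 9.5,
             below=Split("mistakes", 4.5,
                         below=None,
                         above=Split("logins", 4.2, below=Leaf(3.72), above=None)),
             above=None)
phil = {"mistakes": 7, "logins": 3}  # hypothetical values consistent with the path
mark, path = predict(tree, phil)
print(mark, path)
```

The returned list of tests corresponds to the path that the tool colors in red, and it is what a warned student could inspect to see why a low mark was predicted.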
|
3.3 Association rules |
|
Figure 9. Association Rules result
The association rules module implements the A-priori algorithm [8], which we have modified in a number of ways to suit the educational context. Lastly, we added an important feature to this module: the teacher can export selected rules (in XML format), which can then be imported into a tutoring system to provide pro-active feedback to users [10].
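
For reference, the sketch below implements the textbook level-wise Apriori [8] with rule generation, followed by one possible XML export. It does not include TADA-Ed's educational variants. The transactions (sets of mistakes per student), the thresholds, and the XML element and attribute names (`rules`, `rule`, `if`, `then`, `support`, `confidence`) are hypothetical and do not reflect the format TADA-Ed actually exports.

```python
import xml.etree.ElementTree as ET
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Level-wise Apriori: every itemset whose support is at least min_support."""
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        # Support = share of transactions containing the itemset; keep the frequent ones.
        counted = {c: sum(c <= t for t in transactions) / n for c in level}
        current = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join: unite frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        level = {c for c in joined if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Rules lhs -> rhs from each frequent itemset, with confidence = supp(all) / supp(lhs)."""
    out: List[Rule] = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = supp / frequent[lhs]  # lhs is frequent because itemset is
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out


def rules_to_xml(rule_list: List[Rule]) -> str:
    """Serialise rules to XML (element and attribute names are hypothetical)."""
    root = ET.Element("rules")
    for lhs, rhs, supp, conf in rule_list:
        el = ET.SubElement(root, "rule", support=f"{supp:.2f}", confidence=f"{conf:.2f}")
        for tag, side in (("if", lhs), ("then", rhs)):
            for item in sorted(side):
                ET.SubElement(el, tag).text = item
    return ET.tostring(root, encoding="unicode")


# Hypothetical transactions: the set of mistakes made by each student.
transactions = [{"M1", "M2"}, {"M1", "M2", "M3"}, {"M1"}, {"M2", "M3"}]
found = rules(apriori(transactions, min_support=0.5), min_confidence=0.6)
print(rules_to_xml(found))
```

Unlike the sequential variant described in Section 2.1, this classic version treats each student's mistakes as an unordered set, so a rule such as M3 → M2 says nothing about which mistake came first.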
|
4. Conclusion

The tool is implemented and operational. We have illustrated several of its features using data from a real tutoring system. For instance, combining the filtering and visualisation facilities lets us see the mistakes made by students who do not succeed in finishing exercises, as well as the concepts involved in these mistakes. The clustering facilities may help identify interesting groups of students, such as the group of students who make many mistakes with all sorts of concepts without attempting a significant number of exercises; simple queries on the data would not discover such a group. Another feature is the prediction of final exam marks, which makes it possible to warn students who are likely to fail. We have also used the tool to discover clustered mistakes, which led us to reflect on the course material [9].

We are currently working to improve TADA-Ed. This first version has given us some means to better understand which algorithms provide relevant information for teachers, which attributes to take into consideration, and whether and how these algorithms need to be adapted. For example, student data has an important temporal dimension. This led us to adapt the classical A-priori algorithm for association rules so that it takes into account the order in which mistakes are made. This notion of order is broader than the usual one (see [8]): it covers both the sequence of mistakes made while solving a single exercise and the sequence of mistakes made on exercises solved at different times. Equally important, educational data mining is now a growing research community, and we expect that findings from events such as [11, 12] will help us improve our work. We also plan to rework the interface to make it as easy as possible to use for teachers who are not very familiar with new technologies. This work will be carried out with researchers in Education.
Whilst TADA-Ed in its current version can be used by a teacher who understands data mining, we need to make the tool accessible to any teacher with minimal computer literacy. Finally, we would like to make the tool more intelligent, to save teachers time. Ideally, the tool would automatically run various data mining algorithms on the data and alert the teacher when interesting or abnormal information has been discovered.
|
5. References

[2] Bisson, G., A. Bronne, M.B. Gordon, J.-F. Nicaud & D. Renaudie. "Analyse statistique de comportements d'élèves en algèbre" in Proceedings of EIAH2003 Environnements Informatiques pour l'Apprentissage Humain, Strasbourg, France. Paris: INRP (2003).
[3] Zaiane, O.R. "Web Usage Mining for a Better Web-Based Learning Environment" in Proceedings of Conference on Advanced Technology for Education (CATE'01), pp. 60-64, Banff, Alberta (2001).
[4] Mazza, R. & V. Dimitrova. "CourseVis: Externalising Student Information to Facilitate Instructors in Distance Learning" in Proceedings of 11th International Conference on Artificial Intelligence in Education (AIED03), F. Verdejo and U. Hoppe (Eds), Sydney: IOS Press (2003).
[5] SPSS, Clementine, www.spss.com/clementine/ (accessed 2005).
[6] WEKA, www.cs.waikato.ac.nz/ml/weka (accessed 2003).
[7] Sweller, J. "Some cognitive processes and their consequences for the organisation and presentation of information". Australian Journal of Psychology 45(1): pp. 1-8 (1993).
[8] Agrawal, R. & R. Srikant. "Fast Algorithms for Mining Association Rules" in Proceedings of VLDB, Santiago, Chile (1994).
[9] Merceron, A. & K. Yacef. "A Web-based Tutoring Tool with Mining Facilities to Improve Learning and Teaching" in Proceedings of 11th International Conference on Artificial Intelligence in Education, F. Verdejo and U. Hoppe (Eds), pp. 201-208, Sydney: IOS Press (2003).
[10] Merceron, A. & K. Yacef. "Educational Data Mining: a Case Study" (paper accepted for the conference on Artificial Intelligence in Education, AIED2005), Amsterdam, The Netherlands (2005).
[11] Beck, J. (Ed.). Proceedings of ITS2004 workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes. Maceio, Brazil (2004).
[12] Choquet, C., V. Luengo & K. Yacef (Eds). Workshop on "Student usage analysis" to be held in conjunction with AIED 2005, Amsterdam, The Netherlands, July 2005.
6. Acknowledgements