Intelligent analysis tools for computer based assessments
Taras Filatov and Viktor Popov
Wessex Institute of Technology,
Abstract
In distance learning, the function of measuring students’ knowledge is carried
out by computer based assessment systems. However, the development of these
systems, and of distance learning as a whole, is obstructed by the fact that a
human analyst is still required to analyse the reliability of the tests used
and the results obtained.
The study shows that the qualitative characteristics of tests can be revealed
by analysing the results of assessments. Consequently, it is possible to build
a self-assessment approach into automated testing systems, which would help
teachers to evaluate their tests and assessment results without the help of
specialists. This also provides a platform for further automation and
improvement of computer based assessment systems using the same principle.
Following the study, a set of tools was developed to conduct a comprehensive
analysis of assessment results, to discover these characteristics and to report
the conclusions in an understandable form.
Keywords:
distance learning, computer based assessment, intelligent analysis, statistics,
self-analysis, assessment results, test quality
1 Introduction
Distance learning is developing rapidly. For an educational system to be truly
distance based and computer based, it should be able to assess students’
knowledge remotely, in the same way as the material is delivered. Computer based
multiple choice tests are now used for this purpose, providing the opportunity
to assess students’ knowledge automatically. This makes it possible to automate
most of the educational process. However, the consequences of such automation
could be dramatic because of the influence of the human factor during the design
of the educational materials and tests. That is why modern distance learning
systems should have tools for self-control and human control. These tools should
be able to analyze the data collected by the system using the principles of data
mining and make assumptions about the positive or negative influence and the
quality of different parts of the educational process, such as different types
and sets of learning materials, system interface and usability, assignments,
assessment tasks etc. [1, 2]
We believe that, given a considerable amount of data, it is possible to
distinguish the influence of these factors by a thorough analysis of final
results. [3] This analysis can be done manually, with computer assistance or in
a fully automated way. By developing mechanisms for computer assisted and
automated analysis, it is possible to solve the bottleneck problem of distance
learning systems and make the process of education and assessment truly
automated and reliable. [4] The purpose of this work is to make the first steps
in this area. Our task was to develop tools for the intelligent analysis of data
collected by the assessment system, targeting the quality of the test and the
questions used.
2 Method
The literature on data mining, statistics, knowledge assessment and the theory
of measurement was reviewed. An existing computer based assessment system,
OpenTest, was selected. It is coded in PHP and stores the tests and the results
of assessments in a MySQL database, which makes it convenient to extend the
system with our experimental sub-modules.
3 Results
3.1 Questions with deviations
Ideally, each question of the test should have a normal distribution of right
and wrong answers across different students. If there is a statistical deviation
in the correctness of answers, it is very likely that there is some problem with
the question. The tool highlights the answers with statistical deviations, which
simplifies any kind of analysis of the questions. An example is shown in Fig.1,
where “+” is the column for right answers and “-” is the column for wrong
answers.

Figure 1: Analysis of questions
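The exact rule used by the tool to decide that a question deviates is not given in the text. The following minimal sketch illustrates one possible rule, under the assumption that a question is flagged when its share of right answers lies more than a chosen number of standard deviations from the average share over all questions. The function name, the data layout and the threshold of 1.5 are hypothetical.

```python
from statistics import mean, pstdev


def flag_deviating_questions(answers, threshold=1.5):
    """Return the ids of questions whose share of right answers is unusual.

    answers: dict mapping a question id to a list of booleans
             (True = right answer, False = wrong answer).
    threshold: hypothetical cut-off, in standard deviations.
    """
    # Share of right answers ("+" column divided by all answers) per question.
    shares = {q: sum(a) / len(a) for q, a in answers.items() if a}
    mu = mean(shares.values())
    sigma = pstdev(shares.values())
    if sigma == 0:
        return []  # all questions behave identically, nothing to flag
    return sorted(q for q, s in shares.items()
                  if abs(s - mu) / sigma > threshold)


if __name__ == "__main__":
    sample = {
        "q1": [True] * 5 + [False] * 5,
        "q2": [True] * 6 + [False] * 4,
        "q3": [True] * 5 + [False] * 5,
        "q4": [True] * 4 + [False] * 6,
        "q5": [False] * 10,  # nobody answered this question right
    }
    print(flag_deviating_questions(sample))
```

A flagged question is only a candidate for review; whether it is badly worded or simply difficult still has to be judged by the examiner.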
3.2 Ability-probability diagram
The idea of this tool was taken from Item Response Theory [5]. The diagram
displays the relation between the knowledge level of the students (ability) and
the probability of answering the current question correctly. With a fairly high
level of reliability, the chart shows how often the question was answered
correctly by students with a high level of knowledge of the subject of the test
and how often it was answered wrongly by students with a low level of knowledge
of the topic.
For a normal distribution of marks, an ideal chart should look like a diagonal
line from the bottom left corner to the upper right corner. This means that no
student with a zero knowledge level answered the question and that all the
students with a high knowledge level answered it correctly. It is also necessary
to take into account the normality of the distribution, that is, the fact that
most students are of an average level, those who get a “3” mark. It means that
in most cases the ideal diagonal line will turn into a curve with a maximum at
“3” and a slight bend and decline towards “4” and “5”. [6, 7]
This tool is very useful for analysing individual questions and their relevance
within the test. It makes it possible to see visually whether the question was
typical for the test or whether even successful students had problems
understanding and answering it. If the chart is very atypical, there is a high
probability that the text of the question was not understood properly. An
example of the diagram is shown in Fig.2.

Figure 2: Ability-probability diagram
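The data behind such a diagram can be sketched as follows. This is an illustration only: it assumes that ability is approximated by the student's final mark for the test and that the probability is estimated as the share of right answers within each mark group. The function name and the data layout are hypothetical.

```python
from collections import defaultdict


def ability_probability(records):
    """Estimate the probability of a right answer for each ability level.

    records: iterable of (final_mark, answered_right) pairs for one question,
             where final_mark stands in for the student's ability.
    Returns a dict {final_mark: share of right answers}, ordered by mark.
    """
    totals = defaultdict(int)
    right = defaultdict(int)
    for mark, answered_right in records:
        totals[mark] += 1
        if answered_right:
            right[mark] += 1
    return {mark: right[mark] / totals[mark] for mark in sorted(totals)}


if __name__ == "__main__":
    records = [(2, False), (2, False), (3, True), (3, False),
               (4, True), (4, True), (5, True)]
    for mark, p in ability_probability(records).items():
        print(mark, round(p, 2))
```

Plotting the returned values against the marks gives the points of the chart; a curve that does not rise with the mark suggests that the question deserves a closer look.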
3.3 Scale of marks
In the selected computer based assessment system, the scale of marks can be set
manually. The distribution of marks for a test should ideally be normal.
Therefore, by analysing the shape of the frequency distribution of marks, we can
judge the appropriateness of the scale of marks.
The script that analyses the frequency distribution of marks is based on simple
statistical formulas:
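1) the average of the distribution for the set of results of the test is
calculated
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (1)$$
where $x_i$ is the result of the $i$-th student and $N$ is the number of results
2) the standard deviation is calculated
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2} \qquad (2)$$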
3) it is known that, when the distribution is normal, 34.13% of results lie in
the interval between the average and one standard deviation on each side of the
distribution chart, two portions of 13.59% lie in the intervals between one and
two standard deviations, and two portions of 2.14% lie between two and three
standard deviations on each side of the chart. [8]
If the average of the distribution differs significantly from the middle of the
chart, the chart is shifted and the calculations are repeated. If the
distribution then appears to be normal, conclusions about the difficulty of the
test for the students are made. Examples of such conclusions are shown in Fig.3
and Fig.4.

Figure 3: Scale of marks analysis

Figure 4: Scale of marks analysis
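The three steps of the analysis can be sketched as below. This is a simplified illustration, not the script used in the system: it computes the average and the standard deviation, counts the share of results in each standard deviation band and compares these shares with the percentages quoted above. The population form of the standard deviation, the tolerance of 0.10 and all function names are assumptions, and the shifting step for a displaced average is not shown.

```python
from statistics import mean, pstdev

# Expected shares for a normal distribution, both sides of the average combined:
# within 1 sigma, between 1 and 2 sigma, between 2 and 3 sigma.
EXPECTED = {1: 2 * 0.3413, 2: 2 * 0.1359, 3: 2 * 0.0214}


def band_shares(marks):
    """Return the share of marks that fall into each standard deviation band."""
    mu = mean(marks)
    sigma = pstdev(marks)  # population standard deviation (assumed form)
    if sigma == 0:
        raise ValueError("all marks are identical")
    counts = {1: 0, 2: 0, 3: 0}
    for m in marks:
        z = abs(m - mu) / sigma
        if z <= 1:
            counts[1] += 1
        elif z <= 2:
            counts[2] += 1
        elif z <= 3:
            counts[3] += 1
        # marks further than 3 sigma from the average are not counted
    return {band: c / len(marks) for band, c in counts.items()}


def looks_normal(marks, tolerance=0.10):
    """Hypothetical check: every band is within `tolerance` of its expected share."""
    shares = band_shares(marks)
    return all(abs(shares[b] - EXPECTED[b]) <= tolerance for b in EXPECTED)


if __name__ == "__main__":
    marks = [1] + [2] * 4 + [3] * 8 + [4] * 4 + [5]
    print(band_shares(marks))
    print(looks_normal([1] * 10 + [5] * 10))
```

With a coarse scale such as five marks, the band shares are sensitive to where the band boundaries fall, so the tolerance has to be chosen with the scale in mind.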
3.4 Correlation matrix
The correlation matrix clearly shows how the answers to a particular question
affected the overall final result. It is built from the data of all the students
who took the current test. The questions that did not affect the final result,
and all suspicious ones, are marked with a color. The matrix clearly
demonstrates which questions should be excluded from the test to increase its
average discriminative value. The discriminative value shows how well the
current test plays its part in classifying the students by their level of
knowledge of the particular topic. The total and average discriminative values
are printed below the correlation matrix and may be used for the control and
comparison of tests.

Figure 5: Correlation matrix
The coefficients of correlation are calculated by the classical Pearson formula:
$$r_{xy} = \frac{SP_{xy}}{\sqrt{SS_x \, SS_y}} \qquad (3)$$
where $x$ is the results of students for the current question, $y$ is the final
results for the test, $SP_{xy}$ is the sum of products of deviations from the
average values of $x$ and $y$, and $SS_x$ and $SS_y$ are the sums of squares of
deviations of $x$ and $y$. [8]
The total test discrimination value is obtained by summing the coefficients of
correlation of the questions. The average discrimination value is obtained by
dividing the total value by the number of questions.
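The calculation described above can be sketched as follows. The sketch computes the Pearson coefficient from the sum of products and the sums of squares of deviations, then the total and average discrimination values. The function names and the data layout are hypothetical, and returning 0 for a question that every student answered in the same way is an assumption made here to avoid a division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of question results x with final results y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return 0.0  # assumption: no variation means no measurable influence
    return sp_xy / sqrt(ss_x * ss_y)


def discrimination(question_scores, final_scores):
    """Return (per-question coefficients, total value, average value).

    question_scores: dict mapping a question id to the list of student
                     results for that question (for example 1 = right, 0 = wrong).
    final_scores: list of final test results in the same student order.
    """
    r = {q: pearson(scores, final_scores)
         for q, scores in question_scores.items()}
    total = sum(r.values())
    return r, total, total / len(r)


if __name__ == "__main__":
    questions = {"q1": [0, 0, 1, 1], "q2": [1, 1, 1, 1]}
    finals = [1, 2, 3, 4]
    coefficients, total, average = discrimination(questions, finals)
    print(coefficients, round(total, 4), round(average, 4))
```

In this example the second question was answered right by everyone, so it contributes nothing to the discrimination value and lowers the average.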
3.5 Test quality – automated analysis tool
All the tools listed above help professionals to produce a high-level analysis
of the data collected by the system, giving them powerful and convenient
instruments to study the final results in depth. Yet the final conclusions must
be made by the examiners themselves, and although these tools are practical and
easy to use, they still require a certain amount of training and experience.
The second task of our work was to create an automated test analysis tool that
gives non-specialists a fully automated means of analysing their tests. This
tool should work fully autonomously, using the data of a particular assessment,
and draw conclusions by itself. The conclusions should be understandable to
humans. The task therefore is to develop an autonomous tool which replaces a
human specialist to a certain extent and draws the same conclusions as the
latter would after using the tools listed above. An autonomous integral
algorithm of intelligent analysis, based on logical rules, was developed. It
embraces all the tools described above. The resulting integral script considers
all the necessary criteria, such as the number of collected results, the shape
of the distribution of marks amongst students, the discrimination ability of the
test and the correlation values for every question. Next, the script ‘makes’
conclusions, understandable to the non-specialist, about the quality of the
test, and gives recommendations about subsequent modifications to the test.
An example of the conclusions made by the integral script is presented in Fig.6,
and its simplified algorithm is presented in Fig.7.

Figure 6: Conclusions made by automated analysis

Figure 7: Integral script of automated analysis
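The rules of the integral script are given only in the figure, so the following is a purely illustrative sketch of how a rule-based analysis over the criteria named above might be organised. All thresholds, the function name and the wording of the messages are hypothetical and are not taken from the system.

```python
def analyse_test(n_results, distribution_is_normal, question_correlations,
                 min_results=30, min_avg_discrimination=0.3,
                 min_question_r=0.2):
    """Return a list of plain-language conclusions about a test.

    n_results: number of collected results.
    distribution_is_normal: outcome of the scale of marks analysis.
    question_correlations: dict {question id: correlation with final result}.
    The three thresholds are hypothetical defaults.
    """
    conclusions = []
    # Rule 1: without enough data the other criteria are not reliable.
    if n_results < min_results:
        return ["Too few results have been collected for a reliable analysis."]
    # Rule 2: shape of the distribution of marks.
    if not distribution_is_normal:
        conclusions.append(
            "The distribution of marks is not normal; review the scale of marks.")
    if question_correlations:
        # Rule 3: average discrimination ability of the test.
        average = sum(question_correlations.values()) / len(question_correlations)
        if average < min_avg_discrimination:
            conclusions.append("The test discriminates poorly between students.")
        # Rule 4: questions that barely affect the final result.
        for question, r in sorted(question_correlations.items()):
            if r < min_question_r:
                conclusions.append(
                    f"Consider revising or excluding question {question}.")
    if not conclusions:
        conclusions.append("No problems were detected.")
    return conclusions


if __name__ == "__main__":
    for line in analyse_test(50, True, {"q1": 0.6, "q2": 0.1}):
        print(line)
```

Keeping each rule as a separate, ordered check makes it straightforward to adjust the thresholds for a particular course or to add further criteria.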
3.6 Application so far
The described tools were implemented as additional interfaces for the computer
based assessment system. To operate properly, the parameters and degrees of
freedom should be configured for each process of the integral script. The
following parameters were used during our tests:
The results of final university assessments for the courses ‘Informatics’,
‘Discrete Math’ and ‘Microcontrollers’ were taken as the source data. The
developed intelligent analysis tools produced results that closely matched
expectations. The conclusions given by the integral script of automated test
quality analysis were in most cases similar to those given by specialists after
a thorough analysis of the results. The teachers of the respective courses
therefore found the tools very useful and time-saving.
4 Discussion
A set of methods was developed to help educators conduct a computer assisted
analysis of assessment results. The methods, integrated with statistical tools,
make it possible to discover non-obvious information about the quality of the
test questions and the level of difficulty of the test.
An integral script for the autonomous analysis of the quality of a test was
created as a pilot system of fully automated intelligent analysis tools for
distance learning.
The tools were implemented within a real computer-based assessment system and
applied to the analysis of assessment results obtained in real educational
processes. This has shown that the developed methodology can replace an
experienced human analyst to a considerable degree and therefore save educators’
time.
This provides a basis for future investigation and progress in this area.
Distance learning systems will become more automated and will take over more of
the work of monitoring each student’s learning process, allowing educators to
dedicate their time to the design of courses and the preparation of new
materials. The number of distance learning institutions, courses and students
will grow. It follows that there will be great demand for artificial
intelligence systems that help educators to process and evaluate the data on
students’ progress collected by distance learning systems.
In the current work, the achievements of intelligent analysis are applied to
evaluate the quality of a test and its contents. Similarly, further research can
elaborate methodologies and tools to analyse and evaluate the quality of
learning materials, the efficiency of teachers’ work and students’ knowledge,
and to distinguish these parameters in the collected data using available data
analysis methodologies.
References
[1] Mark Notess (2001) Usability, User Experience, and Learner Experience. http://www.elearnmag.org/subpage.cfm?section=tutorials&article=2-1
[2] V. Neris, J. Silva, A. Neto, S. Zem-Mascarenhas (2005) Cognitive strategies to the increase of hyper document quality to distance learning. Proceedings of the 2005 Latin American conference on Human-computer interaction CLIHC '05, pp. 295-300
[3] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth (1996) From data mining to knowledge discovery: an overview, Advances in knowledge discovery and data mining. American Association for Artificial Intelligence,
[4] S.S.J. Lin, E.Z.F. Liu & S.M. Yuan (2001) Web-based peer assessment: feedback for students with various thinking-styles. Inst. of Education and Dept of Computer and Information Science, National Chiao Tung University, Journal of Computer Assisted Learning 17, pp. 420-432
[5] Ronald K. Hambleton, H. Swaminathan, H. Jane Rogers (1991) Fundamentals of item response theory. Sage Publications Inc.
[6] A. Shkil, S. Chumachenko,
[7] George M. Bodner (1980) Statistical analysis of multiple choice exams. Department of Chemistry,
[8] N. Oleinik (1991) A textbook on special course “Test as an instrument of measuring knowledge and tasks difficulty in modern educational technology”.