Objective control
Employing objective ways of assessing students’ performance ensures the objective character of control. This involves quantitative measurement and qualitative evaluation, the choice of assessment criteria, and attention to students’ age and individual psychological characteristics. The terms measurement, control (testing) and evaluation (assessment) are often used synonymously. Indeed, they may, in practice, refer to the same activity. For example, when parents ask for an evaluation of their child’s proficiency, they are often given a test score. Nevertheless, the similarity of these terms is superficial and tends to obscure the distinctive characteristics of each, which are essential to objective control.
Measurement is the process of quantifying certain characteristics according to explicit rules and procedures. Quantification involves the assigning of numbers, and this distinguishes measures from qualitative descriptions such as verbal accounts or non-verbal, visual representations. Non-numerical categories or rankings such as letter grades (A, B, C...) or labels (excellent, good, average...) may have the characteristics of measurement when they are assigned numbers in order to analyse and interpret them. Physical characteristics are easily assigned numbers, as they can be observed directly. However, it is the mental characteristics of a person that we are concerned with in teaching control. Applied to language teaching, these mental characteristics mean the ability to perform certain operations with the target language. We generally assume that there are degrees of ability and that these are associated with tasks or performances of increasing difficulty or complexity. Thus, individuals with higher degrees of a given ability could be expected to have a higher probability of correct performance.
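The point about letter grades acquiring the characteristics of measurement once they are assigned numbers can be sketched in a few lines of Python. This is only an illustration: the coding scheme `GRADE_TO_SCORE`, the function `code_grades` and the sample grades are hypothetical and do not come from the text. The numbers are assumed to preserve rank order only, so the sketch summarises them with a median rather than a mean.

```python
from statistics import median

# Hypothetical coding scheme (an assumption for illustration):
# the numbers preserve only the rank order of the letter grades.
GRADE_TO_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def code_grades(grades):
    """Convert a list of letter grades to their numeric codes."""
    return [GRADE_TO_SCORE[grade] for grade in grades]


# Invented sample data for one class.
class_grades = ["B", "A", "C", "B", "D"]
scores = code_grades(class_grades)

print(scores)          # [4, 5, 3, 4, 2]
print(median(scores))  # 4
```

Once coded in this way, the grades can be sorted, summarised and compared across groups, which is not possible with the bare labels.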
Whatever attributes or abilities we measure, it is important to understand that it is these abilities or attributes, and not the persons themselves, that we are measuring. A control activity, or test, is a procedure designed to elicit certain behaviour from which one can make inferences about certain characteristics of an individual (Carroll, 1968). From this definition, it follows that a language test is an instrument designed to elicit a specific sample of an individual’s speech behaviour. Its special value lies in its capacity to elicit verbal behaviour that can be interpreted as evidence of the attributes or abilities that are of interest. As a type of measurement, a test necessarily quantifies characteristics of individuals according to explicit procedures.
Evaluation can be defined as the systematic gathering of information for the purpose of making decisions (Weiss, 1972). The probability of making the correct decision in any given situation is a function not only of the ability of the decision-maker, but also of the quality of the information on which the decision is based. Everything else being equal, the more reliable and relevant the information, the greater the likelihood of making the correct decision. Evaluation does not necessarily entail testing, and tests in and of themselves are not evaluative. Tests are often used for pedagogical purposes, either as a means of motivating students to study or as a means of reviewing the material taught, in which case no evaluative decision is made on the basis of the test results. It is only when the results of a test are used as a basis for making a decision that evaluation is involved. Since by far the majority of tests are used for the purpose of making decisions about individuals, it is important to distinguish the information-providing function of measurement from the decision-making function of evaluation (Bachman, 1994).
The relationships among measurements, testing and evaluation can be illustrated as follows:
If we are to ensure the objective character of control, we must provide two essential measurement qualities: reliability and validity. Reliability is the quality of control that makes it free from errors of measurement. Many factors other than the ability being measured can affect performance on control activities, and these constitute sources of measurement error. Individuals’ performance may be affected by differences in testing conditions, fatigue and anxiety, so that they obtain scores that are inconsistent from one occasion to the next. Reliability thus has to do with the consistency of measures across different times, test forms, raters and other characteristics of the measurement context.
The most important quality of test interpretation and use is validity, or the extent to which the inferences or decisions we make on the basis of test scores are meaningful, appropriate and useful. In order for a test score to be a meaningful indicator of a particular individual’s ability, we must be sure it measures that ability and very little else. If test scores are strongly affected by errors of measurement, they will not be meaningful and cannot, therefore, provide the basis for valid interpretation or use. A test score that is not reliable, therefore, cannot be valid. If test scores are affected by abilities other than the one we want to measure, they will not be valid either. For example, if students are asked to listen to a lecture and then write a short essay based on that lecture, the essays they write will be affected by both their writing ability and their ability to comprehend the lecture. Ratings of their essays, therefore, might not be valid measures of their writing ability. In examining validity, we must also be concerned with the appropriateness and usefulness of the control activity for a given purpose.
The score derived from a test developed to measure the language abilities of monolingual elementary school children might not be appropriate for determining the second language proficiency of bilingual children of the same grade and age levels. Similarly, scores from a test designed to provide information about an individual’s vocabulary knowledge might not be particularly useful for placing students in a writing programme. While reliability is a quality of test scores, validity is a quality of test interpretation and use. Neither, however, is a quality of the control activity itself. Furthermore, neither is absolute. We can never attain perfectly error-free measurement in actual practice, and the appropriateness of a control technique will depend on many factors outside the technique itself. Determining what degree of relative reliability and validity is required for a particular control context involves a value judgement on the part of the teacher.
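The idea of reliability as consistency of scores from one occasion to the next can be illustrated numerically. The sketch below assumes one common way of estimating it, which the text itself does not name: the Pearson correlation between two administrations of the same test to the same students (a test-retest estimate). The function `pearson_r` and both score lists are hypothetical and invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Invented scores of five students on two administrations of the same test.
first_occasion = [12, 15, 9, 18, 14]
second_occasion = [13, 14, 10, 17, 15]

r = pearson_r(first_occasion, second_occasion)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 suggests that students keep roughly the same relative standing on both occasions, while a low value points to substantial measurement error. As noted above, how high the value needs to be for a particular control context remains a judgement for the teacher.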