Language tests must meet the following main requirements: validity, reliability, differential capacity, practical character, and economical character.
Validity is defined as the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores (American Psychological Association, 1985). In other words, a test is valid if it measures what the test-makers intended it to measure (Kokkota, 1989). Five kinds of language test validity are usually distinguished:
| Validity kind | Characteristics | Method of detection |
| --- | --- | --- |
| Comparative | How well the test measures its object compared to another test or assessment | The test results are correlated with the results of another test or of the teacher's assessment, obtained immediately before or after the given test. |
| Prognostic | How well the test results predict success in future learning | The test results are correlated with the results of another test administered half a year or a year later. |
| Content | Whether all the major elements of the syllabus or textbook content are represented in the test | The elements of the test material are analysed and correlated with those of the syllabus or textbook. |
| Conceptual (constructive) | How well the objects of testing and the character of the tasks correspond to a psycholinguistic acquisition model of the language material tested or to a given model of communicative competence | The language material of the test is analysed correspondingly. |
| Exterior | How attractive and convenient the test items seem to test takers, teachers and supervisors | The language material of the test is analysed correspondingly. |
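For comparative and prognostic validity, "correlating" two sets of results can be illustrated with a Pearson correlation coefficient. The source does not say which coefficient is meant, so the choice of Pearson's r is an assumption, and the function name `pearson_r` and all scores below are hypothetical. This is a minimal sketch, not a prescribed procedure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: six learners' scores on the test under study
# and the teacher's marks for the same learners.
test_scores = [12, 15, 9, 18, 14, 7]
teacher_marks = [3, 4, 3, 5, 4, 2]
validity_coefficient = pearson_r(test_scores, teacher_marks)
```

A coefficient close to 1 would suggest that the test ranks learners much as the comparison measure does. For prognostic validity the second list would hold the results of the later test instead.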
Reliability is understood as the stability of test results. Test reliability can be expressed through the following coefficients:
| Coefficient | What it estimates | Method of estimation |
| --- | --- | --- |
| Stability | Indicates how consistent test scores are over time | Computed as the correlation coefficient between two successive administrations of the same test to the same group of test takers |
| Equivalence | Indicates the extent to which scores on alternative forms of a test are equivalent | Computed as the correlation coefficient between two parallel forms (varied in difficulty) of the same test taken by the same group of test takers. In this case, retesting is required |
| Internal consistency | Is concerned primarily with sources of error within the test and the scoring procedures | Computed as the correlation coefficient between two parts of the same test, without retesting. For every testee, correct and wrong responses are scored separately for the even-numbered and the odd-numbered test items |
| Homogeneity (test item invariance) | Assesses reliability on the basis of the ratios of the variances of test components (halves and items) to the total score variance (Kuder-Richardson formulae, developed in 1937) | Computed as the average of all the possible split-half coefficients on the basis of the statistical characteristics of the test items. This approach involves computing the means and variances of the items that constitute the test |
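The last two coefficients can be sketched for dichotomously scored items (1 = correct, 0 = wrong). The example assumes that the Kuder-Richardson formula intended is KR-20 and that the total score variance is the population variance; neither detail is stated in the text. The names `split_half_r` and `kr20` and the response matrix are hypothetical. The Spearman-Brown correction, often applied to half-test correlations, is not mentioned in the text and is left out.

```python
from math import sqrt


def _corr(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(responses):
    """Correlate each testee's odd-item score with the even-item score."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return _corr(odd, even)


def kr20(responses):
    """KR-20: k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    n = len(responses)        # number of testees
    k = len(responses[0])     # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion answering item i correctly
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# Hypothetical response matrix: 5 testees (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
half_coefficient = split_half_r(responses)
homogeneity = kr20(responses)
```

Both functions work from a single administration of the test, which is why neither requires retesting.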
Differential capacity indicates the ability of the test to distinguish students with a sufficient level of habit and skill formation from those with an insufficient level. This requirement has to do with determining the average degree of difficulty of the test items. On the one hand, there should be items of a higher degree of difficulty to let more advanced testees display their language ability. On the other hand, students with poorer knowledge should not find the test impossible to cope with. It is generally accepted that a testee gets a satisfactory mark if they cope with 60% of the items. Therefore, a test should contain 60% of simple items. An item is considered simple if 85% or more of the testees have coped with it. An item is considered difficult if it has been done correctly by 15% of the students or fewer.
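The 85% and 15% thresholds can be applied to a matrix of scored responses as in the sketch below. The function names, the label "medium" for items between the two thresholds, and the data are assumptions made for illustration; the text names only the simple and difficult categories.

```python
def item_facility(responses):
    """Proportion of testees answering each item correctly (1 = correct, 0 = wrong)."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]


def classify_item(p, simple_at=0.85, difficult_at=0.15):
    """Label an item using the thresholds given in the text."""
    if p >= simple_at:
        return "simple"
    if p <= difficult_at:
        return "difficult"
    return "medium"  # label assumed here; the text leaves this range unnamed


# Hypothetical data: 10 testees, 3 items.
# Item 1 is answered correctly by 9 testees, item 2 by 5, item 3 by 1.
answers = [[1, 1, 1]] + [[1, 1, 0]] * 4 + [[1, 0, 0]] * 4 + [[0, 0, 0]]
labels = [classify_item(p) for p in item_facility(answers)]
share_simple = labels.count("simple") / len(labels)
```

Comparing `share_simple` with 0.6 gives a quick check of whether the test contains the proportion of simple items that the text recommends.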
The practical character of a test implies: 1) understandable language in the test instructions; 2) a comparatively simple organisation of the procedure; 3) the possibility of conducting the test in usual school conditions; 4) a comparatively simple procedure for checking students' answers, computing test scores and evaluating them.
The economical character of a test is of major importance for standard tests at the stage of their planning and preparation. A test can be regarded as economical if it provides the maximum of reliable information on the testees with the minimum of time and effort spent on its preparation, administration and scoring.