
Using observation checklists to validate speaking-test tasks

Barry O’Sullivan, The University of Reading; Cyril J. Weir, University of Surrey, Roehampton; and Nick Saville, University of Cambridge Local Examinations Syndicate

Test-task validation has been an important strand in recent revision projects for the University of Cambridge Local Examinations Syndicate (UCLES) examinations. This article addresses the relatively neglected area of validating the match between intended and actual test-taker language with respect to a blueprint of language functions representing the construct of spoken language ability. An observation checklist designed for both a priori and a posteriori analysis of speaking-task output has been developed. This checklist enables language samples elicited by a task to be scanned for these functions in real time, without resorting to the laborious and somewhat limited analysis of transcripts. The article discusses the process and results of the checklist's development, together with its implications and further applications.

I Background to the study

This article reports on the development and use of observation checklists in the validation of the Speaking Tests within the University of Cambridge Local Examinations Syndicate (UCLES) ‘Main Suite’ examination system (see Figure 1). These checklists are intended to provide an effective and efficient tool for investigating variation in the language produced by different task types, by different tasks within task types, and by different interview organization at the proficiency levels in Figure 1. As such, they represent a unique attempt to validate the match between intended and actual test-taker language with respect to a blueprint of language functions representing the construct of spoken language ability in the UCLES tests of general language proficiency, from PET to CPE level (for further information on the different tests in the ‘Main Suite’ battery, see the individual handbooks produced by UCLES). Beyond this study, the application of such checklists has clear relevance for any test of spoken interaction.

Figure 1 The Cambridge/ALTE five-level system: ALTE Level 1, Waystage User, Key English Test (KET); ALTE Level 2, Threshold User, Preliminary English Test (PET); ALTE Level 3, Independent User, First Certificate in English (FCE); ALTE Level 4, Competent User, Certificate in Advanced English (CAE); ALTE Level 5, Good User, Certificate of Proficiency in English (CPE)

Address for correspondence: Barry O’Sullivan, Testing and Evaluation Unit, School of Linguistics and Applied Language Studies, The University of Reading, PO Box 241, Whiteknights, Reading RG6 6WB, UK; email: b.e.osullivan@reading.ac.uk

Language Testing 2002 19 (1) 33–56; DOI 10.1191/0265532202lt219oa; © 2002 Arnold

The standard Cambridge approach to testing speaking is based on a paired format involving an interlocutor, an additional examiner and two candidates. Careful attention has been given to the tasks through which spoken language performance is elicited in each part of the test. The format of the Main Suite Speaking Tests (with the exception of the Level 1 KET test) is summarized in Table 1.

Table 1 Format of the Main Suite Speaking Test
Part   Participants            Task format
1      Interviewer–candidate   Interview: verbal questions
2      Candidate–candidate     Collaborative task: visual stimulus; …

II Issues in validating tests of oral performance

In considering the validity of a performance test¹ of speaking, we need a framework that describes the relationship between the construct being measured, the tasks used to operationalize that construct, and the assessment of the performances from which inferences are made about that underlying ability.

A number of models have attempted to portray the relationship between a test-taker’s knowledge of, and ability to use, a language and the score they receive on a test designed to evaluate that knowledge (e.g., Milanovic and Saville, 1996; McNamara, 1996; Skehan, 1998; Upshur and Turner, 1999).

¹ By performance tests we mean direct tests, in which a test-taker’s ability is evaluated from their performance on a set task or tasks.

Milanovic and Saville (1996) provide a useful overview of the variables that interact in performance testing and suggest a conceptual framework for setting out different avenues of research. The framework was influential in the revisions of the Cambridge examinations during the 1990s, including the development of the KET and CAE examinations and the revisions to PET, FCE and, most recently, CPE (for a summary of the UCLES approach, see Saville and Hargreaves, 1999).

The Milanovic and Saville framework is one of the earliest and most comprehensive of these models (reproduced here as Figure 2).

This framework highlights the many factors (or facets) that must be considered when designing a test from which particular inferences about performances are to be drawn; every factor represented in the model poses a potential threat to the reliability and validity of these inferences. From this model, a framework can be derived through which a validation strategy can be devised for Speaking Tests such as those produced by UCLES.

The essential elements of this framework are:

· the test-taker;

· the interlocutor/examiner;

· the assessment criteria (scales);

· the task;

· the interactions between these elements.

Figure 2 A conceptual framework for performance testing (Source: adapted from Milanovic and Saville, 1996: 6)

The subject of this study, the task, has been explored from a number of perspectives. Briefly, these have been:

· Task/method comparison (quantitative): involving studies in which comparisons are made between performances on different tasks or methods (Clark, 1979; 1988; Henning, 1983; Shohamy, 1983; Shohamy et al., 1986; Clark and Hooshmand, 1992; Stansfield and Kenyon, 1992; Wigglesworth and O’Loughlin, 1993; Chalhoub-Deville, 1995a; O’Loughlin, 1995; Fulcher, 1996; Lumley and O’Sullivan, 2000; O’Sullivan, 2000).

· Task/method comparison (qualitative): as above, but where qualitative methods are employed (Shohamy, 1994; Young, 1995; Luoma, 1997; O’Loughlin, 1997; Bygate, 1999; Kormos, 1999).

· Task performance (method effect): where aspects of the task are systematically manipulated; e.g., planning time, pre- or post-task operations, etc. (Foster and Skehan, 1996; 1999; Wigglesworth, 1997; Mehnert, 1998; Ortega, 1999; Upshur and Turner, 1999).

· Native speaker/nonnative speaker comparison: where native speaker performance on specific tasks is compared to nonnative speaker performance on the same tasks (Weir, 1983; Ballman, 1991).

· Task difficulty/classification: where an attempt has been made to classify tasks in terms of their difficulty (Weir, 1993; Fulcher, 1994; Kenyon, 1995; Robinson, 1995; Skehan, 1996; 1998; Norris et al., 1998).

The central importance of the test task has been clearly recognized; however, in terms of test validation, one question has, to date, remained largely unexplored. Although there has been a great deal of debate over the validation of performance tests through analysis of the language generated in the performance of language elicitation tasks (LETs) (e.g., van Lier, 1989; Lazaraton, 1992; 1996), attention has not been drawn to the one aspect of task performance that would appear to be of most interest to the test designer. That is, when tasks are performed in a test event, how does that performance relate to the test designer’s predictions or expectations, based on their definition or interpretation of the construct? After all, no matter how reliably the performance is scored, if it does not match the expectations of the test designer (in other words, if it does not represent the constructs that are to be tested), then the inferences that the test designer hopes to draw from the evaluated performance will not be valid.

Cronbach went to the heart of the matter (1971: 443): ‘Construction of a test itself starts from a theory about behaviour or mental organization derived from prior research that suggests the ground plan for the test.’ Davies (1977: 63) argued in a similar vein: ‘it is, after all, the theory on which all else rests; it is from there that the construct is set up and it is on the construct that validity, of the content and predictive kinds, is based.’ Kelly (1978: 8) supported this view, commenting that: ‘the systematic development of tests requires some theory, even an informal, inexplicit one, to guide the initial selection of item content and the division of the domain of interest into appropriate sub-areas.’ Because we lack an adequate theory of language in use, a priori attempts to determine the construct validity of proficiency tests involve us in matters that relate more evidently to content validity.

We need to talk of the communicative construct in descriptive terms and, as a result, we become involved in questions of content relevance and content coverage. Thus, for Kelly (1978: 8) content validity seemed ‘an almost completely overlapping concept’ with construct validity, and for Moller (1982: 68): ‘the distinction between construct and content validity in language testing is not always very marked, particularly for tests of general language proficiency.’ Content validity is considered important because it is principally concerned with the extent to which the selection of test tasks is representative of the larger universe of tasks of which the test is assumed to be a sample (see Bachman and Palmer, 1981; Henning, 1987: 94; Messick, 1989: 16; Bachman, 1990: 244). Similarly, Anastasi (1988: 131) defined content validity as involving: ‘essentially the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured.’ She outlined (Anastasi, 1988: 132) the following guidelines for establishing content validity:

1) ‘the behaviour domain to be tested must be systematically analysed to make certain that all major aspects are covered by the test items, and in the correct proportions’;

2) ‘the domain under consideration should be fully described in advance, rather than being defined after the test has been prepared’;

3) ‘content validity depends on the relevance of the individual’s test responses to the behaviour area under consideration, rather than on the apparent relevance of item content.’

The directness of fit and adequacy of the test sample are thus dependent on the quality of the description of the target language behaviour being tested. In addition, once test-takers’ responses to the items are taken into account, Messick (1975: 961) suggests that ‘the concern with processes underlying test responses places this approach to content validity squarely in the realm of construct validity’. Davies (1990: 23) similarly notes: ‘content validity slides into construct validity’.
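Anastasi’s first guideline, that the domain be covered ‘in the correct proportions’, can be expressed as a simple proportion check. The Python sketch below is an illustration under stated assumptions, not a procedure from this study: it assumes a hypothetical blueprint assigning a target share of test content to each area of the behaviour domain, and one area label per test item or task. The area labels, the target shares, the 0.05 tolerance and the function name `coverage_report` are all invented for the example.

```python
# Hypothetical blueprint: target share of test content for each area of the
# behaviour domain. Labels, shares and tolerance are illustrative assumptions.
BLUEPRINT = {
    "describing": 0.5,
    "expressing opinions": 0.3,
    "negotiating": 0.2,
}


def coverage_report(item_areas, blueprint, tolerance=0.05):
    """Check whether each domain area is present, in roughly its intended proportion.

    item_areas: one area label per test item or task.
    Returns (report, extra): report maps area -> (observed share, target share,
    status); extra lists labels that fall outside the blueprint.
    """
    total = len(item_areas)
    if total == 0:
        raise ValueError("no items to analyse")
    report = {}
    for area, target in blueprint.items():
        observed = item_areas.count(area) / total
        if observed == 0:
            status = "missing"   # a major aspect is not covered at all
        elif abs(observed - target) <= tolerance:
            status = "ok"        # within tolerance of the intended share
        elif observed < target:
            status = "under"     # covered, but in too small a proportion
        else:
            status = "over"      # covered in too large a proportion
        report[area] = (observed, target, status)
    # Labels not defined in the blueprint fall outside the specified domain.
    extra = sorted(set(item_areas) - set(blueprint))
    return report, extra


# Example: ten items, one area unrepresented and one item outside the domain.
example_items = ["describing"] * 6 + ["expressing opinions"] * 3 + ["off-domain"]
report, extra = coverage_report(example_items, BLUEPRINT)
for area, (observed, target, status) in report.items():
    print(f"{area}: observed {observed:.2f}, target {target:.2f}, {status}")
print("outside blueprint:", extra)
```

With the example data, ‘describing’ is flagged as over-represented, ‘expressing opinions’ as on target, ‘negotiating’ as missing, and the item labelled `off-domain` is reported as falling outside the blueprint. Fixing `BLUEPRINT` before any items are classified mirrors her second guideline.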

Content validation is, of course, extremely problematic given the difficulty we have in characterizing language proficiency with sufficient precision to ensure the validity of the representative sample we include in our tests, and given the further threats to validity arising from any attempt to operationalize real-life behaviours in a test. Specifying operations, let alone the conditions under which they are performed, is challenging, and attempts to do so are at best relatively unsophisticated (see Cronbach, 1990). Weir (1993) provides an introductory attempt to specify the operations and conditions that might form a framework for test task description (see also Bachman, 1990; Bachman and Palmer, 1996).

The difficulties involved do not, however, absolve us from attempting to make our tests as relevant as possible in terms of content. Generating content-related evidence is seen as a necessary, although not sufficient, part of the validation process for a speaking test. To this end, this study sought to establish an effective and efficient procedure for gathering evidence of the content validity of speaking tests. As well as helping to specify the domain to be tested, the checklist discussed below would, we argue, enable the researcher to compare predicted and actual task performance.
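The comparison of predicted and actual task performance can be sketched as simple set operations. The Python below is a minimal illustration under stated assumptions, not the authors’ procedure: it assumes the designer’s predictions for a task are recorded as a set of language-function labels, and that each observed performance yields the set of functions an observer ticked on the checklist. The function labels, the `TaskMatch` class and the `compare_task` and `observation_rates` functions are hypothetical names introduced for this example.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskMatch:
    """Result of comparing one task performance with the designer's predictions."""
    matched: set[str]     # predicted and observed
    missing: set[str]     # predicted but not observed
    unexpected: set[str]  # observed but not predicted

    @property
    def coverage(self) -> float:
        """Share of the predicted functions that were actually observed."""
        predicted = len(self.matched) + len(self.missing)
        return len(self.matched) / predicted if predicted else 0.0


def compare_task(predicted: set[str], observed: set[str]) -> TaskMatch:
    """Compare predicted functions with those ticked on a checklist for one performance."""
    return TaskMatch(
        matched=predicted & observed,
        missing=predicted - observed,
        unexpected=observed - predicted,
    )


def observation_rates(predicted: set[str], performances: list[set[str]]) -> dict[str, float]:
    """For each predicted function, the share of performances in which it was ticked."""
    if not performances:
        raise ValueError("no performances to analyse")
    counts = Counter()
    for observed in performances:
        counts.update(predicted & observed)  # count each function once per performance
    return {f: counts[f] / len(performances) for f in sorted(predicted)}


# Illustrative data only: function labels and checklist ticks are invented.
predicted_functions = {"describing", "comparing", "speculating", "expressing opinions"}
performances = [
    {"describing", "comparing", "agreeing"},
    {"describing", "speculating"},
]

first_match = compare_task(predicted_functions, performances[0])
rates = observation_rates(predicted_functions, performances)
print(first_match, f"coverage={first_match.coverage:.2f}")
print(rates)
```

Under these assumptions, a predicted function with a low observation rate across many performances would suggest that the task is not eliciting it as intended, while an unexpected function that appears often might point to language the task elicits but the specification does not anticipate.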

III Methodological issues

While it is relatively easy to rationalize the need to establish that the LETs used in performance tests are working as predicted (i.e., in terms of the language generated), the difficulty lies in how this might best be done.

UCLES EFL (English as a foreign language) routinely collects audio recordings of its Speaking Tests and produces transcriptions of them. These transcripts are used for a range of validation purposes; in particular, they contribute to revision projects for the Speaking Tests, for example the revision of FCE in 1996 and the current revisions of the International English Language Testing System (IELTS) Speaking Test and of CPE.

In a series of UCLES studies focusing on the language of the Speaking Tests, Lazaraton has applied conversation analysis (CA) techniques to contribute to our understanding of the language used in pair-format Speaking Tests, including the language of both the candidates and the interlocutor. Her approach requires a very careful, fine-tuned transcription of the tests in order to provide the data for analysis (see Lazaraton, 2000). Similar qualitative methodologies have been applied by Young and Milanovic (1992), also to UCLES data, as well as by Brown (1998) and by Ross and Berwick (1992), amongst others.

While there is clearly a great deal of potential in this kind of detailed analysis of transcribed performances, there are also a number of drawbacks, the most serious of which is the complexity of the transcription process. In practice, this means that a great deal of time and expertise is required to gather the kind of data that will answer the basic question concerning validity. Even where this is done, it is impractical to attempt to deal with more than a small number of test events;
