
Foundations and Trends® in Information Retrieval

Vol. 3, Nos. 1–2 (2009) 1–224

© 2009 D. Kelly

DOI: 10.1561/1500000012

Methods for Evaluating Interactive Information Retrieval Systems with Users

By Diane Kelly


Contents

1 Introduction
1.1 Purpose and Scope
1.2 Sources and Recommended Readings
1.3 Outline of Paper
2 What is Interactive Information Retrieval?
3 Background
3.1 Cognitive Viewpoint in IR
3.2 Text Retrieval Conference
4 Approaches
4.1 Exploratory, Descriptive and Explanatory Studies
4.2 Evaluations and Experiments
4.3 Laboratory and Naturalistic Studies
4.4 Longitudinal Studies
4.5 Case Studies
4.6 Wizard of Oz Studies and Simulations
5 Research Basics
5.1 Problems and Questions
5.2 Theory
5.3 Hypotheses
5.4 Variables and Measurement
5.5 Measurement Considerations
5.6 Levels of Measurement
6 Experimental Design
6.1 Traditional Designs and the IIR Design
6.2 Factorial Designs
6.3 Between- and Within-Subjects Designs
6.4 Rotation and Counterbalancing
6.5 Randomization and User Choice
6.6 Study Mode
6.7 Protocols
6.8 Tutorials
6.9 Timing and Fatigue
6.10 Pilot Testing
7 Sampling
7.1 Probability Sampling
7.2 Non-Probability Sampling Techniques
7.3 Subject Recruitment
7.4 Users, Subjects, Participants and Assessors
8 Collections
8.1 Documents, Topics, and Tasks
8.2 Information Needs: Tasks and Topics
9 Data Collection Techniques
9.1 Think-Aloud
9.2 Stimulated Recall
9.3 Spontaneous and Prompted Self-Report

School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, dianek@email.unc.edu

Abstract

This paper provides an overview of and instruction in the evaluation of interactive information retrieval systems with users. The primary goal of this article is to catalog and compile material related to this topic into a single source. This article (1) provides historical background on the development of user-centered approaches to the evaluation of interactive information retrieval systems; (2) describes the major components of interactive information retrieval system evaluation; (3) describes different experimental designs and sampling strategies; (4) presents core instruments, data collection techniques and measures; (5) explains basic data analysis techniques; and (6) reviews and discusses previous studies. This article also discusses validity and reliability issues with respect to both measures and methods, presents background information on research ethics and discusses some ethical issues which are specific to studies of interactive information retrieval (IIR). Finally, this article concludes with a discussion of outstanding challenges and future research directions.

1 Introduction

Information retrieval (IR) has experienced huge growth in the past decade as increasing numbers and types of information systems are being developed for end-users. The incorporation of users into IR system evaluation and the study of users’ information search behaviors and interactions have been identified as important concerns for IR researchers [46]. While the study of IR systems has a prescribed and dominant evaluation method that can be traced back to the Cranfield studies [54], studies of users and their interactions with information systems do not have well-established methods. For those interested in evaluating interactive information retrieval systems with users, it can be difficult to determine how to proceed from a scan of the literature since guidelines for designing and conducting such studies are for the most part missing.

In interactive information retrieval (IIR), users are typically studied along with their interactions with systems and information. While classic IR studies abstract humans out of the evaluation model, IIR focuses on users’ behaviors and experiences — including physical, cognitive and affective — and the interactions that occur between users and systems, and between users and information. In simple terms, classic IR evaluation asks the question, does this system retrieve relevant documents? IIR evaluation asks the question, can people use this system to retrieve relevant documents? IIR studies include both system evaluations and more focused studies of users’ information search behaviors and their interactions with systems and information. IIR is informed by many fields including traditional IR, information and library science, psychology, and human–computer interaction (HCI). IIR has often been presented more generally as a combination of IR and HCI, or as a sub-area of HCI, but Ruthven [225] argues convincingly that IIR is a distinct research area. Recently, there has been interest in HCIR, or human–computer information retrieval, but this looks similar to IIR and papers about this area have not established its uniqueness (e.g., [191]).

The proposition that IR systems are fundamentally interactive and should be evaluated from the perspective of users is not new. A review of IR literature reveals that many leaders in the field were writing about and studying interactive IR systems during the early years of IR research. For instance, Salton wrote a paper entitled “Evaluation problems in interactive information retrieval” which was published in 1970.

In this paper, Salton [229] identified user effort measures as important components of IR evaluation, including the attitudes and perceptions of users. Cleverdon et al. [55] identified presentation issues and user effort as important evaluation measures for IR systems, along with recall and precision. Tague and Schultz [259] discuss the notion of user friendliness.

Some of the first types of IR interactions were associated with relevance feedback. Looking closely at this seemingly simple type of interaction, we see the difficulties inherent in IIR studies. Assuming that users are provided with information needs, each user is likely to enter a different query, which will lead to different search results and different opportunities for relevance feedback. Each user, in turn, will provide different amounts of feedback, which will create new lists of search results. Furthermore, causes and consequences of these interactions cannot be observed easily since much of this exists in the user’s head. The actions that are available for observation — querying, saving a document, providing relevance feedback — are surrogates of cognitive activities. From such observable behaviors we must infer cognitive activity; for instance, users who save a document may do so because it changes or adds to their understanding of their information needs.

User–system interactions are influenced by a number of other factors that are neither easily observable nor measurable. Each individual user has a different cognitive composition and behavioral disposition. Users vary according to all sorts of factors including how much they know about particular topics, how motivated they are to search, how much they know about searching, how much they know about the particular work or search task they need to complete, and even their expectations and perceptions of the IIR study [139, 194]. Individual variations in these factors mean that it is difficult to create an experimental situation that all people will experience the same, which in turn, makes it difficult to establish causal relationships. Moreover, measuring these factors is not always practical since there are likely a large number of factors and no established measurement practices.

The inclusion of users in any study necessarily makes IIR, in part, a behavioral science. As a result, appropriate methods for studying interactive IR systems must unite research traditions from two sciences, which can be challenging. It is also the case that different systems, interfaces and use scenarios call for different methods and metrics, and studies of behavior and interaction suggest research designs that go beyond evaluation. For these reasons, there is no strong evaluation or experimental framework for IIR evaluations as there is for IR studies.

IIR researchers are able to make many choices about how to design and conduct their evaluations, but there is little guidance about how to do this.

1.1 Purpose and Scope

There is a small body of research on evaluation models, methods, and metrics for IIR, but such studies are the exception rather than the rule (e.g., [34, 149]). In contrast to other disciplines, where studies of methods and experimental design comprise an important portion of the literature, there are few, if any, research programs in IIR that investigate these issues, and there is little formal guidance about how to conduct such studies, despite a long-standing call for such work [231]. Tague’s [260, 262] work and select chapters of the edited volume by Spärck Jones [246] provide good starting points, but these writings are 15–20 years old. While it might be argued that Spärck Jones’s book still describes the basic methodology behind traditional IR evaluations, Tague’s work, which focuses on user-centered methods, needs updating given changes in search environments, tasks, users, and measures. It is also the case that Tague’s work does not discuss data analysis. One might consult a statistics textbook for this type of information, but it can sometimes be difficult to develop a solid understanding of these topics unless they are discussed within the context of one’s own area of study.

The purpose of this paper is to provide a foundation on which those new to IIR can make more informed choices about how to design and conduct IIR evaluations with human subjects.1 The primary goal is to catalog and compile material related to the IIR evaluation method into a single source. This paper proposes some guidelines for conducting one basic type of IIR study — laboratory evaluations of experimental IIR systems. This is a particular kind of IIR study, but not the only kind. This paper is also focused more on quantitative methods, rather than qualitative. This is not a statement of value or importance, but a choice necessary to maintain a reasonable scope for this paper.

This article does not prescribe a step-by-step recipe for conducting IIR evaluations. The design of IIR studies is not a linear process and it would be imprudent to present the design process in this way. Typically, method design occurs iteratively, over time. Design decisions are interdependent; each choice impacts other choices. Understanding the possibilities and limitations of different design choices helps one make better decisions, but there is no single method that is appropriate for all study situations. Part of the intellectual work of IIR is the method design itself. Prescriptive methods imply that research can only be done in one way and often prevent researchers from discovering better ways of doing things.

1 The terms user and subject are often used interchangeably in published IIR studies. A distinction between these terms will be made in Section 7. Since this paper focuses primarily on laboratory evaluations, the term subject will be used when discussing issues related to laboratory evaluations, and user will be used when discussing general issues related to all IIR studies. Subject is used to indicate a person who has been sampled from the user population to be included in a study.

The focus of this paper is on text retrieval systems. The basic methodological issues presented in this paper are relevant to other types of IIR systems, but each type of IIR system will likely introduce its own special considerations and issues. Additional attention is given to the study of different types of IIR systems in the final section of this paper.

Digital libraries, a specific setting where IIR occurs, are also not discussed explicitly, but again, much of the material in this paper will be relevant to those working in this area [29].

Finally, this paper surveys some of the work that has been conducted in IIR. The survey is not intended to be comprehensive. Many of the studies that are cited are used to illustrate particular evaluation issues, rather than to reflect the state-of-the-art in IIR. For a current survey of research in IIR, see Ruthven [225]. For a more historic perspective, see Belkin and Vickery [23].

1.2 Sources and Recommended Readings

A number of papers about evaluation have been consulted in the creation of this paper and have otherwise greatly influenced its content. As mentioned earlier, the works of Tague [260, 262, 263, 264] and Tague and Schultz [259] are seminal pieces. The edited volume by Spärck Jones [246] also formed a foundation for this paper.

Other research devoted to the study and development of individual components or models for IIR evaluation has also influenced this paper. Borlund [32, 34] has contributed much to IIR evaluation with her studies of simulated information needs and evaluation measures.

Haas and Kraft [115] reviewed traditional experimental designs and related these to information science research. Ingwersen and Järvelin [139] present a general discussion of methods used in information seeking and retrieval research. Finally, the TREC Interactive Track [80] and all of the participants in this Track over the years have made significant contributions to the development of an IIR evaluation framework.

Review articles have been written about many topics discussed in this paper. These articles include Sugar’s [255] review of user-centered

perspectives in IR and Turtle et al.’s [277] review of interactive IR research, as well as Ruthven’s [225] more recent version. The Annual Review of Information Science and Technology (ARIST) has also published many chapters on evaluation over its 40-year history, including King’s [173] article on the design and evaluation of information systems,2 Kantor’s [161] review of feedback and its evaluation in IR, Rorvig’s [223] review of psychometric measurement in IR, Harter and Hert’s [123] review of IR system evaluation, and Wang’s [290] review of methodologies and methods for user behavior research.


