

Cultural Validity of Assessments and Assessment Development Procedures
Guillermo Solano-Flores and Sharon Nelson-Barber

WestEd

Paper presented at the 2000 American Educational Research Association Meeting.
New Orleans, LA, April 24-28

DRAFT: Do not duplicate or cite without written permission from the authors. Funding for the investigation here reported was provided by the National Science Foundation, grant number 9909729. The opinions expressed by the authors do not necessarily represent the opinions of the funding agency.

: Abstract
: Introduction
: Beliefs and Assumptions in the Testing of Cultural and Linguistic Minorities
: The Need for a Shift of Paradigms in Testing
: Relevance of Cultural Validity
: Final Comments: Assessing the Cultural Validity of Assessments
: Notes
: References


Abstract

            In this paper we discuss the concept of cultural validity as a form of test validity that should be incorporated into assessment practices. Cultural validity is attained when proper consideration is given to the values, beliefs, experiences, communication patterns, and epistemologies inherent to a given culture that influence how students from that culture make sense of test items and how they respond to them. Unlike approaches oriented towards adapting tests or correcting them for cultural bias, an approach based on the notion of cultural validity focuses on understanding and honoring cultural diversity throughout the process of assessment development, rather than at the end of it. Several implications of the notion of cultural validity are discussed.

Introduction

            Current methods for "handling" cultural diversity in assessment use an ex-post-facto approach. A test originally created to test a mainstream population of students is adapted to test another population of students whose culture or native language is different. Or accommodations are provided with the intent to include these cultural or linguistic minority students in an assessment system originally designed to test the mainstream population of students. Or cultural bias is estimated and corrected statistically after the final version of a test is administered to samples of different populations of students.

            While well intentioned, these approaches are not enough to ensure equity and fairness in testing. From a philosophical standpoint, these approaches do not treat cultures and languages with the same respect; mainstream culture dictates the content, contextual information, and wording of tests. From a methodological standpoint, the equivalence of an original test and its adapted or translated version is questionable; these test versions are developed with different procedures. From a practical standpoint, the quality of test adaptations and translations is often alarmingly poor. Finally, from a theoretical standpoint, these approaches do not take into account the ways in which culture influences thinking.

            We propose that cultural validity (Solano-Flores & Nelson-Barber, 2000), as a form of test validity, should be incorporated into assessment practices. Unlike other approaches intended to ensure test equity and fairness, an approach based on cultural validity proactively incorporates cultural diversity throughout the process of assessment development, rather than treating it as a source of score variance between cultural groups at the end of the process.

            In this paper we discuss the theoretical foundations, relevance, and implications of the concept of cultural validity. We contend that attaining cultural validity may imply an important shift of paradigms in test development and use.


Beliefs and Assumptions in the Testing of Cultural and Linguistic Minorities

            Historically, the education of cultural and linguistic minorities has reflected a clash of cultures. The styles and values of an imposed school system often conflict with the learning and teaching styles, values, and beliefs of cultural minorities. As a result, ineffective and segregating tracking systems (Oakes, 1985, 1990) and cultural stereotypes that affect intellectual identity and performance (Steele & Aronson, 1995; Steele, 1997) combine with poverty (Knapp, Shields, & Turnbull, 1995) as factors responsible for the low academic performance of cultural minorities in the United States.

            Not surprisingly, assessment practices may be helping perpetuate these inequalities. If these practices overlook or deny cultural diversity by relying on assumptions that derive from or are applicable to the mainstream population, then tests and the scores they produce are likely placing cultural minority students at a disadvantage.

            A case in point is test translation. For decades, back translation has been a standard, commonly accepted procedure for ensuring the quality of a test translation. First, an experienced translator translates the test from the original language into another language. Then a panel of bilingual scholars reviews the translated version, which is translated back into the first language to monitor retention of the original meaning. Finally, the translated version of the assessment is refined as it is tried out with a sample of students.

            An earlier study (Solano-Flores, Ruiz-Primo, Baxter, & Shavelson, 1992) observed that even when a strict back translation procedure is used, the resulting translation may not be entirely sensitive to the characteristics of the students targeted. For example, some students may be unfamiliar with some of the words translators assume are part of the students' everyday language.

            If translations are made without considering the sociocultural dimensions of language use, they may reflect the translators' thinking, not the language used by students. The scores produced may be considered dependable because the translation procedure used is widely accepted. In reality, they confound academic skills with the students' vocabulary in their first language (Durán, 1989).

            The fact that the original version and the translated version of a test are not developed using the same procedure calls into question the validity of the scores obtained (Solano-Flores & Nelson-Barber, 1999; Solano-Flores, Trumbull, & Nelson-Barber, 2000). Other practices add to the problem. For example, despite the existence of test translation guidelines (e.g., Van de Vijver & Hambleton, 1996), tests are sometimes translated within short timelines that do not allow for the try-out and refinement given to the original version.

            Inadequate test translation is just one example of how assessment practices might be led by inaccurate assumptions about language and culture. The simplistic belief that adapting a test (e.g., by translating it into another language or by providing accommodations) is enough to properly serve diverse populations can have the catastrophic effect of perpetuating inequalities in the assessment of these groups.


The Need for a Shift of Paradigms in Testing

            Anthropologists have provided extensive evidence that culture shapes our worldviews (the way we perceive the world) and our epistemologies (the way we construct knowledge). They are aware that erroneous conclusions about the skills and knowledge of individuals can be arrived at when the influence of culture on their thinking is not properly considered (Greenfield, 1998).

            This notion conflicts with basic assumptions and practices in current testing. For example, conducting individual interviews may not provide accurate data in societies in which knowledge is constructed collectively, which conflicts with the practice of basing assessment on individual scores. Or, unrelated questions intended to obtain fragmented pieces of information (as in a questionnaire or a multiple choice test) are meaningless to cultures in which knowledge is a contextualized phenomenon, which conflicts with the assumption of item independence, basic in psychometric theory. Or, certain questions may not elicit a response from individuals of certain cultures unless the wording and format are adapted based on their epistemologies, which conflicts with the aim of standardization.

            Diverse approaches, mainly of a statistical nature, have been created to correct for cultural bias, make items equivalent across cultures, and adapt and translate tests originally developed for a mainstream population (e.g., Van de Vijver & Hambleton, 1996; Van de Vijver & Leung, 1997; Van de Vijver & Tanzer, 1998; Ercikan, 1998; Hambleton, 1994; Van de Vijver & Poortinga, 1997). Though necessary and well-intentioned, these approaches have the limitation of neglecting the social dimension of knowledge and addressing culture merely as a source of score variance. Once the role of culture in the construction of knowledge is properly recognized, the limitations of these approaches to assessing students from different cultural backgrounds become evident.

            What is missing is a revision of the criteria that define what should and should not be considered acceptable in the assessment of diverse populations. We believe this is a good time for such a revision to take place, as important changes in assessment practices are occurring. Among these changes are the exploration of the quality of scores on tasks when students work in teams (e.g., Webb, 1989, 1991), a revision of the relevance of standardization in testing (Kopriva, 1999), and the adoption of qualitative information on the students' thinking processes as a source of evidence for test validity (Magone, Cai, Silver, & Wang, 1994; Baxter, Elder, & Glaser, 1996).


Relevance of Cultural Validity

            In a related paper (Solano-Flores & Nelson-Barber, 2000), we have proposed the notion of cultural validity. Cultural validity is attained when proper consideration is given to the values, beliefs, experiences, communication patterns, and epistemologies inherent to a given culture that influence how students from that culture make sense of test items and how they respond to them. Unlike current thinking in measurement, which addresses culture from the perspective of fairness as a criterion for test validity (Linn, Baker, & Dunbar, 1991), we propose that cultural validity is a form of test validity in its own right.

            Whereas current approaches to cultural diversity in assessment focus on detecting and correcting for cultural bias, an approach based on the notion of cultural validity focuses on understanding the cultural influences that shape students' thinking as they engage in responding to test items. Current approaches focus on the final (or close to final) version of an assessment or on an adapted or translated version of it. In contrast, an approach based on the notion of cultural validity focuses on honoring and addressing those cultural differences during the process of assessment development.

            Our conceptual framework is based on the Vygotskian notion that the sociocultural setting shapes mental functioning (see Vygotsky, 1978; Wertsch, Del Río, & Alvarez, 1995). We are interested in understanding the relationship between culturally defined learning and teaching styles, beliefs, and values, and the way students make sense of test items and respond to them.

            Naturally, we build on experience gained from recent studies with multiple choice items (Norris, 19##), hands-on tasks (Magone, Cai, Silver, & Wang, 1994; Baxter, Elder, & Glaser, 1996; Hamilton, Nussbaum, & Snow, 1997; Baxter & Glaser, 1998), and concept maps (Ruiz-Primo, Schultz, Li, & Shavelson, 1999), which show that assessment validity can also be examined through the quality and complexity of the knowledge and reasoning used by students when they take assessments. However, consistent with the notion that culture shapes thinking, the methods for examining cultural validity must go beyond simply analyzing cognitive activity. They must explain that cognitive activity in the context of culture.


Final Comments: Assessing the Cultural Validity of Assessments

            The notion of cultural validity has a serious implication. It suggests the possibility that many current assessments deemed valid from the perspective of equity and fairness may not be valid from the perspective of cultural validity.

            As a first step towards examining the relevance of the notion of cultural validity, we are examining whether culture influences the way in which students interpret test items. Our intent is to determine how failing to consider those cultural differences may penalize students from some cultural groups.

            We are administering a set of science and mathematics items from a national, standardized test deemed psychometrically sound to samples of students of different cultural groups and geographical areas in the United States. We use both quantitative and qualitative methods. For example, we are using statistical procedures to examine p values and score differences between groups. We are also using interviews and verbal protocol analyses to obtain information about the cognitive activity elicited by tasks, to determine how that cognitive activity is influenced by culture, and to determine how culture can account for score differences between cultural groups.
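
            The quantitative side of this design can be illustrated with a minimal sketch. It assumes that "p value" is used in the classical test theory sense (the proportion of students who answer an item correctly), and it is not the authors' analysis code. The function names, group labels, item label, and response records below are hypothetical and made up for illustration only; they are not data from the study.

```python
from collections import defaultdict


def item_p_values(responses):
    """Return {group: {item: proportion correct}}.

    `responses` is an iterable of (group, item, correct) records,
    where `correct` is True or False.
    """
    # counts[group][item] holds [number correct, number of responses]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, item, correct in responses:
        cell = counts[group][item]
        cell[0] += 1 if correct else 0
        cell[1] += 1
    return {
        group: {item: n_correct / n_total
                for item, (n_correct, n_total) in items.items()}
        for group, items in counts.items()
    }


def p_value_gaps(p_values, group_a, group_b):
    """Return the p value difference (group_a minus group_b) per shared item."""
    shared = p_values[group_a].keys() & p_values[group_b].keys()
    return {item: p_values[group_a][item] - p_values[group_b][item]
            for item in sorted(shared)}


# Hypothetical records (made up for illustration, not study data).
records = [
    ("group_1", "erosion", True),
    ("group_1", "erosion", True),
    ("group_1", "erosion", False),
    ("group_1", "erosion", True),
    ("group_2", "erosion", True),
    ("group_2", "erosion", False),
    ("group_2", "erosion", False),
    ("group_2", "erosion", False),
]

p = item_p_values(records)
gaps = p_value_gaps(p, "group_1", "group_2")
print(p)     # per-group proportion correct on each item
print(gaps)  # between-group difference in p value on each shared item
```

            A gap of this kind only flags an item for closer study. It does not say why the groups differ, which is the question the interviews and verbal protocol analyses are meant to address.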

            This two-year project started recently. Preliminary results suggest that looking at how personal experience and sociocultural activity influence the way in which students interpret and solve science and mathematics problems reveals information about the quality of the test items that would not be possible to obtain by using other approaches. The following example illustrates this:

A Latino girl is given a NAEP science exercise on erosion. The exercise shows two pictures, A and B, of the same river and mountains. In A, the mountains look low and round and the river wide. In B, the mountains are high and pointy and the river narrow. The item asks the student to circle the letter under the picture that shows how the river and mountains look NOW (as opposed to how they looked millions of years ago). This girl picks B, the wrong answer. "Have you ever seen mountains?" we ask. She says no. But then she recalls that, when she went to Salinas (California) to visit some relatives, she saw mountains "that looked like this" (picture B). "I've never seen mountains that look like this" (she points at picture A). During the interview we learn that she doesn't remember learning about mountains at school. Nor has she had such experiences as climbing mountains or hiking in the countryside. She recalls, however, one day when she went on a picnic. During that picnic, she saw a river and some rocks in it. We ask why she thinks the rocks were there. "Maybe some kids threw them into the river," she responds. (Note 1)

            We have observed that the way students interpret science items and respond to them may be influenced more by personal experience than by formal school learning experience. The girl in this example has limited first-hand experience with mountains. As a result, she interprets the "now" in the item ("how these mountains look now") only from the perspective of personal experience, which occurs in the present. The resulting epistemology, which could be called personal realism, seems to be: "If I see it, that's because it is happening now; if I don't see it, that's because it doesn't exist." That, combined with the fact that she lives on the West Coast, which is young, geologically speaking, and where any mountains are pointy, leads her to give an incorrect answer.

            Whether cultural groups have different ways of interpreting test items will not be clear until we have gathered more information. If evidence shows that culture influences the way in which students interpret and respond to test items, and that not considering cultural differences may penalize students from some cultural groups, there will be strong support for the notion that cultural validity is a form of test validity that should be considered systematically in assessment development and testing practices.


Notes

Note 1: We appreciate the contribution of Ursula Sexton and Rachel Lagunoff, who conducted this interview together with the first author.


References

Baxter, G.P., Elder, A.D., & Glaser, R. (1996). Knowledge-based cognition and performance assessment in the science classroom. Educational Psychologist, 31(2), 133-140.

Baxter, G.P. & Glaser, R. (1998). Investigating the cognitive complexity of science assessments. Educational Measurement: Issues and Practice, 17(3), 37-45.

Durán, R.P. (1989). Testing of linguistic minorities. In R. Linn (Ed.), Educational Measurement (3rd Edition). New York: American Council on Education, Macmillan Publishing Company.

Ercikan, K. (1998). Translation effects in international assessment. International Journal of Educational Research, 29, 543-553.

Greenfield, P. M. (1998). Culture as process: Empirical methods for cultural psychology. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology, Second Edition. Vol. 1: Theory and method. Needham Heights, Massachusetts: Allyn & Bacon.

Hambleton, R.K. (1994). Guidelines for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10(3), 229-244.

Hamilton, L.S., Nussbaum, E.M., & Snow, R.E. (1997). Interview procedures for validating science assessments. Applied Measurement in Education, 10, 181-200.

Knapp, M. S., Shields, P. M., & Turnbull, B. J. (1995). Academic challenge in high-poverty classrooms. Phi Delta Kappan, 76, 770-776.

Kopriva, R. (1999). A conceptual framework for the valid and comparable measurement of all students. Paper presented at the American Educational Research Association annual meeting, April 19-23. Montreal, Canada.

Linn, R.L., Baker, E.L., & Dunbar, S.B. (1991). Complex performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 5-21.

Lipka, J. (1998). Transforming the culture of schools: Yup'ik Eskimo examples. Mahwah, New Jersey: Lawrence Erlbaum.

Magone, M. E., Cai, J., Silver, E. A., & Wang, N. (1994). Validating the cognitive complexity and content quality of a mathematics performance assessment. International Journal of Educational Research, 21(3), 317-340.

Oakes, J. (1985). Keeping track: How schools structure inequality. New Haven, Connecticut: Yale University Press.

Oakes, J. (1990). Multiplying inequalities: The effects of race, social class, and tracking on opportunities to learn mathematics and science. Santa Monica, California: The RAND Corporation.

Ruiz-Primo, M. A., Schultz, S. E., Li, M., & Shavelson, R. J. (1999). On the cognitive validity of interpretations of scores from alternative concept-mapping techniques. Paper presented at the Annual Meeting of the American Educational Research Association. Montreal, Canada, April 19-23.

Solano-Flores, G., & Nelson-Barber, S. (1999). Developing culturally-responsive science assessments. Workshop paper presented at the 1999 Meeting of the National Association for Research in Science Teaching. Boston, Massachusetts, March 28-31.

Solano-Flores, G., & Nelson-Barber, S. (2000). Cultural validity in assessment and assessment development procedures. Paper presented at the 2000 American Educational Research Association Meeting. New Orleans, LA, April 24-28.

Solano-Flores, G., Trumbull, E., & Nelson-Barber, S. (2000). Evaluation of a model for the concurrent development of two language (English and Spanish) versions of a mathematics assessment. Paper presented at the 2000 American Educational Research Association Meeting. New Orleans, LA, April 24-28.

Solano-Flores, G., Ruiz-Primo, M. A., Baxter, G. P., & Shavelson, R. J. (1991). Science performance assessment with language minority students. Santa Barbara, CA. University of California, Santa Barbara.

Steele, C., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797-811.

Steele, C. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. American Psychologist, 52(6), 613-629.

Van de Vijver, F., & Hambleton, R.K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1(2), 89-99.

Van de Vijver, F., & Leung, K. (1997). Methods and data analysis of comparative research. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology, Second Edition. Vol. 1: Theory and method. Needham Heights, Massachusetts: Allyn & Bacon.

Van de Vijver, F., & Poortinga, Y. H. (1997). Towards an integrated analysis of bias in cross-cultural assessment. European Journal of Psychological Assessment, 13(1), 29-37.

Van de Vijver, F., & Tanzer, N. K. (1998). Bias and equivalence in cross-cultural assessment: An overview. European Review of Applied Psychology, 47(4), 263-279.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, Massachusetts: Harvard University Press.

Webb, N. (1989). Peer interaction and learning in small groups. International Journal of Educational Research, 13, 21-39.

Webb, N. (1991). Task related verbal interaction and mathematics learning in small groups. Journal for Research in Mathematics Education, 22, 366-389.

Wertsch, J. V., Del Río, P., & Alvarez, A. (Eds.) (1995). Sociocultural studies of mind. New York, New York: Cambridge University Press.
