Validity & Reliability Studies
Gallaudet commissioned research studies on the ASLPI to:
- investigate and document validity and reliability of the ASLPI (current state);
- identify best practices and standards for language assessments that can be applied to the ASLPI;
- identify the gap between current and desired states and make recommendations to improve the ASLPI process and psychometrics (if needed).
Importance of Research:
- Documenting current level of success
- Test improvement
- Legal defensibility
- Validity, accuracy, and reliability
- User experience
- Tester training
- Compliance with various standards
- Psychometric review (e.g., for ACE credit)
- Contribution to research
- Leadership role in setting standard for ASL testing
Standards for High-Stakes Testing
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: Author.
- ASTM Language Proficiency Testing Standards. (ASTM International, known until 2001 as the American Society for Testing and Materials, is an international standards organization that develops and publishes voluntary consensus technical standards for a wide range of materials, products, systems, and services.)
- The Standards provides guidelines/responsibilities but is not prescriptive in how to achieve them
- Principles for the Validation and Use of Personnel Selection Procedures
- Uniform Guidelines on Employee Selection Procedures
Defining Validity of an assessment:
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. (Standards, p. 9)
Defining Reliability of an assessment:
Reliability refers to the consistency of measurements when the testing procedure is repeated on a population of individuals or groups. (Standards, p. 25)
The ASLPI is a rater-based assessment. Therefore, ASLPI interviewers must be skilled and remain strongly focused on the construct being measured throughout the interview process.
Measurement Structure of the ASL Proficiency Construct as Assessed by the ASLPI
The findings provided evidence that a majority of the variability in the five dimensions was attributable to a single higher-order construct. That is, a majority of the variability in dimension ratings can be meaningfully traced to a single underlying factor, as intended by the test developer. Moreover, the relationships between the dimensions and the single underlying factor were strongly consistent (i.e., approximately equal in magnitude). In sum, the five dimensions are representative of a single underlying construct, and the dimensions relate to the underlying construct in a consistent manner; these findings align with the design and current rating protocol of the ASLPI.
Preliminary Validity Study Conclusion:
- The five ASLPI dimensions appear representative of a single underlying construct, which is defined by five facets/dimensions
- The dimensions relate to the underlying construct in a consistent manner
- Ratings for each ASLPI dimension show very little systematic rater bias
- Each dimension contributed approximately equally to both the total score and the holistic rating
- This provides construct validity evidence supporting the ASLPI definition of ASL proficiency
Study Sample: 1568 assessment interviews rated by ASLPI evaluators
- 1286 test candidates between 2008 and 2011
- 82 re-rated interviews
Inter- and intra-rater reliability: a measure of consistency between ratings of the same test taker, either from different evaluators (inter-rater) or from the same evaluator on different occasions (intra-rater). This is expressed as an "intra-class correlation" (ICC), reported either for a single rater or for the average rating across 3 raters.
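The ICC can be estimated from a one-way ANOVA decomposition of a ratings matrix. The sketch below uses hypothetical ratings (not ASLPI data) to illustrate the distinction between single-rater reliability, ICC(1,1), and the reliability of the mean across k raters, ICC(1,k):

```python
# Sketch: one-way intra-class correlation (ICC) from hypothetical ratings.
# Rows = test takers, columns = 3 evaluators (illustrative data only).
ratings = [
    [3.0, 3.0, 4.0],
    [2.0, 2.0, 2.0],
    [4.0, 5.0, 4.0],
    [1.0, 2.0, 1.0],
    [5.0, 5.0, 4.0],
]

n = len(ratings)        # number of subjects
k = len(ratings[0])     # number of raters per subject

grand_mean = sum(sum(row) for row in ratings) / (n * k)
row_means = [sum(row) / k for row in ratings]

# One-way ANOVA: between-subject and within-subject mean squares.
ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
ss_within = sum((x - row_means[i]) ** 2
                for i, row in enumerate(ratings) for x in row)
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# ICC(1,1): reliability of a single rater's score.
icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
# ICC(1,k): reliability of the mean rating across k raters.
icc_average = (ms_between - ms_within) / ms_between

print(f"ICC(1,1) = {icc_single:.2f}, ICC(1,{k}) = {icc_average:.2f}")
```

Averaging over more raters always raises reliability, which is why the ICC for the mean of 3 raters exceeds the single-rater ICC.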
Correlation Across All Possible Evaluator Pairs
- Findings support the reliability of ASLPI ratings
- Evaluators consistently rank-ordered interviewees for total scores, for holistic ratings, and for dimension scores
- Evaluators rated performances consistently in terms of relative position on the ASLPI scale
Repeatability of ASLPI Ratings
- Findings supported the repeatability of ASLPI ratings using the adjacent agreement standard for ratings/scores, the standard in use at the time
- When re-rating ASLPI interviews, evaluators' re-ratings agreed with the initial ratings and met the adjacent standard
Agreement (+/- 1) [reference range: 90-100%]
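The adjacent agreement rate counts a re-rating as agreeing when it falls within one proficiency level of the initial rating. A minimal sketch with hypothetical ratings:

```python
# Sketch: re-rating repeatability under the adjacent (+/- 1) standard,
# using hypothetical initial and re-rated proficiency levels.
initial = [3, 2, 4, 1, 5, 3, 2, 4]
rerated = [3, 4, 4, 1, 4, 3, 2, 5]

# A re-rating "agrees" if it falls within one level of the initial rating.
adjacent = sum(abs(a - b) <= 1 for a, b in zip(initial, rerated))
rate = 100 * adjacent / len(initial)
print(f"Adjacent (+/- 1) agreement: {rate:.1f}%")
```

In this toy data, one re-rating differs by two levels, so the rate falls short of 100%.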
- Findings supported that a new panel of evaluators and the original panel of evaluators resulted in reliable ratings
Final Rating Reliability

|Comparison|n|Pearson r|Spearman rho|
|---|---|---|---|
|New panel (3 evaluators)|82|.89|.88|
|Original panel (3 evaluators)|81|.92|.90|
SWA Consulting, Inc (SWA)
311 S. Harrington St., Suite 200
Raleigh, NC 27603
Dr. Eric A. Surface is the president of SWA Consulting, Inc. (SWA; formerly Surface, Ward, & Associates), a management consulting and applied personnel research firm based in Raleigh, NC. SWA focuses on developing evidence-based human performance solutions for clients. For over a decade, Dr. Surface has worked with military, non-profit, and private-sector organizations, such as the United States Special Operations Command, the American Council on the Teaching of Foreign Languages, and IBM, on projects related to performance, training, work-related foreign language, work analysis, testing, and organizational effectiveness. Dr. Surface has been published in Personnel Psychology, Organizational Research Methods, Human Performance, Military Psychology, Journal of Management, and Foreign Language Annals. He has presented numerous papers at academic, professional, and military conferences and has served as a reviewer for several conferences and journals. His research interests include training effectiveness (broadly defined), the interaction of context and individual differences in influencing criteria, foreign language proficiency testing and training, the use of technology for training delivery, test validation, and survey methodology. Dr. Surface earned his PhD in industrial/organizational psychology from NCSU, his MA from East Carolina University, and his BA from Wake Forest University. He is a former Consortium Research Fellow and Consortium Post-Doctoral Research Fellow with the Army Research Institute.
Research Recommendations and Implementation
In 2010-2012, the ASLPI evaluation system underwent validity and reliability studies conducted by an external research and consulting firm. The lead researcher traveled to Gallaudet University to present the findings to Gallaudet faculty and administration (outcomes posted above).
Upon conclusion of this two-year research study, the following recommendations were prioritized by ASL Diagnostic and Evaluation Services:
- Enhance training and refresher trainings with a greater emphasis on rating accuracy, consistency, and strict performance appraisals for attaining and maintaining evaluator status.
- Develop more comprehensive Functional Descriptions for 0-5 proficiency levels, including Functional Descriptions for the plus (+) levels.
- Establish stricter standards for agreement among raters by transitioning from an adjacent rating approach to absolute agreement for rating decisions.
The spring 2013 semester focused intensively on enhancing the training and refresher training programs. Additional components and activities were developed with specific focus on interviewing strategies and techniques for eliciting functional language at each proficiency level, as well as on rating accuracy.
During the summer of 2013, from June through August, the ASLPI conducted a comprehensive and careful analysis of the Functional Descriptions for the 0-5 proficiency levels, including the plus (+) levels. After 500+ hours of critical analysis of 90+ video-recorded ASLPI evaluations, updated and enhanced Functional Descriptions were finalized.
At the beginning of fall 2013, all ASLPI Evaluators were required to participate in mandatory training. Stricter standards for rating agreement among evaluators were also instituted: the absolute agreement approach for rating decisions was implemented, as recommended for high-stakes testing.
Following the training and implementation of the stricter rating expectation (absolute agreement for proficiency level decisions), the ASLPI established a monitoring system. Prior to distribution, every rating decision went through a review process to examine inter- and intra-rater reliability, ensuring that the training was being applied effectively. At the end of fall 2013, the ASLPI conducted a rating reliability study across all evaluators, covering the 275 evaluations conducted. Rating reliability for the entire pool of ASLPI Evaluators was determined to be 84%; for high-stakes language proficiency testing, reliability must be 80% or above.
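The shift from the adjacent to the absolute agreement standard is a meaningful tightening: the same set of ratings can pass the adjacent test while falling short on exact matches. A sketch with hypothetical review data comparing the two criteria:

```python
# Sketch: absolute vs adjacent agreement between initial rating decisions
# and review-panel decisions (hypothetical data, not the 2013 study).
initial = [3, 2, 4, 1, 5, 3, 2, 4, 3, 2]
review  = [3, 2, 4, 2, 5, 3, 2, 4, 3, 2]

n = len(initial)
absolute = sum(a == b for a, b in zip(initial, review))        # exact match
adjacent = sum(abs(a - b) <= 1 for a, b in zip(initial, review))  # within 1

absolute_rate = 100 * absolute / n
adjacent_rate = 100 * adjacent / n

print(f"Absolute agreement: {absolute_rate:.0f}%")
print(f"Adjacent agreement: {adjacent_rate:.0f}%")
```

Here one decision differs by a single level, so adjacent agreement is perfect while absolute agreement is not, illustrating why the absolute standard is the stricter benchmark.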
The ASLPI is continuing to closely monitor all evaluators to ensure that a consistent shared mental model is maintained and that evaluators apply the same operational definitions of the proficiency constructs.