ASLPI Research & Statistics

 
Those who develop and utilize tests are responsible for demonstrating and documenting the effectiveness of their assessments. Effectiveness is typically demonstrated by providing evidence of the reliability and validity of the assessment's use for specific purposes and for specific populations of test-takers. Given that the ASLPI is a high-stakes evaluation often used as a "gatekeeper" for specific purposes, it is essential to document the reliability and validity of the assessment for the population that may be impacted by the results of the evaluation.

Gallaudet University mandates and prioritizes ongoing validity and reliability research on the ASLPI. Through these studies, we work to keep the ASLPI aligned with best practices in the field of language proficiency evaluation. As research studies conclude and their outcomes become available to share, they will be posted below.

Research Mission
Gallaudet commissioned research studies on the ASLPI to:
  • investigate and document validity and reliability of the ASLPI (current state);
  • identify best practices and standards for language assessments that can be applied to the ASLPI;
  • identify the gap between current and desired states and make recommendations to improve the ASLPI process and psychometrics (if needed).
Importance of Research:
  • Documenting current level of success
  • Test improvement
    • Validity, accuracy, and reliability
    • User experience
    • Tester training
  • Legal defensibility
  • Compliance with various standards
  • Psychometrics are reviewed (e.g., ACE credit)
  • Marketability/PR
  • Contribution to research
  • Leadership role in setting standard for ASL testing

Standards for High-Stakes Testing
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: Author.
    • The Standards provides guidelines/responsibilities but is not prescriptive in how to achieve them
  • ASTM Language Proficiency Testing Standards. (ASTM International, known until 2001 as the American Society for Testing and Materials (ASTM), is an international standards organization that develops and publishes voluntary consensus technical standards for a wide range of materials, products, systems, and services.)
  • Principles for the Validation and Use of Personnel Selection Procedures
  • Uniform Guidelines on Employee Selection Procedures
Defining Validity of an assessment:
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. (Standards, pg. 9)

Defining Reliability of an assessment:
Reliability refers to the consistency of measurements when the testing procedure is repeated on a population of individuals or groups. (Standards, pg. 25)

The ASLPI is a rater-based assessment. Therefore, ASLPI interviewers must be skilled and must stay strongly focused on the construct being measured throughout the interview process.

Validity Studies
Measurement Structure of the ASL Proficiency Construct as Assessed by the ASLPI

The findings provided evidence that a majority of the variability in the five dimensions was attributable to a single higher-order construct. That is, a majority of the variability in dimension ratings can be meaningfully traced to a single underlying factor, which is as intended by the test developer. Moreover, the relationships between the dimensions and the single underlying factor were strongly consistent (i.e., approximately equal in magnitude). In sum, the five dimensions are representative of a single underlying construct, and the dimensions relate to the underlying construct in a consistent manner. These findings align with the design and current rating protocol of the ASLPI.

Preliminary Validity Study Conclusion:
  • The five ASLPI dimensions appear representative of a single underlying construct, which is defined by five facets/dimensions
  • The dimensions relate to the underlying construct in a consistent manner
  • Ratings for each ASLPI dimension show very little systematic rater bias
  • Each dimension contributed approximately equally to both the total score and the holistic rating
  • This provides construct validity evidence supporting the ASLPI definition of ASL proficiency

Reliability Studies
Study Sample: 1568 assessment interviews rated by ASLPI evaluators
  • 1286 test candidates between 2008 and 2011
  • 82 re-rated interviews
Inter- and intra-rater reliability: a measure of the consistency between ratings of the same test taker, whether from different evaluators (inter-rater) or from the same evaluator on separate occasions (intra-rater). It is reported as an intraclass correlation (ICC), either for a single rater or for the average rating across 3 raters.
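As an illustration of how a single-rater ICC and an average-of-raters ICC relate, the sketch below computes both from a table of ratings. It assumes a two-way random-effects, absolute-agreement model (often labeled ICC(2,1) and ICC(2,k)); the source does not state which ICC form the study used, so this is only one plausible choice. The function name and the example ratings are hypothetical and are not ASLPI data.

```python
def icc_two_way(ratings):
    """Return (single-rater ICC, average-of-raters ICC).

    ratings: list of rows, one row per interview, one score per rater.
    Assumes a two-way random-effects, absolute-agreement model;
    this model choice is an assumption, not taken from the study.
    """
    n = len(ratings)        # number of interviews
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-interview mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


# Hypothetical ratings: 5 interviews (rows) scored by 3 raters (columns).
example = [
    [2, 2, 3],
    [3, 3, 3],
    [1, 2, 1],
    [4, 4, 5],
    [3, 4, 4],
]
single, average = icc_two_way(example)
print(f"single-rater ICC: {single:.2f}, 3-rater average ICC: {average:.2f}")
```

When raters agree reasonably well, the average-of-raters value is higher than the single-rater value, because averaging across raters cancels part of the rater-specific error.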

Correlation Across All Possible Evaluator Pairs
  • Findings support the reliability of ASLPI ratings
  • Evaluators consistently rank-ordered interviewees for total scores, for holistic ratings, and for dimension scores
  • Evaluators rated performances consistently in terms of relative position on the ASLPI scale
Measure                  Correlation across evaluator pairs
Total Score              .90
Holistic Rating          .90
Dimensions
  Vocabulary             .82
  Grammar                .82
  Comprehension          .86
  Accent/Production      .80
  Fluency                .81
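One way to read "correlation across all possible evaluator pairs" is as the average of the correlations computed for every pair of evaluators who rated the same interviews. The sketch below follows that reading; the averaging step, the function names, and the sample scores are assumptions for illustration, not the study's documented procedure or data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(scores_by_evaluator):
    """Average Pearson correlation over every pair of evaluators.

    scores_by_evaluator: one list per evaluator, with scores for the
    same interviews in the same order.
    """
    pairs = list(combinations(scores_by_evaluator, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical total scores from 3 evaluators for the same 6 interviews.
scores = [
    [10, 14, 18, 22, 12, 20],
    [11, 13, 19, 21, 12, 18],
    [9, 15, 17, 23, 13, 21],
]
print(f"mean pairwise correlation: {mean_pairwise_correlation(scores):.2f}")
```

A high value means the evaluators rank-ordered the interviewees similarly, which is the kind of consistency the findings above describe.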

Repeatability of ASLPI Ratings
  • Findings supported the repeatability of ASLPI ratings under the adjacent agreement standard (ratings within one level of each other), which was the standard in use at the time of the study
  • When evaluators re-rated ASLPI interviews, their re-ratings agreed with the initial ratings and met the adjacent agreement standard
Measure            Agreement (+/- 1) [Ref 90-100%]
Production         98%
Grammar            98%
Vocabulary         98%
Comprehension      98%
Holistic Rating    87%
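The difference between adjacent agreement (+/- 1) and absolute (exact) agreement can be sketched as a percent-agreement calculation with a tolerance. The example assumes ratings are coded as integer steps on an ordinal scale; how plus (+) levels would be coded is not specified in the source, and the function name and ratings below are hypothetical.

```python
def agreement_rate(first, second, tolerance=0):
    """Share of rating pairs that differ by at most `tolerance` steps.

    tolerance=0 gives absolute (exact) agreement;
    tolerance=1 gives adjacent agreement (+/- 1).
    """
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and equal in length")
    hits = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance)
    return hits / len(first)


# Hypothetical initial ratings and re-ratings for the same 5 interviews.
initial = [2, 3, 3, 4, 1]
rerated = [2, 4, 3, 2, 1]

print(f"absolute agreement: {agreement_rate(initial, rerated):.0%}")
print(f"adjacent agreement: {agreement_rate(initial, rerated, tolerance=1):.0%}")
```

Because every exact match also counts as an adjacent match, the adjacent figure can never be lower than the absolute figure for the same data, which is why absolute agreement is the stricter standard.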

ASLPI Re-Rating
  • Findings indicated that both a new panel of evaluators and the original panel of evaluators produced reliable ratings
Comparison                      n    Final Rating Reliability
                                     Pearson r    Spearman rho
New panel (3 evaluators)        82   .89          .88
Original panel (3 evaluators)   81   .92          .90
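The two coefficients reported for the re-rating comparison measure slightly different things: Pearson r reflects linear agreement between the raw scores, while Spearman rho is the same calculation applied to rank positions. A minimal sketch follows, assuming tied scores receive the average of their rank positions; the function names and sample ratings are hypothetical, not study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical final ratings from an original panel and a re-rating panel.
original = [2, 3, 3, 4, 1, 5, 2]
rerating = [2, 3, 4, 4, 1, 5, 3]
print(f"Pearson r: {pearson(original, rerating):.2f}")
print(f"Spearman rho: {spearman(original, rerating):.2f}")
```

Reporting both is a common check for ordinal rating scales: if the two values are close, the conclusion does not depend on treating the proficiency levels as equally spaced.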

Research Firm
SWA Consulting, Inc. (SWA)
311 S. Harrington St., Suite 200
Raleigh, NC 27603

Dr. Eric A. Surface is the president of SWA Consulting, Inc. (SWA; formerly Surface, Ward, & Associates), a management consulting and applied personnel research firm based in Raleigh, NC. SWA focuses on developing evidence-based human performance solutions for clients. For over a decade, Dr. Surface has worked with military, non-profit, and private-sector organizations (such as the United States Special Operations Command, the American Council on the Teaching of Foreign Languages, and IBM) on projects related to performance, training, work-related foreign language, work analysis, testing, and organizational effectiveness. Dr. Surface has been published in Personnel Psychology, Organizational Research Methods, Human Performance, Military Psychology, Journal of Management, and Foreign Language Annals. He has presented numerous papers at academic, professional, and military conferences and has served as a reviewer for several conferences and journals. His research interests include training effectiveness (broadly defined), the interaction of context and individual differences to influence criteria, foreign language proficiency testing and training, the use of technology for training delivery, test validation, and survey methodology. Dr. Surface earned his PhD in industrial/organizational psychology from NCSU, his MA from East Carolina University, and his BA from Wake Forest University. He is a former Consortium Research Fellow and Consortium Post-Doctoral Research Fellow with the Army Research Institute.

Research Recommendations and Implementation
In 2010-2012, the ASLPI evaluation system underwent validity and reliability studies conducted by an external research and consulting firm. The lead researcher traveled to Gallaudet University to present the findings to the Gallaudet faculty and administration (outcomes posted above).

Upon conclusion of this two-year research study, ASL Diagnostic and Evaluation Services (ASL-DES) prioritized the following recommendations:
  • Enhance training and refresher trainings with a greater emphasis on rating accuracy, consistency, and strict performance appraisals for attaining and maintaining evaluator status.
  • Develop more comprehensive Functional Descriptions for 0-5 proficiency levels, including Functional Descriptions for the plus (+) levels.
  • Establish stricter standards for agreement among raters by transitioning from an adjacent rating approach to absolute agreement for rating decisions.
The spring 2013 semester focused intensively on enhancing the training and refresher training programs. Additional components and activities were developed, with a specific focus on interviewing strategies and techniques for eliciting functional language at each proficiency level, as well as on rating accuracy.

During the summer of 2013, from June through August, ASL-DES conducted a comprehensive and careful analysis of the Functional Descriptions for the 0-5 proficiency levels, including the plus (+) levels. After 500+ hours of work, including critical analysis of 90+ video-recorded ASLPI evaluations, updated and enhanced Functional Descriptions were finalized.

At the beginning of fall 2013, all ASLPI Evaluators were required to participate in a training. Adherence to stricter standards for rating agreement among evaluators was also instituted: the absolute agreement approach for rating decisions was implemented, as is recommended for high-stakes testing.

Following the training and implementation of the stricter rating expectation (absolute agreement for proficiency level decisions), ASL-DES established a monitoring system. Prior to distribution, every rating decision went through a review process to examine inter- and intra-rater reliability. This review helped ensure that the training was being applied and was effective. At the end of fall 2013, ASL-DES conducted a rating reliability study across all evaluators. The reliability study included the 275 evaluations conducted. Rating reliability for the entire pool of ASLPI Evaluators was determined to be 84%. For high-stakes language proficiency testing, reliability must be 80% or above.

ASL-DES continues to closely monitor all evaluators to ensure that they share a consistent mental model and apply the same operational definitions of the proficiency constructs.

ASLPI Chart of Numbers Served

Nationwide Distribution of Proficiency Levels
The majority of results fall within the 2-3 proficiency level range. This national distribution includes examinees who are deaf, hard of hearing, late-deafened, and hearing, and who vary in background, experience, education, age, and acquisition of ASL.