ASLPI Research & Statistics
Those who develop and utilize tests are responsible for demonstrating and documenting the effectiveness of their assessments. Effectiveness is typically demonstrated by providing evidence of the reliability and validity of the assessment's use for specific purposes and for specific populations of test-takers. Given that the ASLPI is a high-stakes evaluation often used as a "gatekeeper" for specific purposes, it is essential to document the reliability and validity of the assessment for the population that may be impacted by the results of the evaluation.
Gallaudet University has mandated and prioritized validity and reliability research on the ASLPI, which is currently underway. Through these studies, we will also work to align the ASLPI with best practices in the field of language proficiency evaluation. We look forward to sharing the research findings in the future.
Defining Validity of an assessment:
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. (Standards, pg. 9)
Defining Reliability of an assessment:
Reliability refers to the consistency of measurements when the testing procedure is repeated on a population of individuals or groups. (Standards, pg. 25)
The ASLPI is a rater-based assessment. Therefore, the ASLPI interviewers must be skilled and strongly focused on the construct being measured throughout the interview process.
Research Mission
Gallaudet commissioned research studies on the ASLPI to:
- investigate and document validity and reliability of the ASLPI (current state);
- identify best practices and standards for language assessments that can be applied to the ASLPI;
- identify the gap between current and desired states and make recommendations to improve the ASLPI process and psychometrics (if needed).
Importance of Research:
- Documenting current level of success
- Test improvement
- Validity, accuracy, and reliability
- User experience
- Tester training
- Legal defensibility
- Compliance with various standards
- Psychometrics are reviewed (e.g., ACE credit)
- Marketability/PR
- Contribution to research
- Leadership role in setting standard for ASL testing
Standards for High-Stakes Testing
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: Author.
- The Standards provides guidelines and responsibilities but does not prescribe how to meet them
- ASTM Language Proficiency Testing Standards. (ASTM International, known until 2001 as the American Society for Testing and Materials, is an international standards organization that develops and publishes voluntary consensus technical standards for a wide range of materials, products, systems, and services.)
- Principles for the Validation and Use of Personnel Selection Procedures
- Uniform Guidelines on Employee Selection Procedures
Validity Studies
Measurement Structure of the ASL Proficiency Construct as Assessed by the ASLPI

The findings provided evidence that a majority of the variability in the five dimensions was attributable to a single higher-order construct. That is, most of the variability in dimension ratings can be meaningfully traced to a single underlying factor, as intended by the test developer. Moreover, the relationships between the dimensions and the single underlying factor were strongly consistent (i.e., approximately equal in magnitude). In sum, the five dimensions are representative of a single underlying construct, and the dimensions relate to that construct in a consistent manner. These findings align with the design and current rating protocol of the ASLPI.
The findings also provided evidence that a majority of the variability in the observed ratings was attributable to their respective lower-order construct (i.e., dimension). Ratings were applied consistently and reliably to Grammar, Vocabulary, Fluency, Production/Accent, and Comprehension, indicating that interviewees’ actual proficiency on each dimension was primarily responsible for raters’ judgments of ASLPI performance. Only a small amount of variability in the ratings was attributable to unique, rater-specific variance (or bias). This increases confidence that ASLPI ratings differ as a function of differences in proficiency rather than as a function of rater characteristics (e.g., harshness/severity, inconsistency) or other sources of error. These findings support the intended internal structure of the ASLPI and align with the design and current rating protocol of this assessment.
Preliminary Validity Study Conclusion:
- The five ASLPI dimensions appear representative of a single underlying construct, which is defined by five facets/dimensions
- The dimensions relate to the underlying construct in a consistent manner
- Ratings for each ASLPI dimension show very little systematic rater bias
- Each dimension contributed approximately equally to both the total score and the holistic rating
- This provides construct validity evidence supporting the ASLPI definition of ASL proficiency
Reliability Studies
Study Sample:
1568 assessment interviews rated by ASLPI evaluators
- 1286 test candidates between 2008 and 2011
- 82 re-rated interviews
Inter- and Intra-rater Reliability - a measure of consistency between ratings of the same test taker from different evaluators. It is reported as an intra-class correlation (ICC) for a single rater or for the average rating across 3 raters.
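The intra-class correlation mentioned in this section can be illustrated with a small sketch. The source does not say which ICC form was used, so the one-way random-effects form below is an assumption, and the function name `icc_oneway` and the sample ratings are hypothetical. The sketch returns both the single-rater ICC and the ICC for the average of k raters.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC (an assumed form, not confirmed by the source).

    ratings: list of rows, one row per interview, each holding k ratings.
    Returns (single-rater ICC, ICC for the average of k raters).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean squares from a one-way ANOVA with interviews as the grouping factor.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical ratings: 4 interviews, each rated by 3 evaluators.
example = [[2, 3, 2], [4, 4, 5], [1, 1, 2], [3, 3, 3]]
single_icc, average_icc = icc_oneway(example)
print(round(single_icc, 3), round(average_icc, 3))
```

The average-rater value is always at least as high as the single-rater value when raters agree more than chance, which is why a panel of 3 raters yields a more dependable final rating than any one rater alone.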
Correlation Across All Possible Evaluator Pairs
- Findings support the reliability of ASLPI ratings
- Evaluators consistently rank-ordered interviewees for total scores, for holistic ratings, and for dimension scores
- Evaluators rated performances consistently in terms of relative position on the ASLPI scale
| Rating | Correlation Across All Possible Evaluator Pairs |
| Total Score | .90 |
| Holistic Rating | .90 |
| Dimensions | |
| Vocabulary | .82 |
| Grammar | .82 |
| Comprehension | .86 |
| Accent/Production | .80 |
| Fluency | .81 |
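As a rough illustration of a correlation computed across all possible evaluator pairs, the sketch below takes the Pearson correlation for every pair of evaluators and averages the results. The source does not state how pairs were combined, so the averaging step is an assumption, and the evaluator names and scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(scores_by_evaluator):
    """Average the Pearson r over every pair of evaluators (assumed aggregation)."""
    pairs = list(combinations(scores_by_evaluator.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical total scores from three evaluators for the same five interviews.
scores = {
    "evaluator_a": [12, 18, 9, 22, 15],
    "evaluator_b": [13, 17, 10, 21, 14],
    "evaluator_c": [11, 19, 9, 23, 16],
}
print(round(mean_pairwise_correlation(scores), 2))
```

A high value indicates that evaluators rank-order the interviewees similarly, which is the property the bullets above describe.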
Repeatability of ASLPI Ratings
- Findings supported the repeatability of ASLPI ratings under the adjacent agreement standard (ratings/scores within +/- 1 of each other), which is the current standard
- When ASLPI interviews were re-rated, evaluators’ re-ratings agreed with the initial ratings and met the adjacent agreement standard
| Rating | Agreement (+/- 1) [Ref 90-100%] |
| Production | 98% |
| Grammar | 98% |
| Vocabulary | 98% |
| Comprehension | 98% |
| Holistic Rating | 87% |
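The adjacent agreement figures above can be read as the share of re-ratings that fall within one scale point of the initial rating. A minimal sketch of that calculation follows. The function name `adjacent_agreement` and the sample ratings are hypothetical, and the exact scoring rules used in the study are not given in the source.

```python
def adjacent_agreement(initial, rerated, tolerance=1):
    """Proportion of re-ratings within `tolerance` points of the initial rating."""
    if len(initial) != len(rerated):
        raise ValueError("Both rating lists must have the same length")
    within = sum(abs(a - b) <= tolerance for a, b in zip(initial, rerated))
    return within / len(initial)


# Hypothetical ratings for five interviews (initial rating vs. re-rating).
initial_ratings = [3, 4, 2, 5, 1]
re_ratings = [3, 5, 4, 5, 1]
print(f"{adjacent_agreement(initial_ratings, re_ratings):.0%}")
```

Setting `tolerance=0` gives exact agreement instead, which is a stricter standard than the adjacent one reported here.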
ASLPI Re-Rating
- Findings indicated that both a new panel of evaluators and the original panel of evaluators produced reliable ratings
| Comparison | n | Final Rating Reliability: Pearson r | Final Rating Reliability: Spearman Rho |
| New panel (3 evaluators) | 82 | .89 | .88 |
| Original panel (3 evaluators) | 81 | .92 | .90 |
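The two coefficients in the table measure slightly different things: Pearson r reflects linear agreement between two sets of ratings, while Spearman's rho is the Pearson correlation of their ranks and so reflects agreement in rank order. The sketch below shows the relationship under that standard definition. The helper names and data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: same rank order, but a non-linear relationship.
first = [1, 2, 3, 4, 5]
second = [1, 4, 9, 16, 25]
print(round(pearson(first, second), 3), round(spearman(first, second), 3))
```

When the two coefficients are close, as in the table, the ratings agree both in rank order and in their spacing on the scale.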
Research Firm
SWA Consulting, Inc. (SWA)
311 S. Harrington St., Suite 200
Raleigh, NC 27603
Dr. Eric A. Surface is the president of SWA Consulting, Inc. (SWA; formerly Surface, Ward, & Associates), a management consulting and applied personnel research firm based in Raleigh, NC. SWA focuses on developing evidence-based human performance solutions for clients. For over a decade, Dr. Surface has worked with military, non-profit, and private-sector organizations (such as the United States Special Operations Command, the American Council on the Teaching of Foreign Languages, and IBM) on projects related to performance, training, work-related foreign language, work analysis, testing, and organizational effectiveness. Dr. Surface has been published in Personnel Psychology, Organizational Research Methods, Human Performance, Military Psychology, Journal of Management, and Foreign Language Annals. He has presented numerous papers at academic, professional, and military conferences and has served as a reviewer for several conferences and journals. His research interests include training effectiveness (broadly defined), the interaction of context and individual differences to influence criteria, foreign language proficiency testing and training, the use of technology for training delivery, test validation, and survey methodology. Dr. Surface earned his PhD in industrial/organizational psychology from NCSU, his MA from East Carolina University, and his BA from Wake Forest University. He is a former Consortium Research Fellow and Consortium Post-Doctoral Research Fellow with the Army Research Institute.
ASLPI Chart of Numbers Served

Nationwide Distribution of Proficiency Levels

