Graduate Psychometrics Assignment 1

Assignment Content
1.
· Access the Mental Measurements Yearbook, located in the University Library.
· Select two assessments of intelligence and two achievement tests.
· Critique the major definitions of intelligence. Determine which theory of intelligence best fits your selected instruments. Explain how the definition and the measures are related.
· Evaluate the measures of intelligence you selected for reliability, validity, normative procedures, and bias.
· Compare your selected intelligence and achievement assessments. How are the goals of the tests similar and different? How are the tests used? What are the purposes of giving these differing tests?
Note: Select two assessments of intelligence and two achievement tests. Do NOT choose an “emotional” intelligence test; we are looking at the typical intelligence test that gives a standard score and IQ. (View attachments for example tests.)
Part 2
Answer each question using at least 175 words.

Discussion 1
In everyday living, mental abilities tend to operate in unison rather than in isolation. How useful is it, therefore, to attempt to isolate and measure “primary mental abilities”? What is factor analysis? Which theories of intelligence relate to it? What about information-processing theories? How are they different?
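As a concrete illustration of what factor analysis does (a minimal sketch with simulated data, not drawn from any real instrument), the snippet below generates three test scores that share one latent ability and then recovers how much of their common variance a single factor explains:

```python
import numpy as np

# Toy illustration of the idea behind factor analysis: scores on several
# mental-ability tests correlate because they share a common factor ("g").
# We simulate one latent ability plus test-specific noise, then estimate
# the shared factor from the correlation matrix's largest eigenvalue.
# All loadings and sample sizes are made up for illustration.
rng = np.random.default_rng(0)
n = 1000
g = rng.normal(size=n)                     # latent general ability
tests = np.column_stack([                  # three observed test scores
    0.8 * g + 0.6 * rng.normal(size=n),
    0.7 * g + 0.7 * rng.normal(size=n),
    0.6 * g + 0.8 * rng.normal(size=n),
])
corr = np.corrcoef(tests, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)    # ascending eigenvalues
first = eigvals[-1] / eigvals.sum()        # variance explained by 1st factor
print(f"Shared variance explained by one factor: {first:.0%}")
```

Real factor-analytic work (e.g., the confirmatory analyses reported for the KTEA-3 below) tests hypothesized factor structures rather than simply extracting the largest component, but the underlying logic is the same.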
Discussion 2
Thanks for your post. My preferred theory of intelligence is Gardner’s theory of multiple intelligences. I think it is absolutely true that everyone has intelligence, but there are many different areas or types of intelligence.
How does Gardner’s theory compare and contrast to Sternberg’s theory?
How can these be related to the learning environment?
11/3/2019 EBSCOhost
https://web.a.ebscohost.com/ehost/delivery?sid=6991eb8b-6929-488f-b10d-f1bafd67a059%40sessionmgr4007&vid=9&ReturnUrl=https%3a%2f%2fw… 1/7
EBSCO Publishing Citation Format: APA (American Psychological Assoc.):
NOTE: Review the instructions at http://support.ebsco.com/help/?int=ehost&lang=&feature_id=APA and make any necessary corrections before using. Pay special attention to personal names, capitalization, and dates. Always consult your library resources for the exact formatting and punctuation guidelines.
References
Kaufman, A. S., Kaufman, N. L., & Breaux, K. C. (2014). Kaufman Test of Educational Achievement, Third Edition. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=shib&db=mmt&AN=test.6516&site=ehost-live&scope=site&custid=uphoenix
Kaufman Test of Educational Achievement, Third Edition

Review of the Kaufman Test of Educational Achievement, Third Edition by KAREN MACKLER, School Psychologist, Lawrence Public Schools, Lawrence, NY:

DESCRIPTION. The Kaufman Test of Educational Achievement, Third Edition (KTEA-3) is an individually administered measure of academic achievement for children in prekindergarten through Grade 12, as well as adults ages 19 to 25. The assessment covers reading, math, writing, and language areas. There are two parallel forms of the test, A and B. The test is comprehensive and comes with test easels, separate booklets for written expression stories, test protocols including a student response booklet, an administration manual, a scoring manual, and a USB key, which includes scoring protocols and the technical manual. The battery comprises 19 subtests, though administration of all subtests is not required. A subtest may stand on its own or may be used as part of a composite. The test manual advises which subtests should be administered based on the referral question, which is very helpful for school-based personnel, especially if the test is to be used for considerations other than eligibility. Administration time varies depending upon which subtests are given. An estimate of 10 to 35 minutes is given for each of the three core batteries. One typically would want to obtain the Academic Skills Battery composite score, which should take approximately 15 minutes at the earlier age ranges and about 85 minutes for students in Grades 3 and up. The Written Expression subtest takes the longest to administer, as examinees are asked to write an essay as part of the subtest. The test is primarily norm-referenced, but may be interpreted as criterion-referenced in reading, math, oral language, and written language when using the error analysis capabilities of the test.
Uses for the test include determining eligibility for classification and placement, identifying skill strengths and weaknesses, progress monitoring, and demonstrating the effectiveness of response to intervention (RTI) programs. The assessment profile allows for error analysis, which may be useful for instructional planning for school-aged students. Parallel forms of the test would be helpful for pretest/posttest designs or collecting longitudinal data. The subtests may be given to follow disability categories listed in the Individuals with Disabilities Education Improvement Act of 2004 (IDEIA, 2004) and academic deficit

areas of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5; American Psychiatric Association, 2013). The test is the latest in the Kaufman Achievement series, the last being the Kaufman Test of Educational Achievement–Second Edition (KTEA-II; 2004). Of the 19 subtests, four are new. Other subtests have been revised by adding new items or improving content coverage. Administration procedures have been simplified. Even the artwork has been updated. The new subtests are Reading Vocabulary, Silent Reading Fluency, Writing Fluency, and Math Fluency. The Naming Facility subtest on the KTEA-II was broken into two subtests on this measure, Object Naming Facility and Letter Naming Facility.

DEVELOPMENT. The Kaufman Test of Educational Achievement, Third Edition is the latest achievement battery developed by Alan and Nadeen Kaufman. This version is a revision of the KTEA-II, originally published in 2004. The KTEA-3 underwent a rigorous process leading up to its standardization. The authors conducted mini-pilot studies and a full pilot study prior to a national tryout. The tryout evaluated students and young adults representative of the national population. The final normative sample consisted of approximately 2,600 individuals in prekindergarten through Grade 12, ages 4 through 25. All participants spoke English and did not have physical or perceptual disabilities that would preclude them from taking the test without modifications. None of the participants was institutionalized. Data from the tryout sample were included with the standardization sample.

TECHNICAL. The norm sample was broken down by grade, and norms were created for fall, winter, and spring. The norm sample based upon ages was largely taken from the grade norm sample. In both samples, half of the group was administered Form A and half Form B. Overall, considerable care was taken to ensure a representative sample based on the most current U.S. Census data available.
Reliability was computed using the split-half method for all subtests except for the subtests that are timed. Reliability coefficients computed for most composite scores were quite high, in the .80s and the .90s. The Oral Fluency composite coefficients were lower, in the .70s. The reliability for this composite was stronger at younger ages, which is appropriate, as this skill is called upon as a prerequisite for early literacy skills. These findings held true across all grade and age ranges. Of interest is the excellent reliability data (mid- to high .90s) for Letter and Word Recognition, Nonsense Word Decoding, Reading Vocabulary, Math Concepts and Applications, and Spelling. Somewhat lower reliability coefficients were found for Reading Comprehension, Phonological Processing, Math Fluency, Written Expression, and Listening Comprehension, ranging from .80 to the low .90s. Alternate-form reliability was computed for Forms A and B, and the test authors concluded that the two forms measure the same academic abilities. Scores should be consistent regardless of the form chosen for administration. Fluency tests showed lower reliabilities that may be due in part to individual differences such as motivation, stamina, attention, and background knowledge. Interrater reliability was computed for Oral Expression (90%) and Written Expression (95%). Reliability coefficients were high, indicating that the scoring criteria presented in the manuals may be used to obtain scores that are consistent across examiners. Validity was established relative to test content, response processes, and internal structure. Because much of the test was modeled after the preceding KTEA-II, the validity studies should still hold. The constructs of the various academic composites are valid and useful for the practitioner. Concurrent or convergent evidence of validity is important to school-based personnel when deciding which tests to purchase. 
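The split-half procedure described above can be sketched in a few lines. The data here are simulated (made-up items and examinees, not KTEA-3 data), but the mechanics match the description: correlate two half-test scores, then apply the Spearman-Brown correction to estimate full-length reliability.

```python
import numpy as np

# Simulated dichotomous item responses: each examinee has a latent ability,
# and each item is passed when ability plus item noise exceeds zero.
rng = np.random.default_rng(1)
n_people, n_items = 500, 40
ability = rng.normal(size=(n_people, 1))
items = (ability + rng.normal(size=(n_people, n_items)) > 0).astype(int)

# Split the test into odd and even items and correlate the half-scores.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction: reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```

The corrected coefficient is always at least as large as the half-test correlation, which is why split-half estimates are routinely reported with this correction.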
When compared to scores from other tests (e.g., Wechsler Individual Achievement Test–Third Edition, Woodcock-Johnson III Tests of Achievement), mean KTEA-3 scores for individuals were somewhat lower. This might be attributed to the Flynn effect and is typical of
newly normed tests, since older norms grow softer over time. Studies provided support for the validity of the composites and subtests. Higher correlations were found between core academic subtests and between KTEA-3 composites and similar composites on other tests. Several studies involving special populations were conducted (those with learning disabilities, mild intellectual disability, attention-deficit/hyperactivity disorder, specific learning disability, and academically gifted students), yielding promising results for the discriminant ability of the test, which makes it clinically useful.

COMMENTARY. Once again, the Kaufmans have presented a well-researched assessment measure that shows high promise for clinical utility. The content of the subtests is relevant and covers the breadth of academic skills needed for success in a school career and beyond. Subtests can stand alone or be used as part of meaningful composites. Fluency subtests should not be used alone or used to make diagnostic or placement decisions. The Oral Language composite does not demonstrate the same level of reliability as other composites, but most often other assessment data from other sources will corroborate scores on these subtests. Two scoring methods may be used. The Q-global method is the test publisher’s web-based platform. It may be a little overwhelming at first, but the Q-global system offers the ability to identify a pattern of relative strengths and weaknesses, useful for identification and classification. The report also provides the psychologist with intervention suggestions based upon error analysis and suggestions for parents to use at home to support underlying academic skills. Alternatively, protocols may be scored by hand. Support for this approach is provided on the USB key included in the test kit.
The flash drive also includes audio files needed to administer the Listening Comprehension passages as well as information useful for other subtests. The files can be downloaded to another device such as a laptop or smartphone. At first, it may be somewhat overwhelming for some psychologists or other test administrators to deal with the flash drive, but after administering and scoring the test several times, this reviewer found the procedures much easier. The error analysis and intervention suggestions are helpful, and goals may be taken directly from the suggestions given. As response to intervention becomes more and more ingrained in current practice, this feature is very helpful for saving time and assisting teachers with ideas for appropriate intervention. The standardization sample reflected the different areas of the country, but it would have been helpful to include students who do not speak English as their first language. Such individuals often constitute a problematic subgroup on standardized tests. Overall, the test design is very user-friendly and current in terms of what is needed in today’s schools.

SUMMARY. The Kaufman Test of Educational Achievement, Third Edition (KTEA-3) is a thoroughly planned, user-friendly instrument that can be very helpful for school-based practitioners or those assisting young adults in planning for their futures. The test presents good statistics in both reliability and validity studies. The test battery covers all of the areas necessary for a good understanding of an examinee’s academic strengths and weaknesses. The error analysis is also helpful for determining eligibility for services and setting goals. Parallel forms are helpful for pretest/posttest analyses and progress monitoring. Future research will make this test even more helpful for meeting current educational assessment needs.

REVIEWER’S REFERENCE
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.

Review of the Kaufman Test of Educational Achievement, Third Edition by MERILEE McCURDY, Associate Professor, and LESLIE HART, Doctoral Candidate, Department of Educational Psychology and Counseling, University of Tennessee, Knoxville, TN:

DESCRIPTION. The Kaufman Test of Educational Achievement, Third Edition (KTEA-3) is an achievement battery for children in prekindergarten through Grade 12 (ages 4 through 25). The KTEA-3 is an individually administered assessment of a child’s achievement across numerous domains, ranging from reading and written language to mathematics and oral language. Performance on the KTEA-3 can be considered both norm-referenced and, through error analysis, criterion-referenced. Compared to the second edition, the KTEA-3 adds restandardized norms, four new subtests, and revisions to existing subtests to aid clarity and administration. The assessment is composed of 19 subtests that combine to form composite scores. The core composites are Reading (composed of the Letter and Word Recognition and Reading Comprehension subtests), Math (Math Concepts and Applications and Math Computation subtests), and Written Language (Written Expression and Spelling subtests). The KTEA-3 provides a unique composite called the Academic Skills Battery. Given the nature of this composite, the required number of subtests increases as the student ages. At the Pre-K level, the composite includes Math Concepts and Applications, Letter and Word Recognition, and Written Expression. Assessment at the Kindergarten level also includes the Math Computation and Spelling subtests. All other ages take these five subtests as well as Reading Comprehension. The KTEA-3 also can produce four supplemental reading-related composite scores. The Sound-Symbol composite is composed of the Phonological Processing and Nonsense Word Decoding subtests. The Decoding composite is derived from the Letter and Word Recognition and Nonsense Word Decoding subtests.
The Reading Fluency composite includes the Silent Reading Fluency, Word Recognition Fluency, and Decoding Fluency subtests, and the Reading Understanding composite includes the Reading Comprehension and Reading Vocabulary subtests. Two oral composites can be constructed. The first, Oral Language, is composed of Associational Fluency, Listening Comprehension, and Oral Expression. The second, Oral Fluency, is derived from Associational Fluency and Object Naming Facility. Four cross-domain composite scores can also be derived. The first, Comprehension, is composed of the Reading Comprehension and Listening Comprehension subtests; the Expression composite includes the Written Expression and Oral Expression subtests; the Orthographic Processing composite includes Spelling, Letter Naming Facility, and Word Recognition Fluency; and the Academic Fluency composite includes the Writing Fluency, Math Fluency, and Decoding Fluency subtests. The KTEA-3 kit has two parallel forms for use in repeated assessment. Administration of the subtests can be customized to address the presenting referral concern of the client.

DEVELOPMENT. The KTEA-3 was initially conceptualized in early 2009. The subtests are designed to assess primary academic areas: basic reading, reading understanding, reading fluency, language processing, mathematics, written language, and oral language. At the beginning stage of redevelopment, researchers administered six subtests to 37 participants in Kindergarten and Grade 2. Four were new subtests (Reading Vocabulary, Silent Reading Fluency, Math Fluency, and Writing Fluency) and two were included due to a change in administration (Spelling, Math Computation). Following this “mini-pilot,” a second mini-pilot was conducted using most of the remaining subtests. Afterward, two pilot administrations (one group-administered and one individually administered) were conducted to obtain information about item content and the usability of the test.
A tryout phase in 2011 included approximately 870 typical examinees and 32 students identified as having specific learning disabilities in reading and/or writing. This stage resulted in changes in start
points, and results of item analyses led to the removal of items biased with respect to gender, parental education level, or ethnicity. The test authors report that no more than 10% of the initial items were removed from any given subtest. The standardization data were collected from August 2012 to July 2013 and included 2,600 participants. This group was matched to the 2012 U.S. Census data across age, gender, parent education, ethnicity, region, and exceptionality. The level of agreement between the U.S. Census and the standardization sample is reported and shows good concordance. This agreement holds true across both the spring and fall standardization samples. There may be some underrepresentation across exceptionality groups, particularly for individuals with attention-deficit/hyperactivity disorder. Results from this standardization study led to some administration changes, the dropping of items appearing to be biased, the equating of parallel forms, and vertical scaling.

TECHNICAL. Reliability. Reliability was assessed using split-half, alternate-form, and interrater reliability methods. Standard errors of measurement were also provided. Split-half reliability coefficients are reported as evidence of internal consistency. The test authors report this metric both by grade and by age. Across grade levels, coefficients for subtests ranged from .54 to .98. At the composite level, coefficients ranged from .70 to .99. Across ages, coefficients ranged from .55 to .99 at the subtest level and from .66 to .99 at the composite level. Alternate-form reliability was evaluated across three grade-based ranges of participants (PK-2, 3-6, and 7-12). Demographic information for the 306 examinees in the sample shows concordance with the U.S. Census figures. An average of 7.5 days elapsed between the administrations. For the first participant range (PK-2), correlations at the subtest level ranged from .59 to .95. At the composite level, the correlations ranged from .74 to .96.
The second participant range (Grades 3-6) showed similar patterns of correlations at the subtest (.54 to .92) and composite (.69 to .96) levels. The third grouping (Grades 7-12) demonstrated correlations at the subtest level ranging from .59 to .95 and, at the composite level, from .70 to .96. Using these reviewers’ criteria for evaluating reliability coefficients, reliability coefficients ranged from good (.80s) to excellent (.90s) across a majority of composite scores. Additional information on reliability is provided through the standard error of measurement (SEM). Across grades, the SEM ranges from 2.12 to 10.17 at the subtest level and from 1.74 to 8.28 at the composite level. An age-based consideration of the SEM shows similar patterns across subtests (1.5 to 10.06) and composites (1.55 to 8.81). Final evidence of reliability is provided through a consideration of interrater reliability. The KTEA-3 includes two subtests that require some degree of interpretation by the administrator. Agreement between administrators in the normative sample was 90% (Oral Expression) and 95% (Written Expression), suggesting that the rigorous scoring procedures outlined for the KTEA-3 allow for consistent interpretation of examinee responses.

Validity. The KTEA-3 demonstrates validity through content and statistical analyses. The content of the KTEA-3 subtests is derived from concrete academic areas, and answer choices are closely examined to identify patterns and lines of reasoning leading to the correct answer. Further evidence of validity might be taken from the test authors’ suggestion that some KTEA-3 content aligns with the Common Core State Standards. The validity of the structure of the test was assessed in three parts: intercorrelation analysis, factor analysis, and fit statistics.
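The SEM values reported in the reliability discussion above follow from a simple relationship between a scale’s standard deviation and a score’s reliability. The sketch below (illustrative numbers only, not the publisher’s scoring software) shows how an SEM and a confidence band are derived on a standard-score metric with M = 100 and SD = 15:

```python
import math

# SEM = SD * sqrt(1 - r): the less reliable the score, the larger the SEM.
def sem(sd: float, reliability: float) -> float:
    return sd * math.sqrt(1.0 - reliability)

# A 95% confidence band around an observed score uses z = 1.96.
def confidence_interval(score: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple[float, float]:
    e = z * sem(sd, reliability)
    return score - e, score + e

# Example with an assumed reliability of .95 and an observed score of 108.
s = sem(15, 0.95)
lo, hi = confidence_interval(108, 15, 0.95)
print(f"SEM = {s:.2f}; 95% CI for a score of 108: {lo:.1f}-{hi:.1f}")
```

This is why composites with reliabilities in the .90s carry the small SEMs quoted above, while lower-reliability fluency scores carry SEMs several points wide.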
Intercorrelation studies established an adequate relationship across subtest and composite scores, particularly in Reading, Math, Written Language, Sound-Symbol, Decoding, and Reading Fluency (.70s and .80s). Oral Language and Oral Fluency demonstrated correlation coefficients with other composites that were between the .40s and .50s, which was typical of previous
iterations of the KTEA. Grade-based intercorrelations are provided from prekindergarten through Grade 12. Supplemental age-based intercorrelations are provided for ages 17-25. Subtest correlations supported the conceptualization of the different composites and provided evidence of discriminant validity. Confirmatory factor analyses also provided evidence of validity. The four-factor model (i.e., Math, Reading, Written Language, Oral Language) first conceptualized with earlier iterations of the KTEA yielded strong fit statistics using data from the KTEA-3. Results of a five-factor model supported adding the Reading Fluency composite. Convergent evidence of validity was provided through comparisons of KTEA-3 scores with scores from other measures of academic achievement and cognitive abilities. Administration of the KTEA-3 was coupled with the KTEA-II or one of three other achievement or speech assessments (Wechsler Individual Achievement Test–Third Edition [WIAT-III], Woodcock Johnson III Tests of Achievement [WJ III ACH], Clinical Evaluation of Language Fundamentals, Fourth Edition [CELF-4]). Correlations with the subtests of the KTEA-II ranged from .06 to .91 at the subtest level and from .02 to .89 at the composite level. Additional correlational data are provided for the composites of the KTEA-II with both the subtests (.06 to .89) and composites (.18 to .93) of the KTEA-3. Correlations between the WIAT-III and the KTEA-3 ranged in magnitude from .00 to .87 (for subtests) and from .05 to .95 (for composites). Selected subtests of the WJ III ACH were correlated with the KTEA-3 subtests yielding correlation coefficients ranging from .13 to .80. Correlations between the KTEA-3 composites and cluster scores from the WJ III ACH ranged from .21 to .87. 
Those subtests of the KTEA-3 with some oral language component (Oral Expression, Written Expression, and Listening Comprehension) were correlated with the Formulated Sentences subtest of the CELF-4, yielding correlations ranging from .47 to .64. The KTEA-3 also was compared with two tests of cognitive abilities, the Kaufman Assessment Battery for Children, Second Edition (KABC-II) and the Differential Ability Scales–Second Edition (DAS-II). Correlations between the KTEA-3 and KABC-II ranged from .02 to .75. Correlations between composite scores on the KTEA-3 and the School-Aged Battery of the DAS-II ranged from .20 to .75. The correlations between the cognitive ability measures and the KTEA-3 are, on average, lower than those between the KTEA-3 and other achievement measures. Performances of special groups held true to theoretical and clinical expectations, adding further evidence of the validity and utility of KTEA-3 scores. Groups included in the analyses were those with specific learning disorder in reading, written expression, or mathematics; language disorder (either expressive or mixed expressive/receptive presentation); mild intellectual disability; attention-deficit/hyperactivity disorder; and those identified as academically gifted.

COMMENTARY. The Kaufman Test of Educational Achievement, Third Edition offers some significant improvements over the previous version of the assessment. The revised age-based norms allow for assessment of younger children. The KTEA-3 also seems a strong option for specific referral problems, particularly with the ability to address different levels of concern with the multitude of composite scores offered. Of particular interest is the manual’s description of the functionality of the KTEA-3 alongside the Common Core State Standards (CCSS).
Given the nationwide implementation of the CCSS, this elaboration could provide some utility in interpreting scoring patterns and in making recommendations to facilitate future academic success. Whereas previous versions of the KTEA struggled with a perception of being difficult to administer, this revision feels more fluid. Although a number of supplies are required for administering the KTEA-3, the gains associated with gathering these supplies may outweigh the small hassle.

SUMMARY. The KTEA-3 purports to be a valid and reliable assessment of an individual’s academic
achievement, and the information in the technical manual supports this assertion. The materials provide a dynamic assessment experience for the child, incorporating recorded problems, fluency problems, and charmingly illustrated writing prompts. The KTEA-3 proves to be a valuable revision to the Kaufman family of tests.
*** Copyright © 2014. The Board of Regents of the University of Nebraska and the Buros Center for Testing. All rights reserved. Any unauthorized use is strictly prohibited. Buros Center for Testing, Buros Institute, Mental Measurements Yearbook, and Tests in Print are all trademarks of the Board of Regents of the University of Nebraska and may not be used without express written consent.
References
Schrank, F. A., McGrew, K. S., Mather, N., LaForte, E. M., Wendling, B. J., & Dailey, D. (2015). Woodcock-Johnson® IV Tests of Early Cognitive and Academic Development. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=shib&db=mmt&AN=test.8061&site=ehost-live&scope=site&custid=uphoenix
Woodcock-Johnson® IV Tests of Early Cognitive and Academic Development

Review of the Woodcock-Johnson® IV Tests of Early Cognitive and Academic Development by RUSSELL N. CARNEY, Professor of Psychology, Missouri State University, Springfield, MO:

DESCRIPTION. The Woodcock-Johnson® IV Tests of Early Cognitive and Academic Development (ECAD) is a new individually administered battery of tests specifically designed for children ages 2.5 to 7 years, as well as 8- and 9-year-olds with cognitive delays. It is the downward age extension of the newly revised Woodcock-Johnson IV (WJ IV; Schrank, McGrew, & Mather, 2014). According to the test’s Comprehensive Manual, the purpose is “to identify emergent cognitive abilities and early academic skills” (p. 1) and to detect cognitive delays and relative strengths and weaknesses that may inform early interventions. The ECAD consists of 10 tests administered by way of a single, colorful easel book, wherein test stimuli face the child and directions face the examiner. Four of the tests are unique to the ECAD, and the remaining six are alternate forms from other Woodcock-Johnson tests. Administration of the ECAD takes about 50 minutes, and several of the tests have basal and ceiling rules. Two tests use an audio recording via a CD, one requires a two-sided response worksheet, and two are timed. The test manual provides clear directions for both administration and scoring. Seven of the tests measure cognitive factors (e.g., Gf, Gc, Gwm) based on the Cattell-Horn-Carroll (CHC) theory of cognitive abilities (e.g., McGrew, 2005), and three measure academic ability (reading, math, and written language). The ECAD yields three cluster scores: General Intellectual Ability-Early Development (GIA-EDev; representing general intelligence), Early Academic Skills, and Expressive Language. Several familiar types of scores are produced, including raw scores, age equivalents, grade equivalents, percentile ranks, and standard scores.
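The familiar score types listed above are linear transformations of a single normal metric. As a hedged illustration (not the test’s online scoring system), a standard score on the usual M = 100, SD = 15 scale can be converted to a z score, a T score, and a percentile rank:

```python
from statistics import NormalDist

# Convert a standard score (M=100, SD=15) into other common derived scores.
# Percentile rank comes from the standard normal CDF; the z and T scores
# are simple rescalings of the same deviation from the mean.
def derived_scores(standard_score: float) -> dict[str, float]:
    z = (standard_score - 100) / 15
    return {
        "z": z,
        "T": 50 + 10 * z,                          # T scale: M=50, SD=10
        "percentile": 100 * NormalDist().cdf(z),   # percentile rank
    }

# A standard score of 115 is one SD above the mean.
print(derived_scores(115))
```

A score one standard deviation above the mean thus corresponds to z = 1.0, T = 60, and roughly the 84th percentile, which is why these score types are interchangeable summaries of the same performance.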
Other derived scores include z scores, T scores, stanines, and normal curve equivalents. Scoring is completed via an online scoring and reporting system.

DEVELOPMENT. In developing the ECAD, the test authors followed guidelines outlined in the Standards for Educational and Psychological Testing (American Educational Research Association
[AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014). The ECAD is theoretically and structurally similar to the WJ IV. As stated earlier, the theory base is the CHC theory of cognitive abilities, a modern, factor-analytic theory built on the work of Raymond Cattell, John Horn, and John Carroll. CHC is a “three-stratum” theory: the top level represents general intelligence (G); the second level breaks down into a number of broad abilities, including crystallized (Gc) and fluid (Gf) intelligence; and beneath each second-stratum ability, a set of narrow factors is identified. In developing new items, the authors of the ECAD consulted outside experts, including university faculty members, psychologists, and public school teachers. Careful review by these individuals helped ensure that the test covered the constructs involved and that the test items were of appropriate difficulty for young children. In addition, bias and sensitivity reviews were conducted by a committee composed of nine professionals with specializations related to diversity and disability. According to the test manual, “all ECAD test item pools were calibrated onto a common scale, called a W scale, for use in the construction of the final published tests and the development of the norms” (p. 56). A Rasch single-parameter logistic test model was used during the development of the test, including for calibration, item pool equating, and scaling. An item tryout study was conducted with several hundred students to provide information about the potential test items, as well as about other features of the new test, such as directions, format, and scoring. Next, a norming study was conducted (2009-2012) in conjunction with the WJ IV. The test manual details how the resultant data were calibrated and equated using the Rasch model. Differential item functioning (DIF) was also examined to check for potentially biased items.
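The Rasch single-parameter (1PL) model mentioned above reduces each response to two numbers, a person ability and an item difficulty, on a common logit scale (the basis of the W scale described in the manual). A minimal sketch with purely illustrative values:

```python
import math

# Rasch 1PL model: the probability of a correct response depends only on
# the gap between person ability (theta) and item difficulty (b), both on
# the same logit scale. Values below are illustrative, not ECAD parameters.
def rasch_p(theta: float, b: float) -> float:
    """P(correct | ability theta, difficulty b) under the 1PL model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose ability equals the item's difficulty succeeds half the
# time; a person one logit above an item one logit easy succeeds ~88%.
print(rasch_p(0.0, 0.0))   # 0.5
print(rasch_p(1.0, -1.0))
```

Because every item and person sits on this one scale, item pools from different forms can be equated simply by estimating their difficulties jointly, which is what makes co-norming and vertical scaling tractable.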
Final test forms were assembled and evaluated. Test makers consulted with professionals familiar with young children having various disabilities and from various linguistic backgrounds in order to make the test more accessible to children with special needs. TECHNICAL. The ECAD test battery was co-normed with the WJ IV battery, facilitating the simultaneous calibration and scaling of the tests that appear in both batteries. Tests were administered to 7,416 individuals ranging in age from 2 to 90+ years. Of these examinees, 2,378 were ages 2 to 10 and used as the norm sample for the ECAD. Sample sizes ranged from 173 (2-year-olds) to 336 (8-year-olds). The entire sample came from 46 states and the District of Columbia. Stratification variables included Census region, sex, country of birth, race, ethnicity, community type, parent education, and type of school. Efforts were made to match U.S. population proportions (2010 Census projections). Examinee weighting was used to improve the correspondence. Additional details on the norming process are presented in the test manual. Test reliability deals with the consistency and precision of test scores. The test manual provides reliability coefficients for each of the 10 individual tests and the three cluster scores across ages 2-10. For all tests except one (Rapid Picture Naming), internal consistency was calculated using the split-half procedure (odd/even halves, corrected by the Spearman-Brown formula). The three cluster score reliabilities were figured using Mosier’s formula. On the individual tests, across age groups, median internal reliability coefficients ranged from .74 (Picture Vocabulary) to .97 (Memory for Names). For cluster scores, reliabilities were quite good, with median values of .95, .96, and .89 for General Intellectual Ability-Early Development, Early Academic Skills, and Expressive Language, respectively. 
Based on test- and age-specific reliability coefficients, standard errors of measurement (SEMs) are provided for each age group. SEMs can be used to construct a confidence interval around a child’s test score (i.e., the observed score ± SEM). Based on the ECAD’s standard scores (M = 100, SD = 15, range = 40-160), and depending on a particular test’s reliability and the age group, SEMs ranged from 2.60 (age 2, Memory for Names; ages
5 and 6, Letter-Word Identification) to 8.22 (age 10, Visual Closure). On the General Intellectual Ability cluster score, SEMs ranged from 2.60 to 3.67 across age groups. On the Early Academic Skills cluster score, SEMs ranged from 2.60 to 3.35. And, finally, on the Expressive Language cluster score, SEMs ranged from 3.00 to 5.61. Validity “refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” and “the process of validation involves accumulating relevant evidence” in that regard (AERA, APA, & NCME, 2014, p. 11). The test manual states that validity evidence, in part, is based on prior validity studies related to the four versions of the Woodcock-Johnson test batteries and research dealing with CHC theory. The manual provides a summary of the 10 tests, describing test content, processes, and construct descriptions. In developing both the ECAD and the WJ IV, the authors used modern test development procedures. Multidimensional scaling, correlation analysis, and cluster analysis were used to evaluate empirically the match between desired content dimensions and the judgment of professionals. These analyses provided support for the content structure of the battery. Developmental patterns of the test and ability cluster scores (based on cross-sectional data) were also examined. The tests and cluster scores showed developmental changes across the age span, as well as divergent growth curves–the latter providing evidence for distinct abilities. Intercorrelation matrices (presenting correlations among the 10 test and three cluster scores) are provided for three age ranges. The intercorrelations suggest that various tests on the ECAD measure distinct abilities and that the academic skills cluster measures achievement. Because the ECAD was co-normed with the WJ IV, it was possible to examine the ECAD within the WJ IV’s structural validity analysis.
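The SEMs quoted above follow from the classical relation SEM = SD × √(1 − reliability). A small sketch (the observed score below is made up for illustration):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, sd: float, reliability: float,
                        z: float = 1.0) -> tuple[float, float]:
    """Band of +/- z SEMs around an observed score (z = 1 gives ~68%)."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)

# On the ECAD standard-score metric (M = 100, SD = 15), a cluster
# reliability of .95 implies SEM = 15 * sqrt(.05), about 3.35 --
# consistent with the values reported in the manual.
print(round(sem(15, 0.95), 2))                       # 3.35
print(confidence_interval(104, 15, 0.95, z=1.96))    # roughly (97.4, 110.6)
```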
“A systematic exploratory, model generation and a cross-validation structural validity strategy were applied” to the norming data, which the manual goes on to describe as “the most thorough scientific approach to the examination of the structural validity of any contemporary battery of cognitive, oral language, and achievement tests” (p. 92). Although space limitations do not allow a description of the details here, the authors of the manual were satisfied that their findings supported a broad CHC factor top-down model, and concluded that CHC g-factor loadings were consistent with existing research and provided evidence for structural validity. The test manual provides additional validity evidence in the form of several tables listing correlations between scores from the ECAD and a variety of relevant tests. For example, students’ ECAD scores were correlated with well-respected measures of children’s cognitive ability. These tests included two editions of the Wechsler Preschool and Primary Scale of Intelligence (i.e., the WPPSI-III and the WPPSI-IV). Here, for example, Full Scale IQ scores on the WPPSI-III and WPPSI-IV demonstrated correlation coefficients of .75 and .78, respectively, with the General Intellectual Ability cluster score on the ECAD. Similarly, the General Conceptual Ability (G) score on the Differential Ability Scales-Second Edition (DAS-II) yielded a correlation coefficient of .87 with the ECAD’s general ability cluster score. In another study, the ECAD’s Expressive Language scores were correlated with children’s performance on other measures of language, such as the Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-4) and the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4). As an example, the correlation coefficients were .82 and .79 between the ECAD’s Expressive Language cluster score and the core language score on the CELF-4 and the PPVT-4 score, respectively. 
Scores from a number of other tests were correlated with scores from the ECAD, and the resultant correlation coefficients are presented in the test manual. Performance on the ECAD for three different clinical samples is also provided. Samples included children with cognitive delay (N = 61), children with speech/language delay (N = 63), and children with autism (N = 41). Each sample is broken down in terms of demographic characteristics, such as age
range, gender, and race. Based on the extensive evidence reported, the authors of the manual conclude that “the validity evidence presented supports the use of the ECAD tests for measuring children’s cognitive abilities and early academic skills” (p. 118). COMMENTARY. The ECAD has a number of strengths—a key one being that the test authors paid careful attention to guidelines in the joint technical standards (AERA, APA, & NCME, 2014), and thus followed current testing-industry best practices in terms of test development. The authors should be commended in this regard. Further, the ECAD’s Comprehensive Manual is exceptionally well written and thorough. Based on the reliability information presented, the manual suggests that the ECAD tests are “sufficiently reliable for measuring children’s abilities” (p. 118). This is particularly the case for the ability and achievement cluster scores, where the reliability coefficients are in the .90s across age groups. Likewise, the extensive validity evidence is deemed favorable, and “supports the use of the ECAD tests for measuring children’s cognitive abilities and early academic skills” (manual, p. 118). This reviewer is inclined to agree with the test authors’ positive conclusions. As with the WJ IV, the ECAD has a well-researched theory base (CHC theory). It was co-normed with the WJ IV, which is advantageous, and a variety of useful scores are reported. Indeed, as Cizek (2003) wrote in his review of the WJ III, “Virtually any derived score that a user could want from a norm-referenced test is provided” (p. 1020). Further, as the test manual states, “One advantage derived from the Rasch scaling of test data is that a unique calculation of the SEM is provided for each possible test score” (p. 47). On a practical note, standard scores having a mean of 100 and a standard deviation of 15 make comparisons with other modern tests (e.g., the Wechsler or Binet intelligence tests) easier.
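The deviation-score metric mentioned above (M = 100, SD = 15) is simply a linear rescaling of a z score computed against age-group norms; a minimal sketch with hypothetical raw-score norms:

```python
def to_standard_score(raw: float, norm_mean: float, norm_sd: float,
                      mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert a raw score to a deviation-based standard score
    (M = 100, SD = 15), the metric the ECAD shares with the
    Wechsler and Binet scales."""
    z = (raw - norm_mean) / norm_sd
    return mean + sd * z

# With hypothetical age-group norms (mean 28, SD 6), a raw score one
# norm-group SD above the mean maps to 115.
print(to_standard_score(34, norm_mean=28, norm_sd=6))  # 115.0
```

Because any test reported on this metric places examinees on the same deviation scale, scores from different batteries can be compared directly, which is the practical advantage the reviewer notes.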
Another practical observation is that the single easel book should simplify the administration of the test. The booklet seems well made, and pictorial stimuli are colorful and seem appropriate for young children. Further, as the test manual suggests, examiners who are familiar with the Woodcock Johnson test should find it easy to learn to administer the new ECAD. As a measure of both cognitive ability and achievement, the Woodcock Johnson battery has long been a staple of special education assessment. In earlier reviews, Cizek (2003) concluded that the WJ III was “clearly a superior instrument” (p. 1024), and Sandoval (2003) described the WJ III as “the premier battery for measuring both the cognitive abilities and school achievement of school-aged children and young adults” (p. 1027). Given its careful development, the ECAD battery should fare just as well and earn similar accolades. SUMMARY. The new ECAD represents an agewise downward extension of the Woodcock Johnson IV (Schrank, McGrew, & Mather, 2014). Designed specifically for young children (i.e., 2.5 years to 7 years, and 8- and 9-year-olds with cognitive delays), this individually administered battery yields 10 test scores, as well as three important composite (cluster) scores: General Intellectual Ability-Early Development (representing general intelligence), Early Academic Skills, and Expressive Language. In particular, the first two cluster scores allow for the direct comparison of a child’s ability with his or her achievement, which is helpful in the diagnosis of learning disabilities. Based on CHC theory, and co-normed with the WJ IV, the carefully developed ECAD is a welcome addition to the Woodcock Johnson family of tests. It should be very helpful to those tasked with assessing developmental delays in young children and in identifying their respective strengths and weaknesses. 
REVIEWER’S REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Cizek, G. J. (2003). [Review of the Woodcock-Johnson III]. In B. S. Plake, J. C. Impara, & R. A. Spies (Eds.), The fifteenth mental measurements yearbook (pp. 1020-1024). Lincoln, NE: Buros Institute of Mental Measurements.
McGrew, K. S. (2005). The Cattell-Horn-Carroll (CHC) theory of cognitive abilities: Past, present, and future. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 136-202). New York, NY: Guilford.
Sandoval, J. (2003). [Review of the Woodcock-Johnson III]. In B. S. Plake, J. C. Impara, & R. A. Spies (Eds.), The fifteenth mental measurements yearbook (pp. 1024-1028). Lincoln, NE: Buros Institute of Mental Measurements.
Schrank, F. A., McGrew, K. S., & Mather, N. (2014). Woodcock-Johnson® IV. Rolling Meadows, IL: Riverside.
Review of the Woodcock-Johnson® IV Tests of Early Cognitive and Academic Development by CLAUDIA R. WRIGHT, Professor Emeritus, California State University, Long Beach, CA: DESCRIPTION. The Woodcock-Johnson® IV Tests of Early Cognitive and Academic Development (ECAD), the newest addition to the extensive Woodcock-Johnson collection of instruments, represents a theoretically and technically sound battery of developmentally appropriate tests designed to assess cognitive abilities and academic skills for children ages 2 years 6 months to 7 years 11 months and for children ages 8-9 years who have been identified with cognitive developmental delays. The ECAD yields 10 test and three cluster scores. The General Intellectual Ability-Early Development (GIA-EDev) cluster comprises seven cognitive test scores: Test 1, Memory for Names (72 items); Test 2, Sound Blending (22 items); Test 3, Picture Vocabulary (43 items); Test 4, Verbal Analogies (28 items); Test 5, Visual Closure (30 items); Test 6, Sentence Repetition (34 items); and Test 7, Rapid Picture Naming (120 items). The Early Academic Skills (EAS) cluster is made up of three achievement-related test scores: Test 8, Letter-Word Identification (54 items); Test 9, Number Sense (25 items); and Test 10, Writing (42 Items). The Expressive Language (EL) cluster includes Picture Vocabulary and Sentence Repetition; each requires a verbal response. Examiners well versed in standardized testing protocols (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014) are advised to exercise strict adherence to the administration and scoring procedures set forth in the ECAD Comprehensive Manual. 
Detailed guidance is provided for testing preparation including seating arrangements, materials, establishing rapport, managing time, employing the standardized audio recording for the Sound Blending and Sentence Repetition tests, establishing basal and ceiling levels, and scoring correct responses that may be mispronounced due to variations in speech patterns or synonym use. The ECAD Test Book is organized in a sturdy easel format so that each stimulus page (pictures, letters, or numbers) faces the examinee, and a corresponding detailed instruction page faces the examiner. Special attention to detail enhances the accessibility of ECAD test items and instructions to very young children, children with disabilities (such as vision, hearing, or motor), and children from diverse linguistic backgrounds with particular emphasis on the use of simple language to reduce “language load” and brightly colored, boldly outlined stimulus objects to boost the visual experience. On all tests, each item is scored either “1” (correct) or “0” (incorrect), yielding a total raw score for each test. Detailed descriptions are provided for deriving normative scores, including age-equivalents, percentile ranks, and difference scores. Test administrators are reminded that interpretation of ECAD scores is the purview of professionals with specialized training in early childhood assessment practices and diagnostic decision making. DEVELOPMENT. The ECAD is conceptually based upon the Cattell-Horn-Carroll (CHC) theory of
cognitive abilities (Schneider & McGrew, 2012) and developed and normed concurrently with the Woodcock-Johnson® IV (WJ IV; Schrank, McGrew, & Mather, 2014c). ECAD Tests 1 through 7 assess cognitive abilities consistent with CHC Long-term Retrieval (Glr), Auditory Processing (Ga), Comprehension-Knowledge (Gc), Fluid Reasoning (Gf), Visual Processing (Gv), Short-term Working Memory (Gwm), and Cognitive Processing Speed (Gs). Tests 8 through 10 provide a measure of academic achievement. The first seven tests form the basis for the GIA-EDev cluster, an overall estimate of global intelligence (g). Tests 3 (Picture Vocabulary, Gc-VL/LD) and 6 (Sentence Repetition, Gwm-MS) form the EL cluster; and Tests 8 (Letter-Word Identification, Grw-RD), 9 (Number Sense, Gq-A3/Gf-RQ), and 10 (Writing, Grw-SG), the EAS cluster. Tests 1, 4, 5, and 9 are unique to the ECAD; Tests 2, 3, 6, and 7 are alternate forms of four tests from the Woodcock-Johnson IV Tests of Oral Language (WJ IV OL; Schrank, Mather, & McGrew, 2014b); and Tests 8 and 10 are alternate forms of two tests from the Woodcock-Johnson IV Tests of Achievement (WJ IV ACH; Schrank, Mather, & McGrew, 2014a). Items for the ECAD were selected from existing WJ IV item pools and included the development of new items appropriate for 2-year-old examinees. Also, for children with cognitive delays, additional consideration for test design was based on Parts B and C of the Individuals with Disabilities Education Act of 2004 (IDEA, 2004). ECAD items were submitted for review by outside content-area experts to assess construct relevance, appropriate levels of difficulty for the target age range, gender bias, and sensitivity issues that might affect children, including those with disabilities or from diverse linguistic backgrounds. Items identified as potentially biased were removed or rewritten prior to test trials.
ECAD test pools were calibrated using the Rasch (IRT) measurement model to create a common scale (W scale) from which to describe an examinee’s ability, an item’s difficulty, and for the calculation of GIA-EDev and EL cluster scores based on the arithmetic average of the corresponding W scores for each test included in the cluster. To optimize individual test weights, GIA-EDev scores were differentially weighted using principal components analysis, generating differential g weights across the age range for ECAD test scores contributing to the cluster. TECHNICAL. Norms. Normative data for the WJ IV and the ECAD were collected over a 25-month period (December 2009 through January 2012) by thoroughly vetted and trained professional examiners recruited by the test publisher. Participants had been randomly selected using a stratified sampling design. Demographic characteristics observed for these samples were consistent with the 2010 U.S. Census. Of the total sample of 7,416, ECAD norms were based on 2,378 children between the ages of 2 years 6 months and 10 years 11 months: ages 2-6 to 2-11 (n = 173); 3-0 to 3-11 (n = 203); 4-0 to 4-11 (n = 223); 5-0 to 5-11 (n = 205); 6-0 to 6-11 (n = 308); 7-0 to 7-11 (n = 310); 8-0 to 8-11 (n = 336); 9-0 to 9-11 (n = 306); 10-0 to 10-11 (n = 314). Participants were predominantly White (64%), lived in metropolitan areas (83%), and had parent education levels beyond high school (59%). To control for possible biases in the normative sample data, each examinee was assigned a series of partial weights based on his/her contribution to the U.S. database. Both conventional and innovative procedures were employed to construct these weighted ECAD norms. 
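Examinee weighting of this kind is typically accomplished by post-stratification. The sketch below shows one simple scheme (an illustration with made-up stratum proportions, not the publisher's exact procedure): each examinee is weighted by the ratio of the population proportion to the sample proportion of their stratum, so weighted totals match Census targets.

```python
from collections import Counter

def poststratification_weights(sample_strata: list[str],
                               population_props: dict[str, float]) -> list[float]:
    """Weight each examinee by (population proportion / sample proportion)
    of their stratum, so weighted counts match external targets."""
    n = len(sample_strata)
    sample_props = {k: v / n for k, v in Counter(sample_strata).items()}
    return [population_props[s] / sample_props[s] for s in sample_strata]

# A sample split 50/50 urban/rural, weighted toward the reported
# 83% metropolitan population proportion (hypothetical strata).
w = poststratification_weights(["urban", "urban", "rural", "rural"],
                               {"urban": 0.83, "rural": 0.17})
print([round(x, 2) for x in w])  # [1.66, 1.66, 0.34, 0.34]
```

Real norming programs weight jointly across many stratifiers (region, sex, race, parent education, and so on), usually by iterative raking rather than a single ratio.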
Bootstrap resampling procedures allowed for more precise estimates of an examinee’s ability, calculation of age-equivalent scores, percentile ranks, and standard score norms for each test and cluster, and calculation of difference score norms that supported data-based predictions and comparisons among selected tests or cluster scores. Reliability. Estimates of internal-consistency reliability for examinees’ scores at each age level were calculated using the split-half procedure for odd and even items, applying the Spearman-Brown correction formula. Exceptions included reliability estimates for scores on the Rapid Picture Naming test, which were calculated using the Rasch model and estimates for cluster scores, which were calculated
using the unweighted composite formula by Mosier (1943). Moderate to strong median reliability estimates across the nine age levels were obtained for each test and cluster: Memory for Names (.97); Sound Blending (.84); Picture Vocabulary (.74); Verbal Analogies (.80); Visual Closure (.77); Sentence Repetition (.86); Rapid Picture Naming (.86); Letter-Word Identification (.96); Number Sense (.79); and Writing (.91); GIA-EDev (.95); EAS (.96); and EL (.89). Validity. Conventional and innovative methods were applied for examining content-, concurrent-, and structural-related evidence of validity. For content evidence of validity, the Guttman Radex 2-dimensional multidimensional scaling (MDS) procedure was employed to examine the relationships between the 51 WJ IV and the 10 ECAD tests for two age groups (3-5, 6-8) and to identify tests sharing the same cognitive operations or common content features, displayed as a visual-spatial map. The MDS analysis yielded support for four types of shared content characteristics: (a) auditory-verbal (primarily Ga, Gc, and select Gwm tests); (b) figural-visual (Gv and select Gf and Glr tests); (c) quantitative-numeric (primarily Gq/Gf-RQ and select Gwm tests); and (d) reading-writing. The authors noted the first three dimensions are consistent with the verbal, figural, and numeric components of the Berlin Model of Intelligence Structure (BIS; Süß & Beauducel, 2005) but that the BIS model does not include auditory or reading-writing components. To examine criterion-related evidence of validity, several studies were conducted in which ECAD scores were correlated with well-known measures including three measures of cognitive abilities, four language tests, and two developmental skills assessments.
The three measures of cognitive abilities, each individually administered, were the Wechsler Preschool and Primary Scale of Intelligence–Third Edition (WPPSI-III; 16:267); Wechsler Preschool and Primary Scale of Intelligence–Fourth Edition (WPPSI-IV, Wechsler, 2012; 19:176); and Differential Ability Scales–Second Edition (DAS-II; 18:45). The WPPSI-III and DAS-II administrations occurred during the WJ IV/ECAD norming study; the WPPSI-IV administration occurred in 2014. For a sample of 99 examinees ages 4 through 7 (M = 5.9 years, SD = .8 years), correlations were examined between five WPPSI-III composite scores and eight ECAD test scores and two ECAD cluster scores. Mean performance on the ECAD tests and clusters ranged from 100.1 to 109.1, suggesting the sample was of somewhat above average intelligence. Correlations between GIA-EDev cluster scores and WPPSI-III Full Scale IQ, Verbal IQ, and General Learning Quotient supported GIA-EDev as (a) a measure of general intelligence (r = .75), (b) a measure of verbal abilities (r = .82), and (c) a measure of learning abilities (r = .80), respectively. Notable were correlations between Picture Vocabulary, Sentence Repetition, and the EL cluster with both the WPPSI-III Verbal IQ (r = .77, .72, and .81, respectively) and the WPPSI-III General Learning Quotient (r = .79, .71, and .81, respectively), as measures of crystallized intelligence-based language abilities. Further, correlations between Number Sense scores and the WPPSI-III Full Scale, Verbal, and Perceptual IQs (r = .68, .65, and .60, respectively) suggested a moderate relationship between early quantitative abilities and general intelligence. In a second study, ECAD and WPPSI-IV scores were analyzed for a sample of 100 examinees, ages 3 through 7 years (M = 5.2 years, SD = 1.2).
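Some of the validity coefficients in these studies are corrected for the variability of the validation sample relative to the norm population. A standard approach is Thorndike's Case 2 range-restriction correction (shown here as the common textbook formula; the manual does not spell out its exact method in this review, and the sample SD below is made up):

```python
import math

def correct_for_range_restriction(r: float, sample_sd: float,
                                  pop_sd: float = 15.0) -> float:
    """Thorndike Case 2 correction: estimate the correlation in the
    full-variability population from one observed in a restricted
    (or expanded) sample."""
    u = pop_sd / sample_sd
    return r * u / math.sqrt(1.0 + r * r * (u * u - 1.0))

# An observed r of .70 in a sample with SD 12, corrected to the
# norm-population SD of 15, rises to about .77.
print(round(correct_for_range_restriction(0.70, 12.0), 2))  # 0.77
```

The correction raises coefficients when the sample is less variable than the population and lowers them in the opposite case, which is why corrected and uncorrected values should be reported side by side.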
Means on the ECAD tests and clusters ranged from 102.3 to 111.6, and means on the WPPSI-IV composite scores ranged from 103.5 to 105.2, indicating the sample was above average in general intelligence. Consistent with previous findings, strong correlations (corrected for variability in the ECAD sample) supported evidence for GIA-EDev scores as a measure of general intelligence (r = .78 with WPPSI-IV Full Scale IQ); verbal abilities (r = .77 with WPPSI-IV Verbal Comprehension Composite); and, visual-spatial abilities (r = .77 with WPPSI-IV Visual Spatial Composite). For a sample of 50 preschool children, ages 3 through 6 years (M = 5.2 years, SD = .9 years), correlations were examined for ECAD and DAS-II scores. A correlation of .87 between GIA-EDev (M =
104.0, SD = 12.8) and DAS-II General Conceptual Ability (M = 113.8, SD = 13.8) scores supported the GIA-EDev as an indicator of general intelligence. In addition, EL (M = 104.6, SD = 13.8) and Picture Vocabulary (M = 102.0, SD = 14.9) scores yielded moderate coefficients with DAS-II General Conceptual Ability (r = .71 and .65, respectively); DAS-II School Readiness (M = 110.9, SD = 14.6) (r = .71 and .62, respectively); and DAS-II Verbal Ability (M = 107.5, SD = 13.9) (r = .73 and .78, respectively). It was concluded that these language ability measures provide indicators of both general intelligence and language abilities in this sample. Further, high correlations were observed between scores on DAS-II School Readiness with EAS (M = 101.8, SD = 11.0) (r = .91); Letter-Word Identification (M = 99.3, SD = 12.1) (r = .89); and Writing (M = 101.0, SD = 9.0) (r = .85). To examine performance on ECAD test and cluster scores for children with documented and/or primary diagnoses for cognitive developmental delays, three groups of examinees were identified: cognitive delay (ages 3 through 6, n = 61), speech and/or language delay (ages 3 through 6, n = 63), and autism (ages 3 through 7, n = 41). Ten ECAD tests and two Woodcock Johnson IV Oral Language tests (Oral Comprehension and Understanding Directions) were administered. As expected, the patterns of ECAD cluster scores were lowest for the cognitive delay group: GIA-EDev (M = 73.8, SD = 19.3), EL (M = 79.4, SD = 21.4), and EAS (M = 77.2, SD = 15.6). Higher performances were observed for the speech/language delay group for whom the means and standard deviations were 89.3 (13.9), 91.8 (12.8), and 88.7 (12.9), respectively, and the autism group, for whom the corresponding means and standard deviations were 88.0 (17.6), 90.7 (17.3), and 88.6 (14.8). No other statistics were reported for these samples. 
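On the ECAD's standard-score metric, the clinical-group means above translate directly into distances from the normative mean in SD units, which is how such profiles are usually interpreted:

```python
def deviation_in_sds(group_mean: float, norm_mean: float = 100.0,
                     norm_sd: float = 15.0) -> float:
    """How far a group's mean standard score falls below (negative)
    or above the normative mean, in SD units."""
    return (group_mean - norm_mean) / norm_sd

# Cluster means reported for the cognitive-delay sample (n = 61):
for cluster, mean in [("GIA-EDev", 73.8), ("EL", 79.4), ("EAS", 77.2)]:
    print(cluster, round(deviation_in_sds(mean), 2))
# GIA-EDev -1.75, EL -1.37, EAS -1.52
```

A GIA-EDev mean about 1.75 SDs below 100 is consistent with the expectation that the cognitive-delay group would score lowest of the three clinical samples.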
Additional analyses were conducted on two subsamples from the normative population to examine the relationship between ECAD scores and scores from four language tests. The Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-4; 18:30), the Comprehensive Assessment of Spoken Language (CASL; 15:58); and the Oral and Written Language Scales: Listening Comprehension/Oral Expression (OWLS; 14:266) are all individually administered multidimensional batteries of oral language ability. The Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4; 18:88) is an individually administered measure of expressive vocabulary and word retrieval. For a sample of 50 examinees, ages 5 through 8 years (M = 6.8, SD = 1.3), scores on four ECAD tests and the EL cluster were correlated with the six CELF-4 composite scores. Moderate to high coefficients ranging from .68 to .82 (median .78) were obtained between EL (M = 106.0, SD = 11.4) and CELF-4 scores. The correlation coefficient between EL and PPVT-4 (M = 105.8, SD = 10.5) was .79. Among other ECAD tests, Picture Vocabulary (M = 102.1, SD = 12.9) achieved the highest correlation with CELF-4 Language Content (M = 106.3, SD = 12.2, r = .78) and with PPVT-4 (r = .75). The results support EL and Picture Vocabulary scores as measuring early oral language abilities. Separately, for a sample of 50 examinees, 3 through 6 years (M = 5.1, SD = 1.1), their scores were analyzed on three ECAD tests (Picture Vocabulary, Sentence Repetition, and Rapid Picture Naming) and EL along with their scores on six CASL and three OWLS measures. Across all coefficients, only low to moderate relationships were observed. Among the ECAD tests, Picture Vocabulary was correlated .58 with CASL Core Composite and .57 with OWLS Oral Composite, suggesting moderate support for the Picture Vocabulary test as a general measure of oral language ability. 
Moderate coefficients were observed for Sentence Repetition with CASL Core Composite, Sentence Completion, and Syntax Construction (.53, .61, and .54, respectively); and with OWLS Oral Composite and Oral Expression (.52 and .54, respectively). Rapid Picture Naming scores yielded the lowest correlations with CASL scores (ranging from .17 to .26, median .225) and with OWLS scores (ranging from -.21 to .10, median = -.06), suggesting this ECAD test is measuring abilities not assessed by the CASL or OWLS scores. The range of correlations between ECAD EL and CASL scores was from .34 to .55 (median = .465); and with OWLS scores, .34 to .53 (median = .50).
Other correlational analyses investigated the relationship between ECAD scores and two early childhood development measures: the Battelle Developmental Inventory, 2nd Edition (BDI-2; 17:15), a norm-referenced developmental skills assessment for children (birth to 7 years); and the Riverside Early Assessments of Learning IDEA Observational Version (REAL IDEA-OV; Bracken, 2013), an observational assessment of developmental functioning for children (birth to 7 years). For a sample of 98 examinees, ages 3 through 7 years (M = 5.1, SD = 1.2), moderate correlations were obtained between the BDI-2 Communication and Cognitive domain scores and each of the seven ECAD cognitive test scores (ranging from .40 to .70, median = .58; and .39 to .71, median = .64, respectively); and with GIA-EDev (.75 and .79, respectively) and EL (.74 and .72, respectively). The correlations observed between the GIA-EDev and the BDI-2 Communication and Cognitive scores were offered as validity evidence for GIA-EDev as a cognitive ability measure in development assessments with young children. In another study, a sample of 84 examinees, 2 through 6 years (M = 4.6, SD = .9), was observed by examiners using the REAL IDEA-OV. Moderate to strong correlations were obtained for each of the 10 ECAD tests and three cluster scores with each of the six REAL IDEA-OV domains. Specifically, ECAD Picture Vocabulary, Visual Closure, Sentence Repetition, Letter-Word Identification, Number Sense, and GIA-EDev scores yielded coefficients of .70 or higher with REAL IDEA-OV domains: Cognitive (ranging from .71 to .82, median = .79), Communication (ranging from .75 to .82, median = .79), and Academic (ranging from .74 to .89, median = .81). EL yielded moderate to high coefficients across the REAL IDEA-OV domains (.63 to .80, median = .74); scores on Memory for Names and EAS produced relatively low correlations (ranging from .30 to .48, median = .35 and .38 to .43, median = .40, respectively).
Overall, these findings provide validity support for selected ECAD scores serving as measures of early cognitive abilities, expressive language skills, and pre-academic skills. Based on WJ IV and ECAD normative data, structural-related evidence of validity was examined by applying a split-sample design (two sets of approximately equal numbers of examinees for each age group) for developing an exploratory model using cluster analysis, exploratory principal components analysis, and multidimensional scaling to identify the best fitting model. Two plausible models were noted, the broad CHC factor top-down model and the broad plus narrow CHC factor bottom-up model; the former was preferred because it was the simpler of the two models. A confirmatory structural model for cross validation was applied to WJ IV and ECAD normative data. Although the analyses were carried out for the entire age range for the WJ IV (3 through 90+ years), of particular interest for this review are the outcomes of the model development and cross-validation analyses based on two subsamples of examinees: ages 3 through 5 (n = 208); and ages 6 through 8 (n = 412). For the two age groups, median latent factor loadings ranging from a low of .73 for Cognitive Processing Speed to a high of .98 for Long-term Retrieval on the general intelligence factor (g) indicated the corresponding clusters are representative of abilities strongly influenced by general intelligence. Additional structural evidence of validity was supported by factor loadings observed on each of the 10 ECAD tests. COMMENTARY. The Woodcock-Johnson IV ECAD battery of tests was subjected to extensive and rigorous statistical analyses applied to a large normative database. Validation of ECAD scores was optimized by including analyses with 11 well-established assessment batteries.
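The g loadings discussed above come from latent-variable modeling of the full intercorrelation matrix. As a bare-bones analogue, first-principal-component loadings can be extracted from a correlation matrix by power iteration (the correlation values below are made up for illustration):

```python
import math

def first_pc_loadings(corr: list[list[float]], iters: int = 500) -> list[float]:
    """First-principal-component loadings of a correlation matrix via
    power iteration -- a toy analogue of extracting a general factor
    from subtest intercorrelations."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue; loadings scale
    # the unit eigenvector by its square root.
    eig = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return [math.sqrt(eig) * x for x in v]

# Two tests correlating .60 each load about .89 on the shared component.
print([round(x, 2) for x in first_pc_loadings([[1.0, 0.6], [0.6, 1.0]])])
```

The confirmatory models reported in the manual are far richer (broad factors beneath g, fit statistics, cross-validation), but the intuition is the same: higher intercorrelations produce higher loadings on the general dimension.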
The test authors caution that whereas developmental patterns reported with ECAD data supported the validity of scores, these outcomes were based on cross-sectional rather than longitudinal data. Additional research is recommended to examine longitudinal growth data. Also, as relatively small sample sizes were employed for one validation study with children with developmental delays (ages 3 through 7), it is suggested that the test authors undertake more extensive validation efforts to support the use of the ECAD with this important subpopulation of children and include those ages 8 to 9 years. Overall, the ECAD is exceptionally well constructed and adds a welcome dimension to the Woodcock-Johnson family of tests. SUMMARY. The Woodcock-Johnson IV Tests of Early Cognitive and Academic Development are conceptually grounded upon the Cattell-Horn-Carroll (CHC) theory of cognitive abilities. Strong reliability and validity evidence for ECAD test and cluster scores support the appropriateness of selecting the ECAD for assessing cognitive abilities and academic skills for children ages 2 years 6 months to 7 years 11 months and for 8- and 9-year-olds who have been identified with cognitive developmental delays.

REVIEWER'S REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bracken, B. A. (2013). Riverside Early Assessments of Learning IDEA Observational Version. Rolling Meadows, IL: Riverside.
Individuals with Disabilities Education Act of 2004. (2004). 20 U.S.C. § 1400.
Mosier, C. I. (1943). On the reliability of a weighted composite. Psychometrika, 8, 161-168.
Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. Flanagan & P. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (3rd ed., pp. 99-144). New York: Guilford.
Schrank, F. A., Mather, N., & McGrew, K. S. (2014a). Woodcock-Johnson IV Tests of Achievement. Rolling Meadows, IL: Riverside.
Schrank, F. A., Mather, N., & McGrew, K. S. (2014b). Woodcock-Johnson IV Tests of Oral Language. Rolling Meadows, IL: Riverside.
Schrank, F. A., McGrew, K. S., & Mather, N. (2014c). Woodcock-Johnson IV. Rolling Meadows, IL: Riverside.
Süß, H.-M., & Beauducel, A. (2005). Faceted models of intelligence. In O. Wilhelm & R. W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 313-332). London, UK: Sage.
Wechsler, D. (2012). Wechsler Preschool and Primary Scale of Intelligence–Fourth Edition. San Antonio, TX: Pearson.
*** Copyright © 2014. The Board of Regents of the University of Nebraska and the Buros Center for Testing. All rights reserved. Any unauthorized use is strictly prohibited. Buros Center for Testing, Buros Institute, Mental Measurements Yearbook, and Tests in Print are all trademarks of the Board of Regents of the University of Nebraska and may not be used without express written consent.
Reynolds, C. R. (2016). Reynolds Adaptable Intelligence Test–Nonverbal. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=shib&db=mmt&AN=test.10721&site=ehost-live&scope=site&custid=uphoenix
Reynolds Adaptable Intelligence Test–Nonverbal

Review of the Reynolds Adaptable Intelligence Test-Nonverbal by THOMAS J. GROSS, Assistant Professor, Psychology Department, College of Education and Behavioral Sciences, Western Kentucky University, Bowling Green, KY: DESCRIPTION. The Reynolds Adaptable Intelligence Test-Nonverbal (RAIT-NV) was developed from the Reynolds Adaptable Intelligence Test (RAIT) subtests for the Fluid Intelligence Index (FII), which is named the Nonverbal Intelligence Index (NVII) on the RAIT-NV. The RAIT-NV was normed for individuals from 10 years 0 months to 75 years of age. The RAIT-NV includes a professional manual and fast guide that give a test overview and administration and scoring information. Item books, answer sheets, a scoring key overlay, and score summary forms are also included. Administration instructions require examinees to read the instructions silently and inform the examiner when they are finished; however, reading out loud and pointing by the examiner are allowed for examinees who are unable to read. There are general instructions, as well as individual subtest instructions. The test author recommends that examiners have formal training in assessment and knowledge of cognitive evaluation. Two subtests compose the NVII. The Nonverbal Analogies (NVA) subtest has 52 items and a 7-minute time limit. The task requires examinees to choose the picture that best completes a pictorial analogy. The Sequences (SEQ) subtest has 43 items and a 10-minute time limit. This subtest has examinees select a picture that best completes a series of pictures. Both subtests have a multiple-choice format, and the overlay is used to hand score the answer sheet. Raw scores are converted to T scores for each subtest, based on age, and the sum of the T scores is used to find the NVII standard score. An appendix in the professional manual provides 90% and 95% confidence intervals, percentile ranks, stanine scores, z scores, and normal curve equivalents.
Age equivalents up to 19 years are provided, but the test author advises against their use. For each age group in the normative sample, statistical significance levels for NVA and SEQ subtest score discrepancies are included in the test manual, along with the percentage of examinees obtaining each difference score. DEVELOPMENT. The primary goal for developing the RAIT-NV was to provide a means to administer a traditional paper-and-pencil assessment of nonverbal intelligence to individuals or groups. The purpose was to have a reliable and valid intellectual assessment for examinees who might face environmental or personal setbacks that would impair their verbal skills. A secondary goal was to eliminate confounds related to cultural/linguistic biases through item reviews and to provide multiple and conceptually equitable forms of instructions during administration. Further, the test author provides a means for comparing a change in NVII standard scores from one administration to another. Much of the RAIT-NV development is discussed as a portion of the RAIT development. The test author reported piloting items and versions of the RAIT using classical test theory, Rasch analyses, and expert reviews to assess items. This approach is consistent with that described in previous reviews of the RAIT (see Floyd & Singh, 2017; Suppa, 2017). TECHNICAL. The RAIT-NV used the RAIT standardization sample of 2,124 individuals divided into 23 age groups from 10 to 75 years. The range of ages in each group varied from 6 months to 10 years; age group sizes ranged from 71 to 120 individuals. Within the sample, 484 participants completed the booklet version of the RAIT, and 1,640 completed the computer version. No differences between groups were found; still, the RAIT-NV can be administered only in booklet form. The standardization sample was selected to match the 2010 U.S. Census population statistics through stratified random sampling. Raw scores were weighted within age groups based on gender, ethnicity, and educational attainment (or parent's educational attainment for those 10 to 20 years of age). Continuous norming procedures were used to adjust raw score distributions and calculate subtest T scores for the NVA and SEQ at each age range. The NVII was created by using the cumulative frequency distribution of the summed subtest T scores to create a standard score scale with a mean of 100 and standard deviation of 15.
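The score scales just described (age-based T scores with mean 50 and SD 10, and an index with mean 100 and SD 15) can be illustrated with plain linear transformations. Note this is only a sketch: the actual RAIT-NV derives its norms from cumulative-frequency (continuous norming) tables rather than these formulas, and every number below is hypothetical.

```python
def t_score(raw, group_mean, group_sd):
    """Linear T score: mean 50, SD 10 within an age group (illustrative only)."""
    return 50 + 10 * (raw - group_mean) / group_sd

def index_score(t_sum, sum_mean, sum_sd):
    """Linear index score: mean 100, SD 15 on the summed subtest T scores."""
    return 100 + 15 * (t_sum - sum_mean) / sum_sd

# Hypothetical raw scores and age-group statistics (not taken from the manual).
nva_t = t_score(30, group_mean=26, group_sd=8)   # 55.0
seq_t = t_score(22, group_mean=24, group_sd=5)   # 46.0
print(index_score(nva_t + seq_t, sum_mean=100, sum_sd=14))
```

The linear version shows the logic; continuous norming replaces the last step with empirically smoothed lookup tables so the index tracks the actual shape of the norming distribution.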
The confidence intervals were calculated using an estimated true score and the standard error of estimate. The simple difference method was used to calculate the statistical significance of difference scores. It was observed that approximately 30 to 50% of the discrepancy scores were one or more standard deviations from the mean, depending on age group (manual, Appendix I). Regarding reliability, the median alpha coefficients were .89 for NVA (range = .84 to .93), .86 for SEQ (range = .81 to .92), and .93 for NVII (range = .89 to .96). Test-retest reliability was assessed with 132 participants (10 to 20 years of age, n = 45; 21 to 40 years of age, n = 40; 41 to 75 years of age, n = 47) over a period of 18 to 34 days. Uncorrected and disattenuated (corrected for alpha) coefficients, respectively, for the total sample were .77 and .84 for NVA, .74 and .83 for SEQ, and .81 and .86 for NVII. The range for the uncorrected coefficients was .73 to .80 for NVA, .68 to .83 for SEQ, and .75 to .87 for NVII. The range of disattenuated coefficients was .81 to .89 for NVA, .77 to .97 for SEQ, and .80 to .92 for NVII. Correlations between the NVA and SEQ subtests were not readily apparent. The test author reported that the RAIT-NV is designed to measure "nonverbal or fluid intelligence" (manual, p. 40). A principal component analysis using varimax rotation for the RAIT norming sample indicated that the NVA and SEQ subtests loaded onto a single factor with factor loadings of .91 and .81, respectively. External validity was assessed by correlating the RAIT-NV NVII and subtest scores with scores on other standardized tests and with job industry and job training level, as well as by comparing performance across clinical groups. Correlations between RAIT-NV subtests and other standardized measures' subtests are provided. In relation to other assessments of intelligence, the NVII was significantly correlated with the Test of General Reasoning Ability (TOGRA) General Reasoning Index (GRI) at .90.
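The "disattenuated (corrected for alpha)" coefficients reported above follow the general logic of Spearman's correction for attenuation, which estimates what a correlation would be if both measurements were perfectly reliable. The sketch below uses the classic formula with illustrative inputs; the manual's exact correction procedure may differ.

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimated true-score
    correlation, given each measure's reliability (e.g., coefficient alpha)."""
    return r_xy / sqrt(rel_x * rel_y)

# Illustrative numbers only; not a reproduction of the manual's computation.
print(round(disattenuate(0.77, 0.89, 0.89), 2))
```

Because the correction divides by a quantity less than 1, the disattenuated coefficient is always at least as large as the observed one, which matches the pattern in the table of test-retest values above.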
The TOGRA is a derivative test of the RAIT, as well. Correlations between the NVII and the Wechsler Intelligence Scale for Children —Fourth Edition (WISC-IV; n = 29) were significant for the Full Scale IQ (FSIQ) score (.51), Processing Speed Index (.52), and the Perceptual Reasoning Index (PRI; .43). Correlations between the NVII and the Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV; n = 28) were significant only for the PRI (.44). Correlations (n = 51) between the NVII and Reynolds Intellectual Assessment Scales (RIAS)
Verbal Intelligence Index (VIX; .65), Nonverbal Intelligence Index (NIX; .36), Composite Intelligence Index (CIX; .60), and an optional Composite Memory Index (CMX; .56) were statistically significant. The Wonderlic Personnel Test total score significantly correlated with the NVII (.71), as did the Beta III IQ score (.72). The relationships of the NVII scores (n = 66) with tests of academic achievement were consistent across the Wide Range Achievement Test 4 (WRAT4) Reading Composite (.43) and the Test of Irregular Word Reading Efficiency (TIWRE) Reading Efficiency Index (.40). The author reported median and mean NVII scores by job industry and found that the NVII correlated significantly (.31; n = 372) with the complexity of professions as outlined by the O*NET Job Zones (National Center for O*NET Development, n.d.). Seven clinical groups were identified: intellectual disability (n = 52), traumatic brain injury (n = 39), stroke (n = 32), dementia (n = 24), hearing impaired (n = 28), learning disability (10 to 17 years of age, n = 22; 18 to 50 years of age, n = 29), and attention-deficit/hyperactivity disorder (10 to 17 years of age, n = 28; 18 to 50 years of age, n = 34). Comparisons of means between the clinical groups and matched samples from the normative set were conducted with independent t-tests for the NVA, SEQ, and NVII scores. Overall, the matched groups had an NVII standard score of approximately 100 (range = 98 to 103), an NVA T score of approximately 51 (range = 50 to 54), and an SEQ T score of approximately 50 (range = 48 to 52); however, matched-sample standard deviations were not provided. In sum, the clinical groups had lower mean scores than the respective matched groups; however, no effect sizes were provided. COMMENTARY. The RAIT-NV is a brief and straightforward assessment that may be administered and scored with little effort on the part of the professional examiner.
Test administrators should be aware that the RAIT-NV requires individuals to complete their own answer sheets, which could be difficult for younger individuals or those with intellectual or motor impairments. Also, many of the items on the NVA and some of the items on the SEQ subtests require culturally specific knowledge (e.g., information about specific sports), which might limit generalization. General guidelines are provided for both verbal and nonverbal administration; however, there are no detailed instructions in the test manual regarding how to convey the instructions through gestures. The RAIT-NV was developed with a sample that closely approximates the 2010 U.S. Census; however, with projected changes in U.S. demographics (e.g., Colby & Ortman, 2015) the test developers could consider updating the standardization sample. Nonetheless, the summaries provided by the test developer indicate the RAIT-NV is generally reliable. The external evidence of validity of the RAIT-NV and the related subscales might need clarification. The NVII appears to estimate some form of overall intellectual functioning, but associations with overall intelligence scores varied by test. Although a comparison test such as the Wonderlic could be conceptualized as a measure of fluid abilities, it is more likely a measure of general intelligence (Hicks, Harrison, & Engle, 2015). Similarly, relationships between the NVII and other measures of fluid intelligence seem inconsistent. For example, the NVII has significant correlations with the PRI on the WISC-IV and WAIS-IV but a larger correlation with the RIAS VIX than the NIX. Those administering the RAIT-NV should consider reviewing the NVA and SEQ correlations with other test batteries' subtests to appraise subtest validity independently. In consideration of the populations of individuals with disabilities, it appears that the scores were consistently lower than those of the matched samples.
Still, it is difficult to discern the magnitude of the differences, as it was unclear how family-wise error was addressed in the means comparisons, and the omission of the matched groups' standard deviations makes independently calculating effect sizes difficult. SUMMARY. The RAIT-NV succeeds in being a short-duration test that can be completed in paper-and-pencil form by a wide age range of individuals. It could be useful as a screener of intellectual functioning but might need to be used in conjunction with other measures. The RAIT-NV is likely insufficient for discriminating clinical and non-clinical groups until more information regarding differences between these groups, such as sensitivity and specificity, is explored.

REVIEWER'S REFERENCES
Colby, S. L., & Ortman, J. M. (2015). Projections of the size and composition of the U.S. population: 2014 to 2060: Population estimates and projections. Current Population Reports, P25-1143. Washington, DC: U.S. Census Bureau.
Floyd, R. G., & Singh, L. J. (2017). [Test review of the Reynolds Adaptable Intelligence Test]. In J. F. Carlson, K. F. Geisinger, & J. L. Jonson (Eds.), The twentieth mental measurements yearbook (pp. 605-607). Lincoln, NE: Buros Center for Testing.
Hicks, K. L., Harrison, T. L., & Engle, R. W. (2015). Wonderlic, working memory capacity, and fluid intelligence. Intelligence, 50, 186-195. doi:10.1016/j.intell.2015.03.005
National Center for O*NET Development. (n.d.). O*NET Online. Retrieved from https://www.onetonline.org/help/online/zones
Suppa, C. H. (2017). [Test review of the Reynolds Adaptable Intelligence Test]. In J. F. Carlson, K. F. Geisinger, & J. L. Jonson (Eds.), The twentieth mental measurements yearbook (pp. 607-609). Lincoln, NE: Buros Center for Testing.
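The reviewer's point that missing matched-group standard deviations block independent effect-size calculation can be made concrete: the usual metric for a two-group mean comparison is Cohen's d with a pooled standard deviation, which requires both groups' SDs. A sketch with wholly hypothetical inputs (the manual does not report the SDs needed here):

```python
from math import sqrt

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical: a matched group at the normative mean (100) versus a clinical
# group mean of 65, assuming SD = 15 in both groups of 52 examinees each.
print(round(cohens_d(100, 15, 52, 65, 15, 52), 2))
```

Without the matched groups' SDs, the pooled term cannot be computed, which is exactly why the review calls for this information.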
Review of the Reynolds Adaptable Intelligence Test-Nonverbal by RONALD A. MADLE, Retired School Psychologist (formerly Shikellamy School District and Penn State University), Lewisburg, PA: DESCRIPTION. The Reynolds Adaptable Intelligence Test-Nonverbal (RAIT-NV) uses visual analogies and visual sequences to measure intelligence in individuals from 10-0 to 75-11 years. It was formed from the Fluid Intelligence Index (FII) of its parent assessment, the comprehensive Reynolds Adaptable Intelligence Test (RAIT). Stated applications include clinical assessment for individuals with various disabilities (e.g., learning disabilities, intellectual disabilities, autism spectrum disorder, neuropsychological impairments), human resources testing, and testing second language learners. It is not appropriate for individuals with significant visual or visual-perceptual impairments. The RAIT-NV test kit includes 10 color item booklets, pads of 50 answer sheets and score summary forms with a see-through scoring key, a professional manual, and a fast guide. The test has 95 items across the Nonverbal Analogies (NVA) and Sequences (SEQ) subtests, with an overall Nonverbal Intelligence Index (NVII). Many individuals can learn to administer the test with the guidance of an appropriately qualified person. It can be administered to groups or individuals using a conventional paper-and-pencil format. All items have five multiple-choice options. After silently reading the test instructions and sample items in the test booklet, the examinee begins the 20-minute test. Answers are recorded by filling in bubbles, although dictated responses are permissible. Items can be skipped and returned to within each section, there is no penalty for guessing, and scratch paper may be used. Alternate instructions are presented for examinees with secondary disabilities (e.g., reading, hearing, motor impairments) or second language learners.
Even though the subtests are timed, the limits are quite generous. Objective scoring is completed on the score summary form, which includes demographic information, raw and standard subtest scores and the NVII, and the significance of differences between subtests. The reverse side provides space for plotting subtest and index scores and for recording reliability of changes across administrations. Subtest raw scores are converted to T scores (mean = 50; standard deviation = 10), which are summed to obtain the NVII standard score using the traditional standard score metric (mean = 100; standard deviation = 15) with the associated confidence interval and percentile rank. Score descriptors are significantly below/above average, moderately below/above average, below/above average, and average, with average being 90-109 and all other intervals at 10-point increases or decreases.
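The descriptor scheme above can be expressed as a simple banding function. The cutoffs below are inferred from the review's description (average = 90-109, with 10-point bands on either side); the manual's exact labels and boundaries may differ.

```python
def nvii_descriptor(score):
    """Map an NVII standard score to a qualitative descriptor.
    Band boundaries inferred from the review; the manual may differ."""
    if score >= 130:
        return "significantly above average"
    if score >= 120:
        return "moderately above average"
    if score >= 110:
        return "above average"
    if score >= 90:
        return "average"
    if score >= 80:
        return "below average"
    if score >= 70:
        return "moderately below average"
    return "significantly below average"

print(nvii_descriptor(104))  # -> average
```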
In addition to providing fairly standard interpretive advice, the test author recommends accounting for the Flynn Effect by subtracting 0.3 standard score units from the obtained NVII for each year after the test standardization. DEVELOPMENT. The RAIT-NV was developed as part of the RAIT, a comprehensive, flexible intellectual assessment that can be administered in individual or group formats. One hundred sixty-five items (86 NVA and 79 SEQ) were developed, reviewed, and piloted. In the first pilot study three groups of 150 people took selected subtests. Item difficulty, item discrimination, and item bias (DIF) statistics were derived from classical test theory and item response theory. The revised test, with NVA and SEQ items reduced to 72 and 68, respectively, was submitted to a second field test. A second sample of individuals (n = 397) was divided into two groups who took either odd-numbered or even-numbered items. Following the same analyses as in the first pilot, score means, standard deviations, and distributions were computed and items sorted by difficulty. A final analysis of item foil effectiveness and possible biases was completed before items were finalized for the standardization version. TECHNICAL. Standardization. The final RAIT standardization form included 52 NVA items and 43 SEQ items. At this point, time limits were assigned (7 minutes and 10 minutes for NVA and SEQ, respectively). There were no basal or ceiling rules, and each participant completed as many items as possible within the time limits assigned. From July 2011 to November 2012 standardization data were collected in 40 states using both computer and paper-and-pencil administrations. A total of 2,124 people (484 booklet and 1,640 computer) across 23 age groups completed the standardization version. A multivariate analysis of covariance found no differences between the two administration methods, permitting use of all data in the norms. 
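The Flynn-effect adjustment recommended above, subtracting 0.3 standard-score points per year elapsed since standardization, is simple arithmetic. A sketch, with the per-year constant taken from the review and an example elapsed time that is purely illustrative:

```python
def flynn_adjusted_nvii(obtained, years_since_norming):
    """Subtract 0.3 standard-score points per year since standardization,
    per the test author's recommendation as described in the review."""
    return obtained - 0.3 * years_since_norming

# E.g., a hypothetical NVII of 100 obtained 8 years after norming:
print(flynn_adjusted_nvii(100, 8))  # -> 97.6
```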
Distributions on gender, ethnicity, and education level were consistent with the 2010 U.S. Census. Geographically, the South and West were somewhat overrepresented, but there were no significant differences in scores across regions. A statistical weighting procedure was used to achieve a near-perfect match with the Census figures. Besides the subtest and index scores, the final norm tables include 90% and 95% confidence intervals, percentiles, stanines, z scores, normal curve equivalents, age equivalents, discrepancy scores, and reliable change scores. Reliability. Content sampling (internal consistency) and time sampling reliability information is provided in the test manual. Interscorer reliability was not assessed due to the objective multiple-choice test format. Coefficient alpha for the NVII was a very respectable .93, and for the NVA and SEQ scales, values were .89 and .86, respectively. Age group coefficients ranged from .81 to .96, with all NVII coefficients between .89 and .96. Most subtest alpha values were in the mid- to high .80s and considered acceptable for making decisions about individuals. Test score stability was examined in 132 individuals from ages 10 to 75 with an average retest interval of 24.5 days (range = 18 to 34 days). The corrected stability coefficient for the total group was .86, with NVA and SEQ being .84 and .83, respectively. When three separate age groups are examined, the reliability for the 10- to 20-year-old group is somewhat weaker than that for the older groups (.77 to .81). Relatively small gains in scores across time (about three points) were noted. Validity. The manual presents multiple types of evidence of validity. Test content included expert review of the item content and developmental sequencing of items by difficulty, as well as the internal consistency of the scales found in the reliability studies. Principal components analyses explored the relationships among variables.
A principal components analysis suggested either a two- or three-factor solution for the original RAIT, with the three-factor solution being chosen. Factor 2, or the Fluid Intelligence Index (which is the same as the NVII), had the strongest relationship to g. This finding was considered strong justification for breaking out the RAIT-NV as a stand-alone test. The RAIT-NV was correlated with several measures of intelligence and achievement to provide convergent and divergent evidence of validity. Comparisons with the Test of General Reasoning Ability (.90; Reynolds, 2014), Wonderlic Personnel Test (.71; Wonderlic, 2002), and the Beta III (.72; Kellogg & Morton, 1999) showed a strong relationship between their overall scores typical of correlations between measures of g. Further comparisons are reported between the Wechsler Intelligence Scale for Children—Fourth Edition (WISC-IV; Wechsler, 2003), Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV; Wechsler, 2008), and Reynolds Intellectual Assessment Scales (RIAS; Reynolds & Kamphaus, 2003). The NVII correlated moderately with the Perceptual Reasoning Index (.43), Processing Speed Index (.52), and Full Scale IQ (.51) on the WISC-IV (n = 29). When 28 adults were administered the WAIS-IV and the RAIT-NV, a moderate correlation was found with the Perceptual Reasoning Index (.44). All other correlations with these two measures were not significant. The NVII was also found to have moderate to strong correlations with each RIAS index in a sample of 51 individuals. Correlations between the RAIT-NV and achievement measures showed moderately strong results on the Word Reading subtest of the Wide Range Achievement Test 4 (.44; Wilkinson & Robertson, 2006) and the Test of Irregular Word Reading Efficiency (.40; Reynolds & Kamphaus, 2007). Finally, scores were examined for individuals in various clinical groups. Scores were as expected. For example, the intellectually disabled group had a mean NVII of 64.67 with similar results on the two subtests.
Other groups examined (traumatic brain injury, stroke, dementia, hearing impaired, learning disabilities, and attention-deficit/hyperactivity disorder) also showed expected levels of impairment. COMMENTARY AND SUMMARY. Overall, the RAIT-NV appears to be a useful test that has built on the solid foundation of its parent, the RAIT. It is easily administered and scored across a wide age range. Norms tables show good floor and ceiling scores at all ages with adequate or better item gradients. The RAIT-NV's psychometric characteristics are moderate to strong, although intercorrelations with some Wechsler index scores seem more typical of IQ-achievement correlations. Important, however, are consistently strong correlations with measures of fluid intelligence. Clinically, there are sufficient floors and ceilings to permit classification of individuals as intellectually disabled or mentally gifted. As with other cognitive tests, however, it would not be possible to discriminate at the more impaired levels of intellectual disability. The RAIT-NV should prove particularly useful in any setting where an efficient (especially group-administered) measure of intellectual functioning, especially fluid ability, is needed.

REVIEWER'S REFERENCES
Kellogg, C. E., & Morton, N. W. (1999). Beta III. San Antonio, TX: Pearson.
Reynolds, C. R. (2014). Test of General Reasoning Ability. Lutz, FL: Psychological Assessment Resources.
Reynolds, C. R., & Kamphaus, R. W. (2003). Reynolds Intellectual Assessment Scales. Lutz, FL: Psychological Assessment Resources.
Reynolds, C. R., & Kamphaus, R. W. (2007). Test of Irregular Word Reading Efficiency. Lutz, FL: Psychological Assessment Resources.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children—Fourth Edition. San Antonio, TX: Pearson.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale—Fourth Edition. San Antonio, TX: Pearson.
Wilkinson, G. S., & Robertson, G. J. (2006). Wide Range Achievement Test 4. Lutz, FL: Psychological Assessment Resources.
Wonderlic, E. F. (2002). Wonderlic Personnel Test. Libertyville, IL: Wonderlic.
U.S. Census Bureau. (2010). Current population survey, March 2010. Washington, DC: U.S. Department of Commerce.
Psychological Corporation (The). (2001). Wechsler Individual Achievement Test–Second Edition. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=shib&db=mmt&AN=test.635&site=ehost-live&scope=site&custid=uphoenix
Wechsler Individual Achievement Test–Second Edition

Review of the Wechsler Individual Achievement Test-Second Edition by BETH J. DOLL, Associate Professor of Educational Psychology, University of Nebraska-Lincoln, Lincoln, NE: DESCRIPTION. The Wechsler Individual Achievement Test-Second Edition (WIAT-II) is a comprehensive individual achievement test that is a revision of the Wechsler Individual Achievement Test (WIAT; The Psychological Corporation, 1992). It is substantially different from its predecessor in the content and format of its subtests and in the scale's administration and scoring. In most respects, changes reflect the incorporation of cutting-edge research in the acquisition and assessment of educational skills. The basic design of the test remains the same. It provides composite scores in four domains of educational achievement: reading, mathematics, written language, and oral language. The Reading Composite incorporates subtests in Word Reading, Reading Comprehension, and Pseudoword Decoding. In addition to the word reading and passage comprehension tasks of the WIAT, the WIAT-II includes items and scores that assess phonological awareness, letter-sound awareness, automaticity of word recognition, and fluency of reading. In addition, the Reading Composite includes the only new subtest, Pseudoword Decoding, as a measure of word decoding skills. The Mathematics Composite incorporates subtests in Numerical Operations and Mathematics Reasoning. In addition to the computation, problem solving, and quantitative reasoning items of the WIAT, the WIAT-II includes items assessing counting, one-to-one correspondence, estimation, and numerical patterns. The Written Language Composite incorporates subtests in Spelling and Written Expression. The Spelling subtest is very similar to that of the WIAT.
The Written Expression subtest operationalizes much of the most recent research in writing instruction, incorporating items assessing word fluency, sentence construction, writing fluency, and written responses to visual or verbal cues in addition to the WIAT’s descriptive and narrative writing tasks. The Oral Language Composite incorporates subtests in Listening Comprehension and Oral Expression. These have been redesigned to include greater emphasis on fluency and expressive vocabulary and recall for contextual information, and less emphasis on literal comprehension. Scoring systems for the Reading Composite and Oral Language Composite were altered to use new scoring rules that were more consistent with instructional practices.
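The composite structure just described (subtest scores combined into Reading, Mathematics, Written Language, and Oral Language composites) can be sketched numerically. The Python sketch below is illustrative only: the published WIAT-II converts the sum of subtest standard scores to a composite through normative tables, so the function name and the `sum_mean`/`sum_sd` values here are hypothetical stand-ins for those table values.

```python
def composite_standard_score(subtest_scores, sum_mean, sum_sd):
    """Rescale a sum of subtest standard scores to a composite with
    mean 100 and SD 15. sum_mean and sum_sd are hypothetical stand-ins
    for the normative-table values the published test actually uses."""
    total = sum(subtest_scores)
    return 100.0 + 15.0 * (total - sum_mean) / sum_sd

# e.g., a Reading-style composite from three subtest standard scores
reading = composite_standard_score([110, 105, 112], sum_mean=300, sum_sd=40)
```

Note that the sum of subtest scores has a smaller SD than three independent scales would, because subtests correlate; that is precisely why a table (or a rescaling like the one above) is needed rather than a simple average.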
The WIAT-II is designed for administration to a broad range of individuals, as young as 4-year-olds and as old as 85-year-old adults. Two examiner’s manuals are provided. One describes technical information for the school-aged (4 to 19 years) sample and a second, supplemental manual describes technical information for the sample of college students and adults. Administration for the entire battery ranges from approximately 45 minutes for the youngest children to 2 hours for adolescents and adults. Administration rules are generally straightforward. Start points are indicated for each subtest based upon the examinee’s age, reversal rules describe when earlier basal items should be administered, and discontinue rules are described based on the examinee’s missing a specified number of items in a row. The protocol guides users through the conversion of raw scores into standardized scores, using the scoring and normative supplement manual. However, administration and scoring rules for the Reading Comprehension subtest are more complex and confusing, and these were further altered after the test’s publication. Early purchasers of the kit will need to ensure that they have the “updated manual” that includes these revisions. In addition to standard scores and percentile ranks for each subtest and Composite scale, the WIAT-II yields age equivalent scores, grade equivalent scores (fall or spring), normal curve equivalents, stanines, quartile scores, and decile scores. Error analysis procedures are incorporated into the test protocol, and measures of fluency are provided by timing reading speed on the Reading Comprehension subtest and using time limits for Word Fluency.
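The derived scores listed above (percentile ranks, normal curve equivalents, stanines) are all transformations of the same underlying standard-score scale. Assuming a normal model with mean 100 and SD 15, the standard conversions can be sketched in Python; the function names are illustrative, and published tests report table-based values that may differ slightly at the tails.

```python
from statistics import NormalDist

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Percentile rank of a standard score under a normal model."""
    z = (standard_score - mean) / sd
    return 100.0 * NormalDist().cdf(z)

def normal_curve_equivalent(standard_score, mean=100.0, sd=15.0):
    """NCE: a normalized scale with mean 50 and SD 21.06."""
    z = (standard_score - mean) / sd
    return 50.0 + 21.06 * z

def stanine(standard_score, mean=100.0, sd=15.0):
    """Stanine: z mapped into nine half-SD bands centered on 5."""
    z = (standard_score - mean) / sd
    return max(1, min(9, int(round(2 * z + 5))))
```

For example, a standard score of 115 (one SD above the mean) corresponds to roughly the 84th percentile, an NCE of about 71, and a stanine of 7.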
Finally, because a subset of the standardization sample was administered Wechsler intelligence scales, users can also examine the significance of achievement/intelligence discrepancies between the WIAT-II and the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R), the Wechsler Intelligence Scale for Children- Third Edition (WISC-III), or the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The authors recommend that only professionals trained in the administration and interpretation of individually administered assessment instruments are qualified to administer the WIAT-II and translate its results into education decisions. DEVELOPMENT. Work on the WIAT-II began in 1996, only 4 years after publication of the WIAT. Focus groups were convened with over 500 major users of individual achievement tests to design item modifications. Notes from the focus groups were organized into blueprints of the constructs that the test would assess. Next, these blueprints were compared to national and state standards and curricula such as the Principles and Standards for School Mathematics (NCTM, 2000) of the National Council of Teachers of Mathematics (NCTM), and the report of the National Reading Panel (2000). In addition, prominent researchers in each academic domain were consulted, and some were retained as advisors throughout the test development process. Revised items were piloted with 400 individuals in 1997, followed by a large scale tryout with 1,900 students. Item analysis of these data guided the selection of items for the final version of the WIAT-II. In deference to the importance of cognitive processes underlying achievement, the publishers coordinated the development of the WIAT-II with the Process Assessment of the Learner-Test Battery for Reading and Writing (Berninger, 2001). TECHNICAL. 
The normative data for the WIAT-II were collected between 1999 and 2001 from 2,950 school-aged children ranging in age from 4 years 0 months to 19 years 11 months, 707 college students, and 500 adults. A stratified-random sampling procedure was used to ensure that the sample would be representative of the 1998 Census of the United States on gender, race/ethnicity, geographic region, and parental education level. Students with disabilities were included in the standardization sample in proportion to their representation in public school programs, and the college sample included students from 2-year as well as 4-year campuses. Children and adults were excluded from the standardization if they did not speak English, had neurological disorders, or were taking medications that could suppress performance. A comprehensive description of the final standardization sample for
school-aged children demonstrates that it successfully approximates the demographic characteristics of the United States. To ensure the integrity of the standardization data, standardization protocols were scored by the primary examiner, and then were rescored by two additional scorers trained by the test publisher. For both samples, internal consistency reliability estimates of the WIAT-II subtests are generally high (above .85) with the exception of the Written Expression and Listening Comprehension subtests in the school-aged sample and the Written Expression and Oral Expression subtests in the college/adult sample. The reliability estimates of these subtests were only somewhat lower (above .70). Internal consistency reliability of the Composite scores was very high (above .90) in both samples with the exception of the Oral Language Composite, which was above .85. In the school-aged sample, test-retest correlations for the subtests (across intervals of approximately 10 days) were consistently above .85 and test-retest correlations for the Composite scores were above .90. Test-retest correlations were somewhat lower in the college/adult sample, with correlations between .75 and .85 in Reading Comprehension, Written Expression, Oral Expression, and the composite scores for Written Language and Oral Language. With this level of reliability, it would be reasonable to interpret intersubtest differences for the Reading and Mathematics subtests with a moderate degree of confidence, and to interpret inter-Composite differences with good confidence for school-aged children and adults. Limited validity information for the WIAT-II is available in the examiner’s manual. Predictably, the corresponding subtests of the WIAT and the WIAT-II are strongly correlated (above .80) in the school-aged sample for those subtests with minimal content changes.
However, the correlations were lower for subtests that had changed the most: Reading Comprehension (r = .74), Written Expression (r = .48), Listening Comprehension (r = .68), and Oral Expression (r = .62) subtests. Similarly, the Reading Composites and Mathematics Composites of the WIAT and WIAT-II are strongly correlated (r ≥ .85) but the Written Language Composites and Oral Language Composites are not (r = .66). The examiner’s manual describes very modest correlations between the school-aged WIAT-II Reading, Mathematics, and Written Language Composites and corresponding achievement subtests of the Wide Range Achievement Tests-Third Edition (correlations range from .68 to .77) and the Differential Abilities Scales (correlations range from .32 to .64). For a sample of 48 college students, correlations between composite scores of the WIAT-II and the Woodcock-Johnson-Revised were reported to fall above .70, but correlations between individual subtests were much lower (ranging from .47 to .72). The highest correlations were between the WIAT-II and WJ-R mathematics subtests, and lower correlations were reported between reading and language subtests. At the time of this review, there were no other publications describing the WIAT-II’s validity evidence listed in the PsycLit or ERIC literature databases. Consequently, although it is evident that the refined WIAT-II subtests are assessing academic skills in different ways than traditional achievement tests, it is not yet fully clear whether the WIAT-II’s substantial content revisions will yield better, more usable indices of students’ academic abilities. COMMENTARY. The WIAT-II is a sophisticated achievement test. The authors and publishers should be applauded for their decision to align this test with timely research in learning and assessment of academic knowledge.
Still, the test’s conceptual sophistication will present a challenge to examiners, who must become familiar with this research and its implications for translating WIAT-II scores into effective educational decisions for students. At present, the examiner’s manual does not provide sufficient guidance for making these translations. What is needed is a comprehensive “expert guide” that will introduce users to recent and revolutionary changes in the conceptual frameworks underlying academic instruction, with particular attention to emerging ideas in reading and written language, and their implications for interpreting WIAT-II scores. Administration and interpretation of the WIAT-II will also be complicated by a few practical difficulties.
The administration and scoring of the Reading Comprehension subtest are complex, and were further complicated by the changes made in the recent technical bulletin. Novice examiners will need to carefully review and practice the procedures before using the test. Also, the item type frequently changes within a single subtest. For example, the Word Reading subtest begins with phonemic awareness tasks, changes to letter-sound association tasks, and then shifts to word reading. The interpretation of a student’s score could change significantly depending upon the items that were administered and the items that were missed. Similar shifts in item content were noted within most of the subtests. The administration procedures in the examiner’s manual sometimes deviate from procedures that are supported by the technical properties of the test. For example, the manual notes that some users may decide to administer a partial battery rather than the full WIAT-II test. However, it is clear that only the full battery was administered during norming, such that administration of a partial battery would violate the standardization procedures of the test. Finally, the meaning of these content revisions will not be fully examined without additional validity studies. For example, it would be very useful to know how the WIAT-II compares to the Peabody Individual Achievement Test-Revised (1997) or the Woodcock-Johnson III Tests of Achievement (2001) in both the school-aged and college/adult samples. Additionally, it would be useful to know how well the refined reading comprehension tasks predict instructional needs of students. SUMMARY. The WIAT-II is a carefully designed individual achievement test with exemplary standardization and a welcome option for administration to college students and adults. Its tasks and scores reflect important, emerging research in reading, mathematics, and language instruction.
This refinement reduces the test’s familiarity for many users, but it will be worth their effort to update their understanding of recent research and simultaneously upgrade their achievement test interpretation skills. Although the validity of the WIAT-II is not yet fully tested, its substantial correlation with the WIAT and strong reliability estimates allow users to use it with some confidence. More caution should be exercised in using and interpreting the Written Expression Composite and the Oral Language Composite, as these show the least relation to prior WIAT tasks, and so require more extensive validation to establish their relevance to learning. REVIEWER’S REFERENCES Berninger, V. (2001). Process Assessment of the Learner: Test Battery for Reading and Writing. San Antonio, TX: The Psychological Corporation. Markwardt, F. C. (1997). Peabody Individual Achievement Test-Revised/Normative Update. Circle Pines, MN: AGS Publishing. National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author. National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4754). Washington, DC: National Institute of Child Health and Human Development. The Psychological Corporation. (1992). The Wechsler Individual Achievement Test. San Antonio, TX: Author. Woodcock, R. W., McGrew, K. C., & Mather, N. (2001). Woodcock-Johnson III Tests of Achievement. Itasca, IL: Riverside Publishing.
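Both this review and the one that follows evaluate validity largely through correlations between WIAT-II scores and scores on other instruments. The statistic involved is the Pearson product-moment correlation; as a point of reference, a minimal Python sketch (illustrative data and function name, not part of any published scoring software):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near 1.0 (such as the r ≥ .85 reported for the Reading Composites) indicate that examinees keep nearly the same rank order on both measures; values in the .4-.6 range, as for some revised subtests, indicate substantial divergence.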
Review of the Wechsler Individual Achievement Test-Second Edition by GERALD TINDAL, Professor in Educational Leadership, College of Education, University of Oregon, Eugene, OR, and MICHELLE NUTTER, Research Associate, Behavioral Research and Teaching, College of Education, University of Oregon, Eugene, OR:
DESCRIPTION. The Wechsler Individual Achievement Test-Second Edition (WIAT-II) is the second edition of a test that was first published in 1992. This test is a “comprehensive, individually administered test for assessing the achievement of children, adolescents, college students, and adults” (examiner’s manual, p. 1). Although Reading, Writing, Mathematics, and Oral Language remain as the basic content domains, the actual items and subscales within these domains have increased in this latest edition. Reading is composed of three subtests: (a) Word Reading (name letters, rhyme words, identify beginning and ending sounds, blend sounds, match sounds with letters and blends, and read words), (b) Reading Comprehension (match words with pictures, and answer comprehension questions from sentences and passages), and (c) Pseudoword Decoding (read phonetic nonsense words). Mathematics is composed of two subtests: (a) Numerical Operations (identify and write numbers, count, and calculate) and (b) Math Reasoning (count, identify shapes, and solve word problems related to time, money, and measurement). Written Language includes two subtests: (a) Spelling (letters, blends, and words) and (b) Written Expression (alphabet writing, word fluency, and write sentences, paragraphs, and essays). Oral Language consists of two subtests: (a) Listening Comprehension (receptive vocabulary, sentence completion, and expressive vocabulary) and (b) Oral Expression (sentence repetition, word fluency, visual passage retell, and giving directions). The materials include stimulus booklets, student administration cards, a response booklet, a record form, an examiner’s manual, a scoring and normative supplement (Pre-K to Grade 12), and a supplement for college students and adults. Other materials have to be provided by the administrator (e.g., blank paper, stop watches, pencil, money). 
In the examiner’s manual, specific descriptions are presented for the content and theoretical explanation of the measures. The WIAT-II is designed to yield information about diagnostic, placement, eligibility, and intervention decisions across a variety of settings with a range of scoring options available for summarizing performance: (a) standard scores, (b) percentile ranks, (c) age or grade equivalents, (d) normal curve equivalents, (e) stanines, (f) quartile scores, and (g) decile scores. Either age- or grade-based conversions to standard scores can be made (with fall, winter, and spring administrations). Standard scores are based on a mean of 100 and standard deviation of 15. The manual suggests that professionals who have been trained in the use of individually administered assessment tools and who are involved in psychological or educational testing are qualified to administer the WIAT-II. Subtests should be administered in the prescribed order indicated in the stimulus book, irrespective of whether the entire battery or a single subtest is administered. Administration time will vary depending on the age of the examinee and the number of subtests administered, but approximately 45 minutes is needed for Pre-K students, 90 minutes for students in the elementary grades, and 2 hours for students in middle and high school. Although consideration is given to testing examinees with physical or language impairments, no specific list of acceptable or allowable accommodations is presented to provide standard score reporting, which suggests that testers rely on their “professional judgment to evaluate the impact of such modified procedures on the test scores” (examiner’s manual, p. 23).
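The conversion of a raw score to this mean-100/SD-15 metric is, at bottom, a linear transformation of the examinee's distance from the reference group's mean. The sketch below shows only that arithmetic; the test itself uses age- or grade-based normative tables, so the norm mean and SD passed in here are hypothetical.

```python
def standard_score(raw, norm_mean, norm_sd):
    """Linear conversion of a raw score to a standard score with
    mean 100 and SD 15, given the reference group's raw-score mean
    and SD (hypothetical values; the published test uses norm tables,
    which also correct for non-normal raw-score distributions)."""
    z = (raw - norm_mean) / norm_sd
    return 100.0 + 15.0 * z
```

For example, a raw score one SD above the reference-group mean converts to a standard score of 115, regardless of the raw-score units involved.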
The WIAT-II utilizes start points (specific to the particular subtest), reversal rules (typically invoked if 0 is scored on any of the first three items of a subtest, although Reading Comprehension has unique rules), discontinue rules (if 0 is scored on 6 or 7 consecutive items in a subtest), and stop points (for Reading Comprehension and Written Expression). Basal and ceiling levels also are provided. For many of the items, both modeling the task response and repeating or prompting are allowed. Items are either scored dichotomously (0, 1) or awarded partial credit (0, 1, or 2) based on specified scoring procedures and guidelines presented in the supplemental books; for these items, verbatim recording is required.
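The start, reversal, and discontinue machinery described above is essentially a stopping algorithm. The Python sketch below implements only the discontinue rule (stop after a run of consecutive zero scores); reversal rules, basal/ceiling logic, and stop points are omitted, and all names are illustrative rather than drawn from any published scoring software.

```python
def administer(items, score_item, start=0, discontinue_after=6):
    """Administer items in order from a start point, applying a
    discontinue rule: stop once the examinee scores 0 on
    `discontinue_after` consecutive items. Returns the item scores
    actually collected (a sketch of the rule only)."""
    scores = []
    consecutive_zeros = 0
    for item in items[start:]:
        score = score_item(item)  # 0, 1, or 2 (partial credit)
        scores.append(score)
        consecutive_zeros = consecutive_zeros + 1 if score == 0 else 0
        if consecutive_zeros >= discontinue_after:
            break
    return scores
```

The rule exists to keep examinees from being tested on items far above their level; items beyond the discontinue point are treated as failed when computing the raw score.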
Qualitative recording of examinee behavior also is encouraged and a checklist is provided for codifying frequency of behavioral occurrence on several dimensions. Written and Oral Expression, as well as optional Reading Comprehension, scores require conversion of raw scores to quartiles (before reporting standard scores) and are reported as unique supplemental scores. Four composite scores can be obtained on the WIAT-II by adding standard scores of individual subtests: Reading (with three subtest scores), Mathematics (with two subtest scores), Written Language (with two subtest scores), and Oral Language (with two subtest scores). Confidence intervals also can be recorded on the summary report before displaying various rank scores. Finally, the report form includes a place to report an ability-achievement discrepancy analysis and plot the results on a bell curve. DEVELOPMENT. The most significant change in this edition is the extension of the age range from 5-19 years to 4-85 years. Development began in 1996 with a rigorous analysis of the WIAT and blueprint of content or curriculum specifications. The theoretical perspectives used to develop this edition include research reported by Berninger (2001), the National Reading Panel (2000), and the National Council of Teachers of Mathematics (NCTM, 2000) standards. Pilot testing of items was conducted in 1997 with approximately 400 individuals; the results were analyzed using traditional item analyses. The authors note that this revision provides more complete behavior sampling in the domains, a broader range of students, closer links to instruction, improved scoring (with error analysis), and procedures for documenting ability-achievement discrepancies. In Reading, letter identification, phonological awareness, and pseudoword decoding were added along with measurement of reading rate, oral reading accuracy, fluency, and comprehension (oral and lexical) in expanded sentence and passage reading.
New items were added in the Mathematics subtests to reflect both low-level skills (patterns, counting, 1:1 correspondence, and numerical identification) and high-level mathematics problems (e.g., estimation, probability, and multi-step problem solving). Spelling subtests were revised to reflect morphological knowledge; Written Expression subtests include new low-level measures (timed alphabet writing and fluency) in addition to the assessment of high-level skills (sentence combining and sentence generation, as well as analytic scoring on four traits). Finally, Oral Language is more anchored to real contexts as part of Reading and Writing and adds word fluency, auditory short-term recall, and story generation. TECHNICAL. Two standardization samples were drawn (in 1999-2000 and 2000-2001): for PreK-12 (ages 4-19) and for the college/adult population. Both standardization samples were stratified on the basis of grade, age, sex, race/ethnicity, geographic region, and parent education level, using data from the 1998 Bureau of the Census as the basis for stratification. Over 5,000 individuals participated in the standardization process. “A stratified random sampling approach was used to select participants representative of the population” and “students who received special education services in school settings were not excluded from participation” (examiner’s manual, p. 86). Sample proportions closely approximate census proportions for all stratification variables. Qualified and trained examiners, with test administration experience, were used for the standardization sample. During the standardization process, rules to start, discontinue, and stop testing were developed to conservatively allow students to avoid being tested on items deemed too easy or too difficult.
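Proportional stratified sampling of the kind quoted above can be sketched simply: partition the population by a stratification variable, then draw randomly from each stratum in proportion to its share of the population. The Python below is an illustrative single-variable sketch; the actual standardization stratified on several variables (grade, age, sex, race/ethnicity, region, parent education) jointly, and the names here are hypothetical.

```python
import random

def stratified_sample(population, strata_key, n_total, seed=0):
    """Proportional stratified random sample: each stratum contributes
    cases in proportion to its share of the population (illustrative;
    rounding can shift the total by a case or two in general)."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(strata_key(person), []).append(person)
    sample = []
    for members in strata.values():
        n_stratum = round(n_total * len(members) / len(population))
        sample.extend(rng.sample(members, min(n_stratum, len(members))))
    return sample
```

The payoff is that subgroup proportions in the sample match census proportions by construction rather than by luck, which is what the manual's claim about stratification variables amounts to.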
A subset of the standardization participants also was administered one of the three Wechsler intelligence scales: the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R; Wechsler, 1989), the Wechsler Intelligence Scale for Children, Third Edition (WISC-III; Wechsler, 1991), or the Wechsler Adult Intelligence Scale, Third Edition (WAIS-III; Wechsler, 1997). The linking sample consisted of 1,069 participants. The information collected from this portion of the standardization process was used to develop the ability-achievement discrepancy statistics. The authors report data on split-half
coefficients (based on age and grade) are well above .80. Grade-based split-half coefficients are consistently lower than age-based coefficients, with coefficients falling below .80 on Numerical Reasoning, Written Expression, and Listening Comprehension in fall and spring. Age-based reliability coefficients fall below .80 for Listening Comprehension only. The split-half coefficients for the four composites are all greater than .80. To determine test-retest reliability, a sample of 297 was drawn from three bands in the standardization sample: ages 6-9, 10-12, and 13-19. Test-retest intervals varied from 7-45 days with an average interval of 10 days. Test-retest subtest scores range between .81 and .99. Composite scores range between .91 and .92. Two separate studies were conducted to evaluate interrater agreement. The first study examined the dichotomously scored items in the Reading Comprehension subtests for 2,180 participants. Interrater reliability coefficients ranged between .94 and .98. The second study examined the Written Expression and Oral Expression subtests for 2,180 participants. The intraclass correlations between the two sets of scores ranged from .71 to .94 across ages, with an average correlation of .85. The intraclass correlations between pairs of scores for the Oral Expression subtest ranged from .91 to .99 across ages, with an average of .96. The WIAT-II presents evidence of content, construct, and criterion-related validity. Although curriculum objectives were referenced for item selection, and experts in reading, mathematics, speech, and language arts reviewed the subtests to ascertain the degree to which items measured specific curriculum objectives, no specific information is presented (e.g., which curricula, who served as experts, and how the process was completed).
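Two standard psychometric formulas sit behind the reliability figures above: the Spearman-Brown correction, which projects the correlation between two half-tests to full-test length, and the standard error of measurement, which turns a reliability coefficient into a confidence band around an obtained score. A brief Python sketch (the formulas are standard; the function names are illustrative):

```python
import math

def spearman_brown(r_half):
    """Full-length reliability from the half-test correlation:
    r_full = 2r / (1 + r)."""
    return 2.0 * r_half / (1.0 + r_half)

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """95% confidence band around an obtained standard score."""
    margin = z * sem(sd, reliability)
    return (score - margin, score + margin)
```

With SD = 15 and a reliability of .91, the SEM is 4.5 points, so an obtained standard score of 100 carries a 95% band of roughly 91 to 109; this is the arithmetic behind the confidence intervals recorded on the summary report.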
Conventional and item response theory analyses are presented to document item consistency, eliminate poorly constructed items, determine correct item order, and prevent item bias. Evidence of construct validity is provided through intercorrelations of subtests, correlations with measures of ability, and group differences across grades and groups. Finally, the WIAT-II provides ample support for criterion-related validity with a number of individually administered achievement tests. Moderate correlations appear between selected WIAT-II subtests and those of the Process Assessment of the Learner: Test Battery for Reading and Writing (PAL-RW; Berninger, 2001; considered a companion to the WIAT-II), the WIAT, the Wide Range Achievement Test-3 (WRAT3; Wilkinson, 1993), the Differential Ability Scales (DAS; Elliott, 1990), and the Peabody Picture Vocabulary Test-III (PPVT-III; Dunn & Dunn, 1997). In these studies, moderate to high correlations are presented in an extensive set of tables in the examiner’s manual. Correlations between the WIAT-II and group-administered achievement tests also are presented: Results indicate moderate correlations between the WIAT-II and the Stanford Achievement Tests-Ninth Edition (Stanford 9; Harcourt Educational Measurement, 1996), the Metropolitan Achievement Tests, Eighth Edition (MAT8; Harcourt Educational Measurement, 1999), the Academic Competence Evaluation Scales (ACES; DiPerna & Elliott, 2000), and school grades. Again, the correlations are moderate to high between the WIAT-II and these tests. Given that the WIAT-II is to be used in the differential diagnosis of students with disabilities, it is important that construct validity be examined by comparing groups of students.
Nine different comparisons are presented to document the performance of students participating in gifted programs (n = 123), with mental retardation (n = 39), with emotional disturbance (n = 85), with learning disabilities in reading (n = 123), with learning disabilities not specific to reading (n = 109), with attention-deficit/hyperactivity disorder (ADHD) (n = 179), with both ADHD and learning disabilities (n = 54), with hearing impairments (n = 31), and with speech and/or language impairments (n = 49). In all of these comparisons, the data confirm the differential performance of students with special needs. COMMENTARY AND SUMMARY. The WIAT-II has several strong features. First, its comprehensive nature allows for a thorough examination of student strengths and weaknesses within and across
several academic domains. Second, the modifications made to the subtests in this most recent edition reflect current trends in research and curriculum. Third, the materials are well organized and very accessible, for both administration and scoring or reporting. Finally, the link between assessment and instruction/intervention is explicit through the inclusion of an error analysis component and partial-credit scoring. The examiner’s manual provides a strong guiding framework for the development of interventions. However, without thorough interpretation by an examiner trained in linking the data to interventions and instructional programs, the error analysis component is meaningless. The protocol alone does not lend itself to linking data to interventions. REVIEWERS’ REFERENCES Berninger, V. (2001). Process Assessment of the Learner: Test Battery for Reading and Writing. San Antonio, TX: The Psychological Corporation. DiPerna, J. C., & Elliott, S. N. (2000). Academic Competence Evaluation Scales-Manual K-12. San Antonio, TX: The Psychological Corporation. Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test (3rd ed.). Circle Pines, MN: American Guidance Services. Elliott, C. D. (1990). Differential Ability Scales. San Antonio, TX: The Psychological Corporation. Harcourt Educational Measurement. (1996). Stanford Achievement Test (9th ed.). San Antonio, TX: Author. Harcourt Educational Measurement. (1999). Metropolitan Achievement Tests (8th ed., standardization ed.). San Antonio, TX: Author. National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author. National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4754). Washington, DC: National Institute of Child Health and Human Development. Wechsler, D. (1989).
Wechsler Preschool and Primary Scale of Intelligence (rev. ed.). San Antonio, TX: The Psychological Corporation. Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: The Psychological Corporation. Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.). San Antonio, TX: The Psychological Corporation. Wilkinson, G. (1993). Wide Range Achievement Test (3rd ed.). Wilmington, DE: Wide Range.
*** Copyright © 2014. The Board of Regents of the University of Nebraska and the Buros Center for Testing. All rights reserved. Any unauthorized use is strictly prohibited. Buros Center for Testing, Buros Institute, Mental Measurements Yearbook, and Tests in Print are all trademarks of the Board of Regents of the University of Nebraska and may not be used without express written consent.