Teacher question:
We are having an interesting conversation in our district. We currently give AIMSweb as a screening probe three times a year. One of the school psychologists pointed out that for the last several years the first graders seem to do better in the fall than in the spring on nonsense word fluency. When we look at measures of comprehension and fluency using other measures, we do not see a decline. Is there any research out there that might help us understand what we are seeing and whether or not this is a serious issue?
Shanahan responds:
What you describe is a common experience with AIMSweb and other progress monitoring tests. And, the more often you re-test, the more often you’ll see the problem. (Thank goodness you are only trying to test the kids three times a year.)
I could find no studies on the nonsense word portion of AIMSweb. But every test has a standard error of measurement (SEM).
The standard error gives an estimate of how much test scores will vary if the test is given repeatedly. Tests aren’t perfect, so if someone were to take the same test two days in a row, the score would not be likely to be the same.
But how much could someone learn (or forget) in one day? That is exactly the point.
SEM tells you how much change the test score is likely to undergo even if there were no significant opportunity for learning or forgetting. It is not a real change in reading ability, but variance due to the imprecision of the measurement.
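If it helps to see the arithmetic, here is a minimal sketch (in Python, with made-up numbers rather than any published AIMSweb statistics) of how an SEM is typically derived from a test's reliability and how it turns into a band around an observed score:

```python
import math

# Hypothetical values for illustration only -- not AIMSweb statistics.
score_sd = 12.0       # standard deviation of scores on the test
reliability = 0.90    # test-retest or alternate-form reliability

# Classical test theory: SEM = SD * sqrt(1 - reliability)
sem = score_sd * math.sqrt(1 - reliability)

# A 95% confidence band is roughly +/- 1.96 SEMs around the observed score.
observed = 49
half_band = 1.96 * sem
print(f"SEM = {sem:.1f}")
print(f"95% band around {observed}: {observed - half_band:.1f} to {observed + half_band:.1f}")
```

The less reliable the test, the wider that band gets, and the more "change" you will see that is really just measurement noise.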
Schools tend to pay a lot of attention to the standard error with their state test scores (the so-called “wings” around your school or district average scores). If your school gets 500 in reading on the state test, but the standard error is + or – 5, then we can’t be sure that you did any better than the schools that got 495s, 496s, 497s, 498s, and 499s. Your score was higher, but because those schools’ scores fall within the standard error, we can’t tell whether your kids actually outperformed theirs.
When you calculate the SEM for a school or district score, it will tend to be small because of the large numbers of students whose scores are being averaged. However, when you are looking at an individual’s score, such as when you are trying to find out how much improvement there has been since the last time you tested, SEMs can get a lot bigger.
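A rough way to see why averaging helps: with independent measurement errors, the error around a group average shrinks with the square root of the number of students being averaged. The sketch below reuses the hypothetical individual SEM from the example above, not an AIMSweb figure.

```python
import math

# Hypothetical individual SEM (from the earlier sketch), not an AIMSweb statistic.
sem_individual = 3.8

# The error around an average of n students shrinks by a factor of sqrt(n).
for n in (1, 25, 100, 400):
    error_of_mean = sem_individual / math.sqrt(n)
    print(f"averaging {n:3d} students: error of the mean is about {error_of_mean:.2f} points")
```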
Unfortunately, schools pay less attention to SEMs with screening or progress monitoring tests than they do with accountability tests.
Nevertheless, AIMSweb has a standard error of measurement. So do all the other screeners out there.
That means when you give such tests repeatedly over short periods of time (say less than every 15 weeks), you’ll end up with unreliability affecting some percentage of the students’ scores.
I’d love to blame AIMSweb for being particularly bad as a predictor test. That would sure make it easy to address your problem: “Lady, you bought the wrong test. Buy the XYZ Reading Screener and everything is going to be fine. You’ll see.”
In fact, studies suggest—at least with oral reading fluency—that if anything AIMSweb has particularly small standard errors of measurement (Ardoin & Christ, 2009).
But even with that, you’ll still find changes in scores that make no sense. Say John got a 49 when you tested him early in the school year. I couldn’t find an SEM for the AIMSweb nonsense word test, but let’s say that to be 95% certain one score is really different from another, the two scores would have to differ by more than 10 points. Thus, if on retesting you find that his score is 45, it looks like a decline, but what it really means is that John’s score isn’t any different than before.
Teachers usually like knowing that; what looked to be a decline is just test noise.
They usually aren’t quite as happy with the idea that if John goes from 49 to 58 on that test, the change is still too small to conclude that any real progress was made. Changes that are within the standard error of measurement are not actually changes at all.
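Here is a minimal sketch of that decision rule, using the hypothetical 10-point margin from the example above rather than a published AIMSweb SEM:

```python
def real_change(fall_score: float, later_score: float, band: float = 10.0) -> str:
    """Label a score change as a likely real change or as measurement noise.

    `band` is the margin needed for 95% confidence that two scores differ;
    the 10-point value here is the hypothetical figure from the post, not a
    published AIMSweb statistic.
    """
    change = later_score - fall_score
    if abs(change) <= band:
        return f"change of {change:+.0f} is within +/-{band:.0f}: treat as noise"
    return f"change of {change:+.0f} exceeds +/-{band:.0f}: likely a real change"

print(real_change(49, 45))   # apparent decline, but within the error band
print(real_change(49, 58))   # apparent gain, also within the error band
print(real_change(49, 62))   # large enough to take seriously
```

In practice, you would get the actual margin from the publisher’s SEM, as suggested further below.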
Since I can’t recommend shifting to some other comparable measure (e.g., DIBELS, PALS, CBM) that would necessarily be any more precise, I think what you are doing—comparing the results with those derived from other measures—is the best antidote.
If you see a decline in AIMSweb scores, but no comparable decline in the other tests that you are giving, I’d conclude that there was probably not a real decline. I would then monitor that student more closely during instruction just to be sure.
On the other hand, if the score decline is confirmed by your other tests, then I would try to address the problem through instruction—giving the youngster greater help with the skill in question.
Contact your test publisher and ask for the test’s standard errors of measurement. Those statistics will help you to better interpret these test scores. In fact, without that kind of information I'm not sure how you are making sense of these data.
The problem here: You are expecting too high a degree of accuracy from your testing regime. Give the tests. Use the tests. But don’t trust these changes, up or down, to always be accurate—at least no more accurate than the standard errors suggest that they should be.
Reference
Ardoin, S.P., & Christ, T.J. (2009). Curriculum-based measurement of oral reading: Standard errors associated with progress monitoring outcomes from DIBELS, AIMSweb, and an experimental passages set. School Psychology Review, 38, 266-283.
Our district currently uses both Aimsweb and STAR (Renaissance Learning) at the grade school and just STAR at the middle school (where I currently teach). One thing we've noticed (especially in grades 5 and up) is that our scores are pretty consistent for our highest and lowest readers, but many of our middle readers have score graphs that look like roller coasters. In October, their scores identify them as well above grade level, but in December, they're identified as in need of intervention (but we don't see drops in academic performance). I'm sure some of this is motivation, but it also makes us question the validity of the test. STAR is a computer adaptive test, so I wonder if kids miss questions early on in the test if it makes it harder for them to score well? What are your thoughts on this? I've wondered about asking if we could switch from STAR to Aimsweb at the middle school level because STAR just seems so inconsistent.
Karen-
Comprehension tests definitely depend on the kids trying. If the scores are as frequently inconsistent as you describe, they cannot possibly be useful. Either find some ways to get the kids to try (sometimes this takes no more than a pep talk) or, as you say, switch to another measure.
Thanks for your insights on this topic.
Our school is progress monitoring kindergarten students showing risk on Aimsweb weekly.
Are you suggesting that this practice is unreliable and should be discontinued?
Anonymous--
Indeed, I am. That amount of testing is not justified. The standard error of the test is larger than the amount of gain children can be expected to make in a week, so you can't tell whether the changes you see are due to learning or to unreliability of measurement. No research supports the practice of such frequent testing. Use this time to teach kids to read.