Teacher Question:
I’m surprised that you don’t write about screening and monitoring tests. I’ve been a teacher for 24 years (first grade) and I’m considering an early retirement. It seems like I’m supposed to test my students more than teach them. We just test and get ready for tests. I feel so sorry for the boys and girls. I want to teach reading, not FSF, ISF, PSF, NWF, WRF, ORF, LNF. Help me, Dr. Shanahan. Is this what’s best? Is this really a part of the “science of reading”?
Shanahan response:
I feel your pain. When I was a first-grade teacher, we didn’t have all those tests. It certainly made my life easier.
But did that lack make children’s education any better? That’s the real question.
Over the past couple of decades, such testing has insinuated itself into many primary-grade classrooms, often because of policy mandates from above.
Twenty years ago, Reading First (a part of the “No Child Left Behind” federal legislation) was the main source of these testing mandates. These days it’s more likely to be the state Dyslexia Screening laws. Many school districts have taken up these tests on their own, as well.
No wonder. The scheme makes sense. There’s a logic to it.
We test kids at the beginning of the school year to find out which essential skills they have yet to develop. As the school year proceeds, we then re-evaluate to see how the kids are progressing. Teachers, based on the tests, are to differentiate and provide reteaching, and some kids may get extra help through instructional interventions beyond the classroom. The idea is commendable because it strives to keep students from falling behind.
What does research have to say about this approach?
First, there are the studies of the tests themselves. These days, there is a slew of such measures that probe proficiency with letters, phonemic awareness, decoding, oral reading fluency, and spelling (curriculum-based measures) or that try to predict later reading performance. Studies show that many of these short tests are both valid and reliable. That’s not to say research hasn’t identified important limitations, too – including shortcomings with some populations, such as English learners (Newell, Codding, & Fortune, 2020); reliability problems when the tests are administered by teachers under normal classroom conditions (Ardoin & Christ, 2008); and the so-called false-positives issue, which often leads to the overidentification of reading problems – meaning some kids get extra instruction when they don’t need it (Compton, Fuchs, Fuchs, Bouton, Gilbert, Barquero, Cho, & Crouch, 2010).
That means such tests can do a good job, but they sometimes don’t. Still, even with these snags, it appears that they are up to the job for which they are intended (January & Klingbeil, 2020; Petscher, Fien, Stanley, Gearin, Gaab, Fletcher, & Johnson, 2019).
So far, so good.
Second, there are studies of various early interventions. Again, there has been substantial study into whether remedial interventions help kids to progress. Several programs that deliver targeted instruction to low readers have been found to be successful.
That’s even better.
Given that there are valid tests and effective interventions out there, you’d think there would be strong evidence supporting programs of early identification and differentiation of instruction.
That’s where things get complicated.
Because, in fact, the evidence supporting the use of such testing to improve reading achievement is neither strong nor straightforward. The pieces are there, but the connections are a bit shaky.
You don’t have to take my word for it.
The What Works Clearinghouse (WWC) issued a relevant practice guide, Assisting Students Struggling with Reading: Response to Intervention (RtI) and Multi-Tier Intervention in the Primary Grades (Gersten, Compton, Connor, Dimino, Santoro, Linan-Thompson, & Tilly, 2008).
That guide recommended that students be screened and monitored in reading. The WWC (part of the research arm of the U.S. Department of Education) evaluated that recommendation and concluded that it was supported by moderate research evidence. Studies showed that such testing could be implemented successfully.
The panel also recommended that, based upon the data from these tests, students should be provided with differentiated reading instruction. The WWC concluded that there was minimal research evidence supporting this recommendation (at that time, they could only cite a single correlational study that suggested the possibility of effectiveness). In other words, there wasn’t convincing proof that teaching in response to testing improved student learning.
Don’t bail yet – that was almost 15 years ago, and things do change.
Given that, I started looking for more recent evidence. In some states, these policies have been in place for quite a while, so maybe public data could provide some clues. And what about new studies over the past decade?
Unfortunately, public data hasn’t been especially informative. Since 2006, National Assessment (NAEP) scores have languished (and even fallen a bit recently). But I could find no analyses that linked implementation of these testing policies to reading performance in the various states. Likewise, as far as I could determine, no state has even bothered to monitor whether these laws are helping kids to learn better. (When I’m contacted by groups wanting my help in getting their states to adopt early reading assessment policies, I always ask how it has gone in states that have those policies already. I have yet to find someone who had any idea.)
The largest study of Response to Intervention (RtI) or multi-tiered intervention efforts – early assessment and intervention is a big part of those – wasn’t encouraging either (Balu et al., 2015). That national study compared the learning results of schools that had such programs with those that didn’t. Startlingly, first graders did worse in the test-and-differentiate schools than in the business-as-usual schools. It would be easy to read too much into that result, given some gaps in the study. Nevertheless, those results aren’t exactly a glowing endorsement of the instructional practices that you’re finding oppressive.
I polled some colleagues who are big fans of these screening/monitoring assessments. They steered me to the studies they cite in their presentations and publications. I looked. There were some terrific studies that provided strong supporting evidence for the test-and-differentiate idea (Carlson, Borman, & Robinson, 2011; van Geel, Keuning, Visscher, & Fox, 2016; Stecker & Fuchs, 2000), but they weren’t reading studies. The best evidence on this approach comes from math, a different thing altogether. The Carlson study considered both reading and math, but it only reported significant positive results on the math side. Oops.
That’s frustrating.
There has been some recent academic research that has been more supportive, however.
For instance, one study found that assessment-based differentiated reading instruction in Grade 3 had a positive impact on fluency, but not on reading comprehension (Forster, Kawohl, & Souvignier, 2018). The fluency gains were stable over two years. The lowest readers reaped the greatest benefits from the practice, and teachers needed significant support to make it work (including special instructional materials). The researchers concluded that providing test data to teachers alone was not an effective approach, and they reinforced this claim with conclusions drawn by other researchers from other studies. For instance, Lynn Fuchs and Sharon Vaughn – big supporters of the early assessment approach – concluded that “differentiated instruction is beyond the skill set of even the most proficient teachers” (2012, p. 198). So, at least some positive results.
More persuasive evidence was provided by several studies reported by Connor and her colleagues (Connor, Morrison, Fishman, Crowe, Al Otaiba, & Schatschneider, 2013; Connor, Morrison, Fishman, Giuliani, Luck, Underwood, et al., 2011; Connor, Phillips, Young-Suk, Lonigan, Kaschak, Crowe, Dombek, & Al Otaiba, 2018; Connor, Piasta, Fishman, Glasney, Schatschneider, Crowe, et al., 2009). They found that they could successfully raise first- and second-grade reading achievement through assess-and-differentiate efforts: identifying who needed more decoding instruction and then keeping those kids under close teacher supervision so that they would progress in phonics (while providing the more advanced students with independent reading work and experience).
As powerful and persuasive as the Connor data are – and they are persuasive to me – it is important to note that this team did much more than turn test data over to teachers and hope for the best. No, they developed a proprietary algorithm that they used to determine the appropriate data-based response to student needs. “Taking these results together indicates that predicting appropriate amounts and types of instruction is not as straightforward as has been previously suggested” (Connor et al., 2009, p. 93). In fact, they concluded that without their algorithmically based approach some students would likely receive too much decoding instruction, while others would certainly receive too little. That means that early testing can have positive learning outcomes, but only if the results of those tests are weighed appropriately – not something easy for individual teachers to do.
Finally, there is a recent meta-analysis of 15 studies of reading interventions with a “data-based decision making” component and their effects on struggling readers in grades K-12 (Filderman, Toste, Didion, Peng, & Clemens, 2018). The effect sizes for these interventions were small but significant. Six of the studies allowed for comparisons of interventions with and without data-based decision making (again, with small positive effects for using the assessments as the basis of teaching).
My conclusion, from all this evidence, is that it is possible to make the kind of assessment you are complaining about effective. However, it should also be evident that such efforts too often fail to deliver on that promise.
One of the problems is that there is simply too much testing – especially for the students who aren’t low achieving in reading (VanDerHeyden, Burns, & Bonifay, 2018). The WWC practice guide referred to earlier called for testing three times per year, but in many jurisdictions kids are getting far more than that, and the frequency of testing is not necessarily linked to any need for information. If you know a youngster is struggling with phonemic awareness, why not just teach more of that rather than testing the student over and over?
Another problem is that teachers can find it challenging to administer so many tests under classroom conditions. Not only does all that testing undercut the amount of instruction, but it can be tough to provide a valid assessment of phonemic awareness or oral reading fluency when students and teachers struggle to hear each other. In my experience, the best data are produced when test administrators are brought in to take on this burden. I know I trust such data more (and so does the Institute of Education Sciences in the research studies it supports).
Finally, translating test data into properly and productively differentiated instruction is not the no-brainer that policymakers and school administrators seem to presume. They budget for the tests and then provide little or no professional development, guidance, or material supports to make these efforts effective (and dyslexia screening laws don’t address what it takes to make them work, either).
My opinion? Your school is trying to go in the right direction. Help them. Screening and monitoring kids’ early literacy skills can be worthwhile.
The amount of screening and monitoring testing needs to be strictly limited, however.
In many schools/districts/states, we are overdoing it! The only reason to test someone is to find out something that you don’t know. If you know students are struggling with decoding, testing them to prove it doesn’t add much.
The point of all this testing is to reshape your teaching to ensure that kids learn. Unfortunately, these heavy investments in assessment aren’t always (or even usually) accompanied by similar exertions in the differentiation arena.
Talk to your principal, or your district’s curriculum or special education administrators. Request professional development – with classroom demonstrations, in-class coaching, and joint planning – to help get your head back in the game. The reason you became a teacher, I bet, was that you wanted to help kids. This testing could be part of that, but you can’t do that without support. I bet your colleagues would benefit from that, too.
References
Ardoin, S.P., & Christ, T.J. (2008). Evaluating curriculum-based measurement slope estimates using data from triannual universal screenings. School Psychology Review, 37(1), 109-125.
Balu, R., Pei, Z., Doolittle, F., Schiller, E., Jenkins, J., & Gersten, R. (2015). Evaluation of Response to Intervention practices for elementary school reading (NCEE 2016-4000). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Carlson, D., Borman, G. D., & Robinson, M. (2011). A multistate district-level cluster randomized trial of the impact of data-driven reform on reading and mathematics achievement. Educational Evaluation and Policy Analysis, 33(3), 378–398. doi:10.3102/0162373711412765
Compton, D.K., Fuchs, D., Fuchs, L.S., Bouton, B., Gilbert, J.K., Barquero, L.A., Cho, E., & Crouch, R.C. (2010). Selecting at-risk first-grade readers for early intervention: Eliminating false positives and exploring the promise of a two-stage gated screening process. Journal of Educational Psychology, 102(2), 327-340.
Connor, C. M., Morrison, F. J., Fishman, B., Crowe, E. C., Al Otaiba, S., & Schatschneider, C. (2013). A longitudinal cluster-randomized controlled study on the accumulating effects of individualized literacy instruction on students' reading from first through third grade. Psychological Science, 24, 1408–1419.
Connor, C. M., Morrison, F. J., Fishman, B., Giuliani, S., Luck, M., Underwood, P. S., et al. (2011). Testing the impact of child characteristics X instruction interactions on third graders' reading comprehension by differentiating literacy instruction. Reading Research Quarterly, 46, 189–221.
Connor, C.M., Phillips, B.M., Young-Suk, G.K., Lonigan, C.J., Kaschak, M.P., Crowe, E., Dombek, J., & Al Otaiba, S. (2018). Examining the efficacy of targeted component interventions on language and literacy for third and fourth graders who are at risk of comprehension difficulties. Scientific Studies of Reading, 22(6), 462-484.
Connor, C. M., Piasta, S. B., Fishman, B., Glasney, S., Schatschneider, C., Crowe, E., et al. (2009). Individualizing student instruction precisely: Effects of child X instruction interactions on first graders' literacy development. Child Development, 80, 77–100.
Filderman, M. J., Toste, J. R., Didion, L. A., Peng, P., & Clemens, N. H. (2018). Data-based decision making in reading interventions: A synthesis and meta-analysis of the effects for struggling readers. Journal of Special Education, 52(3), 174–187. https://doi.org/10.1177/0022466918790001
Forster, N., Kawohl, E., & Souvignier, E. (2018). Short- and long-term effects of assessment-based differentiated reading instruction in general education on reading fluency and reading comprehension. Learning and Instruction, 56, 98-109.
Fuchs, L. S., & Vaughn, S. (2012). Responsiveness-to-Intervention: A decade later. Journal of Learning Disabilities, 45(3), 195–203. https://doi.org/10.1177/0022219412442150
Gersten, R., Compton, D., Connor, C.M., Dimino, J., Santoro, L., Linan-Thompson, S., and Tilly, W.D. (2008). Assisting students struggling with reading: Response to Intervention and multi-tier intervention for reading in the primary grades. A practice guide. (NCEE 2009-4045). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from http://ies.ed.gov/ncee/wwc/publications/practiceguides/.
January, S.A., & Klingbeil, D.A. (2020). Universal screening in grades K-2: A systematic review and meta-analysis of early reading curriculum-based measures. Journal of School Psychology, 82, 103-122.
Newell, K.W., Codding, R.S., & Fortune, T.W. (2020). Oral reading fluency as a screening tool with English learners: A systematic review. Psychology in the Schools, 57, 1208-1239.
Petscher, Y., Fien, H., Stanley, C., Gearin, B., Gaab, N., Fletcher, J.M., & Johnson, E. (2019). Screening for dyslexia. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education, Office of Special Education Programs, National Center on Improving Literacy. Retrieved from improvingliteracy.org.
Stecker, P. M., & Fuchs, L. S. (2000). Effecting superior achievement using curriculum-based measurement: The importance of individual progress monitoring. Learning Disabilities Research and Practice, 15, 128–134.
VanDerHeyden, A.M., Burns, M.K., & Bonifay, W. (2018). Is more screening better? The relationship between frequent screening, accurate decisions, and reading proficiency. School Psychology Review, 47(1), 62-82.
Comments

This is powerful information, but we are at the mercy of the demands made by our federal and state government. Our legislators will continue to require these tests as long as there are lobbyists from test companies with deep pockets. As long as we have mandated tests, schools will give a million other tests to prepare for the tests that count.

What are your thoughts on MAPS testing? I am in Texas. It seems to be taking over in our district and held up on a pedestal, and teachers are being forced to use solely this info to drive their instruction. Thank you.

Great post! I think there needs to be a definitive purpose to the screening/testing. Testing to determine reading level for leveled books – waste of time and not evidence-based. Testing to determine specific weaknesses when a student is struggling and then using that information to inform instruction – evidence-based and beneficial.

Thanks for this timely information!