“Is this actually good for students?”
This seems like such a simple question. It’s one we, collectively, have answered wrong from time to time (see the Reading Wars), but it’s a question rooted in good intentions. The gist of it is this: before we put anything in front of students, do we have evidence that it actually works? Or, in the case of more innovative and untested practices, do we at least have a scientifically-grounded rationale for why it should work?
What if I told you that everything we think we know about edtech effectiveness is an illusion? That the modern reality of so-called “efficacy research” is so riddled with conflicts of interest, pay-to-play barriers, and cherry-picked data that we still don’t really have any idea of what works and what doesn’t? In reality, many of the criteria used to allocate federal funding, create state-approved vendor lists, or obtain school board approval for massive contracts are shaky, inconsistent, and too often not at all reflective of the real-life impact a given program will have in a school or district.
Join me on this journey down the rabbit hole of edtech marketing, research clearinghouses, and the underfunded agencies tasked with helping our schools identify the most “effective” programs and practices.
“The problem, as is so often the case with federal mandates, is that there are no teeth behind the guidelines. There is no unbiased agency tasked with conducting studies in a consistent, equitable manner. There is no agreed-upon template for what studies should look like, how results should be measured, or how often they need to be refreshed.”
ESSA and the Rise of “Evidence-Based” Language
Fun fact: the term “evidence-based” shows up a whopping 63 times in the full text of 2015’s Every Student Succeeds Act (ESSA).
ESSA’s emphasis on efficacy, specifically the inclusion of evidence-based requirements for federal funding eligibility, served as the catalyst for a massive shift in the edtech landscape. Schools began requiring publishers to demonstrate evidence of efficacy as part of their evaluation and purchasing process. Publishers scrambled to commission studies or put together their own so they could check this new, mandatory box. Those who lacked the resources, expertise, or data to do so faded into oblivion.
The intent behind the law was commendable. It makes sense that the federal government does not want schools spending federal money on programs that don’t actually help students learn. The problem, as is so often the case with federal mandates, is that there are no teeth behind the guidelines. There is no unbiased agency tasked with conducting studies in a consistent, equitable manner. There is no agreed-upon template for what studies should look like, how results should be measured, or how often they need to be refreshed.
ESSA famously defined four tiers of evidence:
- Tier 1: “Strong evidence from at least one well-designed and well-implemented experimental study.”
- Tier 2: “Moderate evidence from at least one well-designed and well-implemented quasi-experimental study.”
- Tier 3: “Promising evidence from at least one well-designed and well-implemented correlational study with statistical controls for selection bias.”
- Tier 4: “Demonstrates a rationale based on high-quality research findings or positive evaluation that such activity, strategy, or intervention is likely to improve student outcomes or other relevant outcomes.”
These evidence tiers are similar, but not identical to, benchmarks from the What Works Clearinghouse (WWC), an initiative of the Department of Education’s Institute of Education Sciences. The WWC is often referenced in discussions about evidence-based practices, but it is never mentioned explicitly in ESSA’s text. WWC standards are also more rigorous than the relatively vague language in the Act. One of the most significant limiting factors for the WWC is just how small their body of edtech research is in the context of the sheer number of strategies and programs at play in the American education system. They’re not actively chasing down studies to validate. Interested parties need to manually submit studies for review, and the overwhelming majority of studies do not meet the extremely high bar of WWC standards. Most of the programs being used in schools today have no presence on the WWC website at all.
For all these reasons and more, we can’t treat the WWC as an unassailable source of truth. Any school district decision maker trusting wholly in WWC findings could easily be led astray. A quick review of their website shows “Promising Evidence” for the much-maligned Reading Recovery® program dated July 2013, with no updates to reflect the body of follow-up research showing a significant negative impact on children. Popular programs like i-Ready® and Achieve3000® are represented on the site with damning reports showing no positive impact on student achievement. These heavyweight publishers have had no trouble simply commissioning more studies and sidestepping the WWC to check the evidence-based box for school district purchasers. It’s not hard to find the resources or justify the investment when you have district contracts worth $7-$10 million apiece.
In reality, districts (and publishers) are largely left to their own devices regarding the who, what, where, and how behind the label of “evidence-based.” The legitimacy of studies that publishers provide as “evidence” has been left to the honor system. Only a few large school districts possess the resources to vet studies, let alone conduct them on their own. This author has personally seen edtech publishers go so far as to contract with random PhDs on Fiverr to write up reports after the fact based on data provided by the company. The studies look credible to those who don’t know the story behind them, but surely this was not the intent of the law?
The unfortunate result of these inconsistencies is that publishers have free rein to find “correlations” wherever they can, ignoring data that doesn’t make them look good and publicizing results that do. You’re never seeing the full story when you’re evaluating research: what you’re really seeing is marketing disguised as science.
The Confusing Precedent of Non-Perishable Research
One attribute of edtech research that is missing from both ESSA and clearinghouse guidelines is shelf life. If a publisher is able to demonstrate improved outcomes aligned with Tiers 1-3 at any point in time, they are considered “good to go” indefinitely for funding and purchasing purposes. Given what we know about the ephemeral nature of technology, pedagogy, and even the generational differences between students, this is a bit of a head-scratcher.
DreamBox (now part of Discovery Education) is one of the more recognizable examples of this. They have long touted their status as “the only comprehensive K-8 math program rated STRONG by Evidence for ESSA,” but the studies that earned that rating date back to the 2010-2011 school year. Digital learning was still so new at the time that the control group for the study didn’t use any online tools at all. This wasn’t just a different era in education, it was an entirely different generation of students. DreamBox itself looks significantly different now, and the curriculum has no doubt been refreshed many times since.
Lexia is another popular name with similarly dated edtech research floating around in the ether. Their own Evidence for ESSA entry showing a Tier 3 – Promising rating cites three studies. Only one of them took place fewer than ten school years ago. Note: Lexia has had a more recent study added to the site since, presumably because it qualified for a higher rating. Lexia’s What Works Clearinghouse intervention report is based on studies from 2006 and 2008 and includes a reference to “a computer lab of 25 stations,” to give you an idea of how long ago that was. It’s not hard to find newer research if you know where to look, but for companies that don’t have Lexia’s deep pockets, the cost of constantly running studies like this is prohibitive. Many will lean on the same study or two for as long as they can get away with it.
None of this is the fault of the publishers, and we’re not implying anything about whether or not the programs are effective. That said, at what point do we step back and question whether the evidence we’re pointing to as validation for purchasing decisions is even relevant to our students? We need a better, more agile way to measure what works and what doesn’t in the current educational landscape, based on the current products being offered.
“Everybody has some evidence of efficacy at this point, or they wouldn’t be around anymore. So how is it that the treatment group always wins?”
What Does “Business as Usual” Even Mean?
Arguably the most frustrating aspect of edtech research is the mysterious missing instructional time in every control group. Read through enough research design descriptions and you’ll become intimately familiar with the term “business as usual,” as in “the randomized control trial design randomly assigned teachers to implement [insert program here] or continue with business as usual.” Which raises the question: what are those teachers doing with the 30-60 minutes of instruction/practice time the treatment group is giving their students?
Instead of fun, head-to-head comparisons between two programs that might actually give us useful information about what works and what doesn’t in similar learning environments, we get studies where we almost never have any idea what’s happening in the control classrooms. Read through the limitations of these edtech research studies and you’ll see phrases like “we did not collect information on the supplemental strategies or programs used in the control classrooms.” This may have made sense ten or fifteen years ago, when the alternative to any digital learning tool was more face-to-face instruction. But in the modern day, nearly every teacher is using something to supplement their instruction. Nearly every school has subscriptions to multiple math and reading programs.
The interesting thought experiment here lies in what those programs are and how they’re being used. Everybody has some evidence of efficacy at this point, or they wouldn’t be around anymore. So how is it that the treatment group always wins?
Let’s say Program A wants more research to support their sales efforts in a particular state, so its publisher commissions a study. Program A is being used by the treatment group, with the intent of “proving” that Program A leads to better outcomes. Teachers in the control group can’t use Program A, so they instead turn to Programs B and C to supplement their instruction. Programs B and C, of course, have their own research showing a correlation between usage and improved student outcomes. Wouldn’t we expect the results of this study to essentially be a wash, all else being equal?
There are two obvious answers to this question. First, it’s entirely possible that the teachers in the control group are not replacing that extra instruction and practice time with any similar online programs at all. Maybe students are instead given free choice time, or extra recess, or worksheets. In that case, it stands to reason that the students who are receiving some 20 hours or more of additional structured practice and feedback during the study will perform better on whichever assessment is being used for measurement. That’s not an indication of an effective product, it’s just common sense.
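To put some toy numbers behind that intuition, here’s a back-of-the-napkin simulation. Everything in it is hypothetical (the group sizes, the points-per-hour assumption, the noise levels) and it is not drawn from any of the studies mentioned above; it simply sketches how the same product can produce a headline-worthy effect size against an idle control group and essentially nothing against a control group that quietly used comparable programs for the same amount of time.

```python
# Purely illustrative toy simulation with made-up numbers (not from any real study).
# It shows how the apparent "effect" of Program A depends on whether the
# business-as-usual control group got comparable structured practice time.
import numpy as np

rng = np.random.default_rng(42)
n_students = 500  # hypothetical students per group

def simulate_gains(practice_hours, gain_per_hour=0.5, noise_sd=10.0):
    """Assumed model: score gain = normal baseline growth + structured practice effect."""
    baseline_growth = rng.normal(loc=20.0, scale=noise_sd, size=n_students)
    return baseline_growth + gain_per_hour * practice_hours

treatment = simulate_gains(practice_hours=20)       # Program A, ~20 extra structured hours
control_idle = simulate_gains(practice_hours=0)     # control fills the time with worksheets/recess
control_active = simulate_gains(practice_hours=20)  # control quietly uses Programs B and C instead

def cohens_d(a, b):
    """Standardized mean difference between two groups."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

print(f"Apparent effect vs. idle control:   d = {cohens_d(treatment, control_idle):.2f}")
print(f"Apparent effect vs. active control: d = {cohens_d(treatment, control_active):.2f}")
```

Same product, same assumed benefit per hour of structured practice; the only thing that changed between the two headline numbers was what the unobserved control classrooms did with the time.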
Alternatively, it could be that teachers in the control groups are using a combination of other strategies or programs, but in a less structured way. Teachers in the treatment groups almost always receive program-specific professional development and coaching, along with clearly defined instructions for usage. Teachers in the control groups presumably do not get the same support for the programs they are using, thus handicapping them right off the bat. Which leads us to….
“The truth is, edtech return on investment—as measured by academic outcomes—is almost never about what you buy as a school or district. It’s about teacher perception, project leadership, and how willing administrators are to support that purchase after the fact.”
Fidelity Matters More than Software
No matter what new technology comes along or how innovative edtech companies are, teacher quality is still, and will likely always be, the most important school-related factor influencing student achievement. This is the one variable that research studies largely can’t account for. The human element cannot be neglected when analyzing such studies. Why? Because software in a vacuum can actually do very little for student achievement in the greater context of a typical educational journey. But any software program built on research-based principles, when implemented strategically alongside high-quality instruction, should have a positive impact on student outcomes.
This is perhaps one of the biggest “dirty little secrets” in the world of edtech. The range of outcomes for any adoption is massive, and it almost always comes down to three things: the buy-in of teachers, the consistency of usage, and the support of leadership. There are plenty of schools and districts paying for programs that they lack the capacity to properly support. Teachers are aware of what programs are available to them, but in most cases the onus falls on them to learn how to effectively incorporate those programs into their instruction.
This is why you’ll find so few credible edtech research studies that don’t involve the company’s hand-picked participants. The districts where these studies are taking place aren’t random, they’re the cream of the crop in terms of how bought in they are to the program and how strong their implementation already is. They are often representative of only a small fraction of the user base for a given product, and their results are more likely to be in the product’s favor because their teachers have the experience, knowledge, and coaching to put their students in the best position to succeed.
The truth is, edtech return on investment—as measured by academic outcomes—is almost never about what you buy as a school or district. It’s about teacher perception, project leadership, and how willing administrators are to support that purchase after the fact. Everybody wants a magic bullet, but if teachers aren’t confident, students aren’t engaged, and leaders aren’t aware, there’s nothing any technology can do to overcome those barriers.
The Pay-to-Play Problem
Here’s a fun game to play if you’re ever bored. Try a Google search for “[insert big name edtech company] research.” Then scroll through the studies, take note of the authors’ names, and look them up on LinkedIn. The thing many people don’t realize is that the overwhelming majority of studies are authored by employees of the company the study is about.
The programs with the most evidence aren’t necessarily the most effective, they’re just the ones that can afford to maintain their own in-house research teams. In the words of Johns Hopkins University’s research fellows addressing i-Ready’s studies: “Unfortunately, parties associated with the publishers of the assessments have authored the studies, which inevitably calls objectivity into question.”
It’s certainly possible to commission independent, third-party studies, and many of the big names have sprinkled one or two of those alongside their internal research. The problem is that those don’t come cheap either. Depending on the scope of the study and the ESSA evidence level a publisher is looking for, independent studies can range from $15-$20k to well into the six figures. There are no refunds for studies that don’t come out in the publisher’s favor, so the risk factor is always a consideration.
The problem here is that modern-day school district purchasing workflows often don’t take these things into consideration. Every bit of edtech research decision makers can scrape together provides more risk mitigation for them and makes for a more impressive pitch for board and community approval. The research from smaller companies and those that haven’t been around as long often feels lackluster compared to the sheer volume of studies the sales reps from large publishers have in their respective toolkits. The chasm gets wider with every year that passes.
Thus, we find ourselves in a bit of a Catch-22. Startups that aren’t backed by significant capital funding can’t afford to keep expensive data scientists on the payroll or pony up for a meaningful amount of independent research. Without that research to lean on, those same companies face significant barriers to growth, making it even harder for them to achieve the kind of profitability they need to expand those areas of the business. It’s yet another reminder that the edtech landscape isn’t necessarily about what works and what doesn’t, it’s about who controls the narrative.
“As it stands today, the free market competition in edtech is driven by a race to check the most boxes. What if we changed that narrative so it was instead driven by a race to build the product that was best for student learning? That can’t happen when research and evidence is left in the hands of publishers.”
How Can We Fix This?
This, as always, is the hardest question of all. The current system is well-established and there are incentives for many companies and organizations to defend the status quo. In a perfect world, we would have an unbiased agency responsible for organizing, executing, and cataloging well-designed studies for every edtech product, free of any influence or funding from the publishers themselves.
One can easily imagine a centralized website accessible by all educators in which such findings (both positive and negative) are reported on a regular basis in an easy-to-sort, easy-to-understand user interface. IES and the What Works Clearinghouse are a step in the right direction, but without more funding, stronger structure, and further reach, they will always have severe limitations.
As it stands today, the free market competition in edtech is driven by a race to check the most boxes. What if we changed that narrative so it was instead driven by a race to build the product that was best for student learning? That can’t happen when research and evidence is left in the hands of publishers.
Until we are able to raise more awareness about these issues and generate some policymaker support for a better way, the best thing anybody can do is equip themselves with the knowledge to approach any edtech research with a skeptical eye. Who authored the study and why? How do the treatment and control groups compare with each other and what other factors might be influencing results? What do the results really say? What are the limitations of the study in question? Does one program have stronger research than another because it’s truly better for students, or because its publisher has more resources at its disposal?
Does all this mean that everybody should aspire to be a data scientist? No! But we can’t afford to take every study we see at face value. The lack of clear and consistent requirements under ESSA means there is no such thing as an apples-to-apples comparison in edtech research. Consider doing your own “research” by constantly reviewing and analyzing the results of any implementation in your classroom, school, or district rather than relying on what a publisher says they have accomplished elsewhere. In the end, your students’ outcomes are the ones that matter most.
Stay in the loop on what’s happening in edtech today
Is this the kind of thing that really grinds your gears? Are you keen to keep tabs on the latest in AI, edtech research, and the ongoing fight for what’s best for students? Subscribe to EdTech Evolved and get articles like this delivered to your inbox every month.
Explore the Future of EdTech
See how eSpark is combining advanced AI with decades of evidence-based practices to finally fulfill the promise of personalized learning. It’s a unique experience for every student, every time.