Before Data, Before Privacy… Start With The Questions
There has been a great deal of discussion lately about the need to secure student data and protect student privacy. I won’t rehash the recent history or the seemingly countless articles and blog posts on the subject. Instead, I will simply state that we at LearnSprout support the proposed Markey-Hatch legislation and stand firm in our position that students’ personally identifiable information (PII) should not be available to anyone beyond the local education agency (LEA) level, whether that is a small charter school or a very large public school district. We’ve worked hard to make our Privacy Policy the strongest in education and consider ourselves advocates for student privacy.
Now, with that out of the way, I want to ask a very simple question. I recognize that it’s the type of question that could label me a heretic within our new, highly aspirational edtech space, but in my humble opinion, it’s where all conversations about student privacy need to start.
Why do we need all this data anyway?
This, in my opinion, is the very first question parents and educators should be asking. Today, states and the Federal government collect massive amounts of data on students and, to be fair, there are some amazing insights being gleaned that help us understand what has been happening and what is happening now. In some cases, the data can help us understand what might happen tomorrow. That’s powerful stuff, but from my observation, many of these benefits have been mined from a much larger set of data than necessary. Also, data systems can be very expensive.
We need to remember that data on its own does not tell us anything. Data is abstract and, like a Jackson Pollock painting, can be interpreted in innumerable ways. Where one person sees progress, another might see decline. One of the most important concepts I learned in Marketing is that people tend to see what they want to see, hear what they want to hear and believe what they want to believe. When it comes to big data, if you look hard enough for something, you’ll eventually find it.
I get where all the excitement around educational data is coming from. We don’t know what we don’t know and the prospect of searching through big messy piles of data appeals to the explorer in all of us, but before we dive in we need to take a step back and ask ourselves exactly what it is we’re looking for.
We need to start with the questions.
Instead of collecting any and all data, and using that data to see what we can possibly see, we should start with a healthy, open discussion to try and figure out exactly which questions we want answers for. Are the questions good questions, or would their answers simply be interesting with little potential for impact? Are there some questions where the best answers come from humans instead of data machines? Are there bad questions? Who should be asking which questions? Are some questions better left at the local level, or do we need the state getting involved to give us the answers to our questions?
As a parent who worries about his kids’ privacy, my first question about data is simply “What are the questions you’re trying to answer?” In every case, I have yet to find a reason to send PII beyond the school or district level.
Starting with the questions may seem obvious, but in most states the current approach is to collect as much data as possible, as frequently as possible, and to store that data in gigantic state longitudinal data systems (SLDS) for analysis. For student information system administrators, this dragnet operation causes more than a few headaches. Entering and validating data for compulsory state reporting can be a real time suck, leaving little time for in-house data analysis.
For SIS companies, staying abreast of reporting requirements for each state requires a dedicated state reporting department. It’s expensive and the costs are passed down to schools. If you think I’m exaggerating, I’d like to invite you to check out the documentation for the California Longitudinal Pupil Achievement Data System. (note… the doc is 34MB). Next time you hear someone complain about how expensive education is in America, send them this.
What on earth does the state need all this data for? Well, I’m not an expert, but I think the argument goes something like this. Economies of scale allow the state to build more powerful dashboards and analytics that schools would not otherwise be able to afford on their own. Also, Big Data.
I think there was a time when the first answer was more true, but today new companies like LearnSprout are providing schools with affordable reporting options that typically outperform systems provided by the state. Schoolzilla, Mileposts, Always Prepped and BrightBytes are great examples of companies that are helping schools answer their most pressing questions while keeping data local. As far as Big Data goes, the jury is still out. Researchers have used large data sets in the past to help us understand patterns in chronic absenteeism, drop-outs and suspension rates. Perhaps big state data systems can provide new insights that wouldn’t otherwise be possible with smaller, local data. (I kinda doubt it.)
When we start with the questions, one of the things that becomes immediately apparent is how little data we actually need. Small data reduces costs, simplifies implementation and improves performance significantly. LearnSprout, for example, taps a short list of fields and is able to analyze five or ten years’ worth of attendance data for an entire school district in a matter of seconds. This can include data as recent as yesterday, which is pretty cool considering a lot of SLDS rely on bi-annual data from schools. Oh yeah. LearnSprout is free too.
At LearnSprout, questions drive our product roadmap. Here are some of the questions we’re answering today:
- For each grade level, how many students are on track, off track or borderline for college readiness? Why are they off-track or borderline?
- What is the trend in absenteeism/tardiness/suspensions/illness/etc. across the district over the past year? What is the trend for a school or group of schools (e.g., all elementary schools)?
- How does that compare to previous years? Are we getting better or worse?
- Which demographic subgroups at the district/schools/school have the worst attendance?
- What does attendance look like for one specific subgroup (e.g., fourth grade, African American males)? How has that changed over the years?
- How do suspensions, illness, unexcused absences, etc. compare between subgroups?
- For all time, which grade levels have had the worst attendance?
- Which students have missed 10% or more of school? (Chronic Absenteeism Report)
- What is the complete history of a student’s attendance?
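To show how little data a question like the Chronic Absenteeism Report actually needs, here is a minimal Python sketch. It is an illustration only, not LearnSprout's implementation: the record shape (student ID, day, present flag), the function names and the sample data are all assumptions made up for this example. The only thing taken from the question itself is the 10% threshold.

```python
from collections import Counter

# Threshold from the question above: "missed 10% or more of school".
CHRONIC_THRESHOLD = 0.10


def chronic_absentees(records, threshold=CHRONIC_THRESHOLD):
    """Return {student_id: absence_rate} for students at or above the threshold.

    `records` is an iterable of (student_id, day, present) tuples, one per
    enrolled school day. This record shape is hypothetical.
    """
    enrolled = Counter()
    absent = Counter()
    for student_id, _day, present in records:
        enrolled[student_id] += 1
        if not present:
            absent[student_id] += 1

    report = {}
    for student_id, days in enrolled.items():
        rate = absent[student_id] / days
        if rate >= threshold:
            report[student_id] = rate
    return report


def make_days(student_id, n_days, n_absent):
    """Build made-up sample records: the first `n_absent` days are absences."""
    return [(student_id, f"day-{i}", i >= n_absent) for i in range(n_days)]


# Made-up sample data for three hypothetical students.
sample = (
    make_days("s1", 10, 1)    # missed 1 of 10 days
    + make_days("s2", 10, 0)  # perfect attendance
    + make_days("s3", 4, 2)   # missed 2 of 4 days
)

report = chronic_absentees(sample)
print(report)
```

Notice that the whole report runs on three fields, and none of them needs to be a name, an address or anything else that identifies a student outside the school's own system.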
We are, of course, just getting started. We’re constantly adding to an ever-growing list of new questions, but I cannot overstate the rigor behind the process of selecting the ones we want to address. Ruthless prioritization through constant customer validation keeps us focused on what’s going to make the biggest impact. It’s a worthwhile process that forces our customers to take a step back and think hard about exactly what their Key Performance Indicators are, and what they could be.
We don’t need an SLDS to build dashboards or an early-warning system. This is not to say that an SLDS doesn’t have a place in the edtech landscape, but schools already have all the data they need to answer their questions… Why not cut out the middlemen and analyze the data where it lives?
- Paul