The 5Vs for K-12 Framework
About the framework and who it is for
The 5Vs for K-12 Framework reframes the original 5Vs of Big Data—Volume, Velocity, Variety, Veracity, and Value—to better suit data evaluation in K-12 education. The main goal is to make the data evaluation process accessible and meaningful for students, helping them improve their analytical skills.
The framework supports students in asking critical questions about how data are structured, where they come from, how trustworthy they are, and whether they are useful for a given investigation.
The framework includes guiding questions for students to use when evaluating each dimension, as well as pedagogical guidance for teachers who wish to introduce it in their classroom.

Volume
Volume is about how much data is available, how many observations and variables are included, and whether that amount is appropriate for the investigation at hand.
Volume matters because the size of a dataset affects the scope and depth of analysis.
Use the following questions as a checklist when you first look at a dataset:
Teachers should help students recognize whether the amount of data available is appropriate for the question being asked and the claims they want to make. They can do so by presenting multiple datasets of different sizes and having students consider which would best support their investigation goals. Students should have opportunities to explore cases where the dataset is too small, yielding only a partial answer or no answer at all, as well as cases where there is “too much” data to explore and manage, and to see how each case affects the analysis. Teachers should guide students to label a dataset as “too small” when the observations do not span the phenomenon’s cycle (e.g., 10 days cannot represent a “seasonal” trend) or when comparing groups with fewer than 30 observations each. A dataset should be labeled “too big” when it contains far more observations than the claim requires (e.g., minute-by-minute data to study a yearly phenomenon) or when it contains many unused or irrelevant variables.
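The two “too small” rules above can be turned into a short check that students run on a dataset. The following is a minimal sketch, not part of the framework itself: the function name `volume_flags`, its parameters, and the morning/evening grouping are hypothetical, and the only value taken from the text is the 30-observation threshold.

```python
from collections import Counter

MIN_GROUP_SIZE = 30  # threshold named in the guidance for group comparisons


def volume_flags(n_days_covered, cycle_days, group_labels):
    """Return a list of reasons a dataset may be 'too small' for a claim."""
    flags = []
    # The observations should span at least one full cycle of the phenomenon.
    if n_days_covered < cycle_days:
        flags.append("does not span one full cycle")
    # Each group being compared should have at least MIN_GROUP_SIZE observations.
    for group, count in sorted(Counter(group_labels).items()):
        if count < MIN_GROUP_SIZE:
            flags.append(f"group '{group}' has only {count} observations")
    return flags


# Hypothetical case: ten days of readings, split into morning and evening groups,
# used to make a claim about a seasonal (roughly 365-day) pattern.
ten_day_labels = ["morning"] * 5 + ["evening"] * 5
for reason in volume_flags(n_days_covered=10, cycle_days=365, group_labels=ten_day_labels):
    print(reason)
```

An empty list means neither rule was triggered. It does not by itself mean the dataset is large enough for the claim.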
Have students investigate potential long-term changes in weather patterns for their local community using three datasets: daily weather from the past 10 days, 10 years of monthly summaries, and 100 years of monthly summaries. Starting with the 10-day dataset, ask students to identify any trends or conclusions and explain why this dataset is too small for a long-term claim. Then, have students look at the 10-year and 100-year datasets, prompting comparisons across time spans. Emphasize that the 100-year period would be too large for daily resolution, while the 10-year period could produce stable conclusions when examining long-term changes. Afterward, lead a discussion about what each dataset allows them to see (or not see), and ask students to reflect on the strengths and weaknesses of datasets of different sizes.

Velocity
Velocity refers to the temporal characteristics of a dataset, i.e., how current or recent the data is, how frequently it is updated, and whether it reflects a static snapshot or changes over time.
Velocity matters because some questions require up-to-date data, while others can be answered using historical information.
Use the following questions as a checklist when you first look at a dataset:
Teachers should guide students in considering whether the time span of a dataset aligns with the phenomenon they are investigating and have them think about how recent the data need to be for the task at hand. For fast-moving phenomena (e.g., air quality or disease outbreaks), students should prioritize recent or frequently updated data, while for long-term changes (e.g., climate trends), they should prioritize datasets covering longer periods with appropriate aggregation, such as monthly or annual data. To make Velocity actionable and consistent, teachers can adopt categorical recency levels such as “fresh,” “recent,” “last decade,” “over 10 years,” and “not relevant.” Moreover, they can provide activities that compare datasets from different periods to help students evaluate the temporal value of data.
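One way to apply the recency levels consistently is to map a dataset's last update date to a level. The sketch below is illustrative only. The framework names the levels but does not define cut-offs, so the 30-day and 365-day thresholds, the function name `recency_level`, and the `time_matters` flag are assumptions that a teacher would adjust to the phenomenon being studied.

```python
from datetime import date

# Hypothetical cut-offs in days; the framework names the levels, not the thresholds.
FRESH_DAYS = 30
RECENT_DAYS = 365
DECADE_DAYS = 3650


def recency_level(last_updated, today, time_matters=True):
    """Map the age of a dataset to one of the categorical recency levels."""
    # Some questions do not depend on when the data were collected.
    if not time_matters:
        return "not relevant"
    age_in_days = (today - last_updated).days
    if age_in_days <= FRESH_DAYS:
        return "fresh"
    if age_in_days <= RECENT_DAYS:
        return "recent"
    if age_in_days <= DECADE_DAYS:
        return "last decade"
    return "over 10 years"


print(recency_level(date(2024, 1, 1), today=date(2024, 1, 15)))  # fresh
print(recency_level(date(2010, 1, 1), today=date(2024, 1, 15)))  # over 10 years
```

Passing `today` explicitly keeps the result reproducible, so students who run the check on different days get the same label for the same scenario.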
Provide students with two regional COVID-19 datasets, one that includes weekly total cases for the past three years, and the second that includes daily counts of cases for the past 90 days. Ask students which dataset is more appropriate for identifying sudden spikes in a specific period versus long-term trends, and ask them to justify their claims. Facilitate a discussion about how the frequency and recency of data collection affect its usefulness for different scientific inquiries and decision-making. For example, discuss with students a scenario in which a policy decision is being made this month, and emphasize that data from the past 90 days would be more relevant than a multi-year dataset, whereas if the goal is to compare trends across years, the latter is necessary.
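The difference between daily and weekly counts can be shown with a few lines of code. This is a minimal sketch using made-up numbers, not real case data: summing daily counts into weekly totals still shows that one week was higher, but no longer shows which day the spike occurred.

```python
def weekly_totals(daily_counts):
    """Sum consecutive 7-day blocks of daily counts."""
    return [sum(daily_counts[i:i + 7]) for i in range(0, len(daily_counts), 7)]


# Made-up daily counts for two weeks; the tenth day (index 9) has a one-day spike.
daily = [10, 12, 11, 9, 10, 11, 12, 10, 11, 60, 10, 9, 11, 10]
weekly = weekly_totals(daily)

spike_day = daily.index(max(daily))
print(f"Daily data: highest count {max(daily)} on day index {spike_day}")
print(f"Weekly totals: {weekly}")  # [75, 121]
```

Students can compare the two printed lines and discuss which question each one answers: when the spike happened, or how the weeks compare overall.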

Variety
Variety considers the types of data included, such as numerical, categorical, textual, spatial, and visual data, and how the data are structured and organized (e.g., tabular format, JSON, map).
Datasets with multiple data types can support rich analysis but may also be more complex to work with. Recognizing different types of data helps students decide which tools and methods are appropriate.
Use the following questions as a checklist when you first look at a dataset:
Teachers should help students identify different types of data and discuss the strengths and challenges associated with data in various forms. From there, teachers should encourage students to examine what kinds of data are present and how those types influence the kind of questions that can be asked and answered.
Introduce students to a dataset of mammals that includes each species’ diet (categorical), top speed (numerical), life span (numerical), habitat description (text), and latitude and longitude values (spatial). Ask students to describe the variety of data types, identify which variables are most useful for different scientific questions, and examine the format and organization of the dataset. Then, create a visualization that relates variables of different types to illustrate how multiple data types support more complex conclusions.
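Students can also check their type labels with a short script. The sketch below is a rough heuristic under stated assumptions, not a general type detector: the function `describe_type`, the column names, and every value in `columns` are invented for illustration, and the rule that strings of one or two words are categories is a simplification.

```python
def describe_type(values):
    """Roughly label a column as numerical, spatial, categorical, or text."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    # Assumption: spatial values are stored as (latitude, longitude) pairs.
    if all(isinstance(v, tuple) and len(v) == 2 for v in values):
        return "spatial"
    # Assumption: strings of one or two words are category labels.
    if all(isinstance(v, str) and len(v.split()) <= 2 for v in values):
        return "categorical"
    return "text"


# Invented values for three unnamed mammals; not real measurements.
columns = {
    "diet": ["herbivore", "carnivore", "herbivore"],
    "top_speed_kmh": [40, 95, 30],
    "life_span_years": [20, 12, 60],
    "habitat_description": [
        "open grassland with scattered trees",
        "dry savanna and open plains",
        "forest and savanna near water",
    ],
    "location": [(-1.3, 36.8), (-2.3, 34.8), (-0.5, 37.2)],
}

for name, values in columns.items():
    print(f"{name}: {describe_type(values)}")
```

Cases where the heuristic disagrees with the students' own labels are useful discussion points, since they show that a data type depends on how a variable is used as well as how it is stored.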

Veracity
Veracity is about the accuracy, reliability, completeness, and possible biases in the data. This includes how the data were collected, who collected them, why they were collected, and which populations or behaviors may be under- or over-represented.
Veracity matters because inaccurate or incomplete data can lead to false conclusions.
Use the following questions as a checklist when you first look at a dataset:
Teachers should help students understand where data come from, how they were collected, whether there are missing or inconsistent values, and whether any bias is present. Moreover, they should emphasize how bias can shape interpretations. Demonstrating each source of bias (e.g., collecting, reporting, and measurement) can help students understand how bias can shape data and why the context in which data are collected is part of the data evaluation process. Additionally, teachers should guide students in understanding how decisions about what to measure and how to measure it affect the data collected and, ultimately, the insights that can be drawn from them. For example, assessing a person’s health can involve various measures (e.g., heart rate, blood pressure, BMI), and different measures might lead to different conclusions.
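Two of these checks, missing values and who is represented, are easy to quantify. The following is a minimal sketch, assuming the dataset is a list of records. The field names `age_group` and `answer` and the four rows are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def is_missing(value):
    """Treat None and empty strings as missing values."""
    return value in (None, "")


def missing_share(rows, column):
    """Fraction of rows where the column is absent, None, or empty."""
    return sum(1 for row in rows if is_missing(row.get(column))) / len(rows)


def group_shares(rows, column):
    """Fraction of rows in each group, ignoring rows with a missing value."""
    counts = Counter(row[column] for row in rows if not is_missing(row.get(column)))
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Hypothetical survey records.
rows = [
    {"age_group": "18-29", "answer": "yes"},
    {"age_group": "60+", "answer": "no"},
    {"age_group": "60+", "answer": ""},
    {"age_group": "60+", "answer": "yes"},
]

print(missing_share(rows, "answer"))     # 0.25
print(group_shares(rows, "age_group"))   # {'18-29': 0.25, '60+': 0.75}
```

The group shares only describe the sample. Judging whether a group is under- or over-represented still requires comparing them with what is known about the population the data are meant to describe.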
Have students investigate a dataset from a political survey collected by calling people on landlines. Ask students to identify who is represented in the dataset. Then, present a scenario where certain groups (e.g., young people without a landline phone) were underrepresented and ask students to suggest alternative data collection methods. Have students reflect on how this underrepresentation might influence conclusions, and what could be done to increase trustworthiness.

Value
Value refers to the relevance and usefulness of the data in answering a given question or generating meaningful insights.
Even a well-structured, accurate dataset may not be helpful if it does not contain the information needed for the task at hand. A dataset is valuable when it helps students connect evidence to explanation, even if it is small or limited in scope.
Use the following questions as a checklist when you first look at a dataset:
Teachers can help students assess whether the data includes relevant variables to answer their research question. Moreover, they can support students’ critical thinking about what is missing and what might improve the dataset. Having students imagine potential datasets that could be used for a given line of inquiry or, conversely, develop potential lines of inquiry for a given dataset, can help students build intuition around the importance of Value. It could also be helpful to brainstorm datasets that would not help answer a certain question so that students can develop a sense for what to look for in datasets that will not be helpful in answering their questions.
Provide students with an NBA team-season dataset for the years 2020-2025. Have students ask questions about the dataset and justify whether it contains relevant data that can help them answer those questions. Guide students in identifying which variables in the dataset they will use in their analysis. Facilitate a discussion on how question-data alignment determines value by emphasizing that, for example, if the goal is to evaluate long-term team success, the dataset is relevant, but if the goal is to examine the performance of a specific player, the dataset is not appropriate.
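Question-data alignment can be checked by listing the variables a question needs and comparing them with the columns a dataset has. The sketch below illustrates that comparison under stated assumptions: the function `missing_variables` and all column names are hypothetical and do not describe any particular NBA dataset.

```python
def missing_variables(dataset_columns, needed_variables):
    """Return the needed variables that the dataset does not contain, sorted."""
    return sorted(set(needed_variables) - set(dataset_columns))


# Hypothetical columns of a team-season dataset.
team_season_columns = ["team", "season", "wins", "losses"]

# A question about long-term team success needs only team-level variables.
print(missing_variables(team_season_columns, ["team", "season", "wins"]))
# []

# A question about one player's performance needs player-level variables.
print(missing_variables(team_season_columns, ["player", "points_per_game"]))
# ['player', 'points_per_game']
```

An empty result means the needed variables are present. Students still have to judge whether those variables are measured in a way that actually answers the question.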