Analysis, Modeling & Simulation

Teaching data literacy: Module 3

Analysis, Modeling & Simulation: Static data analysis

Static data analysis

Analysis is the final stage of the data management cycle. Analysis may be static or dynamic. This module discusses each of them. Static analysis measures statistical characteristics of a single dataset covering a fixed time period. For K-8 students, an introduction to static analysis of data is an appropriate final step in the study of data literacy. Static analysis has three steps:

  1. Data representation
  2. Elementary statistics
  3. Computation and interpretation

A dataset can be as simple as a list of cars in a parking lot sorted by color and as complicated as a record of all the trades in Alphabet (Google) stock on a single day. There are many ways to extract information from a dataset. It is essential to choose methods and tools for analysis that match the scale and complexity of the data as well as the skill sets of the analysts. Young elementary students will benefit most from simple exercises that require nothing more than paper and markers. They can learn a lot from very small datasets that they construct themselves. In higher grades, it is appropriate to introduce technological tools that allow students to perform computations and represent data in ways they cannot easily do by hand. The emphasis should always be on the skills students are learning, not on the tools they use to exhibit those skills. The tools will change over time as technology advances. The skills are foundational and permanent.

Data representation

Learning outcome
Students learn three ways to present data and statistics derived from it
As discussed in Module 2, not all data is numerical; but a lot of it is. This step in the process of data literacy applies primarily to quantitative data that is generally summarized and presented in tables, charts, and graphs.
A. Describe the different uses of each type of display
1. Tables are used to present numbers themselves in ways that are easy to read, understand, and interpret.
2. Charts are used to compare categorical data based on some kind of sorting criteria. For example, a chart might be used to show populations of different states.
3. Graphs are used to present time series of data. For example, a graph is an excellent way to view the number of wins teams have in a sports league over time.
B. Give students several examples of each type of representation and let them discuss why each example is or is not a good way to represent particular data. As they become familiar with the three displays, they should observe that it is often possible to present the same information in different ways, and that usually no one way is clearly best. The final choice depends on the application, the time available to work on it, and the end use.
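The three displays can be sketched in a few lines of Python that use only text output, so no charting software is needed. This is a minimal illustration, not a recommended classroom tool: the state names, population figures, team wins, and the helper names `as_table`, `as_bar_chart`, and `as_line_graph` are all invented for this example.

```python
# Hypothetical data, invented for illustration only.
state_population_millions = {"State A": 5, "State B": 12, "State C": 8}
team_wins_by_season = [("2019", 7), ("2020", 9), ("2021", 6), ("2022", 11)]


def as_table(rows, headers):
    """Table: show the numbers themselves, aligned so they are easy to read."""
    width = max(len(str(cell)) for row in [headers, *rows] for cell in row)
    lines = [" | ".join(str(cell).ljust(width) for cell in headers)]
    lines.append("-+-".join("-" * width for _ in headers))
    lines += [" | ".join(str(cell).ljust(width) for cell in row) for row in rows]
    return "\n".join(lines)


def as_bar_chart(counts):
    """Chart: compare categories, drawing one '#' per unit."""
    label_width = max(len(label) for label in counts)
    return "\n".join(
        f"{label.ljust(label_width)} {'#' * value} {value}"
        for label, value in counts.items()
    )


def as_line_graph(series):
    """Graph: show a time series, time left to right, value bottom to top."""
    top = max(value for _, value in series)
    rows = []
    for level in range(top, 0, -1):
        marks = "".join(
            ("*" if value == level else " ").center(6) for _, value in series
        )
        rows.append(f"{level:>2} |{marks}".rstrip())
    rows.append("   +" + "-" * (6 * len(series)))
    rows.append("    " + "".join(label.center(6) for label, _ in series).rstrip())
    return "\n".join(rows)


print(as_table(list(state_population_millions.items()), ("State", "Millions")))
print()
print(as_bar_chart(state_population_millions))
print()
print(as_line_graph(team_wins_by_season))
```

The table and the bar chart are built from the same population data, which makes the point in task B concrete: the same information can be presented in more than one way, and the choice depends on what the reader needs to see.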

Elementary statistics

Learning outcome
Students learn basic statistical concepts
With a few exceptions, this step generally applies to students in middle school and above. Elementary students should work with the concepts without worrying about what they are called or how they are computed. For example, very young students can collect numerical data and plot it in bar graphs to learn the concept of a distribution. They can learn about averages and observe variations in distributions without learning the technical terms that describe them or the calculations used to compute them. For older students, instructors must select the concepts from the tasks below that correspond to students' level of mathematical knowledge.
A. Define some or all of the following elementary statistics. Defining them does not require teaching students how to compute them. Because students can always use software to compute these quantities, it is more important that they understand when and how to use each of them than it is that they know how to compute them.
1. statistical distribution, mean (average), variance, standard deviation, skewness, kurtosis, correlation coefficient
B. Introduce the concept of ordinary least squares (OLS) regression. Since an understanding of the mathematics of regression requires a knowledge of linear algebra that is usually taught only in college, stick to the concepts and provide examples from spreadsheets that perform the calculations. The key point is simply that regression is a technique that allows us to identify correlations between different datasets. Be sure to define the following regression concepts.
1. dependent variable, independent variable, parameter, t-statistics, F-statistics, R-squared
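For instructors who want to see what the software is doing behind these terms, the following Python sketch computes several of the statistics in task A and a one-variable OLS fit for task B. It is an illustration under stated assumptions: the hours-studied and quiz-score data are invented, the function names are this example's own, and the variance divides by n (the population form), whereas many tools divide by n - 1 for sample data. The t-statistics and F-statistics are not computed here and are left to a spreadsheet or statistics package.

```python
import math


def mean(values):
    """Average: the sum divided by the number of values."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


def skewness(values):
    """Third standardized moment: 0 for a perfectly symmetric dataset."""
    m, s = mean(values), standard_deviation(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def correlation(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return covariance / (standard_deviation(xs) * standard_deviation(ys))


def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    xs is the independent variable, ys the dependent variable;
    intercept and slope are the estimated parameters.
    """
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys that the fitted line accounts for."""
    my = mean(ys)
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - my) ** 2 for y in ys)
    return 1 - residual / total


# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]      # independent variable
quiz_scores = [52, 58, 61, 70, 74]   # dependent variable

intercept, slope = ols_fit(hours_studied, quiz_scores)
print("mean score:", mean(quiz_scores))
print("standard deviation:", round(standard_deviation(quiz_scores), 2))
print("correlation:", round(correlation(hours_studied, quiz_scores), 3))
print("fitted line: score =", round(intercept, 2), "+", round(slope, 2), "* hours")
print("R-squared:", round(r_squared(hours_studied, quiz_scores, intercept, slope), 3))
```

With a single independent variable, R-squared equals the square of the correlation coefficient, which is one way to connect the two lists of terms for students.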

Computation and interpretation

Learning outcome
Students learn to compute and interpret statistics from data
A. Computation
1. Instructors should teach students to compute the statistics listed above using calculators, spreadsheets, and any other software that is available to all students. Since these are all mathematical exercises, this instruction may be coordinated with math classes if data literacy is being taught separately or in a class other than math.
2. With the exception of high school students taking a statistics course, the emphasis should be on using tools to get a value rather than understanding the mathematics used to obtain such values. This is especially true for regression for reasons stated above.
B. Statistical inference: drawing conclusions from statistical observations
1. For all but high school juniors and seniors taking a deeper dive into statistics, this is the final step in a K-12 class in data literacy. It is important to take this opportunity to remind students that, with the exception of some very simple questions (e.g., what is the average age of students in a class?), most statistics come from sample data and therefore provide only statistical estimates of something that we may never be able to know with certainty. For example, how many fish are in a lake? An introduction to the concept of statistical literacy should emphasize the uncertainty that remains about a question after statistical analysis is conducted.
2. Important cautionary warnings for statistical use.
a. Correlation is not causation. This statement means that a statistical correlation found, for example, from a regression analysis, may suggest a causal relationship between phenomena, but it cannot confirm one without further testing. A simple but accurate example is the relationship between the crowing of roosters and the rising of the sun. A simple regression will show a very high correlation between those events, but no one should interpret it to mean that roosters make the sun rise.
b. "Most people use statistics like a drunk man uses a lamppost: more for support than illumination." This remark is often credited to Mark Twain, though it is more commonly attributed to Andrew Lang. The point is that many people misuse statistics to support ideas they already hold, when they should use them to increase their understanding of the phenomenon the statistics describe. Star ratings on online services are an excellent example of this misuse of statistics. Without knowing anything about the universe of respondents to the surveys that produce those ratings, it is impossible to know what they really mean.
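The point in B.1, that statistics from sample data are only estimates, can be shown with a short simulation. The sketch below assumes a made-up population of 500 student ages and draws ten random samples of 25 students each; the population, the sample size, and the helper name `sample_mean` are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 500 students, invented for illustration.
population = [rng.randint(5, 14) for _ in range(500)]
true_mean = statistics.mean(population)


def sample_mean(population, sample_size, rng):
    """Estimate the population mean from one random sample."""
    return statistics.mean(rng.sample(population, sample_size))


# Ten different samples give ten different estimates of the same quantity.
estimates = [sample_mean(population, 25, rng) for _ in range(10)]

print(f"true mean age: {true_mean:.2f}")
for number, estimate in enumerate(estimates, start=1):
    print(f"sample {number:>2}: estimated mean age {estimate:.2f}")
print(f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

Here the true mean is known only because the whole population was generated in advance. In a real question, such as the number of fish in a lake, only the sample estimates are available, and the spread among them is a reminder of the uncertainty that remains.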

Go back to Module 1 of Teaching data literacy: Idea formation & abstraction
Go back to Module 2 of Teaching data literacy: Data organization
Continue to Module 4 of Teaching data literacy: Dynamic Data Analysis
Return to home page of Teaching data literacy