Chapter 17 Epilogue

“Begin at the beginning”, the King said, very gravely, “and go on till you come to the end: then stop” – Lewis Carroll

It feels somewhat strange to be writing this chapter, and more than a little inappropriate. An epilogue is what you write when a book is finished, and this book really isn’t finished. There are a lot of things still missing from this book. It doesn’t have an index yet. A lot of references are missing. There are no “do it yourself” exercises. And in general, I feel that there a lot of things that are wrong with the presentation, organisation and content of this book. Given all that, I don’t want to try to write a “proper” epilogue. I haven’t finished writing the substantive content yet, so it doesn’t make sense to try to bring it all together. But this version of the book is going to go online for students to use, and you will may be to purchase a hard copy too, so I want to give it at least a veneer of closure. So let’s give it a go, shall we?

17.1 The undiscovered statistics

First, I’m going to talk a bit about some of the content that I wish I’d had the chance to cram into this version of the book, just so that you can get a sense of what other ideas are out there in the world of statistics. I think this would be important even if this book were getting close to a final product. One thing that students often fail to realise is that their introductory statistics classes are just that, an introduction. If you want to go out into the wider world and do real data analysis, you have to learn a whole lot of new tools that extend the content of your undergraduate lectures in all sorts of different ways. Don’t assume that something can’t be done just because it wasn’t covered in undergrad. Don’t assume that something is the right thing to do just because it was covered in an undergrad class. To stop you from falling victim to that trap, I think it’s useful to give a bit of an overview of some of the other ideas out there

17.1.1 Omissions within the topics covered

Even within the topics that I have covered in the book, there are a lot of omissions that I’d like to redress in future version of the book. Just sticking to things that are purely about statistics (rather than things associated with jamovi), the following is a representative but not exhaustive list of topics that I’d like to expand on at some time:

• Other types of correlations. In [Correlation and regression] I talked about two types of correlation: Pearson and Spearman. Both of these methods of assessing correlation are applicable to the case where you have two continuous variables and want to assess the relationship between them. What about the case where your variables are both nominal scale? Or when one is nominal scale and the other is continuous? There are actually methods for computing correlations in such cases (e.g., polychoric correlation), and it would be good to see these included.

• More detail on effect sizes. In general, I think the treatment of effect sizes throughout the book is a little more cursory than it should be. In almost every instance, I’ve tended just to pick one measure of effect size (usually the most popular one) and describe that. However, for almost all tests and models there are multiple ways of thinking about effect size, and I’d like to go into more detail in the future.

• Dealing with violated assumptions. In a number of places in the book I’ve talked about some things you can do when you find that the assumptions of your test (or model) are violated, but I think that I ought to say more about this. In particular, I think it would have been nice to talk in a lot more detail about how you can tranform variables to fix problems. I talked a bit about this [Pragmatic matters, but the discussion isn’t detailed enough I think.

• Interaction terms for regression. In Factorial ANOVA I talked about the fact that you can have interaction terms in an ANOVA, and I also pointed out that ANOVA can be interpreted as a kind of linear regression model. Yet, when talking about regression in [Correlation and regression] I made not mention of interactions at all. However, there’s nothing stopping you from including interaction terms in a regression model. It’s just a little more complicated to figure out what an “interaction” actually means when you’re talking about the interaction between two continuous predictors, and it can be done in more than one way. Even so, I would have liked to talk a little about this.

• Method of planned comparison. As I mentioned this in Factorial ANOVA, it’s not always appropriate to be using a post hoc correction like Tukey’s HSD when doing an ANOVA, especially when you had a very clear (and limited) set of comparisons that you cared about ahead of time. I would like to talk more about this in the future.

• Multiple comparison methods. Even within the context of talking about post hoc tests and multiple comparisons, I would have liked to talk about the methods in more detail, and talk about what other methods exist besides the few options I mentioned.

17.1.2 Statistical models missing from the book

Statistics is a huge field. The core tools that I’ve described in this book (chi-square tests, t-tests, regression and ANOVA) are basic tools that are widely used in everyday data analysis, and they form the core of most introductory stats books. However, there are a lot of other tools out there. There are so very many data analysis situations that these tools don’t cover, and it would be great to give you a sense of just how much more there is, for example:

• Nonlinear regression. When discussing regression in Chapter 12, we saw that regression assumes that the relationship between predictors and outcomes is linear. On the other hand, when we talked about the simpler problem of correlation in Chapter 4, we saw that there exist tools (e.g., Spearman correlations) that are able to assess non-linear relationships between variables. There are a number of tools in statistics that can be used to do non-linear regression. For instance, some non-linear regression models assume that the relationship between predictors and outcomes is monotonic (e.g., isotonic regression), while others assume that it is smooth but not necessarily monotonic (e.g., Lowess regression), while others assume that the relationship is of a known form that happens to be nonlinear (e.g., polynomial regression).

• Logistic regression. Yet another variation on regression occurs when the outcome variable is binary, but the predictors are continuous. For instance, suppose you’re investigating social media, and you want to know if it’s possible to predict whether or not someone is on Twitter as a function of their income, their age, and a range of other variables. This is basically a regression model, but you can’t use regular linear regression because the outcome variable is binary (you’re either on Twitter or you’re not). Because the outcome variable is binary, there’s no way that the residuals could possibly be normally distributed. There are a number of tools that statisticians can apply to this situation, the most prominent of which is logistic regression.

• The General Linear Model (GLM). The GLM is actually a family of models that includes logistic regression, linear regression, (some) nonlinear regression, ANOVA and many others. The basic idea in the GLM is essentially the same idea that underpins linear models, but it allows for the idea that your data might not be normally distributed, and allows for nonlinear relationships between predictors and outcomes. There are a lot of very handy analyses that you can run that fall within the GLM, so it’s a very useful thing to know about.

• Survival analysis. In A brief introduction to research design I talked about “differential attrition”, the tendency for people to leave the study in a non-random fashion. Back then, I was talking about it as a potential methodological concern, but there are a lot of situations in which differential attrition is actually the thing you’re interested in. Suppose, for instance, you’re interested in finding out how long people play different kinds of computer games in a single session. Do people tend to play RTS (real time strategy) games for longer stretches than FPS (first person shooter) games? You might design your study like this. People come into the lab, and they can play for as long or as little as they like. Once they’re finished, you record the time they spent playing. However, due to ethical restrictions, let’s suppose that you cannot allow them to keep playing longer than two hours. A lot of people will stop playing before the two hour limit, so you know exactly how long they played. But some people will run into the two hour limit, and so you don’t know how long they would have kept playing if you’d been able to continue the study. As a consequence, your data are systematically censored: you’re missing all of the very long times. How do you analyse this data sensibly? This is the problem that survival analysis solves. It is specifically designed to handle this situation, where you’re systematically missing one “side” of the data because the study ended. It’s very widely used in health research, and in that context it is often literally used to analyse survival. For instance, you may be tracking people with a particular type of cancer, some who have received treatment A and others who have received treatment B, but you only have funding to track them for 5 years. At the end of the study period some people are alive, others are not. In this context, survival analysis is useful for determining which treatment is more effective, and telling you about the risk of death that people face over time.

• Mixed models. Repeated measures ANOVA is often used in situations where you have observations clustered within experimental units. A good example of this is when you track individual people across multiple time points. Let’s say you’re tracking happiness over time, for two people. Aaron’s happiness starts at 10, then drops to 8, and then to 6. Belinda’s happiness starts at 6, then rises to 8 and then to 10. Both of these two people have the same “overall” level of happiness (the average across the three time points is 8), so a repeated measures ANOVA analysis would treat Aaron and Belinda the same way. But that’s clearly wrong. Aaron’s happiness is decreasing, whereas Belinda’s is increasing. If you want to optimally analyse data from an experiment where people can change over time, then you need a more powerful tool than repeated measures ANOVA. The tools that people use to solve this problem are called “mixed” models, because they are designed to learn about individual experimental units (e.g. happiness of individual people over time) as well as overall effects (e.g. the effect of money on happiness over time). Repeated measures ANOVA is perhaps the simplest example of a mixed model, but there’s a lot you can do with mixed models that you can’t do with repeated measures ANOVA.

• Multidimensional scaling. Factor analysis is an example of an “unsupervised learning” model. What this means is that, unlike most of the “supervised learning” tools I’ve mentioned, you can’t divide up your variables into predictors and outcomes. Regression is supervised learning whereas factor analysis is unsupervised learning. It’s not the only type of unsupervised learning model however. For example, in factor analysis one is concerned with the analysis of correlations between variables. However, there are many situations where you’re actually interested in analysing similarities or dissimilarities between objects, items or people. There are a number of tools that you can use in this situation, the best known of which is multidimensional scaling (MDS). In MDS, the idea is to find a “geometric” representation of your items. Each item is “plotted” as a point in some space, and the distance between two points is a measure of how dissimilar those items are.

• Clustering. Another example of an unsupervised learning model is clustering (also referred to as classification), in which you want to organise all of your items into meaningful groups, such that similar items are assigned to the same groups. A lot of clustering is unsupervised, meaning that you don’t know anything about what the groups are, you just have to guess. There are other “supervised clustering” situations where you need to predict group memberships on the basis of other variables, and those group memberships are actually observables. Logistic regression is a good example of a tool that works this way. However, when you don’t actually know the group memberships, you have to use different tools (e.g., k-means clustering). There are even situations where you want to do something called “semi-supervised clustering”, in which you know the group memberships for some items but not others. As you can probably guess, clustering is a pretty big topic, and a pretty useful thing to know about.

• Causal models. One thing that I haven’t talked about much in this book is how you can use statistical modelling to learn about the causal relationships between variables. For instance, consider the following three variables which might be of interest when thinking about how someone died in a firing squad. We might want to measure whether or not an execution order was given (variable A), whether or not a marksman fired their gun (variable B), and whether or not the person got hit with a bullet (variable C). These three variables are all correlated with one another (e.g., there is a correlation between guns being fired and people getting hit with bullets), but we actually want to make stronger statements about them than merely talking about correlations. We want to talk about causation. We want to be able to say that the execution order (A) causes the marksman to fire (B) which causes someone to get shot (C). We can express this by a directed arrow notation: we write it as $$A \rightarrow B \rightarrow C$$. This “causal chain” is a fundamentally different explanation for events than one in which the marksman fires first, which causes the shooting $$B \rightarrow C$$, and then causes the executioner to “retroactively” issue the execution order, $$B \rightarrow A$$. This “common effect” model says that A and C are both caused by B. You can see why these are different. In the first causal model, if we had managed to stop the executioner from issuing the order (intervening to change A), then no shooting would have happened. In the second model, the shooting would have happened any way because the marksman was not following the execution order. There is a big literature in statistics on trying to understand the causal relationships between variables, and a number of different tools exist to help you test different causal stories about your data. The most widely used of these tools (in psychology at least) is structural equations modelling (SEM), and at some point I’d like to extend the book to talk about it.

Of course, even this listing is incomplete. I haven’t mentioned time series analysis, item response theory, market basket analysis, classification and regression trees, or any of a huge range of other topics. However, the list that I’ve given above is essentially my wish list for this book. Sure, it would double the length of the book, but it would mean that the scope has become broad enough to cover most things that applied researchers in psychology would need to use.

17.1.3 Other ways of doing inference

A different sense in which this book is incomplete is that it focuses pretty heavily on a very narrow and old-fashioned view of how inferential statistics should be done. In [Estimating unknown quantities from a sample] I talked a little bit about the idea of unbiased estimators, sampling distributions and so on. In Hypothesis testing I talked about the theory of null hypothesis significance testing and p-values. These ideas have been around since the early 20th century, and the tools that I’ve talked about in the book rely very heavily on the theoretical ideas from that time. I’ve felt obligated to stick to those topics because the vast majority of data analysis in science is also reliant on those ideas. However, the theory of statistics is not restricted to those topics and, whilst everyone should know about them because of their practical importance, in many respects those ideas do not represent best practice for contemporary data analysis. One of the things that I’m especially happy with is that I’ve been able to go a little beyond this. Bayesian statistics now presents the Bayesian perspective in a reasonable amount of detail, but the book overall is still pretty heavily weighted towards the frequentist orthodoxy. Additionally, there are a number of other approaches to inference that are worth mentioning:

• Bootstrapping. Throughout the book, whenever I’ve introduced a hypothesis test, I’ve had a strong tendency just to make assertions like “the sampling distribution for BLAH is a t-distribution” or something like that. In some cases, I’ve actually attempted to justify this assertion. For example, when talking about $$\chi^2$$ tests in Categorical data analysis I made reference to the known relationship between normal distributions and $$\chi^2$$ distributions (see [Introduction to probability) to explain how we end up assuming that the sampling distribution of the goodness-of-fit statistic is $$\chi^2$$ . However, it’s also the case that a lot of these sampling distributions are, well, wrong. The $$\chi^2$$ test is a good example. It is based on an assumption about the distribution of your data, an assumption which is known to be wrong for small sample sizes! Back in the early 20th century, there wasn’t much you could do about this situation. Statisticians had developed mathematical results that said that “under assumptions BLAH about the data, the sampling distribution is approximately BLAH”, and that was about the best you could do. A lot of times they didn’t even have that. There are lots of data analysis situations for which no-one has found a mathematical solution for the sampling distributions that you need. And so up until the late 20th century, the corresponding tests didn’t exist or didn’t work. However, computers have changed all that now. There are lots of fancy tricks, and some not-so-fancy, that you can use to get around it. The simplest of these is bootstrapping, and in it’s simplest form it’s incredibly simple. What you do is simulate the results of your experiment lots and lots of times, under the twin assumptions that (a) the null hypothesis is true and (b) the unknown population distribution actually looks pretty similar to your raw data. In other words, instead of assuming that the data are (for instance) normally distributed, just assume that the population looks the same as your sample, and then use computers to simulate the sampling distribution for your test statistic if that assumption holds. Despite relying on a somewhat dubious assumption (i.e., the population distribution is the same as the sample!) bootstrapping is quick and easy method that works remarkably well in practice for lots of data analysis problems.

• Cross validation. One question that pops up in my stats classes every now and then, usually by a student trying to be provocative, is “Why do we care about inferential statistics at all? Why not just describe your sample?” The answer to the question is usually something like this, “Because our true interest as scientists is not the specific sample that we have observed in the past, we want to make predictions about data we might observe in the future”. A lot of the issues in statistical inference arise because of the fact that we always expect the future to be similar to but a bit different from the past. Or, more generally, new data won’t be quite the same as old data. What we do, in a lot of situations, is try to derive mathematical rules that help us to draw the inferences that are most likely to be correct for new data, rather than to pick the statements that best describe old data. For instance, given two models A and B, and a data set $$X$$ you collected today, try to pick the model that will best describe a new data set $$Y$$ that you’re going to collect tomorrow. Sometimes it’s convenient to simulate the process, and that’s what cross-validation does. What you do is divide your data set into two subsets, $$X1$$ and $$X2$$. Use the subset $$X1$$ to train the model (e.g., estimate regression coefficients, let’s say), but then assess the model performance on the other one $$X2$$. This gives you a measure of how well the model generalises from an old data set to a new one, and is often a better measure of how good your model is than if you just fit it to the full data set $$X$$.

17.1.4 Miscellaneous topics

• Power analysis. In Hypothesis testing I discussed the concept of power (i.e., how likely are you to be able to detect an effect if it actually exists) and referred to power analysis, a collection of tools that are useful for assessing how much power your study has. Power analysis can be useful for planning a study (e.g., figuring out how large a sample you’re likely to need), but it also serves a useful role in analysing data that you already collected. For instance, suppose you get a significant result, and you have an estimate of your effect size. You can use this information to estimate how much power your study actually had. This is kind of useful, especially if your effect size is not large. For instance, suppose you reject the null hypothesis at $$p< .05$$, but you use power analysis to figure out that your estimated power was only .08. The significant result means that, if the null hypothesis was in fact true, there was a 5% chance of getting data like this. But the low power means that, even if the null hypothesis is false and the effect size was really as small as it looks, there was only an 8% chance of getting data like you did. This suggests that you need to be pretty cautious, because luck seems to have played a big part in your results, one way or the other!

• Data analysis using theory-inspired models. In a few places in this book I’ve mentioned response time (RT) data, where you record how long it takes someone to do something (e.g., make a simple decision). I’ve mentioned that RT data are almost invariably non-normal, and positively skewed. Additionally, there’s a thing known as the speedaccuracy trade-off: if you try to make decisions too quickly (low RT) then you’re likely to make poorer decisions (lower accuracy). So if you measure both the accuracy of a participant’s decisions and their RT, you’ll probably find that speed and accuracy are related. There’s more to the story than this, of course, because some people make better decisions than others regardless of how fast they’re going. Moreover, speed depends on both cognitive processes (i.e., time spent thinking) but also physiological ones (e.g., how fast can you move your muscles). It’s starting to sound like analysing this data will be a complicated process. And indeed it is, but one of the things that you find when you dig into the psychological literature is that there already exist mathematical models (called “sequential sampling models”) that describe how people make simple decisions, and these models take into account a lot of the factors I mentioned above. You won’t find any of these theoretically-inspired models in a standard statistics textbook. Standard stats textbooks describe standard tools, tools that could meaningfully be applied in lots of different disciplines, not just psychology. ANOVA is an example of a standard tool that is just as applicable to psychology as to pharmacology. Sequential sampling models are not, they are psychology-specific, more or less. This doesn’t make them less powerful tools. In fact, if you’re analysing data where people have to make choices quickly you should really be using sequential sampling models to analyse the data. Using ANOVA or regression or whatever won’t work as well, because the theoretical assumptions that underpin them are not well-matched to your data. In contrast, sequential sampling models were explicitly designed to analyse this specific type of data, and their theoretical assumptions are extremely well-matched to the data.

17.2 Learning the basics, and learning them in jamovi

Okay, that was a long list. And even that listing is massively incomplete. There really are a lot of big ideas in statistics that I haven’t covered in this book. It can seem pretty depressing to finish an almost 500-page textbook only to be told that this is only the beginning, especially when you start to suspect that half of the stuff you’ve been taught is wrong. For instance, there are a lot of people in the field who would strongly argue against the use of the classical ANOVA model, yet I’ve devoted two whole chapters to it! Standard ANOVA can be attacked from a Bayesian perspective, or from a robust statistics perspective, or even from a “it’s just plain wrong” perspective (people very frequently use ANOVA when they should actually be using mixed models). So why learn it at all?

As I see it, there are two key arguments. Firstly, there’s the pure pragmatism argument. Rightly or wrongly, ANOVA is widely used. If you want to understand the scientific literature, you need to understand ANOVA. And secondly, there’s the “incremental knowledge” argument. In the same way that it was handy to have seen one-way ANOVA before trying to learn factorial ANOVA, understanding ANOVA is helpful for understanding more advanced tools, because a lot of those tools extend on or modify the basic ANOVA setup in some way. For instance, although mixed models are way more useful than ANOVA and regression, I’ve never heard of anyone learning how mixed models work without first having worked through ANOVA and regression. You have to learn to crawl before you can climb a mountain.

Actually, I want to push this point a bit further. One thing that I’ve done a lot of in this book is talk about fundamentals. I spent a lot of time on probability theory. I talked about the theory of estimation and hypothesis tests in more detail than I needed to. Why did I do all this? Looking back, you might ask whether I really needed to spend all that time talking about what a probability distribution is, or why there was even a section on probability density. If the goal of the book was to teach you how to run a t-test or an ANOVA, was all that really necessary? Was this all just a huge waste of everyone’s time???

The answer, I hope you’ll agree, is no. The goal of an introductory stats is not to teach ANOVA. It’s not to teach t-tests, or regressions, or histograms, or p-values. The goal is to start you on the path towards becoming a skilled data analyst. And in order for you to become a skilled data analyst, you need to be able to do more than ANOVA, more than t-tests, regressions and histograms. You need to be able to think properly about data. You need to be able to learn the more advanced statistical models that I talked about in the last section, and to understand the theory upon which they are based. And you need to have access to software that will let you use those advanced tools. And this is where, in my opinion at least, all that extra time I’ve spent on the fundamentals pays off. If you understand probability theory, you’ll find it much easier to switch from frequentist analyses to Bayesian ones.

In short, I think that the big payoff for learning statistics this way is extensibility. For a book that only covers the very basics of data analysis, this book has a massive overhead in terms of learning probability theory and so on. There’s a whole lot of other things that it pushes you to learn besides the specific analyses that the book covers. So if your goal had been to learn how to run an ANOVA in the minimum possible time, well, this book wasn’t a good choice. But as I say, I don’t think that is your goal. I think you want to learn how to do data analysis. And if that really is your goal, you want to make sure that the skills you learn in your introductory stats class are naturally and cleanly extensible to the more complicated models that you need in real world data analysis. You want to make sure that you learn to use the same tools that real data analysts use, so that you can learn to do what they do. And so yeah, okay, you’re a beginner right now (or you were when you started this book), but that doesn’t mean you should be given a dumbed-down story, a story in which I don’t tell you about probability density, or a story where I don’t tell you about the nightmare that is factorial ANOVA with unbalanced designs. And it doesn’t mean that you should be given baby toys instead of proper data analysis tools. Beginners aren’t dumb, they just lack knowledge. What you need is not to have the complexities of real world data analysis hidden from from you. What you need are the skills and tools that will let you handle those complexities when they inevitably ambush you in the real world.

And what I hope is that this book, or the finished book that this will one day turn into, is able to help you with that.

Author’s note – I’ve mentioned it before, but I’ll quickly mention it again. The book’s reference list is appallingly incomplete. Please don’t assume that these are the only sources I’ve relied upon. The final version of this book will have a lot more references. And if you see anything clever sounding in this book that doesn’t seem to have a reference, I can absolutely promise you that the idea was someone else’s. This is an introductory textbook: none of the ideas are original. I’ll take responsibility for all the errors, but I can’t take credit for any of the good stuff. Everything smart in this book came from someone else, and they all deserve proper attribution for their excellent work. I just haven’t had the chance to give it to them yet.