Bayesian Estimation of Signal Detection Models, Part 2

This post is the second part of a series of three blog posts: In the first part, I described how to estimate the equal variance Gaussian SDT (EVSDT) model for a single participant, using Bayesian (generalized linear and nonlinear) modeling techniques. In the third part, I describe how to estimate the unequal variance Gaussian SDT model as a nonlinear Bayesian model.

In this post, I describe how to estimate the EVSDT model’s parameters for multiple participants simultaneously using Bayesian generalized hierarchical models. I provide a software implementation in R.

Introduction

This blog post continues directly from Part 1, where we obtained parameter estimates of the EVSDT model for a single subject using three methods: Manual calculation of point estimates (Stanislaw and Todorov 1999), estimating the model as a GLM (Generalized Linear Model; DeCarlo (1998)), and estimating the model as a GLM using brms’ nonlinear modeling syntax (Bürkner 2017).

However, researchers are usually not as interested in the specific subjects that happened to participate in their experiment, as they are in the population of potential subjects. Therefore, we are unsatisfied with parameters which describe only the subjects that happened to participate in our study: The final statistical model should have parameters that estimate features of the population of interest.

Broadly, there are two methods for obtaining these “population level” parameters. By far the most popular method is to summarise the manually calculated subject-specific point estimates of d’ and c with their sample means and standard deviations. From these, we can calculate standard errors, t-tests, confidence intervals, etc. Another method–which I hope to motivate here–is to build a bigger model that estimates subject-specific and population-level parameters simultaneously. We call this latter method “hierarchical” or “multilevel” modeling (Gelman and Hill 2007; Rouder and Lu 2005).

In this blog post, I show how to obtain population-level EVSDT parameters with these two methods, using the R programming language and the brms R package (R Core Team 2017; Bürkner 2017).

Example data

We continue with the same data set as in Part 1 of this blog post. The example data is called confcontr, and is provided as a data frame: “These are the data from the control group in Skagerberg and Wright’s study of memory conformity. Basically, this is the simplest old/new recognition memory design.” (Skagerberg and Wright 2008).

For completeness, the necessary code from that blog post is repeated below:

library(tidyverse)
library(sdtalt)
data(confcontr)
# We prefer working with "tibbles" over "data.frame"s
confcontr <- as_tibble(confcontr)
confcontr
# Create a variable in data indicating if trial was hit/miss/etc.
sdt <- confcontr %>%
mutate(type = "hit",
type = ifelse(isold==1 & sayold==0, "miss", type),
type = ifelse(isold==0 & sayold==0, "cr", type),  # Correct rejection
type = ifelse(isold==0 & sayold==1, "fa", type))  # False alarm
# Count hits/misses/etc. and format data to one row per person
sdt <- sdt %>%
group_by(subno, type) %>%
summarise(count = n()) %>%
spread(type, count)  # Format data to one row per person
# Calculate point estimates of EVSDT parameters
sdt <- sdt %>%
mutate(zhr = qnorm(hit / (hit+miss)),
zfa = qnorm(fa / (fa+cr)),
dprime = zhr-zfa,
crit = -zfa)

Population-level EVSDT Model

We now use these data to estimate the population-level EVSDT parameters using two methods: Manual calculation and hierarchical modeling. For hierarchical modeling, I provide R & brms code to estimate the model as a Generalized Linear Mixed Model (GLMM). I also show how to estimate the GLMM with brms’ nonlinear modeling syntax to help us understand the unequal variances model in Part 3 of this blog post.

Estimation by summarizing subjects’ point estimates

Above we calculated d’ and c for every participant in the sample:

We can therefore calculate sample means and standard errors for both parameters. Here’s one way to do it:

sdt_sum <- select(sdt, subno, dprime, crit) %>%  # Select these variables only
gather(parameter, value, -subno) %>%  # Convert data to long format
group_by(parameter) %>%  # Prepare to summarise on these grouping variables
# Calculate summary statistics for grouping variables
summarise(n=n(), mu=mean(value), sd=sd(value), se=sd/sqrt(n))
sdt_sum

The sample means (mu) are estimates of the population means, and the sample standard deviations (sd) divided by $\sqrt{N subjects}$ are estimated standard deviations of the respective sampling distributions: the standard errors (se). Because the standard deviations of the sampling distributions are unknown and therefore estimated from the data, researchers almost always substitute the Gaussian sampling distribution with a Student’s t-distribution to obtain p-values and confidence intervals (i.e. we run t-tests, not z-tests.)

Note that this method involves calculating point estimates of unknown parameters (the subject-specifc parameters), and then summarizing these parameters with additional models. In other words, we first fit N models with P parameters each (N = number of subjects, P = 2 parameters), and then P more models to summarise the subject-specific models. That’s a lot of models!

Next, we’ll use hierarchical regression1 methods to obtain subject-specific and population-level parameters in one single step.

Estimation with a hierarchical model (GLMM)

We can estimate the EVSDT model’s parameters for every subject and the population average in one step using a Generalized Linear Mixed Model (GLMM). Gelman and Hill (2007) and McElreath (2016) are good general introductions to hierarchical models. Rouder and Lu (2005) and Rouder et al. (2007) discuss hierarchical modeling in the context of signal detection theory.

This model is very much like the GLM discussed in Part 1, but now the subject-specific d’s and cs are modeled as draws from a multivariate normal distribution, whose (“hyper”)parameters describe the population-level parameters. We subscript subjects’ parameters with j, rows in data with i, and write the model as:

$y_{ij} \sim Bernoulli(p_{ij})$ $\Phi(p_{ij}) = \beta_{0j} + \beta_{1j}\mbox{isold}_{ij}$

The outcomes $y_{ij}$ are 0 if participant j responded “new!” on trial i, 1 if they responded “old!”. The probability of the “old!” response for row i for subject j is $p_{ij}$. We then write a linear model on the probits (z-scores; $\Phi$, “Phi”) of ps.

The subject-specific intercepts (recall, $\beta_0$ = -zFAR) and slopes ($\beta_1$ = d’) are described by multivariate normal with means and a covariance matrix for the parameters.

$\left[\begin{array}{c} \beta_{0j} \\ \beta_{1j} \end{array}\right] \sim N( \left[\begin{array}{c} \mu_{0} \\ \mu_{1} \end{array}\right], \Sigma )$

The means $\mu_0$ and $\mu_1$, i.e. the population-level parameters, can be interpreted as parameters “for the average person” (Bolger and Laurenceau 2013). The covariance matrix $\Sigma$ contains the subject-specific parameters’ (co)variances, but I find it easier to discuss standard deviations (I call them $\tau$, “tau”) and correlations. The standard deviations describe the between-person heterogeneities in the population. The correlation term, in turn, describes the covariance of the d’s and cs: Are people with higher d’s more likely to have higher cs?

This model is therefore more informative than running multiple separate GLMs, because it models the covariances as well, answering important questions about heterogeneity in effects.

The brms syntax for this model is very similar to the one-subject model. We have five population-level parameters to estimate. The intercept and slope describe the means: In R and brms modeling syntax, an intercept is indicated with 1 (and can be omitted because it is automatically included, here I include it for clarity), and slope of a variable by including that variable’s name in the data. To include the two regression coefficients, we write sayold ~ 1 + isold.

However, we also have three (co)variance parameters to estimate. To include subject-specific parameters (recall, subjects are indexed by subno variable in data d), and therefore the (co)variance parameters, we expand the formula to sayold ~ 1 + isold + (1 + isold | subno). The part in the parentheses describes subno specific intercepts (1) and slopes of isold. Otherwise, the call to brm() is the same as with the GLM in Part 1:

fitglmm <- brm(sayold ~ 1 + isold + (1 + isold | subno),
data = confcontr,
cores = 4,
file = here::here("static/data/sdtmodel2-1"))

Let’s take a look at the GLMM’s estimated parameters. First, direct your eyes to the “Population-Level Effects” table in the below output. These two parameters are the mean -criterion (Intercept, $\mu_0$) and d’ (isold, $\mu_1$). Recall that we are looking at numerical summaries of (random samples from) the parameters’ posterior distributions: Estimate is the posterior mean.

summary(fitglmm)
##  Family: bernoulli
## Formula: sayold ~ 1 + isold + (1 + isold | subno)
##    Data: confcontr (Number of observations: 3100)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Group-Level Effects:
## ~subno (Number of levels: 31)
##                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)            0.26      0.06     0.16     0.39       1616 1.00
## sd(isold)                0.39      0.08     0.25     0.56       1168 1.00
## cor(Intercept,isold)    -0.57      0.19    -0.85    -0.12       1196 1.00
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.66      0.06    -0.79    -0.55       1756 1.00
## isold         1.06      0.09     0.88     1.23       1784 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We can then compare the Population-level mean parameters of this model to the sample summary statistics we calculated above. The posterior means map nicely to the calculated means, and the posterior standard deviations match the calculated standard errors:

sdt_sum

These mean effects are visualized as a colored density in the left panel of Figure 1. However, the GLMM also returns estimates of the parameters’ (co)variation in the population. Notice that we also calculated the sample standard deviations, which also provide this information, but we have no estimates of uncertainty in those point estimates. The GLMM, on the other hand, provides full posterior distributions for these parameters.

The heterogeneity parameters are reported in the “Group-Level Effects”2 table, above. We find that the standard deviation of the criteria is about , and that the criteria are positively correlated with d’s (recall that Intercept = -c). The two standard deviations are visualized in the right panel of Figure 1.

It is evident in Figure 1 that the sample means approximately match the posterior mode, but less so for the sample standard deviations, which are far from the peak of the standard deviations’ posterior distribution. By ignoring the uncertainty in the subject-specific parameters, the ‘manual calculation’ method has over-estimated the heterogeneity of d’s and cs in the population, in comparison to the GLMM which takes the subject-specific parameters’ uncertainty into account.

This idea has further implications, revealed by investigating the two methods’ estimates of the subject-specific parameters. Recall that the manual calculation method involved estimating (the point estimates of) a separate model for each participant. A hierarchical model considers all participants’ data simultaneously, and the estimates are allowed to inform each other via the shared prior distribution (right hand side of the equation repeated from above):

$\left[\begin{array}{c} \beta_{0j} \\ \beta_{1j} \end{array}\right] \sim N( \left[\begin{array}{c} \mu_{0} \\ \mu_{1} \end{array}\right], \Sigma )$

This “partial pooling” of information (Gelman and Hill 2007) is evident when we plot the GLMM’s subject-specific parameters in the same scatterplot with the N models method (calculating point estimates separately for everybody) (Figure 2).

We see that estimating the EVSDT model for many individuals simultaneously with a hierarchical model is both easy and informative. Specifically, it is now easy to include predictors on the parameters, and answer questions about possible influences on d’ and c.

Including predictors

Do the EVSDT parameters differ between groups of people? How about between conditions, within people? To answer these questions, we would repeat the manual calculation of parameters as many times as needed, and then draw inference by “submitting” the subject-specific parameters to e.g. an ANOVA model. The GLMM approach affords a more straightforward solution to including predictors: We simply add parameters to the regression model.

For example, if there were two groups of participants, indexed by variable group in data, we could extend the brms GLMM syntax to (the ... is a placeholder for other arguments used above, I also dropped the 1 for clarity because they are implicitly included):

brm(sayold ~ isold*group + (isold | subno), ...)

This model would have two additional parameters: group would describe the difference in c between groups, and the interaction term isold:group would describe the difference in d’ between groups. If, on the other hand, we were interested in the effects of condition, a within-subject manipulation, we would write:

brm(sayold ~ isold*condition + (isold*condition | subno), ...)

With small changes, this syntax extends to “mixed” between- and within-subject designs.

Estimation with a GLMM (nonlinear syntax)

Here, I briefly describe fitting the above GLMM with brms’ nonlinear model syntax, as a lead-up to the unequal variance SDT model in Part 3. The basic model is a straightforward reformulation of the single-subject case in Part 1 and the GLMM described above:

$p_{ij} = \Phi(d'_j\mbox{isold}_{ij} - c_{j})$

The varying d-primes and criteria are modeled as multivariate normal, as with the GLMM. It turns out that this rather complex model is surprisingly easy to fit with brms. The formula is very similar to the single-subject model in Part 1, but we tell bf() that the dprimes and criteria should have subject-specific parameters, as well as population-level parameters.

Above, with the GLMM, subject-specific effects were given by (1 + isold | subno). With the nonlinear modeling syntax, we specify varying effects across multiple parameters using |s| instead of | to tell brms that these parameters should be within one covariance matrix. This syntax gives us the “correlated random effects signal detection model” discussed in Rouder et al. (2007). Apart from the syntax, the model is the same as the GLMM above, but the sign of the intercept is flipped.

glmm2 <- bf(sayold ~ Phi(dprime*isold - c),
dprime ~ 1 + (1 |s| subno),
c ~ 1 + (1 |s| subno),
nl = TRUE)

This time, we’ll set priors on the mean parameters and on the (co)variance parameters. Of note is the lkj(4) parameter which slightly regularizes the d’-criterion correlation toward zero (McElreath 2016; Stan Development Team 2016).

Priors <- c(prior(normal(0, 3), nlpar = "dprime", lb = 0),
prior(normal(0, 3), nlpar = "c"),
prior(student_t(10, 0, 1), class = "sd", nlpar = "dprime"),
prior(student_t(10, 0, 1), class = "sd", nlpar = "c"),
prior(lkj(4), class = "cor"))

We fit the model as before, but adjust the control argument, and set inits to zero to improve sampling efficiency (thanks to Tom Wallis for this tip):

fitglmm2 <- brm(glmm2,
data = confcontr,
prior = Priors,
cores = 4, inits = 0,
file = here::here("static/data/sdtmodel2-2"))

Although this model samples less efficiently than the first GLMM formulation, we (unsurprisingly) observe similar results.

summary(fitglmm)
##  Family: bernoulli
## Formula: sayold ~ 1 + isold + (1 + isold | subno)
##    Data: confcontr (Number of observations: 3100)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Group-Level Effects:
## ~subno (Number of levels: 31)
##                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)            0.26      0.06     0.16     0.39       1616 1.00
## sd(isold)                0.39      0.08     0.25     0.56       1168 1.00
## cor(Intercept,isold)    -0.57      0.19    -0.85    -0.12       1196 1.00
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.66      0.06    -0.79    -0.55       1756 1.00
## isold         1.06      0.09     0.88     1.23       1784 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
summary(fitglmm2)
##  Family: bernoulli
## Formula: sayold ~ Phi(dprime * isold - c)
##          dprime ~ 1 + (1 | s | subno)
##          c ~ 1 + (1 | s | subno)
##    Data: confcontr (Number of observations: 3100)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Group-Level Effects:
## ~subno (Number of levels: 31)
##                                   Estimate Est.Error l-95% CI u-95% CI
## sd(dprime_Intercept)                  0.36      0.07     0.22     0.51
## sd(c_Intercept)                       0.24      0.05     0.15     0.35
## cor(dprime_Intercept,c_Intercept)     0.42      0.20    -0.05     0.73
##                                   Eff.Sample Rhat
## sd(dprime_Intercept)                    1558 1.00
## sd(c_Intercept)                         1071 1.00
## cor(dprime_Intercept,c_Intercept)       1403 1.00
##
## Population-Level Effects:
##                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dprime_Intercept     1.06      0.08     0.90     1.23       2025 1.00
## c_Intercept          0.66      0.06     0.55     0.77       2294 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

For technical reasons, each parameter in fitglmm2 has a _Intercept suffix, but the results are the same. In the next part of this blog post, we take this syntax one step further to estimate the unequal variances Gaussian SDT model.

Discussion

Hierarchical modeling techniques have several advantages over traditional methods, such as (M)ANOVA, for modeling data with within-subject manipulations and repeated measures. For example, many models that previously required using parameters from subject-specific models as inputs can now be modeled within a single model. Hierarchical models naturally account for unbalanced data, and allow incorporating continuous predictors and discrete outcomes. In the specific context of SDT, we observed that hierarchical models also estimate important parameters that describe possible between-person variability in parameters in the population of interest.

From casual observation, it appears that the rise of popularity of hierarchical models is accelerating. Many applied papers now analyze data using multilevel models, instead of rm-ANOVA, suggesting that there is demand for these models within applied research contexts. Conceptualizing more complex, possibly nonlinear models as hierarchical models should then afford a unified framework for data analysis. Furthermore, by including parameters for between-person variability, these models allow researchers to quantify the extent to which their effects of interest vary and, possibly, whether these effects hold for everybody in the population.

References

Bolger, Niall, and Jean-Philippe Laurenceau. 2013. Intensive Longitudinal Methods: An Introduction to Diary and Experience Sampling Research. Guilford Press. http://www.intensivelongitudinal.com/.

Bürkner, Paul-Christian. 2017. “Brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

DeCarlo, Lawrence T. 1998. “Signal Detection Theory and Generalized Linear Models.” Psychological Methods 3 (2): 186–205. https://doi.org/10.1037/1082-989X.3.2.186.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rouder, Jeffrey N., and Jun Lu. 2005. “An Introduction to Bayesian Hierarchical Models with an Application in the Theory of Signal Detection.” Psychonomic Bulletin & Review 12 (4): 573–604. https://doi.org/10.3758/BF03196750.

Rouder, Jeffrey N., Jun Lu, Dongchu Sun, Paul Speckman, Richard D. Morey, and Moshe Naveh-Benjamin. 2007. “Signal Detection Models with Random Participant and Item Effects.” Psychometrika 72 (4): 621–42. https://doi.org/10.1007/s11336-005-1350-6.

Skagerberg, Elin M., and Daniel B. Wright. 2008. “Manipulating Power Can Affect Memory Conformity.” Applied Cognitive Psychology 22 (2): 207–16. https://doi.org/10.1002/acp.1353.

Stan Development Team. 2016. Stan: A C++ Library for Probability and Sampling, Version 2.15.0. http://mc-stan.org/.

Stanislaw, Harold, and Natasha Todorov. 1999. “Calculation of Signal Detection Theory Measures.” Behavior Research Methods, Instruments, & Computers 31 (1): 137–49. http://link.springer.com/article/10.3758/BF03207704.

1. Hierarchical regression is sometimes used to mean the practice of adding predictors to a regression model based on the predictors’ p-values. Whatever you do, don’t do that.

2. The label “Group-Level Effects” might be slightly confusing because the SD and correlation parameters describe the population of subject-specific effects. I have yet to find a 100% satisfactory terminology here, but think that brms’ terminology is certainly less confusing than that of “random” and “fixed” effects, traditionally encountered in multilevel modeling literature.