Weighting in Within-Designs State-Trace Analysis (STA)

The first rows of the delay data used throughout this post:

  participant    delay               structure block     pc
1           1    delay              rule-based    B1 0.3375
2           2 no delay information-integration    B1 0.2875
3           3 no delay              rule-based    B1 0.5250
4           4    delay information-integration    B1 0.3500
5           5    delay              rule-based    B1 0.2375
6           6 no delay information-integration    B1 0.2125
Recently, I revisited the concept of STA to deepen my understanding and examined an example from Dunn and Kalish (2018). The example is based on a category-learning experiment reported by Dunn, Newell, and Kalish (2012), which used one between-subjects variable (feedback delay vs. no delay) and one within-subjects variable (training block, 1–4). The two dependent variables were proportion correct for rule-based (RB) and information-integration (II) category-learning tasks.
RB tasks are tasks for which a simple, verbalizable rule can be devised (e.g., “If the value x is greater than c, then it belongs to category A”). II tasks, by contrast, depend on implicit learning processes because no easily verbalizable rule exists. In the experiment, 500 ms (no-delay condition) or 5000 ms (delay condition) after stimulus presentation, a Gabor patch mask was shown.
As far as I understand, the underlying idea is that, for RB learning, feedback delay is not particularly detrimental because learners can hold candidate rules in working memory and evaluate feedback later. In contrast, II learning relies on forming stimulus–response associations through procedural learning mechanisms, which require temporal contiguity between response and feedback. Delayed feedback therefore disrupts II learning more strongly.
Note that in the paper only participants with a correct rate of at least 27% were used, but in the book all participants are analyzed. The analysis of this experiment can be recreated in R.
Let’s load some libs and the data. I use librarian for package management.
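A minimal setup sketch. Two assumptions here that you may need to adapt: that librarian is installed, and that the `delay` data frame ships with the stacmr package (otherwise read it in from wherever you stored it).

```r
# load packages, installing them first if necessary
librarian::shelf(dplyr, tidyr, ggplot2, stacmr)

# assumption: the delay data is bundled with stacmr
data("delay", package = "stacmr")
str(delay)  # participant, delay, structure, block, pc
```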
Let’s compute the descriptive statistics with sta_stats() (we will need them later) and draw a plot to get an idea of what is going on:
stats <- sta_stats(
  data = delay,
  col_value = "pc",
  col_participant = "participant",
  col_dv = "structure",
  col_within = "block",
  col_between = "delay"
)
means_df1 <- delay %>%
  group_by(structure, delay, block) %>%
  summarize(m = mean(pc), .groups = "drop")

wide <- tidyr::pivot_wider(means_df1, names_from = structure,
                           values_from = m)
p1 <- ggplot(wide, aes(`rule-based`, `information-integration`,
                 color = delay, group = delay, label = block)) +
  geom_point() +
  geom_text(vjust = 2) +
  theme_classic()
p1
Now, we can already see that there is a violation of monotonicity, but it is not clear how strong the violation is. But let’s go step by step. First, we need to fit the partial-order model. Two restrictions are relevant. First, within each block the delay group should always be worse than the no-delay group. One might wonder why RB is affected at all in this case; I guess this is because in the no-delay group the delay occurred after the feedback instead, which is beneficial for rule-based learning. Indeed, this restriction is already fulfilled, so we do not need to adjust the data. The second restriction is that from block 1 to 4, performance should increase, which is the case except for B3/B4 in the delay group. The partial-order model needs to correct for that. Why? Because this discrepancy is not theoretically sound and must be attributed to measurement error or some other type of error. In stacmr we can just call mr() for monotonic regression and then draw the best solution:
mr_d1 <- mr(
  data = delay,
  col_value = "pc",
  col_participant = "participant",
  col_dv = "structure",
  col_within = "block",
  col_between = "delay",
  nsample = 10,
  partial = "auto"
)
mr_d1_s <- summary(mr_d1)
MR fit to 8 data points with call:
mr(data = delay, col_value = "pc", col_participant = "participant", 
    col_dv = "structure", col_within = "block", col_between = "delay", 
    partial = "auto", nsample = 10)
Fit value (SSE): 0.1721
p-value (based on 10 samples): 0.7 
Estimated cell means:
  rule.based information.integration within  between
1     0.3445                  0.2830     B1    delay
2     0.4434                  0.3034     B2    delay
3     0.5081                  0.3149     B3    delay
4     0.5169                  0.3149     B4    delay
5     0.3676                  0.3308     B1 no delay
6     0.4676                  0.4550     B2 no delay
7     0.5757                  0.5346     B3 no delay
8     0.6118                  0.5492     B4 no delay

wide <- cbind(wide, mr_d1_s)
p1 + geom_line(data = wide, aes(rule.based, information.integration))
## for delay data: order of factor-levels corresponds to expected partial order.
## Therefore, 'partial = "auto"' can be used to enforce this order.

Indeed, this fixed the B3/B4 non-monotonicity. But the strange thing is that B1 is also slightly repositioned:
             rule.based1              rule.based2              rule.based3 
            0.000000e+00             0.000000e+00             0.000000e+00 
             rule.based4              rule.based5              rule.based6 
            0.000000e+00             0.000000e+00             0.000000e+00 
             rule.based7              rule.based8 information.integration1 
            0.000000e+00             0.000000e+00             5.496299e-04 
information.integration2 information.integration3 information.integration4 
           -2.359129e-04             3.045620e-03            -5.157505e-03 
information.integration5 information.integration6 information.integration7 
           -1.398881e-14            -1.088019e-14            -8.659740e-15 
information.integration8 
           -1.487699e-14 

In Dunn and Kalish (2018) the lowest value of the MR model is 0.2830 for the delay group, whereas initially it was 0.2836. Why is this point moved at all? The reason is that this experiment includes a within variable, which necessitates a different weighting procedure because the values of B1 to B4 are correlated: higher values in B1 go along with higher values in B2, B3, and B4. So if some point does not align, all of them will likely be adjusted, even if only slightly. The question is how we can reproduce the actual results.
#wide2 <- pivot_wider(delay, names_from =c(structure, block), values_from = pc)
#
#wide2 <- delay %>%
#  group_by(structure, delay) %>%
#  pivot_wider(names_from = block, values_from = pc)
#
#covM <- cov(wide2[, c(3, 5, 7, 9)])
#cor(select(wide2, B1:B4))

So how does weighting work in this case?
\[V = N_X S_X^{-1}\]
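Here, as far as I can tell, N_X is the number of participants in the group and S_X is the covariance matrix of the within-subject condition scores. A toy computation with invented data (none of these numbers come from the delay experiment) shows how correlated conditions produce non-zero off-diagonal weights:

```r
set.seed(1)
n <- 50
# invented within-subject data: 4 blocks per participant, plus a
# shared participant effect that makes the blocks correlate
scores <- matrix(rnorm(n * 4), n, 4) + rnorm(n)
S <- cov(scores)   # 4 x 4 covariance matrix of the conditions
V <- n * solve(S)  # weight matrix: V = N * S^-1
round(V, 1)        # note the non-zero off-diagonal entries
```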
Actually, weights are displayed in stacmr via:
V <- stats$rule.based$weights
W <- stats$information.integration$weights
V
           [,1]      [,2]       [,3]       [,4]        [,5]       [,6]
[1,]  3192.7243 -1210.688   436.2617  -356.4355     0.00000     0.0000
[2,] -1210.6877  3460.623 -1937.4585  -669.5800     0.00000     0.0000
[3,]   436.2617 -1937.459  4287.8355 -2319.4907     0.00000     0.0000
[4,]  -356.4355  -669.580 -2319.4907  3033.3542     0.00000     0.0000
[5,]     0.0000     0.000     0.0000     0.0000  4909.21477 -2498.4605
[6,]     0.0000     0.000     0.0000     0.0000 -2498.46049  3232.6705
[7,]     0.0000     0.000     0.0000     0.0000   287.22054 -1204.0883
[8,]     0.0000     0.000     0.0000     0.0000   -74.88846  -379.6758
           [,7]        [,8]
[1,]     0.0000     0.00000
[2,]     0.0000     0.00000
[3,]     0.0000     0.00000
[4,]     0.0000     0.00000
[5,]   287.2205   -74.88846
[6,] -1204.0883  -379.67584
[7,]  2727.5721 -1349.75598
[8,] -1349.7560  1780.38691

W
          [,1]       [,2]       [,3]       [,4]       [,5]        [,6]
[1,] 6023.1736  -954.9243  -777.7281   226.2976     0.0000     0.00000
[2,] -954.9243  4904.9758 -1456.5032 -1186.2243     0.0000     0.00000
[3,] -777.7281 -1456.5032  4180.2981 -1616.1973     0.0000     0.00000
[4,]  226.2976 -1186.2243 -1616.1973  3192.4727     0.0000     0.00000
[5,]    0.0000     0.0000     0.0000     0.0000  5917.7959 -2643.97092
[6,]    0.0000     0.0000     0.0000     0.0000 -2643.9709  3345.70934
[7,]    0.0000     0.0000     0.0000     0.0000  -307.5252 -1229.73039
[8,]    0.0000     0.0000     0.0000     0.0000  -204.0576   -12.65251
           [,7]        [,8]
[1,]     0.0000     0.00000
[2,]     0.0000     0.00000
[3,]     0.0000     0.00000
[4,]     0.0000     0.00000
[5,]  -307.5252  -204.05759
[6,] -1229.7304   -12.65251
[7,]  4018.6958 -2498.85536
[8,] -2498.8554  2771.05065

Now, we are able to reproduce the correct fit value:
     [,1]
[1,]    0

          [,1]
[1,] 0.1721286

The two values are summed, but the first is 0, so the second value should be the fit of the MR model:

mr_d1$fit
[1] 0.1721286

Now, if you compare this with a primitive procedure that weights only by the inverse of the variance of the means, you will actually get a worse fit when you apply the correct weighting afterwards for the fit calculation:
          [,1]
[1,] 0.1757721

        [,1]     [,2]     [,3]     [,4]     [,5]      [,6]     [,7]     [,8]
[1,] 1241.479 385.4362 166.5004 179.6204 478.3379 -16.75768 114.7728 112.6563

So it actually matters, and this also explains the strange movement of points that are already monotonic. Since the values depend on each other, they can all move, even if it looks like they are already in perfect position.
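The fit value itself is just a weighted sum of squared deviations between observed and model means. A toy sketch with invented numbers (not the actual delay data, and a stand-in weight matrix rather than one computed from real scores):

```r
x    <- c(0.28, 0.30, 0.32, 0.31)  # observed means (invented)
xhat <- c(0.28, 0.30, 0.31, 0.31)  # monotone model means (invented)
W    <- diag(4) * 3000             # stand-in weight matrix
W[1, 2] <- W[2, 1] <- -1000        # within designs add off-diagonal weights
d   <- x - xhat                    # deviations from the model
fit <- drop(t(d) %*% W %*% d)      # weighted SSE
fit                                # approximately 0.3
```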
Now, I once tried to teach myself linear algebra, but that covered just the basics without much application to statistics, so I do not fully understand what is happening here. ChatGPT told me that the sampling variance of a linear combination is \[ c^T \Sigma c,\] where c is the contrast and Sigma the covariance matrix. If you check this with a simple t-test example, it actually works out:
\[ \bar{X} = \begin{bmatrix} \bar X_A\\[4pt] \bar X_B \end{bmatrix} = \begin{bmatrix} 10\\[4pt] 12 \end{bmatrix}. \]

\[ \Sigma = \begin{bmatrix} \mathrm{Var}(\bar X_A) & \mathrm{Cov}(\bar X_A,\bar X_B)\\[6pt] \mathrm{Cov}(\bar X_B,\bar X_A) & \mathrm{Var}(\bar X_B) \end{bmatrix} = \begin{bmatrix} 0.1333 & 0.1667\\[6pt] 0.1667 & 0.3000 \end{bmatrix}. \]

\[ c = \begin{bmatrix} 1\\[4pt] -1 \end{bmatrix}. \]

\[ D = c^{T}\bar{X} = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 10\\[4pt] 12 \end{bmatrix} = -2. \]

\[ \mathrm{Var}(D) = c^{T}\Sigma c = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 0.1333 & 0.1667\\[4pt] 0.1667 & 0.3000 \end{bmatrix} \begin{bmatrix} 1\\[4pt] -1 \end{bmatrix}. \]

\[ \Sigma c = \begin{bmatrix} 0.1333 - 0.1667\\[4pt] 0.1667 - 0.3000 \end{bmatrix} = \begin{bmatrix} -0.0334\\[4pt] -0.1333 \end{bmatrix}, \qquad c^{T}(\Sigma c) = -0.0334 + 0.1333 = 0.10. \]

\[ \mathrm{SE}(D) = \sqrt{\mathrm{Var}(D)} = \sqrt{0.10} = 0.316. \]

\[ t = \frac{D}{\mathrm{SE}(D)} = \frac{-2}{0.316} = -6.33. \]
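The same computation in R, using the numbers from the example above (the means and covariances are invented for illustration):

```r
xbar  <- c(10, 12)                       # group means
Sigma <- matrix(c(0.1333, 0.1667,
                  0.1667, 0.3000), 2, 2) # covariance matrix of the means
cvec  <- c(1, -1)                        # contrast: A - B
D     <- drop(cvec %*% xbar)             # difference of means: -2
varD  <- drop(t(cvec) %*% Sigma %*% cvec) # sampling variance of D: 0.0999
t_val <- D / sqrt(varD)
round(t_val, 2)                          # -6.33
```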
The cool thing about the weight matrix is that it can always be used. If you have a pure between design, the weight matrix will only have values on the diagonal: the covariances might not always be 0, but the pairwise n is 0 between groups. If you have a within design, the covariance matrix and the N-matrix will typically have non-zero entries everywhere. And in a mixed design like this one, some entries of the covariance matrix will be 0 and some not, which produces the block-diagonal weight matrices shown above. How cool is that?
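A sketch of that block structure with invented numbers: each independent group contributes its own N * S^-1 block, and all entries between groups are 0.

```r
# invented per-group covariance matrices of two within-subject conditions
S1 <- matrix(c(0.02, 0.01, 0.01, 0.03), 2, 2)
S2 <- matrix(c(0.04, 0.02, 0.02, 0.05), 2, 2)
W1 <- 40 * solve(S1)  # weight block for group 1, n = 40
W2 <- 45 * solve(S2)  # weight block for group 2, n = 45

# combined weight matrix across both groups is block-diagonal
W <- matrix(0, 4, 4)
W[1:2, 1:2] <- W1
W[3:4, 3:4] <- W2
round(W, 1)           # zeros between groups, non-zeros within groups
```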