Produce the results for ‘Towards the definition and measurement of routines and their impact on cognitive control’
This notebook provides a computationally reproducible record of the analysis and figure generation for the paper ‘Towards a normative theory of routines’ (or whatever we wind up deciding to call it).
Settings and other things
This notebook assumes you have the following file structure. Note that the pre-existing csv files were generated by the code in this repository, and the .Rda file was generated in the previous dopamine study using code in this repository.
routines_produce-results/
│ ├── _quarto.yml
│ ├── *.Rproj
│ ├── this-qmd-doc.qmd
│ ├── R/
│ │ └── all-r-scripts.R
│ ├── data-wrangled/
│ │ ├── exp_[exp_str]_evt.csv
│ │ ├── exp_[exp_str]_avg.csv
│ │ ├── exp_lt_maggi-k4.csv
│ │ └── dat4_seq_model.Rda ### need to find origin of this file and make consistent across locations
│ ├── figs/
│ ├── sims/
│ ├── res/
│ ├── analysis-output/
First, load required packages and set the relative paths for data and other required things…
Code
options(tidyverse.quiet = TRUE)
library(tidyverse)
library(grid)
library(gridExtra)
library(knitr)
library(magick)
library(ggpubr)
library(vioplot)
library(rstatix)
library(emmeans)
library(afex)
library(pdftools)
library(purrr)
library(GGally)

data_path = 'data-wrangled/' # for all data derivs
sim_data_path = 'sims/' # for simulation results
fig_path = 'figs/' # for figures
res_path = 'res/' # for inferential results

function_loc <- "R" # where are the functions?
req_functions <- list.files(function_loc)
sapply(req_functions, function(x) source(paste(here::here(function_loc), x, sep="/")))

# now some font settings
library(extrafont)
# font_import() # run this once only, comment out after first time
loadfonts(device='pdf')
Q1. Can we produce routines in a consistent way across sessions and conditions?
Here, I calculate the routine (R) scores for each participant, generate density plots of those scores over experiments and conditions, and then print out the analysis of the basic differences between conditions, across replications.
Compute TE Scores
First I compute TE scores for each participant. Note that these shall now mostly be referred to as R scores within this document.
Code
apply_sort_data <- function(exp_str, data_path){
  # take the trial level data from session 2 and sort into contexts. We assume a person
  # believes they are in the same context until their first 'cc' hit after a context switch.
  data_fname <- paste(data_path, 'exp_', exp_str, '_evt.csv', sep='')
  save_fname <- paste(data_path, 'exp_', exp_str, '_door_selections_for_ent.csv', sep='')
  ent_initial_data_sort(data_fname, save_fname)
}
lapply(c('lt', 'ts'), apply_sort_data, data_path=data_path)

apply_compute_R <- function(exp_str){
  # take the data output from the step above, and compute the routine score for each
  # participant and context
  data_fname <- paste(data_path, 'exp_', exp_str, '_door_selections_for_ent.csv', sep='')
  # for saving R scores not averaged across context. Note that original plotting code in
  # the function below, now commented out, shows that
  save_new_data_fname <- paste(data_path, 'exp_', exp_str, '_rscore-full.csv', sep='')
  save_sum_data_fname <- paste(data_path, 'exp_', exp_str, '_rscore.csv', sep='')
  ent_compute_routine_measure(data_fname, save_new_data_fname, save_sum_data_fname)
}
lapply(c('lt', 'ts'), apply_compute_R)
Now that we have the R scores, we can investigate whether we elicit R scores greater than you would expect by chance, and whether we are measuring something reliable in humans; we also want to get a feel for the behaviour that we are quantifying.
I will make 3 subplots: one showing the z-scores for humans vs agents, one showing the reliability of the measure over sessions, and one showing the trajectories for 2 participants. These will then be put together into one big figure for the manuscript.
Trajectories
Note that the below ‘function’ contains hard coded, handpicked subsets of trials, to show the trajectories in performance.
Code
exp_str <- 'lt'
traj_data_fname <- paste(data_path, 'exp_', exp_str, '_evt.csv', sep="")
traj_fig_fname <- paste(fig_path, 'trajectories', sep='')
w <- 10 # width of plot, in cm
traj_plt <- plot_trajectories(traj_data_fname, traj_fig_fname, w)
Now we've created it, let's display it -
Fig 1: Example trajectories across doors
R scores are systematically higher than you would expect by chance
The next thing we want to demonstrate is that the R scores are statistically higher than you would expect by chance. To achieve this, we first generate a null distribution of door selections for each subject, and then obtain a z-score from their observed R score and the mean and sd of the null. We will compare this to zero, and to the performance of a perfectly performing, random agent.
First, to make the nulls, we take their observed data, and resample it a thousand times, under the constraint that the same door can’t be sampled twice.
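As an illustration of this resampling scheme (a hedged sketch, not the repository's `generate_nulls`; `doors` and `stat_fn` are hypothetical stand-ins for one participant's ordered door selections and whatever routine statistic is computed on a sequence):

```r
# Sketch of building a permutation null for one participant's door sequence.
sample_no_repeat <- function(doors) {
  # shuffle the selections until no door repeats back-to-back
  repeat {
    perm <- sample(doors)
    if (all(perm[-1] != perm[-length(perm)])) return(perm)
  }
}

make_null <- function(doors, stat_fn, n_perm = 1000) {
  # null distribution of the statistic under constrained resampling
  replicate(n_perm, stat_fn(sample_no_repeat(doors)))
}

set.seed(42)
doors <- c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3)                 # toy door sequence
stat_fn <- function(x) mean(x[-1] == x[-length(x)] + 1)  # toy 'routineness' statistic
nulls <- make_null(doors, stat_fn, n_perm = 200)
z <- (stat_fn(doors) - mean(nulls)) / sd(nulls)          # observed vs null
```

The z-score then summarises how far the observed statistic sits from its constrained-permutation null.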
Note that evaluation of the code block below is set to false, as it takes a while to generate the null distributions.
Code
set.seed(42) # for reproducibility is the meaning of life
exp_strs = c('lt', 'ts')
apply_generate_nulls <- function(exp_str){
  door_selections_fname = paste(data_path, 'exp_', exp_str, '_door_selections_for_ent.csv', sep='')
  r_dat_fname = paste(data_path, 'exp_', exp_str, '_rscore-full.csv', sep='')
  null_rs_fname = paste(data_path, 'exp_', exp_str, '_rnulls.csv', sep='') # for saving
  generate_nulls(door_selections_fname, r_dat_fname, null_rs_fname)
}
lapply(exp_strs, apply_generate_nulls)
Now that we have the null Rs, we can compute a z-score for each participant, based on their observed R and the null distribution. We will save the z-scores for plotting, and save the results of t-tests against zero and against -2 to a csv file. Note that when the below was originally run, visual inspection of the data showed some outliers, so I removed participants with a z-score more than 3 standard deviations from the mean.
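The trimming rule just described can be sketched as follows (`mu_z` is an assumed name for the per-participant z-score column; toy data):

```r
# Sketch: drop participants whose z-score lies > 3 SDs from the group mean.
trim_z_outliers <- function(df, col = "mu_z", n_sd = 3) {
  z <- df[[col]]
  keep <- abs(z - mean(z)) <= n_sd * sd(z)
  df[keep, , drop = FALSE]
}

dat <- data.frame(sub = 1:20, mu_z = c(seq(-1, 1, length.out = 19), 50))
cln <- trim_z_outliers(dat)  # the extreme value (50) is removed
```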
Let's look at those stats!
Code
#| label: distribution of z scores (routine score against permuted null)
#| tbl-cap: "distribution of z scores (routine score against permuted null)"
z_stats <- do.call(rbind, lapply(c('lt', 'ts'), function(x)
  read.csv(paste(res_path, 'exp_', x, '_zs_cl_inf-test.csv', sep=''), header=T)))
kable(z_stats %>% select(!data.name), digits=2)
| t | df | p | conf.int1 | conf.int2 | M | null.value.mean | SE | alternative | method | id |
|---|----|---|-----------|-----------|---|-----------------|----|-------------|--------|----|
| -6.69 | 97 | 0 | -Inf | -12.82 | -17.06 | 0.00 | 2.55 | less | One Sample t-test | zero |
| -5.92 | 97 | 0 | -Inf | -12.82 | -17.06 | -1.96 | 2.55 | less | One Sample t-test | p95 |
| -8.61 | 97 | 0 | -Inf | -8.69 | -10.77 | 0.00 | 1.25 | less | One Sample t-test | zero |
| -7.04 | 97 | 0 | -Inf | -8.69 | -10.77 | -1.96 | 1.25 | less | One Sample t-test | p95 |
Now we want to visualise the z-scores, but we will also compare them to the performance of a perfectly performing but perfectly random agent, whose choices are constrained in the same way as the participant nulls (i.e. the same door cannot be picked twice in a row).
Note that the below code block does not run, owing to the computation time required to generate the nulls.
Now let's plot the histograms of humans vs the random agent. Note this code produces a labelled pdf (humans v agent) for talks, and a pared-back pdf for the manuscript.
Fig 2: Showing z-scores for humans and random agent
Next, we can compare the mean z-score of the humans to that of the random agent - i.e. what is the probability that the mean z-score of the humans comes from the null distribution of the random agent?
Code
exp_strs <- c('lt', 'ts')
human_zs <- lapply(exp_strs, function(x)
  read.csv(paste(data_path, 'exp_', x, '_zs_cl.csv', sep=''), header=T))
# this will load an n-length vector called 'zs', where n = the number of times a null
# was generated for the random agent
load(paste(sim_data_path, 'random-agent_z-score-analysis.Rds', sep=''))
human_v_agent_zs <- sapply(human_zs, function(x) (mean(x[,'mu_z']) - mean(zs))/sd(zs))
names(human_v_agent_zs) <- exp_strs
human_v_agent_ps <- sapply(human_v_agent_zs, pnorm)
names(human_v_agent_ps) <- exp_strs
# make a dataframe of the results and save as a csv
hum_z_dat <- tibble(exp = exp_strs,
                    mu = c(round(mean(zs), 2), NA),
                    sd = c(round(sd(zs), 2), NA),
                    zs = human_v_agent_zs,
                    ps = human_v_agent_ps)
# round to 2 dp
hum_z_dat[,c("zs", "ps")] <- apply(hum_z_dat[,c("zs", "ps")], 2, round, 2)
write.csv(hum_z_dat, file=paste(res_path, 'human_v_agent_zs.csv', sep=''), row.names=FALSE)
# print out the results
sprintf('the probability that the human data comes from the null distribution is p = %.2f for the lt exp, and p = %.2f for the ts exp',
        round(human_v_agent_ps['lt'], 2), round(human_v_agent_ps['ts'], 2))
[1] "the probability that the human data comes from the null distribution is p= 0.00 for the lt exp, and p= 0.00 for the ts exp"
Routine scores are reliable across sessions
Now that we know we elicit routines above what you would expect by chance, we next ask, using the data from the dopamine study, how reliable are these routines within an individual?
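As an illustration of the logic (not necessarily the internals of `reliability_analysis`), test-retest reliability of this kind boils down to correlating each participant's score from one session with their score from the other, and reporting r, its CI, and the associated t-test. A sketch on simulated wide-form data:

```r
# Sketch: test-retest reliability as a Pearson correlation across two sessions.
set.seed(1)
wf <- data.frame(sub = 1:40, ses1 = rnorm(40, mean = 10, sd = 2))
wf$ses2 <- wf$ses1 + rnorm(40, sd = 1)  # session 2 tracks session 1, plus noise

rel <- cor.test(wf$ses1, wf$ses2)
out <- data.frame(r  = unname(rel$estimate),
                  l  = rel$conf.int[1],
                  u  = rel$conf.int[2],
                  t  = unname(rel$statistic),
                  df = unname(rel$parameter),
                  p  = rel$p.value)
```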
Code
# note: this datafile was generated by this project - https://github.com/garner-code/DA_VisRoutes
# and copied over to this project for the current analysis
rel_data_fname <- paste(data_path, 'dat4_seq_model.Rda', sep='')
rel_data_save_name <- paste(data_path, 'rel_rs_wf.csv', sep='') # save the r's from da and placebo in wideform
rel_analysis_save_name <- paste(res_path, 'da_reliability_analysis.csv', sep='')
reliability_analysis(rel_data_fname, rel_data_save_name, rel_analysis_save_name)
# now we've made the data file, report the results of the reliability analysis
rel <- read.csv(rel_analysis_save_name)
fmt = 'the measure demonstrates some reliability, r = %.2f, 95 CI[%.2f, %.2f], t(%.0f) = %.2f, p = %.2f'
sprintf(fmt, rel$r, rel$l, rel$u, rel$df, rel$t, rel$p)
Now let’s see the correlation in the data -
Code
# note the 'r' is for reliability
r_p_wdth <- z_p_wdth/2 # plot width of ms plot, in cm - it's half the width of the z-score plot
r_p_hgt <- z_p_hgt
r_col = col_scheme[3]
fig_lab = 'C'
rel_data_save_name = rel_data_save_name
rel_plt_fname = paste(fig_path, 'reliability_data', sep='')
plot_rel(r_p_wdth, r_p_hgt, r_col, fig_lab, rel_data_save_name, rel_plt_fname)
Fig 3: Correlation of routine scores between sessions
Now I can put Figs 1, 2 and 3 together to make one figure. I'm going to use the magick package, as we are dealing with pdfs.
Code
# fnms = c(traj_fig_fname, ms_z_plt_fname, rel_plt_fname)
fnms = c(traj_fig_fname, ms_z_plt_fname)
ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))
im_info = lapply(ims, image_info)
top_width = im_info[[2]]$width # + im_info[[3]]$width
bottom_width = im_info[[1]]$width
bottom_centered <- image_extent(ims[[1]],
                                geometry = paste0(top_width, "x", im_info[[1]]$height),
                                gravity = "north",
                                color = "none")
# multi <- c(ims[[2]]) %>%
#   image_append() %>%
#   c(bottom_centered) %>% # might want to adjust the size of this
#   image_append(stack=TRUE)
multi <- c(ims[[2]], ims[[1]]) %>% image_append(stack=TRUE)
traj_hght = (w/6*2) + .5 # get the height of the traj plot
pdf(paste(fig_path, 'FigA.pdf', sep=''),
    width = z_p_wdth + 1, # + r_p_wdth,
    height = z_p_hgt + traj_hght)
plot(multi)
dev.off()
Can we systematically manipulate how routine someone is?
Now we want to demonstrate that, over our two experiments, our training manipulation of stable vs less stable contexts makes people more or less routine. We then want to locate the source of disruption to the routine (i.e. what kind of responses differ between the two groups).
How did the training manipulation affect routine scores over the two experiments?
To answer this, we need to visualise the routine scores by training group, across the two experiments.
Code
# first, I need to get the routine scores matched with the training group scores
lapply(c('lt', 'ts'), get_r_info_and_save, data_path=data_path)
col_scheme = c("#ef8a62", "#999999")
plot_r_train(p_wdth = z_p_wdth,
             p_hgt = z_p_hgt,
             cols = col_scheme,
             x_rng = c(0, 40),
             y_rng = c(0, 25),
             exp_strs = exp_strs,
             data_path = data_path,
             r_by_g_fname = paste(fig_path, 'rs_by-traintype_hists', sep=''))
Fig 4: How training condition impacted routine scores
As predicted, there is a systematic difference between training conditions. Now we can test the difference between them. Note that as the scores are clearly skewed, we will log transform the R scores prior to performing a t-test to compare the two groups in each experiment.
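A sketch of the comparison just described, on simulated skewed scores (hypothetical data; Welch's t-test is what R's `t.test` runs by default for two groups):

```r
# Sketch: log-transform skewed R scores, then t-test between training groups.
# rlnorm gives the right-skew described above; parameters are illustrative.
set.seed(7)
r_scores <- data.frame(
  train_type = rep(c('stable', 'variable'), each = 50),
  r = c(rlnorm(50, meanlog = 1.5, sdlog = 0.5),
        rlnorm(50, meanlog = 1.0, sdlog = 0.5))
)
res <- t.test(log(r) ~ train_type, data = r_scores)  # Welch's test by default
```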
The results are pleasingly consistent across both experiments, so I will show the inferential outcomes in a table -
Code
kable(grp_r_comp, digits=2)
comparing routines by groups
| t | df | p | mu_diff | mu_l | mu_u | d | d_l | d_u | dv | exp |
|---|----|---|---------|------|------|---|-----|-----|----|-----|
| 3.39 | 97.44 | 0 | -0.47 | -0.74 | -0.19 | 0.68 | 1.17 | 0.32 | log r | lt |
| 3.69 | 96.67 | 0 | -0.47 | -0.72 | -0.21 | 0.74 | 1.17 | 0.34 | log r | ts |
What kinds of responses account for the differences in routine observed between the groups, across the two experiments?
This is where we look at the task-jumps measure between the two groups. Here I create a group x trial type dataframe and draw and save the boxplots.
Code
# first, get the task jump data - note that I am not saving it as a csv,
# as it already exists with all the relevant info in the _avg csv
get_jump_data <- function(exp_str, data_path){
  tmp <- read.csv(paste(data_path, 'exp_', exp_str, '_avg.csv', sep='')) %>%
    filter(ses == 2) %>%
    select(sub, train_type, context, switch, context_changes) %>%
    group_by(sub, train_type, switch) %>%
    summarise(jumps = mean(context_changes)) %>%
    ungroup()
  tmp$exp = exp_str
  tmp
}
jumps <- do.call(rbind, lapply(exp_strs, get_jump_data, data_path = data_path))
Note that general errors are low, and that information can be called in from ….
It looks like there is an interaction between group and switch (in fact we know there is, from Emily & Dinuk's analysis). Here I replicate their results:
Code
# first set the factors to be factors
jumps$sub <- as.factor(jumps$sub)
jumps$train_type <- as.factor(jumps$train_type)
levels(jumps$train_type) <- c('stable', 'variable')
jumps$switch <- as.factor(jumps$switch)
levels(jumps$switch) <- c('stay', 'switch')
# now run the ANOVA model
# set emmeans option to multivariate
afex_options(emmeans_model = "multivariate")
# perform the statistical model
exp_strs <- unique(jumps$exp)
tj_aovs <- lapply(exp_strs, function(x)
  aov_ez("sub", "jumps", jumps %>% filter(exp == x),
         within = "switch", between = "train_type"))
Contrasts set to contr.sum for the following variables: train_type
Code
names(tj_aovs) <- exp_strs
# convert the anovas to a dataframe, save the results and show as a table
convert_aov_to_df <- function(aovl, exp_str){
  ref = aovl$anova_table
  tibble(exp = rep(exp_str, length(ref$F)),
         effect = rownames(ref),
         numDF = ref$`num Df`,
         denDF = ref$`den Df`,
         Fstat = round(ref$F, 2),
         ges = round(ref$ges, 2),
         p = round(ref$`Pr(>F)`, 2))
}
tj_aov_dat <- do.call(rbind, lapply(1:length(exp_strs),
                                    function(x) convert_aov_to_df(tj_aovs[[x]], exp_strs[x])))
tj_aov_dat$id <- with(tj_aov_dat, paste(exp, effect, sep="_"))
write.csv(tj_aov_dat, paste(res_path, 'task-jumps_aov.csv', sep=''), row.names=FALSE)
Code
#| label: impact of condition on routines
#| tbl-cap: "impact of switch condition on routines by group"
kable(tj_aov_dat, digits=2)
| exp | effect | numDF | denDF | Fstat | ges | p | id |
|-----|--------|-------|-------|-------|-----|---|----|
| lt | train_type | 1 | 98 | 8.35 | 0.08 | 0.00 | lt_train_type |
| lt | switch | 1 | 98 | 5.51 | 0.00 | 0.02 | lt_switch |
| lt | train_type:switch | 1 | 98 | 24.36 | 0.01 | 0.00 | lt_train_type:switch |
| ts | train_type | 1 | 98 | 11.07 | 0.10 | 0.00 | ts_train_type |
| ts | switch | 1 | 98 | 0.26 | 0.00 | 0.61 | ts_switch |
| ts | train_type:switch | 1 | 98 | 20.21 | 0.01 | 0.00 | ts_train_type:switch |
Now I need to follow up the interactions, and grab the estimated marginal means so I can report the main effects.
Getting the desc stats on the main effect of group:
Code
# first I get the between group effect
me_grp <- lapply(tj_aovs, function(x) summary(emmeans(x, 'train_type')))
me_grp <- do.call(rbind, me_grp)
me_grp$exp <- rep(exp_strs, each = length(unique(me_grp$train_type)))
names(me_grp) <- c('group', 'emmean', 'se', 'df', 'l', 'u', 'exp')
me_grp$id <- with(me_grp, paste(exp, group, sep="_"))
cols2round <- c("emmean", "se", "l", "u")
me_grp[,cols2round] <- apply(me_grp[,cols2round], 2, round, 2)
write.csv(me_grp, paste(res_path, '/task-jumps_ph_me-grp.csv', sep=''))
Code
#| label: main effect of group - estimated marginal means
#| tbl-cap: "main effect of group - estimated marginal means"
kable(me_grp, digits=2)
| group | emmean | se | df | l | u | exp | id |
|-------|--------|----|----|---|---|-----|----|
| stable | 0.40 | 0.13 | 98 | 0.13 | 0.67 | lt | lt_stable |
| variable | 0.95 | 0.13 | 98 | 0.68 | 1.22 | lt | lt_variable |
| stable | 0.48 | 0.15 | 98 | 0.19 | 0.78 | ts | ts_stable |
| variable | 1.18 | 0.15 | 98 | 0.88 | 1.47 | ts | ts_variable |
And get the mean estimates for each switch by group condition -
Code
# and now I test the following
simp_fx <- lapply(tj_aovs, function(x) list(summary(emmeans(x, 'switch', by='train_type'))))
sw_by_tt <- do.call(rbind, lapply(simp_fx, function(x) summary(x[[1]])))
sw_by_tt$exp <- rep(c('lt', 'ts'), each = 4)
names(sw_by_tt) <- c('group', 'switch', 'emmean', 'se', 'df', 'l', 'u', 'exp')
sw_by_tt$id <- with(sw_by_tt, paste(exp, group, switch, sep="_"))
sw_by_tt[,cols2round] <- apply(sw_by_tt[,cols2round], 2, round, 2)
write.csv(sw_by_tt, file=paste(res_path, 'task-jumps_ph_se-sw.csv', sep=''), row.names=FALSE)
Code
#| label: estimated means for each level of group x switch
#| tbl-cap: "estimated means for each level of group x switch"
kable(sw_by_tt)
| group | switch | emmean | se | df | l | u | exp | id |
|-------|--------|--------|----|----|---|---|-----|----|
| stable | stay | 0.35 | 0.14 | 98 | 0.08 | 0.63 | lt | lt_stable_stay |
| stable | switch | 0.44 | 0.13 | 98 | 0.18 | 0.71 | lt | lt_stable_switch |
| variable | stay | 1.08 | 0.14 | 98 | 0.80 | 1.35 | lt | lt_variable_stay |
| variable | switch | 0.82 | 0.13 | 98 | 0.56 | 1.09 | lt | lt_variable_switch |
| stable | stay | 0.35 | 0.14 | 98 | 0.07 | 0.63 | ts | ts_stable_stay |
| stable | switch | 0.62 | 0.16 | 98 | 0.30 | 0.94 | ts | ts_stable_switch |
| variable | stay | 1.28 | 0.14 | 98 | 1.00 | 1.56 | ts | ts_variable_stay |
| variable | switch | 1.07 | 0.16 | 98 | 0.75 | 1.39 | ts | ts_variable_switch |
Decide whether to add further post-hoc info.
We can conclude at this point that frequent switching makes people more variable in the order in which they perform their behaviours, and that this might be because they are jumping between, and mixing up, the two tasks more frequently. We also want to rule out general confusion; to do this, we look at responses that come from neither task. Note, this figure can go in the supplemental info.
Fig 5: People in the two groups showed the same levels of out of task errors
Now I analyse errors in the same way as I did task jumps. Note that after correcting for multiple tests, only the main effect of switch is consistently significant across the two experiments.
Code
# first set the factors to be factors
errors$sub <- as.factor(errors$sub)
errors$train_type <- as.factor(errors$train_type)
levels(errors$train_type) <- c('stable', 'variable')
errors$switch <- as.factor(errors$switch)
levels(errors$switch) <- c('stay', 'switch')
# perform the statistical model
exp_strs <- unique(errors$exp)
er_aovs <- lapply(exp_strs, function(x)
  aov_ez("sub", "errors", errors %>% filter(exp == x),
         within = "switch", between = "train_type"))
Contrasts set to contr.sum for the following variables: train_type
#| label: effect of switch on errors
#| tbl-cap: "effect that switch has on general errors"
kable(me_sw_er, digits=2)
| switch | emmean | se | df | l | u | exp | id |
|--------|--------|----|----|---|---|-----|----|
| stay | 0.03 | 0.01 | 98 | 0.01 | 0.04 | lt | lt_stay |
| switch | 0.04 | 0.01 | 98 | 0.02 | 0.05 | lt | lt_switch |
| stay | 0.03 | 0.01 | 98 | 0.02 | 0.05 | ts | ts_stay |
| switch | 0.05 | 0.01 | 98 | 0.04 | 0.07 | ts | ts_switch |
OK, so now we know that the variable group showed less consistent behaviours in terms of order, but comparable accuracy to the stable group (the test is well powered and we can’t discern a group difference).
So being in the variable group impacts your routines. The next question is why?
Why did the stable vs variable manipulation impact routines?
The key thing we want to know is, did we mess up people’s routines because they were probability matching to context, instead of tracking the probability of success, given the context?
Using Bayes theorem, we compute the probability that you are still in context A, given that you have observed that n doors from A, and m doors from B, do not have a target behind them. We can then use this probability to compute the probability of success on your next go, given the number of doors left in each context.
Using Bayes theorem, we can cast the probability that you are still in \(C_A\), given you have observed n and m false doors:
\[P(C = A | n, m) = \frac{ P(n,m|C=A)P(C=A)}{P(n,m)}\]
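Spelling out the denominator over the two possible contexts makes the computation concrete; this is just the standard expansion of Bayes' rule for the two-context case:

\[
P(C = A | n, m) = \frac{P(n,m|C=A)P(C=A)}{P(n,m|C=A)P(C=A) + P(n,m|C=B)P(C=B)}
\]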
Once we have the probability of being in context A, given \(n\) and \(m\), we can compute the probability of success (\(S\)) for staying in that context, and for moving to the other context, by multiplying the posterior probability of still being in context A by the probability of success given the number of target doors left in A, where \(n_{\mathrm{max}}\) is the total number of \(n\) that can be chosen (i.e. the number of target doors in the set).

Note that we ignore repetitions when calculating these trial by trial probabilities.
We want to see how well each set of probabilities accounts for task jumps, over and above what can be accounted for by the number of switches a participant experiences (i.e. with increasing switches comes increasing chance of confusion), or the switch rate (how often a switch occurs on average, given experience).
Switches are the cumulative sum of experienced switches - i.e. how often you found a target in a context that is different to the last context where you found a target:
\[
\mathrm{Sw} = \sum_{i=1}^{t} s(i)
\]

where \(s(i) = 1\) if target \(i\) was found in a different context to target \(i-1\), and 0 otherwise.
And switch rate is simply the probability of a switch, given the number of targets you have found:
\[
\mathrm{Sw_{r}} = \frac{\mathrm{Sw}}{t}
\]
where t is the total number of targets found so far.
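Given the definitions above, both regressors reduce to a cumulative sum and a running rate. A minimal sketch on a toy context sequence (`ctx` is a hypothetical vector of the context in which each successive target was found):

```r
# Sketch: cumulative switches (Sw) and switch rate (Swr) over found targets.
ctx <- c('A', 'A', 'B', 'B', 'A', 'A', 'A', 'B')
s <- c(0, ctx[-1] != ctx[-length(ctx)])  # s(i) = 1 when target i changes context
Sw <- cumsum(s)                          # cumulative experienced switches
t_found <- seq_along(ctx)                # targets found so far
Swr <- Sw / t_found                      # switch rate
```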
First, let's get the data in shape, starting with the switch and switch rate variables -
Code
get_sw_info <- function(exp_str){
  dat <- read.csv(paste(data_path, 'exp_', exp_str, '_evt.csv', sep='')) %>%
    filter(ses == 2) %>%
    select(sub, train_type, t, context, door, door_cc, door_oc, door_nc, switch)
  #################################################
  # add Sw and Swr regressors for each subject
  subs <- unique(dat$sub)
  dat <- do.call(rbind, lapply(subs, get_Sw, dat=dat))
  dat$exp = exp_str
  dat
}
dat <- do.call(rbind, lapply(exp_strs, get_sw_info))
Now I will get \(p(C = A | n, m)\) and \(p(S|C,n)\). The function I call is a bit of a beast, but hopefully it's sufficiently commented that it's easy enough to follow what is happening. Basically, for each trial, I calculate:
\(p(C = A) = \frac{1}{t} \sum_{i=1}^{t} (C_i == C_{i-1})\)
For each trial, I then count the number of n's and m's. Note that n and m are characterised according to the last place you found a target. E.g. if I just found a target from CA, but the true state of the world is now CB, any selection from CB would be counted as an m, as I have no evidence yet to tell me it's an n. Note that each n and m is only counted once (duplicates are not counted).
For the cumulative counts of n and m on each trial, I assign the probability of that many nulls, given you are in the context to which those doors belong. Note the equation below is defined for n, but I also do the same for \(p(m|C=B)\)
\[
p(n | C=A) = \frac{n_{max} - n}{n_{max}}
\]
This allows me to compute, based on trial by trial experience, \(p(C = A | n, m)\) and \(p(S|C=A, n,m)\). At the end, I then shift all the information down 1 row, as the current outcome should affect the next behaviour. For each regressor, I then take the odds by dividing by \(p(C=B|n,m)\) and \(p(S|C = B,n,m)\). I then use the odds variables in the subsequent logistic regressions.
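To make this concrete, here is a sketch of a single trial's posterior and its odds, under one simplified reading of the above (likelihoods as defined in the equation for \(p(n|C=A)\), with nulls from the non-current context treated as uninformative; `posterior_A`, `n_max` and `m_max` are hypothetical names, and this is not necessarily the exact scheme in `get_p_context`):

```r
# Sketch: posterior p(C = A | n, m) for one trial, and its odds form.
# n_max / m_max = total target doors per context; p_A = prior p(C = A).
posterior_A <- function(n, m, n_max, m_max, p_A) {
  lik_A <- (n_max - n) / n_max  # p(n | C = A), as defined above
  lik_B <- (m_max - m) / m_max  # p(m | C = B), by the same logic
  # Bayes rule over the two contexts
  p_A * lik_A / (p_A * lik_A + (1 - p_A) * lik_B)
}

p_stay <- posterior_A(n = 2, m = 0, n_max = 4, m_max = 4, p_A = 0.8)
odds_A <- p_stay / (1 - p_stay)  # odds form, as used in the logistic regressions
```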
Code
get_pcanm_info <- function(exp_str, dat){
  #################################################
  # add the p(C=A|n,m) regressors for each subject
  subs <- unique(dat$sub)
  dat <- do.call(rbind, lapply(subs, get_p_context, dat=dat))
  dat
}
dat <- do.call(rbind, lapply(exp_strs, function(x) get_pcanm_info(x, dat %>% filter(exp == x))))
write.csv(dat, paste(data_path, 'evt-dat_4log-reg.csv', sep=''), row.names=FALSE)
rm(dat)
Now that I have the model-based regressors, I use them in a logistic regression to predict \(m\) selections for each individual. Note that I scale the predictors before running the model.
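As a sketch of what each first-level fit looks like (simulated data and hypothetical column names, not the actual regressors in `evt-dat_4log-reg.csv`):

```r
# Sketch: per-participant logistic regression on scaled model-based regressors.
set.seed(3)
n <- 200
trial_dat <- data.frame(
  odds_scs  = rlnorm(n),                 # stand-in for the p(success) odds
  odds_cntx = rlnorm(n),                 # stand-in for the p(context) odds
  Sw        = cumsum(rbinom(n, 1, 0.1))  # cumulative switches
)
trial_dat$Swr <- trial_dat$Sw / seq_len(n)  # switch rate
# simulate other-context (m) selections that depend on the context odds
trial_dat$m_sel <- rbinom(n, 1, plogis(-1 + 0.8 * scale(trial_dat$odds_cntx)))

fit <- glm(m_sel ~ scale(odds_scs) + scale(odds_cntx) + scale(Sw) + scale(Swr),
           data = trial_dat, family = binomial)
betas <- coef(fit)  # first-level parameters, taken forward to the group level
```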
Note that I played with plotting the individual-level fits, and while I was happy the models were doing a good enough job of capturing individual variance, I am yet to find a satisfying way to plot the individual model fits (owing to the multiple continuous predictors). So I will skip straight to the second-level analysis, where we take the parameters from the logistic regression models and compare them between groups.
Code
# some notes re: cleaning. For some participants, the algorithm did not converge, or the
# estimated probabilities could not be distinguished between 0 and 1. Eyeballing the
# boxplots, these subjects largely fell outside > 1.5 x the IQR (although sometimes the
# IQR excluded extra participants for whom there was no warning). Removing outliers
# > 1.5 x IQR seems the most objective way to treat the data, to me.
apply_clean_and_analyse_betas <- function(exp_str, data_path, res_path){
  betas <- read.csv(paste(data_path, 'betas_', exp_str, '_first-level.csv', sep=''))
  betas <- apply_outlier_filter_to_all_vars(betas)
  # save the new data file, as it has extra variables
  write.csv(betas, file=paste(data_path, 'betas_', exp_str, '_first-level_cln.csv', sep=''))
  apply_t_tests_to_all_vars(betas, exp_str, res_path) # applies inferential t-tests and writes the results to csv files in res_path
}
lapply(c('lt', 'ts'), apply_clean_and_analyse_betas, data_path = data_path, res_path = res_path)
Now I am going to collapse across the two experiments to see if the p(success|task) holds up:
I don't present the beta inferential tests here, as there are so many, but the take-home is: all betas are significantly above zero; the difference in mu between the two groups is clearly significant, which we already knew; there is some glimpse of an impact of p(scs|cntxt) in each experiment; and it is the only significant variable (other than mu) when both experiments are aggregated.
The next step is to plot the beta coefficients by group (and analysis) -
Code
# now that we have analysed the betas
# plan is to do box plots:
# top row = coefficients by parameter, by group # lt
# second row = coefficients by parameter, by group # ts
make_coefs_plts_4paper_andtlks(data_path,
                               paste(fig_path, 'betas_scnd-lvl', sep=''),
                               p_wdth = 12,
                               p_hgt = 12)
Now let's see the results:
Figure 2: Showing the betas by group and parameter
So it looks like all of the tested factors contributed to the increase in other-context door selections, and this differed by group (groups which also differed in routine). It looks like the variable group was generally more confused than the stable group, with a small suggestion that they may also have been matching to the probability of the context.
As an exploratory plot, I plot routine scores against beta coefficients, to check that the correlations between them behave as we expect - this is possibly one for the supplemental.
Code
exp_str = 'lt'
r_dat = read.csv(paste(data_path, 'exp_', exp_str, '_rscore.csv', sep=''))
betas = read.csv(paste(data_path, 'betas_', exp_str, '_first-level_cln.csv', sep=''))
betas_r <- inner_join(betas %>% select(c(sub, train_type, ends_with('_flt'))), r_dat, by='sub')
betas_r <- betas_r %>% na.omit() %>% mutate(r = log(r))
# first I'll check for multivariate outliers
mhl.mat <- as.matrix(betas_r[,c('mu_flt', 'scs_flt', 'cntx_flt', 'r')])
mhl.cov <- cov(mhl.mat) # here I get the covariance matrix
mhl.dist <- mahalanobis(mhl.mat, colMeans(mhl.mat), mhl.cov) # now calc the M dist
# cut-off: upper chi-square quantile with df = number of variables
cut_off <- qchisq(.999, df = ncol(mhl.mat))
# hist(mhl.dist)
sum(mhl.dist > cut_off) # count of multivariate outliers
What this plot tells us is that the more the probability of context increases your log odds of hitting a door from the other context, the less the probability of success given context influences the probability of hitting a door from the other context. This is good: it tells us that the parameters trade off against each other in a way we would expect. (Note that I don't include sw and swr, for visual clarity. Comfortingly, the correlations between these parameters and the others were not significant, nor were their correlations with the routine scores.)
To understand the relationship between R scores and the remaining predictors, here is an optional lm to run. I am not concluding much from it, as everything is calculated on the same data, and it tells us that we can't trust the direction of the first-order correlations with r, so I am not going to over-interpret direction here. But we do see that these predictors correlate with the routine score.
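A sketch of the kind of lm meant here, with hypothetical `_flt` column names mirroring the cleaned betas file and simulated data:

```r
# Sketch: regress log routine score on the (cleaned) first-level betas.
set.seed(11)
betas_r <- data.frame(mu_flt = rnorm(60), scs_flt = rnorm(60), cntx_flt = rnorm(60))
betas_r$r <- exp(0.5 * betas_r$scs_flt + rnorm(60, sd = 0.5))  # toy dependence
fit <- lm(log(r) ~ mu_flt + scs_flt + cntx_flt, data = betas_r)
summary(fit)$coefficients
```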
Does the formation of routines attenuate capacity to transfer learning, or promote task switching performance?
Now the cherry on top. We ask whether the extent to which you became routine (the lower your r score) predicts your difficulty being flexible, or corresponds to being better at task switching.
Learning transfer/flexibility
The first steps are to wrangle the routine and group x transfer task data, and to look at learning onset for each group and task as a boxplot. Note that from visual inspection of qqplots, I decided to log transform the data, to improve normality. Then I take the ratio of the scores from each task, correlate it with the routine score, and run an inferential test. Last, I add mean accuracy from training, to see if that changes the effect (note to self: maybe do a path analysis if required).
Code
# wrangle data
r_dat <- read.csv(paste(data_path, 'exp_lt_rscore.csv', sep=''))
onset_dat <- read.csv(paste(data_path, 'exp_lt_maggi-k4.csv', sep='')) %>%
  select(sid, ses, transfer, k4_onset) %>%
  filter(ses == 3) %>%
  mutate(transfer = recode(transfer, `1`='comp', `2`='part')) %>%
  pivot_wider(names_from=transfer, values_from=k4_onset) %>%
  mutate(k4 = part / (comp + part)) # proportion makes it more normal
names(onset_dat)[names(onset_dat) == "sid"] = "sub"
# some participants never learned, so we lose anyone who has a k4 of NaN,
# as that means their score was Inf
onset_dat <- onset_dat %>% na.omit() %>% filter(is.finite(comp))
# this gets rid of most, but still one to get rid of
# now join to the r dat
onset_dat <- inner_join(onset_dat, r_dat, by='sub')
# now I log my variables, and I load the avg data so that I can get the training
# group variable
onset_dat <- onset_dat %>% mutate(log_r = log(r))
grp_inf <- read.csv(paste(data_path, 'exp_lt_avg.csv', sep='')) %>%
  filter(ses == 3) %>%
  select(sub, train_type) %>%
  unique()
onset_dat <- inner_join(onset_dat, grp_inf, by='sub') %>%
  mutate(train_type = recode(train_type, `1`='stable', `2`='variable'))
rm(grp_inf)
# Now I have the data, I want to replace any outliers with NA
dvs <- c('comp', 'part', 'k4', 'log_r')
onset_dat <- cbind(onset_dat,
                   do.call(cbind, lapply(dvs, remove_outliers, betas=onset_dat)))
Code
rem <- do.call(cbind, lapply(onset_dat, function(x) sum(is.na(x))))
rem <- rem[colnames(rem) %in% c("k4_flt", "log_r_flt")]
rem <- tibble(var = c("k4_flt", "log_r_flt"), rem = rem)
write.csv(rem, paste(res_path, 'exp_lt_k4_r_outs-rem.csv', sep=''), row.names = FALSE)
# and last, I want to get the accuracy/general error data from the training phase
trn_dat <- read.csv(paste(data_path, 'exp_lt_avg.csv', sep='')) %>%
  filter(ses == 2) %>%
  select(sub, switch, accuracy, setting_errors, general_errors) %>%
  group_by(sub) %>%
  summarise(across(where(is.numeric), ~mean(.x, na.rm=T)))
onset_dat <- inner_join(onset_dat, trn_dat, by="sub")
write.csv(onset_dat, paste(data_path, 'exp_lt_onsetwr.csv', sep=''), row.names=FALSE)
Now that I have the wrangled data, I will make three subplots and then combine them. The first shows the main effect of complete vs. partial transfer, the second the percentage difference by group, and the third the correlation between r and the k4 bias measure.
Code
# first, I make the comp/part data longer so I can plot it
tmp <- onset_dat %>%
  select(sub, comp, part, train_type) %>%
  pivot_longer(cols = c('comp', 'part'), values_to = 'k4_on',
               names_to = 'transfer')
col_scheme = c('#ccba72ff', '#4c5454ff')
k4_by_grp_fname = paste(fig_path, 'k4_by_grp', sep='')
trn_bp_wdth = z_p_wdth * .8
trn_bp_hgt = z_p_hgt
plt_tran_bp_4paper_andtlks(k4_by_grp_fname, trn_bp_wdth, trn_bp_hgt, tmp,
                           'k4_on ~ transfer*train_type', col_scheme,
                           'Onset', c(0, 200), fig_lab = 'A')
Knowledge onset of 4 doors in the transfer tasks, by group.
Transfer bias score, by group.
Now, combine the boxplot and the bias scores into one figure.
Code
fnms = c(paste(fig_path, 'k4_by_grp', sep=''),
         paste(fig_path, 'bias-by-grp', sep=''))
ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))
multi <- c(ims[[1]]) %>% image_append() %>% c(ims[[2]]) %>% image_append()
plt_hgt = trn_bp_hgt # get the height of the traj plot
plt_wdth = trn_bp_wdth + bs_wdth
pdf(paste(fig_path, 'k4_bias.pdf', sep=''), width = plt_wdth, height = plt_hgt)
plot(multi)
dev.off()
And now the last piece: the correlation between the bias score and the logged routine variability score.
Transfer bias correlates with routine score.
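The correlation panel itself is drawn by a helper in `R/` that is not echoed here. A rough base-graphics approximation of that panel, using simulated stand-in data (the notebook uses the wrangled `onset_dat`; the helper's styling, fonts, and axis limits are not reproduced):

```r
# Simulated stand-in data; the real figure uses onset_dat$log_r and
# onset_dat$k4 from the wrangling chunk above.
set.seed(2)
log_r <- rnorm(40)
k4    <- 0.5 + 0.05 * log_r + rnorm(40, sd = 0.05)

# Scatter of logged routine score against transfer bias, with a
# least-squares line overlaid.
plot(log_r, k4, pch = 19,
     xlab = 'log(r)', ylab = 'transfer bias (k4)')
abline(lm(k4 ~ log_r))
```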
Now I put the transfer bias scores and the correlation together into one figure for the paper, with the group boxplot going to the supplement.
Code
fnms = c(k4_by_grp_fname, bias_by_grp_fname, lt_bias_corr_fname)
ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))
# decided to put bottom in paper and top in supplemental
bottom <- image_append(c(ims[[2]], ims[[3]]))
pdf(paste(fig_path, 'LT_results.pdf', sep=''))
plot(bottom)
dev.off()
Code
# multi <- image_append(c(ims[[1]], bottom), stack=TRUE)
pdf(paste(fig_path, 'LT_supp.pdf', sep=''), width = trn_bp_wdth,
    height = bs_hgt * 2)
plot(ims[[1]])
dev.off()
Ahhh. Done. Now for the stats, I only want the difference between complete and partial transfer, the difference in the bias between groups, and the correlation between r and the bias.
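Those three tests run in a chunk that is not echoed here. As a hedged sketch of their shape, using simulated data in place of `onset_dat` and plain base-R tests (the actual analysis may use the outlier-filtered `_flt` columns and model-based tests via afex/emmeans instead):

```r
set.seed(1)
# Simulated stand-in for onset_dat; the real analysis uses the wrangled data.
onset_dat <- data.frame(
  comp       = rnorm(40, 100, 20),   # complete-transfer onsets
  part       = rnorm(40, 120, 20),   # partial-transfer onsets
  train_type = rep(c('stable', 'variable'), each = 20),
  log_r      = rnorm(40)             # logged routine score
)
onset_dat$k4 <- with(onset_dat, part / (comp + part))  # transfer bias

# 1. complete vs partial transfer (within-subject)
paired_res <- t.test(onset_dat$comp, onset_dat$part, paired = TRUE)

# 2. bias difference between training groups (between-subject)
group_res <- t.test(k4 ~ train_type, data = onset_dat)

# 3. correlation between routine score and transfer bias
corr_res <- cor.test(onset_dat$log_r, onset_dat$k4)
```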
`summarise()` has grouped output by 'sub', 'switch'. You can override using the
`.groups` argument.
Code
# now I temporarily make a wider format dataframe so I can make the bias scores
tmp <- ts_dat %>%
  pivot_wider(id_cols = c(sub, train_type), names_from = switch,
              values_from = c(se, rt, acc)) %>%
  mutate(se_bias  = se_switch / (se_stay + se_switch),
         rt_bias  = rt_switch / (rt_stay + rt_switch),
         acc_bias = acc_switch / (acc_stay + acc_switch))
# now join key variables to the other dataframe and remove the tmp one
ts_dat <- inner_join(ts_dat, tmp %>% select(sub, se_bias, rt_bias, acc_bias),
                     by = 'sub')
rm(tmp)
# now join to the r dat and log the r variable
ts_dat <- inner_join(ts_dat, r_dat, by = 'sub') %>%
  mutate(log_r = log(r))
# Now I have the data, I want to replace any outliers with NA
# Note that filtering setting errors resulted in a lot of dropped data, so I am
# not going to use the filtered versions of that variable
dvs <- c('se', 'rt', 'acc', 'se_bias', 'rt_bias', 'acc_bias', 'log_r')
ts_dat <- cbind(ts_dat,
                do.call(cbind, lapply(dvs, remove_outliers, betas = ts_dat)))
Now I make three plots, very similar to those for learning transfer, but I have to do it twice for the task-switching data: once for setting errors and once for RT.
And now I put these figures together for the paper.