Transition Entropy and Routines

Produce the results for ‘Towards the definition and measurement of routines and their impact on cognitive control’

This notebook provides a computationally reproducible record of the analysis and figure generation for the paper ‘Towards a normative theory of routines’ (or whatever we wind up deciding to call it).

Settings and other things

This notebook assumes you have the following file structure. Note that the pre-existing csv files were generated by the code in this repository, and the .Rda file was generated by code in the previous dopamine study's repository.

routines_produce-results/
├── _quarto.yml
├── *.Rproj
├── this-qmd-doc.qmd
├── R/
│   └── all-r-scripts.R
├── data-wrangled/
│   ├── exp_[exp_str]_evt.csv
│   ├── exp_[exp_str]_avg.csv
│   ├── exp_lt_maggi-k4.csv
│   └── dat4_seq_model.Rda ### need to find origin of this file and make consistent across locations
├── figs/
├── sims/
├── res/
└── analysis-output/

First, load required packages and set the relative paths for data and other required things…

Code
options(tidyverse.quiet = TRUE)
library(tidyverse)
library(grid)
library(gridExtra)
library(knitr)
library(magick)
library(ggpubr)
library(vioplot)
library(rstatix)
library(emmeans)
library(afex)
library(pdftools)
library(purrr)
library(GGally)

data_path = 'data-wrangled/' # for all data derivs
sim_data_path = 'sims/' # for simulation results
fig_path = 'figs/' # for figures
res_path = 'res/' # for inferential results

function_loc <- "R" # where are the functions?
req_functions <- list.files(function_loc)
sapply(req_functions, function(x) source(paste(here::here(function_loc),
                                               x, sep="/")))

# # now some font settings
library(extrafont)
#font_import() # run this once only, comment out after first time
loadfonts(device='pdf')

Q1. Can we produce routines in a consistent way across sessions and conditions?

Here, I calculate the routine (R) scores for each participant, generate density plots of those scores over experiments and conditions, and then print out the analysis of the basic differences between conditions, across replications.

Compute TE Scores

First, I compute TE scores for each participant. Note that these will mostly be referred to as R scores within this document.

Code
apply_sort_data <- function(exp_str, data_path){
    # take the trial level data from session 2 and sort into contexts. We assume a person believes they are in the same context until their first 'cc' hit after a context switch.
  data_fname <- paste(data_path, 'exp_', exp_str,
                       '_evt.csv', sep='')
  save_fname <- paste(data_path, 'exp_', exp_str,
                          '_door_selections_for_ent.csv', sep='')
  ent_initial_data_sort(data_fname, save_fname)
}
lapply(c('lt', 'ts'), apply_sort_data, data_path=data_path)

apply_compute_R <- function(exp_str){
# take the data output from the step above, and compute the routine score for each participant and context
  data_fname <- paste(data_path, 'exp_', exp_str,
                      '_door_selections_for_ent.csv', sep='')
  save_new_data_fname <- paste(data_path, 'exp_', exp_str,
                               '_rscore-full.csv', sep='') # for saving R scores not averaged across context (the original plotting code in the function below is now commented out)
  save_sum_data_fname <- paste(data_path, 'exp_', exp_str,
                               '_rscore.csv', sep='')
  ent_compute_routine_measure(data_fname, save_new_data_fname, save_sum_data_fname)
}

lapply(c('lt', 'ts'), apply_compute_R)
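The scoring itself happens inside ent_compute_routine_measure() (in R/), which is not shown here. As a rough illustration only (the exact definition lives in the repository's scripts; the helper name and the use of first-order transitions are my assumptions), a transition-entropy style score over a door-selection vector could look like this:

```r
# Hypothetical illustration only: the real R score is computed by
# ent_compute_routine_measure() in R/. This sketch shows one way a
# transition-entropy style measure could work on a door-selection vector.
transition_entropy <- function(doors){
  # pairs of successive selections
  from <- head(doors, -1)
  to   <- tail(doors, -1)
  # joint counts -> conditional transition probabilities
  tab     <- table(from, to)
  p_trans <- prop.table(tab, margin = 1) # each row sums to 1
  p_from  <- prop.table(table(from))     # marginal over 'from' doors
  # entropy of each row, weighted by how often that row occurs
  row_ent <- apply(p_trans, 1, function(p){
    p <- p[p > 0]
    -sum(p * log2(p))
  })
  sum(p_from[names(row_ent)] * row_ent)
}

# a perfectly routine (fully repeated) sequence has zero transition entropy
transition_entropy(rep(c(1, 2, 3, 4), times = 10)) # 0
```

Lower values indicate more stereotyped (routine) transitions between doors; a random sequence gives a strictly positive score.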

Now that we have the R scores, we can investigate whether we elicit R scores greater than you would expect by chance, whether the measure captures something reliable in humans, and get a feel for the behaviour we are quantifying.

I will make 3 subplots: one showing the z-scores for humans vs agents, one showing the reliability of the measure over sessions, and one showing the trajectories for 2 participants. These will then be put together into one big figure for the manuscript.

Trajectories

Note that the ‘function’ below contains hard-coded, handpicked subsets of trials, to show the trajectories in performance.

Code
exp_str <- 'lt'
traj_data_fname <- paste(data_path, 'exp_', exp_str, '_evt.csv', sep="")
traj_fig_fname <- paste(fig_path, 'trajectories', sep='')
w <- 10 # width of plot, in cm
traj_plt <- plot_trajectories(traj_data_fname, traj_fig_fname, w)

Now we’ve created it, let’s display it -

Fig 1: example trajectories across doors

R scores are systematically higher than you would expect by chance

The next thing we want to demonstrate is that the R scores are statistically higher than you would expect by chance. To achieve this, we first generate a null distribution of door selections for each subject, then obtain a z-score from their observed R score and the mean and sd of the null. We will compare this to zero, and to the performance of a perfectly performing, random agent.

First, to make the nulls, we take their observed data and resample it a thousand times, under the constraint that the same door can’t be sampled twice in a row.

Note that the evaluation of the code block below is set to false, as it takes a while to get the null distributions

Code
set.seed(42) # for reproducibility is the meaning of life
exp_strs = c('lt', 'ts')

apply_generate_nulls <- function(exp_str){
  door_selections_fname = paste(data_path, 'exp_', exp_str, '_door_selections_for_ent.csv', sep='')
  r_dat_fname = paste(data_path, 'exp_', exp_str, '_rscore-full.csv', sep='')
  null_rs_fname = paste(data_path, 'exp_', exp_str, '_rnulls.csv', sep='') # for saving
  generate_nulls(door_selections_fname,
                 r_dat_fname,
                 null_rs_fname)
}
lapply(exp_strs, apply_generate_nulls)
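generate_nulls() does the heavy lifting and is defined in R/. As a minimal sketch of the constrained resampling described above (resample_no_repeat is a hypothetical helper, not the repository's implementation):

```r
# Hypothetical sketch: the idea is to reshuffle a participant's observed
# door selections, with the constraint that the same door is never drawn
# twice in a row. generate_nulls() in R/ does the real work.
resample_no_repeat <- function(doors){
  out  <- numeric(length(doors))
  pool <- doors
  prev <- NA
  for (i in seq_along(out)){
    ok <- if (is.na(prev)) seq_along(pool) else which(pool != prev)
    if (length(ok) == 0) return(resample_no_repeat(doors)) # dead end: restart
    j <- if (length(ok) == 1) ok else sample(ok, 1)
    out[i] <- pool[j]
    prev   <- pool[j]
    pool   <- pool[-j]
  }
  out
}

set.seed(42)
null_seq <- resample_no_repeat(rep(1:4, times = 5))
any(diff(null_seq) == 0) # FALSE: no immediate repeats
```

Repeating this a thousand times and scoring each resampled sequence would give the per-participant null distribution of R.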

Now that we have the null Rs, we can compute a z-score for each participant, based on their observed R and the null distribution. We will save the z-scores for plotting, and save the results of t-tests against zero and against -2 to a csv file. Note that when the below was originally run, visual inspection of the data showed some outliers, so I removed participants with a z-score more than 3 standard deviations above or below the mean.

Code
exp_strs <- c('lt','ts')

z_comps <- function(exp_str, data_path){
  null_rs_fname = paste(data_path, 'exp_', exp_str, '_rnulls.csv', sep='') # for saving
  zs_out_fname = paste(data_path, 'exp_', exp_str, '_zs_cl.csv', sep='')
  zs_stats_fname = paste(res_path, 'exp_', exp_str, '_zs_cl_inf-test.csv', sep='')
  ent_compute_zs(null_rs_fname, zs_out_fname, zs_stats_fname)
}  

lapply(exp_strs, z_comps, data_path=data_path)
`summarise()` has grouped output by 'sub'. You can override using the `.groups`
argument.
`summarise()` has grouped output by 'sub'. You can override using the `.groups`
argument.
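ent_compute_zs() is defined in R/; the core of the z-score logic it implements might be sketched as follows (r_to_z and the toy data are illustrative assumptions, not the repository's code):

```r
# Sketch of the z-score logic: each participant's observed R is
# standardised against their own null distribution, then the z's are
# tested against zero across participants.
r_to_z <- function(r_obs, r_null){
  (r_obs - mean(r_null)) / sd(r_null)
}

# toy example: 20 'participants' whose observed R sits below their null
set.seed(7)
zs <- sapply(1:20, function(s){
  null <- rnorm(1000, mean = 5, sd = 1) # stand-in null R distribution
  r_to_z(r_obs = 3, r_null = null)
})
t.test(zs, mu = 0, alternative = "less")$p.value < .05 # TRUE
```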

Let’s look at those stats!

Code
z_stats <- do.call(rbind, lapply(c('lt', 'ts'), 
                function(x) read.csv(paste(res_path, 'exp_', x, '_zs_cl_inf-test.csv', sep=''), 
                                     header=T)))


#| label: distribution of z scores (routine score against permuted null)
#| tbl-cap: "distribution of z scores (routine score against permuted null)"
kable(z_stats %>% select(!data.name), digits=2)
t df p conf.int1 conf.int2 M null.value.mean SE alternative method id
-6.69 97 0 -Inf -12.82 -17.06 0.00 2.55 less One Sample t-test zero
-5.92 97 0 -Inf -12.82 -17.06 -1.96 2.55 less One Sample t-test p95
-8.61 97 0 -Inf -8.69 -10.77 0.00 1.25 less One Sample t-test zero
-7.04 97 0 -Inf -8.69 -10.77 -1.96 1.25 less One Sample t-test p95

Now we want to visualise the z-scores, but we will also compare them to the performance of a perfectly performing but perfectly random agent, whose choices are constrained in the same way as the participant nulls (i.e. the same door cannot be picked twice in a row).

Note that the below code block does not run, owing to the comp time required to generate the nulls.

Code
agent_save_name <- paste(sim_data_path, 'random-agent_z-score-analysis.Rds', sep='')
get_random_agent_null(n_samples=1000, save_name = agent_save_name)

Now let’s plot the histograms of humans vs the random agent. Note this code produces a labelled pdf (humans v agent) for talks, and a pared-back pdf for the manuscript.

Code
exp_strs <- c('lt', 'ts')
col_scheme <- c('#1b9e77','#d95f02', '#7570b3')
names(col_scheme) <- c(exp_strs, "agent")
sims_fname = paste(sim_data_path, 'random-agent_z-score-analysis.Rds', sep='')
ms_z_plt_fname = paste(fig_path, 'z-hsts_ms', sep='')
tlk_z_plt_fname = paste(fig_path, 'z-hsts_tlk', sep='')
z_p_wdth <- 10 # plot width of ms plot, in cm
z_p_hgt <- z_p_wdth*(6/10)
produce_z_plts(exp_strs,
               col_scheme,
               z_p_wdth,
               z_p_hgt,
               data_path,
               sims_fname,
               ms_z_plt_fname,
               tlk_z_plt_fname)
png 
  2 

Fig 2: Showing z-scores for humans and random agent

Next, we can compare the mean z-score of the humans to that of the random agent - i.e. what is the probability that the mean z-score of the humans comes from the null distribution of the random agent?

Code
exp_strs <- c('lt', 'ts')
human_zs <- lapply(exp_strs, function(x) read.csv(paste(data_path,
                                                        'exp_', x,
                                                        '_zs_cl.csv', 
                                                        sep=''), header=T))

# this will load an n-length vector called 'zs', where n = the number of times a null was generated for the random agent
load(paste(sim_data_path, 'random-agent_z-score-analysis.Rds', sep=''))

human_v_agent_zs <- sapply(human_zs, function(x) (mean(x[,'mu_z']) - mean(zs))/sd(zs))
names(human_v_agent_zs) <- exp_strs
human_v_agent_ps <- sapply(human_v_agent_zs, pnorm)
names(human_v_agent_ps) <- exp_strs

# make a dataframe of the results and save as a csv
hum_z_dat <- tibble(exp = exp_strs,
                    mu = c(round(mean(zs),2), NA),
                    sd = c(round(sd(zs),2), NA),
                    zs = human_v_agent_zs,
                    ps = human_v_agent_ps)
# this needs to be rounded to 2 dp.
hum_z_dat[,c("zs", "ps")] <- apply(hum_z_dat[,c("zs", "ps")], 2, round, 2)
write.csv(hum_z_dat, file=paste(res_path, 'human_v_agent_zs.csv', sep=''),
          row.names=FALSE)

# print out the result
sprintf('the probability that the human data comes from the null distribution is p= %.2f for the lt exp, and p= %.2f for the ts exp', round(human_v_agent_ps['lt'],2), round(human_v_agent_ps['ts'],2))
[1] "the probability that the human data comes from the null distribution is p= 0.00 for the lt exp, and p= 0.00 for the ts exp"

Routine scores are reliable across sessions

Now that we know we elicit routines above what you would expect by chance, we next ask, using the data from the dopamine study, how reliable are these routines within an individual?

Code
rel_data_fname <- paste(data_path, 'dat4_seq_model.Rda', sep='') # note: this datafile was generated by the DA_VisRoutes project (https://github.com/garner-code/DA_VisRoutes) and copied over to this project for the current analysis
rel_data_save_name <- paste(data_path, 'rel_rs_wf.csv', sep='') # save the r's from da and placebo in wide form
rel_analysis_save_name <- paste(res_path, 'da_reliability_analysis.csv', sep='')
reliability_analysis(rel_data_fname, rel_data_save_name,
                     rel_analysis_save_name)

# now we've made the data file, report the results of the reliability analysis
rel <- read.csv(rel_analysis_save_name)
fmt = 'the measure demonstrates some reliability, r = %.2f, 95%% CI[%.2f, %.2f], t(%.0f) = %.2f, p = %.2f'
sprintf(fmt, rel$r, rel$l, rel$u, rel$df, rel$t, rel$p)
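reliability_analysis() is defined in R/; a minimal sketch of the kind of correlation it reports, on made-up wide-form data (the column names here are assumptions):

```r
# Sketch of a test-retest reliability analysis: a Pearson correlation
# between each participant's R score in the two sessions. The real
# analysis is done by reliability_analysis() in R/.
set.seed(11)
true_r <- rnorm(50, mean = 10, sd = 3)        # latent routine tendency
wf <- data.frame(sub    = 1:50,
                 r_ses1 = true_r + rnorm(50), # session 1 measurement
                 r_ses2 = true_r + rnorm(50)) # session 2 measurement
rel <- cor.test(wf$r_ses1, wf$r_ses2)
sprintf('r = %.2f, 95%% CI[%.2f, %.2f], t(%.0f) = %.2f, p = %.3f',
        rel$estimate, rel$conf.int[1], rel$conf.int[2],
        rel$parameter, rel$statistic, rel$p.value)
```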

Now let’s see the correlation in the data -

Code
# note the 'r' is for reliability
r_p_wdth <- z_p_wdth/2 # plot width of ms plot, in cm - its half the width of the z-score plot
r_p_hgt <- z_p_hgt
r_col = col_scheme[3]
fig_lab = 'C'
rel_plt_fname = paste(fig_path, 'reliability_data', sep='')
plot_rel(r_p_wdth, r_p_hgt, r_col,
         fig_lab, 
         rel_data_save_name,
         rel_plt_fname)

Fig 3: Correlation of routine scores between sessions

Now I can put Figs 1, 2 and 3 together to make one figure. I’ll use the magick package, as we are dealing with pdfs.

Code
# fnms = c(traj_fig_fname, ms_z_plt_fname, rel_plt_fname)
fnms = c(traj_fig_fname, ms_z_plt_fname)

ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))

im_info = lapply(ims, image_info)
top_width = im_info[[2]]$width #+ im_info[[3]]$width
bottom_width = im_info[[1]]$width

bottom_centered <- image_extent(ims[[1]],
                                geometry = paste0(top_width, "x",
                                                  im_info[[1]]$height),
                                gravity = "north",
                                color = "none")


# multi <- c(ims[[2]]) %>%
#          image_append() %>%
#         c(bottom_centered) %>% # might want to adjust the size of this
#          image_append(stack=TRUE)
multi <- c(ims[[2]], ims[[1]]) %>%
         image_append(stack=TRUE)
traj_hght = (w/6*2)+.5 # get the height of the traj plot
pdf(paste(fig_path, 'FigA.pdf', sep=''), width=z_p_wdth+1, # +r_p_wdth,
    height=z_p_hgt+traj_hght)
plot(multi)
dev.off()

Can we systematically manipulate how routine someone is?

Now we want to demonstrate that, over our two experiments, our training manipulation of stable vs less stable contexts makes people more or less routine. We then want to locate the source of the disruption to the routine (i.e. what kinds of responses differ between the two groups).

How did the training manipulation affect routine scores over the two experiments?

To answer this, we need to visualise the routine scores by training group, across the two experiments.

Code
# first, I need to get the routine scores matched with the training group scores
lapply(c('lt', 'ts'), get_r_info_and_save, data_path=data_path)
col_scheme = c("#ef8a62","#999999")
plot_r_train(p_wdth = z_p_wdth, 
             p_hgt = z_p_hgt, 
             cols = col_scheme, 
             x_rng = c(0, 40),
             y_rng = c(0, 25),
             exp_strs = exp_strs,
             data_path = data_path,
             r_by_g_fname = paste(fig_path, 'rs_by-traintype_hists', sep=''))

Fig 4: How training condition impacted routine scores

As predicted, there is a systematic difference between training conditions. Now we can test the difference between them. Note that as the scores are clearly skewed, we will log transform the R scores prior to performing a t-test to compare the two groups in each experiment.

Code
get_rdat_and_label <- function(exp_str, data_path){
  rdat <- get_dat(exp_str, data_path)
  rdat$exp = exp_str
  rdat
}

rdat <- do.call(rbind, lapply(exp_strs, get_rdat_and_label, data_path = data_path))

do_grp_t <- function(rdat, exp_str){
  with(rdat %>% filter(exp == exp_str),
       t.test(log(r) ~ train_type))
}

get_cohens <- function(rdat, exp_str){
  tmp = rdat %>% filter(exp == exp_str) %>%
          mutate(logr = log(r))
  cohens_d(data = tmp,
           formula = logr~train_type,
           ci = TRUE,
           ci.type="perc",
           nboot = 1000)
}

grp_comps <- lapply(exp_strs, do_grp_t, rdat=rdat)
grp_cohens <- lapply(exp_strs, get_cohens, rdat=rdat)

prnt_t <- function(t_res, d_res){
  
  tibble(t = abs(round(t_res$statistic, 2)),
         df = round(t_res$parameter, 2),
         p = round(t_res$p.value, 3),
         mu_diff = round(t_res$estimate[1] - t_res$estimate[2], 2),
         mu_l = round(t_res$conf.int[1], 2),
         mu_u = round(t_res$conf.int[2], 2),
         d = abs(round(d_res$effsize, 2)),
         d_l = abs(round(d_res$conf.low, 2)),
         d_u = abs(round(d_res$conf.high, 2)),
         dv = 'log r')
}

grp_r_comp <- do.call(rbind, 
                      lapply(1:length(grp_comps),
                             function(x) 
                               prnt_t(grp_comps[[x]],
                                      grp_cohens[[x]])))
grp_r_comp$exp <- exp_strs

# now save as a csv file 
write.csv(grp_r_comp, paste(res_path, 'grp_r_ts.csv', sep=''), row.names=FALSE)

Now I need some descriptive statistics.

Code
rdat_desc <- rdat %>% group_by(exp, train_type) %>%
  summarise(M=mean(r),
            SD=sd(r),
            N=length(r),
            SE=SD/sqrt(N))
`summarise()` has grouped output by 'exp'. You can override using the `.groups`
argument.
Code
cols2rnd <- c("M", "SE")
rdat_desc[cols2rnd] <- apply(rdat_desc[cols2rnd], 2, round, 2)
rdat_desc$id = paste(rdat_desc$exp, rdat_desc$train_type, sep="_")

write.csv(rdat_desc, paste(res_path, 'grp_r_desc-stats.csv', sep=''), row.names=FALSE)

The results are pleasingly consistent across both experiments, so will show the inferential outcomes in a table -

Code
kable(grp_r_comp, digits=2)
comparing routines by groups
t df p mu_diff mu_l mu_u d d_l d_u dv exp
3.39 97.44 0 -0.47 -0.74 -0.19 0.68 1.17 0.32 log r lt
3.69 96.67 0 -0.47 -0.72 -0.21 0.74 1.17 0.34 log r ts

What kinds of responses account for the differences in routine observed between the groups, across the two experiments?

This is where we look at the task-jumps measure between the two groups. Here I create a group x trial type dataframe and draw and save the boxplots.

Code
# first, get the task jump data - note that I am not saving it as a csv,
# as it already exists with all the relevant info in the _avg csv
get_jump_data <- function(exp_str, data_path){
  
  tmp <- read.csv(paste(data_path, 'exp_', exp_str,
                        '_avg.csv', sep='')) %>%
    filter(ses == 2) %>%
    select(sub, train_type, context, switch, context_changes) %>%
    group_by(sub, train_type, switch) %>% 
    summarise(jumps=mean(context_changes)) %>%
    ungroup()
  tmp$exp = exp_str
  tmp
}

jumps <- do.call(rbind, lapply(exp_strs, get_jump_data,
                               data_path = data_path))
`summarise()` has grouped output by 'sub', 'train_type'. You can override using
the `.groups` argument.
`summarise()` has grouped output by 'sub', 'train_type'. You can override using
the `.groups` argument.
Code
j_wdth = z_p_wdth
j_hgt = z_p_hgt
col_scheme = c('#7570b3', '#e7298a')

gen_jumps_plot(jumps,
               'jumps ~ switch*train_type', 
               exp_strs,
               col_scheme,
               j_wdth, j_hgt,
               paste(fig_path, 'task-jumps', sep=''),
               fig_labs = c('A', 'B'),
               ylabel = 'Jumps',
               xlabel = 'Group',
               ylims=c(0,5))

Now lets see the fig in all its glory -

Fig 5: People in the frequent switch group broke up their routines to jump tasks

Now I will put the routine by group and the task jumps figure into one plot for the manuscript.

Code
fnms = c(paste(fig_path, 'rs_by-traintype_hists', sep=''),
         paste(fig_path, 'task-jumps', sep=''))

ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))

multi <- c(ims[[1]]) %>%
         image_append() %>%
        c(ims[[2]]) %>%
         image_append(stack=TRUE)

pdf(paste(fig_path, 'FigB.pdf', sep=''), width=j_wdth,
    height=j_hgt)
plot(multi)
dev.off()
png 
  2 

Note that general errors are low, and that information can be called in from ….

It looks like there is an interaction between group and switch (in fact we know there is, from Emily & Dinuk’s analysis). Here I replicate their results -

Code
# first set the factors to be factors
jumps$sub <- as.factor(jumps$sub)
jumps$train_type <- as.factor(jumps$train_type)
levels(jumps$train_type) <- c('stable','variable')
jumps$switch <- as.factor(jumps$switch)
levels(jumps$switch) <- c('stay','switch')

# now run the ANOVA model
# set emmeans option to multivariate
afex_options(emmeans_model = "multivariate")
# perform the statistical model
exp_strs <- unique(jumps$exp)
tj_aovs <- lapply(exp_strs, function(x) 
  aov_ez("sub", "jumps", jumps %>% filter(exp == x), 
         within = "switch",
         between = "train_type"))
Contrasts set to contr.sum for the following variables: train_type
Contrasts set to contr.sum for the following variables: train_type
Code
names(tj_aovs) <- exp_strs

# convert the anovas to a dataframe, save the results and show as a table
convert_aov_to_df <- function(aovl, exp_str){
  ref = aovl$anova_table
  tibble(exp = rep(exp_str, length(ref$F)),
         effect = rownames(ref),
         numDF = ref$`num Df`,
         denDF = ref$`den Df`,
         Fstat = round(ref$F, 2),
         ges = round(ref$ges, 2),
         p = round(ref$`Pr(>F)`, 2))
}
tj_aov_dat <- do.call(rbind, 
                      lapply(1:length(exp_strs),
                             function(x) convert_aov_to_df(tj_aovs[[x]],
                                                           exp_strs[x])))
tj_aov_dat$id <- with(tj_aov_dat, paste(exp, effect, sep="_"))
write.csv(tj_aov_dat, paste(res_path, 'task-jumps_aov.csv', sep=''), row.names=FALSE)
Code
#| label: impact of condition on routines
#| tbl-cap: "impact of switch condition on routines by group"

kable(tj_aov_dat, digits=2)
exp effect numDF denDF Fstat ges p id
lt train_type 1 98 8.35 0.08 0.00 lt_train_type
lt switch 1 98 5.51 0.00 0.02 lt_switch
lt train_type:switch 1 98 24.36 0.01 0.00 lt_train_type:switch
ts train_type 1 98 11.07 0.10 0.00 ts_train_type
ts switch 1 98 0.26 0.00 0.61 ts_switch
ts train_type:switch 1 98 20.21 0.01 0.00 ts_train_type:switch

Now I need to follow up the interactions, and grab the estimated marginal means so I can report the main effects.

Getting the desc stats on the main effect of group:

Code
# first I get the between group effect
me_grp <- lapply(tj_aovs, function(x) summary(emmeans(x, 'train_type')))
me_grp <- do.call(rbind, me_grp)
me_grp$exp <- rep(exp_strs, each=length(unique(me_grp$train_type)))
names(me_grp) <- c('group', 'emmean', 'se', 'df', 'l', 'u', 'exp')
me_grp$id <- with(me_grp, paste(exp, group, sep="_"))
cols2round <- c("emmean", "se", "l", "u")
me_grp[,cols2round] <- apply(me_grp[,cols2round], 2, round, 2)
write.csv(me_grp, paste(res_path, 'task-jumps_ph_me-grp.csv', sep=''), row.names=FALSE)
Code
#| label: main effect of group - estimated marginal means
#| tbl-cap: "main effect of group - estimated marginal means"
kable(me_grp, digits=2)
group emmean se df l u exp id
stable 0.40 0.13 98 0.13 0.67 lt lt_stable
variable 0.95 0.13 98 0.68 1.22 lt lt_variable
stable 0.48 0.15 98 0.19 0.78 ts ts_stable
variable 1.18 0.15 98 0.88 1.47 ts ts_variable

And get the mean estimates for each switch by group condition -

Code
# and now I test the following 
simp_fx <- lapply(tj_aovs, function(x) 
           list(summary(emmeans(x, 'switch', by='train_type'))))
sw_by_tt <- do.call(rbind, lapply(simp_fx, function(x) summary(x[[1]])))
sw_by_tt$exp <- rep(c('lt','ts'), each = 4)
names(sw_by_tt) <- c('group', 'switch', 'emmean', 
                     'se', 'df', 'l', 'u', 'exp')
sw_by_tt$id <- with(sw_by_tt, paste(exp, group, switch, sep="_"))
sw_by_tt[,cols2round] <- apply(sw_by_tt[,cols2round], 2, round, 2)
write.csv(sw_by_tt, file=paste(res_path, 'task-jumps_ph_se-sw.csv', sep=''), row.names=FALSE)
Code
#| label: estimated means for each level of group x switch 
#| tbl-cap: "estimated means for each level of group x switch"
kable(sw_by_tt)
group switch emmean se df l u exp id
stable stay 0.35 0.14 98 0.08 0.63 lt lt_stable_stay
stable switch 0.44 0.13 98 0.18 0.71 lt lt_stable_switch
variable stay 1.08 0.14 98 0.80 1.35 lt lt_variable_stay
variable switch 0.82 0.13 98 0.56 1.09 lt lt_variable_switch
stable stay 0.35 0.14 98 0.07 0.63 ts ts_stable_stay
stable switch 0.62 0.16 98 0.30 0.94 ts ts_stable_switch
variable stay 1.28 0.14 98 1.00 1.56 ts ts_variable_stay
variable switch 1.07 0.16 98 0.75 1.39 ts ts_variable_switch

Decide whether to add further post-hoc info.

We can conclude at this point that frequent switching makes people more variable in the order in which they perform their behaviours, and that this might be because they jump between the two tasks more frequently, i.e. they mix the two tasks up more. We also want to rule out general confusion; to do this, we look at responses that come from neither task. Note, this figure can go in the supplemental info.

Code
# get general errors data
get_error_data <- function(exp_str, data_path){
  
  tmp <- read.csv(paste(data_path, 'exp_', exp_str,
                        '_avg.csv', sep='')) %>%
    filter(ses == 2) %>%
    select(sub, train_type, context, switch, general_errors) %>%
    group_by(sub, train_type, switch) %>% 
    summarise(errors=mean(general_errors)) %>%
    ungroup()
  tmp$exp = exp_str
  tmp
}

errors <- do.call(rbind, lapply(exp_strs, get_error_data,
                                data_path = data_path))
`summarise()` has grouped output by 'sub', 'train_type'. You can override using
the `.groups` argument.
`summarise()` has grouped output by 'sub', 'train_type'. You can override using
the `.groups` argument.
Code
gen_jumps_plot(errors,
               'errors ~ switch*train_type',
               exp_strs,
               col_scheme,
               j_wdth, j_hgt,
               paste(fig_path, 'general-errors', sep=''),
               fig_labs = c('C', 'D'),
               ylabel = 'Errors',
               ylims=c(0,1),
               xlabel = 'Group')
png 
  2 

Fig 6: People in the two groups showed the same levels of out-of-task errors

Now I analyse errors in the same way as I did task jumps. Note that after correcting for multiple tests, only the main effect of switch is consistently significant across the two experiments.

Code
# first set the factors to be factors
errors$sub <- as.factor(errors$sub)
errors$train_type <- as.factor(errors$train_type)
levels(errors$train_type) <- c('stable','variable')
errors$switch <- as.factor(errors$switch)
levels(errors$switch) <- c('stay','switch')

# perform the statistical model
exp_strs <- unique(errors$exp)
er_aovs <- lapply(exp_strs, function(x) 
  aov_ez("sub", "errors", errors %>% filter(exp == x), 
         within = "switch",
         between = "train_type"))
Contrasts set to contr.sum for the following variables: train_type
Contrasts set to contr.sum for the following variables: train_type
Code
names(er_aovs) <- exp_strs

er_aov_dat <- do.call(rbind, 
                      lapply(1:length(exp_strs),
                             function(x) convert_aov_to_df(er_aovs[[x]],
                                                           exp_strs[x])))
er_aov_dat$id <- with(er_aov_dat, paste(exp, effect, sep="_"))
write.csv(er_aov_dat, paste(res_path, 'gen-errs_aov.csv', sep=''), row.names=FALSE)
Code
#| label: effect of group and switch on general errors
#| tbl-cap: "effect of group and switch on general errors"
kable(er_aov_dat, digits=2)
exp effect numDF denDF Fstat ges p id
lt train_type 1 98 3.16 0.03 0.08 lt_train_type
lt switch 1 98 22.88 0.01 0.00 lt_switch
lt train_type:switch 1 98 0.00 0.00 0.94 lt_train_type:switch
ts train_type 1 98 4.74 0.04 0.03 ts_train_type
ts switch 1 98 26.96 0.02 0.00 ts_switch
ts train_type:switch 1 98 2.02 0.00 0.16 ts_train_type:switch

Because of that, I’ll just get the estimated marginal means for the main effect of switch.

Code
me_sw_er <- lapply(er_aovs, function(x) summary(emmeans(x, 'switch')))
me_sw_er <- do.call(rbind, me_sw_er)
me_sw_er$exp <- rep(exp_strs, each=length(unique(errors$train_type)))
names(me_sw_er) <- c('switch', 'emmean', 'se', 'df', 'l', 'u', 'exp')

me_sw_er$id <- with(me_sw_er, paste(exp, switch, sep="_"))
cols2round <- c("emmean", "se", "l", "u")
me_sw_er[,cols2round] <- apply(me_sw_er[,cols2round], 2, round, 2)

write.csv(me_sw_er, file=paste(res_path, 'gen-errs_ph_sw.csv', sep=''), row.names=FALSE)
Code
#| label: effect of switch on errors
#| tbl-cap: "effect that switch has on general errors"
kable(me_sw_er, digits=2)
switch emmean se df l u exp id
stay 0.03 0.01 98 0.01 0.04 lt lt_stay
switch 0.04 0.01 98 0.02 0.05 lt lt_switch
stay 0.03 0.01 98 0.02 0.05 ts ts_stay
switch 0.05 0.01 98 0.04 0.07 ts ts_switch

OK, so now we know that the variable group showed less consistent behaviours in terms of order, but comparable accuracy to the stable group (the test is well powered and we can’t discern a group difference).

So being in the variable group impacts your routines. The next question is why?

Why did the stable vs variable manipulation impact routines?

The key thing we want to know is, did we mess up people’s routines because they were probability matching to context, instead of tracking the probability of success, given the context?

Using Bayes theorem, we compute the probability that you are still in context A, given that you have observed that n doors from A and m doors from B do not have a target behind them. We can use this probability to compute the probability of success on your next selection, given the number of doors left in each context.

Specifically, we cast the probability that you are still in \(C_A\), given you have observed n and m false doors, as:

\[P(C = A | n, m) = \frac{ P(n,m|C=A)P(C=A)}{P(n,m)}\]

Once we have the probability of being in that context, given n and m, we can compute the probability of success for staying in that context, and the probability of success for moving to the other context, by taking the new probability of still being in context A, and multiplying it by the probability of success, given the number of doors left in A.

\[ P(S|C = A,n) = \frac{P(C=A|n,m)}{n_\mathrm{max} - n} \]

where S is success, and \(n_{\mathrm{max}}\) is the total number of n that can be chosen (i.e. the number of target doors in the set).

Note that we ignore repetitions when calculating these trial by trial probabilities.

We want to see how well each set of probabilities accounts for task jumps, over and above what can be accounted for by the number of switches a participant experiences (i.e. with increasing switches comes increasing chance of confusion), or the switch rate (how often a switch occurs on average, given experience).

Switches are the cumulative sum of experienced switches - i.e. how often you found a target in a context that is different to the last context where you found a target:

\[ \mathrm{Sw} = \sum_{i=1}^{t} S(i) \]

And switch rate is simply the probability of a switch, given the number of targets you have found:

\[ \mathrm{Sw_{r}} = \frac{\mathrm{Sw}}{t} \]

where t is the total number of targets found so far.
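On a toy context vector, the two definitions above can be sketched as follows (ctx is hypothetical; the real regressors are built by get_Sw() in R/):

```r
# Sketch of the two nuisance regressors. 'ctx' is a hypothetical vector of
# the context in which each successive target was found.
ctx <- c('A', 'A', 'B', 'B', 'A', 'B')    # contexts of targets 1..6
S   <- c(0, ctx[-1] != ctx[-length(ctx)]) # S(i) = 1 when the context differs
                                          # from the last target's context
Sw  <- cumsum(S)                          # cumulative experienced switches
Swr <- Sw / seq_along(ctx)                # switch rate: switches per target found
Sw  # 0 0 1 1 2 3
```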

First, let’s get the data in shape, starting with the switch and switch rate variables -

Code

get_sw_info <- function(exp_str){
  dat <- read.csv(paste(data_path, 'exp_', exp_str, '_evt.csv', sep='')) %>%
      filter(ses == 2) %>% 
      select(sub, train_type, t, context, door, door_cc, door_oc, door_nc, switch)

      #################################################
      # add Sw and Swr regressors for each subject
      subs <- unique(dat$sub)
      dat <- do.call(rbind, lapply(subs, get_Sw, dat=dat))
      dat$exp = exp_str
      dat
}

dat <- do.call(rbind, lapply(exp_strs, get_sw_info))

Now I will get \(p(C = A | n, m)\) and \(p(S|C,n)\). The function I call is a bit of a beast, but hopefully it’s sufficiently commented that it’s easy enough to follow what is happening. Basically, for each trial, I calculate:

\(p(C = A) = \frac{1}{t} \sum_{i=1}^{t} \mathbb{1}(C_i = C_{i-1})\)

For each trial, I then count the number of n’s and m’s. Note that n and m are characterised according to the last place you found a target, e.g. if I have just found a target from \(C_A\), but the true state of the world is now \(C_B\), any selection from \(C_B\) would be counted as an m, as I have no evidence yet to tell me it’s an n. Note that each n and m is only counted once (duplicates are not counted).

For the cumulative counts of n and m on each trial, I assign the probability of that many nulls, given you are in the context to which those doors belong. Note the equation below is defined for n, but I also do the same for \(p(m|C=B)\)

\[ p(n | C=A) = \frac{n_{max} - n}{n_{max}} \]

This allows me to compute, based on trial by trial experience, \(p(C = A | n, m)\) and \(p(S|C=A, n,m)\). At the end, I then shift all the information down 1 row, as the current outcome should affect the next behaviour. For each regressor, I then take the odds by dividing by \(p(C=B|n,m)\) and \(p(S|C = B,n,m)\). I then use the odds variables in the subsequent logistic regressions.
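As a heavily hedged sketch of how these pieces could combine on a single trial (this is not the actual `get_p_context()` code; `n_max`, the counts, and the combination rule are all illustrative assumptions):

```r
# Illustrative only: forming p(C = A | n, m) and the odds regressor
n_max <- 4                                 # assumed number of doors per context
p_null <- function(k) (n_max - k) / n_max  # p(k nulls | in that context)

p_A <- 0.8                  # running p(C = A), i.e. the repeat frequency so far
n_count <- 2; m_count <- 1  # unique n and m selections counted on this trial

pCA <- p_A * p_null(n_count)        # unnormalised p(C = A | n, m)
pCB <- (1 - p_A) * p_null(m_count)  # unnormalised p(C = B | n, m)
odds_context <- pCA / pCB           # the odds entered into the logistic regression
```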

Code
exp_strs <- c('lt', 'ts') # the two experiments

get_pcanm_info <- function(exp_str, dat){
  #################################################
  # add Sw and Swr regressors for each subject
  subs <- unique(dat$sub)
  dat <- do.call(rbind, lapply(subs, get_p_context, dat=dat))
  dat
}

dat <- do.call(rbind, lapply(exp_strs, function(x) get_pcanm_info(x, dat %>% filter(exp == x))))
write.csv(dat, paste(data_path, 'evt-dat_4log-reg.csv', sep=''), row.names=FALSE)
rm(dat)

Now that I have the model-based regressors, I use them in a logistic regression to predict \(m\) selections for each individual. Note that I scale the predictors before running the model.
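For reference, a single-subject fit of the kind presumably performed inside `get_logist_mods_and_betas()` might look like this (simulated data; the column names are illustrative assumptions):

```r
# Illustrative first-level logistic regression with scaled predictors
set.seed(42)
df <- data.frame(m_sel        = rbinom(100, 1, 0.3),  # other-context selection (0/1)
                 odds_context = rnorm(100),
                 odds_scs     = rnorm(100),
                 Sw           = rnorm(100),
                 Sw_r         = rnorm(100))
df[-1] <- scale(df[-1])  # scale predictors before fitting
fit <- glm(m_sel ~ odds_context + odds_scs + Sw + Sw_r,
           data = df, family = binomial)
coef(fit)  # these betas feed the second-level analysis
```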

Code
betas_fname = paste(data_path, 'evt-dat_4log-reg.csv', sep='')
exp_strs <- c('lt', 'ts')

lapply(exp_strs, get_logist_mods_and_betas, fname=betas_fname, 
       data_path=data_path, res_path=res_path)

Note that I played with plotting the individual level fits, and while I was happy the models were doing a good enough job of capturing individual variance, I am yet to find a satisfying way to plot the individual model fits (owing to the multiple continuous predictors). So I will skip straight to the second level analysis, where we take the parameters from the logistic regression models and compare between groups.

Code
# some notes re: cleaning. For some participants, the algorithm did not converge, or the estimated probabilities could not be distinguished between 0 and 1. Eyeballing the boxplots, these subjects largely fell outside 1.5 x the IQR (although sometimes the IQR filter excluded extra participants for whom there was no warning). Removing outliers > 1.5 x the IQR seems the most objective way to treat the data, to me.

apply_clean_and_analyse_betas <- function(exp_str, data_path, res_path){
  
  betas <- read.csv(paste(data_path, 
                          'betas_', exp_str, '_first-level.csv',
                          sep=''))
  betas <- apply_outlier_filter_to_all_vars(betas)
  # save the new data file, as it has extra variables
  write.csv(betas, file=paste(data_path, 'betas_',
                              exp_str, '_first-level_cln.csv',
                              sep=''))
  apply_t_tests_to_all_vars(betas, exp_str, res_path) # this applies inferential t-tests and writes the results to csv files in res_path
}
lapply(c('lt', 'ts'), apply_clean_and_analyse_betas, 
                      data_path = data_path,
                      res_path = res_path)

Now I am going to collapse across the two experiments to see if the p(success|task) holds up:

Code
read_and_label_beta_dat <- function(exp_str, fname){
  dat <- read.csv(fname)
  dat$exp <- exp_str
  dat
}

beta_both <- do.call(rbind, lapply(c('lt', 'ts'), function(x) read_and_label_beta_dat(x,
  paste(data_path, 'betas_', x, '_first-level_cln.csv',
                              sep=''))))

apply_t_tests_to_all_vars(beta_both, 'all', res_path)

I don’t present the beta inferential tests here, as there are so many, but the take-home is that all betas are significantly above zero, mu differs clearly between the two groups (which we already knew), there is some glimpse of an impact of p(scs|cntxt) in each experiment, and it is the only significant variable (other than mu) when both experiments are aggregated.
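The per-variable tests behind these summaries presumably look something like the following (simulated data and an illustrative column name; `apply_t_tests_to_all_vars()` does the real work):

```r
# Illustrative beta tests: one-sample (beta > 0) and between-group comparison
set.seed(1)
b <- data.frame(train_type = rep(c('stable', 'variable'), each = 20),
                cntx_flt   = rnorm(40, mean = 0.5))
one_sample <- t.test(b$cntx_flt, mu = 0)               # are betas above zero?
group_diff <- t.test(cntx_flt ~ train_type, data = b)  # do the groups differ?
c(one_sample$p.value, group_diff$p.value)
```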

Next step is to plot the beta coefficients by group (and analysis).

Code
# now that we have analysed the betas, plot them
# plan is to do box plots
# top row = co-effs by parameter, by group # lt
# second row = co-effs by parameter, by group # ts

make_coefs_plts_4paper_andtlks(data_path,
                               paste(fig_path, 
                                     'betas_scnd-lvl',
                                     sep=''),
                               p_wdth=12,
                               p_hgt=12)
png 
  2 

Now let's see the results:

Figure 2: Showing the betas by group and parameter

So it looks like all of the tested factors contributed to the increase in other-context door selections, which differed by group (the groups differing in routine). It looks like the variable group was generally more confused than the other group, with a small suggestion that they may have also been matching to the probability of the context.

As an exploratory plot, I plot routine scores against beta coefficients, to check the correlations between them appear to do what we think - this is possibly one for the supplemental

Code
exp_str = 'lt'
r_dat = read.csv(paste(data_path, 'exp_', exp_str, '_rscore.csv', sep=''))
betas = read.csv(paste(data_path, 'betas_', exp_str, '_first-level_cln.csv',
                                sep=''))
betas_r <- inner_join(betas %>% select(c(sub, train_type, ends_with('_flt'))),
                      r_dat, by='sub')
betas_r <- betas_r %>% na.omit() %>% mutate(r = log(r))

# first I'll check for multivariate outliers
mhl.mat <- as.matrix(betas_r[,c('mu_flt', 'scs_flt', 'cntx_flt','r')]) 
mhl.cov <- cov(mhl.mat) # here I get the covariance matrix
mhl.dist <- mahalanobis(mhl.mat, colMeans(mhl.mat), mhl.cov) # now calc the M dist
cut_off <- qchisq(.999, df=ncol(mhl.mat)) # upper-tail cut-off; df = number of variables
#hist(mhl.dist)
sum(mhl.dist > cut_off) # we keep all the data
[1] 0
Code
# prepare for the pairs plot
betas_r$train_type <- as.factor(betas_r$train_type)
levels(betas_r$train_type) <- c('stable', 'variable')
svg(paste(fig_path, 'pairs4backup.svg', sep=''), height=7, width=7)
g <- ggpairs(betas_r %>% select(mu_flt, scs_flt, cntx_flt, r),
        mapping=ggplot2::aes(colour = betas_r$train_type),
        upper = list(continuous = wrap("cor", method = "spearman")))
print(g)
dev.off()
png 
  2 

What this plot tells us is that the more the probability of context increases your log odds of hitting a door from the other context, the less the probability of success given context influences the probability of hitting a door from the other context. This is good: it tells us that the parameters trade off against each other in a way we would expect. (Note that I don’t include sw and swr, for visual clarity. Comfortingly, the correlations between these parameters and the others were not significant, nor with the routine scores.)

To understand the relationship between R scores and the remaining predictors, here is an optional lm to run. I am not concluding much from it, as everything is calculated on the same data, but it does tell us that we can’t trust the direction of the first-order correlations with r, so I am not going to over-interpret the direction here. But we do see that these predictors correlate with the routine score.

Code
summary(with(betas_r, lm(r ~ mu_flt + scs_flt + cntx_flt)))

Does the formation of routines attenuate capacity to transfer learning, or promote task switching performance?

Now the cherry on top. We ask whether the extent to which you became routine (the lower your r score) predicts your difficulty being flexible, or corresponds to you being better at task switching.

Learning transfer/flexibility

First steps are to wrangle the routine and group x transfer task data, and to look at learning onset for each group and task as a boxplot. Note that from visual inspection of qqplots, I decided to log transform the data, to improve normality. Then I take the ratio of the scores from each task and correlate it with the routine score, then run an inferential test of the things. Last, I add mean accuracy from training, to see if that changes the effect (note to self, maybe do a path analysis if required).

Code
# wrangle data
r_dat <- read.csv(paste(data_path, 'exp_lt_rscore.csv', sep=''))
onset_dat <- read.csv(paste(data_path, 'exp_lt_maggi-k4.csv', sep='')) %>% 
  select(sid, ses, transfer, k4_onset) %>% 
  filter(ses == 3) %>%
  mutate(transfer=recode(transfer, `1` = 'comp', `2` = 'part')) %>%
  pivot_wider(names_from=transfer, values_from=k4_onset) %>% 
  mutate(k4 = part / (comp + part))
names(onset_dat)[names(onset_dat) == "sid"] = "sub"
# proportion makes it more normal

# some participants never learned, so we lose anyone who has a k4 of NaN,
# as that means their score was Inf
onset_dat <- onset_dat %>% na.omit() %>%
                filter(is.finite(comp))# this gets rid of most, but still one to get rid of

# now join to the r dat
onset_dat <- inner_join(onset_dat, r_dat, by='sub')

# now I log my variables, and I load the avg data so that I can get the training 
# group variable
onset_dat <- onset_dat %>% mutate(log_r = log(r))
grp_inf <- read.csv(paste(data_path, 'exp_lt_avg.csv', sep='')) %>% 
  filter(ses == 3) %>%
  select(sub, train_type) %>%
  unique()
onset_dat <- inner_join(onset_dat, grp_inf, by='sub') %>%
                mutate(train_type=recode(train_type, `1`= 'stable', 
                                         `2` = 'variable'))
rm(grp_inf)

# Now I have the data, I want to replace any outliers with NA
dvs <- c('comp', 'part', 'k4', 'log_r')
onset_dat <- cbind(onset_dat,
         do.call(cbind, lapply(dvs, remove_outliers, betas=onset_dat)))
Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(dv)

  # Now:
  data %>% select(all_of(dv))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(nu_dv)

  # Now:
  data %>% select(all_of(nu_dv))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

Code
rem <- do.call(cbind, lapply(onset_dat, function(x) sum(is.na(x))))
rem <- rem[colnames(rem) %in% c("k4_flt", "log_r_flt")]
rem <- tibble(var=c("k4_flt", "log_r_flt"),
              rem=rem)
write.csv(rem, paste(res_path, 'exp_lt_k4_r_outs-rem.csv', sep=''), row.names = FALSE)
# and last, I want to get the accuracy/general error data from the training phase,
trn_dat <- read.csv(paste(data_path, 'exp_lt_avg.csv', sep='')) %>% 
              filter(ses == 2) %>%
              select(sub, switch, accuracy, setting_errors, general_errors) %>%
              group_by(sub) %>%
              summarise(across(where(is.numeric), ~mean(.x, na.rm=T))) 
onset_dat <- inner_join(onset_dat, trn_dat, by="sub")

write.csv(onset_dat, paste(data_path, 'exp_lt_onsetwr.csv', sep=''), row.names=FALSE)

Now that I have the wrangled data, I will make 3 subplots and then combine them. The first is the main effect of complete vs partial transfer, the second is the percentage difference by group, and the third is the correlation between r and the k4 bias measure.

Code
# first, I make the comp/part data longer so I can plot it
tmp <- onset_dat %>% select(sub, comp, part, train_type) %>%
          pivot_longer(cols=c('comp', 'part'), 
                       values_to = 'k4_on',
                       names_to='transfer')

col_scheme = c('#ccba72ff', '#4c5454ff')

k4_by_grp_fname = paste(fig_path,'k4_by_grp', sep='')
trn_bp_wdth = z_p_wdth*.8
trn_bp_hgt = z_p_hgt
plt_tran_bp_4paper_andtlks(k4_by_grp_fname,
                trn_bp_wdth,
                trn_bp_hgt,
                tmp,
                'k4_on ~ transfer*train_type', 
                col_scheme,
                'Onset',
                c(0, 200),
                fig_lab = 'A')
png 
  2 

Knowledge onset of 4 doors in the transfer tasks, by group.

Now I want to plot the bias score, by group.

Code
bias <- onset_dat %>% select(sub, train_type, log_r_flt, k4_flt) %>%
               na.omit()
col_scheme <- wesanderson::wes_palette('IsleofDogs1')[5:6]
this_form = 'k4_flt ~ train_type'
ylabel = 'Transfer Bias'
fig_lab = 'A'

bs_wdth = 6
bs_hgt = 6

bias_by_grp_fname = paste(fig_path, 'bias-by-grp', sep='')
plt_bias_by_grp_4paper_andtlks(bias_by_grp_fname,
                               bs_wdth, bs_hgt,
                               bias,
                               this_form,
                               col_scheme,
                               ylabel,
                               fig_lab)
Warning in (function (main = NULL, sub = NULL, xlab = NULL, ylab = NULL, : font
width unknown for character 0x20 in encoding latin1
Warning in (function (main = NULL, sub = NULL, xlab = NULL, ylab = NULL, : font
width unknown for character 0x20 in encoding latin1
png 
  2 

Transfer bias score, by group.

Now, combine the boxplot and the bias scores into one figure.

Code
fnms = c( paste(fig_path, 'k4_by_grp', sep=''), 
          paste(fig_path, 'bias-by-grp', sep=''))

ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))

multi <- c(ims[[1]]) %>%
         image_append() %>%
        c(ims[[2]]) %>%
         image_append()
plt_hgt = trn_bp_hgt # get the height of the traj plot
plt_wdth = trn_bp_wdth+bs_wdth
pdf(paste(fig_path, 'k4_bias.pdf', sep=''), width=plt_wdth,
    height=plt_hgt)
plot(multi)
dev.off()
png 
  2 

And now, the last piece, the correlation between the bias score and the logged routine variability score.

Code
lt_bias_corr_fname = paste(fig_path, 'lt-bias_r_corr', sep='')
plt_r_bias_cor_4paper_andtlks(lt_bias_corr_fname,
                              bs_wdth, bs_hgt,
                              bias,
                              'A')
Warning in title(...): font width unknown for character 0x20 in encoding latin1
Warning in title(...): font width unknown for character 0x20 in encoding latin1
Warning in title(...): font width unknown for character 0x20 in encoding latin1
Warning in title(...): font width unknown for character 0x20 in encoding latin1
png 
  2 

Transfer bias correlates with routine score.

Now put the transfer bias figures together - one for the paper and one for the supplemental.

Code
fnms = c(k4_by_grp_fname, bias_by_grp_fname, lt_bias_corr_fname)

ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))

bottom <- image_append(c(ims[[2]],ims[[3]])) # decided to put bottom in paper and top in supplemental
pdf(paste(fig_path, 'LT_results.pdf', sep=''))
plot(bottom)
dev.off()
png 
  2 
Code
# multi <- image_append(c(ims[[1]], bottom), stack=TRUE)
pdf(paste(fig_path, 'LT_supp.pdf', sep=''), 
    width=trn_bp_wdth,
    height=bs_hgt*2)
plot(ims[[1]])
dev.off()
png 
  2 

Ahhh. Done. Now for stats, I only want the difference in complete vs partial, the difference in the bias between groups, and the correlation between the r and the bias.

Code
# learning transfer inferences
me_transfer = with(onset_dat, t.test(comp_flt, part_flt, 
                                     paired=TRUE, na.rm=TRUE))
transfer_for_cohens <- onset_dat %>% select(sub, comp_flt, part_flt) %>%
                         na.omit() %>%
                         pivot_longer(cols=c('comp_flt', 'part_flt'), names_to='cond',
                                             values_to='n') 
me_transfer_d = cohens_d(data=transfer_for_cohens,
                         formula=n~cond, paired=TRUE,
                         ci=TRUE, ci.type='perc', nboot=1000)
grp_bias = with(onset_dat, t.test(k4_flt~train_type, na.rm=TRUE))
grp_bias_d = cohens_d(data=onset_dat %>% na.omit(k4_flt),
                      formula=k4_flt~train_type, paired=FALSE,
                      ci=TRUE, ci.type='perc', nboot=1000)
# get mean and standard deviation of group bias scores
bias_desc_stats <- onset_dat %>% select(sub, train_type, log_r_flt, k4_flt) %>%
                  drop_na() %>%
                  group_by(train_type) %>%
                  summarise(M=mean(k4_flt),
                            SD=sd(k4_flt),
                            N=length(k4_flt),
                            SE=SD/sqrt(N))
bias_desc_stats[,c('M', 'SE')] <- apply(bias_desc_stats[,c('M', 'SE')], 2, round, 2)
write.csv(bias_desc_stats, paste(res_path, 'lt_bias_desc-stats.csv', sep=''), row.names=FALSE)

lt_ts <- tibble(fx = c('transfer', 'group-bias'),
                t = c(me_transfer$statistic, grp_bias$statistic),
                df = c(me_transfer$parameter, grp_bias$parameter),
                p = c(me_transfer$p.value, grp_bias$p.value),
                mu_a = c(me_transfer$estimate[1], grp_bias$estimate[1]),
                mu_b = c(me_transfer$estimate[2], grp_bias$estimate[2]),
                l_fx = c(me_transfer$conf.int[1], grp_bias$conf.int[1]),
                u_fx = c(me_transfer$conf.int[2], grp_bias$conf.int[2]),
                d = c(me_transfer_d$effsize, grp_bias_d$effsize),
                dl = c(me_transfer_d$conf.low, grp_bias_d$conf.low),
                du = c(me_transfer_d$conf.high, grp_bias_d$conf.high),
                n1 = c(me_transfer_d$n1, grp_bias_d$n1),
                n2 = c(me_transfer_d$n2, grp_bias_d$n2)) %>%
            mutate(across(where(is.numeric) & !matches("p"), ~round(.x, 2))) 
lt_ts <- lt_ts %>% mutate(p = round(p, 3))
write.csv(lt_ts, paste(res_path, 'lt_inf_tests.csv', sep=''), row.names=FALSE)
kable(lt_ts)
| fx         |     t |    df |     p |   mu_a | mu_b |   l_fx |   u_fx |     d |    dl |    du | n1 | n2 |
|------------|-------|-------|-------|--------|------|--------|--------|-------|-------|-------|----|----|
| transfer   | -5.98 | 70.00 | 0.000 | -19.97 |   NA | -26.63 | -13.31 | -0.71 | -0.99 | -0.49 | 71 | 71 |
| group-bias |  1.93 | 81.79 | 0.057 |   0.62 | 0.55 |   0.00 |   0.13 |  0.49 |  0.02 |  1.01 | 33 | 35 |

And now correlate the routine score, and perform a regression where we control for general errors per group.

Code
r_bias_cor = with(onset_dat, cor.test(k4_flt, log_r_flt, na.rm=TRUE))

lt_r_cor_info <- tibble(r=r_bias_cor$estimate,
                        l=r_bias_cor$conf.int[1],
                        u=r_bias_cor$conf.int[2],
                        t=r_bias_cor$statistic,
                        df=r_bias_cor$parameter,
                        p=r_bias_cor$p.value) %>%
    mutate(across(where(is.numeric) & !matches("p"), ~round(.x, 2))) %>% 
  mutate(id="cor")
lt_r_cor_info <- lt_r_cor_info %>% mutate(p = round(p, 3))
write.csv(lt_r_cor_info, paste(res_path, 'lt-r_corr.csv', sep=''), row.names=FALSE)
kable(lt_r_cor_info)
| r     | l     | u     | t     | df | p     | id  |
|-------|-------|-------|-------|----|-------|-----|
| -0.23 | -0.43 | -0.01 | -2.09 | 78 | 0.039 | cor |

Task switching

And now we ask exactly the same kinds of questions about task switching performance, focusing on the RT of the first correct response.

Code
# wrangle data
r_dat <- read.csv(paste(data_path, 'exp_ts_rscore.csv', sep=''))
ts_dat <- read.csv(paste(data_path, 'exp_ts_avg.csv', sep='')) %>% 
  filter(ses==3) %>%
  select(sub, context, switch, train_type, setting_errors,
         rt_correct, accuracy, general_errors) %>% 
  group_by(sub, switch, train_type) %>%
  summarise(se=mean(setting_errors),
            rt=mean(rt_correct),
            acc=mean(accuracy),
            ge=mean(general_errors)) %>%
  ungroup() %>%
  mutate(train_type=recode(train_type, `1` = 'stable', `2` = 'variable'),
         switch=recode(switch, `0` = 'stay', `1` = 'switch'))
`summarise()` has grouped output by 'sub', 'switch'. You can override using the
`.groups` argument.
Code
# now I temporarily make a wider format dataframe so I can make the bias scores
tmp <- ts_dat %>% pivot_wider(id_cols=c(sub,train_type),
                              names_from=switch,
                              values_from=c(se, rt, acc)) %>%
                  mutate(se_bias = se_switch/(se_stay+se_switch),
                         rt_bias = rt_switch/(rt_stay+rt_switch),
                         acc_bias = acc_switch/(acc_stay+acc_switch))
# now join key variables to other dataframe and remove the tmp one
ts_dat <- inner_join(ts_dat, tmp %>% select(sub, se_bias, rt_bias, acc_bias), by='sub')
rm(tmp)
# now join to the r dat and log the r variable
ts_dat <- inner_join(ts_dat, r_dat, by='sub') %>%
            mutate(log_r = log(r))

# Now I have the data, I want to replace any outliers with NA
# Note that setting errors wound up in a lot of dropped data, so I am not going to use the filtered versions of that variable
dvs <- c('se', 'rt', 'acc', 'se_bias', 'rt_bias', 'acc_bias', 'log_r')
ts_dat <- cbind(ts_dat,
                do.call(cbind, lapply(dvs, remove_outliers, betas=ts_dat)))

Code
write.csv(ts_dat, paste(data_path, 'exp_ts_tswr.csv', sep=''), row.names=FALSE)

Now I make 3 very similar plots as I did for learning transfer. But I have to do this twice for the task switching data - once for setting errors and once for RT

Code
# setting errors plot
ts_dat_4_plt <- ts_dat %>% 
  select(sub, train_type, switch, se)

col_scheme = c('#7570b3', '#e7298a')

ts_bp_g_se_fname = paste(fig_path, 'se_by_grp', sep='')
ts_bp_wdth = trn_bp_wdth 
ts_bp_hgt = trn_bp_hgt 
plt_ts_bp_4paper_andtlks(ts_bp_g_se_fname,
                         ts_bp_wdth, ts_bp_hgt,
                         ts_dat_4_plt,
                         'se ~ switch + train_type',
                         col_scheme,
                         ylabel='Set-Errors',
                         ylims=c(0, 0.6), # sort axes
                         yticks=seq(0, 0.6, by = 0.2),
                         fig_lab='C',
                         legend_on=TRUE)
png 
  2 

Task-switching stage setting errors, by group.

And now the RTs:

Code
# RT plot
ts_dat_4_plt <- ts_dat %>% 
  select(sub, train_type, switch, rt_flt)

col_scheme = c('#7570b3', '#e7298a')

ts_bp_g_rt_fname = paste(fig_path, 'rt_by_grp', sep='')

plt_ts_bp_4paper_andtlks(ts_bp_g_rt_fname,
                         ts_bp_wdth, ts_bp_hgt,
                         ts_dat_4_plt,
                         'rt_flt ~ switch + train_type',
                         col_scheme,
                         ylabel='RT',
                         ylims=c(0.4, 1.8),
                         yticks=seq(0.4, 1.8, by = 0.4),
                         fig_lab='A',
                         legend_on=FALSE)
png 
  2 

Task-switching stage RT, by group.

Now plot bias scores for both setting errors and RTs

Code
bias_se <- ts_dat %>% select(sub, train_type, se_bias) %>%
               na.omit() %>% unique()
col_scheme <- wesanderson::wes_palette('IsleofDogs1')[5:6]
this_form = 'se_bias ~ train_type'
ylabel = 'SE-Bias'
fig_lab = 'D'

bs_wdth = ts_bp_wdth*.7
bs_hgt = ts_bp_hgt

se_bias_by_grp_fname = paste(fig_path, 'exp_ts_se-bias-by-grp', sep='')
plt_bias_by_grp_4paper_andtlks(se_bias_by_grp_fname,
                               bs_wdth, bs_hgt,
                               bias_se,
                               this_form,
                               col_scheme,
                               ylabel,
                               fig_lab)
png 
  2 

Setting error bias by group.

And now plot the bias score for the RT measure.

Code
bias_rt <- ts_dat %>% select(sub, train_type, rt_bias_flt) %>%
               na.omit() %>% unique()
col_scheme <- wesanderson::wes_palette('IsleofDogs1')[5:6]
this_form = 'rt_bias_flt ~ train_type'
ylabel = 'RT-Bias'
fig_lab = 'B'

rt_bias_by_grp_fname = paste(fig_path, 'exp_ts_rt-bias-by-grp', sep='')
plt_bias_by_grp_4paper_andtlks(rt_bias_by_grp_fname,
                               bs_wdth, bs_hgt,
                               bias_rt,
                               this_form,
                               col_scheme,
                               ylabel,
                               fig_lab)
png 
  2 

And combine into one big plot for showing the world.

Code
fnms = c( paste(fig_path, 'rt_by_grp', sep=''), 
          paste(fig_path, 'exp_ts_rt-bias-by-grp', sep=''), 
          paste(fig_path, 'se_by_grp', sep=''),
          paste(fig_path, 'exp_ts_se-bias-by-grp', sep=''))

ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))

multi <- ims[[1]] %>% c(ims[[2]]) %>%
          image_append() %>%
        c(ims[[3]] %>%c(ims[[4]]) %>% image_append()) %>%
          image_append(stack=TRUE)
plt_hgt = bs_hgt*2 # get the height of the traj plot
plt_wdth = ts_bp_wdth+bs_wdth
pdf(paste(fig_path, 'rt-and-se.pdf', sep=''), width=plt_wdth,
    height=plt_hgt)
plot(multi)
dev.off()
png 
  2 

And now I plot the correlation between the setting error bias and the routine score -

Code
bias_se <- inner_join(bias_se, 
                      ts_dat %>% select(sub, log_r_flt), by='sub') %>%
            unique()

se_bias_corr_fname = paste(fig_path, 'ts-se-bias_r_corr', sep='')
plt_r_se_bias_cor_4paper_andtlks(se_bias_corr_fname,
                                 bs_wdth, bs_hgt,
                                 bias_se,
                                'C')
Warning in title(...): font width unknown for character 0x20 in encoding latin1
Warning in title(...): font width unknown for character 0x20 in encoding latin1
Warning in title(...): font width unknown for character 0x20 in encoding latin1
Warning in title(...): font width unknown for character 0x20 in encoding latin1
png 
  2 

Setting error bias correlates with routine score.

And now I put these figures together for the paper -

Code
fnms = c(ts_bp_g_se_fname, se_bias_by_grp_fname, se_bias_corr_fname)

ims = lapply(fnms, function(x) image_read_pdf(paste(x, '.pdf', sep='')))

bottom <- image_append(c(ims[[2]],ims[[3]]))
multi <- image_append(c(ims[[1]], bottom), stack=TRUE)
pdf(paste(fig_path, 'TS_results.pdf', sep=''), 
    width=trn_bp_wdth,
    height=bs_hgt*2)
plot(multi)
dev.off()
png 
  2 

All the figures are done!

Code
# first get the switch setting errors to compare between
sw4t_dat <- ts_dat %>% select(sub, train_type, switch, se) %>%
  pivot_wider(names_from=switch, values_from = se) %>%
  na.omit()

switch_t = with(sw4t_dat, 
                t.test(x=stay, y=switch, paired=TRUE, na.rm=TRUE))

switch_for_cohens <- sw4t_dat %>% 
  pivot_longer(cols=c('stay', 'switch'), 
               names_to='cond',
               values_to='n') 
switch_d = cohens_d(data=switch_for_cohens,
                         formula=n~cond, paired=TRUE,
                         ci=TRUE, ci.type='perc', nboot=1000)

# now compare the groups for setting error bias
grp_bias = with(bias_se, t.test(se_bias~train_type, na.rm=TRUE))
grp_bias_d = cohens_d(data=bias_se,
                      formula=se_bias~train_type, paired=FALSE,
                      ci=TRUE, ci.type='perc', nboot=1000)
# now compute the correlation
r_se_bias_cor = with(bias_se, 
                  cor.test(se_bias, log_r_flt, 
                           method='spearman'))
Warning in cor.test.default(se_bias, log_r_flt, method = "spearman"): Cannot
compute exact p-value with ties
Code
ts_ts <- tibble(fx = c('sw', 'group-bias'),
                t = c(switch_t$statistic, grp_bias$statistic),
                df = c(switch_t$parameter, grp_bias$parameter),
                p = c(switch_t$p.value, grp_bias$p.value),
                mu_a = c(switch_t$estimate[1], grp_bias$estimate[1]),
                mu_b = c(switch_t$estimate[2], grp_bias$estimate[2]),
                l_fx = c(switch_t$conf.int[1], grp_bias$conf.int[1]),
                u_fx = c(switch_t$conf.int[2], grp_bias$conf.int[2]),
                d = c(switch_d$effsize, grp_bias_d$effsize),
                dl = c(switch_d$conf.low, grp_bias_d$conf.low),
                du = c(switch_d$conf.high, grp_bias_d$conf.high),
                n1 = c(switch_d$n1, grp_bias_d$n1),
                n2 = c(switch_d$n2, grp_bias_d$n2))
write.csv(ts_ts, paste(res_path, 'ts-se_inf_tests.csv', sep=''), row.names=FALSE)
kable(ts_ts, digits=2)
| fx         |     t |    df |    p |  mu_a | mu_b |  l_fx |  u_fx |     d |    dl |    du |  n1 |  n2 |
|------------|-------|-------|------|-------|------|-------|-------|-------|-------|-------|-----|-----|
| sw         | -9.05 | 99.00 | 0.00 | -0.03 |   NA | -0.04 | -0.02 | -0.91 | -1.22 | -0.64 | 100 | 100 |
| group-bias |  2.22 | 95.58 | 0.03 |  0.82 | 0.73 |  0.01 |  0.16 |  0.45 |  0.04 |  0.93 |  50 |  48 |

And now I just print out the correlation results.

Code
ts_se_r_cor_info <- tibble(r=r_se_bias_cor$estimate,
                           S=r_se_bias_cor$statistic,
                           df=nrow(bias_se)-2,
                           p=r_se_bias_cor$p.value)
write.csv(ts_se_r_cor_info, 
          paste(res_path, 'ts_se-r_corr.csv', sep=''),
          row.names=FALSE)
kable(ts_se_r_cor_info, digits=2)
| r     | S        | df | p |
|-------|----------|----|---|
| -0.55 | 229046.5 | 96 | 0 |