
Source Information Flow Toolbox (SIFT)
An Electrophysiological Information Flow Toolbox for EEGLAB

Theoretical Handbook and User Manual

Tim Mullen
Swartz Center for Computational Neuroscience,
Institute for Neural Computation, and
Department of Cognitive Science
University of California, San Diego
tim@sccn.ucsd.edu

Release 0.1a
Dec 15, 2010

1. Table of Contents

2. Introduction
3. Multivariate Autoregressive Modeling
   3.1. Stationarity and Stability
   3.2. The Multivariate Least-Squares Estimator
   3.3. Frequency-Domain Representation
   3.4. Modeling Non-Stationary Data Using Adaptive VAR Models
      3.4.1. Segmentation-Based Adaptive VAR (AMVAR) Models
   3.5. Model Order Selection
   3.6. Model Validation
      3.6.1. Checking the Whiteness of the Residuals
         3.6.1.1. Autocorrelation Function (ACF) Test
         3.6.1.2. Portmanteau Tests
      3.6.2. Checking the Consistency of the Model
      3.6.3. Checking the Stability and Stationarity of the Model
      3.6.4. Comparing Parametric and Nonparametric Spectra and Coherence
4. Granger Causality and Extensions
   4.1. Time-Domain GC
   4.2. Frequency-Domain GC
   4.3. A Partial List of VAR-Based Spectral, Coherence and GC Estimators
   4.4. Time-Frequency GC
   4.5. (Cross-) Correlation Does Not Imply (Granger-) Causation
5. Statistics
   5.1. Asymptotic Analytic Statistics
   5.2. Nonparametric Surrogate Statistics
      5.2.1. Bootstrap Resampling
      5.2.2. Phase Randomization
6. Using SIFT to Analyze Neural Information Flow Dynamics
   6.1. System Requirements
   6.2. Configuring EEGLAB
   6.3. Loading the Data
   6.4. The SIFT Analysis Pipeline
   6.5. Preprocessing
      6.5.1. Theory: Preprocessing
         6.5.1.1. Component Selection
         6.5.1.2. Epoching
         6.5.1.3. Filtering
         6.5.1.4. Downsampling
         6.5.1.5. Differencing
         6.5.1.6. Detrending
         6.5.1.7. Normalization
   6.6. Model Fitting and Validation
      6.6.1. Theory: Selecting a Window Length
         6.6.1.1. Local Stationarity
         6.6.1.2. Temporal Smoothing
         6.6.1.3. Sufficient Amount of Data
         6.6.1.4. Process Dynamics and Neurophysiology
      6.6.2. Selecting the Model Order
      6.6.3. Fitting the Final Model
      6.6.4. Validating the Fitted Model
   6.7. Connectivity Estimation
   6.8. Statistics
   6.9. Visualization
      6.9.1. Interactive Time-Frequency Grid
      6.9.2. Interactive Causal BrainMovie3D
      6.9.3. Causal Projection
   6.10. Group Analysis
      6.10.1. Disjoint Clustering
      6.10.2. Bayesian Mixture Model
7. Conclusions and Future Work
8. Acknowledgements
9. Appendix I: SIFT 0.1a Function Reference
10. References

2. Introduction

Mapping the structural and active functional properties of brain networks is a key goal of basic and clinical neuroscience and medicine. The novelty and importance of this transformative research was recently emphasized by the U.S. National Institutes of Health in their 2010 announcement for the Human Connectome Project:

    Knowledge of human brain connectivity will transform human neuroscience by providing not only a qualitatively novel class of data, but also by providing the basic framework necessary to synthesize diverse data and, ultimately, elucidate how our brains work in health, illness, youth, and old age.

The study of human brain connectivity generally falls under one or more of three categories: structural, functional, and effective (Bullmore and Sporns, 2009).

Structural connectivity denotes networks of anatomical (e.g., axonal) links. Here the primary goal is to understand what brain structures are capable of influencing each other via direct or indirect axonal connections. This might be studied in vivo using invasive axonal labeling techniques or noninvasive MRI-based diffusion-weighted imaging (DWI/DTI) methods.

Functional connectivity denotes (symmetrical) correlations in activity between brain regions during information processing. Here the primary goal is to understand what regions are functionally related through correlations in their activity, as measured by some imaging technique. A popular form of functional connectivity analysis using functional magnetic resonance imaging (fMRI) has been to compute the pairwise correlation (or partial correlation) in BOLD activity for a large number of voxels or regions of interest within the brain volume.
  	
  
In contrast to the symmetric nature of functional connectivity, effective connectivity denotes asymmetric or causal dependencies between brain regions. Here the primary goal is to identify which brain structures in a functional network are (causally) influencing other elements of the network during some stage or form of information processing. Often the term "information flow" is used to indicate directionally specific (although not necessarily causal) effective connectivity between neuronal structures. Popular effective connectivity methods, applied to fMRI and/or electrophysiological (EEG, iEEG, MEG) imaging data, include dynamic causal modeling, structural equation modeling, transfer entropy, and Granger-causal methods.
  	
  
Contemporary research on building a human 'connectome' (complete map of human brain connectivity) has typically focused on structural connectivity using MRI and diffusion-weighted imaging (DWI) and/or on functional connectivity using fMRI. However, the brain is a highly dynamic system, with networks constantly adapting and responding to environmental influences so as to best suit the needs of the individual. A complete description of the human connectome necessarily requires accurate mapping and modeling of transient directed information flow or causal dynamics within distributed anatomical networks. Efforts to examine transient dynamics of effective connectivity (causality or directed information flow) using fMRI are complicated by low temporal resolution, assumptions regarding the spatial stationarity of the hemodynamic response, and smoothing transforms introduced in standard fMRI signal processing (Deshpande et al., 2009a; Deshpande et al., 2009b). While electro- and magneto-encephalography (EEG/MEG) affords high temporal resolution, the traditional approach of estimating connectivity between EEG electrode channels (or MEG sensors) suffers from a high risk of false positives from volume conduction and non-brain artifacts. Furthermore, severe limitations in spatial resolution when using surface sensors further limit the physiological interpretability of observed connectivity. Although precisely identifying the anatomical locations of sources of observed electrical activity (the inverse problem) is mathematically ill-posed, recent improvements in source separation and localization techniques may allow approximate identification of such anatomical coordinates with sufficient accuracy to yield anatomical insight invaluable to a wide range of cognitive neuroscience and neuroengineering applications (Michel et al., 2004). In limited circumstances it is also possible to obtain human intracranially-recorded EEG (ICE, ECoG, iEEG) that, although highly invasive, affords high spatiotemporal resolution and (often) reduced susceptibility to non-brain artifacts.
  	
  
Once activity in specific brain areas has been identified using source separation and localization, it is possible to look for transient changes in dependence between these different brain source processes. Advanced methods for non-invasively detecting and modeling distributed network events contained in high-density EEG data are highly desirable for basic and clinical studies of distributed brain activity supporting behavior and experience. In recent years, Granger Causality (GC) and its extensions have increasingly been used to explore 'effective' connectivity (directed information flow, or causality) in the brain based on analysis of prediction errors of autoregressive models fit to channel (or source) waveforms. GC has enjoyed substantial recent success in the neuroscience community, with over 1200 citations in the last decade (Google Scholar). This is in part due to the relative simplicity and interpretability of GC – it is a data-driven approach based on linear regressive models requiring only a few basic a priori assumptions regarding the generating statistics of the data. However, it is also a powerful technique for system identification and causal analysis. While many landmark studies have applied GC to invasively recorded local field potentials and spike trains, a growing number of studies have successfully applied GC to non-invasively recorded human EEG and MEG data (as reviewed in (Bressler and Seth, 2010)). Application of these methods in the EEG source domain is also being seen in an increasing number of studies (Hui and Leahy, 2006; Supp et al., 2007; Astolfi et al., 2007; Haufe et al., 2010).
  	
  
In the last decade an increasing number of effective connectivity measures, closely related to Granger's definition of causality, have been proposed. Like classic GC, these measures can be derived from (multivariate) autoregressive models fit to observed data time-series. These measures can describe different aspects of network dynamics and thus comprise a complementary set of tools for effective connectivity or causal analysis.
  
Several toolboxes affording various forms of Granger-causal (or related) connectivity analysis are currently available in full or beta release. Table 1 provides a list of several of these toolboxes, along with the website, release version, and license. Although these toolboxes provide a number of well-written and useful functions, they generally lack integration within a more comprehensive framework for EEG signal processing (with the exception of TSA, which does integrate into the Biosig EEG/MEG processing suite). Furthermore, most of these either implement only one or two (often bivariate) connectivity measures, lack tools for sophisticated visualization, or lack robust statistics or multi-subject (group) analysis. Finally, with the exception of E-Connectome, none of these toolboxes directly supports analysis and/or visualization of connectivity in the EEG source domain. These are all issues that our Source Information Flow Toolbox (SIFT), combined with the EEGLAB software suite, attempts to address.
  
Table 1. A list of free Matlab-based toolboxes for connectivity and graph-theoretical analysis in neural data.

Toolbox Name                                                      | Primary Author | Release | Website                                                    | License
Granger Causality Connectivity Analysis (GCCA) Toolbox            | Anil Seth      | 2.6.1   | http://www.informatics.sussex.ac.uk/users/anils/index.htm  | GPL 3
Time-Series Analysis (TSA) Toolbox                                | Alois Schloegl | 3.00    | http://biosig-consulting.com/matlab/tsa/                   | GPL 2
E-Connectome                                                      | Bin He         | 1.0     | http://econnectome.umn.edu/                                | GPL 3
Brain-System for Multivariate AutoRegressive Timeseries (BSMART)  | Jie Cui        | beta    | http://www.brain-smart.org/                                | --

	
  
SIFT is an open-source Matlab (The Mathworks, Inc.) toolbox for analysis and visualization of multivariate information flow and causality, primarily in EEG/iEEG/MEG datasets following source separation and localization. The toolbox supports both command-line (scripting) and graphical user interface (GUI) interaction and is integrated into the widely used open-source EEGLAB software environment for electrophysiological data analysis (sccn.ucsd.edu/eeglab). There are currently four modules: data preprocessing, model fitting and connectivity estimation, statistical analysis, and visualization. A fifth group analysis module, affording connectivity analysis and statistics over a cohort of datasets, will be included in the upcoming beta release. First methods implemented include a large number of popular frequency-domain Granger-causal and coherence measures, obtained from adaptive multivariate autoregressive models, surrogate and analytic statistics, and a suite of tools for interactive visualization of information flow dynamics across time, frequency, and (standard or personal MRI co-registered) anatomical source locations.
  	
  
In this paper, I will outline the theory underlying multivariate autoregressive modeling and Granger-causal analysis. Practical considerations, such as data length, parameter selection, and non-stationarities, are addressed throughout the text, and useful tests for estimating statistical significance are outlined. This theory section is followed by a hands-on walkthrough of the use of the SIFT toolbox for analyzing source information flow dynamics in an EEG dataset. Here we will walk through a typical data processing pipeline culminating with the demonstration of some of SIFT's powerful tools for interactive visualization of time- and frequency-dependent directed information flow between localized EEG sources in an anatomically-coregistered 3D space. Theory boxes throughout the chapter will provide additional insight on various aspects of model fitting and parameter selection.

3. Multivariate Autoregressive Modeling

Assume we have an M-dimensional time-series of length T (e.g., M channels of EEG data, with T time points per channel): $X := (x_1, \ldots, x_T)$ where $x_t = [x_{1t}, \ldots, x_{Mt}]'$. We can represent the multivariate process at time t as a stationary, stable vector autoregressive (VAR, MVAR, MAR) process of order p (henceforth we will denote this as a VAR[p] process):

$$x_t = v + \sum_{k=1}^{p} A_k x_{t-k} + u_t \qquad \text{(Eq 3.1)}$$

Here $v = [v_1, \ldots, v_M]'$ is an (M x 1) vector of intercept terms (the mean of X), the $A_k$ are (M x M) model coefficient matrices, and $u_t$ is a zero-mean white noise process with nonsingular covariance matrix $\Sigma$.
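To make the notation concrete, here is a minimal MATLAB sketch (illustrative only, not part of SIFT; all variable names are ours) that simulates a bivariate VAR[2] process of the form in Eq 3.1:

```matlab
% Simulate a bivariate VAR[2] process: x_t = v + A1*x_{t-1} + A2*x_{t-2} + u_t
M = 2; p = 2; T = 1000;
v  = [0; 0];                      % intercept terms (process mean zero)
A1 = [ 0.5  0.3;  0.0  0.6];      % lag-1 coefficient matrix (M x M)
A2 = [-0.2  0.0;  0.4 -0.1];      % lag-2 coefficient matrix (M x M)
Sigma = [1 0.2; 0.2 1];           % nonsingular noise covariance
Lchol = chol(Sigma, 'lower');     % for drawing correlated white noise
X = zeros(M, T);
for t = p+1:T
    u = Lchol * randn(M, 1);      % zero-mean white noise u_t
    X(:, t) = v + A1*X(:, t-1) + A2*X(:, t-2) + u;
end
```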
  	
  

3.1. Stationarity and Stability

We assume two basic conditions regarding the data X and its associated VAR[p] model: stationarity and stability. A stochastic process X is weakly stationary (or wide-sense stationary (WSS)) if its first and second moments (mean and covariance) do not change with time. In other words, $E(x_t) = \mu$ for all t, and $E[(x_t - \mu)(x_{t-h} - \mu)'] = \Gamma(h) = \Gamma(-h)'$ for all t and h = 0, 1, 2, ..., where E denotes expected value. A VAR[p] process is considered stable if its reverse characteristic polynomial has no roots in or on the complex unit circle. Formally, $x_t$ is stable if

$$\det(I_{Mp} - \mathbf{A}z) \neq 0 \quad \text{for } |z| \leq 1, \quad \text{where} \quad
\mathbf{A} :=
\begin{bmatrix}
A_1    & A_2 & \cdots & A_{p-1} & A_p \\
I_M    & 0   & \cdots & 0       & 0   \\
0      & I_M &        & 0       & 0   \\
\vdots &     & \ddots &         & \vdots \\
0      & 0   & \cdots & I_M     & 0
\end{bmatrix}
\qquad (Mp \times Mp)$$

Equivalently, $x_t$ is stable if all eigenvalues of $\mathbf{A}$ have modulus less than 1 (Lütkepohl, 2006). A stable process is one that will not diverge to infinity ("blow up"). An important fact is that stability implies stationarity – thus it is sufficient to test for stability to ensure that a VAR[p] process is both stable and stationary. SIFT performs a stability test by analyzing the eigenvalues of $\mathbf{A}$.
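A minimal MATLAB sketch of this eigenvalue-based stability check (illustrative; not SIFT's own implementation):

```matlab
% Stability test: form the (Mp x Mp) companion matrix from the fitted
% coefficient matrices A1..Ap and verify that every eigenvalue has
% modulus strictly less than 1.
Acat = [A1 A2];                                % (M x Mp) block row [A1 ... Ap]
[M, Mp] = size(Acat);
Acomp = [Acat; eye(Mp - M), zeros(Mp - M, M)]; % companion form of A
isStable = all(abs(eig(Acomp)) < 1);           % true => stable (hence stationary)
```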
  	
  

3.2. The Multivariate Least-Squares Estimator

A parametric VAR model can be fit using a number of approaches including multivariate least-squares approaches (e.g., MLS, ARFIT), lattice algorithms (e.g., Vieira-Morf), or state-space models (e.g., Kalman filtering). Here we will briefly outline the multivariate least-squares algorithm (multichannel Yule-Walker) and encourage the interested reader to consult (Schlögl, 2000; Lütkepohl, 2006; Schlögl, 2006) for more details on this and other algorithms (several of which are implemented in SIFT).
  
To derive the multivariate least-squares estimator, let us begin with some definitions:

$$\begin{aligned}
X &:= (x_1, \ldots, x_T) && (M \times T),\\
B &:= (v, A_1, \ldots, A_p) && (M \times (Mp+1)),\\
Z_t &:= \begin{bmatrix} 1 \\ x_t \\ \vdots \\ x_{t-p+1} \end{bmatrix} && ((Mp+1) \times 1),\\
Z &:= (Z_0, \ldots, Z_{T-1}) && ((Mp+1) \times T),\\
U &:= (u_1, \ldots, u_T) && (M \times T)
\end{aligned}$$

Our VAR[p] model (Eq 3.1) can now be written in compact form:

$$X = BZ + U \qquad \text{(Eq 3.2)}$$
  

Here B and U are unknown. The multivariate (generalized) least-squares (LS, GLS) estimator of B is the estimator $\hat{B}$ that minimizes the variance of the innovation process (residuals) U. Namely,

$$\hat{B} = \arg\min_B S(B) \quad \text{where} \quad S(B) = \mathrm{tr}\left[(X - BZ)'\,\Sigma^{-1}(X - BZ)\right]$$

It can be shown (Lütkepohl, 2006) that the LS estimator can be obtained by

$$\hat{B} = XZ'(ZZ')^{-1} \qquad \text{(Eq 3.3)}$$

This result can be derived in several ways; however, a simple approach follows from post-multiplying

$$x_t = BZ_{t-1} + u_t$$

by $Z'_{t-1}$ and taking expectations:

$$E(x_t Z'_{t-1}) = B\,E(Z_{t-1} Z'_{t-1}) \qquad \text{(Eq 3.4)}$$

Estimating $E(x_t Z'_{t-1})$ by

$$\frac{1}{T}\sum_{t=1}^{T} x_t Z'_{t-1} = \frac{1}{T} XZ'$$

we obtain the normal equations

$$\frac{1}{T} XZ' = \hat{B}\,\frac{1}{T} ZZ'$$

and thus $\hat{B} = XZ'(ZZ')^{-1}$.
  
The reader may note that $\hat{B}$ is simply the product of X and the Moore-Penrose pseudoinverse of Z: $\hat{B} = XZ^{\dagger}$, where $Z^{\dagger} = \mathrm{pinv}(Z)$. The reader familiar with univariate autoregressive model fitting might also note that (Eq 3.4) is very similar to the well-known system of Yule-Walker equations. Hence, this can be considered an extension to the multivariate case of the Yule-Walker algorithm for univariate AR model fitting.
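For concreteness, a minimal MATLAB sketch of the LS estimator of Eq 3.3 (illustrative only; it drops the first p pre-sample points, and SIFT itself favors the more robust estimators discussed next):

```matlab
% Multivariate least-squares VAR[p] fit (Eq 3.3): Bhat = X*Z'/(Z*Z')
% X is the (M x T) data matrix; Bhat = [v, A1, ..., Ap].
[M, T] = size(X);
Xdep = X(:, p+1:T);                   % observations to be predicted
Z = ones(1, T-p);                     % leading 1 multiplies the intercept v
for k = 1:p
    Z = [Z; X(:, p+1-k:T-k)];         % append lagged observations x_{t-k}
end
Bhat = Xdep * Z' / (Z * Z');          % solves the normal equations
v    = Bhat(:, 1);                    % estimated intercept
A    = reshape(Bhat(:, 2:end), M, M, p);  % A(:,:,k) is the lag-k matrix
Uhat = Xdep - Bhat * Z;               % residuals (used later for validation)
```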
  
Although asymptotically optimal, the LS algorithm often performs sub-optimally when only moderate sample sizes are available, as compared to more robust modified LS algorithms (e.g., the stepwise least-squares ARFIT algorithm) or non-LS algorithms (e.g., the Vieira-Morf lattice algorithm). A detailed empirical performance comparison of these and other algorithms can be found in (Schlögl, 2006). For this reason, SIFT abandons the LS algorithm in favor of these more robust algorithms. The SIFT functions pop_est_fitMVAR() and est_fitMVARKalman() provide access to various model-fitting approaches.
  

3.3. Frequency-Domain Representation

Electrophysiological processes generally exhibit oscillatory structure, making them well suited for frequency-domain analysis (Buzsaki, 2006). A suitably fit autoregressive model provides an idealized model for the analysis of oscillatory structure in stochastic time series (Burg, 1967; Zetterberg, 1969; Burg, 1975; Neumaier and Schneider, 2001). From the AR coefficients, we can obtain a number of useful quantities including the spectral density matrix and the transfer function of the process. From these and related quantities we can obtain power spectra, coherence and partial coherence, Granger-Geweke causality, directed transfer function, partial directed coherence, phase-locking value, and a number of other quantities increasingly being used by the neuroscience community to study synchronization and information flow in the brain (Pereda et al., 2005; Schelter et al., 2006).
  
To obtain our frequency-domain representation of the model, we begin with our VAR[p] model from (Eq 3.1). For simplicity, we will assume the process mean is zero:

$$x_t = \sum_{k=1}^{p} A_k x_{t-k} + u_t$$

Rearranging terms, we get

$$u_t = \sum_{k=0}^{p} \hat{A}_k x_{t-k} \quad \text{where} \quad \hat{A}_k = -A_k \ (k \geq 1) \ \text{and} \ \hat{A}_0 = I$$

Z-transforming both sides yields

$$U(f) = A(f)X(f) \quad \text{where} \quad A(f) = \sum_{k=0}^{p} \hat{A}_k e^{-i 2\pi f k}$$

Premultiplying by $A(f)^{-1}$ and rearranging terms, we obtain

$$X(f) = A(f)^{-1} U(f) = H(f) U(f)$$

Here X(f) is the (M x M) spectral matrix of the multivariate process, U(f) is a matrix of random sinusoidal shocks, and $A(f)^{-1} = H(f)$ is the transfer matrix of the system. Note that H(f) transforms the noise input (U) into the structured spectral matrix. This should give us a hint that analysis of H(f) (and A(f)) will help us in identifying the structure of the modeled system (including information flow dynamics). The spectral density matrix of the process (which contains the auto-spectrum of each variable (at frequency f) on the diagonals and the cross-spectra on the off-diagonals) is given by:

$$S(f) = X(f)X(f)^* = H(f)\,\Sigma\,H(f)^*$$

As we shall see in Section 4.3, from $S(f)$, $A(f)$, $H(f)$, and $\Sigma$ we can derive a number of frequency-domain quantities relevant to the study of oscillations, information flow, and coupling in neural systems.
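A sketch of how these quantities can be evaluated numerically on a frequency grid, given fitted coefficient matrices A(:,:,1..p) and noise covariance Sigma (illustrative; the sampling rate Fs and frequency grid are assumptions, and SIFT computes these internally):

```matlab
% Evaluate A(f), H(f) = A(f)^-1, and S(f) = H(f)*Sigma*H(f)' on a grid.
Fs    = 256;                      % sampling rate in Hz (assumed)
freqs = 1:0.5:50;                 % frequencies of interest in Hz (assumed)
nf    = numel(freqs);
H = zeros(M, M, nf); S = zeros(M, M, nf);
for j = 1:nf
    f  = freqs(j) / Fs;           % normalized frequency (cycles/sample)
    Af = eye(M);                  % Ahat_0 = I
    for k = 1:p
        Af = Af - A(:,:,k) * exp(-1i*2*pi*f*k);  % Ahat_k = -A_k terms
    end
    H(:,:,j) = inv(Af);                          % transfer matrix at f
    S(:,:,j) = H(:,:,j) * Sigma * H(:,:,j)';     % spectral density (' = conjugate transpose)
end
```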
  	
  

3.4. Modeling Non-Stationary Data Using Adaptive VAR Models

In Section 3.1 we stated that data stationarity is a necessary precondition for accurate VAR estimation. However, it is well known that neural data, including EEG and Local Field Potentials (LFPs), can be highly non-stationary, exhibiting large fluctuations in both the mean and variance over time. For instance, a record of EEG data containing evoked potentials (EPs) is a classic example of a non-stationary time series (both the mean and variance of the series change dramatically and transiently during the evoked response). Another example would be EEG data collected during slow-wave sleep, which exhibits slow fluctuations in the mean EEG voltage over time. A number of algorithms have been proposed for fitting VAR models to non-stationary series. In the neuroscience community the most popular approaches include segmentation (overlapping sliding-window) approaches (Jansen et al., 1981; Florian and Pfurtscheller, 1995; Ding et al., 2000), state-space (Kalman filtering) approaches (Schlögl, 2000; Sommerlade et al., 2009), and non-parametric methods based on minimum-phase spectral matrix factorization (Dhamala et al., 2008). All of these approaches are currently – or soon to be made – accessible in SIFT. Here we will briefly outline the concepts behind each modeling approach.

3.4.1. Segmentation-Based Adaptive VAR (AMVAR) Models

A segmentation-based AMVAR adopts an approach rather similar to the concept behind short-time Fourier transforms or other windowing techniques. Namely, we extract a sliding window of length W from the multivariate dataset, and fit our VAR[p] model to this data. We then increment the window by a (small) quantity Q and repeat the procedure until the start of the window is greater than T-W. This produces floor((T-W)/Q+1) VAR coefficient matrices which describe the evolution of the VAR[p] across time. The concept here is that by using a sufficiently small window, the data will be locally stationary within the window and suitable for VAR modeling. By using highly overlapping windows (small Q) we can obtain coefficients that change relatively smoothly with time. Figure 1 shows a schematic of the sliding-window AMVAR approach.
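The procedure can be sketched in a few lines of MATLAB (illustrative pseudostructure; fit_var() is a hypothetical stand-in for any of the estimators above, and in SIFT the whole loop is handled by the model-fitting routines):

```matlab
% Sliding-window AMVAR: fit a VAR[p] model in each window of length W,
% stepping the window by Q samples; yields floor((T-W)/Q + 1) models.
W = 64; Q = 4;                        % window length and step, in samples
nWins = floor((T - W)/Q + 1);
AR = cell(1, nWins);
for w = 1:nWins
    idx   = (w-1)*Q + (1:W);          % sample indices of the w-th window
    AR{w} = fit_var(X(:, idx), p);    % hypothetical VAR[p] estimator
end
```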
  
One concern here is whether sufficient data points are available to accurately fit the model. In the general case, we have M^2 p coefficients (free parameters) to estimate, which requires a minimum of M^2 p data samples. However, in practice, we would like to have at least 10 times as many data points as free parameters (Schlögl and Supp, 2006; Korzeniewska et al., 2008). When multiple realizations (e.g., experimental trials) are available, we can assume that each trial is a random sample from the same stochastic process and average covariance matrices across trials to reduce the bias of our model coefficient estimator (Ding et al., 2000). For the LS algorithm, explained in Section 3.2, this yields the modified estimator:

$$\hat{B} = E\left(X^{(i)} Z^{(i)\prime}\right) E\left(Z^{(i)} Z^{(i)\prime}\right)^{-1} \qquad \text{(Eq 3.5)}$$

where $X^{(i)}$ and $Z^{(i)}$ denote the matrices X and Z for the i-th single trial and the expected value is taken across all trials. This approach effectively increases the number of samples available for a sliding window of length W from W to WN, where N is the number of trials/realizations. This allows us to potentially use very small windows (containing as few as p+1 sample points) while still obtaining a good model fit.
  	
  
When using short windows with multi-trial data, an important preprocessing step is to pointwise subtract the ensemble mean and divide by the ensemble standard deviation (ensemble normalization). This ensures that the ensemble mean is zero and the variance is one, at every time point. This can dramatically improve the local stationarity of the data (Ding et al., 2000).
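A sketch of ensemble normalization for an (M x T x N) multi-trial data array (illustrative; in SIFT this step belongs to the preprocessing module):

```matlab
% Ensemble normalization: at every time point, subtract the across-trial
% (ensemble) mean and divide by the across-trial standard deviation.
% Xtrials is (M x T x N): M variables, T time points, N trials.
[M, T, N] = size(Xtrials);
mu = mean(Xtrials, 3);                % ensemble mean (the ERP), M x T
sd = std(Xtrials, 0, 3);              % ensemble standard deviation, M x T
Xnorm = (Xtrials - repmat(mu, [1 1 N])) ./ repmat(sd, [1 1 N]);
```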
An important result of this normalization is that we are essentially modeling dependencies in the residual time-series after removing the event-related potential (ERP) from the data. The fact that this preprocessing step has become common practice in published applications of AMVAR analysis to neural data suggests that there is, in fact, rich task-relevant information present in the so-called "residual noise" component of the EEG which cannot be inferred from the ERP itself (Ding et al., 2000; Bressler and Seth, 2010). This fits under the model that mean-field electrophysiological measures such as LFPs and EEG measure a sum of (potentially oscillatory) ongoing activity and evoked responses, where the amplitude and phase of the evoked response depend largely on the phase of the ongoing oscillations (Kenet et al., 2005; Wang et al., 2008). Analyzing the phase structure of the stationary ongoing oscillations may provide a deeper insight into the state of the underlying neural system than the analysis of the evoked responses themselves.
  

	
  


Figure 1. Schematic of sliding-window AMVAR modeling. W is the window length, T is the length of each trial in samples, N is the number of trials.

	
  

3.5. Model Order Selection

Parametric VAR model fitting really involves only one parameter: the model order. The most common approach for model order selection involves selecting a model order that minimizes one or more information criteria evaluated over a range of model orders. Commonly used information criteria include the Akaike Information Criterion (AIC), the Schwarz-Bayes Criterion (SBC) – also known as the Bayesian Information Criterion (BIC) – Akaike's Final Prediction Error Criterion (FPE), and the Hannan-Quinn Criterion (HQ). A detailed comparison of these criteria can be found in Chapter 4.3 of (Lütkepohl, 2006). In brief, each criterion is a sum of two terms: one that characterizes the entropy rate or prediction error of the model, and a second that characterizes the number of freely estimated parameters in the model (which increases with increasing model order). By minimizing both terms, we seek to identify a model that is parsimonious (does not overfit the data with too many parameters) while also accurately modeling the data. The criteria implemented in SIFT are defined in Table 2.
  
Table 2. Information criteria for model order selection implemented in SIFT. Here $\hat{T} = TN$ is the total number of samples (data points) used to fit the model.

Estimator                                                 | Formula
Schwarz-Bayes Criterion (Bayesian Information Criterion)  | $SBC(p) = \ln\det\tilde{\Sigma}(p) + \frac{\ln(\hat{T})}{\hat{T}}\, p M^2$
Akaike Information Criterion                              | $AIC(p) = \ln\det\tilde{\Sigma}(p) + \frac{2}{\hat{T}}\, p M^2$
Akaike's Final Prediction Error                           | $FPE(p) = \det\tilde{\Sigma}(p) \left( \frac{\hat{T} + Mp + 1}{\hat{T} - Mp - 1} \right)^{M}$
and its logarithm (used in SIFT)                          | $\ln FPE(p) = \ln\det\tilde{\Sigma}(p) + M \ln\left( \frac{\hat{T} + Mp + 1}{\hat{T} - Mp - 1} \right)$
Hannan-Quinn Criterion                                    | $HQ(p) = \ln\det\tilde{\Sigma}(p) + \frac{2\ln(\ln(\hat{T}))}{\hat{T}}\, p M^2$

For a given information criterion, IC, we select the model order that minimizes IC:

$$p_{sel} = \arg\min_p IC(p)$$

Here, the first term, $\ln\det\tilde{\Sigma}(p)$, is the logarithm of the determinant of the estimated noise covariance matrix (prediction error) for a VAR model of order p fit to the M-channel data, where $\hat{T} = TN$ is the total number of data points used to fit the model (T samples per trial x N trials). The key difference between the criteria is how severely each penalizes increases in model order (the second term). AIC and SBC are the most widely used criteria, but SBC more heavily penalizes larger model orders. For moderate and large $\hat{T}$, FPE and AIC are essentially equivalent (see Lütkepohl (2006), p. 148, for a proof); however, FPE may outperform AIC for very small sample sizes. HQ penalizes high model orders more heavily than AIC but less than SBC. Both SBC and HQ are consistent estimators, which means that $\lim_{N \to \infty} \Pr\{p_{sel} = p_{true}\} = 1$. This cannot be said of AIC and FPE. However, under small sample conditions (small N), AIC/FPE may outperform SBC and/or HQ in selecting the true model order (Lütkepohl, 2006). When modeling EEG data, it is common for AIC and FPE to show no clear minimum over a reasonable range of model orders. In this case, there may be a clear "elbow" in the criterion plotted as a function of increasing model order, which may suggest a suitable model order.
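A sketch of this selection loop for AIC and SBC (illustrative; SIFT's pop_est_selModelOrder() implements the full set of criteria from Table 2, and fit_var() is again a hypothetical estimator returning residuals):

```matlab
% Evaluate AIC and SBC over candidate model orders and select the minimum.
pmax = 30;                                 % maximum candidate order (assumed)
That = T * N;                              % total samples used to fit
aic = zeros(1, pmax); sbc = zeros(1, pmax);
for p = 1:pmax
    [Bfit, Uhat] = fit_var(X, p);          % hypothetical fit returning residuals
    SigmaHat = (Uhat * Uhat') / That;      % estimated noise covariance
    err = log(det(SigmaHat));              % prediction-error term
    aic(p) = err + (2/That)         * p * M^2;   % AIC penalty
    sbc(p) = err + (log(That)/That) * p * M^2;   % SBC penalty
end
[~, p_aic] = min(aic);                     % order minimizing AIC
[~, p_sbc] = min(sbc);                     % order minimizing SBC
```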
  	
  
When selecting a model order for neural connectivity analysis, it is important to consider the dynamics of the underlying physiological system. In particular, one should consider the maximum expected time lag between any two variables included in the model. If we have reason to expect a time lag of τ seconds between any two brain processes, we should make sure to select a model order of p ≥ τFs, where Fs is the process sampling rate in Hz (for example, an expected lag of τ = 0.1 s at Fs = 256 Hz calls for p ≥ 26). Additionally, we should consider that the multivariate spectrum of an M-dimensional VAR[p] model has Mp/2 frequency components (peaks) distributed amongst the M variables (there are Mp complex-conjugate roots of the characteristic equation of the model). This means that we can observe p/2 frequency peaks between each pair of variables (Florian and Pfurtscheller, 1995; Schlögl and Supp, 2006). Thus a reasonable lower bound on the model order might be twice the number of expected frequencies plus one (for the zero-Hz peak). Tests performed by Jansen (1981) and Florian and Pfurtscheller (1995) demonstrated that a potentially optimal model order for modeling EEG spectra was p = 10, although few spectral differences were identified for model orders between 9 and 13. A key point, however, is that this was identified for a sampling rate of 128 Hz, and it is known that the optimal model order depends significantly on the sampling rate of the process (Zetterberg, 1969).

The principal motivation behind heavy penalization of high model orders in an information criterion is to improve forecasting performance by reducing over-fitting. However, forecasting is not necessarily the ultimate goal of our neural modeling approach. Furthermore, selecting a too-small model order can severely impair our frequency resolution (merging peaks together) as well as our ability to detect coupling over long time lags. Where there is a question as to a suitable model order, it is often better to err on the side of selecting a larger model order. As such, a criterion such as HQ, which often shows a clear minimum but affords intermediate penalization between AIC and SBC, may represent an optimal choice for neural data.

In general, it is good practice to select a model order by examining multiple information criteria and combining this information with additional expectations and knowledge specific to the physiological properties of the neural system being analyzed. When possible, spectra and coherence obtained from fitted VAR models should be compared with those obtained from non-parametric methods (such as wavelets) to validate the model. Model order selection is often an iterative process wherein, through model validation, we determine the quality of our model fit and, if necessary, revise our model specification until the data is adequately modeled.

Model order selection is implemented in SIFT using pop_est_selModelOrder().
  

3.6. Model Validation

There are a number of criteria which we can use to determine whether we have appropriately fit our VAR model. SIFT implements three commonly used categories of tests: (1) checking the residuals of the model for serial and cross-correlation (whiteness tests), (2) testing the consistency of the model, and (3) checking the stability/stationarity of the model. These can be accessed through the SIFT GUI using pop_est_validateMVAR().

3.6.1. Checking the Whiteness of the Residuals

Recall the compact model definition from (Eq 3.2): $X = BZ + U$. Here we can regard the VAR[p] model coefficients B as a filter which transforms innovations (random white noise), U, into observed, structured data X. Consequently, for coefficient estimates $\hat{B}$, we can obtain the residuals $\hat{U} = X - \hat{B}Z$. If we have adequately modeled the data, the residuals should be small and uncorrelated (white). Correlation structure in the residuals means there is still some correlation structure in the data that has not been described by our model. Checking the whiteness of residuals typically involves testing whether the residual autocorrelation coefficients up to some desired lag h are sufficiently small to ensure that we cannot reject the null hypothesis of white residuals at some desired significance level.

3.6.1.1. Autocorrelation Function (ACF) Test
  

The	
   (M	
   	
   x	
   M)	
   lag	
   l	
   autocovariance	
   matrix	
   of	
   the	
   residuals	
   is	
   given	
   by	
   Cl = E[uˆt uˆt′−l ] .	
   We	
  
denote	
  the	
  autocovariances	
  up	
  to	
  lag	
  l	
  as	
   Ch = (C1 ,…, Ch ) .	
  The	
  lag	
  l	
  autocorrelation	
  matrix	
  is	
  
given	
  by	
   Rl = D−1Cl D−1 where	
  D	
  is	
  a	
  (M	
  	
  x	
  M)	
  diagonal	
  matrix,	
  the	
  diagonal	
  elements	
  being	
  
the	
   square	
   root	
   of	
   the	
   diagonal	
   elements	
   of	
   C0 .	
   We	
   are	
   generally	
   interested	
   in	
   testing	
   the	
  
(white	
  noise)	
  null	
  hypothesis	
   H 0 : R h = ( R1 ,…, Rh ) = 0 against	
  the	
  alternative	
   H1 : R h ≠ 0 .	
  A	
  
simple	
   test,	
   based	
   on	
   asymptotic	
   properties	
   of	
   univariate	
   white	
   noise	
   processes,	
   involves	
  
rejecting	
  the	
  hypothesis	
  that	
   Û 	
  is	
  white	
  noise	
  at	
  the	
  5%	
  level	
  if	
   Rl > ±2 / Tˆ 	
  for	
  any	
  lag	
  l	
  
(excluding	
   the	
   diagonal	
   elements	
   of	
   R0	
   which	
   are	
   always	
   1).	
   Tˆ = TN is	
   the	
   total	
   number	
   of	
  
samples	
   used	
   in	
   estimating	
   the	
   covariance.	
   However,	
   since	
   this	
   is	
   a	
   pointwise	
   significance	
  
test	
   at	
   the	
   5%	
   level,	
   in	
   practice	
   we	
   expect	
   one	
   in	
   twenty	
   coefficients	
   to	
   exceed	
   2 / Tˆ in	
  
absolute	
  value	
  even	
  if	
   Û is	
  white.	
  A	
  reasonable	
  corrected	
  statistic	
  is	
  thus	
  the	
  probability	
  of	
  a	
  
coefficient	
  exceeding	
  the	
  5%	
  significance	
  bounds:	
  

ρ=

(

count R h > ±2 / Tˆ
count ( R h )

) = count ( R

h

> ±2 / Tˆ

M (h + 1) − M
2

) 	
  	
  

If $\rho < 0.05$, or equivalently $1 - \rho > 0.95$, then we cannot reject the null hypothesis at the 5% level and we accept that the residuals are white.
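As a concrete illustration, the following MATLAB sketch (our own, not a SIFT routine; resid is an M x T-hat matrix of residuals and h the maximum lag) computes the statistic described above:

    % Sketch: fraction of residual autocorrelation coefficients exceeding
    % the pointwise 5% bound (the statistic rho described above).
    [M, That] = size(resid);
    bound = 2/sqrt(That);                 % asymptotic 5% significance bound
    C0    = resid*resid'/That;            % lag-0 covariance
    D     = diag(1./sqrt(diag(C0)));      % inverse std-dev scaling matrix
    nViol = 0;
    for l = 0:h
        Cl = (resid(:,l+1:end)*resid(:,1:end-l)')/That;  % lag-l autocovariance
        Rl = D*Cl*D;                                     % lag-l autocorrelation
        if l == 0, Rl(logical(eye(M))) = 0; end          % diagonal of R0 is always 1
        nViol = nViol + nnz(abs(Rl) > bound);
    end
    rho     = nViol/(M^2*(h+1) - M);      % corrected whiteness statistic
    whiteOK = rho < 0.05;                 % cannot reject whiteness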
  	
  
Due to its simplicity, this sort of test enjoys much popularity. However, it is important to bear in mind that the 5% confidence intervals apply to individual coefficients (i.e., for univariate models) and, although $R_i$ and $R_j$ are asymptotically uncorrelated for $i \neq j$, this is not necessarily true for the elements of $R_i$. As such, this test may be misleading when considering the coefficients of a multivariate model as a group. Additionally, in small-sample conditions (small $\hat{T}$), this test may be overly conservative, such that the null hypothesis is rejected (residuals indicated as non-white) less often than indicated by the chosen significance level (Lutkepohl, 2006).
  
3.6.1.2. Portmanteau Tests

In the previous section, we noted that the simple asymptotic ACF test may yield misleading results when the coefficients are considered independently rather than as a group derived from a multivariate process. In contrast, portmanteau tests are a powerful class of test statistics explicitly derived to test $H_0$ up to some lag h. SIFT implements three portmanteau test statistics: Box-Pierce (BPP), Ljung-Box (LBP), and Li-McLeod (LMP). Under the null hypothesis, for large sample size and large h, each of these test statistics approximately follows a $\chi^2$-distribution with $M^2(h-p)$ degrees of freedom. A p-value can thus be obtained by comparing the test statistic with the c.d.f. of this distribution. If the p-value is greater than the chosen significance level α (e.g., 0.05 for a 5% significance level), we cannot reject the null hypothesis and we accept that the residuals are white. Table 3 lists the three tests implemented in SIFT along with their test statistics and practical notes.
Table 3. Popular portmanteau tests for whiteness of residuals, implemented in SIFT. Here $\hat{T} = TN$ is the total number of samples used to estimate the covariance.

Box-Pierce (BPP)
  Test statistic: $Q_h := \hat{T} \sum_{l=1}^{h} \mathrm{tr}\left( C_l' C_0^{-1} C_l C_0^{-1} \right)$
  Notes: The original portmanteau test. Potentially overly conservative. Poor small-sample properties.

Ljung-Box (LBP)
  Test statistic: $Q_h := \hat{T}(\hat{T}+2) \sum_{l=1}^{h} (\hat{T}-l)^{-1} \mathrm{tr}\left( C_l' C_0^{-1} C_l C_0^{-1} \right)$
  Notes: Modification of BPP to improve small-sample properties. Potentially inflates the variance of the test statistic. Slightly less conservative than LMP, with slightly higher (but nearly identical) statistical power.

Li-McLeod (LMP)
  Test statistic: $Q_h := \hat{T} \sum_{l=1}^{h} \mathrm{tr}\left( C_l' C_0^{-1} C_l C_0^{-1} \right) + \frac{M^2 h(h+1)}{2\hat{T}}$
  Notes: Further modification of BPP to improve small-sample properties without variance inflation. Slightly more conservative than LBP. Probably the best choice in most conditions.
	
  
BPP is the classical portmanteau test statistic. It can be shown that in small-sample conditions (small $\hat{T}$) its distribution under the null hypothesis diverges from the asymptotic $\chi^2$ distribution. This can render it overly conservative, leading us to reject the null hypothesis of white residuals even when the model was appropriately fit.
  
The LBP statistic attempts to improve the small-sample properties of the test statistic. By adjusting each covariance coefficient by its asymptotic variance, it can be shown that, under the null hypothesis, the LBP statistic has a small-sample distribution much closer to the asymptotic distribution than the BPP statistic. However, it can also be shown that the variance of the LBP statistic can become substantially larger than that of its asymptotic distribution.
  
Like LBP, the LMP statistic has better small-sample properties than BPP. However, unlike LBP, it achieves this without inflating its variance. Although less popular than LBP, it has been demonstrated that the variance of LMP is closer to its asymptotic variance, whereas LBP is more sensitive, with significance levels somewhat larger than expected when $\hat{T}$ is large. LMP is slightly conservative, but the statistical power of LMP and LBP is nearly identical. Since, in practice, it is preferable to select the more conservative test among tests with comparable power, LMP may represent an ideal choice of test statistic for most applications.
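For illustration, the LMP statistic and its asymptotic p-value can be sketched in a few lines of MATLAB (our own sketch, not SIFT code; resid is the M x T-hat residual matrix, h the maximum lag, and p the model order; chi2cdf is from the Statistics Toolbox required by SIFT):

    % Sketch: Li-McLeod portmanteau test for whiteness of VAR[p] residuals.
    [M, That] = size(resid);
    C0inv = inv(resid*resid'/That);        % inverse lag-0 covariance
    Qh = 0;
    for l = 1:h
        Cl = (resid(:,l+1:end)*resid(:,1:end-l)')/That;
        Qh = Qh + trace(Cl'*C0inv*Cl*C0inv);
    end
    Qh      = That*Qh + M^2*h*(h+1)/(2*That); % LMP small-sample correction
    pval    = 1 - chi2cdf(Qh, M^2*(h-p));     % asymptotic chi-square p-value
    whiteOK = pval > 0.05;                    % cannot reject whiteness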
  	
  
The interested reader should consult (Lutkepohl, 2006) and (Arranz, n.d.) for additional details and references concerning checking the whiteness of residuals. The whiteness of residuals can be tested in SIFT using est_checkMVARWhiteness().

3.6.2. Checking the consistency of the model

To address the question of what fraction of the correlation structure of the original data is captured by our model, we can calculate the percent consistency (Ding et al., 2000). We generate an ensemble of the same dimensions as the original data by simulating data from the fitted VAR model. For both the real and simulated datasets, we then calculate all auto- and cross-correlations between all variables, up to some predetermined lag. Letting $R_r$ and $R_s$ denote the vectorized correlation matrices of the real and simulated data, respectively, the percent consistency index is given by

$$PC = \left(1 - \frac{\| R_s - R_r \|}{\| R_r \|}\right) \times 100$$

where $\|\cdot\|$ denotes the Euclidean (L2) norm.
A PC value near 100% would indicate that the model is able to generate data with a nearly identical correlation structure to the original data. A PC value near 0% indicates a complete failure to model the data. While determining precisely what constitutes a sufficiently large PC value is an area for future research, a rule of thumb is that a value of PC > 85% suggests the model is adequately capturing the correlation structure of the original data. The percent consistency can be calculated in SIFT using est_checkMVARConsistency().
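A minimal sketch of the index (our own illustration, not SIFT code; xReal and xSim are hypothetical M x T matrices of real and simulated data, with correlations computed up to lag L) might be:

    % Sketch: percent consistency (Ding et al., 2000). Auto- and cross-
    % covariances up to lag L are stacked into the vectors Rr and Rs.
    % (For brevity we use unnormalized covariances; a full implementation
    % would use correlation coefficients.)
    Rr = []; Rs = [];
    for l = 0:L
        Cr = xReal(:,l+1:end)*xReal(:,1:end-l)';
        Cs = xSim(:,l+1:end)*xSim(:,1:end-l)';
        Rr = [Rr; Cr(:)];  Rs = [Rs; Cs(:)];
    end
    PC = (1 - norm(Rs - Rr)/norm(Rr))*100;  % percent consistency index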
  
3.6.3. Checking the stability and stationarity of the model

In section 3.1. we provided a condition for the stability of a VAR[p] process. Namely, an M-dimensional VAR[p] process is stable if all the eigenvalues of the (Mp x Mp) augmented coefficient matrix A have modulus less than 1. Thus, a useful stability index is the log of the largest eigenvalue modulus $\lambda_{max}$ of A:

$$SI = \ln \lambda_{max}$$
A VAR[p] process is stable if and only if SI < 0. The magnitude of the SI can be loosely interpreted as an estimate of the degree to which the process is stable. As mentioned in section 3.1., a stable process is a stationary process. Thus it is sufficient to test for stability of the model to guarantee that the model is also stationary. If the model is not stable, additional tests such as the Augmented Dickey-Fuller test may be used to separately evaluate the stationarity of the data. However, since we are generally interested in modeling stable processes, these additional stationarity tests are not implemented in SIFT. The stability index of a fitted model can be calculated in SIFT using est_checkMVARStability().
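The index itself is simple to compute from fitted coefficients; a minimal sketch (our own; Acat = [A1 ... Ap] denotes the M x Mp matrix of stacked VAR[p] coefficient matrices) is:

    % Sketch: stability index from the (Mp x Mp) augmented coefficient matrix.
    [M, Mp] = size(Acat);  p = Mp/M;
    Aaug = [Acat; eye(M*(p-1)) zeros(M*(p-1), M)];  % companion (augmented) form
    SI   = log(max(abs(eig(Aaug))));                % log of largest eigenvalue modulus
    stableOK = SI < 0;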
3.6.4. Comparing parametric and nonparametric spectra and coherence

Another approach sometimes used to validate a fitted VAR model is to compare the spectra and/or pairwise coherence estimated from the parametric models with those derived from a robust nonparametric approach such as multitapers or wavelets. Using an equation similar to percent consistency, we can estimate the fraction of the nonparametric spectrum or coherence that is captured by our VAR model. Of course, here we are assuming the nonparametric spectra are optimal estimates of the true spectra ("ground truth"), which may not be the case (interestingly, Burg (1967; 1975) demonstrated that, if the data is generated by an AR process and the true model order is known, AR spectral estimation is a maximum-entropy method, which means that it represents an optimal spectral estimator). Nevertheless, if the nonparametric quantities are carefully computed, this can be a useful validation procedure. An upcoming release of SIFT will include routines for computing this spectral consistency index.

4. Granger Causality and Extensions
  
Granger causality (GC) is a method for inferring certain types of causal dependency between stochastic variables based on the reduction in prediction error of a putative effect when past observations of a putative cause are used to predict the effect, in addition to past observations of the putative effect. The concept was first introduced by Norbert Wiener in 1956 and later reformulated and formalized by C.W. Granger in the context of bivariate linear stochastic autoregressive models (Wiener, 1956; Granger, 1969). The concept relies on two assumptions:
Granger Causality Axioms

1. Causes must precede their effects in time.

2. Information in a cause's past must improve the prediction of the effect above and beyond information contained in the collective past of all other measured variables (including the effect).
  
Assumption (1) is intuitive from basic thermodynamical principles: the arrow of causation points in the same direction as the arrow of time – the past influences the future, but not the reverse. Assumption (2) is also intuitive: for a putative cause to truly be causal, removal of the cause should result in some change in the future of the putative effect – there should be some shared information between the past of the cause and the future of the effect which cannot be accounted for by knowledge of the past of the effect.
  
The theory and application of GC (and its extensions) to neural system identification has been elaborated in a number of other articles and texts (Kaminski, 1997; Eichler, 2006; Blinowska and Kaminski, 2006; Ding et al., 2006; Schlögl and Supp, 2006; Bressler and Seth, 2010). As such, here we will only briefly introduce the theory and focus primarily on multivariate extensions of the Granger-causal concept, including the partial directed coherence (PDC) and direct directed transfer function (dDTF).

4.1. Time-Domain GC
  
Suppose we wish to test whether a measured EEG variable j Granger-causes another variable i, conditioned on all other variables in the measured set. Let V represent the set of all measured variables (e.g., all available EEG sources/channels): V = {1, 2, …, M}. Our complete (zero-mean) VAR[p] model is specified as:

$$x_t^{(V)} = \sum_{k=1}^{p} A_k x_{t-k}^{(V)} + u_t$$

We fit the full model and obtain the mean-square prediction error when $x^{(i)}$ is predicted from past values of $x^{(V)}$ up to the specified model order:

$$\mathrm{var}(x_t^{(i)} \mid \bar{x}^{(V)}) = \mathrm{var}(u_t^{(i)}) = \Sigma_{ii}$$

where $\bar{x}^{(V)} = \{x_{t-k}^{(V)},\ k \in \{1,\ldots,p\}\}$ denotes the past of $x^{(V)}$.

Now, suppose we exclude j from the set of variables (denoted V\j) and re-fit the model

$$x_t^{(V \setminus j)} = \sum_{k=1}^{p} \tilde{A}_k x_{t-k}^{(V \setminus j)} + \tilde{u}_t$$

and again obtain the mean-square prediction error for $x^{(i)}$:

$$\mathrm{var}(x_t^{(i)} \mid \bar{x}^{(V \setminus j)}) = \mathrm{var}(\tilde{u}_t^{(i)}) = \tilde{\Sigma}_{ii}$$

In general, $\tilde{\Sigma}_{ii} \geq \Sigma_{ii}$, and $\tilde{\Sigma}_{ii} = \Sigma_{ii}$ if and only if the best linear predictor of $x^{(i)}$ based on the full past $\bar{x}^{(V)}$ does not depend on $x^{(j)}$. This leads us to the following definition for multivariate GC (Eichler, 2006):
  
DEFINITION 1

Let I and J be two disjoint subsets of V. Then $x^{(J)}$ Granger-causes $x^{(I)}$ conditioned on $x^{(V)}$ if and only if the following two equivalent conditions hold:

1. $\tilde{\Sigma}_{ii} \gg \Sigma_{ii}$

2. $A_{k,ij} \neq 0$ for some $k \in \{1,\ldots,p\}$

Here $\gg$ means "significantly greater than." In other words, inferring conditional GC relationships in the time domain amounts to identifying non-zero elements of a VAR[p] coefficient matrix fit to all available variables.
	
  
Granger (1969) quantified DEFINITION 1 for strictly bivariate processes in the form of an F-ratio:

$$F_{ij} = \ln\left(\frac{\mathrm{var}(x_t^{(i)} \mid \bar{x}^{(i)})}{\mathrm{var}(x_t^{(i)} \mid \bar{x}^{(i)}, \bar{x}^{(j)})}\right) = \ln\left(\frac{\tilde{\Sigma}_{ii}}{\Sigma_{ii}}\right) \quad \text{(Eq 4.1)}$$

Here, $F_{ij}$ denotes the GC from process j to process i. This quantity is always non-negative and increases away from zero in proportion to the degree to which the past of process j conditionally explains ("Granger-causes") the future of process i.
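For a concrete bivariate example, Eq 4.1 can be evaluated with the ARFIT package (whose download location is given in section 6.1; we assume ARFIT's arfit() routine, which returns the process intercept, coefficient matrices, and noise covariance; all other names are our own):

    % Sketch: time-domain GC from process j (column 2) to process i (column 1).
    % X : T x 2 data matrix; p : chosen model order.
    [wF, AF, SigFull] = arfit(X, p, p);        % full bivariate VAR[p]
    [wR, AR, SigRed ] = arfit(X(:,1), p, p);   % reduced model: process i alone
    Fij = log(SigRed(1,1)/SigFull(1,1));       % Eq 4.1; > 0 suggests j -> i GC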
  	
  	
  	
  

4.2. Frequency-Domain GC
  
In the frequency domain a very similar definition holds for GC as in the time domain. If we obtain the Fourier transform of our VAR[p] coefficient matrices, A(f), as in section 3.3., then based on the time-domain definition of GC we can derive the following definition for GC in the frequency domain (Eichler, 2006):

DEFINITION 2

Let I and J be two disjoint subsets of V. Then $x^{(J)}$ Granger-causes $x^{(I)}$ conditioned on $x^{(V)}$ if and only if the following condition holds:

$$A_{ij}(f) \neq 0 \ \text{ for some frequency } f$$
  
DEFINITION 2 suggests a simple method for testing multivariate (conditional) GC at a given frequency f: we simply test for non-zero coefficients of |A(f)|. This approach yields a class of GC estimators known as Partial Directed Coherence (PDC) measures (Baccalá and Sameshima, 2001).
A slightly different approach, due to Granger (1969) and later refined by Geweke (1982), provides an elegant interpretation of frequency-domain GC as a decomposition of the total spectral interdependence between two series (based on the bivariate spectral density matrix, and directly related to the coherence) into a sum of "instantaneous", "feedforward", and "feedback" causality terms. However, this interpretation was originally derived only for bivariate processes and, while it has recently been extended to trivariate (and block-trivariate) processes (Chen et al., 2006; Wang et al., 2007), it has not yet been extended to the true multivariate case. An implementation of the Granger-Geweke formulation for bivariate processes is provided in SIFT as the "GGC" connectivity estimator. The interested reader should consult (Ding et al., 2006) for an excellent tutorial on the Granger-Geweke approach.
  
There is a direct relationship between bivariate time-domain and frequency-domain GC. If $F_{ij}$ is the time-domain GC estimator (Eq 4.1) and $W_{ij}(f)$ is the frequency-domain Granger-Geweke estimator, then the following equivalency holds:

$$F_{ij} = \int_0^{F_s/2} W_{ij}(f)\, df$$

It is unknown whether a similar equivalency exists for other multivariate GC estimators, such as the PDC and dDTF. However, in practice, integrating these estimators over a range of frequencies provides a simple way to obtain a general time-domain representation of the estimator.
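Numerically, this integration amounts to one line of MATLAB (our own sketch; W is a hypothetical vector of estimator values evaluated at the frequencies in freqs, spanning 0 to Fs/2):

    % Sketch: collapse a frequency-domain estimator to a time-domain value.
    Fij_td = trapz(freqs, W);   % trapezoidal integration over frequency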
  

4.3. A partial list of VAR-based spectral, coherence and GC estimators
	
  
Table 4 contains a list of the major spectral, coherence, and GC/information-flow estimators currently implemented in SIFT. Each estimator can be derived from the quantities $S(f)$, $A(f)$, $H(f)$, and $\Sigma$ obtained in section 3.3., with the exception of the renormalized PDC (rPDC). The rPDC requires estimating the [(Mp)² x (Mp)²] inverse cross-covariance matrix of the VAR[p] process. SIFT achieves this using an efficient iterative algorithm proposed in (Barone, 1987) and based on the doubling algorithm of (Anderson and Moore, 1979). These estimators and more can be computed using SIFT's functions pop_est_mvarConnectivity() or the low-level function est_mvtransfer().
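A minimal invocation, following the pop_ convention assumed throughout this manual (the exact argument list is SIFT's own and may differ):

>> EEG = pop_est_mvarConnectivity(EEG);  % raises the connectivity-estimation GUI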
  
	
  
Table 4. A partial list of VAR-based spectral, coherence, and information flow / GC estimators implemented in SIFT.

SPECTRAL MEASURES

Spectral Density Matrix
  Formula: $S(f) = X(f)X(f)^* = H(f)\Sigma H(f)^*$
  Notes: (Brillinger, 2001). $S_{ii}(f)$ is the spectrum for variable i. $S_{ij}(f) = S_{ji}(f)^*$ is the cross-spectrum between variables i and j.

COHERENCE MEASURES

Coherency
  Formula: $C_{ij}(f) = \frac{S_{ij}(f)}{\sqrt{S_{ii}(f) S_{jj}(f)}}, \quad 0 \leq |C_{ij}(f)|^2 \leq 1$
  Notes: (Brillinger, 2001). Complex quantity; the frequency-domain analog of the cross-correlation. The magnitude-squared coherency is the coherence $Coh_{ij}(f) = |C_{ij}(f)|^2$. The phase of the coherency can be used to infer lag-lead relationships but, as with cross-correlation, this should be treated with caution if the coherence is low or if the system under observation may contain feedback (i.e., may not be open-loop).

Imaginary Coherence (iCoh)
  Formula: $iCoh_{ij}(f) = \mathrm{Im}(C_{ij}(f))$
  Notes: (Nolte et al., 2004). The imaginary part of the coherency. This was proposed as a coupling measure invariant to linear instantaneous volume-conduction. $iCoh_{ij}(f) > 0$ only if the phase lag between i and j is non-zero, or equivalently, $0 < \mathrm{angle}(C_{ij}(f)) < 2\pi$.

Partial Coherence (pCoh)
  Formula: $P_{ij}(f) = \frac{\hat{S}_{ij}(f)}{\sqrt{\hat{S}_{ii}(f)\hat{S}_{jj}(f)}}$, where $\hat{S}(f) = S(f)^{-1}$ and $0 \leq |P_{ij}(f)|^2 \leq 1$
  Notes: (Brillinger, 2001). The partial coherence between i and j is the remaining coherence which cannot be explained by a linear combination of the coherence between i and j and the other measured variables. Thus, $P_{ij}(f)$ can be regarded as the conditional coherence between i and j with respect to all other measured variables.

Multiple Coherence (mCoh)
  Formula: $G_i(f) = 1 - \frac{\det(S(f))}{S_{ii}(f)\, M_{ii}(f)}$, where $M_{ii}(f)$ is the minor of S(f) obtained by removing the ith row and column of S(f) and returning the determinant.
  Notes: (Brillinger, 2001). Univariate quantity which measures the total coherence of variable i with all other measured variables.

PARTIAL DIRECTED COHERENCE MEASURES

Normalized Partial Directed Coherence (PDC)
  Formula: $\pi_{ij}(f) = \frac{A_{ij}(f)}{\sqrt{\sum_{k=1}^{M} |A_{kj}(f)|^2}}, \quad 0 \leq |\pi_{ij}(f)|^2 \leq 1, \quad \sum_{i=1}^{M} |\pi_{ij}(f)|^2 = 1$
  Notes: (Baccalá and Sameshima, 2001). Complex measure which can be interpreted as the conditional Granger causality from j to i normalized by the total amount of causal outflow from j. Generally, the magnitude-squared PDC $|\pi_{ij}(f)|^2$ is used.

Generalized PDC (GPDC)
  Formula: $\pi_{ij}(f) = \frac{\frac{1}{\Sigma_{ii}} A_{ij}(f)}{\sqrt{\sum_{k=1}^{M} \frac{1}{\Sigma_{kk}^2} |A_{kj}(f)|^2}}, \quad 0 \leq |\pi_{ij}(f)|^2 \leq 1, \quad \sum_{i=1}^{M} |\pi_{ij}(f)|^2 = 1$
  Notes: (Baccalá and Sameshima, 2007). Modification of the PDC to account for severe imbalances in the variance of the innovations. Theoretically provides more robust small-sample estimates. As with PDC, the squared magnitude $|\pi_{ij}(f)|^2$ is typically used.

Renormalized PDC (rPDC)
  Formula: $\lambda_{ij}(f) = Q_{ij}(f)^{*} V_{ij}(f)^{-1} Q_{ij}(f)$, where
  $Q_{ij}(f) = \begin{pmatrix} \mathrm{Re}[A_{ij}(f)] \\ \mathrm{Im}[A_{ij}(f)] \end{pmatrix}$, $\quad V_{ij}(f) = \sum_{k,l=1}^{p} R_{jj}^{-1}(k,l)\, \Sigma_{ii}\, Z(2\pi f, k, l)$,
  $Z(\omega, k, l) = \begin{pmatrix} \cos(\omega k)\cos(\omega l) & \cos(\omega k)\sin(\omega l) \\ \sin(\omega k)\cos(\omega l) & \sin(\omega k)\sin(\omega l) \end{pmatrix}$,
  and R is the [(Mp)² x (Mp)²] covariance matrix of the VAR[p] process (Lütkepohl, 2006).
  Notes: (Schelter et al., 2009). Modification of the PDC. The non-normalized PDC is renormalized by the inverse covariance matrix of the process to render a scale-free estimator (one that does not depend on the unit of measurement) and to eliminate normalization by outflows and the dependence of statistical significance on frequency. To our knowledge SIFT is the first publicly available toolbox to implement this estimator.

DIRECTED TRANSFER FUNCTION MEASURES

Normalized Directed Transfer Function (DTF)
  Formula: $\gamma_{ij}(f) = \frac{H_{ij}(f)}{\sqrt{\sum_{k=1}^{M} |H_{ik}(f)|^2}}, \quad 0 \leq |\gamma_{ij}(f)|^2 \leq 1, \quad \sum_{j=1}^{M} |\gamma_{ij}(f)|^2 = 1$
  Notes: (Kaminski and Blinowska, 1991; Kaminski et al., 2001). Complex measure which can be interpreted as the total information flow from j to i normalized by the total amount of information inflow to i. Generally, the magnitude-squared DTF $|\gamma_{ij}(f)|^2$ is used and, in time-varying applications, the DTF should not be normalized.

Full-Frequency DTF (ffDTF)
  Formula: $\eta_{ij}^2(f) = \frac{|H_{ij}(f)|^2}{\sum_f \sum_{k=1}^{M} |H_{ik}(f)|^2}$
  Notes: (Korzeniewska, 2003). A different normalization of the DTF which eliminates the dependence of the denominator on frequency, allowing more interpretable comparison of information flow at different frequencies.

Direct DTF (dDTF)
  Formula: $\delta_{ij}^2(f) = \eta_{ij}^2(f)\, P_{ij}^2(f)$
  Notes: (Korzeniewska, 2003). The dDTF is the product of the ffDTF and the pCoh. Like the PDC, it can be interpreted as frequency-domain conditional GC.

GRANGER-GEWEKE

Granger-Geweke Causality (GGC)
  Formula: $F_{ij}(f) = \frac{\left(\Sigma_{jj} - (\Sigma_{ij}^2 / \Sigma_{ii})\right) |H_{ij}(f)|^2}{S_{ii}(f)}$
  Notes: (Geweke, 1982; Bressler et al., 2007). For bivariate models (M = 2), this is identical to Geweke's 1982 formulation. However, it is not yet clear how this extends to multivariate models (M > 2).
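To ground these formulas, the following MATLAB sketch (our own, not SIFT code) derives several of the Table 4 estimators at a single frequency from the Fourier-transformed coefficient matrix Af = A(f) and the innovation covariance Sigma:

    % Sketch: estimators from Table 4 at a single frequency.
    % Af : M x M matrix A(f); Sigma : M x M innovation covariance.
    M   = size(Af, 1);
    Hf  = inv(Af);                      % transfer matrix H(f)
    Sf  = Hf*Sigma*Hf';                 % spectral matrix S(f) = H Sigma H*
    Coh = abs(Sf).^2 ./ (real(diag(Sf))*real(diag(Sf))');   % coherence |Cij(f)|^2
    PDC = abs(Af).^2 ./ repmat(sum(abs(Af).^2,1), M, 1);     % squared PDC (column-normalized)
    DTF = abs(Hf).^2 ./ repmat(sum(abs(Hf).^2,2), 1, M);     % squared DTF (row-normalized)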
  
	
  

	
  

4.4. Time-Frequency GC
  
In section 3.4. we discussed using adaptive VAR models to model nonstationary time series. These methods allow us to obtain a sequence of time-varying VAR coefficient matrices. A time-frequency representation of the spectrum, coherence, or information flow/GC can thus easily be obtained by computing one or more of the estimators in Table 4 for each coefficient matrix. Figure 2 shows an example of a time-frequency image of dDTF information flow between two neural processes. Each column of the image corresponds to the dDTF "spectrum" at a given point in time.
	
  
Figure 2. A time-frequency image showing the dDTF between two processes for a selected range of frequencies and times. Frequency is on the y-axis and time on the x-axis. Red (blue) indicates more (less) information flow, relative to a baseline period (purple shaded region).

4.5. (Cross-) correlation does not imply (Granger-) causation
  

	
  
An important result of the definition of Granger causality is that it provides a much more stringent criterion for causation (or information flow) than simply observing high correlation with some lag-lead relationship. A common approach for inferring information flow is to compute the cross-correlation (or cross-partial-correlation) between two variables for a range of time lags and determine whether there exists a peak in the correlation at some non-zero lag. From this we might infer that the leading variable "causes" – or transmits information to – the lagged variable. However, using such an approach to infer causation, or even a direction of information flow, can be quite misleading for several reasons.
Firstly, the cross-correlation is a symmetric measure and is therefore unsuitable for identifying lag-lead relationships in systems with feedback (closed-loop systems) (Chatfield, 1989). It is currently understood that many neural systems exhibit feedback, albeit potentially on a large enough time scale that the system may appear locally open-loop.
Secondly, even if the system under observation is open-loop, a clear peak in the cross-correlation at some non-zero lag would satisfy Assumption 1 of GC (causes must precede effects in time) but not Assumption 2 (the past of a cause must share information with the future of the effect that cannot be explained by the past of all other measured variables, including the effect). In this regard it is fundamentally different from GC. As it turns out, the ability of GC to test Assumption 2 is what makes it such a powerful tool for causal inference, in contrast to simple correlative measures.
To illustrate: suppose we are observing two ants independently following a pheromone trail towards some tasty morsel. Ant 1 started the journey two minutes before Ant 2 and so he appears to be "leading" Ant 2. If we compute the cross-correlation between the two ants' trajectories for a range of time lags, we would find a high correlation between their trajectories and, furthermore, we would find the correlation was peaked at a non-zero lag with Ant 1 leading Ant 2 by a lag of two minutes. But it would be foolish to say that Ant 1 was "causing" the behavior of Ant 2. In fact, not only is there no causal relationship whatsoever between the two, but there is not even any information being transmitted between the two ants. They are conditionally independent of each other, given their own past history and given the fact that each is independently following the pheromone trail (this is the "common (exogenous) cause" that synchronizes their behavior). If we were to intervene and remove Ant 2 (Ant 1), Ant 1 (Ant 2) would continue on his way, oblivious to the fact that his comrade is no longer in lock-step with him. Consequently, if we calculate the Granger-causality between the two trajectories we will find that the GC is zero in both directions: there is no information in the history of either ant that can help predict the future of the other ant above and beyond the information already contained in each ant's respective past.
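The ant scenario is easy to reproduce numerically. In the sketch below (our own illustration; xcorr is from the Signal Processing Toolbox required by SIFT), two signals share a lagged common driver but exchange no information, yet their cross-correlation peaks sharply at a non-zero lag:

    % Sketch: a lagged common driver yields a peaked cross-correlation
    % although no information passes between x1 and x2.
    T     = 5000;  lag = 120;
    trail = randn(T+lag, 1);                    % common "pheromone trail" input
    x1    = trail(lag+1:end) + 0.5*randn(T,1);  % Ant 1 (starts earlier on the trail)
    x2    = trail(1:end-lag) + 0.5*randn(T,1);  % Ant 2 (same trail, lag samples later)
    [xc, lags] = xcorr(x1, x2, 300, 'coeff');   % normalized cross-correlation
    [pk, k]    = max(abs(xc));
    peakLag    = lags(k);   % |peakLag| = 120, yet GC is zero in both directions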
Because the spectral coherence is simply the Fourier transform of the cross-correlation (and therefore the frequency-domain representation of the cross-correlation), the same limitations hold for coherence as for cross-correlation regarding inference of directionality of information flow or causation. Namely, using the phase of coherence to infer directionality of information flow at some frequency (as is often done in the neuroscience community) may be highly misleading if there is even moderate feedback in the system (or if the coherence is low). Coherence is not necessarily a measure of information flow, but rather of correlation between two processes at a particular frequency (a useful analogy here, similar to the ants, is to consider two pendulums on opposite sides of the globe swinging in synchrony at the same frequency, with one pendulum started ¼ cycle before the other – their behavior is coherent, but is there information flow between them?). In contrast, frequency-domain extensions of Granger-causality condition on the past history of the processes and, assuming all relevant variables have been included in the model, can correctly distinguish between such spurious forms of information flow or causation and "true" information flow.

5. Statistics
When making inferences about information flow or causation in neural systems, it is highly important to also produce confidence intervals and statistical significance thresholds for the estimator. The most common statistical tests used in neural system identification are listed in Table 5. Statistical routines in SIFT are designed to address one or more of these three tests and currently fall under two main categories: non-parametric surrogate statistics and asymptotic analytic statistics.
Table 5. Common statistical tests. Here C(i,j) is the measured information flow from process j to process i, Cnull is the expected measured information flow when there is no true information flow, and Cbase is the expected information flow in some baseline period.

Hnull
  Null Hypothesis: C(i,j) ≤ Cnull(i,j)
  Question addressed: Is there significantly non-zero information flow from process j → i?
  Applicable Methods: Phase randomization; analytic tests

Hbase
  Null Hypothesis: C(i,j) = Cbase(i,j)
  Question addressed: Is there a difference in information flow relative to the baseline?
  Applicable Methods: Bootstrap resampling

HAB
  Null Hypothesis: CA(i,j) = CB(i,j)
  Question addressed: Is there a difference in information flow between experimental conditions/populations A and B?
  Applicable Methods: Bootstrap resampling

5.1. Asymptotic analytic statistics
	
  
In recent years, asymptotic analytic tests for non-zero information flow (Hnull) at a given frequency have been derived and validated for the PDC, rPDC, and DTF estimators (Schelter et al., 2005; Eichler, 2006b; Schelter et al., 2009). These tests have the advantage of requiring very little computational time compared to surrogate statistics. However, they are also based on asymptotic properties of the VAR model, meaning they are asymptotically accurate but may suffer from inaccuracies when the number of samples is not very large or when assumptions are violated. Nonetheless, these tests can provide a useful way to quickly check for statistical significance, possibly following up with a more rigorous surrogate statistical test. These tests are implemented in SIFT's stat_analyticTests() function. To our knowledge, SIFT is the only publicly available toolbox that implements these analytic tests.

5.2. Nonparametric surrogate statistics
	
  
Analytic statistics require knowledge of the probability distribution of the estimator in question. When the distribution of an estimator is unknown, nonparametric surrogate statistical methods may be used to test for non-zero values or to compare values between two conditions. Surrogate statistical tests utilize surrogate data (modified samples of the original data) to empirically estimate the expected probability distribution of either the estimator or a null distribution corresponding to the expected distribution of the estimator when a particular null hypothesis has been enforced. Two popular surrogate methods, implemented in SIFT, are bootstrap resampling and phase randomization.

5.2.1. Bootstrap resampling
Bootstrap resampling (Efron and Tibshirani, 1994) approximates the true distribution of an estimator by randomly resampling with replacement from the original set of data and re-computing the estimator on the collection of resampled data. This is repeated many (e.g., 200-2000) times and the value of the estimator for each resample is stored. When the procedure terminates we have an empirical distribution of the estimator from which we can compute the expected value of the estimator, obtain confidence intervals around the expected value, and perform various statistical tests (t-tests, ANOVAs, etc.) to compare different values of the estimator. It can be shown that, under certain conditions, as the number of resamples approaches infinity, the bootstrap distribution approaches the true distribution of the data. The convergence speed depends largely on the nature of the data, but a rule of thumb is that 250-1000 resamples are generally sufficient to produce a reasonable estimate of the distribution. In SIFT the bootstrap distribution of an estimator can be obtained using the stat_bootstrap() function. P-values for Hnull and Hbase significance tests, as well as confidence intervals on the estimators, can be obtained via stat_bootSignificanceTests().
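Conceptually, trial-level bootstrap resampling reduces to a short loop (our own sketch, not SIFT code; computeEstimator() is a hypothetical stand-in for any connectivity estimator):

    % Sketch: bootstrap distribution of an estimator over N trials.
    % data : M x T x N array (variables x samples x trials)
    nBoot = 500;
    N     = size(data, 3);
    boot  = zeros(nBoot, 1);
    for b = 1:nBoot
        idx     = randi(N, [N 1]);                  % resample trials with replacement
        boot(b) = computeEstimator(data(:,:,idx));  % hypothetical estimator
    end
    ci = prctile(boot, [2.5 97.5]);                 % 95% confidence interval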
5.2.2. Phase Randomization
Phase randomization (Theiler, 1992) is a method for testing for non-zero information flow in a dynamical system. The concept is quite simple: we begin by constructing the expected probability distribution (the null distribution) of the estimator when the null hypothesis (no information flow) has been enforced in the data. An observed value of the estimator is then compared to the quantiles of the null distribution to obtain a significance level for rejection of the null hypothesis for that value. Specifically, to generate our null distribution we randomize only the phases of each time-series, preserving the amplitude distribution. We then re-compute our estimator. Repeating this procedure many times produces the desired null distribution. Phase randomization is implemented efficiently in SIFT by applying a fast Fourier transform (FFT) to obtain the complex power spectrum, replacing the phases with those of a uniform random matrix, and finally applying the inverse FFT to obtain our surrogate data matrix. This procedure ensures that (a) the surrogate spectral matrix is Hermitian and (b) the real part of the surrogate spectrum is identical to that of the original data. Since our frequency-domain information flow estimators depend critically on the relative phases of the multivariate time-series, any observed information flow in the surrogate dataset should be due to chance. Therefore, values of the estimator greater than, say, 95% of the values in the null distribution should be significant at the 5% level (p < 0.05). In SIFT, the phase-randomized null distribution can be obtained using the stat_phaserand() function. P-values for significance can be obtained via stat_bootSignificanceTests().
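The surrogate-generation step itself can be sketched in a few lines (our own illustration of the FFT procedure described above; SIFT's implementation additionally enforces Hermitian symmetry of the surrogate spectrum directly):

    % Sketch: phase-randomized surrogate of an M x T data matrix X.
    Xf    = fft(X, [], 2);                     % row-wise complex spectra
    phi   = exp(1i*2*pi*rand(size(Xf)));       % uniformly random phases
    Xsurr = real(ifft(abs(Xf).*phi, [], 2));   % original amplitudes, random
                                               % phases; real() discards the
                                               % small non-Hermitian residue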
  
Importantly, the above tests (both analytic and surrogate) are only pointwise significance tests and therefore, when jointly testing a collection of values (for example, obtaining p-values for a complete time-frequency image), we should expect some number of non-significant values to exceed the significance threshold. As such, it is important to correct for multiple comparisons using methods such as the False Discovery Rate (FDR) (Benjamini and Hochberg, 1995). SIFT currently affords FDR statistical correction using EEGLAB's fdr() function, with other statistical correction methods soon to be made available.

6. Using SIFT to analyze neural information flow dynamics
  
This section provides a demonstration of the use of SIFT to estimate and visualize source-domain information flow dynamics in an EEG dataset. To get the most out of this tutorial you may want to download the toolbox and sample data and follow along with the step-by-step instructions. The toolbox is demonstrated through hands-on examples primarily using SIFT's Graphical User Interface (GUI). Theory boxes provide additional information and suggestions at some stages of the SIFT pipeline.
In order to make the most use of SIFT's functionality, it is important to first separate your data into sources – e.g., using EEGLAB's built-in Independent Component Analysis (ICA) routines. To make use of the advanced network visualization tools, these sources should also be localized in 3D space, e.g., using dipole fitting (pop_dipfit()). Detailed information on performing an ICA decomposition and source localization can be found in the EEGLAB wiki. In this example we will be using two datasets from a single subject performing the two-back-with-feedback continuous performance task depicted in Figure 3 (Onton and Makeig, 2007). Here the subject is presented with a continuous stream of letters, separated by ~1500 ms, and instructed to press a button with the right thumb if the current letter matches the one presented twice earlier in the sequence and press with the left thumb if the letter is not a match. Correct and erroneous responses are followed by an auditory "beep" or "boop" sound. Data is collected using a 64-channel Biosemi system with a sampling rate of 256 Hz. The data is common-average re-referenced and zero-phase high-pass filtered at 0.1 Hz. The datasets we are analyzing are segregated into correct (RespCorr) and incorrect (RespWrong) responses, time-locked to the button press, and separated into maximally independent components using Infomax ICA (Bell and Sejnowski, 1995). These sources are localized using a single or dual-symmetric equivalent-current dipole model with a four-shell spherical head model co-registered to the subject's electrode locations by warping the electrode locations to the model head sphere using tools from the EEGLAB dipfit plug-in.
Figure 3. Two-back with feedback CPT (Onton and Makeig, 2007).

In this exercise we will be analyzing information flow between several of these anatomically-localized sources of brain activity during correct responses and error commission.

6.1. System Requirements
  
It is assumed that you have Matlab® 2008b or later, the Signal Processing Toolbox, the Statistics Toolbox, EEGLAB, and the SIFT toolbox. The latter two are downloadable from http://sccn.ucsd.edu/eeglab/ and http://sccn.ucsd.edu/sift. The SIFT toolbox must be in the Matlab path when starting EEGLAB.

To use the ARFIT model-fitting algorithm and other (recommended) tools, you must download the free ARFIT package from http://www.gps.caltech.edu/~tapio/arfit/. After downloading and unzipping the ARFIT package, place the /arfit/ folder in <sift>/external/, where <sift> is the full path to the SIFT install directory.

6.2. Configuring EEGLAB
  
First start up EEGLAB from the MATLAB® command-prompt:

>> eeglab

Before we load any data, we need to make sure we have the right memory options set. From the EEGLAB menu select File → Memory and other options. Make sure your memory options are set as in Figure 4 below.
  
Figure 4. Memory options for EEGLAB.

6.3. Loading the data
  
Now let's load our sample datasets in EEGLAB. We will first load the RespWrong.set file and then RespCorr.set, located in the /Data/ folder.
	
  
Figure 5. Load RespWrong.set then RespCorr.set.
	
  
Now select both datasets from the Datasets → Select multiple datasets menu. This will enable SIFT to process these datasets in sequence and visualize differences in connectivity between conditions.
	
  
Figure 6. Select both datasets in EEGLAB.

Note that the "ICA weights" field in the dataset description is set to "Yes", indicating we have ICA decompositions for both these datasets. Source separation (and localization, for advanced visualization) is currently a prerequisite for the GUI-based SIFT data-processing pipeline, although it is possible to apply low-level SIFT command-line routines to analyze connectivity using channel data. Future releases of SIFT will support a wider variety of data formats.

6.4. The SIFT analysis pipeline
  

	
  
Now that our data is properly loaded, we are ready to begin the SIFT data-processing pipeline. SIFT can be started from the Tools → SIFT menu. Figure 7 shows SIFT's data-processing pipeline. The sub-menu options correspond to SIFT's five main modules: Pre-Processing, Model Fitting and Validation, Connectivity Analysis, Statistics, and Visualization. Group Analysis is a sixth module, which is applied after a cohort of datasets has been processed using Modules 1-4. This module is currently unavailable in SIFT 0.1a, but it is being integrated into EEGLAB's STUDY module and will be released in SIFT 1.0.
  	
  

	
  

Figure 7. SIFT data-processing pipeline.
  

	
  

6.5. Preprocessing
  	
  

	
  
The first step in our pipeline is to pre-process the data. Select SIFT → Pre-processing to bring up the Preprocessing GUI. This can also be opened from the command-line using

>> EEG = pop_pre_prepData(EEG);
  	
  

	
  

  

The first thing you will see is the splash screen (Figure 8). This lists acknowledgements, disclaimers, and, most importantly, the reference (Mullen et al., 2010) to cite in your presentations/publications if you use SIFT. This is very important for continuing maintenance/development of SIFT, so please don't forget to cite!
  	
  

	
  
Figure 8. SIFT splash screen.
  

Click Close to bring up the Preprocessing GUI. Figure 9 shows the GUI, set to the options we will be using for this example. For most GUIs, help text for each menu item appears in the Help Pane at the bottom of the GUI when the menu item is selected (for other GUIs, help text appears in the tooltip text when the mouse is hovered over a menu item). The VerbosityLevel determines how much information SIFT will communicate to the user, via command-window or graphical output, throughout the remaining pipeline (0 = no (or minimal) command-window output, 1 = full command-window output, 2 = command-window and graphical (progress bars, etc.) output). The Data Selection group contains options for selecting ICs and re-epoching the data. The Filtering group contains options for downsampling the data, filtering, differencing, and linear detrending. Normalization (removal of mean and division by standard deviation) can be performed across the ensemble, across time, or both (first temporal, then ensemble normalization). For our example, we will use the options shown in Figure 9 and in the table below:
  	
  
Option              Value
VerbosityLevel      2
ComponentsToKeep    8; 11; 13; 19; 20; 23; 38; 39
                    (if typing component numbers manually, make sure to
                    use semicolons to delimit numbers)
EpochTimeRange      [-1 1.25]
                    (1 second before the button-press event to 1.25
                    seconds after the event)
NormalizeData       checked
    Method          time; ensemble
  	
  	
  

  

	
  
Once these options have been input, click OK. Both datasets will be pre-processed using these settings, and you will be prompted to create new datasets or overwrite the current datasets. Check "Overwrite" and click OK.
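For scripted workflows, the same settings can tentatively be passed as name/value pairs; the key names below simply mirror the GUI labels and should be verified against >> help pop_pre_prepData for your SIFT version:

>> EEG = pop_pre_prepData(EEG, 'VerbosityLevel', 2, ...
       'ComponentsToKeep', [8 11 13 19 20 23 38 39], ...
       'EpochTimeRange', [-1 1.25], 'NormalizeData', {'time', 'ensemble'});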
  
	
  


Figure 9. Preprocessing GUI (Help Pane at bottom). Accessible through pop_pre_prepData().
  

	
  

	
  
6.5.1. Theory: preprocessing
  	
  
	
  
SIFT currently allows the user access to the following pre-processing options:

1. Component selection
2. Epoching
3. Filtering
4. Downsampling
5. Differencing
6. Detrending
7. Normalization
  
	
  
Many of these preprocessing steps can also be performed from EEGLAB prior to starting SIFT (see the EEGLAB Wiki). Pre-processing can be performed from the command-line using SIFT's pre_prepData() function.
  
	
  
6.5.1.1. Component Selection
  
	
  
Ideally, one should fit a multivariate causal model to all available variables. This helps reduce the chances of false positives in connectivity (e.g., spurious causation) due to exogenous causal influences from excluded variables, i.e., "non-brain" components (Schelter et al., 2006; Pearl, 2009). However, increasing the number of variables also increases the number of parameters which we must estimate using a VAR model. For example, if we wish to fit a VAR model of order p, increasing the number of variables from M to M+1 requires us to estimate (2M + 1)p additional parameters (with M = 8 and p = 15, for instance, adding a ninth variable entails (2·8 + 1)·15 = 255 additional parameters). This in turn increases the minimal amount of data we need to adequately fit our model (see the discussion on Window Length in section 6.6.1.). Thus, when limited data is available, it is often necessary to fit our models to a subset of relevant variables.
  
	
  
Variables can be selected using several approaches. One approach is to estimate the partial coherence between all variables using a non-parametric method (e.g., FFTs or wavelets) and then only keep those variables that show significant partial coherence with at least one other variable. If one is working with ICA components, another (possibly complementary) approach is to select only a subset of ICs that are clearly related to brain activity. This can be performed manually (Onton and Makeig, 2009) or with the assistance of automation tools such as ADJUST (Mognon et al., 2010). The validity of this approach relies on the (rather strong) assumption that ICA has effectively removed all non-brain activity from brain components, such that there is no shared information between variables in the preserved set and those excluded from the set. See (Fitzgibbon et al., 2007; Onton and Makeig, 2009) for discussions on the topic. Both approaches can be performed using standard EEGLAB routines, as documented in the EEGLAB Wiki. In practice, one should apply a combination of these two approaches, selecting the largest subset of partially-coherent ICs which will afford an adequate model fit given the amount of data available, while giving highest priority to those ICs which likely arise from brain sources and which demonstrate significant partial coherence with one or more other "brain" ICs.
  	
  
	
  
Sparse VAR estimation methods generally obviate the need to select variables by imposing additional sparsity constraints on the model coefficients. Although we may have a large number of variables, and therefore a large number of coefficients to estimate, we assume that only a small subset of the coefficients are non-zero at any given time, effectively decreasing the amount of data required to obtain unbiased coefficient estimates (Valdés-Sosa et al., 2005; Schelter et al., 2006). We are currently implementing this approach, and sparse VAR modeling will be incorporated into SIFT in an upcoming release.
  	
  
	
  
6.5.1.2. Epoching
  
	
  
This simply allows the user to analyze a subset of the original epoch. When using a sliding-window AMVAR modeling approach with a window length of W seconds, if one wishes to analyze time-varying connectivity from T1 to T2 seconds, one should choose the epoch to span T1 − 0.5W to T2 + 0.5W seconds. This is because the connectivity estimate at time t will correspond to the connectivity over the W-second window centered at time t. Thus, the earliest timepoint for which we will have a connectivity estimate is 0.5W seconds after the start of the epoch.
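A quick way to compute the required epoch bounds (the numbers here are purely illustrative):

>> W = 0.4; T1 = -0.5; T2 = 1.0;        % window length and analysis interval (sec)
>> epochRange = [T1 - W/2, T2 + W/2]    % epoch to request, here [-0.7 1.2]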
  	
  
	
  
6.5.1.3. Filtering
  
	
  
Filtering can be a useful pre-processing step if the data contains low-frequency drift or pronounced artifacts in one or more frequency bands. Removal of drift (trend) can dramatically improve data stationarity, and thereby the quality of a VAR model fit. Since the relative phases of each time series are a key element in information flow modeling, it is critical to apply a zero-phase (acausal) filter, which introduces no phase distortion. Filtering is performed using EEGLAB's eegfilt() function. This in turn calls filtfilt() from the Matlab Signal Processing Toolbox, which implements a forward-reverse (zero-phase) FIR filter.
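For example, to high-pass the data above 0.5 Hz with this zero-phase filter (the cutoff is illustrative, not a SIFT recommendation):

>> % for continuous data; for epoched data, pass EEG.pnts as the 5th (epochframes) argument
>> EEG.data = eegfilt(EEG.data, EEG.srate, 0.5, 0);   % locutoff = 0.5 Hz, hicutoff = 0 (none)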
  	
  
	
  
6.5.1.4. Downsampling
  
	
  
Downsampling can be an important step when fitting parametric autoregressive models. Time series with high sampling rates generally require large model orders to appropriately capture the process dynamics, particularly if interactions occur over a relatively long time lag. As in the discussion above regarding variable selection, increasing the model order increases the number of model coefficients that must be estimated, which can lead to biased estimates when limited data is available. Generally speaking, spectral and causal estimates derived from models of high order exhibit increased variability relative to those with low model order, which can complicate interpretation unless appropriate statistics are applied (Schelter et al., 2005a). In SIFT, downsampling is implemented using EEGLAB's pop_resample() function, which employs a zero-phase antialiasing filter prior to downsampling. The use of a zero-phase antialiasing filter is critical for the same reasons described above for band-pass filtering.
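For example, to downsample to 256 Hz (an illustrative target rate):

>> EEG = pop_resample(EEG, 256);   % antialiasing filter applied automatically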
  	
  
	
  
6.5.1.5. Differencing
  
	
  
Differencing is a standard operation for improving stationarity prior to fitting time-domain parametric VAR models (Chatfield, 1989). A first-order difference filter for input x is given by y_t = x_t − x_{t−1} = ∇x_t. This operation can be applied repeatedly to obtain an nth-order difference filter: y_t = ∇^n x_t = ∇^{n−1} x_t − ∇^{n−1} x_{t−1}. Orders larger than two are rarely needed to ensure stationarity. An important point to note is that differencing is a high-pass filter and may complicate frequency-domain interpretations of connectivity (Seth, 2010). Differencing is implemented in pre_diffData().
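The core operation is Matlab's diff(); a minimal sketch for a [channels x time] matrix X (note that each application shortens the series by one sample):

>> n = 1;                 % difference order; 1 or 2 suffices in practice
>> Xd = diff(X, n, 2);    % apply the difference operator n times along time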
  
	
  
	
  
Recently published reports have examined the effect of different forms of downsampling, differencing, and filtering on Granger-causal measures, and demonstrate that in some cases these operations may produce spurious connectivity estimates (Florin et al., 2010; Seth, 2010). In general, if the sampling rate is not excessively high (> 500 Hz) and there are not large frequency-specific artifacts in the data, it is advisable to avoid downsampling or filtering. Differencing should also be treated with great caution if one wishes to examine frequency-domain connectivity. In general, one should maintain caution when applying any transformation to the data, either to remove artifacts or to improve stationarity, because it may not be well understood how these operations affect connectivity estimates, particularly under less-than-ideal, real-world conditions. When possible, a safer alternative to stationarity-improving transformations is to instead use adaptive algorithms that either fit a model to locally-stationary windows of data (e.g., sliding-window AMVAR), estimate model coefficients from spectral density matrices (e.g., minimum-phase factorization), or utilize a state-space representation (e.g., Kalman filtering). SIFT allows the user to select from several such adaptive algorithms and also provides methods for rigorously testing the stationarity and quality of the fitted model.
  	
  
	
  
6.5.1.6. Detrending
  
	
  
When only linear trend/drift is present in the data, an alternative to high-pass filtering is to linearly detrend the time series using a least-squares fit. This is a phase-preserving operation. Detrending and centering (subtraction of the temporal mean) are implemented in SIFT's pre_detrend().
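The same least-squares detrend can be applied directly with Matlab's detrend(), which operates along columns; a sketch for a [channels x time] matrix X:

>> X = detrend(X')';   % linearly detrend (and center) each channel over time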
  
	
  
6.5.1.7. Normalization
  
	
  
SIFT allows you to perform two kinds of normalization: ensemble normalization and temporal normalization. In section 3.4.1. we noted that ensemble normalization (pointwise subtraction of an ensemble mean and division by ensemble standard deviation) is an important preprocessing step when using multi-trial sliding-window AMVAR.
  	
  
	
  
In contrast, when using short windows, temporal normalization (subtraction of the mean of each window and division by its standard deviation) is not a good choice, since the small-sample estimate of the mean and variance within each small window will be highly biased (inaccurate). As such, SIFT only allows you to perform temporal normalization across the whole trial, prior to segmentation. This should be performed prior to ensemble normalization and ensures that all variables have equal weight (variance) across the trial. This is important since the units of many of our derived VAR-based causal measures, like any regression method, are not scale-free and depend highly on the units and variance of the signals. Thus, severely unbalanced variances amongst the variables will likely lead to model misspecification (e.g., the variables with the highest variance may be incorrectly indicated as causal sources). It is worthwhile to note that scale-free measures such as renormalized PDC (rPDC) are theoretically immune to this problem. Nonetheless, temporal normalization, when possible and reasonable, is usually a good idea.
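A minimal sketch of the two operations for data X of size [channels x time x trials], in the order described above (temporal, then ensemble); bsxfun() keeps this compatible with older Matlab releases:

>> % temporal normalization: z-score each channel across the whole trial
>> X = bsxfun(@rdivide, bsxfun(@minus, X, mean(X,2)), std(X,0,2));
>> % ensemble normalization: z-score each (channel, time) point across trials
>> X = bsxfun(@rdivide, bsxfun(@minus, X, mean(X,3)), std(X,0,3));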
  
	
  
	
  

6.6. Model Fitting and Validation
  

	
  
Once the data has been pre-processed, we can proceed to Model Fitting and Validation. SIFT 0.1a currently supports parametric VAR modeling using the Vieira-Morf (Marple, 1987) or ARFIT (Schneider and Neumaier, 2001) algorithm (ARFIT must be downloaded separately and installed in <SIFT>/external/arfit/). SIFT currently supports time-varying parametric VAR modeling either through the use of sliding-window adaptive VAR modeling (est_fitMVAR()) or recursive-least-squares Kalman filtering (est_fitMVARKalman()).
  

To model the data using sliding-window AMVAR, select SIFT → Model Fitting and Validation → Fit AMVAR Model. This can also be started from the command-line:

>> EEG = pop_est_fitMVAR(EEG);
You should now see a GUI similar to that displayed in Figure 10. Here we can select the MVAR algorithm, choose the sliding window length and step size, and specify the model order. ARFIT uses a modified least-squares approach, while Vieira-Morf uses a multichannel geometric-mean (non-least-squares) lattice approach to solve for the model coefficients. ARFIT includes an additional term for the process mean, whereas Vieira-Morf assumes a zero-mean process. The current implementation of ARFIT is faster than that of the Vieira-Morf algorithm, although Vieira-Morf returns slightly better coefficient estimates. For a detailed performance comparison between these and other estimators, see (Schlögl, 2006).
  
In this example we will use the Vieira-Morf algorithm. We will also use a window length of 0.4 sec (400 ms) with a step size of 0.01 sec (10 ms). This is approximately 1 cycle of a 2.5 Hz oscillation and should allow us to adequately model information flow from the lowest frequency band of interest (delta) up to our Nyquist frequency of 128 Hz. Consult the theory section (6.6.1.) below for more details on selecting an appropriate window length. Your GUI should appear as shown in Figure 10, with options set as in the table below:
  
Option                   Value
Select MVAR Algorithm    Vieira-Morf
Window length (sec)      0.4
Step Size (sec)          0.01
Model Order              [1 30]
  

	
  
Figure 10. AMVAR model fitting GUI generated by pop_est_fitMVAR(). We have selected a window length of 0.4 sec, a step size of 0.01 sec, and specified a model order range (for order selection) of 1 to 30.

	
  

	
  

  

6.6.1. Theory: selecting a window length
  
	
  
There are several important considerations that can aid us in choosing an appropriate window length; the choice often requires a compromise among the following:

1. Local Stationarity
2. Temporal Smoothing
3. Sufficient amount of data
4. Process Dynamics and Neurophysiology
  
6.6.1.1. Local Stationarity
  
	
  
In section 3.4. we discussed issues surrounding non-stationary EEG data and introduced the concept of using a sliding window to fit VAR models to locally stationary data. It is thus important that our window be small enough to ensure local stationarity. As we shall see in section 6.6.4., validating our model and plotting the stability index for a chosen window size can give us a sense of whether our window is sufficiently small to ensure local stationarity.
  
	
  
6.6.1.2. Temporal Smoothing
  
	
  
A sliding-window analysis is inherently a temporal smoothing operation. Larger windows result in model coefficients being aggregated over increasingly longer segments of data and therefore result in increased temporal smoothing. If there are rapid transient changes in process dynamics, choosing a too-large window may obscure the fine temporal structure of information flow. When multiple trials are available, a useful heuristic proposed by (Ding et al., 2000b) for obtaining a rough upper limit on the window length is to plot the bivariate correlation for a few pairs of variables, across all trials, beginning with a 1-point window and successively averaging correlation within trials and across the window for increasingly larger windows. An illustration from (Ding et al., 2000b) is reproduced in Figure 11. Note that with the 1-point window there are large fluctuations in correlation structure (covariance non-stationarity). As we increase the window length, we get increased temporal smoothing. In this case, a reasonable window length might be 10 points, since it reduces local covariance non-stationarity (local fluctuations in cross-correlation) while still preserving some of the temporal dynamics of interest (namely, the changes in correlation structure). Of course, this suggested window length is completely application-specific; one should select a window tailored to the specific data at hand.
  
	
  

	
  

  

Figure 11. Cross-correlation between two intracranial EEG time-series, averaged over increasing window lengths (1 point, 10 points, 20 points), and plotted as a function of time. Figure reproduced from (Ding et al., 2000b).
  	
  

	
  
6.6.1.3. Sufficient amount of data
  
	
  
In section 3.4.1. we noted that a minimum of M²p data points are required to fit an M-dimensional VAR[p] model. We also stated that, in practice, we would like to have 10 times that many data points to ensure an unbiased fit. This leads us to the rule-of-thumb inequality

    M²p / (NW) ≤ 1/10

or, equivalently,

    W ≥ 10M²p / N
	
  
where W is the window length in points and N is the total number of trials available. SIFT performs checks on parameters (est_checkMVARParams()) and will let you know if the selected window length is sub-optimal (as well as recommend a better window length).
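As a quick check with illustrative numbers (these are not the sample data's actual counts):

>> M = 8; p = 15; N = 100;    % components, model order, trials
>> Wmin = 10*M^2*p/N          % = 96 points, i.e. 0.375 sec at 256 Hz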
  	
  
6.6.1.4. Process dynamics and neurophysiology
  
	
  
In section 3.5. we discussed how, when selecting an appropriate model order, one should take into account the temporal dynamics and neurophysiology of the data. The same principles hold for window length selection. Although, with a large number of trials, we could theoretically fit our model using a window as short as p+1 sample points, we must consider that all interactions being modeled must occur within the selected window. In general, if we expect a maximum interaction lag of τ seconds between any two brain processes, we should make sure to select a window length of W ≥ τ.
  	
  
	
  
Furthermore, if we are interested in frequency-domain quantities, we should consider the time-frequency uncertainty principle, which states that every increase in temporal resolution leads to a concomitant decrease in frequency resolution. In general, a useful rule-of-thumb is to ensure that the window is long enough to span approximately one cycle of the lowest frequency of interest.
  	
  	
  
	
  
	
  
6.6.2. Selecting the model order
  

Now that we have chosen our VAR algorithm, window length, and step size, we can proceed to model order selection. Click Start Model Order Assistant. You should see a command window pop up indicating that some warnings were generated (Figure 12). The Matlab command-window shows the results of a sanity check that evaluates the ratio of parameters to datapoints, calculates the number of estimable frequencies, checks the time-frequency product, and performs other relevant checks. Information is displayed for each condition, along with suggestions on optimal parameters to use if your parameter selections are sub-optimal. Here, we are being warned that the ratio of free parameters to datapoints is greater than 0.1, which may be a cause for concern. This is because our upper model order of 30 is quite large. Let's go ahead and ignore this warning (we are likely to use a much lower model order when we fit our final model). Click OK to continue to the model order selection GUI.
  
	
  

	
  
Figure 12. Sanity check on the model parameters. The command-window shows the results of a sanity check performed on the specified model parameters (this is always performed prior to model fitting).
  

	
  

  

The model order selection GUI is shown in Figure 13. Here we can choose to calculate one or more of the information criteria listed in section 3.5. over a range of model orders (pmin to pmax). If more than one window is available, the information criteria are calculated for each window, and histograms will be generated showing the distribution of optimal model orders for all windows. If we have a large number of windows and are pressed for time, we can specify a random percentage of windows for which to calculate the criteria (% windows to sample). Bear in mind, however, that increasing the number of windows used will result in a better estimate of the empirical distribution of the information criteria across windows. Additionally, for a fast, approximate estimate of the information criteria, we can choose to downdate the model (Neumaier and Schneider, 2001). Rather than fitting pmax − pmin VAR models of increasing order, this fits a single VAR[pmax] model and downdates the noise covariance matrix to obtain approximate estimates of the information criteria for each model order p ∈ {pmin, …, pmax}.
  
For this example, set the parameters as shown in Figure 13 and in the table below:
  	
  
Option                 Value
Order criteria         select all (hold down Ctrl (Win/Linux) or Command (Mac)
                       and click to select multiple criteria)
Downdate model         checked
Model order range      1 - 30
% windows to sample    100
  

Click OK to continue.
  

	
  
Figure 13. The Model Order Selection GUI generated by pop_est_selModelOrder().
  

	
  
	
  

  

A progress bar should now pop up indicating our progress for each condition (RespCorr, RespWrong). When this is complete, you should see the resulting figures shown in Figure 14. On the left is the result of model order selection for RespWrong, and on the right is the result for RespCorr. The top panel shows the information criteria (averaged across windows) plotted versus model order. The vertical lines indicate the average optimal model order (the model order that minimizes the information criterion) for each criterion. The lower array of histograms shows the distribution of optimal model orders for all windows, for each criterion. Note that, as mentioned in section 3.5., for many windows the AIC and FPE criteria do not appear to exhibit a clear minimum across the specified model range. In contrast, SBC shows a clear minimum peaking around p=5 (which is likely too low, given that we will only be able to estimate 2.5 spectral peaks for each pair of variables), while HQ shows a clear minimum around p=11 and p=13 for RespWrong and RespCorr, respectively. Note, however, that in both RespWrong and RespCorr the upper limit on the model order selection criteria is approximately p=15. If we click on the top panel of RespCorr, we get an expanded view of the panel (Figure 15a). Likewise, clicking on the histogram for HQ pops up an expanded view of the histogram (Figure 15b). Note that although the minimum for the HQ criterion (turquoise) is at p=13, the upper limit of the "elbow" for the HQ criterion is around p=15 or p=16. It also appears that AIC/FPE begin to "flatten out" after p=15. From this we might conclude that a suitable, safe model order for all windows and conditions is p=15.
  	
  
	
  

	
  
Figure 14. Results of model order selection for RespWrong (left) and RespCorr (right). The top panel plots the information criteria versus model order and marks the average optimal model order for the selected criteria. If multiple windows of data are available, the bottom panels show histograms of optimal model orders across the selected data windows.
  

	
  

	
  

  

	
  
Figure 15. Left: Close-up view of the SBC (red), HQ (turquoise), FPE (green), and AIC (blue) information criteria plotted versus model order. Note that the FPE and AIC plots are almost identical. Right: Distribution over all windows of the optimal model order using the HQ information criterion. Vertical markers denote the distribution mean. Note the distribution is somewhat bimodal, with one peak around 9 and another around 14.
  	
  

	
  
6.6.3. Fitting the final model
  

Returning to our model order selection GUI, we now set the model order option to 15, as per the previous discussion. The final set of parameters should appear as in Figure 16 and the table below:
Option                   Value
Select MVAR algorithm    Vieira-Morf
Window length (sec)      0.4 (400 ms)
Step size (sec)          0.01 (10 ms)
Model order              15
  

  

	
  
Figure 16. Our final set of selected parameters for model fitting. Note that we have selected the Vieira-Morf algorithm, a window length of 0.4 seconds with a step size of 0.01 sec, and a model order of 15. Upon clicking OK, a progress bar will show us the status of the model-fitting algorithm.
  	
  

	
  
Click OK to continue. Our sanity check should proceed and generate no warnings or errors, indicating we have chosen a valid set of model parameters. For each condition, the VAR[15] model will now be fit for each of the 186 windows. We should now see a progress bar indicating the model fitting progress for each condition. Depending on the speed of your computer, this may take a while, so you might want to get a coffee or do a little yoga or something. If you have limited memory or processor speed, you can increase the step size to 0.03 sec. This will greatly reduce the computation time while still producing results similar to those in the remainder of this exercise.
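For scripted workflows, the final fit can tentatively be written as a single call; the key names below mirror the GUI labels and should be verified against >> help pop_est_fitMVAR for your SIFT version:

>> EEG = pop_est_fitMVAR(EEG, 'Algorithm', 'Vieira-Morf', ...
       'WindowLength', 0.4, 'StepSize', 0.01, 'ModelOrder', 15);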
  	
  
6.6.4. Validating the fitted model
  

After you are refreshed from that yoga session and the model has been fit for each condition, we will need to validate our fitted model. Select SIFT → Model Fitting and Validation → Validate Model from the EEGLAB menu. This can also be achieved from the command-line using:

>> pop_est_validateMVAR(EEG);
You should now be presented with the GUI shown in Figure 17 (left). Here we have the option to check the whiteness of the residuals, percent consistency, and model stability for each (or a random subset) of our windows. As we discussed in section 3.6., residual whiteness tests include Portmanteau and autocorrelation tests for correlation in the residuals of the model (which could indicate a poor model fit). Here we have the option to choose from the Ljung-Box, Box-Pierce, and Li-McLeod multivariate portmanteau tests and a simple autocorrelation function test. Percent consistency denotes the fraction of the correlation structure of the original data that is captured by the model, while the model stability check performs an eigenvalue test for stability/stationarity of the process. The options for this GUI should be set as shown in Figure 17 and the table below:
  

	
  

  

Option                          Value
Check Whiteness of Residuals    checked; select all tests
    significance level          0.05
Check percent consistency       checked
Check model stability           checked
% windows to sample             100
  

	
  
Figure 17. Model Validation GUI generated by pop_est_validateMVAR(). Here we can choose to check the whiteness of residuals, percent consistency, and model stability for all (or some random subset) of windows. In this example, we have chosen a significance threshold of p<0.05 for our whiteness tests.
  	
  

	
  
Click OK to continue. You should now see a sequence of progress bars for each condition, as shown in Figure 17 (right). This may take a while, so go ahead and have another coffee or (preferably) finish that yoga session.
  	
  
Once the model validation routines have completed, you should see the results shown for each condition as in Figure 18. The top panel of each figure shows the results of the whiteness tests as a function of window index (sorted in order of temporal precedence).
  	
  
For the Portmanteau tests, we have plotted the p-value for acceptance of the null hypothesis of correlated residuals (namely 1−p, where p is the p-value for rejection of the null hypothesis). Values greater than 0.05 (blue dashed line) indicate the residuals are white at the p<0.05 level. For the ACF test (green) we have plotted the probability of an observed ACF coefficient being within the expected interval for white residuals. Values greater than 0.95 indicate the residuals are white at the p<0.05 level. The fraction of windows that pass the whiteness test is noted in the legend. Note that the ACF test classifies all windows as having white residuals, while the Portmanteau tests (which are much more conservative) indicate that the majority of windows are white. The fact that a range of windows near the end of the epoch marginally fail the portmanteau tests suggests that we may want to use a slightly larger model order (e.g., p=16) or perhaps a smaller window size (to improve local stationarity).
  
The middle panel shows the percent consistency plotted versus increasing window index. Note that the PC is reliably high (µ ≈ 87%), suggesting a reasonable model fit.
  
The lower panel shows the stability index for each window. Values above or near 0 indicate an unstable (and possibly nonstationary) model; in that case, we might try some additional preprocessing or shorten the window length to improve local stationarity of the data. In our example, the stability index is reliably low, indicating a stable/stationary model.
  	
  
The validation checks all indicate a reasonably fit model (although there may be room for improvement of the fit). Assuming we are comfortable with this, we can now proceed to spectral/connectivity estimation and visualization.
  
	
  

	
  

	
  

  

	
  
Figure 18. Results of model validation for the RespWrong (top) and RespCorr (bottom) conditions. For each condition, a validation statistic is plotted versus window number (sorted in order of temporal precedence). If only one window is available, bar plots are generated instead. The top panel shows the significance level for rejection of the hypothesis of correlated residuals. For Portmanteau tests (LMP, Box-Pierce, Ljung-Box), a value greater than 0.05 (dashed blue line) means we can reject the null hypothesis at the p ≤ 0.05 level (the residuals are white). For the ACF test (green line), a value greater than 0.95 indicates white residuals at the p ≤ 0.05 level. The middle panel shows the percent consistency. The bottom panel shows the stability index.
  

	
  

6.7. Connectivity Estimation
  

	
  
Now that our model has been fit, we'd like to calculate some frequency-domain measures, such as the spectrum, coherence, and Granger-causality. Bring up the Connectivity Estimation GUI by selecting SIFT → Connectivity. You can start this from the command-line for a single dataset EEG using:

>> EEG.CAT.Conn = pop_est_mvarConnectivity(EEG);
You should now see the GUI shown in Figure 19 (left). Here we can compute all the measures (and more) listed in Table 4 in section 4.3. We can specify a list of frequencies at which to compute the measure(s), and we can do some simple conversions of complex measures and spectral densities.
  	
  	
  

	
  

  

For our example, let's compute the direct DTF (with full causal normalization, denoted dDTF08), the complex coherence, the partial coherence, and the complex spectral density over the frequency range 2-50 Hz (with 1 Hz resolution). Your options should be set as in Figure 19 and the table below:
  	
  	
  
Option                                          Value
Select connectivity measures                    Direct DTF (with full causal norm.)
                                                Complex Coherence (Coh)
                                                Partial Coherence (pCoh)
                                                Complex Spectral Density (S)
return squared amplitude of complex measures    checked
convert spectral density to decibels            checked
Frequencies (Hz)                                2:50
  
	
  

	
  
Figure 19. Connectivity estimation GUI generated by pop_est_mvarConnectivity(). Here we have chosen to estimate the Direct DTF (with full causal normalization; dDTF08), the Complex Coherence (Coh), the Partial Coherence (pCoh), and the Spectral Density (S).
  	
  

While selecting additional measures only marginally increases the computational time, doubling the number of measures will generally double the memory demands. Click OK to continue. You should now get a prompt notifying you of how much memory will be required (for each condition). If you have enough memory to continue, click OK. A progress bar will appear showing the status of the connectivity estimation for each condition.
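The resulting estimates are stored in the EEG.CAT.Conn structure; a hedged sketch of inspecting them (field names such as dDTF08 follow the measure abbreviations, but may vary with the SIFT version):

>> Conn = EEG.CAT.Conn;
>> fieldnames(Conn)      % list the stored measures and axes
>> size(Conn.dDTF08)     % expected: [nchs x nchs x nfreqs x ntimes]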
  	
  

6.8. Statistics
  

	
  
	
Once a model has been fit and connectivity estimates computed, we often wish to compute statistics for the dataset. As discussed in section 5., this can be achieved in SIFT using several approaches, including asymptotic analytic tests (for the PDC, RPDC, and DTF measures) and surrogate statistics (bootstrapping, phase randomization). The statistics module in SIFT is currently undergoing major revisions to improve its compatibility with EEGLAB's statistics routines and the STUDY module in EEGLAB. Thus, support for statistics in SIFT has been withdrawn pending the beta release.
  

6.9. Visualization
  

	
  
Once we've computed our connectivity estimates, and potentially computed some statistics, we will want to visualize the results. SIFT currently provides three main visualization programs for exploring results for a single dataset or a cohort of datasets: an Interactive Time-Frequency Grid, an Interactive BrainMovie3D, and Causal Projection (to be released in SIFT 1.0beta). For the next several sections, let's start by visualizing the results for each condition separately. Let's begin by selecting only the RespWrong dataset, as shown in Figure 20.
  
	
  

	
  
Figure 20. Select only the RespWrong dataset to continue.
  

	
  
6.9.1. Interactive Time-Frequency Grid
  

Bring up the Interactive Time-Frequency Grid Options GUI by selecting SIFT → Visualization → Time-Frequency Grid. This can also be achieved from the command-line:

>> pop_vis_TimeFrequencyGrid(EEG);
	
  
This should generate the GUI seen in Figure 21. This GUI has substantially more options than we've previously seen, and we will only briefly introduce them here. Help text for each option can be obtained by expanding the Help Pane at the bottom of the PropertyGrid. The first step in creating a Time-Frequency Grid is to design the grid layout. We can plot time-frequency images of different VAR-based measures on the upper triangle, lower triangle, or diagonal of the grid. This is achieved by setting the MatrixLayout property to Partial and selecting the measures to plot on the various grid components. Next we should decide which FrequenciesToPlot. Usually we want to visualize a subset of all frequencies, to make interesting details more salient. We can also control how the color map is saturated, using a priori color limits or adaptive ones based on percentiles of the data. Picking a good color scaling is important for visual inspection of the data. If source localization has been performed, we can set SourceMarginPlot to dipole to plot the anatomical locations of the sources on the column and row margins. If source locations are not available, but ICA has been performed, we can set this property to topoplot to instead plot the scalp maps of the ICs on the margins. We can provide a Baseline window (in seconds) for computing event-related measures. We can also perform statistical Thresholding, or use simple percentile or absolute thresholds, to establish significance. If the threshold is constant, contours can be plotted around significant regions by enabling PlotContour. Finally, we can customize a wide variety of display options, including placement of event and frequency markers, labels and title strings, font colors and sizes, and more.
  	
  
For	
  this	
  example,	
  make	
  sure	
  your	
  options	
  are	
  set	
  as	
  in	
  Figure	
  21	
  and	
  the	
  table	
  below:	
  
Option                  Value            Description
MatrixLayout            Partial
  UpperTriangle         dDTF08           Put the dDTF08 on the upper triangle
  LowerTriangle         dDTF08           Put the dDTF08 on the lower triangle
  Diagonal              S                Put the power spectra on the diagonal
ColorLimits             99.9             Saturation level for colormaps. Providing a
                                         single number (as shown here) indicates that
                                         we'd like the colormap to saturate at the
                                         99.9th percentile of all measured values
FrequenciesToPlot       2:50
Thresholding            Simple           If statistics are available, we can use them;
                                         otherwise we get a rough sense of significance
                                         by applying simple percentile thresholding
  PercentileThreshold   [97.5, 3]        For each frequency (dimension 3), plot only
                                         values larger (in absolute value) than 97.5%
                                         of all other measured values at that frequency
Baseline                [-1, -0.25]      Subtract the average connectivity in the
                                         pre-event baseline window (1 sec to 1/4 sec
                                         prior to the button-press event) from each
                                         measured value. This produces an event-related
                                         measure
FrequencyMarkers        [3, 7, 10]       Place horizontal markers at these frequencies (Hz)
FrequencyMarkerColor    [0.7, 0.7, 0.7]  The [R G B] color(s) of the frequency markers
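These options can also be supplied from the command line. The following is a hypothetical sketch only: the option names are taken from the table above, but the exact name-value syntax accepted by pop_vis_TimeFrequencyGrid() in this release is an assumption, not a verified call.

% Hypothetical command-line equivalent of the GUI settings above.
% Option names mirror the table; the exact syntax is an assumption.
pop_vis_TimeFrequencyGrid(EEG, ...
    'MatrixLayout',         'Partial',       ...
    'UpperTriangle',        'dDTF08',        ...
    'LowerTriangle',        'dDTF08',        ...
    'Diagonal',             'S',             ...
    'ColorLimits',          99.9,            ...
    'FrequenciesToPlot',    2:50,            ...
    'Thresholding',         'Simple',        ...
    'PercentileThreshold',  [97.5 3],        ...
    'Baseline',             [-1 -0.25],      ...
    'FrequencyMarkers',     [3 7 10],        ...
    'FrequencyMarkerColor', [0.7 0.7 0.7]);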
  	
  

	
  

	
  

  

Figure 21. Interactive Time-Frequency Grid option GUI generated by pop_vis_TimeFrequencyGrid(). Callouts in the figure mark where to lay out the TF Grid, specify color scaling, plot source anatomy on the margins, apply statistical thresholding, examine deviation from baseline, take full control over color and font, and expand the Help Pane. Almost every aspect of the grid is customizable, and only the most commonly-used options are represented in the GUI.

Click OK to continue and generate the Time-Frequency Grid. After a few seconds, you should see a figure similar to Figure 22. Here we have plotted an array of time-frequency images, where frequency is on the y-axis and time on the x-axis. On the upper and lower triangle of the grid (above and below the red-bordered diagonal) we have the dDTF (conditional GC) between each pair of sources. Information flows from columns to rows. Thus the time-frequency (TF) image at (row,col) = (3,1) shows information flow at different times and frequencies from the source above column 1 (IC8) to the source on the left of row 3 (IC13). Note that we have vertical red lines indicating events of interest (here the time of the button-press event) and horizontal gray lines denoting our frequencies of interest (FrequencyMarkers). On the diagonal we have plotted the event-related spectral perturbation (ERSP). Because we provided a baseline, each pixel shows the information flow or spectrum relative to the baseline window. Red denotes more information flow than in the baseline, while blue denotes less. The anatomical dipole locations for each source are rendered on the margins. Clicking on one of these will expand an interactive 3D MRI plot (dipplot).
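If you prefer to work with the numbers directly, the "columns to rows" indexing convention is easy to express in MATLAB. The sketch below is illustrative only: the [to x from x freq x time] array layout and the axis vectors are assumptions standing in for SIFT's internal storage, not documented API.

% Minimal sketch: extract and display the TF image for grid cell (3,1),
% i.e., flow from the source of column 1 into the source of row 3.
C     = randn(8, 8, 49, 100);          % stand-in for a dDTF08 estimate
freqs = 2:50;                          % frequency axis (Hz), assumed
times = linspace(-1, 1.06, 100);       % window center times (s), assumed
tf    = squeeze(C(3, 1, :, :));        % freq x time image, IC8 -> IC13
imagesc(times, freqs, tf); axis xy;    % low frequencies at the bottom
xlabel('Time (s)'); ylabel('Frequency (Hz)');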
  

	
  

  

Clicking on any time-frequency image generates a figure with more detail regarding the interaction between the respective sources.
  	
  
Note the large bursts of information flow and spectral power in the theta (3-7 Hz) and delta (2-3 Hz) bands around, and just after, the erroneous button press. This suggests some kind of transient network synchronization occurring around the time when the button press is made in error. As a side note, observe that, although spectral power often increases with information flow/granger-causality, it does not necessarily do so. Consider IC 38 (7th row and column). It shows very little change in ERSP (cell (7,7)) around the button press, but appears to exhibit large changes in information flow with ICs 11 and 8. As a rule, spectral power modulation and phase synchronization/information flow can occur independently of each other: one does not imply the other. Merely observing two regions concomitantly increase their spectral amplitude does not necessarily suggest that they are communicating. Conversely, observing a lack of event-related spectral power modulation in some putative component of a brain network does not mean it is not critically participating in the network. To explore one of these interactions further, let's go ahead and click on cell (3,1), which corresponds to IC8→IC13.
  	
  
	
  
Figure 22. Interactive Time-Frequency Grid. Annotations highlight the customizable matrix layout (information flows FROM columns TO rows), the frequency (Hz) and time axes, dipole locations on the margins (click to expand), the time-frequency images (click to expand), event and frequency markers, and the ERSP on the diagonal.
  	
  

	
  

  

	
  
Clicking on cell (3,1) should generate an image similar to that in Figure 23a. Here we can explore the interaction between these two processes in greater detail. On the top panel we have the dDTF flow from IC8→IC13 and on the bottom panel we have the feedback flow (IC13→IC8). On the left marginal we have rendered the column-envelope of the matrix (maximum and minimum dDTF across time), while on the bottom marginal we have the row-envelope (maximum and minimum of the dDTF across frequency). The envelopes of the two-sided thresholds for statistical significance (using the percentile threshold) are plotted as green-black lines on the marginals. Values between the threshold lines are considered non-significant and masked in the time-frequency plot. The purple shaded region on the time-marginal indicates our baseline window. Every part of the image is expandable. Clicking on the marginals generates images (b) and (c), while clicking on the source anatomical images generates images (d) and (e).
  	
  
	
  

	
  
Figure 23. Expansion of the Time-Frequency Grid cell (3,1) corresponding to IC8→IC13. As with the Time-Frequency Grid, each element of the Time-Frequency Cell (a, center) is also interactively expandable. On the top panel of (a) we have the dDTF flow from IC8→IC13 and on the bottom panel we have the feedback flow (IC13→IC8). The envelopes of the time-frequency matrix are plotted on the marginals (b, c). Here, two-sided thresholds for statistical significance are plotted as green-black lines on the marginals. The purple shaded region denotes the baseline interval [-1, -0.25] sec. Clicking on the source anatomical images will generate interactive dipole plots of the sources (d, e).
  

By examining this time-frequency image we can see that there is a significant flow of information from IC8 to IC13 in the theta-band around the time of the erroneous button-press. There is also slightly delayed, and damped, feedback from IC13→IC8. This emphasizes the points made in section 4.5 regarding the importance of using an asymmetric measure that can separate feedforward and feedback influences in closed-loop systems. Note that the early information flow is highly peaked around the 5 Hz theta-band, while we see later information flow, around 250-600 ms, shifting to the delta-band (2-3 Hz). This is precisely in line with several observations regarding the functional role in error processing (the so-called Error-Related Negativity (ERN) reported in the ERP literature) and electrophysiology of the cortical area to which IC8 is localized (Anterior Cingulate Cortex (ACC)) (Holroyd and Coles, 2002; Yordanova et al., 2004; Luu et al., 2004; Roger et al., 2010).
  	
  
Returning to our Time-Frequency Grid in Figure 22, and turning our attention to the first column, we note that IC8 (ACC) appears to be exerting a disproportionate amount of theta-band causal influence on the rest of the network around the time of the erroneous button press. IC8 appears to be some kind of hub, synchronizing and communicating with multiple other brain areas when the error is being committed. In order to examine the full network behavior in more detail, let's generate a 3D BrainMovie.
  
6.9.2. Interactive Causal BrainMovie3D
  

The Interactive Causal BrainMovie3D (Mullen and Delorme et al., 2010; Delorme, 2005) is a way of visualizing brain network activity across time, frequency, and anatomical location in the form of an anatomically localized directed graph. Directed graphs (graphical models) are powerful constructs for representing causal network structure (Pearl, 2000; Eichler, 2006a). Graph-theoretic measures are being increasingly used to study brain network organization (Bullmore and Sporns, 2009). The BrainMovie3D provides a way to interactively explore multiple dimensions of source-domain network dynamics and graph structure in an intuitive and aesthetically pleasing manner.
  
To begin, let's bring up the BrainMovie3D GUI by selecting SIFT → Visualization → Causal BrainMovie3D. The command-line analogue is:

>> pop_vis_causalBrainMovie3D(EEG);
  
You should now be presented with a control panel similar to that shown in Figure 24. This GUI has the most options of any thus far, and we will, again, only explore a small subset of the options for this example. The Help Pane (and some adventurous exploration) should allow the user to deduce the function of many of the remaining options.
  	
  
One of the interesting features of the BrainMovie is the ability to modulate the color and size of nodes based on graph-theoretic measures such as inflow/outflow, indegree/outdegree, causal flow, causal density, asymmetry ratio, and other such quantities (Seth, 2005; Bullmore and Sporns, 2009). This is achieved through the NodeColorMapping and NodeSizeMapping properties.
  	
  	
  
Measure           Description
Outflow           Sum connectivity strengths over outgoing edges
Inflow            Sum connectivity strengths over incoming edges
Causal Flow       Outflow - Inflow
Outdegree         Number of significant outgoing edges
Indegree          Number of significant incoming edges
Causal Degree     Outdegree - Indegree
Asymmetry Ratio   $AR = \frac{\mathrm{Outflow} - \mathrm{Inflow}}{\mathrm{Outflow} + \mathrm{Inflow}}$, with $-1 \le AR \le 1$.
                  AR = -1 indicates all connectivity related to that node is
                  inflowing (a causal sink); AR = +1 indicates all connectivity
                  related to that node is outflowing (a causal source); AR = 0
                  indicates either balanced flow or no significant flow.

	
  
Let's begin by starting with the default options and setting the remaining options as shown in Figure 24 and the table below:
  
	
  
Option                   Value             Description
ConnectivityMethod       dDTF08            Which connectivity measure to use
FrequenciesToCollapse    4:7               Collapse frequencies across the theta range
FreqCollapseMethod       Integrate         Which method to use to collapse frequencies.
                                           Integrate: integrate over the selected
                                           frequencies. Mean: take the mean over
                                           frequencies. Max: take the maximum. Peak:
                                           return the peak value over frequencies (a
                                           monotonically increasing or decreasing
                                           sequence does not have a peak)
EdgeColorMapping         Connectivity      The color of the edges will be mapped to
                                           connectivity strength (amount of information
                                           flow along that edge). Red = high
                                           connectivity, Green = low connectivity
EdgeSizeMapping          ConnMagnitude     The size of the edges of the graph
                                           (connecting "arrows") will be mapped to
                                           connectivity magnitude (absolute value of
                                           connectivity strength, useful if there are
                                           negative values, as with event-related
                                           (baselined) or between-condition analysis)
NodeColorMapping         AsymmetryRatio    The color of a node (source) will be mapped
                                           to the asymmetry ratio of connectivity for
                                           that source. Red = causal source, Blue =
                                           causal sink, Green = balanced flow
NodeSizeMapping          Outflow           The size of a node will be mapped to the
                                           amount of information outflow from the source
FooterPanelDisplaySpec   ICA_ERPenvelope   This configures the footer panel at the
                                           bottom of the brainmovie. Here we've chosen
                                           to display the ERP envelope of some
                                           backprojected components
  icaenvelopevars        8                 Backproject the ERP of IC8 (ACC) ...
  backprojectedchans     B1;               ... and compute the envelope only for
                                           channel B1 (FCz)
RotationPath3d           automatic         This creates an automatic rotation of the
                                           brainmovie when we create the final movie
ProjectGraphOnMRI        on                This projects the 3D directed graph onto
                                           the 2D anatomical slices
Thresholding                               If statistics are available, we can use
                                           them; otherwise we get a rough sense of
                                           significance by applying simple percentile
                                           thresholding
  PercentileThreshold    0.05              We will only render the top 5% of all
                                           connections across all time
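For reference, a command-line call mirroring this table might look like the sketch below. This is a hypothetical illustration: the option names are taken from the table, but the exact name-value syntax accepted by pop_vis_causalBrainMovie3D() may differ.

% Hypothetical sketch -- option names mirror the table above;
% the exact command-line syntax is an assumption.
pop_vis_causalBrainMovie3D(EEG, ...
    'ConnectivityMethod',    'dDTF08',         ...
    'FrequenciesToCollapse', 4:7,              ...
    'FreqCollapseMethod',    'Integrate',      ...
    'EdgeColorMapping',      'Connectivity',   ...
    'EdgeSizeMapping',       'ConnMagnitude',  ...
    'NodeColorMapping',      'AsymmetryRatio', ...
    'NodeSizeMapping',       'Outflow',        ...
    'RotationPath3d',        'automatic',      ...
    'ProjectGraphOnMRI',     'on',             ...
    'PercentileThreshold',   0.05);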
  

	
  
A useful feature of the Control Panel is that we can Preview frames from the brainmovie before committing to render the movie. You can also save these preview frames, providing an easy way to create a network image for any desired time point. Now that we have configured our options, go ahead and click on the scrollbar in the Preview BrainMovie panel. It may take a second or two for the brainmovie to render, so be patient and don't click multiple times in rapid succession. If you have graphics problems, try setting the UseOpenGL option to off. If you move the slider to approximately -0.2 seconds (200 ms before the button press) you should see a figure similar to Figure 25.

We are looking at a 3D rendering of the brain of this subject derived from MRI images. To be precise, here we have coregistered (warped) this subject's electrode montage to the 4-shell spherical head model corresponding to the Montreal Neurological Institute (MNI) average brain. This accounts for the low resolution of the MRIs (and much of the error in the dipole localization). If individual MRIs are available for the subject, an individualized head model can be constructed. The outline of the cerebrospinal fluid (CSF) is rendered translucently (RenderCorticalSurface option) to show us the outline of the cortical surface. As described in the section above, node and edge color and size are modulated by one or more network or graph-theoretic measures. Since we have mapped AsymmetryRatio to NodeColor and Outflow to NodeSize, we can immediately see that IC8 (the big red ball in the center) is a causal source hub here, driving many other brain areas in the theta frequency band. Note that the backprojected ERP from IC8 at the bottom of the screen shows a sharp negativity around 40 ms followed by a late positive complex at around 350-400 ms. This is the well-known ERN potential known to be associated with error-processing. Try scrolling to the time point corresponding to the negative peak of the ERN (40 ms) and see what happens to the network (particularly IC8). Try rotating the graph to examine it from different angles. Try scrolling through various stages of the epoch and exploring different mappings for node and edge color and size.
  

	
  

  

!"##$%&'()*'+,'-./(
012(,&1-3(1-4'3*$5"-6(
%'$789-01-36($:'*$3'6(
$-0(2"*'(
;-0'%'-0'-4#/(2$%(
2,#5%#'(3*$%<8
4<'"*'5.(2'$&,*'&(
"-4"(="0'($-0(>03'(
&1?'($-0(."#"*(
@,4"2$5.$##/("*(
2$-,$##/(*"4$4'(/",*(
2":1'(
A*$-&#,.'-4#/(
&,%'*12%"&'($(BC(
."*5.$#(&,*)$.'(
!,&4"21?'(4<'(."#"*(
$-0(&1?'(")(/",*(3*$%<(
'#'2'-4&(
!<""&'()*"2($(-,2D'*(
")(12$3'($-0(2":1'(
",4%,4()"*2$4&(
@%%#/(&4$5&5.$#(
4<*'&<"#01-3(4"(4<'(
3*$%<(
;-4'*$.5:'(D*"E&'(
F$-0(&$:'G()*$2'&()*"2(
4<'(2":1'(

H<'-(*'$0/6(2$7'(
/",*(2":1'I(

	
  
Figure	
  24.	
  The	
  Interactive	
  BrainMovie3D	
  Control	
  Panel.	
  	
  

	
  

  

!"#$%-"*+9"5%*"66$&7"5#&%1"%
$&9?+1$#%IJ%&",6*$%-"*+9"5%
3-'*<'5=%"5%+%5"#$0%*+--&%,7%
+##'9"5+-%'5:"6?+9"5%+K",1%
18$%5"#$%%

C#=$%&'($%)%
*"55$*9D'14%
?+=5'1,#$0%
$1*2%

C#=$%*"-"6%)%
*"55$*9D'14%
&16$5=180%$1*2%

!"#$%*"-"6%)%
+&4??$164%
6+9"0%$1*2%%

34-'5#$6&%1+7$6%'5%18$%
#'6$*9"5%":%;"/%
3-'*<'5=%"5%+5%$#=$%
*+--&%,7%*"66$&7"5#'5=%
>'?$@A6$B,$5*4%'?+=$%

!"#$%&'($%%)%
*+,&+-%",.-"/0%
$1*2%

L6+78%*+5%K$%6"1+1$#%+5#%
(""?$#%'5M",1%1"%+--"/%:,--%
$N7-"6+9"5%%

3,66$51%9?$%

E+*<76"F$*1$#%CGH%:6"?%
&$-$*1$#%*"?7"5$51&%

	
  

Figure	
  25.	
  A	
  frame	
  of	
  the	
  interactive	
  BrainMovie3D	
  at	
  -­‐0.2	
  seconds	
  (-­‐200	
  ms)	
  relative	
  to	
  the	
  event.	
  	
  

	
  
When you are ready, specify an output folder and format under OutputFormat → ImageOutputDirectory and click Make Movie! All frames of the movie will now be rendered and saved to disk. This may take a while, so you might want to pull out that Yoga mat again (you can also choose a narrower MovieTimeRange if you don't want to wait around). If you selected BrainMovieOption → Visibility = On, then you should see each frame rendered on your display. Setting visibility to off will replace the on-screen rendering with a progress bar, which should speed up the movie-making process.
  	
  
Now that we've made our movie, let's take a look at some of our frames. Figure 26 shows three of these frames, corresponding to the start (-523 ms), middle (40 ms), and end (606 ms) of our button-press task. Note that at the start of the epoch, the network is initially quiescent, with some weak communication between sources in or near anterior rostral ACC (RCZa; IC 11) and supplementary motor area (SMA/preSMA; IC 38).
  	
  

	
  

  

Moving to the time just following the button-press event (center frame), we see that IC8, located in posterior ACC (RCZp/CCZ), has now become a central causal hub, exerting significant influence on several areas of the network, but particularly posterior parietal cortex (IC13) and RCZa. There is some bidirectional flow, but the flux is largely outward from IC8, as indicated by the red hue of the node (indicating a large positive asymmetry ratio). Note that this corresponds precisely to the negative peak of the ERN. However, we are not modeling dependencies in the event-locked ERN itself (which is an ERP and subtracted during ensemble normalization) but rather in the ongoing oscillations underlying the ERN complex.
  	
  
Moving to the end of the epoch, around 606 ms, we see that the network has almost returned to its initial decoupled state, and examining the last frames of the movie will reveal the complete decoupling of IC8 from the rest of the network. This panel seems to implicate RCZp/CCZ as some sort of causal hub in a cortical network for error-processing. As noted in section 6.9.1, this is entirely consistent with the theoretical (and partly experimentally verified) role of RCZp/CCZ in error processing.
  
	
  

	
  
Figure 26. Three frames of a causal BrainMovie3D showing transient theta information flow during error commission. The frames correspond to -523 ms (left), 40 ms (center), and 606 ms (right) relative to the button press (0 ms).
  	
  	
  

It is left as an exercise to the reader to do the following:

1. Try creating brainmovies for other frequency bands (e.g., delta). What is different between the evolution of the delta-band cortical network and the theta-band network? You can even have the brainmovie find the peaks over some frequency range (e.g., 2-9 Hz) and map the peak frequency onto edge color to color-code different frequency-specific sub-networks (Hint: examine the FreqCollapseMethod and EdgeColorMapping properties).

2. Select both the RespWrong and RespCorr datasets and create TimeFrequencyGrid images and BrainMovies for the between-condition differences (if more than one dataset is selected, TimeFrequencyGrid and BrainMovie3D automatically assume you want to examine the between-condition difference). Is there more theta-band information flow from RCZp during error commission than during correct button presses? What about the delta band?
  
	
  

	
  

  

6.9.3. Causal Projection
  

When we are interested in visualizing the anatomical distribution of univariate measures such as ERSP and multiple coherence, as well as graph-theoretic measures such as causal flow and asymmetry ratio, an alternative to creating a brainmovie is to create a Causal Projection (CP) image (Mullen et al., 2010a). CP is based on the dipole density concept proposed by Delorme and Makeig in 2003. The basic assumption behind CP is that each dipole located at coordinate c = [x y z] has some spatial localization uncertainty, which we will (naively) approximate by a three-dimensional Gaussian distribution with mean μ = c and covariance Σ. Furthermore, each dipole has some functional measure (spectral modulation, information outflow, etc.) associated with its source process. We can then compute the probability of observing a dipole at a given voxel, weighted by the amplitude of the specified measure, to obtain an estimate of the likelihood of observing a source in a particular brain location, and of that source having a significant value for the measure of interest.

Informally, the CP at a given voxel is a weighted sum of Gaussian-projected distances to all (neighboring) dipoles, each weighted by the amplitude of the specified measure (causal outflow, ERSP, etc.) for the source associated with that dipole.
  
Formally, if $w_i$ is the measure associated with the $i$-th source, $c_i$ is the coordinate of the $i$-th dipole location, and $\Sigma_i$ is the covariance matrix of the corresponding Gaussian (reflecting the uncertainty in localization and/or inter-subject variability), then for a voxel with 3-dimensional coordinate $v$ we have that

$$ CP(v) = \frac{1}{Z} \sum_{i=1}^{n} w_i \, N(v, c_i, \Sigma_i) $$

where $Z$ is an optional normalization term to ensure a total probability mass of 1 over the brain volume, and

$$ N(v, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2} \, |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (v - \mu)^{\top} \Sigma^{-1} (v - \mu) \right) $$

is the multivariate Gaussian p.d.f. evaluated at $v$.
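As a concrete illustration of the formula, the CP at a single voxel can be computed in a few lines of MATLAB. This is a minimal sketch, not the SIFT implementation: it assumes a common covariance Σ for all dipoles and omits the normalization term Z, and all numeric values are toy inputs.

% Toy inputs (all values hypothetical):
dipXYZ = [0 0 0; 10 5 -2];   % [n x 3] dipole coordinates c_i (mm)
w      = [1.0; 0.5];         % measure w_i for each source (e.g., outflow)
Sigma  = 100 * eye(3);       % shared localization-uncertainty covariance
v      = [2 1 0];            % voxel coordinate at which to evaluate CP(v)

normConst = 1 / ((2*pi)^(3/2) * sqrt(det(Sigma)));
cp = 0;
for i = 1:size(dipXYZ, 1)
    d  = (v - dipXYZ(i,:))';                        % 3x1 offset v - c_i
    cp = cp + w(i) * normConst * exp(-0.5 * (d' / Sigma) * d);
end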
  
This measure can easily be generalized to multi-subject datasets by simply combining all dipoles from all subjects in the same MNI volumetric space. Figure 27 shows the result of applying Causal Projection to a cohort of 26 subjects performing the two-back task described earlier. Here we have projected the delta-theta band outflows and inflows from each source that exhibited a significant difference between the RespWrong and RespCorr conditions. Note the causal source hub in RCZp and sinks in RCZa and PPC following erroneous button presses. We can simultaneously plot two measures (e.g., outflow and inflow) or two conditions using gamma-corrected color mapping (lower panel). In this manner red might represent increases in the first measure (outflow), and green increases in the second measure (inflow), with the sum of the two colors (yellow) representing no difference between measures (balanced flow) and transparency representing non-significance (no flow).

Causal projection is currently being revamped for inclusion in SIFT 0.1b (pop_vis_causalProjection()).
  

	
  

  

Figure 27. A frame (250 ms following the button press) of a Causal Projection movie computed across a cohort of 26 subjects performing the two-back CP task, showing inflow and outflow for Error > Correct (p < 0.05) in the 3-7 Hz band. Here we have projected the delta-theta outflows and inflows from each source that exhibited a significant difference between the RespWrong and RespCorr conditions. Note the causal source hub in RCZp and sinks in RCZa and PPC.
  

	
  

6.10. Group Analysis
  
	
  
Cognitive experiments are typically carried out across a cohort of participants, and it is useful to be able to characterize differences in brain network activity within and between groups of individuals for different conditions. The Group Analysis module in SIFT, currently under development, will afford several routines for assessing group-level connectivity networks with confidence intervals. While such analysis is relatively simple when performing analyses on scalp channels, it becomes more complicated when estimating connectivity in the source domain between dipolar IC processes. This is primarily due to the fact that it is often difficult to equate IC sources between participants. While we typically utilize clustering techniques to help equate dipolar sources across participants, in some cases a subset of participants will still not exhibit one or more sources approximately matching those observed in all other participants. If one does not take these missing variables into account, one risks obtaining biased estimates of average connectivity across the subject population. This missing-variable problem is well-known in statistics, and several approaches have been proposed for dealing with it. Currently, group analysis in the source domain is implemented using two methods: disjoint clustering, which does not take into account the missing-variable problem but may still be useful for a general analysis, particularly if there is high agreement across the cohort of datasets in terms of source location; and a Bayesian mixture model approach, which provides more robust statistics across datasets. A brief description of these methods is provided below, and a more detailed description will be included with the beta release of SIFT.
  	
  
6.10.1. Disjoint Clustering
  	
  

This approach adopts a 3-stage process:

1. Identify K ROIs (clusters) by affinity clustering of sources across the subject population using EEGLAB's Measure-Product clustering.

2. Average all incoming and outgoing individually statistically significant connections between each pair of ROIs to create a [K x K x freq x time] group connectivity matrix (a sketch of this averaging step is given below the list).

3. Visualize the results using any of SIFT's visualization routines.

This method suffers from low statistical power when subjects do not have high agreement in terms of source locations (the missing-variable problem).
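The averaging in step 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the SIFT implementation: clusterIDs{s} is assumed to map each of subject s's sources to an ROI in 1..K, and Csig{s} is assumed to hold that subject's significance-masked [n x n x freq x time] connectivity array.

% Toy setup (all names and values hypothetical): 2 subjects, 3 sources each.
K = 2; nFreq = 49; nTime = 100;
clusterIDs = {[1 2 1], [2 1 2]};
Csig = {randn(3,3,nFreq,nTime), randn(3,3,nFreq,nTime)};

groupC = zeros(K, K, nFreq, nTime);   % running sum over ROI-pair connections
count  = zeros(K, K);                 % number of connections per ROI pair
for s = 1:numel(Csig)
    ids = clusterIDs{s};              % ROI assignment of subject s's sources
    for i = 1:numel(ids)
        for j = 1:numel(ids)
            if i == j, continue; end  % skip self-connections
            groupC(ids(i), ids(j), :, :) = groupC(ids(i), ids(j), :, :) ...
                                           + Csig{s}(i, j, :, :);
            count(ids(i), ids(j)) = count(ids(i), ids(j)) + 1;
        end
    end
end
groupC = bsxfun(@rdivide, groupC, max(count, 1));  % [K x K x freq x time] average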
  
	
  
6.10.2. Bayesian Mixture Model
  

A more robust approach (in development with Wes Thompson and to be released in SIFT 1.0b) uses smoothing splines and Monte-Carlo methods for joint estimation of the posterior probability (with confidence intervals) of cluster centroid location and between-cluster connectivity. This method takes into account the "missing variable" problem inherent to the disjoint clustering approach and provides robust group connectivity statistics.
  

7. Conclusions and Future Work
  
	
  
In this paper we have introduced a new, open-source (Matlab-based) toolbox for electrophysiological information flow analysis which functions as a plugin for the EEGLAB environment. We sought to outline the theoretical basis for vector autoregressive (VAR) model fitting of electrophysiological data, as well as some VAR-based measures for multivariate granger-causality and spectral analysis in the time and frequency domains. We then demonstrated the applicability of these approaches through simulations and, using the SIFT toolbox, the analysis of EEG data from an individual performing an error-inducing cognitive task.

Although the current release of SIFT is alpha (and therefore lacking several important features which are currently in development, such as group analysis and integrated statistics), SIFT 0.1-beta, scheduled for release in January, will contain these features and more.
  
	
  

	
  

  

8. Acknowledgements
  
	
  
I would like to first express my deep appreciation to Arnaud Delorme and Christian Kothe, who have helped in the development of this toolbox. Dr. Delorme is the principal developer of EEGLAB and has helped substantially with integrating the toolbox into the EEGLAB environment, as well as with modifications of his brainmovie3d.m and dipoledensity.m functions, on which the causal brainmovie and causal projection functions have been based. Mr. Kothe contributed to many conversations regarding the structure of the toolbox and contributed the function input/output specification and PropertyGrid code, which is used in some of the graphical user interfaces.

I would also like to thank Scott Makeig for his constant encouragement and intellectual contributions in developing this toolbox. Additionally, I'd like to thank my undergraduate/postgraduate advisor Dr. Robert Knight (UC Berkeley), who supported my development of the ECViz toolbox, on which the concept of SIFT was based.

Thanks also go to Virginia De Sa and to Doug Nitz for serving as my project committee members and for their constant patience throughout the development of this project.

Finally, a big thanks goes to Nima Bigdely Shamlo, Julie Onton, Thorsten Zander, and others from SCCN and elsewhere who have contributed ideas, datasets, visualization code, and other useful items to this project.

The author (Tim Mullen) is generously supported by a San Diego Fellowship, a Glushko Fellowship (Dept. of Cognitive Science), and endowments from the Swartz Foundation (Old Field, NY).

Parametric model-fitting in SIFT makes use of modified routines from Alois Schloegl's open-source TSA+NaN package and, if downloaded separately, Tapio Schneider and Arnold Neumaier's ARfit package.
  
	
  

	
  

	
  

  

	
  

9. Appendix I: SIFT 0.1a Function Reference
  
	
  
The table below is a partial reference for SIFT 0.1a functions. Not all functions are documented in this list.
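Before the table, here is a sketch of how these functions typically chain together in a command-line workflow. The sequence is assembled from this manual's earlier examples; the exact input/output signatures of each call are assumptions.

% Hypothetical end-to-end workflow using the GUI-level (pop_*) functions.
% Each call opens the corresponding GUI; argument lists are assumptions.
EEG = pop_pre_prepData(EEG);          % preprocess the data
EEG = pop_est_selModelOrder(EEG);     % inspect model order selection criteria
EEG = pop_est_fitMVAR(EEG);           % fit an (adaptive) VAR model
EEG = pop_est_validateMVAR(EEG);      % check whiteness/consistency/stability
EEG = pop_est_mvarConnectivity(EEG);  % compute connectivity measures
pop_vis_TimeFreqGrid(EEG);            % visualize as a Time-Frequency Grid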
  

Function Name                    Description

GUI functions
  pop_pre_prepData               generate GUI for data preprocessing
  pop_est_fitMVAR                generate GUI for VAR/AMVAR model fitting
  pop_est_selModelOrder          generate GUI for VAR model order selection
  pop_est_validateMVAR           generate GUI for VAR model validation
  pop_est_mvarConnectivity       generate GUI for computing connectivity measures
  pop_vis_TimeFreqGrid           generate GUI for Interactive Time-Frequency Grid
  pop_vis_causalBrainMovie3D     generate GUI for Interactive Causal BrainMovie3D

Preprocessing
  pre_detrend                    linearly detrend or center an ensemble of data
  pre_diffData                   apply a difference filter to an ensemble of data
  pre_normData                   apply temporal or ensemble normalization to an
                                 ensemble of data
  pre_prepData                   preprocess an ensemble of data (calls other
                                 subfunctions)
  pre_selectComps                select a set of independent components from the data

Modeling
  est_calcInvCovMat              compute inverse covariance matrix of a VAR process
  est_calcInvCovMatFourier       compute frequency-domain transformation of the
                                 inverse covariance matrix of a VAR process
  est_calcInvCovMatFourierPDC    same as above, but a specific version used for
                                 analytic PDC significance thresholds
  est_checkMVARConsistency       check the percent consistency of a fitted VAR model
  est_checkMVARParams            perform a sanity check on a set of specified MVAR
                                 parameters; return recommendations on optimal
                                 parameters, if relevant
  est_checkMVARStability         check the stability/stationarity of a fitted VAR model
  est_checkMVARWhiteness         check the whiteness of the residuals of a fitted
                                 VAR model
  est_eigenmode                  return the eigenmodes of a VAR process (requires
                                 ARFIT package)
  est_fitMVAR                    fit a VAR[p] model to the data using one of several
                                 algorithms (Vieira-Morf, ARFIT, MLS, etc.).
                                 Optionally can use a sliding window to perform
                                 segmentation-based adaptive MVAR analysis. Calls
                                 modified routines from Alois Schloegl's open-source
                                 TSA package or from the ARFIT package
  est_fitMVARKalman              fit a VAR[p] model to continuous data using a
                                 Kalman filter. Adapts code from Alois Schloegl's
                                 open-source TSA package
  est_MVARConnectivity           compute spectral density, coherence, and
                                 connectivity measures from a fitted VAR model
  est_mvarResiduals              return the residuals of a fitted VAR model
  est_mvtransfer                 compute frequency-domain quantities from a VAR
                                 model (spectrum, coherence, granger-causality, etc.)
  est_selModelOrder              evaluate and return model order selection criteria
                                 (AIC, SBC, FPE, HQ) for a range of model orders

Statistics
  stat_bootSignificanceTests       perform bootstrap significance tests on
                                   connectivity structure
  stat_analyticSignificanceTests   perform asymptotic analytic significance tests
                                   on connectivity structure
  stat_phaserand                   return a distribution satisfying the null
                                   hypothesis of no connectivity using a
                                   phase-randomization approach (Theiler, 1997)
  stat_bootstrap                   return a bootstrapped distribution of a
                                   spectral/connectivity estimator
  stat_prctileTest                 perform one- or two-sided percentile tests to
                                   compare an observed value with the quantiles of
                                   a (null) distribution

Visualization
  vis_TimeFreqGrid               low-level function to create an interactive
                                 Time-Frequency Grid
  vis_TimeFreqCell               low-level function to render an expanded (detailed)
                                 version of a single cell of the Time-Frequency Grid
  vis_causalBrainMovie3D         low-level function to generate a causal BrainMovie3D
  vis_causalProjection           in development; low-level function to generate a
                                 Causal Projection image or movie

Simulations
  sim_genVARModelFromEq          generate an arsim()-compatible VAR specification
                                 from a text-based equation

Helpers
  hlp_*                          a large number of helper functions (to be
                                 documented later)
  

	
  
	
  
	
  

	
  

	
  

  

10. References
  
	
  
Anderson	
  BDO,	
  Moore	
  JB	
  (1979)	
  Optimal	
  Filtering.	
  Englewood	
  Cliff,	
  NJ:	
  Prentice-­‐Hall.	
  
Arranz	
   MA	
   Portmanteau	
   Test	
   Statistics	
   in	
   Time	
   Series.	
   packages.tol-­‐project.org	
   Available	
   at:	
  
http://packages.tol-­‐project.org/docs/ndmtest.pdf.	
  
Astolfi	
  L,	
  Cincotti	
  F,	
  Mattia	
  D,	
  Marciani	
  MG,	
  Baccala	
  L	
  a,	
  Vico	
  Fallani	
  F	
  de,	
  Salinari	
  S,	
  Ursino	
  
M,	
   Zavaglia	
   M,	
   Ding	
   L,	
   Edgar	
   JC,	
   Miller	
   G	
   a,	
   He	
   B,	
   Babiloni	
   F	
   (2007)	
   Comparison	
   of	
  
different	
   cortical	
   connectivity	
   estimators	
   for	
   high-­‐resolution	
   EEG	
   recordings.	
   Human	
  
brain	
  mapping	
  28:143-­‐57	
  
Baccalá	
   LA,	
   Sameshima	
   K	
   (2001)	
   Partial	
   directed	
   coherence:	
   a	
   new	
   concept	
   in	
   neural	
  
structure	
  determination.	
  Biological	
  cybernetics	
  84:463-­‐74	
  
Baccalá	
   LA,	
   Sameshima	
   K	
   (2007)	
   Generalized	
   partial	
   directed	
   coherence	
   In	
   Digital	
   Signal	
  
Processing,	
  2007	
  15th	
  International	
  Conference	
  on	
  IEEE,	
  p.	
  163–166.	
  	
  
Barone	
  P	
  (1987)	
  A	
  method	
  for	
  generating	
  independent	
  realizations	
  of	
  a	
  multivariate	
  normal	
  
stationary	
  and	
  invertible	
  ARMA(p,q)	
  process.	
  J.	
  Statist.	
  Comput.	
  Simulat.	
  8:273-­‐83	
  	
  
Bell	
  AJ,	
  Sejnowski	
  TJ	
  (1995)	
  An	
  information-­‐maximization	
  approach	
  to	
  blind	
  separation	
  and	
  
blind	
  deconvolution.	
  Neural	
  computation	
  7:1129–1159	
  	
  
Benjamini	
   Y,	
   Hochberg	
   Y	
   (1995)	
   Controlling	
   the	
   false	
   discovery	
   rate:	
   a	
   practical	
   and	
  
powerful	
   approach	
   to	
   multiple	
   testing.	
   Journal	
   of	
   the	
   Royal	
   Statistical	
   Society.	
   Series	
   B	
  
(Methodological)	
  57:289–300	
  	
  
Blinowska	
   K,	
   Kaminski	
   M	
   (2006)	
   Multivariate	
   Signal	
   Analysis	
   by	
   Parametric	
   Models	
   In	
   B.	
  
Schelter,	
   M.	
   Winterhalder,	
   &	
   J.	
   Timmer,	
   eds.	
   Handbook	
   of	
   Time	
   Series	
   Analysis	
   Wiley,	
  
Wienheim.	
  
Bressler	
   SL,	
   Richter	
   CG,	
   Chen	
   Y,	
   Ding	
   M	
   (2007)	
   Cortical	
   functional	
   network	
   organization	
  
from	
   autoregressive	
   modeling	
   of	
   local	
   field	
   potential	
   oscillations.	
   Statistics	
   in	
   medicine	
  
26:3875–3885	
  	
  
Bressler	
   SL,	
   Seth	
   AK	
   (2010)	
   Wiener-­‐Granger	
   Causality:	
   A	
   well	
   established	
   methodology.	
  
NeuroImage	
  	
  
Brillinger	
  DR	
  (2001)	
  Time	
  series:	
  data	
  analysis	
  and	
  theory.	
  SIAM.	
  
Bullmore	
   E,	
   Sporns	
   O	
   (2009)	
   Complex	
   brain	
   networks:	
   graph	
   theoretical	
   analysis	
   of	
  
structural	
  and	
  functional	
  systems.	
  Nature	
  reviews.	
  Neuroscience	
  10:186-­‐98	
  
Burg	
   JP	
   (1967)	
   Maximum	
   entropy	
   spectral	
   analysis	
   In	
   37th	
   Ann.	
   Int.	
   Meet.,	
   Soc.	
  
Explor.Geophys.	
  Oklahoma	
  City,	
  OK,	
  USA.	
  
Burg	
  JP	
  (1975)	
  Maximum	
  entropy	
  spectral	
  analysis.	
  Stanford,	
  CA,	
  USA:	
  Stanford	
  University	
  
Press.	
  
Buzsaki	
  G	
  (2006)	
  Rhythms	
  of	
  the	
  Brain.	
  Oxford	
  University	
  Press,	
  USA.	
  
Chatfield	
  C	
  (1989)	
  The	
  Analysis	
  of	
  Time	
  Series:	
  An	
  Introduction	
  4th	
  ed.	
  Chapman	
  &	
  Hall.	
  
Chen	
   Y,	
   Bressler	
   SL,	
   Ding	
   M	
   (2006)	
   Frequency	
   decomposition	
   of	
   conditional	
   Granger	
  
causality	
   and	
   application	
   to	
   multivariate	
   neural	
   field	
   potential	
   data.	
   Journal	
   of	
  
neuroscience	
  methods	
  150:228-­‐37	
  	
  
Deshpande	
  G,	
  LaConte	
  S,	
  James	
  GA,	
  Peltier	
  S,	
  Hu	
  X	
  (2009)(a)	
  Multivariate	
  Granger	
  causality	
  
analysis	
  of	
  fMRI	
  data.	
  Human	
  brain	
  mapping	
  30:1361-­‐73	
  
Deshpande	
   G,	
   Sathian	
   K,	
   Hu	
   X	
   (2009)(b)	
   Effect	
   of	
   hemodynamic	
   variability	
   on	
   Granger	
  
causality	
  analysis	
  of	
  fMRI.	
  NeuroImage	
  
Dhamala	
   M,	
   Rangarajan	
   G,	
   Ding	
   M	
   (2008)	
   Analyzing	
   information	
   flow	
   in	
   brain	
   networks	
  
with	
  nonparametric	
  Granger	
  causality.	
  NeuroImage	
  41:354-­‐62	
  
Ding	
   MZ,	
   Bressler	
   SL,	
   Yang	
   WM,	
   Liang	
   HL	
   (2000)(a)	
   Short-­‐window	
   spectral	
   analysis	
   of	
  
cortical	
  event-­‐related	
  potentials	
  by	
  adaptive	
  multivariate	
  autoregressive	
  modeling:	
  Data	
  

	
  

66	
  

preprocessing,	
   model	
   validation,	
   and	
   variability	
   assessment.	
   Biological	
   Cybernetics	
  
83:35-­‐45	
  	
  
Ding	
  M,	
  Bressler	
  SL,	
  Yang	
  W,	
  Liang	
  H	
  (2000)(b)	
  Short-­‐window	
  spectral	
  analysis	
  of	
  cortical	
  
event-­‐related	
   potentials	
   by	
   adaptive	
   multivariate	
   autoregressive	
   modeling:	
   data	
  
preprocessing,	
  model	
  validation,	
  and	
  variability	
  assessment.	
  Biol.	
  Cybern.	
  83:35-­‐45	
  	
  
Ding	
   M,	
   Chen	
   Y,	
   Bressler	
   SL	
   (2006)	
   Granger	
   causality:	
   Basic	
   theory	
   and	
   application	
   to	
  
neuroscience	
  In	
  B.	
  Schelter,	
  M.	
  Winterhalder,	
  &	
  J.	
  Timmer,	
  eds.	
  Handbook	
  of	
  Time	
  Series	
  
Analysis	
  Wiley,	
  Wienheim.	
  
Efron	
  B,	
  Tibshirani	
  RJ	
  (1994)	
  An	
  Introduction	
  to	
  the	
  Bootstrap:	
  Monographs	
  on	
  Statistics	
  &	
  
Applied	
  Probability.	
  Chapman	
  and	
  Hall/CRC.	
  
Eichler	
   M	
   (2006)(a)	
   Graphical	
   Modeling	
   of	
   Dynamic	
   Relationships	
   in	
   Multivariate	
   Time	
  
Series	
   In	
   B.	
   Schelter,	
   M.	
   Winterhalder,	
   &	
   J.	
   Timmer,	
   eds.	
   Handbook	
   of	
   Time	
   Series	
  
Analysis	
  Wiley,	
  Wienheim.	
  
Eichler	
   M	
   (2006)(b)	
   On	
   the	
   evaluation	
   of	
   information	
   flow	
   in	
   multivariate	
   systems	
   by	
   the	
  
directed	
  transfer	
  function.	
  Biological	
  cybernetics	
  94:469-­‐82	
  
Fitzgibbon	
   SP,	
   Powers	
   DMW,	
   Pope	
   KJ,	
   Clark	
   CR	
   (2007)	
   Removal	
   of	
   EEG	
   noise	
   and	
   artifact	
  
using	
  blind	
  source	
  separation.	
  Journal	
  of	
  clinical	
  neurophysiology	
  :	
  official	
  publication	
  of	
  
the	
  American	
  Electroencephalographic	
  Society	
  24:232-­‐43	
  	
  
Florian	
   G,	
   Pfurtscheller	
   G	
   (1995)	
   Dynamic	
   spectral	
   analysis	
   of	
   event-­‐related	
   EEG	
   data.	
  
Electroencephalography	
  and	
  clinical	
  neurophysiology	
  95:393–396	
  	
  
Florin	
  E,	
  Gross	
  J,	
  Pfeifer	
  J,	
  Fink	
  GR,	
  Timmermann	
  L	
  (2010)	
  The	
  effect	
  of	
  filtering	
  on	
  Granger	
  
causality	
  based	
  multivariate	
  causality	
  measures.	
  NeuroImage	
  50:577-­‐88	
  
Geweke	
   J	
   (1982)	
   Measurement	
   of	
   linear	
   dependence	
   and	
   feedback	
   between	
   multiple	
   time	
  
series.	
  Journal	
  of	
  the	
  American	
  Statistical	
  Association	
  77:304–313	
  
Granger	
   CWJ	
   (1969)	
   Investigating	
   causal	
   relations	
   by	
   econometric	
   models	
   and	
   cross-­‐
spectral	
  methods.	
  Econometrica:	
  Journal	
  of	
  the	
  Econometric	
  Society	
  37:424-­‐438	
  	
  
Haufe S, Tomioka R, Nolte G (2010) Modeling sparse connectivity between underlying brain sources for EEG/MEG. Biomedical Engineering:1-10
Holroyd CB, Coles MGH (2002) The Neural Basis of Human Error Processing: Reinforcement Learning, Dopamine, and the Error-Related Negativity. Psychological Review 109:679-709
Hui HB, Leahy RM (2006) Linearly Constrained MEG Beamformers for MVAR Modeling of Cortical Interactions. In: 3rd IEEE International Symposium on Biomedical Imaging: Macro to Nano. IEEE, p. 237-240.
Jansen BH, Bourne JR, Ward JW (1981) Autoregressive estimation of short segment spectra for computerized EEG analysis. IEEE Transactions on Biomedical Engineering 28:630-8
Kaminski M (1997) Multichannel Data Analysis in Biomedical Research. In: Understanding Complex Systems, Handbook of Brain Connectivity series. Berlin/Heidelberg: Springer, p. 327-355.
Kaminski M, Ding M, Truccolo WA, Bressler SL (2001) Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biological Cybernetics 85:145-157
Kaminski M, Blinowska K (1991) A new method of the description of the information flow in the brain structures. Biological Cybernetics 65:203-210
Kenet T, Arieli A, Tsodyks M, Grinvald A (2005) Are Single Cortical Neurons Soloists or Are They Obedient Members of a Huge Orchestra? In: J. L. van Hemmen & T. J. Sejnowski, eds. 23 Problems in Systems Neuroscience. Oxford University Press, USA.
Korzeniewska A (2003) Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method. Journal of Neuroscience Methods 125:195-207
Korzeniewska A, Crainiceanu M, Kus R, Franaszczuk P, Crone N (2008) Dynamics of Event-Related Causality in Brain Electrical Activity. Human Brain Mapping 29:1170-1192
Luu P, Tucker DM, Makeig S (2004) Frontal midline theta and the error-related negativity: neurophysiological mechanisms of action regulation. Clinical Neurophysiology 115:1821-35
Lütkepohl H (2006) New Introduction to Multiple Time Series Analysis. Berlin, Germany: Springer.
Marple S (1987) Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice Hall.
Michel CM, Murray MM, Lantz G, Gonzalez S, Spinelli L, Grave de Peralta R (2004) EEG source imaging. Clinical Neurophysiology 115:2195-2222
Mognon A, Jovicich J, Bruzzone L, Buiatti M (2010) ADJUST: An automatic EEG artifact detector based on the joint use of spatial and temporal features. Psychophysiology:1-12
Mullen T, Delorme A, Kothe C, Makeig S (2010) An Electrophysiological Information Flow Toolbox for EEGLAB. Society for Neuroscience 2010, San Diego, CA
Neumaier A, Schneider T (2001) Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software (TOMS) 27:27-57
Nolte G, Bai O, Wheaton L, Mari Z, Vorbach S, Hallett M (2004) Identifying true brain interaction from EEG data using the imaginary part of coherency. Clinical Neurophysiology 115:2292-307
Onton J, Makeig S (2009) High-frequency Broadband Modulations of Electroencephalographic Spectra. Frontiers in Human Neuroscience 3:61
Pearl J (2000) Causality: Models, Reasoning, and Inference. Cambridge University Press.
Pearl J (2009) Causality: Models, Reasoning and Inference. 2nd ed. New York, NY: Cambridge University Press.
Pereda E, Quiroga RQ, Bhattacharya J (2005) Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology 77:1-37
Roger C, Bénar CG, Vidal F, Hasbroucq T, Burle B (2010) Rostral Cingulate Zone and correct response monitoring: ICA and source localization evidences for the unicity of correct- and error-negativities. NeuroImage 51:391-403
Schelter B, Winterhalder M, Eichler M, Peifer M, Hellwig B, Guschlbauer B, Lücking CH, Dahlhaus R, Timmer J (2005) Testing for directed influences among neural signals using partial directed coherence. Journal of Neuroscience Methods 152:210-219
Schelter B, Winterhalder M, Timmer J, eds. (2006) Handbook of Time Series Analysis: Recent Theoretical Developments and Applications. 1st ed. Wiley.
Schelter B, Timmer J, Eichler M (2009) Assessing the strength of directed influences among neural signals using renormalized partial directed coherence. Journal of Neuroscience Methods 179:121-30
Schlögl A (2000) The electroencephalogram and the adaptive autoregressive model: theory and applications. Doctoral Thesis.
Schlögl A (2006) A comparison of multivariate autoregressive estimators. Signal Processing 86:2426-2429
Schlögl A, Supp G (2006) Analyzing event-related EEG data with multivariate autoregressive parameters. Progress in Brain Research 159:135-147
Schneider T, Neumaier A (2001) Algorithm 808: ARfit - a Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software 27:58-65
Seth AK (2005) Causal connectivity of evolved neural networks during behavior. Network (Bristol, England) 16:35-54
Seth AK (2010) A MATLAB toolbox for Granger causal connectivity analysis. Journal of Neuroscience Methods 186:262-73
Sommerlade L, Henschel K, Wohlmuth J, Jachan M, Amtage F, Hellwig B, Lücking CH, Timmer J, Schelter B (2009) Time-variant estimation of directed influences during Parkinsonian tremor. Journal of Physiology-Paris
Supp GG, Schlögl A, Trujillo-Barreto N, Müller MM, Gruber T (2007) Directed Cortical Information Flow during Human Object Recognition: Analyzing Induced EEG Gamma-Band Responses in Brain's Source Space. PLoS ONE 2:e684
Theiler J (1992) Testing for nonlinearity in time series: the method of surrogate data. Physica D: Nonlinear Phenomena 58:77-94
Valdés-Sosa PA, Sánchez-Bornot JM, Lage-Castellanos A, Vega-Hernández M, Bosch-Bayard J, Melie-García L, Canales-Rodríguez E (2005) Estimating brain functional connectivity with sparse multivariate autoregression. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences 360:969-81
Wang X, Chen Y, Bressler SL, Ding M (2007) Granger Causality Between Multiple Interdependent Neurobiological Time Series: Blockwise Versus Pairwise Methods. International Journal of Neural Systems 17:71
Wang X, Chen Y, Ding M (2008) Estimating Granger causality after stimulus onset: a cautionary note. NeuroImage 41:767-76
Wiener N (1956) The Theory of Prediction. In: E. F. Beckenbach, ed. Modern Mathematics for Engineers. New York, NY: McGraw-Hill.
Yordanova J, Falkenstein M, Hohnsbein J, Kolev V (2004) Parallel systems of error processing in the brain. NeuroImage 22:590-602
Zetterberg LH (1969) Estimation of Parameters for a Linear Difference Equation with Application to EEG Analysis. Mathematical Biosciences 5:227-275