Guide

User Manual: Pdf

Open the PDF directly: View PDF .
Page Count: 10

Toy Model of pH Subtraction for FSCV Data

Vincent Toups

Nov 1, 2010

Abstract

This is a quick document describing the use of some tools for chemometrics

in matlab.

0.1 Front Matter

This is a quick document describing the use of some tools for chemometrics

in matlab. It should come with a set of library ﬁles in this directory under

matlab. These library ﬁles should be loaded before running the examples. If

you start Matlab in this directory, the ﬁle startup.m should execute automati-

cally on startup. If it doesn’t, you can execute addpath(genpath(path-to-library-

directory)) to load them. This document should have come in a zip ﬁle, which,

when expanded, should create a directory. If you start Matlab in this directory,

everything should “just work.”

0.2 Generating CVs From Raw Data

The ﬁrst step in analyzing data from a ﬂow cell experiment is to generate a

single CV for each trial from the ﬂow cell. This process involves separating

the scans which contain the sample from the background, estimating the back-

ground, removing it from the sample trials, and ﬁnally averaging some subset

of the sample scans to produce the background subtracted CV. Because each of

these steps involve some discretion, and because diﬀerences in each step produce

diﬀerent CVs at the end of the process, it makes sense to automate the process

entirely.

Before the raw data can be analyzed by this library, it has to be exported

from Tar Heel CV as a color data ﬁle, with care taken to make sure there is no

background subtraction or averaging performed. This Matlab library will take

care of all that itself. In order to export a raw data ﬁle, you should load the data

into THCV, click Analyze Data, and set “#scans to average for background,”

“# of scans to average for CV,” and “# of points to average for I vs T” to zero.

Then click “Write out Color Plot.” See Figure 1.

In the simplest case, you can now load Matlab and execute a single command

to generate a CV.

f i l e n a m e = ’ path−to−raw−f i l e ’ ;

cv = autoCV ( f i l e n a m e ) ;

This just calls the function “autoCV” on the ﬁle. You can also load the ﬁle

into Matlab, and then call the function on the data itself:

f i l e n a m e = ’ path−to−raw−f i l e ’ ;

data = load (filename );

cv = autoCV ( d at a ) ;

However, this is apt to provide less informative error messages during batch

processing, because “autoCV” will not know which ﬁle has caused a possible

error. To process a batch of ﬁles located in a single directory, you can write

something like:

Figure 1:

f i l e n a m e s = ddirnames ( ’ /some−d i r e c t o r y / ’ ) ; % c a l l a Vincent l i b r a r y

% f u n c t i o n t o g e n er a t e a l i s t o f a l l f i l e n a m e s .

for f i =1: length(filenames )

f i l e n a m e=f i l e n a m e s {f i };

cv = autoCV ( d at a ) ; % g e n e r a t e t h e CV.

newName = strrep( fil e n a me , ’ . tx t ’ , ’−cv . t x t ’ ) ; % g e n e r a t e a new name

% to save t h e cv to .

save (newName , ’−a s c i i ’ , ’ cv ’ ) ; % sav e i t .

end

0.3 How it Works

“AutoCV” is a function which does a few things. The ﬁrst task, in broad

terms, is to locate the scans in the data set corresponding to the presence of

the sample in the ﬂow cell. This process is automated based on the rather

modest assumption that these scans will be diﬀerent than the background scans

(this assumption fails when a blank is injected, but then there is no way to tell

the diﬀerence between sample and background). More formally speaking, each

scan from the data set is treated as a vector, and then the Euclidian Distances

between these vectors are used to ﬁnd when the sample appears in the ﬂow cell,

and when it leaves. It does this by trying to ﬁnd the locations for the entrance

and exit time which maximize the distances between putative “sample” scans

and “background” scans.

Once we’ve found the division between background and sample trials, we

can estimate the background and remove it. We’d like to be conservative about

background estimation, so we pad the boundaries of the sample before ﬁnding

background trials to estimate.

“AutoCV” can perform background subtraction in a variety of ways. It

takes optional arguments which allow the user to customize how it performs

background subtraction. If you open the function deﬁnition ﬁle, you’ll see:

function [ cv , aux ] = autoCV ( data , v a r a r g i n )

% [ cv , aux ] = AUTOCV(DATA,VARARGIN) fi nd a CV in th e c o l o r p l o t in DATA

% This f u nc t io n at t e mp t s to c a l c u l a t e a CV f o r a f lo w c e l l da ta s e t .

% I t i s o l a t e s t he sample from t he background using d i f f e r e n c e s i n the

% CVs i n each column o f DATA and then c a l c u l a t e s t h e back ground

% trend and s u b t r a c t s i t .

% I t s up po r t s t he f o l l o w i n g o p t i o n a l arguments which modify t he d e f a u l t

% b e h a v i o r .

% OPTIONAL ARG |D e f a u l t Value

% marginPct = 2;

% A ft er t he sample CVs are d e l i n e a t e d from t he

% b u f f e r CVs , pad th e r e gi o n by t h i s p er c en t ag e b e f o r e

% c a l c u l a t i n g t h e background . Two he re c orresp on ds t o

% a 200% pad .

% f i l e n a m e = ’UNSPECIFIED ’ ;

% Use t h i s as t h e f i l e n a m e i f raw d at a was pa ss e d i n and

% t h e r e i s an e r r o r .

% f l a t t e n A r g s = {} ;

% Arguments t o be pa s sed to

% flattenBackg ro un dP ol y , i f i t i s used to s u b t r a c t th e

% background .

% trialSelectionMethod = ’peak ’;

% S p e c i f i e s t h e method used t o s e l e c t t h e t r i a l s use f o r t h e CV.

% ’ peak ’ means t h a t t h e maximum power CV i n t h e sa mp le r e gi on ,

% a f t e r ba ckgr ound s u b t r a c t , forms t he c e n t e r o f t h e CV r e gi o n .

% aroundPeak = 10;

% Number o f CVs around th e t r i a l S e l e c t e d f o r th e sample CV to

% av e rage to produce th e f i n a l sample CV.

% deFli c k e r D a t a = True ;

% S p e c i f i e s whether t o d e f l i c k e r th e data b e f or e a n a l y s i s .

% This removes ” pops and s naps ” i n t h e da ta b e f o r e any p r o c e s s i n g .

% Sh ould be mo st ly h arm le ss f o r d ata t h a t doesn ’ t

% have t h i s k i n d o f n o i s e .

% deFlickerArgs = {} ;

% Arguments t o pa ss t o th e d e f l i c k e r fu n c t i o n , i n a d di t io n to

% t h e d e f a u l t ones .

% partitionArgs = {};

% Arguments t o p as s t o t h e f u n c t i o n which c a l c u l a t e s t h e sample

% and background t r i a l s .

% biasTowardsPre = 0;

% E nable s b i a s i n g t owar ds e a r l y CVs in the dat a s e t as

% b ackg rou nd CVs .

% useSimpleMean = 0;

% D i sa b le po ly no mial backgr ound s u b t r a c t i o n and sim pl y u se

% a mean o f t he background t r i a l s as th e back ground to s u b t r a c t .

% saveBackground = 0;

% Turn t h i s on t o sa ve t h e background CVs , which you might

% want t o examine t o d e t e c t l o n g term t r e n d s i n t h e d ata .

% b ac kg r ou nd D ir e ct or y = ’ . / au toBack groun ds / ’

% I f you e n a b l e b ack gro und CV s av in g , t h i s i s w here t h e y go .

% au t o SignF i x = True ;

% Attempt to a u t o m a t i c a l l y f l i p d ata t h a t has been i n v e r t e d

% as a c on se que nc e o f u s i n g UEI mode v s non UEI mode .

This shows the degree to which AutoCV can be customized. I’ve tried to set

up defaults which work for most CVs. Optional arguments are passed in like

standard Matlab functions, so if you want to enable simple mean background

subtraction but disable deﬂickering you would call autoCV like this:

cv = autoCV ( ’ a−c o l o r −plot−f i l e . tx t ’ , ’ useSimpleMean ’ , True , . . .

’ d e F l i ckerD a t a ’ , F a l s e ) ;

AutoCV is pretty smart - I’d estimate it gets the CV right more than 80

percent of the time, but you should always double check the output before

proceeding with further analysis steps.

0.4 In Vivo Data

Automatically extracting CVs from ﬂow cell data is substantially easier than

similar analysis of in-vivo data. The critical step in extracting a sample CV is

ﬁnding the CVs in a recording which correspond to the sample and the ones

which correspond to the background. In the ﬂow cell, we know that the trials

start with background CVs, at some point the sample enters the ﬂow cell, after

which we have sample CVs until the sample leaves the cell, at which point

buﬀer should resume ﬂowing. The job of selecting the sample CVs simpliﬁes

into selecting just two points: onset and oﬀset. This is a problem domain which

can be exhaustively searched (although that is not how autoCV works, in most

cases). In Vivo, particularly for transients, there are no guarantees of single

chemical events or rapid onsets or oﬀsets. As a consequence, I’ve developed

alternative tools to deal with this case. Rather than trying to ﬁnd a single CV

in an in vivo set, it makes sense to ﬁnd multiple candidates, and return a few

of them for subsequent analysis.

At the moment, in vivo automatic extraction is more of an aid in ﬁnding CVs

rather than a bulletproof method. You can extract CVs from in vivo color plots

(exported without background subtraction) with the function “autoCVInVivo”.

As most of the lab is doing in vitro experiments, and this is alpha-level tech,

that is probably all the useful documentation for this code at this time.

0.5 Performing PCR

Above you learned about extracting CVs from in vitro (or in vivo) data auto-

matically. Usually the next step in our analysis pipeline is to calibrate a linear

model which is able to take a CV and predict a concentration. This can be

approached in several ways.

For single analyte runs, I’ve found its almost always possible to simply pick

the peak associated with the analyte that has the highest variability with respect

to concentration and perform a regression between concentration and just those

points. Flow cell data tends to be of suﬃciently low variability that there is

more than enough information in a single voltage point to get very high r2

values.

0.5.1 A note on ﬁle naming conventions

People in the lab may have noticed I tend to use crazy long ﬁle names. This

is for a great reason: it makes writing scripts to manipulate and analyze data

much, much easier. Typically, my ﬁlenames look something like this:

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=001000= co u r s e=001= t r i a l =005=newRank=4=newRank=4. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=001000= co u r s e=001= t r i a l =004=newRank=1=newRank=1. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=001000= co u r s e=001= t r i a l =003=newRank=2=newRank=2. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=001000= co u r s e=001= t r i a l =002=newRank=3=newRank=3. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=001000= co u r s e=001= t r i a l =001=newRank=5=newRank=5. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=001000= co u r s e=001= t r i a l =000=newRank=6=newRank=6. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=000500= co u r s e=001= t r i a l =005=newRank=3=newRank=3. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=000500= co u r s e=001= t r i a l =004=newRank=4=newRank=4. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=000500= co u r s e=001= t r i a l =003=newRank=2=newRank=2. txt

bu f fer Ph=740=samplePh=000748=sampleHPO=000160=sampleDA=000500= co u r s e=001= t r i a l =002=newRank=1=newRank=1. txt

The critical thing to notice here is that these ﬁle names tell you everything

you want to know about each ﬁle in a simple to parse format. This library in-

cludes code to transform a ﬁle name in the format above into a matlab structure

which can be used programmatically.

>> d s f ( ’ cvs / t r i a l =000005=sampleHpo=000100=sampleDa=000125=samplePh=000748=newRank=4. tx t ’ )

d s f ( ’ c vs / t r i a l =000005=sampleHpo=000100=sampleDa=000125=samplePh=000748=newRank=4. tx t ’ )

ans =

trial : 5

sampleHpo : 100

sampleDa : 125

samplePh : 748

newRank : 4

“dsf” stands for “destructure ﬁle”. It is a very simple function which simply

splits a ﬁle name (after discarding the directory) into a pieces at the “=” signs,

and then converts those pieces into name/value pairs, ﬁnally loading those into

a structure, which it returns.

Suppose you wanted to get all the ﬁles in a directory which corresponding

to trials with only changes in pH. You could write a script like this in Matlab,

assuming you ﬁles had the right names.

f i l e n a m e s = ddirnames ( ’ . / cv s / ’ ) ; % g e t a l l t he f i l e s i n a d i r e c t o r y

% c a l l e d ’ cvs ’

ph Onl yF ile s = { } ;% i n i t i a l i z e an empty l i s t f o r h o l d i n g t h e pH f i l e n a m e s

for f i =1: length(filenames ) % l o o p ov er a l l t h e f i l e s

f i l e n a m e=f i l e n a m e s {f i };

f i l e I n f o = d s f ( f i l e n a m e ) ;

i f f i l e I n f o . sampleDa == 0 && f i l e I n f o . sampleHpo == 0

ph Onl yF ile s {end+1}= filename ;

end

It is extremely common to take subsets of your data like this, and so the

library has a lot of code to make these kinds of operations easy to write. For

instance, I would code the above like this:

qu ery Fu nct ion s ; % l oa d a s e t o f f u n c t i o n s t o h e l p o r g a n i z e f i l e n a m e s .

phOnlyTest = fAnd ( mkMatcher ( ’ sampleDa ’ , 0 ) , . . .

mkMatcher ( ’ sampleHpo ’ , 0 ) ) ;

% mkMatcher means ”make matcher ” . I t r e t u r n s a f u n c t i o n which a c c e p t s

% a f i l e name and which r e t ur n s t r u e o nly when t h e f i e l d s p e c i f i e d

% i s t h e v al ue s p e c i f i e d

% fAnd t a k e s two f u n c t i o n s and r e t u r n s a new f u n c t i o n which i s t r u e

% o n l y when b o t h o f t h e i n p u t f u n c t i o n s a re t r u e on t h e i n p u t s t o t h e

% new f u n c t i o n .

% So t h e above code read s as :

% c r e a t e a f u n c t i o n c a l l e d pHOnlyTest w hich a c c e pt s a f i le n a me as

% i n p u t and o n ly r e t u r n s t r u e when t h a t f i l e has z e ro f o r sampleDa

% and z er o f o r sampleHpo .

ph Onl yF ile s = f i l t ( phOnlyTest , ddirnames ( ’ . / cvs / ’ ) ) ;

% t h i s l i n e o f code means”

% f i l t e r t h e l i s t o f f i l e s r e t u r n e d by ddirnames ( ’ . / cv s / ’ )

% r e t u r n o n ly t h o s e f i l e s w hich p a ss ” phOnlyT es t ”

Without comments, the above is substantially shorter than the other version.

Less code means fewer opportunities for mistakes!

Anyway, you can use this library with any ﬁles you want. Chemometrics

and CV extraction don’t depend on speciﬁc ﬁlename conventions. But if you

want to do data analysis in Matlab more easily, choosing a good convention for

ﬁlenames is pretty helpful.

0.6 Performing Standard PCR

Performing PCR is simple once you’ve examined the above examples. Here is a

complete listing for a PCR in one of my directories:

a l l F i l e s = ddirnames ( [ ’ . / cvs / ’ ] ) ; % g e t t he cv names

n o t O u t l i e r = fAnd (@( x ) d s f ( x , [ ] , ’ newRank ’ , 0 ) <4 , . . .

@( x ) d s f ( x , [ ] , ’ b la c k ’ , 0) == 0); % remove o u t l i e r s

% and b l a c k l i s t e d data

t r a i n i n g F i l e s = unique ( f i l t ( not O u t l i e r , [ f i l t ( daTraining , a l l F i l e s ) ;

f i l t ( hpoTraining , a l l F i l e s ) ;

f i l t ( phTraining , a l l F i l e s ) ] ) ) ; % select

% t r a i n i n g da ta

t e s t i n g F i l e s = unique ( f i l t ( n o t O u t l i e r , f i l t ( t es t in g , a l l F i l e s ) ) ) ;

% s e l e c t t e s t i n g da ta

clear p c r S t r u c t ; % make sure t he s t r u c t u r e c on t ai n in g pcr da ta i s c l e a r

for t i =1: length(trainingFiles)

t r a i n i n g F i l e=t r a i n i n g F i l e s {t i };

props = ds f ( t r a i n i n g F i l e ) ;

p c r S t r u c t ( t i ) . da ta = load ( t r a i n i n g F i l e ) ;

p cr St ru c t ( t i ) . da = ds f ( t r a i n i n g F i l e , [ ] , ’ sampleDA ’ ) ;

p cr St ru c t ( t i ) . hpo = ds f ( t r a i n i n g F i l e , [ ] , ’ sampleHPO ’ ) ;

p cr St ru c t ( t i ) . ph = d s f ( t r a i n i n g F i l e , [ ] , ’ samplePh ’ ) ;

end

% t he above f i l l s t he pcr s t r u c t .

t e s t i n g D a t a = l o a d F i l e s ( t e s t i n g F i l e s ) ;

% l o a d t h e data f or t e s t i n g .

[ models , aux ] = doPCR( pcrStr u c t , te s t i ngDa t a , ccm ) ;

% perform PCR

The function doPCR returns a struct of models for each of the analytes

indicated in the pcrStruct (in our example, da, hpo and ph). Models.da, for in-

stance, contains the predicted and actual Dopamine concentrations in an array.

The order corresponds to the training and then the testing data passed in, in

their orders.

You can plot the results for the training data like so.

das = map( dag , a l l F i l e s ) ; % dag i s a f u n c t i o n d e f i n e d i n

% qu er yFun ct ions which g e t s t he r ec or ded

% dopamine c o n c e n t r a t i o n from a f i l e

% name . Map a p p l i e s a f u n c t i o n t o a l i s t

% o f t h i n g s and r e t u r n s a l i s t o f t h e

% r e s u l t s . So t h e above l i n e r e t ur ns a

% l i s t o f a l l t h e p re pa red dopamine c o n c e n t r a t i o n s .

nTr = 1 : length( t r a i n i n g F i l e s ) ;

plotWithRedundantXs ( das ( nTr ) , models . da . p r e d i c t e d ( nTr ) ) ;

% plotWithRedundantXs makes a s c a t t e r p l o t o f t h e d at a w here Xs may b e r e p e a t e d .

Further documentation is forthcoming, but an intrepid analyst could start

from these examples and go!

Guide

Navigation menu

Versions of this User Manual:

Views

Navigation