Toy Model of pH Subtraction for FSCV Data
Vincent Toups
Nov 1, 2010

Abstract

This is a quick document describing the use of some tools for chemometrics in Matlab.

0.1 Front Matter

This document should have come in a zip file which, when expanded, creates a directory. The library files live in that directory under matlab, and they should be loaded before running the examples. If you start Matlab in that directory, the file startup.m should execute automatically on startup and everything should “just work.” If it doesn’t, you can execute addpath(genpath('path-to-library-directory')) to load the library files.

0.2 Generating CVs From Raw Data

The first step in analyzing data from a flow cell experiment is to generate a single CV for each trial from the flow cell. This process involves separating the scans which contain the sample from the background, estimating the background, removing it from the sample trials, and finally averaging some subset of the sample scans to produce the background-subtracted CV. Because each of these steps involves some discretion, and because differences in each step produce different CVs at the end of the process, it makes sense to automate the process entirely.

Before the raw data can be analyzed by this library, it has to be exported from Tar Heel CV (THCV) as a color data file, with care taken to make sure that no background subtraction or averaging is performed. This Matlab library will take care of all that itself. To export a raw data file, load the data into THCV, click Analyze Data, and set “#scans to average for background,” “# of scans to average for CV,” and “# of points to average for I vs T” to zero. Then click “Write out Color Plot.” See Figure 1.

In the simplest case, you can now start Matlab and execute a single command to generate a CV.

    filename = 'path-to-raw-file';
    cv = autoCV(filename);

This just calls the function “autoCV” on the file. You can also load the file into Matlab, and then call the function on the data itself:

    filename = 'path-to-raw-file';
    data = load(filename);
    cv = autoCV(data);

However, this is apt to provide less informative error messages during batch processing, because “autoCV” will not know which file has caused a possible error.

Figure 1:

To process a batch of files located in a single directory, you can write something like:

    filenames = ddirnames('/some-directory/'); % call a Vincent library
                                               % function to generate a list of all filenames.
    for fi = 1:length(filenames)
        filename = filenames{fi};
        cv = autoCV(filename);                         % generate the CV.
        newName = strrep(filename, '.txt', '-cv.txt'); % generate a new name
                                                       % to save the cv to.
        save(newName, '-ascii', 'cv');                 % save it.
    end

0.3 How it Works

“autoCV” is a function which does a few things. The first task, in broad terms, is to locate the scans in the data set corresponding to the presence of the sample in the flow cell. This process is automated based on the rather modest assumption that these scans will be different from the background scans (this assumption fails when a blank is injected, but then there is no way to tell the difference between sample and background). More formally, each scan from the data set is treated as a vector, and the Euclidean distances between these vectors are used to find when the sample appears in the flow cell and when it leaves. It does this by trying to find the entrance and exit times which maximize the distances between putative “sample” scans and “background” scans.
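
As a rough illustration of the idea (not the actual autoCV code, which is not reproduced in this document), the sketch below brute-forces every onset/offset pair on a noise-free toy data set and keeps the pair that maximizes the Euclidean distance between the mean “sample” scan and the mean “background” scan. The data, the variable names, and the choice of comparing mean scans are all assumptions made for the example.

```matlab
% Toy color plot: one scan per column, 50 voltage points, 100 scans.
nPoints = 50; nScans = 100;
v = (1:nPoints)';
background = sin(pi * v / nPoints);      % same background shape in every scan
signal = exp(-(v - 20).^2 / 18);         % the sample adds a peak near point 20
data = repmat(background, 1, nScans);
data(:, 40:60) = data(:, 40:60) + repmat(signal, 1, 21);  % sample in scans 40..60

% Exhaustive search over candidate (onset, offset) pairs.
best = -Inf; onset = NaN; offset = NaN;
for a = 1:nScans
    for b = a:nScans
        isSample = false(1, nScans);
        isSample(a:b) = true;
        if all(isSample)
            continue;                    % need at least one background scan
        end
        % Distance between the mean putative sample scan and the mean
        % putative background scan.
        d = norm(mean(data(:, isSample), 2) - mean(data(:, ~isSample), 2));
        if d > best
            best = d; onset = a; offset = b;
        end
    end
end
fprintf('onset = %d, offset = %d\n', onset, offset);
```

On this clean toy data the search recovers scans 40 and 60. With real, noisy data a criterion this simple can favor very short windows, which is one reason to treat it only as a picture of the idea.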
Once we’ve found the division between background and sample trials, we can estimate the background and remove it. We’d like to be conservative about background estimation, so we pad the boundaries of the sample before finding background trials to estimate.

“autoCV” can perform background subtraction in a variety of ways. It takes optional arguments which allow the user to customize how it performs background subtraction. If you open the function definition file, you’ll see:

    function [cv, aux] = autoCV(data, varargin)
    % [cv, aux] = AUTOCV(DATA,VARARGIN) find a CV in the color plot in DATA
    %
    % This function attempts to calculate a CV for a flow cell data set.
    % It isolates the sample from the background using differences in the
    % CVs in each column of DATA and then calculates the background
    % trend and subtracts it.
    % It supports the following optional arguments which modify the default
    % behavior.
    %
    % OPTIONAL ARG | Default Value
    % marginPct = 2;
    %   After the sample CVs are delineated from the
    %   buffer CVs, pad the region by this percentage before
    %   calculating the background. Two here corresponds to
    %   a 200% pad.
    % filename = 'UNSPECIFIED';
    %   Use this as the filename if raw data was passed in and
    %   there is an error.
    % flattenArgs = {};
    %   Arguments to be passed to
    %   flattenBackgroundPoly, if it is used to subtract the
    %   background.
    % trialSelectionMethod = 'peak';
    %   Specifies the method used to select the trials use for the CV.
    %   'peak' means that the maximum power CV in the sample region,
    %   after background subtract, forms the center of the CV region.
    % aroundPeak = 10;
    %   Number of CVs around the trialSelected for the sample CV to
    %   average to produce the final sample CV.
    % deFlickerData = True;
    %   Specifies whether to deflicker the data before analysis.
    %   This removes "pops and snaps" in the data before any processing.
    %   Should be mostly harmless for data that doesn't
    %   have this kind of noise.
    % deFlickerArgs = {};
    %   Arguments to pass to the deflicker function, in addition
    %   to the default ones.
    % partitionArgs = {};
    %   Arguments to pass to the function which calculates
    %   the sample and background trials.
    % biasTowardsPre = 0;
    %   Enables biasing towards early CVs in the data set
    %   as background CVs.
    % useSimpleMean = 0;
    %   Disable polynomial background subtraction and simply
    %   use a mean of the background trials as the background
    %   to subtract.
    % saveBackground = 0;
    %   Turn this on to save the background CVs, which you
    %   might want to examine to detect long term trends
    %   in the data.
    % backgroundDirectory = './autoBackgrounds/'
    %   If you enable background CV saving, this is where they go.
    % autoSignFix = True;
    %   Attempt to automatically flip data that has been
    %   inverted as a consequence of using UEI mode vs
    %   non UEI mode.

This shows the degree to which autoCV can be customized. I’ve tried to set up defaults which work for most CVs.
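
To make the padding and averaging options more concrete, here is a minimal sketch of what a simple-mean background subtraction could look like on toy data. It is not taken from the autoCV source. It assumes the onset and offset are already known, reads marginPct as a multiple of the sample width added on each side, and reads aroundPeak as the total number of neighboring CVs averaged around the maximum-power CV; both readings, and all variable names, are assumptions made for illustration.

```matlab
% Toy color plot: one CV per column, 50 voltage points, 300 scans.
nPoints = 50; nScans = 300;
v = (1:nPoints)';
background = sin(pi * v / nPoints);   % identical background in every scan
signal = exp(-(v - 20).^2 / 18);      % sample peak near voltage point 20
onset = 140; offset = 160;            % assumed to be located already
amp = sin(pi * (1:21) / 22);          % sample rises and falls, largest at scan 150
data = repmat(background, 1, nScans);
data(:, onset:offset) = data(:, onset:offset) + signal * amp;

marginPct = 2;    % assumed meaning: pad each side by 2 x the sample width
aroundPeak = 10;  % assumed meaning: average 5 CVs on either side of the peak

% Pad the sample region, then treat everything outside it as background.
width = offset - onset + 1;
pad = round(marginPct * width);
padStart = max(1, onset - pad);
padEnd = min(nScans, offset + pad);
isBackground = true(1, nScans);
isBackground(padStart:padEnd) = false;

% Simple mean background, subtracted from every scan.
bg = mean(data(:, isBackground), 2);
subtracted = data - repmat(bg, 1, nScans);

% Center the averaging window on the maximum-power CV in the sample region.
scanPower = sum(subtracted(:, onset:offset).^2, 1);
[~, k] = max(scanPower);
peakScan = onset + k - 1;
half = floor(aroundPeak / 2);
first = max(onset, peakScan - half);
last = min(offset, peakScan + half);
cv = mean(subtracted(:, first:last), 2);

[~, peakPoint] = max(cv);
fprintf('peak scan = %d, CV peaks at voltage point %d\n', peakScan, peakPoint);
```

Because the toy background is identical in every scan, the mean removes it exactly; the polynomial option described in the help text exists for backgrounds that drift over the course of a run.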

Optional arguments are passed as name/value pairs, as in standard Matlab functions, so if you want to enable simple mean background subtraction but disable deflickering you would call autoCV like this:

    cv = autoCV('a-color-plot-file.txt', 'useSimpleMean', true, ...
                'deFlickerData', false);

autoCV is pretty smart: I’d estimate it gets the CV right more than 80 percent of the time, but you should always double check the output before proceeding with further analysis steps.

0.4 In Vivo Data

Automatically extracting CVs from flow cell data is substantially easier than similar analysis of in vivo data. The critical step in extracting a sample CV is finding the CVs in a recording which correspond to the sample and the ones which correspond to the background. In the flow cell, we know that the trials start with background CVs, that at some point the sample enters the flow cell, and that we then have sample CVs until the sample leaves the cell, at which point buffer should resume flowing. The job of selecting the sample CVs simplifies into selecting just two points: onset and offset. This is a problem domain which can be exhaustively searched (although that is not how autoCV works, in most cases).

In vivo, particularly for transients, there are no guarantees of single chemical events or rapid onsets or offsets. As a consequence, I’ve developed alternative tools to deal with this case. Rather than trying to find a single CV in an in vivo set, it makes sense to find multiple candidates and return a few of them for subsequent analysis. At the moment, in vivo automatic extraction is more of an aid in finding CVs than a bulletproof method.

You can extract CVs from in vivo color plots (exported without background subtraction) with the function “autoCVInVivo”. As most of the lab is doing in vitro experiments, and this is alpha-level tech, that is probably all the useful documentation for this code at this time.

0.5 Performing PCR

Above you learned about extracting CVs from in vitro (or in vivo) data automatically. Usually the next step in our analysis pipeline is to calibrate a linear model which is able to take a CV and predict a concentration. This can be approached in several ways. For single analyte runs, I’ve found it’s almost always possible to simply pick the peak associated with the analyte that has the highest variability with respect to concentration and perform a regression between concentration and just those points. Flow cell data tends to be of sufficiently low variability that there is more than enough information in a single voltage point to get very high R^2 values.

0.5.1 A note on file naming conventions

People in the lab may have noticed I tend to use crazy long file names. This is for a great reason: it makes writing scripts to manipulate and analyze data much, much easier. Typically, my filenames look something like this:

    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=001000=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=001000=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=001000=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=001000=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=001000=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=001000=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=000500=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=000500=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=000500=course=001=trial=0
    bufferPh=740=samplePh=000748=sampleHPO=000160=sampleDA=000500=course=001=trial=0

The critical thing to notice
here is that these file names tell you everything you want to know about each file in a simple-to-parse format. This library includes code to transform a file name in the format above into a Matlab structure which can be used programmatically.

    >> dsf('cvs/trial=000005=sampleHpo=000100=sampleDa=000125=samplePh=000748=newRank=4

    ans =
            trial: 5
        sampleHpo: 100
         sampleDa: 125
         samplePh: 748
          newRank: 4

“dsf” stands for “destructure file”. It is a very simple function which splits a file name (after discarding the directory) into pieces at the “=” signs, converts those pieces into name/value pairs, and finally loads those into a structure, which it returns.

Suppose you wanted to get all the files in a directory which correspond to trials with only changes in pH. You could write a script like this in Matlab, assuming your files had the right names.

    filenames = ddirnames('./cvs/'); % get all the files in a directory
                                     % called 'cvs'
    phOnlyFiles = {}; % initialize an empty list for holding the pH filenames
    for fi = 1:length(filenames) % loop over all the files
        filename = filenames{fi};
        fileInfo = dsf(filename);
        if fileInfo.sampleDa == 0 && fileInfo.sampleHpo == 0
            phOnlyFiles{end+1} = filename;
        end
    end

It is extremely common to take subsets of your data like this, and so the library has a lot of code to make these kinds of operations easy to write. For instance, I would code the above like this:

    queryFunctions; % load a set of functions to help organize filenames.
    phOnlyTest = fAnd(mkMatcher('sampleDa', 0), ...
                      mkMatcher('sampleHpo', 0));
    % mkMatcher means "make matcher". It returns a function which accepts
    % a file name and which returns true only when the field specified
    % is the value specified.
    % fAnd takes two functions and returns a new function which is true
    % only when both of the input functions are true on the inputs to the
    % new function.
    % So the above code reads as:
    % create a function called phOnlyTest which accepts a filename as
    % input and only returns true when that file has zero for sampleDa
    % and zero for sampleHpo.
    phOnlyFiles = filt(phOnlyTest, ddirnames('./cvs/'));
    % this line of code means:
    % filter the list of files returned by ddirnames('./cvs/'),
    % return only those files which pass "phOnlyTest"

Without comments, the above is substantially shorter than the other version. Less code means fewer opportunities for mistakes!

Anyway, you can use this library with any files you want. Chemometrics and CV extraction don’t depend on specific filename conventions. But if you want to do data analysis in Matlab more easily, choosing a good convention for filenames is pretty helpful.

0.6 Performing Standard PCR

Performing PCR is simple once you’ve examined the above examples. Here is a complete listing for a PCR in one of my directories:

    allFiles = ddirnames(['./cvs/']); % get the cv names
    notOutlier = fAnd(@(x) dsf(x, [], 'newRank', 0) < 4, ...
                      @(x) dsf(x, [], 'black', 0) == 0); % remove outliers
                                                         % and blacklisted data
    % daTraining, hpoTraining, phTraining, testing and ccm are not defined
    % in this listing; they are assumed to exist already.
    trainingFiles = unique(filt(notOutlier, ...
        [filt(daTraining, allFiles); ...
         filt(hpoTraining, allFiles); ...
         filt(phTraining, allFiles)])); % select training data
    testingFiles = unique(filt(notOutlier, filt(testing, allFiles)));
    % select testing data
    clear pcrStruct; % make sure the structure containing pcr data is clear
    for ti = 1:length(trainingFiles)
        trainingFile = trainingFiles{ti};
        props = dsf(trainingFile);
        pcrStruct(ti).data = load(trainingFile);
        pcrStruct(ti).da = dsf(trainingFile, [], 'sampleDA');
        pcrStruct(ti).hpo = dsf(trainingFile, [], 'sampleHPO');
        pcrStruct(ti).ph = dsf(trainingFile, [], 'samplePh');
    end
    % the above fills the pcr struct.
    testingData = loadFiles(testingFiles); % load the data for testing.
    [models, aux] = doPCR(pcrStruct, testingData, ccm); % perform PCR

The function doPCR returns a struct of models, one for each of the analytes indicated in the pcrStruct (in our example, da, hpo and ph). models.da, for instance, contains the predicted and actual dopamine concentrations in an array. The order corresponds to the training data and then the testing data passed in, each in its original order. You can plot the results for the training data like so.

    das = map(dag, allFiles); % dag is a function defined in
                              % queryFunctions which gets the recorded
                              % dopamine concentration from a file
                              % name. Map applies a function to a list
                              % of things and returns a list of the
                              % results. So the above line returns a
                              % list of all the prepared dopamine concentrations.
    nTr = 1:length(trainingFiles);
    plotWithRedundantXs(das(nTr), models.da.predicted(nTr));
    % plotWithRedundantXs makes a scatter plot of the data where Xs may be repeated.

Further documentation is forthcoming, but an intrepid analyst could start from these examples and go!
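
For readers who want a picture of what a principal component regression does, here is a minimal textbook sketch on toy data. The internals of doPCR are not shown in this document, so this should not be read as its implementation; the peak shapes, the concentrations, the number of components, and every variable name are assumptions made for the example.

```matlab
% Toy training set: each row is one CV (30 CVs, 50 voltage points).
nPoints = 50;
v = (1:nPoints)';
daShape = exp(-(v - 20).^2 / 18);      % hypothetical dopamine-like peak
phShape = sin(2 * pi * v / nPoints);   % hypothetical pH-like broad wave
da = repmat([0; 250; 500; 750; 1000], 6, 1);  % prepared DA concentrations
ph = reshape(repmat(-2:3, 5, 1), [], 1);      % pH offsets, arbitrary units
X = da * daShape' + ph * phShape';            % 30 x 50 matrix of noise-free CVs

% Principal components of the mean-centered training CVs.
k = 2;                                 % number of components kept (assumed)
xMean = mean(X, 1);
Xc = X - repmat(xMean, size(X, 1), 1);
[~, ~, V] = svd(Xc, 'econ');
scores = Xc * V(:, 1:k);               % project each CV onto k components

% Ordinary least squares from the scores (plus an intercept) to concentration.
coef = [ones(size(scores, 1), 1), scores] \ da;

% Predict the dopamine concentration of a CV not in the training set.
xNew = 400 * daShape' + 1.5 * phShape';
scoreNew = (xNew - xMean) * V(:, 1:k);
daPredicted = [1, scoreNew] * coef;
fprintf('predicted DA = %.3f\n', daPredicted);
```

Because the toy CVs are noise-free and built from exactly two shapes, two components recover the prepared value of 400 essentially exactly. Real data will need more care in choosing the number of components and in checking predictions against held-out testing files, as the listing above does.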