SUGI 28: Errors, Warnings, And Notes (Oh My): A Practical Guide To Debugging SAS(r) Programs
SUGI 28 Beginning Tutorials

Paper 59-28

ERRORS, WARNINGS, AND NOTES (OH MY)
A PRACTICAL GUIDE TO DEBUGGING SAS PROGRAMS

Lora D. Delwiche, University of California, Davis, CA
Susan J. Slaughter, Avocet Solutions, Davis, CA

A man's intelligence is not measured by how many mistakes he makes, but how many of those mistakes he finds.
– Erik de Bie

ABSTRACT

Errors? "I never get errors!" Warnings? "I never bother to read them, my program still runs." Notes? "Who needs them?" This paper is based on the belief that debugging your programs is not only necessary, but also a good way to gain insight into how SAS works. Once you understand why you got an error, a warning, or a note, you'll be better able to avoid problems in the future. In other words, people who are good debuggers are good programmers. We cover common problems including missing semicolons, invalid data, truncated character values, missing-data-was-generated messages, character to numeric conversions, and the tricky problem of a DATA step that runs without suspicious messages but nonetheless produces the wrong results. For each problem we decipher the message, explain possible causes, and explain how to fix the problem. We also cover some programming strategies to avoid errors in the first place.

WHAT IS A BUG?

Scientists have identified approximately 1 1/4 million species of animals. Of those about 3/4, or 932,000, are insects. However, only the 82,000 species belonging to the order Hemiptera are considered by scientists to be “true bugs” (McGavin, 1993). Fortunately, a taxonomy of SAS bugs would not identify nearly so many species.

Entomology aside, a bug is an error in a computer program that causes an undesired, and usually unexpected, result. One way of classifying computer bugs is to divide them into three types of errors: syntax, data, and logic. Syntax errors result from failing to follow SAS's rules about the way keywords are put together to make statements. With data errors you have a program that is syntactically sound but fails because the data values do not fit the program as it was written. With logic errors you have a program that runs, and data that fits, but the result is wrong because the program gave the wrong instructions to the computer.

The bugs discussed in this paper can be classified as:

Syntax
• missing semicolon
• uninitialized variable and variable not found

Data
• missing values were generated
• numeric to character conversion
• invalid data
• character field is truncated

Logic
• DATA step produces wrong results but no error message.

LISTEN TO THE SAS LOG

The first and most important rule in debugging SAS programs is to always, always check the SAS log. After running a SAS program many people turn immediately to the output. This is understandable, but not advisable. It is entirely possible (and sooner or later it happens to all of us) to get output that looks fine but is totally bogus. Often, checking the SAS log is the only way to know whether a program has run properly.

The second most important rule in debugging is to always start at the beginning of the log. This may seem obvious, but if you are running SAS in the windowing environment, then the SAS log fills up the Log window and you are left looking at the end of the log, not the beginning. There is a temptation to try to fix the first error you see, which is not necessarily the first error in the log. Often one error early in the program can generate many messages, and only the first message will give you a clue as to what the problem is.

If you are using the SAS windowing environment, then you may want to get in the habit of clearing the SAS log before you submit programs. This way the beginning of the SAS log for the program just submitted is always at the top of the Log window. If you don't clear the window, then SAS just tacks the log from what you just submitted onto whatever was already in the Log window, making it hard to find the beginning of the log.

SAS logs contain three types of messages: errors, warnings, and notes.

Errors

If you get an error message in your program, you will know it. Error messages get your attention because SAS will not run a job with one of these bugs. Error messages are not quiet, discreet, or subtle; they are the loud rabble-rousers of SAS messages. This message, for example:

   ERROR: No CARDS or INFILE statement.

stops a program dead in its tracks. This message tells you that SAS could not find any data to read with the INPUT statement.

Warnings

Warnings are less dire than errors. SAS prints warnings in your log and then goes ahead and runs the job anyway. Many people, including some professional programmers, try to ignore warnings. Don't you be one of them. Sometimes the situations that result in warnings are indeed harmless; other times they indicate grave problems which, if unresolved, will render your results worthless. You should check all warnings to see if they are harmless or hazardous. This message:

   WARNING: The data set WORK.SPECIES may be incomplete. When this step was stopped there were 0 observations and 3 variables.

tells you that SAS did run a DATA step, but for some reason there are zero observations. This could be OK, but generally speaking when you go to the trouble of creating a data set, you want some data in it.

Notes

Notes are the most innocuous messages that SAS writes in your SAS log. They simply inform you of the status of your program. Notes contain information such as the number of records input from an external file, or the number of observations written in a SAS data set. Don't be fooled by demure little notes; they are a critically important way of catching errors. These messages:

   NOTE: 29 records were read from the infile 'bugspeci.dat'.
         The minimum record length was 27.
         The maximum record length was 27.
   NOTE: The data set WORK.SPECIES has 14 observations and 3 variables.

tell you that while 29 records were read from a raw data file, the resulting SAS data set contains only 14 observations. If you were expecting only 14 observations, then this would be fine. But if you were expecting 29 observations, one observation for each input record, then this would tip you off that something went wrong.

Another type of note can help you write efficient programs. At the end of every step SAS prints a note similar to this:

   NOTE: The PROCEDURE PRINT used 6.98 seconds.

If you are running a one-time report, you may not care, but if you run the same program over and over then you may want to check your notes to see which steps can benefit the most from streamlining.

THE SPECIES DATA

The data for the next few examples appear in Table 1. Each observation contains data about one order in the class Insecta (La Plante, 1996). The variables are the name of the order (Order), the number of species in that order found in North America (InNorthAmerica), and the number of species found outside North America (OutsideNorthAmerica).

Table 1 Species data.

   Order             InNorthAmerica   OutsideNorthAmerica
   Thysanura                     20                   230
   Diplura                       30                   370
   Protura                       30                    70
   Collembola                   325                  1675
   Ephemeroptera                550                   950
   Odonata                      425                  4575
   Plecoptera                    34                  1266
   Grylloblattodea                .                     6
   Saltatoria                   110                 21890
   Phasmida                       .                     .
   Dictyptera                     .                     .
   Isoptera                      45                     .
   Dermaptera                    20                  1080
   Embioptera                    10                   140
   Psocoptera                   150                   950
   Zoraptera                      2                    17
   Mallophaga                   320                  2280
   Anoplura                      65                   285
   Thysanoptera                 625                  2375
   Hemiptera                   8750                 46250
   Neuroptera                   350                  4350
   Mecoptera                     70                   280
   Trichoptera                  950                  3550
   Lepidoptera                10500                189500
   Diptera                    16700                 68300
   Siphonaptera                 250                   850
   Hymenoptera                14600                 90400
   Coleoptera                 27000                530000
   Strepsiptera                 120                   180

THE MISSING SEMICOLON
A syntax error

Even the newest of SAS programmers knows that every SAS statement ends with a semicolon, so it is ironic that one of the most common bugs is the missing semicolon. While most SAS error messages are clear and easy to understand, the hallmark of a missing semicolon is confusion. Missing semicolons often produce a long stream of baffling messages.

Syntax-sensitive editors

The Enhanced Editor is the default editor for the Windows operating environment and the Program Editor is the default for all other operating environments. Both the Enhanced Editor and the Program Editor (starting with version 9 of SAS) color code your program. All the different elements of a SAS program (e.g. keywords, comments, quoted strings, constants, and variables) show up in a different color as you type them in the editor. This can be extremely valuable in catching missing semicolons (and also unmatched quotes) as you type your program, because the program will most likely not be colored properly if you have a missing semicolon. If you are not using Windows, and you do not have version 9 of SAS, you may still be able to use the color coding feature. In the OS/390 operating environment, the color coding feature became the default starting with version 8.2. For information on using the color editor under UNIX (8.1 or higher) see the SAS note SN-003114 (http://www.sas.com/service/techsup/unotes/SN/003/003114.html).

An example

In the following example, the absence of a semicolon at the end of the DATA statement causes two error messages, three warnings, and a suspicious note.

   1    DATA species
   2       INFILE 'bugspecies.dat';
   3       INPUT Order $ 1-15 InNorthAmerica OutsideNorthAmerica;
   4    RUN;

   ERROR: No CARDS or INFILE statement.
   ERROR: Memtype field is invalid.
   NOTE: The SAS System stopped processing this step because of errors.
   WARNING: The data set WORK.SPECIES may be incomplete. When this step was stopped there were 0 observations and 3 variables.
   WARNING: Data set WORK.SPECIES was not replaced because this step was stopped.
   WARNING: The data set WORK.INFILE may be incomplete. When this step was stopped there were 0 observations and 3 variables.
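
For comparison, the repaired step might look like the following sketch. It assumes the same raw data file name as the failing step; the only change is the semicolon that closes the DATA statement, so that INFILE is read as a statement of its own rather than as part of the DATA statement.

```sas
/* Sketch of the repaired step; the file name is taken from the failing example. */
DATA species;                      /* the semicolon ends the DATA statement here */
   INFILE 'bugspecies.dat';        /* INFILE is now a separate statement */
   INPUT Order $ 1-15 InNorthAmerica OutsideNorthAmerica;
RUN;
```

After a change like this, it is still worth reading the log from the top to confirm that the error and warning messages are gone.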
In the log for the step with the missing semicolon, the message “No CARDS or INFILE statement” is especially odd, since there obviously is an INFILE statement. Without a semicolon, the DATA statement becomes concatenated with the INFILE statement. SAS then interprets the keyword INFILE as a data set name in the DATA statement, resulting in the warning “data set WORK.INFILE may be incomplete.”

If you find that the messages in your log make no sense, check for missing semicolons.

DATASTMTCHK system option

Some cases of missing semicolons, like the one given in the previous example, may be easier to find if you use the DATASTMTCHK=ALLKEYWORDS system option. By default you cannot use the words MERGE, RETAIN, SET, or UPDATE as SAS data set names. If you set the DATASTMTCHK system option equal to ALLKEYWORDS, then SAS will not accept any keyword as a SAS data set name. For example, if you add the OPTIONS statement to the previous example, you will get the following messages in your SAS log:

   1    OPTIONS DATASTMTCHK=ALLKEYWORDS;
   2    DATA species
   3       INFILE 'bugspecies.dat';
           -----
           57
   ERROR 57-185: INFILE is not allowed in the DATA statement when option DATASTMTCHK=ALLKEYWORDS. Check for a missing semicolon in the DATA statement, or use DATASTMTCHK=NONE.
   4       INPUT Order $ 1-15 InNorthAmerica
   5             OutsideNorthAmerica;
   6    RUN;

This system option has the added advantage that it produces an error and therefore prevents SAS from writing over a good data set with a bad one just because you forgot a semicolon at the end of the DATA statement.

UNINITIALIZED VARIABLE AND VARIABLE NOT FOUND
A syntax error

These two related messages tell you that SAS was unable to find one of your variables. The first time you see one of these messages you will probably wonder what SAS is babbling about; after all, you remember creating the variable.

In the following SAS log, the INPUT statement reads the species data using the variable name InNorthAmerica for the number of species in North America. Then a subsetting IF statement contains the misspelled variable name NorthAmerica.

   1    DATA species (KEEP = Order World);
   2       INFILE 'bugspecies.dat';
   3       INPUT Order $ 1-15 InNorthAmerica OutsideNorthAmerica;
   4       IF NorthAmerica > 100;
   5       World = InNorthAmerica + OutsideNorthAmerica;
   6    RUN;

   NOTE: Variable NorthAmerica is uninitialized.

When SAS is unable to find a variable in a DATA step, SAS prints the variable-is-uninitialized message. Then SAS creates the variable, sets its values to missing for all observations, and runs the DATA step. It's nice that SAS runs the DATA step, but you probably don't want the variable to have missing values for all observations.

A more serious problem ensues when SAS is unable to find a variable in a PROC step. In the following example, SAS cannot find the variable InNorthAmerica. This variable did exist, but was accidentally dropped in the previous DATA step because it was not listed in the KEEP option. SAS prints the variable-not-found message and does not run the procedure at all.

   7    PROC PRINT DATA=species;
   8       VAR Order InNorthAmerica World;
   ERROR: Variable INNORTHAMERICA not found.
   9    RUN;

Another version of the variable-not-found message appears as a warning when the problem occurs in a less critical statement such as a LABEL statement. Because this is a warning, not an error, SAS runs the step.

Possible causes of the variable-is-uninitialized and variable-not-found messages include:
• A misspelled variable name.
• Using a variable that has been dropped.
• Using the wrong data set.
• Using a variable before it is created.

MISSING VALUES WERE GENERATED
A data error

The missing-values-were-generated note tells you that SAS was unable to compute the value of a new variable because of existing missing values in your data. This may not indicate a problem, but it warrants an investigation.

In the following example, SAS computes the variable World by adding the two variables InNorthAmerica and OutsideNorthAmerica.

   1    DATA species;
   2       INFILE 'bugspecies.dat';
   3       INPUT Order $ 1-15 InNorthAmerica OutsideNorthAmerica;
   4       World = InNorthAmerica +
   5          OutsideNorthAmerica;
   6    RUN;

   NOTE: Missing values were generated as a result of performing an operation on missing values.
         Each place is given by: (Number of times) at (Line):(Column).
         4 at 4:27

This missing-values-were-generated note tells you that SAS assigned missing values to four observations at line 4 column 27 of the program. A quick look back at the species data shows that 4 observations have missing data for InNorthAmerica or OutsideNorthAmerica. Therefore, SAS was unable to compute a value for the variable World for those particular observations.

The SUM function and its cousin the MEAN function can lessen this problem because they use only non-missing values. If you decided that you just wanted to sum the non-missing values in the preceding program, you would use this statement:

   World = SUM(InNorthAmerica, OutsideNorthAmerica);

However, if you have an observation with missing values for all of the variables named in the function, then the result would still be missing and you would still get the missing-values note for that observation.

THE INSECTICIDE DATA

Data about the effectiveness of insecticides appears in Table 2 (based on Conklin, 1996). The variables are the name of the insecticide (Insecticide), its residual effect on insects (ToxInsect), and its toxicity to mammals (ToxMammal). The toxicity to insects and mammals are both rated on a scale of 0 to 6 where 0 indicates no effect and 6 means extreme toxicity.

Table 2 Insecticide data.

   Insecticide         ToxInsect   ToxMammal
   DDT                     6           3
   Malathion               4           1
   Pyrethrins              2           1
   Sulfur                  6           0
   Chlordane               6           4
   Diazinon                4           4
   Heptachlor              6           5
   Nicotine sulfate        1           6

Nicotine sulfate is a lousy insecticide. It has almost no residual effect against insects, but is extremely toxic to people.

NUMERIC TO CHARACTER CONVERSION
A data error

If you accidentally mix numeric and character variables, SAS will convert the data from one type to the other, run the program anyway, and print the values-have-been-converted note as shown in this log:

   1    DATA poisons;
   2     INFILE 'bugtox.dat';
   3     INPUT Insecticide $ 1-16 ToxInsect $ 18 ToxMammal $ 20;
   4     Difference = ToxInsect - ToxMammal;
   5    RUN;

   NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
         4:16 4:28

In this example, the variables ToxInsect and ToxMammal are input as character variables but are used in an arithmetic expression. Since you can't subtract character values, SAS must convert these variables to numeric. Specifically, the note tells you that SAS converted data at columns 16 and 28 in line 4 of the program. These columns correspond to the variable names ToxInsect and ToxMammal.

It's nice that SAS tries to fix the problem for you, but this doesn't mean that you can ignore the message. If you let SAS convert your variables, it can come back to haunt you at a later time when the variable that you think is numeric is now character or vice versa. If a variable needs to be converted, you should do it yourself, explicitly, so there are no surprises. To convert from character to numeric you use the INPUT function. To convert from numeric to character, you use the PUT function. The basic forms of these statements are:

   character to numeric:   newvar = INPUT(oldvar, informat.);
   numeric to character:   newvar = PUT(oldvar, format.);

In either case, the informat or format must be numeric. To convert the variables ToxInsect and ToxMammal, you could use these statements:

   NewToxInsect = INPUT(ToxInsect, 1.);
   NewToxMammal = INPUT(ToxMammal, 1.);

The resulting variables (now numeric) will have a length of 8 bytes, which is the default for numeric variables.

Possible causes of the values-have-been-converted message include:
• Setting a variable equal to another variable of a different type.
• Using a variable with the wrong type of function.
• Using a character variable in an arithmetic expression.

INVALID DATA
A data error

The invalid-data note may be the ugliest message you'll ever see in a SAS log, but once you know how to read it, you'll know exactly what the problem is every time.

Whenever SAS encounters invalid data while reading with an INPUT statement, SAS sets the problematic variable to missing for that observation and then prints a detailed message like this:

   1    DATA poisons;
   2       INFILE 'bugtox.dat';
   3       INPUT Insecticide $ 1-15 ToxInsect ToxMammal;
   4    RUN;

   NOTE: Invalid data for ToxInsect in line 8 16-16.
   RULE:     ----+----1----+----2----+--
   8         Nicotine sulfate 1 6
   Insecticide=Nicotine sulfat ToxInsect=. ToxMammal=1 _ERROR_=1 _N_=8

The first line of this message is a note telling you which variable had a problem (ToxInsect, in this case), the line of data at which the problem occurred (line 8), and the column SAS was trying to read (column 16). Next SAS prints a line labeled RULE, which is a handy ruler for counting columns. On this ruler 1 indicates the 10th column, 2 the 20th, and so on. Then SAS dumps the actual line of raw data so you can see the little troublemaker for yourself. Finally, SAS prints the values of the variables as it has read them plus two automatic variables: _ERROR_ and _N_. _ERROR_ always has a value of 1 if there is a data error, and 0 if there is not. _N_ is the number of times SAS has looped through the current DATA step.

In this case, you can see that column 16 contains the letter “e”. SAS was trying to read the “e” as the value for ToxInsect, but ToxInsect is a numeric variable. Unfortunately, the INPUT statement told SAS to read Insecticide from columns 1-15, but it should have said 1-16.

Possible causes of the invalid-data message include:
• Forgetting to specify that a variable is character (SAS assumes it is numeric).
• Incorrect column specifications producing embedded spaces in numeric data.
• Incorrect column specifications producing character values for a numeric variable.
• List-style data with two periods in a row and no space in between.
• Failing to mark a missing value with a period in list-style input, causing SAS to read the data for the next variable.
• Using the letter O instead of the number zero.
• Special characters such as carriage-return-line-feed and page-feed.
• Invalid dates (such as September 31) read with a date informat.
• Using the wrong informat such as MMDDYY. instead of DDMMYY.

CHARACTER FIELD TRUNCATED
A data error

This bug does not generate any error messages or suspicious notes, but you know that you have this problem when you print your data and find the end of a character variable has been lopped off.

The length of a character variable is set when SAS first encounters the variable, typically in an INPUT or assignment statement. If you use list-style input, the default length for character variables is eight bytes. With column-style input it is the number of columns you specify. With formatted-style input it is the length of the informat. If you create a new variable with assignment statements, SAS sets its length based on the first occurrence of the variable.

In the following example, the variable Toxicity is first set equal to 'High'. Therefore SAS gives Toxicity a length of four bytes, and any subsequent longer values will be truncated.

   DATA poisons (DROP = ToxInsect);
      INFILE 'bugtox.dat';
      INPUT Insecticide $ 1-16 ToxInsect ToxMammal;
      IF ToxMammal >= 5 THEN Toxicity = 'High';
         ELSE IF ToxMammal >= 3 THEN Toxicity = 'Moderate';
         ELSE IF ToxMammal >= 1 THEN Toxicity = 'Low';
         ELSE Toxicity = 'No Effect';
   RUN;

Using a PROC PRINT you can see the truncated values for Toxicity.

                     The SAS System                     1

                                  Tox
   Obs    Insecticide           Mammal    Toxicity
    1     DDT                      3       Mode
    2     Malathion                1       Low
    3     Pyrethrins               1       Low
    4     Sulfur                   0       No E
    5     Chlordane                4       Mode
    6     Diazinon                 4       Mode
    7     Heptachlor               5       High
    8     Nicotine sulfate         6       High

You could fix this problem by padding the value 'High' with blanks, but a more elegant and explicit solution is to use the LENGTH statement. In this example, you would insert this statement in the DATA step before the first occurrence of the variable Toxicity.

   LENGTH Toxicity $9;

Here's the output you get when using the LENGTH statement. The values of Toxicity are no longer truncated.

                     The SAS System                     2

                                  Tox
   Obs    Insecticide           Mammal    Toxicity
    1     DDT                      3       Moderate
    2     Malathion                1       Low
    3     Pyrethrins               1       Low
    4     Sulfur                   0       No Effect
    5     Chlordane                4       Moderate
    6     Diazinon                 4       Moderate
    7     Heptachlor               5       High
    8     Nicotine sulfate         6       High

Perhaps some future release of SAS will warn you when character values are truncated, but for now you are on your own.

THE MOTH FLIGHT DATA

Data about the flight of 15 individual moths appears in Table 3 (Callahan, 1971). The variables are the moth's species (Species), its family (Family, where n=noctid and s=sphingid), sex (Sex), weight in grams (Weight), and lift in grams at three, six, and twelve degrees of pitch (Lift3, Lift6, and Lift12).

Table 3 Moth flight data.

   Species            Family   Sex   Weight   Lift3   Lift6   Lift12
   corn earworm         n       m     0.107   0.118   0.168    0.249
   corn earworm         n       m     0.226   0.131   0.186    0.281
   corn earworm         n       f     0.161   0.108   0.150    0.232
   corn earworm         n       f     0.239   0.154   0.218    0.327
   corn earworm         n       f     0.279   0.158   0.222    0.336
   fall armyworm        n       m     0.140   0.113   0.159    0.241
   fall armyworm        n       m     0.139   0.113   0.159    0.241
   fall armyworm        n       f     0.156   0.131   0.186    0.277
   white-lined          s       m     0.600   0.322   0.458    0.681
   white-lined          s       m     0.322   0.313   0.436    0.654
   white-lined          s       f     0.660   0.276   0.386    0.581
   white-lined          s       f     0.853   0.336   0.472    0.708
   tobacco hornworm     s       f     1.199   0.721   1.017    1.525
   tobacco hornworm     s       f     1.604   0.617   0.872    1.307
   satellite            s       f     1.726   0.767   1.076    1.616

DATA STEP PRODUCES WRONG RESULTS BUT NO ERROR
A logic error

Sometimes a DATA step can seem like a “black box.” You know what goes in, and you know what comes out, but what goes on in the middle can be a mystery. If what comes out is not what you want, then you have a bug. Problems like this are really logic errors. Somewhere along the way SAS got the wrong instruction: a classic case of the computer doing what you tell it to do, not what you want.

An example

For a moth, a bird, or even a supersonic jet, flight occurs when lift exceeds weight. Using the moth flight data and a series of IF-THEN/ELSE statements, the following program finds the angle of attack at which each moth can sustain flight. The new variable Angle equals 3, 6, or 12 depending on the angle at which the moth's lift exceeds its weight.

   DATA moths;
      INFILE 'bugwing.dat';
      INPUT Species $ 1-16 Family $ Sex $ Weight Lift3 Lift6 Lift12;
      IF Lift3 >= Weight THEN Angle = 3;
         ELSE IF Lift6 >= Weight THEN Angle = 6;
         ELSE IF Lift12 >= Weight THEN Angle = 12;
   PROC PRINT DATA=moths;
      TITLE 'Angle of Attack to Sustain Flight';
      VAR Species Angle;
   RUN;

This program runs fine (without errors, warnings, or suspicious notes), but looking at the following output you can see several observations have missing values for Angle.

          Angle of Attack to Sustain Flight          1

   Obs    Species             Angle
    1     corn earworm           3
    2     corn earworm          12
    3     corn earworm          12
    4     corn earworm          12
    5     corn earworm          12
    6     fall armyworm          6
    7     fall armyworm          6
    8     fall armyworm          6
    9     white-lined           12
   10     white-lined            6
   11     white-lined            .
   12     white-lined            .
   13     tobacco hornworm      12
   14     tobacco hornworm       .
   15     satellite              .

One way to figure out what went wrong is just to look at the program and the output from PROC PRINT. When that doesn't work, then there are two ways to solve the mystery: the traditional method using PUT statements, and the DATA step debugger.

Using PUT statements

PUT statements are like INPUT statements in reverse. Instead of reading data, they write it. The basic idea behind using PUT statements to debug a DATA step is to print data values at intermediate points in the DATA step. When used without a FILE statement, PUT statements write values in the log, a handy place for them to be for debugging. The following statement tells SAS to print the values of selected variables for every observation with a missing value for Angle.

   IF Angle = . THEN PUT Weight= Lift3= Lift6= Lift12=;

After inserting this statement in the program and rerunning it, the log looks like this:

   1    DATA moths;
   2       INFILE 'bugwing.dat';
   3       INPUT Species $ 1-16 Family $ Sex $ Weight
   4             Lift3 Lift6 Lift12;
   5       IF Lift3 >= Weight THEN Angle = 3;
   6          ELSE IF Lift6 >= Weight THEN Angle = 6;
   7          ELSE IF Lift12 >= Weight THEN Angle = 12;
   8       IF Angle = . THEN PUT Weight=
   9          Lift3= Lift6= Lift12=;
   10   RUN;

   NOTE: The infile 'bugwing.dat' is:
         FILENAME=C:\bugwing.dat,RECFM=V,
         LRECL=256

   WEIGHT=0.628 LIFT3=0.236 LIFT6=0.331 LIFT12=0.504
   WEIGHT=0.66 LIFT3=0.276 LIFT6=0.386 LIFT12=0.581
   WEIGHT=0.853 LIFT3=0.336 LIFT6=0.472 LIFT12=0.708
   WEIGHT=1.604 LIFT3=0.617 LIFT6=0.872 LIFT12=1.307
   WEIGHT=1.726 LIFT3=0.767 LIFT6=1.076 LIFT12=1.616
   NOTE: 15 records were read from the infile 'bugwing.dat'.
         The minimum record length was 48.
         The maximum record length was 48.
   NOTE: The data set WORK.MOTHS has 15 observations and 9 variables.
   NOTE: The DATA statement used 1.86 seconds.

Looking at the data values in the log, you can see that lift never exceeds weight for these moths. Apparently, these moths need an angle of attack greater than 12 to get off the ground. The IF-THEN/ELSE series should be rewritten so that it takes into account the possibility that some moths may not sustain flight at 12 degrees.

In this example, the problem was simple enough that you could have solved it by using a PROC PRINT after the DATA step. In real life, the PUT statement technique is most useful when you have a long and convoluted DATA step, especially if that DATA step was written by someone else and you are handed the whole step rather than having the luxury of building it piece by piece.

USING THE DATA STEP DEBUGGER

The DATA step debugger offers SAS programmers an alternative way to investigate logic errors. To understand the DATA step debugger, you have to know that SAS runs programs in two phases. First SAS compiles your program, then SAS executes your program. Syntax errors and some data errors such as numeric to character conversions occur at compile time. Other errors such as logic errors and some data errors compile just fine, but cause you to get bad results. Since the DATA step debugger works during the execution phase, it is only useful for identifying errors that occur at that time.

Space limits do not allow for a detailed discussion of the DATA step debugger, but the information here should be enough to get you started. For more information see the SAS OnlineDoc.

To invoke the debugger, add “/ DEBUG” to the end of your DATA statement. Then run the DATA step in the SAS windowing environment. For the preceding example you would submit this.

   DATA moths / DEBUG;
      INFILE 'bugwing.dat';
      INPUT Species $ 1-16 Family $ Sex $ Weight Lift3 Lift6 Lift12;
      IF Lift3 >= Weight THEN Angle = 3;
         ELSE IF Lift6 >= Weight THEN Angle = 6;
         ELSE IF Lift12 >= Weight THEN Angle = 12;
   RUN;

After you submit the DATA step, two windows will appear (see Figure 1). These are the DEBUGGER LOG window and the DEBUGGER SOURCE window. The DEBUGGER LOG window contains messages from the debugger and a command line. The SOURCE window contains your DATA step statements with the current line highlighted. By watching the highlighting move, you can see how SAS executes your program. SAS executes each line of your program for the first observation, then returns to the top of the DATA step for the second observation, and so on.

Figure 1 DATA Step Debugger screen.

One nice bonus of the DATA step debugger is the ability to watch SAS execute a DATA step line-by-line and observation-by-observation. For a beginner, this alone could be very enlightening. By using commands you can control how many lines SAS executes, and you can print the current values of variables you specify. Some of the basic commands that you can issue appear in the following table.

Table 4 DATA step debugger commands.

   Command                  Description
   EXAMINE variable-list    Prints the values of specified variables. Must specify variable names or _ALL_.
   STEP                     Executes one statement.
   SET variable = value     Assigns a new value to a specified variable.
   BREAK linenumber         Tells SAS to execute statements up to the line number specified. Use the GO command to begin execution.
   GO                       Starts or resumes execution of the DATA step.
   QUIT                     Ends a debugger session.

To get a feel for the debugger, you may want to start by stepping through your DATA step line by line. When you want to know the current values of variables, issue an EXAMINE command such as

   EXAMINE _ALL_

for all variables, or

   EXAMINE Weight Lift3 Lift6 Lift12 Angle

to choose specific variables. To end your debugging session enter the command

   QUIT

DATA STEP DEBUGGER VS. PUT STATEMENTS

Some programmers will probably find the DATA step debugger very useful; others may choose to stay with the traditional PUT statement method. The debugger is designed to be used in the SAS windowing environment, so people who normally use SAS in batch will probably prefer the PUT statement method. The debugger can work in batch in some environments (by popping you into an interactive window), but it makes more sense for people who work interactively.

Since the DATA step debugger is more interactive, it is better suited to an exploratory approach, printing a few data values here and there, making decisions as you go. If you have a general idea of which part of your DATA step is causing the problem, then you may find it simpler to use PUT statements. Some logic errors may be easier to debug by looking at more than one observation at a time. In those cases the observation-by-observation nature of the debugger may give less insight than the PUT statement method.

AVOIDING ERRORS

Learning how to find bugs in your programs and knowing how to fix errors is critical, but the best strategy for debugging is avoiding the errors in the first place. There are several strategies you can use to help you write programs that work:
• Write programs in a neat and organized manner.
• Develop programs piece by piece.
• Test programs with small sample data sets.

Write programs in a neat and organized manner

SAS does not require your program to be neat and organized, but you will be doing yourself a big favor if you follow a few simple guidelines when writing your programs.
• Never put more than one SAS statement on a line.
• Use indentation to show the different parts of your program and to show which statements belong together.
• Use comment statements.

Programs that are easy to read are much easier to debug.

Develop programs piece by piece

We can't emphasize this point enough. When you are developing programs that involve several steps, or you have complicated DATA steps that do several different things, don't write the whole program at one sitting! It will improve your programming efficiency greatly if you start simply, test, and build from there.

For example, you may need to write a program that will read data from a text file, calculate some new variables, produce means, merge with another data set, and then output to an Excel file. It is possible to run this entire program and not have any results in your SAS Output window, and once you are sure the program is working properly, you probably don't want to produce any SAS output, only the Excel file. But when you are developing the program, you will want to see some intermediate results.

First you would want to write a simple DATA step with an INPUT statement to read in the data. Then always use PROC PRINT to make sure the data were read correctly. Once you are convinced that the INPUT statement is working, add the statements to create the new variables, and once again use PROC PRINT to make sure the results are correct. Then write the PROC MEANS statements, and once again check to see if your results are correct. Keep adding on to the program, testing as you go, until you get your final result. If you write the program all at once, and you end up with an empty Excel sheet, then you won't know where to start looking for the problem. But if you have tested your program every step along the way and you encounter a problem, then you know to look at the last step that you added. So use PROC PRINT liberally!

Test programs with small sample data sets

Sometimes it is just not practical to test your program with your entire data set. If the data file is very large, it may take a long time to run your program, or all those PROC PRINTs may produce volumes of output that are difficult to wade through to see what is going on. There are at least three ways to test your program with smaller data sets:
• Create a small subset of your data.

For example, the following statement tells SAS to start reading at the 101st data line and stop at line 200.

   INFILE 'allinsects.dat' FIRSTOBS=101 OBS=200;

If you are reading one line of data for each observation, this will produce a data set with 100 observations, but if you are reading two lines of data for each observation, then you will only get 50 observations.

You can also use the OBS= and FIRSTOBS= options in procedures. For example, the following prints only the first 10 observations of the SAS data set SPECIES.

   PROC PRINT DATA=species(OBS=10);

This next example shows how you can use these options on a SET statement to create a new SAS data set, SOMEINSECTS, that consists of just observations 50 to 100 of the data set INSECTS.

   DATA someinsects;
      SET insects (FIRSTOBS=50 OBS=100);

Sometimes it is even more useful to make up data to use when you are testing your program. When you make up data, you can often create a very compact data set that has all possible combinations of data. And, since you made up the data, you know exactly what results to expect. Of course this method will not test for data types you had not anticipated, so if you use this method always test again with the real data.

CONCLUSIONS

This paper has discussed some of the most common SAS programming bugs and how to exterminate them, as well as strategies for avoiding errors in the first place. You should always check your SAS log even when the output looks fine. Notes are just as important as error messages and warnings in debugging your programs. Once you understand why you got an error, you'll be better able to avoid it in the future.
• Use the OBS= and FIRSTOBS= options. • Make up data. REFERENCES Callahan, Philip S. (1971). Insects and How They Function. Holiday House, NY. Conklin, Gladys (1996). Insects. Encyclopedia Americana, International Edition. Grolier, Danbury, CT, vol. 15, pp. 197-208. You can create a small sample data set and use that for testing purposes, but this may not always be easy to do, or you may not want another data file floating around that could get mistaken for the real thing. An alternative to this is to use the OBS= and/or FIRSTOBS= options. The OBS= option on the INFILE statement tells SAS how many data lines to read. For example, the following statement tells SAS to read the first 100 data lines from the file ‘allinsects.dat’. LaPlante, Albert A. (1996). Insect Control. Encyclopedia Americana, International Edition (1996). Grolier, Danbury, CT, vol. 15, pp. 197-208. McGavin, George C. (1993). Bugs of the World. Facts on File, Inc., New York. RECOMMENDED READING Carpenter, Arthur L (1996). Programming for Job Security: Tips and Techniques to Maximize Your Indispensability. Proceedings of the Twenty-First Annual SAS Users Group International Conference, 19, pp. 1637-1640. INFILE ‘allinsects.dat’ OBS=100; You can use the OBS= option in combination with the FIRSTOBS= option to read from the middle of a file. For 9 SUGI 28 Beginning Tutorials Carpenter, Arthur L (2000). Long Names and Other V8 Tips: A Bit More on Job Security. Western Users of SAS Software Proceedings of the Seventh Annual Regional Conference. pp. 448-453. Delwiche, Lora D. and Susan J. Slaughter (2002). The Little SAS Book: A Primer, Revised Second Edition. SAS Institute, Cary, NC. Hayes, Brian (1995). Debugging Myself. American Scientist, 83, pp. 404-408. Staum, Roger (2002). To Err is Human: To Debug Divine. Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference, paper 6427. 
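As a supplement to the advice on making up test data and on the PUT statement method, the following is a minimal sketch of how the two can be combined. It reuses the variable names and the IF/ELSE logic from the moths example in this paper; the data set name testmoths, the species labels, and all four data lines are invented for illustration, with one line written for each possible outcome of the logic.

```sas
* Made-up test data with one line for each branch of the IF/ELSE logic;
* The data set name and every data value below are hypothetical;
DATA testmoths;
   INPUT Species $ Weight Lift3 Lift6 Lift12;
   IF Lift3 >= Weight THEN Angle = 3;
   ELSE IF Lift6 >= Weight THEN Angle = 6;
   ELSE IF Lift12 >= Weight THEN Angle = 12;
   * Write the values for every observation to the SAS log;
   PUT _N_= Species= Weight= Lift3= Lift6= Lift12= Angle=;
   DATALINES;
MothA 5 6 7 8
MothB 5 4 6 8
MothC 5 3 4 6
MothD 5 1 2 3
;
RUN;

* Print the result so it can be compared with the expected values;
PROC PRINT DATA=testmoths;
   TITLE 'Made-up moths: expect Angle = 3, 6, 12, and missing';
RUN;
```

Because the data are made up, the expected values of Angle are known in advance: 3, 6, 12, and a missing value. The fourth line is included because the IF/ELSE chain has no final ELSE, so a moth that cannot lift its weight at any angle is left with a missing Angle. If the log or the printout shows anything else, the problem is in the logic rather than in the data.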
ABOUT THE AUTHORS

Lora Delwiche and Susan Slaughter are also the authors of The Little SAS Book: A Primer, published by SAS Institute, and may be contacted at:

   Lora D. Delwiche
   (530) 752-7457
   llddelwiche@ucdavis.edu

   Susan J. Slaughter
   (530) 756-8434
   susan@avocetsolutions.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.