User Manual: 007-0715-060
POWER Fortran Accelerator™ User’s Guide
Document Number 007-0715-060

CONTRIBUTORS
Written by Chris Hogue and David Graves
Edited by Janiece Carrico
Production by Gloria Ackley
Engineering contributions by Bron Nelson, Deb Caruso, and Mike Humphrey

© Copyright 1991–1994, Silicon Graphics, Inc. All Rights Reserved

This document contains proprietary and confidential information of Silicon Graphics, Inc. The contents of this document may not be disclosed to third parties, copied, or duplicated in any form, in whole or in part, without the prior written permission of Silicon Graphics, Inc.

RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights are reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94039-7311.

Silicon Graphics and IRIS are registered trademarks, and POWER Fortran Accelerator, POWER Series, and IRIX are trademarks of Silicon Graphics, Inc. Cray is a trademark of Cray Research. VAST is a trademark of Pacific Sierra Research, Inc. VMS is a trademark of Digital Equipment Corporation. Kuck and Associates, Inc., is the supplier of the optimizer used in this product.

Contents

Introduction
    Organization
    Related Documentation
    Typographical Conventions
1. Overview of PFA
    Overview
    Strategy for Using PFA
    Command Line Options
    Directives
    Assertions
    Summary
2. How to Use PFA
    Overview
    Compiling Programs With PFA
    Using PFA Directly
3. Utilizing PFA Output
    Overview
    Formatting the Listing File
    Paginating the Listing
    Specifying Information to Include
    Disabling Message Classes
    Interpreting Default Listing Information
    Viewing the Listing File
    Field Descriptions
    Sample Listing Files
    Indirect Indexing
    Function Call
    Reductions
4. Customizing PFA Execution
    Overview
    Controlling Code Execution
    Running Code in Parallel
    Specifying a Work Threshold
    Controlling PFA Code Transformations
    Controlling Size/Complexity Thresholds
    Setting the Optimization Level
    Controlling Variations in Round Off
    Controlling the Number of Scalar Optimizations
    Enabling Loop Unrolling
    Memory Management Transformations
    Performing Inlining and Interprocedural Analysis
    Specifying Routines for Inlining or IPA
    Specifying Where to Search for Routines
    Creating a Library
    Specifying Occurrences
    Conditions That Prevent Inlining or IPA
    Controlling Fortran Language Elements
    Global Assumptions
    Debugging Lines
    DO Loop Execution
    Variable Saving Across Invocations
    Significant Columns
    Fortran Standard
    Controlling Directives and Assertions
    Selecting Directives and Assertions
    Controlling PFA I/O
    Obsolete Syntax
5. Fine-Tuning PFA
    Overview
    Fine-Tuning Inlining and IPA
    Circumventing PFA
    C$ DOACROSS
    C$&
    Running Code Serially
    C*$* ASSERT DO (SERIAL)
    CDIR$ NEXT SCALAR
    C*$* ASSERT DO PREFER (SERIAL)
    Running Code in Parallel
    C*$*[NO]CONCURRENTIZE
    CVD$ CONCUR
    C*$* ASSERT DO PREFER (CONCURRENT)
    Ignoring Data Dependencies
    C*$* ASSERT DO (CONCURRENT)
    CDIR$ IVDEP
    C*$* ASSERT CONCURRENT CALL
    C*$* ASSERT NO RECURRENCE
    C*$* ASSERT PERMUTATION
    Using Equivalenced Variables
    Using Aliasing
    C*$* ASSERT [NO] ARGUMENT ALIASING
    C*$* ASSERT RELATION
A. PFA Command Line Options
    Overview
    Options Summary
    Obsolete Syntax
B. PFA Directives
    Standard Directives
    Cray Directives
    VAST Directives
C. PFA Assertions
Glossary
Index

Figures

Figure 2-1   Compiling With PFA

Tables

Table 1-1    PFA Directives
Table 1-2    PFA Assertions and Their Duration
Table 2-1    PFA Command Line Options
Table 3-1    Listing File Include Options
Table 3-2    Listing File Message Disabling Options
Table 3-3    Listing File DO Loop Delimiters
Table 3-4    PFA Action Abbreviations
Table 3-5    Reduction Types
Table 4-1    Inlining and IPA Search Command Line Options
Table 4-2    Obsolete Options
Table 4-3    Obsolete Options and Their Equivalents
Table A-1    PFA Command Line Options
Table A-2    ARCLIMIT Option
Table A-3    ASSUME Option
Table A-4    CONCURRENTIZE Option
Table A-5    DIRECTIVES Option
Table A-6    DLINES Option
Table A-7    FORTRAN Option
Table A-8    INLINE Option
Table A-9    INLINE_CREATE Option
Table A-10   INLINE_DEPTH Option
Table A-11   INLINE_FROM_FILES Option
Table A-12   INLINE_FROM_LIBRARIES Option
Table A-13   INLINE_LOOPLEVEL Option
Table A-14   INLINE_MAN Option
Table A-15   INPUT Option
Table A-16   IPA Option
Table A-17   IPA_CREATE Option
Table A-18   IPA_FROM_FILES Option
Table A-19   IPA_FROM_LIBRARIES Option
Table A-20   IPA_LOOPLEVEL Option
Table A-21   IPA_MAN Option
Table A-22   LIMIT Option
Table A-23   LINES Option
Table A-24   LIST Option
Table A-25   LISTOPTIONS Option
Table A-26   MINCONCURRENT Option
Table A-27   NOCONCURRENTIZE Option
Table A-28   NODIRECTIVES Option
Table A-29   NODLINES Option
Table A-30   NOONETRIP Option
Table A-31   ONETRIP Option
Table A-32   OPTIMIZE Option
Table A-33   ROUNDOFF Option
Table A-34   SAVE Option
Table A-35   SCALAROPT Option
Table A-36   SCAN Option
Table A-37   SUPPRESS Option
Table A-38   SYNTAX Option
Table A-39   UNROLL Option
Table A-40   UNROLL2 Option
Table A-41   Obsolete Options
Table A-42   Obsolete Options and Their Equivalents

Introduction

This guide describes the features of the Silicon Graphics POWER Fortran Accelerator™ (PFA). For details about analyzing a program and converting it for use on a multiprocessor system, refer to Chapter 5, “Fortran Enhancements for Multiprocessors,” of the Fortran 77 Programmer’s Guide.

Organization

This guide contains the following chapters and appendixes:

• Chapter 1, “Overview of PFA,” explains the basic mechanism for invoking PFA and includes a description of PFA’s listing and intermediate files.
• Chapter 2, “How to Use PFA,” explains how to use PFA directly and as part of a Fortran compile.
• Chapter 3, “Utilizing PFA Output,” explains output produced by PFA: the intermediate file and the listing file.
• Chapter 4, “Customizing PFA Execution,” describes how to use command line options to optimize PFA execution.
• Chapter 5, “Fine-Tuning PFA,” describes how to optimize code by using PFA directives and assertions.
• Appendix A, “PFA Command Line Options,” lists the five types of PFA command line options: parallelization, optimization, Fortran 77 language control, directives, and listing.
• Appendix B, “PFA Directives,” lists the PFA directives you can use to modify the features of PFA, that is, directives to increase the optimization level, increase the size of the loop that PFA can analyze, or use more sophisticated (and time-consuming) ways of resolving superficial data dependencies that prevent PFA from identifying a loop for parallel execution.
• Appendix C, “PFA Assertions,” lists the PFA assertions you can include in a program to provide information that PFA needs to identify loops that can run in parallel, despite apparent but sometimes non-existent data dependencies.
• The Glossary lists and defines terminology related to PFA.

Related Documentation

The following documents contain information relevant to PFA:

• Fortran 77 Programmer’s Guide, Silicon Graphics, Inc., document number 007-0711-030.
• Fortran 77 Language Reference Manual, Silicon Graphics, Inc., document number 007-0710-040.
• IRIS-4D Series Compiler Guide, Silicon Graphics, Inc., document number 007-0905-030.

Typographical Conventions

This guide uses the following conventions and symbols, both in the text and to describe the form of Fortran statements:

Bold
    Indicates literal command line options, filenames, keywords, function/subroutine names, pathnames, and directory names.
Italics
    Represents user-defined values. Replace the item in italics with a legal value. Italics are also used for command names, manual page names, and manual titles.
Courier
    Indicates command syntax, program listings, computer output, and error messages.
Courier bold
    Indicates user input.
[]
    Enclose optional command arguments.
()
    Following function/subroutine names, surround the arguments (or are empty if the function has no arguments). Following IRIX commands, surround the manual page section in which the command is described.
|
    Separates two or more optional items.
...
    Indicates that the preceding optional items can appear more than once in succession.
#
    IRIX shell prompt for the superuser.
%
    IRIX shell prompt for users other than superuser.

Here is an example illustrating the syntax conventions.

    C*$*[NO]IPA [(name [,name...])] {HERE|ROUTINE|GLOBAL}

The previous syntax statement indicates that:

• The keyword C*$* NOIPA or C*$*IPA must be written as shown.
• You can specify one or more name values, each separated by a comma and all between parentheses.
• You must specify one of the following: HERE, ROUTINE, or GLOBAL.

The following statements are valid examples of the described syntax:

    C*$* IPA(ALPHA,BETA) HERE
    C*$* NOIPA GLOBAL

1. Overview of PFA

This chapter contains the following sections:

• “Overview” describes how PFA operates and suggests procedures for using it.
• “Strategy for Using PFA” explains when and how to use PFA.
• “Command Line Options” lists and describes the command line options.
• “Directives” explains what a directive is and lists the supported directives.
• “Assertions” explains what an assertion is and lists the supported assertions.
• “Summary” is a short summary of the capabilities of PFA.

Overview

PFA is a Fortran 77 source-to-source preprocessor that enables you to run existing Fortran 77 programs efficiently on the Silicon Graphics POWER Series™ multiprocessor systems. PFA analyzes a program and identifies loops that do not contain data dependencies. Such loops are safe to execute in parallel (concurrently). PFA automatically inserts special compiler directives in a modified copy of the original source code. (PFA produces a number of files containing code and other information you need to run a program concurrently on multiple processors.)

Interpreting the PFA-generated compiler directives, the Silicon Graphics Fortran 77 compiler can generate code to split loop processing across all the available processors. Because the directives inserted by PFA look like standard Fortran 77 comment statements, PFA does not affect the portability of the code to non-Silicon Graphics, Inc. (SGI), systems.

In addition, you do not need a multiprocessor system to develop under PFA (although there is a slight performance loss when running multiprocessed code on a single-processor system). You can develop and test a Fortran 77 program using PFA on any IRIS-4D™ Series workstation (including single-processor systems) and then execute the program on a multiprocessor system. The executable code automatically adjusts itself to use all the processors available on the workstation at run time. (You can also manually specify the number of processors to use; see the Fortran 77 Programmer’s Guide.)

However, simply passing code through PFA rarely produces all the increased performance available.
There are often easily removed data dependencies that prevent PFA from running a loop in parallel. Using the listing file, optionally generated by PFA, you can find the real or potential data dependencies that prevented PFA from running a loop in parallel. Refer to Chapter 3, “Utilizing PFA Output,” for details about the listing file. If the data dependency is real, you can often remove the dependency by making a small change to the code. If the data dependency was apparent but not real, you can explicitly instruct PFA to run the code in parallel by inserting PFA assertions. These assertions look like Fortran 77 comments.

With PFA, you select the code to convert to run in parallel. Thus, you can convert the whole program or key parts of it by adding PFA directives manually or by having PFA convert only selected files. In addition, you can run PFA on some, all, or none of a program’s source files. The object files produced using PFA are fully compatible with other object files. You can freely combine them with object files that you prepared manually for parallel execution and with object files that run only serially.

Strategy for Using PFA

Use PFA to identify which loops of a Fortran 77 program can be run safely in parallel. In some instances, PFA alone makes a significant amount of the code run in parallel. However, for many programs simple code changes let PFA automatically run more of the code in parallel.

Knowing when and where to modify your code means understanding the information in the PFA listing. Understanding the PFA listing will make it easy to recognize where small changes to the code can make big differences in how much code can run in parallel. Refer to Chapter 3, “Utilizing PFA Output,” for information.

PFA analyzes a program for data dependence. During this analysis, PFA looks for Fortran 77 DO loops in which each iteration of the loop is independent of all other iterations.
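The distinction between independent and dependent iterations can be illustrated with a small sketch. The program name, array names, and sizes below are hypothetical and chosen only for illustration; the program is plain Fortran 77 and does not rely on any PFA feature.

```fortran
      PROGRAM DEPDEM
C     Hypothetical demonstration; all names here are illustrative.
      INTEGER N
      PARAMETER (N = 8)
      REAL A(N), B(N), C(N), S(N)
      INTEGER I
C     Set up some input data.
      DO 10 I = 1, N
         B(I) = REAL(I)
         C(I) = 2.0 * REAL(I)
   10 CONTINUE
C     Independent iterations: iteration I reads only B(I) and C(I)
C     and writes only A(I), so the iterations could run in any order.
      DO 20 I = 1, N
         A(I) = B(I) + C(I)
   20 CONTINUE
C     Dependent iterations: iteration I reads S(I-1), which is
C     written by iteration I-1, so the order of iterations matters.
      S(1) = B(1)
      DO 30 I = 2, N
         S(I) = S(I-1) + B(I)
   30 CONTINUE
      PRINT *, A(N), S(N)
      END
```

Loop 20 is the kind of loop the text describes as safe to run in parallel; loop 30 carries a value from one iteration to the next, which is the kind of dependence that would have to be removed or restructured first.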
If each iteration of the loop is self-contained, the system can execute the iterations in any order (or even simultaneously on separate processors) and produce the same result after running all iterations. When PFA finds a loop with data independence, PFA knows it can safely run the loop in parallel. When PFA finds a loop that contains iterations that are dependent on other iterations, it cannot safely run the loop in parallel but can tell you what is causing the problem. If PFA cannot run the loop in parallel, the listing file will explain where PFA encountered problems.

Command Line Options

To customize the way PFA executes an entire program, you can specify various command line options when you run PFA directly or when you specify PFA as part of a compile (Chapter 2, “How to Use PFA,” explains both procedures). The five functional categories of command line options are

• parallel execution
• general optimization
• Fortran 77 language control
• directive control
• listing

Chapter 4, “Customizing PFA Execution,” explains when and how to use the various options, and Appendix A, “PFA Command Line Options,” provides a complete summary.

Directives

PFA directives enable, disable, or modify a feature of PFA. Essentially, directives are command line options specified within the input file instead of on the command line. Unlike command line options, directives have no default setting. To invoke a directive, you must either toggle the directive on or set a desired value for its level. PFA directives allow you to specify PFA options in addition to, or instead of, command line options.

Directives placed on the first line of the input file are called global directives. PFA interprets them as if they appear at the top of each program unit in the file. Use global directives to ensure that the program is compiled with the correct command line options. Directives appearing anywhere else in the file apply only until the end of the current program unit.
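As a rough sketch of the scoping rule just described, the file below places one directive on its first line (global) and another inside a single subroutine (local). The directive spelling follows the C*$*UNROLL(n) form listed in this guide; the program unit names and the values 8 and 2 are assumptions made only for illustration, and the exact effect of each value is not shown here. Because directives look like comments, the file is also an ordinary Fortran 77 program.

```fortran
C*$*UNROLL(8)
C     The directive above is on the first line of the file, so it
C     is a global directive: PFA treats it as if it appeared at
C     the top of every program unit below. Unit names and the
C     values 8 and 2 are illustrative assumptions.
      PROGRAM SCOPES
      INTEGER N
      PARAMETER (N = 4)
      REAL A(N)
      INTEGER I
      DO 10 I = 1, N
         A(I) = REAL(I)
   10 CONTINUE
      CALL MULTBY(A, N, 2.0)
      CALL ADDTO(A, N, 1.0)
      PRINT *, A
      END

      SUBROUTINE MULTBY(A, N, F)
C     No local directive here, so the global setting applies.
      INTEGER N, I
      REAL A(N), F
      DO 10 I = 1, N
         A(I) = A(I) * F
   10 CONTINUE
      END

      SUBROUTINE ADDTO(A, N, D)
C*$*UNROLL(2)
C     The directive above is local: it applies only until the END
C     of ADDTO.
      INTEGER N, I
      REAL A(N), D
      DO 10 I = 1, N
         A(I) = A(I) + D
   10 CONTINUE
      END
```

A compiler that does not know about PFA simply ignores both directive lines, which is why such a file stays portable.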
PFA resets the value of the directive to the global value at the start of the next program unit. (Set the global value using a command line option or a global directive.)

Some command line options act like global directives. Other command line options override directives. Many PFA directives have corresponding command line options. If you specify conflicting settings in the command line and a directive, PFA chooses the most restrictive setting. For Boolean options, if either the directive or the command line has the option turned off, it is considered off. For options that require a numeric value, PFA uses the minimum of the command line setting and the directive setting.

Table 1-1 lists the directives supported by PFA. In addition to the standard directives, PFA supports the Cray™ and VAST™ directives listed in the table. PFA maps these directives to corresponding PFA assertions. Refer to Chapter 5, “Fine-Tuning PFA,” for details.

Table 1-1  PFA Directives

Standard                   Cray                 VAST
C*$*ARCLIMIT(n)            CDIR$ NEXT SCALAR    CVD$ CONCUR
C*$*CONCURRENTIZE          CDIR$ IVDEP          CVD$ LSTVAL
C*$*INLINE                                      CVD$ NOLSTVAL
C*$*IPA
C*$*LIMIT(n)
C*$*MINCONCURRENT(n)
C*$*NOCONCURRENTIZE
C*$*NOINLINE
C*$*NOIPA
C*$*OPTIMIZE(n)
C*$*ROUNDOFF(n)
C*$*SCALAR OPTIMIZE(n)
C*$*UNROLL(n)
C*$*UNROLL(n,m)
C$DOACROSS
C$&

Refer to Appendix B, “PFA Directives,” for a list and description of PFA directives.

Assertions

Assertions provide PFA with additional information about the source program. Sometimes assertions can improve optimization results. Use them only when speed is essential. Because PFA does not check the correctness of assertions, they can be unsafe. If you specify an incorrect assertion, the PFA-generated code might give different answers from the scalar program. If you suspect unsafe assertions are causing problems, use the -NODIRECTIVES command line option or the C*$* NO ASSERTIONS directive to tell PFA to ignore all assertions.
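To make the risk concrete, here is a hedged sketch of an assertion that would be incorrect for the loop it precedes. It assumes that C*$* ASSERT DO (CONCURRENT) asks PFA to disregard the apparent dependence on the next loop, as its placement under “Ignoring Data Dependencies” in Chapter 5 suggests; the program name, array names, and data are hypothetical.

```fortran
      PROGRAM UNSAFE
C     Hypothetical demonstration; all names and data are made up.
      INTEGER N, M
      PARAMETER (N = 6, M = 3)
      REAL A(M), B(N)
      INTEGER IDX(N), I
C     IDX contains repeated values, so two different iterations of
C     loop 30 update the same element of A.
      DATA IDX /1, 2, 3, 1, 2, 3/
      DO 10 I = 1, M
         A(I) = 0.0
   10 CONTINUE
      DO 20 I = 1, N
         B(I) = REAL(I)
   20 CONTINUE
C     The assertion on the next line would be INCORRECT for this
C     data: iterations 1 and 4 both read and write A(1), so the
C     iterations are not independent.
C*$* ASSERT DO (CONCURRENT)
      DO 30 I = 1, N
         A(IDX(I)) = A(IDX(I)) + B(I)
   30 CONTINUE
      PRINT *, A
      END
```

Run serially, the assertion line is only a comment and the loop accumulates correctly. If the loop were run in parallel on the strength of the assertion, two processes could update the same element at once, which is the kind of different answer the text warns about. The assertion would be legitimate only if every value in IDX were distinct.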
As with a directive, PFA treats an assertion as a global assertion if it comes before all comments and statements in the file. That is, PFA treats the assertion as if it were repeated at the top of each program unit in the file.

C*$* ASSERT RELATION (name .xx. name) assertions include variable names. If you specify them as global assertions, a program uses them only when those variable names appear in COMMON blocks or are dummy argument names to the subprogram. You cannot use global assertions to make relational assertions about variables that are local to a subprogram.

Many assertions, like directives, are active until the end of the program unit (or file) or until you reset them. Other assertions are valid only for the DO loop before which they appear (such as C*$* ASSERT DO PREFER (CONCURRENT)). This type of assertion applies to the next DO loop but not to any loop nested inside it.

Table 1-2 lists PFA assertions and their duration.

Table 1-2  PFA Assertions and Their Duration

Assertion                                Duration
C*$* ASSERT DO (SERIAL)                  Next Loop
C*$* ASSERT DO (CONCURRENT)              Next Loop
C*$* ASSERT DO PREFER (SERIAL)           Next Loop
C*$* ASSERT DO PREFER (CONCURRENT)       Next Loop
C*$* ASSERT [NO] EQUIVALENCE HAZARD      Until Reset
C*$* ASSERT [NO] ARGUMENT ALIASING       Until Reset
C*$* ASSERT RELATION (name .xx. name)    Next Loop
C*$* ASSERT CONCURRENT CALL              Next Loop
C*$* ASSERT NO RECURRENCE                Next Loop
C*$* ASSERT PERMUTATION (name)           Next Loop

Summary

PFA provides information about the dependencies of loops in a Fortran 77 program. Often, PFA can use the information to run loops in parallel automatically. But when PFA is not able to convert the code for parallel execution automatically, it can tell you where it ran into problems. Often, you need only make a small change to remove the dependencies that prevent the loop from running in parallel. The better you understand the information PFA gives you, the better equipped you will be to transform the program into an efficient parallel version.
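As one illustration of the kind of small change meant here, the sketch below rewrites a loop whose index is carried in a running counter so that each iteration computes its own index. The program and variable names are hypothetical, and whether PFA would already handle the original form automatically is not something this sketch assumes either way; it only shows that the two forms compute the same result.

```fortran
      PROGRAM FIXDEP
C     Hypothetical demonstration; all names here are illustrative.
      INTEGER N
      PARAMETER (N = 5)
      REAL A1(2*N), A2(2*N), B(N)
      INTEGER I, K
      LOGICAL SAME
      DO 10 I = 1, 2*N
         A1(I) = 0.0
         A2(I) = 0.0
   10 CONTINUE
      DO 20 I = 1, N
         B(I) = REAL(I)
   20 CONTINUE
C     Original form: K is updated in every iteration, so each
C     iteration depends on the value left by the previous one.
      K = 0
      DO 30 I = 1, N
         K = K + 2
         A1(K) = B(I)
   30 CONTINUE
C     Rewritten form: the index is computed from I alone, so no
C     value is carried from one iteration to the next.
      DO 40 I = 1, N
         A2(2*I) = B(I)
   40 CONTINUE
C     Check that both forms produce identical arrays.
      SAME = .TRUE.
      DO 50 I = 1, 2*N
         IF (A1(I) .NE. A2(I)) SAME = .FALSE.
   50 CONTINUE
      PRINT *, SAME
      END
```

The rewritten loop 40 has iterations that touch disjoint elements and share no running state, which is the property the dependence analysis described in this chapter looks for.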
For more information about parallel processing in general, see Chapter 5 in the Fortran 77 Programmer’s Guide. Especially recommended are the sections “Analyzing Data Dependencies for Multiprocessing” and “Breaking Data Dependencies” for information about recognizing and repairing data dependency problems.

2. How to Use PFA

This chapter contains the following sections:

• “Overview” describes how to prepare for using PFA.
• “Compiling Programs With PFA” explains how to run PFA as part of a Fortran compile.
• “Using PFA Directly” explains how to run PFA independent of the Fortran driver.

Overview

Simply running a program through PFA might buy you some improved performance, but you can get far more if you understand the PFA listing. From the listing, you can often identify small problems that prevent a loop from running safely in parallel. With a relatively small amount of work, you can remove these data dependencies and dramatically improve the program’s performance.

When trying to find loops to run in parallel, focus your efforts on the areas of the code that use the bulk of the run time. Spending time trying to run in parallel a routine that uses only 1 percent of the run time of the program cannot significantly improve the performance of your program. To determine where your code spends its time, take an execution profile of the program. Use either pc-sample profiling (through the -p option to f77(1)) or basic block profiling (through pixie(1)). Refer to Chapter 2, “Improving Program Performance,” of the IRIS-4D Series Compiler Guide for details about profiling.

There are two schools of thought about profiling: conservative and optimistic. The conservative approach takes a profile of the original (nonparallel) job. You then run in parallel only the loops that account for most of the run time. The more optimistic approach runs the entire program through PFA and then profiles the resulting multiprocessed job.
The conservative approach reduces the chances that something might go wrong because it makes fewer changes to the code. It also focuses on the smallest number of lines of code that have the greatest effect.

Use the optimistic approach when you think that PFA will do a good job with the existing program. You will save time by letting PFA do what it can. You can then focus on those routines where PFA had a problem. One situation in which PFA frequently does a good job is when you convert programs that already run well on traditional vector architectures. Many such programs run in parallel without additional effort.

Whichever approach you choose, use the profile to focus your efforts on the most time-consuming routines. Once you find a time-consuming routine, submit that routine alone to PFA. If the routine is in the middle of a large file, consider using fsplit(1) to isolate the individual routine. Compile the routine with the -pfa keep option, and examine the listing file. The PFA listing identifies the loops that PFA can and cannot run in parallel. For loops that cannot run in parallel, the PFA listing also tells you why it could not convert the loop for parallel execution.

Compiling Programs With PFA

The following is the command line syntax for compiling a Fortran 77 program with PFA and command line options. You can pass these options to PFA by adding the -WK option to the f77 command line. The f77 command invokes the various processing phases that compile, optimize, assemble, and link edit the program. For more information about the -WK option, see the f77(1) manual page.

Syntax

    f77 -pfa [{list|keep}] [-WK,-option[=value][,-option[=value]]...]
        [-pfaprepass,-option[=value][,-option[=value]]...] filename.f

where

-pfa
    Invokes the POWER Fortran Accelerator, pfa. Enables any multiprocessing directives.
list
    Runs pfa and generates an annotated listing of the parts of the program that can (and cannot) run in parallel on multiple processors.
    The listing file has the suffix .l.
keep
    Runs pfa, generates the listing file (.l), and saves the intermediate transformed Fortran 77 program. The intermediate file has the suffix .m.
-WK
    Passes the specified command line options to PFA. Do not enter spaces between -WK and any of the hyphens, options, equal signs, and values that follow it.
-option
    Specifies a PFA command line option listed in Table 2-1, for example, -IGNOREOPTIONS.
value
    Specifies a value for a command line option, for example, 10.
-pfaprepass
    Passes the code through PFA an extra time. The first time through (the prepass), PFA uses the options specified in the -pfaprepass option but does not insert C$ DOACROSS directives. The output of this operation is then passed back through PFA, using the options specified in the -WK option. Only rarely should you need to use this option, and there is good reason to avoid it. Normally, PFA does all it can in a single run-through. In rare circumstances an extra pass can be beneficial. However, the PFA algorithms do not necessarily converge, and multiple passes over the code can change it for the worse. The syntax of this option is the same as the -WK option.
filename.f
    Specifies the Fortran 77 source program. The filename must always use the .f suffix.

Table 2-1 lists the PFA command line options. Although the table lists the options in uppercase, you can specify them in lowercase as well.

Note: You can replace many of the PFA command line options listed in Table 2-1 with in-code directives.
For information on these directives, see Chapter 5, “Fine-Tuning PFA,” and Appendix B, “PFA Directives.”

Table 2-1  PFA Command Line Options

Reference                     Long Name                     Short Name    Default Value
Parallelization               [NO]CONCURRENTIZE             [N]CONC       CONCURRENTIZE
                              MINCONCURRENT=n               MC=n          MINCONCURRENT=500
Optimization                  ARCLIMIT=n                    ARCLM=n       ARCLIMIT=5000
                              LIMIT=n                       LM=n          LIMIT=20000
                              OPTIMIZE=n                    O=n           OPTIMIZE=5
                              ROUNDOFF=n                    R=n           ROUNDOFF=0
                              SCALAROPT=n                   SO=n          SCALAROPT=3
                              UNROLL=n                      UR=n          UNROLL=4
                              UNROLL2=n                     UR2=n         UNROLL2=100
Fortran 77 Language Control   ASSUME=list                   AS=list       ASSUME=EL
                              [NO]DLINES                    [N]DL         NODLINES
                              [NO]ONETRIP                   [N]1          NOONETRIP
                              SAVE=c                        SV=c          SAVE=A
                              SCAN=n                        SCAN=n        SCAN=72
                              SYNTAX=c                      SY=c          (option off)
Inlining and                  INLINE[=list]                 IN            (option off)
Interprocedural Analysis      IPA[=names]                   IPA           (option off)
                              INLINE_CREATE=name            INCR=name     (option off)
                              IPA_CREATE=name               IPACR=name    (option off)
                              INLINE_FROM_FILES=list        INFF=list     (option off)
                              IPA_FROM_FILES=list           IPAFF=list    (option off)
                              INLINE_FROM_LIBRARIES=list    INFL=list     (option off)
                              IPA_FROM_LIBRARIES=list       IPAFL=list    (option off)
                              INLINE_LOOP_LEVEL=n           INLL=n        INLL=10
                              IPA_LOOP_LEVEL=n              IPALL=n       IPALL=10
                              INLINE_MAN                    INM           (option off)
                              IPA_MAN                       IPAM          (option off)
                              INLINE_DEPTH                  IND           IND=10
Directives                    [NO]DIRECTIVES=list           [N]DR=list    DIRECTIVES=AKSV
I/O                           INPUT=file.f                  file.f        file.f
                              [NO]FORTRAN=file              [N]F=file     F=file.m
                              [NO]LIST=file                 [N]L=file     L=file.l
Listing                       LINES=n                       LN=n          LINES=55
                              LISTOPTIONS=list              LO=list       LISTOPTIONS=OL
                              SUPPRESS=list                 SU=list       (option off)
Obsolete                      CREATE                        CR            (option off)
                              LIBRARY=file                  LIB=file      (option off)
                              [NO]EXPAND=list               EX=list       (option off)
                              LIMIT2=n                      LM2=n         LM2=5000

Example

To compile the Fortran 77 program prog.f with PFA and the -UNROLL=8 option, enter

    % f77 -pfa -WK,-UNROLL=8 prog.f

Figure 2-1 shows what happens when you compile a Fortran 77 program with PFA.
The first pass invokes the macro preprocessor cpp to handle cpp directives. (For more information, see the cpp(1) manual page.) PFA then takes the cpp output and inserts code that runs data-independent loops in parallel. PFA can also generate a listing file (with the .l suffix) and an intermediate file (with the .m suffix). For details, refer to Chapter 3, “Utilizing PFA Output.” Finally, the Fortran 77 compiler, f77, compiles the transformed PFA-generated file to produce an object file.

Figure 2-1  Compiling With PFA (flow diagram: Fortran 77 source (.f), then cpp, then PFA, which produces the listing file (.l) and the intermediate file (.m), then f77, then the object file (.o))

Using PFA Directly

Although you normally run PFA as part of an f77 compile, there are two instances when you should run PFA directly:

• When creating an inlining or IPA library (refer to Chapter 4, “Customizing PFA Execution”)
• When you want to “capture” the output of PFA and review it to determine further optimizations

Running the pfa(1) command directly, using the following syntax, produces both the .m and the .l files.

Syntax

    /usr/lib/pfa [-option [-option]...] filename.f

where

-option
    Specifies a PFA command line option listed in Table 2-1, for example, -INLINE.
filename.f
    Specifies the Fortran 77 source program. The filename must have the .f suffix.

Example

The following command runs PFA directly using the -UNROLL and -ROUNDOFF options, given in their short lowercase forms:

    % /usr/lib/pfa -ur=4 -r=2 sample.f

3. Utilizing PFA Output

This chapter contains the following sections:

• “Overview” discusses the PFA output files and provides examples of them.
• “Formatting the Listing File” explains how to change the format of the standard listing file.
• “Interpreting Default Listing Information” explains the contents of the listing file.
• “Sample Listing Files” provides sample listing files along with an interpretation of each.

Overview

PFA generates two files, a listing file (.l) and an intermediate file (.m).
Invoking PFA as part of a Fortran compilation produces a line-numbered listing file when you use the -pfa list option. If you specify the -pfa keep option, PFA produces both the numbered listing file and the intermediate file. PFA automatically produces both files when you invoke it directly. (For details about invoking PFA, refer to Chapter 2, “How to Use PFA.”)

For example, consider the following program, sample.f:

          subroutine sample (a,b,c)
          dimension a(1000),b(1000),c(1000)
          do 10 i = 1, 1000
       10 a(i) = b(i) + c(i)
          end

Compiling sample.f as follows

    % f77 -pfa keep sample.f

generates the following listing file, sample.l:

    Actions   Do Loops   Line
    DIR                     1   # 1 "sample.f"
                            2         subroutine sample(a,b,c)
                            3         dimension a(1000),b(1000),c(1000)
    C         +--------     4         do 10 i = 1,1000
              *_______      5   10    a(i) = b(i) + c(i)
                            6         end

    Abbreviations Used
      DIR    directive
      C      concurrentized

    Loop Summary
      Loop#   From line   To line   Loop label   Loop index   Status
      1       4           5         DO 10        I            concurrentized

and the intermediate file, sample.m:

    # 1 "sample.f"
    # 1 "sample.f"
          subroutine sample(a,b,c)
          DIMENSION A(1000), B(1000), C(1000)
    # 3 "sample.f"
    C$DOACROSS SHARE(A,B,C),LOCAL(I)
    # 3 "sample.f"
          DO 2 I=1,1000
    # 4 "sample.f"
            A(I) = B(I) + C(I)
    # 4 "sample.f"
        2 CONTINUE
          end

PFA placed a C before the first statement of the DO loop in the listing file, sample.l. The Abbreviations Used table shows that C stands for “concurrentized,” which means that PFA determined that it can safely run the loop in parallel. The Loop Summary table at the bottom of sample.l shows that the status of the loop is concurrentized.

PFA inserted the statement starting with C$DOACROSS before the DO statement in the intermediate file, sample.m. The Fortran 77 compiler directive C$DOACROSS tells f77 that the next DO loop can run in parallel. The phrase SHARE(A,B,C) informs the Fortran 77 compiler that all processes that execute the DO loop share the arrays A, B, and C.
The phrase LOCAL(I) indicates that every process executing the DO loop keeps a local variable I.

The lines of the form # 4 "sample.f" are called line number directives. They relate the transformed source back to the original source.

Note: The first line number directive appears in the listing because it was actually added by cpp before PFA ran.

Formatting the Listing File

You can customize a PFA listing file by

• paginating the listing
• selecting the information to be printed
• disabling specific message classes

Paginating the Listing

The -LINES=n option (or -LN=n) paginates the listing for printing. Use this to change the number of lines per page. Specifying -LINES=0 paginates at subroutine boundaries. If you do not specify the -LINES option, PFA prints 55 lines per page.

Specifying Information to Include

The -LISTOPTIONS=list option (or -LO=list) specifies the information to include in the listing file (.l), where list is any combination of the options in Table 3-1.

Table 3-1  Listing File Include Options

Value   Produces
C       Calling tree at the end of the program listing.
I       Transformed program file annotated with line numbers in the source program. Error messages and debugging information can refer to the original source rather than the transformed source. Running PFA as part of an f77 compile automatically adds this option.
K       Printout of the PFA options used, at the end of each program unit.
L       Loop-by-loop optimization table.
N       Program unit names, as processed, to the standard error file. This option is added automatically as part of an f77 -v compilation.
O       Annotated listing of the original program.
P       Processing performance statistics.
S       Summary of optimization performed.
T       Annotated listing of the transformed program.

Disabling Message Classes

Use the -SUPPRESS=list option (or -SU=list) to disable individual classes of PFA messages that are normally included in the listing (.l) file.
These messages range from syntax warnings and error messages to messages about the optimizations performed. list is any combination of the options in Table 3-2.

Table 3-2   Listing File Message Disabling Options

    Value   Message Class Disabled
    D       Data dependence
    E       Syntax error
    I       Information
    N       Unable to run loop in parallel
    Q       Questions
    S       Standard messages
    W       Warning of syntax error (PFA adds the -SUPPRESS=W option
            automatically if you use the -w option to f77)

If you do not specify this option, PFA prints messages of all classes.

Interpreting Default Listing Information

Knowing when and where to modify your code means understanding the information in the PFA listing. This understanding allows you to recognize where small changes to the source code will make a big difference in how much code is run in parallel. The PFA-generated listing file lists the optimizations PFA made to the code. For example, a message could say that, although three loops could have run in parallel, PFA converted only the one it determined most profitable.

This section explains how to view the listing file online and then lists and describes the various fields.

Viewing the Listing File

The listing file is in 132-column format. To view the file, open a window with 132 columns and 40 rows by entering

    % wsh -s132,40

Field Descriptions

This section explains the contents of the .l file when you use the default values for the -LISTOPTIONS command line option (that is, O and L). A default PFA file listing includes

• line numbers
• DO loop markings
• footnotes
• syntax errors/warning messages
• action summary

Line Numbers

A statement in the PFA listing labeled with a line number, such as 21, is the same as line 21 from the original program or has been derived from that line. These line numbers are useful when inspecting the PFA-transformed program listing and when debugging.
PFA sometimes generates several lines of code from a single line of the original program; in this case, each new line of code is labeled with the same number as the line of the original program from which it was generated. Consequently, many lines of the PFA-transformed program listing carry the same number because they are related to one line of the original program listing.

DO Loop Marking

The listing file displays DO loops graphically in a column headed DO Loops. PFA surrounds each DO loop (up to nest level 10) with a loop delimiter character. Each character listed in Table 3-3 has a specific meaning.

Table 3-3   Listing File DO Loop Delimiters

    Character   Denotes
    |           Generic DO loop
    *           PFA can run loop in parallel
    !           Syntax error

A statement contained within n DO loops has n of these loop delimiters on that line. For example,

    DO Loops   Line
    +------     173        DO 100 M=2,MAX(MFLD,2)
    |           174          IADR = ISECT(M)
    |           175          IADR1= ISECT(M-1)
    |           176          PNM(IADR)=(ANM(IADR) *PNM(IADR1))
    |_______    177  100     PPNM(IADR)= -(ANM(IADR) *PNM(IADR1))

Footnotes

PFA uses the footnotes listing to give important details concerning its actions. PFA numbers and prints the footnotes at the bottom of each program unit under the Footnote List heading. References to the footnotes are displayed in the listing under the Footnotes column. For example, the footnote referenced on this listing line

    13   DD   1790        IF (B(I) .LE. 6) IB(J*I) = I+J

appears under Footnote List at the end of the program unit as

    13: data dependence   Data dependence involving this line due to variable IB.

In this example, 13 is the footnote number, DD (data dependence) is the explanation for PFA’s action, and 1790 is the line number of the IF statement in the original source.

Syntax Errors/Warning Messages

When a program has syntax errors, the listing file describes the error next to the lines that start with the symbol ### in the Footnotes column.
These messages are also printed to stderr, which will usually be your terminal. For example,

    Footnotes  Actions  DO Loops   Line
                                     1        SUBROUTINE Z(A,B,N)
                                     2        REAL A(N), B(N)
                        +------      3        DO 20 I=1,N
                        !            4          X=A(I)
                        !            5          Y=B(I)
                        !______      6  20      C(I)=X+Y
    ### line (6)
    ### error   Array not declared or statement function declared after executable statements.
    ### error   A do loop ends on a non-executable statement.
                                     7        PRINT *,X
                                     8        END

Action Summary

When PFA translates or modifies a statement, it uses abbreviations in the Actions column of the listing file to identify the statements. PFA lists an abbreviated explanation of its actions at the bottom of the listing. For the DIR and V classes, the class itself serves as the message and no detailed messages follow. All other classes have associated messages.

Table 3-4 lists and explains the values that can appear in the Actions column.

Table 3-4   PFA Action Abbreviations

    Value   Meaning
    DD      (Data Dependence) Indicates that data dependence prevented PFA
            from running this statement in parallel.
    DIR     (Directive) Used in conjunction with the footnotes and concerns
            compiler directives. If you code a compiler directive and that
            line does not have the DIR abbreviation in the listing, PFA
            will not recognize the directive. Check the setting of the
            -DIRECTIVES command line option and the syntax of the
            directive.
    E       (Error) Indicates syntax errors. These messages can refer to
            missing or extra characters, illegal keywords, or text placed
            in the wrong column. PFA cannot do anything with such code. The
            intermediate (.m) file contains a copy of this program unit
            that PFA has not modified.
    EX      (Extension) Shows where a construct in the original program is
            not allowed in the language PFA produces. In some cases, an
            operation or type is allowed in the input language but not in
            the output language.
    INF     (Information) Provides noncritical information.
    I       (Insertion) Indicates that PFA added a statement.
    LR      (Loop Reordering) Indicates that PFA has modified a Fortran 77
            statement in the process of interchanging loops. If during
            optimization PFA ascertains that an outer loop would be more
            efficient as an inner loop, and it can legally reorder the
            loops, PFA places the outer loop inside. In the process of this
            reordering, PFA might have to change loop bounds (for
            triangular loops), distribute loops, or float IF assignments.
            Only the statements modified for the exchange are marked.
    MIS     (Miscellaneous) Indicates that some PFA information has been
            lost. This message does not always mean that something is wrong
            with the program.
    NX      (Nonconcurrent Statement) Indicates that PFA did not try or was
            unable to run the statement in parallel. For example, when a
            subroutine call is involved in a loop, PFA generates this
            message.
    NO      (Program Too Large, Not Optimized) Indicates that the program
            unit being processed is too large for PFA to optimize, because
            of PFA’s data structure size limitations. When PFA optimizes
            programs, it adds statements that might also overflow the
            fixed-size tables. In either case, PFA stops optimization and
            passes the original program to the intermediate (.m) file,
            informing you of this action. For PFA to process the unit, you
            must split the program into smaller sections.
    OE      (Option Error) Indicates a syntax error in a PFA option. This
            error does not stop processing of a program unit.
    OTF     (Output Translation Failure) Marks statements that have
            constructs that exist in the input language but that cannot be
            represented in the output language.
    Q       (Question) Indicates that PFA tried to optimize a loop nest but
            discovered a data dependence it could not break at compile time
            without further information. You can usually answer this
            question with an appropriate assertion.
    SO      (Scalar Optimization) Marks places in the transformed listing
            where PFA has optimized a scalar loop.
    STD     (Standardized) Marks where PFA changed a program to improve the
            chance of finding code that it can optimize. This is often a
            conversion from an IF/GOTO into a block IF, loop rerolling, and
            conversion of an IF loop to a DO loop.
    TE      (Translator Error) Indicates an internal PFA error. PFA writes
            the notification to the standard error file and writes a trace
            back to the output file. Notify SGI if you see this sort of bug
            (so it can be corrected) and, if possible, send SGI the code
            that caused the trace back as well as the trace back itself. If
            you can reproduce the error in a small program unit, send that
            small program unit as well.
    W       (Warning) Contains syntax warnings.

Sample Listing Files

This section contains a few simple examples of Fortran code and the corresponding PFA output. An actual source program would be much larger, and a single loop could contain several of the cases illustrated here. However, even in a large loop, you can deal with each problem individually.

Indirect Indexing

PFA cannot determine if it can run a loop in parallel when the code uses indirect indexing. A loop is indirectly indexed when it uses the value from some auxiliary array as the index value rather than the DO loop variable. The Fortran 77 code

      subroutine foo2(w,b,index,n)
      real w(n), b(n)
      integer index(n)
      do i = 1, n
        w(index(i)) = w(index(i)) + b(i)
      enddo
      end

when submitted to PFA, results in the listing file

    Footnotes  Actions  DO Loops   Line
                                    10
                                    11
                                    12        subroutine foo2(w,b,index,n)
                                    13        real w(n), b(n)
                                    14        integer index(n)
                                    15
    1          Q        +------     16        do i = 1, n
    2          DD       !           17          w(index(i)) = w(index(i)) + b(i)
                        !_______    18        enddo
                                    19        end

    Abbreviations Used
      DD   data dependence
      Q    question

    Footnote List
      1: question          Is INDEX a permutation vector?
      2: data dependence   Data dependence involving this line due to variable W.
    DO Loop Summary
      loop#   from   to   DO label   index   workload   status
      1       16     18   DO         I                  dependencies prevent parallelism

DD in the Actions column on line 17 of the listing warns that the variable w might carry a dependency. A dependency exists when one iteration of the loop writes to a location that is used by a different iteration of the loop. In this example, if the values of index(i) are ever the same for different values of i, then different iterations might use the same location in w. Therefore, this code contains a possible data dependence. If you can guarantee that the values of index(i) are always different for each value of i, then there is no dependence (each iteration uses a different location in w).

Footnote 1 in the Footnote List asks if index(i) is different for every value of i. A permutation vector is a list of numbers, each of which is different from the others. If you know that index is a permutation vector, then the loop is data-independent. An example of a permutation vector is a list of objects in which each object appears exactly once.

Explicitly state that index is a permutation vector by adding an assertion in the source

      subroutine foo2(a,b,index,n)
      real a(n), b(n)
      integer index(n)
c*$*assert permutation (index)
      do i = 1, n
        a(index(i)) = a(index(i)) + b(i)
      enddo
      end

Now the listing file shows that PFA finds the loop safe to run in parallel (indicated by the * DO loop delimiter)

    Actions  DO Loops   Line
    DIR                   1  # 1 "foo2.f"
                          2        subroutine foo2(a,b,index,n)
                          3        real a(n), b(n)
                          4        integer index(n)
                          5
    DIR                   6  c*$*assert permutation (index)
    C        +-----       7        do i= 1, n
             *            8          a(index(i)) = a(index(i)) + b(i)
             *______      9        enddo
                         10        end

    Abbreviations Used
      DIR   directive
      C     concurrentized

    Loop Summary
               From   To     Loop     Loop
      Loop#    line   line   label    index   Status
      1        7      9      Do       I       concurrentized

Note: As with all assertions, PFA does not verify the truth of this assertion. When you make an assertion, be certain that the assertion is always true for all possible input data.
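
Because PFA takes the permutation assertion on trust, it can help to check the index array yourself during testing. The following is a minimal sketch of such a check, not part of PFA; the names chkprm, isperm, and idx are hypothetical, and the test also assumes that the index values must fall in the range 1 to n so that they are valid subscripts.

```fortran
      program chkprm
!     Hypothetical test driver: verify an index array before relying
!     on a "c*$*assert permutation" assertion for it.
      integer n
      parameter (n = 5)
      integer idx(n)
      logical isperm
      data idx /3, 1, 5, 2, 4/
      if (isperm(idx, n)) then
         print *, 'idx is a permutation vector'
      else
         print *, 'idx has repeated or out-of-range values'
      endif
      end

      logical function isperm(idx, n)
!     True only if every element of idx lies in 1..n (an assumption
!     of this sketch) and no two elements are equal.
      integer n, idx(n), i, j
      isperm = .true.
      do i = 1, n
         if (idx(i) .lt. 1 .or. idx(i) .gt. n) isperm = .false.
         do j = i + 1, n
            if (idx(i) .eq. idx(j)) isperm = .false.
         enddo
      enddo
      end
```

The pairwise comparison costs on the order of n*n operations, which is acceptable for a debugging check but is not something to leave in production code for large arrays.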
Function Call

This example shows what happens when a loop contains a call to an external routine. The Fortran 77 code

      subroutine foo3 (a,b,c,n)
      real a(n), b(n), c(n)
      external force
      do i = 1, n
        a(i) = force (b(i), c(i))
      enddo
      end

generates the listing

    Actions   DO Loops   Line
    DIR                    1  # 1 "foo3.f"
                           2        subroutine foo3(a,b,c,n)
                           3        real a(n), b(n), c(n)
                           4        external force
                           5
    NCS NO    +-----       6        do i = 1, n
    NCS       !            7          a(i) = force(b(i), c(i))
              !______      8        enddo
                           9        end

    Abbreviations Used
      NO    not optimized
      DIR   directive
      NCS   non-concurrent-stmt

    Footnote List
      1: not optimized   No optimizable statements found.
      2: not optimized   Unoptimizable call to "FORCE" found.

    Loop Summary
               From   To     Loop     Loop
      Loop#    line   line   label    index   Status
      1        6      8      Do       I       unoptimizable call (FORCE)

Calling the function force prevents PFA from automatically running the loop in parallel. PFA identifies the function call as a non-concurrent-stmt. By its nature, a nonconcurrent statement prevents PFA from assuming the loop is safe to run in parallel because PFA cannot see into the routine to look for data dependencies. If you know that force generates no data dependencies, then explicitly state this fact for the nonconcurrent statement

      subroutine foo3(a,b,c,n)
      real a(n), b(n), c(n)
      external force
c*$*assert concurrent call
      do i = 1, n
        a(i) = force(b(i), c(i))
      enddo
      end

Now that PFA knows that the nonconcurrent statement involves no data dependency, PFA will find the loop safe to run in parallel.

There is one subtlety in using the concurrent call assertion. When you use this assertion, PFA makes no attempt to examine the called routine; it simply assumes that it is safe. However, PFA is still left with the problem of correctly declaring the variables in the loop to be either SHARE or LOCAL. (PFA does the best it can, but it can sometimes be fooled.)
For example,

      subroutine tricky (a,b,c,n,m)
      real a(*), b(*)
      external my_function
c*$*assert concurrent call
      do i = 1, n
        a(i) = my_function (b(i), m)
        b(i) = a(i) + m
      enddo
      m = 0
      end

The question is whether the variable m should be SHARE or LOCAL. If the routine my_function only reads the old value of m, then it should be SHARE. If my_function writes a new value of m, then it should be LOCAL. In the absence of any more clues, PFA must go by what it can see; and what it can see is that within the loop, there are no visible assignments to m, and so PFA will declare it to be SHARE. If in fact my_function is writing the value of m, then this is incorrect. In this case, to give PFA the hint it needs, add a visible assignment to m at the top of the loop. For example, consider the following code:

      do i = 1, n
        m = 0
        a(i) = my_function(b(i), m)
        b(i) = a(i) + m
      enddo

Here, PFA can see an assignment to m and so will declare it to be LOCAL. Note that if my_function both reads the old value and writes a new value of m, then it is not legal to run the loop in parallel.

Reductions

This example shows how PFA produces a single value from a set of values. Because the entire set of values is reduced to a single value, these operations are called reductions. Consider the Fortran 77 code

      subroutine foo4(a,b,n,sum)
      real a(n), b(n), sum
      sum = 0.0
      do i = 1, n
        sum = sum + a(i)*b(i)
      enddo
      end

Using the previous code as input, PFA produces the listing file

    Footnotes  Actions  DO Loops   Line
               DIR                   1  # 1 "foo4.f"
                                     2        subroutine foo4(a,b,n,sum)
                                     3        real a (n), b(n), sum
                                     4
                                     5        sum = 0.0
                        +----        6        do i = 1, n
    1          DD       !            7          sum = sum + a(i)*b(i)
                        !_____       8        enddo
                                     9        end

    Abbreviations Used
      DD    data dependence
      DIR   directive

    Footnote List
      1: data dependence   Data dependence involving this line due to variable "SUM".

    Loop Summary
      Loop# 1   From line 6   To line 8
      Loop label Do   Loop index I   Status scalar mode preferable

Because different iterations of the loop read and write the same location (the variable sum), there is a dependence. However, this is a special case. Because sum just accumulates a total, you can accumulate subtotals in parallel and then combine the subtotals at the end.

Because the parallel version of the code adds the elements together in a different order than the single-process version, the round-off errors accumulate differently for the two versions of the code. Thus, the answer can differ slightly as you vary the number of processes used to run the code. In fact, if you use the dynamic scheduling option for the code, the answer might vary slightly from one run of the program to the next, even if you use the same number of processes on the same machine. Most applications can safely ignore this variation in round-off error.

If you do not care about this round-off error, you can tell PFA to use parallel subtotals. To tell PFA not to worry about round-off error, you can use either the C*$*ROUNDOFF=2 directive or the f77/pfa command line option -WK,-roundoff=2. The resulting listing file is

    Actions  DO Loops   Line
    DIR                   1  # 1 "foo4.f"
                          2        subroutine foo4(a,b,n,sum)
                          3        real a(n), b(n), sum
                          4
                          5        sum = 0.0
    C        +-----       6        do i = 1, n
             *            7          sum = sum + a(i)*b(i)
             *______      8        enddo
                          9        end

    Abbreviations Used
      DIR   directive
      C     concurrentized

    Loop Summary
               From   To     Loop     Loop
      Loop#    line   line   label    index   Status
      1        6      8      Do       I       concurrentized

Be aware that the round-off error produced by the parallel reduction operation is not necessarily any worse than the round-off error already present in the original serial version. It will simply be different. If your application did not worry about the round-off error in the original, there is no reason to suppose that it should worry about it in the parallel version.
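
The subtotal idea can be illustrated with a small serial program. This is a sketch for explanation only and is not the code that PFA generates; the program name subtot, the block count nproc, and the array part are hypothetical. Each block of the outer loop over p stands in for the work one process would do.

```fortran
      program subtot
!     Sketch: form a dot product from nproc independent subtotals and
!     compare it with the ordinary serial sum.
      integer n, nproc
      parameter (n = 1000, nproc = 4)
      real a(n), b(n), part(nproc), total, serial
      integer i, p, lo, hi
      do i = 1, n
         a(i) = 1.0 / real(i)
         b(i) = real(i)
      enddo
!     Serial reference: elements are added in index order.
      serial = 0.0
      do i = 1, n
         serial = serial + a(i)*b(i)
      enddo
!     Subtotals: block p touches only part(p) and its own slice of
!     a and b, so the blocks do not depend on one another.
      do p = 1, nproc
         lo = (p - 1)*n/nproc + 1
         hi = p*n/nproc
         part(p) = 0.0
         do i = lo, hi
            part(p) = part(p) + a(i)*b(i)
         enddo
      enddo
!     Combine the subtotals at the end.
      total = 0.0
      do p = 1, nproc
         total = total + part(p)
      enddo
      print *, 'serial sum   =', serial
      print *, 'subtotal sum =', total
      end
```

The two printed values may differ in their last digits because the additions are grouped differently, which is the round-off variation described above.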
If, on the other hand, your application takes special steps to reduce round off (for example, adding the numbers together in order from smallest absolute value to largest), then you should not use parallel reductions.

The previous example is called a sum reduction because the reduction operator is +. Table 3-5 shows the types of reductions PFA supports.

Table 3-5   Reduction Types

    Type      Operator   Example
    Sum       +          sum = sum + expression
    Product   *          p = p * expression
    Min       min( )     a = min(a, expression)
    Max       max( )     x = max(x, expression)

All these reductions are under the control of the -ROUNDOFF command line option, even though technically the min and max reductions do not involve round-off problems.

Chapter 4: Customizing PFA Execution

This chapter contains the following sections:

• “Overview” explains when to optimize PFA execution.
• “Controlling Code Execution” describes how to control whether PFA runs eligible loops in parallel.
• “Controlling PFA Code Transformations” describes how to control the various transformations performed by PFA.
• “Performing Inlining and Interprocedural Analysis” describes inlining and interprocedural analysis and explains how and when to perform these procedures.
• “Controlling Fortran Language Elements” explains how to control standard Fortran elements with command line options to PFA.
• “Controlling Directives and Assertions” explains how to override PFA directives and assertions with command line options.
• “Controlling PFA I/O” explains how to customize the names of PFA input and output files.
• “Obsolete Syntax” lists obsolete PFA command line options.

Overview

To customize how PFA executes an entire program, you can specify various command line options when you run PFA directly or when you specify PFA as part of a compile. Chapter 2, “How to Use PFA,” explains both procedures.
For a complete summary of the PFA command line options, refer to Appendix A, “PFA Command Line Options.”

Controlling Code Execution

When modifying most programs to allow loops to run in parallel, modify the code so that PFA can automatically run the loop in parallel. Avoid forcing the loop to run in parallel by directly inserting a C$DOACROSS directive. If you force code to run in parallel, you (and not PFA) need to verify that no subsequent modification inserts data dependencies. Forcing code that contains data dependencies to run in parallel can produce serious (and difficult-to-find) errors. Rewriting the loop so that PFA recognizes the loop as safe to run in parallel allows PFA to check future modifications for potential data dependencies.

This section describes how to control whether eligible loops are run in parallel and how to specify a work threshold for loops.

Running Code in Parallel

The -CONCURRENTIZE option (or -C) converts eligible loops to run in parallel. This is the default value for this option. The -NOCONCURRENTIZE option (or -NCONC) prevents PFA from converting loops to run in parallel.

Specifying a Work Threshold

The -MINCONCURRENT=n option (or -MC=n) specifies the minimum amount of work needed inside the loop to make executing a loop in parallel profitable. The integer n is a count of the number of operations (for example, add, multiply, load, store) in the loop, multiplied by the number of times the loop will be executed. If the loop does not contain at least this much work, the loop will not be run in parallel. If the loop bounds are not constants, an IF clause will be automatically added to the PFA-generated C$DOACROSS directive to test at run time if sufficient work exists. If you do not specify this option, PFA runs all loops containing 500 or more operations in parallel.
For example, given the original loop

      do 2 i = 1,n
        x(i) = y(i) * z(i)
    2 continue

PFA generates the following transformed loop:

C$DOACROSS IF (N .GT. 100), SHARE (N,X,Y,Z), LOCAL(I)
      DO 3 I=1,N
        x(i) = y(i)*z(i)
    3 CONTINUE

The IF clause ensures that n is large enough to make running the loop in parallel profitable (otherwise, PFA will run the loop serially). If the loop bound is a small constant (such as 10) instead of n, PFA does not generate a DOACROSS statement for the loop, and the listing file states that the loop does not contain enough work. Conversely, if the bound is a large constant (such as 1000), then PFA generates the DOACROSS statement without the IF clause.

Controlling PFA Code Transformations

This section discusses the various ways in which you can control the standard transformations that PFA performs.

Controlling Size/Complexity Thresholds

You can control the thresholds for internal table size and routine complexity in order to analyze larger and more complex routines.

Controlling Internal Table Size

The -ARCLIMIT=n option (or -ARCLM=n) controls the size of the internal table used to store data dependence information (arcs). If this table overflows, PFA stops analyzing the loop and the PFA listing file shows the message

    too many stmts/dd arcs

Increasing ARCLIMIT might allow PFA to analyze the loop but at the cost of additional processing time.

Specifying a Complexity Limit

The -LIMIT=n option (or -LM=n) controls the amount of time PFA can spend trying to determine whether a loop is safe to run in parallel. PFA estimates how much time is required to analyze each loop nest construct. If an outer loop looks like it would take too much time to analyze, PFA ignores the outer loop and recursively visits the inner loops. Larger limits often allow PFA to generate parallel code for deeply nested loop structures that it might not otherwise be able to run safely in parallel.
However, with larger limits PFA can also take more time to analyze a program. (The limit does not correspond to the DO loop nest level. It is an estimate of the number of loop orderings that PFA can generate from a loop nest.)

This option has the same effect as the global C*$* LIMIT(n) directive.

Note: You do not usually need to change these limits.

Setting the Optimization Level

The -OPTIMIZE=n option (or -O=n) sets the optimization level. The higher you set the optimization level, the more code is optimized and the longer PFA runs. Programs that are written for running in parallel often do not need advanced transformation. With these programs, a lower optimization level is enough. Valid values for n are

    0   Avoids converting loops to run in parallel.
    1   Converts loops to run in parallel without using advanced data
        dependence tests. Enables loop interchanging.
    2   Determines when scalars need last-value assignment using lifetime
        analysis. Also uses more powerful data dependence tests to find
        loops that can run safely in parallel. This level allows
        reductions in loops that execute concurrently but only if the
        -ROUNDOFF option is set to 2. (Refer to the following section for
        details about the -ROUNDOFF option.)
    3   Breaks data dependence cycles using special techniques and
        additional loop interchanging methods, such as interchanging
        triangular loops. This level also implements special-case data
        dependence tests.
    4   Generates two versions of a loop, if necessary, to break a
        data-dependent arc. This level also implements more-exact data
        dependence tests and allows special index sets (called wraparound
        variables) to convert more code to run in parallel.
    5   Fuses two adjacent loops if it is legal to do so (that is, there
        are no data dependencies) and if the loops have the same control
        values. In certain limited cases, this level recognizes arrays as
        local variables. This level is the default.
This option has the same effect as the global C*$* OPTIMIZE(n) directive described in Chapter 5, “Fine-Tuning PFA.”

Note: If you want to use the -UNROLL command line option, set the -OPTIMIZE option to 4 or higher (the default optimization level is above this threshold).

Controlling Variations in Round Off

The -ROUNDOFF=n option (or -R=n) controls the amount of variation in round off that PFA will allow. Valid values for n are the integers

    0–1   Suppresses any round-off transformations. This is the default.
    2     Allows reductions to be performed in parallel. The valid
          reduction operators are addition, multiplication, min, and max.
          This value is one of the most commonly specified user options.
    3     Recognizes REAL induction variables. Permits memory management
          transformations (refer to “Memory Management Transformations”
          on page 44).

When executing reductions in parallel, PFA processes values in a different order from the original serial code. Round-off errors accumulate differently and produce a slightly different answer. Some algorithms are sensitive to this variation, and so, by default, PFA does not run reductions in parallel. Usually, these tiny variations are irrelevant, and you can allow PFA to process a reduction in parallel, allowing more loops to be run in parallel.

Controlling the Number of Scalar Optimizations

The -SCALAROPT=n option (or -SO=n) controls the amount of standard scalar optimizations attempted by PFA. Valid values for n are the integers

    0   Performs no scalar transformations.
    1   Enables dead code elimination, pulling loop invariants, forward
        substitution, and conversion of IF-GOTO into IF-THEN-ELSE.
    2   Enables induction variable recognition, loop unrolling, loop
        fusion, array expansion, scalar promotion, and floating invariant
        IF tests. (Loop fusion also requires -OPTIMIZE=5.)
    3   Enables the memory management transformations (refer to “Memory
        Management Transformations” on page 44). (Memory management also
        requires -ROUNDOFF=3.) This is the default value.

Enabling Loop Unrolling

The -UNROLL=n option (or -UR=n) unrolls scalar inner loops when PFA cannot run the loops in parallel. n specifies the number of times to replicate the loop body. The default is 4. Specify a small power of two for the unroll value, such as two, four, or eight. Disable unrolling by setting -UNROLL=1.

The -UNROLL2=m option (or -UR2=m) allows you to adjust the number of operations used by the -UNROLL option. Selecting a larger value for -UNROLL2 allows PFA to unroll loops containing more calculations.

This form of unrolling applies only to the innermost loops in a nest of loops. You can unroll loops whether they execute serially or concurrently.

PFA counts the number of array references and arithmetic operations in the loop. It unrolls the loop until it reaches either the number of operations specified by the -UNROLL2 option or the number of iterations specified by -UNROLL.

When PFA unrolls a loop, it replicates the body of the loop a certain number of times, making the loop run faster. However, unrolling loops also increases the program size.

For example, if the original program is

      do i = 1,100
        a(i) = b(i) + c(i)*d(i)
      enddo

the unrolled program (unrolling of order 4) is

      do i = 1,100,4
        a(i) = b(i) + c(i)*d(i)
        a(i+1) = b(i+1) + c(i+1)*d(i+1)
        a(i+2) = b(i+2) + c(i+2)*d(i+2)
        a(i+3) = b(i+3) + c(i+3)*d(i+3)
      enddo

The second (unrolled) version runs faster than the original version. The reason for the improvement is that SGI processors have separate add and multiply hardware, allowing addition and multiplication operations to run simultaneously. In the original program, the processor has to do the multiplication, wait for it to complete, then do the addition.
In the second case, the processor can do the first multiplication, wait for it to complete, then overlap the second multiplication and the first addition, then the third multiplication and the second addition, and so on. The additions require almost no additional time because all but the last one are completed within the time it takes the (previous) multiplication to complete. If the loop already contains many computations (for example, many lines of code, many additions and multiplications), then unrolling it might help a little but not much.

Memory Management Transformations

When -ROUNDOFF and -SCALAROPT are both set to 3, PFA attempts to do outer loop unrolling (to improve register utilization) and automatic loop blocking (also called tiling) to improve cache utilization.

Outer loop unrolling is a standard hand-optimization technique. Note that the -UNROLL and -UNROLL2 options apply to inner-loop unrolling. Outer-loop unrolling can occur even if inner-loop unrolling is disabled.

Loop blocking is a complex transformation that is applicable when the loop nesting depth is greater than the dimensions of the data arrays being manipulated. The canonical example is the simple matrix multiply, where a three-deep nest of loops operates on two-dimensional arrays. The simple method repeatedly sweeps over the entire array. If the array is too large to fit into the cache, this can result in a large amount of memory traffic. A better method is to break the arrays up into blocks, where each block is small enough to fit into the cache, and then sweep over each block in turn (rather than over the whole array). The code to do this is often ugly and complicated. PFA attempts to ease the burden of writing block-style algorithms by automatically generating the block version from the simple version.
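
To make the idea of blocking concrete, here is a small hand-blocked matrix multiply next to the simple version. It is a sketch written for illustration, not output from PFA; the program name blkmm, the block size nb, and the test data are assumptions. The blocked nest reuses one nb-by-nb tile of b for every column j before moving on to the next tile.

```fortran
      program blkmm
!     Sketch: simple versus hand-blocked matrix multiply.
      integer n, nb
      parameter (n = 64, nb = 16)
      real a(n,n), b(n,n), c(n,n), r(n,n), err
      integer i, j, k, ii, kk
      do j = 1, n
         do i = 1, n
            a(i,j) = 0.0
            r(i,j) = 0.0
            b(i,j) = real(i + j)
            c(i,j) = real(i - j)
         enddo
      enddo
!     Simple version: every pass over k sweeps all of b.
      do k = 1, n
         do j = 1, n
            do i = 1, n
               r(i,j) = r(i,j) + b(i,k)*c(k,j)
            enddo
         enddo
      enddo
!     Blocked version: rows ii..ii+nb-1 and columns kk..kk+nb-1 of b
!     form a tile that is reused for all n values of j.
      do kk = 1, n, nb
         do ii = 1, n, nb
            do j = 1, n
               do k = kk, min(kk + nb - 1, n)
                  do i = ii, min(ii + nb - 1, n)
                     a(i,j) = a(i,j) + b(i,k)*c(k,j)
                  enddo
               enddo
            enddo
         enddo
      enddo
!     Compare the two results element by element.
      err = 0.0
      do j = 1, n
         do i = 1, n
            err = max(err, abs(a(i,j) - r(i,j)))
         enddo
      enddo
      print *, 'max difference =', err
      end
```

A suitable block size depends on the cache of the target machine; the value 16 here is only a placeholder for the sketch.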
Note, however, that blocking does not help the more common case where the algorithm touches each array element exactly once (for example, a two-dimensional array inside of a two-deep loop nest). Because in this case the data is not being reused, blocking does not apply.

For example, given the loop nest

      do k = 1,n
        do j = 1,n
          do i = 1,n
            a(i,j) = a(i,j) + b(i,k)*c(k,j)
          enddo
        enddo
      enddo

using the option -r=3, PFA produces the listing below:

      II3 = 1
      II1 = MOD (N - 1, 682) + 1
      II2 = II1
      II10 = N - 7
      II11 = (II10 + 7) / 8
      DO 4 II4=1, N, 682
       II8 = II3 + II2 - 1
       DO 2 K=1, II10, 8
C$DOACROSS SHARE(N,K,C,II3,II8,A,B),LOCAL(DD1,DD2,
C$&   DD3,DD4,DD5,DD6,DD7,DD8,DD9,J,I)
        DO 2 J=1,N
         DD2 = C(K,J)
         DD3 = C(K+1,J)
         DD4 = C(K+2,J)
         DD5 = C(K+3,J)
         DD6 = C(K+4,J)
         DD7 = C(K+5,J)
         DD8 = C(K+6,J)
         DD9 = C(K+7,J)
         DO 2 I=II3, II8, 1
          DD1 = A(I,J)
          DD1 = DD1 + B(I,K) * DD2
          DD1 = DD1 + B(I, K+1) * DD3
          DD1 = DD1 + B(I, K+2) * DD4
          DD1 = DD1 + B(I, K+3) * DD5
          DD1 = DD1 + B(I, K+4) * DD6
          DD1 = DD1 + B(I, K+5) * DD7
          DD1 = DD1 + B(I, K+6) * DD8
          DD1 = DD1 + B(I, K+7) * DD9
          A(I,J) = DD1
    2  CONTINUE
       II7 = II11 * 8 + 1
       II9 = II3 + II2 - 1
       DO 3 K=II7, N, 1
C$DOACROSS SHARE(N,K,C,II3,II9,A,B),LOCAL(DD10,J,I)
        DO 3 J=1,N
         DD10 = C(K,J)
         DO 3 I=II3,II9,1
          A(I,J) = A(I,J) + B(I,K) * DD10
    3  CONTINUE
       II3 = II3 + II2
       II2 = 682
    4 CONTINUE

Obviously, PFA’s version is more complicated than the original, but it runs significantly faster.

Performing Inlining and Interprocedural Analysis

Function and subroutine calls create an obstacle to parallelization. PFA provides three ways of dealing with this obstacle:

• Assert that the external routine is safe for concurrent execution (see “C*$* ASSERT CONCURRENT CALL” on page 64).
• Inline the routine by replacing the call to the external routine with the actual code.
• Perform interprocedural analysis (IPA) by analyzing the external routine ahead of time and using the results of that analysis when a reference to the routine is encountered.

Inlining and IPA tend to be slow, memory-intensive operations. Attempting to inline all routines everywhere they occur can take a lot of time and use a lot of system resources. Inlining should usually be restricted to a few time-critical places.

This section discusses the three steps for inlining or IPA:

1. Specify which routines will be inlined (or interprocedurally analyzed).
2. Specify which source files and libraries will be searched to find the routines.
3. Specify which occurrences of those routines are to be inlined (or analyzed).

Specifying Routines for Inlining or IPA

PFA supports the -INLINE=list option (or -IN=list) that specifies the routines to be inlined and the -IPA=list option for IPA. list is a colon-separated list of routines to be inlined. For example,

      -INLINE=jump:more

If you do not specify list, PFA will attempt to inline all eligible routines.

Specifying Where to Search for Routines

The options listed in Table 4-1 tell PFA where to search for the routines specified with the -INLINE or -IPA option. If you do not specify either option, PFA searches the current source file by default.

Table 4-1    Inlining and IPA Search Command Line Options

Long Option Name                Short Option Name    Default Value
-INLINE_FROM_FILES=list         -INFF=list           Current source file
-IPA_FROM_FILES=list            -IPAFF=list          Current source file
-INLINE_FROM_LIBRARIES=list     -INFL=list           None
-IPA_FROM_LIBRARIES=list        -IPAFL=list          None

If one of the names in list is a directory, then all appropriate files in that directory will be used. PFA assumes files with the extension .f are Fortran source and files with the extension .klib are PFA-produced libraries.
Specify multiple files and directories with the same option by using a colon-separated list. For example,

      -INLINE_FROM_FILES=file1:file2:file3

Note: These options by themselves do not initiate inlining or IPA. They only specify where to look for the routines. Use them in conjunction with the appropriate -INLINE or -IPA option.

Creating a Library

When performing inlining and IPA, PFA analyzes the routines in the source program. Normally, inlining is done directly from a source file. However, when inlining the same set of routines in many different programs, it is more efficient to create a preanalyzed library of the routines.

Use the -INLINE_CREATE=name option (or -INCR=name) to create a library of prepared routines (for later use with the -INLINE_FROM_LIBRARIES option). PFA assigns a name to the library file it creates; for maximum compatibility, use the filename extension .klib: for example, samp.klib.

The -IPA_CREATE=name option (or -IPACR=name) is the analogous option for IPA.

The library used to do IPA does not have to be generated from the same source that will be linked into the running program. Using this capability can cause errors, but it can also be useful. For example, you could write a library of hand-optimized assembly language routines, then construct a PFA-compatible IPA library using Fortran routines that mimic the behavior of the assembly code. Thus, you can do parallelism analysis with IPA correctly but still call the hand-optimized assembly routines.

Use the following procedure to create and use a PFA library:

1. Create a library by passing the source program directly through pfa. Library creation is done by PFA and should not be done at the same time as an ordinary compilation. For example, the following command line creates a library called samp.klib for the source program samp.f:

      % /usr/lib/pfa -INLINE_CREATE=samp.klib samp.f

2. Compile the program with pfa:

      % f77 -pfa keep -WK,-INFL=samp.klib samp.f

Note: Libraries created for inlining contain complete information and can be used for inlining or IPA. Libraries created for IPA contain only summary information and can be used only for IPA.

Specifying Occurrences

The loop level, depth, and manual options allow you to control which occurrences of the routines specified with the -INLINE or -IPA option are actually dealt with when the -INLINE or -IPA options are used.

Loop Level

The -INLINE_LOOPLEVEL=n (or -INLL=n) and -IPA_LOOPLEVEL=n (or -IPALL=n) options allow you to limit PFA to work only on occurrences within deeply nested loops. Thus, a value of 1 restricts PFA to deal with routines only at the single most deeply nested level; a value of 2 restricts PFA to the deepest and second-deepest levels; and so on.

To determine what is most deeply nested, PFA constructs a call graph to account for nesting due to loops that occur farther up the call chain. If you do not specify either option, the loop level is 10.

Depth

The -INLINE_DEPTH=n (or -IND=n) option restricts the number of times PFA will continue to attempt inlining on already inlined routines. For example, suppose you use PFA to inline the routine foo. However, foo itself contains a call to bar. Should PFA now attempt a second inlining depth and inline bar? And if bar calls baz, should PFA inline three deep?

This option provides control over this process, as routines are only inlined to the specified depth. As a special case, if you specify the value -1, only routines that do not reference other routines are inlined (that is, only leaf routines are inlined). Note that the extension to -2, -3, and so on is not supported, only -1. Note also that there is no -IPA_DEPTH option.

Manual

The -INLINE_MAN option turns on recognition of the C*$*INLINE directive.
This directive (described in Chapter 5, “Fine-Tuning PFA”) allows you to select individual occurrences of routines to be inlined. -IPA_MAN is the analogous option for the C*$*IPA directive (also described in Chapter 5, “Fine-Tuning PFA”).

Conditions That Prevent Inlining or IPA

Several conditions make a routine ineligible for inline expansion or IPA:

• Dummy arguments do not match the actual arguments in number, type, shape, or size.
• The calling program and called routine have conflicting declarations for the same COMMON block.
• The calling program and the called routine have conflicting EQUIVALENCE statements.
• The routine to be inlined has a SAVE, ENTRY, or NAMELIST statement.
• The routine to be inlined has a variable initialized by a DATA statement.
• The routine to be inlined is too long (the limit is about 600 lines).

Controlling Fortran Language Elements

This section explains how to control various Fortran 77 language elements.

Global Assumptions

The -ASSUME=list option (or -AS=list) controls certain global assumptions of a program. list consists of any combination of the following values:

E    Allows equivalence variables to refer to the same memory location inside one loop. For more information, see Chapter 5, “Fine-Tuning PFA.”

L    Instructs PFA to use a temporary variable within the optimized loop and assign the last value to the original scalar if PFA determines that scalar can be reused before it is assigned. This value is important when a scalar is assigned in a loop run in parallel. For more information, see Chapter 5, “Fine-Tuning PFA.”

P    Allows for parameter aliasing in a subprogram. For more information, see Chapter 5, “Fine-Tuning PFA.”

By default, PFA assumes that a program conforms to the ANSI (and VMS) standard; therefore, the default is -ASSUME=EL.

Debugging Lines

The -DLINES option tells PFA to treat the letter D in column one as if it were a space character.
PFA then parses the rest of that line as a normal Fortran 77 statement. The -NODLINES option tells PFA to treat these lines as though they were comments. These options are useful for excluding or including debugging lines. f77 passes this option to PFA automatically when you specify the f77 -d_lines option.

DO Loop Execution

The -ONETRIP option (or -1) provides compatibility with older versions of Fortran where a DO loop is always executed at least once.

The -NOONETRIP (or -N1) option conforms to the Fortran 77 standard. This option, which is the default, does not execute a DO loop whose termination condition is initially satisfied. f77 passes the -ONETRIP option to PFA automatically when you specify the f77 -one_trip option.

Variable Saving Across Invocations

The -SAVE=c option (or -SV=c) specifies whether a procedure’s variables are saved across invocations. c is one of the following values:

A    Performs a lifetime analysis on a procedure’s variables to determine those that need to have their value saved across invocations of the procedure. When it finds such a variable, PFA generates a SAVE statement for the variable.

M    Does not generate SAVE statements. This is the default value.

Significant Columns

The -SCAN=n option controls the number of columns that PFA assumes to be significant. PFA ignores anything beyond the specified column number. The default value for n is 72. Specifying any of the following f77 options automatically sets this option: -col72, -col120, or -extend_source.

Fortran Standard

Setting the -SYNTAX=c option (or -SY=c) alters the interpretation of the Fortran input to be in compliance with other standards. c is one of the following values:

A    Interprets the source in strict compliance with the ANSI Fortran 77 standard.

V    Interprets the source in compliance with the VMS Fortran standard but without the additional SGI extensions.
If you do not specify this option, PFA uses the same rules as the standard SGI Fortran compiler (refer to the Fortran 77 Programmer’s Guide for details).

Controlling Directives and Assertions

This section discusses the options you can use to select whether PFA accepts a specific directive or assertion. You can use these options to override directives and assertions that are specified in the source program.

Selecting Directives and Assertions

The -DIRECTIVES=list option specifies the directives and assertions to accept. The -NODIRECTIVES option tells PFA to ignore all directives and assertions. This option is useful when you suspect unsafe directives are causing problems with program execution.

Note: Some directives are called assertions because they assert program characteristics that PFA cannot verify. (For example, an assertion could assert that subroutine x contains no data dependencies.) Even though PFA cannot verify an assertion, you might want PFA to use it when optimizing. Refer to Chapter 1, “Overview of PFA,” for more information about directives and assertions.

Valid values for list are any combination of the following values:

A    Accepts assertions.

C    Accepts Cray CDIR$ directives; CDIR$IVDEP ignores certain data dependencies in a loop. But because of differences between SGI hardware and a Cray machine, these data dependencies are not always safe to ignore on SGI hardware. To be safe, PFA does not recognize the CDIR$IVDEP directive by default. You can, at your own risk, turn on Cray-directive recognition, which will cause PFA to treat this Cray directive as if it were a C*$*ASSERT DO (CONCURRENT) assertion.

K    Accepts C*$* directives.

S    Accepts C$ directives. PFA recognizes the directives C$DOACROSS, C$, and C$&. (For more information, see the Fortran 77 Programmer’s Guide.) If a C$DOACROSS directive appears, PFA does not examine or alter the loop to which the directive applies.
This allows you to mix code that you converted to parallel execution with code that PFA converted to parallel execution.

V    Accepts VAST CVD$ directives.

For example, specifying -DIRECTIVES=K enables PFA directives only, whereas -DIRECTIVES=CK enables both Cray and PFA directives. Adding A to the DIRECTIVES sequence also enables PFA assertions. Any combination of options is acceptable.

If you do not specify either option, PFA will accept all assertions, PFA C*$* directives, all C$ directives, and VAST CVD$ directives.

Controlling PFA I/O

This section describes command line options you can use to name PFA input and output. You do not need to use these options unless you want to change the default names. In particular, some versions of the make(1) utility assume that files ending in .l are lex(1) input files. To perform automatic makes without overwriting the PFA listing file, use a different suffix for the listing filename.

Use the -INPUT=file.f option to specify the name of the Fortran source program PFA input file. If you do not specify this option, PFA assumes that a command line argument not preceded by a dash is the input filename.

The -FORTRAN=file option specifies the name of the PFA intermediate file (that is, the transformed source). If you do not specify this filename, PFA names the intermediate file file.m, where file is the name of the input file. For details about the intermediate file, refer to Chapter 3, “Utilizing PFA Output.”

The -LIST=file option specifies the name of the PFA listing file. If you do not specify this filename, PFA names the listing file file.l, where file is the name of the input file. For details about the listing file, refer to Chapter 3, “Utilizing PFA Output.”

Obsolete Syntax

Table 4-2 lists obsolete PFA command line options.
Table 4-2    Obsolete Options

Long Option Name    Short Option Name    Default Value
-EXPAND             -X, -EX              off
-CREATE             -CR                  off
-LIBRARY            -LIB                 off
-LIMIT2             -LM2                 5000

PFA now accepts new syntax for some of the command line options (particularly the syntax for inlining). For compatibility with the older versions, these options are translated into their newer equivalents, as shown in Table 4-3. Whenever possible do not use the older syntax; support for it might be withdrawn in the future.

Table 4-3    Obsolete Options and Their Equivalents

Old Version               New Version
-EXPAND=A                 -INLINE
-EXPAND=M                 -INLINE_MAN
-LIBRARY=name             -INLINE_FROM_LIBRARIES=name
-CREATE -LIBRARY=name     -INLINE_CREATE=name
-LIMIT2=n                 -ARCLIMIT=n

5. Fine-Tuning PFA

This chapter contains the following sections:

• “Overview” explains how to fine-tune program execution using directives and assertions.
• “Fine-Tuning Inlining and IPA” describes how to use directives to use inlining and IPA more specifically than with command line options.
• “Circumventing PFA” explains how to use directives to bypass PFA’s analysis and leave areas of code unchanged.
• “Running Code Serially” explains how to use directives and assertions to stop PFA from running specific code in parallel.
• “Running Code in Parallel” explains how to use directives and assertions to tell PFA that it is safe to run specific parts of code in parallel.
• “Ignoring Data Dependencies” explains how to tell PFA that apparently data-dependent code is safe to run in parallel.
• “Using Equivalenced Variables” explains how to assert that your code uses or does not use equivalenced variables.
• “Using Aliasing” describes the assertions used with aliasing.

Overview

After you run a Fortran source program through PFA once, you can use directives and assertions to fine-tune program execution. The listing file will show where and why PFA did not parallelize the code.
You can use directives and assertions to force PFA to execute portions of code in various ways. Command line options apply to the program as a whole. If you want finer control for parallelizing a critical loop or inlining a particular occurrence of a routine, specify directives and assertions directly in the code.

You can also use directives and assertions to keep PFA from converting code to run in parallel. In other cases you might want to explicitly force PFA to run segments of code in parallel even though it normally would not.

Fine-Tuning Inlining and IPA

Chapter 4, “Customizing PFA Execution,” explains how to use inlining and IPA on an entire program (refer to “Performing Inlining and Interprocedural Analysis” on page 46). You can fine-tune inlining and IPA using the C*$*[NO]INLINE and C*$*[NO]IPA directives.

The C*$*[NO]INLINE directive behaves much the same as the -INLINE command line option, but with the directive you can specify which occurrences of a routine are actually inlined. The format for this directive is

      C*$*[NO]INLINE [(name[,name ... ])] {HERE|ROUTINE|GLOBAL}

where

name       Specifies the routines to be inlined. If you do not specify a name, this directive will affect all routines in the program.

HERE       Applies the INLINE directive only to the next line; occurrences of the named routines on that next line are inlined.

ROUTINE    Inlines the named routines everywhere they appear in the current routine.

GLOBAL     Inlines the named routines throughout the source file.

The C*$*NOINLINE form overrides the -INLINE command line option and so allows you to selectively disable inlining of the named routines at specific points.

Example

In the following code fragment, the C*$*INLINE directive inlines the first call to beta but not the second.

      do i = 1,n
C*$*INLINE (beta) HERE
         call beta (i,1)
      enddo
      call beta (n, 2)

Using the specifier ROUTINE rather than HERE inlines both calls.
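As a conceptual illustration of what inlining a call inside a loop means, the following hand-written sketch places a loop that calls a routine next to a loop in which the routine body has been substituted by hand. It is not PFA output. The body of BETA, its third argument, and the array names are hypothetical, since the original fragment does not show them.

```fortran
      PROGRAM INLDEM
C     Hypothetical example: the body of BETA and the arrays X
C     and Y are assumptions made only for illustration.
      INTEGER N
      PARAMETER (N = 8)
      REAL X(N), Y(N)
      INTEGER I

C     Original form: the call hides the loop body from analysis.
      DO 10 I = 1, N
         CALL BETA(I, 1, X)
   10 CONTINUE

C     Hand-inlined form: the body of BETA replaces the call, with
C     the dummy arguments replaced by the actual arguments.
      DO 20 I = 1, N
         Y(I) = REAL(I) * REAL(1)
   20 CONTINUE

C     Both loops compute the same values.
      PRINT *, (X(I) - Y(I), I = 1, N)
      END

      SUBROUTINE BETA(I, K, X)
      INTEGER I, K
      REAL X(*)
      X(I) = REAL(I) * REAL(K)
      RETURN
      END
```

Once the body is visible in the loop, it is plain that each iteration writes a different element, which is the kind of information a call otherwise conceals.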
This routine must be compiled with the -INLINE_MAN command line option for the C*$*INLINE directive to be recognized.

The C*$*[NO]IPA directive is the analogous directive for interprocedural analysis. The format for this directive is

      C*$*[NO]IPA [(name [,name...])] {HERE|ROUTINE|GLOBAL}

Circumventing PFA

Sometimes you might need to hand-tune a DO loop so that it will run in parallel. Use the directives in this section to prevent PFA from analyzing your modified code.

C$ DOACROSS

The C$ DOACROSS directive tells the Fortran 77 compiler to generate parallel code for the following loop. When PFA encounters this directive on input, it does not modify the accompanying loop and therefore does not interfere with any hand-tuning.

C$ DOACROSS is the standard method for parallelism in Fortran. This directive is the same directive that PFA generates as a result of its analysis. Refer to the Fortran 77 Programmer’s Guide for more information about the C$ DOACROSS directive and its optional clauses.

PFA runs the following code as it appears:

C$ DOACROSS
      DO 10 I=1, 100
         A(I) = B(I)
   10 CONTINUE

C$&

The C$& directive continues the C$ DOACROSS directive onto multiple lines, for example,

C$DOACROSS SHARE(ALPHA, BETA, GAMMA, DELTA,
C$& EPSILON, OMEGA), LASTLOCAL (I, J, K, L, M, N),
C$& LOCAL(XXX1, XXX2, XXX3, XXX4, XXX5, XXX6, XXX7,
C$& XXX8, XXX9)

Running Code Serially

Use the following assertions and directives to keep PFA from running specific code in parallel.

C*$* ASSERT DO (SERIAL)

The C*$* ASSERT DO (SERIAL) assertion tells PFA to run the specified loop serially. PFA does not try to convert the specified loop to run in parallel. It also does not try to run any enclosing loop in parallel. However, PFA can still convert any loops nested inside the serial loop to run in parallel.

CDIR$ NEXT SCALAR

Silicon Graphics PFA supports the corresponding Cray directive, CDIR$ NEXT SCALAR.
PFA interprets this directive as if it were a C*$* ASSERT DO (SERIAL) assertion and generates scalar code for the next DO loop.

C*$* ASSERT DO PREFER (SERIAL)

The C*$* ASSERT DO PREFER (SERIAL) assertion indicates that you want to execute a DO loop in serial mode. This assertion directs PFA to leave the DO loop alone, regardless of the setting of the optimization level. You can use this assertion to control which loop (in a nest of loops) PFA chooses to run in parallel.

The following example program segment shows how to use the assertion:

      DO 100 I = 1, N
C*$*ASSERT DO PREFER (SERIAL)
      DO 100 J = 1, M
         A(I,J) = B(I,J)
  100 CONTINUE

In the DO loop above, the assertion requests that the J loop be serial. In this construction, PFA tries to run the I loop in parallel but not the J loop. This capability is useful when you know the value of M to be very small or less than N.

This assertion applies only to the DO loop that appears directly after the assertion.

Running Code in Parallel

This section explains the directives and assertions that allow PFA to determine that specific areas of code are safe to run in parallel.

C*$*[NO]CONCURRENTIZE

The C*$*[NO]CONCURRENTIZE directive converts eligible loops to run in parallel. The NO version prevents PFA from converting loops to run in parallel. These directives, when specified globally, have the same effect as the -CONCURRENTIZE and -NOCONCURRENTIZE options (see Chapter 2, “How to Use PFA”).

CVD$ CONCUR

PFA supports the VAST directive CVD$CONCUR. This directive runs a loop in parallel to optimize performance. PFA interprets this directive as if it were the C*$*CONCURRENTIZE directive.

C*$* ASSERT DO PREFER (CONCURRENT)

The C*$* ASSERT DO PREFER (CONCURRENT) assertion directs PFA to run a particular nested loop in parallel if possible. PFA runs another of the nested loops in parallel only if a condition prevents running the selected loop in parallel.
Consider the following code:

C*$* ASSERT DO PREFER (CONCURRENT)
      DO 100 I = 1, N
      DO 100 J = 1, M
         A (I, J) = B (I, J)
  100 CONTINUE

This code directs PFA to prefer to run the I loop in parallel. However, if a data dependence conflict prevents running the I loop in parallel, PFA might run the J loop in parallel.

The C*$* ASSERT DO PREFER (CONCURRENT) assertion applies only to the DO loop immediately following it.

Ignoring Data Dependencies

PFA avoids running code in parallel that it believes to be data-dependent. Use the assertions described in the following sections to override this behavior.

C*$* ASSERT DO (CONCURRENT)

The C*$* ASSERT DO (CONCURRENT) assertion tells PFA to ignore assumed data dependencies. Normally, PFA is conservative about converting loops to run in parallel. When PFA analyzes a loop to see if it is safe to run in parallel, it categorizes the loop into one of three groups:

• yes (loop is safe to run in parallel)
• no
• not sure

Normally, PFA does not run “not sure” loops in parallel. It assumes there are data dependencies. C*$* ASSERT DO (CONCURRENT) tells PFA to go ahead and run “not sure” loops in parallel.

Note: If PFA identifies a loop as containing definite (as opposed to assumed) data dependencies, it does not run the loop in parallel even if you specify a C*$* ASSERT DO (CONCURRENT) assertion.

CDIR$ IVDEP

PFA interprets the Cray directive CDIR$ IVDEP as if it were a C*$* ASSERT DO (CONCURRENT) assertion. Some dependencies that are safe to ignore on Cray hardware are not safe to ignore on SGI hardware. Therefore, recognition of this directive is turned off by default.

C*$* ASSERT CONCURRENT CALL

The C*$* ASSERT CONCURRENT CALL assertion tells PFA to ignore assumed dependencies that are due to a subroutine call or a function reference. However, you must ensure that the subroutines and referenced functions are safe for parallel execution.
This assertion applies to all subroutine and function references in the accompanying loop, which must appear on the next line.

C*$* ASSERT NO RECURRENCE

The C*$* ASSERT NO RECURRENCE(variable) assertion tells PFA to ignore all data dependencies associated with variable. PFA ignores not just assumed dependencies (as with the C*$* ASSERT DO (CONCURRENT) assertion) but also real dependencies. Use this assertion to force PFA to parallelize a loop when other, gentler means have failed. Use this assertion with caution, as indiscriminate use can result in illegal parallel code.

C*$* ASSERT PERMUTATION

The C*$* ASSERT PERMUTATION(array) assertion tells PFA that array contains no repeated values. This assertion permits PFA to run in parallel certain kinds of loops that use indirect addressing, for example,

      DO I = 1, N
         A(INDEX(I)) = A(INDEX(I)) + B(I)
      ENDDO

You can run this loop in parallel only if the array INDEX has no repeated values (so that each INDEX(I) is unique). PFA cannot determine this, so it does not run such a loop in parallel. However, if you know that every element of INDEX() is unique, you can insert the following line before the loop to permit PFA to run the loop in parallel:

C*$* ASSERT PERMUTATION (INDEX)

Using Equivalenced Variables

The C*$* ASSERT NO EQUIVALENCE HAZARD assertion tells PFA that your code does not use equivalenced variables to refer to the same memory location inside one loop nest. Normally, EQUIVALENCE statements allow your code to use different variable names to refer to the same storage location.

The -ASSUME=E command line option acts like the global C*$* ASSERT EQUIVALENCE HAZARD assertion (see “Global Assumptions” on page 50 in Chapter 4). The C*$* ASSERT EQUIVALENCE HAZARD assertion is active until you reset it or until the end of the program unit.

Using Aliasing

PFA has several assertions for use with aliasing.
C*$* ASSERT [NO] ARGUMENT ALIASING

The C*$* ASSERT [NO] ARGUMENT ALIASING assertion allows PFA to make assumptions about subprogram arguments in a program. According to the Fortran 77 standard, you can alias a variable only if you do not modify (that is, write to) the aliased variable.

The following subroutine violates the standard, because variable A is aliased in the subroutine (through C and D) and variable X is aliased (through X and E):

      COMMON X,Y
      REAL A,B
      CALL SUB (A, A, X)
      ...
      SUBROUTINE SUB(C,D,E)
      COMMON X,Y
      X = ...
      C = ...
      ...

The command line option -ASSUME=P acts like a global C*$* ASSERT ARGUMENT ALIASING assertion (see Chapter 4, “Customizing PFA Execution”). A C*$* ASSERT ARGUMENT ALIASING assertion is active until it is reset or until the next routine begins.

C*$* ASSERT RELATION

The C*$* ASSERT RELATION(name.xx.name) assertion indicates the relationship between two variables or between a variable and a constant. name is the variable or constant, and xx is any of the following: GT, GE, EQ, NE, LT, or LE. This assertion applies only to the next DO statement.

Consider the following code:

      DO 100 I = 1, N
         A (I) = A (I+M) + B (I)
  100 CONTINUE

If you know that M is greater than N, use the following assertion to give this information to PFA:

C*$* ASSERT RELATION (M .GT. N)
      DO 100 I = 1, N
         A (I) = A (I+M) + B (I)
  100 CONTINUE

Knowing that M is greater than N, PFA can generate parallel code for this loop. If, at run time, M is less than N, the answers produced by the code run in parallel could differ significantly from the answers produced by the original code run serially.

Note: Many relationships of this type can be cheaply tested for at run time. PFA will attempt to answer questions of this sort by generating an IF statement that explicitly tests the relationship at run time.
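A run-time test of this kind can be sketched by hand as a two-version loop. The following is an illustrative program, not actual PFA output. The program name, the bound NMAX, and the values assigned to N and M are assumptions; the C$DOACROSS line uses only the SHARE and LOCAL clauses that appear elsewhere in this guide, and a compiler that does not recognize the directive treats it as a comment.

```fortran
      PROGRAM RELDEM
C     Hand-written sketch of a two-version loop guarded by a
C     run-time test. NMAX and the values of N and M are
C     illustrative assumptions.
      INTEGER NMAX
      PARAMETER (NMAX = 1000)
      REAL A(2*NMAX), B(NMAX)
      INTEGER I, M, N

      N = 100
      M = 150
      DO 5 I = 1, 2*NMAX
         A(I) = REAL(I)
    5 CONTINUE
      DO 6 I = 1, NMAX
         B(I) = 1.0
    6 CONTINUE

      IF (M .GT. N) THEN
C        Reads touch A(M+1..M+N) and writes touch A(1..N); these
C        ranges do not overlap, so the iterations are independent.
C$DOACROSS SHARE(A, B, M, N), LOCAL(I)
         DO 10 I = 1, N
            A(I) = A(I+M) + B(I)
   10    CONTINUE
      ELSE
C        The relationship does not hold: keep the loop serial.
         DO 20 I = 1, N
            A(I) = A(I+M) + B(I)
   20    CONTINUE
      ENDIF

      PRINT *, 'A(1) =', A(1)
      END
```

Asserting the relationship removes the need for the test and for the serial copy of the loop, at the cost of wrong answers if the assertion turns out to be false.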
Occasionally, PFA may need assistance, or you may want to squeeze that last ounce of performance out of some critical loop by asserting some relationship rather than repeatedly checking it at run time.

A. PFA Command Line Options

This appendix contains the following sections:

• “Overview”
• “Options Summary”
• “Obsolete Syntax”

This appendix lists and describes the options to PFA. The default settings are satisfactory for most programs. However, you can alter the defaults to customize output. PFA accepts several command line options. Table A-1 lists the default settings for each option.

Overview

Table A-1 summarizes the PFA command line options. The Reference column lists the functional categories of the following options:

• parallel execution
• general optimization
• Fortran 77 language control
• directive control
• listing

The next three columns list the long names, short names, and default values of the options. Following the table is an explanation of each option, including the option’s long and short names, its default, and, if applicable, the long and short names for the NO version of the option.

Note: You can replace many of the PFA command line options described in this appendix with in-code directives.
Table A-1    PFA Command Line Options

Reference              Long Name                      Short Name    Default Value

Parallelization        [NO]CONCURRENTIZE              [N]CONC       CONCURRENTIZE
                       MINCONCURRENT=n                MC=n          MINCONCURRENT=500

Optimization           ARCLIMIT=n                     ARCLM=n       ARCLIMIT=5000
                       LIMIT=n                        LM=n          LIMIT=20000
                       OPTIMIZE=n                     O=n           OPTIMIZE=5
                       ROUNDOFF=n                     R=n           ROUNDOFF=0
                       SCALAROPT=n                    SO=n          SCALAROPT=3
                       UNROLL=n                       UR=n          UNROLL=4
                       UNROLL2=n                      UR2=n         UNROLL2=100

Fortran 77 Language    ASSUME=list                    AS=list       ASSUME=EL
Control                [NO]DLINES                     [N]DL         NODLINES
                       [NO]ONETRIP                    [N]1          NOONETRIP
                       SAVE=c                         SV=c          SAVE=A
                       SCAN=n                         SCAN=n        SCAN=72
                       SYNTAX=c                       SY=c          (option off)

Inlining and           INLINE[=list]                  IN            (option off)
Interprocedural        IPA[=names]                    IPA           (option off)
Analysis               INLINE_CREATE=name             INCR=name     (option off)
                       IPA_CREATE=name                IPACR=name    (option off)
                       INLINE_FROM_FILES=list         INFF=list     (option off)
                       IPA_FROM_FILES=list            IPAFF=list    (option off)
                       INLINE_FROM_LIBRARIES=list     INFL=list     (option off)
                       IPA_FROM_LIBRARIES=list        IPAFL=list    (option off)
                       INLINE_LOOP_LEVEL=n            INLL=n        INLL=10
                       IPA_LOOP_LEVEL=n               IPALL=n       IPALL=10
                       INLINE_MAN                     INM           (option off)
                       IPA_MAN                        IPAM          (option off)
                       INLINE_DEPTH                   IND           IND=10

Directives             [NO]DIRECTIVES=list            [N]DR=list    DIRECTIVES=AKSV

I/O                    INPUT=file.f                   file.f        file.f
                       [NO]FORTRAN=file               [N]F=file     F=file.m
                       [NO]LIST=file                  [N]L=file     L=file.l

Listing                LINES=n                        LN=n          LINES=55
                       LISTOPTIONS=list               LO=list       LISTOPTIONS=OL
                       SUPPRESS=list                  SU=list       (option off)

Obsolete               CREATE                         CR            (option off)
                       LIBRARY=file                   LIB=file      (option off)
                       [NO]EXPAND=list                EX=list       (option off)
                       LIMIT2=n                       LM2=n         LM2=5000

Options Summary

This section lists and defines all PFA command line options alphabetically.

ARCLIMIT

The -ARCLIMIT option, described in Table A-2, controls the size of the internal table used to store data dependence information (arcs).
Table A-2    ARCLIMIT Option

Long Option Name    Short Option Name    Default Value
-ARCLIMIT=n         -ARCLM=n             5000

ASSUME

The -ASSUME option, described in Table A-3, controls certain global assumptions of a program.

Table A-3    ASSUME Option

Long Option Name    Short Option Name    Default Value
-ASSUME=list        -AS=list             EL

You can also use various assertions to control these assumptions. list is any combination of the following values:

E    Means that equivalence variables can refer to the same memory location inside one loop.

L    Is important when a scalar is assigned in a loop run in parallel. If ASSUME is L, PFA uses a temporary variable within the optimized loop and assigns the last value to the original scalar if PFA determines that scalar can be reused before it is assigned.

P    Allows for parameter aliasing in a subprogram.

CONCURRENTIZE

The -CONCURRENTIZE option, described in Table A-4, converts eligible loops to run in parallel.

Table A-4    CONCURRENTIZE Option

Long Option Name    Short Option Name    Default Value
-CONCURRENTIZE      -C                   -CONCURRENTIZE

See also NOCONCURRENTIZE.

DIRECTIVES

The -DIRECTIVES option, described in Table A-5, specifies the directives and assertions to accept.

Table A-5    DIRECTIVES Option

Long Option Name    Short Option Name    Default Value
-DIRECTIVES=list    -DR=list             AKSV

list consists of any combination of the following values:

A    Accepts assertions.

C    Accepts Cray CDIR$ directives. Because of differences between SGI and Cray hardware, certain data dependencies that CDIR$IVDEP ignores in a loop are not always safe to ignore on SGI hardware. PFA does not recognize the CDIR$IVDEP directive by default. You can, however, turn on Cray-directive recognition, which will cause PFA to treat the Cray directive as a C*$*ASSERT DO (CONCURRENT) assertion.

K    Accepts PFA C*$* directives.

S    Accepts C$ directives. PFA recognizes the directives C$DOACROSS, C$, and C$&. (For more information, see the Fortran 77 Programmer’s Guide.)
If a C$DOACROSS directive appears, PFA does not examine or alter the loop to which the directive applies. This allows you to mix code you converted to parallel execution with code that PFA converted to parallel execution.

V    Accepts VAST CVD$ directives.

See also NODIRECTIVES.

DLINES

The -DLINES option, described in Table A-6, tells PFA to treat the letter D in column one as if the letter were a character space.

Table A-6    DLINES Option

Long Option Name    Short Option Name    Default Value
-DLINES             -DL                  -NODLINES

PFA then parses the rest of that line as a normal Fortran 77 statement. See also NODLINES.

FORTRAN

The -FORTRAN option, described in Table A-7, specifies the name of the PFA-transformed source.

Table A-7    FORTRAN Option

Long Option Name     Short Option Name    Default Value
-FORTRAN=filename    -F=filename          filename.m

filename is the name of the Fortran source.

INLINE

The -INLINE option, described in Table A-8, specifies the routines to be inlined.

Table A-8    INLINE Option

Long Option Name    Short Option Name    Default Value
-INLINE[=list]      -IN[=list]           none

If this option is given with a (colon-separated) list of routine names, then only those routines will be inlined. If it is given without a list of names, then PFA will attempt to inline all eligible routines.

INLINE_CREATE

The -INLINE_CREATE option, described in Table A-9, creates a library of prepared routines for later use with the -INLINE_FROM_LIBRARIES option.

Table A-9    INLINE_CREATE Option

Long Option Name       Short Option Name    Default Value
-INLINE_CREATE=name    -INCR=name           option off

You are not required to create a library to do inlining; you can inline directly from a source file. Constructing a library will save time if the inlining operation is to be done repeatedly. PFA analyzes the current source file and places the appropriate information into the file named with the -INLINE_CREATE option.
For maximum compatibility, the filename extension .klib is recommended: for example, samp.klib.

INLINE_DEPTH

The -INLINE_DEPTH option, described in Table A-10, restricts the number of times PFA will continue to attempt inlining on already inlined routines.

Table A-10 INLINE_DEPTH Option
Long Option Name   Short Option Name   Default Value
-INLINE_DEPTH=n    -IND=n              10

Routines are inlined only to the specified depth. As a special case, if you specify the value -1, only routines that do not reference other routines are inlined (that is, only leaf routines are inlined). The only valid negative number is -1; do not specify -2, -3, and so on. Note that there is no -IPA_DEPTH option.

INLINE_FROM_FILES

The -INLINE_FROM_FILES option, described in Table A-11, specifies where to look for routines named in the -INLINE option.

Table A-11 INLINE_FROM_FILES Option
Long Option Name          Short Option Name   Default Value
-INLINE_FROM_FILES=list   -INFF=list          current source file

Files with the extension .f are assumed to be Fortran source, while files with the extension .klib are assumed to be PFA-produced libraries. Specify multiple files and directories by using a colon-separated list.

Note: This option alone does not initiate inlining. It only specifies where to look for the routines. Use this option with the -INLINE option.

INLINE_FROM_LIBRARIES

The -INLINE_FROM_LIBRARIES option, described in Table A-12, specifies where to look for the routines named in the -INLINE option.

Table A-12 INLINE_FROM_LIBRARIES Option
Long Option Name              Short Option Name   Default Value
-INLINE_FROM_LIBRARIES=list   -INFL=list          current source file

Files with the extension .f are assumed to be Fortran source, while files with the extension .klib are assumed to be PFA-produced libraries. Specify multiple libraries by using a colon-separated list.

Note: This option alone does not initiate inlining. It only specifies where to look for the routines.
Use this option with the -INLINE option.

INLINE_LOOPLEVEL

The -INLINE_LOOPLEVEL option, described in Table A-13, restricts inlining to occurrences within deeply nested loops.

Table A-13 INLINE_LOOPLEVEL Option
Long Option Name      Short Option Name   Default Value
-INLINE_LOOPLEVEL=n   -INLL=n             10

A value of 1 restricts PFA to routines at only the single most deeply nested level; a value of 2 restricts PFA to the deepest and second-deepest levels; and so on. To determine what is most deeply nested, PFA constructs a call graph to account for nesting due to loops that occur farther up the call chain.

INLINE_MAN

The -INLINE_MAN option, described in Table A-14, turns on recognition of the C*$*INLINE directive.

Table A-14 INLINE_MAN Option
Long Option Name   Short Option Name   Default Value
-INLINE_MAN        -INM                option off

The C*$*INLINE directive allows you to select individual occurrences of routines to be inlined.

INPUT

The -INPUT option, described in Table A-15, specifies the name of the PFA input file.

Table A-15 INPUT Option
Long Option Name    Short Option Name   Default Value
-INPUT=filename.f   -I=filename.f       filename.f

It is not necessary to precede the input filename with this option; PFA assumes that a command line argument not preceded by a dash is the input filename.

IPA

The -IPA option, described in Table A-16, specifies the routines on which PFA performs interprocedural analysis (IPA).

Table A-16 IPA Option
Long Option Name   Short Option Name   Default Value
-IPA[=list]        -IPA[=list]         none

If this option is given with a colon-separated list of routine names, then IPA is performed on only those routines. If it is given without a list of names, then PFA will attempt to perform IPA on all eligible routines.

IPA_CREATE

The -IPA_CREATE option, described in Table A-17, creates a library of prepared routines for later use with the -IPA_FROM_LIBRARIES option.
Table A-17 IPA_CREATE Option
Long Option Name   Short Option Name   Default Value
-IPA_CREATE=name   -IPACR=name         option off

You are not required to create a library to do IPA; you can perform IPA directly from a source file. Constructing a library will save time if the IPA operation is to be done repeatedly. PFA analyzes the current source file and places the appropriate information into the file named with the -IPA_CREATE option. For maximum compatibility, the filename extension .klib is recommended: for example, samp.klib. Libraries created for IPA only contain summary information and so can be used only for IPA.

IPA_FROM_FILES

The -IPA_FROM_FILES option, described in Table A-18, specifies where to look for the routines named in the -IPA option.

Table A-18 IPA_FROM_FILES Option
Long Option Name       Short Option Name   Default Value
-IPA_FROM_FILES=list   -IPAFF=list         current source file

Files with the extension .f are assumed to be Fortran source, while files with the extension .klib are assumed to be PFA-produced libraries. Specify multiple files by using a colon-separated list.

Note: This option alone does not initiate IPA. It only specifies where to look for the routines. Use this option in conjunction with the -IPA option.

IPA_FROM_LIBRARIES

The -IPA_FROM_LIBRARIES option, described in Table A-19, specifies where to look for the routines named in the -IPA option.

Table A-19 IPA_FROM_LIBRARIES Option
Long Option Name           Short Option Name   Default Value
-IPA_FROM_LIBRARIES=list   -IPAFL=list         none

Files with the extension .f are assumed to be Fortran source, while files with the extension .klib are assumed to be PFA-produced libraries. Specify multiple libraries by using a colon-separated list.

Note: This option alone does not initiate IPA. It only specifies where to look for the routines. Use this option in conjunction with the -IPA option.
IPA_LOOPLEVEL

The -IPA_LOOPLEVEL option, described in Table A-20, restricts IPA to occurrences within deeply nested loops.

Table A-20 IPA_LOOPLEVEL Option
Long Option Name   Short Option Name   Default Value
-IPA_LOOPLEVEL=n   -IPALL=n            10

A value of 1 restricts PFA to routines at only the single most deeply nested level; a value of 2 restricts PFA to the deepest and second-deepest levels; and so on. To determine what is most deeply nested, PFA constructs a call graph to account for nesting due to loops that occur farther up the call chain.

IPA_MAN

The -IPA_MAN option, described in Table A-21, turns on recognition of the C*$*IPA directive.

Table A-21 IPA_MAN Option
Long Option Name   Short Option Name   Default Value
-IPA_MAN           -IPAM               option off

The C*$*IPA directive allows you to select the individual occurrences of routines on which to perform IPA.

LIMIT

The -LIMIT option, described in Table A-22, reduces PFA processing time by limiting the amount of time PFA can spend on trying to determine whether a loop is safe to run in parallel.

Table A-22 LIMIT Option
Long Option Name   Short Option Name   Default Value
-LIMIT=n           -LM=n               LIMIT=5000

PFA estimates how much time is required to analyze each loop nest construct. If an outer loop looks like it would take too much time to analyze, PFA ignores the outer loop and recursively visits the inner loops. Larger limits often allow PFA to generate parallel code for deeply nested loop structures that it might not otherwise be able to run safely in parallel. However, with larger limits PFA can also take more time to analyze a program. (The limit does not correspond to the DO loop nest level. It is an estimate of the number of loop orderings that PFA can generate from a loop nest.)

LINES

The -LINES option, described in Table A-23, paginates the listing for printing.

Table A-23 LINES Option
Long Option Name   Short Option Name   Default Value
-LINES=n           -LN=n               LINES=55

Use this option to change the number of lines per page.
Specifying -LINES=0 paginates at subroutine boundaries.

LIST

The -LIST option, described in Table A-24, specifies the name of the PFA listing file.

Table A-24 LIST Option
Long Option Name   Short Option Name   Default Value
-LIST=filename     -L=filename         LIST=filename.l

In the default value, filename is the name of the Fortran source.

LISTOPTIONS

The -LISTOPTIONS option, described in Table A-25, specifies the information to include in the listing file (.l).

Table A-25 LISTOPTIONS Option
Long Option Name    Short Option Name   Default Value
-LISTOPTIONS=list   -LO=list            OL

list consists of any combination of the following values:

C  Calling tree at the end of the program listing.
I  Transformed program file annotated with line numbers in the source program. Error messages and debugging information can refer to the original source rather than the transformed source. When PFA is run as part of an f77 compilation, this option is added automatically.
K  PFA options used, at the end of each program unit.
L  Loop-by-loop optimization table.
N  Program unit names, as processed, to the standard error file. This option is added automatically as part of an f77 -v compilation.
O  Annotated listing of the original program.
P  Processing performance statistics.
S  Summary of optimization performed.
T  Annotated listing of the transformed program.

MINCONCURRENT

The -MINCONCURRENT option, described in Table A-26, establishes the minimum amount of work needed inside the loop to make executing a loop in parallel profitable.

Table A-26 MINCONCURRENT Option
Long Option Name   Short Option Name   Default Value
-MINCONCURRENT=n   -MC=n               500

If the loop does not contain at least this much work, the loop will not be run in parallel. If the loop bounds are not constants, an IF clause will be automatically added to the PFA-generated DOACROSS directive to test at run time whether sufficient work exists.
The MINCONCURRENT value is a count of the number of operations (for example, add, multiply, load, store) in the loop, multiplied by the number of times the loop will be executed.

NOCONCURRENTIZE

The -NOCONCURRENTIZE option, described in Table A-27, prevents PFA from converting loops to run in parallel.

Table A-27 NOCONCURRENTIZE Option
Long Option Name   Short Option Name   Default Value
-NOCONCURRENTIZE   -NCONC              none

See also CONCURRENTIZE.

NODIRECTIVES

The -NODIRECTIVES option, described in Table A-28, tells PFA to ignore all directives and assertions.

Table A-28 NODIRECTIVES Option
Long Option Name   Short Option Name   Default Value
-NODIRECTIVES      -NDR                none

See also DIRECTIVES.

NODLINES

The -NODLINES option, described in Table A-29, tells PFA to treat lines starting with D as though they were comments.

Table A-29 NODLINES Option
Long Option Name   Short Option Name   Default Value
-NODLINES          -NDL                -NODLINES

See also DLINES.

NOONETRIP

The -NOONETRIP option, described in Table A-30, conforms to the Fortran 77 standard, which specifies that a DO loop whose termination condition is initially satisfied is not executed.

Table A-30 NOONETRIP Option
Long Option Name   Short Option Name   Default Value
-NOONETRIP         -N1                 -NOONETRIP

See also ONETRIP.

ONETRIP

The -ONETRIP option, described in Table A-31, allows compatibility with older versions of Fortran where a DO loop is always executed at least once.

Table A-31 ONETRIP Option
Long Option Name   Short Option Name   Default Value
-ONETRIP           -1                  -NOONETRIP

See also NOONETRIP.

OPTIMIZE

The -OPTIMIZE option, described in Table A-32, sets the optimization level.

Table A-32 OPTIMIZE Option
Long Option Name   Short Option Name   Default Value
-OPTIMIZE=n        -O=n                5

The higher you set the optimization level, the more code is optimized and the longer PFA runs. Valid values for n are the integers

0  Avoids converting loops to run in parallel.
1  Converts loops to run in parallel without using advanced data dependence tests. Enables loop interchanging.
2  Determines when scalars need last-value assignment using lifetime analysis. Also uses more powerful data dependence tests to find loops that can be run safely in parallel. This level allows reductions in loops that execute concurrently, but only if -ROUNDOFF is set to 2.
3  Breaks data dependence cycles using special techniques and additional loop interchanging methods, such as interchanging triangular loops. This level also implements special-case data dependence tests.
4  Generates two versions of a loop, if necessary, to break a data dependence arc. This level also implements more exact data dependence tests and allows special index sets (called wraparound variables) to convert more code to run in parallel.
5  Fuses two adjacent loops if it is legal to do so (no data dependencies) and if the loops have the same control values. In certain limited cases, this level recognizes arrays as local variables. Level 5 also tells PFA to try harder to take the outermost loop possible (of a set of nested loops) and run it in parallel.

Note: If you want to use the -UNROLL command line option, you must set the -OPTIMIZE option to 4 or higher (the default optimization level is above this threshold).

ROUNDOFF

The -ROUNDOFF option, described in Table A-33, controls whether PFA runs a reduction operation in parallel.

Table A-33 ROUNDOFF Option
Long Option Name   Short Option Name   Default Value
-ROUNDOFF=n        -R=n                0

Valid values for n are

0–1  Suppresses any round-off changing transformations.
2  Allows reductions to be performed in parallel. The valid reduction operators are addition, multiplication, min, and max. -ROUNDOFF=2 is one of the most common user options.
3  Recognizes REAL induction variables. Permits the memory management transformations.

SAVE

Table A-34 describes the -SAVE option.
Table A-34 SAVE Option
Long Option Name   Short Option Name   Default Value
-SAVE=c            -SV=c               A

Either of the following values is valid for c:

A  Performs lifetime analysis on a procedure’s variables to try to determine those that need to have their values saved across invocations of the procedure. When it finds such a variable, PFA generates a SAVE statement for the variable. This is the default value.
M  Does not generate SAVE statements.

SCALAROPT

The -SCALAROPT option, described in Table A-35, controls the amount of standard scalar optimizations attempted by PFA.

Table A-35 SCALAROPT Option
Long Option Name   Short Option Name   Default Value
-SCALAROPT=n       -SO=n               3

Valid values for n are

0  Performs no scalar transformations.
1  Enables dead code elimination, pulling loop invariants, forward substitution, and conversion of IF-GOTO into IF-THEN-ELSE.
2  Enables induction variable recognition, loop unrolling, loop fusion, array expansion, scalar promotion, and floating invariant IF tests. (Loop fusion also requires -OPTIMIZE=5.)
3  Enables the memory management transformations. (Memory management also requires -ROUNDOFF=3.)

SCAN

The -SCAN option, described in Table A-36, controls the number of columns that are assumed to be significant (PFA ignores anything beyond the specified column).

Table A-36 SCAN Option
Long Option Name   Short Option Name   Default Value
-SCAN=n            -SCAN=n             72

Specifying any of the following f77 options automatically sets this option: -col72, -col120, or -extend_source.

SUPPRESS

The -SUPPRESS option, described in Table A-37, lets you individually disable classes of PFA messages that are normally included in the listing (.l) file.

Table A-37 SUPPRESS Option
Long Option Name   Short Option Name   Default Value
-SUPPRESS=list     -SU=list            option off

These messages range from syntax warnings and error messages to messages about the optimizations performed.
list is any combination of the following values:

D  Data dependence
E  Syntax error
I  Information
N  Not able to run loop in parallel
Q  Questions
S  Standard messages
W  Warning of syntax error (PFA adds the -SUPPRESS=W option automatically if you use the -w option to f77)

SYNTAX

Setting the -SYNTAX option, described in Table A-38, alters the interpretation of the Fortran input to be in compliance with other standards.

Table A-38 SYNTAX Option
Long Option Name   Short Option Name   Default Value
-SYNTAX=c          -SY=c               SGI Fortran syntax

c is either of the following values:

A  Interprets the source in strict compliance with the ANSI Fortran 77 standard.
V  Interprets the source in compliance with the VMS Fortran standard but without the additional SGI extensions.

UNROLL

The -UNROLL option, described in Table A-39, unrolls scalar inner loops when PFA cannot run the loops in parallel.

Table A-39 UNROLL Option
Long Option Name   Short Option Name   Default Value
-UNROLL=n          -UR=n               4

When PFA unrolls a loop, it replicates the body of the loop a certain number of times, making the loop run faster. (In previous releases of PFA the default value was 1.)

UNROLL2

The -UNROLL2 option, described in Table A-40, allows you to adjust the number of operations used by the -UNROLL option.

Table A-40 UNROLL2 Option
Long Option Name   Short Option Name   Default Value
-UNROLL2=m         -UR2=m              50

Selecting a larger value for -UNROLL2 allows PFA to unroll loops containing more calculations. This form of unrolling applies only to the innermost loops in a nest of loops. You can unroll loops whether they execute serially or concurrently. f77 passes this option to PFA automatically when you specify the f77 -d_lines option.

Obsolete Syntax

Table A-41 describes obsolete PFA syntax.
Table A-41 Obsolete Options
Long Option Name   Short Option Name   Default Value
-EXPAND=list       -X=list, -EX=list   option off
-CREATE            -CR                 option off
-LIBRARY=file      -LIB=file           option off
-LIMIT2=n          -LM2                5000

This version of PFA has altered some of the command line syntax (particularly the syntax for inlining). For compatibility with the older versions, Table A-42 lists the options that are translated into their newer equivalents.

Table A-42 Obsolete Options and Their Equivalents
Old Version      New Version
-EXPAND=A        -INLINE
-EXPAND=M        -INLINE_MAN
-LIBRARY=name    -INLINE_CREATE=name
-LIMIT2=n        -ARCLIMIT=n

Whenever possible, do not use this older syntax. Support for it might be withdrawn in the future.

Appendix B. PFA Directives

This appendix contains the following sections:
• “Standard Directives”
• “Cray Directives”
• “VAST Directives”

This appendix lists and describes the three types of PFA directives:
• Standard
• Cray
• VAST

Chapter 1, “Overview of PFA,” describes the purpose of directives. For details about how to use directives, refer to Chapter 5, “Fine-Tuning PFA.”

Standard Directives

This section lists and describes the following standard PFA directives alphabetically:
• C*$*ARCLIMIT
• C*$*CONCURRENTIZE
• C*$*INLINE
• C*$*IPA
• C*$*LIMIT
• C*$*MINCONCURRENT
• C*$*NOCONCURRENTIZE
• C*$*NOINLINE
• C*$*NOIPA
• C*$*OPTIMIZE
• C*$*ROUNDOFF
• C*$*SCALAR OPTIMIZE
• C*$*UNROLL
• C$ DOACROSS
• C$&

C*$* ARCLIMIT

The C*$*ARCLIMIT(n) directive controls the size of the internal table used to store data dependence information (arcs). n is an integer. This directive, when specified globally, has the same effect as the -ARCLIMIT command line option.

C*$* CONCURRENTIZE

The C*$*CONCURRENTIZE directive converts eligible loops to run in parallel. This directive, when specified globally, has the same effect as the -CONCURRENTIZE command line option. See also C*$*NOCONCURRENTIZE.
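As a rough illustration of the global form, the following fixed-form Fortran 77 sketch places C*$*CONCURRENTIZE on the first line of the source file, which is where the Glossary says a global directive goes. The program name, array names, and loop bodies are hypothetical and chosen only for this example. Because the directive begins with C in column one, a compiler that does not run PFA should treat it as an ordinary comment.

```fortran
C*$*CONCURRENTIZE
      PROGRAM SCALE
C     Hypothetical example: names and sizes are illustrative only.
      INTEGER N, I
      PARAMETER (N = 1000)
      REAL A(N), B(N)
C     Fill the input array.
      DO 10 I = 1, N
         B(I) = REAL(I)
   10 CONTINUE
C     Each iteration writes one distinct element of A and only
C     reads B, so the iterations are data independent.
      DO 20 I = 1, N
         A(I) = 2.0 * B(I)
   20 CONTINUE
      PRINT *, A(1), A(N)
      END
```

Replacing the first line with C*$*NOCONCURRENTIZE would, per the description of that directive, ask PFA to leave both loops serial.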
C*$* INLINE

The C*$*INLINE directive behaves much like the -INLINE command line option but specifies which occurrences of a routine are actually inlined. The format for this directive is

C*$* INLINE [(name [, name ... ])] {HERE | ROUTINE | GLOBAL}

where

name  Specifies the routines to be inlined. If you do not specify a name, all routines will be affected.
HERE  Applies only to the next line; occurrences of the named routines on that next line are inlined.
ROUTINE  Inlines the named routines everywhere they appear in the current routine.
GLOBAL  Inlines the named routines throughout the source file.

See also C*$*NOINLINE. For details about inlining, refer to Chapter 4, “Customizing PFA Execution.” For details about using the C*$*INLINE directive, refer to Chapter 5, “Fine-Tuning PFA.”

C*$* IPA

The C*$*IPA directive behaves much like the -IPA command line option but specifies on which occurrences of a routine to use IPA. The format for this directive is

C*$* IPA [(name [, name ... ])] {HERE | ROUTINE | GLOBAL}

where

name  Specifies the routines on which to use IPA. If you do not specify a name, all routines will be affected.
HERE  Uses IPA only on occurrences of the named routines that appear on the next line.
ROUTINE  Uses IPA on the named routines everywhere they appear in the current routine.
GLOBAL  Uses IPA on the named routines throughout the source file.

See also C*$*NOIPA. For details about interprocedural analysis, refer to Chapter 4, “Customizing PFA Execution.” For details about using the C*$*IPA directive, refer to Chapter 5, “Fine-Tuning PFA.”

C*$* LIMIT

The C*$*LIMIT(n) directive reduces PFA processing time by limiting the amount of time PFA can spend on trying to determine whether a loop is safe to run in parallel. PFA estimates how much time is required to analyze each loop nest construct.
If an outer loop looks like it would take too much time to analyze, PFA ignores the outer loop and recursively visits the inner loops. Larger limits often allow PFA to generate parallel code for deeply nested loop structures that it might not otherwise be able to run safely in parallel. However, with larger limits PFA can also take more time to analyze a program. (The limit does not correspond to the DO loop nest level. It is an estimate of the number of loop orderings that PFA can generate from a loop nest.) This directive, when specified globally, has the same effect as the -LIMIT command line option.

C*$* MINCONCURRENT

The C*$*MINCONCURRENT(n) directive establishes the minimum amount of work needed inside the loop to make executing a loop in parallel profitable. n is a count of the number of operations (for example, add, multiply, load, store) in the loop, multiplied by the number of times the loop will be executed. If the loop does not contain at least this much work, the loop will not be run in parallel. If the loop bounds are not constants, an IF clause will be automatically added to the PFA-generated C$ DOACROSS directive to test at run time whether sufficient work exists.

C*$* NOCONCURRENTIZE

The C*$*NOCONCURRENTIZE directive prevents PFA from converting loops to run in parallel. See also C*$*CONCURRENTIZE.

C*$* NOINLINE

The C*$*NOINLINE directive behaves much like the -NOINLINE command line option, but with the directive you can specify which occurrences of a routine are not inlined. The format for this directive is

C*$* NOINLINE [(name [, name ... ])] {HERE | ROUTINE | GLOBAL}

where

name  Specifies the routines that are not to be inlined. If you do not specify a name, all routines will be affected.
HERE  Disables inlining of occurrences of the named routines only on the next line.
ROUTINE  Disables inlining of the named routines everywhere they appear in the current routine.
GLOBAL  Disables inlining of the named routines throughout the source file.
C*$*NOINLINE overrides the -INLINE command line option and so allows you to disable inlining of the named routines at specific points.

C*$* NOIPA

The C*$*NOIPA directive behaves much like the -NOIPA command line option, but with the directive you can specify on which occurrences of a routine not to use IPA. The format for this directive is

C*$* NOIPA [(name [, name ... ])] {HERE | ROUTINE | GLOBAL}

where

name  Specifies the routines for which to disable IPA. If you do not specify a name, all routines will be affected.
HERE  Disables IPA of occurrences of the named routines only on the next line.
ROUTINE  Disables IPA of the named routines everywhere they appear in the current routine.
GLOBAL  Disables IPA of the named routines throughout the source file.

C*$*NOIPA overrides the -IPA command line option and so allows you to disable IPA of the named routines at specific points.

C*$*OPTIMIZE

The C*$*OPTIMIZE(n) directive sets the optimization level. The higher the optimization level, the more code is optimized and the longer PFA runs. Valid values for n are the integers

0  Avoids converting loops to run in parallel.
1  Converts loops to run in parallel without using advanced data dependence tests. Enables loop interchanging.
2  Determines when scalars need last-value assignment using lifetime analysis. Also uses more powerful data dependence tests to find loops that can run safely in parallel. This level allows reductions in loops that execute concurrently, but only if the round-off setting is at least 2.
3  Breaks data dependence cycles using special techniques and additional loop interchanging methods, such as interchanging triangular loops. This level also implements special-case data dependence tests.
4  Generates two versions of a loop, if necessary, to break a data dependence arc. This level also implements more exact data dependence tests and allows special index sets (called wraparound variables) to convert more code to run in parallel.
5  Fuses two adjacent loops if it is legal to do so (no data dependencies) and if the loops have the same control values. In certain limited cases, this level recognizes arrays as local variables. Level 5 also tells PFA to try harder to run the outermost loop possible (of a set of loops) in parallel.

Note: If you want to use unrolling, set the optimization level to at least 4 (the default optimization level is above this threshold).

C*$*ROUNDOFF

The C*$*ROUNDOFF(n) directive controls whether PFA runs a reduction operation in parallel. Valid values for n are

0–1  Suppresses any round-off changing transformations.
2  Allows reductions to be performed in parallel. The valid reduction operators are addition, multiplication, min, and max. -ROUNDOFF=2 is one of the most common user options.
3  Recognizes REAL induction variables. Permits the memory management transformations.

C*$*SCALAR OPTIMIZE

The C*$*SCALAR OPTIMIZE(n) directive controls the amount of standard scalar optimizations attempted by PFA. Valid values for n are

0  Performs no scalar transformations.
1  Enables dead code elimination, pulling loop invariants, forward substitution, and conversion of IF-GOTO into IF-THEN-ELSE.
2  Enables induction variable recognition, loop unrolling, loop fusion, array expansion, scalar promotion, and floating invariant IF tests. (Loop fusion also requires -OPTIMIZE=5.)
3  Enables the memory management transformations. (Memory management also requires -ROUNDOFF=3.)

C*$*UNROLL

The C*$*UNROLL(n) directive unrolls scalar inner loops when PFA cannot run the loops in parallel. When PFA unrolls a loop, it replicates the body of the loop a certain number of times, making the loop run faster. In this form, n has the same meaning as in the -UNROLL=n command line option. The C*$*UNROLL(n, m) form allows you to adjust the number of operations used when unrolling. In this form, n is as above and m is as in the -UNROLL2=m command line option.
This form of unrolling applies only to the innermost loops in a nest of loops. You can unroll loops whether they execute serially or concurrently.

C$ DOACROSS

The C$ DOACROSS directive tells the Fortran 77 compiler to generate parallel code for the loop that immediately follows the directive. Putting this directive in the original source marks the loop to run in parallel and signals PFA not to modify the loop.

Note: PFA generates the C$ DOACROSS directive and inserts it into the code as the result of PFA’s parallelism analysis.

C$&

The C$& directive continues the C$ DOACROSS directive onto multiple lines.

Cray Directives

PFA supports the following Cray directives:
• CDIR$ IVDEP
• CDIR$ NEXT SCALAR

CDIR$ IVDEP

PFA interprets the CDIR$ IVDEP directive as if it were a C*$* ASSERT DO (CONCURRENT) assertion. (Refer to Appendix C, “PFA Assertions,” for details.)

CDIR$ NEXT SCALAR

CDIR$ NEXT SCALAR is a Cray directive that generates scalar code for the next DO loop. PFA interprets this directive as if it were a C*$* ASSERT DO (SERIAL) assertion. (Refer to Appendix C, “PFA Assertions,” for details.)

VAST Directives

PFA supports the CVD$CONCUR VAST directive. The CVD$CONCUR directive runs a loop in parallel to optimize performance. PFA interprets this directive as if it were the C*$*CONCURRENTIZE directive (described in “Standard Directives”).

Appendix C. PFA Assertions

This appendix lists and describes the following PFA assertions:
• C*$* ASSERT ARGUMENT ALIASING
• C*$* ASSERT DO (SERIAL)
• C*$* ASSERT DO (CONCURRENT)
• C*$* ASSERT DO PREFER (SERIAL)
• C*$* ASSERT DO PREFER (CONCURRENT)
• C*$* ASSERT EQUIVALENCE HAZARD
• C*$* ASSERT NO ARGUMENT ALIASING
• C*$* ASSERT NO EQUIVALENCE HAZARD
• C*$* ASSERT RELATION (name .xx. name)
• C*$* ASSERT CONCURRENT CALL
• C*$* ASSERT NO RECURRENCE
• C*$* ASSERT PERMUTATION (name)

Chapter 1, “Overview of PFA,” describes the purpose of assertions.
For details about using assertions, refer to Chapter 5, “Fine-Tuning PFA.”

C*$* ASSERT ARGUMENT ALIASING

The C*$* ASSERT ARGUMENT ALIASING assertion allows PFA to make assumptions about subprogram arguments in a program. According to the Fortran 77 standard, you can alias a variable only if you do not modify (that is, write to) the aliased variable. This assertion tells PFA that the subprogram on the following line violates the Fortran 77 standard in this regard.

C*$* ASSERT DO (SERIAL)

The C*$* ASSERT DO (SERIAL) assertion tells PFA to run the specified loop serially. PFA does not try to convert the specified loop to run in parallel, nor does it try to run any enclosing loop in parallel. However, PFA can still convert any loops nested inside the serial loop to run in parallel.

C*$* ASSERT DO (CONCURRENT)

The C*$* ASSERT DO (CONCURRENT) assertion tells PFA to ignore assumed data dependencies. Normally, PFA is conservative about which loops it converts to run in parallel. When PFA analyzes a loop to see if it is safe to run in parallel, it categorizes the loop into one of three groups:
• yes (loop is safe to run in parallel)
• no
• not sure

Normally, PFA does not run “not sure” loops in parallel. C*$* ASSERT DO (CONCURRENT) tells PFA to go ahead and run “not sure” loops in parallel.

Note: If PFA identifies a loop as containing definite (as opposed to assumed) data dependencies, it does not run the loop in parallel even if a C*$* ASSERT DO (CONCURRENT) assertion precedes the loop.

C*$* ASSERT DO PREFER (SERIAL)

The C*$* ASSERT DO PREFER (SERIAL) assertion indicates that you want to execute a DO loop in serial mode. This assertion directs PFA to leave the DO loop alone, regardless of the setting of the optimization level. You can use this assertion to control which loop (in a nest of loops) PFA chooses to run in parallel.
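The DO assertions above can be sketched in a small fixed-form Fortran 77 program. Everything named here (DEMO, SWEEP, HALVE, the arrays, and the offset K) is hypothetical, and the placement of each assertion on the line before its DO statement is an assumption based on how the other assertions in this appendix are described. The assertion lines are plain comments to a compiler that does not run PFA.

```fortran
      PROGRAM DEMO
C     Hypothetical driver; names and sizes are illustrative only.
      INTEGER N, M
      PARAMETER (N = 100, M = 50)
      REAL A(N,M), B(2*N)
      INTEGER I
      DO 5 I = 1, N
         A(I,1) = 1.0
    5 CONTINUE
      DO 6 I = 1, 2*N
         B(I) = REAL(I)
    6 CONTINUE
      CALL SWEEP(A, N, M)
      CALL HALVE(B, N, N)
      PRINT *, A(N,M), B(1)
      END

      SUBROUTINE SWEEP(A, N, M)
      INTEGER N, M, I, J
      REAL A(N,M)
C     Column J is built from column J-1, so keep the J loop serial.
C     The nested I loop remains a candidate for parallel execution.
C*$* ASSERT DO (SERIAL)
      DO 20 J = 2, M
         DO 10 I = 1, N
            A(I,J) = A(I,J-1) + 1.0
   10    CONTINUE
   20 CONTINUE
      END

      SUBROUTINE HALVE(B, N, K)
      INTEGER N, K, I
      REAL B(*)
C     K is unknown here, so B(I) and B(I+K) might overlap between
C     iterations. The caller passes K = N, so they never do.
C*$* ASSERT DO (CONCURRENT)
      DO 30 I = 1, N
         B(I) = B(I+K) * 0.5
   30 CONTINUE
      END
```

In HALVE the dependence is only an assumed one: it disappears once K is known to be at least N, which is the kind of "not sure" loop the assertion is meant for. If K could be smaller than N, the assertion would be wrong, and PFA does not verify assertions.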
C*$* ASSERT DO PREFER (CONCURRENT)

The C*$* ASSERT DO PREFER (CONCURRENT) assertion runs a particular nested loop in parallel whenever possible. PFA runs other nested loops in parallel only if a condition prevents running the selected loop in parallel. The C*$* ASSERT DO PREFER (CONCURRENT) assertion applies only to the DO loop that it precedes. PFA does not generate parallel code if you use the -NOCONCURRENTIZE command line option or the C*$* NOCONCURRENTIZE directive.

C*$* ASSERT EQUIVALENCE HAZARD

The C*$* ASSERT EQUIVALENCE HAZARD assertion allows equivalence variables to refer to the same memory location inside one loop. This assertion, when specified globally, has the same effect as the -ASSUME=E command line option. The C*$* ASSERT EQUIVALENCE HAZARD assertion is active until you reset it or until the end of the program unit. See also C*$* ASSERT NO EQUIVALENCE HAZARD.

C*$* ASSERT CONCURRENT CALL

C*$* ASSERT CONCURRENT CALL tells PFA to ignore assumed dependencies that are due to a subroutine call or a function reference. However, you must ensure that the subroutines and referenced functions are safe for parallel execution. This assertion applies to all subroutine and function references in the immediately following loop.

C*$* ASSERT NO ARGUMENT ALIASING

The C*$* ASSERT NO ARGUMENT ALIASING assertion allows PFA to make assumptions about subprogram arguments in a program. According to the Fortran 77 standard, you can alias a variable only if you do not modify (that is, write to) the aliased variable.

C*$* ASSERT NO EQUIVALENCE HAZARD

The C*$* ASSERT NO EQUIVALENCE HAZARD assertion tells PFA that your code does not use equivalenced variables to refer to the same memory location inside one loop nest. Normally, EQUIVALENCE statements allow your code to use different variable names to refer to the same memory location.
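As a hedged sketch of the C*$* ASSERT CONCURRENT CALL case described above, the following fixed-form Fortran 77 program calls a subroutine from inside a loop. The names CCALL, SQUARE, and X are hypothetical. The subroutine is written so that it touches only its own argument, which is the property you are responsible for guaranteeing before making this assertion.

```fortran
      PROGRAM CCALL
C     Hypothetical example: names and sizes are illustrative only.
      INTEGER N, I
      PARAMETER (N = 8)
      REAL X(N)
      DO 10 I = 1, N
         X(I) = REAL(I)
   10 CONTINUE
C     Each call receives a different element of X, so no two
C     iterations share data through the call.
C*$* ASSERT CONCURRENT CALL
      DO 20 I = 1, N
         CALL SQUARE(X(I))
   20 CONTINUE
      PRINT *, X
      END

      SUBROUTINE SQUARE(V)
      REAL V
C     Uses only its argument: no COMMON, no SAVE, no I/O.
      V = V * V
      END
```

If SQUARE instead updated a COMMON block or a saved local variable, concurrent iterations could interfere with each other and the assertion would no longer be safe.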
C*$* ASSERT NO RECURRENCE

The C*$* ASSERT NO RECURRENCE (variable) assertion tells PFA to ignore all data dependencies associated with variable. PFA ignores not just assumed dependencies (as with the C*$* ASSERT DO (CONCURRENT) assertion) but also real dependencies. Use this assertion to force PFA to parallelize a loop when other, gentler means have failed. Use this assertion with great caution, as indiscriminate use can result in illegal parallel code.

C*$* ASSERT PERMUTATION

The C*$* ASSERT PERMUTATION(array) assertion tells PFA that array contains no repeated values. This assertion permits PFA to run in parallel certain kinds of loops that use indirect addressing.

C*$* ASSERT RELATION

The C*$* ASSERT RELATION (name1 .xx. name2) assertion explicitly states the relationship between name1 and name2. name1 and name2 are two variables or a variable and a constant, and xx is any of the following: GT, GE, EQ, NE, LT, or LE. This assertion applies only to the DO statement it precedes. If you specify this assertion globally, the program uses the assertion only when name1 and name2 appear in COMMON blocks or are dummy argument names to the subprogram.

Glossary

action summary
The portion of the listing file that summarizes PFA’s actions.

assertion
A PFA directive that asserts something about the program. For example, an assertion can assert that a particular array is a permutation vector. PFA does not verify the validity of assertions.

data independence
When no iteration of a loop writes to a memory location that is read or written by any other iteration of that loop.

directive
A command, specified within the source file, that requests a particular action from PFA. For example, directives enable, disable, or modify a feature of PFA.

global assertion
An assertion that is placed on the first line of the input file. PFA interprets global assertions as if they appear at the top of each program unit in the file. See also, assertion.
global directive
A directive that is placed on the first line of the input file. PFA interprets global directives as if they appear at the top of each program unit in the file. See also, directive.

inlining
The process of replacing a call to an external routine with the actual code.

intermediate file
A transformed version of a Fortran source program generated by PFA. This file name has the suffix .m.

interprocedural analysis (IPA)
The process of analyzing an external routine ahead of time and using the results when the routine is referenced.

listing file
An annotated listing, generated by PFA, of the parts of a source program that can and cannot run in parallel on multiple processors. This file has the suffix .l.

max reduction
A reduction that uses the max() intrinsic function. See also, reduction.

min reduction
A reduction that uses the min() intrinsic function. See also, reduction.

parallelize
To manipulate code so that it can run in parallel.

permutation index
A permutation vector used to index into an array. Because all the numbers in the permutation vector are different, when used as indexes they all refer to different array elements.

permutation vector
Any list of numbers that are all different.

POWER Fortran Accelerator (PFA)
A source-to-source preprocessor that analyzes a program and identifies loops that do not contain data dependencies.

product reduction
A reduction that uses the multiply operator *. See also, reduction.

profiling
A process that produces detailed information about program execution, such as details about areas of code where most of the execution time is spent. The prof(1) command produces profiling information.

reduction
An operation that reduces a set of values to one value.

round-off error
The inaccuracy resulting from rounding off values in a calculation.

sum reduction
A reduction that uses the add operator +. See also, reduction.
Tell Us About This Manual

As a user of Silicon Graphics products, you can help us to better understand your needs and to improve the quality of our documentation. Any information that you provide will be useful. Here is a list of suggested topics:

• General impression of the document
• Omission of material that you expected to find
• Technical errors
• Relevance of the material to the job you had to do
• Quality of the printing and binding

Please send the title and part number of the document with your comments. The part number for this document is 007-0715-060. Thank you!

Three Ways to Reach Us

• To send your comments by electronic mail, use either of these addresses:
  – On the Internet: techpubs@sgi.com
  – For UUCP mail (through any backbone site): [your_site]!sgi!techpubs
• To fax your comments (or annotated copies of manual pages), use this fax number: 650-932-0801
• To send your comments by traditional mail, use this address:
  Technical Publications
  Silicon Graphics, Inc.
  2011 North Shoreline Boulevard, M/S 535
  Mountain View, California 94043-1389