POWER Fortran Accelerator™

User’s Guide

Document Number 007-0715-060

CONTRIBUTORS

Written by Chris Hogue and David Graves

Edited by Janiece Carrico

Production by Gloria Ackley

Engineering contributions by Bron Nelson, Deb Caruso, and Mike Humphrey

This document contains proprietary and conﬁdential information of Silicon

Graphics, Inc. The contents of this document may not be disclosed to third parties,

copied, or duplicated in any form, in whole or in part, without the prior written

permission of Silicon Graphics, Inc.

RESTRICTED RIGHTS LEGEND

Use, duplication, or disclosure of the technical data contained in this document by

the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the

Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/

or in similar or successor clauses in the FAR, or in the DOD or NASA FAR

Supplement. Unpublished rights are reserved under the Copyright Laws of the

United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline

Blvd., Mountain View, CA 94039-7311.

Silicon Graphics and IRIS are registered trademarks, and POWER Fortran

Accelerator, POWER Series, and IRIX are trademarks of Silicon Graphics, Inc. Cray is

a trademark of Cray Research. VAST is a trademark of Paciﬁc Sierra Research, Inc.

VMS is a trademark of Digital Equipment Corporation.

Kuck and Associates, Inc., is the supplier of the optimizer used in this product.

Related Documentation

The following documents contain information relevant to PFA:

•Fortran 77 Programmer’s Guide, Silicon Graphics, Inc., document number

007-0711-030.

•Fortran 77 Language Reference Manual, Silicon Graphics, Inc., document

number 007-0710-040.

•IRIS-4D Series Compiler Guide, Silicon Graphics, Inc., document number

007-0905-030.

Typographical Conventions

This guide uses the following conventions and symbols, both in the text

and to describe the form of Fortran statements:

Bold            Indicates literal command line options, filenames,
                keywords, function/subroutine names, pathnames, and
                directory names.

Italics         Represents user-defined values. Replace the item in
                italics with a legal value. Italics are also used for
                command names, manual page names, and manual titles.

Courier         Indicates command syntax, program listings, computer
                output, and error messages.

Courier bold    Indicates user input.

[ ]             Enclose optional command arguments.

()              Following function/subroutine names, surround the
                arguments, or are empty if the function has no
                arguments. Following IRIX commands, surround the
                manual page section in which the command is described.

|               Separates two or more optional items.

...             Indicates that the preceding optional items can appear
                more than once in succession.

#               IRIX shell prompt for the superuser.

%               IRIX shell prompt for users other than superuser.

Here is an example illustrating the syntax conventions.

C*$*[NO]IPA [(name [,name...])] {HERE|ROUTINE|GLOBAL}

The previous syntax statement indicates that:

•The keyword C*$* NOIPA or C*$*IPA must be written as shown.

•You can specify one or more names, separated by commas and enclosed in

parentheses.

•You must specify one of the following: HERE, ROUTINE, or GLOBAL.

The following statements are valid examples of the described syntax:

C*$* IPA(ALPHA,BETA) HERE

C*$* NOIPA GLOBAL

Chapter 1

1. Overview of PFA

This chapter contains the following sections:

•“Overview” describes how PFA operates and suggests procedures for

using it.

•“Strategy for Using PFA” explains when and how to use PFA.

•“Command Line Options” lists and describes the command line

options.

•“Directives” explains what a directive is and lists the supported

directives.

•“Assertions” explains what an assertion is and lists the supported

assertions.

•“Summary” is a short summary of the capabilities of PFA.

Overview

PFA is a Fortran 77 source-to-source preprocessor that enables you to run

existing Fortran 77 programs efﬁciently on the Silicon Graphics POWER

Series™ multiprocessor systems. PFA analyzes a program and identifies

loops that do not contain data dependencies. Such loops are safe to execute

in parallel (concurrently). PFA automatically inserts special compiler

directives in a modiﬁed copy of the original source code. (PFA produces a

number of ﬁles containing code and other information you need to run a

program concurrently on multiple processors.)

Interpreting the PFA-generated compiler directives, the Silicon Graphics

Fortran 77 compiler can generate code to split loop processing across all the

available multiple processors. Because the directives inserted by PFA look

like standard Fortran 77 comment statements, PFA does not affect the

portability of the code to non–Silicon Graphics, Inc. (SGI), systems.

In addition, you do not need a multiprocessor system to develop under PFA

(although there is a slight performance loss when running multiprocessed

code on a single-processor system). You can develop and test a Fortran 77

program using PFA on any IRIS-4D™ Series workstation (including

single-processor systems) and then execute the program on a multiprocessor

system. The executable code automatically adjusts itself to use all the

processors available on the workstation at run time. (You can also manually

specify the number of processors to use; see the Fortran 77 Programmer’s

Guide.) However, simply passing code through PFA rarely produces all the

increased performance available. There are often easily removed data

dependencies that prevent PFA from running a loop in parallel. Using the

listing ﬁle, optionally generated by PFA, you can ﬁnd the real or potential

data dependencies that prevented PFA from running a loop in parallel. Refer

to Chapter 3, “Utilizing PFA Output,” for details about the listing ﬁle.

If the data dependency is real, you can often remove the dependency by

making a small change to the code. If the data dependency was apparent but

not real, you can explicitly instruct PFA to run the code in parallel by

inserting PFA assertions. These assertions look like Fortran 77 comments.
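
As an illustration of the kind of small change that removes a real dependency, the following sketch rewrites a loop whose index variable is carried from one iteration to the next. The subroutine names BEFORE and AFTER and the variable INDX are hypothetical and chosen only for this example; whether PFA reports this particular pattern is not stated here.

```fortran
C     Before: INDX is carried from one iteration to the next, so
C     iteration I needs the value left by iteration I-1.
      SUBROUTINE BEFORE(A, B, C, N)
      INTEGER N, I, INDX
      REAL A(N), B(N), C(*)
      INDX = 0
      DO 10 I = 1, N
         INDX = INDX + I
         A(I) = B(I) + C(INDX)
   10 CONTINUE
      END

C     After: INDX is computed from I alone, so every iteration is
C     self-contained. The sum 1+2+...+I equals I*(I+1)/2.
      SUBROUTINE AFTER(A, B, C, N)
      INTEGER N, I, INDX
      REAL A(N), B(N), C(*)
      DO 20 I = 1, N
         INDX = (I*(I+1))/2
         A(I) = B(I) + C(INDX)
   20 CONTINUE
      END
```

Both versions read the same elements of C. In the second version, INDX holds a value that is private to each iteration, so nothing is passed from one iteration to the next.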

With PFA, you select the code to convert to run in parallel. Thus, you can

convert the whole program or key parts of it by adding PFA directives

manually or by having PFA convert only selected ﬁles. In addition, you can

run PFA on some, all, or none of a program’s source ﬁles. The object ﬁles

produced using PFA are fully compatible with other object ﬁles. You can

freely combine them with object ﬁles that you prepared manually for

parallel execution and with object ﬁles that run only serially.

Strategy for Using PFA

Use PFA to identify which loops of a Fortran 77 program can be run safely

in parallel. In some instances, PFA alone makes a signiﬁcant amount of the

code run in parallel. However, for many programs simple code changes let

PFA automatically run more of the code in parallel.

Knowing when and where to modify your code means understanding the

information in the PFA listing. Understanding the PFA listing will make it

easy to recognize where small changes to the code can make big differences

in how much code can run in parallel. Refer to Chapter 3, “Utilizing PFA

Output,” for information.

PFA analyzes a program for data dependence. During this analysis, PFA

looks for Fortran 77 DO loops in which each iteration of the loop is

independent of all other iterations. If each iteration of the loop is

self-contained, the system can execute the iterations in any order (or even

simultaneously on separate processors) and produce the same result after

running all iterations.

When PFA ﬁnds a loop with data independence, PFA knows it can safely run

the loop in parallel. When PFA ﬁnds a loop that contains iterations that are

dependent on other iterations, it cannot safely run the loop in parallel but

can tell you what is causing the problem. If PFA cannot run the loop in

parallel, the listing ﬁle will explain where PFA encountered problems.
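
The difference between the two cases can be sketched as follows. The subroutine name LOOPS is hypothetical; the first loop is the same form as the sample program used later in this guide, and the second is a simple recurrence.

```fortran
      SUBROUTINE LOOPS(A, B, C, N)
      INTEGER N, I
      REAL A(N), B(N), C(N)
C     Independent: iteration I reads and writes only element I,
C     so the iterations can run in any order.
      DO 10 I = 1, N
         A(I) = B(I) + C(I)
   10 CONTINUE
C     Dependent: iteration I reads A(I-1), which iteration I-1
C     wrote, so the iterations must run in order.
      DO 20 I = 2, N
         A(I) = A(I-1) + B(I)
   20 CONTINUE
      END
```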

Command Line Options

To customize the way PFA executes an entire program, you can specify

various command line options when you run PFA directly or when you

specify PFA as part of a compile (Chapter 2, “How to Use PFA,” explains

both procedures). The ﬁve functional categories of command line options are

•parallel execution

•general optimization

•Fortran 77 language control

•directive control

•listing

Chapter 4, “Customizing PFA Execution,” explains when and how to use the

various options, and Appendix A, “PFA Command Line Options,” provides

a complete summary.

Directives

PFA directives enable, disable, or modify a feature of PFA. Essentially,

directives are command line options speciﬁed within the input ﬁle instead

of on the command line. Unlike command line options, directives have no

default setting. To invoke a directive, you must either toggle the directive on

or set a desired value for its level.

PFA directives allow you to specify PFA options in addition to, or instead of,

command line options. Directives placed on the ﬁrst line of the input ﬁle are

called global directives. PFA interprets them as if they appear at the top of each

program unit in the ﬁle. Use global directives to ensure that the program is

compiled with the correct command line options. Directives appearing

anywhere else in the ﬁle apply only until the end of the current program

unit. PFA resets the value of the directive to the global value at the start of

the next program unit. (Set the global value using a command line option or

a global directive.)

Some command line options act like global directives. Other command line

options override directives. Many PFA directives have corresponding

command line options. If you specify conﬂicting settings in the command

line and a directive, PFA chooses the most restrictive setting. For Boolean

options, if either the directive or the command line has the option turned off,

it is considered off. For options that require a numeric value, PFA uses the

minimum of the command line setting and the directive setting.
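
The following sketch shows how these rules could combine for one numeric option. It assumes a hypothetical source file prog.f compiled with the command shown in the comment; the subroutine names FIRST and SECOND are also hypothetical. The comments restate the rules above and are not PFA output.

```fortran
C*$*UNROLL(4)
C     The directive above is on the first line of the file, so PFA
C     treats it as global, as if it appeared at the top of both
C     program units below.
C
C     Assumed compile command (hypothetical file name prog.f):
C        f77 -pfa -WK,-UNROLL=8 prog.f
C     The command line asks for 8 and the directive for 4. By the
C     minimum rule PFA uses 4.
      SUBROUTINE FIRST(A, N)
      INTEGER N, I
      REAL A(N)
C*$*UNROLL(2)
C     This directive is not on the first line, so it applies only
C     until the END of FIRST.
      DO 10 I = 1, N
         A(I) = 2.0 * A(I)
   10 CONTINUE
      END

      SUBROUTINE SECOND(A, N)
      INTEGER N, I
      REAL A(N)
C     At the start of SECOND the setting is reset to the global
C     value given on the first line of the file.
      DO 20 I = 1, N
         A(I) = A(I) + 1.0
   20 CONTINUE
      END
```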

Table 1-1 lists the directives supported by PFA. In addition to the standard

directives, PFA supports the Cray™ and VAST™ directives listed in the table.

PFA maps these directives to corresponding PFA assertions. Refer to

Chapter 5, “Fine-Tuning PFA,” for details.

Refer to Appendix B, “PFA Directives,” for a list and description of PFA

directives.

Table 1-1 PFA Directives

Standard                  Cray                 VAST
C*$*ARCLIMIT(n)           CDIR$ NEXT SCALAR    CVD$ CONCUR
C*$*CONCURRENTIZE         CDIR$ IVDEP          CVD$ LSTVAL
C*$*INLINE                                     CVD$ NOLSTVAL
C*$*IPA
C*$*LIMIT(n)
C*$*MINCONCURRENT(n)
C*$*NOCONCURRENTIZE
C*$*NOINLINE
C*$*NOIPA
C*$*OPTIMIZE(n)
C*$*ROUNDOFF(n)
C*$*SCALAR OPTIMIZE(n)
C*$*UNROLL(n)
C*$*UNROLL(n,m)
C$DOACROSS
C$&

Assertions

Assertions provide PFA with additional information about the source

program. Sometimes assertions can improve optimization results. Use them

only when speed is essential.

Because PFA does not check the correctness of assertions, they can be unsafe.

If you specify an incorrect assertion, the PFA-generated code might give

different answers from the scalar program. If you suspect unsafe assertions

are causing problems, use the -NODIRECTIVES command line option or the

C*$* NO ASSERTIONS directive to tell PFA to ignore all assertions.

As with a directive, PFA treats an assertion as a global assertion if it comes

before all comments and statements in the ﬁle. That is, PFA treats the

assertion as if it were repeated at the top of each program unit in the ﬁle.

C*$* ASSERT RELATION (name .xx. name) assertions include variable

names. If you specify them as global assertions, a program uses them only

when those variable names appear in COMMON blocks or are dummy

argument names to the subprogram. You cannot use global assertions to

make relational assertions about variables that are local to a subprogram.

Many assertions, like directives, are active until the end of the program unit

(or ﬁle) or until you reset them. Other assertions are valid only for the DO

loop before which they appear (such as C*$* ASSERT DO PREFER

(CONCURRENT)). This type of assertion applies to the next DO loop but

not to any loop nested inside it.
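
A minimal sketch of the scope of a next-loop assertion follows. The subroutine name ADDMAT is hypothetical; the assertion is the one named above, placed immediately before the outer DO statement.

```fortran
      SUBROUTINE ADDMAT(A, B, N, M)
      INTEGER N, M, I, J
      REAL A(N,M), B(N,M)
C*$* ASSERT DO PREFER (CONCURRENT)
      DO 20 J = 1, M
C        The assertion above covers the J loop only. It does not
C        apply to the I loop nested inside it.
         DO 10 I = 1, N
            A(I,J) = A(I,J) + B(I,J)
   10    CONTINUE
   20 CONTINUE
      END
```

To state a preference for the inner loop as well, a second assertion would have to be placed before the inner DO statement.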

Table 1-2 lists PFA assertions and their duration.

Summary

PFA provides information about the dependencies of loops in a Fortran 77

program. Often, PFA can use the information to run loops in parallel

automatically. But when PFA is not able to convert the code for parallel

execution automatically, it can tell you where it ran into problems. Often,

you need only make a small change to remove the dependencies that

prevent the loop from running in parallel. The better you understand the

information PFA gives you, the better equipped you will be to transform the

program into an efﬁcient parallel version.

For more information about parallel processing in general, see Chapter 5 in

the Fortran 77 Programmer’s Guide. Especially recommended are the sections

“Analyzing Data Dependencies for Multiprocessing” and “Breaking Data

Dependencies” for information about recognizing and repairing data

dependency problems.

Table 1-2 PFA Assertions and Their Duration

Assertion                                  Duration
C*$* ASSERT DO (SERIAL)                    Next Loop
C*$* ASSERT DO (CONCURRENT)                Next Loop
C*$* ASSERT DO PREFER (SERIAL)             Next Loop
C*$* ASSERT DO PREFER (CONCURRENT)         Next Loop
C*$* ASSERT [NO] EQUIVALENCE HAZARD        Until Reset
C*$* ASSERT [NO] ARGUMENT ALIASING         Until Reset
C*$* ASSERT RELATION (name .xx. name)      Next Loop
C*$* ASSERT CONCURRENT CALL                Next Loop
C*$* ASSERT NO RECURRENCE                  Next Loop
C*$* ASSERT PERMUTATION (name)             Next Loop

Chapter 2

2. How to Use PFA

This chapter contains the following sections:

•“Overview” describes how to prepare for using PFA.

•“Compiling Programs With PFA” explains how to run PFA as part of a

Fortran compile.

•“Using PFA Directly” explains how to run PFA independent of the

Fortran driver.

Overview

Simply running a program through PFA might buy you some improved

performance, but you can get far more if you understand the PFA listing.

From the listing, you can often identify small problems that prevent a loop

from running safely in parallel. With a relatively small amount of work, you

can remove these data dependencies and dramatically improve the

program’s performance.

When trying to ﬁnd loops to run in parallel, focus your efforts on the areas

of the code that use the bulk of the run time. Spending time trying to run in

parallel a routine that uses only 1 percent of the run time of the program

cannot signiﬁcantly improve the performance of your program.

To determine where your code spends its time, take an execution proﬁle of

the program. Use either pc-sample proﬁling (through the -p option to f77(1))

or basic block proﬁling (through pixie(1)). Refer to Chapter 2, “Improving

Program Performance,” of the IRIS-4D Compiler Guide for details about

proﬁling.

There are two schools of thought about proﬁling: conservative and

optimistic. The conservative approach takes a proﬁle of the original

(nonparallel) job. You then run in parallel only the loops that account for

most of the run time. The more optimistic approach runs the entire program

through PFA and then proﬁles the resulting multiprocessed job. The

conservative approach reduces the chances that something might go wrong

because it makes fewer changes to the code. It also focuses on the smallest

number of lines of code that have the greatest effect.

Use the optimistic approach when you think that PFA will do a good job

with the existing program. You will save time by letting PFA do what it can.

You can then focus on those routines where PFA had a problem. One

situation in which PFA frequently does a good job is when you convert

programs that already run well on traditional vector architectures. Many

such programs run in parallel without additional effort.

Whichever approach you choose, use the proﬁle to focus your efforts on the

most time-consuming routines. Once you ﬁnd a time-consuming routine,

submit that routine alone to PFA. If the routine is in the middle of a large ﬁle,

consider using fsplit(1) to isolate the individual routine. Compile the routine

with the –pfa keep option, and examine the listing ﬁle. The PFA listing

identiﬁes the loops that PFA can and cannot run in parallel. For loops that

cannot run in parallel, the PFA listing also tells you why it could not convert

the loop for parallel execution.

Compiling Programs With PFA

The following is the command line syntax for compiling a Fortran 77

program with PFA. To pass command line options to PFA, add the –WK

option to the f77 command line. The f77 command invokes the

various processing phases that compile, optimize, assemble, and link edit

the program. For more information about the –WK option, see the f77(1)

manual page.

Syntax

f77 -pfa [{list|keep}] [-WK,-option[=value][,-option[=value]]...]
    [-pfaprepass,-option[=value][,-option[=value]]...] filename.f

where

–pfa Invokes the POWER Fortran Accelerator, pfa. Enables any

multiprocessing directives.

list Runs pfa and generates an annotated listing of the parts of

the program that can (and cannot) run in parallel on

multiple processors. The listing ﬁle has the sufﬁx .l.

keep Runs pfa, generates the listing ﬁle (.l), and saves the

intermediate transformed Fortran 77 program. The

intermediate file has the suffix .m.

–WK Passes the speciﬁed command line options to PFA. Do not

enter spaces between -WK and any of the hyphens, options,

equal signs, and values that follow it.

–option Speciﬁes a PFA command line option listed in Table 2-1, for

example, -IGNOREOPTIONS.

value Speciﬁes a value for a command line option, for example,

10.

–pfaprepass Passes the code through PFA an extra time. The ﬁrst time

through (the prepass), PFA uses the options speciﬁed in the

–pfaprepass option but does not insert C$DOACROSS

directives. The output of this operation is then passed back

through PFA, using the options speciﬁed in the –WK

option. Only rarely should you need to use this option, and

there is good reason to avoid it. Normally, PFA does all it

can in a single run-through. In rare circumstances an extra

pass can be beneﬁcial. However, the PFA algorithms do not

necessarily converge, and multiple passes over the code can

change it for the worse. The syntax of this option is the same

as the -WK option.

ﬁlename.f Speciﬁes the Fortran 77 source program. The ﬁlename must

always use the .f sufﬁx.

Table 2-1 lists the PFA command line options. Although the table lists the

options in uppercase, you can specify them in lowercase as well.

Note: You can replace many of the PFA command line options listed in

Table 2-1 with in-code directives. For information on these directives, see

Chapter 5, “Fine-Tuning PFA,” and Appendix B, “PFA Directives.”

Table 2-1 PFA Command Line Options

Reference        Long Name                   Short Name   Default Value
Parallelization  [NO]CONCURRENTIZE           [N]CONC      CONCURRENTIZE
                 MINCONCURRENT=n             MC=n         MINCONCURRENT=500
Optimization     ARCLIMIT=n                  ARCLM=n      ARCLIMIT=5000
                 LIMIT=n                     LM=n         LIMIT=20000
                 OPTIMIZE=n                  O=n          OPTIMIZE=5
                 ROUNDOFF=n                  R=n          ROUNDOFF=0
                 SCALAROPT=n                 SO=n         SCALAROPT=3
                 UNROLL=n                    UR=n         UNROLL=4
                 UNROLL2=n                   UR2=n        UNROLL2=100
Fortran 77       ASSUME=list                 AS=list      ASSUME=EL
Language         [NO]DLINES                  [N]DL        NODLINES
Control          [NO]ONETRIP                 [N]1         NOONETRIP
                 SAVE=c                      SV=c         SAVE=A
                 SCAN=n                      SCAN=n       SCAN=72
                 SYNTAX=c                    SY=c         (option off)

Table 2-1 (continued) PFA Command Line Options

Reference        Long Name                   Short Name   Default Value
Inlining and     INLINE[=list]                            (option off)
Interprocedural  IPA[=names]                 IPA          (option off)
Analysis         INLINE_CREATE=name          INCR=name
                 IPA_CREATE=name             IPACR=name
                 INLINE_FROM_FILES=list      INFF=list
                 IPA_FROM_FILES=list         IPAFF=list
                 INLINE_FROM_LIBRARIES=list  INFL=list
                 IPA_FROM_LIBRARIES=list     IPAFL=list
                 INLINE_LOOP_LEVEL=n         INLL=n       INLL=10
                 IPA_LOOP_LEVEL=n            IPALL=n      IPALL=10
                 INLINE_MAN                  INM          (option off)
                 IPA_MAN                     IPAM         (option off)
                 INLINE_DEPTH                IND          IND=10
Directives       [NO]DIRECTIVES=list         [N]DR=list   DIRECTIVES=AKSV
I/O              INPUT=file.f                file.f       file.f
                 [NO]FORTRAN=file            [N]F=file    F=file.m
                 [NO]LIST=file               [N]L=file    L=file.l
Listing          LINES=n                     LN=n         LINES=55
                 LISTOPTIONS=list            LO=list      LISTOPTIONS=OL
                 SUPPRESS=list               SU=list      (option off)
Obsolete         CREATE                                   (option off)
                 LIBRARY=file                LIB=file
                 [NO]EXPAND=list             EX=list
                 LIMIT2=n                    LM2=n        LM2=5000

Example

To compile the Fortran 77 program prog.f with PFA and the -UNROLL=8

option, enter

% f77 -pfa -WK,-UNROLL=8 prog.f

Figure 2-1 shows what happens when you compile a Fortran 77 program

with PFA. The ﬁrst pass invokes the macro preprocessor cpp to handle cpp

directives. (For more information, see the cpp(1) manual page.) PFA then

takes the cpp output and inserts code that runs data-independent loops in

parallel. PFA can also generate a listing ﬁle (with the .l sufﬁx) and an

intermediate ﬁle (with the .m sufﬁx). For details, refer to Chapter 3,

“Utilizing PFA Output.”

Finally, the Fortran 77 compiler, f77, compiles the transformed

PFA-generated ﬁle to produce an object ﬁle.

Figure 2-1 Compiling With PFA

[Figure: the Fortran 77 source (.f) passes through cpp, then PFA, then f77, which produces the object file (.o). PFA also produces the listing file (.l) and the intermediate file (.m).]

Using PFA Directly

Although you normally run PFA as part of an f77 compile, the two instances

when you should run PFA directly are

•When creating an inlining or IPA library (refer to Chapter 4,

“Customizing PFA Execution”)

•If you want to “capture” the output of PFA and review it to determine

further optimizations

Running the pfa(1) command directly, using the following syntax, produces

both the .m and the .l ﬁles.

Syntax

/usr/lib/pfa [-option [-option]...] ﬁlename.f

where

-option Speciﬁes a PFA command line option listed in Table 2-1, for

example, -INLINE.

ﬁlename.f Speciﬁes the Fortran 77 source program. The ﬁlename must

have the .f sufﬁx.

Example

The following command runs PFA directly using the -unroll and -roundoff

options:

% /usr/lib/pfa -ur=4 -r=2 sample.f

Chapter 3

3. Utilizing PFA Output

This chapter contains the following sections:

•“Overview” discusses the PFA output ﬁles and provides examples of

them.

•“Formatting the Listing File” explains how to change the format of the

standard listing ﬁle.

•“Interpreting Default Listing Information” explains the contents of the

listing ﬁle.

•“Sample Listing Files” provides sample listing ﬁles along with an

interpretation of each.

Overview

PFA generates two ﬁles, a listing ﬁle (.l) and an intermediate ﬁle (.m).

Invoking PFA as part of a Fortran compilation produces a line-numbered

listing file when you use the -pfa list option. If you specify the -pfa keep option,

PFA produces both the numbered listing ﬁle and the intermediate ﬁle. PFA

automatically produces both ﬁles when you invoke it directly. (For details

about invoking PFA, refer to Chapter 2, “How to Use PFA.”)

For example, consider the following program, sample.f:

      subroutine sample (a,b,c)
      dimension a(1000),b(1000),c(1000)
      do 10 i = 1, 1000
   10 a(i) = b(i) + c(i)
      end

Compiling sample.f as follows

% f77 -pfa keep sample.f

generates the following listing ﬁle, sample.l:

Actions   Do Loops   Line
DIR                     1   # 1 "sample.f"
                        2         subroutine sample(a,b,c)
                        3         dimension a(1000),b(1000),c(1000)
C         +--------     4         do 10 i = 1,1000
          *_______      5      10 a(i) = b(i) + c(i)
                        6         end

Abbreviations Used
  DIR   directive
  C     concurrentized

Loop Summary
          From   To     Loop     Loop
  Loop#   line   line   label    index   Status
  1       4      5      DO 10    I       concurrentized

and the intermediate ﬁle, sample.m:

# 1 "sample.f"
      subroutine sample(a,b,c)
      DIMENSION A(1000), B(1000), C(1000)
# 3 "sample.f"
C$DOACROSS SHARE(A,B,C),LOCAL(I)
# 3 "sample.f"
      DO 2 I=1,1000
# 4 "sample.f"
        A(I) = B(I) + C(I)
# 4 "sample.f"
    2 CONTINUE
      end

PFA placed a C before the ﬁrst statement of the DO loop in the listing ﬁle,

sample.l. The Abbreviations Used table shows that C stands for

“concurrentized,” which means that PFA determined that it can safely run

the loop in parallel. The Loop Summary table at the bottom of sample.l

shows that the status of the loop is concurrentized.

PFA inserted the statement starting with C$DOACROSS before the DO

statement in the intermediate ﬁle, sample.m. The Fortran 77 compiler

directive C$DOACROSS tells f77 that the next DO loop can run in parallel.

The phrase SHARE (A,B,C) informs the Fortran 77 compiler that all

processes that execute the DO loop share the arrays A, B, and C. The phrase

LOCAL(I) indicates that every process executing the DO loop keeps a local

variable I. The lines of the form # 4 "sample.f" are called line number

directives. They relate the transformed source back to the original source.

Note: The ﬁrst line number directive appears in the listing because it was

actually added by cpp before PFA ran.
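
The role of LOCAL is easier to see in a loop that uses a scalar temporary. The following is a hand-written sketch, not PFA output: the subroutine name HYPOT2 and the temporary T are hypothetical, and the directive uses only the SHARE and LOCAL phrases described above.

```fortran
      SUBROUTINE HYPOT2(A, B, C, N)
      INTEGER N, I
      REAL A(N), B(N), C(N), T
C     A, B, C, and N are the same for every process, so they are
C     shared. I and T hold per-iteration values, so each process
C     keeps its own copy.
C$DOACROSS SHARE(A,B,C,N), LOCAL(I,T)
      DO 10 I = 1, N
         T = B(I)*B(I) + C(I)*C(I)
         A(I) = SQRT(T)
   10 CONTINUE
      END
```

If T were shared instead, one process could overwrite it between the assignment and the use in another process, which is the kind of dependency that prevents safe parallel execution.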

Formatting the Listing File

You customize a PFA listing ﬁle by

•paginating the listing

•selecting the information to be printed

•disabling speciﬁc message classes

Paginating the Listing

The -LINES=n option (or -LN=n) paginates the listing for printing. Use this

to change the number of lines per page. Specifying -LINES=0 paginates at

subroutine boundaries.

If you do not specify the -LINES option, PFA prints 55 lines per page.

Specifying Information to Include

The -LISTOPTIONS=list option (or -LO=list) speciﬁes the information to

include in the listing ﬁle (.l), where list is any combination of the options in

Table 3-1.

Table 3-1 Listing File Include Options

Value   Produces
C       Calling tree at the end of the program listing.
I       Transformed program file annotated with line numbers in the
        source program. Error messages and debugging information can
        refer to the original source rather than the transformed source.
        Running PFA as part of an f77 compile automatically adds this
        option.
K       Printout of the PFA options used at the end of each program unit.
L       Loop-by-loop optimization table.
N       Program unit names, as processed, to the standard error file. This
        option is added automatically as part of an f77 -v compilation.
O       Annotated listing of the original program.
P       Processing performance statistics.
S       Summary of optimization performed.
T       Annotated listing of the transformed program.

Disabling Message Classes

Use the -SUPPRESS=list option (or -SU=list) to disable individual classes of

PFA messages that are normally included in the listing (.l) ﬁle. These

messages range from syntax warnings and error messages to messages

about the optimizations performed. list is any combination of the options in

Table 3-2.

If you do not specify this option, PFA prints messages of all classes.

Interpreting Default Listing Information

Knowing when and where to modify your code means understanding the

information in the PFA listing. This understanding allows you to recognize

where small changes to the source code will make a big difference in how

much code is run in parallel. The PFA-generated listing file lists the

optimizations PFA made to the code. For example, a message could say that,

although three loops could have run in parallel, PFA converted only the one

it determined most proﬁtable.

This section explains how to view the listing ﬁle online and then lists and

describes the various ﬁelds.

Table 3-2 Listing File Message Disabling Options

Value   Message Class Disabled
D       Data dependence
E       Syntax error
I       Information
N       Unable to run loop in parallel
Q       Questions
S       Standard messages
W       Warning of syntax error (PFA adds the -SUPPRESS=W option
        automatically if you use the -w option to f77)

Viewing the Listing File

The listing ﬁle is in 132-column format. To view the ﬁle, open a window with

132 columns and 40 rows by entering

% wsh -s132,40

Field Descriptions

This section explains the contents of the .l ﬁle when you use the default

values for the -LISTOPTIONS command line option (that is, O and L).

A default PFA ﬁle listing includes

•line numbers

•DO loop markings

•footnotes

•syntax errors/warning messages

•action summary

Line Numbers

A statement in the PFA listing labeled with a line number, such as 21, is the

same as line 21 from the original program or has been derived from that line.

These line numbers are useful when inspecting the PFA-transformed

program listing and when debugging. PFA sometimes generates several

lines of code from a single line of the original program; in this case, each new

line of code is labeled with the same number as the line of the original

program from which it was generated. Consequently, many lines of the

PFA-transformed program listing carry the same number because they are

related to one line of the original program listing.

DO Loop Marking

The listing ﬁle displays DO loops graphically in a column headed DO

Loops. PFA surrounds each DO loop (up to nest level 10) with a loop

delimiter character. Each character listed in Table 3-3 has a specific meaning.

A statement contained within n DO loops has n of these loop delimiters on

that line.

For example,

DO Loops   Line
+-------   173         DO 100 M=2,MAX(MFLD,2)
|          174            IADR = ISECT(M)
|          175            IADR1= ISECT(M-1)
|          176            PNM(IADR)=(ANM(IADR) *PNM(IADR1))
|_______   177     100    PPNM(IADR)= -(ANM(IADR) *PNM(IADR1))

Footnotes

PFA uses the footnotes listing to give important details concerning its

actions. PFA numbers and prints the footnotes at the bottom of each

program unit under the Footnote List heading. References to the footnotes

are displayed in the listing under the Footnotes column. For example, this

listing line refers to footnote 13:

13 DD 1790 IF (B(I) .LE. 6) IB(J*I) = I+J

The footnote itself appears under Footnote List at the end of the program unit:

13: data dependence Data dependence involving this line due to variable IB.

Table 3-3 Listing File DO Loop Delimiters

Character   Denotes
|           Generic DO loop
*           PFA can run loop in parallel
!           Syntax error

In this example, 13 is the footnote number, DD (data dependence) is the

explanation for PFA’s action, and 1790 is the line number of the IF

statement in the original source.

Syntax Errors/Warning Messages

When a program has syntax errors, the listing ﬁle describes the error next to

the lines that start with the symbol ### in the Footnotes column. These

messages are also printed to stderr, which will usually be your terminal.

For example,

Footnotes  Actions  DO Loops  Line
                                 1         SUBROUTINE Z(A,B,N)
                                 2         REAL A(N), B(N)
                    +-------     3         DO 20 I=1,N
                    !            4            X=A(I)
                    !            5            Y=B(I)
                    !______      6   20       C(I)=X+Y
### line (6)
### error Array not declared or statement function declared
          after executable statements.
### error A do loop ends on a non-executable statement.
                                 7         PRINT *,X
                                 8         END
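
One possible correction of the subroutine in this listing is sketched below. It assumes that C was meant to be an array passed in as a dummy argument; the original program might instead have intended C to come from a COMMON block, so the extra argument is an assumption made only for this example.

```fortran
      SUBROUTINE Z(A,B,C,N)
      INTEGER N, I
C     Declaring C as an array lets line 6 be read as an assignment
C     to an array element rather than as a misplaced statement
C     function.
      REAL A(N), B(N), C(N)
      REAL X, Y
      DO 20 I=1,N
         X=A(I)
         Y=B(I)
   20 C(I)=X+Y
      PRINT *,X
      END
```

With C declared, the statement labeled 20 is an ordinary executable assignment, so both error messages shown in the listing would no longer apply.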

Action Summary

When PFA translates or modiﬁes a statement, it uses abbreviations in the

Actions column of the listing ﬁle to identify the statements. PFA lists an

abbreviated explanation of its actions at the bottom of the listing. For the

DIR and V classes, the class itself serves as the message and no detailed

messages follow. All other classes have associated messages.

Table 3-4 lists and explains the values that can appear in the Actions column.

Table 3-4 PFA Action Abbreviations

Value Meaning

DD (Data Dependence) Indicates that data dependence prevented PFA from

running this statement in parallel.

DIR (Directive) Used in conjunction with the footnotes and concerns

compiler directives. If you code a compiler directive and that line does

not have the DIR abbreviation in the listing, PFA will not recognize the

directive. Check the setting of the -DIRECTIVES command line option

and the syntax of the directive.

E (Error) Indicates syntax errors. These messages can refer to missing or

extra characters, illegal keywords, or text placed in the wrong column.

PFA cannot do anything with such code. The intermediate (.m) ﬁle

contains a copy of this program unit that PFA has not modiﬁed.

EX (Extension) Shows where a construct in the original program is not

allowed in the language PFA produces. In some cases, an operation or

type is allowed in the input language but not in the output language.

INF (Information) Provides noncritical information.

I (Insertion) Indicates that PFA added a statement.

LR (Loop Reordering) Indicates that PFA has modiﬁed a Fortran 77

statement in the process of interchanging loops. If during optimization

PFA ascertains that an outer loop would be more efﬁcient as an inner

loop, and it can legally reorder the loops, PFA places the outer loop

inside. In the process of this reordering, PFA might have to change loop

bounds (for triangular loops), distribute loops, or ﬂoat IF assignments.

Only the statements modiﬁed for the exchange are marked.

MIS (Miscellaneous) Indicates that some PFA information has been lost. This

message does not always mean that something is wrong with the

program.

NX (Nonconcurrent Statement) Indicates that PFA did not try or was unable

to run the statement in parallel. For example, when a subroutine call is

involved in a loop, PFA generates this message.

NO (Program Too Large—Not Optimized) Indicates that the program unit

being processed is too large for PFA to optimize, because of PFA’s data

structure size limitations. When PFA optimizes programs, it adds

statements that might also overﬂow the ﬁxed-size tables. In either case,

PFA stops optimization and passes the original program to the

intermediate (.m) ﬁle, informing you of this action. For PFA to process

the unit, you must split the program into smaller sections.

OE (Option Error) Indicates a syntax error in a PFA option. This error does

not stop processing of a program unit.

OTF (Output Translation Failure) Marks statements that have constructs that

exist in the input language but that cannot be represented in the output

language.

Q (Question) Indicates that PFA tried to optimize a loop nest but

discovered a data dependence it could not break at compile time without

further information. You can usually answer this question with an

appropriate assertion.

SO (Scalar Optimization) Marks places in the transformed listing where PFA

has optimized a scalar loop.

STD (Standardized) Marks where PFA changed a program to improve the

chance of ﬁnding code that it can optimize. This is often a conversion

from an IF/GOTO into a block IF, loop rerolling, and conversion of an

IF loop to a DO loop.

TE (Translator Error) Indicates an internal PFA error. PFA writes the

notiﬁcation to the standard error ﬁle and writes a trace back to the output

ﬁle. Notify SGI if you see this sort of bug (so it can be corrected) and, if

possible, send SGI the code that caused the trace back as well as the trace

back itself. If you can reproduce the error in a small program unit, send

that small program unit as well.

W (Warning) Contains syntax warnings.

Sample Listing Files

This section contains a few simple examples of Fortran code and the

corresponding PFA output. An actual source program would be much

larger, and a single loop could contain several of the cases illustrated here.

However, even in a large loop, you can deal with each problem individually.

Indirect Indexing

PFA cannot determine if it can run a loop in parallel when the code uses

indirect indexing. A loop is indirectly indexed when it uses the value from

some auxiliary array as the index value rather than the DO loop variable.

The Fortran 77 code

subroutine foo2(w,b,index,n)

real w(n), b(n)

integer index(n)

do i = 1, n

w(index(i)) = w(index(i)) + b(i)

enddo

end

when submitted to PFA, results in the listing ﬁle

12 subroutine foo2(w,b,index,n)

13 real w(n), b(n)

14 integer index(n)

1 Q +------- 16 do i = 1, n

2 DD ! 17 w(index(i)) = w(index(i)) + b(i)

!_______ 18 enddo

19 end

Abbreviations Used

DD data dependence

Q question

Footnote List

1: question Is INDEX a permutation vector?

2: data dependence Data dependence involving this line due

to variable W.

DO Loop Summary

loop#  from  to  DO label  index  workload  status
1      16    18  DO        I                dependencies prevent parallelism

DD in the Actions column on line 17 of the listing warns that the variable w

might carry a dependency. A dependency exists when one iteration of the

loop writes to a location that is used by a different iteration of the loop. In

this example, if the values of index(i) are ever the same for different values

of i, then different iterations might use the same location in w. Therefore, this

code contains a possible data dependence.

If you can guarantee that the values of index(i) are always different for each

value of i, then there is no dependence (each iteration uses a different

location in w). Question one on the Footnote List asks if index(i) is different

for every value of i. A permutation vector is a list of numbers, each of which

is different from the others. If you know that index is a permutation vector,

then the loop is data-independent. An example of a permutation vector is a

list of objects in which each object appears exactly once.
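
Because PFA does not verify assertions, it can be worth checking the property yourself while debugging. The following sketch shows one way to test it at run time. The function name isperm and the caller-supplied work array seen are hypothetical and not part of PFA, and the check makes the extra assumption that the values of index lie in the range 1 to n.

```fortran
      logical function isperm(index, n, seen)
c     Hypothetical helper (not part of PFA): returns .true. when
c     index(1..n) holds each value in 1..n exactly once.
c     seen is a work array supplied by the caller.
      integer n, index(n), i
      logical seen(n)
      do 10 i = 1, n
         seen(i) = .false.
   10 continue
      isperm = .true.
      do 20 i = 1, n
c        A value outside 1..n cannot be checked against seen.
         if (index(i) .lt. 1 .or. index(i) .gt. n) then
            isperm = .false.
            return
         endif
c        A repeated value means two iterations touch one element.
         if (seen(index(i))) then
            isperm = .false.
            return
         endif
         seen(index(i)) = .true.
   20 continue
      end
```

If isperm returns .false. for any input your program can receive, the loop carries a real dependence and the permutation assertion should not be used.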

Explicitly state that index is a permutation vector by adding an assertion in

the source

subroutine foo2(a,b,index,n)

real a(n), b(n)

integer index(n)

c*$*assert permutation (index)

do i = 1, n

a(index(i)) = a(index(i)) + b(i)

enddo

end

Now the listing file shows that PFA finds the loop safe to run in parallel

(indicated by the * DO loop delimiter)

Actions DO Loops Line

DIR 1 # 1 “foo2.f”

2 subroutine foo2(a,b,index,n)

3 real a(n), b(n)

4 integer index(n)

DIR 6 c*$*assert permutation (index)

C +------ 7 do i= 1, n

* 8 a(index(i)) = a(index(i)) + b(i)

*______ 9 enddo

10 end

Abbreviations Used

DIR directive

C concurrentized

Loop Summary

From To Loop Loop

Loop# line line label index Status

1 7 9 Do I concurrentized

Note: As with all assertions, PFA does not verify the truth of this assertion.

When you make an assertion, be certain that the assertion is always true for

all possible input data.

Function Call

This example shows what happens when a loop contains a call to an external

routine. The Fortran 77 code

subroutine foo3 (a,b,c,n)

real a(n), b(n), c(n)

external force

do i = 1, n

a(i) = force (b(i), c(i))

enddo

end

generates the listing

Actions DO Loops Line

DIR 1 # 1 “foo3.f”

2 subroutine foo3(a,b,c,n)

3 real a(n), b(n), c(n)

4 external force

NCS +------ 6 do i = 1, n

NO NCS ! 7 a(i) = force(b(i), c(i))

!______ 8 enddo

9 end

Abbreviations Used

NO not optimized

DIR directive

NCS non-concurrent-stmt

Footnote List

1: not optimized No optimizable statements found.

2: not optimized Unoptimizable call to “FORCE” found.

Loop Summary

       From  To    Loop   Loop
Loop#  line  line  label  index  Status
1      6     8     Do     I      unoptimizable call (FORCE)

Calling the function force prevents PFA from automatically running the loop

in parallel. PFA identiﬁes the function call as a non-concurrent-stmt. By its

nature, a nonconcurrent statement prevents PFA from assuming the loop is

safe to run in parallel because PFA cannot see into the routine to look for data

dependencies.

If you know that force generates no data dependencies, then explicitly state

this fact for the nonconcurrent statement

subroutine foo3(a,b,c,n)

real a(n), b(n), c(n)

external force

c*$*assert concurrent call

do i = 1, n

a(i) = force(b(i), c(i))

enddo

end

Now that PFA knows that the nonconcurrent statement involves no data

dependency, PFA will ﬁnd the loop safe to run in parallel.

There is one subtlety in using the concurrent call assertion. When you use

this assertion, PFA makes no attempt to examine the called routine; it simply

assumes that it is safe. However, PFA is still left with the problem of correctly

declaring the variables in the loop to be either SHARE or LOCAL. (PFA does

the best it can, but it can sometimes be fooled.) For example,

subroutine tricky (a,b,c,n,m)

real a(*), b(*)

external my_function

c*$*assert concurrent call

do i = 1, n

a(i) = my_function (b(i), m)

b(i) = a(i) + m

enddo

m = 0

end

The question is whether the variable m should be SHARE or LOCAL. If the

routine my_function only reads the old value of m, then it should be

SHARE. If my_function writes a new value of m, then it should be LOCAL.

In the absence of any more clues, PFA must go by what it can see; and what

it can see is that within the loop, there are no visible assignments to m, and

so PFA will declare it to be SHARE. If in fact my_function is writing the

value of m, then this is incorrect. In this case, to give PFA the hint it needs,

add a visible assignment to m at the top of the loop.

For example, consider the following code:

do i = 1, n

m = 0

a(i) = my_function(b(i), m)

b(i) = a(i) + m

enddo

Here, PFA can see an assignment to m and so will declare it to be LOCAL.

Note that if my_function both reads the old value of m and writes a new
value, then it is not legal to run the loop in parallel at all, because one
iteration depends on the value of m left by another.
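
To make the SHARE and LOCAL distinction concrete, the following sketch shows the loop with a hand-written directive in the same form as the C$DOACROSS directives that PFA generates. The routine name tricky2 and the exact clause contents are assumptions for illustration, not actual PFA output.

```fortran
      subroutine tricky2(a, b, n)
c     Sketch only: a hand-written directive of the kind PFA might
c     generate. The clause contents are an assumption.
      integer n, i, m
      real a(*), b(*)
c     Implicit typing would also make my_function INTEGER.
      integer my_function
      external my_function
C$DOACROSS SHARE(N,A,B), LOCAL(I,M)
      do i = 1, n
c        The visible assignment gives each iteration its own m.
         m = 0
         a(i) = my_function(b(i), m)
         b(i) = a(i) + m
      enddo
      end
```

With m listed as LOCAL, each process works on a private copy, so a write to m inside my_function cannot disturb another iteration.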

Reductions

This example shows how PFA produces a single value from a set of values.

Because the entire set of values is reduced to a single value, these operations

are called reductions.

Consider the Fortran 77 code

subroutine foo4(a,b,n,sum)

real a(n), b(n), sum

sum = 0.0

do i = 1, n

sum = sum + a(i)*b(i)

enddo

end

Using the previous code as input, PFA produces the listing ﬁle

DIR 1 # 1 “foo4.f”

2 subroutine foo4(a,b,n,sum)

3 real a(n), b(n), sum

5 sum = 0.0

+----- 6 do i = 1, n

1 DD ! 7 sum = sum + a(i)*b(i)

!_____ 8 enddo

9 end

Abbreviations Used

DD data dependence

DIR directive

Footnote List

1: data dependence Data dependence involving this

line due to variable “SUM”.

Loop Summary

From To Loop Loop

Loop# line line label index Status

1 6 8 Do I scalar mode preferable

Because different iterations of the loop read and write the same location (the

variable sum), there is a dependence. However, this is a special case. Because

sum just accumulates a total, you can accumulate subtotals in parallel and

then combine the subtotals at the end.
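
The subtotal idea can be sketched by hand as follows. This is an illustration of the technique, not the code PFA generates; the routine name dotsub, the chunk count nproc, and the work array part are hypothetical.

```fortran
      subroutine dotsub(a, b, n, nproc, part, sum)
c     Hypothetical illustration (not PFA output): split the sum
c     into nproc subtotals that are independent of one another,
c     then combine the subtotals serially.
      integer n, nproc, ip, i, lo, hi, chunk
      real a(n), b(n), part(nproc), sum
      chunk = (n + nproc - 1) / nproc
c     Each pass of loop 20 touches only part(ip), so the passes
c     could run in parallel.
      do 20 ip = 1, nproc
         lo = (ip - 1)*chunk + 1
         hi = min(ip*chunk, n)
         part(ip) = 0.0
         do 10 i = lo, hi
            part(ip) = part(ip) + a(i)*b(i)
   10    continue
   20 continue
c     Serial combination of the subtotals.
      sum = 0.0
      do 30 ip = 1, nproc
         sum = sum + part(ip)
   30 continue
      end
```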

Because the parallel version of the code adds the elements together in a

different order than the single-process version, the round-off errors

accumulate differently for the two versions of the code. Thus, the answer can

differ slightly as you vary the number of processes used to run the code. In

fact, if you use the dynamic scheduling option for the code, the answer

might vary slightly from one run of the program to the next, even if you use

the same number of processes on the same machine.

Most applications can safely ignore this variation in round-off error. If you
do not care about this round-off error, you can tell PFA to use parallel
subtotals by using either the C*$*ROUNDOFF=2 directive or the f77/pfa
command line option -WK,-roundoff=2.

The resulting listing ﬁle is

DIR 1 # 1 “foo4.f”

2 subroutine foo4(a,b,n,sum)

3 real a(n), b(n), sum

5 sum = 0.0

C +------ 6 do i = 1, n

* 7 sum = sum + a(i)*b(i)

*______ 8 enddo

9 end

Abbreviations Used

DIR directive

C concurrentized

Loop Summary

From To Loop Loop

Loop# line line label index Status

1 6 8 Do I concurrentized

Be aware that the round-off error produced by the parallel reduction

operation is not necessarily any worse than the round-off error already

present in the original serial version. It will simply be different. If your

application did not worry about the round-off error in the original, there is

no reason to suppose that it should worry about it in the parallel version. If,

on the other hand, your application takes special steps to reduce round off

(for example, adding the numbers together in order from smallest absolute

value to largest), then you should not use parallel reductions.

The previous example is called a sum reduction because the reduction

operator is +. Table 3-5 shows the types of reductions PFA supports.

All these reductions are under the control of the -ROUNDOFF command

line option, even though technically the min and max reductions do not

involve round-off problems.

Table 3-5 Reduction Types

Type      Operator  Example
Sum       +         sum = sum + expression
Product   *         p = p * expression
Min       min( )    a = min(a, expression)
Max       max( )    x = max(x, expression)
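
For reference, the four reduction forms in Table 3-5 look like this in a loop. The routine name reduce and its arguments are hypothetical, and the sketch assumes n is at least 1. Whether PFA runs a loop that combines several reductions in parallel depends on the options in effect.

```fortran
      subroutine reduce(a, n, s, p, amin, amax)
c     Hypothetical example of the four reduction forms.
c     Assumes n is at least 1.
      integer n, i
      real a(n), s, p, amin, amax
      s = 0.0
      p = 1.0
      amin = a(1)
      amax = a(1)
      do 10 i = 1, n
c        Sum reduction
         s = s + a(i)
c        Product reduction
         p = p * a(i)
c        Min and max reductions
         amin = min(amin, a(i))
         amax = max(amax, a(i))
   10 continue
      end
```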

Chapter 4

4. Customizing PFA Execution

This chapter contains the following sections:

• “Overview” explains when to optimize PFA execution.
• “Controlling Code Execution” describes how to control whether PFA
  runs eligible loops in parallel.
• “Controlling PFA Code Transformations” describes how to control the
  various transformations performed by PFA.
• “Performing Inlining and Interprocedural Analysis” describes inlining
  and interprocedural analysis and explains how and when to perform
  these procedures.
• “Controlling Fortran Language Elements” explains how to control
  standard Fortran elements with command line options to PFA.
• “Controlling Directives and Assertions” explains how to override PFA
  directives and assertions with command line options.
• “Controlling PFA I/O” explains how to customize the names of PFA
  input and output files.
• “Obsolete Syntax” lists obsolete PFA command line options.

Overview

To customize how PFA executes an entire program, you can specify various

command line options when you run PFA directly or when you specify PFA

as part of a compile. Chapter 2, “How to Use PFA,” explains both

procedures. For a complete summary of the PFA command line options,

refer to Appendix A, “PFA Command Line Options.”

Controlling Code Execution

When modifying a program so that its loops can run in parallel, change the
code so that PFA can run each loop in parallel automatically. Avoid forcing
a loop to run in parallel by directly inserting a C$DOACROSS directive. If
you force code to run in parallel, you (and not PFA) must verify that no
subsequent modification introduces data dependencies. Forcing code that
contains data dependencies to run in parallel can produce serious (and
difficult-to-find) errors. Rewriting the loop so that PFA recognizes it
as safe to run in parallel allows PFA to check future modifications for
potential data dependencies.

This section describes how to control whether eligible loops are run in

parallel and how to specify a work threshold for loops.

Running Code in Parallel

The -CONCURRENTIZE option (or -C) converts eligible loops to run in

parallel. This is the default value for this option. The

-NOCONCURRENTIZE option (or -NCONC) prevents PFA from

converting loops to run in parallel.

Specifying a Work Threshold

The -MINCONCURRENT=n option (or -MC=n) speciﬁes the minimum

amount of work needed inside the loop to make executing a loop in parallel

proﬁtable. The integer n is a count of the number of operations (for example,

add, multiply, load, store) in the loop, multiplied by the number of times the

loop will be executed.

If the loop does not contain at least this much work, the loop will not be run
in parallel. If the loop bounds are not constants, PFA automatically adds an
IF clause to the C$DOACROSS directive it generates to test at run time
whether sufficient work exists.

If you do not specify this option, PFA runs all loops containing 500 or more

operations in parallel.

For example, given the original loop

do 2 i =1,n

x(i) = y(i) * z(i)

2 continue

PFA generates the following transformed loop:

C$DOACROSS IF (N .GT. 100), SHARE (N,X,Y,Z), LOCAL(I)

DO 3 I=1,N

x(i) = y(i)*z(i)

3 CONTINUE

The IF clause ensures that n is large enough to make running the loop in
parallel profitable (otherwise, the loop runs serially). If the loop
bound is a small constant (such as 10) instead of n, PFA does not generate
a DOACROSS statement for the loop, and the listing file states that the
loop does not contain enough work. Conversely, if the bound is a large
constant (such as 100), PFA generates the DOACROSS statement
without the IF clause.
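
The work threshold rule amounts to a single comparison, sketched below. The function name enough and its arguments are hypothetical and not part of PFA. How PFA actually counts the operations in a loop body is not specified here, so nops is whatever count you assume.

```fortran
      logical function enough(nops, ntrips, minc)
c     Hypothetical helper, not part of PFA. Applies the rule
c     "operations per iteration times iteration count" and
c     compares the product with the threshold minc.
      integer nops, ntrips, minc
      enough = nops*ntrips .ge. minc
      end
```

When the iteration count is known only at run time, the IF clause on the generated directive plays the role of this test.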

Controlling PFA Code Transformations

This section discusses the various ways in which you can control the

standard transformations that PFA performs.

Controlling Size/Complexity Thresholds

You can control the thresholds for internal table size and routine complexity

in order to analyze larger and more complex routines.

Controlling Internal Table Size

The -ARCLIMIT=n option (or -ARCLM=n) controls the size of the internal

table used to store data dependence information (arcs). If this table

overﬂows, PFA stops analyzing the loop and the PFA listing ﬁle shows the

message

too many stmts/dd arcs

Increasing ARCLIMIT might allow PFA to analyze the loop but at the cost

of additional processing time.

Specifying a Complexity Limit

The -LIMIT=n option (or -LM=n) controls the amount of time PFA can

spend trying to determine whether a loop is safe to run in parallel. PFA

estimates how much time is required to analyze each loop nest construct. If

an outer loop looks like it would take too much time to analyze, PFA ignores

the outer loop and recursively visits the inner loops.

Larger limits often allow PFA to generate parallel code for deeply nested

loop structures that it might not otherwise be able to run safely in parallel.

However, with larger limits PFA can also take more time to analyze a

program. (The limit does not correspond to the DO loop nest level. It is an

estimate of the number of loop orderings that PFA can generate from a loop

nest.) This option has the same effect as the global C*$* LIMIT(n) directive.

Note: You do not usually need to change these limits.

Setting the Optimization Level

The -OPTIMIZE=n option (or -O=n) sets the optimization level. The higher

you set the optimization level, the more code is optimized and the longer

PFA runs. Programs that are written for running in parallel often do not need

advanced transformation. With these programs, a lower optimization level

is enough. Valid values for n are

0 Avoids converting loops to run in parallel.

1 Converts loops to run in parallel without using advanced

data dependence tests. Enables loop interchanging.

2 Determines when scalars need last-value assignment using

lifetime analysis. Also uses more powerful data

dependence tests to ﬁnd loops that can run safely in

parallel. This level allows reductions in loops that execute

concurrently but only if the -ROUNDOFF option is set to 2.

(Refer to the following section for details about the

-ROUNDOFF option.)

3 Breaks data dependence cycles using special techniques

and additional loop interchanging methods, such as

interchanging triangular loops. This level also implements

special-case data dependence tests.

4 Generates two versions of a loop, if necessary, to break a

data-dependent arc. This level also implements more-exact

data dependence tests and allows special index sets (called

wraparound variables) to convert more code to run in

parallel.

5 Fuses two adjacent loops if it is legal to do so (that is, there

are no data dependencies) and if the loops have the same

control values. In certain limited cases, this level recognizes

arrays as local variables. This level is the default.

This option has the same effect as the global C*$* OPTIMIZE(n) directive

described in Chapter 5, “Fine-Tuning PFA.”

Note: If you want to use the -UNROLL command line option, set the

-OPTIMIZE option to 4 or higher (the default optimization level is above

this threshold).

Controlling Variations in Round Off

The -ROUNDOFF=n option (or -R=n) controls the amount of variation in

round off that PFA will allow. Valid values for n are the integers

0–1 Suppresses any round-off transformations. This is the

default.

2 Allows reductions to be performed in parallel. The valid

reduction operators are addition, multiplication, min, and

max. This value is one of the most commonly speciﬁed user

options.

3 Recognizes REAL induction variables. Permits memory

management transformations (refer to “Memory

Management Transformations”).

When executing reductions in parallel, PFA processes values in a different

order from the original serial code. Round-off errors accumulate differently

and produce a slightly different answer. Some algorithms are sensitive to

this variation, and so, by default, PFA does not run reductions in parallel.

Usually, these tiny variations are irrelevant, and you can allow PFA to

process reductions in parallel, which allows more loops to run in parallel.

Controlling the Number of Scalar Optimizations

The -SCALAROPT=n option (or -SO=n) controls the level of standard

scalar optimization that PFA attempts. Valid values for n are the integers

0 Performs no scalar transformations.

1 Enables dead code elimination, pulling loop invariants,

forward substitution, and conversion of IF-GOTO into

IF-THEN-ELSE.

2 Enables induction variable recognition, loop unrolling, loop

fusion, array expansion, scalar promotion, and ﬂoating

invariant IF tests. (Loop fusion also requires

-OPTIMIZE=5.)

3 Enables the memory management transformations (refer to

“Memory Management Transformations”).

(Memory management also requires -ROUNDOFF=3.) This

is the default value.

Enabling Loop Unrolling

The -UNROLL=n option (or -UR=n) unrolls scalar inner loops when PFA

cannot run the loops in parallel. n speciﬁes the number of times to replicate

the loop body. The default is 4. Specify a small power of two for the unroll

value, such as two, four, or eight. Disable unrolling by setting -UNROLL=1.

The -UNROLL2=m option (or -UR2=m) allows you to adjust the number of

operations used by the -UNROLL option. Selecting a larger value for

-UNROLL2 allows PFA to unroll loops containing more calculations. This

form of unrolling applies only to the innermost loops in a nest of loops. You

can unroll loops whether they execute serially or concurrently.

PFA counts the number of array references and arithmetic operations in the

loop. It unrolls the loop until it reaches either the number of operations

speciﬁed by the -UNROLL2 option or the number of iterations speciﬁed by

-UNROLL.

When PFA unrolls a loop, it replicates the body of the loop a certain number

of times, making the loop run faster. However, unrolling loops also increases

the program size.

For example, if the original program is

do i = 1,100

a(i) = b(i) + c(i)*d(i)

enddo

the unrolled program (unrolling of order 4) is

do i = 1,100,4

a(i) = b(i) + c(i)*d(i)

a(i+1) = b(i+1) + c(i+1)*d(i+1)

a(i+2) = b(i+2) + c(i+2)*d(i+2)

a(i+3) = b(i+3) + c(i+3)*d(i+3)

enddo

The second (unrolled) version runs faster than the original version. The

reason for the improvement is that SGI processors have separate add and

multiply hardware, allowing addition and multiplication operations to run

simultaneously. In the original program, the processor has to do the

multiplication, wait for it to complete, then do the addition. In the second

case, the processor can do the ﬁrst multiplication, wait for it to complete,

then overlap the second multiplication and the ﬁrst addition, then the third

multiplication and the second addition, and so on.

The additions require nearly no additional time because all but the last one

are completed within the time it takes the (previous) multiplication to

complete. If the loop already contains many computations (for example,

many lines of code, many additions and multiplications), then unrolling it

might help a little but not much.
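
The order-4 example above works without extra code because the trip count, 100, is a multiple of 4. When the trip count is not known to be a multiple of the unroll factor, an unrolled loop needs a cleanup loop for the leftover iterations. The following hand-written sketch shows the idea; the routine name unrol4 and the variable nmain are hypothetical, and this is not claimed to be the code PFA emits.

```fortran
      subroutine unrol4(a, b, c, d, n)
c     Hypothetical hand-unrolled loop (order 4) with a cleanup
c     loop for trip counts that are not a multiple of 4.
      integer n, i, nmain
      real a(n), b(n), c(n), d(n)
c     Largest multiple of 4 that does not exceed n.
      nmain = n - mod(n, 4)
      do 10 i = 1, nmain, 4
         a(i)   = b(i)   + c(i)*d(i)
         a(i+1) = b(i+1) + c(i+1)*d(i+1)
         a(i+2) = b(i+2) + c(i+2)*d(i+2)
         a(i+3) = b(i+3) + c(i+3)*d(i+3)
   10 continue
c     Remaining 0 to 3 iterations.
      do 20 i = nmain + 1, n
         a(i) = b(i) + c(i)*d(i)
   20 continue
      end
```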

Memory Management Transformations

When -ROUNDOFF and -SCALAROPT are both set to 3, PFA attempts to

do outer loop unrolling (to improve register utilization) and automatic loop

blocking (also called tiling) to improve cache utilization.

Outer loop unrolling is a standard hand-optimization technique. Note that

the -UNROLL and -UNROLL2 options apply to inner-loop unrolling.

Outer-loop unrolling can occur even if inner-loop unrolling is disabled.

Loop blocking is a complex transformation that is applicable when the loop

nesting depth is greater than the dimensions of the data arrays being

manipulated. The canonical example is the simple matrix multiply, where a

three-deep nest of loops operates on two-dimensional arrays.

The simple method repeatedly sweeps over the entire array. If the array is

too large to ﬁt into the cache, this can result in a large amount of memory

trafﬁc. A better method is to break the arrays up into blocks, where each

block is small enough to ﬁt into the cache, and then sweep over each block

in turn (rather than over the whole array). The code to do this is often ugly

and complicated. PFA attempts to ease the burden of writing block-style

algorithms by automatically generating the block version from the simple

version. Note, however, that blocking does not help the more common case

where the algorithm touches each array element exactly once (for example,

a two-dimensional array inside of a two-deep loop nest). Because in this case

the data is not being reused, blocking does not apply.

For example, given the loop nest

do k = 1,n
   do j = 1,n
      do i = 1,n
         a(i,j) = a(i,j) + b(i,k)*c(k,j)
      enddo
   enddo
enddo

using the option -r=3, PFA produces the listing below:

II3 = 1

II1 = MOD (N - 1, 682) + 1

II2 = II1

II10 = N - 7

II11= (II10 + 7) / 8

DO 4 II4=1, N, 682

II8 = II3 + II2 - 1

DO 2 K=1, II10, 8

C$DOACROSS SHARE(N,K,C,II3,II8,A,B),LOCAL(DD1,DD2,

C$& DD3,DD4,DD5,DD6,DD7,DD8,DD9,J,I)

DO 2 J=1,N

DD2 = C(K,J)

DD3 = C(K+1,J)

DD4 = C(K+2,J)

DD5 = C(K+3,J)

DD6 = C(K+4,J)

DD7 = C(K+5,J)

DD8 = C(K+6,J)

DD9 = C(K+7,J)

DO 2 I=II3, II8, 1

DD1 = A(I,J)

DD1 = DD1 + B(I,K) * DD2

DD1 = DD1 + B(I, K+1) * DD3

DD1 = DD1 + B(I, K+2) * DD4

DD1 = DD1 + B(I, K+3) * DD5

DD1 = DD1 + B(I, K+4) * DD6

DD1 = DD1 + B(I, K+5) * DD7

DD1 = DD1 + B(I, K+6) * DD8

DD1 = DD1 + B(I, K+7) * DD9

A(I,J) = DD1

2 CONTINUE

II7 = II11 * 8 + 1

II9 = II3 + II2 - 1

DO 3 K=II7, N, 1

C$DOACROSS SHARE(N,K,C,II3,II9,A,B),LOCAL(DD10,J,I)

DO 3 J=1,N

DD10 = C(K,J)

DO 3 I=II3,II9,1

A(I,J) = A(I,J) + B(I,K) * DD10

3 CONTINUE

II3 = II3 + II2

II2 = 682

4 CONTINUE

Obviously, PFA’s version is more complicated than the original, but it runs

signiﬁcantly faster.
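
Stripped of the unrolling and the parallel directives, the blocking idea in the listing above can be sketched by hand as follows. The routine name mmblk and the block size argument nb are hypothetical. The sketch blocks only the i loop, which is a simplification and not a reproduction of PFA's output.

```fortran
      subroutine mmblk(a, b, c, n, nb)
c     Hypothetical hand-blocked matrix multiply. The i loop is
c     split into strips of at most nb rows so that each strip of
c     a and b can stay in the cache while k and j sweep over it.
      integer n, nb, i, j, k, i0, i1
      real a(n,n), b(n,n), c(n,n)
      do 40 i0 = 1, n, nb
         i1 = min(i0 + nb - 1, n)
         do 30 k = 1, n
            do 20 j = 1, n
               do 10 i = i0, i1
                  a(i,j) = a(i,j) + b(i,k)*c(k,j)
   10          continue
   20       continue
   30    continue
   40 continue
      end
```

Each element a(i,j) still accumulates its terms in increasing order of k, so this sketch produces the same values as the simple loop nest.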

Performing Inlining and Interprocedural Analysis

Function and subroutine calls create an obstacle to parallelization. PFA

provides three ways of dealing with this obstacle:

• Assert that the external routine is safe for concurrent execution (see
  “C*$* ASSERT CONCURRENT CALL”).
• Inline the routine by replacing the call to the external routine with the
  actual code.
• Perform interprocedural analysis (IPA) by analyzing the external
  routine ahead of time and using the results of that analysis when a
  reference to the routine is encountered.

Inlining and IPA tend to be slow, memory-intensive operations. Attempting

to inline all routines everywhere they occur can take a lot of time and use a

lot of system resources. Inlining should usually be restricted to a few

time-critical places.

This section discusses the three steps for inlining or IPA:

1. Specify which routines will be inlined (or interprocedurally analyzed).

2. Specify which source ﬁles and libraries will be searched to ﬁnd the

routines.

3. Specify which occurrences of those routines are to be inlined (or

analyzed).

Specifying Routines for Inlining or IPA

PFA supports the -INLINE=list option (or -IN=list) that speciﬁes the

routines to be inlined and the -IPA=list option for IPA. list is a

colon-separated list of routines to be inlined. For example,

-INLINE=jump:more

If you do not specify list, PFA will attempt to inline all eligible routines.

Specifying Where to Search for Routines

The options listed in Table 4-1 tell PFA where to search for the routines

speciﬁed with the -INLINE or -IPA option. If you do not specify either

option, PFA searches the current source ﬁle by default.

If one of the names in list is a directory, then all appropriate ﬁles in that

directory will be used. PFA assumes ﬁles with the extension .f are Fortran

source and ﬁles with the extension .klib are PFA-produced libraries.

Table 4-1 Inlining and IPA Search Command Line Options

Long Option Name              Short Option Name  Default Value
-INLINE_FROM_FILES=list       -INFF=list         Current source file
-IPA_FROM_FILES=list          -IPAFF=list        Current source file
-INLINE_FROM_LIBRARIES=list   -INFL=list         None
-IPA_FROM_LIBRARIES=list      -IPAFL=list        None

Specify multiple ﬁles and directories with the same option by using a

colon-separated list. For example,

-INLINE_FROM_FILES=file1:file2:file3

Note: These options by themselves do not initiate inlining or IPA. They only

specify where to look for the routines. Use them in conjunction with the

appropriate -INLINE or -IPA option.

Creating a Library

When performing inlining and IPA, PFA analyzes the routines in the source
program. Normally, inlining is done directly from a source file. However,
when inlining the same set of routines in many different programs, it is more
efficient to create a preanalyzed library of the routines. Use the
-INLINE_CREATE=name option (or -INCR=name) to create a library of
prepared routines (for later use with the -INLINE_FROM_LIBRARIES
option). PFA gives the library file it creates the name you specify; for
maximum compatibility, use the filename extension .klib (for example,
samp.klib).

The -IPA_CREATE=name option (or -IPACR=name) is the analogous option

for IPA.

The library used to do IPA does not have to be generated from the same

source that will be linked into the running program. Using this capability

can cause errors, but it can also be useful. For example, you could write a

library of hand-optimized assembly language routines, then construct a

PFA-compatible IPA library using Fortran routines that mimic the behavior

of the assembly code. Thus, you can do parallelism analysis with IPA

correctly but still call the hand-optimized assembly routines. Use the

following procedure to create and use a PFA library:

1. Create a library by passing the source program directly through pfa.
   Library creation is done by PFA and should not be done at the same
   time as an ordinary compilation. For example, the following command
   line creates a library called samp.klib for the source program samp.f:

   % /usr/lib/pfa -INLINE_CREATE=samp.klib samp.f

2. Compile the program with pfa:

   % f77 -pfa keep -WK,-INFL=samp.klib samp.f


Note: Libraries created for inlining contain complete information and can be

used for inlining or IPA. Libraries created for IPA contain only summary

information and can be used only for IPA.

Specifying Occurrences

The loop level, depth, and manual options let you control which occurrences
of the routines specified with the -INLINE or -IPA option PFA actually
inlines or analyzes.

Loop Level

The -INLINE_LOOPLEVEL=n (or -INLL=n) and -IPA_LOOPLEVEL=n (or
-IPALL=n) options limit PFA to occurrences within deeply nested loops.
A value of 1 restricts PFA to occurrences at the single most deeply nested
level; a value of 2 restricts PFA to the deepest and second-deepest levels;
and so on.

To determine which occurrences are most deeply nested, PFA constructs a
call graph to account for nesting due to loops that occur farther up the
call chain. If you do not specify either option, the loop level is 10.
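As an illustration of loop levels, consider the small program below. The routine names WORK1 and WORK2 are invented for this sketch and do not come from PFA. The call to WORK2 sits inside two nested loops and the call to WORK1 inside one, so under the rule described above a loop level of 1 would presumably leave only the WORK2 occurrence as a candidate, while a loop level of 2 would cover both.

```fortran
      PROGRAM LEVELS
      INTEGER N
      PARAMETER (N = 4)
      REAL A(N), B(N,N)
      INTEGER I, J
      DO 20 I = 1, N
C        Nesting depth 1: the second-deepest call site.
         CALL WORK1(A(I), I)
         DO 10 J = 1, N
C           Nesting depth 2: the most deeply nested call site.
            CALL WORK2(B(I,J), I, J)
   10    CONTINUE
   20 CONTINUE
      PRINT *, A(N), B(N,N)
      END

      SUBROUTINE WORK1(X, I)
      REAL X
      INTEGER I
      X = REAL(I)
      END

      SUBROUTINE WORK2(X, I, J)
      REAL X
      INTEGER I, J
      X = REAL(I + J)
      END
```

If the I loop itself lived in a routine that was called from inside another loop, the call graph would add that outer nesting to both call sites.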

Depth

The -INLINE_DEPTH=n option (or -IND=n) restricts the number of times PFA
continues to attempt inlining on already inlined routines. For example,
suppose you use PFA to inline the routine foo, and foo itself contains a
call to bar. Should PFA now attempt a second inlining depth and inline
bar? And if bar calls baz, should PFA inline three deep? This option
controls the process: routines are inlined only to the specified depth.

As a special case, if you specify the value -1, only routines that do not
reference other routines are inlined (that is, only leaf routines are
inlined). Only -1 is supported; the extension to -2, -3, and so on is not.
Note also that there is no -IPA_DEPTH option.
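The foo, bar, baz chain from the text can be written out as a complete program. This is only a sketch: the arithmetic in each routine is made up, and the depth numbers in the comments follow the wording above rather than any documented counting rule.

```fortran
      PROGRAM DEPTH
      REAL X
      X = 1.0
C     Depth 1: only this call to FOO is expanded.
      CALL FOO(X)
      PRINT *, X
      END

      SUBROUTINE FOO(X)
      REAL X
C     Depth 2: the call to BAR inside FOO is expanded as well.
      CALL BAR(X)
      X = X + 1.0
      END

      SUBROUTINE BAR(X)
      REAL X
C     Depth 3: the call to BAZ inside BAR is expanded as well.
      CALL BAZ(X)
      X = 2.0 * X
      END

      SUBROUTINE BAZ(X)
      REAL X
C     BAZ calls nothing, so it is the only leaf routine here.
      X = X - 0.5
      END
```

With the special value -1, BAZ is the only routine in this program that qualifies, because FOO and BAR each reference another routine.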


Manual

The -INLINE_MAN option turns on recognition of the C*$*INLINE
directive. This directive (described in Chapter 5, “Fine-Tuning PFA”) allows
you to select individual occurrences of routines to be inlined. -IPA_MAN is
the analogous option for the C*$*IPA directive (also described in Chapter 5,
“Fine-Tuning PFA”).

Conditions That Prevent Inlining or IPA

Several conditions make a routine ineligible for inline expansion or IPA:

• Dummy arguments do not match the actual arguments in number,
  type, shape, or size.

• The calling program and called routine have conflicting declarations for
  the same COMMON block.

• The calling program and the called routine have conflicting
  EQUIVALENCE statements.

• The routine to be inlined has a SAVE, ENTRY, or NAMELIST
  statement.

• The routine to be inlined has a variable initialized in a DATA statement.

• The routine to be inlined is too long (the limit is about 600 lines).
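The SAVE and DATA conditions in the list above are easy to meet without noticing. The function below is a hypothetical example (NEXTID and COUNT are invented names) that would fall under both conditions, because it keeps a counter alive between calls.

```fortran
      PROGRAM NOINL
      INTEGER K, NEXTID
      K = NEXTID()
      K = NEXTID()
      PRINT *, K
      END

      INTEGER FUNCTION NEXTID()
C     COUNT keeps its value between calls. The SAVE statement and
C     the DATA initialization are both on the list of conditions.
      INTEGER COUNT
      SAVE COUNT
      DATA COUNT /0/
      COUNT = COUNT + 1
      NEXTID = COUNT
      END
```

A routine like this depends on state that outlives a single call, which is plausibly why copying its body into the caller is ruled out.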

Controlling Fortran Language Elements

This section explains how to control various Fortran 77 language elements.

Global Assumptions

The -ASSUME=list option (or -AS=list) controls certain global assumptions

of a program. list consists of any combination of the following values:

E    Allows equivalence variables to refer to the same memory
     location inside one loop. For more information, see
     Chapter 5, “Fine-Tuning PFA.”

L    Instructs PFA to use a temporary variable within the
     optimized loop and assign the last value to the original
     scalar if PFA determines that the scalar can be used again
     before it is reassigned. This value is important when a
     scalar is assigned in a loop run in parallel. For more
     information, see Chapter 5, “Fine-Tuning PFA.”

P    Allows for parameter aliasing in a subprogram. For more
     information, see Chapter 5, “Fine-Tuning PFA.”

By default, PFA assumes that a program conforms to the ANSI (and VMS)
standard; therefore, the default is -ASSUME=EL.
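The situation that the L value addresses can be seen in a short program. This is a sketch with invented names; it shows only the source pattern, not the code PFA generates for it.

```fortran
      PROGRAM LASTV
      INTEGER N
      PARAMETER (N = 100)
      REAL A(N), B(N), T
      INTEGER I
      DO 5 I = 1, N
         A(I) = REAL(I)
    5 CONTINUE
C     T is assigned in every iteration of this loop ...
      DO 10 I = 1, N
         T = 2.0 * A(I)
         B(I) = T + 1.0
   10 CONTINUE
C     ... and read here before it is assigned again, so the value
C     from the final iteration (I = N) has to survive the loop.
      PRINT *, T
      END
```

If the second loop runs in parallel with a private copy of T per process, the value from iteration N must be copied back to T afterward for the PRINT statement to see what the serial program would have seen.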

Debugging Lines

The -DLINES option tells PFA to treat the letter D in column one as if it
were a space, so that PFA parses the rest of the line as a normal
Fortran 77 statement. The -NODLINES option tells PFA to treat these lines
as comments. Use these options to include or exclude debugging lines.
f77 passes the -DLINES option to PFA automatically when you specify the
f77 -d_lines option.
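A debugging line looks like the PRINT statement in the following sketch (the program itself is invented for illustration). With -DLINES the line is processed as a statement; with -NODLINES it is skipped as a comment. A compiler that has no debug-line support at all may reject the line.

```fortran
      PROGRAM DLDEMO
      INTEGER I, TOTAL
      TOTAL = 0
      DO 10 I = 1, 10
         TOTAL = TOTAL + I
D        PRINT *, 'DEBUG: I =', I, ' TOTAL =', TOTAL
   10 CONTINUE
      PRINT *, TOTAL
      END
```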

DO Loop Execution

The -ONETRIP option (or -1) provides compatibility with older versions of
Fortran, in which a DO loop is always executed at least once. The
-NOONETRIP option (or -N1), which is the default, conforms to the
Fortran 77 standard: a DO loop whose termination condition is initially
satisfied is not executed at all. f77 passes the -ONETRIP option to PFA
automatically when you specify the f77 -one_trip option.
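The difference between the two settings shows up only for loops whose bounds allow no iterations. In this sketch (names invented), the Fortran 77 rule gives a trip count of zero, so COUNT stays 0; under one-trip semantics the body would run once and COUNT would be 1.

```fortran
      PROGRAM TRIPS
      INTEGER I, COUNT
      COUNT = 0
C     The initial value 5 already exceeds the limit 1, so the
C     Fortran 77 trip count is zero.
      DO 10 I = 5, 1
         COUNT = COUNT + 1
   10 CONTINUE
      PRINT *, COUNT
      END
```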


Variable Saving Across Invocations

The -SAVE=c option (or -SV=c) specifies whether a procedure’s variables are
saved across invocations. c is one of the following values:

A    Performs a lifetime analysis on a procedure’s variables to
     determine those that need to have their value saved across
     invocations of the procedure. When it finds such a variable,
     PFA generates a SAVE statement for the variable.

M    Does not generate SAVE statements. This is the default
     value.

Signiﬁcant Columns

The -SCAN=n option controls the number of columns that PFA assumes to
be significant. PFA ignores anything beyond the specified column number.
The default value for n is 72. Specifying any of the following f77 options
automatically sets this option: -col72, -col120, or -extend_source.

Fortran Standard

Setting the -SYNTAX=c option (or -SY=c) alters the interpretation of the

Fortran input to be in compliance with other standards. c is one of the

following values:

A Interprets the source in strict compliance with the ANSI

Fortran 77 standard.

V Interprets the source in compliance with the VMS Fortran

standard but without the additional SGI extensions.

If you do not specify this option, PFA uses the same rules as the standard SGI

Fortran compiler (refer to the Fortran 77 Programmer’s Guide for details).

Controlling Directives and Assertions

This section discusses the options you can use to select whether PFA accepts
a specific directive or assertion. You can use these options to override
directives and assertions that are specified in the source program.

Selecting Directives and Assertions

The -DIRECTIVES=list option specifies the directives and assertions to
accept. The -NODIRECTIVES option tells PFA to ignore all directives and
assertions; it is useful when you suspect unsafe directives are causing
problems with program execution.

Note: Some directives are called assertions because they assert program
characteristics that PFA cannot verify. (For example, an assertion could
assert that subroutine x contains no data dependencies.) PFA cannot check
such a claim, but you might still want PFA to rely on it when optimizing.
Refer to Chapter 1, “Overview of PFA,” for more information about
directives and assertions.

Valid values for list are any combination of the following values:

A    Accepts assertions.

C    Accepts Cray CDIR$ directives. CDIR$ IVDEP ignores
     certain data dependencies in a loop, but because of
     differences between SGI hardware and a Cray machine,
     these data dependencies are not always safe to ignore on
     SGI hardware. To be safe, PFA does not recognize the
     CDIR$ IVDEP directive by default. You can, at your own
     risk, turn on Cray-directive recognition, which causes
     PFA to treat this Cray directive as if it were a C*$* ASSERT
     DO (CONCURRENT) assertion.

K    Accepts C*$* directives.

S    Accepts C$ directives. PFA recognizes the directives
     C$DOACROSS, C$, and C$&. (For more information, see
     the Fortran 77 Programmer’s Guide.) If a C$DOACROSS
     directive appears, PFA does not examine or alter the loop to
     which the directive applies. This allows you to mix code
     that you converted to parallel execution with code that PFA
     converted to parallel execution.

V    Accepts VAST CVD$ directives.

For example, specifying -DIRECTIVES=K enables PFA directives only,

whereas -DIRECTIVES=CK enables both Cray and PFA directives. Adding

A to the DIRECTIVES sequence also enables PFA assertions. Any

combination of options is acceptable.

If you do not specify either option, PFA will accept all assertions, PFA C*$*

directives, all C$ directives, and VAST CVD$ directives.

Controlling PFA I/O

This section describes command line options you can use to name PFA input
and output. You do not need to use these options unless you want to change
the default names. In particular, some versions of the make(1) utility assume
that files ending in .l are lex(1) input files. To perform automatic makes
without overwriting the PFA listing file, use a different suffix for the
listing filename.

Use the -INPUT=file.f option to specify the name of the Fortran source
program PFA input file. If you do not specify this option, PFA assumes that
a command line argument not preceded by a dash is the input filename.

The -FORTRAN=file option specifies the name of the PFA intermediate file
(that is, the transformed source). If you do not specify this filename, PFA
names the intermediate file file.m, where file is the name of the input
file. For details about the intermediate file, refer to Chapter 3,
“Utilizing PFA Output.”

The -LIST=file option specifies the name of the PFA listing file. If you do
not specify this filename, PFA names the listing file file.l, where file is
the name of the input file. For details about the listing file, refer to
Chapter 3, “Utilizing PFA Output.”

Obsolete Syntax

Table 4-2 lists obsolete PFA command line options.

PFA now accepts new syntax for some of the command line options
(particularly the syntax for inlining). For compatibility with older
versions, PFA translates the obsolete options into their newer equivalents,
which are listed in Table 4-3. Whenever possible, do not use the older
syntax; support for it might be withdrawn in the future.

Table 4-2 Obsolete Options

Long Option Name    Short Option Name    Default Value
-EXPAND             -X, -EX              off
-CREATE             -CR                  off
-LIBRARY            -LIB                 off
-LIMIT2             -LM2                 5000

Table 4-3 Obsolete Options and Their Equivalents

Old Version              New Version
-EXPAND=A                -INLINE
-EXPAND=M                -INLINE_MAN
-LIBRARY=name            -INLINE_FROM_LIBRARIES=name
-CREATE -LIBRARY=name    -INLINE_CREATE=name
-LIMIT2=n                -ARCLIMIT=n

Chapter 5

5. Fine-Tuning PFA

This chapter contains the following sections:

• “Overview” explains how to fine-tune program execution using
  directives and assertions.

• “Fine-Tuning Inlining and IPA” describes how to use directives to use
  inlining and IPA more specifically than with command line options.

• “Circumventing PFA” explains how to use directives to bypass PFA’s
  analysis and leave areas of code unchanged.

• “Running Code Serially” explains how to use directives and assertions
  to stop PFA from running specific code in parallel.

• “Running Code in Parallel” explains how to use directives and
  assertions to tell PFA that it is safe to run specific parts of code in
  parallel.

• “Ignoring Data Dependencies” explains how to tell PFA that apparently
  data-dependent code is safe to run in parallel.

• “Using Equivalenced Variables” explains how to assert that your code
  uses or does not use equivalenced variables.

• “Using Aliasing” describes the assertions used with aliasing.

Chapter 5: Fine-Tuning PFA

Overview

After you run a Fortran source program through PFA once, you can use
directives and assertions to fine-tune program execution. The listing file
shows where and why PFA did not parallelize the code.

You can use directives and assertions to force PFA to treat portions of
code in various ways. Command line options apply to the program as a
whole. If you want finer control, for example to parallelize a critical
loop or to inline a particular occurrence of a routine, specify directives
and assertions directly in the code. You can also use directives and
assertions to keep PFA from converting code to run in parallel. In other
cases you might want to explicitly force PFA to run segments of code in
parallel even though it normally would not.

Fine-Tuning Inlining and IPA

Chapter 4, “Customizing PFA Execution,” explains how to use inlining and
IPA on an entire program (refer to “Performing Inlining and Interprocedural
Analysis” on page 46). You can fine-tune inlining and IPA using the
C*$*[NO]INLINE and C*$*[NO]IPA directives.

The C*$*[NO]INLINE directive behaves much the same as the -INLINE
command line option, but with the directive you can specify which
occurrences of a routine are actually inlined. The format for this directive
is

C*$*[NO]INLINE [(name[,name ... ])] {HERE|ROUTINE|GLOBAL}

where

name       Specifies the routines to be inlined. If you do not specify a
           name, this directive affects all routines in the program.

HERE       Applies the INLINE directive only to the next line;
           occurrences of the named routines on that next line are
           inlined.

ROUTINE    Inlines the named routines everywhere they appear in the
           current routine.

GLOBAL     Inlines the named routines throughout the source file.

The C*$*NOINLINE form overrides the -INLINE command line option and
so allows you to selectively disable inlining of the named routines at
specific points.

Example

In the following code fragment, the C*$*INLINE directive inlines the first
call to beta but not the second.

      do i = 1, n
C*$*INLINE (beta) HERE
         call beta (i, 1)
      enddo
      call beta (n, 2)

Using the specifier ROUTINE rather than HERE inlines both calls. This
routine must be compiled with the -INLINE_MAN command line option for the
C*$*INLINE directive to be recognized.

The C*$*[NO]IPA directive is the analogous directive for interprocedural
analysis. The format for this directive is

C*$*[NO]IPA [(name [,name...])] {HERE|ROUTINE|GLOBAL}
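By analogy with the inlining example, an IPA directive with the HERE specifier might be used as in the following sketch. The routine UPDATE and the program are invented for illustration. As with C*$*INLINE, recognition of the directive depends on the corresponding manual option (-IPA_MAN, described in Chapter 4).

```fortran
      PROGRAM IPADEM
      INTEGER N
      PARAMETER (N = 8)
      REAL A(N)
      INTEGER I
      DO 10 I = 1, N
C*$*IPA (UPDATE) HERE
         CALL UPDATE(A(I), I)
   10 CONTINUE
C     No directive applies to this second call.
      CALL UPDATE(A(N), 0)
      PRINT *, A(1), A(N)
      END

      SUBROUTINE UPDATE(X, I)
      REAL X
      INTEGER I
      X = REAL(I)
      END
```

Because the call inside the loop is the one that decides whether the loop can run in parallel, it is the natural place to request the analysis.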


Circumventing PFA

Sometimes you might need to hand-tune a DO loop so that it will run in
parallel. Use the directives in this section to prevent PFA from analyzing
your modified code.

C$ DOACROSS

The C$ DOACROSS directive tells the Fortran 77 compiler to generate

parallel code for the following loop. When PFA encounters this directive on

input, it does not modify the accompanying loop and therefore does not

interfere with any hand-tuning.

C$ DOACROSS is the standard way to request parallel execution of a loop in
Silicon Graphics Fortran 77. It is the same directive that PFA generates as
a result of its analysis. Refer to the Fortran 77 Programmer’s Guide for
more information about the C$ DOACROSS directive and its optional clauses.

PFA leaves the following code exactly as it appears:

C$ DOACROSS
      DO 10 I=1, 100
         A(I) = B(I)
   10 CONTINUE

C$&

The C$& directive continues the C$ DOACROSS directive onto multiple

lines, for example,

C$DOACROSS SHARE(ALPHA, BETA, GAMMA, DELTA,

C$& EPSILON, OMEGA), LASTLOCAL (I, J, K, L, M, N),

C$& LOCAL(XXX1, XXX2, XXX3, XXX4, XXX5, XXX6, XXX7,

C$& XXX8, XXX9)

Running Code Serially

Use the following assertions and directives to keep PFA from running
specific code in parallel.

C*$* ASSERT DO (SERIAL)

The C*$* ASSERT DO (SERIAL) assertion tells PFA to run the specified
loop serially. PFA does not try to convert the specified loop to run in
parallel. It also does not try to run any enclosing loop in parallel.
However, PFA can still convert any loops nested inside the serial loop to
run in parallel.
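The three cases described above (the asserted loop, an enclosing loop, and a nested loop) can be seen together in the following sketch. The program and its arrays are invented for illustration, and the assertion is placed directly before the loop it applies to, as the other assertions in this chapter are.

```fortran
      PROGRAM SERDEM
      INTEGER N, M
      PARAMETER (N = 4, M = 100)
      REAL A(M,N), B(M,N)
      INTEGER I, J, K
      DO 2 J = 1, N
         DO 1 I = 1, M
            A(I,J) = 0.0
            B(I,J) = 1.0
    1    CONTINUE
    2 CONTINUE
C     K loop: encloses the asserted loop, so it also stays serial.
      DO 30 K = 1, 2
C*$* ASSERT DO (SERIAL)
         DO 20 J = 2, N
C           I loop: nested inside, still a candidate for parallel.
            DO 10 I = 1, M
               A(I,J) = A(I,J-1) + B(I,J)
   10       CONTINUE
   20    CONTINUE
   30 CONTINUE
      PRINT *, A(1,N)
      END
```

Here the J loop carries a genuine dependence (column J needs column J-1), while the iterations of the I loop are independent of one another.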

CDIR$ NEXT SCALAR

Silicon Graphics PFA supports the corresponding Cray directive, CDIR$

NEXT SCALAR. PFA interprets this directive as if it were a C*$* ASSERT

DO (SERIAL) assertion and generates scalar code for the next DO loop.

C*$* ASSERT DO PREFER (SERIAL)

The C*$* ASSERT DO PREFER (SERIAL) assertion indicates that you want

to execute a DO loop in serial mode. This assertion directs PFA to leave the

DO loop alone, regardless of the setting of the optimization level. You can

use this assertion to control which loop (in a nest of loops) PFA chooses to

run in parallel. The following example program segment shows how to use

the assertion:

      DO 100 I = 1, N
C*$*ASSERT DO PREFER (SERIAL)
         DO 100 J = 1, M
            A(I,J) = B(I,J)
  100 CONTINUE

In the DO loop above, the assertion requests that the J loop be serial. In this

construction, PFA tries to run the I loop in parallel but not the J loop. This

capability is useful when you know the value of M to be very small or less

than N. This assertion applies only to the DO loop that appears directly after

the assertion.


Running Code in Parallel

This section explains the directives and assertions that tell PFA that
specific areas of code are safe to run in parallel.

C*$*[NO]CONCURRENTIZE

The C*$*CONCURRENTIZE directive tells PFA to convert eligible loops to run
in parallel. The C*$*NOCONCURRENTIZE form prevents PFA from converting
loops to run in parallel. These directives, when specified globally, have
the same effect as the -CONCURRENTIZE and -NOCONCURRENTIZE options
(see Chapter 2, “How to Use PFA”).

CVD$ CONCUR

PFA supports the VAST directive CVD$ CONCUR. This directive requests that
a loop be run in parallel. PFA interprets it as if it were the
C*$*CONCURRENTIZE directive.

C*$* ASSERT DO PREFER (CONCURRENT)

The C*$* ASSERT DO PREFER (CONCURRENT) assertion directs PFA to

run a particular nested loop in parallel if possible. PFA runs another of the

nested loops in parallel only if a condition prevents running the selected

loop in parallel.

Consider the following code:

C*$* ASSERT DO PREFER (CONCURRENT)
      DO 100 I = 1, N
         DO 100 J = 1, M
            A (I, J) = B (I, J)
  100 CONTINUE

This code directs PFA to prefer to run the I loop in parallel. However, if a
data dependence conflict prevents running the I loop in parallel, PFA might
run the J loop in parallel. The C*$* ASSERT DO PREFER (CONCURRENT)
assertion applies only to the DO loop immediately after it.

Ignoring Data Dependencies

PFA avoids running code in parallel that it believes to be data-dependent.

Use the assertions described in the following sections to override this

behavior.

C*$* ASSERT DO (CONCURRENT)

The C*$* ASSERT DO (CONCURRENT) assertion tells PFA to ignore

assumed data dependencies. Normally, PFA is conservative about

converting loops to run in parallel.

When PFA analyzes a loop to see if it is safe to run in parallel, it
categorizes the loop into one of three groups:

• yes (loop is safe to run in parallel)

• no

• not sure

Normally, PFA does not run “not sure” loops in parallel. It assumes there are
data dependencies. C*$* ASSERT DO (CONCURRENT) tells PFA to go
ahead and run “not sure” loops in parallel.

Note: If PFA identifies a loop as containing definite (as opposed to assumed)
data dependencies, it does not run the loop in parallel even if you specify a
C*$* ASSERT DO (CONCURRENT) assertion.
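A loop of the following kind is one plausible “not sure” case: whether the iterations interfere depends on an offset K that is known only to the caller. The routine name SHFTAD and the driver are invented for this sketch, and whether PFA classifies this exact loop as “not sure” is an assumption.

```fortran
      PROGRAM CONDEM
      INTEGER N
      PARAMETER (N = 100)
      REAL A(2*N), B(N)
      INTEGER I
      DO 5 I = 1, N
         A(I) = REAL(I)
         B(I) = 1.0
    5 CONTINUE
C     The caller passes K = N, so reads and writes never overlap.
      CALL SHFTAD(A, B, N, N)
      PRINT *, A(N+1), A(2*N)
      END

      SUBROUTINE SHFTAD(A, B, N, K)
      INTEGER N, K, I
      REAL A(*), B(N)
C     Inside this routine the value of K is unknown, so A(I+K)
C     written in one iteration might be read as A(I) in another.
C*$* ASSERT DO (CONCURRENT)
      DO 10 I = 1, N
         A(I+K) = A(I) + B(I)
   10 CONTINUE
      END
```

The assertion is safe here only because every call site passes an offset at least as large as N; a call with a smaller positive K would make the dependence real.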

CDIR$ IVDEP

PFA interprets the Cray directive CDIR$ IVDEP as if it were a C*$* ASSERT
DO (CONCURRENT) assertion. Some dependencies that are safe to ignore on
Cray hardware are not safe to ignore on SGI hardware. Therefore, recognition
of this directive is turned off by default; use the -DIRECTIVES option with
the value C to turn it on.


C*$* ASSERT CONCURRENT CALL

The C*$* ASSERT CONCURRENT CALL assertion tells PFA to ignore assumed
dependencies that are due to a subroutine call or a function reference.
You must ensure that the subroutines and referenced functions are safe for
parallel execution. The assertion must appear on the line immediately
before the loop, and it applies to all subroutine and function references
in that loop.
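The following sketch shows the assertion in front of a loop whose only obstacle to parallel execution is a subroutine call. The routine HALVE is invented for illustration; it is written so that separate calls cannot interact.

```fortran
      PROGRAM CALDEM
      INTEGER N
      PARAMETER (N = 100)
      REAL A(N)
      INTEGER I
C*$* ASSERT CONCURRENT CALL
      DO 10 I = 1, N
         CALL HALVE(A(I), I)
   10 CONTINUE
      PRINT *, A(1), A(N)
      END

      SUBROUTINE HALVE(X, I)
C     Touches only its own arguments: no COMMON, SAVE, or I/O, so
C     calls from different iterations do not interact.
      REAL X
      INTEGER I
      X = 0.5 * REAL(I)
      END
```

A routine that updated a COMMON block or a saved local variable would not be a safe target for this assertion.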

C*$* ASSERT NO RECURRENCE

The C*$* ASSERT NO RECURRENCE(variable) assertion tells PFA to

ignore all data dependencies associated with variable. PFA ignores not just

assumed dependencies (as with the C*$* ASSERT DO (CONCURRENT)

assertion) but also real dependencies. Use this assertion to force PFA to

parallelize a loop when other, gentler means have failed. Use this assertion

with caution, as indiscriminate use can result in illegal parallel code.
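As a sketch of a case where the assertion could be justified, the loop below reads and writes the same array through two index arrays that the programmer knows to address disjoint halves of it. All names are invented, and placing the assertion immediately before the loop is an assumption; the text above does not state its scope.

```fortran
      PROGRAM RECDEM
      INTEGER N
      PARAMETER (N = 50)
      REAL A(2*N), B(N)
      INTEGER IR(N), IW(N), I
      DO 5 I = 1, N
         A(I) = REAL(I)
         B(I) = 1.0
C        Reads come from the lower half of A, writes go to the
C        upper half, and no element is written twice.
         IR(I) = I
         IW(I) = 2*N + 1 - I
    5 CONTINUE
C*$* ASSERT NO RECURRENCE (A)
      DO 10 I = 1, N
         A(IW(I)) = A(IR(I)) + B(I)
   10 CONTINUE
      PRINT *, A(N+1), A(2*N)
      END
```

If IR and IW ever addressed the same element in different iterations, the dependence would be real and the parallel result could differ from the serial one.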

C*$* ASSERT PERMUTATION

The C*$* ASSERT PERMUTATION(array) assertion tells PFA that array

contains no repeated values. This assertion permits PFA to run in parallel

certain kinds of loops that use indirect addressing, for example,

      DO I = 1, N
         A(INDEX(I)) = A(INDEX(I)) + B(I)
      ENDDO

You can run this loop in parallel only if the array INDEX has no repeated
values (so that each INDEX(I) is unique). PFA cannot determine this, so it
does not run such a loop in parallel. However, if you know that every
element of INDEX is unique, you can insert the following line before the
loop to permit PFA to run the loop in parallel:

C*$* ASSERT PERMUTATION (INDEX)

Using Equivalenced Variables

The C*$* ASSERT NO EQUIVALENCE HAZARD assertion tells PFA that
your code does not use equivalenced variables to refer to the same memory
location inside one loop nest. Normally, EQUIVALENCE statements allow
your code to use different variable names to refer to the same storage
location. The -ASSUME=E command line option acts like a global C*$*
ASSERT EQUIVALENCE HAZARD assertion (see “Global Assumptions”
on page 50 in Chapter 4). A C*$* ASSERT [NO] EQUIVALENCE HAZARD
assertion is active until you reset it or until the end of the program unit.
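In the following sketch, X and Y are equivalenced to the two halves of a work array, so they share storage with WORK but never with each other. The names are invented, and placing the assertion before the loop is an assumption about typical use.

```fortran
      PROGRAM EQVDEM
      INTEGER N
      PARAMETER (N = 100)
      REAL WORK(2*N), X(N), Y(N)
      INTEGER I
C     X and Y name the two halves of WORK and do not overlap
C     each other.
      EQUIVALENCE (WORK(1), X(1)), (WORK(N+1), Y(1))
      DO 5 I = 1, N
         X(I) = REAL(I)
    5 CONTINUE
C*$* ASSERT NO EQUIVALENCE HAZARD
      DO 10 I = 1, N
         Y(I) = 2.0 * X(I)
   10 CONTINUE
      PRINT *, WORK(N+1), WORK(2*N)
      END
```

The second loop refers only to X and Y, so no two names inside it reach the same memory location, which is the property the assertion states.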

Using Aliasing

PFA has several assertions for use with aliasing.

C*$* ASSERT [NO] ARGUMENT ALIASING

The C*$* ASSERT [NO] ARGUMENT ALIASING assertion allows PFA to

make assumptions about subprogram arguments in a program. According

to the Fortran 77 standard, you can alias a variable only if you do not modify

(that is, write to) the aliased variable.

The following subroutine violates the standard, because variable A is aliased

in the subroutine (through C and D) and variable X is aliased (through X and

E):

      COMMON X,Y
      REAL A,B
      CALL SUB (A, A, X)
      ...
      SUBROUTINE SUB(C,D,E)
      COMMON X,Y
      X = ...
      C = ...
      ...


The command line option -ASSUME=P acts like a global C*$* ASSERT
ARGUMENT ALIASING assertion (see Chapter 4, “Customizing PFA
Execution”). A C*$* ASSERT ARGUMENT ALIASING assertion is active until
it is reset or until the next routine begins.

C*$* ASSERT RELATION

The C*$* ASSERT RELATION(name.xx.name) assertion indicates the
relationship between two variables or between a variable and a constant.
name is the variable or constant, and xx is any of the following: GT, GE,
EQ, NE, LT, or LE. This assertion applies only to the next DO statement.

Consider the following code:

      DO 100 I = 1, N
         A (I) = A (I+M) + B (I)
  100 CONTINUE

If you know that M is greater than N, use the following assertion to give
this information to PFA:

C*$* ASSERT RELATION (M .GT. N)
      DO 100 I = 1, N
         A (I) = A (I+M) + B (I)
  100 CONTINUE

Knowing that M is greater than N, PFA can generate parallel code for this
loop. If, at run time, M is less than N, the answers produced by the code
run in parallel could differ significantly from the answers produced by the
original code run serially.

Note: Many relationships of this type can be cheaply tested for at run time.

PFA will attempt to answer questions of this sort by generating an IF

statement that explicitly tests the relationship at run time. Occasionally, PFA

may need assistance, or you may want to squeeze that last ounce of

performance out of some critical loop by asserting some relationship rather

than repeatedly checking it at run time.
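The run-time test mentioned in the note can be pictured as a two-version loop. The following is a hand-written sketch of that idea, not actual PFA output: the routine ADDM, the driver, and the exact form of the test are all assumptions, and the C$DOACROSS clauses are the ones shown earlier in this chapter.

```fortran
      PROGRAM RELDEM
      INTEGER N, M
      PARAMETER (N = 10, M = 20)
      REAL A(N+M), B(N)
      INTEGER I
      DO 5 I = 1, N + M
         A(I) = REAL(I)
    5 CONTINUE
      DO 6 I = 1, N
         B(I) = 1.0
    6 CONTINUE
      CALL ADDM(A, B, N, M)
      PRINT *, A(1), A(N)
      END

      SUBROUTINE ADDM(A, B, N, M)
      INTEGER N, M, I
      REAL A(*), B(N)
      IF (M .GT. N) THEN
C        Reads A(M+1) to A(M+N), writes A(1) to A(N): no overlap.
C$DOACROSS SHARE(A, B, M, N), LOCAL(I)
         DO 10 I = 1, N
            A(I) = A(I+M) + B(I)
   10    CONTINUE
      ELSE
C        Possible overlap: keep the original serial order.
         DO 20 I = 1, N
            A(I) = A(I+M) + B(I)
   20    CONTINUE
      ENDIF
      END
```

Asserting the relation instead removes the IF test and the serial copy of the loop, at the cost of wrong answers if the assertion ever turns out to be false.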

Appendix A

A. PFA Command Line Options

This appendix contains the following sections:

• “Overview”

• “Options Summary”

• “Obsolete Syntax”

This appendix lists and describes the options to PFA. The default settings are

satisfactory for most programs. However, you can alter the defaults to

customize output. PFA accepts several command line options. Table A-1 lists

the default settings for each option.

Overview

Table A-1 summarizes the PFA command line options. The Reference
column lists the functional categories of the following options:

• parallel execution

• general optimization

• Fortran 77 language control

• directive control

• listing

The next three columns list the long names, short names, and default values
of the options. Following the table is an explanation of each option,
including the option’s long and short names, its default, and, if applicable,
the long and short names for the NO version of the option.

Appendix A: PFA Command Line Options

Note: You can replace many of the PFA command line options described in
this appendix with in-code directives.

Table A-1 PFA Command Line Options

Reference         Long Name                     Short Name    Default Value

Parallelization   [NO]CONCURRENTIZE             [N]CONC       CONCURRENTIZE
                  MINCONCURRENT=n               MC=n          MINCONCURRENT=500

Optimization      ARCLIMIT=n                    ARCLM=n       ARCLIMIT=5000
                  LIMIT=n                       LM=n          LIMIT=20000
                  OPTIMIZE=n                    O=n           OPTIMIZE=5
                  ROUNDOFF=n                    R=n           ROUNDOFF=0
                  SCALAROPT=n                   SO=n          SCALAROPT=3
                  UNROLL=n                      UR=n          UNROLL=4
                  UNROLL2=n                     UR22=n        UNROLL2=100

Fortran 77        ASSUME=list                   AS=list       ASSUME=EL
Language          [NO]DLINES                    [N]DL         NODLINES
Control           [NO]ONETRIP                   [N]1          NOONETRIP
                  SAVE=c                        SV=c          SAVE=A
                  SCAN=n                        SCAN=n        SCAN=72
                  SYNTAX=c                      SY=c          (option off)

Inlining and      INLINE[=list]                               (option off)
Interprocedural   IPA[=names]                   IPA
Analysis          INLINE_CREATE=name            INCR=name
                  IPA_CREATE=name               IPACR=name
                  INLINE_FROM_FILES=list        INFF=list
                  IPA_FROM_FILES=list           IPAFF=list
                  INLINE_FROM_LIBRARIES=list    INFL=list
                  IPA_FROM_LIBRARIES=list       IPAFL=list
                  INLINE_LOOP_LEVEL=n           INLL=n        INLL=10
                  IPA_LOOP_LEVEL=n              IPALL=n       IPALL=10
                  INLINE_MAN                    INM           (option off)
                  IPA_MAN                       IPAM
                  INLINE_DEPTH                  IND           IND=10

Table A-1 (continued) PFA Command Line Options

Reference    Long Name              Short Name    Default Value

Directives   [NO]DIRECTIVES=list    [N]DR=list    DIRECTIVES=AKSV

I/O          INPUT=file.f           file.f        file.f
             [NO]FORTRAN=file       [N]F=file     F=file.m
             [NO]LIST=file          [N]L=file     L=file.l

Listing      LINES=n                LN=n          LINES=55
             LISTOPTIONS=list       LO=list       LISTOPTIONS=OL
             SUPPRESS=list          SU=list       (option off)

Obsolete     CREATE                 CR            (option off)
             LIBRARY=file           LIB=file      (option off)
             [NO]EXPAND=list        EX=list       (option off)
             LIMIT2=n               LM2=n         LM2=5000

Options Summary

This section lists and defines all PFA command line options alphabetically.

ARCLIMIT

The -ARCLIMIT option, described in Table A-2, controls the size of the
internal table used to store data dependence information (arcs).

Table A-2 ARCLIMIT Option

Long Option Name    Short Option Name    Default Value
-ARCLIMIT=n         -ARCLM=n             5000

ASSUME

The -ASSUME option, described in Table A-3, controls certain global

assumptions of a program.

You can also use various assertions to control these assumptions. list is any

combination of the following values:

E Means that equivalence variables can refer to the same

memory location inside one loop.

L Is important when a scalar is assigned in a loop run in

parallel. If ASSUME is L, PFA uses a temporary variable

within the optimized loop and assigns the last value to the

original scalar if PFA determines that scalar can be reused

before it is assigned.

P Allows for parameter aliasing in a subprogram.

CONCURRENTIZE

The -CONCURRENTIZE option, described in Table A-4, converts eligible

loops to run in parallel.
