Lab 3 Manual

User Manual:

Open the PDF directly: View PDF .
Page Count: 10

ECE 420 Parallel and Distributed Programming

Lab 3: Solving a Linear System of Equations via

Gauss-Jordan Elimination using OpenMP∗

Winter 2017

1 Background

Consider the problem of solving a linear system of equations

a11x11 +a12x12 +... +a1nx1n=b1

a21x21 +a22x22 +... +a2nx2n=b2

... ...

an1xn1+an2xn2+... +annxnn =bn.

Denote the coeﬃcient matrix by







a11 a12 ... a1n

a21 a22 ... a2n

... ... ... ...

an1an2... ann







the constant vector by







...







∗In this manual, all the indeces start from 1.

ECE420: Lab 3 University of Alberta

and the unknown variables by

~x =







...







The linear system of equations can be represented by

A·~x =~

The linear system of equations is actually characterized by the augmented

matrix G,

G={A|~







a11 a12 ... a1nb1

a21 a22 ... a2nb2

... ... ... ... ...

an1an2... ann bn







Any operation on the original system of equations is equivalent to performing

some corresponding operation on the augmented matrix G. And an augmented

matrix can easily be mapped back to a linear system of equations. For simplicity,

we will perform calculation on the augmented matrix.

There are 3 types of linear operations (or row operations) which will not change

the solution(s) to the linear system of equations:

1. interchanging any two rows;

2. multiplying each element of a row by a nonzero constant;

3. replacing a row by the sum of itself and a constant multiple of another row

in the augmented matrix.

For these row operations, we use the following notations:

1. Ri↔Rj: interchange the ith row and the jth row;

2. αRi: multiply each element of row iby a nonzero α;

3. Ri+αRj: replace row iwith the sum of row iand αtimes row j.

To solve the linear system of equations, the basic idea is to transform the

original linear system to an equivalent new system via some linear operations. And

the new system should be reduced to some good form such that every equation

ECE420: Lab 3 University of Alberta

in the system has exactly one diﬀerent variable with a nonzero coeﬃcient, from

which we can simply “read” the solutions. In other words, the target augmented

matrix should be in the following form:







d11 0... 0b0

0d22 ... 0b0

... ... ... ... ...

0 0 ... dnn b0







The Gauss-Jordan Elimination is a procedure to achieve the above goal.

1.1 Gaussian Elimination with Partial Pivoting

Gaussian Elimination will transform the augmented matrix into its equivalent

“upper triangular” form, in which the elements below the main diagonal are all

zeros. It iteratively eliminates the elements below the main diagonal from the

ﬁrst column to the last column via row operations. Algorithm 1 describes such a

procedure.

Note that the partial pivoting part is important in this procedure. It will pre-

vent the case in which ukk is zero or close to zero. Thus, with partial pivoting, the

program will be more numerically stable. Also, note that in the row replacement

operation, jstarts from k, since the ﬁrst k−1 elements are always zero in this

algorithm.

1.2 Jordan Elimination

After we obtain an “upper triangular” Ufrom Gaussian Elimination, the Jordan

Elimination can further transform it into the desired form. Similarly, the basic idea

is to iteratively eliminate the elements above the main diagonal for each column,

one after another.

The inner for loop performs the row replacement. However, for each row, we

only need to update the two elements dik and di(n+1), since elements on the other

columns actually stay the same. See the example for an illustration.

After we get D, we can compute the desired solution ~x simply by

xi=di(n+1)/dii,for any i.

ECE420: Lab 3 University of Alberta

Algorithm 1 Gaussian Elimination

Input: An augmented matrix G={A|~

b}, where A= (aij ) is an n×nmatrix

and ~

b= (bi) is an n-dimensional vector.

Output: The augmented matrix U(the elements are denoted as uij ) that is

equivalent to Gand is in the “upper triangular” form.

Initially, U←G

for k= 1 to n−1do/*eliminate elements below the diagonal to zero one

column after another*/

/*Pivoting*/

In U, from row kto row n, ﬁnd the row kpthat has the maximum absolute

value of the element in the kth column

Swap row kand row kp

/*Elimination*/

for i=k+ 1 to ndo

temp =uik/ukk

for j=kto n+ 1 do

uij ←uij −temp ·ukj /*row replacement*/

endfor

ECE420: Lab 3 University of Alberta

Algorithm 2 Jordan Elimination

Input: The output of the Gaussian Elimination U(an n×(n+ 1) matrix).

Output: The augmented matrix D(with elements denoted by dij ) that is

equivalent to Gand is in our ﬁnal target form.

Initially, D←U

for k=nto 2 do/*eliminate elements to zero for each column one after

another*/

for i= 1 to k−1do/*row replacement one row after another*/

di(n+1) ←di(n+1) −dik/dkk ·dk(n+1)

dik ←0

endfor

1.3 An Example

We will give an example to demonstrate the described algorithms. Consider a

linear system of equations:

2x11 + 4x12 −2x13 = 3

−4x21 −8x22 + 5x23 =−4

4x31 + 4x32 −5x33 = 4.

The corresponding augmented matrix is







2 4 −2 3

−4−8 5 −4

4 4 −5 4







The Gauss-Jordan Elimination with partial pivoting on it will be

ECE420: Lab 3 University of Alberta







2 4 −2 3

−4−8 5 −4

4 4 −5 4







R1↔R2

−−−−→ 





−4−8 5 −4

2 4 −2 3

4 4 −5 4





(pivoting)

R2+1

2R1

−−−−−→ 





−4−8 5 −4

0 0 1

4 4 −5 4







R3+R1

−−−−→ 





−4−8 5 −4

0 0 1

0−4 0 0







R2↔R3

−−−−→ 





−4−8 5 −4

0−4 0 0

0 0 1





(pivoting; it happens to be the end of Gaussian Elimination)

R1−10R3

−−−−−→ 





−4−8 0 −14

0−4 0 0

0 0 1





(Starting Jordan Elimination)

R1−2R2

−−−−→ 





−4 0 0 −14

0−4 0 0

0 0 1







ECE420: Lab 3 University of Alberta

2 Task and Requirement

Task: Using OpenMP, implement a program to solve linear systems of equations

by Gauss-Jordan Elimination with partial pivoting. The input will be a coeﬃcient

matrix Aand a vector ~

b. The output will be a vector ~x, where

A·~x =~

Requirements and Remarks:

•Use the scripts in “Development Kit Lab 3” to generate input data, load

data and save results. Refer to the ReadMe ﬁle for details on how to use

them.

•Time measurement should be implemented.

•The number of threads should be passed as the only command line argument

to your program.

•Optimize the performance of your implementation as much as possible using

the techniques learned in class.

•You don’t need to consider the singular cases, i.e., a linear system with

no solution or an inﬁnite number of solutions. The input data generated

by “datagen.c” will avoid such cases. You do need to include the partial

pivoting procedure in your code to make the computed results correct and

numerically stable.

Lab Report Requirements:

1. Describe your implementation clearly.

2. Compare and discuss the performance (speedup or eﬃciency) of your imple-

mentation under diﬀerent numbers of threads used. Explain your results.

3. Present the design choices and performance optimization analysis in your lab

report. In the report, show the run times of your optimized code compared

against some baseline inferior version(s). Use ﬁgures and/or tables to show

ECE420: Lab 3 University of Alberta

the results under various setups (e.g., diﬀerent scheduling policies and chunk

sizes, diﬀerent parallelization strategies, and diﬀerent numbers of threads

and so on) and under suﬃciently large input(s). In other words, an eﬀort for

performance optimization must be demonstrated in your lab report.

4. Please also refer to the “Lab Report Guide” for other requirements.

Submission Instructions:

Each team is required to submit BOTH a hard copy of printed lab report to the

assignment box AND the source code on eClass. The report should be submitted

in the assignment box on the 2nd ﬂoor of ECERF. Please check eClass for the

code and report submission due date.

For code submission, each team is required to submit a zip ﬁle to eClass. The

zip ﬁle should be named “StudentID-Hx.zip”, where “StudentID” is the Student

ID of one of your group members (doesn’t matter which member) and “Hx” is the

section (H1 or H2).

The zip ﬁle should contain the following ﬁles:

1. “readme”: a text ﬁle containing instructions on how to compile your source

ﬁles;

2. “members”: a text ﬁle listing the student IDs of ALL group members, with

each student ID occupying one line;

3. “Makeﬁle”: the makeﬁle to generate the executable. Please ensure that your

Makeﬁle is located in the root folder of that zip ﬁle and the default “$make”

command will generate the ﬁnal executable program named “main”;

4. All the necessary source ﬁles to build the executable “main”;

DO NOT include the compiled data generation ﬁle, “serialtester” ﬁle, or the

input/output data ﬁle.

Note: you MUST use the ﬁle names suggested above. File names are

case-sensitive. You MUST generate the required zip ﬁle by directly

ECE420: Lab 3 University of Alberta

compressing all the above ﬁles, rather than compressing a folder con-

taining those ﬁles.

3 Hints and Tips

•To improve the performance, you might need to ﬁnd out all components of

the program that each can be run in parallel ﬁrst. Try to reduce the number

of forks and implicit joins due to repeated uses of parallel directives. In other

words, try to reuse the team of threads launched by a parallel directive.

•What kind of scheduling policies will yield the best result for each OpenMP

for loop?

•How many threads should be used?

•Should we use the same number of threads to handle each for loop? Or

should we vary the number of threads dynamically at diﬀerent stages of

the program? Should we vary the scheduling policy, chunksizes and other

parameters?

•To guarantee the correctness, how should we eﬀectively protect a critical

section if there is any?

•Would the single or master directive be useful here?

•Is it helpful to use the collapse clause if there is any nested loop?

•In the pivoting procedure, swap the index or pointers rather than swapping

all the row elements in memory.

•The debugging is similar to Pthread debugging. When compiling, instead

of using the “-g” ﬂag, you might need to use the “-ggdb” ﬂag to make the

debugger work well on an OpenMP multi-threaded program. You can try

other ﬂags like “-gstabs”, “-gstabs+”. However, whether they would work

depends on the system.

Appendix

ECE420: Lab 3 University of Alberta

A Marking Guideline

Code:

Correct implementation: 2

Speed: 1

Report:

Clear description of implementation: 1

Testing and veriﬁcation: 1

Performance discussion: 4

Presentation: 1

Total: 10

Lab 3 Manual

Navigation menu

Versions of this User Manual:

Views

Navigation