Lab 3 Manual

User Manual:

Open the PDF directly: View PDF .
Page Count: 10

Download
Open PDF In Browser	View PDF

ECE 420 Parallel and Distributed Programming
Lab 3: Solving a Linear System of Equations via
Gauss-Jordan Elimination using OpenMP∗

Winter 2017

1

Background

Consider the problem of solving a linear system of equations
a11 x11 + a12 x12 + ... + a1n x1n = b1
a21 x21 + a22 x22 + ... + a2n x2n = b2
...

...

an1 xn1 + an2 xn2 + ... + ann xnn = bn .
Denote the coefficient matrix by



A=


a11 a12
a21 a22
... ...
an1 an2

... a1n
... a2n
... ...
... ann

the constant vector by


~b = 


∗

b1
b2
...
bn

In this manual, all the indeces start from 1.

1




,





,


ECE420: Lab 3

University of Alberta

and the unknown variables by



~x = 


x1
x2
...
xn




.


The linear system of equations can be represented by
A · ~x = ~b.
The linear system of equations is actually characterized by the augmented
matrix G,
G = {A|~b}

a11 a12
 a
 21 a22
= 
 ... ...
an1 an2

... a1n
... a2n
... ...
... ann

b1
b2
...
bn




.


Any operation on the original system of equations is equivalent to performing
some corresponding operation on the augmented matrix G. And an augmented
matrix can easily be mapped back to a linear system of equations. For simplicity,
we will perform calculation on the augmented matrix.
There are 3 types of linear operations (or row operations) which will not change
the solution(s) to the linear system of equations:
1. interchanging any two rows;
2. multiplying each element of a row by a nonzero constant;
3. replacing a row by the sum of itself and a constant multiple of another row
in the augmented matrix.
For these row operations, we use the following notations:
1. Ri ↔ Rj : interchange the ith row and the j th row;
2. αRi : multiply each element of row i by a nonzero α;
3. Ri + αRj : replace row i with the sum of row i and α times row j.
To solve the linear system of equations, the basic idea is to transform the
original linear system to an equivalent new system via some linear operations. And
the new system should be reduced to some good form such that every equation
2

ECE420: Lab 3

University of Alberta

in the system has exactly one different variable with a nonzero coefficient, from
which we can simply “read” the solutions. In other words, the target augmented
matrix should be in the following form:






d11 0
0 d22
... ...
0
0

... 0
... 0
... ...
... dnn

b01
b02
...
b0n




.


The Gauss-Jordan Elimination is a procedure to achieve the above goal.

1.1

Gaussian Elimination with Partial Pivoting

Gaussian Elimination will transform the augmented matrix into its equivalent
“upper triangular” form, in which the elements below the main diagonal are all
zeros. It iteratively eliminates the elements below the main diagonal from the
first column to the last column via row operations. Algorithm 1 describes such a
procedure.
Note that the partial pivoting part is important in this procedure. It will prevent the case in which ukk is zero or close to zero. Thus, with partial pivoting, the
program will be more numerically stable. Also, note that in the row replacement
operation, j starts from k, since the first k − 1 elements are always zero in this
algorithm.

1.2

Jordan Elimination

After we obtain an “upper triangular” U from Gaussian Elimination, the Jordan
Elimination can further transform it into the desired form. Similarly, the basic idea
is to iteratively eliminate the elements above the main diagonal for each column,
one after another.
The inner for loop performs the row replacement. However, for each row, we
only need to update the two elements dik and di(n+1) , since elements on the other
columns actually stay the same. See the example for an illustration.
After we get D, we can compute the desired solution ~x simply by
xi = di(n+1) /dii , for any i.
3

ECE420: Lab 3

University of Alberta

Algorithm 1 Gaussian Elimination
Input: An augmented matrix G = {A|~b}, where A = (aij ) is an n × n matrix
and ~b = (bi ) is an n-dimensional vector.
Output: The augmented matrix U (the elements are denoted as uij ) that is
equivalent to G and is in the “upper triangular” form.
Initially, U ← G
for k = 1 to n − 1 do/*eliminate elements below the diagonal to zero one
column after another*/
/*Pivoting*/
In U, from row k to row n, find the row kp that has the maximum absolute
value of the element in the k th column
Swap row k and row kp
/*Elimination*/
for i = k + 1 to n do
temp = uik /ukk
for j = k to n + 1 do
uij ← uij − temp · ukj /*row replacement*/
endfor
endfor
endfor

4

ECE420: Lab 3

University of Alberta

Algorithm 2 Jordan Elimination
Input: The output of the Gaussian Elimination U (an n × (n + 1) matrix).
Output: The augmented matrix D (with elements denoted by dij ) that is
equivalent to G and is in our final target form.
Initially, D ← U
for k = n to 2 do/*eliminate elements to zero for each column one after
another*/
for i = 1 to k − 1 do/*row replacement one row after another*/
di(n+1) ← di(n+1) − dik /dkk · dk(n+1)
dik ← 0
endfor
endfor

1.3

An Example

We will give an example to demonstrate the described algorithms. Consider a
linear system of equations:
2x11 + 4x12 − 2x13 = 3
−4x21 − 8x22 + 5x23 = −4
4x31 + 4x32 − 5x33 = 4.
The corresponding augmented matrix is


2
4 −2 3


 −4 −8 5 −4 
4
4 −5 4
The Gauss-Jordan Elimination with partial pivoting on it will be

5

ECE420: Lab 3

University of Alberta




2
4 −2 3


 −4 −8 5 −4 
4
4 −5 4


−4 −8 5 −4
R ↔R2


−−1−−→
4 −2 3  (pivoting)
 2
4
4 −5 4


−4
−8
5
−4
R2 + 1 R1


1
−−−−2−→  0
1 
0
2
4
4 −5 4


−4 −8 5 −4
R +R1


−−3−−→
0 12 1 
 0
0 −4 0 0


−4 −8 5 −4
R ↔R3


−−2−−→
 0 −4 0 0  (pivoting; it happens to be the end of Gaussian Elimination)
0
0 21 1


−4 −8 0 −14
R −10R3


−−1−−−→
 0 −4 0 0  (Starting Jordan Elimination)
0
0 12
1


−4 0 0 −14
R1 −2R2


−−
−−→  0 −4 0 0 
0
0 12
1

6

ECE420: Lab 3

2

University of Alberta

Task and Requirement

Task: Using OpenMP, implement a program to solve linear systems of equations
by Gauss-Jordan Elimination with partial pivoting. The input will be a coefficient
matrix A and a vector ~b. The output will be a vector ~x, where
A · ~x = ~b.

Requirements and Remarks:
• Use the scripts in “Development Kit Lab 3” to generate input data, load
data and save results. Refer to the ReadMe file for details on how to use
them.
• Time measurement should be implemented.
• The number of threads should be passed as the only command line argument
to your program.
• Optimize the performance of your implementation as much as possible using
the techniques learned in class.
• You don’t need to consider the singular cases, i.e., a linear system with
no solution or an infinite number of solutions. The input data generated
by “datagen.c” will avoid such cases. You do need to include the partial
pivoting procedure in your code to make the computed results correct and
numerically stable.
Lab Report Requirements:
1. Describe your implementation clearly.
2. Compare and discuss the performance (speedup or efficiency) of your implementation under different numbers of threads used. Explain your results.
3. Present the design choices and performance optimization analysis in your lab
report. In the report, show the run times of your optimized code compared
against some baseline inferior version(s). Use figures and/or tables to show
7

ECE420: Lab 3

University of Alberta

the results under various setups (e.g., different scheduling policies and chunk
sizes, different parallelization strategies, and different numbers of threads
and so on) and under sufficiently large input(s). In other words, an effort for
performance optimization must be demonstrated in your lab report.
4. Please also refer to the “Lab Report Guide” for other requirements.
Submission Instructions:
Each team is required to submit BOTH a hard copy of printed lab report to the
assignment box AND the source code on eClass. The report should be submitted
in the assignment box on the 2nd floor of ECERF. Please check eClass for the
code and report submission due date.
For code submission, each team is required to submit a zip file to eClass. The
zip file should be named “StudentID-Hx.zip”, where “StudentID” is the Student
ID of one of your group members (doesn’t matter which member) and “Hx” is the
section (H1 or H2).
The zip file should contain the following files:
1. “readme”: a text file containing instructions on how to compile your source
files;
2. “members”: a text file listing the student IDs of ALL group members, with
each student ID occupying one line;
3. “Makefile”: the makefile to generate the executable. Please ensure that your
Makefile is located in the root folder of that zip file and the default “$make”
command will generate the final executable program named “main”;
4. All the necessary source files to build the executable “main”;
DO NOT include the compiled data generation file, “serialtester” file, or the
input/output data file.
Note: you MUST use the file names suggested above. File names are
case-sensitive. You MUST generate the required zip file by directly

8

ECE420: Lab 3

University of Alberta

compressing all the above files, rather than compressing a folder containing those files.

3

Hints and Tips
• To improve the performance, you might need to find out all components of
the program that each can be run in parallel first. Try to reduce the number
of forks and implicit joins due to repeated uses of parallel directives. In other
words, try to reuse the team of threads launched by a parallel directive.
• What kind of scheduling policies will yield the best result for each OpenMP
for loop?
• How many threads should be used?
• Should we use the same number of threads to handle each for loop? Or
should we vary the number of threads dynamically at different stages of
the program? Should we vary the scheduling policy, chunksizes and other
parameters?
• To guarantee the correctness, how should we effectively protect a critical
section if there is any?
• Would the single or master directive be useful here?
• Is it helpful to use the collapse clause if there is any nested loop?
• In the pivoting procedure, swap the index or pointers rather than swapping
all the row elements in memory.
• The debugging is similar to Pthread debugging. When compiling, instead
of using the “-g” flag, you might need to use the “-ggdb” flag to make the
debugger work well on an OpenMP multi-threaded program. You can try
other flags like “-gstabs”, “-gstabs+”. However, whether they would work
depends on the system.
Appendix
9

ECE420: Lab 3

A

University of Alberta

Marking Guideline

Code:
Correct implementation:
Speed:
Report:
Clear description of implementation:
Testing and verification:
Performance discussion:
Presentation:
Total:

2
1
1
1
4
1
10

10

Source Exif Data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 10
Producer                        : pdfTeX-1.40.16
Creator                         : TeX
Create Date                     : 2017:03:01 10:43:24-07:00
Modify Date                     : 2017:03:01 10:43:24-07:00
Trapped                         : False
PTEX Fullbanner                 : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015) kpathsea version 6.2.1

EXIF Metadata provided by EXIF.tools

Lab 3 Manual

Navigation menu

Versions of this User Manual:

Views

Navigation