Lab 3 Manual

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 10

ECE 420 Parallel and Distributed Programming
Lab 3: Solving a Linear System of Equations via
Gauss-Jordan Elimination using OpenMP
Winter 2017
1 Background
Consider the problem of solving a linear system of equations
a11x11 +a12x12 +... +a1nx1n=b1
a21x21 +a22x22 +... +a2nx2n=b2
... ...
an1xn1+an2xn2+... +annxnn =bn.
Denote the coefficient matrix by
A=
a11 a12 ... a1n
a21 a22 ... a2n
... ... ... ...
an1an2... ann
,
the constant vector by
~
b=
b1
b2
...
bn
,
In this manual, all the indeces start from 1.
1
ECE420: Lab 3 University of Alberta
and the unknown variables by
~x =
x1
x2
...
xn
.
The linear system of equations can be represented by
A·~x =~
b.
The linear system of equations is actually characterized by the augmented
matrix G,
G={A|~
b}
=
a11 a12 ... a1nb1
a21 a22 ... a2nb2
... ... ... ... ...
an1an2... ann bn
.
Any operation on the original system of equations is equivalent to performing
some corresponding operation on the augmented matrix G. And an augmented
matrix can easily be mapped back to a linear system of equations. For simplicity,
we will perform calculation on the augmented matrix.
There are 3 types of linear operations (or row operations) which will not change
the solution(s) to the linear system of equations:
1. interchanging any two rows;
2. multiplying each element of a row by a nonzero constant;
3. replacing a row by the sum of itself and a constant multiple of another row
in the augmented matrix.
For these row operations, we use the following notations:
1. RiRj: interchange the ith row and the jth row;
2. αRi: multiply each element of row iby a nonzero α;
3. Ri+αRj: replace row iwith the sum of row iand αtimes row j.
To solve the linear system of equations, the basic idea is to transform the
original linear system to an equivalent new system via some linear operations. And
the new system should be reduced to some good form such that every equation
2
ECE420: Lab 3 University of Alberta
in the system has exactly one different variable with a nonzero coefficient, from
which we can simply “read” the solutions. In other words, the target augmented
matrix should be in the following form:
d11 0... 0b0
1
0d22 ... 0b0
2
... ... ... ... ...
0 0 ... dnn b0
n
.
The Gauss-Jordan Elimination is a procedure to achieve the above goal.
1.1 Gaussian Elimination with Partial Pivoting
Gaussian Elimination will transform the augmented matrix into its equivalent
“upper triangular” form, in which the elements below the main diagonal are all
zeros. It iteratively eliminates the elements below the main diagonal from the
first column to the last column via row operations. Algorithm 1 describes such a
procedure.
Note that the partial pivoting part is important in this procedure. It will pre-
vent the case in which ukk is zero or close to zero. Thus, with partial pivoting, the
program will be more numerically stable. Also, note that in the row replacement
operation, jstarts from k, since the first k1 elements are always zero in this
algorithm.
1.2 Jordan Elimination
After we obtain an “upper triangular” Ufrom Gaussian Elimination, the Jordan
Elimination can further transform it into the desired form. Similarly, the basic idea
is to iteratively eliminate the elements above the main diagonal for each column,
one after another.
The inner for loop performs the row replacement. However, for each row, we
only need to update the two elements dik and di(n+1), since elements on the other
columns actually stay the same. See the example for an illustration.
After we get D, we can compute the desired solution ~x simply by
xi=di(n+1)/dii,for any i.
3
ECE420: Lab 3 University of Alberta
Algorithm 1 Gaussian Elimination
Input: An augmented matrix G={A|~
b}, where A= (aij ) is an n×nmatrix
and ~
b= (bi) is an n-dimensional vector.
Output: The augmented matrix U(the elements are denoted as uij ) that is
equivalent to Gand is in the “upper triangular” form.
Initially, UG
for k= 1 to n1do/*eliminate elements below the diagonal to zero one
column after another*/
/*Pivoting*/
In U, from row kto row n, find the row kpthat has the maximum absolute
value of the element in the kth column
Swap row kand row kp
/*Elimination*/
for i=k+ 1 to ndo
temp =uik/ukk
for j=kto n+ 1 do
uij uij temp ·ukj /*row replacement*/
endfor
endfor
endfor
4
ECE420: Lab 3 University of Alberta
Algorithm 2 Jordan Elimination
Input: The output of the Gaussian Elimination U(an n×(n+ 1) matrix).
Output: The augmented matrix D(with elements denoted by dij ) that is
equivalent to Gand is in our final target form.
Initially, DU
for k=nto 2 do/*eliminate elements to zero for each column one after
another*/
for i= 1 to k1do/*row replacement one row after another*/
di(n+1) di(n+1) dik/dkk ·dk(n+1)
dik 0
endfor
endfor
1.3 An Example
We will give an example to demonstrate the described algorithms. Consider a
linear system of equations:
2x11 + 4x12 2x13 = 3
4x21 8x22 + 5x23 =4
4x31 + 4x32 5x33 = 4.
The corresponding augmented matrix is
2 4 2 3
48 5 4
4 4 5 4
The Gauss-Jordan Elimination with partial pivoting on it will be
5
ECE420: Lab 3 University of Alberta
2 4 2 3
48 5 4
4 4 5 4
R1R2
48 5 4
2 4 2 3
4 4 5 4
(pivoting)
R2+1
2R1
48 5 4
0 0 1
21
4 4 5 4
R3+R1
48 5 4
0 0 1
21
04 0 0
R2R3
48 5 4
04 0 0
0 0 1
21
(pivoting; it happens to be the end of Gaussian Elimination)
R110R3
48 0 14
04 0 0
0 0 1
21
(Starting Jordan Elimination)
R12R2
4 0 0 14
04 0 0
0 0 1
21
6
ECE420: Lab 3 University of Alberta
2 Task and Requirement
Task: Using OpenMP, implement a program to solve linear systems of equations
by Gauss-Jordan Elimination with partial pivoting. The input will be a coefficient
matrix Aand a vector ~
b. The output will be a vector ~x, where
A·~x =~
b.
Requirements and Remarks:
Use the scripts in “Development Kit Lab 3” to generate input data, load
data and save results. Refer to the ReadMe file for details on how to use
them.
Time measurement should be implemented.
The number of threads should be passed as the only command line argument
to your program.
Optimize the performance of your implementation as much as possible using
the techniques learned in class.
You don’t need to consider the singular cases, i.e., a linear system with
no solution or an infinite number of solutions. The input data generated
by “datagen.c” will avoid such cases. You do need to include the partial
pivoting procedure in your code to make the computed results correct and
numerically stable.
Lab Report Requirements:
1. Describe your implementation clearly.
2. Compare and discuss the performance (speedup or efficiency) of your imple-
mentation under different numbers of threads used. Explain your results.
3. Present the design choices and performance optimization analysis in your lab
report. In the report, show the run times of your optimized code compared
against some baseline inferior version(s). Use figures and/or tables to show
7
ECE420: Lab 3 University of Alberta
the results under various setups (e.g., different scheduling policies and chunk
sizes, different parallelization strategies, and different numbers of threads
and so on) and under sufficiently large input(s). In other words, an effort for
performance optimization must be demonstrated in your lab report.
4. Please also refer to the “Lab Report Guide” for other requirements.
Submission Instructions:
Each team is required to submit BOTH a hard copy of printed lab report to the
assignment box AND the source code on eClass. The report should be submitted
in the assignment box on the 2nd floor of ECERF. Please check eClass for the
code and report submission due date.
For code submission, each team is required to submit a zip file to eClass. The
zip file should be named “StudentID-Hx.zip”, where “StudentID” is the Student
ID of one of your group members (doesn’t matter which member) and “Hx” is the
section (H1 or H2).
The zip file should contain the following files:
1. “readme”: a text file containing instructions on how to compile your source
files;
2. “members”: a text file listing the student IDs of ALL group members, with
each student ID occupying one line;
3. “Makefile”: the makefile to generate the executable. Please ensure that your
Makefile is located in the root folder of that zip file and the default “$make”
command will generate the final executable program named “main”;
4. All the necessary source files to build the executable “main”;
DO NOT include the compiled data generation file, “serialtester” file, or the
input/output data file.
Note: you MUST use the file names suggested above. File names are
case-sensitive. You MUST generate the required zip file by directly
8
ECE420: Lab 3 University of Alberta
compressing all the above files, rather than compressing a folder con-
taining those files.
3 Hints and Tips
To improve the performance, you might need to find out all components of
the program that each can be run in parallel first. Try to reduce the number
of forks and implicit joins due to repeated uses of parallel directives. In other
words, try to reuse the team of threads launched by a parallel directive.
What kind of scheduling policies will yield the best result for each OpenMP
for loop?
How many threads should be used?
Should we use the same number of threads to handle each for loop? Or
should we vary the number of threads dynamically at different stages of
the program? Should we vary the scheduling policy, chunksizes and other
parameters?
To guarantee the correctness, how should we effectively protect a critical
section if there is any?
Would the single or master directive be useful here?
Is it helpful to use the collapse clause if there is any nested loop?
In the pivoting procedure, swap the index or pointers rather than swapping
all the row elements in memory.
The debugging is similar to Pthread debugging. When compiling, instead
of using the “-g” flag, you might need to use the “-ggdb” flag to make the
debugger work well on an OpenMP multi-threaded program. You can try
other flags like “-gstabs”, “-gstabs+”. However, whether they would work
depends on the system.
Appendix
9
ECE420: Lab 3 University of Alberta
A Marking Guideline
Code:
Correct implementation: 2
Speed: 1
Report:
Clear description of implementation: 1
Testing and verification: 1
Performance discussion: 4
Presentation: 1
Total: 10
10

Navigation menu