Lab 3 Manual
User Manual:
Open the PDF directly: View PDF .
Page Count: 10
Download | |
Open PDF In Browser | View PDF |
ECE 420 Parallel and Distributed Programming Lab 3: Solving a Linear System of Equations via Gauss-Jordan Elimination using OpenMP∗ Winter 2017 1 Background Consider the problem of solving a linear system of equations a11 x11 + a12 x12 + ... + a1n x1n = b1 a21 x21 + a22 x22 + ... + a2n x2n = b2 ... ... an1 xn1 + an2 xn2 + ... + ann xnn = bn . Denote the coefficient matrix by A= a11 a12 a21 a22 ... ... an1 an2 ... a1n ... a2n ... ... ... ann the constant vector by ~b = ∗ b1 b2 ... bn In this manual, all the indeces start from 1. 1 , , ECE420: Lab 3 University of Alberta and the unknown variables by ~x = x1 x2 ... xn . The linear system of equations can be represented by A · ~x = ~b. The linear system of equations is actually characterized by the augmented matrix G, G = {A|~b} a11 a12 a 21 a22 = ... ... an1 an2 ... a1n ... a2n ... ... ... ann b1 b2 ... bn . Any operation on the original system of equations is equivalent to performing some corresponding operation on the augmented matrix G. And an augmented matrix can easily be mapped back to a linear system of equations. For simplicity, we will perform calculation on the augmented matrix. There are 3 types of linear operations (or row operations) which will not change the solution(s) to the linear system of equations: 1. interchanging any two rows; 2. multiplying each element of a row by a nonzero constant; 3. replacing a row by the sum of itself and a constant multiple of another row in the augmented matrix. For these row operations, we use the following notations: 1. Ri ↔ Rj : interchange the ith row and the j th row; 2. αRi : multiply each element of row i by a nonzero α; 3. Ri + αRj : replace row i with the sum of row i and α times row j. To solve the linear system of equations, the basic idea is to transform the original linear system to an equivalent new system via some linear operations. And the new system should be reduced to some good form such that every equation 2 ECE420: Lab 3 University of Alberta in the system has exactly one different variable with a nonzero coefficient, from which we can simply “read” the solutions. In other words, the target augmented matrix should be in the following form: d11 0 0 d22 ... ... 0 0 ... 0 ... 0 ... ... ... dnn b01 b02 ... b0n . The Gauss-Jordan Elimination is a procedure to achieve the above goal. 1.1 Gaussian Elimination with Partial Pivoting Gaussian Elimination will transform the augmented matrix into its equivalent “upper triangular” form, in which the elements below the main diagonal are all zeros. It iteratively eliminates the elements below the main diagonal from the first column to the last column via row operations. Algorithm 1 describes such a procedure. Note that the partial pivoting part is important in this procedure. It will prevent the case in which ukk is zero or close to zero. Thus, with partial pivoting, the program will be more numerically stable. Also, note that in the row replacement operation, j starts from k, since the first k − 1 elements are always zero in this algorithm. 1.2 Jordan Elimination After we obtain an “upper triangular” U from Gaussian Elimination, the Jordan Elimination can further transform it into the desired form. Similarly, the basic idea is to iteratively eliminate the elements above the main diagonal for each column, one after another. The inner for loop performs the row replacement. However, for each row, we only need to update the two elements dik and di(n+1) , since elements on the other columns actually stay the same. See the example for an illustration. After we get D, we can compute the desired solution ~x simply by xi = di(n+1) /dii , for any i. 3 ECE420: Lab 3 University of Alberta Algorithm 1 Gaussian Elimination Input: An augmented matrix G = {A|~b}, where A = (aij ) is an n × n matrix and ~b = (bi ) is an n-dimensional vector. Output: The augmented matrix U (the elements are denoted as uij ) that is equivalent to G and is in the “upper triangular” form. Initially, U ← G for k = 1 to n − 1 do/*eliminate elements below the diagonal to zero one column after another*/ /*Pivoting*/ In U, from row k to row n, find the row kp that has the maximum absolute value of the element in the k th column Swap row k and row kp /*Elimination*/ for i = k + 1 to n do temp = uik /ukk for j = k to n + 1 do uij ← uij − temp · ukj /*row replacement*/ endfor endfor endfor 4 ECE420: Lab 3 University of Alberta Algorithm 2 Jordan Elimination Input: The output of the Gaussian Elimination U (an n × (n + 1) matrix). Output: The augmented matrix D (with elements denoted by dij ) that is equivalent to G and is in our final target form. Initially, D ← U for k = n to 2 do/*eliminate elements to zero for each column one after another*/ for i = 1 to k − 1 do/*row replacement one row after another*/ di(n+1) ← di(n+1) − dik /dkk · dk(n+1) dik ← 0 endfor endfor 1.3 An Example We will give an example to demonstrate the described algorithms. Consider a linear system of equations: 2x11 + 4x12 − 2x13 = 3 −4x21 − 8x22 + 5x23 = −4 4x31 + 4x32 − 5x33 = 4. The corresponding augmented matrix is 2 4 −2 3 −4 −8 5 −4 4 4 −5 4 The Gauss-Jordan Elimination with partial pivoting on it will be 5 ECE420: Lab 3 University of Alberta 2 4 −2 3 −4 −8 5 −4 4 4 −5 4 −4 −8 5 −4 R ↔R2 −−1−−→ 4 −2 3 (pivoting) 2 4 4 −5 4 −4 −8 5 −4 R2 + 1 R1 1 −−−−2−→ 0 1 0 2 4 4 −5 4 −4 −8 5 −4 R +R1 −−3−−→ 0 12 1 0 0 −4 0 0 −4 −8 5 −4 R ↔R3 −−2−−→ 0 −4 0 0 (pivoting; it happens to be the end of Gaussian Elimination) 0 0 21 1 −4 −8 0 −14 R −10R3 −−1−−−→ 0 −4 0 0 (Starting Jordan Elimination) 0 0 12 1 −4 0 0 −14 R1 −2R2 −− −−→ 0 −4 0 0 0 0 12 1 6 ECE420: Lab 3 2 University of Alberta Task and Requirement Task: Using OpenMP, implement a program to solve linear systems of equations by Gauss-Jordan Elimination with partial pivoting. The input will be a coefficient matrix A and a vector ~b. The output will be a vector ~x, where A · ~x = ~b. Requirements and Remarks: • Use the scripts in “Development Kit Lab 3” to generate input data, load data and save results. Refer to the ReadMe file for details on how to use them. • Time measurement should be implemented. • The number of threads should be passed as the only command line argument to your program. • Optimize the performance of your implementation as much as possible using the techniques learned in class. • You don’t need to consider the singular cases, i.e., a linear system with no solution or an infinite number of solutions. The input data generated by “datagen.c” will avoid such cases. You do need to include the partial pivoting procedure in your code to make the computed results correct and numerically stable. Lab Report Requirements: 1. Describe your implementation clearly. 2. Compare and discuss the performance (speedup or efficiency) of your implementation under different numbers of threads used. Explain your results. 3. Present the design choices and performance optimization analysis in your lab report. In the report, show the run times of your optimized code compared against some baseline inferior version(s). Use figures and/or tables to show 7 ECE420: Lab 3 University of Alberta the results under various setups (e.g., different scheduling policies and chunk sizes, different parallelization strategies, and different numbers of threads and so on) and under sufficiently large input(s). In other words, an effort for performance optimization must be demonstrated in your lab report. 4. Please also refer to the “Lab Report Guide” for other requirements. Submission Instructions: Each team is required to submit BOTH a hard copy of printed lab report to the assignment box AND the source code on eClass. The report should be submitted in the assignment box on the 2nd floor of ECERF. Please check eClass for the code and report submission due date. For code submission, each team is required to submit a zip file to eClass. The zip file should be named “StudentID-Hx.zip”, where “StudentID” is the Student ID of one of your group members (doesn’t matter which member) and “Hx” is the section (H1 or H2). The zip file should contain the following files: 1. “readme”: a text file containing instructions on how to compile your source files; 2. “members”: a text file listing the student IDs of ALL group members, with each student ID occupying one line; 3. “Makefile”: the makefile to generate the executable. Please ensure that your Makefile is located in the root folder of that zip file and the default “$make” command will generate the final executable program named “main”; 4. All the necessary source files to build the executable “main”; DO NOT include the compiled data generation file, “serialtester” file, or the input/output data file. Note: you MUST use the file names suggested above. File names are case-sensitive. You MUST generate the required zip file by directly 8 ECE420: Lab 3 University of Alberta compressing all the above files, rather than compressing a folder containing those files. 3 Hints and Tips • To improve the performance, you might need to find out all components of the program that each can be run in parallel first. Try to reduce the number of forks and implicit joins due to repeated uses of parallel directives. In other words, try to reuse the team of threads launched by a parallel directive. • What kind of scheduling policies will yield the best result for each OpenMP for loop? • How many threads should be used? • Should we use the same number of threads to handle each for loop? Or should we vary the number of threads dynamically at different stages of the program? Should we vary the scheduling policy, chunksizes and other parameters? • To guarantee the correctness, how should we effectively protect a critical section if there is any? • Would the single or master directive be useful here? • Is it helpful to use the collapse clause if there is any nested loop? • In the pivoting procedure, swap the index or pointers rather than swapping all the row elements in memory. • The debugging is similar to Pthread debugging. When compiling, instead of using the “-g” flag, you might need to use the “-ggdb” flag to make the debugger work well on an OpenMP multi-threaded program. You can try other flags like “-gstabs”, “-gstabs+”. However, whether they would work depends on the system. Appendix 9 ECE420: Lab 3 A University of Alberta Marking Guideline Code: Correct implementation: Speed: Report: Clear description of implementation: Testing and verification: Performance discussion: Presentation: Total: 2 1 1 1 4 1 10 10
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 10 Producer : pdfTeX-1.40.16 Creator : TeX Create Date : 2017:03:01 10:43:24-07:00 Modify Date : 2017:03:01 10:43:24-07:00 Trapped : False PTEX Fullbanner : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015) kpathsea version 6.2.1EXIF Metadata provided by EXIF.tools