ANL-95/11 Rev 3.8

Mathematics and Computer Science Division
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240
Argonne, IL 60439
www.anl.gov

PETSc Users Manual
Revision 3.8

Mathematics and Computer Science Division

Argonne National Laboratory is a U.S. Department of Energy
laboratory managed by UChicago Argonne, LLC

About Argonne National Laboratory
Argonne is a U.S. Department of Energy laboratory managed by UChicago Argonne, LLC
under contract DE-AC02-06CH11357. The Laboratory’s main facility is outside Chicago,
at 9700 South Cass Avenue, Argonne, Illinois 60439. For information about Argonne
and its pioneering science and technology programs, see www.anl.gov.

DOCUMENT AVAILABILITY
Online Access: U.S. Department of Energy (DOE) reports produced after 1991 and a
growing number of pre-1991 documents are available free via DOE's SciTech Connect
(http://www.osti.gov/scitech/)
Reports not in digital format may be purchased by the public from the National
Technical Information Service (NTIS):
U.S. Department of Commerce
National Technical Information Service
5301 Shawnee Rd
Alexandria, VA 22312
www.ntis.gov
Phone: (800) 553-NTIS (6847) or (703) 605-6000
Fax: (703) 605-6900
Email: orders@ntis.gov
Reports not in digital format are available to DOE and DOE contractors from the Office of
Scientific and Technical Information (OSTI):
U.S. Department of Energy
Office of Scientific and Technical Information
P.O. Box 62
Oak Ridge, TN 37831-0062
www.osti.gov
Phone: (865) 576-8401
Fax: (865) 576-5728
Email: reports@osti.gov

Disclaimer
This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States
Government nor any agency thereof, nor UChicago Argonne, LLC, nor any of their employees or officers, makes any warranty, express
or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific
commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply
its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of
document authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof,
Argonne National Laboratory, or UChicago Argonne, LLC.

ANL-95/11 Rev 3.8

PETSc Users Manual
Revision 3.8

Prepared by
S. Balay1 , S. Abhyankar2 , M. Adams3 , J. Brown1 , P. Brune1 , K. Buschelman1 , L.
Dalcin4 , V. Eijkhout6 , W. Gropp1 , D. Karpeyev1 , D. Kaushik1 , M. Knepley1 , D. May7 ,
L. Curfman McInnes1 , T. Munson1 , K. Rupp1 , P. Sanan8 , B. Smith1 , S. Zampini4 , H.
Zhang5 , and H. Zhang1
1 Mathematics and Computer Science Division, Argonne National Laboratory
2 Energy Systems Division, Argonne National Laboratory
3 Computational Research, Lawrence Berkeley National Laboratory
4 Extreme Computing Research Center, King Abdullah University of Science and Technology
5 Computer Science Department, Illinois Institute of Technology
6 Texas Advanced Computing Center, University of Texas at Austin
7 Department of Earth Sciences, University of Oxford
8 Institute of Geophysics, ETH Zurich

September 2017
This work was supported by the Office of Advanced Scientific Computing Research,
Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.


Abstract:
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for
Scientific Computation (PETSc) is a suite of data structures and routines that provide the building
blocks for the implementation of large-scale application codes on parallel (and serial) computers.
PETSc uses the MPI standard for all message-passing communication.
PETSc includes an expanding suite of parallel linear solvers, nonlinear solvers, and time integrators that may be used in application codes written in Fortran, C, C++, and Python (via
petsc4py). PETSc provides many of the mechanisms needed within parallel application codes, such
as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling
users to employ the level of abstraction that is most appropriate for a particular problem. By using
techniques of object-oriented programming, PETSc provides enormous flexibility for users.
PETSc is a sophisticated set of software tools; as such, for some users it initially has a much
steeper learning curve than a simple subroutine library. In particular, for individuals without some
computer science background, experience programming in C, C++ or Fortran and experience using
a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage
of the features that enable efficient software use. However, the power of the PETSc design and
the algorithms it incorporates may make the efficient implementation of many application codes
simpler than “rolling them” yourself.
• For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for
the classes of problems for which effective MATLAB code can be written.
• PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise
sequential code. Certainly all parts of a previously sequential code need not be parallelized
but the matrix generation portion must be parallelized to expect any kind of reasonable
performance. Do not expect to generate your matrix sequentially and then “use PETSc” to
solve the linear system in parallel.
Since PETSc is under continued development, small changes in usage and calling sequences
of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for
information on contacting support.
A list of publications and web sites that feature work involving PETSc may be found at
http://www.mcs.anl.gov/petsc/publications.
We welcome any reports of corrections for this document.

Getting Information on PETSc:
Online:
• Manual pages and example usage : http://www.mcs.anl.gov/petsc/documentation
• Installing PETSc : http://www.mcs.anl.gov/petsc/documentation/installation.html
• Tutorials : https://www.mcs.anl.gov/petsc/documentation/tutorials/index.html
In this manual:
• Complete contents, page 13
• Basic introduction, page 21
• Assembling vectors, page 49; and matrices, page 65
• Linear solvers, page 79
• Nonlinear solvers, page 105
• Timestepping (ODE) solvers, page 143
• Structured grids, page 54
• Unstructured meshes, pages 59 and 163
• Index, page 247

Acknowledgments:
We thank all PETSc users for their many suggestions, bug reports, and encouragement.
Recent contributors to PETSc can be seen by visualizing the history of the PETSc git repository,
for example at github.com/petsc/petsc/graphs/contributors.
Earlier contributors to PETSc include:
• Asbjorn Hoiland Aarrestad - the explicit Runge-Kutta implementations (TSRK);
• G. Anciaux and J. Roman - the interfaces to the partitioning packages PTScotch, Chaco, and
Party;
• Allison Baker - the flexible GMRES (KSPFGMRES) and LGMRES (KSPLGMRES) code;
• Chad Carroll - Win32 graphics;
• Ethan Coon - the PetscBag and many bug fixes;
• Cameron Cooper - portions of the VecScatter routines;
• Patrick Farrell - the nleqerr line search for SNES;
• Paulo Goldfeld - the balancing Neumann-Neumann preconditioner (PCNN);
• Matt Hille;
• Joel Malard - the BICGStab(l) implementation (KSPBCGSL);
• Paul Mullowney - enhancements to portions of the Nvidia GPU interface;
• Dave May - the GCR implementation (KSPGCR);
• Peter Mell - portions of the DMDA routines;
• Richard Mills - the AIJPERM matrix format (MATAIJPERM) for the Cray X1 and universal F90
array interface;
• Victor Minden - the NVIDIA GPU interface;
• Todd Munson - the LUSOL (sparse solver in MINOS) interface (MATSOLVERLUSOL) and several
Krylov methods;
• Adam Powell - the PETSc Debian package;
• Robert Scheichl - the MINRES implementation (KSPMINRES);
• Kerry Stevens - the pthread-based Vec and Mat classes plus the various thread pools (deprecated);
• Karen Toonen - design and implementation of much of the PETSc web pages;
• Desire Nuentsa Wakam - the deflated GMRES implementation (KSPDGMRES);
• Liyang Xu - the interface to PVODE, now SUNDIALS/CVODE (TSSUNDIALS);

PETSc source code contains modified routines from the following public domain software packages:
• LINPACK - dense matrix factorization and solve; converted to C using f2c and then hand-optimized for small matrix sizes, for block matrix data structures;
• MINPACK - see page 138; sequential matrix coloring routines for finite difference Jacobian
evaluations; converted to C using f2c;
• SPARSPAK - see page 88; matrix reordering routines, converted to C using f2c;
• libtfs - the efficient, parallel direct solver developed by Henry Tufo and Paul Fischer for the
direct solution of a coarse grid problem (a linear system with very few degrees of freedom per
processor).

PETSc interfaces to the following external software:
• BLAS and LAPACK - numerical linear algebra;
• Chaco - A graph partitioning package;
http://www.cs.sandia.gov/CRF/chac.html
• ESSL - IBM’s math library for fast sparse direct LU factorization;
• Elemental - Jack Poulson’s parallel dense matrix solver package;
http://libelemental.org/
• HDF5 - the data model, library, and file format for storing and managing data,
https://support.hdfgroup.org/HDF5/
• hypre - the LLNL preconditioner library;
http://www.llnl.gov/CASC/hypre
• LUSOL - sparse LU factorization code (part of MINOS) developed by Michael Saunders,
Systems Optimization Laboratory, Stanford University;
http://www.sbsi-sol-optimize.com/
• MATLAB - see page 191;
• MUMPS - see page 102, MUltifrontal Massively Parallel sparse direct Solver developed by
Patrick Amestoy, Iain Duff, Jacko Koster, and Jean-Yves L’Excellent;
http://www.enseeiht.fr/lima/apo/MUMPS/credits.html
• Metis/ParMeTiS - see page 77, parallel graph partitioner,
http://www-users.cs.umn.edu/~karypis/metis/
• Party - A graph partitioning package;
http://www2.cs.uni-paderborn.de/cs/ag-monien/PERSONAL/ROBSY/party.html
• PaStiX - Parallel LU and Cholesky solvers;
http://pastix.gforge.inria.fr/
• PTScotch - A graph partitioning package;
http://www.labri.fr/Perso/~pelegrin/scotch/

• SPAI - for parallel sparse approximate inverse preconditioning;
https://cccs.unibas.ch/lehre/software-packages/
• SuiteSparse - see page 102, developed by Timothy A. Davis;
http://faculty.cse.tamu.edu/davis/suitesparse.html
• SUNDIALS/CVODE - see page 153, parallel ODE integrator;
http://www.llnl.gov/CASC/sundials/
• SuperLU and SuperLU Dist - see page 102, the efficient sparse LU codes developed by Jim
Demmel, Xiaoye S. Li, and John Gilbert;
http://crd-legacy.lbl.gov/~xiaoye/SuperLU
• STRUMPACK - the STRUctured Matrix Package;
http://portal.nersc.gov/project/sparse/strumpack/
• Triangle and Tetgen - mesh generation packages;
https://www.cs.cmu.edu/~quake/triangle.html
http://wias-berlin.de/software/tetgen/
• Trilinos/ML - Sandia’s main multigrid preconditioning package;
http://software.sandia.gov/trilinos/,
• Zoltan - graph partitioners from Sandia National Laboratory;
http://www.cs.sandia.gov/zoltan/
These are all optional packages and do not need to be installed to use PETSc.
PETSc software is developed and maintained using
• Emacs editor
• Git revision control system
• Python
PETSc documentation has been generated using
• Sowing text processing tools developed by Bill Gropp
http://wgropp.cs.illinois.edu/projects/software/sowing/
• c2html
• pdflatex
• python

Contents

Abstract    5

I  Introduction to PETSc    19

1  Getting Started    21
   1.1  Suggested Reading    22
   1.2  Running PETSc Programs    23
   1.3  Running PETSc Tests    24
   1.4  Writing PETSc Programs    25
   1.5  Simple PETSc Examples    26
   1.6  Citing PETSc    41
   1.7  Directory Structure    42

II  Programming with PETSc    45

2  Vectors and Parallel Data    47
   2.1  Creating and Assembling Vectors    47
   2.2  Basic Vector Operations    49
   2.3  Indexing and Ordering    51
        2.3.1  Application Orderings    51
        2.3.2  Local to Global Mappings    53
   2.4  Structured Grids Using Distributed Arrays    54
        2.4.1  Creating Distributed Arrays    54
        2.4.2  Local/Global Vectors and Scatters    55
        2.4.3  Local (Ghosted) Work Vectors    56
        2.4.4  Accessing the Vector Entries for DMDA Vectors    57
        2.4.5  Grid Information    58
   2.5  Vectors Related to Unstructured Grids    59
        2.5.1  Index Sets    59
        2.5.2  Scatters and Gathers    60
        2.5.3  Scattering Ghost Values    62
        2.5.4  Vectors with Locations for Ghost Values    62

3  Matrices    65
   3.1  Creating and Assembling Matrices    65
        3.1.1  Sparse Matrices    67
        3.1.2  Dense Matrices    70
        3.1.3  Block Matrices    71
   3.2  Basic Matrix Operations    73
   3.3  Matrix-Free Matrices    73
   3.4  Other Matrix Operations    75
   3.5  Partitioning    77

4  KSP: Linear System Solvers    79
   4.1  Using KSP    79
   4.2  Solving Successive Linear Systems    80
   4.3  Krylov Methods    81
        4.3.1  Preconditioning within KSP    82
        4.3.2  Convergence Tests    84
        4.3.3  Convergence Monitoring    84
        4.3.4  Understanding the Operator's Spectrum    85
        4.3.5  Other KSP Options    86
   4.4  Preconditioners    86
        4.4.1  ILU and ICC Preconditioners    87
        4.4.2  SOR and SSOR Preconditioners    88
        4.4.3  LU Factorization    88
        4.4.4  Block Jacobi and Overlapping Additive Schwarz Preconditioners    89
        4.4.5  Algebraic Multigrid (AMG) Preconditioners    91
        4.4.6  Balancing Domain Decomposition by Constraints    93
        4.4.7  Shell Preconditioners    95
        4.4.8  Combining Preconditioners    95
        4.4.9  Multigrid Preconditioners    97
   4.5  Solving Block Matrices    99
   4.6  Solving Singular Systems    102
   4.7  Using External Linear Solvers    102

5  SNES: Nonlinear Solvers    105
   5.1  Basic SNES Usage    105
        5.1.1  Nonlinear Function Evaluation    113
        5.1.2  Jacobian Evaluation    113
   5.2  The Nonlinear Solvers    114
        5.2.1  Line Search Newton    114
        5.2.2  Trust Region Methods    116
        5.2.3  Nonlinear Krylov Methods    116
        5.2.4  Quasi-Newton Methods    117
        5.2.5  The Full Approximation Scheme    117
        5.2.6  Nonlinear Additive Schwarz    118
   5.3  General Options    118
        5.3.1  Convergence Tests    118
        5.3.2  Convergence Monitoring    119
        5.3.3  Checking Accuracy of Derivatives    120
   5.4  Inexact Newton-like Methods    120
   5.5  Matrix-Free Methods    121
   5.6  Finite Difference Jacobian Approximations    138
   5.7  Variational Inequalities    140
   5.8  Nonlinear Preconditioning    140

6  TS: Scalable ODE and DAE Solvers    143
   6.1  Basic TS Options    145
   6.2  Using Implicit-Explicit (IMEX) Methods    146
   6.3  Using fully implicit methods    150
   6.4  Using the Explicit Runge-Kutta timestepper with variable timesteps    150
   6.5  Special Cases    150
   6.6  Monitoring and visualizing solutions    151
   6.7  Error control via variable time-stepping    151
   6.8  Handling of discontinuities    152
   6.9  Using TChem from PETSc    153
   6.10 Using Sundials from PETSc    153

7  Performing sensitivity analysis    155
   7.1  Using the discrete adjoint methods    155
   7.2  Checkpointing    158

8  Solving Steady-State Problems with Pseudo-Timestepping    159

9  High Level Support for Multigrid with KSPSetDM() and SNESSetDM()    161

10 DMPlex: Unstructured Grids in PETSc    163
   10.1 Representing Unstructured Grids    163
   10.2 Data on Unstructured Grids    165
        10.2.1 Data Layout    165
        10.2.2 Partitioning and Ordering    166
   10.3 Evaluating Residuals    166
   10.4 Networks    167
        10.4.1 Application flow    168
        10.4.2 Utility functions    169
        10.4.3 Retrieving components    169

III  Additional Information    171

11 PETSc for Fortran Users    173
   11.1 C vs. Fortran Interfaces    173
        11.1.1 Include Files    173
        11.1.2 Error Checking    173
        11.1.3 Calling Fortran Routines from C (and C Routines from Fortran)    174
        11.1.4 Passing Null Pointers    174
        11.1.5 Duplicating Multiple Vectors    175
        11.1.6 Matrix, Vector and IS Indices    175
        11.1.7 Setting Routines    175
        11.1.8 Compiling and Linking Fortran Programs    175
        11.1.9 Routines with Different Fortran Interfaces    176
   11.2 Sample Fortran Programs    176
        11.2.1 Array Arguments    189

12 Using MATLAB with PETSc    191
   12.1 Dumping Data for MATLAB    191
        12.1.1 Dumping ASCII MATLAB data    191
        12.1.2 Dumping Binary Data for MATLAB    191
   12.2 Sending Data to an Interactive MATLAB Session    192
   12.3 Using the MATLAB Compute Engine    192

13 Profiling    195
   13.1 Basic Profiling Information    195
        13.1.1 Interpreting -log_view Output: The Basics    195
        13.1.2 Interpreting -log_view Output: Parallel Performance    197
        13.1.3 Using -log_mpe with Jumpshot    199
   13.2 Profiling Application Codes    199
   13.3 Profiling Multiple Sections of Code    200
   13.4 Restricting Event Logging    200
   13.5 Interpreting -log_info Output: Informative Messages    201
   13.6 Time    202
   13.7 Saving Output to a File    202
   13.8 Accurate Profiling and Paging Overheads    202

14 Hints for Performance Tuning    205
   14.1 Maximizing Memory Bandwidth    205
        14.1.1 Memory Bandwidth vs. Processes    205
        14.1.2 Non-Uniform Memory Access (NUMA) and Process Placement    207
   14.2 Performance Pitfalls and Advice    211
        14.2.1 Debug vs. Optimized Builds    211
        14.2.2 Profiling    212
        14.2.3 Aggregation    212
        14.2.4 Memory Allocation for Sparse Matrix Assembly    213
        14.2.5 Memory Allocation for Sparse Matrix Factorization    213
        14.2.6 Detecting Memory Allocation Problems    213
        14.2.7 Data Structure Reuse    214
        14.2.8 Numerical Experiments    214
        14.2.9 Tips for Efficient Use of Linear Solvers    215
        14.2.10 System-Related Problems    215

15 Other PETSc Features    217
   15.1 PETSc on a process subset    217
   15.2 Runtime Options    217
        15.2.1 The Options Database    217
        15.2.2 Options Prefixes    218
        15.2.3 User-Defined PetscOptions    218
        15.2.4 Keeping Track of Options    219
   15.3 Viewers: Looking at PETSc Objects    219
        15.3.1 Viewing From Options    221
        15.3.2 Using Viewers to Check Load Imbalance    221
   15.4 Using SAWs with PETSc    221
   15.5 Debugging    222
   15.6 Error Handling    222
   15.7 Numbers    224
   15.8 Parallel Communication    224
   15.9 Graphics    224
        15.9.1 Windows as PetscViewers    225
        15.9.2 Simple PetscDrawing    225
        15.9.3 Line Graphs    226
        15.9.4 Graphical Convergence Monitor    229
        15.9.5 Disabling Graphics at Compile Time    229
   15.10 Emacs Users    229
   15.11 Vi and Vim Users    230
   15.12 Eclipse Users    230
   15.13 Qt Creator Users    231
   15.14 Visual Studio Users    233
   15.15 XCode Users (The Apple GUI Development System)    233
        15.15.1 Mac OS X    233
        15.15.2 iPhone/iPad iOS    233

16 Makefiles    235
   16.1 Makefile System    235
        16.1.1 Makefile Commands    235
        16.1.2 Customized Makefiles    236
   16.2 PETSc Flags    236
        16.2.1 Sample Makefiles    236
   16.3 Limitations    239

17 Unimportant and Advanced Features of Matrices and Solvers    241
   17.1 Extracting Submatrices    241
   17.2 Matrix Factorization    241
   17.3 Unimportant Details of KSP    243
   17.4 Unimportant Details of PC    244

Index    247

Bibliography    263

Part I

Introduction to PETSc


Chapter 1

Getting Started
The Portable, Extensible Toolkit for Scientific Computation (PETSc) has successfully demonstrated
that the use of modern programming paradigms can ease the development of large-scale scientific
application codes in Fortran, C, C++, and Python. Begun over 20 years ago, the software has
evolved into a powerful set of tools for the numerical solution of partial differential equations and
related problems on high-performance computers.
PETSc consists of a variety of libraries (similar to classes in C++), which are discussed in detail
in Parts II and III of the users manual. Each library manipulates a particular family of objects
(for instance, vectors) and the operations one would like to perform on the objects. The objects
and operations in PETSc are derived from our long experiences with scientific computation. Some
of the PETSc modules deal with
• index sets (IS), including permutations, for indexing into vectors, renumbering, etc;
• vectors (Vec);
• matrices (Mat) (generally sparse);
• managing interactions between mesh data structures and vectors and matrices (DM);
• over thirty Krylov subspace methods (KSP);
• dozens of preconditioners, including multigrid, block solvers, and sparse direct solvers (PC);
• nonlinear solvers (SNES); and
• timesteppers for solving time-dependent (nonlinear) PDEs, including support for differential
  algebraic equations, and the computation of adjoints (sensitivities/gradients of the solutions)
  (TS)

Each consists of an abstract interface (simply a set of calling sequences) and one or more implementations using particular data structures. Thus, PETSc provides clean and effective codes for
the various phases of solving PDEs, with a uniform approach for each class of problem. This design
enables easy comparison and use of different algorithms (for example, to experiment with different
Krylov subspace methods, preconditioners, or truncated Newton methods). Hence, PETSc provides a rich environment for modeling scientific applications as well as for rapid algorithm design
and prototyping.
The libraries enable easy customization and extension of both algorithms and implementations.
This approach promotes code reuse and flexibility, and separates the issues of parallelism from
the choice of algorithms. The PETSc infrastructure creates a foundation for building large-scale
applications.
It is useful to consider the interrelationships among different pieces of PETSc. Figure 1 is a
diagram of some of these pieces. The figure illustrates the library’s hierarchical organization, which
enables users to employ the level of abstraction that is most appropriate for a particular problem.

[Figure 1: Numerical libraries of PETSc. Application codes and higher-level libraries are layered
on the PETSc components TS (time steppers), SNES (nonlinear solvers), TAO (optimization), KSP
(Krylov subspace methods), PC (preconditioners), DM (domain management), Mat (operators), Vec
(vectors), and IS (index sets), with BLAS/LAPACK and MPI underneath, arranged by increasing
level of abstraction.]

1.1 Suggested Reading

The manual is divided into three parts:
• Part I - Introduction to PETSc
• Part II - Programming with PETSc
• Part III - Additional Information
Part I describes the basic procedure for using the PETSc library and presents two simple examples of solving linear systems with PETSc. This section conveys the typical style used throughout
the library and enables the application programmer to begin using the software immediately. Part
I is also distributed separately for individuals interested in an overview of the PETSc software, excluding the details of library usage. Readers of this separate distribution of Part I should note that
all references within the text to particular chapters and sections indicate locations in the complete
users manual.
Part II explains in detail the use of the various PETSc libraries, such as vectors, matrices, index
sets, linear and nonlinear solvers, and graphics. Part III describes a variety of useful information,
including profiling, the options database, viewers, error handling, makefiles, and some details of
PETSc design.
PETSc has evolved to become quite a comprehensive package, and therefore the PETSc Users
Manual can be rather intimidating for new users. We recommend that one initially read the entire
document before proceeding with serious use of PETSc, but bear in mind that PETSc can be used
efficiently before one understands all of the material presented here. Furthermore, the definitive
reference for any PETSc function is always the online manual page (“manpage”).
Within the PETSc distribution, the directory ${PETSC_DIR}/docs contains all documentation.
Manual pages for all PETSc functions can be accessed at www.mcs.anl.gov/petsc/documentation.
The manual pages provide hyperlinked indices (organized by both concept and routine name) to
the tutorial examples and enable easy movement among related topics.
Emacs and Vi/Vim users may find the etags/ctags option to be extremely useful for exploring
the PETSc source code. Details of this feature are provided in Section 15.10.
The file manual.pdf contains the complete PETSc Users Manual in the portable document
format (PDF), while intro.pdf includes only the introductory segment, Part I. The complete
PETSc distribution, users manual, manual pages, and additional information are also available via
the PETSc home page at www.mcs.anl.gov/petsc. The PETSc home page also contains details
regarding installation, new features and changes in recent versions of PETSc, machines that we
currently support, and a FAQ list for frequently asked questions.
Note to Fortran Programmers: In most of the manual, the examples and calling sequences
are given for the C/C++ family of programming languages. We follow this convention because we
recommend that PETSc applications be coded in C or C++. However, pure Fortran programmers
can use most of the functionality of PETSc from Fortran, with only minor differences in the user
interface. Chapter 11 provides a discussion of the differences between using PETSc from Fortran
and C, as well as several complete Fortran examples. This chapter also introduces some routines
that support direct use of Fortran90 pointers.
Note to Python Programmers: To program with PETSc in Python you need to install the
PETSc4py package developed by Lisandro Dalcin. This can be done by configuring PETSc with
the option --download-petsc4py. See the PETSc installation guide for more details:
http://www.mcs.anl.gov/petsc/documentation/installation.html.

1.2 Running PETSc Programs

Before using PETSc, the user must first set the environmental variable PETSC_DIR, indicating the
full path of the PETSc home directory. For example, under the UNIX bash shell a command of
the form
export PETSC_DIR=$HOME/petsc
can be placed in the user’s .bashrc or other startup file. In addition, the user may need to set
the environment variable PETSC_ARCH to specify a particular configuration of the PETSc libraries.
Note that PETSC_ARCH is just a name selected by the installer to refer to the libraries compiled
for a particular set of compiler options and machine type. Using different values of PETSC_ARCH
allows one to switch between several different sets (say debug and optimized) of libraries easily.
To determine if you need to set PETSC_ARCH, look in the directory indicated by PETSC_DIR; if
there are subdirectories beginning with arch, then those subdirectories give the possible values for
PETSC_ARCH.
All PETSc programs use the MPI (Message Passing Interface) standard for message-passing
communication [25]. Thus, to execute PETSc programs, users must know the procedure for beginning MPI jobs on their selected computer system(s). For instance, when using the MPICH
implementation of MPI [16] and many others, the following command initiates a program that uses
eight processors:
mpiexec -n 8 ./petsc_program_name petsc_options
PETSc also comes with a script that uses the information set in ${PETSC_DIR}/${PETSC_ARCH}/
lib/petsc/conf/petscvariables to automatically use the correct mpiexec for your configuration.
${PETSC_DIR}/bin/petscmpiexec -n 8 ./petsc_program_name petsc_options
All PETSc-compliant programs support the use of the -h or -help option as well as the -v or
-version option.
Certain options are supported by all PETSc programs. We list a few particularly useful ones
below; a complete list can be obtained by running any PETSc program with the option -help.
• -log_view - summarize the program’s performance
• -fp_trap - stop on floating-point exceptions; for example divide by zero
• -malloc_dump - enable memory tracing; dump list of unfreed memory at conclusion of the
run
• -malloc_debug - enable memory tracing (by default this is activated for debugging versions)
• -start_in_debugger [noxterm,gdb,dbx,xxgdb] [-display name] - start all processes in
debugger
• -on_error_attach_debugger [noxterm,gdb,dbx,xxgdb] [-display name] - start debugger only on encountering an error
• -info - print a great deal of information about what the program is doing as it runs
• -options_file filename - read options from a file
See Section 15.5 for more information on debugging PETSc programs.

1.3 Running PETSc Tests

As discussed in Section 1.2, users should set PETSC_DIR and PETSC_ARCH before running the tests,
or can provide them on the command line as below.
To check if the libraries are working do:
make PETSC_DIR=<petsc directory> PETSC_ARCH=<arch name> test
A larger set of tests can be run with
make PETSC_DIR=<petsc directory> PETSC_ARCH=<arch name> alltests
A new testing system is available by running
make -f gmakefile test PETSC_ARCH=<arch name>
The test reporting system classifies the tests according to the Test Anything Protocol (TAP; see
https://testanything.org/tap-specification.html). In brief, the categories are
• ok
The test passed.
• not ok
The test failed.
• not ok #SKIP
The test was skipped, usually because build requirements were not met (for example, an
external solver library was required but PETSc was not configured with it).
• ok #TODO
The test is under development by the developers.
The tests are a series of shell scripts, generated by information contained within the test source
file, that are invoked by the makefile system. The tests are run in ${PETSC_DIR}/${PETSC_ARCH}/tests
with the same directory structure as the source tree underneath (see Section 1.7). A label is used to
denote where it can be found within the source tree. For example, test vec_vec_tutorials_runex6
denotes the shell script:
${PETSC_DIR}/${PETSC_ARCH}/tests/src/vec/vec/examples/tutorials/runex6.sh
These shell scripts can be run independently in those directories, and take arguments to show the
commands run, change arguments, etc. Use the -h option to the shell script to see these options.
Often, you want to run only a subset of tests. Our makefiles use gmake’s wildcard syntax.
In this syntax, % is a wild card character and is passed in using the search argument. Two
wildcard characters cannot be used in a search, so the searchin argument is used to provide the
equivalent of %pattern% search. The default examples have default arguments, and we often wish
to test examples with various arguments; we use the argsearch argument for these searches. Like
searchin, it does not use wildcards, but rather checks whether the given string appears within the arguments.
Some examples are:
make -f gmakefile test search='ts%'                      # Run all TS examples
make -f gmakefile test searchin='tutorials'              # Run all tutorials
make -f gmakefile test search='ts%' searchin='tutorials' # Run all TS tutorials
make -f gmakefile test argsearch='cuda'                  # Run examples with cuda in arguments

It is useful before invoking the tests to see what targets will be run. The print-test target
helps with this:
make -f gmakefile print-test argsearch=’cuda’
To see all of the test targets, this command can be used:
make -f gmakefile print-test
For more help, run
make -f gmakefile help-test
To learn more about the test system details, one can look at the Developer’s Guide.

1.4 Writing PETSc Programs

Most PETSc programs begin with a call to
PetscInitialize(int *argc,char ***argv,char *file,char *help);
which initializes PETSc and MPI. The arguments argc and argv are the command line arguments
delivered in all C and C++ programs. The argument file optionally indicates an alternative
name for the PETSc options file, .petscrc, which resides by default in the user’s home directory.
Section 15.2 provides details regarding this file and the PETSc options database, which can be used
for runtime customization. The final argument, help, is an optional character string that will be
printed if the program is run with the -help option. In Fortran the initialization command has
the form
call PetscInitialize(character(*) file,integer ierr)
PetscInitialize() automatically calls MPI_Init() if MPI has not been previously initialized.
In certain circumstances in which MPI needs to be initialized directly (or is initialized by some
other library), the user can first call MPI_Init() (or have the other library do it), and then call
PetscInitialize(). By default, PetscInitialize() sets the PETSc “world” communicator,
given by PETSC_COMM_WORLD, to MPI_COMM_WORLD.
For those not familiar with MPI, a communicator is a way of indicating a collection of processes
that will be involved together in a calculation or communication. Communicators have the variable
type MPI_Comm. In most cases users can employ the communicator PETSC_COMM_WORLD to indicate
all processes in a given run and PETSC_COMM_SELF to indicate a single process.
MPI provides routines for generating new communicators consisting of subsets of processors,
though most users rarely need to use these. The book Using MPI, by Lusk, Gropp, and Skjellum [17] provides an excellent introduction to the concepts in MPI. See also the MPI homepage
http://www.mcs.anl.gov/mpi/. Note that PETSc users need not program much message passing
directly with MPI, but they must be familiar with the basic concepts of message passing and
distributed memory computing.
All PETSc routines return a PetscErrorCode, which is an integer indicating whether an error
has occurred during the call. The error code is set to be nonzero if an error has been detected;
otherwise, it is zero. For the C/C++ interface, the error variable is the routine’s return value, while
for the Fortran version, each PETSc routine has as its final argument an integer error variable. Error
tracebacks are discussed in the following section.
All PETSc programs should call PetscFinalize() as their final (or nearly final) statement, as
given below in the C/C++ and Fortran formats, respectively:
PetscFinalize();
call PetscFinalize(ierr)
This routine handles options to be called at the conclusion of the program, and calls MPI_Finalize()
if PetscInitialize() began MPI. If MPI was initiated externally from PETSc (by either the user
or another software package), the user is responsible for calling MPI_Finalize().
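
As a concrete illustration of this initialization/finalization structure, the following minimal
program is a sketch written for this discussion (it is not one of the distributed examples); it
initializes PETSc, reports the number of processes in the run, and finalizes. The error-checking
macro CHKERRQ() used here is discussed later in this chapter.

#include <petscsys.h>

static char help[] = "Minimal PETSc program skeleton.\n\n";

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscMPIInt    size;

  /* Initialize PETSc; this also initializes MPI if it has not been initialized already */
  ierr = PetscInitialize(&argc,&argv,(char*)0,help);if (ierr) return ierr;
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);
  /* PetscPrintf() on PETSC_COMM_WORLD prints only from the first process */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"Running on %d processes\n",size);CHKERRQ(ierr);
  /* ... create PETSc objects and perform computations here ... */
  ierr = PetscFinalize();
  return ierr;
}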

1.5 Simple PETSc Examples

To help the user start using PETSc immediately, we begin with a simple uniprocessor example in
Figure 2 that solves the one-dimensional Laplacian problem with finite differences. This sequential
code, which can be found in $PETSC_DIR/src/ksp/ksp/examples/tutorials/ex1.c, illustrates
the solution of a linear system with KSP, the interface to the preconditioners, Krylov subspace
methods, and direct linear solvers of PETSc. Following the code we highlight a few of the most
important parts of this example.

static char help[] = "Solves a tridiagonal linear system with KSP.\n\n";
/*T
Concepts: KSP^solving a system of linear equations
Processors: 1
T*/
/*
Include "petscksp.h" so that we can use KSP solvers. Note that this file
automatically includes:
petscsys.h    - base PETSc routines
petscvec.h    - vectors
petscmat.h    - matrices
petscis.h     - index sets
petscksp.h    - Krylov subspace methods
petscviewer.h - viewers
petscpc.h     - preconditioners
Note: The corresponding parallel example is ex23.c
*/
#include <petscksp.h>
int main(int argc,char **args)
{
  Vec            x, b, u;      /* approx solution, RHS, exact solution */
  Mat            A;            /* linear system matrix */
  KSP            ksp;          /* linear solver context */
  PC             pc;           /* preconditioner context */
  PetscReal      norm;         /* norm of solution error */
  PetscErrorCode ierr;
  PetscInt       i,n = 10,col[3],its;
  PetscMPIInt    size;
  PetscScalar    one = 1.0,value[3];
  PetscBool      nonzeroguess = PETSC_FALSE,changepcside = PETSC_FALSE;

  ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr;
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);
  if (size != 1) SETERRQ(PETSC_COMM_WORLD,1,"This is a uniprocessor example only!");
  ierr = PetscOptionsGetInt(NULL,NULL,"-n",&n,NULL);CHKERRQ(ierr);
  ierr = PetscOptionsGetBool(NULL,NULL,"-nonzero_guess",&nonzeroguess,NULL);CHKERRQ(ierr);

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     Compute the matrix and right-hand-side vector that define
     the linear system, Ax = b.
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  /*
     Create vectors.  Note that we form 1 vector from scratch and
     then duplicate as needed.
  */
  ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
  ierr = PetscObjectSetName((PetscObject) x, "Solution");CHKERRQ(ierr);
  ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&b);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&u);CHKERRQ(ierr);

  /*
     Create matrix. When using MatCreate(), the matrix format can
     be specified at runtime.

     Performance tuning note: For problems of substantial size,
     preallocation of matrix memory is crucial for attaining good
     performance. See the matrix chapter of the users manual for details.
  */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /*
     Assemble matrix
  */
  value[0] = -1.0; value[1] = 2.0; value[2] = -1.0;
  for (i=1; i<n-1; i++) {
    col[0] = i-1; col[1] = i; col[2] = i+1;
    ierr   = MatSetValues(A,1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
  }
  i    = n - 1; col[0] = n - 2; col[1] = n - 1;
  ierr = MatSetValues(A,1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
  i    = 0; col[0] = 0; col[1] = 1; value[0] = 2.0; value[1] = -1.0;
  ierr = MatSetValues(A,1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /*
     Set exact solution; then compute right-hand-side vector.
  */
  ierr = VecSet(u,one);CHKERRQ(ierr);
  ierr = MatMult(A,u,b);CHKERRQ(ierr);

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     Create the linear solver and set various options
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);

  /*
     Set operators. Here the matrix that defines the linear system
     also serves as the matrix from which the preconditioner is built.
  */
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);

  /*
     Set linear solver defaults for this problem (optional); these could
     also be set at runtime via the options database.
  */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp,1.e-5,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);

  /*
     Set runtime options, e.g.,
         -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
     These options will override those specified above as long as
     KSPSetFromOptions() is called _after_ any other customization
     routines.
  */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  if (nonzeroguess) {
    PetscScalar p = .5;
    ierr = VecSet(x,p);CHKERRQ(ierr);
    ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr);
  }
  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     Solve the linear system
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

  /*
     View solver info; we could instead use the option -ksp_view to
     print this info to the screen at the conclusion of KSPSolve().
  */
  ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     Check solution and clean up
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  /*
     Check the error
  */
  ierr = VecAXPY(x,-1.0,u);CHKERRQ(ierr);
  ierr = VecNorm(x,NORM_2,&norm);CHKERRQ(ierr);
  ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"Norm of error %g, Iterations %D\n",(double)norm,its);CHKERRQ(ierr);

  /*
     Free work space. All PETSc objects should be destroyed when they
     are no longer needed.
  */
  ierr = VecDestroy(&x);CHKERRQ(ierr); ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr); ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);

  /*
     Always call PetscFinalize() before exiting a program. This routine
       - finalizes the PETSc libraries as well as MPI
       - provides summary and diagnostic information if certain runtime
         options are chosen (e.g., -log_view).
  */
  ierr = PetscFinalize();
  return ierr;
}
Figure 2: Example of Uniprocessor PETSc Code

Include Files
The C/C++ include files for PETSc should be used via statements such as
#include <petscksp.h>
where petscksp.h is the include file for the linear solver library. Each PETSc program must specify
an include file that corresponds to the highest level PETSc objects needed within the program; all
of the required lower level include files are automatically included within the higher level files.
For example, petscksp.h includes petscmat.h (matrices), petscvec.h (vectors), and petscsys.h
(base PETSc file). The PETSc include files are located in the directory ${PETSC_DIR}/include.
See Section 11.1.1 for a discussion of PETSc include files in Fortran programs.

The Options Database
As shown in Figure 2, the user can input control data at run time using the options database.
In this example the command PetscOptionsGetInt(NULL,NULL,"-n",&n,&flg); checks whether
the user has provided a command line option to set the value of n, the problem dimension. If so,
the variable n is set accordingly; otherwise, n remains unchanged. A complete description of the
options database may be found in Section 15.2.
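
For instance, a short sketch of querying the options database (the option name -n and its default
value here are only illustrative) is:

PetscInt  n   = 10;           /* default value, used when -n is not given */
PetscBool flg = PETSC_FALSE;  /* set to PETSC_TRUE if the option was found */
ierr = PetscOptionsGetInt(NULL,NULL,"-n",&n,&flg);CHKERRQ(ierr);
if (!flg) {
  /* -n was not supplied on the command line; n keeps its default value */
}

Running the program with, say, -n 50 then sets n to 50 at run time without recompiling.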

Vectors
One creates a new parallel or sequential vector, x, of global dimension M with the commands
VecCreate(MPI_Comm comm,Vec *x);
VecSetSizes(Vec x, PetscInt m, PetscInt M);
where comm denotes the MPI communicator and m is the optional local size which may be PETSC_DECIDE.
The type of storage for the vector may be set with either calls to VecSetType() or VecSetFromOptions().
Additional vectors of the same type can be formed with
VecDuplicate(Vec old,Vec *new);
The commands
VecSet(Vec x,PetscScalar value);
VecSetValues(Vec x,PetscInt n,PetscInt *indices,PetscScalar *values,INSERT_VALUES);
respectively set all the components of a vector to a particular scalar value and assign a different
value to each component. More detailed information about PETSc vectors, including their basic
operations, scattering/gathering, index sets, and distributed arrays, is discussed in Chapter 2.
Note the use of the PETSc variable type PetscScalar in this example. The PetscScalar is
simply defined to be double in C/C++ (or correspondingly double precision in Fortran) for
versions of PETSc that have not been compiled for use with complex numbers. The PetscScalar
data type enables identical code to be used when the PETSc libraries have been compiled for use
with complex numbers. Section 15.7 discusses the use of complex numbers in PETSc programs.
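
Putting these calls together, the following sketch (the global size of 100 and the inserted values
are chosen only for illustration) creates a parallel vector, fills it, and assembles it;
VecAssemblyBegin()/VecAssemblyEnd(), required after VecSetValues(), are discussed with the other
vector routines in Chapter 2.

Vec         x;
PetscInt    ix[2];
PetscScalar vals[2];

ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
ierr = VecSetSizes(x,PETSC_DECIDE,100);CHKERRQ(ierr);  /* global size 100; local sizes chosen by PETSc */
ierr = VecSetFromOptions(x);CHKERRQ(ierr);
ierr = VecSet(x,1.0);CHKERRQ(ierr);                    /* set every entry to 1.0 */
ix[0] = 0; ix[1] = 99; vals[0] = 2.0; vals[1] = 3.0;
ierr = VecSetValues(x,2,ix,vals,INSERT_VALUES);CHKERRQ(ierr);
ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
ierr = VecDestroy(&x);CHKERRQ(ierr);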

Matrices
Usage of PETSc matrices and vectors is similar. The user can create a new parallel or sequential
matrix, A, which has M global rows and N global columns, with the routines
MatCreate(MPI_Comm comm,Mat *A);
MatSetSizes(Mat A,PETSC_DECIDE,PETSC_DECIDE,PetscInt M,PetscInt N);
where the matrix format can be specified at runtime via the options database. The user could
alternatively specify each process's number of local rows and columns using m and n.
MatSetSizes(Mat A,PetscInt m,PetscInt n,PETSC_DETERMINE,PETSC_DETERMINE);
Generally one then sets the “type” of the matrix, with, for example,
MatSetType(A,MATAIJ);
This causes the matrix A to use the compressed sparse row storage format to store the matrix
entries. See MatType for a list of all matrix types. Values can then be set with the command
MatSetValues(Mat A,PetscInt m,PetscInt *im,PetscInt n,PetscInt *in,PetscScalar *values,INSERT_VALUES);
After all elements have been inserted into the matrix, it must be processed with the pair of commands
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
Chapter 3 discusses various matrix formats as well as the details of some basic matrix manipulation
routines.
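
As a small sketch of this sequence (the 10x10 size and the tridiagonal values are only illustrative;
in a parallel run each process would normally insert just the rows it owns):

Mat         A;
PetscInt    row,cols[3];
PetscScalar vals[3];

ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,10,10);CHKERRQ(ierr);
ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
ierr = MatSetUp(A);CHKERRQ(ierr);
row = 1; cols[0] = 0; cols[1] = 1; cols[2] = 2;
vals[0] = -1.0; vals[1] = 2.0; vals[2] = -1.0;
ierr = MatSetValues(A,1,&row,3,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatDestroy(&A);CHKERRQ(ierr);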

Linear Solvers
After creating the matrix and vectors that define a linear system, Ax = b, the user can then use
KSP to solve the system with the following sequence of commands:
KSPCreate(MPI_Comm comm,KSP *ksp);
KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat);
KSPSetFromOptions(KSP ksp);
KSPSolve(KSP ksp,Vec b,Vec x);
KSPDestroy(KSP ksp);
The user first creates the KSP context and sets the operators associated with the system (matrix that
defines the linear system, Amat, and the matrix from which the preconditioner is constructed, Pmat). The
user then sets various options for customized solution, solves the linear system, and finally destroys
the KSP context. We emphasize the command KSPSetFromOptions(), which enables the user to
customize the linear solution method at runtime by using the options database, which is discussed
in Section 15.2. Through this database, the user not only can select an iterative method and
preconditioner, but also can prescribe the convergence tolerance, set various monitoring routines,
etc. (see, e.g., Figure 6).
Chapter 4 describes in detail the KSP package, including the PC and KSP packages for preconditioners and Krylov subspace methods.
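
A sketch of this sequence, assuming a matrix A and vectors b and x assembled as in the previous
subsections, is:

KSP ksp;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);   /* A is both the operator and the preconditioning matrix */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* honor -ksp_type, -pc_type, -ksp_rtol, ... */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);

Because KSPSetFromOptions() is called, runtime options such as -ksp_type gmres -pc_type jacobi
-ksp_monitor select the method and monitor convergence without recompiling.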

Nonlinear Solvers
Most PDE problems of interest are inherently nonlinear. PETSc provides an interface, called SNES,
for tackling nonlinear problems directly. Chapter 5 describes the nonlinear solvers in detail. We
recommend most PETSc users work directly with SNES, rather than using PETSc for the linear
problem within a nonlinear solver.

Error Checking
All PETSc routines return an integer indicating whether an error has occurred during the call.
The PETSc macro CHKERRQ(ierr) checks the value of ierr and calls the PETSc error handler
upon error detection. CHKERRQ(ierr) should be used in all subroutines to enable a complete error
traceback. In Figure 3 we indicate a traceback generated by error detection within a sample PETSc
program. The error occurred on line 3618 of the file ${PETSC_DIR}/src/mat/impls/aij/seq/
aij.c and was caused by trying to allocate too large an array in memory. The routine was called
in the program ex3.c on line 66. See Section 11.1.2 for details regarding error checking when using
the PETSc Fortran interface.
$ mpiexec -n 1 ./ex3 -m 100000
[0]PETSC ERROR: --------------------- Error Message --------------------------------
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
[0]PETSC ERROR: Memory allocated 11282182704 Memory used by process 7075897344
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 18446744072169447424
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.1-224-g9c9a9c5 GIT Date: 2016-05-18 22:43:00
-0500
[0]PETSC ERROR: ./ex3 on a arch-darwin-double-debug named Patricks-MacBook-Pro-2.local by patrick
Mon Jun 27 18:04:03 2016
[0]PETSC ERROR: Configure options PETSC_DIR=/Users/patrick/petsc PETSC_ARCH=arch-darwin-doubledebug --download-mpich --download-f2cblaslapack --with-cc=clang --with-cxx=clang++ --with-fc=
gfortran --with-debugging=1 --with-precision=double --with-scalar-type=real --with-viennacl=0
--download-c2html -download-sowing
[0]PETSC ERROR: #1 MatSeqAIJSetPreallocation_SeqAIJ() line 3618 in /Users/patrick/petsc/src/mat/
impls/aij/seq/aij.c
[0]PETSC ERROR: #2 PetscTrMallocDefault() line 188 in /Users/patrick/petsc/src/sys/memory/mtr.c
[0]PETSC ERROR: #3 MatSeqAIJSetPreallocation_SeqAIJ() line 3618 in /Users/patrick/petsc/src/mat/
impls/aij/seq/aij.c
[0]PETSC ERROR: #4 MatSeqAIJSetPreallocation() line 3562 in /Users/patrick/petsc/src/mat/impls/aij
/seq/aij.c
[0]PETSC ERROR: #5 main() line 66 in /Users/patrick/petsc/src/ksp/ksp/examples/tutorials/ex3.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -m 100000
[0]PETSC ERROR: ----------------End of Error Message ------- send entire error message to petscmaint@mcs.anl.gov----------

Figure 3: Example of Error Traceback
When running the debug version of the PETSc libraries, PETSc does a great deal of checking for
memory corruption (writing outside of array bounds, etc.). The macro CHKMEMQ can be called
anywhere in the code to check the current status of the memory for corruption. By putting several
(or many) of these macros into your code, you can usually track down in which small segment
of your code the corruption has occurred. One can also use Valgrind to track down memory errors;
see the FAQ at www.mcs.anl.gov/petsc/documentation/faq.html.
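As a minimal sketch of these conventions (our own illustration, not a routine from the library), a
user-level function would propagate errors to its caller as follows:

PetscErrorCode MyRoutine(Vec x)
{
  PetscErrorCode ierr;
  PetscReal      nrm;

  PetscFunctionBeginUser;                         /* marks the start of a user routine for tracebacks */
  ierr = VecNorm(x,NORM_2,&nrm);CHKERRQ(ierr);    /* return immediately (with traceback) on error */
  CHKMEMQ;                                        /* check for memory corruption up to this point (debug builds) */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"norm %g\n",(double)nrm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}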

Parallel Programming
Since PETSc uses the message-passing model for parallel programming and employs MPI for all
interprocessor communication, the user is free to employ MPI routines as needed throughout an
application code. However, by default the user is shielded from many of the details of message
passing within PETSc, since these are hidden within parallel objects, such as vectors, matrices,
and solvers. In addition, PETSc provides tools such as generalized vector scatters/gathers and
distributed arrays to assist in the management of parallel data.
Recall that the user must specify a communicator upon creation of any PETSc object (such as
a vector, matrix, or solver) to indicate the processors over which the object is to be distributed.
For example, as mentioned above, some commands for matrix, vector, and linear solver creation
are:
MatCreate(MPI_Comm comm,Mat *A);
VecCreate(MPI_Comm comm,Vec *x);
KSPCreate(MPI_Comm comm,KSP *ksp);
The creation routines are collective over all processors in the communicator; thus, all processors in
the communicator must call the creation routine. In addition, if a sequence of collective routines is
being used, they must be called in the same order on each processor.
The next example, given in Figure 4, illustrates the solution of a linear system in parallel. This
code, corresponding to $PETSC_DIR/src/ksp/ksp/examples/tutorials/ex2.c, handles the two-dimensional Laplacian discretized with finite differences, where the linear system is again solved
with KSP. The code performs the same tasks as the sequential version within Figure 2. Note
that the user interface for initiating the program, creating vectors and matrices, and solving the
linear system is exactly the same for the uniprocessor and multiprocessor examples. The primary
difference between the examples in Figures 2 and 4 is that each processor forms only its local part
of the matrix and vectors in the parallel case.

static char help[] = "Solves a linear system in parallel with KSP.\n\
Input parameters include:\n\
  -random_exact_sol : use a random exact solution vector\n\
  -view_exact_sol   : write exact solution vector to stdout\n\
  -m <mesh_x>       : number of mesh points in x-direction\n\
  -n <mesh_y>       : number of mesh points in y-direction\n\n";

/*T
   Concepts: KSP^basic parallel example;
   Concepts: KSP^Laplacian, 2d
   Concepts: Laplacian, 2d
   Processors: n
T*/

/*
  Include "petscksp.h" so that we can use KSP solvers.  Note that this file
  automatically includes:
     petscsys.h    - base PETSc routines
     petscvec.h    - vectors
     petscmat.h    - matrices
     petscis.h     - index sets
     petscksp.h    - Krylov subspace methods
     petscviewer.h - viewers
     petscpc.h     - preconditioners
*/
#include <petscksp.h>

int main(int argc,char **args)
{
  Vec            x,b,u;    /* approx solution, RHS, exact solution */
  Mat            A;        /* linear system matrix */
  KSP            ksp;      /* linear solver context */
  PetscRandom    rctx;     /* random number generator context */
  PetscReal      norm;     /* norm of solution error */
  PetscInt       i,j,Ii,J,Istart,Iend,m = 8,n = 7,its;
  PetscErrorCode ierr;
  PetscBool      flg = PETSC_FALSE;
  PetscScalar    v;
#if defined(PETSC_USE_LOG)
  PetscLogStage  stage;
#endif

  ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr;
  ierr = PetscOptionsGetInt(NULL,NULL,"-m",&m,NULL);CHKERRQ(ierr);
  ierr = PetscOptionsGetInt(NULL,NULL,"-n",&n,NULL);CHKERRQ(ierr);
  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
         Compute the matrix and right-hand-side vector that define
         the linear system, Ax = b.
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  /*
     Create parallel matrix, specifying only its global dimensions.
     When using MatCreate(), the matrix format can be specified at
     runtime. Also, the parallel partitioning of the matrix is
     determined by PETSc at runtime.

     Performance tuning note: For problems of substantial size,
     preallocation of matrix memory is crucial for attaining good
     performance. See the matrix chapter of the users manual for details.
  */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,5,NULL,5,NULL);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A,5,NULL);CHKERRQ(ierr);
  ierr = MatSeqSBAIJSetPreallocation(A,1,5,NULL);CHKERRQ(ierr);
  ierr = MatMPISBAIJSetPreallocation(A,1,5,NULL,5,NULL);CHKERRQ(ierr);

  /*
     Currently, all PETSc parallel matrix formats are partitioned by
     contiguous chunks of rows across the processors. Determine which
     rows of the matrix are locally owned.
  */
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);

  /*
     Set matrix elements for the 2-D, five-point stencil in parallel.
      - Each processor needs to insert only elements that it owns
        locally (but any non-local elements will be sent to the
        appropriate processor during matrix assembly).
      - Always specify global rows and columns of matrix entries.

     Note: this uses the less common natural ordering that orders first
     all the unknowns for x = h then for x = 2h etc; Hence you see J = Ii +- n
     instead of J = I +- m as you might expect. The more standard ordering
     would first do all variables for y = h, then y = 2h etc.
  */
  ierr = PetscLogStageRegister("Assembly", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  for (Ii=Istart; Ii<Iend; Ii++) {
    v = -1.0; i = Ii/n; j = Ii - i*n;
    if (i>0)   {J = Ii - n; ierr = MatSetValues(A,1,&Ii,1,&J,&v,ADD_VALUES);CHKERRQ(ierr);}
    if (i<m-1) {J = Ii + n; ierr = MatSetValues(A,1,&Ii,1,&J,&v,ADD_VALUES);CHKERRQ(ierr);}
    if (j>0)   {J = Ii - 1; ierr = MatSetValues(A,1,&Ii,1,&J,&v,ADD_VALUES);CHKERRQ(ierr);}
    if (j<n-1) {J = Ii + 1; ierr = MatSetValues(A,1,&Ii,1,&J,&v,ADD_VALUES);CHKERRQ(ierr);}
    v = 4.0; ierr = MatSetValues(A,1,&Ii,1,&Ii,&v,ADD_VALUES);CHKERRQ(ierr);
  }

  /* ... the remainder of the matrix assembly, the creation and assembly of the
     vectors u, b, and x, and the creation of the KSP solver context ksp are
     omitted here; see the full source of ex2.c in the PETSc distribution ... */

  /*
    Set runtime options, e.g.,
        -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
    These options will override those specified above as long as
    KSPSetFromOptions() is called _after_ any other customization
    routines.
  */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                      Solve the linear system
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                      Check solution and clean up
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  /*
     Check the error
  */
  ierr = VecAXPY(x,-1.0,u);CHKERRQ(ierr);
  ierr = VecNorm(x,NORM_2,&norm);CHKERRQ(ierr);
  ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
  /*
     Print convergence information. PetscPrintf() produces a single
     print statement from all processes that share a communicator.
     An alternative is PetscFPrintf(), which prints to a file.
  */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"Norm of error %g iterations %D\n",(double)norm,its);CHKERRQ(ierr);

  /*
     Free work space. All PETSc objects should be destroyed when they
     are no longer needed.
  */
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr); ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr); ierr = MatDestroy(&A);CHKERRQ(ierr);

  /*
     Always call PetscFinalize() before exiting a program. This routine
       - finalizes the PETSc libraries as well as MPI
       - provides summary and diagnostic information if certain runtime
         options are chosen (e.g., -log_view).
  */
  ierr = PetscFinalize();
  return ierr;
}
Figure 4: Example of Multiprocessor PETSc Code

Compiling and Running Programs
Figure 5 illustrates compiling and running a PETSc program using MPICH on an OS X laptop.
Note that the compilation command differs from machine to machine, as determined by the configuration
process. See Chapter 16 for a discussion of compiling PETSc programs. Users who experience
difficulties linking PETSc programs should refer to the FAQ on the PETSc website,
http://www.mcs.anl.gov/petsc, or in the file $PETSC_DIR/docs/faq.html.
$ make ex2
/Users/patrick/petsc/arch-darwin-double-debug/bin/mpicc -o ex2.o -c -Wall -Wwrite-strings
    -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -fvisibility=hidden -g3
    -I/Users/patrick/petsc/include -I/Users/patrick/petsc/arch-darwin-double-debug/include
    -I/opt/X11/include -I/opt/local/include `pwd`/ex2.c
/Users/patrick/petsc/arch-darwin-double-debug/bin/mpicc -Wl,-multiply_defined,suppress
    -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wall -Wwrite-strings -Wno-strict-aliasing
    -Wno-unknown-pragmas -Qunused-arguments -fvisibility=hidden -g3 -o ex2 ex2.o
    -Wl,-rpath,/Users/patrick/petsc/arch-darwin-double-debug/lib
    -L/Users/patrick/petsc/arch-darwin-double-debug/lib -lpetsc -lf2clapack -lf2cblas
    -L/opt/X11/lib -lX11 -lssl -lcrypto -lmpifort -lgfortran -lquadmath -lm -lmpicxx -lc++
    -ldl -lmpi -lpmpi -lSystem ...
/bin/rm -f ex2.o
$ $PETSC_DIR/bin/petscmpiexec -n 1 ./ex2
Norm of error 0.000156044 iterations 6
$ $PETSC_DIR/bin/petscmpiexec -n 2 ./ex2
Norm of error 0.000411674 iterations 7

Figure 5: Running a PETSc Program
As shown in Figure 6, the option -log_view activates printing of a performance summary,
including times, floating point operation (flop) rates, and message-passing activity. Chapter 13
provides details about profiling, including interpretation of the output data within Figure 6. This
particular example involves the solution of a linear system on one processor using GMRES and
ILU. The low floating point operation (flop) rates in this example are due to the fact that the
code solved a tiny system. We include this example merely to demonstrate the ease of extracting
performance information.
$ $PETSC_DIR/bin/petscmpiexec -n 1 ./ex1 -n 1000 -pc_type ilu -ksp_type gmres -ksp_rtol 1.e-7 -log_view
...
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot                1 1.0 3.2830e-06 1.0 2.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0   609
VecNorm                3 1.0 4.4550e-06 1.0 6.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 14  0  0  0   0 14  0  0  0  1346
VecScale               2 1.0 4.0110e-06 1.0 2.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0   499
VecCopy                1 1.0 3.2280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                11 1.0 2.5537e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecAXPY                2 1.0 2.0930e-06 1.0 4.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0  1911
VecMAXPY               2 1.0 1.1280e-06 1.0 4.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0  3546
VecNormalize           2 1.0 9.3970e-06 1.0 6.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 14  0  0  0   1 14  0  0  0   638
MatMult                2 1.0 1.1177e-05 1.0 9.99e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 24  0  0  0   1 24  0  0  0   894
MatSolve               2 1.0 1.9933e-05 1.0 9.99e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 24  0  0  0   1 24  0  0  0   501
MatLUFactorNum         1 1.0 3.5081e-05 1.0 4.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  2 10  0  0  0   2 10  0  0  0   114
MatILUFactorSym        1 1.0 4.4259e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatAssemblyBegin       1 1.0 8.2015e-08 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 3.3536e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatGetRowIJ            1 1.0 1.5960e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 3.9791e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatView                2 1.0 6.7909e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
KSPGMRESOrthog         1 1.0 7.5970e-06 1.0 4.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 10  0  0  0   1 10  0  0  0   526
KSPSetUp               1 1.0 3.4424e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
KSPSolve               1 1.0 2.7264e-04 1.0 3.30e+04 1.0 0.0e+00 0.0e+00 0.0e+00 19 79  0  0  0  19 79  0  0  0   121
PCSetUp                1 1.0 1.5234e-04 1.0 4.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 11 10  0  0  0  11 10  0  0  0    26
PCApply                2 1.0 2.1022e-05 1.0 9.99e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 24  0  0  0   1 24  0  0  0   475
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector     8              8        76224     0.
              Matrix     2              2       134212     0.
       Krylov Solver     1              1        18400     0.
      Preconditioner     1              1         1032     0.
           Index Set     3              3        10328     0.
              Viewer     1              0            0     0.
========================================================================================================================
...

Figure 6: Running a PETSc Program with Profiling (partial output)

Writing Application Codes with PETSc
The examples throughout the library demonstrate the software usage and can serve as templates
for developing custom applications. We suggest that new PETSc users examine programs in the
directories ${PETSC_DIR}/src/<library>/examples/tutorials, where <library> denotes any of
the PETSc libraries (listed in the following section), such as SNES or KSP. The manual pages located
at ${PETSC_DIR}/docs/index.htm or http://www.mcs.anl.gov/petsc/documentation provide links
(organized by both routine names and concepts) to the tutorial examples.
To write a new application program using PETSc, we suggest the following procedure:
1. Install and test PETSc according to the instructions at the PETSc web site.
2. Copy one of the many PETSc examples in the directory that corresponds to the class
of problem of interest (e.g., for linear solvers, see ${PETSC_DIR}/src/ksp/ksp/examples/
tutorials).
3. Copy the corresponding makefile within the example directory; compile and run the example
program.
4. Use the example program as a starting point for developing a custom code.

1.6  Citing PETSc

When citing PETSc in a publication please cite the following:
@Misc{petsc-web-page,
  Author = "Satish Balay and Shrirang Abhyankar and Mark~F. Adams and Jed Brown
            and Peter Brune and Kris Buschelman and Lisandro Dalcin and Victor Eijkhout
            and William~D. Gropp and Dinesh Kaushik and Matthew~G. Knepley and Lois Curfman McInnes
            and Karl Rupp and Barry~F. Smith and Stefano Zampini
            and Hong Zhang and Hong Zhang",
  Title  = "{PETS}c {W}eb page",
  Note   = "http://www.mcs.anl.gov/petsc",
  Year   = "2017"}

@TechReport{petsc-user-ref,
  Author = "Satish Balay and Shrirang Abhyankar and Mark~F. Adams and Jed Brown
            and Peter Brune and Kris Buschelman and Lisandro Dalcin and Victor Eijkhout
            and Dinesh Kaushik and Matthew~G. Knepley and Dave~A. May and Lois Curfman McInnes
            and William~D. Gropp and Karl Rupp and Patrick Sanan and Barry~F. Smith and Stefano Zampini
            and Hong Zhang and Hong Zhang",
  Title       = "PETSc Users Manual",
  Number      = "ANL-95/11 - Revision 3.8",
  Institution = "Argonne National Laboratory",
  Year        = "2017"}

@InProceedings{petsc-efficient,
  Author    = "Satish Balay and William D. Gropp and Lois C. McInnes and Barry F. Smith",
  Title     = "Efficient Management of Parallelism in Object Oriented Numerical Software Libraries",
  Booktitle = "Modern Software Tools in Scientific Computing",
  Editor    = "E. Arge and A. M. Bruaset and H. P. Langtangen",
  Pages     = "163--202",
  Publisher = "Birkhauser Press",
  Year      = "1997"}

1.7  Directory Structure

We conclude this introduction with an overview of the organization of the PETSc software. The
root directory of PETSc contains the following directories:
• docs - All documentation for PETSc. The file manual.pdf contains the hyperlinked users
  manual, suitable for printing or on-screen viewing. Includes the subdirectory
  - manualpages (on-line manual pages).
• bin - Utilities and short scripts for use with PETSc, including
  - petscmpiexec (utility for running MPI jobs).
• conf - Base PETSc configuration files that define the standard make variables and rules used
  by PETSc.
• include - All include files for PETSc that are visible to the user.
• include/petsc/finclude - PETSc include files for Fortran programmers using the .F suffix
  (recommended).
• include/petsc/private - Private PETSc include files that should not need to be used by
  application programmers.
• share - Some small test matrices in data files.
• src - The source code for all PETSc libraries, which currently includes
  - vec - vectors,
    - is - index sets,
  - mat - matrices,
  - dm - data management between meshes and vectors and matrices,
  - ksp - complete linear equations solvers,
    - ksp - Krylov subspace accelerators,
    - pc - preconditioners,
  - snes - nonlinear solvers,
  - ts - ODE solvers and timestepping,
  - sys - general system-related routines,
    - logging - PETSc logging and profiling routines,
    - classes - low-level classes, including
      - draw - simple graphics,
      - viewer,
      - bag,
      - random - random number generators.
• contrib - contributed modules that use PETSc but are not part of the official PETSc
  package. We encourage users who have developed such code and wish to share it with
  others to let us know by writing to petsc-maint@mcs.anl.gov.
Each PETSc source code library directory has the following subdirectories:

• examples - Example programs for the component, including
• tutorials - Programs designed to teach users about PETSc. These codes can serve as
templates for the design of custom applications.
• tests - Programs designed for thorough testing of PETSc. As such, these codes are not
intended for examination by users.
• interface - The calling sequences for the abstract interface to the component. Code here
does not know about particular implementations.
• impls - Source code for one or more implementations.
• utils - Utility routines. Source here may know about the implementations, but ideally will
not know about implementations for other components.

Part II

Programming with PETSc


Chapter 2

Vectors and Parallel Data
The vector (denoted by Vec) is one of the simplest PETSc objects. Vectors are used to store discrete
PDE solutions, right-hand sides for linear systems, etc. This chapter is organized as follows:
• (Vec) Sections 2.1 and 2.2 - basic usage of vectors
• Section 2.3 - management of the various numberings of degrees of freedom, vertices, cells, etc.
• (AO) Mapping between different global numberings
• (ISLocalToGlobalMapping) Mapping between local and global numberings
• (DM) Section 2.4 - management of grids
• (IS, VecScatter) Section 2.5 - management of vectors related to unstructured grids

2.1  Creating and Assembling Vectors

PETSc currently provides two basic vector types: sequential and parallel (MPI-based). To create
a sequential vector with m components, one can use the command
VecCreateSeq(PETSC_COMM_SELF,PetscInt m,Vec *x);
To create a parallel vector one can either specify the number of components that will be stored on
each process or let PETSc decide. The command
VecCreateMPI(MPI_Comm comm,PetscInt m,PetscInt M,Vec *x);
creates a vector distributed over all processes in the communicator, comm, where m indicates the
number of components to store on the local process, and M is the total number of vector components. Either the local or global dimension, but not both, can be set to PETSC_DECIDE or
PETSC_DETERMINE, respectively, to indicate that PETSc should decide or determine it. More
generally, one can use the routines
VecCreate(MPI_Comm comm,Vec *v);
VecSetSizes(Vec v, PetscInt m, PetscInt M);
VecSetFromOptions(Vec v);
which automatically generates the appropriate vector type (sequential or parallel) over all processes in comm. The option -vec_type mpi can be used in conjunction with VecCreate() and
VecSetFromOptions() to specify the use of MPI vectors even for the uniprocessor case.

We emphasize that all processes in comm must call the vector creation routines, since these routines are collective over all processes in the communicator. If you are not familiar with MPI communicators, see the discussion in Section 1.4 on page 25. In addition, if a sequence of VecCreateXXX()
routines is used, they must be called in the same order on each process in the communicator.
One can assign a single value to all components of a vector with the command
VecSet(Vec x,PetscScalar value);
Assigning values to individual components of the vector is more complicated, in order to make it
possible to write efficient parallel code. Assigning a set of components is a two-step process: one
first calls
VecSetValues(Vec x,PetscInt n,PetscInt *indices,PetscScalar *values,INSERT_VALUES);
any number of times on any or all of the processes. The argument n gives the number of components
being set in this insertion. The integer array indices contains the global component indices, and
values is the array of values to be inserted. Any process can set any components of the vector;
PETSc ensures that they are automatically stored in the correct location. Once all of the values
have been inserted with VecSetValues(), one must call
VecAssemblyBegin(Vec x);
followed by
VecAssemblyEnd(Vec x);
to perform any needed message passing of nonlocal components. In order to allow the overlap of
communication and calculation, the user’s code can perform any series of other actions between
these two calls while the messages are in transition.
Example usage of VecSetValues() may be found in $PETSC_DIR/src/vec/vec/examples/tutorials/ex2.c or ex2f.F.
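For illustration, the following fragment (our own sketch; it uses VecGetOwnershipRange(), described
in Section 2.2, so that each process sets only the entries it owns, although any process may legally
set any entry) creates and assembles a parallel vector:

Vec            x;
PetscInt       i,low,high,N = 100;       /* N is an arbitrary sample global size */
PetscScalar    val;
PetscErrorCode ierr;

ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
ierr = VecSetSizes(x,PETSC_DECIDE,N);CHKERRQ(ierr);
ierr = VecSetFromOptions(x);CHKERRQ(ierr);
ierr = VecGetOwnershipRange(x,&low,&high);CHKERRQ(ierr);
for (i=low; i<high; i++) {
  val  = (PetscScalar)i;                 /* here each process sets only its own entries */
  ierr = VecSetValues(x,1,&i,&val,INSERT_VALUES);CHKERRQ(ierr);
}
ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
ierr = VecAssemblyEnd(x);CHKERRQ(ierr);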
Often, rather than inserting elements in a vector, one may wish to add values. This process is
also done with the command
VecSetValues(Vec x,PetscInt n,PetscInt *indices,PetscScalar *values,ADD_VALUES);
Again one must call the assembly routines VecAssemblyBegin() and VecAssemblyEnd() after all
of the values have been added. Note that addition and insertion calls to VecSetValues() cannot
be mixed. Instead, one must add and insert vector elements in phases, with intervening calls to
the assembly routines. This phased assembly procedure overcomes the nondeterministic behavior
that would occur if two different processes generated values for the same location, with one process
adding while the other is inserting its value. (In this case the addition and insertion actions could be
performed in either order, thus resulting in different values at the particular location. Since PETSc
does not allow the simultaneous use of INSERT_VALUES and ADD_VALUES this nondeterministic
behavior will not occur in PETSc.)
You can call VecGetValues() to pull local values from a vector (but not off-process values);
an alternative method for extracting some components of a vector is to use the vector scatter routines.
See Section 2.5.2 for details; see also below for VecGetArray().
One can examine a vector with the command
VecView(Vec x,PetscViewer v);
To print the vector to the screen, one can use the viewer PETSC_VIEWER_STDOUT_WORLD, which
ensures that parallel vectors are printed correctly to stdout. To display the vector in an X window,
one can use the default X-windows viewer PETSC_VIEWER_DRAW_WORLD, or one can create
a viewer with the routine PetscViewerDrawOpen(). A variety of viewers are discussed further in
Section 15.3.
To create a new vector of the same format as an existing vector, one uses the command
VecDuplicate(Vec old,Vec *new);
To create several new vectors of the same format as an existing vector, one uses the command
VecDuplicateVecs(Vec old,PetscInt n,Vec **new);
This routine creates an array of pointers to vectors. The two routines are very useful because they
allow one to write library code that does not depend on the particular format of the vectors being
used. Instead, the subroutines can automatically correctly create work vectors based on the specified
existing vector. As discussed in Section 11.1.5, the Fortran interface for VecDuplicateVecs() differs
slightly.
When a vector is no longer needed, it should be destroyed with the command
VecDestroy(Vec *x);
To destroy an array of vectors, use the command
VecDestroyVecs(PetscInt n,Vec **vecs);
Note that the Fortran interface for VecDestroyVecs() differs slightly, as described in Section 11.1.5.
It is also possible to create vectors that use an array provided by the user, rather than having
PETSc internally allocate the array space. Such vectors can be created with the routines
VecCreateSeqWithArray(PETSC_COMM_SELF,PetscInt bs,PetscInt n,PetscScalar *array,Vec *V);
and
VecCreateMPIWithArray(MPI_Comm comm,PetscInt bs,PetscInt n,PetscInt N,PetscScalar *array,Vec *vv);
Note that here one must provide the value n; it cannot be PETSC_DECIDE, and the user is responsible
for providing enough space in the array: n*sizeof(PetscScalar) bytes.
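For example, a minimal sketch (ours; the array contents are arbitrary) that wraps user-provided
storage in a sequential vector is:

PetscInt       n = 5;
PetscScalar    array[5] = {1.0,2.0,3.0,4.0,5.0};   /* user-provided storage, at least n entries */
Vec            V;
PetscErrorCode ierr;

ierr = VecCreateSeqWithArray(PETSC_COMM_SELF,1,n,array,&V);CHKERRQ(ierr);  /* block size bs = 1 */
/* ... use V; changes made through V are changes to array ... */
ierr = VecDestroy(&V);CHKERRQ(ierr);               /* does not free the user-provided array */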

2.2  Basic Vector Operations

As listed in Table 1, we have chosen certain basic vector operations to support within the PETSc
vector library. These operations were selected because they often arise in application codes. The
NormType argument to VecNorm() is one of NORM_1, NORM_2, or NORM_INFINITY. The 1-norm is
Σ_i |x_i|, the 2-norm is (Σ_i x_i^2)^{1/2}, and the infinity norm is max_i |x_i|.
For parallel vectors that are distributed across the processes by ranges, it is possible to determine
a process’s local range with the routine
VecGetOwnershipRange(Vec vec,PetscInt *low,PetscInt *high);
The argument low indicates the first component owned by the local process, while high specifies one
more than the last owned by the local process. This command is useful, for instance, in assembling
parallel vectors.
Function Name                                          Operation
VecAXPY(Vec y,PetscScalar a,Vec x);                    y = y + a*x
VecAYPX(Vec y,PetscScalar a,Vec x);                    y = x + a*y
VecWAXPY(Vec w,PetscScalar a,Vec x,Vec y);             w = a*x + y
VecAXPBY(Vec y,PetscScalar a,PetscScalar b,Vec x);     y = a*x + b*y
VecScale(Vec x, PetscScalar a);                        x = a*x
VecDot(Vec x, Vec y, PetscScalar *r);                  r = x̄^T * y
VecTDot(Vec x, Vec y, PetscScalar *r);                 r = x^T * y
VecNorm(Vec x, NormType type, PetscReal *r);           r = ||x||_type
VecSum(Vec x, PetscScalar *r);                         r = Σ_i x_i
VecCopy(Vec x, Vec y);                                 y = x
VecSwap(Vec x, Vec y);                                 y = x while x = y
VecPointwiseMult(Vec w,Vec x,Vec y);                   w_i = x_i * y_i
VecPointwiseDivide(Vec w,Vec x,Vec y);                 w_i = x_i / y_i
VecMDot(Vec x,PetscInt n,Vec y[],PetscScalar *r);      r[i] = x̄^T * y[i]
VecMTDot(Vec x,PetscInt n,Vec y[],PetscScalar *r);     r[i] = x^T * y[i]
VecMAXPY(Vec y,PetscInt n, PetscScalar *a, Vec x[]);   y = y + Σ_i a_i * x[i]
VecMax(Vec x, PetscInt *idx, PetscReal *r);            r = max x_i
VecMin(Vec x, PetscInt *idx, PetscReal *r);            r = min x_i
VecAbs(Vec x);                                         x_i = |x_i|
VecReciprocal(Vec x);                                  x_i = 1/x_i
VecShift(Vec x,PetscScalar s);                         x_i = s + x_i
VecSet(Vec x,PetscScalar alpha);                       x_i = α

Table 1: PETSc Vector Operations
On occasion, the user needs to access the actual elements of the vector. The routine VecGetArray()
returns a pointer to the elements local to the process:
VecGetArray(Vec v,PetscScalar **array);
When access to the array is no longer needed, the user should call
VecRestoreArray(Vec v, PetscScalar **array);
If the values do not need to be modified, the routines VecGetArrayRead() and
VecRestoreArrayRead() provide read-only access and should be used instead.
VecGetArrayRead(Vec v, const PetscScalar **array);
VecRestoreArrayRead(Vec v, const PetscScalar **array);
Minor differences exist in the Fortran interface for VecGetArray() and VecRestoreArray(), as
discussed in Section 11.2.1. It is important to note that VecGetArray() and VecRestoreArray()
do not copy the vector elements; they merely give users direct access to the vector elements. Thus,
these routines require essentially no time to call and can be used efficiently.
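For example, the following fragment (our own sketch) squares each locally owned entry of an
already assembled vector v in place:

Vec            v;          /* assumed to have been created and assembled already */
PetscScalar    *a;
PetscInt       i,nlocal;
PetscErrorCode ierr;

ierr = VecGetLocalSize(v,&nlocal);CHKERRQ(ierr);
ierr = VecGetArray(v,&a);CHKERRQ(ierr);          /* direct access to the local entries; no copy */
for (i=0; i<nlocal; i++) a[i] = a[i]*a[i];
ierr = VecRestoreArray(v,&a);CHKERRQ(ierr);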
The number of elements stored locally can be accessed with
VecGetLocalSize(Vec v,PetscInt *size);
The global vector length can be determined by
VecGetSize(Vec v,PetscInt *size);
In addition to VecDot(), VecMDot(), and VecNorm(), PETSc provides split-phase versions of
these that allow several independent inner products and/or norms to share the same communication
(thus improving parallel efficiency). For example, one may have code such as
VecDot(Vec x,Vec y,PetscScalar *dot);
VecMDot(Vec x,PetscInt nv,Vec y[],PetscScalar *dot);
VecNorm(Vec x,NormType NORM_2,PetscReal *norm2);
VecNorm(Vec x,NormType NORM_1,PetscReal *norm1);
This code works fine, but it performs three separate parallel communication operations. Instead,
one can write
VecDotBegin(Vec x,Vec y,PetscScalar *dot);
VecMDotBegin(Vec x,PetscInt nv,Vec y[],PetscScalar *dot);
VecNormBegin(Vec x,NormType NORM_2,PetscReal *norm2);
VecNormBegin(Vec x,NormType NORM_1,PetscReal *norm1);
VecDotEnd(Vec x,Vec y,PetscScalar *dot);
VecMDotEnd(Vec x,PetscInt nv,Vec y[],PetscScalar *dot);
VecNormEnd(Vec x,NormType NORM_2,PetscReal *norm2);
VecNormEnd(Vec x,NormType NORM_1,PetscReal *norm1);
With this code, the communication is delayed until the first call to VecxxxEnd(), at which point a single
MPI reduction is used to communicate all the required values. It is required that the calls to
VecxxxEnd() be performed in the same order as the calls to VecxxxBegin(); however,
if you mistakenly make the calls in the wrong order, PETSc will generate an error informing you
of this. There are additional routines VecTDotBegin() and VecTDotEnd(), as well as VecMTDotBegin()
and VecMTDotEnd().
Note: these routines use only MPI-1 functionality; they do not allow you to overlap computation
and communication (assuming no threads are spawned within an MPI process). Once MPI-2
implementations are more common, we will improve these routines to allow overlap of inner product
and norm calculations with other calculations. Also, currently these routines work only for the
PETSc built-in vector types.

2.3  Indexing and Ordering

When writing parallel PDE codes, there is extra complexity caused by having multiple ways of
indexing (numbering) and ordering objects such as vertices and degrees of freedom. For example,
a grid generator or partitioner may renumber the nodes, requiring adjustment of the other data
structures that refer to these objects; see Figure 8. In addition, local numbering (on a single process)
of objects may be different from the global (cross-process) numbering. PETSc provides a variety
of tools to help manage the mapping among the various numbering systems. The two most
basic are the AO (application ordering), which enables mapping between different global (cross-process)
numbering schemes, and the ISLocalToGlobalMapping, which allows mapping between
local (on-process) and global (cross-process) numberings.

2.3.1  Application Orderings

In many applications it is desirable to work with one or more “orderings” (or numberings) of degrees
of freedom, cells, nodes, etc. Doing so in a parallel environment is complicated by the fact that each
process cannot keep complete lists of the mappings between different orderings. In addition, the
orderings used in the PETSc linear algebra routines (often contiguous ranges) may not correspond
to the “natural” orderings for the application.
PETSc provides certain utility routines that allow one to deal cleanly and efficiently with the
various orderings. To define a new application ordering (called an AO in PETSc), one can call the
routine
AOCreateBasic(MPI_Comm comm,PetscInt n,const PetscInt apordering[],const PetscInt petscordering[],AO *ao);
The arrays apordering and petscordering, respectively, contain a list of integers in the application
ordering and their corresponding mapped values in the PETSc ordering. Each process can provide
whatever subset of the ordering it chooses, but multiple processes should never contribute duplicate
values. The argument n indicates the number of local contributed values.
For example, consider a vector of length 5, where node 0 in the application ordering corresponds to node 3 in the PETSc ordering. In addition, nodes 1, 2, 3, and 4 of the application
ordering correspond, respectively, to nodes 2, 1, 4, and 0 of the PETSc ordering. We can write this
correspondence as
{0, 1, 2, 3, 4} → {3, 2, 1, 4, 0}.
The user can create the PETSc AO mappings in a number of ways. For example, if using two
processes, one could call
AOCreateBasic(PETSC_COMM_WORLD,2,{0,3},{3,4},&ao);
on the first process and
AOCreateBasic(PETSC_COMM_WORLD,3,{1,2,4},{2,1,0},&ao);
on the other process.
Once the application ordering has been created, it can be used with either of the commands
AOPetscToApplication(AO ao,PetscInt n,PetscInt *indices);
AOApplicationToPetsc(AO ao,PetscInt n,PetscInt *indices);
Upon input, the n-dimensional array indices specifies the indices to be mapped, while upon output,
indices contains the mapped values. Since we, in general, employ a parallel database for the AO
mappings, it is crucial that all processes that called AOCreateBasic() also call these routines; these
routines cannot be called by just a subset of processes in the MPI communicator that was used in
the call to AOCreateBasic().
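As a minimal single-process sketch of the five-node example above (our own illustration; the index
values are those given above), one could write:

AO             ao;
PetscInt       app[5]   = {0,1,2,3,4};
PetscInt       petsc[5] = {3,2,1,4,0};
PetscInt       idx[2]   = {0,3};          /* indices given in the application ordering */
PetscErrorCode ierr;

ierr = AOCreateBasic(PETSC_COMM_SELF,5,app,petsc,&ao);CHKERRQ(ierr);
ierr = AOApplicationToPetsc(ao,2,idx);CHKERRQ(ierr);   /* idx becomes {3,4} in the PETSc ordering */
ierr = AODestroy(&ao);CHKERRQ(ierr);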
An alternative routine to create the application ordering, AO, is
AOCreateBasicIS(IS apordering,IS petscordering,AO *ao);
where index sets (see Section 2.5.1) are used instead of integer arrays.
The mapping routines
AOPetscToApplicationIS(AO ao,IS indices);
AOApplicationToPetscIS(AO ao,IS indices);
will map index sets (IS objects) between orderings. Both the AOXxxToYyy() and AOXxxToYyyIS()
routines can be used regardless of whether the AO was created with AOCreateBasic() or
AOCreateBasicIS().
The AO context should be destroyed with AODestroy(AO *ao) and viewed with
AOView(AO ao,PetscViewer viewer).
Although we refer to the two orderings as “PETSc” and “application” orderings, the user is
free to use them both for application orderings and to maintain relationships among a variety of
orderings by employing several AO contexts.
The AOxxToxx() routines allow negative entries in the input integer array. These entries are not
mapped; they simply remain unchanged. This functionality enables, for example, mapping neighbor
lists that use negative numbers to indicate nonexistent neighbors due to boundary conditions, etc.

2.3.2  Local to Global Mappings

In many applications one works with a global representation of a vector (usually on a vector
obtained with VecCreateMPI()) and a local representation of the same vector that includes ghost
points required for local computation. PETSc provides routines to help map indices from a local
numbering scheme to the PETSc global numbering scheme. This is done via the following routines
ISLocalToGlobalMappingCreate(MPI_Comm comm,PetscInt bs,PetscInt N,PetscInt* globalnum,PetscCopyMode mode,ISLocalToGlobalMapping* ctx);
ISLocalToGlobalMappingApply(ISLocalToGlobalMapping ctx,PetscInt n,PetscInt *in,PetscInt *out);
ISLocalToGlobalMappingApplyIS(ISLocalToGlobalMapping ctx,IS isin,IS* isout);
ISLocalToGlobalMappingDestroy(ISLocalToGlobalMapping *ctx);
Here N denotes the number of local indices, globalnum contains the global number of each local
number, and ISLocalToGlobalMapping is the resulting PETSc object that contains the information
needed to apply the mapping with either ISLocalToGlobalMappingApply() or
ISLocalToGlobalMappingApplyIS().
Note that the ISLocalToGlobalMapping routines serve a different purpose than the AO routines.
In the former case they provide a mapping from a local numbering scheme (including ghost points)
to a global numbering scheme, while in the latter they provide a mapping between two global
numbering schemes. In fact, many applications may use both AO and ISLocalToGlobalMapping
routines. The AO routines are first used to map from an application global ordering (that has no
relationship to parallel processing etc.) to the PETSc ordering scheme (where each process has
a contiguous set of indices in the numbering). Then in order to perform function or Jacobian
evaluations locally on each process, one works with a local numbering scheme that includes ghost
points. The mapping from this local numbering scheme back to the global PETSc numbering can
be handled with the ISLocalToGlobalMapping routines.
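A minimal sketch (ours; the global numbers are arbitrary sample values) of creating such a mapping
and applying it to two local indices is:

ISLocalToGlobalMapping ltog;
PetscInt               globalnum[4] = {6,7,9,12};   /* global number of each local index 0..3 */
PetscInt               in[2] = {0,3},out[2];
PetscErrorCode         ierr;

ierr = ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD,1,4,globalnum,PETSC_COPY_VALUES,&ltog);CHKERRQ(ierr);
ierr = ISLocalToGlobalMappingApply(ltog,2,in,out);CHKERRQ(ierr);   /* out becomes {6,12} */
ierr = ISLocalToGlobalMappingDestroy(&ltog);CHKERRQ(ierr);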
If one is given a list of block indices in a global numbering, the routine
ISGlobalToLocalMappingApplyBlock(ISLocalToGlobalMapping ctx,ISGlobalToLocalMappingMode type,PetscInt nin,PetscInt idxin[],PetscInt *nout,PetscInt idxout[]);
will provide a new list of indices in the local numbering. Again, negative values in idxin are left
unmapped. But, in addition, if type is set to IS_GTOLM_MASK, then nout is set to nin and all
global values in idxin that are not represented in the local to global mapping are replaced by -1.
When type is set to IS_GTOLM_DROP, the values in idxin that are not represented locally in the
mapping are not included in idxout, so that potentially nout is smaller than nin. One must pass in
an array long enough to hold all the indices. One can call ISGlobalToLocalMappingApplyBlock()
with idxout equal to NULL to determine the required length (returned in nout) and then allocate the
required space and call ISGlobalToLocalMappingApplyBlock() a second time to set the values.
Often it is convenient to set elements into a vector using the local node numbering rather than
the global node numbering (e.g., each process may maintain its own sublist of vertices and elements
and number them locally). To set values into a vector with the local numbering, one must first call
VecSetLocalToGlobalMapping(Vec v,ISLocalToGlobalMapping ctx);
and then call
VecSetValuesLocal(Vec x,PetscInt n,const PetscInt indices[],const PetscScalar values[],INSERT_VALUES);
Now the indices use the local numbering rather than the global numbering, meaning the entries lie in [0, n),
where n is the local size of the vector.

Figure 7: Ghost Points for Two Stencil Types (box-type and star-type) on the Seventh Process
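For instance, the following sketch (ours; it reuses the mapping ltog created in the sketch above and
assumes the vector x has a compatible local size) inserts one entry by its local index:

PetscInt       lidx = 2;              /* local index; maps to global index 9 in the mapping above */
PetscScalar    val  = 42.0;
PetscErrorCode ierr;

ierr = VecSetLocalToGlobalMapping(x,ltog);CHKERRQ(ierr);
ierr = VecSetValuesLocal(x,1,&lidx,&val,INSERT_VALUES);CHKERRQ(ierr);
ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
ierr = VecAssemblyEnd(x);CHKERRQ(ierr);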

2.4  Structured Grids Using Distributed Arrays

Distributed arrays (DMDAs), which are used in conjunction with PETSc vectors, are intended for
use with logically regular rectangular grids when communication of nonlocal data is needed before
certain local computations can occur. PETSc distributed arrays are designed only for the case in
which data can be thought of as being stored in a standard multidimensional array; thus, DMDAs are
not intended for parallelizing unstructured grid problems, etc. DMDAs are intended for communicating
vector (field) information; they are not intended for storing matrices.
For example, a typical situation one encounters in solving PDEs in parallel is that, to evaluate
a local function, f(x), each process requires its local portion of the vector x as well as its ghost
points (the bordering portions of the vector that are owned by neighboring processes). Figure 7
illustrates the ghost points for the seventh process of a two-dimensional, regular parallel grid. Each
box represents a process; the ghost points for the seventh process’s local part of a parallel array
are shown in gray.

2.4.1  Creating Distributed Arrays

The PETSc DMDA object manages the parallel communication required while working with data
stored in regular arrays. The actual data is stored in appropriately sized vector objects; the DMDA
object only contains the parallel data layout and communication information; however, it may be
used to create vectors and matrices with the proper layout.
One creates a distributed array communication data structure in two dimensions with the
command
DMDACreate2d(MPI_Comm comm,DMBoundaryType xperiod,DMBoundaryType yperiod,DMDAStencilType st,PetscInt M,PetscInt N,PetscInt m,PetscInt n,PetscInt dof,PetscInt s,PetscInt *lx,PetscInt *ly,DM *da);
The arguments M and N indicate the global numbers of grid points in each direction, while m and
n denote the process partition in each direction; m*n must equal the number of processes in the
MPI communicator, comm. Instead of specifying the process layout, one may use PETSC_DECIDE
for m and n so that PETSc will determine the partition using MPI. The type of periodicity of
the array is specified by xperiod and yperiod, which can be DM_BOUNDARY_NONE (no periodicity),
DM_BOUNDARY_PERIODIC (periodic in that direction), DM_BOUNDARY_TWIST (periodic in that
direction, but identified in reverse order), DM_BOUNDARY_GHOSTED, or DM_BOUNDARY_MIRROR. The
argument dof indicates the number of degrees of freedom at each array point, and s is the stencil
width (i.e., the width of the ghost point region). The optional arrays lx and ly may contain the
number of nodes owned by each process along the x and y axes, respectively; i.e., the dimension of
lx is m and the dimension of ly is n. Alternately, NULL may be passed in.
Two types of distributed array communication data structures can be created, as specified by
st. Star-type stencils that radiate outward only in the coordinate directions are indicated by
DMDA_STENCIL_STAR, while box-type stencils are specified by DMDA_STENCIL_BOX. For example,
for the two-dimensional case, DMDA_STENCIL_STAR with width 1 corresponds to the standard 5-point stencil, while DMDA_STENCIL_BOX with width 1 denotes the standard 9-point stencil. In
both instances the ghost points are identical, the only difference being that with star-type stencils
certain ghost points are ignored, decreasing substantially the number of messages sent. Note that
the DMDA_STENCIL_STAR stencils can save interprocess communication in two and three dimensions.
These DMDA stencils have nothing directly to do with any finite difference stencils one might
choose to use for a discretization; they only ensure that the correct values are in place for application
of a user-defined finite difference stencil (or any other discretization technique).
The commands for creating distributed array communication data structures in one and three
dimensions are analogous:
DMDACreate1d(MPI_Comm comm,DMBoundaryType xperiod,PetscInt M,PetscInt w,PetscInt s,PetscInt *lc,DM *inra);
DMDACreate3d(MPI_Comm comm,DMBoundaryType xperiod,DMBoundaryType yperiod,DMBoundaryType zperiod,DMDAStencilType stencil_type,PetscInt M,PetscInt N,PetscInt P,PetscInt m,PetscInt n,PetscInt p,PetscInt w,PetscInt s,PetscInt *lx,PetscInt *ly,PetscInt *lz,DM *inra);
The routines to create distributed arrays are collective, so that all processes in the communicator
comm must call DMDACreateXXX().
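For example, a sketch (ours; the grid dimensions are arbitrary) that creates a two-dimensional
DMDA for a scalar field with a star stencil of width 1, and then creates compatible global and local
vectors (discussed in the next subsection), is shown below. Note that in PETSc 3.8 the DMDA must
be set up with DMSetUp() after it is created and before it is used.

DM             da;
Vec            g,l;
PetscErrorCode ierr;

ierr = DMDACreate2d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,
                    64,64,PETSC_DECIDE,PETSC_DECIDE,1,1,NULL,NULL,&da);CHKERRQ(ierr);
ierr = DMSetFromOptions(da);CHKERRQ(ierr);   /* allow -da_grid_x, -da_grid_y, etc. */
ierr = DMSetUp(da);CHKERRQ(ierr);
ierr = DMCreateGlobalVector(da,&g);CHKERRQ(ierr);
ierr = DMCreateLocalVector(da,&l);CHKERRQ(ierr);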

2.4.2  Local/Global Vectors and Scatters

Each DMDA object defines the layout of two vectors: a distributed global vector and a local vector
that includes room for the appropriate ghost points. The DMDA object provides information about
the size and layout of these vectors, but does not internally allocate any associated storage space
for field values. Instead, the user can create vector objects that use the DMDA layout information
with the routines
DMCreateGlobalVector(DM da,Vec *g);
DMCreateLocalVector(DM da,Vec *l);

These vectors will generally serve as the building blocks for local and global PDE solutions, etc.
If additional vectors with such layout information are needed in a code, they can be obtained by
duplicating l or g via VecDuplicate() or VecDuplicateVecs().
We emphasize that a distributed array provides the information needed to communicate the
ghost value information between processes. In most cases, several different vectors can share the
same communication information (or, in other words, can share a given DMDA). The design of the
DMDA object makes this easy, as each DMDA operation may operate on vectors of the appropriate
size, as obtained via DMCreateLocalVector() and DMCreateGlobalVector() or as produced by
VecDuplicate(). As such, the DMDA scatter/gather operations (e.g., DMGlobalToLocalBegin())
require vector input/output arguments, as discussed below.
PETSc currently provides no container for multiple arrays sharing the same distributed array
communication; note, however, that the dof parameter handles many cases of interest.
At certain stages of many applications, there is a need to work on a local portion of the vector,
including the ghost points. This may be done by scattering a global vector into its local parts by
using the two-stage commands
DMGlobalToLocalBegin(DM da,Vec g,InsertMode iora,Vec l);
DMGlobalToLocalEnd(DM da,Vec g,InsertMode iora,Vec l);
which allow the overlap of communication and computation. Since the global and local vectors, given by g and l, respectively, must be compatible with the distributed array, da, they
should be generated by DMCreateGlobalVector() and DMCreateLocalVector() (or be duplicates
of such a vector obtained via VecDuplicate()). The InsertMode can be either ADD_VALUES or
INSERT_VALUES.
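For example, a short sketch (ours, reusing the da, g, and l objects created in the sketch of
Section 2.4.1) of refreshing the ghost values before a local computation is:

PetscErrorCode ierr;

ierr = DMGlobalToLocalBegin(da,g,INSERT_VALUES,l);CHKERRQ(ierr);
/* ... unrelated computation may be overlapped with the communication here ... */
ierr = DMGlobalToLocalEnd(da,g,INSERT_VALUES,l);CHKERRQ(ierr);
/* ... now work with the ghosted local vector l ... */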
One can scatter the local patches into the distributed vector with the commands
DMLocalToGlobalBegin(DM da,Vec l,InsertMode mode,Vec g);
DMLocalToGlobalEnd(DM da,Vec l,InsertMode mode,Vec g);
In general this is used with an InsertMode of ADD_VALUES, because if one wishes to insert values
into the global vector they should just access the global vector directly and put in the values.
A third type of distributed array scatter is from a local vector (including ghost points that
contain irrelevant values) to a local vector with correct ghost point values. This scatter may be
done with the commands
DMDALocalToLocalBegin(DM da,Vec l1,InsertMode iora,Vec l2);
DMDALocalToLocalEnd(DM da,Vec l1,InsertMode iora,Vec l2);
Since both local vectors, l1 and l2, must be compatible with the distributed array, da, they
should be generated by DMCreateLocalVector() (or be duplicates of such vectors obtained via
VecDuplicate()). The InsertMode can be either ADD_VALUES or INSERT_VALUES.
It is possible to directly access the vector scatter contexts (see below) used in the local-to-global
(ltog), global-to-local (gtol), and local-to-local (ltol) scatters with the command
DMDAGetScatter(DM da,VecScatter *ltog,VecScatter *gtol,VecScatter *ltol);
Most users should not need to use these contexts.

2.4.3  Local (Ghosted) Work Vectors

In most applications the local ghosted vectors are only needed during user “function evaluations”.
PETSc provides an easy, light-weight (requiring essentially no CPU time) way to obtain these work
vectors and return them when they are no longer needed. This is done with the routines
DMGetLocalVector(DM da,Vec *l);
... use the local vector l ...
DMRestoreLocalVector(DM da,Vec *l);

2.4.4  Accessing the Vector Entries for DMDA Vectors

PETSc provides an easy way to set values into the DMDA Vectors and access them using the
natural grid indexing. This is done with the routines
DMDAVecGetArray(DM da,Vec l,void *array);
... use the array indexing it with 1 or 2 or 3 dimensions ...
... depending on the dimension of the DMDA ...
DMDAVecRestoreArray(DM da,Vec l,void *array);
DMDAVecGetArrayRead(DM da,Vec l,void *array);
... use the array indexing it with 1 or 2 or 3 dimensions ...
... depending on the dimension of the DMDA ...
DMDAVecRestoreArrayRead(DM da,Vec l,void *array);
and
DMDAVecGetArrayDOF(DM da,Vec l,void *array);
... use the array indexing it with 1 or 2 or 3 dimensions ...
... depending on the dimension of the DMDA ...
DMDAVecRestoreArrayDOF(DM da,Vec l,void *array);
DMDAVecGetArrayDOFRead(DM da,Vec l,void *array);
... use the array indexing it with 1 or 2 or 3 dimensions ...
... depending on the dimension of the DMDA ...
DMDAVecRestoreArrayDOFRead(DM da,Vec l,void *array);
where array is a multidimensional C array with the same dimension as da. The vector l can be
either a global vector or a local vector. The array is accessed using the usual global indexing on
the entire grid, but the user may only refer to the local and ghost entries of this array as all other
entries are undefined. For example, for a scalar problem in two dimensions one could use
PetscScalar **f,**u;
...
DMDAVecGetArray(DM da,Vec local,&u);
DMDAVecGetArray(DM da,Vec global,&f);
...
f[i][j] = u[i][j] - ...
...
DMDAVecRestoreArray(DM da,Vec local,&u);
DMDAVecRestoreArray(DM da,Vec global,&f);
The recommended approach for multi-component PDEs is to declare a struct representing the
fields defined at each node of the grid, e.g.
typedef struct {
PetscScalar u,v,omega,temperature;
} Node;
and write residual evaluation using
Node **f,**u;
DMDAVecGetArray(DM da,Vec local,&u);
DMDAVecGetArray(DM da,Vec global,&f);
...
f[i][j].omega = ...
...
DMDAVecRestoreArray(DM da,Vec local,&u);
DMDAVecRestoreArray(DM da,Vec global,&f);
See $PETSC_DIR/src/snes/examples/tutorials/ex5.c for a complete example and see
$PETSC_DIR/src/snes/examples/tutorials/ex19.c for an example for a multi-component PDE.

2.4.5  Grid Information

The global indices of the lower left corner of the local portion of the array as well as the local array
size can be obtained with the commands
DMDAGetCorners(DM da,PetscInt *x,PetscInt *y,PetscInt *z,PetscInt *m,PetscInt *n,PetscInt *p);
DMDAGetGhostCorners(DM da,PetscInt *x,PetscInt *y,PetscInt *z,PetscInt *m,PetscInt *n,PetscInt *p);
The first version excludes any ghost points, while the second version includes them. The routine DMDAGetGhostCorners() deals with the fact that subarrays along boundaries of the problem
domain have ghost points only on their interior edges, but not on their boundary edges.
When either type of stencil is used, DMDA_STENCIL_STAR or DMDA_STENCIL_BOX, the local vectors
(with the ghost points) represent rectangular arrays, including the extra corner elements in the
DMDA_STENCIL_STAR case. This configuration provides simple access to the elements by employing
two- (or three-) dimensional indexing. The only difference between the two cases is that when
DMDA_STENCIL_STAR is used, the extra corner components are not scattered between the processes
and thus contain undefined values that should not be used.
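For example, the following sketch (ours; da is a two-dimensional scalar DMDA and g a global vector
obtained from it) loops over the locally owned portion of the grid using global indices:

PetscInt       i,j,xs,ys,xm,ym;
PetscScalar    **u;
PetscErrorCode ierr;

ierr = DMDAGetCorners(da,&xs,&ys,NULL,&xm,&ym,NULL);CHKERRQ(ierr);  /* owned (non-ghosted) corner and sizes */
ierr = DMDAVecGetArray(da,g,&u);CHKERRQ(ierr);
for (j=ys; j<ys+ym; j++) {
  for (i=xs; i<xs+xm; i++) {
    u[j][i] = 1.0;               /* note: indexed as [j][i] (y index first) for a 2D DMDA */
  }
}
ierr = DMDAVecRestoreArray(da,g,&u);CHKERRQ(ierr);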
To assemble global stiffness matrices, one can use these global indices with MatSetValues() or
MatSetValuesStencil(). Alternately, the global node number of each local node, including the
ghost nodes, can be obtained by calling
DMGetLocalToGlobalMapping(DM da,ISLocalToGlobalMapping *map);
followed by
VecSetLocalToGlobalMapping(Vec v,ISLocalToGlobalMapping map);
MatSetLocalToGlobalMapping(Mat A,ISLocalToGlobalMapping map);
Now entries may be added to the vector and matrix using the local numbering and
VecSetValuesLocal() and MatSetValuesLocal().
Since the global ordering that PETSc uses to manage its parallel vectors (and matrices) does
not usually correspond to the “natural” ordering of a two- or three-dimensional array, the DMDA
structure provides an application ordering AO (see Section 2.3.1) that maps between the natural
ordering on a rectangular grid and the ordering PETSc uses to parallelize. This ordering context
can be obtained with the command
DMDAGetAO(DM da,AO *ao);
Figure 8: Natural Ordering and PETSc Ordering for a 2D Distributed Array (Four Processes)
In Figure 8 we indicate the orderings for a two-dimensional distributed array, divided among four
processes.
The example $PETSC_DIR/src/snes/examples/tutorials/ex5.c illustrates the use of a distributed array in the solution of a nonlinear problem. The analogous Fortran program is
$PETSC_DIR/src/snes/examples/tutorials/ex5f.F90; see Chapter 5 for a discussion of the nonlinear
solvers.

2.5  Vectors Related to Unstructured Grids

2.5.1  Index Sets

To facilitate general vector scatters and gathers used, for example, in updating ghost points for
problems defined on unstructured grids (see also Chapter 10, which describes DMPlex, an abstraction
for working with unstructured grids), PETSc employs the concept of an index set, via the IS
class. An index set, which is a generalization of a set of integer indices, is used to define scatters,
gathers, and similar operations on vectors and matrices.
The following command creates an index set based on a list of integers:
ISCreateGeneral(MPI_Comm comm,PetscInt n,PetscInt *indices,PetscCopyMode mode,IS *is);
When mode is PETSC_COPY_VALUES, this routine copies the n indices passed to it by the integer
array indices. Thus, the user should be sure to free the integer array indices when it is no longer
needed, perhaps directly after the call to ISCreateGeneral(). The communicator, comm, should
consist of all processes that will be using the IS.
Another standard index set is defined by a starting point (first) and a stride (step), and can
be created with the command
ISCreateStride(MPI_Comm comm,PetscInt n,PetscInt first,PetscInt step,IS *is);
Index sets can be destroyed with the command
ISDestroy(IS *is);
On rare occasions the user may need to access information directly from an index set. Several
commands assist in this process:
ISGetSize(IS is,PetscInt *size);
ISStrideGetInfo(IS is,PetscInt *first,PetscInt *stride);
ISGetIndices(IS is,PetscInt **indices);
The function ISGetIndices() returns a pointer to a list of the indices in the index set. For certain
index sets, this may be a temporary array of indices created specifically for a given routine. Thus,
once the user finishes using the array of indices, the routine
ISRestoreIndices(IS is, PetscInt **indices);
should be called to ensure that the system can free the space it may have used to generate the list
of indices.
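For example, a brief sketch (with illustrative sizes) that creates a stride index set and examines
its entries might look like:

IS             is;
PetscInt       n, i;
const PetscInt *indices;

ISCreateStride(PETSC_COMM_SELF,5,0,2,&is);     /* the indices 0,2,4,6,8 */
ISGetSize(is,&n);
ISGetIndices(is,&indices);
for (i=0; i<n; i++) {
  /* ... use indices[i] ... */
}
ISRestoreIndices(is,&indices);
ISDestroy(&is);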
A blocked version of the index sets can be created with the command
ISCreateBlock(MPI_Comm comm,PetscInt bs,PetscInt n,PetscInt *indices,PetscCopyMode mode,IS *is);
This version is used for defining operations in which each element of the index set refers to a block
of bs vector entries. Related routines analogous to those described above exist as well, including
ISBlockGetIndices(), ISBlockGetSize(), ISBlockGetLocalSize(), ISGetBlockSize(). See
the man pages for details.

2.5.2 Scatters and Gathers

PETSc vectors have full support for general scatters and gathers. One can select any subset of the
components of a vector to insert or add to any subset of the components of another vector. We
refer to these operations as generalized scatters, though they are actually a combination of scatters
and gathers.
To copy selected components from one vector to another, one uses the following set of commands:
VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx);
VecScatterBegin(VecScatter ctx,Vec x,Vec y,INSERT_VALUES,SCATTER_FORWARD);
VecScatterEnd(VecScatter ctx,Vec x,Vec y,INSERT_VALUES,SCATTER_FORWARD);
VecScatterDestroy(VecScatter *ctx);
Here ix denotes the index set of the first vector, while iy indicates the index set of the destination
vector. The vectors can be parallel or sequential. The only requirements are that the number
of entries in the index set of the first vector, ix, equals the number in the destination index set,
iy, and that the vectors be long enough to contain all the indices referred to in the index sets.
The argument INSERT_VALUES specifies that the vector elements will be inserted into the specified
locations of the destination vector, overwriting any existing values. To add the components, rather
than insert them, the user should select the option ADD_VALUES instead of INSERT_VALUES.
To perform a conventional gather operation, the user simply makes the destination index set,
iy, be a stride index set with a stride of one. Similarly, a conventional scatter can be done with an
initial (sending) index set consisting of a stride. The scatter routines are collective operations (i.e.
all processes that own a parallel vector must call the scatter routines). When scattering from a
parallel vector to sequential vectors, each process has its own sequential vector that receives values
from locations as indicated in its own index set. Similarly, in scattering from sequential vectors to
a parallel vector, each process has its own sequential vector that makes contributions to the parallel
vector.

Vec         p, x;                       /* initial vector, destination vector */
VecScatter  scatter;                    /* scatter context */
IS          from, to;                   /* index sets that define the scatter */
PetscScalar *values;
PetscInt    idx_from[] = {100,200}, idx_to[] = {0,1};

VecCreateSeq(PETSC_COMM_SELF,2,&x);
ISCreateGeneral(PETSC_COMM_SELF,2,idx_from,PETSC_COPY_VALUES,&from);
ISCreateGeneral(PETSC_COMM_SELF,2,idx_to,PETSC_COPY_VALUES,&to);
VecScatterCreate(p,from,x,to,&scatter);
VecScatterBegin(scatter,p,x,INSERT_VALUES,SCATTER_FORWARD);
VecScatterEnd(scatter,p,x,INSERT_VALUES,SCATTER_FORWARD);
VecGetArray(x,&values);
ISDestroy(&from);
ISDestroy(&to);
VecScatterDestroy(&scatter);

Figure 9: Example Code for Vector Scatters
Caution: When INSERT_VALUES is used, if two different processes contribute different values to
the same component in a parallel vector, either value may end up being inserted. When ADD_VALUES
is used, the correct sum is added to the correct location.
In some cases one may wish to “undo” a scatter, that is perform the scatter backwards, switching
the roles of the sender and receiver. This is done by using
VecScatterBegin(VecScatter ctx,Vec y,Vec x,INSERT_VALUES,SCATTER_REVERSE);
VecScatterEnd(VecScatter ctx,Vec y,Vec x,INSERT_VALUES,SCATTER_REVERSE);
Note that the roles of the first two arguments to these routines must be swapped whenever the
SCATTER_REVERSE option is used.
Once a VecScatter object has been created it may be used with any vectors that have the
appropriate parallel data layout. That is, one can call VecScatterBegin() and VecScatterEnd()
with different vectors than used in the call to VecScatterCreate() so long as they have the same
parallel layout (number of elements on each process are the same). Usually, these “different” vectors
would have been obtained via calls to VecDuplicate() from the original vectors used in the call
to VecScatterCreate().
There is a PETSc routine that is nearly the opposite of VecSetValues(), that is, VecGetValues(),
but it can only get local values from the vector. To get off-process values, the user should create
a new vector where the components are to be stored, and then perform the appropriate vector
scatter. For example, if one desires to obtain the values of the 100th and 200th entries of a parallel
vector, p, one could use a code such as that within Figure 9. In this example, the values of the
100th and 200th components are placed in the array values. In this example each process now has
the 100th and 200th component, but obviously each process could gather any elements it needed,
or none by creating an index set with no entries.
The scatter comprises two stages, in order to allow overlap of communication and computation.
The introduction of the VecScatter context allows the communication patterns for the scatter to
be computed once and then reused repeatedly. Generally, even setting up the communication for a
scatter requires communication; hence, it is best to reuse such information when possible.

2.5.3 Scattering Ghost Values

Generalized scatters provide a very general method for managing the communication of required
ghost values for unstructured grid computations. One scatters the global vector into a local
“ghosted” work vector, performs the computation on the local work vectors, and then scatters
back into the global solution vector. In the simplest case this may be written as
VecScatterBegin(VecScatter scatter,Vec globalin,Vec localin,InsertMode INSERT_VALUES,
                ScatterMode SCATTER_FORWARD);
VecScatterEnd(VecScatter scatter,Vec globalin,Vec localin,InsertMode INSERT_VALUES,
              ScatterMode SCATTER_FORWARD);
/* For example, do local calculations from localin to localout */
...
VecScatterBegin(VecScatter scatter,Vec localout,Vec globalout,InsertMode ADD_VALUES,
                ScatterMode SCATTER_REVERSE);
VecScatterEnd(VecScatter scatter,Vec localout,Vec globalout,InsertMode ADD_VALUES,
              ScatterMode SCATTER_REVERSE);

2.5.4 Vectors with Locations for Ghost Values

There are two minor drawbacks to the basic approach described above:
• the extra memory requirement for the local work vector, localin, which duplicates the
memory in globalin, and
• the extra time required to copy the local values from localin to globalin.
An alternative approach is to allocate global vectors with space preallocated for the ghost values;
this may be done with either
VecCreateGhost(MPI_Comm comm,PetscInt n,PetscInt N,PetscInt nghost,PetscInt *ghosts,Vec *vv)
or
VecCreateGhostWithArray(MPI_Comm comm,PetscInt n,PetscInt N,PetscInt nghost,PetscInt *ghosts,
                        PetscScalar *array,Vec *vv)
Here n is the number of local vector entries, N is the number of global entries (or PETSC_DECIDE) and nghost
is the number of ghost entries. The array ghosts is of size nghost and contains the global vector
location for each local ghost location. Using VecDuplicate() or VecDuplicateVecs() on a ghosted
vector will generate additional ghosted vectors.
In many ways, a ghosted vector behaves just like any other MPI vector created by VecCreateMPI().
The difference is that the ghosted vector has an additional “local” representation that allows one
to access the ghost locations. This is done through the call to
VecGhostGetLocalForm(Vec g,Vec *l);
The vector l is a sequential representation of the parallel vector g that shares the same array space
(and hence numerical values), but allows one to access the “ghost” values past “the end of” the
array. Note that one accesses the entries in l using the local numbering of elements and ghosts, while
they are accessed in g using the global numbering.
A common usage of a ghosted vector is given by
VecGhostUpdateBegin(Vec globalin,InsertMode INSERT_VALUES,ScatterMode SCATTER_FORWARD);
VecGhostUpdateEnd(Vec globalin,InsertMode INSERT_VALUES,ScatterMode SCATTER_FORWARD);
VecGhostGetLocalForm(Vec globalin,Vec *localin);
VecGhostGetLocalForm(Vec globalout,Vec *localout);
... Do local calculations from localin to localout ...
VecGhostRestoreLocalForm(Vec globalin,Vec *localin);
VecGhostRestoreLocalForm(Vec globalout,Vec *localout);
VecGhostUpdateBegin(Vec globalout,InsertMode ADD_VALUES,ScatterMode SCATTER_REVERSE);
VecGhostUpdateEnd(Vec globalout,InsertMode ADD_VALUES,ScatterMode SCATTER_REVERSE);
The routines VecGhostUpdateBegin() and VecGhostUpdateEnd() are equivalent to the routines VecScatterBegin() and VecScatterEnd() above except that since they are scattering into
the ghost locations, they do not need to copy the local vector values, which are already in place. In
addition, the user does not have to allocate the local work vector, since the ghosted vector already
has allocated slots to contain the ghost values.
The input arguments INSERT_VALUES and SCATTER_FORWARD cause the ghost values to be correctly updated from the appropriate process. The arguments ADD_VALUES and SCATTER_REVERSE
update the “local” portions of the vector from all the other processes’ ghost values. This would be
appropriate, for example, when performing a finite element assembly of a load vector.
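As a concrete, minimal sketch (the local size of 3, the global size of 6, and the choice of ghost
indices are illustrative assumptions for a two-process run), a ghosted vector could be created and
updated as follows:

PetscMPIInt rank;
PetscInt    ghosts[1];
Vec         gv, lv;

MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
ghosts[0] = (rank == 0) ? 3 : 2;        /* global index of the one ghost value each process needs */
VecCreateGhost(PETSC_COMM_WORLD,3,6,1,ghosts,&gv);
/* ... set and assemble the owned entries of gv ... */
VecGhostUpdateBegin(gv,INSERT_VALUES,SCATTER_FORWARD);
VecGhostUpdateEnd(gv,INSERT_VALUES,SCATTER_FORWARD);
VecGhostGetLocalForm(gv,&lv);
/* local entries 0..2 of lv are the owned values; entry 3 is the ghost value */
VecGhostRestoreLocalForm(gv,&lv);
VecDestroy(&gv);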
Section 3.5 discusses the important topic of partitioning an unstructured grid.

Chapter 3

Matrices
PETSc provides a variety of matrix implementations because no single matrix format is appropriate
for all problems. Currently, we support dense storage and compressed sparse row storage (both
sequential and parallel versions), as well as several specialized formats. Additional formats can be
added.
This chapter describes the basics of using PETSc matrices in general (regardless of the particular
format chosen) and discusses tips for efficient use of the several simple uniprocess and parallel
matrix types. The use of PETSc matrices involves the following actions: create a particular type
of matrix, insert values into it, process the matrix, use the matrix for various computations, and
finally destroy the matrix. The application code does not need to know or care about the particular
storage formats of the matrices.

3.1 Creating and Assembling Matrices

The simplest routine for forming a PETSc matrix, A, is
MatCreate(MPI_Comm comm,Mat *A)
followed by
MatSetSizes(Mat A,PetscInt m,PetscInt n,PetscInt M,PetscInt N)
This routine generates a sequential matrix when running one process and a parallel matrix for two or
more processes; the particular matrix format is set by the user via options database commands. The
user specifies either the global matrix dimensions, given by M and N, or the local dimensions, given
by m and n, while PETSc completely controls memory allocation. This routine facilitates switching
among various matrix types, for example, to determine the format that is most efficient for a certain
application. By default, MatCreate() employs the sparse AIJ format, which is discussed in detail in
Section 3.1.1. See the manual pages for further information about available matrix formats.
To insert or add entries to a matrix, one can call a variant of MatSetValues(), either
MatSetValues(Mat A,PetscInt m,const PetscInt idxm[],PetscInt n,const PetscInt idxn[],
             const PetscScalar values[],INSERT_VALUES);
or
MatSetValues(Mat A,PetscInt m,const PetscInt idxm[],PetscInt n,const PetscInt idxn[],
             const PetscScalar values[],ADD_VALUES);
This routine inserts or adds a logically dense subblock of dimension m*n into the matrix. The
integer indices idxm and idxn, respectively, indicate the global row and column numbers to be
inserted. MatSetValues() uses the standard C convention, where the row and column matrix
indices begin with zero regardless of the storage format employed. The array values is logically
two-dimensional, containing the values that are to be inserted. By default the values are given in
row major order, which is the opposite of the Fortran convention, meaning that the value to be
put in row idxm[i] and column idxn[j] is located in values[i*n+j]. To allow the insertion of
values in column major order, one can call the command
MatSetOption(Mat A,MAT_ROW_ORIENTED,PETSC_FALSE);
Warning: Several of the sparse implementations do not currently support the column-oriented
option.
This notation should not be a mystery to anyone. For example, to insert one matrix into another
when using MATLAB, one uses the command A(im,in) = B; where im and in contain the indices
for the rows and columns. This action is identical to the calls above to MatSetValues().
When using the block compressed sparse row matrix format (MATSEQBAIJ or MATMPIBAIJ),
one can insert elements more efficiently using the block variant, MatSetValuesBlocked() or
MatSetValuesBlockedLocal().
The function MatSetOption() accepts several other inputs; see the manual page for details.
After the matrix elements have been inserted or added into the matrix, they must be processed
(also called “assembled”) before they can be used. The routines for matrix processing are
MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
By placing other code between these two calls, the user can perform computations while messages
are in transit. Calls to MatSetValues() with the INSERT_VALUES and ADD_VALUES options
cannot be mixed without intervening calls to the assembly routines. For such intermediate
assembly calls the second routine argument typically should be MAT_FLUSH_ASSEMBLY, which omits
some of the work of the full assembly process. MAT_FINAL_ASSEMBLY is required only in the last
matrix assembly before a matrix is used.
Even though one may insert values into PETSc matrices without regard to which process
eventually stores them, for efficiency reasons we usually recommend generating most entries on the
process where they are destined to be stored. To help the application programmer with this task
for matrices that are distributed across the processes by ranges, the routine
MatGetOwnershipRange(Mat A,PetscInt *first_row,PetscInt *last_row);
informs the user that all rows from first_row to last_row-1 (since the value returned in last_row
is one more than the global index of the last local row) will be stored on the local process.
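For example, a minimal sketch (the matrix size N and the use of MatSetFromOptions() and
MatSetUp() are illustrative choices) that assembles a tridiagonal matrix using only locally owned
rows is:

Mat         A;
PetscInt    i, rstart, rend, ncols, col[3], N = 100;
PetscScalar value[3];

MatCreate(PETSC_COMM_WORLD,&A);
MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);
MatSetFromOptions(A);                    /* allow -mat_type etc. to select the format */
MatSetUp(A);                             /* or call a preallocation routine; see Section 3.1.1 */
MatGetOwnershipRange(A,&rstart,&rend);
for (i=rstart; i<rend; i++) {
  ncols = 0;
  if (i > 0)   { col[ncols] = i-1; value[ncols] = -1.0; ncols++; }
  col[ncols] = i; value[ncols] = 2.0; ncols++;
  if (i < N-1) { col[ncols] = i+1; value[ncols] = -1.0; ncols++; }
  MatSetValues(A,1,&i,ncols,col,value,INSERT_VALUES);
}
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);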
In the sparse matrix implementations, once the assembly routines have been called, the matrices
are compressed and can be used for matrix-vector multiplication, etc. Any space for preallocated
nonzeros that was not filled by a call to MatSetValues() or a related routine is compressed out by
assembling with MAT_FINAL_ASSEMBLY. If you intend to use that extra space later, be sure to insert
explicit zeros before assembling with MAT_FINAL_ASSEMBLY so the space will not be compressed out.
Once the matrix has been assembled, inserting new values will be expensive since it will require
copies and possible memory allocation.
If one wishes to repeatedly assemble matrices that retain the same nonzero pattern (such as
within a nonlinear or time-dependent problem), the option
MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
should be specified after the first matrix has been fully assembled. This option ensures that
certain data structures and communication information will be reused (instead of regenerated)
during successive steps, thereby increasing efficiency. See $PETSC_DIR/src/ksp/ksp/examples/
tutorials/ex5.c for a simple example of solving two linear systems that use the same matrix
data structure.

3.1.1 Sparse Matrices

The default matrix representation within PETSc is the general sparse AIJ format (also called the
Yale sparse matrix format or compressed sparse row format, CSR). This section discusses tips for
efficiently using this matrix format for large-scale applications. Additional formats (such as block
compressed row and block diagonal storage, which are generally much more efficient for problems
with multiple degrees of freedom per node) are discussed below. Beginning users need not concern
themselves initially with such details and may wish to proceed directly to Section 3.2. However,
when an application code progresses to the point of tuning for efficiency and/or generating timing
results, it is crucial to read this information.
Sequential AIJ Sparse Matrices
In the PETSc AIJ matrix formats, we store the nonzero elements by rows, along with an array of
corresponding column numbers and an array of pointers to the beginning of each row. Note that
the diagonal matrix entries are stored with the rest of the nonzeros (not separately).
To create a sequential AIJ sparse matrix, A, with m rows and n columns, one uses the command
MatCreateSeqAIJ(PETSC_COMM_SELF,PetscInt m,PetscInt n,PetscInt nz,PetscInt *nnz,Mat *A);
where nz or nnz can be used to preallocate matrix memory, as discussed below. The user can set
nz=0 and nnz=NULL for PETSc to control all matrix memory allocation.
The sequential and parallel AIJ matrix storage formats by default employ i-nodes (identical
nodes) when possible. We search for consecutive rows with the same nonzero structure, thereby
reusing matrix information for increased efficiency. Related options database keys are -mat_no_
inode (do not use inodes) and -mat_inode_limit <limit> (set inode limit; max limit=5). Note
that problems with a single degree of freedom per grid node will automatically not use I-nodes.
The internal data representation for the AIJ formats employs zero-based indexing.
Preallocation of Memory for Sequential AIJ Sparse Matrices
The dynamic process of allocating new memory and copying from the old storage to the new is
intrinsically very expensive. Thus, to obtain good performance when assembling an AIJ matrix, it
is crucial to preallocate the memory needed for the sparse matrix. The user has two choices for
preallocating matrix memory via MatCreateSeqAIJ().
One can use the scalar nz to specify the expected number of nonzeros for each row. This is
generally fine if the number of nonzeros per row is roughly the same throughout the matrix (or as a
quick and easy first step for preallocation). If one underestimates the actual number of nonzeros in
a given row, then during the assembly process PETSc will automatically allocate additional needed
space. However, this extra memory allocation can slow the computation.
If different rows have very different numbers of nonzeros, one should attempt to indicate (nearly)
the exact number of elements intended for the various rows with the optional array, nnz of length
m, where m is the number of rows, for example
PetscInt nnz[m];
nnz[0] = <nonzeros expected in row 0>
nnz[1] = <nonzeros expected in row 1>
....
nnz[m-1] = <nonzeros expected in row m-1>
In this case, the assembly process will require no additional memory allocations if the nnz estimates
are correct. If, however, the nnz estimates are incorrect, PETSc will automatically obtain the
additional needed space, at a slight loss of efficiency.
Using the array nnz to preallocate memory is especially important for efficient matrix assembly
if the number of nonzeros varies considerably among the rows. One can generally set nnz either
by knowing in advance the problem structure (e.g., the stencil for finite difference problems on a
structured grid) or by precomputing the information by using a segment of code similar to that
for the regular matrix assembly. The overhead of determining the nnz array will be quite small
compared with the overhead of the inherently expensive mallocs and moves of data that are needed
for dynamic allocation during matrix assembly. Always guess high if an exact value is not known
(extra space is cheaper than too little).
Thus, when assembling a sparse matrix with very different numbers of nonzeros in various rows,
one could proceed as follows for finite difference methods (a short sketch of these steps appears
after the two lists below):
1. Allocate integer array nnz.
2. Loop over grid, counting the expected number of nonzeros for the row(s) associated with the
various grid points.
3. Create the sparse matrix via MatCreateSeqAIJ() or alternative.
4. Loop over the grid, generating matrix entries and inserting in matrix via MatSetValues().
For (vertex-based) finite element type calculations, an analogous procedure is as follows:
1. Allocate integer array nnz.
2. Loop over vertices, computing the number of neighbor vertices, which determines the number
of nonzeros for the corresponding matrix row(s).
3. Create the sparse matrix via MatCreateSeqAIJ() or alternative.
4. Loop over elements, generating matrix entries and inserting in matrix via MatSetValues().
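The following sketch illustrates steps 1-3 of the finite difference procedure for an assumed 5-point
stencil on an mx-by-my structured grid with one unknown per node (the grid sizes are illustrative):

PetscInt mx = 10, my = 10, m = mx*my, i, j, row, *nnz;
Mat      A;

PetscMalloc1(m,&nnz);
for (j=0; j<my; j++) {
  for (i=0; i<mx; i++) {
    row = j*mx + i;
    nnz[row] = 1;                        /* the diagonal entry */
    if (i > 0)    nnz[row]++;            /* west neighbor */
    if (i < mx-1) nnz[row]++;            /* east neighbor */
    if (j > 0)    nnz[row]++;            /* south neighbor */
    if (j < my-1) nnz[row]++;            /* north neighbor */
  }
}
MatCreateSeqAIJ(PETSC_COMM_SELF,m,m,0,nnz,&A);
PetscFree(nnz);
/* then loop over the grid again, inserting entries with MatSetValues() */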
The -info option causes the routines MatAssemblyBegin() and MatAssemblyEnd() to print information about the success of the preallocation. Consider the following example for the MATSEQAIJ
matrix format:
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:20 unneeded, 100 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 0

The first line indicates that the user preallocated 120 spaces but only 100 were used. The second
line indicates that the user preallocated enough space so that PETSc did not have to internally
allocate additional space (an expensive operation). In the next example the user did not preallocate
sufficient space, as indicated by the fact that the number of mallocs is very large (bad for efficiency):
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:47 unneeded, 1000 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 40000

Although at first glance such procedures for determining the matrix structure in advance may
seem unusual, they are actually very efficient because they alleviate the need for dynamic construction of the matrix data structure, which can be very expensive.
Parallel AIJ Sparse Matrices
Parallel sparse matrices with the AIJ format can be created with the command
MatCreateAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,PetscInt d_nz,
             PetscInt *d_nnz,PetscInt o_nz,PetscInt *o_nnz,Mat *A);
A is the newly created matrix, while the arguments m, M, and N indicate the number of local rows
and the number of global rows and columns, respectively. In the PETSc partitioning scheme, all the
matrix columns are local and n is the number of columns corresponding to the local part of a parallel
vector. Either the local or global parameters can be replaced with PETSC_DECIDE, so that PETSc
will determine them. The matrix is stored with a fixed number of rows on each process, given by
m, or determined by PETSc if m is PETSC_DECIDE.
If PETSC_DECIDE is not used for the arguments m and n, then the user must ensure that they are
chosen to be compatible with the vectors. To do this, one first considers the matrix-vector product
y = Ax. The m that is used in the matrix creation routine MatCreateAIJ() must match the local
size used in the vector creation routine VecCreateMPI() for y. Likewise, the n used must match
that used as the local size in VecCreateMPI() for x.
The user must set d_nz=0, o_nz=0, d_nnz=NULL, and o_nnz=NULL for PETSc to control dynamic allocation of matrix memory space.
Analogous to nz and nnz for the routine
MatCreateSeqAIJ(), these arguments optionally specify nonzero information for the diagonal (d_nz
and d_nnz) and off-diagonal (o_nz and o_nnz) parts of the matrix. For a square global matrix, we
define each process’s diagonal portion to be its local rows and the corresponding columns (a square
submatrix); each process’s off-diagonal portion encompasses the remainder of the local matrix (a
rectangular submatrix). The rank in the MPI communicator determines the absolute ordering of
the blocks. That is, the process with rank 0 in the communicator given to MatCreateAIJ() contains the top rows of the matrix; the ith process in that communicator contains the ith block of the
matrix.
Preallocation of Memory for Parallel AIJ Sparse Matrices
As discussed above, preallocation of memory is critical for achieving good performance during
matrix assembly, as this reduces the number of allocations and copies required. We present an
example for three processes to indicate how this may be done for the MATMPIAIJ matrix format.
Consider the 8 by 8 matrix, which is partitioned by default with three rows on the first process,
three on the second and two on the third.

















     1   2   0 |  0   3   0 |  0   4
     0   5   6 |  7   0   0 |  8   0
     9   0  10 | 11   0   0 | 12   0
    13   0  14 | 15  16  17 |  0   0
     0  18   0 | 19  20  21 |  0   0
     0   0   0 | 22  23   0 | 24   0
    25  26  27 |  0   0  28 | 29   0
    30   0   0 | 31  32  33 |  0  34

The “diagonal” submatrix, d, on the first process is given by


     1   2   0
     0   5   6
     9   0  10 ,

while the “off-diagonal” submatrix, o, is given by


     0   3   0   0   4
     7   0   0   8   0
    11   0   0  12   0 .

For the first process one could set d_nz to 2 (since each row has 2 nonzeros) or, alternatively, set
d_nnz to {2, 2, 2}. The o_nz could be set to 2 since each row of the o matrix has 2 nonzeros, or
o_nnz could be set to {2, 2, 2}.
For the second process the d submatrix is given by


    15  16  17
    19  20  21
    22  23   0 .

Thus, one could set d_nz to 3, since the maximum number of nonzeros in each row is 3, or
alternatively one could set d_nnz to {3, 3, 2}, thereby indicating that the first two rows will have 3
nonzeros while the third has 2. The corresponding o submatrix for the second process is


    13   0  14   0   0
     0  18   0   0   0
     0   0   0  24   0

so that one could set o_nz to 2 or o_nnz to {2,1,1}.
Note that the user never directly works with the d and o submatrices, except when preallocating
storage space as indicated above. Also, the user need not preallocate exactly the correct amount of
space; as long as a sufficiently close estimate is given, the high efficiency for matrix assembly will
remain.
As described above, the option -info will print information about the success of preallocation
during matrix assembly. For the MATMPIAIJ and MATMPIBAIJ formats, PETSc will also list the
number of elements owned by each process that were generated on a different process. For
example, the statements
MatAssemblyBegin_MPIAIJ:Stash has 10 entries, uses 0 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 3 entries, uses 0 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 5 entries, uses 0 mallocs

indicate that very few values have been generated on different processes. On the other hand, the
statements
MatAssemblyBegin_MPIAIJ:Stash has 100000 entries, uses 100 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 77777 entries, uses 70 mallocs

indicate that many values have been generated on the “wrong” processes. This situation can be
very inefficient, since the transfer of values to the “correct” process is generally expensive. By using
the command MatGetOwnershipRange() in application codes, the user should be able to generate
most entries on the owning process.
Note: It is fine to generate some entries on the “wrong” process. Often this can lead to cleaner,
simpler, less buggy codes. One should never make code overly complicated in order to generate all
values locally. Rather, one should organize the code in such a way that most values are generated
locally.

3.1.2 Dense Matrices

PETSc provides both sequential and parallel dense matrix formats, where each process stores its
entries in a column-major array in the usual Fortran style. To create a sequential, dense PETSc
matrix, A of dimensions m by n, the user should call
MatCreateSeqDense(PETSC_COMM_SELF,PetscInt m,PetscInt n,PetscScalar *data,Mat *A);
The variable data enables the user to optionally provide the location of the data for matrix storage
(intended for Fortran users who wish to allocate their own storage space). Most users should
merely set data to NULL for PETSc to control matrix memory allocation. To create a parallel,
dense matrix, A, the user should call
MatCreateDense(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,PetscScalar *data,Mat *A)
The arguments m, n, M, and N, indicate the number of local rows and columns and the number of
global rows and columns, respectively. Either the local or global parameters can be replaced with
PETSC_DECIDE, so that PETSc will determine them. The matrix is stored with a fixed number of
rows on each process, given by m, or determined by PETSc if m is PETSC_DECIDE.
PETSc does not provide parallel dense direct solvers, instead interfacing to external packages
that provide these solvers. Our focus is on sparse iterative solvers.

3.1.3 Block Matrices

Block matrices arise when coupling variables with different meaning, especially when solving problems with constraints (e.g. incompressible flow) and “multi-physics” problems. Usually the number
of blocks is small and each block is partitioned in parallel. We illustrate for a 3 × 3 system with
components labeled a, b, c. With some numbering of unknowns, the matrix could be written as


    [ Aaa  Aab  Aac ]
    [ Aba  Abb  Abc ] .
    [ Aca  Acb  Acc ]
There are two fundamentally different ways that this matrix could be stored, as a single assembled
sparse matrix where entries from all blocks are merged together (“monolithic”), or as separate
assembled matrices for each block (“nested”). These formats have different performance characteristics depending on the operation being performed. In particular, many preconditioners require a
monolithic format, but some that are very effective for solving block systems (see Section 4.5) are
more efficient when a nested format is used. In order to stay flexible, we would like to be able to use
the same code to assemble block matrices in both monolithic and nested formats. Additionally, for
software maintainability and testing, especially in a multi-physics context where different groups
might be responsible for assembling each of the blocks, it is desirable to be able to use exactly the
same code to assemble a single block independently as to assemble it as part of a larger system.
To do this, we introduce the four spaces shown in Figure 10.
• The monolithic global space is the space in which the Krylov and Newton solvers operate,
with collective semantics across the entire block system.
• The split global space splits the blocks apart, but each split still has collective semantics.
• The split local space adds ghost points and separates the blocks. Operations in this space can
be performed with no parallel communication. This is often the most natural, and certainly
the most powerful, space for matrix assembly code.
• The monolithic local space can be thought of as adding ghost points to the monolithic global
space, but it is often more natural to use it simply as a concatenation of split local spaces
on each process. It is not common to explicitly manipulate vectors or matrices in this space
(at least not during assembly), but it is useful for declaring which part of a matrix is being
assembled.

[Figure 10 diagrams the four spaces (Monolithic Global, Split Global, Split Local, and Monolithic
Local) on ranks 0-2, and the operations that connect them: LocalToGlobalMapping / LocalToGlobal(),
GetLocalSubMatrix(), and GetSubMatrix() / GetSubVector().]

Figure 10: The relationship between spaces used for coupled assembly.
The key to format-independent assembly is the function
MatGetLocalSubMatrix(Mat A,IS isrow,IS iscol,Mat *submat);
which provides a “view” submat into a matrix A that operates in the monolithic global space. The
submat transforms from the split local space defined by iscol to the split local space defined by
isrow. The index sets specify the parts of the monolithic local space that submat should operate in.
If a nested matrix format is used, then MatGetLocalSubMatrix() finds the nested block and returns
it without making any copies. In this case, submat is fully functional and has a parallel communicator. If a monolithic matrix format is used, then MatGetLocalSubMatrix() returns a proxy matrix
on PETSC_COMM_SELF that does not provide values or implement MatMult(), but does implement
MatSetValuesLocal() and, if isrow,iscol have a constant block size, MatSetValuesBlockedLocal().
Note that although submat may not be a fully functional matrix and the caller does not even know
a priori which communicator it will reside on, it always implements the local assembly functions
(which are not collective).
The index sets isrow,iscol can be obtained using
DMCompositeGetLocalISs() if DMComposite is being used. DMComposite can also be used to
create matrices, in which case the MATNEST format can be specified using -prefix_dm_mat_type
nest and MATAIJ can be specified using -prefix_dm_mat_type aij. See $PETSC_DIR/src/snes/
examples/tutorials/ex28.c for a simple example using this interface.
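As a brief sketch (assuming a DMComposite dm with at least two fields, along the lines of the
example just mentioned), one might assemble into a single off-diagonal block as follows:

IS  *is;
Mat A, Asub;

DMCreateMatrix(dm,&A);                        /* matrix with the layout of the composite system */
DMCompositeGetLocalISs(dm,&is);               /* index sets for each split in the local space */
MatGetLocalSubMatrix(A,is[0],is[1],&Asub);
/* ... insert entries into this block with MatSetValuesLocal(Asub,...) ... */
MatRestoreLocalSubMatrix(A,is[0],is[1],&Asub);
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
/* destroy is[0], is[1] with ISDestroy() and free is when no longer needed */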
3.2 Basic Matrix Operations

Table 2 summarizes basic PETSc matrix operations. We briefly discuss a few of these routines in
more detail below.
The parallel matrix can multiply a vector with n local entries, returning a vector with m local
entries. That is, to form the product
MatMult(Mat A,Vec x,Vec y);
the vectors x and y should be generated with
VecCreateMPI(MPI_Comm comm,n,N,&x);
VecCreateMPI(MPI_Comm comm,m,M,&y);
By default, if the user lets PETSc decide the number of components to be stored locally (by passing
in PETSC_DECIDE as the second argument to VecCreateMPI() or using VecCreate()), vectors and
matrices of the same dimension are automatically compatible for parallel matrix-vector operations.
Along with the matrix-vector multiplication routine, there is a version for the transpose of the
matrix,
MatMultTranspose(Mat A,Vec x,Vec y);
There are also versions that add the result to another vector:
MatMultAdd(Mat A,Vec x,Vec y,Vec w);
MatMultTransposeAdd(Mat A,Vec x,Vec y,Vec w);
These routines, respectively, produce w = A*x + y and w = A^T*x + y. In C it is legal for the
vectors y and w to be identical. In Fortran, this situation is forbidden by the language standard,
but we allow it anyway.
One can print a matrix (sequential or parallel) to the screen with the command
MatView(Mat mat,PETSC_VIEWER_STDOUT_WORLD);
Other viewers can be used as well. For instance, one can draw the nonzero structure of the matrix
into the default X-window with the command
MatView(Mat mat,PETSC_VIEWER_DRAW_WORLD);
Also one can use
MatView(Mat mat,PetscViewer viewer);
where viewer was obtained with PetscViewerDrawOpen(). Additional viewers and options are
given in the MatView() man page and Section 15.3.
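For instance, a brief sketch (the window title and size are arbitrary illustrative choices) that draws
the nonzero structure of an assembled matrix A is:

PetscViewer viewer;

PetscViewerDrawOpen(PETSC_COMM_WORLD,NULL,"Nonzero structure",PETSC_DECIDE,PETSC_DECIDE,
                    300,300,&viewer);
MatView(A,viewer);
PetscViewerDestroy(&viewer);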
The NormType argument to MatNorm() is one of NORM_1, NORM_INFINITY, and NORM_FROBENIUS.

Function Name                                           Operation
MatAXPY(Mat Y,PetscScalar a,Mat X,MatStructure s);      Y = Y + a*X
MatMult(Mat A,Vec x,Vec y);                             y = A*x
MatMultAdd(Mat A,Vec x,Vec y,Vec z);                    z = y + A*x
MatMultTranspose(Mat A,Vec x,Vec y);                    y = A^T*x
MatMultTransposeAdd(Mat A,Vec x,Vec y,Vec z);           z = y + A^T*x
MatNorm(Mat A,NormType type,PetscReal *r);              r = ||A||_type
MatDiagonalScale(Mat A,Vec l,Vec r);                    A = diag(l)*A*diag(r)
MatScale(Mat A,PetscScalar a);                          A = a*A
MatConvert(Mat A,MatType type,Mat *B);                  B = A
MatCopy(Mat A,Mat B,MatStructure s);                    B = A
MatGetDiagonal(Mat A,Vec x);                            x = diag(A)
MatTranspose(Mat A,MatReuse,Mat *B);                    B = A^T
MatZeroEntries(Mat A);                                  A = 0
MatShift(Mat Y,PetscScalar a);                          Y = Y + a*I

Table 2: PETSc Matrix Operations

3.3 Matrix-Free Matrices

Some people like to use matrix-free methods, which do not require explicit storage of the matrix, for
the numerical solution of partial differential equations. To support matrix-free methods in PETSc,
one can use the following command to create a Mat structure without ever actually generating the
matrix:
MatCreateShell(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,void *ctx,Mat *mat);
Here M and N are the global matrix dimensions (rows and columns), m and n are the local matrix
dimensions, and ctx is a pointer to data needed by any user-defined shell matrix operations; the
manual page has additional details about these parameters. Most matrix-free algorithms require
only the application of the linear operator to a vector. To provide this action, the user must write
a routine with the calling sequence
UserMult(Mat mat,Vec x,Vec y);
and then associate it with the matrix, mat, by using the command
MatShellSetOperation(Mat mat,MatOperation MATOP_MULT,(void(*)(void))UserMult);
Here MATOP_MULT is the name of the operation for matrix-vector multiplication. Within each user-defined routine (such as UserMult()), the user should call MatShellGetContext() to obtain the
user-defined context, ctx, that was set by MatCreateShell(). This shell matrix can be used with
the iterative linear equation solvers discussed in the following chapters.
The routine MatShellSetOperation() can be used to set any other matrix operations as well.
The file $PETSC_DIR/include/petscmat.h provides a complete list of matrix operations, which
have the form MATOP_<OPERATION>, where <OPERATION> is the name (in all capital letters) of
the user interface routine (for example, MatMult() → MATOP_MULT). All user-provided functions
have the same calling sequence as the usual matrix interface routines, since the user-defined functions are intended to be accessed through the same interface, e.g., MatMult(Mat,Vec,Vec) →
UserMult(Mat,Vec,Vec). The final argument for MatShellSetOperation() needs to be cast to a
void *, since the final argument could (depending on the MatOperation) be a variety of different
functions.
Note that MatShellSetOperation() can also be used as a “backdoor” means of introducing
user-defined changes in matrix operations for other storage formats (for example, to override the
default LU factorization routine supplied within PETSc for the MATSEQAIJ format). However, we
urge anyone who introduces such changes to use caution, since it would be very easy to accidentally
create a bug in the new routine that could affect other routines as well.
See also Section 5.5 for details on one set of helpful utilities for using the matrix-free approach
for nonlinear solvers.

3.4 Other Matrix Operations

In many iterative calculations (for instance, in a nonlinear equations solver), it is important for
efficiency purposes to reuse the nonzero structure of a matrix, rather than determining it anew
every time the matrix is generated. To retain a given matrix but reinitialize its contents, one can
employ
MatZeroEntries(Mat A);
This routine will zero the matrix entries in the data structure but keep all the data that indicates
where the nonzeros are located. In this way a new matrix assembly will be much less expensive,
since no memory allocations or copies will be needed. Of course, one can also explicitly set selected
matrix elements to zero by calling MatSetValues().
By default, if new entries are made in locations where no nonzeros previously existed, space
will be allocated for the new entries. To prevent the allocation of additional memory and simply
discard those new entries, one can use the option
MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
Once the matrix has been assembled, one can factor it numerically without repeating the ordering
or the symbolic factorization. This option can save some computational time, although it does
require that the factorization is not done in-place.
In the numerical solution of elliptic partial differential equations, it can be cumbersome to
deal with Dirichlet boundary conditions. In particular, one would like to assemble the matrix
without regard to boundary conditions and then at the end apply the Dirichlet boundary conditions.
In numerical analysis classes this process is usually presented as moving the known boundary
conditions to the right-hand side and then solving a smaller linear system for the interior unknowns.
Unfortunately, implementing this requires extracting a large submatrix from the original matrix
and creating its corresponding data structures. This process can be expensive in terms of both
time and memory.
One simple way to deal with this difficulty is to replace those rows in the matrix associated with
known boundary conditions by rows of the identity matrix (or some scaling of it). This action can
be done with the command
MatZeroRows(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec
x,Vec b),
or equivalently,
MatZeroRowsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
For sparse matrices this removes the data structures for certain rows of the matrix. If diag_value
is zero, the diagonal entry is removed as well; otherwise, that value is placed in the diagonal entry
of each eliminated row.
One nice feature of this approach is that when solving a nonlinear problem such that at each
iteration the Dirichlet boundary conditions are in the same positions and the matrix retains the
same nonzero structure, the user can call MatZeroRows() in the first iteration. Then, before
generating the matrix in the second iteration the user should call
MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
From that point, no new values will be inserted into those (boundary) rows of the matrix.
The functions MatZeroRowsLocal() and MatZeroRowsLocalIS() can also be used if for each
process one provides the Dirichlet locations in the local numbering of the matrix. A drawback of
MatZeroRows() is that it destroys the symmetry of a matrix. Thus one can use
MatZeroRowsColumns(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar
diag_value,Vec x,Vec b),
or equivalently,
MatZeroRowsColumnsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
Note that, for a given assembled matrix, each of these routines can be called only once to update
the x and b vectors. They cannot be used if one wishes to solve multiple right-hand-side problems for
the same matrix, since the matrix entries needed for updating the b vector are removed in the first use.
After the rows are zeroed, the matrix may contain many rows with only a diagonal entry, which
affects the parallel load balance. The PCREDISTRIBUTE preconditioner removes all such rows (and
the associated columns, and adjusts the right-hand side based on the removed columns) and then
rebalances the rows of the resulting smaller matrix across the processes. Thus one can use
MatZeroRows() to set the Dirichlet points and then solve with the preconditioner PCREDISTRIBUTE.
Note that if the original matrix was symmetric, the smaller solved matrix will also be symmetric.
Another matrix routine of interest is
MatConvert(Mat mat,MatType newtype,Mat *M)
which converts the matrix mat to a new matrix, M, that has either the same or a different format.
Set newtype to MATSAME to copy the matrix, keeping the same matrix format. See
$PETSC_DIR/include/petscmat.h for other available matrix types; standard ones are MATSEQDENSE, MATSEQAIJ,
MATMPIAIJ, MATSEQBAIJ and MATMPIBAIJ.
In certain applications it may be necessary for application codes to directly access elements of
a matrix. This may be done by using the command (for local rows only)
MatGetRow(Mat A,PetscInt row, PetscInt *ncols,const PetscInt (*cols)[],const
PetscScalar (*vals)[]);
The argument ncols returns the number of nonzeros in that row, while cols and vals returns the
column indices (with indices starting at zero) and values in the row. If only the column indices
are needed (and not the corresponding matrix elements), one can use NULL for the vals argument.
Similarly, one can use NULL for the cols argument. The user can only examine the values extracted
with MatGetRow(); the values cannot be altered. To change the matrix entries, one must use
MatSetValues().
Once the user has finished using a row, he or she must call
MatRestoreRow(Mat A,PetscInt row,PetscInt *ncols,PetscInt **cols,PetscScalar
**vals);
to free any space that was allocated during the call to MatGetRow().
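A typical sketch that examines every locally owned row of an assembled matrix A is:

PetscInt          row, rstart, rend, ncols;
const PetscInt    *cols;
const PetscScalar *vals;

MatGetOwnershipRange(A,&rstart,&rend);
for (row=rstart; row<rend; row++) {
  MatGetRow(A,row,&ncols,&cols,&vals);
  /* ... examine cols[0..ncols-1] and vals[0..ncols-1] (read only) ... */
  MatRestoreRow(A,row,&ncols,&cols,&vals);
}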
3.5 Partitioning

For almost all unstructured grid computation, the distribution of portions of the grid across the
processes’ workloads and memory can have a very large impact on performance. In most PDE
calculations the grid partitioning and distribution across the processes can (and should) be done in
a “pre-processing” step before the numerical computations. However, this does not mean it need be
done in a separate, sequential program; rather, it should be done before one sets up the parallel grid
data structures in the actual program. PETSc provides an interface to ParMETIS (developed
by George Karypis; see $PETSC_DIR/docs/installation.html for directions on installing PETSc
to use ParMETIS) to allow the partitioning to be done in parallel. PETSc does not currently
provide direct support for dynamic repartitioning, load balancing by migrating matrix entries
between processes, etc. For problems that require mesh refinement, PETSc uses the “rebuild the
data structure” approach, as opposed to the “maintain dynamic data structures that support the
insertion/deletion of additional vector and matrix rows and columns entries” approach.
Partitioning in PETSc is organized around the MatPartitioning object. One first creates
a parallel matrix that contains the connectivity information about the grid (or other graph-type
object) that is to be partitioned. This is done with the command
MatCreateMPIAdj(MPI_Comm comm,int mlocal,PetscInt n,const PetscInt ia[],const PetscInt ja[],
                PetscInt *weights,Mat *Adj);
The argument mlocal indicates the number of rows of the graph being provided by the given
process, and n is the total number of columns, equal to the sum of all the mlocal values. The arguments ia
and ja are the row pointers and column pointers for the given rows; these are the usual format for
parallel compressed sparse row storage, using indices starting at 0, not 1.

[Figure 11 shows a small triangular grid with four elements, numbered 0-3, and six vertices,
numbered 0-5.]
Figure 11: Numbering on Simple Unstructured Grid
This, of course, assumes that one has already distributed the grid (graph) information among
the processes. The details of this initial distribution is not important; it could be simply determined
by assigning to the first process the first n0 nodes from a file, the second process the next n1 nodes,
etc.
For example, we demonstrate the form of the ia and ja for a triangular grid where we
(1) partition by element (triangle)
• Process 0: mlocal = 2, n = 4, ja = {2,3, 3}, ia = {0,2,3}
• Process 1: mlocal = 2, n = 4, ja = {0, 0,1}, ia = {0,1,3}
Note that elements are not connected to themselves and we only indicate edge connections (in some
contexts single vertex connections between elements may also be included). We use a space above
to denote the transition between rows in the matrix.
and (2) partition by vertex.
• Process 0: mlocal = 3, n = 6, ja = {3,4, 4,5, 3,4,5}, ia = {0, 2, 4, 7}
• Process 1: mlocal = 3, n = 6, ja = {0,2, 4, 0,1,2,3,5, 1,2,4}, ia = {0, 3, 8, 11}.
Once the connectivity matrix has been created the following code will generate the renumbering
required for the new partition
MatPartitioningCreate(MPI_Comm comm,MatPartitioning *part);
MatPartitioningSetAdjacency(MatPartitioning part,Mat Adj);
MatPartitioningSetFromOptions(MatPartitioning part);
MatPartitioningApply(MatPartitioning part,IS *is);
MatPartitioningDestroy(MatPartitioning *part);
MatDestroy(Mat *Adj);
ISPartitioningToNumbering(IS is,IS *isg);
The resulting isg contains for each local node the new global number of that node. The resulting
is contains the new process number that each local node has been assigned to.
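As a concrete sketch for the element-wise partition above, process 0 could build its portion of the
adjacency matrix as follows (process 1 supplies its own ia and ja analogously, since the call is
collective):

PetscInt *ia, *ja;
Mat      Adj;

/* rows for the two elements owned by process 0 (see the lists above) */
PetscMalloc1(3,&ia);
ia[0] = 0; ia[1] = 2; ia[2] = 3;
PetscMalloc1(3,&ja);
ja[0] = 2; ja[1] = 3; ja[2] = 3;
MatCreateMPIAdj(PETSC_COMM_WORLD,2,4,ia,ja,NULL,&Adj);
/* ia and ja are not copied; see the MatCreateMPIAdj man page for their ownership.
   The MatPartitioning calls listed above are then applied to Adj. */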
Now that a new numbering of the nodes has been determined, one must renumber all the nodes
and migrate the grid information to the correct process. The command
AOCreateBasicIS(isg,NULL,&ao);
generates an AO object (see Section 2.3.1) that can be used in conjunction with the is and isg
to move the relevant grid information to the correct process and renumber the nodes, etc. In this
context, the new ordering is the “application” ordering so AOPetscToApplication() converts old
global indices to new global indices and AOApplicationToPetsc() converts new global indices back
to old global indices.
PETSc does not currently provide tools that completely manage the migration and node renumbering, since it will be dependent on the particular data structure you use to store the grid information and the type of grid information that you need for your application. We do plan to include
more support for this in the future, but designing the appropriate general user interface and providing a scalable implementation that can be used for a wide variety of different grids requires a
great deal of time.


Chapter 4

KSP: Linear System Solvers
The object KSP is the heart of PETSc, because it provides uniform and efficient access to all of
the package’s linear system solvers, including parallel and sequential, direct and iterative. KSP is
intended for solving nonsingular systems of the form
    Ax = b,                                                                   (4.1)

where A denotes the matrix representation of a linear operator, b is the right-hand-side vector, and
x is the solution vector. KSP uses the same calling sequence for both direct and iterative solution
of a linear system. In addition, particular solution techniques and their associated options can be
selected at runtime.
The combination of a Krylov subspace method and a preconditioner is at the center of most
modern numerical codes for the iterative solution of linear systems. See, for example, [13] for an
overview of the theory of such methods. KSP creates a simplified interface to the lower-level KSP
and PC modules within the PETSc package. The KSP package, discussed in Section 4.3, provides
many popular Krylov subspace iterative methods; the PC module, described in Section 4.4, includes
a variety of preconditioners. Although both KSP and PC can be used directly, users should employ
the interface of KSP.

4.1 Using KSP

To solve a linear system with KSP, one must first create a solver context with the command
KSPCreate(MPI_Comm comm,KSP *ksp);
Here comm is the MPI communicator, and ksp is the newly formed solver context. Before actually
solving a linear system with KSP, the user must call the following routine to set the matrices
associated with the linear system:
KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat);
The argument Amat, representing the matrix that defines the linear system, is a symbolic place
holder for any kind of matrix. In particular, KSP does support matrix-free methods. The routine MatCreateShell() in Section 3.3 provides further information regarding matrix-free methods.
Typically, the matrix from which the preconditioner is to be constructed, Pmat, is the same as
the matrix that defines the linear system, Amat; however, occasionally these matrices differ (for instance, when a preconditioning matrix is obtained from a lower order method than that employed
to form the linear system matrix).
Much of the power of KSP can be accessed through the single routine
KSPSetFromOptions(KSP ksp);
This routine accepts the options -h and -help as well as any of the KSP and PC options discussed
below. To solve a linear system, one sets the right-hand-side and solution vectors and then executes
the command
KSPSolve(KSP ksp,Vec b,Vec x);
where b and x respectively denote the right-hand-side and solution vectors. On return, the iteration
number at which the iterative process stopped can be obtained using
KSPGetIterationNumber(KSP ksp, PetscInt *its);
Note that this does not state that the method converged at this iteration: it can also have reached
the maximum number of iterations, or have diverged.
Section 4.3.2 gives more details regarding convergence testing. Note that multiple linear solves
can be performed by the same KSP context. Once the KSP context is no longer needed, it should be
destroyed with the command
KSPDestroy(KSP *ksp);
The above procedure is sufficient for general use of the KSP package. One additional step is
required for users who wish to customize certain preconditioners (e.g., see Section 4.4.4) or to log
certain performance data using the PETSc profiling facilities (as discussed in Chapter 13). In this
case, the user can optionally explicitly call
KSPSetUp(KSP ksp)
before calling KSPSolve() to perform any setup required for the linear solvers. The explicit call
of this routine enables the separate monitoring of any computations performed during the set up
phase, such as incomplete factorization for the ILU preconditioner.
The default solver within KSP is restarted GMRES, preconditioned for the uniprocess case with
ILU(0), and for the multiprocess case with the block Jacobi method (with one block per process,
each of which is solved with ILU(0)). A variety of other solvers and options are also available. To
allow application programmers to set any of the preconditioner or Krylov subspace options directly
within the code, we provide routines that extract the PC and KSP contexts,
KSPGetPC(KSP ksp,PC *pc);
The application programmer can then directly call any of the PC or KSP routines to modify the
corresponding default options.
To solve a linear system with a direct solver (currently supported by PETSc for sequential
matrices, and by several external solvers through PETSc interfaces (see Section 4.7)) one may use
the options -ksp_type preonly -pc_type lu (see below).
By default, if a direct solver is used, the factorization is not done in-place. This approach
prevents the user from the unexpected surprise of having a corrupted matrix after a linear solve.
The routine PCFactorSetUseInPlace(), discussed below, causes factorization to be done in-place.
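Putting these calls together, a minimal sketch (assuming the matrix A and the vectors b and x have
already been created and assembled) is:

KSP      ksp;
PetscInt its;

KSPCreate(PETSC_COMM_WORLD,&ksp);
KSPSetOperators(ksp,A,A);            /* use A both as the operator and to build the preconditioner */
KSPSetFromOptions(ksp);              /* honor runtime options such as -ksp_type and -pc_type */
KSPSolve(ksp,b,x);
KSPGetIterationNumber(ksp,&its);
KSPDestroy(&ksp);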

4.2 Solving Successive Linear Systems

When solving multiple linear systems of the same size with the same method, several options are
available. To solve successive linear systems having the same preconditioner matrix (i.e., the same
data structure with exactly the same matrix elements) but different right-hand-side vectors, the
user should simply call KSPSolve() multiple times. The preconditioner setup operations (e.g.,
factorization for ILU) will be done during the first call to KSPSolve() only; such operations will
not be repeated for successive solves.
To solve successive linear systems that have different preconditioner matrices (i.e., the matrix
elements and/or the matrix data structure change), the user must call KSPSetOperators() and
KSPSolve() for each solve. See Section 4.1 for a description of various flags for KSPSetOperators()
that can save work for such cases.

4.3 Krylov Methods

The Krylov subspace methods accept a number of options, many of which are discussed below.
First, to set the Krylov subspace method that is to be used, one calls the command
KSPSetType(KSP ksp,KSPType method);
The type can be one of KSPRICHARDSON, KSPCHEBYSHEV, KSPCG, KSPGMRES, KSPTCQMR, KSPBCGS,
KSPCGS, KSPTFQMR, KSPCR, KSPLSQR, KSPBICG, KSPPREONLY, or others; see Table 3 or the KSPType
man page for more. The KSP method can also be set with the options database command -ksp_type,
followed by one of the options richardson, chebyshev, cg, gmres, tcqmr, bcgs, cgs,
tfqmr, cr, lsqr, bicg, preonly, or others (see Table 3 or the KSPType man page). There are
method-specific options; for instance, for the Richardson, Chebyshev, and GMRES methods:
KSPRichardsonSetScale(KSP ksp,PetscReal scale);
KSPChebyshevSetEigenvalues(KSP ksp,PetscReal emax,PetscReal emin);
KSPGMRESSetRestart(KSP ksp,PetscInt max_steps);
The default parameter values are damping_factor=1.0, emax=0.01, emin=100.0, and max_steps=30.
The GMRES restart and Richardson damping factor can also be set with the options
-ksp_gmres_restart <restart> and -ksp_richardson_scale <factor>.
The default technique for orthogonalization of the Hessenberg matrix in GMRES is the unmodified (classical) Gram-Schmidt method, which can be set with
KSPGMRESSetOrthogonalization(KSP ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);
or the options database command -ksp_gmres_classicalgramschmidt. By default this will not
use iterative refinement to improve the stability of the orthogonalization. This can be changed with
the option
KSPGMRESSetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType type)
or via the options database with
-ksp_gmres_cgs_refinement_type none,ifneeded,always
The values for KSPGMRESCGSRefinementType are KSP_GMRES_CGS_REFINEMENT_NONE,
KSP_GMRES_CGS_REFINEMENT_IFNEEDED and KSP_GMRES_CGS_REFINEMENT_ALWAYS.
One can also use modified Gram-Schmidt by using the orthogonalization routine
KSPGMRESModifiedGramSchmidtOrthogonalization() or the command line option -ksp_gmres_modifiedgramschmidt.
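For example, the GMRES-specific calls above might be combined as in the following sketch; the function name ConfigureGMRES and the restart value of 50 are illustrative choices, not defaults.
#include <petscksp.h>

/* Sketch: configure restarted GMRES on an existing KSP (error checking abbreviated). */
PetscErrorCode ConfigureGMRES(KSP ksp)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
  ierr = KSPGMRESSetRestart(ksp,50);CHKERRQ(ierr);           /* restart length; the default is 30 */
  ierr = KSPGMRESSetOrthogonalization(ksp,
           KSPGMRESClassicalGramSchmidtOrthogonalization);CHKERRQ(ierr);
  ierr = KSPGMRESSetCGSRefinementType(ksp,
           KSP_GMRES_CGS_REFINEMENT_IFNEEDED);CHKERRQ(ierr); /* iterative refinement only when needed */
  PetscFunctionReturn(0);
}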
For the conjugate gradient method with complex numbers, there are two slightly different
algorithms depending on whether the matrix is Hermitian symmetric or truly symmetric (the
default is to assume that it is Hermitian symmetric). To indicate that it is symmetric, one uses the
command
KSPCGSetType(KSP ksp,KSPCGType KSP_CG_SYMMETRIC);
Note that this option is not valid for all matrices.
The LSQR algorithm does not involve a preconditioner; any preconditioner set to work with
the KSP object is ignored if KSPLSQR was selected.
By default, KSP assumes an initial guess of zero by zeroing the initial value for the solution
vector that is given; this zeroing is done at the call to KSPSolve(). To use a
nonzero initial guess, the user must call
KSPSetInitialGuessNonzero(KSP ksp,PetscBool flg);
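A minimal sketch of this usage follows; it assumes x already holds an approximate solution, and the function name MySolveWithGuess is illustrative only.
#include <petscksp.h>

/* Sketch: use the current contents of x as the initial guess instead of zero
   (error checking abbreviated). */
PetscErrorCode MySolveWithGuess(KSP ksp,Vec b,Vec x)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);  /* x is no longer zeroed on entry */
  PetscFunctionReturn(0);
}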

4.3.1 Preconditioning within KSP

Since the rate of convergence of Krylov projection methods for a particular linear system is strongly
dependent on its spectrum, preconditioning is typically used to alter the spectrum and hence
accelerate the convergence rate of iterative techniques. Preconditioning can be applied to the
system (4.1) by
\[
(M_L^{-1} A M_R^{-1}) (M_R x) = M_L^{-1} b, \tag{4.2}
\]
where $M_L$ and $M_R$ indicate preconditioning matrices (or, matrices from which the preconditioner
is to be constructed). If $M_L = I$ in (4.2), right preconditioning results, and the residual of (4.1),
\[
r \equiv b - A x = b - A M_R^{-1} M_R x,
\]
is preserved. In contrast, the residual is altered for left ($M_R = I$) and symmetric preconditioning,
as given by
\[
r_L \equiv M_L^{-1} b - M_L^{-1} A x = M_L^{-1} r.
\]
By default, most KSP implementations use left preconditioning. Some more naturally use other
options, though. For instance, KSPQCG defaults to use symmetric preconditioning and KSPFGMRES
uses right preconditioning by default. Right preconditioning can be activated for some methods by
using the options database command -ksp_pc_side right or calling the routine
KSPSetPCSide(KSP ksp,PCSide PC_RIGHT);
Attempting to use right preconditioning for a method that does not currently support it results in
an error message of the form
KSPSetUp_Richardson:No right preconditioning for KSPRICHARDSON

Table 3 summarizes the KSP methods, their associated KSPType names, and their options database
names. Details regarding specific convergence tests and monitoring routines are presented in the
following sections. The preconditioned residual is used by default for convergence testing of all
left-preconditioned KSP methods. For the conjugate gradient, Richardson, and Chebyshev methods the
true residual can be used by the options database command -ksp_norm_type unpreconditioned
or by calling the routine
KSPSetNormType(KSP ksp,KSP_NORM_UNPRECONDITIONED);
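For example, a sketch that applies this to the conjugate gradient method (the function name UseTrueResidualNorm is illustrative):
#include <petscksp.h>

/* Sketch: conjugate gradient with convergence tested on the true
   (unpreconditioned) residual norm (error checking abbreviated). */
PetscErrorCode UseTrueResidualNorm(KSP ksp)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);
  ierr = KSPSetNormType(ksp,KSP_NORM_UNPRECONDITIONED);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}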
Method                                                     KSPType         Options Database Name
Richardson                                                 KSPRICHARDSON   richardson
Chebyshev                                                  KSPCHEBYSHEV    chebyshev
Conjugate Gradient [19]                                    KSPCG           cg
Pipelined Conjugate Gradients                              KSPPIPECG       pipecg
Pipelined Conjugate Gradients (Gropp)                      KSPGROPPCG      groppcg
Pipelined Conjugate Gradients with Residual Replacement    KSPPIPECGRR     pipecgrr
Conjugate Gradients for the Normal Equations               KSPCGNE         cgne
Flexible Conjugate Gradients [26]                          KSPFCG          fcg
Pipelined, Flexible Conjugate Gradient                     KSPPIPEFCG      pipefcg
Conjugate Gradients for Least Squares                      KSPCGLS         cgls
Conjugate Gradients with Constraint (1)                    KSPNASH         nash
Conjugate Gradients with Constraint (2)                    KSPSTCG         stcg
Conjugate Gradients with Constraint (3)                    KSPGLTR         gltr
Conjugate Gradients with Constraint (4)                    KSPQCG          qcg
BiConjugate Gradient                                       KSPBICG         bicg
BiCGSTAB [36]                                              KSPBCGS         bcgs
Improved BiCGSTAB                                          KSPIBCGS        ibcgs
Flexible BiCGSTAB                                          KSPFBCGS        fbcgs
Flexible BiCGSTAB (variant)                                KSPFBCGSR       fbcgsr
Enhanced BiCGSTAB(L)                                       KSPBCGSL        bcgsl
Minimal Residual Method [27]                               KSPMINRES       minres
Generalized Minimal Residual [32]                          KSPGMRES        gmres
Flexible Generalized Minimal Residual [31]                 KSPFGMRES       fgmres
Deflated Generalized Minimal Residual                      KSPDGMRES       dgmres
Pipelined Generalized Minimal Residual                     KSPPGMRES       pgmres
Pipelined, Flexible Generalized Minimal Residual           KSPPIPEFGMRES   pipefgmres
Generalized Minimal Residual with Accelerated Restart      KSPLGMRES       lgmres
Conjugate Residual [12]                                    KSPCR           cr
Generalized Conjugate Residual                             KSPGCR          gcr
Pipelined Conjugate Residual                               KSPPIPECR       pipecr
Pipelined, Flexible Conjugate Residual                     KSPPIPEGCR      pipegcr
FETI-DP                                                    KSPFETIDP       fetidp
Conjugate Gradient Squared [35]                            KSPCGS          cgs
Transpose-Free Quasi-Minimal Residual (1) [14]             KSPTFQMR        tfqmr
Transpose-Free Quasi-Minimal Residual (2)                  KSPTCQMR        tcqmr
Least Squares Method                                       KSPLSQR         lsqr
Symmetric LQ Method [27]                                   KSPSYMMLQ       symmlq
TSIRM                                                      KSPTSIRM        tsirm
Python Shell                                               KSPPYTHON       python
Shell for no KSP method                                    KSPPREONLY      preonly

Table 3: KSP Objects.

Note: the bi-conjugate gradient method requires application of both the matrix and its transpose
plus the preconditioner and its transpose. Currently not all matrices and preconditioners provide
this support, and thus KSPBICG cannot always be used.
Note: PETSc implements the FETI-DP (Finite Element Tearing and Interconnecting Dual-Primal)
method as a KSP since it recasts the original problem into a constrained minimization
one with Lagrange multipliers. The only matrix type supported is MATIS. Support for saddle point
problems is provided. Consult the online documentation for further details.

4.3.2 Convergence Tests

The default convergence test, KSPConvergedDefault(), is based on the $\ell_2$-norm of the residual.
Convergence (or divergence) is decided by three quantities: the decrease of the residual norm
relative to the norm of the right-hand side, rtol, the absolute size of the residual norm, atol, and
the relative increase in the residual, dtol. Convergence is detected at iteration $k$ if
\[
\|r_k\|_2 < \max(\mathrm{rtol} \cdot \|b\|_2, \mathrm{atol}),
\]
where $r_k = b - A x_k$. Divergence is detected if
\[
\|r_k\|_2 > \mathrm{dtol} \cdot \|b\|_2.
\]
These parameters, as well as the maximum number of allowable iterations, can be set with the
routine
KSPSetTolerances(KSP ksp,PetscReal rtol,PetscReal atol,PetscReal dtol,PetscInt maxits);
The user can retain the default value of any of these parameters by specifying PETSC_DEFAULT as
the corresponding tolerance; the defaults are rtol=1e-5, atol=1e-50, dtol=1e5, and maxits=1e4.
These parameters can also be set from the options database with the commands -ksp_rtol <rtol>,
-ksp_atol <atol>, -ksp_divtol <dtol>, and -ksp_max_it <its>.
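For example, the following sketch tightens the relative tolerance and caps the iteration count while keeping the remaining defaults; the function name and the values chosen are illustrative.
#include <petscksp.h>

/* Sketch: set rtol and the maximum iteration count, keep the other defaults
   (error checking abbreviated). */
PetscErrorCode SetSolverTolerances(KSP ksp)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetTolerances(ksp,1.e-8,PETSC_DEFAULT,PETSC_DEFAULT,200);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}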
In addition to providing an interface to a simple convergence test, KSP allows the application
programmer the flexibility to provide customized convergence-testing routines. The user can specify
a customized routine with the command
KSPSetConvergenceTest(KSP ksp,PetscErrorCode (*test)(KSP ksp,PetscInt it,PetscReal rnorm,
KSPConvergedReason *reason,void *ctx),void *ctx,PetscErrorCode (*destroy)(void *ctx));
The final routine argument, ctx, is an optional context for private data for the user-defined
convergence routine, test. Other test routine arguments are the iteration number, it, and the
residual's $\ell_2$ norm, rnorm. The routine for detecting convergence, test, should set reason to
positive for convergence, 0 for no convergence, and negative for failure to converge. A full list of
possible values for KSPConvergedReason is given in include/petscksp.h. You can use KSPGetConvergedReason()
after KSPSolve() to see why convergence/divergence was detected.
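A sketch of such a user-defined test is given below; the context structure MyTestCtx, its tolerance field, the iteration cap of 500, and the routine names are all hypothetical.
#include <petscksp.h>

/* Hypothetical user context for the convergence test. */
typedef struct {
  PetscReal my_tol;
} MyTestCtx;

/* Sketch of a user-defined convergence test: stop as soon as the residual norm
   drops below the tolerance stored in the context. */
static PetscErrorCode MyConvergenceTest(KSP ksp,PetscInt it,PetscReal rnorm,
                                        KSPConvergedReason *reason,void *ctx)
{
  MyTestCtx *user = (MyTestCtx*)ctx;

  PetscFunctionBegin;
  if (rnorm < user->my_tol) *reason = KSP_CONVERGED_ATOL;        /* converged */
  else if (it > 500)        *reason = KSP_DIVERGED_ITS;          /* give up */
  else                      *reason = KSP_CONVERGED_ITERATING;   /* keep iterating */
  PetscFunctionReturn(0);
}

/* Registration; the context must remain valid for the life of the KSP unless a
   destroy routine is supplied as the final argument (NULL here). */
static PetscErrorCode UseMyConvergenceTest(KSP ksp,MyTestCtx *ctx)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetConvergenceTest(ksp,MyConvergenceTest,ctx,NULL);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}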

4.3.3 Convergence Monitoring

By default, the Krylov solvers run silently without displaying information about the iterations.
The user can indicate that the norms of the residuals should be displayed by using -ksp_monitor
within the options database. To display the residual norms in a graphical window (running under X
Windows), one should use -ksp_monitor_lg_residualnorm [x,y,w,h], where either all or none
of the options must be specified. Application programmers can also provide their own routines to
perform the monitoring by using the command
KSPMonitorSet(KSP ksp,PetscErrorCode (*mon)(KSP ksp,PetscInt it,PetscReal rnorm,void *ctx),
void *ctx,PetscErrorCode (*mondestroy)(void**));
The final routine argument, ctx, is an optional context for private data for the user-defined
monitoring routine, mon. Other mon routine arguments are the iteration number (it) and the
residual's $\ell_2$ norm (rnorm). A helpful routine within user-defined monitors is
PetscObjectGetComm((PetscObject)ksp,MPI_Comm *comm), which returns in comm the MPI communicator
for the KSP context. See Section 1.4 for more discussion of the use of MPI communicators
within PETSc.
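A sketch of a user-defined monitor is given below; the routine name MyKSPMonitor is hypothetical, and this monitor uses no private context.
#include <petscksp.h>

/* Sketch of a user-defined monitor that prints the iteration number and residual
   norm on the communicator of the KSP (error checking abbreviated). */
static PetscErrorCode MyKSPMonitor(KSP ksp,PetscInt it,PetscReal rnorm,void *ctx)
{
  MPI_Comm       comm;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscObjectGetComm((PetscObject)ksp,&comm);CHKERRQ(ierr);
  ierr = PetscPrintf(comm,"iteration %D residual norm %g\n",it,(double)rnorm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Registered with, for example:
     KSPMonitorSet(ksp,MyKSPMonitor,NULL,NULL);   (no context, no destroy routine) */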
Several monitoring routines are supplied with PETSc, including
KSPMonitorDefault(KSP,PetscInt,PetscReal, void *);
KSPMonitorSingularValue(KSP,PetscInt,PetscReal,void *);
KSPMonitorTrueResidualNorm(KSP,PetscInt,PetscReal, void *);
The default monitor simply prints an estimate of the $\ell_2$-norm of the residual at each iteration.
The routine KSPMonitorSingularValue() is appropriate only for use with the conjugate gradient
method or GMRES, since it prints estimates of the extreme singular values of the preconditioned
operator at each iteration. Since KSPMonitorTrueResidualNorm() prints the true residual at each
iteration by actually computing the residual using the formula r = b − Ax, the routine is slow
and should be used only for testing or convergence studies, not for timing. These monitors may
be accessed with the command line options -ksp_monitor, -ksp_monitor_singular_value, and
-ksp_monitor_true_residual.
To employ the default graphical monitor, one should use the commands
PetscDrawLG lg;
KSPMonitorLGResidualNormCreate(MPI_Comm comm,char *display,char *title,PetscInt x,
PetscInt y,PetscInt w,PetscInt h,PetscDrawLG *lg);
KSPMonitorSet(KSP ksp,KSPMonitorLGResidualNorm,lg,0);
When no longer needed, the line graph should be destroyed with the command
PetscDrawLGDestroy(PetscDrawLG *lg);
The user can change aspects of the graphs with the PetscDrawLG*() and PetscDrawAxis*()
routines. One can also access this functionality from the options database with the command
-ksp_monitor_lg_residualnorm [x,y,w,h], where x, y, w, h are the optional location and
size of the window.
One can cancel hardwired monitoring routines for KSP at runtime with -ksp_monitor_cancel.
Unless the Krylov method converges so that the residual norm is small, say $10^{-10}$, many of
the final digits printed with the -ksp_monitor option are meaningless. Worse, they differ
on different machines because of different round-off rules used by, say, the IBM RS6000 and the Sun
SPARC. This makes testing between different machines difficult. The option -ksp_monitor_short
causes PETSc to print fewer digits of the residual norm as it gets smaller; thus on most
machines it will always print the same numbers, making cross-system testing easier.

4.3.4 Understanding the Operator's Spectrum

Since the convergence of Krylov subspace methods depends strongly on the spectrum (eigenvalues)
of the preconditioned operator, PETSc has specific routines for eigenvalue approximation via the
Arnoldi or Lanczos iteration. First, before the linear solve one must call
KSPSetComputeEigenvalues(KSP ksp,PETSC_TRUE);
Then after the KSP solve one calls
KSPComputeEigenvalues(KSP ksp,PetscInt n,PetscReal *realpart,PetscReal
*complexpart,PetscInt *neig);
Here, n is the size of the two arrays and the eigenvalues are inserted into those two arrays. neig is
the number of eigenvalues computed; this number depends on the size of the Krylov space generated
during the linear system solution; for GMRES it is never larger than the restart parameter. There
is an additional routine
KSPComputeEigenvaluesExplicitly(KSP ksp, PetscInt n,PetscReal
*realpart,PetscReal *complexpart);
that is useful only for very small problems. It explicitly computes the full representation of the
preconditioned operator and calls LAPACK to compute its eigenvalues. It should only be used for
matrices of size up to a couple hundred. The PetscDrawSP*() routines are very useful for drawing
scatter plots of the eigenvalues.
The eigenvalues may also be computed and displayed graphically with the options database
commands -ksp_plot_eigenvalues and -ksp_plot_eigenvalues_explicitly, or they can
be dumped to the screen in ASCII text via -ksp_compute_eigenvalues and
-ksp_compute_eigenvalues_explicitly.
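The following sketch combines these calls; the array length of 30 (matching the default GMRES restart) and the function name are illustrative assumptions.
#include <petscksp.h>

/* Sketch: request eigenvalue estimates before the solve and retrieve them afterwards
   (error checking abbreviated). */
PetscErrorCode EstimateSpectrum(KSP ksp,Vec b,Vec x)
{
  PetscErrorCode ierr;
  PetscReal      re[30],im[30];   /* room for up to 30 estimates */
  PetscInt       neig,i;

  PetscFunctionBegin;
  ierr = KSPSetComputeEigenvalues(ksp,PETSC_TRUE);CHKERRQ(ierr);  /* must precede KSPSolve() */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPComputeEigenvalues(ksp,30,re,im,&neig);CHKERRQ(ierr);
  for (i=0; i<neig; i++) {
    ierr = PetscPrintf(PETSC_COMM_WORLD,"eigenvalue %D: %g + %g i\n",i,(double)re[i],(double)im[i]);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}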

4.3.5 Other KSP Options

To obtain the solution vector and right hand side from a KSP context, one uses
KSPGetSolution(KSP ksp,Vec *x);
KSPGetRhs(KSP ksp,Vec *rhs);
During the iterative process the solution may not yet have been calculated or it may be stored in
a different location. To access the approximate solution during the iterative process, one uses the
command
KSPBuildSolution(KSP ksp,Vec w,Vec *v);
where the solution is returned in v. The user can optionally provide a vector in w as the location to
store the vector; however, if w is NULL, space allocated by PETSc in the KSP context is used. One
should not destroy this vector. For certain KSP methods (e.g., GMRES), the construction of the
solution is expensive, while for many others it does not even require a vector copy.
Access to the residual is done in a similar way with the command
KSPBuildResidual(KSP ksp,Vec t,Vec w,Vec *v);
Again, for GMRES and certain other methods this is an expensive operation.
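A sketch of accessing the current iterate and its residual norm, for example from within a user-defined monitor, is shown below; the function name InspectIterate is illustrative, and the residual is built into a user-provided work vector.
#include <petscksp.h>

/* Sketch: inspect the current approximate solution and residual during the iteration
   (error checking abbreviated). */
PetscErrorCode InspectIterate(KSP ksp)
{
  PetscErrorCode ierr;
  Vec            x,b,w,r;
  PetscReal      norm;

  PetscFunctionBegin;
  ierr = KSPBuildSolution(ksp,NULL,&x);CHKERRQ(ierr);  /* PETSc-owned storage; do not destroy x */
  ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr);
  ierr = VecDuplicate(b,&w);CHKERRQ(ierr);             /* user-provided location for the residual */
  ierr = KSPBuildResidual(ksp,NULL,w,&r);CHKERRQ(ierr);
  ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"current residual norm %g\n",(double)norm);CHKERRQ(ierr);
  ierr = VecDestroy(&w);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}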

4.4 Preconditioners

As discussed in Section 4.3.1, Krylov subspace methods are typically used in conjunction with a
preconditioner. To employ a particular preconditioning method, the user can either select it from
the options database using input of the form -pc_type <methodname> or set the method with the
command
PCSetType(PC pc,PCType method);

Method                                           PCType         Options Database Name
Jacobi                                           PCJACOBI       jacobi
Block Jacobi                                     PCBJACOBI      bjacobi
SOR (and SSOR)                                   PCSOR          sor
SOR with Eisenstat trick                         PCEISENSTAT    eisenstat
Incomplete Cholesky                              PCICC          icc
Incomplete LU                                    PCILU          ilu
Additive Schwarz                                 PCASM          asm
Generalized Additive Schwarz                     PCGASM         gasm
Algebraic Multigrid                              PCGAMG         gamg
Balancing Domain Decomposition by Constraints    PCBDDC         bddc
Linear solver                                    PCKSP          ksp
Combination of preconditioners                   PCCOMPOSITE    composite
LU                                               PCLU           lu
Cholesky                                         PCCHOLESKY     cholesky
No preconditioning                               PCNONE         none
Shell for user-defined PC                        PCSHELL        shell

Table 4: PETSc Preconditioners (partial list)
In Table 4 we summarize the basic preconditioning methods supported in PETSc. See the PCType
manual page for a complete list. The PCSHELL preconditioner uses a specific, application-provided
preconditioner. The direct preconditioner, PCLU, is, in fact, a direct solver for the linear system
that uses LU factorization. PCLU is included as a preconditioner so that PETSc has a consistent
interface among direct and iterative linear solvers.
Each preconditioner may have associated with it a set of options, which can be set with routines
and options database commands provided for this purpose. Such routine names and commands are
all of the form PC<TYPE><Option> and -pc_<type>_<option> [value], respectively.
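For example, a sketch that selects incomplete LU and sets one of its preconditioner-specific options (the level of fill) directly in code; the function name is illustrative, and the equivalent options database input would be -pc_type ilu -pc_factor_levels 1.
#include <petscksp.h>

/* Sketch: select ILU preconditioning and set its level of fill
   (error checking abbreviated). */
PetscErrorCode ConfigureILU(KSP ksp)
{
  PetscErrorCode ierr;
  PC             pc;

  PetscFunctionBegin;
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCILU);CHKERRQ(ierr);
  ierr = PCFactorSetLevels(pc,1);CHKERRQ(ierr);   /* ILU(1): one level of fill */
  PetscFunctionReturn(0);
}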
