ark_guide
Page Count: 404
User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2)

Daniel R. Reynolds¹, David J. Gardner², Alan C. Hindmarsh², Carol S. Woodward² and Jean M. Sexton¹

¹ Department of Mathematics, Southern Methodist University
² Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

January 22, 2019

LLNL-SM-668082

DISCLAIMER

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor Southern Methodist University, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government, Lawrence Livermore National Security, LLC, or Southern Methodist University. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government, Lawrence Livermore National Security, LLC, or Southern Methodist University, and shall not be used for advertising or product endorsement purposes.

Approved for public release; further dissemination unlimited

CONTENTS

1 Introduction
  1.1 Changes from previous versions
  1.2 Reading this User Guide
  1.3 SUNDIALS Release License
2 Mathematical Considerations
  2.1 Adaptive single-step methods
  2.2 Interpolation
  2.3 ARKStep – Additive Runge-Kutta methods
  2.4 ERKStep – Explicit Runge-Kutta methods
  2.5 MRIStep – Multirate infinitesimal step methods
  2.6 Error norms
  2.7 Time step adaptivity
  2.8 Explicit stability
  2.9 Algebraic solvers
  2.10 Rootfinding
3 Code Organization
  3.1 ARKode organization
4 Using ARKStep for C and C++ Applications
  4.1 Access to library and header files
  4.2 Data Types
  4.3 Header Files
  4.4 A skeleton of the user’s main program
  4.5 User-callable functions
  4.6 User-supplied functions
  4.7 Preconditioner modules
5 FARKODE, an Interface Module for FORTRAN Applications
  5.1 Important note on portability
  5.2 Fortran Data Types
6 Using ERKStep for C and C++ Applications
  6.1 Access to library and header files
  6.2 Data Types
  6.3 Header Files
  6.4 A skeleton of the user’s main program
  6.5 ERKStep User-callable functions
  6.6 User-supplied functions
7 Using MRIStep for C and C++ Applications
  7.1 Access to library and header files
  7.2 Data Types
  7.3 Header Files
  7.4 A skeleton of the user’s main program
  7.5 MRIStep User-callable functions
  7.6 User-supplied functions
8 Butcher Table Data Structure
  8.1 ARKodeButcherTable functions
9 Vector Data Structures
  9.1 Description of the NVECTOR Modules
  9.2 Description of the NVECTOR operations
  9.3 The NVECTOR_SERIAL Module
  9.4 The NVECTOR_PARALLEL Module
  9.5 The NVECTOR_OPENMP Module
  9.6 The NVECTOR_PTHREADS Module
  9.7 The NVECTOR_PARHYP Module
  9.8 The NVECTOR_PETSC Module
  9.9 The NVECTOR_CUDA Module
  9.10 The NVECTOR_RAJA Module
  9.11 The NVECTOR_OPENMPDEV Module
  9.12 NVECTOR Examples
  9.13 NVECTOR functions required by ARKode
10 Matrix Data Structures
  10.1 Description of the SUNMATRIX Modules
  10.2 Description of the SUNMATRIX operations
  10.3 Compatibility of SUNMATRIX types
  10.4 The SUNMATRIX_DENSE Module
  10.5 The SUNMATRIX_BAND Module
  10.6 The SUNMATRIX_SPARSE Module
  10.7 SUNMATRIX Examples
  10.8 SUNMATRIX functions required by ARKode
11 Description of the SUNLinearSolver module
  11.1 The SUNLinearSolver API
  11.2 ARKode SUNLinearSolver interface
  11.3 The SUNLinSol_Dense Module
  11.4 The SUNLinSol_Band Module
  11.5 The SUNLinSol_LapackDense Module
  11.6 The SUNLinSol_LapackBand Module
  11.7 The SUNLinSol_KLU Module
  11.8 The SUNLinSol_SuperLUMT Module
  11.9 The SUNLinSol_SPGMR Module
  11.10 The SUNLinSol_SPFGMR Module
  11.11 The SUNLinSol_SPBCGS Module
  11.12 The SUNLinSol_SPTFQMR Module
  11.13 The SUNLinSol_PCG Module
  11.14 SUNLinearSolver Examples
12 Nonlinear Solver Data Structures
  12.1 Description of the SUNNonlinearSolver Module
13 ARKode Installation Procedure
  13.1 CMake-based installation
  13.2 Installed libraries and exported header files
14 Appendix: ARKode Constants
  14.1 ARKode input constants
  14.2 ARKode output constants
15 Appendix: Butcher tables
  15.1 Explicit Butcher tables
  15.2 Implicit Butcher tables
  15.3 Additive Butcher tables
Bibliography
Index

This is the documentation for ARKode, an adaptive step time integration package for stiff, nonstiff and mixed stiff/nonstiff systems of ordinary differential equations (ODEs) using Runge-Kutta (i.e. one-step, multi-stage) methods. The ARKode solver is a component of the SUNDIALS suite of nonlinear and differential/algebraic equation solvers. It is designed to have a similar user experience to the CVODE solver, including user modes to allow adaptive integration to specified output times, return after each internal step and root-finding capabilities, and for calculations in serial, using shared-memory parallelism (via OpenMP, Pthreads, CUDA, Raja) or distributed-memory parallelism (via MPI). The default integration and solver options should apply to most users, though control over nearly all internal parameters and time adaptivity algorithms is enabled through optional interface routines.

ARKode is written in C, with C++ and Fortran interfaces.
ARKode is developed by Southern Methodist University, with support by the US Department of Energy through the FASTMath SciDAC Institute, under subcontract B598130 from Lawrence Livermore National Laboratory.

CHAPTER ONE

INTRODUCTION

The ARKode infrastructure provides adaptive-step time integration modules for stiff, nonstiff and mixed stiff/nonstiff systems of ordinary differential equations (ODEs). ARKode itself is structured to support a wide range of one-step (but multi-stage) methods, allowing for rapid development of parallel implementations of state-of-the-art time integration methods. At present, ARKode is packaged with two time-stepping modules, ARKStep and ERKStep.

ARKStep supports ODE systems posed in split, linearly-implicit form,

$$M \dot{y} = f_E(t, y) + f_I(t, y), \qquad y(t_0) = y_0, \tag{1.1}$$

where $t$ is the independent variable, $y$ is the set of dependent variables (in $\mathbb{R}^N$), $M$ is a user-specified, nonsingular operator from $\mathbb{R}^N$ to $\mathbb{R}^N$, and the right-hand side function is partitioned into up to two components:

• $f_E(t, y)$ contains the “nonstiff” time scale components to be integrated explicitly, and
• $f_I(t, y)$ contains the “stiff” time scale components to be integrated implicitly.

Either of these operators may be disabled, allowing for fully explicit, fully implicit, or combination implicit-explicit (ImEx) time integration.

The algorithms used in ARKStep are adaptive- and fixed-step additive Runge-Kutta methods. Such methods are defined through combining two complementary Runge-Kutta methods: one explicit (ERK) and the other diagonally implicit (DIRK). Through appropriately partitioning the ODE right-hand side into explicit and implicit components (1.1), such methods have the potential to enable accurate and efficient time integration of stiff, nonstiff, and mixed stiff/nonstiff systems of ordinary differential equations.
A key feature allowing for high efficiency of these methods is that only the components in $f_I(t, y)$ must be solved implicitly, allowing for splittings tuned for use with optimal implicit solver algorithms.

This framework allows for significant freedom over the constitutive methods used for each component, and ARKode is packaged with a wide array of built-in methods for use. These built-in Butcher tables include adaptive explicit methods of orders 2-8, adaptive implicit methods of orders 2-5, and adaptive ImEx methods of orders 3-5.

ERKStep focuses specifically on problems posed in explicit form,

$$\dot{y} = f(t, y), \qquad y(t_0) = y_0, \tag{1.2}$$

allowing for increased computational efficiency and memory savings. The algorithms used in ERKStep are adaptive- and fixed-step explicit Runge-Kutta methods. As with ARKStep, the ERKStep module is packaged with adaptive explicit methods of orders 2-8.

For problems that include a nonzero implicit term $f_I(t, y)$, the resulting implicit system (assumed nonlinear, unless specified otherwise) is solved approximately at each integration step, using a modified Newton method, inexact Newton method, or an accelerated fixed-point solver. For the Newton-based methods and the serial or threaded NVECTOR modules in SUNDIALS, ARKode may use a variety of linear solvers provided with SUNDIALS, including both direct (dense, band, or sparse) and preconditioned Krylov iterative (GMRES [SS1986], BiCGStab [V1992], TFQMR [F1993], FGMRES [S1993], or PCG [HS1952]) linear solvers. When used with the MPI-based parallel, PETSc, hypre, CUDA, and Raja NVECTOR modules, or a user-provided vector data structure, only the Krylov solvers are available, although a user may supply their own linear solver for any data structures if desired.
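The implicit solve described above can be sketched for a single scalar backward Euler stage, where the unknown $z$ satisfies $G(z) = z - y_n - h f_I(t_{n+1}, z) = 0$. The sketch below is a modified Newton iteration in the textbook sense: the derivative of $G$ is evaluated once at the predictor and reused in every iteration. The stiff term $f_I = -10 y^3$, the trivial predictor, the stopping test, and all names are assumptions made for illustration; they are not ARKode's internal algorithm or API.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stiff term for illustration: f_I(t, y) = -10 y^3. */
static double f_implicit(double t, double y) { (void)t; return -10.0 * y * y * y; }
static double df_implicit_dy(double t, double y) { (void)t; return -30.0 * y * y; }

/* Residual of the backward Euler stage equation:
 *   G(z) = z - y_n - h * f_I(t_new, z). */
static double stage_residual(double t_new, double y_n, double h, double z)
{
    return z - y_n - h * f_implicit(t_new, z);
}

/* Modified Newton iteration: dG/dz is evaluated once at the predictor
 * z = y_n and then frozen, so each iteration costs one residual
 * evaluation. Returns the number of iterations used, or -1 if the
 * update did not fall below tol within max_iters iterations. */
static int modified_newton(double t_new, double y_n, double h,
                           double tol, int max_iters, double *z_out)
{
    assert(z_out != NULL && max_iters > 0 && h > 0.0);
    double z = y_n;                                        /* trivial predictor */
    const double dG = 1.0 - h * df_implicit_dy(t_new, z);  /* frozen derivative */
    for (int k = 1; k <= max_iters; k++) {
        const double dz = -stage_residual(t_new, y_n, h, z) / dG;
        z += dz;
        if (fabs(dz) < tol) {
            *z_out = z;
            return k;
        }
    }
    *z_out = z;
    return -1;
}
```

Freezing the derivative trades the quadratic convergence of a full Newton method for cheaper iterations, which is the usual motivation for reusing Jacobian information across iterations and steps.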
For the serial or threaded vector structures, we provide a banded preconditioner module called ARKBANDPRE that may be used with the Krylov solvers, while for the MPI-based parallel vector structure there is a preconditioner module called ARKBBDPRE which provides a band-block-diagonal preconditioner. Additionally, a user may supply more optimal, problem-specific preconditioner routines.

1.1 Changes from previous versions

1.1.1 Changes in v3.0.2

There were no changes to ARKode in this release.

1.1.2 Changes in v3.0.1

A bug in ARKode where single precision builds would fail to compile has been fixed.

1.1.3 Changes in v3.0.0

The ARKode library has been entirely rewritten to support a modular approach to one-step methods, which should allow rapid research and development of novel integration methods without affecting existing solver functionality. To support this, the existing ARK-based methods have been encapsulated inside the new ARKStep time-stepping module. Two new time-stepping modules have been added:

• The ERKStep module provides an optimized implementation for explicit Runge-Kutta methods with reduced storage and number of calls to the ODE right-hand side function.
• The MRIStep module implements two-rate explicit-explicit multirate infinitesimal step methods utilizing different step sizes for slow and fast processes in an additive splitting.

This restructure has resulted in numerous small changes to the user interface, particularly the suite of “Set” routines for user-provided solver parameters and “Get” routines to access solver statistics, which are now prefixed with the name of the time-stepping module (e.g., ARKStep or ERKStep) instead of ARKode. Aside from affecting the names of these routines, user-level changes have been kept to a minimum. However, we recommend that users consult both this documentation and the ARKode example programs for further details on the updated infrastructure.
As part of the ARKode restructuring, an ARKodeButcherTable structure has been added for storing Butcher tables. Functions for creating new Butcher tables and checking their analytic order are provided along with other utility routines. For more details see Butcher Table Data Structure.

Two changes were made in the initial step size algorithm:

• Fixed an efficiency bug where an extra call to the right-hand side function was made.
• Changed the behavior of the algorithm if the max-iterations case is hit. Previously, the algorithm would exit with the step size calculated on the penultimate iteration. Now it will exit with the step size calculated on the final iteration.

ARKode’s dense output infrastructure has been improved to support higher-degree Hermite polynomial interpolants (up to degree 5) over the last successful time step.

ARKode’s previous direct and iterative linear solver interfaces, ARKDLS and ARKSPILS, have been merged into a single unified linear solver interface, ARKLS, to support any valid SUNLINSOL module. This includes DIRECT and ITERATIVE types as well as the new MATRIX_ITERATIVE type. Details regarding how ARKLS utilizes linear solvers of each type, as well as discussion regarding intended use cases for user-supplied SUNLinSol implementations, are included in the chapter Description of the SUNLinearSolver module. All ARKode example programs and the standalone linear solver examples have been updated to use the unified linear solver interface. The user interface for the new ARKLS module is very similar to the previous ARKDLS and ARKSPILS interfaces. Additionally, we note that Fortran users will need to enlarge their iout array of optional integer outputs, and update the indices that they query for certain linear-solver-related statistics.
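As an illustration of the Hermite dense output mentioned among the v3.0.0 changes above, the following sketch evaluates the standard cubic (degree 3) Hermite interpolant over one step $[t_n, t_n + h]$ from the solution and right-hand side values at both ends. It is a generic textbook formula, not ARKode's implementation (which supports interpolants up to degree 5), and the function name and argument order are hypothetical.

```c
#include <assert.h>

/* Cubic Hermite interpolant over the step [t_n, t_n + h].
 *   y_n,  y_np1 : solution values at t_n and t_n + h
 *   f_n,  f_np1 : derivative (right-hand side) values at t_n and t_n + h
 *   t           : evaluation time, normally inside the step
 * The interpolant matches both values and both derivatives, so it
 * reproduces any cubic polynomial exactly. */
static double hermite_cubic(double t_n, double h,
                            double y_n, double y_np1,
                            double f_n, double f_np1, double t)
{
    assert(h > 0.0);
    const double th  = (t - t_n) / h;   /* normalized time in [0, 1] */
    const double th2 = th * th;
    const double th3 = th2 * th;

    /* Standard cubic Hermite basis functions. */
    const double h00 =  2.0 * th3 - 3.0 * th2 + 1.0;
    const double h10 =        th3 - 2.0 * th2 + th;
    const double h01 = -2.0 * th3 + 3.0 * th2;
    const double h11 =        th3 -       th2;

    return h00 * y_n + h10 * h * f_n + h01 * y_np1 + h11 * h * f_np1;
}
```

For example, with $y(t) = t^3$ on the step $[1, 2]$ the inputs are y_n = 1, y_np1 = 8, f_n = 3, f_np1 = 12, and evaluating at t = 1.5 recovers 1.5³ = 3.375.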
The names of all constructor routines for SUNDIALS-provided SUNLinSol implementations have been updated to follow the naming convention SUNLinSol_* where * is the name of the linear solver. The new names are SUNLinSol_Band, SUNLinSol_Dense, SUNLinSol_KLU, SUNLinSol_LapackBand, SUNLinSol_LapackDense, SUNLinSol_PCG, SUNLinSol_SPBCGS, SUNLinSol_SPFGMR, SUNLinSol_SPGMR, SUNLinSol_SPTFQMR, and SUNLinSol_SuperLUMT. Solver-specific “set” routine names have been similarly standardized. To minimize challenges in user migration to the new names, the previous routine names may still be used; these will be deprecated in future releases, so we recommend that users migrate to the new names soon. All ARKode example programs and the standalone linear solver examples have been updated to use the new naming convention.

The SUNBandMatrix constructor has been simplified to remove the storage upper bandwidth argument.

SUNDIALS integrators have been updated to utilize generic nonlinear solver modules defined through the SUNNONLINSOL API. This API will ease the addition of new nonlinear solver options and allow for external or user-supplied nonlinear solvers. The SUNNONLINSOL API and SUNDIALS-provided modules are described in Nonlinear Solver Data Structures and follow the same object-oriented design and implementation used by the NVector, SUNMatrix, and SUNLinSol modules. Currently two SUNNONLINSOL implementations are provided, SUNNonlinSol_Newton and SUNNonlinSol_FixedPoint. These replicate the previous integrator-specific implementations of a Newton iteration and an accelerated fixed-point iteration, respectively. Example programs using each of these nonlinear solver modules in a standalone manner have been added and all ARKode example programs have been updated to use generic SUNNonlinSol modules.

As with previous versions, ARKode will use the Newton solver (now provided by SUNNonlinSol_Newton) by default.
Use of the ARKStepSetLinear() routine (previously named ARKodeSetLinear) will indicate that the problem is linearly-implicit, using only a single Newton iteration per implicit stage. Users wishing to switch to the accelerated fixed-point solver are now required to create a SUNNonlinSol_FixedPoint object and attach that to ARKode, instead of calling the previous ARKodeSetFixedPoint routine. See the documentation sections A skeleton of the user’s main program, Nonlinear solver interface functions, and The SUNNonlinearSolver_FixedPoint implementation for further details, or the serial C example program ark_brusselator_fp.c for an example.

Three fused vector operations and seven vector array operations have been added to the NVECTOR API. These optional operations are disabled by default and may be activated by calling vector-specific routines after creating an NVector (see Description of the NVECTOR Modules for more details). The new operations are intended to increase data reuse in vector operations, reduce parallel communication on distributed memory systems, and lower the number of kernel launches on systems with accelerators. The fused operations are N_VLinearCombination, N_VScaleAddMulti, and N_VDotProdMulti, and the vector array operations are N_VLinearSumVectorArray, N_VScaleVectorArray, N_VConstVectorArray, N_VWrmsNormVectorArray, N_VWrmsNormMaskVectorArray, N_VScaleAddMultiVectorArray, and N_VLinearCombinationVectorArray. If an NVector implementation defines any of these operations as NULL, then standard NVector operations will automatically be called as necessary to complete the computation.

Multiple changes to the CUDA NVECTOR were made:

• Changed the N_VMake_Cuda function to take a host data pointer and a device data pointer instead of an N_VectorContent_Cuda object.
• Changed N_VGetLength_Cuda to return the global vector length instead of the local vector length.
• Added N_VGetLocalLength_Cuda to return the local vector length.
• Added N_VGetMPIComm_Cuda to return the MPI communicator used.
• Removed the accessor functions in the namespace suncudavec.
• Added the ability to set the cudaStream_t used for execution of the CUDA NVECTOR kernels. See the function N_VSetCudaStreams_Cuda.
• Added N_VNewManaged_Cuda, N_VMakeManaged_Cuda, and N_VIsManagedMemory_Cuda functions to accommodate using managed memory with the CUDA NVECTOR.

Multiple changes to the RAJA NVECTOR were made:

• Changed N_VGetLength_Raja to return the global vector length instead of the local vector length.
• Added N_VGetLocalLength_Raja to return the local vector length.
• Added N_VGetMPIComm_Raja to return the MPI communicator used.
• Removed the accessor functions in the namespace sunrajavec.

A new NVECTOR implementation for leveraging OpenMP 4.5+ device offloading has been added, NVECTOR_OpenMPDEV. See The NVECTOR_OPENMPDEV Module for more details.

1.1.4 Changes in v2.2.1

Fixed a bug in the CUDA NVECTOR where the N_VInvTest operation could write beyond the allocated vector data.

Fixed library installation path for multiarch systems. This fix changes the default library installation path to CMAKE_INSTALL_PREFIX/CMAKE_INSTALL_LIBDIR from CMAKE_INSTALL_PREFIX/lib. CMAKE_INSTALL_LIBDIR is automatically set, but is available as a CMake option that can be modified.

1.1.5 Changes in v2.2.0

Fixed a problem with setting sunindextype which would occur with some compilers (e.g. armclang) that did not define __STDC_VERSION__.

Added hybrid MPI/CUDA and MPI/RAJA vectors to allow use of more than one MPI rank when using a GPU system. The vectors assume one GPU device per MPI rank.

Changed the name of the RAJA NVECTOR library to libsundials_nveccudaraja.lib from libsundials_nvecraja.lib to better reflect that we only support CUDA as a backend for RAJA currently.
Several changes were made to the build system:

• CMake 3.1.3 is now the minimum required CMake version.
• Deprecated the behavior of the SUNDIALS_INDEX_TYPE CMake option and added the SUNDIALS_INDEX_SIZE CMake option to select the sunindextype integer size.
• The native CMake FindMPI module is now used to locate an MPI installation.
• If MPI is enabled and MPI compiler wrappers are not set, the build system will check if CMAKE_<language>_COMPILER can compile MPI programs before trying to locate and use an MPI installation.
• The previous options for setting MPI compiler wrappers and the executable for running MPI programs have been deprecated. The new options that align with those used in the native CMake FindMPI module are MPI_C_COMPILER, MPI_CXX_COMPILER, MPI_Fortran_COMPILER, and MPIEXEC_EXECUTABLE.
• When a Fortran name-mangling scheme is needed (e.g., LAPACK_ENABLE is ON) the build system will infer the scheme from the Fortran compiler. If a Fortran compiler is not available or the inferred or default scheme needs to be overridden, the advanced options SUNDIALS_F77_FUNC_CASE and SUNDIALS_F77_FUNC_UNDERSCORES can be used to manually set the name-mangling scheme and bypass trying to infer the scheme.
• Parts of the main CMakeLists.txt file were moved to new files in the src and example directories to make the CMake configuration file structure more modular.

1.1.6 Changes in v2.1.2

Updated the minimum required version of CMake to 2.8.12 and enabled using rpath by default to locate shared libraries on OSX.

Fixed a Windows-specific problem where sunindextype was not correctly defined when using 64-bit integers for the SUNDIALS index type. On Windows sunindextype is now defined as the MSVC basic type __int64.

Added sparse SUNMatrix “Reallocate” routine to allow specification of the nonzero storage.
Updated the KLU SUNLinearSolver module to set constants for the two reinitialization types, and fixed a bug in the full reinitialization approach where the sparse SUNMatrix pointer would go out of scope on some architectures.

Updated the “ScaleAdd” and “ScaleAddI” implementations in the sparse SUNMatrix module to more optimally handle the case where the target matrix contained sufficient storage for the sum, but had the wrong sparsity pattern. The sum now occurs in-place, by performing the sum backwards in the existing storage. However, it is still more efficient if the user-supplied Jacobian routine allocates storage for the sum $I + \gamma J$ or $M + \gamma J$ manually (with zero entries if needed).

Changed LICENSE install path to instdir/include/sundials.

1.1.7 Changes in v2.1.1

Fixed a potential memory leak in the SPGMR and SPFGMR linear solvers: if “Initialize” was called multiple times then the solver memory was reallocated (without being freed).

Fixed a minor bug in the ARKReInit routine, where a flag was incorrectly set to indicate that the problem had been resized (instead of just re-initialized).

Fixed C++11 compiler errors/warnings about incompatible use of string literals.

Updated KLU SUNLinearSolver module to use a typedef for the precision-specific solve function to be used (to avoid compiler warnings).

Added missing typecasts for some (void*) pointers (again, to avoid compiler warnings).

Bugfix in sunmatrix_sparse.c where we had used int instead of sunindextype in one location.

Added missing #include in NVECTOR and SUNMATRIX header files.

Added missing prototype for ARKSpilsGetNumMTSetups.

Fixed an indexing bug in the CUDA NVECTOR implementation of N_VWrmsNormMask and revised the RAJA NVECTOR implementation of N_VWrmsNormMask to work with mask arrays using values other than zero or one. Replaced double with realtype in the RAJA vector test functions.
Fixed compilation issue with GCC 7.3.0 and Fortran programs that do not require a SUNMatrix or SUNLinearSolver module (e.g. iterative linear solvers, explicit methods, fixed point solver, etc.).

1.1.8 Changes in v2.1.0

Added NVECTOR print functions that write vector data to a specified file (e.g. N_VPrintFile_Serial).

Added make test and make test_install options to the build system for testing SUNDIALS after building with make and installing with make install respectively.

1.1.9 Changes in v2.0.0

All interfaces to matrix structures and linear solvers have been reworked, and all example programs have been updated. The goal of the redesign of these interfaces was to provide more encapsulation and ease in interfacing custom linear solvers and interoperability with linear solver libraries. Specific changes include:

• Added generic SUNMATRIX module with three provided implementations: dense, banded and sparse. These replicate previous SUNDIALS Dls and Sls matrix structures in a single object-oriented API.
• Added example problems demonstrating use of generic SUNMATRIX modules.
• Added generic SUNLINEARSOLVER module with eleven provided implementations: dense, banded, LAPACK dense, LAPACK band, KLU, SuperLU_MT, SPGMR, SPBCGS, SPTFQMR, SPFGMR, PCG. These replicate previous SUNDIALS generic linear solvers in a single object-oriented API.
• Added example problems demonstrating use of generic SUNLINEARSOLVER modules.
• Expanded package-provided direct linear solver (Dls) interfaces and scaled, preconditioned, iterative linear solver (Spils) interfaces to utilize generic SUNMATRIX and SUNLINEARSOLVER objects.
• Removed package-specific, linear solver-specific, solver modules (e.g. CVDENSE, KINBAND, IDAKLU, ARKSPGMR) since their functionality is entirely replicated by the generic Dls/Spils interfaces and SUNLINEARSOLVER/SUNMATRIX modules.
The exception is CVDIAG, a diagonal approximate Jacobian solver available to CVODE and CVODES.
• Converted all SUNDIALS example problems to utilize new generic SUNMATRIX and SUNLINEARSOLVER objects, along with updated Dls and Spils linear solver interfaces.
• Added Spils interface routines to ARKode, CVODE, CVODES, IDA and IDAS to allow specification of a user-provided “JTSetup” routine. This change supports users who wish to set up data structures for the user-provided Jacobian-times-vector (“JTimes”) routine, and where the cost of one JTSetup setup per Newton iteration can be amortized between multiple JTimes calls.

Two additional NVECTOR implementations were added – one for CUDA and one for RAJA vectors. These vectors are supplied to provide very basic support for running on GPU architectures. Users are advised that these vectors both move all data to the GPU device upon construction, and speedup will only be realized if the user also conducts the right-hand-side function evaluation on the device. In addition, these vectors assume the problem fits on one GPU. For further information about RAJA, users are referred to the web site https://software.llnl.gov/RAJA/. These additions are accompanied by additions to various interface functions and to user documentation.

All indices for data structures were updated to a new sunindextype that can be configured to be a 32- or 64-bit integer data index type. sunindextype is defined to be int32_t or int64_t when portable types are supported, otherwise it is defined as int or long int. The Fortran interfaces continue to use long int for indices, except for their sparse matrix interface that now uses the new sunindextype. This new flexible capability for index types includes interfaces to PETSc, hypre, SuperLU_MT, and KLU with either 32-bit or 64-bit capabilities depending on how the user configures SUNDIALS.
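The idea of a build-time configurable index type can be sketched in plain C as follows. This is not the SUNDIALS header: EXAMPLE_INDEX_SIZE, example_index_t and sum_entries are hypothetical names used only to show how a 32- or 64-bit index type selected at compile time flows through user code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time switch standing in for a build option
 * that selects the index width; override with -DEXAMPLE_INDEX_SIZE=32. */
#ifndef EXAMPLE_INDEX_SIZE
#define EXAMPLE_INDEX_SIZE 64
#endif

#if EXAMPLE_INDEX_SIZE == 64
typedef int64_t example_index_t;   /* large problems, > 2^31 - 1 entries */
#elif EXAMPLE_INDEX_SIZE == 32
typedef int32_t example_index_t;   /* smaller index storage */
#else
#error "EXAMPLE_INDEX_SIZE must be 32 or 64"
#endif

/* Sum the first n entries of data, looping with the configurable
 * index type so the same source works for either width. */
static double sum_entries(const double *data, example_index_t n)
{
    assert(data != NULL && n >= 0);
    double total = 0.0;
    for (example_index_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

Writing loops and array lengths in terms of the typedef, rather than int or long int directly, is what lets one code base link against libraries built with either index width.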
To avoid potential namespace conflicts, the macros defining booleantype values TRUE and FALSE have been changed to SUNTRUE and SUNFALSE respectively.

Temporary vectors were removed from preconditioner setup and solve routines for all packages. It is assumed that all necessary data for user-provided preconditioner operations will be allocated and stored in user-provided data structures.

The file include/sundials_fconfig.h was added. This file contains SUNDIALS type information for use in Fortran programs.

Added functions SUNDIALSGetVersion and SUNDIALSGetVersionNumber to get SUNDIALS release version information at runtime.

The build system was expanded to support many of the xSDK-compliant keys. The xSDK is a movement in scientific software to provide a foundation for the rapid and efficient production of high-quality, sustainable extreme-scale scientific applications. More information can be found at https://xsdk.info.

In addition, numerous changes were made to the build system. These include the addition of separate BLAS_ENABLE and BLAS_LIBRARIES CMake variables, additional error checking during CMake configuration, minor bug fixes, and renaming CMake options to enable/disable examples for greater clarity and an added option to enable/disable Fortran 77 examples. These changes included changing EXAMPLES_ENABLE to EXAMPLES_ENABLE_C, changing CXX_ENABLE to EXAMPLES_ENABLE_CXX, changing F90_ENABLE to EXAMPLES_ENABLE_F90, and adding an EXAMPLES_ENABLE_F77 option.

Corrections and additions were made to the examples, to installation-related files, and to the user documentation.

1.1.10 Changes in v1.1.0

We have included numerous bugfixes and enhancements since the v1.0.2 release.

The bugfixes include:
• For each linear solver, the various solver performance counters are now initialized to 0 in both the solver specification function and in the solver’s linit function.
This ensures that these solver counters are initialized upon linear solver instantiation as well as at the beginning of the problem solution.
• The choice of method vs. embedding in the Billington and TRBDF2 implicit Runge-Kutta methods was swapped, since in those the lower-order coefficients result in an A-stable method, while the higher-order coefficients do not. This change results in significantly improved robustness when using those methods.
• A bug was fixed for the situation where a user supplies a vector of absolute tolerances, and also uses the vector Resize() functionality.
• A bug was fixed for the situation where a user supplies a Butcher table without an embedding and runs with fixed time steps (or performs adaptivity manually); previously this had resulted in an error since the embedding order was below 1.
• Numerous aspects of the documentation were fixed and/or clarified.

The feature changes/enhancements include:
• Two additional NVECTOR implementations were added – one for Hypre (parallel) ParVector vectors, and one for PETSc vectors. These additions are accompanied by additions to various interface functions and to user documentation.
• Each NVECTOR module now includes a function, N_VGetVectorID, that returns the NVECTOR module name.
• A memory leak was fixed in the banded preconditioner and banded-block-diagonal preconditioner interfaces. In addition, updates were done to return integers from linear solver and preconditioner ‘free’ routines.
• The Krylov linear solver Bi-CGstab was enhanced by removing a redundant dot product. Various additions and corrections were made to the interfaces to the sparse solvers KLU and SuperLU_MT, including support for CSR format when using KLU.
• The ARKode implicit predictor algorithms were updated: methods 2 and 3 were improved slightly, a new predictor approach was added, and the default choice was modified.
• The underlying sparse matrix structure was enhanced to allow both CSR and CSC matrices, with CSR supported by the KLU linear solver interface. ARKode interfaces to the KLU solver from both C and Fortran were updated to enable selection of sparse matrix type, and a Fortran-90 CSR example program was added.
• The missing ARKSpilsGetNumMtimesEvals() function was added – this had been included in the previous documentation but had not been implemented.
• The handling of integer codes for specifying built-in ARKode Butcher tables was enhanced. While a global numbering system is still used, methods now have #defined names to simplify the user interface and to streamline incorporation of new Butcher tables into ARKode.
• The maximum number of Butcher table stages was increased from 8 to 15 to accommodate very high order methods, and an 8th-order adaptive ERK method was added.
• Support was added for the explicit and implicit methods in an additive Runge-Kutta method to utilize different stage times, solution and embedding coefficients, to support new SSP-ARK methods.
• The FARKODE interface was extended to include a routine to set scalar/array-valued residual tolerances, to support Fortran applications with non-identity mass-matrices.

1.2 Reading this User Guide

This user guide is a combination of general usage instructions and specific example programs. We expect that some readers will want to concentrate on the general instructions, while others will refer mostly to the examples, and the organization is intended to accommodate both styles.

The structure of this document is as follows:
• In the next section we provide a thorough presentation of the underlying mathematics used within the ARKode family of solvers.
• We follow this with an overview of how the source code for ARKode is organized.
• The largest section follows, providing a full account of the ARKStep module user interface, including a description of all user-accessible functions and outlines for usage in serial and parallel applications. Since ARKode is written in C, we first present a section on using ARKStep for C and C++ applications, followed by a separate section on using ARKode within Fortran applications.
• The much smaller section describing the ERKStep time-stepping module, using ERKStep for C and C++ applications, follows.
• Subsequent sections discuss shared features between ARKode and the rest of the SUNDIALS library: vector data structures, matrix data structures, linear solver data structures, and the installation procedure.
• The final sections catalog the full set of ARKode constants that are used for both input specifications and return codes, and the full set of Butcher tables that are packaged with ARKode.

1.3 SUNDIALS Release License

All SUNDIALS packages are released open source, under the BSD 3-Clause license. The only requirements of the license are preservation of copyright and a standard disclaimer of liability. The full text of the license and an additional notice are provided below and may also be found in the LICENSE and NOTICE files provided with all SUNDIALS packages.

PLEASE NOTE: If you are using SUNDIALS with any third party libraries linked in (e.g., LAPACK, KLU, SuperLU_MT, PETSc, or hypre), be sure to review the respective license of the package as that license may have more restrictive terms than the SUNDIALS license. For example, if someone builds SUNDIALS with a statically linked KLU, the build is subject to the terms of the more-restrictive LGPL license (under which KLU is released) and no longer to the SUNDIALS BSD license.

1.3.1 BSD 3-Clause License

Copyright (c) 2002-2019, Lawrence Livermore National Security and Southern Methodist University. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1.3.2 Additional Notice

This work was produced under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

This work was prepared as an account of work sponsored by an agency of the United States Government.
Neither the United States Government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or Lawrence Livermore National Security, LLC.

The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

1.3.3 SUNDIALS Release Numbers

LLNL-CODE-667205 (ARKODE)
UCRL-CODE-155951 (CVODE)
UCRL-CODE-155950 (CVODES)
UCRL-CODE-155952 (IDA)
UCRL-CODE-237203 (IDAS)
LLNL-CODE-665877 (KINSOL)

CHAPTER TWO
MATHEMATICAL CONSIDERATIONS

ARKode solves ODE initial value problems (IVP) in $\mathbb{R}^N$ posed in linearly-implicit form,

$$ M \dot{y} = f(t, y), \qquad y(t_0) = y_0. \tag{2.1} $$

Here, $t$ is the independent variable (e.g. time), and the dependent variables are given by $y \in \mathbb{R}^N$, where we use the notation $\dot{y}$ to denote $\frac{dy}{dt}$.

$M$ is a user-specified nonsingular operator from $\mathbb{R}^N \to \mathbb{R}^N$. This operator is currently assumed to be independent of both $t$ and $y$. For standard systems of ordinary differential equations and for problems arising from the spatial semi-discretization of partial differential equations using finite difference, finite volume, or spectral finite element methods, $M$ is typically the identity matrix, $I$.
For PDEs using standard finite-element spatial semi-discretizations, $M$ is typically a well-conditioned mass matrix that is fixed throughout a simulation (except in the case of a spatially-adaptive method, where $M$ can change between, but not within, time steps).

The ODE right-hand side is given by the function $f(t, y)$, i.e. in general we make no assumption that the problem (2.1) is autonomous ($f = f(y)$). In general, the time integration methods within ARKode support additive splittings of this right-hand side function, as described in the subsections that follow. Through these splittings, the time-stepping methods currently supplied with ARKode are designed to solve stiff, nonstiff, or mixed stiff/nonstiff problems. Roughly speaking, stiffness is characterized by the presence of at least one rapidly damped mode, whose time constant is small compared to the time scale of the solution itself.

In the sub-sections that follow, we elaborate on the numerical methods utilized in ARKode. We first discuss the “single-step” nature of the ARKode infrastructure, including its usage modes and approaches for interpolated solution output. We then discuss the current suite of time-stepping modules supplied with ARKode, including the ARKStep module for additive Runge-Kutta methods, the ERKStep module that is optimized for explicit Runge-Kutta methods, and the MRIStep module for two-rate explicit-explicit multirate infinitesimal step methods. We then discuss the adaptive temporal error controllers shared by the time-stepping modules, including discussion of our choice of norms for measuring errors within various components of the solver.
We then discuss the nonlinear and linear solver strategies used by ARKode’s time-stepping modules for solving implicit algebraic systems that arise in computing each stage and/or step: nonlinear solvers, linear solvers, preconditioners, error control within iterative nonlinear and linear solvers, algorithms for initial predictors for implicit stage solutions, and approaches for handling non-identity mass-matrices. We conclude with a section describing ARKode’s rootfinding capabilities, that may be used to stop integration of a problem prematurely based on traversal of roots in user-specified functions.

2.1 Adaptive single-step methods

The ARKode infrastructure is designed to support single-step, IVP integration methods, i.e.

$$ y_n = \phi(y_{n-1}, h_n) $$

where $y_{n-1}$ is an approximation to the solution $y(t_{n-1})$, $y_n$ is an approximation to the solution $y(t_n)$, $t_n = t_{n-1} + h_n$, and the approximation method is represented by the function $\phi$.

The choice of step size $h_n$ is determined by the time-stepping method (based on user-provided inputs, typically accuracy requirements). However, users may place minimum/maximum bounds on $h_n$ if desired.

ARKode’s time stepping modules may be run in a variety of “modes”:
• NORMAL – The solver will take internal steps until it has just overtaken a user-specified output time, $t_{\text{out}}$, in the direction of integration, i.e. $t_{n-1} < t_{\text{out}} \le t_n$ for forward integration, or $t_n \le t_{\text{out}} < t_{n-1}$ for backward integration. It will then compute an approximation to the solution $y(t_{\text{out}})$ by interpolation (using one of the dense output routines described in the section Interpolation).
• ONE-STEP – The solver will only take a single internal step $y_{n-1} \to y_n$ and then return control back to the calling program. If this step will overtake $t_{\text{out}}$ then the solver will again return an interpolated result; otherwise it will return a copy of the internal solution $y_n$.
• NORMAL-TSTOP – The solver will take internal steps until the next step will overtake $t_{\text{out}}$. It will then limit this next step so that $t_n = t_{n-1} + h_n = t_{\text{out}}$, and once the step completes it will return a copy of the internal solution $y_n$.
• ONE-STEP-TSTOP – The solver will check whether the next step will overtake $t_{\text{out}}$ – if not then this mode is identical to “one-step” above; otherwise it will limit this next step so that $t_n = t_{n-1} + h_n = t_{\text{out}}$. In either case, once the step completes it will return a copy of the internal solution $y_n$.

We note that interpolated solutions may be slightly less accurate than the internal solutions produced by the solver. Hence, to ensure that the returned value has full method accuracy one of the “tstop” modes may be used.

2.2 Interpolation

As mentioned above, the time-stepping modules in ARKode support interpolation of solutions $y(t_{\text{out}})$ where $t_{\text{out}}$ occurs within a completed time step from $t_{n-1} \to t_n$. Additionally, this module supports extrapolation of solutions to $t$ outside this interval (e.g. to construct predictors for iterative nonlinear and linear solvers). To this end, ARKode currently supports construction of polynomial interpolants $p_q(t)$ of polynomial order up to $q = 5$, although this polynomial order may be adjusted by the user.

These interpolants are either of Lagrange or Hermite form, and use the data $\{y_{n-1}, f_{n-1}, y_n, f_n\}$, where here we use the simplified notation $f_k$ to denote $f(t_k, y_k)$. Defining a normalized “time” variable, $\tau$, for the most-recently-computed solution interval $t_{n-1} \to t_n$ as

$$ \tau(t) = \frac{t - t_n}{h_n}, $$

(so that $\tau = -1$ at $t_{n-1}$ and $\tau = 0$ at $t_n$), we then construct the interpolants $p_q(t)$ as follows:
• $q = 0$: constant interpolant
$$ p_0(\tau) = \frac{y_{n-1} + y_n}{2}. $$
• $q = 1$: linear Lagrange interpolant
$$ p_1(\tau) = -\tau\, y_{n-1} + (1 + \tau)\, y_n. $$
• $q = 2$: quadratic Hermite interpolant
$$ p_2(\tau) = \tau^2\, y_{n-1} + (1 - \tau^2)\, y_n + h_n (\tau + \tau^2)\, f_n. $$
• $q = 3$: cubic Hermite interpolant
$$ p_3(\tau) = (3\tau^2 + 2\tau^3)\, y_{n-1} + (1 - 3\tau^2 - 2\tau^3)\, y_n + h_n(\tau^2 + \tau^3)\, f_{n-1} + h_n(\tau + 2\tau^2 + \tau^3)\, f_n. $$

We note that although interpolants of order > 5 are possible, these are not currently implemented due to their increased computing and storage costs. However, these may be added in future releases.

2.3 ARKStep – Additive Runge-Kutta methods

The ARKStep time-stepping module in ARKode is designed for IVP of the form

$$ M \dot{y} = f_E(t, y) + f_I(t, y), \qquad y(t_0) = y_0, \tag{2.2} $$

i.e. the right-hand side function is additively split into two components:
• $f_E(t, y)$ contains the “nonstiff” components of the system. This will be integrated using an explicit method.
• $f_I(t, y)$ contains the “stiff” components of the system. This will be integrated using an implicit method.

In solving the IVP (2.2), ARKStep utilizes variable-step, embedded, additive Runge-Kutta methods (ARK), corresponding to algorithms of the form

$$
\begin{aligned}
M z_i &= M y_{n-1} + h_n \sum_{j=1}^{i-1} A^E_{i,j} f_E(t^E_{n,j}, z_j) + h_n \sum_{j=1}^{i} A^I_{i,j} f_I(t^I_{n,j}, z_j), \quad i = 1, \ldots, s, \\
M y_n &= M y_{n-1} + h_n \sum_{i=1}^{s} \left( b^E_i f_E(t^E_{n,i}, z_i) + b^I_i f_I(t^I_{n,i}, z_i) \right), \\
M \tilde{y}_n &= M y_{n-1} + h_n \sum_{i=1}^{s} \left( \tilde{b}^E_i f_E(t^E_{n,i}, z_i) + \tilde{b}^I_i f_I(t^I_{n,i}, z_i) \right).
\end{aligned}
\tag{2.3}
$$

Here $\tilde{y}_n$ are embedded solutions that approximate $y(t_n)$ that are used for error estimation; these typically have slightly lower accuracy than the computed solutions $y_n$. The internal stage times are abbreviated using the notation $t^E_{n,j} = t_{n-1} + c^E_j h_n$ and $t^I_{n,j} = t_{n-1} + c^I_j h_n$. The ARK method is primarily defined through the coefficients $A^E \in \mathbb{R}^{s \times s}$, $A^I \in \mathbb{R}^{s \times s}$, $b^E \in \mathbb{R}^s$, $b^I \in \mathbb{R}^s$, $c^E \in \mathbb{R}^s$ and $c^I \in \mathbb{R}^s$, that correspond with the explicit and implicit Butcher tables. Additional coefficients $\tilde{b}^E \in \mathbb{R}^s$ and $\tilde{b}^I \in \mathbb{R}^s$ are used to construct the embedding $\tilde{y}_n$.
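As a rough illustration of the update formulas (2.3), the following standalone Python sketch takes one additive Runge-Kutta step for a scalar problem. It is not ARKode code: it assumes $M = I$, assumes a linear implicit part $f_I(t, y) = \lambda y$ so that each implicit stage can be solved in closed form, and omits the embedding $\tilde{y}_n$. The function name `imex_ark_step` and the two-stage first-order ImEx Euler tables used in the example are hypothetical choices for illustration, not tables shipped with ARKode.

```python
def imex_ark_step(f_e, lam, t, y, h, AE, AI, bE, bI, cE):
    """One step of (2.3) with M = I and f_I(t, y) = lam * y (assumed linear)."""
    s = len(bE)
    FE = [0.0] * s  # explicit RHS evaluated at each stage
    FI = [0.0] * s  # implicit RHS evaluated at each stage
    for i in range(s):
        # Known part of the stage equation: contributions of stages j < i.
        rhs = y
        for j in range(i):
            rhs += h * (AE[i][j] * FE[j] + AI[i][j] * FI[j])
        # Stage equation z_i = rhs + h*AI[i][i]*lam*z_i, solved exactly.
        z_i = rhs / (1.0 - h * AI[i][i] * lam)
        FE[i] = f_e(t + cE[i] * h, z_i)
        FI[i] = lam * z_i
    # Combine the stage right-hand sides with the b coefficients.
    return y + h * sum(bE[i] * FE[i] + bI[i] * FI[i] for i in range(s))


# Hypothetical ImEx Euler pair written as a 2-stage ARK method.
AE = [[0.0, 0.0], [1.0, 0.0]]
AI = [[0.0, 0.0], [0.0, 1.0]]
bE = [1.0, 0.0]
bI = [0.0, 1.0]
cE = [0.0, 1.0]

# y' = 1 - 10*y with the constant term treated explicitly, the decay implicitly.
y1 = imex_ark_step(lambda t, y: 1.0, -10.0, 0.0, 0.0, 0.1, AE, AI, bE, bI, cE)
```

For a nonlinear $f_I$ the closed-form division would be replaced by a nonlinear solve at each stage, which is where the nonlinear and linear solver machinery described later in this chapter enters.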
We note that ARKStep currently enforces the constraint that the explicit and implicit methods in an ARK pair must share the same number of stages, $s$; however it allows the possibility for different explicit and implicit stage times, i.e. $c^E$ need not equal $c^I$.

The user of ARKStep must choose appropriately between one of three classes of methods: ImEx, explicit, and implicit. All of ARKode’s available Butcher tables encoding the coefficients $c^E$, $c^I$, $A^E$, $A^I$, $b^E$, $b^I$, $\tilde{b}^E$ and $\tilde{b}^I$ are further described in the Appendix: Butcher tables.

For mixed stiff/nonstiff problems, a user should provide both of the functions $f_E$ and $f_I$ that define the IVP system. For such problems, ARKStep currently implements the ARK methods proposed in [KC2003], allowing for methods having order of accuracy $q = \{3, 4, 5\}$; the tables for these methods are given in the section Additive Butcher tables. Additionally, user-defined ARK tables are supported.

For nonstiff problems, a user may specify that $f_I = 0$, i.e. the equation (2.2) reduces to the non-split IVP

$$ M \dot{y} = f_E(t, y), \qquad y(t_0) = y_0. \tag{2.4} $$

In this scenario, the coefficients $A^I = 0$, $c^I = 0$, $b^I = 0$ and $\tilde{b}^I = 0$ in (2.3), and the ARK methods reduce to classical explicit Runge-Kutta methods (ERK). For these classes of methods, ARKode provides coefficients with orders of accuracy $q = \{2, 3, 4, 5, 6, 8\}$, with embeddings of orders $p = \{1, 2, 3, 4, 5, 7\}$. These default to the Heun-Euler-2-1-2, Bogacki-Shampine-4-2-3, Zonneveld-5-3-4, Cash-Karp-6-4-5, Verner-8-5-6 and Fehlberg-13-7-8 methods, respectively. As with ARK methods, user-defined ERK tables are supported.

Finally, for stiff problems the user may specify that $f_E = 0$, so the equation (2.2) reduces to the non-split IVP

$$ M \dot{y} = f_I(t, y), \qquad y(t_0) = y_0. \tag{2.5} $$

Similarly to ERK methods, in this scenario the coefficients $A^E = 0$, $c^E = 0$, $b^E = 0$ and $\tilde{b}^E = 0$ in (2.3), and the ARK methods reduce to classical diagonally-implicit Runge-Kutta methods (DIRK). For these classes of methods, ARKode provides tables with orders of accuracy $q = \{2, 3, 4, 5\}$, with embeddings of orders $p = \{1, 2, 3, 4\}$. These default to the SDIRK-2-1-2, ARK-4-2-3 (implicit), SDIRK-5-3-4 and ARK-8-4-5 (implicit) methods, respectively. Again, user-defined DIRK tables are supported.

2.4 ERKStep – Explicit Runge-Kutta methods

The ERKStep time-stepping module in ARKode is designed for IVP of the form

$$ \dot{y} = f(t, y), \qquad y(t_0) = y_0. \tag{2.6} $$

For such problems, ERKStep provides variable-step, embedded, explicit Runge-Kutta methods (ERK), corresponding to algorithms of the form

$$
\begin{aligned}
z_i &= y_{n-1} + h_n \sum_{j=1}^{i-1} A_{i,j} f(t_{n,j}, z_j), \quad i = 1, \ldots, s, \\
y_n &= y_{n-1} + h_n \sum_{i=1}^{s} b_i f(t_{n,i}, z_i), \\
\tilde{y}_n &= y_{n-1} + h_n \sum_{i=1}^{s} \tilde{b}_i f(t_{n,i}, z_i),
\end{aligned}
\tag{2.7}
$$

where the variables have the same meanings as in the previous section.

We note that the problem (2.6) is fully encapsulated in the more general problems (2.4), and that the algorithm (2.7) is similarly encapsulated in the more general algorithm (2.3). While it therefore follows that ARKStep can be used to solve every problem solvable by ERKStep, using the same set of methods, we include ERKStep as a distinct time-stepping module since this simplified form admits a more efficient and memory-friendly solution process than when considering the more general form.

2.5 MRIStep – Multirate infinitesimal step methods

The MRIStep time-stepping module in ARKode is designed for IVP of the form

$$ \dot{y} = f_s(t, y) + f_f(t, y), \qquad y(t_0) = y_0, \tag{2.8} $$

i.e. the right-hand side function is additively split into two components:
• $f_s(t, y)$ contains the “slow” components of the system. This will be integrated using a large time step $h_s$.
• $f_f(t, y)$ contains the “fast” components of the system. This will be integrated using a small time step $h_f$.
For such problems, MRIStep provides fixed-step multirate infinitesimal step methods (see [SKAW2009], [SKAW2012a], and [SKAW2012b]) that combine two Runge-Kutta methods. The slow (outer) method is an $s$ stage explicit Runge-Kutta method where the stage values and the new solution are computed by solving an auxiliary ODE with a fast (inner) Runge-Kutta method. This corresponds to the algorithm

$$
\begin{aligned}
w_1 &= y_n, \\
r_i &= \sum_{j=1}^{i-1} \left( A^s_{i,j} - A^s_{i-1,j} \right) f_s(w_j), \\
v_i(\tau_{i-1}) &= w_{i-1}, \\
\frac{dv_i}{d\tau} &= f_f(v_i) + \frac{1}{c^s_i - c^s_{i-1}}\, r_i, \quad \tau \in [\tau_{i-1}, \tau_i], \quad i = 2, \ldots, s+1, \\
w_i &= v_i(\tau_i), \\
y_{n+1} &= w_{s+1},
\end{aligned}
\tag{2.9}
$$

where the slow stages $w_i$ at times $\tau_i = t_n + c^s_i h_s$ are computed by solving the $v_i$ fast ODE on $[\tau_{i-1}, \tau_i]$ with the initial condition $w_{i-1}$, forcing term $r_i$, and $A^s_{s+1,j} = b^s_j$.

The MRIStep module provides a third order explicit-explicit method using the Knoth-Wolke-3-3 ERK for the slow and fast method. User-defined tables are also supported. A user-defined method will be first to third order accurate depending on the slow and fast tables provided. If both the slow and fast tables are second order, then the overall method will also be second order. If the slow and fast tables are both third order and the slow method satisfies an auxiliary condition (see [SKAW2012a]), then the overall method will also be third order.

Note that at this time the MRIStep module only supports explicit fast and slow tables where the stage times of the slow table must be unique and ordered (i.e., $c^s_i > c^s_{i-1}$) and the final stage time must be less than 1.

2.6 Error norms

In the process of controlling errors at various levels (time integration, nonlinear solution, linear solution), the methods in ARKode use a weighted root-mean-square norm, denoted $\| \cdot \|_{\text{WRMS}}$, for all error-like quantities,

$$ \|v\|_{\text{WRMS}} = \left( \frac{1}{N} \sum_{i=1}^{N} (v_i\, w_i)^2 \right)^{1/2}. \tag{2.10} $$

The utility of this norm arises in the specification of the weighting vector $w$, that combines the units of the problem with user-supplied values that specify an “acceptable” level of error. To this end, we construct an error weight vector using the most-recent step solution and user-supplied relative and absolute tolerances, namely

$$ w_i = \frac{1}{RTOL \cdot |y_{n-1,i}| + ATOL_i}. \tag{2.11} $$

Since $1/w_i$ represents a tolerance in the $i$-th component of the solution vector $y$, a vector whose WRMS norm is 1 is regarded as “small.” For brevity, unless specified otherwise we will drop the subscript WRMS on norms in the remainder of this section.

Additionally, for problems involving a non-identity mass matrix, $M \ne I$, the units of equation (2.2) may differ from the units of the solution $y$. In this case, we may additionally construct a residual weight vector,

$$ w_i = \frac{1}{RTOL \cdot |[M y_{n-1}]_i| + ATOL'_i}, \tag{2.12} $$

where the user may specify a separate absolute residual tolerance value or array, $ATOL'$. The choice of weighting vector used in any given norm is determined by the quantity being measured: values having “solution” units use (2.11), whereas values having “equation” units use (2.12). Obviously, for problems with $M = I$, the solution and equation units are identical, so the solvers in ARKode will use (2.11) when computing all error norms.

2.7 Time step adaptivity

A critical component of IVP “solvers” (rather than just time-steppers) is their adaptive control of local truncation error (LTE). At every step, we estimate the local error, and ensure that it satisfies tolerance conditions. If this local error test fails, then the step is recomputed with a reduced step size. To this end, the Runge-Kutta methods packaged within both the ARKStep and ERKStep modules admit an embedded solution $\tilde{y}_n$, as shown in equations (2.3) and (2.7).
Generally, these embedded solutions attain a slightly lower order of accuracy than the computed solution $y_n$. Denoting the order of accuracy for $y_n$ as $q$ and for $\tilde{y}_n$ as $p$, most of these embedded methods satisfy $p = q - 1$. These values of $q$ and $p$ correspond to the global orders of accuracy for the method and embedding, hence each admits local truncation errors satisfying [HW1993]

$$
\begin{aligned}
\|y_n - y(t_n)\| &= C h_n^{q+1} + \mathcal{O}(h_n^{q+2}), \\
\|\tilde{y}_n - y(t_n)\| &= D h_n^{p+1} + \mathcal{O}(h_n^{p+2}),
\end{aligned}
\tag{2.13}
$$

where $C$ and $D$ are constants independent of $h_n$, and where we have assumed exact initial conditions for the step, i.e. $y_{n-1} = y(t_{n-1})$. Combining these estimates, we have

$$
\begin{aligned}
\|y_n - \tilde{y}_n\| &= \|y_n - y(t_n) - \tilde{y}_n + y(t_n)\| \\
&\le \|y_n - y(t_n)\| + \|\tilde{y}_n - y(t_n)\| \\
&\le D h_n^{p+1} + \mathcal{O}(h_n^{p+2}).
\end{aligned}
$$

We therefore use the norm of the difference between $y_n$ and $\tilde{y}_n$ as an estimate for the LTE at the step $n$

$$ M T_n = \beta \left( y_n - \tilde{y}_n \right) = \beta h_n \sum_{i=1}^{s} \left[ \left( b^E_i - \tilde{b}^E_i \right) f_E(t^E_{n,i}, z_i) + \left( b^I_i - \tilde{b}^I_i \right) f_I(t^I_{n,i}, z_i) \right] \tag{2.14} $$

for ARK methods, and similarly for ERK methods. Here, $\beta > 0$ is an error bias to help account for the error constant $D$; the default value of this constant is $\beta = 1.5$, which may be modified by the user.

With this LTE estimate, the local error test is simply $\|T_n\| < 1$ since this norm includes the user-specified tolerances. If this error test passes, the step is considered successful, and the estimate is subsequently used to estimate the next step size; the algorithms used for this purpose are described below in the section Asymptotic error control. If the error test fails, the step is rejected and a new step size $h'$ is then computed using the same error controller as for successful steps. A new attempt at the step is made, and the error test is repeated. If the error test fails twice, then $h'/h$ is limited above to 0.3, and limited below to 0.1 after an additional step failure. After seven error test failures, control is returned to the user with a failure message.
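To make the interplay of the WRMS norm (2.10), the error weights (2.11) and the local error test concrete, here is a small standalone Python sketch. It is not the ARKode implementation: it assumes $M = I$ (so only the solution weights (2.11) are needed), and the function names `error_weights`, `wrms_norm` and `local_error_test` are hypothetical.

```python
import math


def error_weights(y_prev, rtol, atol):
    """Error weight vector (2.11) from the previous step solution."""
    return [1.0 / (rtol * abs(yi) + ai) for yi, ai in zip(y_prev, atol)]


def wrms_norm(v, w):
    """Weighted root-mean-square norm (2.10)."""
    return math.sqrt(sum((vi * wi) ** 2 for vi, wi in zip(v, w)) / len(v))


def local_error_test(y_new, y_embed, y_prev, rtol, atol, bias=1.5):
    """Return (eps, passed), where eps = ||T_n|| with T_n = bias*(y_n - y~_n).

    Assumes M = I; bias is the error bias beta (1.5 is the default quoted above).
    """
    w = error_weights(y_prev, rtol, atol)
    t_n = [bias * (a - b) for a, b in zip(y_new, y_embed)]
    eps = wrms_norm(t_n, w)
    return eps, eps < 1.0


# Example: two components, difference of 1e-4 between solution and embedding.
eps, passed = local_error_test([1.0001, 2.0001], [1.0, 2.0], [1.0, 2.0],
                               rtol=1e-3, atol=[1e-6, 1e-6])
```

Because the tolerances are folded into the weights, the acceptance threshold is always 1 regardless of the units or magnitude of the individual solution components.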
We note that all of the constants listed above are only the default values; each may be modified by the user.

We define the step size ratio between a prospective step $h'$ and a completed step $h$ as $\eta$, i.e. $\eta = h'/h$. This value is subsequently bounded from above by $\eta_{\max}$ to ensure that step size adjustments are not overly aggressive. This upper bound changes according to the step and history,

$$
\eta_{\max} =
\begin{cases}
\text{etamx1}, & \text{on the first step (default is 10000)}, \\
\text{growth}, & \text{on general steps (default is 20)}, \\
1, & \text{if the previous step had an error test failure.}
\end{cases}
$$

A flowchart detailing how the time steps are modified at each iteration to ensure solver convergence and successful steps is given in the figure below. Here, all norms correspond to the WRMS norm, and the error adaptivity function arkAdapt is supplied by one of the error control algorithms discussed in the subsections below.

For some problems it may be preferable to avoid small step size adjustments. This can be especially true for problems that construct a Newton Jacobian matrix or a preconditioner for a nonlinear or an iterative linear solve, where this construction is computationally expensive, and where convergence can be seriously hindered through use of an inaccurate matrix. To accommodate these scenarios, the step is left unchanged when $\eta \in [\eta_L, \eta_U]$. The default values for this interval are $\eta_L = 1$ and $\eta_U = 1.5$, and may be modified by the user.

We note that any choices for $\eta$ (or equivalently, $h'$) are subsequently constrained by the optional user-supplied bounds $h_{\min}$ and $h_{\max}$. Additionally, the time-stepping algorithms in ARKode may similarly limit $h'$ to adhere to a user-provided “TSTOP” stopping point, $t_{\text{stop}}$.
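The limits on the step size ratio described above can be summarized in a short standalone Python sketch. This is only one plausible reading of the text, not ARKode's actual logic: the order in which the bounds are applied, the precedence of the error-test-failure case over the first-step case, and the function name `limit_step` are all assumptions, and the TSTOP limit is left out.

```python
import math


def limit_step(h, eta, first_step=False, prev_failed=False,
               etamx1=10000.0, growth=20.0, eta_l=1.0, eta_u=1.5,
               h_min=0.0, h_max=float("inf")):
    """Apply the eta_max bound, the [eta_L, eta_U] dead zone, and h_min/h_max."""
    # Upper bound on the ratio depends on the step history.
    if prev_failed:
        eta_max = 1.0
    elif first_step:
        eta_max = etamx1
    else:
        eta_max = growth
    eta = min(eta, eta_max)
    # Avoid small adjustments: leave the step unchanged inside the dead zone.
    if eta_l <= eta <= eta_u:
        eta = 1.0
    # Constrain the magnitude by the user bounds, keeping the direction of h.
    magnitude = min(max(abs(eta * h), h_min), h_max)
    return math.copysign(magnitude, h)


h_grow = limit_step(0.1, 50.0)       # growth capped at a factor of 20
h_same = limit_step(0.1, 1.2)        # inside [1, 1.5]: step left unchanged
h_back = limit_step(-0.1, 0.5)       # backward integration keeps its sign
```

Keeping the step unchanged inside the dead zone lets an expensive Jacobian or preconditioner built for the current $h$ be reused on the next step.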
2.7.1 Asymptotic error control

As mentioned above, the time-stepping modules in ARKode adapt the step size in order to attain local errors within desired tolerances of the true solution. These adaptivity algorithms estimate the prospective step size $h'$ based on the asymptotic local error estimates (2.13). We define the values $\varepsilon_n$, $\varepsilon_{n-1}$ and $\varepsilon_{n-2}$ as

$$ \varepsilon_k \equiv \|T_k\| = \beta \|y_k - \tilde{y}_k\|, $$

corresponding to the local error estimates for three consecutive steps, $t_{n-3} \to t_{n-2} \to t_{n-1} \to t_n$. These local error history values are all initialized to 1 upon program initialization, to accommodate the few initial time steps of a calculation where some of these error estimates have not yet been computed. With these estimates, ARKode supports a variety of error control algorithms, as specified in the subsections below.

PID controller

This is the default time adaptivity controller used by the ARKStep and ERKStep modules. It derives from those found in [KC2003], [S1998], [S2003] and [S2006], and uses all three of the local error estimates $\varepsilon_n$, $\varepsilon_{n-1}$ and $\varepsilon_{n-2}$ in determination of a prospective step size,

$$ h' = h_n\, \varepsilon_n^{-k_1/p}\, \varepsilon_{n-1}^{k_2/p}\, \varepsilon_{n-2}^{-k_3/p}, $$

where the constants $k_1$, $k_2$ and $k_3$ default to 0.58, 0.21 and 0.1, respectively, and may be modified by the user. In this estimate, a floor of $\varepsilon > 10^{-10}$ is enforced to avoid division-by-zero errors.

PI controller

As with the previous method, the PI controller derives from those found in [KC2003], [S1998], [S2003] and [S2006], but it differs in that it only uses the two most recent step sizes in its adaptivity algorithm,

$$ h' = h_n\, \varepsilon_n^{-k_1/p}\, \varepsilon_{n-1}^{k_2/p}. $$

Here, $k_1$ and $k_2$ default to 0.8 and 0.31, respectively, though they may be changed by the user.

I controller

This is the standard time adaptivity control algorithm in use by most publicly-available ODE solver codes.
It bases the prospective time step estimate entirely on the current local error estimate,

\[ h' = h_n \, \varepsilon_n^{-k_1/p}. \]

By default, $k_1 = 1$, but that may be modified by the user.

Explicit Gustafsson controller

This step adaptivity algorithm was proposed in [G1991], and is primarily useful with explicit Runge-Kutta methods. In the notation of our earlier controllers, it has the form

\[
h' = \begin{cases}
h_1 \, \varepsilon_1^{-1/p}, & \text{on the first step},\\
h_n \, \varepsilon_n^{-k_1/p} \left( \varepsilon_n / \varepsilon_{n-1} \right)^{k_2/p}, & \text{on subsequent steps}.
\end{cases}
\tag{2.15}
\]

The default values of $k_1$ and $k_2$ are 0.367 and 0.268, respectively, and may be modified by the user.

Implicit Gustafsson controller

A version of the above controller suitable for implicit Runge-Kutta methods was introduced in [G1994], and has the form

\[
h' = \begin{cases}
h_1 \, \varepsilon_1^{-1/p}, & \text{on the first step},\\
h_n \left( h_n / h_{n-1} \right) \varepsilon_n^{-k_1/p} \left( \varepsilon_n / \varepsilon_{n-1} \right)^{-k_2/p}, & \text{on subsequent steps}.
\end{cases}
\tag{2.16}
\]

The algorithm parameters default to $k_1 = 0.98$ and $k_2 = 0.95$, but may be modified by the user.

ImEx Gustafsson controller

An ImEx version of these two preceding controllers is also available. This approach computes the estimate $h'_1$ arising from equation (2.15) and the estimate $h'_2$ arising from equation (2.16), and selects

\[ h' = \frac{h}{|h|} \min\left\{ |h'_1|, |h'_2| \right\}. \]

Here, equation (2.15) uses $k_1$ and $k_2$ with default values of 0.367 and 0.268, while equation (2.16) sets both parameters to the input $k_3$ that defaults to 0.95. All of these values may be modified by the user.

User-supplied controller

Finally, ARKode’s time-stepping modules allow the user to define their own time step adaptivity function,

\[ h' = H(y, t, h_n, h_{n-1}, h_{n-2}, \varepsilon_n, \varepsilon_{n-1}, \varepsilon_{n-2}, q, p), \]

to allow for problem-specific choices, or for continued experimentation with temporal error controllers.

2.8 Explicit stability

For problems that involve a nonzero explicit component, i.e.
$f_E(t, y) \ne 0$ in ARKStep or for any problem in ERKStep, explicit and ImEx Runge-Kutta methods may benefit from additional user-supplied information regarding the explicit stability region. All ARKode adaptivity methods utilize estimates of the local error, and it is often the case that such local error control will be sufficient for method stability, since unstable steps will typically exceed the error control tolerances. However, for problems in which $f_E(t, y)$ includes even moderately stiff components, and especially for higher-order integration methods, it may occur that a significant number of attempted steps will exceed the error tolerances. While these steps will automatically be recomputed, such trial-and-error can result in an unreasonable number of failed steps, increasing the cost of the computation. In these scenarios, a stability-based time step controller may also be useful.

Since the maximum stable explicit step for any method depends on the problem under consideration, in that the value $(h_n \lambda)$ must reside within a bounded stability region, where $\lambda$ are the eigenvalues of the linearized operator $\partial f_E / \partial y$, information on the maximum stable step size is not readily available to ARKode’s time-stepping modules. However, for many problems such information may be easily obtained through analysis of the problem itself, e.g. in an advection-diffusion calculation $f_I$ may contain the stiff diffusive components and $f_E$ may contain the comparably nonstiff advection terms. In this scenario, an explicitly stable step $h_{\text{exp}}$ would be predicted as one satisfying the Courant-Friedrichs-Lewy (CFL) stability condition for the advective portion of the problem,

\[ |h_{\text{exp}}| < \frac{\Delta x}{|\lambda|}, \]

where $\Delta x$ is the spatial mesh size and $\lambda$ is the fastest advective wave speed.

In these scenarios, a user may supply a routine to predict this maximum explicitly stable step size, $|h_{\text{exp}}|$.
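For a one-dimensional advection term, a routine of this kind might look like the following sketch. It is illustrative only: the name `cfl_stable_step` and its arguments are hypothetical and are not part of the ARKode interface, and the function simply returns the limiting value $\Delta x / |\lambda|$ of the CFL condition above, using the largest wave speed in magnitude.

```python
def cfl_stable_step(dx, wave_speeds):
    """Return dx / max|lambda|, the CFL limit on the explicit step size.

    dx          -- spatial mesh size (assumed uniform)
    wave_speeds -- iterable of advective wave speeds on the mesh
    """
    fastest = max(abs(s) for s in wave_speeds)
    if fastest == 0.0:
        # No advection at all: the explicit part imposes no restriction.
        return float("inf")
    return dx / fastest


# Example: mesh size 0.1 and wave speeds up to 4 in magnitude.
h_exp = cfl_stable_step(0.1, [1.0, -4.0, 2.0])
```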
If a value for $|h_{\text{exp}}|$ is supplied, it is compared against the value resulting from the local error controller, $|h_{\text{acc}}|$, and the eventual time step used will be limited accordingly,

\[ h' = \frac{h}{|h|} \min\left\{ c \, |h_{\text{exp}}|, \ |h_{\text{acc}}| \right\}. \]

Here the explicit stability step factor $c > 0$ (often called the “CFL number”) defaults to $1/2$ but may be modified by the user.

2.8.1 Fixed time stepping

While both the ARKStep and ERKStep time-stepping modules are designed for tolerance-based time step adaptivity, they additionally support a “fixed-step” mode. This mode is typically used for debugging purposes, for verification against hand-coded Runge-Kutta methods, or for problems where the time steps should be chosen based on other problem-specific information. In this mode, all internal time step adaptivity is disabled:
• temporal error control is disabled,
• nonlinear or linear solver non-convergence will result in an error (instead of a step size adjustment),
• no check against an explicit stability condition is performed.

Additional information on this mode is provided in the sections ARKStep Optional Inputs and ERKStep Optional Inputs.

2.9 Algebraic solvers

When solving a problem involving either a nonzero implicit component, $f_I(t, y) \ne 0$, or a non-identity mass matrix, $M \ne I$, systems of linear or nonlinear algebraic equations must be solved at each stage and/or step of the method. This section therefore focuses on the variety of mathematical methods provided in the ARKode infrastructure for such problems, including nonlinear solvers, linear solvers, preconditioners, iterative solver error control, implicit predictors, and techniques used for simplifying the above solves when using non-time-dependent mass-matrices.

2.9.1 Nonlinear solver methods

For both the DIRK and ARK methods corresponding to (2.2) and (2.5), an implicit system

\[ G(z_i) \equiv M z_i - h_n A^I_{i,i} f_I(t^I_{n,i}, z_i) - a_i = 0 \tag{2.17} \]

must be solved for each stage $z_i$, $i = 1, \ldots, s$, where we have the data

\[ a_i \equiv \left( y_{n-1} + h_n \sum_{j=1}^{i-1} \left[ A^E_{i,j} f_E(t^E_{n,j}, z_j) + A^I_{i,j} f_I(t^I_{n,j}, z_j) \right] \right) \]

for the ARK methods, or

\[ a_i \equiv \left( y_{n-1} + h_n \sum_{j=1}^{i-1} A^I_{i,j} f_I(t^I_{n,j}, z_j) \right) \]

for the DIRK methods. Here, if $f_I(t, y)$ depends nonlinearly on $y$ then (2.17) corresponds to a nonlinear system of equations; if $f_I(t, y)$ depends linearly on $y$ then this is a linear system of equations.

For systems of either type, ARKode provides a choice of solution strategies. The default solver choice is a variant of Newton’s method,

\[ z_i^{(m+1)} = z_i^{(m)} + \delta^{(m+1)}, \tag{2.18} \]

where $m$ is the Newton iteration index, and the Newton update $\delta^{(m+1)}$ in turn requires the solution of the Newton linear system

\[ \mathcal{A}\left( t^I_{n,i}, z_i^{(m)} \right) \delta^{(m+1)} = -G\left( z_i^{(m)} \right), \tag{2.19} \]

in which

\[ \mathcal{A}(t, z) \approx M - \gamma J(t, z), \quad J(t, z) = \frac{\partial f_I(t, z)}{\partial z}, \quad \text{and} \quad \gamma = h_n A^I_{i,i}. \tag{2.20} \]

When the problem involves an identity mass matrix, then as an alternative to Newton’s method, ARKode provides a fixed point iteration for solving the stages $z_i$, $i = 1, \ldots, s$,

\[ z_i^{(m+1)} = \Phi\left( z_i^{(m)} \right) \equiv z_i^{(m)} - G\left( z_i^{(m)} \right), \quad m = 0, 1, \ldots \tag{2.21} \]

This iteration may additionally be improved using a technique called “Anderson acceleration” [WN2011]. Unlike with Newton’s method, these methods do not require the solution of a linear system at each iteration, instead solving a low-dimensional least-squares problem to construct the nonlinear update.

Finally, if the user specifies that $f_I(t, y)$ depends linearly on $y$, and if the Newton-based nonlinear solver is chosen, then the problem (2.17) will be solved using only a single Newton iteration. In this case, an additional user-supplied argument indicates whether this Jacobian is time-dependent or not, signaling whether the Jacobian or preconditioner needs to be recomputed at each stage or time step, or if it can be reused throughout the full simulation.
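The two iterations (2.18)-(2.20) and (2.21) can be compared on a scalar stage equation. The following is a minimal sketch, assuming $M = 1$ and a scalar $f_I$ that does not depend on $t$; the function names, the stopping rule based on the size of the update, and the test problem $f_I(z) = -z^3$ are our own illustrative choices and do not reflect ARKode's actual implementation or its convergence test.

```python
def newton_stage(f, jac, a, gamma, z0, tol=1e-12, max_iter=20):
    """Newton iteration for G(z) = z - gamma*f(z) - a = 0 (scalar, M = 1)."""
    z = z0
    for m in range(max_iter):
        G = z - gamma * f(z) - a
        A = 1.0 - gamma * jac(z)      # scalar analogue of A = M - gamma*J
        delta = -G / A                # solve the Newton "linear system"
        z += delta
        if abs(delta) < tol:
            return z, m + 1
    raise RuntimeError("Newton iteration did not converge")


def fixed_point_stage(f, a, gamma, z0, tol=1e-12, max_iter=200):
    """Fixed point iteration z <- z - G(z), with no linear solve."""
    z = z0
    for m in range(max_iter):
        z_new = z - (z - gamma * f(z) - a)
        if abs(z_new - z) < tol:
            return z_new, m + 1
        z = z_new
    raise RuntimeError("fixed point iteration did not converge")


# Illustrative stage data: f_I(z) = -z**3, gamma = 0.1, a_i = 1.
f = lambda z: -z**3
jac = lambda z: -3.0 * z**2
z_newton, n_newton = newton_stage(f, jac, a=1.0, gamma=0.1, z0=1.0)
z_fixed, n_fixed = fixed_point_stage(f, a=1.0, gamma=0.1, z0=1.0)
```

On this mildly nonlinear example both iterations reach the same stage value, with Newton needing fewer iterations; each fixed point iteration, however, avoids the Jacobian evaluation and the linear solve.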
The optimal choice of solver (Newton vs fixed-point) is highly problem dependent. Since fixed-point solvers do not require the solution of any linear systems, each iteration may be significantly less costly than their Newton counterparts. However, this can come at the cost of slower convergence (or even divergence) in comparison with Newton-like methods. On the other hand, these fixed-point solvers do allow for user specification of the Anderson-accelerated subspace size, $m_k$. While the required amount of solver memory for acceleration grows proportionately to $m_k N$, larger values of $m_k$ may result in faster convergence. In our experience, this improvement is most significant for “small” values, e.g. $1 \le m_k \le 5$, and larger values of $m_k$ may not result in improved convergence.

While a Newton-based iteration is the default solver due to its increased robustness on very stiff problems, we strongly recommend that users also consider the fixed-point solver when attempting a new problem.

For either the Newton or fixed-point solvers, it is well-known that both the efficiency and robustness of the algorithm intimately depend on the choice of a good initial guess. The initial guess for these solvers is a prediction $z_i^{(0)}$ that is computed explicitly from previously-computed data (e.g. $y_{n-2}$, $y_{n-1}$, and $z_j$ where $j < i$). Additional information on the specific predictor algorithms is provided in the following section, Implicit predictors.

2.9.2 Linear solver methods

When a Newton-based method is chosen for solving each nonlinear system, a linear system of equations must be solved at each nonlinear iteration. For this solve ARKode provides several choices, including the option of a user-supplied linear solver module. The linear solver modules distributed with SUNDIALS are organized into two families: a direct family comprising direct linear solvers for dense, banded or sparse matrices, and a spils family comprising scaled, preconditioned, iterative (Krylov) linear solvers.
The methods offered through these modules are as follows:
• dense direct solvers, using either an internal SUNDIALS implementation or a BLAS/LAPACK implementation (serial version only),
• band direct solvers, using either an internal SUNDIALS implementation or a BLAS/LAPACK implementation (serial version only),
• sparse direct solvers, using either the KLU sparse matrix library [KLU], or the OpenMP or PThreads-enabled SuperLU_MT sparse matrix library [SuperLUMT] [Note that users will need to download and install the KLU or SuperLU_MT packages independent of ARKode],
• SPGMR, a scaled, preconditioned GMRES (Generalized Minimal Residual) solver,
• SPFGMR, a scaled, preconditioned FGMRES (Flexible Generalized Minimal Residual) solver,
• SPBCGS, a scaled, preconditioned Bi-CGStab (Bi-Conjugate Gradient Stable) solver,
• SPTFQMR, a scaled, preconditioned TFQMR (Transpose-free Quasi-Minimal Residual) solver, or
• PCG, a preconditioned CG (Conjugate Gradient method) solver for symmetric linear systems.

For large stiff systems where direct methods are often infeasible, the combination of an implicit integrator and a preconditioned Krylov method can yield a powerful tool because it combines established methods for stiff integration, nonlinear solver iteration, and Krylov (linear) iteration with a problem-specific treatment of the dominant sources of stiffness, in the form of a user-supplied preconditioner matrix [BH1989]. We note that the direct linear solver modules currently provided by SUNDIALS are only designed to be used with the serial and threaded vector representations.

Matrix-based linear solvers

In the case that a matrix-based linear solver is used, a modified Newton iteration is utilized. In a modified Newton iteration, the matrix $\mathcal{A}$ is held fixed for multiple Newton iterations.
More precisely, each Newton iteration is computed from the modified equation

\[ \tilde{\mathcal{A}}\left( \tilde{t}, \tilde{z} \right) \delta^{(m+1)} = -G\left( z_i^{(m)} \right), \tag{2.22} \]

in which

\[ \tilde{\mathcal{A}}(t, z) \approx M - \tilde{\gamma} J(t, z), \quad \text{and} \quad \tilde{\gamma} = \tilde{h} A^I_{i,i}. \tag{2.23} \]

Here, the solution $\tilde{z}$, time $\tilde{t}$, and step size $\tilde{h}$ upon which the modified equation relies are merely values of these quantities from a previous iteration. In other words, the matrix $\tilde{\mathcal{A}}$ is only computed rarely, and reused for repeated solves. The frequency at which $\tilde{\mathcal{A}}$ is recomputed defaults to 20 time steps, but may be modified by the user.

When using the dense and band SUNMatrix objects for the linear systems (2.22), the Jacobian $J$ may be supplied by a user routine, or approximated internally by finite-differences. In the case of differencing, we use the standard approximation

\[ J_{i,j}(t, z) \approx \frac{f_{I,i}(t, z + \sigma_j e_j) - f_{I,i}(t, z)}{\sigma_j}, \]

where $e_j$ is the $j$-th unit vector, and the increments $\sigma_j$ are given by

\[ \sigma_j = \max\left\{ \sqrt{U} \, |z_j|, \ \frac{\sigma_0}{w_j} \right\}. \]

Here $U$ is the unit roundoff, $\sigma_0$ is a small dimensionless value, and $w_j$ is the error weight defined in (2.11). In the dense case, this approach requires $N$ evaluations of $f_I$, one for each column of $J$. In the band case, the columns of $J$ are computed in groups, using the Curtis-Powell-Reid algorithm, with the number of $f_I$ evaluations equal to the matrix bandwidth.

We note that with sparse and user-supplied SUNMatrix objects, the Jacobian must be supplied by a user routine.

Matrix-free iterative linear solvers

In the case that a matrix-free iterative linear solver is chosen, an inexact Newton iteration is utilized. Here, the matrix $\mathcal{A}$ is not itself constructed since the algorithms only require the product of this matrix with a given vector. Additionally, each Newton system (2.19) is not solved completely, since these linear solvers are iterative (hence the “inexact” in the name). As a result,
for these linear solvers $\mathcal{A}$ is applied in a matrix-free manner,

\[ \mathcal{A}(t, z) \, v = M v - \gamma J(t, z) \, v. \]

The matrix-vector products $M v$ must be provided through a user-supplied routine; the matrix-vector products $J v$ are obtained by either calling an optional user-supplied routine, or through a finite difference approximation to the directional derivative:

\[ J(t, z) \, v \approx \frac{f_I(t, z + \sigma v) - f_I(t, z)}{\sigma}, \]

where the increment $\sigma = 1/\|v\|$ to ensure that $\|\sigma v\| = 1$.

As with the modified Newton method that reused $\mathcal{A}$ between solves, the inexact Newton iteration may also recompute the preconditioner $P$ infrequently to balance the high costs of matrix construction and factorization against the reduced convergence rate that may result from a stale preconditioner.

Updating the linear solver

In cases where recomputation of the Newton matrix $\tilde{\mathcal{A}}$ or preconditioner $P$ is lagged, these structures will be recomputed only in the following circumstances:
• when starting the problem,
• when more than 20 steps have been taken since the last update (this value may be modified by the user),
• when the value $\tilde{\gamma}$ of $\gamma$ at the last update satisfies $|\gamma/\tilde{\gamma} - 1| > 0.2$ (this value may be modified by the user),
• when a non-fatal convergence failure just occurred,
• when an error test failure just occurred, or
• if the problem is linearly implicit and $\gamma$ has changed by a factor larger than 100 times machine epsilon.

When an update is forced due to a convergence failure, an update of $\tilde{\mathcal{A}}$ or $P$ may or may not involve a re-evaluation of $J$ (in $\tilde{\mathcal{A}}$) or of Jacobian data (in $P$), depending on whether errors in the Jacobian were the likely cause of the failure. More generally, the decision is made to re-evaluate $J$ (or instruct the user to update $P$) when:
• starting the problem,
• more than 50 steps have been taken since the last evaluation,
• a convergence failure occurred with an outdated matrix, and the value $\tilde{\gamma}$ of $\gamma$ at the last update satisfies $|\gamma/\tilde{\gamma} - 1| > 0.2$,
• a convergence failure occurred that forced a step size reduction, or
• if the problem is linearly implicit and $\gamma$ has changed by a factor larger than 100 times machine epsilon.

However, for linear solvers and preconditioners that do not rely on costly matrix construction and factorization operations (e.g. when using a geometric multigrid method as preconditioner), it may be more efficient to update these structures more frequently than the above heuristics specify, since the increased rate of linear/nonlinear solver convergence may more than account for the additional cost of Jacobian/preconditioner construction. To this end, a user may specify that the system matrix $\mathcal{A}$ and/or preconditioner $P$ should be recomputed more frequently.

As will be further discussed in the section Preconditioning, in the case of most Krylov methods, preconditioning may be applied on the left, right, or on both sides of $\mathcal{A}$, with user-supplied routines for the preconditioner setup and solve operations.

2.9.3 Iteration Error Control

Nonlinear iteration error control

The stopping test for all of the nonlinear solver algorithms is related to the temporal local error test, with the goal of keeping the nonlinear iteration errors from interfering with local error control. Denoting the final computed value of each stage solution as $z_i^{(m)}$, and the true stage solution solving (2.17) as $z_i$, we want to ensure that the iteration error $z_i - z_i^{(m)}$ is “small” (recall that a norm less than 1 is already considered within an acceptable tolerance).
To this end, we first estimate the linear convergence rate $R_i$ of the nonlinear iteration. We initialize $R_i = 1$, and reset it to this value whenever $\tilde{\mathcal{A}}$ or $P$ are updated. After computing a nonlinear correction $\delta^{(m)} = z_i^{(m)} - z_i^{(m-1)}$, if $m > 0$ we update $R_i$ as

\[ R_i \leftarrow \max\left\{ 0.3 R_i, \ \left\| \delta^{(m)} \right\| / \left\| \delta^{(m-1)} \right\| \right\}, \]

where the factor 0.3 is user-modifiable.

Let $y_n^{(m)}$ denote the time-evolved solution constructed using our approximate nonlinear stage solutions, $z_i^{(m)}$, and let $y_n^{(\infty)}$ denote the time-evolved solution constructed using exact nonlinear stage solutions. We then use the estimate

\[ \left\| y_n^{(\infty)} - y_n^{(m)} \right\| \approx \max_i \left\| z_i^{(m+1)} - z_i^{(m)} \right\| \approx \max_i R_i \left\| z_i^{(m)} - z_i^{(m-1)} \right\| = \max_i R_i \left\| \delta^{(m)} \right\|. \]

Therefore our convergence (stopping) test for the nonlinear iteration for each stage is

\[ R_i \left\| \delta^{(m)} \right\| < \epsilon, \tag{2.24} \]

where the factor $\epsilon$ has default value 0.1. We default to a maximum of 3 nonlinear iterations. We also declare the nonlinear iteration to be divergent if any of the ratios $\|\delta^{(m)}\| / \|\delta^{(m-1)}\| > 2.3$ with $m > 0$. If convergence fails in the fixed point iteration, or in the Newton iteration with $J$ or $\mathcal{A}$ current, we reduce the step size $h_n$ by a factor of 0.25. The integration will be halted after 10 convergence failures, or if a convergence failure occurs with $h_n = h_{\min}$. However, since the nonlinearity of (2.17) may vary significantly based on the problem under consideration, these default constants may all be modified by the user.

Linear iteration error control

When a Krylov method is used to solve the linear Newton systems (2.19), its errors must also be controlled. To this end, we approximate the linear iteration error in the solution vector $\delta^{(m)}$ using the preconditioned residual vector, e.g. $r = P \mathcal{A} \delta^{(m)} + P G$ for the case of left preconditioning (the role of the preconditioner is further elaborated in the next section).
In an attempt to ensure that the linear iteration errors do not interfere with the nonlinear solution error and local time integration error controls, we require that the norm of the preconditioned linear residual satisfies

\[ \|r\| \le \frac{\epsilon_L \, \epsilon}{10}. \tag{2.25} \]

Here $\epsilon$ is the same value as that used above for the nonlinear error control. The factor of 10 is used to ensure that the linear solver error does not adversely affect the nonlinear solver convergence. Smaller values for the parameter $\epsilon_L$ are typically useful for strongly nonlinear or very stiff ODE systems, while easier ODE systems may benefit from a value closer to 1. The default value is $\epsilon_L = 0.05$, which may be modified by the user. We note that for linearly implicit problems the tolerance (2.25) is similarly used for the single Newton iteration.

2.9.4 Preconditioning

When using an inexact Newton method to solve the nonlinear system (2.17), an iterative method is used repeatedly to solve linear systems of the form $\mathcal{A} x = b$, where $x$ is a correction vector and $b$ is a residual vector. If this iterative method is one of the scaled preconditioned iterative linear solvers supplied with SUNDIALS, its efficiency may benefit tremendously from preconditioning. A system $\mathcal{A} x = b$ can be preconditioned using any one of:

\[
\begin{aligned}
(P^{-1} \mathcal{A}) \, x &= P^{-1} b && \text{[left preconditioning]},\\
(\mathcal{A} P^{-1}) \, P x &= b && \text{[right preconditioning]},\\
(P_L^{-1} \mathcal{A} P_R^{-1}) \, P_R x &= P_L^{-1} b && \text{[left and right preconditioning]}.
\end{aligned}
\]

These Krylov iterative methods are then applied to a system with the matrix $P^{-1} \mathcal{A}$, $\mathcal{A} P^{-1}$, or $P_L^{-1} \mathcal{A} P_R^{-1}$, instead of $\mathcal{A}$. In order to improve the convergence of the Krylov iteration, the preconditioner matrix $P$, or the product $P_L P_R$ in the third case, should in some sense approximate the system matrix $\mathcal{A}$. Simultaneously, in order to be cost-effective the matrix $P$ (or matrices $P_L$ and $P_R$) should be reasonably efficient to evaluate and solve.
Finding an optimal point in this trade-off between rapid convergence and low cost can be quite challenging. Good choices are often problem-dependent (for example, see [BH1989] for an extensive study of preconditioners for reaction-transport systems).

Most of the iterative linear solvers supplied with SUNDIALS allow for all three types of preconditioning (left, right or both), although for non-symmetric matrices $\mathcal{A}$ we know of few situations where preconditioning on both sides is superior to preconditioning on one side only (with the product $P = P_L P_R$). Moreover, for a given preconditioner matrix, the merits of left vs. right preconditioning are unclear in general, so we recommend that the user experiment with both choices. Performance can differ between these since the inverse of the left preconditioner is included in the linear system residual whose norm is being tested in the Krylov algorithm. As a rule, however, if the preconditioner is the product of two matrices, we recommend that preconditioning be done either on the left only or the right only, rather than using one factor on each side. An exception to this rule is the PCG solver, which itself assumes a symmetric matrix $\mathcal{A}$, since the PCG algorithm in fact applies the single preconditioner matrix $P$ in both left/right fashion as $P^{-1/2} \mathcal{A} P^{-1/2}$.

Typical preconditioners are based on approximations to the system Jacobian, $J = \partial f_I / \partial y$. Since the Newton iteration matrix involved is $\mathcal{A} = M - \gamma J$, any approximation $\bar{J}$ to $J$ yields a matrix that is of potential use as a preconditioner, namely $P = M - \gamma \bar{J}$. Because the Krylov iteration occurs within a Newton iteration and further also within a time integration, and since each of these iterations has its own test for convergence, the preconditioner may use a very crude approximation, as long as it captures the dominant numerical features of the system.
We have found that the combination of a preconditioner with the Newton-Krylov iteration, using even a relatively poor approximation to the Jacobian, can be surprisingly superior to using the same matrix without Krylov acceleration (i.e., a modified Newton iteration), as well as to using the Newton-Krylov method with no preconditioning.

2.9.5 Implicit predictors

For problems with implicit components, a prediction algorithm is employed for constructing the initial guesses for each implicit Runge-Kutta stage, $z_i^{(0)}$. As is well-known with nonlinear solvers, the selection of a good initial guess can have dramatic effects on both the speed and robustness of the solve, making the difference between rapid quadratic convergence versus divergence of the iteration. To this end, a variety of prediction algorithms are provided. In each case, the stage guesses $z_i^{(0)}$ are constructed explicitly using readily-available information, including the previous step solutions $y_{n-1}$ and $y_{n-2}$, as well as any previous stage solutions $z_j$, $j < i$. In most cases, prediction is performed by constructing an interpolating polynomial through existing data, which is then evaluated at the desired stage time to provide an inexpensive but (hopefully) reasonable prediction of the stage solution. Specifically, for most Runge-Kutta methods each stage solution satisfies

\[ z_i \approx y(t^I_{n,i}), \]

so by constructing an interpolating polynomial $p_q(t)$ through a set of existing data, the initial guess at stage solutions may be approximated as

\[ z_i^{(0)} = p_q(t^I_{n,i}). \tag{2.26} \]

As the stage times for implicit ARK and DIRK stages usually satisfy $c^I_j > 0$, it is typically the case that $t^I_{n,j}$ is outside of the time interval containing the data used to construct $p_q(t)$, hence (2.26) will correspond to an extrapolant instead of an interpolant.
The dangers of using a polynomial interpolant to extrapolate values outside the interpolation interval are well-known, with higher-order polynomials and predictions further outside the interval resulting in the greatest potential inaccuracies.

The prediction algorithms available in ARKode therefore construct a variety of interpolants $p_q(t)$, having different polynomial order and using different interpolation data, to support ‘optimal’ choices for different types of problems, as described below.

Trivial predictor

The so-called “trivial predictor” is given by the formula

\[ p_0(t) = y_{n-1}. \]

While this piecewise-constant interpolant is clearly not a highly accurate candidate for problems with time-varying solutions, it is often the most robust approach for highly stiff problems, or for problems with implicit constraints whose violation may cause illegal solution values (e.g. a negative density or temperature).

Maximum order predictor

At the opposite end of the spectrum, ARKode’s interpolation module can be used to construct a higher-order polynomial interpolant, $p_q(t)$, based on the two most-recently-computed solutions, $\{y_{n-2}, f_{n-2}, y_{n-1}, f_{n-1}\}$. This can then be used to extrapolate predicted stage solutions for each stage time $t^I_{n,i}$. This polynomial order is the same as that specified by the user for dense output.

Variable order predictor

This predictor attempts to use higher-order polynomials $p_q(t)$ for predicting earlier stages, and lower-order interpolants for later stages. It uses the same interpolation module as described above, but chooses $q$ adaptively based on the stage index $i$, under the (rather tenuous) assumption that the stage times are increasing, i.e. $c^I_j < c^I_k$ for $j < k$:

\[ q = \max\{ q_{\max} - i, \ 1 \}. \]

Cutoff order predictor

This predictor follows a similar idea as the previous algorithm, but monitors the actual stage times to determine the polynomial interpolant to use for prediction.
Denoting $\tau = c^I_i \frac{h_n}{h_{n-1}}$, the polynomial degree $q$ is chosen as:

\[
q = \begin{cases}
q_{\max}, & \text{if } \tau < \tfrac{1}{2},\\
1, & \text{otherwise}.
\end{cases}
\]

Bootstrap predictor

This predictor does not use any information from the preceding step, instead using information only within the current step $[t_{n-1}, t_n]$. In addition to using the solution and ODE right-hand side function, $y_{n-1}$ and $f(t_{n-1}, y_{n-1})$, this approach uses the right-hand side from a previously computed stage solution in the same step, $f(t_{n-1} + c^I_j h, z_j)$, to construct a quadratic Hermite interpolant for the prediction. If we define the constants $\tilde{h} = c^I_j h$ and $\tau = c^I_i h$, the predictor is given by

\[ z_i^{(0)} = y_{n-1} + \left( \tau - \frac{\tau^2}{2\tilde{h}} \right) f(t_{n-1}, y_{n-1}) + \frac{\tau^2}{2\tilde{h}} f(t_{n-1} + \tilde{h}, z_j). \]

For stages without a nonzero preceding stage time, i.e. those for which no $j < i$ has $c^I_j \ne 0$, this method reduces to using the trivial predictor $z_i^{(0)} = y_{n-1}$. For stages having multiple preceding nonzero $c^I_j$, we choose the stage having largest $c^I_j$ value, to minimize the level of extrapolation used in the prediction.

We note that in general, each stage solution $z_j$ has significantly worse accuracy than the time step solutions $y_{n-1}$, due to the difference between the stage order and the method order in Runge-Kutta methods. As a result, the accuracy of this predictor will generally be rather limited, but it is provided for problems in which this increased stage error is better than the effects of extrapolation far outside of the previous time step interval $[t_{n-2}, t_{n-1}]$.

We further note that although this method could be used with non-identity mass matrix $M \ne I$, support for that mode is not currently implemented, so selection of this predictor in the case that $M \ne I$ will result in use of the trivial predictor.
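The bootstrap formula above can be written out directly. The following is a minimal sketch for a scalar solution, assuming $M = I$; the function name `bootstrap_predictor` and its argument names are hypothetical, and the right-hand side values are passed in as already-computed numbers rather than evaluated inside the routine.

```python
def bootstrap_predictor(y_prev, f_prev, f_stage, c_i, c_j, h):
    """Quadratic Hermite prediction of stage i from data within the current step.

    y_prev  -- y_{n-1}
    f_prev  -- f(t_{n-1}, y_{n-1})
    f_stage -- f(t_{n-1} + c_j*h, z_j) from an earlier stage j < i
    c_i     -- abscissa of the stage being predicted
    c_j     -- abscissa of the earlier stage (the largest nonzero one available)
    h       -- current step size
    """
    if c_j == 0.0:
        # No earlier stage with a nonzero stage time: trivial predictor.
        return y_prev
    h_tilde = c_j * h
    tau = c_i * h
    w = tau**2 / (2.0 * h_tilde)
    return y_prev + (tau - w) * f_prev + w * f_stage


# Check on y(t) = t**2, for which a quadratic interpolant is exact:
# t_{n-1} = 1, h = 0.5, earlier stage at c_j = 0.5, predicted stage at c_i = 1.
z0 = bootstrap_predictor(y_prev=1.0, f_prev=2.0, f_stage=2.5,
                         c_i=1.0, c_j=0.5, h=0.5)
```

Because the interpolant matches $y_{n-1}$ and two derivative values, it reproduces any quadratic solution exactly; here the prediction equals $y(1.5) = 2.25$.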
Minimum correction predictor

The last predictor is not interpolation based; instead it utilizes all existing stage information from the current step to create a predictor containing all but the current stage solution. Specifically, as discussed in equations (2.3) and (2.17), each stage solves a nonlinear equation

\[
\begin{aligned}
z_i &= y_{n-1} + h_n \sum_{j=1}^{i-1} A^E_{i,j} f_E(t^E_{n,j}, z_j) + h_n \sum_{j=1}^{i} A^I_{i,j} f_I(t^I_{n,j}, z_j),\\
\Leftrightarrow \quad G(z_i) &\equiv z_i - h_n A^I_{i,i} f_I(t^I_{n,i}, z_i) - a_i = 0.
\end{aligned}
\]

This prediction method merely computes the predictor $z_i$ as

\[
\begin{aligned}
z_i &= y_{n-1} + h_n \sum_{j=1}^{i-1} A^E_{i,j} f_E(t^E_{n,j}, z_j) + h_n \sum_{j=1}^{i-1} A^I_{i,j} f_I(t^I_{n,j}, z_j),\\
\Leftrightarrow \quad z_i &= a_i.
\end{aligned}
\]

We again note that although this method could be used with non-identity mass matrix $M \ne I$, support for that mode is not currently implemented, so selection of this predictor in the case that $M \ne I$ will result in use of the trivial predictor.

2.9.6 Mass matrix solver

Within the algorithms described above, there are multiple locations where a matrix-vector product

\[ b = M v \tag{2.27} \]

or a linear solve

\[ x = M^{-1} b \tag{2.28} \]

is required. Of course, for problems in which $M = I$ both of these operators are trivial. However for problems with non-identity $M$, these linear solves (2.28) may be handled using any valid linear solver module, in the same manner as described in the section Linear solver methods for solving the linear Newton systems.

At present, for DIRK and ARK problems using a matrix-based solver for the Newton nonlinear iterations, the type of matrix (dense, band, sparse, or custom) for the Jacobian matrix $J$ must match the type of mass matrix $M$, since these are combined to form the Newton system matrix $\tilde{\mathcal{A}}$. When matrix-based methods are employed, the user must supply a routine to compute $M$ in the appropriate form to match the structure of $\mathcal{A}$, with a user-supplied routine of type ARKLsMassFn().
This matrix structure is used internally to perform any requisite mass matrix-vector products (2.27). When matrix-free methods are selected, a routine must be supplied to perform the mass-matrix-vector product, $M v$. As with iterative solvers for the Newton systems, preconditioning may be applied to aid in solution of the mass matrix systems (2.28). When using an iterative mass matrix linear solver, we require that the norm of the preconditioned linear residual satisfies

\[ \|r\| \le \epsilon_L \, \epsilon, \tag{2.29} \]

where again, $\epsilon$ is the nonlinear solver tolerance parameter from (2.24). When using iterative system and mass matrix linear solvers, $\epsilon_L$ may be specified separately for both tolerances (2.25) and (2.29).

In the above algorithmic description there are three locations where a linear solve of the form (2.28) is required: (a) in constructing the time-evolved solution $y_n$, (b) in estimating the local temporal truncation error, and (c) in constructing predictors for the implicit solver iteration (see section Maximum order predictor). Specifically, to construct the time-evolved solution $y_n$ from equation (2.3) we must solve

\[
\begin{aligned}
M y_n &= M y_{n-1} + h_n \sum_{i=1}^{s} \left( b^E_i f_E(t^E_{n,i}, z_i) + b^I_i f_I(t^I_{n,i}, z_i) \right),\\
\Leftrightarrow \quad M (y_n - y_{n-1}) &= h_n \sum_{i=1}^{s} \left( b^E_i f_E(t^E_{n,i}, z_i) + b^I_i f_I(t^I_{n,i}, z_i) \right),\\
\Leftrightarrow \quad M \nu &= h_n \sum_{i=1}^{s} \left( b^E_i f_E(t^E_{n,i}, z_i) + b^I_i f_I(t^I_{n,i}, z_i) \right),
\end{aligned}
\]

for the update $\nu = y_n - y_{n-1}$. For construction of the stages $z_i$ this requires no mass matrix solves (as these are included in the nonlinear system solve). Similarly, in computing the local temporal error estimate $T_n$ from equation (2.14) we must solve systems of the form

\[ M T_n = h \sum_{i=1}^{s} \left[ \left( b^E_i - \tilde{b}^E_i \right) f_E(t^E_{n,i}, z_i) + \left( b^I_i - \tilde{b}^I_i \right) f_I(t^I_{n,i}, z_i) \right]. \tag{2.30} \]

Lastly, in constructing dense output and implicit predictors of order 2 or higher (as in the section Maximum order predictor above), we must compute the derivative information $f_k$ from the equation

\[ M f_k = f_E(t_k, y_k) + f_I(t_k, y_k). \]
In total, these require only two mass-matrix linear solves (2.28) per attempted time step, with one more upon completion of a time step that meets the solution accuracy requirements. When fixed time-stepping is used ($h_n = h$), the solve (2.30) is not performed at each attempted step.

2.10 Rootfinding

Many of the time-stepping modules in ARKode also support a rootfinding feature. This means that, while integrating the IVP (2.1), these can also find the roots of a set of user-defined functions $g_i(t, y)$ that depend on $t$ and the solution vector $y = y(t)$. The number of these root functions is arbitrary, and if more than one $g_i$ is found to have a root in any given interval, the various root locations are found and reported in the order that they occur on the $t$ axis, in the direction of integration.

Generally, this rootfinding feature finds only roots of odd multiplicity, corresponding to changes in sign of $g_i(t, y(t))$, denoted $g_i(t)$ for short. If a user root function has a root of even multiplicity (no sign change), it will almost certainly be missed due to the realities of floating-point arithmetic. If such a root is desired, the user should reformulate the root function so that it changes sign at the desired root.

The basic scheme used is to check for sign changes of any $g_i(t)$ over each time step taken, and then (when a sign change is found) to home in on the root (or roots) with a modified secant method [HS1980]. In addition, each time $g$ is evaluated, ARKode checks to see if $g_i(t) = 0$ exactly, and if so it reports this as a root. However, if an exact zero of any $g_i$ is found at a point $t$, ARKode computes $g(t + \delta)$ for a small increment $\delta$, slightly further in the direction of integration, and if any $g_i(t + \delta) = 0$ also, ARKode stops and reports an error.
This way, each time ARKode takes a time step, it is guaranteed that the values of all $g_i$ are nonzero at some past value of $t$, beyond which a search for roots is to be done.

At any given time in the course of the time-stepping, after suitable checking and adjusting has been done, ARKode has an interval $(t_{\text{lo}}, t_{\text{hi}}]$ in which roots of the $g_i(t)$ are to be sought, such that $t_{\text{hi}}$ is further ahead in the direction of integration, and all $g_i(t_{\text{lo}}) \ne 0$. The endpoint $t_{\text{hi}}$ is either $t_n$, the end of the time step last taken, or the next requested output time $t_{\text{out}}$ if this comes sooner. The endpoint $t_{\text{lo}}$ is either $t_{n-1}$, or the last output time $t_{\text{out}}$ (if this occurred within the last step), or the last root location (if a root was just located within this step), possibly adjusted slightly toward $t_n$ if an exact zero was found. The algorithm checks $g(t_{\text{hi}})$ for zeros, and it checks for sign changes in $(t_{\text{lo}}, t_{\text{hi}})$. If no sign changes are found, then either a root is reported (if some $g_i(t_{\text{hi}}) = 0$) or we proceed to the next time interval (starting at $t_{\text{hi}}$). If one or more sign changes were found, then a loop is entered to locate the root to within a rather tight tolerance, given by

$$ \tau = 100\, U\, (|t_n| + |h|) \qquad (\text{where } U = \text{unit roundoff}). $$

Whenever sign changes are seen in two or more root functions, the one deemed most likely to have its root occur first is the one with the largest value of $|g_i(t_{\text{hi}})| / |g_i(t_{\text{hi}}) - g_i(t_{\text{lo}})|$, corresponding to the closest to $t_{\text{lo}}$ of the secant method values. At each pass through the loop, a new value $t_{\text{mid}}$ is set, strictly within the search interval, and the values of $g_i(t_{\text{mid}})$ are checked. Then either $t_{\text{lo}}$ or $t_{\text{hi}}$ is reset to $t_{\text{mid}}$ according to which subinterval is found to have the sign change. If there is none in $(t_{\text{lo}}, t_{\text{mid}})$ but some $g_i(t_{\text{mid}}) = 0$, then that root is reported. The loop continues until $|t_{\text{hi}} - t_{\text{lo}}| < \tau$, and then the reported root location is $t_{\text{hi}}$.
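The refinement loop can be sketched in a few lines of C for a single scalar root function. This is a minimal illustration under stated assumptions, not ARKode's implementation: the names root_fn, root_tolerance, refine_root and g_example are hypothetical, the unit roundoff $U$ is taken to be DBL_EPSILON (double precision), and the choice of $t_{\text{mid}}$ uses the weighted secant formula with weight $\alpha$ that is described in the paragraph following this sketch.

```c
#include <float.h>
#include <math.h>

typedef double (*root_fn)(double t);

/* Example root function (hypothetical): g(t) = t^2 - 2, root at sqrt(2). */
static double g_example(double t) { return t * t - 2.0; }

/* Tolerance tau = 100 U (|t_n| + |h|), with U assumed to be DBL_EPSILON. */
static double root_tolerance(double tn, double h)
{
  return 100.0 * DBL_EPSILON * (fabs(tn) + fabs(h));
}

/* Refine a bracketed sign change of g until |thi - tlo| < tau and return thi.
   Assumes g(tlo) and g(thi) are nonzero with opposite signs on entry. */
static double refine_root(root_fn g, double tlo, double thi, double tau)
{
  double glo = g(tlo), ghi = g(thi);
  double alpha = 1.0;
  int side = 0, sideprev = -1; /* 1 = low side, 2 = high side */

  while (fabs(thi - tlo) >= tau) {
    double tmid, gmid, fracint, fracsub;

    /* alpha is 1 unless the two previous passes chose the same side */
    if (side == sideprev) alpha = (side == 2) ? 2.0 * alpha : 0.5 * alpha;
    else alpha = 1.0;

    /* weighted secant value */
    tmid = thi - ghi * (thi - tlo) / (ghi - alpha * glo);

    /* move tmid inward if it lies within tau/2 of either endpoint */
    fracint = fabs(thi - tlo) / tau;
    fracsub = (fracint > 5.0) ? 0.1 : 0.5 / fracint;
    if (fabs(tmid - tlo) < 0.5 * tau) tmid = tlo + fracsub * (thi - tlo);
    if (fabs(thi - tmid) < 0.5 * tau) tmid = thi - fracsub * (thi - tlo);

    gmid = g(tmid);
    sideprev = side;
    if (gmid == 0.0) { /* exact zero at tmid: report it */
      thi = tmid;
      break;
    }
    if ((glo < 0.0) != (gmid < 0.0)) { /* sign change in (tlo, tmid) */
      thi = tmid; ghi = gmid; side = 1;
    } else {                           /* sign change in (tmid, thi) */
      tlo = tmid; glo = gmid; side = 2;
    }
  }
  return thi;
}
```

For instance, refine_root(g_example, 1.0, 2.0, root_tolerance(2.0, 1.0)) brackets the root of $t^2 - 2$ on $(1, 2]$. Because each pass keeps $t_{\text{mid}}$ at least $\tau/2$ away from both endpoints, the interval shrinks by at least $\tau/2$ per pass and the loop terminates.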
In the loop to locate the root of $g_i(t)$, the formula for $t_{\text{mid}}$ is

$$ t_{\text{mid}} = t_{\text{hi}} - \frac{g_i(t_{\text{hi}}) (t_{\text{hi}} - t_{\text{lo}})}{g_i(t_{\text{hi}}) - \alpha g_i(t_{\text{lo}})}, $$

where $\alpha$ is a weight parameter. On the first two passes through the loop, $\alpha$ is set to 1, making $t_{\text{mid}}$ the secant method value. Thereafter, $\alpha$ is reset according to the side of the subinterval (low vs high, i.e. toward $t_{\text{lo}}$ vs toward $t_{\text{hi}}$) in which the sign change was found in the previous two passes. If the two sides were opposite, $\alpha$ is set to 1. If the two sides were the same, $\alpha$ is halved (if on the low side) or doubled (if on the high side). The value of $t_{\text{mid}}$ is closer to $t_{\text{lo}}$ when $\alpha < 1$ and closer to $t_{\text{hi}}$ when $\alpha > 1$. If the above value of $t_{\text{mid}}$ is within $\tau/2$ of $t_{\text{lo}}$ or $t_{\text{hi}}$, it is adjusted inward, such that its fractional distance from the endpoint (relative to the interval size) is between 0.1 and 0.5 (with 0.5 being the midpoint), and the actual distance from the endpoint is at least $\tau/2$.

Finally, we note that when running in parallel, ARKode’s rootfinding module assumes that the entire set of root defining functions $g_i(t, y)$ is replicated on every MPI task. Since in these cases the vector $y$ is distributed across tasks, it is the user’s responsibility to perform any necessary inter-task communication to ensure that $g_i(t, y)$ is identical on each task.

CHAPTER THREE

CODE ORGANIZATION

The family of solvers referred to as SUNDIALS consists of the solvers CVODE and ARKode (for ODE systems), KINSOL (for nonlinear algebraic systems), and IDA (for differential-algebraic systems). In addition, SUNDIALS also includes variants of CVODE and IDA with sensitivity analysis capabilities (using either forward or adjoint methods), called CVODES and IDAS, respectively. The various solvers of this family share many subordinate modules.
For this reason, it is organized as a family, with a directory structure that exploits that sharing (see the following Figures SUNDIALS organization, SUNDIALS tree and SUNDIALS examples). The following is a list of the solver packages presently available, and the basic functionality of each:

• CVODE, a linear multistep solver for stiff and nonstiff ODE systems $\dot{y} = f(t, y)$ based on Adams and BDF methods;
• CVODES, a linear multistep solver for stiff and nonstiff ODEs with sensitivity analysis capabilities;
• ARKode, a Runge-Kutta based solver for stiff, nonstiff, and mixed ODE systems;
• IDA, a linear multistep solver for differential-algebraic systems $F(t, y, \dot{y}) = 0$ based on BDF methods;
• IDAS, a linear multistep solver for differential-algebraic systems with sensitivity analysis capabilities;
• KINSOL, a solver for nonlinear algebraic systems $F(u) = 0$.

Fig. 3.1: SUNDIALS organization: High-level diagram of the SUNDIALS structure

Fig. 3.2: SUNDIALS tree: Directory structure of the source tree.

Fig. 3.3: SUNDIALS examples: Directory structure of the examples.

3.1 ARKode organization

The ARKode package is written in the ANSI C language. The following summarizes the basic structure of the package, although knowledge of this structure is not necessary for its use.

The overall organization of the ARKode package is shown in Figure ARKode organization. The central integration modules, implemented in the files arkode.h, arkode_impl.h, arkode_butcher.h, arkode.c, arkode_arkstep.c, arkode_erkstep.c and arkode_butcher.c, deal with the evaluation of integration stages, the nonlinear solvers, estimation of the local truncation error, selection of step size, and interpolation to user output points, among other issues.
ARKode currently supports modified Newton, inexact Newton, and accelerated fixed-point solvers for these nonlinearly implicit problems. However, when using the Newton-based iterations, or when using a non-identity mass matrix $M \ne I$, ARKode has flexibility in the choice of method used to solve the linear sub-systems that arise. Therefore, for any user problem invoking the Newton solvers, or any user problem with $M \ne I$, one (or more) of the linear system solver modules should be specified by the user, which is then invoked as needed during the integration process.

Fig. 3.4: ARKode organization: Overall structure of the ARKode package. Modules specific to ARKode are the timesteppers, linear solver interfaces and preconditioners: ARKSTEP, ERKSTEP, ARKBBDPRE, ARKBANDPRE; all other items correspond to generic solver and auxiliary modules. Note also that the LAPACK, KLU and SuperLU_MT support is through interfaces to external packages. Users will need to download and compile those packages independently.

For solving these linear systems, ARKode’s linear solver interface supports both direct and iterative linear solvers built using the generic SUNLINSOL API (see Description of the SUNLinearSolver module). These solvers may utilize a SUNMATRIX object for storing Jacobian information, or they may be matrix-free. Since ARKode can operate on any valid SUNLINSOL implementation, the set of linear solver modules available to ARKode will expand as new SUNLINSOL modules are developed.

For users employing dense or banded Jacobians, ARKode includes algorithms for their approximation through difference quotients, although the user also has the option of supplying a routine to compute the Jacobian (or an approximation to it) directly. This user-supplied routine is required when using sparse or user-supplied Jacobian matrices.
For users employing iterative linear solvers, ARKode includes an algorithm for the approximation by difference quotients of the product $A v$. Again, the user has the option of providing routines for this operation, in two phases: setup (preprocessing of Jacobian data) and multiplication.

When solving problems with non-identity mass matrices, corresponding user-supplied routines for computing either the mass matrix $M$ or the product $M v$ are required. Additionally, the type of linear solver module (iterative, dense-direct, band-direct, sparse-direct) used for both the IVP system and mass matrix must match.

For preconditioned iterative methods for either the system or mass matrix solves, the preconditioning must be supplied by the user, again in two phases: setup and solve. While there is no default choice of preconditioner analogous to the difference-quotient approximation in the direct case, the references [BH1989] and [B1992], together with the example and demonstration programs included with ARKode and CVODE, offer considerable assistance in building simple preconditioners.

ARKode’s linear solver interface consists of four primary phases, devoted to

1. memory allocation and initialization,
2. setup of the matrix/preconditioner data involved,
3. solution of the system, and
4. freeing of memory.

The setup and solution phases are separate because the evaluation of Jacobians and preconditioners is done only periodically during the integration process, and only as required to achieve convergence.

ARKode also provides two rudimentary preconditioner modules, for use with any of the Krylov iterative linear solvers. The first, ARKBANDPRE, is intended to be used with the serial or threaded vector data structures (NVECTOR_SERIAL, NVECTOR_OPENMP and NVECTOR_PTHREADS), and provides a banded difference-quotient approximation to the Jacobian as the preconditioner, with corresponding setup and solve routines.
The second preconditioner module, ARKBBDPRE, is intended to work with the parallel vector data structure, NVECTOR_PARALLEL, and generates a preconditioner that is a block-diagonal matrix with each block being a band matrix owned by a single processor.

All state information used by ARKode to solve a given problem is saved in a single opaque memory structure, and a pointer to that structure is returned to the user. For C and C++ applications there is no global data in the ARKode package, and so in this respect it is reentrant. State information specific to the linear solver interface is saved in a separate data structure, a pointer to which resides in the ARKode memory structure. State information specific to the linear solver implementation (and matrix implementation, if applicable) is stored in separate data structures, which are returned to the user upon construction and subsequently provided to ARKode for use. We note that the ARKode Fortran interface, however, currently uses global variables, so at most one of each of these objects may be created per memory space (i.e. one per MPI task in distributed memory computations).

CHAPTER FOUR

USING ARKSTEP FOR C AND C++ APPLICATIONS

This chapter is concerned with the use of the ARKStep time-stepping module for the solution of initial value problems (IVPs) in a C or C++ language setting. The following sections discuss the header files and the layout of the user’s main program, and provide descriptions of the ARKStep user-callable functions and user-supplied functions.

The example programs described in the companion document [R2018] may be helpful. Those codes may be used as templates for new codes and are included in the ARKode package examples subdirectory.
Users with applications written in Fortran should see the chapter FARKODE, an Interface Module for FORTRAN Applications, which describes the Fortran/C interface module for ARKStep, and may look to the Fortran example programs also described in the companion document [R2018]. These codes are also located in the ARKode package examples directory. The user should be aware that not all SUNLINSOL, SUNMATRIX, and preconditioning modules are compatible with all NVECTOR implementations. Details on compatibility are given in the documentation for each SUNMATRIX (see Matrix Data Structures) and each SUNLINSOL module (see Description of the SUNLinearSolver module). For example, NVECTOR_PARALLEL is not compatible with the dense, banded, or sparse SUNMATRIX types, or with the corresponding dense, banded, or sparse SUNLINSOL modules. Please check the sections Matrix Data Structures and Description of the SUNLinearSolver module to verify compatibility between these modules. In addition to that documentation, we note that the ARKBANDPRE preconditioning module is only compatible with the NVECTOR_SERIAL, NVECTOR_OPENMP or NVECTOR_PTHREADS vector implementations, and the preconditioner module ARKBBDPRE can only be used with NVECTOR_PARALLEL. ARKStep uses various input and output constants from the shared ARKode infrastructure. These are defined as needed in this chapter, but for convenience the full list is provided separately in the section Appendix: ARKode Constants. The relevant information on using ARKStep’s C and C++ interfaces is detailed in the following sub-sections. 4.1 Access to library and header files At this point, it is assumed that the installation of ARKode, following the procedure described in the section ARKode Installation Procedure, has been completed successfully. 
Regardless of where the user’s application program resides, its associated compilation and load commands must make reference to the appropriate locations for the library and header files required by ARKode. The relevant library files are

• libdir/libsundials_arkode.lib,
• libdir/libsundials_nvec*.lib,

where the file extension .lib is typically .so for shared libraries and .a for static libraries. The relevant header files are located in the subdirectories

• incdir/include/arkode
• incdir/include/sundials
• incdir/include/nvector
• incdir/include/sunmatrix
• incdir/include/sunlinsol
• incdir/include/sunnonlinsol

The directories libdir and incdir are the installation library and include directories, respectively. For a default installation, these are instdir/lib and instdir/include, respectively, where instdir is the directory where SUNDIALS was installed (see the section ARKode Installation Procedure for further details).

4.2 Data Types

The sundials_types.h file contains the definition of the variable type realtype, which is used by the SUNDIALS solvers for all floating-point data, the definition of the integer type sunindextype, which is used for vector and matrix indices, and booleantype, which is used for certain logic operations within SUNDIALS.

4.2.1 Floating point types

The type “realtype” can be set to float, double, or long double, depending on how SUNDIALS was installed (with the default being double). The user can change the precision of the SUNDIALS solvers’ floating-point arithmetic at the configuration stage (see the section Configuration options (Unix/Linux)).

Additionally, based on the current precision, sundials_types.h defines the values BIG_REAL to be the largest value representable as a realtype, SMALL_REAL to be the smallest positive value representable as a realtype, and UNIT_ROUNDOFF to be the smallest realtype number, $\varepsilon$, such that $1.0 + \varepsilon \ne 1.0$.
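To make the meaning of the unit roundoff concrete, the following C sketch checks its defining property in double precision using only the standard <float.h> limits. It is an illustration under assumptions: it uses DBL_EPSILON as a stand-in for UNIT_ROUNDOFF, which presumes a double-precision build whose sundials_types.h maps the constant onto that limit, and the helper name perturbs_one is hypothetical.

```c
#include <float.h>

/* Returns 1 if adding eps to 1.0 changes the value in double arithmetic,
   and 0 otherwise. The volatile store forces the sum to be rounded to
   double, even on platforms that compute in extended precision. */
static int perturbs_one(double eps)
{
  volatile double sum = 1.0 + eps;
  return sum != 1.0;
}
```

With this helper, perturbs_one(DBL_EPSILON) is nonzero, whereas a much smaller increment such as DBL_EPSILON / 4 is absorbed by rounding and leaves 1.0 unchanged.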
Within SUNDIALS, real constants may be set to have the appropriate precision by way of a macro called RCONST. It is this macro that needs the ability to branch on the definition of realtype. In ANSI C, a floating-point constant with no suffix is stored as a double. Placing the suffix “F” at the end of a floating point constant makes it a float, whereas using the suffix “L” makes it a long double. For example,

    #define A 1.0
    #define B 1.0F
    #define C 1.0L

defines A to be a double constant equal to 1.0, B to be a float constant equal to 1.0, and C to be a long double constant equal to 1.0. The macro call RCONST(1.0) automatically expands to 1.0 if realtype is double, to 1.0F if realtype is float, or to 1.0L if realtype is long double. SUNDIALS uses the RCONST macro internally to declare all of its floating-point constants.

A user program which uses the type realtype and the RCONST macro to handle floating-point constants is precision-independent, except for any calls to precision-specific standard math library functions. Users can, however, use the types double, float, or long double in their code (assuming that this usage is consistent with the size of realtype values that are passed to and from SUNDIALS). Thus, a previously existing piece of ANSI C code can use SUNDIALS without modifying the code to use realtype, so long as the SUNDIALS libraries have been compiled using the same precision (for details see the section ARKode Installation Procedure).

4.2.2 Integer types used for vector and matrix indices

The type sunindextype can be either a 32- or 64-bit signed integer. The default is the portable int64_t type, and the user can change it to int32_t at the configuration stage. The configuration system will detect if the compiler does not support portable types, and will replace int32_t and int64_t with int and long int, respectively, to ensure use of the desired sizes on Linux, Mac OS X, and Windows platforms. SUNDIALS currently does not support unsigned integer types for vector and matrix indices, although these could be added in the future if there is sufficient demand.

A user program which uses sunindextype to handle vector and matrix indices will work with both index storage types except for any calls to index storage-specific external libraries. (Our C and C++ example programs use sunindextype.) Users can, however, use any one of int, long int, int32_t, int64_t or long long int in their code, assuming that this usage is consistent with the typedef for sunindextype on their architecture. Thus, a previously existing piece of ANSI C code can use SUNDIALS without modifying the code to use sunindextype, so long as the SUNDIALS libraries use the appropriate index storage type (for details see the section ARKode Installation Procedure).

4.3 Header Files

When using ARKStep, the calling program must include several header files so that various macros and data types can be used. The header file that is always required is:

• arkode/arkode_arkstep.h, the main header file for the ARKStep time-stepping module, which defines several types and various constants, includes function prototypes, and includes the shared arkode/arkode.h and arkode/arkode_ls.h header files.

Note that arkode.h includes sundials_types.h directly, which defines the types realtype, sunindextype and booleantype and the constants SUNFALSE and SUNTRUE, so a user program does not need to include sundials_types.h directly.

Additionally, the calling program must also include an NVECTOR implementation header file, of the form nvector/nvector_***.h, corresponding to the user’s preferred data layout and form of parallelism.
See the section Vector Data Structures for details on the appropriate name. This file in turn includes the header file sundials_nvector.h which defines the abstract N_Vector data type.

If the user includes a non-trivial implicit component to their ODE system, then each time step will require a nonlinear solver for the resulting systems of equations; the default for this is a modified Newton iteration. If using a nondefault nonlinear solver module, or when interacting with a SUNNONLINSOL module directly, the calling program must also include a SUNNONLINSOL header file, of the form sunnonlinsol/sunnonlinsol_***.h where *** is the name of the nonlinear solver module (see the section Nonlinear Solver Data Structures for more information). This file in turn includes the header file sundials_nonlinearsolver.h which defines the abstract SUNNonlinearSolver data type.

If using a nonlinear solver that requires the solution of a linear system of the form $\mathcal{A} x = b$ (e.g., the default Newton iteration), then a linear solver module header file will also be required. Similarly, if the ODE system involves a non-identity mass matrix $M \ne I$, then each time step will require a linear solver for systems of the form $M x = b$. The header files corresponding to the SUNDIALS-provided linear solver modules available for use with ARKode are:

• Direct linear solvers:
  – sunlinsol/sunlinsol_dense.h, which is used with the dense linear solver module, SUNLINSOL_DENSE;
  – sunlinsol/sunlinsol_band.h, which is used with the banded linear solver module, SUNLINSOL_BAND;
  – sunlinsol/sunlinsol_lapackdense.h, which is used with the LAPACK dense linear solver module, SUNLINSOL_LAPACKDENSE;
  – sunlinsol/sunlinsol_lapackband.h, which is used with the LAPACK banded linear solver module, SUNLINSOL_LAPACKBAND;
  – sunlinsol/sunlinsol_klu.h, which is used with the KLU sparse linear solver module, SUNLINSOL_KLU;
  – sunlinsol/sunlinsol_superlumt.h, which is used with the SuperLU_MT sparse linear solver module, SUNLINSOL_SUPERLUMT;
• Iterative linear solvers:
  – sunlinsol/sunlinsol_spgmr.h, which is used with the scaled, preconditioned GMRES Krylov linear solver module, SUNLINSOL_SPGMR;
  – sunlinsol/sunlinsol_spfgmr.h, which is used with the scaled, preconditioned FGMRES Krylov linear solver module, SUNLINSOL_SPFGMR;
  – sunlinsol/sunlinsol_spbcgs.h, which is used with the scaled, preconditioned Bi-CGStab Krylov linear solver module, SUNLINSOL_SPBCGS;
  – sunlinsol/sunlinsol_sptfqmr.h, which is used with the scaled, preconditioned TFQMR Krylov linear solver module, SUNLINSOL_SPTFQMR;
  – sunlinsol/sunlinsol_pcg.h, which is used with the scaled, preconditioned CG Krylov linear solver module, SUNLINSOL_PCG;

The header files for the SUNLINSOL_DENSE and SUNLINSOL_LAPACKDENSE linear solver modules include the file sunmatrix/sunmatrix_dense.h, which defines the SUNMATRIX_DENSE matrix module, as well as various functions and macros for acting on such matrices.

The header files for the SUNLINSOL_BAND and SUNLINSOL_LAPACKBAND linear solver modules include the file sunmatrix/sunmatrix_band.h, which defines the SUNMATRIX_BAND matrix module, as well as various functions and macros for acting on such matrices.

The header files for the SUNLINSOL_KLU and SUNLINSOL_SUPERLUMT linear solver modules include the file sunmatrix/sunmatrix_sparse.h, which defines the SUNMATRIX_SPARSE matrix module, as well as various functions and macros for acting on such matrices.

The header files for the Krylov iterative solvers include the file sundials/sundials_iterative.h, which enumerates the preconditioning type and (for the SPGMR and SPFGMR solvers) the choices for the Gram-Schmidt orthogonalization process.
Other headers may be needed, according to the choice of preconditioner, etc. For example, if preconditioning for an iterative linear solver were performed using the ARKBBDPRE module, the header arkode/arkode_bbdpre.h is needed to access the preconditioner initialization routines.

4.4 A skeleton of the user’s main program

The following is a skeleton of the user’s main program (or calling program) for the integration of an ODE IVP using the ARKStep module. Most of the steps are independent of the NVECTOR, SUNMATRIX, SUNLINSOL and SUNNONLINSOL implementations used. For the steps that are not, refer to the sections Vector Data Structures, Matrix Data Structures, Description of the SUNLinearSolver module, and Nonlinear Solver Data Structures for the specific name of the function to be called or macro to be referenced.

1. Initialize parallel or multi-threaded environment, if appropriate.

   For example, call MPI_Init to initialize MPI if used, or set num_threads, the number of threads to use within the threaded vector functions, if used.

2. Set problem dimensions, etc.

   This generally includes the problem size, N, and may include the local vector length Nlocal.

   Note: The variables N and Nlocal should be of type sunindextype.

3. Set vector of initial values

   To set the vector y0 of initial values, use the appropriate functions defined by the particular NVECTOR implementation.

   For native SUNDIALS vector implementations (except the CUDA and RAJA based ones), use a call of the form

       y0 = N_VMake_***(..., ydata);

   if the realtype array ydata containing the initial values of $y$ already exists. Otherwise, create a new vector by making a call of the form

       y0 = N_VNew_***(...);

   and then set its elements by accessing the underlying data where it is located with a call of the form

       ydata = N_VGetArrayPointer_***(y0);

   See the sections The NVECTOR_SERIAL Module through The NVECTOR_PTHREADS Module for details.

   For the HYPRE and PETSc vector wrappers, first create and initialize the underlying vector, and then create the NVECTOR wrapper with a call of the form

       y0 = N_VMake_***(yvec);

   where yvec is a HYPRE or PETSc vector. Note that calls like N_VNew_***(...) and N_VGetArrayPointer_***(...) are not available for these vector wrappers. See the sections The NVECTOR_PARHYP Module and The NVECTOR_PETSC Module for details.

   If using either the CUDA- or RAJA-based vector implementations, use a call of the form

       y0 = N_VMake_***(..., c);

   where c is a pointer to a suncudavec or sunrajavec vector class if this class already exists. Otherwise, create a new vector and then access its underlying data with a call of the form

       N_VGetDeviceArrayPointer_***

   or

       N_VGetHostArrayPointer_***

   Note that the vector class will allocate memory on both the host and device when instantiated. See the sections The NVECTOR_CUDA Module and The NVECTOR_RAJA Module for details.

4. Create ARKStep object

   Call arkode_mem = ARKStepCreate(...) to create the ARKStep memory block. ARKStepCreate() returns a void* pointer to this memory structure. See the section ARKStep initialization and deallocation functions for details.

5. Specify integration tolerances

   Call ARKStepSStolerances() or ARKStepSVtolerances() to specify either a scalar relative tolerance and scalar absolute tolerance, or a scalar relative tolerance and a vector of absolute tolerances, respectively. Alternatively, call ARKStepWFtolerances() to specify a function which sets directly the weights used in evaluating WRMS vector norms. See the section ARKStep tolerance specification functions for details.

   If a problem with non-identity mass matrix is used, and the solution units differ considerably from the equation units, absolute tolerances for the equation residuals (nonlinear and linear) may be specified separately through calls to ARKStepResStolerance(), ARKStepResVtolerance(), or ARKStepResFtolerance().

6. Create matrix object

   If a nonlinear solver requiring a linear solver will be used (e.g., a Newton iteration) and the linear solver will be a matrix-based linear solver, then a template Jacobian matrix must be created by using the appropriate functions defined by the particular SUNMATRIX implementation.

   For the SUNDIALS-supplied SUNMATRIX implementations, the matrix object may be created using a call of the form

       SUNMatrix A = SUNBandMatrix(...);

   or

       SUNMatrix A = SUNDenseMatrix(...);

   or

       SUNMatrix A = SUNSparseMatrix(...);

   Similarly, if the problem involves a non-identity mass matrix, and the mass-matrix linear systems will be solved using a direct linear solver, then a template mass matrix must be created by using the appropriate functions defined by the particular SUNMATRIX implementation.

   NOTE: The dense, banded, and sparse matrix objects are usable only in a serial or threaded environment.

7. Create linear solver object

   If a nonlinear solver requiring a linear solver will be used (e.g., a Newton iteration), or if the problem involves a non-identity mass matrix, then the desired linear solver object(s) must be created by using the appropriate functions defined by the particular SUNLINSOL implementation.

   For any of the SUNDIALS-supplied SUNLINSOL implementations, the linear solver object may be created using a call of the form

       SUNLinearSolver LS = SUNLinSol_*(...);

   where * can be replaced with “Dense”, “SPGMR”, or other options, as discussed in the sections Linear solver interface functions and Description of the SUNLinearSolver module.

8. Set linear solver optional inputs

   Call *Set* functions from the selected linear solver module to change optional inputs specific to that linear solver. See the documentation for each SUNLINSOL module in the section Description of the SUNLinearSolver module for details.

9. Attach linear solver module

   If a linear solver was created above for implicit stage solves, initialize the ARKLS linear solver interface by attaching the linear solver object (and Jacobian matrix object, if applicable) with the call (for details see the section Linear solver interface functions):

       ier = ARKStepSetLinearSolver(...);

   Similarly, if the problem involves a non-identity mass matrix, initialize the ARKLS mass matrix linear solver interface by attaching the mass linear solver object (and mass matrix object, if applicable) with the call (for details see the section Linear solver interface functions):

       ier = ARKStepSetMassLinearSolver(...);

10. Set optional inputs

    Call ARKStepSet* functions to change any optional inputs that control the behavior of ARKStep from their default values. See the section Optional input functions for details.

11. Create nonlinear solver object

    If the problem involves an implicit component, and if a non-default nonlinear solver object will be used for implicit stage solves (see the section Nonlinear solver interface functions), then the desired nonlinear solver object must be created by using the appropriate functions defined by the particular SUNNONLINSOL implementation, e.g., NLS = SUNNonlinSol_***(...); where *** is the name of the nonlinear solver (see the section Nonlinear Solver Data Structures for details).

    For the SUNDIALS-supplied SUNNONLINSOL implementations, the nonlinear solver object may be created using a call of the form

        SUNNonlinearSolver NLS = SUNNonlinSol_Newton(...);

    or

        SUNNonlinearSolver NLS = SUNNonlinSol_FixedPoint(...);

12. Attach nonlinear solver module

    If a nonlinear solver object was created above, then it must be attached to ARKStep using the call (for details see the section Nonlinear solver interface functions):

        ier = ARKStepSetNonlinearSolver(...);

13. Set nonlinear solver optional inputs

    Call the appropriate set functions for the selected nonlinear solver module to change optional inputs specific to that nonlinear solver. These must be called after attaching the nonlinear solver to ARKStep, otherwise the optional inputs will be overridden by ARKStep defaults. See the section Nonlinear Solver Data Structures for more information on optional inputs.

14. Specify rootfinding problem

    Optionally, call ARKStepRootInit() to initialize a rootfinding problem to be solved during the integration of the ODE system. See the section Rootfinding initialization function for general details, and the section Optional input functions for relevant optional input calls.

15. Advance solution in time

    For each point at which output is desired, call

        ier = ARKStepEvolve(arkode_mem, tout, yout, &tret, itask);

    Here, itask specifies the return mode. The vector yout (which can be the same as the vector y0 above) will contain $y(t_{\text{out}})$. See the section ARKStep solver function for details.

16. Get optional outputs

    Call ARKStepGet* functions to obtain optional output. See the section Optional output functions for details.

17. Deallocate memory for solution vector

    Upon completion of the integration, deallocate memory for the vector y (or yout) by calling the destructor function:

        N_VDestroy(y);

18. Free solver memory

    Call ARKStepFree(&arkode_mem) to free the memory allocated for the ARKStep module (and any nonlinear solver module).

19. Free linear solver and matrix memory

    Call SUNLinSolFree() and (possibly) SUNMatDestroy() to free any memory allocated for the linear solver and matrix objects created above.

20. Finalize MPI, if used

    Call MPI_Finalize to terminate MPI.

SUNDIALS provides some linear solvers only as a means for users to get problems running and not as highly efficient solvers. For example, if solving a dense system, we suggest using the LAPACK solvers if the size of the linear system is > 50,000 (thanks to A. Nicolai for his testing and recommendation). The table below shows the linear solver interfaces available as SUNLinearSolver modules and the vector implementations required for use. As an example, one cannot use the dense direct solver interfaces with the MPI-based vector implementation. However, as discussed in section Description of the SUNLinearSolver module, the SUNDIALS packages operate on generic SUNLinearSolver objects, allowing a user to develop their own solvers should they so desire.

4.4.1 SUNDIALS linear solver interfaces and vector implementations that can be used for each

Linear Solver   Serial  Parallel  OpenMP  pThreads  hypre  PETSc  CUDA  RAJA  User
Interface               (MPI)                       Vec.   Vec.               Suppl.
Dense           X                 X       X                                   X
Band            X                 X       X                                   X
LapackDense     X                 X       X                                   X
LapackBand      X                 X       X                                   X
KLU             X                 X       X                                   X
SuperLU_MT      X                 X       X                                   X
SPGMR           X       X         X       X         X      X      X     X     X
SPFGMR          X       X         X       X         X      X      X     X     X
SPBCGS          X       X         X       X         X      X      X     X     X
SPTFQMR         X       X         X       X         X      X      X     X     X
PCG             X       X         X       X         X      X      X     X     X
User supplied   X       X         X       X         X      X      X     X     X

4.5 User-callable functions

This section describes the functions that are called by the user to set up and then solve an IVP using the ARKStep time-stepping module. Some of these are required; however, starting with the section Optional input functions, the functions listed involve optional inputs/outputs or restarting, and those paragraphs may be skipped for a casual use of ARKode’s ARKStep module. In any case, refer to the preceding section, A skeleton of the user’s main program, for the correct order of these calls.
On an error, each user-callable function returns a negative value (or NULL if the function returns a pointer) and sends an error message to the error handler routine, which prints the message to stderr by default. However, the user can set a file as error output or can provide her own error handler function (see the section Optional input functions for details).

4.5.1 ARKStep initialization and deallocation functions

void* ARKStepCreate(ARKRhsFn fe, ARKRhsFn fi, realtype t0, N_Vector y0)

This function creates an internal memory block for a problem to be solved using the ARKStep time-stepping module in ARKode.

Arguments:
• fe – the name of the C function (of type ARKRhsFn()) defining the explicit portion of the right-hand side function in $M \dot{y} = f_E(t,y) + f_I(t,y)$.
• fi – the name of the C function (of type ARKRhsFn()) defining the implicit portion of the right-hand side function in $M \dot{y} = f_E(t,y) + f_I(t,y)$.
• t0 – the initial value of $t$.
• y0 – the initial condition vector $y(t_0)$.

Return value: If successful, a pointer to initialized problem memory of type void*, to be passed to all user-facing ARKStep routines listed below. If unsuccessful, a NULL pointer will be returned, and an error message will be printed to stderr.

void ARKStepFree(void** arkode_mem)

This function frees the problem memory arkode_mem created by ARKStepCreate().

Arguments:
• arkode_mem – pointer to the ARKStep memory block.

Return value: None

4.5.2 ARKStep tolerance specification functions

These functions specify the integration tolerances. One of them should be called before the first call to ARKStepEvolve(); otherwise default values of reltol = 1e-4 and abstol = 1e-9 will be used, which may be entirely incorrect for a specific problem.

The integration tolerances reltol and abstol define a vector of error weights, ewt.
In the case of ARKStepSStolerances(), this vector has components

ewt[i] = 1.0/(reltol*abs(y[i]) + abstol);

whereas in the case of ARKStepSVtolerances() the vector components are given by

ewt[i] = 1.0/(reltol*abs(y[i]) + abstol[i]);

This vector is used in all error and convergence tests, which use a weighted RMS norm on all error-like vectors $v$:

$$\|v\|_{WRMS} = \left( \frac{1}{N} \sum_{i=1}^{N} (v_i \, ewt_i)^2 \right)^{1/2},$$

where $N$ is the problem dimension.

Alternatively, the user may provide a custom function to compute the ewt vector, through a call to ARKStepWFtolerances().

int ARKStepSStolerances(void* arkode_mem, realtype reltol, realtype abstol)

This function specifies scalar relative and absolute tolerances.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• reltol – scalar relative tolerance.
• abstol – scalar absolute tolerance.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if the ARKStep memory was not allocated by the time-stepping module
• ARK_ILL_INPUT if an argument has an illegal value (e.g. a negative tolerance).

int ARKStepSVtolerances(void* arkode_mem, realtype reltol, N_Vector abstol)

This function specifies a scalar relative tolerance and a vector absolute tolerance (a potentially different absolute tolerance for each vector component).

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• reltol – scalar relative tolerance.
• abstol – vector containing the absolute tolerances for each solution component.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if the ARKStep memory was not allocated by the time-stepping module
• ARK_ILL_INPUT if an argument has an illegal value (e.g. a negative tolerance).

int ARKStepWFtolerances(void* arkode_mem, ARKEwtFn efun)

This function specifies a user-supplied function efun to compute the error weight vector ewt.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• efun – the name of the function (of type ARKEwtFn()) that implements the error weight vector computation.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if the ARKStep memory was not allocated by the time-stepping module

Moreover, for problems involving a non-identity mass matrix $M \ne I$, the units of the solution vector $y$ may differ from the units of the IVP, posed for the vector $My$. When this occurs, iterative solvers for the Newton linear systems and the mass matrix linear systems may require a different set of tolerances. Since the relative tolerance is dimensionless, but the absolute tolerance encodes a measure of what is “small” in the units of the respective quantity, a user may optionally define absolute tolerances in the equation units. In this case, ARKStep defines a vector of residual weights, rwt, for measuring convergence of these iterative solvers. In the case of ARKStepResStolerance(), this vector has components

rwt[i] = 1.0/(reltol*abs(My[i]) + rabstol);

whereas in the case of ARKStepResVtolerance() the vector components are given by

rwt[i] = 1.0/(reltol*abs(My[i]) + rabstol[i]);

This residual weight vector is used in all iterative solver convergence tests, which similarly use a weighted RMS norm on all residual-like vectors $v$:

$$\|v\|_{WRMS} = \left( \frac{1}{N} \sum_{i=1}^{N} (v_i \, rwt_i)^2 \right)^{1/2},$$

where $N$ is the problem dimension.

As with the error weight vector, the user may provide a custom function to compute the rwt vector, through a call to ARKStepResFtolerance(). Further information on all three of these functions is provided below.

int ARKStepResStolerance(void* arkode_mem, realtype rabstol)

This function specifies a scalar absolute residual tolerance.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rabstol – scalar absolute residual tolerance.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if the ARKStep memory was not allocated by the time-stepping module
• ARK_ILL_INPUT if an argument has an illegal value (e.g. a negative tolerance).

int ARKStepResVtolerance(void* arkode_mem, N_Vector rabstol)

This function specifies a vector of absolute residual tolerances.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rabstol – vector containing the absolute residual tolerances for each solution component.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if the ARKStep memory was not allocated by the time-stepping module
• ARK_ILL_INPUT if an argument has an illegal value (e.g. a negative tolerance).

int ARKStepResFtolerance(void* arkode_mem, ARKRwtFn rfun)

This function specifies a user-supplied function rfun to compute the residual weight vector rwt.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rfun – the name of the function (of type ARKRwtFn()) that implements the residual weight vector computation.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if the ARKStep memory was not allocated by the time-stepping module

General advice on the choice of tolerances

For many users, the appropriate choices for tolerance values in reltol, abstol, and rabstol are a concern. The following pieces of advice are relevant.

1. The scalar relative tolerance reltol is to be set to control relative errors. So a value of $10^{-4}$ means that errors are controlled to 0.01%. We do not recommend using reltol larger than $10^{-3}$. On the other hand, reltol should not be so small that it is comparable to the unit roundoff of the machine arithmetic (generally around $10^{-15}$ for double precision).

2. The absolute tolerances abstol (whether scalar or vector) need to be set to control absolute errors when any components of the solution vector $y$ may be so small that pure relative error control is meaningless. For example, if $y_i$ starts at some nonzero value, but in time decays to zero, then pure relative error control on $y_i$ makes no sense (and is overly costly) after $y_i$ is below some noise level. Then abstol (if scalar) or abstol[i] (if a vector) needs to be set to that noise level. If the different components have different noise levels, then abstol should be a vector. For example, see the example problem ark_robertson.c, and the discussion of it in the ARKode Examples Documentation [R2018]. In that problem, the three components vary between 0 and 1, and have different noise levels; hence the atols vector therein. It is impossible to give any general advice on abstol values, because the appropriate noise levels are completely problem-dependent. The user or modeler hopefully has some idea as to what those noise levels are.

3. The residual absolute tolerances rabstol (whether scalar or vector) follow a similar explanation as for abstol, except that these should be set to the noise level of the equation components, i.e. the noise level of $My$. For problems in which $M = I$, it is recommended that rabstol be left unset, in which case it defaults to the already-supplied abstol values.

4. Finally, it is important to pick all the tolerance values conservatively, because they control the error committed on each individual step. The final (global) errors are an accumulation of those per-step errors, where that accumulation factor is problem-dependent. A general rule of thumb is to reduce the tolerances by a factor of 10 from the actual desired limits on errors. So if you want 0.01% relative accuracy (globally), a good choice for reltol is $10^{-5}$.
In any case, it is a good idea to do a few experiments with the tolerances to see how the computed solution values vary as tolerances are reduced.

Advice on controlling nonphysical negative values

In many applications, some components in the true solution are always positive or non-negative, though at times very small. In the numerical solution, however, small negative (nonphysical) values can then occur. In most cases, these values are harmless, and simply need to be controlled, not eliminated, but in other cases any value that violates a constraint may cause a simulation to halt. For both of these scenarios the following pieces of advice are relevant.

1. The best way to control the size of unwanted negative computed values is with tighter absolute tolerances. Again this requires some knowledge of the noise level of these components, which may or may not be different for different components. Some experimentation may be needed.

2. If output plots or tables are being generated, and it is important to avoid having negative numbers appear there (for the sake of avoiding a long explanation of them, if nothing else), then eliminate them, but only in the context of the output medium. Then the internal values carried by the solver are unaffected. Remember that a small negative value in $y$ returned by ARKStep, with magnitude comparable to abstol or less, is equivalent to zero as far as the computation is concerned.

3. The user’s right-hand side routines $f_E$ and $f_I$ should never change a negative value in the solution vector $y$ to a non-negative value in an attempt to “fix” this problem, since this can lead to numerical instability. If the $f_E$ or $f_I$ routines cannot tolerate a zero or negative value (e.g. because there is a square root or log), then the offending value should be changed to zero or a tiny positive number in a temporary variable (not in the input $y$ vector) for the purposes of computing $f_E(t,y)$ or $f_I(t,y)$.

4. Positivity and non-negativity constraints on components can be enforced by use of the recoverable error return feature in the user-supplied right-hand side functions, $f_E$ and $f_I$. When a recoverable error is encountered, ARKStep will retry the step with a smaller step size, which typically alleviates the problem. However, because this option involves some additional overhead cost, it should only be exercised if the use of absolute tolerances to control the computed values is unsuccessful.

4.5.3 Linear solver interface functions

As previously explained, the Newton iterations used in solving implicit systems within ARKStep require the solution of linear systems of the form

$$\mathcal{A}\left(z_i^{(m)}\right) \delta^{(m+1)} = -G\left(z_i^{(m)}\right),$$

where

$$\mathcal{A} \approx M - \gamma J, \qquad J = \frac{\partial f_I}{\partial y}.$$

ARKode’s ARKLs linear solver interface supports all valid SUNLinearSolver modules for this task.

Matrix-based SUNLinearSolver modules utilize SUNMatrix objects to store the approximate Jacobian matrix $J$, the Newton matrix $\mathcal{A}$, the mass matrix $M$, and factorizations used throughout the solution process.

Matrix-free SUNLinearSolver modules instead use iterative methods to solve the Newton systems of equations, and only require the action of the matrix on a vector, $\mathcal{A}v$. With most of these methods, preconditioning can be done on the left only, on the right only, on both the left and the right, or not at all. The exceptions to this rule are SPFGMR, which supports right preconditioning only, and PCG, which performs symmetric preconditioning. For the specification of a preconditioner, see the iterative linear solver portions of the sections Optional input functions and User-supplied functions.

If preconditioning is done, user-supplied functions should be used to define left and right preconditioner matrices $P_1$ and $P_2$ (either of which could be the identity matrix), such that the product $P_1 P_2$ approximates the Newton matrix $\mathcal{A} = M - \gamma J$.
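To make “the action of the matrix on a vector” concrete, the following is a minimal sketch in plain C of forming $\mathcal{A}v = Mv - \gamma Jv$ from two operator-action callbacks, without ever storing $\mathcal{A}$. It is an illustration only and not the ARKLs implementation: the names MatVecFn, newton_matvec, identity_times, and example_jac_times are hypothetical and not part of SUNDIALS, and the 2-by-2 operators are chosen purely as an example.

```c
#include <stddef.h>

/* Hypothetical callback type (not a SUNDIALS type): computes out = B*v
   for some n-by-n linear operator B, without exposing B as a matrix. */
typedef void (*MatVecFn)(const double *v, double *out, size_t n);

/* Forms Av = M*v - gamma*(J*v) using only operator actions.
   tmp is caller-provided workspace of length n. */
void newton_matvec(MatVecFn mass_times, MatVecFn jac_times, double gamma,
                   const double *v, double *Av, double *tmp, size_t n)
{
  size_t i;
  mass_times(v, Av, n);        /* Av  = M v */
  jac_times(v, tmp, n);        /* tmp = J v */
  for (i = 0; i < n; i++)
    Av[i] -= gamma * tmp[i];   /* Av  = M v - gamma J v */
}

/* Illustrative operator: identity mass matrix, M = I. */
void identity_times(const double *v, double *out, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    out[i] = v[i];
}

/* Illustrative operator: J = [[-2, 1], [1, -2]] (assumes n == 2). */
void example_jac_times(const double *v, double *out, size_t n)
{
  (void) n;
  out[0] = -2.0 * v[0] + v[1];
  out[1] = v[0] - 2.0 * v[1];
}
```

With these example operators, $v = (1, 2)$ and $\gamma = 0.5$ give $Jv = (0, -3)$ and therefore $\mathcal{A}v = (1, 3.5)$. An iterative solver needs only such products, which is why the matrix-free case requires no SUNMatrix object.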
To specify a generic linear solver for ARKStep to use for the Newton systems, after the call to ARKStepCreate() but before any calls to ARKStepEvolve(), the user’s program must create the appropriate SUNLinearSolver object and call the function ARKStepSetLinearSolver(), as documented below. To create the SUNLinearSolver object, the user may call one of the SUNDIALS-packaged SUNLinSol module constructor routines via a call of the form

SUNLinearSolver LS = SUNLinSol_*(...);

The current list of such constructor routines includes SUNLinSol_Dense(), SUNLinSol_Band(), SUNLinSol_LapackDense(), SUNLinSol_LapackBand(), SUNLinSol_KLU(), SUNLinSol_SuperLUMT(), SUNLinSol_SPGMR(), SUNLinSol_SPFGMR(), SUNLinSol_SPBCGS(), SUNLinSol_SPTFQMR(), and SUNLinSol_PCG().

Alternately, a user-supplied SUNLinearSolver module may be created and used instead. The use of each of the generic linear solvers involves certain constants, functions and possibly some macros, that are likely to be needed in the user code. These are available in the corresponding header file associated with the specific SUNMatrix or SUNLinearSolver module in question, as described in the sections Matrix Data Structures and Description of the SUNLinearSolver module.

Once this solver object has been constructed, the user should attach it to ARKStep via a call to ARKStepSetLinearSolver(). The first argument passed to this function is the ARKStep memory pointer returned by ARKStepCreate(); the second argument is the SUNLinearSolver object created above. The third argument is an optional SUNMatrix object to accompany matrix-based SUNLinearSolver inputs (for matrix-free linear solvers, the third argument should be NULL). A call to this function initializes the ARKLs linear solver interface, linking it to the ARKStep integrator, and allows the user to specify additional parameters and routines pertinent to their choice of linear solver.
int ARKStepSetLinearSolver(void* arkode_mem, SUNLinearSolver LS, SUNMatrix J)

This function specifies the SUNLinearSolver object that ARKStep should use, as well as a template Jacobian SUNMatrix object (if applicable).

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• LS – the SUNLinearSolver object to use.
• J – the template Jacobian SUNMatrix object to use (or NULL if not applicable).

Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_MEM_FAIL if there was a memory allocation failure
• ARKLS_ILL_INPUT if ARKLS is incompatible with the provided LS or J input objects, or the current N_Vector module.

Notes: If LS is a matrix-free linear solver, then the J argument should be NULL.

If LS is a matrix-based linear solver, then the template Jacobian matrix J will be used in the solve process, so if additional storage is required within the SUNMatrix object (e.g. for factorization of a banded matrix), ensure that the input object is allocated with sufficient size (see the documentation of the particular SUNMATRIX type in the section Matrix Data Structures for further information).

When using sparse linear solvers, it is typically much more efficient to supply J so that it includes the full sparsity pattern of the Newton system matrices $\mathcal{A} = I - \gamma J$ (or $\mathcal{A} = M - \gamma J$ in the case of a non-identity mass matrix), even if J itself has zeros in nonzero locations of $I$ (or $M$). The reasoning for this is that $\mathcal{A}$ is constructed in-place, on top of the user-specified values of J, so if the sparsity pattern in J is insufficient to store $\mathcal{A}$ then it will need to be resized internally by ARKStep.

4.5.4 Mass matrix solver specification functions

As discussed in section Mass matrix solver, if the ODE system involves a non-identity mass matrix $M \ne I$, then ARKStep must solve linear systems of the form $Mx = b$. ARKode’s ARKLs mass-matrix linear solver interface supports all valid SUNLinearSolver modules for this task.
For iterative linear solvers, user-supplied preconditioning can be applied. For the specification of a preconditioner, see the iterative linear solver portions of the sections Optional input functions and User-supplied functions. If preconditioning is to be performed, user-supplied functions should be used to define left and right preconditioner matrices $P_1$ and $P_2$ (either of which could be the identity matrix), such that the product $P_1 P_2$ approximates the mass matrix $M$.

To specify a generic linear solver for ARKStep to use for mass matrix systems, after the call to ARKStepCreate() but before any calls to ARKStepEvolve(), the user’s program must create the appropriate SUNLinearSolver object and call the function ARKStepSetMassLinearSolver(), as documented below. The first argument passed to this function is the ARKStep memory pointer returned by ARKStepCreate(); the second argument is the desired SUNLinearSolver object to use for solving mass matrix systems. The third argument is a template SUNMatrix to use with the provided SUNLinearSolver (if applicable). The fourth argument is a flag indicating whether or not the mass matrix is time-dependent, i.e. $M = M(t)$. A call to this function initializes the ARKLs mass matrix linear solver interface, linking this to the main ARKStep integrator, and allows the user to specify additional parameters and routines pertinent to their choice of linear solver.

The use of each of the generic linear solvers involves certain constants and possibly some macros that are likely to be needed in the user code. These are available in the corresponding header file associated with the specific SUNMatrix or SUNLinearSolver module in question, as described in the sections Matrix Data Structures and Description of the SUNLinearSolver module.
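As a worked form of the preconditioning described at the start of this subsection, the mass matrix system $Mx = b$ with left and right preconditioners $P_1$ and $P_2$ can be written as an equivalent preconditioned system. The form below assumes the usual split (left/right) convention and omits any scaling an iterative solver may additionally apply, so it is a sketch of the idea and not a statement of a particular solver's internals:

```latex
\left( P_1^{-1} M P_2^{-1} \right) \left( P_2 x \right) = P_1^{-1} b,
\qquad P_1 P_2 \approx M .
```

If $P_1 P_2$ equalled $M$ exactly, the preconditioned matrix would be the identity; the closer the approximation, the fewer iterations an iterative method can be expected to need. Choosing $P_1 = I$ gives right-only preconditioning, and $P_2 = I$ gives left-only preconditioning.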
Note: if the user program includes linear solvers for both the Newton and mass matrix systems, these must have the same type:

• If both are matrix-based, then they must utilize the same SUNMatrix type, since these will be added when forming the Newton system matrices $\mathcal{A}$. In this case, both the Newton and mass matrix linear solver interfaces can use the same SUNLinearSolver object, although different solver objects (e.g. with different solver parameters) are also allowed.
• If both are matrix-free, then the Newton and mass matrix SUNLinearSolver objects must be different. These may even use different solver algorithms (SPGMR, SPBCGS, etc.), if desired. For example, if the mass matrix is symmetric but the Jacobian is not, then PCG may be used for the mass matrix systems and SPGMR for the Newton systems.

As with the Newton system solvers, the mass matrix linear system solvers listed below are all built on top of generic SUNDIALS solver modules.

int ARKStepSetMassLinearSolver(void* arkode_mem, SUNLinearSolver LS, SUNMatrix M, booleantype time_dep)

This function specifies the SUNLinearSolver object that ARKStep should use for mass matrix systems, as well as a template SUNMatrix object.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• LS – the SUNLinearSolver object to use.
• M – the template mass SUNMatrix object to use.
• time_dep – flag denoting whether the mass matrix depends on the independent variable ($M = M(t)$) or not ($M \ne M(t)$). SUNTRUE indicates time-dependence of the mass matrix. Currently, only values of SUNFALSE are supported.

Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_MEM_FAIL if there was a memory allocation failure
• ARKLS_ILL_INPUT if ARKLS is incompatible with the provided LS or M input objects, or the current N_Vector module.

Notes: If LS is a matrix-free linear solver, then the M argument should be NULL.
If LS is a matrix-based linear solver, then the template mass matrix M will be used in the solve process, so if additional storage is required within the SUNMatrix object (e.g. for factorization of a banded matrix), ensure that the input object is allocated with sufficient size.

The time_dep flag is currently unused, serving as a placeholder for planned future functionality. As such, ARKStep only computes and factors the mass matrix once, with the results reused throughout the entire ARKStep simulation.

Unlike the system Jacobian, the system mass matrix cannot be approximated using finite differences of any functions provided to ARKStep. Hence, use of a matrix-based LS requires the user to provide a mass-matrix constructor routine (see ARKLsMassFn and ARKStepSetMassFn()).

Similarly, the system mass matrix-vector product cannot be approximated using finite differences of any functions provided to ARKStep. Hence, use of a matrix-free LS requires the user to provide a mass-matrix-times-vector product routine (see ARKLsMassTimesVecFn and ARKStepSetMassTimes()).

4.5.5 Nonlinear solver interface functions

When changing the nonlinear solver in ARKStep, after the call to ARKStepCreate() but before any calls to ARKStepEvolve(), the user’s program must create the appropriate SUNNonlinSol object and call ARKStepSetNonlinearSolver(), as documented below. If any calls to ARKStepEvolve() have been made, then ARKStep will need to be reinitialized by calling ARKStepReInit() to ensure that the nonlinear solver is initialized correctly before any subsequent calls to ARKStepEvolve().

The first argument passed to the routine ARKStepSetNonlinearSolver() is the ARKStep memory pointer returned by ARKStepCreate(); the second argument passed to this function is the desired SUNNonlinSol object to use for solving the nonlinear system for each implicit stage.
A call to this function attaches the nonlinear solver to the main ARKStep integrator.

int ARKStepSetNonlinearSolver(void* arkode_mem, SUNNonlinearSolver NLS)

This function specifies the SUNNonlinearSolver object that ARKStep should use for implicit stage solves.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• NLS – the SUNNonlinearSolver object to use.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_MEM_FAIL if there was a memory allocation failure
• ARK_ILL_INPUT if ARKStep is incompatible with the provided NLS input object.

Notes: ARKStep will use the Newton SUNNonlinSol module by default; a call to this routine replaces that module with the supplied NLS object.

4.5.6 Rootfinding initialization function

As described in the section Rootfinding, while solving the IVP, ARKode’s time-stepping modules have the capability to find the roots of a set of user-defined functions. To activate the root-finding algorithm, call the following function. This is normally called only once, prior to the first call to ARKStepEvolve(), but if the rootfinding problem is to be changed during the solution, ARKStepRootInit() can also be called prior to a continuation call to ARKStepEvolve().

int ARKStepRootInit(void* arkode_mem, int nrtfn, ARKRootFn g)

Initializes a rootfinding problem to be solved during the integration of the ODE system. It must be called after ARKStepCreate(), and before ARKStepEvolve().

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nrtfn – number of functions $g_i$, an integer $\ge 0$.
• g – name of user-supplied function, of type ARKRootFn(), defining the functions $g_i$ whose roots are sought.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_MEM_FAIL if there was a memory allocation failure
• ARK_ILL_INPUT if nrtfn is greater than zero but g = NULL.

Notes: To disable the rootfinding feature after it has already been initialized, or to free memory associated with ARKStep’s rootfinding module, call ARKStepRootInit() with nrtfn = 0.

Similarly, if a new IVP is to be solved with a call to ARKStepReInit(), where the new IVP has no rootfinding problem but the prior one did, then call ARKStepRootInit() with nrtfn = 0.

4.5.7 ARKStep solver function

This is the central step in the solution process – the call to perform the integration of the IVP. One of the input arguments (itask) specifies one of two modes as to where ARKStep is to return a solution. These modes are modified if the user has set a stop time (with a call to the optional input function ARKStepSetStopTime()) or has requested rootfinding.

int ARKStepEvolve(void* arkode_mem, realtype tout, N_Vector yout, realtype *tret, int itask)

Integrates the ODE over an interval in $t$.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• tout – the next time at which a computed solution is desired.
• yout – the computed solution vector.
• tret – the time corresponding to yout (output).
• itask – a flag indicating the job of the solver for the next user step.

The ARK_NORMAL option causes the solver to take internal steps until it has just overtaken a user-specified output time, tout, in the direction of integration, i.e. $t_{n-1} <$ tout $\le t_n$ for forward integration, or $t_n \le$ tout $< t_{n-1}$ for backward integration. It will then compute an approximation to the solution $y(t_{out})$ by interpolation (using one of the dense output routines described in the section Interpolation).

The ARK_ONE_STEP option tells the solver to only take a single internal step $y_{n-1} \to y_n$ and then return control back to the calling program.
If this step will overtake tout then the solver will again return an interpolated result; otherwise it will return a copy of the internal solution $y_n$ in the vector yout.

Return value:
• ARK_SUCCESS if successful.
• ARK_ROOT_RETURN if ARKStepEvolve() succeeded, and found one or more roots. If the number of root functions, nrtfn, is greater than 1, call ARKStepGetRootInfo() to see which $g_i$ were found to have a root at (*tret).
• ARK_TSTOP_RETURN if ARKStepEvolve() succeeded and returned at tstop.
• ARK_MEM_NULL if the arkode_mem argument was NULL.
• ARK_NO_MALLOC if arkode_mem was not allocated.
• ARK_ILL_INPUT if one of the inputs to ARKStepEvolve() is illegal, or some other input to the solver was either illegal or missing. Details will be provided in the error message. Typical causes of this failure:
  1. A component of the error weight vector became zero during internal time-stepping.
  2. The linear solver initialization function (called by the user after calling ARKStepCreate()) failed to set the linear solver-specific lsolve field in arkode_mem.
  3. A root of one of the root functions was found both at a point $t$ and also very near $t$.
• ARK_TOO_MUCH_WORK if the solver took mxstep internal steps but could not reach tout. The default value for mxstep is MXSTEP_DEFAULT = 500.
• ARK_TOO_MUCH_ACC if the solver could not satisfy the accuracy demanded by the user for some internal step.
• ARK_ERR_FAILURE if error test failures occurred either too many times (ark_maxnef) during one internal time step or occurred with $|h| = h_{min}$.
• ARK_CONV_FAILURE if either convergence test failures occurred too many times (ark_maxncf) during one internal time step or occurred with $|h| = h_{min}$.
• ARK_LINIT_FAIL if the linear solver’s initialization function failed.
• ARK_LSETUP_FAIL if the linear solver’s setup routine failed in an unrecoverable manner.
• ARK_LSOLVE_FAIL if the linear solver’s solve routine failed in an unrecoverable manner.
• ARK_MASSINIT_FAIL if the mass matrix solver’s initialization function failed.
• ARK_MASSSETUP_FAIL if the mass matrix solver’s setup routine failed.
• ARK_MASSSOLVE_FAIL if the mass matrix solver’s solve routine failed.
• ARK_VECTOROP_ERR if a vector operation error occurred.
Notes: The input vector yout can use the same memory as the vector y0 of initial conditions that was passed to ARKStepCreate().
In ARK_ONE_STEP mode, tout is used only on the first call, and only to get the direction and a rough scale of the independent variable.
All failure return values are negative, so testing the return argument for negative values will trap all ARKStepEvolve() failures.
Since interpolation may reduce the accuracy in the reported solution, if full method accuracy is desired the user should issue a call to ARKStepSetStopTime() before the call to ARKStepEvolve() to specify a fixed stop time to end the time step and return to the user. Upon return from ARKStepEvolve(), a copy of the internal solution $y_n$ will be returned in the vector yout. Once the integrator returns at a tstop time, any future testing for tstop is disabled (and can be re-enabled only through a new call to ARKStepSetStopTime()).
On any error return in which one or more internal steps were taken by ARKStepEvolve(), the returned values of tret and yout correspond to the farthest point reached in the integration. On all other error returns, tret and yout are left unchanged from those provided to the routine.

4.5.8 Optional input functions

There are numerous optional input parameters that control the behavior of the ARKStep solver, each of which may be modified from its default value through calling an appropriate input function. The following tables list all optional input functions, grouped by which aspect of ARKStep they control.
Detailed information on the calling syntax and arguments for each function is then provided following each table.

The optional inputs are grouped into the following categories:
• General ARKStep options (Optional inputs for ARKStep),
• IVP method solver options (Optional inputs for IVP method selection),
• Step adaptivity solver options (Optional inputs for time step adaptivity),
• Implicit stage solver options (Optional inputs for implicit stage solves),
• Linear solver interface options (Linear solver interface optional input functions).

For the most casual use of ARKStep, relying on the default set of solver parameters, the reader can skip to the following section, User-supplied functions.

We note that, on an error return, all of the optional input functions send an error message to the error handler function. We also note that all error return values are negative, so a test on the return arguments for negative values will catch all errors.

Optional inputs for ARKStep

Optional input                                      Function name                 Default
Return ARKStep solver parameters to their defaults  ARKStepSetDefaults()          internal
Set dense output order                              ARKStepSetDenseOrder()        3
Supply a pointer to a diagnostics output file       ARKStepSetDiagnostics()       NULL
Supply a pointer to an error output file            ARKStepSetErrFile()           stderr
Supply a custom error handler function              ARKStepSetErrHandlerFn()      internal fn
Disable time step adaptivity (fixed-step mode)      ARKStepSetFixedStep()         disabled
Supply an initial step size to attempt              ARKStepSetInitStep()          estimated
Maximum no. of warnings for t_n + h = t_n           ARKStepSetMaxHnilWarns()      10
Maximum no. of internal steps before tout           ARKStepSetMaxNumSteps()       500
Maximum absolute step size                          ARKStepSetMaxStep()           ∞
Minimum absolute step size                          ARKStepSetMinStep()           0.0
Set a value for t_stop                              ARKStepSetStopTime()          ∞
Supply a pointer for user data                      ARKStepSetUserData()          NULL
Maximum no. of ARKStep error test failures          ARKStepSetMaxErrTestFails()   7
Set ‘optimal’ adaptivity parameters for a method    ARKStepSetOptimalParams()     internal

int ARKStepSetDefaults(void* arkode_mem)
Resets all optional input parameters to ARKStep’s original default values.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Does not change the user_data pointer or any parameters within the specified time-stepping module. Also leaves alone any data structures or options related to root-finding (those can be reset using ARKStepRootInit()).

int ARKStepSetDenseOrder(void* arkode_mem, int dord)
Specifies the degree of the polynomial interpolant used for dense output (i.e. interpolation of solution output values and implicit method predictors).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• dord – requested polynomial order of accuracy.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Allowed values are between 0 and min(q,5), where q is the order of the overall integration method.

int ARKStepSetDiagnostics(void* arkode_mem, FILE* diagfp)
Specifies the file pointer for a diagnostics file where all ARKStep step adaptivity and solver information is written.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• diagfp – pointer to the diagnostics output file.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This parameter can be stdout or stderr, although the suggested approach is to specify a pointer to a unique file opened by the user and returned by fopen. If not called, or if called with a NULL file pointer, all diagnostics output is disabled.
When run in parallel, only one process should set a non-NULL value for this pointer, since statistics from all processes would be identical.

int ARKStepSetErrFile(void* arkode_mem, FILE* errfp)
Specifies a pointer to the file where all ARKStep warning and error messages will be written if the default internal error handling function is used.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• errfp – pointer to the output file.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value for errfp is stderr.
Passing a NULL value disables all future error message output (except for the case wherein the ARKStep memory pointer is NULL). This use of the function is strongly discouraged.
If used, this routine should be called before any other optional input functions, in order to take effect for subsequent error messages.

int ARKStepSetErrHandlerFn(void* arkode_mem, ARKErrHandlerFn ehfun, void* eh_data)
Specifies the optional user-defined function to be used in handling error messages.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• ehfun – name of user-supplied error handler function.
• eh_data – pointer to user data passed to ehfun every time it is called.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Error messages indicating that the ARKStep solver memory is NULL will always be directed to stderr.

int ARKStepSetFixedStep(void* arkode_mem, realtype hfixed)
Disables time step adaptivity within ARKStep, and specifies the fixed time step size to use for all internal steps.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hfixed – value of the fixed step size to use.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass 0.0 to return ARKStep to the default (adaptive-step) mode.
Use of this function is not recommended, since it gives no assurance of the validity of the computed solutions. It is primarily provided for code-to-code verification testing purposes.
When using ARKStepSetFixedStep(), any values provided to the functions ARKStepSetInitStep(), ARKStepSetAdaptivityFn(), ARKStepSetMaxErrTestFails(), ARKStepSetAdaptivityMethod(), ARKStepSetCFLFraction(), ARKStepSetErrorBias(), ARKStepSetFixedStepBounds(), ARKStepSetMaxCFailGrowth(), ARKStepSetMaxEFailGrowth(), ARKStepSetMaxFirstGrowth(), ARKStepSetMaxGrowth(), ARKStepSetSafetyFactor(), ARKStepSetSmallNumEFails() and ARKStepSetStabilityFn() will be ignored, since temporal adaptivity is disabled.
If both ARKStepSetFixedStep() and ARKStepSetStopTime() are used, then the fixed step size will be used for all steps until the final step preceding the provided stop time (which may be shorter). To resume use of the previous fixed step size, another call to ARKStepSetFixedStep() must be made prior to calling ARKStepEvolve() to resume integration.
It is not recommended that ARKStepSetFixedStep() be used in concert with ARKStepSetMaxStep() or ARKStepSetMinStep(), since at best those latter two routines will provide no useful information to the solver, and at worst they may interfere with the desired fixed step size.

int ARKStepSetInitStep(void* arkode_mem, realtype hin)
Specifies the initial time step size ARKStep should use after initialization or re-initialization.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hin – value of the initial step to be attempted (≠ 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass 0.0 to use the default value.
By default, ARKStep estimates the initial step size to be the solution $h$ of the equation $\left\| \frac{h^2 \ddot{y}}{2} \right\| = 1$, where $\ddot{y}$ is an estimated value of the second derivative of the solution at t0.

int ARKStepSetMaxHnilWarns(void* arkode_mem, int mxhnil)
Specifies the maximum number of messages issued by the solver to warn that $t + h = t$ on the next internal step, before ARKStep will instead return with an error.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mxhnil – maximum allowed number of warning messages (> 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 10; set mxhnil to zero to specify this default.
A negative value indicates that no warning messages should be issued.

int ARKStepSetMaxNumSteps(void* arkode_mem, long int mxsteps)
Specifies the maximum number of steps to be taken by the solver in its attempt to reach the next output time, before ARKStep will return with an error.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mxsteps – maximum allowed number of internal steps.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Passing mxsteps = 0 results in ARKStep using the default value (500).
Passing mxsteps < 0 disables the test (not recommended).

int ARKStepSetMaxStep(void* arkode_mem, realtype hmax)
Specifies the upper bound on the magnitude of the time step size.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hmax – maximum absolute value of the time step size (≥ 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass hmax ≤ 0.0 to set the default value of ∞.

int ARKStepSetMinStep(void* arkode_mem, realtype hmin)
Specifies the lower bound on the magnitude of the time step size.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hmin – minimum absolute value of the time step size (≥ 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass hmin ≤ 0.0 to set the default value of 0.

int ARKStepSetStopTime(void* arkode_mem, realtype tstop)
Specifies the value of the independent variable $t$ past which the solution is not to proceed.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• tstop – stopping time for the integrator.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default is that no stop time is imposed.

int ARKStepSetUserData(void* arkode_mem, void* user_data)
Specifies the user data block user_data and attaches it to the main ARKStep memory block.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• user_data – pointer to the user data.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: If specified, the pointer to user_data is passed to all user-supplied functions for which it is an argument; otherwise NULL is passed.
If user_data is needed in user linear solver or preconditioner functions, the call to this function must be made before the call to specify the linear solver.

int ARKStepSetMaxErrTestFails(void* arkode_mem, int maxnef)
Specifies the maximum number of error test failures permitted in attempting one step, before returning with an error.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• maxnef – maximum allowed number of error test failures (> 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 7; set maxnef ≤ 0 to specify this default.

int ARKStepSetOptimalParams(void* arkode_mem)
Sets all adaptivity and solver parameters to our ‘best guess’ values, for a given integration method (ERK, DIRK, ARK) and a given method order.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Should only be called after the method order and integration method have been set. These values resulted from repeated testing of ARKStep’s solvers on a variety of training problems. However, all problems are different, so these values may not be optimal for all users.
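
Several of the general-option setters documented in this group share a convention: a zero or non-positive argument restores the documented default (and, for the step count, a negative argument disables the test). The following self-contained C sketch models that convention so it can be read in one place. It does not call ARKode: the SketchOptions type and the sketch_* functions are hypothetical stand-ins invented for illustration, and only the default values (500, 7, ∞, 0) and the reset rules are taken from the notes in this section.

```c
#include <assert.h>
#include <math.h>    /* INFINITY */
#include <stddef.h>  /* NULL */

/* Hypothetical stand-in for a handful of solver options.
   This is NOT ARKode's internal memory structure. */
typedef struct {
  long int mxsteps; /* max internal steps before tout; negative disables the test */
  int      maxnef;  /* max error test failures per step */
  double   hmax;    /* upper bound on |h|; INFINITY means unbounded */
  double   hmin;    /* lower bound on |h| */
} SketchOptions;

/* Default values as quoted in the notes of this section. */
#define SKETCH_MXSTEPS_DEFAULT 500L
#define SKETCH_MAXNEF_DEFAULT  7

/* mxsteps = 0 selects the default; mxsteps < 0 disables the step-count test. */
void sketch_set_max_num_steps(SketchOptions *o, long int mxsteps)
{
  assert(o != NULL);
  if (mxsteps == 0)     o->mxsteps = SKETCH_MXSTEPS_DEFAULT;
  else if (mxsteps < 0) o->mxsteps = -1;
  else                  o->mxsteps = mxsteps;
}

/* maxnef <= 0 selects the default of 7. */
void sketch_set_max_err_test_fails(SketchOptions *o, int maxnef)
{
  assert(o != NULL);
  o->maxnef = (maxnef <= 0) ? SKETCH_MAXNEF_DEFAULT : maxnef;
}

/* hmax <= 0.0 selects the default of infinity (no upper bound). */
void sketch_set_max_step(SketchOptions *o, double hmax)
{
  assert(o != NULL);
  o->hmax = (hmax <= 0.0) ? INFINITY : hmax;
}

/* hmin <= 0.0 selects the default of 0. */
void sketch_set_min_step(SketchOptions *o, double hmin)
{
  assert(o != NULL);
  o->hmin = (hmin <= 0.0) ? 0.0 : hmin;
}
```

In a real program the corresponding ARKStepSet* calls would be made on the arkode_mem pointer, and each return flag would be checked for a negative value as described earlier in this section.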

Optional inputs for IVP method selection

Optional input                      Function name           Default
Set integrator method order         ARKStepSetOrder()       4
Specify implicit/explicit problem   ARKStepSetImEx()        SUNTRUE
Specify explicit problem            ARKStepSetExplicit()    SUNFALSE
Specify implicit problem            ARKStepSetImplicit()    SUNFALSE
Set additive RK tables              ARKStepSetTables()      internal
Specify additive RK table numbers   ARKStepSetTableNum()    internal

int ARKStepSetOrder(void* arkode_mem, int ord)
Specifies the order of accuracy for the ARK/DIRK/ERK integration method.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• ord – requested order of accuracy.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: For explicit methods, the allowed values are 2 ≤ ord ≤ 8. For implicit methods, the allowed values are 2 ≤ ord ≤ 5, and for ImEx methods the allowed values are 3 ≤ ord ≤ 5. Any illegal input will result in the default value of 4.
Since ord affects the memory requirements for the internal ARKStep memory block, it cannot be changed after the first call to ARKStepEvolve(), unless ARKStepReInit() is called.

int ARKStepSetImEx(void* arkode_mem)
Specifies that both the implicit and explicit portions of the problem are enabled, and to use an additive Runge-Kutta method.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This is automatically deduced when neither of the function pointers fe or fi passed to ARKStepCreate() is NULL, but may be set directly by the user if desired.

int ARKStepSetExplicit(void* arkode_mem)
Specifies that the implicit portion of the problem is disabled, and to use an explicit RK method.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This is automatically deduced when the function pointer fi passed to ARKStepCreate() is NULL, but may be set directly by the user if desired.
If the problem is posed in explicit form, i.e. $\dot{y} = f(t, y)$, then we recommend that the ERKStep time-stepper module be used instead.

int ARKStepSetImplicit(void* arkode_mem)
Specifies that the explicit portion of the problem is disabled, and to use a diagonally implicit RK method.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This is automatically deduced when the function pointer fe passed to ARKStepCreate() is NULL, but may be set directly by the user if desired.

int ARKStepSetTables(void* arkode_mem, int q, int p, ARKodeButcherTable Bi, ARKodeButcherTable Be)
Specifies a customized Butcher table (or pair) for the ERK, DIRK, or ARK method.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• q – global order of accuracy for the ARK method.
• p – global order of accuracy for the embedded ARK method.
• Bi – the Butcher table for the implicit RK method.
• Be – the Butcher table for the explicit RK method.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: For a description of the ARKodeButcherTable type and related functions for creating Butcher tables see Butcher Table Data Structure.
To set an explicit table, Bi must be NULL. This automatically calls ARKStepSetExplicit(). However, if the problem is posed in explicit form, i.e.
$\dot{y} = f(t, y)$, then we recommend that the ERKStep time-stepper module be used instead of ARKStep.
To set an implicit table, Be must be NULL. This automatically calls ARKStepSetImplicit().
If both Bi and Be are provided, this routine automatically calls ARKStepSetImEx().
When only one table is provided (i.e., Bi or Be is NULL) then the input values of q and p are ignored and the global order of the method and embedding (if applicable) are obtained from the Butcher table structures. If both Bi and Be are non-NULL (e.g., an ImEx method is provided) then the input values of q and p are used, since the order of the ARK method may be less than the orders of the individual tables. No error checking is performed to ensure that either p or q correctly describes the coefficients that were input.
Error checking is performed on Bi and Be (if non-NULL) to ensure that they specify DIRK and ERK methods, respectively.
If the inputs Bi or Be do not contain an embedding (when the corresponding explicit or implicit table is non-NULL), the user must call ARKStepSetFixedStep() to enable fixed-step mode and set the desired time step size.

int ARKStepSetTableNum(void* arkode_mem, int itable, int etable)
Indicates to use specific built-in Butcher tables for the ERK, DIRK or ARK method.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• itable – index of the DIRK Butcher table.
• etable – index of the ERK Butcher table.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The allowable values for both the itable and etable arguments corresponding to built-in tables may be found in Appendix: Butcher tables.
To choose an explicit table, set itable to a negative value. This automatically calls ARKStepSetExplicit(). However, if the problem is posed in explicit form, i.e.
$\dot{y} = f(t, y)$, then we recommend that the ERKStep time-stepper module be used instead of ARKStep.
To select an implicit table, set etable to a negative value. This automatically calls ARKStepSetImplicit().
If both itable and etable are non-negative, then these should match an existing implicit/explicit pair, listed in the section Additive Butcher tables. This automatically calls ARKStepSetImEx().
In all cases, error-checking is performed to ensure that the tables exist.

Optional inputs for time step adaptivity

The mathematical explanation of ARKode’s time step adaptivity algorithm, including how each of the parameters below is used within the code, is provided in the section Time step adaptivity.

Optional input                                    Function name                 Default
Set a custom time step adaptivity function        ARKStepSetAdaptivityFn()      internal
Choose an existing time step adaptivity method    ARKStepSetAdaptivityMethod()  0
Explicit stability safety factor                  ARKStepSetCFLFraction()       0.5
Time step error bias factor                       ARKStepSetErrorBias()         1.5
Bounds determining no change in step size         ARKStepSetFixedStepBounds()   1.0  1.5
Maximum step growth factor on convergence fail    ARKStepSetMaxCFailGrowth()    0.25
Maximum step growth factor on error test fail     ARKStepSetMaxEFailGrowth()    0.3
Maximum first step growth factor                  ARKStepSetMaxFirstGrowth()    10000.0
Maximum general step growth factor                ARKStepSetMaxGrowth()         20.0
Time step safety factor                           ARKStepSetSafetyFactor()      0.96
Error fails before MaxEFailGrowth takes effect    ARKStepSetSmallNumEFails()    2
Explicit stability function                       ARKStepSetStabilityFn()       none

int ARKStepSetAdaptivityFn(void* arkode_mem, ARKAdaptFn hfun, void* h_data)
Sets a user-supplied time-step adaptivity function.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hfun – name of user-supplied adaptivity function.
• h_data – pointer to user data passed to hfun every time it is called.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This function should focus on accuracy-based time step estimation; for stability-based time steps the function ARKStepSetStabilityFn() should be used instead.

int ARKStepSetAdaptivityMethod(void* arkode_mem, int imethod, int idefault, int pq, realtype* adapt_params)
Specifies the method (and associated parameters) used for time step adaptivity.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• imethod – accuracy-based adaptivity method choice (0 ≤ imethod ≤ 5): 0 is PID, 1 is PI, 2 is I, 3 is explicit Gustafsson, 4 is implicit Gustafsson, and 5 is the ImEx Gustafsson.
• idefault – flag denoting whether to use default adaptivity parameters (1), or that they will be supplied in the adapt_params argument (0).
• pq – flag denoting whether to use the embedding order of accuracy p (0) or the method order of accuracy q (1) within the adaptivity algorithm. p is the default.
• adapt_params[0] – $k_1$ parameter within accuracy-based adaptivity algorithms.
• adapt_params[1] – $k_2$ parameter within accuracy-based adaptivity algorithms.
• adapt_params[2] – $k_3$ parameter within accuracy-based adaptivity algorithms.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: If custom parameters are supplied, they will be checked for validity against published stability intervals. If other parameter values are desired, it is recommended to instead provide a custom function through a call to ARKStepSetAdaptivityFn().

int ARKStepSetCFLFraction(void* arkode_mem, realtype cfl_frac)
Specifies the fraction of the estimated explicitly stable step to use.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• cfl_frac – maximum allowed fraction of explicitly stable step (default is 0.5).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ARKStepSetErrorBias(void* arkode_mem, realtype bias)
Specifies the bias to be applied to the error estimates within accuracy-based adaptivity strategies.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• bias – bias applied to error in accuracy-based time step estimation (default is 1.5).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value below 1.0 will imply a reset to the default value.

int ARKStepSetFixedStepBounds(void* arkode_mem, realtype lb, realtype ub)
Specifies the step growth interval in which the step size will remain unchanged.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lb – lower bound on window to leave step size fixed (default is 1.0).
• ub – upper bound on window to leave step size fixed (default is 1.5).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any interval not containing 1.0 will imply a reset to the default values.

int ARKStepSetMaxCFailGrowth(void* arkode_mem, realtype etacf)
Specifies the maximum step size growth factor upon an algebraic solver convergence failure on a stage solve within a step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• etacf – time step reduction factor on a nonlinear solver convergence failure (default is 0.25).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value outside the interval (0, 1] will imply a reset to the default value.

int ARKStepSetMaxEFailGrowth(void* arkode_mem, realtype etamxf)
Specifies the maximum step size growth factor upon multiple successive accuracy-based error failures in the solver.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• etamxf – time step reduction factor on multiple error fails (default is 0.3).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value outside the interval (0, 1] will imply a reset to the default value.

int ARKStepSetMaxFirstGrowth(void* arkode_mem, realtype etamx1)
Specifies the maximum allowed step size change following the very first integration step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• etamx1 – maximum allowed growth factor after the first time step (default is 10000.0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value ≤ 1.0 will imply a reset to the default value.

int ARKStepSetMaxGrowth(void* arkode_mem, realtype mx_growth)
Specifies the maximum growth of the step size between consecutive steps in the integration process.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mx_growth – maximum allowed growth factor between consecutive time steps (default is 20.0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value ≤ 1.0 will imply a reset to the default value.
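
The growth-related parameters described so far (the fixed-step window lb and ub, the first-step cap etamx1, and the general growth cap) can be pictured as successive limits on the proposed ratio of new to old step size. The C sketch below is a simplified model under stated assumptions, not ARKode’s implementation: the GrowthLimits type and the limit_step_ratio function are hypothetical names, the default values are those quoted in this section, and the order in which ARKode actually applies these limits may differ (see the section Time step adaptivity).

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

/* Hypothetical container for the growth-related parameters. */
typedef struct {
  double lb;      /* lower edge of the "leave step unchanged" window */
  double ub;      /* upper edge of the "leave step unchanged" window */
  double etamx1;  /* maximum growth factor after the very first step */
  double growth;  /* maximum growth factor between later steps */
} GrowthLimits;

/* Default values as quoted in this section: 1.0, 1.5, 10000.0, 20.0. */
static const GrowthLimits GROWTH_DEFAULTS = {1.0, 1.5, 10000.0, 20.0};

/* Given a proposed ratio eta = h_new / h_old from an accuracy-based
   estimate, return the ratio this simplified model would actually use.
   first_step is nonzero when h_old was the first step of the run. */
double limit_step_ratio(double eta, int first_step, const GrowthLimits *lim)
{
  double cap;
  assert(lim != NULL);

  /* Inside the window [lb, ub]: keep the step size unchanged. */
  if (eta >= lim->lb && eta <= lim->ub) return 1.0;

  /* Otherwise cap the growth; the first step has its own, larger cap. */
  cap = first_step ? lim->etamx1 : lim->growth;
  if (eta > cap) return cap;

  /* Reductions (eta < lb) pass through unchanged in this model. */
  return eta;
}
```

With the defaults, a proposed ratio of 1.2 leaves the step unchanged, while a proposed ratio of 50 is cut to 20 on an ordinary step but allowed in full immediately after the first step.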

int ARKStepSetSafetyFactor(void* arkode_mem, realtype safety)
Specifies the safety factor to be applied to the accuracy-based estimated step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• safety – safety factor applied to accuracy-based time step (default is 0.96).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ARKStepSetSmallNumEFails(void* arkode_mem, int small_nef)
Specifies the threshold for “multiple” successive error failures before the etamxf parameter from ARKStepSetMaxEFailGrowth() is applied.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• small_nef – bound to determine ‘multiple’ for etamxf (default is 2).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ARKStepSetStabilityFn(void* arkode_mem, ARKExpStabFn EStab, void* estab_data)
Sets the problem-dependent function to estimate a stable time step size for the explicit portion of the ODE system.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• EStab – name of user-supplied stability function.
• estab_data – pointer to user data passed to EStab every time it is called.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This function should return an estimate of the absolute value of the maximum stable time step for the explicit portion of the ODE system.
It is not required, since accuracy-based adaptivity may be sufficient for retaining stability, but this can be quite useful for problems where the explicit right-hand side function $f_E(t, y)$ may contain stiff terms.

Optional inputs for implicit stage solves

The mathematical explanation for the nonlinear solver strategies used by ARKStep, including how each of the parameters below is used within the code, is provided in the section Nonlinear solver methods.

Optional input                                  Function name                 Default
Specify linearly implicit f_I                   ARKStepSetLinear()            SUNFALSE
Specify nonlinearly implicit f_I                ARKStepSetNonlinear()         SUNTRUE
Implicit predictor method                       ARKStepSetPredictorMethod()   0
Maximum number of nonlinear iterations          ARKStepSetMaxNonlinIters()    3
Coefficient in the nonlinear convergence test   ARKStepSetNonlinConvCoef()    0.1
Nonlinear convergence rate constant             ARKStepSetNonlinCRDown()      0.3
Nonlinear residual divergence ratio             ARKStepSetNonlinRDiv()        2.3
Maximum number of convergence failures          ARKStepSetMaxConvFails()      10

int ARKStepSetLinear(void* arkode_mem, int timedepend)
Specifies that the implicit portion of the problem is linear.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• timedepend – flag denoting whether the Jacobian of $f_I(t, y)$ is time-dependent (1) or not (0). Alternately, when using an iterative linear solver this flag denotes time dependence of the preconditioner.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Tightens the linear solver tolerances and takes only a single Newton iteration. Calls ARKStepSetDeltaGammaMax() to enforce Jacobian recomputation when the step size ratio changes by more than 100 times the unit roundoff (since nonlinear convergence is not tested). Only applicable when used in combination with the modified or inexact Newton iteration (not the fixed-point solver).
The only SUNNonlinearSolver module that is compatible with the ARKStepSetLinear() option is the Newton solver.

int ARKStepSetNonlinear(void* arkode_mem)
Specifies that the implicit portion of the problem is nonlinear.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This is the default behavior of ARKStep, so the function is primarily useful to undo a previous call to ARKStepSetLinear(). Calls ARKStepSetDeltaGammaMax() to reset the step size ratio threshold to the default value.

int ARKStepSetPredictorMethod(void* arkode_mem, int method)
Specifies the method to use for predicting implicit solutions.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• method – method choice (0 ≤ method ≤ 5):
  – 0 is the trivial predictor,
  – 1 is the maximum order (dense output) predictor,
  – 2 is the variable order predictor, that decreases the polynomial degree for more distant RK stages,
  – 3 is the cutoff order predictor, that uses the maximum order for early RK stages, and a first-order predictor for distant RK stages,
  – 4 is the bootstrap predictor, that uses a second-order predictor based on only information within the current step,
  – 5 is the minimum correction predictor, that uses all preceding stage information within the current step for prediction.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 0. If method is set to an undefined value, this default predictor will be used.

int ARKStepSetMaxNonlinIters(void* arkode_mem, int maxcor)
Specifies the maximum number of nonlinear solver iterations permitted per RK stage within each time step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• maxcor – maximum allowed solver iterations per stage (> 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value or if the SUNNONLINSOL module is NULL
• ARK_NLS_OP_ERR if the SUNNONLINSOL object returned a failure flag
Notes: The default value is 3; set maxcor ≤ 0 to specify this default.

int ARKStepSetNonlinConvCoef(void* arkode_mem, realtype nlscoef)
Specifies the safety factor used within the nonlinear solver convergence test.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nlscoef – coefficient in nonlinear solver convergence test (> 0.0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 0.1; set nlscoef ≤ 0 to specify this default.

int ARKStepSetNonlinCRDown(void* arkode_mem, realtype crdown)
Specifies the constant used in estimating the nonlinear solver convergence rate.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• crdown – nonlinear convergence rate estimation constant (default is 0.3).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ARKStepSetNonlinRDiv(void* arkode_mem, realtype rdiv)
Specifies the nonlinear correction threshold beyond which the iteration will be declared divergent.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rdiv – tolerance on nonlinear correction size ratio to declare divergence (default is 2.3).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ARKStepSetMaxConvFails(void* arkode_mem, int maxncf)
Specifies the maximum number of nonlinear solver convergence failures permitted during one step, before ARKStep will return with an error.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• maxncf – maximum allowed nonlinear solver convergence failures per step (> 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 10; set maxncf ≤ 0 to specify this default. Upon each convergence failure, ARKStep will first call the Jacobian setup routine and try again (if a Newton method is used). If a convergence failure still occurs, the time step size is reduced by the factor etacf (set within ARKStepSetMaxCFailGrowth()).

Linear solver interface optional input functions

The mathematical explanation of the linear solver methods available to ARKStep is provided in the section Linear solver methods. We group the user-callable routines into four categories: general routines concerning the update frequency for matrices and/or preconditioners, optional inputs for matrix-based linear solvers, optional inputs for matrix-free linear solvers, and optional inputs for iterative linear solvers. We note that the matrix-based and matrix-free groups are mutually exclusive, whereas the “iterative” tag can apply to either case.

Optional inputs for the ARKLs linear solver interface

As discussed in the section Updating the linear solver, ARKode strives to reuse matrix and preconditioner data for as many solves as possible to amortize the high costs of matrix construction and factorization.
To that end, ARKStep provides three user-callable routines to modify this behavior. Recall that the Newton system matrices that arise within an implicit stage solve are 𝒜(𝑡, 𝑧) ≈ 𝑀 − 𝛾𝐽(𝑡, 𝑧), where the implicit right-hand side function has Jacobian matrix 𝐽(𝑡, 𝑧) = ∂𝑓𝐼(𝑡, 𝑧)/∂𝑧.

The matrix or preconditioner for 𝒜 can only be updated within a call to the linear solver ‘setup’ routine. In general, the frequency with which the linear solver setup routine is called may be controlled with the msbp argument to ARKStepSetMaxStepsBetweenLSet(). When this occurs, the validity of 𝒜 for successive time steps intimately depends on whether the corresponding 𝛾 and 𝐽 inputs remain valid.

If the current value of 𝛾 is ever too far from the value used when constructing 𝒜, then 𝒜 is considered invalid and the linear solver setup routine is called. For linear solvers with user-supplied preconditioning, the input jok is then set to SUNFALSE in calling the user-supplied ARKLsPrecSetupFn(), to recommend a preconditioner update.

It is more difficult to automatically and efficiently determine the validity of 𝐽 (unless the nonlinear solver fails to converge). We therefore update 𝐽 automatically at a user-defined frequency, controlled with the msbj argument to ARKStepSetMaxStepsBetweenJac(). We note that this is only checked within calls to the linear solver setup routine, so values msbj < msbp do not make sense.

For linear solvers with user-supplied preconditioning: at each call to the linear solver setup routine, msbj is used to determine whether to recommend a preconditioner update (i.e., whether to set jok to SUNFALSE in calling the user-supplied ARKLsPrecSetupFn()).

For matrix-based linear solvers: at each call to the linear solver setup routine, msbj is used to determine whether the matrix 𝐽(𝑡, 𝑦) = ∂𝑓𝐼(𝑡, 𝑦)/∂𝑦 should be updated; if not, then the previous value is reused and the system matrix 𝒜(𝑡, 𝑦) ≈ 𝑀 − 𝛾𝐽(𝑡, 𝑦) is recomputed using the current 𝛾 value.
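The interplay of msbp, msbj and the 𝛾 tolerance described above can be sketched in plain C. This is a simplified illustration of the documented rules only, not ARKode's internal implementation: the struct and function names (ReuseState, ReuseParams, need_lsetup, need_new_jac) are hypothetical, and treating a nonlinear convergence failure as a trigger for both setup and a fresh 𝐽 is an assumption drawn from the text.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping; names are illustrative, not ARKode's own. */
typedef struct {
  long int steps_since_setup; /* time steps since the last lsetup call       */
  long int steps_since_jac;   /* time steps since J was last recomputed      */
  double   gamma_ratio;       /* current gamma / gamma used when building A  */
  bool     nls_failed;        /* previous nonlinear solve failed to converge */
} ReuseState;

typedef struct {
  int      msbp;  /* cf. ARKStepSetMaxStepsBetweenLSet(); < 0 forces setup */
  long int msbj;  /* cf. ARKStepSetMaxStepsBetweenJac()                    */
  double   dgmax; /* cf. ARKStepSetDeltaGammaMax()                         */
} ReuseParams;

/* Should the linear solver setup routine be called before this solve? */
bool need_lsetup(const ReuseState *s, const ReuseParams *p)
{
  double drift = s->gamma_ratio - 1.0;
  if (drift < 0.0) drift = -drift;

  if (p->msbp < 0) return true;            /* recompute at every stage solve */
  if (s->nls_failed) return true;          /* assumption: retry with new data */
  if (drift > p->dgmax) return true;       /* gamma too far from stored value */
  return s->steps_since_setup >= p->msbp;  /* A has been reused long enough   */
}

/* Inside a setup call, should J be recomputed (or jok set to SUNFALSE)? */
bool need_new_jac(const ReuseState *s, const ReuseParams *p)
{
  if (!need_lsetup(s, p)) return false;    /* msbj is only checked in setup */
  return s->nls_failed || s->steps_since_jac >= p->msbj;
}
```

Because need_new_jac() can only return true when need_lsetup() does, a value of msbj smaller than msbp has no additional effect in this sketch, which mirrors the remark that such values do not make sense.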
Optional input | Function name | Default
Max change in step signaling new 𝐽 | ARKStepSetDeltaGammaMax() | 0.2
Max steps between calls to “lsetup” routine | ARKStepSetMaxStepsBetweenLSet() | 20
Max steps between calls to new 𝐽 | ARKStepSetMaxStepsBetweenJac() | 50

int ARKStepSetDeltaGammaMax(void* arkode_mem, realtype dgmax)
Specifies a scaled step size ratio tolerance, beyond which the linear solver setup routine will be signaled.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• dgmax – tolerance on step size ratio change before calling linear solver setup routine (default is 0.2).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ARKStepSetMaxStepsBetweenLSet(void* arkode_mem, int msbp)
Specifies the frequency of calls to the linear solver setup routine. Positive values specify the number of time steps between setup calls; negative values force recomputation at each stage solve; zero values reset to the default.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• msbp – maximum number of time steps between linear solver setup calls, or flag to force recomputation at each stage solve (default is 20).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL

int ARKStepSetMaxStepsBetweenJac(void* arkode_mem, long int msbj)
Specifies the maximum number of time steps to wait before recomputation of the Jacobian or recommendation to update the preconditioner.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• msbj – maximum number of time steps between Jacobian or preconditioner updates (default is 50).
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_LMEM_NULL if the linear solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
Notes: Passing a value msbj ≤ 0 indicates to use the default value of 50. This function must be called after the ARKLS system solver interface has been initialized through a call to ARKStepSetLinearSolver().

Optional inputs for matrix-based SUNLinearSolver modules

Optional input | Function name | Default
Jacobian function | ARKStepSetJacFn() | DQ
Mass matrix function | ARKStepSetMassFn() | none

When using matrix-based linear solver modules, the ARKLS solver interface needs a function to compute an approximation to the Jacobian matrix 𝐽(𝑡, 𝑦). This function must be of type ARKLsJacFn(). The user can supply a custom Jacobian function, or if using a dense or banded 𝐽 can use the default internal difference quotient approximation that comes with the ARKLS interface. At present, we do not supply a corresponding routine to approximate Jacobian entries in sparse matrices 𝐽. To specify a user-supplied Jacobian function jac, ARKStep provides the function ARKStepSetJacFn(). The ARKLS interface passes the user data pointer to the Jacobian function. This allows the user to create an arbitrary structure with relevant problem data and access it during the execution of the user-supplied Jacobian function, without using global data in the program. The user data pointer may be specified through ARKStepSetUserData().

Similarly, if the ODE system involves a non-identity mass matrix, 𝑀 ≠ 𝐼, matrix-based linear solver modules require a function to compute an approximation to the mass matrix 𝑀. There is no default difference quotient approximation (for any matrix type), so this routine must be supplied by the user. This function must be of type ARKLsMassFn(), and should be set using the function ARKStepSetMassFn(). We note that the ARKLS solver passes the user data pointer to the mass matrix function.
This allows the user to create an arbitrary structure with relevant problem data and access it during the execution of the user-supplied mass matrix function, without using global data in the program. The user data pointer may be specified through ARKStepSetUserData().

int ARKStepSetJacFn(void* arkode_mem, ARKLsJacFn jac)
Specifies the Jacobian approximation routine to be used for the matrix-based solver with the ARKLS interface.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• jac – name of user-supplied Jacobian approximation function.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: This routine must be called after the ARKLS linear solver interface has been initialized through a call to ARKStepSetLinearSolver(). By default, ARKLS uses an internal difference quotient function for dense and band matrices. If NULL is passed in for jac, this default is used. An error will occur if no jac is supplied when using other matrix types. The function type ARKLsJacFn() is described in the section User-supplied functions.

int ARKStepSetMassFn(void* arkode_mem, ARKLsMassFn mass)
Specifies the mass matrix approximation routine to be used for the matrix-based solver with the ARKLS interface.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mass – name of user-supplied mass matrix approximation function.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_MASSMEM_NULL if the mass matrix solver memory was NULL
• ARKLS_ILL_INPUT if an argument has an illegal value
Notes: This routine must be called after the ARKLS mass matrix solver interface has been initialized through a call to ARKStepSetMassLinearSolver().
Since there is no default difference quotient function for mass matrices, mass must be non-NULL. The function type ARKLsMassFn() is described in the section User-supplied functions.

Optional inputs for matrix-free SUNLinearSolver modules

Optional input | Function name | Default
𝐽𝑣 functions (jtimes and jtsetup) | ARKStepSetJacTimes() | DQ, none
𝑀𝑣 functions (mtimes and mtsetup) | ARKStepSetMassTimes() | none, none

As described in the section Linear solver methods, when solving the Newton linear systems with matrix-free methods, the ARKLS interface requires a jtimes function to compute an approximation to the product between the Jacobian matrix 𝐽(𝑡, 𝑦) and a vector 𝑣. The user can supply a custom Jacobian-times-vector approximation function, or use the default internal difference quotient function that comes with the ARKLS interface.

A user-defined Jacobian-vector function must be of type ARKLsJacTimesVecFn and can be specified through a call to ARKStepSetJacTimes() (see the section User-supplied functions for specification details). As with the user-supplied preconditioner functions, the evaluation and processing of any Jacobian-related data needed by the user’s Jacobian-times-vector function is done in the optional user-supplied function of type ARKLsJacTimesSetupFn (see the section User-supplied functions for specification details). As with the preconditioner functions, a pointer to the user-defined data structure, user_data, specified through ARKStepSetUserData() (or a NULL pointer otherwise) is passed to the Jacobian-times-vector setup and product functions each time they are called.

Similarly, if a problem involves a non-identity mass matrix, 𝑀 ≠ 𝐼, then matrix-free solvers require an mtimes function to compute an approximation to the product between the mass matrix 𝑀 and a vector 𝑣. This function must be user-supplied, since there is no default value. mtimes must be of type ARKLsMassTimesVecFn(), and can be specified through a call to the ARKStepSetMassTimes() routine.
As with the user-supplied preconditioner functions, the evaluation and processing of any mass matrix-related data needed by the user’s mass-matrix-times-vector function is done in the optional user-supplied function of type ARKLsMassTimesSetupFn (see the section User-supplied functions for specification details).

int ARKStepSetJacTimes(void* arkode_mem, ARKLsJacTimesSetupFn jtsetup, ARKLsJacTimesVecFn jtimes)
Specifies the Jacobian-times-vector setup and product functions.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• jtsetup – user-defined Jacobian-vector setup function. Pass NULL if no setup is necessary.
• jtimes – user-defined Jacobian-vector product function.
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_LMEM_NULL if the linear solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
• ARKLS_SUNLS_FAIL if an error occurred when setting up the Jacobian-vector product in the SUNLinearSolver object used by the ARKLS interface.
Notes: The default is to use an internal finite difference quotient for jtimes and to leave out jtsetup. If NULL is passed to jtimes, these defaults are used. A user may specify non-NULL jtimes and NULL jtsetup inputs. This function must be called after the ARKLS system solver interface has been initialized through a call to ARKStepSetLinearSolver(). The function types ARKLsJacTimesSetupFn and ARKLsJacTimesVecFn are described in the section User-supplied functions.

int ARKStepSetMassTimes(void* arkode_mem, ARKLsMassTimesSetupFn mtsetup, ARKLsMassTimesVecFn mtimes, void* mtimes_data)
Specifies the mass matrix-times-vector setup and product functions.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mtsetup – user-defined mass matrix-vector setup function. Pass NULL if no setup is necessary.
• mtimes – user-defined mass matrix-vector product function.
• mtimes_data – a pointer to user data, that will be supplied to both the mtsetup and mtimes functions.
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_MASSMEM_NULL if the mass matrix solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
• ARKLS_SUNLS_FAIL if an error occurred when setting up the mass-matrix-vector product in the SUNLinearSolver object used by the ARKLS interface.
Notes: There is no default finite difference quotient for mtimes, so if using the ARKLS mass matrix solver interface with NULL-valued 𝑀, and this routine is called with NULL-valued mtimes, an error will occur. A user may specify NULL for mtsetup. This function must be called after the ARKLS mass matrix solver interface has been initialized through a call to ARKStepSetMassLinearSolver(). The function types ARKLsMassTimesSetupFn and ARKLsMassTimesVecFn are described in the section User-supplied functions.

Optional inputs for iterative SUNLinearSolver modules

Optional input | Function name | Default
Newton preconditioning functions | ARKStepSetPreconditioner() | NULL, NULL
Mass matrix preconditioning functions | ARKStepSetMassPreconditioner() | NULL, NULL
Newton linear and nonlinear tolerance ratio | ARKStepSetEpsLin() | 0.05
Mass matrix linear and nonlinear tolerance ratio | ARKStepSetMassEpsLin() | 0.05

As described in the section Linear solver methods, when using an iterative linear solver the user may supply a preconditioning operator to aid in solution of the system. This operator consists of two user-supplied functions, psetup and psolve, that are supplied to ARKStep using either the function ARKStepSetPreconditioner() (for preconditioning the Newton system), or the function ARKStepSetMassPreconditioner() (for preconditioning the mass matrix system).
The psetup function supplied to these routines should handle evaluation and preprocessing of any Jacobian or mass-matrix data needed by the user’s preconditioner solve function, psolve. The user data pointer received through ARKStepSetUserData() (or a NULL pointer if user data was not specified) is passed to the psetup and psolve functions. This allows the user to create an arbitrary structure with relevant problem data and access it during the execution of the user-supplied preconditioner functions without using global data in the program. If preconditioning is supplied for both the Newton and mass matrix linear systems, it is expected that the user will supply different psetup and psolve functions for each.

Also, as described in the section Linear iteration error control, the ARKLS interface requires that iterative linear solvers stop when the norm of the preconditioned residual satisfies

‖𝑟‖ ≤ 𝜖𝐿 𝜖 / 10,

where the default 𝜖𝐿 = 0.05, which may be modified by the user through the ARKStepSetEpsLin() function.

int ARKStepSetPreconditioner(void* arkode_mem, ARKLsPrecSetupFn psetup, ARKLsPrecSolveFn psolve)
Specifies the user-supplied preconditioner setup and solve functions.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• psetup – user-defined preconditioner setup function. Pass NULL if no setup is needed.
• psolve – user-defined preconditioner solve function.
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_LMEM_NULL if the linear solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
• ARKLS_SUNLS_FAIL if an error occurred when setting up preconditioning in the SUNLinearSolver object used by the ARKLS interface.
Notes: The default is NULL for both arguments (i.e., no preconditioning).
This function must be called after the ARKLS system solver interface has been initialized through a call to ARKStepSetLinearSolver(). Both of the function types ARKLsPrecSetupFn() and ARKLsPrecSolveFn() are described in the section User-supplied functions.

int ARKStepSetMassPreconditioner(void* arkode_mem, ARKLsMassPrecSetupFn psetup, ARKLsMassPrecSolveFn psolve)
Specifies the mass matrix preconditioner setup and solve functions.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• psetup – user-defined preconditioner setup function. Pass NULL if no setup is to be done.
• psolve – user-defined preconditioner solve function.
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_LMEM_NULL if the linear solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
• ARKLS_SUNLS_FAIL if an error occurred when setting up preconditioning in the SUNLinearSolver object used by the ARKLS interface.
Notes: This function must be called after the ARKLS mass matrix solver interface has been initialized through a call to ARKStepSetMassLinearSolver(). The default is NULL for both arguments (i.e., no preconditioning). Both of the function types ARKLsMassPrecSetupFn() and ARKLsMassPrecSolveFn() are described in the section User-supplied functions.

int ARKStepSetEpsLin(void* arkode_mem, realtype eplifac)
Specifies the factor by which the tolerance on the nonlinear iteration is multiplied to get a tolerance on the linear iteration.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• eplifac – linear convergence safety factor.
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_LMEM_NULL if the linear solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
Notes: Passing a value eplifac ≤ 0 indicates to use the default value of 0.05.
This function must be called after the ARKLS system solver interface has been initialized through a call to ARKStepSetLinearSolver().

int ARKStepSetMassEpsLin(void* arkode_mem, realtype eplifac)
Specifies the factor by which the tolerance on the nonlinear iteration is multiplied to get a tolerance on the mass matrix linear iteration.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• eplifac – linear convergence safety factor.
Return value:
• ARKLS_SUCCESS if successful.
• ARKLS_MEM_NULL if the ARKStep memory was NULL.
• ARKLS_MASSMEM_NULL if the mass matrix solver memory was NULL.
• ARKLS_ILL_INPUT if an input has an illegal value.
Notes: This function must be called after the ARKLS mass matrix solver interface has been initialized through a call to ARKStepSetMassLinearSolver(). Passing a value eplifac ≤ 0 indicates to use the default value of 0.05.

Rootfinding optional input functions

The following functions can be called to set optional inputs to control the rootfinding algorithm, the mathematics of which are described in the section Rootfinding.

Optional input | Function name | Default
Direction of zero-crossings to monitor | ARKStepSetRootDirection() | both
Disable inactive root warnings | ARKStepSetNoInactiveRootWarn() | enabled

int ARKStepSetRootDirection(void* arkode_mem, int* rootdir)
Specifies the direction of zero-crossings to be located and returned.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rootdir – state array of length nrtfn, the number of root functions 𝑔𝑖 (the value of nrtfn was supplied in the call to ARKStepRootInit()). If rootdir[i] == 0 then crossing in either direction for 𝑔𝑖 should be reported. A value of +1 or -1 indicates that the solver should report only zero-crossings where 𝑔𝑖 is increasing or decreasing, respectively.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default behavior is to monitor for both zero-crossing directions.

int ARKStepSetNoInactiveRootWarn(void* arkode_mem)
Disables issuing a warning if some root function appears to be identically zero at the beginning of the integration.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory is NULL
Notes: ARKStep will not report the initial conditions as a possible zero-crossing (assuming that one or more components 𝑔𝑖 are zero at the initial time). However, if it appears that some 𝑔𝑖 is identically zero at the initial time (i.e., 𝑔𝑖 is zero at the initial time and after the first step), ARKStep will issue a warning which can be disabled with this optional input function.

4.5.9 Interpolated output function

An optional function ARKStepGetDky() is available to obtain additional values of solution-related quantities. This function should only be called after a successful return from ARKStepEvolve(), as it provides interpolated values either of 𝑦 or of its derivatives (up to the 5th derivative) interpolated to any value of 𝑡 in the last internal step taken by ARKStepEvolve(). Internally, this dense output algorithm is identical to the algorithm used for the maximum order implicit predictors, described in the section Maximum order predictor, except that derivatives of the polynomial model may be evaluated upon request.

int ARKStepGetDky(void* arkode_mem, realtype t, int k, N_Vector dky)
Computes the k-th derivative of the function 𝑦 at the time t, i.e. 𝑑^(𝑘)𝑦(𝑡)/𝑑𝑡^(𝑘), for values of the independent variable satisfying 𝑡𝑛 − ℎ𝑛 ≤ 𝑡 ≤ 𝑡𝑛, where 𝑡𝑛 is the current internal time reached and ℎ𝑛 is the last internal step size successfully used by the solver.
This routine uses an interpolating polynomial of degree max(dord, k), where dord is the argument provided to ARKStepSetDenseOrder(). The user may request k in the range {0,...,dord}.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• t – the value of the independent variable at which the derivative is to be evaluated.
• k – the derivative order requested.
• dky – output vector (must be allocated by the user).
Return value:
• ARK_SUCCESS if successful
• ARK_BAD_K if k is not in the range {0,...,dord}.
• ARK_BAD_T if t is not in the interval [𝑡𝑛 − ℎ𝑛, 𝑡𝑛]
• ARK_BAD_DKY if the dky vector was NULL
• ARK_MEM_NULL if the ARKStep memory is NULL
Notes: It is only legal to call this function after a successful return from ARKStepEvolve(). A user may access the values 𝑡𝑛 and ℎ𝑛 via the functions ARKStepGetCurrentTime() and ARKStepGetLastStep(), respectively.

4.5.10 Optional output functions

ARKStep provides an extensive set of functions that can be used to obtain solver performance information. We organize these into groups:
1. SUNDIALS version information accessor routines are in the subsection SUNDIALS version information,
2. General ARKStep output routines are in the subsection Main solver optional output functions,
3. ARKStep implicit solver output routines are in the subsection Implicit solver optional output functions,
4. Output routines regarding root-finding results are in the subsection Rootfinding optional output functions,
5. Linear solver output routines are in the subsection Linear solver interface optional output functions and
6. General usability routines (e.g. to print the current ARKStep parameters, or output the current Butcher table(s)) are in the subsection General usability functions.
Following each table, we elaborate on each function.
Some of the optional outputs, especially the various counters, can be very useful in determining the efficiency of various methods inside ARKStep. For example:
• The counters nsteps, nfe_evals, nfi_evals and nf_evals provide a rough measure of the overall cost of a given run, and can be compared between runs with different solver options to suggest which set of options is the most efficient.
• The ratio nniters/nsteps measures the performance of the nonlinear iteration in solving the nonlinear systems at each stage, providing a measure of the degree of nonlinearity in the problem. Typical values of this for a Newton solver on a general problem range from 1.1 to 1.8.
• When using a Newton nonlinear solver, the ratio njevals/nniters (in the case of a direct linear solver), and the ratio npevals/nniters (in the case of an iterative linear solver) can measure the overall degree of nonlinearity in the problem, since these are updated infrequently, unless the Newton method convergence slows.
• When using a Newton nonlinear solver, the ratio njevals/nniters (when using a direct linear solver), and the ratio nliters/nniters (when using an iterative linear solver) can indicate the quality of the approximate Jacobian or preconditioner being used. For example, if this ratio is larger for a user-supplied Jacobian or Jacobian-vector product routine than for the difference-quotient routine, it can indicate that the user-supplied Jacobian is inaccurate.
• The ratio expsteps/accsteps can measure the quality of the ImEx splitting used, since a higher-quality splitting will be dominated by accuracy-limited steps.
• The ratio nsteps/step_attempts can measure the quality of the time step adaptivity algorithm, since a poor algorithm will result in more failed steps, and hence a lower ratio.
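The ratios listed above are simple to compute once the counters have been retrieved. The following is a minimal sketch in plain C, assuming the counters have already been copied into a user-defined struct; the struct RunCounters and the helper names are hypothetical and are not part of ARKStep. In a real program the fields would be filled from the corresponding optional output routines (for example ARKStepGetNumSteps(), ARKStepGetNumStepAttempts(), ARKStepGetNumExpSteps() and ARKStepGetNumAccSteps()).

```c
#include <stdio.h>

/* Hypothetical container for counters retrieved after a run. */
typedef struct {
  long int nsteps;        /* successful internal steps          */
  long int step_attempts; /* attempted steps                    */
  long int expsteps;      /* explicit stability-limited steps   */
  long int accsteps;      /* accuracy-limited steps             */
  long int nniters;       /* nonlinear solver iterations        */
  long int njevals;       /* Jacobian evaluations               */
  long int nliters;       /* linear solver iterations           */
} RunCounters;

/* Guarded division: returns 0.0 when the denominator is not positive. */
static double ratio(long int num, long int den)
{
  return (den > 0) ? (double)num / (double)den : 0.0;
}

/* nniters/nsteps: nonlinear iteration performance. */
double nonlinear_iters_per_step(const RunCounters *c)
{
  return ratio(c->nniters, c->nsteps);
}

/* nsteps/step_attempts: quality of the time step adaptivity. */
double step_success_rate(const RunCounters *c)
{
  return ratio(c->nsteps, c->step_attempts);
}

/* expsteps/accsteps: quality of the ImEx splitting. */
double stability_to_accuracy_ratio(const RunCounters *c)
{
  return ratio(c->expsteps, c->accsteps);
}

/* Print the ratios so that runs with different options can be compared. */
void print_run_summary(const RunCounters *c)
{
  printf("nniters/nsteps       = %g\n", nonlinear_iters_per_step(c));
  printf("njevals/nniters      = %g\n", ratio(c->njevals, c->nniters));
  printf("nliters/nniters      = %g\n", ratio(c->nliters, c->nniters));
  printf("expsteps/accsteps    = %g\n", stability_to_accuracy_ratio(c));
  printf("nsteps/step_attempts = %g\n", step_success_rate(c));
}
```

The zero-denominator guard is a design choice for this sketch: it keeps the summary printable for runs that took no implicit solves or no accuracy-limited steps, at the cost of reporting 0 where the ratio is strictly undefined.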
It is therefore recommended that users retrieve and output these statistics following each run, and take some time to investigate alternate solver options that may be better suited to their particular problem of interest.

SUNDIALS version information

The following functions provide a way to get SUNDIALS version information at runtime.

int SUNDIALSGetVersion(char *version, int len)
This routine fills a string with SUNDIALS version information.
Arguments:
• version – character array to hold the SUNDIALS version information.
• len – allocated length of the version character array.
Return value:
• 0 if successful
• -1 if the input string is too short to store the SUNDIALS version
Notes: An array of 25 characters should be sufficient to hold the version information.

int SUNDIALSGetVersionNumber(int *major, int *minor, int *patch, char *label, int len)
This routine sets integers for the SUNDIALS major, minor, and patch release numbers and fills a string with the release label if applicable.
Arguments:
• major – SUNDIALS release major version number.
• minor – SUNDIALS release minor version number.
• patch – SUNDIALS release patch version number.
• label – string to hold the SUNDIALS release label.
• len – allocated length of the label character array.
Return value:
• 0 if successful
• -1 if the input string is too short to store the SUNDIALS label
Notes: An array of 10 characters should be sufficient to hold the label information. If a label is not used in the release version, no information is copied to label.
Main solver optional output functions

Optional output | Function name
Size of ARKStep real and integer workspaces | ARKStepGetWorkSpace()
Cumulative number of internal steps | ARKStepGetNumSteps()
Actual initial time step size used | ARKStepGetActualInitStep()
Step size used for the last successful step | ARKStepGetLastStep()
Step size to be attempted on the next step | ARKStepGetCurrentStep()
Current internal time reached by the solver | ARKStepGetCurrentTime()
Suggested factor for tolerance scaling | ARKStepGetTolScaleFactor()
Error weight vector for state variables | ARKStepGetErrWeights()
Residual weight vector | ARKStepGetResWeights()
Single accessor to many statistics at once | ARKStepGetStepStats()
Name of constant associated with a return flag | ARKStepGetReturnFlagName()
No. of explicit stability-limited steps | ARKStepGetNumExpSteps()
No. of accuracy-limited steps | ARKStepGetNumAccSteps()
No. of attempted steps | ARKStepGetNumStepAttempts()
No. of calls to fe and fi functions | ARKStepGetNumRhsEvals()
No. of local error test failures that have occurred | ARKStepGetNumErrTestFails()
Current ERK and DIRK Butcher tables | ARKStepGetCurrentButcherTables()
Estimated local truncation error vector | ARKStepGetEstLocalErrors()
Single accessor to many statistics at once | ARKStepGetTimestepperStats()

int ARKStepGetWorkSpace(void* arkode_mem, long int* lenrw, long int* leniw)
Returns the ARKStep real and integer workspace sizes.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lenrw – the number of realtype values in the ARKStep workspace.
• leniw – the number of integer values in the ARKStep workspace.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetNumSteps(void* arkode_mem, long int* nsteps)
Returns the cumulative number of internal steps taken by the solver (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nsteps – number of steps taken in the solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetActualInitStep(void* arkode_mem, realtype* hinused)
Returns the value of the integration step size used on the first step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hinused – actual value of initial step size.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: Even if the value of the initial integration step was specified by the user through a call to ARKStepSetInitStep(), this value may have been changed by ARKStep to ensure that the step size fell within the prescribed bounds ($h_{\min} \le h_0 \le h_{\max}$), or to satisfy the local error test condition, or to ensure convergence of the nonlinear solver.

int ARKStepGetLastStep(void* arkode_mem, realtype* hlast)
Returns the integration step size taken on the last successful internal step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hlast – step size taken on the last internal step.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetCurrentStep(void* arkode_mem, realtype* hcur)
Returns the integration step size to be attempted on the next internal step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• hcur – step size to be attempted on the next internal step.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetCurrentTime(void* arkode_mem, realtype* tcur)
Returns the current internal time reached by the solver.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• tcur – current internal time reached.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetTolScaleFactor(void* arkode_mem, realtype* tolsfac)
Returns a suggested factor by which the user’s tolerances should be scaled when too much accuracy has been requested for some internal step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• tolsfac – suggested scaling factor for user-supplied tolerances.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetErrWeights(void* arkode_mem, N_Vector eweight)
Returns the current error weight vector.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• eweight – solution error weights at the current time.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The user must allocate space for eweight, which will be filled in by this function.

int ARKStepGetResWeights(void* arkode_mem, N_Vector rweight)
Returns the current residual weight vector.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rweight – residual error weights at the current time.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The user must allocate space for rweight, which will be filled in by this function.

int ARKStepGetStepStats(void* arkode_mem, long int* nsteps, realtype* hinused, realtype* hlast, realtype* hcur, realtype* tcur)
Returns many of the most useful optional outputs in a single call.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nsteps – number of steps taken in the solver.
• hinused – actual value of initial step size.
• hlast – step size taken on the last internal step.
• hcur – step size to be attempted on the next internal step.
• tcur – current internal time reached.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

char *ARKStepGetReturnFlagName(long int flag)
Returns the name of the ARKStep constant corresponding to flag.
Arguments:
• flag – a return flag from an ARKStep function.
Return value: The return value is a string containing the name of the corresponding constant.

int ARKStepGetNumExpSteps(void* arkode_mem, long int* expsteps)
Returns the cumulative number of stability-limited steps taken by the solver (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• expsteps – number of stability-limited steps taken in the solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetNumAccSteps(void* arkode_mem, long int* accsteps)
Returns the cumulative number of accuracy-limited steps taken by the solver (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• accsteps – number of accuracy-limited steps taken in the solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetNumStepAttempts(void* arkode_mem, long int* step_attempts)
Returns the cumulative number of steps attempted by the solver (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• step_attempts – number of steps attempted by solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetNumRhsEvals(void* arkode_mem, long int* nfe_evals, long int* nfi_evals)
Returns the number of calls to the user’s right-hand side functions, $f_E$ and $f_I$ (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nfe_evals – number of calls to the user’s $f_E(t, y)$ function.
• nfi_evals – number of calls to the user’s $f_I(t, y)$ function.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The nfi_evals value does not account for calls made to $f_I$ by a linear solver or preconditioner module.

int ARKStepGetNumErrTestFails(void* arkode_mem, long int* netfails)
Returns the number of local error test failures that have occurred (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• netfails – number of error test failures.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

int ARKStepGetCurrentButcherTables(void* arkode_mem, ARKodeButcherTable *Bi, ARKodeButcherTable *Be)
Returns the explicit and implicit Butcher tables currently in use by the solver.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• Bi – pointer to implicit Butcher table structure.
• Be – pointer to explicit Butcher table structure.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The ARKodeButcherTable data structure is defined as a pointer to the following C structure:

    typedef struct ARKodeButcherTableMem {
      int q;          /* method order of accuracy    */
      int p;          /* embedding order of accuracy */
      int stages;     /* number of stages            */
      realtype **A;   /* Butcher table coefficients  */
      realtype *c;    /* canopy node coefficients    */
      realtype *b;    /* root node coefficients      */
      realtype *d;    /* embedding coefficients      */
    } *ARKodeButcherTable;

For more details see Butcher Table Data Structure.

int ARKStepGetEstLocalErrors(void* arkode_mem, N_Vector ele)
Returns the vector of estimated local truncation errors for the current step.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• ele – vector of estimated local truncation errors.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The user must allocate space for ele, which will be filled in by this function.
The values returned in ele are valid only after a successful call to ARKStepEvolve() (i.e. it returned a non-negative value). The ele vector, together with the eweight vector from ARKStepGetErrWeights(), can be used to determine how the various components of the system contributed to the estimated local error test. Specifically, that error test uses the WRMS norm of a vector whose components are the products of the components of these two vectors. Thus, for example, if there were recent error test failures, the components causing the failures are those with largest values for the products, denoted loosely as eweight[i]*ele[i].

int ARKStepGetTimestepperStats(void* arkode_mem, long int* expsteps, long int* accsteps, long int* step_attempts, long int* nfe_evals, long int* nfi_evals, long int* nlinsetups, long int* netfails)
Returns many of the most useful time-stepper statistics in a single call.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• expsteps – number of stability-limited steps taken in the solver.
• accsteps – number of accuracy-limited steps taken in the solver.
• step_attempts – number of steps attempted by the solver.
• nfe_evals – number of calls to the user’s $f_E(t, y)$ function.
• nfi_evals – number of calls to the user’s $f_I(t, y)$ function.
• nlinsetups – number of linear solver setup calls made.
• netfails – number of error test failures.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

Implicit solver optional output functions

Optional output | Function name
No. of calls to linear solver setup function | ARKStepGetNumLinSolvSetups()
No. of nonlinear solver iterations | ARKStepGetNumNonlinSolvIters()
No. of nonlinear solver convergence failures | ARKStepGetNumNonlinSolvConvFails()
Single accessor to all nonlinear solver statistics | ARKStepGetNonlinSolvStats()

int ARKStepGetNumLinSolvSetups(void* arkode_mem, long int* nlinsetups)
Returns the number of calls made to the linear solver’s setup routine (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nlinsetups – number of linear solver setup calls made.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: This is only accumulated for the ‘life’ of the nonlinear solver object; the counter is reset whenever a new nonlinear solver module is ‘attached’ to ARKStep, or when ARKStep is resized.

int ARKStepGetNumNonlinSolvIters(void* arkode_mem, long int* nniters)
Returns the number of nonlinear solver iterations performed (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nniters – number of nonlinear iterations performed.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NLS_OP_ERR if the SUNNONLINSOL object returned a failure flag
Notes: This is only accumulated for the ‘life’ of the nonlinear solver object; the counter is reset whenever a new nonlinear solver module is ‘attached’ to ARKStep, or when ARKStep is resized.

int ARKStepGetNumNonlinSolvConvFails(void* arkode_mem, long int* nncfails)
Returns the number of nonlinear solver convergence failures that have occurred (so far).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nncfails – number of nonlinear convergence failures.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: This is only accumulated for the ‘life’ of the nonlinear solver object; the counter is reset whenever a new nonlinear solver module is ‘attached’ to ARKStep, or when ARKStep is resized.

int ARKStepGetNonlinSolvStats(void* arkode_mem, long int* nniters, long int* nncfails)
Returns all of the nonlinear solver statistics in a single call.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nniters – number of nonlinear iterations performed.
• nncfails – number of nonlinear convergence failures.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NLS_OP_ERR if the SUNNONLINSOL object returned a failure flag
Notes: These are only accumulated for the ‘life’ of the nonlinear solver object; the counters are reset whenever a new nonlinear solver module is ‘attached’ to ARKStep, or when ARKStep is resized.

Rootfinding optional output functions

Optional output | Function name
Array showing roots found | ARKStepGetRootInfo()
No. of calls to user root function | ARKStepGetNumGEvals()

int ARKStepGetRootInfo(void* arkode_mem, int* rootsfound)
Returns an array showing which functions were found to have a root.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• rootsfound – array of length nrtfn with the indices of the user functions $g_i$ found to have a root (the value of nrtfn was supplied in the call to ARKStepRootInit()). For $i = 0, \ldots,$ nrtfn-1, rootsfound[i] is nonzero if $g_i$ has a root, and 0 if not.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The user must allocate space for rootsfound prior to calling this function. For the components of $g_i$ for which a root was found, the sign of rootsfound[i] indicates the direction of zero-crossing.
A value of +1 indicates that $g_i$ is increasing, while a value of -1 indicates a decreasing $g_i$.

int ARKStepGetNumGEvals(void* arkode_mem, long int* ngevals)
Returns the cumulative number of calls made to the user’s root function $g$.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• ngevals – number of calls made to $g$ so far.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL

Linear solver interface optional output functions

The following optional outputs are available from the ARKLS modules: workspace requirements, number of calls to the Jacobian routine, number of calls to the mass matrix routine, number of calls to the implicit right-hand side routine for finite-difference Jacobian approximation or Jacobian-vector product approximation, number of linear iterations, number of linear convergence failures, number of calls to the preconditioner setup and solve routines, number of calls to the Jacobian-vector setup and product routines, number of calls to the mass-matrix-vector setup and product routines, and last return value from an ARKLS function. Note that, where the name of an output would otherwise conflict with the name of an optional output from the main solver, a suffix LS (for Linear Solver) or MLS (for Mass Linear Solver) has been added here (e.g. lenrwLS).

Optional output | Function name
Size of real and integer workspaces | ARKStepGetLinWorkSpace()
No. of Jacobian evaluations | ARKStepGetNumJacEvals()
No. of preconditioner evaluations | ARKStepGetNumPrecEvals()
No. of preconditioner solves | ARKStepGetNumPrecSolves()
No. of linear iterations | ARKStepGetNumLinIters()
No. of linear convergence failures | ARKStepGetNumLinConvFails()
No. of Jacobian-vector setup evaluations | ARKStepGetNumJTSetupEvals()
No. of Jacobian-vector product evaluations | ARKStepGetNumJtimesEvals()
No. of fi calls for finite diff. $J$ or $Jv$ evals. | ARKStepGetNumLinRhsEvals()
Last return from a linear solver function | ARKStepGetLastLinFlag()
Name of constant associated with a return flag | ARKStepGetLinReturnFlagName()
Size of real and integer mass matrix solver workspaces | ARKStepGetMassWorkSpace()
No. of mass matrix solver setups (incl. $M$ evals.) | ARKStepGetNumMassSetups()
No. of mass matrix multiplies | ARKStepGetNumMassMult()
No. of mass matrix solves | ARKStepGetNumMassSolves()
No. of mass matrix preconditioner evaluations | ARKStepGetNumMassPrecEvals()
No. of mass matrix preconditioner solves | ARKStepGetNumMassPrecSolves()
No. of mass matrix linear iterations | ARKStepGetNumMassIters()
No. of mass matrix solver convergence failures | ARKStepGetNumMassConvFails()
No. of mass-matrix-vector setup evaluations | ARKStepGetNumMTSetups()
Last return from a mass matrix solver function | ARKStepGetLastMassFlag()

int ARKStepGetLinWorkSpace(void* arkode_mem, long int* lenrwLS, long int* leniwLS)
Returns the real and integer workspace used by the ARKLS linear solver interface.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lenrwLS – the number of realtype values in the ARKLS workspace.
• leniwLS – the number of integer values in the ARKLS workspace.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: The workspace requirements reported by this routine correspond only to memory allocated within this interface and to memory allocated by the SUNLinearSolver object attached to it. The template Jacobian matrix allocated by the user outside of ARKLS is not included in this report. In a parallel setting, the above values are global (i.e. summed over all processors).

int ARKStepGetNumJacEvals(void* arkode_mem, long int* njevals)
Returns the number of calls made to the Jacobian approximation routine.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• njevals – number of calls to the Jacobian function.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumPrecEvals(void* arkode_mem, long int* npevals)
Returns the total number of preconditioner evaluations, i.e. the number of calls made to psetup with jok = SUNFALSE.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• npevals – the current number of calls to psetup.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumPrecSolves(void* arkode_mem, long int* npsolves)
Returns the number of calls made to the preconditioner solve function, psolve.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• npsolves – the number of calls to psolve.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumLinIters(void* arkode_mem, long int* nliters)
Returns the cumulative number of linear iterations.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nliters – the current number of linear iterations.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: This is only accumulated for the ‘life’ of the linear solver object; the counter is reset whenever a new linear solver module is ‘attached’ to ARKStep, or when ARKStep is resized.

int ARKStepGetNumLinConvFails(void* arkode_mem, long int* nlcfails)
Returns the cumulative number of linear convergence failures.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nlcfails – the current number of linear convergence failures.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumJTSetupEvals(void* arkode_mem, long int* njtsetup)
Returns the cumulative number of calls made to the user-supplied Jacobian-vector setup function, jtsetup.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• njtsetup – the current number of calls to jtsetup.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumJtimesEvals(void* arkode_mem, long int* njvevals)
Returns the cumulative number of calls made to the Jacobian-vector product function, jtimes.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• njvevals – the current number of calls to jtimes.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumLinRhsEvals(void* arkode_mem, long int* nfevalsLS)
Returns the number of calls to the user-supplied implicit right-hand side function $f_I$ for finite difference Jacobian or Jacobian-vector product approximation.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nfevalsLS – the number of calls to the user implicit right-hand side function.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: The value nfevalsLS is incremented only if the default internal difference quotient function is used.

int ARKStepGetLastLinFlag(void* arkode_mem, long int* lsflag)
Returns the last return value from an ARKLS routine.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lsflag – the value of the last return flag from an ARKLS function.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: If the ARKLS setup function failed when using the SUNLINSOL_DENSE or SUNLINSOL_BAND modules, then the value of lsflag is equal to the column index (numbered from one) at which a zero diagonal element was encountered during the LU factorization of the (dense or banded) Jacobian matrix. For all other failures, lsflag is negative.
Otherwise, if the ARKLS setup function failed (ARKStepEvolve() returned ARK_LSETUP_FAIL), then lsflag will be SUNLS_PSET_FAIL_UNREC, SUNLS_ASET_FAIL_UNREC or SUNLS_PACKAGE_FAIL_UNREC.
If the ARKLS solve function failed (ARKStepEvolve() returned ARK_LSOLVE_FAIL), then lsflag contains the error return flag from the SUNLinearSolver object, which will be one of:
• SUNLS_MEM_NULL, indicating that the SUNLinearSolver memory is NULL;
• SUNLS_ATIMES_FAIL_UNREC, indicating an unrecoverable failure in the $Jv$ function;
• SUNLS_PSOLVE_FAIL_UNREC, indicating that the preconditioner solve function failed unrecoverably;
• SUNLS_GS_FAIL, indicating a failure in the Gram-Schmidt procedure (SPGMR and SPFGMR only);
• SUNLS_QRSOL_FAIL, indicating that the matrix $R$ was found to be singular during the QR solve phase (SPGMR and SPFGMR only); or
• SUNLS_PACKAGE_FAIL_UNREC, indicating an unrecoverable failure in an external iterative linear solver package.

char *ARKStepGetLinReturnFlagName(long int lsflag)
Returns the name of the ARKLS constant corresponding to lsflag.
Arguments:
• lsflag – a return flag from an ARKLS function.
Return value: The return value is a string containing the name of the corresponding constant. If using the SUNLINSOL_DENSE or SUNLINSOL_BAND modules, then if $1 \le$ lsflag $\le n$ (LU factorization failed), this routine returns “NONE”.
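The sign convention described in the notes for ARKStepGetLastLinFlag() can be sketched as a small classifier. This is only an illustration for the SUNLINSOL_DENSE and SUNLINSOL_BAND case: the enum LsFlagKind and the helpers classify_lsflag() and lsflag_kind_name() are hypothetical names that are not part of ARKode, n stands for the system dimension, and treating 0 as "no failure recorded" is an assumption of this sketch. In an application, lsflag would be the value filled in by ARKStepGetLastLinFlag().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical categories for a value obtained from ARKStepGetLastLinFlag()
   when a dense or banded linear solver is in use. Not part of ARKode. */
typedef enum {
  LSFLAG_SUCCESS,       /* assumed: 0 means no failure was recorded        */
  LSFLAG_ZERO_PIVOT,    /* 1..n: column (numbered from one) with a zero
                           diagonal element during the LU factorization    */
  LSFLAG_OTHER_FAILURE, /* negative: any other failure                     */
  LSFLAG_UNEXPECTED     /* positive but larger than n                      */
} LsFlagKind;

/* Classify lsflag for a system of dimension n. */
LsFlagKind classify_lsflag(long int lsflag, long int n)
{
  assert(n > 0);
  if (lsflag == 0) return LSFLAG_SUCCESS;
  if (lsflag < 0)  return LSFLAG_OTHER_FAILURE;
  if (lsflag <= n) return LSFLAG_ZERO_PIVOT;
  return LSFLAG_UNEXPECTED;
}

/* Short human-readable label for each category. */
const char *lsflag_kind_name(LsFlagKind kind)
{
  switch (kind) {
    case LSFLAG_SUCCESS:       return "success";
    case LSFLAG_ZERO_PIVOT:    return "zero pivot in LU factorization";
    case LSFLAG_OTHER_FAILURE: return "other linear solver failure";
    default:                   return "unexpected value";
  }
}

/* Print a one-line diagnostic; for a zero pivot, lsflag is the column index. */
void report_lsflag(FILE *fp, long int lsflag, long int n)
{
  LsFlagKind kind = classify_lsflag(lsflag, n);
  fprintf(fp, "lsflag = %ld: %s\n", lsflag, lsflag_kind_name(kind));
}
```

For the zero-pivot range ARKStepGetLinReturnFlagName() returns “NONE”, so a check of this kind is one way to still produce a meaningful message in that case.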

int ARKStepGetMassWorkSpace(void* arkode_mem, long int* lenrwMLS, long int* leniwMLS)
Returns the real and integer workspace used by the ARKLS mass matrix linear solver interface.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lenrwMLS – the number of realtype values in the ARKLS mass solver workspace.
• leniwMLS – the number of integer values in the ARKLS mass solver workspace.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: The workspace requirements reported by this routine correspond only to memory allocated within this interface and to memory allocated by the SUNLinearSolver object attached to it. The template mass matrix allocated by the user outside of ARKLS is not included in this report. In a parallel setting, the above values are global (i.e. summed over all processors).

int ARKStepGetNumMassSetups(void* arkode_mem, long int* nmsetups)
Returns the number of calls made to the ARKLS mass matrix solver ‘setup’ routine; these include all calls to the user-supplied mass-matrix constructor function.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmsetups – number of calls to the mass matrix solver setup routine.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMassMult(void* arkode_mem, long int* nmmults)
Returns the number of calls made to the ARKLS mass matrix ‘matvec’ routine (matrix-based solvers) or the user-supplied mtimes routine (matrix-free solvers).
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmmults – number of calls to the mass matrix solver matrix-times-vector routine.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMassSolves(void* arkode_mem, long int* nmsolves)
Returns the number of calls made to the ARKLS mass matrix solver ‘solve’ routine.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmsolves – number of calls to the mass matrix solver solve routine.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMassPrecEvals(void* arkode_mem, long int* nmpevals)
Returns the total number of mass matrix preconditioner evaluations, i.e. the number of calls made to psetup.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmpevals – the current number of calls to psetup.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMassPrecSolves(void* arkode_mem, long int* nmpsolves)
Returns the number of calls made to the mass matrix preconditioner solve function, psolve.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmpsolves – the number of calls to psolve.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMassIters(void* arkode_mem, long int* nmiters)
Returns the cumulative number of mass matrix solver iterations.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmiters – the current number of mass matrix solver linear iterations.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMassConvFails(void* arkode_mem, long int* nmcfails)
Returns the cumulative number of mass matrix solver convergence failures.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmcfails – the current number of mass matrix solver convergence failures.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetNumMTSetups(void* arkode_mem, long int* nmtsetup)
Returns the cumulative number of calls made to the user-supplied mass-matrix-vector product setup function, mtsetup.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nmtsetup – the current number of calls to mtsetup.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL

int ARKStepGetLastMassFlag(void* arkode_mem, long int* mlsflag)
Returns the last return value from an ARKLS mass matrix interface routine.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mlsflag – the value of the last return flag from an ARKLS mass matrix solver interface function.
Return value:
• ARKLS_SUCCESS if successful
• ARKLS_MEM_NULL if the ARKStep memory was NULL
• ARKLS_LMEM_NULL if the linear solver memory was NULL
Notes: The values of mlsflag for each of the various solvers will match those described above for the function ARKStepGetLastLinFlag().

General usability functions

The following optional routines may be called by a user to inquire about existing solver parameters, to retrieve stored Butcher tables, write the current Butcher table(s), or even to test a provided Butcher table to determine its analytical order of accuracy. While none of these would typically be called during the course of solving an initial value problem, these may be useful for users wishing to better understand ARKStep and/or specific Runge-Kutta methods.
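To make concrete what testing a Butcher table for its analytical order of accuracy involves, the following is a minimal sketch of the classical order conditions up to third order. It is not ARKStep's own checking routine. The function butcher_order_upto3() is a hypothetical helper, the table is passed as plain double arrays with A stored row-major (the ARKode structure uses realtype **A instead), and the conditions assume the usual row-sum relation $c_i = \sum_j a_{ij}$.

```c
#include <assert.h>

/* Absolute value helper, to avoid depending on the math library. */
static double absd(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper (not an ARKode function): returns the highest order,
   capped at 3, whose classical order conditions the table (A, b, c) with
   s stages satisfies to within tol. A is a flat row-major array of s*s
   entries. Conditions checked:
     order 1: sum_i b_i               = 1
     order 2: sum_i b_i c_i           = 1/2
     order 3: sum_i b_i c_i^2         = 1/3
              sum_i sum_j b_i a_ij c_j = 1/6                               */
int butcher_order_upto3(int s, const double *A, const double *b,
                        const double *c, double tol)
{
  double sb = 0.0, sbc = 0.0, sbc2 = 0.0, sbAc = 0.0;
  int i, j;

  assert(s > 0);
  for (i = 0; i < s; i++) {
    double Ac = 0.0;                    /* (A c)_i */
    for (j = 0; j < s; j++) Ac += A[i * s + j] * c[j];
    sb   += b[i];
    sbc  += b[i] * c[i];
    sbc2 += b[i] * c[i] * c[i];
    sbAc += b[i] * Ac;
  }

  if (absd(sb - 1.0) > tol)  return 0;  /* not even consistent */
  if (absd(sbc - 0.5) > tol) return 1;
  if (absd(sbc2 - 1.0 / 3.0) > tol || absd(sbAc - 1.0 / 6.0) > tol) return 2;
  return 3;                             /* order 3 or higher */
}
```

For instance, the forward Euler table satisfies only the first condition, Heun's two-stage method satisfies the first two, and the classical four-stage Runge-Kutta method satisfies all four. A full order check would continue with the fourth-order conditions and beyond.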

Optional routine | Function name
Output all ARKStep solver parameters | ARKStepWriteParameters()
Output the current Butcher table(s) | ARKStepWriteButcher()

int ARKStepWriteParameters(void* arkode_mem, FILE *fp)
Outputs all ARKStep solver parameters to the provided file pointer.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• fp – pointer to use for printing the solver parameters.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The fp argument can be stdout or stderr, or it may point to a specific file created using fopen. When run in parallel, only one process should set a non-NULL value for this pointer, since parameters for all processes would be identical.

int ARKStepWriteButcher(void* arkode_mem, FILE *fp)
Outputs the current Butcher table(s) to the provided file pointer.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• fp – pointer to use for printing the Butcher table(s).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
Notes: The fp argument can be stdout or stderr, or it may point to a specific file created using fopen. If ARKStep is currently configured to run in purely explicit or purely implicit mode, this will output a single Butcher table; if configured to run an ImEx method then both tables will be output. When run in parallel, only one process should set a non-NULL value for this pointer, since tables for all processes would be identical.

4.5.11 ARKStep re-initialization functions

To reinitialize the ARKStep module for the solution of a new problem, where a prior call to ARKStepCreate() has been made, the user must call the function ARKStepReInit(). The new problem must have the same size as the previous one.
This routine performs the same input checking and initializations that are done in ARKStepCreate(), but it performs no memory allocation as it assumes that the existing internal memory is sufficient for the new problem. A call to this re-initialization routine deletes the solution history that was stored internally during the previous integration. Following a successful call to ARKStepReInit(), call ARKStepEvolve() again for the solution of the new problem.

The use of ARKStepReInit() requires that the number of Runge-Kutta stages, denoted by s, be no larger for the new problem than for the previous problem. This condition is automatically fulfilled if the method order q and the problem type (explicit, implicit, ImEx) are left unchanged.

When using the ARKStep time-stepping module, if there are changes to the linear solver specifications, the user should make the appropriate calls to either the linear solver objects themselves, or to the ARKLS interface routines, as described in the section Linear solver interface functions. Otherwise, all solver inputs set previously remain in effect.

One important use of the ARKStepReInit() function is in the treatment of jump discontinuities in the RHS functions. Except in cases of fairly small jumps, it is usually more efficient to stop at each point of discontinuity and restart the integrator with a readjusted ODE model, using a call to ARKStepReInit(). To stop when the location of the discontinuity is known, simply make that location a value of tout. To stop when the location of the discontinuity is determined by the solution, use the rootfinding feature. In either case, it is critical that the RHS functions not incorporate the discontinuity, but rather have a smooth extension over the discontinuity, so that the step across it (and subsequent rootfinding, if used) can be done efficiently.
Then use a switch within the RHS functions (communicated through user_data) that can be flipped between the stopping of the integration and the restart, so that the restarted problem uses the new values (which have jumped). Similar comments apply if there is to be a jump in the dependent variable vector.

int ARKStepReInit(void* arkode_mem, ARKRhsFn fe, ARKRhsFn fi, realtype t0, N_Vector y0)
Provides required problem specifications and re-initializes the ARKStep time-stepper module.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• fe – the name of the C function (of type ARKRhsFn()) defining the explicit portion of the right-hand side function in $M \dot{y} = f_E(t, y) + f_I(t, y)$.
• fi – the name of the C function (of type ARKRhsFn()) defining the implicit portion of the right-hand side function in $M \dot{y} = f_E(t, y) + f_I(t, y)$.
• t0 – the initial value of $t$.
• y0 – the initial condition vector $y(t_0)$.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_MEM_FAIL if a memory allocation failed
• ARK_ILL_INPUT if an argument has an illegal value.
Notes: If an error occurred, ARKStepReInit() also sends an error message to the error handler function.

4.5.12 ARKStep system resize function
For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when using spatially-adaptive PDE simulations under a method-of-lines approach), the ARKStep integrator may be “resized” between integration steps, through calls to the ARKStepResize() function. This function modifies ARKStep’s internal memory structures to use the new problem size, without destruction of the temporal adaptivity heuristics. It is assumed that the dynamical time scales before and after the vector resize will be comparable, so that all timestepping heuristics prior to calling ARKStepResize() remain valid after the call.
If instead the dynamics should be recomputed from scratch, the ARKStep memory structure should be deleted with a call to ARKStepFree(), and recreated with a call to ARKStepCreate().

To aid in the vector resize operation, the user can supply a vector resize function that will take as input a vector with the previous size, and transform it in-place to return a corresponding vector of the new size. If this function (of type ARKVecResizeFn()) is not supplied (i.e. is set to NULL), then all existing vectors internal to ARKStep will be destroyed and re-cloned from the new input vector.

In the case that the dynamical time scale should be modified slightly from the previous time scale, an input hscale is allowed, that will rescale the upcoming time step by the specified factor. If a value hscale ≤ 0 is specified, the default of 1.0 will be used.

int ARKStepResize(void* arkode_mem, N_Vector ynew, realtype hscale, realtype t0, ARKVecResizeFn resize, void* resize_data)
Re-initializes ARKStep with a different state vector but with comparable dynamical time scale.
Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• ynew – the newly-sized solution vector, holding the current dependent variable values $y(t_0)$.
• hscale – the desired scaling factor for the dynamical time scale (i.e. the next step will be of size h*hscale).
• t0 – the current value of the independent variable $t_0$ (this must be consistent with ynew).
• resize – the user-supplied vector resize function (of type ARKVecResizeFn()).
• resize_data – the user-supplied data structure to be passed to resize when modifying internal ARKStep vectors.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ARKStep memory was NULL
• ARK_NO_MALLOC if arkode_mem was not allocated.
• ARK_ILL_INPUT if an argument has an illegal value.
Notes: If an error occurred, ARKStepResize() also sends an error message to the error handler function.

Resizing the linear solver
When using any of the SUNDIALS-provided linear solver modules, the linear solver memory structures must also be resized. At present, none of these include a solver-specific ‘resize’ function, so the linear solver memory must be destroyed and re-allocated following each call to ARKStepResize(). Moreover, the existing ARKLS interface should then be deleted and recreated by attaching the updated SUNLinearSolver (and possibly SUNMatrix) object(s) through calls to ARKStepSetLinearSolver() and ARKStepSetMassLinearSolver(). If any user-supplied routines are provided to aid the linear solver (e.g. Jacobian construction, Jacobian-vector product, mass-matrix-vector product, preconditioning), then the corresponding “set” routines must be called again following the solver re-specification.

Resizing the absolute tolerance array
If using array-valued absolute tolerances, the absolute tolerance vector will be invalid after the call to ARKStepResize(), so the new absolute tolerance vector should be re-set following each call to ARKStepResize() through a new call to ARKStepSVtolerances() (and similarly to ARKStepResVtolerance() if that was used for the original problem). If scalar-valued tolerances or a tolerance function was specified through either ARKStepSStolerances() or ARKStepWFtolerances(), then these will remain valid and no further action is necessary.

Note: For an example of ARKStepResize() usage, see the supplied serial C example problem, ark_heat1D_adapt.c.

4.6 User-supplied functions
The user-supplied functions for ARKStep consist of:
• at least one function defining the ODE (required),
• a function that handles error and warning messages (optional),
• a function that provides the error weight vector (optional),
• a function that provides the residual weight vector (optional),
• a function that handles adaptive time step error control (optional),
• a function that handles explicit time step stability (optional),
• a function that defines the root-finding problem(s) to solve (optional),
• one or two functions that provide Jacobian-related information for the linear solver, if a Newton-based nonlinear iteration is chosen (optional),
• one or two functions that define the preconditioner for use in any of the Krylov iterative algorithms, if a Newton-based nonlinear iteration and iterative linear solver are chosen (optional), and
• if the problem involves a non-identity mass matrix $M \neq I$:
  – one or two functions that provide mass-matrix-related information for the linear and mass matrix solvers (required),
  – one or two functions that define the mass matrix preconditioner, for use if an iterative mass matrix solver is chosen (optional), and
• a function that handles vector resizing operations, if the underlying vector structure supports resizing (as opposed to deletion/recreation), and if the user plans to call ARKStepResize() (optional).

4.6.1 ODE right-hand side
The user must supply at least one function of type ARKRhsFn to specify the explicit and/or implicit portions of the ODE system:
typedef int (*ARKRhsFn)(realtype t, N_Vector y, N_Vector ydot, void* user_data)
These functions compute the ODE right-hand side for a given value of the independent variable $t$ and state vector $y$.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• ydot – the output vector that forms a portion of the ODE RHS $f_E(t, y) + f_I(t, y)$.
• user_data – the user_data pointer that was passed to ARKStepSetUserData().
Return value: An ARKRhsFn should return 0 if successful, a positive value if a recoverable error occurred (in which case ARKStep will attempt to correct), or a negative value if it failed unrecoverably (in which case the integration is halted and ARK_RHSFUNC_FAIL is returned).
Notes: Allocation of memory for ydot is handled within the ARKStep module. A recoverable error return from the ARKRhsFn is typically used to flag a value of the dependent variable $y$ that is “illegal” in some way (e.g., negative where only a non-negative value is physically meaningful). If such a return is made, ARKStep will attempt to recover (possibly repeating the nonlinear iteration, or reducing the step size) in order to avoid this recoverable error return. There are some situations in which recovery is not possible even if the right-hand side function returns a recoverable error flag. One is when this occurs at the very first call to the ARKRhsFn (in which case ARKStep returns ARK_FIRST_RHSFUNC_ERR). Another is when a recoverable error is reported by ARKRhsFn after the integrator completes a successful stage (in which case ARKStep returns ARK_UNREC_RHSFUNC_ERR).

4.6.2 Error message handler function
As an alternative to the default behavior of directing error and warning messages to the file pointed to by errfp (see ARKStepSetErrFile()), the user may provide a function of type ARKErrHandlerFn to process any such messages.
typedef void (*ARKErrHandlerFn)(int error_code, const char* module, const char* function, char* msg, void* user_data)
This function processes error and warning messages from ARKStep and its sub-modules.
Arguments:
• error_code – the error code.
• module – the name of the ARKStep module reporting the error.
• function – the name of the function in which the error occurred.
• msg – the error message.
• user_data – a pointer to user data, the same as the eh_data parameter that was passed to ARKStepSetErrHandlerFn().
Return value: An ARKErrHandlerFn function has no return value.
Notes: error_code is negative for errors and positive (ARK_WARNING) for warnings. If a function that returns a pointer to memory encounters an error, it sets error_code to 0.

4.6.3 Error weight function
As an alternative to providing the relative and absolute tolerances, the user may provide a function of type ARKEwtFn to compute a vector ewt containing the weights in the WRMS norm $\|v\|_{WRMS} = \left( \frac{1}{n} \sum_{i=1}^{n} (ewt_i \, v_i)^2 \right)^{1/2}$. These weights will be used in place of those defined in the section Error norms.
typedef int (*ARKEwtFn)(N_Vector y, N_Vector ewt, void* user_data)
This function computes the WRMS error weights for the vector $y$.
Arguments:
• y – the dependent variable vector at which the weight vector is to be computed.
• ewt – the output vector containing the error weights.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
Return value: An ARKEwtFn function must return 0 if it successfully set the error weights, and -1 otherwise.
Notes: Allocation of memory for ewt is handled within ARKStep. The error weight vector must have all components positive. It is the user’s responsibility to perform this test and return -1 if it is not satisfied.

4.6.4 Residual weight function
As an alternative to providing the scalar or vector absolute residual tolerances (when the IVP units differ from the solution units), the user may provide a function of type ARKRwtFn to compute a vector rwt containing the weights in the WRMS norm $\|v\|_{WRMS} = \left( \frac{1}{n} \sum_{i=1}^{n} (rwt_i \, v_i)^2 \right)^{1/2}$. These weights will be used in place of those defined in the section Error norms.
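As a rough illustration of what a weight function and the WRMS norm compute, the following C sketch works on plain double arrays rather than N_Vector objects, so that it compiles without SUNDIALS. The names demo_set_weights and demo_wrms_sq are hypothetical, and the weight formula 1/(rtol*|y_i| + atol) is only an assumed example choice; a real ARKEwtFn or ARKRwtFn would use the typedef signature given in this section.

```c
#include <stddef.h>

/* Hypothetical weight routine on raw arrays (stand-in for N_Vector data).
 * Sets wt[i] = 1 / (rtol*|y[i]| + atol), an assumed example choice.
 * Returns 0 on success and -1 if any weight would not be positive,
 * which is the test the user is responsible for performing. */
int demo_set_weights(const double *y, double *wt, size_t n,
                     double rtol, double atol)
{
  size_t i;
  for (i = 0; i < n; i++) {
    double ay = (y[i] < 0.0) ? -y[i] : y[i];
    double denom = rtol * ay + atol;
    if (!(denom > 0.0)) return -1;   /* also rejects NaN */
    wt[i] = 1.0 / denom;
  }
  return 0;
}

/* Square of the WRMS norm: (1/n) * sum_i (wt[i]*v[i])^2.
 * Comparing this value with 1 is equivalent to comparing the norm
 * itself with 1, so no square root is needed for such a test. */
double demo_wrms_sq(const double *v, const double *wt, size_t n)
{
  double sum = 0.0;
  size_t i;
  if (n == 0) return 0.0;
  for (i = 0; i < n; i++) {
    double s = wt[i] * v[i];
    sum += s * s;
  }
  return sum / (double)n;
}
```

In an actual weight function the same loop would run over the data of the y and ewt (or rwt) vectors, and the -1 return would tell ARKStep that the weights could not be set.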
typedef int (*ARKRwtFn)(N_Vector y, N_Vector rwt, void* user_data)
This function computes the WRMS residual weights for the vector $y$.
Arguments:
• y – the dependent variable vector at which the weight vector is to be computed.
• rwt – the output vector containing the residual weights.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
Return value: An ARKRwtFn function must return 0 if it successfully set the residual weights, and -1 otherwise.
Notes: Allocation of memory for rwt is handled within ARKStep. The residual weight vector must have all components positive. It is the user’s responsibility to perform this test and return -1 if it is not satisfied.

4.6.5 Time step adaptivity function
As an alternative to using one of the built-in time step adaptivity methods for controlling solution error, the user may provide a function of type ARKAdaptFn to compute a target step size $h$ for the next integration step. These steps should be chosen as the maximum value such that the error estimates remain below 1.
typedef int (*ARKAdaptFn)(N_Vector y, realtype t, realtype h1, realtype h2, realtype h3, realtype e1, realtype e2, realtype e3, int q, int p, realtype* hnew, void* user_data)
This function implements a time step adaptivity algorithm that chooses $h$ satisfying the error tolerances.
Arguments:
• y – the current value of the dependent variable vector.
• t – the current value of the independent variable.
• h1 – the current step size, $t_n - t_{n-1}$.
• h2 – the previous step size, $t_{n-1} - t_{n-2}$.
• h3 – the step size $t_{n-2} - t_{n-3}$.
• e1 – the error estimate from the current step, $n$.
• e2 – the error estimate from the previous step, $n - 1$.
• e3 – the error estimate from the step $n - 2$.
• q – the global order of accuracy for the method.
• p – the global order of accuracy for the embedded method.
• hnew – the output value of the next step size.
• user_data – a pointer to user data, the same as the h_data parameter that was passed to ARKStepSetAdaptivityFn().
Return value: An ARKAdaptFn function should return 0 if it successfully set the next step size, and a non-zero value otherwise.

4.6.6 Explicit stability function
A user may supply a function to predict the maximum stable step size for the explicit portion of the ImEx system, $f_E(t, y)$. While the accuracy-based time step adaptivity algorithms may be sufficient for retaining a stable solution to the ODE system, these may be inefficient if $f_E(t, y)$ contains moderately stiff terms. In this scenario, a user may provide a function of type ARKExpStabFn to provide this stability information to ARKStep. This function must set the scalar step size satisfying the stability restriction for the upcoming time step. This value will subsequently be bounded by the user-supplied values for the minimum and maximum allowed time step, and the accuracy-based time step.
typedef int (*ARKExpStabFn)(N_Vector y, realtype t, realtype* hstab, void* user_data)
This function predicts the maximum stable step size for the explicit portions of the ImEx ODE system.
Arguments:
• y – the current value of the dependent variable vector.
• t – the current value of the independent variable.
• hstab – the output value with the absolute value of the maximum stable step size.
• user_data – a pointer to user data, the same as the estab_data parameter that was passed to ARKStepSetStabilityFn().
Return value: An ARKExpStabFn function should return 0 if it successfully set the upcoming stable step size, and a non-zero value otherwise.
Notes: If this function is not supplied, or if it returns hstab ≤ 0.0, then ARKStep will assume that there is no explicit stability restriction on the time step size.
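The body of such a stability function might look like the following sketch, which applies a CFL-type bound for a hypothetical method-of-lines advection problem. The struct DemoStabData, the function demo_exp_stab, and the formula cfl*dx/max|y_i| are all assumptions made for illustration, and a raw double array stands in for the N_Vector argument so that the example compiles without SUNDIALS.

```c
#include <stddef.h>

/* Hypothetical user data for a method-of-lines advection problem. */
typedef struct {
  double dx;   /* uniform mesh spacing (assumed) */
  double cfl;  /* target CFL number, e.g. 0.5 (assumed) */
} DemoStabData;

/* Stand-in for the body of an ARKExpStabFn, using a raw array in place
 * of N_Vector.  Sets *hstab = cfl*dx / max_i |y[i]|, treating y as the
 * advection speed.  If the speed is zero everywhere, *hstab is set to
 * 0.0, which signals "no explicit stability restriction".
 * Returns 0 on success and a non-zero value on bad input. */
int demo_exp_stab(const double *y, size_t n, double t,
                  double *hstab, void *user_data)
{
  const DemoStabData *d = (const DemoStabData *)user_data;
  double vmax = 0.0;
  size_t i;
  (void)t;  /* this estimate does not depend on time */
  if (d == NULL || hstab == NULL || d->dx <= 0.0 || d->cfl <= 0.0)
    return 1;
  for (i = 0; i < n; i++) {
    double ay = (y[i] < 0.0) ? -y[i] : y[i];
    if (ay > vmax) vmax = ay;
  }
  *hstab = (vmax > 0.0) ? d->cfl * d->dx / vmax : 0.0;
  return 0;
}
```

Returning a positive value in hstab keeps to the convention that the output is the absolute value of the maximum stable step; the zero-speed branch relies on the hstab ≤ 0.0 rule stated in the notes above.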
4.6.7 Rootfinding function
If a rootfinding problem is to be solved during the integration of the ODE system, the user must supply a function of type ARKRootFn.
typedef int (*ARKRootFn)(realtype t, N_Vector y, realtype* gout, void* user_data)
This function implements a vector-valued function $g(t, y)$ such that the roots of the nrtfn components $g_i(t, y)$ are sought.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• gout – the output array, of length nrtfn, with components $g_i(t, y)$.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
Return value: An ARKRootFn function should return 0 if successful or a non-zero value if an error occurred (in which case the integration is halted and ARKStep returns ARK_RTFUNC_FAIL).
Notes: Allocation of memory for gout is handled within ARKStep.

4.6.8 Jacobian construction (matrix-based linear solvers)
If a matrix-based linear solver module is used (i.e., a non-NULL SUNMatrix object was supplied to ARKStepSetLinearSolver() in section A skeleton of the user’s main program), the user may provide a function of type ARKLsJacFn to provide the Jacobian approximation.
typedef int (*ARKLsJacFn)(realtype t, N_Vector y, N_Vector fy, SUNMatrix Jac, void* user_data, N_Vector tmp1, N_Vector tmp2, N_Vector tmp3)
This function computes the Jacobian matrix $J = \frac{\partial f_I}{\partial y}$ (or an approximation to it).
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector, namely the predicted value of $y(t)$.
• fy – the current value of the vector $f_I(t, y)$.
• Jac – the output Jacobian matrix.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
• tmp1, tmp2, tmp3 – pointers to memory allocated to variables of type N_Vector which can be used by an ARKLsJacFn as temporary storage or work space.
Return value: An ARKLsJacFn function should return 0 if successful, a positive value if a recoverable error occurred (in which case ARKStep will attempt to correct, while ARKLS sets last_flag to ARKLS_JACFUNC_RECVR), or a negative value if it failed unrecoverably (in which case the integration is halted, ARKStepEvolve() returns ARK_LSETUP_FAIL and ARKLS sets last_flag to ARKLS_JACFUNC_UNRECVR).
Notes: Information regarding the structure of the specific SUNMatrix structure (e.g. number of rows, upper/lower bandwidth, sparsity type) may be obtained using the implementation-specific SUNMatrix interface functions (see the section Matrix Data Structures for details). Prior to calling the user-supplied Jacobian function, the Jacobian matrix $J(t, y)$ is zeroed out, so only nonzero elements need to be loaded into Jac. If the user’s ARKLsJacFn function uses difference quotient approximations, then it may need to access quantities not in the argument list. These include the current step size, the error weights, etc. To obtain these, the user will need to add a pointer to the ark_mem structure to their user_data, and then use the ARKStepGet* functions listed in Optional output functions. The unit roundoff can be accessed as UNIT_ROUNDOFF, which is defined in the header file sundials_types.h.

dense: A user-supplied dense Jacobian function must load the N by N dense matrix Jac with an approximation to the Jacobian matrix $J(t, y)$ at the point $(t, y)$. The accessor macros SM_ELEMENT_D and SM_COLUMN_D allow the user to read and write dense matrix elements without making explicit references to the underlying representation of the SUNMATRIX_DENSE type. SM_ELEMENT_D(J, i, j) references the (i,j)-th element of the dense matrix J (for i, j between 0 and N-1).
This macro is meant for small problems for which efficiency of access is not a major concern. Thus, in terms of the indices $m$ and $n$ ranging from 1 to N, the Jacobian element $J_{m,n}$ can be set using the statement SM_ELEMENT_D(J, m-1, n-1) = $J_{m,n}$. Alternatively, SM_COLUMN_D(J, j) returns a pointer to the first element of the j-th column of J (for j ranging from 0 to N-1), and the elements of the j-th column can then be accessed using ordinary array indexing. Consequently, $J_{m,n}$ can be loaded using the statements col_n = SM_COLUMN_D(J, n-1); col_n[m-1] = $J_{m,n}$. For large problems, it is more efficient to use SM_COLUMN_D than to use SM_ELEMENT_D. Note that both of these macros number rows and columns starting from 0. The SUNMATRIX_DENSE type and accessor macros are documented in section The SUNMATRIX_DENSE Module.

band: A user-supplied banded Jacobian function must load the band matrix Jac with the elements of the Jacobian $J(t, y)$ at the point $(t, y)$. The accessor macros SM_ELEMENT_B, SM_COLUMN_B, and SM_COLUMN_ELEMENT_B allow the user to read and write band matrix elements without making specific references to the underlying representation of the SUNMATRIX_BAND type. SM_ELEMENT_B(J, i, j) references the (i,j)-th element of the band matrix J, counting from 0. This macro is meant for use in small problems for which efficiency of access is not a major concern. Thus, in terms of the indices $m$ and $n$ ranging from 1 to N with $(m, n)$ within the band defined by mupper and mlower, the Jacobian element $J_{m,n}$ can be loaded using the statement SM_ELEMENT_B(J, m-1, n-1) = $J_{m,n}$. The elements within the band are those with -mupper ≤ $m - n$ ≤ mlower. Alternatively, SM_COLUMN_B(J, j) returns a pointer to the diagonal element of the j-th column of J, and if we assign this address to realtype *col_j, then the i-th element of the j-th column is given by SM_COLUMN_ELEMENT_B(col_j, i, j), counting from 0.
Thus, for $(m, n)$ within the band, $J_{m,n}$ can be loaded by setting col_n = SM_COLUMN_B(J, n-1); SM_COLUMN_ELEMENT_B(col_n, m-1, n-1) = $J_{m,n}$. The elements of the j-th column can also be accessed via ordinary array indexing, but this approach requires knowledge of the underlying storage for a band matrix of type SUNMATRIX_BAND. The array col_n can be indexed from -mupper to mlower. For large problems, it is more efficient to use SM_COLUMN_B and SM_COLUMN_ELEMENT_B than to use the SM_ELEMENT_B macro. As in the dense case, these macros all number rows and columns starting from 0. The SUNMATRIX_BAND type and accessor macros are documented in section The SUNMATRIX_BAND Module.

sparse: A user-supplied sparse Jacobian function must load the compressed-sparse-column (CSC) or compressed-sparse-row (CSR) matrix Jac with an approximation to the Jacobian matrix $J(t, y)$ at the point $(t, y)$. Storage for Jac already exists on entry to this function, although the user should ensure that sufficient space is allocated in Jac to hold the nonzero values to be set; if the existing space is insufficient the user may reallocate the data and index arrays as needed. The amount of allocated space in a SUNMATRIX_SPARSE object may be accessed using the macro SM_NNZ_S or the routine SUNSparseMatrix_NNZ(). The SUNMATRIX_SPARSE type is further documented in the section The SUNMATRIX_SPARSE Module.

4.6.9 Jacobian-vector product (matrix-free linear solvers)
When using a matrix-free linear solver module for the implicit stage solves (i.e., a NULL-valued SUNMATRIX argument was supplied to ARKStepSetLinearSolver() in the section A skeleton of the user’s main program), the user may provide a function of type ARKLsJacTimesVecFn in the following form, to compute matrix-vector products $Jv$. If such a function is not supplied, the default is a difference quotient approximation to these products.
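To make the idea of a difference quotient approximation concrete, the sketch below forms $Jv \approx (f_I(t, y + \sigma v) - f_I(t, y)) / \sigma$ on plain double arrays. It is not ARKStep's internal routine: the names DemoRhsFn, demo_dq_jtimes and demo_linear_rhs are hypothetical, the increment sigma is simply passed in by the caller (how it should be chosen is left open here), and raw arrays replace N_Vector so that the code compiles on its own.

```c
#include <stddef.h>

/* Hypothetical RHS callback on raw arrays (stand-in for ARKRhsFn). */
typedef int (*DemoRhsFn)(double t, const double *y, double *ydot,
                         size_t n, void *user_data);

/* One-sided difference quotient: Jv ~ (f(t, y + sigma*v) - fy) / sigma.
 * fy must already hold f(t, y); tmp is caller-provided work space of
 * length n.  Returns 0 on success, -1 for sigma == 0, or the non-zero
 * flag returned by the RHS callback. */
int demo_dq_jtimes(const double *v, double *Jv, double t,
                   const double *y, const double *fy, size_t n,
                   double sigma, DemoRhsFn fi, void *user_data,
                   double *tmp)
{
  size_t i;
  int flag;
  if (sigma == 0.0) return -1;
  for (i = 0; i < n; i++) tmp[i] = y[i] + sigma * v[i];  /* perturbed state */
  flag = fi(t, tmp, Jv, n, user_data);                   /* Jv = f(t, tmp)  */
  if (flag != 0) return flag;
  for (i = 0; i < n; i++) Jv[i] = (Jv[i] - fy[i]) / sigma;
  return 0;
}

/* Example RHS for testing: the linear system f(y) = A*y with
 * A = [[-2, 1], [1, -3]], for which the quotient is exact up to
 * roundoff. */
int demo_linear_rhs(double t, const double *y, double *ydot,
                    size_t n, void *user_data)
{
  (void)t; (void)user_data;
  if (n != 2) return -1;
  ydot[0] = -2.0 * y[0] + y[1];
  ydot[1] = y[0] - 3.0 * y[1];
  return 0;
}
```

Because fy is supplied by the caller, each product costs one additional right-hand side evaluation; a user-supplied analytic product avoids that evaluation and the truncation error of the quotient.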
typedef int (*ARKLsJacTimesVecFn)(N_Vector v, N_Vector Jv, realtype t, N_Vector y, N_Vector fy, void* user_data, N_Vector tmp)
This function computes the product $Jv = \left( \frac{\partial f_I}{\partial y} \right) v$ (or an approximation to it).
Arguments:
• v – the vector to multiply.
• Jv – the output vector computed.
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• fy – the current value of the vector $f_I(t, y)$.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
• tmp – pointer to memory allocated to a variable of type N_Vector which can be used as temporary storage or work space.
Return value: The value to be returned by the Jacobian-vector product function should be 0 if successful. Any other return value will result in an unrecoverable error of the generic Krylov solver, in which case the integration is halted.
Notes: If the user’s ARKLsJacTimesVecFn function uses difference quotient approximations, it may need to access quantities not in the argument list. These include the current step size, the error weights, etc. To obtain these, the user will need to add a pointer to the ark_mem structure to their user_data, and then use the ARKStepGet* functions listed in Optional output functions. The unit roundoff can be accessed as UNIT_ROUNDOFF, which is defined in the header file sundials_types.h.
4.6.10 Jacobian-vector product setup (matrix-free linear solvers)
If the user’s Jacobian-times-vector routine requires that any Jacobian-related data be preprocessed or evaluated, then this needs to be done in a user-supplied function of type ARKLsJacTimesSetupFn, defined as follows:
typedef int (*ARKLsJacTimesSetupFn)(realtype t, N_Vector y, N_Vector fy, void* user_data)
This function preprocesses and/or evaluates any Jacobian-related data needed by the Jacobian-times-vector routine.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• fy – the current value of the vector $f_I(t, y)$.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
Return value: The value to be returned by the Jacobian-vector setup function should be 0 if successful, positive for a recoverable error (in which case the step will be retried), or negative for an unrecoverable error (in which case the integration is halted).
Notes: Each call to the Jacobian-vector setup function is preceded by a call to the implicit ARKRhsFn user function with the same $(t, y)$ arguments. Thus, the setup function can use any auxiliary data that is computed and saved during the evaluation of the implicit ODE right-hand side. If the user’s ARKLsJacTimesSetupFn function uses difference quotient approximations, it may need to access quantities not in the argument list. These include the current step size, the error weights, etc. To obtain these, the user will need to add a pointer to the ark_mem structure to their user_data, and then use the ARKStepGet* functions listed in Optional output functions. The unit roundoff can be accessed as UNIT_ROUNDOFF, which is defined in the header file sundials_types.h.
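The division of labor between the setup routine and the product routine can be sketched as follows. Here the setup step caches the diagonal of the Jacobian for a hypothetical componentwise reaction term $f_I(y)_i = -k\,y_i^3$, and the product step reuses that cache. The struct DemoJacData and both function names are assumptions for illustration, and raw double arrays replace the N_Vector arguments so that the example stands alone.

```c
#include <stddef.h>

/* Hypothetical user data shared by the setup and product routines. */
typedef struct {
  double k;       /* assumed reaction coefficient in f_I(y)_i = -k*y_i^3 */
  size_t n;       /* number of unknowns */
  double *jdiag;  /* cached diagonal of J, filled by the setup routine */
} DemoJacData;

/* Stand-in for an ARKLsJacTimesSetupFn body: evaluates and stores the
 * Jacobian diagonal d f_I,i / d y_i = -3*k*y_i^2 at the current state.
 * Returns 0 on success, negative (unrecoverable) if the cache is missing. */
int demo_jtimes_setup(double t, const double *y, const double *fy,
                      void *user_data)
{
  DemoJacData *d = (DemoJacData *)user_data;
  size_t i;
  (void)t; (void)fy;  /* not needed for this particular Jacobian */
  if (d == NULL || d->jdiag == NULL) return -1;
  for (i = 0; i < d->n; i++) d->jdiag[i] = -3.0 * d->k * y[i] * y[i];
  return 0;
}

/* Stand-in for an ARKLsJacTimesVecFn body: applies the cached diagonal,
 * Jv[i] = jdiag[i] * v[i].  Returns 0 on success, non-zero otherwise. */
int demo_jtimes(const double *v, double *Jv, void *user_data)
{
  const DemoJacData *d = (const DemoJacData *)user_data;
  size_t i;
  if (d == NULL || d->jdiag == NULL) return -1;
  for (i = 0; i < d->n; i++) Jv[i] = d->jdiag[i] * v[i];
  return 0;
}
```

Keeping the state-dependent work in the setup routine means that repeated products within a single linear solve only perform the cheap multiply.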
4.6.11 Preconditioner solve (iterative linear solvers)
If a user-supplied preconditioner is to be used with a SUNLinSol solver module, then the user must provide a function of type ARKLsPrecSolveFn to solve the linear system $Pz = r$, where $P$ corresponds to either a left or right preconditioning matrix. Here $P$ should approximate (at least crudely) the Newton matrix $A = M - \gamma J$, where $M$ is the mass matrix (typically $M = I$ unless working in a finite-element setting) and $J = \frac{\partial f_I}{\partial y}$. If preconditioning is done on both sides, the product of the two preconditioner matrices should approximate $A$.
typedef int (*ARKLsPrecSolveFn)(realtype t, N_Vector y, N_Vector fy, N_Vector r, N_Vector z, realtype gamma, realtype delta, int lr, void* user_data)
This function solves the preconditioner system $Pz = r$.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• fy – the current value of the vector $f_I(t, y)$.
• r – the right-hand side vector of the linear system.
• z – the computed output solution vector.
• gamma – the scalar $\gamma$ appearing in the Newton matrix given by $A = M - \gamma J$.
• delta – an input tolerance to be used if an iterative method is employed in the solution. In that case, the residual vector $Res = r - Pz$ of the system should be made to be less than delta in the weighted $l_2$ norm, i.e. $\left( \sum_{i=1}^{n} (Res_i \cdot ewt_i)^2 \right)^{1/2} < \delta$, where $\delta$ = delta. To obtain the N_Vector ewt, call ARKStepGetErrWeights().
• lr – an input flag indicating whether the preconditioner solve is to use the left preconditioner (lr = 1) or the right preconditioner (lr = 2).
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
Return value: The value to be returned by the preconditioner solve function is a flag indicating whether it was successful.
This value should be 0 if successful, positive for a recoverable error (in which case the step will be retried), or negative for an unrecoverable error (in which case the integration is halted).

4.6.12 Preconditioner setup (iterative linear solvers)
If the user’s preconditioner routine requires that any data be preprocessed or evaluated, then these actions need to occur within a user-supplied function of type ARKLsPrecSetupFn.
typedef int (*ARKLsPrecSetupFn)(realtype t, N_Vector y, N_Vector fy, booleantype jok, booleantype* jcurPtr, realtype gamma, void* user_data)
This function preprocesses and/or evaluates Jacobian-related data needed by the preconditioner.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• fy – the current value of the vector $f_I(t, y)$.
• jok – an input flag indicating whether the Jacobian-related data needs to be updated. The jok argument provides for the reuse of Jacobian data in the preconditioner solve function. When jok = SUNFALSE, the Jacobian-related data should be recomputed from scratch. When jok = SUNTRUE the Jacobian data, if saved from the previous call to this function, can be reused (with the current value of gamma). A call with jok = SUNTRUE can only occur after a call with jok = SUNFALSE.
• jcurPtr – a pointer to a flag which should be set to SUNTRUE if Jacobian data was recomputed, or set to SUNFALSE if Jacobian data was not recomputed, but saved data was still reused.
• gamma – the scalar $\gamma$ appearing in the Newton matrix given by $A = M - \gamma J$.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
Return value: The value to be returned by the preconditioner setup function is a flag indicating whether it was successful.
This value should be 0 if successful, positive for a recoverable error (in which case the step will be retried), or negative for an unrecoverable error (in which case the integration is halted).
Notes: The operations performed by this function might include forming a crude approximate Jacobian, and performing an LU factorization of the resulting approximation to $A = M - \gamma J$. Each call to the preconditioner setup function is preceded by a call to the implicit ARKRhsFn user function with the same $(t, y)$ arguments. Thus, the preconditioner setup function can use any auxiliary data that is computed and saved during the evaluation of the ODE right-hand side. This function is not called in advance of every call to the preconditioner solve function, but rather is called only as often as needed to achieve convergence in the Newton iteration. If the user’s ARKLsPrecSetupFn function uses difference quotient approximations, it may need to access quantities not in the call list. These include the current step size, the error weights, etc. To obtain these, the user will need to add a pointer to the ark_mem structure to their user_data, and then use the ARKStepGet* functions listed in Optional output functions. The unit roundoff can be accessed as UNIT_ROUNDOFF, which is defined in the header file sundials_types.h.

4.6.13 Mass matrix construction (matrix-based linear solvers)
If a matrix-based mass-matrix linear solver is used (i.e., a non-NULL SUNMATRIX was supplied to ARKStepSetMassLinearSolver() in the section A skeleton of the user’s main program), the user must provide a function of type ARKLsMassFn to provide the mass matrix approximation.
typedef int (*ARKLsMassFn)(realtype t, SUNMatrix M, void* user_data, N_Vector tmp1, N_Vector tmp2, N_Vector tmp3)
This function computes the mass matrix $M$ (or an approximation to it).
Arguments:
• t – the current value of the independent variable.
• M – the output mass matrix.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().
• tmp1, tmp2, tmp3 – pointers to memory allocated to variables of type N_Vector which can be used by an ARKLsMassFn as temporary storage or work space.

Return value: An ARKLsMassFn function should return 0 if successful, or a negative value if it failed unrecoverably (in which case the integration is halted, ARKStepEvolve() returns ARK_MASSSETUP_FAIL and ARKLS sets last_flag to ARKLS_MASSFUNC_UNRECVR).

Notes: Information regarding the structure of the specific SUNMatrix structure (e.g. number of rows, upper/lower bandwidth, sparsity type) may be obtained using the implementation-specific SUNMatrix interface functions (see the section Matrix Data Structures for details).

Prior to calling the user-supplied mass matrix function, the mass matrix 𝑀 is zeroed out, so only nonzero elements need to be loaded into M.

dense: A user-supplied dense mass matrix function must load the N by N dense matrix M with an approximation to the mass matrix 𝑀. As discussed above in section Jacobian construction (matrix-based linear solvers), the accessor macros SM_ELEMENT_D and SM_COLUMN_D allow the user to read and write dense matrix elements without making explicit references to the underlying representation of the SUNMATRIX_DENSE type. The SUNMATRIX_DENSE type and these accessor macros are documented in the section The SUNMATRIX_DENSE Module.

band: A user-supplied banded mass matrix function must load the band matrix M with the elements of the mass matrix 𝑀.
As discussed above in section Jacobian construction (matrix-based linear solvers), the accessor macros SM_ELEMENT_B, SM_COLUMN_B, and SM_COLUMN_ELEMENT_B allow the user to read and write band matrix elements without making specific references to the underlying representation of the SUNMATRIX_BAND type. The SUNMATRIX_BAND type and these accessor macros are documented in the section The SUNMATRIX_BAND Module.

sparse: A user-supplied sparse mass matrix function must load the compressed-sparse-column (CSC) or compressed-sparse-row (CSR) matrix M with an approximation to the mass matrix 𝑀. Storage for M already exists on entry to this function, although the user should ensure that sufficient space is allocated in M to hold the nonzero values to be set; if the existing space is insufficient the user may reallocate the data and index arrays as needed. The type of M is SUNMATRIX_SPARSE, and the amount of allocated space in a SUNMATRIX_SPARSE object may be accessed using the macro SM_NNZ_S or the routine SUNSparseMatrix_NNZ(). The SUNMATRIX_SPARSE type is further documented in the section The SUNMATRIX_SPARSE Module.

4.6.14 Mass matrix-vector product (matrix-free linear solvers)

If a matrix-free linear solver is to be used for mass-matrix linear systems (i.e., a NULL-valued SUNMATRIX argument was supplied to ARKStepSetMassLinearSolver() in the section A skeleton of the user’s main program), the user must provide a function of type ARKLsMassTimesVecFn in the following form, to compute matrix-vector products 𝑀𝑣.

typedef int (*ARKLsMassTimesVecFn)(N_Vector v, N_Vector Mv, realtype t, void* mtimes_data)

This function computes the product 𝑀 * 𝑣 (or an approximation to it).

Arguments:
• v – the vector to multiply.
• Mv – the output vector computed.
• t – the current value of the independent variable.
• mtimes_data – a pointer to user data, the same as the mtimes_data parameter that was passed to ARKStepSetMassTimes().

Return value: The value to be returned by the mass-matrix-vector product function should be 0 if successful. Any other return value will result in an unrecoverable error of the generic Krylov solver, in which case the integration is halted.

4.6.15 Mass matrix-vector product setup (matrix-free linear solvers)

If the user’s mass-matrix-times-vector routine requires that any mass matrix-related data be preprocessed or evaluated, then this needs to be done in a user-supplied function of type ARKLsMassTimesSetupFn, defined as follows:

typedef int (*ARKLsMassTimesSetupFn)(realtype t, void* mtimes_data)

This function preprocesses and/or evaluates any mass-matrix-related data needed by the mass-matrix-times-vector routine.

Arguments:
• t – the current value of the independent variable.
• mtimes_data – a pointer to user data, the same as the mtimes_data parameter that was passed to ARKStepSetMassTimes().

Return value: The value to be returned by the mass-matrix-vector setup function should be 0 if successful. Any other return value will result in an unrecoverable error of the ARKLS mass matrix solver interface, in which case the integration is halted.

4.6.16 Mass matrix preconditioner solve (iterative linear solvers)

If a user-supplied preconditioner is to be used with a SUNLINSOL solver module for mass matrix linear systems, then the user must provide a function of type ARKLsMassPrecSolveFn to solve the linear system 𝑃𝑧 = 𝑟, where 𝑃 may be either a left or right preconditioning matrix. Here 𝑃 should approximate (at least crudely) the mass matrix 𝑀. If preconditioning is done on both sides, the product of the two preconditioner matrices should approximate 𝑀.

typedef int (*ARKLsMassPrecSolveFn)(realtype t, N_Vector r, N_Vector z, realtype delta, int lr, void* user_data)

This function solves the preconditioner system 𝑃𝑧 = 𝑟.

Arguments:
• t – the current value of the independent variable.
• r – the right-hand side vector of the linear system.
• z – the computed output solution vector.
• delta – an input tolerance to be used if an iterative method is employed in the solution. In that case, the residual vector 𝑅𝑒𝑠 = 𝑟 − 𝑃𝑧 of the system should be made to be less than delta in the weighted 𝑙2 norm, i.e. $\left( \sum_{i=1}^{n} (Res_i \cdot ewt_i)^2 \right)^{1/2} < \delta$, where 𝛿 = delta. To obtain the N_Vector ewt, call ARKStepGetErrWeights().
• lr – an input flag indicating whether the preconditioner solve is to use the left preconditioner (lr = 1) or the right preconditioner (lr = 2).
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().

Return value: The value to be returned by the preconditioner solve function is a flag indicating whether it was successful. This value should be 0 if successful, positive for a recoverable error (in which case the step will be retried), or negative for an unrecoverable error (in which case the integration is halted).

4.6.17 Mass matrix preconditioner setup (iterative linear solvers)

If the user’s mass matrix preconditioner above requires that any problem data be preprocessed or evaluated, then these actions need to occur within a user-supplied function of type ARKLsMassPrecSetupFn.

typedef int (*ARKLsMassPrecSetupFn)(realtype t, void* user_data)

This function preprocesses and/or evaluates mass-matrix-related data needed by the preconditioner.

Arguments:
• t – the current value of the independent variable.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ARKStepSetUserData().

Return value: The value to be returned by the mass matrix preconditioner setup function is a flag indicating whether it was successful.
This value should be 0 if successful, positive for a recoverable error (in which case the step will be retried), or negative for an unrecoverable error (in which case the integration is halted).

Notes: The operations performed by this function might include forming a mass matrix and performing an incomplete factorization of the result. Although such operations would typically be performed only once at the beginning of a simulation, they may need to be repeated if the mass matrix changes as a function of time.

If both this function and an ARKLsMassTimesSetupFn are supplied, all calls to this function will be preceded by a call to the ARKLsMassTimesSetupFn, so any setup performed there may be reused.

4.6.18 Vector resize function

For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when using spatial adaptivity in a PDE simulation), the ARKStep integrator may be “resized” between integration steps, through calls to the ARKStepResize() function. Typically, when performing adaptive simulations the solution is stored in a customized user-supplied data structure, to enable adaptivity without repeated allocation/deallocation of memory. In these scenarios, it is recommended that the user supply a customized vector kernel to interface between SUNDIALS and their problem-specific data structure. If this vector kernel includes a function of type ARKVecResizeFn to resize a given vector implementation, then this function may be supplied to ARKStepResize() so that all internal ARKStep vectors may be resized, instead of deleting and re-creating them at each call. This resize function should have the following form:

typedef int (*ARKVecResizeFn)(N_Vector y, N_Vector ytemplate, void* user_data)

This function resizes the vector y to match the dimensions of the supplied vector, ytemplate.

Arguments:
• y – the vector to resize.
• ytemplate – a vector of the desired size.
• user_data – a pointer to user data, the same as the resize_data parameter that was passed to ARKStepResize().

Return value: An ARKVecResizeFn function should return 0 if it successfully resizes the vector y, and a nonzero value otherwise.

Notes: If this function is not supplied, then ARKStep will instead destroy the vector y and clone a new vector y off of ytemplate.

4.7 Preconditioner modules

The efficiency of Krylov iterative methods for the solution of linear systems can be greatly enhanced through preconditioning. For problems in which the user cannot define a more effective, problem-specific preconditioner, ARKode provides two internal preconditioner modules that may be used by ARKStep: a banded preconditioner for serial and threaded problems (ARKBANDPRE) and a band-block-diagonal preconditioner for parallel problems (ARKBBDPRE).

4.7.1 A serial banded preconditioner module

This preconditioner provides a band matrix preconditioner for use with iterative SUNLINSOL modules through the ARKLS linear solver interface, in a serial or threaded setting. It requires that the problem be set up using either the NVECTOR_SERIAL, NVECTOR_OPENMP or NVECTOR_PTHREADS module, due to data access patterns. It also currently requires that the problem involve an identity mass matrix, i.e. 𝑀 = 𝐼.

This module uses difference quotients of the ODE right-hand side function 𝑓𝐼 to generate a band matrix of bandwidth ml + mu + 1, where the number of super-diagonals (mu, the upper half-bandwidth) and sub-diagonals (ml, the lower half-bandwidth) are specified by the user. This band matrix is used to form a preconditioner for the Krylov linear solver. Although this matrix is intended to approximate the Jacobian $J = \partial f_I / \partial y$, it may be a very crude approximation, since the true Jacobian may not be banded, or its true bandwidth may be larger than ml + mu + 1.
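The idea of building a band matrix from difference quotients of a right-hand side can be illustrated with a small standalone sketch. It is not the ARKBANDPRE implementation: the names rhs_fn, rhs_tridiag and banded_dq_jac, the dense row-major storage, and the choice of increment (the square root of the double-precision unit roundoff, scaled by max(|y_j|, 1)) are all assumptions made for illustration. Columns that lie ml + mu + 1 apart are perturbed together, so the cost is one base evaluation plus at most ml + mu + 1 further evaluations of f.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type for the right-hand side f(y); not a SUNDIALS type. */
typedef void (*rhs_fn)(int n, const double *y, double *f);

/* Hypothetical test problem: f_i = y_{i-1} - 2*y_i + y_{i+1},
   whose Jacobian is tridiagonal (mu = ml = 1). */
static void rhs_tridiag(int n, const double *y, double *f)
{
  for (int i = 0; i < n; i++) {
    double left  = (i > 0) ? y[i - 1] : 0.0;
    double right = (i < n - 1) ? y[i + 1] : 0.0;
    f[i] = left - 2.0 * y[i] + right;
  }
}

/* Fill the dense row-major n-by-n array J with a banded difference-quotient
   approximation of df/dy. Only entries (i, j) with -ml <= j - i <= mu are
   written; all other entries are left at zero. Returns the number of calls
   made to f, or -1 if memory allocation fails. */
static int banded_dq_jac(rhs_fn f, int n, int mu, int ml,
                         const double *y, double *J)
{
  assert(n > 0 && mu >= 0 && ml >= 0);
  /* Assumed relative increment: sqrt of the double-precision unit roundoff. */
  const double rel = 1.4901161193847656e-08;
  int width = ml + mu + 1;
  int ngroups = (width < n) ? width : n;

  double *work = malloc(3 * (size_t)n * sizeof *work);
  if (work == NULL) return -1;
  double *f0 = work, *f1 = work + n, *yp = work + 2 * n;

  memset(J, 0, (size_t)n * (size_t)n * sizeof *J);
  memcpy(yp, y, (size_t)n * sizeof *yp);
  f(n, y, f0);

  for (int g = 0; g < ngroups; g++) {
    /* Columns g, g+width, g+2*width, ... affect disjoint rows inside the
       band, so they can share a single evaluation of f. */
    for (int j = g; j < n; j += width) {
      double mag = (y[j] < 0.0) ? -y[j] : y[j];
      yp[j] += rel * ((mag > 1.0) ? mag : 1.0);
    }
    f(n, yp, f1);
    for (int j = g; j < n; j += width) {
      double inc = yp[j] - y[j];              /* increment actually applied */
      int ilo = (j - mu > 0) ? j - mu : 0;
      int ihi = (j + ml < n - 1) ? j + ml : n - 1;
      for (int i = ilo; i <= ihi; i++)
        J[i * n + j] = (f1[i] - f0[i]) / inc;
      yp[j] = y[j];                           /* restore for the next group */
    }
  }
  free(work);
  return 1 + ngroups;
}
```

Calling banded_dq_jac(rhs_tridiag, 5, 1, 1, y, J) on a length-5 vector fills only the three central diagonals of J. Entries of the true Jacobian that fall outside the chosen band are ignored by this sketch, so the result can be crude when the band is too narrow.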
However, as long as the banded approximation generated for the preconditioner is sufficiently accurate, it may speed convergence of the Krylov iteration.

ARKBANDPRE usage

In order to use the ARKBANDPRE module, the user need not define any additional functions. In addition to the header files required for the integration of the ODE problem (see the section Access to library and header files), to use the ARKBANDPRE module, the user’s program must include the header file arkode_bandpre.h which declares the needed function prototypes. The following is a summary of the usage of this module. Steps that are unchanged from the skeleton program presented in A skeleton of the user’s main program are italicized.

1. Initialize multi-threaded environment (if appropriate)
2. Set problem dimensions
3. Set vector of initial values
4. Create ARKStep object
5. Specify integration tolerances
6. Create iterative linear solver object
   When creating the iterative linear solver object, specify the type of preconditioning (PREC_LEFT or PREC_RIGHT) to use.
7. Set linear solver optional inputs
8. Attach linear solver module
9. Initialize the ARKBANDPRE preconditioner module
   Specify the upper and lower half-bandwidths (mu and ml, respectively) and call
   ier = ARKBandPrecInit(arkode_mem, N, mu, ml);
   to allocate memory and initialize the internal preconditioner data.
10. Set optional inputs
   Note that the user should not call ARKStepSetPreconditioner() as it will overwrite the preconditioner setup and solve functions.
11. Create nonlinear solver object
12. Attach nonlinear solver module
13. Set nonlinear solver optional inputs
14. Specify rootfinding problem
15. Advance solution in time
16. Get optional outputs
   Additional optional outputs associated with ARKBANDPRE are available by way of the two routines described below, ARKBandPrecGetWorkSpace() and ARKBandPrecGetNumRhsEvals().
17. Deallocate memory for solution vector
18. Free solver memory
19. Free linear solver memory

ARKBANDPRE user-callable functions

The ARKBANDPRE preconditioner module is initialized and attached by calling the following function:

int ARKBandPrecInit(void* arkode_mem, sunindextype N, sunindextype mu, sunindextype ml)

Initializes the ARKBANDPRE preconditioner and allocates required (internal) memory for it.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• N – problem dimension (size of ODE system).
• mu – upper half-bandwidth of the Jacobian approximation.
• ml – lower half-bandwidth of the Jacobian approximation.

Return value:
• ARKLS_SUCCESS if no errors occurred
• ARKLS_MEM_NULL if the ARKStep memory is NULL
• ARKLS_LMEM_NULL if the linear solver memory is NULL
• ARKLS_ILL_INPUT if an input has an illegal value
• ARKLS_MEM_FAIL if a memory allocation request failed

Notes: The banded approximate Jacobian will have nonzero elements only in locations (𝑖, 𝑗) with −ml ≤ 𝑗 − 𝑖 ≤ mu.

The following two optional output functions are available for use with the ARKBANDPRE module:

int ARKBandPrecGetWorkSpace(void* arkode_mem, long int* lenrwLS, long int* leniwLS)

Returns the sizes of the ARKBANDPRE real and integer workspaces.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lenrwLS – the number of realtype values in the ARKBANDPRE workspace.
• leniwLS – the number of integer values in the ARKBANDPRE workspace.

Return value:
• ARKLS_SUCCESS if no errors occurred
• ARKLS_MEM_NULL if the ARKStep memory is NULL
• ARKLS_LMEM_NULL if the linear solver memory is NULL
• ARKLS_PMEM_NULL if the preconditioner memory is NULL

Notes: The workspace requirements reported by this routine correspond only to memory allocated within the ARKBANDPRE module (the banded matrix approximation, banded SUNLinearSolver object, and temporary vectors).
The workspaces referred to here exist in addition to those given by the corresponding function ARKStepGetLSWorkspace().

int ARKBandPrecGetNumRhsEvals(void* arkode_mem, long int* nfevalsBP)

Returns the number of calls made to the user-supplied right-hand side function 𝑓𝐼 for constructing the finite-difference banded Jacobian approximation used within the preconditioner setup function.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• nfevalsBP – number of calls to 𝑓𝐼.

Return value:
• ARKLS_SUCCESS if no errors occurred
• ARKLS_MEM_NULL if the ARKStep memory is NULL
• ARKLS_LMEM_NULL if the linear solver memory is NULL
• ARKLS_PMEM_NULL if the preconditioner memory is NULL

Notes: The counter nfevalsBP is distinct from the counter nfevalsLS returned by the corresponding function ARKStepGetNumLSRhsEvals() and also from nfi_evals returned by ARKStepGetNumRhsEvals(). The total number of right-hand side function evaluations is the sum of all three of these counters, plus the nfe_evals counter for 𝑓𝐸 calls returned by ARKStepGetNumRhsEvals().

4.7.2 A parallel band-block-diagonal preconditioner module

A principal reason for using a parallel ODE solver (such as ARKode) lies in the solution of partial differential equations (PDEs). Moreover, Krylov iterative methods are used on many such problems due to the nature of the underlying linear system of equations that needs to be solved at each time step. For many PDEs, the linear algebraic system is large, sparse and structured. However, if a Krylov iterative method is to be effective in this setting, then a nontrivial preconditioner is required. Otherwise, the rate of convergence of the Krylov iterative method is usually slow, and degrades as the PDE mesh is refined. Typically, an effective preconditioner must be problem-specific.
However, we have developed one type of preconditioner that treats a rather broad class of PDE-based problems. It has been successfully used with CVODE for several realistic, large-scale problems [HT1998]. It is included in a software module within the ARKode package, and is accessible within the ARKStep time stepping module. This preconditioning module works with the parallel vector module NVECTOR_PARALLEL and is usable with any of the Krylov iterative linear solvers through the ARKLS interface. It generates a preconditioner that is a block-diagonal matrix with each block being a band matrix. The blocks need not have the same number of super- and sub-diagonals and these numbers may vary from block to block. This Band-Block-Diagonal Preconditioner module is called ARKBBDPRE.

One way to envision these preconditioners is to think of the computational PDE domain as being subdivided into 𝑄 non-overlapping subdomains, where each subdomain is assigned to one of the 𝑄 MPI tasks used to solve the ODE system. The basic idea is to isolate the preconditioning so that it is local to each process, and also to use a (possibly cheaper) approximate right-hand side function for construction of this preconditioning matrix. This requires the definition of a new function 𝑔(𝑡, 𝑦) ≈ 𝑓𝐼 (𝑡, 𝑦) that will be used to construct the BBD preconditioner matrix. At present, we assume that the ODE is written in explicit form as 𝑦˙ = 𝑓𝐸 (𝑡, 𝑦) + 𝑓𝐼 (𝑡, 𝑦), where 𝑓𝐼 corresponds to the ODE components to be treated implicitly, i.e. this preconditioning module does not support problems with non-identity mass matrices. The user may set 𝑔 = 𝑓𝐼, if no less expensive approximation is desired.

Corresponding to the domain decomposition, there is a decomposition of the solution vector 𝑦 into 𝑄 disjoint blocks 𝑦𝑞, and a decomposition of 𝑔 into blocks 𝑔𝑞. The block 𝑔𝑞 depends both on 𝑦𝑞 and on components of blocks 𝑦𝑞′ associated with neighboring subdomains (so-called ghost-cell data).
If we let $\bar{y}_q$ denote $y_q$ augmented with those other components on which $g_q$ depends, then we have

$g(t, y) = [g_1(t, \bar{y}_1), g_2(t, \bar{y}_2), \ldots, g_Q(t, \bar{y}_Q)]^T$,

and the blocks $g_q(t, \bar{y}_q)$ are decoupled from one another.

The preconditioner associated with this decomposition has the form

$P = \mathrm{diag}[P_1, P_2, \ldots, P_Q]$

where $P_q \approx I - \gamma J_q$ and where $J_q$ is a difference quotient approximation to $\partial g_q / \partial \bar{y}_q$. This matrix is taken to be banded, with upper and lower half-bandwidths mudq and mldq defined as the number of non-zero diagonals above and below the main diagonal, respectively. The difference quotient approximation is computed using mudq + mldq + 2 evaluations of $g_q$, but only a matrix of bandwidth mukeep + mlkeep + 1 is retained. Neither pair of parameters need be the true half-bandwidths of the Jacobian of the local block of $g$, if smaller values provide a more efficient preconditioner.

The solution of the complete linear system $P x = b$ reduces to solving each of the distinct equations

$P_q x_q = b_q, \quad q = 1, \ldots, Q$,

and this is done by banded LU factorization of $P_q$ followed by a banded backsolve.

Similar block-diagonal preconditioners could be considered with different treatments of the blocks $P_q$. For example, incomplete LU factorization or an iterative method could be used instead of banded LU factorization.

ARKBBDPRE user-supplied functions

The ARKBBDPRE module calls two user-provided functions to construct 𝑃: a required function gloc (of type ARKLocalFn()) which approximates the right-hand side function 𝑔(𝑡, 𝑦) ≈ 𝑓𝐼 (𝑡, 𝑦) and which is computed locally, and an optional function cfn (of type ARKCommFn()) which performs all inter-process communication necessary to evaluate the approximate right-hand side 𝑔. These are in addition to the user-supplied right-hand side function 𝑓𝐼.
Both functions take as input the same pointer user_data that is passed by the user to ARKStepSetUserData() and that was passed to the user’s function 𝑓𝐼. The user is responsible for providing space (presumably within user_data) for components of 𝑦 that are communicated between processes by cfn, and that are then used by gloc, which should not do any communication.

typedef int (*ARKLocalFn)(sunindextype Nlocal, realtype t, N_Vector y, N_Vector glocal, void* user_data)

This gloc function computes 𝑔(𝑡, 𝑦). It fills the vector glocal as a function of t and y.

Arguments:
• Nlocal – the local vector length.
• t – the value of the independent variable.
• y – the value of the dependent variable vector on this process.
• glocal – the output vector of 𝑔(𝑡, 𝑦) on this process.
• user_data – a pointer to user data, the same as the user_data parameter passed to ARKStepSetUserData().

Return value: An ARKLocalFn should return 0 if successful, a positive value if a recoverable error occurred (in which case ARKStep will attempt to correct), or a negative value if it failed unrecoverably (in which case the integration is halted and ARKStepEvolve() will return ARK_LSETUP_FAIL).

Notes: This function should assume that all inter-process communication of data needed to calculate glocal has already been done, and that this data is accessible within user_data.

The case where 𝑔 is mathematically identical to 𝑓𝐼 is allowed.

typedef int (*ARKCommFn)(sunindextype Nlocal, realtype t, N_Vector y, void* user_data)

This cfn function performs all inter-process communication necessary for the execution of the gloc function above, using the input vector y.

Arguments:
• Nlocal – the local vector length.
• t – the value of the independent variable.
• y – the value of the dependent variable vector on this process.
• user_data – a pointer to user data, the same as the user_data parameter passed to ARKStepSetUserData().
Return value: An ARKCommFn should return 0 if successful, a positive value if a recoverable error occurred (in which case ARKStep will attempt to correct), or a negative value if it failed unrecoverably (in which case the integration is halted and ARKStepEvolve() will return ARK_LSETUP_FAIL).

Notes: The cfn function is expected to save communicated data in space defined within the data structure user_data.

Each call to the cfn function is preceded by a call to the right-hand side function 𝑓𝐼 with the same (𝑡, 𝑦) arguments. Thus, cfn can omit any communication done by 𝑓𝐼 if relevant to the evaluation of glocal. If all necessary communication was done in 𝑓𝐼, then cfn = NULL can be passed in the call to ARKBBDPrecInit() (see below).

ARKBBDPRE usage

In addition to the header files required for the integration of the ODE problem (see the section Access to library and header files), to use the ARKBBDPRE module, the user’s program must include the header file arkode_bbdpre.h which declares the needed function prototypes.

The following is a summary of the proper usage of this module. Steps that are unchanged from the skeleton program presented in A skeleton of the user’s main program are italicized.

1. Initialize MPI
2. Set problem dimensions
3. Set vector of initial values
4. Create ARKStep object
5. Specify integration tolerances
6. Create iterative linear solver object
   When creating the iterative linear solver object, specify the type of preconditioning (PREC_LEFT or PREC_RIGHT) to use.
7. Set linear solver optional inputs
8. Attach linear solver module
9. Initialize the ARKBBDPRE preconditioner module
   Specify the upper and lower half-bandwidths for computation mudq and mldq, the upper and lower half-bandwidths for storage mukeep and mlkeep, and call
   ier = ARKBBDPrecInit(arkode_mem, Nlocal, mudq, mldq, mukeep, mlkeep, dqrely, gloc, cfn);
   to allocate memory and initialize the internal preconditioner data. The last two arguments of ARKBBDPrecInit() are the two user-supplied functions of type ARKLocalFn() and ARKCommFn() described above, respectively.
10. Set optional inputs
   Note that the user should not call ARKStepSetPreconditioner() as it will overwrite the preconditioner setup and solve functions.
11. Create nonlinear solver object
12. Attach nonlinear solver module
13. Set nonlinear solver optional inputs
14. Specify rootfinding problem
15. Advance solution in time
16. Get optional outputs
   Additional optional outputs associated with ARKBBDPRE are available through the routines ARKBBDPrecGetWorkSpace() and ARKBBDPrecGetNumGfnEvals().
17. Deallocate memory for solution vector
18. Free solver memory
19. Free linear solver memory
20. Finalize MPI

ARKBBDPRE user-callable functions

The ARKBBDPRE preconditioner module is initialized (or re-initialized) and attached to the integrator by calling the following functions:

int ARKBBDPrecInit(void* arkode_mem, sunindextype Nlocal, sunindextype mudq, sunindextype mldq, sunindextype mukeep, sunindextype mlkeep, realtype dqrely, ARKLocalFn gloc, ARKCommFn cfn)

Initializes and allocates (internal) memory for the ARKBBDPRE preconditioner.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• Nlocal – local vector length.
• mudq – upper half-bandwidth to be used in the difference quotient Jacobian approximation.
• mldq – lower half-bandwidth to be used in the difference quotient Jacobian approximation.
• mukeep – upper half-bandwidth of the retained banded approximate Jacobian block.
• mlkeep – lower half-bandwidth of the retained banded approximate Jacobian block.
• dqrely – the relative increment in components of y used in the difference quotient approximations. The default is dqrely = $\sqrt{\text{unit roundoff}}$, which can be specified by passing dqrely = 0.0.
• gloc – the name of the C function (of type ARKLocalFn()) which computes the approximation 𝑔(𝑡, 𝑦) ≈ 𝑓𝐼 (𝑡, 𝑦).
• cfn – the name of the C function (of type ARKCommFn()) which performs all inter-process communication required for the computation of 𝑔(𝑡, 𝑦).

Return value:
• ARKLS_SUCCESS if no errors occurred
• ARKLS_MEM_NULL if the ARKStep memory is NULL
• ARKLS_LMEM_NULL if the linear solver memory is NULL
• ARKLS_ILL_INPUT if an input has an illegal value
• ARKLS_MEM_FAIL if a memory allocation request failed

Notes: If one of the half-bandwidths mudq or mldq to be used in the difference quotient calculation of the approximate Jacobian is negative or exceeds the value Nlocal-1, it is replaced by 0 or Nlocal-1 accordingly.

The half-bandwidths mudq and mldq need not be the true half-bandwidths of the Jacobian of the local block of 𝑔 when smaller values may provide a greater efficiency.

Also, the half-bandwidths mukeep and mlkeep of the retained banded approximate Jacobian block may be even smaller than mudq and mldq, to reduce storage and computational costs further.

For all four half-bandwidths, the values need not be the same on every processor.

The ARKBBDPRE module also provides a re-initialization function to allow solving a sequence of problems of the same size, with the same linear solver choice, provided there is no change in Nlocal, mukeep, or mlkeep.
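The notes for both ARKBBDPrecInit() and ARKBBDPrecReInit() describe the same adjustment of out-of-range values of mudq and mldq. As a minimal sketch of that rule only, the helper below (clamp_half_bandwidth is a hypothetical name, not a SUNDIALS function, and plain long stands in for sunindextype) maps a requested half-bandwidth into the range 0 to Nlocal-1.

```c
#include <assert.h>

/* Hypothetical helper, not part of ARKBBDPRE: returns the half-bandwidth
   that would be used after the adjustment described in the notes.
   Negative requests become 0; requests above nlocal-1 become nlocal-1. */
static long clamp_half_bandwidth(long requested, long nlocal)
{
  assert(nlocal >= 1);
  if (requested < 0) return 0;
  if (requested > nlocal - 1) return nlocal - 1;
  return requested;
}
```

For example, with a local vector length of 10, a request of 25 would be reduced to 9, while a request of 2 is left unchanged.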
After solving one problem, and after calling ARKStepReInit() to re-initialize ARKStep for a subsequent problem, a call to ARKBBDPrecReInit() can be made to change any of the following: the half-bandwidths mudq and mldq used in the difference-quotient Jacobian approximations, the relative increment dqrely, or one of the user-supplied functions gloc and cfn. If there is a change in any of the linear solver inputs, an additional call to the “Set” routines provided by the SUNLINSOL module, and/or one or more of the corresponding ARKStepSet* functions, must also be made (in the proper order).

int ARKBBDPrecReInit(void* arkode_mem, sunindextype mudq, sunindextype mldq, realtype dqrely)

Re-initializes the ARKBBDPRE preconditioner module.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• mudq – upper half-bandwidth to be used in the difference quotient Jacobian approximation.
• mldq – lower half-bandwidth to be used in the difference quotient Jacobian approximation.
• dqrely – the relative increment in components of y used in the difference quotient approximations. The default is dqrely = $\sqrt{\text{unit roundoff}}$, which can be specified by passing dqrely = 0.0.

Return value:
• ARKLS_SUCCESS if no errors occurred
• ARKLS_MEM_NULL if the ARKStep memory is NULL
• ARKLS_LMEM_NULL if the linear solver memory is NULL
• ARKLS_PMEM_NULL if the preconditioner memory is NULL

Notes: If one of the half-bandwidths mudq or mldq is negative or exceeds the value Nlocal-1, it is replaced by 0 or Nlocal-1 accordingly.

The following two optional output functions are available for use with the ARKBBDPRE module:

int ARKBBDPrecGetWorkSpace(void* arkode_mem, long int* lenrwBBDP, long int* leniwBBDP)

Returns the processor-local ARKBBDPRE real and integer workspace sizes.

Arguments:
• arkode_mem – pointer to the ARKStep memory block.
• lenrwBBDP – the number of realtype values in the ARKBBDPRE workspace.
• leniwBBDP – the number of integer values in the ARKBBDPRE workspace.
Return value: • ARKLS_SUCCESS if no errors occurred • ARKLS_MEM_NULL if the ARKStep memory is NULL • ARKLS_LMEM_NULL if the linear solver memory is NULL • ARKLS_PMEM_NULL if the preconditioner memory is NULL Notes: The workspace requirements reported by this routine correspond only to memory allocated within the ARKBBDPRE module (the banded matrix approximation, banded SUNLinearSolver object, temporary vectors). These values are local to each process. The workspaces referred to here exist in addition to those given by the corresponding function ARKStepGetLSWorkSpace(). 4.7. Preconditioner modules 115 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), int ARKBBDPrecGetNumGfnEvals(void* arkode_mem, long int* ngevalsBBDP) Returns the number of calls made to the user-supplied gloc function (of type ARKLocalFn()) due to the finite difference approximation of the Jacobian blocks used within the preconditioner setup function. Arguments: • arkode_mem – pointer to the ARKStep memory block. • ngevalsBBDP – the number of calls made to the user-supplied gloc function. Return value: • ARKLS_SUCCESS if no errors occurred • ARKLS_MEM_NULL if the ARKStep memory is NULL • ARKLS_LMEM_NULL if the linear solver memory is NULL • ARKLS_PMEM_NULL if the preconditioner memory is NULL In addition to the ngevalsBBDP gloc evaluations, the costs associated with ARKBBDPRE also include nlinsetups LU factorizations, nlinsetups calls to cfn, npsolves banded backsolve calls, and nfevalsLS right-hand side function evaluations, where nlinsetups is an optional ARKStep output and npsolves and nfevalsLS are linear solver optional outputs (see the table Linear solver interface optional output functions). 116 Chapter 4. 
CHAPTER FIVE
FARKODE, AN INTERFACE MODULE FOR FORTRAN APPLICATIONS

The FARKODE interface module is a package of C functions which support the use of the ARKStep time-stepping module for the solution of ODE systems 𝑀 𝑦̇ = 𝑓𝐸(𝑡, 𝑦) + 𝑓𝐼(𝑡, 𝑦), in a mixed Fortran/C setting. While ARKode is written in C, it is assumed here that the user’s calling program and user-supplied problem-defining routines are written in Fortran. We assume only minimal Fortran capabilities; specifically, that the Fortran compiler supports full Fortran77 functionality (although more modern standards are similarly supported). This package provides the necessary interfaces to ARKODE for the majority of supplied serial and parallel NVECTOR implementations.

5.1 Important note on portability

In this package, the names of the interface functions, and the names of the Fortran user routines called by them, appear as dummy names which are mapped to actual values by a series of definitions in the header files. By default, those mapping definitions depend in turn on the C macro F77_FUNC defined in the header file sundials_config.h. The mapping defined by F77_FUNC in turn transforms the C interface names to match the name-mangling approach used by the supplied Fortran compiler.

By “name-mangling”, we mean that due to the case-independent nature of the Fortran language, Fortran compilers convert all subroutine and object names to use either all lower-case or all upper-case characters, and append either zero, one or two underscores as a prefix or suffix to the name. For example, the Fortran subroutine MyFunction() will be changed to one of myfunction, MYFUNCTION, myfunction__, MYFUNCTION_, and so on, depending on the Fortran compiler used.

SUNDIALS determines this name-mangling scheme at configuration time (see ARKode Installation Procedure).

5.2 Fortran Data Types

Throughout this documentation, we will refer to data types according to their usage in C.
The equivalent types to these may vary, depending on your computer architecture and on how SUNDIALS was compiled (see ARKode Installation Procedure). A Fortran user should first determine the equivalent types for their architecture and compiler, and then take care that all arguments passed through this Fortran/C interface are declared of the appropriate type.

Integers: SUNDIALS uses int, long int and sunindextype types. As discussed in ARKode Installation Procedure, at compilation SUNDIALS allows the configuration of the ‘index’ type, which accepts values of 32-bit signed and 64-bit signed. This choice dictates the size of a SUNDIALS sunindextype variable.
• int – equivalent to an INTEGER or INTEGER*4 in Fortran
• long int – this will depend on the computer architecture:
  – 32-bit architecture – equivalent to an INTEGER or INTEGER*4 in Fortran
  – 64-bit architecture – equivalent to an INTEGER*8 in Fortran
• sunindextype – this will depend on the SUNDIALS configuration:
  – 32-bit – equivalent to an INTEGER or INTEGER*4 in Fortran
  – 64-bit – equivalent to an INTEGER*8 in Fortran

Real numbers: As discussed in ARKode Installation Procedure, at compilation SUNDIALS allows the configuration option --with-precision, which accepts values of single, double or extended (the default is double). This choice dictates the size of a realtype variable. The corresponding Fortran types for these realtype sizes are:
• single – equivalent to a REAL or REAL*4 in Fortran
• double – equivalent to a DOUBLE PRECISION or REAL*8 in Fortran
• extended – equivalent to a REAL*16 in Fortran

We note that when SUNDIALS is compiled with Fortran interfaces enabled, a file sundials/sundials_fconfig.h is placed in the installation’s include directory, containing information about the Fortran types that correspond to the C types of the configured SUNDIALS installation.
This file may be “included” by Fortran routines, as long as the compiler supports the Fortran90 standard (or higher), as shown in the ARKode example programs ark_bruss.f90, ark_bruss1D_FEM_klu.f90 and fark_heat2D.f90.

Details on the Fortran interface to ARKode are provided in the following sub-sections:

5.2.1 FARKODE routines

In this section, we list the full set of user-callable functions comprising the FARKODE solver interface. For each function, we list the corresponding ARKStep functions, to provide a mapping between the two solver interfaces. Further documentation on each FARKODE function is provided in the following sections, Usage of the FARKODE interface module, FARKODE optional output, Usage of the FARKROOT interface to rootfinding and Usage of the FARKODE interface to built-in preconditioners. Additionally, all Fortran and C functions below are hyperlinked to their definitions in the documentation, for simplified access.

Interface to the NVECTOR modules
• FNVINITS() (defined by NVECTOR_SERIAL) interfaces to N_VNewEmpty_Serial().
• FNVINITP() (defined by NVECTOR_PARALLEL) interfaces to N_VNewEmpty_Parallel().
• FNVINITOMP() (defined by NVECTOR_OPENMP) interfaces to N_VNewEmpty_OpenMP().
• FNVINITPTS() (defined by NVECTOR_PTHREADS) interfaces to N_VNewEmpty_Pthreads().
• FNVINITPH() (defined by NVECTOR_PARHYP) interfaces to N_VNewEmpty_ParHyp().

Interface to the SUNMATRIX modules
• FSUNBANDMATINIT() (defined by SUNMATRIX_BAND) interfaces to SUNBandMatrix().
• FSUNDENSEMATINIT() (defined by SUNMATRIX_DENSE) interfaces to SUNDenseMatrix().
• FSUNSPARSEMATINIT() (defined by SUNMATRIX_SPARSE) interfaces to SUNSparseMatrix().

Interface to the SUNLINSOL modules
• FSUNBANDLINSOLINIT() (defined by SUNLINSOL_BAND) interfaces to SUNLinSol_Band().
• FSUNDENSELINSOLINIT() (defined by SUNLINSOL_DENSE) interfaces to SUNLinSol_Dense().
• FSUNKLUINIT() (defined by SUNLINSOL_KLU) interfaces to SUNLinSol_KLU().
• FSUNKLUREINIT() (defined by SUNLINSOL_KLU) interfaces to SUNLinSol_KLUReinit().
• FSUNLAPACKBANDINIT() (defined by SUNLINSOL_LAPACKBAND) interfaces to SUNLinSol_LapackBand().
• FSUNLAPACKDENSEINIT() (defined by SUNLINSOL_LAPACKDENSE) interfaces to SUNLinSol_LapackDense().
• FSUNPCGINIT() (defined by SUNLINSOL_PCG) interfaces to SUNLinSol_PCG().
• FSUNSPBCGSINIT() (defined by SUNLINSOL_SPBCGS) interfaces to SUNLinSol_SPBCGS().
• FSUNSPFGMRINIT() (defined by SUNLINSOL_SPFGMR) interfaces to SUNLinSol_SPFGMR().
• FSUNSPGMRINIT() (defined by SUNLINSOL_SPGMR) interfaces to SUNLinSol_SPGMR().
• FSUNSPTFQMRINIT() (defined by SUNLINSOL_SPTFQMR) interfaces to SUNLinSol_SPTFQMR().
• FSUNSUPERLUMTINIT() (defined by SUNLINSOL_SUPERLUMT) interfaces to SUNLinSol_SuperLUMT().

Interface to the SUNNONLINSOL modules
• FSUNNEWTONINIT() (defined by SUNNONLINSOL_NEWTON) interfaces to SUNNonlinSol_Newton().
• FSUNNEWTONSETMAXITERS() (defined by SUNNONLINSOL_NEWTON) interfaces to SUNNonlinSolSetMaxIters() for a SUNNONLINSOL_NEWTON object.
• FSUNFIXEDPOINTINIT() (defined by SUNNONLINSOL_FIXEDPOINT) interfaces to SUNNonlinSol_FixedPoint().
• FSUNFIXEDPOINTSETMAXITERS() (defined by SUNNONLINSOL_FIXEDPOINT) interfaces to SUNNonlinSolSetMaxIters() for a SUNNONLINSOL_FIXEDPOINT object.

Interface to the main ARKODE module
• FARKMALLOC() interfaces to ARKStepCreate() and ARKStepSetUserData(), as well as one of ARKStepSStolerances() or ARKStepSVtolerances().
• FARKREINIT() interfaces to ARKStepReInit().
• FARKRESIZE() interfaces to ARKStepResize().
• FARKSETIIN() and FARKSETRIN() interface to the ARKStepSet* functions (see Optional input functions).
• FARKEWTSET() interfaces to ARKStepWFtolerances().
• FARKADAPTSET() interfaces to ARKStepSetAdaptivityFn().
• FARKEXPSTABSET() interfaces to ARKStepSetStabilityFn().
• FARKSETERKTABLE() interfaces to ARKStepSetTables().
• FARKSETIRKTABLE() interfaces to ARKStepSetTables().
• FARKSETARKTABLES() interfaces to ARKStepSetTables().
• FARKSETRESTOLERANCE() interfaces to either ARKStepResStolerance() or ARKStepResVtolerance().
• FARKODE() interfaces to ARKStepEvolve(), the ARKStepGet* functions (see Optional output functions), and to the optional output functions for the selected linear solver module (see Optional output functions).
• FARKDKY() interfaces to the interpolated output function ARKStepGetDky().
• FARKGETERRWEIGHTS() interfaces to ARKStepGetErrWeights().
• FARKGETESTLOCALERR() interfaces to ARKStepGetEstLocalErrors().
• FARKFREE() interfaces to ARKStepFree().

Interface to the system nonlinear solver interface
• FARKNLSINIT() interfaces to ARKStepSetNonlinearSolver().

Interface to the system linear solver interfaces
• FARKLSINIT() interfaces to ARKStepSetLinearSolver().
• FARKDENSESETJAC() interfaces to ARKStepSetJacFn().
• FARKBANDSETJAC() interfaces to ARKStepSetJacFn().
• FARKSPARSESETJAC() interfaces to ARKStepSetJacFn().
• FARKLSSETEPSLIN() interfaces to ARKStepSetEpsLin().
• FARKLSSETJAC() interfaces to ARKStepSetJacTimes().
• FARKLSSETPREC() interfaces to ARKStepSetPreconditioner().

Interface to the mass matrix linear solver interfaces
• FARKLSMASSINIT() interfaces to ARKStepSetMassLinearSolver().
• FARKDENSESETMASS() interfaces to ARKStepSetMassFn().
• FARKBANDSETMASS() interfaces to ARKStepSetMassFn().
• FARKSPARSESETMASS() interfaces to ARKStepSetMassFn().
• FARKLSSETMASSEPSLIN() interfaces to ARKStepSetMassEpsLin().
• FARKLSSETMASS() interfaces to ARKStepSetMassTimes().
• FARKLSSETMASSPREC() interfaces to ARKStepSetMassPreconditioner().

User-supplied routines

As with the native C interface, the FARKODE solver interface requires user-supplied functions to specify the ODE problem to be solved.
In contrast to the case of direct use of ARKStep, and of most Fortran ODE solvers, the names of all user-supplied routines here are fixed, in order to maximize portability for the resulting mixed-language program. As a result, whether using a purely implicit, purely explicit, or mixed implicit-explicit solver, routines for both 𝑓𝐸(𝑡, 𝑦) and 𝑓𝐼(𝑡, 𝑦) must be provided by the user (though either may do nothing):

FARKODE routine (FORTRAN, user-supplied)    ARKStep interface function type
FARKIFUN()                                  ARKRhsFn()
FARKEFUN()                                  ARKRhsFn()

In addition, as with the native C interface, a user may provide additional routines to assist in the solution process. Each of the following user-supplied routines is activated by calling the specified “activation” routine, with the exception of FARKSPJAC(), which is required whenever a sparse matrix solver is used:

FARKODE routine (FORTRAN, user-supplied)    ARKStep interface function type    FARKODE “activation” routine
FARKDJAC()                                  ARKLsJacFn()                       FARKDENSESETJAC()
FARKBJAC()                                  ARKLsJacFn()                       FARKBANDSETJAC()
FARKSPJAC()                                 ARKLsJacFn()                       FARKSPARSESETJAC()
FARKDMASS()                                 ARKLsMassFn()                      FARKDENSESETMASS()
FARKBMASS()                                 ARKLsMassFn()                      FARKBANDSETMASS()
FARKSPMASS()                                ARKLsMassFn()                      FARKSPARSESETMASS()
FARKPSET()                                  ARKLsPrecSetupFn()                 FARKLSSETPREC()
FARKPSOL()                                  ARKLsPrecSolveFn()                 FARKLSSETPREC()
FARKJTSETUP()                               ARKLsJacTimesSetupFn()             FARKLSSETJAC()
FARKJTIMES()                                ARKLsJacTimesVecFn()               FARKLSSETJAC()
FARKMASSPSET()                              ARKLsMassPrecSetupFn()             FARKLSSETMASSPREC()
FARKMASSPSOL()                              ARKLsMassPrecSolveFn()             FARKLSSETMASSPREC()
FARKMTSETUP()                               ARKLsMassTimesSetupFn()            FARKLSSETMASS()
FARKMTIMES()                                ARKLsMassTimesVecFn()              FARKLSSETMASS()
FARKEWT()                                   ARKEwtFn()                         FARKEWTSET()
FARKADAPT()                                 ARKAdaptFn()                       FARKADAPTSET()
FARKEXPSTAB()                               ARKExpStabFn()                     FARKEXPSTABSET()

5.2.2 Usage of the FARKODE interface module

The usage of FARKODE requires calls to a variety of interface functions,
depending on the method options selected, and two or more user-supplied routines which define the problem to be solved. These function calls and user routines are summarized separately below. Some details are omitted, and the user is referred to the description of the corresponding C interface ARKStep functions for complete information on the arguments of any given user-callable interface routine, or of a given user-supplied function called by an interface function. The usage of FARKODE for rootfinding and with preconditioner modules is described in later subsections.

Right-hand side specification

The user must in all cases supply the following Fortran routines:

subroutine FARKIFUN(T, Y, YDOT, IPAR, RPAR, IER)
Sets the YDOT array to 𝑓𝐼(𝑡, 𝑦), the implicit portion of the right-hand side of the ODE system, as a function of the independent variable T = 𝑡 and the array of dependent state variables Y = 𝑦.
Arguments:
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing state variables.
• YDOT (realtype, output) – array containing state derivatives.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 success, >0 recoverable error, <0 unrecoverable error).

subroutine FARKEFUN(T, Y, YDOT, IPAR, RPAR, IER)
Sets the YDOT array to 𝑓𝐸(𝑡, 𝑦), the explicit portion of the right-hand side of the ODE system, as a function of the independent variable T = 𝑡 and the array of dependent state variables Y = 𝑦.
Arguments:
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing state variables.
• YDOT (realtype, output) – array containing state derivatives.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 success, >0 recoverable error, <0 unrecoverable error).

For purely explicit problems, although the routine FARKIFUN() must exist, it will never be called, and may remain empty. Similarly, for purely implicit problems, FARKEFUN() must exist but will never be called, and may remain empty.

NVECTOR module initialization

If using one of the NVECTOR modules supplied with SUNDIALS, the user must make a call of the form

    CALL FNVINITS(4, NEQ, IER)
    CALL FNVINITP(COMM, 4, NLOCAL, NGLOBAL, IER)
    CALL FNVINITOMP(4, NEQ, NUM_THREADS, IER)
    CALL FNVINITPTS(4, NEQ, NUM_THREADS, IER)
    CALL FNVINITPH(COMM, 4, NLOCAL, NGLOBAL, IER)

in which the specific arguments are as described in the appropriate section of the Chapter Vector Data Structures.

SUNMATRIX module initialization

In the case of using either an implicit or ImEx method, the solution of each Runge-Kutta stage may involve the solution of linear systems related to the Jacobian 𝐽 = 𝜕𝑓𝐼/𝜕𝑦 of the implicit portion of the ODE system. If using a Newton iteration with direct SUNLINSOL linear solver module and one of the SUNMATRIX modules supplied with SUNDIALS, the user must make a call of the form

    CALL FSUNBANDMATINIT(4, N, MU, ML, SMU, IER)
    CALL FSUNDENSEMATINIT(4, M, N, IER)
    CALL FSUNSPARSEMATINIT(4, M, N, NNZ, SPARSETYPE, IER)

in which the specific arguments are as described in the appropriate section of the Chapter Matrix Data Structures. Note that these matrix options are usable only in a serial or multi-threaded environment.

As described in the section Mass matrix solver, in the case of using a problem with a non-identity mass matrix (no matter whether the integrator is implicit, explicit or ImEx), linear systems of the form 𝑀𝑥 = 𝑏 must be solved, where 𝑀 is the system mass matrix.
If these are to be solved with a direct SUNLINSOL linear solver module and one of the SUNMATRIX modules supplied with SUNDIALS, the user must make a call of the form

    CALL FSUNBANDMASSMATINIT(N, MU, ML, SMU, IER)
    CALL FSUNDENSEMASSMATINIT(M, N, IER)
    CALL FSUNSPARSEMASSMATINIT(M, N, NNZ, SPARSETYPE, IER)

in which the specific arguments are as described in the appropriate section of the Chapter Matrix Data Structures, again noting that these are only usable in a serial or multi-threaded environment.

SUNLINSOL module initialization

If using a Newton iteration with one of the SUNLINSOL linear solver modules supplied with SUNDIALS, the user must make a call of the form

    CALL FSUNBANDLINSOLINIT(4, IER)
    CALL FSUNDENSELINSOLINIT(4, IER)
    CALL FSUNKLUINIT(4, IER)
    CALL FSUNLAPACKBANDINIT(4, IER)
    CALL FSUNLAPACKDENSEINIT(4, IER)
    CALL FSUNPCGINIT(4, PRETYPE, MAXL, IER)
    CALL FSUNSPBCGSINIT(4, PRETYPE, MAXL, IER)
    CALL FSUNSPFGMRINIT(4, PRETYPE, MAXL, IER)
    CALL FSUNSPGMRINIT(4, PRETYPE, MAXL, IER)
    CALL FSUNSPTFQMRINIT(4, PRETYPE, MAXL, IER)
    CALL FSUNSUPERLUMTINIT(4, NUM_THREADS, IER)

in which the specific arguments are as described in the appropriate section of the Chapter Description of the SUNLinearSolver module. Note that the dense, band and sparse solvers are usable only in a serial or multi-threaded environment.
Once one of these has been initialized, its solver parameters may be modified using a call to the functions

    CALL FSUNKLUSETORDERING(4, ORD_CHOICE, IER)
    CALL FSUNSUPERLUMTSETORDERING(4, ORD_CHOICE, IER)
    CALL FSUNPCGSETPRECTYPE(4, PRETYPE, IER)
    CALL FSUNPCGSETMAXL(4, MAXL, IER)
    CALL FSUNSPBCGSSETPRECTYPE(4, PRETYPE, IER)
    CALL FSUNSPBCGSSETMAXL(4, MAXL, IER)
    CALL FSUNSPFGMRSETGSTYPE(4, GSTYPE, IER)
    CALL FSUNSPFGMRSETPRECTYPE(4, PRETYPE, IER)
    CALL FSUNSPGMRSETGSTYPE(4, GSTYPE, IER)
    CALL FSUNSPGMRSETPRECTYPE(4, PRETYPE, IER)
    CALL FSUNSPTFQMRSETPRECTYPE(4, PRETYPE, IER)
    CALL FSUNSPTFQMRSETMAXL(4, MAXL, IER)

where again the call sequences are described in the appropriate sections of the Chapter Description of the SUNLinearSolver module.

Similarly, in the case of using one of the SUNLINSOL linear solver modules supplied with SUNDIALS to solve a problem with a non-identity mass matrix, the user must make a call of the form

    CALL FSUNMASSBANDLINSOLINIT(IER)
    CALL FSUNMASSDENSELINSOLINIT(IER)
    CALL FSUNMASSKLUINIT(IER)
    CALL FSUNMASSLAPACKBANDINIT(IER)
    CALL FSUNMASSLAPACKDENSEINIT(IER)
    CALL FSUNMASSPCGINIT(PRETYPE, MAXL, IER)
    CALL FSUNMASSSPBCGSINIT(PRETYPE, MAXL, IER)
    CALL FSUNMASSSPFGMRINIT(PRETYPE, MAXL, IER)
    CALL FSUNMASSSPGMRINIT(PRETYPE, MAXL, IER)
    CALL FSUNMASSSPTFQMRINIT(PRETYPE, MAXL, IER)
    CALL FSUNMASSSUPERLUMTINIT(NUM_THREADS, IER)

in which the specific arguments are as described in the appropriate section of the Chapter Description of the SUNLinearSolver module.
Once one of these has been initialized, its solver parameters may be modified using a call to the functions

    CALL FSUNMASSKLUSETORDERING(ORD_CHOICE, IER)
    CALL FSUNMASSSUPERLUMTSETORDERING(ORD_CHOICE, IER)
    CALL FSUNMASSPCGSETPRECTYPE(PRETYPE, IER)
    CALL FSUNMASSPCGSETMAXL(MAXL, IER)
    CALL FSUNMASSSPBCGSSETPRECTYPE(PRETYPE, IER)
    CALL FSUNMASSSPBCGSSETMAXL(MAXL, IER)
    CALL FSUNMASSSPFGMRSETGSTYPE(GSTYPE, IER)
    CALL FSUNMASSSPFGMRSETPRECTYPE(PRETYPE, IER)
    CALL FSUNMASSSPGMRSETGSTYPE(GSTYPE, IER)
    CALL FSUNMASSSPGMRSETPRECTYPE(PRETYPE, IER)
    CALL FSUNMASSSPTFQMRSETPRECTYPE(PRETYPE, IER)
    CALL FSUNMASSSPTFQMRSETMAXL(MAXL, IER)

where again the call sequences are described in the appropriate sections of the Chapter Description of the SUNLinearSolver module.

SUNNONLINSOL module initialization

If using a non-default nonlinear solver method, the user must make a call of the form

    CALL FSUNNEWTONINIT(4, IER)
    CALL FSUNFIXEDPOINTINIT(4, M, IER)

in which the specific arguments are as described in the appropriate section of the Chapter Nonlinear Solver Data Structures.

Once one of these has been initialized, its solver parameters may be modified using a call to the functions

    CALL FSUNNEWTONSETMAXITERS(4, MAXITERS, IER)
    CALL FSUNFIXEDPOINTSETMAXITERS(4, MAXITERS, IER)

where again the call sequences are described in the appropriate sections of the Chapter Nonlinear Solver Data Structures.

Problem specification

To set various problem and solution parameters and allocate internal memory, the user must call FARKMALLOC().

subroutine FARKMALLOC(T0, Y0, IMEX, IATOL, RTOL, ATOL, IOUT, ROUT, IPAR, RPAR, IER)
Initializes the Fortran interface to the ARKStep solver, providing interfaces to the C routines ARKStepCreate() and ARKStepSetUserData(), as well as one of ARKStepSStolerances() or ARKStepSVtolerances().
Arguments:
• T0 (realtype, input) – initial value of 𝑡.
• Y0 (realtype, input) – array of initial conditions.
• IMEX (int, input) – flag denoting basic integration method: 0 = implicit, 1 = explicit, 2 = ImEx.
• IATOL (int, input) – type for absolute tolerance input ATOL: 1 = scalar, 2 = array, 3 = user-supplied function. If 3 is chosen, the user must subsequently call FARKEWTSET() and supply a routine FARKEWT() to compute the error weight vector.
• RTOL (realtype, input) – scalar relative tolerance.
• ATOL (realtype, input) – scalar or array absolute tolerance.
• IOUT (long int, input/output) – array of length 29 for integer optional outputs.
• ROUT (realtype, input/output) – array of length 6 for real optional outputs.
• IPAR (long int, input/output) – array of user integer data, which will be passed unmodified to all user-provided routines.
• RPAR (realtype, input/output) – array with user real data, which will be passed unmodified to all user-provided routines.
• IER (int, output) – return flag (0 success, ≠ 0 failure).
Notes: Modifications to the user data arrays IPAR and RPAR inside a user-provided routine will be propagated to all subsequent calls to such routines. The optional outputs associated with the main ARKStep integrator are listed in Table: Optional FARKODE integer outputs and Table: Optional FARKODE real outputs, in the section FARKODE optional output.

As an alternative to providing tolerances in the call to FARKMALLOC(), the user may provide a routine to compute the error weights used in the WRMS norm evaluations. If supplied, it must have the following form:

subroutine FARKEWT(Y, EWT, IPAR, RPAR, IER)
It must set the positive components of the error weight vector EWT for the calculation of the WRMS norm of Y.
Arguments:
• Y (realtype, input) – array containing state variables.
• EWT (realtype, output) – array containing the error weight vector.
• IPAR (long int, input) – array containing the integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing the real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 success, ≠ 0 failure).

If the FARKEWT() routine is provided, then, following the call to FARKMALLOC(), the user must call the function FARKEWTSET().

subroutine FARKEWTSET(FLAG, IER)
Informs FARKODE to use the user-supplied FARKEWT() function.
Arguments:
• FLAG (int, input) – flag, use “1” to denote use of FARKEWT().
• IER (int, output) – return flag (0 success, ≠ 0 failure).

Setting optional inputs

Unlike ARKStep’s C interface, which provides separate functions for setting each optional input, FARKODE uses only two functions, which accept keywords to specify which optional input should be set to the provided value. These routines are FARKSETIIN() and FARKSETRIN(), and are further described below.

subroutine FARKSETIIN(KEY, IVAL, IER)
Specification routine to pass optional integer inputs to the FARKODE() solver.
Arguments:
• KEY (quoted string, input) – which optional input is set (see Table: Keys for setting FARKODE integer optional inputs).
• IVAL (long int, input) – the integer input value to be used.
• IER (int, output) – return flag (0 success, ≠ 0 failure).
Table: Keys for setting FARKODE integer optional inputs

Key                  ARKStep routine
ORDER                ARKStepSetOrder()
DENSE_ORDER          ARKStepSetDenseOrder()
LINEAR               ARKStepSetLinear()
NONLINEAR            ARKStepSetNonlinear()
EXPLICIT             ARKStepSetExplicit()
IMPLICIT             ARKStepSetImplicit()
IMEX                 ARKStepSetImEx()
IRK_TABLE_NUM        ARKStepSetTableNum()
ERK_TABLE_NUM        ARKStepSetTableNum()
ARK_TABLE_NUM (a)    ARKStepSetTableNum()
MAX_NSTEPS           ARKStepSetMaxNumSteps()
HNIL_WARNS           ARKStepSetMaxHnilWarns()
PREDICT_METHOD       ARKStepSetPredictorMethod()
MAX_ERRFAIL          ARKStepSetMaxErrTestFails()
MAX_CONVFAIL         ARKStepSetMaxConvFails()
MAX_NITERS           ARKStepSetMaxNonlinIters()
ADAPT_SMALL_NEF      ARKStepSetSmallNumEFails()
LSETUP_MSBP          ARKStepSetMaxStepsBetweenLSet()

(a) When setting ARK_TABLE_NUM, pass in IVAL as an array of length 2, specifying the IRK table number first, then the ERK table number. The integer specifiers for each table may be found in the section Appendix: ARKode Constants, or in the ARKode header files arkode_butcher_dirk.h and arkode_butcher_erk.h.

subroutine FARKSETRIN(KEY, RVAL, IER)
Specification routine to pass optional real inputs to the FARKODE() solver.
Arguments:
• KEY (quoted string, input) – which optional input is set (see Table: Keys for setting FARKODE real optional inputs).
• RVAL (realtype, input) – the real input value to be used.
• IER (int, output) – return flag (0 success, ≠ 0 failure).
Table: Keys for setting FARKODE real optional inputs

Key              ARKStep routine
INIT_STEP        ARKStepSetInitStep()
MAX_STEP         ARKStepSetMaxStep()
MIN_STEP         ARKStepSetMinStep()
STOP_TIME        ARKStepSetStopTime()
NLCONV_COEF      ARKStepSetNonlinConvCoef()
ADAPT_CFL        ARKStepSetCFLFraction()
ADAPT_SAFETY     ARKStepSetSafetyFactor()
ADAPT_BIAS       ARKStepSetErrorBias()
ADAPT_GROWTH     ARKStepSetMaxGrowth()
ADAPT_ETAMX1     ARKStepSetMaxFirstGrowth()
ADAPT_BOUNDS     ARKStepSetFixedStepBounds()
ADAPT_ETAMXF     ARKStepSetMaxEFailGrowth()
ADAPT_ETACF      ARKStepSetMaxCFailGrowth()
NONLIN_CRDOWN    ARKStepSetNonlinCRDown()
NONLIN_RDIV      ARKStepSetNonlinRDiv()
LSETUP_DGMAX     ARKStepSetDeltaGammaMax()
FIXED_STEP       ARKStepSetFixedStep()

If a user wishes to reset all of the options to their default values, they may call the routine FARKSETDEFAULTS().

subroutine FARKSETDEFAULTS(IER)
Specification routine to reset all FARKODE optional inputs to their default values.
Arguments:
• IER (int, output) – return flag (0 success, ≠ 0 failure).

Optional advanced FARKODE inputs

FARKODE supplies additional routines to specify optional advanced inputs to the ARKStepEvolve() solver. These are summarized below, and the user is referred to their C routine counterparts for more complete information.

subroutine FARKSETERKTABLE(S, Q, P, C, A, B, BEMBED, IER)
Interface to the routine ARKStepSetTables().
Arguments:
• S (int, input) – number of stages in the table.
• Q (int, input) – global order of accuracy of the method.
• P (int, input) – global order of accuracy of the embedding.
• C (realtype, input) – array of length S containing the stage times.
• A (realtype, input) – array of length S*S containing the ERK coefficients (stored in row-major, “C”, order).
• B (realtype, input) – array of length S containing the solution coefficients.
• BEMBED (realtype, input) – array of length S containing the embedding coefficients.
• IER (int, output) – return flag (0 success, ≠ 0 failure).

subroutine FARKSETIRKTABLE(S, Q, P, C, A, B, BEMBED, IER)
Interface to the routine ARKStepSetTables().
Arguments:
• S (int, input) – number of stages in the table.
• Q (int, input) – global order of accuracy of the method.
• P (int, input) – global order of accuracy of the embedding.
• C (realtype, input) – array of length S containing the stage times.
• A (realtype, input) – array of length S*S containing the IRK coefficients (stored in row-major, “C”, order).
• B (realtype, input) – array of length S containing the solution coefficients.
• BEMBED (realtype, input) – array of length S containing the embedding coefficients.
• IER (int, output) – return flag (0 success, ≠ 0 failure).

subroutine FARKSETARKTABLES(S, Q, P, CI, CE, AI, AE, BI, BE, B2I, B2E, IER)
Interface to the routine ARKStepSetTables().
Arguments:
• S (int, input) – number of stages in the table.
• Q (int, input) – global order of accuracy of the method.
• P (int, input) – global order of accuracy of the embedding.
• CI (realtype, input) – array of length S containing the implicit stage times.
• CE (realtype, input) – array of length S containing the explicit stage times.
• AI (realtype, input) – array of length S*S containing the IRK coefficients (stored in row-major, “C”, order).
• AE (realtype, input) – array of length S*S containing the ERK coefficients (stored in row-major, “C”, order).
• BI (realtype, input) – array of length S containing the implicit solution coefficients.
• BE (realtype, input) – array of length S containing the explicit solution coefficients.
• B2I (realtype, input) – array of length S containing the implicit embedding coefficients.
• B2E (realtype, input) – array of length S containing the explicit embedding coefficients.
• IER (int, output) – return flag (0 success, ≠ 0 failure).
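Since the coefficient arrays A, AI and AE above are flat arrays of length S*S in row-major order, it may help to see how a small Butcher table maps onto that layout. The following C sketch fills the arrays for the 2-stage Heun-Euler pair (second-order method with a first-order embedding). The helper names rm_index and heun_euler_table are hypothetical illustrations and are not part of ARKode or FARKODE.

```c
#include <stddef.h>

/* Hypothetical helper: position of entry (i, j), both 0-based, of an
   s-by-s matrix stored in row-major ("C") order. */
size_t rm_index(size_t s, size_t i, size_t j)
{
  return i * s + j;
}

/* Hypothetical helper: fill stage times c, flat row-major coefficients A
   (length 2*2), solution weights b and embedding weights bembed for the
   Heun-Euler pair:
       c = [0, 1],  A = [[0, 0], [1, 0]],  b = [1/2, 1/2],  bembed = [1, 0] */
void heun_euler_table(double c[2], double A[4], double b[2], double bembed[2])
{
  c[0] = 0.0;
  c[1] = 1.0;

  A[rm_index(2, 0, 0)] = 0.0;
  A[rm_index(2, 0, 1)] = 0.0;
  A[rm_index(2, 1, 0)] = 1.0;  /* second stage uses the first stage */
  A[rm_index(2, 1, 1)] = 0.0;

  b[0] = 0.5;
  b[1] = 0.5;

  bembed[0] = 1.0;             /* forward Euler as the embedding */
  bembed[1] = 0.0;
}
```

Note that Fortran stores two-dimensional arrays in column-major order, so a Fortran caller who builds the table as a 2D array would generally need to transpose it (or fill a 1D array directly, as above) before passing it through the interface.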
subroutine FARKSETRESTOLERANCE(IATOL, ATOL, IER)
Interface to the routines ARKStepResStolerance() and ARKStepResVtolerance().
Arguments:
• IATOL (int, input) – type for absolute residual tolerance input ATOL: 1 = scalar, 2 = array.
• ATOL (realtype, input) – scalar or array absolute residual tolerance.
• IER (int, output) – return flag (0 success, ≠ 0 failure).

Additionally, a user may set the accuracy-based step size adaptivity strategy (and its associated parameters) through a call to FARKSETADAPTIVITYMETHOD(), as described below.

subroutine FARKSETADAPTIVITYMETHOD(IMETHOD, IDEFAULT, IPQ, PARAMS, IER)
Specification routine to set the step size adaptivity strategy and parameters within the FARKODE() solver. Interfaces with the C routine ARKStepSetAdaptivityMethod().
Arguments:
• IMETHOD (int, input) – choice of adaptivity method.
• IDEFAULT (int, input) – flag denoting whether to use default parameters (1) or that customized parameters will be supplied (0).
• IPQ (int, input) – flag denoting whether to use the embedding order of accuracy (0) or the method order of accuracy (1) within the step adaptivity algorithm.
• PARAMS (realtype, input) – array of 3 parameters to be used within the adaptivity strategy.
• IER (int, output) – return flag (0 success, ≠ 0 failure).

Lastly, the user may provide functions to aid/replace those within ARKStep for handling adaptive error control and explicit stability. The former of these is designed for advanced users who wish to investigate custom step adaptivity approaches as opposed to using any of those built-in to ARKStep.
In ARKStep’s C/C++ interface, this would be provided by a function of type ARKAdaptFn(); in the Fortran interface this is provided through the user-supplied function:

subroutine FARKADAPT(Y, T, H1, H2, H3, E1, E2, E3, Q, P, HNEW, IPAR, RPAR, IER)
It must set the new step size HNEW based on the three previous steps (H1, H2, H3) and the three previous error estimates (E1, E2, E3).
Arguments:
• Y (realtype, input) – array containing state variables.
• T (realtype, input) – current value of the independent variable.
• H1 (realtype, input) – current step size.
• H2 (realtype, input) – previous step size.
• H3 (realtype, input) – previous-previous step size.
• E1 (realtype, input) – estimated temporal error in current step.
• E2 (realtype, input) – estimated temporal error in previous step.
• E3 (realtype, input) – estimated temporal error in previous-previous step.
• Q (int, input) – global order of accuracy for RK method.
• P (int, input) – global order of accuracy for RK embedded method.
• HNEW (realtype, output) – the new step size.
• IPAR (long int, input) – array containing the integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing the real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 success, ≠ 0 failure).

This routine is enabled by a call to the activation routine:

subroutine FARKADAPTSET(FLAG, IER)
Informs FARKODE to use the user-supplied FARKADAPT() function.
Arguments:
• FLAG (int, input) – flag, use “1” to denote use of FARKADAPT(), or use “0” to denote a return to the default adaptivity strategy.
• IER (int, output) – return flag (0 success, ≠ 0 failure).
Note: The call to FARKADAPTSET() must occur after the call to FARKMALLOC().
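To make the role of a user-supplied adaptivity routine concrete, here is a deliberately simple step-size rule sketched in C. It is not one of ARKStep's built-in controllers and does not use the ARKAdaptFn() signature: the name simple_adapt, the thresholds 1.0 and 0.1, and the halving/doubling factors are all hypothetical choices for illustration. The sketch also assumes that the error estimate is normalized so that a value at or below 1 means the step met the requested tolerance.

```c
/* Hypothetical step-size rule (illustration only):
     - halve the step if the normalized error estimate e1 exceeds 1,
     - double it if e1 is well below the tolerance (here, below 0.1),
     - otherwise keep the current step size h1.
   Returns 0 on success and -1 on invalid input, mirroring the
   "0 success, nonzero failure" convention of the IER flag. */
int simple_adapt(double h1, double e1, double *hnew)
{
  if (hnew == 0 || h1 == 0.0 || e1 < 0.0) return -1;

  if (e1 > 1.0) {
    *hnew = 0.5 * h1;   /* error too large: shrink the step */
  } else if (e1 < 0.1) {
    *hnew = 2.0 * h1;   /* error very small: grow the step */
  } else {
    *hnew = h1;         /* error acceptable: keep the step */
  }
  return 0;
}
```

A Fortran FARKADAPT() routine would carry the same kind of logic, setting HNEW from H1 and E1 (and optionally the older step sizes and error estimates) and reporting success through IER.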
Similarly, if either an explicit or mixed implicit-explicit integration method is to be employed, the user may specify a function to provide the maximum explicitly-stable step for their problem. Again, in the C/C++ interface this would be a function of type ARKExpStabFn(), while in ARKStep’s Fortran interface this must be given through the user-supplied function:

subroutine FARKEXPSTAB(Y, T, HSTAB, IPAR, RPAR, IER)
It must set the maximum explicitly-stable step size, HSTAB, based on the current solution, Y.
Arguments:
• Y (realtype, input) – array containing state variables.
• T (realtype, input) – current value of the independent variable.
• HSTAB (realtype, output) – maximum explicitly-stable step size.
• IPAR (long int, input) – array containing the integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing the real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 success, ≠ 0 failure).

This routine is enabled by a call to the activation routine:

subroutine FARKEXPSTABSET(FLAG, IER)
Informs FARKODE to use the user-supplied FARKEXPSTAB() function.
Arguments:
• FLAG (int, input) – use 1 to select FARKEXPSTAB(), or 0 to return to the default error-based stability strategy.
• IER (int, output) – return flag (0 success, ≠ 0 failure).
Note: The call to FARKEXPSTABSET() must occur after the call to FARKMALLOC().

Nonlinear solver module specification

To use a non-default nonlinear solver algorithm, after it has been initialized in step SUNNONLINSOL module initialization above, the user of FARKODE must attach it to ARKStep by calling the FARKNLSINIT() routine:

subroutine FARKNLSINIT(IER)
Interfaces with the ARKStepSetNonlinearSolver() function to specify use of a non-default nonlinear solver module.
Arguments:
• IER (int, output) – return flag (0 if success, -1 if a memory allocation error occurred, -2 for an illegal input).

System linear solver interface specification

To attach the linear solver (and optionally the matrix) object(s) initialized in steps SUNMATRIX module initialization and SUNLINSOL module initialization above, the user of FARKODE must initialize the linear solver interface by calling the FARKLSINIT() routine:

subroutine FARKLSINIT(IER)
Interfaces with the ARKStepSetLinearSolver() function to attach a linear solver object (and optionally a matrix object) to ARKStep.
Arguments:
• IER (int, output) – return flag (0 if success, -1 if a memory allocation error occurred, -2 for an illegal input).

Matrix-based linear solvers

As an option when using ARKStep with either the SUNLINSOL_DENSE or SUNLINSOL_LAPACKDENSE linear solver modules, the user may supply a routine that computes a dense approximation of the system Jacobian $J = \partial f^I / \partial y$. If supplied, it must have the following form:

subroutine FARKDJAC(NEQ, T, Y, FY, DJAC, H, IPAR, RPAR, WK1, WK2, WK3, IER)
Interface to provide a user-supplied dense Jacobian approximation function (of type ARKLsJacFn()), to be used by the SUNLINSOL_DENSE or SUNLINSOL_LAPACKDENSE solver modules.
Arguments:
• NEQ (long int, input) – size of the ODE system.
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing values of the dependent state variables.
• FY (realtype, input) – array containing values of the dependent state derivatives.
• DJAC (realtype of size (NEQ,NEQ), output) – 2D array containing the Jacobian entries.
• H (realtype, input) – current step size.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WK1, WK2, WK3 (realtype, input) – arrays containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
Notes: Typically this routine will use only NEQ, T, Y, and DJAC. It must compute the Jacobian and store it column-wise in DJAC.

If the above routine uses difference quotient approximations, it may need to access the error weight array EWT in the calculation of suitable increments. The array EWT can be obtained by calling FARKGETERRWEIGHTS() using one of the work arrays as temporary storage for EWT. It may also need the unit roundoff, which can be obtained as the optional output ROUT(6), passed from the calling program to this routine using either RPAR or a common block.

If the FARKDJAC() routine is provided, then, following the call to FARKLSINIT(), the user must call the routine FARKDENSESETJAC():

subroutine FARKDENSESETJAC(FLAG, IER)
Interface to the ARKStepSetJacFn() function, specifying to use the user-supplied routine FARKDJAC() for the Jacobian approximation.
Arguments:
• FLAG (int, input) – any nonzero value specifies to use FARKDJAC().
• IER (int, output) – return flag (0 if success, ≠ 0 if an error occurred).

As an option when using ARKStep with either the SUNLINSOL_BAND or SUNLINSOL_LAPACKBAND linear solver modules, the user may supply a routine that computes a banded approximation of the linear system Jacobian $J = \partial f^I / \partial y$. If supplied, it must have the following form:
subroutine FARKBJAC(NEQ, MU, ML, MDIM, T, Y, FY, BJAC, H, IPAR, RPAR, WK1, WK2, WK3, IER)
Interface to provide a user-supplied band Jacobian approximation function (of type ARKLsJacFn()), to be used by the SUNLINSOL_BAND or SUNLINSOL_LAPACKBAND solver modules.
Arguments:
• NEQ (long int, input) – size of the ODE system.
• MU (long int, input) – upper half-bandwidth.
• ML (long int, input) – lower half-bandwidth.
• MDIM (long int, input) – leading dimension of BJAC array.
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing dependent state variables.
• FY (realtype, input) – array containing dependent state derivatives.
• BJAC (realtype of size (MDIM,NEQ), output) – 2D array containing the Jacobian entries.
• H (realtype, input) – current step size.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WK1, WK2, WK3 (realtype, input) – arrays containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
Notes: Typically this routine will use only NEQ, MU, ML, T, Y, and BJAC. It must load the MDIM by NEQ array BJAC with the Jacobian matrix at the current (𝑡, 𝑦) in band form. Store in BJAC(k,j) the Jacobian element 𝐽𝑖,𝑗 with k = i - j + MU + 1 (so that k = 1, ..., ML+MU+1) and j = 1, ..., NEQ.

If the above routine uses difference quotient approximations, it may need to use the error weight array EWT in the calculation of suitable increments. The array EWT can be obtained by calling FARKGETERRWEIGHTS() using one of the work arrays as temporary storage for EWT.
It may also need the unit roundoff, which can be obtained as the optional output ROUT(6), passed from the calling program to this routine using either RPAR or a common block.

If the FARKBJAC() routine is provided, then, following the call to FARKLSINIT(), the user must call the routine FARKBANDSETJAC():

subroutine FARKBANDSETJAC(FLAG, IER)
Interface to the ARKStepSetJacFn() function, specifying to use the user-supplied routine FARKBJAC() for the Jacobian approximation.
Arguments:
• FLAG (int, input) – any nonzero value specifies to use FARKBJAC().
• IER (int, output) – return flag (0 if success, ≠ 0 if an error occurred).

When using ARKStep with either the SUNLINSOL_KLU or SUNLINSOL_SUPERLUMT sparse direct linear solver modules, the user must supply a routine that computes a sparse approximation of the system Jacobian $J = \partial f^I / \partial y$. Both the KLU and SuperLU_MT solvers allow specification of 𝐽 in either compressed-sparse-column (CSC) format or compressed-sparse-row (CSR) format. The sparse Jacobian approximation function must have the following form:

subroutine FARKSPJAC(T, Y, FY, N, NNZ, JDATA, JINDEXVALS, JINDEXPTRS, H, IPAR, RPAR, WK1, WK2, WK3, IER)
Interface to provide a user-supplied sparse Jacobian approximation function (of type ARKLsJacFn()), to be used by the SUNLINSOL_KLU or SUNLINSOL_SUPERLUMT solver modules.
Arguments:
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing values of the dependent state variables.
• FY (realtype, input) – array containing values of the dependent state derivatives.
• N (sunindextype, input) – number of matrix rows and columns in Jacobian.
• NNZ (sunindextype, input) – allocated length of nonzero storage in Jacobian.
• JDATA (realtype of size NNZ, output) – nonzero values in Jacobian.
• JINDEXVALS (sunindextype of size NNZ, output) – row [CSR: column] indices for each nonzero Jacobian entry.
• JINDEXPTRS (sunindextype of size N+1, output) – indices of where each column’s [CSR: row’s] nonzeros begin in data array; last entry points just past end of data values.
• H (realtype, input) – current step size.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WK1, WK2, WK3 (realtype, input) – arrays containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
Notes: Due to the internal storage format of the SUNMATRIX_SPARSE module, the matrix-specific integer parameters and arrays are all of type sunindextype – the index precision (32-bit vs 64-bit signed integers) specified during the SUNDIALS build. It is assumed that the user’s Fortran code declares these with the integer type that matches the SUNDIALS installation.

If the above routine uses difference quotient approximations to compute the nonzero entries, it may need to access the error weight array EWT in the calculation of suitable increments. The array EWT can be obtained by calling FARKGETERRWEIGHTS() using one of the work arrays as temporary storage for EWT. It may also need the unit roundoff, which can be obtained as the optional output ROUT(6), passed from the calling program to this routine using either RPAR or a common block.

When supplying the FARKSPJAC() routine, following the call to FARKLSINIT(), the user must call the routine FARKSPARSESETJAC():

subroutine FARKSPARSESETJAC(IER)
Interface to the ARKStepSetJacFn() function, specifying that the user-supplied routine FARKSPJAC() has been provided for the Jacobian approximation.
Arguments:
• IER (int, output) – return flag (0 if success, ≠ 0 if an error occurred).
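To make the CSC arguments of FARKSPJAC() concrete, the following is a minimal sketch for a hypothetical 3-by-3 linear problem. It assumes realtype is double precision, sunindextype is a 64-bit integer, long int matches c_long, and that index values are zero-based as in the underlying C storage (the text above does not state the index base, so treat this as an assumption to verify against your installation). The rate constants in RPAR(1:2) and the driver program are illustrative only.

```fortran
! Hypothetical sketch of FARKSPJAC in CSC format for the 3x3 Jacobian
!       [ -a   0   0 ]
!   J = [  a  -b   0 ]      with a = RPAR(1), b = RPAR(2)
!       [  0   b   0 ]
subroutine FARKSPJAC(T, Y, FY, N, NNZ, JDATA, JINDEXVALS, JINDEXPTRS, H, &
                     IPAR, RPAR, WK1, WK2, WK3, IER)
  use iso_c_binding, only: c_double, c_long, c_int, c_int64_t
  implicit none
  real(c_double), intent(in)      :: T, Y(*), FY(*), H
  integer(c_int64_t), intent(in)  :: N, NNZ
  real(c_double), intent(out)     :: JDATA(*)
  integer(c_int64_t), intent(out) :: JINDEXVALS(*), JINDEXPTRS(*)
  integer(c_long)                 :: IPAR(*)
  real(c_double)                  :: RPAR(*), WK1(*), WK2(*), WK3(*)
  integer(c_int), intent(out)     :: IER
  real(c_double) :: a, b

  ! Unrecoverable error if the allocated storage does not fit this example.
  if (N /= 3 .or. NNZ < 4) then
    IER = -1
    return
  end if
  a = RPAR(1)
  b = RPAR(2)

  ! Column pointers (assumed zero-based): column 3 is empty, and the
  ! last entry points just past the end of the data.
  JINDEXPTRS(1:4) = [integer(c_int64_t) :: 0, 2, 4, 4]
  ! Row indices (assumed zero-based) and values, column by column.
  JINDEXVALS(1:4) = [integer(c_int64_t) :: 0, 1, 1, 2]
  JDATA(1:4)      = [-a, a, -b, b]
  IER = 0
end subroutine FARKSPJAC

! Stand-alone driver: fills the arrays once and checks the layout.
program demo_spjac
  use iso_c_binding, only: c_double, c_long, c_int, c_int64_t
  implicit none
  integer(c_int64_t) :: n, nnz, rowvals(4), colptrs(4)
  real(c_double)     :: y(3), fy(3), jdata(4), rpar(2), t, h
  real(c_double)     :: wk1(3), wk2(3), wk3(3)
  integer(c_long)    :: ipar(1)
  integer(c_int)     :: ier

  n = 3; nnz = 4; ipar = 0
  t = 0.0_c_double; h = 0.1_c_double
  y = 1.0_c_double; fy = 0.0_c_double
  wk1 = 0.0_c_double; wk2 = 0.0_c_double; wk3 = 0.0_c_double
  rpar = [2.0_c_double, 3.0_c_double]

  call FARKSPJAC(t, y, fy, n, nnz, jdata, rowvals, colptrs, h, &
                 ipar, rpar, wk1, wk2, wk3, ier)

  if (ier /= 0) error stop "FARKSPJAC returned an error"
  if (any(colptrs /= [integer(c_int64_t) :: 0, 2, 4, 4])) error stop "bad column pointers"
  if (any(rowvals /= [integer(c_int64_t) :: 0, 1, 1, 2])) error stop "bad row indices"
  if (any(jdata /= [-2.0_c_double, 2.0_c_double, -3.0_c_double, 3.0_c_double])) &
    error stop "bad values"
  print *, "FARKSPJAC sketch checks passed"
end program demo_spjac
```

For CSR storage the roles swap: JINDEXPTRS would hold row pointers and JINDEXVALS column indices, as indicated by the bracketed notes in the argument list.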
Iterative linear solvers

As described in the section Linear iteration error control, a user may adjust the linear solver tolerance scaling factor 𝜖𝐿. Fortran users may adjust this value by calling the routine FARKLSSETEPSLIN():

subroutine FARKLSSETEPSLIN(EPLIFAC, IER)
Interface to the function ARKStepSetEpsLin() to specify the linear solver tolerance scale factor 𝜖𝐿 for the Newton system linear solver. This routine must be called after FARKLSINIT().
Arguments:
• EPLIFAC (realtype, input) – value to use for 𝜖𝐿. Passing a value of 0 indicates to use the default value (0.05).
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).

Optional user-supplied routines FARKJTSETUP() and FARKJTIMES() may be provided to compute the product of the system Jacobian $J = \partial f^I / \partial y$ and a given vector 𝑣. If these are supplied, then following the call to FARKLSINIT(), the user must call the FARKLSSETJAC() routine with FLAG ≠ 0:

subroutine FARKLSSETJAC(FLAG, IER)
Interface to the function ARKStepSetJacTimes() to specify use of the user-supplied Jacobian-times-vector setup and product functions, FARKJTSETUP() and FARKJTIMES(), respectively. This routine must be called after FARKLSINIT().
Arguments:
• FLAG (int, input) – flag denoting use of user-supplied Jacobian-times-vector routines. A nonzero value specifies to use the user-supplied routines; a zero value specifies not to use them.
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).

Similarly, optional user-supplied routines FARKPSET() and FARKPSOL() may be provided to perform preconditioning of the iterative linear solver (note: the SUNLINSOL module must have been configured with preconditioning enabled).
If these routines are supplied, then following the call to FARKLSINIT() the user must call the routine FARKLSSETPREC() with FLAG ≠ 0:

subroutine FARKLSSETPREC(FLAG, IER)
Interface to the function ARKStepSetPreconditioner() to specify use of the user-supplied preconditioner setup and solve functions, FARKPSET() and FARKPSOL(), respectively. This routine must be called after FARKLSINIT().
Arguments:
• FLAG (int, input) – flag denoting use of user-supplied preconditioning routines. A nonzero value specifies to use the user-supplied routines; a zero value specifies not to use them.
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).

With treatment of the linear systems by any of the Krylov iterative solvers, there are four optional user-supplied routines – FARKJTSETUP(), FARKJTIMES(), FARKPSET() and FARKPSOL(). The specifications of these functions are given below.

As an option when using iterative linear solvers, the user may supply a routine that computes the product of the system Jacobian $J = \partial f^I / \partial y$ and a given vector 𝑣. If supplied, it must have the following form:

subroutine FARKJTIMES(V, FJV, T, Y, FY, H, IPAR, RPAR, WORK, IER)
Interface to provide a user-supplied Jacobian-times-vector product approximation function (corresponding to a C interface routine of type ARKLsJacTimesVecFn()), to be used by one of the Krylov iterative linear solvers.
Arguments:
• V (realtype, input) – array containing the vector to multiply.
• FJV (realtype, output) – array containing resulting product vector.
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing dependent state variables.
• FY (realtype, input) – array containing dependent state derivatives.
• H (realtype, input) – current step size.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WORK (realtype, input) – array containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).
Notes: Typically this routine will use only T, Y, V, and FJV. It must compute the product vector 𝐽𝑣, where 𝑣 is given in V, and the product is stored in FJV.

If the user’s Jacobian-times-vector product routine requires that any Jacobian related data be evaluated or preprocessed, then the following routine can be used for the evaluation and preprocessing of this data:

subroutine FARKJTSETUP(T, Y, FY, H, IPAR, RPAR, IER)
Interface to set up data for use in a user-supplied Jacobian-times-vector product approximation function (corresponding to a C interface routine of type ARKLsJacTimesSetupFn()).
Arguments:
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – array containing dependent state variables.
• FY (realtype, input) – array containing dependent state derivatives.
• H (realtype, input) – current step size.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).
Notes: Typically this routine will use only T and Y, and store the results in either the arrays IPAR and RPAR, or in a Fortran module or common block.

If preconditioning is to be included, the following routine must be supplied, for solution of the preconditioner linear system:

subroutine FARKPSOL(T, Y, FY, R, Z, GAMMA, DELTA, LR, IPAR, RPAR, VT, IER)
User-supplied preconditioner solve routine (of type ARKLsPrecSolveFn()).
Arguments:
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – current dependent state variable array.
• FY (realtype, input) – current dependent state variable derivative array.
• R (realtype, input) – right-hand side array.
• Z (realtype, output) – solution array.
• GAMMA (realtype, input) – Jacobian scaling factor.
• DELTA (realtype, input) – desired residual tolerance.
• LR (int, input) – flag denoting to solve the right or left preconditioner system: 1 = left preconditioner, 2 = right preconditioner.
• IPAR (long int, input/output) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input/output) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 if success, >0 if a recoverable failure, <0 if a non-recoverable failure).
Notes: Typically this routine will use only T, Y, GAMMA, R, LR, and Z. It must solve the preconditioner linear system 𝑃𝑧 = 𝑟. The preconditioner (or the product of the left and right preconditioners if both are nontrivial) should be an approximation to the matrix 𝑀 − 𝛾𝐽, where 𝑀 is the system mass matrix, 𝛾 is the input GAMMA, and $J = \partial f^I / \partial y$.

If the user’s preconditioner requires that any Jacobian related data be evaluated or preprocessed, then the following routine can be used for the evaluation and preprocessing of the preconditioner:

subroutine FARKPSET(T, Y, FY, JOK, JCUR, GAMMA, H, IPAR, RPAR, IER)
User-supplied preconditioner setup routine (of type ARKLsPrecSetupFn()).
Arguments:
• T (realtype, input) – current value of the independent variable.
• Y (realtype, input) – current dependent state variable array.
• FY (realtype, input) – current dependent state variable derivative array.
• JOK (int, input) – flag indicating whether Jacobian-related data needs to be recomputed: 0 = recompute, 1 = reuse with the current value of GAMMA.
• JCUR (int, output) – return flag to denote if Jacobian data was recomputed (1 = yes, 0 = no).
• GAMMA (realtype, input) – Jacobian scaling factor.
• H (realtype, input) – current step size.
• IPAR (long int, input/output) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input/output) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 if success, >0 if a recoverable failure, <0 if a non-recoverable failure).
Notes: This routine must set up the preconditioner 𝑃 to be used in the subsequent call to FARKPSOL(). The preconditioner (or the product of the left and right preconditioners if using both) should be an approximation to the matrix 𝑀 − 𝛾𝐽, where 𝑀 is the system mass matrix, 𝛾 is the input GAMMA, and $J = \partial f^I / \partial y$.

Notes:
1. If the user’s FARKJTSETUP(), FARKJTIMES() or FARKPSET() routines use difference quotient approximations, they may need to use the error weight array EWT and/or the unit roundoff in the calculation of suitable increments. Also, if FARKPSOL() uses an iterative method in its solution, the residual vector 𝜌 = 𝑟 − 𝑃𝑧 of the system should be made less than 𝛿 = DELTA in the weighted l2 norm, i.e. $\left( \sum_i (\rho_i \, \mathrm{EWT}_i)^2 \right)^{1/2} < \delta$.
2. If needed in FARKJTSETUP(), FARKJTIMES(), FARKPSOL(), or FARKPSET(), the error weight array EWT can be obtained by calling FARKGETERRWEIGHTS() using a user-allocated array as temporary storage for EWT.
3. If needed in FARKJTSETUP(), FARKJTIMES(), FARKPSOL(), or FARKPSET(), the unit roundoff can be obtained as the optional output ROUT(6) (available after the call to FARKMALLOC()) and can be passed using either the RPAR user data array or a common block.
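The FARKJTIMES() interface described above can be sketched for a hypothetical problem whose Jacobian is the tridiagonal matrix J = c * tridiag(1, -2, 1). Because the argument list carries no problem size, this sketch assumes the user has stored it in IPAR(1) and the coefficient c in RPAR(1); both conventions are illustrative, as is the assumption that realtype is double precision and long int matches c_long. The driver program checks the product on a small vector without involving SUNDIALS.

```fortran
! Hypothetical sketch of FARKJTIMES for J = c * tridiag(1, -2, 1).
! Assumed user-data conventions: IPAR(1) = problem size, RPAR(1) = c.
subroutine FARKJTIMES(V, FJV, T, Y, FY, H, IPAR, RPAR, WORK, IER)
  use iso_c_binding, only: c_double, c_long, c_int
  implicit none
  real(c_double), intent(in)  :: V(*), T, Y(*), FY(*), H
  real(c_double), intent(out) :: FJV(*)
  integer(c_long)             :: IPAR(*)
  real(c_double)              :: RPAR(*), WORK(*)
  integer(c_int), intent(out) :: IER
  integer(c_long) :: i, n
  real(c_double)  :: c, vl, vr

  n = IPAR(1)
  c = RPAR(1)
  if (n < 1) then
    IER = 1          ! nonzero return flag signals an error
    return
  end if

  ! Apply the three-point stencil row by row, treating entries
  ! outside 1..n as zero.
  do i = 1, n
    vl = 0.0_c_double
    vr = 0.0_c_double
    if (i > 1) vl = V(i-1)
    if (i < n) vr = V(i+1)
    FJV(i) = c * (vl - 2.0_c_double * V(i) + vr)
  end do
  IER = 0
end subroutine FARKJTIMES

! Stand-alone driver: J times a vector of ones is nonzero only in the
! first and last rows.
program demo_jtimes
  use iso_c_binding, only: c_double, c_long, c_int
  implicit none
  real(c_double)  :: v(4), fjv(4), y(4), fy(4), work(4), rpar(1), t, h
  integer(c_long) :: ipar(1)
  integer(c_int)  :: ier

  ipar(1) = 4
  rpar(1) = 2.0_c_double
  t = 0.0_c_double; h = 0.1_c_double
  v = 1.0_c_double; y = 0.0_c_double; fy = 0.0_c_double; work = 0.0_c_double

  call FARKJTIMES(v, fjv, t, y, fy, h, ipar, rpar, work, ier)

  if (ier /= 0) error stop "FARKJTIMES returned an error"
  if (any(abs(fjv - [-2.0_c_double, 0.0_c_double, 0.0_c_double, -2.0_c_double]) &
          > 1.0e-14_c_double)) error stop "unexpected product"
  print *, "FARKJTIMES sketch checks passed"
end program demo_jtimes
```

Passing the problem size through IPAR keeps the routine free of common blocks; a Fortran module variable would serve equally well.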
Mass matrix linear solver interface specification

To attach the mass matrix linear solver (and optionally the mass matrix) object(s) initialized in steps SUNMATRIX module initialization and SUNLINSOL module initialization above, the user of FARKODE must initialize the mass-matrix linear solver interface by calling the FARKLSMASSINIT() routine:

subroutine FARKLSMASSINIT(TIME_DEP, IER)
Interfaces with the ARKStepSetMassLinearSolver() function to attach a linear solver object (and optionally a matrix object) to ARKStep’s mass-matrix linear solver interface.
Arguments:
• TIME_DEP (int, input) – flag indicating whether the mass matrix is time-dependent (1) or not (0). Currently, only values of “0” are supported.
• IER (int, output) – return flag (0 if success, -1 if a memory allocation error occurred, -2 for an illegal input).

Matrix-based mass matrix linear solvers

When using the mass-matrix linear solver interface with the SUNLINSOL_DENSE or SUNLINSOL_LAPACKDENSE mass matrix linear solver modules, the user must supply a routine that computes the dense mass matrix 𝑀. This routine must have the following form:

subroutine FARKDMASS(NEQ, T, DMASS, IPAR, RPAR, WK1, WK2, WK3, IER)
Interface to provide a user-supplied dense mass matrix computation function (of type ARKLsMassFn()), to be used by the SUNLINSOL_DENSE or SUNLINSOL_LAPACKDENSE solver modules.
Arguments:
• NEQ (long int, input) – size of the ODE system.
• T (realtype, input) – current value of the independent variable.
• DMASS (realtype of size (NEQ,NEQ), output) – 2D array containing the mass matrix entries.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WK1, WK2, WK3 (realtype, input) – arrays containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
Notes: Typically this routine will use only NEQ, T, and DMASS. It must compute the mass matrix and store it column-wise in DMASS.

To indicate that the FARKDMASS() routine has been provided, then, following the call to FARKLSMASSINIT(), the user must call the routine FARKDENSESETMASS():

subroutine FARKDENSESETMASS(IER)
Interface to the ARKStepSetMassFn() function, specifying to use the user-supplied routine FARKDMASS() for the mass matrix calculation.
Arguments:
• IER (int, output) – return flag (0 if success, ≠ 0 if an error occurred).

When using the mass-matrix linear solver interface with the SUNLINSOL_BAND or SUNLINSOL_LAPACKBAND mass matrix linear solver modules, the user must supply a routine that computes the banded mass matrix 𝑀. This routine must have the following form:

subroutine FARKBMASS(NEQ, MU, ML, MDIM, T, BMASS, IPAR, RPAR, WK1, WK2, WK3, IER)
Interface to provide a user-supplied band mass matrix calculation function (of type ARKLsMassFn()), to be used by the SUNLINSOL_BAND or SUNLINSOL_LAPACKBAND solver modules.
Arguments:
• NEQ (long int, input) – size of the ODE system.
• MU (long int, input) – upper half-bandwidth.
• ML (long int, input) – lower half-bandwidth.
• MDIM (long int, input) – leading dimension of BMASS array.
• T (realtype, input) – current value of the independent variable.
• BMASS (realtype of size (MDIM,NEQ), output) – 2D array containing the mass matrix entries.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WK1, WK2, WK3 (realtype, input) – arrays containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
Notes: Typically this routine will use only NEQ, MU, ML, T, and BMASS. It must load the MDIM by NEQ array BMASS with the mass matrix at the current time 𝑡 in band form. Store in BMASS(k,j) the mass matrix element 𝑀𝑖,𝑗 with k = i - j + MU + 1 (so that k = 1, ..., ML+MU+1) and j = 1, ..., NEQ.

To indicate that the FARKBMASS() routine has been provided, then, following the call to FARKLSMASSINIT(), the user must call the routine FARKBANDSETMASS():

subroutine FARKBANDSETMASS(IER)
Interface to the ARKStepSetMassFn() function, specifying to use the user-supplied routine FARKBMASS() for the mass matrix calculation.
Arguments:
• IER (int, output) – return flag (0 if success, ≠ 0 if an error occurred).

When using the mass-matrix linear solver interface with the SUNLINSOL_KLU or SUNLINSOL_SUPERLUMT mass matrix linear solver modules, the user must supply a routine that computes the sparse mass matrix 𝑀. Both the KLU and SuperLU_MT solver interfaces support the compressed-sparse-column (CSC) and compressed-sparse-row (CSR) matrix formats. The desired format must have been specified to the FSUNSPARSEMASSMATINIT() function when initializing the sparse mass matrix. The user-provided routine to compute 𝑀 must have the following form:

subroutine FARKSPMASS(T, N, NNZ, MDATA, MINDEXVALS, MINDEXPTRS, IPAR, RPAR, WK1, WK2, WK3, IER)
Interface to provide a user-supplied sparse mass matrix approximation function (of type ARKLsMassFn()), to be used by the SUNLINSOL_KLU or SUNLINSOL_SUPERLUMT solver modules.
Arguments:
• T (realtype, input) – current value of the independent variable.
• N (sunindextype, input) – number of mass matrix rows and columns.
• NNZ (sunindextype, input) – allocated length of nonzero storage in mass matrix.
• MDATA (realtype of size NNZ, output) – nonzero values in mass matrix.
• MINDEXVALS (sunindextype of size NNZ, output) – row [CSR: column] indices for each nonzero mass matrix entry.
• MINDEXPTRS (sunindextype of size N+1, output) – indices of where each column’s [CSR: row’s] nonzeros begin in data array; last entry points just past end of data values.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• WK1, WK2, WK3 (realtype, input) – arrays containing temporary workspace of same size as Y.
• IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
Notes: Due to the internal storage format of the SUNMATRIX_SPARSE module, the matrix-specific integer parameters and arrays are all of type sunindextype – the index precision (32-bit vs 64-bit signed integers) specified during the SUNDIALS build. It is assumed that the user’s Fortran code declares these with the integer type that matches the SUNDIALS installation.

To indicate that the FARKSPMASS() routine has been provided, then, following the call to FARKLSMASSINIT(), the user must call the routine FARKSPARSESETMASS():

subroutine FARKSPARSESETMASS(IER)
Interface to the ARKStepSetMassFn() function, specifying that the user-supplied routine FARKSPMASS() has been provided for the mass matrix calculation.
Arguments:
• IER (int, output) – return flag (0 if success, ≠ 0 if an error occurred).

Iterative mass matrix linear solvers

As described in the section Linear iteration error control, a user may adjust the linear solver tolerance scaling factor 𝜖𝐿.
Fortran users may adjust this value for the mass matrix linear solver by calling the routine FARKLSSETMASSEPSLIN():

subroutine FARKLSSETMASSEPSLIN(EPLIFAC, IER)
Interface to the function ARKStepSetMassEpsLin() to specify the linear solver tolerance scale factor 𝜖𝐿 for the mass matrix linear solver. This routine must be called after FARKLSMASSINIT().
Arguments:
• EPLIFAC (realtype, input) – value to use for 𝜖𝐿. Passing a value of 0 indicates to use the default value (0.05).
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).

With treatment of the mass matrix linear systems by any of the Krylov iterative solvers, there are two required user-supplied routines, FARKMTSETUP() and FARKMTIMES(), and there are two optional user-supplied routines, FARKMASSPSET() and FARKMASSPSOL(). The specifications of these functions are given below.

The required routines when using a Krylov iterative mass matrix linear solver perform setup and computation of the product of the system mass matrix 𝑀 and a given vector 𝑣. The product routine must have the following form:

subroutine FARKMTIMES(V, MV, T, IPAR, RPAR, IER)
Interface to a user-supplied mass-matrix-times-vector product approximation function (corresponding to a C interface routine of type ARKLsMassTimesVecFn()), to be used by one of the Krylov iterative linear solvers.
Arguments:
• V (realtype, input) – array containing the vector to multiply.
• MV (realtype, output) – array containing resulting product vector.
• T (realtype, input) – current value of the independent variable.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).
Notes: Typically this routine will use only T, V, and MV.
It must compute the product vector 𝑀𝑣, where 𝑣 is given in V, and the product is stored in MV.

If the user’s mass-matrix-times-vector product routine requires that any mass matrix data be evaluated or preprocessed, then the following routine can be used for the evaluation and preprocessing of this data:

subroutine FARKMTSETUP(T, IPAR, RPAR, IER)
Interface to a user-supplied mass-matrix-times-vector setup function (corresponding to a C interface routine of type ARKLsMassTimesSetupFn()).
Arguments:
• T (realtype, input) – current value of the independent variable.
• IPAR (long int, input) – array containing integer user data that was passed to FARKMALLOC().
• RPAR (realtype, input) – array containing real user data that was passed to FARKMALLOC().
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).
Notes: Typically this routine will use only T, and store the results in either the arrays IPAR and RPAR, or in a Fortran module or common block. If no mass matrix setup is needed, this routine should just set IER to 0 and return.

To indicate that these routines have been supplied by the user, then, following the call to FARKLSMASSINIT(), the user must call the routine FARKLSSETMASS():

subroutine FARKLSSETMASS(IER)
Interface to the function ARKStepSetMassTimes() to specify use of the user-supplied mass-matrix-times-vector setup and product functions FARKMTSETUP() and FARKMTIMES(). This routine must be called after FARKLSMASSINIT().
Arguments:
• IER (int, output) – return flag (0 if success, ≠ 0 if an error).

Two optional user-supplied preconditioning routines may be supplied to help accelerate convergence of the Krylov mass matrix linear solver. If preconditioning was selected when enabling the Krylov solver (i.e.
the solver was set up with IPRETYPE ≠ 0), then the user must also call the routine FARKLSSETMASSPREC() with FLAG ≠ 0:

subroutine FARKLSSETMASSPREC(FLAG, IER)
   Interface to the function ARKStepSetMassPreconditioner() to specify use of the user-supplied preconditioner setup and solve functions, FARKMASSPSET() and FARKMASSPSOL(), respectively. This routine must be called after FARKLSMASSINIT().
   Arguments:
   • FLAG (int, input) – flag denoting use of user-supplied preconditioning routines.
   • IER (int, output) – return flag (0 if success, ≠ 0 if an error).

In addition, the user must provide the following two routines to implement the preconditioner setup and solve functions to be used within the solve.

subroutine FARKMASSPSET(T, IPAR, RPAR, IER)
   User-supplied preconditioner setup routine (of type ARKLsMassPrecSetupFn()).
   Arguments:
   • T (realtype, input) – current value of the independent variable.
   • IPAR (long int, input/output) – array containing integer user data that was passed to FARKMALLOC().
   • RPAR (realtype, input/output) – array containing real user data that was passed to FARKMALLOC().
   • IER (int, output) – return flag (0 if success, >0 if a recoverable failure, <0 if a non-recoverable failure).
   Notes: This routine must set up the preconditioner 𝑃 to be used in the subsequent call to FARKMASSPSOL(). The preconditioner (or the product of the left and right preconditioners if using both) should be an approximation to the system mass matrix, 𝑀.

subroutine FARKMASSPSOL(T, R, Z, DELTA, LR, IPAR, RPAR, IER)
   User-supplied preconditioner solve routine (of type ARKLsMassPrecSolveFn()).
   Arguments:
   • T (realtype, input) – current value of the independent variable.
   • R (realtype, input) – right-hand side array.
   • Z (realtype, output) – solution array.
   • DELTA (realtype, input) – desired residual tolerance.
   • LR (int, input) – flag selecting which preconditioner system to solve: 1 = left preconditioner, 2 = right preconditioner.
   • IPAR (long int, input/output) – array containing integer user data that was passed to FARKMALLOC().
   • RPAR (realtype, input/output) – array containing real user data that was passed to FARKMALLOC().
   • IER (int, output) – return flag (0 if success, >0 if a recoverable failure, <0 if a non-recoverable failure).
   Notes: Typically this routine will use only T, R, LR, and Z. It must solve the preconditioner linear system 𝑃𝑧 = 𝑟. The preconditioner (or the product of the left and right preconditioners if both are nontrivial) should be an approximation to the system mass matrix 𝑀.

Notes:
1. If the user’s FARKMASSPSOL() uses an iterative method in its solution, the residual vector 𝜌 = 𝑟 − 𝑃𝑧 of the system should be made less than 𝛿 = DELTA in the weighted l2 norm, i.e.

   \left( \sum_i \left( \rho_i \, EWT_i \right)^2 \right)^{1/2} < \delta.

2. If needed in FARKMTIMES(), FARKMTSETUP(), FARKMASSPSOL(), or FARKMASSPSET(), the error weight array EWT can be obtained by calling FARKGETERRWEIGHTS() using a user-allocated array as temporary storage for EWT.
3. If needed in FARKMTIMES(), FARKMTSETUP(), FARKMASSPSOL(), or FARKMASSPSET(), the unit roundoff can be obtained as the optional output ROUT(6) (available after the call to FARKMALLOC()) and can be passed using either the RPAR user data array or a common block.

Problem solution

Carrying out the integration is accomplished by making calls to FARKODE().

subroutine FARKODE(TOUT, T, Y, ITASK, IER)
   Fortran interface to the C routine ARKStepEvolve() for performing the solve, along with many of the ARK*Get* routines for reporting on solver statistics.
   Arguments:
   • TOUT (realtype, input) – next value of 𝑡 at which a solution is desired.
   • T (realtype, output) – value of independent variable that corresponds to the output Y.
   • Y (realtype, output) – array containing dependent state variables on output.
   • ITASK (int, input) – task indicator:
      – 1 = normal mode (overshoot TOUT and interpolate)
      – 2 = one-step mode (return after each internal step taken)
      – 3 = normal ‘tstop’ mode (like 1, but integration never proceeds past TSTOP, which must be specified through a preceding call to FARKSETRIN() using the key STOP_TIME)
      – 4 = one step ‘tstop’ mode (like 2, but integration never goes past TSTOP).
   • IER (int, output) – completion flag:
      – 0 = success,
      – 1 = tstop return,
      – 2 = root return,
      – values -1, ..., -10 are failure modes (see ARKStepEvolve() and Appendix: ARKode Constants).
   Notes: The current values of the optional outputs are immediately available in IOUT and ROUT upon return from this function (see Table: Optional FARKODE integer outputs and Table: Optional FARKODE real outputs). A full description of error flags and output behavior of the solver (values filled in for T and Y) is provided in the description of ARKStepEvolve().

Additional solution output

After a successful return from FARKODE(), the routine FARKDKY() may be used to obtain a derivative of the solution, of order up to 3, at any 𝑡 within the last step taken.

subroutine FARKDKY(T, K, DKY, IER)
   Fortran interface to the C routine ARKDKY() for interpolating output of the solution or its derivatives at any point within the last step taken.
   Arguments:
   • T (realtype, input) – time at which solution derivative is desired, within the interval [𝑡𝑛 − ℎ, 𝑡𝑛].
   • K (int, input) – derivative order (0 ≤ 𝑘 ≤ 3).
   • DKY (realtype, output) – array containing the computed K-th derivative of 𝑦.
   • IER (int, output) – return flag (0 if success, <0 if an illegal argument).
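As a small illustration of the FARKODE() completion flag described above, the following C sketch classifies an IER value into the categories listed there. The enum and function names are hypothetical and are not part of FARKODE or ARKode; treating every value outside the documented ranges as "unknown" is also an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical categories for the FARKODE() completion flag IER. */
typedef enum {
  STEP_OK,      /* IER = 0: success        */
  STEP_TSTOP,   /* IER = 1: tstop return   */
  STEP_ROOT,    /* IER = 2: root return    */
  STEP_FAILED,  /* IER = -1, ..., -10      */
  STEP_UNKNOWN  /* anything else (assumed) */
} step_status;

/* Map an IER value onto one of the categories above. */
step_status classify_farkode_ier(int ier)
{
  if (ier == 0) return STEP_OK;
  if (ier == 1) return STEP_TSTOP;
  if (ier == 2) return STEP_ROOT;
  if (ier <= -1 && ier >= -10) return STEP_FAILED;
  return STEP_UNKNOWN;
}
```

A driver loop could, for instance, keep calling the integrator while the status is STEP_OK, query root information on STEP_ROOT, and stop on STEP_FAILED.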
Problem reinitialization

To re-initialize the ARKStep solver for the solution of a new problem of the same size as one already solved, the user must call FARKREINIT():

subroutine FARKREINIT(T0, Y0, IMEX, IATOL, RTOL, ATOL, IER)
   Re-initializes the Fortran interface to the ARKStep solver.
   Arguments: The arguments have the same names and meanings as those of FARKMALLOC().
   Notes: This routine performs no memory allocation, instead using the existing memory created by the previous FARKMALLOC() call. The call to specify the linear system solution method may or may not be needed.

Following a call to FARKREINIT(), if the choice of linear solver is being changed, then the user must make a call to create the alternate SUNLINSOL module and then attach it to ARKStep, as shown above. If only linear solver parameters are being modified, then these calls may be made without re-attaching to ARKStep.

Resizing the ODE system

For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when solving a spatially-adaptive PDE), the FARKODE() integrator may be “resized” between integration steps, through calls to the FARKRESIZE() function, which interfaces with the C routine ARKStepResize(). This function modifies ARKStep’s internal memory structures to use the new problem size, without destruction of the temporal adaptivity heuristics. It is assumed that the dynamical time scales before and after the vector resize will be comparable, so that all time-stepping heuristics prior to calling FARKRESIZE() remain valid after the call. If instead the dynamics should be re-calibrated, the FARKODE memory structure should be deleted with a call to FARKFREE(), and re-created with a call to FARKMALLOC().

subroutine FARKRESIZE(T0, Y0, HSCALE, ITOL, RTOL, ATOL, IER)
   Re-initializes the Fortran interface to the ARKStep solver for a differently-sized ODE system.
   Arguments:
   • T0 (realtype, input) – initial value of the independent variable 𝑡.
   • Y0 (realtype, input) – array of dependent-variable initial conditions.
   • HSCALE (realtype, input) – desired step size scale factor:
      – 1.0 is the default,
      – any value <= 0.0 results in the default.
   • ITOL (int, input) – flag denoting that a new relative tolerance and vector of absolute tolerances are supplied in the RTOL and ATOL arguments:
      – 0 = retain the current scalar-valued relative and absolute tolerances, or the user-supplied error weight function, FARKEWT().
      – 1 = RTOL contains the new scalar-valued relative tolerance and ATOL contains a new array of absolute tolerances.
   • RTOL (realtype, input) – scalar relative tolerance.
   • ATOL (realtype, input) – array of absolute tolerances.
   • IER (int, output) – return flag (0 success, ≠ 0 failure).
   Notes: This routine performs the opposite set of operations to FARKREINIT(): it does not reinitialize any of the time-step heuristics, but it does perform memory reallocation.

Following a call to FARKRESIZE(), the internal data structures for all linear solver and matrix objects will be the incorrect size. Hence, calls must be made to re-create the linear system solver, mass matrix solver, linear system matrix, and mass matrix, followed by calls to attach the updated objects to ARKStep. If any user-supplied linear solver helper routines were used (Jacobian evaluation, Jacobian-vector product, mass matrix evaluation, mass-matrix-vector product, preconditioning, etc.), then the relevant “set” routines to specify their usage must be called again following the re-specification of the linear solver module(s).
Memory deallocation

To free the internal memory created by FARKMALLOC(), FARKLSINIT(), FARKLSMASSINIT(), and the SUNMATRIX, SUNLINSOL and SUNNONLINSOL objects, the user may call FARKFREE(), as follows:

subroutine FARKFREE()
   Frees the internal memory created by FARKMALLOC().
   Arguments: None.

5.2.3 FARKODE optional output

We note that the optional inputs to FARKODE have already been described in the section Setting optional inputs.

IOUT and ROUT arrays

In the Fortran interface, the optional outputs from the FARKODE() solver are accessed not through individual functions, but rather through a pair of user-allocated arrays, IOUT (having long int type) of dimension at least 35, and ROUT (having realtype type) of dimension at least 6. These arrays must be allocated by the user program that calls FARKODE(), which passes them through the Fortran interface as arguments to FARKMALLOC(). Following this call, FARKODE() will modify the entries of these arrays to contain all optional output values provided to a Fortran user.

In the following tables, Table: Optional FARKODE integer outputs and Table: Optional FARKODE real outputs, we list the entries in these arrays by index, naming them according to their role with the main ARKStep solver, and list the relevant ARKStep C/C++ function that is actually called to extract the output value. Similarly, optional integer output values that are specific to the ARKLS linear solver interface are listed in Table: Optional ARKLS interface outputs. For more details on the optional inputs and outputs to ARKStep, see the sections Optional input functions and Optional output functions.
Table: Optional FARKODE integer outputs

IOUT Index   Optional output   ARKStep function
1            LENRW             ARKStepGetWorkSpace()
2            LENIW             ARKStepGetWorkSpace()
3            NST               ARKStepGetNumSteps()
4            NST_STB           ARKStepGetNumExpSteps()
5            NST_ACC           ARKStepGetNumAccSteps()
6            NST_ATT           ARKStepGetNumStepAttempts()
7            NFE               ARKStepGetNumRhsEvals() (num 𝑓𝐸 calls)
8            NFI               ARKStepGetNumRhsEvals() (num 𝑓𝐼 calls)
9            NSETUPS           ARKStepGetNumLinSolvSetups()
10           NETF              ARKStepGetNumErrTestFails()
11           NNI               ARKStepGetNumNonlinSolvIters()
12           NCFN              ARKStepGetNumNonlinSolvConvFails()
13           NGE               ARKStepGetNumGEvals()

Table: Optional FARKODE real outputs

ROUT Index   Optional output   ARKStep function
1            H0U               ARKStepGetActualInitStep()
2            HU                ARKStepGetLastStep()
3            HCUR              ARKStepGetCurrentStep()
4            TCUR              ARKStepGetCurrentTime()
5            TOLSF             ARKStepGetTolScaleFactor()
6            UROUND            UNIT_ROUNDOFF (see the section Data Types)

Table: Optional ARKLS interface outputs

IOUT Index   Optional output   ARKStep function
14           LENRWLS           ARKLsGetWorkSpace()
15           LENIWLS           ARKLsGetWorkSpace()
16           LSTF              ARKLsGetLastFlag()
17           NFELS             ARKLsGetNumRhsEvals()
18           NJE               ARKLsGetNumJacEvals()
19           NJTS              ARKLsGetNumJTSetupEvals()
20           NJTV              ARKLsGetNumJtimesEvals()
21           NPE               ARKLsGetNumPrecEvals()
22           NPS               ARKLsGetNumPrecSolves()
23           NLI               ARKLsGetNumLinIters()
24           NCFL              ARKLsGetNumConvFails()

Table: Optional ARKLS mass interface outputs

IOUT Index   Optional output   ARKStep function
25           LENRWMS           ARKLsGetMassWorkSpace()
26           LENIWMS           ARKLsGetMassWorkSpace()
27           LSTMF             ARKLsGetLastMassFlag()
28           NMSET             ARKLsGetNumMassSetups()
29           NMSOL             ARKLsGetNumMassSolves()
30           NMTSET            ARKLsGetNumMTSetups()
31           NMMUL             ARKLsGetNumMassMult()
32           NMPE              ARKLsGetNumMassPrecEvals()
33           NMPS              ARKLsGetNumMassPrecSolves()
34           NMLI              ARKLsGetNumMassIters()
35           NMCFL             ARKLsGetNumMassConvFails()

Additional optional output routines

In addition to the optional inputs communicated through FARKSET* calls and the optional outputs extracted from IOUT and ROUT, the following user-callable routines are available.

To obtain the error weight array EWT, containing the multiplicative error weights used in the WRMS norms, the user may call the routine FARKGETERRWEIGHTS() as follows:

subroutine FARKGETERRWEIGHTS(EWT, IER)
   Retrieves the current error weight vector (interfaces with ARKStepGetErrWeights()).
   Arguments:
   • EWT (realtype, output) – array containing the error weight vector.
   • IER (int, output) – return flag (0 if success, ≠ 0 if an error).
   Notes: The array EWT must have already been allocated by the user, of the same size as the solution array Y.

Similarly, to obtain the estimated local truncation errors, following a successful call to FARKODE(), the user may call the routine FARKGETESTLOCALERR() as follows:

subroutine FARKGETESTLOCALERR(ELE, IER)
   Retrieves the current local truncation error estimate vector (interfaces with ARKStepGetEstLocalErrors()).
   Arguments:
   • ELE (realtype, output) – array with the estimated local truncation error vector.
   • IER (int, output) – return flag (0 if success, ≠ 0 if an error).
   Notes: The array ELE must have already been allocated by the user, of the same size as the solution array Y.

5.2.4 Usage of the FARKROOT interface to rootfinding

The FARKROOT interface package allows programs written in Fortran to use the rootfinding feature of the ARKStep solver module. The user-callable functions in FARKROOT, with the corresponding ARKStep functions, are as follows:
• FARKROOTINIT() interfaces to ARKStepRootInit(),
• FARKROOTINFO() interfaces to ARKStepGetRootInfo(), and
• FARKROOTFREE() interfaces to ARKStepRootInit(), freeing memory by calling the initializer with no root functions.

Note that at this time, FARKROOT does not provide support to specify the direction of zero-crossing that is to be monitored. Instead, all roots are considered. However, the actual direction of zero-crossing may be captured by the user through monitoring the sign of any non-zero elements in the array INFO returned by FARKROOTINFO().

In order to use the rootfinding feature of ARKStep, after calling FARKMALLOC() but prior to calling FARKODE(), the user must call FARKROOTINIT() to allocate and initialize memory for the FARKROOT module:

subroutine FARKROOTINIT(NRTFN, IER)
   Initializes the Fortran interface to the FARKROOT module.
   Arguments:
   • NRTFN (int, input) – total number of root functions.
   • IER (int, output) – return flag (0 success, -1 if ARKStep memory is NULL, and -11 if a memory allocation error occurred).

If rootfinding is enabled, the user must specify the functions whose roots are to be found.
These rootfinding functions should be implemented in the user-supplied FARKROOTFN() subroutine:

subroutine FARKROOTFN(T, Y, G, IPAR, RPAR, IER)
   User-supplied function implementing the vector-valued function 𝑔(𝑡, 𝑦) such that the roots of the NRTFN components 𝑔𝑖(𝑡, 𝑦) = 0 are sought.
   Arguments:
   • T (realtype, input) – independent variable value 𝑡.
   • Y (realtype, input) – dependent variable array 𝑦.
   • G (realtype, output) – function value array 𝑔(𝑡, 𝑦).
   • IPAR (long int, input/output) – integer user data array, the same as the array passed to FARKMALLOC().
   • RPAR (realtype, input/output) – real-valued user data array, the same as the array passed to FARKMALLOC().
   • IER (int, output) – return flag (0 success, < 0 if error).

When making calls to FARKODE() to solve the ODE system, the occurrence of a root is flagged by the return value IER = 2. In that case, if NRTFN > 1, the functions 𝑔𝑖(𝑡, 𝑦) which were found to have a root can be identified by calling the routine FARKROOTINFO():

subroutine FARKROOTINFO(NRTFN, INFO, IER)
   Interfaces with ARKStepGetRootInfo() to report which of the root functions were found to have a root.
   Arguments:
   • NRTFN (int, input) – total number of root functions.
   • INFO (int, input/output) – array of length NRTFN with root information (must be allocated by the user). For each index, i = 1, ..., NRTFN:
      – INFO(i) = 1 if 𝑔𝑖(𝑡, 𝑦) was found to have a root, and 𝑔𝑖 is increasing.
      – INFO(i) = -1 if 𝑔𝑖(𝑡, 𝑦) was found to have a root, and 𝑔𝑖 is decreasing.
      – INFO(i) = 0 otherwise.
   • IER (int, output) – return flag (0 success, < 0 if error).

The total number of calls made to the root function FARKROOTFN(), denoted NGE, can be obtained from IOUT(13) (see Table: Optional FARKODE integer outputs). If the FARKODE/ARKStep memory block is reinitialized to solve a different problem via a call to FARKREINIT(), then the counter NGE is reset to zero.
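Since FARKROOT monitors all zero-crossings, the direction has to be recovered from the signs in the root-information array. The C sketch below tallies increasing and decreasing roots from an array that follows the INFO convention above; the function name is hypothetical and the routine is illustrative only, not part of FARKODE.

```c
#include <assert.h>
#include <stddef.h>

/* Tally the entries of a root-information array that follows the INFO
   convention: +1 = root found with g_i increasing, -1 = root found with
   g_i decreasing, 0 = no root. Returns the total number of roots found. */
int count_root_directions(const int *info, int nrtfn,
                          int *n_increasing, int *n_decreasing)
{
  assert(info != NULL && n_increasing != NULL && n_decreasing != NULL);
  *n_increasing = 0;
  *n_decreasing = 0;
  /* C arrays are 0-based, whereas the Fortran INFO(i) is 1-based. */
  for (int i = 0; i < nrtfn; i++) {
    if (info[i] > 0) (*n_increasing)++;
    else if (info[i] < 0) (*n_decreasing)++;
  }
  return *n_increasing + *n_decreasing;
}
```

A caller interested only in, say, downward crossings could ignore a root return whenever the decreasing count is zero and simply continue integrating.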
Lastly, to free the memory resources allocated by a prior call to FARKROOTINIT(), the user must make a call to FARKROOTFREE():

subroutine FARKROOTFREE()
   Frees memory associated with the FARKODE rootfinding module.

5.2.5 Usage of the FARKODE interface to built-in preconditioners

The FARKODE interface enables usage of the two built-in preconditioning modules ARKBANDPRE and ARKBBDPRE. Details on how these preconditioners work are provided in the section Preconditioner modules. In this section, we focus specifically on the Fortran interface to these modules.

Usage of the FARKBP interface to ARKBANDPRE

The FARKBP interface module is a package of C functions which, as part of the FARKODE interface module, support the use of the ARKStep solver with the serial or threaded NVector modules (The NVECTOR_SERIAL Module, The NVECTOR_OPENMP Module or The NVECTOR_PTHREADS Module), and the combination of the ARKBANDPRE preconditioner module (see the section A serial banded preconditioner module) with the ARKStep linear solver interface and any of the Krylov iterative linear solvers.

The two user-callable functions in this package, with the corresponding ARKStep function around which they wrap, are:
• FARKBPINIT() interfaces to ARKBandPrecInit().
• FARKBPOPT() interfaces to the ARKBANDPRE optional output functions, ARKBandPrecGetWorkSpace() and ARKBandPrecGetNumRhsEvals().

As with the rest of the FARKODE routines, the names of the user-supplied routines are mapped to actual values through a series of definitions in the header file farkbp.h.

The following is a summary of the usage of this module. Steps that are unchanged from the main program described in the section Usage of the FARKODE interface module are italicized.

1. Right-hand side specification

2. NVECTOR module initialization

3. SUNLINSOL module initialization
   Initialize one of the iterative SUNLINSOL modules, by calling one of FSUNPCGINIT, FSUNSPBCGSINIT, FSUNSPFGMRINIT, FSUNSPGMRINIT or FSUNSPTFQMRINIT, supplying an argument to specify that the SUNLINSOL module should utilize left or right preconditioning.

4. Problem specification

5. Set optional inputs

6. Linear solver interface specification
   First, initialize the ARKStep linear solver interface by calling FARKLSINIT().
   Optionally, to specify that ARKStep should use the supplied FARKJTIMES() and FARKJTSETUP() routines, the user should call FARKLSSETJAC() with FLAG ≠ 0, as described in the section Iterative linear solvers.
   Then, to initialize the ARKBANDPRE preconditioner, call the routine FARKBPINIT(), as follows:

   subroutine FARKBPINIT(NEQ, MU, ML, IER)
      Interfaces with the ARKBandPrecInit() function to allocate memory and initialize data associated with the ARKBANDPRE preconditioner.
      Arguments:
      • NEQ (long int, input) – problem size.
      • MU (long int, input) – upper half-bandwidth of the band matrix that is retained as an approximation of the Jacobian.
      • ML (long int, input) – lower half-bandwidth of the band matrix approximation to the Jacobian.
      • IER (int, output) – return flag (0 if success, -1 if a memory failure).

7. Problem solution

8. ARKBANDPRE optional outputs
   Optional outputs for ARKStep’s linear solver interface are listed in Table: Optional ARKLS interface outputs. To obtain the optional outputs associated with the ARKBANDPRE module, the user should call FARKBPOPT(), as specified below:

   subroutine FARKBPOPT(LENRWBP, LENIWBP, NFEBP)
      Interfaces with the ARKBANDPRE optional output functions.
      Arguments:
      • LENRWBP (long int, output) – length of real preconditioner work space (from ARKBandPrecGetWorkSpace()).
      • LENIWBP (long int, output) – length of integer preconditioner work space, in integer words (from ARKBandPrecGetWorkSpace()).
      • NFEBP (long int, output) – number of 𝑓𝐼(𝑡, 𝑦) evaluations (from ARKBandPrecGetNumRhsEvals()).

9. Additional solution output

10. Problem re-initialization

11. Memory deallocation
   (The memory allocated for the FARKBP module is deallocated automatically by FARKFREE())

Usage of the FARKBBD interface to ARKBBDPRE

The FARKBBD interface module is a package of C functions which, as part of the FARKODE interface module, support the use of the ARKStep solver with the parallel vector module (The NVECTOR_PARALLEL Module), and the combination of the ARKBBDPRE preconditioner module (see the section A parallel band-block-diagonal preconditioner module) with any of the Krylov iterative linear solvers.

The user-callable functions in this package, with the corresponding ARKStep and ARKBBDPRE functions, are as follows:
• FARKBBDINIT() interfaces to ARKBBDPrecInit().
• FARKBBDREINIT() interfaces to ARKBBDPrecReInit().
• FARKBBDOPT() interfaces to the ARKBBDPRE optional output functions.

In addition to the functions required for general FARKODE usage, the user-supplied functions required by this package are listed in the table below, each with the corresponding interface function which calls it (and its type within ARKBBDPRE or ARKStep).

Table: FARKBBD function mapping

FARKBBD routine (FORTRAN, user-supplied)   ARKStep routine (C, interface)   ARKStep interface function type
FARKGLOCFN()                               FARKgloc                         ARKLocalFn()
FARKCOMMFN()                               FARKcfn                          ARKCommFn()
FARKJTIMES()                               FARKJtimes                       ARKLsJacTimesVecFn()
FARKJTSETUP()                              FARKJTSetup                      ARKLsJacTimesSetupFn()

As with the rest of the FARKODE routines, the names of all user-supplied routines here are fixed, in order to maximize portability for the resulting mixed-language program.
Additionally, based on flags discussed above in the section FARKODE routines, the names of the user-supplied routines are mapped to actual values through a series of definitions in the header file farkbbd.h.

The following is a summary of the usage of this module. Steps that are unchanged from the main program described in the section Usage of the FARKODE interface module are italicized.

1. Right-hand side specification

2. NVECTOR module initialization

3. SUNLINSOL module initialization
   Initialize one of the iterative SUNLINSOL modules, by calling one of FSUNPCGINIT, FSUNSPBCGSINIT, FSUNSPFGMRINIT, FSUNSPGMRINIT or FSUNSPTFQMRINIT, supplying an argument to specify that the SUNLINSOL module should utilize left or right preconditioning.

4. Problem specification

5. Set optional inputs

6. Linear solver interface specification
   First, initialize ARKStep’s linear solver interface by calling FARKLSINIT().
   Optionally, to specify that ARKStep should use the supplied FARKJTIMES() and FARKJTSETUP() routines, the user should call FARKLSSETJAC() with FLAG ≠ 0, as described in the section Iterative linear solvers.
   Then, to initialize the ARKBBDPRE preconditioner, call the function FARKBBDINIT(), as described below:

   subroutine FARKBBDINIT(NLOCAL, MUDQ, MLDQ, MU, ML, DQRELY, IER)
      Interfaces with the ARKBBDPrecInit() routine to initialize the ARKBBDPRE preconditioning module.
      Arguments:
      • NLOCAL (long int, input) – local vector size on this process.
      • MUDQ (long int, input) – upper half-bandwidth to be used in the computation of the local Jacobian blocks by difference quotients. These may be smaller than the true half-bandwidths of the Jacobian of the local block of 𝑔, when smaller values may provide greater efficiency.
      • MLDQ (long int, input) – lower half-bandwidth to be used in the computation of the local Jacobian blocks by difference quotients.
      • MU (long int, input) – upper half-bandwidth of the band matrix that is retained as an approximation of the local Jacobian block (may be smaller than MUDQ).
      • ML (long int, input) – lower half-bandwidth of the band matrix that is retained as an approximation of the local Jacobian block (may be smaller than MLDQ).
      • DQRELY (realtype, input) – relative increment factor in 𝑦 for difference quotients (0.0 indicates to use the default).
      • IER (int, output) – return flag (0 if success, -1 if a memory failure).

7. Problem solution

8. ARKBBDPRE optional outputs
   Optional outputs from the ARKStep linear solver interface are listed in Table: Optional ARKLS interface outputs. To obtain the optional outputs associated with the ARKBBDPRE module, the user should call FARKBBDOPT(), as specified below:

   subroutine FARKBBDOPT(LENRWBBD, LENIWBBD, NGEBBD)
      Interfaces with the ARKBBDPRE optional output functions.
      Arguments:
      • LENRWBBD (long int, output) – length of real preconditioner work space on this process (from ARKBBDPrecGetWorkSpace()).
      • LENIWBBD (long int, output) – length of integer preconditioner work space on this process (from ARKBBDPrecGetWorkSpace()).
      • NGEBBD (long int, output) – number of 𝑔(𝑡, 𝑦) evaluations (from ARKBBDPrecGetNumGfnEvals()) so far.

9. Additional solution output

10. Problem re-initialization
   If a sequence of problems of the same size is being solved using the same linear solver in combination with the ARKBBDPRE preconditioner, then the ARKStep package can be re-initialized for the second and subsequent problems by calling FARKREINIT(), following which a call to FARKBBDREINIT() may or may not be needed. If the input arguments are the same, no FARKBBDREINIT() call is needed.
   If there is a change in input arguments other than MU or ML, then the user program should call FARKBBDREINIT() as specified below:

   subroutine FARKBBDREINIT(NLOCAL, MUDQ, MLDQ, DQRELY, IER)
      Interfaces with the ARKBBDPrecReInit() function to reinitialize the ARKBBDPRE module.
      Arguments: The arguments of the same names have the same meanings as in FARKBBDINIT().

   However, if the value of MU or ML is being changed, then a call to FARKBBDINIT() must be made instead. Finally, if there is a change in any of the linear solver inputs, then a call to one of FSUNSPGMRINIT(), FSUNSPBCGSINIT(), FSUNSPTFQMRINIT(), FSUNSPFGMRINIT() or FSUNPCGINIT(), followed by a call to FARKLSINIT(), must also be made; in this case the linear solver memory is reallocated.

11. Problem resizing
   If a sequence of problems of different sizes (but with similar dynamical time scales) is being solved using the same linear solver (SPGMR, SPBCG, SPTFQMR, SPFGMR or PCG) in combination with the ARKBBDPRE preconditioner, then the ARKStep package can be re-initialized for the second and subsequent problems by calling FARKRESIZE(), following which a call to FARKBBDINIT() is required to delete and re-allocate the preconditioner memory of the correct size.

12. Memory deallocation
   (The memory allocated for the FARKBBD module is deallocated automatically by FARKFREE()).

13. User-supplied routines
   The following two routines must be supplied for use with the ARKBBDPRE module:

   subroutine FARKGLOCFN(NLOC, T, YLOC, GLOC, IPAR, RPAR, IER)
      User-supplied routine (of type ARKLocalFn()) that computes a processor-local approximation 𝑔(𝑡, 𝑦) to the right-hand side function 𝑓𝐼(𝑡, 𝑦).
      Arguments:
      • NLOC (long int, input) – local problem size.
      • T (realtype, input) – current value of the independent variable.
      • YLOC (realtype, input) – array containing local dependent state variables.
      • GLOC (realtype, output) – array containing local dependent state derivatives.
      • IPAR (long int, input/output) – array containing integer user data that was passed to FARKMALLOC().
      • RPAR (realtype, input/output) – array containing real user data that was passed to FARKMALLOC().
      • IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).

   subroutine FARKCOMMFN(NLOC, T, YLOC, IPAR, RPAR, IER)
      User-supplied routine (of type ARKCommFn()) that performs all inter-process communication necessary for the execution of the FARKGLOCFN() function above, using the input vector YLOC.
      Arguments:
      • NLOC (long int, input) – local problem size.
      • T (realtype, input) – current value of the independent variable.
      • YLOC (realtype, input) – array containing local dependent state variables.
      • IPAR (long int, input/output) – array containing integer user data that was passed to FARKMALLOC().
      • RPAR (realtype, input/output) – array containing real user data that was passed to FARKMALLOC().
      • IER (int, output) – return flag (0 if success, >0 if a recoverable error occurred, <0 if an unrecoverable error occurred).
      Notes: This subroutine must be supplied even if it is not needed, and must return IER = 0.
CHAPTER SIX
USING ERKSTEP FOR C AND C++ APPLICATIONS

This chapter is concerned with the use of the ERKStep time-stepping module for the solution of nonstiff initial value problems (IVPs) in a C or C++ language setting. The following sections discuss the header files and the layout of the user’s main program, and provide descriptions of the ERKStep user-callable functions and user-supplied functions.

The example programs described in the companion document [R2018] may be helpful. Those codes may be used as templates for new codes and are included in the ARKode package examples subdirectory.

ERKStep uses the input and output constants from the shared ARKode infrastructure. These are defined as needed in this chapter, but for convenience the full list is provided separately in the section Appendix: ARKode Constants.

The relevant information on using ERKStep’s C and C++ interfaces is detailed in the following sub-sections.

6.1 Access to library and header files

At this point, it is assumed that the installation of ARKode, following the procedure described in the section ARKode Installation Procedure, has been completed successfully.

Regardless of where the user’s application program resides, its associated compilation and load commands must make reference to the appropriate locations for the library and header files required by ARKode. The relevant library files are
• libdir/libsundials_arkode.lib,
• libdir/libsundials_nvec*.lib,
where the file extension .lib is typically .so for shared libraries and .a for static libraries. The relevant header files are located in the subdirectories
• incdir/include/arkode
• incdir/include/sundials
• incdir/include/nvector

The directories libdir and incdir are the installation library and include directories, respectively.
For a default installation, these are instdir/lib and instdir/include, respectively, where instdir is the directory where SUNDIALS was installed (see the section ARKode Installation Procedure for further details).

6.2 Data Types

The sundials_types.h file contains the definition of the variable type realtype, which is used by the SUNDIALS solvers for all floating-point data, the definition of the integer type sunindextype, which is used for vector and matrix indices, and booleantype, which is used for certain logic operations within SUNDIALS.

6.2.1 Floating point types

The type realtype can be set to float, double, or long double, depending on how SUNDIALS was installed (with the default being double). The user can change the precision of the SUNDIALS solvers’ floating-point arithmetic at the configuration stage (see the section Configuration options (Unix/Linux)).

Additionally, based on the current precision, sundials_types.h defines the values BIG_REAL to be the largest value representable as a realtype, SMALL_REAL to be the smallest positive value representable as a realtype, and UNIT_ROUNDOFF to be the smallest realtype number, 𝜀, such that 1.0 + 𝜀 ≠ 1.0.

Within SUNDIALS, real constants may be set to have the appropriate precision by way of a macro called RCONST. It is this macro that needs the ability to branch on the definition of realtype. In ANSI C, a floating-point constant with no suffix is stored as a double. Placing the suffix “F” at the end of a floating point constant makes it a float, whereas using the suffix “L” makes it a long double. For example,

    #define A 1.0
    #define B 1.0F
    #define C 1.0L

defines A to be a double constant equal to 1.0, B to be a float constant equal to 1.0, and C to be a long double constant equal to 1.0. The macro call RCONST(1.0) automatically expands to 1.0 if realtype is double, to 1.0F if realtype is float, or to 1.0L if realtype is long double.
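The suffix-selection idea described above can be sketched with the preprocessor's token-pasting operator. This is only an illustration under stated assumptions, not the contents of sundials_types.h: the names DEMO_SINGLE_PRECISION, DEMO_EXTENDED_PRECISION, demo_real and DEMO_RCONST are hypothetical stand-ins for the configuration switch, realtype and RCONST.

```c
#include <stddef.h>

/* Hypothetical precision switches (not SUNDIALS names). With neither
   DEMO_SINGLE_PRECISION nor DEMO_EXTENDED_PRECISION defined, the sketch
   falls back to double, mirroring the documented default. */
#if defined(DEMO_SINGLE_PRECISION)
typedef float demo_real;
#define DEMO_RCONST(x) x##F   /* 1.0 becomes 1.0F */
#elif defined(DEMO_EXTENDED_PRECISION)
typedef long double demo_real;
#define DEMO_RCONST(x) x##L   /* 1.0 becomes 1.0L */
#else
typedef double demo_real;
#define DEMO_RCONST(x) x      /* 1.0 stays 1.0 */
#endif

/* A constant written once that takes on the configured precision. */
demo_real demo_half(void)
{
  return DEMO_RCONST(0.5);
}

/* The size of the literal matches the size of the configured real type. */
size_t demo_constant_size(void)
{
  return sizeof(DEMO_RCONST(1.0));
}
```

Compiling the same file with -DDEMO_SINGLE_PRECISION would switch both the typedef and every constant written through the macro, which is the property the paragraph above attributes to RCONST.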
SUNDIALS uses the RCONST macro internally to declare all of its floating-point constants.

A user program which uses the type realtype and the RCONST macro to handle floating-point constants is precision-independent, except for any calls to precision-specific standard math library functions. Users can, however, use the types double, float, or long double in their code (assuming that this usage is consistent with the size of realtype values that are passed to and from SUNDIALS). Thus, a previously existing piece of ANSI C code can use SUNDIALS without modifying the code to use realtype, so long as the SUNDIALS libraries have been compiled using the same precision (for details see the section ARKode Installation Procedure).

6.2.2 Integer types used for vector and matrix indices

The type sunindextype can be either a 32- or 64-bit signed integer. The default is the portable int64_t type, and the user can change it to int32_t at the configuration stage. The configuration system will detect if the compiler does not support portable types, and will replace int32_t and int64_t with int and long int, respectively, to ensure use of the desired sizes on Linux, Mac OS X, and Windows platforms. SUNDIALS currently does not support unsigned integer types for vector and matrix indices, although these could be added in the future if there is sufficient demand.

A user program which uses sunindextype to handle vector and matrix indices will work with both index storage types except for any calls to index storage-specific external libraries. (Our C and C++ example programs use sunindextype.) Users can, however, use any one of int, long int, int32_t, int64_t or long long int in their code, assuming that this usage is consistent with the typedef for sunindextype on their architecture.
Thus, a previously existing piece of ANSI C code can use SUNDIALS without modifying the code to use sunindextype, so long as the SUNDIALS libraries use the appropriate index storage type (for details see the section ARKode Installation Procedure).

6.3 Header Files

When using ERKStep, the calling program must include several header files so that various macros and data types can be used. The header file that is always required is:
• arkode/arkode_erkstep.h, the main header file for the ERKStep time-stepping module, which defines several types and various constants, includes function prototypes, and includes the shared arkode/arkode.h header file.

Note that arkode.h includes sundials_types.h directly, which defines the types realtype, sunindextype and booleantype and the constants SUNFALSE and SUNTRUE, so a user program does not need to include sundials_types.h directly.

Additionally, the calling program must also include an NVECTOR implementation header file, of the form nvector/nvector_***.h, corresponding to the user’s preferred data layout and form of parallelism. See the section Vector Data Structures for the appropriate name. This file in turn includes the header file sundials_nvector.h, which defines the abstract N_Vector data type.

6.4 A skeleton of the user’s main program

The following is a skeleton of the user’s main program (or calling program) for the integration of an ODE IVP using the ERKStep module. Most of the steps are independent of the NVECTOR implementation used. For the steps that are not, refer to the section Vector Data Structures for the specific name of the function to be called or macro to be referenced.

1. Initialize parallel or multi-threaded environment, if appropriate.
For example, call MPI_Init to initialize MPI if used, or set num_threads, the number of threads to use within the threaded vector functions, if used.

2. Set problem dimensions, etc.
This generally includes the problem size, N, and may include the local vector length Nlocal.
Note: The variables N and Nlocal should be of type sunindextype.

3. Set vector of initial values
To set the vector y0 of initial values, use the appropriate functions defined by the particular NVECTOR implementation.

For native SUNDIALS vector implementations (except the CUDA and RAJA based ones), use a call of the form
    y0 = N_VMake_***(..., ydata);
if the realtype array ydata containing the initial values of 𝑦 already exists. Otherwise, create a new vector by making a call of the form
    y0 = N_VNew_***(...);
and then set its elements by accessing the underlying data where it is located with a call of the form
    ydata = N_VGetArrayPointer_***(y0);
See the sections The NVECTOR_SERIAL Module through The NVECTOR_PTHREADS Module for details.

For the HYPRE and PETSc vector wrappers, first create and initialize the underlying vector, and then create the NVECTOR wrapper with a call of the form
    y0 = N_VMake_***(yvec);
where yvec is a HYPRE or PETSc vector. Note that calls like N_VNew_***(...) and N_VGetArrayPointer_***(...) are not available for these vector wrappers. See the sections The NVECTOR_PARHYP Module and The NVECTOR_PETSC Module for details.

If using either the CUDA- or RAJA-based vector implementations, use a call of the form
    y0 = N_VMake_***(..., c);
where c is a pointer to a suncudavec or sunrajavec vector class if this class already exists. Otherwise, create a new vector by making a call of the form
    y0 = N_VNew_***(...);
and then set its elements by accessing the underlying data where it is located with a call of the form
    N_VGetDeviceArrayPointer_*** or N_VGetHostArrayPointer_***
Note that the vector class will allocate memory on both the host and device when instantiated.
See the sections The NVECTOR_CUDA Module and The NVECTOR_RAJA Module for details.

4. Create ERKStep object
Call arkode_mem = ERKStepCreate(...) to create the ERKStep memory block. ERKStepCreate() returns a void* pointer to this memory structure. See the section ERKStep initialization and deallocation functions for details.

5. Specify integration tolerances
Call ERKStepSStolerances() or ERKStepSVtolerances() to specify either a scalar relative tolerance and scalar absolute tolerance, or a scalar relative tolerance and a vector of absolute tolerances, respectively. Alternatively, call ERKStepWFtolerances() to specify a function which directly sets the weights used in evaluating WRMS vector norms. See the section ERKStep tolerance specification functions for details.

6. Set optional inputs
Call ERKStepSet* functions to change any optional inputs that control the behavior of ERKStep from their default values. See the section Optional input functions for details.

7. Specify rootfinding problem
Optionally, call ERKStepRootInit() to initialize a rootfinding problem to be solved during the integration of the ODE system. See the section Rootfinding initialization function for general details, and the section Optional input functions for relevant optional input calls.

8. Advance solution in time
For each point at which output is desired, call
    ier = ERKStepEvolve(arkode_mem, tout, yout, &tret, itask);
Here, itask specifies the return mode. The vector yout (which can be the same as the vector y0 above) will contain 𝑦(tout). See the section ERKStep solver function for details.

9. Get optional outputs
Call ERKStepGet* functions to obtain optional output. See the section Optional output functions for details.

10. Deallocate memory for solution vector
Upon completion of the integration, deallocate memory for the vector y (or yout) by calling the NVECTOR destructor function:
    N_VDestroy(y);

11. Free solver memory
Call ERKStepFree(&arkode_mem) to free the memory allocated for the ERKStep module.

12. Finalize MPI, if used
Call MPI_Finalize to terminate MPI.

6.5 ERKStep User-callable functions

This section describes the functions that are called by the user to set up and then solve an IVP using the ERKStep time-stepping module. Some of these are required; however, starting with the section Optional input functions, the functions listed involve optional inputs/outputs or restarting, and those paragraphs may be skipped for a casual use of ARKode’s ERKStep module. In any case, refer to the preceding section, A skeleton of the user’s main program, for the correct order of these calls.

On an error, each user-callable function returns a negative value (or NULL if the function returns a pointer) and sends an error message to the error handler routine, which prints the message to stderr by default. However, the user can set a file as error output or can provide a custom error handler function (see the section Optional input functions for details).

6.5.1 ERKStep initialization and deallocation functions

void* ERKStepCreate(ARKRhsFn f, realtype t0, N_Vector y0)
This function allocates and initializes memory for a problem to be solved using the ERKStep time-stepping module in ARKode.
Arguments:
• f – the name of the C function (of type ARKRhsFn()) defining the right-hand side function in $\dot{y} = f(t, y)$.
• t0 – the initial value of 𝑡.
• y0 – the initial condition vector 𝑦(𝑡0).
Return value: If successful, a pointer to initialized problem memory of type void*, to be passed to all user-facing ERKStep routines listed below. If unsuccessful, a NULL pointer will be returned, and an error message will be printed to stderr.

void ERKStepFree(void** arkode_mem)
This function frees the problem memory arkode_mem created by ERKStepCreate().
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
Return value: None

6.5.2 ERKStep tolerance specification functions

These functions specify the integration tolerances. One of them should be called before the first call to ERKStepEvolve(); otherwise default values of reltol = 1e-4 and abstol = 1e-9 will be used, which may be entirely incorrect for a specific problem.

The integration tolerances reltol and abstol define a vector of error weights, ewt. In the case of ERKStepSStolerances(), this vector has components
    ewt[i] = 1.0/(reltol*abs(y[i]) + abstol);
whereas in the case of ERKStepSVtolerances() the vector components are given by
    ewt[i] = 1.0/(reltol*abs(y[i]) + abstol[i]);
This vector is used in all error tests, which use a weighted RMS norm on all error-like vectors v:

$$\|v\|_{WRMS} = \left( \frac{1}{N} \sum_{i=1}^{N} (v_i \, ewt_i)^2 \right)^{1/2},$$

where 𝑁 is the problem dimension.

Alternatively, the user may supply a custom function to supply the ewt vector, through a call to ERKStepWFtolerances().

int ERKStepSStolerances(void* arkode_mem, realtype reltol, realtype abstol)
This function specifies scalar relative and absolute tolerances.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• reltol – scalar relative tolerance.
• abstol – scalar absolute tolerance.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
• ARK_NO_MALLOC if the ERKStep memory was not allocated by the time-stepping module
• ARK_ILL_INPUT if an argument has an illegal value (e.g. a negative tolerance).

int ERKStepSVtolerances(void* arkode_mem, realtype reltol, N_Vector abstol)
This function specifies a scalar relative tolerance and a vector absolute tolerance (a potentially different absolute tolerance for each vector component).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• reltol – scalar relative tolerance.
• abstol – vector containing the absolute tolerances for each solution component.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
• ARK_NO_MALLOC if the ERKStep memory was not allocated by the time-stepping module
• ARK_ILL_INPUT if an argument has an illegal value (e.g. a negative tolerance).

int ERKStepWFtolerances(void* arkode_mem, ARKEwtFn efun)
This function specifies a user-supplied function efun to compute the error weight vector ewt.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• efun – the name of the function (of type ARKEwtFn()) that implements the error weight vector computation.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
• ARK_NO_MALLOC if the ERKStep memory was not allocated by the time-stepping module

General advice on the choice of tolerances

For many users, the appropriate choices for tolerance values in reltol and abstol are a concern. The following pieces of advice are relevant.

1. The scalar relative tolerance reltol is to be set to control relative errors. So a value of $10^{-4}$ means that errors are controlled to 0.01%. We do not recommend using reltol larger than $10^{-3}$. On the other hand, reltol should not be so small that it is comparable to the unit roundoff of the machine arithmetic (generally around $10^{-15}$ for double-precision).

2. The absolute tolerances abstol (whether scalar or vector) need to be set to control absolute errors when any components of the solution vector 𝑦 may be so small that pure relative error control is meaningless. For example, if 𝑦𝑖 starts at some nonzero value, but in time decays to zero, then pure relative error control on 𝑦𝑖 makes no sense (and is overly costly) after 𝑦𝑖 is below some noise level. Then abstol (if scalar) or abstol[i] (if a vector) needs to be set to that noise level.
If the different components have different noise levels, then abstol should be a vector. For example, see the example problem ark_robertson.c, and the discussion of it in the ARKode Examples Documentation [R2018]. In that problem, the three components vary between 0 and 1, and have different noise levels; hence the atols vector therein. It is impossible to give any general advice on abstol values, because the appropriate noise levels are completely problem-dependent. The user or modeler hopefully has some idea as to what those noise levels are.

3. Finally, it is important to pick all the tolerance values conservatively, because they control the error committed on each individual step. The final (global) errors are an accumulation of those per-step errors, where that accumulation factor is problem-dependent. A general rule of thumb is to reduce the tolerances by a factor of 10 from the actual desired limits on errors. So if you want 0.01% relative accuracy (globally), a good choice for reltol is $10^{-5}$. In any case, it is a good idea to do a few experiments with the tolerances to see how the computed solution values vary as tolerances are reduced.

Advice on controlling nonphysical negative values

In many applications, some components in the true solution are always positive or non-negative, though at times very small. In the numerical solution, however, small negative (nonphysical) values can then occur. In most cases, these values are harmless, and simply need to be controlled, not eliminated, but in other cases any value that violates a constraint may cause a simulation to halt. For both of these scenarios the following pieces of advice are relevant.

1. The best way to control the size of unwanted negative computed values is with tighter absolute tolerances. Again this requires some knowledge of the noise level of these components, which may or may not be different for different components. Some experimentation may be needed.

2. If output plots or tables are being generated, and it is important to avoid having negative numbers appear there (for the sake of avoiding a long explanation of them, if nothing else), then eliminate them, but only in the context of the output medium. Then the internal values carried by the solver are unaffected. Remember that a small negative value in 𝑦 returned by ERKStep, with magnitude comparable to abstol or less, is equivalent to zero as far as the computation is concerned.

3. The user’s right-hand side routine 𝑓 should never change a negative value in the solution vector 𝑦 to a non-negative value in an attempt to “fix” this problem, since this can lead to numerical instability. If the 𝑓 routine cannot tolerate a zero or negative value (e.g. because there is a square root or log), then the offending value should be changed to zero or a tiny positive number in a temporary variable (not in the input 𝑦 vector) for the purposes of computing 𝑓(𝑡, 𝑦).

4. Positivity and non-negativity constraints on components can be enforced by use of the recoverable error return feature in the user-supplied right-hand side function, 𝑓. When a recoverable error is encountered, ERKStep will retry the step with a smaller step size, which typically alleviates the problem. However, because this option involves some additional overhead cost, it should only be exercised if the use of absolute tolerances to control the computed values is unsuccessful.

6.5.3 Rootfinding initialization function

As described in the section Rootfinding, while solving the IVP, ARKode’s time-stepping modules have the capability to find the roots of a set of user-defined functions. To activate the root-finding algorithm, call the following function.
This is normally called only once, prior to the first call to ERKStepEvolve(), but if the rootfinding problem is to be changed during the solution, ERKStepRootInit() can also be called prior to a continuation call to ERKStepEvolve().

int ERKStepRootInit(void* arkode_mem, int nrtfn, ARKRootFn g)
Initializes a rootfinding problem to be solved during the integration of the ODE system. It must be called after ERKStepCreate(), and before ERKStepEvolve().
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• nrtfn – number of functions 𝑔𝑖, an integer ≥ 0.
• g – name of user-supplied function, of type ARKRootFn(), defining the functions 𝑔𝑖 whose roots are sought.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
• ARK_MEM_FAIL if there was a memory allocation failure
• ARK_ILL_INPUT if nrtfn is greater than zero but g = NULL.
Notes: To disable the rootfinding feature after it has already been initialized, or to free memory associated with ERKStep’s rootfinding module, call ERKStepRootInit with nrtfn = 0.
Similarly, if a new IVP is to be solved with a call to ERKStepReInit(), where the new IVP has no rootfinding problem but the prior one did, then call ERKStepRootInit with nrtfn = 0.

6.5.4 ERKStep solver function

This is the central step in the solution process: the call to perform the integration of the IVP. One of the input arguments (itask) specifies one of two modes as to where ERKStep is to return a solution. These modes are modified if the user has set a stop time (with a call to the optional input function ERKStepSetStopTime()) or has requested rootfinding.

int ERKStepEvolve(void* arkode_mem, realtype tout, N_Vector yout, realtype *tret, int itask)
Integrates the ODE over an interval in 𝑡.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• tout – the next time at which a computed solution is desired.
• yout – the computed solution vector.
• tret – the time corresponding to yout (output).
• itask – a flag indicating the job of the solver for the next user step.
  The ARK_NORMAL option causes the solver to take internal steps until it has just overtaken a user-specified output time, tout, in the direction of integration, i.e. $t_{n-1} <$ tout $\le t_n$ for forward integration, or $t_n \le$ tout $< t_{n-1}$ for backward integration. It will then compute an approximation to the solution 𝑦(tout) by interpolation (using one of the dense output routines described in the section Interpolation).
  The ARK_ONE_STEP option tells the solver to only take a single internal step $y_{n-1} \to y_n$ and then return control back to the calling program. If this step will overtake tout then the solver will again return an interpolated result; otherwise it will return a copy of the internal solution $y_n$ in the vector yout.
Return value:
• ARK_SUCCESS if successful.
• ARK_ROOT_RETURN if ERKStepEvolve() succeeded, and found one or more roots. If the number of root functions, nrtfn, is greater than 1, call ERKStepGetRootInfo() to see which 𝑔𝑖 were found to have a root at (*tret).
• ARK_TSTOP_RETURN if ERKStepEvolve() succeeded and returned at tstop.
• ARK_MEM_NULL if the arkode_mem argument was NULL.
• ARK_NO_MALLOC if arkode_mem was not allocated.
• ARK_ILL_INPUT if one of the inputs to ERKStepEvolve() is illegal, or some other input to the solver was either illegal or missing. Details will be provided in the error message. Typical causes of this failure:
  1. A component of the error weight vector became zero during internal time-stepping.
  2. A root of one of the root functions was found both at a point 𝑡 and also very near 𝑡.
• ARK_TOO_MUCH_WORK if the solver took mxstep internal steps but could not reach tout. The default value for mxstep is MXSTEP_DEFAULT = 500.
• ARK_TOO_MUCH_ACC if the solver could not satisfy the accuracy demanded by the user for some internal step.
• ARK_ERR_FAILURE if error test failures occurred either too many times (ark_maxnef) during one internal time step or occurred with $|h| = h_{min}$.
• ARK_VECTOROP_ERR if a vector operation error occurred.
Notes: The input vector yout can use the same memory as the vector y0 of initial conditions that was passed to ERKStepCreate().
In ARK_ONE_STEP mode, tout is used only on the first call, and only to get the direction and a rough scale of the independent variable.
All failure return values are negative and so testing the return argument for negative values will trap all ERKStepEvolve() failures.
Since interpolation may reduce the accuracy in the reported solution, if full method accuracy is desired the user should issue a call to ERKStepSetStopTime() before the call to ERKStepEvolve() to specify a fixed stop time to end the time step and return to the user. Upon return from ERKStepEvolve(), a copy of the internal solution $y_n$ will be returned in the vector yout. Once the integrator returns at a tstop time, any future testing for tstop is disabled (and can be re-enabled only through a new call to ERKStepSetStopTime()).
On any error return in which one or more internal steps were taken by ERKStepEvolve(), the returned values of tret and yout correspond to the farthest point reached in the integration. On all other error returns, tret and yout are left unchanged from those provided to the routine.

6.5.5 Optional input functions

There are numerous optional input parameters that control the behavior of the ERKStep solver, each of which may be modified from its default value through calling an appropriate input function. The following tables list all optional input functions, grouped by which aspect of ERKStep they control.
Detailed information on the calling syntax and arguments for each function is then provided following each table.

The optional inputs are grouped into the following categories:
• General ERKStep options (Optional inputs for ERKStep),
• IVP method solver options (Optional inputs for IVP method selection),
• Step adaptivity solver options (Optional inputs for time step adaptivity).

For the most casual use of ERKStep, relying on the default set of solver parameters, the reader can skip to the following section, User-supplied functions.

We note that, on an error return, all of the optional input functions send an error message to the error handler function. We also note that all error return values are negative, so a test on the return arguments for negative values will catch all errors.

Optional inputs for ERKStep

Optional input | Function name | Default
Return ERKStep solver parameters to their defaults | ERKStepSetDefaults() | internal
Set dense output order | ERKStepSetDenseOrder() | 3
Supply a pointer to a diagnostics output file | ERKStepSetDiagnostics() | NULL
Supply a pointer to an error output file | ERKStepSetErrFile() | stderr
Supply a custom error handler function | ERKStepSetErrHandlerFn() | internal fn
Disable time step adaptivity (fixed-step mode) | ERKStepSetFixedStep() | disabled
Supply an initial step size to attempt | ERKStepSetInitStep() | estimated
Maximum no. of warnings for $t_n + h = t_n$ | ERKStepSetMaxHnilWarns() | 10
Maximum no. of internal steps before tout | ERKStepSetMaxNumSteps() | 500
Maximum absolute step size | ERKStepSetMaxStep() | ∞
Minimum absolute step size | ERKStepSetMinStep() | 0.0
Set a value for tstop | ERKStepSetStopTime() | ∞
Supply a pointer for user data | ERKStepSetUserData() | NULL
Maximum no. of ERKStep error test failures | ERKStepSetMaxErrTestFails() | 7

int ERKStepSetDefaults(void* arkode_mem)
Resets all optional input parameters to ERKStep’s original default values.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Does not change problem-defining function pointer f or the user_data pointer.
Also leaves alone any data structures or options related to root-finding (those can be reset using ERKStepRootInit()).

int ERKStepSetDenseOrder(void* arkode_mem, int dord)
Specifies the degree of the polynomial interpolant used for dense output (i.e. interpolation of solution output values).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• dord – requested polynomial order of accuracy.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Allowed values are between 0 and min(q,5), where q is the order of the overall integration method.

int ERKStepSetDiagnostics(void* arkode_mem, FILE* diagfp)
Specifies the file pointer for a diagnostics file where all ERKStep step adaptivity and solver information is written.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• diagfp – pointer to the diagnostics output file.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This parameter can be stdout or stderr, although the suggested approach is to specify a pointer to a unique file opened by the user and returned by fopen. If not called, or if called with a NULL file pointer, all diagnostics output is disabled.
When run in parallel, only one process should set a non-NULL value for this pointer, since statistics from all processes would be identical.
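Because every optional input function reports failure through a negative return value, a calling program can funnel all of these calls through one small checking routine. The following is a minimal sketch of that pattern; the helper name demo_check_flag is hypothetical and is not part of the ERKStep interface.

```c
#include <stdio.h>

/* Hypothetical helper (not part of ERKStep): returns 1 and reports to
   stderr when `flag` is negative, which the text above describes as the
   convention for all error returns; returns 0 otherwise. Non-negative
   values are treated as non-failures. */
int demo_check_flag(int flag, const char *funcname)
{
  if (flag < 0) {
    fprintf(stderr, "ERROR: %s() failed with flag = %d\n", funcname, flag);
    return 1;
  }
  return 0;
}
```

In a program linked against ARKode, a call might then be wrapped as `if (demo_check_flag(ERKStepSetDenseOrder(arkode_mem, 3), "ERKStepSetDenseOrder")) return 1;`, so that a failed setter stops the run instead of being silently ignored.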
int ERKStepSetErrFile(void* arkode_mem, FILE* errfp)
Specifies a pointer to the file where all ERKStep warning and error messages will be written if the default internal error handling function is used.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• errfp – pointer to the output file.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value for errfp is stderr.
Passing a NULL value disables all future error message output (except for the case wherein the ERKStep memory pointer is NULL). This use of the function is strongly discouraged.
If used, this routine should be called before any other optional input functions, in order to take effect for subsequent error messages.

int ERKStepSetErrHandlerFn(void* arkode_mem, ARKErrHandlerFn ehfun, void* eh_data)
Specifies the optional user-defined function to be used in handling error messages.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• ehfun – name of user-supplied error handler function.
• eh_data – pointer to user data passed to ehfun every time it is called.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Error messages indicating that the ERKStep solver memory is NULL will always be directed to stderr.

int ERKStepSetFixedStep(void* arkode_mem, realtype hfixed)
Disables time step adaptivity within ERKStep, and specifies the fixed time step size to use for all internal steps.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hfixed – value of the fixed step size to use.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass 0.0 to return ERKStep to the default (adaptive-step) mode.
Use of this function is not recommended, since it gives no assurance of the validity of the computed solutions. It is primarily provided for code-to-code verification testing purposes.
When using ERKStepSetFixedStep(), any values provided to the functions ERKStepSetInitStep(), ERKStepSetAdaptivityFn(), ERKStepSetMaxErrTestFails(), ERKStepSetAdaptivityMethod(), ERKStepSetCFLFraction(), ERKStepSetErrorBias(), ERKStepSetFixedStepBounds(), ERKStepSetMaxEFailGrowth(), ERKStepSetMaxFirstGrowth(), ERKStepSetMaxGrowth(), ERKStepSetSafetyFactor(), ERKStepSetSmallNumEFails() and ERKStepSetStabilityFn() will be ignored, since temporal adaptivity is disabled.
If both ERKStepSetFixedStep() and ERKStepSetStopTime() are used, then the fixed step size will be used for all steps until the final step preceding the provided stop time (which may be shorter). To resume use of the previous fixed step size, another call to ERKStepSetFixedStep() must be made prior to calling ERKStepEvolve() to resume integration.
It is not recommended that ERKStepSetFixedStep() be used in concert with ERKStepSetMaxStep() or ERKStepSetMinStep(), since at best those latter two routines will provide no useful information to the solver, and at worst they may interfere with the desired fixed step size.

int ERKStepSetInitStep(void* arkode_mem, realtype hin)
Specifies the initial time step size ERKStep should use after initialization or re-initialization.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hin – value of the initial step to be attempted (≠ 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass 0.0 to use the default value. By default, ERKStep estimates the initial step size to be the solution $h$ of the equation $\left\| \frac{h^2 \ddot{y}}{2} \right\| = 1$, where $\ddot{y}$ is an estimated value of the second derivative of the solution at t0.

int ERKStepSetMaxHnilWarns(void* arkode_mem, int mxhnil)
Specifies the maximum number of messages issued by the solver to warn that $t + h = t$ on the next internal step, before ERKStep will instead return with an error.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• mxhnil – maximum allowed number of warning messages (> 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 10; set mxhnil to zero to specify this default. A negative value indicates that no warning messages should be issued.

int ERKStepSetMaxNumSteps(void* arkode_mem, long int mxsteps)
Specifies the maximum number of steps to be taken by the solver in its attempt to reach the next output time, before ERKStep will return with an error.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• mxsteps – maximum allowed number of internal steps.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Passing mxsteps = 0 results in ERKStep using the default value (500). Passing mxsteps < 0 disables the test (not recommended).

int ERKStepSetMaxStep(void* arkode_mem, realtype hmax)
Specifies the upper bound on the magnitude of the time step size.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hmax – maximum absolute value of the time step size (≥ 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass hmax ≤ 0.0 to set the default value of $\infty$.

int ERKStepSetMinStep(void* arkode_mem, realtype hmin)
Specifies the lower bound on the magnitude of the time step size.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hmin – minimum absolute value of the time step size (≥ 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Pass hmin ≤ 0.0 to set the default value of 0.

int ERKStepSetStopTime(void* arkode_mem, realtype tstop)
Specifies the value of the independent variable $t$ past which the solution is not to proceed.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• tstop – stopping time for the integrator.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default is that no stop time is imposed.

int ERKStepSetUserData(void* arkode_mem, void* user_data)
Specifies the user data block user_data and attaches it to the main ERKStep memory block.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• user_data – pointer to the user data.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: If specified, the pointer to user_data is passed to all user-supplied functions for which it is an argument; otherwise NULL is passed.

int ERKStepSetMaxErrTestFails(void* arkode_mem, int maxnef)
Specifies the maximum number of error test failures permitted in attempting one step, before returning with an error.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• maxnef – maximum allowed number of error test failures (> 0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default value is 7; set maxnef ≤ 0 to specify this default.

Optional inputs for IVP method selection

Optional input                      Function name           Default
Set integrator method order         ERKStepSetOrder()       4
Set explicit RK table               ERKStepSetTable()       internal
Specify explicit RK table number    ERKStepSetTableNum()    internal

int ERKStepSetOrder(void* arkode_mem, int ord)
Specifies the order of accuracy for the ERK integration method.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• ord – requested order of accuracy.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The allowed values are 2 ≤ ord ≤ 8. Any illegal input will result in the default value of 4. Since ord affects the memory requirements for the internal ERKStep memory block, it cannot be changed after the first call to ERKStepEvolve(), unless ERKStepReInit() is called.

int ERKStepSetTable(void* arkode_mem, ARKodeButcherTable B)
Specifies a customized Butcher table for the ERK method.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• B – the Butcher table for the explicit RK method.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: For a description of the ARKodeButcherTable type and related functions for creating Butcher tables see Butcher Table Data Structure. No error checking is performed to ensure that either the method order q or the embedding order p specified in the Butcher table structure correctly describe the coefficients in the Butcher table.
Error checking is performed to ensure that the Butcher table is strictly lower-triangular (i.e. that it specifies an ERK method). If the Butcher table does not contain an embedding, the user must call ERKStepSetFixedStep() to enable fixed-step mode and set the desired time step size.

int ERKStepSetTableNum(void* arkode_mem, int etable)
Indicates to use a specific built-in Butcher table for the ERK method.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• etable – index of the Butcher table.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: etable should match an existing explicit method from the section Explicit Butcher tables. Error-checking is performed to ensure that the table exists, and is not implicit.

Optional inputs for time step adaptivity

The mathematical explanation of ARKode’s time step adaptivity algorithm, including how each of the parameters below is used within the code, is provided in the section Time step adaptivity.

Optional input                                    Function name                   Default
Set a custom time step adaptivity function        ERKStepSetAdaptivityFn()        internal
Choose an existing time step adaptivity method    ERKStepSetAdaptivityMethod()    0
Explicit stability safety factor                  ERKStepSetCFLFraction()         0.5
Time step error bias factor                       ERKStepSetErrorBias()           1.5
Bounds determining no change in step size         ERKStepSetFixedStepBounds()     1.0 1.5
Maximum step growth factor on error test fail     ERKStepSetMaxEFailGrowth()      0.3
Maximum first step growth factor                  ERKStepSetMaxFirstGrowth()      10000.0
Maximum general step growth factor                ERKStepSetMaxGrowth()           20.0
Time step safety factor                           ERKStepSetSafetyFactor()        0.96
Error fails before MaxEFailGrowth takes effect    ERKStepSetSmallNumEFails()      2
Explicit stability function                       ERKStepSetStabilityFn()         none

int ERKStepSetAdaptivityFn(void* arkode_mem, ARKAdaptFn hfun, void* h_data)
Sets a user-supplied time-step adaptivity function.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hfun – name of user-supplied adaptivity function.
• h_data – pointer to user data passed to hfun every time it is called.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This function should focus on accuracy-based time step estimation; for stability based time steps the function ERKStepSetStabilityFn() should be used instead.

int ERKStepSetAdaptivityMethod(void* arkode_mem, int imethod, int idefault, int pq, realtype* adapt_params)
Specifies the method (and associated parameters) used for time step adaptivity.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• imethod – accuracy-based adaptivity method choice (0 ≤ imethod ≤ 5): 0 is PID, 1 is PI, 2 is I, 3 is explicit Gustafsson, 4 is implicit Gustafsson, and 5 is the ImEx Gustafsson.
• idefault – flag denoting whether to use default adaptivity parameters (1), or that they will be supplied in the adapt_params argument (0).
• pq – flag denoting whether to use the embedding order of accuracy p (0) or the method order of accuracy q (1) within the adaptivity algorithm. p is the default.
• adapt_params[0] – $k_1$ parameter within accuracy-based adaptivity algorithms.
• adapt_params[1] – $k_2$ parameter within accuracy-based adaptivity algorithms.
• adapt_params[2] – $k_3$ parameter within accuracy-based adaptivity algorithms.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: If custom parameters are supplied, they will be checked for validity against published stability intervals. If other parameter values are desired, it is recommended to instead provide a custom function through a call to ERKStepSetAdaptivityFn().
int ERKStepSetCFLFraction(void* arkode_mem, realtype cfl_frac)
Specifies the fraction of the estimated explicitly stable step to use.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• cfl_frac – maximum allowed fraction of explicitly stable step (default is 0.5).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ERKStepSetErrorBias(void* arkode_mem, realtype bias)
Specifies the bias to be applied to the error estimates within accuracy-based adaptivity strategies.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• bias – bias applied to error in accuracy-based time step estimation (default is 1.5).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value below 1.0 will imply a reset to the default value.

int ERKStepSetFixedStepBounds(void* arkode_mem, realtype lb, realtype ub)
Specifies the step growth interval in which the step size will remain unchanged.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• lb – lower bound on window to leave step size fixed (default is 1.0).
• ub – upper bound on window to leave step size fixed (default is 1.5).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any interval not containing 1.0 will imply a reset to the default values.

int ERKStepSetMaxEFailGrowth(void* arkode_mem, realtype etamxf)
Specifies the maximum step size growth factor upon multiple successive accuracy-based error failures in the solver.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• etamxf – time step reduction factor on multiple error fails (default is 0.3).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value outside the interval (0, 1] will imply a reset to the default value.

int ERKStepSetMaxFirstGrowth(void* arkode_mem, realtype etamx1)
Specifies the maximum allowed step size change following the very first integration step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• etamx1 – maximum allowed growth factor after the first time step (default is 10000.0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value ≤ 1.0 will imply a reset to the default value.

int ERKStepSetMaxGrowth(void* arkode_mem, realtype mx_growth)
Specifies the maximum growth of the step size between consecutive steps in the integration process.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• mx_growth – maximum allowed growth factor between consecutive time steps (default is 20.0).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any value ≤ 1.0 will imply a reset to the default value.

int ERKStepSetSafetyFactor(void* arkode_mem, realtype safety)
Specifies the safety factor to be applied to the accuracy-based estimated step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• safety – safety factor applied to accuracy-based time step (default is 0.96).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.
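
The "reset to the default value" rules stated in the Notes for ERKStepSetCFLFraction() through ERKStepSetSafetyFactor() can be summarized in a small stand-alone sketch. The struct DemoAdaptParams and the function demo_normalize_adapt_params() are hypothetical names introduced only for illustration; they are not part of ARKode, and the sketch assumes that "an interval containing 1.0" means lb ≤ 1.0 ≤ ub. It only mirrors the documented rules and makes no claim about how ERKStep implements them internally.

```c
#include <assert.h>

/* Hypothetical container for the documented adaptivity parameters
   (illustration only; this is not a SUNDIALS type). */
typedef struct {
  double cfl_frac;  /* ERKStepSetCFLFraction()              */
  double bias;      /* ERKStepSetErrorBias()                */
  double lb, ub;    /* ERKStepSetFixedStepBounds()          */
  double etamxf;    /* ERKStepSetMaxEFailGrowth()           */
  double etamx1;    /* ERKStepSetMaxFirstGrowth()           */
  double growth;    /* ERKStepSetMaxGrowth()                */
  double safety;    /* ERKStepSetSafetyFactor()             */
} DemoAdaptParams;

/* Replace every value that the Notes describe as illegal by the
   documented default, leaving legal values untouched. */
static DemoAdaptParams demo_normalize_adapt_params(DemoAdaptParams in)
{
  DemoAdaptParams out = in;

  if (in.cfl_frac <= 0.0) out.cfl_frac = 0.5;       /* non-positive       */
  if (in.bias < 1.0)      out.bias     = 1.5;       /* below 1.0          */
  if (!(in.lb <= 1.0 && 1.0 <= in.ub)) {            /* 1.0 not in [lb,ub] */
    out.lb = 1.0;
    out.ub = 1.5;
  }
  if (!(in.etamxf > 0.0 && in.etamxf <= 1.0)) out.etamxf = 0.3; /* outside (0,1] */
  if (in.etamx1 <= 1.0)   out.etamx1   = 10000.0;   /* <= 1.0             */
  if (in.growth <= 1.0)   out.growth   = 20.0;      /* <= 1.0             */
  if (in.safety <= 0.0)   out.safety   = 0.96;      /* non-positive       */

  /* Postcondition: the fixed-step window always contains 1.0. */
  assert(out.lb <= 1.0 && 1.0 <= out.ub);
  return out;
}
```

Seeing the rules side by side makes it easier to spot an input that would be silently replaced by a default rather than rejected.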
int ERKStepSetSmallNumEFails(void* arkode_mem, int small_nef)
Specifies the threshold for “multiple” successive error failures before the etamxf parameter from ERKStepSetMaxEFailGrowth() is applied.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• small_nef – bound to determine ‘multiple’ for etamxf (default is 2).
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: Any non-positive parameter will imply a reset to the default value.

int ERKStepSetStabilityFn(void* arkode_mem, ARKExpStabFn EStab, void* estab_data)
Sets the problem-dependent function to estimate a stable time step size for the explicit portion of the ODE system.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• EStab – name of user-supplied stability function.
• estab_data – pointer to user data passed to EStab every time it is called.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: This function should return an estimate of the absolute value of the maximum stable time step for the ODE system. It is not required, since accuracy-based adaptivity may be sufficient for retaining stability, but this can be quite useful for problems where the right-hand side function $f(t, y)$ may contain stiff terms.

Rootfinding optional input functions

The following functions can be called to set optional inputs to control the rootfinding algorithm, the mathematics of which are described in the section Rootfinding.
Optional input                            Function name                     Default
Direction of zero-crossings to monitor    ERKStepSetRootDirection()         both
Disable inactive root warnings            ERKStepSetNoInactiveRootWarn()    enabled

int ERKStepSetRootDirection(void* arkode_mem, int* rootdir)
Specifies the direction of zero-crossings to be located and returned.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• rootdir – state array of length nrtfn, the number of root functions $g_i$ (the value of nrtfn was supplied in the call to ERKStepRootInit()). If rootdir[i] == 0 then crossing in either direction for $g_i$ should be reported. A value of +1 or -1 indicates that the solver should report only zero-crossings where $g_i$ is increasing or decreasing, respectively.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value
Notes: The default behavior is to monitor for both zero-crossing directions.

int ERKStepSetNoInactiveRootWarn(void* arkode_mem)
Disables issuing a warning if some root function appears to be identically zero at the beginning of the integration.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory is NULL
Notes: ERKStep will not report the initial conditions as a possible zero-crossing (assuming that one or more components $g_i$ are zero at the initial time). However, if it appears that some $g_i$ is identically zero at the initial time (i.e., $g_i$ is zero at the initial time and after the first step), ERKStep will issue a warning which can be disabled with this optional input function.

6.5.6 Interpolated output function

An optional function ERKStepGetDky() is available to obtain additional values of solution-related quantities.
This function should only be called after a successful return from ERKStepEvolve(), as it provides values of $y$ or of its derivatives (up to the 5th derivative) interpolated to any value of $t$ in the last internal step taken by ERKStepEvolve(). Internally, this dense output algorithm is identical to the algorithm used for the maximum order implicit predictors, described in the section Maximum order predictor, except that derivatives of the polynomial model may be evaluated upon request.

int ERKStepGetDky(void* arkode_mem, realtype t, int k, N_Vector dky)
Computes the k-th derivative of the function $y$ at the time t, i.e. $\frac{d^{(k)}}{dt^{(k)}} y(t)$, for values of the independent variable satisfying $t_n - h_n \le t \le t_n$, where $t_n$ is the current internal time reached and $h_n$ is the last internal step size successfully used by the solver. This routine uses an interpolating polynomial of degree max(dord, k), where dord is the argument provided to ERKStepSetDenseOrder(). The user may request k in the range {0,...,dord}.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• t – the value of the independent variable at which the derivative is to be evaluated.
• k – the derivative order requested.
• dky – output vector (must be allocated by the user).
Return value:
• ARK_SUCCESS if successful
• ARK_BAD_K if k is not in the range {0,...,dord}.
• ARK_BAD_T if t is not in the interval $[t_n - h_n, t_n]$
• ARK_BAD_DKY if the dky vector was NULL
• ARK_MEM_NULL if the ERKStep memory is NULL
Notes: It is only legal to call this function after a successful return from ERKStepEvolve(). A user may access the values $t_n$ and $h_n$ via the functions ERKStepGetCurrentTime() and ERKStepGetLastStep(), respectively.

6.5.7 Optional output functions

ERKStep provides an extensive set of functions that can be used to obtain solver performance information. We organize these into groups:
1. SUNDIALS version information accessor routines are in the subsection SUNDIALS version information,
2. General ERKStep output routines are in the subsection Main solver optional output functions,
3. Output routines regarding root-finding results are in the subsection Rootfinding optional output functions,
4. General usability routines (e.g. to print the current ERKStep parameters, or output the current Butcher table) are in the subsection General usability functions.
Following each table, we elaborate on each function.

Some of the optional outputs, especially the various counters, can be very useful in determining the efficiency of various methods inside ERKStep. For example:
• The counters nsteps and nf_evals provide a rough measure of the overall cost of a given run, and can be compared between runs with different solver options to suggest which set of options is the most efficient.
• The ratio nsteps/step_attempts can measure the quality of the time step adaptivity algorithm, since a poor algorithm will result in more failed steps, and hence a lower ratio.
It is therefore recommended that users retrieve and output these statistics following each run, and take some time to investigate alternate solver options that may be better suited to their particular problem of interest.

SUNDIALS version information

The following functions provide a way to get SUNDIALS version information at runtime.

int SUNDIALSGetVersion(char *version, int len)
This routine fills a string with SUNDIALS version information.
Arguments:
• version – character array to hold the SUNDIALS version information.
• len – allocated length of the version character array.
Return value:
• 0 if successful
• -1 if the input string is too short to store the SUNDIALS version
Notes: An array of 25 characters should be sufficient to hold the version information.
int SUNDIALSGetVersionNumber(int *major, int *minor, int *patch, char *label, int len)
This routine sets integers for the SUNDIALS major, minor, and patch release numbers and fills a string with the release label if applicable.
Arguments:
• major – SUNDIALS release major version number.
• minor – SUNDIALS release minor version number.
• patch – SUNDIALS release patch version number.
• label – string to hold the SUNDIALS release label.
• len – allocated length of the label character array.
Return value:
• 0 if successful
• -1 if the input string is too short to store the SUNDIALS label
Notes: An array of 10 characters should be sufficient to hold the label information. If a label is not used in the release version, no information is copied to label.

Main solver optional output functions

Optional output                                        Function name
Size of ERKStep real and integer workspaces            ERKStepGetWorkSpace()
Cumulative number of internal steps                    ERKStepGetNumSteps()
Actual initial time step size used                     ERKStepGetActualInitStep()
Step size used for the last successful step            ERKStepGetLastStep()
Step size to be attempted on the next step             ERKStepGetCurrentStep()
Current internal time reached by the solver            ERKStepGetCurrentTime()
Suggested factor for tolerance scaling                 ERKStepGetTolScaleFactor()
Error weight vector for state variables                ERKStepGetErrWeights()
Single accessor to many statistics at once             ERKStepGetStepStats()
Name of constant associated with a return flag         ERKStepGetReturnFlagName()
No. of explicit stability-limited steps                ERKStepGetNumExpSteps()
No. of accuracy-limited steps                          ERKStepGetNumAccSteps()
No. of attempted steps                                 ERKStepGetNumStepAttempts()
No. of calls to f function                             ERKStepGetNumRhsEvals()
No. of local error test failures that have occurred    ERKStepGetNumErrTestFails()
Current ERK Butcher table                              ERKStepGetCurrentButcherTable()
Estimated local truncation error vector                ERKStepGetEstLocalErrors()
Single accessor to many statistics at once             ERKStepGetTimestepperStats()

int ERKStepGetWorkSpace(void* arkode_mem, long int* lenrw, long int* leniw)
Returns the ERKStep real and integer workspace sizes.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• lenrw – the number of realtype values in the ERKStep workspace.
• leniw – the number of integer values in the ERKStep workspace.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetNumSteps(void* arkode_mem, long int* nsteps)
Returns the cumulative number of internal steps taken by the solver (so far).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• nsteps – number of steps taken in the solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetActualInitStep(void* arkode_mem, realtype* hinused)
Returns the value of the integration step size used on the first step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hinused – actual value of initial step size.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
Notes: Even if the value of the initial integration step was specified by the user through a call to ERKStepSetInitStep(), this value may have been changed by ERKStep to ensure that the step size fell within the prescribed bounds ($h_{min} \le h_0 \le h_{max}$), or to satisfy the local error test condition.

int ERKStepGetLastStep(void* arkode_mem, realtype* hlast)
Returns the integration step size taken on the last successful internal step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hlast – step size taken on the last internal step.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetCurrentStep(void* arkode_mem, realtype* hcur)
Returns the integration step size to be attempted on the next internal step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• hcur – step size to be attempted on the next internal step.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetCurrentTime(void* arkode_mem, realtype* tcur)
Returns the current internal time reached by the solver.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• tcur – current internal time reached.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetTolScaleFactor(void* arkode_mem, realtype* tolsfac)
Returns a suggested factor by which the user’s tolerances should be scaled when too much accuracy has been requested for some internal step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• tolsfac – suggested scaling factor for user-supplied tolerances.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetErrWeights(void* arkode_mem, N_Vector eweight)
Returns the current error weight vector.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• eweight – solution error weights at the current time.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
Notes: The user must allocate space for eweight, that will be filled in by this function.

int ERKStepGetStepStats(void* arkode_mem, long int* nsteps, realtype* hinused, realtype* hlast, realtype* hcur, realtype* tcur)
Returns many of the most useful optional outputs in a single call.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• nsteps – number of steps taken in the solver.
• hinused – actual value of initial step size.
• hlast – step size taken on the last internal step.
• hcur – step size to be attempted on the next internal step.
• tcur – current internal time reached.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

char *ERKStepGetReturnFlagName(long int flag)
Returns the name of the ERKStep constant corresponding to flag.
Arguments:
• flag – a return flag from an ERKStep function.
Return value: The return value is a string containing the name of the corresponding constant.

int ERKStepGetNumExpSteps(void* arkode_mem, long int* expsteps)
Returns the cumulative number of stability-limited steps taken by the solver (so far).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• expsteps – number of stability-limited steps taken in the solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetNumAccSteps(void* arkode_mem, long int* accsteps)
Returns the cumulative number of accuracy-limited steps taken by the solver (so far).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• accsteps – number of accuracy-limited steps taken in the solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetNumStepAttempts(void* arkode_mem, long int* step_attempts)
Returns the cumulative number of steps attempted by the solver (so far).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• step_attempts – number of steps attempted by solver.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetNumRhsEvals(void* arkode_mem, long int* nf_evals)
Returns the number of calls to the user’s right-hand side function, $f$ (so far).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• nf_evals – number of calls to the user’s $f(t, y)$ function.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetNumErrTestFails(void* arkode_mem, long int* netfails)
Returns the number of local error test failures that have occurred (so far).
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• netfails – number of error test failures.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

int ERKStepGetCurrentButcherTable(void* arkode_mem, ARKodeButcherTable *B)
Returns the Butcher table currently in use by the solver.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• B – pointer to Butcher table structure.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
Notes: The ARKodeButcherTable data structure is defined as a pointer to the following C structure:

    typedef struct ARKodeButcherTableMem {
      int q;           /* method order of accuracy    */
      int p;           /* embedding order of accuracy */
      int stages;      /* number of stages            */
      realtype **A;    /* Butcher table coefficients  */
      realtype *c;     /* canopy node coefficients    */
      realtype *b;     /* root node coefficients      */
      realtype *d;     /* embedding coefficients      */
    } *ARKodeButcherTable;

For more details see Butcher Table Data Structure.

int ERKStepGetEstLocalErrors(void* arkode_mem, N_Vector ele)
Returns the vector of estimated local truncation errors for the current step.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• ele – vector of estimated local truncation errors.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
Notes: The user must allocate space for ele, that will be filled in by this function. The values returned in ele are valid only after a successful call to ERKStepEvolve() (i.e. it returned a non-negative value). The ele vector, together with the eweight vector from ERKStepGetErrWeights(), can be used to determine how the various components of the system contributed to the estimated local error test. Specifically, that error test uses the WRMS norm of a vector whose components are the products of the components of these two vectors. Thus, for example, if there were recent error test failures, the components causing the failures are those with largest values for the products, denoted loosely as eweight[i]*ele[i].

int ERKStepGetTimestepperStats(void* arkode_mem, long int* expsteps, long int* accsteps, long int* step_attempts, long int* nf_evals, long int* netfails)
Returns many of the most useful time-stepper statistics in a single call.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• expsteps – number of stability-limited steps taken in the solver.
• accsteps – number of accuracy-limited steps taken in the solver.
• step_attempts – number of steps attempted by the solver.
• nf_evals – number of calls to the user’s $f(t, y)$ function.
• netfails – number of error test failures.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

Rootfinding optional output functions

Optional output                      Function name
Array showing roots found            ERKStepGetRootInfo()
No. of calls to user root function   ERKStepGetNumGEvals()

int ERKStepGetRootInfo(void* arkode_mem, int* rootsfound)
Returns an array showing which functions were found to have a root.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• rootsfound – array of length nrtfn with the indices of the user functions $g_i$ found to have a root (the value of nrtfn was supplied in the call to ERKStepRootInit()). For $i = 0, \ldots,$ nrtfn-1, rootsfound[i] is nonzero if $g_i$ has a root, and 0 if not.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
Notes: The user must allocate space for rootsfound prior to calling this function. For the components of $g_i$ for which a root was found, the sign of rootsfound[i] indicates the direction of zero-crossing. A value of +1 indicates that $g_i$ is increasing, while a value of -1 indicates a decreasing $g_i$.

int ERKStepGetNumGEvals(void* arkode_mem, long int* ngevals)
Returns the cumulative number of calls made to the user’s root function $g$.
Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• ngevals – number of calls made to $g$ so far.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

General usability functions

The following optional routines may be called by a user to inquire about existing solver parameters, to retrieve stored Butcher tables, write the current Butcher table, or even to test a provided Butcher table to determine its analytical order of accuracy. While none of these would typically be called during the course of solving an initial value problem, these may be useful for users wishing to better understand ERKStep and/or specific Runge-Kutta methods.

Optional routine                         Function name
Output all ERKStep solver parameters     ERKStepWriteParameters()
Output the current Butcher table         ERKStepWriteButcher()

int ERKStepWriteParameters(void* arkode_mem, FILE *fp)

Outputs all ERKStep solver parameters to the provided file pointer.

Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• fp – pointer to use for printing the solver parameters.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

Notes: The fp argument can be stdout or stderr, or it may point to a specific file created using fopen. When run in parallel, only one process should set a non-NULL value for this pointer, since parameters for all processes would be identical.

int ERKStepWriteButcher(void* arkode_mem, FILE *fp)

Outputs the current Butcher table to the provided file pointer.

Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• fp – pointer to use for printing the Butcher table.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL

Notes: The fp argument can be stdout or stderr, or it may point to a specific file created using fopen. When run in parallel, only one process should set a non-NULL value for this pointer, since tables for all processes would be identical.
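The error-test diagnosis described in the notes for ERKStepGetEstLocalErrors() can be sketched as follows. This is a minimal illustration, not part of ARKode: the function name largest_error_component is hypothetical, realtype is a local stand-in that assumes a double-precision build, and the two arrays are assumed to be the data of the vectors filled by ERKStepGetErrWeights() and ERKStepGetEstLocalErrors() (for example, obtained through the vector's array-pointer accessor in a serial run).

```c
#include <stddef.h>

/* Stand-in for SUNDIALS' realtype; assumes a double-precision build. */
typedef double realtype;

/* Hypothetical helper: returns the index i that maximizes
 * |eweight[i] * ele[i]|, i.e. the component that contributes most
 * to the WRMS norm used in the local error test.
 * Assumes n > 0 and that both arrays hold n entries. */
size_t largest_error_component(const realtype *eweight,
                               const realtype *ele, size_t n)
{
  size_t imax = 0;
  realtype vmax = 0.0;
  for (size_t i = 0; i < n; i++) {
    realtype v = eweight[i] * ele[i];
    if (v < 0.0) v = -v;              /* absolute value of the product */
    if (v > vmax) { vmax = v; imax = i; }
  }
  return imax;
}
```

After a failed or expensive step, calling this helper on the two arrays points to the solution component whose tolerance or dynamics is driving the step size.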
6.5.8 ERKStep re-initialization functions

To reinitialize the ERKStep module for the solution of a new problem, where a prior call to ERKStepCreate() has been made, the user must call the function ERKStepReInit(). The new problem must have the same size as the previous one. This routine performs the same input checking and initializations that are done in ERKStepCreate(), but it performs no memory allocation, as it assumes that the existing internal memory is sufficient for the new problem. A call to this re-initialization routine deletes the solution history that was stored internally during the previous integration. Following a successful call to ERKStepReInit(), call ERKStepEvolve() again for the solution of the new problem.

The use of ERKStepReInit() requires that the number of Runge-Kutta stages, denoted by s, be no larger for the new problem than for the previous problem. This condition is automatically fulfilled if the method order q and the problem type (explicit, implicit, ImEx) are left unchanged.

One important use of the ERKStepReInit() function is in the treatment of jump discontinuities in the RHS function. Except in cases of fairly small jumps, it is usually more efficient to stop at each point of discontinuity and restart the integrator with a readjusted ODE model, using a call to this routine. To stop when the location of the discontinuity is known, simply make that location a value of tout. To stop when the location of the discontinuity is determined by the solution, use the rootfinding feature. In either case, it is critical that the RHS function not incorporate the discontinuity, but rather have a smooth extension over the discontinuity, so that the step across it (and subsequent rootfinding, if used) can be done efficiently. Then use a switch within the RHS function (communicated through user_data) that can be flipped between the stopping of the integration and the restart, so that the restarted problem uses the new values (which have jumped).
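The switch described above can be sketched as follows. This is a minimal illustration under stated assumptions: the JumpData structure and the function decay_rhs are hypothetical, realtype is a local stand-in that assumes double precision, and plain arrays with an explicit length replace the N_Vector arguments of a real ARKRhsFn so that the sketch compiles without SUNDIALS.

```c
#include <stddef.h>

/* Stand-in for SUNDIALS' realtype; assumes a double-precision build. */
typedef double realtype;

/* Hypothetical user data: 'phase' is the switch flipped by the caller
 * between stopping the integration and restarting it. */
typedef struct {
  int phase;          /* 0 before the discontinuity, 1 after it   */
  realtype rate[2];   /* decay rate of the smooth model per phase */
} JumpData;

/* Right-hand side y' = -rate[phase] * y on plain arrays (a stand-in
 * for the ARKRhsFn signature). The function never inspects t or y to
 * decide which model to use, so each model is smooth across the jump. */
int decay_rhs(realtype t, const realtype *y, realtype *ydot,
              size_t n, void *user_data)
{
  const JumpData *d = (const JumpData *)user_data;
  (void)t;  /* unused: the model has no explicit time dependence */
  for (size_t i = 0; i < n; i++)
    ydot[i] = -d->rate[d->phase] * y[i];
  return 0;
}
```

In an application, the caller would set phase to 1 once ERKStepEvolve() has returned at the discontinuity, and only then call ERKStepReInit() with the current time and solution.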
Similar comments apply if there is to be a jump in the dependent variable vector.

int ERKStepReInit(void* arkode_mem, ARKRhsFn f, realtype t0, N_Vector y0)

Provides required problem specifications and re-initializes the ERKStep time-stepper module.

Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• f – the name of the C function (of type ARKRhsFn()) defining the right-hand side function in $\dot{y} = f(t, y)$.
• t0 – the initial value of $t$.
• y0 – the initial condition vector $y(t_0)$.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
• ARK_MEM_FAIL if a memory allocation failed
• ARK_ILL_INPUT if an argument has an illegal value.

Notes: If an error occurred, ERKStepReInit() also sends an error message to the error handler function.

6.5.9 ERKStep system resize function

For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when using spatially-adaptive PDE simulations under a method-of-lines approach), the ERKStep integrator may be “resized” between integration steps, through calls to the ERKStepResize() function. This function modifies ERKStep’s internal memory structures to use the new problem size, without destruction of the temporal adaptivity heuristics. It is assumed that the dynamical time scales before and after the vector resize will be comparable, so that all timestepping heuristics prior to calling ERKStepResize() remain valid after the call. If instead the dynamics should be recomputed from scratch, the ERKStep memory structure should be deleted with a call to ERKStepFree(), and recreated with a call to ERKStepCreate().

To aid in the vector resize operation, the user can supply a vector resize function that will take as input a vector with the previous size, and transform it in-place to return a corresponding vector of the new size.
If this function (of type ARKVecResizeFn()) is not supplied (i.e. is set to NULL), then all existing vectors internal to ERKStep will be destroyed and re-cloned from the new input vector.

In the case that the dynamical time scale should be modified slightly from the previous time scale, an input hscale is allowed, which will rescale the upcoming time step by the specified factor. If a value hscale ≤ 0 is specified, the default of 1.0 will be used.

int ERKStepResize(void* arkode_mem, N_Vector ynew, realtype hscale, realtype t0, ARKVecResizeFn resize, void* resize_data)

Re-initializes ERKStep with a different state vector but with comparable dynamical time scale.

Arguments:
• arkode_mem – pointer to the ERKStep memory block.
• ynew – the newly-sized solution vector, holding the current dependent variable values $y(t_0)$.
• hscale – the desired scaling factor for the dynamical time scale (i.e. the next step will be of size h*hscale).
• t0 – the current value of the independent variable $t_0$ (this must be consistent with ynew).
• resize – the user-supplied vector resize function (of type ARKVecResizeFn()).
• resize_data – the user-supplied data structure to be passed to resize when modifying internal ERKStep vectors.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the ERKStep memory was NULL
• ARK_NO_MALLOC if arkode_mem was not allocated.
• ARK_ILL_INPUT if an argument has an illegal value.

Notes: If an error occurred, ERKStepResize() also sends an error message to the error handler function.

Resizing the absolute tolerance array

If using array-valued absolute tolerances, the absolute tolerance vector will be invalid after the call to ERKStepResize(), so the new absolute tolerance vector should be re-set following each call to ERKStepResize() through a new call to ERKStepSVtolerances().
If scalar-valued tolerances or a tolerance function was specified through either ERKStepSStolerances() or ERKStepWFtolerances(), then these will remain valid and no further action is necessary.

Note: For an example showing usage of the similar ARKStepResize() routine, see the supplied serial C example problem, ark_heat1D_adapt.c.

6.6 User-supplied functions

The user-supplied functions for ERKStep consist of:
• a function that defines the ODE (required),
• a function that handles error and warning messages (optional),
• a function that provides the error weight vector (optional),
• a function that handles adaptive time step error control (optional),
• a function that handles explicit time step stability (optional),
• a function that defines the root-finding problem(s) to solve (optional),
• a function that handles vector resizing operations, if the underlying vector structure supports resizing (as opposed to deletion/recreation), and if the user plans to call ERKStepResize() (optional).

6.6.1 ODE right-hand side

The user must supply a function of type ARKRhsFn to specify the right-hand side of the ODE system:

typedef int (*ARKRhsFn)(realtype t, N_Vector y, N_Vector ydot, void* user_data)

This function computes the ODE right-hand side for a given value of the independent variable $t$ and state vector $y$.

Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• ydot – the output vector that forms the ODE RHS $f(t, y)$.
• user_data – the user_data pointer that was passed to ERKStepSetUserData().

Return value: An ARKRhsFn should return 0 if successful, a positive value if a recoverable error occurred (in which case ERKStep will attempt to correct), or a negative value if it failed unrecoverably (in which case the integration is halted and ARK_RHSFUNC_FAIL is returned).
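The return-value convention above can be sketched as follows. This is a minimal illustration, not ARKode code: the function name decay_rhs_checked and the model $\dot{y} = -2y$ are hypothetical, realtype is a local stand-in that assumes double precision, and plain arrays with an explicit length replace the N_Vector arguments of a real ARKRhsFn so that the sketch compiles without SUNDIALS.

```c
#include <stddef.h>

/* Stand-in for SUNDIALS' realtype; assumes a double-precision build. */
typedef double realtype;

/* Hypothetical right-hand side y' = -2*y for a quantity that must stay
 * non-negative (e.g. a concentration), written on plain arrays.
 * Returns 0 on success, a positive value for a recoverable problem,
 * and a negative value for an unrecoverable one. */
int decay_rhs_checked(realtype t, const realtype *y, realtype *ydot,
                      size_t n, void *user_data)
{
  (void)t;
  (void)user_data;
  for (size_t i = 0; i < n; i++) {
    if (y[i] != y[i]) return -1;  /* true only for NaN: unrecoverable    */
    if (y[i] < 0.0)   return  1;  /* "illegal" negative value: recoverable */
    ydot[i] = -2.0 * y[i];
  }
  return 0;
}
```

With a positive return the integrator may retry with a smaller step, whereas the negative return is reserved for states from which no step size reduction could help.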
Notes: Allocation of memory for ydot is handled within the ERKStep module. A recoverable failure error return from the ARKRhsFn is typically used to flag a value of the dependent variable $y$ that is “illegal” in some way (e.g., negative where only a non-negative value is physically meaningful). If such a return is made, ERKStep will attempt to recover by reducing the step size in order to avoid this recoverable error return. There are some situations in which recovery is not possible even if the right-hand side function returns a recoverable error flag. One is when this occurs at the very first call to the ARKRhsFn (in which case ERKStep returns ARK_FIRST_RHSFUNC_ERR).

6.6.2 Error message handler function

As an alternative to the default behavior of directing error and warning messages to the file pointed to by errfp (see ERKStepSetErrFile()), the user may provide a function of type ARKErrHandlerFn to process any such messages.

typedef void (*ARKErrHandlerFn)(int error_code, const char* module, const char* function, char* msg, void* user_data)

This function processes error and warning messages from ERKStep and its sub-modules.

Arguments:
• error_code – the error code.
• module – the name of the ERKStep module reporting the error.
• function – the name of the function in which the error occurred.
• msg – the error message.
• user_data – a pointer to user data, the same as the eh_data parameter that was passed to ERKStepSetErrHandlerFn().

Return value: An ARKErrHandlerFn function has no return value.

Notes: error_code is negative for errors and positive (ARK_WARNING) for warnings. If a function that returns a pointer to memory encounters an error, it sets error_code to 0.

6.6.3 Error weight function

As an alternative to providing the relative and absolute tolerances, the user may provide a function of type ARKEwtFn to compute a vector ewt containing the weights in the WRMS norm

$\|v\|_{WRMS} = \left( \frac{1}{n} \sum_{i=1}^{n} (ewt_i \, v_i)^2 \right)^{1/2}.$

These weights will be used in place of those defined in the section Error norms.

typedef int (*ARKEwtFn)(N_Vector y, N_Vector ewt, void* user_data)

This function computes the WRMS error weights for the vector $y$.

Arguments:
• y – the dependent variable vector at which the weight vector is to be computed.
• ewt – the output vector containing the error weights.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ERKStepSetUserData().

Return value: An ARKEwtFn function must return 0 if it successfully set the error weights, and -1 otherwise.

Notes: Allocation of memory for ewt is handled within ERKStep. The error weight vector must have all components positive. It is the user’s responsibility to perform this test and return -1 if it is not satisfied.

6.6.4 Time step adaptivity function

As an alternative to using one of the built-in time step adaptivity methods for controlling solution error, the user may provide a function of type ARKAdaptFn to compute a target step size $h$ for the next integration step. These steps should be chosen as the maximum value such that the error estimates remain below 1.

typedef int (*ARKAdaptFn)(N_Vector y, realtype t, realtype h1, realtype h2, realtype h3, realtype e1, realtype e2, realtype e3, int q, int p, realtype* hnew, void* user_data)

This function implements a time step adaptivity algorithm that chooses $h$ satisfying the error tolerances.

Arguments:
• y – the current value of the dependent variable vector.
• t – the current value of the independent variable.
• h1 – the current step size, $t_n - t_{n-1}$.
• h2 – the previous step size, $t_{n-1} - t_{n-2}$.
• h3 – the step size $t_{n-2} - t_{n-3}$.
• e1 – the error estimate from the current step, $n$.
• e2 – the error estimate from the previous step, $n - 1$.
• e3 – the error estimate from the step $n - 2$.
• q – the global order of accuracy for the method.
• p – the global order of accuracy for the embedded method.
• hnew – the output value of the next step size.
• user_data – a pointer to user data, the same as the h_data parameter that was passed to ERKStepSetAdaptivityFn().

Return value: An ARKAdaptFn function should return 0 if it successfully set the next step size, and a non-zero value otherwise.

6.6.5 Explicit stability function

A user may supply a function to predict the maximum stable step size for the explicit Runge-Kutta method on this problem. While the accuracy-based time step adaptivity algorithms may be sufficient for retaining a stable solution to the ODE system, these may be inefficient if $f(t, y)$ contains moderately stiff terms. In this scenario, a user may provide a function of type ARKExpStabFn to provide this stability information to ERKStep. This function must set the scalar step size satisfying the stability restriction for the upcoming time step. This value will subsequently be bounded by the user-supplied values for the minimum and maximum allowed time step, and the accuracy-based time step.

typedef int (*ARKExpStabFn)(N_Vector y, realtype t, realtype* hstab, void* user_data)

This function predicts the maximum stable step size for the ODE system.

Arguments:
• y – the current value of the dependent variable vector.
• t – the current value of the independent variable.
• hstab – the output value with the absolute value of the maximum stable step size.
• user_data – a pointer to user data, the same as the estab_data parameter that was passed to ERKStepSetStabilityFn().

Return value: An ARKExpStabFn function should return 0 if it successfully set the upcoming stable step size, and a non-zero value otherwise.
Notes: If this function is not supplied, or if it returns hstab ≤ 0.0, then ERKStep will assume that there is no explicit stability restriction on the time step size.

6.6.6 Rootfinding function

If a rootfinding problem is to be solved during the integration of the ODE system, the user must supply a function of type ARKRootFn.

typedef int (*ARKRootFn)(realtype t, N_Vector y, realtype* gout, void* user_data)

This function implements a vector-valued function $g(t, y)$ such that the roots of the nrtfn components $g_i(t, y)$ are sought.

Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• gout – the output array, of length nrtfn, with components $g_i(t, y)$.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to ERKStepSetUserData().

Return value: An ARKRootFn function should return 0 if successful or a non-zero value if an error occurred (in which case the integration is halted and ERKStep returns ARK_RTFUNC_FAIL).

Notes: Allocation of memory for gout is handled within ERKStep.

6.6.7 Vector resize function

For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when using spatial adaptivity in a PDE simulation), the ERKStep integrator may be “resized” between integration steps, through calls to the ERKStepResize() function. Typically, when performing adaptive simulations the solution is stored in a customized user-supplied data structure, to enable adaptivity without repeated allocation/deallocation of memory. In these scenarios, it is recommended that the user supply a customized vector kernel to interface between SUNDIALS and their problem-specific data structure.
If this vector kernel includes a function of type ARKVecResizeFn to resize a given vector implementation, then this function may be supplied to ERKStepResize() so that all internal ERKStep vectors may be resized, instead of deleting and re-creating them at each call. This resize function should have the following form:

typedef int (*ARKVecResizeFn)(N_Vector y, N_Vector ytemplate, void* user_data)

This function resizes the vector y to match the dimensions of the supplied vector, ytemplate.

Arguments:
• y – the vector to resize.
• ytemplate – a vector of the desired size.
• user_data – a pointer to user data, the same as the resize_data parameter that was passed to ERKStepResize().

Return value: An ARKVecResizeFn function should return 0 if it successfully resizes the vector y, and a nonzero value otherwise.

Notes: If this function is not supplied, then ERKStep will instead destroy the vector y and clone a new vector y off of ytemplate.

CHAPTER SEVEN
USING MRISTEP FOR C AND C++ APPLICATIONS

This chapter is concerned with the use of the MRIStep time-stepping module for the solution of two-rate initial value problems (IVPs) in a C or C++ language setting. The following sections discuss the header files and the layout of the user’s main program, and provide descriptions of the MRIStep user-callable functions and user-supplied functions.

The example programs described in the companion document [R2018] may be helpful. Those codes may be used as templates for new codes and are included in the ARKode package examples subdirectory.

MRIStep uses the input and output constants from the shared ARKode infrastructure. These are defined as needed in this chapter, but for convenience the full list is provided separately in the section Appendix: ARKode Constants.

The relevant information on using MRIStep’s C and C++ interfaces is detailed in the following sub-sections.
7.1 Access to library and header files

At this point, it is assumed that the installation of ARKode, following the procedure described in the section ARKode Installation Procedure, has been completed successfully.

Regardless of where the user’s application program resides, its associated compilation and load commands must make reference to the appropriate locations for the library and header files required by ARKode. The relevant library files are
• libdir/libsundials_arkode.lib,
• libdir/libsundials_nvec*.lib,
where the file extension .lib is typically .so for shared libraries and .a for static libraries. The relevant header files are located in the subdirectories
• incdir/include/arkode
• incdir/include/sundials
• incdir/include/nvector

The directories libdir and incdir are the installation library and include directories, respectively. For a default installation, these are instdir/lib and instdir/include, respectively, where instdir is the directory where SUNDIALS was installed (see the section ARKode Installation Procedure for further details).

7.2 Data Types

The sundials_types.h file contains the definition of the variable type realtype, which is used by the SUNDIALS solvers for all floating-point data, the definition of the integer type sunindextype, which is used for vector and matrix indices, and booleantype, which is used for certain logic operations within SUNDIALS.

7.2.1 Floating point types

The type realtype can be set to float, double, or long double, depending on how SUNDIALS was installed (with the default being double). The user can change the precision of the SUNDIALS solvers’ floating-point arithmetic at the configuration stage (see the section Configuration options (Unix/Linux)).
Additionally, based on the current precision, sundials_types.h defines the values BIG_REAL to be the largest value representable as a realtype, SMALL_REAL to be the smallest positive value representable as a realtype, and UNIT_ROUNDOFF to be the smallest realtype number, $\varepsilon$, such that $1.0 + \varepsilon \ne 1.0$.

Within SUNDIALS, real constants may be set to have the appropriate precision by way of a macro called RCONST. It is this macro that needs the ability to branch on the definition of realtype. In ANSI C, a floating-point constant with no suffix is stored as a double. Placing the suffix “F” at the end of a floating-point constant makes it a float, whereas using the suffix “L” makes it a long double. For example,

#define A 1.0
#define B 1.0F
#define C 1.0L

defines A to be a double constant equal to 1.0, B to be a float constant equal to 1.0, and C to be a long double constant equal to 1.0. The macro call RCONST(1.0) automatically expands to 1.0 if realtype is double, to 1.0F if realtype is float, or to 1.0L if realtype is long double. SUNDIALS uses the RCONST macro internally to declare all of its floating-point constants.

A user program which uses the type realtype and the RCONST macro to handle floating-point constants is precision-independent, except for any calls to precision-specific standard math library functions. Users can, however, use the types double, float, or long double in their code (assuming that this usage is consistent with the size of realtype values that are passed to and from SUNDIALS). Thus, a previously existing piece of ANSI C code can use SUNDIALS without modifying the code to use realtype, so long as the SUNDIALS libraries have been compiled using the same precision (for details see the section ARKode Installation Procedure).

7.2.2 Integer types used for vector and matrix indices

The type sunindextype can be either a 32- or 64-bit signed integer. The default is the portable int64_t type, and the user can change it to int32_t at the configuration stage.
The configuration system will detect if the compiler does not support portable types, and will replace int32_t and int64_t with int and long int, respectively, to ensure use of the desired sizes on Linux, Mac OS X, and Windows platforms. SUNDIALS currently does not support unsigned integer types for vector and matrix indices, although these could be added in the future if there is sufficient demand.

A user program which uses sunindextype to handle vector and matrix indices will work with both index storage types except for any calls to index storage-specific external libraries. (Our C and C++ example programs use sunindextype.) Users can, however, use any one of int, long int, int32_t, int64_t or long long int in their code, assuming that this usage is consistent with the typedef for sunindextype on their architecture. Thus, a previously existing piece of ANSI C code can use SUNDIALS without modifying the code to use sunindextype, so long as the SUNDIALS libraries use the appropriate index storage type (for details see the section ARKode Installation Procedure).

7.3 Header Files

When using MRIStep, the calling program must include several header files so that various macros and data types can be used. The header file that is always required is:
• arkode/arkode_mristep.h, the main header file for the MRIStep time-stepping module, which defines several types and various constants, includes function prototypes, and includes the shared arkode/arkode.h header file.

Note that arkode.h includes sundials_types.h directly, which defines the types realtype, sunindextype, and booleantype and the constants SUNFALSE and SUNTRUE, so a user program does not need to include sundials_types.h directly.
Additionally, the calling program must also include an NVECTOR implementation header file, of the form nvector/nvector_***.h, corresponding to the user’s preferred data layout and form of parallelism. See the section Vector Data Structures for the appropriate name. This file in turn includes the header file sundials_nvector.h, which defines the abstract N_Vector data type.

7.4 A skeleton of the user’s main program

The following is a skeleton of the user’s main program (or calling program) for the integration of an ODE IVP using the MRIStep module. Most of the steps are independent of the NVECTOR implementation used. For the steps that are not, refer to the section Vector Data Structures for the specific name of the function to be called or macro to be referenced.

1. Initialize parallel or multi-threaded environment, if appropriate.
   For example, call MPI_Init to initialize MPI if used, or set num_threads, the number of threads to use within the threaded vector functions, if used.

2. Set problem dimensions, etc.
   This generally includes the problem size, N, and may include the local vector length Nlocal.
   Note: The variables N and Nlocal should be of type sunindextype.

3. Set vector of initial values
   To set the vector y0 of initial values, use the appropriate functions defined by the particular NVECTOR implementation.
   For native SUNDIALS vector implementations (except the CUDA and RAJA based ones), use a call of the form
      y0 = N_VMake_***(..., ydata);
   if the realtype array ydata containing the initial values of $y$ already exists. Otherwise, create a new vector by making a call of the form
      y0 = N_VNew_***(...);
   and then set its elements by accessing the underlying data where it is located with a call of the form
      ydata = N_VGetArrayPointer_***(y0);
   See the sections The NVECTOR_SERIAL Module through The NVECTOR_PTHREADS Module for details.
   For the HYPRE and PETSc vector wrappers, first create and initialize the underlying vector, and then create the NVECTOR wrapper with a call of the form
      y0 = N_VMake_***(yvec);
   where yvec is a HYPRE or PETSc vector. Note that calls like N_VNew_***(...) and N_VGetArrayPointer_***(...) are not available for these vector wrappers. See the sections The NVECTOR_PARHYP Module and The NVECTOR_PETSC Module for details.
   If using either the CUDA- or RAJA-based vector implementations, use a call of the form
      y0 = N_VMake_***(..., c);
   where c is a pointer to a suncudavec or sunrajavec vector class if this class already exists. Otherwise, create a new vector by making a call of the form
      y0 = N_VNew_***(...);
   and then set its elements by accessing the underlying data where it is located with a call of the form
      N_VGetDeviceArrayPointer_*** or N_VGetHostArrayPointer_***
   Note that the vector class will allocate memory on both the host and device when instantiated. See the sections The NVECTOR_CUDA Module and The NVECTOR_RAJA Module for details.

4. Create MRIStep object
   Call arkode_mem = MRIStepCreate(...) to create the MRIStep memory block. MRIStepCreate() returns a void* pointer to this memory structure. See the section MRIStep initialization and deallocation functions for details.

5. Set the slow and fast step sizes
   Call MRIStepSetFixedStep() to specify the slow and fast time step sizes.

6. Set optional inputs
   Call MRIStepSet* functions to change any optional inputs that control the behavior of MRIStep from their default values. See the section Optional input functions for details.

7. Specify rootfinding problem
   Optionally, call MRIStepRootInit() to initialize a rootfinding problem to be solved during the integration of the ODE system. See the section Rootfinding initialization function for general details, and the section Optional input functions for relevant optional input calls.

8. Advance solution in time
   For each point at which output is desired, call
      ier = MRIStepEvolve(arkode_mem, tout, yout, &tret, itask);
   Here, itask specifies the return mode. The vector yout (which can be the same as the vector y0 above) will contain $y(t_\text{out})$. See the section MRIStep solver function for details.

9. Get optional outputs
   Call MRIStepGet* functions to obtain optional output. See the section Optional output functions for details.

10. Deallocate memory for solution vector
   Upon completion of the integration, deallocate memory for the vector y (or yout) by calling the NVECTOR destructor function:
      N_VDestroy(y);

11. Free solver memory
   Call MRIStepFree(&arkode_mem) to free the memory allocated for the MRIStep module.

12. Finalize MPI, if used
   Call MPI_Finalize to terminate MPI.

7.5 MRIStep User-callable functions

This section describes the functions that are called by the user to set up and then solve an IVP using the MRIStep time-stepping module. Some of these are required; however, starting with the section Optional input functions, the functions listed involve optional inputs/outputs or restarting, and those paragraphs may be skipped for a casual use of ARKode’s MRIStep module. In any case, refer to the preceding section, A skeleton of the user’s main program, for the correct order of these calls.

On an error, each user-callable function returns a negative value (or NULL if the function returns a pointer) and sends an error message to the error handler routine, which prints the message to stderr by default. However, the user can set a file as error output or can provide their own error handler function (see the section Optional input functions for details).
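The error-return convention above (a negative integer, or NULL from a function that returns a pointer) lends itself to two small checking helpers in the calling program. The following is a minimal sketch; the names check_ptr and check_retval are hypothetical and are not part of ARKode.

```c
#include <stdio.h>

/* Hypothetical helper for functions that return a pointer, such as
 * MRIStepCreate(): reports and returns 1 if the pointer is NULL. */
int check_ptr(const void *ptr, const char *funcname)
{
  if (ptr == NULL) {
    fprintf(stderr, "ERROR: %s() returned a NULL pointer\n", funcname);
    return 1;
  }
  return 0;
}

/* Hypothetical helper for functions that return an int flag: only
 * negative values are treated as errors, so zero and positive return
 * values (e.g. a root or stop-time return) pass through. */
int check_retval(int retval, const char *funcname)
{
  if (retval < 0) {
    fprintf(stderr, "ERROR: %s() failed with return value %d\n",
            funcname, retval);
    return 1;
  }
  return 0;
}
```

A calling program would typically wrap each step of the skeleton, for instance testing check_ptr(arkode_mem, "MRIStepCreate") right after creation and check_retval(ier, "MRIStepEvolve") after each evolve call, and exit or clean up when either returns 1.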
7.5.1 MRIStep initialization and deallocation functions void* MRIStepCreate(ARKRhsFn fs, ARKRhsFn ff, realtype t0, N_Vector y0) This function allocates and initializes memory for a problem to be solved using the MRIStep time-stepping module in ARKode. Arguments: • fs – the name of the C function (of type ARKRhsFn()) defining the slow portion of the right-hand side function in 𝑦˙ = 𝑓𝑠 (𝑡, 𝑦) + 𝑓𝑓 (𝑡, 𝑦). • ff – the name of the C function (of type ARKRhsFn()) defining the fast portion of the right-hand side function in 𝑦˙ = 𝑓𝑠 (𝑡, 𝑦) + 𝑓𝑓 (𝑡, 𝑦). • t0 – the initial value of 𝑡. • y0 – the initial condition vector 𝑦(𝑡0 ). Return value: If successful, a pointer to initialized problem memory of type void*, to be passed to all userfacing MRIStep routines listed below. If unsuccessful, a NULL pointer will be returned, and an error message will be printed to stderr. void MRIStepFree(void** arkode_mem) This function frees the problem memory arkode_mem created by MRIStepCreate(). Arguments: • arkode_mem – pointer to the MRIStep memory block. Return value: None 7.5.2 Rootfinding initialization function As described in the section Rootfinding, while solving the IVP, ARKode’s time-stepping modules have the capability to find the roots of a set of user-defined functions. In the MRIStep module root finding is performed between slow solution time steps only (i.e., it is not performed within the sub-stepping a fast time scales). To activate the root-finding algorithm, call the following function. This is normally called only once, prior to the first call to MRIStepEvolve(), but if the rootfinding problem is to be changed during the solution, MRIStepRootInit() can also be called prior to a continuation call to MRIStepEvolve(). int MRIStepRootInit(void* arkode_mem, int nrtfn, ARKRootFn g) Initializes a rootfinding problem to be solved during the integration of the ODE system. It must be called after MRIStepCreate(), and before MRIStepEvolve(). Arguments: 7.5. 
• arkode_mem – pointer to the MRIStep memory block.
• nrtfn – number of functions 𝑔𝑖, an integer ≥ 0.
• g – name of user-supplied function, of type ARKRootFn(), defining the functions 𝑔𝑖 whose roots are sought.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL
• ARK_MEM_FAIL if there was a memory allocation failure
• ARK_ILL_INPUT if nrtfn is greater than zero but g = NULL.

Notes: To disable the rootfinding feature after it has already been initialized, or to free memory associated with MRIStep’s rootfinding module, call MRIStepRootInit with nrtfn = 0. Similarly, if a new IVP is to be solved with a call to MRIStepReInit(), where the new IVP has no rootfinding problem but the prior one did, then call MRIStepRootInit with nrtfn = 0.

7.5.3 MRIStep solver function

This is the central step in the solution process – the call to perform the integration of the IVP. The input argument itask specifies one of two modes as to where MRIStep is to return a solution. These modes are modified if the user has set a stop time (with a call to the optional input function MRIStepSetStopTime()) or has requested rootfinding.

int MRIStepEvolve(void* arkode_mem, realtype tout, N_Vector yout, realtype *tret, int itask)

Integrates the ODE over an interval in 𝑡.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• tout – the next time at which a computed solution is desired.
• yout – the computed solution vector.
• tret – the time corresponding to yout (output).
• itask – a flag indicating the job of the solver for the next user step. The ARK_NORMAL option causes the solver to take internal steps until it has just overtaken a user-specified output time, tout, in the direction of integration, i.e. 𝑡𝑛−1 < tout ≤ 𝑡𝑛 for forward integration, or 𝑡𝑛 ≤ tout < 𝑡𝑛−1 for backward integration.
It will then compute an approximation to the solution 𝑦(𝑡out) by interpolation (using one of the dense output routines described in the section Interpolation). The ARK_ONE_STEP option tells the solver to take only a single internal step 𝑦𝑛−1 → 𝑦𝑛 and then return control back to the calling program. If this step will overtake tout then the solver will again return an interpolated result; otherwise it will return a copy of the internal solution 𝑦𝑛 in the vector yout.

Return value:
• ARK_SUCCESS if successful.
• ARK_ROOT_RETURN if MRIStepEvolve() succeeded, and found one or more roots. If the number of root functions, nrtfn, is greater than 1, call MRIStepGetRootInfo() to see which 𝑔𝑖 were found to have a root at (*tret).
• ARK_TSTOP_RETURN if MRIStepEvolve() succeeded and returned at tstop.
• ARK_MEM_NULL if the arkode_mem argument was NULL.
• ARK_NO_MALLOC if arkode_mem was not allocated.
• ARK_ILL_INPUT if one of the inputs to MRIStepEvolve() is illegal, or some other input to the solver was either illegal or missing. Details will be provided in the error message. Typical causes of this failure:
  1. A component of the error weight vector became zero during internal time-stepping.
  2. A root of one of the root functions was found both at a point 𝑡 and also very near 𝑡.
• ARK_TOO_MUCH_WORK if the solver took mxstep internal steps but could not reach tout. The default value for mxstep is MXSTEP_DEFAULT = 500.
• ARK_VECTOROP_ERR if a vector operation error occurred.
• ARK_INNERSTEP_FAILED if the inner stepper returned with an unrecoverable error. The value returned from the inner stepper can be obtained with MRIStepGetLastInnerStepFlag().

Notes: The input vector yout can use the same memory as the vector y0 of initial conditions that was passed to MRIStepCreate().
In ARK_ONE_STEP mode, tout is used only on the first call, and only to get the direction and a rough scale of the independent variable.

All failure return values are negative, so testing the return value for negative values will trap all MRIStepEvolve() failures.

Since interpolation may reduce the accuracy of the reported solution, if full method accuracy is desired the user should issue a call to MRIStepSetStopTime() before the call to MRIStepEvolve() to specify a fixed stop time to end the time step and return to the user. Upon return from MRIStepEvolve(), a copy of the internal solution 𝑦𝑛 will be returned in the vector yout. Once the integrator returns at a tstop time, any future testing for tstop is disabled (and can be re-enabled only through a new call to MRIStepSetStopTime()).

On any error return in which one or more internal steps were taken by MRIStepEvolve(), the returned values of tret and yout correspond to the farthest point reached in the integration. On all other error returns, tret and yout are left unchanged from those provided to the routine.

7.5.4 Optional input functions

There are numerous optional input parameters that control the behavior of the MRIStep solver, each of which may be modified from its default value by calling an appropriate input function. The following tables list all optional input functions, grouped by which aspect of MRIStep they control. Detailed information on the calling syntax and arguments for each function is then provided following each table.

The optional inputs are grouped into the following categories:
• General MRIStep options (Optional inputs for MRIStep),
• IVP method solver options (Optional inputs for IVP method selection).

For the most casual use of MRIStep, relying on the default set of solver parameters, the reader can skip to the following section, User-supplied functions.

We note that, on an error return, all of the optional input functions send an error message to the error handler function.
We also note that all error return values are negative, so a test on the return values for negative values will catch all errors.

Optional inputs for MRIStep

Optional input | Function name | Default
Return MRIStep solver parameters to their defaults | MRIStepSetDefaults() | internal
Set dense output order | MRIStepSetDenseOrder() | 3
Supply a pointer to a diagnostics output file | MRIStepSetDiagnostics() | NULL
Supply a pointer to an error output file | MRIStepSetErrFile() | stderr
Supply a custom error handler function | MRIStepSetErrHandlerFn() | internal fn
Run with fixed-step sizes | MRIStepSetFixedStep() | required
Maximum no. of warnings for 𝑡𝑛 + ℎ = 𝑡𝑛 | MRIStepSetMaxHnilWarns() | 10
Maximum no. of internal steps before tout | MRIStepSetMaxNumSteps() | 500
Set a value for 𝑡stop | MRIStepSetStopTime() | ∞
Supply a pointer for user data | MRIStepSetUserData() | NULL

int MRIStepSetDefaults(void* arkode_mem)

Resets all optional input parameters to MRIStep’s original default values.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: This function does not change problem-defining function pointers fs and ff or the user_data pointer. It also does not affect any data structures or options related to rootfinding (those can be reset using MRIStepRootInit()).

int MRIStepSetDenseOrder(void* arkode_mem, int dord)

Specifies the degree of the polynomial interpolant used for dense output (i.e. interpolation of solution output values).

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• dord – requested polynomial order of accuracy.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: Allowed values are between 0 and min(q,5), where q is the order of the overall integration method.

int MRIStepSetDiagnostics(void* arkode_mem, FILE* diagfp)

Specifies the file pointer for a diagnostics file where all MRIStep step adaptivity and solver information is written.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• diagfp – pointer to the diagnostics output file.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: This parameter can be stdout or stderr, although the suggested approach is to specify a pointer to a unique file opened by the user and returned by fopen. If not called, or if called with a NULL file pointer, all diagnostics output is disabled.

When run in parallel, only one process should set a non-NULL value for this pointer, since statistics from all processes would be identical.

int MRIStepSetErrFile(void* arkode_mem, FILE* errfp)

Specifies a pointer to the file where all MRIStep warning and error messages will be written if the default internal error handling function is used.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• errfp – pointer to the output file.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: The default value for errfp is stderr.

Passing a NULL value disables all future error message output (except for the case wherein the MRIStep memory pointer is NULL). This use of the function is strongly discouraged.

If used, this routine should be called before any other optional input functions, in order to take effect for subsequent error messages.

int MRIStepSetErrHandlerFn(void* arkode_mem, ARKErrHandlerFn ehfun, void* eh_data)

Specifies the optional user-defined function to be used in handling error messages.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• ehfun – name of user-supplied error handler function.
• eh_data – pointer to user data passed to ehfun every time it is called.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: Error messages indicating that the MRIStep solver memory is NULL will always be directed to stderr.

int MRIStepSetFixedStep(void* arkode_mem, realtype hs, realtype hf)

Sets the slow and fast step sizes used within MRIStep.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• hs – value of the slow step size.
• hf – value of the fast step size.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: If hf does not evenly divide the time interval between the stages of the slow method, then the actual value used for the fast steps will be slightly smaller than hf, so that each interval $(c^s_i - c^s_{i-1})\,h_s$ is covered by an integer number of fast steps. Specifically, the fast step for the i-th slow stage will be

$h = \dfrac{(c^s_i - c^s_{i-1})\,h_s}{\left\lceil (c^s_i - c^s_{i-1})\,h_s / h_f \right\rceil}$.

If both MRIStepSetFixedStep() and MRIStepSetStopTime() are used, then the fixed step size will be used for all steps until the final step preceding the provided stop time (which may be shorter). To resume use of the previous fixed step size, another call to MRIStepSetFixedStep() must be made prior to calling MRIStepEvolve() to resume integration.

int MRIStepSetMaxHnilWarns(void* arkode_mem, int mxhnil)

Specifies the maximum number of messages issued by the solver to warn that 𝑡 + ℎ = 𝑡 on the next internal step, before MRIStep will instead return with an error.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• mxhnil – maximum allowed number of warning messages (> 0).

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: The default value is 10; set mxhnil to zero to specify this default.

A negative value indicates that no warning messages should be issued.

int MRIStepSetMaxNumSteps(void* arkode_mem, long int mxsteps)

Specifies the maximum number of steps to be taken by the solver in its attempt to reach the next output time, before MRIStep will return with an error.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• mxsteps – maximum allowed number of internal steps.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: Passing mxsteps = 0 results in MRIStep using the default value (500).

Passing mxsteps < 0 disables the test (not recommended).

int MRIStepSetStopTime(void* arkode_mem, realtype tstop)

Specifies the value of the independent variable 𝑡 past which the solution is not to proceed.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• tstop – stopping time for the integrator.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: The default is that no stop time is imposed.

int MRIStepSetUserData(void* arkode_mem, void* user_data)

Specifies the user data block user_data and attaches it to the main MRIStep memory block.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• user_data – pointer to the user data.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: If specified, the pointer to user_data is passed to all user-supplied functions for which it is an argument; otherwise NULL is passed.

Optional inputs for IVP method selection

Optional input | Function name | Default
Set MRI RK tables | MRIStepSetMRITables() | internal
Specify MRI RK table numbers | MRIStepSetMRITableNum() | internal

int MRIStepSetMRITables(void* arkode_mem, int q, ARKodeButcherTable Bs, ARKodeButcherTable Bf)

Specifies a customized Butcher table pair for the MRI method.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• q – global order of accuracy for the MRI method.
• Bs – the Butcher table for the slow RK method.
• Bf – the Butcher table for the fast RK method.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: For a description of the ARKodeButcherTable type and related functions for creating Butcher tables see Butcher Table Data Structure.

At this time the slow and fast Butcher tables must each define an explicit Runge-Kutta method. Additionally, the slow table must have stage times that are unique and ordered (i.e., $c^s_i > c^s_{i-1}$) and the final stage time must be less than 1. Error checking is performed to ensure that Bs and Bf define ERK methods (i.e., the A components of Bs and Bf are strictly lower-triangular) and that the stage times of Bs satisfy the aforementioned restrictions.

The input value of q is used rather than the orders encoded in the individual tables, as the overall order of the MRI method may differ from the orders of the individual tables. No error checking is performed to ensure that q correctly describes the coefficients that were input.
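The table restrictions above, together with the fast step size rule given in the notes for MRIStepSetFixedStep(), can be checked by hand before a custom table pair is supplied. The sketch below is an illustration only and does not reproduce MRIStep's internal checks: the function names are hypothetical, plain double arrays stand in for the fields of ARKodeButcherTable, and the ceiling is computed with an integer cast, so floating-point rounding in the ratio could produce one more fast step than an exact calculation would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: stage times c[0..s-1] must be strictly increasing
 * and the final stage time must be less than 1. Returns 1 if both hold. */
int slow_stage_times_ok(const double *c, int s)
{
  assert(c != NULL && s > 0);
  for (int i = 1; i < s; i++) {
    if (!(c[i] > c[i - 1])) return 0;
  }
  return c[s - 1] < 1.0;
}

/* Hypothetical check: the s-by-s coefficient matrix A (stored row-major)
 * must be strictly lower triangular, i.e. define an explicit method.
 * Returns 1 if every entry on or above the diagonal is zero. */
int is_strictly_lower_triangular(const double *A, int s)
{
  assert(A != NULL && s > 0);
  for (int i = 0; i < s; i++) {
    for (int j = i; j < s; j++) {
      if (A[i * s + j] != 0.0) return 0;
    }
  }
  return 1;
}

/* Fast step for the slow-stage interval [c_prev, c_cur]:
 *   h = (c_cur - c_prev) * hs / ceil((c_cur - c_prev) * hs / hf)  */
double fast_step(double c_prev, double c_cur, double hs, double hf)
{
  assert(hs > 0.0 && hf > 0.0 && c_cur > c_prev);
  const double interval = (c_cur - c_prev) * hs;
  const double ratio = interval / hf;
  long nsteps = (long) ratio;            /* truncate toward zero      */
  if ((double) nsteps < ratio) nsteps++; /* round up to get ceil()    */
  return interval / (double) nsteps;
}
```

For example, with hs = 1 and hf = 0.2, an interval between stage times 0 and 0.5 needs three fast steps of size 1/6, since 0.5/0.2 = 2.5 is rounded up to 3.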

int MRIStepSetMRITableNum(void* arkode_mem, int istable, int iftable)

Indicates to use specific built-in Butcher tables for the MRI method.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• istable – index of the slow Butcher table.
• iftable – index of the fast Butcher table.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: istable and iftable should match existing explicit methods from the section Explicit Butcher tables. Error checking is performed to ensure that these tables exist and are not implicit.

Rootfinding optional input functions

The following functions can be called to set optional inputs to control the rootfinding algorithm, the mathematics of which are described in the section Rootfinding.

Optional input | Function name | Default
Direction of zero-crossings to monitor | MRIStepSetRootDirection() | both
Disable inactive root warnings | MRIStepSetNoInactiveRootWarn() | enabled

int MRIStepSetRootDirection(void* arkode_mem, int* rootdir)

Specifies the direction of zero-crossings to be located and returned.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• rootdir – state array of length nrtfn, the number of root functions 𝑔𝑖 (the value of nrtfn was supplied in the call to MRIStepRootInit()). If rootdir[i] == 0 then crossing in either direction for 𝑔𝑖 should be reported. A value of +1 or -1 indicates that the solver should report only zero-crossings where 𝑔𝑖 is increasing or decreasing, respectively.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL
• ARK_ILL_INPUT if an argument has an illegal value

Notes: The default behavior is to monitor for both zero-crossing directions.

int MRIStepSetNoInactiveRootWarn(void* arkode_mem)

Disables issuing a warning if some root function appears to be identically zero at the beginning of the integration.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory is NULL

Notes: MRIStep will not report the initial conditions as a possible zero-crossing (assuming that one or more components 𝑔𝑖 are zero at the initial time). However, if it appears that some 𝑔𝑖 is identically zero at the initial time (i.e., 𝑔𝑖 is zero at the initial time and after the first step), MRIStep will issue a warning, which can be disabled with this optional input function.

7.5.5 Interpolated output function

An optional function MRIStepGetDky() is available to obtain additional values of solution-related quantities. This function should only be called after a successful return from MRIStepEvolve(), as it provides values either of 𝑦 or of its derivatives (up to the 3rd derivative) interpolated to any value of 𝑡 in the last internal step taken by MRIStepEvolve(). Internally, this dense output algorithm is identical to the algorithm used for the maximum order implicit predictors, described in the section Maximum order predictor, except that derivatives of the polynomial model may be evaluated upon request.

int MRIStepGetDky(void* arkode_mem, realtype t, int k, N_Vector dky)

Computes the k-th derivative of the function 𝑦 at the time t, i.e. $\frac{d^{(k)}}{dt^{(k)}} y(t)$, for values of the independent variable satisfying 𝑡𝑛 − ℎ𝑛 ≤ 𝑡 ≤ 𝑡𝑛, where 𝑡𝑛 is the current internal time reached and ℎ𝑛 is the last internal step size successfully used by the solver. This routine uses an interpolating polynomial of degree max(dord, k), where dord is the argument provided to MRIStepSetDenseOrder(). The user may request k in the range {0,...,dord}.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• t – the value of the independent variable at which the derivative is to be evaluated.
• k – the derivative order requested.
• dky – output vector (must be allocated by the user).

Return value:
• ARK_SUCCESS if successful
• ARK_BAD_K if k is not in the range {0,...,dord}.
• ARK_BAD_T if t is not in the interval [𝑡𝑛 − ℎ𝑛, 𝑡𝑛]
• ARK_BAD_DKY if the dky vector was NULL
• ARK_MEM_NULL if the MRIStep memory is NULL

Notes: It is only legal to call this function after a successful return from MRIStepEvolve(). A user may access the values 𝑡𝑛 and ℎ𝑛 via the functions MRIStepGetCurrentTime() and MRIStepGetLastStep(), respectively.

7.5.6 Optional output functions

MRIStep provides an extensive set of functions that can be used to obtain solver performance information. We organize these into groups:
1. SUNDIALS version information accessor routines are in the subsection SUNDIALS version information,
2. General MRIStep output routines are in the subsection Main solver optional output functions,
3. Output routines regarding rootfinding results are in the subsection Rootfinding optional output functions,
4. General usability routines (e.g. to print the current MRIStep parameters, or output the current Butcher tables) are in the subsection General usability functions.

Following each table, we elaborate on each function.

Some of the optional outputs, especially the various counters, can be very useful in determining the efficiency of various methods inside MRIStep. For example:
• The counters nssteps, nfsteps, nfs_evals, and nff_evals provide a rough measure of the overall cost of a given run, and can be compared between runs with different solver options to suggest which set of options is the most efficient.

It is therefore recommended that users retrieve and output these statistics following each run, and take some time to investigate alternate solver options that may be better suited to their particular problem of interest.
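One way to act on this recommendation is to reduce the four counters to a single rough cost figure per run. The sketch below is only an illustration: the struct RunStats, the two functions, and the relative-cost weights ws and wf are hypothetical and not part of MRIStep. In practice the counter values would be filled in from MRIStepGetNumSteps() and MRIStepGetNumRhsEvals(), and the weights would be the user's own estimate of how expensive one call to fs is relative to one call to ff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the counters collected after one run. */
typedef struct {
  long int nssteps;   /* slow steps taken                 */
  long int nfsteps;   /* fast steps taken                 */
  long int nfs_evals; /* calls to the slow RHS function   */
  long int nff_evals; /* calls to the fast RHS function   */
} RunStats;

/* Rough cost estimate: RHS evaluation counts weighted by the
 * user-estimated relative cost of one fs call (ws) and one ff call (wf). */
double rough_cost(const RunStats *s, double ws, double wf)
{
  assert(s != NULL && ws >= 0.0 && wf >= 0.0);
  return ws * (double) s->nfs_evals + wf * (double) s->nff_evals;
}

/* Average number of fast steps taken per slow step (0 if no slow steps). */
double fast_steps_per_slow_step(const RunStats *s)
{
  assert(s != NULL);
  if (s->nssteps <= 0) return 0.0;
  return (double) s->nfsteps / (double) s->nssteps;
}
```

Comparing rough_cost() across runs with different step sizes or table choices gives a first indication of which configuration is cheaper, provided the runs reach comparable accuracy.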

SUNDIALS version information

The following functions provide a way to get SUNDIALS version information at runtime.

int SUNDIALSGetVersion(char *version, int len)

This routine fills a string with SUNDIALS version information.

Arguments:
• version – character array to hold the SUNDIALS version information.
• len – allocated length of the version character array.

Return value:
• 0 if successful
• -1 if the input string is too short to store the SUNDIALS version

Notes: An array of 25 characters should be sufficient to hold the version information.

int SUNDIALSGetVersionNumber(int *major, int *minor, int *patch, char *label, int len)

This routine sets integers for the SUNDIALS major, minor, and patch release numbers and fills a string with the release label if applicable.

Arguments:
• major – SUNDIALS release major version number.
• minor – SUNDIALS release minor version number.
• patch – SUNDIALS release patch version number.
• label – string to hold the SUNDIALS release label.
• len – allocated length of the label character array.

Return value:
• 0 if successful
• -1 if the input string is too short to store the SUNDIALS label

Notes: An array of 10 characters should be sufficient to hold the label information. If a label is not used in the release version, no information is copied to label.

Main solver optional output functions

Optional output | Function name
Size of MRIStep real and integer workspaces | MRIStepGetWorkSpace()
Cumulative numbers of internal steps | MRIStepGetNumSteps()
Step size used for the last successful step | MRIStepGetLastStep()
Name of constant associated with a return flag | MRIStepGetReturnFlagName()
No. of calls to the fs and ff functions | MRIStepGetNumRhsEvals()
Current MRI Butcher tables | MRIStepGetCurrentButcherTables()
Last inner stepper return value | MRIStepGetLastInnerStepFlag()

int MRIStepGetWorkSpace(void* arkode_mem, long int* lenrw, long int* leniw)

Returns the MRIStep real and integer workspace sizes.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• lenrw – the number of realtype values in the MRIStep workspace.
• leniw – the number of integer values in the MRIStep workspace.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

int MRIStepGetNumSteps(void* arkode_mem, long int* nssteps, long int* nfsteps)

Returns the cumulative number of slow and fast internal steps taken by the solver (so far).

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• nssteps – number of slow steps taken in the solver.
• nfsteps – number of fast steps taken in the solver.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

int MRIStepGetLastStep(void* arkode_mem, realtype* hlast)

Returns the integration step size taken on the last successful internal step.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• hlast – step size taken on the last internal step.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

int MRIStepGetCurrentTime(void* arkode_mem, realtype* tcur)

Returns the current internal time reached by the solver.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• tcur – current internal time reached.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

char *MRIStepGetReturnFlagName(long int flag)

Returns the name of the MRIStep constant corresponding to flag.

Arguments:
• flag – a return flag from an MRIStep function.

Return value: The return value is a string containing the name of the corresponding constant.

int MRIStepGetNumRhsEvals(void* arkode_mem, long int* nfs_evals, long int* nff_evals)

Returns the number of calls to the user’s slow and fast right-hand side functions, 𝑓𝑠 and 𝑓𝑓 (so far).

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• nfs_evals – number of calls to the user’s 𝑓𝑠(𝑡, 𝑦) function.
• nff_evals – number of calls to the user’s 𝑓𝑓(𝑡, 𝑦) function.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

int MRIStepGetCurrentButcherTables(void* arkode_mem, ARKodeButcherTable *Bs, ARKodeButcherTable *Bf)

Returns the slow and fast Butcher tables currently in use by the solver.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• Bs – pointer to slow Butcher table structure.
• Bf – pointer to fast Butcher table structure.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

Notes: The ARKodeButcherTable data structure is defined in the header file arkode/arkode_butcher.h. It is defined as a pointer to the following C structure:

  typedef struct ARKodeButcherTableMem {
    int q;          /* method order of accuracy    */
    int p;          /* embedding order of accuracy */
    int stages;     /* number of stages            */
    realtype **A;   /* Butcher table coefficients  */
    realtype *c;    /* canopy node coefficients    */
    realtype *b;    /* root node coefficients      */
    realtype *d;    /* embedding coefficients      */
  } *ARKodeButcherTable;

int MRIStepGetLastInnerStepFlag(void* arkode_mem, int* flag)

Returns the last return value from the inner stepper.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• flag – inner stepper return value.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

General usability functions

The following optional routines may be called by a user to inquire about existing solver parameters, to retrieve stored Butcher tables, to write the current Butcher tables, or even to test a provided Butcher table to determine its analytical order of accuracy. While none of these would typically be called during the course of solving an initial value problem, they may be useful for users wishing to better understand MRIStep and/or specific Runge-Kutta methods.

Optional routine | Function name
Output all MRIStep solver parameters | MRIStepWriteParameters()
Output the current Butcher tables | MRIStepWriteButcher()

int MRIStepWriteParameters(void* arkode_mem, FILE *fp)

Outputs all MRIStep solver parameters to the provided file pointer.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• fp – pointer to use for printing the solver parameters.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

Notes: The fp argument can be stdout or stderr, or it may point to a specific file created using fopen.

When run in parallel, only one process should set a non-NULL value for this pointer, since parameters for all processes would be identical.

int MRIStepWriteButcher(void* arkode_mem, FILE *fp)

Outputs the current Butcher tables to the provided file pointer.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• fp – pointer to use for printing the Butcher tables.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

Notes: The fp argument can be stdout or stderr, or it may point to a specific file created using fopen.

When run in parallel, only one process should set a non-NULL value for this pointer, since tables for all processes would be identical.

Rootfinding optional output functions

Optional output | Function name
Array showing roots found | MRIStepGetRootInfo()
No. of calls to user root function | MRIStepGetNumGEvals()

int MRIStepGetRootInfo(void* arkode_mem, int* rootsfound)

Returns an array showing which functions were found to have a root.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• rootsfound – array of length nrtfn with the indices of the user functions 𝑔𝑖 found to have a root (the value of nrtfn was supplied in the call to MRIStepRootInit()). For 𝑖 = 0 . . . nrtfn-1, rootsfound[i] is nonzero if 𝑔𝑖 has a root, and 0 if not.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

Notes: The user must allocate space for rootsfound prior to calling this function.

For the components of 𝑔𝑖 for which a root was found, the sign of rootsfound[i] indicates the direction of zero-crossing. A value of +1 indicates that 𝑔𝑖 is increasing, while a value of -1 indicates a decreasing 𝑔𝑖.

int MRIStepGetNumGEvals(void* arkode_mem, long int* ngevals)

Returns the cumulative number of calls made to the user’s root function 𝑔.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• ngevals – number of calls made to 𝑔 so far.

Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL

7.5.7 MRIStep re-initialization functions

To reinitialize the MRIStep module for the solution of a new problem, where a prior call to MRIStepCreate() has been made, the user must call the function MRIStepReInit(). The new problem must have the same size as the previous one. This routine performs the same input checking and initializations that are done in MRIStepCreate(), but it performs no memory allocation, as it assumes that the existing internal memory is sufficient for the new problem.
A call to this re-initialization routine deletes the solution history that was stored internally during the previous integration. Following a successful call to MRIStepReInit(), call MRIStepEvolve() again for the solution of the new problem.

The use of MRIStepReInit() requires that the number of Runge-Kutta stages for both the slow and fast methods be no larger for the new problem than for the previous problem.

One important use of the MRIStepReInit() function is in the treatment of jump discontinuities in the RHS functions. Except in cases of fairly small jumps, it is usually more efficient to stop at each point of discontinuity and restart the integrator with a readjusted ODE model, using a call to this routine. To stop when the location of the discontinuity is known, simply make that location a value of tout. To stop when the location of the discontinuity is determined by the solution, use the rootfinding feature. In either case, it is critical that the RHS functions not incorporate the discontinuity, but rather have a smooth extension over the discontinuity, so that the step across it (and subsequent rootfinding, if used) can be done efficiently. Then use a switch within the RHS functions (communicated through user_data) that can be flipped between the stopping of the integration and the restart, so that the restarted problem uses the new values (which have jumped). Similar comments apply if there is to be a jump in the dependent variable vector.

int MRIStepReInit(void* arkode_mem, ARKRhsFn fs, ARKRhsFn ff, realtype t0, N_Vector y0)

Provides required problem specifications and re-initializes the MRIStep time-stepper module.

Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• fs – the name of the C function (of type ARKRhsFn()) defining the slow right-hand side function in 𝑦˙ = 𝑓𝑠(𝑡, 𝑦) + 𝑓𝑓(𝑡, 𝑦).
• ff – the name of the C function (of type ARKRhsFn()) defining the fast right-hand side function in $\dot{y} = f_s(t,y) + f_f(t,y)$.
• t0 – the initial value of $t$.
• y0 – the initial condition vector $y(t_0)$.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL
• ARK_MEM_FAIL if a memory allocation failed
• ARK_ILL_INPUT if an argument has an illegal value.
Notes: If an error occurred, MRIStepReInit() also sends an error message to the error handler function.

7.5.8 MRIStep system resize function

For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when using spatially-adaptive PDE simulations under a method-of-lines approach), the MRIStep integrator may be "resized" between slow integration steps, through calls to the MRIStepResize() function. This function modifies MRIStep's internal memory structures to use the new problem size.

To aid in the vector resize operation, the user can supply a vector resize function that will take as input a vector with the previous size, and transform it in-place to return a corresponding vector of the new size. If this function (of type ARKVecResizeFn()) is not supplied (i.e. is set to NULL), then all existing vectors internal to MRIStep will be destroyed and re-cloned from the new input vector.

int MRIStepResize(void* arkode_mem, N_Vector ynew, realtype t0, ARKVecResizeFn resize, void* resize_data)
Re-initializes MRIStep with a different state vector.
Arguments:
• arkode_mem – pointer to the MRIStep memory block.
• ynew – the newly-sized solution vector, holding the current dependent variable values $y(t_0)$.
• t0 – the current value of the independent variable $t_0$ (this must be consistent with ynew).
• resize – the user-supplied vector resize function (of type ARKVecResizeFn()).
• resize_data – the user-supplied data structure to be passed to resize when modifying internal MRIStep vectors.
Return value:
• ARK_SUCCESS if successful
• ARK_MEM_NULL if the MRIStep memory was NULL
• ARK_NO_MALLOC if arkode_mem was not allocated.
• ARK_ILL_INPUT if an argument has an illegal value.
Notes: If an error occurred, MRIStepResize() also sends an error message to the error handler function.

7.6 User-supplied functions

The user-supplied functions for MRIStep consist of:
• functions that define the ODE (required),
• a function that handles error and warning messages (optional),
• a function that defines the root-finding problem(s) to solve (optional),
• a function that handles vector resizing operations, if the underlying vector structure supports resizing (as opposed to deletion/recreation), and if the user plans to call MRIStepResize() (optional).

7.6.1 ODE right-hand side

The user must supply two functions of type ARKRhsFn to specify the right-hand side of the ODE system:

typedef int (*ARKRhsFn)(realtype t, N_Vector y, N_Vector ydot, void* user_data)
This function computes a portion of the ODE right-hand side for a given value of the independent variable $t$ and state vector $y$.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• ydot – the output vector that forms a portion of the ODE RHS $f(t,y)$.
• user_data – the user_data pointer that was passed to MRIStepSetUserData().
Return value: An ARKRhsFn should return 0 if successful, a positive value if a recoverable error occurred, or a negative value if it failed unrecoverably. As the MRIStep module only supports fixed step sizes at this time, any non-zero return value will halt the integration.
Notes: Allocation of memory for ydot is handled within the MRIStep module.
A recoverable error return from the ARKRhsFn is typically used to flag a value of the dependent variable $y$ that is "illegal" in some way (e.g., negative where only a non-negative value is physically meaningful).

7.6.2 Error message handler function

As an alternative to the default behavior of directing error and warning messages to the file pointed to by errfp (see MRIStepSetErrFile()), the user may provide a function of type ARKErrHandlerFn to process any such messages.

typedef void (*ARKErrHandlerFn)(int error_code, const char* module, const char* function, char* msg, void* user_data)
This function processes error and warning messages from MRIStep and its sub-modules.
Arguments:
• error_code – the error code.
• module – the name of the MRIStep module reporting the error.
• function – the name of the function in which the error occurred.
• msg – the error message.
• user_data – a pointer to user data, the same as the eh_data parameter that was passed to MRIStepSetErrHandlerFn().
Return value: An ARKErrHandlerFn function has no return value.
Notes: error_code is negative for errors and positive (ARK_WARNING) for warnings. If a function that returns a pointer to memory encounters an error, it sets error_code to 0.

7.6.3 Rootfinding function

If a rootfinding problem is to be solved during the integration of the ODE system, the user must supply a function of type ARKRootFn.

typedef int (*ARKRootFn)(realtype t, N_Vector y, realtype* gout, void* user_data)
This function implements a vector-valued function $g(t,y)$ such that the roots of the nrtfn components $g_i(t,y)$ are sought.
Arguments:
• t – the current value of the independent variable.
• y – the current value of the dependent variable vector.
• gout – the output array, of length nrtfn, with components $g_i(t,y)$.
• user_data – a pointer to user data, the same as the user_data parameter that was passed to MRIStepSetUserData().
Return value: An ARKRootFn function should return 0 if successful or a non-zero value if an error occurred (in which case the integration is halted and MRIStep returns ARK_RTFUNC_FAIL).
Notes: Allocation of memory for gout is handled within MRIStep.

7.6.4 Vector resize function

For simulations involving changes to the number of equations and unknowns in the ODE system (e.g. when using spatial adaptivity in a PDE simulation), the MRIStep integrator may be "resized" between integration steps, through calls to the MRIStepResize() function. Typically, when performing adaptive simulations the solution is stored in a customized user-supplied data structure, to enable adaptivity without repeated allocation/deallocation of memory. In these scenarios, it is recommended that the user supply a customized vector kernel to interface between SUNDIALS and their problem-specific data structure. If this vector kernel includes a function of type ARKVecResizeFn to resize a given vector implementation, then this function may be supplied to MRIStepResize() so that all internal MRIStep vectors may be resized, instead of deleting and re-creating them at each call. This resize function should have the following form:

typedef int (*ARKVecResizeFn)(N_Vector y, N_Vector ytemplate, void* user_data)
This function resizes the vector y to match the dimensions of the supplied vector, ytemplate.
Arguments:
• y – the vector to resize.
• ytemplate – a vector of the desired size.
• user_data – a pointer to user data, the same as the resize_data parameter that was passed to MRIStepResize().
Return value: An ARKVecResizeFn function should return 0 if it successfully resizes the vector y, and a nonzero value otherwise.
Notes: If this function is not supplied, then MRIStep will instead destroy the vector y and clone a new vector y from ytemplate.

CHAPTER EIGHT
BUTCHER TABLE DATA STRUCTURE

To store the Butcher table defining a Runge-Kutta method, ARKode provides the ARKodeButcherTable type and several related utility routines. We use the following Butcher table notation (shown for a 3-stage method):

$$
\begin{array}{r|c}
c & A \\ \hline
q & b \\
p & \tilde{b}
\end{array}
\;=\;
\begin{array}{r|ccc}
c_1 & a_{1,1} & a_{1,2} & a_{1,3} \\
c_2 & a_{2,1} & a_{2,2} & a_{2,3} \\
c_3 & a_{3,1} & a_{3,2} & a_{3,3} \\ \hline
q & b_1 & b_2 & b_3 \\
p & \tilde{b}_1 & \tilde{b}_2 & \tilde{b}_3
\end{array}
$$

where the method and embedding share stage $A$ and abscissa $c$ values, but use their stages $z_i$ differently through the coefficients $b$ and $\tilde{b}$ to generate methods of orders $q$ (the main method) and $p$ (the embedding, typically $q = p + 1$, though sometimes this is reversed).

ARKodeButcherTable is defined as

    typedef ARKodeButcherTableMem* ARKodeButcherTable

where ARKodeButcherTableMem is the structure

    typedef struct ARKodeButcherTableMem {
      int q;
      int p;
      int stages;
      realtype **A;
      realtype *c;
      realtype *b;
      realtype *d;
    };

where stages is the number of stages in the RK method, the variables q, p, A, c, and b have the same meaning as in the Butcher table above, and d is used to store $\tilde{b}$.
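As a rough illustration of how these fields fit together, the sketch below fills a locally defined mirror of the structure with the two-stage Heun-Euler pair ($q = 2$, $p = 1$). The names DemoButcherTable, demo_table_free and demo_heun_euler are hypothetical stand-ins, and double replaces realtype so that the example compiles without SUNDIALS; it is not the library's own allocation code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in that mirrors the fields of ARKodeButcherTableMem.
   double replaces realtype so the sketch builds without SUNDIALS. */
typedef struct {
  int q;       /* order of the main method */
  int p;       /* order of the embedding */
  int stages;  /* number of stages */
  double **A;  /* stages x stages coefficient matrix */
  double *c;   /* abscissae (stage times) */
  double *b;   /* solution weights */
  double *d;   /* embedding weights, i.e. b-tilde */
} DemoButcherTable;

/* Free a table; tolerates partially built tables. */
void demo_table_free(DemoButcherTable *B)
{
  if (B == NULL) return;
  if (B->A != NULL) {
    for (int i = 0; i < B->stages; i++) free(B->A[i]);
    free(B->A);
  }
  free(B->c);
  free(B->b);
  free(B->d);
  free(B);
}

/* Build the 2-stage Heun-Euler pair (q = 2, p = 1); NULL on failure. */
DemoButcherTable *demo_heun_euler(void)
{
  const int s = 2;
  DemoButcherTable *B = malloc(sizeof *B);
  if (B == NULL) return NULL;
  B->q = 2;
  B->p = 1;
  B->stages = s;
  B->A = malloc(s * sizeof *B->A);
  B->c = calloc(s, sizeof *B->c);
  B->b = calloc(s, sizeof *B->b);
  B->d = calloc(s, sizeof *B->d);
  if (B->A != NULL)
    for (int i = 0; i < s; i++) B->A[i] = calloc(s, sizeof **B->A);
  if (!B->A || !B->A[0] || !B->A[1] || !B->c || !B->b || !B->d) {
    demo_table_free(B);
    return NULL;
  }
  /* calloc zero-filled everything; set only the nonzero entries. */
  B->c[1] = 1.0;     /* c = (0, 1) */
  B->A[1][0] = 1.0;  /* strictly lower triangular: explicit method */
  B->b[0] = 0.5;     /* b = (1/2, 1/2) */
  B->b[1] = 0.5;
  B->d[0] = 1.0;     /* b-tilde = (1, 0): forward Euler embedding */
  assert(B->b[0] + B->b[1] == 1.0);  /* weights must sum to one */
  return B;
}
```

A caller would obtain the table with demo_heun_euler(), read entries such as B->A[1][0] or B->d[0], and release it with demo_table_free(). Storing A as an array of row pointers matches the realtype **A field above; the row-major 1D layout appears only as an input format of ARKodeButcherTable_Create().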
8.1 ARKodeButcherTable functions

Function name                          Description
ARKodeButcherTable_LoadERK()           Retrieve a given explicit Butcher table by its unique name
ARKodeButcherTable_LoadDIRK()          Retrieve a given implicit Butcher table by its unique name
ARKodeButcherTable_Alloc()             Allocate an empty Butcher table
ARKodeButcherTable_Create()            Create a new Butcher table
ARKodeButcherTable_Copy()              Create a copy of a Butcher table
ARKodeButcherTable_Space()             Get the Butcher table real and integer workspace size
ARKodeButcherTable_Free()              Deallocate a Butcher table
ARKodeButcherTable_Write()             Write the Butcher table to an output file
ARKodeButcherTable_CheckOrder()        Check the order of a Butcher table
ARKodeButcherTable_CheckARKOrder()     Check the order of an ARK pair of Butcher tables

ARKodeButcherTable ARKodeButcherTable_LoadERK(int emethod)
Retrieves a specified explicit Butcher table. The prototype for this function, as well as the integer names for each provided method, are defined in the header file arkode/arkode_butcher_erk.h. For further information on these tables and their corresponding identifiers, see Appendix: Butcher tables.
Arguments:
• emethod – integer input specifying the given Butcher table.
Return value:
• ARKodeButcherTable structure if successful.
• NULL pointer if emethod was invalid.

ARKodeButcherTable ARKodeButcherTable_LoadDIRK(int imethod)
Retrieves a specified diagonally-implicit Butcher table. The prototype for this function, as well as the integer names for each provided method, are defined in the header file arkode/arkode_butcher_dirk.h. For further information on these tables and their corresponding identifiers, see Appendix: Butcher tables.
Arguments:
• imethod – integer input specifying the given Butcher table.
Return value:
• ARKodeButcherTable structure if successful.
• NULL pointer if imethod was invalid.
ARKodeButcherTable ARKodeButcherTable_Alloc(int stages, booleantype embedded)
Allocates an empty Butcher table.
Arguments:
• stages – the number of stages in the Butcher table.
• embedded – flag denoting whether the Butcher table has an embedding (SUNTRUE) or not (SUNFALSE).
Return value:
• ARKodeButcherTable structure if successful.
• NULL pointer if stages was invalid or an allocation error occurred.

ARKodeButcherTable ARKodeButcherTable_Create(int s, int q, int p, realtype *c, realtype *A, realtype *b, realtype *d)
Allocates a Butcher table and fills it with the given values.
Arguments:
• s – number of stages in the RK method.
• q – global order of accuracy for the RK method.
• p – global order of accuracy for the embedded RK method.
• c – array (of length s) of stage times for the RK method.
• A – array of coefficients defining the RK stages. This should be stored as a 1D array of size s*s, in row-major order.
• b – array of coefficients (of length s) defining the time step solution.
• d – array of coefficients (of length s) defining the embedded solution.
Return value:
• ARKodeButcherTable structure if successful.
• NULL pointer if s was invalid or an allocation error occurred.
Notes: If the method does not have an embedding then d should be NULL and p should be equal to zero.

ARKodeButcherTable ARKodeButcherTable_Copy(ARKodeButcherTable B)
Creates a copy of the given Butcher table.
Arguments:
• B – the Butcher table to copy.
Return value:
• ARKodeButcherTable structure if successful.
• NULL pointer if an allocation error occurred.

void ARKodeButcherTable_Space(ARKodeButcherTable B, sunindextype *liw, sunindextype *lrw)
Get the real and integer workspace size for a Butcher table.
Arguments:
• B – the Butcher table.
• lrw – the number of realtype values in the Butcher table workspace.
• liw – the number of integer values in the Butcher table workspace.
Return value:
• ARK_SUCCESS if successful.
• ARK_MEM_NULL if the Butcher table memory was NULL.

void ARKodeButcherTable_Free(ARKodeButcherTable B)
Deallocate the Butcher table memory.
Arguments:
• B – the Butcher table.

void ARKodeButcherTable_Write(ARKodeButcherTable B, FILE *outfile)
Write the Butcher table to the provided file pointer.
Arguments:
• B – the Butcher table.
• outfile – pointer to use for printing the Butcher table.
Notes: The outfile argument can be stdout or stderr, or it may point to a specific file created using fopen.

int ARKodeButcherTable_CheckOrder(ARKodeButcherTable B, int* q, int* p, FILE* outfile)
Determine the analytic order of accuracy for the specified Butcher table. The analytic (necessary) conditions are checked up to order 6. For orders greater than 6 the Butcher simplifying (sufficient) assumptions are used.
Arguments:
• B – the Butcher table.
• q – the measured order of accuracy for the method.
• p – the measured order of accuracy for the embedding; 0 if the method does not have an embedding.
• outfile – file pointer for printing results; NULL to suppress output.
Return value:
• 0 – success, the measured values of q and p match the values of q and p in the provided Butcher table.
• 1 – warning, the values of q and p in the provided Butcher table are lower than the measured values, or the measured values achieve the maximum order possible with this function and the values of q and p in the provided Butcher table are higher.
• -1 – failure, the values of q and p in the provided Butcher table are higher than the measured values.
• -2 – failure, the input Butcher table or critical table contents are NULL.
Notes: For embedded methods, if the return flags for q and p would differ, failure takes precedence over warning, which takes precedence over success.
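To make the idea of checking analytic order conditions concrete, the following sketch tests the standard Runge-Kutta conditions through order 3 only: $\sum_i b_i = 1$, $\sum_i b_i c_i = 1/2$, $\sum_i b_i c_i^2 = 1/3$ and $\sum_{i,j} b_i a_{i,j} c_j = 1/6$. It then maps claimed and measured orders to flags in the spirit of the return values listed above. All names (demo_check_order, demo_order_flag, demo_combine_flags, DEMO_MAX_ORDER) and the tolerance are assumptions made for illustration; the library routine checks many more conditions and its internals may differ.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ORDER 3  /* highest order this sketch can verify */

static double demo_abs(double x) { return x < 0.0 ? -x : x; }

/* Return the highest order (0..3) whose conditions hold for weights b,
   abscissae c and the row-major s x s matrix A. */
int demo_check_order(int s, const double *c, const double *A, const double *b)
{
  const double tol = 1e-12;  /* assumed tolerance */
  double sb = 0.0, sbc = 0.0, sbc2 = 0.0, sbAc = 0.0;
  assert(s > 0 && c != NULL && A != NULL && b != NULL);
  for (int i = 0; i < s; i++) {
    double Ac = 0.0;  /* (A c)_i */
    for (int j = 0; j < s; j++) Ac += A[i * s + j] * c[j];
    sb   += b[i];
    sbc  += b[i] * c[i];
    sbc2 += b[i] * c[i] * c[i];
    sbAc += b[i] * Ac;
  }
  if (demo_abs(sb - 1.0) > tol) return 0;
  if (demo_abs(sbc - 0.5) > tol) return 1;
  if (demo_abs(sbc2 - 1.0 / 3.0) > tol) return 2;
  if (demo_abs(sbAc - 1.0 / 6.0) > tol) return 2;
  return 3;
}

/* 0: claimed order matches; 1: claimed is lower, or the check saturated
   at DEMO_MAX_ORDER while claimed is higher; -1: claimed is too high. */
int demo_order_flag(int claimed, int measured)
{
  if (claimed == measured) return 0;
  if (claimed < measured) return 1;
  if (measured == DEMO_MAX_ORDER) return 1;
  return -1;
}

/* Failure takes precedence over warning, which beats success. */
int demo_combine_flags(int flag_q, int flag_p)
{
  if (flag_q < 0 || flag_p < 0) return -1;
  if (flag_q > 0 || flag_p > 0) return 1;
  return 0;
}
```

Applied to the Heun-Euler pair ($c = (0, 1)$, $b = (1/2, 1/2)$, $\tilde{b} = (1, 0)$), the main weights satisfy the order 2 conditions but not the order 3 ones, and the embedding weights satisfy only the order 1 condition, which agrees with $q = 2$, $p = 1$.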
int ARKodeButcherTable_CheckARKOrder(ARKodeButcherTable B1, ARKodeButcherTable B2, int *q, int *p, FILE *outfile)
Determine the analytic order of accuracy (up to order 6) for a specified ARK pair of Butcher tables.
Arguments:
• B1 – a Butcher table in the ARK pair.
• B2 – a Butcher table in the ARK pair.
• q – the measured order of accuracy for the method.
• p – the measured order of accuracy for the embedding; 0 if the method does not have an embedding.
• outfile – file pointer for printing results; NULL to suppress output.
Return value:
• 0 – success, the measured values of q and p match the values of q and p in the provided Butcher tables.
• 1 – warning, the values of q and p in the provided Butcher tables are lower than the measured values, or the measured values achieve the maximum order possible with this function and the values of q and p in the provided Butcher tables are higher.
• -1 – failure, the input Butcher tables or critical table contents are NULL.
Notes: For embedded methods, if the return flags for q and p would differ, warning takes precedence over success.

CHAPTER NINE
VECTOR DATA STRUCTURES

The SUNDIALS library comes packaged with a variety of NVECTOR implementations, designed for simulations in serial, shared-memory parallel, and distributed-memory parallel environments, as well as interfaces to vector data structures used within external linear solver libraries. All native implementations assume that the process-local data is stored contiguously, and they in turn provide a variety of standard vector algebra operations that may be performed on the data.

In addition, SUNDIALS provides a simple interface for generic vectors (akin to a C++ abstract base class). All of the major SUNDIALS solvers (CVODE(S), IDA(S), KINSOL, ARKODE) in turn are constructed to only depend on these generic vector operations, making them immediately extensible to new user-defined vector objects.
The only exceptions to this rule relate to the dense, banded and sparse-direct linear system solvers, since they rely on particular data storage and access patterns in the NVECTORs used.

9.1 Description of the NVECTOR Modules

The SUNDIALS solvers are written in a data-independent manner. They all operate on generic vectors (of type N_Vector) through a set of operations defined by, and specific to, the particular NVECTOR implementation. Users can provide a custom implementation of the NVECTOR module or use one of four provided within SUNDIALS – a serial and three parallel implementations. The generic operations are described below. In the sections following, the implementations provided with SUNDIALS are described.

The generic N_Vector type is a pointer to a structure that has an implementation-dependent content field containing the description and actual data of the vector, and an ops field pointing to a structure with generic vector operations. The type N_Vector is defined as:

    typedef struct _generic_N_Vector *N_Vector;

    struct _generic_N_Vector {
      void *content;
      struct _generic_N_Vector_Ops *ops;
    };

Here, the _generic_N_Vector_Ops structure is essentially a list of function pointers to the various actual vector operations, and is defined as

    struct _generic_N_Vector_Ops {
      N_Vector_ID (*nvgetvectorid)(N_Vector);
      N_Vector    (*nvclone)(N_Vector);
      N_Vector    (*nvcloneempty)(N_Vector);
      void        (*nvdestroy)(N_Vector);
      void        (*nvspace)(N_Vector, sunindextype *, sunindextype *);
      realtype*   (*nvgetarraypointer)(N_Vector);
      void        (*nvsetarraypointer)(realtype *, N_Vector);
      void        (*nvlinearsum)(realtype, N_Vector, realtype, N_Vector, N_Vector);
      void        (*nvconst)(realtype, N_Vector);
      void        (*nvprod)(N_Vector, N_Vector, N_Vector);
      void        (*nvdiv)(N_Vector, N_Vector, N_Vector);
      void        (*nvscale)(realtype, N_Vector, N_Vector);
      void        (*nvabs)(N_Vector, N_Vector);
      void        (*nvinv)(N_Vector, N_Vector);
      void        (*nvaddconst)(N_Vector, realtype, N_Vector);
      realtype    (*nvdotprod)(N_Vector, N_Vector);
      realtype    (*nvmaxnorm)(N_Vector);
      realtype    (*nvwrmsnorm)(N_Vector, N_Vector);
      realtype    (*nvwrmsnormmask)(N_Vector, N_Vector, N_Vector);
      realtype    (*nvmin)(N_Vector);
      realtype    (*nvwl2norm)(N_Vector, N_Vector);
      realtype    (*nvl1norm)(N_Vector);
      void        (*nvcompare)(realtype, N_Vector, N_Vector);
      booleantype (*nvinvtest)(N_Vector, N_Vector);
      booleantype (*nvconstrmask)(N_Vector, N_Vector, N_Vector);
      realtype    (*nvminquotient)(N_Vector, N_Vector);
      int         (*nvlinearcombination)(int, realtype *, N_Vector *, N_Vector);
      int         (*nvscaleaddmulti)(int, realtype *, N_Vector, N_Vector *, N_Vector *);
      int         (*nvdotprodmulti)(int, N_Vector, N_Vector *, realtype *);
      int         (*nvlinearsumvectorarray)(int, realtype, N_Vector *, realtype, N_Vector *, N_Vector *);
      int         (*nvscalevectorarray)(int, realtype *, N_Vector *, N_Vector *);
      int         (*nvconstvectorarray)(int, realtype, N_Vector *);
      int         (*nvwrmsnormvectorarray)(int, N_Vector *, N_Vector *, realtype *);
      int         (*nvwrmsnormmaskvectorarray)(int, N_Vector *, N_Vector *, N_Vector, realtype *);
      int         (*nvscaleaddmultivectorarray)(int, int, realtype *, N_Vector *, N_Vector **, N_Vector **);
      int         (*nvlinearcombinationvectorarray)(int, int, realtype *, N_Vector **, N_Vector *);
    };

The generic NVECTOR module defines and implements the vector operations acting on an N_Vector. These routines are nothing but wrappers for the vector operations defined by a particular NVECTOR implementation, which are accessed through the ops field of the N_Vector structure. To illustrate this point we show below the implementation of a typical vector operation from the generic NVECTOR module, namely N_VScale, which performs the scaling of a vector x by a scalar c:

    void N_VScale(realtype c, N_Vector x, N_Vector z) {
      z->ops->nvscale(c, x, z);
    }

The subsection Description of the NVECTOR operations contains a complete list of all standard vector operations defined by the generic NVECTOR module.
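The content-plus-ops pattern can be imitated in a few lines. The sketch below is a simplified, hypothetical analogue rather than SUNDIALS code: the types DemoVector, DemoVectorOps and DemoSerialContent, the functions demo_new_serial, demo_scale and demo_destroy, and the use of double in place of realtype are all assumptions made for illustration. It also shares one static ops table between vectors, which is a simplification chosen here for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoVector_ *DemoVector;

/* Table of function pointers: one slot per vector operation. */
struct DemoVectorOps {
  void (*scale)(double c, DemoVector x, DemoVector z);
  void (*destroy)(DemoVector v);
};

/* Generic vector: opaque content plus a pointer to its operations. */
struct DemoVector_ {
  void *content;
  const struct DemoVectorOps *ops;
};

/* Content used by the hypothetical "serial" implementation. */
typedef struct {
  long length;
  double *data;
} DemoSerialContent;

static void serial_scale(double c, DemoVector x, DemoVector z)
{
  DemoSerialContent *xc = x->content;
  DemoSerialContent *zc = z->content;
  assert(xc->length == zc->length);
  for (long i = 0; i < xc->length; i++) zc->data[i] = c * xc->data[i];
}

static void serial_destroy(DemoVector v)
{
  DemoSerialContent *vc = v->content;
  free(vc->data);
  free(vc);
  free(v);
}

static const struct DemoVectorOps serial_ops = { serial_scale, serial_destroy };

/* Constructor: allocate content, attach the ops table. NULL on failure. */
DemoVector demo_new_serial(long n)
{
  DemoVector v = malloc(sizeof *v);
  DemoSerialContent *vc = malloc(sizeof *vc);
  double *data = (n > 0) ? calloc((size_t)n, sizeof *data) : NULL;
  if (v == NULL || vc == NULL || data == NULL) {
    free(v);
    free(vc);
    free(data);
    return NULL;
  }
  vc->length = n;
  vc->data = data;
  v->content = vc;
  v->ops = &serial_ops;
  return v;
}

/* Generic wrappers: they know nothing about the content layout. */
void demo_scale(double c, DemoVector x, DemoVector z) { z->ops->scale(c, x, z); }
void demo_destroy(DemoVector v) { if (v != NULL) v->ops->destroy(v); }
```

Code written against demo_scale() and demo_destroy() never touches the content field directly, so a second implementation with a different layout could be added by supplying another ops table and constructor.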
The subsections Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations list optional fused and vector array operations, respectively. Fused and vector array operations are intended to increase data reuse, reduce parallel communication on distributed memory systems, and lower the number of kernel launches on systems with accelerators. If a particular NVECTOR implementation defines a fused or vector array operation as NULL, the generic NVECTOR module will automatically call standard vector operations as necessary to complete the desired operation. Currently, all fused and vector array operations are disabled by default; however, the NVECTOR implementations provided with SUNDIALS define additional user-callable functions to enable/disable any or all of the fused and vector array operations. See the following sections for the implementation-specific functions to enable/disable operations.

Finally, we note that the generic NVECTOR module defines the functions N_VCloneVectorArray and N_VCloneVectorArrayEmpty. Both functions create (by cloning) an array of count variables of type N_Vector, each of the same type as an existing N_Vector. Their prototypes are:

    N_Vector *N_VCloneVectorArray(int count, N_Vector w);
    N_Vector *N_VCloneVectorArrayEmpty(int count, N_Vector w);

and their definitions are based on the implementation-specific N_VClone and N_VCloneEmpty operations, respectively.

An array of variables of type N_Vector can be destroyed by calling N_VDestroyVectorArray, whose prototype is

    void N_VDestroyVectorArray(N_Vector *vs, int count);

and whose definition is based on the implementation-specific N_VDestroy operation.

A particular implementation of the NVECTOR module must:
• Specify the content field of the N_Vector.
• Define and implement the necessary vector operations.
Note that the names of these routines should be unique to that implementation in order to permit using more than one NVECTOR module (each with different N_Vector internal data representations) in the same code.
• Define and implement user-callable constructor and destructor routines to create and free an N_Vector with the new content field and with ops pointing to the new vector operations.
• Optionally, define and implement additional user-callable routines acting on the newly defined N_Vector (e.g., a routine to print the content for debugging purposes).
• Optionally, provide accessor macros as needed for that particular implementation to be used to access different parts in the content field of the newly defined N_Vector.

Each NVECTOR implementation included in SUNDIALS has a unique identifier specified in an enumeration and shown in the table below. It is recommended that a user-supplied NVECTOR implementation use the SUNDIALS_NVEC_CUSTOM identifier.

9.1.1 Vector Identifications associated with vector kernels supplied with SUNDIALS

Vector ID                  Vector type                           ID Value
SUNDIALS_NVEC_SERIAL       Serial                                0
SUNDIALS_NVEC_PARALLEL     Distributed memory parallel (MPI)     1
SUNDIALS_NVEC_OPENMP       OpenMP shared memory parallel         2
SUNDIALS_NVEC_PTHREADS     PThreads shared memory parallel       3
SUNDIALS_NVEC_PARHYP       hypre ParHyp parallel vector          4
SUNDIALS_NVEC_PETSC        PETSc parallel vector                 5
SUNDIALS_NVEC_CUSTOM       User-provided custom vector           6

9.2 Description of the NVECTOR operations

The standard vector operations defined by the generic N_Vector module are defined as follows. For each of these operations, we give the name, usage of the function, and a description of its mathematical operations below.

N_Vector_ID N_VGetVectorID(N_Vector w)
Returns the vector type identifier for the vector w. It is used to determine the vector implementation type (e.g. serial, parallel, ...)
from the abstract N_Vector interface. Returned values are given in the table Vector Identifications associated with vector kernels supplied with SUNDIALS.
Usage:
    id = N_VGetVectorID(w);

N_Vector N_VClone(N_Vector w)
Creates a new N_Vector of the same type as an existing vector w and sets the ops field. It does not copy the vector, but rather allocates storage for the new vector.
Usage:
    v = N_VClone(w);

N_Vector N_VCloneEmpty(N_Vector w)
Creates a new N_Vector of the same type as an existing vector w and sets the ops field. It does not allocate storage for the new vector's data.
Usage:
    v = N_VCloneEmpty(w);

void N_VDestroy(N_Vector v)
Destroys the N_Vector v and frees memory allocated for its internal data.
Usage:
    N_VDestroy(v);

void N_VSpace(N_Vector v, sunindextype* lrw, sunindextype* liw)
Returns storage requirements for the N_Vector v: lrw contains the number of realtype words and liw contains the number of integer words. This function is advisory only, for use in determining a user's total space requirements; it could be a dummy function in a user-supplied NVECTOR module if that information is not of interest.
Usage:
    N_VSpace(v, &lrw, &liw);

realtype* N_VGetArrayPointer(N_Vector v)
Returns a pointer to a realtype array from the N_Vector v. Note that this assumes that the internal data in the N_Vector is a contiguous array of realtype. This routine is only used in the solver-specific interfaces to the dense and banded (serial) linear solvers, and in the interfaces to the banded (serial) and band-block-diagonal (parallel) preconditioner modules provided with SUNDIALS.
Usage:
    vdata = N_VGetArrayPointer(v);

void N_VSetArrayPointer(realtype* vdata, N_Vector v)
Replaces the data array pointer in an N_Vector with a given array of realtype. Note that this assumes that the internal data in the N_Vector is a contiguous array of realtype.
This routine is only used in the interfaces to the dense (serial) linear solver, hence need not exist in a user-supplied NVECTOR module.
Usage:
    N_VSetArrayPointer(vdata, v);

void N_VLinearSum(realtype a, N_Vector x, realtype b, N_Vector y, N_Vector z)
Performs the operation z = ax + by, where a and b are realtype scalars and x and y are of type N_Vector:
$$z_i = a x_i + b y_i, \quad i = 0, \ldots, n-1.$$
Usage:
    N_VLinearSum(a, x, b, y, z);

void N_VConst(realtype c, N_Vector z)
Sets all components of the N_Vector z to realtype c:
$$z_i = c, \quad i = 0, \ldots, n-1.$$
Usage:
    N_VConst(c, z);

void N_VProd(N_Vector x, N_Vector y, N_Vector z)
Sets the N_Vector z to be the component-wise product of the N_Vector inputs x and y:
$$z_i = x_i y_i, \quad i = 0, \ldots, n-1.$$
Usage:
    N_VProd(x, y, z);

void N_VDiv(N_Vector x, N_Vector y, N_Vector z)
Sets the N_Vector z to be the component-wise ratio of the N_Vector inputs x and y:
$$z_i = \frac{x_i}{y_i}, \quad i = 0, \ldots, n-1.$$
The $y_i$ may not be tested for 0 values. It should only be called with a y that is guaranteed to have all nonzero components.
Usage:
    N_VDiv(x, y, z);

void N_VScale(realtype c, N_Vector x, N_Vector z)
Scales the N_Vector x by the realtype scalar c and returns the result in z:
$$z_i = c x_i, \quad i = 0, \ldots, n-1.$$
Usage:
    N_VScale(c, x, z);

void N_VAbs(N_Vector x, N_Vector z)
Sets the components of the N_Vector z to be the absolute values of the components of the N_Vector x:
$$z_i = |x_i|, \quad i = 0, \ldots, n-1.$$
Usage:
    N_VAbs(x, z);

void N_VInv(N_Vector x, N_Vector z)
Sets the components of the N_Vector z to be the inverses of the components of the N_Vector x:
$$z_i = 1.0 / x_i, \quad i = 0, \ldots, n-1.$$
This routine may not check for division by 0. It should be called only with an x which is guaranteed to have all nonzero components.
Usage:
    N_VInv(x, z);

void N_VAddConst(N_Vector x, realtype b, N_Vector z)
Adds the realtype scalar b to all components of x and returns the result in the N_Vector z:
$$z_i = x_i + b, \quad i = 0, \ldots, n-1.$$
Usage:
    N_VAddConst(x, b, z);

realtype N_VDotProd(N_Vector x, N_Vector y)
Returns the value of the dot-product of the N_Vectors x and y:
$$d = \sum_{i=0}^{n-1} x_i y_i.$$
Usage:
    d = N_VDotProd(x, y);

realtype N_VMaxNorm(N_Vector x)
Returns the value of the $l_\infty$ norm of the N_Vector x:
$$m = \max_{0 \le i \le n-1} |x_i|.$$
Usage:
    m = N_VMaxNorm(x);

realtype N_VWrmsNorm(N_Vector x, N_Vector w)
Returns the weighted root-mean-square norm of the N_Vector x with (positive) realtype weight vector w:
$$m = \sqrt{\left( \sum_{i=0}^{n-1} (x_i w_i)^2 \right) / n}$$
Usage:
    m = N_VWrmsNorm(x, w);

realtype N_VWrmsNormMask(N_Vector x, N_Vector w, N_Vector id)
Returns the weighted root mean square norm of the N_Vector x with realtype weight vector w built using only the elements of x corresponding to positive elements of the N_Vector id:
$$m = \sqrt{\left( \sum_{i=0}^{n-1} (x_i w_i H(id_i))^2 \right) / n}, \quad \text{where } H(\alpha) = \begin{cases} 1 & \alpha > 0 \\ 0 & \alpha \le 0 \end{cases}.$$
Usage:
    m = N_VWrmsNormMask(x, w, id);

realtype N_VMin(N_Vector x)
Returns the smallest element of the N_Vector x:
$$m = \min_{0 \le i \le n-1} x_i.$$
Usage:
    m = N_VMin(x);

realtype N_VWl2Norm(N_Vector x, N_Vector w)
Returns the weighted Euclidean $l_2$ norm of the N_Vector x with realtype weight vector w:
$$m = \sqrt{\sum_{i=0}^{n-1} (x_i w_i)^2}.$$
Usage:
    m = N_VWl2Norm(x, w);

realtype N_VL1Norm(N_Vector x)
Returns the $l_1$ norm of the N_Vector x:
$$m = \sum_{i=0}^{n-1} |x_i|.$$
Usage:
    m = N_VL1Norm(x);

void N_VCompare(realtype c, N_Vector x, N_Vector z)
Compares the components of the N_Vector x to the realtype scalar c and returns an N_Vector z such that for all $0 \le i \le n-1$,
$$z_i = \begin{cases} 1.0 & \text{if } |x_i| \ge c, \\ 0.0 & \text{otherwise.} \end{cases}$$
Usage:
    N_VCompare(c, x, z);

booleantype N_VInvTest(N_Vector x, N_Vector z)
Sets the components of the N_Vector z to be the inverses of the components of the N_Vector x, with prior testing for zero values:
$$z_i = 1.0 / x_i, \quad i = 0, \ldots, n-1.$$
This routine returns a boolean assigned to SUNTRUE if all components of x are nonzero (successful inversion) and returns SUNFALSE otherwise.
Usage:
    t = N_VInvTest(x, z);

booleantype N_VConstrMask(N_Vector c, N_Vector x, N_Vector m)
Performs the following constraint tests based on the values in $c_i$:
• $x_i > 0$ if $c_i = 2$,
• $x_i \ge 0$ if $c_i = 1$,
• $x_i < 0$ if $c_i = -2$,
• $x_i \le 0$ if $c_i = -1$.
There is no constraint on $x_i$ if $c_i = 0$. This routine returns a boolean assigned to SUNFALSE if any element failed the constraint test and assigned to SUNTRUE if all passed. It also sets a mask vector m, with elements equal to 1.0 where the constraint test failed, and 0.0 where the test passed. This routine is used only for constraint checking.
Usage:
    t = N_VConstrMask(c, x, m);

realtype N_VMinQuotient(N_Vector num, N_Vector denom)
This routine returns the minimum of the quotients obtained by termwise dividing the elements of num by the elements in denom:
$$\min_{i=0,\ldots,n-1} \frac{\text{num}_i}{\text{denom}_i}.$$
A zero element in denom will be skipped. If no such quotients are found, then the large value BIG_REAL (defined in the header file sundials_types.h) is returned.
Usage:
    minq = N_VMinQuotient(num, denom);

9.2.1 Description of the NVECTOR fused operations

The following fused vector operations are optional. These operations are intended to increase data reuse, reduce parallel communication on distributed memory systems, and lower the number of kernel launches on systems with accelerators. If a particular NVECTOR implementation defines one of the fused vector operations as NULL, the NVECTOR interface will call one of the above standard vector operations as necessary.
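The NULL fallback just described can be pictured with a small sketch. It is a hypothetical simplification that works on plain double arrays instead of N_Vector objects; the DemoOps structure and the functions demo_op_scale, demo_op_linearsum, demo_op_fused_lc and demo_linear_combination are invented for illustration and are not the SUNDIALS implementation. When the optional fused pointer is NULL, the wrapper assembles the linear combination from the standard scale and linear-sum operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table over plain arrays of length n. */
typedef struct {
  /* required standard operations */
  void (*scale)(double c, const double *x, double *z, long n);
  void (*linearsum)(double a, const double *x, double b, const double *y,
                    double *z, long n);
  /* optional fused operation; NULL means "not provided" */
  int (*linearcombination)(int nv, const double *c, double *const *X,
                           double *z, long n);
} DemoOps;

void demo_op_scale(double c, const double *x, double *z, long n)
{
  for (long i = 0; i < n; i++) z[i] = c * x[i];
}

void demo_op_linearsum(double a, const double *x, double b, const double *y,
                       double *z, long n)
{
  for (long i = 0; i < n; i++) z[i] = a * x[i] + b * y[i];
}

/* Fused version: a single pass over the data for all nv vectors. */
int demo_op_fused_lc(int nv, const double *c, double *const *X,
                     double *z, long n)
{
  for (long i = 0; i < n; i++) {
    double sum = 0.0;
    for (int j = 0; j < nv; j++) sum += c[j] * X[j][i];
    z[i] = sum;
  }
  return 0;
}

/* Generic wrapper: use the fused op if present, else fall back. */
int demo_linear_combination(const DemoOps *ops, int nv, const double *c,
                            double *const *X, double *z, long n)
{
  assert(ops != NULL && ops->scale != NULL && ops->linearsum != NULL);
  if (nv < 1) return -1;
  if (ops->linearcombination != NULL)
    return ops->linearcombination(nv, c, X, z, n);
  /* Fallback: z = c[0]*X[0], then z = c[j]*X[j] + z for j >= 1. */
  ops->scale(c[0], X[0], z, n);
  for (int j = 1; j < nv; j++)
    ops->linearsum(c[j], X[j], 1.0, z, z, n);
  return 0;
}
```

The fallback makes nv passes over the data where the fused routine makes one, which is the data-reuse argument given above. Writing z first from X[0] also means that in this sketch z may alias X[0] but no other entry of X.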
As above, for each operation, we give the name, usage of the function, and a description of its mathematical operations below.

int N_VLinearCombination(int nv, realtype* c, N_Vector* X, N_Vector z)
This routine computes the linear combination of nv vectors with $n$ elements: $z_i = \sum_{j=0}^{nv-1} c_j x_{j,i}, \quad i = 0, \ldots, n-1,$ where $c$ is an array of $nv$ scalars, $x_j$ is a vector in the vector array X, and z is the output vector. If the output vector z is one of the vectors in X, then it must be the first vector in the vector array. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VLinearCombination(nv, c, X, z);

int N_VScaleAddMulti(int nv, realtype* c, N_Vector x, N_Vector* Y, N_Vector* Z)
This routine scales and adds one vector to nv vectors with $n$ elements: $z_{j,i} = c_j x_i + y_{j,i}, \quad j = 0, \ldots, nv-1, \quad i = 0, \ldots, n-1,$ where c is an array of scalars, x is a vector, $y_j$ is a vector in the vector array Y, and $z_j$ is an output vector in the vector array Z. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VScaleAddMulti(nv, c, x, Y, Z);

int N_VDotProdMulti(int nv, N_Vector x, N_Vector* Y, realtype* d)
This routine computes the dot product of a vector with nv vectors having $n$ elements: $d_j = \sum_{i=0}^{n-1} x_i y_{j,i}, \quad j = 0, \ldots, nv-1,$ where d is an array of scalars containing the computed dot products, x is a vector, and $y_j$ is a vector in the vector array Y. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VDotProdMulti(nv, x, Y, d);

9.2.2 Description of the NVECTOR vector array operations

The following vector array operations are also optional. As with the fused vector operations, these are intended to increase data reuse, reduce parallel communication on distributed memory systems, and lower the number of kernel launches on systems with accelerators.
If a particular NVECTOR implementation defines one of the fused or vector array operations as NULL, the NVECTOR interface will call one of the above standard vector operations as necessary. As above, for each operation, we give the name, usage of the function, and a description of its mathematical operations below.

int N_VLinearSumVectorArray(int nv, realtype a, N_Vector* X, realtype b, N_Vector* Y, N_Vector* Z)
This routine computes the linear sum of two vector arrays of nv vectors with $n$ elements: $z_{j,i} = a x_{j,i} + b y_{j,i}, \quad i = 0, \ldots, n-1, \quad j = 0, \ldots, nv-1,$ where a and b are scalars, $x_j$ and $y_j$ are vectors in the vector arrays X and Y respectively, and $z_j$ is a vector in the output vector array Z. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VLinearSumVectorArray(nv, a, X, b, Y, Z);

int N_VScaleVectorArray(int nv, realtype* c, N_Vector* X, N_Vector* Z)
This routine scales each vector of $n$ elements in a vector array of nv vectors by a potentially different constant: $z_{j,i} = c_j x_{j,i}, \quad i = 0, \ldots, n-1, \quad j = 0, \ldots, nv-1,$ where c is an array of scalars, $x_j$ is a vector in the vector array X, and $z_j$ is a vector in the output vector array Z. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VScaleVectorArray(nv, c, X, Z);

int N_VConstVectorArray(int nv, realtype c, N_Vector* Z)
This routine sets each element of every vector of $n$ elements in a vector array of nv vectors to the same value: $z_{j,i} = c, \quad i = 0, \ldots, n-1, \quad j = 0, \ldots, nv-1,$ where c is a scalar and $z_j$ is a vector in the vector array Z. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VConstVectorArray(nv, c, Z);

int N_VWrmsNormVectorArray(int nv, N_Vector* X, N_Vector* W, realtype* m)
This routine computes the weighted root mean square norm of each vector in a vector array: $m_j = \left( \frac{1}{n} \sum_{i=0}^{n-1} (x_{j,i} w_{j,i})^2 \right)^{1/2}, \quad j = 0, \ldots, nv-1,$ where $x_j$ is a vector in the vector array X, $w_j$ is a weight vector in the vector array W, and m is the output array of scalars containing the computed norms. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VWrmsNormVectorArray(nv, X, W, m);

int N_VWrmsNormMaskVectorArray(int nv, N_Vector* X, N_Vector* W, N_Vector id, realtype* m)
This routine computes the masked weighted root mean square norm of each vector in a vector array: $m_j = \left( \frac{1}{n} \sum_{i=0}^{n-1} (x_{j,i} w_{j,i} H(id_i))^2 \right)^{1/2}, \quad j = 0, \ldots, nv-1,$ where $H(id_i) = 1$ for $id_i > 0$ and is zero otherwise, $x_j$ is a vector in the vector array X, $w_j$ is a weight vector in the vector array W, id is the mask vector, and m is the output array of scalars containing the computed norms. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VWrmsNormMaskVectorArray(nv, X, W, id, m);

int N_VScaleAddMultiVectorArray(int nv, int nsum, realtype* c, N_Vector* X, N_Vector** YY, N_Vector** ZZ)
This routine scales and adds a vector array of nv vectors to nsum other vector arrays: $z_{k,j,i} = c_k x_{j,i} + y_{k,j,i}, \quad i = 0, \ldots, n-1, \quad j = 0, \ldots, nv-1, \quad k = 0, \ldots, nsum-1,$ where c is an array of scalars, $x_j$ is a vector in the vector array X, $y_{k,j}$ is a vector in the array of vector arrays YY, and $z_{k,j}$ is an output vector in the array of vector arrays ZZ. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VScaleAddMultiVectorArray(nv, nsum, c, X, YY, ZZ);

int N_VLinearCombinationVectorArray(int nv, int nsum, realtype* c, N_Vector** XX, N_Vector* Z)
This routine computes the linear combination of nsum vector arrays containing nv vectors: $z_{j,i} = \sum_{k=0}^{nsum-1} c_k x_{k,j,i}, \quad i = 0, \ldots, n-1, \quad j = 0, \ldots, nv-1,$ where c is an array of scalars, $x_{k,j}$ is a vector in the array of vector arrays XX, and $z_j$ is an output vector in the vector array Z. If the output vector array is one of the vector arrays in XX, it must be the first vector array in XX. The operation returns 0 for success and a non-zero value otherwise.
Usage: ier = N_VLinearCombinationVectorArray(nv, nsum, c, XX, Z);

9.3 The NVECTOR_SERIAL Module

The serial implementation of the NVECTOR module provided with SUNDIALS, NVECTOR_SERIAL, defines the content field of an N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, and a boolean flag own_data which specifies the ownership of data.

    struct _N_VectorContent_Serial {
      sunindextype length;
      booleantype own_data;
      realtype *data;
    };

The header file to be included when using this module is nvector_serial.h. The following five macros are provided to access the content of an NVECTOR_SERIAL vector. The suffix _S in the names denotes the serial version.

NV_CONTENT_S(v)
This macro gives access to the contents of the serial N_Vector v. The assignment v_cont = NV_CONTENT_S(v) sets v_cont to be a pointer to the serial N_Vector content structure.
Implementation:
    #define NV_CONTENT_S(v) ( (N_VectorContent_Serial)(v->content) )

NV_OWN_DATA_S(v)
Access the own_data component of the serial N_Vector v.
Implementation:
    #define NV_OWN_DATA_S(v) ( NV_CONTENT_S(v)->own_data )

NV_DATA_S(v)
The assignment v_data = NV_DATA_S(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.
Similarly, the assignment NV_DATA_S(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.
Implementation:
    #define NV_DATA_S(v) ( NV_CONTENT_S(v)->data )

NV_LENGTH_S(v)
Access the length component of the serial N_Vector v. The assignment v_len = NV_LENGTH_S(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_S(v) = len_v sets the length of v to be len_v.
Implementation:
    #define NV_LENGTH_S(v) ( NV_CONTENT_S(v)->length )

NV_Ith_S(v, i)
This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing. The assignment r = NV_Ith_S(v,i) sets r to be the value of the i-th component of v. The assignment NV_Ith_S(v,i) = r sets the value of the i-th component of v to be r. Here i ranges from 0 to $n-1$ for a vector of length $n$.
Implementation:
    #define NV_Ith_S(v,i) ( NV_DATA_S(v)[i] )

The NVECTOR_SERIAL module defines serial implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations. Their names are obtained from the names in those sections by appending the suffix _Serial (e.g. N_VDestroy_Serial). The module NVECTOR_SERIAL provides the following additional user-callable routines:

N_Vector N_VNew_Serial(sunindextype vec_length)
This function creates and allocates memory for a serial N_Vector. Its only argument is the vector length.

N_Vector N_VNewEmpty_Serial(sunindextype vec_length)
This function creates a new serial N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Serial(sunindextype vec_length, realtype* v_data)
This function creates and allocates memory for a serial vector with user-provided data array, v_data. (This function does not allocate memory for v_data itself.)
N_Vector* N_VCloneVectorArray_Serial(int count, N_Vector w)
This function creates (by cloning) an array of count serial vectors.

N_Vector* N_VCloneVectorArrayEmpty_Serial(int count, N_Vector w)
This function creates (by cloning) an array of count serial vectors, each with an empty (NULL) data array.

void N_VDestroyVectorArray_Serial(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_Serial() or with N_VCloneVectorArrayEmpty_Serial().

sunindextype N_VGetLength_Serial(N_Vector v)
This function returns the number of vector elements.

void N_VPrint_Serial(N_Vector v)
This function prints the content of a serial vector to stdout.

void N_VPrintFile_Serial(N_Vector v, FILE *outfile)
This function prints the content of a serial vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_SERIAL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Serial(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Serial() will have the default settings for the NVECTOR_SERIAL module.

int N_VEnableFusedOps_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableLinearCombination_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the serial vector.
The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes
• When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = NV_DATA_S(v) and then access v_data[i] within the loop than it is to use NV_Ith_S(v,i) within the loop.
• N_VNewEmpty_Serial(), N_VMake_Serial(), and N_VCloneVectorArrayEmpty_Serial() set the field own_data to SUNFALSE. The functions N_VDestroy_Serial() and N_VDestroyVectorArray_Serial() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.
• To maximize efficiency, vector operations in the NVECTOR_SERIAL implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same length.
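The first two notes above can be illustrated with a minimal sketch. It is deliberately independent of the SUNDIALS headers: toy_serial and the toy_* functions are hypothetical stand-ins that only mirror the three fields of the serial content structure, so this shows the ownership idea rather than the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in with the same three fields as the serial
   content structure; NOT the SUNDIALS type. */
typedef struct toy_serial {
  long    length;
  int     own_data;  /* 1: the vector owns data, 0: the caller does */
  double *data;
} toy_serial;

/* In the spirit of N_VNew_Serial: allocate the data array and own it. */
static toy_serial *toy_new(long length)
{
  assert(length > 0);
  toy_serial *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->data = calloc((size_t)length, sizeof *v->data);
  if (v->data == NULL) { free(v); return NULL; }
  v->length = length;
  v->own_data = 1;
  return v;
}

/* In the spirit of N_VMake_Serial: wrap caller-provided storage. */
static toy_serial *toy_make(long length, double *user_data)
{
  toy_serial *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->length = length;
  v->own_data = 0;
  v->data = user_data;
  return v;
}

/* Free the data array only when the vector owns it. */
static void toy_destroy(toy_serial *v)
{
  if (v == NULL) return;
  if (v->own_data) free(v->data);
  free(v);
}

/* Sum the components through a data pointer fetched once, outside the loop. */
static double toy_sum(const toy_serial *v)
{
  const double *v_data = v->data;
  double s = 0.0;
  for (long i = 0; i < v->length; i++) s += v_data[i];
  return s;
}
```

After toy_destroy() on a vector built with toy_make(), the caller's buffer is still valid and remains the caller's to release.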
For solvers that include a Fortran interface module, the NVECTOR_SERIAL module also includes a Fortran-callable function FNVINITS(code, NEQ, IER), to initialize this NVECTOR_SERIAL module. Here code is an input solver id (1 for CVODE, 2 for IDA, 3 for KINSOL, 4 for ARKode); NEQ is the problem size (declared so as to match C type long int); and IER is an error return flag equal to 0 for success and -1 for failure.

9.4 The NVECTOR_PARALLEL Module

The NVECTOR_PARALLEL implementation of the NVECTOR module provided with SUNDIALS is based on MPI. It defines the content field of an N_Vector to be a structure containing the global and local lengths of the vector, a pointer to the beginning of a contiguous local data array, an MPI communicator, and a boolean flag own_data indicating ownership of the data array data.

    struct _N_VectorContent_Parallel {
      sunindextype local_length;
      sunindextype global_length;
      booleantype own_data;
      realtype *data;
      MPI_Comm comm;
    };

The header file to be included when using this module is nvector_parallel.h. The following seven macros are provided to access the content of an NVECTOR_PARALLEL vector. The suffix _P in the names denotes the distributed memory parallel version.

NV_CONTENT_P(v)
This macro gives access to the contents of the parallel N_Vector v. The assignment v_cont = NV_CONTENT_P(v) sets v_cont to be a pointer to the N_Vector content structure of type struct _N_VectorContent_Parallel.
Implementation:
    #define NV_CONTENT_P(v) ( (N_VectorContent_Parallel)(v->content) )

NV_OWN_DATA_P(v)
Access the own_data component of the parallel N_Vector v.
Implementation:
    #define NV_OWN_DATA_P(v) ( NV_CONTENT_P(v)->own_data )

NV_DATA_P(v)
The assignment v_data = NV_DATA_P(v) sets v_data to be a pointer to the first component of the local data for the N_Vector v.
The assignment NV_DATA_P(v) = v_data sets the component array of v to be v_data by storing the pointer v_data into data.
Implementation:
    #define NV_DATA_P(v) ( NV_CONTENT_P(v)->data )

NV_LOCLENGTH_P(v)
The assignment v_llen = NV_LOCLENGTH_P(v) sets v_llen to be the length of the local part of v. The call NV_LOCLENGTH_P(v) = llen_v sets the local_length of v to be llen_v.
Implementation:
    #define NV_LOCLENGTH_P(v) ( NV_CONTENT_P(v)->local_length )

NV_GLOBLENGTH_P(v)
The assignment v_glen = NV_GLOBLENGTH_P(v) sets v_glen to be the global_length of the vector v. The call NV_GLOBLENGTH_P(v) = glen_v sets the global_length of v to be glen_v.
Implementation:
    #define NV_GLOBLENGTH_P(v) ( NV_CONTENT_P(v)->global_length )

NV_COMM_P(v)
This macro provides access to the MPI communicator used by the parallel N_Vector v.
Implementation:
    #define NV_COMM_P(v) ( NV_CONTENT_P(v)->comm )

NV_Ith_P(v, i)
This macro gives access to the individual components of the local data array of an N_Vector. The assignment r = NV_Ith_P(v,i) sets r to be the value of the i-th component of the local part of v. The assignment NV_Ith_P(v,i) = r sets the value of the i-th component of the local part of v to be r. Here i ranges from 0 to $n-1$, where $n$ is the local_length.
Implementation:
    #define NV_Ith_P(v,i) ( NV_DATA_P(v)[i] )

The NVECTOR_PARALLEL module defines parallel implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations. Their names are obtained from the names in those sections by appending the suffix _Parallel (e.g. N_VDestroy_Parallel).
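To make the local/global distinction concrete, the following sketch shows how a reduction such as the dot product decomposes over the locally owned pieces of a distributed vector. It is an illustration under stated assumptions, not the NVECTOR_PARALLEL source: the ranks are simulated by slices of one array so that the example runs without MPI, and local_dot and simulated_global_dot are hypothetical names. In an MPI program the outer loop would be replaced by a sum-reduction across processes.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product over the entries owned by one rank. */
static double local_dot(const double *x, const double *y, size_t local_length)
{
  double s = 0.0;
  for (size_t i = 0; i < local_length; i++) s += x[i] * y[i];
  return s;
}

/* Simulated global dot product: rank r owns local_lengths[r] consecutive
   entries; the loop over r stands in for a sum-reduction across ranks. */
static double simulated_global_dot(const double *x, const double *y,
                                   const size_t *local_lengths, int nranks)
{
  assert(nranks > 0);
  double global = 0.0;
  size_t offset = 0;
  for (int r = 0; r < nranks; r++) {
    global += local_dot(x + offset, y + offset, local_lengths[r]);
    offset += local_lengths[r];
  }
  return global;
}
```

Each simulated rank only ever indexes its own slice from 0 to local_length - 1, which matches the index range stated for NV_Ith_P.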
The module NVECTOR_PARALLEL provides the following additional user-callable routines:

N_Vector N_VNew_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length)
This function creates and allocates memory for a parallel vector having global length global_length, having processor-local length local_length, and using the MPI communicator comm.

N_Vector N_VNewEmpty_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length)
This function creates a new parallel N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, realtype* v_data)
This function creates and allocates memory for a parallel vector with user-provided data array. (This function does not allocate memory for v_data itself.)

N_Vector* N_VCloneVectorArray_Parallel(int count, N_Vector w)
This function creates (by cloning) an array of count parallel vectors.

N_Vector* N_VCloneVectorArrayEmpty_Parallel(int count, N_Vector w)
This function creates (by cloning) an array of count parallel vectors, each with an empty (NULL) data array.

void N_VDestroyVectorArray_Parallel(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_Parallel() or with N_VCloneVectorArrayEmpty_Parallel().

sunindextype N_VGetLength_Parallel(N_Vector v)
This function returns the number of vector elements (global vector length).

sunindextype N_VGetLocalLength_Parallel(N_Vector v)
This function returns the local vector length.

void N_VPrint_Parallel(N_Vector v)
This function prints the local content of a parallel vector to stdout.

void N_VPrintFile_Parallel(N_Vector v, FILE *outfile)
This function prints the local content of a parallel vector to outfile.
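The constructors above take both a local and a global length. How the global length is divided among processes is left to the application; the sketch below shows one common choice, an even block partition, purely as an assumption for illustration. block_local_length is a hypothetical helper, not a SUNDIALS routine, and it uses long where an application would use sunindextype.

```c
#include <assert.h>

/* Hypothetical helper: number of entries owned by `rank` when
   `global_length` entries are split as evenly as possible over
   `nranks` ranks. The first (global_length % nranks) ranks each
   receive one extra entry, so the local lengths sum to global_length. */
static long block_local_length(long global_length, int nranks, int rank)
{
  assert(global_length >= 0 && nranks > 0 && rank >= 0 && rank < nranks);
  long base  = global_length / nranks;
  long extra = global_length % nranks;
  return base + (rank < extra ? 1 : 0);
}
```

Each process would pass its own result as local_length, together with the shared global_length, when creating its part of the vector.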
By default all fused and vector array operations are disabled in the NVECTOR_PARALLEL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Parallel(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Parallel() will have the default settings for the NVECTOR_PARALLEL module.

int N_VEnableFusedOps_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableLinearSumVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the parallel vector.
The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes
• When looping over the components of an N_Vector v, it is more efficient to first obtain the local component array via v_data = NV_DATA_P(v) and then access v_data[i] within the loop than it is to use NV_Ith_P(v,i) within the loop.
• N_VNewEmpty_Parallel(), N_VMake_Parallel(), and N_VCloneVectorArrayEmpty_Parallel() set the field own_data to SUNFALSE. The routines N_VDestroy_Parallel() and N_VDestroyVectorArray_Parallel() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.
• To maximize efficiency, vector operations in the NVECTOR_PARALLEL implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

For solvers that include a Fortran interface module, the NVECTOR_PARALLEL module also includes a Fortran-callable function FNVINITP(COMM, code, NLOCAL, NGLOBAL, IER), to initialize this NVECTOR_PARALLEL module. Here COMM is the MPI communicator; code is an input solver id (1 for CVODE, 2 for IDA, 3 for KINSOL, 4 for ARKode); NLOCAL and NGLOBAL are the local and global vector sizes, respectively (declared so as to match C type long int); and IER is an error return flag equal to 0 for success and -1 for failure.

Note: If the header file sundials_config.h defines SUNDIALS_MPI_COMM_F2C to be 1 (meaning the MPI implementation used to build SUNDIALS includes the MPI_Comm_f2c function), then COMM can be any valid MPI communicator. Otherwise, MPI_COMM_WORLD will be used, so just pass an integer value as a placeholder.
9.5 The NVECTOR_OPENMP Module

In situations where a user has a multi-core processing unit capable of running multiple parallel threads with shared memory, SUNDIALS provides an implementation of NVECTOR using OpenMP, called NVECTOR_OPENMP, and an implementation using Pthreads, called NVECTOR_PTHREADS. Testing has shown that vectors should be of length at least 100,000 before the overhead associated with creating and using the threads is made up by the parallelism in the vector calculations.

The OpenMP NVECTOR implementation provided with SUNDIALS, NVECTOR_OPENMP, defines the content field of N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, a boolean flag own_data which specifies the ownership of data, and the number of threads. Operations on the vector are threaded using OpenMP; the number of threads used is based on the argument supplied to the vector constructor.

    struct _N_VectorContent_OpenMP {
      sunindextype length;
      booleantype own_data;
      realtype *data;
      int num_threads;
    };

The header file to be included when using this module is nvector_openmp.h. The following six macros are provided to access the content of an NVECTOR_OPENMP vector. The suffix _OMP in the names denotes the OpenMP version.

NV_CONTENT_OMP(v)
This macro gives access to the contents of the OpenMP N_Vector v. The assignment v_cont = NV_CONTENT_OMP(v) sets v_cont to be a pointer to the OpenMP N_Vector content structure.
Implementation:
    #define NV_CONTENT_OMP(v) ( (N_VectorContent_OpenMP)(v->content) )

NV_OWN_DATA_OMP(v)
Access the own_data component of the OpenMP N_Vector v.
Implementation:
    #define NV_OWN_DATA_OMP(v) ( NV_CONTENT_OMP(v)->own_data )

NV_DATA_OMP(v)
The assignment v_data = NV_DATA_OMP(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.
Similarly, the assignment NV_DATA_OMP(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.
Implementation:
    #define NV_DATA_OMP(v) ( NV_CONTENT_OMP(v)->data )

NV_LENGTH_OMP(v)
Access the length component of the OpenMP N_Vector v. The assignment v_len = NV_LENGTH_OMP(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_OMP(v) = len_v sets the length of v to be len_v.
Implementation:
    #define NV_LENGTH_OMP(v) ( NV_CONTENT_OMP(v)->length )

NV_NUM_THREADS_OMP(v)
Access the num_threads component of the OpenMP N_Vector v. The assignment v_threads = NV_NUM_THREADS_OMP(v) sets v_threads to be the num_threads of v. On the other hand, the call NV_NUM_THREADS_OMP(v) = num_threads_v sets the num_threads of v to be num_threads_v.
Implementation:
    #define NV_NUM_THREADS_OMP(v) ( NV_CONTENT_OMP(v)->num_threads )

NV_Ith_OMP(v, i)
This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing. The assignment r = NV_Ith_OMP(v,i) sets r to be the value of the i-th component of v. The assignment NV_Ith_OMP(v,i) = r sets the value of the i-th component of v to be r. Here i ranges from 0 to $n-1$ for a vector of length $n$.
Implementation:
    #define NV_Ith_OMP(v,i) ( NV_DATA_OMP(v)[i] )

The NVECTOR_OPENMP module defines OpenMP implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations. Their names are obtained from the names in those sections by appending the suffix _OpenMP (e.g. N_VDestroy_OpenMP). The module NVECTOR_OPENMP provides the following additional user-callable routines:

N_Vector N_VNew_OpenMP(sunindextype vec_length, int num_threads)
This function creates and allocates memory for an OpenMP N_Vector.
Arguments are the vector length and number of threads.

N_Vector N_VNewEmpty_OpenMP(sunindextype vec_length, int num_threads)
This function creates a new OpenMP N_Vector with an empty (NULL) data array.

N_Vector N_VMake_OpenMP(sunindextype vec_length, realtype* v_data, int num_threads)
This function creates and allocates memory for an OpenMP vector with user-provided data array, v_data. (This function does not allocate memory for v_data itself.)

N_Vector* N_VCloneVectorArray_OpenMP(int count, N_Vector w)
This function creates (by cloning) an array of count OpenMP vectors.

N_Vector* N_VCloneVectorArrayEmpty_OpenMP(int count, N_Vector w)
This function creates (by cloning) an array of count OpenMP vectors, each with an empty (NULL) data array.

void N_VDestroyVectorArray_OpenMP(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_OpenMP() or with N_VCloneVectorArrayEmpty_OpenMP().

sunindextype N_VGetLength_OpenMP(N_Vector v)
This function returns the number of vector elements.

void N_VPrint_OpenMP(N_Vector v)
This function prints the content of an OpenMP vector to stdout.

void N_VPrintFile_OpenMP(N_Vector v, FILE *outfile)
This function prints the content of an OpenMP vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_OPENMP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_OpenMP(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone().
This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_OpenMP() will have the default settings for the NVECTOR_OPENMP module.

int N_VEnableFusedOps_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the OpenMP vector.
The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes
• When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = NV_DATA_OMP(v) and then access v_data[i] within the loop than it is to use NV_Ith_OMP(v,i) within the loop.
• N_VNewEmpty_OpenMP(), N_VMake_OpenMP(), and N_VCloneVectorArrayEmpty_OpenMP() set the field own_data to SUNFALSE.
The functions N_VDestroy_OpenMP() and N_VDestroyVectorArray_OpenMP() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.
• To maximize efficiency, vector operations in the NVECTOR_OPENMP implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

For solvers that include a Fortran interface module, the NVECTOR_OPENMP module also includes a Fortran-callable function FNVINITOMP(code, NEQ, NUMTHREADS, IER), to initialize this NVECTOR_OPENMP module. Here code is an input solver id (1 for CVODE, 2 for IDA, 3 for KINSOL, 4 for ARKode); NEQ is the problem size (declared so as to match C type long int); NUMTHREADS is the number of threads; and IER is an error return flag equal to 0 for success and -1 for failure.

9.6 The NVECTOR_PTHREADS Module

In situations where a user has a multi-core processing unit capable of running multiple parallel threads with shared memory, SUNDIALS provides an implementation of NVECTOR using OpenMP, called NVECTOR_OPENMP, and an implementation using Pthreads, called NVECTOR_PTHREADS. Testing has shown that vectors should be of length at least 100,000 before the overhead associated with creating and using the threads is made up by the parallelism in the vector calculations.

The Pthreads NVECTOR implementation provided with SUNDIALS, denoted NVECTOR_PTHREADS, defines the content field of N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, a boolean flag own_data which specifies the ownership of data, and the number of threads.
Operations on the vector are threaded using POSIX threads (Pthreads); the number of threads used is based on the argument supplied to the vector constructor.

struct _N_VectorContent_Pthreads {
  sunindextype length;
  booleantype own_data;
  realtype *data;
  int num_threads;
};

The header file to be included when using this module is nvector_pthreads.h. The following six macros are provided to access the content of an NVECTOR_PTHREADS vector. The suffix _PT in the names denotes the Pthreads version.

NV_CONTENT_PT(v)
This macro gives access to the contents of the Pthreads vector N_Vector v. The assignment v_cont = NV_CONTENT_PT(v) sets v_cont to be a pointer to the Pthreads N_Vector content structure.
Implementation:
#define NV_CONTENT_PT(v) ( (N_VectorContent_Pthreads)(v->content) )

NV_OWN_DATA_PT(v)
Access the own_data component of the Pthreads N_Vector v.
Implementation:
#define NV_OWN_DATA_PT(v) ( NV_CONTENT_PT(v)->own_data )

NV_DATA_PT(v)
The assignment v_data = NV_DATA_PT(v) sets v_data to be a pointer to the first component of the data for the N_Vector v. Similarly, the assignment NV_DATA_PT(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.
Implementation:
#define NV_DATA_PT(v) ( NV_CONTENT_PT(v)->data )

NV_LENGTH_PT(v)
Access the length component of the Pthreads N_Vector v. The assignment v_len = NV_LENGTH_PT(v) sets v_len to be the length of v. On the other hand, the assignment NV_LENGTH_PT(v) = len_v sets the length of v to be len_v.
Implementation:
#define NV_LENGTH_PT(v) ( NV_CONTENT_PT(v)->length )

NV_NUM_THREADS_PT(v)
Access the num_threads component of the Pthreads N_Vector v. The assignment v_threads = NV_NUM_THREADS_PT(v) sets v_threads to be the num_threads of v. On the other hand, the assignment NV_NUM_THREADS_PT(v) = num_threads_v sets the num_threads of v to be num_threads_v.
Implementation:
#define NV_NUM_THREADS_PT(v) ( NV_CONTENT_PT(v)->num_threads )

NV_Ith_PT(v, i)
This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing. The assignment r = NV_Ith_PT(v,i) sets r to be the value of the i-th component of v. The assignment NV_Ith_PT(v,i) = r sets the value of the i-th component of v to be r. Here i ranges from 0 to n − 1 for a vector of length n.
Implementation:
#define NV_Ith_PT(v,i) ( NV_DATA_PT(v)[i] )

The NVECTOR_PTHREADS module defines Pthreads implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations. Their names are obtained from the names in those sections by appending the suffix _Pthreads (e.g. N_VDestroy_Pthreads). The module NVECTOR_PTHREADS provides the following additional user-callable routines:

N_Vector N_VNew_Pthreads(sunindextype vec_length, int num_threads)
This function creates and allocates memory for a Pthreads N_Vector. Arguments are the vector length and number of threads.

N_Vector N_VNewEmpty_Pthreads(sunindextype vec_length, int num_threads)
This function creates a new Pthreads N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Pthreads(sunindextype vec_length, realtype* v_data, int num_threads)
This function creates and allocates memory for a Pthreads vector with user-provided data array, v_data. (This function does not allocate memory for v_data itself.)

N_Vector* N_VCloneVectorArray_Pthreads(int count, N_Vector w)
This function creates (by cloning) an array of count Pthreads vectors.

N_Vector* N_VCloneVectorArrayEmpty_Pthreads(int count, N_Vector w)
This function creates (by cloning) an array of count Pthreads vectors, each with an empty (NULL) data array.
void N_VDestroyVectorArray_Pthreads(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_Pthreads() or with N_VCloneVectorArrayEmpty_Pthreads().

sunindextype N_VGetLength_Pthreads(N_Vector v)
This function returns the number of vector elements.

void N_VPrint_Pthreads(N_Vector v)
This function prints the content of a Pthreads vector to stdout.

void N_VPrintFile_Pthreads(N_Vector v, FILE *outfile)
This function prints the content of a Pthreads vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_PTHREADS module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Pthreads(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Pthreads() will have the default settings for the NVECTOR_PTHREADS module.

int N_VEnableFusedOps_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableScaleAddMulti_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the Pthreads vector.
The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes
• When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = NV_DATA_PT(v) and then access v_data[i] within the loop than it is to use NV_Ith_PT(v,i) within the loop.
• N_VNewEmpty_Pthreads(), N_VMake_Pthreads(), and N_VCloneVectorArrayEmpty_Pthreads() set the field own_data to SUNFALSE. The functions N_VDestroy_Pthreads() and N_VDestroyVectorArray_Pthreads() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.
• To maximize efficiency, vector operations in the NVECTOR_PTHREADS implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

For solvers that include a Fortran interface module, the NVECTOR_PTHREADS module also includes a Fortran-callable function FNVINITPTS(code, NEQ, NUMTHREADS, IER), to initialize this NVECTOR_PTHREADS module.
Here code is an input solver id (1 for CVODE, 2 for IDA, 3 for KINSOL, 4 for ARKode); NEQ is the problem size (declared so as to match C type long int); NUMTHREADS is the number of threads; and IER is an error return flag equal to 0 for success and -1 for failure.

9.7 The NVECTOR_PARHYP Module

The NVECTOR_PARHYP implementation of the NVECTOR module provided with SUNDIALS is a wrapper around HYPRE’s ParVector class. Most of the vector kernels simply call HYPRE vector operations. The implementation defines the content field of N_Vector to be a structure containing the global and local lengths of the vector, a pointer to an object of type hypre_ParVector, an MPI communicator, and a boolean flag own_parvector indicating ownership of the HYPRE parallel vector object x.

struct _N_VectorContent_ParHyp {
  sunindextype local_length;
  sunindextype global_length;
  booleantype own_data;
  booleantype own_parvector;
  realtype *data;
  MPI_Comm comm;
  hypre_ParVector *x;
};

The header file to be included when using this module is nvector_parhyp.h. Unlike native SUNDIALS vector types, NVECTOR_PARHYP does not provide macros to access its member variables.

The NVECTOR_PARHYP module defines implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations, except for N_VSetArrayPointer and N_VGetArrayPointer, because accessing raw vector data is handled by low-level HYPRE functions. As such, this vector is not available for use with SUNDIALS Fortran interfaces. When access to raw vector data is needed, one should extract the HYPRE vector first, and then use HYPRE methods to access the data. Usage examples of NVECTOR_PARHYP are provided in the cvAdvDiff_non_ph.c example program for CVODE and the ark_diurnal_kry_ph.c example program for ARKode.
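The wrapper layout and the recommended access path (extract the wrapped vector first, then use the wrapped library's own accessors) can be illustrated without HYPRE or MPI installed. The following is only a sketch under stated assumptions: DemoParVector, DemoContent and the demo_* functions are hypothetical stand-ins invented for this illustration, the typedefs replace the SUNDIALS headers, and none of this code is part of NVECTOR_PARHYP or HYPRE.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles without SUNDIALS, HYPRE or MPI. */
typedef double realtype;
typedef long   sunindextype;
typedef int    booleantype;

/* Hypothetical external vector object; it plays the role of hypre_ParVector. */
typedef struct {
  sunindextype local_size;
  realtype    *local_data;
} DemoParVector;

/* Wrapper content: lengths, an ownership flag, and a pointer to the wrapped object. */
typedef struct {
  sunindextype   local_length;
  sunindextype   global_length;
  booleantype    own_parvector;
  DemoParVector *x;
} DemoContent;

/* Hypothetical accessor: hand back the wrapped object (NULL if there is none). */
DemoParVector *demo_get_vector(DemoContent *c)
{
  return (c == NULL) ? NULL : c->x;
}

/* Hypothetical accessor that the external library would supply for its own data. */
realtype *demo_parvector_local_data(DemoParVector *x)
{
  return (x == NULL) ? NULL : x->local_data;
}

/* Sum the local entries by going through the external library's accessor,
   never by reaching into the wrapper for a raw array. */
realtype demo_local_sum(DemoContent *c)
{
  DemoParVector *x = demo_get_vector(c);
  realtype *data = demo_parvector_local_data(x);
  realtype sum = 0.0;
  sunindextype i;

  if (data == NULL) return sum;
  for (i = 0; i < x->local_size; i++) sum += data[i];
  return sum;
}
```

The wrapper in this sketch never exposes a raw array of its own, which mirrors the absence of N_VGetArrayPointer and N_VSetArrayPointer in this module.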
The names of parhyp methods are obtained from those in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations by appending the suffix _ParHyp (e.g. N_VDestroy_ParHyp). The module NVECTOR_PARHYP provides the following additional user-callable routines:

N_Vector N_VNewEmpty_ParHyp(MPI_Comm comm, sunindextype local_length, sunindextype global_length)
This function creates a new parhyp N_Vector with the pointer to the HYPRE vector set to NULL.

N_Vector N_VMake_ParHyp(hypre_ParVector *x)
This function creates an N_Vector wrapper around an existing HYPRE parallel vector. It does not allocate memory for x itself.

hypre_ParVector *N_VGetVector_ParHyp(N_Vector v)
This function returns a pointer to the underlying HYPRE vector.

N_Vector* N_VCloneVectorArray_ParHyp(int count, N_Vector w)
This function creates (by cloning) an array of count parhyp vectors.

N_Vector* N_VCloneVectorArrayEmpty_ParHyp(int count, N_Vector w)
This function creates (by cloning) an array of count parhyp vectors, each with an empty (NULL) data array.

void N_VDestroyVectorArray_ParHyp(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_ParHyp() or with N_VCloneVectorArrayEmpty_ParHyp().

void N_VPrint_ParHyp(N_Vector v)
This function prints the local content of a parhyp vector to stdout.

void N_VPrintFile_ParHyp(N_Vector v, FILE *outfile)
This function prints the local content of a parhyp vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_PARHYP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector.
To ensure consistency across vectors it is recommended to first create a vector with N_VMake_ParHyp(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VMake_ParHyp() will have the default settings for the NVECTOR_PARHYP module.

int N_VEnableFusedOps_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableScaleVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
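The inheritance rule stated before this list (cloned vectors keep the enable/disable settings of their source, while newly constructed vectors start from the module defaults) can be illustrated with a small stand-alone model. The following is only a sketch: DemoVector, DemoOps and the demo_* functions are hypothetical names written for this illustration, not the NVECTOR_PARHYP implementation, and representing a disabled operation by a NULL function pointer is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the SUNDIALS boolean type and constants. */
typedef int booleantype;
#define SUNTRUE  1
#define SUNFALSE 0

typedef struct DemoVector_ *DemoVector;

/* Operations table: a NULL entry models a disabled optional operation. */
typedef struct {
  int (*linearcombination)(int nvec, DemoVector *X, DemoVector z);
} DemoOps;

struct DemoVector_ {
  DemoOps *ops;
};

/* Placeholder kernel; a real one would compute z as a combination of X[0..nvec-1]. */
static int demo_linear_combination(int nvec, DemoVector *X, DemoVector z)
{
  (void)nvec; (void)X; (void)z;
  return 0;
}

/* New vectors start from the default setting: the fused operation is disabled. */
DemoVector demo_new(void)
{
  DemoVector v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->ops = malloc(sizeof *v->ops);
  if (v->ops == NULL) { free(v); return NULL; }
  v->ops->linearcombination = NULL;
  return v;
}

/* Clones copy the operations table, so they inherit the enable/disable state. */
DemoVector demo_clone(DemoVector w)
{
  DemoVector v;
  if (w == NULL || w->ops == NULL) return NULL;
  v = demo_new();
  if (v == NULL) return NULL;
  *v->ops = *w->ops;
  return v;
}

/* Returns 0 on success and -1 if the vector or its ops table is NULL. */
int demo_enable_linear_combination(DemoVector v, booleantype tf)
{
  if (v == NULL || v->ops == NULL) return -1;
  v->ops->linearcombination = tf ? demo_linear_combination : NULL;
  return 0;
}

void demo_destroy(DemoVector v)
{
  if (v == NULL) return;
  free(v->ops);
  free(v);
}
```

In this model, enabling the operation on one vector and then calling demo_clone() yields a vector with the operation enabled, whereas a second call to demo_new() yields one with it disabled, which is why the text recommends configuring a single template vector and cloning from it.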
Notes
• When there is a need to access components of an N_Vector_ParHyp v, it is recommended to extract the HYPRE vector via x_vec = N_VGetVector_ParHyp(v) and then access components using appropriate HYPRE functions.
• N_VNewEmpty_ParHyp(), N_VMake_ParHyp(), and N_VCloneVectorArrayEmpty_ParHyp() set the field own_parvector to SUNFALSE. The functions N_VDestroy_ParHyp() and N_VDestroyVectorArray_ParHyp() will not attempt to delete an underlying HYPRE vector for any N_Vector with own_parvector set to SUNFALSE. In such a case, it is the user’s responsibility to delete the underlying vector.
• To maximize efficiency, vector operations in the NVECTOR_PARHYP implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

9.8 The NVECTOR_PETSC Module

The NVECTOR_PETSC module is an NVECTOR wrapper around the PETSc vector. It defines the content field of an N_Vector to be a structure containing the global and local lengths of the vector, a pointer to the PETSc vector, an MPI communicator, and a boolean flag own_data indicating ownership of the wrapped PETSc vector.

struct _N_VectorContent_Petsc {
  sunindextype local_length;
  sunindextype global_length;
  booleantype own_data;
  Vec *pvec;
  MPI_Comm comm;
};

The header file to be included when using this module is nvector_petsc.h. Unlike native SUNDIALS vector types, NVECTOR_PETSC does not provide macros to access its member variables. Note that NVECTOR_PETSC requires SUNDIALS to be built with MPI support.

The NVECTOR_PETSC module defines implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations, except for N_VGetArrayPointer and N_VSetArrayPointer.
As such, this vector cannot be used with SUNDIALS Fortran interfaces. When access to raw vector data is needed, it is recommended to extract the PETSc vector first, and then use PETSc methods to access the data. Usage examples of NVECTOR_PETSC are provided in example programs for IDA.

The names of vector operations are obtained from those in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations by appending the suffix _Petsc (e.g. N_VDestroy_Petsc). The module NVECTOR_PETSC provides the following additional user-callable routines:

N_Vector N_VNewEmpty_Petsc(MPI_Comm comm, sunindextype local_length, sunindextype global_length)
This function creates a new PETSc N_Vector with the pointer to the wrapped PETSc vector set to NULL. It is used by the N_VMake_Petsc and N_VClone_Petsc implementations. It should be used only with great caution.

N_Vector N_VMake_Petsc(Vec* pvec)
This function creates and allocates memory for an NVECTOR_PETSC wrapper with a user-provided PETSc vector. It does not allocate memory for the vector pvec itself.

Vec *N_VGetVector_Petsc(N_Vector v)
This function returns a pointer to the underlying PETSc vector.

N_Vector* N_VCloneVectorArray_Petsc(int count, N_Vector w)
This function creates (by cloning) an array of count NVECTOR_PETSC vectors.

N_Vector* N_VCloneVectorArrayEmpty_Petsc(int count, N_Vector w)
This function creates (by cloning) an array of count NVECTOR_PETSC vectors, each with pointers to PETSc vectors set to NULL.

void N_VDestroyVectorArray_Petsc(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_Petsc() or with N_VCloneVectorArrayEmpty_Petsc().

void N_VPrint_Petsc(N_Vector v)
This function prints the global content of a wrapped PETSc vector to stdout.
void N_VPrintFile_Petsc(N_Vector v, const char fname[])
This function prints the global content of a wrapped PETSc vector to fname.

By default all fused and vector array operations are disabled in the NVECTOR_PETSC module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VMake_Petsc(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VMake_Petsc() will have the default settings for the NVECTOR_PETSC module.

int N_VEnableFusedOps_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableLinearSumVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the PETSc vector.
The return value is 0 for success and -1 if the input vector or its ops structure are NULL. Notes • When there is a need to access components of an N_Vector_Petsc v, it is recommeded to extract the PETSc vector via x_vec = N_VGetVector_Petsc(v); and then access components using appropriate PETSc functions. • The functions N_VNewEmpty_Petsc(), N_VMake_Petsc(), and N_VCloneVectorArrayEmpty_Petsc() set the field own_data to SUNFALSE. The routines N_VDestroy_Petsc() and N_VDestroyVectorArray_Petsc() will not attempt to free the pointer pvec for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the pvec pointer. • To maximize efficiency, vector operations in the NVECTOR_PETSC implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations. 9.9 The NVECTOR_CUDA Module The NVECTOR_CUDA module is an experimental NVECTOR implementation in the CUDA language. This module allows for SUNDIALS vector kernels to run on GPU devices. It is intended for users who are already familiar with CUDA and GPU programming. Building this vector module requires a CUDA compiler and, by extension, C++ compiler. The class Vector in the namespace suncudavec manages the vector data layout. template class Vector { I size_; I mem_size_; I global_size_; T* h_vec_; T* d_vec_; ThreadPartitioning * partStream_; ThreadPartitioning * partReduce_; 240 Chapter 9. Vector Data Structures User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), bool ownPartitioning_; bool ownData_; bool managed_mem_; SUNMPI_Comm comm_; ... 
};

The class members are the vector size (length), the size of the vector data memory block, the global vector size (length), pointers to vector data on the host and the device, pointers to classes StreamPartitioning and ReducePartitioning, which handle thread partitioning for streaming and reduction vector kernels, respectively, a boolean flag that signals if the vector owns the thread partitioning, a boolean flag that signals if the vector owns the data, a boolean flag that signals if managed memory is used for the data arrays, and the MPI communicator. The class Vector inherits from the empty structure

struct _N_VectorContent_Cuda { };

to interface the C++ class with the N_Vector C code. Due to the rapid progress of CUDA development, we expect that the suncudavec::Vector class will change frequently in future SUNDIALS releases. The code is structured so that it can tolerate significant changes in the suncudavec::Vector class without requiring changes to the user API.

When instantiated, the class Vector will allocate memory on both the host and the device by default. Optionally, managed memory can be allocated instead (see N_VNewManaged_Cuda), or a user can provide data arrays (see N_VMake_Cuda and N_VMakeManaged_Cuda).

The NVECTOR_CUDA module can be utilized for single-node parallelism or in a distributed context with MPI. The header file to include when using this module for single-node parallelism is nvector_cuda.h. The header file to include when using this module in the distributed case is nvector_mpicuda.h. The installed module libraries to link to are libsundials_nveccuda.lib in the single-node case, or libsundials_nvecmpicuda.lib in the distributed case. Only one of these libraries may be linked to when creating an executable or library. SUNDIALS must be built with MPI support if the distributed library is desired.

Unlike other native SUNDIALS vector types, the NVECTOR_CUDA module does not provide macros to access its member variables.
Instead, the user should use the accessor functions:

sunindextype N_VGetLength_Cuda(N_Vector v)
This function returns the global length of the vector.

sunindextype N_VGetLocalLength_Cuda(N_Vector v)
This function returns the local length of the vector. Note: This function is for use in a distributed context; it is defined in the header nvector_mpicuda.h, and the library to link to is libsundials_nvecmpicuda.lib.

realtype* N_VGetHostArrayPointer_Cuda(N_Vector v)
This function returns a pointer to the vector data on the host.

realtype* N_VGetDeviceArrayPointer_Cuda(N_Vector v)
This function returns a pointer to the vector data on the device.

MPI_Comm N_VGetMPIComm_Cuda(N_Vector v)
This function returns the MPI communicator for the vector. Note: This function is for use in a distributed context; it is defined in the header nvector_mpicuda.h, and the library to link to is libsundials_nvecmpicuda.lib.

booleantype N_VIsManagedMemory_Cuda(N_Vector v)
This function returns a boolean flag indicating whether the vector data array is in managed memory.

The NVECTOR_CUDA module defines implementations of all standard vector operations defined in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations, and Description of the NVECTOR vector array operations, except for N_VGetArrayPointer and N_VSetArrayPointer. As such, this vector cannot be used with SUNDIALS Fortran interfaces, nor with SUNDIALS direct solvers and preconditioners. This support will be added in subsequent SUNDIALS releases. The NVECTOR_CUDA module provides separate functions to access data on the host and on the device. It also provides methods for copying from the host to the device and vice versa. Usage examples of NVECTOR_CUDA are provided in example programs for CVODE [HSR2017].
The names of vector operations are obtained from those in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations by appending the suffix _Cuda (e.g. N_VDestroy_Cuda). The module NVECTOR_CUDA provides the following additional user-callable routines:

N_Vector N_VNew_Cuda(sunindextype length)
N_Vector N_VNew_Cuda(MPI_Comm comm, sunindextype local_length, sunindextype global_length)
This function creates and allocates memory for a CUDA N_Vector. The vector data array is allocated on both the host and device. In the single-node setting, the only input is the vector length. This constructor is defined in the header nvector_cuda.h and the library to link to is libsundials_nveccuda.lib. When used in a distributed context with MPI, the arguments are the MPI communicator, the local vector length, and the global vector length. This constructor is defined in the header nvector_mpicuda.h and the library to link to is libsundials_nvecmpicuda.lib.

N_Vector N_VNewManaged_Cuda(sunindextype vec_length)
N_Vector N_VNewManaged_Cuda(MPI_Comm comm, sunindextype local_length, sunindextype global_length)
This function creates and allocates memory for a CUDA N_Vector. The vector data array is allocated in managed memory. When used in the single-node setting, the only input is the vector length. This constructor is defined in the header nvector_cuda.h and the library to link to is libsundials_nveccuda.lib. When used in a distributed context with MPI, the arguments are the MPI communicator, the local vector length, and the global vector length. This constructor is defined in the header nvector_mpicuda.h and the library to link to is libsundials_nvecmpicuda.lib.

N_Vector N_VNewEmpty_Cuda(sunindextype vec_length)
This function creates a new N_Vector wrapper with the pointer to the wrapped CUDA vector set to NULL. It is used by the N_VNew_Cuda(), N_VMake_Cuda(), and N_VClone_Cuda() implementations.
N_Vector N_VMake_Cuda(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata)
N_Vector N_VMake_Cuda(MPI_Comm comm, sunindextype global_length, sunindextype local_length, realtype *h_vdata, realtype *d_vdata)
This function creates a CUDA N_Vector with user-supplied vector data arrays for the host and the device. When used in the single-node setting, the arguments are the vector length, the host data array, and the device data array. This constructor is defined in the header nvector_cuda.h and the library to link to is libsundials_nveccuda.lib. When used in a distributed context with MPI, the arguments are the MPI communicator, the global vector length, the local vector length, the host data array, and the device data array. This constructor is defined in the header nvector_mpicuda.h and the library to link to is libsundials_nvecmpicuda.lib.

N_Vector N_VMakeManaged_Cuda(sunindextype vec_length, realtype *vdata)
N_Vector N_VMakeManaged_Cuda(MPI_Comm comm, sunindextype global_length, sunindextype local_length, realtype *vdata)
This function creates a CUDA N_Vector with a user-supplied managed memory data array. When used in the single-node setting, the arguments are the vector length and the managed data array. This constructor is defined in the header nvector_cuda.h and the library to link to is libsundials_nveccuda.lib. When used in a distributed context with MPI, the arguments are the MPI communicator, the global vector length, the local vector length, and the managed data array. This constructor is defined in the header nvector_mpicuda.h and the library to link to is libsundials_nvecmpicuda.lib.

N_Vector* N_VCloneVectorArray_Cuda(int count, N_Vector w)
This function creates (by cloning) an array of count NVECTOR_CUDA vectors.
N_Vector* N_VCloneVectorArrayEmpty_Cuda(int count, N_Vector w)
This function creates (by cloning) an array of count NVECTOR_CUDA vectors, each with pointers to CUDA vectors set to NULL.

void N_VDestroyVectorArray_Cuda(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_Cuda() or with N_VCloneVectorArrayEmpty_Cuda().

realtype* N_VCopyToDevice_Cuda(N_Vector v)
This function copies host vector data to the device.

realtype* N_VCopyFromDevice_Cuda(N_Vector v)
This function copies vector data from the device to the host.

void N_VPrint_Cuda(N_Vector v)
This function prints the content of a CUDA vector to stdout.

void N_VPrintFile_Cuda(N_Vector v, FILE *outfile)
This function prints the content of a CUDA vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_CUDA module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Cuda(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Cuda() will have the default settings for the NVECTOR_CUDA module.

int N_VEnableFusedOps_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableLinearCombination_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the CUDA vector.
The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes

• When there is a need to access components of an N_Vector_Cuda v, it is recommended to use the functions N_VGetDeviceArrayPointer_Cuda() or N_VGetHostArrayPointer_Cuda().
• To maximize efficiency, vector operations in the NVECTOR_CUDA implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

9.10 The NVECTOR_RAJA Module

The NVECTOR_RAJA module is an experimental implementation of N_Vector using the RAJA hardware abstraction layer, https://software.llnl.gov/RAJA/. In this implementation, RAJA allows SUNDIALS vector kernels to run on GPU devices. The module is intended for users who are already familiar with RAJA and GPU programming. Building this vector module requires a C++11 compliant compiler and a CUDA software development toolkit.
Besides the CUDA backend, RAJA has other backends such as serial, OpenMP and OpenACC. These backends are not used in this SUNDIALS release. Class Vector in namespace sunrajavec manages the vector data layout:

template <typename T, typename I>
class Vector {
  I size_;
  I mem_size_;
  I global_size_;
  T* h_vec_;
  T* d_vec_;
  SUNMPI_Comm comm_;
  ...
};

The class members are: vector size (length), size of the vector data memory block, the global vector size (length), pointers to vector data on the host and on the device, and the MPI communicator. The class Vector inherits from an empty structure

struct _N_VectorContent_Raja { };

to interface the C++ class with the N_Vector C code. When instantiated, the class Vector will allocate memory on both the host and the device. Due to the rapid progress of RAJA development, we expect that the sunrajavec::Vector class will change frequently in future SUNDIALS releases. The code is structured so that it can tolerate significant changes in the sunrajavec::Vector class without requiring changes to the user API.

The NVECTOR_RAJA module can be utilized for single-node parallelism or in a distributed context with MPI. The header file to include when using this module for single-node parallelism is nvector_raja.h. The header file to include when using this module in the distributed case is nvector_mpiraja.h. The installed module libraries to link to are libsundials_nveccudaraja.lib in the single-node case, or libsundials_nveccudampiraja.lib in the distributed case. Only one of these libraries may be linked to when creating an executable or library. SUNDIALS must be built with MPI support if the distributed library is desired.

Unlike other native SUNDIALS vector types, the NVECTOR_RAJA module does not provide macros to access its member variables.
Instead, the user should use the accessor functions:

sunindextype N_VGetLength_Raja(N_Vector v)
This function returns the global length of the vector.

sunindextype N_VGetLocalLength_Raja(N_Vector v)
This function returns the local length of the vector. Note: This function is for use in a distributed context; it is defined in the header nvector_mpiraja.h, and the library to link to is libsundials_nveccudampiraja.lib.

realtype* N_VGetHostArrayPointer_Raja(N_Vector v)
This function returns a pointer to the vector data on the host.

realtype* N_VGetDeviceArrayPointer_Raja(N_Vector v)
This function returns a pointer to the vector data on the device.

MPI_Comm N_VGetMPIComm_Raja(N_Vector v)
This function returns the MPI communicator for the vector. Note: This function is for use in a distributed context; it is defined in the header nvector_mpiraja.h, and the library to link to is libsundials_nveccudampiraja.lib.

booleantype N_VIsManagedMemory_Raja(N_Vector v)
This function returns a boolean flag indicating whether the vector data array is in managed memory.

The NVECTOR_RAJA module defines the implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations, except for N_VDotProdMulti, N_VWrmsNormVectorArray, and N_VWrmsNormMaskVectorArray, as arrays of reduction vectors are not yet supported in RAJA. These functions will be added to the NVECTOR_RAJA implementation in the future. Additionally, the operations N_VGetArrayPointer and N_VSetArrayPointer are not implemented by the RAJA vector. As such, this vector cannot be used with SUNDIALS Fortran interfaces, nor with SUNDIALS direct solvers and preconditioners. The NVECTOR_RAJA module provides separate functions to access data on the host and on the device. It also provides methods for copying from the host to the device and vice versa.
Usage examples of NVECTOR_RAJA are provided in some example programs for CVODE [HSR2017].

The names of vector operations are obtained from those in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations and Description of the NVECTOR vector array operations by appending the suffix _Raja (e.g. N_VDestroy_Raja). The module NVECTOR_RAJA provides the following additional user-callable routines:

N_Vector N_VNew_Raja(sunindextype vec_length)
This function creates and allocates memory for a RAJA N_Vector. The memory is allocated on both the host and the device. Its only argument is the vector length.

N_Vector N_VNewEmpty_Raja(sunindextype vec_length)
This function creates a new N_Vector wrapper with the pointer to the wrapped RAJA vector set to NULL. It is used by the N_VNew_Raja(), N_VMake_Raja(), and N_VClone_Raja() implementations.

N_Vector N_VMake_Raja(N_VectorContent_Raja c)
This function creates and allocates memory for an NVECTOR_RAJA wrapper around a user-provided sunrajavec::Vector class. Its only argument is of type N_VectorContent_Raja, which is the pointer to the class.

N_Vector* N_VCloneVectorArray_Raja(int count, N_Vector w)
This function creates (by cloning) an array of count NVECTOR_RAJA vectors.

N_Vector* N_VCloneVectorArrayEmpty_Raja(int count, N_Vector w)
This function creates (by cloning) an array of count NVECTOR_RAJA vectors, each with pointers to RAJA vectors set to NULL.

void N_VDestroyVectorArray_Raja(N_Vector* vs, int count)
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_Raja() or with N_VCloneVectorArrayEmpty_Raja().

realtype* N_VCopyToDevice_Raja(N_Vector v)
This function copies host vector data to the device.

realtype* N_VCopyFromDevice_Raja(N_Vector v)
This function copies vector data from the device to the host.
void N_VPrint_Raja(N_Vector v)
This function prints the content of a RAJA vector to stdout.

void N_VPrintFile_Raja(N_Vector v, FILE *outfile)
This function prints the content of a RAJA vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_RAJA module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Raja(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Raja() will have the default settings for the NVECTOR_RAJA module.

int N_VEnableFusedOps_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the RAJA vector.
The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes

• When there is a need to access components of an N_Vector_Raja v, it is recommended to use the functions N_VGetDeviceArrayPointer_Raja() or N_VGetHostArrayPointer_Raja().
• To maximize efficiency, vector operations in the NVECTOR_RAJA implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.11 The NVECTOR_OPENMPDEV Module

In situations where a user has access to a device such as a GPU for offloading computation, SUNDIALS provides an NVECTOR implementation using OpenMP device offloading, called NVECTOR_OPENMPDEV. The NVECTOR_OPENMPDEV implementation defines the content field of the N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array on the host, a pointer to the beginning of a contiguous data array on the device, and a boolean flag own_data which specifies the ownership of the host and device data arrays.

struct _N_VectorContent_OpenMPDEV {
  sunindextype length;
  booleantype own_data;
  realtype *host_data;
  realtype *dev_data;
};

The header file to include when using this module is nvector_openmpdev.h. The installed module library to link to is libsundials_nvecopenmpdev.lib where .lib is typically .so for shared libraries and .a for static libraries.

The following macros are provided to access the content of an NVECTOR_OPENMPDEV vector.

NV_CONTENT_OMPDEV(v)
This macro gives access to the contents of the NVECTOR_OPENMPDEV N_Vector v. The assignment v_cont = NV_CONTENT_OMPDEV(v) sets v_cont to be a pointer to the NVECTOR_OPENMPDEV content structure. Implementation:

#define NV_CONTENT_OMPDEV(v) ( (N_VectorContent_OpenMPDEV)(v->content) )

NV_OWN_DATA_OMPDEV(v)
Access the own_data component of the OpenMPDEV N_Vector v. Implementation:

#define NV_OWN_DATA_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->own_data )

NV_DATA_HOST_OMPDEV(v)
The assignment v_data = NV_DATA_HOST_OMPDEV(v) sets v_data to be a pointer to the first component of the data on the host for the N_Vector v. The assignment NV_DATA_HOST_OMPDEV(v) = v_data sets the host component array of v to be v_data by storing the pointer v_data.
Implementation:

#define NV_DATA_HOST_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->host_data )

NV_DATA_DEV_OMPDEV(v)
The assignment v_dev_data = NV_DATA_DEV_OMPDEV(v) sets v_dev_data to be a pointer to the first component of the data on the device for the N_Vector v. The assignment NV_DATA_DEV_OMPDEV(v) = v_dev_data sets the device component array of v to be v_dev_data by storing the pointer v_dev_data. Implementation:

#define NV_DATA_DEV_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->dev_data )

NV_LENGTH_OMPDEV(v)
Access the length component of the OpenMPDEV N_Vector v. The assignment v_len = NV_LENGTH_OMPDEV(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_OMPDEV(v) = len_v sets the length of v to be len_v. Implementation:

#define NV_LENGTH_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->length )

The NVECTOR_OPENMPDEV module defines OpenMP device offloading implementations of all vector operations listed in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations, and Description of the NVECTOR vector array operations, except for N_VGetArrayPointer and N_VSetArrayPointer. As such, this vector cannot be used with the SUNDIALS Fortran interfaces, nor with the SUNDIALS direct solvers and preconditioners. It also provides methods for copying from the host to the device and vice versa.

The names of the vector operations are obtained from those in the sections Description of the NVECTOR operations, Description of the NVECTOR fused operations, and Description of the NVECTOR vector array operations by appending the suffix _OpenMPDEV (e.g. N_VDestroy_OpenMPDEV). The module NVECTOR_OPENMPDEV provides the following additional user-callable routines:

N_Vector N_VNew_OpenMPDEV(sunindextype vec_length);
This function creates and allocates memory for an NVECTOR_OPENMPDEV N_Vector.

N_Vector N_VNewEmpty_OpenMPDEV(sunindextype vec_length);
This function creates a new NVECTOR_OPENMPDEV N_Vector with an empty (NULL) data array.
N_Vector N_VMake_OpenMPDEV(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata);
This function creates an NVECTOR_OPENMPDEV vector with user-supplied vector data arrays h_vdata and d_vdata. This function does not allocate memory for the data itself.

N_Vector *N_VCloneVectorArray_OpenMPDEV(int count, N_Vector w);
This function creates (by cloning) an array of count NVECTOR_OPENMPDEV vectors.

N_Vector *N_VCloneVectorArrayEmpty_OpenMPDEV(int count, N_Vector w);
This function creates (by cloning) an array of count NVECTOR_OPENMPDEV vectors, each with an empty (NULL) data array.

void N_VDestroyVectorArray_OpenMPDEV(N_Vector *vs, int count);
This function frees memory allocated for the array of count variables of type N_Vector created with N_VCloneVectorArray_OpenMPDEV or with N_VCloneVectorArrayEmpty_OpenMPDEV.

sunindextype N_VGetLength_OpenMPDEV(N_Vector v);
This function returns the number of vector elements.

realtype *N_VGetHostArrayPointer_OpenMPDEV(N_Vector v);
This function returns a pointer to the host data array.

realtype *N_VGetDeviceArrayPointer_OpenMPDEV(N_Vector v);
This function returns a pointer to the device data array.

void N_VPrint_OpenMPDEV(N_Vector v);
This function prints the content of an NVECTOR_OPENMPDEV vector to stdout.

void N_VPrintFile_OpenMPDEV(N_Vector v, FILE *outfile);
This function prints the content of an NVECTOR_OPENMPDEV vector to outfile.

void N_VCopyToDevice_OpenMPDEV(N_Vector v);
This function copies the content of an NVECTOR_OPENMPDEV vector’s host data array to the device data array.

void N_VCopyFromDevice_OpenMPDEV(N_Vector v);
This function copies the content of an NVECTOR_OPENMPDEV vector’s device data array to the host data array.

By default all fused and vector array operations are disabled in the NVECTOR_OPENMPDEV module.
The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_OpenMPDEV, enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone. This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_OpenMPDEV will have the default settings for the NVECTOR_OPENMPDEV module.

int N_VEnableFusedOps_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableLinearSumVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
int N_VEnableLinearCombinationVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
    This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

Notes
• When looping over the components of an N_Vector v, it is most efficient to first obtain the component array via h_data = NV_DATA_HOST_OMPDEV(v) for the host array or v_data = NV_DATA_DEV_OMPDEV(v) for the device array, and then access h_data[i] or v_data[i] within the loop.
• When accessing individual components of an N_Vector v on the host, remember to first copy the array back from the device with N_VCopyFromDevice_OpenMPDEV(v) to ensure the array is up to date.
• N_VNewEmpty_OpenMPDEV(), N_VMake_OpenMPDEV(), and N_VCloneVectorArrayEmpty_OpenMPDEV() set the field own_data to SUNFALSE. The functions N_VDestroy_OpenMPDEV() and N_VDestroyVectorArray_OpenMPDEV() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointers.
• To maximize efficiency, vector operations in the NVECTOR_OPENMPDEV implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same length.

9.12 NVECTOR Examples

There are NVECTOR examples that may be installed for each implementation: serial, parallel, OpenMP, and Pthreads. Each implementation makes use of the functions in test_nvector.c. These example functions show simple usage of the NVECTOR family of functions. The inputs to the examples are the vector length, the number of threads (for a threaded implementation), and a print timing flag.
The following is a list of the example functions in test_nvector.c:
• Test_N_VClone: Creates clone of vector and checks validity of clone.
• Test_N_VCloneEmpty: Creates clone of empty vector and checks validity of clone.
• Test_N_VCloneVectorArray: Creates clone of vector array and checks validity of cloned array.
• Test_N_VCloneEmptyVectorArray: Creates clone of empty vector array and checks validity of cloned array.
• Test_N_VGetArrayPointer: Get array pointer.
• Test_N_VSetArrayPointer: Allocate new vector, set pointer to new vector array, and check values.
• Test_N_VLinearSum Case 1a: Test y = x + y
• Test_N_VLinearSum Case 1b: Test y = -x + y
• Test_N_VLinearSum Case 1c: Test y = ax + y
• Test_N_VLinearSum Case 2a: Test x = x + y
• Test_N_VLinearSum Case 2b: Test x = x - y
• Test_N_VLinearSum Case 2c: Test x = x + by
• Test_N_VLinearSum Case 3: Test z = x + y
• Test_N_VLinearSum Case 4a: Test z = x - y
• Test_N_VLinearSum Case 4b: Test z = -x + y
• Test_N_VLinearSum Case 5a: Test z = x + by
• Test_N_VLinearSum Case 5b: Test z = ax + y
• Test_N_VLinearSum Case 6a: Test z = -x + by
• Test_N_VLinearSum Case 6b: Test z = ax - y
• Test_N_VLinearSum Case 7: Test z = a(x + y)
• Test_N_VLinearSum Case 8: Test z = a(x - y)
• Test_N_VLinearSum Case 9: Test z = ax + by
• Test_N_VConst: Fill vector with constant and check result.
• Test_N_VProd: Test vector multiply: z = x * y
• Test_N_VDiv: Test vector division: z = x / y
• Test_N_VScale: Case 1: scale: x = cx
• Test_N_VScale: Case 2: copy: z = x
• Test_N_VScale: Case 3: negate: z = -x
• Test_N_VScale: Case 4: combination: z = cx
• Test_N_VAbs: Create absolute value of vector.
• Test_N_VAddConst: Add constant to vector: z = c + x
• Test_N_VDotProd: Calculate dot product of two vectors.
• Test_N_VMaxNorm: Create vector with known values, find and validate the max norm.
• Test_N_VWrmsNorm: Create vector of known values, find and validate the weighted root mean square.
• Test_N_VWrmsNormMask: Create vector of known values, find and validate the weighted root mean square using all elements except one.
• Test_N_VMin: Create vector, find and validate the min.
• Test_N_VWL2Norm: Create vector, find and validate the weighted Euclidean L2 norm.
• Test_N_VL1Norm: Create vector, find and validate the L1 norm.
• Test_N_VCompare: Compare vector with constant returning and validating comparison vector.
• Test_N_VInvTest: Test z[i] = 1 / x[i]
• Test_N_VConstrMask: Test mask of vector x with vector c.
• Test_N_VMinQuotient: Fill two vectors with known values. Calculate and validate minimum quotient.
• Test_N_VLinearCombination: Case 1a: Test x = a x
• Test_N_VLinearCombination: Case 1b: Test z = a x
• Test_N_VLinearCombination: Case 2a: Test x = a x + b y
• Test_N_VLinearCombination: Case 2b: Test z = a x + b y
• Test_N_VLinearCombination: Case 3a: Test x = x + a y + b z
• Test_N_VLinearCombination: Case 3b: Test x = a x + b y + c z
• Test_N_VLinearCombination: Case 3c: Test w = a x + b y + c z
• Test_N_VScaleAddMulti: Case 1a: y = a x + y
• Test_N_VScaleAddMulti: Case 1b: z = a x + y
• Test_N_VScaleAddMulti: Case 2a: Y[i] = c[i] x + Y[i], i = 1,2,3
• Test_N_VScaleAddMulti: Case 2b: Z[i] = c[i] x + Y[i], i = 1,2,3
• Test_N_VDotProdMulti: Case 1: Calculate the dot product of two vectors
• Test_N_VDotProdMulti: Case 2: Calculate the dot product of one vector with three other vectors in a vector array.
• Test_N_VLinearSumVectorArray: Case 1: z = a x + b y
• Test_N_VLinearSumVectorArray: Case 2a: Z[i] = a X[i] + b Y[i]
• Test_N_VLinearSumVectorArray: Case 2b: X[i] = a X[i] + b Y[i]
• Test_N_VLinearSumVectorArray: Case 2c: Y[i] = a X[i] + b Y[i]
• Test_N_VScaleVectorArray: Case 1a: y = c y
• Test_N_VScaleVectorArray: Case 1b: z = c y
• Test_N_VScaleVectorArray: Case 2a: Y[i] = c[i] Y[i]
• Test_N_VScaleVectorArray: Case 2b: Z[i] = c[i] Y[i]
• Test_N_VConstVectorArray: Case 1a: z = c
• Test_N_VConstVectorArray: Case 1b: Z[i] = c
• Test_N_VWrmsNormVectorArray: Case 1a: Create a vector of known values, find and validate the weighted root mean square norm.
• Test_N_VWrmsNormVectorArray: Case 1b: Create a vector array of three vectors of known values, find and validate the weighted root mean square norm of each.
• Test_N_VWrmsNormMaskVectorArray: Case 1a: Create a vector of known values, find and validate the weighted root mean square norm using all elements except one.
• Test_N_VWrmsNormMaskVectorArray: Case 1b: Create a vector array of three vectors of known values, find and validate the weighted root mean square norm of each using all elements except one.
• Test_N_VScaleAddMultiVectorArray: Case 1a: y = a x + y
• Test_N_VScaleAddMultiVectorArray: Case 1b: z = a x + y
• Test_N_VScaleAddMultiVectorArray: Case 2a: Y[j][0] = a[j] X[0] + Y[j][0]
• Test_N_VScaleAddMultiVectorArray: Case 2b: Z[j][0] = a[j] X[0] + Y[j][0]
• Test_N_VScaleAddMultiVectorArray: Case 3a: Y[0][i] = a[0] X[i] + Y[0][i]
• Test_N_VScaleAddMultiVectorArray: Case 3b: Z[0][i] = a[0] X[i] + Y[0][i]
• Test_N_VScaleAddMultiVectorArray: Case 4a: Y[j][i] = a[j] X[i] + Y[j][i]
• Test_N_VScaleAddMultiVectorArray: Case 4b: Z[j][i] = a[j] X[i] + Y[j][i]
• Test_N_VLinearCombinationVectorArray: Case 1a: x = a x
• Test_N_VLinearCombinationVectorArray: Case 1b: z = a x
• Test_N_VLinearCombinationVectorArray: Case 2a: x = a x + b y
• Test_N_VLinearCombinationVectorArray: Case 2b: z = a x + b y
• Test_N_VLinearCombinationVectorArray: Case 3a: x = a x + b y + c z
• Test_N_VLinearCombinationVectorArray: Case 3b: w = a x + b y + c z
• Test_N_VLinearCombinationVectorArray: Case 4a: X[0][i] = c[0] X[0][i]
• Test_N_VLinearCombinationVectorArray: Case 4b: Z[i] = c[0] X[0][i]
• Test_N_VLinearCombinationVectorArray: Case 5a: X[0][i] = c[0] X[0][i] + c[1] X[1][i]
• Test_N_VLinearCombinationVectorArray: Case 5b: Z[i] = c[0] X[0][i] + c[1] X[1][i]
• Test_N_VLinearCombinationVectorArray: Case 6a: X[0][i] = X[0][i] + c[1] X[1][i] + c[2] X[2][i]
• Test_N_VLinearCombinationVectorArray: Case 6b: X[0][i] = c[0] X[0][i] + c[1] X[1][i] + c[2] X[2][i]
• Test_N_VLinearCombinationVectorArray: Case 6c: Z[i] = c[0] X[0][i] + c[1] X[1][i] + c[2] X[2][i]

9.13 NVECTOR functions required by ARKode

In the table below, we list the vector functions in the N_Vector module that are called within the ARKode package. The table also shows, for each function, which ARKode module uses the function.
The ARKSTEP and ERKSTEP columns show function usage within the main time-stepping modules and the shared ARKode infrastructure, while the remaining columns show function usage within the ARKLS linear solver interface, the ARKBANDPRE and ARKBBDPRE preconditioner modules, and the FARKODE module. Since FARKODE is built on top of ARKode, and therefore requires the same N_Vector routines, the FARKODE column lists only the routines that the FARKODE interface directly utilizes.

Note that for ARKLS we only list the N_Vector routines used directly by ARKLS; each SUNLinearSolver module may have additional requirements that are not listed here. In addition, specific SUNNonlinearSolver modules attached to ARKode may have additional N_Vector requirements. For additional requirements by specific SUNLinearSolver and SUNNonlinearSolver modules, please see the accompanying sections Description of the SUNLinearSolver module and Nonlinear Solver Data Structures.

At this point, we should emphasize that the user does not need to know anything about ARKode’s usage of vector functions in order to use ARKode. Instead, this information is provided primarily for users interested in constructing a custom N_Vector module. We note that a number of N_Vector functions from the section Description of the NVECTOR Modules are not listed in the table below. Therefore a user-supplied N_Vector module for ARKode could safely omit these functions from its implementation.

Routine N_VAbs N_VAddConst N_VClone N_VCloneEmpty N_VConst N_VDestroy N_VDiv N_VDotProd N_VGetArrayPointer N_VInv N_VLinearSum N_VMaxNorm N_VMin N_VScale N_VSetArrayPointer N_VSpace[2] N_VWrmsNorm N_VLinearCombination[3]
ARKSTEP X X X ERKSTEP X X X ARKLS X X X X X X X X X X X X X X X X X X X ARKBBDPRE FARKODE X X X[1] X X X X X ARKBANDPRE X X X X X X X X X X X X X X X X[1] X X X

1.
This is only required with dense or band matrix-based linear solver modules, where the default difference-quotient Jacobian approximation is used.
2. The N_VSpace() function is only informational, and will only be called if provided by the N_Vector implementation.
3. The N_VLinearCombination() function is in fact optional; if it is not supplied then N_VLinearSum() will be used instead.

CHAPTER TEN
MATRIX DATA STRUCTURES

The SUNDIALS library comes packaged with a variety of SUNMatrix implementations, designed for simulations requiring direct linear solvers for problems in serial or shared-memory parallel environments. SUNDIALS additionally provides a simple interface for generic matrices (akin to a C++ abstract base class). All of the major SUNDIALS packages (CVODE(s), IDA(s), KINSOL, ARKODE) are constructed to depend only on these generic matrix operations, making them immediately extensible to new user-defined matrix objects. For each of the SUNDIALS-provided matrix types, SUNDIALS also provides at least two SUNLinearSolver implementations that factor these matrix objects and use them in the solution of linear systems.

10.1 Description of the SUNMATRIX Modules

For problems that involve direct methods for solving linear systems, the SUNDIALS solvers not only operate on generic vectors, but also on generic matrices (of type SUNMatrix), through a set of operations defined by the particular SUNMATRIX implementation. Users can provide their own specific implementation of the SUNMATRIX module, particularly in cases where they provide their own N_Vector and/or linear solver modules, and require matrices that are compatible with those implementations. Alternatively, we provide three SUNMATRIX implementations: dense, banded, and sparse. The generic operations are described below, and descriptions of the implementations provided with SUNDIALS follow.
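The "abstract base class" analogy above can be illustrated in plain C. The following is a minimal sketch only: every name in it (ToyMatrix, toy_matrix_ops, ToyDiagMatrix, and so on) is hypothetical and is not part of the SUNDIALS API. It assumes a generic object that carries an implementation-specific content pointer plus a table of function pointers, and shows one possible user-supplied implementation (a diagonal matrix) plugged in behind generic wrappers.

```c
#include <stdlib.h>

/* Hypothetical names throughout: this mirrors the content/ops idea only
   and is not the SUNDIALS SUNMatrix API. */
typedef struct toy_matrix *ToyMatrix;

struct toy_matrix_ops {
  int (*zero)(ToyMatrix);           /* set all entries to zero */
  int (*scale)(double, ToyMatrix);  /* A = c*A */
};

struct toy_matrix {
  void *content;                    /* implementation-specific data */
  const struct toy_matrix_ops *ops; /* implementation-specific operations */
};

/* Generic wrappers: calling code only ever uses these. */
int ToyMatZero(ToyMatrix A) { return A->ops->zero(A); }
int ToyMatScale(double c, ToyMatrix A) { return A->ops->scale(c, A); }

/* One concrete implementation: a diagonal matrix stored as n doubles. */
typedef struct { long n; double *d; } ToyDiagContent;

static int diag_zero(ToyMatrix A)
{
  ToyDiagContent *C = A->content;
  for (long i = 0; i < C->n; i++) C->d[i] = 0.0;
  return 0;
}

static int diag_scale(double c, ToyMatrix A)
{
  ToyDiagContent *C = A->content;
  for (long i = 0; i < C->n; i++) C->d[i] *= c;
  return 0;
}

static const struct toy_matrix_ops diag_ops = { diag_zero, diag_scale };

/* Constructor: allocates the object, its content, and attaches the ops. */
ToyMatrix ToyDiagMatrix(long n)
{
  ToyMatrix A = malloc(sizeof *A);
  ToyDiagContent *C = malloc(sizeof *C);
  double *d = calloc((size_t)n, sizeof *d);
  if (A == NULL || C == NULL || d == NULL) {
    free(A); free(C); free(d);
    return NULL;
  }
  C->n = n;
  C->d = d;
  A->content = C;
  A->ops = &diag_ops;
  return A;
}

/* Destructor: frees everything the constructor allocated. */
void ToyDiagDestroy(ToyMatrix A)
{
  if (A == NULL) return;
  ToyDiagContent *C = A->content;
  free(C->d);
  free(C);
  free(A);
}
```

Code that calls ToyMatZero or ToyMatScale never needs to know which implementation sits behind the object, which is the property that lets a solver work with user-defined matrix types.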
The generic SUNMatrix type has been modeled after the object-oriented style of the generic N_Vector type. Specifically, a generic SUNMatrix is a pointer to a structure that has an implementation-dependent content field containing the description and actual data of the matrix, and an ops field pointing to a structure with generic matrix operations. The type SUNMatrix is defined as:

    typedef struct _generic_SUNMatrix *SUNMatrix;

    struct _generic_SUNMatrix {
      void *content;
      struct _generic_SUNMatrix_Ops *ops;
    };

Here, the _generic_SUNMatrix_Ops structure is essentially a list of function pointers to the various actual matrix operations, and is defined as

    struct _generic_SUNMatrix_Ops {
      SUNMatrix_ID (*getid)(SUNMatrix);
      SUNMatrix    (*clone)(SUNMatrix);
      void         (*destroy)(SUNMatrix);
      int          (*zero)(SUNMatrix);
      int          (*copy)(SUNMatrix, SUNMatrix);
      int          (*scaleadd)(realtype, SUNMatrix, SUNMatrix);
      int          (*scaleaddi)(realtype, SUNMatrix);
      int          (*matvec)(SUNMatrix, N_Vector, N_Vector);
      int          (*space)(SUNMatrix, long int*, long int*);
    };

The generic SUNMATRIX module defines and implements the matrix operations acting on a SUNMatrix. These routines are nothing but wrappers for the matrix operations defined by a particular SUNMATRIX implementation, which are accessed through the ops field of the SUNMatrix structure. To illustrate this point we show below the implementation of a typical matrix operation from the generic SUNMATRIX module, namely SUNMatZero, which sets all values of a matrix A to zero, returning a flag denoting a successful/failed operation:

    int SUNMatZero(SUNMatrix A)
    {
      return((int) A->ops->zero(A));
    }

The subsection Description of the SUNMATRIX operations contains a complete list of all matrix operations defined by the generic SUNMATRIX module.

A particular implementation of the SUNMATRIX module must:
• Specify the content field of the SUNMatrix object.
• Define and implement a minimal subset of the matrix operations.
  See the documentation for each SUNDIALS solver to determine which SUNMATRIX operations they require. The list of required operations for use with ARKode is given in the section SUNMATRIX functions required by ARKode. Note that the names of these routines should be unique to that implementation in order to permit using more than one SUNMATRIX module (each with different SUNMatrix internal data representations) in the same code.
• Define and implement user-callable constructor and destructor routines to create and free a SUNMatrix with the new content field and with ops pointing to the new matrix operations.
• Optionally, define and implement additional user-callable routines acting on the newly defined SUNMatrix (e.g., a routine to print the content for debugging purposes).
• Optionally, provide accessor macros as needed for that particular implementation to be used to access different parts in the content field of the newly defined SUNMatrix.

Each SUNMATRIX implementation included in SUNDIALS has a unique identifier specified in an enumeration and shown in the table below. It is recommended that a user-supplied SUNMATRIX implementation use the SUNMATRIX_CUSTOM identifier.

10.1.1 Identifiers associated with matrix kernels supplied with SUNDIALS

    Matrix ID           Matrix type                                  ID Value
    SUNMATRIX_DENSE     Dense $M \times N$ matrix                    0
    SUNMATRIX_BAND      Band $M \times M$ matrix                     1
    SUNMATRIX_SPARSE    Sparse (CSR or CSC) $M \times N$ matrix      2
    SUNMATRIX_CUSTOM    User-provided custom matrix                  3

10.2 Description of the SUNMATRIX operations

For each of the SUNMatrix operations, we give the name, usage of the function, and a description of its mathematical operations below.

SUNMatrix_ID SUNMatGetID(SUNMatrix A)
    Returns the type identifier for the matrix A. It is used to determine the matrix implementation type (e.g. dense, banded, sparse, ...) from the abstract SUNMatrix interface. This is used to assess compatibility with
    SUNDIALS-provided linear solver implementations. Returned values are given in the table Identifiers associated with matrix kernels supplied with SUNDIALS.
    Usage:
        id = SUNMatGetID(A);

SUNMatrix SUNMatClone(SUNMatrix A)
    Creates a new SUNMatrix of the same type as an existing matrix A and sets the ops field. It does not copy the matrix, but rather allocates storage for the new matrix.
    Usage:
        B = SUNMatClone(A);

void SUNMatDestroy(SUNMatrix A)
    Destroys the SUNMatrix A and frees memory allocated for its internal data.
    Usage:
        SUNMatDestroy(A);

int SUNMatSpace(SUNMatrix A, long int *lrw, long int *liw)
    Returns the storage requirements for the matrix A. lrw contains the number of realtype words and liw contains the number of integer words. The return value denotes success/failure of the operation. This function is advisory only, for use in determining a user’s total space requirements; it could be a dummy function in a user-supplied SUNMatrix module if that information is not of interest.
    Usage:
        ier = SUNMatSpace(A, &lrw, &liw);

int SUNMatZero(SUNMatrix A)
    Zeros all entries of the SUNMatrix A:
        $A_{i,j} = 0, \quad i = 1,\ldots,m, \; j = 1,\ldots,n.$
    The return value is an integer flag denoting success/failure of the operation.
    Usage:
        ier = SUNMatZero(A);

int SUNMatCopy(SUNMatrix A, SUNMatrix B)
    Performs the operation B = A for all entries of the matrices A and B:
        $B_{i,j} = A_{i,j}, \quad i = 1,\ldots,m, \; j = 1,\ldots,n.$
    The return value is an integer flag denoting success/failure of the operation.
    Usage:
        ier = SUNMatCopy(A,B);

int SUNMatScaleAdd(realtype c, SUNMatrix A, SUNMatrix B)
    Performs the operation A = cA + B:
        $A_{i,j} = c A_{i,j} + B_{i,j}, \quad i = 1,\ldots,m, \; j = 1,\ldots,n.$
    The return value is an integer flag denoting success/failure of the operation.
    Usage:
        ier = SUNMatScaleAdd(c, A, B);

int SUNMatScaleAddI(realtype c, SUNMatrix A)
    Performs the operation A = cA + I:
        $A_{i,j} = c A_{i,j} + \delta_{i,j}, \quad i,j = 1,\ldots,n.$
    The return value is an integer flag denoting success/failure of the operation.
    Usage:
        ier = SUNMatScaleAddI(c, A);

int SUNMatMatvec(SUNMatrix A, N_Vector x, N_Vector y)
    Performs the matrix-vector product y = Ax:
        $y_i = \sum_{j=1}^{n} A_{i,j} x_j, \quad i = 1,\ldots,m.$
    It should only be called with vectors x and y that are compatible with the matrix A, both in storage type and dimensions. The return value is an integer flag denoting success/failure of the operation.
    Usage:
        ier = SUNMatMatvec(A, x, y);

10.3 Compatibility of SUNMATRIX types

We note that not all SUNMatrix types are compatible with all N_Vector types provided with SUNDIALS. This is primarily due to the need for compatibility within the SUNMatMatvec routine; however, compatibility between SUNMatrix and N_Vector implementations is more crucial when considering their interaction within SUNLinearSolver objects, as will be described in more detail in section Description of the SUNLinearSolver module. More specifically, the table SUNDIALS matrix interfaces and vector implementations that can be used for each shows the matrix interfaces available as SUNMatrix modules and the compatible vector implementations.

10.3.1 SUNDIALS matrix interfaces and vector implementations that can be used for each

    Matrix Interface   Serial   Parallel (MPI)   OpenMP   pThreads   hypre Vec.   PETSc Vec.   CUDA   RAJA   User Suppl.
    Dense              X                         X        X                                                  X
    Band               X                         X        X                                                  X
    Sparse             X                         X        X                                                  X
    User supplied      X        X                X        X          X            X            X      X      X

10.4 The SUNMATRIX_DENSE Module

The dense implementation of the SUNMatrix module provided with SUNDIALS, SUNMATRIX_DENSE, defines the content field of SUNMatrix to be the following structure:
    struct _SUNMatrixContent_Dense {
      sunindextype M;
      sunindextype N;
      realtype *data;
      sunindextype ldata;
      realtype **cols;
    };

These entries of the content field contain the following information:
• M - number of rows
• N - number of columns
• data - pointer to a contiguous block of realtype variables. The elements of the dense matrix are stored columnwise, i.e. the $A_{i,j}$ element of a dense SUNMatrix A (with $0 \le i < M$ and $0 \le j < N$) may be accessed via data[j*M+i].
• ldata - length of the data array ($= M \cdot N$).
• cols - array of pointers. cols[j] points to the first element of the j-th column of the matrix in the array data. The $A_{i,j}$ element of a dense SUNMatrix A (with $0 \le i < M$ and $0 \le j < N$) may be accessed via cols[j][i].

The header file to be included when using this module is sunmatrix/sunmatrix_dense.h.

The following macros are provided to access the content of a SUNMATRIX_DENSE matrix. The prefix SM_ in the names denotes that these macros are for SUNMatrix implementations, and the suffix _D denotes that these are specific to the dense version.

SM_CONTENT_D(A)
    This macro gives access to the contents of the dense SUNMatrix A. The assignment A_cont = SM_CONTENT_D(A) sets A_cont to be a pointer to the dense SUNMatrix content structure.
    Implementation:
        #define SM_CONTENT_D(A) ( (SUNMatrixContent_Dense)(A->content) )

SM_ROWS_D(A)
    Access the number of rows in the dense SUNMatrix A. This may be used either to retrieve or to set the value. For example, the assignment A_rows = SM_ROWS_D(A) sets A_rows to be the number of rows in the matrix A. Similarly, the assignment SM_ROWS_D(A) = A_rows sets the number of rows in A to equal A_rows.
    Implementation:
        #define SM_ROWS_D(A) ( SM_CONTENT_D(A)->M )

SM_COLUMNS_D(A)
    Access the number of columns in the dense SUNMatrix A. This may be used either to retrieve or to set the value.
    For example, the assignment A_columns = SM_COLUMNS_D(A) sets A_columns to be the number of columns in the matrix A. Similarly, the assignment SM_COLUMNS_D(A) = A_columns sets the number of columns in A to equal A_columns.
    Implementation:
        #define SM_COLUMNS_D(A) ( SM_CONTENT_D(A)->N )

SM_LDATA_D(A)
    Access the total data length in the dense SUNMatrix A. This may be used either to retrieve or to set the value. For example, the assignment A_ldata = SM_LDATA_D(A) sets A_ldata to be the length of the data array in the matrix A. Similarly, the assignment SM_LDATA_D(A) = A_ldata sets the parameter for the length of the data array in A to equal A_ldata.
    Implementation:
        #define SM_LDATA_D(A) ( SM_CONTENT_D(A)->ldata )

SM_DATA_D(A)
    This macro gives access to the data pointer for the matrix entries. The assignment A_data = SM_DATA_D(A) sets A_data to be a pointer to the first component of the data array for the dense SUNMatrix A. The assignment SM_DATA_D(A) = A_data sets the data array of A to be A_data by storing the pointer A_data.
    Implementation:
        #define SM_DATA_D(A) ( SM_CONTENT_D(A)->data )

SM_COLS_D(A)
    This macro gives access to the cols pointer for the matrix entries. The assignment A_cols = SM_COLS_D(A) sets A_cols to be a pointer to the array of column pointers for the dense SUNMatrix A. The assignment SM_COLS_D(A) = A_cols sets the column pointer array of A to be A_cols by storing the pointer A_cols.
    Implementation:
        #define SM_COLS_D(A) ( SM_CONTENT_D(A)->cols )

SM_COLUMN_D(A,j)
    This macro gives access to the individual columns of the data array of a dense SUNMatrix. The assignment col_j = SM_COLUMN_D(A,j) sets col_j to be a pointer to the first entry of the j-th column of the $M \times N$ dense matrix A (with $0 \le j < N$). The type of the expression SM_COLUMN_D(A,j) is realtype *.
    The pointer returned by the call SM_COLUMN_D(A,j) can be treated as an array which is indexed from 0 to M-1.
    Implementation:
        #define SM_COLUMN_D(A,j) ( (SM_CONTENT_D(A)->cols)[j] )

SM_ELEMENT_D(A,i,j)
    This macro gives access to the individual entries of the data array of a dense SUNMatrix. The assignments SM_ELEMENT_D(A,i,j) = a_ij and a_ij = SM_ELEMENT_D(A,i,j) reference the $A_{i,j}$ element of the $M \times N$ dense matrix A (with $0 \le i < M$ and $0 \le j < N$).
    Implementation:
        #define SM_ELEMENT_D(A,i,j) ( (SM_CONTENT_D(A)->cols)[j][i] )

The SUNMATRIX_DENSE module defines dense implementations of all matrix operations listed in the section Description of the SUNMATRIX operations. Their names are obtained from those in that section by appending the suffix _Dense (e.g. SUNMatCopy_Dense). The module SUNMATRIX_DENSE provides the following additional user-callable routines:

SUNMatrix SUNDenseMatrix(sunindextype M, sunindextype N)
    This constructor function creates and allocates memory for a dense SUNMatrix. Its arguments are the number of rows, M, and columns, N, for the dense matrix.

void SUNDenseMatrix_Print(SUNMatrix A, FILE* outfile)
    This function prints the content of a dense SUNMatrix to the output stream specified by outfile. Note: stdout or stderr may be used as arguments for outfile to print directly to standard output or standard error, respectively.

sunindextype SUNDenseMatrix_Rows(SUNMatrix A)
    This function returns the number of rows in the dense SUNMatrix.

sunindextype SUNDenseMatrix_Columns(SUNMatrix A)
    This function returns the number of columns in the dense SUNMatrix.

sunindextype SUNDenseMatrix_LData(SUNMatrix A)
    This function returns the length of the data array for the dense SUNMatrix.

realtype* SUNDenseMatrix_Data(SUNMatrix A)
    This function returns a pointer to the data array for the dense SUNMatrix.
realtype** SUNDenseMatrix_Cols(SUNMatrix A)
    This function returns a pointer to the cols array for the dense SUNMatrix.

realtype* SUNDenseMatrix_Column(SUNMatrix A, sunindextype j)
    This function returns a pointer to the first entry of the jth column of the dense SUNMatrix. The resulting pointer should be indexed over the range 0 to M-1.

Notes
• When looping over the components of a dense SUNMatrix A, the most efficient approaches are to:
  – First obtain the component array via A_data = SM_DATA_D(A) or A_data = SUNDenseMatrix_Data(A), and then access A_data[i] within the loop.
  – First obtain the array of column pointers via A_cols = SM_COLS_D(A) or A_cols = SUNDenseMatrix_Cols(A), and then access A_cols[j][i] within the loop.
  – Within a loop over the columns, access the column pointer via A_colj = SUNDenseMatrix_Column(A,j), and then access the entries within that column using A_colj[i] within the loop.
  All three of these are more efficient than using SM_ELEMENT_D(A,i,j) within a double loop.
• Within the SUNMatMatvec_Dense routine, internal consistency checks are performed to ensure that the matrix is called with consistent N_Vector implementations. These are currently limited to: NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS. As additional compatible vector implementations are added to SUNDIALS, these will be included within this compatibility check.

For solvers that include a Fortran interface module, the SUNMATRIX_DENSE module also includes the Fortran-callable function FSUNDenseMatInit() to initialize this SUNMATRIX_DENSE module for a given SUNDIALS solver.

subroutine FSUNDenseMatInit(CODE, M, N, IER)
    Initializes a dense SUNMatrix structure for use in a SUNDIALS solver.
    Arguments:
    • CODE (int, input) – flag denoting the SUNDIALS solver this matrix will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
    • M (long int, input) – number of matrix rows.
    • N (long int, input) – number of matrix columns.
    • IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNDenseMassMatInit() initializes this SUNMATRIX_DENSE module for storing the mass matrix.

subroutine FSUNDenseMassMatInit(M, N, IER)
    Initializes a dense SUNMatrix structure for use as a mass matrix in ARKode.
    Arguments:
    • M (long int, input) – number of matrix rows.
    • N (long int, input) – number of matrix columns.
    • IER (int, output) – return flag (0 success, -1 for failure).

10.5 The SUNMATRIX_BAND Module

The banded implementation of the SUNMatrix module provided with SUNDIALS, SUNMATRIX_BAND, defines the content field of SUNMatrix to be the following structure:

    struct _SUNMatrixContent_Band {
      sunindextype M;
      sunindextype N;
      sunindextype mu;
      sunindextype ml;
      sunindextype smu;
      sunindextype ldim;
      realtype *data;
      sunindextype ldata;
      realtype **cols;
    };

A diagram of the underlying data representation in a banded matrix is shown in Figure SUNBandMatrix Diagram. A more complete description of the parts of this content field is given below:
• M - number of rows
• N - number of columns (N = M)
• mu - upper half-bandwidth, 0 ≤ mu < N
• ml - lower half-bandwidth, 0 ≤ ml < N
• smu - storage upper bandwidth, mu ≤ smu < N. The LU decomposition routines in the associated SUNLINSOL_BAND and SUNLINSOL_LAPACKBAND modules write the LU factors into the existing storage for the band matrix. The upper triangular factor U, however, may have an upper bandwidth as big as min(N-1, mu+ml) because of partial pivoting. The smu field holds the upper half-bandwidth allocated for the band matrix.
• ldim - leading dimension (ldim ≥ smu + ml + 1)
• data - pointer to a contiguous block of realtype variables. The elements of the banded matrix are stored columnwise (i.e. columns are stored one on top of the other in memory).
  Only elements within the specified half-bandwidths are stored. data is a pointer to ldata contiguous locations which hold the elements within the banded matrix.
• ldata - length of the data array (= ldim · N)
• cols - array of pointers. cols[j] is a pointer to the uppermost element within the band in the j-th column. This pointer may be treated as an array indexed from smu-mu (to access the uppermost element within the band in the j-th column) to smu+ml (to access the lowest element within the band in the j-th column). Indices from 0 to smu-mu-1 give access to extra storage elements required by the LU decomposition function. Finally, cols[j][i-j+smu] is the $(i,j)$-th element with j − mu ≤ i ≤ j + ml.

The header file to be included when using this module is sunmatrix/sunmatrix_band.h.

The following macros are provided to access the content of a SUNMATRIX_BAND matrix. The prefix SM_ in the names denotes that these macros are for SUNMatrix implementations, and the suffix _B denotes that these are specific to the banded version.

SM_CONTENT_B(A)
    This macro gives access to the contents of the banded SUNMatrix A. The assignment A_cont = SM_CONTENT_B(A) sets A_cont to be a pointer to the banded SUNMatrix content structure.
    Implementation:
        #define SM_CONTENT_B(A) ( (SUNMatrixContent_Band)(A->content) )

SM_ROWS_B(A)
    Access the number of rows in the banded SUNMatrix A. This may be used either to retrieve or to set the value. For example, the assignment A_rows = SM_ROWS_B(A) sets A_rows to be the number of rows in the matrix A. Similarly, the assignment SM_ROWS_B(A) = A_rows sets the number of rows in A to equal A_rows.
    Implementation:
        #define SM_ROWS_B(A) ( SM_CONTENT_B(A)->M )

SM_COLUMNS_B(A)
    Access the number of columns in the banded SUNMatrix A. As with SM_ROWS_B, this may be used either to retrieve or to set the value.
    Implementation:
        #define SM_COLUMNS_B(A) ( SM_CONTENT_B(A)->N )

SM_UBAND_B(A)
    Access the mu parameter in the banded SUNMatrix A. As with SM_ROWS_B, this may be used either to retrieve or to set the value.
    Implementation:
        #define SM_UBAND_B(A) ( SM_CONTENT_B(A)->mu )

SM_LBAND_B(A)
    Access the ml parameter in the banded SUNMatrix A. As with SM_ROWS_B, this may be used either to retrieve or to set the value.
    Implementation:
        #define SM_LBAND_B(A) ( SM_CONTENT_B(A)->ml )

SM_SUBAND_B(A)
    Access the smu parameter in the banded SUNMatrix A. As with SM_ROWS_B, this may be used either to retrieve or to set the value.
    Implementation:
        #define SM_SUBAND_B(A) ( SM_CONTENT_B(A)->smu )

Fig. 10.1: Diagram of the storage for the SUNMATRIX_BAND module. Here A is an $N \times N$ band matrix with upper and lower half-bandwidths mu and ml, respectively. The rows and columns of A are numbered from 0 to N-1 and the $(i,j)$-th element of A is denoted A(i,j). The greyed out areas of the underlying component storage are used by the associated SUNLINSOL_BAND or SUNLINSOL_LAPACKBAND linear solver.

SM_LDIM_B(A)
    Access the ldim parameter in the banded SUNMatrix A. As with SM_ROWS_B, this may be used either to retrieve or to set the value.
    Implementation:
        #define SM_LDIM_B(A) ( SM_CONTENT_B(A)->ldim )

SM_LDATA_B(A)
    Access the ldata parameter in the banded SUNMatrix A. As with SM_ROWS_B, this may be used either to retrieve or to set the value.
    Implementation:
        #define SM_LDATA_B(A) ( SM_CONTENT_B(A)->ldata )

SM_DATA_B(A)
    This macro gives access to the data pointer for the matrix entries. The assignment A_data = SM_DATA_B(A) sets A_data to be a pointer to the first component of the data array for the banded SUNMatrix A. The assignment SM_DATA_B(A) = A_data sets the data array of A to be A_data by storing the pointer A_data.
Implementation: #define SM_DATA_B(A) ( SM_CONTENT_B(A)->data )

SM_COLS_B(A)
This macro gives access to the cols pointer for the matrix entries. The assignment A_cols = SM_COLS_B(A) sets A_cols to be a pointer to the array of column pointers for the banded SUNMatrix A. The assignment SM_COLS_B(A) = A_cols sets the column pointer array of A to be A_cols by storing the pointer A_cols.
Implementation: #define SM_COLS_B(A) ( SM_CONTENT_B(A)->cols )

SM_COLUMN_B(A,j)
This macro gives access to the individual columns of the data array of a banded SUNMatrix. The assignment col_j = SM_COLUMN_B(A,j) sets col_j to be a pointer to the diagonal element of the j-th column of the 𝑁 × 𝑁 band matrix A, 0 ≤ 𝑗 ≤ 𝑁 − 1. The type of the expression SM_COLUMN_B(A,j) is realtype *. The pointer returned by the call SM_COLUMN_B(A,j) can be treated as an array which is indexed from -mu to ml.
Implementation: #define SM_COLUMN_B(A,j) ( ((SM_CONTENT_B(A)->cols)[j])+SM_SUBAND_B(A) )

SM_ELEMENT_B(A,i,j)
This macro gives access to the individual entries of the data array of a banded SUNMatrix. The assignments SM_ELEMENT_B(A,i,j) = a_ij and a_ij = SM_ELEMENT_B(A,i,j) reference the (𝑖, 𝑗)-th element of the 𝑁 × 𝑁 band matrix A, where 0 ≤ 𝑖, 𝑗 ≤ 𝑁 − 1. The location (𝑖, 𝑗) should further satisfy 𝑗 − mu ≤ 𝑖 ≤ 𝑗 + ml.
Implementation: #define SM_ELEMENT_B(A,i,j) ( (SM_CONTENT_B(A)->cols)[j][(i)-(j)+SM_SUBAND_B(A)] )

SM_COLUMN_ELEMENT_B(col_j,i,j)
This macro gives access to the individual entries of the data array of a banded SUNMatrix. The assignments SM_COLUMN_ELEMENT_B(col_j,i,j) = a_ij and a_ij = SM_COLUMN_ELEMENT_B(col_j,i,j) reference the (𝑖, 𝑗)-th entry of the band matrix A when used in conjunction with SM_COLUMN_B to reference the j-th column through col_j. The index (𝑖, 𝑗) should satisfy 𝑗 − mu ≤ 𝑖 ≤ 𝑗 + ml.
Implementation: #define SM_COLUMN_ELEMENT_B(col_j,i,j) (col_j[(i)-(j)])

The SUNMATRIX_BAND module defines banded implementations of all matrix operations listed in the section Description of the SUNMATRIX operations. Their names are obtained from those in that section by appending the suffix _Band (e.g. SUNMatCopy_Band). The module SUNMATRIX_BAND provides the following additional user-callable routines:

SUNMatrix SUNBandMatrix(sunindextype N, sunindextype mu, sunindextype ml)
This constructor function creates and allocates memory for a banded SUNMatrix. Its arguments are the matrix size, N, and the upper and lower half-bandwidths of the matrix, mu and ml. The stored upper bandwidth is set to mu+ml to accommodate subsequent factorization in the SUNLINSOL_BAND and SUNLINSOL_LAPACKBAND modules.

SUNMatrix SUNBandMatrixStorage(sunindextype N, sunindextype mu, sunindextype ml, sunindextype smu)
This constructor function creates and allocates memory for a banded SUNMatrix. Its arguments are the matrix size, N, the upper and lower half-bandwidths of the matrix, mu and ml, and the stored upper bandwidth, smu. When creating a band SUNMatrix, this value should be
• at least min(N-1,mu+ml) if the matrix will be used by the SUNLinSol_Band module;
• exactly equal to mu+ml if the matrix will be used by the SUNLinSol_LapackBand module;
• at least mu if used in some other manner.
Note: it is strongly recommended that users call the default constructor, SUNBandMatrix(), in all standard use cases. This advanced constructor is used internally within SUNDIALS solvers, and is provided to users who require banded matrices for non-default purposes.

void SUNBandMatrix_Print(SUNMatrix A, FILE* outfile)
This function prints the content of a banded SUNMatrix to the output stream specified by outfile. Note: stdout or stderr may be used as arguments for outfile to print directly to standard output or standard error, respectively.
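The indexing rule cols[j][i-j+smu] used by the access macros can be illustrated without SUNDIALS itself. The following is a minimal stand-alone sketch: the BandSketch type and the band_new, band_at, and band_free helpers are hypothetical names that only mirror the documented fields, and the sketch assumes ldim = smu + ml + 1 with cols[j] pointing at data + j·ldim. It is not the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in mirroring the documented fields; not a SUNDIALS type. */
typedef struct {
  long N, mu, ml, smu, ldim, ldata;
  double *data;   /* ldata contiguous entries */
  double **cols;  /* cols[j] points at the storage of column j */
} BandSketch;

/* Allocate an N x N band layout (assumption: ldim = smu + ml + 1). */
BandSketch *band_new(long N, long mu, long ml, long smu) {
  assert(N > 0 && mu >= 0 && ml >= 0 && smu >= mu);
  BandSketch *B = malloc(sizeof *B);
  if (!B) return NULL;
  B->N = N; B->mu = mu; B->ml = ml; B->smu = smu;
  B->ldim = smu + ml + 1;
  B->ldata = B->ldim * N;
  B->data = calloc((size_t)B->ldata, sizeof(double));
  B->cols = malloc((size_t)N * sizeof(double *));
  if (!B->data || !B->cols) {
    free(B->data); free(B->cols); free(B);
    return NULL;
  }
  for (long j = 0; j < N; j++) B->cols[j] = B->data + j * B->ldim;
  return B;
}

/* Pointer to entry (i,j); only valid inside the band, j-mu <= i <= j+ml. */
double *band_at(BandSketch *B, long i, long j) {
  assert(i >= 0 && i < B->N && j >= 0 && j < B->N);
  assert(j - B->mu <= i && i <= j + B->ml);
  return &B->cols[j][i - j + B->smu];
}

void band_free(BandSketch *B) {
  if (B) { free(B->data); free(B->cols); free(B); }
}
```

In this sketch, cols[j] + smu plays the role of the diagonal pointer returned by SM_COLUMN_B, so offsetting it by i - j reaches the same entry as band_at(B, i, j).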
sunindextype SUNBandMatrix_Rows(SUNMatrix A)
This function returns the number of rows in the banded SUNMatrix.

sunindextype SUNBandMatrix_Columns(SUNMatrix A)
This function returns the number of columns in the banded SUNMatrix.

sunindextype SUNBandMatrix_LowerBandwidth(SUNMatrix A)
This function returns the lower half-bandwidth of the banded SUNMatrix.

sunindextype SUNBandMatrix_UpperBandwidth(SUNMatrix A)
This function returns the upper half-bandwidth of the banded SUNMatrix.

sunindextype SUNBandMatrix_StoredUpperBandwidth(SUNMatrix A)
This function returns the stored upper half-bandwidth of the banded SUNMatrix.

sunindextype SUNBandMatrix_LDim(SUNMatrix A)
This function returns the length of the leading dimension of the banded SUNMatrix.

realtype* SUNBandMatrix_Data(SUNMatrix A)
This function returns a pointer to the data array for the banded SUNMatrix.

realtype** SUNBandMatrix_Cols(SUNMatrix A)
This function returns a pointer to the cols array for the banded SUNMatrix.

realtype* SUNBandMatrix_Column(SUNMatrix A, sunindextype j)
This function returns a pointer to the diagonal entry of the j-th column of the banded SUNMatrix. The resulting pointer should be indexed over the range -mu to ml.

Notes
• When looping over the components of a banded SUNMatrix A, the most efficient approaches are to:
– First obtain the component array via A_data = SM_DATA_B(A) or A_data = SUNBandMatrix_Data(A), and then access A_data[i] within the loop.
– First obtain the array of column pointers via A_cols = SM_COLS_B(A) or A_cols = SUNBandMatrix_Cols(A), and then access A_cols[j][i] within the loop.
– Within a loop over the columns, access the column pointer via A_colj = SUNBandMatrix_Column(A,j) and then access the entries within that column using SM_COLUMN_ELEMENT_B(A_colj,i,j).
All three of these are more efficient than using SM_ELEMENT_B(A,i,j) within a double loop.
• Within the SUNMatMatvec_Band routine, internal consistency checks are performed to ensure that the matrix is called with consistent N_Vector implementations. These are currently limited to: NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS. As additional compatible vector implementations are added to SUNDIALS, these will be included within this compatibility check.

For solvers that include a Fortran interface module, the SUNMATRIX_BAND module also includes the Fortran-callable function FSUNBandMatInit() to initialize this SUNMATRIX_BAND module for a given SUNDIALS solver.

subroutine FSUNBandMatInit(CODE, N, MU, ML, IER)
Initializes a band SUNMatrix structure for use in a SUNDIALS solver.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this matrix will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• N (long int, input) – number of matrix rows (and columns).
• MU (long int, input) – upper half-bandwidth.
• ML (long int, input) – lower half-bandwidth.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNBandMassMatInit() initializes this SUNMATRIX_BAND module for storing the mass matrix.

subroutine FSUNBandMassMatInit(N, MU, ML, IER)
Initializes a band SUNMatrix structure for use as a mass matrix in ARKode.
Arguments:
• N (long int, input) – number of matrix rows (and columns).
• MU (long int, input) – upper half-bandwidth.
• ML (long int, input) – lower half-bandwidth.
• IER (int, output) – return flag (0 success, -1 for failure).

10.6 The SUNMATRIX_SPARSE Module

The sparse implementation of the SUNMatrix module provided with SUNDIALS, SUNMATRIX_SPARSE, is designed to work with either compressed-sparse-column (CSC) or compressed-sparse-row (CSR) sparse matrix formats.
To this end, it defines the content field of SUNMatrix to be the following structure:

    struct _SUNMatrixContent_Sparse {
      sunindextype M;
      sunindextype N;
      sunindextype NNZ;
      sunindextype NP;
      realtype *data;
      int sparsetype;
      sunindextype *indexvals;
      sunindextype *indexptrs;
      /* CSC indices */
      sunindextype **rowvals;
      sunindextype **colptrs;
      /* CSR indices */
      sunindextype **colvals;
      sunindextype **rowptrs;
    };

A diagram of the underlying data representation in a sparse matrix is shown in Figure SUNSparseMatrix Diagram. A more complete description of the parts of this content field is given below:
• M - number of rows
• N - number of columns
• NNZ - maximum number of nonzero entries in the matrix (allocated length of data and indexvals arrays)
• NP - number of index pointers (e.g. number of column pointers for CSC matrix). For CSC matrices NP=N, and for CSR matrices NP=M. This value is set automatically at construction based on the input choice for sparsetype.
• data - pointer to a contiguous block of realtype variables (of length NNZ), containing the values of the nonzero entries in the matrix
• sparsetype - type of the sparse matrix (CSC_MAT or CSR_MAT)
• indexvals - pointer to a contiguous block of sunindextype variables (of length NNZ), containing the row indices (if CSC) or column indices (if CSR) of each nonzero matrix entry held in data
• indexptrs - pointer to a contiguous block of sunindextype variables (of length NP+1). For CSC matrices each entry provides the index of the first column entry into the data and indexvals arrays, e.g. if indexptrs[3]=7, then the first nonzero entry in the fourth column of the matrix is located in data[7], and is located in row indexvals[7] of the matrix. The last entry contains the total number of nonzero values in the matrix and hence points one past the end of the active data in the data and indexvals arrays. For CSR matrices, each entry provides the index of the first row entry into the data and indexvals arrays.
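The roles of data, indexvals, and indexptrs can be sketched with a matrix-vector product over CSC storage. This is a stand-alone illustration on plain arrays: the function name csc_matvec is hypothetical, long and double stand in for sunindextype and realtype, and the code is not taken from the SUNMatMatvec_Sparse routine.

```c
#include <assert.h>

/* y = A*x for an M x N matrix held in CSC form.
   indexptrs has N+1 entries; column j owns the half-open range
   [indexptrs[j], indexptrs[j+1]) of data and indexvals. */
void csc_matvec(long M, long N, const double *data, const long *indexvals,
                const long *indexptrs, const double *x, double *y) {
  assert(data && indexvals && indexptrs && x && y);
  for (long i = 0; i < M; i++) y[i] = 0.0;
  for (long j = 0; j < N; j++) {
    for (long k = indexptrs[j]; k < indexptrs[j + 1]; k++) {
      /* indexvals[k] is the row index of the k-th stored value */
      y[indexvals[k]] += data[k] * x[j];
    }
  }
}
```

Because the loop bounds come from indexptrs rather than NNZ, any unused storage at the end of data and indexvals is never touched.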
The following pointers are added to the SUNMATRIX_SPARSE content structure for user convenience, to provide a more intuitive interface to the CSC and CSR sparse matrix data structures. They are set automatically when creating a sparse SUNMatrix, based on the sparse matrix storage type.
• rowvals - pointer to indexvals when sparsetype is CSC_MAT, otherwise set to NULL.
• colptrs - pointer to indexptrs when sparsetype is CSC_MAT, otherwise set to NULL.
• colvals - pointer to indexvals when sparsetype is CSR_MAT, otherwise set to NULL.
• rowptrs - pointer to indexptrs when sparsetype is CSR_MAT, otherwise set to NULL.

For example, the 5 × 4 matrix

\[
\begin{bmatrix}
0 & 3 & 1 & 0 \\
3 & 0 & 0 & 2 \\
0 & 7 & 0 & 0 \\
1 & 0 & 0 & 9 \\
0 & 0 & 0 & 5
\end{bmatrix}
\]

could be stored as a CSC matrix in this structure as either

    M = 5; N = 4; NNZ = 8; NP = N;
    data = {3.0, 1.0, 3.0, 7.0, 1.0, 2.0, 9.0, 5.0};
    sparsetype = CSC_MAT;
    indexvals = {1, 3, 0, 2, 0, 1, 3, 4};
    indexptrs = {0, 2, 4, 5, 8};

or

    M = 5; N = 4; NNZ = 10; NP = N;
    data = {3.0, 1.0, 3.0, 7.0, 1.0, 2.0, 9.0, 5.0, *, *};
    sparsetype = CSC_MAT;
    indexvals = {1, 3, 0, 2, 0, 1, 3, 4, *, *};
    indexptrs = {0, 2, 4, 5, 8};

where the first has no unused space, and the second has additional storage (the entries marked with * may contain any values). Note in both cases that the final value in indexptrs is 8, indicating the total number of nonzero entries in the matrix.

Similarly, in CSR format, the same matrix could be stored as

    M = 5; N = 4; NNZ = 8; NP = M;
    data = {3.0, 1.0, 3.0, 2.0, 7.0, 1.0, 9.0, 5.0};
    sparsetype = CSR_MAT;
    indexvals = {1, 2, 0, 3, 1, 0, 3, 3};
    indexptrs = {0, 2, 4, 5, 7, 8};

The header file to be included when using this module is sunmatrix/sunmatrix_sparse.h.

The following macros are provided to access the content of a SUNMATRIX_SPARSE matrix.
The prefix SM_ in the names denotes that these macros are for SUNMatrix implementations, and the suffix _S denotes that these are specific to the sparse version.

SM_CONTENT_S(A)
This macro gives access to the contents of the sparse SUNMatrix A. The assignment A_cont = SM_CONTENT_S(A) sets A_cont to be a pointer to the sparse SUNMatrix content structure.
Implementation: #define SM_CONTENT_S(A) ( (SUNMatrixContent_Sparse)(A->content) )

Fig. 10.2: Diagram of the storage for a compressed-sparse-column matrix of type SUNMATRIX_SPARSE: Here A is an 𝑀 × 𝑁 sparse CSC matrix with storage for up to NNZ nonzero entries (the allocated length of both data and indexvals). The entries in indexvals may assume values from 0 to M-1, corresponding to the row index (zero-based) of each nonzero value. The entries in data contain the values of the nonzero entries, with the row i, column j entry of A (again, zero-based) denoted as A(i,j). The indexptrs array contains N+1 entries; the first N denote the starting index of each column within the indexvals and data arrays, while the final entry points one past the final nonzero entry. Here, although NNZ values are allocated, only nz are actually filled in; the greyed-out portions of data and indexvals indicate extra allocated space.

SM_ROWS_S(A)
Access the number of rows in the sparse SUNMatrix A. This may be used either to retrieve or to set the value. For example, the assignment A_rows = SM_ROWS_S(A) sets A_rows to be the number of rows in the matrix A. Similarly, the assignment SM_ROWS_S(A) = A_rows sets the number of rows in A to equal A_rows.
Implementation: #define SM_ROWS_S(A) ( SM_CONTENT_S(A)->M )

SM_COLUMNS_S(A)
Access the number of columns in the sparse SUNMatrix A. As with SM_ROWS_S, this may be used either to retrieve or to set the value.
Implementation: #define SM_COLUMNS_S(A) ( SM_CONTENT_S(A)->N )

SM_NNZ_S(A)
Access the allocated number of nonzeros in the sparse SUNMatrix A. As with SM_ROWS_S, this may be used either to retrieve or to set the value.
Implementation: #define SM_NNZ_S(A) ( SM_CONTENT_S(A)->NNZ )

SM_NP_S(A)
Access the number of index pointers NP in the sparse SUNMatrix A. As with SM_ROWS_S, this may be used either to retrieve or to set the value.
Implementation: #define SM_NP_S(A) ( SM_CONTENT_S(A)->NP )

SM_SPARSETYPE_S(A)
Access the sparsity type parameter in the sparse SUNMatrix A. As with SM_ROWS_S, this may be used either to retrieve or to set the value.
Implementation: #define SM_SPARSETYPE_S(A) ( SM_CONTENT_S(A)->sparsetype )

SM_DATA_S(A)
This macro gives access to the data pointer for the matrix entries. The assignment A_data = SM_DATA_S(A) sets A_data to be a pointer to the first component of the data array for the sparse SUNMatrix A. The assignment SM_DATA_S(A) = A_data sets the data array of A to be A_data by storing the pointer A_data.
Implementation: #define SM_DATA_S(A) ( SM_CONTENT_S(A)->data )

SM_INDEXVALS_S(A)
This macro gives access to the indexvals pointer for the matrix entries. The assignment A_indexvals = SM_INDEXVALS_S(A) sets A_indexvals to be a pointer to the array of index values (i.e. row indices for a CSC matrix, or column indices for a CSR matrix) for the sparse SUNMatrix A.
Implementation: #define SM_INDEXVALS_S(A) ( SM_CONTENT_S(A)->indexvals )

SM_INDEXPTRS_S(A)
This macro gives access to the indexptrs pointer for the matrix entries. The assignment A_indexptrs = SM_INDEXPTRS_S(A) sets A_indexptrs to be a pointer to the array of index pointers (i.e. the starting indices in the data/indexvals arrays for each row or column in CSR or CSC formats, respectively).
Implementation: #define SM_INDEXPTRS_S(A) ( SM_CONTENT_S(A)->indexptrs )

The SUNMATRIX_SPARSE module defines sparse implementations of all matrix operations listed in the section Description of the SUNMATRIX operations. Their names are obtained from those in that section by appending the suffix _Sparse (e.g. SUNMatCopy_Sparse). The module SUNMATRIX_SPARSE provides the following additional user-callable routines:

SUNMatrix SUNSparseMatrix(sunindextype M, sunindextype N, sunindextype NNZ, int sparsetype)
This constructor function creates and allocates memory for a sparse SUNMatrix. Its arguments are the number of rows and columns of the matrix, M and N, the maximum number of nonzeros to be stored in the matrix, NNZ, and a flag sparsetype indicating whether to use CSR or CSC format (valid choices are CSR_MAT or CSC_MAT).

SUNMatrix SUNSparseFromDenseMatrix(SUNMatrix A, realtype droptol, int sparsetype)
This constructor function creates a new sparse matrix from an existing SUNMATRIX_DENSE object by copying all values with magnitude larger than droptol into the sparse matrix structure.
Requirements:
• A must have type SUNMATRIX_DENSE
• droptol must be non-negative
• sparsetype must be either CSC_MAT or CSR_MAT
The function returns NULL if any requirements are violated, or if the matrix storage request cannot be satisfied.

SUNMatrix SUNSparseFromBandMatrix(SUNMatrix A, realtype droptol, int sparsetype)
This constructor function creates a new sparse matrix from an existing SUNMATRIX_BAND object by copying all values with magnitude larger than droptol into the sparse matrix structure.
Requirements:
• A must have type SUNMATRIX_BAND
• droptol must be non-negative
• sparsetype must be either CSC_MAT or CSR_MAT
The function returns NULL if any requirements are violated, or if the matrix storage request cannot be satisfied.
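The drop-tolerance idea behind the two conversion constructors can be sketched on plain arrays. The following is a minimal stand-alone example, not the SUNDIALS implementation: the function name dense_to_csc is hypothetical, the dense input is assumed to be stored column-major, and the caller is assumed to supply output arrays of sufficient capacity.

```c
#include <assert.h>

/* Copy entries with |a_ij| > droptol from a column-major M x N dense array
   into caller-provided CSC arrays (indexptrs must have N+1 entries).
   Returns the number of stored nonzeros, or -1 if droptol is negative or
   the capacity nnz_cap is exceeded. */
long dense_to_csc(long M, long N, const double *dense, double droptol,
                  long nnz_cap, double *data, long *indexvals,
                  long *indexptrs) {
  assert(dense && data && indexvals && indexptrs);
  if (droptol < 0.0) return -1;
  long nnz = 0;
  for (long j = 0; j < N; j++) {
    indexptrs[j] = nnz;               /* first stored entry of column j */
    for (long i = 0; i < M; i++) {
      double a = dense[j * M + i];
      double mag = (a < 0.0) ? -a : a;
      if (mag > droptol) {
        if (nnz >= nnz_cap) return -1;
        data[nnz] = a;
        indexvals[nnz] = i;           /* row index of this entry */
        nnz++;
      }
    }
  }
  indexptrs[N] = nnz;                 /* one past the last stored entry */
  return nnz;
}
```

With droptol = 0, only exact zeros are dropped; a larger value also discards small entries, which trades accuracy for storage.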
int SUNSparseMatrix_Realloc(SUNMatrix A)
This function reallocates internal storage arrays in a sparse matrix so that the resulting sparse matrix has no wasted space (i.e. the space allocated for nonzero entries equals the actual number of nonzeros, indexptrs[NP]). Returns 0 on success and 1 on failure (e.g. if the input matrix is not sparse).

void SUNSparseMatrix_Print(SUNMatrix A, FILE* outfile)
This function prints the content of a sparse SUNMatrix to the output stream specified by outfile. Note: stdout or stderr may be used as arguments for outfile to print directly to standard output or standard error, respectively.

sunindextype SUNSparseMatrix_Rows(SUNMatrix A)
This function returns the number of rows in the sparse SUNMatrix.

sunindextype SUNSparseMatrix_Columns(SUNMatrix A)
This function returns the number of columns in the sparse SUNMatrix.

sunindextype SUNSparseMatrix_NNZ(SUNMatrix A)
This function returns the number of entries allocated for nonzero storage for the sparse SUNMatrix.

sunindextype SUNSparseMatrix_NP(SUNMatrix A)
This function returns the number of index pointers for the sparse SUNMatrix (the indexptrs array has NP+1 entries).

int SUNSparseMatrix_SparseType(SUNMatrix A)
This function returns the storage type (CSR_MAT or CSC_MAT) for the sparse SUNMatrix.

realtype* SUNSparseMatrix_Data(SUNMatrix A)
This function returns a pointer to the data array for the sparse SUNMatrix.

sunindextype* SUNSparseMatrix_IndexValues(SUNMatrix A)
This function returns a pointer to the index value array for the sparse SUNMatrix: for CSR format this is the column index for each nonzero entry, for CSC format this is the row index for each nonzero entry.
sunindextype* SUNSparseMatrix_IndexPointers(SUNMatrix A)
This function returns a pointer to the index pointer array for the sparse SUNMatrix: for CSR format this is the location of the first entry of each row in the data and indexvals arrays, for CSC format this is the location of the first entry of each column.

Note: Within the SUNMatMatvec_Sparse routine, internal consistency checks are performed to ensure that the matrix is called with consistent N_Vector implementations. These are currently limited to: NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS. As additional compatible vector implementations are added to SUNDIALS, these will be included within this compatibility check.

For solvers that include a Fortran interface module, the SUNMATRIX_SPARSE module also includes the Fortran-callable function FSUNSparseMatInit() to initialize this SUNMATRIX_SPARSE module for a given SUNDIALS solver.

subroutine FSUNSparseMatInit(CODE, M, N, NNZ, SPARSETYPE, IER)
Initializes a sparse SUNMatrix structure for use in a SUNDIALS solver.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this matrix will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• M (long int, input) – number of matrix rows.
• N (long int, input) – number of matrix columns.
• NNZ (long int, input) – amount of nonzero storage to allocate.
• SPARSETYPE (int, input) – matrix sparsity type (CSC_MAT or CSR_MAT)
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNSparseMassMatInit() initializes this SUNMATRIX_SPARSE module for storing the mass matrix.

subroutine FSUNSparseMassMatInit(M, N, NNZ, SPARSETYPE, IER)
Initializes a sparse SUNMatrix structure for use as a mass matrix in ARKode.
Arguments:
• M (long int, input) – number of matrix rows.
• N (long int, input) – number of matrix columns.
• NNZ (long int, input) – amount of nonzero storage to allocate.
• SPARSETYPE (int, input) – matrix sparsity type (CSC_MAT or CSR_MAT)
• IER (int, output) – return flag (0 success, -1 for failure).

10.7 SUNMATRIX Examples

There are SUNMatrix examples that may be installed for each implementation: dense, banded, and sparse. Each implementation makes use of the functions in test_sunmatrix.c. These example functions show simple usage of the SUNMatrix family of functions. The inputs to the examples depend on the matrix type, and are output to stdout if the example is run without the appropriate number of command-line arguments.

The following is a list of the example functions in test_sunmatrix.c:
• Test_SUNMatGetID: Verifies the returned matrix ID against the value that should be returned.
• Test_SUNMatClone: Creates a clone of an existing matrix, copies the data, and checks that their values match.
• Test_SUNMatZero: Zeros out an existing matrix and checks that each entry equals 0.0.
• Test_SUNMatCopy: Clones an input matrix, copies its data to a clone, and verifies that all values match.
• Test_SUNMatScaleAdd: Given an input matrix 𝐴 and an input identity matrix 𝐼, this test clones and copies 𝐴 to a new matrix 𝐵, computes 𝐵 = −𝐵 + 𝐵, and verifies that the resulting matrix entries equal 0. Additionally, if the matrix is square, this test clones and copies 𝐴 to a new matrix 𝐷, clones and copies 𝐼 to a new matrix 𝐶, computes 𝐷 = 𝐷 + 𝐼 and 𝐶 = 𝐶 + 𝐴 using SUNMatScaleAdd, and then verifies that 𝐶 = 𝐷.
• Test_SUNMatScaleAddI: Given an input matrix 𝐴 and an input identity matrix 𝐼, this clones and copies 𝐼 to a new matrix 𝐵, computes 𝐵 = −𝐵 + 𝐼 using SUNMatScaleAddI, and verifies that the resulting matrix entries equal 0.
• Test_SUNMatMatvec: Given an input matrix 𝐴 and input vectors 𝑥 and 𝑦 such that 𝑦 = 𝐴𝑥, this test has different behavior depending on whether 𝐴 is square. If it is square, it clones and copies 𝐴 to a new matrix 𝐵, computes 𝐵 = 3𝐵 + 𝐼 using SUNMatScaleAddI, clones 𝑦 to new vectors 𝑤 and 𝑧, computes 𝑧 = 𝐵𝑥 using SUNMatMatvec, computes 𝑤 = 3𝑦 + 𝑥 using N_VLinearSum, and verifies that 𝑤 = 𝑧. If 𝐴 is not square, it just clones 𝑦 to a new vector 𝑧, computes 𝑧 = 𝐴𝑥 using SUNMatMatvec, and verifies that 𝑦 = 𝑧.
• Test_SUNMatSpace: Verifies that SUNMatSpace can be called, and outputs the results to stdout.

10.8 SUNMATRIX functions required by ARKode

In Table List of matrix functions usage by ARKode code modules, we list the matrix functions in the SUNMatrix module used within the ARKode package. The table also shows, for each function, which of the code modules uses the function. The main ARKode time step modules, ARKStep and ERKStep, do not call any SUNMatrix functions directly, so the table columns are specific to the ARKLS interface and the ARKBANDPRE and ARKBBDPRE preconditioner modules. We further note that the ARKLS interface only utilizes these routines when supplied with a matrix-based linear solver, i.e. the SUNMatrix object (J or M) passed to ARKStepSetLinearSolver() or ARKStepSetMassLinearSolver() was not NULL.

At this point, we should emphasize that the ARKode user does not need to know anything about the usage of matrix functions by the ARKode code modules in order to use ARKode. The information is presented as an implementation detail for the interested reader.

10.8.1 List of matrix functions usage by ARKode code modules

    Routine            ARKLS   ARKBANDPRE   ARKBBDPRE
    SUNMatGetID        X
    SUNMatClone        X
    SUNMatDestroy      X       X            X
    SUNMatZero         X       X            X
    SUNMatCopy         X       X            X
    SUNMatScaleAddI    X       X            X
    SUNMatScaleAdd     1
    SUNMatMatvec       1
    SUNMatSpace        2       2            2

1. These matrix functions are only used for problems involving a non-identity mass matrix.
2. These matrix functions are optionally used, in that these are only called if they are implemented in the SUNMatrix module that is being used (i.e. their function pointers are non-NULL). If not supplied, these modules will assume that the matrix requires no storage.

We note that both the ARKBANDPRE and ARKBBDPRE preconditioner modules are hard-coded to use the SUNDIALS-supplied band SUNMatrix type, so the most useful information above for user-supplied SUNMatrix implementations is the column relating to ARKLS requirements.

CHAPTER ELEVEN
DESCRIPTION OF THE SUNLINEARSOLVER MODULE

For problems that require the solution of linear systems of equations, the SUNDIALS packages operate using generic linear solver modules defined through the SUNLinSol API. This allows SUNDIALS packages to utilize any valid SUNLinSol implementation that provides a set of required functions. These functions can be divided into three categories. The first are the core linear solver functions. The second group consists of “set” routines to supply the linear solver object with functions provided by the SUNDIALS package, or for modification of solver parameters. The last group consists of “get” routines for retrieving artifacts (statistics, residual vectors, etc.) from the linear solver. All of these functions are defined in the header file sundials/sundials_linearsolver.h.

The implementations provided with SUNDIALS work in coordination with the SUNDIALS generic N_Vector and SUNMatrix modules to provide a set of compatible data structures and solvers for the solution of linear systems using direct or iterative (matrix-based or matrix-free) methods.
Moreover, advanced users can provide a customized SUNLinearSolver implementation to any SUNDIALS package, particularly in cases where they provide their own N_Vector and/or SUNMatrix modules.

Historically, the SUNDIALS packages have been designed to specifically leverage the use of either direct linear solvers or matrix-free, scaled, preconditioned, iterative linear solvers. However, matrix-based iterative linear solvers are also supported.

The iterative linear solvers packaged with SUNDIALS leverage scaling and preconditioning, as applicable, to balance error between solution components and to accelerate convergence of the linear solver. To this end, instead of solving the linear system 𝐴𝑥 = 𝑏 directly, these apply the underlying iterative algorithm to the transformed system

\[ \tilde{A} \tilde{x} = \tilde{b} \tag{11.1} \]

where

\[ \tilde{A} = S_1 P_1^{-1} A P_2^{-1} S_2^{-1}, \qquad \tilde{b} = S_1 P_1^{-1} b, \qquad \tilde{x} = S_2 P_2 x, \tag{11.2} \]

and where
• $P_1$ is the left preconditioner,
• $P_2$ is the right preconditioner,
• $S_1$ is a diagonal matrix of scale factors for $P_1^{-1} b$,
• $S_2$ is a diagonal matrix of scale factors for $P_2 x$.

SUNDIALS solvers request that iterative linear solvers stop based on the 2-norm of the scaled preconditioned residual meeting a prescribed tolerance

\[ \left\| \tilde{b} - \tilde{A} \tilde{x} \right\|_2 < \text{tol}. \]

When provided an iterative SUNLinSol implementation that does not support the scaling matrices $S_1$ and $S_2$, SUNDIALS packages will adjust the value of tol accordingly (see the section Iterative linear solver tolerance for more details). In this case, they instead request that iterative linear solvers stop based on the criterion

\[ \left\| P_1^{-1} b - P_1^{-1} A x \right\|_2 < \text{tol}. \]

We note that the corresponding adjustments to tol in this case are non-optimal, in that they cannot balance error between specific entries of the solution 𝑥, only the aggregate error in the overall solution vector.
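To see how the two stopping tests are related, one can substitute the definitions in (11.2) into the residual of the transformed system (11.1). The right preconditioner and right scaling cancel, leaving a short worked identity:

```latex
\begin{align*}
\tilde{b} - \tilde{A}\tilde{x}
  &= S_1 P_1^{-1} b - \left( S_1 P_1^{-1} A P_2^{-1} S_2^{-1} \right) \left( S_2 P_2 x \right) \\
  &= S_1 P_1^{-1} b - S_1 P_1^{-1} A x \\
  &= S_1 P_1^{-1} \left( b - A x \right).
\end{align*}
```

The scaled preconditioned residual is therefore the original residual, left-preconditioned by $P_1$ and then scaled by $S_1$. Taking $S_1 = I$ gives $P_1^{-1} b - P_1^{-1} A x$, which is the quantity measured in the unscaled criterion.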
We further note that not all of the SUNDIALS-provided iterative linear solvers support the full range of the above options (e.g., separate left/right preconditioning), and that some of the SUNDIALS packages only utilize a subset of these options. Further details on these exceptions are described in the documentation for each SUNLinearSolver implementation, or for each SUNDIALS package. For users interested in providing their own SUNLinSol module, the following section presents the SUNLinSol API and its implementation beginning with the definition of SUNLinSol functions in sections SUNLinearSolver core functions – SUNLinearSolver get functions. This is followed by the definition of functions supplied to a linear solver implementation in section Functions provided by SUNDIALS packages. The linear solver return codes are described in section SUNLinearSolver return codes. The SUNLinearSolver type and the generic SUNLinSol module are defined in section The generic SUNLinearSolver module. The section Compatibility of SUNLinearSolver modules discusses compatibility between the SUNDIALS-provided SUNLinSol modules and SUNMATRIX modules. Section Implementing a custom SUNLinearSolver module lists the requirements for supplying a custom SUNLinSol module and discusses some intended use cases. Users wishing to supply their own SUNLinSol module are encouraged to use the SUNLinSol implementations provided with SUNDIALS as a template for supplying custom linear solver modules. The SUNLinSol functions required by this SUNDIALS package as well as other package specific details are given in section ARKode SUNLinearSolver interface. The remaining sections of this chapter present the SUNLinSol modules provided with SUNDIALS. 11.1 The SUNLinearSolver API The SUNLinSol API defines several linear solver operations that enable SUNDIALS packages to utilize any SUNLinSol implementation that provides the required functions. These functions can be divided into three categories. 
The first are the core linear solver functions. The second group of functions consists of set routines to supply the linear solver with functions provided by the SUNDIALS time integrators and to modify solver parameters. The final group consists of get routines for retrieving linear solver statistics. All of these functions are defined in the header file sundials/sundials_linearsolver.h.

11.1.1 SUNLinearSolver core functions

The core linear solver functions consist of four required routines to get the linear solver type (SUNLinSolGetType()), initialize the linear solver object once all solver-specific options have been set (SUNLinSolInitialize()), set up the linear solver object to utilize an updated matrix 𝐴 (SUNLinSolSetup()), and solve the linear system 𝐴𝑥 = 𝑏 (SUNLinSolSolve()). The remaining routine for destruction of the linear solver object (SUNLinSolFree()) is optional.

SUNLinearSolver_Type SUNLinSolGetType(SUNLinearSolver LS)
Returns the type identifier for the linear solver LS. It is used to determine the solver type (direct, iterative, or matrix-iterative) from the abstract SUNLinearSolver interface. Returned values are one of the following:
• SUNLINEARSOLVER_DIRECT – 0, the SUNLinSol module requires a matrix, and computes an ‘exact’ solution to the linear system defined by that matrix.
• SUNLINEARSOLVER_ITERATIVE – 1, the SUNLinSol module does not require a matrix (though one may be provided), and computes an inexact solution to the linear system using a matrix-free iterative algorithm. That is, it solves the linear system defined by the package-supplied ATimes routine (see SUNLinSolSetATimes() below), even if that linear system differs from the one encoded in the matrix object (if one is provided).
As the solver computes the solution only inexactly (or may diverge), the linear solver should check for solution convergence/accuracy as appropriate.
• SUNLINEARSOLVER_MATRIX_ITERATIVE – 2, the SUNLinSol module requires a matrix, and computes an inexact solution to the linear system defined by that matrix using an iterative algorithm. That is, it solves the linear system defined by the matrix object even if that linear system differs from that encoded by the package-supplied ATimes routine. As the solver computes the solution only inexactly (or may diverge), the linear solver should check for solution convergence/accuracy as appropriate.
Usage:
   type = SUNLinSolGetType(LS);
Notes: See section Intended use cases for more information on intended use cases corresponding to the linear solver type.

int SUNLinSolInitialize(SUNLinearSolver LS)
Performs linear solver initialization (assuming that all solver-specific options have been set). This should return zero for a successful call, and a negative value for a failure, ideally returning one of the generic error codes listed in section SUNLinearSolver return codes.
Usage:
   retval = SUNLinSolInitialize(LS);

int SUNLinSolSetup(SUNLinearSolver LS, SUNMatrix A)
Performs any linear solver setup needed, based on an updated system SUNMatrix A. This may be called frequently (e.g., with a full Newton method) or infrequently (for a modified Newton method), based on the type of integrator and/or nonlinear solver requesting the solves. This should return zero for a successful call, a positive value for a recoverable failure and a negative value for an unrecoverable failure, ideally returning one of the generic error codes listed in section SUNLinearSolver return codes.
Usage:
   retval = SUNLinSolSetup(LS, A);

int SUNLinSolSolve(SUNLinearSolver LS, SUNMatrix A, N_Vector x, N_Vector b, realtype tol)
This required function solves a linear system 𝐴𝑥 = 𝑏.
Arguments:
• LS – a SUNLinSol object.
• A – a SUNMatrix object.
• x – an N_Vector object containing the initial guess for the solution of the linear system, and the solution to the linear system upon return.
• b – an N_Vector object containing the linear system right-hand side.
• tol – the desired linear solver tolerance.
Return value: This should return zero for a successful call, a positive value for a recoverable failure and a negative value for an unrecoverable failure, ideally returning one of the generic error codes listed in section SUNLinearSolver return codes.
Direct solvers: can ignore the tol argument.
Matrix-free solvers: (those that identify as SUNLINEARSOLVER_ITERATIVE) can ignore the SUNMatrix input A, and should rely on the matrix-vector product function supplied through the routine SUNLinSolSetATimes().
Iterative solvers: (those that identify as SUNLINEARSOLVER_ITERATIVE or SUNLINEARSOLVER_MATRIX_ITERATIVE) should attempt to solve to the specified tolerance tol in a weighted 2-norm. If the solver does not support scaling then it should just use a 2-norm.
Usage:
   retval = SUNLinSolSolve(LS, A, x, b, tol);

int SUNLinSolFree(SUNLinearSolver LS)
Frees memory allocated by the linear solver. This should return zero for a successful call, and a negative value for a failure.
Usage:
   retval = SUNLinSolFree(LS);

11.1.2 SUNLinearSolver set functions

The following set functions are used to supply linear solver modules with functions defined by the SUNDIALS packages and to modify solver parameters. Only the routine for setting the matrix-vector product routine is required, and that is only for matrix-free linear solver modules. All other set functions are optional. SUNLinSol implementations that do not provide the functionality for any optional routine should leave the corresponding function pointer NULL instead of supplying a dummy routine.
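The return-value convention stated above for SUNLinSolSetup() and SUNLinSolSolve() (zero for success, a positive value for a recoverable failure, a negative value for an unrecoverable failure) could be handled by calling code roughly as follows. This is only an illustrative sketch in plain C that does not use the SUNDIALS headers: the names ls_outcome, classify_ls_retval and may_retry are hypothetical, and the retry policy shown is an assumption made for the example, not a description of what any SUNDIALS package does.

```c
#include <assert.h>

/* Hypothetical outcome categories; these are not SUNDIALS identifiers. */
typedef enum { LS_OK, LS_RECOVERABLE, LS_UNRECOVERABLE } ls_outcome;

/* Classify a flag using the convention described in the text:
   0 = success, > 0 = recoverable failure, < 0 = unrecoverable failure. */
ls_outcome classify_ls_retval(int retval)
{
  if (retval == 0) return LS_OK;
  return (retval > 0) ? LS_RECOVERABLE : LS_UNRECOVERABLE;
}

/* Assumed policy for illustration: a caller may reattempt only after a
   recoverable failure, and only while it has attempts remaining. */
int may_retry(int retval, int attempts_used, int max_attempts)
{
  assert(attempts_used >= 0 && max_attempts >= 0);
  return classify_ls_retval(retval) == LS_RECOVERABLE
         && attempts_used < max_attempts;
}
```

A caller would pass the flag returned by a setup or solve call to classify_ls_retval() and branch on the result, treating negative values as fatal.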
int SUNLinSolSetATimes(SUNLinearSolver LS, void* A_data, ATimesFn ATimes)
This function is required for matrix-free linear solvers; otherwise it is optional. Provides an ATimesFn function pointer, as well as a void* pointer to a data structure used by this routine, to a linear solver object. SUNDIALS packages will call this function to set the matrix-vector product function to either a solver-provided difference-quotient via vector operations or a user-supplied solver-specific routine. This routine should return zero for a successful call, and a negative value for a failure, ideally returning one of the generic error codes listed in section SUNLinearSolver return codes.
Usage:
   retval = SUNLinSolSetATimes(LS, A_data, ATimes);

int SUNLinSolSetPreconditioner(SUNLinearSolver LS, void* P_data, PSetupFn Pset, PSolveFn Psol)
This optional routine provides PSetupFn and PSolveFn function pointers that implement the preconditioner solves 𝑃₁⁻¹ and 𝑃₂⁻¹. This routine will be called by a SUNDIALS package, which will provide translation between the generic Pset and Psol calls and the package- or user-supplied routines. This routine should return zero for a successful call, and a negative value for a failure, ideally returning one of the generic error codes listed in section SUNLinearSolver return codes.
Usage:
   retval = SUNLinSolSetPreconditioner(LS, P_data, Pset, Psol);

int SUNLinSolSetScalingVectors(SUNLinearSolver LS, N_Vector s1, N_Vector s2)
This optional routine provides left/right scaling vectors for the linear system solve. Here, s1 and s2 are N_Vectors of positive scale factors containing the diagonal of the matrices 𝑆₁ and 𝑆₂, respectively. Neither of these vectors needs to be tested for positivity, and a NULL argument for either indicates that the corresponding scaling matrix is the identity.
This routine should return zero for a successful call, and a negative value for a failure, ideally returning one of the generic error codes listed in section SUNLinearSolver return codes.
Usage:
   retval = SUNLinSolSetScalingVectors(LS, s1, s2);

11.1.3 SUNLinearSolver get functions

The following get functions allow SUNDIALS packages to retrieve results from a linear solve. All routines are optional.

int SUNLinSolNumIters(SUNLinearSolver LS)
This optional routine should return the number of linear iterations performed in the last “solve” call.
Usage:
   its = SUNLinSolNumIters(LS);

realtype SUNLinSolResNorm(SUNLinearSolver LS)
This optional routine should return the final residual norm from the last “solve” call.
Usage:
   rnorm = SUNLinSolResNorm(LS);

N_Vector SUNLinSolResid(SUNLinearSolver LS)
If an iterative method computes the preconditioned initial residual and returns with a successful solve without performing any iterations (i.e., either the initial guess or the preconditioner is sufficiently accurate), then this optional routine may be called by the SUNDIALS package. This routine should return the N_Vector containing the preconditioned initial residual vector.
Usage:
   rvec = SUNLinSolResid(LS);
Note: since N_Vector is actually a pointer, and the results are not modified, this routine should not require additional memory allocation. If the SUNLinSol object does not retain a vector for this purpose, then this function pointer should be set to NULL in the implementation.

long int SUNLinSolLastFlag(SUNLinearSolver LS)
This optional routine should return the last error flag encountered within the linear solver. This is not called by the SUNDIALS packages directly; it allows the user to investigate linear solver issues after a failed solve.
Usage:
   lflag = SUNLinSolLastFlag(LS);

int SUNLinSolSpace(SUNLinearSolver LS, long int *lenrwLS, long int *leniwLS)
This optional routine should return the storage requirements for the linear solver LS. On return, lenrwLS points to a long int containing the number of realtype words and leniwLS points to a long int containing the number of integer words (lrw and liw in the usage below). The return value is an integer flag denoting success/failure of the operation. This function is advisory only, for use by users to help determine their total space requirements.
Usage:
   retval = SUNLinSolSpace(LS, &lrw, &liw);

11.1.4 Functions provided by SUNDIALS packages

To interface with SUNLinSol modules, the SUNDIALS packages supply a variety of routines for evaluating the matrix-vector product, and setting up and applying the preconditioner. These package-provided routines translate between the user-supplied ODE, DAE, or nonlinear systems and the generic interfaces to the linear systems of equations that result in their solution. The types for functions provided to a SUNLinSol module are defined in the header file sundials/sundials_iterative.h, and are described below.

typedef int (*ATimesFn)(void *A_data, N_Vector v, N_Vector z)
These functions compute the action of a matrix on a vector, performing the operation 𝑧 = 𝐴𝑣. Memory for z will already be allocated prior to calling this function. The parameter A_data is a pointer to any information about 𝐴 which the function needs in order to do its job. The vector 𝑣 should be left unchanged. This routine should return 0 if successful and a non-zero value if unsuccessful.

typedef int (*PSetupFn)(void *P_data)
These functions set up any requisite problem data in preparation for calls to the corresponding PSolveFn. This routine should return 0 if successful and a non-zero value if unsuccessful.
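As an illustration of the ATimesFn contract described above (compute 𝑧 = 𝐴𝑣 into pre-allocated storage, leave 𝑣 unchanged, return 0 on success), the following sketch applies a constant-coefficient tridiagonal matrix. To keep it compilable without SUNDIALS, a plain struct named vec stands in for N_Vector; vec, tridiag_data and tridiag_atimes are hypothetical names introduced here, and a real ATimesFn would take N_Vector arguments instead.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for N_Vector so that the sketch compiles without SUNDIALS. */
typedef struct { size_t n; double *data; } vec;

/* Hypothetical A_data payload: a constant-coefficient tridiagonal matrix. */
typedef struct { double lower, diag, upper; } tridiag_data;

/* Mirrors the shape of an ATimesFn: computes z = A*v, leaves v unchanged,
   and returns 0 on success or a nonzero value on failure. */
int tridiag_atimes(void *A_data, const vec *v, vec *z)
{
  const tridiag_data *A = (const tridiag_data *) A_data;

  /* The text states that memory for z is allocated by the caller. */
  assert(z != NULL && z->data != NULL);

  if (A == NULL || v == NULL || v->data == NULL || v->n != z->n) return -1;

  for (size_t i = 0; i < v->n; i++) {
    double zi = A->diag * v->data[i];
    if (i > 0)        zi += A->lower * v->data[i - 1];
    if (i + 1 < v->n) zi += A->upper * v->data[i + 1];
    z->data[i] = zi;
  }
  return 0;
}
```

The void* argument plays the same role as A_data in the typedef: it carries whatever the routine needs to know about 𝐴, so the solver itself never has to see a matrix object.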
typedef int (*PSolveFn)(void *P_data, N_Vector r, N_Vector z, realtype tol, int lr)
These functions solve the preconditioner equation 𝑃𝑧 = 𝑟 for the vector 𝑧. Memory for z will already be allocated prior to calling this function. The parameter P_data is a pointer to any information about 𝑃 which the function needs in order to do its job (set up by the corresponding PSetupFn). The parameter lr is input, and indicates whether 𝑃 is to be taken as the left or right preconditioner: lr = 1 for left and lr = 2 for right. If preconditioning is on one side only, lr can be ignored. If the preconditioner is iterative, then it should strive to solve the preconditioner equation so that ‖𝑃𝑧 − 𝑟‖_wrms < tol, where the error weight vector for the WRMS norm may be accessed from the main package memory structure. The vector r should not be modified by the PSolveFn. This routine should return 0 if successful and a non-zero value if unsuccessful. On a failure, a negative return value indicates an unrecoverable condition, while a positive value indicates a recoverable one, in which the calling routine may reattempt the solution after updating preconditioner data.

11.1.5 SUNLinearSolver return codes

The functions provided to SUNLinSol modules by each SUNDIALS package, and functions within the SUNDIALS-provided SUNLinSol implementations, utilize a common set of return codes, listed below. These adhere to a common pattern: 0 indicates success, a positive value corresponds to a recoverable failure, and a negative value indicates an unrecoverable failure. Aside from this pattern, the actual values of each error code are primarily to provide additional information to the user in case of a linear solver failure.
• SUNLS_SUCCESS (0) – successful call or converged solve
• SUNLS_MEM_NULL (-1) – the memory argument to the function is NULL
• SUNLS_ILL_INPUT (-2) – an illegal input has been provided to the function
• SUNLS_MEM_FAIL (-3) – failed memory access or allocation
• SUNLS_ATIMES_FAIL_UNREC (-4) – an unrecoverable failure occurred in the ATimes routine
• SUNLS_PSET_FAIL_UNREC (-5) – an unrecoverable failure occurred in the Pset routine
• SUNLS_PSOLVE_FAIL_UNREC (-6) – an unrecoverable failure occurred in the Psolve routine
• SUNLS_PACKAGE_FAIL_UNREC (-7) – an unrecoverable failure occurred in an external linear solver package
• SUNLS_GS_FAIL (-8) – a failure occurred during Gram-Schmidt orthogonalization (SPGMR/SPFGMR)
• SUNLS_QRSOL_FAIL (-9) – a singular 𝑅 matrix was encountered in a QR factorization (SPGMR/SPFGMR)
• SUNLS_RES_REDUCED (1) – an iterative solver reduced the residual, but did not converge to the desired tolerance
• SUNLS_CONV_FAIL (2) – an iterative solver did not converge (and the residual was not reduced)
• SUNLS_ATIMES_FAIL_REC (3) – a recoverable failure occurred in the ATimes routine
• SUNLS_PSET_FAIL_REC (4) – a recoverable failure occurred in the Pset routine
• SUNLS_PSOLVE_FAIL_REC (5) – a recoverable failure occurred in the Psolve routine
• SUNLS_PACKAGE_FAIL_REC (6) – a recoverable failure occurred in an external linear solver package
• SUNLS_QRFACT_FAIL (7) – a singular matrix was encountered during a QR factorization (SPGMR/SPFGMR)
• SUNLS_LUFACT_FAIL (8) – a singular matrix was encountered during an LU factorization

11.1.6 The generic SUNLinearSolver module

SUNDIALS packages interact with specific SUNLinSol implementations through the generic SUNLinSol module on which all other SUNLinSol implementations are built.
The SUNLinearSolver type is a pointer to a structure containing an implementation-dependent content field, and an ops field. The type SUNLinearSolver is defined as

   typedef struct _generic_SUNLinearSolver *SUNLinearSolver;

   struct _generic_SUNLinearSolver {
     void *content;
     struct _generic_SUNLinearSolver_Ops *ops;
   };

where the _generic_SUNLinearSolver_Ops structure is a list of pointers to the various actual linear solver operations provided by a specific implementation. The _generic_SUNLinearSolver_Ops structure is defined as

   struct _generic_SUNLinearSolver_Ops {
     SUNLinearSolver_Type (*gettype)(SUNLinearSolver);
     int                  (*setatimes)(SUNLinearSolver, void*, ATimesFn);
     int                  (*setpreconditioner)(SUNLinearSolver, void*, PSetupFn, PSolveFn);
     int                  (*setscalingvectors)(SUNLinearSolver, N_Vector, N_Vector);
     int                  (*initialize)(SUNLinearSolver);
     int                  (*setup)(SUNLinearSolver, SUNMatrix);
     int                  (*solve)(SUNLinearSolver, SUNMatrix, N_Vector, N_Vector, realtype);
     int                  (*numiters)(SUNLinearSolver);
     realtype             (*resnorm)(SUNLinearSolver);
     long int             (*lastflag)(SUNLinearSolver);
     int                  (*space)(SUNLinearSolver, long int*, long int*);
     N_Vector             (*resid)(SUNLinearSolver);
     int                  (*free)(SUNLinearSolver);
   };

The generic SUNLinSol module defines and implements the linear solver operations defined in Sections SUNLinearSolver core functions through SUNLinearSolver get functions. These routines are in fact only wrappers to the linear solver operations defined by a particular SUNLinSol implementation, which are accessed through the ops field of the SUNLinearSolver structure. To illustrate this point we show below the implementation of a typical linear solver operation from the generic SUNLinearSolver module, namely SUNLinSolInitialize, which initializes a SUNLinearSolver object for use after it has been created and configured, and returns a flag denoting a successful or failed operation:

   int SUNLinSolInitialize(SUNLinearSolver S)
   {
     return ((int) S->ops->initialize(S));
   }

11.1.7 Compatibility of SUNLinearSolver modules

We note that not all SUNLinearSolver types are compatible with all SUNMatrix and N_Vector types provided with SUNDIALS. In Table Compatible SUNLinearSolver and SUNMatrix implementations we show the matrix-based linear solvers available as SUNLinearSolver modules, and the compatible matrix implementations. Recall that Table SUNDIALS linear solver interfaces and vector implementations that can be used for each shows the compatibility between all SUNLinearSolver modules and vector implementations.

Compatible SUNLinearSolver and SUNMatrix implementations

   Linear Solver    Dense    Banded    Sparse    User Supplied
   Dense            X                            X
   LapackDense      X                            X
   Band                      X                   X
   LapackBand                X                   X
   KLU                                 X         X
   SuperLU_MT                          X         X
   User supplied    X        X         X         X

11.1.8 Implementing a custom SUNLinearSolver module

A particular implementation of the SUNLinearSolver module must:
• Specify the content field of the SUNLinSol module.
• Define and implement the required linear solver operations. See the section ARKode SUNLinearSolver interface to determine which SUNLinSol operations are required for this SUNDIALS package. Note that the names of these routines should be unique to that implementation in order to permit using more than one SUNLinSol module (each with different SUNLinearSolver internal data representations) in the same code.
• Define and implement user-callable constructor and destructor routines to create and free a SUNLinearSolver with the new content field and with ops pointing to the new linear solver operations.

We note that the function pointers for all unsupported optional routines should be set to NULL in the ops structure. This allows the SUNDIALS package that is using the SUNLinSol object to know that the associated functionality is not supported.
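The requirements listed above (a content structure, implementation-specific operations, a constructor and destructor, and NULL pointers for unsupported optional routines) can be sketched in miniature as follows. This is a simplified, self-contained illustration and not SUNDIALS code: demo_solver and demo_ops are hypothetical stand-ins for SUNLinearSolver and its ops table with only three operations, and the "diag" implementation names are invented for the example. A real module would include sundials/sundials_linearsolver.h and fill in the full ops structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the generic object and its ops table. */
typedef struct demo_solver_ *demo_solver;

struct demo_ops {
  int (*initialize)(demo_solver);  /* required in this sketch */
  int (*numiters)(demo_solver);    /* optional: may be left NULL */
  int (*free)(demo_solver);        /* destructor */
};

struct demo_solver_ {
  void *content;
  struct demo_ops *ops;
};

/* Implementation-specific content; routine names carry a unique suffix. */
typedef struct { long int last_flag; } diag_content;

static int diag_initialize(demo_solver S)
{
  ((diag_content *) S->content)->last_flag = 0;
  return 0;
}

static int diag_free(demo_solver S)
{
  if (S == NULL) return 0;
  free(S->content);
  free(S->ops);
  free(S);
  return 0;
}

/* Constructor: allocates the object, attaches content and ops, and leaves
   the unsupported optional routine NULL. Returns NULL on failure. */
demo_solver demo_solver_diag(void)
{
  demo_solver S = malloc(sizeof *S);
  struct demo_ops *ops = malloc(sizeof *ops);
  diag_content *c = malloc(sizeof *c);
  if (S == NULL || ops == NULL || c == NULL) {
    free(S); free(ops); free(c);
    return NULL;
  }
  ops->initialize = diag_initialize;
  ops->numiters   = NULL;  /* not supported by this implementation */
  ops->free       = diag_free;
  c->last_flag = 0;
  S->content = c;
  S->ops = ops;
  return S;
}

/* Caller-side pattern: test an optional pointer before using it. */
int demo_num_iters_or_zero(demo_solver S)
{
  assert(S != NULL && S->ops != NULL);
  return (S->ops->numiters != NULL) ? S->ops->numiters(S) : 0;
}
```

Leaving numiters NULL rather than supplying a dummy routine is what lets the calling code distinguish "not supported" from "supported and returned zero".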
Additionally, a SUNLinearSolver implementation may do the following:
• Define and implement additional user-callable “set” routines acting on the SUNLinearSolver, e.g., for setting various configuration options to tune the linear solver to a particular problem.
• Provide additional user-callable “get” routines acting on the SUNLinearSolver object, e.g., for returning various solve statistics.

Intended use cases

The SUNLinSol (and SUNMATRIX) APIs are designed to require a minimal set of routines to ease interfacing with custom or third-party linear solver libraries. External solvers provide similar routines with the necessary functionality and thus will require minimal effort to wrap within custom SUNMATRIX and SUNLinSol implementations. Sections SUNMATRIX functions required by ARKode and ARKode SUNLinearSolver interface include a list of the required set of routines that compatible SUNMATRIX and SUNLinSol implementations must provide. As SUNDIALS packages utilize generic SUNLinSol modules allowing for user-supplied SUNLinearSolver implementations, there exists a wide range of possible linear solver combinations. Some intended use cases for both the SUNDIALS-provided and user-supplied SUNLinSol modules are discussed in the following sections.

Direct linear solvers

Direct linear solver modules require a matrix and compute an ‘exact’ solution to the linear system defined by the matrix. Multiple matrix formats and associated direct linear solvers are supplied with SUNDIALS through different SUNMATRIX and SUNLinSol implementations. SUNDIALS packages strive to amortize the high cost of matrix construction by reusing matrix information for multiple nonlinear iterations. As a result, each package’s linear solver interface recomputes Jacobian information as infrequently as possible.
Alternative matrix storage formats and compatible linear solvers that are not currently provided by or interfaced with SUNDIALS can leverage this infrastructure with minimal effort. To do so, a user must implement custom SUNMATRIX and SUNLinSol wrappers for the desired matrix format and/or linear solver following the APIs described in the sections Matrix Data Structures and Description of the SUNLinearSolver module. This user-supplied SUNLinSol module must then self-identify as having SUNLINEARSOLVER_DIRECT type.

Matrix-free iterative linear solvers

Matrix-free iterative linear solver modules do not require a matrix and compute an inexact solution to the linear system defined by the package-supplied ATimes routine. SUNDIALS supplies multiple scaled, preconditioned iterative linear solver (spils) SUNLinSol modules that support scaling to allow users to handle non-dimensionalization (as best as possible) within each SUNDIALS package and retain variables and define equations as desired in their applications. For linear solvers that do not support left/right scaling, the tolerance supplied to the linear solver is adjusted to compensate (see section Iterative linear solver tolerance for more details); however, this use case may be non-optimal and cannot handle situations where the magnitudes of different solution components or equations vary dramatically within a single problem.

To utilize alternative linear solvers that are not currently provided by or interfaced with SUNDIALS a user must implement a custom SUNLinSol wrapper for the linear solver following the API described in the section Description of the SUNLinearSolver module. This user-supplied SUNLinSol module must then self-identify as having SUNLINEARSOLVER_ITERATIVE type.

Matrix-based iterative linear solvers (reusing 𝐴)

Matrix-based iterative linear solver modules require a matrix and compute an inexact solution to the linear system defined by the matrix.
This matrix will be updated infrequently and reused across multiple solves to amortize the cost of matrix construction. As in the direct linear solver case, only wrappers for the matrix and linear solver in SUNMATRIX and SUNLinSol implementations need to be created to utilize a new linear solver. This user-supplied SUNLinSol module must then self-identify as having SUNLINEARSOLVER_MATRIX_ITERATIVE type.

At present, SUNDIALS has one example problem that uses this approach for wrapping a structured-grid matrix, linear solver, and preconditioner from the hypre library that may be used as a template for other customized implementations (see examples/arkode/CXX_parhyp/ark_heat2D_hypre.cpp).

Matrix-based iterative linear solvers (current 𝐴)

For users who wish to utilize a matrix-based iterative linear solver module where the matrix is purely for preconditioning and the linear system is defined by the package-supplied ATimes routine, we envision two current possibilities.

The preferred approach is for users to employ one of the SUNDIALS scaled, preconditioned iterative linear solver (spils) implementations (SUNLinSol_SPGMR(), SUNLinSol_SPFGMR(), SUNLinSol_SPBCGS(), SUNLinSol_SPTFQMR(), or SUNLinSol_PCG()) as the outer solver. The creation and storage of the preconditioner matrix, and interfacing with the corresponding linear solver, can be handled through a package’s preconditioner ‘setup’ and ‘solve’ functionality (see the sections Preconditioner setup (iterative linear solvers) and Preconditioner solve (iterative linear solvers), respectively) without creating SUNMATRIX and SUNLinSol implementations. This usage mode is recommended primarily because the SUNDIALS-provided spils modules support the scaling as described above.

A second approach supported by the linear solver APIs is as follows.
If the SUNLinSol implementation is matrix-based, self-identifies as having SUNLINEARSOLVER_ITERATIVE type, and also provides a non-NULL SUNLinSolSetATimes() routine, then each SUNDIALS package will call that routine to attach its package-specific matrix-vector product routine to the SUNLinSol object. The SUNDIALS package will then call the SUNLinSol-provided SUNLinSolSetup() routine (infrequently) to update matrix information, but will provide current matrix-vector products to the SUNLinSol implementation through the package-supplied ATimesFn routine.

11.2 ARKode SUNLinearSolver interface

In the table below, we list the SUNLinSol module linear solver functions used within the ARKLS interface. As with the SUNMATRIX module, we emphasize that the ARKode user does not need to know detailed usage of linear solver functions by the ARKode code modules in order to use ARKode. The information is presented as an implementation detail for the interested reader.

The linear solver functions listed below are marked with “X” to indicate that they are required, or with “O” to indicate that they are only called if they are non-NULL in the SUNLinearSolver implementation that is being used. Bracketed numbers refer to the notes following the table.

   Routine                        DIRECT    ITERATIVE    MATRIX_ITERATIVE
   SUNLinSolGetType               X         X            X
   SUNLinSolSetATimes             O         X            O
   SUNLinSolSetPreconditioner     O         O            O
   SUNLinSolSetScalingVectors     O         O            O
   SUNLinSolInitialize            X         X            X
   SUNLinSolSetup                 X         X            X
   SUNLinSolSolve                 X         X            X
   SUNLinSolNumIters [1]                    O            O
   SUNLinSolResNorm [2]                     O            O
   SUNLinSolLastFlag [3]
   SUNLinSolFree [4]
   SUNLinSolSpace                 O         O            O

Notes:
1. SUNLinSolNumIters() is only used to accumulate overall iterative linear solver statistics. If it is not implemented by the SUNLinearSolver module, then ARKLS will consider all solves as requiring zero iterations.
2. Although SUNLinSolResNorm() is optional, if it is not implemented by the SUNLinearSolver then ARKLS will consider all solves as being exact.
3. Although ARKLS does not call SUNLinSolLastFlag() directly, this routine is available for users to query linear solver failure modes directly.
4. Although ARKLS does not call SUNLinSolFree() directly, this routine should be available for users to call when cleaning up from a simulation.

Since there is a wide range of potential SUNLinSol use cases, the following subsections describe some details of the ARKLS interface, in the case that interested users wish to develop custom SUNLinSol modules.

11.2.1 Lagged matrix information

If the SUNLinSol identifies as having type SUNLINEARSOLVER_DIRECT or SUNLINEARSOLVER_MATRIX_ITERATIVE, then the SUNLinSol object solves a linear system defined by a SUNMATRIX object. ARKLS will update the matrix information infrequently according to the strategies outlined in the section Updating the linear solver. When solving a linear system

$$\tilde{\mathcal{A}} \tilde{x} = b \quad\Leftrightarrow\quad (M - \tilde{\gamma} J)\, \tilde{x} = b$$

it is likely that the value $\tilde{\gamma}$ used to construct $\tilde{\mathcal{A}}$ differs from the current value of $\gamma$ in the RK method, since $\tilde{\mathcal{A}}$ is updated infrequently. Therefore, after calling the SUNLinSol-provided SUNLinSolSolve() routine, we test whether $\gamma/\tilde{\gamma} \ne 1$, and if this is the case we scale the solution $\tilde{x}$ to obtain the desired linear system solution $x$ via

$$x = \frac{2}{1 + \gamma/\tilde{\gamma}}\, \tilde{x}. \qquad (11.3)$$

For values of $\gamma/\tilde{\gamma}$ that are “close” to 1, this rescaling approximately solves the original linear system, as discussed below. We first note that the equation (11.3) is equivalent to

$$\tilde{x} = \frac{1}{2} x + \frac{\gamma}{2\tilde{\gamma}} x.$$

Adding the two equations $(M - \gamma J)x = b$ and $(M - \tilde{\gamma} J)\tilde{x} = b$, and inserting the above relationship, we have

$$\begin{aligned}
2b &= (M - \gamma J)x + (M - \tilde{\gamma} J)\tilde{x} \\
   &= Mx - \gamma Jx + M\tilde{x} - J(\tilde{\gamma}\tilde{x}) \\
   &= \frac{3}{2}(M - \gamma J)x + \frac{1}{2}\left(\frac{\gamma}{\tilde{\gamma}} M - \tilde{\gamma} J\right) x \\
   &= \frac{3}{2} b + \frac{1}{2}\left(\frac{\gamma}{\tilde{\gamma}} M - \tilde{\gamma} J\right) x.
\end{aligned}$$

When $\gamma/\tilde{\gamma} \approx 1$, this latter term is approximately equal to $\frac{1}{2} b$.
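The rescaling (11.3) can be tried out on a scalar problem, where the mass matrix and Jacobian are plain numbers. The sketch below is purely illustrative and is not the ARKLS implementation: the function names and the scalar setting are assumptions made here so that the lagged solve, the rescaled value, and the solution of the current system can be compared directly.

```c
#include <assert.h>

/* Scalar illustration with M = m and J = j (plain doubles).
   Solves the lagged system (m - gamma_lag*j) * x_lag = b and, when
   gamma/gamma_lag differs from 1, applies the rescaling
   x = 2/(1 + gamma/gamma_lag) * x_lag from (11.3). */
double lagged_solve_rescaled(double m, double j, double gamma,
                             double gamma_lag, double b)
{
  assert(gamma_lag != 0.0);
  double x_lag = b / (m - gamma_lag * j);
  double ratio = gamma / gamma_lag;
  return (ratio != 1.0) ? 2.0 / (1.0 + ratio) * x_lag : x_lag;
}

/* Reference solution of the current system (m - gamma*j) * x = b. */
double current_solve(double m, double j, double gamma, double b)
{
  return b / (m - gamma * j);
}
```

For example, with m = 1, j = -100, a lagged value of 0.1 and a current value of 0.11 for γ, and b = 1, the rescaled result lies between the unscaled lagged solve and the solution of the current system, so the rescaling reduces (but does not remove) the error from using lagged matrix information in this case.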

11.2.2 Iterative linear solver tolerance

If the SUNLinSol object self-identifies as having type SUNLINEARSOLVER_ITERATIVE or SUNLINEARSOLVER_MATRIX_ITERATIVE, then ARKLS will set the input tolerance delta as described in Linear iteration error control. However, if the iterative linear solver does not support scaling matrices (i.e., the SUNLinSolSetScalingVectors() routine is NULL), then ARKLS will attempt to adjust the linear solver tolerance to account for this lack of functionality. To this end, the following assumptions are made:
• The units of the IVP solution and linear residual are the same (i.e., the error and residual weight vectors in section Error norms are the same); this is automatically satisfied with an identity mass matrix, $M = I$, or similar.
• All solution components have similar magnitude; hence the error weight vector $w$ used in the WRMS norm (see the section Error norms) should satisfy the assumption $w_i \approx w_{mean}$, for $i = 0, \ldots, n-1$.
• The SUNLinSol object uses a standard 2-norm to measure convergence.

Under these assumptions, ARKLS uses identical left and right scaling matrices, $S_1 = S_2 = S = \mathrm{diag}(w)$, so the linear solver convergence requirement is converted as follows (using the notation from the beginning of this chapter):

$$\begin{aligned}
& \left\| \tilde{b} - \tilde{A}\tilde{x} \right\|_2 < \text{tol} \\
\Leftrightarrow\quad & \left\| S P_1^{-1} b - S P_1^{-1} A x \right\|_2 < \text{tol} \\
\Leftrightarrow\quad & \sum_{i=0}^{n-1} \left[ w_i \left( P_1^{-1}(b - Ax) \right)_i \right]^2 < \text{tol}^2 \\
\Leftrightarrow\quad & w_{mean}^2 \sum_{i=0}^{n-1} \left[ \left( P_1^{-1}(b - Ax) \right)_i \right]^2 < \text{tol}^2 \\
\Leftrightarrow\quad & \sum_{i=0}^{n-1} \left[ \left( P_1^{-1}(b - Ax) \right)_i \right]^2 < \left( \frac{\text{tol}}{w_{mean}} \right)^2 \\
\Leftrightarrow\quad & \left\| P_1^{-1}(b - Ax) \right\|_2 < \frac{\text{tol}}{w_{mean}}
\end{aligned}$$

Therefore the tolerance scaling factor $w_{mean} = \|w\|_2 / \sqrt{n}$ is computed and the scaled tolerance $\text{delta} = \text{tol} / w_{mean}$ is supplied to the SUNLinSol object.

11.3 The SUNLinSol_Dense Module

The dense implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_Dense, is designed to be used with the corresponding SUNMATRIX_DENSE matrix type, and one of the serial or shared-memory N_Vector implementations (NVECTOR_SERIAL, NVECTOR_OPENMP or NVECTOR_PTHREADS).

11.3.1 SUNLinSol_Dense Usage

The header file to be included when using this module is sunlinsol/sunlinsol_dense.h. The SUNLinSol_Dense module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsoldense module library.

The module SUNLinSol_Dense provides the following user-callable constructor routine:

SUNLinearSolver SUNLinSol_Dense(N_Vector y, SUNMatrix A)
This function creates and allocates memory for a dense SUNLinearSolver. Its arguments are an N_Vector and SUNMatrix, which it uses to determine the linear system size and to assess compatibility with the linear solver implementation. This routine will perform consistency checks to ensure that it is called with consistent N_Vector and SUNMatrix implementations. These are currently limited to the SUNMATRIX_DENSE matrix type and the NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS vector types. As additional compatible matrix and vector implementations are added to SUNDIALS, these will be included within this compatibility check. If either A or y is incompatible then this routine will return NULL.

For backwards compatibility, we also provide the wrapper function,

SUNLinearSolver SUNDenseLinearSolver(N_Vector y, SUNMatrix A)
Wrapper function for SUNLinSol_Dense(), with identical input and output arguments.

For solvers that include a Fortran interface module, the SUNLinSol_Dense module also includes the Fortran-callable function FSUNDenseLinSolInit() to initialize this SUNLinSol_Dense module for a given SUNDIALS solver.

subroutine FSUNDenseLinSolInit(CODE, IER)
Initializes a dense SUNLinearSolver structure for use in a SUNDIALS package. This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassDenseLinSolInit() initializes this SUNLinSol_Dense module for solving mass matrix linear systems.

subroutine FSUNMassDenseLinSolInit(IER)
Initializes a dense SUNLinearSolver structure for use in solving mass matrix systems in ARKode. This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• IER (int, output) – return flag (0 success, -1 for failure).

11.3.2 SUNLinSol_Dense Description

The SUNLinSol_Dense module defines the content field of a SUNLinearSolver to be the following structure:

   struct _SUNLinearSolverContent_Dense {
     sunindextype N;
     sunindextype *pivots;
     long int last_flag;
   };

These entries of the content field contain the following information:
• N - size of the linear system,
• pivots - index array for partial pivoting in LU factorization,
• last_flag - last error return flag from internal function evaluations.

This solver is constructed to perform the following operations:
• The “setup” call performs an 𝐿𝑈 factorization with partial (row) pivoting (𝒪(𝑁³) cost), 𝑃𝐴 = 𝐿𝑈, where 𝑃 is a permutation matrix, 𝐿 is a lower triangular matrix with 1’s on the diagonal, and 𝑈 is an upper triangular matrix. This factorization is stored in-place on the input SUNMATRIX_DENSE object 𝐴, with pivoting information encoding 𝑃 stored in the pivots array.
• The “solve” call performs pivoting and forward and backward substitution using the stored pivots array and the $LU$ factors held in the SUNMATRIX_DENSE object ($\mathcal{O}(N^2)$ cost).

The SUNLinSol_Dense module defines dense implementations of all “direct” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_Dense
• SUNLinSolInitialize_Dense – this does nothing, since all consistency checks are performed at solver creation.
• SUNLinSolSetup_Dense – this performs the $LU$ factorization.
• SUNLinSolSolve_Dense – this uses the $LU$ factors and pivots array to perform the solve.
• SUNLinSolLastFlag_Dense
• SUNLinSolSpace_Dense – this only returns information for the storage within the solver object, i.e. storage for N, last_flag, and pivots.
• SUNLinSolFree_Dense

11.4 The SUNLinSol_Band Module

The band implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_Band, is designed to be used with the corresponding SUNMATRIX_BAND matrix type, and one of the serial or shared-memory N_Vector implementations (NVECTOR_SERIAL, NVECTOR_OPENMP or NVECTOR_PTHREADS).

11.4.1 SUNLinSol_Band Usage

The header file to be included when using this module is sunlinsol/sunlinsol_band.h. The SUNLinSol_Band module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsolband module library.

The module SUNLinSol_Band provides the following user-callable constructor routine:

SUNLinearSolver SUNLinSol_Band(N_Vector y, SUNMatrix A)
This function creates and allocates memory for a band SUNLinearSolver. Its arguments are an N_Vector and a SUNMatrix, which it uses to determine the linear system size and to assess compatibility with the linear solver implementation.
This routine performs consistency checks to ensure that it is called with consistent N_Vector and SUNMatrix implementations. These are currently limited to the SUNMATRIX_BAND matrix type and the NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS vector types.
As additional compatible matrix and vector implementations are added to SUNDIALS, these will be included within this compatibility check. Additionally, this routine will verify that the input matrix A is allocated with appropriate upper bandwidth storage for the $LU$ factorization.
If either A or y is incompatible then this routine will return NULL.

For backwards compatibility, we also provide the wrapper function,

SUNLinearSolver SUNBandLinearSolver(N_Vector y, SUNMatrix A)
Wrapper function for SUNLinSol_Band(), with identical input and output arguments.

For solvers that include a Fortran interface module, the SUNLinSol_Band module also includes the Fortran-callable function FSUNBandLinSolInit() to initialize this SUNLinSol_Band module for a given SUNDIALS solver.

subroutine FSUNBandLinSolInit(CODE, IER)
Initializes a banded SUNLinearSolver structure for use in a SUNDIALS package.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassBandLinSolInit() initializes this SUNLinSol_Band module for solving mass matrix linear systems.

subroutine FSUNMassBandLinSolInit(IER)
Initializes a banded SUNLinearSolver structure for use in solving mass matrix systems in ARKode.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• IER (int, output) – return flag (0 success, -1 for failure).
11.4.2 SUNLinSol_Band Description

The SUNLinSol_Band module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_Band {
  sunindextype N;
  sunindextype *pivots;
  long int last_flag;
};

These entries of the content field contain the following information:
• N - size of the linear system,
• pivots - index array for partial pivoting in LU factorization,
• last_flag - last error return flag from internal function evaluations.

This solver is constructed to perform the following operations:
• The “setup” call performs an $LU$ factorization with partial (row) pivoting, $PA = LU$, where $P$ is a permutation matrix, $L$ is a lower triangular matrix with 1’s on the diagonal, and $U$ is an upper triangular matrix. This factorization is stored in-place on the input SUNMATRIX_BAND object $A$, with pivoting information encoding $P$ stored in the pivots array.
• The “solve” call performs pivoting and forward and backward substitution using the stored pivots array and the $LU$ factors held in the SUNMATRIX_BAND object.
• $A$ must be allocated to accommodate the increase in upper bandwidth that occurs during factorization. More precisely, if $A$ is a band matrix with upper bandwidth mu and lower bandwidth ml, then the upper triangular factor $U$ can have upper bandwidth as big as smu = MIN(N-1,mu+ml). The lower triangular factor $L$ has lower bandwidth ml.

The SUNLinSol_Band module defines band implementations of all “direct” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_Band
• SUNLinSolInitialize_Band – this does nothing, since all consistency checks are performed at solver creation.
• SUNLinSolSetup_Band – this performs the $LU$ factorization.
• SUNLinSolSolve_Band – this uses the $LU$ factors and pivots array to perform the solve.
• SUNLinSolLastFlag_Band
• SUNLinSolSpace_Band – this only returns information for the storage within the solver object, i.e. storage for N, last_flag, and pivots.
• SUNLinSolFree_Band

11.5 The SUNLinSol_LapackDense Module

The LAPACK dense implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_LapackDense, is designed to be used with the corresponding SUNMATRIX_DENSE matrix type, and one of the serial or shared-memory N_Vector implementations (NVECTOR_SERIAL, NVECTOR_OPENMP, or NVECTOR_PTHREADS).

11.5.1 SUNLinSol_LapackDense Usage

The header file to be included when using this module is sunlinsol/sunlinsol_lapackdense.h. The installed module library to link to is libsundials_sunlinsollapackdense.lib, where .lib is typically .so for shared libraries and .a for static libraries.

The module SUNLinSol_LapackDense provides the following additional user-callable constructor routine:

SUNLinearSolver SUNLinSol_LapackDense(N_Vector y, SUNMatrix A)
This function creates and allocates memory for a LAPACK dense SUNLinearSolver. Its arguments are an N_Vector and a SUNMatrix, which it uses to determine the linear system size and to assess compatibility with the linear solver implementation.
This routine performs consistency checks to ensure that it is called with consistent N_Vector and SUNMatrix implementations. These are currently limited to the SUNMATRIX_DENSE matrix type and the NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS vector types. As additional compatible matrix and vector implementations are added to SUNDIALS, these will be included within this compatibility check.
If either A or y is incompatible then this routine will return NULL.

For backwards compatibility, we also provide the wrapper function,

SUNLinearSolver SUNLapackDense(N_Vector y, SUNMatrix A)
Wrapper function for SUNLinSol_LapackDense(), with identical input and output arguments.
For solvers that include a Fortran interface module, the SUNLinSol_LapackDense module also includes the Fortran-callable function FSUNLapackDenseInit() to initialize this SUNLinSol_LapackDense module for a given SUNDIALS solver.

subroutine FSUNLapackDenseInit(CODE, IER)
Initializes a dense LAPACK SUNLinearSolver structure for use in a SUNDIALS package.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassLapackDenseInit() initializes this SUNLinSol_LapackDense module for solving mass matrix linear systems.

subroutine FSUNMassLapackDenseInit(IER)
Initializes a dense LAPACK SUNLinearSolver structure for use in solving mass matrix systems in ARKode.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• IER (int, output) – return flag (0 success, -1 for failure).

11.5.2 SUNLinSol_LapackDense Description

The SUNLinSol_LapackDense module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_Dense {
  sunindextype N;
  sunindextype *pivots;
  long int last_flag;
};

These entries of the content field contain the following information:
• N - size of the linear system,
• pivots - index array for partial pivoting in LU factorization,
• last_flag - last error return flag from internal function evaluations.
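To make the roles of N, pivots and last_flag concrete, the following is a minimal plain-C sketch of an in-place LU factorization with partial (row) pivoting and the matching solve. It is an illustration only, not the SUNDIALS or LAPACK implementation: the DenseContent type and the dense_setup() and dense_solve() names are hypothetical, the matrix is assumed to be a row-major array of doubles, and the convention of storing k+1 in last_flag on a zero pivot is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the solver content (N, pivots, last_flag). */
typedef struct {
  long N;         /* size of the linear system */
  long *pivots;   /* pivots[k] = row swapped with row k at step k */
  long last_flag; /* 0 on success; k+1 if column k had no nonzero pivot */
} DenseContent;

static double ex_abs(double v) { return v < 0.0 ? -v : v; }

/* "setup": overwrite the row-major N x N array a with the factors of
   PA = LU (unit diagonal of L not stored). Returns 0 on success. */
int dense_setup(DenseContent *c, double *a) {
  assert(c != NULL && a != NULL && c->pivots != NULL);
  long n = c->N;
  for (long k = 0; k < n; k++) {
    long p = k; /* find the largest entry in column k, rows k..n-1 */
    for (long i = k + 1; i < n; i++)
      if (ex_abs(a[i * n + k]) > ex_abs(a[p * n + k])) p = i;
    c->pivots[k] = p;
    if (a[p * n + k] == 0.0) { c->last_flag = k + 1; return 1; }
    if (p != k) /* swap full rows k and p */
      for (long j = 0; j < n; j++) {
        double t = a[k * n + j];
        a[k * n + j] = a[p * n + j];
        a[p * n + j] = t;
      }
    for (long i = k + 1; i < n; i++) { /* eliminate below the pivot */
      a[i * n + k] /= a[k * n + k];
      for (long j = k + 1; j < n; j++)
        a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
  }
  c->last_flag = 0;
  return 0;
}

/* "solve": overwrite b with the solution of A x = b using the factors. */
void dense_solve(const DenseContent *c, const double *a, double *b) {
  long n = c->N;
  for (long k = 0; k < n; k++) { /* apply the row permutation P */
    long p = c->pivots[k];
    if (p != k) { double t = b[k]; b[k] = b[p]; b[p] = t; }
  }
  for (long i = 1; i < n; i++) /* forward substitution: L y = P b */
    for (long j = 0; j < i; j++) b[i] -= a[i * n + j] * b[j];
  for (long i = n - 1; i >= 0; i--) { /* backward substitution: U x = y */
    for (long j = i + 1; j < n; j++) b[i] -= a[i * n + j] * b[j];
    b[i] /= a[i * n + i];
  }
}
```

The split mirrors the cost structure described for the dense solvers: the factorization is the expensive step and is done once per matrix, while each additional right-hand side only needs the cheaper substitution pass.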
The SUNLinSol_LapackDense module is a SUNLinearSolver wrapper for the LAPACK dense matrix factorization and solve routines, *GETRF and *GETRS, where * is either D or S, depending on whether SUNDIALS was configured to have realtype set to double or single, respectively (see section Data Types for details). In order to use the SUNLinSol_LapackDense module it is assumed that LAPACK has been installed on the system prior to installation of SUNDIALS, and that SUNDIALS has been configured appropriately to link with LAPACK (see section Working with external Libraries for details). We note that since there do not exist 128-bit floating-point factorization and solve routines in LAPACK, this interface cannot be compiled when using extended precision for realtype. Similarly, since there do not exist 64-bit integer LAPACK routines, the SUNLinSol_LapackDense module also cannot be compiled when using int64_t for the sunindextype.

This solver is constructed to perform the following operations:
• The “setup” call performs an $LU$ factorization with partial (row) pivoting ($\mathcal{O}(N^3)$ cost), $PA = LU$, where $P$ is a permutation matrix, $L$ is a lower triangular matrix with 1’s on the diagonal, and $U$ is an upper triangular matrix. This factorization is stored in-place on the input SUNMATRIX_DENSE object $A$, with pivoting information encoding $P$ stored in the pivots array.
• The “solve” call performs pivoting and forward and backward substitution using the stored pivots array and the $LU$ factors held in the SUNMATRIX_DENSE object ($\mathcal{O}(N^2)$ cost).

The SUNLinSol_LapackDense module defines dense implementations of all “direct” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_LapackDense
• SUNLinSolInitialize_LapackDense – this does nothing, since all consistency checks are performed at solver creation.
• SUNLinSolSetup_LapackDense – this calls either DGETRF or SGETRF to perform the $LU$ factorization.
• SUNLinSolSolve_LapackDense – this calls either DGETRS or SGETRS to use the $LU$ factors and pivots array to perform the solve.
• SUNLinSolLastFlag_LapackDense
• SUNLinSolSpace_LapackDense – this only returns information for the storage within the solver object, i.e. storage for N, last_flag, and pivots.
• SUNLinSolFree_LapackDense

11.6 The SUNLinSol_LapackBand Module

The LAPACK band implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_LapackBand, is designed to be used with the corresponding SUNMATRIX_BAND matrix type, and one of the serial or shared-memory N_Vector implementations (NVECTOR_SERIAL, NVECTOR_OPENMP, or NVECTOR_PTHREADS).

11.6.1 SUNLinSol_LapackBand Usage

The header file to be included when using this module is sunlinsol/sunlinsol_lapackband.h. The installed module library to link to is libsundials_sunlinsollapackband.lib, where .lib is typically .so for shared libraries and .a for static libraries.

The module SUNLinSol_LapackBand provides the following user-callable routine:

SUNLinearSolver SUNLinSol_LapackBand(N_Vector y, SUNMatrix A)
This function creates and allocates memory for a LAPACK band SUNLinearSolver. Its arguments are an N_Vector and a SUNMatrix, which it uses to determine the linear system size and to assess compatibility with the linear solver implementation.
This routine performs consistency checks to ensure that it is called with consistent N_Vector and SUNMatrix implementations. These are currently limited to the SUNMATRIX_BAND matrix type and the NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS vector types. As additional compatible matrix and vector implementations are added to SUNDIALS, these will be included within this compatibility check. Additionally, this routine will verify that the input matrix A is allocated with appropriate upper bandwidth storage for the $LU$ factorization.
If either A or y is incompatible then this routine will return NULL.

For backwards compatibility, we also provide the wrapper function,

SUNLinearSolver SUNLapackBand(N_Vector y, SUNMatrix A)
Wrapper function for SUNLinSol_LapackBand(), with identical input and output arguments.

For solvers that include a Fortran interface module, the SUNLinSol_LapackBand module also includes the Fortran-callable function FSUNLapackBandInit() to initialize this SUNLinSol_LapackBand module for a given SUNDIALS solver.

subroutine FSUNLapackBandInit(CODE, IER)
Initializes a banded LAPACK SUNLinearSolver structure for use in a SUNDIALS package.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassLapackBandInit() initializes this SUNLinSol_LapackBand module for solving mass matrix linear systems.

subroutine FSUNMassLapackBandInit(IER)
Initializes a banded LAPACK SUNLinearSolver structure for use in solving mass matrix systems in ARKode.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• IER (int, output) – return flag (0 success, -1 for failure).
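The band constructors reject a matrix whose stored upper bandwidth is too small to hold the fill-in created by pivoting. As a rough illustration of that check, the sketch below computes the bound smu = MIN(N-1, mu+ml) quoted in the band solver descriptions and compares it with the storage a matrix was allocated with. The function names are hypothetical and the code does not call SUNDIALS; how the library itself performs the check is not shown in this section.

```c
#include <assert.h>

/* Smallest stored upper bandwidth that can hold the U factor of an
   N x N band matrix with upper bandwidth mu and lower bandwidth ml:
   smu = MIN(N-1, mu+ml). */
long band_required_smu(long N, long mu, long ml) {
  long fill = mu + ml;
  return (fill < N - 1) ? fill : N - 1;
}

/* Hypothetical compatibility check: returns 1 if a matrix allocated with
   stored upper bandwidth smu can be factored in place, 0 otherwise. */
int band_storage_ok(long N, long mu, long ml, long smu) {
  assert(N > 0 && mu >= 0 && ml >= 0);
  return smu >= band_required_smu(N, mu, ml);
}
```

For example, a 10 x 10 matrix with mu = 2 and ml = 3 needs smu of at least 5, whereas for a 4 x 4 matrix with the same bandwidths the bound is capped at N-1 = 3.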
11.6.2 SUNLinSol_LapackBand Description

The SUNLinSol_LapackBand module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_Band {
  sunindextype N;
  sunindextype *pivots;
  long int last_flag;
};

These entries of the content field contain the following information:
• N - size of the linear system,
• pivots - index array for partial pivoting in LU factorization,
• last_flag - last error return flag from internal function evaluations.

The SUNLinSol_LapackBand module is a SUNLinearSolver wrapper for the LAPACK band matrix factorization and solve routines, *GBTRF and *GBTRS, where * is either D or S, depending on whether SUNDIALS was configured to have realtype set to double or single, respectively (see section Data Types for details). In order to use the SUNLinSol_LapackBand module it is assumed that LAPACK has been installed on the system prior to installation of SUNDIALS, and that SUNDIALS has been configured appropriately to link with LAPACK (see section Working with external Libraries for details). We note that since there do not exist 128-bit floating-point factorization and solve routines in LAPACK, this interface cannot be compiled when using extended precision for realtype. Similarly, since there do not exist 64-bit integer LAPACK routines, the SUNLinSol_LapackBand module also cannot be compiled when using int64_t for the sunindextype.

This solver is constructed to perform the following operations:
• The “setup” call performs an $LU$ factorization with partial (row) pivoting, $PA = LU$, where $P$ is a permutation matrix, $L$ is a lower triangular matrix with 1’s on the diagonal, and $U$ is an upper triangular matrix. This factorization is stored in-place on the input SUNMATRIX_BAND object $A$, with pivoting information encoding $P$ stored in the pivots array.
• The “solve” call performs pivoting and forward and backward substitution using the stored pivots array and the $LU$ factors held in the SUNMATRIX_BAND object.
• $A$ must be allocated to accommodate the increase in upper bandwidth that occurs during factorization. More precisely, if $A$ is a band matrix with upper bandwidth mu and lower bandwidth ml, then the upper triangular factor $U$ can have upper bandwidth as big as smu = MIN(N-1,mu+ml). The lower triangular factor $L$ has lower bandwidth ml.

The SUNLinSol_LapackBand module defines band implementations of all “direct” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_LapackBand
• SUNLinSolInitialize_LapackBand – this does nothing, since all consistency checks are performed at solver creation.
• SUNLinSolSetup_LapackBand – this calls either DGBTRF or SGBTRF to perform the $LU$ factorization.
• SUNLinSolSolve_LapackBand – this calls either DGBTRS or SGBTRS to use the $LU$ factors and pivots array to perform the solve.
• SUNLinSolLastFlag_LapackBand
• SUNLinSolSpace_LapackBand – this only returns information for the storage within the solver object, i.e. storage for N, last_flag, and pivots.
• SUNLinSolFree_LapackBand

11.7 The SUNLinSol_KLU Module

The KLU implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_KLU, is designed to be used with the corresponding SUNMATRIX_SPARSE matrix type, and one of the serial or shared-memory N_Vector implementations (NVECTOR_SERIAL, NVECTOR_OPENMP, or NVECTOR_PTHREADS).

11.7.1 SUNLinSol_KLU Usage

The header file to be included when using this module is sunlinsol/sunlinsol_klu.h. The installed module library to link to is libsundials_sunlinsolklu.lib, where .lib is typically .so for shared libraries and .a for static libraries.
The module SUNLinSol_KLU provides the following additional user-callable routines:

SUNLinearSolver SUNLinSol_KLU(N_Vector y, SUNMatrix A)
This constructor function creates and allocates memory for a SUNLinSol_KLU object. Its arguments are an N_Vector and a SUNMatrix, which it uses to determine the linear system size and to assess compatibility with the linear solver implementation.
This routine performs consistency checks to ensure that it is called with consistent N_Vector and SUNMatrix implementations. These are currently limited to the SUNMATRIX_SPARSE matrix type (using either CSR or CSC storage formats) and the NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS vector types. As additional compatible matrix and vector implementations are added to SUNDIALS, these will be included within this compatibility check.
If either A or y is incompatible then this routine will return NULL.

int SUNLinSol_KLUReInit(SUNLinearSolver S, SUNMatrix A, sunindextype nnz, int reinit_type)
This function reinitializes memory and flags for a new factorization (symbolic and numeric) to be conducted at the next solver setup call. This routine is useful in cases where the number of nonzeros has changed, or where the structure of the linear system has changed in a way that requires a new symbolic (and numeric) factorization.
The reinit_type argument governs the level of reinitialization. The allowed values are:
1. The Jacobian matrix will be destroyed and a new one will be allocated based on the nnz value passed to this call. New symbolic and numeric factorizations will be completed at the next solver setup.
2. Only symbolic and numeric factorizations will be completed. It is assumed that the Jacobian size has not exceeded the size of nnz given in the sparse matrix provided to the original constructor routine (or the previous SUNLinSol_KLUReInit call).
This routine assumes no other changes to solver use are necessary.
The return values from this function are SUNLS_MEM_NULL (either S or A is NULL), SUNLS_ILL_INPUT (A does not have type SUNMATRIX_SPARSE or reinit_type is invalid), SUNLS_MEM_FAIL (reallocation of the sparse matrix failed) or SUNLS_SUCCESS.

int SUNLinSol_KLUSetOrdering(SUNLinearSolver S, int ordering_choice)
This function sets the ordering used by KLU for reducing fill in the linear solve. Options for ordering_choice are:
0. AMD,
1. COLAMD, and
2. the natural ordering.
The default is 1 for COLAMD.
The return values from this function are SUNLS_MEM_NULL (S is NULL), SUNLS_ILL_INPUT (invalid ordering_choice), or SUNLS_SUCCESS.

For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:

SUNLinearSolver SUNKLU(N_Vector y, SUNMatrix A)
Wrapper function for SUNLinSol_KLU()

int SUNKLUReInit(SUNLinearSolver S, SUNMatrix A, sunindextype nnz, int reinit_type)
Wrapper function for SUNLinSol_KLUReInit()

int SUNKLUSetOrdering(SUNLinearSolver S, int ordering_choice)
Wrapper function for SUNLinSol_KLUSetOrdering()

For solvers that include a Fortran interface module, the SUNLinSol_KLU module also includes the Fortran-callable function FSUNKLUInit() to initialize this SUNLinSol_KLU module for a given SUNDIALS solver.

subroutine FSUNKLUInit(CODE, IER)
Initializes a KLU sparse SUNLinearSolver structure for use in a SUNDIALS package.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassKLUInit() initializes this SUNLinSol_KLU module for solving mass matrix linear systems.
subroutine FSUNMassKLUInit(IER)
Initializes a KLU sparse SUNLinearSolver structure for use in solving mass matrix systems in ARKode.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• IER (int, output) – return flag (0 success, -1 for failure).

The SUNLinSol_KLUReInit() and SUNLinSol_KLUSetOrdering() routines also support Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNKLUReInit(CODE, NNZ, REINIT_TYPE, IER)
Fortran interface to SUNLinSol_KLUReInit() for system linear solvers.
This routine must be called after FSUNKLUInit() has been called.
Arguments: NNZ should have type long int, all others should have type int; all arguments have meanings identical to those listed above.

subroutine FSUNMassKLUReInit(NNZ, REINIT_TYPE, IER)
Fortran interface to SUNLinSol_KLUReInit() for mass matrix linear solvers in ARKode.
This routine must be called after FSUNMassKLUInit() has been called.
Arguments: NNZ should have type long int, all others should have type int; all arguments have meanings identical to those listed above.

subroutine FSUNKLUSetOrdering(CODE, ORDERING, IER)
Fortran interface to SUNLinSol_KLUSetOrdering() for system linear solvers.
This routine must be called after FSUNKLUInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassKLUSetOrdering(ORDERING, IER)
Fortran interface to SUNLinSol_KLUSetOrdering() for mass matrix linear solvers in ARKode.
This routine must be called after FSUNMassKLUInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.
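The choice between the two reinit_type values, and the range of valid ordering_choice values, can be summarized in a few lines of plain C. This is a sketch of the decision logic as described above, not code from SUNDIALS: the EX_ enumerator names and both helper functions are hypothetical, and only the integer values (1 and 2 for reinit_type; 0, 1 and 2 for ordering_choice) come from the text.

```c
#include <assert.h>

/* Hypothetical names for the documented reinit_type values. */
enum { EX_REINIT_FULL = 1, EX_REINIT_PARTIAL = 2 };

/* Hypothetical names for the documented ordering_choice values. */
enum { EX_ORDER_AMD = 0, EX_ORDER_COLAMD = 1, EX_ORDER_NATURAL = 2 };

/* Pick a reinit_type: a full reinitialization (1) is needed only when the
   new nonzero count no longer fits in the existing allocation; otherwise
   redoing the symbolic and numeric factorizations (2) is enough. */
int choose_reinit_type(long nnz_allocated, long nnz_needed) {
  assert(nnz_allocated >= 0 && nnz_needed >= 0);
  return (nnz_needed > nnz_allocated) ? EX_REINIT_FULL : EX_REINIT_PARTIAL;
}

/* Returns 1 if ordering_choice is one of the three documented options. */
int ordering_is_valid(int ordering_choice) {
  return ordering_choice >= EX_ORDER_AMD && ordering_choice <= EX_ORDER_NATURAL;
}
```

In an application, the value returned by choose_reinit_type() would be passed as the reinit_type argument together with the new nnz, and an ordering outside the valid range corresponds to the case that the documented routine reports as SUNLS_ILL_INPUT.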
11.7.2 SUNLinSol_KLU Description

The SUNLinSol_KLU module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_KLU {
  long int last_flag;
  int first_factorize;
  sun_klu_symbolic *symbolic;
  sun_klu_numeric *numeric;
  sun_klu_common common;
  sunindextype (*klu_solver)(sun_klu_symbolic*, sun_klu_numeric*,
                             sunindextype, sunindextype,
                             double*, sun_klu_common*);
};

These entries of the content field contain the following information:
• last_flag - last error return flag from internal function evaluations,
• first_factorize - flag indicating whether the factorization has ever been performed,
• symbolic - KLU storage structure for symbolic factorization components,
• numeric - KLU storage structure for numeric factorization components,
• common - storage structure for common KLU solver components,
• klu_solver - pointer to the appropriate KLU solver function (depending on whether it is using a CSR or CSC sparse matrix).

The SUNLinSol_KLU module is a SUNLinearSolver wrapper for the KLU sparse matrix factorization and solver library written by Tim Davis ([KLU], [DP2010]). In order to use the SUNLinSol_KLU interface to KLU, it is assumed that KLU has been installed on the system prior to installation of SUNDIALS, and that SUNDIALS has been configured appropriately to link with KLU (see section Working with external Libraries for details). Additionally, this wrapper only supports double-precision calculations, and therefore cannot be compiled if SUNDIALS is configured to have realtype set to either extended or single (see section Data Types for details). Since the KLU library supports both 32-bit and 64-bit integers, this interface will be compiled for either of the available sunindextype options.
The KLU library has a symbolic factorization routine that computes the permutation of the linear system matrix to block triangular form and the permutations that will pre-order the diagonal blocks (the only ones that need to be factored) to reduce fill-in (using AMD, COLAMD, CHOLAMD, natural, or an ordering given by the user). Of these ordering choices, the default value in the SUNLinSol_KLU module is the COLAMD ordering.

KLU breaks the factorization into two separate parts. The first is a symbolic factorization and the second is a numeric factorization that returns the factored matrix along with final pivot information. KLU also has a refactor routine that can be called instead of the numeric factorization. This routine will reuse the pivot information. This routine also returns diagnostic information that a user can examine to determine if numerical stability is being lost and a full numerical factorization should be done instead of the refactor.

Since the linear systems that arise within the context of SUNDIALS calculations will typically have identical sparsity patterns, the SUNLinSol_KLU module is constructed to perform the following operations:
• The first time that the “setup” routine is called, it performs the symbolic factorization, followed by an initial numerical factorization.
• On subsequent calls to the “setup” routine, it calls the appropriate KLU “refactor” routine, followed by estimates of the numerical conditioning using the relevant “rcond”, and if necessary “condest”, routine(s). If these estimates of the condition number are larger than $\varepsilon^{-2/3}$ (where $\varepsilon$ is the double-precision unit roundoff), then a new factorization is performed.
• The module includes the routine SUNLinSol_KLUReInit(), which can be called by the user to force a full refactorization at the next “setup” call.
• The “solve” call performs pivoting and forward and backward substitution using the stored KLU data structures.
We note that in this solve KLU operates on the native data arrays for the right-hand side and solution vectors, without requiring costly data copies.

The SUNLinSol_KLU module defines implementations of all “direct” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_KLU
• SUNLinSolInitialize_KLU – this sets the first_factorize flag to 1, forcing both symbolic and numerical factorizations on the subsequent “setup” call.
• SUNLinSolSetup_KLU – this performs either an $LU$ factorization or refactorization of the input matrix.
• SUNLinSolSolve_KLU – this calls the appropriate KLU solve routine to utilize the $LU$ factors to solve the linear system.
• SUNLinSolLastFlag_KLU
• SUNLinSolSpace_KLU – this only returns information for the storage within the solver interface, i.e. storage for the integers last_flag and first_factorize. For additional space requirements, see the KLU documentation.
• SUNLinSolFree_KLU

11.8 The SUNLinSol_SuperLUMT Module

The SuperLU_MT implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_SuperLUMT, is designed to be used with the corresponding SUNMATRIX_SPARSE matrix type, and one of the serial or shared-memory N_Vector implementations (NVECTOR_SERIAL, NVECTOR_OPENMP, or NVECTOR_PTHREADS). While these are compatible, it is not recommended to use a threaded vector module with SUNLinSol_SuperLUMT unless it is the NVECTOR_OPENMP module and the SuperLU_MT library has also been compiled with OpenMP.

11.8.1 SUNLinSol_SuperLUMT Usage

The header file to be included when using this module is sunlinsol/sunlinsol_superlumt.h. The installed module library to link to is libsundials_sunlinsolsuperlumt.lib, where .lib is typically .so for shared libraries and .a for static libraries.
The module SUNLinSol_SuperLUMT provides the following user-callable routines:

SUNLinearSolver SUNLinSol_SuperLUMT(N_Vector y, SUNMatrix A, int num_threads)
This constructor function creates and allocates memory for a SUNLinSol_SuperLUMT object. Its arguments are an N_Vector, a SUNMatrix, and a desired number of threads (OpenMP or Pthreads, depending on how SuperLU_MT was installed) to use during the factorization steps. This routine analyzes the input matrix and vector to determine the linear system size and to assess compatibility with the SuperLU_MT library.
This routine performs consistency checks to ensure that it is called with consistent N_Vector and SUNMatrix implementations. These are currently limited to the SUNMATRIX_SPARSE matrix type (using either CSR or CSC storage formats) and the NVECTOR_SERIAL, NVECTOR_OPENMP, and NVECTOR_PTHREADS vector types. As additional compatible matrix and vector implementations are added to SUNDIALS, these will be included within this compatibility check.
If either A or y is incompatible then this routine will return NULL. The num_threads argument is not checked and is passed directly to SuperLU_MT routines.

int SUNLinSol_SuperLUMTSetOrdering(SUNLinearSolver S, int ordering_choice)
This function sets the ordering used by SuperLU_MT for reducing fill in the linear solve. Options for ordering_choice are:
0. natural ordering
1. minimal degree ordering on $A^T A$
2. minimal degree ordering on $A^T + A$
3. COLAMD ordering for unsymmetric matrices
The default is 3 for COLAMD.
The return values from this function are SUNLS_MEM_NULL (S is NULL), SUNLS_ILL_INPUT (invalid ordering_choice), or SUNLS_SUCCESS.

For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:
SUNLinearSolver SUNSuperLUMT(N_Vector y, SUNMatrix A, int num_threads)
Wrapper for SUNLinSol_SuperLUMT().

int SUNSuperLUMTSetOrdering(SUNLinearSolver S, int ordering_choice)
Wrapper for SUNLinSol_SuperLUMTSetOrdering().

For solvers that include a Fortran interface module, the SUNLinSol_SuperLUMT module also includes the Fortran-callable function FSUNSuperLUMTInit() to initialize this SUNLinSol_SuperLUMT module for a given SUNDIALS solver.

subroutine FSUNSuperLUMTInit(CODE, NUM_THREADS, IER)
Initializes a SuperLU_MT sparse SUNLinearSolver structure for use in a SUNDIALS package.
This routine must be called after both the N_Vector and SUNMatrix objects have been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• NUM_THREADS (int, input) – desired number of OpenMP/Pthreads threads to use in the factorization.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassSuperLUMTInit() initializes this SUNLinSol_SuperLUMT module for solving mass matrix linear systems.

subroutine FSUNMassSuperLUMTInit(NUM_THREADS, IER)
Initializes a SuperLU_MT sparse SUNLinearSolver structure for use in solving mass matrix systems in ARKode.
This routine must be called after both the N_Vector and the mass SUNMatrix objects have been initialized.
Arguments:
• NUM_THREADS (int, input) – desired number of OpenMP/Pthreads threads to use in the factorization.
• IER (int, output) – return flag (0 success, -1 for failure).

The SUNLinSol_SuperLUMTSetOrdering() routine also supports Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNSuperLUMTSetOrdering(CODE, ORDERING, IER)
Fortran interface to SUNLinSol_SuperLUMTSetOrdering() for system linear solvers.
This routine must be called after FSUNSuperLUMTInit() has been called.
Arguments: all should have type int and have meanings identical to those listed above.

subroutine FSUNMassSuperLUMTSetOrdering(ORDERING, IER)
Fortran interface to SUNLinSol_SuperLUMTSetOrdering() for mass matrix linear solves in ARKode. This routine must be called after FSUNMassSuperLUMTInit() has been called.
Arguments: all should have type int and have meanings identical to those listed above.

11.8.2 SUNLinSol_SuperLUMT Description

The SUNLinSol_SuperLUMT module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_SuperLUMT {
  long int last_flag;
  int first_factorize;
  SuperMatrix *A, *AC, *L, *U, *B;
  Gstat_t *Gstat;
  sunindextype *perm_r, *perm_c;
  sunindextype N;
  int num_threads;
  realtype diag_pivot_thresh;
  int ordering;
  superlumt_options_t *options;
};

These entries of the content field contain the following information:
• last_flag - last error return flag from internal function evaluations,
• first_factorize - flag indicating whether the factorization has ever been performed,
• A, AC, L, U, B - SuperMatrix pointers used in solve,
• Gstat - Gstat_t object used in solve,
• perm_r, perm_c - permutation arrays used in solve,
• N - size of the linear system,
• num_threads - number of OpenMP/Pthreads threads to use,
• diag_pivot_thresh - threshold on diagonal pivoting,
• ordering - flag for which reordering algorithm to use,
• options - pointer to SuperLU_MT options structure.

The SUNLinSol_SuperLUMT module is a SUNLinearSolver wrapper for the SuperLU_MT sparse matrix factorization and solver library written by X. Sherry Li ([SuperLUMT], [L2005], [DGL1999]). The package performs matrix factorization using threads to enhance efficiency in shared memory parallel environments. Note that threads are used only in the factorization step.
In order to use the SUNLinSol_SuperLUMT interface to SuperLU_MT, it is assumed that SuperLU_MT has been installed on the system prior to installation of SUNDIALS, and that SUNDIALS has been configured appropriately to link with SuperLU_MT (see section Working with external Libraries for details). Additionally, this wrapper only supports single- and double-precision calculations, and therefore cannot be compiled if SUNDIALS is configured to have realtype set to extended (see section Data Types for details). Moreover, since the SuperLU_MT library may be installed to support either 32-bit or 64-bit integers, it is assumed that the SuperLU_MT library is installed using the same integer precision as the SUNDIALS sunindextype option.

The SuperLU_MT library has a symbolic factorization routine that computes the permutation of the linear system matrix to reduce fill-in on subsequent $LU$ factorizations (using COLAMD, minimal degree ordering on $A^T A$, minimal degree ordering on $A^T + A$, or natural ordering). Of these ordering choices, the default value in the SUNLinSol_SuperLUMT module is the COLAMD ordering.

Since the linear systems that arise within the context of SUNDIALS calculations will typically have identical sparsity patterns, the SUNLinSol_SuperLUMT module is constructed to perform the following operations:
• The first time that the “setup” routine is called, it performs the symbolic factorization, followed by an initial numerical factorization.
• On subsequent calls to the “setup” routine, it skips the symbolic factorization, and only refactors the input matrix.
• The “solve” call performs pivoting and forward and backward substitution using the stored SuperLU_MT data structures. We note that in this solve SuperLU_MT operates on the native data arrays for the right-hand side and solution vectors, without requiring costly data copies.
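The setup behavior in the list above can be modeled with a small stand-alone sketch. It does not call SuperLU_MT or SUNDIALS; the type demo_sparse_solver, its counters, and the functions demo_initialize and demo_setup are hypothetical names introduced only for illustration. The counters stand in for the real symbolic and numerical factorization work, and the first_factorize flag is assumed to behave as the text describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the solver content: only the flag from the
 * text plus two counters that replace the real factorization work. */
typedef struct {
  int first_factorize; /* 1 until the first setup has completed */
  int symbolic_count;  /* number of symbolic factorizations performed */
  int numeric_count;   /* number of numerical (re)factorizations performed */
} demo_sparse_solver;

/* Model of the "initialize" call: request a fresh symbolic factorization. */
static void demo_initialize(demo_sparse_solver *S) {
  assert(S != NULL);
  S->first_factorize = 1;
}

/* Model of the "setup" call: symbolic work only on the first call,
 * numerical factorization (or refactorization) on every call. */
static void demo_setup(demo_sparse_solver *S) {
  assert(S != NULL);
  if (S->first_factorize) {
    S->symbolic_count++; /* fill-reducing permutation computed once */
    S->first_factorize = 0;
  }
  S->numeric_count++; /* the sparsity pattern is reused on later calls */
}
```

Under these assumptions, one initialize followed by three setup calls performs one symbolic factorization and three numerical factorizations, which is the saving the module relies on when the sparsity pattern does not change.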
The SUNLinSol_SuperLUMT module defines implementations of all “direct” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_SuperLUMT
• SUNLinSolInitialize_SuperLUMT – this sets the first_factorize flag to 1 and resets the internal SuperLU_MT statistics variables.
• SUNLinSolSetup_SuperLUMT – this performs either an $LU$ factorization or refactorization of the input matrix.
• SUNLinSolSolve_SuperLUMT – this calls the appropriate SuperLU_MT solve routine to utilize the $LU$ factors to solve the linear system.
• SUNLinSolLastFlag_SuperLUMT
• SUNLinSolSpace_SuperLUMT – this only returns information for the storage within the solver interface, i.e. storage for the integers last_flag and first_factorize. For additional space requirements, see the SuperLU_MT documentation.
• SUNLinSolFree_SuperLUMT

11.9 The SUNLinSol_SPGMR Module

The SPGMR (Scaled, Preconditioned, Generalized Minimum Residual [SS1986]) implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_SPGMR, is an iterative linear solver that is designed to be compatible with any N_Vector implementation (serial, threaded, parallel, and user-supplied) that supports a minimal subset of operations (N_VClone(), N_VDotProd(), N_VScale(), N_VLinearSum(), N_VProd(), N_VConst(), N_VDiv(), and N_VDestroy()).

11.9.1 SUNLinSol_SPGMR Usage

The header file to be included when using this module is sunlinsol/sunlinsol_spgmr.h. The SUNLinSol_SPGMR module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsolspgmr module library.

The module SUNLinSol_SPGMR provides the following user-callable routines:

SUNLinearSolver SUNLinSol_SPGMR(N_Vector y, int pretype, int maxl)
This constructor function creates and allocates memory for a SPGMR SUNLinearSolver. Its arguments are an N_Vector, the desired type of preconditioning, and the number of Krylov basis vectors to use.
This routine will perform consistency checks to ensure that it is called with a consistent N_Vector implementation (i.e. that it supplies the requisite vector operations). If y is incompatible, then this routine will return NULL.
A maxl argument that is ≤ 0 will result in the default value (5).
Allowable inputs for pretype are PREC_NONE (0), PREC_LEFT (1), PREC_RIGHT (2) and PREC_BOTH (3); any other integer input will result in the default (no preconditioning). We note that some SUNDIALS solvers are designed to only work with left preconditioning (IDA and IDAS) and others with only right preconditioning (KINSOL). While it is possible to configure a SUNLinSol_SPGMR object to use any of the preconditioning options with these solvers, this use mode is not supported and may result in inferior performance.

int SUNLinSol_SPGMRSetPrecType(SUNLinearSolver S, int pretype)
This function updates the type of preconditioning to use. Supported values are PREC_NONE (0), PREC_LEFT (1), PREC_RIGHT (2) and PREC_BOTH (3).
This routine will return with one of the error codes SUNLS_ILL_INPUT (illegal pretype), SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS.

int SUNLinSol_SPGMRSetGSType(SUNLinearSolver S, int gstype)
This function sets the type of Gram-Schmidt orthogonalization to use. Supported values are MODIFIED_GS (1) and CLASSICAL_GS (2). Any other integer input will result in a failure, returning error code SUNLS_ILL_INPUT.
This routine will return with one of the error codes SUNLS_ILL_INPUT (illegal gstype), SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS.

int SUNLinSol_SPGMRSetMaxRestarts(SUNLinearSolver S, int maxrs)
This function sets the number of GMRES restarts to allow. A negative input will result in the default of 0.
This routine will return with one of the error codes SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS.
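The input handling described for the SPGMR constructor and its “set” routines can be summarized in a stand-alone model. This is a sketch, not SUNDIALS source: the type demo_spgmr_opts, the demo_* functions, and the DEMO_* return codes are hypothetical, and the numeric values of the real SUNLS_* codes are not assumed. The PREC_* and *_GS values and the defaults are the ones quoted in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the text. */
enum { PREC_NONE = 0, PREC_LEFT = 1, PREC_RIGHT = 2, PREC_BOTH = 3 };
enum { MODIFIED_GS = 1, CLASSICAL_GS = 2 };

/* Hypothetical return codes, standing in for SUNLS_SUCCESS,
 * SUNLS_MEM_NULL and SUNLS_ILL_INPUT. */
enum { DEMO_SUCCESS = 0, DEMO_MEM_NULL = -1, DEMO_ILL_INPUT = -2 };

typedef struct { int maxl, pretype, gstype, max_restarts; } demo_spgmr_opts;

/* Constructor model: out-of-range inputs silently fall back to defaults. */
static void demo_construct(demo_spgmr_opts *o, int pretype, int maxl) {
  assert(o != NULL);
  o->maxl = (maxl <= 0) ? 5 : maxl;
  o->pretype = (pretype == PREC_LEFT || pretype == PREC_RIGHT ||
                pretype == PREC_BOTH) ? pretype : PREC_NONE;
  o->gstype = MODIFIED_GS; /* documented default */
  o->max_restarts = 0;     /* documented default */
}

/* Setter models: unlike the constructor, illegal values are rejected. */
static int demo_set_prec_type(demo_spgmr_opts *o, int pretype) {
  if (o == NULL) return DEMO_MEM_NULL;
  if (pretype < PREC_NONE || pretype > PREC_BOTH) return DEMO_ILL_INPUT;
  o->pretype = pretype;
  return DEMO_SUCCESS;
}

static int demo_set_gs_type(demo_spgmr_opts *o, int gstype) {
  if (o == NULL) return DEMO_MEM_NULL;
  if (gstype != MODIFIED_GS && gstype != CLASSICAL_GS) return DEMO_ILL_INPUT;
  o->gstype = gstype;
  return DEMO_SUCCESS;
}

static int demo_set_max_restarts(demo_spgmr_opts *o, int maxrs) {
  if (o == NULL) return DEMO_MEM_NULL;
  o->max_restarts = (maxrs < 0) ? 0 : maxrs; /* negative means default */
  return DEMO_SUCCESS;
}
```

The asymmetry is worth noting when writing calling code: an invalid pretype passed to the constructor yields an unpreconditioned solver without any error, whereas the same value passed to the setter is reported as illegal input.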
For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:

SUNLinearSolver SUNSPGMR(N_Vector y, int pretype, int maxl)
Wrapper function for SUNLinSol_SPGMR()

int SUNSPGMRSetPrecType(SUNLinearSolver S, int pretype)
Wrapper function for SUNLinSol_SPGMRSetPrecType()

int SUNSPGMRSetGSType(SUNLinearSolver S, int gstype)
Wrapper function for SUNLinSol_SPGMRSetGSType()

int SUNSPGMRSetMaxRestarts(SUNLinearSolver S, int maxrs)
Wrapper function for SUNLinSol_SPGMRSetMaxRestarts()

For solvers that include a Fortran interface module, the SUNLinSol_SPGMR module also includes the Fortran-callable function FSUNSPGMRInit() to initialize this SUNLinSol_SPGMR module for a given SUNDIALS solver.

subroutine FSUNSPGMRInit(CODE, PRETYPE, MAXL, IER)
Initializes a SPGMR SUNLinearSolver structure for use in a SUNDIALS package. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this matrix will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• PRETYPE (int, input) – flag denoting type of preconditioning to use: none=0, left=1, right=2, both=3.
• MAXL (int, input) – number of GMRES basis vectors to use.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassSPGMRInit() initializes this SUNLinSol_SPGMR module for solving mass matrix linear systems.

subroutine FSUNMassSPGMRInit(PRETYPE, MAXL, IER)
Initializes a SPGMR SUNLinearSolver structure for use in solving mass matrix systems in ARKode. This routine must be called after the N_Vector object has been initialized.
Arguments:
• PRETYPE (int, input) – flag denoting type of preconditioning to use: none=0, left=1, right=2, both=3.
• MAXL (int, input) – number of GMRES basis vectors to use.
• IER (int, output) – return flag (0 success, -1 for failure).

The SUNLinSol_SPGMRSetGSType(), SUNLinSol_SPGMRSetPrecType() and SUNLinSol_SPGMRSetMaxRestarts() routines also support Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNSPGMRSetGSType(CODE, GSTYPE, IER)
Fortran interface to SUNLinSol_SPGMRSetGSType() for system linear solvers. This routine must be called after FSUNSPGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPGMRSetGSType(GSTYPE, IER)
Fortran interface to SUNLinSol_SPGMRSetGSType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNSPGMRSetPrecType(CODE, PRETYPE, IER)
Fortran interface to SUNLinSol_SPGMRSetPrecType() for system linear solvers. This routine must be called after FSUNSPGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPGMRSetPrecType(PRETYPE, IER)
Fortran interface to SUNLinSol_SPGMRSetPrecType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNSPGMRSetMaxRS(CODE, MAXRS, IER)
Fortran interface to SUNLinSol_SPGMRSetMaxRestarts() for system linear solvers. This routine must be called after FSUNSPGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPGMRSetMaxRS(MAXRS, IER)
Fortran interface to SUNLinSol_SPGMRSetMaxRestarts() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

11.9.2 SUNLinSol_SPGMR Description

The SUNLinSol_SPGMR module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_SPGMR {
  int maxl;
  int pretype;
  int gstype;
  int max_restarts;
  int numiters;
  realtype resnorm;
  long int last_flag;
  ATimesFn ATimes;
  void* ATData;
  PSetupFn Psetup;
  PSolveFn Psolve;
  void* PData;
  N_Vector s1;
  N_Vector s2;
  N_Vector *V;
  realtype **Hes;
  realtype *givens;
  N_Vector xcor;
  realtype *yg;
  N_Vector vtemp;
};

These entries of the content field contain the following information:
• maxl - number of GMRES basis vectors to use (default is 5),
• pretype - flag for type of preconditioning to employ (default is none),
• gstype - flag for type of Gram-Schmidt orthogonalization (default is modified Gram-Schmidt),
• max_restarts - number of GMRES restarts to allow (default is 0),
• numiters - number of iterations from the most-recent solve,
• resnorm - final linear residual norm from the most-recent solve,
• last_flag - last error return flag from an internal function,
• ATimes - function pointer to perform $Av$ product,
• ATData - pointer to structure for ATimes,
• Psetup - function pointer to preconditioner setup routine,
• Psolve - function pointer to preconditioner solve routine,
• PData - pointer to structure for Psetup and Psolve,
• s1, s2 - vector pointers for supplied scaling matrices (default is NULL),
• V - the array of Krylov basis vectors $v_1, \ldots, v_{\text{maxl}+1}$, stored in V[0], ..., V[maxl]. Each $v_i$ is a vector of type N_Vector,
• Hes - the (maxl + 1) × maxl Hessenberg matrix. It is stored row-wise so that the (i,j)th element is given by Hes[i][j],
• givens - a length 2*maxl array which represents the Givens rotation matrices that arise in the GMRES algorithm. These matrices are $F_0, F_1, \ldots, F_j$, where

\[
F_i = \begin{bmatrix}
1 & & & & & & & \\
 & \ddots & & & & & & \\
 & & 1 & & & & & \\
 & & & c_i & -s_i & & & \\
 & & & s_i & c_i & & & \\
 & & & & & 1 & & \\
 & & & & & & \ddots & \\
 & & & & & & & 1
\end{bmatrix},
\]

are represented in the givens vector as givens[0] = $c_0$, givens[1] = $s_0$, givens[2] = $c_1$, givens[3] = $s_1$, ..., givens[2j] = $c_j$, givens[2j+1] = $s_j$,
• xcor - a vector which holds the scaled, preconditioned correction to the initial guess,
• yg - a length (maxl + 1) array of realtype values used to hold “short” vectors (e.g. $y$ and $g$),
• vtemp - temporary vector storage.

This solver is constructed to perform the following operations:
• During construction, the xcor and vtemp vectors are cloned from a template N_Vector that is input, and default solver parameters are set.
• User-facing “set” routines may be called to modify default solver parameters.
• Additional “set” routines are called by the SUNDIALS solver that interfaces with SUNLinSol_SPGMR to supply the ATimes, PSetup, and Psolve function pointers and s1 and s2 scaling vectors.
• In the “initialize” call, the remaining solver data is allocated (V, Hes, givens, and yg).
• In the “setup” call, any non-NULL PSetup function is called. Typically, this is provided by the SUNDIALS solver itself, which translates between the generic PSetup function and the solver-specific routine (solver-supplied or user-supplied).
• In the “solve” call, the GMRES iteration is performed. This will include scaling, preconditioning, and restarts if those options have been supplied.
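The interleaved storage of the Givens coefficients can be illustrated with a short stand-alone sketch. It assumes, as is standard in GMRES but not stated explicitly above, that the 2 × 2 block of $F_i$ acts on components i and i+1 of the vector it is applied to. The function name demo_apply_givens is hypothetical, plain double replaces realtype, and the coefficients are taken as already computed.

```c
#include <assert.h>

/* Apply F_0, F_1, ..., F_{nrot-1} in that order to x, which must have
 * at least nrot + 1 entries. The coefficients are read from the
 * interleaved layout givens[2i] = c_i, givens[2i+1] = s_i. */
static void demo_apply_givens(const double *givens, int nrot, double *x) {
  assert(givens != 0 && x != 0 && nrot >= 0);
  for (int i = 0; i < nrot; i++) {
    double c = givens[2 * i];
    double s = givens[2 * i + 1];
    double xi = x[i];
    double xnext = x[i + 1];
    /* 2x2 block [c -s; s c] acting on rows i and i+1 */
    x[i] = c * xi - s * xnext;
    x[i + 1] = s * xi + c * xnext;
  }
}
```

For instance, with givens = {0.6, -0.8} a single rotation maps the pair (3, 4) to approximately (5, 0), which is how each new subdiagonal entry of the Hessenberg matrix is eliminated as the iteration proceeds.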
The SUNLinSol_SPGMR module defines implementations of all “iterative” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_SPGMR
• SUNLinSolInitialize_SPGMR
• SUNLinSolSetATimes_SPGMR
• SUNLinSolSetPreconditioner_SPGMR
• SUNLinSolSetScalingVectors_SPGMR
• SUNLinSolSetup_SPGMR
• SUNLinSolSolve_SPGMR
• SUNLinSolNumIters_SPGMR
• SUNLinSolResNorm_SPGMR
• SUNLinSolResid_SPGMR
• SUNLinSolLastFlag_SPGMR
• SUNLinSolSpace_SPGMR
• SUNLinSolFree_SPGMR

11.10 The SUNLinSol_SPFGMR Module

The SPFGMR (Scaled, Preconditioned, Flexible, Generalized Minimum Residual [S1993]) implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_SPFGMR, is an iterative linear solver that is designed to be compatible with any N_Vector implementation (serial, threaded, parallel, and user-supplied) that supports a minimal subset of operations (N_VClone(), N_VDotProd(), N_VScale(), N_VLinearSum(), N_VProd(), N_VConst(), N_VDiv(), and N_VDestroy()). Unlike the other Krylov iterative linear solvers supplied with SUNDIALS, FGMRES is specifically designed to work with a changing preconditioner (e.g. from an iterative method).

11.10.1 SUNLinSol_SPFGMR Usage

The header file to be included when using this module is sunlinsol/sunlinsol_spfgmr.h. The SUNLinSol_SPFGMR module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsolspfgmr module library.

The module SUNLinSol_SPFGMR provides the following user-callable routines:

SUNLinearSolver SUNLinSol_SPFGMR(N_Vector y, int pretype, int maxl)
This constructor function creates and allocates memory for a SPFGMR SUNLinearSolver. Its arguments are an N_Vector, a flag indicating to use preconditioning, and the number of Krylov basis vectors to use.
This routine will perform consistency checks to ensure that it is called with a consistent N_Vector implementation (i.e.
that it supplies the requisite vector operations). If y is incompatible, then this routine will return NULL. A maxl argument that is ≤ 0 will result in the default value (5). Since the FGMRES algorithm is designed to only support right preconditioning, then any of the pretype inputs PREC_LEFT (1), PREC_RIGHT (2), or PREC_BOTH (3) will result in use of PREC_RIGHT; any other integer input will result in the default (no preconditioning). We note that some SUNDIALS solvers are designed to only work with left preconditioning (IDA and IDAS). While it is possible to use a right-preconditioned SUNLinSol_SPFGMR object for these packages, this use mode is not supported and may result in inferior performance. int SUNLinSol_SPFGMRSetPrecType(SUNLinearSolver S, int pretype) This function updates the flag indicating use of preconditioning. Since the FGMRES algorithm is designed to only support right preconditioning, then any of the pretype inputs PREC_LEFT (1), PREC_RIGHT (2), or PREC_BOTH (3) will result in use of PREC_RIGHT; any other integer input will result in the default (no preconditioning). This routine will return with one of the error codes SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS. int SUNLinSol_SPFGMRSetGSType(SUNLinearSolver S, int gstype) This function sets the type of Gram-Schmidt orthogonalization to use. Supported values are MODIFIED_GS (1) and CLASSICAL_GS (2). Any other integer input will result in a failure, returning error code SUNLS_ILL_INPUT. This routine will return with one of the error codes SUNLS_ILL_INPUT (illegal gstype), SUNLS_MEM_NULL (S is NULL), or SUNLS_SUCCESS. int SUNLinSol_SPFGMRSetMaxRestarts(SUNLinearSolver S, int maxrs) This function sets the number of FGMRES restarts to allow. A negative input will result in the default of 0. This routine will return with one of the error codes SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS. 
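Because SPFGMR supports only right preconditioning, the pretype handling described above reduces to a two-way mapping. The following stand-alone sketch models that mapping; the function name demo_spfgmr_pretype is hypothetical, and the PREC_* values are the ones quoted in the text.

```c
/* Values quoted in the text. */
enum { PREC_NONE = 0, PREC_LEFT = 1, PREC_RIGHT = 2, PREC_BOTH = 3 };

/* Model of the documented behavior: any request for preconditioning
 * becomes right preconditioning, and anything else means none. */
static int demo_spfgmr_pretype(int pretype) {
  switch (pretype) {
    case PREC_LEFT:
    case PREC_RIGHT:
    case PREC_BOTH:
      return PREC_RIGHT;
    default:
      return PREC_NONE;
  }
}
```

One practical consequence is that code written for SUNLinSol_SPGMR with PREC_LEFT will not fail when switched to SUNLinSol_SPFGMR, but it will run with right preconditioning instead, which matters for the solvers noted above as designed for left preconditioning.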
For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:

SUNLinearSolver SUNSPFGMR(N_Vector y, int pretype, int maxl)
Wrapper function for SUNLinSol_SPFGMR()

int SUNSPFGMRSetPrecType(SUNLinearSolver S, int pretype)
Wrapper function for SUNLinSol_SPFGMRSetPrecType()

int SUNSPFGMRSetGSType(SUNLinearSolver S, int gstype)
Wrapper function for SUNLinSol_SPFGMRSetGSType()

int SUNSPFGMRSetMaxRestarts(SUNLinearSolver S, int maxrs)
Wrapper function for SUNLinSol_SPFGMRSetMaxRestarts()

For solvers that include a Fortran interface module, the SUNLinSol_SPFGMR module also includes the Fortran-callable function FSUNSPFGMRInit() to initialize this SUNLinSol_SPFGMR module for a given SUNDIALS solver.

subroutine FSUNSPFGMRInit(CODE, PRETYPE, MAXL, IER)
Initializes a SPFGMR SUNLinearSolver structure for use in a SUNDIALS package. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this matrix will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• PRETYPE (int, input) – flag denoting whether to use preconditioning: no=0, yes=1.
• MAXL (int, input) – number of FGMRES basis vectors to use.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassSPFGMRInit() initializes this SUNLinSol_SPFGMR module for solving mass matrix linear systems.

subroutine FSUNMassSPFGMRInit(PRETYPE, MAXL, IER)
Initializes a SPFGMR SUNLinearSolver structure for use in solving mass matrix systems in ARKode. This routine must be called after the N_Vector object has been initialized.
Arguments:
• PRETYPE (int, input) – flag denoting whether to use preconditioning: no=0, yes=1.
• MAXL (int, input) – number of FGMRES basis vectors to use.
• IER (int, output) – return flag (0 success, -1 for failure).

The SUNLinSol_SPFGMRSetGSType(), SUNLinSol_SPFGMRSetPrecType() and SUNLinSol_SPFGMRSetMaxRestarts() routines also support Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNSPFGMRSetGSType(CODE, GSTYPE, IER)
Fortran interface to SUNLinSol_SPFGMRSetGSType() for system linear solvers. This routine must be called after FSUNSPFGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPFGMRSetGSType(GSTYPE, IER)
Fortran interface to SUNLinSol_SPFGMRSetGSType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPFGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNSPFGMRSetPrecType(CODE, PRETYPE, IER)
Fortran interface to SUNLinSol_SPFGMRSetPrecType() for system linear solvers. This routine must be called after FSUNSPFGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPFGMRSetPrecType(PRETYPE, IER)
Fortran interface to SUNLinSol_SPFGMRSetPrecType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPFGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNSPFGMRSetMaxRS(CODE, MAXRS, IER)
Fortran interface to SUNLinSol_SPFGMRSetMaxRestarts() for system linear solvers. This routine must be called after FSUNSPFGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPFGMRSetMaxRS(MAXRS, IER)
Fortran interface to SUNLinSol_SPFGMRSetMaxRestarts() for mass matrix linear solvers in ARKode.
This routine must be called after FSUNMassSPFGMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

11.10.2 SUNLinSol_SPFGMR Description

The SUNLinSol_SPFGMR module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_SPFGMR {
  int maxl;
  int pretype;
  int gstype;
  int max_restarts;
  int numiters;
  realtype resnorm;
  long int last_flag;
  ATimesFn ATimes;
  void* ATData;
  PSetupFn Psetup;
  PSolveFn Psolve;
  void* PData;
  N_Vector s1;
  N_Vector s2;
  N_Vector *V;
  N_Vector *Z;
  realtype **Hes;
  realtype *givens;
  N_Vector xcor;
  realtype *yg;
  N_Vector vtemp;
};

These entries of the content field contain the following information:
• maxl - number of FGMRES basis vectors to use (default is 5),
• pretype - flag for use of preconditioning (default is none),
• gstype - flag for type of Gram-Schmidt orthogonalization (default is modified Gram-Schmidt),
• max_restarts - number of FGMRES restarts to allow (default is 0),
• numiters - number of iterations from the most-recent solve,
• resnorm - final linear residual norm from the most-recent solve,
• last_flag - last error return flag from an internal function,
• ATimes - function pointer to perform $Av$ product,
• ATData - pointer to structure for ATimes,
• Psetup - function pointer to preconditioner setup routine,
• Psolve - function pointer to preconditioner solve routine,
• PData - pointer to structure for Psetup and Psolve,
• s1, s2 - vector pointers for supplied scaling matrices (default is NULL),
• V - the array of Krylov basis vectors $v_1, \ldots, v_{\text{maxl}+1}$, stored in V[0], ..., V[maxl]. Each $v_i$ is a vector of type N_Vector,
• Z - the array of preconditioned Krylov basis vectors $z_1, \ldots, z_{\text{maxl}+1}$, stored in Z[0], ..., Z[maxl]. Each $z_i$ is a vector of type N_Vector,
• Hes - the (maxl + 1) × maxl Hessenberg matrix.
It is stored row-wise so that the (i,j)th element is given by Hes[i][j],
• givens - a length 2*maxl array which represents the Givens rotation matrices that arise in the FGMRES algorithm. These matrices are $F_0, F_1, \ldots, F_j$, where

\[
F_i = \begin{bmatrix}
1 & & & & & & & \\
 & \ddots & & & & & & \\
 & & 1 & & & & & \\
 & & & c_i & -s_i & & & \\
 & & & s_i & c_i & & & \\
 & & & & & 1 & & \\
 & & & & & & \ddots & \\
 & & & & & & & 1
\end{bmatrix},
\]

are represented in the givens vector as givens[0] = $c_0$, givens[1] = $s_0$, givens[2] = $c_1$, givens[3] = $s_1$, ..., givens[2j] = $c_j$, givens[2j+1] = $s_j$,
• xcor - a vector which holds the scaled, preconditioned correction to the initial guess,
• yg - a length (maxl + 1) array of realtype values used to hold “short” vectors (e.g. $y$ and $g$),
• vtemp - temporary vector storage.

This solver is constructed to perform the following operations:
• During construction, the xcor and vtemp vectors are cloned from a template N_Vector that is input, and default solver parameters are set.
• User-facing “set” routines may be called to modify default solver parameters.
• Additional “set” routines are called by the SUNDIALS solver that interfaces with SUNLinSol_SPFGMR to supply the ATimes, PSetup, and Psolve function pointers and s1 and s2 scaling vectors.
• In the “initialize” call, the remaining solver data is allocated (V, Hes, givens, and yg).
• In the “setup” call, any non-NULL PSetup function is called. Typically, this is provided by the SUNDIALS solver itself, which translates between the generic PSetup function and the solver-specific routine (solver-supplied or user-supplied).
• In the “solve” call, the FGMRES iteration is performed. This will include scaling, preconditioning, and restarts if those options have been supplied.
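The reason FGMRES keeps the extra array Z can be seen in how the solution is updated at the end of a cycle: the correction is assembled from the stored preconditioned vectors rather than by applying a single fixed preconditioner to a combination of the V vectors. The sketch below shows only that update, ignoring scaling; the function name demo_fgmres_update is hypothetical, and plain double arrays stand in for N_Vector objects.

```c
#include <assert.h>

/* Form x <- x + sum_{i=0}^{k-1} y[i] * Z[i], where each Z[i] points to
 * a length-n array holding one preconditioned basis vector and y holds
 * the coefficients from the small least-squares problem. */
static void demo_fgmres_update(int n, int k, double *const *Z,
                               const double *y, double *x) {
  assert(n > 0 && k >= 0);
  for (int i = 0; i < k; i++) {
    for (int j = 0; j < n; j++) {
      x[j] += y[i] * Z[i][j];
    }
  }
}
```

Since each Z[i] may have been produced by a different preconditioner application, the vectors cannot be regenerated afterwards, which is why the flexible variant stores roughly twice as many basis vectors as SPGMR for the same maxl.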
The SUNLinSol_SPFGMR module defines implementations of all “iterative” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_SPFGMR
• SUNLinSolInitialize_SPFGMR
• SUNLinSolSetATimes_SPFGMR
• SUNLinSolSetPreconditioner_SPFGMR
• SUNLinSolSetScalingVectors_SPFGMR
• SUNLinSolSetup_SPFGMR
• SUNLinSolSolve_SPFGMR
• SUNLinSolNumIters_SPFGMR
• SUNLinSolResNorm_SPFGMR
• SUNLinSolResid_SPFGMR
• SUNLinSolLastFlag_SPFGMR
• SUNLinSolSpace_SPFGMR
• SUNLinSolFree_SPFGMR

11.11 The SUNLinSol_SPBCGS Module

The SPBCGS (Scaled, Preconditioned, Bi-Conjugate Gradient, Stabilized [V1992]) implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_SPBCGS, is an iterative linear solver that is designed to be compatible with any N_Vector implementation (serial, threaded, parallel, and user-supplied) that supports a minimal subset of operations (N_VClone(), N_VDotProd(), N_VScale(), N_VLinearSum(), N_VProd(), N_VDiv(), and N_VDestroy()). Unlike the SPGMR and SPFGMR algorithms, SPBCGS requires a fixed amount of memory that does not increase with the number of allowed iterations.

11.11.1 SUNLinSol_SPBCGS Usage

The header file to be included when using this module is sunlinsol/sunlinsol_spbcgs.h. The SUNLinSol_SPBCGS module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsolspbcgs module library.

The module SUNLinSol_SPBCGS provides the following user-callable routines:

SUNLinearSolver SUNLinSol_SPBCGS(N_Vector y, int pretype, int maxl)
This constructor function creates and allocates memory for a SPBCGS SUNLinearSolver. Its arguments are an N_Vector, the desired type of preconditioning, and the number of linear iterations to allow.
This routine will perform consistency checks to ensure that it is called with a consistent N_Vector implementation (i.e. that it supplies the requisite vector operations).
If y is incompatible, then this routine will return NULL.
A maxl argument that is ≤ 0 will result in the default value (5).
Allowable inputs for pretype are PREC_NONE (0), PREC_LEFT (1), PREC_RIGHT (2) and PREC_BOTH (3); any other integer input will result in the default (no preconditioning). We note that some SUNDIALS solvers are designed to only work with left preconditioning (IDA and IDAS) and others with only right preconditioning (KINSOL). While it is possible to configure a SUNLinSol_SPBCGS object to use any of the preconditioning options with these solvers, this use mode is not supported and may result in inferior performance.

int SUNLinSol_SPBCGSSetPrecType(SUNLinearSolver S, int pretype)
This function updates the type of preconditioning to use. Supported values are PREC_NONE (0), PREC_LEFT (1), PREC_RIGHT (2), and PREC_BOTH (3).
This routine will return with one of the error codes SUNLS_ILL_INPUT (illegal pretype), SUNLS_MEM_NULL (S is NULL), or SUNLS_SUCCESS.

int SUNLinSol_SPBCGSSetMaxl(SUNLinearSolver S, int maxl)
This function updates the number of linear solver iterations to allow. A maxl argument that is ≤ 0 will result in the default value (5).
This routine will return with one of the error codes SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS.
For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:

SUNLinearSolver SUNSPBCGS(N_Vector y, int pretype, int maxl)
Wrapper function for SUNLinSol_SPBCGS()

int SUNSPBCGSSetPrecType(SUNLinearSolver S, int pretype)
Wrapper function for SUNLinSol_SPBCGSSetPrecType()

int SUNSPBCGSSetMaxl(SUNLinearSolver S, int maxl)
Wrapper function for SUNLinSol_SPBCGSSetMaxl()

For solvers that include a Fortran interface module, the SUNLinSol_SPBCGS module also includes the Fortran-callable function FSUNSPBCGSInit() to initialize this SUNLinSol_SPBCGS module for a given SUNDIALS solver.

subroutine FSUNSPBCGSInit(CODE, PRETYPE, MAXL, IER)
Initializes a SPBCGS SUNLinearSolver structure for use in a SUNDIALS package. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this matrix will be used for: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• PRETYPE (int, input) – flag denoting type of preconditioning to use: none=0, left=1, right=2, both=3.
• MAXL (int, input) – number of SPBCGS iterations to allow.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassSPBCGSInit() initializes this SUNLinSol_SPBCGS module for solving mass matrix linear systems.

subroutine FSUNMassSPBCGSInit(PRETYPE, MAXL, IER)
Initializes a SPBCGS SUNLinearSolver structure for use in solving mass matrix systems in ARKode. This routine must be called after the N_Vector object has been initialized.
Arguments:
• PRETYPE (int, input) – flag denoting type of preconditioning to use: none=0, left=1, right=2, both=3.
• MAXL (int, input) – number of SPBCGS iterations to allow.
• IER (int, output) – return flag (0 success, -1 for failure).
The SUNLinSol_SPBCGSSetPrecType() and SUNLinSol_SPBCGSSetMaxl() routines also support Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNSPBCGSSetPrecType(CODE, PRETYPE, IER)
Fortran interface to SUNLinSol_SPBCGSSetPrecType() for system linear solvers. This routine must be called after FSUNSPBCGSInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPBCGSSetPrecType(PRETYPE, IER)
Fortran interface to SUNLinSol_SPBCGSSetPrecType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPBCGSInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNSPBCGSSetMaxl(CODE, MAXL, IER)
Fortran interface to SUNLinSol_SPBCGSSetMaxl() for system linear solvers. This routine must be called after FSUNSPBCGSInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPBCGSSetMaxl(MAXL, IER)
Fortran interface to SUNLinSol_SPBCGSSetMaxl() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPBCGSInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

11.11.2 SUNLinSol_SPBCGS Description

The SUNLinSol_SPBCGS module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_SPBCGS {
  int maxl;
  int pretype;
  int numiters;
  realtype resnorm;
  long int last_flag;
  ATimesFn ATimes;
  void* ATData;
  PSetupFn Psetup;
  PSolveFn Psolve;
  void* PData;
  N_Vector s1;
  N_Vector s2;
  N_Vector r;
  N_Vector r_star;
  N_Vector p;
  N_Vector q;
  N_Vector u;
  N_Vector Ap;
  N_Vector vtemp;
};

These entries of the content field contain the following information:
• maxl - number of SPBCGS iterations to allow (default is 5),
• pretype - flag for type of preconditioning to employ (default is none),
• numiters - number of iterations from the most-recent solve,
• resnorm - final linear residual norm from the most-recent solve,
• last_flag - last error return flag from an internal function,
• ATimes - function pointer to perform 𝐴𝑣 product,
• ATData - pointer to structure for ATimes,
• Psetup - function pointer to preconditioner setup routine,
• Psolve - function pointer to preconditioner solve routine,
• PData - pointer to structure for Psetup and Psolve,
• s1, s2 - vector pointers for supplied scaling matrices (default is NULL),
• r - an N_Vector which holds the current scaled, preconditioned linear system residual,
• r_star - an N_Vector which holds the initial scaled, preconditioned linear system residual,
• p, q, u, Ap, vtemp - N_Vector objects used for workspace by the SPBCGS algorithm.

This solver is constructed to perform the following operations:
• During construction all N_Vector solver data is allocated, with vectors cloned from a template N_Vector that is input, and default solver parameters are set.
• User-facing “set” routines may be called to modify default solver parameters.
• Additional “set” routines are called by the SUNDIALS solver that interfaces with SUNLinSol_SPBCGS to supply the ATimes, Psetup, and Psolve function pointers and s1 and s2 scaling vectors.
• In the “initialize” call, the solver parameters are checked for validity.
• In the “setup” call, any non-NULL Psetup function is called. Typically, this is provided by the SUNDIALS solver itself, which translates between the generic Psetup function and the solver-specific routine (solver-supplied or user-supplied).
• In the “solve” call the SPBCGS iteration is performed. This will include scaling and preconditioning if those options have been supplied.
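As a rough illustration of what the “solve” call iterates on, the following is a minimal sketch of an unscaled, unpreconditioned BiCGStab loop in plain C on a fixed-size 2x2 system. It is not the SUNLinSol_SPBCGS implementation: all sketch_* names, the callback type, and the example matrix are hypothetical, the local array names merely echo the workspace vectors in the structure above, and the early exit after the half step is a safeguard chosen for this sketch. Squared norms are compared against tol squared to avoid needing a square root.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_N 2 /* size of the illustrative system */

/* Hypothetical callback playing the role of ATimes: computes Av = A*v. */
typedef void (*sketch_atimes_fn)(const double *v, double *Av);

/* Example nonsymmetric matrix A = [4 1; 2 3], used only for illustration. */
void sketch_example_atimes(const double *v, double *Av)
{
  Av[0] = 4.0 * v[0] + 1.0 * v[1];
  Av[1] = 2.0 * v[0] + 3.0 * v[1];
}

double sketch_dot(const double *a, const double *b)
{
  double s = 0.0;
  for (size_t i = 0; i < SKETCH_N; i++) s += a[i] * b[i];
  return s;
}

/* x holds the initial guess on entry and the approximate solution on return.
   Returns the number of iterations used, or -1 if the residual 2-norm did
   not drop below tol within maxl iterations. */
int sketch_bicgstab(sketch_atimes_fn atimes, const double *b, double *x,
                    int maxl, double tol)
{
  double r[SKETCH_N], r_star[SKETCH_N], p[SKETCH_N];
  double q[SKETCH_N], u[SKETCH_N], Ap[SKETCH_N];
  double alpha, omega, beta, beta_num, beta_denom;
  size_t i;

  assert(atimes != NULL && b != NULL && x != NULL);

  atimes(x, Ap); /* initial residual r = b - A*x; r_star is a frozen copy */
  for (i = 0; i < SKETCH_N; i++) {
    r[i] = b[i] - Ap[i];
    r_star[i] = r[i];
    p[i] = r[i];
  }
  if (sketch_dot(r, r) <= tol * tol) return 0;
  beta_denom = sketch_dot(r_star, r);

  for (int l = 1; l <= maxl; l++) {
    atimes(p, Ap);
    alpha = beta_denom / sketch_dot(Ap, r_star);
    for (i = 0; i < SKETCH_N; i++) q[i] = r[i] - alpha * Ap[i];
    if (sketch_dot(q, q) <= tol * tol) { /* converged after the half step */
      for (i = 0; i < SKETCH_N; i++) x[i] += alpha * p[i];
      return l;
    }
    atimes(q, u);
    omega = sketch_dot(u, q) / sketch_dot(u, u);
    for (i = 0; i < SKETCH_N; i++) {
      x[i] += alpha * p[i] + omega * q[i];
      r[i] = q[i] - omega * u[i];
    }
    if (sketch_dot(r, r) <= tol * tol) return l;
    beta_num = sketch_dot(r, r_star);
    beta = (beta_num / beta_denom) * (alpha / omega);
    for (i = 0; i < SKETCH_N; i++) p[i] = r[i] + beta * (p[i] - omega * Ap[i]);
    beta_denom = beta_num;
  }
  return -1;
}
```

A caller would pass sketch_example_atimes, a right-hand side, a zero initial guess, maxl = 5 and a small tolerance. Note that the work arrays are sized once and do not depend on maxl; the sketch omits the scaling vectors and preconditioner calls that the real module applies when they are supplied.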
The SUNLinSol_SPBCGS module defines implementations of all “iterative” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_SPBCGS
• SUNLinSolInitialize_SPBCGS
• SUNLinSolSetATimes_SPBCGS
• SUNLinSolSetPreconditioner_SPBCGS
• SUNLinSolSetScalingVectors_SPBCGS
• SUNLinSolSetup_SPBCGS
• SUNLinSolSolve_SPBCGS
• SUNLinSolNumIters_SPBCGS
• SUNLinSolResNorm_SPBCGS
• SUNLinSolResid_SPBCGS
• SUNLinSolLastFlag_SPBCGS
• SUNLinSolSpace_SPBCGS
• SUNLinSolFree_SPBCGS

11.12 The SUNLinSol_SPTFQMR Module

The SPTFQMR (Scaled, Preconditioned, Transpose-Free Quasi-Minimum Residual [F1993]) implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_SPTFQMR, is an iterative linear solver that is designed to be compatible with any N_Vector implementation (serial, threaded, parallel, and user-supplied) that supports a minimal subset of operations (N_VClone(), N_VDotProd(), N_VScale(), N_VLinearSum(), N_VProd(), N_VConst(), N_VDiv(), and N_VDestroy()). Unlike the SPGMR and SPFGMR algorithms, SPTFQMR requires a fixed amount of memory that does not increase with the number of allowed iterations.

11.12.1 SUNLinSol_SPTFQMR Usage

The header file to be included when using this module is sunlinsol/sunlinsol_sptfqmr.h. The SUNLinSol_SPTFQMR module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsolsptfqmr module library.

The module SUNLinSol_SPTFQMR provides the following user-callable routines:

SUNLinearSolver SUNLinSol_SPTFQMR(N_Vector y, int pretype, int maxl)
This constructor function creates and allocates memory for a SPTFQMR SUNLinearSolver. Its arguments are an N_Vector, the desired type of preconditioning, and the number of linear iterations to allow. This routine will perform consistency checks to ensure that it is called with a consistent N_Vector implementation (i.e.
that it supplies the requisite vector operations). If y is incompatible, then this routine will return NULL. A maxl argument that is ≤ 0 will result in the default value (5). Allowable inputs for pretype are PREC_NONE (0), PREC_LEFT (1), PREC_RIGHT (2) and PREC_BOTH (3); any other integer input will result in the default (no preconditioning). We note that some SUNDIALS solvers are designed to only work with left preconditioning (IDA and IDAS) and others with only right preconditioning (KINSOL). While it is possible to configure a SUNLinSol_SPTFQMR object to use any of the preconditioning options with these solvers, this use mode is not supported and may result in inferior performance.

int SUNLinSol_SPTFQMRSetPrecType(SUNLinearSolver S, int pretype)
This function updates the type of preconditioning to use. Supported values are PREC_NONE (0), PREC_LEFT (1), PREC_RIGHT (2), and PREC_BOTH (3). This routine will return with one of the error codes SUNLS_ILL_INPUT (illegal pretype), SUNLS_MEM_NULL (S is NULL), or SUNLS_SUCCESS.

int SUNLinSol_SPTFQMRSetMaxl(SUNLinearSolver S, int maxl)
This function updates the number of linear solver iterations to allow. A maxl argument that is ≤ 0 will result in the default value (5). This routine will return with one of the error codes SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS.

For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:

SUNLinearSolver SUNSPTFQMR(N_Vector y, int pretype, int maxl)
Wrapper function for SUNLinSol_SPTFQMR()
int SUNSPTFQMRSetPrecType(SUNLinearSolver S, int pretype)
Wrapper function for SUNLinSol_SPTFQMRSetPrecType()

int SUNSPTFQMRSetMaxl(SUNLinearSolver S, int maxl)
Wrapper function for SUNLinSol_SPTFQMRSetMaxl()

For solvers that include a Fortran interface module, the SUNLinSol_SPTFQMR module also includes the Fortran-callable function FSUNSPTFQMRInit() to initialize this SUNLinSol_SPTFQMR module for a given SUNDIALS solver.

subroutine FSUNSPTFQMRInit(CODE, PRETYPE, MAXL, IER)
Initializes a SPTFQMR SUNLinearSolver structure for use in a SUNDIALS package. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used with: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• PRETYPE (int, input) – flag denoting type of preconditioning to use: none=0, left=1, right=2, both=3.
• MAXL (int, input) – number of SPTFQMR iterations to allow.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassSPTFQMRInit() initializes this SUNLinSol_SPTFQMR module for solving mass matrix linear systems.

subroutine FSUNMassSPTFQMRInit(PRETYPE, MAXL, IER)
Initializes a SPTFQMR SUNLinearSolver structure for use in solving mass matrix systems in ARKode. This routine must be called after the N_Vector object has been initialized.
Arguments:
• PRETYPE (int, input) – flag denoting type of preconditioning to use: none=0, left=1, right=2, both=3.
• MAXL (int, input) – number of SPTFQMR iterations to allow.
• IER (int, output) – return flag (0 success, -1 for failure).
The SUNLinSol_SPTFQMRSetPrecType() and SUNLinSol_SPTFQMRSetMaxl() routines also support Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNSPTFQMRSetPrecType(CODE, PRETYPE, IER)
Fortran interface to SUNLinSol_SPTFQMRSetPrecType() for system linear solvers. This routine must be called after FSUNSPTFQMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPTFQMRSetPrecType(PRETYPE, IER)
Fortran interface to SUNLinSol_SPTFQMRSetPrecType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPTFQMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNSPTFQMRSetMaxl(CODE, MAXL, IER)
Fortran interface to SUNLinSol_SPTFQMRSetMaxl() for system linear solvers. This routine must be called after FSUNSPTFQMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassSPTFQMRSetMaxl(MAXL, IER)
Fortran interface to SUNLinSol_SPTFQMRSetMaxl() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassSPTFQMRInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.
11.12.2 SUNLinSol_SPTFQMR Description

The SUNLinSol_SPTFQMR module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_SPTFQMR {
  int maxl;
  int pretype;
  int numiters;
  realtype resnorm;
  long int last_flag;
  ATimesFn ATimes;
  void* ATData;
  PSetupFn Psetup;
  PSolveFn Psolve;
  void* PData;
  N_Vector s1;
  N_Vector s2;
  N_Vector r_star;
  N_Vector q;
  N_Vector d;
  N_Vector v;
  N_Vector p;
  N_Vector *r;
  N_Vector u;
  N_Vector vtemp1;
  N_Vector vtemp2;
  N_Vector vtemp3;
};

These entries of the content field contain the following information:
• maxl - number of TFQMR iterations to allow (default is 5),
• pretype - flag for type of preconditioning to employ (default is none),
• numiters - number of iterations from the most-recent solve,
• resnorm - final linear residual norm from the most-recent solve,
• last_flag - last error return flag from an internal function,
• ATimes - function pointer to perform 𝐴𝑣 product,
• ATData - pointer to structure for ATimes,
• Psetup - function pointer to preconditioner setup routine,
• Psolve - function pointer to preconditioner solve routine,
• PData - pointer to structure for Psetup and Psolve,
• s1, s2 - vector pointers for supplied scaling matrices (default is NULL),
• r_star - an N_Vector which holds the initial scaled, preconditioned linear system residual,
• q, d, v, p, u - N_Vector objects used for workspace by the SPTFQMR algorithm,
• r - array of two N_Vector objects used for workspace within the SPTFQMR algorithm,
• vtemp1, vtemp2, vtemp3 - temporary vector storage.

This solver is constructed to perform the following operations:
• During construction all N_Vector solver data is allocated, with vectors cloned from a template N_Vector that is input, and default solver parameters are set.
• User-facing “set” routines may be called to modify default solver parameters.
• Additional “set” routines are called by the SUNDIALS solver that interfaces with SUNLinSol_SPTFQMR to supply the ATimes, Psetup, and Psolve function pointers and s1 and s2 scaling vectors.
• In the “initialize” call, the solver parameters are checked for validity.
• In the “setup” call, any non-NULL Psetup function is called. Typically, this is provided by the SUNDIALS solver itself, which translates between the generic Psetup function and the solver-specific routine (solver-supplied or user-supplied).
• In the “solve” call the TFQMR iteration is performed. This will include scaling and preconditioning if those options have been supplied.

The SUNLinSol_SPTFQMR module defines implementations of all “iterative” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_SPTFQMR
• SUNLinSolInitialize_SPTFQMR
• SUNLinSolSetATimes_SPTFQMR
• SUNLinSolSetPreconditioner_SPTFQMR
• SUNLinSolSetScalingVectors_SPTFQMR
• SUNLinSolSetup_SPTFQMR
• SUNLinSolSolve_SPTFQMR
• SUNLinSolNumIters_SPTFQMR
• SUNLinSolResNorm_SPTFQMR
• SUNLinSolResid_SPTFQMR
• SUNLinSolLastFlag_SPTFQMR
• SUNLinSolSpace_SPTFQMR
• SUNLinSolFree_SPTFQMR

11.13 The SUNLinSol_PCG Module

The PCG (Preconditioned Conjugate Gradient [HS1952]) implementation of the SUNLinearSolver module provided with SUNDIALS, SUNLinSol_PCG, is an iterative linear solver that is designed to be compatible with any N_Vector implementation (serial, threaded, parallel, and user-supplied) that supports a minimal subset of operations (N_VClone(), N_VDotProd(), N_VScale(), N_VLinearSum(), N_VProd(), and N_VDestroy()). Unlike the SPGMR and SPFGMR algorithms, PCG requires a fixed amount of memory that does not increase with the number of allowed iterations.

Unlike all of the other iterative linear solvers supplied with SUNDIALS, PCG should only be used on symmetric linear systems (e.g.
mass matrix linear systems encountered in ARKode). As a result, the explanation of the role of scaling and preconditioning matrices given in general must be modified in this scenario. The PCG algorithm solves a linear system $Ax = b$ where $A$ is a symmetric ($A^T = A$), real-valued matrix. Preconditioning is allowed, and is applied in a symmetric fashion on both the right and left. Scaling is also allowed and is applied symmetrically. We denote the preconditioner and scaling matrices as follows:
• $P$ is the preconditioner (assumed symmetric),
• $S$ is a diagonal matrix of scale factors.

The matrices $A$ and $P$ are not required explicitly; only routines that provide $A$ and $P^{-1}$ as operators are required. The diagonal of the matrix $S$ is held in a single N_Vector, supplied by the user.

In this notation, PCG applies the underlying CG algorithm to the equivalent transformed system

$$\tilde{A} \tilde{x} = \tilde{b} \qquad (11.4)$$

where

$$\tilde{A} = S P^{-1} A P^{-1} S, \qquad \tilde{b} = S P^{-1} b, \qquad \tilde{x} = S^{-1} P x. \qquad (11.5)$$

The scaling matrix must be chosen so that the vectors $S P^{-1} b$ and $S^{-1} P x$ have dimensionless components.

The stopping test for the PCG iterations is on the L2 norm of the scaled preconditioned residual:

$$\| \tilde{b} - \tilde{A} \tilde{x} \|_2 < \delta \;\Leftrightarrow\; \| S P^{-1} b - S P^{-1} A x \|_2 < \delta \;\Leftrightarrow\; \| P^{-1} b - P^{-1} A x \|_S < \delta$$

where $\| v \|_S = \sqrt{v^T S^T S v}$, with an input tolerance $\delta$.

11.13.1 SUNLinSol_PCG Usage

The header file to be included when using this module is sunlinsol/sunlinsol_pcg.h. The SUNLinSol_PCG module is accessible from all SUNDIALS solvers without linking to the libsundials_sunlinsolpcg module library.

The module SUNLinSol_PCG provides the following user-callable routines:

SUNLinearSolver SUNLinSol_PCG(N_Vector y, int pretype, int maxl)
This constructor function creates and allocates memory for a PCG SUNLinearSolver. Its arguments are an N_Vector, a flag indicating to use preconditioning, and the number of linear iterations to allow. This routine will perform consistency checks to ensure that it is called with a consistent N_Vector implementation (i.e.
that it supplies the requisite vector operations). If y is incompatible then this routine will return NULL. A maxl argument that is ≤ 0 will result in the default value (5). Since the PCG algorithm is designed to only support symmetric preconditioning, any of the pretype inputs PREC_LEFT (1), PREC_RIGHT (2), or PREC_BOTH (3) will result in use of the symmetric preconditioner; any other integer input will result in the default (no preconditioning). Although some SUNDIALS solvers are designed to only work with left preconditioning (IDA and IDAS) and others with only right preconditioning (KINSOL), PCG should only be used with these packages when the linear systems are known to be symmetric. Since the scaling of matrix rows and columns must be identical in a symmetric matrix, symmetric preconditioning should work appropriately even for packages designed with one-sided preconditioning in mind.

int SUNLinSol_PCGSetPrecType(SUNLinearSolver S, int pretype)
This function updates the flag indicating use of preconditioning. As above, any one of the input values PREC_LEFT (1), PREC_RIGHT (2), or PREC_BOTH (3) will enable preconditioning; PREC_NONE (0) disables preconditioning. This routine will return with one of the error codes SUNLS_ILL_INPUT (illegal pretype), SUNLS_MEM_NULL (S is NULL), or SUNLS_SUCCESS.

int SUNLinSol_PCGSetMaxl(SUNLinearSolver S, int maxl)
This function updates the number of linear solver iterations to allow. A maxl argument that is ≤ 0 will result in the default value (5). This routine will return with one of the error codes SUNLS_MEM_NULL (S is NULL) or SUNLS_SUCCESS.
For backwards compatibility, we also provide the wrapper functions, each with identical input and output arguments to the routines that they wrap:

SUNLinearSolver SUNPCG(N_Vector y, int pretype, int maxl)
Wrapper function for SUNLinSol_PCG()

int SUNPCGSetPrecType(SUNLinearSolver S, int pretype)
Wrapper function for SUNLinSol_PCGSetPrecType()

int SUNPCGSetMaxl(SUNLinearSolver S, int maxl)
Wrapper function for SUNLinSol_PCGSetMaxl()

For solvers that include a Fortran interface module, the SUNLinSol_PCG module also includes the Fortran-callable function FSUNPCGInit() to initialize this SUNLinSol_PCG module for a given SUNDIALS solver.

subroutine FSUNPCGInit(CODE, PRETYPE, MAXL, IER)
Initializes a PCG SUNLinearSolver structure for use in a SUNDIALS package. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this linear solver will be used with: CVODE=1, IDA=2, KINSOL=3, ARKode=4.
• PRETYPE (int, input) – flag denoting whether to use symmetric preconditioning: no=0, yes=1.
• MAXL (int, input) – number of PCG iterations to allow.
• IER (int, output) – return flag (0 success, -1 for failure).

Additionally, when using ARKode with a non-identity mass matrix, the Fortran-callable function FSUNMassPCGInit() initializes this SUNLinSol_PCG module for solving mass matrix linear systems.

subroutine FSUNMassPCGInit(PRETYPE, MAXL, IER)
Initializes a PCG SUNLinearSolver structure for use in solving mass matrix systems in ARKode. This routine must be called after the N_Vector object has been initialized.
Arguments:
• PRETYPE (int, input) – flag denoting whether to use symmetric preconditioning: no=0, yes=1.
• MAXL (int, input) – number of PCG iterations to allow.
• IER (int, output) – return flag (0 success, -1 for failure).
The SUNLinSol_PCGSetPrecType() and SUNLinSol_PCGSetMaxl() routines also support Fortran interfaces for the system and mass matrix solvers:

subroutine FSUNPCGSetPrecType(CODE, PRETYPE, IER)
Fortran interface to SUNLinSol_PCGSetPrecType() for system linear solvers. This routine must be called after FSUNPCGInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassPCGSetPrecType(PRETYPE, IER)
Fortran interface to SUNLinSol_PCGSetPrecType() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassPCGInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNPCGSetMaxl(CODE, MAXL, IER)
Fortran interface to SUNLinSol_PCGSetMaxl() for system linear solvers. This routine must be called after FSUNPCGInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

subroutine FSUNMassPCGSetMaxl(MAXL, IER)
Fortran interface to SUNLinSol_PCGSetMaxl() for mass matrix linear solvers in ARKode. This routine must be called after FSUNMassPCGInit() has been called.
Arguments: all should have type int, and have meanings identical to those listed above.

11.13.2 SUNLinSol_PCG Description

The SUNLinSol_PCG module defines the content field of a SUNLinearSolver to be the following structure:

struct _SUNLinearSolverContent_PCG {
  int maxl;
  int pretype;
  int numiters;
  realtype resnorm;
  long int last_flag;
  ATimesFn ATimes;
  void* ATData;
  PSetupFn Psetup;
  PSolveFn Psolve;
  void* PData;
  N_Vector s;
  N_Vector r;
  N_Vector p;
  N_Vector z;
  N_Vector Ap;
};

These entries of the content field contain the following information:
• maxl - number of PCG iterations to allow (default is 5),
• pretype - flag for use of preconditioning (default is none),
• numiters - number of iterations from the most-recent solve,
• resnorm - final linear residual norm from the most-recent solve,
• last_flag - last error return flag from an internal function,
• ATimes - function pointer to perform 𝐴𝑣 product,
• ATData - pointer to structure for ATimes,
• Psetup - function pointer to preconditioner setup routine,
• Psolve - function pointer to preconditioner solve routine,
• PData - pointer to structure for Psetup and Psolve,
• s - vector pointer for supplied scaling matrix (default is NULL),
• r - an N_Vector which holds the preconditioned linear system residual,
• p, z, Ap - N_Vector objects used for workspace by the PCG algorithm.

This solver is constructed to perform the following operations:
• During construction all N_Vector solver data is allocated, with vectors cloned from a template N_Vector that is input, and default solver parameters are set.
• User-facing “set” routines may be called to modify default solver parameters.
• Additional “set” routines are called by the SUNDIALS solver that interfaces with SUNLinSol_PCG to supply the ATimes, Psetup, and Psolve function pointers and s scaling vector.
• In the “initialize” call, the solver parameters are checked for validity.
• In the “setup” call, any non-NULL Psetup function is called. Typically, this is provided by the SUNDIALS solver itself, which translates between the generic Psetup function and the solver-specific routine (solver-supplied or user-supplied).
• In the “solve” call the PCG iteration is performed. This will include scaling and preconditioning if those options have been supplied.
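To give a feel for the iteration performed in the “solve” call, here is a minimal, self-contained sketch of textbook preconditioned conjugate gradients in plain C on a 2x2 symmetric positive definite system with a Jacobi (diagonal) preconditioner. It is not the SUNLinSol_PCG source. The sketch_* names, the callback types, and the example matrix are hypothetical; scaling is omitted (S is the identity), and for simplicity the stopping test uses the plain residual 2-norm rather than the scaled preconditioned norm of the module. The local arrays r, p, z and Ap only echo the names in the structure above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_N 2 /* size of the illustrative system */

/* Hypothetical callbacks playing the roles of ATimes and Psolve. */
typedef void (*sketch_atimes_fn)(const double *v, double *Av); /* Av = A*v      */
typedef void (*sketch_psolve_fn)(const double *r, double *z);  /* z = P^{-1}*r */

/* Example symmetric positive definite matrix A = [4 1; 1 3]. */
void sketch_spd_atimes(const double *v, double *Av)
{
  Av[0] = 4.0 * v[0] + 1.0 * v[1];
  Av[1] = 1.0 * v[0] + 3.0 * v[1];
}

/* Jacobi preconditioner: divide by the diagonal of the example matrix. */
void sketch_jacobi_psolve(const double *r, double *z)
{
  z[0] = r[0] / 4.0;
  z[1] = r[1] / 3.0;
}

double sketch_dot(const double *a, const double *b)
{
  double s = 0.0;
  for (size_t i = 0; i < SKETCH_N; i++) s += a[i] * b[i];
  return s;
}

/* x holds the initial guess on entry and the approximate solution on return.
   psolve may be NULL (no preconditioning). Returns the number of iterations
   used, or -1 if the residual did not drop below tol within maxl iterations. */
int sketch_pcg(sketch_atimes_fn atimes, sketch_psolve_fn psolve,
               const double *b, double *x, int maxl, double tol)
{
  double r[SKETCH_N], p[SKETCH_N], z[SKETCH_N], Ap[SKETCH_N];
  double alpha, beta, rz, rz_new;
  size_t i;

  assert(atimes != NULL && b != NULL && x != NULL);

  atimes(x, Ap); /* r = b - A*x */
  for (i = 0; i < SKETCH_N; i++) r[i] = b[i] - Ap[i];
  if (sketch_dot(r, r) <= tol * tol) return 0;

  if (psolve != NULL) psolve(r, z);
  else for (i = 0; i < SKETCH_N; i++) z[i] = r[i];
  for (i = 0; i < SKETCH_N; i++) p[i] = z[i];
  rz = sketch_dot(r, z);

  for (int l = 1; l <= maxl; l++) {
    atimes(p, Ap);
    alpha = rz / sketch_dot(p, Ap);
    for (i = 0; i < SKETCH_N; i++) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    if (sketch_dot(r, r) <= tol * tol) return l;

    if (psolve != NULL) psolve(r, z);
    else for (i = 0; i < SKETCH_N; i++) z[i] = r[i];
    rz_new = sketch_dot(r, z);
    beta = rz_new / rz;
    for (i = 0; i < SKETCH_N; i++) p[i] = z[i] + beta * p[i];
    rz = rz_new;
  }
  return -1;
}
```

As with the module itself, the sketch is only appropriate when the matrix is symmetric; on a nonsymmetric system the recurrences above lose the orthogonality properties that CG relies on.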
The SUNLinSol_PCG module defines implementations of all “iterative” linear solver operations listed in the section The SUNLinearSolver API:
• SUNLinSolGetType_PCG
• SUNLinSolInitialize_PCG
• SUNLinSolSetATimes_PCG
• SUNLinSolSetPreconditioner_PCG
• SUNLinSolSetScalingVectors_PCG – since PCG only supports symmetric scaling, the second N_Vector argument to this function is ignored
• SUNLinSolSetup_PCG
• SUNLinSolSolve_PCG
• SUNLinSolNumIters_PCG
• SUNLinSolResNorm_PCG
• SUNLinSolResid_PCG
• SUNLinSolLastFlag_PCG
• SUNLinSolSpace_PCG
• SUNLinSolFree_PCG

11.14 SUNLinearSolver Examples

There are SUNLinearSolver examples that may be installed for each implementation; these make use of the functions in test_sunlinsol.c. These example functions show simple usage of the SUNLinearSolver family of modules. The inputs to the examples depend on the linear solver type, and are output to stdout if the example is run without the appropriate number of command-line arguments.

The following is a list of the example functions in test_sunlinsol.c:
• Test_SUNLinSolGetType: Verifies the returned solver type against the value that should be returned.
• Test_SUNLinSolInitialize: Verifies that SUNLinSolInitialize can be called and returns successfully.
• Test_SUNLinSolSetup: Verifies that SUNLinSolSetup can be called and returns successfully.
• Test_SUNLinSolSolve: Given a SUNMatrix object 𝐴, N_Vector objects 𝑥 and 𝑏 (where 𝐴𝑥 = 𝑏) and a desired solution tolerance tol, this routine clones 𝑥 into a new vector 𝑦, calls SUNLinSolSolve to fill 𝑦 as the solution to 𝐴𝑦 = 𝑏 (to the input tolerance), verifies that each entry in 𝑥 and 𝑦 match to within 10*tol, and overwrites 𝑥 with 𝑦 prior to returning (in case the calling routine would like to investigate further).
• Test_SUNLinSolSetATimes (iterative solvers only): Verifies that SUNLinSolSetATimes can be called and returns successfully.
• Test_SUNLinSolSetPreconditioner (iterative solvers only): Verifies that SUNLinSolSetPreconditioner can be called and returns successfully.
• Test_SUNLinSolSetScalingVectors (iterative solvers only): Verifies that SUNLinSolSetScalingVectors can be called and returns successfully.
• Test_SUNLinSolLastFlag: Verifies that SUNLinSolLastFlag can be called, and outputs the result to stdout.
• Test_SUNLinSolNumIters (iterative solvers only): Verifies that SUNLinSolNumIters can be called, and outputs the result to stdout.
• Test_SUNLinSolResNorm (iterative solvers only): Verifies that SUNLinSolResNorm can be called, and that the result is non-negative.
• Test_SUNLinSolResid (iterative solvers only): Verifies that SUNLinSolResid can be called.
• Test_SUNLinSolSpace: Verifies that SUNLinSolSpace can be called, and outputs the results to stdout.

Note that these tests should be performed in a particular order. For either direct or iterative linear solvers, Test_SUNLinSolInitialize must be called before Test_SUNLinSolSetup, which must be called before Test_SUNLinSolSolve. Additionally, for iterative linear solvers Test_SUNLinSolSetATimes, Test_SUNLinSolSetPreconditioner and Test_SUNLinSolSetScalingVectors should be called before Test_SUNLinSolInitialize; similarly Test_SUNLinSolNumIters, Test_SUNLinSolResNorm and Test_SUNLinSolResid should be called after Test_SUNLinSolSolve. These are called in the appropriate order in all of the example problems.

CHAPTER TWELVE
NONLINEAR SOLVER DATA STRUCTURES

12.1 Description of the SUNNonlinearSolver Module

SUNDIALS time integration packages are written in terms of generic nonlinear solver operations defined by the SUNNonlinSol API and implemented by a particular SUNNonlinSol module of type SUNNonlinearSolver. Users can supply their own SUNNonlinSol module, or use one of the modules provided with SUNDIALS.
The time integrators in SUNDIALS specify a default nonlinear solver module, and as such this chapter is intended for users that wish to use a non-default nonlinear solver module or would like to provide their own nonlinear solver implementation. Users interested in using a non-default solver module may skip the description of the SUNNonlinSol API in section The SUNNonlinearSolver API and proceed to the subsequent sections in this chapter that describe the SUNNonlinSol modules provided with SUNDIALS.

For users interested in providing their own SUNNonlinSol module, the following section presents the SUNNonlinSol API and its implementation beginning with the definition of SUNNonlinSol functions in the sections SUNNonlinearSolver core functions, SUNNonlinearSolver set functions and SUNNonlinearSolver get functions. This is followed by the definition of functions supplied to a nonlinear solver implementation in the section Functions provided by SUNDIALS integrators. The nonlinear solver return codes are given in the section SUNNonlinearSolver return codes. The SUNNonlinearSolver type and the generic SUNNonlinSol module are defined in the section The generic SUNNonlinearSolver module. Finally, the section Implementing a Custom SUNNonlinearSolver Module lists the requirements for supplying a custom SUNNonlinSol module. Users wishing to supply their own SUNNonlinSol module are encouraged to use the SUNNonlinSol implementations provided with SUNDIALS as a template for supplying custom nonlinear solver modules.

12.1.1 The SUNNonlinearSolver API

The SUNNonlinSol API defines several nonlinear solver operations that enable SUNDIALS integrators to utilize any SUNNonlinSol implementation that provides the required functions. These functions can be divided into three categories. The first are the core nonlinear solver functions.
The second group of functions consists of set routines to supply the nonlinear solver with functions provided by the SUNDIALS time integrators and to modify solver parameters. The final group consists of get routines for retrieving nonlinear solver statistics. All of these functions are defined in the header file sundials/sundials_nonlinearsolver.h.

SUNNonlinearSolver core functions

The core nonlinear solver functions consist of two required functions to get the nonlinear solver type (SUNNonlinSolGetType) and solve the nonlinear system (SUNNonlinSolSolve). The remaining three functions for nonlinear solver initialization (SUNNonlinSolInitialize), setup (SUNNonlinSolSetup), and destruction (SUNNonlinSolFree) are optional.

SUNNonlinearSolver_Type SUNNonlinSolGetType(SUNNonlinearSolver NLS)
The required function SUNNonlinSolGetType() returns the nonlinear solver type.
Arguments:
• NLS – a SUNNonlinSol object
Return value: the SUNNonlinSol type identifier (of type int) will be one of the following:
• SUNNONLINEARSOLVER_ROOTFIND – 0, the SUNNonlinSol module solves 𝐹(𝑦) = 0.
• SUNNONLINEARSOLVER_FIXEDPOINT – 1, the SUNNonlinSol module solves 𝐺(𝑦) = 𝑦.

int SUNNonlinSolInitialize(SUNNonlinearSolver NLS)
The optional function SUNNonlinSolInitialize() performs nonlinear solver initialization and may perform any necessary memory allocations.
Arguments:
• NLS – a SUNNonlinSol object
Return value: the return value is zero for a successful call and a negative value for a failure.
Notes: It is assumed all solver-specific options have been set prior to calling SUNNonlinSolInitialize(). SUNNonlinSol implementations that do not require initialization may set this operation to NULL.

int SUNNonlinSolSetup(SUNNonlinearSolver NLS, N_Vector y, void* mem)
The optional function SUNNonlinSolSetup() performs any solver setup needed for a nonlinear solve.
Arguments:
• NLS – a SUNNonlinSol object
• y – the initial iterate passed to the nonlinear solver.
• mem – the SUNDIALS integrator memory structure.
Return value: the return value is zero for a successful call and a negative value for a failure.
Notes: SUNDIALS integrators call SUNNonlinSolSetup() before each step attempt. SUNNonlinSol implementations that do not require setup may set this operation to NULL.

int SUNNonlinSolSolve(SUNNonlinearSolver NLS, N_Vector y0, N_Vector y, N_Vector w, realtype tol, booleantype callLSetup, void *mem)
The required function SUNNonlinSolSolve() solves the nonlinear system 𝐹(𝑦) = 0 or 𝐺(𝑦) = 𝑦.
Arguments:
• NLS – a SUNNonlinSol object
• y0 – the initial iterate for the nonlinear solve. This must remain unchanged throughout the solution process.
• y – the solution to the nonlinear system.
• w – the solution error weight vector used for computing weighted error norms.
• tol – the requested solution tolerance in the weighted root-mean-squared norm.
• callLSetup – a flag indicating that the integrator recommends for the linear solver setup function to be called.
• mem – the SUNDIALS integrator memory structure.
Return value: the return value is zero for a successful solve, a positive value for a recoverable error, and a negative value for an unrecoverable error.

int SUNNonlinSolFree(SUNNonlinearSolver NLS)
The optional function SUNNonlinSolFree() frees any memory allocated by the nonlinear solver.
Arguments:
• NLS – a SUNNonlinSol object
Return value: the return value should be zero for a successful call, and a negative value for a failure. SUNNonlinSol implementations that do not allocate data may set this operation to NULL.

SUNNonlinearSolver set functions

The following set functions are used to supply nonlinear solver modules with functions defined by the SUNDIALS integrators and to modify solver parameters.
Only the routine for setting the nonlinear system defining function (SUNNonlinSolSetSysFn) is required. All other set functions are optional.

int SUNNonlinSolSetSysFn(SUNNonlinearSolver NLS, SUNNonlinSolSysFn SysFn)
The required function SUNNonlinSolSetSysFn() is used to provide the nonlinear solver with the function defining the nonlinear system. This is the function 𝐹(𝑦) in 𝐹(𝑦) = 0 for SUNNONLINEARSOLVER_ROOTFIND modules or 𝐺(𝑦) in 𝐺(𝑦) = 𝑦 for SUNNONLINEARSOLVER_FIXEDPOINT modules.
Arguments:
• NLS – a SUNNonlinSol object
• SysFn – the function defining the nonlinear system. See the section Functions provided by SUNDIALS integrators for the definition of SUNNonlinSolSysFn().
Return value: the return value should be zero for a successful call, and a negative value for a failure.

int SUNNonlinSolSetLSetupFn(SUNNonlinearSolver NLS, SUNNonlinSolLSetupFn SetupFn)
The optional function SUNNonlinSolSetLSetupFn() is called by SUNDIALS integrators to provide the nonlinear solver with access to its linear solver setup function.
Arguments:
• NLS – a SUNNonlinSol object
• SetupFn – a wrapper function to the SUNDIALS integrator’s linear solver setup function. See the section Functions provided by SUNDIALS integrators for the definition of SUNNonlinSolLSetupFn().
Return value: the return value should be zero for a successful call, and a negative value for a failure.
Notes: The SUNNonlinSolLSetupFn() function sets up the linear system 𝐴𝑥 = 𝑏, where 𝐴 = 𝜕𝐹/𝜕𝑦 is the linearization of the nonlinear residual function 𝐹(𝑦) = 0 (when using SUNLinSol direct linear solvers), or calls the user-defined preconditioner setup function (when using SUNLinSol iterative linear solvers). SUNNonlinSol implementations that do not require solving this system, do not utilize SUNLinSol linear solvers, or use SUNLinSol linear solvers that do not require setup may set this operation to NULL.

int SUNNonlinSolSetLSolveFn(SUNNonlinearSolver NLS, SUNNonlinSolLSolveFn SolveFn)
The optional function SUNNonlinSolSetLSolveFn() is called by SUNDIALS integrators to provide the nonlinear solver with access to its linear solver solve function.
Arguments:
• NLS – a SUNNonlinSol object
• SolveFn – a wrapper function to the SUNDIALS integrator’s linear solver solve function. See the section Functions provided by SUNDIALS integrators for the definition of SUNNonlinSolLSolveFn().
Return value: the return value should be zero for a successful call, and a negative value for a failure.
Notes: The SUNNonlinSolLSolveFn() function solves the linear system 𝐴𝑥 = 𝑏, where 𝐴 = 𝜕𝐹/𝜕𝑦 is the linearization of the nonlinear residual function 𝐹(𝑦) = 0. SUNNonlinSol implementations that do not require solving this system or do not use SUNLinSol linear solvers may set this operation to NULL.

int SUNNonlinSolSetConvTestFn(SUNNonlinearSolver NLS, SUNNonlinSolConvTestFn CTestFn)
The optional function SUNNonlinSolSetConvTestFn() is used to provide the nonlinear solver with a function for determining if the nonlinear solver iteration has converged. This is typically called by SUNDIALS integrators to define their nonlinear convergence criteria, but may be replaced by the user.
Arguments:
• NLS – a SUNNonlinSol object
• CTestFn – a SUNDIALS integrator’s nonlinear solver convergence test function. See the section Functions provided by SUNDIALS integrators for the definition of SUNNonlinSolConvTestFn().
Return value: the return value should be zero for a successful call, and a negative value for a failure.
Notes: SUNNonlinSol implementations utilizing their own convergence test criteria may set this function to NULL.

int SUNNonlinSolSetMaxIters(SUNNonlinearSolver NLS, int maxiters)
The optional function SUNNonlinSolSetMaxIters() sets the maximum number of nonlinear solver iterations.
This is typically called by SUNDIALS integrators to define their default iteration limit, but may be adjusted by the user.
Arguments:
• NLS – a SUNNonlinSol object
• maxiters – the maximum number of nonlinear iterations.
Return value: the return value should be zero for a successful call, and a negative value for a failure (e.g., maxiters < 1).

SUNNonlinearSolver get functions

The following get functions allow SUNDIALS integrators to retrieve nonlinear solver statistics. The routines to get the current total number of iterations (SUNNonlinSolGetNumIters) and number of convergence failures (SUNNonlinSolGetNumConvFails) are optional. The routine to get the current nonlinear solver iteration (SUNNonlinSolGetCurIter) is required when using the convergence test provided by the SUNDIALS integrator or when using a SUNLinSol spils linear solver; otherwise, SUNNonlinSolGetCurIter is optional.

int SUNNonlinSolGetNumIters(SUNNonlinearSolver NLS, long int *niters)
The optional function SUNNonlinSolGetNumIters() returns the total number of nonlinear solver iterations. This is typically called by the SUNDIALS integrator to store the nonlinear solver statistics, but may also be called by the user.
Arguments:
• NLS – a SUNNonlinSol object
• niters – the total number of nonlinear solver iterations.
Return value: the return value should be zero for a successful call, and a negative value for a failure.

int SUNNonlinSolGetCurIter(SUNNonlinearSolver NLS, int *iter)
The function SUNNonlinSolGetCurIter() returns the iteration index of the current nonlinear solve. This function is required when using SUNDIALS integrator-provided convergence tests or when using a SUNLinSol spils linear solver; otherwise it is optional.
Arguments:
• NLS – a SUNNonlinSol object
• iter – the nonlinear solver iteration in the current solve, starting from zero.
Return value: the return value should be zero for a successful call, and a negative value for a failure.

int SUNNonlinSolGetNumConvFails(SUNNonlinearSolver NLS, long int *nconvfails)
The optional function SUNNonlinSolGetNumConvFails() returns the total number of nonlinear solver convergence failures. This may be called by the SUNDIALS integrator to store the nonlinear solver statistics, but may also be called by the user.
Arguments:
• NLS – a SUNNonlinSol object
• nconvfails – the total number of nonlinear solver convergence failures.
Return value: the return value should be zero for a successful call, and a negative value for a failure.

Functions provided by SUNDIALS integrators

To interface with SUNNonlinSol modules, the SUNDIALS integrators supply a variety of routines for evaluating the nonlinear system, calling the SUNLinSol setup and solve functions, and testing the nonlinear iteration for convergence. These integrator-provided routines translate between the user-supplied ODE or DAE systems and the generic interfaces to the nonlinear or linear systems of equations that result in their solution. The types for functions provided to a SUNNonlinSol module are defined in the header file sundials/sundials_nonlinearsolver.h, and are described below.

typedef int (*SUNNonlinSolSysFn)(N_Vector y, N_Vector F, void* mem)
These functions evaluate the nonlinear system 𝐹(𝑦) for SUNNONLINEARSOLVER_ROOTFIND type modules or 𝐺(𝑦) for SUNNONLINEARSOLVER_FIXEDPOINT type modules. Memory for F must be allocated prior to calling this function. The vector y must be left unchanged.
Arguments:
• y – is the state vector at which the nonlinear system should be evaluated.
• F – is the output vector containing 𝐹(𝑦) or 𝐺(𝑦), depending on the solver type.
• mem – is the SUNDIALS integrator memory structure.
Return value: The return value is zero for a successful call, a positive value for a recoverable error, and a negative value for an unrecoverable error.

typedef int (*SUNNonlinSolLSetupFn)(N_Vector y, N_Vector F, booleantype jbad, booleantype* jcur, void* mem)
These functions are wrappers to the SUNDIALS integrator’s function for setting up linear solves with SUNLinSol modules.
Arguments:
• y – is the state vector at which the linear system should be set up.
• F – is the value of the nonlinear system function at y.
• jbad – is an input indicating whether the nonlinear solver believes that 𝐴 has gone stale (SUNTRUE) or not (SUNFALSE).
• jcur – is an output indicating whether the routine has updated the Jacobian 𝐴 (SUNTRUE) or not (SUNFALSE).
• mem – is the SUNDIALS integrator memory structure.
Return value: The return value is zero for a successful call, a positive value for a recoverable error, and a negative value for an unrecoverable error.
Notes: The SUNNonlinSolLSetupFn() function sets up the linear system 𝐴𝑥 = 𝑏, where 𝐴 = 𝜕𝐹/𝜕𝑦 is the linearization of the nonlinear residual function 𝐹(𝑦) = 0 (when using SUNLinSol direct linear solvers), or calls the user-defined preconditioner setup function (when using SUNLinSol iterative linear solvers). SUNNonlinSol implementations that do not require solving this system, do not utilize SUNLinSol linear solvers, or use SUNLinSol linear solvers that do not require setup may ignore these functions.

typedef int (*SUNNonlinSolLSolveFn)(N_Vector y, N_Vector b, void* mem)
These functions are wrappers to the SUNDIALS integrator’s function for solving linear systems with SUNLinSol modules.
Arguments:
• y – is the input vector containing the current nonlinear iterate.
• b – contains the right-hand side vector for the linear solve on input and the solution to the linear system on output.
• mem – is the SUNDIALS integrator memory structure.
Return value: The return value is zero for a successful solve, a positive value for a recoverable error, and a negative value for an unrecoverable error.
Notes: The SUNNonlinSolLSolveFn() function solves the linear system 𝐴𝑥 = 𝑏, where 𝐴 = 𝜕𝐹/𝜕𝑦 is the linearization of the nonlinear residual function 𝐹(𝑦) = 0. SUNNonlinSol implementations that do not require solving this system or do not use SUNLinSol linear solvers may ignore these functions.

typedef int (*SUNNonlinSolConvTestFn)(SUNNonlinearSolver NLS, N_Vector y, N_Vector del, realtype tol, N_Vector ewt, void* mem)
These functions are SUNDIALS integrator-specific convergence tests for nonlinear solvers and are typically supplied by each SUNDIALS integrator, but users may supply custom problem-specific versions as desired.
Arguments:
• NLS – is the SUNNonlinSol object.
• y – is the current nonlinear iterate.
• del – is the difference between the current and prior nonlinear iterates.
• tol – is the nonlinear solver tolerance.
• ewt – is the weight vector used in computing weighted norms.
• mem – is the SUNDIALS integrator memory structure.
Return value: The return value of this routine will be a negative value if an unrecoverable error occurred or one of the following:
• SUN_NLS_SUCCESS – the iteration is converged.
• SUN_NLS_CONTINUE – the iteration has not converged, keep iterating.
• SUN_NLS_CONV_RECVR – the iteration appears to be diverging, try to recover.
Notes: The tolerance passed to this routine by SUNDIALS integrators is the tolerance in a weighted root-mean-squared norm with error weight vector ewt. SUNNonlinSol modules utilizing their own convergence criteria may ignore these functions.

SUNNonlinearSolver return codes

The functions provided to SUNNonlinSol modules by each SUNDIALS integrator, and functions within the SUNDIALS-provided SUNNonlinSol implementations, utilize a common set of return codes, shown in the table below. Here, negative values correspond to non-recoverable failures, positive values to recoverable failures, and zero to a successful call.

Description of the SUNNonlinearSolver return codes:

Name                 Value   Description
SUN_NLS_SUCCESS       0      successful call or converged solve
SUN_NLS_CONTINUE      1      the nonlinear solver is not converged, keep iterating
SUN_NLS_CONV_RECVR    2      the nonlinear solver appears to be diverging, try to recover
SUN_NLS_MEM_NULL     -1      a memory argument is NULL
SUN_NLS_MEM_FAIL     -2      a memory access or allocation failed
SUN_NLS_ILL_INPUT    -3      an illegal input option was provided

The generic SUNNonlinearSolver module

SUNDIALS integrators interact with specific SUNNonlinSol implementations through the generic SUNNonlinSol module on which all other SUNNonlinSol implementations are built. The SUNNonlinearSolver type is a pointer to a structure containing an implementation-dependent content field and an ops field. The type SUNNonlinearSolver is defined as follows:

typedef struct _generic_SUNNonlinearSolver *SUNNonlinearSolver;

struct _generic_SUNNonlinearSolver {
  void *content;
  struct _generic_SUNNonlinearSolver_Ops *ops;
};

where the _generic_SUNNonlinearSolver_Ops structure is a list of pointers to the various actual nonlinear solver operations provided by a specific implementation.
The _generic_SUNNonlinearSolver_Ops structure is defined as

struct _generic_SUNNonlinearSolver_Ops {
  SUNNonlinearSolver_Type (*gettype)(SUNNonlinearSolver);
  int (*initialize)(SUNNonlinearSolver);
  int (*setup)(SUNNonlinearSolver, N_Vector, void*);
  int (*solve)(SUNNonlinearSolver, N_Vector, N_Vector, N_Vector, realtype, booleantype, void*);
  int (*free)(SUNNonlinearSolver);
  int (*setsysfn)(SUNNonlinearSolver, SUNNonlinSolSysFn);
  int (*setlsetupfn)(SUNNonlinearSolver, SUNNonlinSolLSetupFn);
  int (*setlsolvefn)(SUNNonlinearSolver, SUNNonlinSolLSolveFn);
  int (*setctestfn)(SUNNonlinearSolver, SUNNonlinSolConvTestFn);
  int (*setmaxiters)(SUNNonlinearSolver, int);
  int (*getnumiters)(SUNNonlinearSolver, long int*);
  int (*getcuriter)(SUNNonlinearSolver, int*);
  int (*getnumconvfails)(SUNNonlinearSolver, long int*);
};

The generic SUNNonlinSol module defines and implements the nonlinear solver operations defined in Sections SUNNonlinearSolver core functions through SUNNonlinearSolver get functions. These routines are in fact only wrappers to the nonlinear solver operations provided by a particular SUNNonlinSol implementation, which are accessed through the ops field of the SUNNonlinearSolver structure. To illustrate this point we show below the implementation of a typical nonlinear solver operation from the generic SUNNonlinSol module, namely SUNNonlinSolSolve, which solves the nonlinear system and returns a flag denoting a successful or failed solve:

int SUNNonlinSolSolve(SUNNonlinearSolver NLS, N_Vector y0, N_Vector y, N_Vector w, realtype tol, booleantype callLSetup, void* mem)
{
  return((int) NLS->ops->solve(NLS, y0, y, w, tol, callLSetup, mem));
}

Implementing a Custom SUNNonlinearSolver Module

A SUNNonlinSol implementation must do the following:
• Specify the content of the SUNNonlinSol module.
• Define and implement the required nonlinear solver operations defined in Sections SUNNonlinearSolver core functions through SUNNonlinearSolver get functions. Note that the names of the module routines should be unique to that implementation in order to permit using more than one SUNNonlinSol module (each with different SUNNonlinearSolver internal data representations) in the same code.
• Define and implement a user-callable constructor to create a SUNNonlinearSolver object.

Additionally, a SUNNonlinearSolver implementation may do the following:
• Define and implement additional user-callable “set” routines acting on the SUNNonlinearSolver object, e.g., for setting various configuration options to tune the performance of the nonlinear solve algorithm.
• Provide additional user-callable “get” routines acting on the SUNNonlinearSolver object, e.g., for returning various solve statistics.

12.1.2 The SUNNonlinearSolver_Newton implementation

This section describes the SUNNonlinSol implementation of Newton’s method. To access the SUNNonlinSol_Newton module, include the header file sunnonlinsol/sunnonlinsol_newton.h. We note that the SUNNonlinSol_Newton module is accessible from SUNDIALS integrators without separately linking to the libsundials_sunnonlinsolnewton module library.

SUNNonlinearSolver_Newton description

To find the solution to

$$F(y) = 0 \tag{12.1}$$

given an initial guess $y^{(0)}$, Newton’s method computes a series of approximate solutions

$$y^{(m+1)} = y^{(m)} + \delta^{(m+1)}$$

where $m$ is the Newton iteration index, and the Newton update $\delta^{(m+1)}$ is the solution of the linear system

$$A(y^{(m)})\,\delta^{(m+1)} = -F(y^{(m)}), \tag{12.2}$$

in which $A$ is the Jacobian matrix

$$A \equiv \partial F / \partial y. \tag{12.3}$$

Depending on the linear solver used, the SUNNonlinSol_Newton module will employ either a Modified Newton method, or an Inexact Newton method [B1987], [BS1990], [DES1982], [DS1996], [K1995].
When used with a direct linear solver, the Jacobian matrix 𝐴 is held constant during the Newton iteration, resulting in a Modified Newton method. With a matrix-free iterative linear solver, the iteration is an Inexact Newton method.

In both cases, calls to the integrator-supplied SUNNonlinSolLSetupFn() function are made infrequently to amortize the increased cost of matrix operations (updating 𝐴 and its factorization within direct linear solvers, or updating the preconditioner within iterative linear solvers). Specifically, SUNNonlinSol_Newton will call the SUNNonlinSolLSetupFn() function in two instances:
1. when requested by the integrator (the input callLSetup is SUNTRUE) before attempting the Newton iteration, or
2. when reattempting the nonlinear solve after a recoverable failure occurs in the Newton iteration with stale Jacobian information (jcur is SUNFALSE). In this case, SUNNonlinSol_Newton will set jbad to SUNTRUE before calling the SUNNonlinSolLSetupFn() function.

Whether the Jacobian matrix 𝐴 is fully or partially updated depends on logic unique to each integrator-supplied SUNNonlinSolLSetupFn() routine. We refer to the discussion of nonlinear solver strategies provided in Chapter Mathematical Considerations for details on this decision.

The default maximum number of iterations and the stopping criteria for the Newton iteration are supplied by the SUNDIALS integrator when SUNNonlinSol_Newton is attached to it. Both the maximum number of iterations and the convergence test function may be modified by the user by calling the SUNNonlinSolSetMaxIters() and/or SUNNonlinSolSetConvTestFn() functions after attaching the SUNNonlinSol_Newton object to the integrator.

SUNNonlinearSolver_Newton functions

The SUNNonlinSol_Newton module provides the following constructor for creating the SUNNonlinearSolver object.

SUNNonlinearSolver SUNNonlinSol_Newton(N_Vector y)
The function SUNNonlinSol_Newton() creates a SUNNonlinearSolver object for use with SUNDIALS integrators to solve nonlinear systems of the form 𝐹(𝑦) = 0 using Newton’s method.
Arguments:
• y – a template for cloning vectors needed within the solver.
Return value: a SUNNonlinSol object if the constructor exits successfully, otherwise it will be NULL.

The SUNNonlinSol_Newton module implements all of the functions defined in sections SUNNonlinearSolver core functions through SUNNonlinearSolver get functions except for the SUNNonlinSolSetup() function. The SUNNonlinSol_Newton functions have the same names as those defined by the generic SUNNonlinSol API with _Newton appended to the function name. Unless using the SUNNonlinSol_Newton module as a standalone nonlinear solver, the generic functions defined in sections SUNNonlinearSolver core functions through SUNNonlinearSolver get functions should be called in favor of the SUNNonlinSol_Newton-specific implementations.

The SUNNonlinSol_Newton module also defines the following additional user-callable function.

int SUNNonlinSolGetSysFn_Newton(SUNNonlinearSolver NLS, SUNNonlinSolSysFn *SysFn)
The function SUNNonlinSolGetSysFn_Newton() returns the residual function that defines the nonlinear system.
Arguments:
• NLS – a SUNNonlinSol object
• SysFn – the function defining the nonlinear system.
Return value: the return value should be zero for a successful call, and a negative value for a failure.
Notes: This function is intended for users that wish to evaluate the nonlinear residual in a custom convergence test function for the SUNNonlinSol_Newton module. We note that SUNNonlinSol_Newton will not leverage the results from any user calls to SysFn.

SUNNonlinearSolver_Newton content

The content field of the SUNNonlinSol_Newton module is the following structure.

struct _SUNNonlinearSolverContent_Newton {
  SUNNonlinSolSysFn      Sys;
  SUNNonlinSolLSetupFn   LSetup;
  SUNNonlinSolLSolveFn   LSolve;
  SUNNonlinSolConvTestFn CTest;
  N_Vector               delta;
  booleantype            jcur;
  int                    curiter;
  int                    maxiters;
  long int               niters;
  long int               nconvfails;
};

These entries of the content field contain the following information:
• Sys – the function for evaluating the nonlinear system,
• LSetup – the package-supplied function for setting up the linear solver,
• LSolve – the package-supplied function for performing a linear solve,
• CTest – the function for checking convergence of the Newton iteration,
• delta – the Newton iteration update vector,
• jcur – the Jacobian status (SUNTRUE = current, SUNFALSE = stale),
• curiter – the current number of iterations in the solve attempt,
• maxiters – the maximum number of Newton iterations allowed in a solve,
• niters – the total number of nonlinear iterations across all solves, and
• nconvfails – the total number of nonlinear convergence failures across all solves.

SUNNonlinearSolver_Newton Fortran interface

For SUNDIALS integrators that include a Fortran interface, the SUNNonlinSol_Newton module also includes a Fortran-callable function for creating a SUNNonlinearSolver object.

subroutine FSUNNewtonInit(CODE, IER)
The function FSUNNewtonInit() can be called for Fortran programs to create a SUNNonlinearSolver object for use with SUNDIALS integrators to solve nonlinear systems of the form 𝐹(𝑦) = 0 with Newton’s method. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this nonlinear solver will be used with: CVODE=1, IDA=2, ARKode=4.
• IER (int, output) – return flag (0 for success, -1 for failure). See printed message for details in case of failure.

12.1.3 The SUNNonlinearSolver_FixedPoint implementation

This section describes the SUNNonlinSol implementation of a fixed point (functional) iteration with optional Anderson acceleration. To access the SUNNonlinSol_FixedPoint module, include the header file sunnonlinsol/sunnonlinsol_fixedpoint.h. We note that the SUNNonlinSol_FixedPoint module is accessible from SUNDIALS integrators without separately linking to the libsundials_sunnonlinsolfixedpoint module library.

SUNNonlinearSolver_FixedPoint description

To find the solution to

$$G(y) = y \tag{12.4}$$

given an initial guess $y^{(0)}$, the fixed point iteration computes a series of approximate solutions

$$y^{(n+1)} = G(y^{(n)}) \tag{12.5}$$

where $n$ is the iteration index. The convergence of this iteration may be accelerated using Anderson’s method [A1965], [WN2011], [FS2009], [LWWY2012]. With Anderson acceleration using subspace size $m$, the series of approximate solutions can be formulated as the linear combination

$$y^{(n+1)} = \sum_{i=0}^{m_n} \alpha_i^{(n)} G(y^{(n-m_n+i)}) \tag{12.6}$$

where $m_n = \min\{m, n\}$ and the factors

$$\alpha^{(n)} = (\alpha_0^{(n)}, \ldots, \alpha_{m_n}^{(n)})$$

solve the minimization problem $\min_\alpha \|F_n \alpha^T\|_2$ under the constraint that $\sum_{i=0}^{m_n} \alpha_i = 1$, where

$$F_n = (f_{n-m_n}, \ldots, f_n)$$

with $f_i = G(y^{(i)}) - y^{(i)}$. Due to this constraint, in the limit of $m = 0$ the accelerated fixed point iteration formula (12.6) simplifies to the standard fixed point iteration (12.5).

Following the recommendations made in [WN2011], the SUNNonlinSol_FixedPoint implementation computes the series of approximate solutions as

$$y^{(n+1)} = G(y^{(n)}) - \sum_{i=0}^{m_n-1} \gamma_i^{(n)} \Delta g_{n-m_n+i} \tag{12.7}$$

with $\Delta g_i = G(y^{(i+1)}) - G(y^{(i)})$ and where the factors

$$\gamma^{(n)} = (\gamma_0^{(n)}, \ldots, \gamma_{m_n-1}^{(n)})$$

solve the unconstrained minimization problem $\min_\gamma \|f_n - \Delta F_n \gamma^T\|_2$ where

$$\Delta F_n = (\Delta f_{n-m_n}, \ldots, \Delta f_{n-1}),$$

with $\Delta f_i = f_{i+1} - f_i$.
The least-squares problem is solved by applying a QR factorization to $\Delta F_n = Q_n R_n$ and solving $R_n \gamma = Q_n^T f_n$.

The acceleration subspace size $m$ is required when constructing the SUNNonlinSol_FixedPoint object. The default maximum number of iterations and the stopping criteria for the fixed point iteration are supplied by the SUNDIALS integrator when SUNNonlinSol_FixedPoint is attached to it. Both the maximum number of iterations and the convergence test function may be modified by the user by calling the SUNNonlinSolSetMaxIters() and/or SUNNonlinSolSetConvTestFn() functions after attaching the SUNNonlinSol_FixedPoint object to the integrator.

SUNNonlinearSolver_FixedPoint functions

The SUNNonlinSol_FixedPoint module provides the following constructor for creating the SUNNonlinearSolver object.

SUNNonlinearSolver SUNNonlinSol_FixedPoint(N_Vector y, int m)
The function SUNNonlinSol_FixedPoint() creates a SUNNonlinearSolver object for use with SUNDIALS integrators to solve nonlinear systems of the form 𝐺(𝑦) = 𝑦.
Arguments:
• y – a template for cloning vectors needed within the solver.
• m – the number of acceleration vectors to use.
Return value: a SUNNonlinSol object if the constructor exits successfully, otherwise it will be NULL.

Since the accelerated fixed point iteration (12.5) does not require the setup or solution of any linear systems, the SUNNonlinSol_FixedPoint module implements all of the functions defined in sections SUNNonlinearSolver core functions through SUNNonlinearSolver get functions except for the SUNNonlinSolSetup(), SUNNonlinSolSetLSetupFn(), and SUNNonlinSolSetLSolveFn() functions, which are set to NULL. The SUNNonlinSol_FixedPoint functions have the same names as those defined by the generic SUNNonlinSol API with _FixedPoint appended to the function name.
Unless using the SUNNonlinSol_FixedPoint module as a standalone nonlinear solver, the generic functions defined in sections SUNNonlinearSolver core functions through SUNNonlinearSolver get functions should be called in favor of the SUNNonlinSol_FixedPoint-specific implementations.

The SUNNonlinSol_FixedPoint module also defines the following additional user-callable function.

int SUNNonlinSolGetSysFn_FixedPoint(SUNNonlinearSolver NLS, SUNNonlinSolSysFn *SysFn)
The function SUNNonlinSolGetSysFn_FixedPoint() returns the fixed-point function that defines the nonlinear system.
Arguments:
• NLS – a SUNNonlinSol object
• SysFn – the function defining the nonlinear system.
Return value: the return value should be zero for a successful call, and a negative value for a failure.
Notes: This function is intended for users that wish to evaluate the fixed-point function in a custom convergence test function for the SUNNonlinSol_FixedPoint module. We note that SUNNonlinSol_FixedPoint will not leverage the results from any user calls to SysFn.

SUNNonlinearSolver_FixedPoint content

The content field of the SUNNonlinSol_FixedPoint module is the following structure.

struct _SUNNonlinearSolverContent_FixedPoint {
  SUNNonlinSolSysFn      Sys;
  SUNNonlinSolConvTestFn CTest;
  int       m;
  int      *imap;
  realtype *R;
  realtype *gamma;
  realtype *cvals;
  N_Vector *df;
  N_Vector *dg;
  N_Vector *q;
  N_Vector *Xvecs;
  N_Vector  yprev;
  N_Vector  gy;
  N_Vector  fold;
  N_Vector  gold;
  N_Vector  delta;
  int       curiter;
  int       maxiters;
  long int  niters;
  long int  nconvfails;
};

The following entries of the content field are always allocated:
• Sys – function for evaluating the nonlinear system,
• CTest – function for checking convergence of the fixed point iteration,
• yprev – N_Vector used to store previous fixed-point iterate,
• gy – N_Vector used to store 𝐺(𝑦) in fixed-point algorithm,
• delta – N_Vector used to store difference between successive fixed-point iterates,
• curiter – the current number of iterations in the solve attempt,
• maxiters – the maximum number of fixed-point iterations allowed in a solve,
• niters – the total number of nonlinear iterations across all solves,
• nconvfails – the total number of nonlinear convergence failures across all solves, and
• m – number of acceleration vectors.

If Anderson acceleration is requested (i.e., 𝑚 > 0 in the call to SUNNonlinSol_FixedPoint()), then the following items are also allocated within the content field:
• imap – index array used in acceleration algorithm (length m)
• R – small matrix used in acceleration algorithm (length m*m)
• gamma – small vector used in acceleration algorithm (length m)
• cvals – small vector used in acceleration algorithm (length m+1)
• df – array of N_Vectors used in acceleration algorithm (length m)
• dg – array of N_Vectors used in acceleration algorithm (length m)
• q – array of N_Vectors used in acceleration algorithm (length m)
• Xvecs – N_Vector pointer array used in acceleration algorithm (length m+1)
• fold – N_Vector used in acceleration algorithm
• gold – N_Vector used in acceleration algorithm

SUNNonlinearSolver_FixedPoint Fortran interface

For SUNDIALS integrators that include a Fortran interface, the SUNNonlinSol_FixedPoint module also includes a Fortran-callable function for creating a SUNNonlinearSolver object.

subroutine FSUNFixedPointInit(CODE, M, IER)
The function FSUNFixedPointInit() can be called for Fortran programs to create a SUNNonlinearSolver object for use with SUNDIALS integrators to solve nonlinear systems of the form 𝐺(𝑦) = 𝑦. This routine must be called after the N_Vector object has been initialized.
Arguments:
• CODE (int, input) – flag denoting the SUNDIALS solver this nonlinear solver will be used with: CVODE=1, IDA=2, ARKode=4.
• M (int, input) – the number of acceleration vectors.
• IER (int, output) – return flag (0 for success, -1 for failure). See printed message for details in case of failure.

CHAPTER THIRTEEN
ARKODE INSTALLATION PROCEDURE

The installation of any SUNDIALS package is accomplished by installing the SUNDIALS suite as a whole, according to the instructions that follow. The same procedure applies whether or not the downloaded file contains one or all solvers in SUNDIALS.

The SUNDIALS suite (or individual solvers) are distributed as compressed archives (.tar.gz). The name of the distribution archive is of the form SOLVER-X.Y.Z.tar.gz, where SOLVER is one of: sundials, cvode, cvodes, arkode, ida, idas, or kinsol, and X.Y.Z represents the version number (of the SUNDIALS suite or of the individual solver). To begin the installation, first uncompress and expand the sources, by issuing

% tar -zxf SOLVER-X.Y.Z.tar.gz

This will extract source files under a directory SOLVER-X.Y.Z.

Starting with version 2.6.0 of SUNDIALS, CMake is the only supported method of installation.
The explanation of the installation procedure begins with a few common observations:
• The remainder of this chapter will follow these conventions:
  SOLVERDIR is the directory SOLVER-X.Y.Z created above; i.e. the directory containing the SUNDIALS sources.
  BUILDDIR is the (temporary) directory under which SUNDIALS is built.
  INSTDIR is the directory under which the SUNDIALS exported header files and libraries will be installed. Typically, header files are exported under a directory INSTDIR/include while libraries are installed under INSTDIR/lib, with INSTDIR specified at configuration time.
• For SUNDIALS’ CMake-based installation, in-source builds are prohibited; in other words, the build directory BUILDDIR cannot be the same as SOLVERDIR and such an attempt will lead to an error. This prevents “polluting” the source tree and allows efficient builds for different configurations and/or options.
• The installation directory INSTDIR cannot be the same as the source directory SOLVERDIR.
• By default, only the libraries and header files are exported to the installation directory INSTDIR. If enabled by the user (with the appropriate toggle for CMake), the examples distributed with SUNDIALS will be built together with the solver libraries, but the installation step will result in exporting (by default in a subdirectory of the installation directory) the example sources and sample outputs together with automatically generated configuration files that reference the installed SUNDIALS headers and libraries. As such, these configuration files for the SUNDIALS examples can be used as “templates” for your own problems. CMake installs CMakeLists.txt files and also (as an option available only under Unix/Linux) Makefile files. Note that this installation approach also allows the option of building the SUNDIALS examples without having to install them. (This can be used as a sanity check for the freshly built libraries.)
• Even if generation of shared libraries is enabled, only static libraries are created for the FCMIX modules. Because of the use of fixed names for the Fortran user-provided subroutines, FCMIX shared libraries would result in “undefined symbol” errors at link time. 339 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Further details on the CMake-based installation procedures, instructions for manual compilation, and a roadmap of the resulting installed libraries and exported header files, are provided in the following subsections: • CMake-based installation • Installed libraries and exported header files 13.1 CMake-based installation CMake-based installation provides a platform-independent build system. CMake can generate Unix and Linux Makefiles, as well as KDevelop, Visual Studio, and (Apple) XCode project files from the same configuration file. In addition, CMake also provides a GUI front end and which allows an interactive build and installation process. The SUNDIALS build process requires CMake version 3.0.2 or higher and a working C compiler. On Unix-like operating systems, it also requires Make (and curses, including its development libraries, for the GUI front end to CMake, ccmake or cmake-gui), while on Windows it requires Visual Studio. While many Linux distributions offer CMake, the version included may be out of date. Many new CMake features have been added recently, and you should download the latest version from http://www.cmake.org. Build instructions for CMake (only necessary for Unix-like systems) can be found on the CMake website. Once CMake is installed, Linux/Unix users will be able to use ccmake or cmake-gui (depending on the version of CMake), while Windows users will be able to use CMakeSetup. As previously noted, when using CMake to configure, build and install SUNDIALS, it is always required to use a separate build directory. 
While in-source builds are possible, they are explicitly prohibited by the SUNDIALS CMake scripts (one of the reasons being that, unlike autotools, CMake does not provide a make distclean procedure and it is therefore difficult to clean up the source tree after an in-source build). With a separate build directory, the user can remove all traces of the build by simply removing the build directory. CMake does generate a make clean target, which will remove files generated by the compiler and linker.

13.1.1 Configuring, building, and installing on Unix-like systems

The default CMake configuration will build all included solvers and associated examples and will build static and shared libraries. The INSTDIR defaults to /usr/local and can be changed by setting the CMAKE_INSTALL_PREFIX variable. Support for FORTRAN and all other options are disabled.

CMake can be used from the command line with the cmake command, or from a curses-based GUI by using the ccmake command, or from a wxWidgets or QT based GUI by using the cmake-gui command. Examples for using both text and graphical methods will be presented. For the examples shown it is assumed that there is a top level SUNDIALS directory with appropriate source, build and install directories:

$ mkdir (...)/INSTDIR
$ mkdir (...)/BUILDDIR
$ cd (...)/BUILDDIR

Building with the GUI

Using CMake with the ccmake GUI follows the general process:

• Select and modify values, run configure (c key)
• New values are denoted with an asterisk
• To set a variable, move the cursor to the variable and press enter
  – If it is a boolean (ON/OFF) it will toggle the value
  – If it is string or file, it will allow editing of the string
  – For file and directories, the Tab key can be used to complete
• Repeat until all values are set as desired and the generate option is available (g key)
• Some variables (advanced variables) are not visible right away
• To see advanced variables, toggle to advanced mode (t key)
• To search for a variable press / key, and to repeat the search, press the n key

Using CMake with the cmake-gui GUI follows a similar process:

• Select and modify values, click Configure
• The first time you click Configure, make sure to pick the appropriate generator (the following will assume generation of Unix Makefiles).
• New values are highlighted in red
• To set a variable, click on or move the cursor to the variable and press enter
  – If it is a boolean (ON/OFF) it will check/uncheck the box
  – If it is string or file, it will allow editing of the string. Additionally, an ellipsis button (...) will appear on the far right of the entry. Clicking this button will bring up the file or directory selection dialog.
  – For files and directories, the Tab key can be used to complete
• Repeat until all values are set as desired and click the Generate button
• Some variables (advanced variables) are not visible right away
• To see advanced variables, click the advanced button

To build the default configuration using the curses GUI, from the BUILDDIR enter the ccmake command and point to the SOLVERDIR:

$ ccmake (...)/SOLVERDIR

Similarly, to build the default configuration using the wxWidgets GUI, from the BUILDDIR enter the cmake-gui command and point to the SOLVERDIR:

$ cmake-gui (...)/SOLVERDIR

The default curses configuration screen is shown in the following figure. The default INSTDIR for both SUNDIALS and corresponding examples can be changed by setting the CMAKE_INSTALL_PREFIX and the EXAMPLES_INSTALL_PATH as shown in the following figure.
Pressing the g key or clicking generate will generate makefiles including all dependencies and all rules to build SUNDIALS on this system. Back at the command prompt, you can now run:

$ make

or for a faster parallel build (e.g. using 4 threads), you can run

$ make -j 4

To install SUNDIALS in the installation directory specified in the configuration, simply run:

$ make install

Fig. 13.1: Default configuration screen. Note: Initial screen is empty. To get this default configuration, press ‘c’ repeatedly (accepting default values denoted with asterisk) until the ‘g’ option is available.

Fig. 13.2: Changing the INSTDIR for SUNDIALS and corresponding EXAMPLES.

Building from the command line

Using CMake from the command line is simply a matter of specifying CMake variable settings with the cmake command. The following will build the default configuration:

$ cmake -DCMAKE_INSTALL_PREFIX=/home/myname/sundials/instdir \
> -DEXAMPLES_INSTALL_PATH=/home/myname/sundials/instdir/examples \
> ../srcdir
$ make
$ make install

13.1.2 Configuration options (Unix/Linux)

A complete list of all available options for a CMake-based SUNDIALS configuration is provided below. Note that the default values shown are for a typical configuration on a Linux system and are provided as illustration only.

BLAS_ENABLE
  Enable BLAS support
  Default: OFF
  Note: Setting this option to ON will trigger additional CMake options. See additional information on building with BLAS enabled in Working with external Libraries.

BLAS_LIBRARIES
  BLAS library
  Default: /usr/lib/libblas.so
  Note: CMake will search for libraries in your LD_LIBRARY_PATH prior to searching default system paths.

BUILD_ARKODE
  Build the ARKODE library
  Default: ON

BUILD_CVODE
  Build the CVODE library
  Default: ON

BUILD_CVODES
  Build the CVODES library
  Default: ON

BUILD_IDA
  Build the IDA library
  Default: ON

BUILD_IDAS
  Build the IDAS library
  Default: ON

BUILD_KINSOL
  Build the KINSOL library
  Default: ON

BUILD_SHARED_LIBS
  Build shared libraries
  Default: ON

BUILD_STATIC_LIBS
  Build static libraries
  Default: ON

CMAKE_BUILD_TYPE
  Choose the type of build, options are: None (CMAKE_C_FLAGS used), Debug, Release, RelWithDebInfo, and MinSizeRel
  Default:
  Note: Specifying a build type will trigger the corresponding build type specific compiler flag options below, which will be appended to the flags set by CMAKE_<language>_FLAGS.

CMAKE_C_COMPILER
  C compiler
  Default: /usr/bin/cc

CMAKE_C_FLAGS
  Flags for C compiler
  Default:

CMAKE_C_FLAGS_DEBUG
  Flags used by the C compiler during debug builds
  Default: -g

CMAKE_C_FLAGS_MINSIZEREL
  Flags used by the C compiler during release minsize builds
  Default: -Os -DNDEBUG

CMAKE_C_FLAGS_RELEASE
  Flags used by the C compiler during release builds
  Default: -O3 -DNDEBUG

CMAKE_CXX_COMPILER
  C++ compiler
  Default: /usr/bin/c++
  Note: A C++ compiler (and all related options) are triggered only if C++ examples are enabled (EXAMPLES_ENABLE_CXX is ON). All SUNDIALS solvers can be used from C++ applications by default without setting any additional configuration options.

CMAKE_CXX_FLAGS
  Flags for C++ compiler
  Default:

CMAKE_CXX_FLAGS_DEBUG
  Flags used by the C++ compiler during debug builds
  Default: -g

CMAKE_CXX_FLAGS_MINSIZEREL
  Flags used by the C++ compiler during release minsize builds
  Default: -Os -DNDEBUG

CMAKE_CXX_FLAGS_RELEASE
  Flags used by the C++ compiler during release builds
  Default: -O3 -DNDEBUG

CMAKE_Fortran_COMPILER
  Fortran compiler
  Default: /usr/bin/gfortran
  Note: Fortran support (and all related options) are triggered only if either Fortran-C support is enabled (FCMIX_ENABLE is ON) or BLAS/LAPACK support is enabled (BLAS_ENABLE or LAPACK_ENABLE is ON).

CMAKE_Fortran_FLAGS
  Flags for Fortran compiler
  Default:

CMAKE_Fortran_FLAGS_DEBUG
  Flags used by the Fortran compiler during debug builds
  Default: -g

CMAKE_Fortran_FLAGS_MINSIZEREL
  Flags used by the Fortran compiler during release minsize builds
  Default: -Os

CMAKE_Fortran_FLAGS_RELEASE
  Flags used by the Fortran compiler during release builds
  Default: -O3

CMAKE_INSTALL_PREFIX
  Install path prefix, prepended onto install directories
  Default: /usr/local
  Note: The user must have write access to the location specified through this option. Exported SUNDIALS header files and libraries will be installed under subdirectories include and lib of CMAKE_INSTALL_PREFIX, respectively.

CXX_ENABLE
  Flag to enable C++ ARKode examples (if examples are enabled)
  Default: OFF

CUDA_ENABLE
  Build the SUNDIALS CUDA vector module.
  Default: OFF

EXAMPLES_ENABLE_C
  Build the SUNDIALS C examples
  Default: ON

EXAMPLES_ENABLE_CUDA
  Build the SUNDIALS CUDA examples
  Default: OFF
  Note: You need to enable CUDA support to build these examples.

EXAMPLES_ENABLE_CXX
  Build the SUNDIALS C++ examples
  Default: OFF

EXAMPLES_ENABLE_RAJA
  Build the SUNDIALS RAJA examples
  Default: OFF
  Note: You need to enable CUDA and RAJA support to build these examples.

EXAMPLES_ENABLE_F77
  Build the SUNDIALS Fortran77 examples
  Default: ON (if FCMIX_ENABLE is ON)

EXAMPLES_ENABLE_F90
  Build the SUNDIALS Fortran90 examples
  Default: OFF

EXAMPLES_INSTALL
  Install example files
  Default: ON
  Note: This option is triggered when any of the SUNDIALS example programs are enabled (EXAMPLES_ENABLE_<language> is ON). If the user requires installation of example programs then the sources and sample output files for all SUNDIALS modules that are currently enabled will be exported to the directory specified by EXAMPLES_INSTALL_PATH. A CMake configuration script will also be automatically generated and exported to the same directory. Additionally, if the configuration is done under a Unix-like system, makefiles for the compilation of the example programs (using the installed SUNDIALS libraries) will be automatically generated and exported to the directory specified by EXAMPLES_INSTALL_PATH.

EXAMPLES_INSTALL_PATH
  Output directory for installing example files
  Default: /usr/local/examples
  Note: The actual default value for this option will be an examples subdirectory created under CMAKE_INSTALL_PREFIX.

FCMIX_ENABLE
  Enable Fortran-C support
  Default: OFF

F90_ENABLE
  Flag to enable Fortran 90 ARKode examples (if examples are enabled)
  Default: OFF

HYPRE_ENABLE
  Flag to enable hypre support
  Default: OFF
  Note: See additional information on building with hypre enabled in Working with external Libraries.

HYPRE_INCLUDE_DIR
  Path to hypre header files
  Default: none

HYPRE_LIBRARY
  Path to hypre installed library files
  Default: none

KLU_ENABLE
  Enable KLU support
  Default: OFF
  Note: See additional information on building with KLU enabled in Working with external Libraries.

KLU_INCLUDE_DIR
  Path to SuiteSparse header files
  Default: none

KLU_LIBRARY_DIR
  Path to SuiteSparse installed library files
  Default: none

LAPACK_ENABLE
  Enable LAPACK support
  Default: OFF
  Note: Setting this option to ON will trigger additional CMake options. See additional information on building with LAPACK enabled in Working with external Libraries.

LAPACK_LIBRARIES
  LAPACK (and BLAS) libraries
  Default: /usr/lib/liblapack.so;/usr/lib/libblas.so
  Note: CMake will search for libraries in your LD_LIBRARY_PATH prior to searching default system paths.

MPI_ENABLE
  Enable MPI support (build the parallel nvector).
  Default: OFF
  Note: Setting this option to ON will trigger several additional options related to MPI.

MPI_C_COMPILER
  mpicc program
  Default:

MPI_CXX_COMPILER
  mpicxx program
  Default:
  Note: This option is triggered only if MPI is enabled (MPI_ENABLE is ON) and C++ examples are enabled (EXAMPLES_ENABLE_CXX is ON). All SUNDIALS solvers can be used from C++ MPI applications by default without setting any additional configuration options other than MPI_ENABLE.

MPI_Fortran_COMPILER
  mpif77 or mpif90 program
  Default:
  Note: This option is triggered only if MPI is enabled (MPI_ENABLE is ON) and Fortran-C support is enabled (EXAMPLES_ENABLE_F77 or EXAMPLES_ENABLE_F90 are ON).

MPIEXEC_EXECUTABLE
  Specify the executable for running MPI programs
  Default: mpirun
  Note: This option is triggered only if MPI is enabled (MPI_ENABLE is ON).

OPENMP_ENABLE
  Enable OpenMP support (build the OpenMP NVector)
  Default: OFF

PETSC_ENABLE
  Enable PETSc support
  Default: OFF
  Note: See additional information on building with PETSc enabled in Working with external Libraries.

PETSC_INCLUDE_DIR
  Path to PETSc header files
  Default: none

PETSC_LIBRARY_DIR
  Path to PETSc installed library files
  Default: none

PTHREAD_ENABLE
  Enable Pthreads support (build the Pthreads NVector)
  Default: OFF

RAJA_ENABLE
  Enable RAJA support (build the RAJA NVector).
  Default: OFF
  Note: You need to enable CUDA in order to build the RAJA vector module.

SUNDIALS_F77_FUNC_CASE
  Specify the case to use in the Fortran name-mangling scheme, options are: lower or upper
  Default:
  Note: The build system will attempt to infer the Fortran name-mangling scheme using the Fortran compiler. This option should only be used if a Fortran compiler is not available or to override the inferred or default (lower) scheme if one cannot be determined. If used, SUNDIALS_F77_FUNC_UNDERSCORES must also be set.

SUNDIALS_F77_FUNC_UNDERSCORES
  Specify the number of underscores to append in the Fortran name-mangling scheme, options are: none, one, or two
  Default:
  Note: The build system will attempt to infer the Fortran name-mangling scheme using the Fortran compiler. This option should only be used if a Fortran compiler is not available or to override the inferred or default (one) scheme if one cannot be determined. If used, SUNDIALS_F77_FUNC_CASE must also be set.

SUNDIALS_INDEX_TYPE (advanced)
  Integer type used for SUNDIALS indices. The size must match the size provided for the SUNDIALS_INDEX_SIZE option.
  Default:
  Note: In past SUNDIALS versions, a user could set this option to INT64_T to use 64-bit integers, or INT32_T to use 32-bit integers. Starting in SUNDIALS 3.2.0, these special values are deprecated. For SUNDIALS 3.2.0 and up, a user will only need to use the SUNDIALS_INDEX_SIZE option in most cases.

SUNDIALS_INDEX_SIZE
  Integer size (in bits) used for indices in SUNDIALS, options are: 32 or 64
  Default: 64
  Note: The build system tries to find an integer type of appropriate size. Candidate 64-bit integer types are (in order of preference): int64_t, __int64, long long, and long. Candidate 32-bit integers are (in order of preference): int32_t, int, and long. The advanced option, SUNDIALS_INDEX_TYPE, can be used to provide a type not listed here.

SUNDIALS_PRECISION
  Precision used in SUNDIALS, options are: double, single or extended
  Default: double

SUPERLUMT_ENABLE
  Enable SuperLU_MT support
  Default: OFF
  Note: See additional information on building with SuperLU_MT enabled in Working with external Libraries.

SUPERLUMT_INCLUDE_DIR
  Path to SuperLU_MT header files (under a typical SuperLU_MT install, this is the SuperLU_MT SRC directory)
  Default: none

SUPERLUMT_LIBRARY_DIR
  Path to SuperLU_MT installed library files
  Default: none

SUPERLUMT_THREAD_TYPE
  Must be set to Pthread or OpenMP, depending on how SuperLU_MT was compiled.
  Default: Pthread

USE_GENERIC_MATH
  Use generic (stdc) math libraries
  Default: ON

xSDK Configuration Options

SUNDIALS supports CMake configuration options defined by the Extreme-scale Scientific Software Development Kit (xSDK) community policies (see https://xsdk.info for more information). xSDK CMake options are unused by default but may be activated by setting USE_XSDK_DEFAULTS to ON.

Note: When xSDK options are active, they will overwrite the corresponding SUNDIALS option and may have different default values (see details below). As such, the equivalent SUNDIALS options should not be used when configuring with xSDK options. In the GUI front end to CMake (ccmake or cmake-gui), setting USE_XSDK_DEFAULTS to ON will hide the corresponding SUNDIALS options as advanced CMake variables. During configuration, messages are output detailing which xSDK flags are active and the equivalent SUNDIALS options that are replaced. Below is a complete list of xSDK options and the corresponding SUNDIALS options if applicable.

TPL_BLAS_LIBRARIES
  BLAS library
  Default: /usr/lib/libblas.so
  SUNDIALS equivalent: BLAS_LIBRARIES
  Note: CMake will search for libraries in your LD_LIBRARY_PATH prior to searching default system paths.

TPL_ENABLE_BLAS
  Enable BLAS support
  Default: OFF
  SUNDIALS equivalent: BLAS_ENABLE

TPL_ENABLE_HYPRE
  Enable hypre support
  Default: OFF
  SUNDIALS equivalent: HYPRE_ENABLE

TPL_ENABLE_KLU
  Enable KLU support
  Default: OFF
  SUNDIALS equivalent: KLU_ENABLE

TPL_ENABLE_PETSC
  Enable PETSc support
  Default: OFF
  SUNDIALS equivalent: PETSC_ENABLE

TPL_ENABLE_LAPACK
  Enable LAPACK support
  Default: OFF
  SUNDIALS equivalent: LAPACK_ENABLE

TPL_ENABLE_SUPERLUMT
  Enable SuperLU_MT support
  Default: OFF
  SUNDIALS equivalent: SUPERLUMT_ENABLE

TPL_HYPRE_INCLUDE_DIRS
  Path to hypre header files
  SUNDIALS equivalent: HYPRE_INCLUDE_DIR

TPL_HYPRE_LIBRARIES
  hypre library
  SUNDIALS equivalent: N/A

TPL_KLU_INCLUDE_DIRS
  Path to KLU header files
  SUNDIALS equivalent: KLU_INCLUDE_DIR

TPL_KLU_LIBRARIES
  KLU library
  SUNDIALS equivalent: N/A

TPL_LAPACK_LIBRARIES
  LAPACK (and BLAS) libraries
  Default: /usr/lib/liblapack.so;/usr/lib/libblas.so
  SUNDIALS equivalent: LAPACK_LIBRARIES
  Note: CMake will search for libraries in your LD_LIBRARY_PATH prior to searching default system paths.

TPL_PETSC_INCLUDE_DIRS
  Path to PETSc header files
  SUNDIALS equivalent: PETSC_INCLUDE_DIR

TPL_PETSC_LIBRARIES
  PETSc library
  SUNDIALS equivalent: N/A

TPL_SUPERLUMT_INCLUDE_DIRS
  Path to SuperLU_MT header files
  SUNDIALS equivalent: SUPERLUMT_INCLUDE_DIR

TPL_SUPERLUMT_LIBRARIES
  SuperLU_MT library
  SUNDIALS equivalent: N/A

TPL_SUPERLUMT_THREAD_TYPE
  SuperLU_MT library thread type
  SUNDIALS equivalent: SUPERLUMT_THREAD_TYPE

USE_XSDK_DEFAULTS
  Enable xSDK default configuration settings
  Default: OFF
  SUNDIALS equivalent: N/A
  Note: Enabling xSDK defaults also sets CMAKE_BUILD_TYPE to Debug

XSDK_ENABLE_FORTRAN
  Enable SUNDIALS Fortran interface
  Default: OFF
  SUNDIALS equivalent: FCMIX_ENABLE

XSDK_INDEX_SIZE
  Integer size (bits) used for indices in SUNDIALS, options are: 32 or 64
  Default: 32
  SUNDIALS equivalent: SUNDIALS_INDEX_SIZE

XSDK_PRECISION
  Precision used in SUNDIALS, options are: double, single, or quad
  Default: double
  SUNDIALS equivalent: SUNDIALS_PRECISION

13.1.3 Configuration examples

The following examples will help demonstrate usage of the CMake configure options.
To configure SUNDIALS using the default C and Fortran compilers, and default mpicc and mpif77 parallel compilers, enable compilation of examples, and install libraries, headers, and example sources under subdirectories of /home/myname/sundials/, use:

% cmake \
> -DCMAKE_INSTALL_PREFIX=/home/myname/sundials/instdir \
> -DEXAMPLES_INSTALL_PATH=/home/myname/sundials/instdir/examples \
> -DMPI_ENABLE=ON \
> -DFCMIX_ENABLE=ON \
> /home/myname/sundials/srcdir
% make install

To disable installation of the examples, use:

% cmake \
> -DCMAKE_INSTALL_PREFIX=/home/myname/sundials/instdir \
> -DEXAMPLES_INSTALL_PATH=/home/myname/sundials/instdir/examples \
> -DMPI_ENABLE=ON \
> -DFCMIX_ENABLE=ON \
> -DEXAMPLES_INSTALL=OFF \
> /home/myname/sundials/srcdir
% make install

13.1.4 Working with external Libraries

The SUNDIALS suite contains many options to enable implementation flexibility when developing solutions. The following are some notes addressing specific configurations when using the supported third party libraries.

Building with BLAS

SUNDIALS does not utilize BLAS directly but it may be needed by other external libraries that SUNDIALS can be built with (e.g. LAPACK, PETSc, SuperLU_MT, etc.). To enable BLAS, set the BLAS_ENABLE option to ON. If the directory containing the BLAS library is in the LD_LIBRARY_PATH environment variable, CMake will set the BLAS_LIBRARIES variable accordingly, otherwise CMake will attempt to find the BLAS library in standard system locations. To explicitly tell CMake what libraries to use, the BLAS_LIBRARIES variable can be set to the desired library. Example:

% cmake \
> -DCMAKE_INSTALL_PREFIX=/home/myname/sundials/instdir \
> -DEXAMPLES_INSTALL_PATH=/home/myname/sundials/instdir/examples \
> -DBLAS_ENABLE=ON \
> -DBLAS_LIBRARIES=/myblaspath/lib/libblas.so \
> -DSUPERLUMT_ENABLE=ON \
> -DSUPERLUMT_INCLUDE_DIR=/mysuperlumtpath/SRC \
> -DSUPERLUMT_LIBRARY_DIR=/mysuperlumtpath/lib \
> /home/myname/sundials/srcdir
% make install

Note: When allowing CMake to automatically locate the LAPACK library, CMake may also locate the corresponding BLAS library. If a working Fortran compiler is not available to infer the Fortran name-mangling scheme, the options SUNDIALS_F77_FUNC_CASE and SUNDIALS_F77_FUNC_UNDERSCORES must be set in order to bypass the check for a Fortran compiler and define the name-mangling scheme. The defaults for these options in earlier versions of SUNDIALS were lower and one, respectively.

Building with LAPACK

To enable LAPACK, set the LAPACK_ENABLE option to ON. If the directory containing the LAPACK library is in the LD_LIBRARY_PATH environment variable, CMake will set the LAPACK_LIBRARIES variable accordingly, otherwise CMake will attempt to find the LAPACK library in standard system locations. To explicitly tell CMake what library to use, the LAPACK_LIBRARIES variable can be set to the desired libraries.

Note: When setting the LAPACK location explicitly, the location of the corresponding BLAS library will also need to be set. Example:

% cmake \
> -DCMAKE_INSTALL_PREFIX=/home/myname/sundials/instdir \
> -DEXAMPLES_INSTALL_PATH=/home/myname/sundials/instdir/examples \
> -DBLAS_ENABLE=ON \
> -DBLAS_LIBRARIES=/mylapackpath/lib/libblas.so \
> -DLAPACK_ENABLE=ON \
> -DLAPACK_LIBRARIES=/mylapackpath/lib/liblapack.so \
> /home/myname/sundials/srcdir
% make install

Note: When allowing CMake to automatically locate the LAPACK library, CMake may also locate the corresponding BLAS library.
If a working Fortran compiler is not available to infer the Fortran name-mangling scheme, the options SUNDIALS_F77_FUNC_CASE and SUNDIALS_F77_FUNC_UNDERSCORES must be set in order to bypass the check for a Fortran compiler and define the name-mangling scheme. The defaults for these options in earlier versions of SUNDIALS were lower and one, respectively.

Building with KLU

The KLU libraries are part of SuiteSparse, a suite of sparse matrix software, available from the Texas A&M University website: http://faculty.cse.tamu.edu/davis/suitesparse.html .

SUNDIALS has been tested with SuiteSparse version 4.5.3. To enable KLU, set KLU_ENABLE to ON, set KLU_INCLUDE_DIR to the include path of the KLU installation and set KLU_LIBRARY_DIR to the lib path of the KLU installation. The CMake configure will result in populating the following variables: AMD_LIBRARY, AMD_LIBRARY_DIR, BTF_LIBRARY, BTF_LIBRARY_DIR, COLAMD_LIBRARY, COLAMD_LIBRARY_DIR, and KLU_LIBRARY.

Building with SuperLU_MT

The SuperLU_MT libraries are available for download from the Lawrence Berkeley National Laboratory website: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/#superlu_mt .

SUNDIALS has been tested with SuperLU_MT version 3.1. To enable SuperLU_MT, set SUPERLUMT_ENABLE to ON, set SUPERLUMT_INCLUDE_DIR to the SRC path of the SuperLU_MT installation, and set the variable SUPERLUMT_LIBRARY_DIR to the lib path of the SuperLU_MT installation. At the same time, the variable SUPERLUMT_THREAD_TYPE must be set to either Pthread or OpenMP.

Do not mix thread types when building SUNDIALS solvers. If threading is enabled for SUNDIALS by having either OPENMP_ENABLE or PTHREAD_ENABLE set to ON then SuperLU_MT should be set to use the same threading type.

Building with PETSc

The PETSc libraries are available for download from the Argonne National Laboratory website: http://www.mcs.anl.gov/petsc .

SUNDIALS has been tested with PETSc version 3.7.2. To enable PETSc, set PETSC_ENABLE to ON, set PETSC_INCLUDE_DIR to the include path of the PETSc installation, and set the variable PETSC_LIBRARY_DIR to the lib path of the PETSc installation.

Building with hypre

The hypre libraries are available for download from the Lawrence Livermore National Laboratory website: http://computation.llnl.gov/projects/hypre.

SUNDIALS has been tested with hypre version 2.11.1. To enable hypre, set HYPRE_ENABLE to ON, set HYPRE_INCLUDE_DIR to the include path of the hypre installation, and set the variable HYPRE_LIBRARY_DIR to the lib path of the hypre installation.

Building with CUDA

SUNDIALS CUDA modules and examples have been tested with version 8.0 of the CUDA toolkit. To build them, you need to install the Toolkit and compatible NVIDIA drivers. Both are available for download from the NVIDIA website: https://developer.nvidia.com/cuda-downloads.

To enable CUDA, set CUDA_ENABLE to ON. If CUDA is installed in a nonstandard location, you may be prompted to set the variable CUDA_TOOLKIT_ROOT_DIR with your CUDA Toolkit installation path. To enable CUDA examples, set EXAMPLES_ENABLE_CUDA to ON.

Building with RAJA

RAJA is a performance portability layer developed by Lawrence Livermore National Laboratory and can be obtained from https://github.com/LLNL/RAJA.

SUNDIALS RAJA modules and examples have been tested with RAJA version 0.3. Building SUNDIALS RAJA modules requires a CUDA-enabled RAJA installation. To enable RAJA, set CUDA_ENABLE and RAJA_ENABLE to ON. If RAJA is installed in a nonstandard location you will be prompted to set the variable RAJA_DIR with the path to the RAJA CMake configuration file. To enable building the RAJA examples set EXAMPLES_ENABLE_RAJA to ON.
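
The third-party options above follow a common pattern: one ENABLE switch plus include and library paths. As a minimal sketch, the following shell script assembles a configure command for the KLU case using only the option names given above (KLU_ENABLE, KLU_INCLUDE_DIR, KLU_LIBRARY_DIR). The paths, including /mysuitesparsepath, and the variable names SRCDIR, INSTDIR, KLUDIR, and cmdline are assumptions made for illustration. The script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical locations; adjust these to the local installation.
SRCDIR="/home/myname/sundials/srcdir"
INSTDIR="/home/myname/sundials/instdir"
KLUDIR="/mysuitesparsepath"

# Assemble the configure command as positional parameters, one option each.
set -- cmake \
  -DCMAKE_INSTALL_PREFIX="$INSTDIR" \
  -DEXAMPLES_INSTALL_PATH="$INSTDIR/examples" \
  -DKLU_ENABLE=ON \
  -DKLU_INCLUDE_DIR="$KLUDIR/include" \
  -DKLU_LIBRARY_DIR="$KLUDIR/lib" \
  "$SRCDIR"

# Dry run: print the command for review instead of executing it.
cmdline="$*"
echo "$cmdline"
```

To actually configure, the same argument list could be executed from inside BUILDDIR by running "$@" in place of the final echo. The same structure should carry over to the PETSc or SuperLU_MT options by substituting the corresponding variable names.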

13.1.5 Testing the build and installation

If SUNDIALS was configured with any of the EXAMPLES_ENABLE_<language> options set to ON, then a set of regression tests can be run after building with the make command by running:

% make test

Additionally, if EXAMPLES_INSTALL was also set to ON, then a set of smoke tests can be run after installing with the make install command by running:

% make test_install

13.1.6 Building and Running Examples

Each of the SUNDIALS solvers is distributed with a set of examples demonstrating basic usage. To build and install the examples, set at least one of the EXAMPLES_ENABLE_<language> options to ON, and set EXAMPLES_INSTALL to ON. Specify the installation path for the examples with the variable EXAMPLES_INSTALL_PATH. CMake will generate CMakeLists.txt configuration files (and Makefile files if on Linux/Unix) that reference the installed SUNDIALS headers and libraries.

Either the CMakeLists.txt file or the traditional Makefile may be used to build the examples as well as serve as a template for creating user developed solutions. To use the supplied Makefile simply run make to compile and generate the executables. To use CMake from within the installed example directory, run cmake (or ccmake or cmake-gui to use the GUI) followed by make to compile the example code. Note that if CMake is used, it will overwrite the traditional Makefile with a new CMake-generated Makefile.

The resulting output from running the examples can be compared with example output bundled in the SUNDIALS distribution.

NOTE: There will potentially be differences in the output due to machine architecture, compiler versions, use of third party libraries etc.

13.1.7 Configuring, building, and installing on Windows

CMake can also be used to build SUNDIALS on Windows. To build SUNDIALS for use with Visual Studio the following steps should be performed:

1. Unzip the downloaded tar file(s) into a directory. This will be the SOLVERDIR
2. Create a separate BUILDDIR
3. Open a Visual Studio Command Prompt and cd to BUILDDIR
4. Run cmake-gui ../SOLVERDIR
   (a) Hit Configure
   (b) Check/Uncheck solvers to be built
   (c) Change CMAKE_INSTALL_PREFIX to INSTDIR
   (d) Set other options as desired
   (e) Hit Generate
5. Back in the VS Command Window:
   (a) Run msbuild ALL_BUILD.vcxproj
   (b) Run msbuild INSTALL.vcxproj

The resulting libraries will be in the INSTDIR.

The SUNDIALS project can also now be opened in Visual Studio. Double click on the ALL_BUILD.vcxproj file to open the project. Build the whole solution to create the SUNDIALS libraries. To use the SUNDIALS libraries in your own projects, you must set the include directories for your project, add the SUNDIALS libraries to your project solution, and set the SUNDIALS libraries as dependencies for your project.

13.2 Installed libraries and exported header files

Using the CMake SUNDIALS build system, the command

$ make install

will install the libraries under LIBDIR and the public header files under INCLUDEDIR. The values for these directories are INSTDIR/lib and INSTDIR/include, respectively. The location can be changed by setting the CMake variable CMAKE_INSTALL_PREFIX. Although all installed libraries reside under LIBDIR/lib, the public header files are further organized into subdirectories under INCLUDEDIR/include.

The installed libraries and exported header files are listed for reference in the Table: SUNDIALS libraries and header files. The file extension .LIB is typically .so for shared libraries and .a for static libraries. Note that, in this table, names are relative to LIBDIR for libraries and to INCLUDEDIR for header files.
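
To make the directory layout above concrete, the following shell sketch assembles a compile-and-link command for a user program against an installed SUNDIALS. It is an illustration under stated assumptions: the install prefix, the source file name my_problem.c, and the choice of compiler command cc are hypothetical, and the two libraries shown (libsundials_arkode and libsundials_nvecserial, as named in the table of installed libraries) are only one plausible selection. Which libraries a given program needs depends on the modules it uses. The script prints the command rather than executing it.

```shell
#!/bin/sh
# Hypothetical install prefix and user source file.
INSTDIR="/home/myname/sundials/instdir"
INCLUDEDIR="$INSTDIR/include"
LIBDIR="$INSTDIR/lib"
SRC="my_problem.c"

# Compile against the exported headers and link the ARKode and serial
# vector libraries, plus the standard math library.
cmd="cc -I$INCLUDEDIR $SRC -o my_problem -L$LIBDIR -lsundials_arkode -lsundials_nvecserial -lm"

# Dry run: print the command instead of executing it.
echo "$cmd"
```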
A typical user program need not explicitly include any of the shared SUNDIALS header files from under the INCLUDEDIR/include/sundials directory since they are explicitly included by the appropriate solver header files (e.g., cvode_dense.h includes sundials_dense.h). However, it is both legal and safe to do so, and would be useful, for example, if the functions declared in sundials_dense.h are to be used in building a preconditioner.

13.2.1 Table: SUNDIALS libraries and header files

Module                    Kind          Files
Shared                    Header files  sundials/sundials_band.h, sundials/sundials_config.h, su...
NVECTOR_SERIAL            Libraries     libsundials_nvecserial.LIB, libsundials_fnvecserial.a
                          Header files  nvector/nvector_serial.h
NVECTOR_PARALLEL          Libraries     libsundials_nvecparallel.LIB, libsundials_fnvecparallel...
                          Header files  nvector/nvector_parallel.h
NVECTOR_OPENMP            Libraries     libsundials_nvecopenmp.LIB, libsundials_fnvecopenmp.a
                          Header files  nvector/nvector_openmp.h
NVECTOR_PTHREADS          Libraries     libsundials_nvecpthreads.LIB, libsundials_fnvecpthreads...
                          Header files  nvector/nvector_pthreads.h
SUNMATRIX_BAND            Libraries     libsundials_sunmatrixband.LIB, libsundials_fsunmatrixba...
                          Header files  sunmatrix/sunmatrix_band.h
SUNMATRIX_DENSE           Libraries     libsundials_sunmatrixdense.LIB, libsundials_fsunmatrixd...
                          Header files  sunmatrix/sunmatrix_dense.h
SUNMATRIX_SPARSE          Libraries     libsundials_sunmatrixsparse.LIB, libsundials_fsunmatrix...
                          Header files  sunmatrix/sunmatrix_sparse.h
SUNLINSOL_BAND            Libraries     libsundials_sunlinsolband.LIB, libsundials_fsunlinsolba...
                          Header files  sunlinsol/sunlinsol_band.h
SUNLINSOL_DENSE           Libraries     libsundials_sunlinsoldense.LIB, libsundials_fsunlinsold...
                          Header files  sunlinsol/sunlinsol_dense.h
SUNLINSOL_KLU             Libraries     libsundials_sunlinsolklu.LIB, libsundials_fsunlinsolklu...
                          Header files  sunlinsol/sunlinsol_klu.h
SUNLINSOL_LAPACKBAND      Libraries     libsundials_sunlinsollapackband.LIB, libsundials_fsunli...
                          Header files  sunlinsol/sunlinsol_lapackband.h
SUNLINSOL_LAPACKDENSE     Libraries     libsundials_sunlinsollapackdense.LIB, libsundials_fsunl...
                          Header files  sunlinsol/sunlinsol_lapackdense.h
SUNLINSOL_PCG             Libraries     libsundials_sunlinsolpcg.LIB, libsundials_fsunlinsolpcg...
                          Header files  sunlinsol/sunlinsol_pcg.h
SUNLINSOL_SPBCGS          Libraries     libsundials_sunlinsolspbcgs.LIB, libsundials_fsunlinsol...
                          Header files  sunlinsol/sunlinsol_spbcgs.h
SUNLINSOL_SPFGMR          Libraries     libsundials_sunlinsolspfgmr.LIB, libsundials_fsunlinsol...
                          Header files  sunlinsol/sunlinsol_spfgmr.h
SUNLINSOL_SPGMR           Libraries     libsundials_sunlinsolspgmr.LIB, libsundials_fsunlinsols...
                          Header files  sunlinsol/sunlinsol_spgmr.h
SUNLINSOL_SPTFQMR         Libraries     libsundials_sunlinsolsptfqmr.LIB, libsundials_fsunlinso...
                          Header files  sunlinsol/sunlinsol_sptfqmr.h
SUNLINSOL_SUPERLUMT       Libraries     libsundials_sunlinsolsuperlumt.LIB, libsundials_fsunlin...
                          Header files  sunlinsol/sunlinsol_superlumt.h
SUNNONLINSOL_NEWTON       Libraries     libsundials_sunnonlinsolnewton.LIB, libsundials_fsunnon...
                          Header files  sunnonlinsol/sunnonlinsol_newton.h
SUNNONLINSOL_FIXEDPOINT   Libraries     libsundials_sunnonlinsolfixedpoint.LIB, libsundials_fsu...
                          Header files  sunnonlinsol/sunnonlinsol_fixedpoint.h
CVODE                     Libraries     libsundials_cvode.LIB, libsundials_fcvode.a
                          Header files  cvode/cvode.h, cvode/cvode_bandpre.h, cvode/cvode_bbdpre...
CVODES                    Libraries     libsundials_cvodes.LIB
                          Header files  cvodes/cvodes.h, cvodes/cvodes_bandpre.h, cvodes/cvodes_...
ARKODE                    Libraries     libsundials_arkode.LIB, libsundials_farkode.a
                          Header files  arkode/arkode.h, arkode/arkode_arkstep.h, arkode/arkode_...
IDA                       Libraries     libsundials_ida.LIB, libsundials_fida.a
                          Header files  ida/ida.h, ida/ida_bbdpre.h, ida/ida_direct.h, ida/ida_imp...
IDAS                      Libraries     libsundials_idas.LIB
                          Header files  idas/idas.h, idas/idas_bbdpre.h, idas/idas_direct.h, idas/...
KINSOL                    Libraries     libsundials_kinsol.LIB, libsundials_fkinsol.a
                          Header files  kinsol/kinsol.h, kinsol/kinsol_bbdpre.h, kinsol/kinsol_d...

CHAPTER FOURTEEN
APPENDIX: ARKODE CONSTANTS

Below we list all input and output constants used by the main solver, timestepper, and linear solver modules, together with their numerical values and a short description of their meaning.

14.1 ARKode input constants

14.1.1 Shared ARKode input constants

ARK_NORMAL (1): Solver returns at a specified output time.
ARK_ONE_STEP (2): Solver returns after each successful step.

14.1.2 Explicit Butcher table specification

HEUN_EULER_2_1_2 (0): Use the Heun-Euler-2-1-2 ERK method
BOGACKI_SHAMPINE_4_2_3 (1): Use the Bogacki-Shampine-4-2-3 ERK method
ARK324L2SA_ERK_4_2_3 (2): Use the ARK-4-2-3 ERK method
ZONNEVELD_5_3_4 (3): Use the Zonneveld-5-3-4 ERK method
ARK436L2SA_ERK_6_3_4 (4): Use the ARK-6-3-4 ERK method
SAYFY_ABURUB_6_3_4 (5): Use the Sayfy-Aburub-6-3-4 ERK method
CASH_KARP_6_4_5 (6): Use the Cash-Karp-6-4-5 ERK method
FEHLBERG_6_4_5 (7): Use the Fehlberg-6-4-5 ERK method
DORMAND_PRINCE_7_4_5 (8): Use the Dormand-Prince-7-4-5 ERK method
ARK548L2SA_ERK_8_4_5 (9): Use the ARK-8-4-5 ERK method
VERNER_8_5_6 (10): Use the Verner-8-5-6 ERK method
FEHLBERG_13_7_8 (11): Use the Fehlberg-13-7-8 ERK method
KNOTH_WOLKE_3_3 (12): Use the Knoth-Wolke-3-3 ERK method
DEFAULT_ERK_2 (HEUN_EULER_2_1_2): Use the default second-order ERK method
DEFAULT_ERK_3 (BOGACKI_SHAMPINE_4_2_3): Use the default third-order ERK method
DEFAULT_ERK_4 (ZONNEVELD_5_3_4): Use the default fourth-order ERK method
DEFAULT_ERK_5 (CASH_KARP_6_4_5): Use the default fifth-order ERK method
DEFAULT_ERK_6 (VERNER_8_5_6): Use the default
sixth-order ERK method
DEFAULT_ERK_8 (FEHLBERG_13_7_8): Use the default eighth-order ERK method

14.1.3 Implicit Butcher table specification

SDIRK_2_1_2 (12): Use the SDIRK-2-1-2 SDIRK method
BILLINGTON_3_3_2 (13): Use the Billington-3-3-2 SDIRK method
TRBDF2_3_3_2 (14): Use the TRBDF2-3-3-2 ESDIRK method
KVAERNO_4_2_3 (15): Use the Kvaerno-4-2-3 ESDIRK method
ARK324L2SA_DIRK_4_2_3 (16): Use the ARK-4-2-3 ESDIRK method
CASH_5_2_4 (17): Use the Cash-5-2-4 SDIRK method
CASH_5_3_4 (18): Use the Cash-5-3-4 SDIRK method
SDIRK_5_3_4 (19): Use the SDIRK-5-3-4 SDIRK method
KVAERNO_5_3_4 (20): Use the Kvaerno-5-3-4 ESDIRK method
ARK436L2SA_DIRK_6_3_4 (21): Use the ARK-6-3-4 ESDIRK method
KVAERNO_7_4_5 (22): Use the Kvaerno-7-4-5 ESDIRK method
ARK548L2SA_DIRK_8_4_5 (23): Use the ARK-8-4-5 ESDIRK method
DEFAULT_DIRK_2 (SDIRK_2_1_2): Use the default second-order DIRK method
DEFAULT_DIRK_3 (ARK324L2SA_DIRK_4_2_3): Use the default third-order DIRK method
DEFAULT_DIRK_4 (SDIRK_5_3_4): Use the default fourth-order DIRK method
DEFAULT_DIRK_5 (ARK548L2SA_DIRK_8_4_5): Use the default fifth-order DIRK method

14.1.4 ImEx Butcher table specification

ARK324L2SA_ERK_4_2_3 and ARK324L2SA_DIRK_4_2_3 (2 and 16): Use the ARK-4-2-3 ARK method
ARK436L2SA_ERK_6_3_4 and ARK436L2SA_DIRK_6_3_4 (4 and 21): Use the ARK-6-3-4 ARK method
ARK548L2SA_ERK_8_4_5 and ARK548L2SA_DIRK_8_4_5 (9 and 23): Use the ARK-8-4-5 ARK method
DEFAULT_ARK_ETABLE_3 and DEFAULT_ARK_ITABLE_3 (ARK324L2SA_[ERK,DIRK]_4_2_3): Use the default third-order ARK method
DEFAULT_ARK_ETABLE_4 and DEFAULT_ARK_ITABLE_4 (ARK436L2SA_[ERK,DIRK]_6_3_4): Use the default fourth-order ARK method
DEFAULT_ARK_ETABLE_5 and DEFAULT_ARK_ITABLE_5 (ARK548L2SA_[ERK,DIRK]_8_4_5): Use the default fifth-order ARK method

14.2 ARKode output constants

14.2.1 Shared ARKode output constants

ARK_SUCCESS (0): Successful function return.
ARK_TSTOP_RETURN (1): ARKode succeeded by reaching the specified stopping point.
ARK_ROOT_RETURN (2): ARKode succeeded and found one or more roots.
ARK_WARNING (99): ARKode succeeded but an unusual situation occurred.
ARK_TOO_MUCH_WORK (-1): The solver took mxstep internal steps but could not reach tout.
ARK_TOO_MUCH_ACC (-2): The solver could not satisfy the accuracy demanded by the user for some internal step.
ARK_ERR_FAILURE (-3): Error test failures occurred too many times during one internal time step, or the minimum step size was reached.
ARK_CONV_FAILURE (-4): Convergence test failures occurred too many times during one internal time step, or the minimum step size was reached.
ARK_LINIT_FAIL (-5): The linear solver’s initialization function failed.
ARK_LSETUP_FAIL (-6): The linear solver’s setup function failed in an unrecoverable manner.
ARK_LSOLVE_FAIL (-7): The linear solver’s solve function failed in an unrecoverable manner.
ARK_RHSFUNC_FAIL (-8): The right-hand side function failed in an unrecoverable manner.
ARK_FIRST_RHSFUNC_ERR (-9): The right-hand side function failed at the first call.
ARK_REPTD_RHSFUNC_ERR (-10): The right-hand side function had repeated recoverable errors.
ARK_UNREC_RHSFUNC_ERR (-11): The right-hand side function had a recoverable error, but no recovery is possible.
ARK_RTFUNC_FAIL (-12): The rootfinding function failed in an unrecoverable manner.
ARK_LFREE_FAIL (-13): The linear solver’s memory deallocation function failed.
ARK_MASSINIT_FAIL (-14): The mass matrix linear solver’s initialization function failed.
ARK_MASSSETUP_FAIL (-15): The mass matrix linear solver’s setup function failed in an unrecoverable manner.
ARK_MASSSOLVE_FAIL (-16): The mass matrix linear solver’s solve function failed in an unrecoverable manner.
ARK_MASSFREE_FAIL (-17): The mass matrix linear solver’s memory deallocation function failed.
ARK_MASSMULT_FAIL (-18): The mass matrix-vector product function failed.
ARK_MEM_FAIL (-20): A memory allocation failed.
ARK_MEM_NULL (-21): The arkode_mem argument was NULL.
ARK_ILL_INPUT (-22): One of the function inputs is illegal.
ARK_NO_MALLOC (-23): The ARKode memory block was not allocated by a call to ARKodeMalloc().
ARK_BAD_K (-24): The derivative order 𝑘 is larger than allowed.
ARK_BAD_T (-25): The time 𝑡 is outside the last step taken.
ARK_BAD_DKY (-26): The output derivative vector is NULL.
ARK_TOO_CLOSE (-27): The output and initial times are too close to each other.
ARK_VECTOROP_ERR (-29): An error occurred when calling an NVECTOR routine.
ARK_NLS_INIT_FAIL (-30): An error occurred when initializing a SUNNonlinearSolver module.
ARK_NLS_SETUP_FAIL (-31): A non-recoverable error occurred when setting up a SUNNonlinearSolver module.
ARK_NLS_SETUP_RECVR (-32): A recoverable error occurred when setting up a SUNNonlinearSolver module.
ARK_NLS_OP_ERR (-33): An error occurred when calling a set/get routine in a SUNNonlinearSolver module.
ARK_INNERSTEP_OP_ERR (-34): An error occurred when calling an internal stepper within an ARKode module.
ARK_UNRECOGNIZED_ERROR (-99): An unknown error was encountered.

14.2.2 ARKLS linear solver modules

ARKLS_SUCCESS (0): Successful function return.
ARKLS_MEM_NULL (-1): The arkode_mem argument was NULL.
ARKLS_LMEM_NULL (-2): The ARKLS linear solver interface has not been initialized.
ARKLS_ILL_INPUT (-3): The ARKLS solver interface is not compatible with the current NVECTOR module, or an input value was illegal.
ARKLS_MEM_FAIL (-4): A memory allocation request failed.
ARKLS_PMEM_NULL (-5): The preconditioner module has not been initialized.
ARKLS_MASSMEM_NULL (-6): The ARKLS mass-matrix linear solver interface has not been initialized.
ARKLS_JACFUNC_UNRECVR (-7): The Jacobian function failed in an unrecoverable manner.
ARKLS_JACFUNC_RECVR (-8): The Jacobian function had a recoverable error.
ARKLS_MASSFUNC_UNRECVR (-9): The mass matrix function failed in an unrecoverable manner.
ARKLS_MASSFUNC_RECVR (-10): The mass matrix function had a recoverable error.
ARKLS_SUNMAT_FAIL (-11): An error occurred with the current SUNMATRIX module.
ARKLS_SUNLS_FAIL (-12): An error occurred with the current SUNLINSOL module.

CHAPTER FIFTEEN
APPENDIX: BUTCHER TABLES

Here we catalog the full set of Butcher tables included in ARKode. We group these into three categories: explicit, implicit and additive. However, since the methods that comprise an additive Runge-Kutta method are themselves explicit and implicit, their component Butcher tables are listed within their separate sections, but are referenced together in the additive section.

In each of the following tables, we use the following notation (shown for a 3-stage method):

\[
\begin{array}{r|ccc}
c_1 & a_{1,1} & a_{1,2} & a_{1,3} \\
c_2 & a_{2,1} & a_{2,2} & a_{2,3} \\
c_3 & a_{3,1} & a_{3,2} & a_{3,3} \\
\hline
q & b_1 & b_2 & b_3 \\
p & \tilde{b}_1 & \tilde{b}_2 & \tilde{b}_3
\end{array}
\]

Here the method and embedding share stage $A$ and $c$ values, but use their stages $z_i$ differently through the coefficients $b$ and $\tilde{b}$ to generate methods of orders $q$ (the main method) and $p$ (the embedding, typically $q = p + 1$, though sometimes this is reversed).

Method authors often use different naming conventions to categorize their methods. For each of the methods below with an embedding, we follow the uniform naming convention:

NAME-S-P-Q

where
• NAME is the author or the name provided by the author (if applicable),
• S is the number of stages in the method,
• P is the global order of accuracy for the embedding,
• Q is the global order of accuracy for the method.

For methods without an embedding (e.g., fixed-step methods) P is omitted so that methods follow the naming convention NAME-S-Q.
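To make the notation concrete, the following Python sketch performs one step of an explicit embedded pair for a scalar ODE and returns both the order-$q$ solution (weights $b$) and the order-$p$ embedded solution (weights $\tilde{b}$). It is an illustration only, not ARKode's implementation: the function name `embedded_erk_step` is hypothetical, the textbook "slope" form of the stages is assumed rather than ARKode's stage-solution form, and the coefficients are those of the Heun-Euler-2-1-2 table listed in the explicit tables section of this appendix.

```python
# Heun-Euler-2-1-2 coefficients in the notation above (q = 2, p = 1).
C = [0.0, 1.0]
A = [[0.0, 0.0],
     [1.0, 0.0]]
B = [0.5, 0.5]       # weights b of the main method (order q)
B_EMB = [1.0, 0.0]   # weights b~ of the embedding (order p)


def embedded_erk_step(f, t, y, h, A=A, b=B, b_emb=B_EMB, c=C):
    """Take one explicit Runge-Kutta step for the scalar ODE y' = f(t, y).

    Returns (y_new, y_emb, err): the main solution, the embedded solution,
    and their absolute difference, which serves as a local error estimate.
    """
    k = []  # stage slopes; both solutions reuse the same stages
    for i in range(len(c)):
        # Explicit method: stage i only depends on stages j < i.
        y_stage = y + h * sum(A[i][j] * k[j] for j in range(i))
        k.append(f(t + c[i] * h, y_stage))
    y_new = y + h * sum(bi * ki for bi, ki in zip(b, k))
    y_emb = y + h * sum(bi * ki for bi, ki in zip(b_emb, k))
    return y_new, y_emb, abs(y_new - y_emb)


# Example: one step of size 0.1 for y' = -y starting from y(0) = 1.
y_new, y_emb, err = embedded_erk_step(lambda t, y: -y, 0.0, 1.0, 0.1)
print(y_new, y_emb, err)
```

Because the two solutions share every stage evaluation, the error estimate costs no additional right-hand side calls, which is the practical reason each table carries both a $b$ row and a $\tilde{b}$ row.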
In the code, unique integer IDs are defined inside arkode_butcher_erk.h and arkode_butcher_dirk.h for each method, which may be used by calling routines to specify the desired method. These names are specified in fixed width font at the start of each method’s section below.

Additionally, for each method we provide a plot of the linear stability region in the complex plane. These have been computed via the following approach. For any Runge-Kutta method as defined above, we may define the stability function

\[
R(\eta) = 1 + \eta\, b\, [I - \eta A]^{-1} e,
\]

where $e \in \mathbb{R}^s$ is a column vector of all ones, $\eta = h\lambda$ and $h$ is the time step size. If the stability function satisfies $|R(\eta)| \le 1$ for all eigenvalues, $\lambda$, of $\frac{\partial}{\partial y} f(t, y)$ for a given IVP, then the method will be linearly stable for that problem and step size. The stability region

\[
S = \{\eta \in \mathbb{C} : |R(\eta)| \le 1\}
\]

is typically given by an enclosed region of the complex plane, so it is standard to search for the border of that region in order to understand the method. Since all complex numbers with unit magnitude may be written as $e^{i\theta}$ for some value of $\theta$, we perform the following algorithm to trace out this boundary.

1. Define an array of values Theta. Since we wish for a smooth curve, and since we wish to trace out the entire boundary, we choose 10,000 linearly-spaced points from 0 to $16\pi$. Since some angles will correspond to multiple locations on the stability boundary, by going beyond $2\pi$ we ensure that all boundary locations are plotted, and by using such a fine discretization the Newton method (next step) is more likely to converge to the root closest to the previous boundary point, ensuring a smooth plot.
2. For each value $\theta \in$ Theta, we solve the nonlinear equation
\[
0 = f(\eta) = R(\eta) - e^{i\theta}
\]
using a finite-difference Newton iteration, using tolerance $10^{-7}$, and differencing parameter $\sqrt{\varepsilon}$ ($\approx 10^{-8}$). In this iteration, we use as initial guess the solution from the previous value of $\theta$, starting with a first guess of $\eta = 0$ for $\theta = 0$.
3. We then plot the resulting $\eta$ values that trace the stability region boundary.

We note that for any stable IVP method, the value $\eta_0 = -\varepsilon + 0i$ is always within the stability region. So in each of the following pictures, the interior of the stability region is the connected region that includes $\eta_0$. Consequently, a method whose linear stability boundary is located entirely in the right half-plane is A-stable.

15.1 Explicit Butcher tables

In the category of explicit Runge-Kutta methods, ARKode includes methods that have orders 2 through 6 and 8, with embeddings that are of orders 1 through 5 and 7.

15.1.1 Heun-Euler-2-1-2

Accessible via the constant HEUN_EULER_2_1_2 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the default 2nd order explicit method.

\[
\begin{array}{r|cc}
0 & 0 & 0 \\
1 & 1 & 0 \\
\hline
2 & \frac{1}{2} & \frac{1}{2} \\
1 & 1 & 0
\end{array}
\]

Fig. 15.1: Linear stability region for the Heun-Euler method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.2 Bogacki-Shampine-4-2-3

Accessible via the constant BOGACKI_SHAMPINE_4_2_3 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the default 3rd order explicit method (from [BS1989]).

\[
\begin{array}{r|cccc}
0 & 0 & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\
\frac{3}{4} & 0 & \frac{3}{4} & 0 & 0 \\
1 & \frac{2}{9} & \frac{1}{3} & \frac{4}{9} & 0 \\
\hline
3 & \frac{2}{9} & \frac{1}{3} & \frac{4}{9} & 0 \\
2 & \frac{7}{24} & \frac{1}{4} & \frac{1}{3} & \frac{1}{8}
\end{array}
\]

Fig. 15.2: Linear stability region for the Bogacki-Shampine method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.3 ARK-4-2-3 (explicit)

Accessible via the constant ARK324L2SA_ERK_4_2_3 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK().
This is the explicit portion of the default 3rd order additive method (from [KC2003]).

\[
\begin{array}{r|cccc}
0 & 0 & 0 & 0 & 0 \\
\frac{1767732205903}{2027836641118} & \frac{1767732205903}{2027836641118} & 0 & 0 & 0 \\
\frac{3}{5} & \frac{5535828885825}{10492691773637} & \frac{788022342437}{10882634858940} & 0 & 0 \\
1 & \frac{6485989280629}{16251701735622} & -\frac{4246266847089}{9704473918619} & \frac{10755448449292}{10357097424841} & 0 \\
\hline
3 & \frac{1471266399579}{7840856788654} & -\frac{4482444167858}{7529755066697} & \frac{11266239266428}{11593286722821} & \frac{1767732205903}{4055673282236} \\
2 & \frac{2756255671327}{12835298489170} & -\frac{10771552573575}{22201958757719} & \frac{9247589265047}{10645013368117} & \frac{2193209047091}{5459859503100}
\end{array}
\]

Fig. 15.3: Linear stability region for the explicit ARK-4-2-3 method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.4 Knoth-Wolke-3-3

Accessible via the constant KNOTH_WOLKE_3_3 to MRIStepSetMRITableNum() and ARKodeLoadButcherTable_ERK(). This is the default 3rd order slow and fast MRIStep method (from [KW1998]).

\[
\begin{array}{r|ccc}
0 & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{3} & 0 & 0 \\
\frac{3}{4} & -\frac{3}{16} & \frac{15}{16} & 0 \\
\hline
3 & \frac{1}{6} & \frac{3}{10} & \frac{8}{15}
\end{array}
\]

Fig. 15.4: Linear stability region for the Knoth-Wolke method

15.1.5 Zonneveld-5-3-4

Accessible via the constant ZONNEVELD_5_3_4 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the default 4th order explicit method (from [Z1963]).

\[
\begin{array}{r|ccccc}
0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 \\
\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
\frac{3}{4} & \frac{5}{32} & \frac{7}{32} & \frac{13}{32} & -\frac{1}{32} & 0 \\
\hline
4 & \frac{1}{6} & \frac{1}{3} & \frac{1}{3} & \frac{1}{6} & 0 \\
3 & -\frac{1}{2} & \frac{7}{3} & \frac{7}{3} & \frac{13}{6} & -\frac{16}{3}
\end{array}
\]

Fig. 15.5: Linear stability region for the Zonneveld method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.6 ARK-6-3-4 (explicit)

Accessible via the constant ARK436L2SA_ERK_6_3_4 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the explicit portion of the default 4th order additive method (from [KC2003]).
\[
\begin{array}{r|cccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 \\
\frac{83}{250} & \frac{13861}{62500} & \frac{6889}{62500} & 0 & 0 & 0 & 0 \\
\frac{31}{50} & -\frac{116923316275}{2393684061468} & -\frac{2731218467317}{15368042101831} & \frac{9408046702089}{11113171139209} & 0 & 0 & 0 \\
\frac{17}{20} & -\frac{451086348788}{2902428689909} & -\frac{2682348792572}{7519795681897} & \frac{12662868775082}{11960479115383} & \frac{3355817975965}{11060851509271} & 0 & 0 \\
1 & \frac{647845179188}{3216320057751} & \frac{73281519250}{8382639484533} & \frac{552539513391}{3454668386233} & \frac{3354512671639}{8306763924573} & \frac{4040}{17871} & 0 \\
\hline
4 & \frac{82889}{524892} & 0 & \frac{15625}{83664} & \frac{69875}{102672} & -\frac{2260}{8211} & \frac{1}{4} \\
3 & \frac{4586570599}{29645900160} & 0 & \frac{178811875}{945068544} & \frac{814220225}{1159782912} & -\frac{3700637}{11593932} & \frac{61727}{225920}
\end{array}
\]

Fig. 15.6: Linear stability region for the explicit ARK-6-3-4 method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.7 Sayfy-Aburub-6-3-4

Accessible via the constant SAYFY_ABURUB_6_3_4 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK() (from [SA2002]).

\[
\begin{array}{r|cccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 2 & 0 & 0 & 0 & 0 \\
1 & \frac{1}{6} & \frac{2}{3} & \frac{1}{6} & 0 & 0 & 0 \\
\frac{1}{2} & 0.137 & 0.226 & 0.137 & 0 & 0 & 0 \\
1 & 0.452 & -0.904 & -0.548 & 0 & 2 & 0 \\
\hline
4 & \frac{1}{6} & \frac{1}{3} & \frac{1}{12} & 0 & \frac{1}{3} & \frac{1}{12} \\
3 & \frac{1}{6} & \frac{2}{3} & \frac{1}{6} & 0 & 0 & 0
\end{array}
\]

Fig. 15.7: Linear stability region for the Sayfy-Aburub-6-3-4 method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.8 Cash-Karp-6-4-5

Accessible via the constant CASH_KARP_6_4_5 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the default 5th order explicit method (from [CK1990]).

\[
\begin{array}{r|cccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{5} & \frac{1}{5} & 0 & 0 & 0 & 0 & 0 \\
\frac{3}{10} & \frac{3}{40} & \frac{9}{40} & 0 & 0 & 0 & 0 \\
\frac{3}{5} & \frac{3}{10} & -\frac{9}{10} & \frac{6}{5} & 0 & 0 & 0 \\
1 & -\frac{11}{54} & \frac{5}{2} & -\frac{70}{27} & \frac{35}{27} & 0 & 0 \\
\frac{7}{8} & \frac{1631}{55296} & \frac{175}{512} & \frac{575}{13824} & \frac{44275}{110592} & \frac{253}{4096} & 0 \\
\hline
5 & \frac{37}{378} & 0 & \frac{250}{621} & \frac{125}{594} & 0 & \frac{512}{1771} \\
4 & \frac{2825}{27648} & 0 & \frac{18575}{48384} & \frac{13525}{55296} & \frac{277}{14336} & \frac{1}{4}
\end{array}
\]

Fig. 15.8: Linear stability region for the Cash-Karp method.
The method’s region is outlined in blue; the embedding’s region is in red.

15.1.9 Fehlberg-6-4-5

Accessible via the constant FEHLBERG_6_4_5 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK() (from [F1969]).

\[
\begin{array}{r|cccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{4} & \frac{1}{4} & 0 & 0 & 0 & 0 & 0 \\
\frac{3}{8} & \frac{3}{32} & \frac{9}{32} & 0 & 0 & 0 & 0 \\
\frac{12}{13} & \frac{1932}{2197} & -\frac{7200}{2197} & \frac{7296}{2197} & 0 & 0 & 0 \\
1 & \frac{439}{216} & -8 & \frac{3680}{513} & -\frac{845}{4104} & 0 & 0 \\
\frac{1}{2} & -\frac{8}{27} & 2 & -\frac{3544}{2565} & \frac{1859}{4104} & -\frac{11}{40} & 0 \\
\hline
5 & \frac{16}{135} & 0 & \frac{6656}{12825} & \frac{28561}{56430} & -\frac{9}{50} & \frac{2}{55} \\
4 & \frac{25}{216} & 0 & \frac{1408}{2565} & \frac{2197}{4104} & -\frac{1}{5} & 0
\end{array}
\]

Fig. 15.9: Linear stability region for the Fehlberg method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.10 Dormand-Prince-7-4-5

Accessible via the constant DORMAND_PRINCE_7_4_5 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK() (from [DP1980]).

\[
\begin{array}{r|ccccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{5} & \frac{1}{5} & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{3}{10} & \frac{3}{40} & \frac{9}{40} & 0 & 0 & 0 & 0 & 0 \\
\frac{4}{5} & \frac{44}{45} & -\frac{56}{15} & \frac{32}{9} & 0 & 0 & 0 & 0 \\
\frac{8}{9} & \frac{19372}{6561} & -\frac{25360}{2187} & \frac{64448}{6561} & -\frac{212}{729} & 0 & 0 & 0 \\
1 & \frac{9017}{3168} & -\frac{355}{33} & \frac{46732}{5247} & \frac{49}{176} & -\frac{5103}{18656} & 0 & 0 \\
1 & \frac{35}{384} & 0 & \frac{500}{1113} & \frac{125}{192} & -\frac{2187}{6784} & \frac{11}{84} & 0 \\
\hline
5 & \frac{35}{384} & 0 & \frac{500}{1113} & \frac{125}{192} & -\frac{2187}{6784} & \frac{11}{84} & 0 \\
4 & \frac{5179}{57600} & 0 & \frac{7571}{16695} & \frac{393}{640} & -\frac{92097}{339200} & \frac{187}{2100} & \frac{1}{40}
\end{array}
\]

Fig. 15.10: Linear stability region for the Dormand-Prince method. The method’s region is outlined in blue; the embedding’s region is in red.

15.1.11 ARK-8-4-5 (explicit)

Accessible via the constant ARK548L2SA_ERK_8_4_5 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the explicit portion of the default 5th order additive method (from [KC2003]).
0 0 0 0 0 0 0 0 41 100 0 0 0 0 0 0 2935347310677 11292855782101 367902744464 2072280473677 677623207551 8224143866563 0 0 0 0 0 1426016391358 7196633302097 1268023523408 10340822734521 0 1029933939417 13636558850479 0 0 0 0 92 100 14463281900351 6315353703477 0 66114435211212 5879490589093 − 54053170152839 4284798021562 0 0 0 24 100 14090043504691 34967701212078 0 15191511035443 11219624916014 − 18461159152457 12425892160975 281667163811 − 9011619295870 0 0 3 5 19230459214898 13134317526959 0 21275331358303 2942455364971 − 38145345988419 4862620318723 − 81 − 81 1 19977161125411 − 11928030595625 0 − 40795976796054 6384907823539 177454434618887 12078138498510 782672205425 8267701900261 − 69563011059811 9646580694205 7356628 4942186 5 872700587467 − 9133579230613 0 0 22348218063261 9555858737531 − 1143369518992 8141816002931 − 39379526789629 19018526304540 3272738 4290004 4 975461918565 − 9796059967033 0 0 78070527104295 32432590147079 548382580838 − 3424219808633 − 33438840321285 15594753105479 3629800 4656183 41 100 15.1.12 Verner-8-5-6 Accessible via the constant VERNER_8_5_6 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the default 6th order 370 Chapter 15. Appendix: Butcher tables 0 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Fig. 15.11: Linear stability region for the explicit ARK-8-4-5 method. The method’s region is outlined in blue; the embedding’s region is in red. explicit method (from [V1978]). 
0 0 0 0 0 0 0 0 0 1 6 1 6 0 0 0 0 0 0 0 4 15 4 75 16 75 0 0 0 0 0 0 2 3 5 6 − 83 5 2 0 0 0 0 0 5 6 − 165 64 55 6 − 425 64 85 96 0 0 0 0 1 12 5 −8 4015 612 − 11 36 88 255 0 0 0 8263 − 15000 124 75 − 643 680 81 − 250 2484 10625 0 0 0 1 3501 1720 − 300 43 297275 52632 319 − 2322 24068 84065 0 3850 26703 0 6 3 40 0 875 2244 23 72 264 1955 0 125 11592 43 616 5 13 160 0 2375 5984 5 16 12 85 3 44 0 0 1 15 15.1.13 Fehlberg-13-7-8 Accessible via the constant FEHLBERG_13_7_8 to ARKStepSetARKTableNum(), ERKStepSetERKTableNum() or ARKodeLoadButcherTable_ERK(). This is the default 8th order 15.1. Explicit Butcher tables 371 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Fig. 15.12: Linear stability region for the Verner-8-5-6 method. The method’s region is outlined in blue; the embedding’s region is in red. explicit method (from [B2008]). 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 27 2 27 0 0 0 0 0 0 0 0 0 0 0 0 1 9 1 36 1 12 0 0 0 0 0 0 0 0 0 0 0 1 6 1 24 0 1 8 0 0 0 0 0 0 0 0 0 0 5 12 5 12 0 − 25 16 25 16 0 0 0 0 0 0 0 0 0 1 2 1 20 0 0 1 4 1 5 0 0 0 0 0 0 0 0 5 6 25 − 108 0 0 125 108 65 − 27 125 54 0 0 0 0 0 0 0 1 6 31 300 0 0 0 61 225 − 29 13 900 0 0 0 0 0 0 2 3 2 0 0 − 53 6 704 45 − 107 9 67 90 3 0 0 0 0 0 1 3 91 − 108 0 0 23 108 − 976 135 311 54 − 19 60 17 6 1 − 12 0 0 0 0 1 2383 4100 0 0 − 341 164 4496 1025 − 301 82 2133 4100 45 82 45 164 18 41 0 0 0 0 3 205 0 0 0 0 6 − 41 3 − 205 3 − 41 3 41 6 41 0 0 0 1 − 1777 4100 0 0 − 341 164 4496 1025 − 289 82 2193 4100 51 82 33 164 12 41 0 1 0 9 35 9 35 9 280 9 280 0 41 840 41 840 9 35 9 35 9 280 9 280 41 840 0 0 8 0 0 0 0 0 34 105 7 41 840 0 0 0 0 34 105 15.2 Implicit Butcher tables In the category of diagonally implicit Runge-Kutta methods, ARKode includes methods that have orders 2 through 5, with embeddings that are of orders 1 through 4. 372 Chapter 15. Appendix: Butcher tables User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Fig. 15.13: Linear stability region for the Fehlberg-13-7-8 method. 
The method’s region is outlined in blue; the embedding’s region is in red. 15.2.1 SDIRK-2-1-2 Accessible via the constant SDIRK_2_1_2 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). This is the default 2nd order implicit method. Both the method and embedding are A- and B-stable. 1 1 0 0 −1 1 2 1 2 1 2 1 1 0 Fig. 15.14: Linear stability region for the SDIRK-2-1-2 method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2. Implicit Butcher tables 373 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), 15.2.2 Billington-3-3-2 Accessible via the constant BILLINGTON_3_3_2 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). Here, the higher-order embedding is less stable than the lower-order method (from [B1983]). 0.292893218813 0.292893218813 0 0 1.091883092037 0.798989873223 0.292893218813 0 1.292893218813 0.740789228841 0.259210771159 0.292893218813 2 0.740789228840 0.259210771159 0 3 0.691665115992 0.503597029883 −0.195262145876 Fig. 15.15: Linear stability region for the Billington method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2.3 TRBDF2-3-3-2 Accessible via the constant TRBDF2_3_3_2 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). As with Billington, here the higher-order embedding is less stable than the lower-order method (from [B1985]). 0 2− √ 2 1 2 3 374 0 0 √ 2− 2 2 √ 2 4 √ 2 4√ 1− 42 3 √ 2− 2 2 √ 2 4 √ 2 4 √ 3 2 4 +1 3 0 0 √ 2− 2 2 √ 2− 2 2 √ 2− 2 6 Chapter 15. Appendix: Butcher tables User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Fig. 15.16: Linear stability region for the TRBDF2 method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2.4 Kvaerno-4-2-3 Accessible via the constant KVAERNO_4_2_3 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). Both the method and embedding are A-stable; additionally the method is L-stable (from [K2004]). 
0 0 0 0 0.4358665215 0.4358665215 0 0 1 0.490563388419108 0.073570090080892 0.4358665215 0 1 0.308809969973036 1.490563388254106 −1.235239879727145 0.4358665215 3 0.308809969973036 1.490563388254106 −1.235239879727145 0.4358665215 2 0.490563388419108 0.073570090080892 0 0.871733043 0.4358665215 0 Fig. 15.17: Linear stability region for the Kvaerno-4-2-3 method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2. Implicit Butcher tables 375 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), 15.2.5 ARK-4-2-3 (implicit) Accessible via the constant ARK324L2SA_DIRK_4_2_3 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). This is the default 3rd order implicit method, and the implicit portion of the default 3rd order additive method. Both the method and embedding are A-stable; additionally the method is L-stable (from [KC2003]). 0 0 0 0 0 1767732205903 2027836641118 1767732205903 4055673282236 1767732205903 4055673282236 3 5 0 0 2746238789719 10658868560708 640167445237 − 6845629431997 1767732205903 4055673282236 0 1 1471266399579 7840856788654 4482444167858 − 7529755066697 11266239266428 11593286722821 1767732205903 4055673282236 3 1471266399579 7840856788654 4482444167858 − 7529755066697 11266239266428 11593286722821 1767732205903 4055673282236 2 2756255671327 12835298489170 10771552573575 − 22201958757719 9247589265047 10645013368117 2193209047091 5459859503100 Fig. 15.18: Linear stability region for the implicit ARK-4-2-3 method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2.6 Cash-5-2-4 Accessible via the constant ARKodeLoadButcherTable_DIRK(). method is L-stable (from [C1979]). 
CASH_5_2_4 to ARKStepSetIRKTableNum() or Both the method and embedding are A-stable; additionally the 0.435866521508 0.435866521508 0 0 0 0 −0.7 −1.13586652150 0.435866521508 0 0 0 0.8 1.08543330679 −0.721299828287 0.435866521508 0 0 0.924556761814 0.416349501547 0.190984004184 −0.118643265417 0.435866521508 0 1 0.896869652944 0.0182725272734 −0.0845900310706 −0.266418670647 0.435866521508 4 0.896869652944 0.0182725272734 −0.0845900310706 −0.266418670647 0.435866521508 2 1.05646216107052 −0.0564621610705236 0 0 0 376 Chapter 15. Appendix: Butcher tables User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Fig. 15.19: Linear stability region for the Cash-5-2-4 method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2.7 Cash-5-3-4 Accessible via the constant ARKodeLoadButcherTable_DIRK(). method is L-stable (from [C1979]). CASH_5_3_4 to ARKStepSetIRKTableNum() or Both the method and embedding are A-stable; additionally the 0.435866521508 0.435866521508 0 0 0 0 −0.7 −1.13586652150 0.435866521508 0 0 0 0.8 1.08543330679 −0.721299828287 0.435866521508 0 0 0.924556761814 0.416349501547 0.190984004184 −0.118643265417 0.435866521508 0 1 0.896869652944 0.0182725272734 −0.0845900310706 −0.266418670647 0.435866521508 4 0.896869652944 0.0182725272734 −0.0845900310706 −0.266418670647 0.435866521508 3 0.776691932910 0.0297472791484 −0.0267440239074 0.220304811849 0 15.2.8 SDIRK-5-3-4 Accessible via the constant SDIRK_5_3_4 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). This is the default 4th order implicit method. Here, the method is both A- and L-stable, although the embedding has reduced stability (from [HW1996]). 15.2. Implicit Butcher tables 1 4 1 4 0 0 0 0 3 4 1 2 1 4 0 0 0 11 20 17 50 1 − 25 1 4 0 0 1 2 371 1360 137 − 2720 15 544 1 4 0 1 25 24 49 − 48 125 16 85 − 12 1 4 4 25 24 49 − 48 125 16 85 − 12 1 4 3 59 48 17 − 96 225 32 85 − 12 0 377 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Fig. 
15.20: Linear stability region for the Cash-5-3-4 method. The method’s region is outlined in blue; the embedding’s region is in red. Fig. 15.21: Linear stability region for the SDIRK-5-3-4 method. The method’s region is outlined in blue; the embedding’s region is in red. 378 Chapter 15. Appendix: Butcher tables User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), 15.2.9 Kvaerno-5-3-4 Accessible via the constant KVAERNO_5_3_4 to ARKStepSetIRKTableNum() ARKodeLoadButcherTable_DIRK(). Both the method and embedding are A-stable (from [K2004]). 0 or 0 0 0 0 0 0.4358665215 0.4358665215 0 0 0 0.468238744853136 0.140737774731968 −0.108365551378832 0.4358665215 0 0 1 0.102399400616089 −0.376878452267324 0.838612530151233 0.4358665215 0 1 0.157024897860995 0.117330441357768 0.61667803039168 −0.326899891110444 0.4358665215 4 0.157024897860995 0.117330441357768 0.61667803039168 −0.326899891110444 0.4358665215 3 0.102399400616089 −0.376878452267324 0.838612530151233 0.4358665215 0 0.871733043 Fig. 15.22: Linear stability region for the Kvaerno-5-3-4 method. The method’s region is outlined in blue; the embedding’s region is in red. 15.2.10 ARK-6-3-4 (implicit) Accessible via the constant ARK436L2SA_DIRK_6_3_4 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). This is the implicit portion of the default 4th order additive method. 15.2. Implicit Butcher tables 379 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), Both the method and embedding are A-stable; additionally the method is L-stable (from [KC2003]). 0 0 0 0 0 0 0 1 2 1 4 1 4 0 0 0 0 83 250 8611 62500 1743 − 31250 1 4 0 0 0 31 50 5012029 34652500 654441 − 2922500 174375 388108 1 4 0 0 17 20 15267082809 155376265600 71443401 − 120774400 730878875 902184768 2285395 8070912 1 4 0 1 82889 524892 0 15625 83664 69875 102672 − 2260 8211 1 4 4 82889 524892 0 15625 83664 69875 102672 − 2260 8211 1 4 3 4586570599 29645900160 0 178811875 945068544 814220225 1159782912 3700637 − 11593932 61727 225920 Fig. 
15.23: Linear stability region for the implicit ARK-6-3-4 method. The method’s region is outlined in blue; the embedding’s region is in red.

15.2.11 Kvaerno-7-4-5

Accessible via the constant KVAERNO_7_4_5 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). Both the method and embedding are A-stable; additionally the method is L-stable (from [K2004]).

\[
\begin{array}{r|rrrrc}
0 & 0 & 0 & 0 & 0 & \cdots \\
0.52 & 0.26 & 0.26 & 0 & 0 & \cdots \\
1.230333209967908 & 0.13 & 0.84033320996790809 & 0.26 & 0 & \cdots \\
0.895765984350076 & 0.22371961478320505 & 0.47675532319799699 & -0.06470895363112615 & 0.26 & \cdots \\
0.436393609858648 & 0.16648564323248321 & 0.10450018841591720 & 0.03631482272098715 & -0.13090704451073998 & \cdots \\
1 & 0.13855640231268224 & 0 & -0.04245337201752043 & 0.02446657898003141 & 0.61\ldots \\
1 & 0.13659751177640291 & 0 & -0.05496908796538376 & -0.04118626728321046 & 0.62\ldots \\
\hline
5 & 0.13659751177640291 & 0 & -0.05496908796538376 & -0.04118626728321046 & 0.62\ldots \\
4 & 0.13855640231268224 & 0 & -0.04245337201752043 & 0.02446657898003141 & 0.61\ldots
\end{array}
\]

Fig. 15.24: Linear stability region for the Kvaerno-7-4-5 method. The method’s region is outlined in blue; the embedding’s region is in red.

15.2.12 ARK-8-4-5 (implicit)

Accessible via the constant ARK548L2SA_DIRK_8_4_5 to ARKStepSetIRKTableNum() or ARKodeLoadButcherTable_DIRK(). This is the default 5th order implicit method, and the implicit portion of the default 5th order additive method. Both the method and embedding are A-stable; additionally the method is L-stable (from [KC2003]).
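Any table in this appendix can be sanity-checked after it has been transcribed into user code. The following is a minimal sketch in plain C and is not part of ARKode: every array and function name in it is hypothetical, and the coefficients are those of the SDIRK-5-3-4 table from Section 15.2.8. It checks three properties that the arithmetic confirms for that particular table: each abscissa equals the sum of its row of A, the method and embedding weights each sum to 1, and the last row of A coincides with the method weights.

```c
#include <assert.h>

#define NSTAGES 5 /* SDIRK-5-3-4 has five stages */

/* Coefficients copied from the SDIRK-5-3-4 table (Section 15.2.8).
   All names here are illustrative; none come from the ARKode API. */
static const double sdirk_c[NSTAGES] = {1.0/4, 3.0/4, 11.0/20, 1.0/2, 1.0};
static const double sdirk_A[NSTAGES][NSTAGES] = {
  {1.0/4,       0.0,          0.0,       0.0,       0.0},
  {1.0/2,       1.0/4,        0.0,       0.0,       0.0},
  {17.0/50,    -1.0/25,       1.0/4,     0.0,       0.0},
  {371.0/1360, -137.0/2720,   15.0/544,  1.0/4,     0.0},
  {25.0/24,    -49.0/48,      125.0/16, -85.0/12,   1.0/4}
};
static const double sdirk_b[NSTAGES] = {25.0/24, -49.0/48, 125.0/16, -85.0/12, 1.0/4};
static const double sdirk_d[NSTAGES] = {59.0/48, -17.0/96, 225.0/32, -85.0/12, 0.0};

static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Largest |c_i - sum_j A_ij| over all stages. */
double max_row_sum_defect(void)
{
  double worst = 0.0;
  for (int i = 0; i < NSTAGES; i++) {
    double sum = 0.0;
    for (int j = 0; j < NSTAGES; j++) sum += sdirk_A[i][j];
    double defect = abs_val(sdirk_c[i] - sum);
    if (defect > worst) worst = defect;
  }
  return worst;
}

/* |1 - sum_i w_i| for a row of weights (method or embedding). */
double weight_sum_defect(const double w[NSTAGES])
{
  double sum = 0.0;
  for (int i = 0; i < NSTAGES; i++) sum += w[i];
  return abs_val(1.0 - sum);
}

/* Returns 1 if the last row of A matches the method weights b, else 0. */
int last_row_matches_weights(void)
{
  for (int j = 0; j < NSTAGES; j++)
    if (abs_val(sdirk_A[NSTAGES - 1][j] - sdirk_b[j]) > 1e-15) return 0;
  return 1;
}

/* Aborts (via assert) if any check fails beyond rounding error. */
void check_sdirk534_table(void)
{
  const double tol = 1e-13;
  assert(max_row_sum_defect() < tol);
  assert(weight_sum_defect(sdirk_b) < tol);
  assert(weight_sum_defect(sdirk_d) < tol);
  assert(last_row_matches_weights());
}
```

Calling check_sdirk534_table() from a small test program returns silently when the transcription is consistent. The same three checks could be pointed at another table by swapping the arrays and NSTAGES, with the caveat that tables given as rounded decimals would need a looser tolerance.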
\[
\begin{array}{r|rrrrrrc}
0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\frac{41}{100} & \frac{41}{200} & \frac{41}{200} & 0 & 0 & 0 & 0 & \cdots \\
\frac{2935347310677}{11292855782101} & \frac{41}{400} & -\frac{567603406766}{11931857230679} & \frac{41}{200} & 0 & 0 & 0 & \cdots \\
\frac{1426016391358}{7196633302097} & \frac{683785636431}{9252920307686} & 0 & -\frac{110385047103}{1367015193373} & \frac{41}{200} & 0 & 0 & \cdots \\
\frac{92}{100} & \frac{3016520224154}{10081342136671} & 0 & \frac{30586259806659}{12414158314087} & -\frac{22760509404356}{11113319521817} & \frac{41}{200} & 0 & \cdots \\
\frac{24}{100} & \frac{218866479029}{1489978393911} & 0 & \frac{638256894668}{5436446318841} & -\frac{1179710474555}{5321154724896} & -\frac{60928119172}{8023461067671} & \frac{41}{200} & \cdots \\
\frac{3}{5} & \frac{1020004230633}{5715676835656} & 0 & \frac{25762820946817}{25263940353407} & -\frac{2161375909145}{9755907335909} & -\frac{211217309593}{5846859502534} & -\frac{4269925059573}{7827059040749} & \cdots \\
1 & -\frac{872700587467}{9133579230613} & 0 & 0 & \frac{22348218063261}{9555858737531} & -\frac{1143369518992}{8141816002931} & -\frac{39379526789629}{19018526304540} & \frac{327273\ldots}{429000\ldots} \\
\hline
5 & -\frac{872700587467}{9133579230613} & 0 & 0 & \frac{22348218063261}{9555858737531} & -\frac{1143369518992}{8141816002931} & -\frac{39379526789629}{19018526304540} & \frac{327273\ldots}{429000\ldots} \\
4 & -\frac{975461918565}{9796059967033} & 0 & 0 & \frac{78070527104295}{32432590147079} & -\frac{548382580838}{3424219808633} & -\frac{33438840321285}{15594753105479} & \frac{362980\ldots}{465618\ldots}
\end{array}
\]

Fig. 15.25: Linear stability region for the implicit ARK-8-4-5 method. The method’s region is outlined in blue; the embedding’s region is in red.

15.3 Additive Butcher tables

In the category of additive Runge-Kutta methods for split implicit and explicit calculations, ARKode includes methods that have orders 3 through 5, with embeddings that are of orders 2 through 4. These Butcher table pairs are as follows:

• 3rd-order pair: ARK-4-2-3 (explicit) with ARK-4-2-3 (implicit), corresponding to Butcher tables ARK324L2SA_ERK_4_2_3 and ARK324L2SA_DIRK_4_2_3 for ARKStepSetARKTableNum().

• 4th-order pair: ARK-6-3-4 (explicit) with ARK-6-3-4 (implicit), corresponding to Butcher tables ARK436L2SA_ERK_6_3_4 and ARK436L2SA_DIRK_6_3_4 for ARKStepSetARKTableNum().
• 5th-order pair: ARK-8-4-5 (explicit) with ARK-8-4-5 (implicit), corresponding to Butcher tables ARK548L2SA_ERK_8_4_5 and ARK548L2SA_DIRK_8_4_5 for ARKStepSetARKTableNum().

BIBLIOGRAPHY

[A1965] D.G. Anderson, Iterative Procedures for Nonlinear Integral Equations, J. Assoc. Comput. Machinery, 12:547-560, 1965.
[B1985] Bank et al., Transient Simulation of Silicon Devices and Circuits, IEEE Trans. CAD, 4:436-451, 1985.
[B1983] S.R. Billington, Type-Insensitive Codes for the Solution of Stiff and Nonstiff Systems of Ordinary Differential Equations, Master's Thesis, University of Manchester, United Kingdom, 1983.
[BS1989] P. Bogacki and L.F. Shampine. A 3(2) pair of Runge–Kutta formulas, Appl. Math. Lett., 2:321-325, 1989.
[B1987] P.N. Brown. A local convergence theory for combined inexact-Newton/finite difference projection methods. SIAM J. Numer. Anal., 24:407-434, 1987.
[BH1989] P.N. Brown and A.C. Hindmarsh. Reduced Storage Matrix Methods in Stiff ODE Systems. J. Appl. Math. & Comp., 31:49-91, 1989.
[BS1990] P.N. Brown and Y. Saad. Hybrid Krylov Methods for Nonlinear Systems of Equations. SIAM J. Sci. Stat. Comput., 11:450-481, 1990.
[B2008] J.C. Butcher, Numerical Methods for Ordinary Differential Equations. Wiley, 2nd edition, Chichester, England, 2008.
[B1992] G.D. Byrne. Pragmatic Experiments with Krylov Methods in the Stiff ODE Setting. In J.R. Cash and I. Gladwell, editors, Computational Ordinary Differential Equations, pp. 323-356, Oxford University Press, 1992.
[C1979] J.R. Cash. Diagonally Implicit Runge-Kutta Formulae with Error Estimates. IMA J Appl Math, 24:293-301, 1979.
[CK1990] J.R. Cash and A.H. Karp. A variable order Runge-Kutta method for initial value problems with rapidly varying right-hand sides, ACM Trans. Math. Soft., 16:201-222, 1990.
[CGM2014] J. Cheng, M. Grossman and T. McKercher. Professional CUDA C Programming. John Wiley & Sons, 2014.
[DP1980] J.R. Dormand and P.J. Prince.
A family of embedded Runge-Kutta formulae, J. Comput. Appl. Math. 6:19-26, 1980.
[DP2010] T. Davis and E. Palamadai Natarajan. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Soft., 37, 2010.
[DES1982] R.S. Dembo, S.C. Eisenstat and T. Steihaug. Inexact Newton Methods. SIAM J. Numer. Anal., 19:400-408, 1982.
[DGL1999] J.W. Demmel, J.R. Gilbert and X.S. Li. An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination. SIAM J. Matrix Analysis and Applications, 20:915-952, 1999.
[DS1996] J.E. Dennis and R.B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM, Philadelphia, 1996.
[F2015] R. Falgout and U.M. Yang. Hypre user’s manual. LLNL Technical Report, 2015.
[FS2009] H. Fang and Y. Saad. Two classes of secant methods for nonlinear acceleration. Numer. Linear Algebra Appl., 16:197-221, 2009.
[F1969] E. Fehlberg. Low-order classical Runge-Kutta formulas with step size control and their application to some heat transfer problems. NASA Technical Report 315, 1969.
[F1993] R.W. Freund. A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear Systems. SIAM J. Sci. Comp., 14:470-482, 1993.
[G1991] K. Gustafsson. Control theoretic techniques for stepsize selection in explicit Runge-Kutta methods. ACM Trans. Math. Soft., 17:533-554, 1991.
[G1994] K. Gustafsson. Control-theoretic techniques for stepsize selection in implicit Runge-Kutta methods. ACM Trans. Math. Soft., 20:496-512, 1994.
[HW1993] E. Hairer, S. Norsett and G. Wanner. Solving Ordinary Differential Equations I. Springer Series in Computational Mathematics, vol. 8, 1993.
[HW1996] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II. Springer Series in Computational Mathematics, vol. 14, 1996.
[HS1952] M.R. Hestenes and E. Stiefel. Methods of Conjugate Gradients for Solving Linear Systems. J.
Research of the National Bureau of Standards, 49:409-436, 1952.
[HS1980] K.L. Hiebert and L.F. Shampine. Implicitly Defined Output Points for Solutions of ODEs. Technical Report SAND80-0180, Sandia National Laboratories, February 1980.
[HS2017] A.C. Hindmarsh and R. Serban. User Documentation for CVODE v3.0.0. Technical Report UCRL-SM-208108, LLNL, 2017.
[HSR2017] A.C. Hindmarsh, R. Serban and D.R. Reynolds. Example Programs for CVODE v3.0.0. Technical Report UCRL-SM-208110, LLNL, 2017.
[HT1998] A.C. Hindmarsh and A.G. Taylor. PVODE and KINSOL: Parallel Software for Differential and Nonlinear Systems. Technical Report UCRL-IL-129739, LLNL, February 1998.
[HK2014] R.D. Hornung and J.A. Keasler. The RAJA Portability Layer: Overview and Status. Technical Report LLNL-TR-661403, LLNL, September 2014.
[K1995] C.T. Kelley. Iterative Methods for Solving Linear and Nonlinear Equations. SIAM, Philadelphia, 1995.
[KC2003] C.A. Kennedy and M.H. Carpenter. Additive Runge-Kutta schemes for convection-diffusion-reaction equations. Appl. Numer. Math., 44:139-181, 2003.
[KLU] KLU Sparse Matrix Factorization Library.
[K2004] A. Kværnø. Singly Diagonally Implicit Runge-Kutta Methods with an Explicit First Stage. BIT Numer. Math., 44:489-502, 2004.
[L2005] X.S. Li. An Overview of SuperLU: Algorithms, Implementation, and User Interface. ACM Trans. Math. Soft., 31:302-325, 2005.
[LWWY2012] P.A. Lott, H.F. Walker, C.S. Woodward and U.M. Yang. An Accelerated Picard Method for Nonlinear Systems Related to Variably Saturated Flow, Adv. Wat. Resour., 38:92-101, 2012.
[R2018] D.R. Reynolds. ARKode Example Documentation. Technical Report, Southern Methodist University Center for Scientific Computation, 2018.
[SS1986] Y. Saad and M.H. Schultz. GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comp., 7:856-869, 1986.
[S1993] Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Comput., 14:461-469, 1993.
[SA2002] A. Sayfy and A. Aburub. Embedded Additive Runge-Kutta Methods. Intern. J. Computer Math., 79:945-953, 2002.
[SKAW2009] M. Schlegel, O. Knoth, M. Arnold, and R. Wolke. Multirate Runge–Kutta schemes for advection equations. J. Comput. Appl. Math., 226:345-357, 2009.
[SKAW2012a] M. Schlegel, O. Knoth, M. Arnold, and R. Wolke. Implementation of multirate time integration methods for air pollution modelling. GMD, 5:1395-1405, 2012.
[SKAW2012b] M. Schlegel, O. Knoth, M. Arnold, and R. Wolke. Numerical solution of multiscale problems in atmospheric modeling. Appl. Numer. Math., 62:1531-1542, 2012.
[S1998] G. Soderlind. The automatic control of numerical integration. CWI Quarterly, 11:55-74, 1998.
[S2003] G. Soderlind. Digital filters in adaptive time-stepping. ACM Trans. Math. Soft., 29:1-26, 2003.
[S2006] G. Soderlind. Time-step selection algorithms: Adaptivity, control and signal processing. Appl. Numer. Math., 56:488-502, 2006.
[SuperLUMT] SuperLU_MT Threaded Sparse Matrix Factorization Library.
[V1992] H.A. Van Der Vorst. Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comp., 13:631-644, 1992.
[V1978] J.H. Verner. Explicit Runge-Kutta methods with estimates of the local truncation error. SIAM J. Numer. Anal., 15:772-790, 1978.
[WN2011] H.F. Walker and P. Ni. Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal., 49:1715-1735, 2011.
[KW1998] O. Knoth and R. Wolke. Implicit-explicit Runge-Kutta methods for computing atmospheric reactive flows. Appl. Numer. Math., 28(2):327-341, 1998.
[Z1963] J.A. Zonneveld. Automatic integration of ordinary differential equations. Report R743, Mathematisch Centrum, Postbus 4079, 1009AB Amsterdam, 1963.
Bibliography 385 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), 386 Bibliography INDEX additive Runge-Kutta methods, 15 ARK-4-2-3 ARK method, 358, 381 ARK-4-2-3 ERK method, 357, 364 ARK-4-2-3 ESDIRK method, 358, 376 ARK-6-3-4 ARK method, 358, 382 ARK-6-3-4 ERK method, 357, 366 ARK-6-3-4 ESDIRK method, 358, 379 ARK-8-4-5 ARK method, 358, 382 ARK-8-4-5 ERK method, 357, 369 ARK-8-4-5 ESDIRK method, 358, 381 ARK_BAD_DKY, 359 ARK_BAD_K, 359 ARK_BAD_T, 359 ARK_CONV_FAILURE, 359 ARK_ERR_FAILURE, 359 ARK_FIRST_RHSFUNC_ERR, 359 ARK_ILL_INPUT, 359 ARK_INNERSTEP_OP_ERR, 360 ARK_LFREE_FAIL, 359 ARK_LINIT_FAIL, 359 ARK_LSETUP_FAIL, 359 ARK_LSOLVE_FAIL, 359 ARK_MASSFREE_FAIL, 359 ARK_MASSINIT_FAIL, 359 ARK_MASSMULT_FAIL, 359 ARK_MASSSETUP_FAIL, 359 ARK_MASSSOLVE_FAIL, 359 ARK_MEM_FAIL, 359 ARK_MEM_NULL, 359 ARK_NLS_INIT_FAIL, 359 ARK_NLS_OP_ERR, 360 ARK_NLS_SETUP_FAIL, 359 ARK_NLS_SETUP_RECVR, 359 ARK_NO_MALLOC, 359 ARK_NORMAL, 357 ARK_ONE_STEP, 357 ARK_REPTD_RHSFUNC_ERR, 359 ARK_RHSFUNC_FAIL, 359 ARK_ROOT_RETURN, 359 ARK_RTFUNC_FAIL, 359 ARK_SUCCESS, 358 ARK_TOO_CLOSE, 359 ARK_TOO_MUCH_ACC, 359 ARK_TOO_MUCH_WORK, 359 ARK_TSTOP_RETURN, 359 ARK_UNREC_RHSFUNC_ERR, 359 ARK_UNRECOGNIZED_ERROR, 360 ARK_VECTOROP_ERR, 359 ARK_WARNING, 359 ARKAdaptFn (C type), 99, 185 ARKBandPrecGetNumRhsEvals (C function), 110 ARKBandPrecGetWorkSpace (C function), 110 ARKBandPrecInit (C function), 109 ARKBBDPrecGetNumGfnEvals (C function), 115 ARKBBDPrecGetWorkSpace (C function), 115 ARKBBDPrecInit (C function), 114 ARKBBDPrecReInit (C function), 115 ARKCommFn (C function), 112 ARKErrHandlerFn (C type), 98, 184, 206 ARKEwtFn (C type), 98, 184 ARKExpStabFn (C type), 100, 185 ARKLocalFn (C function), 112 ARKLS_ILL_INPUT, 360 ARKLS_JACFUNC_RECVR, 360 ARKLS_JACFUNC_UNRECVR, 360 ARKLS_LMEM_NULL, 360 ARKLS_MASSFUNC_RECVR, 360 ARKLS_MASSFUNC_UNRECVR, 360 ARKLS_MASSMEM_NULL, 360 ARKLS_MEM_FAIL, 360 ARKLS_MEM_NULL, 360 ARKLS_PMEM_NULL, 360 ARKLS_SUCCESS, 360 ARKLS_SUNLS_FAIL, 360 ARKLS_SUNMAT_FAIL, 360 
ARKLsJacFn (C type), 100 ARKLsJacTimesSetupFn (C type), 103 ARKLsJacTimesVecFn (C type), 102 ARKLsMassFn (C type), 105 ARKLsMassPrecSetupFn (C type), 107 ARKLsMassPrecSolveFn (C type), 106 ARKLsMassTimesSetupFn (C type), 106 ARKLsMassTimesVecFn (C type), 106 ARKLsPrecSetupFn (C type), 104 387 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), ARKLsPrecSolveFn (C type), 103 ARKodeButcherTable (C type), 209 ARKodeButcherTable_Alloc (C function), 210 ARKodeButcherTable_CheckARKOrder (C function), 212 ARKodeButcherTable_CheckOrder (C function), 212 ARKodeButcherTable_Copy (C function), 211 ARKodeButcherTable_Create (C function), 210 ARKodeButcherTable_Free (C function), 211 ARKodeButcherTable_LoadDIRK (C function), 210 ARKodeButcherTable_LoadERK (C function), 210 ARKodeButcherTable_Space (C function), 211 ARKodeButcherTable_Write (C function), 211 ARKRhsFn (C type), 97, 183, 206 ARKRootFn (C type), 100, 186, 207 ARKRwtFn (C type), 98 ARKStepCreate (C function), 45 ARKStepEvolve (C function), 53 ARKStepFree (C function), 45 ARKStepGetActualInitStep (C function), 80 ARKStepGetCurrentButcherTables (C function), 84 ARKStepGetCurrentStep (C function), 81 ARKStepGetCurrentTime (C function), 81 ARKStepGetDky (C function), 77 ARKStepGetErrWeights (C function), 82 ARKStepGetEstLocalErrors (C function), 84 ARKStepGetLastLinFlag (C function), 90 ARKStepGetLastMassFlag (C function), 93 ARKStepGetLastStep (C function), 81 ARKStepGetLinReturnFlagName (C function), 90 ARKStepGetLinWorkSpace (C function), 87 ARKStepGetMassWorkSpace (C function), 91 ARKStepGetNonlinSolvStats (C function), 86 ARKStepGetNumAccSteps (C function), 83 ARKStepGetNumErrTestFails (C function), 83 ARKStepGetNumExpSteps (C function), 82 ARKStepGetNumGEvals (C function), 87 ARKStepGetNumJacEvals (C function), 88 ARKStepGetNumJtimesEvals (C function), 89 ARKStepGetNumJTSetupEvals (C function), 89 ARKStepGetNumLinConvFails (C function), 89 ARKStepGetNumLinIters (C function), 89 ARKStepGetNumLinRhsEvals (C 
function), 90 ARKStepGetNumLinSolvSetups (C function), 85 ARKStepGetNumMassConvFails (C function), 92 ARKStepGetNumMassIters (C function), 92 ARKStepGetNumMassMult (C function), 91 ARKStepGetNumMassPrecEvals (C function), 92 ARKStepGetNumMassPrecSolves (C function), 92 ARKStepGetNumMassSetups (C function), 91 ARKStepGetNumMassSolves (C function), 91 ARKStepGetNumMTSetups (C function), 93 ARKStepGetNumNonlinSolvConvFails (C function), 86 ARKStepGetNumNonlinSolvIters (C function), 85 388 ARKStepGetNumPrecEvals (C function), 88 ARKStepGetNumPrecSolves (C function), 88 ARKStepGetNumRhsEvals (C function), 83 ARKStepGetNumStepAttempts (C function), 83 ARKStepGetNumSteps (C function), 80 ARKStepGetResWeights (C function), 82 ARKStepGetReturnFlagName (C function), 82 ARKStepGetRootInfo (C function), 86 ARKStepGetStepStats (C function), 82 ARKStepGetTimestepperStats (C function), 85 ARKStepGetTolScaleFactor (C function), 81 ARKStepGetWorkSpace (C function), 80 ARKStepReInit (C function), 95 ARKStepResFtolerance (C function), 47 ARKStepResize (C function), 95 ARKStepResStolerance (C function), 47 ARKStepResVtolerance (C function), 47 ARKStepRootInit (C function), 52 ARKStepSetAdaptivityFn (C function), 63 ARKStepSetAdaptivityMethod (C function), 63 ARKStepSetCFLFraction (C function), 64 ARKStepSetDefaults (C function), 55 ARKStepSetDeltaGammaMax (C function), 70 ARKStepSetDenseOrder (C function), 55 ARKStepSetDiagnostics (C function), 56 ARKStepSetEpsLin (C function), 76 ARKStepSetErrFile (C function), 56 ARKStepSetErrHandlerFn (C function), 56 ARKStepSetErrorBias (C function), 64 ARKStepSetExplicit (C function), 61 ARKStepSetFixedStep (C function), 57 ARKStepSetFixedStepBounds (C function), 64 ARKStepSetImEx (C function), 61 ARKStepSetImplicit (C function), 61 ARKStepSetInitStep (C function), 57 ARKStepSetJacFn (C function), 72 ARKStepSetJacTimes (C function), 73 ARKStepSetLinear (C function), 67 ARKStepSetLinearSolver (C function), 50 ARKStepSetMassEpsLin (C function), 76 
ARKStepSetMassFn (C function), 72 ARKStepSetMassLinearSolver (C function), 51 ARKStepSetMassPreconditioner (C function), 75 ARKStepSetMassTimes (C function), 74 ARKStepSetMaxCFailGrowth (C function), 65 ARKStepSetMaxConvFails (C function), 69 ARKStepSetMaxEFailGrowth (C function), 65 ARKStepSetMaxErrTestFails (C function), 59 ARKStepSetMaxFirstGrowth (C function), 65 ARKStepSetMaxGrowth (C function), 66 ARKStepSetMaxHnilWarns (C function), 58 ARKStepSetMaxNonlinIters (C function), 68 ARKStepSetMaxNumSteps (C function), 58 ARKStepSetMaxStep (C function), 58 Index User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), ARKStepSetMaxStepsBetweenJac (C function), 71 ARKStepSetMaxStepsBetweenLSet (C function), 71 ARKStepSetMinStep (C function), 59 ARKStepSetNoInactiveRootWarn (C function), 77 ARKStepSetNonlinConvCoef (C function), 69 ARKStepSetNonlinCRDown (C function), 69 ARKStepSetNonlinear (C function), 67 ARKStepSetNonlinearSolver (C function), 52 ARKStepSetNonlinRDiv (C function), 69 ARKStepSetOptimalParams (C function), 60 ARKStepSetOrder (C function), 60 ARKStepSetPreconditioner (C function), 75 ARKStepSetPredictorMethod (C function), 68 ARKStepSetRootDirection (C function), 76 ARKStepSetSafetyFactor (C function), 66 ARKStepSetSmallNumEFails (C function), 66 ARKStepSetStabilityFn (C function), 66 ARKStepSetStopTime (C function), 59 ARKStepSetTableNum (C function), 62 ARKStepSetTables (C function), 61 ARKStepSetUserData (C function), 59 ARKStepSStolerances (C function), 45 ARKStepSVtolerances (C function), 46 ARKStepWFtolerances (C function), 46 ARKStepWriteButcher (C function), 94 ARKStepWriteParameters (C function), 93 ARKVecResizeFn (C type), 108, 186, 207 ATimesFn (C type), 282 BIG_REAL, 38, 154, 188 Billington-3-3-2 SDIRK method, 358, 374 BLAS_ENABLE (CMake option), 343 BLAS_LIBRARIES (CMake option), 343 Bogacki-Shampine-4-2-3 ERK method, 357, 362 BUILD_ARKODE (CMake option), 343 BUILD_CVODE (CMake option), 343 BUILD_CVODES (CMake option), 343 BUILD_IDA 
(CMake option), 343 BUILD_IDAS (CMake option), 343 BUILD_KINSOL (CMake option), 343 BUILD_SHARED_LIBS (CMake option), 343 BUILD_STATIC_LIBS (CMake option), 343 CMAKE_C_FLAGS_RELEASE (CMake option), 344 CMAKE_CXX_COMPILER (CMake option), 344 CMAKE_CXX_FLAGS (CMake option), 344 CMAKE_CXX_FLAGS_DEBUG (CMake option), 344 CMAKE_CXX_FLAGS_MINSIZEREL (CMake option), 344 CMAKE_CXX_FLAGS_RELEASE (CMake option), 344 CMAKE_Fortran_COMPILER (CMake option), 344 CMAKE_Fortran_FLAGS (CMake option), 344 CMAKE_Fortran_FLAGS_DEBUG (CMake option), 345 CMAKE_Fortran_FLAGS_MINSIZEREL (CMake option), 345 CMAKE_Fortran_FLAGS_RELEASE (CMake option), 345 CMAKE_INSTALL_PREFIX (CMake option), 345 CUDA_ENABLE (CMake option), 345 CXX_ENABLE (CMake option), 345 DEFAULT_ARK_ETABLE_3, 358 DEFAULT_ARK_ETABLE_4, 358 DEFAULT_ARK_ETABLE_5, 358 DEFAULT_ARK_ITABLE_3, 358 DEFAULT_ARK_ITABLE_4, 358 DEFAULT_ARK_ITABLE_5, 358 DEFAULT_DIRK_2, 358 DEFAULT_DIRK_3, 358 DEFAULT_DIRK_4, 358 DEFAULT_DIRK_5, 358 DEFAULT_ERK_2, 357 DEFAULT_ERK_3, 357 DEFAULT_ERK_4, 357 DEFAULT_ERK_5, 357 DEFAULT_ERK_6, 358 DEFAULT_ERK_8, 358 diagonally-implicit Runge-Kutta methods, 16 Dormand-Prince-7-4-5 ERK method, 357, 369 ERKStepCreate (C function), 157 ERKStepEvolve (C function), 160 ERKStepFree (C function), 157 ERKStepGetActualInitStep (C function), 175 Cash-5-2-4 SDIRK method, 358, 376 ERKStepGetCurrentButcherTable (C function), 178 Cash-5-3-4 SDIRK method, 358, 377 ERKStepGetCurrentStep (C function), 176 Cash-Karp-6-4-5 ERK method, 357, 367 ERKStepGetCurrentTime (C function), 176 ccmake, 340 ERKStepGetDky (C function), 173 cmake, 341 ERKStepGetErrWeights (C function), 177 cmake-gui, 340 ERKStepGetEstLocalErrors (C function), 179 CMAKE_BUILD_TYPE (CMake option), 344 ERKStepGetLastStep (C function), 176 CMAKE_C_COMPILER (CMake option), 344 ERKStepGetNumAccSteps (C function), 177 CMAKE_C_FLAGS (CMake option), 344 ERKStepGetNumErrTestFails (C function), 178 CMAKE_C_FLAGS_DEBUG (CMake option), 344 ERKStepGetNumExpSteps (C 
function), 177 CMAKE_C_FLAGS_MINSIZEREL (CMake option), ERKStepGetNumGEvals (C function), 180 344 Index 389 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), ERKStepGetNumRhsEvals (C function), 178 ERKStepGetNumStepAttempts (C function), 178 ERKStepGetNumSteps (C function), 175 ERKStepGetReturnFlagName (C function), 177 ERKStepGetRootInfo (C function), 180 ERKStepGetStepStats (C function), 177 ERKStepGetTimestepperStats (C function), 179 ERKStepGetTolScaleFactor (C function), 176 ERKStepGetWorkSpace (C function), 175 ERKStepReInit (C function), 181 ERKStepResize (C function), 182 ERKStepRootInit (C function), 160 ERKStepSetAdaptivityFn (C function), 168 ERKStepSetAdaptivityMethod (C function), 169 ERKStepSetCFLFraction (C function), 169 ERKStepSetDefaults (C function), 162 ERKStepSetDenseOrder (C function), 162 ERKStepSetDiagnostics (C function), 163 ERKStepSetErrFile (C function), 163 ERKStepSetErrHandlerFn (C function), 164 ERKStepSetErrorBias (C function), 169 ERKStepSetFixedStep (C function), 164 ERKStepSetFixedStepBounds (C function), 170 ERKStepSetInitStep (C function), 164 ERKStepSetMaxEFailGrowth (C function), 170 ERKStepSetMaxErrTestFails (C function), 166 ERKStepSetMaxFirstGrowth (C function), 170 ERKStepSetMaxGrowth (C function), 171 ERKStepSetMaxHnilWarns (C function), 165 ERKStepSetMaxNumSteps (C function), 165 ERKStepSetMaxStep (C function), 165 ERKStepSetMinStep (C function), 166 ERKStepSetNoInactiveRootWarn (C function), 172 ERKStepSetOrder (C function), 167 ERKStepSetRootDirection (C function), 172 ERKStepSetSafetyFactor (C function), 171 ERKStepSetSmallNumEFails (C function), 171 ERKStepSetStabilityFn (C function), 171 ERKStepSetStopTime (C function), 166 ERKStepSetTable (C function), 167 ERKStepSetTableNum (C function), 168 ERKStepSetUserData (C function), 166 ERKStepSStolerances (C function), 158 ERKStepSVtolerances (C function), 158 ERKStepWFtolerances (C function), 158 ERKStepWriteButcher (C function), 181 ERKStepWriteParameters (C 
function), 180 error weight vector, 17 EXAMPLES_ENABLE_C (CMake option), 345 EXAMPLES_ENABLE_CUDA (CMake option), 345 EXAMPLES_ENABLE_CXX (CMake option), 345 EXAMPLES_ENABLE_F77 (CMake option), 345 EXAMPLES_ENABLE_F90 (CMake option), 345 EXAMPLES_ENABLE_RAJA (CMake option), 345 390 EXAMPLES_INSTALL (CMake option), 345 EXAMPLES_INSTALL_PATH (CMake option), 346 explicit Runge-Kutta methods, 15, 16 F90_ENABLE (CMake option), 346 FARKADAPT() (fortran subroutine), 129 FARKADAPTSET() (fortran subroutine), 129 FARKBANDSETJAC() (fortran subroutine), 132 FARKBANDSETMASS() (fortran subroutine), 138 FARKBBDINIT() (fortran subroutine), 150 FARKBBDOPT() (fortran subroutine), 151 FARKBBDREINIT() (fortran subroutine), 151 FARKBJAC() (fortran subroutine), 131 FARKBMASS() (fortran subroutine), 138 FARKBPINIT() (fortran subroutine), 148 FARKBPOPT() (fortran subroutine), 149 FARKCOMMFN() (fortran subroutine), 152 FARKDENSESETJAC() (fortran subroutine), 131 FARKDENSESETMASS() (fortran subroutine), 137 FARKDJAC() (fortran subroutine), 131 FARKDKY() (fortran subroutine), 142 FARKDMASS() (fortran subroutine), 137 FARKEFUN() (fortran subroutine), 122 FARKEWT() (fortran subroutine), 125 FARKEWTSET() (fortran subroutine), 125 FARKEXPSTAB() (fortran subroutine), 130 FARKEXPSTABSET() (fortran subroutine), 130 FARKFREE() (fortran subroutine), 144 FARKGETERRWEIGHTS() (fortran subroutine), 146 FARKGETESTLOCALERR() (fortran subroutine), 146 FARKGLOCFN() (fortran subroutine), 152 FARKIFUN() (fortran subroutine), 121 FARKJTIMES() (fortran subroutine), 134 FARKJTSETUP() (fortran subroutine), 135 FARKLSINIT() (fortran subroutine), 130 FARKLSMASSINIT() (fortran subroutine), 137 FARKLSSETEPSLIN() (fortran subroutine), 133 FARKLSSETJAC() (fortran subroutine), 134 FARKLSSETMASS() (fortran subroutine), 140 FARKLSSETMASSEPSLIN() (fortran subroutine), 139 FARKLSSETMASSPREC() (fortran subroutine), 140 FARKLSSETPREC() (fortran subroutine), 134 FARKMALLOC() (fortran subroutine), 124 FARKMASSPSET() (fortran 
subroutine), 140 FARKMASSPSOL() (fortran subroutine), 141 FARKMTIMES() (fortran subroutine), 139 FARKMTSETUP() (fortran subroutine), 140 FARKNLSINIT() (fortran subroutine), 130 FARKODE() (fortran subroutine), 142 FARKPSET() (fortran subroutine), 136 FARKPSOL() (fortran subroutine), 135 FARKREINIT() (fortran subroutine), 143 FARKRESIZE() (fortran subroutine), 143 FARKROOTFN() (fortran subroutine), 147 Index User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), FARKROOTFREE() (fortran subroutine), 148 FARKROOTINFO() (fortran subroutine), 147 FARKROOTINIT() (fortran subroutine), 147 FARKSETADAPTIVITYMETHOD() (fortran subroutine), 128 FARKSETARKTABLES() (fortran subroutine), 128 FARKSETDEFAULTS() (fortran subroutine), 127 FARKSETERKTABLE() (fortran subroutine), 127 FARKSETIIN() (fortran subroutine), 125 FARKSETIRKTABLE() (fortran subroutine), 127 FARKSETRESTOLERANCE() (fortran subroutine), 128 FARKSETRIN() (fortran subroutine), 126 FARKSPARSESETJAC() (fortran subroutine), 133 FARKSPARSESETMASS() (fortran subroutine), 139 FARKSPJAC() (fortran subroutine), 132 FARKSPMASS() (fortran subroutine), 138 FCMIX_ENABLE (CMake option), 346 Fehlberg-13-7-8 ERK method, 357, 371 Fehlberg-6-4-5 ERK method, 357, 368 fixed point iteration, 23 FSUNBandLinSolInit() (fortran subroutine), 291 FSUNBandMassMatInit() (fortran subroutine), 267 FSUNBandMatInit() (fortran subroutine), 267 FSUNDenseLinSolInit() (fortran subroutine), 289 FSUNDenseMassMatInit() (fortran subroutine), 262 FSUNDenseMatInit() (fortran subroutine), 261 FSUNFixedPointInit() (fortran subroutine), 338 FSUNKLUInit() (fortran subroutine), 297 FSUNKLUReInit() (fortran subroutine), 298 FSUNKLUSetOrdering() (fortran subroutine), 298 FSUNLapackBandInit() (fortran subroutine), 294 FSUNLapackDenseInit() (fortran subroutine), 292 FSUNMassBandLinSolInit() (fortran subroutine), 291 FSUNMassDenseLinSolInit() (fortran subroutine), 289 FSUNMassKLUInit() (fortran subroutine), 297 FSUNMassKLUReInit() (fortran subroutine), 298 
FSUNMassKLUSetOrdering() (fortran subroutine), 298 FSUNMassLapackBandInit() (fortran subroutine), 295 FSUNMassLapackDenseInit() (fortran subroutine), 293 FSUNMassPCGInit() (fortran subroutine), 321 FSUNMassPCGSetMaxl() (fortran subroutine), 322 FSUNMassPCGSetPrecType() (fortran subroutine), 322 FSUNMassSPBCGSInit() (fortran subroutine), 313 FSUNMassSPBCGSSetMaxl() (fortran subroutine), 314 FSUNMassSPBCGSSetPrecType() (fortran subroutine), 314 FSUNMassSPFGMRInit() (fortran subroutine), 309 FSUNMassSPFGMRSetGSType() (fortran subroutine), 309 FSUNMassSPFGMRSetMaxRS() (fortran subroutine), 310 FSUNMassSPFGMRSetPrecType() (fortran subroutine), 309 FSUNMassSPGMRInit() (fortran subroutine), 304 Index FSUNMassSPGMRSetGSType() (fortran subroutine), 305 FSUNMassSPGMRSetMaxRS() (fortran subroutine), 305 FSUNMassSPGMRSetPrecType() (fortran subroutine), 305 FSUNMassSPTFQMRInit() (fortran subroutine), 317 FSUNMassSPTFQMRSetMaxl() (fortran subroutine), 318 FSUNMassSPTFQMRSetPrecType() (fortran subroutine), 317 FSUNMassSuperLUMTInit() (fortran subroutine), 301 FSUNMassSuperLUMTSetOrdering() (fortran subroutine), 301 FSUNNewtonInit() (fortran subroutine), 334 FSUNPCGInit() (fortran subroutine), 321 FSUNPCGSetMaxl() (fortran subroutine), 322 FSUNPCGSetPrecType() (fortran subroutine), 322 FSUNSparseMassMatInit() (fortran subroutine), 273 FSUNSparseMatInit() (fortran subroutine), 273 FSUNSPBCGSInit() (fortran subroutine), 313 FSUNSPBCGSSetMaxl() (fortran subroutine), 314 FSUNSPBCGSSetPrecType() (fortran subroutine), 314 FSUNSPFGMRInit() (fortran subroutine), 309 FSUNSPFGMRSetGSType() (fortran subroutine), 309 FSUNSPFGMRSetMaxRS() (fortran subroutine), 310 FSUNSPFGMRSetPrecType() (fortran subroutine), 309 FSUNSPGMRInit() (fortran subroutine), 304 FSUNSPGMRSetGSType() (fortran subroutine), 305 FSUNSPGMRSetMaxRS() (fortran subroutine), 305 FSUNSPGMRSetPrecType() (fortran subroutine), 305 FSUNSPTFQMRInit() (fortran subroutine), 317 FSUNSPTFQMRSetMaxl() (fortran subroutine), 317 
FSUNSPTFQMRSetPrecType() (fortran subroutine), 317 FSUNSuperLUMTInit() (fortran subroutine), 301 FSUNSuperLUMTSetOrdering() (fortran subroutine), 301 Heun-Euler-2-1-2 ERK method, 357, 362 HYPRE_ENABLE (CMake option), 346 HYPRE_INCLUDE_DIR (CMake option), 346 HYPRE_LIBRARY (CMake option), 346 inexact Newton iteration, 24 KLU_INCLUDE_DIR (CMake option), 346 KLU_LIBRARY_DIR (CMake option), 346 Knoth-Wolke-3-3 ERK method, 357, 364 Kvaerno-4-2-3 ESDIRK method, 358, 375 Kvaerno-5-3-4 ESDIRK method, 358, 379 Kvaerno-7-4-5 ESDIRK method, 358, 380 LAPACK_ENABLE (CMake option), 346 LAPACK_LIBRARIES (CMake option), 346 linear solver setup, 25 391 User Documentation for ARKode v3.0.2 (SUNDIALS v4.0.2), modified Newton iteration, 24 MPI_C_COMPILER (CMake option), 347 MPI_CXX_COMPILER (CMake option), 347 MPI_ENABLE (CMake option), 347 MPI_Fortran_COMPILER (CMake option), 347 MPIEXEC_EXECUTABLE (CMake option), 347 MRIStepCreate (C function), 191 MRIStepEvolve (C function), 192 MRIStepFree (C function), 191 MRIStepGetCurrentButcherTables (C function), 202 MRIStepGetCurrentTime (C function), 201 MRIStepGetDky (C function), 199 MRIStepGetLastInnerStepFlag (C function), 202 MRIStepGetLastStep (C function), 201 MRIStepGetNumGEvals (C function), 204 MRIStepGetNumRhsEvals (C function), 202 MRIStepGetNumSteps (C function), 201 MRIStepGetReturnFlagName (C function), 202 MRIStepGetRootInfo (C function), 204 MRIStepGetWorkSpace (C function), 201 MRIStepReInit (C function), 205 MRIStepResize (C function), 205 MRIStepRootInit (C function), 191 MRIStepSetDefaults (C function), 194 MRIStepSetDenseOrder (C function), 194 MRIStepSetDiagnostics (C function), 194 MRIStepSetErrFile (C function), 195 MRIStepSetErrHandlerFn (C function), 195 MRIStepSetFixedStep (C function), 195 MRIStepSetMaxHnilWarns (C function), 196 MRIStepSetMaxNumSteps (C function), 196 MRIStepSetMRITableNum (C function), 198 MRIStepSetMRITables (C function), 197 MRIStepSetNoInactiveRootWarn (C function), 198 
MRIStepSetRootDirection (C function), 198
MRIStepSetStopTime (C function), 196
MRIStepSetUserData (C function), 197
MRIStepWriteButcher (C function), 203
MRIStepWriteParameters (C function), 203
N_VAbs (C function), 217
N_VAddConst (C function), 218
N_VClone (C function), 216
N_VCloneEmpty (C function), 216
N_VCloneVectorArray_Cuda (C function), 243
N_VCloneVectorArray_OpenMP (C function), 231
N_VCloneVectorArray_Parallel (C function), 227
N_VCloneVectorArray_ParHyp (C function), 236
N_VCloneVectorArray_Petsc (C function), 239
N_VCloneVectorArray_Pthreads (C function), 234
N_VCloneVectorArray_Raja (C function), 246
N_VCloneVectorArray_Serial (C function), 224
N_VCloneVectorArrayEmpty_Cuda (C function), 243
N_VCloneVectorArrayEmpty_OpenMP (C function), 231
N_VCloneVectorArrayEmpty_Parallel (C function), 227
N_VCloneVectorArrayEmpty_ParHyp (C function), 236
N_VCloneVectorArrayEmpty_Petsc (C function), 239
N_VCloneVectorArrayEmpty_Pthreads (C function), 234
N_VCloneVectorArrayEmpty_Raja (C function), 246
N_VCloneVectorArrayEmpty_Serial (C function), 224
N_VCompare (C function), 219
N_VConst (C function), 217
N_VConstrMask (C function), 219
N_VConstVectorArray (C function), 221
N_VCopyFromDevice_Cuda (C function), 243
N_VCopyFromDevice_Raja (C function), 246
N_VCopyToDevice_Cuda (C function), 243
N_VCopyToDevice_Raja (C function), 246
N_VDestroy (C function), 216
N_VDestroyVectorArray_Cuda (C function), 243
N_VDestroyVectorArray_OpenMP (C function), 231
N_VDestroyVectorArray_Parallel (C function), 227
N_VDestroyVectorArray_ParHyp (C function), 237
N_VDestroyVectorArray_Petsc (C function), 239
N_VDestroyVectorArray_Pthreads (C function), 234
N_VDestroyVectorArray_Raja (C function), 246
N_VDestroyVectorArray_Serial (C function), 224
N_VDiv (C function), 217
N_VDotProd (C function), 218
N_VDotProdMulti (C function), 221
N_VEnableConstVectorArray_Cuda (C function), 243
N_VEnableConstVectorArray_OpenMP (C function), 232
N_VEnableConstVectorArray_OpenMPDEV (C function), 250
N_VEnableConstVectorArray_Parallel (C function), 228
N_VEnableConstVectorArray_ParHyp (C function), 237
N_VEnableConstVectorArray_Petsc (C function), 239
N_VEnableConstVectorArray_Pthreads (C function), 235
N_VEnableConstVectorArray_Raja (C function), 247
N_VEnableConstVectorArray_Serial (C function), 225
N_VEnableDotProdMulti_Cuda (C function), 243
N_VEnableDotProdMulti_OpenMP (C function), 231
N_VEnableDotProdMulti_OpenMPDEV (C function), 249
N_VEnableDotProdMulti_Parallel (C function), 228
N_VEnableDotProdMulti_ParHyp (C function), 237
N_VEnableDotProdMulti_Petsc (C function), 239
N_VEnableDotProdMulti_Pthreads (C function), 235
N_VEnableDotProdMulti_Serial (C function), 225
N_VEnableFusedOps_Cuda (C function), 243
N_VEnableFusedOps_OpenMP (C function), 231
N_VEnableFusedOps_OpenMPDEV (C function), 249
N_VEnableFusedOps_Parallel (C function), 228
N_VEnableFusedOps_ParHyp (C function), 237
N_VEnableFusedOps_Petsc (C function), 239
N_VEnableFusedOps_Pthreads (C function), 234
N_VEnableFusedOps_Raja (C function), 246
N_VEnableFusedOps_Serial (C function), 224
N_VEnableLinearCombination_Cuda (C function), 243
N_VEnableLinearCombination_OpenMP (C function), 231
N_VEnableLinearCombination_OpenMPDEV (C function), 249
N_VEnableLinearCombination_Parallel (C function), 228
N_VEnableLinearCombination_ParHyp (C function), 237
N_VEnableLinearCombination_Petsc (C function), 239
N_VEnableLinearCombination_Pthreads (C function), 235
N_VEnableLinearCombination_Raja (C function), 246
N_VEnableLinearCombination_Serial (C function), 224
N_VEnableLinearCombinationVectorArray_Cuda (C function), 244
N_VEnableLinearCombinationVectorArray_OpenMP (C function), 232
N_VEnableLinearCombinationVectorArray_OpenMPDEV (C function), 250
N_VEnableLinearCombinationVectorArray_Parallel (C function), 228
N_VEnableLinearCombinationVectorArray_ParHyp (C function), 238
N_VEnableLinearCombinationVectorArray_Petsc (C function), 240
N_VEnableLinearCombinationVectorArray_Pthreads (C function), 235
N_VEnableLinearCombinationVectorArray_Raja (C function), 247
N_VEnableLinearCombinationVectorArray_Serial (C function), 225
N_VEnableLinearSumVectorArray_Cuda (C function), 243
N_VEnableLinearSumVectorArray_OpenMP (C function), 231
N_VEnableLinearSumVectorArray_OpenMPDEV (C function), 249
N_VEnableLinearSumVectorArray_Parallel (C function), 228
N_VEnableLinearSumVectorArray_ParHyp (C function), 237
N_VEnableLinearSumVectorArray_Petsc (C function), 239
N_VEnableLinearSumVectorArray_Pthreads (C function), 235
N_VEnableLinearSumVectorArray_Raja (C function), 246
N_VEnableLinearSumVectorArray_Serial (C function), 225
N_VEnableScaleAddMulti_Cuda (C function), 243
N_VEnableScaleAddMulti_OpenMP (C function), 231
N_VEnableScaleAddMulti_OpenMPDEV (C function), 249
N_VEnableScaleAddMulti_Parallel (C function), 228
N_VEnableScaleAddMulti_ParHyp (C function), 237
N_VEnableScaleAddMulti_Petsc (C function), 239
N_VEnableScaleAddMulti_Pthreads (C function), 235
N_VEnableScaleAddMulti_Raja (C function), 246
N_VEnableScaleAddMulti_Serial (C function), 224
N_VEnableScaleAddMultiVectorArray_Cuda (C function), 244
N_VEnableScaleAddMultiVectorArray_OpenMP (C function), 232
N_VEnableScaleAddMultiVectorArray_OpenMPDEV (C function), 250
N_VEnableScaleAddMultiVectorArray_Parallel (C function), 228
N_VEnableScaleAddMultiVectorArray_ParHyp (C function), 237
N_VEnableScaleAddMultiVectorArray_Petsc (C function), 240
N_VEnableScaleAddMultiVectorArray_Pthreads (C function), 235
N_VEnableScaleAddMultiVectorArray_Raja (C function), 247
N_VEnableScaleAddMultiVectorArray_Serial (C function), 225
N_VEnableScaleVectorArray_Cuda (C function), 243
N_VEnableScaleVectorArray_OpenMP (C function), 231
N_VEnableScaleVectorArray_OpenMPDEV (C function), 250
N_VEnableScaleVectorArray_Parallel (C function), 228
N_VEnableScaleVectorArray_ParHyp (C function), 237
N_VEnableScaleVectorArray_Petsc (C function), 239
N_VEnableScaleVectorArray_Pthreads (C function), 235
N_VEnableScaleVectorArray_Raja (C function), 246
N_VEnableScaleVectorArray_Serial (C function), 225
N_VEnableWrmsNormMaskVectorArray_Cuda (C function), 244
N_VEnableWrmsNormMaskVectorArray_OpenMP (C function), 232
N_VEnableWrmsNormMaskVectorArray_OpenMPDEV (C function), 250
N_VEnableWrmsNormMaskVectorArray_Parallel (C function), 228
N_VEnableWrmsNormMaskVectorArray_ParHyp (C function), 237
N_VEnableWrmsNormMaskVectorArray_Petsc (C function), 240
N_VEnableWrmsNormMaskVectorArray_Pthreads (C function), 235
N_VEnableWrmsNormMaskVectorArray_Serial (C function), 225
N_VEnableWrmsNormVectorArray_Cuda (C function), 244
N_VEnableWrmsNormVectorArray_OpenMP (C function), 232
N_VEnableWrmsNormVectorArray_OpenMPDEV (C function), 250
N_VEnableWrmsNormVectorArray_Parallel (C function), 228
N_VEnableWrmsNormVectorArray_ParHyp (C function), 237
N_VEnableWrmsNormVectorArray_Petsc (C function), 240
N_VEnableWrmsNormVectorArray_Pthreads (C function), 235
N_VEnableWrmsNormVectorArray_Serial (C function), 225
N_VGetArrayPointer (C function), 216
N_VGetDeviceArrayPointer_Cuda (C function), 241
N_VGetDeviceArrayPointer_Raja (C function), 245
N_VGetHostArrayPointer_Cuda (C function), 241
N_VGetHostArrayPointer_Raja (C function), 245
N_VGetLength_Cuda (C function), 241
N_VGetLength_OpenMP (C function), 231
N_VGetLength_Parallel (C function), 227
N_VGetLength_Pthreads (C function), 234
N_VGetLength_Raja (C function), 245
N_VGetLength_Serial (C function), 224
N_VGetLocalLength_Cuda (C function), 241
N_VGetLocalLength_Parallel (C function), 227
N_VGetLocalLength_Raja (C function), 245
N_VGetMPIComm_Cuda (C function), 241
N_VGetMPIComm_Raja (C function), 245
N_VGetVector_ParHyp (C function), 236
N_VGetVector_Petsc (C function), 239
N_VGetVectorID (C function), 215
N_VInv (C function), 217
N_VInvTest (C function), 219
N_VIsManagedMemory_Cuda (C function), 241
N_VIsManagedMemory_Raja (C function), 245
N_VL1Norm (C function), 219
N_VLinearCombination (C function), 220
N_VLinearCombinationVectorArray (C function), 222
N_VLinearSum (C function), 216
N_VLinearSumVectorArray (C function), 221
N_VMake_Cuda (C function), 242
N_VMake_OpenMP (C function), 231
N_VMake_Parallel (C function), 227
N_VMake_ParHyp (C function), 236
N_VMake_Petsc (C function), 239
N_VMake_Pthreads (C function), 234
N_VMake_Raja (C function), 246
N_VMake_Serial (C function), 224
N_VMakeManaged_Cuda (C function), 242
N_VMaxNorm (C function), 218
N_VMin (C function), 219
N_VMinQuotient (C function), 220
N_VNew_Cuda (C function), 242
N_VNew_OpenMP (C function), 230
N_VNew_Parallel (C function), 227
N_VNew_Pthreads (C function), 234
N_VNew_Raja (C function), 245
N_VNew_Serial (C function), 224
N_VNewEmpty_Cuda (C function), 242
N_VNewEmpty_OpenMP (C function), 231
N_VNewEmpty_Parallel (C function), 227
N_VNewEmpty_ParHyp (C function), 236
N_VNewEmpty_Petsc (C function), 238
N_VNewEmpty_Pthreads (C function), 234
N_VNewEmpty_Raja (C function), 246
N_VNewEmpty_Serial (C function), 224
N_VNewManaged_Cuda (C function), 242
N_VPrint_Cuda (C function), 243
N_VPrint_OpenMP (C function), 231
N_VPrint_Parallel (C function), 227
N_VPrint_ParHyp (C function), 237
N_VPrint_Petsc (C function), 239
N_VPrint_Pthreads (C function), 234
N_VPrint_Raja (C function), 246
N_VPrint_Serial (C function), 224
N_VPrintFile_Cuda (C function), 243
N_VPrintFile_OpenMP (C function), 231
N_VPrintFile_Parallel (C function), 227
N_VPrintFile_ParHyp (C function), 237
N_VPrintFile_Petsc (C function), 239
N_VPrintFile_Pthreads (C function), 234
N_VPrintFile_Raja (C function), 246
N_VPrintFile_Serial (C function), 224
N_VProd (C function), 217
N_VScale (C function), 217
N_VScaleAddMulti (C function), 220
N_VScaleAddMultiVectorArray (C function), 222
N_VScaleVectorArray (C function), 221
N_VSetArrayPointer (C function), 216
N_VSpace (C function), 216
N_VWl2Norm (C function), 219
N_VWrmsNorm (C function), 218
N_VWrmsNormMask (C function), 218
N_VWrmsNormMaskVectorArray (C function), 222
N_VWrmsNormVectorArray (C function), 222
Newton linear system, 23
Newton update, 23
Newton’s method, 23
NV_COMM_P (C macro), 227
NV_CONTENT_OMP (C macro), 229
NV_CONTENT_OMPDEV (C macro), 247
NV_CONTENT_P (C macro), 226
NV_CONTENT_PT (C macro), 233
NV_CONTENT_S (C macro), 223
NV_DATA_DEV_OMPDEV (C macro), 248
NV_DATA_HOST_OMPDEV (C macro), 248
NV_DATA_OMP (C macro), 230
NV_DATA_P (C macro), 226
NV_DATA_PT (C macro), 233
NV_DATA_S (C macro), 223
NV_GLOBLENGTH_P (C macro), 226
NV_Ith_OMP (C macro), 230
NV_Ith_P (C macro), 227
NV_Ith_PT (C macro), 234
NV_Ith_S (C macro), 223
NV_LENGTH_OMP (C macro), 230
NV_LENGTH_OMPDEV (C macro), 248
NV_LENGTH_PT (C macro), 233
NV_LENGTH_S (C macro), 223
NV_LOCLENGTH_P (C macro), 226
NV_NUM_THREADS_OMP (C macro), 230
NV_NUM_THREADS_PT (C macro), 233
NV_OWN_DATA_OMP (C macro), 230
NV_OWN_DATA_OMPDEV (C macro), 248
NV_OWN_DATA_P (C macro), 226
NV_OWN_DATA_PT (C macro), 233
NV_OWN_DATA_S (C macro), 223
OPENMP_ENABLE (CMake option), 347
PETSC_ENABLE (CMake option), 347
PETSC_INCLUDE_DIR (CMake option), 347
PETSC_LIBRARY_DIR (CMake option), 347
PSetupFn (C type), 282
PSolveFn (C type), 282
PTHREAD_ENABLE (CMake option), 347
RAJA_ENABLE (CMake option), 347
RCONST, 38, 154, 188
realtype, 38, 154, 188
residual weight vector, 17
Sayfy-Aburub-6-3-4 ERK method, 357, 367
SDIRK-2-1-2 method, 358, 373
SDIRK-5-3-4 method, 358, 377
SM_COLS_B (C macro), 265
SM_COLS_D (C macro), 260
SM_COLUMN_B (C macro), 265
SM_COLUMN_D (C macro), 260
SM_COLUMN_ELEMENT_B (C macro), 265
SM_COLUMNS_B (C macro), 263
SM_COLUMNS_D (C macro), 259
SM_COLUMNS_S (C macro), 271
SM_CONTENT_B (C macro), 263
SM_CONTENT_D (C macro), 259
SM_CONTENT_S (C macro), 269
SM_DATA_B (C macro), 265
SM_DATA_D (C macro), 260
SM_DATA_S (C macro), 271
SM_ELEMENT_B (C macro), 265
SM_ELEMENT_D (C macro), 260
SM_INDEXPTRS_S (C macro), 272
SM_INDEXVALS_S (C macro), 271
SM_LBAND_B (C macro), 263
SM_LDATA_B (C macro), 265
SM_LDATA_D (C macro), 259
SM_LDIM_B (C macro), 263
SM_NNZ_S (C macro), 271
SM_NP_S (C macro), 271
SM_ROWS_B (C macro), 263
SM_ROWS_D (C macro), 259
SM_ROWS_S (C macro), 269
SM_SPARSETYPE_S (C macro), 271
SM_SUBAND_B (C macro), 263
SM_UBAND_B (C macro), 263
SMALL_REAL, 38, 154, 188
SUNBandLinearSolver (C function), 290
SUNBandMatrix (C function), 266
SUNBandMatrix_Cols (C function), 266
SUNBandMatrix_Column (C function), 267
SUNBandMatrix_Columns (C function), 266
SUNBandMatrix_Data (C function), 266
SUNBandMatrix_LDim (C function), 266
SUNBandMatrix_LowerBandwidth (C function), 266
SUNBandMatrix_Print (C function), 266
SUNBandMatrix_Rows (C function), 266
SUNBandMatrix_StoredUpperBandwidth (C function), 266
SUNBandMatrix_UpperBandwidth (C function), 266
SUNBandMatrixStorage (C function), 266
SUNDenseLinearSolver (C function), 289
SUNDenseMatrix (C function), 260
SUNDenseMatrix_Cols (C function), 261
SUNDenseMatrix_Column (C function), 261
SUNDenseMatrix_Columns (C function), 261
SUNDenseMatrix_Data (C function), 261
SUNDenseMatrix_LData (C function), 261
SUNDenseMatrix_Print (C function), 261
SUNDenseMatrix_Rows (C function), 261
SUNDIALS_F77_FUNC_CASE (CMake option), 347
SUNDIALS_INDEX_SIZE (CMake option), 348
SUNDIALS_INDEX_TYPE (CMake option), 348
SUNDIALS_PRECISION (CMake option), 348
SUNDIALSGetVersion (C function), 79, 174, 200
SUNDIALSGetVersionNumber (C function), 79, 174, 200
SUNKLU (C function), 297
SUNKLUReInit (C function), 297
SUNKLUSetOrdering (C function), 297
SUNLapackBand (C function), 294
SUNLapackDense (C function), 292
SUNLinSol_Band (C function), 290
SUNLinSol_Dense (C function), 288
SUNLinSol_KLU (C function), 296
SUNLinSol_KLUReInit (C function), 296
SUNLinSol_KLUSetOrdering (C function), 297
SUNLinSol_LapackBand (C function), 294
SUNLinSol_LapackDense (C function), 292
SUNLinSol_PCG (C function), 320
SUNLinSol_PCGSetMaxl (C function), 321
SUNLinSol_PCGSetPrecType (C function), 321
SUNLinSol_SPBCGS (C function), 312
SUNLinSol_SPBCGSSetMaxl (C function), 313
SUNLinSol_SPBCGSSetPrecType (C function), 313
SUNLinSol_SPFGMR (C function), 308
SUNLinSol_SPFGMRSetGSType (C function), 308
SUNLinSol_SPFGMRSetMaxRestarts (C function), 308
SUNLinSol_SPFGMRSetPrecType (C function), 308
SUNLinSol_SPGMR (C function), 303
SUNLinSol_SPGMRSetGSType (C function), 304
SUNLinSol_SPGMRSetMaxRestarts (C function), 304
SUNLinSol_SPGMRSetPrecType (C function), 304
SUNLinSol_SPTFQMR (C function), 316
SUNLinSol_SPTFQMRSetMaxl (C function), 316
SUNLinSol_SPTFQMRSetPrecType (C function), 316
SUNLinSol_SuperLUMT (C function), 300
SUNLinSol_SuperLUMTSetOrdering (C function), 300
SUNLinSolFree (C function), 280
SUNLinSolGetType (C function), 278
SUNLinSolInitialize (C function), 279
SUNLinSolLastFlag (C function), 281
SUNLinSolNumIters (C function), 281
SUNLinSolResid (C function), 281
SUNLinSolResNorm (C function), 281
SUNLinSolSetATimes (C function), 280
SUNLinSolSetPreconditioner (C function), 280
SUNLinSolSetScalingVectors (C function), 280
SUNLinSolSetup (C function), 279
SUNLinSolSolve (C function), 279
SUNLinSolSpace (C function), 281
SUNMatClone (C function), 257
SUNMatCopy (C function), 257
SUNMatDestroy (C function), 257
SUNMatGetID (C function), 256
SUNMatMatvec (C function), 258
SUNMatScaleAdd (C function), 257
SUNMatScaleAddI (C function), 258
SUNMatSpace (C function), 257
SUNMatZero (C function), 257
SUNNonlinSol_FixedPoint (C function), 336
SUNNonlinSol_Newton (C function), 333
SUNNonlinSolConvTestFn (C type), 330
SUNNonlinSolFree (C function), 326
SUNNonlinSolGetCurIter (C function), 328
SUNNonlinSolGetNumConvFails (C function), 329
SUNNonlinSolGetNumIters (C function), 328
SUNNonlinSolGetSysFn_FixedPoint (C function), 336
SUNNonlinSolGetSysFn_Newton (C function), 333
SUNNonlinSolGetType (C function), 325
SUNNonlinSolInitialize (C function), 326
SUNNonlinSolLSetupFn (C type), 329
SUNNonlinSolLSolveFn (C type), 330
SUNNonlinSolSetConvTestFn (C function), 328
SUNNonlinSolSetLSetupFn (C function), 327
SUNNonlinSolSetLSolveFn (C function), 327
SUNNonlinSolSetMaxIters (C function), 328
SUNNonlinSolSetSysFn (C function), 327
SUNNonlinSolSetup (C function), 326
SUNNonlinSolSolve (C function), 326
SUNNonlinSolSysFn (C type), 329
SUNPCG (C function), 321
SUNPCGSetMaxl (C function), 321
SUNPCGSetPrecType (C function), 321
SUNSparseFromBandMatrix (C function), 272
SUNSparseFromDenseMatrix (C function), 272
SUNSparseMatrix (C function), 272
SUNSparseMatrix_Columns (C function), 272
SUNSparseMatrix_Data (C function), 273
SUNSparseMatrix_IndexPointers (C function), 273
SUNSparseMatrix_IndexValues (C function), 273
SUNSparseMatrix_NNZ (C function), 273
SUNSparseMatrix_NP (C function), 273
SUNSparseMatrix_Print (C function), 272
SUNSparseMatrix_Realloc (C function), 272
SUNSparseMatrix_Rows (C function), 272
SUNSparseMatrix_SparseType (C function), 273
SUNSPBCGS (C function), 313
SUNSPBCGSSetMaxl (C function), 313
SUNSPBCGSSetPrecType (C function), 313
SUNSPFGMR (C function), 308
SUNSPFGMRSetGSType (C function), 308
SUNSPFGMRSetMaxRestarts (C function), 308
SUNSPFGMRSetPrecType (C function), 308
SUNSPGMR (C function), 304
SUNSPGMRSetGSType (C function), 304
SUNSPGMRSetMaxRestarts (C function), 304
SUNSPGMRSetPrecType (C function), 304
SUNSPTFQMR (C function), 316
SUNSPTFQMRSetMaxl (C function), 317
SUNSPTFQMRSetPrecType (C function), 316
SUNSuperLUMT (C function), 300
SUNSuperLUMTSetOrdering (C function), 301
SUPERLUMT_ENABLE (CMake option), 348
SUPERLUMT_INCLUDE_DIR (CMake option), 348
SUPERLUMT_LIBRARY_DIR (CMake option), 348
SUPERLUMT_THREAD_TYPE (CMake option), 348
TPL_BLAS_LIBRARIES (xSDK CMake option), 349
TPL_ENABLE_BLAS (xSDK CMake option), 349
TPL_ENABLE_HYPRE (xSDK CMake option), 349
TPL_ENABLE_KLU (xSDK CMake option), 349
TPL_ENABLE_LAPACK (xSDK CMake option), 349
TPL_ENABLE_PETSC (xSDK CMake option), 349
TPL_ENABLE_SUPERLUMT (xSDK CMake option), 349
TPL_HYPRE_INCLUDE_DIRS (xSDK CMake option), 349
TPL_HYPRE_LIBRARIES (xSDK CMake option), 349
TPL_KLU_INCLUDE_DIRS (xSDK CMake option), 350
TPL_KLU_LIBRARIES (xSDK CMake option), 350
TPL_LAPACK_LIBRARIES (xSDK CMake option), 350
TPL_PETSC_INCLUDE_DIRS (xSDK CMake option), 350
TPL_PETSC_LIBRARIES (xSDK CMake option), 350
TPL_SUPERLUMT_INCLUDE_DIRS (xSDK CMake option), 350
TPL_SUPERLUMT_LIBRARIES (xSDK CMake option), 350
TPL_SUPERLUMT_THREAD_TYPE (xSDK CMake option), 350
TRBDF2-3-3-2 ESDIRK method, 358, 374
UNIT_ROUNDOFF, 38, 154, 188
USE_GENERIC_MATH (CMake option), 348
USE_XSDK_DEFAULTS (xSDK CMake option), 350
User main program, 40, 155, 189
Verner-8-5-6 ERK method, 357, 370
weighted root-mean-square norm, 17
XSDK_ENABLE_FORTRAN (xSDK CMake option), 350
XSDK_INDEX_SIZE (xSDK CMake option), 350
XSDK_PRECISION (xSDK CMake option), 350
Zonneveld-5-3-4 ERK method, 357, 364