# UQTk V3.0.4 Manual


SANDIA REPORT

SAND2017-11051

Unlimited Release

Printed Oct, 2017

UQTk Version 3.0.4 User Manual

Khachik Sargsyan, Cosmin Safta, Kenny Chowdhary, Sarah Castorena,

Sarah de Bord, Bert Debusschere

Prepared by

Sandia National Laboratories

Albuquerque, New Mexico 87185 and Livermore, California 94550

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

Approved for public release; further dissemination unlimited.

Issued by Sandia National Laboratories, operated for the United States Department of Energy by National Technology and Engineering Solutions of Sandia, LLC.

NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors.

Printed in the United States of America. This report has been reproduced directly from the best available copy.

Available to DOE and DOE contractors from
U.S. Department of Energy
Office of Scientific and Technical Information
P.O. Box 62
Oak Ridge, TN 37831
Telephone: (865) 576-8401
Facsimile: (865) 576-5728
E-Mail: reports@adonis.osti.gov
Online ordering: http://www.osti.gov/bridge

Available to the public from
U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Rd
Springfield, VA 22161
Telephone: (800) 553-6847
Facsimile: (703) 605-6900
E-Mail: orders@ntis.fedworld.gov
Online ordering: http://www.ntis.gov/help/ordermethods.asp?loc=7-4-0#online



Abstract

The UQ Toolkit (UQTk) is a collection of libraries and tools for the quantification of uncertainty in numerical model predictions. Version 3.0.4 offers intrusive and non-intrusive methods for propagating input uncertainties through computational models, tools for sensitivity analysis, methods for sparse surrogate construction, and Bayesian inference tools for inferring parameters from experimental data. This manual discusses the download and installation process for UQTk, provides pointers to the UQ methods used in the toolkit, and describes some of the examples provided with the toolkit.


Contents

1 Overview
2 Download and Installation
   Requirements
   Download
   Directory Structure
   External Software and Libraries
      PyUQTk
   Installation
      Configuration flags
      Installation example
3 Theory and Conventions
   Polynomial Chaos Expansions
4 Source Code Description
   Applications
      generate_quad
      gen_mi
      gp_regr
      model_inf
      pce_eval
      pce_quad
      pce_resp
      pce_rv
      pce_sens
      pdf_cl
      regression
      sens
5 Examples
   Elementary Operations
   Forward Propagation of Uncertainty
   Numerical Integration
   Forward Propagation of Uncertainty with PyUQTk
   Expanded Forward Propagation of Uncertainty - PyUQTk
      Theory
      Heat Transfer - Dual Pane Window
   Bayesian Inference of a Line
   Surrogate Construction and Sensitivity Analysis
   Global Sensitivity Analysis via Sampling
   Karhunen-Loève Expansion of a Stochastic Process
6 Support
References

Chapter 1

Overview

The UQ Toolkit (UQTk) is a collection of libraries and tools for the quantification of uncertainty in numerical model predictions. In general, uncertainty quantification (UQ) pertains to all aspects that affect the predictive fidelity of a numerical simulation, from the uncertainty in the experimental data that was used to inform the parameters of a chosen model, and the propagation of uncertain parameters and boundary conditions through that model, to the choice of the model itself.

In particular, UQTk provides implementations of many probabilistic approaches for UQ in this general context. Version 3.0.4 offers intrusive and non-intrusive methods for propagating input uncertainties through computational models, tools for sensitivity analysis, methods for sparse surrogate construction, and Bayesian inference tools for inferring parameters from experimental data.

The main objective of UQTk is to make these methods available to the broader scientific community for the purposes of algorithmic development in UQ or educational use. The most direct way to use the libraries is to link to them directly from C++ programs. Alternatively, the examples section provides many scripts for common UQ operations, which can be modified to fit the user's purposes using existing numerical simulation codes as a black box.

The next chapter in this manual discusses the download and installation process for UQTk, followed by some pointers to the UQ methods used in the toolkit, and a description of some of the examples provided with the toolkit.


Chapter 2

Download and Installation

Requirements

The core UQTk libraries are written in C++, with some dependencies on FORTRAN numerical libraries. As such, to use UQTk, compatible C++ and FORTRAN compilers are needed. UQTk is installed and built most naturally on a Unix-like platform, and has been tested on Mac OS X and Linux. Installation and use on Windows machines under Cygwin is possible, but has not been tested extensively.

Many of the examples rely on Python, including the NumPy, SciPy, and matplotlib packages for postprocessing and graphing. As such, Python version 2.7.x with compatible NumPy, SciPy, and matplotlib is recommended. Further, the use of XML for input files requires the Expat XML parser library to be installed on your system. Note that if you will be linking the core UQTk libraries directly into your own codes and do not plan on using the UQTk examples, these additional dependencies are not required.

Download

The most recent version of UQTk, currently 3.0.4, can be downloaded from the following location:

http://www.sandia.gov/UQToolkit

After downloading, extract the tar file into the directory where you want to keep the UQTk source directory:

% tar -xzvf UQTk_v3.0.tgz

Make sure to replace the name of the tar file in this command with the name of the tar file you just downloaded, as the file name depends on the version of the toolkit.


Directory Structure

After extraction, there will be a new directory UQTk_v3.0 (the version number may differ). Inside this top-level directory are the following directories:

config    Configuration files
cpp       C++ source code
  app     C++ apps
  lib     C++ libraries
  tests   Tests for C++ libraries
dep       External dependencies
  ann          Approximate Nearest Neighbors library
  blas         Netlib's BLAS library (linear algebra)
  cvode-2.7.0  SUNDIALS' CVODE library (ODE solvers)
  dsfmt        dSFMT library (random number generators)
  figtree      Fast Improved Gauss Transform library
  lapack       Netlib's LAPACK library (linear algebra)
  lbfgs        L-BFGS library (optimization)
  slatec       Netlib's SLATEC library (general purpose math)
doc       Documentation
examples  Examples with C++ libraries and apps
  fwd_prop     forward propagation with a heat transfer example
  kle_ex1      Karhunen-Loève expansion example
  line_infer   calibrate parameters of a linear model
  muq          interface between MUQ and UQTk
  num_integ    quadrature and Monte Carlo integration
  ops          operations with Polynomial Chaos expansions
  pce_bcs      construct sparse Polynomial Chaos expansions
  sensMC       Monte Carlo based sensitivity index computation
  surf_rxn     surface reaction example for forward and inverse UQ
  uqpc         construct Polynomial Chaos surrogates for multiple outputs/functions
PyUQTk    Python scripts and interface to C++ libraries
  array      interface to the array class
  bcs        interface to the Bayesian compressive sensing library
  dfi        interface to the data-free inference class
  inference  Python Markov chain Monte Carlo (MCMC) scripts
  kle        interface to the Karhunen-Loève expansion class
  mcmc       interface to the MCMC class
  pce        interface to the Polynomial Chaos expansion class
  plotting   Python plotting scripts
  pytests    Python tests
  quad       interface to the Quad class
  sens       Python global sensitivity analysis scripts
  tools      interface to UQTk tools
  utils      interface to UQTk utils

External Software and Libraries

The following software and libraries are required to compile UQTk:

1. C++/Fortran compilers. Please note that the C++ and Fortran compilers need to be compatible with each other. Please see the table below for a list of platforms and compilers we have tested so far. For OS X these compilers were installed either using MacPorts, or Homebrew, or built directly from source.

| Platform | Compiler |
|---|---|
| OS X 10.9 | GNU 4.8, 4.9, 5.2; Intel 14.0.3 |
| OS X 10.10 | GNU 5.2, 5.3 |
| OS X 10.11 | GNU 5.2-4, 6.1, 6.2; Intel 14.0.3; OpenMPI 1.10 |
| Linux x86_64 (Red Hat) | GNU 4.8 |
| Linux ia-64 (Suse) | Intel 10.1.021 |

2. CMake. We switched to a CMake-based build/install configuration in version 3.0. The configuration files require CMake version 2.6.x or higher.

3. Expat library. The Expat XML Parser is installed together with other XCode tools on OS X. It is also fairly common on Linux systems, with installation scripts available for several platforms. Alternatively, this library can be downloaded from http://expat.sourceforge.net

PyUQTk

The following additional software and libraries are not required to compile UQTk, but are necessary for the full Python interface to UQTk, called PyUQTk.

1. Python, NumPy, SciPy, and Matplotlib. We have successfully compiled PyUQTk with Python 2.6.x and Python 2.7.x, NumPy 1.8.1, SciPy 0.14.0, and Matplotlib 1.4.2. Note that we have not tried using Python 3. It is important that all of these packages be compatible with each other. Sometimes your OS may come with a default version of Python but not SciPy or NumPy; when adding those packages afterwards, it can be hard to get them all to be compatible with each other. To avoid issues, it is recommended to install Python, NumPy, and SciPy all from the same package manager (e.g. get them all through MacPorts or Homebrew on OS X).

2. SWIG. PyUQTk has been tested with SWIG 3.0.2.


Installation

We deﬁne the following keywords to simplify build and install descriptions in this section.

•sourcedir - directory containing UQTk source ﬁles, i.e. the top level directory men-

tioned in Section 2.

•builddir - directory where UQTk library and its dependencies will be built. This

directory should not be the same as sourcedir.

•installdir - directory where UQTk libraries are installed and header and script ﬁles are

copied

To generate the build structure, compile, test, and install UQTk:

(1) >mkdir builddir ; cd builddir

(2) >cmake <ﬂags>sourcedir

(3) >make

(4) >ctest

(5) >make install

Configuration flags

A (partial) list of configuration flags that can be set at step (2) above:

• CMAKE_INSTALL_PREFIX : set installdir
• CMAKE_C_COMPILER : C compiler
• CMAKE_CXX_COMPILER : C++ compiler
• CMAKE_Fortran_COMPILER : Fortran compiler
• IntelLibPath : for Intel compilers, the path to the libraries if different from the default system paths
• PyUQTk : if ON, build PyUQTk's Python-to-C++ interface. Default: OFF

Several pre-set config files are available in the sourcedir/config directory. Some of these shell scripts also accept arguments, e.g. config-teton.sh, to switch between several configurations. Type, for example, "config-teton.sh --help" to obtain a list of options. For a basic setup using default system settings for GNU compilers, see config-gcc-base.sh. The user is encouraged to copy one of these scripts and edit it to match the desired configuration. Then, step (2) above (cmake <flags> sourcedir) should be replaced by a command executing the chosen shell script, e.g.

(2) > ../UQTk_v3.0.1/config/config-gcc-base.sh

In this example, the configuration script is executed from the build directory, while the script itself still sits in the config directory of the UQTk source tree, in this case version 3.0.1.

If all goes well, there should be no errors. Two log files in the config directory contain the output for steps (2) and (3) above, for compilation and installation on OS X 10.9.5 using GNU 4.8.3 compilers:

(2) > ../UQTk/config/config-teton.sh -c gnu -p ON >& cmake-mac-gnu.log
(3) > make >& make-gnu.log ; make install >>& make-gnu.log

After compilation ends, installdir will contain the following sub-directories:

PyUQTk    Python scripts and, if PyUQTk=ON, interface to C++ classes
bin       apps' binaries
cpp       tests for C++ libraries
examples  examples of using UQTk
include   UQTk header files
lib       UQTk libraries, including those for external dependencies

To use the UQTk libraries, your program should link in the libraries in installdir/lib and add the installdir/include/uqtk and installdir/include/dep directories to the compiler include path. The apps are standalone programs that perform UQ operations, such as response surface construction or sampling from random variables. For more details, see the Examples section.

Installation example

In this section, we will take the user through the installation of UQTk and PyUQTk on

a Mac OSX 10.11 system with the GNU compilers. The following example uses GNU 6.1

installed under /opt/local/gcc61. For the compilation of PyUQTk, we are using Python

version 2.7.10 with SciPy 0.14.0, Matplotlib 1.4.2, NumPy 1.8.1, and SWIG 3.0.2. Note that

you will need to install both swig and swig-python libraries if you install SWIG via Macports.

If you install SWIG from source, you do not need to install a separate swig-python library.

It will be cleaner to keep the source directory separate from the build and install direc-

tories. For simplicity, we will create a UQTk-build directory in the same parent folder as

the source directory, UQTk . While in the source directory, create the build directory and cd

into it:

13

$ mkdir ../UQTk-build

$ cd ../UQTk-build

It is important to note that CMake uses the compilers defined by cc and c++ by default. These may not be the compilers you want when installing UQTk. Luckily, CMake allows you to specify which compilers to use, similar to autoconf. Thus, we type

$ cmake -DCMAKE_INSTALL_PREFIX:PATH=$PWD/../UQTk-install \
  -DCMAKE_Fortran_COMPILER=/opt/local/gcc61/bin/gfortran-6.1.0 \
  -DCMAKE_C_COMPILER=/opt/local/gcc61/bin/gcc-6.1.0 \
  -DCMAKE_CXX_COMPILER=/opt/local/gcc61/bin/g++-6.1.0 ../UQTk

Note that this configures CMake to compile UQTk without the Python interface. Also, we specified the installation directory to be UQTk-install, in the same parent directory as UQTk and UQTk-build. Figure 2.1 shows what CMake prints to the screen.

Figure 2.1: CMake configuration without the Python interface.

To turn on the Python interface, set the CMake flag PyUQTk to ON, i.e.,

$ cmake -DPyUQTk=ON \
  -DCMAKE_INSTALL_PREFIX:PATH=$PWD/../UQTk-install \
  -DCMAKE_Fortran_COMPILER=/opt/local/gcc61/bin/gfortran-6.1.0 \
  -DCMAKE_C_COMPILER=/opt/local/gcc61/bin/gcc-6.1.0 \
  -DCMAKE_CXX_COMPILER=/opt/local/gcc61/bin/g++-6.1.0 ../UQTk


Figure 2.2 shows the additional output to the screen after the Python interface flag is turned on.

Figure 2.2: CMake configuration with the Python interface.

If the CMake command executed without error, you are now ready to build UQTk. While in the build directory, type

$ make

or, for a faster compilation using N parallel threads,

$ make -j N

where one can replace N with the number of virtual cores on your machine, e.g. 8. This builds in the UQTk-build/ directory. The screen should look similar to Figure 2.3, with or without the Python interface, when building.

Figure 2.3: Start and end of build without the Python interface.

To verify that the build was successful, run the ctest command from the UQTk-build/ directory to execute the C++ and Python (only if building PyUQTk) test scripts.

$ ctest


Figure 2.4: Result of ctest after a successful build and install. Note that if you do not build PyUQTk, those tests will not be run.

The output should look similar to Figure 2.4.

If the Python tests fail even though the compilation went well, a common issue is that the configure script may have found a different version of the Python libraries than the one used when you invoke Python from the command line. To avoid this, specify the paths to your Python executable and libraries during the configuration process. For example (on OS X):

$ cmake -DCMAKE_INSTALL_PREFIX:PATH=$PWD/../UQTk-install \
  -DCMAKE_Fortran_COMPILER=/opt/local/gcc61/bin/gfortran-6.1.0 \
  -DCMAKE_C_COMPILER=/opt/local/gcc61/bin/gcc-6.1.0 \
  -DCMAKE_CXX_COMPILER=/opt/local/gcc61/bin/g++-6.1.0 \
  -DPYTHON_EXECUTABLE:FILEPATH=/opt/local/bin/python \
  -DPYTHON_LIBRARY:FILEPATH=/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib \
  -DPyUQTk=ON \
  ../UQTk

If all looks good, you are now ready to install UQTk. While in the build directory, type

$ make install

which installs the libraries, headers, apps, examples, and so forth in the specified installation directory. Additionally, if you are building the Python interface, the install command copies the Python scripts and SWIG modules (*.so) to PyUQTk/.

As a reminder, commonly used configure options are illustrated in the scripts provided in the sourcedir/config folder.


Chapter 3

Theory and Conventions

UQTk implements many probabilistic methods found in the literature. For more details on the methods, please refer to the following papers and books on Polynomial Chaos methods for uncertainty propagation [3, 17], Karhunen-Loève (KL) expansions [7], numerical quadrature (including sparse quadrature) [13, 2, 9, 31, 6], Bayesian inference [30, 4, 18], Markov chain Monte Carlo [5, 8, 10, 11], Bayesian compressive sensing [1], and the Rosenblatt transformation [23].

Below, some key aspects and conventions of UQTk Polynomial Chaos expansions are outlined in order to connect the tools in UQTk to the broader theory.

Polynomial Chaos Expansions

• The default ordering of PCE terms in the multi-index in UQTk is the canonical ordering for total-order truncation.
• The PC basis functions in UQTk are not normalized.
• The Legendre-Uniform PC basis type is defined on the interval [-1, 1], with weight function 1/2.
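The non-normalized convention can be verified numerically; below is a NumPy sketch (independent of PyUQTk, for illustration only) checking that the Legendre-Uniform basis under the weight 1/2 has squared norms ⟨Ψ_k²⟩ = 1/(2k+1) rather than 1:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Gauss-Legendre quadrature nodes/weights on [-1, 1]
x, w = leggauss(20)
w = w / 2.0  # rescale to the Legendre-Uniform weight function 1/2

for k in range(5):
    ck = np.zeros(k + 1)
    ck[k] = 1.0
    psi = legval(x, ck)          # k-th Legendre polynomial P_k(x)
    norm2 = np.sum(w * psi**2)   # <Psi_k^2> under weight 1/2
    # unnormalized basis: <P_k^2> = 1/(2k+1), not 1
    print(k, norm2, 1.0 / (2 * k + 1))
```

A normalized convention would divide each Ψ_k by the square root of this norm; UQTk leaves the basis unnormalized, so quadratic quantities (e.g. PCE variances) must include the ⟨Ψ_k²⟩ factors.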


Chapter 4

Source Code Description

For more details on the actual source code in UQTk, HTML documentation is available in the doc_cpp/html folder.

Applications

The following command-line applications are available (source code is in cpp/app):

} generate_quad : Quadrature point/weight generation
} gen_mi : Polynomial multiindex generation
} gp_regr : Gaussian process regression
} model_inf : Model parameter inference
} pce_eval : PC evaluation
} pce_quad : PC generation from samples
} pce_resp : PC projection via quadrature integration
} pce_rv : PC-related random variable generation
} pce_sens : PC sensitivity extraction
} pdf_cl : Kernel density estimation
} regression : Linear parametric regression
} sens : Sobol sensitivity indices via Monte Carlo sampling

Below we detail the theory behind each application. For specific help in running an app, type app_name -h.


} generate_quad:

This utility generates isotropic quadrature points (either full tensor product or sparse) of a given dimensionality and type. The keyword options are:

Quadrature types: -g <quadType>

• LU : Legendre-Uniform
• HG : Gauss-Hermite
• LG : Gamma-Laguerre
• SW : Stieltjes-Wigert
• JB : Beta-Jacobi
• CC : Clenshaw-Curtis
• CCO : Clenshaw-Curtis Open (endpoints not included)
• NC : Newton-Cotes (equidistant)
• NCO : Newton-Cotes Open (endpoints not included)
• GP3 : Gauss-Patterson
• pdf : Custom PDF

Sparsity types: -x <fsType>

• full : full tensor product
• sparse : Smolyak sparse-grid construction

Note that one can create an equidistant multidimensional grid by using the 'NC' quadrature type and 'full' sparsity type.
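For the 'full' sparsity type, the construction is simply a tensor product of one-dimensional rules. A NumPy sketch of a Legendre-Uniform tensor grid, analogous in spirit to `generate_quad -g LU -x full` (the helper name and interface are hypothetical, not the app's):

```python
import itertools
import numpy as np
from numpy.polynomial.legendre import leggauss

def full_tensor_quad(nq, dim):
    """Tensor-product Gauss-Legendre rule with nq points per dimension."""
    x1, w1 = leggauss(nq)
    w1 = w1 / 2.0  # Legendre-Uniform convention: weight 1/2 per dimension
    pts = np.array(list(itertools.product(x1, repeat=dim)))
    wts = np.array([np.prod(c) for c in itertools.product(w1, repeat=dim)])
    return pts, wts

pts, wts = full_tensor_quad(4, 3)
print(pts.shape)   # (64, 3): 4^3 tensor-product points
print(wts.sum())   # weights integrate the constant function to 1
```

The number of points grows as nq^dim, which is exactly the scaling that the Smolyak 'sparse' option is designed to mitigate in higher dimensions.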

} gen_mi:

This utility generates a multiindex set of a given type and dimensionality. The keyword options are:

Multiindex types: -x <mi_type>

• TO : Total-order truncation, i.e. $\alpha=(\alpha_1,\dots,\alpha_d)$, where $\alpha_1+\cdots+\alpha_d=||\alpha||_1\le p$, for given order $p$ and dimensionality $d$. The number of multiindices is $N^{TO}_{p,d}=(p+d)!/(p!\,d!)$.

• TP : Tensor-product truncation, i.e. $\alpha=(\alpha_1,\dots,\alpha_d)$, where $\alpha_i\le p_i$ for $i=1,\dots,d$. The dimension-specific orders are given in a file whose name is specified as a command-line argument (-f). The number of multiindices is $N^{TP}_{p_1,\dots,p_d}=\prod_{i=1}^d(p_i+1)$.

• HDMR : High-Dimensional Model Representation, where, for each $k$, the $k$-variate multiindices are truncated up to a given order. That is, if $||\alpha||_0=k$ (i.e. the number of non-zero elements is equal to $k$), then $||\alpha||_1\le p_k$, for $k=1,\dots,k_{max}$. The variate-specific orders $p_k$ are given in a file whose name is specified as a command-line argument (-f). The number of multiindices constructed in this way is $N^{HDMR}_{p_0,\dots,p_{k_{max}}}=\sum_{k=0}^{k_{max}}(p_k+k)!/(p_k!\,k!)$.

} gp_regr:

This utility performs Gaussian process regression [22], in particular using the Bayesian perspective of constructing GP emulators, see e.g. [12, 20]. The data is given as pairs $D=\{(x^{(i)},y^{(i)})\}_{i=1}^N$, where $x\in\mathbb{R}^d$. The function to be found, $f(x)$, is endowed with a Gaussian prior with mean $h(x)^Tc$ and a predefined covariance $C(x,x')=\sigma^2 c(x,x')$. Currently, only a squared-exponential covariance is implemented, i.e. $c(x,x')=e^{-(x-x')^TB(x-x')}$. The mean trend basis vector $h(x)=(L_0(x),\dots,L_{K-1}(x))$ consists of Legendre polynomials, while $c$ and $\sigma^2$ are hyperparameters with a normal inverse gamma (conjugate) prior

$$p(c,\sigma^2)=p(c|\sigma^2)\,p(\sigma^2)\propto \frac{e^{-\frac{(c-c_0)^TV^{-1}(c-c_0)}{2\sigma^2}}}{\sigma^K}\,\frac{e^{-\beta/\sigma^2}}{\sigma^{2(\alpha+1)}}.$$

The parameters $c_0$, $V^{-1}$ and $B$ are fixed for the duration of the regression. Conditioned on $y_i=f(x_i)$, the posterior is a Student-t process

$$f(x)\,|\,D,c_0,V^{-1},B,\alpha,\beta\ \sim\ \text{St-t}\left(\mu^*(x),\,\hat{\sigma}^2 c^*(x,x')\right)$$

with mean and covariance defined as

$$\mu^*(x)=h(x)^T\hat{c}+t(x)^TA^{-1}(y-H\hat{c}),$$
$$c^*(x,x')=c(x,x')-t(x)^TA^{-1}t(x')+[h(x)^T-t(x)^TA^{-1}H]\,V^*\,[h(x')^T-t(x')^TA^{-1}H]^T,$$

where $y^T=(y^{(1)},\dots,y^{(N)})$ and

$$\hat{c}=V^*(V^{-1}c_0+H^TA^{-1}y),\qquad \hat{\sigma}^2=\frac{2\beta+c_0^TV^{-1}c_0+y^TA^{-1}y-\hat{c}^T(V^*)^{-1}\hat{c}}{N+2\alpha-K-2},$$
$$t(x)^T=(c(x,x^{(1)}),\dots,c(x,x^{(N)})),\qquad V^*=(V^{-1}+H^TA^{-1}H)^{-1},$$
$$H=(h(x^{(1)})^T,\dots,h(x^{(N)})^T)^T,\qquad A_{mn}=c(x^{(m)},x^{(n)}). \tag{4.1}$$

Note that currently the commonly used prior $p(c,\sigma^2)\propto\sigma^{-2}$ is implemented, which is a special case with $\alpha=\beta=0$, $c_0=\mathbf{0}$ and $V^{-1}=0_{K\times K}$. Also, a small nugget of size $10^{-6}$ is added to the diagonal of the matrix $A$ for numerical purposes, playing the role of 'data noise'. Finally, the covariance matrix $B$ is taken to be diagonal, with entries either fixed or found before the regression by maximizing the marginal posterior [20]. More flexibility in trend basis and covariance structure selection is a matter of current work.

The app builds the Student-t process according to the computations detailed above, and evaluates its mean and covariance at a user-defined grid of points $x$.
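A minimal NumPy sketch of the posterior mean formula in the simplified case the app implements (prior $p(c,\sigma^2)\propto\sigma^{-2}$, i.e. $\alpha=\beta=0$, $c_0=\mathbf{0}$, $V^{-1}=0$, for which $\hat{c}=(H^TA^{-1}H)^{-1}H^TA^{-1}y$). The function name, the simple trend $h(x)=(1,x)$, and the test data are illustrative assumptions, not the app's internals:

```python
import numpy as np

def gp_posterior_mean(X, y, Xs, B, nugget=1e-6):
    """Posterior mean mu*(x) = h(x)^T chat + t(x)^T A^{-1} (y - H chat),
    with squared-exponential covariance c(x,x') = exp(-(x-x')^T B (x-x'))."""
    def cov(P, Q):
        d = P[:, None, :] - Q[None, :, :]
        return np.exp(-np.einsum('ijk,kl,ijl->ij', d, B, d))
    H = np.column_stack([np.ones(len(X)), X[:, 0]])   # trend basis h(x) = (1, x)
    A = cov(X, X) + nugget * np.eye(len(X))           # nugget as in the app
    Ainv = np.linalg.inv(A)
    # generalized least-squares trend coefficients (V^{-1} = 0 special case)
    chat = np.linalg.solve(H.T @ Ainv @ H, H.T @ Ainv @ y)
    t = cov(Xs, X)
    Hs = np.column_stack([np.ones(len(Xs)), Xs[:, 0]])
    return Hs @ chat + t @ Ainv @ (y - H @ chat)

X = np.linspace(-1.0, 1.0, 8)[:, None]
y = X[:, 0] ** 2
mu = gp_posterior_mean(X, y, X, B=np.array([[5.0]]))
print(np.max(np.abs(mu - y)))  # interpolates the data up to the nugget
```

Note how the GP mean interpolates the training data (up to the nugget), with the Legendre/linear trend capturing the large-scale behavior and the kernel term the residual.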

} model_inf:

This utility performs Bayesian inference for several generic types of models. Consider a dataset $D=\{(x^{(i)},y^{(i)})\}_{i=1}^N$ of pairs of $x$-$y$ measured values from some unknown 'truth' function $g(\cdot)$, i.e. $y^{(i)}=g(x^{(i)})+\text{meas. errors}$. For example, $y^{(i)}$ can be measurements at spatial locations $x^{(i)}$, or at time instances $x^{(i)}$, or $x^{(i)}=i$ simply enumerating several observables. We call the elements of $x\in\mathbb{R}^S$ design or controllable parameters. Assume, generally, that $g$ is not deterministic, i.e. the vector of measurements $y^{(i)}$ at each $i$ contains $R$ instances/replicas/measurements of the true output $g(x)$. Furthermore, consider a model of interest $f(\lambda;x)$, a function of model parameters $\lambda\in\mathbb{R}^D$ producing a single output. We are interested in calibrating the model $f(\lambda;x)$ with respect to the model parameters $\lambda$, seeking an approximate match of the model to the truth:

$$f(\lambda;x)\approx g(x). \tag{4.2}$$

The full error budget takes the following form

$$y^{(i)}=f(\lambda;x^{(i)})+\delta(x^{(i)})+\epsilon_i, \tag{4.3}$$

where $\delta(x)$ is the model discrepancy term, and $\epsilon_i$ is the measurement error for the $i$-th data point. The most common assumption for the latter is an i.i.d. Gaussian assumption with vanishing mean,

$$\epsilon_i\sim N(0,\sigma^2),\qquad\text{for all } i=1,\dots,N. \tag{4.4}$$

Concerning the model error $\delta(x)$, we envision three scenarios:

• When the model discrepancy term $\delta(x)$ is ignored, one arrives at the classical construction $y^{(i)}-f(\lambda;x^{(i)})\sim N(0,\sigma^2)$, with the likelihood described below in Eq. (4.12).

• When the model discrepancy $\delta(x)$ is modeled explicitly as a Gaussian process with a predefined, typically squared-exponential, covariance term with parameters either fixed a priori or inferred as hyperparameters together with $\lambda$. This approach was established in [15], and is referred to as the "Kennedy-O'Hagan" (koh) approach.

• The embedded model error approach is a novel strategy in which the model error is embedded into the model itself. For a detailed discussion of the advantages and challenges of the approach, see [26]. This method leads to several likelihood options (keywords abc, abcm, gausmarg, mvn, full, marg), many of which are topics of current research and are under development. In this approach, one augments some of the parameters in $\lambda$ with a probabilistic representation, such as a multivariate normal, and infers the parameters of this representation instead. Without loss of generality, and for clarity of illustration, we assume that the first $M$ components of $\lambda$ are augmented with a random variable. To this end, $\lambda$ is augmented by a multivariate normal random variable as

$$\lambda\ \rightarrow\ \Lambda=\lambda+A(\alpha)\vec{\xi}, \tag{4.5}$$

where

$$A(\alpha)=\begin{pmatrix} \alpha_{11} & 0 & 0 & \dots & 0\\ \alpha_{21} & \alpha_{22} & 0 & \dots & 0\\ \alpha_{31} & \alpha_{32} & \alpha_{33} & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \alpha_{M1} & \alpha_{M2} & \alpha_{M3} & \dots & \alpha_{MM}\\ 0 & 0 & 0 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \dots & 0 \end{pmatrix}_{D\times M} \quad\text{and}\quad \vec{\xi}=\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_M\end{pmatrix}. \tag{4.6}$$

Here $\vec{\xi}$ is a vector of independent identically distributed standard normal random variables, and $\alpha=(\alpha_{11},\dots,\alpha_{MM})$ is the vector, of size $M(M+1)/2$, of all non-zero entries of the matrix $A$. The set of parameters describing the random vector $\Lambda$ is $\hat{\lambda}=(\lambda,\alpha)$. The full data model is then written as

$$y^{(i)}=f(\lambda+A(\alpha)\vec{\xi};x^{(i)})+\epsilon_i \tag{4.7}$$

or

$$y^{(i)}=f_{\hat{\lambda}}(x^{(i)};\vec{\xi})+\sigma\xi_{M+i}, \tag{4.8}$$

where $f_{\hat{\lambda}}(x;\vec{\xi})$ is a random process induced by this model error embedding. The mean and variance of this process are denoted by $\mu_{\hat{\lambda}}(x)$ and $\sigma^2_{\hat{\lambda}}(x)$, respectively. To represent this random process and allow easy access to its first two moments, we employ a non-intrusive spectral projection (NISP) approach to propagate the uncertainties in $f$ via a Gauss-Hermite PC expansion,

$$y^{(i)}=\sum_{k=0}^{K-1}f_{ik}(\lambda,\alpha)\,\Psi_k(\vec{\xi})+\sigma\xi_{M+i}, \tag{4.9}$$

for a fixed order-$p$ expansion, leading to $K=(p+M)!/(p!\,M!)$ terms.
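The augmentation (4.5)-(4.6) is straightforward to sample. A small NumPy sketch (the helper name is hypothetical) that builds the $D\times M$ lower-triangular $A(\alpha)$ from the packed vector $\alpha=(\alpha_{11},\alpha_{21},\alpha_{22},\dots)$ and draws realizations of $\Lambda$:

```python
import numpy as np

def sample_embedded(lam, alpha, M, nsamples, rng):
    """Draw samples of Lambda = lam + A(alpha) xi, per Eqs. (4.5)-(4.6):
    A is D x M, lower-triangular in its first M rows, zero below."""
    D = len(lam)
    A = np.zeros((D, M))
    k = 0
    for i in range(M):            # fill row i with alpha_{i1} .. alpha_{ii}
        for j in range(i + 1):
            A[i, j] = alpha[k]
            k += 1
    xi = rng.standard_normal((nsamples, M))  # i.i.d. standard normals
    return lam + xi @ A.T                    # shape (nsamples, D)

rng = np.random.default_rng(42)
lam = np.array([1.0, -0.5, 2.0])     # D = 3
alpha = np.array([0.3, 0.1, 0.2])    # M = 2 -> M(M+1)/2 = 3 entries
L = sample_embedded(lam, alpha, M=2, nsamples=50000, rng=rng)
print(L.mean(axis=0))  # ~ lam; the third component stays deterministic
```

Only the first $M$ components of $\Lambda$ are randomized; the remaining $D-M$ components are left at their deterministic values $\lambda_i$, matching the zero rows of $A$.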

The parameter estimation problem for $\lambda$ is now reformulated as a parameter estimation problem for $\hat{\lambda}=(\lambda,\alpha)$. This inverse problem is solved via Bayesian machinery. Bayes' formula reads

$$\underbrace{p(\hat{\lambda}|D)}_{\text{posterior}}\ \propto\ \underbrace{p(D|\hat{\lambda})}_{\text{likelihood}}\ \underbrace{p(\hat{\lambda})}_{\text{prior}}, \tag{4.10}$$

where the key function is the likelihood function

$$L_D(\hat{\lambda})=p(D|\hat{\lambda}) \tag{4.11}$$

that connects the prior distribution of the parameters of interest to the posterior one. The options for the likelihood are given further in this section. For details on the likelihood construction, see [26]. To alleviate the invariance with respect to sign-flips, we use a prior that enforces $\alpha_{Mi}>0$ for $i=1,\dots,M$. Also, one can either fix $\sigma^2$ or infer it together with $\hat{\lambda}$.

Exact computation of the potentially high-dimensional posterior (4.10) is usually problematic, therefore we employ a Markov chain Monte Carlo (MCMC) algorithm for sampling from the posterior. The model $f$ and the exact form of the likelihood are determined using command-line arguments. Below we detail the currently implemented model types.

Model types: -f <modeltype>

•prop : for x ∈ ℝ¹ and λ ∈ ℝ¹, the function is defined as f(λ; x) = λx.

•prop_quad : for x ∈ ℝ¹ and λ ∈ ℝ², the function is defined as f(λ; x) = λ₁x + λ₂x².

•exp : for x ∈ ℝ¹ and λ ∈ ℝ², the function is defined as f(λ; x) = e^{λ₁+λ₂x}.

•exp_quad : for x ∈ ℝ¹ and λ ∈ ℝ³, the function is defined as f(λ; x) = e^{λ₁+λ₂x+λ₃x²}.

•const : for any x ∈ ℝⁿ and λ ∈ ℝ¹, the function is defined as f(λ; x) = λ.

•linear : for x ∈ ℝ¹ and λ ∈ ℝ², the function is defined as f(λ; x) = λ₁ + λ₂x.

•bb : the model is a 'black-box' executed via a system call of a script named bb.x that takes files p.dat (an R×D matrix for λ) and x.dat (an N×S matrix for x) and returns output y.dat (an R×N matrix for f). This effectively simulates f(λ; x) at R values of λ and N values of x.

•heat_transfer1 : a custom model designed for a tutorial case of a heat conduction problem: for x ∈ ℝ¹ and λ ∈ ℝ¹, the model is defined as f(λ; x) = x d_w/(A_w λ) + T₀, where d_w = 0.1, A_w = 0.04 and T₀ = 273.

•heat_transfer2 : a custom model designed for a tutorial case of a heat conduction problem: for x ∈ ℝ¹ and λ ∈ ℝ², the model is defined as f(λ; x) = x Q/(A_w λ₁) + λ₂, where A_w = 0.04 and Q = 20.0.

•frac_power : a custom function for testing. For x ∈ ℝ¹ and λ ∈ ℝ⁴, the function is defined as f(λ; x) = λ₀ + λ₁x + λ₂x² + λ₃(x + 1)^{3.5}.

•exp_sketch : an exponential function used for sketch illustrations of the model-error embedding approach; for x ∈ ℝ¹ and λ ∈ ℝ², the model is defined as f(λ; x) = λ₂e^{λ₁x} − 2.

•inp : a function that produces the input components as output, i.e. f(λ; x^{(i)}) = λ_i, for x ∈ ℝ¹ and λ ∈ ℝᵈ, assuming exactly d values for the design variables x (these are usually simply indices x_i = i for i = 1, . . . , d).

•pcl : the model is a Legendre PC expansion that is linear with respect to the coefficients λ, i.e. f(λ; x) = ∑_{α∈S} λ_α Ψ_α(x).

•pcx : the model is a Legendre PC expansion in both x and λ, i.e. z = (λ, x) and f(λ; x) = ∑_{α∈S} c_α Ψ_α(z).

•pc : the model is a set of Legendre polynomial expansions, one for each value of x, i.e. f(λ; x^{(i)}) = ∑_{α∈S} c_{α,i} Ψ_α(λ).

•pcs : same as pc, only the multi-index set S_i can differ for each x^{(i)}, i.e. f(λ; x^{(i)}) = ∑_{α∈S_i} c_{α,i} Ψ_α(λ).
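The bb interface described above can be sketched as follows. This is a minimal, hypothetical bb.x written in Python, assuming whitespace-separated matrices in p.dat and x.dat; the proportional model f(λ; x) = λ₁x stands in for the user's actual simulation:

```python
#!/usr/bin/env python
# Hypothetical 'bb.x' sketch: reads p.dat (R x D matrix of lambda samples)
# and x.dat (N x S matrix of design conditions), writes y.dat (R x N matrix
# of model outputs). The model f(lambda; x) = lambda_1 * x is a stand-in.

def read_matrix(fname):
    with open(fname) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]

def write_matrix(fname, mat):
    with open(fname, "w") as f:
        for row in mat:
            f.write(" ".join("%24.16e" % v for v in row) + "\n")

def run(pfile="p.dat", xfile="x.dat", yfile="y.dat"):
    lam = read_matrix(pfile)   # R rows, D columns
    xs = read_matrix(xfile)    # N rows, S columns
    # y[r][n] = f(lambda^(r); x^(n)); only the first columns are used here
    y = [[l[0] * x[0] for x in xs] for l in lam]
    write_matrix(yfile, y)

if __name__ == "__main__":
    run()
```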

Likelihood construction is the key step and the biggest challenge in model parameter inference.

Likelihood types: -l <liktype>

•classical : No α, or M = 0. This is a classical, least-squares likelihood

$$\log L_D(\lambda) = -\sum_{i=1}^{N} \frac{(y^{(i)} - f(\lambda; x^{(i)}))^2}{2\sigma^2} - \frac{N}{2}\log(2\pi\sigma^2), \tag{4.12}$$

•koh : Kennedy-O'Hagan likelihood with explicit additive representation of model discrepancy [15].

•full : This is the exact likelihood

$$L_D(\hat{\lambda}) = \pi_{h_{\hat{\lambda}}}\left(y^{(1)}, \dots, y^{(N)}\right), \tag{4.13}$$

where h_λ̂ is the random vector with entries f_λ̂(x^{(i)}; ξ⃗) + σ²ξ_{M+i}. When there is no data noise, i.e. σ = 0, this likelihood is degenerate [26]. Typically, computation of this likelihood requires a KDE step for each λ̂ to evaluate the high-dimensional PDF π_{h_λ̂}(·).

•marg : Marginal approximation of the exact likelihood

$$L_D(\hat{\lambda}) = \prod_{i=1}^{N} \pi_{h_{\hat{\lambda},i}}\left(y^{(i)}\right), \tag{4.14}$$

where h_{λ̂,i} is the i-th component of h_λ̂. This requires one-dimensional KDE estimates performed for all N dimensions.

•mvn : Multivariate normal approximation of the full likelihood

$$\log L_D(\hat{\lambda}) = -\frac{1}{2}(y - \mu_{\hat{\lambda}})^T \Sigma_{\hat{\lambda}}^{-1} (y - \mu_{\hat{\lambda}}) - \frac{N}{2}\log(2\pi) - \frac{1}{2}\log(\det \Sigma_{\hat{\lambda}}), \tag{4.15}$$

where the mean vector μ_λ̂ and covariance matrix Σ_λ̂ are defined as μ_λ̂^i = μ_λ̂(x^{(i)}) and (Σ_λ̂)_{ij} = E[(h_{λ̂,i} − μ_λ̂(x^{(i)}))(h_{λ̂,j} − μ_λ̂(x^{(j)}))], respectively.

•gausmarg : This likelihood further assumes independence in the Gaussian approximation, leading to

$$\log L_D(\hat{\lambda}) = -\sum_{i=1}^{N} \frac{\left(y^{(i)} - \mu_{\hat{\lambda}}(x^{(i)})\right)^2}{2\left(\sigma^2_{\hat{\lambda}}(x^{(i)}) + \sigma^2\right)} - \frac{1}{2}\sum_{i=1}^{N} \log\left(2\pi\left(\sigma^2_{\hat{\lambda}}(x^{(i)}) + \sigma^2\right)\right). \tag{4.16}$$

•abcm : This likelihood enforces the mean of f_λ̂ to match the mean of the data, within a tolerance ε:

$$\log L_D(\hat{\lambda}) = -\sum_{i=1}^{N} \frac{\left(y^{(i)} - \mu_{\hat{\lambda}}(x^{(i)})\right)^2}{2\epsilon^2} - \frac{1}{2}\log(2\pi\epsilon^2), \tag{4.17}$$

•abc : This likelihood enforces the mean of f_λ̂ to match the mean of the data, and the standard deviation to match the average spread of the data around the mean within some factor γ, for the same tolerance ε:

$$\log L_D(\hat{\lambda}) = -\sum_{i=1}^{N} \frac{\left(y^{(i)} - \mu_{\hat{\lambda}}(x^{(i)})\right)^2 + \gamma\left(\left|y^{(i)} - \mu_{\hat{\lambda}}(x^{(i)})\right| - \sqrt{\sigma^2_{\hat{\lambda}}(x^{(i)}) + \sigma^2}\right)^2}{2\epsilon^2} - \frac{1}{2}\log(2\pi\epsilon^2), \tag{4.18}$$

pce_eval:

This utility evaluates PC-related functions given an input file xdata.dat and returns the evaluations in an output file ydata.dat. The keyword options are:

Function types: -f <fcn_type>

•PC : Evaluates the function f(ξ⃗) = ∑_{k=0}^{K} c_k Ψ_k(ξ⃗) given a set of ξ⃗, the PC type, dimensionality, order and coefficients.

•PC_mi : Evaluates the function f(ξ⃗) = ∑_{k=0}^{K} c_k Ψ_k(ξ⃗) given a set of ξ⃗, the PC type, multiindex and coefficients.

•PCmap : Evaluates 'map' functions from a germ of one PC type to another. That is, PC1 to PC2 is a function η⃗ = f(ξ⃗) = C₂⁻¹(C₁(ξ⃗)), where C₁ and C₂ are the cumulative distribution functions (CDFs) associated with the PDFs of PC1 and PC2, respectively. For example, HG→LU is a map from a standard normal random variable to a uniform random variable on [−1, 1].
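As an illustration of the PCmap logic (not the pce_eval app itself), the HG→LU map η = C_LU⁻¹(C_HG(ξ)) reduces to η = 2Φ(ξ) − 1, where Φ is the standard normal CDF. A minimal sketch:

```python
from math import erf, sqrt

def hg_to_lu(xi):
    """Map a standard normal sample xi (Hermite-Gauss germ) to a
    Uniform[-1,1] sample (Legendre-Uniform germ) via eta = 2*Phi(xi) - 1."""
    phi = 0.5 * (1.0 + erf(xi / sqrt(2.0)))  # standard normal CDF
    return 2.0 * phi - 1.0
```

By construction, ξ = 0 maps to η = 0, and large |ξ| maps toward ±1.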

pce_quad:

This utility constructs a PC expansion from a given set of samples. Given a set of N samples {x^{(i)}}_{i=1}^{N} of a random d-variate vector X⃗, the goal is to build a PC expansion

$$\vec{X} \simeq \sum_{k=0}^{K} c_k \Psi_k(\vec{\xi}), \tag{4.19}$$

where d is the stochastic dimensionality, i.e. ξ⃗ = (ξ₁, . . . , ξ_d). We use the orthogonal projection method, i.e.

$$c_k = \frac{\langle \vec{X}\,\Psi_k(\vec{\xi})\rangle}{\langle \Psi_k^2(\vec{\xi})\rangle} = \frac{\langle \vec{G}(\vec{\xi})\,\Psi_k(\vec{\xi})\rangle}{\langle \Psi_k^2(\vec{\xi})\rangle}. \tag{4.20}$$

The denominator can be precomputed analytically or numerically with high precision. The key map G⃗(ξ⃗) in the numerator is constructed as follows. We employ the Rosenblatt transformation, constructed from shifted and scaled successive conditional cumulative distribution functions (CDFs),

$$\begin{aligned}
\eta_1 &= 2F_1(X_1) - 1\\
\eta_2 &= 2F_{2|1}(X_2|X_1) - 1\\
\eta_3 &= 2F_{3|2,1}(X_3|X_2, X_1) - 1 \qquad (4.21)\\
&\;\;\vdots\\
\eta_d &= 2F_{d|d-1,\dots,1}(X_d|X_{d-1},\dots,X_1) - 1,
\end{aligned}$$

which maps any joint random vector to a set of independent standard Uniform[−1, 1] random variables. The Rosenblatt transformation is the multivariate generalization of the well-known CDF transformation, which states that F(X) is uniformly distributed if F(·) is the CDF of the random variable X. The shorthand notation is η⃗ = R⃗(X⃗). Now denote the shifted and scaled univariate CDF of the 'germ' ξ_i by H(·), so that the CDF transformation reads H⃗(ξ⃗) = η⃗. For example, for Legendre-Uniform PC the germ itself is uniform and H(·) is the identity, while for Gauss-Hermite PC the function H(·) is a shifted and scaled version of the normal CDF. Now we can write the connection between X⃗ and ξ⃗ as

$$\vec{R}(\vec{X}) = \vec{H}(\vec{\xi}), \quad\text{or}\quad \vec{X} = \underbrace{\vec{R}^{-1}\circ\vec{H}}_{\vec{G}}(\vec{\xi}). \tag{4.22}$$

While the computation of H⃗ is done analytically or numerically with high precision, the main challenge is to estimate R⃗⁻¹. In practice the exact joint cumulative distribution F(x₁, . . . , x_d) is generally not available and is estimated using a standard Kernel Density Estimator (KDE) based on the available samples. Given N samples {x^{(i)}}_{i=1}^{N}, the KDE estimate of the joint probability density function is a sum of N multivariate Gaussian functions centered at each data point x^{(i)}:

$$p_{\vec{X}}(x) = \frac{1}{N\sigma^d (2\pi)^{d/2}} \sum_{i=1}^{N} \exp\left(-\frac{(x - x^{(i)})^T (x - x^{(i)})}{2\sigma^2}\right) \tag{4.23}$$

or

$$p_{X_1,\dots,X_d}(x_1,\dots,x_d) = \frac{1}{N\sigma^d (2\pi)^{d/2}} \sum_{i=1}^{N} \exp\left(-\frac{(x_1 - x^{(i)}_1)^2 + \dots + (x_d - x^{(i)}_d)^2}{2\sigma^2}\right), \tag{4.24}$$

where the bandwidth σ should be chosen to balance smoothness and accuracy; see [28, 29] for discussions of the choice of σ. Note that ideally σ should be chosen to be dimension-dependent; however, the current implementation uses the same bandwidth for all dimensions.
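Equation (4.24) can be transcribed directly. This minimal Python version (an illustration, not the UQTk implementation) evaluates the isotropic Gaussian KDE at a point x for a fixed bandwidth sigma:

```python
from math import exp, pi, sqrt

def kde_pdf(x, samples, sigma):
    """Isotropic Gaussian KDE, eq. (4.24): x is a d-vector, samples is a
    list of d-vectors, sigma is the common bandwidth for all dimensions."""
    d = len(x)
    n = len(samples)
    norm = n * (sigma ** d) * (2.0 * pi) ** (d / 2.0)
    total = sum(exp(-sum((xj - sj) ** 2 for xj, sj in zip(x, s)) / (2.0 * sigma ** 2))
                for s in samples)
    return total / norm
```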

Now the conditional CDF is KDE-estimated by

$$\begin{aligned}
F_{k|k-1,\dots,1}(x_k|x_{k-1},\dots,x_1) &= \int_{-\infty}^{x_k} p_{k|k-1,\dots,1}(x'_k|x_{k-1},\dots,x_1)\, dx'_k\\
&= \int_{-\infty}^{x_k} \frac{p_{k,\dots,1}(x'_k, x_{k-1},\dots,x_1)}{p_{k-1,\dots,1}(x_{k-1},\dots,x_1)}\, dx'_k\\
&\approx \frac{\displaystyle \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x_k} \sum_{i=1}^{N} \exp\left(-\frac{(x_1 - x^{(i)}_1)^2 + \dots + (x'_k - x^{(i)}_k)^2}{2\sigma^2}\right) dx'_k}{\displaystyle \sum_{i=1}^{N} \exp\left(-\frac{(x_1 - x^{(i)}_1)^2 + \dots + (x_{k-1} - x^{(i)}_{k-1})^2}{2\sigma^2}\right)}\\
&= \frac{\displaystyle \sum_{i=1}^{N} \exp\left(-\frac{(x_1 - x^{(i)}_1)^2 + \dots + (x_{k-1} - x^{(i)}_{k-1})^2}{2\sigma^2}\right) \Phi\left(\frac{x_k - x^{(i)}_k}{\sigma}\right)}{\displaystyle \sum_{i=1}^{N} \exp\left(-\frac{(x_1 - x^{(i)}_1)^2 + \dots + (x_{k-1} - x^{(i)}_{k-1})^2}{2\sigma^2}\right)}, \tag{4.25}
\end{aligned}$$

where Φ(z) is the CDF of a standard normal random variable. Note that the numerator in (4.25) differs from the denominator only by an extra factor Φ((x_k − x^{(i)}_k)/σ) in each summand, allowing an efficient computation scheme.
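A direct transcription of the final expression in (4.25), again an illustrative sketch rather than the UQTk code, could look like:

```python
from math import erf, exp, sqrt

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cond_cdf(xk, x_prev, samples, sigma):
    """KDE estimate of F_{k|k-1,...,1}(xk | x_prev), eq. (4.25).
    x_prev holds (x_1,...,x_{k-1}); samples is a list of d-vectors."""
    num = 0.0
    den = 0.0
    for s in samples:
        # Shared Gaussian weight coming from the conditioning variables
        w = exp(-sum((xj - sj) ** 2 for xj, sj in zip(x_prev, s)) / (2.0 * sigma ** 2))
        k = len(x_prev)  # index of the current dimension (0-based)
        num += w * normal_cdf((xk - s[k]) / sigma)
        den += w
    return num / den
```

For k = 1 (empty conditioning set) this reduces to the average of Φ((x₁ − x₁^{(i)})/σ) over the samples.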

The above Rosenblatt transformation maps the random vector x to a set of i.i.d. uniform random variables η⃗ = (η₁, . . . , η_d). However, formula (4.22) requires the inverse of the Rosenblatt transformation. Fortunately, the approximate conditional distributions are monotonic, hence they are guaranteed to have an inverse function, and it can be evaluated rapidly with a bisection method.
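The inversion step can be sketched with a simple bisection on any monotone CDF estimate F. This is an illustrative sketch, with the bracket [a, b] assumed wide enough to contain the solution:

```python
def invert_cdf(F, eta, a=-1e3, b=1e3, tol=1e-10, max_iter=200):
    """Solve F(x) = eta for x by bisection, assuming F is monotone
    increasing and the root lies inside [a, b]."""
    fa = F(a) - eta
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        fm = F(m) - eta
        if abs(fm) < tol or (b - a) < tol:
            return m
        if (fa < 0.0) == (fm < 0.0):
            a, fa = m, fm  # root is in the upper half
        else:
            b = m          # root is in the lower half
    return 0.5 * (a + b)
```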

With the numerical estimation of the map (4.22) available, we can proceed to evaluate the numerator of the orthogonal projection (4.20):

$$\langle \vec{G}(\vec{\xi})\Psi_k(\vec{\xi})\rangle = \int \vec{G}(\vec{\xi})\,\Psi_k(\vec{\xi})\,\pi_{\vec{\xi}}(\vec{\xi})\, d\vec{\xi}, \tag{4.26}$$

where π_ξ⃗(ξ⃗) is the PDF of ξ⃗. The projection integral (4.26) is computed via quadrature integration,

$$\int \vec{G}(\vec{\xi})\,\Psi_k(\vec{\xi})\,\pi_{\vec{\xi}}(\vec{\xi})\, d\vec{\xi} \approx \sum_{q=1}^{Q} \vec{G}(\vec{\xi}_q)\,\Psi_k(\vec{\xi}_q)\, w_q = \sum_{q=1}^{Q} \vec{R}^{-1}(\vec{H}(\vec{\xi}_q))\,\Psi_k(\vec{\xi}_q)\, w_q, \tag{4.27}$$

where (ξ⃗_q, w_q) are Gaussian quadrature point-weight pairs for the weight function π_ξ⃗(ξ⃗).

pce_resp:

This utility performs orthogonal projection given function evaluations at quadrature points, in order to arrive at polynomial chaos coefficients for a total-order PC expansion

$$f(\vec{\xi}) \approx \sum_{\|\alpha\|_1 \le p} c_\alpha \Psi_\alpha(\vec{\xi}) \equiv g(\vec{\xi}). \tag{4.28}$$

The orthogonal projection computed by this utility is

$$c_\alpha = \frac{1}{\langle\Psi_\alpha^2\rangle}\int f(\vec{\xi})\,\Psi_\alpha(\vec{\xi})\,\pi_{\vec{\xi}}(\vec{\xi})\, d\vec{\xi} \approx \frac{1}{\langle\Psi_\alpha^2\rangle}\sum_{q=1}^{Q} w_q\, f(\vec{\xi}^{(q)})\,\Psi_\alpha(\vec{\xi}^{(q)}). \tag{4.29}$$

Given the function evaluations f(ξ⃗^{(q)}) and a precomputed quadrature (ξ⃗^{(q)}, w_q), this utility outputs the PC coefficients c_α, the PC evaluations at the quadrature points g(ξ⃗^{(q)}), as well as, if requested by a command-line flag, a quadrature estimate of the relative L₂ error

$$\frac{\|f - g\|_2}{\|f\|_2} \approx \sqrt{\frac{\sum_{q=1}^{Q} w_q \left(f(\vec{\xi}^{(q)}) - g(\vec{\xi}^{(q)})\right)^2}{\sum_{q=1}^{Q} w_q\, f(\vec{\xi}^{(q)})^2}}. \tag{4.30}$$

Note that the selected quadrature may not compute this error accurately, since the integrands are squared and may be of higher polynomial degree than the quadrature can integrate accurately. In such cases, one can use the pce_eval app to evaluate the PC expansion separately and compare it to the function evaluations with an ℓ₂ norm instead.
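The projection formula (4.29) can be illustrated end-to-end with a tiny hand-coded example, assuming a 1D Legendre-Uniform PC (Legendre polynomials with uniform weight 1/2 on [−1, 1]) and a hardcoded 3-point Gauss-Legendre rule. This is a sketch of the math, not the pce_resp app:

```python
from math import sqrt

# 3-point Gauss-Legendre rule on [-1,1]; weights normalized to sum to 1,
# matching the uniform PDF pi(xi) = 1/2.
nodes = [-sqrt(3.0 / 5.0), 0.0, sqrt(3.0 / 5.0)]
weights = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

# Legendre basis up to order 2 and the norms <Psi_k^2> = 1/(2k+1)
psi = [lambda x: 1.0, lambda x: x, lambda x: 0.5 * (3.0 * x * x - 1.0)]
norms = [1.0, 1.0 / 3.0, 1.0 / 5.0]

def project(f, order=2):
    """Eq. (4.29): c_k = (1/<Psi_k^2>) * sum_q w_q f(xi_q) Psi_k(xi_q)."""
    return [sum(w * f(x) * psi[k](x) for x, w in zip(nodes, weights)) / norms[k]
            for k in range(order + 1)]

# f(xi) = xi^2 = (1/3) Psi_0 + (2/3) Psi_2 exactly
coeffs = project(lambda x: x * x)
```

Here coeffs recovers (1/3, 0, 2/3), since ξ² = Ψ₀/3 + 2Ψ₂/3 and the 3-point rule integrates degree-4 polynomials exactly.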

pce_rv:

This utility generates PC-related random variables (RVs). The keyword options are:

RV types: -w <type>

•PC : Generates samples of the univariate random variable ∑_{k=0}^{K} c_k Ψ_k(ξ⃗) given the PC type, dimensionality, order and coefficients.

•PCmi : Generates samples of the univariate random variable ∑_{k=0}^{K} c_k Ψ_k(ξ⃗) given the PC type, multiindex and coefficients.

•PCvar : Generates samples of the multivariate random variable ξ⃗ that is the germ of a given PC type and dimensionality.

pce_sens:

This utility evaluates Sobol sensitivity indices of a PC expansion with a given multiindex and coefficient vector. It computes main, total and joint sensitivities, as well as the variance fraction of each PC term individually. Given a PC expansion ∑_{α∈I} c_α Ψ_α(ξ⃗), the computed moments and sensitivity indices are:

•mean: m = c_0⃗

•total variance: V = ∑_{α≠0⃗} c_α²⟨Ψ_α²⟩

•variance fraction for the basis term α: V_α = c_α²⟨Ψ_α²⟩/V

•main Sobol sensitivity index for dimension i: S_i = (1/V) ∑_{α∈I_i^S} c_α²⟨Ψ_α²⟩, where I_i^S is the set of multiindices that include only dimension i.

•total Sobol sensitivity index for dimension i: S_i^T = (1/V) ∑_{α∈I_i^T} c_α²⟨Ψ_α²⟩, where I_i^T is the set of multiindices that include dimension i, among others.

•joint-total Sobol sensitivity index for the dimension pair (i, j): S_{ij}^T = (1/V) ∑_{α∈I_{ij}^T} c_α²⟨Ψ_α²⟩, where I_{ij}^T is the set of multiindices that include dimensions i and j, among others. Note that this is somewhat different from the conventional definition of joint sensitivity indices, which presumes terms that include only dimensions i and j.
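The index definitions above can be sketched in a few lines. This illustrative Python version (not the pce_sens app) takes a multiindex list, coefficients, and basis norms ⟨Ψ_α²⟩:

```python
def sobol_indices(mindex, coeffs, norms):
    """Main and total Sobol indices from PC data.
    mindex: list of multiindices (tuples of non-negative ints),
    coeffs: PC coefficients c_alpha, norms: <Psi_alpha^2> values."""
    dim = len(mindex[0])
    var_terms = [c * c * n for c, n in zip(coeffs, norms)]
    V = sum(v for a, v in zip(mindex, var_terms) if any(a))  # total variance
    S_main = [0.0] * dim
    S_total = [0.0] * dim
    for a, v in zip(mindex, var_terms):
        if not any(a):
            continue  # skip the mean term alpha = 0
        active = [i for i in range(dim) if a[i] > 0]
        for i in active:
            S_total[i] += v / V
        if len(active) == 1:
            S_main[active[0]] += v / V
    return S_main, S_total
```

For an additive expansion the main and total indices coincide and sum to one.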

pdf_cl:

Kernel density estimation (KDE) with Gaussian kernels, given a set of samples, to evaluate a probability density function (PDF). The procedure relies on an approximate nearest-neighbors algorithm with a fast improved Gauss transform to accelerate KDE by only computing Gaussians of relevant neighbors. Our tests have shown a 10-20x speedup compared to Python's default KDE package. The app also allows a clustering enhancement of the data set to enable cluster-specific bandwidth selection, which is particularly useful for multimodal data. The user provides the samples' file, and either a) the number of grid points per dimension for density evaluation, or b) a file with target points where the density is evaluated, or c) a file with hypercube limits within which the density is evaluated.

regression:

This utility performs regression with respect to linear parametric expansions such as PCs or RBFs. Consider a dataset (x^{(i)}, y^{(i)})_{i=1}^{N} that one tries to fit with a basis expansion:

$$y^{(i)} \approx \sum_{k=1}^{K} c_k P_k(x^{(i)}), \tag{4.31}$$

for a set of basis functions P_k(x). This is a linear regression problem, since the object of interest is the vector of coefficients c = (c₁, . . . , c_K), and the summation above is linear in c. This app provides various methods of obtaining the expansion coefficients, using different kinds of bases.

The key implemented command line options are

Basis types: -f <basistype>

•PC : Polynomial Chaos bases of total-order truncation

•PC_MI : Polynomial Chaos bases of custom multiindex truncation

•POL : Monomial bases of total-order truncation

•POL_MI : Monomial bases of custom multiindex truncation

•RBF : Radial Basis Functions, see e.g. [21]

Regression methods: -m <meth>

•lsq : Bayesian least-squares, see [25] and more details below.

•wbcs : Weighted Bayesian compressive sensing, see [27].

Although standard least squares is commonly used and well-documented elsewhere, we detail here the specific implementation in this app, including the Bayesian interpretation.

Define the data vector y = (y^{(1)}, . . . , y^{(N)}) and the measurement matrix P of size N×K with entries P_{ik} = P_k(x^{(i)}). The regularized least-squares problem is formulated as

$$\arg\min_{c}\ \underbrace{\|y - Pc\|^2 + \|\Lambda c\|^2}_{R(c)} \tag{4.32}$$

with the closed-form solution

$$\hat{c} = \underbrace{(P^T P + \Lambda)^{-1}}_{\Sigma}\, P^T y, \tag{4.33}$$

where Λ = diag(√λ₁, . . . , √λ_K) is a diagonal matrix of non-negative regularization weights λ_i ≥ 0.

The Bayesian analog of this, detailed in [25], infers the coefficient vector c and the data noise variance σ², given data y, employing Bayes' formula

$$\overbrace{p(c, \sigma^2|y)}^{\text{Posterior}} \propto \overbrace{p(y|c, \sigma^2)}^{\text{Likelihood}}\ \overbrace{p(c, \sigma^2)}^{\text{Prior}}. \tag{4.34}$$

The likelihood function is associated with the i.i.d. Gaussian noise model y − Pc ∼ N(0, σ²I_N), and is written as

$$p(y|c, \sigma^2) \equiv L_{c,\sigma^2}(y) = (2\pi\sigma^2)^{-\frac{N}{2}} \exp\left(-\frac{1}{2\sigma^2}\|y - Pc\|^2\right). \tag{4.35}$$

Further, the prior p(c, σ²) is written as a product of a zero-mean Gaussian prior on c and an inverse-gamma prior on σ²:

$$p(c, \sigma^2) = \underbrace{\left[\prod_{k=1}^{K}\left(\frac{\lambda_k}{2\pi}\right)^{\frac{1}{2}}\right] \exp\left(-\frac{1}{2}\|\Lambda c\|^2\right)}_{p(c)}\ \underbrace{(\sigma^2)^{-\alpha-1}\exp\left(-\frac{\beta}{\sigma^2}\right)}_{p(\sigma^2)}. \tag{4.36}$$

The posterior distribution then takes the form of a normal-scaled inverse gamma distribution which, after some rearranging, is best described as

$$p(c|\sigma^2, y) \sim MVN(\hat{c}, \sigma^2\Sigma), \tag{4.37}$$

$$p(\sigma^2|y) \sim IG\Big(\underbrace{\alpha + \frac{N-K}{2}}_{\alpha^*},\ \underbrace{\beta + \frac{R(\hat{c})}{2}}_{\beta^*}\Big), \tag{4.38}$$

where ĉ and Σ, as well as the residual R(·), are defined via the classical least-squares problem (4.32) and (4.33). Thus, the mean posterior value of the data variance is σ̂² = (β + R(ĉ)/2)/(α + (N−K)/2 − 1). Also, note that the residual can be written as R(ĉ) = yᵀ(I_N − P(PᵀP + Λ)⁻¹Pᵀ)y.

One can integrate out σ² from (4.36) to arrive at a multivariate t-distribution

$$p(c|y) \sim MVT\left(\hat{c},\ \frac{\beta^*}{\alpha^*}\Sigma,\ 2\alpha^*\right) \tag{4.39}$$

with mean ĉ and covariance (β*/(α*−1))Σ.

Now, defining P(x) = (P₁(x), . . . , P_K(x)), the pushed-forward process at new values x is a Student-t process with mean μ(x) = P(x)ĉ, scale C(x, x′) = (β*/α*) P(x)ΣP(x′)ᵀ and 2α* degrees of freedom.

Note that currently the Jeffreys prior p(σ²) ∝ 1/σ² is implemented, which corresponds to the case α = β = 0. We are currently implementing more flexible user-defined input for α and β. In particular, in the limit β = σ₀²α, α → ∞, one recovers the case of a fixed, predefined data noise variance σ₀².

sens:

This utility performs a series of tasks for the computation of Sobol indices. Some theoretical background on the statistical estimators employed here is given in Chapter 5. This utility can be used in conjunction with the utility trdSpls, which generates truncated normal or log-normal random samples. It can also be used to generate uniform random samples by selecting a truncated normal distribution and a suitably large standard deviation.

In addition to the -h ﬂag, it has the following command line options:

•-a <action>: Action to be performed by this utility

–splFO: assemble samples for ﬁrst order Sobol indices

–idxFO: compute ﬁrst order Sobol indices

–splTO: assemble samples for total order Sobol indices

–idxTO: compute total order Sobol indices

–splJnt: assemble samples for joint Sobol indices

–idxJnt: compute joint Sobol indices

•-d <ndim>: Number of dimensions

•-n <nspl>: Number of samples

•-u <spl1>: name of ﬁle holding the ﬁrst set of samples, nspl×ndim

•-v <spl2>: name of ﬁle holding the second set of samples, nspl×ndim


•-x <mev>: name of ﬁle holding model evaluations

•-p <pfile>: name of ﬁle possibly holding a custom list of parameters for Sobol

indices


Chapter 5

Examples

The primary intended use for UQTk is as a library that provides UQ functionality to numerical simulations. To aid the development of UQ-enabled simulation codes, some examples of programs that perform common UQ operations with UQTk are provided with the distribution. These examples can serve as templates to be modified for the user's purposes. In some cases, e.g. in sampling-based approaches where the simulation code is used as a black-box entity, the examples may provide enough functionality to be used directly, with only minor adjustments. Below is a brief description of the main examples that are currently in the UQTk distribution. For all of these, make sure the environment variable UQTK_INS is set and points to the upper-level directory of the UQTk installation, e.g. the keyword installdir described in the installation section. This path also needs to be added to the environment variable PYTHONPATH.

Elementary Operations

Overview

This set of examples is located under examples/ops. It illustrates the use of UQTk

for elementary operations on random variables that are represented with Polynomial Chaos

(PC) expansions.

Description

This example can be run from command-line:

./Ops.x

followed by

./plot_pdf.py samples.a.dat

./plot_pdf.py samples.loga.dat

to plot selected probability distributions based on samples from the Polynomial Chaos Expansions (PCE) utilized in this example.

Ops.x step-by-step

•Wherever relevant, the PCSet class implements functions that take either "double *" arguments or array container arguments. The array containers, named "Array1D", "Array2D", and "Array3D", respectively, are provided with the UQTk library to streamline the management of data structures.

1. Instantiate a PCSet class for a 2nd-order 1D PCE using Hermite-Gauss chaos.

int ord = 2;
int dim = 1;
PCSet myPCSet("ISP",ord,dim,"HG");

2. Initialize the coefficients of the HG PCE â given its mean and standard deviation:

double ma = 2.0; // Mean
double sa = 0.1; // Std Dev
myPCSet.InitMeanStDv(ma,sa,a);

$$\hat{a} = \sum_{k=0}^{P} a_k\Psi_k(\xi), \qquad a_0 = \mu, \quad a_1 = \frac{\sigma}{\sqrt{\langle\psi_1^2\rangle}}, \quad a_2 = a_3 = \dots = 0$$

3. Initialize b̂ = 2.0ψ₀(ξ) + 0.2ψ₁(ξ) + 0.01ψ₂(ξ) and subtract b̂ from â:

b[0] = 2.0;
b[1] = 0.2;
b[2] = 0.01;
myPCSet.Subtract(a,b,c);

The subtraction is a term-by-term operation: c_k = a_k − b_k.

4. Product of PCEs, ĉ = â · b̂:

myPCSet.Prod(a,b,c);

$$\hat{c} = \sum_{k=0}^{P} c_k\Psi_k(\xi) = \left(\sum_{k=0}^{P} a_k\Psi_k(\xi)\right)\left(\sum_{k=0}^{P} b_k\Psi_k(\xi)\right), \qquad c_k = \sum_{i=0}^{P}\sum_{j=0}^{P} C_{ijk}\, a_i b_j, \quad C_{ijk} = \frac{\langle\psi_i\psi_j\psi_k\rangle}{\langle\psi_k^2\rangle}$$

The triple product C_{ijk} is computed and stored when the PCSet class is instantiated.
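For Hermite-Gauss chaos the triple products have a closed form: E[ψ_iψ_jψ_k] = i!j!k!/((s−i)!(s−j)!(s−k)!) with s = (i+j+k)/2 whenever i+j+k is even and s ≥ max(i, j, k), zero otherwise, and ⟨ψ_k²⟩ = k!. A small Python sketch of the Galerkin product (illustrating the math above, not the UQTk code):

```python
from math import factorial

def triple(i, j, k):
    """E[psi_i psi_j psi_k] for probabilists' Hermite polynomials."""
    if (i + j + k) % 2:
        return 0.0
    s = (i + j + k) // 2
    if s < max(i, j, k):
        return 0.0
    return (factorial(i) * factorial(j) * factorial(k) /
            (factorial(s - i) * factorial(s - j) * factorial(s - k)))

def galerkin_product(a, b):
    """c_k = sum_ij a_i b_j <psi_i psi_j psi_k> / <psi_k^2>, <psi_k^2> = k!."""
    P = len(a)
    return [sum(a[i] * b[j] * triple(i, j, k)
                for i in range(P) for j in range(P)) / factorial(k)
            for k in range(P)]

# Product of a-hat = 2 + 0.1*psi_1 and b-hat = 2 + 0.2*psi_1 + 0.01*psi_2
c = galerkin_product([2.0, 0.1, 0.0], [2.0, 0.2, 0.01])
```

The mean term comes out as c₀ = a₀b₀ + a₁b₁⟨ψ₁²⟩ + a₂b₂⟨ψ₂²⟩ = 4.02 for these coefficients.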


5. The exponential of a PCE, ĉ = exp(â), is computed using a Taylor series approach:

myPCSet.Exp(a,c);

$$\hat{c} = \exp(\hat{a}) = \exp(a_0)\left(1 + \sum_{n=1}^{N_T} \frac{\hat{d}^n}{n!}\right) \tag{5.1}$$

where

$$\hat{d} = \hat{a} - a_0 = \sum_{k=1}^{P} a_k\Psi_k(\xi). \tag{5.2}$$

The number of terms N_T in the Taylor series expansion is incremented adaptively until an error criterion is met (relative magnitude of coefficients compared to the mean) or the maximum number of terms is reached. Currently, the default relative tolerance and maximum number of Taylor terms are 10⁻⁶ and 500, respectively. These values can be changed by the user via the public PCSet methods SetTaylorTolerance and SetTaylorTermsMax.

6. Division, ĉ = â/b̂:

myPCSet.Div(a,b,c);

Internally the division operation is cast as a linear system (see item 4), â = b̂ · ĉ, with unknown coefficients c_k and known coefficients a_k and b_k. The linear system is sparse and is solved with the GMRES iterative solver provided by NETLIB.

7. Natural logarithm, ĉ = log(â):

myPCSet.Log(a,c);

Currently, two methodologies are implemented to compute the logarithm of a PCE: a Taylor series expansion and an integration approach. For more details see Debusschere et al. [3].

8. Draw samples from the random variable â represented as a PCE:

myPCSet.DrawSampleSet(aa,aa_samp);

Currently "Ops.x" draws samples from both â and log(â) and saves the results to files "samples.a.dat" and "samples.loga.dat", respectively.

9. The directory contains a Python script that computes probability distributions from samples via Kernel Density Estimation (KDE; also see Lecture #1) and generates two plots, "samples.a.dat.pdf" and "samples.loga.dat.pdf", also shown in Fig. 5.1.

Figure 5.1: Probability densities for â and log(â) computed via KDE. Results generated using several KDE bandwidths. This feature is available in Python's SciPy package starting with version 0.11.

Forward Propagation of Uncertainty

Overview

•Located in examples/surf_rxn

•Several examples of propagating uncertainty in input parameters through a model for surface reactions, consisting of three Ordinary Differential Equations (ODEs). Two approaches are illustrated:

–Direct linking to the C++ UQTk libraries from a C++ simulation code:

∗Propagation of input uncertainties with Intrusive Spectral Projection (ISP), Non-Intrusive Spectral Projection (NISP) via quadrature, and NISP via Monte Carlo (MC) sampling.

∗For more documentation, see the detailed description below.

∗The example can be run with ./forUQ_sr.py

–Using the simulation code as a black-box forward model:

∗Propagation of uncertainty in one input parameter with the NISP quadrature approach.

∗For more documentation, see the detailed description below.

∗The example can be run with ./forUQ_BB_sr.py

Simulation Code Linked to UQTk Libraries

The example script forUQ_sr.py provided with this example can perform parametric uncertainty propagation using three methods:

•NISP: Non-intrusive spectral projection using quadrature integration

•ISP: Intrusive spectral projection

•NISP MC: Non-intrusive spectral projection using Monte Carlo integration

The command-line usage for this example is

./forUQ_sr.py <pctype> <pcord> <method1> [<method2>] [<method3>]

The script requires the xml input template file forUQ_surf_rxn.in.xml.templ. In this template, the default setting for parameter b is an uncertain normal random variable with a standard deviation set to 10% of the mean.

The following parameters are defined at the beginning of the file:

•pctype: The type of PC; supports 'HG', 'LU', 'GLG', 'JB'

•pcord: The order of the output PC expansion

•methodX: NISP, ISP or NISP MC

•nsam: Number of samples requested for NISP Monte Carlo (currently hardwired in the script)

Description of the Non-Intrusive Spectral Projection utilities (SurfRxnNISP.cpp and SurfRxnNISP_MC.cpp)

$$f(\vec{\xi}) = \sum_{k} c_k\Psi_k(\vec{\xi}), \qquad c_k = \frac{\langle f(\vec{\xi})\Psi_k(\vec{\xi})\rangle}{\langle\Psi_k^2(\vec{\xi})\rangle}$$

$$\langle f(\vec{\xi})\Psi_k(\vec{\xi})\rangle = \int f(\vec{\xi})\Psi_k(\vec{\xi})\,\pi(\vec{\xi})\, d\vec{\xi} \approx \underbrace{\left[\sum_{q} f(\vec{\xi}_q)\Psi_k(\vec{\xi}_q)\, w_q\right]}_{\text{NISP}} \quad\text{or}\quad \underbrace{\left[\frac{1}{N}\sum_{s} f(\vec{\xi}_s)\Psi_k(\vec{\xi}_s)\right]}_{\text{NISP MC}}$$

These codes implement the following workflow:

1. Read the XML file

2. Create a PC object with or without quadrature

•NISP: PCSet myPCSet("NISP",order,dim,pcType,0.0,1.0)

•NISP MC: PCSet myPCSet("NISPnoq",order,dim,pcType,0.0,1.0)

3. Get the quadrature points or generate Monte Carlo samples

•NISP: myPCSet.GetQuadPoints(qdpts)

•NISP MC: myPCSet.DrawSampleVar(samPts)

4. Create input PC objects and evaluate the input parameters corresponding to the quadrature points

5. Step forward in time:

- Collect values for all input parameter samples

- Perform Galerkin projection or Monte Carlo integration

- Write the PC modes and the derived first two moments to files

Description of the Intrusive Spectral Projection utility (SurfRxnISP.cpp)

This code implements the following workflow:

1. Read the XML file

2. Create a PC object for intrusive propagation

PCSet myPCSet("ISP",order,dim,pcType,0.0,1.0)

3. Represent the state variables and all parameters with their PC coefficients

•u → {u_k}, v → {v_k}, w → {w_k}, z → {z_k},

•a → {a_k}, b → {b_k}, c → {c_k}, d → {d_k}, e → {e_k}, f → {f_k}.

4. Step forward in time according to PC arithmetic, e.g. a·u → {(a·u)_k} with

$$a\cdot u = \left(\sum_i a_i\Psi_i(\vec{\xi})\right)\left(\sum_j u_j\Psi_j(\vec{\xi})\right) = \sum_k \underbrace{\left(\sum_{i,j} a_i u_j \frac{\langle\Psi_i\Psi_j\Psi_k\rangle}{\langle\Psi_k^2\rangle}\right)}_{(a\cdot u)_k} \Psi_k(\vec{\xi})$$

Postprocessing Utilities - time series

./plSurfRxnMstd.py NISP
./plSurfRxnMstd.py ISP
./plSurfRxnMstd.py NISP_MC

These commands plot the time series of the mean and standard deviation of all three species for each of the three methods. Sample results are shown in Fig. 5.2.

Figure 5.2: Time series of mean and standard deviations for u, v, and w with NISP, ISP, and NISP MC, respectively.

Postprocessing Utilities - PDFs

./plPDF_method.py <species> <qoi> <pctype> <pcord> <method1> [<method2>] [<method3>]

e.g.

./plPDF_method.py u ave HG 3 NISP ISP

This script samples the PC representations, then computes the PDFs of the time average (ave) or the final time value (tf) for all three species. Sample results are shown in Fig. 5.3.

Figure 5.3: PDFs for u, v, and w. The top row shows results for time-averaged u, v, and w; the bottom row shows results corresponding to values at the last integration step (final time).

Simulation Code Employed as a Black Box

The command-line usage for the script implementing this example is

./forUQ_BB_sr.py --nom nomvals -s stdfac -d dim -l lev -o ord -q sp --npdf npdf --npces npces

The following parameters can be controlled by the user:

•nomvals: List of nominal parameter values, separated by commas if more than one value, with no spaces. Default is one value, 20.75

•stdfac: Ratio of standard deviation to nominal parameter values. Default value: 0.1

•dim: Number of uncertain input parameters. Currently this example can only handle dim = 1

•lev: Number of quadrature points per dimension (for full quadrature) or sparsity level (for sparse quadrature). Default value: 21

•ord: PCE order. Default value: 20

•sp: Quadrature type, "full" or "sparse". Default value: "full"

•npdf: Number of grid points for Kernel Density Estimate evaluations of the output model PDFs. Default value: 100

•npces: Number of PCE evaluations used to estimate output densities. Default value: 10⁵

Note: This example assumes Hermite-Gauss chaos for the model input parameters.

This script uses the following utilities, located in the bin directory under the UQTk installation path:

•generate_quad: Generates quadrature points for full/sparse quadrature and several types of rules.

•pce_rv: Generates samples from a random variable defined by a Polynomial Chaos expansion (PCE).

•pce_eval: Evaluates a PCE for germ samples saved in the input file "xdata.dat".

•pce_resp: Constructs a PCE by Galerkin projection.

Sequence of computations:

1. forUQ_BB_sr.py saves the input parameters' nominal values and standard deviations in a diagonal matrix format in the file "pcfile". First it saves the matrix of nominal values, then the matrix of standard deviations. This information is sufficient to define a PCE for a normal random variable in terms of a standard normal germ. For a one-parameter problem, this file has two lines.

2. generate_quad: Generates quadrature points for full/sparse quadrature and several types of rules. The usage with the default script arguments is

generate_quad -d1 -g'HG' -xfull -p21 > logQuad.dat

This generates Hermite-Gauss quadrature points for a 21-point rule in one dimension. Quadrature point locations are saved in "qdpts.dat", weights in "wghts.dat", and indices of points in the 1D space in "indices.dat". At the end of the "generate_quad" execution, the file "qdpts.dat" is copied over "xdata.dat".

3. pce_eval: Evaluates the PCE of the input parameters at the quadrature points saved previously in "xdata.dat". The evaluation is dimension by dimension, and for each dimension the corresponding column from "pcfile" is saved in "pccf.dat". See the command-line arguments below.

pce_eval -x'PC' -p1 -q1 -f'pccf.dat' -sHG >> logEvalInPC.dat

At the end of this computation, the file "input.dat" contains a matrix of PCE evaluations. The number of lines is equal to the number of quadrature points, and the number of columns to the dimensionality of the input parameter space.

4. Model evaluations:

funcBB("input.dat","output.dat",xmltpl="surf_rxn.in.xml.tp3",
xmlin="surf_rxn.in.xml")

The Python function "funcBB" is defined in the file "prob3_utils.py". It evaluates the forward model at the sets of input parameters in the file "input.dat" and saves the model output in "output.dat". For each model evaluation, specific parameters are inserted in the xml file "surf_rxn.in.xml", which is a copy of the template "surf_rxn.in.xml.tp3". At the end, "output.dat" is copied over "ydata.dat".

5. pce_resp:

pce_resp -xHG -o20 -d1 -e > logPCEresp.dat

Computes a Hermite-Gauss PCE of the model output via Galerkin projection. The model evaluations are taken from "ydata.dat", and the quadrature point locations from "xdata.dat". PCE coefficients are saved in "PCcoeff_quad.dat", the multi-index list in "mindex.dat", and these files are pasted together in "mipc.dat".

(Figure: average u as a function of parameter b; quadrature point locations are shown with circles.)

6. pce_rv:

pce_rv -w'PCvar' -xHG -d1 -n100 -p1 -q0 > logPCrv.dat

Draws 100 samples from the germ of the HG PCE. Samples are saved in the file "rvar.dat" and also copied to the file "xdata.dat".

7. pce_eval:

pce_eval -x'PC' -p1 -q1 -f'pccf.dat' -sHG >> logEvalInPCrnd.dat

See item 3 for details. Results are saved in "input_val.dat".

8. Evaluate both the forward model (through the black-box script "funcBB", see item 4) and its PCE surrogate (see item 3) and save the results to files "output_val.dat" and "output_val_pc.dat". Compute the L₂ error between the two sets of values using the function "compute_err" defined in "utils.py".

9. Sample the output PCE and plot the PDF of these samples, computed using either a Kernel Density Estimate approach with several kernel bandwidths or by binning.

Numerical Integration

Overview

This example is located in examples/num_integ. It contains a collection of Python scripts that can be used to perform numerical integration of six Genz functions: oscillatory, exponential, continuous, Gaussian, corner-peak, and product-peak. Quadrature and Monte Carlo integration methods are both employed in this example.

Theory

In uncertainty quantification, forward propagation of uncertain inputs often involves evaluating integrals that cannot be computed analytically. Such integrals can be approximated numerically using either a random or a deterministic sampling approach. Of the two integration methods implemented in this example, quadrature methods are deterministic while Monte Carlo methods are random.

Quadrature Integration

The general quadrature rule for integrating a function u(ξ) is given by

$$\int u(\xi)\, d\xi \approx \sum_{i=1}^{N_q} q_i\, u(\xi_i) \tag{5.3}$$

where the N_q points ξ_i are quadrature points with corresponding weights q_i.

The accuracy of quadrature integration relies heavily on the choice of the quadrature points. There are countless quadrature rules that can be used to generate quadrature points, such as Gauss-Hermite, Gauss-Legendre, and Clenshaw-Curtis.

When performing quadrature integration, one can use either full tensor-product or sparse quadrature methods. While full tensor-product quadrature methods are effective for functions of low dimension, they suffer from the curse of dimensionality: integrating a function of dimension d with N quadrature points per dimension requires N^d quadrature points in total. Thus, for functions of high dimension the number of quadrature points required quickly becomes too large for these methods to be practical. Therefore, in higher dimensions sparse quadrature approaches, which require far fewer points, are utilized. When performing sparse quadrature integration, rather than choosing the number of quadrature points per dimension, a level is selected. Once a level is selected, the total number of quadrature points is determined by the dimension of the function.

Monte Carlo Integration

One random sampling approach that can be used to evaluate integrals numerically is

Monte Carlo integration. To use Monte Carlo integration methods to evaluate the integral

of a general function u(ξ) on the d-dimensional unit hypercube [0,1]^d, the following equation can be used:

∫ u(ξ) dξ ≈ (1/N_s) ∑_{i=1}^{N_s} u(ξ_i)    (5.4)

The N_s points ξ_i are random sampling points chosen from the region of integration according to

the distribution of the inputs. In this example, we are assuming the inputs have uniform


distribution. One advantage of using Monte Carlo integration is that any number of sam-

pling points can be used, while quadrature integration methods require a certain number of

sampling points. One disadvantage of using Monte Carlo integration methods is that there

is slow convergence. However, this O(1/√N_s) convergence rate is independent of the dimension of the integral.
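A minimal NumPy sketch of the Monte Carlo estimate (5.4) (illustration only; the integrand, seed, and sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_integrate(f, ndim, nsamples):
    """Monte Carlo estimate (equation 5.4) of the integral of f over [0,1]^ndim."""
    xi = rng.random((nsamples, ndim))   # uniform random samples in [0,1]^ndim
    return np.mean(f(xi))

# Example: integrate exp(x_1 + x_2 + x_3) over [0,1]^3; exact value is (e - 1)^3
approx = mc_integrate(lambda x: np.exp(x.sum(axis=1)), ndim=3, nsamples=200_000)
```

Doubling the number of samples reduces the expected error only by a factor of √2, illustrating the slow O(1/√N_s) convergence.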

Genz Functions

The functions being integrated in this example are six Genz functions, and they are

integrated over the d-dimensional [0,1]d. These functions, along with their exact integrals,

are defined as follows. The Genz parameters w_i represent weight parameters and u_i represent shift parameters. In the current example, the parameters w_i and u_i are set to 1, with one exception: w_i and u_i are instead set to 0.1 for the Corner-peak function in the sparse_quad.py file.

Model, formula f(λ), and exact integral ∫_{[0,1]^d} f(λ) dλ:

Oscillatory:    f(λ) = cos(2πu_1 + ∑_{i=1}^d w_i λ_i)
                exact: cos(2πu_1 + (1/2)∑_{i=1}^d w_i) ∏_{i=1}^d (2/w_i) sin(w_i/2)

Exponential:    f(λ) = exp(∑_{i=1}^d w_i(λ_i − u_i))
                exact: ∏_{i=1}^d (1/w_i)(exp(w_i(1 − u_i)) − exp(−w_i u_i))

Continuous:     f(λ) = exp(−∑_{i=1}^d w_i |λ_i − u_i|)
                exact: ∏_{i=1}^d (1/w_i)(2 − exp(−w_i u_i) − exp(w_i(u_i − 1)))

Gaussian:       f(λ) = exp(−∑_{i=1}^d w_i^2 (λ_i − u_i)^2)
                exact: ∏_{i=1}^d (√π/(2w_i))(erf(w_i(1 − u_i)) + erf(w_i u_i))

Corner-peak:    f(λ) = (1 + ∑_{i=1}^d w_i λ_i)^{−(d+1)}
                exact: 1/(d! ∏_{i=1}^d w_i) ∑_{r ∈ {0,1}^d} (−1)^{‖r‖_1} / (1 + ∑_{i=1}^d w_i r_i)

Product-peak:   f(λ) = ∏_{i=1}^d w_i^2 / (1 + w_i^2 (λ_i − u_i)^2)
                exact: ∏_{i=1}^d w_i (arctan(w_i(1 − u_i)) + arctan(w_i u_i))
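As an illustration (not UQTk code), the exponential Genz function and its exact-integral formula from the table can be checked against a brute-force midpoint rule:

```python
import numpy as np

def genz_exp(lam, w, u):
    """Exponential Genz function: f(lambda) = exp(sum_i w_i (lambda_i - u_i))."""
    return np.exp((lam - u) @ w)

def genz_exp_exact(w, u):
    """Exact integral of the exponential Genz function over [0,1]^d."""
    return np.prod((np.exp(w * (1.0 - u)) - np.exp(-w * u)) / w)

# Check the formula against a dense midpoint rule in d = 3 with w_i = u_i = 1
d, n = 3, 50
w = np.ones(d)
u = np.ones(d)
pts1d = (np.arange(n) + 0.5) / n                    # midpoints of n cells in [0,1]
grids = np.meshgrid(*([pts1d] * d), indexing="ij")
lam = np.stack([g.ravel() for g in grids], axis=1)  # n^d points in [0,1]^d
midpoint = np.mean(genz_exp(lam, w, u))
```

The other five Genz functions can be checked the same way by swapping in their formulas.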

Implementation

The script set consists of three ﬁles:

•full_quad.py: a script to compare full quadrature and Monte Carlo integration meth-

ods.

•sparse_quad.py: a script to compare sparse quadrature and Monte Carlo integration

methods.

•quad_tools.py: a script containing functions called by full_quad.py and sparse_quad.py.


full quad.py

This script will produce a graph comparing full quadrature and Monte Carlo integration

methods. Use the command ./full_quad.py to run this ﬁle. Upon running the ﬁle, the

user will be prompted to select a model from the Genz functions listed.

Please enter desired model from choices:

genz_osc

genz_exp

genz_cont

genz_gaus

genz_cpeak

genz_ppeak

The six functions listed correspond to the Genz functions deﬁned above. After the user

selects the desired model, he/she will be prompted to enter the desired dimension.

Please enter desired dimension:

The dimension should be entered as an integer without any decimal points. As full quadra-

ture integration is being implemented, this script should not be used for functions of high di-

mension. If you wish to integrate a function of high dimension, instead use sparse_quad.py.

After the user enters the desired dimension, she/he will be prompted to enter the desired

maximum number of quadrature points per dimension.

Enter the desired maximum number of quadrature points per dimension:

Again, this number should be entered as an integer without any decimal points. Several

quadrature integrations will be performed, with the ﬁrst beginning with 1 quadrature point

per dimension. For subsequent quadrature integrations, the number of quadrature points

will be incremented by one until the maximum number of quadrature points per dimension,

as speciﬁed by the user, is reached. For example, if the user has requested a maximum

of 4 quadrature points per dimension, 4 quadrature integrations will be performed: one

with 1 quadrature point per dimension, another with 2 quadrature points per dimension, a

third with 3 quadrature points per dimension, and a fourth with 4 quadrature points per

dimension.

Next, the script will call the function generate_qw from the quad_tools.py script to

generate quadrature points as well as the corresponding weights.

Then, the exact integral for the chosen function is computed by calling the integ_exact

function in quad_tools.py. This function calculates the exact integral according to the


formulas found in the above Theory section. The error between the exact integral and the

quadrature approximation is then calculated and stored in a list of errors.

Now, for each quadrature integration performed, a Monte Carlo integration is also per-

formed with the same number of sampling points as the total number of quadrature points.

To account for the random nature of the Monte Carlo sampling approach, ten Monte Carlo

integrations are performed and their errors from the exact integral are averaged. To perform

these Monte Carlo integrations and calculate the error in these approximations, the function

find_error found in quad_tools.py is called. Although we are integrating over [0,1]^d, the

sampling points will be uniformly random points in [−1,1]^d. We do this so the same function

func can be used to evaluate the model at these points and the quadrature points, which

are generated in [−1,1]^d. The function func takes points in [−1,1]^d as input and maps these

points to points in [0,1]^d before the function is evaluated at these new points.
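The mapping described above is presumably the affine map ξ ↦ (ξ + 1)/2; a small sketch (the helper name map_to_unit is hypothetical; in quad_tools.py this mapping happens inside func):

```python
import numpy as np

def map_to_unit(pts):
    """Affine map from [-1,1]^d to [0,1]^d, applied before the model is evaluated.
    (Hypothetical helper; shown only to illustrate the transformation.)"""
    return 0.5 * (pts + 1.0)

pts = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
mapped = map_to_unit(pts)   # rows become [0, 0], [0.5, 0.5], [1, 1]
```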

Finally, the data from both the quadrature and Monte Carlo integrations are plotted.

A log-log graph is created that displays the total number of sampling points versus the

absolute error in the integral approximation. The graph will be displayed and will be saved

as quad_vs_mc.pdf as well.

sparse quad.py

This script is similar to the full_quad.py ﬁle and will produce a graph comparing

sparse quadrature and Monte Carlo integration methods. Sparse quadrature integration

rules should be utilized for functions of high dimension, as they do not obey full tensor

product rules. Use the command ./sparse_quad.py to run this script. Upon running the

ﬁle, the user will be prompted to select a model from the Genz functions listed.

Please enter desired model from choices:

genz_osc

genz_exp

genz_cont

genz_gaus

genz_cpeak

genz_ppeak

After the user selects the desired model, he/she will be prompted to enter the desired di-

mension.

Please enter desired dimension:

The dimension should be entered as an integer without any decimal points. After the user

enters the desired dimension, she/he will be prompted to enter the maximum desired level.


Enter the maximum desired level:

Again, this number should be entered as an integer without any decimal points. Multiple

quadrature integrations will be performed, with the ﬁrst beginning at level 1. For subsequent

quadrature integrations, the level will increase by one until the maximum desired level, as

speciﬁed by the user, is reached.

Next, the script will call the function generate_qw from the quad_tools.py script to

generate quadrature points as well as the corresponding weights.

Then, the exact integral for the chosen function is computed by calling the integ_exact

function in quad_tools.py. The error between the exact integral and the quadrature ap-

proximation is then calculated and stored in a list of errors.

Now, for each quadrature integration performed, a Monte Carlo integration is also per-

formed with the same number of sampling points as the total number of quadrature points.

This is done in the same manner as in the full_quad.py script.

Lastly, the data from both the sparse quadrature and Monte Carlo integration are plotted.

A log-log graph is created that displays the total number of sampling points versus the

absolute error in the integral approximation. The graph will be displayed and will be saved

as sparse_quad.pdf.

quad tools.py

This script contains four functions called by the full_quad.py and sparse_quad.py

ﬁles.

•generate_qw(ndim,param,sp=’full’,type=’LU’): This function generates quadra-

ture points and corresponding weights. The quadrature points will be generated in the

d-dimensional hypercube [−1,1]^d.

–ndim: The number of dimensions as speciﬁed by the user.

–param: Equal to the number of quadrature points per dimension when full quadra-

ture integration is being performed. When sparse quadrature integration is being

performed, param represents the level.

–sp: The sparsity, which can be set to either full or sparse. The default is set as

sp=’full’, and to change to sparse quadrature one can pass sp=’sparse’ as a

parameter to the function.

–type: The quadrature type. The default rule is Legendre-Uniform (’LU’). To

change the quadrature type, one can pass a diﬀerent type to the function. For ex-

ample, to change to a Gauss-Hermite quadrature rule, pass type=’HG’ to the func-

tion. For a complete list of the available quadrature types see the generate_quad

subsection in the Applications section of Chapter 3 of the manual.


•func(xdata,model,func_params): This function evaluates the Genz functions at the

selected sampling points.

–xdata: These will either be the quadrature points generated by generate_qw

or the uniform random points generated in the find_error function. The points

specified as xdata into this function will be in [−1,1]^d and thus will first be mapped

to points in [0,1]^d before the function can be evaluated at these new points.

–model: The Genz function speciﬁed by the user.

–func params: The parameters, w_i and u_i, of the Genz function selected. In the

full_quad.py ﬁle, all Genz parameters are set to 1. In the sparse_quad.py

ﬁle, all Genz parameters are set to 1 for all models except genz_cpeak. For the

genz_cpeak model, the Genz parameters are set to 0.1.

•integ_exact(model,func_params): This function computes the exact integral ∫_{[0,1]^d} f(λ) dλ of the selected Genz function, f(λ).

–model: The Genz function selected by the user.

–func params: The parameters, w_i and u_i, of the Genz function selected. In the

full_quad.py ﬁle, all Genz parameters are set to 1. In the sparse_quad.py

ﬁle, all Genz parameters are set to 1 for all models except genz_cpeak. For the

genz_cpeak model, the Genz parameters are set to 0.1.

•find_error: This function performs 10 Monte Carlo integrations, and returns their

average error from the exact integral. The function takes inputs: pts, ndim, model,

integ ex, and func params.

–pts: The number of uniform random points that will be generated. Equal to the

total number of quadrature points used.

–ndim: The number of dimensions as speciﬁed by the user.

–model: The Genz function selected by the user.

–integ ex: The exact integral ∫_{[0,1]^d} f(λ) dλ of the selected Genz function, returned by the integ_exact function.

–func params: The parameters, w_i and u_i, of the Genz function selected. In the

full_quad.py ﬁle, all Genz parameters are set to 1. In the sparse_quad.py

ﬁle, all Genz parameters are set to 1 for all models except genz_cpeak. For the

genz_cpeak model, the Genz parameters are set to 0.1.
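The averaging logic described for find_error can be sketched as follows (a simplified stand-in, not the actual quad_tools.py code; the function f here plays the role of func, mapping [−1,1]^d to [0,1]^d internally):

```python
import numpy as np

rng = np.random.default_rng(0)

def average_mc_error(f, exact, npts, ndim, nruns=10):
    """Run nruns Monte Carlo integrations with npts uniform samples in [-1,1]^ndim
    and return the average absolute error against the exact integral."""
    errors = []
    for _ in range(nruns):
        xi = rng.uniform(-1.0, 1.0, size=(npts, ndim))
        errors.append(abs(np.mean(f(xi)) - exact))
    return np.mean(errors)

# f maps [-1,1]^d to [0,1]^d, then evaluates exp(lambda_1 + lambda_2)
f = lambda x: np.exp(0.5 * (x + 1.0)).prod(axis=1)
err = average_mc_error(f, exact=(np.e - 1.0) ** 2, npts=5000, ndim=2)
```

Averaging over ten runs, as the script does, smooths out the run-to-run scatter inherent in random sampling.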


Sample Results

Try running the full_quad.py ﬁle with the following input:

Please enter desired model from choices:

genz_osc

genz_exp

genz_cont

genz_gaus

genz_cpeak

genz_ppeak

genz_exp

Please enter desired dimension: 5

Enter the desired maximum number of quadrature points per dimension: 10

Your graph should look similar to the one in the ﬁgure below. Although the Monte Carlo

integration curve may vary due to the random nature of the sampling, your quadrature curve

should be identical to the one pictured.

Figure 5.4: Sample results of full quad.py


Now try running the sparse_quad.py ﬁle with the following input:

Please enter desired model from choices:

genz_osc

genz_exp

genz_cont

genz_gaus

genz_cpeak

genz_ppeak

genz_cont

Please enter desired dimension: 14

Enter the maximum desired level: 4

While the quadrature integrations are being performed, the current level will be printed to

your screen. Your graph should look similar to the ﬁgure below. Again, the Monte Carlo

curve may diﬀer but the quadrature curve should be the same as the one pictured.

Figure 5.5: Samples results of sparse quad.py


Next, try running full_quad.py with a quadrature rule other than the default Legendre-

Uniform. Locate the line in full_quad.py that calls the function generate_qw. It should

read:

xpts,wghts=generate_qw(ndim,quad_param)

Now, change this line to read:

xpts,wghts=generate_qw(ndim,quad_param,type=’CC’)

This will change the quadrature rule to Clenshaw-Curtis. Then run the ﬁle with input:

genz_gaus, 5, 10. Sample results can be found in the ﬁgure below.

Figure 5.6: Sample results of full quad.py with Clenshaw-Curtis quadrature rule.


Forward Propagation of Uncertainty with PyUQTk

Overview

This example is located in examples/fwd_prop. It contains a pair of Python scripts that

propagate uncertainty in input parameters through a heat transfer model using both a Monte

Carlo sampling approach and a non-intrusive spectral projection (NISP) via quadrature

methods.

Theory

Heat Transfer

[Figure: schematic of heat transfer through a window, showing the wall and glass, the inside and outside temperatures T_i and T_o, the window surface temperatures T_1 and T_2, the atmospheric temperature T_A, and the heat flux Q.]

In this example, the heat transfer through a window is calculated using samples of seven

uncertain Gaussian parameters. These parameters, along with their means and standard

deviations are deﬁned below.

Parameter                                          Mean       Standard deviation (%)
T_i: Room temperature                              293 K      0.5
T_o: Outside temperature                           273 K      0.5
d_w: Window thickness                              0.01 m     1
k_w: Window conductivity                           1 W/mK     5
h_i: Inner convective heat transfer coefficient    2 W/m^2K   15
h_o: Outer convective heat transfer coefficient    6 W/m^2K   15
T_A: Atmospheric temperature                       150 K      10


Once we have samples of the 7 parameters, the following forward model is used to calcu-

late heat ﬂux Q:

Q = h_i(T_i − T_1) = (k_w/d_w)(T_1 − T_2) = h_o(T_2 − T_o) + εσ(T_2^4 − T_A^4)

T_1 represents the inner window temperature and T_2 represents the outer window temperature. ε is the emissivity of uncoated glass, which we take to be 0.95, and σ is the Stefan-Boltzmann constant.

Polynomial Chaos Expansion

In this example, the forward propagation requires the representation of heat ﬂux Q with

a multidimensional Polynomial Chaos Expansion (PCE). This representation can be deﬁned

as follows:

Q = ∑_{k=0}^{P} Q_k Ψ_k(ξ_1, ..., ξ_n)

•Q: Random variable represented with multi-D PCE

•Q_k: PC coefficients

•Ψ_k: Multi-D orthogonal polynomials up to order p

•ξ_i: Gaussian random variable known as the germ

•n: Dimensionality = number of uncertain model parameters

•P + 1: Number of PC terms = (n + p)!/(n! p!)
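The (n + p)!/(n! p!) term count grows quickly with dimension and order; a quick check in plain Python:

```python
from math import comb

def num_pc_terms(n, p):
    """Number of PC terms P + 1 = (n + p)! / (n! p!) for dimension n, order p."""
    return comb(n + p, p)

# For the 7-dimensional, 3rd-order expansion used in this example:
terms = num_pc_terms(7, 3)   # 120 terms
```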

Non-Intrusive Spectral Projection (NISP)

Having speciﬁed a PCE form for our heat ﬂux Q, we must determine the PC coeﬃcients.

We do so through a non-intrusive Galerkin Projection. The coeﬃcients are determined using

the following formula:

Q_k = ⟨QΨ_k⟩/⟨Ψ_k^2⟩ = (1/⟨Ψ_k^2⟩) ∫ QΨ_k(ξ) w(ξ) dξ

In this example we use quadrature methods to evaluate this projection integral to determine

the PC coeﬃcients.
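The quadrature evaluation of this projection can be illustrated in one dimension with probabilists' Hermite polynomials and Gauss-Hermite quadrature (a NumPy sketch, not the PyUQTk GalerkinProjection API):

```python
import numpy as np
from math import factorial, sqrt, pi

def galerkin_coeffs_1d(q, order, nquad):
    """Project q(xi) onto probabilists' Hermite polynomials He_k via
    Gauss-Hermite quadrature: Q_k = <q He_k> / <He_k^2>, with <He_k^2> = k!."""
    x, w = np.polynomial.hermite_e.hermegauss(nquad)
    w = w / sqrt(2.0 * pi)   # normalize weights to the standard normal density
    coeffs = []
    for k in range(order + 1):
        He_k = np.polynomial.hermite_e.HermiteE.basis(k)(x)
        coeffs.append(np.sum(w * q(x) * He_k) / factorial(k))
    return np.array(coeffs)

# q(xi) = xi^2 = He_0 + He_2, so the coefficients should be [1, 0, 1]
c = galerkin_coeffs_1d(lambda x: x**2, order=2, nquad=4)
```

Choosing the number of quadrature points as nord + 1 per dimension, as the script's param flag does, makes the rule exact for the polynomial degrees that appear in the projection.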


Kernel Density Estimation

Once we have a large number of heat ﬂux samples, to obtain a probability density function

(PDF) curve, a kernel density estimation (KDE) is performed. When performing KDE, the

following formula is used to evaluate the PDF at point Q:

PDF(Q) = 1/(N_s h) ∑_{i=1}^{N_s} K((Q − Q_i)/h)

Q_i are samples of the heat flux, N_s is the number of sample points, K represents the kernel, a non-negative function, and h > 0 is the bandwidth. In this example we use the Gaussian kernel, K(x) = (1/√(2π)) e^{−x^2/2}. The results rely heavily on the choice of bandwidth, and there are

many rules that can be used to calculate the bandwidth. In our example, the built-in SciPy

function employed automatically determines the bandwidth using Scott’s rule of thumb.
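The KDE formula above can be implemented directly (a sketch with a fixed, hand-picked bandwidth; the SciPy routine used in the example chooses h automatically via Scott's rule):

```python
import numpy as np

def gaussian_kde_pdf(q_grid, samples, h):
    """Evaluate PDF(Q) = 1/(N_s h) * sum_i K((Q - Q_i)/h) with the
    Gaussian kernel K(x) = exp(-x^2/2)/sqrt(2*pi)."""
    z = (q_grid[:, None] - samples[None, :]) / h
    kern = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kern.sum(axis=1) / (len(samples) * h)

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, 2000)    # stand-in for heat flux samples
grid = np.linspace(-4.0, 4.0, 81)
pdf = gaussian_kde_pdf(grid, samples, h=0.3)
```

The resulting curve integrates to (approximately) one over the grid, as a PDF should.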

Implementation

The script set consists of two ﬁles:

•rad_heat_transfer_atm_pce.py: the main script

•heat_transfer_pce_tools.py: functions called by rad_heat_transfer_atm_pce.py

rad heat transfer atm pce.py

This script will produce a graph comparing PDFs of heat ﬂux generated using NISP full

and sparse quadrature methods and a Monte Carlo sampling method. Use the command

./rad_heat_transfer_atm_pce.py to run this file.

The top section of the script has some ﬂags that are useful to consider:

•main verbose: set this parameter to 1 for intermediate print statements, otherwise set

to 0.

•compute rad: set to True to include radiation; use False to exclude it. Note, if

radiation is used, the nonlinear solver in Python sometimes has a hard time converging,

in which case you will see a warning message pop up.

•nord: the order of the PCE

•ndim: the number of dimensions of the PCE

•pc type: indicates the polynomial type and weighting function. The default is set to

”HG”, Hermite-Gauss.


•pc alpha and pc beta: Free parameters greater than -1. Used with Gamma-Laguerre

and Beta-Jacobi PC types.

•param: The parameter used for quadrature point generation. Equal to the number of

quadrature points per dimension for full quadrature or the level for sparse quadrature

methods. This parameter is generally set to nord + 1 in order to have the right

polynomial exactness.

Monte Carlo Sampling Methods

The script begins by assigning the input parameters and their means and standard deviations.

Using this information, a large number of random parameter samples (default is 100,000)

is generated. With these samples and our forward model, the function compute_heat_flux

then calculates the heat ﬂux assuming that no radiative heat transfer occurs. This simply

requires solving a system of three linear equations. Then, using the values of Q,T1, and

T2obtained from compute_heat_flux as initial guesses, the function r_heat_flux calcu-

lates the total heat ﬂux (including radiative heat transfer) using the SciPy nonlinear solver

optimize.fsolve. Using the heat ﬂux samples, a kernel density estimation is then per-

formed using function KDE.
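The three linear equations solved in the no-radiation case can be sketched with NumPy (an illustration of the system; the actual compute_heat_flux implementation may differ):

```python
import numpy as np

def compute_heat_flux_sketch(Ti, To, dw, kw, hi, ho):
    """Solve the three linear steady-state equations for T1, T2, Q (no radiation):
    hi*(Ti - T1) = Q,  kw*(T1 - T2)/dw = Q,  ho*(T2 - To) = Q."""
    A = np.array([
        [hi,       0.0,      1.0],   # hi*T1 + Q = hi*Ti
        [kw / dw, -kw / dw, -1.0],   # (kw/dw)*(T1 - T2) - Q = 0
        [0.0,      ho,      -1.0],   # ho*T2 - Q = ho*To
    ])
    b = np.array([hi * Ti, 0.0, ho * To])
    T1, T2, Q = np.linalg.solve(A, b)
    return T1, T2, Q

# With the example means: Ti=293 K, To=273 K, dw=0.01 m, kw=1 W/mK, hi=2, ho=6 W/m^2K
T1, T2, Q = compute_heat_flux_sketch(293.0, 273.0, 0.01, 1.0, 2.0, 6.0)
```

The result agrees with the series-resistance form Q = (T_i − T_o)/(1/h_i + d_w/k_w + 1/h_o).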

NISP Quadrature Methods

After the Monte Carlo sampling process is complete, forward propagation using projection

methods will take place. This section of the script begins by instantiating the PC objects needed for the projection.

While running the ﬁle, a statement similar to the following will print indicating that PC

objects are being instantiated.

Instantiating PC Objects

Generating 4^7 = 16384 quadrature points.

Used 4 quadrature points per dimension for initialization.

Generating 4^7 = 16384 quadrature points.

Used 4 quadrature points per dimension for initialization.

Level 0 / 4

Level 1 / 4

Level 2 / 4

Level 3 / 4

Level 4 / 4

Instantiation complete

These PC objects contain all of the information needed about the polynomial chaos expan-

sion, such as the number of dimensions, the order of the expansion, and the sparsity (full or

sparse) of the quadrature methods to be implemented.


Next, a NumPy array of quadrature points is generated, using the function get_quadpts.

Then, the quadrature points in ξ are converted to equivalent samples of the input parameters, taking advantage of the fact that the inputs are assumed to be Gaussian. If we let µ represent the mean of an input parameter, σ represent its standard deviation, and qdpts be a vector of quadrature points, the following equation is used to convert these samples in ξ to equivalent samples of the input parameter:

parameter_samples = µ + σ · qdpts
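For example (values taken from the parameter table; the quadrature points shown are placeholders, not output of get_quadpts):

```python
import numpy as np

# Convert standard-normal quadrature points (the germ xi) into samples of
# Ti ~ N(293 K, (0.5% of 293 K)^2)
mu = 293.0
sigma = 0.005 * mu                   # 0.5% standard deviation
qdpts = np.array([-2.0, 0.0, 2.0])   # placeholder points for illustration
param_samples = mu + sigma * qdpts
```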

Now that we have samples of all the input parameters, these samples are run through

the forward model to obtain values of Q using the function fwd_model. Then the actual

Galerkin projection is performed on these function evaluations to obtain the PC coeﬃcients,

using the function GalerkinProjection.

Next, to create a PDF of the output Q from its PCE, germ samples are generated and

the PCE is evaluated at these sample points using the function evaluate_pce. Lastly, using

our PCE evaluations, a kernel density estimation is performed using the function KDE.

This entire process is done twice, once with full tensor product quadrature points and

again with sparse quadrature points.

Printing and Graphing

Next, statements indicating the total number of sampling points used for each forward

propagation method will be printed.

Monte Carlo sampling used 100000 points

Full quadrature method used 16384 points

Sparse quadrature method used 6245 points

Finally, a graph is created which displays the three diﬀerent heat ﬂux PDFs on the same

ﬁgure. It will be saved under the ﬁle name heat_flux_pce.pdf.

heat transfer pce tools.py

This script contains several functions called by the rad_heat_transfer_atm_pce.py ﬁle.

•compute_heat_flux(Ti, To, dw, kw, hi, ho): This function calculates heat ﬂux

Q, assuming no radiative heat transfer occurs.

–Ti, To, dw, kw, hi, ho: Sample values (scalars) of the input parameters

•r_heat_flux(Ti,To,dw,kw,hi,ho,TA,estimates): This function calculates heat ﬂux

Q, assuming radiative heat transfer occurs. The SciPy non-linear solver optimize.fsolve

is employed.


–Ti, To, dw, kw, hi, ho, TA: Sample values (scalars) of the input parameters

–estimates: Estimates of Q,T1, and T2required to solve the nonlinear system. For

these estimates, we use the output of the function compute_heat_flux, which

solves the system assuming no radiative heat transfer occurs.

•get_quadpts(pc_model,ndim): This function generates a NumPy array of either full

tensor product or sparse quadrature points. It also returns the total number of quadrature points, totquat.

–pc model: PC object with information about the PCE, including the desired

sparsity for quadrature point generation.

–ndim: Number of dimensions of the PCE

•fwd_model(Ti_samples,To_samples,dw_samples,kw_samples,hi_samples,

ho_samples,verbose):

This function returns a NumPy array of evaluations of the forward model. This func-

tion calls the functions compute_heat_flux and r_heat_flux.

–Ti samples, To samples, dw samples, kw samples, hi samples, ho samples: 1D

NumPy arrays (vectors) of parameter sample values.

•fwd_model_rad(Ti_samples,To_samples,dw_samples,kw_samples,hi_samples,

ho_samples,TA_samples,verbose):

Same as fwd_model but with radiation enabled, and an extra argument for the samples

of TA.

•GalerkinProjection(pc_model,f_evaluations): This function returns a 1D NumPy

array with the PC coeﬃcients.

–pc model: PC object with information about the PCE.

–f evaluations: 1D NumPy array (vector) with function to be projected, evaluated

at the quadrature points.

•evaluate_pce(pc_model,pc_coeffs,germ_samples): This function evaluates the PCE

at a set of samples of the germ.

–pc model: PC object with information about the PCE.

–pc coeﬀs: 1D NumPy array with PC coeﬃcients. This array is the output of the

function GalerkinProjection.

–germ samples: NumPy array with samples of the PCE germ at which the random

variable is to be evaluated.

•KDE(fcn_evals): This function performs a kernel density estimation. It returns a

NumPy array of points at which the PDF was estimated, as well as a NumPy array of

the corresponding estimated PDF values.

–fcn evals: A NumPy array of evaluations of the forward model (values of heat

ﬂux Q).
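As an illustration of the nonlinear system that r_heat_flux hands to optimize.fsolve, the balance with radiation can be reduced to a single residual equation in Q and solved by bisection (a self-contained sketch, not the UQTk code; the reduction and bracket are assumptions made here):

```python
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W/m^2 K^4
EPS = 0.95               # emissivity of uncoated glass

def radiative_heat_flux_sketch(Ti, To, dw, kw, hi, ho, TA):
    """Eliminate T1 = Ti - Q/hi and T2 = T1 - Q*dw/kw, leaving one residual
    f(Q) = ho*(T2 - To) + EPS*SIGMA*(T2^4 - TA^4) - Q = 0; solve by bisection."""
    def residual(Q):
        T1 = Ti - Q / hi
        T2 = T1 - Q * dw / kw
        return ho * (T2 - To) + EPS * SIGMA * (T2**4 - TA**4) - Q

    lo, hi_b = 0.0, hi * (Ti - To) + 500.0   # bracket containing the root
    for _ in range(100):
        mid = 0.5 * (lo + hi_b)
        if residual(lo) * residual(mid) <= 0.0:
            hi_b = mid
        else:
            lo = mid
    return 0.5 * (lo + hi_b)

Q = radiative_heat_flux_sketch(293.0, 273.0, 0.01, 1.0, 2.0, 6.0, 150.0)
```

With the mean parameter values the radiative term roughly triples the flux relative to the conduction-convection-only solution, which is why a separate nonlinear solve is needed.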


Sample Results

Run the ﬁle rad_heat_transfer_atm_pce.py with the default settings. You should

expect to see the following print statement, and your graph should look similar to the one

found in the ﬁgure below.

Monte Carlo sampling used 100000 points

Full quadrature method used 16384 points

Sparse quadrature method used 6245 points

Figure 5.7: Sample results of rad heat transfer atm pce.py

Now trying changing one of the default settings. Find the line that reads nord = 3 and

change it to read nord = 2. This will change the order of the PCE to 2, rather than 3. You

should expect to see the following print statement, and a sample graph is found in the ﬁgure

below.

Monte Carlo sampling used 100000 points

Full quadrature method used 2187 points

Sparse quadrature method used 1023 points


Figure 5.8: Sample results of rad heat transfer atm pce.py with nord = 2

The user can also change the default pc_type = "HG". For example, to use Legendre-

Uniform PC type, change this line to read pc_type = "LU".

Expanded Forward Propagation of Uncertainty - PyUQTk

Overview

This example contains two pairs of Python files; each pair consists of a main script and a tools file that holds functions called from the main script. They are located in UQTk/examples/window. This example is an expanded version of the previous forward propagation problem; it utilizes a more complex model that provides additional parameters. One pair of scripts produces a probability density function of the heat flux for a twelve-dimensional model, using fourth-order PCEs and sparse quadrature. The other pair produces a probability density function of the heat flux for a five-dimensional model, using fourth-order PCEs; this script compares the results of using sparse and full quadrature, as well as Monte Carlo sampling. It also produces a separate plot showing the spectral decay of the PC coefficients.

Theory

Heat Transfer - Dual Pane Window

The heat ﬂux calculations are implemented using random samples from the normal dis-

tribution of each of the uncertain Gaussian parameters. The standard deviations assigned

are estimates. These parameters are deﬁned in the table below:

The forward model that was developed relies on the same set of steady state heat transfer

equations from the ﬁrst example, with the addition of a combined equation for conduction

and convection for the air gap. This is deﬁned as conductance, and was implemented in order

to provide an alternative to determining the convection coeﬃcient for this region, which can

be challenging [2].

Q = h_i(T_i − T_1)
  = (k_w/d_w)(T_1 − T_2)
  = 0.41 (k_a/d_a) [gβρ^2 d_a^3 |T_2 − T_3| / µ^2]^0.16 (T_2 − T_3)
  = (k_w/d_w)(T_3 − T_4)
  = h_o(T_4 − T_o) + εσ(T_4^4 − T_s^4)

[1] Lienhard, John H. IV, Lienhard, John H. V. A Heat Transfer Textbook, 4th edition, pg. 579. Phlogiston Press, 2011.
[2] Rubin, Michael. Calculating Heat Transfer Through Windows. Energy Research, Vol. 6, pg. 341-349, 1982.


Parameter   Definition                        Mean Value           Standard Deviation (%)
T_i         Inside temperature                293 K                0.5
T_o         Outside temperature               273 K                0.5
T_s         Sky temperature [1]               263 K                10
k_w         Conduction constant for glass     1 W/mK               5
k_a         Conduction constant for air       0.024 W/mK           5
h_i         Inside convection coefficient     2 W/m^2K             15
h_o         Outside convection coefficient    6 W/m^2K             15
d_w         Width of the glass                5 mm                 1
d_a         Width of the air gap              1 cm                 1
µ           Viscosity of air                  1.86x10^-5 kg/m s    5
ρ           Density of air                    1.29 kg/m^3          5
β           Thermal expansion coefficient     3.67x10^-3 1/K       5

T_1 represents the outer facing surface temperature of the first glass layer, T_2 represents the inner surface temperature of the first glass layer, T_3 represents the inner surface temperature of the second glass layer, and T_4 represents the outer facing surface temperature of the second glass layer. ε is the emissivity of uncoated glass, which we take to be 0.95, and σ is the Stefan-Boltzmann constant.

Polynomial Chaos Expansion*

Non-Intrusive Spectral Projection (NISP)*

Kernel Density Estimation*

*Please see previous example.

Implementation

The ﬁrst set of scripts:

•5D_fwd_prop.py: the main script

•fwd_prop_tools.py: functions called by 5D_fwd_prop.py


Figure 5.9: Increase in number of samples with change in order.

5D fwd prop.py

This script will produce a plot comparing the probability density function of the heat ﬂux

using three methods: non-intrusive spectral projection with full and sparse quadrature, and Monte Carlo sampling. It also produces a plot showing the spectral decay of the PC coefficient magnitude in relation to the PC order. This plot illustrates how the projection converges to give a precise representation of the model. Changing the order of this script (nord) may alter the results of the second plot, since it has elements that are not

dynamically linked. Use the command python 5D_fwd_prop.py to run this ﬁle from the

window directory.

The second set of scripts:

•highd_sparse.py: the main script

•tools_conductance_dp_pce_wrad.py: functions called by highd_sparse.py


highd sparse.py

This script will produce a plot of the probability density function of the heat flux in twelve

dimensions, using only non-intrusive spectral projection with sparse quadrature. Use the

command python highd_sparse.py to run this ﬁle from the window directory. Adjust-

ments may be made to nord if desired, though the number of points produced increases

dramatically with an increase in order, as illustrated by ﬁgure 5.9.

Monte Carlo Sampling Methods

This is very similar to the previous example with the exception of compute_heat_flux

producing T_1, T_2, T_3, T_4, and Q. This function solves a set of five linear equations (our forward model neglecting convection in the air gap and radiative heat transfer), which are evaluated for the parameter samples. The values obtained are used as initial inputs for

r_heat_flux, which calculates the total heat ﬂux for all heat transfer modes, using the SciPy

nonlinear solver optimize.fsolve. Using the heat ﬂux samples, a kernel density estimation

is then performed using function KDE.

NISP Quadrature Methods

Please see previous example.

Printing and Graphing

Next, statements indicating the total number of sampling points used for each forward

propagation method will be printed. The number of Monte Carlo points is ﬁxed, but the

number of points produced by quadrature method will vary with the number of uncertain

parameters and the order of the PCE.

Monte Carlo sampling used 100000 points

Full quadrature method used xxxxxxx points

Sparse quadrature method used xxxxxxx points

Finally, a graph is created which displays the three diﬀerent heat ﬂux PDFs on the same

ﬁgure. It will be saved under the ﬁle name heat_flux_pce.pdf.

fwd prop tools.py and tools conductance dp pce wrad.py

These scripts contain several functions called by the 5D_fwd_prop.py and highd_sparse.py files. The five-dimensional example uses the five parameters with the highest uncertainty: T_s, h_i, h_o, k_w, k_a.


•compute_heat_flux(Ti,To,dw,da,kw,ka,hi,ho): This function calculates heat ﬂux

Q, assuming no radiative heat transfer occurs and no convective heat transfer occurs

in the air gap.

–Ti, To, dw, da, kw, ka, hi, ho: Sample values (scalars) of the input parameters

•r_heat_flux(Ti,To,Ts,dw,da,kw,ka,hi,ho,beta,rho,mu,estimates): This func-

tion calculates heat ﬂux Q, assuming radiative heat transfer, and convective heat trans-

fer for the air gap occurs. The SciPy non-linear solver optimize.fsolve is employed.

–Ti, To, Ts, dw, da, kw, ka, hi, ho, beta, rho, mu: Sample values (scalars) of the

input parameters

–estimates: Estimates of Q, T_1, T_2, T_3, and T_4 are required to solve the nonlinear

system. For these estimates, we use the output of the function compute_heat_flux,

which solves the system assuming no radiative or convective heat transfer for the

air gap occurs.

•get_quadpts(pc_model,ndim)*

•fwd_model(Ti_samples,To_samples,Ts_samples,dw_samples,da_samples,kw_samples,

ka_samples,hi_samples,ho_samples,beta_samples,rho_samples,mu_samples):

This function returns a NumPy array of evaluations of the forward model. This func-

tion calls the functions compute_heat_flux and r_heat_flux.

–Ti samples, To samples,Ts samples, dw samples, da samples, kw samples, ka samples,

hi samples, ho samples, beta samples, rho samples, mu samples: 1D NumPy ar-

rays (vectors) of parameter sample values.

•GalerkinProjection(pc_model,f_evaluations)*

•evaluate_pce(pc_model,pc_coeffs,germ_samples)*:

•KDE(fcn_evals)*:

•get_multi_index(pc_model,ndim) This function computes the multi indices to be

used in the inverse relationship plot.

–pc model: Contains information about the PC set.

–ndim: Number of model dimensions.

•plot_mi_dims(pc_model,c_k,ndim) This function produces the inverse relationship plot using the multi-indices produced by the previous function and the values of the PC coefficients.

–pc_model: Contains information about the PC set.

–ndim: Number of model dimensions.

–c_k: NumPy array of PC coefficients.

*Please see previous example.
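As an illustration of how such a nonlinear flux balance is solved with optimize.fsolve, the sketch below sets up a simplified three-equation system (inner convection, wall conduction, outer convection only, with made-up nominal parameter values). The real r_heat_flux system additionally includes radiative and air-gap convective terms; here, as described above, a simple series-resistance solution plays the role of compute_heat_flux in seeding the solver.

```python
from scipy.optimize import fsolve

def residuals(unknowns, Ti, To, hi, ho, kw, dw):
    # Unknowns: the heat flux Q and the two wall-surface temperatures T1, T2.
    # Each equation states that the same flux Q crosses each layer.
    Q, T1, T2 = unknowns
    return [Q - hi * (Ti - T1),
            Q - (kw / dw) * (T1 - T2),
            Q - ho * (T2 - To)]

# Illustrative nominal values (not the manual's parameter table).
Ti, To = 293.0, 273.0   # inner/outer temperatures [K]
hi, ho = 2.0, 6.0       # convective coefficients [W/m^2/K]
kw, dw = 1.0, 0.005     # wall conductivity [W/m/K] and thickness [m]

# Series-resistance estimate seeds the nonlinear solve, mimicking the role
# of compute_heat_flux's output in the example scripts.
R = 1.0 / hi + dw / kw + 1.0 / ho
Q0 = (Ti - To) / R
estimates = [Q0, Ti - Q0 / hi, To + Q0 / ho]

Q, T1, T2 = fsolve(residuals, estimates, args=(Ti, To, hi, ho, kw, dw))
```

Since this reduced system happens to be linear, fsolve converges in one step; with the radiative terms added, the same call pattern applies but good initial estimates matter more.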

Sample Results

Run the file highd_sparse.py with the default settings. You should see the following print statement, and your graph should look similar to the one in the figure below. This sample is for a twelve-dimensional problem with a fourth-order PCE. The following will print to the terminal:

Sparse quadrature method used 258681 points

Figure 5.10: Sample results of highd_sparse.py

The next two figures show the two results of the five-dimensional example with a fourth-order PCE. The following will print to the terminal:


Monte Carlo sampling used 100000 points

Full quadrature method used 3125 points

Sparse quadrature method used 10363 points

Figure 5.11: Sample results of 5D_fwd_prop.py


Figure 5.12: Sample results of 5D_fwd_prop.py

Bayesian Inference of a Line

Overview

This example is located in examples/line_infer. It infers the slope and intercept of a line from noisy data using Bayes' rule. The C++ libraries are called directly from the driver program. By changing the likelihood function and the input data, this program can be tailored to other inference problems.

To run an example, type ./line_infer.py directly. This file contains quite a bit of inline documentation about the various settings and methods used. To get a listing of all command-line options, type ./line_infer.py -h. A typical run, with parameters changed from the command line, is as follows:

./line_infer.py --nd 5 --stats

This will run the inference problem with 5 data points, generate plots of the posterior

distributions, and generate statistics of the MCMC samples. If no plots are desired, also

give the --noplots argument.


More details

After setting a number of default values for the example problem, the line_infer.py script sets the proper inference inputs in the file line_infer.xml, starting from a set of defaults in line_infer.xml.templ. The file line_infer.xml is read in by the C++ code line_infer.x, which does the actual Bayesian inference. Next, synthetic data is generated from either a linear or a cosine model, with added noise.

Then, the script calls the C++ line inference code line_infer.x to infer the two parameters (slope and intercept) of a line that best fits the artificial data. (Note that one can also run the inference code directly by manually editing the file line_infer.xml and typing the command ./line_infer.x.)

The script then reads in the MCMC posterior samples file and performs some postprocessing. Unless the flag --noplots is specified, the script computes and plots the following:

•The pushed-forward and posterior predictive error bars

–Generate a dense grid of x-values

–Evaluate the linear model y=a+bx for all posterior samples (a, b) after the

burn-in

–Pushed-forward distribution: compute the sample mean and standard deviation of y using the sampled models

–Posterior predictive distribution: combine pushed-forward distribution with the

noise model

•The MCMC chain for each variable, as well as a scatter plot for each pair of variables

•The marginal posterior distribution for each variable, as well as the marginal joint

distribution for each pair of variables

If the ﬂag --stats is speciﬁed, the following statistics are also computed:

•The mean, MAP (Maximum A Posteriori), and standard deviations of all parameters

•The covariance matrix

•The average acceptance probability of the chain

•The eﬀective sample sizes for each variable in the chain
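The pushed-forward versus posterior-predictive computation above can be sketched in a few lines of NumPy. The posterior samples below are made-up stand-ins (in the example they come from the MCMC chain), and sigma_noise is an assumed known noise level:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical post-burn-in posterior samples of intercept a and slope b.
a_samples = rng.normal(1.0, 0.05, size=2000)
b_samples = rng.normal(2.0, 0.10, size=2000)
sigma_noise = 0.1

xgrid = np.linspace(0.0, 1.0, 101)        # dense grid of x-values
# Evaluate y = a + b x for every posterior sample (rows: samples).
y = a_samples[:, None] + b_samples[:, None] * xgrid[None, :]

# Pushed-forward distribution: spread of the model itself.
pf_mean = y.mean(axis=0)
pf_std = y.std(axis=0)

# Posterior predictive: pushed-forward spread combined with the noise
# model (variances add for independent Gaussian noise).
pp_std = np.sqrt(pf_std**2 + sigma_noise**2)
```

The posterior-predictive band is always at least as wide as the pushed-forward band, which is what Figure 5.13 illustrates.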

Sample Results


Figure 5.13: The pushed forward posterior distribution (dark grey) and posterior predictive

distribution (light grey).

Figure 5.14: MCMC chains for parameters a and b, as well as a scatter plot for a and b


Figure 5.15: Marginal posterior distributions for all variables, as well as marginal joint

posteriors

Surrogate Construction and Sensitivity Analysis

Overview

•Located in examples/uqpc

•A collection of scripts that construct a PC surrogate for a computational model which

is speciﬁed as a black box simulation code. Also provides tools for sensitivity analysis

of the outputs of this black box model with respect to its input parameters.

Theory

Consider a function f(λ; x), where λ = (λ_1, . . . , λ_d) are the model input parameters of interest, while x ∈ R^r are design parameters with controllable values. For example, x can denote a spatial coordinate or a time snapshot, or it can simply enumerate multiple quantities of interest. Furthermore, we assume known domains for each input parameter, i.e.

\lambda_i \in [a_i, b_i] \quad \text{for } i = 1, \ldots, d. \qquad (5.5)


The goal is to build a Polynomial Chaos (PC) surrogate function for each value of the design parameter x, i.e. for l = 1, . . . , L,

f(\lambda; x_l) \approx g(\lambda; x_l) = \sum_{k=0}^{K-1} c_{kl} \Psi_k(\xi) \qquad (5.6)

with respect to scaled inputs

\xi_i = \left( \lambda_i - \frac{b_i + a_i}{2} \right) \Big/ \, \frac{b_i - a_i}{2} \; \in [-1, 1] \quad \text{for } i = 1, \ldots, d. \qquad (5.7)

Here \Psi_k(\xi) = \Psi_k(\xi_1, \ldots, \xi_d) are multivariate normalized Legendre polynomials, defined as products of univariate normalized Legendre polynomials \psi_{k_i}(\xi_i) as follows:

\Psi_k(\xi) = \psi_{k_1}(\xi_1) \cdots \psi_{k_d}(\xi_d). \qquad (5.8)

A typical truncation rule in (5.6) is defined according to the total order of the basis terms, i.e. only polynomials with total order ≤ p are retained for some positive integer order p, implying |k_1| + · · · + |k_d| ≤ p, and K = (d + p)!/(d! p!). The scalar index k simply counts the multi-indices (k_1, . . . , k_d).
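For instance, the count K and the total-order multi-index set can be generated with a few lines of Python (an illustration, not the gen_mi app itself):

```python
from math import comb
from itertools import product

def total_order_multiindices(d, p):
    """All multi-indices (k_1, ..., k_d) with k_1 + ... + k_d <= p."""
    return [k for k in product(range(p + 1), repeat=d) if sum(k) <= p]

d, p = 3, 4
mi = total_order_multiindices(d, p)
K = comb(d + p, d)   # K = (d+p)!/(d! p!)
```

For d = 3, p = 4 this gives K = 35 basis terms, including the all-zero multi-index for the constant (mean) term.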

After computing the PC coefficients c_{kl}, one can extract global sensitivity information, also called Sobol indices or variance-based decomposition. For example, the main sensitivity index with respect to dimension i (or variable ξ_i) is

S_i(x_l) = \frac{\sum_{k \in I_i} c_{kl}^2}{\sum_{k=1}^{K-1} c_{kl}^2}, \qquad (5.9)

where I_i is the set of indices of basis terms that involve only the variable ξ_i, i.e. the one-dimensional monomials ψ_1(ξ_i), ψ_2(ξ_i), . . . . In other words, these are the basis terms corresponding to multi-indices whose only non-zero entry is at the i-th location.
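Given a multi-index array and a coefficient vector, Eq. (5.9) can be evaluated directly. The sketch below (not the pce_sens app) assumes row 0 of the multi-index array is the all-zero mean term and the basis is normalized:

```python
import numpy as np

def main_sobol_index(mi, c, i):
    """Main-effect sensitivity index S_i of Eq. (5.9).

    mi : (K, d) array of multi-indices (row 0 is the all-zero mean term)
    c  : (K,) array of PC coefficients c_k for a normalized basis
    i  : input dimension of interest
    """
    mi = np.asarray(mi)
    c = np.asarray(c, dtype=float)
    # Basis terms that involve only xi_i: nonzero order in column i,
    # zero order in every other column.
    only_i = (mi[:, i] > 0) & (np.delete(mi, i, axis=1).sum(axis=1) == 0)
    total_var = np.sum(c[1:] ** 2)   # the k = 0 mean term carries no variance
    return np.sum(c[only_i] ** 2) / total_var
```

With a normalized basis the total variance is simply the sum of squared coefficients excluding the mean term, which is why the denominator in Eq. (5.9) starts at k = 1.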

The two generic methods of ﬁnding the PC coeﬃcients ckl are detailed below.

Projection

The basis orthonormality enables the projection formulae

c_{kl} = \int_{\Omega} f(\lambda(\xi); x_l) \, \Psi_k(\xi) \, \frac{1}{2^d} \, d\xi \qquad (5.10)

where \Omega = [-1, 1]^d and \lambda(\xi) simply denotes the linear scaling relation in (5.7).

The projection integral can be taken by


•Quadrature integration

c_{kl} \approx \sum_{q=1}^{Q} f(\lambda(\xi^{(q)}); x_l) \, \Psi_k(\xi^{(q)}), \qquad (5.11)

where \xi^{(q)} are Gauss-Legendre quadrature points in the d-dimensional \Omega = [-1, 1]^d.

•Monte-Carlo integration [to be implemented]

c_{kl} \approx \sum_{m=1}^{M} f(\lambda(\xi^{(m)}); x_l) \, \Psi_k(\xi^{(m)}), \qquad (5.12)

where \xi^{(m)} are uniformly random points in the d-dimensional \Omega = [-1, 1]^d.
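The quadrature projection can be sketched in one dimension with NumPy's Gauss-Legendre rule. This is a hedged illustration, not the pce_resp implementation; the quadrature weights, which are folded into the sums above, are written out explicitly here:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

def project_1d(f, p, nquad):
    """Compute c_k = integral of f(xi) * psi_k(xi) * (1/2) over [-1, 1]
    by Gauss-Legendre quadrature, a 1D instance of Eq. (5.11).
    psi_k(xi) = sqrt(2k+1) P_k(xi) is orthonormal under the density 1/2."""
    xq, wq = leggauss(nquad)                 # nodes and weights on [-1, 1]
    fq = f(xq)
    c = np.empty(p + 1)
    for k in range(p + 1):
        ek = np.zeros(k + 1)
        ek[k] = np.sqrt(2 * k + 1)           # coefficient vector of psi_k
        c[k] = 0.5 * np.sum(wq * fq * legval(xq, ek))
    return c

c = project_1d(lambda xi: xi**2, p=4, nquad=8)
# c[0] is the mean of xi^2 under U[-1,1], i.e. 1/3; odd c_k vanish.
```

Because an n-point Gauss-Legendre rule integrates polynomials up to degree 2n - 1 exactly, these coefficients are exact for polynomial models of modest order.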

Inference

[to be implemented]

Implementation

The script set consists of three files:

•uq_pc.py : the main script

•model.py : black-box example model

•plot.py : plotting

The apps employed

•generate_quad : Quadrature point/weight generation

•gen_mi : PC multiindex generation

•pce_resp : Projection via quadrature integration

•pce_sens : Sensitivity extraction

•pce_eval : PC evaluation

Main script: The syntax of the main script is


% uq_pc.py -r <run_regime>

% -p <pdomain_file> -m <method> -o <ord> -s <sam_method> -n <nqd> -v <nval>

Also one can run uq_pc.py -h for help in the terminal and to check the defaults.

The list of arguments:

•-r <run_regime>: The regime in which the workﬂow is employed. The options are

online : The black-box model, which is in model.py, is run directly as the parameter ensemble becomes available. Users can provide their own model.py or simply use the current one as an example.

offline_prep : Prepare the input parameter ensemble and store it in ptrain.dat and, if validation is requested, pval.dat. The user should then run the model (model.py ptrain.dat ytrain.dat and, if validation is requested, model.py pval.dat yval.dat) in order to provide the ensemble output for the offline_post stage.

offline_post : Postprocess the output ensemble, assuming the model has been run offline with the input ensemble provided in the offline_prep stage, producing ytrain.dat and, if validation is requested, yval.dat. The rest of the arguments should remain the same as in the offline_prep stage.

•-p <pdomain_file>: A file with d rows and 2 columns, where d is the number of parameters and each row consists of the lower and upper bounds of the corresponding parameter.

•-m <method>: The method of ﬁnding the PC surrogate coeﬃcients. The options are

proj : Projection method outlined in (5.10) and (5.11)

lsq : Least-squares.

bcs : Bayesian compressive sensing.

•-o <ord>: Total order p of the requested PC surrogates.

•-s <sam_method>: The input parameter sampling method. The options are

Q: Quadrature points. This sampling scheme works with the projection method

only, described in (5.11)

U: Uniformly random points. To be implemented.

•-n <nqd>: Number of samples requested if sam_method=U, or the number of quadrature

points per dimension, if sam_method=Q.

•-v <nval>: Number of uniformly random samples generated for PC surrogate validation; can be set to 0 to skip validation.

Generated output ﬁles are:


•allsens.dat and allsens_sc.dat : The main effect sensitivity indices in an L×d format, where each row corresponds to a single value of the design parameter and each column corresponds to the sensitivity index of a parameter. The second file stores the same results, except the sensitivity indices are scaled to sum to one.

•results.pk : Python pickle file that includes surrogate, sensitivity, and error information. It is loaded in the plotting utilities.

•*.log : Log ﬁles of the apps that have been employed, for reference.

•mi.dat : PC multiindex, for reference, in matrix form of size K×d; see (5.8).

•designPar.dat : A ﬁle containing values for the design parameters, for reference.

Black-box model: The syntax of the model script is

% model.py <modelPar_file> <output_file>

Also one can run model.py -h for help in the terminal and to check the defaults.

The list of arguments:

•<modelPar_file> : A file with N rows and d columns that stores the input parameter ensemble of N samples.

•<output_file> : The file where the output is stored, with N rows (number of input parameter samples) and L columns (number of outputs, or number of design parameter values).

model.py is the file that needs to be modified/provided by the user according to the model under study. Currently, a simple example function

f(\lambda; x) = \sum_{i=1}^{d} \lambda_i \, \sum_{i=1}^{d} \frac{\lambda_i + \lambda_i^2}{i x + 1}

is implemented, which also produces the file designPar.dat for design parameters x_l = l for l = 0, . . . , 6. The default dimensionality is set to d = 3.

A user-created black-box model.py should accept an input parameter ﬁle <modelPar_file>

and should produce an output ﬁle <output_file> with the formats described above, in order

to be consistent with the expected I/O.

Visualization: The syntax of the plotting script is

% plot.py <plotid>

The user is encouraged to enhance or change the visualization scripts. The current defaults,

explained below, are simply for illustration purposes. The options for the only argument

<plotid> are


sens : Plots the sensitivity information.

dm : Plots model-vs-data for one of the values of the design parameter.

idm : Plots model and data values on the same axis, for one of the values of the

design parameter.

For the visualization script above, make sure the PYTHONPATH environment variable con-

tains the directory UQTK_INS.

Sample run:

In order to run a simple example, use the prepared input domain file pdomain_3d.dat (the default domain file), and run

•Online mode: uq_pc.py -r online -v 111

•Offline mode: uq_pc.py -r offline_prep -v 111, followed by the model evaluations model.py ptrain.dat ytrain.dat and model.py pval.dat yval.dat, and the postprocessing stage uq_pc.py -r offline_post -v 111

After ﬁnishing, run plot.py sens or plot.py dm or plot.py idm to visualize some

results.

Global Sensitivity Analysis via Sampling

Overview

•Located in PyUQTk/sens

•A collection of Python functions that generate input samples for black-box models,

followed by functions that post-process model outputs to generate total, ﬁrst-order,

and joint eﬀect Sobol indices

Theory

Let X = (X_1, \ldots, X_n) : \Omega \to \mathcal{X} \subset \mathbb{R}^n be an n-dimensional random variable in L^2(\Omega, S, P) with probability density X \sim p_X(x). Let x = (x_1, \ldots, x_n) \in \mathcal{X} be a sample drawn from this density, with \mathcal{X} = \mathcal{X}_1 \otimes \mathcal{X}_2 \otimes \cdots \otimes \mathcal{X}_n, where \mathcal{X}_i \subset \mathbb{R} is the range of X_i. Let X_{-i} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) : \Omega \to \mathcal{X}_{-i} \subset \mathbb{R}^{n-1}, where X_{-i} \sim p_{X_{-i}|X_i}(x_{-i}|x_i) = p_X(x)/p_{X_i}(x_i), p_{X_i}(x_i) is the marginal density of X_i, x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), and \mathcal{X}_{-i} = \mathcal{X}_1 \otimes \cdots \otimes \mathcal{X}_{i-1} \otimes \mathcal{X}_{i+1} \otimes \cdots \otimes \mathcal{X}_n.


Consider a function Y = f(X) : \Omega \to \mathbb{R}, with Y \in L^2(\Omega, S, P). Further, let Y \sim p_Y(y), with y = f(x). Given that the variance of f is finite, one can employ the law of total variance¹,² to decompose the variance of f as

V[f] = V_{x_i}[E[f|x_i]] + E_{x_i}[V[f|x_i]] \qquad (5.13)

The conditional mean, E[f|x_i] \equiv E[f(X)|X_i = x_i], and conditional variance, V[f|x_i] = V[f(X)|X_i = x_i], are defined as

\langle f \rangle_{-i} \equiv E[f|x_i] = \int_{\mathcal{X}_{-i}} f(x) \, p_{X_{-i}|X_i}(x_{-i}|x_i) \, dx_{-i} \qquad (5.14)

\begin{aligned}
V[f|x_i] &= E[(f - \langle f \rangle_{-i})^2 \,|\, x_i] \\
&= E[(f^2 - 2 f \langle f \rangle_{-i} + \langle f \rangle_{-i}^2) \,|\, x_i] \\
&= E[f^2|x_i] - 2 \langle f \rangle_{-i} \langle f \rangle_{-i} + \langle f \rangle_{-i}^2 \\
&= \int_{\mathcal{X}_{-i}} f(x)^2 \, p_{X_{-i}|X_i}(x_{-i}|x_i) \, dx_{-i} - \langle f \rangle_{-i}^2
\end{aligned} \qquad (5.15)

The terms on the rhs of Eq. (5.13) can be written as

\begin{aligned}
V_{x_i}[E[f|x_i]] &= E_{x_i}[\,(E[f|x_i] - E_{x_i}[E[f|x_i]])^2\,] \\
&= E_{x_i}[\,(E[f|x_i] - f_0)^2\,] \\
&= E_{x_i}[\,(E[f|x_i])^2\,] - f_0^2 \\
&= \int_{\mathcal{X}_i} E[f|x_i]^2 \, p_{X_i}(x_i) \, dx_i - f_0^2
\end{aligned} \qquad (5.16)

where f_0 = E[f] = E_{x_i}[E[f|x_i]] is the expectation of f, and

E_{x_i}[V[f|x_i]] = \int_{\mathcal{X}_i} V[f|x_i] \, p_{X_i}(x_i) \, dx_i \qquad (5.17)

The ratio

S_i = \frac{V_{x_i}[E[f|x_i]]}{V[f]} \qquad (5.18)

is called the first-order Sobol index [32] and

S^T_{-i} = \frac{E_{x_i}[V[f|x_i]]}{V[f]} \qquad (5.19)

is the total effect Sobol index for x_{-i}. Using Eq. (5.13), the sum of the two indices defined above is

S_i + S^T_{-i} = S_{-i} + S^T_i = 1 \qquad (5.20)

Joint Sobol indices S_{ij} are defined as

S_{ij} = \frac{V_{x_i, x_j}[E[f|x_i, x_j]]}{V[f]} - S_i - S_j \qquad (5.21)

for i, j = 1, 2, \ldots, n and i \neq j.

¹en.wikipedia.org/wiki/Law_of_total_variance
²en.wikipedia.org/wiki/Law_of_total_expectation

S_i can be interpreted as the fraction of the variance in model f that can be attributed to the i-th input parameter only, while S_{ij} is the variance fraction that is due to the joint contribution of the i-th and j-th input parameters. S^T_i measures the fractional contribution to the total variance due to parameter x_i and its interactions with all other model parameters.

The Sobol indices are numerically estimated using Monte Carlo (MC) algorithms proposed by Saltelli [24] and Kucherenko et al. [16]. Let x^k = (x_1, \ldots, x_n)^k be a sample of X drawn from p_X. Let x'^k_{-i} be a sample from the conditional distribution p_{X_{-i}|X_i}(x'_{-i}|x^k_i), and x''^k_i a sample from the conditional distribution p_{X_i|X_{-i}}(x''_i|x^k_{-i}).

The expectation f_0 = E[f] and variance V = V[f] are estimated using the x^k samples as

f_0 \approx \frac{1}{N} \sum_{k=1}^{N} f(x^k), \qquad V \approx \frac{1}{N} \sum_{k=1}^{N} f(x^k)^2 - f_0^2 \qquad (5.22)

where N is the total number of samples. The first-order Sobol indices S_i are estimated as

S_i \approx \frac{1}{V} \left( \frac{1}{N} \sum_{k=1}^{N} f(x^k) \, f(x'^k_{-i} \cup x^k_i) - f_0^2 \right) \qquad (5.23)

The joint Sobol indices are estimated as

S_{ij} \approx \frac{1}{V} \left( \frac{1}{N} \sum_{k=1}^{N} f(x^k) \, f(x'^k_{-(i,j)} \cup x^k_{i,j}) - f_0^2 \right) - S_i - S_j \qquad (5.24)

For S^T_i, UQTk offers two alternative MC estimators. In the first approach, S^T_i is estimated as

S^T_i = 1 - S_{-i} \approx 1 - \frac{1}{V} \left( \frac{1}{N} \sum_{k=1}^{N} f(x^k) \, f(x''^k_i \cup x^k_{-i}) - f_0^2 \right) \qquad (5.25)

In the second approach, S^T_i is estimated as

S^T_i \approx \frac{1}{2V} \left( \frac{1}{N} \sum_{k=1}^{N} \left( f(x^k) - f(x^k_{-i} \cup x''^k_i) \right)^2 \right) \qquad (5.26)

Implementation

Directory pyUQTk/sensitivity contains two Python ﬁles

•gsalib.py : set of Python functions implementing the MC sampling and estimators

for Sobol indices

•gsatest.py : workﬂow illustrating the computation of Sobol indices for a toy problem


gsalib.py implements the following functions

•genSpl_Si(nspl,ndim,abrng,**kwargs) : generates samples for Eq. (5.23). The

input parameters are as follows

nspl: number of samples N,

ndim: dimensionality n of the input parameter space,

abrng: a 2-dimensional array of size n × 2, containing the range for each component x_i.

The following optional parameters can also be speciﬁed

splout: name of ascii output ﬁle for MC samples

matfile: name of binary output ﬁle for select MC samples. These samples are

used in subsequent calculations of joint Sobol indices

verb: verbosity level

nd: number of signiﬁcant digits for ascii output

The default values for optional parameters are listed in gsalib.py

•genSens_Si(modeval,ndim,**kwargs) : computes ﬁrst-order Sobol indices using Eq. (5.23).

The input parameters are as follows

modeval: name of ascii ﬁle with model evaluations,

ndim: dimensionality n of the input parameter space

The following optional parameter can also be speciﬁed

verb: verbosity level

The default value for the optional parameter is listed in gsalib.py

•genSpl_SiT(nspl,ndim,abrng,**kwargs) : generates samples for Eqs. (5.25-5.26).

The input parameters are as follows

nspl: number of samples N,

ndim: dimensionality n of the input parameter space,

abrng: a 2-dimensional array of size n × 2, containing the range for each component x_i.

The following optional parameters can also be speciﬁed

splout: name of ascii output ﬁle for MC samples

matfile: name of binary output ﬁle for select MC samples. These samples are

used in subsequent calculations of Sobol indices

verb: verbosity level

nd: number of signiﬁcant digits for ascii output

The default values for optional parameters are listed in gsalib.py


•genSens_SiT(modeval,ndim,**kwargs) : computes total Sobol indices using either

Eq. (5.25) or Eq. (5.26). The input parameters are as follows

modeval: name of ascii ﬁle with model evaluations,

ndim: dimensionality n of the input parameter space

The following optional parameter can also be speciﬁed

type: specifies whether to use Eq. (5.25) (for type = "type1") or Eq. (5.26) (for type ≠ "type1")

verb: verbosity level

The default value for the optional parameter is listed in gsalib.py

•genSpl_Sij(ndim,**kwargs) : generates samples for Eq. (5.24). The input parameters are as follows

ndim: dimensionality n of the input parameter space,

The following optional parameters can also be speciﬁed

splout: name of ascii output ﬁle for MC samples

matfile: name of binary output ﬁle for select MC samples saved by genSpl_Si.

verb: verbosity level

nd: number of signiﬁcant digits for ascii output

The default values for optional parameters are listed in gsalib.py

•genSens_Sij(sobolSi,modeval,**kwargs) : computes joint Sobol indices using Eq. (5.24).

The input parameters are as follows

sobolSi: array with values of the first-order Sobol indices S_i

modeval: name of ascii ﬁle with model evaluations.

The following optional parameter can also be speciﬁed

verb: verbosity level

The default value for the optional parameter is listed in gsalib.py

gsatest.py provides the workflow for the estimation of Sobol indices for a simple model given by

f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n-1} i^2 x_i x_{i+1} \qquad (5.27)
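For reference, Eq. (5.27) can be implemented in vectorized form as follows (an illustrative stand-in for the function hard-coded in gsatest.py):

```python
import numpy as np

def toy_model(x):
    """Vectorized Eq. (5.27): f = sum_i x_i + sum_{i=1}^{n-1} i^2 x_i x_{i+1},
    with i 1-based as in the equation; x has shape (N, n)."""
    x = np.atleast_2d(x)
    i = np.arange(1, x.shape[1])           # 1-based index of each pair term
    return x.sum(axis=1) + np.sum(i**2 * x[:, :-1] * x[:, 1:], axis=1)
```

For n = 4 and x = (1, 1, 1, 1) this gives 4 + (1 + 4 + 9) = 18; the interaction terms are what make the joint indices S_{i,i+1} non-zero.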

In the example provided in this file, n (ndim in the file) is set equal to 4, and the number of samples N (nspl in the file) to 10^4. Figures 5.16 and 5.17 show results based on an ensemble of 10 runs. To generate these results, run the example workflow:

python gsatest.py

Figure 5.16: First-order (left frame) and joint (right frame) Sobol indices for the model given in Eq. (5.27). The black circles show the theoretical values, computed analytically, and the error bars correspond to ±σ computed based on an ensemble of 10 runs.

Figure 5.17: Total-order Sobol indices for the model given in Eq. (5.27). The red bars show results based on Eq. (5.25), while the yellow bars are based on Eq. (5.26). The black circles show the theoretical values, computed analytically, and the error bars correspond to ±σ computed based on an ensemble of 10 runs. For this model, Eq. (5.26) provides more accurate estimates for S^T_i compared to results based on Eq. (5.25).


Karhunen-Loève Expansion of a Stochastic Process

•Located in examples/kle_ex1

•Some examples of the construction of 1D and 2D Karhunen-Loève (KL) expansions of a Gaussian stochastic process, based on sample realizations of this stochastic process.

Theory

Assume the stochastic process F(x, \omega) : D \times \Omega \to \mathbb{R} is an L^2 random field on D, with covariance function C(x, y). Then F can be written as

F(x, \omega) = \langle F(x, \omega) \rangle_\omega + \sum_{k=1}^{\infty} \sqrt{\lambda_k} \, f_k(x) \, \xi_k \qquad (5.28)

where f_k(x) are eigenfunctions of C(x, y) and \lambda_k are the corresponding eigenvalues (all positive). The random variables \xi_k are uncorrelated with unit variance. Projecting realizations of F onto f_k leads to samples of \xi_k. These samples are generally not independent. In the special case when F is a Gaussian random process, the \xi_k are i.i.d. normal random variables.

The KL expansion is optimal, i.e. of all possible orthonormal bases for L^2(D \times \Omega), the above \{f_k(x) \,|\, k = 1, 2, \ldots\} minimize the mean-square error in a finite linear representation of F(\cdot). If known, the covariance matrix can be specified analytically, e.g. in the square-exponential form

C(x, y) = \exp\left( -\frac{|x - y|^2}{c_l^2} \right) \qquad (5.29)

where |x - y| is the distance between x and y, and c_l is the correlation length. The covariance matrix can also be estimated from realizations, e.g.

C(x, y) = \frac{1}{N_\omega} \sum_{\omega} \left( F(x, \omega) - \langle F(x, \omega) \rangle_\omega \right) \left( F(y, \omega) - \langle F(y, \omega) \rangle_\omega \right) \qquad (5.30)

where N_\omega is the number of samples, and \langle F(x, \omega) \rangle_\omega is the mean over the random field realizations at x.

The eigenvalues and eigenvectors in Eq. (5.28) are solutions of the Fredholm equation of the second kind:

\int C(x, y) f(y) \, dy = \lambda f(x) \qquad (5.31)

One can employ the Nystrom algorithm [19] to discretize the integral on the left-hand side of Eq. (5.31):

\sum_{i=1}^{N_p} w_i \, C(x, y_i) f(y_i) = \lambda f(x) \qquad (5.32)


Here w_i are the weights for the quadrature rule that uses the N_p points y_i where realizations are provided. In a 1D configuration, one can employ the weights corresponding to the trapezoidal rule:

w_i = \begin{cases} \dfrac{y_2 - y_1}{2} & \text{if } i = 1, \\[4pt] \dfrac{y_{i+1} - y_{i-1}}{2} & \text{if } 2 \le i < N_p, \\[4pt] \dfrac{y_{N_p} - y_{N_p - 1}}{2} & \text{if } i = N_p. \end{cases} \qquad (5.33)

After further manipulation, Eq. (5.32) is written as

A g = \lambda g

where A = W K W and g = W f, with W being the diagonal matrix W_{ii} = \sqrt{w_i} and K_{ij} = C(x_i, y_j). Since the matrix A is symmetric, one can employ efficient algorithms to compute its eigenvalues \lambda_k and eigenvectors g_k. Currently UQTk relies on the dsyevx function provided by the LAPACK library.

The KL eigenvectors are computed as f_k = W^{-1} g_k, and samples of the random variables \xi_k are obtained by projecting realizations of the random process F onto the eigenmodes f_k:

\xi_k|_{\omega_l} = \left\langle F(x, \omega_l) - \langle F(x, \omega) \rangle_\omega, \; f_k(x) \right\rangle_x \Big/ \sqrt{\lambda_k}

Numerically, these projections can be estimated via quadrature:

\xi_k|_{\omega_l} = \sum_{i=1}^{N_p} w_i \left( F(x_i, \omega_l) - \langle F(x_i, \omega) \rangle_\omega \right) f_k(x_i) \Big/ \sqrt{\lambda_k} \qquad (5.34)
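The whole Nystrom pipeline of Eqs. (5.32)-(5.34) can be sketched in a few lines of NumPy. This is a simplified uniform-grid illustration using numpy.linalg.eigh rather than LAPACK's dsyevx directly, and the zero-mean realizations are synthesized from the KL representation itself rather than read from files:

```python
import numpy as np

Np, cl, n_modes, Nw = 129, 0.2, 4, 4096
x = np.linspace(0.0, 1.0, Np)

C = np.exp(-(x[:, None] - x[None, :]) ** 2 / cl**2)   # Eq. (5.29)

# Trapezoidal quadrature weights of Eq. (5.33) (uniform spacing here).
w = np.full(Np, x[1] - x[0])
w[0] *= 0.5
w[-1] *= 0.5
sqw = np.sqrt(w)

A = sqw[:, None] * C * sqw[None, :]                   # A = W K W, symmetric
lam, g = np.linalg.eigh(A)                            # symmetric eigensolve
lam, g = lam[::-1], g[:, ::-1]                        # largest modes first
f = g / sqw[:, None]                                  # f_k = W^{-1} g_k

# Synthesize zero-mean Gaussian realizations from the KL representation,
# then recover samples of xi_k via the projection of Eq. (5.34).
rng = np.random.default_rng(0)
Z = rng.standard_normal((Nw, Np))
F = Z @ (np.sqrt(np.clip(lam, 0.0, None))[:, None] * f.T)
xi = (F * w[None, :]) @ f[:, :n_modes] / np.sqrt(lam[:n_modes])
```

Because the modes satisfy the weighted orthogonality g_j^T g_k = δ_{jk}, the projected xi samples have unit variance, as required by Eq. (5.28).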

If F is a Gaussian process, the \xi_k are i.i.d. normal RVs, i.e. they automatically have first-order Wiener-Hermite Polynomial Chaos Expansions (PCE). In general, however, the KL RVs can be converted to PCEs (not shown in the current example).

1D Examples

In this section we present 1D RFs generated with kl_1D.x. The RFs are generated on a non-uniform 1D grid, with smaller grid spacing near x = 0 and larger grid spacing towards x = 1. This grid is computed using an algebraic expression [14]:

x_i = L \, \frac{\beta + 1 - (\beta - 1) r_i}{r_i + 1}, \quad r_i = \left( \frac{\beta + 1}{\beta - 1} \right)^{1 - \eta_i}, \quad \eta_i = \frac{i - 1}{N_p - 1}, \quad i = 1, 2, \ldots, N_p \qquad (5.35)

The factor \beta > 1 in the above expression controls the compression near x = 0, resulting in higher compression as \beta gets closer to 1. The examples shown in this section are based on the default values for the parameters that control the grid definition in kl_1D.x:

\beta = 1.1, \quad L = 1, \quad N_p = 129
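The grid of Eq. (5.35) is easy to reproduce in a few lines (a sketch with the default parameter values; kl_1D.x performs the equivalent computation in C++):

```python
import numpy as np

def stretched_grid(Np=129, L=1.0, beta=1.1):
    """Algebraic grid of Eq. (5.35); points cluster near x = 0, more strongly
    as beta approaches 1 (defaults match the kl_1D.x values above)."""
    i = np.arange(1, Np + 1)
    eta = (i - 1.0) / (Np - 1.0)
    r = ((beta + 1.0) / (beta - 1.0)) ** (1.0 - eta)
    return L * (beta + 1.0 - (beta - 1.0) * r) / (r + 1.0)

x = stretched_grid()
# Endpoints are x[0] = 0 and x[-1] = L; spacing increases away from x = 0.
```

Plotting np.diff(x) confirms the compression near the left boundary that the covariance-matrix figures below refer to.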


Figure 5.18 shows sample realizations of 1D random fields (RF) generated with a square-exponential covariance matrix employing several correlation lengths c_l. These figures were generated with

./mkplots.py samples 0.05
./mkplots.py samples 0.10
./mkplots.py samples 0.20

(a) c_l = 0.05 (b) c_l = 0.10 (c) c_l = 0.20

Figure 5.18: Sample 1D random field realizations for several correlation lengths c_l.

Once the RF realizations are generated, the covariance matrix is discarded and a "numerical" covariance matrix is estimated from the available realizations. Figure 5.19 shows a shaded illustration of covariance matrices computed using several sets of 1D RF samples. These figures were generated with

./mkplots.py numcov 0.05 512
./mkplots.py numcov 0.20 512
./mkplots.py numcov 0.05 8192
./mkplots.py numcov 0.20 8192
./mkplots.py numcov 0.05 131072
./mkplots.py numcov 0.20 131072

These matrices employ RF samples generated on a non-uniform grid with a higher density of points near the left boundary. Hence, the matrix entries near the diagonal in the upper right corner show larger values. Grid points grow further apart away from the left boundary, hence the region near the diagonal grows thinner for these grid points.

Figure 5.20 shows the eigenvalue solution of the Fredholm equation (5.31) in its discretized form given by Eq. (5.32). This figure was generated with

./mkplots.py pltKLeig1D 512 131072

For this 1D example problem, 2^9 = 512 RF realizations are sufficient to estimate the KL eigenvalue spectrum. As the correlation length decreases, the eigenvalues decay more slowly, suggesting that more terms are needed to represent the RF fluctuations.

Figure 5.21 shows the first four KL eigenvectors corresponding to c_l = 0.05, scaled by the square root of the corresponding eigenvalue. These plots were generated with

./mkplots.py numKLevec 0.05 512 on


(a) c_l = 0.05, N_ω = 2^9 (b) c_l = 0.05, N_ω = 2^13 (c) c_l = 0.05, N_ω = 2^17

(d) c_l = 0.20, N_ω = 2^9 (e) c_l = 0.20, N_ω = 2^13 (f) c_l = 0.20, N_ω = 2^17

Figure 5.19: Illustration of covariance matrices computed from 1D RF realizations. Red corresponds to large values, close to 1, while blue corresponds to small values, close to 0.

./mkplots.py numKLevec 0.05 8192 off

./mkplots.py numKLevec 0.05 131072 off

Unlike the eigenvalue spectrum, the eigenvectors are very sensitive to the covariance matrix entries. For c_l = 0.05, a large number of RF realizations, e.g. N_ω = 2^17 in Fig. 5.21c, is required to compute a covariance matrix with KL modes that are close to the ones based on the analytical covariance matrix (analytical modes not shown).

Figure 5.22 shows the first four KL eigenvectors corresponding to c_l = 0.20, scaled by the square root of the corresponding eigenvalue. These plots were generated with

./mkplots.py numKLevec 0.20 512 on
./mkplots.py numKLevec 0.20 8192 off
./mkplots.py numKLevec 0.20 131072 off

For larger correlation lengths, a smaller number of samples is sufficient to estimate the covariance matrix and subsequently the KL modes. The results based on N_ω = 2^13 = 8192 RF realizations, in Fig. 5.22b, are close to the ones based on a much larger number of realizations, N_ω = 2^17 = 131072, in Fig. 5.22c.

One can exploit the orthogonality of the KL modes to compute samples of the germ ξ_k introduced in Eq. (5.28). These samples are computed via Eq. (5.34) and are saved in files xidata* in the corresponding run directories. Using the ξ samples, one can estimate their density via Kernel Density Estimation (KDE), as shown in Figures 5.23 and 5.24. These figures were


Figure 5.20: KL eigenvalues estimated with two sets of RF realizations: 2^9 = 512 (dashed lines) and 2^17 = 131072 (solid lines).

(a) N_ω = 2^9 (b) N_ω = 2^13 (c) N_ω = 2^17

Figure 5.21: Illustration of the first 4 KL modes, computed based on numerical covariance matrices using three sets of RF realizations with c_l = 0.05

generated with

./mkplots.py xidata 0.05 512 ./mkplots.py xidata 0.20 512

./mkplots.py xidata 0.05 131072 ./mkplots.py xidata 0.20 131072

Independent of the correlation length, a relatively large number of samples is required for "converged" estimates of the density of ξ.

Figures 5.25 and 5.26 show reconstructions of select RF realizations. As observed in the figure showing the decay in magnitude of the KL eigenvalues, more terms are needed to represent the small-scale features occurring for smaller correlation lengths, in Fig. 5.25, compared to RFs with larger correlation lengths, e.g. the example shown in Fig. 5.26. The plots shown in Figs. 5.25 and 5.26 were generated with


(a) N_ω = 2^9 (b) N_ω = 2^13 (c) N_ω = 2^17

Figure 5.22: Illustration of the first 4 KL modes, computed based on numerical covariance matrices using three sets of RF realizations with c_l = 0.20

(a) N_ω = 2^9 (b) N_ω = 2^17

Figure 5.23: Probability densities for ξ_k obtained via KDE using samples obtained by projecting RF realizations onto KL modes. Results correspond to c_l = 0.05.

./mkplots.py pltKLrecon1D 0.05 21 51 10

./mkplots.py pltKLrecon1D 0.10 63 21 4

2D Examples on Structured Grids

In this section we present 2D RFs generated with kl_2D.x. The RFs are generated on a non-uniform structured 2D grid [0, L_x] × [0, L_y], with smaller grid spacing near the boundaries and larger grid spacing towards the center of the domain. This grid is computed using an algebraic expression [14]. The first coordinate is computed via

x_i = L_x \, \frac{(2\alpha + \beta) r_i + 2\alpha - \beta}{(2\alpha + 1)(1 + r_i)}, \quad r_i = \left( \frac{\beta + 1}{\beta - 1} \right)^{\frac{\eta_i - \alpha}{1 - \alpha}}, \quad \eta_i = \frac{i - 1}{N_p - 1}, \quad i = 1, 2, \ldots, N_x \qquad (5.36)


(a) N_ω = 2^9 (b) N_ω = 2^17

Figure 5.24: Probability densities for ξ_k obtained via KDE using samples obtained by projecting RF realizations onto KL modes. Results correspond to c_l = 0.20.

(a) Mean (b) Mean + 10 terms (c) Mean + 20 terms

(d) Mean + 30 terms (e) Mean + 40 terms (f) Mean + 50 terms

Figure 5.25: Reconstructing realizations with an increasing number of KL expansion terms

for cl= 0.05

The factor β > 1 in the above expression controls the compression near x = 0 and x = L_x, while α ∈ [0, 1] determines where the clustering occurs. The examples shown in this section are based on the default values for the parameters that control the grid definition in kl_2D.x:

\alpha = 1/2, \quad \beta = 1.1, \quad L_{x_1} = L_{x_2} = L = 1, \quad N_{x_1} = N_{x_2} = 65

Figure 5.27 shows the 2D computational grid created with these parameters. This figure was generated with the Python script pl2Dsgrid.py:

./pl2Dsgrid.py cvspl2D_0.1_4096

Figure 5.28 shows 2D RF realizations with correlation lengths c_l = 0.1 and c_l = 0.2. As the correlation length increases, the realizations become smoother. These figures were generated with

./mkplots.py samples2D 0.1 4096 2 (Figs. 5.28a, 5.28b)


(a) Mean (b) Mean + 4 terms (c) Mean + 8 terms

(d) Mean + 12 terms (e) Mean + 16 terms (f) Mean + 20 terms

Figure 5.26: Reconstructing realizations with an increasing number of KL expansion terms

for cl= 0.10

./mkplots.py samples2D 0.2 4096 2 (Figs. 5.28c,5.28d)

In a 2D configuration, the integral in Eq. (5.31) is discretized using a 2D finite volume approach:

\int \mathrm{Cov}(x, y) f(y) \, dy \approx \sum_{i=1}^{N_{x_1}-1} \sum_{j=1}^{N_{x_2}-1} \left( \mathrm{Cov}(x, y) f(y) \right)\big|_{ij} \, A_{ij} \qquad (5.37)

Here, A_{ij} is the area of the rectangle (ij) with lower left corner (i, j) and upper right corner (i+1, j+1), and (\mathrm{Cov}(x, y) f(y))|_{ij} is the average over rectangle (ij), computed as the arithmetic average of the values at its four vertices. Eq. (5.37) can be further cast as

\int \mathrm{Cov}(x, y) f(y) \, dy \approx \sum_{i=1}^{N_{x_1}} \sum_{j=1}^{N_{x_2}} \left( \mathrm{Cov}(x, y) f(y) \right)_{i,j} \, w_{i,j}, \qquad (5.38)

where w_{i,j} is a quarter of the area of all rectangles that surround vertex (i, j).
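On a tensor-product grid, the quarter-area rule for the vertex weights of Eq. (5.38) reduces to a product of 1D trapezoidal weights; a sketch:

```python
import numpy as np

def fv_weights_2d(x1, x2):
    """Vertex weights w_{i,j} of Eq. (5.38): each vertex receives a quarter
    of the area of every surrounding rectangle, which on a tensor-product
    grid equals the product of 1D trapezoidal weights."""
    def trap(x):
        w = np.empty_like(x, dtype=float)
        w[0] = 0.5 * (x[1] - x[0])
        w[-1] = 0.5 * (x[-1] - x[-2])
        w[1:-1] = 0.5 * (x[2:] - x[:-2])
        return w
    return np.outer(trap(x1), trap(x2))

w = fv_weights_2d(np.linspace(0.0, 1.0, 65), np.linspace(0.0, 1.0, 65))
# The weights sum to the area of the domain.
```

The same weights feed the symmetrization A = W K W used for the 1D case, now with the grid flattened to a single index.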

Figures 5.29 and 5.30 show the first 8 KL modes computed based on covariance matrices that were estimated from 2^12 = 4096 and 2^16 = 65536 RF samples, respectively, with correlation length c_l = 0.1 for both sets. The results in Fig. 5.30 are close to the KL modes corresponding to the analytical covariance matrix (results not shown), while the results in Fig. 5.29 indicate that 2^12 RF realizations are not sufficient to generate converged KL modes. These figures were generated with

./mkplots.py numKLevec2D 0.1 4096 (Fig. 5.29)
./mkplots.py numKLevec2D 0.1 65536 (Fig. 5.30)

Figure 5.31 shows the first 8 KL modes computed based on a covariance matrix that was


Figure 5.27: Structured grid employed for 2D RF examples.

(a) c_l = 0.1 (b) c_l = 0.1 (c) c_l = 0.2 (d) c_l = 0.2

Figure 5.28: Sample 2D random field realizations for c_l = 0.1 and c_l = 0.2.

estimated from 2^12 = 4096 RF samples. For these results, with correlation length c_l = 0.5, 2^12 samples are sufficient to estimate the covariance matrix and subsequently KL modes that are close to the analytical results (results not shown). The plots in Fig. 5.31 were generated with

./mkplots.py numKLevec2D 0.5 4096

Figures 5.32 and 5.33 show reconstructions of select 2D RF realizations. As observed in the previous section for 1D RFs, more terms are needed to represent the small-scale features occurring for smaller correlation lengths (Fig. 5.32) than for RFs with larger correlation lengths, e.g. the example shown in Fig. 5.33. The plots shown in Figs. 5.32 and 5.33 were generated with

./mkplots.py pltKLrecon2D 0.2 3 85 12 (Fig. 5.32)

./mkplots.py pltKLrecon2D 0.5 37 36 5 (Fig. 5.33)
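The truncation effect shown in Figs. 5.32 and 5.33 can be sketched numerically: reconstruct one realization with an increasing number of KL terms and monitor the error. This is an illustrative 1D analogue (not the mkplots.py internals); the grid size and kernel are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
npts, cl = 128, 0.2
x = np.linspace(0, 1, npts)
C = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * cl ** 2))
lam, phi = np.linalg.eigh(C)
lam, phi = np.clip(lam[::-1], 0.0, None), phi[:, ::-1]  # descending, PSD-clipped

# One realization synthesized from the full expansion
xi = rng.standard_normal(npts)
f = phi @ (np.sqrt(lam) * xi)

# Truncated reconstructions: error drops as terms are added
errs = []
for k in (4, 8, 16, 32):
    f_k = phi[:, :k] @ (np.sqrt(lam[:k]) * xi[:k])      # first k KL terms
    errs.append(float(np.linalg.norm(f - f_k) / np.linalg.norm(f)))
    print(f"{k:3d} terms: relative L2 error {errs[-1]:.4f}")
```

Because the retained terms are nested, the error is non-increasing in the number of terms; a smaller correlation length shifts spectral energy to higher modes, so more terms are needed for the same accuracy.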


Figure 5.29: Illustration of the first 8 KL modes (panels a-h), computed from a numerical covariance matrix estimated using 2^12 2D RF realizations with cl = 0.1.

2D Examples on Unstructured Grids

For this example we choose a computational domain that resembles the shape of California. A total of 2^12 = 4096 points were randomly distributed inside this computational domain, and a triangular grid with 8063 triangles was generated via Delaunay triangulation. The 2D grid point locations are provided in "data/cali_grid.dat" and the grid point connectivities in "data/cali_tria.dat". Figure 5.34 shows the placement of these grid points, including an inset plot with the triangular grid connectivities. This figure shows the grids on a uniform scale in terms of latitude and longitude degrees and was generated with

./pl2Dugrid.py
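The grid-generation step can be sketched with SciPy's Delaunay triangulation. The snippet below is a hypothetical stand-in for how point and connectivity files like those shipped with UQTk could be produced; the unit-square domain and point count are assumptions, not the actual California geometry.

```python
import numpy as np
from scipy.spatial import Delaunay

# Scatter random points in a 2D domain and build triangle connectivities
rng = np.random.default_rng(1)
pts = rng.random((4096, 2))                 # random points in the unit square
tri = Delaunay(pts)

# tri.simplices is an (ntri, 3) array of vertex indices, one row per triangle,
# which is the same information stored in the connectivity data file
print(pts.shape, tri.simplices.shape)
```

For n points whose convex hull has h vertices, the triangulation contains 2n − 2 − h triangles, which is why roughly twice as many triangles as points appear in the California example.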

Figure 5.35 shows 2D RF realizations with correlation lengths cl = 0.5° and cl = 2°. These figures were generated with

./mkplots.py samples2Du 0.5 4096 2 (Figs. 5.35a,5.35b)

./mkplots.py samples2Du 2.0 4096 2 (Figs. 5.35c,5.35d)

Figure 5.30: Illustration of the first 8 KL modes (panels a-h), computed from a numerical covariance matrix estimated using 2^16 2D RF realizations with cl = 0.1.

Figure 5.36 shows the first 16 KL modes computed from a covariance matrix that was estimated from 2^16 = 65536 RF samples, with correlation length cl = 0.5°. The KL modes corresponding to an analytically estimated covariance matrix with the same correlation length are shown in Fig. 5.37. For this example, it seems that 2^16 samples are sufficient to estimate the first 12 to 13 modes accurately. Please note that some of the modes can differ by a multiplicative factor of −1, in which case the color scheme is reversed.

Higher-order modes start diverging from the analytical estimates, e.g. modes 14 through 16 in this example. Figure 5.38 shows KL modes corresponding to a covariance matrix estimated from RF realizations with cl = 2°. For this correlation length, 2^16 samples are sufficient to generate KL modes that are very close to the analytical results (not shown). These figures were generated with

./mkplots.py numKLevec2Du 0.5 65536 (Fig. 5.36)

./mkplots.py anlKLevec2Du SqExp 0.5 (Fig. 5.37)

./mkplots.py numKLevec2Du 2.0 65536 (Fig. 5.38)


Figure 5.31: Illustration of the first 8 KL modes (panels a-h), computed from a numerical covariance matrix estimated using 2^12 2D RF realizations with cl = 0.5.

Figure 5.32: Reconstructing 2D realizations with an increasing number of KL expansion terms for cl = 0.2. Panels (a)-(h): mean, then mean plus 12, 24, 36, 48, 60, 72, and 84 terms.


Figure 5.33: Reconstructing 2D realizations with an increasing number of KL expansion terms for cl = 0.5. Panels (a)-(h): mean, then mean plus 5, 10, 15, 20, 25, 30, and 35 terms.

Figure 5.34: Unstructured grid generated via Delaunay triangulation, overlaid on California.


Figure 5.35: Sample 2D random field realizations on an unstructured grid for cl = 0.5° (panels a, b) and cl = 2° (panels c, d).

Figure 5.36: Illustration of select KL modes (1, 2, 3, 6, 10, and 16), computed from a numerical covariance matrix estimated using 2^16 2D RF realizations on an unstructured grid with cl = 0.5°.


Figure 5.37: Illustration of select KL modes (1, 2, 3, 6, 10, and 16), computed from an analytical covariance matrix of square-exponential form for 2D RF realizations on an unstructured grid with cl = 0.5°.

Figure 5.38: Illustration of select KL modes (1, 2, 3, 6, 10, and 16), computed from a numerical covariance matrix estimated using 2^16 2D RF realizations on an unstructured grid with cl = 2°.


Chapter 6

Support

UQTk is the subject of continual development and improvement. If you have questions about or suggestions for UQTk, feel free to e-mail the UQTk Users list at uqtk-users@software.sandia.gov. You can sign up for this mailing list at https://software.sandia.gov/mailman/listinfo/uqtk-users.



DISTRIBUTION:

1 MS 0899 Technical Library, 8944 (electronic copy)
