# PyTorch Py Torch Under The Hood A Guide To Understand Internals

User Manual:

Open the PDF directly: View PDF .

Page Count: 105 [warning: Documents this large are best viewed by clicking the View PDF Link!]

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

Agenda

TENSORS

Tensors

Python objects

Zero-copy

Tensor storage

Memory allocators (CPU/GPU)

The big picture

JIT

Just-in-time compiler

Tracing

Scripting

Why TorchScript ?

Building IR and JIT Phases

Optimizations

Serialization

Using models in other languages

PRODUCTION

Some tips

Q&A

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

DISCLAIMER

ÉPyTorch is a moving target, Deep Learning ecosystem moves

fast and big changes happens every week;

ÉThis is not a talk to teach you the basics of PyTorch or how to

train your network, but to teach you how PyTorch

components works under the hood in a intuitive way;

ÉThis talk is updated to the PyTorch v.1.0.1 version;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

DISCLAIMER

ÉPyTorch is a moving target, Deep Learning ecosystem moves

fast and big changes happens every week;

ÉThis is not a talk to teach you the basics of PyTorch or how to

train your network, but to teach you how PyTorch

components works under the hood in a intuitive way;

ÉThis talk is updated to the PyTorch v.1.0.1 version;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

DISCLAIMER

ÉPyTorch is a moving target, Deep Learning ecosystem moves

fast and big changes happens every week;

ÉThis is not a talk to teach you the basics of PyTorch or how to

train your network, but to teach you how PyTorch

components works under the hood in a intuitive way;

ÉThis talk is updated to the PyTorch v.1.0.1 version;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

Simply put, TENSORS are a generalization of vectors and matrices.

In PyTorch, they are a multi-dimensional matrix containing elements

of a single data type.

>>> import torch

>>> t=torch.tensor([[1.,-1.], [1.,-1.]])

>>> t

tensor([[ 1., -1.]

[ 1., -1.]])

>>> t.dtype # They have a type

torch.float32

>>> t.shape # a shape

torch.Size([2, 2])

>>> t.device # and live in some device

device(type='cpu')

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

Simply put, TENSORS are a generalization of vectors and matrices.

In PyTorch, they are a multi-dimensional matrix containing elements

of a single data type.

>>> import torch

>>> t=torch.tensor([[1.,-1.], [1.,-1.]])

>>> t

tensor([[ 1., -1.]

[ 1., -1.]])

>>> t.dtype # They have a type

torch.float32

>>> t.shape # a shape

torch.Size([2, 2])

>>> t.device # and live in some device

device(type='cpu')

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

Simply put, TENSORS are a generalization of vectors and matrices.

In PyTorch, they are a multi-dimensional matrix containing elements

of a single data type.

>>> import torch

>>> t=torch.tensor([[1.,-1.], [1.,-1.]])

>>> t

tensor([[ 1., -1.]

[ 1., -1.]])

>>> t.dtype # They have a type

torch.float32

>>> t.shape # a shape

torch.Size([2, 2])

>>> t.device # and live in some device

device(type='cpu')

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

Simply put, TENSORS are a generalization of vectors and matrices.

In PyTorch, they are a multi-dimensional matrix containing elements

of a single data type.

>>> import torch

>>> t=torch.tensor([[1.,-1.], [1.,-1.]])

>>> t

tensor([[ 1., -1.]

[ 1., -1.]])

>>> t.dtype # They have a type

torch.float32

>>> t.shape # a shape

torch.Size([2, 2])

>>> t.device # and live in some device

device(type='cpu')

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

Simply put, TENSORS are a generalization of vectors and matrices.

In PyTorch, they are a multi-dimensional matrix containing elements

of a single data type.

>>> import torch

>>> t=torch.tensor([[1.,-1.], [1.,-1.]])

>>> t

tensor([[ 1., -1.]

[ 1., -1.]])

>>> t.dtype # They have a type

torch.float32

>>> t.shape # a shape

torch.Size([2, 2])

>>> t.device # and live in some device

device(type='cpu')

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

É

Although PyTorch has an elegant python ﬁrst design, all PyTorch

heavy work is actually implemented in C++.

ÉIn Python, the integration of C++ code is (usually) done using

what is called an extension;

ÉPyTorch uses ATen, which is the foundational tensor operation

library on which all else is built;

É

To do automatic differentiation, PyTorch uses

Autograd

, which

is an augmentation on top of the ATen framework;

ÉIn the Python API, PyTorch previously had separate

Variable and a Tensor types, after v.0.4.0 they were

merged into Tensor .

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

É

Although PyTorch has an elegant python ﬁrst design, all PyTorch

heavy work is actually implemented in C++.

ÉIn Python, the integration of C++ code is (usually) done using

what is called an extension;

ÉPyTorch uses ATen, which is the foundational tensor operation

library on which all else is built;

É

To do automatic differentiation, PyTorch uses

Autograd

, which

is an augmentation on top of the ATen framework;

ÉIn the Python API, PyTorch previously had separate

Variable and a Tensor types, after v.0.4.0 they were

merged into Tensor .

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

É

Although PyTorch has an elegant python ﬁrst design, all PyTorch

heavy work is actually implemented in C++.

ÉIn Python, the integration of C++ code is (usually) done using

what is called an extension;

ÉPyTorch uses ATen, which is the foundational tensor operation

library on which all else is built;

É

To do automatic differentiation, PyTorch uses

Autograd

, which

is an augmentation on top of the ATen framework;

ÉIn the Python API, PyTorch previously had separate

Variable and a Tensor types, after v.0.4.0 they were

merged into Tensor .

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSORS

É

Although PyTorch has an elegant python ﬁrst design, all PyTorch

heavy work is actually implemented in C++.

ÉIn Python, the integration of C++ code is (usually) done using

what is called an extension;

ÉPyTorch uses ATen, which is the foundational tensor operation

library on which all else is built;

É

To do automatic differentiation, PyTorch uses

Autograd

, which

is an augmentation on top of the ATen framework;

ÉIn the Python API, PyTorch previously had separate

Variable and a Tensor types, after v.0.4.0 they were

merged into Tensor .

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

typedef struct {

PyObject_HEAD

double ob_fval;

} PyFloatObject;

typedef struct _object {

Py_ssize_t ob_refcnt;

struct _typeobject *ob_type;

} PyObject;

struct _typeobject *ob_type

Py_ssize_t ob_refcnt

object

PyObject

double ob_fval

PyObject_HEAD

object

PyFloatObject

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

typedef struct {

PyObject_HEAD

double ob_fval;

} PyFloatObject;

typedef struct _object {

Py_ssize_t ob_refcnt;

struct _typeobject *ob_type;

} PyObject;

struct _typeobject *ob_type

Py_ssize_t ob_refcnt

object

PyObject

double ob_fval

PyObject_HEAD

object

PyFloatObject

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

typedef struct {

PyObject_HEAD

double ob_fval;

} PyFloatObject;

typedef struct _object {

Py_ssize_t ob_refcnt;

struct _typeobject *ob_type;

} PyObject;

struct _typeobject *ob_type

Py_ssize_t ob_refcnt

object

PyObject

double ob_fval

PyObject_HEAD

object

PyFloatObject

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

struct THPVariable {

PyObject_HEAD

torch::autograd::Variable cdata;

PyObject*backward_hooks;

};

(object fields)

PyObject_HEAD (w/ ref counter)

object

THPVariable

variable_a

variable_b

Ref Count = 1

Ref Count = 2

The TH preﬁx is from TorcH, and Pmeans Python.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

QUICK RECAP PYTHON OBJECTS

struct THPVariable {

PyObject_HEAD

torch::autograd::Variable cdata;

PyObject*backward_hooks;

};

(object fields)

PyObject_HEAD (w/ ref counter)

object

THPVariable

variable_a

variable_b

Ref Count = 1

Ref Count = 2

The TH preﬁx is from TorcH, and Pmeans Python.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

INPYTHON,EVERYTHING IS AN OBJECT

>>> a= 300

>>> b= 300

>>> ais b

False

>>> a= 200

>>> b= 200

>>> ais b

True

(object fields)

PyObject_HEAD

object

PyIntObject

a

b

Ref Count = 1

Ref Count = 2

(object fields)

PyObject_HEAD

object

PyIntObject

(object fields)

PyObject_HEAD

object

PyIntObject

a

b

Ref Count = 1

Ref Count = 1

A typical Python program spend much of its time

allocating/deallocating integers. CPython then caches the small

integers.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

INPYTHON,EVERYTHING IS AN OBJECT

>>> a= 300

>>> b= 300

>>> ais b

False

>>> a= 200

>>> b= 200

>>> ais b

True

(object fields)

PyObject_HEAD

object

PyIntObject

a

b

Ref Count = 1

Ref Count = 2

(object fields)

PyObject_HEAD

object

PyIntObject

(object fields)

PyObject_HEAD

object

PyIntObject

a

b

Ref Count = 1

Ref Count = 1

A typical Python program spend much of its time

allocating/deallocating integers. CPython then caches the small

integers.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

INPYTHON,EVERYTHING IS AN OBJECT

>>> a= 300

>>> b= 300

>>> ais b

False

>>> a= 200

>>> b= 200

>>> ais b

True

(object fields)

PyObject_HEAD

object

PyIntObject

a

b

Ref Count = 1

Ref Count = 2

(object fields)

PyObject_HEAD

object

PyIntObject

(object fields)

PyObject_HEAD

object

PyIntObject

a

b

Ref Count = 1

Ref Count = 1

A typical Python program spend much of its time

allocating/deallocating integers. CPython then caches the small

integers.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

It is very common to load tensors in numpy and convert them to

PyTorch, or vice-versa;

>>> np_array =np.ones((2,2))

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.tensor(np_array)

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

>>> torch_array.add_(1.0)

>>> np_array

array([[1., 1.], # array is intact, a copy was made

[1., 1.]])

Underline after an operation means an in-place operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

It is very common to load tensors in numpy and convert them to

PyTorch, or vice-versa;

>>> np_array =np.ones((2,2))

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.tensor(np_array)

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

>>> torch_array.add_(1.0)

>>> np_array

array([[1., 1.], # array is intact, a copy was made

[1., 1.]])

Underline after an operation means an in-place operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

It is very common to load tensors in numpy and convert them to

PyTorch, or vice-versa;

>>> np_array =np.ones((2,2))

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.tensor(np_array)

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

>>> torch_array.add_(1.0)

>>> np_array

array([[1., 1.], # array is intact, a copy was made

[1., 1.]])

Underline after an operation means an in-place operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

It is very common to load tensors in numpy and convert them to

PyTorch, or vice-versa;

>>> np_array =np.ones((2,2))

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.tensor(np_array)

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

>>> torch_array.add_(1.0)

>>> np_array

array([[1., 1.], # array is intact, a copy was made

[1., 1.]])

Underline after an operation means an in-place operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

ÉNow imagine that you have a batch of 128 images, 3 channels

each (RGB) and with size of 224x224;

0

1

1

1

0

0

1

1

1

0

0

1

1

1

1

1

1

0

1

0

1

1

1

1

0

1

0

0

0

1

1

1

0

1

0

0

1

0

0

1

1

0

0

1

1

0

0

0

Column

Row

Channel

ÉThis will yield a size in memory of ∼74MB. We don’t want to

duplicate memory (except when copying them to discrete GPUs

of course);

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Let’s see now a slightly different code using the function

torch.from_numpy() this time:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> torch_array.add_(1.0)

>>> np_array

array([[2., 2.],

[2., 2.]])

The original numpy array was changed, because it used a zero-copy

operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Let’s see now a slightly different code using the function

torch.from_numpy() this time:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> torch_array.add_(1.0)

>>> np_array

array([[2., 2.],

[2., 2.]])

The original numpy array was changed, because it used a zero-copy

operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Let’s see now a slightly different code using the function

torch.from_numpy() this time:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> torch_array.add_(1.0)

>>> np_array

array([[2., 2.],

[2., 2.]])

The original numpy array was changed, because it used a zero-copy

operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Let’s see now a slightly different code using the function

torch.from_numpy() this time:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> torch_array.add_(1.0)

>>> np_array

array([[2., 2.],

[2., 2.]])

The original numpy array was changed, because it used a zero-copy

operation.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Difference between in-place and standard operations might not be

so clear in some cases:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> np_array =np_array + 1.0

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

However, if you use np_array += 1.0 , that is an in-place

operation that will change torch_array memory.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Difference between in-place and standard operations might not be

so clear in some cases:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> np_array =np_array + 1.0

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

However, if you use np_array += 1.0 , that is an in-place

operation that will change torch_array memory.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Difference between in-place and standard operations might not be

so clear in some cases:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> np_array =np_array + 1.0

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

However, if you use np_array += 1.0 , that is an in-place

operation that will change torch_array memory.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

Difference between in-place and standard operations might not be

so clear in some cases:

>>> np_array

array([[1., 1.],

[1., 1.]])

>>> torch_array =torch.from_numpy(np_array)

>>> np_array =np_array + 1.0

>>> torch_array

tensor([[1., 1.],

[1., 1.]], dtype=torch.float64)

However, if you use np_array += 1.0 , that is an in-place

operation that will change torch_array memory.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

ZERO-COPYING TENSORS

at::Tensor tensor_from_numpy(PyObject*obj) {

// (...) - omitted for brevity

auto array =(PyArrayObject*)obj;

int ndim =PyArray_NDIM(array);

auto sizes =to_aten_shape(ndim, PyArray_DIMS(array));

auto strides =to_aten_shape(ndim, PyArray_STRIDES(array));

// (...) - omitted for brevity

void*data_ptr =PyArray_DATA(array);

auto&type =CPU(dtype_to_aten(PyArray_TYPE(array)));

Py_INCREF(obj);

return type.tensorFromBlob(data_ptr, sizes, strides,

[obj](void*data) {

AutoGIL gil;

Py_DECREF(obj);

});

}

Pay attention to the reference counting using

Py_INCREF()

and the

call to tensorFromBlob() function.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

DATA POINTERS

(object fields)

data_pointer*

object

PyArrayObject

(object fields)

data_pointer*

object

FloatTensor

The tensor FloatTensor did a copy of the numpy array data

pointer and not of the contents. The reference is kept safe by the

Python reference counting mechanism.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

The abstraction responsible for holding the data isn’t actually the

Tensor , but the Storage .

struct C10_API StorageImpl final :(...) {

// (...)

private:

// (...)

caffe2::TypeMeta data_type_;

DataPtr data_ptr_;

int64_t numel_;

Allocator*allocator_;

}

É

Holds a pointer to the raw data and contains information such as

the size and allocator;

ÉStorage is a dumb abstraction, there is no metadata telling us

how to interpret the data it holds;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

The abstraction responsible for holding the data isn’t actually the

Tensor , but the Storage .

struct C10_API StorageImpl final :(...) {

// (...)

private:

// (...)

caffe2::TypeMeta data_type_;

DataPtr data_ptr_;

int64_t numel_;

Allocator*allocator_;

}

É

Holds a pointer to the raw data and contains information such as

the size and allocator;

ÉStorage is a dumb abstraction, there is no metadata telling us

how to interpret the data it holds;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

The abstraction responsible for holding the data isn’t actually the

Tensor , but the Storage .

struct C10_API StorageImpl final :(...) {

// (...)

private:

// (...)

caffe2::TypeMeta data_type_;

DataPtr data_ptr_;

int64_t numel_;

Allocator*allocator_;

}

É

Holds a pointer to the raw data and contains information such as

the size and allocator;

ÉStorage is a dumb abstraction, there is no metadata telling us

how to interpret the data it holds;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

É

The

Storage

abstraction is very powerful because it decouples

the raw data and how we can interpret it;

ÉWe can have multiple tensors sharing the same storage, but

with different interpretations, also called views, but without

duplicating memory:

>>> tensor_a =torch.ones((2,2))

>>> tensor_b =tensor_a.view(4)

>>> tensor_a_data =tensor_a.storage().data_ptr()

>>> tensor_b_data =tensor_b.storage().data_ptr()

>>> tensor_a_data == tensor_b_data

True

Étensor_b is a different view (interpretation) of the same data

present in the underlying storage that is shared between both

tensors.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

É

The

Storage

abstraction is very powerful because it decouples

the raw data and how we can interpret it;

ÉWe can have multiple tensors sharing the same storage, but

with different interpretations, also called views, but without

duplicating memory:

>>> tensor_a =torch.ones((2,2))

>>> tensor_b =tensor_a.view(4)

>>> tensor_a_data =tensor_a.storage().data_ptr()

>>> tensor_b_data =tensor_b.storage().data_ptr()

>>> tensor_a_data == tensor_b_data

True

Étensor_b is a different view (interpretation) of the same data

present in the underlying storage that is shared between both

tensors.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

É

The

Storage

abstraction is very powerful because it decouples

the raw data and how we can interpret it;

ÉWe can have multiple tensors sharing the same storage, but

with different interpretations, also called views, but without

duplicating memory:

>>> tensor_a =torch.ones((2,2))

>>> tensor_b =tensor_a.view(4)

>>> tensor_a_data =tensor_a.storage().data_ptr()

>>> tensor_b_data =tensor_b.storage().data_ptr()

>>> tensor_a_data == tensor_b_data

True

Étensor_b is a different view (interpretation) of the same data

present in the underlying storage that is shared between both

tensors.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TENSOR STORAGE

É

The

Storage

abstraction is very powerful because it decouples

the raw data and how we can interpret it;

ÉWe can have multiple tensors sharing the same storage, but

with different interpretations, also called views, but without

duplicating memory:

>>> tensor_a =torch.ones((2,2))

>>> tensor_b =tensor_a.view(4)

>>> tensor_a_data =tensor_a.storage().data_ptr()

>>> tensor_b_data =tensor_b.storage().data_ptr()

>>> tensor_a_data == tensor_b_data

True

Étensor_b is a different view (interpretation) of the same data

present in the underlying storage that is shared between both

tensors.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

MEMORY ALLOCATORS (CPU/GPU)

ÉThe tensor storage can be allocated either in the CPU memory

or GPU, therefore a mechanism is required to switch between

these different allocations:

struct Allocator {

virtual ~Allocator() {}

virtual DataPtr allocate(size_t n) const = 0;

virtual DeleterFnPtr raw_deleter() const {...}

void*raw_allocate(size_t n) {...}

void raw_deallocate(void*ptr) {...}

};

É

There are

Allocator

s that will use the GPU allocators such as

cudaMallocHost() when the storage should be used for the

GPU or posix_memalign() POSIX functions for data in the

CPU memory.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

MEMORY ALLOCATORS (CPU/GPU)

ÉThe tensor storage can be allocated either in the CPU memory

or GPU, therefore a mechanism is required to switch between

these different allocations:

struct Allocator {

virtual ~Allocator() {}

virtual DataPtr allocate(size_t n) const = 0;

virtual DeleterFnPtr raw_deleter() const {...}

void*raw_allocate(size_t n) {...}

void raw_deallocate(void*ptr) {...}

};

É

There are

Allocator

s that will use the GPU allocators such as

cudaMallocHost() when the storage should be used for the

GPU or posix_memalign() POSIX functions for data in the

CPU memory.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

THE BIG PICTURE

(object fields)

Storage *storage

object

Tensor

Allocator *allocator

(object fields)

DataPtr data_ptr

object

Storage

raw_deallocate()

(object fields)

raw_allocate()

object

Allocator

Raw Data

ÉThe Tensor has a Storage which in turn has a pointer to

the raw data and to the Allocator to allocate memory

according to the destination device.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

ÉPyTorch is eager by design, which means that it is easily

hackable to debug, inspect, etc;

ÉHowever, this poses problems for optimization and for

decoupling it from Python (the model itself is Python code);

ÉPyTorch 1.0 introduced torch.jit , which has two main

methods to convert a PyTorch model to a serializable and

optimizable format;

ÉTorchScript was also introduced as a statically-typed subset of

Python;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

ÉPyTorch is eager by design, which means that it is easily

hackable to debug, inspect, etc;

ÉHowever, this poses problems for optimization and for

decoupling it from Python (the model itself is Python code);

ÉPyTorch 1.0 introduced torch.jit , which has two main

methods to convert a PyTorch model to a serializable and

optimizable format;

ÉTorchScript was also introduced as a statically-typed subset of

Python;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

JIT - JUST-IN-TIME COMPILER

ÉPyTorch is eager by design, which means that it is easily

hackable to debug, inspect, etc;

ÉHowever, this poses problems for optimization and for

decoupling it from Python (the model itself is Python code);

ÉPyTorch 1.0 introduced torch.jit , which has two main

methods to convert a PyTorch model to a serializable and

optimizable format;

ÉTorchScript was also introduced as a statically-typed subset of

Python;

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING

def my_function(x):

if x.mean() > 1.0:

r=torch.tensor(1.0)

else:

r=torch.tensor(2.0)

return r

>>> ftrace =torch.jit.trace(my_function, (torch.ones(2,2)))

>>> ftrace.graph

graph(%x : Float(2, 2)) {

%4 : Float() = prim::Constant[value={2}]()

%5 : Device = prim::Constant[value="cpu"]()

%6 : int = prim::Constant[value=6]()

%7 : bool = prim::Constant[value=0]()

%8 : bool = prim::Constant[value=0]()

%9 : Float() = aten::to(%4, %5, %6, %7, %8)

%10 : Float() = aten::detach(%9)

return (%10); }

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING

def my_function(x):

if x.mean() > 1.0:

r=torch.tensor(1.0)

else:

r=torch.tensor(2.0)

return r

>>> ftrace =torch.jit.trace(my_function, (torch.ones(2,2)))

>>> ftrace.graph

graph(%x : Float(2, 2)) {

%4 : Float() = prim::Constant[value={2}]()

%5 : Device = prim::Constant[value="cpu"]()

%6 : int = prim::Constant[value=6]()

%7 : bool = prim::Constant[value=0]()

%8 : bool = prim::Constant[value=0]()

%9 : Float() = aten::to(%4, %5, %6, %7, %8)

%10 : Float() = aten::detach(%9)

return (%10); }

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING

def my_function(x):

if x.mean() > 1.0:

r=torch.tensor(1.0)

else:

r=torch.tensor(2.0)

return r

>>> ftrace =torch.jit.trace(my_function, (torch.ones(2,2)))

>>> ftrace.graph

graph(%x : Float(2, 2)) {

%4 : Float() = prim::Constant[value={2}]()

%5 : Device = prim::Constant[value="cpu"]()

%6 : int = prim::Constant[value=6]()

%7 : bool = prim::Constant[value=0]()

%8 : bool = prim::Constant[value=0]()

%9 : Float() = aten::to(%4, %5, %6, %7, %8)

%10 : Float() = aten::detach(%9)

return (%10); }

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING

To call the JIT’ed function, just call the forward() method:

>>> x=torch.ones(2,2)

>>> ftrace.forward(x)

tensor(2.)

However, tracing will not record any control-ﬂow like if statements

or loops, it executes the code with the given context and creates the

graph. You can see this limitation below:

>>> x=torch.ones(2,2).add_(1.0)

>>> ftrace.forward(x)

tensor(2.)

According to

my_function()

, result should have been 1.0. Tracing

also checks for differences between traced and Python function, but

what about Dropout ?

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

TRACING

To call the JIT’ed function, just call the forward() method:

>>> x=torch.ones(2,2)

>>> ftrace.forward(x)

tensor(2.)

However, tracing will not record any control-ﬂow like if statements

or loops, it executes the code with the given context and creates the

graph. You can see this limitation below:

>>> x=torch.ones(2,2).add_(1.0)

>>> ftrace.forward(x)

tensor(2.)

According to

my_function()

, result should have been 1.0. Tracing

also checks for differences between traced and Python function, but

what about Dropout ?

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING

>>> my_function.graph

graph(%x : Tensor) {

%2 : float = prim::Constant[value=1]()

%5 : int = prim::Constant[value=1]()

%6 : int = prim::Constant[value=2]()

%1 : Tensor = aten::mean(%x)

%3 : Tensor = aten::gt(%1, %2)

%4 : bool = prim::Bool(%3)

%r : int = prim::If(%4)

block0() {

-> (%5)

}

block1() {

-> (%6)

}

return (%r);

}

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING

The my_function() is now a ScriptModule :

>>> type(my_function)

torch.jit.ScriptModule

When we check the results again:

>>> x=torch.ones(2,2)

>>> my_function(x)

2

>>> x=torch.ones(2,2).add_(1.0)

>>> my_function(x)

1

Control-ﬂow logic was preserved !

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING

The my_function() is now a ScriptModule :

>>> type(my_function)

torch.jit.ScriptModule

When we check the results again:

>>> x=torch.ones(2,2)

>>> my_function(x)

2

>>> x=torch.ones(2,2).add_(1.0)

>>> my_function(x)

1

Control-ﬂow logic was preserved !

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

SCRIPTING

The my_function() is now a ScriptModule :

>>> type(my_function)

torch.jit.ScriptModule

When we check the results again:

>>> x=torch.ones(2,2)

>>> my_function(x)

2

>>> x=torch.ones(2,2).add_(1.0)

>>> my_function(x)

1

Control-ﬂow logic was preserved !

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?

ÉThe concept of having a well-deﬁned Intermediate

Representation (IR) is very powerful, it’s the main concept

behind LLVM platform as well;

ÉThis opens the door to:

É

Decouple the model (computationl graph) from Python runtime;

ÉUse it in production with C++ (no GIL) or other languages;

ÉCapitalize on optimizations (whole program);

É

Split the development world of hackable and easy to debug from

the world of putting these models in production and optimize

them.

PyTorch under the hood - Christian S. Perone (2019)

TENSORS JIT PRODUCTION Q&A

WHY TORCHSCRIPT ?

ÉThe concept of having a well-deﬁned Intermediate

Representation (IR) is very powerful, it’s the main concept

behind LLVM platform as well;

ÉThis opens the door to:

É

Decouple the model (computationl graph) from Python runtime;

ÉUse it in production with C++ (no GIL) or other languages;

ÉCapitalize on optimizations (whole program);

É

Split the development world of hackable and easy to debug from

the world of putting these models in production and optimize

them.