
PROXMOX VE ADMINISTRATION GUIDE
RELEASE 5.2
May 16, 2018
Proxmox Server Solutions GmbH
www.proxmox.com
Copyright © 2017 Proxmox Server Solutions GmbH
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free
Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with
no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU Free Documentation License".
Contents
1 Introduction 1
1.1 Central Management ....................................... 2
1.2 Flexible Storage ......................................... 3
1.3 Integrated Backup and Restore ................................. 3
1.4 High Availability Cluster ..................................... 3
1.5 Flexible Networking ........................................ 4
1.6 Integrated Firewall ........................................ 4
1.7 Why Open Source ........................................ 4
1.8 Your benefit with Proxmox VE .................................. 4
1.9 Getting Help ........................................... 5
1.9.1 Proxmox VE Wiki ..................................... 5
1.9.2 Community Support Forum ................................ 5
1.9.3 Mailing Lists ........................................ 5
1.9.4 Commercial Support ................................... 5
1.9.5 Bug Tracker ........................................ 5
1.10 Project History .......................................... 5
1.11 Improving the Proxmox VE Documentation ........................... 6
2 Installing Proxmox VE 7
2.1 System Requirements ...................................... 7
2.1.1 Minimum Requirements, for Evaluation .......................... 7
2.1.2 Recommended System Requirements .......................... 8
2.1.3 Simple Performance Overview .............................. 8
2.1.4 Supported web browsers for accessing the web interface . . . . . . . . . . . . . . . . 8
2.2 Using the Proxmox VE Installation CD-ROM ........................... 8
2.2.1 Advanced LVM Configuration Options .......................... 14
2.2.2 ZFS Performance Tips .................................. 15
2.3 Install Proxmox VE on Debian .................................. 15
2.4 Install from USB Stick ...................................... 16
2.4.1 Prepare a USB flash drive as install medium ....................... 16
2.4.2 Instructions for GNU/Linux ................................ 16
2.4.3 Instructions for OSX ................................... 17
2.4.4 Instructions for Windows ................................. 17
2.4.5 Boot your server from USB media ............................ 17
3 Host System Administration 18
3.1 Package Repositories ...................................... 18
3.1.1 Proxmox VE Enterprise Repository ............................ 19
3.1.2 Proxmox VE No-Subscription Repository ......................... 19
3.1.3 Proxmox VE Test Repository ............................... 19
3.1.4 SecureApt ......................................... 20
3.2 System Software Updates .................................... 20
3.3 Network Configuration ...................................... 21
3.3.1 Naming Conventions ................................... 21
3.3.2 Choosing a network configuration ............................ 22
3.3.3 Default Configuration using a Bridge ........................... 22
3.3.4 Routed Configuration ................................... 23
3.3.5 Masquerading (NAT) with iptables . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3.6 Linux Bond ........................................ 24
3.3.7 VLAN 802.1Q ....................................... 26
3.4 Time Synchronization ...................................... 28
3.4.1 Using Custom NTP Servers ............................... 28
3.5 External Metric Server ...................................... 29
3.5.1 Graphite server configuration ............................... 29
3.5.2 Influxdb plugin configuration ............................... 29
3.6 Disk Health Monitoring ...................................... 30
3.7 Logical Volume Manager (LVM) ................................. 30
3.7.1 Hardware ......................................... 31
3.7.2 Bootloader ........................................ 31
3.7.3 Creating a Volume Group ................................. 31
3.7.4 Creating an extra LV for /var/lib/vz . . . . . . . . . . . . . . . . . . . . . . . . 32
3.7.5 Resizing the thin pool ................................... 32
3.7.6 Create a LVM-thin pool .................................. 33
3.8 ZFS on Linux ........................................... 33
3.8.1 Hardware ......................................... 34
3.8.2 Installation as Root File System ............................. 34
3.8.3 Bootloader ........................................ 35
3.8.4 ZFS Administration .................................... 35
3.8.5 Activate E-Mail Notification ................................ 37
3.8.6 Limit ZFS Memory Usage ................................ 38
3.9 Certificate Management ..................................... 39
3.9.1 Certificates for communication within the cluster ..................... 39
3.9.2 Certificates for API and web GUI ............................. 39
4 Hyper-converged Infrastructure 42
4.1 Benefits of a Hyper-Converged Infrastructure (HCI) with Proxmox VE . . . . . . . . . . . . . . 42
4.2 Manage Ceph Services on Proxmox VE Nodes ......................... 43
4.2.1 Precondition ....................................... 44
4.2.2 Installation of Ceph Packages .............................. 44
4.2.3 Creating initial Ceph configuration ............................ 45
4.2.4 Creating Ceph Monitors ................................. 46
4.2.5 Creating Ceph Manager ................................. 46
4.2.6 Creating Ceph OSDs ................................... 47
4.2.7 Creating Ceph Pools ................................... 49
4.2.8 Ceph CRUSH & device classes ............................. 50
4.2.9 Ceph Client ........................................ 52
5 Graphical User Interface 53
5.1 Features ............................................. 53
5.2 Login ............................................... 54
5.3 GUI Overview .......................................... 54
5.3.1 Header .......................................... 55
5.3.2 Resource Tree ...................................... 56
5.3.3 Log Panel ......................................... 56
5.4 Content Panels .......................................... 57
5.4.1 Datacenter ........................................ 57
5.4.2 Nodes ........................................... 58
5.4.3 Guests .......................................... 59
5.4.4 Storage .......................................... 61
5.4.5 Pools ........................................... 62
6 Cluster Manager 63
6.1 Requirements .......................................... 63
6.2 Preparing Nodes ......................................... 64
6.3 Create the Cluster ........................................ 64
6.3.1 Multiple Clusters In Same Network ............................ 64
6.4 Adding Nodes to the Cluster ................................... 65
6.4.1 Adding Nodes With Separated Cluster Network ..................... 66
6.5 Remove a Cluster Node ..................................... 66
6.5.1 Separate A Node Without Reinstalling .......................... 67
6.6 Quorum .............................................. 69
6.7 Cluster Network ......................................... 69
6.7.1 Network Requirements .................................. 69
6.7.2 Separate Cluster Network ................................ 70
6.7.3 Redundant Ring Protocol ................................. 73
6.7.4 RRP On Cluster Creation ................................. 73
6.7.5 RRP On Existing Clusters ................................ 73
6.8 Corosync Configuration ..................................... 74
6.8.1 Edit corosync.conf .................................... 74
6.8.2 Troubleshooting ...................................... 75
6.8.3 Corosync Configuration Glossary ............................ 76
6.9 Cluster Cold Start ........................................ 76
6.10 Guest Migration ......................................... 77
6.10.1 Migration Type ...................................... 77
6.10.2 Migration Network .................................... 77
7 Proxmox Cluster File System (pmxcfs) 79
7.1 POSIX Compatibility ....................................... 79
7.2 File Access Rights ........................................ 80
7.3 Technology ............................................ 80
7.4 File System Layout ........................................ 80
7.4.1 Files ............................................ 80
7.4.2 Symbolic links ....................................... 81
7.4.3 Special status files for debugging (JSON) . . . . . . . . . . . . . . . . . . . . . . . . 81
7.4.4 Enable/Disable debugging ................................ 81
7.5 Recovery ............................................. 81
7.5.1 Remove Cluster configuration .............................. 81
7.5.2 Recovering/Moving Guests from Failed Nodes . . . . . . . . . . . . . . . . . . . . . . 82
8 Proxmox VE Storage 83
8.1 Storage Types .......................................... 83
8.1.1 Thin Provisioning ..................................... 84
8.2 Storage Configuration ...................................... 84
8.2.1 Storage Pools ....................................... 85
8.2.2 Common Storage Properties ............................... 85
8.3 Volumes ............................................. 86
8.3.1 Volume Ownership .................................... 87
8.4 Using the Command Line Interface ............................... 87
8.4.1 Examples ......................................... 87
8.5 Directory Backend ........................................ 88
8.5.1 Configuration ....................................... 89
8.5.2 File naming conventions ................................. 89
8.5.3 Storage Features ..................................... 90
8.5.4 Examples ......................................... 90
8.6 NFS Backend ........................................... 91
8.6.1 Configuration ....................................... 91
8.6.2 Storage Features ..................................... 92
8.6.3 Examples ......................................... 92
8.7 CIFS Backend .......................................... 92
8.7.1 Configuration ....................................... 92
8.7.2 Storage Features ..................................... 93
8.7.3 Examples ......................................... 94
8.8 GlusterFS Backend ........................................ 94
8.8.1 Configuration ....................................... 94
8.8.2 File naming conventions ................................. 95
8.8.3 Storage Features ..................................... 95
8.9 Local ZFS Pool Backend ..................................... 95
8.9.1 Configuration ....................................... 95
8.9.2 File naming conventions ................................. 96
8.9.3 Storage Features ..................................... 96
8.9.4 Examples ......................................... 96
8.10 LVM Backend ........................................... 96
8.10.1 Configuration ....................................... 97
8.10.2 File naming conventions ................................. 97
8.10.3 Storage Features ..................................... 97
8.10.4 Examples ......................................... 98
8.11 LVM thin Backend ........................................ 98
8.11.1 Configuration ....................................... 98
8.11.2 File naming conventions ................................. 99
8.11.3 Storage Features ..................................... 99
8.11.4 Examples ......................................... 99
8.12 Open-iSCSI initiator ....................................... 99
8.12.1 Configuration ....................................... 99
8.12.2 File naming conventions .................................100
8.12.3 Storage Features .....................................100
8.12.4 Examples .........................................100
8.13 User Mode iSCSI Backend ....................................101
8.13.1 Configuration .......................................101
8.13.2 Storage Features .....................................101
8.14 Ceph RADOS Block Devices (RBD) ...............................101
8.14.1 Configuration .......................................102
8.14.2 Authentication .......................................103
8.14.3 Storage Features .....................................103
9 Storage Replication 104
9.1 Supported Storage Types ....................................104
9.2 Schedule Format .........................................105
9.2.1 Detailed Specification ...................................105
9.2.2 Examples: .........................................106
9.3 Error Handling ..........................................106
9.3.1 Possible issues ......................................106
9.3.2 Migrating a guest in case of Error ............................107
9.3.3 Example .........................................107
9.4 Managing Jobs ..........................................108
9.5 Command Line Interface Examples ...............................108
10 Qemu/KVM Virtual Machines 109
10.1 Emulated devices and paravirtualized devices ..........................109
10.2 Virtual Machines Settings ....................................110
10.2.1 General Settings .....................................110
10.2.2 OS Settings ........................................111
10.2.3 Hard Disk .........................................111
10.2.4 CPU ............................................113
10.2.5 Memory ..........................................116
10.2.6 Network Device ......................................118
10.2.7 USB Passthrough .....................................119
10.2.8 BIOS and UEFI ......................................120
10.2.9 Automatic Start and Shutdown of Virtual Machines ...................120
10.3 Migration .............................................121
10.3.1 Online Migration .....................................121
10.3.2 Offline Migration .....................................122
10.4 Copies and Clones ........................................122
10.5 Virtual Machine Templates ....................................123
10.6 Importing Virtual Machines and disk images ...........................124
10.6.1 Step-by-step example of a Windows OVF import .....................124
10.6.2 Adding an external disk image to a Virtual Machine ...................125
10.7 Cloud-Init Support ........................................125
10.7.1 Preparing Cloud-Init Templates ..............................126
10.7.2 Deploying Cloud-Init Templates ..............................127
10.7.3 Cloud-Init specific Options ................................128
10.8 Managing Virtual Machines with qm ...............................129
10.8.1 CLI Usage Examples ...................................129
10.9 Configuration ...........................................129
10.9.1 File Format ........................................130
10.9.2 Snapshots ........................................130
10.9.3 Options ..........................................131
10.10 Locks ..............................................151
11 Proxmox Container Toolkit 152
11.1 Technology Overview .......................................152
11.2 Security Considerations .....................................153
11.3 Guest Operating System Configuration .............................153
11.4 Container Images ........................................155
11.5 Container Storage ........................................156
11.5.1 FUSE Mounts .......................................156
11.5.2 Using Quotas Inside Containers .............................156
11.5.3 Using ACLs Inside Containers ..............................157
11.5.4 Backup of Containers mount points ...........................157
11.5.5 Replication of Containers mount points .........................157
11.6 Container Settings ........................................157
11.6.1 General Settings .....................................157
11.6.2 CPU ............................................159
11.6.3 Memory ..........................................160
11.6.4 Mount Points .......................................160
11.6.5 Network ..........................................163
11.6.6 Automatic Start and Shutdown of Containers . . . . . . . . . . . . . . . . . . . . . . 164
11.7 Backup and Restore .......................................165
11.7.1 Container Backup .....................................165
11.7.2 Restoring Container Backups ...............................165
11.8 Managing Containers with pct .................................166
11.8.1 CLI Usage Examples ...................................167
11.8.2 Obtaining Debugging Logs ................................167
11.9 Migration .............................................168
11.10 Configuration ..........................................168
11.10.1 File Format ........................................168
11.10.2 Snapshots ........................................169
11.10.3 Options ..........................................169
11.11 Locks ..............................................174
12 Proxmox VE Firewall 175
12.1 Zones ...............................................175
12.2 Configuration Files ........................................175
12.2.1 Cluster Wide Setup ....................................176
12.2.2 Host Specific Configuration ................................177
12.2.3 VM/Container Configuration ...............................178
12.3 Firewall Rules ..........................................179
12.4 Security Groups .........................................180
12.5 IP Aliases .............................................181
12.5.1 Standard IP Alias local_network . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.6 IP Sets ..............................................181
12.6.1 Standard IP set management ..............................182
12.6.2 Standard IP set blacklist ..............................182
12.6.3 Standard IP set ipfilter-net*...........................182
12.7 Services and Commands .....................................182
12.8 Tips and Tricks ..........................................183
12.8.1 How to allow FTP .....................................183
12.8.2 Suricata IPS integration ..................................183
12.9 Notes on IPv6 ..........................................184
12.10 Ports used by Proxmox VE ....................................184
13 User Management 185
13.1 Users ...............................................185
13.1.1 System administrator ...................................185
13.1.2 Groups ..........................................186
13.2 Authentication Realms ......................................186
13.3 Two factor authentication .....................................187
13.4 Permission Management .....................................187
13.4.1 Roles ...........................................188
13.4.2 Privileges .........................................188
13.4.3 Objects and Paths ....................................189
13.4.4 Pools ...........................................190
13.4.5 What permission do I need? ...............................190
13.5 Command Line Tool .......................................191
13.6 Real World Examples ......................................192
13.6.1 Administrator Group ...................................192
13.6.2 Auditors ..........................................192
13.6.3 Delegate User Management ...............................193
13.6.4 Pools ...........................................193
14 High Availability 194
14.1 Requirements ..........................................195
14.2 Resources ............................................196
14.3 Management Tasks ........................................196
14.4 How It Works ...........................................197
14.4.1 Service States ......................................198
14.4.2 Local Resource Manager .................................199
14.4.3 Cluster Resource Manager ................................200
14.5 Configuration ...........................................200
14.5.1 Resources ........................................201
14.5.2 Groups ..........................................203
14.6 Fencing ..............................................205
14.6.1 How Proxmox VE Fences .................................205
14.6.2 Configure Hardware Watchdog ..............................206
14.6.3 Recover Fenced Services ................................206
14.7 Start Failure Policy ........................................206
14.8 Error Recovery ..........................................207
14.9 Package Updates .........................................207
14.10 Node Maintenance ........................................207
14.10.1 Shutdown .........................................207
14.10.2 Reboot ..........................................208
14.10.3 Manual Resource Movement ...............................208
15 Backup and Restore 209
15.1 Backup modes ..........................................209
15.2 Backup File Names ........................................211
15.3 Restore ..............................................211
15.3.1 Bandwidth Limit ......................................211
15.4 Configuration ...........................................212
15.5 Hook Scripts ...........................................213
15.6 File Exclusions ..........................................213
15.7 Examples .............................................214
16 Important Service Daemons 215
16.1 pvedaemon - Proxmox VE API Daemon .............................215
16.2 pveproxy - Proxmox VE API Proxy Daemon ...........................215
16.2.1 Host based Access Control ................................215
16.2.2 SSL Cipher Suite .....................................216
16.2.3 Diffie-Hellman Parameters ................................216
16.2.4 Alternative HTTPS certificate ...............................216
16.3 pvestatd - Proxmox VE Status Daemon .............................216
16.4 spiceproxy - SPICE Proxy Service ................................216
16.4.1 Host based Access Control ................................217
17 Useful Command Line Tools 218
17.1 pvesubscription - Subscription Management ...........................218
17.2 pveperf - Proxmox VE Benchmark Script ............................218
18 Frequently Asked Questions 220
19 Bibliography 223
19.1 Books about Proxmox VE ....................................223
19.2 Books about related technology .................................223
19.3 Books about related topics ....................................224
A Command Line Interface 225
A.1 pvesm - Proxmox VE Storage Manager .............................225
A.2 pvesubscription - Proxmox VE Subscription Manager .....................234
A.3 pveperf - Proxmox VE Benchmark Script ............................235
A.4 pveceph - Manage CEPH Services on Proxmox VE Nodes ...................235
A.5 pvenode - Proxmox VE Node Management ...........................239
A.6 qm - Qemu/KVM Virtual Machine Manager ...........................241
A.7 qmrestore - Restore QemuServer vzdump Backups . . . . . . . . . . . . . . . . . . . . . . 265
A.8 pct - Proxmox Container Toolkit .................................265
A.9 pveam - Proxmox VE Appliance Manager ............................280
A.10 pvecm - Proxmox VE Cluster Manager .............................281
A.11 pvesr - Proxmox VE Storage Replication ............................284
A.12 pveum - Proxmox VE User Manager ..............................288
A.13 vzdump - Backup Utility for VMs and Containers . . . . . . . . . . . . . . . . . . . . . . . . 293
A.14 ha-manager - Proxmox VE HA Manager ............................295
B Service Daemons 299
B.1 pve-firewall - Proxmox VE Firewall Daemon . . . . . . . . . . . . . . . . . . . . . . . . . . 299
B.2 pvedaemon - Proxmox VE API Daemon ............................300
B.3 pveproxy - Proxmox VE API Proxy Daemon ...........................301
B.4 pvestatd - Proxmox VE Status Daemon .............................301
B.5 spiceproxy - SPICE Proxy Service ...............................302
B.6 pmxcfs - Proxmox Cluster File System .............................303
B.7 pve-ha-crm - Cluster Resource Manager Daemon .......................303
B.8 pve-ha-lrm - Local Resource Manager Daemon . . . . . . . . . . . . . . . . . . . . . . . . 304
C Configuration Files 305
C.1 Datacenter Configuration .....................................305
C.1.1 File Format ........................................305
C.1.2 Options ..........................................305
D Firewall Macro Definitions 308
E GNU Free Documentation License 322
Chapter 1
Introduction
Proxmox VE is a platform to run virtual machines and containers. It is based on Debian Linux, and completely
open source. For maximum flexibility, we implemented two virtualization technologies - Kernel-based Virtual
Machine (KVM) and container-based virtualization (LXC).
One main design goal was to make administration as easy as possible. You can use Proxmox VE on a
single node, or assemble a cluster of many nodes. All management tasks can be done using our web-based
management interface, and even a novice user can set up and install Proxmox VE within minutes.
[Figure: the Proxmox VE software stack - user tools (qm, pct, pvesm, pvecm, pveum, pveceph, ha-manager, pve-firewall) and services (pveproxy, pvedaemon, pvestatd, pve-ha-lrm, pve-cluster) run on top of the Linux kernel, which hosts KVM virtual machines (Qemu with guest OS and applications) and containers.]
1.1 Central Management
While many people start with a single node, Proxmox VE can scale out to a large set of clustered nodes.
The cluster stack is fully integrated and ships with the default installation.
Unique Multi-Master Design
The integrated web-based management interface gives you a clean overview of all your KVM guests
and Linux containers and even of your whole cluster. You can easily manage your VMs and containers,
storage or cluster from the GUI. There is no need to install a separate, complex, and pricey
management server.
Proxmox Cluster File System (pmxcfs)
Proxmox VE uses the unique Proxmox Cluster file system (pmxcfs), a database-driven file system for
storing configuration files. This enables you to store the configuration of thousands of virtual machines.
By using corosync, these files are replicated in real time on all cluster nodes. The file system stores
all data inside a persistent database on disk; nonetheless, a copy of the data resides in RAM, which
imposes a maximum storage size of 30 MB - more than enough for thousands of VMs.
Proxmox VE is the only virtualization platform using this unique cluster file system.
Web-based Management Interface
Proxmox VE is simple to use. Management tasks can be done via the included web-based management
interface - there is no need to install a separate management tool or any additional management
node with huge databases. The multi-master tool allows you to manage your whole cluster from any
node of your cluster. The central web-based management - based on the JavaScript framework ExtJS
- empowers you to control all functionalities from the GUI and to review the history and syslogs of each
single node. This includes running backup or restore jobs, live migration or HA triggered activities.
Command Line
For advanced users who are used to the comfort of the Unix shell or Windows PowerShell, Proxmox
VE provides a command line interface to manage all the components of your virtual environment. This
command line interface has intelligent tab completion and full documentation in the form of UNIX man
pages.
REST API
Proxmox VE uses a RESTful API. We chose JSON as the primary data format, and the whole API is
formally defined using JSON Schema. This enables fast and easy integration for third-party management
tools like custom hosting environments (see the sketch at the end of this section).
Role-based Administration
You can define granular access for all objects (like VMs, storages, nodes, etc.) by using the role-based
user and permission management. This allows you to define privileges and helps you control
access to objects. This concept is also known as access control lists: each permission specifies a
subject (a user or group) and a role (set of privileges) on a specific path.
Authentication Realms
Proxmox VE supports multiple authentication sources like Microsoft Active Directory, LDAP, Linux PAM
standard authentication or the built-in Proxmox VE authentication server.
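For example, the REST API mentioned above can be queried with any HTTP client. The following is a minimal sketch using curl; the host name and credentials are placeholders, and -k skips verification of the default self-signed certificate:
# request an authentication ticket (returned as JSON)
curl -k -d "username=root@pam" -d "password=yourpassword" \
    https://pve.example.com:8006/api2/json/access/ticket
# use the returned ticket as a cookie, e.g. to list all cluster nodes
curl -k -b "PVEAuthCookie=<ticket from the first call>" \
    https://pve.example.com:8006/api2/json/nodes
Write operations additionally require the CSRFPreventionToken that is returned together with the ticket.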
1.2 Flexible Storage
The Proxmox VE storage model is very flexible. Virtual machine images can either be stored on one or
several local storages or on shared storage like NFS and on SAN. There are no limits; you may configure as
many storage definitions as you like. You can use all storage technologies available for Debian Linux.
One major benefit of storing VMs on shared storage is the ability to live-migrate running machines without
any downtime, as all nodes in the cluster have direct access to VM disk images.
We currently support the following Network storage types:
• LVM Group (network backing with iSCSI targets)
• iSCSI target
• NFS Share
• CIFS Share
• Ceph RBD
• Directly use iSCSI LUNs
• GlusterFS
Local storage types supported are:
• LVM Group (local backing devices like block devices, FC devices, DRBD, etc.)
• Directory (storage on existing filesystem)
• ZFS
1.3 Integrated Backup and Restore
The integrated backup tool (vzdump) creates consistent snapshots of running Containers and KVM guests.
It basically creates an archive of the VM or CT data which includes the VM/CT configuration files.
KVM live backup works for all storage types including VM images on NFS, CIFS, iSCSI LUN, Ceph RBD or
Sheepdog. The new backup format is optimized for storing VM backups quickly and effectively (sparse
files, out-of-order data, minimized I/O).
1.4 High Availability Cluster
A multi-node Proxmox VE HA Cluster enables the definition of highly available virtual servers. The Proxmox
VE HA Cluster is based on proven Linux HA technologies, providing stable and reliable HA services.
1.5 Flexible Networking
Proxmox VE uses a bridged networking model. All VMs can share one bridge as if virtual network cables
from each guest were all plugged into the same switch. For connecting VMs to the outside world, bridges
are attached to physical network cards assigned a TCP/IP configuration.
For further flexibility, VLANs (IEEE 802.1q) and network bonding/aggregation are possible. In this way it is
possible to build complex, flexible virtual networks for the Proxmox VE hosts, leveraging the full power of the
Linux network stack.
1.6 Integrated Firewall
The integrated firewall allows you to filter network packets on any VM or Container interface. Common sets
of firewall rules can be grouped into “security groups”.
1.7 Why Open Source
Proxmox VE uses a Linux kernel and is based on the Debian GNU/Linux Distribution. The source code of
Proxmox VE is released under the GNU Affero General Public License, version 3. This means that you are
free to inspect the source code at any time or contribute to the project yourself.
At Proxmox we are committed to use open source software whenever possible. Using open source software
guarantees full access to all functionalities - as well as high security and reliability. We think that everybody
should have the right to access the source code of the software they use - to run it, build on it, or submit changes back
to the project. Everybody is encouraged to contribute while Proxmox ensures the product always meets
professional quality criteria.
Open source software also helps to keep your costs low and makes your core infrastructure independent
from a single vendor.
1.8 Your benefit with Proxmox VE
• Open source software
• No vendor lock-in
• Linux kernel
• Fast installation and easy-to-use
• Web-based management interface
• REST API
• Huge active community
• Low administration costs and simple deployment
1.9 Getting Help
1.9.1 Proxmox VE Wiki
The primary source of information is the Proxmox VE Wiki. It combines the reference documentation with
user contributed content.
1.9.2 Community Support Forum
Proxmox VE itself is fully open source, so we always encourage our users to discuss and share their
knowledge using the Proxmox VE Community Forum. The forum is fully moderated by the Proxmox support team,
and has a quite large user base around the whole world. Needless to say that such a large forum is a great
place to get information.
1.9.3 Mailing Lists
This is a fast way to communicate with the Proxmox VE community via email.
• Mailing list for users: PVE User List
The primary communication channel for developers is:
• Mailing list for developers: PVE development discussion
1.9.4 Commercial Support
Proxmox Server Solutions GmbH also offers commercial Proxmox VE Subscription Service Plans. System
Administrators with a standard subscription plan can access a dedicated support portal with guaranteed
response time, where Proxmox VE developers help them should an issue appear. Please contact the Proxmox
sales team for more information or volume discounts.
1.9.5 Bug Tracker
We also run a public bug tracker at https://bugzilla.proxmox.com. If you ever detect an issue, you can file a
bug report there. This makes it easy to track its status, and you will get notified as soon as the problem is
fixed.
1.10 Project History
The project started in 2007, followed by a first stable version in 2008. At the time we used OpenVZ for
containers, and KVM for virtual machines. The clustering features were limited, and the user interface was
simple (server generated web page).
But we quickly developed new features using the Corosync cluster stack, and the introduction of the new
Proxmox cluster file system (pmxcfs) was a big step forward, because it completely hides the cluster
complexity from the user. Managing a cluster of 16 nodes is as simple as managing a single node.
We also introduced a new REST API, with a complete declarative specification written in JSON-Schema.
This enabled other people to integrate Proxmox VE into their infrastructure, and made it easy to provide
additional services.
Also, the new REST API made it possible to replace the original user interface with a modern HTML5
application using JavaScript. We also replaced the old Java based VNC console code with noVNC. So
you only need a web browser to manage your VMs.
The support for various storage types is another big task. Notably, Proxmox VE was the first distribution to
ship ZFS on Linux by default in 2014. Another milestone was the ability to run and manage Ceph storage on
the hypervisor nodes. Such setups are extremely cost effective.
When we started we were among the first companies providing commercial support for KVM. The KVM
project itself continuously evolved, and is now a widely used hypervisor. New features arrive with each
release. We developed the KVM live backup feature, which makes it possible to create snapshot backups on
any storage type.
The most notable change with version 4.0 was the move from OpenVZ to LXC. Containers are now deeply
integrated, and they can use the same storage and network features as virtual machines.
1.11 Improving the Proxmox VE Documentation
Depending on which issue you want to improve, you can use a variety of communication mediums to reach
the developers.
If you notice an error in the current documentation, use the Proxmox bug tracker and propose an alternate
text/wording.
If you want to propose new content, it depends on what you want to document:
• If the content is specific to your setup, a wiki article is the best option. For instance, if you want to document specific options for guest systems, like which combination of Qemu drivers works best with a less popular OS, this is a perfect fit for a wiki article.
• If you think the content is generic enough to be of interest for all users, then you should try to get it into the reference documentation. The reference documentation is written in the easy to use asciidoc document format. Editing the official documentation requires cloning the git repository at git://git.proxmox.com/git/pve-docs.git and then following the README.adoc document.
Improving the documentation is just as easy as editing a Wikipedia article and is an interesting foray into the
development of a large open source project.
Note
If you are interested in working on the Proxmox VE codebase, the Developer Documentation wiki article
will show you where to start.
Chapter 2
Installing Proxmox VE
Proxmox VE is based on Debian and comes with an installation CD-ROM which includes a complete Debian
system ("stretch" for version 5.x) as well as all necessary Proxmox VE packages.
The installer just asks you a few questions, then partitions the local disk(s), installs all required packages,
and configures the system including a basic network setup. You can get a fully functional system within a
few minutes. This is the preferred and recommended installation method.
Alternatively, Proxmox VE can be installed on top of an existing Debian system. This option is only
recommended for advanced users since detailed knowledge about Proxmox VE is necessary.
2.1 System Requirements
For production servers, high quality server equipment is needed. Keep in mind that if you run 10 virtual servers
on one machine and then experience a hardware failure, 10 services are lost. Proxmox VE supports
clustering; this means that multiple Proxmox VE installations can be centrally managed thanks to the included
cluster functionality.
Proxmox VE can use local storage (DAS), SAN, NAS and also distributed storage (Ceph RBD). For details
see chapter storage Chapter 8.
2.1.1 Minimum Requirements, for Evaluation
• CPU: 64bit (Intel EMT64 or AMD64)
• Intel VT/AMD-V capable CPU/Mainboard for KVM Full Virtualization support
• RAM: 1 GB RAM, plus additional RAM used for guests
• Hard drive
• One NIC
2.1.2 Recommended System Requirements
• CPU: 64bit (Intel EMT64 or AMD64), Multi core CPU recommended
• Intel VT/AMD-V capable CPU/Mainboard for KVM Full Virtualization support
• RAM: 8 GB RAM, plus additional RAM used for guests
• Hardware RAID with batteries protected write cache (“BBU”) or flash based protection
• Fast hard drives, best results with 15k rpm SAS, Raid10
• At least two NICs, depending on the used storage technology you need more
2.1.3 Simple Performance Overview
On an installed Proxmox VE system, you can run the included pveperf script to obtain an overview of the
CPU and hard disk performance.
Note
This is just a very quick and general benchmark. More detailed tests are recommended, especially
regarding the I/O performance of your system.
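A minimal invocation looks like this (the path argument selects the file system to test; adjust it to the storage you are interested in):
pveperf /var/lib/vz
Among other values, the script reports CPU BOGOMIPS, buffered disk reads and FSYNCS/SECOND; a low fsync rate is usually the first hint that the disk subsystem will struggle with many small synchronous writes.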
2.1.4 Supported web browsers for accessing the web interface
To use the web interface you need a modern browser; this includes:
• Firefox, a release from the current year, or the latest Extended Support Release
• Chrome, a release from the current year
• the Microsoft currently supported versions of Internet Explorer (as of 2016, this means IE 11 or IE Edge)
• the Apple currently supported versions of Safari (as of 2016, this means Safari 9)
If Proxmox VE detects you’re connecting from a mobile device, you will be redirected to a lightweight touch-
based UI.
2.2 Using the Proxmox VE Installation CD-ROM
You can download the ISO from http://www.proxmox.com. It includes the following:
• Complete operating system (Debian Linux, 64-bit)
• The Proxmox VE installer, which partitions the hard drive(s) with ext4, ext3, xfs or ZFS and installs the operating system
• Proxmox VE kernel (Linux) with LXC and KVM support
• Complete toolset for administering virtual machines, containers and all necessary resources
• Web based management interface for using the toolset
Note
By default, the complete server is used and all existing data is removed.
Please insert the installation CD-ROM, then boot from that drive. Immediately afterwards you can choose
the following menu options:
Install Proxmox VE
Start normal installation.
Install Proxmox VE (Debug mode)
Start installation in debug mode. It opens a shell console at several installation steps, so that you
can debug things if something goes wrong. Please press CTRL-D to exit those debug consoles and
continue installation. This option is mostly for developers and not meant for general use.
Rescue Boot
This option allows you to boot an existing installation. It searches all attached hard disks and, if it finds
an existing installation, boots directly into that disk using the existing Linux kernel. This can be useful
if there are problems with the boot block (grub), or the BIOS is unable to read the boot block from the
disk.
Test Memory
Runs memtest86+. This is useful to check if your memory is functional and error free.
You normally select Install Proxmox VE to start the installation. After that you get prompted to select the
target hard disk(s). The Options button lets you select the target file system, which defaults to ext4. The
installer uses LVM if you select ext3, ext4 or xfs as file system, and offers an additional option to restrict
LVM space (see below).
If you have more than one disk, you can also use ZFS as file system. ZFS supports several software RAID
levels, so this is especially useful if you do not have a hardware RAID controller. The Options button lets
you select the ZFS RAID level, and you can choose disks there.
The next page just asks for basic configuration options like your location, the time zone and keyboard layout.
The location is used to select a download server near you to speed up updates. The installer is usually able
to auto-detect those settings, so you only need to change them in rare situations when auto detection fails, or
when you want to use some special keyboard layout not commonly used in your country.
You then need to specify an email address and the superuser (root) password. The password must have at
least 5 characters, but we highly recommend using stronger passwords - here are some guidelines:
• Use a minimum password length of 12 to 14 characters.
• Include lowercase and uppercase alphabetic characters, numbers and symbols.
• Avoid character repetition, keyboard patterns, dictionary words, letter or number sequences, usernames, relative or pet names, romantic links (current or past) and biographical information (e.g., ID numbers, ancestors’ names or dates).
It is sometimes necessary to send notifications to the system administrator, for example:
• Information about available package updates.
• Error messages from periodic CRON jobs.
All those notification mails will be sent to the specified email address.
The last step is the network configuration. Please note that you can use either IPv4 or IPv6 here, but not
both. If you want to configure a dual stack node, you can easily do that after installation.
If you press Next now, installation starts to format disks, and copies packages to the target. Please wait
until that is finished, then reboot the server.
Further configuration is done via the Proxmox web interface. Just point your browser to the IP address given
during installation (https://youripaddress:8006).
Note
Default login is "root" (realm PAM) and the root password is defined during the installation process.
2.2.1 Advanced LVM Configuration Options
The installer creates a Volume Group (VG) called pve, and additional Logical Volumes (LVs) called root,
data and swap. The size of those volumes can be controlled with:
hdsize
Defines the total HD size to be used. This way you can save free space on the HD for further
partitioning (i.e. for an additional PV and VG on the same hard disk that can be used for LVM storage).
swapsize
Defines the size of the swap volume. The default is the size of the installed memory, minimum 4 GB
and maximum 8 GB. The resulting value cannot be greater than hdsize/8.
Note
If set to 0, no swap volume will be created.
maxroot
Defines the maximum size of the root volume, which stores the operating system. The maximum
limit of the root volume size is hdsize/4.
maxvz
Defines the maximum size of the data volume. The actual size of the data volume is:
datasize = hdsize - rootsize - swapsize - minfree
Where datasize cannot be bigger than maxvz.
Note
In case of LVM thin, the data pool will only be created if datasize is bigger than 4GB.
Note
If set to 0, no data volume will be created and the storage configuration will be adapted accordingly.
minfree
Defines the amount of free space left in LVM volume group pve. With more than 128GB storage
available the default is 16GB, else hdsize/8 will be used.
Note
LVM requires free space in the VG for snapshot creation (not required for lvmthin snapshots).
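As a worked example of the datasize formula above (hypothetical numbers, assuming the root volume ends up at its hdsize/4 limit and no maxvz is set): with hdsize = 100 GB, swapsize = 8 GB and the default minfree of hdsize/8 = 12.5 GB, the data volume becomes roughly
datasize = 100 GB - 25 GB - 8 GB - 12.5 GB = 54.5 GB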
2.2.2 ZFS Performance Tips
ZFS uses a lot of memory, so it is best to add additional RAM if you want to use ZFS. A good calculation is
4GB plus 1GB RAM for each TB RAW disk space.
ZFS also provides the feature to use a fast SSD drive as write cache. The write cache is called the ZFS
Intent Log (ZIL). You can add that after installation using the following command:
zpool add <pool-name> log </dev/path_to_fast_ssd>
2.3 Install Proxmox VE on Debian
Proxmox VE ships as a set of Debian packages, so you can install it on top of a normal Debian installation.
After configuring the repositories, you need to run:
apt-get update
apt-get install proxmox-ve
Installing on top of an existing Debian installation looks easy, but it presumes that you have correctly installed
the base system, and you know how you want to configure and use the local storage. Network configuration
is also completely up to you.
In general, this is not trivial, especially when you use LVM or ZFS.
You can find a detailed step by step howto on the wiki.
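As a rough sketch of what the repository setup can look like (assuming Debian stretch and the pve-no-subscription repository described in the Package Repositories section; the enterprise repository requires a subscription key):
echo "deb http://download.proxmox.com/debian/pve stretch pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
wget http://download.proxmox.com/debian/proxmox-ve-release-5.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
apt-get update && apt-get dist-upgrade
apt-get install proxmox-ve
The wiki howto covers additional steps (for example making sure the hostname resolves to the server's IP address) that are required for a working installation.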
2.4 Install from USB Stick
The Proxmox VE installation media is now a hybrid ISO image, working in two ways:
• An ISO image file ready to burn on CD
• A raw sector (IMG) image file ready to directly copy to flash media (USB Stick)
Using USB sticks is faster and more environmentally friendly and therefore the recommended way to install
Proxmox VE.
2.4.1 Prepare a USB flash drive as install medium
In order to boot the installation media, copy the ISO image to USB media.
First download the ISO image from https://www.proxmox.com/en/downloads/category/iso-images-pve
You need USB media with at least 1 GB of space.
Note
Using UNetbootin or Rufus does not work.
Important
Make sure that the USB media is not mounted and does not contain any important data.
2.4.2 Instructions for GNU/Linux
You can simply use dd on UNIX like systems. First download the ISO image, then plug in the USB stick. You
need to find out what device name gets assigned to the USB stick (see below). Then run:
dd if=proxmox-ve_*.iso of=/dev/XYZ bs=1M
Note
Be sure to replace /dev/XYZ with the correct device name.
Caution
Be very careful, and do not overwrite the hard disk!
Find Correct USB Device Name
You can compare the last lines of the dmesg output before and after the insertion, or use the lsblk command.
Open a terminal and run:
lsblk
Then plug in your USB media and run the command again:
lsblk
A new device will appear, and this is the USB device you want to use.
2.4.3 Instructions for OSX
Open the terminal (query Terminal in Spotlight).
Convert the .iso file to .img using the convert option of hdiutil for example.
hdiutil convert -format UDRW -o proxmox-ve_*.dmg proxmox-ve_*.iso
Tip
OS X tends to put the .dmg ending on the output file automatically.
To get the current list of devices run:
diskutil list
Now insert your USB flash media and run this command again to determine the device node assigned to
your flash media (e.g. /dev/diskX).
diskutil list
diskutil unmountDisk /dev/diskX
Note
replace X with the disk number from the last command.
sudo dd if=proxmox-ve_*.dmg of=/dev/rdiskX bs=1m
2.4.4 Instructions for Windows
Download Etcher from https://etcher.io , select the ISO and your USB Drive.
If this doesn’t work, alternatively use the OSForensics USB installer from http://www.osforensics.com/portability.html
2.4.5 Boot your server from USB media
Connect your USB media to your server and make sure that the server boots from USB (see server BIOS).
Then follow the installation wizard.
Chapter 3
Host System Administration
Proxmox VE is based on the famous Debian Linux distribution. That means that you have access to the
whole world of Debian packages, and the base system is well documented. The Debian Administrator’s
Handbook is available online, and provides a comprehensive introduction to the Debian operating system
(see [Hertzog13]).
A standard Proxmox VE installation uses the default repositories from Debian, so you get bug fixes and
security updates through that channel. In addition, we provide our own package repository to roll out all
Proxmox VE related packages. This includes updates to some Debian packages when necessary.
We also deliver a specially optimized Linux kernel, where we enable all required virtualization and container
features. That kernel includes drivers for ZFS, and several hardware drivers. For example, we ship Intel
network card drivers to support their newest hardware.
The following sections will concentrate on virtualization related topics. They either explain things which are
different on Proxmox VE, or describe tasks which are commonly done on Proxmox VE. For other topics, please refer
to the standard Debian documentation.
3.1 Package Repositories
All Debian based systems use APT as package management tool. The list of repositories is defined in
/etc/apt/sources.list and in .list files found inside /etc/apt/sources.list.d/. Updates can be
installed directly using apt-get, or via the GUI.
Apt sources.list files list one package repository per line, with the most preferred source listed first.
Empty lines are ignored, and a # character anywhere on a line marks the remainder of that line as a
comment. The information available from the configured sources is acquired by apt-get update.
File /etc/apt/sources.list
deb http://ftp.debian.org/debian stretch main contrib
# security updates
deb http://security.debian.org stretch/updates main contrib
In addition, Proxmox VE provides three different package repositories.
3.1.1 Proxmox VE Enterprise Repository
This is the default, stable and recommended repository, available for all Proxmox VE subscription users. It
contains the most stable packages, and is suitable for production use. The pve-enterprise repository
is enabled by default:
File /etc/apt/sources.list.d/pve-enterprise.list
deb https://enterprise.proxmox.com/debian/pve stretch pve-enterprise
As soon as updates are available, the root@pam user is notified via email about the available new
packages. On the GUI, the change-log of each package can be viewed (if available), showing all details of the
update. So you will never miss important security fixes.
Please note that you need a valid subscription key to access this repository. We offer different support
levels, and you can find further details at http://www.proxmox.com/en/proxmox-ve/pricing.
Note
You can disable this repository by commenting out the above line using a # (at the start of the line).
This prevents error messages if you do not have a subscription key. Please configure the pve-no-
subscription repository in that case.
3.1.2 Proxmox VE No-Subscription Repository
As the name suggests, you do not need a subscription key to access this repository. It can be used for
testing and non-production use. It is not recommended to run this on production servers, as these packages are
not always heavily tested and validated.
We recommend configuring this repository in /etc/apt/sources.list.
File /etc/apt/sources.list
deb http://ftp.debian.org/debian stretch main contrib
# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve stretch pve-no-subscription
# security updates
deb http://security.debian.org stretch/updates main contrib
3.1.3 Proxmox VE Test Repository
Finally, there is a repository called pvetest. This one contains the latest packages and is heavily used by
developers to test new features. As usual, you can configure this using /etc/apt/sources.list by
adding the following line:
sources.list entry for pvetest
deb http://download.proxmox.com/debian/pve stretch pvetest
Warning
The pvetest repository should (as the name implies) only be used for testing new features or bug
fixes.
3.1.4 SecureApt
We use GnuPG to sign the Release files inside those repositories, and APT uses these signatures to verify
that all packages are from a trusted source.
The key used for verification is already installed if you install from our installation CD. If you install by other
means, you can manually download the key with:
# wget http://download.proxmox.com/debian/proxmox-ve-release-5.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
Please verify the checksum afterwards:
# sha512sum /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
ffb95f0f4be68d2e753c8875ea2f8465864a58431d5361e88789568673551501ae574283a4e0492f17d79dc67edfb173a56a6304dea39e01f249ebdabc9f074a /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
or
# md5sum /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
511d36d0f1350c01c42a3dc9f3c27939 /etc/apt/trusted.gpg.d/proxmox-ve-release-5.x.gpg
3.2 System Software Updates
We provide regular package updates on all repositories. You can install those updates using the GUI, or you
can directly run the CLI command apt-get:
apt-get update
apt-get dist-upgrade
Note
The apt package management system is extremely flexible and provides countless features - see man
apt-get or [Hertzog13] for additional information.
You should do such updates at regular intervals, or when we release versions with security related fixes.
Major system upgrades are announced at the Proxmox VE Community Forum. Those announcements also
contain detailed upgrade instructions.
Tip
We recommend running regular upgrades, because it is important to get the latest security updates.
3.3 Network Configuration
Network configuration can be done either via the GUI, or by manually editing the file
/etc/network/interfaces, which contains the whole network configuration. The interfaces(5) manual page
contains the complete format description. All Proxmox VE tools try hard to keep direct user modifications,
but using the GUI is still preferable, because it protects you from errors.
Once the network is configured, you can use the traditional Debian tools ifup and ifdown to
bring interfaces up and down.
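For example, to take the default bridge down and bring it up again after editing its configuration (be careful when doing this over a remote connection that uses the same interface):
ifdown vmbr0
ifup vmbr0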
Note
Proxmox VE does not write changes directly to /etc/network/interfaces. Instead, we write into
a temporary file called /etc/network/interfaces.new, and commit those changes when you
reboot the node.
3.3.1 Naming Conventions
We currently use the following naming conventions for device names:
• Ethernet devices: en*, systemd network interface names. This naming scheme is used for new Proxmox VE installations since version 5.0.
• Ethernet devices: eth[N], where 0 ≤ N (eth0, eth1, ...). This naming scheme is used for Proxmox VE hosts which were installed before the 5.0 release. When upgrading to 5.0, the names are kept as-is.
• Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (vmbr0 - vmbr4094)
• Bonds: bond[N], where 0 ≤ N (bond0, bond1, ...)
• VLANs: Simply add the VLAN number to the device name, separated by a period (eno1.50, bond1.30)
This makes it easier to debug network problems, because the device name implies the device type.
Systemd Network Interface Names
Systemd uses the two-character prefix en for Ethernet network devices. The next characters depend on the device driver and on which schema matches first.
o<index>[n<phys_port_name>|d<dev_port>] — devices on board
s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id
[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id
x<MAC> — device by MAC address
The most common patterns are:
eno1 — the first on-board NIC
enp3s0f1 — the NIC on PCI bus 3, slot 0, using NIC function 1
For more information see Predictable Network Interface Names.
3.3.2 Choosing a network configuration
Depending on your current network organization and your resources you can choose either a bridged, routed,
or masquerading networking setup.
Proxmox VE server in a private LAN, using an external gateway to reach the internet
The Bridged model makes the most sense in this case, and this is also the default mode on new Proxmox
VE installations. Each of your guest systems will have a virtual interface attached to the Proxmox VE bridge. This is similar in effect to having the guest network card directly connected to a new switch on your LAN, with the Proxmox VE host playing the role of the switch.
Proxmox VE server at hosting provider, with public IP ranges for Guests
For this setup, you can use either a Bridged or Routed model, depending on what your provider allows.
Proxmox VE server at hosting provider, with a single public IP address
In that case, the only way to get outgoing network access for your guest systems is to use Masquerading.
For incoming network access to your guests, you will need to configure Port Forwarding.
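As a sketch, incoming traffic can be forwarded to a guest with an iptables DNAT rule; this assumes the masquerading setup shown in the sections below, a guest at the (example) private address 10.10.10.10, and an arbitrary external port 2222 forwarded to the guest's SSH port:
iptables -t nat -A PREROUTING -i eno1 -p tcp --dport 2222 -j DNAT --to-destination 10.10.10.10:22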
For further flexibility, you can configure VLANs (IEEE 802.1q) and network bonding, also known as "link
aggregation". That way it is possible to build complex and flexible virtual networks.
3.3.3 Default Configuration using a Bridge
Bridges are like physical network switches implemented in software. All VMs can share a single bridge, or
you can create multiple bridges to separate network domains. Each host can have up to 4094 bridges.
The installation program creates a single bridge named vmbr0, which is connected to the first Ethernet
card. The corresponding configuration in /etc/network/interfaces might look like this:
auto lo
iface lo inet loopback
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.168.10.2
netmask 255.255.255.0
gateway 192.168.10.1
bridge_ports eno1
bridge_stp off
bridge_fd 0
Virtual machines behave as if they were directly connected to the physical network. The network, in turn,
sees each virtual machine as having its own MAC, even though there is only one network cable connecting
all of these VMs to the network.
3.3.4 Routed Configuration
Most hosting providers do not support the above setup. For security reasons, they disable networking as
soon as they detect multiple MAC addresses on a single interface.
Tip
Some providers allow you to register additional MACs through their management interface. This avoids the problem, but is clumsy to configure because you need to register a MAC for each of your VMs.
You can avoid the problem by “routing” all traffic via a single interface. This makes sure that all network
packets use the same MAC address.
A common scenario is that you have a public IP (assume 198.51.100.5 for this example), and an addi-
tional IP block for your VMs (203.0.113.16/29). We recommend the following setup for such situations:
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet static
address 198.51.100.5
netmask 255.255.255.0
gateway 198.51.100.1
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up echo 1 > /proc/sys/net/ipv4/conf/eno1/proxy_arp
auto vmbr0
iface vmbr0 inet static
address 203.0.113.17
netmask 255.255.255.248
bridge_ports none
bridge_stp off
bridge_fd 0
3.3.5 Masquerading (NAT) with iptables
Masquerading allows guests having only a private IP address to access the network by using the host IP
address for outgoing traffic. Each outgoing packet is rewritten by iptables to appear as originating from
the host, and responses are rewritten accordingly to be routed to the original sender.
auto lo
iface lo inet loopback
auto eno1
#real IP address
iface eno1 inet static
address 198.51.100.5
netmask 255.255.255.0
gateway 198.51.100.1
auto vmbr0
#private sub network
iface vmbr0 inet static
address 10.10.10.1
netmask 255.255.255.0
bridge_ports none
bridge_stp off
bridge_fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
3.3.6 Linux Bond
Bonding (also called NIC teaming or Link Aggregation) is a technique for binding multiple NICs into a single network device. It can serve different goals, such as making the network fault-tolerant, increasing performance, or both.
High-speed hardware like Fibre Channel and the associated switching hardware can be quite expensive. By
doing link aggregation, two NICs can appear as one logical interface, resulting in double speed. This is a
native Linux kernel feature that is supported by most switches. If your nodes have multiple Ethernet ports,
you can distribute your points of failure by running network cables to different switches, and the bonded connection will fail over to one cable or the other in case of network trouble.
Aggregated links can reduce live-migration delays and improve the speed of data replication between Proxmox VE cluster nodes.
There are 7 modes for bonding:
Round-robin (balance-rr): Transmit network packets in sequential order from the first available network
interface (NIC) slave through the last. This mode provides load balancing and fault tolerance.
Active-backup (active-backup): Only one NIC slave in the bond is active. A different slave becomes
active if, and only if, the active slave fails. The single logical bonded interface’s MAC address is externally
visible on only one NIC (port) to avoid distortion in the network switch. This mode provides fault tolerance.
XOR (balance-xor): Transmit network packets based on [(source MAC address XOR’d with destination
MAC address) modulo NIC slave count]. This selects the same NIC slave for each destination MAC
address. This mode provides load balancing and fault tolerance.
Broadcast (broadcast): Transmit network packets on all slave network interfaces. This mode provides
fault tolerance.
IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP): Creates aggregation groups that share the
same speed and duplex settings. Utilizes all slave network interfaces in the active aggregator group ac-
cording to the 802.3ad specification.
Adaptive transmit load balancing (balance-tlb): Linux bonding driver mode that does not require any
special network-switch support. The outgoing network packet traffic is distributed according to the current
load (computed relative to the speed) on each network interface slave. Incoming traffic is received by one
currently designated slave network interface. If this receiving slave fails, another slave takes over the MAC
address of the failed receiving slave.
Adaptive load balancing (balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPV4
traffic, and does not require any special network switch support. The receive load balancing is achieved by
ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out
and overwrites the source hardware address with the unique hardware address of one of the NIC slaves in
the single logical bonded interface such that different network-peers use different MAC addresses for their
network packet traffic.
If your switch supports the LACP (IEEE 802.3ad) protocol, then we recommend using the corresponding bonding mode (802.3ad). Otherwise you should generally use the active-backup mode.
If you intend to run your cluster network on the bonding interfaces, then you have to use active-backup mode on the bonding interfaces; other modes are unsupported.
The following bond configuration can be used as a distributed/shared storage network. The benefit is that you get more speed and the network becomes fault-tolerant.
Example: Use bond with fixed IP address
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
auto bond0
iface bond0 inet static
slaves eno1 eno2
address 192.168.1.2
netmask 255.255.255.0
bond_miimon 100
bond_mode 802.3ad
bond_xmit_hash_policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2
netmask 255.255.255.0
gateway 10.10.10.1
bridge_ports eno3
bridge_stp off
bridge_fd 0
Another possibility is to use the bond directly as the bridge port. This can be used to make the guest network fault-tolerant.
Example: Use a bond as bridge port
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
slaves eno1 eno2
bond_miimon 100
bond_mode 802.3ad
bond_xmit_hash_policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2
netmask 255.255.255.0
gateway 10.10.10.1
bridge_ports bond0
bridge_stp off
bridge_fd 0
3.3.7 VLAN 802.1Q
A virtual LAN (VLAN) is a broadcast domain that is partitioned and isolated in the network at layer two. So it is possible to have multiple networks (up to 4096) in a physical network, each independent of the others.
Each VLAN network is identified by a number, often called the tag. Network packets are then tagged to identify which virtual network they belong to.
VLAN for Guest Networks
Proxmox VE supports this setup out of the box. You can specify the VLAN tag when you create a VM.
The VLAN tag is part of the guest network configuration. The networking layer supports different modes to implement VLANs, depending on the bridge configuration:
VLAN awareness on the Linux bridge: In this case, each guest's virtual network card is assigned a VLAN tag, which is transparently handled by the Linux bridge (see the sketch after this list). Trunk mode is also possible, but that requires additional configuration in the guest.
"traditional" VLAN on the Linux bridge: In contrast to the VLAN awareness method, this method is not
transparent and creates a VLAN device with an associated bridge for each VLAN. For example, using VLAN 5 for a guest in our default network creates the devices eno1.5 and vmbr0v5, which remain until a reboot.
Open vSwitch VLAN: This mode uses the OVS VLAN feature.
Guest configured VLAN: VLANs are assigned inside the guest. In this case, the setup is completely
done inside the guest and can not be influenced from the outside. The benefit is that you can use more
than one VLAN on a single virtual NIC.
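As an illustration of the VLAN awareness mode referenced above, a VLAN-aware bridge can be declared roughly as follows. This is a sketch using the same /etc/network/interfaces syntax as the other examples in this chapter, with the bridge_vlan_aware option enabled; the guest's VLAN tag itself is then set in the VM's network configuration.
auto vmbr0
iface vmbr0 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes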
VLAN on the Host
To allow host communication with an isolated network, it is possible to apply VLAN tags to any network device (NIC, bond, bridge). In general, you should configure the VLAN on the interface with the fewest abstraction layers between itself and the physical NIC.
For example, in a default configuration, you may want to place the host management address on a separate VLAN.
Note
In the following examples, we apply the VLAN at the bridge level to ensure that VLAN 5 still functions correctly for the guest network; in combination with a VLAN-aware bridge, however, this setup would not work for guest network VLAN 5. The downside of this setup is more CPU usage.
Example: Use VLAN 5 for the Proxmox VE management IP
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno1.5 inet manual
auto vmbr0v5
iface vmbr0v5 inet static
address 10.10.10.2
netmask 255.255.255.0
gateway 10.10.10.1
bridge_ports eno1.5
bridge_stp off
bridge_fd 0
auto vmbr0
iface vmbr0 inet manual
bridge_ports eno1
bridge_stp off
bridge_fd 0
The next example is the same setup but a bond is used to make this network fail-safe.
Example: Use VLAN 5 with bond0 for the Proxmox VE management IP
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
slaves eno1 eno2
bond_miimon 100
bond_mode 802.3ad
bond_xmit_hash_policy layer2+3
iface bond0.5 inet manual
auto vmbr0v5
iface vmbr0v5 inet static
address 10.10.10.2
netmask 255.255.255.0
gateway 10.10.10.1
bridge_ports bond0.5
bridge_stp off
bridge_fd 0
auto vmbr0
iface vmbr0 inet manual
bridge_ports bond0
bridge_stp off
bridge_fd 0
3.4 Time Synchronization
The Proxmox VE cluster stack itself relies heavily on the fact that all the nodes have precisely synchronized
time. Some other components, like Ceph, also refuse to work properly if the local time on nodes is not in
sync.
Time synchronization between nodes can be achieved with the “Network Time Protocol” (NTP). Proxmox VE
uses systemd-timesyncd as NTP client by default, preconfigured to use a set of public servers. This
setup works out of the box in most cases.
3.4.1 Using Custom NTP Servers
In some cases, it might be desired to not use the default NTP servers. For example, if your Proxmox VE
nodes do not have access to the public internet (e.g., because of restrictive firewall rules), you need to set up local NTP servers and tell systemd-timesyncd to use them:
File /etc/systemd/timesyncd.conf
[Time]
Servers=ntp1.example.com ntp2.example.com ntp3.example.com ntp4.example.com
After restarting the synchronization service (systemctl restart systemd-timesyncd) you should
verify that your newly configured NTP servers are used by checking the journal (journalctl --since
-1h -u systemd-timesyncd):
...
Oct 07 14:58:36 node1 systemd[1]: Stopping Network Time Synchronization...
Oct 07 14:58:36 node1 systemd[1]: Starting Network Time Synchronization...
Oct 07 14:58:36 node1 systemd[1]: Started Network Time Synchronization.
Oct 07 14:58:36 node1 systemd-timesyncd[13514]: Using NTP server 10.0.0.1:123 (ntp1.example.com).
Oct 07 14:58:36 nora systemd-timesyncd[13514]: interval/delta/delay/jitter/drift 64s/-0.002s/0.020s/0.000s/-31ppm
...
3.5 External Metric Server
Starting with Proxmox VE 4.0, you can define external metric servers, which will receive various statistics about your hosts, virtual machines and storages.
Currently supported are:
• graphite (see http://graphiteapp.org)
• influxdb (see https://www.influxdata.com/time-series-platform/influxdb/)
The server definitions are saved in /etc/pve/status.cfg.
3.5.1 Graphite server configuration
The definition of a server is:
graphite:
server your-server
port your-port
path your-path
where your-port defaults to 2003 and your-path defaults to proxmox.
Proxmox VE sends the data over UDP, so the Graphite server has to be configured to accept metrics via UDP.
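For illustration, a filled-in entry in /etc/pve/status.cfg might look like this (the server address is only a placeholder):
graphite:
        server 192.0.2.10
        port 2003
        path proxmox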
3.5.2 Influxdb plugin configuration
The definition is:
influxdb:
server your-server
port your-port
Proxmox VE sends the data over UDP, so the InfluxDB server has to be configured to accept UDP input.
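For illustration, the corresponding entry in /etc/pve/status.cfg might look like this (address and port are placeholders and must match the UDP listener configured on the InfluxDB server, as shown below):
influxdb:
        server 192.0.2.20
        port 8089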
Here is an example configuration for influxdb (on your influxdb server):
[[udp]]
enabled = true
bind-address = "0.0.0.0:8089"
database = "proxmox"
batch-size = 1000
batch-timeout = "1s"
With this configuration, the server listens on all IP addresses on port 8089 and writes the data into the proxmox database.
3.6 Disk Health Monitoring
Although a robust and redundant storage is recommended, it can be very helpful to monitor the health of
your local disks.
Starting with Proxmox VE 4.3, the package smartmontools 1 is installed and required. This is a set of tools
to monitor and control the S.M.A.R.T. system for local hard disks.
You can get the status of a disk by issuing the following command:
# smartctl -a /dev/sdX
where /dev/sdX is the path to one of your local disks.
If the output says:
SMART support is: Disabled
you can enable it with the command:
# smartctl -s on /dev/sdX
For more information on how to use smartctl, please see man smartctl.
By default, the smartmontools daemon smartd is active and enabled, and scans the disks under /dev/sdX and /dev/hdX every 30 minutes for errors and warnings, and sends an e-mail to root if it detects a problem.
For more information about how to configure smartd, please see man smartd and man smartd.conf.
If you use your hard disks with a hardware RAID controller, there are most likely tools to monitor the disks in the RAID array and the array itself. For more information about this, please refer to the vendor of your RAID controller.
3.7 Logical Volume Manager (LVM)
Most people install Proxmox VE directly on a local disk. The Proxmox VE installation CD offers several
options for local disk management, and the current default setup uses LVM. The installer lets you select a single disk for such a setup, and uses that disk as the physical volume for the Volume Group (VG) pve. The following output is from a test installation using a small 8GB disk:
1 smartmontools homepage https://www.smartmontools.org
# pvs
PV VG Fmt Attr PSize PFree
/dev/sda3 pve lvm2 a-- 7.87g 876.00m
# vgs
VG #PV #LV #SN Attr VSize VFree
pve 1 3 0 wz--n- 7.87g 876.00m
The installer allocates three Logical Volumes (LV) inside this VG:
# lvs
LV VG Attr LSize Pool Origin Data% Meta%
data pve twi-a-tz-- 4.38g 0.00 0.63
root pve -wi-ao---- 1.75g
swap pve -wi-ao---- 896.00m
root
Formatted as ext4, and contains the operating system.
swap
Swap partition
data
This volume uses LVM-thin, and is used to store VM images. LVM-thin is preferable for this task,
because it offers efficient support for snapshots and clones.
For Proxmox VE versions up to 4.1, the installer creates a standard logical volume called “data”, which is
mounted at /var/lib/vz.
Starting from version 4.2, the logical volume "data" is an LVM-thin pool, used to store block-based guest images, and /var/lib/vz is simply a directory on the root file system.
3.7.1 Hardware
We highly recommend using a hardware RAID controller (with BBU) for such setups. This increases performance, provides redundancy, and makes disk replacements easier (hot-pluggable).
LVM itself does not need any special hardware, and memory requirements are very low.
3.7.2 Bootloader
We install two boot loaders by default. The first partition contains the standard GRUB boot loader. The
second partition is an EFI System Partition (ESP), which makes it possible to boot on EFI systems.
3.7.3 Creating a Volume Group
Let’s assume we have an empty disk /dev/sdb, onto which we want to create a volume group named
“vmdata”.
Caution
Please note that the following commands will destroy all existing data on /dev/sdb.
First create a partition.
# sgdisk -N 1 /dev/sdb
Create a Physical Volume (PV) without confirmation and with 250K metadata size.
# pvcreate --metadatasize 250k -y -ff /dev/sdb1
Create a volume group named “vmdata” on /dev/sdb1
# vgcreate vmdata /dev/sdb1
3.7.4 Creating an extra LV for /var/lib/vz
This can be easily done by creating a new thin LV.
# lvcreate -n <Name> -V <Size[M,G,T]> <VG>/<LVThin_pool>
A real world example:
# lvcreate -n vz -V 10G pve/data
Now a filesystem must be created on the LV.
# mkfs.ext4 /dev/pve/vz
Finally, the new volume has to be mounted.
Warning
Be sure that /var/lib/vz is empty. On a default installation it is not.
To make it always accessible, add the following line to /etc/fstab.
# echo '/dev/pve/vz /var/lib/vz ext4 defaults 0 2' >> /etc/fstab
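Once /var/lib/vz is empty, the new volume can then be mounted right away using the fstab entry:
# mount /var/lib/vz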
3.7.5 Resizing the thin pool
Resizing the LV and the metadata pool can be achieved with the following command:
# lvresize --size +<size[M,G,T]> --poolmetadatasize +<size[M,G]> <VG>/<LVThin_pool>
Note
When extending the data pool, the metadata pool must also be extended.
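For example, to grow the default data pool by 10 GB and its metadata pool by 100 MB (the sizes are arbitrary examples):
# lvresize --size +10G --poolmetadatasize +100M pve/data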
3.7.6 Create a LVM-thin pool
A thin pool has to be created on top of a volume group. For how to create a volume group, see the section LVM above.
# lvcreate -L 80G -T -n vmstore vmdata
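If the thin pool should also be used as a Proxmox VE storage, it can then be registered, for example, with pvesm; the storage name below is arbitrary and the exact options are documented in the pvesm(1) manpage:
# pvesm add lvmthin vmstore --vgname vmdata --thinpool vmstore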
3.8 ZFS on Linux
ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Starting with
Proxmox VE 3.4, the native Linux kernel port of the ZFS file system is available as an optional file system and also as an additional choice for the root file system. There is no need to manually compile ZFS modules - all packages are included.
By using ZFS, it is possible to get enterprise-grade features with low-budget hardware, and also to build high-performance systems by leveraging SSD caching or even SSD-only setups. ZFS can replace expensive hardware RAID cards, trading them for moderate CPU and memory load, combined with easy management.
GENERAL ZFS ADVANTAGES
• Easy configuration and management with Proxmox VE GUI and CLI.
• Reliable
• Protection against data corruption
• Data compression on file system level
• Snapshots
• Copy-on-write clone
• Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
• Can use SSD for cache
• Self healing
• Continuous integrity checking
• Designed for high storage capacities
• Asynchronous replication over network
• Open Source
• Encryption
• . . .
3.8.1 Hardware
ZFS depends heavily on memory, so you need at least 8GB to start. In practice, use as much as you can get for your hardware/budget. To prevent data corruption, we recommend the use of high quality ECC RAM.
If you use a dedicated cache and/or log disk, you should use an enterprise-class SSD (e.g. Intel SSD DC S3700 Series). This can increase the overall performance significantly.
Important
Do not use ZFS on top of a hardware RAID controller which has its own cache management. ZFS needs to communicate directly with the disks. An HBA adapter, or something like an LSI controller flashed in "IT" mode, is the way to go.
If you are experimenting with an installation of Proxmox VE inside a VM (nested virtualization), don't use virtio for the disks of that VM, since they are not supported by ZFS. Use IDE or SCSI instead (this also works with the virtio SCSI controller type).
3.8.2 Installation as Root File System
When you install using the Proxmox VE installer, you can choose ZFS for the root file system. You need to
select the RAID type at installation time:
RAID0 Also called “striping”. The capacity of such volume is the sum of the capacities of all
disks. But RAID0 does not add any redundancy, so the failure of a single drive
makes the volume unusable.
RAID1 Also called “mirroring”. Data is written identically to all disks. This mode requires at
least 2 disks with the same size. The resulting capacity is that of a single disk.
RAID10 A combination of RAID0 and RAID1. Requires at least 4 disks.
RAIDZ-1 A variation on RAID-5, single parity. Requires at least 3 disks.
RAIDZ-2 A variation on RAID-5, double parity. Requires at least 4 disks.
RAIDZ-3 A variation on RAID-5, triple parity. Requires at least 5 disks.
The installer automatically partitions the disks, creates a ZFS pool called rpool, and installs the root file
system on the ZFS subvolume rpool/ROOT/pve-1.
Another subvolume called rpool/data is created to store VM images. In order to use that with the
Proxmox VE tools, the installer creates the following configuration entry in /etc/pve/storage.cfg:
zfspool: local-zfs
pool rpool/data
sparse
content images,rootdir
After installation, you can view your ZFS pool status using the zpool command:
# zpool status
pool: rpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda2 ONLINE 0 0 0
sdb2 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
errors: No known data errors
The zfs command is used to configure and manage your ZFS file systems. The following command lists all
file systems after installation:
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 4.94G 7.68T 96K /rpool
rpool/ROOT 702M 7.68T 96K /rpool/ROOT
rpool/ROOT/pve-1 702M 7.68T 702M /
rpool/data 96K 7.68T 96K /rpool/data
rpool/swap 4.25G 7.69T 64K -
3.8.3 Bootloader
The default ZFS disk partitioning scheme does not use the first 2048 sectors. This gives enough room to
install a GRUB boot partition. The Proxmox VE installer automatically allocates that space, and installs the
GRUB boot loader there. If you use a redundant RAID setup, it installs the boot loader on all disks required for booting, so you can boot even if some disks fail.
Note
It is not possible to use ZFS as root file system with UEFI boot.
3.8.4 ZFS Administration
This section gives you some usage examples for common tasks. ZFS itself is really powerful and provides
many options. The main commands to manage ZFS are zfs and zpool. Both commands come with great
manual pages, which can be read with:
# man zpool
# man zfs
Create a new zpool
To create a new pool, at least one disk is needed. The ashift should match the sector size of the underlying disk (2 to the power of ashift), or be larger.
zpool create -f -o ashift=12 <pool> <device>
To activate compression
zfs set compression=lz4 <pool>
Create a new pool with RAID-0
Minimum 1 Disk
zpool create -f -o ashift=12 <pool> <device1> <device2>
Create a new pool with RAID-1
Minimum 2 Disks
zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
Create a new pool with RAID-10
Minimum 4 Disks
zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
Create a new pool with RAIDZ-1
Minimum 3 Disks
zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
Create a new pool with RAIDZ-2
Minimum 4 Disks
zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
Create a new pool with cache (L2ARC)
It is possible to use a dedicated cache drive partition to increase the performance (use SSD).
As <device>, it is possible to use more than one device, as shown in "Create a new pool with RAID*".
zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
Create a new pool with log (ZIL)
It is possible to use a dedicated drive or partition for the log to increase the performance (use an SSD).
As <device>, it is possible to use more than one device, as shown in "Create a new pool with RAID*".
zpool create -f -o ashift=12 <pool> <device> log <log_device>
Add cache and log to an existing pool
If you have a pool without cache and log, first partition the SSD into two partitions with parted or gdisk.
Important
Always use GPT partition tables.
The maximum size of a log device should be about half the size of physical memory, so this is usually quite
small. The rest of the SSD can be used as cache.
zpool add -f <pool> log <device-part1> cache <device-part2>
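For example, with a (hypothetical) pool named tank and an SSD partitioned into /dev/sdf1 and /dev/sdf2:
# zpool add -f tank log /dev/sdf1 cache /dev/sdf2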
Changing a failed device
zpool replace -f <pool> <old device> <new-device>
3.8.5 Activate E-Mail Notification
ZFS comes with an event daemon, which monitors events generated by the ZFS kernel module. The daemon
can also send emails on ZFS events like pool errors. Newer ZFS packages ship the daemon in a separate package, and you can install it using apt-get:
# apt-get install zfs-zed
To activate the daemon it is necessary to edit /etc/zfs/zed.d/zed.rc with your favourite editor, and
uncomment the ZED_EMAIL_ADDR setting:
ZED_EMAIL_ADDR="root"
Please note that Proxmox VE forwards mail addressed to root to the email address configured for the root user.
Important
The only setting that is required is ZED_EMAIL_ADDR. All other settings are optional.
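After changing the setting, restart the daemon so that it is picked up; on current packages the service is typically named zfs-zed (an assumption worth verifying on your system):
# systemctl restart zfs-zed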
3.8.6 Limit ZFS Memory Usage
It is good to use at most 50 percent (which is the default) of the system memory for the ZFS ARC, to prevent performance degradation on the host. Use your preferred editor to change the configuration in /etc/modprobe.d/zfs.conf and insert:
options zfs zfs_arc_max=8589934592
This example setting limits the usage to 8GB.
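The value is simply the limit in bytes; 8 GiB is 8 * 1024 * 1024 * 1024 = 8589934592. To compute the value for a different limit, for example 4 GiB, a quick shell calculation can be used:
# echo $((4 * 1024 * 1024 * 1024))
4294967296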
Important
If your root file system is ZFS you must update your initramfs every time this value changes:
update-initramfs -u
SWAP on ZFS
Swap on ZFS on Linux may cause problems, such as blocking the server or generating a high IO load, often seen when starting a backup to an external storage.
We strongly recommend using enough memory, so that you normally do not run into low-memory situations.
Additionally, you can lower the “swappiness” value. A good value for servers is 10:
sysctl -w vm.swappiness=10
To make the swappiness persistent, open /etc/sysctl.conf with an editor of your choice and add the
following line:
vm.swappiness = 10
Table 3.1: Linux kernel swappiness parameter values
Value Strategy
vm.swappiness = 0 The kernel will swap only to avoid an out of memory condition
vm.swappiness = 1 Minimum amount of swapping without disabling it entirely.
vm.swappiness = 10 This value is sometimes recommended to improve performance
when sufficient memory exists in a system.
vm.swappiness = 60 The default value.
vm.swappiness = 100 The kernel will swap aggressively.
3.9 Certificate Management
3.9.1 Certificates for communication within the cluster
Each Proxmox VE cluster creates its own internal Certificate Authority (CA) and generates a self-signed cer-
tificate for each node. These certificates are used for encrypted communication with the cluster’s pveproxy
service and the Shell/Console feature if SPICE is used.
The CA certificate and key are stored in the pmxcfs (see the pmxcfs(8) manpage).
3.9.2 Certificates for API and web GUI
The REST API and web GUI are provided by the pveproxy service, which runs on each node.
You have the following options for the certificate used by pveproxy:
1. By default the node-specific certificate in /etc/pve/nodes/NODENAME/pve-ssl.pem is used.
This certificate is signed by the cluster CA and therefore not trusted by browsers and operating sys-
tems by default.
2. Use an externally provided certificate (e.g. signed by a commercial CA).
3. Use ACME (e.g., Let's Encrypt) to get a trusted certificate with automatic renewal.
For options 2 and 3, the file /etc/pve/local/pveproxy-ssl.pem (and /etc/pve/local/pveproxy-ssl.key, which needs to be without password) is used.
Certificates are managed with the Proxmox VE Node management command (see the pvenode(1) man-
page).
Warning
Do not replace or manually modify the automatically generated node certificate files in /etc/pve/local/pve-ssl.pem and /etc/pve/local/pve-ssl.key, or the cluster CA files in /etc/pve/pve-root-ca.pem and /etc/pve/priv/pve-root-ca.key.
Getting trusted certificates via ACME
Proxmox VE includes an implementation of the Automatic Certificate Management Environment ACME pro-
tocol, allowing Proxmox VE admins to interface with Let’s Encrypt for easy setup of trusted TLS certificates
which are accepted out of the box on most modern operating systems and browsers.
Currently the two ACME endpoints implemented are Let’s Encrypt (LE) and its staging environment (see
https://letsencrypt.org), both using the standalone HTTP challenge.
Because of rate-limits you should use LE staging for experiments.
There are a few prerequisites to use Let’s Encrypt:
1. Port 80 of the node needs to be reachable from the internet.
2. There must be no other listener on port 80.
3. The requested (sub)domain needs to resolve to a public IP of the Node.
4. You have to accept the ToS of Let’s Encrypt.
At the moment the GUI uses only the default ACME account.
Example: Sample pvenode invocation for using Let’s Encrypt certificates
root@proxmox:~# pvenode acme account register default mail@example.invalid
Directory endpoints:
0) Let’s Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory)
1) Let’s Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory)
2) Custom
Enter selection:
1
Attempting to fetch Terms of Service from ’https://acme-staging-v02.api.letsencrypt.org/directory’..
Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf
Do you agree to the above terms? [y|N]y
Attempting to register account with ’https://acme-staging-v02.api.letsencrypt.org/directory’..
Generating ACME account key..
Registering ACME account..
Registration successful, account URL: ’https://acme-staging-v02.api.letsencrypt.org/acme/acct/xxxxxxx’
Task OK
root@proxmox:~# pvenode acme account list
default
root@proxmox:~# pvenode config set --acme domains=example.invalid
root@proxmox:~# pvenode acme cert order
Loading ACME account details
Placing ACME order
Order URL: https://acme-staging-v02.api.letsencrypt.org/acme/order/xxxxxxxxxxxxxx
Getting authorization details from
’https://acme-staging-v02.api.letsencrypt.org/acme/authz/xxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxx-xxxxxxx’
... pending!
Setting up webserver
Triggering validation
Sleeping for 5 seconds
Status is ’valid’!
All domains validated!
Creating CSR
Finalizing order
Checking order status
valid!
Downloading certificate
Setting pveproxy certificate and key
Restarting pveproxy
Task OK
Automatic renewal of ACME certificates
If a node has been successfully configured with an ACME-provided certificate (either via pvenode or via the
GUI), the certificate will be automatically renewed by the pve-daily-update.service. Currently, renewal will be
attempted if the certificate has expired or will expire in the next 30 days.
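If needed, a renewal can also be triggered manually with pvenode; the available cert subcommands and options are documented in the pvenode(1) manpage:
root@proxmox:~# pvenode acme cert renew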
Chapter 4
Hyper-converged Infrastructure
Proxmox VE is a virtualization platform that tightly integrates compute, storage and networking resources,
manages highly available clusters, backup/restore as well as disaster recovery. All components are software-
defined and compatible with one another.
Therefore, it is possible to administer them like a single system via the centralized web management inter-
face. These capabilities make Proxmox VE an ideal choice to deploy and manage an open source hyper-
converged infrastructure.
4.1 Benefits of a Hyper-Converged Infrastructure (HCI) with Proxmox
VE
A hyper-converged infrastructure is especially useful for deployments in which a high infrastructure demand
meets a low administration budget, for distributed setups such as remote and branch office environments or
for virtual private and public clouds.
HCI provides the following advantages:
Scalability: seamless expansion of compute, network and storage devices (i.e. scale up servers and
storage quickly and independently from each other).
Low cost: Proxmox VE is open source and integrates all components you need such as compute, storage,
networking, backup, and management center. It can replace an expensive compute/storage infrastructure.
Data protection and efficiency: services such as backup and disaster recovery are integrated.
Simplicity: easy configuration and centralized administration.
Open Source: No vendor lock-in.
4.2 Manage Ceph Services on Proxmox VE Nodes
Proxmox VE unifies your compute and storage systems, i.e. you can use the same physical nodes within
a cluster for both computing (processing VMs and containers) and replicated storage. The traditional silos
of compute and storage resources can be wrapped up into a single hyper-converged appliance. Separate
storage networks (SANs) and connections via network (NAS) disappear. With the integration of Ceph, an
open source software-defined storage platform, Proxmox VE has the ability to run and manage Ceph storage
directly on the hypervisor nodes.
Ceph is a distributed object store and file system designed to provide excellent performance, reliability and
scalability.
For small to mid sized deployments, it is possible to install a Ceph server for RADOS Block Devices (RBD)
directly on your Proxmox VE cluster nodes, see Ceph RADOS Block Devices (RBD) Section 8.14. Recent
hardware has plenty of CPU power and RAM, so running storage services and VMs on the same node is
possible.
To simplify management, we provide pveceph - a tool to install and manage Ceph services on Proxmox VE
nodes.
Ceph consists of a couple of daemons 1, for use as an RBD storage:
1Ceph intro http://docs.ceph.com/docs/master/start/intro/
Ceph Monitor (ceph-mon)
Ceph Manager (ceph-mgr)
Ceph OSD (ceph-osd; Object Storage Daemon)
Tip
We recommend to get familiar with the Ceph vocabulary. a
aCeph glossary http://docs.ceph.com/docs/luminous/glossary
4.2.1 Precondition
To build a Proxmox Ceph cluster, there should be at least three (preferably identical) servers for the setup.
A 10Gb network, exclusively used for Ceph, is recommended. A meshed network setup is also an option if
there are no 10Gb switches available, see wiki .
Check also the recommendations from Ceph’s website.
4.2.2 Installation of Ceph Packages
On each node run the installation script as follows:
pveceph install
This sets up an apt package repository in /etc/apt/sources.list.d/ceph.list and installs
the required software.
4.2.3 Creating initial Ceph configuration
After installation of packages, you need to create an initial Ceph configuration on just one node, based on
your network (10.10.10.0/24 in the following example) dedicated for Ceph:
pveceph init --network 10.10.10.0/24
This creates an initial config at /etc/pve/ceph.conf. That file is automatically distributed to all Prox-
mox VE nodes by using pmxcfs Chapter 7. The command also creates a symbolic link from /etc/ceph/
ceph.conf pointing to that file. So you can simply run Ceph commands without the need to specify a
configuration file.
4.2.4 Creating Ceph Monitors
The Ceph Monitor (MON) 2 maintains a master copy of the cluster map. For HA, you need to have at least 3 monitors.
On each node where you want to place a monitor (three monitors are recommended), create it by using the
Ceph Monitor tab in the GUI, or run:
pveceph createmon
This will also install the needed Ceph Manager (ceph-mgr) by default. If you do not want to install a manager,
specify the -exclude-manager option.
4.2.5 Creating Ceph Manager
The Manager daemon runs alongside the monitors. It provides interfaces for monitoring the cluster. Since the Ceph Luminous release, the ceph-mgr 3 daemon is required. During monitor installation, the Ceph manager will be installed as well.
2Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/
3Ceph Manager http://docs.ceph.com/docs/luminous/mgr/
Note
It is recommended to install the Ceph Manager on the monitor nodes. For high availability, install more than one manager.
pveceph createmgr
4.2.6 Creating Ceph OSDs
You can create OSDs via the GUI or via the CLI as follows:
pveceph createosd /dev/sd[X]
Tip
We recommend a Ceph cluster size of at least 12 OSDs, distributed evenly among your (at least three) nodes, i.e. 4 OSDs on each node.
Ceph Bluestore
Starting with the Ceph Kraken release, a new Ceph OSD storage type was introduced, the so-called Bluestore 4. In Ceph Luminous this store is the default when creating OSDs.
pveceph createosd /dev/sd[X]
Note
In order to select a disk in the GUI, to be more failsafe, the disk needs to have a GPT partition table a. You can create this with gdisk /dev/sd(x). If there is no GPT, you cannot select the disk as DB/WAL.
aGPT partition table https://en.wikipedia.org/wiki/GUID_Partition_Table
If you want to use a separate DB/WAL device for your OSDs, you can specify it through the -wal_dev option.
pveceph createosd /dev/sd[X] -wal_dev /dev/sd[Y]
Note
The DB stores BlueStore's internal metadata and the WAL is BlueStore's internal journal or write-ahead log. It is recommended to use a fast SSD or NVRAM for better performance.
Ceph Filestore
Until Ceph Luminous, Filestore was used as the storage type for Ceph OSDs. It can still be used and might give better performance in small setups, when backed by an NVMe SSD or similar.
pveceph createosd /dev/sd[X] -bluestore 0
Note
In order to select a disk in the GUI, the disk needs to have a GPT partition table. You can create this with gdisk /dev/sd(x). If there is no GPT, you cannot select the disk as journal. Currently the journal size is fixed to 5 GB.
If you want to use a dedicated SSD journal disk:
pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y] -bluestore 0
Example: Use /dev/sdf as data disk (4TB) and /dev/sdb as the dedicated SSD journal disk.
pveceph createosd /dev/sdf -journal_dev /dev/sdb -bluestore 0
This partitions the disk (data and journal partition), creates the filesystems and starts the OSD; afterwards it is running and fully functional.
4Ceph Bluestore http://ceph.com/community/new-luminous-bluestore/
Note
This command refuses to initialize a disk if it detects existing data. If you want to overwrite a disk, you should remove the existing data first. You can do that using: ceph-disk zap /dev/sd[X]
You can create OSDs containing both journal and data partitions or you can place the journal on a dedicated
SSD. Using an SSD journal disk is highly recommended to achieve good performance.
4.2.7 Creating Ceph Pools
A pool is a logical group for storing objects. It holds Placement Groups (PGs), collections of objects.
When no options are given, we set a default of 64 PGs, a size of 3 replicas and a min_size of 2 replicas
for serving objects in a degraded state.
Note
The default number of PGs works for 2-6 disks. Ceph throws a "HEALTH_WARNING" if you have too few
or too many PGs in your cluster.
It is advised to calculate the PG number depending on your setup; you can find the formula and the PG calculator 5 online. While PGs can be increased later on, they can never be decreased.
You can create pools through the command line or in the GUI on each PVE host under Ceph Pools.
pveceph createpool <name>
If you would also like to automatically get a storage definition for your pool, activate the checkbox "Add storages" in the GUI or use the command line option --add_storages on pool creation.
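As a sketch, a pool can also be created with explicit parameters; the option names below (pg_num, size, min_size, add_storages) follow the pool settings described above and the pveceph manpage, so verify them against your version:
pveceph createpool vm-pool --pg_num 128 --size 3 --min_size 2 --add_storages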
Further information on Ceph pool handling can be found in the Ceph pool operation 6 manual.
4.2.8 Ceph CRUSH & device classes
The foundation of Ceph is its algorithm, Controlled Replication Under Scalable Hashing (CRUSH 7).
CRUSH calculates where to store data and where to retrieve it from; this has the advantage that no central index service is needed. CRUSH works with a map of OSDs, buckets (device locations) and rulesets (data replication) for pools.
Note
Further information can be found in the Ceph documentation, under the section CRUSH map a.
aCRUSH map http://docs.ceph.com/docs/luminous/rados/operations/crush-map/
This map can be altered to reflect different replication hierarchies. The object replicas can be separated (eg.
failure domains), while maintaining the desired distribution.
A common use case is to use different classes of disks for different Ceph pools. For this reason, Ceph
introduced the device classes with luminous, to accommodate the need for easy ruleset generation.
The device classes can be seen in the ceph osd tree output. These classes represent their own root bucket,
which can be seen with the below command.
ceph osd crush tree --show-shadow
Example output from the above command:
ID CLASS WEIGHT TYPE NAME
-16 nvme 2.18307 root default~nvme
-13 nvme 0.72769 host sumi1~nvme
12 nvme 0.72769 osd.12
-14 nvme 0.72769 host sumi2~nvme
13 nvme 0.72769 osd.13
-15 nvme 0.72769 host sumi3~nvme
14 nvme 0.72769 osd.14
-1 7.70544 root default
-3 2.56848 host sumi1
12 nvme 0.72769 osd.12
-5 2.56848 host sumi2
5PG calculator http://ceph.com/pgcalc/
6Ceph pool operation http://docs.ceph.com/docs/luminous/rados/operations/pools/
7CRUSH https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
13 nvme 0.72769 osd.13
-7 2.56848 host sumi3
14 nvme 0.72769 osd.14
To let a pool distribute its objects only on a specific device class, you need to create a ruleset with the specific
class first.
ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
<rule-name> name of the rule, to connect with a pool (seen in GUI & CLI)
<root> which crush root it should belong to (default ceph root "default")
<failure-domain> at which failure-domain the objects should be distributed (usually host)
<class> what type of OSD backing store to use (eg. nvme, ssd, hdd)
Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.
ceph osd pool set <pool-name> crush_rule <rule-name>
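For example, to bind a (hypothetical) pool named vm-pool to OSDs of the ssd device class using a new rule named ssd-only:
ceph osd crush rule create-replicated ssd-only default host ssd
ceph osd pool set vm-pool crush_rule ssd-only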
Tip
If the pool already contains objects, all of these have to be moved accordingly. Depending on your setup
this may introduce a big performance hit on your cluster. As an alternative, you can create a new pool and
move disks separately.
4.2.9 Ceph Client
You can then configure Proxmox VE to use such pools to store VM or Container images. Simply use the GUI to add a new RBD storage (see section Ceph RADOS Block Devices (RBD) Section 8.14).
You also need to copy the keyring to a predefined location for an external Ceph cluster. If Ceph is installed on the Proxmox nodes themselves, this is done automatically.
Note
The file name needs to be <storage_id> + .keyring, where <storage_id> is the expression after rbd: in /etc/pve/storage.cfg, which is my-ceph-storage in the following example:
mkdir /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
Chapter 5
Graphical User Interface
Proxmox VE is simple. There is no need to install a separate management tool, and everything can be done
through your web browser (Latest Firefox or Google Chrome is preferred). A built-in HTML5 console is used
to access the guest console. As an alternative, SPICE can be used.
Because we use the Proxmox cluster file system (pmxcfs), you can connect to any node to manage the
entire cluster. Each node can manage the entire cluster. There is no need for a dedicated manager node.
You can use the web-based administration interface with any modern browser. When Proxmox VE detects
that you are connecting from a mobile device, you are redirected to a simpler, touch-based user interface.
The web interface can be reached via https://youripaddress:8006 (default login is: root, and the password is
specified during the installation process).
5.1 Features
Seamless integration and management of Proxmox VE clusters
AJAX technologies for dynamic updates of resources
Secure access to all Virtual Machines and Containers via SSL encryption (https)
Fast search-driven interface, capable of handling hundreds and probably thousands of VMs
Secure HTML5 console or SPICE
Role based permission management for all objects (VMs, storages, nodes, etc.)
Support for multiple authentication sources (e.g. local, MS ADS, LDAP, . . . )
Two-Factor Authentication (OATH, Yubikey)
Based on ExtJS 6.x JavaScript framework
5.2 Login
When you connect to the server, you will first see the login window. Proxmox VE supports various authen-
tication backends (Realm), and you can select the language here. The GUI is translated to more than 20
languages.
Note
You can save the user name on the client side by selecting the checkbox at the bottom. This saves some typing when you log in next time.
5.3 GUI Overview
The Proxmox VE user interface consists of four regions.
Header On top. Shows status information and contains buttons for most important actions.
Resource Tree At the left side. A navigation tree where you can select specific objects.
Content Panel Center region. Selected objects display their configuration options and status here.
Log Panel At the bottom. Displays log entries for recent tasks. You can double-click on those
log entries to get more details, or to abort a running task.
Note
You can shrink and expand the size of the resource tree and log panel, or completely hide the log panel.
This can be helpful when you work on small displays and want more space to view other content.
5.3.1 Header
On the top left side, the first thing you see is the Proxmox logo. Next to it is the current running version of
Proxmox VE. In the search bar next to it you can search for specific objects (VMs, containers, nodes, . . . ). This is sometimes faster than selecting an object in the resource tree.
To the right of the search bar we see the identity (login name). The gear symbol is a button opening the My Settings dialog. There you can customize some client-side user interface settings (reset the saved login name, reset the saved layout).
The rightmost part of the header contains four buttons:
Help Opens a new browser window showing the reference documentation.
Create VM Opens the virtual machine creation wizard.
Create CT Opens the container creation wizard.
Logout Logs out, and shows the login dialog again.
5.3.2 Resource Tree
This is the main navigation tree. On top of the tree you can select some predefined views, which changes
the structure of the tree below. The default view is Server View, and it shows the following object types:
Datacenter Contains cluster-wide settings (relevant for all nodes).
Node Represents the hosts inside a cluster, where the guests run.
Guest VMs, Containers and Templates.
Storage Data Storage.
Pool It is possible to group guests using a pool to simplify management.
The following view types are available:
Server View Shows all kinds of objects, grouped by nodes.
Folder View Shows all kinds of objects, grouped by object type.
Storage View Only shows storage objects, grouped by nodes.
Pool View Shows VMs and Containers, grouped by pool.
5.3.3 Log Panel
The main purpose of the log panel is to show you what is currently going on in your cluster. Actions like creating a new VM are executed in the background, and we call such a background job a task.
Any output from such a task is saved into a separate log file. You can view that log by simply double-clicking a task log entry. It is also possible to abort a running task there.
Please note that we display the most recent tasks from all cluster nodes here. So you can see in real time when somebody else is working on another cluster node.
Note
We remove older and finished tasks from the log panel to keep that list short. But you can still find those tasks in the Task History within the node panel.
Some short-running actions simply send logs to all cluster members. You can see those messages in the Cluster log panel.
5.4 Content Panels
When you select something in the resource tree, the corresponding object displays configuration and status
information in the content panel. The following sections give a brief overview of the functionality. Please refer
to the individual chapters inside the reference documentation to get more detailed information.
5.4.1 Datacenter
On the datacenter level you can access cluster wide settings and information.
Search: it is possible to search for anything in the cluster; this can be a node, VM, container, storage or a pool.
Summary: gives a brief overview over the cluster health.
Options: can show and set defaults, which apply cluster wide.
Storage: is the place where storages are added, managed and removed.
Backup: has the capability to schedule backups. This is cluster-wide, so it does not matter where your VMs/containers are located in the cluster at schedule time.
Permissions: manages user and group permissions; LDAP, MS-AD and Two-Factor authentication can be set up here.
HA: manages the Proxmox VE High Availability settings.
Firewall: at this level, the Proxmox Firewall works cluster-wide and provides templates which are available cluster-wide.
Support: here you get all information about your support subscription.
If you would like more information about these, see the corresponding chapter.
5.4.2 Nodes
Everything belonging to a node can be managed at this level.
Search: it is possible to search for anything on the node; this can be a VM, container, storage or a pool.
Summary: gives a brief overview of the resource usage.
Shell: logs you into the shell of the node.
System: is for configuring the network, DNS and time, and also shows your syslog.
Updates: will upgrade the system and inform you about new packages.
Firewall: at this level, the firewall settings apply only to this node.
Disk: gives you a brief overview of your physical hard drives and how they are used.
Ceph: is only used if you have installed a Ceph server on your host. Then you can manage your Ceph cluster and see its status here.
Task History: here all past tasks are shown.
Subscription: here you can upload your subscription key and get a system overview in case of a support case.
5.4.3 Guests
There are two different kinds of guests, and both can be converted to a template. One of them is the Kernel-based Virtual Machine (KVM) and the other one is the Linux Container (LXC). In general, the navigation is the same; only some options differ.
In the main management center, the guest navigation begins when a guest is selected in the left tree.
The top header contains important guest operation commands like Start, Shutdown, Reset, Remove, Migrate, Console and Help. Some of them have hidden buttons, e.g. Shutdown also offers Stop, and Console contains the different console types SPICE, noVNC and xterm.js.
On the left side, all available options are listed one below the other; the content on the right side changes with the selected option.
Summary: gives a brief overview over the VM activity.
Console: an interactive console to your VM.
(KVM)Hardware: shows and sets the hardware of the KVM VM.
(LXC)Resources: defines the system resources of the LXC container.
(LXC)Network: the LXC Network settings.
(LXC)DNS: the LXC DNS settings.
Options: all guest options can be set here; they differ between KVM and LXC.
Task History: here all previous tasks of this guest are shown.
(KVM) Monitor: is the interactive communication interface to the KVM process.
Backup: shows the available backups of this guest and can also be used to create a backup.
Replication: shows the replication jobs for this guest and allows you to create new jobs.
Snapshots: manage VM snapshots.
Firewall: manage the firewall on VM level.
Permissions: manage the user permission for this VM.
5.4.4 Storage
In this view we have a two-panel split view. On the left side we have the storage options, and on the right side the content of the selected option is shown.
Summary: shows important information about your storage, like Usage, Type, Content, Active and Enabled.
Content: here all content is listed, grouped by content type.
Permissions: manage the user permission for this storage.
5.4.5 Pools
In this view we have a two-panel split view. On the left side we have the logical pool options, and on the right side the content of the selected option is shown.
Summary: shows the description of the pool.
Members: here all members of this pool are listed and can be managed.
Permissions: manage the user permission for this pool.
Chapter 6
Cluster Manager
The Proxmox VE cluster manager pvecm is a tool to create a group of physical servers. Such a group is
called a cluster. We use the Corosync Cluster Engine for reliable group communication, and such clusters
can consist of up to 32 physical nodes (probably more, dependent on network latency).
pvecm can be used to create a new cluster, join nodes to a cluster, leave the cluster, get status informa-
tion and do various other cluster related tasks. The Proxmox Cluster File System (“pmxcfs”) is used to
transparently distribute the cluster configuration to all cluster nodes.
Grouping nodes into a cluster has the following advantages:
Centralized, web based management
Multi-master clusters: each node can do all management tasks
pmxcfs: database-driven file system for storing configuration files, replicated in real-time on all nodes
using corosync.
Easy migration of virtual machines and containers between physical hosts
Fast deployment
Cluster-wide services like firewall and HA
6.1 Requirements
All nodes must be in the same network as corosync uses IP Multicast to communicate between nodes
(also see Corosync Cluster Engine). Corosync uses UDP ports 5404 and 5405 for cluster communication.
Note
Some switches do not support IP multicast by default and must be manually enabled first.
Date and time have to be synchronized.
SSH tunnel on TCP port 22 between nodes is used.
If you are interested in High Availability, you need to have at least three nodes for reliable quorum. All
nodes should have the same version.
We recommend a dedicated NIC for the cluster traffic, especially if you use shared storage.
Note
It is not possible to mix Proxmox VE 3.x and earlier with Proxmox VE 4.0 cluster nodes.
6.2 Preparing Nodes
First, install Proxmox VE on all nodes. Make sure that each node is installed with the final hostname and IP
configuration. Changing the hostname and IP is not possible after cluster creation.
Currently, cluster creation has to be done on the console, so you need to log in via ssh.
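Before creating or joining a cluster, it is worth verifying that the node's hostname resolves to the intended, final IP address. A minimal check could look like this (the node name hp1 is a placeholder; fix /etc/hosts or your DNS first if the returned address is wrong):
hp1# hostname
hp1# getent hosts $(hostname)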
6.3 Create the Cluster
Log in via ssh to the first Proxmox VE node. Use a unique name for your cluster. This name cannot be changed later.
hp1# pvecm create YOUR-CLUSTER-NAME
Caution
The cluster name is used to compute the default multicast address. Please use unique cluster
names if you run more than one cluster inside your network.
To check the state of your cluster use:
hp1# pvecm status
6.3.1 Multiple Clusters In Same Network
It is possible to create multiple clusters in the same physical or logical network. Each cluster must have
a unique name, which is used to generate the cluster’s multicast group address. As long as no duplicate
cluster names are configured in one network segment, the different clusters won’t interfere with each other.
If multiple clusters operate in a single network, it may be beneficial to set up an IGMP querier and enable IGMP snooping in that network. This can reduce the network load significantly, because multicast packets are only delivered to the endpoints of the respective member nodes.
6.4 Adding Nodes to the Cluster
Log in via ssh to the node you want to add.
hp2# pvecm add IP-ADDRESS-CLUSTER
For IP-ADDRESS-CLUSTER use the IP from an existing cluster node.
Caution
A new node cannot hold any VMs, because you would get conflicts about identical VM IDs. Also, all existing configuration in /etc/pve is overwritten when you join a new node to the cluster. As a workaround, use vzdump to back up the guests and restore them to a different VMID after adding the node to the cluster, as sketched below.
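A minimal sketch of that workaround, assuming a guest with VMID 100, a backup storage named backupstore and a free VMID 120 in the cluster (all names, IDs and the archive path are placeholders):
# on the node, before joining the cluster: back up the guest
vzdump 100 --storage backupstore
# after the node has joined: restore the backup under a new, free VMID
qmrestore /path/to/the/vzdump-archive.vma 120
For containers, pct restore is used instead of qmrestore.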
To check the state of the cluster use:
# pvecm status
Cluster status after adding 4 nodes
hp2# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date: Mon Apr 20 12:30:13 2015
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1928
Quorate: Yes
Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 2
Flags: Quorate
Membership information
~~~~~~~~~~~~~~~~~~~~~~
Nodeid Votes Name
0x00000001 1 192.168.15.91
0x00000002 1 192.168.15.92 (local)
0x00000003 1 192.168.15.93
0x00000004 1 192.168.15.94
If you only want the list of all nodes use:
# pvecm nodes
List nodes in a cluster
hp2# pvecm nodes
Membership information
~~~~~~~~~~~~~~~~~~~~~~
Nodeid Votes Name
1 1 hp1
2 1 hp2 (local)
3 1 hp3
4 1 hp4
6.4.1 Adding Nodes With Separated Cluster Network
When adding a node to a cluster with a separated cluster network, you need to use the ringX_addr parameters to set the node's addresses on those networks:
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
If you want to use the Redundant Ring Protocol you will also want to pass the ring1_addr parameter.
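For example, with the Redundant Ring Protocol in use you would pass both ring addresses (the IP placeholders follow the notation used above):
pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0 -ring1_addr IP-ADDRESS-RING1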
6.5 Remove a Cluster Node
Caution
Read the procedure carefully before proceeding, as it may not be what you want or need.
Move all virtual machines from the node. Make sure you have no local data or backups you want to keep, or
save them accordingly. In the following example we will remove the node hp4 from the cluster.
Log in to a different cluster node (not hp4), and issue a pvecm nodes command to identify the node ID
to remove:
hp1# pvecm nodes
Membership information
~~~~~~~~~~~~~~~~~~~~~~
Nodeid Votes Name
1 1 hp1 (local)
2 1 hp2
3 1 hp3
4 1 hp4
At this point you must power off hp4 and make sure that it will not power on again (in the network) as it is.
Important
As said above, it is critical to power off the node before removal, and make sure that it will never
power on again (in the existing cluster network) as it is. If you power on the node as it is, your
cluster will be screwed up and it could be difficult to restore a clean cluster state.
After powering off the node hp4, we can safely remove it from the cluster.
hp1# pvecm delnode hp4
If the operation succeeds no output is returned, just check the node list again with pvecm nodes or pvecm
status. You should see something like:
hp1# pvecm status
Quorum information
~~~~~~~~~~~~~~~~~~
Date: Mon Apr 20 12:44:28 2015
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1992
Quorate: Yes
Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 3
Flags: Quorate
Membership information
~~~~~~~~~~~~~~~~~~~~~~
Nodeid Votes Name
0x00000001 1 192.168.15.90 (local)
0x00000002 1 192.168.15.91
0x00000003 1 192.168.15.92
If, for whatever reason, you want this server to join the same cluster again, you have to reinstall Proxmox VE on it from scratch and then join it, as explained in the previous section.
6.5.1 Separate A Node Without Reinstalling
Caution
This is not the recommended method, proceed with caution. Use the above mentioned method if
you’re unsure.
You can also separate a node from a cluster without reinstalling it from scratch. But after removing the node from the cluster, it will still have access to the shared storages! This must be resolved before you start removing the node from the cluster. A Proxmox VE cluster cannot share the exact same storage with another cluster, as storage locking does not work across cluster boundaries. Further, it may also lead to VMID conflicts.
It is suggested that you create a new storage to which only the node you want to separate has access. This can be a new export on your NFS server or a new Ceph pool, to name a few examples. It is just important that the exact same storage is not accessed by multiple clusters. After setting up this storage, move all data from the node and its VMs to it. Then you are ready to separate the node from the cluster.
Warning
Ensure all shared resources are cleanly separated! Otherwise you will run into conflicts and problems.
First stop the corosync and the pve-cluster services on the node:
systemctl stop pve-cluster
systemctl stop corosync
Start the cluster filesystem again in local mode:
pmxcfs -l
Delete the corosync configuration files:
rm /etc/pve/corosync.conf
rm /etc/corosync/*
You can now start the filesystem again as a normal service:
killall pmxcfs
systemctl start pve-cluster
The node is now separated from the cluster. You can delete it from any remaining node of the cluster with:
pvecm delnode oldnode
If the command fails because the remaining node in the cluster lost quorum when the now separated node exited, you may set the expected votes to 1 as a workaround:
pvecm expected 1
And then repeat the pvecm delnode command.
Now switch back to the separated node and delete all remaining files left over from the old cluster. This ensures that the node can be added to another cluster again without problems.
rm /var/lib/corosync/*
As the configuration files from the other nodes are still in the cluster filesystem, you may want to clean those up too. Simply remove the whole directory /etc/pve/nodes/NODENAME recursively, but check three times that you are using the correct one before deleting it.
Caution
The node's SSH keys are still in the authorized_keys file; this means the nodes can still connect to each other with public key authentication. Fix this by removing the respective keys from the /etc/pve/priv/authorized_keys file.
6.6 Quorum
Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes.
A quorum is the minimum number of votes that a distributed transaction has to obtain in order
to be allowed to perform an operation in a distributed system.
— from Wikipedia Quorum (distributed computing)
In case of network partitioning, state changes require that a majority of nodes is online. The cluster switches to read-only mode if it loses quorum.
Note
Proxmox VE assigns a single vote to each node by default.
6.7 Cluster Network
The cluster network is the core of a cluster. All messages sent over it have to be delivered reliably to all nodes in their respective order. In Proxmox VE this part is done by corosync, an implementation of a high performance, low overhead, high availability development toolkit. It serves our decentralized configuration file system (pmxcfs).
6.7.1 Network Requirements
This needs a reliable network with latencies under 2 milliseconds (LAN performance) to work properly. While corosync can also use unicast for communication between nodes, it is highly recommended to have a multicast capable network. The network should not be used heavily by other members; ideally corosync runs on its own network. Never share it with a network where storage communicates too.
Before setting up a cluster it is good practice to check if the network is fit for that purpose.
Ensure that all nodes are in the same subnet. This must only be true for the network interfaces used for
cluster communication (corosync).
Ensure all nodes can reach each other over those interfaces; using ping is enough for a basic test.
Ensure that multicast works in general and at high packet rates. This can be done with the omping tool. The final "%loss" number should be < 1%.
omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
Ensure that multicast communication works over an extended period of time. This uncovers problems
where IGMP snooping is activated on the network but no multicast querier is active. This test has a
duration of around 10 minutes.
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
Your network is not ready for clustering if any of these tests fails. Recheck your network configuration. Switches in particular are notorious for having multicast disabled by default or IGMP snooping enabled with no IGMP querier active.
In smaller clusters it is also an option to use unicast if you really cannot get multicast to work.
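Unicast is configured through the transport setting in the totem section of corosync.conf. The fragment below is only a sketch and an assumption about the exact syntax; check the corosync.conf man page for your version and apply the change as described in the Corosync Configuration section below:
totem {
  transport: udpu
  [...] # keep the other existing totem options unchanged
}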
6.7.2 Separate Cluster Network
When creating a cluster without any parameters, the cluster network is generally shared with the web UI and the VMs and their traffic. Depending on your setup, even storage traffic may get sent over the same network. It is recommended to change that, as corosync is a time-critical real-time application.
Setting Up A New Network
First you have to set up a new network interface. It should be on a physically separate network. Ensure that your network fulfills the cluster network requirements.
Separate On Cluster Creation
This is possible through the ring0_addr and bindnet0_addr parameters of the pvecm create command used for creating a new cluster.
If you have set up an additional NIC with a static address on 10.10.10.1/25 and want to send and receive all cluster communication over this interface, you would execute:
pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
To check if everything is working properly execute:
systemctl status corosync
Afterwards, proceed as described in the section about adding nodes with a separated cluster network.
Separate After Cluster Creation
You can do this also if you have already created a cluster and want to switch its communication to another
network, without rebuilding the whole cluster. This change may lead to short durations of quorum loss in the
cluster, as nodes have to restart corosync and come up one after the other on the new network.
Check how to edit the corosync.conf file first. Then open it and you should see a file similar to:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }
  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }
  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }
}
The first thing you want to do is add the name properties to the node entries, if you do not see them already. Those must match the node name.
Then replace the addresses from the ring0_addr properties with the new addresses. You may use plain IP addresses or hostnames here. If you use hostnames, ensure that they are resolvable from all nodes.
In this example we want to switch the cluster communication to the 10.10.10.1/25 network, so we replace all ring0_addr properties accordingly. We also set the bindnetaddr in the totem section of the config to an address of the new network. It can be any address from the subnet configured on the new network interface.
After increasing the config_version property, the new configuration file should look like this:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }
  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 4
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
}
Now, after a final check that all the changed information is correct, we save it and once again follow the edit corosync.conf file section to learn how to bring it into effect.
As our change cannot be applied live by corosync, we have to do a restart.
On a single node execute:
systemctl restart corosync
Now check if everything is fine:
systemctl status corosync
If corosync runs correctly again, restart it on all other nodes as well. They will then join the cluster membership one by one on the new network.
6.7.3 Redundant Ring Protocol
To avoid a single point of failure, you should implement countermeasures. This can be done on the hardware and operating system level through network bonding.
Corosync itself also offers the possibility to add redundancy through the so-called Redundant Ring Protocol. This protocol allows running a second totem ring on another network; this network should be physically separated from the other ring's network to actually increase availability.
6.7.4 RRP On Cluster Creation
The pvecm create command provides the additional parameters bindnetX_addr, ringX_addr and rrp_mode, which can be used for RRP configuration.
Note
See the glossary if you do not know what each parameter means.
So if you have two networks, one on the 10.10.10.1/24 and the other on the 10.10.20.1/24 subnet you would
execute:
pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
-bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
6.7.5 RRP On Existing Clusters
To enable RRP on an already running cluster, you take similar steps as described for separating the cluster network. The single difference is that in those steps you add and configure ring1 instead of ring0.
First, add a new interface subsection in the totem section and set its ringnumber property to 1. Set the interface's bindnetaddr property to an address of the subnet you have configured for your new ring. Further, set the rrp_mode to passive; this is the only stable mode.
Then add a new ring1_addr property with the node's additional ring address to each node entry in the nodelist section.
So if you have two networks, one on the 10.10.10.1/24 and the other on the 10.10.20.1/24 subnet, the final
configuration file should look like:
totem {
  cluster_name: tweak
  config_version: 9
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.20.1
    ringnumber: 1
  }
}

nodelist {
  node {
    name: pvecm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.10.20.1
  }
  node {
    name: pvecm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
    ring1_addr: 10.10.20.2
  }
  [...] # other cluster nodes here
}

[...] # other remaining config sections here
Bring it into effect as described in the edit the corosync.conf file section.
This is a change which cannot take effect live and needs at least a restart of corosync. A restart of the whole cluster is recommended.
If you cannot reboot the whole cluster, ensure that no High Availability services are configured and then stop the corosync service on all nodes. After corosync is stopped on all nodes, start it again one node after the other.
6.8 Corosync Configuration
The /etc/pve/corosync.conf file plays a central role in a Proxmox VE cluster. It controls the cluster membership and its network. For more information about it, check the corosync.conf man page:
man corosync.conf
For node membership you should always use the pvecm tool provided by Proxmox VE. You may have to
edit the configuration file manually for other changes. Here are a few best practice tips for doing this.
6.8.1 Edit corosync.conf
Editing the corosync.conf file is not always straightforward. There are two copies on each cluster node, one in /etc/pve/corosync.conf and the other in /etc/corosync/corosync.conf. Editing the one in our cluster file system will propagate the changes to the local one, but not vice versa.
The configuration will get updated automatically as soon as the file changes. This means that changes which can be integrated into a running corosync will take effect instantly. So you should always make a copy and edit that instead, to avoid triggering unwanted changes with an intermediate save.
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
Then open the config file with your favorite editor; nano and vim.tiny are preinstalled on Proxmox VE, for example.
Note
Always increment the config_version number on configuration changes; omitting this can lead to problems.
After making the necessary changes, create another copy of the current working configuration file. This serves as a backup if the new configuration fails to apply or causes problems in other ways.
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
Then move the new configuration file over the old one:
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
You can then check whether the change was applied automatically, using the commands
systemctl status corosync
journalctl -b -u corosync
If it was not, you may have to restart the corosync service via:
systemctl restart corosync
On errors check the troubleshooting section below.
6.8.2 Troubleshooting
Issue: quorum.expected_votes must be configured
When corosync starts to fail and you get the following message in the system log:
[...]
corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason
    'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
It means that the hostname you set for corosync ringX_addr in the configuration could not be resolved.
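You can verify the resolution on each node, for example with (the hostname is a placeholder):
getent hosts NODE-HOSTNAME
If the lookup fails, add the name to /etc/hosts on all nodes or fix your DNS, and restart corosync afterwards.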
Write Configuration When Not Quorate
If you need to change /etc/pve/corosync.conf on a node with no quorum, and you know what you are doing, use:
pvecm expected 1
This sets the expected vote count to 1 and makes the cluster quorate. You can now fix your configuration, or
revert it back to the last working backup.
This is not enough if corosync cannot start anymore. In that case it is best to edit the local copy of the corosync configuration in /etc/corosync/corosync.conf so that corosync can start again. Ensure that this configuration has the same content on all nodes, to avoid split-brain situations. If you are not sure what went wrong, it is best to ask the Proxmox Community to help you.
6.8.3 Corosync Configuration Glossary
ringX_addr
This names the different ring addresses for the corosync totem rings used for the cluster communica-
tion.
bindnetaddr
Defines the interface the ring should bind to. It may be any address of the subnet configured on the interface we want to use. In general, it is recommended to just use an address that the node uses on this interface.
rrp_mode
Specifies the mode of the redundant ring protocol and may be passive, active or none. Note that the use of active is highly experimental and not officially supported. Passive is the preferred mode; it may double the cluster communication throughput and increases availability.
6.9 Cluster Cold Start
It is obvious that a cluster is not quorate when all nodes are offline. This is a common case after a power
failure.
Note
It is always a good idea to use an uninterruptible power supply (“UPS”, also called “battery backup”) to
avoid this state, especially if you want HA.
On node startup, the pve-guests service is started and waits for quorum. Once quorate, it starts all
guests which have the onboot flag set.
When you turn on nodes, or when power comes back after a power failure, it is likely that some nodes boot faster than others. Please keep in mind that guest startup is delayed until you reach quorum.
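A minimal example of setting the onboot flag on a VM and on a container (the guest IDs 100 and 101 are placeholders):
qm set 100 -onboot 1
pct set 101 -onboot 1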
6.10 Guest Migration
Migrating virtual guests to other nodes is a useful feature in a cluster. There are settings to control the
behavior of such migrations. This can be done via the configuration file datacenter.cfg or for a specific
migration via API or command line parameters.
It makes a difference if a Guest is online or offline, or if it has local resources (like a local disk).
For Details about Virtual Machine Migration see the QEMU/KVM Migration Chapter Section 10.3
For Details about Container Migration see the Container Migration Chapter Section 11.9
6.10.1 Migration Type
The migration type defines if the migration data should be sent over an encrypted (secure) channel or an
unencrypted (insecure) one. Setting the migration type to insecure means that the RAM content of a
virtual guest gets also transferred unencrypted, which can lead to information disclosure of critical data from
inside the guest (for example passwords or encryption keys).
Therefore, we strongly recommend using the secure channel if you do not have full control over the network
and can not guarantee that no one is eavesdropping to it.
Note
Storage migration does not follow this setting. Currently, it always sends the storage content over a secure
channel.
Encryption requires a lot of computing power, so this setting is often changed to insecure to achieve better performance. The impact on modern systems is lower because they implement AES encryption in hardware. The performance impact is particularly evident in fast networks, where you can transfer 10 Gbps or more.
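For example, to trade encryption for throughput cluster-wide, the migration type could be set in /etc/pve/datacenter.cfg, using the same property that appears in the migration network example below (a sketch; only do this on fully trusted networks):
migration: insecure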
6.10.2 Migration Network
By default, Proxmox VE uses the network in which cluster communication takes place to send the migration
traffic. This is not optimal because sensitive cluster traffic can be disrupted and this network may not have
the best bandwidth available on the node.
Setting the migration network parameter allows the use of a dedicated network for the entire migration traffic.
In addition to the memory, this also affects the storage traffic for offline migrations.
The migration network is set as a network in the CIDR notation. This has the advantage that you do not have
to set individual IP addresses for each node. Proxmox VE can determine the real address on the destination
node from the network specified in the CIDR form. To enable this, the network must be specified so that
each node has one, but only one IP in the respective network.
Example
We assume that we have a three-node setup with three separate networks. One for public communication
with the Internet, one for cluster communication and a very fast one, which we want to use as a dedicated
network for migration.
A network configuration for such a setup might look as follows:
iface eno1 inet manual

# public network
auto vmbr0
iface vmbr0 inet static
        address 192.X.Y.57
        netmask 255.255.255.0
        gateway 192.X.Y.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0

# cluster network
auto eno2
iface eno2 inet static
        address 10.1.1.1
        netmask 255.255.255.0

# fast network
auto eno3
iface eno3 inet static
        address 10.1.2.1
        netmask 255.255.255.0
Here, we will use the network 10.1.2.0/24 as a migration network. For a single migration, you can do this
using the migration_network parameter of the command line tool:
# qm migrate 106 tre --online --migration_network 10.1.2.0/24
To configure this as the default network for all migrations in the cluster, set the migration property of the
/etc/pve/datacenter.cfg file:
# use dedicated migration network
migration: secure,network=10.1.2.0/24
Note
The migration type must always be set when the migration network gets set in /etc/pve/datacenter.cfg.
Chapter 7
Proxmox Cluster File System (pmxcfs)
The Proxmox Cluster file system (“pmxcfs”) is a database-driven file system for storing configuration files,
replicated in real time to all cluster nodes using corosync. We use this to store all PVE related configura-
tion files.
Although the file system stores all data inside a persistent database on disk, a copy of the data resides in RAM. This imposes a restriction on the maximum size, which is currently 30MB. This is still enough to store the configuration of several thousand virtual machines.
This system provides the following advantages:
seamless replication of all configuration to all nodes in real time
provides strong consistency checks to avoid duplicate VM IDs
read-only when a node loses quorum
automatic updates of the corosync cluster configuration to all nodes
includes a distributed locking mechanism
7.1 POSIX Compatibility
The file system is based on FUSE, so the behavior is POSIX-like. But some features are simply not implemented, because we do not need them:
you can just generate normal files and directories, but no symbolic links, . . .
you can’t rename non-empty directories (because this makes it easier to guarantee that VMIDs are unique).
you can’t change file permissions (permissions are based on path)
O_EXCL creates are not atomic (like old NFS)
O_TRUNC creates are not atomic (FUSE restriction)
7.2 File Access Rights
All files and directories are owned by user root and have group www-data. Only root has write permis-
sions, but group www-data can read most files. Files below the following paths:
/etc/pve/priv/
/etc/pve/nodes/${NAME}/priv/
are only accessible by root.
7.3 Technology
We use the Corosync Cluster Engine for cluster communication, and SQLite for the database file. The file system is implemented in user space using FUSE.
7.4 File System Layout
The file system is mounted at:
/etc/pve
7.4.1 Files
corosync.conf                          Corosync cluster configuration file (previous to Proxmox VE 4.x this file was called cluster.conf)
storage.cfg                            Proxmox VE storage configuration
datacenter.cfg                         Proxmox VE datacenter wide configuration (keyboard layout, proxy, . . . )
user.cfg                               Proxmox VE access control configuration (users/groups/. . . )
domains.cfg                            Proxmox VE authentication domains
status.cfg                             Proxmox VE external metrics server configuration
authkey.pub                            Public key used by ticket system
pve-root-ca.pem                        Public certificate of cluster CA
priv/shadow.cfg                        Shadow password file
priv/authkey.key                       Private key used by ticket system
priv/pve-root-ca.key                   Private key of cluster CA
nodes/<NAME>/pve-ssl.pem               Public SSL certificate for web server (signed by cluster CA)
nodes/<NAME>/pve-ssl.key               Private SSL key for pve-ssl.pem
nodes/<NAME>/pveproxy-ssl.pem          Public SSL certificate (chain) for web server (optional override for pve-ssl.pem)
nodes/<NAME>/pveproxy-ssl.key          Private SSL key for pveproxy-ssl.pem (optional)
nodes/<NAME>/qemu-server/<VMID>.conf   VM configuration data for KVM VMs
nodes/<NAME>/lxc/<VMID>.conf           VM configuration data for LXC containers
firewall/cluster.fw                    Firewall configuration applied to all nodes
firewall/<NAME>.fw                     Firewall configuration for individual nodes
firewall/<VMID>.fw                     Firewall configuration for VMs and Containers
7.4.2 Symbolic links
local          nodes/<LOCAL_HOST_NAME>
qemu-server    nodes/<LOCAL_HOST_NAME>/qemu-server/
lxc            nodes/<LOCAL_HOST_NAME>/lxc/
7.4.3 Special status files for debugging (JSON)
.version File versions (to detect file modifications)
.members Info about cluster members
.vmlist List of all VMs
.clusterlog Cluster log (last 50 entries)
.rrd RRD data (most recent entries)
7.4.4 Enable/Disable debugging
You can enable verbose syslog messages with:
echo "1" >/etc/pve/.debug
And disable verbose syslog messages with:
echo "0" >/etc/pve/.debug
7.5 Recovery
If you have major problems with your Proxmox VE host, e.g. hardware issues, it could be helpful to just copy the pmxcfs database file /var/lib/pve-cluster/config.db and move it to a new Proxmox VE host. On the new host (with nothing running), you need to stop the pve-cluster service and replace the config.db file (required permissions 0600). Then adapt /etc/hostname and /etc/hosts to match the lost Proxmox VE host, reboot and check. (And do not forget your VM/CT data.)
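A minimal sketch of these steps on the new host, assuming the database file was copied to /root/config.db beforehand (the temporary path is a placeholder):
systemctl stop pve-cluster
cp /root/config.db /var/lib/pve-cluster/config.db
chmod 0600 /var/lib/pve-cluster/config.db
# now adapt /etc/hostname and /etc/hosts, then reboot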
7.5.1 Remove Cluster configuration
The recommended way is to reinstall the node after you removed it from your cluster. This makes sure that
all secret cluster/ssh keys and any shared configuration data is destroyed.
In some cases, you might prefer to put a node back to local mode without reinstall, which is described in
Separate A Node Without Reinstalling
7.5.2 Recovering/Moving Guests from Failed Nodes
For the guest configuration files in nodes/<NAME>/qemu-server/ (VMs) and nodes/<NAME>/
lxc/ (containers), Proxmox VE sees the containing node <NAME> as owner of the respective guest. This
concept enables the usage of local locks instead of expensive cluster-wide locks for preventing concurrent
guest configuration changes.
As a consequence, if the owning node of a guest fails (e.g., because of a power outage, fencing event, ..), a
regular migration is not possible (even if all the disks are located on shared storage) because such a local lock
on the (dead) owning node is unobtainable. This is not a problem for HA-managed guests, as Proxmox VE’s
High Availability stack includes the necessary (cluster-wide) locking and watchdog functionality to ensure
correct and automatic recovery of guests from fenced nodes.
If a non-HA-managed guest has only shared disks (and no other local resources which are only available
on the failed node are configured), a manual recovery is possible by simply moving the guest configuration
file from the failed node’s directory in /etc/pve/ to an alive node’s directory (which changes the logical
owner or location of the guest).
For example, recovering the VM with ID 100 from a dead node1 to another node node2 works with the
following command executed when logged in as root on any member node of the cluster:
mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/
Warning
Before manually recovering a guest like this, make absolutely sure that the failed source node
is really powered off/fenced. Otherwise Proxmox VE’s locking principles are violated by the mv
command, which can have unexpected consequences.
Warning
Guests with local disks (or other local resources which are only available on the dead node) are not recoverable like this. Either wait for the failed node to rejoin the cluster or restore such guests from backups.
Chapter 8
Proxmox VE Storage
The Proxmox VE storage model is very flexible. Virtual machine images can either be stored on one or
several local storages, or on shared storage like NFS or iSCSI (NAS, SAN). There are no limits, and you
may configure as many storage pools as you like. You can use all storage technologies available for Debian
Linux.
One major benefit of storing VMs on shared storage is the ability to live-migrate running machines without
any downtime, as all nodes in the cluster have direct access to VM disk images. There is no need to copy
VM image data, so live migration is very fast in that case.
The storage library (package libpve-storage-perl) uses a flexible plugin system to provide a common interface to all storage types. This can easily be adapted to include further storage types in the future.
8.1 Storage Types
There are basically two different classes of storage types:
Block level storage
Allows you to store large raw images. It is usually not possible to store other files (ISO, backups, ..) on such storage types. Most modern block level storage implementations support snapshots and clones. RADOS, Sheepdog and GlusterFS are distributed systems, replicating storage data to different nodes.
File level storage
They allow access to a full featured (POSIX) file system. They are more flexible and allow you to store any content type. ZFS is probably the most advanced system, and it has full support for snapshots and clones.
Table 8.1: Available storage types

Description      PVE type      Level   Shared   Snapshots   Stable
ZFS (local)      zfspool       file    no       yes         yes
Directory        dir           file    no       no (1)      yes
NFS              nfs           file    yes      no (1)      yes
CIFS             cifs          file    yes      no (1)      yes
GlusterFS        glusterfs     file    yes      no (1)      yes
LVM              lvm           block   no (2)   no          yes
LVM-thin         lvmthin       block   no       yes         yes
iSCSI/kernel     iscsi         block   yes      no          yes
iSCSI/libiscsi   iscsidirect   block   yes      no          yes
Ceph/RBD         rbd           block   yes      yes         yes
Sheepdog         sheepdog      block   yes      yes         beta
ZFS over iSCSI   zfs           block   yes      yes         yes

(1): On file based storages, snapshots are possible with the qcow2 format.
(2): It is possible to use LVM on top of an iSCSI storage. That way you get a shared LVM storage.
8.1.1 Thin Provisioning
A number of storages, and the Qemu image format qcow2, support thin provisioning. With thin provisioning activated, only the blocks that the guest system actually uses will be written to the storage.
Say for instance you create a VM with a 32GB hard disk, and after installing the guest system OS, the root
file system of the VM contains 3 GB of data. In that case only 3GB are written to the storage, even if the
guest VM sees a 32GB hard drive. In this way thin provisioning allows you to create disk images which are
larger than the currently available storage blocks. You can create large disk images for your VMs, and when
the need arises, add more disks to your storage without resizing the VMs’ file systems.
All storage types which have the “Snapshots” feature also support thin provisioning.
Caution
If a storage runs full, all guests using volumes on that storage receive IO errors. This can cause file system inconsistencies and may corrupt your data. So it is advisable to avoid over-provisioning of your storage resources, or to carefully observe free space to avoid such conditions, as noted below.
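One simple way to keep an eye on the available space is to check the usage of all configured storages regularly with the storage manager (the command is described in more detail in the command line interface section below):
pvesm status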
8.2 Storage Configuration
All Proxmox VE related storage configuration is stored within a single text file at /etc/pve/storage.
cfg. As this file is within /etc/pve/, it gets automatically distributed to all cluster nodes. So all nodes
share the same storage configuration.
Sharing the storage configuration makes perfect sense for shared storage, because the same “shared” storage is accessible from all nodes. But it is also useful for local storage types. In this case such local storage is available on all nodes, but it is physically different and can have totally different content.
8.2.1 Storage Pools
Each storage pool has a <type>, and is uniquely identified by its <STORAGE_ID>. A pool configuration
looks like this:
<type>: <STORAGE_ID>
<property> <value>
<property> <value>
...
The <type>: <STORAGE_ID> line starts the pool definition, which is then followed by a list of properties. Most properties have values, but some of them come with reasonable defaults. In that case you can omit the value.
To be more specific, take a look at the default storage configuration after installation. It contains one special
local storage pool named local, which refers to the directory /var/lib/vz and is always available. The
Proxmox VE installer creates additional storage entries depending on the storage type chosen at installation
time.
Default storage configuration (/etc/pve/storage.cfg)
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

# default image store on LVM based installation
lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

# default image store on ZFS based installation
zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir
8.2.2 Common Storage Properties
A few storage properties are common among different storage types.
nodes
List of cluster node names where this storage is usable/accessible. One can use this property to
restrict storage access to a limited set of nodes.
content
A storage can support several content types, for example virtual disk images, cdrom iso images, container templates or container root directories. Not all storage types support all content types. One can set this property to select what this storage is used for.
images
KVM-Qemu VM images.
rootdir
Allows storing container data.
vztmpl
Container templates.
backup
Backup files (vzdump).
iso
ISO images
shared
Mark storage as shared.
disable
You can use this flag to disable the storage completely.
maxfiles
Maximum number of backup files per VM. Use 0 for unlimited.
format
Default image format (raw|qcow2|vmdk)
Warning
It is not advisable to use the same storage pool on different Proxmox VE clusters. Some storage operations need exclusive access to the storage, so proper locking is required. While this is implemented within a cluster, it does not work between different clusters.
8.3 Volumes
We use a special notation to address storage data. When you allocate data from a storage pool, it re-
turns such a volume identifier. A volume is identified by the <STORAGE_ID>, followed by a storage type
dependent volume name, separated by colon. A valid <VOLUME_ID> looks like:
local:230/example-image.raw
local:iso/debian-501-amd64-netinst.iso
local:vztmpl/debian-5.0-joomla_1.5.9-1_i386.tar.gz
iscsi-storage:0.0.2.scsi-14f504e46494c4500494b5042546d2d646744372d31616d61
To get the file system path for a <VOLUME_ID> use:
pvesm path <VOLUME_ID>
8.3.1 Volume Ownership
There exists an ownership relation for image type volumes. Each such volume is owned by a VM or container. For example, volume local:230/example-image.raw is owned by VM 230. Most storage backends encode this ownership information into the volume name.
When you remove a VM or Container, the system also removes all associated volumes which are owned by
that VM or Container.
8.4 Using the Command Line Interface
It is recommended to familiarize yourself with the concept behind storage pools and volume identifiers, but in
real life, you are not forced to do any of those low level operations on the command line. Normally, allocation
and removal of volumes is done by the VM and Container management tools.
Nevertheless, there is a command line tool called pvesm (“Proxmox VE Storage Manager”), which is able
to perform common storage management tasks.
8.4.1 Examples
Add storage pools
pvesm add <TYPE> <STORAGE_ID> <OPTIONS>
pvesm add dir <STORAGE_ID> --path <PATH>
pvesm add nfs <STORAGE_ID> --path <PATH> --server <SERVER> --export <EXPORT>
pvesm add lvm <STORAGE_ID> --vgname <VGNAME>
pvesm add iscsi <STORAGE_ID> --portal <HOST[:PORT]> --target <TARGET>
Disable storage pools
pvesm set <STORAGE_ID> --disable 1
Enable storage pools
pvesm set <STORAGE_ID> --disable 0
Change/set storage options
pvesm set <STORAGE_ID> <OPTIONS>
pvesm set <STORAGE_ID> --shared 1
pvesm set local --format qcow2
pvesm set <STORAGE_ID> --content iso
Remove storage pools. This does not delete any data, and does not disconnect or unmount anything. It just
removes the storage configuration.
pvesm remove <STORAGE_ID>
Allocate volumes
pvesm alloc <STORAGE_ID> <VMID> <name> <size> [--format <raw|qcow2>]
Allocate a 4G volume in local storage. The name is auto-generated if you pass an empty string as <name>
pvesm alloc local <VMID> '' 4G
Free volumes
pvesm free <VOLUME_ID>
Warning
This really destroys all volume data.
List storage status
pvesm status
List storage contents
pvesm list <STORAGE_ID> [--vmid <VMID>]
List volumes allocated by VMID
pvesm list <STORAGE_ID> --vmid <VMID>
List iso images
pvesm list <STORAGE_ID> --iso
List container templates
pvesm list <STORAGE_ID> --vztmpl
Show file system path for a volume
pvesm path <VOLUME_ID>
8.5 Directory Backend
Storage pool type: dir
Proxmox VE can use local directories or locally mounted shares for storage. A directory is a file level storage,
so you can store any content type like virtual disk images, containers, templates, ISO images or backup files.
Note
You can mount additional storages via standard linux /etc/fstab, and then define a directory storage
for that mount point. This way you can use any file system supported by Linux.
This backend assumes that the underlying directory is POSIX compatible, but nothing else. This implies that
you cannot create snapshots at the storage level. But there exists a workaround for VM images using the
qcow2 file format, because that format supports snapshots internally.
Tip
Some storage types do not support O_DIRECT, so you can’t use cache mode none with such storages.
Simply use cache mode writeback instead.
We use a predefined directory layout to store different content types into different sub-directories. This layout
is used by all file level storage backends.
Table 8.2: Directory layout
Content type          Subdir
VM images             images/<VMID>/
ISO images            template/iso/
Container templates   template/cache/
Backup files          dump/
8.5.1 Configuration
This backend supports all common storage properties, and adds an additional property called path to
specify the directory. This needs to be an absolute file system path.
Configuration Example (/etc/pve/storage.cfg)
dir: backup
        path /mnt/backup
        content backup
        maxfiles 7
The above configuration defines a storage pool called backup. That pool can be used to store up to 7 backups (maxfiles 7) per VM. The real path for the backup files is /mnt/backup/dump/....
8.5.2 File naming conventions
This backend uses a well defined naming scheme for VM images:
vm-<VMID>-<NAME>.<FORMAT>
<VMID>
This specifies the owner VM.
<NAME>
This can be an arbitrary name (ascii) without white space. The backend uses disk-[N] as
default, where [N] is replaced by an integer to make the name unique.
<FORMAT>
Specifies the image format (raw|qcow2|vmdk).
When you create a VM template, all VM images are renamed to indicate that they are now read-only, and
can be used as a base image for clones:
base-<VMID>-<NAME>.<FORMAT>
Note
Such base images are used to generate cloned images. So it is important that those files are read-only,
and never get modified. The backend changes the access mode to 0444, and sets the immutable flag
(chattr +i) if the storage supports that.
8.5.3 Storage Features
As mentioned above, most file systems do not support snapshots out of the box. To work around that problem, this backend is able to use qcow2 internal snapshot capabilities.
The same applies to clones. The backend uses the qcow2 base image feature to create clones.
Table 8.3: Storage features for backend dir

Content types                      Image formats           Shared   Snapshots   Clones
images rootdir vztmpl iso backup   raw qcow2 vmdk subvol   no       qcow2       qcow2
8.5.4 Examples
Please use the following command to allocate a 4GB image on storage local:
# pvesm alloc local 100 vm-100-disk10.raw 4G
Formatting '/var/lib/vz/images/100/vm-100-disk10.raw', fmt=raw size=4294967296
successfully created 'local:100/vm-100-disk10.raw'
Note
The image name must conform to above naming conventions.
The real file system path is shown with:
# pvesm path local:100/vm-100-disk10.raw
/var/lib/vz/images/100/vm-100-disk10.raw
And you can remove the image with:
# pvesm free local:100/vm-100-disk10.raw
8.6 NFS Backend
Storage pool type: nfs
The NFS backend is based on the directory backend, so it shares most properties. The directory layout and
the file naming conventions are the same. The main advantage is that you can directly configure the NFS
server properties, so the backend can mount the share automatically. There is no need to modify /etc/
fstab. The backend can also test if the server is online, and provides a method to query the server for
exported shares.
8.6.1 Configuration
The backend supports all common storage properties, except the shared flag, which is always set. Addition-
ally, the following properties are used to configure the NFS server:
server
Server IP or DNS name. To avoid DNS lookup delays, it is usually preferable to use an IP address
instead of a DNS name - unless you have a very reliable DNS server, or list the server in the local /
etc/hosts file.
export
NFS export path (as listed by pvesm nfsscan).
You can also set NFS mount options:
path
The local mount point (defaults to /mnt/pve/<STORAGE_ID>/).
options
NFS mount options (see man nfs).
Configuration Example (/etc/pve/storage.cfg)
nfs: iso-templates
        path /mnt/pve/iso-templates
        server 10.0.0.10
        export /space/iso-templates
        options vers=3,soft
        content iso,vztmpl
Tip
After an NFS request times out, NFS requests are retried indefinitely by default. This can lead to unexpected hangs on the client side. For read-only content, it is worth considering the NFS soft option, which limits the number of retries to three.
8.6.2 Storage Features
NFS does not support snapshots, but the backend uses qcow2 features to implement snapshots and
cloning.
Table 8.4: Storage features for backend nfs

Content types                      Image formats    Shared   Snapshots   Clones
images rootdir vztmpl iso backup   raw qcow2 vmdk   yes      qcow2       qcow2
8.6.3 Examples
You can get a list of exported NFS shares with:
# pvesm nfsscan <server>
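A storage like the configuration example above could also be added on the command line. This is only a sketch reusing the values from that example; it assumes the --content and --options switches map to the corresponding configuration properties:
# pvesm add nfs iso-templates --server 10.0.0.10 --export /space/iso-templates --content iso,vztmpl --options vers=3,soft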
8.7 CIFS Backend
Storage pool type: cifs
The CIFS backend extends the directory backend, so that no manual setup of a CIFS mount is needed. Such
a storage can be added directly through the Proxmox VE API or the WebUI, with all our backend advantages,
like server heartbeat check or comfortable selection of exported shares.
8.7.1 Configuration
The backend supports all common storage properties, except the shared flag, which is always set. Addition-
ally, the following CIFS special properties are available:
server
Server IP or DNS name. Required.
Tip
To avoid DNS lookup delays, it is usually preferable to use an IP address instead of a DNS name - unless
you have a very reliable DNS server, or list the server in the local /etc/hosts file.
share
CIFS share to use (get available ones with pvesm cifsscan or the WebUI). Required.
username
The username for the CIFS storage. Optional, defaults to ‘guest’.
password
The user password. Optional. It will be saved in a file only readable by root (/etc/pve/priv/
<STORAGE_ID>.cred).
domain
Sets the user domain (workgroup) for this storage. Optional.
smbversion
SMB protocol Version. Optional, default is 3. SMB1 is not supported due to security issues.
path
The local mount point. Optional, defaults to /mnt/pve/<STORAGE_ID>/.
Configuration Example (/etc/pve/storage.cfg)
cifs: backup
        path /mnt/pve/backup
        server 10.0.0.11
        share VMData
        content backup
        username anna
        smbversion 3
8.7.2 Storage Features
CIFS does not support snapshots on a storage level. But you may use qcow2 backing files if you still want
to have snapshots and cloning features available.
Table 8.5: Storage features for backend cifs

Content types                      Image formats    Shared   Snapshots   Clones
images rootdir vztmpl iso backup   raw qcow2 vmdk   yes      qcow2       qcow2
8.7.3 Examples
You can get a list of exported CIFS shares with:
# pvesm cifsscan <server> [--username <username>] [--password]
Then you could add this share as a storage to the whole Proxmox VE cluster with:
# pvesm add cifs <storagename> --server <server> --share <share> [--username <username>] [--password]
8.8 GlusterFS Backend
Storage pool type: glusterfs
GlusterFS is a scalable network file system. The system uses a modular design, runs on commodity hardware, and can provide a highly available enterprise storage at low costs. Such a system is capable of scaling to several petabytes, and can handle thousands of clients.
Note
After a node/brick crash, GlusterFS does a full rsync to make sure data is consistent. This can take a
very long time with large files, so this backend is not suitable to store large VM images.
8.8.1 Configuration
The backend supports all common storage properties, and adds the following GlusterFS specific options:
server
GlusterFS volfile server IP or DNS name.
server2
Backup volfile server IP or DNS name.
volume
GlusterFS Volume.
transport
GlusterFS transport: tcp,unix or rdma
Configuration Example (/etc/pve/storage.cfg)
glusterfs: Gluster
        server 10.2.3.4
        server2 10.2.3.5
        volume glustervol
        content images,iso
8.8.2 File naming conventions
The directory layout and the file naming conventions are inherited from the dir backend.
8.8.3 Storage Features
The storage provides a file level interface, but no native snapshot/clone implementation.
Table 8.6: Storage features for backend glusterfs

Content types              Image formats    Shared   Snapshots   Clones
images vztmpl iso backup   raw qcow2 vmdk   yes      qcow2       qcow2
8.9 Local ZFS Pool Backend
Storage pool type: zfspool
This backend allows you to access local ZFS pools (or ZFS file systems inside such pools).
8.9.1 Configuration
The backend supports the common storage properties content, nodes, disable, and the following ZFS specific properties:
pool
Select the ZFS pool/filesystem. All allocations are done within that pool.
blocksize
Set ZFS blocksize parameter.
sparse
Use ZFS thin-provisioning. A sparse volume is a volume whose reservation is not equal to the volume
size.
Configuration Example (/etc/pve/storage.cfg)
zfspool: vmdata
        pool tank/vmdata
        content rootdir,images
        sparse
8.9.2 File naming conventions
The backend uses the following naming scheme for VM images:
vm-<VMID>-<NAME> // normal VM images
base-<VMID>-<NAME> // template VM image (read-only)
subvol-<VMID>-<NAME> // subvolumes (ZFS filesystem for containers)
<VMID>
This specifies the owner VM.
<NAME>
This can be an arbitrary name (ascii) without white space. The backend uses disk[N] as default,
where [N] is replaced by an integer to make the name unique.
8.9.3 Storage Features
ZFS is probably the most advanced storage type regarding snapshot and cloning. The backend uses ZFS
datasets for both VM images (format raw) and container data (format subvol). ZFS properties are inher-
ited from the parent dataset, so you can simply set defaults on the parent dataset.
Table 8.7: Storage features for backend zfs

Content types    Image formats   Shared   Snapshots   Clones
images rootdir   raw subvol      no       yes         yes
8.9.4 Examples
It is recommended to create an extra ZFS file system to store your VM images:
# zfs create tank/vmdata
To enable compression on that newly allocated file system:
# zfs set compression=on tank/vmdata
You can get a list of available ZFS filesystems with:
# pvesm zfsscan
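The new file system could then be added as a storage pool, for example (a sketch; it assumes pvesm accepts the same property names as storage.cfg, and the storage name vmdata is a placeholder):
# pvesm add zfspool vmdata --pool tank/vmdata --content images,rootdir --sparse 1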
8.10 LVM Backend
Storage pool type: lvm
LVM is a light software layer on top of hard disks and partitions. It can be used to split available disk space
into smaller logical volumes. LVM is widely used on Linux and makes managing hard drives easier.
Another use case is to put LVM on top of a big iSCSI LUN. That way you can easily manage space on
that iSCSI LUN, which would not be possible otherwise, because the iSCSI specification does not define a
management interface for space allocation.
8.10.1 Configuration
The LVM backend supports the common storage properties content, nodes, disable, and the following LVM specific properties:
vgname
LVM volume group name. This must point to an existing volume group.
base
Base volume. This volume is automatically activated before accessing the storage. This is mostly
useful when the LVM volume group resides on a remote iSCSI server.
saferemove
Zero-out data when removing LVs. When removing a volume, this makes sure that all data gets erased.
saferemove_throughput
Wipe throughput (cstream -t parameter value).
Configuration Example (/etc/pve/storage.cfg)
lvm: myspace
        vgname myspace
        content rootdir,images
8.10.2 File naming conventions
The backend basically uses the same naming conventions as the ZFS pool backend.
vm-<VMID>-<NAME> // normal VM images
8.10.3 Storage Features
LVM is a typical block storage, but this backend does not support snapshots and clones. Unfortunately, normal LVM snapshots are quite inefficient, because they interfere with all writes on the whole volume group during snapshot time.
One big advantage is that you can use it on top of a shared storage, for example an iSCSI LUN. The backend itself implements proper cluster-wide locking.
Tip
The newer LVM-thin backend allows snapshot and clones, but does not support shared storage.
Table 8.8: Storage features for backend lvm

Content types    Image formats   Shared     Snapshots   Clones
images rootdir   raw             possible   no          no
8.10.4 Examples
List available volume groups:
# pvesm lvmscan
8.11 LVM thin Backend
Storage pool type: lvmthin
LVM normally allocates blocks when you create a volume. LVM thin pools instead allocate blocks when they are written. This behaviour is called thin-provisioning, because volumes can be much larger than the physically available space.
You can use the normal LVM command line tools to manage and create LVM thin pools (see man lvmthin for details). Assuming you already have an LVM volume group called pve, the following commands create a new LVM thin pool (size 100G) called data:
lvcreate -L 100G -n data pve
lvconvert --type thin-pool pve/data
8.11.1 Configuration
The LVM thin backend supports the common storage properties content, nodes, disable, and the following LVM specific properties:
vgname
LVM volume group name. This must point to an existing volume group.
thinpool
The name of the LVM thin pool.
Configuration Example (/etc/pve/storage.cfg)
lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images
8.11.2 File naming conventions
The backend basically uses the same naming conventions as the ZFS pool backend.
vm-<VMID>-<NAME> // normal VM images
8.11.3 Storage Features
LVM thin is a block storage, but fully supports snapshots and clones efficiently. New volumes are automati-
cally initialized with zero.
It must be mentioned that LVM thin pools cannot be shared across multiple nodes, so you can only use them
as local storage.
Table 8.9: Storage features for backend lvmthin

Content types    Image formats   Shared   Snapshots   Clones
images rootdir   raw             no       yes         yes
8.11.4 Examples
List available LVM thin pools on volume group pve:
# pvesm lvmthinscan pve
8.12 Open-iSCSI initiator
Storage pool type: iscsi
iSCSI is a widely employed technology used to connect to storage servers. Almost all storage vendors
support iSCSI. There are also open source iSCSI target solutions available, e.g. OpenMediaVault, which is
based on Debian.
To use this backend, you need to install the open-iscsi package. This is a standard Debian package,
but it is not installed by default to save resources.
# apt-get install open-iscsi
Low-level iSCSI management tasks can be done using the iscsiadm tool.
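For example, the targets offered by a portal can be discovered with (the portal address is a placeholder; iscsiadm comes with the open-iscsi package):
# iscsiadm -m discovery -t sendtargets -p 10.10.10.1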
8.12.1 Configuration
The backend supports the common storage properties content, nodes, disable, and the following iSCSI specific properties:
portal
iSCSI portal (IP or DNS name with optional port).
target
iSCSI target.
Configuration Example (/etc/pve/storage.cfg)
iscsi: mynas
        portal 10.10.10.1
        target iqn.2006-01.openfiler.com:tsn.dcb5aaaddd
        content none
Tip
If you want to use LVM on top of iSCSI, it makes sense to set content none. That way it is not possible to create VMs using iSCSI LUNs directly.
8.12.2 File naming conventions
The iSCSI protocol does not define an interface to allocate or delete data. Instead, that needs to be done on the target side and is vendor specific. The target simply exports them as numbered LUNs. So Proxmox VE iSCSI volume names just encode some information about the LUN as seen by the Linux kernel.
8.12.3 Storage Features
iSCSI is a block-level storage type, and provides no management interface. So it is usually best to export one big LUN, and set up LVM on top of that LUN. You can then use the LVM plugin to manage the storage on that iSCSI LUN.
Table 8.10: Storage features for backend iscsi

Content types   Image formats   Shared   Snapshots   Clones
images, none    raw             yes      no          no
8.12.4 Examples
Scan a remote iSCSI portal, and return a list of possible targets:
pvesm iscsiscan -portal <HOST[:PORT]>
8.13 User Mode iSCSI Backend
Storage pool type: iscsidirect
This backend provides basically the same functionality as the Open-iSCSI backend, but uses a user-level library (package libiscsi2) to implement it.
It should be noted that there are no kernel drivers involved, so this can be viewed as a performance optimization. But this comes with the drawback that you cannot use LVM on top of such an iSCSI LUN. So you need to manage all space allocations on the storage server side.
8.13.1 Configuration
The user mode iSCSI backend uses the same configuration options as the Open-iSCSI backend.
Configuration Example (/etc/pve/storage.cfg)
iscsidirect: faststore
portal 10.10.10.1
target iqn.2006-01.openfiler.com:tsn.dcb5aaaddd
8.13.2 Storage Features
Note
This backend works with VMs only. Containers cannot use this driver.
Table 8.11: Storage features for backend iscsidirect

Content types   Image formats   Shared   Snapshots   Clones
images          raw             yes      no          no
8.14 Ceph RADOS Block Devices (RBD)
Storage pool type: rbd
Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability. RADOS block devices implement a feature-rich block-level storage, and you get the following advantages:
- thin provisioning
- resizable volumes
- distributed and redundant (striped over multiple OSDs)
- full snapshot and clone capabilities
- self healing
- no single point of failure
- scalable to the exabyte level
- kernel and user space implementation available
Note
For smaller deployments, it is also possible to run Ceph services directly on your Proxmox VE nodes. Recent hardware has plenty of CPU power and RAM, so running storage services and VMs on the same node is possible.
8.14.1 Configuration
This backend supports the common storage properties nodes, disable, content, and the following rbd specific properties:
monhost
List of monitor daemon IPs. Optional, only needed if Ceph is not running on the PVE cluster.
pool
Ceph pool name.
username
RBD user ID. Optional, only needed if Ceph is not running on the PVE cluster.
krbd
Access rbd through krbd kernel module. This is required if you want to use the storage for containers.
Configuration Example for an external Ceph cluster (/etc/pve/storage.cfg)
rbd: ceph-external
monhost 10.1.1.20 10.1.1.21 10.1.1.22
pool ceph-external
content images
username admin
Tip
You can use the rbd utility to do low-level management tasks.
8.14.2 Authentication
If you use cephx authentication, you need to copy the keyfile from your external Ceph cluster to a Proxmox
VE host.
Create the directory /etc/pve/priv/ceph with
mkdir /etc/pve/priv/ceph
Then copy the keyring
scp <cephserver>:/etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/<STORAGE_ID>.keyring
The keyring must be named to match your <STORAGE_ID>. Copying the keyring generally requires root
privileges.
If Ceph is installed locally on the PVE cluster, this is done automatically by pveceph or in the GUI.
8.14.3 Storage Features
The rbd backend is a block level storage, and implements full snapshot and clone functionality.
Table 8.12: Storage features for backend rbd

Content types     Image formats   Shared   Snapshots   Clones
images, rootdir   raw             yes      yes         yes
Chapter 9
Storage Replication
The pvesr command line tool manages the Proxmox VE storage replication framework. Storage replication
brings redundancy for guests using local storage and reduces migration time.
It replicates guest volumes to another node so that all data is available without using shared storage. Replication uses snapshots to minimize traffic sent over the network. Therefore, new data is sent only incrementally after an initial full sync. In the case of a node failure, your guest data is still available on the replicated node.
The replication is done automatically at configurable intervals. The minimum replication interval is one minute and the maximal interval is once a week. The format used to specify those intervals is a subset of systemd calendar events, see the Schedule Format section (Section 9.2).
Every guest can be replicated to multiple target nodes, but a guest cannot get replicated twice to the same
target node.
Each replication's bandwidth can be limited, to avoid overloading a storage or server.
Virtual guests with active replication cannot currently use online migration. Offline migration is supported in general. If you migrate to a node where the guest's data is already replicated, only the changes since the last synchronisation (the so-called delta) must be sent, which reduces the required time significantly. In this case the replication direction will also switch nodes automatically after the migration has finished.
For example: VM100 is currently on nodeA and gets replicated to nodeB. You migrate it to nodeB, so now
it gets automatically replicated back from nodeB to nodeA.
If you migrate to a node where the guest is not replicated, the whole disk data must be sent over. After the migration the replication job continues to replicate this guest to the configured nodes.
Important
High-Availability is allowed in combination with storage replication, but it has the following implica-
tions:
redistributing services after a more preferred node comes online will lead to errors.
recovery works, but there may be some data loss between the last synced time and the time a
node failed.
9.1 Supported Storage Types
Table 9.1: Storage Types

Description   PVE type   Snapshots   Stable
ZFS (local)   zfspool    yes         yes
9.2 Schedule Format
Proxmox VE has a very flexible replication scheduler. It is based on the systemd time calendar event format.1
Calendar events may be used to refer to one or more points in time in a single expression.
Such a calendar event uses the following format:
[day(s)] [[start-time(s)][/repetition-time(s)]]
This allows you to configure a set of days on which the job should run. You can also set one or more start times, which tell the replication scheduler the moments in time when a job should start. With this information we could create a job which runs every workday at 10 PM: 'mon,tue,wed,thu,fri 22', which could be abbreviated to 'mon..fri 22'; most reasonable schedules can be written quite intuitively this way.
Note
Hours are set in 24h format.
To allow easier and shorter configuration, one or more repetition times can be set. They indicate that replications are done at the start-time(s) themselves and at the start-time(s) plus all multiples of the repetition value. If you want to start replication at 8 AM and repeat it every 15 minutes until 9 AM you would use: '8:00/15'.
Note that if no hour separator (:) is used, the value gets interpreted as minutes. If such a separator is used, the value on the left denotes the hour(s) and the value on the right denotes the minute(s). Further, you can use * to match all possible values.
To get additional ideas, look at the Examples below in Section 9.2.2.
9.2.1 Detailed Specification
days
Days are specified with an abbreviated English version: sun, mon, tue, wed, thu, fri and sat. You may use multiple days as a comma-separated list. A range of days can also be set by specifying the start and end day separated by "..", for example mon..fri. Those formats can also be mixed. If omitted, * is assumed.
time-format
A time format consists of hours and minutes interval lists. Hours and minutes are separated by ':'. Both hours and minutes can be lists and ranges of values, using the same format as days. First come hours, then minutes; hours can be omitted if not needed, in which case * is assumed for the value of hours. The valid range for values is 0-23 for hours and 0-59 for minutes.
1see man 7 systemd.time for more information
9.2.2 Examples:
Table 9.2: Schedule Examples

Schedule String          Alternative          Meaning
mon,tue,wed,thu,fri      mon..fri             Every working day at 0:00
sat,sun                  sat..sun             Only on weekends at 0:00
mon,wed,fri                                   Only on Monday, Wednesday and Friday at 0:00
12:05                    12:05                Every day at 12:05 PM
*/5                      0/5                  Every five minutes
mon..wed 30/10           mon,tue,wed 30/10    Monday, Tuesday, Wednesday 30, 40 and 50 minutes after every full hour
mon..fri 8..17,22:0/15                        Every working day every 15 minutes between 8 AM and 6 PM and between 10 PM and 11 PM
fri 12..13:5/20          fri 12,13:5/20       Friday at 12:05, 12:25, 12:45, 13:05, 13:25 and 13:45
12,14,16,18,20,22:5      12/2:5               Every day starting at 12:05 until 22:05, every 2 hours
*                        */1                  Every minute (minimum interval)
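For instance, to apply the first schedule from the table to an existing replication job (100-0 is the example job ID used in Section 9.5):
# pvesr update 100-0 --schedule 'mon..fri 22'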
9.3 Error Handling
If a replication job encounters problems, it will be placed in an error state. In this state the configured replication intervals get suspended temporarily. The failed replication is then retried at 30-minute intervals; once this succeeds, the original schedule gets activated again.
9.3.1 Possible issues
This list covers only the most common issues; depending on your setup, there may be other causes.
- The network is not working.
- No free space is left on the replication target storage.
- No storage with the same storage ID is available on the target node.
Note
You can always use the replication log to get hints about a problem's cause.
9.3.2 Migrating a guest in case of Error
In the case of a grave error a virtual guest may get stuck on a failed node. You then need to move it manually
to a working node again.
9.3.3 Example
Let's assume that you have two guests (VM 100 and CT 200) running on node A and replicated to node B. Node A failed and cannot come back online. Now you have to migrate the guests to node B manually.
connect to node B over ssh or open its shell via the WebUI
check that the cluster is quorate
# pvecm status
If you have no quorum, we strongly advise fixing this first and making the node operable again. Only if this is not possible at the moment should you use the following command to enforce quorum on the current node:
# pvecm expected 1
Warning
If expected votes are set, avoid changes which affect the cluster (for example adding/removing nodes, storages, virtual guests) at all costs. Only use it to get vital guests up and running again or to resolve the quorum issue itself.
move both guest configuration files from the origin node A to node B:
# mv /etc/pve/nodes/A/qemu-server/100.conf /etc/pve/nodes/B/qemu-server/100.conf
# mv /etc/pve/nodes/A/lxc/200.conf /etc/pve/nodes/B/lxc/200.conf
Now you can start the guests again:
# qm start 100
# pct start 200
Remember to replace the VMIDs and node names with your respective values.
9.4 Managing Jobs
You can use the web GUI to create, modify and remove replication jobs easily. Additionally the command
line interface (CLI) tool pvesr can be used to do this.
You can find the replication panel on all levels (datacenter, node, virtual guest) in the web GUI. They differ in which jobs are shown: all, only node-specific, or only guest-specific jobs.
When adding a new job, you need to specify the virtual guest (if not already selected) and the target node. The replication schedule (Section 9.2) can be set if the default of every 15 minutes is not desired. You may also impose rate limiting on a replication job; this can help to keep the storage load acceptable.
A replication job is identified by a cluster-wide unique ID. This ID is composed of the VMID together with a job number. This ID must only be specified manually if the CLI tool is used.
9.5 Command Line Interface Examples
Create a replication job which will run every 5 minutes, with a bandwidth limit of 10 MB/s (megabytes per second), for the guest with guest ID 100.
# pvesr create-local-job 100-0 pve1 --schedule "*/5" --rate 10
Disable an active job with ID 100-0
# pvesr disable 100-0
Enable a deactivated job with ID 100-0
# pvesr enable 100-0
Change the schedule interval of the job with ID 100-0 to once an hour
# pvesr update 100-0 --schedule '*/00'
Chapter 10
Qemu/KVM Virtual Machines
Qemu (short form for Quick Emulator) is an open source hypervisor that emulates a physical computer.
From the perspective of the host system where Qemu is running, Qemu is a user program which has access
to a number of local resources like partitions, files, network cards which are then passed to an emulated
computer which sees them as if they were real devices.
A guest operating system running in the emulated computer accesses these devices, and runs as if it were running on real hardware. For instance, you can pass an ISO image as a parameter to Qemu, and the OS running in the emulated computer will see a real CD-ROM inserted into a CD drive.
Qemu can emulate a great variety of hardware from ARM to SPARC, but Proxmox VE is only concerned with 32- and 64-bit PC clone emulation, since it represents the overwhelming majority of server hardware. The emulation of PC clones is also one of the fastest due to the availability of processor extensions which greatly speed up Qemu when the emulated architecture is the same as the host architecture.
Note
You may sometimes encounter the term KVM (Kernel-based Virtual Machine). It means that Qemu is
running with the support of the virtualization processor extensions, via the Linux kvm module. In the
context of Proxmox VE Qemu and KVM can be used interchangeably as Qemu in Proxmox VE will always
try to load the kvm module.
Qemu inside Proxmox VE runs as a root process, since this is required to access block and PCI devices.
10.1 Emulated devices and paravirtualized devices
The PC hardware emulated by Qemu includes a mainboard, network controllers, SCSI, IDE and SATA controllers, and serial ports (the complete list can be seen in the kvm(1) man page), all of them emulated in software. All these devices are the exact software equivalent of existing hardware devices, and if the OS running in the guest has the proper drivers it will use the devices as if it were running on real hardware. This allows Qemu to run unmodified operating systems.
This however has a performance cost, as running in software what was meant to run in hardware involves
a lot of extra work for the host CPU. To mitigate this, Qemu can present to the guest operating system
paravirtualized devices, where the guest OS recognizes it is running inside Qemu and cooperates with the
hypervisor.
Qemu relies on the virtio virtualization standard, and is thus able to present paravirtualized virtio devices, which include a paravirtualized generic disk controller, a paravirtualized network card, a paravirtualized serial port, a paravirtualized SCSI controller, etc.
It is highly recommended to use the virtio devices whenever you can, as they provide a big performance
improvement. Using the virtio generic disk controller versus an emulated IDE controller will double the
sequential write throughput, as measured with bonnie++(8). Using the virtio network interface can deliver up to three times the throughput of an emulated Intel E1000 network card, as measured with iperf(1). 1
10.2 Virtual Machines Settings
Generally speaking, Proxmox VE tries to choose sane defaults for virtual machines (VM). Make sure you understand the meaning of the settings you change, as doing so could incur a performance slowdown or put your data at risk.
10.2.1 General Settings
General settings of a VM include
- the Node: the physical server on which the VM will run
- the VM ID: a unique number in this Proxmox VE installation used to identify your VM
- Name: a free form text string you can use to describe the VM
- Resource Pool: a logical group of VMs
1See this benchmark on the KVM wiki http://www.linux-kvm.org/page/Using_VirtIO_NIC
10.2.2 OS Settings
When creating a VM, setting the proper Operating System (OS) allows Proxmox VE to optimize some low level parameters. For instance, a Windows OS expects the BIOS clock to use the local time, while a Unix-based OS expects the BIOS clock to have the UTC time.
10.2.3 Hard Disk
Qemu can emulate a number of storage controllers:
the IDE controller has a design which goes back to the 1984 PC/AT disk controller. Even if this controller has been superseded by recent designs, each and every OS you can think of has support for it, making it a great choice if you want to run an OS released before 2003. You can connect up to 4 devices on this controller.
the SATA (Serial ATA) controller, dating from 2003, has a more modern design, allowing higher throughput
and a greater number of devices to be connected. You can connect up to 6 devices on this controller.
the SCSI controller, designed in 1985, is commonly found on server grade hardware, and can connect up to 14 storage devices. Proxmox VE emulates an LSI 53C895A controller by default.
A SCSI controller of type VirtIO SCSI is the recommended setting if you aim for performance and is
automatically selected for newly created Linux VMs since Proxmox VE 4.3. Linux distributions have support
for this controller since 2012, and FreeBSD since 2014. For Windows OSes, you need to provide an extra
iso containing the drivers during the installation. If you aim at maximum performance, you can select a
SCSI controller of type VirtIO SCSI single which will allow you to select the IO Thread option. When
selecting VirtIO SCSI single Qemu will create a new controller for each disk, instead of adding all disks to
the same controller.
The VirtIO Block controller, often just called VirtIO or virtio-blk, is an older type of paravirtualized controller. It has been superseded, in terms of features, by the VirtIO SCSI controller.
On each controller you attach a number of emulated hard disks, which are backed by a file or a block device residing in the configured storage. The choice of a storage type will determine the format of the hard disk image. Storages which present block devices (LVM, ZFS, Ceph) will require the raw disk image format, whereas file-based storages (Ext4, NFS, CIFS, GlusterFS) will let you choose either the raw disk image format or the QEMU image format.
the QEMU image format is a copy-on-write format which allows snapshots and thin provisioning of the disk image.
the raw disk image is a bit-to-bit image of a hard disk, similar to what you would get when executing the
dd command on a block device in Linux. This format does not support thin provisioning or snapshots by
itself, requiring cooperation from the storage layer for these tasks. It may, however, be up to 10% faster
than the QEMU image format.2
the VMware image format only makes sense if you intend to import/export the disk image to other hyper-
visors.
Setting the Cache mode of the hard drive will impact how the host system will notify the guest systems of
block write completions. The No cache default means that the guest system will be notified that a write is
complete when each block reaches the physical storage write queue, ignoring the host page cache. This
provides a good balance between safety and speed.
If you want the Proxmox VE backup manager to skip a disk when doing a backup of a VM, you can set the
No backup option on that disk.
If you want the Proxmox VE storage replication mechanism to skip a disk when starting a replication job, you can set the Skip replication option on that disk. As of Proxmox VE 5.0, replication requires the disk images to be on a storage of type zfspool, so adding a disk image to other storages when the VM has replication configured requires skipping replication for this disk image.
If your storage supports thin provisioning (see the storage chapter in the Proxmox VE guide), and your VM
has a SCSI controller you can activate the Discard option on the hard disks connected to that controller. With
2See this benchmark for details http://events.linuxfoundation.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf
Discard enabled, when the filesystem of a VM marks blocks as unused after removing files, the emulated
SCSI controller will relay this information to the storage, which will then shrink the disk image accordingly.
IO Thread
The option IO Thread can only be used when using a disk with the VirtIO controller, or with the SCSI
controller, when the emulated controller type is VirtIO SCSI single. With this enabled, Qemu creates one
I/O thread per storage controller, instead of a single thread for all I/O, so it increases performance when
multiple disks are used and each disk has its own storage controller. Note that backups do not currently
work with IO Thread enabled.
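For illustration, a newly allocated disk using VirtIO SCSI single with IO Thread and Discard enabled could be configured like this. This is only a sketch: <vmid> is the VM ID, <storage> an existing storage ID, and 32 the disk size in GB.
qm set <vmid> --scsihw virtio-scsi-single --scsi0 <storage>:32,iothread=1,discard=on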
10.2.4 CPU
A CPU socket is a physical slot on a PC motherboard where you can plug a CPU. This CPU can then contain one or many cores, which are independent processing units. Whether you have a single CPU socket with 4 cores, or two CPU sockets with two cores each, is mostly irrelevant from a performance point of view. However, some software licenses depend on the number of sockets a machine has; in that case it makes sense to set the number of sockets to what the license allows you.
Increasing the number of virtual cpus (cores and sockets) will usually provide a performance improvement
though that is heavily dependent on the use of the VM. Multithreaded applications will of course benefit from
a large number of virtual cpus, as for each virtual cpu you add, Qemu will create a new thread of execution
on the host system. If you’re not sure about the workload of your VM, it is usually a safe bet to set the number
of Total cores to 2.
Note
It is perfectly safe if the overall number of cores of all your VMs is greater than the number of cores on the
server (e.g., 4 VMs with each 4 cores on a machine with only 8 cores). In that case the host system will
balance the Qemu execution threads between your server cores, just like if you were running a standard
multithreaded application. However, Proxmox VE will prevent you from assigning more virtual CPU cores
than physically available, as this will only bring the performance down due to the cost of context switches.
Resource Limits
In addition to the number of virtual cores, you can configure how many resources a VM can get in relation to the host CPU time and also in relation to other VMs. With the cpulimit ("Host CPU Time") option you can limit how much CPU time the whole VM can use on the host. It is a floating point value representing CPU time in percent, so 1.0 is equal to 100%, 2.5 to 250% and so on. If a single process would fully use one single core it would have 100% CPU Time usage. If a VM with four cores utilizes all its cores fully it would theoretically use 400%. In reality the usage may be even a bit higher as Qemu can have additional threads for VM peripherals besides the vCPU core ones. This setting can be useful if a VM should have multiple vCPUs, as it runs a few processes in parallel, but the VM as a whole should not be able to run all vCPUs at 100% at the same time. Using a specific example: let's say we have a VM which would profit from having 8 vCPUs, but at no time should all of those 8 cores run at full load, as this would make the server so overloaded that other VMs and CTs would get too little CPU. So, we set the cpulimit to 4.0 (=400%). If all cores do the same heavy work they would all get 50% of a real host core's CPU time. But, if only 4 would do work they could still get almost 100% of a real core each.
Note
VMs can, depending on their configuration, use additional threads, e.g. for networking or IO operations but also live migration. Thus a VM can appear to use more CPU time than just its virtual CPUs could use. To ensure that a VM never uses more CPU time than its assigned virtual CPUs, set the cpulimit setting to the same value as the total core count.
The second CPU resource limiting setting, cpuunits (nowadays often called CPU shares or CPU weight), controls how much CPU time a VM gets relative to other running VMs. It is a relative weight which defaults to 1024; if you increase this for a VM, it will be prioritized by the scheduler in comparison to other VMs with lower weight. E.g., if VM 100 has the default 1024 and VM 200 was changed to 2048, the latter VM 200 would receive twice the CPU bandwidth of the first VM 100.
For more information see man systemd.resource-control; here CPUQuota corresponds to cpulimit and CPUShares corresponds to our cpuunits setting. Visit its Notes section for references and implementation details.
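As a sketch, both limits discussed above could be applied to a VM with qm (the values are only examples):
qm set <vmid> --cpulimit 4 --cpuunits 2048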
CPU Type
Qemu can emulate a number of different CPU types from 486 to the latest Xeon processors. Each new processor generation adds new features, like hardware assisted 3d rendering, random number generation, memory protection, etc. Usually you should select for your VM a processor type which closely matches the CPU of the host system, as it means that the host CPU features (also called CPU flags) will be available in your VMs. If you want an exact match, you can set the CPU type to host in which case the VM will have exactly the same CPU flags as your host system.
This has a downside though. If you want to do a live migration of VMs between different hosts, your VM might end up on a new system with a different CPU type. If the CPU flags passed to the guest are missing, the qemu process will stop. To remedy this Qemu has also its own CPU type kvm64, which Proxmox VE uses by default. kvm64 is a Pentium 4 look-alike CPU type, which has a reduced CPU flag set, but is guaranteed to work everywhere.
In short, if you care about live migration and moving VMs between nodes, leave the kvm64 default. If you
don’t care about live migration or have a homogeneous cluster where all nodes have the same CPU, set the
CPU type to host, as in theory this will give your guests maximum performance.
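For instance, to give a VM the full host feature set at the cost of live migration flexibility (sketch only):
qm set <vmid> --cpu host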
Meltdown / Spectre related CPU flags
There are two CPU flags related to the Meltdown and Spectre vulnerabilities 3 which need to be set manually unless the selected CPU type of your VM already enables them by default.
The first, called pcid, helps to reduce the performance impact of the Meltdown mitigation called Kernel Page-
Table Isolation (KPTI), which effectively hides the Kernel memory from the user space. Without PCID, KPTI
is quite an expensive mechanism 4.
The second CPU flag is called spec-ctrl, which allows an operating system to selectively disable or restrict
speculative execution in order to limit the ability of attackers to exploit the Spectre vulnerability.
There are two requirements that need to be fulfilled in order to use these two CPU flags:
The host CPU(s) must support the feature and propagate it to the guest’s virtual CPU(s)
The guest operating system must be updated to a version which mitigates the attacks and is able to utilize
the CPU feature
In order to use spec-ctrl, your CPU or system vendor also needs to provide a so-called "microcode update" 5 for your CPU.
To check if the Proxmox VE host supports PCID, execute the following command as root:
# grep ' pcid ' /proc/cpuinfo
If this does not return empty, your host's CPU has support for pcid.
To check if the Proxmox VE host supports spec-ctrl, execute the following command as root:
# grep ' spec_ctrl ' /proc/cpuinfo
If this does not return empty, your host's CPU has support for spec-ctrl.
If you use 'host' or another CPU type which enables the desired flags by default, and you updated your guest OS to make use of the associated CPU features, you're already set.
Otherwise you need to set the desired CPU flag of the virtual CPU, either by editing the CPU options in the
WebUI, or by setting the flags property of the cpu option in the VM configuration file.
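As a sketch, the flags could be added to the default CPU type with qm; the quoting is needed because of the semicolon, and you should only set flags your host CPU and guest OS actually support:
qm set <vmid> --cpu 'kvm64,flags=+pcid;+spec-ctrl'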
NUMA
You can also optionally emulate a NUMA 6 architecture in your VMs. The basics of the NUMA architecture mean that instead of having a global memory pool available to all your cores, the memory is spread into local banks close to each socket. This can bring speed improvements as the memory bus is not a bottleneck anymore. If your system has a NUMA architecture 7 we recommend activating the option, as this will allow proper distribution of the VM resources on the host system. This option is also required to hot-plug cores or RAM in a VM.
If the NUMA option is used, it is recommended to set the number of sockets to the number of sockets of the
host system.
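A minimal command line sketch of a NUMA-enabled VM (the socket and core counts are examples only):
qm set <vmid> --numa 1 --sockets 2 --cores 4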
3Meltdown Attack https://meltdownattack.com/
4PCID is now a critical performance/security feature on x86 https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU
5You can use ‘intel-microcode’ / ‘amd-microcode’ from Debian non-free if your vendor does not provide such an update.
Note that not all affected CPUs can be updated to support spec-ctrl.
6https://en.wikipedia.org/wiki/Non-uniform_memory_access
7if the command numactl --hardware | grep available returns more than one node, then your host system
has a NUMA architecture
vCPU hot-plug
Modern operating systems introduced the capability to hot-plug and, to a certain extent, hot-unplug CPUs in a running system. Virtualisation allows us to avoid a lot of the (physical) problems real hardware can cause in such scenarios. Still, this is a rather new and complicated feature, so its use should be restricted to cases where it is absolutely needed. Most of the functionality can be replicated with other, well tested and less complicated features, see Resource Limits Section 10.2.4.
In Proxmox VE the maximal number of plugged CPUs is always cores * sockets. To start a VM with less than this total core count of CPUs you may use the vcpus setting; it denotes how many vCPUs should be plugged in at VM start.
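As an illustration (all values are placeholders), a VM could be defined with four potential cores but only two plugged in at start:
qm set <vmid> --sockets 2 --cores 2 --vcpus 2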
Currently this feature is only supported on Linux; a kernel newer than 3.10 is needed, and a kernel newer than 4.7 is recommended.
You can use a udev rule as follows to automatically set new CPUs as online in the guest:
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
Save this under /etc/udev/rules.d/ as a file ending in .rules.
Note: CPU hot-remove is machine dependent and requires guest cooperation. The deletion command does not guarantee CPU removal to actually happen; typically it is a request forwarded to the guest using a target-dependent mechanism, e.g. ACPI on x86/amd64.
10.2.5 Memory
For each VM you have the option to set a fixed amount of memory or to ask Proxmox VE to dynamically allocate memory based on the current RAM usage of the host.
Fixed Memory Allocation
When setting memory and minimum memory to the same amount Proxmox VE will simply allocate what you
specify to your VM.
Even when using a fixed memory size, the ballooning device gets added to the VM, because it delivers useful
information such as how much memory the guest really uses. In general, you should leave ballooning
enabled, but if you want to disable it (e.g. for debugging purposes), simply uncheck Ballooning Device or
set
balloon: 0
in the configuration.
Automatic Memory Allocation
When setting the minimum memory lower than memory, Proxmox VE will make sure that the minimum
amount you specified is always available to the VM, and if RAM usage on the host is below 80%, will
dynamically add memory to the guest up to the maximum memory specified.
When the host is running short on RAM, the VM will then release some memory back to the host, swapping running processes if needed and starting the OOM killer as a last resort. The passing around of memory between host and guest is done via a special balloon kernel driver running inside the guest, which will grab or release memory pages from the host. 8
When multiple VMs use the autoallocate facility, it is possible to set a Shares coefficient which indicates the
relative amount of the free host memory that each VM should take. Suppose for instance you have four VMs,
three of them running a HTTP server and the last one is a database server. To cache more database blocks
in the database server RAM, you would like to prioritize the database VM when spare RAM is available. For
this you assign a Shares property of 3000 to the database VM, leaving the other VMs to the Shares default
setting of 1000. The host server has 32GB of RAM, and is currently using 16GB, leaving 32 * 80/100 - 16 = 9GB RAM to be allocated to the VMs. The database VM will get 9 * 3000 / (3000 + 1000 + 1000 + 1000) = 4.5 GB extra RAM and each HTTP server will get 1.5 GB.
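As a command line sketch of this scenario (the VM ID and sizes are illustrative; balloon sets the minimum memory), the database VM could be configured with:
qm set <vmid> --memory 4096 --balloon 1024 --shares 3000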
All Linux distributions released after 2010 have the balloon kernel driver included. For Windows OSes, the
balloon driver needs to be added manually and can incur a slowdown of the guest, so we don’t recommend
using it on critical systems.
When allocating RAM to your VMs, a good rule of thumb is always to leave 1GB of RAM available to the
host.
8A good explanation of the inner workings of the balloon driver can be found here https://rwmj.wordpress.com/2010/07/17/virtio-balloon/
10.2.6 Network Device
Each VM can have many Network interface controllers (NIC), of four different types:
Intel E1000 is the default, and emulates an Intel Gigabit network card.
the VirtIO paravirtualized NIC should be used if you aim for maximum performance. Like all VirtIO devices,
the guest OS should have the proper driver installed.
the Realtek 8139 emulates an older 100 Mbit/s network card, and should only be used when emulating older operating systems (released before 2002)
the vmxnet3 is another paravirtualized device, which should only be used when importing a VM from
another hypervisor.
Proxmox VE will generate for each NIC a random MAC address, so that your VM is addressable on Ethernet
networks.
The NIC you added to the VM can follow one of two different models:
in the default Bridged mode each virtual NIC is backed on the host by a tap device (a software loopback device simulating an Ethernet NIC). This tap device is added to a bridge, by default vmbr0 in Proxmox VE. In this mode, VMs have direct access to the Ethernet LAN on which the host is located.
in the alternative NAT mode, each virtual NIC will only communicate with the Qemu user networking
stack, where a built-in router and DHCP server can provide network access. This built-in DHCP will serve
addresses in the private 10.0.2.0/24 range. The NAT mode is much slower than the bridged mode, and
should only be used for testing. This mode is only available via CLI or the API, but not via the WebUI.
You can also skip adding a network device when creating a VM by selecting No network device.
Multiqueue
If you are using the VirtIO driver, you can optionally activate the Multiqueue option. This option allows
the guest OS to process networking packets using multiple virtual CPUs, providing an increase in the total
number of packets transferred.
When using the VirtIO driver with Proxmox VE, each NIC network queue is passed to the host kernel, where the queue will be processed by a kernel thread spawned by the vhost driver. With this option activated, it is possible to pass multiple network queues to the host kernel for each NIC.
When using Multiqueue, it is recommended to set it to a value equal to the number of Total Cores of your
guest. You also need to set in the VM the number of multi-purpose channels on each VirtIO NIC with the
ethtool command:
ethtool -L ens1 combined X
where X is the number of vCPUs of the VM.
You should note that setting the Multiqueue parameter to a value greater than one will increase the CPU load on the host and guest systems as the traffic increases. We recommend setting this option only when the VM has to process a great number of incoming connections, such as when the VM is running as a router, reverse proxy or a busy HTTP server doing long polling.
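A sketch of enabling four queues on the first NIC (the bridge name and queue count are examples):
qm set <vmid> --net0 virtio,bridge=vmbr0,queues=4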
10.2.7 USB Passthrough
There are two different types of USB passthrough devices:
Host USB passthrough
SPICE USB passthrough
Host USB passthrough works by giving a VM a USB device of the host. This can either be done via the
vendor- and product-id, or via the host bus and port.
The vendor/product-id looks like this: 0123:abcd, where 0123 is the id of the vendor, and abcd is the id of the product, meaning two pieces of the same USB device have the same id.
The bus/port looks like this: 1-2.3.4, where 1 is the bus and 2.3.4 is the port path. This represents the physical ports of your host (depending on the internal order of the USB controllers).
If a device is present in a VM configuration when the VM starts up, but the device is not present in the host,
the VM can boot without problems. As soon as the device/port is available in the host, it gets passed through.
Warning
Using this kind of USB passthrough means that you cannot move a VM online to another host, since the hardware is only available on the host the VM is currently residing on.
The second type of passthrough is SPICE USB passthrough. This is useful if you use a SPICE client which supports it. If you add a SPICE USB port to your VM, you can pass through a USB device from where your SPICE client is, directly to the VM (for example an input device or hardware dongle).
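As a sketch, the first form passes a host device by vendor/product-id, while a second USB slot is dedicated to SPICE passthrough (0123:abcd is just the placeholder id used above):
qm set <vmid> --usb0 host=0123:abcd
qm set <vmid> --usb1 spice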
10.2.8 BIOS and UEFI
In order to properly emulate a computer, QEMU needs to use a firmware. By default QEMU uses SeaBIOS
for this, which is an open-source, x86 BIOS implementation. SeaBIOS is a good choice for most standard
setups.
There are, however, some scenarios in which a BIOS is not a good firmware to boot from, e.g. if you want to do VGA passthrough. 9 In such cases, you should rather use OVMF, which is an open-source UEFI implementation. 10
If you want to use OVMF, there are several things to consider:
In order to save things like the boot order, there needs to be an EFI Disk. This disk will be included in
backups and snapshots, and there can only be one.
You can create such a disk with the following command:
qm set <vmid> -efidisk0 <storage>:1,format=<format>
Where <storage> is the storage where you want to have the disk, and <format> is a format which the
storage supports. Alternatively, you can create such a disk through the web interface with Add EFI Disk
in the hardware section of a VM.
When using OVMF with a virtual display (without VGA passthrough), you need to set the client resolution in the OVMF menu (which you can reach with a press of the ESC button during boot), or you have to choose SPICE as the display type.
10.2.9 Automatic Start and Shutdown of Virtual Machines
After creating your VMs, you probably want them to start automatically when the host system boots. For this
you need to select the option Start at boot from the Options Tab of your VM in the web interface, or set it
with the following command:
qm set <vmid> -onboot 1
Start and Shutdown Order
In some cases you want to be able to fine tune the boot order of your VMs, for instance if one of your VMs is providing firewalling or DHCP to other guest systems. For this you can use the following parameters:
9Alex Williamson has a very good blog entry about this. http://vfio.blogspot.co.at/2014/08/primary-graphics-assignment-without-vga.html
10 See the OVMF Project http://www.tianocore.org/ovmf/
Start/Shutdown order: Defines the start order priority. E.g. set it to 1 if you want the VM to be the first to
be started. (We use the reverse startup order for shutdown, so a machine with a start order of 1 would be
the last to be shut down). If multiple VMs have the same order defined on a host, they will additionally be
ordered by VMID in ascending order.
Startup delay: Defines the interval between this VM start and subsequent VM starts. E.g. set it to 240 if you want to wait 240 seconds before starting other VMs.
Shutdown timeout: Defines the duration in seconds Proxmox VE should wait for the VM to be offline after
issuing a shutdown command. By default this value is set to 180, which means that Proxmox VE will issue
a shutdown request and wait 180 seconds for the machine to be offline. If the machine is still online after
the timeout it will be stopped forcefully.
Note
VMs managed by the HA stack do not follow the start on boot and boot order options currently. Those
VMs will be skipped by the startup and shutdown algorithm as the HA manager itself ensures that VMs
get started and stopped.
Please note that machines without a Start/Shutdown order parameter will always start after those where the
parameter is set. Further, this parameter can only be enforced between virtual machines running on the
same host, not cluster-wide.
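All three parameters are stored in the startup option of a VM, so they could, as a sketch, also be set together on the command line (the values mirror the examples above):
qm set <vmid> --startup order=1,up=240,down=180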
10.3 Migration
If you have a cluster, you can migrate your VM to another host with
qm migrate <vmid> <target>
There are generally two mechanisms for this
Online Migration (aka Live Migration)
Offline Migration
10.3.1 Online Migration
When your VM is running and it has no local resources defined (such as disks on local storage, passed
through devices, etc.) you can initiate a live migration with the -online flag.
How it works
This starts a Qemu Process on the target host with the incoming flag, which means that the process starts
and waits for the memory data and device states from the source Virtual Machine (since all other resources,
e.g. disks, are shared, the memory content and device state are the only things left to transmit).
Once this connection is established, the source begins to send the memory content asynchronously to the
target. If the memory on the source changes, those sections are marked dirty and there will be another pass
of sending data. This happens until the amount of data to send is so small that it can pause the VM on the
source, send the remaining data to the target and start the VM on the target in under a second.
Requirements
For Live Migration to work, there are some things required:
The VM has no local resources (e.g. passed through devices, local disks, etc.)
The hosts are in the same Proxmox VE cluster.
The hosts have a working (and reliable) network connection.
The target host must have the same or higher versions of the Proxmox VE packages. (It might work the
other way, but this is never guaranteed)
10.3.2 Offline Migration
If you have local resources, you can still offline migrate your VMs, as long as all disks are on storages which are defined on both hosts. Then the migration will copy the disks over the network to the target host.
10.4 Copies and Clones
VM installation is usually done using an installation media (CD-ROM) from the operating system vendor. Depending on the OS, this can be a time-consuming task one might want to avoid.
An easy way to deploy many VMs of the same type is to copy an existing VM. We use the term clone for
such copies, and distinguish between linked and full clones.
Full Clone
The result of such a copy is an independent VM. The new VM does not share any storage resources with the original.
It is possible to select a Target Storage, so one can use this to migrate a VM to a totally different
storage. You can also change the disk image Format if the storage driver supports several formats.
Note
A full clone needs to read and copy all VM image data. This is usually much slower than creating a linked clone.
Some storage types allow copying a specific Snapshot, which defaults to the current VM data. This also means that the final copy never includes any additional snapshots from the original VM.
Linked Clone
Modern storage drivers support a way to generate fast linked clones. Such a clone is a writable copy whose initial contents are the same as the original data. Creating a linked clone is nearly instantaneous, and initially consumes no additional space.
They are called linked because the new image still refers to the original. Unmodified data blocks are read from the original image, but modifications are written (and afterwards read) from a new location. This technique is called Copy-on-write.
This requires that the original volume is read-only. With Proxmox VE one can convert any VM into a read-only Template. Such templates can later be used to create linked clones efficiently.
Note
You cannot delete the original template while linked clones exist.
It is not possible to change the Target storage for linked clones, because this is a storage internal
feature.
The Target node option allows you to create the new VM on a different node. The only restriction is that the
VM is on shared storage, and that storage is also available on the target node.
To avoid resource conflicts, all network interface MAC addresses get randomized, and we generate a new UUID for the VM BIOS (smbios1) setting.
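On the command line a clone could, for example, be created with qm clone; here <vmid> is the source VM, <newid> the new VM ID, and --full requests a full rather than a linked clone:
qm clone <vmid> <newid> --name <name> --full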
10.5 Virtual Machine Templates
One can convert a VM into a Template. Such templates are read-only, and you can use them to create linked
clones.
Note
It is not possible to start templates, because this would modify the disk images. If you want to change the
template, create a linked clone and modify that.
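As a sketch of the workflow, a VM can be converted with qm template, and linked clones can then be created from it (IDs and the name are placeholders):
qm template <vmid>
qm clone <vmid> <newid> --name <name>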
10.6 Importing Virtual Machines and disk images
A VM export from a foreign hypervisor usually takes the form of one or more disk images, with a configuration file describing the settings of the VM (RAM, number of cores).
The disk images can be in the vmdk format, if the disks come from VMware or VirtualBox, or qcow2 if
the disks come from a KVM hypervisor. The most popular configuration format for VM exports is the OVF
standard, but in practice interoperation is limited because many settings are not implemented in the standard
itself, and hypervisors export the supplementary information in non-standard extensions.
Besides the problem of format, importing disk images from other hypervisors may fail if the emulated hardware changes too much from one hypervisor to another. Windows VMs are particularly concerned by this,
as the OS is very picky about any changes of hardware. This problem may be solved by installing the
MergeIDE.zip utility available from the Internet before exporting and choosing a hard disk type of IDE before
booting the imported Windows VM.
Finally there is the question of paravirtualized drivers, which improve the speed of the emulated system and
are specific to the hypervisor. GNU/Linux and other free Unix OSes have all the necessary drivers installed
by default and you can switch to the paravirtualized drivers right after importing the VM. For Windows VMs,
you need to install the Windows paravirtualized drivers by yourself.
GNU/Linux and other free Unix OSes can usually be imported without hassle. Note that we cannot guarantee a successful import/export of Windows VMs in all cases due to the problems above.
10.6.1 Step-by-step example of a Windows OVF import
Microsoft provides Virtual Machines downloads to get started with Windows development. We are going to use one of these to demonstrate the OVF import feature.
Download the Virtual Machine zip
After reviewing the user agreement, choose the Windows 10 Enterprise (Evaluation - Build) for the VMware platform, and download the zip.
Extract the disk image from the zip
Using the unzip utility or any archiver of your choice, unpack the zip, and copy via ssh/scp the ovf and
vmdk files to your Proxmox VE host.
Import the Virtual Machine
This will create a new virtual machine, using cores, memory and VM name as read from the OVF manifest,
and import the disks to the local-lvm storage. You have to configure the network manually.
qm importovf 999 WinDev1709Eval.ovf local-lvm
The VM is ready to be started.
10.6.2 Adding an external disk image to a Virtual Machine
You can also add an existing disk image to a VM, either coming from a foreign hypervisor, or one that you
created yourself.
Suppose you created a Debian/Ubuntu disk image with the vmdebootstrap tool:
vmdebootstrap --verbose \
--size 10GiB --serial-console \
--grub --no-extlinux \
--package openssh-server \
--package avahi-daemon \
--package qemu-guest-agent \
--hostname vm600 --enable-dhcp \
--customize=./copy_pub_ssh.sh \
--sparse --image vm600.raw
You can now create a new target VM for this image.
qm create 600 --net0 virtio,bridge=vmbr0 --name vm600 --serial0 socket \
   --bootdisk scsi0 --scsihw virtio-scsi-pci --ostype l26
Add the disk image as unused0 to the VM, using the storage pvedir:
qm importdisk 600 vm600.raw pvedir
Finally attach the unused disk to the SCSI controller of the VM:
qm set 600 --scsi0 pvedir:600/vm-600-disk-1.raw
The VM is ready to be started.
10.7 Cloud-Init Support
Cloud-Init is the de facto multi-distribution package that handles early initialization of a virtual machine instance. Using Cloud-Init, configuration of network devices and ssh keys on the hypervisor side is possible. When the VM starts for the first time, the Cloud-Init software inside the VM will apply those settings.
Many Linux distributions provide ready-to-use Cloud-Init images, mostly designed for OpenStack. These images will also work with Proxmox VE. While it may seem convenient to get such ready-to-use images, we usually recommend preparing the images yourself. The advantage is that you will know exactly what you have installed, and this helps you later to easily customize the image for your needs.
Once you have created such a Cloud-Init image we recommend to convert it into a VM template. From a VM
template you can quickly create linked clones, so this is a fast method to roll out new VM instances. You just
need to configure the network (and maybe the ssh keys) before you start the new VM.
We recommend using SSH key-based authentication to login to the VMs provisioned by Cloud-Init. It is also
possible to set a password, but this is not as safe as using SSH key-based authentication because Proxmox
VE needs to store an encrypted version of that password inside the Cloud-Init data.
Proxmox VE generates an ISO image to pass the Cloud-Init data to the VM. For that purpose all Cloud-Init VMs need to have an assigned CD-ROM drive. Also, many Cloud-Init images assume they have a serial console, so it is recommended to add a serial console and use it as the display for those VMs.
10.7.1 Preparing Cloud-Init Templates
The first step is to prepare your VM. Basically you can use any VM. Simply install the Cloud-Init packages
inside the VM that you want to prepare. On Debian/Ubuntu based systems this is as simple as:
apt-get install cloud-init
Many distributions already provide ready-to-use Cloud-Init images (provided as .qcow2 files), so alternatively you can simply download and import such images. For the following example, we will use the cloud image provided by Ubuntu at https://cloud-images.ubuntu.com.
# download the image
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
# create a new VM
qm create 9000 --memory 2048 --net0 virtio,bridge=vmbr0
# import the downloaded disk to local-lvm storage
qm importdisk 9000 bionic-server-cloudimg-amd64.img local-lvm