OpenOnload Onload User Guide (2015) SF-104474-CD Issue 20
Page Count: 265
- Table of Contents
- 1 What’s New
- 2 Low Latency Quickstart Guide
- 3 Background
- 4 Installation
- 4.1 Introduction
- 4.2 Onload Distributions
- 4.3 Hardware and Software Supported Platforms
- 4.4 Onload and the Network Adapter Driver
- 4.5 Removing Previously Installed Drivers
- 4.6 Pre-install Notes
- 4.7 EnterpriseOnload - Build and Install from SRPM
- 4.8 EnterpriseOnload - Debian Source Packages
- 4.9 OpenOnload DKMS Installation
- 4.10 Build OpenOnload Source RPM
- 4.11 OpenOnload - Installation
- 4.12 Onload Kernel Modules
- 4.13 Configuring the Network Interfaces
- 4.14 Installing Netperf
- 4.15 How to run Onload
- 4.16 Testing the Onload Installation
- 4.17 Apply an Onload Patch
- 5 Tuning Onload
- 6 Onload Functionality
- 6.1 Onload Transparency
- 6.2 Onload Stacks
- 6.3 Virtual Network Interface (VNIC)
- 6.4 Functional Overview
- 6.5 Onload with Mixed Network Adapters
- 6.6 Maximum Number of Network Interfaces
- 6.7 Whitelist and Blacklist Interfaces
- 6.8 Onloaded PIDs
- 6.9 Onload and File Descriptors, Stacks and Sockets
- 6.10 System calls intercepted by Onload
- 6.11 Linux Sysctls
- 6.12 Changing Onload Control Plane Table Sizes
- 6.13 SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps)
- 6.14 SO_TIMESTAMPING (Hardware Receive Timestamps)
- 6.15 SO_TIMESTAMPING (Hardware Transmit Timestamps)
- 6.16 SO_BINDTODEVICE
- 6.17 Multiplexed I/O
- 6.18 Wire Order Delivery
- 6.19 Stack Sharing
- 6.20 Application Clustering
- 6.21 Bonding, Link aggregation and Failover
- 6.22 VLANS
- 6.23 Accelerated pipe()
- 6.24 Zero-Copy API
- 6.25 Debug and Logging
- 7 Onload - TCP
- 7.1 TCP Operation
- 7.2 TCP Handshake - SYN, SYNACK
- 7.3 TCP SYN Cookies
- 7.4 TCP Socket Options
- 7.5 TCP Level Options
- 7.6 TCP File Descriptor Control
- 7.7 TCP Congestion Control
- 7.8 TCP SACK
- 7.9 TCP QUICKACK
- 7.10 TCP Delayed ACK
- 7.11 TCP Dynamic ACK
- 7.12 TCP Loopback Acceleration
- 7.13 TCP Striping
- 7.14 TCP Connection Reset on RTO
- 7.15 ONLOAD_MSG_WARM
- 7.16 Listen/Accept Sockets
- 7.17 Socket Caching
- 7.18 Scalable Filters
- 7.19 Transparent Reverse Proxy Modes
- 7.20 Transparent Reverse Proxy on Multiple CPUs
- 8 Onload - UDP
- 8.1 UDP Operation
- 8.2 Socket Options
- 8.3 Source Specific Socket Options
- 8.4 UDP Send and Receive Paths
- 8.5 Fragmented UDP
- 8.6 User Level recvmmsg for UDP
- 8.7 User-Level sendmmsg for UDP
- 8.8 Multicast Replication
- 8.9 Multicast Operation and Stack Sharing
- 8.10 Multicast Loopback
- 8.11 Hardware Multicast Loopback
- 8.12 IP_MULTICAST_ALL
- 9 Packet Buffers
- 10 Onload and Virtualization
- 11 Limitations
- 11.1 Introduction
- 11.2 Changes to Behavior
- 11.3 Limits to Acceleration
- 11.4 epoll - Known Issues
- 11.5 Configuration Issues
- Mixed Adapters Sharing a Broadcast Domain
- Virtual Memory on 32 Bit Systems
- Hardware Resources
- IGMP Operation and Multicast Process Priority
- Dynamic Loading
- Scalable Packet Buffer Mode
- SLES11 SR-IOV
- Huge Pages with IPC namespace
- Huge Pages with Shared Stacks
- Huge Pages - Size
- Huge Pages - AMD IOMMU
- Huge Pages and shmmni
- Red Hat MRG 2 and SR-IOV
- PowerPC Architecture
- Java 7 Applications - use of vfork()
- 12 Change History
- A Parameter Reference
- B Meta Options
- C Build Dependencies
- D Onload Extensions API
- D.1 Source Code
- D.2 Common Components
- D.3 Stacks API
- D.4 Stacks API Usage
- D.5 Stacks API - Examples
- D.6 Zero-Copy API
- Zero-Copy Data Buffers
- Zero-Copy UDP Receive Overview
- Zero-Copy UDP Receive
- Zero-Copy Receive Example #1
- Zero-Copy Receive Example #2
- Zero-Copy TCP Send Overview
- Zero-Copy TCP Send
- Zero-Copy Send - Single Message, Single Buffer
- Zero-Copy Send - Multiple Message, Multiple Buffers
- Zero-Copy Send - Full Example
- D.7 Templated Sends
- D.8 Delegated Sends API
- E onload_stackdump
- E.1 Introduction
- E.2 General Use
- List Onloaded Processes
- Identify Onloaded Processes Affinities
- List Onload Environment variables
- TX PIO Counters
- Send RST on a TCP Socket
- Removing Zombie and Orphan Stacks
- Snapshot vs. Dynamic Views
- Monitoring Receive and Transmit Packet Buffers
- Packet Sets
- TCP Application STATS
- The onload_stackdump LOTS Command.
- Remote Monitoring
- F Solarflare sfnettest
- G onload_tcpdump
- H ef_vi
- I onload_iptables
- J Solarflare efpio Test Application

Issue 20 © Solarflare Communications 2015
Onload User Guide
Copyright © 2015 SOLARFLARE Communications, Inc. All rights reserved.
The software and hardware as applicable (the “Product”) described in this document, and this document, are protected by copyright laws, patents and other intellectual property laws and international treaties. The Product described in this document is provided pursuant to a license agreement, evaluation agreement and/or non-disclosure agreement. The Product may be used only in accordance with the terms of such agreement. The software as applicable may be copied only in accordance with the terms of such agreement.
Onload is licensed under the GNU General Public License (Version 2, June 1991). See the LICENSE file in the distribution for details.
The Onload Extensions Stub Library is Copyright licensed under the BSD 2-Clause License.
Onload contains algorithms and uses hardware interface techniques which are subject to Solarflare Communications Inc patent applications. Parties interested in licensing Solarflare's IP are encouraged to contact Solarflare's Intellectual Property Licensing Group at:
Director of Intellectual Property Licensing
Intellectual Property Licensing Group
Solarflare Communications Inc,
7505 Irvine Center Drive
Suite 100
Irvine, California 92618
You will not disclose to a third party the results of any performance tests carried out using Onload or EnterpriseOnload without the prior written consent of Solarflare.
The furnishing of this document to you does not give you any rights or licenses, express or implied, by estoppel or otherwise, with respect to any such Product, or any copyrights, patents or other intellectual property rights covering such Product, and this document does not contain or represent any commitment of any kind on the part of SOLARFLARE Communications, Inc. or its affiliates.
The only warranties granted by SOLARFLARE Communications, Inc. or its affiliates in connection with the Product described in this document are those expressly set forth in the license agreement, evaluation agreement and/or non-disclosure agreement pursuant to which the Product is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, NEITHER SOLARFLARE COMMUNICATIONS, INC. NOR ITS AFFILIATES MAKE ANY REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING THE PRODUCT OR THIS DOCUMENTATION AND HEREBY DISCLAIM ALL IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT, AND ANY WARRANTIES THAT MAY ARISE FROM COURSE OF DEALING, COURSE OF PERFORMANCE OR USAGE OF TRADE. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law (a) in no event shall SOLARFLARE Communications, Inc. or its affiliates have any liability under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages; and (b) the total liability of SOLARFLARE Communications, Inc. or its affiliates arising from or relating to such agreement or the use of this document shall not exceed the amount received by SOLARFLARE Communications, Inc. or its affiliates for that copy of the Product or this document which is the subject of such liability.
The Product is not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
SF-104474-CD
Last Revised: October 2015
Issue 20

Table of Contents
1 What’s New .... 1
2 Low Latency Quickstart Guide .... 4
3 Background .... 11
3.1 Introduction .... 11
4 Installation .... 15
4.1 Introduction .... 15
4.2 Onload Distributions .... 15
4.3 Hardware and Software Supported Platforms .... 16
4.4 Onload and the Network Adapter Driver .... 17
4.5 Removing Previously Installed Drivers .... 17
4.6 Pre-install Notes .... 18
4.7 EnterpriseOnload - Build and Install from SRPM .... 18
4.8 EnterpriseOnload - Debian Source Packages .... 20
4.9 OpenOnload DKMS Installation .... 20
4.10 Build OpenOnload Source RPM .... 21
4.11 OpenOnload - Installation .... 21
4.12 Onload Kernel Modules .... 22
4.13 Configuring the Network Interfaces .... 23
4.14 Installing Netperf .... 24
4.15 How to run Onload .... 24
4.16 Testing the Onload Installation .... 24
4.17 Apply an Onload Patch .... 24
5 Tuning Onload .... 26
5.1 Introduction .... 26
5.2 System Tuning .... 27
5.3 Standard Tuning .... 29
5.4 Onload Deployment on NUMA Systems .... 31
5.5 Interrupt Handling - Kernel Driver .... 33
5.6 Performance Jitter .... 39
5.7 Advanced Tuning .... 42
6 Onload Functionality .... 49
6.1 Onload Transparency .... 49
6.2 Onload Stacks .... 49
6.3 Virtual Network Interface (VNIC) .... 50
6.4 Functional Overview .... 50
6.5 Onload with Mixed Network Adapters .... 50
6.6 Maximum Number of Network Interfaces .... 51
6.7 Whitelist and Blacklist Interfaces .... 51
6.8 Onloaded PIDs .... 51
6.9 Onload and File Descriptors, Stacks and Sockets .... 52
6.10 System calls intercepted by Onload .... 52
6.11 Linux Sysctls .... 52
6.12 Changing Onload Control Plane Table Sizes .... 54
6.13 SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps) .... 55
6.14 SO_TIMESTAMPING (Hardware Receive Timestamps) .... 55
6.15 SO_TIMESTAMPING (Hardware Transmit Timestamps) .... 56
6.16 SO_BINDTODEVICE .... 57
6.17 Multiplexed I/O .... 57
6.18 Wire Order Delivery .... 61
6.19 Stack Sharing .... 62
6.20 Application Clustering .... 63
6.21 Bonding, Link aggregation and Failover .... 65
6.22 VLANS .... 66
6.23 Accelerated pipe() .... 66
6.24 Zero-Copy API .... 67
6.25 Debug and Logging .... 67
7 Onload - TCP .... 69
7.1 TCP Operation .... 69
7.2 TCP Handshake - SYN, SYNACK .... 69
7.3 TCP SYN Cookies .... 70
7.4 TCP Socket Options .... 70
7.5 TCP Level Options .... 72
7.6 TCP File Descriptor Control .... 73
7.7 TCP Congestion Control .... 74
7.8 TCP SACK .... 75
7.9 TCP QUICKACK .... 75
7.10 TCP Delayed ACK .... 75
7.11 TCP Dynamic ACK .... 75
7.12 TCP Loopback Acceleration .... 76
7.13 TCP Striping .... 77
7.14 TCP Connection Reset on RTO .... 78
7.15 ONLOAD_MSG_WARM .... 78
7.16 Listen/Accept Sockets .... 79
7.17 Socket Caching .... 80
7.18 Scalable Filters .... 82
7.19 Transparent Reverse Proxy Modes .... 84
7.20 Transparent Reverse Proxy on Multiple CPUs .... 85
8 Onload - UDP .... 86
8.1 UDP Operation .... 86
8.2 Socket Options .... 86
8.3 Source Specific Socket Options .... 88
8.4 UDP Send and Receive Paths .... 88
8.5 Fragmented UDP .... 89
8.6 User Level recvmmsg for UDP .... 89
8.7 User-Level sendmmsg for UDP .... 90
8.8 Multicast Replication .... 90
8.9 Multicast Operation and Stack Sharing .... 91
8.10 Multicast Loopback .... 94
8.11 Hardware Multicast Loopback .... 94
8.12 IP_MULTICAST_ALL .... 96
9 Packet Buffers .... 97
9.1 Introduction .... 97
9.2 Network Adapter Buffer Table Mode .... 97
9.3 Large Buffer Table Support .... 97
9.4 Scalable Packet Buffer Mode .... 98
9.5 Allocating Huge Pages .... 98
9.6 How Packet Buffers Are Used by Onload .... 99
9.7 Configuring Scalable Packet Buffers .... 102
9.8 Physical Addressing Mode .... 106
9.9 Programmed I/O .... 107
9.10 Templated Sends .... 108
10 Onload and Virtualization .... 109
10.1 Introduction .... 109
10.2 Overview .... 109
10.3 Onload and Linux KVM .... 109
10.4 Onload and NIC Partitioning .... 111
10.5 Onload in a Docker Container .... 113
10.6 Pre-Installation .... 113
10.7 Installation .... 114
10.8 Create Onload Docker Image .... 115
10.9 Migration .... 115
10.10 Copying Files Between Host and Container .... 116
11 Limitations .... 117
11.1 Introduction .... 117
11.2 Changes to Behavior .... 117
11.3 Limits to Acceleration .... 119
11.4 epoll - Known Issues .... 122
11.5 Configuration Issues .... 124
12 Change History .... 129
12.1 Features .... 130
12.2 Environment Variables .... 135
12.3 Module Options .... 143
A Parameter Reference .... 146
A.1 Parameter List .... 146
B Meta Options .... 185
B.1 Environment variables .... 185
C Build Dependencies .... 187
C.1 General .... 187
D Onload Extensions API .... 189
D.1 Source Code .... 189
D.2 Common Components .... 189
D.3 Stacks API .... 193
D.4 Stacks API Usage .... 198
D.5 Stacks API - Examples .... 200
D.6 Zero-Copy API .... 201
D.7 Templated Sends .... 212
D.8 Delegated Sends API .... 213
E onload_stackdump .... 219
E.1 Introduction .... 219
E.2 General Use .... 219
F Solarflare sfnettest .... 238
F.1 Introduction .... 238
G onload_tcpdump .... 246
G.1 Introduction .... 246
G.2 Building onload_tcpdump .... 246
G.3 Using onload_tcpdump .... 246
H ef_vi .... 249
H.1 Components .... 249
H.2 Compiling and Linking .... 249
H.3 Documentation .... 250
I onload_iptables .... 251
I.1 Description .... 251
I.2 How it works .... 251
I.3 Features .... 252
I.4 Rules .... 252
I.5 Preview firewall rules .... 253
I.6 Error Messages .... 255
J Solarflare efpio Test Application .... 257
J.1 efpio .... 257

1 What’s New
This issue of the user guide identifies changes introduced in OpenOnload 201509. Refer to Change History on page 129 to confirm feature availability in the Enterprise release.
For a complete list of features and enhancements, refer to the release notes and the release change log available from: http://www.openonload.org/download.html.
The changes and improvements in Onload-201509 are geared towards Internet-based services, ISP load-balancing servers and CDN-based infrastructures, such as those fronted by very high connection rate reverse proxy and transparent proxy servers. The changes in Onload improve scalability by increasing socket connection rates and by removing limitations on the number of listening sockets and active-open network connections that can be sustained.
Net Driver and Firmware Updates
OpenOnload 201509 includes the 4.5.1.1026 net driver.
Users should refer to Release Notes - sfc in the distribution package for details of changes to the adapter driver. Many of the new features require firmware version 4.6 or later.
New Features in OpenOnload 201509
Scalable Filters
On a selected interface, a MAC filter is used to receive all traffic to a single Onload stack. The MAC filter overcomes the hardware limitations encountered when using IP filters and allows a greater number of TCP listening sockets and active-open connections to be maintained.
This feature is enabled with the EF_SCALABLE_FILTERS environment variable. Refer to Scalable Filters on page 82 for more details.
Active Socket Caching
Active socket caching speeds up socket creation by allowing Onload to reuse active-open sockets, which are recycled back to the Onload stack when an established TCP connection has terminated. Passive socket caching was added in a previous Onload release.
Refer to Socket Caching on page 80.

IP_TRANSPARENT Socket Option
Onload 201509 supports the IP_TRANSPARENT socket option on TCP sockets (available in Linux since 2.6.24). Sockets that have set this option are able to bind to a non-local IP address. This feature has been added to support Onload deployment in transparent and reverse proxy configurations. For more information see Transparent Reverse Proxy Modes on page 84.
Teaming
Onload now supports bonds/teams configured with the Linux "teaming" kernel module and "teamd" daemon. This is in addition to the long-standing support for bonds configured using the standard Linux "bonding" module. teamd is distributed with RHEL 7 and other Linux OS variants.
ef_vi
The Onload layer 2 API now has support for IP-protocol and Ethertype filters. These are only supported on SFN7000-series adapters and require firmware version 4.6 or later. Further details are available in the ef_vi Doxygen documentation. Refer to Appendix H for details of ef_vi.
UDP recvmsg
In previous releases, when using recvmsg() to retrieve TX timestamps for UDP packets, Onload would return only the UDP payload. In the 201509 release, Onload returns the entire Ethernet frame. This matches the behaviour of the Linux kernel.
Packet Buffers
To further reduce TLB thrashing and eliminate packet drops, Onload will attempt to reuse buffers from the same set of packet buffers. onload_stackdump can be used to identify the packet sets being used and the free buffer status.
See Packet Sets on page 222 for a wider description and more information.
Environment Variables
Changes have been made affecting the following Onload environment variables. Updates may include changes to the default value, removal, or changes to the variable definition. Users are advised to check by running the following command:
# onload_stackdump doc
EF_MAX_ENDPOINTS
EF_LOG
EF_PIPE_SIZE
EF_MAX_PINNED_PAGES
EF_SCALABLE_FILTERS
EF_SCALABLE_FILTERS_ENABLE
EF_SCALABLE_FILTERS_MODE
EF_TCP_CONNECT_SPIN
EF_TCP_SYNCRECV_MAX
EF_TCP_SNDBUF_MODE
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE
EF_TCP_SOCKBUF_MAX_FRACTION
EF_RETRANSMIT_THRESHOLD_ORPHAN
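Before upgrading, it can help to know which of the affected variables a deployment already sets. The following POSIX shell sketch prints every variable from the list above that is exported in the current environment, which is the environment an onloaded child process would inherit. The function name report_affected is hypothetical, and onload_stackdump doc remains the authoritative description of each variable.

```shell
#!/bin/sh
# Onload environment variables whose default, definition or presence
# changed in OpenOnload 201509 (copied from the list in this section).
AFFECTED="EF_MAX_ENDPOINTS EF_LOG EF_PIPE_SIZE EF_MAX_PINNED_PAGES
EF_SCALABLE_FILTERS EF_SCALABLE_FILTERS_ENABLE EF_SCALABLE_FILTERS_MODE
EF_TCP_CONNECT_SPIN EF_TCP_SYNCRECV_MAX EF_TCP_SNDBUF_MODE
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE EF_TCP_SOCKBUF_MAX_FRACTION
EF_RETRANSMIT_THRESHOLD_ORPHAN"

# report_affected (hypothetical helper): print NAME=VALUE for each affected
# variable that is exported in the current environment.
report_affected() {
  for v in $AFFECTED; do
    # printenv exits non-zero when the variable is not exported, so only
    # variables that a child process would actually see are printed.
    if val=$(printenv "$v"); then
      printf '%s=%s\n' "$v" "$val"
    fi
  done
}

report_affected
```

Any variable it reports is a candidate for review against the onload_stackdump doc output of the new release.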
New environment variables are listed in Chapter 12, Environment Variables on page 135.
Change History
The Change History section is updated with every revision of this document to include the latest Onload features, changes or additions to environment variables, and changes or additions to Onload module options. Refer to Change History on page 129.

2 Low Latency Quickstart Guide
Introduction
This section demonstrates how to achieve very low latency coupled with minimum jitter on a system fitted with the Solarflare SFN7122F network adapter and using Solarflare’s kernel-bypass network acceleration middleware, OpenOnload.
The procedure focuses on the performance of the network adapter for TCP and UDP applications running on Linux, using the industry-standard Netperf network benchmark application and the Solarflare-supplied open source sfnettest network benchmark suite.
Please read the Solarflare LICENSE file regarding the disclosure of benchmark test results.
Software Installation
Before running low latency benchmark tests, ensure that the correct driver and firmware versions are installed. For example (minimum driver and firmware versions are shown):
[root@system-N]# ethtool -i enp3s0f0
driver: sfc
version: 4.5.1.1020
firmware-version: 4.4.2.1011 rx1 tx1
Firmware Variant
On SFN7000 series adapters, the adapter should use the ultra-low-latency firmware variant, as indicated by the presence of rx1 tx1 in the output shown above. Firmware variants are selected with the sfboot utility from the Solarflare Linux Utilities package (SF-107601-LS).
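The driver, firmware and variant checks described above can be scripted. The sketch below reads ethtool -i output on standard input, compares the driver and firmware versions against the minimums shown, and checks for the rx1 tx1 variant. It assumes GNU coreutils (sort -V) for version ordering and the "name: value" line format shown above; the function names are hypothetical.

```shell
#!/bin/sh
# Minimum versions taken from the example ethtool -i output above.
MIN_DRIVER=4.5.1.1020
MIN_FIRMWARE=4.4.2.1011

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_sfc_versions (hypothetical): read `ethtool -i <iface>` output on stdin,
# print any problems found and return non-zero if there were any.
check_sfc_versions() {
  info=$(cat)
  driver=$(printf '%s\n' "$info" | awk -F': ' '$1 == "driver" { print $2 }')
  drv_ver=$(printf '%s\n' "$info" | awk -F': ' '$1 == "version" { print $2 }')
  # The firmware line looks like "4.4.2.1011 rx1 tx1": the first word is the
  # version, and the remaining words identify the firmware variant.
  fw_line=$(printf '%s\n' "$info" | awk -F': ' '$1 == "firmware-version" { print $2 }')
  fw_ver=${fw_line%% *}
  status=0
  [ "$driver" = "sfc" ] || { echo "driver is '$driver', expected sfc"; status=1; }
  version_ge "$drv_ver" "$MIN_DRIVER" || { echo "driver $drv_ver < $MIN_DRIVER"; status=1; }
  version_ge "$fw_ver" "$MIN_FIRMWARE" || { echo "firmware $fw_ver < $MIN_FIRMWARE"; status=1; }
  case "$fw_line" in
    *"rx1 tx1"*) ;;
    *) echo "firmware variant is not ultra-low-latency (rx1 tx1)"; status=1 ;;
  esac
  return $status
}
```

For example: ethtool -i enp3s0f0 | check_sfc_versions && echo "versions OK".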
Netperf
Netperf can be downloaded from http://www.netperf.org/netperf/
Unpack the compressed tar file using the tar command:
[root@system-N]# tar -zxvf netperf-<version>.tar.gz
This will create a sub-directory called netperf-<version> from which the configure and make commands can be run (as root):
./configure
make install
Following installation, the netperf and netserver applications are located in the src subdirectory.

Solarflare sfnettest
Download the sfnettest-<version>.tgz source file from www.openonload.org
Unpack the tar file using the tar command:
[root@system-N]# tar -zxvf sfnettest-<version>.tgz
Run the make utility from the sfnettest-<version>/src subdirectory to build the sfnt-pingpong application.
Solarflare Onload
Before the Onload network and kernel drivers can be built and installed, the system must support a build environment capable of compiling kernel modules. Refer to Build Dependencies on page 187 for more details.
Download the openonload-<version>.tgz file from www.openonload.org
Unpack the tar file using the tar command:
[root@system-N]# tar -zxvf openonload-<version>.tgz
Run the onload_install command from the openonload-<version>/scripts subdirectory:
[root@system-N]# ./onload_install
Test Setup
The diagram below shows the required physical configuration: two servers equipped with Solarflare network adapters and connected back-to-back in order to measure the latency of the adapter, drivers and acceleration middleware. If required, the tests can be repeated with a 10G switch on the link to measure the additional latency introduced by a particular switch.
Requirements:
• Two servers are equipped with Solarflare network adapters and connected with a single cable between the Solarflare interfaces.
• The Solarflare interfaces are configured with an IP address so that traffic can pass between them. Use ping to verify the connection.
• Onload, netperf and sfnettest are installed on both machines.
[Figure: two systems under test connected back-to-back by a 10G link (direct attach or optical)]

Pre-Test Configuration
On both machines:
1 Isolate the CPU cores that will be used from the general SMP balancing and scheduler algorithms. Add the following option to the kernel line in /boot/grub/grub.conf:
isolcpus=<comma separated cpu list>
2 Stop the cpuspeed service to prevent power saving modes from reducing the CPU clock speed.
RHEL6 [root@system-N]# service cpuspeed stop
RHEL7 [root@system-N]# systemctl stop cpupower
3 Stop the irqbalance service to prevent the OS from re-balancing interrupts between available CPU cores.
RHEL6 [root@system-N]# service irqbalance stop
RHEL7 [root@system-N]# systemctl stop irqbalance
4 Stop the iptables service to eliminate overheads incurred by the firewall. Solarflare recommend this step on RHEL6 for improved latency when using the kernel network driver.
RHEL6 [root@system-N]# service iptables stop
RHEL7 [root@system-N]# systemctl stop iptables
5 Disable interrupt moderation.
[root@system-N]# ethtool -C eth<N> rx-usecs 0 adaptive-rx off
where <N> is the identifier of the Solarflare adapter Ethernet interface.
6 Refer to the Reference System Specification below for BIOS features.
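Steps 1 and 5 above can be verified from the shell after a reboot. The sketch below assumes the standard Linux /proc/cmdline file and the usual ethtool -c output format (lines such as "Adaptive RX: off  TX: off" and "rx-usecs: 0"). Both helper names are hypothetical, and each reads text on standard input so that it can be tried without an adapter.

```shell
#!/bin/sh
# isolcpus_value (hypothetical): print the isolcpus= list from a kernel
# command line read on stdin, for example:  isolcpus_value < /proc/cmdline
isolcpus_value() {
  tr ' ' '\n' | sed -n 's/^isolcpus=//p'
}

# moderation_disabled (hypothetical): read `ethtool -c <iface>` output on
# stdin and succeed only if adaptive RX coalescing is off and rx-usecs is 0.
moderation_disabled() {
  awk '
    /^Adaptive RX:/ { adaptive = $3 }   # "Adaptive RX: off  TX: off"
    /^rx-usecs:/    { usecs = $2 }      # exact "rx-usecs:" line only
    END { exit !(adaptive == "off" && usecs == "0") }
  '
}
```

Typical use: isolcpus_value < /proc/cmdline, and ethtool -c eth2 | moderation_disabled && echo "interrupt moderation is off".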
Reference System Specification
The following latency measurements were recorded on twin Intel® Ivy Bridge servers. The specification of the test systems is as follows:
• DELL PowerEdge R210 servers equipped with Intel® Xeon® CPU E3-1280 V2 @ 3.60GHz, 2x 2GB DIMMs.
• BIOS: Turbo mode ENABLED, C-states DISABLED, IOMMU DISABLED.
• Red Hat Enterprise Linux V7.0 (x86_64 kernel, version 3.10.0-123.el7.x86_64).
• Solarflare SFN7122F NIC (for driver and firmware, see Software Installation). Direct attach cable at 10G.
• Performance might be improved on some systems if the tuned service is disabled. Users should experiment with tuned tuning profiles or disable the tuned service.
• OpenOnload distribution: openonload-201502-u3.
Similar results can be expected on any Intel-based PCIe Gen 3 server or compatible system.

UDP Latency: Netperf
Run the netserver application on system-1:
[root@system-1]# pkill -f netserver
[root@system-1]# onload --profile=latency taskset -c 1 ./netserver
Run the netperf application on system-2:
[root@system-2]# onload --profile=latency taskset -c 1 ./netperf -t UDP_RR -H <system1-ip> -l 10 -- -r 32
Socket Size      Request  Resp.  Elapsed  Trans.
Send     Recv     Size     Size   Time     Rate
bytes    Bytes    bytes    bytes  secs.    per sec
212992   212992   32       32     10.00    300351.00
300351 transactions/second means that each transaction takes 1/300351 seconds, resulting in an RTT/2 latency of (1/300351)/2, or 1.66 µs.
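The conversion from transaction rate to RTT/2 latency is easy to automate. One request/response transaction is one full round trip, so RTT = 1/rate seconds and RTT/2 = 1/(2 × rate) seconds. The sketch below assumes the netperf result row is the only row of six numeric fields whose sixth field is the transaction rate, as in the output above (netperf may print a second, shorter row with only the remote socket sizes, which this filter skips). The function names are hypothetical.

```shell
#!/bin/sh
# rtt2_us RATE: RTT/2 latency in microseconds for a request/response rate
# given in transactions per second: RTT/2 = 1 / (2 * RATE) seconds.
rtt2_us() {
  awk -v rate="$1" 'BEGIN { printf "%.3f\n", 1e6 / (2 * rate) }'
}

# rtt2_from_netperf: read netperf *_RR output on stdin and print RTT/2 in
# microseconds, taken from the result row (six numeric fields, the sixth
# being the transaction rate). Fails if no such row is found.
rtt2_from_netperf() {
  awk 'NF == 6 && $1 ~ /^[0-9]+$/ && $6 ~ /^[0-9.]+$/ { rate = $6 }
       END { if (rate > 0) printf "%.3f\n", 1e6 / (2 * rate); else exit 1 }'
}
```

With the figures above, rtt2_us 300351 prints 1.665, which this guide reports to two decimal places as 1.66 µs. The same helper applied to the TCP_RR rate of 274853.34 prints 1.819.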
UDP Latency: sfnt-pingpong
Run the sfnt-pingpong application on both systems:
[root@system-1]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[root@system-2]# onload --profile=latency taskset -c 1 ./sfnt-pingpong --affinity "1;1" udp <system1-ip>
# size  mean   min    median  max     %ile   stddev  iter
0       1636   1571   1625    10584   1791   79      911000
1       1637   1573   1625    9865    1896   89      911000
2       1634   1570   1628    9852    1731   67      912000
4       1639   1572   1627    9917    2056   85      910000
8       1639   1571   1627    10073   2000   95      910000
16      1636   1573   1629    10194   1732   68      911000
32      1663   1591   1647    10021   2198   102     897000
64      1693   1611   1670    10212   2400   133     880000
128     1763   1670   1755    9897    1887   85      846000
256     1882   1779   1850    10043   2477   141     793000
The output identifies the mean, minimum, median and maximum RTT/2 latency (in nanoseconds) for increasing UDP message sizes, together with the 99th percentile and standard deviation for these results. A message size of 32 bytes has a mean latency of 1.66 µs with a 99th percentile latency under 2.2 µs.
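To pull a single row out of sfnt-pingpong output and express it in microseconds, a short awk filter is enough. The sketch assumes the column order shown in the header above (size, mean, min, median, max, 99%ile, stddev, iter), with latencies in nanoseconds and header lines starting with "#". The name pingpong_row is hypothetical.

```shell
#!/bin/sh
# pingpong_row SIZE (hypothetical): read sfnt-pingpong output on stdin and
# print the mean and 99th-percentile RTT/2 latency, in microseconds, for the
# given message size. Columns: size mean min median max %ile stddev iter (ns).
pingpong_row() {
  awk -v size="$1" '
    !/^#/ && $1 == size {
      printf "size=%s mean=%.2fus p99=%.2fus\n", $1, $2 / 1000, $6 / 1000
      found = 1
    }
    END { exit !found }   # non-zero exit if the size was not in the output
  '
}
```

Piping the UDP results above through pingpong_row 32 prints size=32 mean=1.66us p99=2.20us, matching the figures quoted in the text. The same filter works unchanged on the TCP output.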
TCP Latency: Netperf
Run the netserver application on system-1:
[root@system-1]# pkill -f netserver
[root@system-1]# onload --profile=latency taskset -c 1 ./netserver
Run the netperf application on system-2:
[root@system-2]# onload --profile=latency taskset -c 1 ./netperf -t TCP_RR -H <system1-ip> -l 10 -- -r 32

Socket Size      Request  Resp.  Elapsed  Trans.
Send     Recv     Size     Size   Time     Rate
bytes    Bytes    bytes    bytes  secs.    per sec
16384    87380    32       32     10.00    274853.34
274853 transactions/second means that each transaction takes 1/274853 seconds, resulting in an RTT/2 latency of (1/274853)/2, or 1.81 µs.
TCP Latency: sfnt-pingpong
Run the sfnt-pingpong application on both systems:
[root@system-1]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[root@system-2]# onload --profile=latency taskset -c 1 ./sfnt-pingpong --affinity "1;1" tcp <system1-ip>
# size  mean   min    median  max     %ile   stddev  iter
1       1798   1697   1757    10165   2514   164     829000
2       1794   1687   1749    10561   2936   198     831000
4       1765   1690   1749    10301   1922   80      845000
8       1772   1699   1755    10583   1930   93      842000
16      1804   1694   1751    10241   2925   211     827000
32      1786   1710   1767    10523   1973   98      835000
64      1847   1754   1833    11266   2020   99      808000
128     1929   1823   1908    10552   2460   114     774000
256     2014   1923   1998    9757    2199   89      741000
The output identifies the mean, minimum, median and maximum RTT/2 latency (in nanoseconds) for increasing TCP message sizes, together with the 99th percentile and standard deviation for these results. A message size of 32 bytes has a mean latency of 1.78 µs with a 99th percentile latency under 2.0 µs.
Layer 2 ef_vi Latency
The efpio UDP test application, supplied with the openonload package, can be used to measure the latency of the Solarflare ef_vi layer 2 API. efpio uses PIO (programmed I/O).
Using the same back-to-back configuration described above, efpio latency tests were recorded on DELL PowerEdge R210 servers.
# ef_vi_version_str: 201306-7122preview2
# udp payload len: 28
# iterations: 100000
# frame len: 70
round-trip time: 2.65 µs (1.32 RTT/2)
Solarflare efpio Test Application on page 257 describes the efpio application and its command line options, and provides example command lines.

Comparative Data
Adapter Comparison
Table 1 shows a comparison between latency tests conducted on the SFN6000 and the SFN7000 series adapters. Values shown are the RTT/2 latency in microseconds.
Testing Without Onload
The benchmark performance tests can be run without Onload using the regular kernel network drivers. To do this, remove the onload --profile=latency part from the command line.
To get the best response and comparable latency results using kernel drivers, Solarflare recommend setting interrupt affinity such that interrupts and the application are running on different CPU cores but on the same processor package. Examples are shown below.
Use the following command to identify the receive queues created for an interface, e.g.:
# cat /proc/interrupts | grep eth2
33:   0   0   0   0   IR-PCI-MSI-edge   eth2-0
34:   0   0   0   0   IR-PCI-MSI-edge   eth2-1
Direct IRQ 33 to CPU core 0 and IRQ 34 to CPU core 1:
# echo 1 > /proc/irq/33/smp_affinity
# echo 2 > /proc/irq/34/smp_affinity
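The smp_affinity value is a hexadecimal CPU bitmask in which bit n selects CPU core n: core 0 is 1, core 1 is 2, core 2 is 4, and so on. The sketch below parses /proc/interrupts lines for one interface (in the format shown above) and prints one echo command per matching IRQ, assigning consecutive cores starting from a chosen core. It only prints the commands, so they can be reviewed before being run as root. The function names and the one-IRQ-per-core policy are illustrative assumptions. Masks are limited here to cores 0-31, because larger masks are written as comma-separated 32-bit groups.

```shell
#!/bin/sh
# cpu_mask CORE: hexadecimal smp_affinity mask selecting a single CPU core
# (bit n of the mask = CPU core n). Restricted to cores 0-31.
cpu_mask() {
  if [ "$1" -lt 0 ] || [ "$1" -gt 31 ]; then
    echo "cpu_mask: core $1 out of range 0-31" >&2
    return 1
  fi
  printf '%x\n' $((1 << $1))
}

# irq_affinity_cmds IFACE FIRST_CORE (hypothetical): read /proc/interrupts on
# stdin and print an echo command for each IFACE queue IRQ, pinning the k-th
# matching IRQ (k = 0, 1, ...) to core FIRST_CORE+k.
irq_affinity_cmds() {
  core=$2
  # Field 1 is "NN:"; the last field is the queue name, e.g. eth2-0.
  for irq in $(awk -v ifc="$1" '$NF ~ ("^" ifc "-[0-9]+$") { sub(":", "", $1); print $1 }'); do
    mask=$(cpu_mask "$core") || return 1
    echo "echo $mask > /proc/irq/$irq/smp_affinity"
    core=$((core + 1))
  done
}
```

For example, irq_affinity_cmds eth2 0 < /proc/interrupts prints the two commands shown above for the eth2 queues on IRQs 33 and 34.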
Kernel latency has been measured at 3.66 µs with UDP traffic on a 3.11 kernel supporting the new kernel "busy poll" feature, where the following values are recommended:
# sysctl net.core.busy_poll=50 && sysctl net.core.busy_read=50
Latency will be higher when busy poll is not applied or is not supported in the kernel version. Latency of less than 6 µs can be measured without busy poll on a standard RHEL 6.4 kernel.
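To confirm that the busy poll settings have taken effect, the current values can be read back from procfs. Both values are timeouts in microseconds, and 0 means busy polling is disabled. This sketch takes the sysctl directory as an optional argument, defaulting to /proc/sys/net/core, where the busy_poll and busy_read files exist only on kernels with busy poll support. This lets it be exercised against a test directory. The name busy_poll_status is hypothetical.

```shell
#!/bin/sh
# busy_poll_status [DIR] (hypothetical): print the busy_poll and busy_read
# values and fail if either is missing or zero (0 = busy polling disabled).
busy_poll_status() {
  dir=${1:-/proc/sys/net/core}
  rc=0
  for knob in busy_poll busy_read; do
    if [ -r "$dir/$knob" ]; then
      val=$(cat "$dir/$knob")
      echo "$knob=$val"
      [ "$val" -gt 0 ] || rc=1
    else
      echo "$knob: not supported by this kernel" >&2
      rc=1
    fi
  done
  return $rc
}
```

After running the sysctl command above, busy_poll_status should print busy_poll=50 and busy_read=50 and return success.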
Table 1: Latency Tests - Comparative Data
Test        SFN6000 (µs)        SFN7000 (µs)     Latency gain
UDP         2.2                 1.6              27%
TCP         2.4                 1.8              25%
ef_vi UDP   efpingpong - 2.0    efpio - 1.3      40%
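The UDP and TCP rows of the latency gain column match the reduction relative to the SFN6000 figure, gain = (SFN6000 - SFN7000) / SFN6000 × 100%, rounded to a whole percent. A one-line awk helper (hypothetical name latency_gain) applies that formula to any pair of RTT/2 values:

```shell
#!/bin/sh
# latency_gain OLD NEW: percentage reduction in latency from OLD to NEW,
# i.e. (OLD - NEW) / OLD * 100, rounded to the nearest whole percent.
latency_gain() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f%%\n", (old - new) / old * 100 }'
}

latency_gain 2.2 1.6   # UDP row
latency_gain 2.4 1.8   # TCP row
```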

Further Information
For installation of Solarflare adapters and performance tuning of the network driver when not using Onload, refer to the Solarflare Server Adapter User Guide (SF-103837-CD), available from https://support.solarflare.com/
Questions regarding Solarflare products, Onload and this user guide can be emailed to support@solarflare.com.
3 Background
3.1 Introduction
NOTE: This guide should be read in conjunction with the Solarflare Server Adapter User's Guide, SF-103837-CD, which describes procedures for hardware and software installation of Solarflare network interface cards, network device drivers and related software.
NOTE: Throughout this user guide the term Onload refers to both OpenOnload and EnterpriseOnload unless otherwise stated.
Onload is the Solarflare accelerated network middleware. It is an implementation of TCP and UDP over IP which is dynamically linked into the address space of user-mode applications, and granted direct (but safe) access to the network-adapter hardware. The result is that data can be transmitted to and received from the network directly by the application, without involvement of the operating system. This technique is known as 'kernel bypass'.
Kernel bypass avoids disruptive events such as system calls, context switches and interrupts and so increases the efficiency with which a processor can execute application code. This also directly reduces the host processing overhead, typically by a factor of two, leaving more CPU time available for application processing. This effect is most pronounced for applications which are network intensive, such as:
• Market-data and trading applications
• Computational fluid dynamics (CFD)
• HPC (High Performance Computing)
• HPMPI (High Performance Message Passing Interface); Onload is compatible with MPICH1 and 2, HPMPI, OpenMPI and SCALI
• Other physical models which are moderately parallelizable
• High-bandwidth video-streaming
• Web-caching, load-balancing and Memcached applications
• Content Delivery Networks (CDN) and HTTP servers
• Other system hot-spots such as distributed lock managers or forced serialization points
The Onload library dynamically links with the application at runtime using the standard BSD sockets API, meaning that no modifications are required to the application being accelerated. Onload is the first and only product to offer full kernel bypass for POSIX socket-based applications over TCP/IP and UDP/IP protocols.
Contrasting with Conventional Networking
When using conventional networking, an application calls on the OS kernel to send and receive data to and from the network. Transitioning from the application to the kernel is an expensive operation, and can be a significant performance barrier.
When an application accelerated using Onload needs to send or receive data, it need not access the operating system, but can directly access a partition on the network adapter. The two schemes are shown in Figure 1.
Figure 1: Contrast with Conventional Networking.
An important feature of the conventional model is that applications do not get direct access to the networking hardware and so cannot compromise system integrity. Onload is able to preserve system integrity by partitioning the NIC at the hardware level into many, protected 'Virtual NICs' (VNIC). An application can be granted direct access to a VNIC without the ability to access the rest of the system (including other VNICs or memory that does not belong to the application). Thus Onload with a Solarflare NIC allows optimum performance without compromising security or system integrity.
In summary, Onload can significantly reduce network processing overheads.
How Onload Increases Performance
Onload can significantly reduce the costs associated with networking by reducing CPU overheads and improving performance for latency, bandwidth and application scalability.
Overhead
Transitioning into and out of the kernel from a user-space application is a relatively expensive operation: the equivalent of hundreds or thousands of instructions. With conventional networking such a transition is required every time the application sends or receives data. With Onload, the TCP/IP processing can be done entirely within the user process, eliminating expensive application/kernel transitions, i.e. system calls. In addition, the Onload TCP/IP stack is highly tuned, offering further overhead savings.
The overhead savings of Onload mean more of the CPU's computing power is available to the application to do useful work.
Latency
Conventionally, when a server application is ready to process a transaction it calls into the OS kernel to perform a 'receive' operation, where the kernel puts the calling thread 'to sleep' until a request arrives from the network. When such a request arrives, the network hardware 'interrupts' the kernel, which receives the request and 'wakes' the application.
All of this overhead takes CPU cycles as well as increasing cache and translation lookaside buffer (TLB) footprint. With Onload, the application can remain at user level waiting for requests to arrive at the network adapter and process them directly. The elimination of a kernel-to-user transition, an interrupt, and a subsequent user-to-kernel transition can significantly reduce latency. In short, reduced overheads mean reduced latency.
Bandwidth
Because Onload imposes less overhead, it can process more bytes of network traffic every second. Along with specially tuned buffering and algorithms designed for 10 gigabit networks, Onload allows applications to achieve significantly improved bandwidth.
Scalability
Modern multi-core systems are capable of running many applications simultaneously. However, the advantages can be quickly lost when the multiple cores contend on a single resource, such as locks in a kernel network stack or device driver. These problems are compounded on modern systems with multiple caches across many CPU cores and Non-Uniform Memory Architectures.
Onload results in the network adapter being partitioned and each partition being accessed by an independent copy of the TCP/IP stack. The result is that with Onload, doubling the number of cores really can result in doubled throughput, as demonstrated by Figure 2.
Figure 2: Onload Partitioned Network Adapter
Further Information
For detailed information refer to:
• Onload Functionality on page 49.
• Onload - TCP on page 69.
• Onload - UDP on page 86.
• Onload and Virtualization on page 109.
4 Installation
4.1 Introduction
This chapter covers the following topics:
• Onload Distributions on page 15
• Hardware and Software Supported Platforms on page 16
• Onload and the Network Adapter Driver on page 17
• Removing Previously Installed Drivers on page 17
• Pre-install Notes on page 18
• EnterpriseOnload - Build and Install from SRPM on page 18
• EnterpriseOnload - Debian Source Packages on page 20
• OpenOnload DKMS Installation on page 20
• Build OpenOnload Source RPM on page 21
• OpenOnload - Installation on page 21
• Onload Kernel Modules on page 22
• Configuring the Network Interfaces on page 23
• Installing Netperf on page 24
• How to run Onload on page 24
• Testing the Onload Installation on page 24
• Apply an Onload Patch on page 24
4.2 Onload Distributions
Onload is available in two distributions:
• “OpenOnload” is a free version of Onload available from http://www.openonload.org/, distributed as a source tarball under the GPLv2 license. OpenOnload is subject to a linear development cycle where major releases every 3-4 months include the latest development features.
• “EnterpriseOnload” is a commercial enterprise version of Onload distributed as a source RPM under the GPLv2 license. EnterpriseOnload differs from OpenOnload in that it is offered as a mature commercial product that is downstream from OpenOnload, having undergone a comprehensive software product test cycle resulting in tested, hardened and validated code.
The Solarflare product range offers a flexible and broad range of support options; users should consult their reseller for details and refer to the Solarflare Enterprise Service and Support information at http://www.solarflare.com/Enterprise-Service-Support.
4.3 Hardware and Software Supported Platforms
• Onload can be run on the following Solarflare adapters:
- Solarflare Flareon Adapters
- Onload Network Adapters
- Solarflare mezzanine adapters
- SFA6902F and SFA7942Q ApplicationOnload™ Engine.
Refer to the Solarflare Server Adapter User Guide 'Product Specifications' for adapter details.
• Onload can run on all Intel and AMD x86 processors, on 32 bit and 64 bit platforms.
• Table 2 identifies supported operating systems/kernels.
Table 2: OS/Kernel Support
OS                                         Version              Notes
Red Hat Enterprise Linux                   6.4 - 7.2            RHEL 6 built-in Solarflare drivers may not support SFN7000 series adapters.
Red Hat Messaging Realtime and Grid        2.4, 2.5
Red Hat Enterprise Linux for Realtime      7.1
SuSE Linux Enterprise Server               11 sp2, sp3, sp4     Built-in Solarflare drivers may not support SFN7000 series adapters.
SuSE Linux Enterprise Realtime Extension   11
SuSE Linux Enterprise Server               12 base release
Canonical Ubuntu Server LTS                14.04
Canonical Ubuntu Server                    14.10, 15.04, 15.10
Debian 7 “Wheezy”                          7.x
Debian 8 “Jessie”                          8.0
Linux kernels                              2.6.18 - 4.2
Solarflare aim to support the current and previous major release of each OS at the point these are released (plus the latest long term support release if this is not already included). This includes all minor releases where the distributor has not yet declared end of life/support.
Whilst the Onload QA test cycle predominantly focuses on the Linux OS versions documented above, Solarflare are not aware of any issues preventing Onload installation on other Linux variants such as CentOS, Gentoo and Fedora, although these are not formally supported. Some versions of Ubuntu and Debian earlier than those listed above are also known to support Onload.
4.4 Onload and the Network Adapter Driver
The Solarflare network adapter driver, the "net driver", is generally available from three sources:
• Download as source RPM from support.solarflare.com.
• Packaged 'inbox' in many Linux distributions, e.g. Red Hat Enterprise Linux.
• Packaged in the OpenOnload/EnterpriseOnload distribution.
When using Onload you must use the adapter driver distributed with that version of Onload.
4.5 Removing Previously Installed Drivers
The Solarflare adapter driver (sfc.ko) is distributed as part of many Linux-based OS distributions; this is often referred to as the 'boxed driver' or the 'in-tree' driver. Depending on the OS version this driver may not support more recent Solarflare adapters. Always check the driver release notes available from https://support.solarflare.com/.
The 'in-tree' driver reports only major and minor revision numbers when displayed by the ethtool command:
# ethtool -i enp3s0f0
driver: sfc
version: 4.0
Every Onload distribution includes a version of the net driver to support the specific features of that Onload release, and this driver should always be used with Onload. (The driver is installed along with the other Onload drivers.) Onload drivers display detailed version information using the ethtool command:
# ethtool -i enp3s0f0
driver: sfc
version: 4.5.1.1020
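The two ethtool outputs above suggest a simple way to tell which sfc driver is loaded. The sketch below is a heuristic built on that observation only (two version components for the in-tree driver, more for the Onload-supplied one); the function name is hypothetical, it is not an official check, and the driver release notes remain the authoritative reference.

```shell
# Hypothetical helper: classify the sfc driver from "ethtool -i" output
# read on stdin. Heuristic based on the sample outputs in this section:
# in-tree driver -> "major.minor" (e.g. 4.0),
# Onload-supplied driver -> longer version (e.g. 4.5.1.1020).
sfc_driver_kind() {
    awk -F': *' '
        $1 == "driver"  { drv = $2 }
        $1 == "version" { ver = $2 }
        END {
            if (drv != "sfc") { print "not-sfc"; exit }
            n = split(ver, parts, ".")
            print (n <= 2 ? "in-tree" : "onload")
        }'
}
```

Typical use would be ethtool -i enp3s0f0 | sfc_driver_kind, which prints in-tree, onload or not-sfc.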
To ensure the Onload driver is always loaded following system reboot, the 'in-tree' driver can be removed from the OS entirely. Alternatively, any Onload startup script should include the command to reload the Onload drivers:
# onload_tool reload
To remove the 'in-tree' driver (with Onload uninstalled or not yet installed):
# find /lib/modules/$(uname -r) -name 'sfc*.ko' | xargs rm -rf
# rmmod sfc
# update-initramfs -u -k <kernel version>
initramfs commands may differ on other Linux-based OSs, e.g. on CentOS 7 the following dracut command can be used:
# dracut -f /boot/initramfs-<version>.x86_64.img <version>.x86_64
4.6 Pre-install Notes
NOTE: If Onload is to accelerate a 32 bit application on a 64 bit architecture, the 32 bit libc development headers should be installed before building Onload. Refer to Appendix C for install instructions.
NOTE: You must remove any existing Solarflare RPM driver packages before installing Onload.
NOTE: When migrating between Onload versions or between OpenOnload and EnterpriseOnload, a previously installed version must first be removed using the onload_uninstall command.
NOTE: The Solarflare drivers are currently classified as unsupported in SLES 11 and SLES 12; the certification process is underway. To overcome this on SLES 11, add 'allow_unsupported_modules 1' to the /etc/modprobe.d/unsupported-modules file. For SLES 12, add the same line to the /etc/modprobe.d/10-unsupported-modules.conf file.
4.7 EnterpriseOnload - Build and Install from SRPM
The following steps identify the procedures to build and install EnterpriseOnload. SRPMs can be built by the 'root' or 'non-root' user, but the user must have superuser privileges to install RPMs. Customers should contact their Solarflare customer sales representative for access to the EnterpriseOnload SRPM resources.
Build the RPM
NOTE: Refer to Appendix C for details of build dependencies.
As root:
rpmbuild --rebuild enterpriseonload-<version>.src.rpm
Or as a non-root user:
It is advised to use _topdir to ensure that RPMs are built into a directory to which the user has write permission. The directory structure must pre-exist for the rpmbuild command to succeed.
mkdir -p /tmp/myrpm/{SOURCES,BUILD,RPMS,SRPMS}
rpmbuild --define "_topdir /tmp/myrpm" \
    --rebuild enterpriseonload-<version>.src.rpm
NOTE: On some non-standard kernels the rpmbuild might fail because of build dependencies. In this event retry, adding the --nodeps option to the command line.
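For the non-root build, the _topdir tree must exist before rpmbuild runs. A minimal sketch of that preparation, assuming a POSIX shell with local (bash and dash both provide it); prepare_rpm_topdir and rpm_rebuild_cmd are hypothetical helper names, and the second one only prints the rpmbuild command shown above rather than running it.

```shell
# Hypothetical helper: create the _topdir directory tree used by a
# non-root rpmbuild (same subdirectories as the mkdir command above)
# and print the directory that was prepared.
prepare_rpm_topdir() {
    local topdir=${1:-/tmp/myrpm} sub
    for sub in SOURCES BUILD RPMS SRPMS; do
        mkdir -p "$topdir/$sub" || return 1
    done
    echo "$topdir"
}

# Print (do not run) the rebuild command for a given _topdir and SRPM.
rpm_rebuild_cmd() {
    printf 'rpmbuild --define "_topdir %s" --rebuild %s\n' "$1" "$2"
}
```

For example, rpm_rebuild_cmd "$(prepare_rpm_topdir)" enterpriseonload-<version>.src.rpm prepares /tmp/myrpm and prints the matching rpmbuild line for review.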
Building the source RPM will produce 2 binary RPM files, which can be found in:
• the /usr/src/*/RPMS/ directory
• or, when built by a non-root user, in _topdir/RPMS
• or, when _topdir was defined on the rpmbuild command line, in /tmp/myrpm/RPMS/x86_64/
For example, the EnterpriseOnload user-space components:
/usr/src/redhat/RPMS/x86_64/enterpriseonload-<version>.x86_64.rpm
and the EnterpriseOnload kernel components:
/usr/src/redhat/RPMS/x86_64/enterpriseonload-kmod-2.6.18-92.el5-<version>.x86_64.rpm
Install the EnterpriseOnload RPM
The EnterpriseOnload RPM and the kernel RPM must both be installed for EnterpriseOnload to function correctly.
rpm -ivf enterpriseonload-<version>.x86_64.rpm
rpm -ivf enterpriseonload-kmod-2.6.18-92.el5-<version>.x86_64.rpm
NOTE: EnterpriseOnload is now installed but the kernel modules are not yet loaded.
NOTE: The EnterpriseOnload-kmod filename is specific to the kernel that it is built for.
Installing the EnterpriseOnload Kernel Module
This will load the EnterpriseOnload kernel driver and other driver dependencies and create any device nodes needed for EnterpriseOnload drivers and utilities. The command should be run as root.
/etc/init.d/openonload start
Following successful execution this command produces no output, but the 'onload' script will identify that the kernel module is now loaded:
onload
EnterpriseOnload <version>
Copyright 2006-2013 Solarflare Communications, 2002-2005 Level 5 Networks
Built: Oct 15 2013 09:19:23 12:23:12 (release)
Kernel module: <version>
NOTE: At this point EnterpriseOnload is loaded, but until the network interface has been configured and brought into service EnterpriseOnload will be unable to accelerate traffic.
4.8 EnterpriseOnload - Debian Source Packages
From version 4.0, Debian install packages are available for EnterpriseOnload. Packages are named in the following format:
enterpriseonload_<version>-debiansource.tgz
1 Untar source package
$ tar xf enterpriseonload_<version>-debiansource.tgz
2 Extract source
$ dpkg-source -x enterpriseonload_<version>-1.dsc
3 Build packages
$ cd enterpriseonload-<version>
$ debuild -i -uc -us
4 Install packages
$ sudo dpkg -i ../enterpriseonload-user_<version>-1_amd64.deb
$ sudo dpkg -i ../enterpriseonload-source_<version>-1_all.deb
5 Build and install modules
$ sudo m-a a-i enterpriseonload
4.9 OpenOnload DKMS Installation
OpenOnload DKMS packages are available by contacting support@solarflare.com.
1 DKMS must be installed on the server. DKMS can be downloaded from http://linux.dell.com/dkms/ or from the OS distribution. To check this, run the following command, which will return nothing if DKMS is not installed:
# dkms --version
dkms: 2.2.0.3
2 Install the Onload dkms package:
# rpm -i openonload-dkms-<version>.noarch.rpm
3 Ensure drivers and kernel module are loaded:
onload_tool reload
4.10 Build OpenOnload Source RPM
A source RPM can be built from the OpenOnload distribution tar file.
1 Download the required tar file from the following location:
http://www.openonload.org/download.html
Copy the file to a directory on the machine where the source RPM is to be created.
2 As root, execute the following command:
rpmbuild -ts openonload-<version>.tgz
Wrote: /root/rpmbuild/SRPMS/openonload-<version>.src.rpm
The output identifies the location of the source RPM. Use the -ta option to get a binary RPM.
4.11 OpenOnload - Installation
The following procedure demonstrates how to download, untar and install OpenOnload.
Download and untar OpenOnload
1 Download the required tar file from the following location:
http://www.openonload.org/download.html
The compressed tar file (.tgz) should be downloaded/copied to a directory on the machine on which it will be installed.
2 As root, unpack the tar file using the tar command.
tar -zxvf openonload-<version>.tgz
This will unpack the tar file and, within the current directory, create a sub-directory called openonload-<version> which contains other sub-directories, including the scripts directory from which subsequent install commands can be run.
Building and Installing OpenOnload
NOTE: Refer to Appendix C for details of build dependencies.
The following command will build and install OpenOnload and required drivers in the system directories:
./onload_install
Successful installation will be indicated with the output "onload_install: Install complete", possibly followed by a warning that the sfc (net driver) driver is already installed.
NOTE: The onload_install script does not create RPMs.
Load Onload Drivers
Following installation it is necessary to load the Onload drivers:
onload_tool reload
When used with OpenOnload this command will replace any previously loaded network adapter driver with the driver from the OpenOnload distribution.
Check that Solarflare drivers are loaded using the following commands:
lsmod | grep sfc
lsmod | grep onload
An alternative to the reload command is to reboot the system to load Onload drivers.
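lsmod | grep sfc also matches related modules such as sfc_char and sfc_resource. A sketch that checks for exact module names in /proc/modules, the file lsmod formats, is shown below; modules_loaded is a hypothetical helper, and its optional file argument exists only so it can be tried against sample data.

```shell
# Hypothetical helper: for each module name in $1 (space separated),
# report whether a module with exactly that name appears in the first
# column of /proc/modules. A second argument overrides the file.
modules_loaded() {
    local file=${2:-/proc/modules} m
    for m in $1; do
        if awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$file"; then
            echo "$m: loaded"
        else
            echo "$m: missing"
        fi
    done
}
```

For example, modules_loaded "sfc onload" prints one loaded or missing line per module.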
Confirm Onload Installation
When the Onload installation is complete, run the onload command to confirm installation of the Onload software and kernel module:
[root@server1] onload
This will display the Onload product banner and usage:
OpenOnload 201405
Copyright 2006-2012 Solarflare Communications, 2002-2005 Level 5 Networks
Built: May 20 2014 16:46:33 (release)
Kernel module: 201405
usage:
  onload [options] <command> <command-args>
options:
  --profile=<profile>   -- comma sep list of config profile(s)
  --force-profiles      -- profile settings override environment
  --no-app-handler      -- do not use app-specific settings
  --app=<app-name>      -- identify application to run under onload
  --version             -- print version information
  -v                    -- verbose
  -h --help             -- this help message
4.12 Onload Kernel Modules
To identify Solarflare drivers already installed on the server:
modprobe -l | grep -e sfc -e onload
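Newer kmod-based versions of modprobe no longer accept -l, as far as we are aware. On such systems a find over the module tree gives a similar listing. A minimal sketch, assuming modules are installed under /lib/modules/<kernel version>; list_sfc_modules is a hypothetical name, and the *.ko* pattern is there to catch compressed module files as well.

```shell
# Hypothetical alternative to "modprobe -l | grep -e sfc -e onload":
# list Solarflare (sfc*) and Onload (onload*) module files installed for
# the running kernel. A first argument overrides the module directory.
list_sfc_modules() {
    local moddir=${1:-/lib/modules/$(uname -r)}
    find "$moddir" -type f \( -name 'sfc*.ko*' -o -name 'onload*.ko*' \) | sort
}
```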
To unload any loaded drivers:
onload_tool unload
To remove the installed files of a previous Onload:
onload_uninstall
To load the Solarflare net driver (if not already loaded):
modprobe sfc
Reload drivers following upgrade or changed settings:
onload_tool reload
4.13 Configuring the Network Interfaces
Network interfaces should be configured according to the Solarflare Server Adapter User's Guide.
When the interface(s) have been configured, the dmesg command will display output similar to the following (one entry for each Solarflare interface):
sfc 0000:13:00.0: INFO: eth2 Solarflare Communications NIC PCI(1924:803)
sfc 0000:13:00.1: INFO: eth3 Solarflare Communications NIC PCI(1924:803)
Driver Name       Description
sfc.ko            A Linux net driver that provides the interface between the Linux network stack and the Solarflare network adapter.
sfc_char.ko       Provides low level access to the Solarflare network adapter virtualized resources. Supports direct access to the network adapter for applications that use the ef_vi user-level interface for maximum performance.
sfc_tune.ko       Used to prevent the kernel from putting the CPUs into a sleep state during idle periods. Removed in openonload-201405.
sfc_aoe.ko        Solarflare ApplicationOnload™ Engine driver for the SFA6902F adapter.
sfc_affinity.ko   Used to direct traffic flow managed by a thread to the core the thread is running on; inserts packet filters that override the RSS behaviour.
sfc_resource.ko   Manages the virtualization resources of the adapter and shares the resources between other drivers.
onload.ko         The kernel component of Onload.
NOTE: IP address configuration should be carried out using normal OS tools, e.g. system-config-network (Red Hat) or yast (SUSE).
4.14 Installing Netperf
Refer to the Low Latency Quickstart Guide on page 4 for instructions to install Netperf and the Solarflare sfnettest applications.
4.15 How to run Onload
Once Onload has been installed there are different ways to accelerate applications. Exporting LD_PRELOAD will mean that all applications started in the same environment will be accelerated.
# export LD_PRELOAD=libonload.so
Prefixing the application command line with the onload command will accelerate the application.
# onload <app_name> [app_options]
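When the LD_PRELOAD method is used, it can be useful to confirm that libonload.so is actually present in the environment an application will inherit. A minimal sketch; onload_preloaded is a hypothetical helper that treats LD_PRELOAD entries as separated by spaces or colons (the Linux dynamic loader accepts both) and defaults to the current LD_PRELOAD value.

```shell
# Hypothetical helper: succeed if libonload.so (bare name or full path)
# appears in an LD_PRELOAD-style list. With no argument, the current
# environment's LD_PRELOAD is checked.
onload_preloaded() {
    local value=${1-$LD_PRELOAD} entry
    for entry in $(printf '%s' "$value" | tr ':' ' '); do
        case $entry in
            libonload.so|*/libonload.so) return 0 ;;
        esac
    done
    return 1
}
```

After export LD_PRELOAD=libonload.so, onload_preloaded succeeds; in a shell without it, the function returns 1 and applications started there will use the kernel stack.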
4.16 Testing the Onload Installation
The Low Latency Quickstart Guide on page 4 demonstrates testing of Onload with Netperf and the Solarflare sfnettest benchmark tools.
4.17 Apply an Onload Patch
Occasionally, the Solarflare Support Group may issue a software 'patch' which is applied to Onload to resolve a specific bug or investigate a specific issue. The following procedure describes how a patch should be applied to the installed OpenOnload software.
1 Copy the patch to a directory on the server where Onload is already installed.
2 Go to the Onload directory and apply the patch, e.g.
cd openonload-<version>
[openonload-<version>]$ patch -p1 < ~/<path>/<name of patch file>.patch
3 Uninstall the old Onload drivers
[openonload-<version>]$ onload_uninstall
4 Build and re-install the Onload drivers
[openonload-<version>]$ ./scripts/onload_install
[openonload-<version>]$ onload_tool reload
The following procedure describes how a patch should be applied to the installed EnterpriseOnload RPM. (This example patches EnterpriseOnload version 2.1.0.3.)
1 Copy the patch to the directory on the server where the EnterpriseOnload RPM package exists and carry out the following commands:
rpm2cpio enterpriseonload-2.1.0.3-1.src.rpm | cpio -id
tar -xzf enterpriseonload-2.1.0.3.tgz
cd enterpriseonload-2.1.0.3
patch -p1 < $PATCHNAME
2 This can now be installed directly from this directory:
./scripts/onload_install
3 Or it can be repackaged as a new RPM:
cd ..
tar czf enterpriseonload-2.1.0.3.tgz enterpriseonload-2.1.0.3
rpmbuild -ts enterpriseonload-2.1.0.3.tgz
4 The rpmbuild procedure will display a 'Wrote' line identifying the location of the built source RPM, e.g.
Wrote: /root/rpmbuild/SRPMS/enterpriseonload-2.1.0.3-1.el6.src.rpm
5 Rebuild binary RPMs from this source RPM and install them in the usual way (see EnterpriseOnload - Build and Install from SRPM on page 18), e.g.:
rpmbuild --rebuild /root/rpmbuild/SRPMS/enterpriseonload-2.1.0.3-1.el6.src.rpm
5 Tuning Onload
5.1 Introduction
This chapter documents the available tuning options for Onload, and the expected results. The options can be split into the following categories:
• System Tuning
• Standard Latency Tuning
• Advanced Tuning driven from analysis of the Onload stack using onload_stackdump.
Most of the Onload configuration parameters, including tuning parameters, are set by environment variables exported into the accelerated application's environment. Environment variables can be identified throughout this manual as they begin with EF_. All environment variables are described in Appendices A and B of this manual. Examples throughout this guide assume the use of the bash or sh shells; other shells may use different methods to export variables into the application's environment.
• System Tuning on page 27 describes tools and commands which can be used to tune the server and OS.
• Standard Tuning on page 29 describes how to perform standard heuristic tuning, which can help improve the application's performance. There are also benchmark examples running specific tests to demonstrate the improvements Onload can have on an application.
• Advanced Tuning on page 42 introduces advanced tuning options using onload_stackdump. There are worked examples to demonstrate how to achieve the application tuning goals.
NOTE: Onload tuning and kernel driver tuning are subject to different requirements. This section describes the steps to tune Onload. For details on how to tune the Solarflare kernel driver, refer to the 'Performance Tuning on Linux' section of the Solarflare Server Adapter User Guide.
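Because Onload settings are ordinary EF_ environment variables, the configuration an accelerated application will inherit can be reviewed before it is launched. A minimal sketch for bash or sh; show_onload_env is a hypothetical helper name.

```shell
# Hypothetical helper: list the EF_* variables exported in the current
# environment, sorted. Prints nothing (and still succeeds) if none are set.
show_onload_env() {
    env | grep '^EF_' | sort
    return 0
}
```

For example, after export EF_POLL_USEC=100000, show_onload_env prints EF_POLL_USEC=100000 along with any other EF_ settings already exported.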
5.2 System Tuning
This section details steps to tune the server and operating system for lowest latency.
Sysjitter
The Solarflare sysjitter utility measures the extent to which the system introduces jitter and so impacts on the user-level process. Sysjitter runs a thread on each processor core and, whenever the thread is de-scheduled from the core, measures for how long. Sysjitter produces summary statistics for each processor core. The sysjitter utility can be downloaded from www.openonload.org.
Sysjitter should be run on a system that is idle. When running on a system with cpusets enabled, run sysjitter as root.
Refer to the sysjitter README file for further information on building and running sysjitter.
The following is an example of the output from sysjitter on a single CPU socket server with 4 CPU cores.
./sysjitter --runtime 10 200 | column -t
core_i:           0           1           2           3
threshold(ns):    200         200         200         200
cpu_mhz:          3215        3215        3215        3215
runtime(ns):      9987653973  9987652245  9987652070  9987652027
runtime(s):       9.988       9.988       9.988       9.988
int_n:            10001       10130       10012       10001
int_n_per_sec:    1001.336    1014.252    1002.438    1001.336
int_min(ns):      1333        1247        1299        1446
int_median(ns):   1390        1330        1329        1470
int_mean(ns):     1424        1452        1452        1502
int_90(ns):       1437        1372        1357        1519
int_99(ns):       1619        5046        2392        1688
int_999(ns):      5065        22977       15604       3694
int_9999(ns):     31260       39017       184305      36419
int_99999(ns):    40613       45065       347097      49998
int_max(ns):      40613       45065       347097      49998
int_total(ns):    14244846    14719972    14541991    15031294
int_total(%):     0.143       0.147       0.146       0.150
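Two of the columns in the sample output are derived from others: int_total(%) is int_total(ns) as a percentage of runtime(ns), and int_n_per_sec is int_n divided by the runtime in seconds. The sketch below recomputes them with awk as a sanity check when reading sysjitter output; the function names are hypothetical.

```shell
# Hypothetical helpers recomputing derived sysjitter columns.

# int_total(%) = int_total(ns) / runtime(ns) * 100, to 3 decimal places.
int_total_pct() {
    awk -v t="$1" -v r="$2" 'BEGIN { printf "%.3f\n", t / r * 100 }'
}

# int_n_per_sec = int_n / runtime(s), with runtime(s) = runtime(ns) / 1e9.
int_n_per_sec() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.3f\n", n / (r / 1e9) }'
}
```

For core 0 of the sample, int_total_pct 14244846 9987653973 prints 0.143 and int_n_per_sec 10001 9987653973 prints 1001.336, matching the table.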
The table below describes the output fields of the sysjitter utility.
Field             Description
threshold (ns)    ignore any interrupts shorter than this period
cpu_mhz           CPU speed
runtime (ns)      runtime of sysjitter, in nanoseconds
runtime (s)       runtime of sysjitter, in seconds
int_n             number of interruptions to the user thread
int_n_per_sec     number of interruptions to the user thread per second
int_min (ns)      minimum time taken away from the user thread due to an interruption
int_median (ns)   median time taken away from the user thread due to an interruption
int_mean (ns)     mean time taken away from the user thread due to an interruption
int_90 (ns)       90th percentile value
int_99 (ns)       99th percentile value
int_999 (ns)      99.9th percentile value
int_9999 (ns)     99.99th percentile value
int_99999 (ns)    99.999th percentile value
int_max (ns)      maximum time taken away from the user thread
int_total (ns)    total time spent not processing the user thread
int_total (%)     int_total (ns) as a percentage of total runtime
Timer (TSC) Stability
Onload uses the Time Stamp Counter (TSC) CPU registers to measure changes in time with very low overhead. Modern CPUs support an "invariant TSC", which is synchronized across different CPUs and ticks at a constant rate regardless of the current CPU frequency and power saving mode. Onload relies on this to generate accurate time calculations when running across multiple CPUs. If run on a system which does not have an invariant TSC, Onload may calculate wildly inaccurate time values and this can, in extreme cases, lead to some connections becoming stuck.
Users should consult their server vendor documentation and OS documentation to ensure that servers can meet the invariant TSC requirement.
CPU Power Saving Mode
Modern processors utilize design features that enable a CPU core to drop into lower power states when instructed by the operating system that the CPU core is idle. When the OS schedules work on the idle CPU core (or when other CPU cores or devices need to access data currently in the idle CPU core's data cache) the CPU core is signaled to return to the fully-on power state. These changes in CPU core power states create additional network latency and jitter.
Solarflare therefore recommend that customers wishing to achieve the lowest latency and lowest jitter disable the "C1E power state" or "CPU power saving mode" within the machine's BIOS.
Disabling the CPU power saving modes is required if the application is to realize low latency with low jitter.
NOTE: To ensure C-states are not enabled, overriding the BIOS settings, it is recommended to add 'intel_idle.max_cstate=0 idle=poll' to the kernel command line in /boot/grub/grub.conf. These settings will produce consistent results and are particularly useful when benchmarking, but allowing some cores to enable Turbo modes while others are idle can produce the best latency in some servers. Users should refer to vendor documentation and experiment with C-states for different applications.
Customers should consult their system vendor and documentation for details concerning the disabling of C1E, C-states or CPU power saving states.
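The invariant TSC requirement and the C-state kernel parameters described in this section can both be checked from /proc before benchmarking. A minimal sketch; the function names are hypothetical, and it assumes that Linux advertises an invariant TSC through the constant_tsc and nonstop_tsc flags in /proc/cpuinfo, which the server vendor documentation should confirm for a given platform.

```shell
# Hypothetical pre-flight checks for the System Tuning items above.

# Succeeds if the first "flags" line of /proc/cpuinfo contains both
# constant_tsc and nonstop_tsc (assumed here to indicate an invariant TSC).
has_invariant_tsc() {
    local flags
    flags=$(grep -m1 '^flags' "${1:-/proc/cpuinfo}") || return 1
    case " $flags " in *" constant_tsc "*) ;; *) return 1 ;; esac
    case " $flags " in *" nonstop_tsc "*) return 0 ;; *) return 1 ;; esac
}

# Succeeds only if the running kernel was booted with both
# intel_idle.max_cstate=0 and idle=poll on its command line.
cstates_pinned() {
    local cmdline
    cmdline=$(cat "${1:-/proc/cmdline}") || return 1
    case " $cmdline " in *" intel_idle.max_cstate=0 "*) ;; *) return 1 ;; esac
    case " $cmdline " in *" idle=poll "*) return 0 ;; *) return 1 ;; esac
}
```

Both functions take an optional file argument so they can be tried on sample text; with no argument they read the live /proc files.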
5.3 Standard Tuning
This section details standard tuning steps for Onload.
Spinning (busy-wait)
Conventionally, when an application attempts to read from a socket and no data is available, the application will enter the OS kernel and block. When data becomes available, the network adapter will interrupt the CPU, allowing the kernel to reschedule the application to continue.
Blocking and interrupts are relatively expensive operations, and can adversely affect bandwidth, latency and CPU efficiency.
Onload can be configured to spin on the processor in user mode for up to a specified number of microseconds waiting for data from the network. If the spin period expires the processor will revert to conventional blocking behavior. Non-blocking sockets will always return immediately as these are unaffected by spinning.
Onload uses the EF_POLL_USEC environment variable to configure the length of the spin timeout.
export EF_POLL_USEC=100000
will set the busy-wait period to 100 milliseconds. See Meta Options on page 185 for more details.
Enabling spinning
To enable spinning in Onload:
Set EF_POLL_USEC. This causes Onload to spin on the processor for up to the specified number of microseconds before blocking. This setting is used in TCP and UDP and also in recv(), select(), pselect(), poll(), ppoll(), epoll_wait(), epoll_pwait() and onload_ordered_epoll_wait(). Use the following command:
export EF_POLL_USEC=100000
NOTE: If neither of the spinning options, EF_POLL_USEC or EF_SPIN_USEC, is set, Onload will resort to default interrupt-driven behavior because the EF_INT_DRIVEN environment variable is enabled by default.
Setting the EF_POLL_USEC variable also sets the following environment variables.
EF_SPIN_USEC=EF_POLL_USEC
EF_SELECT_SPIN=1
EF_EPOLL_SPIN=1
EF_POLL_SPIN=1
EF_PKT_WAIT_SPIN=1
EF_TCP_SEND_SPIN=1
EF_UDP_RECV_SPIN=1
EF_UDP_SEND_SPIN=1
EF_TCP_RECV_SPIN=1
EF_BUZZ_USEC=EF_POLL_USEC
EF_SOCK_LOCK_BUZZ=1
EF_STACK_LOCK_BUZZ=1
Turn off adaptive moderation and set interrupt moderation to a high value (microseconds) to avoid flooding the system with interrupts. Use the following command:
/sbin/ethtool -C eth2 rx-usecs 60 adaptive-rx off
See Meta Options on page 185 for more details.
When to Use Spinning
The optimal setting is dependent on the nature of the application. If an application is likely to find data soon after blocking, or the system does not have any other major tasks to perform, spinning can improve latency and bandwidth significantly.
In general, an application will benefit from spinning if the number of active threads is less than the number of available CPU cores. However, if the application has more active threads than available CPU cores, spinning can adversely affect application performance because a thread that is spinning (and therefore idle) takes CPU time away from another thread that could be doing work. If in doubt, it is advisable to try an application with a range of settings to discover the optimal value.
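The rule of thumb above, and the advice to try a range of settings, can be expressed as two small helpers. This is a sketch only: spinning_advisable encodes the threads-versus-cores comparison literally, and sweep_poll_usec runs a command once per EF_POLL_USEC value; both names are hypothetical, as is ./my_benchmark in the usage line below.

```shell
# Hypothetical helpers for deciding on and tuning spinning.

# Succeeds if the number of active threads ($1) is less than the number
# of available CPU cores ($2, defaulting to the output of nproc).
spinning_advisable() {
    local threads=$1 cores=${2:-$(nproc)}
    [ "$threads" -lt "$cores" ]
}

# Run a command once per EF_POLL_USEC value in $1 (space separated).
# The variable is set only in the environment of each individual run.
sweep_poll_usec() {
    local values=$1 v
    shift
    for v in $values; do
        EF_POLL_USEC=$v "$@"
    done
}
```

For example, sweep_poll_usec "1000 10000 100000" onload ./my_benchmark runs the benchmark three times under Onload with increasing spin timeouts, so the results can be compared.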
Polling vs. Interrupts
Interrupts are useful because they allow the CPU to do other useful work while
simultaneously waiting for asynchronous events (such as the reception of packets
from the network). The historical alternative to interrupts was for the CPU to
periodically poll for asynchronous events, and on single processor systems this could
result in greater latency than would be observed with interrupts. Historically it was
accepted that interrupts were “good for latency”.
On modern, multicore systems the tradeoffs are different. It is often possible to
dedicate an entire CPU core to the processing of a single source of asynchronous
events (such as network traffic). The CPU dedicated to processing network traffic
can be spinning (aka busy waiting), continuously polling for the arrival of packets.
When a packet arrives, the CPU can begin processing it almost immediately.
Contrast the polling model with an interrupt-driven model. Here the CPU is likely in its
“idle loop” when an interrupt occurs. The idle loop is interrupted and the interrupt
handler executes, typically marking a worker task as runnable. The OS scheduler will
then run and switch to the kernel thread that will process the incoming packet.
There is typically a subsequent task switch to a user-mode thread where the real
work of processing the event (e.g. acting on the packet payload) is performed.
Depending on the system, it can take on the order of a microsecond to respond to
an interrupt and switch to the appropriate thread context before beginning the real
work of processing the event. A dedicated CPU spinning in a polling loop can begin
processing the asynchronous event in a matter of nanoseconds.
It follows that spinning only becomes an option if a CPU core can be dedicated to
the asynchronous event. If there are more threads awaiting events than CPU cores
(i.e. if all CPU cores are oversubscribed to application worker threads), then spinning
is not a viable option (at least, not for all events). One thread will be spinning,
polling for the event, while another could be doing useful work. Spinning in such a
scenario can lead to (dramatically) increased latencies. But if a CPU core can be
dedicated to each thread that blocks waiting for network I/O, then spinning is the
best method to achieve the lowest possible latency.
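The trade-off above can be illustrated with plain POSIX calls. The following is a hypothetical sketch and does not show how Onload works internally. recv_spin_then_block() busy-waits on a non-blocking recv() for a bounded period, which is similar in spirit to a spin timeout such as EF_POLL_USEC. Only after that period does it sleep in poll(), which is the point where an interrupt-driven wakeup would be needed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in microseconds. */
static uint64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical helper: busy-wait on a non-blocking recv() for up to
 * spin_usec, then block in poll() until the kernel wakes the thread.
 * Sets *spun to 1 if data arrived while spinning, 0 if the thread slept.
 * Returns the number of bytes received, or -1 on error. */
ssize_t recv_spin_then_block(int fd, void *buf, size_t len,
                             uint64_t spin_usec, int *spun)
{
    uint64_t deadline = now_usec() + spin_usec;

    do {                                   /* polling phase: never sleeps */
        ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0) {
            *spun = 1;
            return n;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
    } while (now_usec() < deadline);

    *spun = 0;                             /* interrupt-driven phase */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, -1) < 0)
        return -1;
    return recv(fd, buf, len, 0);
}
```

Spinning only pays off when the calling thread has a core to itself. On an oversubscribed system, the spin loop simply takes CPU time away from other runnable threads.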
5.4 Onload Deployment on NUMA Systems
When deployed on NUMA systems, application throughput and latency
performance can be adversely affected unless due consideration is given to the
selection of the NUMA node, the allocation of cache memory and the affinitization
of drivers, processes and interrupts.
For best performance the accelerated application should always run on the NUMA
node nearest to the Solarflare adapter. The correct allocation of memory is
particularly important to ensure that packet buffers are allocated on the correct
NUMA node, to avoid unnecessary increases in QPI traffic and to avoid dropped
packets.

Useful commands
• To identify NUMA nodes, socket memory and CPU core allocation:
# numactl -H
• To identify the NUMA node local to a Solarflare adapter:
# cat /sys/class/net/<interface>/device/numa_node
• To identify memory allocation and use on a particular NUMA node:
# cat /sys/devices/system/node/node<N>/numastat
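An application can also do the sysfs lookup from C, for example to choose its core at start-up. The following is a minimal sketch. The helper names read_sysfs_int() and interface_numa_node() are hypothetical, and the only assumption is the sysfs path shown above. A value of -1 means the kernel reports no NUMA affinity for the device.

```c
#include <stdio.h>

/* Read a single integer from a sysfs attribute.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_sysfs_int(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Look up the NUMA node of a network interface via
 * /sys/class/net/<interface>/device/numa_node. */
int interface_numa_node(const char *ifname, long *node)
{
    char path[256];
    int n = snprintf(path, sizeof path,
                     "/sys/class/net/%s/device/numa_node", ifname);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* interface name too long */
    return read_sysfs_int(path, node);
}
```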
Driver Loading
When loading, the Onload module will create a variety of common data structures.
To ensure that these are created on the NUMA node nearest to the Solarflare
adapter, onload_tool reload should be affinitized to a core on the correct NUMA
node.
# numactl --cpunodebind=1 onload_tool reload
Memory Policy
To guarantee that memory is appropriately allocated, and to ensure that memory
allocations do not fail, a memory policy that binds to a specific NUMA node should
be selected. When no policy is specified the system will generally use a default
policy allocating memory on the node on which a process is executing.
Application Processing
The majority of processing by Onload occurs in the context of the Onloaded
application. Various methods can be used to affinitize the Onloaded process:
numactl, taskset or cpusets, or the CPU affinity can be set programmatically.
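For the programmatic option, Linux provides sched_setaffinity(2), and pthread_setaffinity_np() does the same for other threads. The sketch below uses hypothetical helper names. It pins the calling thread to one CPU. In practice the CPU would be chosen from the NUMA node local to the Solarflare adapter, for example using the numactl -H output.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to a single CPU core.
 * Returns 0 on success, -1 on failure (errno set by sched_setaffinity). */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);   /* 0 = calling thread */
}

/* Return the lowest-numbered CPU the calling thread may currently run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```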
Workqueues
An Onloaded application will create two shared workqueues and one per-stack
workqueue. The implementation of the workqueue differs between Linux kernels,
and so does the method used to affinitize workqueues.
On more recent Linux kernels (3.10+) the Onload workqueues will be initially
affinitized to the node on which they are created. Therefore if the driver load is
affinitized and the Onloaded application is affinitized to the correct node, Onload
stacks will be created on the correct node and no further work will be
required.
Specifying a cpumask via sysfs for a workqueue is NOT recommended as this can
break ordering requirements.
On older Linux kernels dedicated workqueue threads are created, and these can be
affinitized using taskset or cpusets. Identify the two workqueues shared by all
Onload stacks:

onload-wqueue
sfc_vi
Identify the per-stack workqueue, which has a name in the format
onload-wq:<stack id> (e.g. onload-wq:1 for stack 1).
Use the onload_stackdump command to identify Onload stacks and the PID of the
process that created the stack:
# onload_stackdump
#stack-id stack-name pids
0 - 106913
Use the Linux pidof command to identify the PIDs for Onload workqueues:
# pidof onload-wq:0 sfc_vi onload-wqueue
106930 105409 105431
It is recommended that the shared workqueues are affinitized immediately after the
driver is loaded, and the per-stack queue immediately after stack creation.
Interrupts
When Onload is being used in an interrupt-driven mode (see Interrupt Handling -
Using Onload on page 38), interrupts should be affinitized to the same NUMA node
that runs the Onload application, but not to the same CPU core as the application.
When Onload is spinning (busy-wait) there will be few (if any) interrupts, so where
these are handled is not a real concern.
Verification
The onload_stackdump lots command is used to verify that allocations occur on the
required NUMA node:
# onload_stackdump lots | grep numa
numa nodes: creation=0 load=0
numa node masks: packet alloc=1 sock alloc=1 interrupt=1
The CPU affinity of individual Onloaded threads can be identified with the following
command:
# onload_stackdump threads
5.5 Interrupt Handling - Kernel Driver
Default Behavior
Using the value identified from the rss_cpus option, the Solarflare NET driver will
create a number of receive (and transmit) queues (termed an “RSS channel”) for
each physical interface. By default the driver creates one RSS channel per CPU core
detected in the server, up to a maximum of 32.

The rss_cpus sfc driver module option can be set in a user-created file <sfc.conf> in
the /etc/modprobe.d directory. The driver must be reloaded before the option
becomes effective. For example, rss_cpus can be set to an integer value:
options sfc rss_cpus=4
In the above example 4 receive queues are created per Solarflare interface. The
default value is rss_cpus=cores. Other available options are rss_cpus=<int>,
rss_cpus=hyperthreads and rss_cpus=packages.
NOTE: If the sfc driver module parameter ‘rss_numa_local’ is enabled, RSS will be
restricted to use cores/hyperthreads on the NUMA node local to the Solarflare
adapter.
Affinitizing RSS Channels to CPUs
As described in the previous section, the default behavior of the Solarflare network
driver is to create one RSS channel per CPU core. At load time the driver affinitizes
the interrupt associated with each RSS channel to a separate CPU core so the
interrupt load is evenly distributed over the available CPU cores.
NOTE: These initial interrupt affinities will be disrupted and changed if the Linux
IRQ balancer daemon is running. To stop the IRQ balancer use the following
command:
# service irqbalance stop
In the following example, a server with 2 CPU sockets and 8 cores per socket
(hyperthreading disabled) has 2 Solarflare dual-port adapters installed, giving a
total of 4 network interfaces.
If we set rss_cpus=4, each interface will create 4 RSS channels. The driver takes
care to spread the affinitized interrupts evenly over the CPU topology, i.e. evenly
between the two CPU sockets and evenly over shared L2/L3 caches.
The driver also attempts to spread the interrupt load of the multiple network
interfaces by using different CPU cores for different interfaces:
Table 3: Example RSS Channel Mapping
Interface  Num of rx queues  Map to cores
1          4                 0,1,2,3
2          4                 4,5,6,7
3          4                 8,9,10,11
4          4                 12,13,14,15

With 4 receive queues created per interface, on this machine the first network
interface maps to the four lowest-numbered CPU cores, i.e. two cores from each
CPU socket. The next network interface uses the next four CPUs, and so on until
each CPU core is loaded with a single RSS channel, as illustrated in
Figure 3 below.
Figure 3: Mapping RSS Channels to CPU cores.
To identify the mapping of receive queues to CPU cores, use the following
command:
# cat /proc/interrupts | grep eth4
106: 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-0
107: 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-1
108: 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-2
109: 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-3
Note that each receive queue has an assigned IRQ. Receive queue eth4-0 is served
by IRQ 106, eth4-1 by IRQ 107, etc.
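Outside the Solarflare tools, the standard Linux interface for steering an IRQ is /proc/irq/<irq>/smp_affinity, which takes a hexadecimal CPU mask. The hypothetical helpers below show two things. First, they show how the mask for a single CPU is formed: the kernel groups the mask into comma-separated 32-bit words. Second, they show how the mask could be written for an IRQ such as 106 above. Writing the mask requires root, and a running irqbalance may overwrite the setting. Kernels that provide /proc/irq/<irq>/smp_affinity_list also accept a plain CPU number there.

```c
#include <stdio.h>
#include <string.h>

/* Format the smp_affinity mask that selects a single CPU. The mask is
 * hexadecimal 32-bit words separated by commas, most significant first:
 * CPU 0 -> "1", CPU 5 -> "20", CPU 35 -> "8,00000000".
 * Returns 0 on success, -1 if buf is too small. */
int cpu_to_smp_affinity(unsigned cpu, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%x", 1u << (cpu % 32));
    if (n < 0 || (size_t)n >= len)
        return -1;
    for (unsigned word = 0; word < cpu / 32; word++) {
        size_t used = strlen(buf);
        n = snprintf(buf + used, len - used, ",00000000");
        if (n < 0 || (size_t)n >= len - used)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: steer one IRQ (e.g. 106 for eth4-0) to one CPU.
 * Needs root. Returns 0 on success, -1 on failure. */
int set_irq_affinity(unsigned irq, unsigned cpu)
{
    char path[64], mask[128];
    if (cpu_to_smp_affinity(cpu, mask, sizeof mask) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", mask) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;   /* kernel rejects bad masks */
}
```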
sfcaffinity_config
The OpenOnload distribution also includes the sfcaffinity_config script, which
can also be used to affinitize RSS channel interrupts. sfcaffinity_config has a
number of command line options, but a common way of running it is with the auto
command:
# sfcaffinity_config auto
Auto instructs sfcaffinity_config to set interrupt affinities to evenly spread the
RSS channels over the available CPU cores. Using the above scenario as an example,
where rss_cpus has been set to 4, the command will affinitize the interrupt
associated with each receive queue evenly over the CPU topology, in this case the
first four CPU cores.

sfcaffinity_config: INFO: eth4: Spreading 4 interrupts evenly over 2 shared caches
sfcaffinity_config: INFO: eth4: bind rxq 0 (irq 106) to core 1
sfcaffinity_config: INFO: eth4: bind rxq 1 (irq 107) to core 0
sfcaffinity_config: INFO: eth4: bind rxq 2 (irq 108) to core 3
sfcaffinity_config: INFO: eth4: bind rxq 3 (irq 109) to core 2
sfcaffinity_config: INFO: eth4: configure sfc_affinity n_rxqs=4
cpu_to_rxq=1,0,3,2,1,0,3,2,1,0,3,2,1,0,3,2
Figure 4: Mapping with sfcaffinity_config auto
In this example, after running the sfcaffinity_config auto command, interrupts
for the 4 receive queues from the 4 interfaces are now all directed to the same 4
cores (0, 1, 2 and 3), as illustrated by Figure 4.
NOTE: Running the sfcaffinity_config auto command also disables the kernel
IRQ balance service to prevent interrupts being redirected by the kernel to other
cores.
Restrict RSS to local NUMA node
The sfc driver module parameter rss_numa_local will restrict RSS to only use CPU
cores or hyperthreads (if hyperthreading is enabled) on the NUMA node local to the
Solarflare adapter.
rss_numa_local does NOT restrict the number of RSS channels created by the
driver. Instead, it restricts the RSS spreading so that only the channels on the
local NUMA node will receive kernel driver traffic.
In the default case (where rss_cpus=cores), one RSS channel is created per CPU
core. However, the driver adjusts the RSS settings such that only the RSS channels
affinitized to the local CPU socket receive traffic. It therefore has no effect on the
Onload allocation and use of receive queues and interrupts.

Figure 5 below identifies the receive queue interrupt spread when rss_cpus=4
and rss_numa_local=1. In this machine adapter #1 is attached to the PCIe bus on
socket #0, and adapter #2 is attached to the PCIe bus on socket #1.
Figure 5: Mapping with rss_numa_local
Restrict RSS Receive Queues
The ethtool -X command can also be used to restrict the receive queues accessible
by RSS. In the following example rss_cpus=4 and ethtool -x identifies the 4
receive queues per interface:
# ethtool -x eth4
RX flow hash indirection table for eth4 with 4 RX ring(s):
0: 0 1 2 3 0 1 2 3
8: 0 1 2 3 0 1 2 3
16: 0 1 2 3 0 1 2 3
24: 0 1 2 3 0 1 2 3
32: 0 1 2 3 0 1 2 3
40: 0 1 2 3 0 1 2 3
48: 0 1 2 3 0 1 2 3
56: 0 1 2 3 0 1 2 3
64: 0 1 2 3 0 1 2 3
72: 0 1 2 3 0 1 2 3
80: 0 1 2 3 0 1 2 3
88: 0 1 2 3 0 1 2 3
96: 0 1 2 3 0 1 2 3
104: 0 1 2 3 0 1 2 3
112: 0 1 2 3 0 1 2 3
120: 0 1 2 3 0 1 2 3
To restrict RSS to spread receive flows evenly over the first 2 receive queues, use
ethtool -X:
# ethtool -X eth4 equal 2

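RX flow hash indirection table for eth4 with 4 RX ring(s):
0: 0 1 0 1 0 1 0 1
8: 0 1 0 1 0 1 0 1
16: 0 1 0 1 0 1 0 1
24: 0 1 0 1 0 1 0 1
32: 0 1 0 1 0 1 0 1
40: 0 1 0 1 0 1 0 1
48: 0 1 0 1 0 1 0 1
56: 0 1 0 1 0 1 0 1
64: 0 1 0 1 0 1 0 1
72: 0 1 0 1 0 1 0 1
80: 0 1 0 1 0 1 0 1
88: 0 1 0 1 0 1 0 1
96: 0 1 0 1 0 1 0 1
104: 0 1 0 1 0 1 0 1
112: 0 1 0 1 0 1 0 1
120: 0 1 0 1 0 1 0 1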
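The indirection tables above follow a simple rule: entry i names receive queue i mod N. The adapter then uses the flow hash to pick a table entry. The sketch below models that rule, and its function names are hypothetical. The hash value is taken as an input rather than computed, because the adapter's actual hash function is not modeled. The table size of 128 matches the ethtool -x output shown.

```c
#include <stdint.h>

#define RSS_TABLE_SIZE 128   /* entries, as in the ethtool -x output above */

/* Fill the table so entry i maps to receive queue i % n_queues, matching
 * the layout shown after "ethtool -X eth4 equal 2". */
void rss_fill_equal(uint8_t table[RSS_TABLE_SIZE], unsigned n_queues)
{
    for (unsigned i = 0; i < RSS_TABLE_SIZE; i++)
        table[i] = (uint8_t)(i % n_queues);
}

/* Select the receive queue for a flow: the hash indexes the table and the
 * table entry names the queue. */
unsigned rss_select_queue(const uint8_t table[RSS_TABLE_SIZE], uint32_t hash)
{
    return table[hash % RSS_TABLE_SIZE];
}
```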
Interrupt Handling - Using Onload
A thread accelerated by Onload will either be interrupt driven or it will be spinning.
When the thread is interrupt driven, a thread which calls into Onload to read from
its receive queue, and for which there are no received packets to be processed, will
‘sleep’ until one or more interrupts from the kernel inform it that there is more work to do.
When a thread is spinning, it is busy waiting on its receive queue until packets are
received (in which case the packets are retrieved and the thread returns
immediately to the receive queue), or until the spin period expires. If the spin period
expires the thread will relinquish the CPU core and ‘sleep’ until an interrupt from the
kernel informs it that further packets have been received. If the spin period is set
greater than the packet inter-arrival time, the spinning thread can continue to spin
and retrieve packets without interrupts occurring. Even when spinning, an
application might experience a few interrupts.
As a general rule, when spinning, only a few interrupts will be expected, so
performance is typically insensitive to which CPU core processes the interrupts.
However, when Onload is interrupt driven, performance can be sensitive to where
the interrupts are handled, and will typically benefit from handling them on the same CPU socket
as the application thread handling the socket I/O. To control the CPU core processing
Onload interrupts, use the EF_IRQ_CORE or EF_IRQ_CHANNEL environment variables.
Using EF_PACKET_BUFFER_MODE 0 or 2, an Onload stack will use one or more of the
interrupts assigned to the NET driver receive queues, where the CPU core handling
the interrupts is defined by the RSS mapping of receive queues to CPU cores.

Using EF_PACKET_BUFFER_MODE 1 or 3, the Onload stack creates dedicated
interrupts. See Table 4 below for details.
Another environment variable, EF_IRQ_CHANNEL, can be used to select the NET
driver receive channel that will be used to handle interrupts for an Onload stack.
Onload interrupts are handled by the same core assigned to the NET driver receive
channel.
When Onload is using a NET driver RSS channel as its source of interrupts, it can be
useful to dedicate this channel to Onload and prevent the driver from using this
channel for RSS traffic. See the sections “Restrict RSS Receive Queues” and
“Restrict RSS to local NUMA node” above for methods of achieving this.
Table 4: Selecting Onload interrupts
EF_PACKET_BUFFER_MODE   EF_IRQ_CORE
0 (default) or 2        Onload interrupts are handled via the NET driver
                        receive channel interrupts.
                        It is only possible for interrupts to be handled on
                        the requested core if a NET driver interrupt is
                        assigned to the selected core.
1 or 3                  Onload creates dedicated interrupts for each
                        Onload stack, and an interrupt is assigned to the
                        requested core.
5.6 Performance Jitter
On any system, reducing or eliminating jitter is key to gaining optimum performance.
However, the causes of jitter leading to poor performance can be difficult to define
and difficult to remedy. The following section identifies some key points that should
be considered.
• A first step towards reducing jitter should be to consider the configuration
settings specified in the Low Latency Quickstart Guide on page 4. This includes
disabling the irqbalance service, interrupt moderation settings and
measures to prevent CPU cores from switching to power saving modes.
• Use isolcpus to isolate the CPU cores that the application (or at least the critical
threads of the application) will use, and to prevent OS housekeeping tasks and
other non-critical tasks from running on these cores.
• Run an application thread on one core and the interrupts for that
thread on a separate core, but on the same physical CPU package. Even when
spinning, interrupts may still occur, for example if the application fails to call
into the Onload stack for extended periods because it is busy doing other work.
• Ideally each spinning thread will be allocated a separate core so that, in the
event that it blocks or is de-scheduled, it will not prevent other important
threads from doing work. A common cause of jitter is more than one spinning
thread sharing the same CPU core. Jitter spikes may indicate that one thread is
being held off the CPU core by another thread.
• When EF_STACK_LOCK_BUZZ=1, threads will spin for the EF_BUZZ_USEC
period while they wait to acquire the stack lock. Lock buzzing can lead to
unfairness between threads competing for a lock, and so result in resource
starvation for one of them. Occurrences of this are counted in the ‘stack_lock_buzz’
counter. EF_STACK_LOCK_BUZZ is enabled by default when EF_POLL_USEC
(spinning) is enabled.
• If a multi-threaded application is doing lots of socket operations, stack lock
contention will lead to send/receive performance jitter. In such cases performance
can be improved by giving each contending thread its own stack. This
can be managed with EF_STACK_PER_THREAD, which creates a separate Onload
stack for the sockets created by each thread. If separate stacks are not an
option then it may be beneficial to reduce the EF_BUZZ_USEC period or to
disable stack lock buzzing altogether.
• It is always important that threads that need to communicate with each other
run on the same CPU package so that these threads can share a
memory cache.
• Jitter may also be introduced when some sockets are accelerated and others
are not. Onload will ensure that accelerated sockets are given priority over
non-accelerated sockets. The resulting delay will only be in the region of a few
microseconds, not milliseconds, and the penalty will always fall on the
non-accelerated sockets. The environment variables EF_POLL_FAST_USEC and
EF_POLL_NONBLOCK_FAST_USEC can be configured to manage the extent of
priority of accelerated sockets over non-accelerated sockets.
• If traffic is sparse, spinning will deliver the same latency benefits, but the user
should ensure that the spin timeout period, configured using the
EF_POLL_USEC variable, is sufficiently long to ensure the thread is still spinning
when traffic is received.
• When applications only need to send and receive occasionally, it may be
beneficial to implement a keepalive heartbeat mechanism between peers.
This has the effect of retaining the process data in the CPU memory cache.
Because of these cache effects, calling send or receive after a delay can make the call
take measurably longer than when it is called in a tight loop.
• On some servers, BIOS settings such as power and utilization monitoring can
cause unnecessary jitter by performing monitoring tasks on all CPU cores. The
user should check the BIOS and decide if periodic tasks (and the related SMIs)
can be disabled.
• The Solarflare sysjitter utility can be used to identify and measure jitter on all
cores of an idle system. Refer to Sysjitter on page 27 for details.
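The keepalive suggestion above can be sketched as a check made from the application's main loop. This is a hypothetical helper, not an Onload API. The one-byte payload and the interval are assumptions, and the peer's protocol must be able to recognise and discard heartbeat messages. The caller supplies the current time (for example from CLOCK_MONOTONIC) and should also update *last_send_usec whenever it sends real traffic.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical keepalive helper. If nothing has been sent for at least
 * interval_usec, send a one-byte heartbeat so the send path stays warm
 * in the CPU cache. *last_send_usec is the time of the last transmission
 * and is updated when a heartbeat is sent.
 * Returns 1 if a heartbeat was sent, 0 if none was due, -1 on error. */
int maybe_send_heartbeat(int fd, uint64_t now_usec, uint64_t *last_send_usec,
                         uint64_t interval_usec)
{
    static const char heartbeat = 0;    /* assumed heartbeat payload */

    if (now_usec - *last_send_usec < interval_usec)
        return 0;
    if (send(fd, &heartbeat, 1, MSG_DONTWAIT) != 1)
        return -1;
    *last_send_usec = now_usec;
    return 1;
}
```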

Using Onload Tuning Profiles
Environment variables set in the application user-space can be used to configure and
control aspects of the accelerated application’s performance. These variables can be
exported using the Linux export command, e.g.
export EF_POLL_USEC=100000
Onload supports tuning profile script files, which are used to group environment
variables within a single file to be called from the Onload command line.
The latency profile sets EF_POLL_USEC=100000, setting the busy-wait spin
timeout to 100 milliseconds. The profile also disables TCP fast start for new or idle
connections, where additional TCP ACKs would add latency to the receive path. To use
the profile, include it on the onload command line, e.g.
onload --profile=latency netperf -H onload2-sfc -t TCP_RR
Following Onload installation, profiles provided by Solarflare are located in the
following directory (this directory will be deleted by the onload_uninstall
command):
/usr/libexec/onload/profiles
User-defined environment variables can be written to a user-defined profile script
file (having a .opf extension) and stored in any directory on the server. The full path
to the file should then be specified on the onload command line, e.g.
onload --profile=/tmp/myprofile.opf netperf -H onload2-sfc -t TCP_RR
As an example, the latency profile provided by the Onload distribution is shown
below:
# Onload low latency profile.
# Enable polling/spinning. When the application makes a blocking call
# such as recv() or poll(), this causes Onload to busy wait for up to 100ms
# before blocking.
onload_set EF_POLL_USEC=100000
# Disable FASTSTART when connection is new or has been idle for a while.
# The additional acks it causes add latency on the receive path.
onload_set EF_TCP_FASTSTART_INIT 0
onload_set EF_TCP_FASTSTART_IDLE 0
For a complete list of environment variables refer to Parameter Reference on
page 146.
Benchmark Testing
Benchmark procedures using Onload, netperf and sfnt_pingpong are described in
the Low Latency Quickstart Guide on page 4.

5.7 Advanced Tuning
Advanced tuning requires closer examination of the application performance. The
application should be tuned to achieve the following objectives:
• To have as much processing at user-level as possible.
• To have as few interrupts as possible.
• To eliminate drops.
• To minimize lock contention.
Onload includes a diagnostic application called onload_stackdump, which can be
used to monitor Onload performance and to set tuning options.
The following sections demonstrate the use of onload_stackdump to examine
aspects of the system performance and set environment variables to achieve the
tuning objectives.
For further examples and use of onload_stackdump refer to onload_stackdump on
page 219.
Monitoring Using onload_stackdump
To use onload_stackdump, enter the following command:
onload_stackdump [command]
To list available commands and view documentation for onload_stackdump, enter
the following commands:
onload_stackdump doc
onload_stackdump -h
A specific stack number can also be provided on the onload_stackdump command
line.
Worked Examples
Prefault Packet Buffers
The Onload environment variable EF_PREFAULT_PACKETS will cause the user
process to ‘touch’ the specified number of packet buffers when an Onload stack is
created. This means that memory for these packet buffers is pre-allocated and
memory-mapped into the user-process address space.
Preallocation is advised to prevent latency jitter caused by the allocation and
memory-mapping overheads.
When deciding how many packets to prefault, the user should look at the alloc value
when the onload_stackdump packets command is run. The alloc value is a high
watermark identifying the maximum number of packets being used by the stack
at any single point. Setting EF_PREFAULT_PACKETS to at least this value is
recommended.

onload_stackdump packets
$ onload_stackdump packets
ci_netif_pkt_dump_all: id=6
pkt_bufs: size=2048 max=32768 alloc=576 free=50 async=0
pkt_bufs: rx=525 rx_ring=522 rx_queued=3
pkt_bufs: tx=1 tx_ring=0 tx_oflow=0 tx_other=1
509: 0x8000 Rx
1: 0x4000 Nonb
n_zero_refs=66 n_freepkts=50 estimated_free_nonb=16
free_nonb=0 nonb_pkt_pool=a39ffff
NOTE: It is not possible to prefault a number of packets exceeding the current value
of EF_MAX_PACKETS, and attempts to do this will result in a warning similar to the
following:
ci_netif_pkt_prefault_reserve: Prefaulted only 63488 of 64000
The warning message is harmless. It informs the user that not all the requested
packets could be prefaulted (because some have already been allocated to receive
rings).
When deciding how many packets to prefault, the user should consider that Onload
must allocate, from the EF_MAX_PACKETS pool, a number of packet buffers per receive
ring per interface. Once these have been allocated, any remainder can be
prefaulted.
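The arithmetic above can be written down as a small helper. This is a hypothetical sketch with two stated assumptions. First, it takes the per-ring fill level as an input rather than knowing how Onload sizes it. Second, it assumes one receive ring per interface. It returns the minimum recommended prefault count, which is the observed alloc high-water mark, capped at what remains in the EF_MAX_PACKETS pool. With an assumed fill of 512 buffers on one ring, a 64000-buffer pool leaves 63488. That result is consistent with the warning shown earlier.

```c
/* Hypothetical helper: minimum recommended EF_PREFAULT_PACKETS value.
 *   max_packets      - the stack's EF_MAX_PACKETS pool size
 *   n_rx_rings       - receive rings the stack fills (assumed one per interface)
 *   rx_fill          - buffers reserved per ring (an assumed input; it depends
 *                      on settings such as EF_RXQ_SIZE)
 *   alloc_high_water - "alloc" value from onload_stackdump packets
 * Returns the high-water mark, capped at what is left in the pool after
 * the receive rings have been filled. */
long prefault_target(long max_packets, long n_rx_rings, long rx_fill,
                     long alloc_high_water)
{
    long available = max_packets - n_rx_rings * rx_fill;

    if (available < 0)
        available = 0;
    return alloc_high_water < available ? alloc_high_water : available;
}
```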
Users who need to prefault the maximum possible number of available packets
can set EF_PREFAULT_PACKETS and EF_MAX_PACKETS to the same value and
ignore the warnings from Onload:
EF_PREFAULT_PACKETS=64000 EF_MAX_PACKETS=64000 onload <my application> ...
Refer to Appendix A on page 146 for details of the EF_PREFAULT_PACKETS variable.
CAUTION: Prefaulting packet buffers for one stack will reduce the number of
buffers available for others. Users should consider that overallocation to
one stack might create spare (redundant) packet buffer capacity that could be better
allocated elsewhere.
Processing at User-Level
Many applications can achieve better performance when most processing occurs at
user-level rather than kernel-level. To identify how an application is performing,
enter the following command:
onload_stackdump lots | grep polls

$ onload_stackdump lots | grep poll
k_polls: 673
u_polls: 41
The output identifies many more k_polls than u_polls, indicating that the
stack is operating mainly at kernel-level and may not be achieving optimal
performance.
Solution
Terminate the application and set the EF_POLL_USEC parameter to 100000. Restart
the application and re-run onload_stackdump:
export EF_POLL_USEC=100000
onload_stackdump lots | grep polls
$ onload_stackdump lots | grep polls
k_polls: 673
u_polls: 1289
The output identifies that the number of u_polls is far greater than the
number of k_polls, indicating that the stack is now operating mainly at
user-level.
Counter                   Description
k_polls                   Number of times the socket event queue was polled from the kernel.
u_polls                   Number of times the socket event queue was polled from user space.
periodic_polls            Number of times a periodic timer has polled for events.
interrupt_polls           Number of times an interrupt polled for network events.
deferred_polls            Number of times poll has been deferred to the stack lock holder.
timeout_interrupt_polls   Number of times timeout interrupts polled for network events.
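If the polls check is scripted, the counters can be extracted from captured onload_stackdump lots text. The following is a minimal sketch that assumes the "name: value" line format shown above; the function names are hypothetical. Matching is anchored to the start of a line, so interrupt_polls is not confused with timeout_interrupt_polls.

```c
#include <stdio.h>
#include <string.h>

/* Find a line "name: value" in stackdump text and store the value.
 * Returns 0 if found and parsed, -1 otherwise. */
int stackdump_counter(const char *text, const char *name, long *value)
{
    size_t len = strlen(name);

    for (const char *line = text; line != NULL && *line != '\0'; ) {
        while (*line == ' ' || *line == '\t')       /* skip indentation */
            line++;
        if (strncmp(line, name, len) == 0 && line[len] == ':')
            return sscanf(line + len + 1, "%ld", value) == 1 ? 0 : -1;
        line = strchr(line, '\n');                  /* advance to next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Returns 1 if user-level polls outnumber kernel polls, 0 if not,
 * or -1 if either counter is missing. */
int mostly_user_level(const char *text)
{
    long k, u;
    if (stackdump_counter(text, "k_polls", &k) != 0 ||
        stackdump_counter(text, "u_polls", &u) != 0)
        return -1;
    return u > k;
}
```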

As Few Interrupts as Possible
A tuned application will reach a balance between the number/rate of interrupts
processed and the amount of real work that gets done, e.g. processing multiple packets
per interrupt rather than one. Even spinning applications can benefit from the
occasional interrupt, e.g. when a spinning thread has been de-scheduled from a
CPU, an interrupt will prod the thread back into action when further work has to be
done.
# onload_stackdump lots | grep ^interrupt
Counter                   Description
Interrupts                Total number of interrupts received for the stack.
Interrupt polls           Number of times the stack is polled, invoked by an interrupt.
Interrupt evs             Number of events processed when invoked by an interrupt.
Interrupt wakes           Number of times the application is woken by an interrupt.
Interrupt primes          Number of times interrupts are re-enabled (after spinning or polling the stack).
Interrupt no events       Number of stack polls for which there was no event to recover.
Interrupt lock contends   The application polled the stack and has the lock before an interrupt fired.
Interrupt budget limited  Number of times, when handling a poll in an interrupt, the poll was stopped
                          when the NAPI budget was reached. Any remaining events are then processed
                          on the stack workqueue.
Solution
If an application is observed taking lots of interrupts, it may be beneficial to increase
the spin time with the EF_POLL_USEC variable, or to set a high interrupt
moderation value for the net driver using ethtool.
The number of interrupts on the system can also be identified from
/proc/interrupts.
Eliminating Drops
The performance of networks is impacted by any packet loss. This is especially
pronounced for reliable data transfer protocols that are built on top of unicast or
multicast UDP sockets.
First check to see if packets have been dropped by the network adapter before
reaching the Onload stack. Use ethtool to collect stats directly from the network
adapter:
# ethtool -S enps0f0 | grep drop
rx_noskb_drops: 0
port_rx_nodesc_drops: 0

port_rx_dp_di_dropped_packets: 681618610
Counter                          Description
rx_noskb_drops                   Number of packets dropped when there are no further socket buffers to use.
port_rx_nodesc_drops             Number of packets dropped when there are no further descriptors in the
                                 rx ring buffer to receive them.
port_rx_dp_di_dropped_packets    Number of packets dropped because filters indicate the packets should be
                                 dropped. This can happen when packets don’t match any filter or the
                                 matched filter indicates the packet should be dropped.
Solution
If packet loss is observed at the network level due to a lack of receive buffering, try
increasing the receive descriptor queue size via EF_RXQ_SIZE. If packet
drops are observed at the socket level, consult the application documentation. It
may also be worth experimenting with socket buffer sizes (see EF_UDP_RCVBUF).
Setting the EF_EVS_PER_POLL variable to a higher value may also improve efficiency.
Refer to Appendix A for a description of this variable.
Minimizing Lock Contention
Lock contention can greatly affect performance. When threads share a stack, a
thread holding the stack lock will prevent another thread from doing useful work.
Applications with fewer threads may be able to create a stack per thread (see
EF_STACK_PER_THREAD and Stacks API on page 193).
Use onload_stackdump to identify instances of lock contention:
# onload_stackdump lots | egrep "(lock_)|(sleep)"
Counter                           Description
periodic_lock_contends            Number of times the periodic timer could not get the stack lock.
interrupt_lock_contends           Number of times the user level got the stack lock.
timeout_interrupt_lock_contends   Number of times timeout interrupts could not lock the stack.
sock_sleeps                       Number of times a thread has blocked on a single socket.
sock_sleep_primes                 Number of times select/poll/epoll enabled interrupts.
unlock_slow                       Number of times the slow path was taken to unlock the stack lock.
unlock_slow_pkt_waiter            Number of times packet memory shortage provoked the unlock slow path.
unlock_slow_socket_list           Number of times the deferred socket list provoked the unlock slow path.
unlock_slow_need_prime            Number of times interrupt priming provoked the unlock slow path.
unlock_slow_wake                  Number of times the unlock slow path was taken to wake threads.
unlock_slow_swf_update            Number of times the unlock slow path was taken to update sw filters.
unlock_slow_close                 Number of times the unlock slow path was taken to close sockets/pipes.
unlock_slow_syscall               Number of times a syscall was needed on the unlock slow path.
lock_wakes                        Number of times a thread is woken when blocked on the stack lock.
stack_lock_buzz                   Number of times a thread has spun waiting for the stack lock.
sock_lock_sleeps                  Number of times a thread has slept waiting for a sock lock.
sock_lock_buzz                    Number of times a thread has spun waiting for a sock lock.
tcp_send_ni_lock_contends         Number of times TCP sendmsg() contended the stack lock.
udp_send_ni_lock_contends         Number of times UDP sendmsg() contended the stack lock.
getsockopt_ni_lock_contends       Number of times getsockopt() contended the stack lock.
setsockopt_ni_lock_contends       Number of times setsockopt() contended the stack lock.
lock_dropped_icmps                Number of dropped ICMP messages not processed due to contention.
Solution
Performance will be improved when stack contention is kept to a minimum. When
threads share a stack, it is preferable for a thread to spin rather than sleep when
waiting for a stack lock. The EF_BUZZ_USEC value can be increased to reduce
‘sleeps’. Where possible, use stacks per process.

6 Onload Functionality
This chapter provides detailed information about specific aspects of Solarflare
Onload operation and functionality.
6.1 Onload Transparency
Onload provides significantly improved performance without the need to rewrite or
recompile the user application, whilst retaining complete interoperability with the
standard TCP and UDP protocols.
In the regular kernel TCP/IP architecture an application is dynamically linked to the
libc library. This OS library provides support for the standard BSD sockets API via a
set of ‘wrapper’ functions, with real processing occurring at the kernel-level. Onload
also supports the standard BSD sockets API. However, in contrast to the kernel
TCP/IP stack, Onload moves protocol processing out of the kernel-space and into the user-level
Onload library itself.
As a networking application invokes the standard socket API function calls, e.g.
socket(), read(), write() etc., these are intercepted by the Onload library, which makes
use of the LD_PRELOAD mechanism on Linux. For each function call, Onload will
examine the file descriptor. Sockets using a Solarflare interface
are processed by the Onload stack, whilst those not using a Solarflare
interface are transparently passed to the kernel stack.
6.2 Onload Stacks
An Onload ‘stack’ is an instance of a TCP/IP stack. The stack includes transmit and
receive buffers, open connections and the associated port numbers and stack
options. Each stack has one or more Virtual NICs associated with it (typically one per
physical port that the stack is using).
In normal usage, each accelerated process will have its own Onload stack shared by
all connections created by the process. It is also possible for multiple processes to
share a single Onload stack instance (refer to Stack Sharing on page 62), and for a
single application to have more than one Onload stack. Refer to Onload Extensions
API on page 189.

6.3 Virtual Network Interface (VNIC)
The Solarflare network adapter supports 1024 transmit queues, 1024 receive
queues, 1024 event queues and 1024 timer resources per network port. A VNIC
(virtual network interface) consists of one unique instance of each of these
resources, which gives the VNIC client (i.e. the Onload stack) an isolated and safe
mechanism for sending and receiving network traffic. Received packets are steered
to the correct VNIC by means of IP/MAC filter tables on the network adapter and/or
Receive Side Scaling (RSS). An Onload stack allocates one VNIC per Solarflare
network port, so it has a dedicated send and receive channel from user mode.
Following a reset of the Solarflare network adapter driver, all virtual interface
resources, including Onload stacks and sockets, will be re-instated. The reset
operation will be transparent to the application, but traffic will be lost during the
reset.
6.4 Functional Overview
When establishing its first socket, an application is allocated an Onload stack which
allocates the required VNICs.
When a packet arrives, IP filtering in the adapter identifies the socket and the data
is written to the next available receive buffer in the corresponding Onload stack. The
adapter then writes an event to an “event queue” managed by Onload. If the
application is regularly making socket calls, Onload regularly polls this event
queue. Processing events directly, rather than via interrupts, is then the normal
means by which an application is able to rendezvous with its data.
User-level processing significantly reduces kernel/user-level context switching, and
interrupts are only required when the application blocks. While the
application is making socket calls, Onload is busy processing the event queue and picking
up new network events.
6.5 Onload with Mixed Network Adapters
A server may be equipped with Solarflare network interfaces and non-Solarflare
network interfaces. When an application is accelerated, Onload reads the Linux
kernel routing table (Onload will only consider the kernel default routing table) to
identify which network interface is required to make a connection. If a
non-Solarflare interface is required to reach a destination, Onload will pass the
connection to the kernel TCP/IP stack. No additional configuration is required to
achieve this, as Onload does this automatically by looking in the IP route table.

6.6 Maximum Number of Network Interfaces
Onload supports up to 8 Solarflare network interfaces by default. If an application
requires more Solarflare interfaces, the following values can be altered in the
src/include/ci/internal/transport_config_opt.h header file:
CI_CFG_MAX_INTERFACES and CI_CFG_MAX_REGISTER_INTERFACES.
Following changes to these values it is necessary to rebuild and reinstall Onload.
6.7 Whitelist and Blacklist Interfaces
By default Onload will use the first ‘N’ Solarflare network interfaces for network I/O,
where N is equal to CI_CFG_MAX_REGISTER_INTERFACES (default value 8).
Supported from Onload 201502, the user is able to select which Solarflare interfaces
are to be used by Onload.
The intf_white_list Onload module option is a space-separated list of Solarflare
network adapter interfaces that Onload will use for network I/O.
• Interfaces identified in the whitelist will always be accelerated by Onload.
• Interfaces NOT identified in the whitelist will not be accelerated by Onload.
• An empty whitelist means that ALL Solarflare interfaces will be accelerated.
The intf_black_list Onload module option is a space-separated list of Solarflare
network adapter interfaces that Onload will not use for network I/O.
When an interface appears in both lists, the blacklist takes priority. Renaming of
interfaces after Onload has started will not be reflected in the access lists, and
changes to the lists will only affect Onload stacks created after such changes, not
currently running stacks.
Onload module options can be specified in a user-created file in the
/etc/modprobe.d directory:
options onload intf_white_list=eth4
options onload intf_black_list="eth5 eth6"
These options are applied globally and cannot be applied to individual stacks.
6.8 Onloaded PIDs
To identify processes accelerated by Onload, use the onload_fuser command:
# onload_fuser -v
9886 ping
Only processes that have created an Onload stack are present. Processes which are
loaded under Onload, but have not created any sockets, are not present. The
onload_stackdump command can also list accelerated processes; see List
Onloaded Processes on page 220 for details.

6.9 Onload and File Descriptors, Stacks and Sockets
For an Onloaded process it is possible to identify the file descriptors, Onload stacks
and sockets being accelerated by Onload. Use the /proc/<PID>/fd directory, supplying
the PID of the accelerated process, e.g.
# ls -l /proc/9886/fd
total 0
lrwx------ 1 root root 64 May 14 14:09 0 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 1 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 2 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 3 -> onload:[tcp:6:3]
lrwx------ 1 root root 64 May 14 14:09 4 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 5 -> /dev/onload
lrwx------ 1 root root 64 May 14 14:09 6 -> onload:[udp:6:2]
Accelerated file descriptors are listed as symbolic links to /dev/onload. Accelerated
sockets are described in [protocol:stack:socket] format.
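The [protocol:stack:socket] link format can also be decoded programmatically. The
following C sketch walks /proc/<PID>/fd with readlink() and reports each link whose
target matches onload:[protocol:stack:socket]. It assumes the link format shown in
the listing above; the function names are illustrative only and are not part of
Onload.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Decode a link target such as "onload:[tcp:6:3]" into protocol, stack and
 * socket IDs. Returns 0 for an accelerated socket, -1 for anything else. */
int parse_onload_fd_target(const char *target, char proto[8],
                           int *stack_id, int *socket_id)
{
    char close_bracket = '\0';
    /* %7[^:] copies the protocol name (at most 7 chars) up to the first ':' */
    if (sscanf(target, "onload:[%7[^:]:%d:%d%c", proto, stack_id,
               socket_id, &close_bracket) != 4)
        return -1;
    return close_bracket == ']' ? 0 : -1;
}

/* Print the accelerated sockets of a process ("self" or a numeric PID).
 * Returns how many were found, or -1 if /proc/<pid>/fd cannot be opened. */
int list_onload_sockets(const char *pid)
{
    char dir_path[64], link_path[320], target[256];
    snprintf(dir_path, sizeof dir_path, "/proc/%s/fd", pid);
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue; /* skip "." and ".." */
        snprintf(link_path, sizeof link_path, "%s/%s", dir_path, entry->d_name);
        ssize_t len = readlink(link_path, target, sizeof target - 1);
        if (len < 0)
            continue;
        target[len] = '\0'; /* readlink() does not NUL-terminate */

        char proto[8];
        int stack_id, socket_id;
        if (parse_onload_fd_target(target, proto, &stack_id, &socket_id) == 0) {
            printf("fd %s: %s socket %d in stack %d\n",
                   entry->d_name, proto, socket_id, stack_id);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

For the process in the listing above, list_onload_sockets("9886") would report fd 3
(tcp socket 3 in stack 6) and fd 6 (udp socket 2 in stack 6); /dev/onload and
/dev/pts/0 links are skipped.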
6.10 System calls intercepted by Onload
System calls intercepted by the Onload library are listed in the following file:
[onload]/src/include/onload/declare_syscalls.h.tmpl
6.11 Linux Sysctls
The Linux directory /proc/sys/net/ipv4 contains default settings which tune
different parts of the IPv4 networking stack. In many cases Onload takes its default
settings from the values in this directory. In some cases the default can be
overridden, for a specified process or socket, using socket options or with Onload
environment variables. The following tables identify the default Linux values and
how Onload tuning parameters can override the Linux settings.

Kernel value: tcp_slow_start_after_idle
Description: controls congestion window validation as per RFC 2861. This is
"off" by default in Onload, as it is not usually useful in modern switched networks.
Onload value: #define CI_CFG_CONGESTION_WINDOW_VALIDATION
Comments: in transport_config_opt.h; recompile after changing.

Kernel value: tcp_congestion_control
Description: determines what congestion control algorithm is used by TCP.
Valid settings include reno, bic and cubic.

Onload value: no direct equivalent; see the section on TCP Congestion Control.
Comments: see EF_CONG_AVOID_SCALE_BACK.

Kernel value: tcp_adv_win_scale
Description: defines how quickly the TCP window will advance.
Onload value: no direct equivalent; see the section on TCP Congestion Control.
Comments: see EF_TCP_ADV_WIN_SCALE_MAX.

Kernel value: tcp_rmem
Description: the default size of sockets' receive buffers (in bytes).
Onload value: defaults to the currently active Linux settings, but is ignored on
TCP accepted sockets. Refer to EF_TCP_RCVBUF_ESTABLISHED_DEFAULT.
Comments: can be overridden with the SO_RCVBUF socket option. Can be set with
EF_TCP_RCVBUF.

Kernel value: tcp_wmem
Description: the default size of sockets' send buffers (in bytes).
Onload value: defaults to the currently active Linux settings.
Comments: EF_TCP_SNDBUF overrides SO_SNDBUF, which overrides tcp_wmem.

Kernel value: tcp_dsack
Description: allows TCP to send duplicate SACKs.
Onload value: uses the currently active Linux settings.
Comments: -

Kernel value: tcp_fack
Description: enables fast retransmissions.
Onload value: fast retransmissions are always enabled; Onload uses the currently
active Linux setting.
Comments: -

Kernel value: tcp_sack
Description: enables TCP selective acknowledgements, as per RFC 2018.
Onload value: enabled by default; Onload uses the currently active Linux setting.
Comments: clear bit 2 of EF_TCP_SYN_OPTS to disable.

Kernel value: tcp_max_syn_backlog
Description: the maximum size of a listening socket's backlog.
Onload value: set with EF_TCP_BACKLOG_MAX.
Comments: -

Kernel value: tcp_synack_retries
Description: the maximum number of retries of SYN-ACKs.
Onload value: set with EF_RETRANSMIT_THRESHOLD_SYNACK.
Comments: default value 5.

Refer to the Parameter Reference on page 146 for details of environment variables.
6.12 Changing Onload Control Plane Table Sizes
Onload supports the following runtime configurable options which determine the
size of control plane tables:

Option: max_layer2_interfaces
Description: sets the maximum number of network interfaces, including physical
interfaces, VLANs and bonds, supported in Onload's control plane.
Default: 50

Option: max_neighs
Description: sets the maximum number of rows in the Onload ARP/neighbour table.
The value is rounded up to a power of two.
Default: 1024

Option: max_routes
Description: sets the maximum number of entries in the Onload route table.
Default: 256

The table above identifies the default values for the Onload control plane tables. The
default values are normally sufficient for the majority of applications, and creating
larger tables may impact application performance. If non-default values are needed,
the user should create a file in the /etc/modprobe.d directory. The file must have a
.conf extension, and Onload options can be added to the file, a single option per line,
in the following format:
options onload max_neighs=512
Following changes, Onload should be restarted using the reload command:
onload_tool reload
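Because max_neighs is rounded up to a power of two, the table actually allocated can
be larger than the value requested. The C sketch below illustrates the rounding,
assuming it goes to the next power of two at or above the requested value; the
function name is hypothetical. Under that assumption, options onload max_neighs=1000
would give a 1024-row table, while max_neighs=512 stays at 512.

```c
#include <stdint.h>

/* Smallest power of two >= n, mirroring the max_neighs rounding rule.
 * Returns 0 if the result would not fit in 32 bits. */
uint32_t round_up_pow2(uint32_t n)
{
    if (n > (UINT32_C(1) << 31))
        return 0;
    uint32_t p = 1;
    while (p < n)
        p <<= 1; /* double until p reaches or passes n */
    return p;
}
```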
6.13 SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps)
Setting the SO_TIMESTAMP option using setsockopt() enables timestamping on
TCP or UDP sockets. Functions such as recvmsg() and recvmmsg() can then recover
timestamp data, delivered as ancillary (cmsg) data, for packets received at the socket.
Onload implements a microsecond resolution software timestamping mechanism,
which avoids the need for a per-packet system call, thereby reducing the normal
timestamp overheads.
The Solarflare adapter will always deliver received packets to the receive ring buffer
in the order that these arrive from the network. Onload will append a software
timestamp to the packet metadata when it retrieves a packet from the ring buffer,
before the packet is transferred to a waiting socket buffer. From a TCP stream, the
timestamp returned is that for the first available byte. Due to retransmissions and
any reordering, timestamps may not be monotonically increasing as these are
delivered to the application.
When the Onload application is interrupt driven, a received packet is timestamped
when the event interrupt for the packet fires. If the Onload application is spinning,
a received packet is timestamped when the application calls receive. Spinning will
generally produce more accurate timestamps, so long as the receiving application is
able to keep pace with the packet arrival rate.
The system call used to get a timestamp is clock_gettime() and the format of
timestamps is defined by struct timeval.
Applications preferring timestamps with nanosecond resolution can use
SO_TIMESTAMPNS in place of the normal (microsecond resolution) SO_TIMESTAMP
value; SO_TIMESTAMPNS timestamps are returned as a struct timespec.
6.14 SO_TIMESTAMPING (Hardware Receive Timestamps)
Setting the SO_TIMESTAMPING option using setsockopt() enables hardware
timestamping on TCP or UDP sockets. Timestamps are generated by the adapter for
each received packet. Functions such as recvmsg() and recvmmsg() can then recover
hardware timestamps, delivered as ancillary data, for packets received from a socket.
• Supported only on Solarflare Flareon SFN7000 series adapters.

• An AppFlex license for hardware timestamps must be installed on the adapter.
The PTP/timestamping license is installed on the SFN7322F during manufacture;
such a license can be installed on other SFN7000 series adapters by the user.
• The Onload stack for the socket must have the environment variable
EF_RX_TIMESTAMPING set; see Appendix A on page 146 for details.
• Received packets are timestamped when they enter the MAC on the SFN7000
series adapter.
The format of timestamps is defined by struct timespec. Interested users should
read the kernel SO_TIMESTAMPING documentation for more details of how to use
this socket API. Kernel documentation can be found, for example, at:
https://www.kernel.org/doc/Documentation/networking/timestamping/
The Onload distribution includes an example application to demonstrate hardware
timestamping:
/openonload-<version>/src/tests/onload/hwtimestamping
6.15 SO_TIMESTAMPING (Hardware Transmit Timestamps)
Onload from 201405 supports hardware timestamping of UDP and TCP packets
transmitted over a Solarflare interface.
Because the Linux kernel does not support hardware timestamps for TCP, Onload
provides an extension to the standard SO_TIMESTAMPING API with the
ONLOAD_SOF_TIMESTAMPING_STREAM socket option to support this. To receive
hardware timestamps for transmitted TCP packets, set the following socket option
flags:
SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE |
ONLOAD_SOF_TIMESTAMPING_STREAM
To receive hardware timestamps for transmitted UDP packets, set the following
socket option flags:
SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE
Other socket flag combinations, not listed above, will be silently ignored.
To receive hardware transmit timestamps:
• Only supported on Solarflare Flareon™ SFN7000 series adapters.
• The adapter must have a PTP/HW timestamping license.
• The adapter must have a SolarCapture Pro license or Performance Monitoring
license.
• Set EF_TX_TIMESTAMPING on stacks where transmit timestamping is required.
• Set EF_TIMESTAMPING_REPORTING to control the type of timestamp returned
to the application. This is optional; by default Onload will report translated
timestamps if the adapter clock has been fully synchronized to correct time by
the Solarflare PTP daemon. In all cases Onload will report raw timestamps. Refer
to Parameter Reference on page 146 for full details of the
EF_TIMESTAMPING_REPORTING variable.
• Solarflare PTP (sfptpd) must be running if timestamps are to be synchronized
with an external PTP master clock.
For details of the SO_TIMESTAMPING API refer to the Linux documentation:
https://www.kernel.org/doc/Documentation/networking/timestamping/
The Onload distribution includes an example application to demonstrate transmit
hardware timestamping:
/openonload-<version>/src/tests/onload/hwtimestamping
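The flag choice above can be captured in a small helper. This C sketch assumes that
the Onload extensions header defines ONLOAD_SOF_TIMESTAMPING_STREAM as a preprocessor
macro; when that header is not included, the sketch refuses TCP rather than passing
a flag combination that would be silently ignored. The function names are
illustrative. Per the kernel timestamping documentation, transmit timestamps are then
read back from the socket error queue with recvmsg() and MSG_ERRQUEUE; that step is
not shown.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* SO_TIMESTAMPING flags for hardware transmit timestamps, as listed above.
 * Returns -1 for TCP if the Onload stream extension flag is unavailable. */
int tx_timestamp_flags(int is_tcp)
{
    int flags = SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE;
    if (is_tcp) {
#ifdef ONLOAD_SOF_TIMESTAMPING_STREAM
        flags |= ONLOAD_SOF_TIMESTAMPING_STREAM; /* Onload-only TCP extension */
#else
        return -1; /* assumed header not included: TCP cannot be requested */
#endif
    }
    return flags;
}

/* Apply the flags to a socket (the stack also needs EF_TX_TIMESTAMPING). */
int enable_tx_timestamps(int fd, int is_tcp)
{
    int flags = tx_timestamp_flags(is_tcp);
    if (flags < 0)
        return -1;
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof flags);
}
```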
6.16 SO_BINDTODEVICE
In response to the setsockopt() function call with SO_BINDTODEVICE, sockets
identifying non-Solarflare interfaces will be handled by the kernel and all sockets
identifying Solarflare interfaces will be handled by Onload. All sends from a socket
are sent via the bound interface, and all TCP, UDP and multicast packets received via
the bound interface are delivered only to the socket bound to the interface.
6.17 Multiplexed I/O
Linux supports three common methods for handling multiplexed I/O operation:
poll(), select() and the epoll set of functions.
The general behavior of the poll(), select() and epoll_wait() functions with
OpenOnload is as follows:
• If there are operations ready on any file descriptors, poll(), select() and
epoll_wait() will return immediately. Refer to the Poll, Select and Epoll
subsections for specific behavior details.
• If there are no file descriptors ready and spinning is not enabled, calls to
poll(), select() and epoll_wait() will enter the kernel and block.
• In the cases of poll() and select(), when the set contains file descriptors
that are not accelerated sockets, there is a slight latency overhead, as Onload
must make a system call to determine the readiness of these sockets. There is
no such cost when using epoll_wait(), where a system call is only needed when
non-Onload descriptors become ready.
• If there are no file descriptors ready and spinning is enabled, OpenOnload will
spin to ensure that accelerated sockets are polled a specified number of times
before unaccelerated sockets are examined. This reduces the overhead
incurred when OpenOnload has to call into the kernel, and reduces latency on
accelerated sockets.

The following subsections discuss the use of these I/O functions and the OpenOnload
environment variables that can be used to manipulate the behavior of the I/O
operations.
Poll, ppoll
The poll(), ppoll() file descriptor set can consist of both accelerated and
non-accelerated file descriptors. The environment variable EF_UL_POLL
enables/disables acceleration of the poll(), ppoll() function calls. Onload supports
the following options for the EF_UL_POLL variable:
Value Behaviour
0     Disable acceleration at user level. Calls to poll(), ppoll() are
      handled by the kernel.
      Spinning cannot be enabled.
1     Enable acceleration at user level. Calls to poll(), ppoll() are
      processed at user level.
      Spinning can be enabled and interrupts are avoided until an application
      blocks.
Additional environment variables can be employed to control the poll(), ppoll()
functions and to give priority to accelerated sockets over non-accelerated sockets
and other file descriptors. Refer to EF_POLL_FAST, EF_POLL_FAST_USEC and
EF_POLL_SPIN in Parameter Reference on page 146.
Select, pselect
The select(), pselect() file descriptor set can consist of both accelerated and
non-accelerated file descriptors. The environment variable EF_UL_SELECT
enables/disables acceleration of the select(), pselect() function calls. Onload
supports the following options for the EF_UL_SELECT variable:
Value Behaviour
0     Disable acceleration at user level. Calls to select(), pselect() are
      handled by the kernel.
      Spinning cannot be enabled.
1     Enable acceleration at user level. Calls to select(), pselect() are
      processed at user level.
      Spinning can be enabled and interrupts are avoided until an application
      blocks.
Additional environment variables can be employed to control the select(),
pselect() functions and to give priority to accelerated sockets over
non-accelerated sockets and other file descriptors. Refer to EF_SELECT_FAST and
EF_SELECT_SPIN in Parameter Reference on page 146.
Epoll
The epoll set of functions, epoll_create(), epoll_ctl(), epoll_wait() and
epoll_pwait(), are accelerated in the same way as poll and select. The
environment variable EF_UL_EPOLL enables/disables epoll acceleration. Refer to
the release change log for enhancements and changes to epoll behavior.
Using Onload, an epoll set can consist of both Onload file descriptors and kernel file
descriptors. Onload supports the following options for the EF_UL_EPOLL
environment variable:
Value Epoll Behaviour
0     Accelerated epoll is disabled and epoll_ctl(), epoll_wait() and
      epoll_pwait() function calls are processed in the kernel. Other
      function calls such as send() and recv() are still accelerated.
      Interrupt avoidance does not function and spinning cannot be enabled.
      If a socket is handed over to the kernel stack after it has been added to
      an epoll set, it will be dropped from the epoll set.
      onload_ordered_epoll_wait() is not supported.
1     Function calls to epoll_ctl(), epoll_wait(), epoll_pwait() are
      processed at user level.
      Delivers best latency except when the number of accelerated file
      descriptors in the epoll set is very large. This option also gives the best
      acceleration of epoll_ctl() calls.
      Spinning can be enabled and interrupts are avoided until an application
      blocks.
      CPU overhead and latency increase with the number of file descriptors
      in the epoll set.
      onload_ordered_epoll_wait() is supported.
2     Calls to epoll_ctl(), epoll_wait(), epoll_pwait() are processed in
      the kernel.
      Delivers best performance for large numbers of accelerated file
      descriptors.
      Spinning can be enabled and interrupts are avoided until an application
      blocks.
      CPU overhead and latency are independent of the number of file
      descriptors in the epoll set.
      onload_ordered_epoll_wait() is not supported.
3     Function calls to epoll_ctl(), epoll_wait(), epoll_pwait() are
      processed at user level.
      Delivers best acceleration latency for epoll_ctl() calls and scales well
      when the number of accelerated file descriptors in the epoll set is very
      large, provided all sockets are in the same stack. The cost of
      epoll_wait() is independent of the number of accelerated file
      descriptors in the set and depends only on the number of descriptors
      that become ready. The benefits will be less if sockets exist in different
      Onload stacks, and in this case the recommendation is to use
      EF_UL_EPOLL=2.
      EF_UL_EPOLL=3 does not allow monitoring the readiness of the epoll
      file descriptors from another epoll/poll/select.
      EF_UL_EPOLL=3 cannot support epoll sets which exist across fork().
      Spinning can be enabled and interrupts are avoided until an application
      blocks.
      onload_ordered_epoll_wait() is supported.
The relative performance of epoll options 1 and 2 depends on the details of
application behavior as well as the number of accelerated file descriptors in the
epoll set. Behavior may also differ between earlier and later kernels and between
Linux realtime and non-realtime kernels. Generally the OS will allocate short time
slices to a user-level CPU intensive application, which may result in poor
performance (latency spikes). A kernel-level CPU intensive process is less likely to
be de-scheduled, resulting in better performance. Solarflare recommend the user
evaluate options 1 and 2 for applications that manage many file descriptors, or try
option 3 (onload-201502 and later) when using very large sets and all sockets are in
the same stack.
Additional environment variables can be employed to control the epoll_ctl(),
epoll_wait() and epoll_pwait() functions and to give priority to accelerated
sockets over non-accelerated sockets and other file descriptors. Refer to
EF_EPOLL_CTL_FAST, EF_EPOLL_SPIN and EF_EPOLL_MT_SAFE in Parameter
Reference on page 146.
Refer to epoll - Known Issues on page 122.
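The following C sketch builds an epoll set of the kind described above, mixing a pipe
and a UDP socket, using only the standard epoll calls; the EF_UL_EPOLL mode that
handles it is chosen at run time through the environment, not in the code. The pipe
is created before the socket on the assumption that, with default EF_PIPE settings,
this leaves it an ordinary kernel pipe (see Accelerated pipe()) while the socket is
accelerated when run under onload. Function names are illustrative.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static int epoll_add_readable(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Create an epoll set holding the read end of a new pipe and a new UDP socket.
 * Returns the epoll fd (pipe and socket fds come back via the pointers),
 * or -1 on failure. */
int build_mixed_epoll_set(int pipe_fds[2], int *sock_fd)
{
    if (pipe(pipe_fds) != 0)
        return -1;
    *sock_fd = socket(AF_INET, SOCK_DGRAM, 0);
    int epfd = epoll_create1(0);
    if (*sock_fd < 0 || epfd < 0 ||
        epoll_add_readable(epfd, pipe_fds[0]) != 0 ||
        epoll_add_readable(epfd, *sock_fd) != 0) {
        if (epfd >= 0)
            close(epfd);
        if (*sock_fd >= 0)
            close(*sock_fd);
        close(pipe_fds[0]);
        close(pipe_fds[1]);
        return -1;
    }
    return epfd;
}

/* Wait up to timeout_ms for one ready descriptor; returns epoll_wait()'s
 * result and stores the ready fd in *ready_fd when one is reported. */
int wait_for_input(int epfd, int timeout_ms, int *ready_fd)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n == 1)
        *ready_fd = ev.data.fd;
    return n;
}
```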
6.18 Wire Order Delivery
When a TCP or UDP application is working with multiple network sockets
simultaneously, it is difficult to ensure data is delivered to the application in the
strict order it was received from the wire across these sockets.
The onload_ordered_epoll_wait() API is an Onload alternative implementation
of epoll_wait() providing additional data, allowing a receiving application to
recover in-order timestamped data from multiple sockets. To maintain wire order
delivery, only a specific number of bytes, as identified by the
onload_ordered_epoll_event, should be recovered from a ready socket.
• Ordering is done on a per-stack basis for TCP and UDP sockets. Sockets must
be in the same Onload stack.
• Only data received from an Onload stack with a hardware timestamp will be
ordered. The environment variable EF_RX_TIMESTAMPING should be enabled.
File descriptors where timestamping information is not available may be
included in the epoll set, but received data will be returned from these
unordered.
• The application must use the epoll API and the
onload_ordered_epoll_wait() function.
• The application must set the per-process environment variable
EF_UL_EPOLL=1.
• EPOLLET and EPOLLONESHOT flags should NOT be used.
• A return value of zero from the wait function indicates there are no file
descriptors ready with ordered data; unordered data may still be available.
Figure 6 demonstrates the Wire Order Delivery feature.
Figure 6: Wire Order Delivery
onload_ordered_epoll_wait() returning at point X would allow the following
data to be recovered:
• Socket A: timestamp of packet 1, bytes in packet 1.
• Socket B: timestamp of packet 2, bytes in packets 2 and 3.

• onload_ordered_epoll_wait() returning again would recover the timestamp of
packet 4 and the bytes in packet 4.
The Wire Order Delivery feature is only available on Solarflare Flareon adapters
having a PTP/HW timestamping license. When receiving across multiple adapters,
Solarflare sfptpd (PTP) can ensure that adapters are closely synchronized with each
other and, if required, with an external PTP clock source.
Wire Order Delivery - Example API:
The Onload distribution includes example client/server applications to demonstrate
the wire order feature:
wire_order_server - uses onload_ordered_epoll_wait() to receive ordered data over
a set of sockets. Received data is echoed back to the client on a single reply socket.
wire_order_client - sends sequenced data across the socket set, reads the reply data
from the server and ensures data is received in sequence.
Source code for the wire order examples is available in:
openonload-<version>/src/tests/onload/wire_order
Although the examples are not compiled as part of the Onload install process, they
can be built as follows.
Ensure the mmaketool script is in the current path (it can be found in the
openonload-<version>/scripts directory):
# export PATH=$PATH:/openonload-<version>/scripts
# cd /openonload-<version>/build/gnu_x86_64/tests/onload/wire_order
# USEONLOADEXT=1 make
To run the server:
# EF_RX_TIMESTAMPING=3 onload ./wire_order_server
To run the client:
# onload --profile=latency ./wire_order_client <server ip>
By default the client will send data over 100 TCP sockets; the number of sockets is
controlled with the -s option. UDP can be selected using the -U option.
NOTE: To prevent sends being re-ordered between streams, the latency profile
should be used on the client side. The environment variable EF_RX_TIMESTAMPING
must be set on the server side.
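The key rule on the receive side is to read exactly the number of bytes reported for
each ready socket, in the order the events are returned. The C sketch below shows
only that consumption loop. struct ordered_event is a hypothetical stand-in for the
data onload_ordered_epoll_wait() returns alongside each epoll event; consult the
Onload Extensions API and the wire_order_server source for the real structure and
call. The sketch also assumes the events are already in wire order and that, for
UDP, the reported byte count covers whole datagrams.

```c
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for one ordered event: not the real Onload type. */
struct ordered_event {
    int fd;             /* ready socket */
    struct timespec ts; /* hardware timestamp of the first ordered byte */
    int bytes;          /* bytes that may be read without breaking wire order */
};

/* Read exactly ev[i].bytes from each descriptor, in array order, appending the
 * data to out. Returns the total number of bytes read, or -1 on error. */
ssize_t drain_in_wire_order(const struct ordered_event *ev, int n,
                            char *out, size_t cap)
{
    size_t total = 0;
    for (int i = 0; i < n; i++) {
        if (ev[i].bytes < 0 || (size_t)ev[i].bytes > cap - total)
            return -1;
        size_t want = (size_t)ev[i].bytes;
        while (want > 0) { /* a stream socket may return less than asked */
            ssize_t r = read(ev[i].fd, out + total, want);
            if (r <= 0)
                return -1;
            total += (size_t)r;
            want -= (size_t)r;
        }
    }
    return (ssize_t)total;
}
```

Any bytes beyond the reported count stay queued on the socket and are picked up
after a later wait call, as with packet 4 in the example above.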
6.19 Stack Sharing
By default each process using Onload has its own 'stack'. Refer to Onload Stacks for
a definition. Several processes can be made to share a single stack, using the
EF_NAME environment variable. Processes with the same value for EF_NAME in their
environment will share a stack.

Stack sharing is one supported method to enable multiple processes using Onload
to be accelerated when receiving the same multicast stream, or to allow one
application to receive a multicast stream generated locally by a second application.
Other methods to achieve this are Multicast Replication and Hardware Multicast
Loopback.
Stacks may also be shared by multiple processes in order to preserve and control
resources within the system. Stack sharing can be employed by processes handling
TCP as well as UDP sockets.
Stack sharing should only be requested if there is a trust relationship between the
processes. If two processes share a stack then they are not completely isolated: a
bug in one process may impact the other, or one process can gain access to the
other's privileged information (i.e. breach security). Once the EF_NAME variable is
set, any process on the local host can set the same value and gain access to the
stack.
By default Onload stacks can only be shared with processes having the same UID.
The EF_SHARE_WITH environment variable provides additional security while
allowing a different UID to share a stack. Refer to Parameter Reference on page 146
for a description of the EF_NAME and EF_SHARE_WITH variables.
Processes sharing an Onload stack should also not use huge pages. Onload will
issue a warning at startup and prevent the allocation of huge pages if
EF_SHARE_WITH identifies the UID of another process or is set to -1. If a process P1
creates an Onload stack but is not using huge pages, and another process P2
attempts to share the Onload stack by setting EF_NAME, the stack options set by P1
will apply and allocation of huge pages in P2 will be prevented.
An alternative method of implementing stack sharing is to use the Onload
Extensions API and the onload_set_stackname() function which, through its
scope parameter, can limit stack access to the processes created by a particular user.
Refer to Onload Extensions API on page 189 for details.
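Onload is assumed here to take EF_NAME from the environment of the accelerated process
when it starts, so the variable has to be in place before that process is launched.
The C sketch below is a hypothetical launcher (run_in_named_stack and the stack name
are illustrative) that sets EF_NAME and then starts the child with posix_spawnp(). In
practice argv would be something like { "onload", "./subscriber", NULL }, where
./subscriber is a placeholder program; every process launched with the same name
shares one stack, subject to the UID rules above.

```c
#include <spawn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Launch argv with EF_NAME=stack_name in its environment and wait for it.
 * Returns the child's exit status, or -1 if it could not be run. */
int run_in_named_stack(const char *stack_name, char *const argv[])
{
    /* Set in this (launcher) process so that the child inherits it. */
    if (setenv("EF_NAME", stack_name, 1) != 0)
        return -1;

    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```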
6.20 Application Clustering
An application cluster is the set of Onload TCP or UDP stack sockets bound to the
same port. This feature dramatically improves the scaling of some applications
across multiple CPUs (especially those establishing many sockets from a TCP
listening socket).
Onload from version 201405 automatically creates a cluster using the
SO_REUSEPORT socket option. TCP or UDP processes running on RHEL 6.5 (and
later) setting this option can bind multiple sockets to the same TCP or UDP port.
NOTE: Some older Linux kernels/distributions do not have kernel support for
SO_REUSEPORT (introduced in the Linux 3.9 kernel). Onload contains experimental
support for SO_REUSEPORT on older kernel versions, but this has yet to be fully
tested and verified by Solarflare. Users are free to try the Onload application
clustering feature on these kernels and report their findings via email to
support@solarflare.com.

For TCP, clustering allows the established connections resulting from a listening
socket to be spread over a number of Onload stacks. Each thread/process creates its
own listening socket (using SO_REUSEPORT) on the same port, with each listening
socket residing in its own Onload stack. Handling of incoming new TCP connections
is spread via the adapter (using RSS) over the application cluster, and therefore
over each of the listening sockets, resulting in each Onload stack, and therefore each
thread/process, handling a subset of the total traffic, as illustrated in Figure 7 below.
Figure 7: Application Clustering - TCP
For UDP, clustering allows UDP unicast traffic to be spread over multiple applications,
with each application receiving a subset of the total traffic load.
Existing applications that do not use SO_REUSEPORT can use the application
clustering feature without the need for re-compilation by using the Onload
EF_TCP_FORCE_REUSEPORT or EF_UDP_FORCE_REUSEPORT environment variables,
which identify the list of ports to which SO_REUSEPORT will be applied.
The size, or number of socket members, of a cluster in Onload is controlled with
EF_CLUSTER_SIZE. To create a cluster, the application sets the cluster name with
EF_CLUSTER_NAME. A cluster of EF_CLUSTER_SIZE is then created.
NOTE: The number of socket members must equal the EF_CLUSTER_SIZE value,
otherwise a portion of the received traffic will be lost.
The spread of received traffic between cluster sockets employs Receive Side Scaling
(RSS). For TCP the RSS hash is a function of the src_ip:src_port, dst_ip:dst_port. For
UDP the RSS hash is a function of the src_ip and dst_ip only.

The reception of traffic within a cluster is dependent on port numbers only. If two
sockets bind to the same port, but different IP addresses, a portion of traffic
destined for one socket can be received (but dropped by Onload) on the other
socket. For correct behavior, all cluster members should bind to the same IP address.
This limitation has been removed in the Onload-201509 release, so that it is possible
to create multiple listening sockets bound to the same port but to different
addresses.
Restarting an application that includes cluster socket members can fail when orphan
stacks are still present. Use EF_CLUSTER_RESTART to force termination of orphaned
stacks, allowing the creation of the new cluster.
Refer to Limitations on page 117 for details of Application Clustering limitations.
6.21 Bonding, Link aggregation and Failover
Bonding (aka teaming) allows for improved reliability and increased bandwidth by
combining physical ports from one or more Solarflare adapters into a bond. A bond
has a single IP address and a single MAC address, and functions as a single port or
single adapter to provide redundancy.
Onload monitors the OS configuration of the standard kernel bonding module and
accelerates traffic over bonds that are detected as suitable (see limitations). As a
result, no special configuration is required to accelerate traffic over bonded
interfaces.
e.g. to configure an 802.3ad bond of two SFC interfaces (eth2 and eth3):
modprobe bonding miimon=100 mode=4 xmit_hash_policy=layer3+4
ifconfig bond0 up
Interfaces must be down before adding to the bond.
echo +eth2 > /sys/class/net/bond0/bonding/slaves
echo +eth3 > /sys/class/net/bond0/bonding/slaves
ifconfig bond0 192.168.1.1/24
The file /var/log/messages should then contain a line similar to:
[onload] Accelerating bond0 using Onload
Traffic over this interface will then be accelerated by Onload.
To disable Onload acceleration of bonds, set CI_CFG_TEAMING=0 in the file
transport_config_opt.h at compile time.
In addition to the Linux "bonding" driver, Onload from the 201509 version also
supports the "teaming" driver and "teamd".
Refer to the Limitations section, Bonding, Link aggregation on page 120 for further
information.

6.22 VLANS
The division of a physical network into multiple broadcast domains or VLANs offers
improved scalability, security and network management.
Onload will accelerate traffic over suitable VLAN interfaces by default, with no
additional configuration required.
e.g. to add an interface for VLAN 5 over an SFC interface (eth2):
modprobe onload
modprobe 8021q
vconfig add eth2 5
ifconfig eth2.5 192.168.1.1/24
Traffic over this interface will then be transparently accelerated by Onload.
Refer to the Limitations section, VLANs on page 120 for further information.
6.23 Accelerated pipe()
Onload supports the acceleration of pipes, providing an accelerated IPC mechanism
through which two processes on the same host can communicate using shared
memory at user level. Accelerated pipes do not invoke system calls. Accelerated
pipes, therefore, reduce the overheads for read/write operations and offer improved
latency over the kernel implementation.
To create a user-level pipe, and before the pipe() or pipe2() function is called, a
process must be accelerated by Onload and must have created an Onload stack. By
default, an accelerated process that has not created an Onload stack is granted only
a non-accelerated pipe. See EF_PIPE for other options.
The accelerated pipe is created from the pool of available packet buffers.
The following function calls, related to pipes, will be accelerated by Onload and will
not enter the kernel unless they block:
• pipe()
• read()
• write()
• readv()
• writev()
• send()
• recv()
• recvmsg()
• sendmsg()
• poll()
• select()

• epoll_ctl()
• epoll_wait()
As with TCP/UDP sockets, the Onload tuning options such as EF_POLL_USEC and
EF_SPIN_USEC will also influence performance of the user-level pipe.
Refer also to EF_PIPE, EF_PIPE_RECV_SPIN and EF_PIPE_SEND_SPIN in Parameter
Reference on page 146.
NOTE: Only anonymous pipes created with the pipe() or pipe2() function calls
will be accelerated.
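The ordering requirement above (create an Onload stack first, then the pipe) can be
sketched in C as follows. The sketch assumes that creating a socket is enough for
Onload to create the process's stack; outside Onload it simply uses an ordinary kernel
pipe. The function name is illustrative, and EF_PIPE changes the default behavior.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a socket first (so that, under onload, the process owns a stack),
 * then an anonymous pipe; write msg into it and read it back into out.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;

    ssize_t got = -1;
    int fds[2];
    if (pipe(fds) == 0) {
        size_t len = strlen(msg);
        /* Small messages only: a write larger than the pipe capacity would block. */
        if (len <= cap && write(fds[1], msg, len) == (ssize_t)len)
            got = read(fds[0], out, cap);
        close(fds[0]);
        close(fds[1]);
    }
    close(sock);
    return got;
}
```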
6.24 Zero-Copy API
The Onload Extensions API includes support for zero-copy of TCP transmit packets
and UDP receive packets. Refer to Zero-Copy API on page 201 for detailed
descriptions and example source code of the API.
6.25 Debug and Logging
Onload supports various debug and logging options. Logging and debug information
will be displayed on an attached console or will be sent to the syslog. To force all
debug to the syslog, set the Onload environment variable EF_LOG_VIA_IOCTL=1.
For more information about debug/logging environment variables refer to
Parameter Reference on page 146.
To enable debug and logging using the options below, Onload must be installed with
debug enabled, e.g.:
# onload_install --debug
If Onload is already installed, uninstall it, then re-install with the --debug option as
shown above.
Log Levels:
• EF_UNIX_LOG.
• EF_LOG.
• EF_LOG_FILE - When EF_LOG_VIA_IOCTL is unset, the user is able to redirect
Onload output to a specified directory and file using the EF_LOG_FILE option.
Timestamps can also be added to the log file when EF_LOG_TIMESTAMPS is also
enabled.
EF_LOG_FILE=<path/file>
Note that kernel logging is still directed to the syslog.
• TP_LOG (bitmask) - useful for stack debugging. See the Onload source code
/src/include/ci/internal/ip_log.h for bit values.
• Onload module options:

- oo_debug_bits=[bitmask] - useful for kernel logging and events not
involving an Onload stack. See src/include/onload/debug.h for bit values.
- ci_tp_log=[bitmask] - useful for kernel logging and events involving an
Onload stack. See the Onload source code /src/include/ci/internal/ip_log.h
for details.

7 Onload - TCP
7.1 TCP Operation
The table below identifies the Onload TCP implementation RFC compliance.
RFC   Title                                                                Compliance
793   Transmission Control Protocol                                        Yes
813   Window and Acknowledgement Strategy in TCP                           Yes
896   Congestion Control in IP/TCP                                         Yes
1122  Requirements for Hosts                                               Yes
1191  Path MTU Discovery                                                   Yes
1323  TCP Extensions for High Performance                                  Yes
2018  TCP Selective Acknowledgment Options                                 Yes
2581  TCP Congestion Control                                               Yes
2582  The NewReno Modification to TCP's Fast Recovery Algorithm            Yes
2883  An Extension to the Selective Acknowledgement (SACK) Option for TCP  Yes
2988  Computing TCP's Retransmission Timer                                 Yes
3128  Protection Against a Variant of the Tiny Fragment Attack             Yes
3168  The Addition of Explicit Congestion Notification (ECN) to IP         Yes
3465  TCP Congestion Control with Appropriate Byte Counting (ABC)          Yes
7.2 TCP Handshake - SYN, SYNACK
During the TCP connection establishment 3-way handshake, Onload negotiates the
MSS, Window Scale, SACK permitted, ECN, PAWS and RTTM timestamps.

For TCP connections Onload will negotiate an appropriate MSS for the MTU
configured on the interface. However, when using jumbo frames, Onload will
currently negotiate an MSS value up to a maximum of 2048 bytes minus the number
of bytes required for packet headers. This is because the size of buffers passed to
the Solarflare network interface card is 2048 bytes, and the Onload stack cannot
currently handle fragmented packets on its TCP receive path.
TCP options advertised during the handshake can be selected using the
EF_TCP_SYN_OPTS environment variable. Refer to Parameter Reference on
page 146 for details of environment variables.
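The effect of the 2048-byte buffer limit on MSS can be worked through with a small
calculation. In the C sketch below, the 40-byte IPv4 plus TCP header allowance for
the MTU case is standard, but the per-buffer header allowance is left as a parameter
because the exact figure Onload subtracts is not stated here; the value of 54 bytes
used below (Ethernet, IPv4 and TCP headers without options) is an assumption for
illustration only, and the function name is hypothetical.

```c
/* Estimated MSS: the MTU-based value, capped by the 2048-byte buffer limit
 * described above. buffer_header_bytes is an assumed per-packet allowance. */
int estimate_onload_mss(int mtu, int buffer_header_bytes)
{
    const int buffer_size = 2048;     /* buffer size passed to the adapter */
    int mss_from_mtu = mtu - 20 - 20; /* IPv4 + TCP headers, no options */
    int mss_from_buffer = buffer_size - buffer_header_bytes;
    return mss_from_mtu < mss_from_buffer ? mss_from_mtu : mss_from_buffer;
}
```

With buffer_header_bytes = 54, a standard 1500-byte MTU gives 1460 (the MTU is the
limit), while a 9000-byte jumbo MTU gives 1994 rather than 8960 (the buffer is the
limit).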
7.3 TCP SYN Cookies
The Onload environment variable EF_TCP_SYNCOOKIES can be enabled on a per-stack
basis to force the use of SYN cookies, thereby providing a degree of protection
against the Denial of Service (DoS) SYN flood attack. EF_TCP_SYNCOOKIES is
disabled by default. Refer to Parameter Reference on page 146 for details of
environment variables.
7.4 TCP Socket Options
Onload TCP supports the following socket options, which can be used in the
setsockopt() and getsockopt() function calls.
Option                Description
SO_PROTOCOL           retrieves the socket protocol as an integer.
SO_ACCEPTCONN         determines whether the socket can accept incoming
                      connections; true for listening sockets. (Only valid as a
                      getsockopt().)
SO_BINDTODEVICE       binds this socket to a particular network interface.
SO_CONNECT_TIME       number of seconds a connection has been established.
                      (Only valid as a getsockopt().)
SO_DEBUG              enables protocol debugging.
SO_DONTROUTE          outgoing data should be sent on whatever interface the
                      socket is bound to and not routed via another interface.
SO_ERROR              the errno value of the last error occurring on the
                      socket. (Only valid as a getsockopt().)
SO_EXCLUSIVEADDRUSE   prevents other sockets using the SO_REUSEADDR
                      option to bind to the same address and port.
SO_KEEPALIVE          enables sending of keep-alive messages on
                      connection-oriented sockets.

SO_LINGER | when enabled, a close() or shutdown() will not return until all queued messages for the socket have been successfully sent or the linger timeout has been reached. Otherwise the close() or shutdown() returns immediately and sockets are closed in the background.
SO_OOBINLINE | indicates that out-of-band data should be returned in-line with regular data. This option is only valid for connection-oriented protocols that support out-of-band data.
SO_PRIORITY | set the priority for all packets sent on this socket. Packets with a higher priority may be processed first depending on the selected device queueing discipline.
SO_RCVBUF | sets or gets the maximum socket receive buffer in bytes. The value set is doubled by the kernel and by Onload to allow for bookkeeping overheads when it is set by the setsockopt() function call. Note that EF_TCP_RCVBUF overrides this value, and EF_TCP_RCVBUF_ESTABLISHED_DEFAULT can also override this value. Setting SO_RCVBUF to a value smaller than the MTU can result in poorer performance and is not recommended.
SO_RCVLOWAT | sets the minimum number of bytes to process for socket input operations.
SO_RCVTIMEO | sets the timeout for an input function to complete.
SO_RECVTIMEO | sets the timeout in milliseconds for blocking receive calls.
SO_REUSEADDR | can reuse local port numbers, i.e. another socket can bind to the same port except when there is an active listening socket bound to the port.
SO_REUSEPORT | allows multiple sockets to bind to the same port.
SO_SNDBUF | sets or gets the maximum socket send buffer in bytes. The value set is doubled by the kernel and by Onload to allow for bookkeeping overhead when it is set by the setsockopt() function call. Note that EF_TCP_SNDBUF, EF_TCP_SNDBUF_MODE and EF_TCP_SNDBUF_ESTABLISHED_DEFAULT can override this value.
SO_SNDLOWAT | sets the minimum number of bytes to process for socket output operations. Always set to 1 byte.
SO_SNDTIMEO | set the timeout for a sending function to send before reporting an error.
SO_TIMESTAMP | enable/disable receiving the SO_TIMESTAMP control message.
SO_TIMESTAMPNS | enable/disable receiving the SO_TIMESTAMPNS control message.
SO_TIMESTAMPING | enable/disable hardware timestamps for received packets. See SO_TIMESTAMPING (Hardware Receive Timestamps) on page 55.
SOF_TIMESTAMPING_TX_HARDWARE | obtain a hardware generated transmit timestamp.
SOF_TIMESTAMPING_SYS_HARDWARE | obtain a hardware transmit timestamp adjusted to the system time base.
SOF_TIMESTAMPING_OPT_CMSG | deliver timestamps using the cmsg API.
ONLOAD_SOF_TIMESTAMPING_STREAM | Onload extension to the standard SO_TIMESTAMPING API to support hardware timestamps on TCP sockets.
SO_TYPE | returns the socket type (SOCK_STREAM or SOCK_DGRAM). (Only valid as a getsockopt()).
IP_TRANSPARENT | allows the calling application to bind the socket to a non-local IP address.
7.5 TCP Level Options
Onload TCP supports the following TCP options, which can be used in the setsockopt() and getsockopt() function calls.
Option | Description
TCP_CORK | stops sends on segments less than MSS size until the connection is uncorked.
TCP_DEFER_ACCEPT | a connection is ESTABLISHED after the handshake is complete instead of leaving it in SYN-RECV until the first real data packet arrives. The connection is placed in the accept queue when the first data packet arrives.
TCP_INFO | populates an internal data structure with TCP statistic values.
TCP_KEEPALIVE_ABORT_THRESHHOLD | how long to try to produce a successful keepalive before giving up.
TCP_KEEPALIVE_THRESHHOLD | specifies the idle time for keepalive timers.
TCP_KEEPCNT | number of keepalives before giving up.
TCP_KEEPIDLE | idle time for keepalives.
TCP_KEEPINTVL | time between keepalives.
TCP_MAXSEG | gets the MSS size for this connection.
TCP_NODELAY | disables Nagle’s Algorithm, so that small segments are sent without delay and without waiting for previous segments to be acknowledged.
TCP_QUICKACK | when enabled, ACK messages are sent immediately following reception of the next data packet. This flag will be reset to zero following every use, i.e. it is a one-time option. Default value is 1 (enabled).
7.6 TCP File Descriptor Control
Onload supports the following options in socket() and accept() calls.
Option | Description
SOCK_CLOEXEC | supported in socket() and accept(). Sets the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
SOCK_NONBLOCK | supported in accept(). Sets the O_NONBLOCK file status flag on the new open file description, saving extra calls to fcntl(2) to achieve the same result.
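A minimal sketch of the flags and options above, using only standard Linux socket calls (which Onload intercepts when the application runs under Onload). On Linux the flags argument is passed to accept() through accept4(). The helper names `open_listener()` and `accept_configured()`, the loopback address and the keepalive values (60 s idle, 10 s interval, 5 probes) are illustrative assumptions.

```c
#define _GNU_SOURCE            /* for accept4() */
#include <assert.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listening socket created with SOCK_CLOEXEC, bound to 127.0.0.1 on an
 * ephemeral port; the chosen port is written to *port_out (host order). */
static int open_listener(unsigned short *port_out)
{
    int one = 1;
    struct sockaddr_in sa = { .sin_family = AF_INET };
    socklen_t len = sizeof(sa);
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(sa.sin_port);
    return fd;
}

/* Accept one connection with O_NONBLOCK and FD_CLOEXEC set atomically,
 * then apply some of the socket- and TCP-level options listed above. */
static int accept_configured(int lfd)
{
    int one = 1, idle = 60, intvl = 10, cnt = 5;
    int fd = accept4(lfd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Setting the flags at creation time avoids a window in which another thread could fork and exec with the descriptor still inheritable.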

7.7 TCP Congestion Control
Onload TCP implements congestion control in accordance with RFC 3465 and employs the NewReno algorithm with extensions for Appropriate Byte Counting (ABC).
On new or idle connections, and on those experiencing loss, Onload employs a Fast Start algorithm in which delayed acknowledgments are disabled, thereby creating more ACKs and consequently ‘growing’ the congestion window rapidly. Two environment variables, EF_TCP_FASTSTART_INIT and EF_TCP_FASTSTART_LOSS, are associated with fast start. Refer to Parameter Reference on page 146 for details.
During Slow Start, the congestion window is initially set to 2 x the maximum segment size (MSS). As each ACK is received, the congestion window size is increased by the number of bytes acknowledged, up to a maximum of 2 x MSS bytes per ACK. This allows Onload to transmit the minimum of the congestion window and the advertised window size, i.e.:
transmission window (bytes) = min(CWND, receiver advertised window size)
If loss is detected, either by retransmission timeout (RTO) or by the reception of duplicate ACKs, Onload will adopt a congestion avoidance algorithm to slow the transmission rate. In congestion avoidance the transmission window is halved from its current size, but will not be less than 2 x MSS. If congestion avoidance was triggered by an RTO timeout, the Slow Start algorithm is used again to restore the transmit rate. If it was triggered by duplicate ACKs, Onload employs a Fast Retransmit and Fast Recovery algorithm.
If Onload TCP receives 3 duplicate ACKs, this indicates that a segment has been lost, rather than just received out of order, and causes the immediate retransmission of the lost segment (Fast Retransmit). The continued reception of duplicate ACKs is an indication that traffic still flows within the network, and Onload will follow Fast Retransmit with Fast Recovery.
During Fast Recovery, Onload again resorts to the congestion avoidance algorithm (without Slow Start), with the congestion window size being halved from its present value.
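The window rules described above can be modelled in a few lines of C. This is a simplified sketch under stated assumptions (windows counted in bytes, only the Slow Start growth rule modelled, no timers); the function names are hypothetical and this is not Onload's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Initial congestion window: 2 x MSS. */
static uint32_t cwnd_initial(uint32_t mss)
{
    return 2 * mss;
}

/* Slow Start with Appropriate Byte Counting: grow by the number of bytes
 * acknowledged, but by no more than 2 x MSS for any single ACK. */
static uint32_t cwnd_on_ack_slow_start(uint32_t cwnd, uint32_t acked, uint32_t mss)
{
    uint32_t limit = 2 * mss;
    return cwnd + (acked < limit ? acked : limit);
}

/* On loss: halve the window, but never go below 2 x MSS. */
static uint32_t cwnd_on_loss(uint32_t cwnd, uint32_t mss)
{
    uint32_t half = cwnd / 2;
    return half > 2 * mss ? half : 2 * mss;
}

/* Bytes that may be in flight: min(CWND, receiver advertised window). */
static uint32_t send_window(uint32_t cwnd, uint32_t rwnd)
{
    return cwnd < rwnd ? cwnd : rwnd;
}

/* The third duplicate ACK triggers Fast Retransmit of the missing segment. */
static int should_fast_retransmit(unsigned dup_acks)
{
    return dup_acks == 3;
}
```

For a 1460-byte MSS, halving a 4000-byte window gives 2000 bytes, which is below the 2 x MSS floor, so `cwnd_on_loss()` returns 2920.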
Onload supports a number of environment variables that influence the behavior of the congestion window and recovery algorithms. Refer to Parameter Reference on page 146:
• EF_TCP_INITIAL_CWND - sets the initial size (bytes) of the congestion window.
• EF_TCP_LOSS_MIN_CWND - sets the minimum size of the congestion window following loss.
• EF_CONG_AVOID_SCALE_BACK - slows down the rate at which the TCP congestion window is opened, to help reduce loss in environments already suffering congestion and loss.
The congestion variables should be used with caution so as to avoid violating TCP protocol requirements and degrading TCP performance.

7.8 TCP SACK
Onload will employ TCP Selective Acknowledgment (SACK) if the option has been negotiated and agreed by both ends of a connection during the connection establishment 3-way handshake. Refer to RFC 2018 for further information.
7.9 TCP QUICKACK
TCP will generally aim to defer the sending of ACKs in order to minimize the number of packets on the network. Onload supports the standard TCP_QUICKACK socket option, which allows some control over this behavior. Enabling TCP_QUICKACK causes an ACK to be sent immediately in response to the reception of the following data packet. This is a one-shot operation, and TCP_QUICKACK self-clears to zero immediately after the ACK is sent.
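Because TCP_QUICKACK self-clears, an application that wants prompt ACKs typically re-enables it before each receive. A minimal sketch using the standard Linux socket option; the helper name `recv_quickack()` is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Re-arm the one-shot TCP_QUICKACK option, then receive. The option
 * clears itself once the ACK has been sent, so it is set on every call. */
static ssize_t recv_quickack(int fd, void *buf, size_t len)
{
    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one)) < 0)
        return -1;
    return recv(fd, buf, len, 0);
}
```

The extra setsockopt() per receive costs a little CPU time; whether that is worthwhile depends on how latency-sensitive the peer's sends are.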
7.10 TCP Delayed ACK
By default, TCP stacks delay sending acknowledgments (ACKs) to improve the efficiency and utilization of a network link. Delayed ACKs also improve receive latency by ensuring that ACKs are not sent on the critical path. However, if the sender of TCP packets is using Nagle’s algorithm, receive latency will be impaired by using delayed ACKs.
Using the EF_DELACK_THRESH environment variable, the user can specify how many TCP segments can be received before Onload will respond with a TCP ACK. Refer to the Parameter List on page 146 for details of the Onload environment delayed TCP ACK variables.
7.11 TCP Dynamic ACK
The sending of excessive TCP ACKs can impair performance and increase receive-side latency. Although TCP generally aims to defer the sending of ACKs, Onload also supports a further mechanism. The EF_DYNAMIC_ACK_THRESH environment variable allows Onload to dynamically determine when it is non-detrimental to throughput and efficiency to send a TCP ACK. Onload will force a TCP ACK to be sent if the number of TCP ACKs pending reaches the threshold value.
Refer to the Parameter List on page 146 for details of the Onload environment delayed TCP ACK variables.
NOTE: When used together with EF_DELACK_THRESH or EF_DYNAMIC_ACK_THRESH, the socket option TCP_QUICKACK will behave exactly as stated above. Both Onload environment variables identify the maximum number of segments that can be received before an ACK is returned. Sending an ACK before the specified maximum is reached is allowed.

NOTE: TCP ACKs should be transmitted at a sufficient rate to ensure the remote end does not drop the TCP connection.
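The interplay between the ACK thresholds and TCP_QUICKACK described in the NOTE above can be illustrated with a toy receiver model. It is a hypothetical sketch, not Onload code: `thresh` stands in for the value of EF_DELACK_THRESH or EF_DYNAMIC_ACK_THRESH, and timer-driven ACKs, which a real stack also sends, are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection receive state. */
struct ack_state {
    unsigned pending;   /* segments received since the last ACK */
    unsigned thresh;    /* maximum segments before an ACK must be sent */
    bool quickack;      /* one-shot request to ACK the next segment (TCP_QUICKACK) */
};

/* Called for each received data segment; returns true if an ACK is sent now. */
static bool on_segment(struct ack_state *s)
{
    s->pending++;
    if (s->quickack || s->pending >= s->thresh) {
        s->pending = 0;
        s->quickack = false;   /* TCP_QUICKACK self-clears after use */
        return true;
    }
    return false;
}
```

An early ACK requested through TCP_QUICKACK simply resets the pending count, which matches the NOTE: the thresholds are maxima, and ACKing sooner is allowed.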
7.12 TCP Loopback Acceleration
Onload supports the acceleration of TCP loopback connections, providing an accelerated mechanism through which two processes on the same host can communicate. Accelerated TCP loopback connections do not invoke system calls, reduce the overheads for read/write operations and offer improved latency over the kernel implementation.
The server and client processes that want to communicate using an accelerated TCP loopback connection do not need to be configured to share an Onload stack. However, the server and client TCP loopback sockets can only be accelerated if they are in the same Onload stack. Onload has the ability to move a TCP loopback socket between Onload stacks to achieve this.
TCP loopback acceleration is configured via the environment variables EF_TCP_CLIENT_LOOPBACK and EF_TCP_SERVER_LOOPBACK. As well as enabling TCP loopback acceleration, these environment variables control Onload’s behavior when the server and client sockets do not originate in the same Onload stack. This gives the user greater flexibility and control when establishing loopback on TCP sockets, either from the listening (server) socket or from the connecting (client) socket. The connecting socket can use any local address or specify the loopback address.
The following diagram illustrates the client and server loopback options. Refer to Parameter Reference on page 146 for a description of the loopback variables.

Figure 8: EF_TCP_CLIENT/SERVER_LOOPBACK
The client loopback option EF_TCP_CLIENT_LOOPBACK=4, when used with the server loopback option EF_TCP_SERVER_LOOPBACK=2, differs from the other loopback options: rather than moving sockets between existing stacks, it creates an additional stack and moves the sockets from both ends of the TCP connection into this new stack. This avoids the possibility of having many loopback sockets sharing and contending for the resources of a single stack.
When the client and server do not run under the same UID, set the environment variable EF_SHARE_WITH to allow both processes to share the newly created stack.
7.13 TCP Striping
Onload supports a Solarflare proprietary TCP striping mechanism that allows a single TCP connection to use both physical ports of a network adapter. Using the combined bandwidth of both ports means increased throughput for TCP streaming applications. TCP striping can be particularly beneficial for Message Passing Interface (MPI) applications.

If the TCP connection’s source IP address and destination IP address are on the same subnet, as defined by EF_STRIPE_NETMASK, then Onload will attempt to negotiate TCP striping for the connection. Onload TCP striping must be configured at both ends of the link.
TCP striping allows a single TCP connection to use the full bandwidth of both physical ports on the same adapter. This should not be confused with link aggregation/port bonding, in which any one TCP connection within the bond can only use a single physical port, and therefore more than one TCP connection would be required to realize the full bandwidth of two physical ports.
NOTE: TCP striping is disabled by default. To enable this feature, set the parameter CI_CFG_PORT_STRIPING=1 in the src/include/internal/transport_config_opt.h file in the Onload distribution source directory.
7.14 TCP Connection Reset on RTO
Under certain circumstances it may be preferable to avoid re-sending TCP data to a peer service when data delivery has been delayed. Once data has been sent and no acknowledgment has been received for it, the TCP retransmission timeout period represents a considerable delay. When the retransmission timeout (RTO) eventually expires, it may be preferable not to retransmit the original data.
Onload can be configured to reset a TCP connection rather than attempt to retransmit data for which no acknowledgment has been received.
This feature is enabled with the EF_TCP_RST_DELAYED_CONN per-stack environment variable and applies to all TCP connections in the Onload stack. On any TCP connection in the Onload stack, if the RTO timer expires before an ACK is received, the TCP connection will be reset.
7.15 ONLOAD_MSG_WARM
Applications that send data infrequently may see increased send latency compared to an application that is making frequent sends. This is due to the send path and associated data structures not being cache- and TLB-resident (which can occur even if the CPU has been otherwise idle since the previous send call).
Onload therefore supports applications repeatedly calling send to keep the TCP fast send path ‘warm’ in the cache without actually sending data. This is particularly useful for applications that only send infrequently, and helps to maintain low latency performance for those TCP connections that do not send often. These “fake” sends are performed by setting the ONLOAD_MSG_WARM flag when calling the TCP send calls. The message warm feature does not transmit any packets.
#include <onload/extensions.h>   /* defines ONLOAD_MSG_WARM */
char buf[10];
send(fd, buf, 10, ONLOAD_MSG_WARM);   /* fd: an accelerated TCP socket */
Onload stackdump supports new counters to indicate the level of message warm use:

• warm_aborted is a count of the number of times a message warm send function was called but the send path was not exercised due to Onload locking constraints.
• warm is a count of the number of times a message warm send function was called when the send path was exercised.
NOTE: If the ONLOAD_MSG_WARM flag is used on sockets which are not accelerated, including those handed off to the kernel by Onload, it may cause the message warm packets to be actually sent. This is due to a limitation in some Linux distributions, which appear to ignore this flag. The Onload extensions API can be used to check whether a socket supports the MSG_WARM feature via the onload_fd_check_feature() API (onload_fd_check_feature on page 191).
NOTE: Onload versions earlier than 201310 do not support the ONLOAD_MSG_WARM socket flag; therefore setting the flag will cause message warm packets to be sent.
7.16 Listen/Accept Sockets
TCP sockets accepted from a listening socket will share a wildcard filter with the parent socket. The following Onload module options can be used to control behavior when the parent socket is closed.
oof_shared_keep_thresh - default 100, is the number of accepted sockets sharing a wildcard filter that will cause the filter to persist after the listening socket has closed.
oof_shared_steal_thresh - default 200, is the number of sockets sharing a wildcard filter that will cause the filter to persist even when a new listening socket needs the filter.
If the listening socket is closed, the behavior depends on the number of remaining accepted sockets, as follows:
Number of accepted sockets | Onload Action
More than oof_shared_keep_thresh but fewer than oof_shared_steal_thresh | Retain the wildcard filter shared by all accepted sockets. If a new listening socket requires the filter, Onload will install a full-match filter for each accepted socket, allowing the listening socket to use the wildcard filter.
More than oof_shared_steal_thresh | Retain the wildcard filter shared by all accepted sockets. A new listening socket can be created but a filter cannot be installed, meaning the socket will receive no traffic until the number of accepted connections is reduced.
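The table can be read as a simple classification on the number of remaining accepted sockets. The sketch below is a hypothetical restatement, not Onload code; counts exactly equal to either threshold, and counts at or below oof_shared_keep_thresh, are not covered by the table and are returned as unspecified.

```c
#include <assert.h>

/* Hypothetical restatement of the table above. The thresholds are passed
 * in; the module option defaults quoted in the text are 100 and 200. */
enum filter_action {
    FILTER_UNSPECIFIED,       /* case not covered by the table */
    FILTER_RETAIN_HANDOVER,   /* keep; a new listener triggers full-match filters */
    FILTER_RETAIN_EXCLUSIVE   /* keep; a new listener gets no filter */
};

static enum filter_action on_listener_close(unsigned accepted,
                                            unsigned keep_thresh,
                                            unsigned steal_thresh)
{
    if (accepted > steal_thresh)
        return FILTER_RETAIN_EXCLUSIVE;
    if (accepted > keep_thresh && accepted < steal_thresh)
        return FILTER_RETAIN_HANDOVER;
    return FILTER_UNSPECIFIED;
}
```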

7.17 Socket Caching
Socket caching means Onload can further reduce the overhead of setting up new TCP connections by reusing existing sockets instead of creating new ones.
A cached socket retains a file descriptor and socket buffer when it is returned to the cache of the Onload stack from which it originated.
Socket caching is enabled when EF_SOCKET_CACHE_MAX is set to a value greater than zero. Onload will decide whether to apply passive or active caching depending on the type of sockets created by the user application.
EF_SOCKET_CACHE_MAX applies to both active and passive sockets, i.e. if set to 100, the cache limit is 100 of each socket type.
TCP Passive Socket Caching
Passive socket caching, supported from the Onload 201502 release, means Onload will re-use socket buffers and file descriptors from passive-open (listening) sockets. This can improve the accept rate of active-open TCP connections and will benefit processes which need to accept lots of connections from these listening sockets.
TCP Active Socket Caching
Active socket caching, supported from the Onload 201509 release, means Onload will re-use socket buffers and file descriptors from active-open sockets when an established TCP connection has terminated.
Active-open sockets setting the IP_TRANSPARENT socket option can be cached.
Caching Stackdump
Onload stackdump can be used to monitor caching activity on Onload stacks.
# onload_stackdump lots [| grep cache]
Counter | Description
active cache: hit=0 avail=0 cache=EMPTY pending=EMPTY | TCP socket caching: hit = number of cache hits (were cached); avail = number of sockets available for caching; cache and pending = current cache state
sockcache_cached | Number of sockets cached over the lifetime of the stack
sockcache_contention | Number of sockets not cached due to lock contention
passive_sockcache_stacklim | Number of passive sockets not cached due to stack limit
active_sockcache_stacklim | Number of active sockets not cached due to stack limit
sockcache_socklim | Number of sockets not cached due to socket limit
sockcache_hit | Number of socket cache hits (were cached)
sockcache_hit_reap | Number of socket cache hits (were cached) after reaping
sockcache_miss_intmismatch | Number of socket cache misses due to mismatched interfaces
activecache_cached | Number of active sockets cached over the lifetime of the stack
activecache_stacklim | Number of active sockets not cached due to stack limit
activecache_hit | Number of active socket cache hits (were cached)
activecache_hit_reap | Number of active socket cache hits (were cached) after reaping
Caching - Requirements
There are some necessary prerequisites when using socket caching:
• set EF_UL_EPOLL=3 and set EF_FDS_MT_SAFE=1
• socket caching is not supported after fork()
• sockets that have been dup()ed will not be cached
• sockets that use the O_ASYNC or O_APPEND modes will not be cached
• caching offers no benefit if a single socket accepts connections on multiple local addresses (applicable to passive caching only)
• set O_NONBLOCK or O_CLOEXEC, if required, on the socket when creating the socket.
When socket caching cannot be enabled, sockets will be processed as normal Onload sockets.
Users should refer to details of the following environment variables:
• EF_SOCKET_CACHE_MAX
• EF_PER_SOCKET_CACHE_MAX
• EF_SOCKET_CACHE_PORTS
NOTE: Allowing more sockets to be cached than there are file descriptors available can result in drastically reduced performance. Users should consider that the socket cache limit, EF_SOCKET_CACHE_MAX, applies per stack, unlike the per-process EF_SOCKET_CACHE_PORTS limits.
Refer to Parameter Reference on page 146 for details of Onload environment variables.
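Two of the caching requirements above, no O_ASYNC and no O_APPEND, depend on file status flags that an application can inspect with fcntl(). The helper `cache_flags_ok()` below is a hypothetical pre-check; it cannot detect the other conditions (fork(), dup(), or the EF_UL_EPOLL and EF_FDS_MT_SAFE settings).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>

/* Returns 1 if neither O_ASYNC nor O_APPEND is set on fd (so these two
 * requirements do not prevent caching), 0 if either is set, and -1 if
 * fcntl() fails (for example, an invalid descriptor). */
static int cache_flags_ok(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;
    return (fl & (O_ASYNC | O_APPEND)) ? 0 : 1;
}
```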

7.18 Scalable Filters
Using scalable filters, an Onload stack can install a MAC filter to receive all traffic from a specified interface.
NOTE: Once the MAC filter is inserted on an interface, ARP, ICMP and IGMP traffic is directed to the kernel, but all other traffic is directed to a single Onload stack.
Using scalable filters removes limitations on:
• the number of listening sockets in scalable filters passive mode
• the number of active-open connections in scalable filters transparent-active mode. This works only for sockets that have the IP_TRANSPARENT option set. See Transparent Reverse Proxy Modes on page 84 below.
It is suggested that a dedicated interface is used by the stack inserting the MAC filter. This allows the kernel stack or another application using scalable filters to use the same physical port.
The Solarflare SFN7000 series adapter can be partitioned to expose up to 16 PCIe physical functions (PF). Each PF is presented to the OS as a standard network interface. The adapter is partitioned with the sfboot utility; see the example below.
Once a MAC filter has been installed on a PF, other Onload stacks can still receive other traffic on the same PF, but sockets will have to insert IP filters for the required traffic. Apart from ARP, ICMP and IGMP packets, OS kernel sockets using the same PF will not receive any traffic.
Per interface, the MAC filter can only be installed by a single Onload stack. If a process creates multiple stacks, the EF_SCALABLE_FILTERS_ENABLE per-stack variable can be used to enable/disable this feature for individual stacks using the existing Onload extensions API, e.g.
onload_stack_opt_set_int("EF_SCALABLE_FILTERS_ENABLE", 1);
The MAC filter is inserted when the stack is created, i.e. before sockets are created, and sockets need to be created to receive any traffic destined for this stack.

Scalable Filters - Restrictions
• Scalable filters are only used for TCP traffic.
• UDP traffic can be received and accelerated by Onload on interfaces where scalable filters are enabled, but kernel UDP sockets will not receive traffic.
• UDP fragmented frames cannot be received on interfaces where scalable filters are enabled. Users should avoid having fragmented frames on these interfaces.
• The adapter must use the full-feature or low-latency firmware variants.
• Minimum firmware version: 4.6.5.1000.
• Stack per thread options (EF_STACK_PER_THREAD) cannot be used with this feature.
• By default the scalable filters feature requires CAP_NET_RAW. Onload can be configured to avoid capability checks for this using the Onload module option scalable_filter_gid. See Module Options on page 143 for details.
Scalable Filters - Configuration
To enable scalable filters on a specific interface:
EF_SCALABLE_FILTERS=enps0f0
Per interface, the MAC filter can only be installed by a single Onload stack. A cluster (see Application Clustering on page 63) might have multiple stacks, and each stack could install a MAC filter on a different interface.
Sockets must be bound to the IP address of the interface.
This feature is targeted at TCP listening sockets only, and connections accepted from a listening socket will share the MAC filter.
Partition the NIC
The sfboot utility is available in the Solarflare Linux Utilities package (SF-107601-LS). The following example demonstrates how to partition the adapter to expose more than one PF (a cold reboot of the server is needed after changes using sfboot).
# sfboot pf-count=2 vf-count=0 switch-mode=partitioning
Scalable Filters and Bonding
Bonded interfaces created with the standard Linux bonding or teaming driver can be used for scalable filters.
Every interface that is part of the bond must be present in the system when the scalable filters stack is created. Removing the bond will cause the scalable filter to stop receiving traffic. After a new bond interface is created, the application must be restarted to use the bond.

7.19 Transparent Reverse Proxy Modes
Enhancements such as Scalable Filters, Socket Caching and support for the IP_TRANSPARENT socket option allow Onload to operate with greater efficiency and increased scalability in transparent reverse proxy mode server deployments. These features reduce to a minimum the overheads associated with creating and connecting transparent sockets. Onload can use up to 2 million transparent active-open sockets per Onload stack.
A transparent socket is created when a socket sets the IP_TRANSPARENT socket option and explicitly binds to an IP address and port. The IP address can be on a foreign host. IP_TRANSPARENT must be set before the bind.
The EF_SCALABLE_FILTERS variable is used to enable scalable filters and to configure the transparent proxy mode.
Restrictions
• The IP_TRANSPARENT option must be set before the socket is bound.
• The IP_TRANSPARENT option cannot be cleared after bind on accelerated sockets.
• IP_TRANSPARENT sockets cannot be accelerated if they are bound to port 0 or to INADDR_ANY.
• IP_TRANSPARENT sockets cannot be passed to the kernel stack when bound to a port that is in the list specified by EF_FORCE_TCP_REUSEPORT.
• When using the rss:transparent_active mode (see below), EF_CLUSTER_NAME must be explicitly set by the process sharing the cluster AND the stack cannot be named by either EF_NAME or onload_set_stackname().
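The ordering and binding restrictions above can be sketched with standard Linux calls: IP_TRANSPARENT is set before bind(), and port 0 and INADDR_ANY are rejected up front because Onload cannot accelerate such sockets. The helper name `transparent_bind()` and the use of 192.0.2.10 (a documentation address) are illustrative assumptions. On Linux, setting IP_TRANSPARENT requires elevated privileges, so an unprivileged process sees EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, set IP_TRANSPARENT *before* bind(), then bind to an
 * explicit (possibly non-local) address and port. Returns 0 and stores the
 * descriptor in *fd_out, or returns -errno on failure. */
static int transparent_bind(const char *ip, unsigned short port, int *fd_out)
{
    struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(port) };
    int one = 1, fd, err;

    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -EINVAL;
    /* Port 0 and INADDR_ANY cannot be accelerated with IP_TRANSPARENT. */
    if (port == 0 || sa.sin_addr.s_addr == htonl(INADDR_ANY))
        return -EINVAL;
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        err = errno;
        close(fd);
        return -err;
    }
    *fd_out = fd;
    return 0;
}
```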
Config (example) Settings
Below are examples of configurations using the EF_SCALABLE_FILTERS environment option to set transparent proxy modes.
• Enable scalable filters on interface p1p1. This inserts a MAC address filter on the adapter. The filter is shared by all active-open connections on the interface. Socket caching will be applied to the passive side of the TCP connection.
EF_SCALABLE_FILTERS=p1p1=passive
• Enable scalable filters on enps0f0. All sockets using this interface that have the IP_TRANSPARENT flag set will then use the MAC filter, while other sockets will continue to use normal IP filters on this interface. Socket caching will be applied to the active side of a TCP connection:
EF_SCALABLE_FILTERS=enps0f0=transparent_active

• As for the example above, but uses symmetrical RSS to ensure that requests/responses between clients and backend servers are processed by the same thread.
EF_SCALABLE_FILTERS=enps0f0=rss:transparent_active
• Enable scalable filters on enps0f0. All sockets using this interface that have the IP_TRANSPARENT flag set will then use the MAC filter, while other sockets will continue to use normal IP filters on this interface. Socket buffers are cached from both the active and passive sides of the TCP connection.
EF_SCALABLE_FILTERS=enps0f0=transparent_active:passive
7.20 Transparent Reverse Proxy on Multiple CPUs
Used together with Application Clustering, transparent scalable modes can deliver linear scalability using multiple CPU cores.
This uses RSS to distribute traffic, both upstream and downstream of the proxy application, mapping streams to the correct Onload stack. When each CPU core is associated exclusively with a single clustered stack, there can be no contention between stacks.
For this use case to function correctly, the proxy application must use the downstream client address:port on the upstream (to server) side of the TCP connection. In this way RSS and hardware filters ensure that the client side and server side are handled by the same worker thread and that traffic is directed to the correct stack.
In this scenario the client thinks it communicates directly with the server, and the server thinks it communicates directly with the client; the transparent proxy server is ‘transparent’.
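The guide does not describe how Onload's symmetrical RSS is configured. As an illustration of the general technique only, the sketch below computes a Toeplitz RSS hash over an IPv4 4-tuple using a key that repeats 0x6d5a every 16 bits. With such a key, swapping source and destination (addresses and ports) yields the same hash, so both directions of a flow select the same queue. The key choice and the function names are assumptions, not Onload's configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard Toeplitz hash, as used for RSS. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    /* 32-bit window over the key, initially key bits 0..31. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | key[3];
    size_t next = 4;   /* key byte supplying the next bit to shift in */

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                hash ^= window;          /* input bit set: XOR current window */
            window <<= 1;                /* slide the window one key bit */
            if (next < key_len && (key[next] & (1u << b)))
                window |= 1;
        }
        next++;
    }
    return hash;
}

/* Hash an IPv4 4-tuple (host byte order in) with a 16-bit-periodic key. */
static uint32_t rss_hash_v4(uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport)
{
    uint8_t key[40], tuple[12];
    for (size_t i = 0; i < sizeof(key); i += 2) {
        key[i] = 0x6d;
        key[i + 1] = 0x5a;
    }
    /* Serialize in network (big-endian) order: saddr, daddr, sport, dport. */
    for (int k = 0; k < 4; k++) {
        tuple[k]     = (uint8_t)(saddr >> (24 - 8 * k));
        tuple[4 + k] = (uint8_t)(daddr >> (24 - 8 * k));
    }
    tuple[8]  = (uint8_t)(sport >> 8);
    tuple[9]  = (uint8_t)sport;
    tuple[10] = (uint8_t)(dport >> 8);
    tuple[11] = (uint8_t)dport;
    return toeplitz_hash(key, sizeof(key), tuple, sizeof(tuple));
}
```

When the proxy reuses the client address:port upstream, the server's replies carry the reversed 4-tuple of the client's packets, so a symmetric hash steers them to the same queue, and hence to the same clustered stack.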

8 Onload - UDP
8.1 UDP Operation
The table below identifies the Onload UDP implementation RFC compliance.
RFC | Title | Compliance
768 | User Datagram Protocol | Yes
1122 | Requirements for Hosts | Yes
3678 | Socket Interface Extensions for Multicast Source Filters | Partial. See Source Specific Socket Options on page 88.
8.2 Socket Options
Onload UDP supports the following socket options, which can be used in the setsockopt() and getsockopt() function calls.
Option | Description
SO_PROTOCOL | retrieve the socket protocol as an integer.
SO_BINDTODEVICE | bind this socket to a particular network interface. See SO_BINDTODEVICE on page 57.
SO_BROADCAST | when enabled, datagram sockets can send and receive packets to/from a broadcast address.
SO_DEBUG | enable protocol debugging.
SO_DONTROUTE | outgoing data should be sent on whatever interface the socket is bound to and not routed via another interface.
SO_ERROR | the errno value of the last error occurring on the socket. (Only valid as a getsockopt()).
SO_EXCLUSIVEADDRUSE | prevents other sockets using the SO_REUSEADDR option to bind to the same address and port.
SO_LINGER | when enabled, a close() or shutdown() will not return until all queued messages for the socket have been successfully sent or the linger timeout has been reached. Otherwise the call returns immediately and sockets are closed in the background.
SO_PRIORITY | set the priority for all packets sent on this socket. Packets with a higher priority may be processed first depending on the selected device queuing discipline.
SO_RCVBUF | sets or gets the maximum socket receive buffer in bytes. The value set is doubled by the kernel and by Onload to allow for bookkeeping overhead when it is set by the setsockopt() function call. Note that EF_UDP_RCVBUF overrides this value. Setting SO_RCVBUF to a value smaller than the MTU can result in poorer performance and is not recommended.
SO_RCVLOWAT | sets the minimum number of bytes to process for socket input operations.
SO_RECVTIMEO | sets the timeout for an input function to complete.
SO_REUSEADDR | can reuse local ports, i.e. another socket can bind to the same port number except when there is an active listening socket bound to the port.
SO_REUSEPORT | allows multiple sockets to bind to the same port.
SO_SNDBUF | sets or gets the maximum socket send buffer in bytes. The value set is doubled by the kernel and by Onload to allow for bookkeeping overhead when it is set by the setsockopt() function call. Note that EF_UDP_SNDBUF overrides this value.
SO_SNDLOWAT | sets the minimum number of bytes to process for socket output operations. Always set to 1 byte.
SO_SNDTIMEO | set the timeout for a sending function to send before reporting an error.
SO_TIMESTAMP | enable or disable receiving the SO_TIMESTAMP control message (microsecond resolution). See SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps) on page 55.
SO_TIMESTAMPNS | enable or disable receiving the SO_TIMESTAMPNS control message (nanosecond resolution). See SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps) on page 55.
SO_TIMESTAMPING | enable/disable hardware timestamps for received packets. See SO_TIMESTAMPING (Hardware Receive Timestamps) on page 55.
SOF_TIMESTAMPING_TX_HARDWARE | obtain a hardware generated transmit timestamp.
SOF_TIMESTAMPING_SYS_HARDWARE | obtain a hardware transmit timestamp adjusted to the system time base.
SO_TYPE | returns the socket type (SOCK_STREAM or SOCK_DGRAM). (Only valid as a getsockopt()).
8.3 Source Specific Socket Options
The following table identifies source specific socket options supported from onload-201210-u1 onwards. Refer to the release notes for Onload specific behavior regarding these options.
Option | Description
IP_ADD_SOURCE_MEMBERSHIP | Join the supplied multicast group on the given interface and accept data from the supplied source address.
IP_DROP_SOURCE_MEMBERSHIP | Drops membership to the given multicast group, interface and source address.
MCAST_JOIN_SOURCE_GROUP | Join a source specific group.
MCAST_LEAVE_SOURCE_GROUP | Leave a source specific group.
8.4 UDP Send and Receive Paths
For each UDP socket, Onload creates both an accelerated socket and a kernel socket. There is usually no file descriptor for the kernel socket visible in the user’s file descriptor table. When a UDP process is ready to transmit data, Onload will check a cached ARP table which maps IP addresses to MAC addresses. A cache ‘hit’ results in sending via the Onload accelerated socket. A cache ‘miss’ results in a syscall to populate the user mode cached ARP table. If no MAC address can be identified via this process, the packet is sent via the kernel stack to provoke ARP resolution. Therefore, it is possible that some UDP traffic will occasionally be sent via the kernel stack.

Figure 9: UDP Send and Receive Paths
Figure 9 illustrates the UDP send and receive paths. Lighter arrows indicate the accelerated ‘kernel bypass’ path. Darker arrows identify fragmented UDP packets received by the Solarflare adapter and UDP packets received from a non-Solarflare adapter. UDP packets arriving at the Solarflare adapter are filtered on source and destination address and port number to identify a VNIC the packet will be delivered to. Fragmented UDP packets are received by the application via the kernel UDP socket. UDP packets received by a non-Solarflare adapter are always received via the kernel UDP socket.
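The transmit decision described in 8.4 can be summarised as a three-way choice. The sketch below is a hypothetical model, not Onload's code: the user-mode ARP cache and the kernel's neighbour table are toy arrays, and the lookup in `kernel_table` stands in for the system call that repopulates the user-mode cache.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the UDP transmit-path choice; not Onload code. */
enum tx_path { TX_ACCELERATED, TX_KERNEL };

struct arp_entry {
    uint32_t ip;   /* IPv4 address (host order) */
    int valid;     /* nonzero if a MAC address is known for ip */
};

static int arp_lookup(const struct arp_entry *table, int n, uint32_t ip)
{
    for (int i = 0; i < n; i++)
        if (table[i].valid && table[i].ip == ip)
            return 1;
    return 0;
}

static enum tx_path choose_tx_path(const struct arp_entry *user_cache, int n_user,
                                   const struct arp_entry *kernel_table, int n_kernel,
                                   uint32_t dst)
{
    if (arp_lookup(user_cache, n_user, dst))
        return TX_ACCELERATED;   /* cache hit: send via the accelerated socket */
    if (arp_lookup(kernel_table, n_kernel, dst))
        return TX_ACCELERATED;   /* miss, but the (modelled) syscall found the MAC */
    return TX_KERNEL;            /* unresolved: send via the kernel to provoke ARP */
}
```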
8.5 Fragmented UDP
When sending datagrams which exceed the MTU, the Onload stack will send multiple Ethernet packets. On hosts running Onload, fragmented datagrams are always received via the kernel stack.
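Since fragmented datagrams are always received via the kernel stack on Onload hosts, it can be useful to know when a datagram will be fragmented. The sketch below applies the standard IPv4 fragmentation arithmetic, assuming a 20-byte IPv4 header with no options; the function name is hypothetical.

```c
#include <assert.h>

/* Number of IPv4 fragments needed for one UDP datagram carrying
 * payload_len bytes, assuming a 20-byte IPv4 header with no options.
 * Every fragment except the last carries a multiple of 8 payload bytes. */
static unsigned udp_fragment_count(unsigned payload_len, unsigned mtu)
{
    unsigned ip_payload = payload_len + 8;       /* 8-byte UDP header + data */
    unsigned per_frag = ((mtu - 20) / 8) * 8;    /* usable bytes per fragment */
    if (ip_payload <= mtu - 20)
        return 1;                                /* fits: not fragmented */
    return (ip_payload + per_frag - 1) / per_frag;
}
```

With a 1500-byte MTU, the largest unfragmented UDP payload is 1472 bytes; a 4000-byte payload needs 3 fragments.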
8.6 User-Level recvmmsg for UDP
The recvmmsg() function is intercepted for UDP sockets which are accelerated by Onload.
The Onload user-level recvmmsg() is available to systems that do not have kernel/libc support for this function. recvmmsg() is not supported for TCP sockets.
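A minimal sketch of batched UDP receive with the standard Linux recvmmsg() call. When the application runs under Onload, the call on an accelerated UDP socket is intercepted as described above. The helper name `recv_batch()` and the batch and buffer sizes are illustrative assumptions.

```c
#define _GNU_SOURCE            /* recvmmsg(), struct mmsghdr, MSG_WAITFORONE */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH  8
#define MAXLEN 1500

/* Receive up to BATCH datagrams in one call: block for the first, then
 * take whatever else is already queued. Returns the number of datagrams
 * received (or -1); lens[i] receives the length of datagram i. */
static int recv_batch(int fd, char bufs[BATCH][MAXLEN], unsigned lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = MAXLEN;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    int n = recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```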

8.7 User-Level sendmmsg for UDP
The sendmmsg() function is intercepted for UDP sockets which are accelerated by Onload.
The Onload user-level sendmmsg() is available to systems that do not have kernel/libc support for this function. sendmmsg() is not supported for TCP sockets.
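A matching sketch for batched transmit with the standard Linux sendmmsg() call; the helper name `send_batch()` and the limit of 16 datagrams per call are illustrative assumptions.

```c
#define _GNU_SOURCE            /* sendmmsg(), struct mmsghdr */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send 'count' NUL-terminated payloads (count <= 16) to 'dst' as separate
 * datagrams with a single sendmmsg() call. Returns the number of datagrams
 * sent, or -1 on error. */
static int send_batch(int fd, const struct sockaddr_in *dst,
                      const char *const payloads[], unsigned count)
{
    struct mmsghdr msgs[16];
    struct iovec iov[16];
    if (count > 16)
        return -1;
    memset(msgs, 0, sizeof(msgs));
    for (unsigned i = 0; i < count; i++) {
        iov[i].iov_base = (void *)payloads[i];
        iov[i].iov_len = strlen(payloads[i]);
        msgs[i].msg_hdr.msg_name = (void *)dst;
        msgs[i].msg_hdr.msg_namelen = sizeof(*dst);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs, count, 0);
}
```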
8.8 Multicast Replication
The Solarflare SFN7000 series adapters support multicast replication, where received packets are replicated in hardware and delivered to multiple receive queues. This feature allows any number of Onload clients listening to the same multicast data stream to receive their own copy of the packets, without an additional software copy and without the need to share Onload stacks. As illustrated below, the packets are delivered multiple times by the controller, once to each receive queue that has installed a hardware filter to receive the specified multicast stream.
Figure 10: Hardware Multicast Replication
Multicast replication is performed in the adapter transparently and does not need to be explicitly enabled.
This feature removes the need to share Onload stacks using the EF_NAME environment variable. Users using EF_NAME exclusively for sharing multicast traffic can now remove EF_NAME from their configurations.

8.9 Multicast Operation and Stack Sharing
To illustrate shared stacks, the following examples describe Onload behavior when two processes on the same host subscribe to the same multicast stream:
• Multicast Receive Using Different Onload Stacks on page 91
• Multicast Transmit Using Different Onload Stacks on page 92
• Multicast Receive Sharing an Onload Stack on page 92
• Multicast Transmit Sharing an Onload Stack on page 93
• Multicast Receive - Onload Stack and Kernel Stack on page 93.
NOTE: The following subsections use two processes to demonstrate Onload behavior. In practice, multiple processes can share the same Onload stack. Stack sharing is not limited to multicast subscribers and can be employed by any TCP and UDP applications.
Multicast Receive Using Different Onload Stacks
Running on SFN5000 or SFN6000 series adapters (for SFN7000 series adapters, see Multicast
Replication above), Onload will notice if two Onload stacks on the same host
subscribe to the same multicast stream and will respond by redirecting the stream
to go through the kernel. Handing the stream to the kernel, though still using Onload
stacks, allows both subscribers to receive the datagrams, but user-space
acceleration is lost and the receive rate is lower than it could otherwise be. Figure 11
below illustrates the configuration. Arrows indicate the receive path and
fragmented UDP path.
Figure 11: Multicast Receive Using Different Onload Stacks.
The reason for this behavior is that the Solarflare NIC will not deliver a single
received multicast packet multiple times to multiple stacks; the packet is delivered
only once. If a received packet is delivered to kernel space, then the kernel TCP/IP
stack will copy the received data multiple times, once to each socket listening on the
corresponding multicast stream. If the received packet were delivered directly to
Onload, where the stacks are mapped to user space, it would only be delivered to a
single subscriber of the multicast stream.
Multicast Transmit Using Different Onload Stacks
Referring to Figure 11, if one process were to transmit multicast datagrams, these
would not be received by the second process. Onload is only able to accelerate
transmitted multicast datagrams when they do not need to be delivered to other
applications in the same host. More precisely, the multicast stream can only be
delivered within the same Onload stack.
Onload changes the default value of the IP_MULTICAST_LOOP socket
option to 0 rather than 1. This change allows Onload to accelerate multicast transmit
for most applications, but means that multicast traffic is not delivered to other
applications on the same host unless the subscriber sockets are in the same stack.
The normal behavior can be restored by setting EF_FORCE_SEND_MULTICAST=0, but
this limits multicast acceleration on transmit to sockets that have manually set the
IP_MULTICAST_LOOP socket option to zero.
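With EF_FORCE_SEND_MULTICAST=0, only sockets that explicitly set IP_MULTICAST_LOOP to zero keep transmit acceleration. A minimal sketch of that call on a UDP socket follows; the helper name disable_mcast_loop() is hypothetical and the call itself is the standard Linux setsockopt().

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Explicitly disable multicast loopback on a UDP socket.
 * With EF_FORCE_SEND_MULTICAST=0 this is what keeps the socket's multicast
 * transmits eligible for acceleration; the trade-off is that other
 * applications on the same host will not see these datagrams.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int disable_mcast_loop(int fd)
{
    int loop = 0;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}
```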
Multicast Receive Sharing an Onload Stack
Setting the EF_NAME environment variable to the same string (max 8 chars) in both
processes means they can share an Onload stack. The stream is no longer redirected
through the kernel, resulting in a much higher receive rate than can be observed with
the kernel TCP/IP stack (or with separate Onload stacks where the data path is via
the kernel TCP/IP stack). This configuration is illustrated in Figure 12 below. Lighter
arrows indicate the accelerated (kernel bypass) path. Darker arrows indicate the
fragmented UDP path.
Figure 12: Sharing an Onload Stack
Multicast Transmit Sharing an Onload Stack
Referring to Figure 12, datagrams transmitted by one process would be received by
the second process because both processes share the Onload stack.
Multicast Receive - Onload Stack and Kernel Stack
If a multicast stream is being accelerated by Onload, and another application that is
not using Onload subscribes to the same stream, then the second application will
not receive the associated datagrams. Therefore, if multiple applications subscribe
to a particular multicast stream, either all or none should be run with Onload.
To enable multiple applications accelerated with Onload to subscribe to the same
multicast stream, the applications must share the same Onload stack. Stack sharing
is achieved by using the EF_NAME environment variable (max 8 chars).
Multicast Receive and Multiple Sockets
When multiple sockets join the same multicast group, received packets are
delivered to these sockets in the order that they joined the group.
When multiple sockets are created by different threads and all threads are spinning
on recv(), the thread that is able to receive first will also deliver the packets to
the other sockets.
If a thread ‘A’ is spinning on poll(), and another thread ‘B’, listening to the same
group, calls recv() but does not spin, ‘A’ will notice a received packet first and
deliver the packet to ‘B’ without an interrupt occurring.
8.10 Multicast Loopback
The socket option IP_MULTICAST_LOOP controls whether multicast traffic sent on a
socket can be received locally on the machine. With Onload, the default value of the
IP_MULTICAST_LOOP socket option is 0 (the kernel stack defaults
IP_MULTICAST_LOOP to 1). Therefore, by default with Onload, multicast traffic sent
on a socket will not be received locally.
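Because the default differs between Onload and the kernel stack, an application that relies on local delivery may want to check the effective setting rather than assume it. A minimal sketch using the standard getsockopt() call; the helper name mcast_loop_enabled() is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Report the current IP_MULTICAST_LOOP setting of a socket:
 * 1 = loopback on, 0 = off, -1 = getsockopt() failed.
 * On a fresh socket this is 1 under the kernel stack; for a socket
 * accelerated by Onload the documented default is 0. */
int mcast_loop_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, &len) != 0)
        return -1;
    return val != 0;
}
```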
As well as setting IP_MULTICAST_LOOP to 1, receiving multicast traffic locally
requires both the sender and receiver to be using the same Onload stack. Therefore,
when a receiver is in the same application as the sender, it will receive multicast
traffic. If sender and receiver are in different applications, then both must be running
Onload and must be configured to share the same Onload stack.
For two processes to share an Onload stack, both must set the same value for the
EF_NAME parameter (max 8 chars). If one local process is to receive the data sent by
a sending local process, EF_MCAST_SEND must be set to 1 or 3 on the thread that
creates the stack.
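The values quoted for EF_MCAST_SEND in this guide (1 or 3 for local delivery through a shared stack here; 2 or 3 for sending looped back traffic in Hardware Multicast Loopback below) are consistent with reading the variable as two independent bit flags. The sketch below encodes that reading; it is an inference from those values, not Onload source code, and the struct and function names are hypothetical. Consult the Parameter Reference for the authoritative definition.

```c
#include <stdbool.h>

/* Hypothetical decoding of EF_MCAST_SEND, inferred from this guide:
 *   1 or 3 -> a local receiver sharing the stack can get the sender's data
 *   2 or 3 -> hardware multicast loopback sending is enabled
 * i.e. treated here as two independent bit flags. */
struct mcast_send_cfg {
    bool local_stack_loop;   /* bit 0 */
    bool hw_loop_send;       /* bit 1 */
};

struct mcast_send_cfg decode_ef_mcast_send(int value)
{
    struct mcast_send_cfg cfg;

    cfg.local_stack_loop = (value & 1) != 0;
    cfg.hw_loop_send = (value & 2) != 0;
    return cfg;
}
```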
Users of earlier Onload versions and users of EF_MULTICAST_LOOP_OFF should refer
to the Parameter Reference on page 146 for details of
deprecated features.
8.11 Hardware Multicast Loopback
Hardware Multicast Loopback, available from openonload-201405, is an alternative to the
Onload stack sharing scheme described in Multicast Loopback. It enables the
passing of multicast traffic between Onload stacks, allowing applications running on
the same server to benefit from Onload acceleration without the need to share an
Onload stack, thereby reducing the risk of stack lock and resource contention.
Figure 13: Hardware Multicast Loopback
• Only available on the Solarflare Flareon SFN7000 series adapters.
• Adapters must have a minimum firmware version v4.0.7.6710, and “full
featured” firmware must be selected using the firmware-variant option via
the “sfboot” utility. Refer to the Solarflare Server User Guide ‘sfboot
parameters’ for further details.
Hardware Multicast Loopback allows data generated by one process to be received
by another process on the same host. Multicast Replication does not support local
loopback.
Reception of looped back traffic is enabled by default on a per Onload stack basis. A
stack can choose not to receive looped back traffic by setting the environment
variable EF_MCAST_RECV_HW_LOOP=0.
NOTE: Hardware Multicast Loopback is enabled through a single hardware filter.
For this reason, if any single process chooses to receive multicast loopback traffic
by setting EF_MCAST_RECV_HW_LOOP=1, then all other processes joined to the same
multicast group will also receive the loopback traffic regardless of their setting for
EF_MCAST_RECV_HW_LOOP.
Sending of looped back traffic is disabled by default. On a per-stack basis this feature
can be enabled by setting the environment variable EF_MCAST_SEND to either 2 or 3.
Setting the socket option IP_MULTICAST_TTL=0 will disable the sending of traffic on the
normal network path and prevent traffic being looped back. The value of the socket
option IP_MULTICAST_LOOP has no effect on Hardware Multicast Loopback. Refer
to Onload and IP_MULTICAST_TTL on page 119 for differences in Linux kernel and
Onload behavior.
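A minimal sketch of setting the multicast TTL to 0 on a UDP socket with the standard setsockopt() call. The helper name set_mcast_ttl_zero() is hypothetical; how Onload acts on this value is described in the paragraph above and in Onload and IP_MULTICAST_TTL, and is not modelled by the code.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Set IP_MULTICAST_TTL to 0 on a UDP socket.
 * Per this section, under Onload a TTL of 0 disables sending on the
 * normal network path; see "Onload and IP_MULTICAST_TTL" for how this
 * differs from the Linux kernel.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_mcast_ttl_zero(int fd)
{
    int ttl = 0;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
}
```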
8.12 IP_MULTICAST_ALL
For an accelerated socket, Onload will always behave as if IP_MULTICAST_ALL=0.
There is always the potential for messages to arrive at the host (perhaps from a
non-Solarflare interface or via the loopback interface) that will also be delivered
to the socket under normal UDP port matching rules, so the socket could receive
traffic for groups not explicitly joined on this socket.
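For sockets that are not accelerated (the Linux default for IP_MULTICAST_ALL is 1), an application can request the same behavior explicitly so that accelerated and non-accelerated runs match more closely. This does not remove the caveat above about traffic arriving by other paths. A minimal sketch; IP_MULTICAST_ALL is Linux-specific, the fallback value 49 is taken from the Linux UAPI header <linux/in.h>, and the helper name is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_MULTICAST_ALL
#define IP_MULTICAST_ALL 49   /* Linux value from <linux/in.h>, for older headers */
#endif

/* Ask the kernel to deliver to this socket only multicast groups that the
 * socket itself has joined, matching what Onload applies to accelerated
 * sockets. Returns 0 on success, -1 on error (errno set by setsockopt). */
int restrict_to_joined_groups(int fd)
{
    int all = 0;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &all, sizeof(all));
}
```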
9 Packet Buffers
9.1 Introduction
Packet buffers describe the memory used by the Onload stack (and Solarflare
adapter) to receive, transmit and queue network data. Packet buffers provide a
method for user-mode accessible memory to be directly accessed by the network
adapter without compromising system integrity.
Onload will request huge pages, if these are available, when allocating memory for
packet buffers. Using huge pages can lead to improved performance for some
applications by reducing the number of Translation Lookaside Buffer (TLB) entries
needed to describe packet buffers and therefore minimize TLB ‘thrashing’.
NOTE: Onload huge page support should not be enabled if the application uses IPC
namespaces and the CLONE_NEWIPC flag.
Onload offers two configuration modes for network packet buffers:
9.2 Network Adapter Buffer Table Mode
Solarflare network adapters employ a proprietary hardware-based buffer address
translation mechanism to provide memory protection and translation to Onload
stacks accessing a VNIC on the adapter. This is the default packet buffer mode and
is suitable for the majority of applications using Onload.
This scheme employs a buffer table residing on the network adapter to control the
memory an Onload stack can use to send and receive packets.
While the adapter’s buffer table is sufficient for the majority of applications, on
adapters prior to the SFN7000 series it is limited to approximately 120,000 2-Kbyte
buffers, which have to be shared between all Onload stacks.
If the total packet buffer requirements of all applications using Onload exceed
the number of packet buffers supported by the adapter’s buffer table, the user
should consider changing to the Scalable Packet Buffer configuration.
9.3 Large Buffer Table Support
The Solarflare SFN7000 series adapters alleviate the packet buffer limitations of
previous generation Solarflare adapters and support many more than 120,000
packet buffers without the need to switch to Scalable Packet Buffer Mode.
Each buffer table entry in the SFN7000 series adapter can describe a 4 Kbyte,
64 Kbyte, 1 Mbyte or 4 Mbyte block of memory, where the block size used for each
table entry is the page size as directed by the operating system.
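To see why larger blocks matter, the number of packet buffers one entry covers is simply the block size divided by the packet buffer size. The sketch below assumes the 2048-byte packet buffer size shown by onload_stackdump (size=2048) elsewhere in this chapter; the helper name is hypothetical.

```c
#include <stddef.h>

#define PKT_BUF_SIZE 2048u   /* packet buffer size reported as "pkt_bufs: size=2048" */

/* Number of 2 Kbyte packet buffers described by one SFN7000 buffer table
 * entry of the given block size. Returns 0 for block sizes this guide does
 * not list (4 Kbyte, 64 Kbyte, 1 Mbyte, 4 Mbyte). */
size_t bufs_per_entry(size_t block_bytes)
{
    switch (block_bytes) {
    case 4u * 1024:
    case 64u * 1024:
    case 1024u * 1024:
    case 4u * 1024 * 1024:
        return block_bytes / PKT_BUF_SIZE;
    default:
        return 0;
    }
}
```

For example, a single 4 Mbyte entry describes 2048 packet buffers, where a 4 Kbyte entry describes only 2.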
9.4 Scalable Packet Buffer Mode
Scalable Packet Buffer Mode is an alternative packet buffer mode that allows a
much higher number of packet buffers to be used by Onload. In Scalable
Packet Buffer Mode, Onload stacks employ Single Root I/O Virtualization (SR-IOV)
virtual functions (VF) to provide memory protection and translation. This
mechanism removes the 120K buffer limitation imposed by the Network Adapter
Buffer Table Mode.
For deployments where using SR-IOV and/or the IOMMU is not an option, Onload
also supports an alternative Scalable Packet Buffer Mode scheme called Physical
Addressing Mode. Physical addressing also removes the 120K packet buffer
limitation; however, physical addressing does not provide the memory protection
provided by SR-IOV and an IOMMU. For details of Physical Addressing Mode, see
Physical Addressing Mode on page 106.
NOTE: Enabling SR-IOV, which is needed for Scalable Packet Buffer Mode, has a
latency impact that depends on the adapter model. For the SFN5000 adapter
series, the 1/2 RTT latency increases by approximately 50 ns. The
SFN6000 adapter series has equivalent latency to the SFN5000 adapter series when
operating in this mode.
NOTE: MRG users should refer to Red Hat MRG 2 and SR-IOV on page 128.
For further details on SR-IOV configuration, refer to Configuring Scalable Packet
Buffers on page 102.
9.5 Allocating Huge Pages
Using huge pages can lead to improved performance for some applications by
reducing the number of Translation Lookaside Buffer (TLB) entries needed to
describe packet buffers and therefore minimize TLB ‘thrashing’. Huge pages also
deliver many packet buffers but consume only a single entry in the buffer table.
Explicit huge pages are recommended.
The current huge page allocation can be checked by inspection of /proc/meminfo:
cat /proc/meminfo | grep Huge
This should return something similar to:
AnonHugePages: 2048 kB
HugePages_Total: 2050
HugePages_Free: 2050
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
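The same check can be done programmatically, for example by a launcher that warns before starting an Onload application when no huge pages are configured. A minimal sketch that parses text in the /proc/meminfo format shown above; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return the numeric value of a field such as "HugePages_Total" from text
 * in /proc/meminfo format, or -1 if the field is absent. */
long meminfo_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;              /* move to the start of the next line */
    }
    return -1;
}

/* Read /proc/meminfo and return HugePages_Total, or -1 on failure. */
long hugepages_total(void)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field(buf, "HugePages_Total");
}
```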
The total number of huge pages available on the system is the value
HugePages_Total. The following command can be used to dynamically set and/or
change the number of huge pages allocated on a system to <N>, where <N> is a
non-negative integer:
echo <N> > /proc/sys/vm/nr_hugepages
On a NUMA platform, the kernel will attempt to distribute the huge page pool over
the set of all allowed nodes specified by the NUMA memory policy of the task that
modifies nr_hugepages. The following command can be used to check the per-node
distribution of huge pages in a NUMA system:
cat /sys/devices/system/node/node*/meminfo | grep Huge
Huge pages can also be allocated on a per-NUMA node basis (rather than having the
huge pages allocated across multiple NUMA nodes). The following command can be
used to allocate <N> huge pages on NUMA node <M>:
echo <N> > /sys/devices/system/node/node<M>/hugepages/hugepages-2048kB/nr_hugepages
9.6 How Packet Buffers Are Used by Onload
Each packet buffer is allocated to exactly one Onload stack and is used to receive,
transmit or queue network data. Packet buffers are used by Onload in the following
ways:
1 Receive descriptor rings. By default the RX descriptor ring will hold 512 packet
buffers at all times. This value is configurable using the EF_RXQ_SIZE (per
stack) variable.
2 Transmit descriptor rings. By default the TX descriptor ring will hold up to 512
packet buffers. This value is configurable using the EF_TXQ_SIZE (per stack)
variable.
3 To queue data held in receive and transmit socket buffers.
4 TCP sockets can also hold packet buffers in the socket’s retransmit queue and
in the reorder queue.
5 User-level pipes also consume packet buffer resources.
Identifying Packet Buffer Requirements
When deciding the number of packet buffers required by an Onload stack,
consideration should be given to the resource needs of the stack to ensure that the
available packet buffers can be shared efficiently between all Onload stacks.
• Example 1:
If we consider a hypothetical case of a single host:
- which employs multiple Onload stacks, e.g. 10
- each stack has multiple sockets, e.g. 6
- and each socket uses many packet buffers, e.g. 2000
This would require a total of 120,000 packet buffers (10 x 6 x 2000).
• Example 2:
If on a stack the TCP receive queue is 1 Mbyte and the MSS value is 1472 bytes,
this would require at least 700 packet buffers (and a greater number if
segments smaller than the MSS were received).
• Example 3:
A UDP receive queue of 200 Kbytes, where received datagrams are each 200
bytes, would hold approximately 1000 packet buffers (one per datagram).
The examples above use only approximate calculated values. The
onload_stackdump command provides accurate measurements of packet buffer
allocation and usage.
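The arithmetic behind the three examples can be sketched as follows. This assumes one payload per packet buffer, takes 1 Mbyte as 1,048,576 bytes in Example 2 (giving 713, consistent with "at least 700") and 200 Kbytes as 200,000 bytes in Example 3. The helper names are hypothetical, and onload_stackdump remains the source of accurate figures.

```c
#include <stddef.h>

/* Round-up division: buffers needed to hold `bytes` when each buffer
 * carries at most `per_buf` bytes of payload. Smaller segments than
 * per_buf would need more buffers, so this is a lower bound. */
size_t bufs_needed(size_t bytes, size_t per_buf)
{
    return (bytes + per_buf - 1) / per_buf;
}

/* Example 1: stacks x sockets per stack x buffers per socket. */
size_t host_total(size_t stacks, size_t sockets_per_stack, size_t bufs_per_socket)
{
    return stacks * sockets_per_stack * bufs_per_socket;
}
```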
Consideration should be given to packet buffer allocation to ensure that each stack
is allocated the buffers it will require, rather than taking a ‘one size fits all’ approach.
When using the Buffer Table Mode the system is limited to 120K packet buffers;
these are allocated symmetrically across all Solarflare interfaces.
NOTE: Packet buffers are accessible to all network interfaces and each packet buffer
requires an entry in every network adapter’s buffer table. Adding more network
adapters (and therefore more interfaces) does not increase the number of packet
buffers available.
For large scale applications, the Scalable Packet Buffer Mode removes the limitations
imposed by the network adapter buffer table. See Configuring Scalable Packet
Buffers on page 102 for details.
Running Out of Packet Buffers
When Onload detects that a stack is close to allocating all available packet buffers, it
will take action to try and avoid packet buffer exhaustion. Onload will automatically
start dropping packets on receive and, where possible, will reduce the receive
descriptor ring fill level in an attempt to alleviate the situation. A ‘memory pressure’
condition can be identified using the onload_stackdump lots command, where
the pkt_bufs field will display the CRITICAL indicator. See Identifying Memory
Pressure below.
Complete packet buffer exhaustion can result in deadlock. In an Onload stack, if all
available packet buffers are allocated (for example, currently queued in socket
buffers), the stack is prevented from transmitting further data as there are no packet
buffers available for the task.
If all available packet buffers are allocated, then Onload will also fail to keep its
adapters’ receive queues replenished. If the queues fall empty, further data received
by the adapter is instantly dropped. On a TCP connection, packet buffers are used to
hold unacknowledged data in the retransmit queue, and dropping received packets
containing ACKs delays the freeing of these packet buffers back to Onload. Setting
the value of EF_MIN_FREE_PACKETS=0 can result in a stack having no free packet
buffers and this, in turn, can prevent the stack from shutting down cleanly.
Identifying Memory Pressure
The following extracts from the onload_stackdump command identify an Onload
stack under memory pressure.
The EF_MAX_PACKETS value identifies the maximum number of packet buffers that
can be used by the stack. EF_MAX_RX_PACKETS is the maximum number of packet
buffers that can be used to hold packets received. EF_MAX_TX_PACKETS is the
maximum number of packet buffers that can be used to hold packets to send. These
two values are always less than EF_MAX_PACKETS to ensure that neither the transmit
nor receive path can starve the other of packet buffers. Refer to Parameter
Reference on page 146 for detailed descriptions of these per-stack variables.
The example Onload stack has the following default environment variable values:
EF_MAX_PACKETS: 32768
EF_MAX_RX_PACKETS: 24576
EF_MAX_TX_PACKETS: 24576
The onload_stackdump lots command identifies packet buffer allocation and the
onset of a memory pressure state:
pkt_bufs: size=2048 max=32768 alloc=24576 free=32 async=0 CRITICAL
pkt_bufs: rx=24544 rx_ring=9 rx_queued=24535
There are potentially 32768 packet buffers available and the stack has allocated
(used) 24576 packet buffers.
The receive path holds 24544 packet buffers (rx), of which 24535 are queued in socket
receive buffers waiting to be processed by the application (rx_queued). This is
approaching the EF_MAX_RX_PACKETS limit and is the reason the CRITICAL flag is
present, i.e. the Onload stack is under memory pressure. Only 9 packet buffers are
available to the receive descriptor ring (rx_ring).
Onload will aim to keep the RX descriptor ring full at all times. If there are not
enough available packet buffers to refill the RX descriptor ring, this is indicated by the
LOW memory pressure flag.
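For monitoring scripts or tools it can be convenient to extract these fields automatically. A minimal sketch that parses the first pkt_bufs line in the format shown above and reports the CRITICAL or LOW flags. It assumes the LOW flag appears in the same position as CRITICAL; the struct and function names are hypothetical, and the onload_stackdump output format may differ between Onload versions.

```c
#include <stdio.h>
#include <string.h>

struct pkt_bufs_line {
    int size, max, alloc, free_bufs, async;
    int critical;    /* 1 if the CRITICAL memory pressure flag is present */
    int low;         /* 1 if the LOW memory pressure flag is present */
};

/* Parse a "pkt_bufs: size=... max=... alloc=... free=... async=..." line.
 * Returns 0 on success, -1 if the line does not have that format. */
int parse_pkt_bufs(const char *line, struct pkt_bufs_line *out)
{
    if (sscanf(line, "pkt_bufs: size=%d max=%d alloc=%d free=%d async=%d",
               &out->size, &out->max, &out->alloc,
               &out->free_bufs, &out->async) != 5)
        return -1;
    out->critical = strstr(line, "CRITICAL") != NULL;
    out->low = strstr(line, " LOW") != NULL;
    return 0;
}
```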
The onload_stackdump lots command will also identify the number of memory
pressure events and the number of packets dropped as a result of memory pressure:
memory_pressure: 1
memory_pressure_drops: 22096
Controlling Onload Packet Buffer Use
A number of environment variables control the packet buffer allocation on a per
stack basis. Refer to Parameter Reference on page 146 for a description of
EF_MAX_PACKETS.
Unless explicitly configured by the user, EF_MAX_RX_PACKETS and
EF_MAX_TX_PACKETS will be automatically set to 75% of the EF_MAX_PACKETS
value. This ensures that sufficient buffers are available to both receive and transmit.
EF_MAX_RX_PACKETS and EF_MAX_TX_PACKETS are not typically configured by
the user.
If an application requires more packet buffers than the maximum configured, then
EF_MAX_PACKETS may be increased to meet demand; however, it should be
recognized that larger packet buffer queues increase cache footprint, which can lead
to reduced throughput and increased latency.
EF_MAX_PACKETS is the maximum number of packet buffers that could be used by
the stack. Setting EF_MAX_RX_PACKETS to a value greater than EF_MAX_PACKETS
effectively means that all packet buffers (EF_MAX_PACKETS) allocated to the stack
will be used for RX, with nothing left for TX. The safest method is to only increase
EF_MAX_PACKETS, which keeps the RX and TX packet buffer values at 75% of this
value.
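The rules above can be summarized as a small calculation: an unset RX or TX limit defaults to 75% of EF_MAX_PACKETS, and a value above EF_MAX_PACKETS is effectively capped at it. The following is a hypothetical helper that models those two stated rules only; it is not an Onload API.

```c
/* Effective per-stack RX or TX packet buffer limit (hypothetical model):
 *  - explicit_value < 0 means the variable is unset, so the limit
 *    defaults to 75% of EF_MAX_PACKETS;
 *  - an explicit value above EF_MAX_PACKETS is effectively capped at
 *    EF_MAX_PACKETS (all buffers usable by that path). */
long effective_limit(long max_packets, long explicit_value)
{
    long limit = (explicit_value < 0) ? max_packets * 3 / 4 : explicit_value;
    return limit > max_packets ? max_packets : limit;
}
```

With the default EF_MAX_PACKETS of 32768 this gives 24576, matching the example stack above; doubling EF_MAX_PACKETS to 65536 raises both derived limits to 49152.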
9.7 Configuring Scalable Packet Buffers
NOTE: SR-IOV, and therefore Scalable Packet Buffer Mode, is not currently supported
on the SFN7000 series adapter but will be available in a future release.
In Scalable Packet Buffer Mode, Onload stacks are bound to virtual functions
(VFs), which provide a PCI SR-IOV compliant means of memory protection and
translation. VFs employ the kernel IOMMU.
Refer to Chapter 11 and Scalable Packet Buffer Mode on page 127 for 32-bit kernel
limitations.
Procedure:
• Step 1. Platform Support on page 102
• Step 2. BIOS and Linux Kernel Configuration on page 103
• Step 3. Update adapter firmware and enable SR-IOV on page 104
• Step 4. Enable VFs for Onload on page 105
• Step 5. Check PCIe VF Configuration on page 105
• Step 6. Check VFs in onload_stackdump on page 105
Step 1. Platform Support
Scalable Packet Buffer Mode is implemented using SR-IOV, support for which is a
relatively recent addition to the Linux kernel. There were several kernel bugs in early
incarnations of SR-IOV support, up to and including kernel.org 2.6.34. The fixes have
been back-ported to recent Red Hat kernels. Users are advised to enable scalable
packet buffer mode on Red Hat kernel 2.6.32-131.0.15 or later, or kernel.org 2.6.35
or later. In other distributions, it is recommended that the most recent patched
kernel version is used.
• The system hardware must have an IOMMU and this must be enabled in the
BIOS.
• The kernel must be compiled with support for IOMMU, and kernel command
line options are required to select the IOMMU mode.
• The kernel must be compiled with support for SR-IOV APIs (CONFIG_PCI_IOV).
• SR-IOV must be enabled on the network adapter using the sfboot utility.
• When more than 6 VFs are needed, the system hardware and kernel must
support PCIe Alternative Routing-ID Interpretation (ARI), a PCIe Gen 2 feature.
• The Onload option EF_PACKET_BUFFER_MODE=1 must be set in the environment.
• The sfc driver module option max_vfs should be set to the required number of
VFs.
NOTE: The Scalable Packet Buffer feature can be susceptible to known kernel issues
observed on RHEL6 and SLES11 (see http://www.spinics.net/lists/linux-pci/
msg10480.html for details). The condition can result in an unresponsive server if
intel_iommu has been enabled in the grub.conf file, as per the procedure at Step
2. BIOS and Linux Kernel Configuration on page 103, and if the Solarflare
sfc_resource driver is reloaded. This issue has been addressed in newer kernels.
Step 2. BIOS and Linux Kernel Configuration
To use SR-IOV, hardware virtualization must be enabled. Refer to Red Hat Enabling
Intel VT-x and AMD-V Virtualization in BIOS for more information. Take care to
enable VT-d as well as VT on an Intel platform.
To verify that the extensions have been correctly enabled, refer to Red Hat Verifying
virtualization extensions. For best kernel configuration performance and to avoid
kernel bugs exhibited when IOMMU is enabled for all devices, Solarflare
recommend that the kernel is configured to use the IOMMU in pass-through mode.
Append the following lines to the kernel line in the /boot/grub/grub.conf file:
On an Intel system:
intel_iommu=on iommu=on,pt
On an AMD system:
amd_iommu=on,iommu=on,pt
In pass-through mode the IOMMU is bypassed for regular devices. Refer to Red Hat:
PCI passthrough for more information.
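After rebooting, the running kernel’s command line (/proc/cmdline on Linux) can be checked for the options above. A minimal sketch that takes the command line as a string; it looks for an intel_iommu=on or amd_iommu=on token and for a separate iommu= token whose comma-separated values include pt. The function names are hypothetical, and the check is only a convenience, not a substitute for verifying IOMMU operation.

```c
#include <stdbool.h>
#include <string.h>

/* True if the comma-separated value list [v, v+n) contains "pt". */
static bool has_pt(const char *v, size_t n)
{
    while (n > 0) {
        size_t len = 0;
        while (len < n && v[len] != ',')
            len++;
        if (len == 2 && v[0] == 'p' && v[1] == 't')
            return true;
        if (len == n)
            break;
        v += len + 1;            /* skip the value and its trailing comma */
        n -= len + 1;
    }
    return false;
}

/* Scan a kernel command line for the IOMMU options recommended above. */
void check_iommu_cmdline(const char *cmdline, bool *iommu_on, bool *passthrough)
{
    const char *p = cmdline;

    *iommu_on = false;
    *passthrough = false;
    while (*p) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;                                  /* skip separators */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;                                  /* p now ends the token */
        size_t n = (size_t)(p - start);
        if (n == 0)
            break;
        if ((n >= 14 && strncmp(start, "intel_iommu=on", 14) == 0) ||
            (n >= 12 && strncmp(start, "amd_iommu=on", 12) == 0))
            *iommu_on = true;
        if (n > 6 && strncmp(start, "iommu=", 6) == 0 && has_pt(start + 6, n - 6))
            *passthrough = true;
    }
}
```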
NOTE: On Linux Red Hat 5 servers (2.6.18) it is necessary to also use the
iommu_type=2 option.
NOTE: EnterpriseOnload v2.1.0.0 users and OpenOnload v201109-u2 (onwards)
users:
Recent kernels are compiled with support for IOMMUs by default, but
unfortunately the realtime (-rt) kernel patches are not currently compatible with
IOMMUs (Red Hat MRG kernels are compiled with CONFIG_PCI_IOV disabled). It is
possible to use scalable packet buffer mode on some systems without IOMMU
support, but in an insecure mode. In this configuration the IOMMU is bypassed, and
there is no checking of DMA addresses provided by Onload in user space. Bugs or
misbehavior of user-space code can compromise the system.
To enable this insecure mode, set the Onload module option
unsafe_sriov_without_iommu=1 for the sfc_resource kernel module.
Linux MRG users are urged to use MRG u2 and kernel 3.2.33-rt50.66.el6rt.x86_64
or later to avoid known issues and limitations of earlier versions.
The unsafe_sriov_without_iommu option is obsoleted in OpenOnload 201210. It
is replaced by physical addressing mode; see Physical Addressing Mode on
page 106 for details.
Step 3. Update adapter firmware and enable SR-IOV
1 Download the Solarflare Linux Utilities RPM from
support.solarflare.com and unzip the utilities file to reveal the RPM.
2 Install the RPM:
# rpm -Uvh sfutils-<version>.rpm
3 Identify the current firmware version on the adapter:
# sfupdate
4 Upgrade the adapter firmware with sfupdate:
# sfupdate --write
Full instructions on using sfupdate can be found in the Solarflare Network
Server Adapter User Guide.
5 Use sfboot to enable SR-IOV and enable the VFs. You can enable up to 127 VFs
per port, but the host BIOS may only be able to support a smaller number. The
following example will configure 16 VFs on each Solarflare port:
# sfboot sriov=enabled vf-count=16 vf-msix-limit=1
6 It is necessary to reboot the server following changes made using sfboot and
sfupdate.
NOTE: Enabling all 127 VFs per port with more than one MSI-X interrupt per VF may
not be supported by the host BIOS. If the BIOS doesn't support this, then you may
get 127 VFs on one port and no VFs on the other port. You should contact your BIOS
vendor for an upgrade or reduce the VF count.
NOTE: On Red Hat 5 servers the vf-count should not exceed 32.
Option                      Default Value   Description
sriov=<enabled|disabled>    Disabled        Enable/Disable hardware SR-IOV support
vf-count=<n>                127             Number of virtual functions advertised per port.
                                            See the note below.
vf-msix-limit=<n>           1               Number of MSI-X interrupts per VF
NOTE: VF allocation must be symmetric across all Solarflare interfaces.
Step 4. Enable VFs for Onload
# export EF_PACKET_BUFFER_MODE=1
The sfc driver module option max_vfs should specify the number of required VFs. The
driver module option can be set in a user-created file (e.g. sfc.conf) in the /etc/
modprobe.d directory:
options sfc max_vfs=N
Refer to Parameter Reference on page 146 for other values.
Step 5. Check PCIe VF Configuration
The network adapter sfc driver will initialize the VFs, which can be displayed by the
lspci command:
# lspci -d 1924:
05:00.0 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.1 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.2 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.3 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.4 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.5 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.6 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.7 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.0 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.1 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
The lspci example output above identifies one physical function per physical port
and the virtual functions (four for each port) of a single Solarflare dual-port network
adapter.
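When validating many hosts, the same check can be automated by counting physical and virtual functions in the lspci output. A minimal sketch that works on the captured text of `lspci -d 1924:` in the one-device-per-line format shown above; the function name is hypothetical. For the example above it reports 2 physical functions and 8 virtual functions.

```c
#include <string.h>

/* Count Solarflare physical and virtual functions in `lspci -d 1924:`
 * output, one device per line, as shown above. */
void count_functions(const char *lspci_out, int *pfs, int *vfs)
{
    const char *line = lspci_out;

    *pfs = 0;
    *vfs = 0;
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[256];

        if (len >= sizeof(buf))
            len = sizeof(buf) - 1;           /* truncate unusually long lines */
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (strstr(buf, "Ethernet controller: Solarflare")) {
            if (strstr(buf, "Virtual Function"))
                (*vfs)++;
            else
                (*pfs)++;
        }
        if (!end)
            break;
        line = end + 1;
    }
}
```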
Step 6. Check VFs in onload_stackdump
The onload_stackdump netif command will identify VFs being used by Onload
stacks, as in the following example:
# onload_stackdump netif
ci_netif_dump: stack=0 name=
ver=201109 uid=0 pid=3354
lock=10000000 UNLOCKED nics=3 primed=3
sock_bufs: max=1024 n_allocated=4
pkt_bufs: size=2048 max=32768 alloc=1152 free=128 async=0
pkt_bufs: rx=1024 rx_ring=1024 rx_queued=0
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0 tx_other=0
time: netif=3df7d2 poll=3df7d2 now=3df7d2 (diff=0.000sec)
ci_netif_dump_vi: stack=0 intf=0 vi=67 dev=0000:05:01.0 hw=0C0
evq: cap=2048 current=8 is_32_evs=0 is_ev=0
rxq: cap=511 lim=511 spc=15 level=496 total_desc=0
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
txq: tot_pkts=0 bytes=0
ci_netif_dump_vi: stack=0 intf=1 vi=67 dev=0000:05:01.1 hw=0C0
evq: cap=2048 current=8 is_32_evs=0 is_ev=0
rxq: cap=511 lim=511 spc=15 level=496 total_desc=0
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
txq: tot_pkts=0 bytes=0
The output above corresponds to VFs advertised on the Solarflare network adapter
interface identified using the lspci command; refer to Step 5 above.
9.8 Physical Addressing Mode
Physical addressing mode is a Scalable Packet Buffer Mode that also allows Onload
stacks to use large amounts of packet buffer memory (avoiding the limitations of the
address translation table on the adapter), but without the requirement to configure
and use SR-IOV virtual functions.
Physical addressing mode does, however, remove memory protection from the
network adapter’s access of packet buffers. Unprivileged user-level code is provided
with, and directly handles, the raw physical memory addresses of packet buffers. User-
level code provides physical memory addresses directly to the adapter and
therefore has the ability to direct the adapter to read or write arbitrary memory
locations. As a result, a malicious or buggy application can compromise
system integrity and security. OpenOnload versions earlier than onload-201210 and
EnterpriseOnload-2.1.0.0 are limited to 1 million packet buffers. This limit was
raised to 2 million packet buffers in 201210-u1 and EnterpriseOnload-2.1.0.1.
To enable physical addressing mode:
1 Ignore configuration steps 1-4 above.
2 Put the following option into a user-created .conf file in the /etc/modprobe.d
directory:
options onload phys_mode_gid=<n>
Setting <n> to -1 allows all users to use physical addressing mode;
setting it to an integer x restricts use of physical addressing mode to the specific
user group x.
3 Reload the Onload drivers:
onload_tool reload
4 Enable the Onload environment using EF_PACKET_BUFFER_MODE 2 or 3.
EF_PACKET_BUFFER_MODE=2 is equivalent to mode 0, but uses physical
addresses. Mode 3 uses SR-IOV VFs with physical addresses, but does not use
the IOMMU for memory translation and protection. Refer to Parameter
Reference on page 146 for a complete description of all
EF_PACKET_BUFFER_MODE options.
9.9 Programmed I/O
PIO (programmed input/output) describes the process whereby data is directly
transferred by the CPU to or from an I/O device. It is an alternative to bus master
DMA techniques, where data are transferred without CPU involvement.
Solarflare 7000 series adapters support TX PIO, where packets on the transmit path
can be “pushed” to the adapter directly by the CPU. This improves the latency of
transmitted packets but can cause a very small increase in CPU utilization. TX PIO is
therefore especially useful for smaller packets.
The Onload TX PIO feature is enabled by default but can be disabled via the
environment variable EF_PIO. An additional environment variable,
EF_PIO_THRESHOLD, specifies the size of the largest packet that can use TX PIO.
PIO buffers on the adapter are limited to a maximum of 8 Onload stacks. For
optimum performance, PIO buffers should be reserved for critical processes and
other processes should set EF_PIO to 0 (zero).
The Onload stackdump utility provides additional counters to indicate the level of
PIO use; see TX PIO Counters on page 220 for details.
The Solarflare net driver will also use PIO buffers for non-accelerated sockets and
this will reduce the number of PIO buffers available to Onload stacks. To prevent this,
set the driver module option piobuf_size=0.
When both accelerated and non-accelerated sockets are using PIO, the number of
PIO buffers available to Onload stacks can be calculated from the total of 16 available
PIO regions. The calculation below uses the following example values:
Parameter     Description                        Example value
piobuf_size   driver module parameter            256
rss_cpus      driver module parameter            4
region        a chunk of memory of 2048 bytes    2048 bytes
Using the above example values, each port on the adapter requires:
piobuf_size * rss_cpus / region size = 256 * 4 / 2048 = 0.5 regions (rounded up, so each
port needs 1 region).
On a dual-port adapter this leaves 16 - 2 = 14 regions for Onload stacks, which also
require one region per port, per stack. Therefore, in this example, 7 Onload stacks
can use PIO buffers.
PIO buffers are allocated on a first-come, first-served basis. The following warning
might be observed when stacks cannot be allocated any more PIO buffers:
WARNING: all PIO bufs allocated to other stacks. Continuing without PIO.
Use EF_PIO to control this
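The region arithmetic in the PIO example above can be sketched as a small function, assuming (as in the example) that the net driver’s per-port usage is rounded up to whole regions and that each Onload stack needs one region on every port. The function name is hypothetical.

```c
/* Number of Onload stacks that can get PIO buffers (hypothetical helper):
 * the net driver uses ceil(piobuf_size * rss_cpus / region_size) regions
 * per port, and each Onload stack needs one region on every port. */
int onload_pio_stacks(int total_regions, int ports,
                      int piobuf_size, int rss_cpus, int region_size)
{
    if (ports <= 0 || region_size <= 0)
        return 0;
    /* regions used by the net driver on each port, rounded up */
    int per_port = (piobuf_size * rss_cpus + region_size - 1) / region_size;
    int left = total_regions - ports * per_port;
    /* each Onload stack needs one region on each port */
    return left > 0 ? left / ports : 0;
}
```

With the example values this yields 7 stacks. With piobuf_size=0 the driver uses no regions and the result is 8, matching the 8-stack maximum stated earlier in this section.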
To ensure more buffers are available for Onload, it is possible to prevent the net
driver from using PIO buffers. This can be done by setting the sfc driver module
option in a user-created file in the /etc/modprobe.d directory:
options sfc piobuf_size=0
Drivers should be reloaded for the changes to be effective:
# onload_tool reload
The per-stack EF_PIO variable can also be set to 0 for stacks where PIO buffers are not
required.
9.10 Templated Sends
“Templated sends” is another SFN7000 series adapter feature that builds on top of
TX PIO to provide further transmit latency improvements. This can be used in
applications that know the majority of the content of packets in advance of when
the packet is to be sent. For example, a market feed handler may publish packets
that vary only in the specific value of certain fields, possibly different symbols and
price information, but are otherwise identical. Templated sends involve creating a
template of a packet on the adapter containing the bulk of the data prior to the time
of sending the packet. Then, when the packet is to be sent, the remaining data is
pushed to the adapter to complete and send the packet.
The Onload templated sends feature uses the Onload Extensions API to generate the
packet template, which is then instantiated on the adapter ready to receive the
“missing” data before each transmission.
The API details are available in the Onload 201310 distribution at /src/include/
onload/extensions_zc.h
Refer to Onload Extensions API for further information on the use of packet
templates, including code examples of using this feature.
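To make the idea concrete, the sketch below models a template in ordinary memory: the bulk of the packet is written once, and only small fields are patched before each send. It is a conceptual model only and does not use or reproduce the Onload Extensions API in extensions_zc.h; the struct, function names and TMPL_MAX size are hypothetical, and no data is actually transmitted.

```c
#include <stddef.h>
#include <string.h>

#define TMPL_MAX 1500   /* hypothetical maximum template size in bytes */

/* Conceptual model only, NOT the Onload Extensions API: the template
 * holds the unchanging bulk of the packet. */
struct pkt_template {
    unsigned char data[TMPL_MAX];
    size_t len;
};

/* Store the initial packet contents; fails if too large. */
int template_init(struct pkt_template *t, const void *initial, size_t len)
{
    if (len > TMPL_MAX)
        return -1;
    memcpy(t->data, initial, len);
    t->len = len;
    return 0;
}

/* Overwrite `len` bytes at `offset` (the "missing" data for this send);
 * refuses updates that would run past the end of the template. */
int template_update(struct pkt_template *t, size_t offset,
                    const void *bytes, size_t len)
{
    if (offset > t->len || len > t->len - offset)
        return -1;
    memcpy(t->data + offset, bytes, len);
    return 0;
}
```

In the real feature the template lives on the adapter and only the update is pushed over PCIe at send time, which is where the latency saving comes from.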
10 Onload and Virtualization
10.1 Introduction
Using Onload-201502, accelerated applications are able to benefit from the inherent
security through isolation, ease of deployment through migration, and increased
resource management supported by Linux virtualized environments.
This chapter identifies the following:
• Onload and Linux KVM on page 109
• Onload and NIC Partitioning on page 111
• Onload in a Docker Container on page 113
10.2 Overview
• Running Onload in a Virtual Machine (VM) or Docker Container means the
Onload accelerated application benefits from the inherent isolation policy of
the virtualized environment.
• There is minimal degradation of latency and throughput performance. Near
native network I/O performance is possible because there is direct hardware
access (no hardware emulation), with the guest kernel (and virtualization
platform hypervisor) being bypassed.
• Multiple containers/virtual machines can co-exist on the same host and all are
isolated from each other.
10.3 Onload and Linux KVM
OpenOnload 201502 includes support to accelerate applications running within Linux VMs on a KVM host. This feature is supported on Solarflare SFN7000 series adapters, where each physical interface on the adapter can be exposed to the host as up to 16 PCIe physical functions (PF) and up to 240 virtual functions (VF). The adapter also supports up to 2048 MSI-X interrupts.
This support requires a VF (or PF) to be exposed directly into the Linux VM; KVM calls this network configuration “Network hostdev”. Onload provides user-level access to the adapter via the VF in exactly the same way as is achieved on a non-virtualized Linux install. Firmware on the Solarflare SFN7000 series adapter configures layer 2 switching capability that supports the transport of network packets between PCI physical functions and virtual functions. This feature supports
the transport of network traffic between Onload applications running in different virtual machines. This allows traffic to be replicated across multiple functions, and traffic transmitted from one VM can be received on another VM.
Figure 14 below illustrates Onload deployed into the Linux KVM Network Hostdev architecture, which exposes Virtual Functions (VF) directly to the VM guest. This configuration allows the Onload data path to fully bypass the host operating system and provides maximum acceleration for network traffic.
Figure 14: Onload and Network Hostdev Configuration
To deploy Onload in a Linux KVM:
• As detailed in the Solarflare Server Adapter User Guide (SF-103837-CD) chapter 7 SR-IOV:
- Install the Solarflare NET driver version 4.4.1.1017 (or later)
- Ensure the adapter is using firmware version 4.4.2.1011 (or later)
- Run sfboot to select the full-feature firmware variant, set the switch-mode and identify the required number of VFs:
# sfboot firmware-variant=full-feature switch-mode=sriov vf-count=4
- Reboot the server, so the Linux KVM host can enumerate the VFs
• Follow the instructions in the Solarflare Server Adapter User Guide (SF-103837-CD) section KVM Libvirt network hostdev - Configuration to:
- Create a VM
- Configure the VFs
- Unbind VFs from the host
- Pass VFs to the VM
Example virsh command line and XML file configuration instructions are provided.
• Install Onload in the VM as in a non-virtualized host; see OpenOnload - Installation on page 21.
• Set the sfc driver module option num_vis to create the required number of virtual interfaces. A VI is needed for each Onload stack created on a VF. Driver module options should be set in a user-created file (e.g. sfc.conf) in the /etc/modprobe.d directory.
options sfc num_vis=<NUM>
NOTE: When using Onload with multiple virtual functions (VF) it is necessary to set the Onload module option oof_all_ports_required to zero. See Module Options on page 143 for details.
The Solarflare Server Adapter User Guide is available from https://support.solarflare.com/.
10.4 Onload and NIC Partitioning
Each physical interface on the Solarflare SFN7000 series adapter can be exposed to the host as multiple PCIe physical functions (PF). Up to 16 PFs, each having a unique MAC address, are supported per adapter. To Onload, each PF represents a virtual adapter.
Figure 15: Onload and NIC Partitioning
On the adapter, each PF is backed by a virtual adapter and virtual port; these components are created by the Solarflare NET driver when it finds a partitioned adapter. The PFs can be configured to transparently place traffic on separate VLANs (so each partition is on a separate broadcast domain).
To configure Onload to use the partitioned NIC:
• Ensure the adapter is using firmware version 4.4.2.1011 (minimum)
• Use sfboot to select the full-feature firmware variant
• Use sfboot to partition the NIC into multiple PFs
• Reboot the host to allow the firmware to partition the NIC into multiple PFs.
• To identify which physical port a network interface is using:
# cat /sys/class/net/eth<N>/device/physical_port
For complete details of configuring NIC Partitioning refer to the Solarflare Server Adapter User Guide (SF-103837-CD) chapter 7 SR-IOV, available from https://support.solarflare.com/.
10.5 Onload in a Docker Container
Figure 16 illustrates the Onload deployment in a Docker container environment. Only the user-level components are created in the container. Onload in the container uses the Onload drivers installed on the host for network I/O. Network interfaces configured on the host are also visible and usable directly from the container.
Figure 16: Onload in a Docker Container
In keeping with the containerization theory, it is envisaged that only a single Onload instance will be running in each container; however, there are no restrictions preventing multiple instances running in the same container.
10.6 Pre-Installation
This install procedure makes the following assumptions; ensure these components are created/installed before continuing:
• Docker is installed on the host server.
• Onload 201502 (or a later version) must be installed on the host. An identical version will be installed in the container.
NOTE: Onload does not currently support Linux namespaces. Support for Linux network namespaces may be added in a future release.
10.7 Installation
1 The docker run command will create a container named onload. The container is created from the centos:latest base image and a bash shell terminal will be started.
[root@host]# docker run --net=host --device=/dev/onload --device=/dev/onload_epoll \
  --name=onload -it -v /src/openonload-201502.tgz:/tmp/openonload-201502.tgz \
  centos:latest /bin/bash
The -v option in the example above makes the openonload-201502.tgz file from the /src directory on the host available as /tmp/openonload-201502.tgz in the container root file system. All subsequent commands are run inside the container unless host is specified.
2 Install the required OS tools/packages in the container.
# yum install perl autoconf automake libtool tar gcc make net-tools ethtool
Different docker base images may require additional OS packages to be installed.
3 Unpack the tarball to create the openonload-<version> sub-directory in /tmp.
# /usr/bin/tar -zxvf /tmp/openonload-201502.tgz -C /tmp
Note: it is not possible to use tools/utilities (such as tar) from the host file system on files in the container file system.
4 Change directory to the openonload-<version>/scripts directory:
# cd /tmp/openonload-201502/scripts
5 Build and install the Onload user-level components in the container:
# ./onload_build --user
If the build process identifies any missing dependencies, return to step 2 to install the missing components.
# ./onload_install --userfiles --nobuild
The following warning may appear at the end of the install process, but it is not necessary to reload the drivers:
onload_install: To load the newly installed drivers run: onload_tool reload
6 Check the Onload installation:
# onload
OpenOnload 201502
Copyright 2006-2012 Solarflare Communications, 2002-2005 Level 5 Networks
Built: Feb 5 2015 12:41:04 (release)
Kernel module: 201502
usage:
  onload [options] <command> <command-args>
options:
  --profile=<profile>    -- comma sep list of config profile(s)
  --force-profiles       -- profile settings override environment
  --no-app-handler       -- do not use app-specific settings
  --app=<app-name>       -- identify application to run under onload
  --version              -- print version information
  -v                     -- verbose
  -h --help              -- this help message
7 On the host, check that the container has been created and is running:
# docker ps -a
CONTAINER ID   IMAGE           COMMAND       CREATED          STATUS          PORTS   NAMES
e2a12a635359   centos:latest   "/bin/bash"   15 seconds ago   Up 14 seconds           onload
8 Configure network interfaces.
Configure network adapter interfaces in the host. Interfaces will also be visible and usable from the container:
# ifconfig -a
9 Onload is now installed and ready to use in the container.
10.8 Create Onload Docker Image
The following steps create a new docker image that includes the Onload installation, ready for migration. All commands are run on the host.
1 Identify the container (note the CONTAINER ID or NAME):
# docker ps -a
CONTAINER ID   IMAGE           COMMAND       CREATED        STATUS   PORTS   NAMES
35bfeceb7022   centos:latest   "/bin/bash"   24 hours ago   Exited           onload
2 Create a new image (this example uses the NAME value):
# docker commit -m "installed onload 201502" onload onload:v1
89e95645d5ff1fa02880dee44b433ab577f5a2715daf944fd0b393620d8253f1
3 List images:
# docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
onload       v1       89e95645d5ff   28 seconds ago   486 MB
centos       latest   dade6cb4530a   3 days ago       224 MB
10.9 Migration
The docker save command can be used to archive a docker image which includes the Onload installation. This image can then be migrated to other servers having the following configuration:
• Docker is installed and the docker service is running
• Host operating system is RHEL 7
• The Onload version running on the host must be the same as the Onload version in the migrated image
• The target server does not need to have the same Solarflare adapter types installed.
1 Create a tar file of the container image:
# docker save -o <dir path to store image>/<name of image>.tar <current name of image>
Example (store the image tar file in the host /tmp directory):
# docker save -o /tmp/dk-onload-201502.tar onload
2 The image tar file can then be copied to the target server, where it can be loaded with the docker load command:
# docker load -i /<path to transferred file>/dk-onload-201502.tar
# docker images
REPOSITORY   TAG   IMAGE ID       CREATED             VIRTUAL SIZE
onload       v1    303ec2d3e2b5   About an hour ago   486 MB
3 Create/run a container from the transferred image:
# docker run --net=host --device=/dev/onload --device=/dev/onload_epoll \
  --name=onload -it onload:v1 /bin/bash
When the container has been created, Onload will be running within it.
Onload Docker Images
Onload images are not currently available from the default docker registry hub. Images may be made available if there is sufficient customer interest and requirement for this feature.
10.10 Copying Files Between Host and Container
The following example demonstrates how to copy files from the host to a container. All commands are run on the host.
1 Get the container short name (output truncated):
[root@hostname]# docker ps -a
CONTAINER ID
bd1ea8d5526c
2 Discover the container long name:
[root@hostname]# docker inspect -f '{{.Id}}' bd1ea8d5526c
bd1ea8d5526c55df4740de9ba5afe14ed28ac3d127901ccb1653e187962c5156
The container long name can also be discovered using the container name in place of the container identifier.
3 Copy a file to the root file system (/tmp) on the container:
[root@hostname]# cp myfile.txt /var/lib/docker/devicemapper/mnt/bd1ea8d5526c55df4740de9ba5afe14ed28ac3d127901ccb1653e187962c5156/rootfs/tmp/myfile.txt
11 Limitations
Users are advised to read the latest release_notes distributed with the Onload release for a comprehensive list of Known Issues.
11.1 Introduction
This chapter outlines configurations that Onload does not accelerate and ways in which Onload may change the behavior of the system and applications. It is a key goal of Onload to be fully compatible with the behavior of the regular kernel stack, but there are some cases where behavior deviates.
11.2 Changes to Behavior
Multithreaded Applications Termination
As Onload handles networking in the context of the calling application's thread, it is recommended that applications ensure all threads exit cleanly when the process terminates. In particular, the exit() function causes all threads to exit immediately, even those in critical sections. This can cause threads currently within the Onload stack that hold the per-stack lock to terminate without releasing this shared lock. This is particularly important for shared stacks, where a process sharing the stack could ‘hang’ when Onload locks are not released.
An unclean exit can prevent the Onload kernel components from cleanly closing the application's TCP connections. In that case a message similar to the following will be observed:
[onload] Stack [0] released with lock stuck
and any pending TCP connections will be reset. To prevent this, applications should always ensure that all threads exit cleanly.
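One way to follow this advice is a cooperative shutdown. Each worker thread checks a shared stop flag between socket calls. The main thread sets the flag and joins every worker before the process exits, rather than calling exit() while workers may be inside the Onload stack. Workers are never killed asynchronously, so the pattern also avoids pthread_cancel. The C sketch below assumes POSIX threads and C11 atomics. The names run_workers_and_shutdown(), worker() and MAX_WORKERS are hypothetical, and the loop body stands in for the application's own send()/recv()/epoll_wait() calls.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Shared flag that asks workers to return; nothing is killed asynchronously. */
static atomic_int stop_requested;

/* Worker loop: re-checks the stop flag between units of work. In a real
 * application the body would be send()/recv()/epoll_wait() with a timeout,
 * so the flag is seen promptly and no socket call is cut off part-way. */
static void *worker(void *arg)
{
    atomic_int *iterations = arg;

    while (!atomic_load(&stop_requested))
        atomic_fetch_add(iterations, 1);
    return NULL;
}

/* Start n workers, request a stop, then join every worker before returning,
 * instead of calling exit() while workers may be inside a socket call.
 * Returns the number of threads joined, or -1 if not all could be started. */
int run_workers_and_shutdown(int n, atomic_int *iterations)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0, joined = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;
    atomic_store(&stop_requested, 0);
    while (started < n &&
           pthread_create(&tids[started], NULL, worker, iterations) == 0)
        started++;

    atomic_store(&stop_requested, 1);
    for (int i = 0; i < started; i++)
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    return started == n ? joined : -1;
}
```

Only after run_workers_and_shutdown() has returned would main() return or call exit(). By that point no thread is inside a socket call.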
Thread Cancellation
Unexpected behavior can result when an accelerated application uses a pthread_cancel function. There is increased risk from multi-threaded applications, or from a PTHREAD_CANCEL_ASYNCHRONOUS thread calling a non-async-safe function. Onload users are strongly advised not to use pthread_cancel functions in their applications.
Packet Capture
Packets delivered to an application via the accelerated path are not visible to the OS kernel. As a result, diagnostic tools such as tcpdump and wireshark do not capture accelerated packets. The Solarflare-supplied onload_tcpdump does support capture of UDP and TCP packets from Onload stacks; refer to onload_tcpdump on page 246 for details.
Firewalls
Packets delivered to an application via the accelerated path are not visible to the OS kernel. As a result, these packets are not visible to the kernel firewall (iptables) and therefore firewall rules will not be applied to accelerated traffic. The onload_iptables feature can be used to enforce Linux iptables rules as hardware filters on the Solarflare adapter; refer to onload_iptables on page 251.
NOTE: Hardware filtering on the network adapter will ensure that accelerated applications receive traffic only on ports to which they are bound.
System Tools
With the exception of ‘listening’ sockets, TCP sockets accelerated by Onload are not visible to the netstat tool. UDP sockets are visible to netstat.
Accelerated sockets appear in the /proc directory as symbolic links to /dev/onload. Tools that rely on /proc will probably not identify the associated file descriptors as being sockets. Refer to Onload and File Descriptors, Stacks and Sockets on page 52 for more details.
Accelerated sockets can be inspected in detail with the Onload onload_stackdump tool, which exposes considerably more information than the regular system tools. For details of onload_stackdump refer to onload_stackdump on page 219.
Signals
If an application receives a SIGSTOP signal, it is possible for the processing of network events to be stalled in an Onload stack used by the application. This happens if the application is holding a lock inside the stack when the application is stopped. If the application remains stopped for a long time, this may cause TCP connections to time out.
A signal which terminates an application can prevent threads from exiting cleanly. Refer to Multithreaded Applications Termination on page 117 for more information.
Undefined content may result when a signal handler uses the third argument (ucontext) and the signal is postponed by Onload. To avoid this, use the Onload module option safe_signals_and_exit=0, or use EF_SIGNALS_NOPOSTPONE to prevent specific signals being postponed by Onload.
Onload and IP_MULTICAST_TTL
Onload will act in accordance with RFC 791 when it comes to the IP_MULTICAST_TTL setting. Using Onload, if IP_MULTICAST_TTL=0, packets will never be transmitted on the wire.
This differs from the Linux kernel, where the following behavior has been observed:
• Kernel, IP_MULTICAST_TTL 0, if there is a local listener: packets will not be transmitted on the wire.
• Kernel, IP_MULTICAST_TTL 0, if there is NO local listener: packets will always be transmitted on the wire.
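The two stacks differ only when the TTL is 0, so it can help to confirm which TTL a socket is actually using. The sketch below sets and reads back IP_MULTICAST_TTL with the standard setsockopt()/getsockopt() calls, passing the value as an int. The helper names set_multicast_ttl() and get_multicast_ttl() are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set IP_MULTICAST_TTL on a UDP socket. Under Onload a TTL of 0 means
 * multicast packets are never transmitted on the wire; under the kernel the
 * outcome for TTL 0 depends on whether there is a local listener.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_multicast_ttl(int fd, int ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
}

/* Read back the TTL actually in effect, or -1 on error. */
int get_multicast_ttl(int fd)
{
    int ttl = -1;
    socklen_t len = sizeof(ttl);

    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len) != 0)
        return -1;
    return ttl;
}
```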
Source/Policy Based Routing and Routing Metrics
Onload does not currently support source-based or policy-based routing. Whereas the Linux kernel will select a route based on routing metrics, Onload will select any of the valid routes to a destination that are available.
11.3 Limits to Acceleration
IP Fragmentation
Fragmented IP traffic is not accelerated by Onload on the receive side, and is instead received transparently via the kernel stack. IP fragmentation is rarely seen with TCP, because the TCP/IP stacks segment messages into MTU-sized IP datagrams. With UDP, datagrams are fragmented by IP if they are too large for the configured MTU. Refer to Fragmented UDP on page 89 for a description of Onload behavior.
Broadcast Traffic
Broadcast sends and receives function as normal but will not be accelerated. Multicast traffic can be accelerated.
IPv6 Traffic
IPv6 traffic functions as normal but will not be accelerated.
Raw Sockets
Raw socket sends and receives function as normal but will not be accelerated.
Socketpair and UNIX Domain Sockets
Onload will intercept, but does not accelerate, the socketpair() system call. Sockets created with socketpair() will be handled by the kernel. Onload also does not accelerate UNIX domain sockets.
Statically Linked Applications
Onload will not accelerate statically linked applications. This is due to the method by which Onload intercepts libc function calls (using LD_PRELOAD).
Local Port Address
Onload is limited to OOF_LOCAL_ADDR_MAX local interface addresses. A local address can identify a physical port or a VLAN. Multiple addresses can be assigned to a single interface, and each address counts towards the maximum. Users can allocate additional local interface addresses by increasing the compile time constant OOF_LOCAL_ADDR_MAX in the /src/lib/efthrm/oof_impl.h file and rebuilding Onload. In onload-201205, OOF_LOCAL_ADDR_MAX was replaced by the onload module option max_layer2_interfaces.
Bonding, Link aggregation
• Onload will only accelerate traffic over 802.3ad and active-backup bonds.
• Onload will not accelerate traffic if a bond contains any slave interfaces that are not Solarflare network devices. Adding a non-Solarflare network device to a bond that is currently accelerated by Onload may produce unexpected results, such as connections being reset.
• Acceleration of bonded interfaces in Onload requires a kernel configured with sysfs support and a bonding module version of 3.0.0 or later.
In cases where Onload will not accelerate the traffic, it will continue to work via the OS network stack.
For more information and details of configuration options refer to the Solarflare Server Adapter User Guide section ‘Setting Up Teams’.
VLANs
• Onload will only accelerate traffic over VLANs where the master device is either a Solarflare network device or a bonded interface that is accelerated, i.e. if the VLAN's master is accelerated, then so is the VLAN interface itself.
• Nested VLAN tags are not accelerated, but will function as normal.
• The ifconfig command will return inconsistent statistics on VLAN interfaces (not on the master interface).
• A Solarflare VLAN tagged interface that is subsequently placed in a bond will not be accelerated.
• Hardware filters installed by Onload on the Solarflare adapter will only consider the IP address and port, but not the VLAN identifier. Therefore if the same IP address:port combination exists on different VLAN interfaces, only the first interface to install the filter will receive the traffic.
In cases where Onload will not accelerate the traffic, it will continue to work via the OS network stack.
For more information, details and configuration options refer to the Solarflare Server Adapter User Guide section ‘Setting Up VLANs’.
TCP RTO During Overload Conditions
Under very high load conditions, an increased frequency of TCP retransmission timeouts (RTOs) might be observed. This can occur when a thread servicing the stack is descheduled while still holding the stack lock, thus preventing another thread from accessing/polling the stack. A stack that is not serviced does not process ACKs for sent packets in a timely manner, which results in RTOs for the unacknowledged packets.
Enabling the per-stack environment variable EF_INT_DRIVEN can reduce the likelihood of this behavior by ensuring the stack is serviced promptly.
TCP with Jumbo Frames
When using jumbo frames with TCP, Onload will limit the MSS to 2048 bytes to ensure that segments do not exceed the size of internal packet buffers.
This should present no problems unless the remote end of a connection is unable to negotiate this lower MSS value.
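The effect of the cap can be worked through numerically. The helper below is a hypothetical sketch and not Onload's own calculation. It assumes IPv4 and TCP headers without options (40 bytes in total) and applies the 2048-byte limit described above. The names estimated_mss(), MSS_CAP_BYTES and IPV4_TCP_HDRS are invented for illustration.

```c
/* Hypothetical helper: estimate the MSS advertised for a given MTU when the
 * 2048-byte cap described above applies. Assumes IPv4 and TCP headers with
 * no options (20 + 20 bytes); this is not Onload's own code. */
#define MSS_CAP_BYTES   2048
#define IPV4_TCP_HDRS   40

int estimated_mss(int mtu)
{
    int mss = mtu - IPV4_TCP_HDRS;

    return mss > MSS_CAP_BYTES ? MSS_CAP_BYTES : mss;
}
```

With a standard 1500-byte MTU the cap has no effect (1460 bytes). With a 9000-byte jumbo MTU the MSS is limited to 2048 bytes rather than 8960.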
Transmission Path - Packet Loss
Occasionally Onload needs to send a packet, which would normally be accelerated, via the kernel. This occurs when there is no destination address entry in the ARP table, or to prevent an ARP table entry from becoming stale.
By default, the Linux sysctl unres_qlen will enqueue 3 packets per unresolved address when waiting for an ARP reply. On a server subject to a very high UDP or TCP traffic load, this can result in packet loss on the transmit path, with packets being discarded.
The unres_qlen value can be identified using the following command:
# sysctl -a | grep unres_qlen
net.ipv4.neigh.eth2.unres_qlen = 3
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.default.unres_qlen = 3
Changes to the queue lengths can be made permanent in the /etc/sysctl.conf file. Solarflare recommend setting the unres_qlen value to at least 50.
If packet discards are suspected, this extremely rare condition can be indicated by the cp_defer counter produced by the onload_stackdump lots command on UDP sockets, or by the unresolved_discards counter in the Linux /proc/net/stat/arp_cache file.
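When auditing many interfaces, the sysctl output above can be checked programmatically. The sketch below parses one line of that output and reports whether the value is below the recommended minimum of 50. The names unres_qlen_too_small() and UNRES_QLEN_MIN are hypothetical. Some kernels also report a related unres_qlen_bytes setting, which the grep would match; such lines are rejected rather than misread.

```c
#include <stdio.h>
#include <string.h>

#define UNRES_QLEN_MIN 50  /* minimum recommended in the text above */

/* Parse one line of "sysctl -a | grep unres_qlen" output, for example
 * "net.ipv4.neigh.eth2.unres_qlen = 3", into a device name and value.
 * Returns 1 if the value is below UNRES_QLEN_MIN, 0 if it is adequate, and
 * -1 if the line has another shape (including unres_qlen_bytes lines). */
int unres_qlen_too_small(const char *line, char *dev, size_t devlen, int *value)
{
    const char *prefix = "net.ipv4.neigh.";
    size_t plen = strlen(prefix);
    const char *p, *dot;

    if (strncmp(line, prefix, plen) != 0)
        return -1;
    p = line + plen;
    dot = strstr(p, ".unres_qlen");
    if (dot == NULL || (size_t)(dot - p) >= devlen)
        return -1;
    memcpy(dev, p, (size_t)(dot - p));
    dev[dot - p] = '\0';
    /* The space before '=' also matches zero characters, so "_bytes" fails here. */
    if (sscanf(dot, ".unres_qlen = %d", value) != 1)
        return -1;
    return *value < UNRES_QLEN_MIN ? 1 : 0;
}
```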
Application Clustering
• Onload matches the Linux kernel implementation: clustering is not supported for multicast traffic, and in that case setting SO_REUSEPORT has the same effect as SO_REUSEADDR.
• Calling connect() on a TCP socket which was previously subject to a bind() call is not currently supported. This will be supported in a future release.
• An application cluster will not persist over an adapter/server/driver reset. Before restarting the server or resetting the adapter, the Onload applications should be terminated. This limitation will be removed in a future release.
• The environment variable EF_CLUSTER_RESTART determines the behavior of the cluster when the application process is restarted; refer to EF_CLUSTER_RESTART in Parameter Reference on page 146.
• If the number of sockets in a cluster is less than EF_CLUSTER_SIZE, a portion of the received traffic will be lost.
• There is little benefit when clustering involves a TCP loopback listening socket, as connections will not be distributed amongst all threads. A non-loopback listening socket, which might occasionally get some loopback connections, can benefit from Application Clustering.
11.4 epoll - Known Issues
Onload supports different implementations of epoll, controlled by the EF_UL_EPOLL environment variable; see Multiplexed I/O on page 57 for configuration details.
• When using EF_UL_EPOLL=1 or 3, it has been identified that the behavior of epoll_wait() differs from the kernel when the EPOLLONESHOT event is requested, resulting in two ‘wakeups’ being observed, one from the kernel and one from Onload. This behavior is apparent on SOCK_DGRAM and SOCK_STREAM sockets for all combinations of EPOLLONESHOT, EPOLLIN and EPOLLOUT events. This applies to TCP listening sockets and UDP sockets, but not to TCP connected sockets.
• EF_EPOLL_CTL_FAST is enabled by default and this modifies the semantics of epoll. In particular, it buffers up calls to epoll_ctl() and only applies them when epoll_wait() is called. This can break applications that do epoll_wait() in one thread and epoll_ctl() in another thread. The issue only affects EF_UL_EPOLL=2, and the solution is to set EF_EPOLL_CTL_FAST=0 if this is a problem. The described condition does not occur if EF_UL_EPOLL=1 or EF_UL_EPOLL=3.
• When EF_EPOLL_CTL_FAST is enabled and an application is testing the readiness of an epoll file descriptor without actually calling epoll_wait(), for example by doing epoll within epoll or epoll within select(), EF_EPOLL_CTL_FAST should be disabled if one thread is calling select() or epoll_wait() while another thread is doing epoll_ctl(). This applies when using EF_UL_EPOLL 1, 2 or 3.
If the application is monitoring the state of the epoll file descriptor indirectly, e.g. by monitoring the epoll fd with poll, then EF_EPOLL_CTL_FAST can cause issues and should be set to zero.
• A socket should be removed from an epoll set only when all references to the socket are closed.
With EF_UL_EPOLL=1 (default) or EF_UL_EPOLL=3, a socket is removed from the epoll set if the file descriptor is closed, even if other references to the socket exist. This can cause problems if file descriptors are duplicated using dup(). For example:
s = socket(...);
s2 = dup(s);
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, s, ...);
close(s);   /* with Onload, the socket referenced by s is removed from the epoll set */
The workaround is to set EF_UL_EPOLL=2.
• When Onload is unable to accelerate a connected socket, e.g. because no route to the destination exists which uses a Solarflare interface, the socket will be handed off to the kernel and is removed from the epoll set. Because the socket is no longer in the epoll set, attempts to modify the socket with epoll_ctl() will fail with the ENOENT (descriptor not present) error. The described condition does not occur if EF_UL_EPOLL=1 or 3.
• If an epoll file descriptor is passed to the read() or write() functions, these will return a different error code from that reported by the kernel stack. This issue exists for all implementations of epoll.
• When EPOLLET is used and the event is ready, epoll_wait() is triggered by ANY event on the socket instead of the requested event. This issue should not affect application correctness. The problem exists for both implementations of epoll.
• Users should be aware that if a server is overclocked, the epoll_wait() timeout value will increase as CPU MHz increases, resulting in unexpected timeout values. This has been observed on Intel based systems when the Onload epoll implementation is EF_UL_EPOLL=1 or 3. With EF_UL_EPOLL=2 this behavior is not observed.
• On a spinning thread, if epoll acceleration is disabled by setting EF_UL_EPOLL=0, sockets on this thread will be handed off to the kernel, but latency will be worse than expected kernel socket latency.
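Two defensive patterns may help an application avoid depending on which epoll implementation is in use. The first is to remove a descriptor from the epoll set explicitly with EPOLL_CTL_DEL before closing it, rather than relying on a dup()ed reference to keep it registered. The second is to treat ENOENT from EPOLL_CTL_MOD as "not registered" and add the descriptor again. The C sketch below assumes Linux epoll. The names epoll_mod_or_add() and epoll_del_and_close() are hypothetical, and the patterns have not been verified against each EF_UL_EPOLL setting.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Modify a descriptor's event mask; if it is no longer in the epoll set
 * (ENOENT, e.g. after a socket was handed off to the kernel), add it again.
 * Returns 0 on success, -1 on error (errno set by epoll_ctl). */
int epoll_mod_or_add(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };

    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == 0)
        return 0;
    if (errno != ENOENT)
        return -1;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Remove fd from the epoll set explicitly, then close it, so the outcome does
 * not depend on whether other (dup'ed) references keep it registered.
 * Returns 0 if both steps succeed, -1 otherwise. */
int epoll_del_and_close(int epfd, int fd)
{
    int rc = epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The re-add fallback is only appropriate when the application still wants events for a socket that has been handed to the kernel. Otherwise ENOENT can simply be treated as a signal to drop the socket from the application's own bookkeeping.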
11.5 Configuration Issues
Mixed Adapters Sharing a Broadcast Domain
Onload should not be used when Solarflare and non-Solarflare interfaces in the same network server are configured in the same broadcast domain¹, as depicted by the following diagram.
When an originating server (S1) sends an ARP request to a remote server (S2) that has more than one interface within the same broadcast domain, ARP responses from S2 will be generated from all interfaces, and it is non-deterministic which response the originator uses. When Onload detects this situation, it causes a message identifying 'duplicate claim of ip address' to appear in the (S1) host syslog as a warning of potential problems.
Problem 1
Traffic from S1 to S2 may be delivered through either of the interfaces on S2, irrespective of the IP address used. This means that if one interface is accelerated by Onload and the other is not, you may or may not get acceleration.
To resolve the situation (for the current session) issue the following command:
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
or to resolve it permanently add the following line to the /etc/sysctl.conf file:
net.ipv4.conf.all.arp_ignore = 1
and run the sysctl command for this to take effect:
# sysctl -p
These commands ensure that an interface will only respond to an ARP request when the IP address matches its own. Refer to the Linux documentation Linux/Documentation/networking/ip-sysctl.txt for further details.
1. A broadcast domain can be a local network segment or VLAN.
Problem 2
A more serious problem arises if one interface on S2 carries Onload accelerated TCP connections and another interface on the same host and same broadcast domain is non-Solarflare:
A TCP packet received on the non-Solarflare interface can result in accelerated TCP connections being reset by the kernel stack. To the application, TCP connections then appear to be dropped/terminated at random.
To prevent this situation, the Solarflare and non-Solarflare interfaces should not be configured in the same broadcast domain. The solution described for Problem 1 above can reduce the frequency of Problem 2, but does not eliminate it.
TCP packets can be directed to the wrong interface because:
• the originator S1 needs to refresh its ARP table for the destination IP address, so sends an ARP request and subsequently directs TCP packets to the non-Solarflare interface
• a switch within the broadcast domain broadcasts the TCP packets to all interfaces.
Virtual Memory on 32 Bit Systems
On 32 bit Linux systems the amount of allocated virtual address space typically defaults to 128 MB, which limits the number of Solarflare interfaces that can be configured. Virtual memory allocation can be identified in the /proc/meminfo file, e.g.:
# grep Vmalloc /proc/meminfo
VmallocTotal:   122880 kB
VmallocUsed:     76380 kB
VmallocChunk:    15600 kB
The Onload driver will attempt to map all PCI Base Address Registers for each Solarflare interface into virtual memory, where each interface requires 16 MB.
Examination of the kernel logs in /var/log/messages at the point the Onload driver is loading would reveal a memory allocation failure, as in the following extract:
allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
[sfc efrm] Failed (-12) to map bar (16777216 bytes)
[sfc efrm] efrm_nic_add: ERROR: linux_efrm_nic_ctor failed (-12)
One solution is to use a 64 bit kernel. Another is to increase the virtual memory allocation on the 32 bit system by setting the vmalloc size on the ‘kernel line’ in the /boot/grub/grub.conf file to 256M, for example:
kernel /vmlinuz-2.6.18-238.el5 ro root=/dev/sda7 vmalloc=256M
The system must be rebooted for this change to take effect.
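The figures above can be worked through. One 16 MB BAR mapping is 16384 kB. With VmallocTotal 122880 kB and VmallocUsed 76380 kB, 46500 kB is free, which is room for at most two more 16 MB mappings. The largest contiguous free chunk (VmallocChunk 15600 kB) is smaller than a single mapping, which is consistent with the allocation failure shown. The sketch below parses /proc/meminfo-style text and performs that arithmetic. The names meminfo_kb(), bars_upper_bound() and BAR_KB are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define BAR_KB (16 * 1024)   /* one 16 MB BAR mapping, in kB */

/* Return the value in kB for a key such as "VmallocChunk" in the text of
 * /proc/meminfo, or -1 if the key is not present at the start of a line. */
long meminfo_kb(const char *meminfo, const char *key)
{
    size_t klen = strlen(key);
    const char *p = meminfo;
    long kb;

    while ((p = strstr(p, key)) != NULL) {
        if ((p == meminfo || p[-1] == '\n') && p[klen] == ':' &&
            sscanf(p + klen + 1, "%ld", &kb) == 1)
            return kb;
        p += klen;
    }
    return -1;
}

/* Upper bound on further 16 MB mappings that fit in free vmalloc space.
 * Each mapping needs contiguous space, so fragmentation (see VmallocChunk)
 * can make the real number lower. */
long bars_upper_bound(long total_kb, long used_kb)
{
    return total_kb > used_kb ? (total_kb - used_kb) / BAR_KB : 0;
}
```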
Hardware Resources
Onload uses certain physical resources on the network adapter. If these resources are exhausted, it is not possible to create new Onload stacks and not possible to accelerate new sockets. These physical resources include:
1 Virtual NICs. Virtual NICs provide the interface by which a user level application sends and receives network traffic. When these are exhausted it is not possible to create new Onload stacks, meaning new applications cannot be accelerated. However, Solarflare network adapters support large numbers of Virtual NICs, and this resource is not typically the first to run out.
2 Filters. Filters are used to demultiplex packets received from the wire to the appropriate application. When these are exhausted it is not possible to create new accelerated sockets. Solarflare recommend that applications do not allocate more than 4096 filters.
3 Buffer table entries. The buffer table provides address protection and translation for DMA buffers. When these are exhausted it is not possible to create new Onload stacks, and existing stacks are not able to allocate more DMA buffers.
When any of these resources are exhausted, normal operation of the system should continue, but it will not be possible to accelerate new sockets or applications.
Under severe conditions, after resources are exhausted, it may not be possible to send or receive traffic, resulting in applications getting ‘stuck’. The onload_stackdump utility should be used to monitor hardware resources.
IGMP Operation and Multicast Process Priority
It is important that processes using UDP multicast do not have a higher priority than the kernel thread handling the management of multicast group membership.
Failure to observe this could lead to the following situations:
1 Incorrect kernel IGMP operation.
2 The higher priority user process is able to effectively block the kernel thread and prevent it from identifying the multicast group to Onload, which will react by dropping packets received for the multicast group.
A combination of indicators may identify this:
• ethtool reports good packets being received while multicast mismatch does not increase.
• ifconfig identifies data is being received.
• onload_stackdump will show the rx_discard_mcast_mismatch counter increasing.
Lowering the priority of the user process will remedy the situation and allow the multicast packets through Onload to the user process.
Dynamic Loading
If the Onload library libonload is opened with dlopen() and closed with dlclose(), it can leave the application in an unpredictable state. Users are advised to use the RTLD_NODELETE flag to prevent the library from being unloaded when dlclose() is called.
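Here is a minimal sketch of the recommended flag, assuming glibc's dlopen(). The helper opens a library with RTLD_NOW | RTLD_NODELETE so that a later dlclose() does not unmap it. The name open_library_nodelete() is hypothetical. The library file name passed in (for example libonload.so) depends on how Onload is installed and is an assumption here.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Open a shared library so that a later dlclose() does not unload it:
 * RTLD_NODELETE keeps the library mapped for the life of the process.
 * Returns the handle, or NULL on failure (see dlerror() for the reason). */
void *open_library_nodelete(const char *name)
{
    return dlopen(name, RTLD_NOW | RTLD_NODELETE);
}
```

With this flag, a dlclose() on the returned handle only drops the reference count. The library's code and state remain in place for any threads or sockets still using it.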
Scalable Packet Buffer Mode
Support for SR-IOV is disabled on 32-bit kernels, therefore the following features are not available on 32-bit kernels:
• Scalable Packet Buffer Mode (EF_PACKET_BUFFER_MODE=1)
• ef_vi with VFs
On some kernel versions, configuring the adapter to have a large number of VFs (via sfboot) can cause kernel panics. This affects kernel versions 3.0 to 3.3 inclusive, and is due to the large netlink messages that include information about network interfaces.
The problem can be avoided by limiting the total number of physical network interfaces, including VFs, to a maximum of 30.
SLES 11 SR-IOV
It has been noted that some SLES 11 kernels (3.1 and earlier) exhibit a bug, typically seen when loading Onload drivers, when running OpenOnload with SR-IOV and Intel IOMMUs. This bug has been fixed in more recent kernels (3.2 stable and 3.6).
Huge Pages with IPC namespace
Huge page support should not be enabled if the application uses IPC namespaces and the CLONE_NEWIPC flag. Failure to observe this may result in a segfault.
Huge Pages with Shared Stacks
Processes which share an Onload stack should not attempt to use huge pages. Refer to Stack Sharing on page 62 for limitation details.
Huge Pages - Size
When using huge pages, it is recommended to avoid setting the page size greater than 2 MB. A failure to observe this could leave Onload unable to allocate further buffer table space for packet buffers.
Huge Pages - AMD IOMMU
Because the AMD IOMMU does not return aligned PCI addresses, the use of huge pages on systems with AMD IOMMUs is not supported.
Huge Pages and shmmni
Users should ensure that the number of system-wide shared memory segments (shmmni) exceeds the number of huge pages required.
• To identify the current shmmni setting:
# cat /proc/sys/kernel/shmmni
• To set (no reboot required, but not permanent):
# echo 8000 > /proc/sys/kernel/shmmni
• To set (permanent, reboot required):
# echo "kernel.shmmni = 8000" >> /etc/sysctl.conf
For example, if 4000 huge pages are required, increase the current shmmni value by 4000.
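The sizing rule can be expressed as a small calculation. The sketch below reads an integer from a /proc/sys file such as /proc/sys/kernel/shmmni and adds the number of huge pages required. The names read_proc_long() and suggested_shmmni() are hypothetical. For example, a current value of 4096 with 4000 huge pages required gives 8096.

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys file such as
 * /proc/sys/kernel/shmmni. Returns -1 if the file cannot be read. */
long read_proc_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Apply the rule above: raise shmmni by the number of huge pages needed. */
long suggested_shmmni(long current, long huge_pages_required)
{
    return current + huge_pages_required;
}
```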
Red Hat MRG 2 and SR-IOV
EnterpriseOnload from version 2.1.0.1 includes support for Red Hat MRG 2 update 3 and the 3.6.11-rt kernel. Solarflare do not recommend the use of SR-IOV or the IOMMU when using Onload on these systems, due to a number of known kernel issues. The following Onload features should not be used on MRG 2 update 3:
• Scalable packet buffer mode (EF_PACKET_BUFFER_MODE=1)
• ef_vi with VFs
PowerPC Architecture
• 32 bit applications are known not to work correctly with onload-201310. This has been corrected in onload-201310-u1.
• SR-IOV is not supported by onload-201310 on PowerPC systems. The recommended setting is EF_PACKET_BUFFER_MODE=0 or 2, but not 1 or 3.
• PowerPC architectures do not currently support PIO for reduced latency. EF_PIO should be set to zero.
Java 7 Applications - use of vfork()
Onload accelerated Java 7 applications that call vfork() should set the environment variable EF_VFORK_MODE=2. Thereafter the application should not create sockets or accelerated pipes in the vfork() child before exec.

12 Change History
This chapter provides a brief history of changes, additions and removals to Onload releases affecting Onload behavior and Onload environment variables.
• Features on page 130
• Environment Variables on page 135
• Module Options on page 143
The OOL column identifies the OpenOnload release supporting the feature. The EOL column identifies the EnterpriseOnload release supporting the feature (NS = not supported).
The following table maps major EnterpriseOnload releases to the closest functionally equivalent OpenOnload release. Users should also always refer to the Release Notes and Change Logs to identify feature support in the Enterprise release.
OpenOnload | EnterpriseOnload
201011-u1 | 1.0
201109-u2 | 2.0
201310-u2 | 3.0
201502-u2 | 4.0

12.1 Features
Feature | OOL | EOL | Description/Notes
4.5.1.1026 net driver | 201509 | NS | Adapter net driver.
Application Clustering | 201405 | NS | 201509 removes the same port, same address limitation.
CI_CFG_MAX_INTERFACES, CI_CFG_MAX_REGISTER_INTERFACES | ALL | NS | Default increased to 8 (previously 6). This remains a compile-time option.
onload_set_recv_filter() | 201509 | NS | Calls on UDP sockets are deprecated in 201509.
Teaming driver | 201509 | NS | Accelerate links aggregated using teamd and the teaming driver.
Transparent Proxy | 201509 | NS | See Transparent Reverse Proxy Modes on page 84.
Scalable Filters | 201509 | NS | See Scalable Filters on page 82.
IP_TRANSPARENT | 201509 | NS | TCP socket option.
4.5.1.1010 net driver | 201502-u2 | 4.0 | Adapter net driver.
4.4.1.1021 net driver | 201502-u1 | NS | Adapter net driver.
SO_PROTOCOL | 201502-u2 | 4.0 | Socket option to retrieve a socket's protocol as an integer.
4.4.1.1017 net driver | 201502 | NS | Adapter net driver.
Linux Docker Containers | 201502 | 4.0 | See Onload in a Docker Container on page 113.
Onload in KVM | 201502 | 4.0 | See Onload and Linux KVM on page 109.
Socket caching | 201502 | 4.0 | See Listen/Accept Sockets on page 79.
Remote Monitoring | 201502 | 4.0 | See Remote Monitoring on page 236.
Blacklist/Whitelist | 201502 | 4.0 | See Whitelist and Blacklist Interfaces on page 51.
TCP delegated send | 201502 | 4.0 | See Listen/Accept Sockets on page 79.
SYN Cookies | 201502 | 4.0 |
Receive queue drop counters | 201502 | 4.0 |
Ubuntu/Debian supported | 201502 | 4.0 | See Hardware and Software Supported Platforms on page 16 for supported versions.
4.1.2.1003 net driver | 201405-u2, 201405-u1 | NS | Net driver supporting RHEL 7 and later kernels.
SIOCOUTQ | 201405-u1 | 4.0 | TCP socket ioctl that returns the amount of data not yet acknowledged.
SIOCOUTQNSD | 201405-u1 | 4.0 | TCP socket ioctl that returns the amount of data not yet sent.
ef_pd_interface_name() | 201405-u1 | 4.0 | Identifies the interface used by a protection domain.
ef_vi_prime() | 201405-u1 | 4.0 | Prime interrupts so that an application can block on a file descriptor (including any virtual interface) until events are ready to be processed.
ef_filter_spec_set_tx_port_sniff() | 201405-u1 | 4.0 | New filter type to sniff TX traffic.
ONLOAD_SOF_TIMESTAMPING_STREAM | 201405 | 4.0 | Onload extension to the standard SO_TIMESTAMPING API to support hardware timestamps on TCP sockets.
onload_move_fd | 201405 | 4.0 | Moves sockets between stacks.
SolarCapture Pro - application clustering | 201405 | 4.0 | The Onload distribution includes the solar-clusterd daemon for the SolarCapture Pro application clustering feature.
4.1.0.6734 net driver | 201405 | 3.0.0.8, 3.0.0.7, 3.0.0.6, 3.0.0.5, 3.0.0.4 | Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, including SFN7x42Q.
SO_REUSEPORT | 201405 | 4.0 | Allows multiple sockets to bind to the same port, and supports the Application Clustering feature. See Application Clustering on page 63.
HW Multicast Loopback | 201405 | 4.0 | Refer to Hardware Multicast Loopback on page 94.
onload_ordered_epoll_wait(), onload_ordered_epoll_event | 201405 | 4.0 | Wire order delivery of packets. Refer to Wire Order Delivery on page 61.
TCP SYN cookies | 201405 | 4.0 | Force use of TCP SYN cookies to protect against a SYN flood attack.
onload_tool disable_cstates | 201405 | - | Removed along with the sfc_tune driver.
sfc_aoe driver | 201405 | NS | Application Onload™ driver included in the Onload distribution.
4.0.2.6645 net driver | 201310-u2 | 3.0 | Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, introducing hardware packet timestamps and PTP on 7xxx series adapters. SFN7142Q not supported.
SO_TIMESTAMPING | 201310-u1 | 3.0 | Socket option to receive hardware timestamps for received packets.
onload_fd_check_feature() | 201310-u1 | 3.0 | See onload_fd_check_feature on page 191.
4.0.2.6628 net driver | 201310-u1 | NS | Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, introducing hardware packet timestamps and PTP on 7xxx series adapters.
4.0.0.6585 net driver | 201310 | 3.0 | Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, Solarflare PTP and hardware packet timestamps.
Multicast Replication | 201310 | 3.0 | See Bonding, Link aggregation and Failover on page 65.
TX PIO | 201310 | 3.0 | See Debug and Logging on page 67.
Large Buffer Table Support | 201310 | 3.0 | See Large Buffer Table Support on page 97.
Templated Sends | 201310 | 3.0 | See Templated Sends on page 108.
ONLOAD_MSG_WARM | 201310 | 3.0 | See ONLOAD_MSG_WARM on page 78.
SO_TIMESTAMP, SO_TIMESTAMPNS | 201310 | 3.0 | Supported for TCP sockets.
dup3() | 201310 | 3.0 | Onload will intercept calls to create a copy of a file descriptor using dup3().
3.3.0.6262 net driver | NS | 2.1.0.1 | Support Solarflare Enhanced PTP (sfptpd).
IP_ADD_SOURCE_MEMBERSHIP | 201210-u1 | 3.0 | Join the supplied multicast group on the given interface and accept data from the supplied source address.
IP_DROP_SOURCE_MEMBERSHIP | 201210-u1 | 3.0 | Drops membership to the given multicast group, interface and source address.
MCAST_JOIN_SOURCE_GROUP | 201210-u1 | 3.0 | Join a source-specific group.
MCAST_LEAVE_SOURCE_GROUP | 201210-u1 | 3.0 | Leave a source-specific group.
3.3.0.6246 net driver | 201210-u1 | NS | Support Solarflare Enhanced PTP (sfptpd).
Huge pages support | 201210 | 3.0 | Packet buffers use huge pages. Controlled by EF_USE_HUGE_PAGES. Default is 1 (use huge pages if available). See Limitations on page 117.
onload_iptables | 201210 | 3.0 | Apply Linux iptables firewall rules or user-defined firewall rules to Solarflare interfaces.
onload_stackdump processes | 201210 | 3.0 | Show all accelerated processes by PID.
onload_stackdump affinities | 201210 | 3.0 | Show the CPU core each accelerated process is running on.
onload_stackdump env | 201210 | 3.0 | Show environment variables (see EF_VALIDATE_ENV).
Physical addressing mode | 201210 | 3.0 | Allows a process to use physical addresses rather than controlled I/O addresses. Enabled by EF_PACKET_BUFFER_MODE 2 or 3.
UDP sendmmsg() | 201210 | 3.0 | Send multiple messages in a single function call.
I/O Multiplexing | 201210 | 3.0 | Support for ppoll(), pselect() and epoll_pwait().
DKMS | 201210 | NS | OpenOnload available in DKMS RPM binary format.
3.2.1.6222B net driver | 201210 | NS | OpenOnload only.
3.2.1.6110 net driver | NS | 2.1.0.0 | EnterpriseOnload only.
3.2.1.6099 net driver | 201205-u1 | NS |
Removing zombie stacks | 201205-u1 | 2.1.0.0 | onload_stackdump -z kill will terminate stacks lingering after exit.
Compatibility | 201205-u1 | 2.1.0.0 | Compatibility with RHEL 6.3 and Linux 3.4.0.
TCP striping | 201205 | 2.1.0.0 | A single TCP connection can use the full bandwidth of both ports on a Solarflare adapter.
TCP loopback acceleration | 201205 | 2.1.0.0 | EF_TCP_CLIENT_LOOPBACK & EF_TCP_SERVER_LOOPBACK.
TCP delayed acknowledgments | 201205 | 2.1.0.0 | EF_DYNAMIC_ACK_THRESH.
TCP reset following RTO | 201205 | 2.1.0.0 | EF_TCP_RST_DELAYED_CONN.
Configure control plane tables | 201205 | 2.1.0.0 | max_layer2_interfaces, max_neighs, max_routes.
Onload adapter support | 201109-u2 | 2.0.0.0 | Onload support for SFN5322F & SFN6x22F.
Accelerate pipe2() | 201109-u2 | 2.0.0.0 | Accelerate pipe2() function call.
SOCK_NONBLOCK, SOCK_CLOEXEC | 201109-u2 | 2.0.0.0 | TCP socket types.
Extensions API | 201109-u2 | 2.0.0.0 | Support for onload_thread_set_spin().
3.2 net driver | 201109-u1 | 2.0.0.0 |
Onload_tcpdump | 201109 | 2.0.0.0 |
Scalable Packet Buffer | 201109 | 2.0.0.0 | EF_PACKET_BUFFER_MODE=1.
Zero-Copy UDP RX | 201109 | 2.0.0.0 |
Zero-Copy TCP TX | 201109 | 2.0.0.0 |
Receive filtering | 201109 | 2.0.0.0 |
TCP_QUICKACK | 201109 | 2.0.0.0 | setsockopt() option.
Benchmark tool sfnettest | 201109 | 2.0.0.0 | Support for sfnt-stream.
3.1 net driver | 201104 | |
Extensions API | 201104 | 2.0.0.0 | Initial publication.
SO_BINDTODEVICE, SO_TIMESTAMP, SO_TIMESTAMPNS | 201104 | 2.0.0.0 | setsockopt() and getsockopt() options.
Accelerated pipe() | 201104 | 2.0.0.0 | Accelerate pipe() function call.
UDP recvmmsg() | 201104 | 2.0.0.0 | Deliver multiple messages in a single function call.
Benchmark tool sfnettest | 201104 | 2.0.0.0 | Supports only sfnt-pingpong.
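The SO_PROTOCOL and SIOCOUTQ entries in the Features table are standard Linux socket interfaces, so an application reaches them through ordinary socket calls. This hedged Python sketch runs on a plain, non-accelerated loopback TCP connection. It relies on the Linux definition of SIOCOUTQ as TIOCOUTQ, which is why it uses `termios.TIOCOUTQ`. SIOCOUTQNSD has no constant in Python's standard library and is therefore omitted. The helper names are illustrative.

```python
import fcntl
import socket
import struct
import termios


def socket_protocol(sock):
    """SO_PROTOCOL: the socket's protocol as an integer (for example IPPROTO_TCP)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_PROTOCOL)


def unacked_bytes(sock):
    """SIOCOUTQ on a TCP socket: bytes queued but not yet acknowledged by the peer.

    Linux defines SIOCOUTQ as TIOCOUTQ, so the termios constant is used here.
    """
    result = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", result)[0]


def loopback_tcp_pair():
    """Return a connected (client, server) TCP socket pair on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        server, _addr = listener.accept()
    return client, server
```

On a freshly connected socket that has not sent anything, `unacked_bytes()` is 0 and `socket_protocol()` equals `socket.IPPROTO_TCP`.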
12.2 Environment Variables
Variable | OOL | EOL | Changed | Notes
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE | 201509 | NS | | Control the behaviour of non-blocking UDP send() calls when insufficient buffers can be allocated.
EF_TCP_SYNRECV_MAX | 201509 | NS | | Limit the number of half-open connections that can be created in an Onload stack.
EF_TCP_SOCKBUF_MAX_FRACTION | 201509 | NS | | Control the fraction of total TX buffers allocated to a single socket.
EF_TCP_CONNECT_SPIN | 201509 | NS | | Calls to connect() for TCP sockets will spin until a connection is established, the spin timeout expires or the socket timeout expires. Default = disabled.
EF_SCALABLE_FILTERS_ENABLE | 201509 | NS | | Toggle scalable filters mode for a stack.
EF_SCALABLE_FILTERS_MODE | 201509 | NS | | Stores the scalable filter mode set with EF_SCALABLE_FILTERS. Do not set directly.
EF_SCALABLE_FILTERS | 201509 | NS | | Identify the interface to use and set the mode for scalable listening sockets.
EF_RETRANSMIT_THRESHOLD_ORPHAN | 201509 | NS | | Number of retransmit timeouts before a TCP connection is aborted in the case of an orphaned connection.
EF_MAX_EP_PINNED_PAGES | NS | 1.0 | 201509 | Not used in previous releases and removed from 201509.
EF_OFE_ENGINE_SIZE | 201502 | NS | | Size (bytes) of the Onload filter engine allocated when a new stack is created.
EF_TCP_SNDBUF_ESTABLISHED_DEFAULT | 201502 | 4.0 | | Override the OS default value for SO_SNDBUF for TCP sockets in the ESTABLISHED state.
EF_TCP_RCVBUF_STRICT | 201502 | 4.0 | | Prevent a TCP small segment attack by limiting the number of packets in a TCP receive queue and reorder buffer.
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT | 201502 | 4.0 | | Override the OS default value for SO_RCVBUF for TCP sockets in the ESTABLISHED state.
EF_SO_BUSY_POLL_SPIN | 201502 | 4.0 | | Spin only if a spinning socket is present in the poll/select/epoll set.
EF_SELECT_NONBLOCK_FAST_USEC | 201502 | 4.0 | | Non-accelerated sockets are polled only every N usecs.
EF_SELECT_FAST_USEC | 201502 | 4.0 | | Accelerated sockets are polled for N usecs before unaccelerated sockets.
EF_PIPE_SIZE | 201502 | 4.0 | 201509 | Default size of a pipe. Default decreased to 229376 from 237568.
EF_SOCKET_CACHE_MAX | 201502 | 4.0 | | Set the maximum number of TCP sockets to cache per stack.
EF_SOCKET_CACHE_PORTS | 201502 | 4.0 | | Allow caching of sockets bound to specified ports.
EF_PER_SOCKET_CACHE_MAX | 201502 | 4.0 | | Limit the size of a socket cache.
EF_COMPOUND_PAGES_MODE | 201502 | 4.0 | | Control Onload use of compound pages.
EF_UL_EPOLL=3 | 201502 | 4.0 | |
EF_ACCEPT_INHERIT_NODELAY | NS | 3.0 | 201502/4.0 | Removed (OOL) 201502, (EOL) 4.0.
EF_TCP_SEND_NONBLOCK_NO_PACKETS_MODE | 201502 | 3.0.0.3 | | Control non-blocking TCP send() call behavior when unable to allocate sufficient packet buffers.
EF_CLUSTER_IGNORE | 201405-u1 | 4.0 | | Ignore attempts to use clusters.
EF_CLUSTER_RESTART | 201405 | 4.0 | | Determine Onload cluster behavior following restart.
EF_CLUSTER_SIZE | 201405 | 4.0 | | Size (number of socket members) of the application cluster.
EF_CLUSTER_NAME | 201405 | 4.0 | | Create an application cluster.
EF_UDP_FORCE_REUSEPORT | 201405 | 4.0 | | Support Application Clustering for legacy applications.
EF_TCP_FORCE_REUSEPORT | 201405 | 4.0 | | Support Application Clustering for legacy applications.
EF_MCAST_SEND | 201405 | 4.0 | | Enable/disable multicast loopback.
EF_MCAST_RECV_HW_LOOP | 201405 | 4.0 | | Enable/disable hardware multicast loopback (receive).
EF_TX_TIMESTAMPING | 201405 | 4.0 | | Per-stack hardware timestamping control.
EF_TIMESTAMPING_REPORTING | 201405 | 4.0 | | Control timestamp reporting.
EF_TCP_SYNCOOKIES | 201405 | 4.0 | | Use TCP SYN cookies to protect against a SYN flood attack.
EF_SYNC_CPLANE_AT_CREATE | 201405 | 3.0 | | Synchronize the control plane when a stack is created.
EF_MULTICAST_LOOP_OFF | - | 3.0 | 201405 | Deprecated in favor of EF_MCAST_SEND.
EF_TX_PUSH_THRESHOLD | 201310-u1 | 3.0 | | Improve the EF_TX_PUSH low latency transmit feature.
EF_RX_TIMESTAMPING | 201310-u1 | 3.0 | | Control of receive packet hardware timestamps.
EF_RETRANSMIT_THRESHOLD_SYNACK | 201104 | 1.0.0.0 | 201310-u1 | Default changed from 4 to 5.
EF_PIO | 201310 | 3.0 | | Enable/disable PIO. Default value 1.
EF_PIO_THRESHOLD | 201310 | 3.0 | | Identifies the largest packet size that can use PIO. Default value is 1514.
EF_VFORK_MODE | 201310 | 3.0 | | Dictates how the vfork() intercept should work.
EF_FREE_PACKETS_LOW_WATERMARK | 201310 | 3.0 | 201405-u1 | Level of free packets to be retained during runtime. Default changed to 0 (interpreted as EF_RXQ_SIZE/2) from 100.
EF_TCP_SNDBUF_MODE | 201310 | 2.0.0.6 | 201502, 4.0, 201509 | Limit TCP packet buffers used on the send queue and retransmit queue. Default changed to 1 from 0 in 201502/4.0. Added mode 2 in 201509.
EF_TXQ_SIZE | | 3.0 | 201310 | Limited to 2048 for SFN7000 series.
EF_MAX_ENDPOINTS | 201104 | 1.1.0.3 | 201310, 201509 | Default changed to 1024 from 10. Default changes to 8192 from 1024. Min (default) changes to 4 from 0.
EF_SO_TIMESTAMP_RESYNC_TIME | 201104 | 2.1.0.1 | 201310 | Removed from OOL.
EF_SIGNALS_NOPOSTPONE | 201210-u1 | 2.1.0.1 | | Prevent the specified list of signals from being postponed by Onload.
EF_FORCE_TCP_NODELAY | 201210 | 3.0 | | Force use of TCP_NODELAY.
EF_USE_HUGE_PAGES | 201210 | 3.0 | | Enables huge pages for packet buffers.
EF_VALIDATE_ENV | 201210 | 3.0 | | Will warn about obsolete or misspelled options in the environment. Default value 1.
EF_PD_VF | 201205-u1 | 2.1.0.0 | 201210 | Allocate VIs within SR-IOV VFs to allocate unlimited memory. Replaced with new options on EF_PACKET_BUFFER_MODE.
EF_PD_PHYS_MODE | 201205-u1 | 2.1.0.0 | 201210 | Allows a VI to use physical addressing rather than protected I/O addresses. Replaced with new options on EF_PACKET_BUFFER_MODE.
EF_MAX_PACKETS | 20101111 | 1.0.0.0 | 201210 | Onload will round the specified value up to the nearest multiple of 1024.
EF_EPCACHE_MAX | 20101111 | 1.0.0.0 | 201210 | Removed from OOL.
EF_TCP_MAX_SEQERR_MSGS | NS | | 201210 | Removed.
EF_STACK_LOCK_BUZZ | 20101111 | 1.0.0.0 | 201210 | OOL: changed to per-process from per-stack. EOL is per-stack.
EF_RFC_RTO_INITIAL | 20101111 | 1.0.0.0 | 201210, 2.1.0.0 | Default changed to 1000 from 3000.
EF_DYNAMIC_ACK_THRESH | 201205 | 2.1.0.0 | 201210 | Default value changed to 16 from 32 in 201210.
EF_TCP_SERVER_LOOPBACK, EF_TCP_CLIENT_LOOPBACK | 201205 | 2.1.0.0 | 201210 | TCP loopback acceleration. Added option 4 for client loopback to cause both ends of a TCP connection to share a newly created stack. Option 4 is supported from EnterpriseOnload v3.0.
EF_TCP_RST_DELAYED | 201205 | 2.1.0.0 | | Reset TCP connection following RTO expiry.
EF_SA_ONSTACK_INTERCEPT | 201205 | 2.1.0.0 | | Default value 0.
EF_SHARE_WITH | 201109-u2 | 2.0.0.0 | |
EF_EPOLL_CTL_HANDOFF | 201109-u2 | 2.0.0.0 | | Default value 1.
EF_CHECK_STACK_USER | NS | | 201109-u2 | Renamed EF_SHARE_WITH.
EF_POLL_USEC | 201109-u1 | 1.0.0.0 | |
EF_DEFER_WORK_LIMIT | 201109-u1 | 2.0.0.0 | | Default value 32.
EF_POLL_FAST_LOOPS | 20101111 | 1.0.0.0 | 201109-u1, 2.0.0.0 | Renamed EF_POLL_FAST_USEC.
EF_POLL_NONBLOCK_FAST_LOOPS | 201104 | 2.0.0.0 | 201109-u1, 2.0.0.1 | Renamed EF_POLL_NONBLOCK_FAST_USEC.
EF_PIPE_RECV_SPIN | 201104 | 2.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_PKT_WAIT_SPIN | 20101111 | 1.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_PIPE_SEND_SPIN | 201104 | 2.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_TCP_ACCEPT_SPIN | 20101111 | 1.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_TCP_RECV_SPIN | 20101111 | 1.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_TCP_SEND_SPIN | 20101111 | 1.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_UDP_RECV_SPIN | 20101111 | 1.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_UDP_SEND_SPIN | 20101111 | 1.0.0.0 | 201109-u1 | Becomes per-process, was previously per-stack.
EF_EPOLL_NONBLOCK_FAST_LOOPS | 201104-u2 | 2.0.0.0 | 201109-u1 | Removed.
EF_POLL_AVOID_INT | 20101111 | 1.0.0.0 | 201109-u1 | Removed.
EF_SELECT_AVOID_INT | 20101111 | 1.0.0.0 | 201109-u1 | Removed.
EF_SIG_DEFER | 20101111 | 1.0.0.0 | 201109-u1 | Removed.
EF_IRQ_CORE | 201109 | 2.0.0.0 | 201109-u2 | A non-root user can now set it when using scalable packet buffer mode.
EF_IRQ_CHANNEL | 201109 | 2.0.0.0 | |
EF_IRQ_MODERATION | 201109 | 2.0.0.0 | | Default value 0.
EF_PACKET_BUFFER_MODE | 201109 | 2.0.0.0 | 201210 | In 201210, options 2 and 3 enable physical addressing mode. EOL only supports option 1. EOL v3.0 supports options 2 and 3. Default: disabled.
EF_SIG_REINIT | 201109 | NS | 201109-u1 | Default value 0. Removed in 201109-u1.
EF_POLL_TCP_LISTEN_UL_ONLY | 201104 | 2.0.0.0 | 201109 | Removed.
EF_POLL_UDP | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_POLL_UDP_TX_FAST | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_POLL_UDP_UL_ONLY | 201104 | 2.0.0.0 | 201109 | Removed.
EF_SELECT_UDP | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_SELECT_UDP_TX_FAST | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_UDP_CHECK_ERRORS | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_UDP_RECV_FAST_LOOPS | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_UDP_RECV_MCAST_UL_ONLY | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_UDP_RECV_UL_ONLY | 20101111 | 1.0.0.0 | 201109 | Removed.
EF_TX_QOS_CLASS | 201104-u2 | 2.0.0.0 | | Default value 0.
EF_TX_MIN_IPG_CNTL | 201104-u2 | 2.0.0.0 | | Default value 0.
EF_TCP_LISTEN_HANDOVER | 201104-u2 | 2.0.0.0 | | Default value 0.
EF_TCP_CONNECT_HANDOVER | 201104-u2 | 2.0.0.0 | | Default value 0.
EF_EPOLL_NONBLOCK_FAST_LOOPS | 201104-u2 | 2.0.0.0 | 201109-u1 | Default value 32. Removed in 201109-u1.
EF_TCP_SNDBUF_MODE | | 2.0.0.6 | | Default value 0.
EF_UDP_PORT_HANDOVER2_MAX | 201104-u1 | 2.0.0.0 | | Default value 1.
EF_UDP_PORT_HANDOVER2_MIN | 201104-u1 | 2.0.0.0 | | Default value 2.
EF_UDP_PORT_HANDOVER3_MAX | 201104-u1 | 2.0.0.0 | | Default value 1.
EF_UDP_PORT_HANDOVER3_MIN | 201104-u1 | 2.0.0.0 | | Default value 2.
EF_STACK_PER_THREAD | 201104-u1 | 2.0.0.0 | | Default value 0.
EF_PREFAULT_PACKETS | 20101111 | 1.0.0.0 | 201104-u1 | Enabled by default, was previously disabled.
EF_MCAST_RECV | 201104-u1 | 2.0.0.0 | | Default value 1.
EF_MCAST_JOIN_BINDTODEVICE | 201104-u1 | 2.0.0.0 | | Default value 0.
EF_MCAST_JOIN_HANDOVER | 201104-u1 | 2.0.0.0 | | Default value 0.
EF_DONT_ACCELERATE | 201104-u1 | 2.0.0.0 | | Default value 0.
EF_MULTICAST | 20101111 | 1.0.0.0 | 201104-u1 | Removed.
EF_TX_PUSH | 20101111-u1 | 1.0.0.0 | 201104, 201109 | Enabled by default, was previously disabled (201104). No longer set by the latency profile script (201109).
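The EF_MAX_PACKETS entry in the Environment Variables table states that Onload rounds the requested value up to the nearest multiple of 1024. A small model of that rule can help when sizing packet buffers. This sketch is illustrative only. It does not model the max_packets_per_stack module option, which caps EF_MAX_PACKETS separately.

```python
def rounded_max_packets(requested, granule=1024):
    """Round a requested EF_MAX_PACKETS value up to the next multiple of 1024."""
    if requested < 0:
        raise ValueError("EF_MAX_PACKETS cannot be negative")
    # Ceiling division via negated floor division, then scale back up.
    return -(-requested // granule) * granule
```

For example, a request for 20000 packet buffers becomes 20480.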
12.3 Module Options
To list all onload module options:
# modinfo onload
Option | OOL | EOL | Changed | Notes
scalable_filter_gid | 201509 | NS | | Set to a group identifier of users allowed to use the scalable filters feature. Setting -2 means that CAP_NET_RAW is required and the check is enforced. Set to -1 to avoid the capability (CAP_NET_RAW) check.
oof_shared_steal_thresh | | | | See Listen/Accept Sockets on page 79.
oof_shared_keep_thresh | | | | See Listen/Accept Sockets on page 79.
oof_all_ports_required | | | | When set to 1, Onload will return an error if it is unable to install a filter on all required interfaces. Set this to 0 when using multiple PFs or VFs with Onload.
intf_white_list | 201502 | NS | | See Whitelist and Blacklist Interfaces on page 51.
intf_black_list | 201502 | NS | | See Whitelist and Blacklist Interfaces on page 51.
timesync_period | 201502 | NS | | Period in milliseconds between synchronizing the Onload clock with the system clock.
max_packets_per_stack | 201210 | 3.0 | | Limit the number of packet buffers that each Onload stack can allocate. This module option places an upper limit on the EF_MAX_PACKETS option.
epoll2_max_stacks | 201210 | 3.0 | | Identifies the maximum number of stacks that an epoll file descriptor can handle when EF_UL_EPOLL=2.
phys_mod_gid | 201210 | 3.0 | | sfc_char module parameter to restrict which ef_vi users can use physical addressing mode.
phys_mode_gid | 201210 | 3.0 | | Enable physical addressing mode and restrict which users can use it.
shared_buffer_table | 201210 | NS | | This option should be set to enable ef_vi applications that use the ef_iobufset API. Setting shared_buffer_table=10000 will make 10000 buffer table entries available for use with ef_iobufset.
safe_signals_and_exit | 201205 | 2.1.0.0 | | When Onload intercepts a termination signal, it will attempt a clean exit by releasing resources, including stack locks. The default is enabled (1). It is recommended that this remains enabled unless signal handling problems occur, in which case it can be disabled (0).
max_layer2_interfaces | 201205 | 2.1.0.0 | | Maximum number of network interfaces (including physical, VLAN and bonds) supported in the control plane.
max_routes | 201205 | 2.1.0.0 | | Maximum number of entries in the Onload route table. Default is 256.
max_neighs | 201205 | 2.1.0.0 | | Maximum number of entries in the Onload ARP/neighbour table. Rounded up to a power of two value. Default is 1024.
unsafe_sriov_without_iommu | 201209-u2 | 2.0.0.0 | 201210 | Removed, obsoleted by physical addressing modes and phys_mode_gid. Obsolete in EOL from v3.0.
buffer_table_min, buffer_table_max | | 2.0.0.0 | 201210 | Obsolete, removed. Obsolete in EOL from v3.0.
NOTE: The user should always refer to the Onload distribution Release Notes and ChangeLog. These are available from http://www.openonload.org/download.html.
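The max_neighs entry in the Module Options table says the requested table size is rounded up to a power of two. A minimal model of that rounding follows. It assumes ordinary "next power of two, inclusive" semantics, which agrees with the default of 1024 being left unchanged. Under this rule, a modprobe configuration line such as `options onload max_neighs=3000` would yield a 4096-entry table.

```python
def rounded_max_neighs(requested):
    """Round a max_neighs request up to the next power of two (powers of two are kept)."""
    if requested < 1:
        raise ValueError("max_neighs must be at least 1")
    # (n - 1).bit_length() is the exponent of the smallest power of two >= n.
    return 1 << (requested - 1).bit_length()
```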

A Parameter Reference
A.1 Parameter List
The parameter list details the following:
• The environment variable used to set the parameter.
• Parameter name: the name used by onload_stackdump.
• The default, minimum and maximum values.
• Whether the variable scope applies per-stack or per-process.
• Description.
EF_ACCEPTQ_MIN_BACKLOG
Name: acceptq_min_backlog default: 1 per-stack
Sets a minimum value to use for the 'backlog' argument to the listen() call. If the application requests a smaller value, use this value instead.
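The EF_ACCEPTQ_MIN_BACKLOG substitution amounts to taking the larger of the two values. The sketch below models that rule in user code with ordinary Python sockets. It is illustrative only: under Onload the substitution happens inside Onload's listen() handling, and the application needs no change. The helper names are hypothetical.

```python
import socket


def effective_backlog(requested, acceptq_min_backlog=1):
    """Backlog actually used: the application's value, raised to the configured minimum."""
    return max(requested, acceptq_min_backlog)


def listen_with_min_backlog(sock, requested, acceptq_min_backlog=1):
    """Call listen() with the backlog that EF_ACCEPTQ_MIN_BACKLOG would enforce."""
    backlog = effective_backlog(requested, acceptq_min_backlog)
    sock.listen(backlog)
    return backlog
```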
EF_ACCEPT_INHERIT_NONBLOCK
Name: accept_force_inherit_nonblock default: 0 min: 0 max: 1 per-process
If set to 1, TCP sockets accepted from a listening socket inherit the O_NONBLOCK flag from the listening socket.
EF_BINDTODEVICE_HANDOVER
Name: bindtodevice_handover default: 0 min: 0 max: 1 per-stack
Hand sockets that have the SO_BINDTODEVICE socket option enabled over to the kernel stack.
EF_BURST_CONTROL_LIMIT
Name: burst_control_limit default: 0 per-stack
If non-zero, limits how many bytes of data are transmitted in a single burst. This can be useful to avoid drops on low-end switches which contain limited buffering or limited internal bandwidth. This is not usually needed with most modern, high-performance switches.
EF_BUZZ_USEC
Name: buzz_usec default: 0 per-stack
Sets the timeout in microseconds for lock buzzing options. Set to zero to disable lock buzzing (spinning). Will buzz forever if set to -1. Also set by the EF_POLL_USEC option.
EF_CLUSTER_IGNORE
Name: cluster_ignore default: 0 min: 0 max: 1 per-stack
When set, this option instructs Onload to ignore attempts to use clusters and effectively ignore attempts to set SO_REUSEPORT.
EF_CLUSTER_RESTART
Name: cluster_restart_opt default: 0 min: 0 max: 1 per-process
This option controls the behaviour when Onload recreates a stack in an SO_REUSEPORT cluster (e.g. due to restarting a process) and encounters a resource limitation, such as an orphan stack from the previous process: 0 - return an error; 1 - terminate the orphan to allow the new process to continue.
EF_CLUSTER_SIZE
Name: cluster_size default: 2 min: 2 per-process
If use of SO_REUSEPORT creates a cluster, this option specifies the size of the cluster to be created. This option has no impact if use of SO_REUSEPORT joins a cluster that already exists. Note that if fewer sockets than specified here join the cluster, then some traffic will be lost. Refer to the SO_REUSEPORT section in the manual for more detail.
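The EF_CLUSTER_SIZE text implies that an application must open as many SO_REUSEPORT sockets on the same address and port as the cluster size, or some traffic will be lost. This Python sketch shows that binding pattern with plain Linux sockets. For brevity all members live in one process; real deployments typically spread them over several processes or threads. `open_reuseport_group` is a hypothetical helper. With the default EF_CLUSTER_SIZE of 2, a cluster would expect exactly two such members.

```python
import socket


def open_reuseport_group(size, host="127.0.0.1", port=0):
    """Bind `size` UDP sockets to one address and port, each with SO_REUSEPORT set."""
    members = []
    try:
        for _ in range(size):
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            members.append(sock)
            # SO_REUSEPORT must be set on every member before bind().
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            sock.bind((host, port))
            # Later members bind to the port the kernel picked for the first one.
            port = sock.getsockname()[1]
    except OSError:
        for sock in members:
            sock.close()
        raise
    return members
```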

EF_COMPOUND_PAGES_MODE
Name: compound_pages default: 0 min: 0 max: 2 per-stack
Debug option, not suitable for normal use. For packet buffers, allocate system pages in the following way: 0 - try to use compound pages if possible (default); 1 - do not use compound pages of high order; 2 - do not use compound pages at all.
EF_CONG_AVOID_SCALE_BACK
Name: cong_avoid_scale_back default: 0 per-stack
When >0, this option slows down the rate at which the TCP congestion window is opened. This can help to reduce loss in environments where there is lots of congestion and loss.
EF_DEFER_WORK_LIMIT
Name: defer_work_limit default: 32 per-stack
The maximum number of times that work can be deferred to the lock holder before we force the unlocked thread to block and wait for the lock.
EF_DELACK_THRESH
Name: delack_thresh default: 1 min: 0 max: 65535 per-stack
This option controls the delayed acknowledgement algorithm. A socket may receive up to the specified number of TCP segments without generating an ACK. Setting this option to 0 disables delayed acknowledgements. NB. This option is overridden by EF_DYNAMIC_ACK_THRESH, so both options need to be set to 0 to disable delayed acknowledgements.
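Because EF_DYNAMIC_ACK_THRESH overrides EF_DELACK_THRESH, disabling delayed ACKs means setting both variables to 0. The Python sketch below checks an environment for that condition using the defaults listed in this reference (1 and 16), and builds an environment with both set to 0. The helper names are hypothetical. Such an environment could then be passed to a launcher, for example `subprocess.run(["onload", "./my_app"], env=env)`, where `./my_app` is a placeholder.

```python
import os

# Documented defaults from this parameter reference.
ACK_DEFAULTS = {"EF_DELACK_THRESH": 1, "EF_DYNAMIC_ACK_THRESH": 16}


def delayed_acks_disabled(env):
    """True only if both EF_DELACK_THRESH and EF_DYNAMIC_ACK_THRESH resolve to 0."""
    return all(int(env.get(name, default)) == 0 for name, default in ACK_DEFAULTS.items())


def env_without_delayed_acks(base=None):
    """Return a copy of an environment with both ACK thresholds forced to 0."""
    env = dict(os.environ if base is None else base)
    env.update({name: "0" for name in ACK_DEFAULTS})
    return env
```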
EF_DONT_ACCELERATE
Name: dont_accelerate default: 0 min: 0 max: 1 per-process
Do not accelerate by default. This option is usually used in conjunction with onload_set_stackname() to allow individual sockets to be accelerated selectively.
EF_DYNAMIC_ACK_THRESH
Name: dynack_thresh default: 16 min: 0 max: 65535 per-stack
If set to >0, this turns on dynamic adaptation of the ACK rate, which increases efficiency by avoiding ACKs when they would reduce throughput. The value is used as the threshold for the number of pending ACKs before an ACK is forced. If set to zero, then the standard delayed-ack algorithm is used.
EF_EPOLL_CTL_FAST
Name: ul_epoll_ctl_fast default: 1 min: 0 max: 1 per-process
Avoid system calls in epoll_ctl() when using an accelerated epoll implementation. System calls are deferred until epoll_wait() blocks, and in some cases removed completely. This option improves performance for applications that call epoll_ctl() frequently. CAVEATS:
• This option has no effect when EF_UL_EPOLL=0.
• Do not turn this option on if your application uses dup(), fork() or exec() in conjunction with epoll file descriptors or with the sockets monitored by epoll.
• If you monitor the epoll fd in another poll, select or epoll set, and the effects of epoll_ctl() are latency critical, then this option can cause latency spikes or even deadlock.
• With EF_UL_EPOLL=2, this option is harmful if you are calling epoll_wait() and epoll_ctl() simultaneously from different threads or processes.
EF_EPOLL_CTL_HANDOFF
Name: ul_epoll_ctl_handoff default: 1 min: 0 max: 1 per-process
Allow epoll_ctl() calls to be passed from one thread to another in order to avoid lock contention, in the EF_UL_EPOLL=1 or 3 case. This optimisation is particularly important when epoll_ctl() calls are made concurrently with epoll_wait() and spinning is enabled. This option is enabled by default. CAVEAT: This option may cause an error code returned by epoll_ctl() to be hidden from the application when a call is deferred. In such cases an error message is emitted to stderr or the system log.
EF_EPOLL_MT_SAFE
Name: ul_epoll_mt_safe default: 0 min: 0 max: 1 per-process
This option disables concurrency control inside the accelerated epoll implementations, reducing CPU overhead. It is safe to enable this option if, for each epoll set, all calls on the epoll set and all calls that may modify a member of the epoll set are concurrency safe. Calls that may modify a member are bind(), connect(), listen() and close(). This option improves performance with EF_UL_EPOLL=1 or 3, and also with EF_UL_EPOLL=2 and EF_EPOLL_CTL_FAST=1.
EF_EPOLL_SPIN
Name: ul_epoll_spin default: 0 min: 0 max: 1 per-process
Spin in epoll_wait() calls until an event is satisfied or the spin timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
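The following is a user-space analogue of the spin-then-block behaviour described above, using Python's `select.epoll` on Linux. It is not Onload's implementation: with EF_EPOLL_SPIN set, Onload spins inside its own epoll_wait(). Here `spin_usec` merely stands in for EF_SPIN_USEC/EF_POLL_USEC. `spin_then_block` is a hypothetical helper.

```python
import select
import time


def spin_then_block(ep, spin_usec, block_timeout=None):
    """Busy-poll an epoll set for up to spin_usec microseconds, then block.

    Returns (events, satisfied_while_spinning). block_timeout is in seconds;
    None blocks until an event arrives, as epoll.poll() does.
    """
    deadline = time.monotonic() + spin_usec / 1_000_000
    while time.monotonic() < deadline:
        events = ep.poll(0)  # zero timeout: check for events without sleeping
        if events:
            return events, True
    # Spin budget exhausted: fall back to a normal (possibly blocking) wait.
    return ep.poll(block_timeout), False
```

Spinning trades CPU time for latency: an event that arrives during the spin is picked up without a kernel sleep and wake-up.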
EF_EVS_PER_POLL
Name: evs_per_poll default: 64 min: 0 max: 0x7fffffff per-stack
Sets the number of hardware network events to handle before performing other work. The value chosen represents a trade-off: larger values increase batching (which typically improves efficiency) but may also increase the working set size (which harms cache efficiency).
EF_FDS_MT_SAFE
Name: fds_mt_safe default: 1 min: 0 max: 1 per-process
This option allows less strict concurrency control when accessing the user-level file descriptor table, resulting in increased performance, particularly for multi-threaded applications. Single-threaded applications get a small latency benefit, but multi-threaded applications benefit most due to decreased cache-line bouncing between CPU cores.
This option is unsafe for applications that make changes to file descriptors in one thread while accessing the same file descriptors in other threads. An example is closing a file descriptor in one thread while invoking another system call on that file descriptor in a second thread. Concurrent calls that do not change the object underlying the file descriptor remain safe.
Calls to bind(), connect() and listen() may change the underlying object. If you call such functions in one thread while accessing the same file descriptor from another thread, this option is also unsafe. In some special cases, any function may change the underlying object. Concurrent calls may also happen from signal handlers, so set this to 0 if your signal handlers call bind(), connect(), listen() or close().
EF_FDTABLE_SIZE
Name: fdtable_size default: 0 per-process
Limit the number of opened file descriptors to this value. If zero, the initial hard limit of open files (`ulimit -n -H`) is used. Hard and soft resource limits for opened file descriptors (help ulimit, man 2 setrlimit) are bounded by this value.
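The hard limit reported by `ulimit -n -H` is the hard RLIMIT_NOFILE value returned by getrlimit(). This Python sketch models how EF_FDTABLE_SIZE resolves and how it bounds the soft and hard limits. It is a model of the description above, not Onload code. The helper names are hypothetical. It treats `resource.RLIM_INFINITY` as "no limit" when clamping.

```python
import resource


def effective_fdtable_size(fdtable_size=0):
    """Model of EF_FDTABLE_SIZE: 0 means 'use the initial hard RLIMIT_NOFILE'."""
    if fdtable_size < 0:
        raise ValueError("EF_FDTABLE_SIZE cannot be negative")
    if fdtable_size:
        return fdtable_size
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return hard


def _clamp(limit, table_size):
    """Bound one rlimit value by the table size; an unlimited value becomes the table size."""
    return table_size if limit == resource.RLIM_INFINITY else min(limit, table_size)


def bounded_nofile_limits(soft, hard, table_size):
    """Return (soft, hard) open-file limits after bounding both by the table size."""
    return _clamp(soft, table_size), _clamp(hard, table_size)
```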
EF_FDTABLE_STRICT
Name: fdtable_strict default: 0 min: 0 max: 1 per-process
Enables stricter concurrency control for the user-level file descriptor table. Enabling this option can reduce performance for applications that create and destroy many connections per second.
EF_FORCE_SEND_MULTICAST
Name: force_send_multicast default: 1 min: 0 max: 1 per-stack
This option causes all multicast sends to be accelerated. When disabled, multicast sends are only accelerated for sockets that have cleared the IP_MULTICAST_LOOP flag. This option disables loopback of multicast traffic to receivers on the same host, unless (a) those receivers are sharing an OpenOnload stack with the sender (see EF_NAME) and EF_MCAST_SEND is set to 1 or 3, or (b) prerequisites to support loopback to other OpenOnload stacks are met (see EF_MCAST_SEND). See the OpenOnload manual for further details on multicast operation.
EF_FORCE_TCP_NODELAY
Name: tcp_force_nodelay default: 0 min: 0 max: 2 per-stack
This option allows the user to override the use of TCP_NODELAY. This may be useful in cases where 3rd-party software is (or is not) setting this value and the user would like to control its behaviour: 0 - do not override; 1 - always set TCP_NODELAY; 2 - never set TCP_NODELAY.
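The three EF_FORCE_TCP_NODELAY modes can be read as a small decision rule applied on top of whatever the application asks for. This hedged Python model applies that rule and then sets the standard TCP_NODELAY option on an ordinary kernel socket. Under Onload, the override is applied by Onload itself, so an application would not do this. The helper names are hypothetical.

```python
import socket


def effective_nodelay(force_mode, app_requested):
    """Resolve TCP_NODELAY under EF_FORCE_TCP_NODELAY (0, 1 or 2)."""
    if force_mode == 0:
        return app_requested  # do not override the application
    if force_mode == 1:
        return True  # always set TCP_NODELAY
    if force_mode == 2:
        return False  # never set TCP_NODELAY
    raise ValueError("EF_FORCE_TCP_NODELAY must be 0, 1 or 2")


def apply_nodelay(sock, force_mode, app_requested):
    """Set TCP_NODELAY on sock as the override would, and return the resulting value."""
    value = effective_nodelay(force_mode, app_requested)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(value))
    return bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
```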
EF_FORK_NETIF
Name: fork_netif default: 3 min: CI_UNIX_FORK_NETIF_NONE max: CI_UNIX_FORK_NETIF_BOTH per-process
This option controls behaviour after an application calls fork(): 0 - neither fork parent nor child creates a new OpenOnload stack; 1 - child creates a new stack for new sockets; 2 - parent creates a new stack for new sockets; 3 - parent and child each create a new stack for new sockets.
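The four EF_FORK_NETIF values act as two independent choices, one for each side of the fork(). The min and max names (NONE and BOTH) suggest the same reading. This small lookup table is an illustrative decoding of the documented values, not Onload code. `new_stacks_after_fork` is a hypothetical helper.

```python
# (parent creates new stack, child creates new stack) for each EF_FORK_NETIF value.
FORK_NETIF_BEHAVIOUR = {
    0: (False, False),  # neither parent nor child
    1: (False, True),   # child only
    2: (True, False),   # parent only
    3: (True, True),    # both (the default)
}


def new_stacks_after_fork(mode=3):
    """Return (parent_new_stack, child_new_stack) for new sockets created after fork()."""
    try:
        return FORK_NETIF_BEHAVIOUR[mode]
    except KeyError:
        raise ValueError(f"EF_FORK_NETIF must be 0-3, got {mode}") from None
```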
EF_FREE_PACKETS_LOW_WATERMARK
Name: free_packets_low default: 0 per-stack
Keep the number of free packets at or above this value. EF_MIN_FREE_PACKETS defines initialisation behaviour; this value concerns normal application runtime. In some combinations of hardware and software, Onload is not able to allocate packets in every context, so it makes sense to keep some spare packets. The default value 0 is interpreted as EF_RXQ_SIZE/2.
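This sketch shows how the default of 0 resolves and what "keep at least this many free packets" means as a test. It assumes integer division for EF_RXQ_SIZE/2. The value 512 used in the checks is an arbitrary sample receive queue size, not a documented default. Both helpers are hypothetical.

```python
def free_packets_low_watermark(configured, rxq_size):
    """Resolve EF_FREE_PACKETS_LOW_WATERMARK: 0 means EF_RXQ_SIZE / 2."""
    if configured < 0 or rxq_size < 0:
        raise ValueError("values must be non-negative")
    return rxq_size // 2 if configured == 0 else configured


def below_low_watermark(free_packets, configured, rxq_size):
    """True when the free packet pool has dropped below the low watermark."""
    return free_packets < free_packets_low_watermark(configured, rxq_size)
```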
EF_HELPER_PRIME_USEC
Name: timer_prime_usec default: 250 per-stack
Sets the frequency with which software should reset the count-down timer. Usually set to a value that is significantly smaller than EF_HELPER_USEC, to prevent the count-down timer from firing unless needed. Defaults to (EF_HELPER_USEC / 2).
EF_HELPER_USEC
Name: timer_usec default: 500 per-stack
Timeout in microseconds for the count-down interrupt timer. This timer generates an interrupt if network events are not handled by the application within the given time. It ensures that network events are handled promptly when the application is not invoking the network, or is descheduled. Set this to 0 to disable the count-down interrupt timer. It is disabled by default for stacks that are interrupt driven.
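The two timer parameters above are related: the re-prime interval defaults to half the count-down timeout, which is how the documented defaults of 500 and 250 fit together. This sketch resolves that relationship. It does not model the rule that the timer is disabled by default for interrupt-driven stacks. `helper_timer_settings` is a hypothetical helper.

```python
def helper_timer_settings(helper_usec=500, prime_usec=None):
    """Resolve (EF_HELPER_USEC, EF_HELPER_PRIME_USEC, timer_enabled) from the documented defaults."""
    if helper_usec < 0:
        raise ValueError("EF_HELPER_USEC cannot be negative")
    if prime_usec is None:
        prime_usec = helper_usec // 2  # documented default: EF_HELPER_USEC / 2
    timer_enabled = helper_usec != 0  # 0 disables the count-down interrupt timer
    return helper_usec, prime_usec, timer_enabled
```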
EF_INT_DRIVEN
Name: int_driven default: 1 min: 0 max: 1 per-stack
Put the stack into an 'interrupt driven' mode of operation. When this option is not enabled, Onload uses heuristics to decide when to enable interrupts, and this can cause latency jitter in some applications. Enabling this option can therefore help avoid latency outliers. This option is enabled by default except when spinning is enabled. This option can be used in conjunction with spinning to prevent outliers caused when the spin timeout is exceeded and the application blocks, or when the application is descheduled. In this case we recommend that interrupt moderation be set to a reasonably high value (e.g. 100us) to prevent too high a rate of interrupts.
EF_INT_REPRIME
Name: int_reprime default: 0 min: 0 max: 1 per-stack
Enable interrupts more aggressively than the default.
EF_IRQ_CHANNEL
Name: irq_channel default: 4294967295 min: -1 max: SMAX per-stack
Set the net driver receive channel that will be used to handle interrupts for this stack. The core that receives interrupts for this stack will be whichever core is configured to handle interrupts for the specified net driver receive channel. This option only takes effect when EF_PACKET_BUFFER_MODE=0 (default) or 2.
EF_IRQ_CORE
Name: irq_core default: 4294967295 min: -1 max: SMAX per-stack
Specify which CPU core interrupts for this stack should be handled on. With EF_PACKET_BUFFER_MODE=1 or 3, Onload creates dedicated interrupts for each stack, and the interrupt is assigned to the requested core. With EF_PACKET_BUFFER_MODE=0 (default) or 2, Onload interrupts are handled via net driver receive channel interrupts, and the sfc_affinity driver is used to choose which net driver receive channel is used. It is only possible for interrupts to be handled on the requested core if a net driver interrupt is assigned to the selected core; otherwise a nearby core will be selected. Note that if the IRQ balancer service is enabled, it may redirect interrupts to other cores.
EF_IRQ_MODERATION
Name: irq_usec default: 0 min: 0 max: 1000000 per-stack
Interrupt moderation interval, in microseconds. This option only takes effect with EF_PACKET_BUFFER_MODE=1 or 3. Otherwise the interrupt moderation settings of the kernel net driver take effect.
EF_KEEPALIVE_INTVL
Name: keepalive_intvl default: 75000 per-stack
Default interval between keepalives, in milliseconds.
EF_KEEPALIVE_PROBES
Name: keepalive_probes default: 9 per-stack
Default number of keepalive probes to try before aborting the connection.
EF_KEEPALIVE_TIME
Name: keepalive_time default: 7200000 per-stack
Default idle time before keepalive probes are sent, in milliseconds.
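With the defaults above, a connection whose peer has silently gone away sits idle for EF_KEEPALIVE_TIME, then sends up to EF_KEEPALIVE_PROBES probes spaced EF_KEEPALIVE_INTVL apart. The sketch below does that arithmetic, assuming Onload follows the usual Linux keepalive model (abort after the last unanswered probe interval). It also shows the standard Linux per-socket options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT, in seconds) that express the same settings for one socket; the helper names are hypothetical.

```python
import socket

# Onload defaults from this reference, in milliseconds.
KEEPALIVE_TIME_MS = 7_200_000   # EF_KEEPALIVE_TIME
KEEPALIVE_INTVL_MS = 75_000     # EF_KEEPALIVE_INTVL
KEEPALIVE_PROBES = 9            # EF_KEEPALIVE_PROBES


def dead_peer_detection_ms(time_ms: int, intvl_ms: int, probes: int) -> int:
    """Idle time plus one interval per unanswered probe (Linux-style model)."""
    return time_ms + probes * intvl_ms


def enable_keepalive(sock: socket.socket, time_ms: int, intvl_ms: int, probes: int) -> None:
    """Apply equivalent keepalive settings to one socket (Linux; values in seconds)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, time_ms // 1000)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl_ms // 1000)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


total = dead_peer_detection_ms(KEEPALIVE_TIME_MS, KEEPALIVE_INTVL_MS, KEEPALIVE_PROBES)
print(total, "ms =", total / 3_600_000, "hours")  # 7875000 ms = 2.1875 hours
```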
EF_LOAD_ENV
Name: load_env default: 1 min: 0 max: 1 per-process
OpenOnload will only consult other environment variables if this option is set; that is, clearing this option will cause all other EF_ environment variables to be ignored.
EF_LOG
Name: log_category default: 27 min: 0 per-stack
Designed to control how chatty Onload's informative/warning messages are. Specified as a comma-separated list of options to enable and disable (with a minus sign). Valid options are 'banner' (on by default), 'resource_warnings' (on by default), 'config_warnings' (on by default), 'conn_drop' (off by default) and 'usage_warnings' (on by default). E.g. to enable conn_drop: EF_LOG=conn_drop. E.g. to enable conn_drop and turn off resource warnings: EF_LOG=conn_drop,-resource_warnings
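The enable/disable syntax can be modelled as applying tokens to the documented defaults. This is a sketch rather than Onload's own parser: it assumes that categories not named in EF_LOG keep their default state (as both examples above imply), and the function name is hypothetical.

```python
# Documented defaults: every category is on except conn_drop.
EF_LOG_DEFAULTS = {
    "banner": True,
    "resource_warnings": True,
    "config_warnings": True,
    "conn_drop": False,
    "usage_warnings": True,
}


def parse_ef_log(value: str) -> dict:
    """Apply an EF_LOG string such as "conn_drop,-resource_warnings" to the defaults."""
    settings = dict(EF_LOG_DEFAULTS)
    for token in (t.strip() for t in value.split(",")):
        if not token:
            continue
        enable = not token.startswith("-")   # a leading minus sign disables
        name = token if enable else token[1:]
        if name not in settings:
            raise ValueError(f"unknown EF_LOG option: {name!r}")
        settings[name] = enable
    return settings


print(parse_ef_log("conn_drop,-resource_warnings"))
```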
EF_LOG_FILE
Scope: per-stack
When EF_LOG_VIA_IOCTL is unset, the user can direct Onload debug and output data to a directory/file instead of stdout and instead of the syslog.
EF_LOG_TIMESTAMPS
Name: EF_LOG_TIMESTAMPS default: 0 min: 0 max: 1 global
If enabled, this will add a timestamp to every Onload output log entry. Timestamps are taken from the FRC counter.
EF_LOG_VIA_IOCTL
Name: log_via_ioctl default: 0 min: 0 max: 1 per-process
Causes error and log messages emitted by OpenOnload to be written to the system log rather than written to standard error. This includes the copyright banner emitted when an application creates a new OpenOnload stack. By default, OpenOnload logs are written to the application standard error if and only if it is a TTY. Enable this option when it is important not to change what the application writes to standard error. Disable it to guarantee that log goes to standard error even if it is not a TTY.
EF_MAX_ENDPOINTS
Name: max_ep_bufs default: 8192 min: 4 max: CI_CFG_NETIF_MAX_ENDPOINTS_MAX per-stack
This option places an upper limit on the number of accelerated endpoints (sockets, pipes etc.) in an Onload stack. This option should be set to a power of two between 4 and 2^21. When this limit is reached, listening sockets are not able to accept new connections over accelerated interfaces. New sockets and pipes created via socket() and pipe() etc. are handed over to the kernel stack and so are not accelerated. Note: ~4 syn-receive states consume one endpoint; see also EF_TCP_SYNRECV_MAX.
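A rough way to size this option is to count the endpoints a stack needs and round up to the next valid power of two. The sketch below assumes one endpoint per socket or pipe plus roughly one per four SYN-RECEIVED states (from the note above); it is an estimate, not Onload's exact accounting, and all names are hypothetical.

```python
def validate_max_endpoints(n: int) -> int:
    """EF_MAX_ENDPOINTS must be a power of two between 4 and 2**21."""
    if not 4 <= n <= 2**21 or n & (n - 1):
        raise ValueError(f"EF_MAX_ENDPOINTS={n}: need a power of two in [4, 2^21]")
    return n


def endpoints_needed(sockets: int, pipes: int = 0, syn_recv: int = 0) -> int:
    """Rough budget: one per socket or pipe, about one per 4 SYN-RECEIVED states."""
    return sockets + pipes + -(-syn_recv // 4)   # ceiling division for the SYN-RECEIVED share


def suggest_max_endpoints(needed: int) -> int:
    """Smallest valid EF_MAX_ENDPOINTS that covers the estimate."""
    n = 4
    while n < needed:
        n *= 2
    return validate_max_endpoints(n)


need = endpoints_needed(sockets=5000, syn_recv=1000)
print(need, suggest_max_endpoints(need))   # 5250 8192
```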
EF_MAX_PACKETS
Name: max_packets default: 32768 min: 1024 per-stack
Upper limit on the number of packet buffers in each OpenOnload stack. Packet buffers require hardware resources, which may become a limiting factor if many stacks are each using many packet buffers. This option can be used to limit how much hardware resource and memory a stack uses. This option has an upper limit determined by the max_packets_per_stack onload module option. Note: When 'scalable packet buffer mode' is not enabled (see EF_PACKET_BUFFER_MODE), the total number of packet buffers possible in aggregate is limited by a hardware resource. The SFN5x series adapters support approximately 120,000 packet buffers.
EF_MAX_RX_PACKETS
Name: max_rx_packets default: 24576 min: 0 max: 1000000000 per-stack
The maximum number of packet buffers in a stack that can be used by the receive data path. This should be set to a value smaller than EF_MAX_PACKETS to ensure that some packet buffers are reserved for the transmit path.
EF_MAX_TX_PACKETS
Name: max_tx_packets default: 24576 min: 0 max: 1000000000 per-stack
The maximum number of packet buffers in a stack that can be used by the transmit data path. This should be set to a value smaller than EF_MAX_PACKETS to ensure that some packet buffers are reserved for the receive path.
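The relationship between the three limits is simple subtraction: whatever one path cannot take is left for the other. The sketch below (hypothetical helper name) checks a configuration and shows that the defaults leave 8192 buffers for each path.

```python
def packet_reserves(max_packets: int = 32768, max_rx: int = 24576, max_tx: int = 24576) -> dict:
    """Buffers left for one path when the other path has reached its cap."""
    if max_packets < 1024:
        raise ValueError("EF_MAX_PACKETS has a minimum of 1024")
    for name, cap in (("EF_MAX_RX_PACKETS", max_rx), ("EF_MAX_TX_PACKETS", max_tx)):
        if cap >= max_packets:
            print(f"warning: {name}={cap} reserves nothing below EF_MAX_PACKETS={max_packets}")
    return {
        "left_for_tx": max(max_packets - max_rx, 0),
        "left_for_rx": max(max_packets - max_tx, 0),
    }


print(packet_reserves())  # defaults leave 8192 buffers for each path
```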
EF_MCAST_JOIN_BINDTODEVICE
Name: mcast_join_bindtodevice default: 0 min: 0 max: 1 per-stack
When a UDP socket joins a multicast group (using IP_ADD_MEMBERSHIP or similar), this option causes the socket to be bound to the interface that the join was on. The benefit of this is that it ensures the socket will not accidentally receive packets from other interfaces that happen to match the same group and port. This can sometimes happen if another socket joins the same multicast group on a different interface, or if the switch is not filtering multicast traffic effectively. If the socket joins multicast groups on more than one interface, then the binding is automatically removed.
EF_MCAST_JOIN_HANDOVER
Name: mcast_join_handover default: 0 min: 0 max: 2 per-stack
When this option is set to 1, and a UDP socket joins a multicast group on an interface that is not accelerated, the UDP socket is handed over to the kernel stack. This can be a good idea because it prevents that socket from consuming Onload resources, and may also help avoid spinning when it is not wanted. When set to 2, UDP sockets that join multicast groups are always handed over to the kernel stack.
EF_MCAST_RECV
Name: mcast_recv default: 1 min: 0 max: 1 per-stack
Controls whether or not to accelerate multicast receives. When set to zero, multicast receives are not accelerated, but the socket continues to be managed by Onload. See also EF_MCAST_JOIN_HANDOVER. See the OpenOnload manual for further details on multicast operation.
EF_MCAST_RECV_HW_LOOP
Name: mcast_recv_hw_loop default: 1 min: 0 max: 1 per-stack
When enabled, allows UDP sockets to receive multicast traffic that originates from other OpenOnload stacks. See the OpenOnload manual for further details on multicast operation.
EF_MCAST_SEND
Name: mcast_send default: 0 min: 0 max: 3 per-stack
Controls loopback of multicast traffic to receivers in the same and other OpenOnload stacks. When set to 0 (default), disables loopback within the same stack as well as to other OpenOnload stacks. When set to 1, enables loopback to the same stack. When set to 2, enables loopback to other OpenOnload stacks. When set to 3, enables loopback to the same as well as other OpenOnload stacks. In respect of loopback to other OpenOnload stacks, the option is just a hint, and the feature requires: (a) a 7000-series or newer device, and (b) selecting a firmware variant with loopback support. See the OpenOnload manual for further details on multicast operation.
EF_MIN_FREE_PACKETS
Name: min_free_packets default: 100 min: 0 max: 1000000000 per-stack
Minimum number of free packets to reserve for each stack at initialisation. If Onload is not able to allocate sufficient packet buffers to fill the RX rings and fill the free pool with the given number of buffers, then creation of the stack will fail.
EF_MULTICAST_LOOP_OFF
Name: multicast_loop_off default: 1 min: 0 max: 1 per-stack
EF_MULTICAST_LOOP_OFF is deprecated in favour of EF_MCAST_SEND. When set, disables loopback of multicast traffic to receivers in the same OpenOnload stack. This option only takes effect when EF_MCAST_SEND is not set; its values of 0 and 1 are equivalent to EF_MCAST_SEND=1 and EF_MCAST_SEND=0 respectively. See the OpenOnload manual for further details on multicast operation.
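The values 0 to 3 of EF_MCAST_SEND read naturally as two independent flags (1 = same stack, 2 = other stacks, 3 = both). The sketch below uses that reading, which is an interpretation of the descriptions above rather than a documented encoding. It also resolves the deprecated EF_MULTICAST_LOOP_OFF, using None to stand for "EF_MCAST_SEND not set". All names are hypothetical.

```python
SAME_STACK = 1    # loop back to receivers in the same OpenOnload stack
OTHER_STACKS = 2  # loop back to other OpenOnload stacks (a hint only)


def effective_mcast_send(mcast_send=None, multicast_loop_off=1):
    """Resolve the loopback mode; EF_MULTICAST_LOOP_OFF applies only if EF_MCAST_SEND is unset."""
    if mcast_send is not None:
        if not 0 <= mcast_send <= 3:
            raise ValueError("EF_MCAST_SEND must be 0-3")
        return mcast_send
    return 0 if multicast_loop_off else SAME_STACK


def loopback_targets(mode):
    return {
        "same_stack": bool(mode & SAME_STACK),
        "other_stacks": bool(mode & OTHER_STACKS),
    }


print(loopback_targets(effective_mcast_send()))                      # both False
print(loopback_targets(effective_mcast_send(multicast_loop_off=0)))  # same stack only
print(loopback_targets(effective_mcast_send(3)))                     # both True
```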
EF_NETIF_DTOR
Name: netif_dtor default: 1 min: 0 max: 2 per-process
This option controls the lifetime of OpenOnload stacks when the last socket in a stack is closed.
EF_NAME
Default: none min: 8 chars per-stack
The environment variable EF_NAME will be honoured to control Onload stack sharing. However, a call to onload_set_stackname overrides this variable, and EF_DONT_ACCELERATE and EF_STACK_PER_THREAD both take precedence over EF_NAME.
EF_NONAGLE_INFLIGHT_MAX
Name: nonagle_inflight_max default: 50 min: 1 per-stack
This option affects the behaviour of TCP sockets with the TCP_NODELAY socket option. Nagle's algorithm is enabled when the number of packets in-flight (sent but not acknowledged) exceeds the value of this option. This improves efficiency when sending many small messages, while preserving low latency. Set this option to -1 to ensure that Nagle's algorithm never delays sending of TCP messages on sockets with TCP_NODELAY enabled.
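The rule above is a single threshold comparison. A minimal model (hypothetical function name, not Onload code) of when Nagle's algorithm may hold back a send on a TCP_NODELAY socket:

```python
def nagle_may_delay(inflight_packets: int, nonagle_inflight_max: int = 50) -> bool:
    """True if Nagle's algorithm may hold back a send on a TCP_NODELAY socket."""
    if nonagle_inflight_max == -1:
        return False  # -1: never delay sends on TCP_NODELAY sockets
    # Nagle is re-enabled only once in-flight packets EXCEED the threshold.
    return inflight_packets > nonagle_inflight_max


for inflight in (10, 50, 51):
    print(inflight, nagle_may_delay(inflight))  # 10 False / 50 False / 51 True
print(nagle_may_delay(1000, -1))                 # False
```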
EF_NO_FAIL
Name: no_fail default: 1 min: 0 max: 1 per-process
This option controls whether failure to create an accelerated socket (due to resource limitations) is hidden by creating a conventional unaccelerated socket. Set this option to 0 to cause out-of-resources errors to be propagated as errors to the application, or to 1 to have Onload use the kernel stack instead when out of resources. Disabling this option can be useful to ensure that sockets are being accelerated as expected (i.e. to find out when they are not).
EF_PACKET_BUFFER_MODE
Name: packet_buffer_mode default: 0 min: 0 max: 3 per-stack
This option affects how DMA buffers are managed. The default packet buffer mode uses a limited hardware resource, and so restricts the total amount of memory that can be used by Onload for DMA. Setting EF_PACKET_BUFFER_MODE!=0 enables 'scalable packet buffer mode', which removes that limit. Details for each mode:
1 - SR-IOV with IOMMU. Each stack allocates a separate PCI Virtual Function. The IOMMU guarantees that different stacks do not have any access to each other's data.
2 - Physical address mode. Inherently unsafe; there is no address space separation between different stacks or net driver packets.
3 - SR-IOV with physical address mode. Each stack allocates a separate PCI Virtual Function. The IOMMU is not used, so this mode is unsafe in the same way as (2).
To use the odd modes (1 and 3), SR-IOV must be enabled in the BIOS, in the OS kernel and on the network adapter. In these modes you also get a faster interrupt handler, which can improve latency for some workloads. For mode (1) you also have to enable the IOMMU (also known as VT-d) in the BIOS and in your kernel. For the unsafe physical address modes (2) and (3), you should tune the phys_mode_gid module parameter of the onload module.
EF_PER_SOCKET_CACHE_MAX
Name: per_sock_cache_max default: 0 per-stack
When socket caching is enabled (i.e. when EF_SOCKET_CACHE_MAX > 0), this sets a further limit on the size of the cache for each socket. If set to zero, no limit is set beyond the global limit specified by EF_SOCKET_CACHE_MAX.
EF_PIO
Name: pio default: 1 min: 0 max: 2 per-stack
Control of whether Programmed I/O is used instead of DMA for small packets: 0 - no (use DMA); 1 - use PIO for small packets if available (default); 2 - use PIO for small packets and fail if PIO is not available. Mode 1 will fall back to DMA if PIO is not currently available. Mode 2 will fail to create the stack if the hardware supports PIO but PIO is not currently available. On hardware that does not support PIO there is no difference between mode 1 and mode 2. In all cases, PIO will only be used for small packets (see EF_PIO_THRESHOLD) and if the VI's transmit queue is currently empty. If these conditions are not met, DMA will be used, even in mode 2. Note: PIO is currently only available on x86_64 systems. Note: Mode 2 will not prevent a stack from operating without PIO in the event that PIO allocation is originally successful but then fails after an adapter is rebooted or hotplugged while that stack exists.
EF_PIO_THRESHOLD
Name: pio_thresh default: 1514 min: 0 per-stack
Sets a threshold for the size of packet that will use PIO, if turned on using EF_PIO. Packets up to the threshold will use PIO. Larger packets will not.
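Taken together, EF_PIO and EF_PIO_THRESHOLD describe a per-send decision plus a stack-creation check for mode 2. The sketch below models both as described above. It is a simplification with hypothetical names, and "up to the threshold" is read as inclusive.

```python
def tx_path(pkt_len: int, ef_pio: int = 1, pio_available: bool = True,
            txq_empty: bool = True, pio_thresh: int = 1514) -> str:
    """Return "PIO" or "DMA" for a single send."""
    if ef_pio == 0 or not pio_available:
        return "DMA"                      # PIO off, or mode 1 falling back
    if pkt_len <= pio_thresh and txq_empty:
        return "PIO"                      # small packet, empty TX queue
    return "DMA"                          # conditions not met, even in mode 2


def stack_creation_ok(ef_pio: int, hw_supports_pio: bool, pio_allocated: bool) -> bool:
    """Mode 2 fails only when the hardware supports PIO but it cannot be allocated."""
    return not (ef_pio == 2 and hw_supports_pio and not pio_allocated)


print(tx_path(64), tx_path(1515), tx_path(64, txq_empty=False))              # PIO DMA DMA
print(stack_creation_ok(2, True, False), stack_creation_ok(2, False, False))  # False True
```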
EF_PIPE
Name: ul_pipe default: 2 min: CI_UNIX_PIPE_DONT_ACCELERATE max: CI_UNIX_PIPE_ACCELERATE_IF_NETIF per-process
0 - disable pipe acceleration; 1 - enable pipe acceleration; 2 - accelerate pipes only if an Onload stack already exists in the process.
EF_PIPE_RECV_SPIN
Name: pipe_recv_spin default: 0 min: 0 max: 1 per-process
Spin in pipe receive calls until data arrives or the spin timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_PIPE_SEND_SPIN
Name: pipe_send_spin default: 0 min: 0 max: 1 per-process
Spin in pipe send calls until space becomes available in the socket buffer or the spin timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_PIPE_SIZE
Name: pipe_size default: 229376 min: OO_PIPE_MIN_SIZE max: CI_CFG_MAX_PIPE_SIZE per-process
Default size of the pipe in bytes. The actual pipe size will be rounded up to the size of a packet buffer, and is subject to modification by fcntl(F_SETPIPE_SZ) where supported.
EF_PKT_WAIT_SPIN
Name: pkt_wait_spin default: 0 min: 0 max: 1 per-process
Spin while waiting for DMA buffers. If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_POLL_FAST
Name: ul_poll_fast default: 1 min: 0 max: 1 per-process
Allow a poll() call to return without inspecting the state of all polled file descriptors when at least one event is satisfied. This allows the accelerated poll() call to avoid a system call when accelerated sockets are 'ready', and can increase performance substantially. This option changes the semantics of poll(), and as such could cause applications to misbehave. It effectively gives priority to accelerated sockets over non-accelerated sockets and other file descriptors. In practice a vast majority of applications work fine with this option.
EF_POLL_FAST_USEC
Name: ul_poll_fast_usec default: 32 per-process
When spinning in a poll() call, causes accelerated sockets to be polled for N usecs before unaccelerated sockets are polled. This reduces latency for accelerated sockets, possibly at the expense of latency on unaccelerated sockets. Since accelerated sockets are typically the parts of the application which are most performance-sensitive, this is typically a good tradeoff.
EF_POLL_NONBLOCK_FAST_USEC
Name: ul_poll_nonblock_fast_usec default: 200 per-process
When invoking poll() with timeout==0 (non-blocking), this option causes non-accelerated sockets to be polled only every N usecs. This reduces latency for accelerated sockets, possibly at the expense of latency on unaccelerated sockets. Since accelerated sockets are typically the parts of the application which are most performance-sensitive, this is often a good tradeoff. Set this option to zero to disable, or to a higher value to further improve latency for accelerated sockets. This option changes the behaviour of poll() calls, so could potentially cause an application to misbehave.
EF_POLL_ON_DEMAND
Name: poll_on_demand default: 1 min: 0 max: 1 per-stack
Poll for network events in the context of the application calls into the network stack. This option is enabled by default. This option can improve performance in multi-threaded applications where the Onload stack is interrupt-driven (EF_INT_DRIVEN=1), because it can reduce lock contention. Setting EF_POLL_ON_DEMAND=0 ensures that network events are (mostly) processed in response to interrupts.
EF_POLL_SPIN
Name: ul_poll_spin default: 0 min: 0 max: 1 per-process
Spin in poll() calls until an event is satisfied or the spin timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_POLL_USEC
Name: ef_poll_usec_meta_option default: 0 per-process
This option enables spinning and sets the spin timeout in microseconds. Setting this option is equivalent to: setting EF_SPIN_USEC and EF_BUZZ_USEC; enabling spinning for UDP sends and receives, TCP sends and receives, select, poll and epoll_wait(); and enabling lock buzzing. Spinning typically reduces latency and jitter substantially, and can also improve throughput. However, in some applications spinning can harm performance, particularly applications that have many threads. When spinning is enabled you should normally dedicate a CPU core to each thread that spins. You can use the EF_*_SPIN options to selectively enable or disable spinning for each API and transport. You can also use the onload_thread_set_spin() extension API to control spinning on a per-thread and per-API basis.
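To illustrate the "equivalent to" statement above, the sketch below builds the individual environment variables that EF_POLL_USEC stands in for. The exact list is an assumption: EF_UDP_RECV_SPIN, EF_UDP_SEND_SPIN, EF_TCP_RECV_SPIN, EF_TCP_SEND_SPIN and EF_EPOLL_SPIN are believed to be the per-API Onload options, "lock buzzing" is taken to mean EF_SOCK_LOCK_BUZZ and EF_STACK_LOCK_BUZZ, and giving EF_BUZZ_USEC the same value is also assumed. In practice, setting EF_POLL_USEC itself is simpler. The second helper shows the selective alternative; "./app" in the comment is a hypothetical program.

```python
# Per-API spin switches that EF_POLL_USEC is assumed to turn on (see lead-in).
SPIN_FLAGS = (
    "EF_UDP_RECV_SPIN", "EF_UDP_SEND_SPIN",
    "EF_TCP_RECV_SPIN", "EF_TCP_SEND_SPIN",
    "EF_SELECT_SPIN", "EF_POLL_SPIN", "EF_EPOLL_SPIN",
)
BUZZ_FLAGS = ("EF_SOCK_LOCK_BUZZ", "EF_STACK_LOCK_BUZZ")


def expand_poll_usec(usec: int) -> dict:
    """Approximate expansion of EF_POLL_USEC=<usec> (EF_BUZZ_USEC value assumed)."""
    env = {"EF_SPIN_USEC": str(usec), "EF_BUZZ_USEC": str(usec)}
    env.update({name: "1" for name in SPIN_FLAGS + BUZZ_FLAGS})
    return env


def selective_spin_env(usec: int, *flags: str) -> dict:
    """One spin timeout, with spinning enabled only for the listed APIs."""
    unknown = set(flags) - set(SPIN_FLAGS)
    if unknown:
        raise ValueError(f"not a known spin flag: {sorted(unknown)}")
    env = {"EF_SPIN_USEC": str(usec)}
    env.update({name: "1" for name in flags})
    return env


# Spin for up to 100 ms in epoll_wait() only; launch with e.g.
#   subprocess.run(["onload", "./app"], env={**os.environ, **env})
env = selective_spin_env(100_000, "EF_EPOLL_SPIN")
print(env)
```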
EF_PREFAULT_PACKETS
Name: prefault_packets default: 1 min: 0 max: 1000000000 per-stack
When set, this option causes the process to 'touch' the specified number of packet buffers when the Onload stack is created. This causes memory for the packet buffers to be pre-allocated, and also causes them to be memory-mapped into the process address space. This can prevent latency jitter caused by allocation and memory-mapping overheads. The number of packets requested is in addition to the packet buffers that are allocated to fill the RX rings. There is no guarantee that it will be possible to allocate the number of packet buffers requested. The default setting causes all packet buffers to be mapped into the user-level address space, but does not cause any extra buffers to be reserved. Set to 0 to prevent prefaulting.
EF_PROBE
Name: probe default: 1 min: 0 max: 1 per-process
When set, file descriptors accessed following exec() will be 'probed' and OpenOnload sockets will be mapped to user-land so that they can be accelerated. Otherwise OpenOnload sockets are not accelerated following exec().
EF_RETRANSMIT_THRESHOLD
Name: retransmit_threshold default: 15 min: 0 max: SMAX per-stack
Number of retransmit timeouts before a TCP connection is aborted.
EF_RETRANSMIT_THRESHOLD_ORPHAN
Name: retransmit_threshold_orphan default: 8 min: 0 max: SMAX per-stack
Number of retransmit timeouts before a TCP connection is aborted, in the case of an orphaned connection.
EF_RETRANSMIT_THRESHOLD_SYN
Name: retransmit_threshold_syn default: 4 min: 0 max: SMAX per-stack
Number of times a SYN will be retransmitted before a connect() attempt will be aborted.
EF_RETRANSMIT_THRESHOLD_SYNACK
Name: retransmit_threshold_synack default: 5 min: 0 max: CI_CFG_TCP_SYNACK_RETRANS_MAX per-stack
Number of times a SYN-ACK will be retransmitted before an embryonic connection will be aborted.
EF_RFC_RTO_INITIAL
Name: rto_initial default: 1000 per-stack
Initial retransmit timeout in milliseconds, i.e. the number of milliseconds to wait for an ACK before retransmitting packets.
EF_RFC_RTO_MAX
Name: rto_max default: 120000 per-stack
Maximum retransmit timeout in milliseconds.
EF_RFC_RTO_MIN
Name: rto_min default: 200 per-stack
Minimum retransmit timeout in milliseconds.
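The RTO limits and EF_RETRANSMIT_THRESHOLD together bound how long an unacknowledged segment is retried. This reference does not state Onload's exact backoff rule, so the sketch below assumes textbook binary exponential backoff: each timeout doubles and is clamped between EF_RFC_RTO_MIN and EF_RFC_RTO_MAX. It also ignores RTT measurement, which in real TCP replaces the initial value once a sample is available. The function name is hypothetical.

```python
def rto_schedule(rto_initial: int = 1000, rto_min: int = 200,
                 rto_max: int = 120_000, threshold: int = 15) -> list:
    """Successive retransmit timeouts (ms) until the connection is aborted.

    Assumes classic binary exponential backoff, clamped to [rto_min, rto_max].
    """
    rto = min(max(rto_initial, rto_min), rto_max)
    timeouts = []
    for _ in range(threshold):
        timeouts.append(rto)
        rto = min(rto * 2, rto_max)
    return timeouts


schedule = rto_schedule()
print(schedule[:8])                    # [1000, 2000, 4000, 8000, 16000, 32000, 64000, 120000]
print(sum(schedule) / 60_000, "min")   # about 18.1 minutes before abort
```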
EF_RXQ_LIMIT
Name: rxq_limit default: 65535 min: CI_CFG_RX_DESC_BATCH max: 65535 per-stack
Maximum fill level for the receive descriptor ring. This has no effect when it has a value larger than the ring size (EF_RXQ_SIZE).
EF_RXQ_MIN
Name: rxq_min default: 256 min: 2 * CI_CFG_RX_DESC_BATCH + 1 per-stack
Minimum initial fill level for each RX ring. If Onload is not able to allocate sufficient packet buffers to fill each RX ring to this level, then creation of the stack will fail.
EF_RXQ_SIZE
Name: rxq_size default: 512 min: 512 max: 4096 per-stack
Set the size of the receive descriptor ring. Valid values: 512, 1024, 2048 or 4096. A larger ring size can absorb larger packet bursts without drops, but may reduce efficiency because the working set size is increased.
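The three RX ring options interact: the ring size caps the effective fill level, EF_RXQ_LIMIT can only lower it, and EF_RXQ_MIN is the level that must be reached at stack creation. A minimal sketch with hypothetical names; treating EF_RXQ_MIN above the achievable fill level as an error is an assumption of this sketch.

```python
VALID_RXQ_SIZES = (512, 1024, 2048, 4096)


def rx_fill_plan(rxq_size: int = 512, rxq_limit: int = 65535, rxq_min: int = 256) -> tuple:
    """Return (ring size, maximum fill level, minimum initial fill level)."""
    if rxq_size not in VALID_RXQ_SIZES:
        raise ValueError(f"EF_RXQ_SIZE={rxq_size}: valid values are {VALID_RXQ_SIZES}")
    # EF_RXQ_LIMIT has no effect when it is larger than the ring size.
    max_fill = min(rxq_limit, rxq_size)
    if rxq_min > max_fill:
        # Treated as a configuration error in this sketch (assumption).
        raise ValueError("EF_RXQ_MIN exceeds the achievable fill level")
    return rxq_size, max_fill, rxq_min


print(rx_fill_plan())                                # (512, 512, 256)
print(rx_fill_plan(rxq_size=4096, rxq_limit=1024))   # (4096, 1024, 256)
```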
EF_RX_TIMESTAMPING
Name: rx_timestamping default: 0 min: 0 max: 3 per-stack
Control of hardware timestamping of received packets. Possible values: 0 - do not do timestamping (default); 1 - request timestamping but continue if hardware is not capable or it does not succeed; 2 - request timestamping and fail if hardware is capable and it does not succeed; 3 - request timestamping and fail if hardware is not capable or it does not succeed.
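The four modes differ only in which failures are tolerated. The decision table above, restated as a function (hypothetical name; "fail" stands for whatever failure Onload reports, which this reference does not detail):

```python
def rx_timestamping_outcome(mode: int, hw_capable: bool, succeeded: bool) -> str:
    """'off', 'on', 'continue' (without timestamps) or 'fail'."""
    if mode == 0:
        return "off"
    if mode not in (1, 2, 3):
        raise ValueError("EF_RX_TIMESTAMPING must be 0-3")
    if hw_capable and succeeded:
        return "on"
    if mode == 1:
        return "continue"
    if mode == 2:
        # Fails only when capable hardware does not succeed.
        return "fail" if hw_capable else "continue"
    return "fail"  # mode 3: fails whenever timestamping is not obtained


for mode in range(4):
    print(mode, rx_timestamping_outcome(mode, hw_capable=False, succeeded=False))
```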
EF_SA_ONSTACK_INTERCEPT
Name: sa_onstack_intercept default: 0 min: 0 max: 1 per-process
Intercept signals when a signal handler is installed with the SA_ONSTACK flag. 0 - Don't intercept (default). If you call socket-related functions such as send, or file-related functions such as close or dup, from your signal handler, then your application may deadlock. 1 - Intercept. There is no guarantee that the SA_ONSTACK flag will really work, but the OpenOnload library will do its best.
EF_SCALABLE_FILTERS
Name: scalable_filter_ifindex default: 0 min: 0 max: SMAX per-stack
Specifies the interface on which to enable support for scalable filters, and configures the scalable filter mode(s) to use. Scalable filters allow Onload to use a single hardware MAC-address filter to avoid hardware limitations and overheads. This removes restrictions on the number of simultaneous connections and increases performance of active connect calls, but kernel support on the selected interface is limited to ARP/DHCP/ICMP protocols, and some Onload features that rely on unaccelerated traffic (such as receiving fragmented UDP datagrams) will not work. Please see the Onload user guide for full details. Depending on the mode selected, this option will enable support for: scalable listening sockets; the IP_TRANSPARENT socket option. The interface specified must be an SFN7000 or later NIC. The format of the EF_SCALABLE_FILTERS variable is as follows:
EF_SCALABLE_FILTERS=<interface-name>[=mode[:mode]]
where mode is one of: transparent_active, passive, rss. The following modes and their combinations can be specified: transparent_active, passive, rss:transparent_active, transparent_active:passive.
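The variable's format can be checked before launching an application. The parser below is a sketch with hypothetical names. It assumes mode combinations are order-insensitive (the reference lists one ordering of each). When the mode is omitted it returns an empty set, since this reference does not say which mode Onload then uses.

```python
ALLOWED_MODES = {
    frozenset({"transparent_active"}),
    frozenset({"passive"}),
    frozenset({"rss", "transparent_active"}),
    frozenset({"transparent_active", "passive"}),
}


def parse_scalable_filters(value: str) -> tuple:
    """Split "<interface-name>[=mode[:mode]]" into (interface, set of modes)."""
    iface, sep, modes = value.partition("=")
    if not iface:
        raise ValueError("missing interface name")
    if not sep:
        return iface, frozenset()          # mode omitted: left to Onload's default
    mode_set = frozenset(modes.split(":"))
    if mode_set not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode combination: {modes!r}")
    return iface, mode_set


print(parse_scalable_filters("eth2=rss:transparent_active"))
```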
EF_SCALABLE_FILTERS_ENABLE
Name: scalable_filter_enable default: 0 min: 0 max: 1 per-stack
Turn the scalable filter feature on or off on a stack. If this is set to 1, then the configuration selected in EF_SCALABLE_FILTERS will be used. If this is set to 0, then scalable filters will not be used for this stack. If unset, this will default to 1 if EF_SCALABLE_FILTERS is configured.
EF_SCALABLE_FILTERS_MODE
Name: scalable_filter_mode default: 4294967295 min: -1 max: 6 per-stack
Stores the scalable filter mode set with EF_SCALABLE_FILTERS. To be set indirectly with the EF_SCALABLE_FILTERS variable.
EF_SELECT_FAST
Name: ul_select_fast default: 1 min: 0 max: 1 per-process
Allow a select() call to return without inspecting the state of all selected file descriptors when at least one selected event is satisfied. This allows the accelerated select() call to avoid a system call when accelerated sockets are 'ready', and can increase performance substantially. This option changes the semantics of select(), and as such could cause applications to misbehave. It effectively gives priority to accelerated sockets over non-accelerated sockets and other file descriptors. In practice a vast majority of applications work fine with this option.
EF_SELECT_FAST_USEC
Name: ul_select_fast_usec default: 32 per-process
When spinning in a select() call, causes accelerated sockets to be polled for N usecs before unaccelerated sockets are polled. This reduces latency for accelerated sockets, possibly at the expense of latency on unaccelerated sockets. Since accelerated sockets are typically the parts of the application which are most performance-sensitive, this is typically a good tradeoff.
EF_SELECT_NONBLOCK_FAST_USEC
Name: ul_select_nonblock_fast_usec default: 200 per-process
When invoking select() with timeout==0 (non-blocking), this option causes non-accelerated sockets to be polled only every N usecs. This reduces latency for accelerated sockets, possibly at the expense of latency on unaccelerated sockets. Since accelerated sockets are typically the parts of the application which are most performance-sensitive, this is often a good tradeoff. Set this option to zero to disable, or to a higher value to further improve latency for accelerated sockets. This option changes the behaviour of select() calls, so could potentially cause an application to misbehave.
EF_SELECT_SPIN
Name: ul_select_spin default: 0 min: 0 max: 1 per-process
Spin in blocking select() calls until the select set is satisfied or the spin timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_SEND_POLL_MAX_EVS
Name: send_poll_max_events default: 96 min: 1 max: 65535 per-stack
When polling for network events after sending, this places a limit on the number of events handled.
EF_SEND_POLL_THRESH
Name: send_poll_thresh default: 64 min: 0 max: 65535 per-stack
Poll for network events after sending this many packets. Setting this to a larger value may improve transmit throughput for small messages by allowing batching. However, such batching may cause sends to be delayed, leading to increased jitter.
EF_SHARE_WITH
Name: share_with default: 0 min: -1 max: SMAX per-stack
Set this option to allow a stack to be accessed by processes owned by another user. Set it to the UID of a user that should be permitted to share this stack, or set it to -1 to allow any user to share the stack. By default stacks are not accessible by users other than root. Processes invoked by root can access any stack. Setuid processes can only access stacks created by the effective user, not the real user. This restriction can be relaxed by setting the onload kernel module option allow_insecure_setuid_sharing=1. WARNING: A user that is permitted to access a stack is able to: snoop on any data transmitted or received via the stack; inject or modify data transmitted or received via the stack; damage the stack and any sockets or connections in it; cause misbehaviour and crashes in any application using the stack.
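The sharing rules above can be summarised as an access check. This is a simplified sketch with hypothetical names. It ignores the allow_insecure_setuid_sharing module option and assumes the user who created a stack can always access it. Note that the default of 0 is root's UID, which is consistent with stacks being inaccessible to users other than root by default.

```python
ROOT_UID = 0


def may_access_stack(requester_euid: int, creator_euid: int, share_with: int = 0) -> bool:
    """Simplified EF_SHARE_WITH access rule, using effective UIDs as setuid processes do."""
    if requester_euid == ROOT_UID:
        return True                      # root can access any stack
    if requester_euid == creator_euid:
        return True                      # assumption: the creating user always can
    if share_with == -1:
        return True                      # -1: any user may share the stack
    return requester_euid == share_with  # otherwise only the named UID


print(may_access_stack(1001, 1000), may_access_stack(1001, 1000, share_with=1001))  # False True
```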
EF_SIGNALS_NOPOSTPONE
Name: signals_no_postpone default: 67109952 min: 0 max: (ci_uint64)(-1) per-process
Comma-separated list of signal numbers whose handlers should not be postponed. Your application will deadlock if one of these handlers uses a socket function. By default, the list includes SIGBUS, SIGSEGV and SIGPROF. Please specify numbers, not string aliases: EF_SIGNALS_NOPOSTPONE=7,11,27 instead of EF_SIGNALS_NOPOSTPONE=SIGBUS,SIGSEGV,SIGPROF. You can set EF_SIGNALS_NOPOSTPONE to an empty value to postpone all signal handlers in the same way, if you suspect that the handlers for these signals call network functions.
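The numeric default 67109952 appears to be a bitmask: it equals 2^6 + 2^10 + 2^26, i.e. bit (n - 1) set for signals 7, 11 and 27 (SIGBUS, SIGSEGV, SIGPROF on x86 Linux). That encoding is inferred from the arithmetic, not stated in this reference. The sketch below (hypothetical names) converts between the list form and the mask, and builds the numeric list the variable expects from Python's signal constants.

```python
import signal


def nopostpone_mask(signums) -> int:
    """Bitmask with bit (n - 1) set for each signal number n (inferred encoding)."""
    mask = 0
    for n in signums:
        n = int(n)
        if not 1 <= n <= 64:
            raise ValueError(f"invalid signal number: {n}")
        mask |= 1 << (n - 1)
    return mask


def mask_to_signums(mask: int) -> list:
    return [bit + 1 for bit in range(64) if (mask >> bit) & 1]


def nopostpone_env(signums) -> str:
    """EF_SIGNALS_NOPOSTPONE wants numbers, not names such as SIGBUS."""
    return ",".join(str(int(n)) for n in signums)


print(mask_to_signums(67109952))                                        # [7, 11, 27]
print(nopostpone_env([signal.SIGBUS, signal.SIGSEGV, signal.SIGPROF]))  # 7,11,27 on x86 Linux
```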
EF_SOCKET_CACHE_MAX
Name: sock_cache_max default: 0 per-stack
Sets the maximum number of TCP sockets to cache for this stack. When set > 0, OpenOnload will cache resources associated with sockets in order to improve connection set-up and tear-down performance. This improves performance for applications that make new TCP connections at a high rate.
EF_SOCKET_CACHE_PORTS
Name: sock_cache_ports default: 0 per-process
This option specifies a comma-separated list of port numbers. When set (and socket caching is enabled), only sockets bound to the specified ports will be eligible to be cached.
EF_SOCK_LOCK_BUZZ
Name: sock_lock_buzz default: 0 min: 0 max: 1 per-process
Spin while waiting to obtain a per-socket lock. If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_BUZZ_USEC. The per-socket lock is taken in recv() calls and similar. This option can reduce jitter when multiple threads invoke recv() on the same socket, but can reduce fairness between threads competing for the lock.
EF_SO_BUSY_POLL_SPIN
Name: so_busy_poll_spin default: 0 min: 0 max: 1 per-process
Spin in poll, select and epoll in a Linux-like way: enable spinning only if a spinning socket is present in the poll/select/epoll set. See the Linux documentation on the SO_BUSY_POLL socket option for details. You should also enable spinning via the EF_POLL_SPIN, EF_SELECT_SPIN or EF_EPOLL_SPIN variable if you would like to spin in poll, select or epoll respectively. The spin duration is set via EF_SPIN_USEC, which is equivalent to the Linux sysctl net.busy_poll value. EF_POLL_USEC is an all-in-one variable that sets all four of the variables mentioned here. Linux never spins in epoll, but Onload does. This variable does not affect epoll behaviour if EF_UL_EPOLL=2.
EF_SPIN_USEC
Name: ul_spin_usec default: 0 per-process
Sets the timeout in microseconds for spinning options. Set this to -1 to spin forever. The spin timeout may also be set by the EF_POLL_USEC option. Spinning typically reduces latency and jitter substantially, and can also improve throughput. However, in some applications spinning can harm performance, particularly applications that have many threads. When spinning is enabled you should normally dedicate a CPU core to each thread that spins. You can use the EF_*_SPIN options to selectively enable or disable spinning for each API and transport. You can also use the onload_thread_set_spin() extension API to control spinning on a per-thread and per-API basis.
EF_STACK_LOCK_BUZZ
Name: stack_lock_buzz default: 0 min: 0 max: 1 per-process
Spin while waiting to obtain a per-stack lock. If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_BUZZ_USEC. This option reduces jitter caused by lock contention, but can reduce fairness between threads competing for the lock.
EF_STACK_PER_THREAD
Name: stack_per_thread default: 0 min: 0 max: 1 per-process
Create a separate Onload stack for the sockets created by each thread.
EF_SYNC_CPLANE_AT_CREATE
Name: sync_cplane default: 2 min: 0 max: 2 per-stack
When this option is set to 2, Onload will force a sync of control plane information from the kernel when a stack is created. This can help to ensure up-to-date information is used where a stack is created immediately following interface configuration. If this option is set to 1, then Onload will only force a sync for the first stack created. This can be used if stack creation time for later stacks is time critical. Setting this option to 0 will disable forced sync. Synchronising data from the kernel will continue to happen periodically.
EF_TCP
Name: ul_tcp default: 1 min: 0 max: 1 per-process
Clear to disable acceleration of new TCP sockets.
EF_TCP_ACCEPT_SPIN
Name: tcp_accept_spin default: 0 min: 0 max: 1 per-process
Spin in blocking TCP accept() calls until an incoming connection is established, the spin timeout expires or the socket timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_ADV_WIN_SCALE_MAX
Name: tcp_adv_win_scale_max default: 14 min: 0 max: 14 per-stack
Maximum value for TCP window scaling that will be advertised.
EF_TCP_BACKLOG_MAX
Name: tcp_backlog_max default: 256 per-stack
Places an upper limit on the number of embryonic (half-open) connections for one listening socket; see also EF_TCP_SYNRECV_MAX. This value is overridden by /proc/sys/net/ipv4/tcp_max_syn_backlog.
EF_TCP_CLIENT_LOOPBACK
Name: tcp_client_loopback default: 0 min: 0 max: CITP_TCP_LOOPBACK_TO_NEWSTACK per-stack
Enable acceleration of TCP loopback connections on the connecting (client) side: 0 - not accelerated (default); 1 - accelerate if the listening socket is in the same stack (you should also set EF_TCP_SERVER_LOOPBACK!=0); 2 - accelerate and move the accepted socket to the stack of the connecting socket (the server should allow this via EF_TCP_SERVER_LOOPBACK=2); 3 - accelerate and move the connecting socket to the stack of the listening socket (the server should allow this via EF_TCP_SERVER_LOOPBACK!=0); 4 - accelerate and move both connecting and accepted sockets to the new stack (the server should allow this via EF_TCP_SERVER_LOOPBACK=2). NOTES: Options 3 and 4 break some applications using epoll, fork and dup calls. Options 2 and 4 can cause accept() to misbehave if the client exits too early. Option 4 is not recommended on 32-bit systems because it can create a lot of additional Onload stacks, consuming a lot of low memory.
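Each client mode above comes with a requirement on the server side. The sketch below encodes those requirements as a compatibility check. It assumes EF_TCP_SERVER_LOOPBACK takes the values 0 to 2 (only 0, non-zero and 2 are referred to here), and the function name is hypothetical.

```python
def server_setting_ok(client_mode: int, server_mode: int) -> bool:
    """Does this EF_TCP_SERVER_LOOPBACK value permit the given client mode?"""
    if client_mode == 0:
        return True                 # client side not accelerated: no requirement
    if client_mode in (1, 3):
        return server_mode != 0     # server loopback must be enabled
    if client_mode in (2, 4):
        return server_mode == 2     # server must allow moving the accepted socket
    raise ValueError("EF_TCP_CLIENT_LOOPBACK must be 0-4")


for client in range(5):
    ok = [s for s in range(3) if server_setting_ok(client, s)]
    print(f"client={client}: compatible server values {ok}")
```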
EF_TCP_CONNECT_HANDOVER
Name: tcp_connect_handover default: 0 min: 0 max: 1 per-stack
When an accelerated TCP socket calls connect(), hand it over to the kernel stack. This option disables acceleration of active-open TCP connections.
EF_TCP_CONNECT_SPIN
Name: tcp_connect_spin default: 0 min: 0 max: 1 per-process
Spin in blocking TCP connect() calls until the connection is established, the spin timeout expires or the socket timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_FASTSTART_IDLE
Name: tcp_faststart_idle default: 65536 min: 0 per-stack
The FASTSTART feature prevents Onload from delaying ACKs during times when doing so may reduce performance. FASTSTART is enabled when a connection is new, following loss and after the connection has been idle for a while. This option sets the number of bytes that must be ACKed by the receiver before the connection exits FASTSTART. Set to zero to prevent a connection entering FASTSTART after an idle period.
EF_TCP_FASTSTART_INIT
Name: tcp_faststart_init default: 65536 min: 0 per-stack
The FASTSTART feature prevents Onload from delaying ACKs during times when doing so may reduce performance. FASTSTART is enabled when a connection is new, following loss and after the connection has been idle for a while. This option sets the number of bytes that must be ACKed by the receiver before the connection exits FASTSTART. Set to zero to disable FASTSTART on new connections.
EF_TCP_FASTSTART_LOSS
Name: tcp_faststart_loss default: 65536 min: 0 per-stack
The FASTSTART feature prevents Onload from delaying ACKs during times when doing so may reduce performance. FASTSTART is enabled when a connection is new, following loss and after the connection has been idle for a while. This option sets the number of bytes that must be ACKed by the receiver before the connection exits FASTSTART following loss. Set to zero to disable FASTSTART after loss.
EF_TCP_FIN_TIMEOUT
Name: fin_timeout default: 60 per-stack
Time in seconds to wait for an orphaned connection to be closed properly by the
network partner (e.g. FIN in the TCP FIN_WAIT2 state; zero window opening to send our
FIN, etc).
EF_TCP_FORCE_REUSEPORT
Name: tcp_reuseports default: 0 per-process
This option specifies a comma-separated list of port numbers. TCP sockets that bind to
those port numbers will have SO_REUSEPORT automatically applied to them.
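To illustrate the list format, the sketch below checks whether a port appears in a value such as EF_TCP_FORCE_REUSEPORT=8000,8001. The function port_in_list is hypothetical and is not Onload's own parser, whose handling of whitespace and malformed input may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: returns true if `port` appears in a comma-separated
 * list such as "8000,8001". strtoul() skips leading whitespace, so
 * "8000, 8001" is accepted too; parsing stops at the first non-numeric item.
 * This illustrates the list format only; it is not Onload's parser. */
bool port_in_list(const char *list, unsigned port)
{
    const char *p = list;

    while (p != NULL && *p != '\0') {
        char *end;
        unsigned long value = strtoul(p, &end, 10);

        if (end == p)
            return false;                   /* item is not a number */
        if (value == port)
            return true;
        p = (*end == ',') ? end + 1 : NULL; /* advance to the next item */
    }
    return false;
}
```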
EF_TCP_INITIAL_CWND
Name: initial_cwnd default: 0 min: 0 max: SMAX per-stack
Sets the initial size of the congestion window (in bytes) for TCP connections. Some care
is needed as, for example, setting it smaller than the segment size may result in Onload
being unable to send traffic. WARNING: Modifying this option may violate the TCP
protocol.
EF_TCP_LISTEN_HANDOVER
Name: tcp_listen_handover default: 0 min: 0 max: 1 per-stack
When an accelerated TCP socket calls listen(), hand it over to the kernel stack. This
option disables acceleration of TCP listening sockets and passively opened TCP
connections.

EF_TCP_LOSS_MIN_CWND
Name: loss_min_cwnd default: 0 min: 0 max: SMAX per-stack
Sets the minimum size of the congestion window for TCP connections following
loss. WARNING: Modifying this option may violate the TCP protocol.
EF_TCP_RCVBUF
Name: tcp_rcvbuf_user default: 0 per-stack
Override SO_RCVBUF for TCP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
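The doubling mirrors standard Linux behaviour, which can be observed on an ordinary kernel socket: the value read back with getsockopt() is twice the value set. This minimal sketch assumes a Linux host where net.core.rmem_max is at least the requested size. The helper effective_rcvbuf is hypothetical and not part of Onload.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request `requested` bytes via SO_RCVBUF on a fresh
 * TCP socket and return the size reported back, or -1 on error.
 * On Linux the reported value is twice the request (the kernel reserves
 * the extra space for bookkeeping overhead), provided the request does not
 * exceed net.core.rmem_max. */
int effective_rcvbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int actual = -1;
    socklen_t len = sizeof(actual);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0)
        actual = -1;

    close(fd);
    return actual;
}
```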
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT
Name: tcp_rcvbuf_est_def default: 131072 per-stack
Overrides the OS default SO_RCVBUF value for TCP sockets in the ESTABLISHED state if
the OS default SO_RCVBUF value falls outside the bounds set with this option. This value is
used when the TCP connection transitions to the ESTABLISHED state, to avoid confusing
some applications such as netperf. The lower bound is set to this value and the upper bound
is set to 4 * this value. If the OS default SO_RCVBUF value is less than the lower bound,
then the lower bound is used. If the OS default SO_RCVBUF value is more than the
upper bound, then the upper bound is used. This variable overrides the OS default
SO_RCVBUF value only; it does not change SO_RCVBUF if the application explicitly sets it
(see the EF_TCP_RCVBUF variable, which overrides the application-supplied value).
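The bounds rule above amounts to a clamp. The sketch below restates the description in code, assuming it is complete; it is not taken from Onload source, and established_buf_size and its parameters are hypothetical names. The same rule is described for EF_TCP_SNDBUF_ESTABLISHED_DEFAULT.

```c
#include <stdbool.h>

/* Hypothetical helper: buffer size a socket would use on entering
 * ESTABLISHED, given the OS default and the EF_TCP_*BUF_ESTABLISHED_DEFAULT
 * value `est_def`.
 *  - an application-supplied value (setsockopt) is left untouched;
 *  - otherwise the OS default is clamped into [est_def, 4 * est_def]. */
long established_buf_size(long os_default, long est_def,
                          bool app_set, long app_value)
{
    if (app_set)
        return app_value;      /* only EF_TCP_RCVBUF / EF_TCP_SNDBUF override this */

    long lower = est_def;
    long upper = 4 * est_def;

    if (os_default < lower)
        return lower;
    if (os_default > upper)
        return upper;
    return os_default;
}
```

With the default of 131072, an OS default of 87380 would be raised to 131072, and an OS default of 6291456 would be capped at 524288.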
EF_TCP_RCVBUF_STRICT
Name: tcp_rcvbuf_strict default: 0 min: 0 max: 1 per-stack
This option protects against the TCP small-segment attack. With this option set, Onload limits the
number of packets inside the TCP receive queue and the TCP reorder buffer. In some cases, this
option causes a performance penalty. You probably want this option if your application is
connecting to an untrusted partner or over an untrusted network. Off by default.
EF_TCP_RECV_SPIN
Name: tcp_recv_spin default: 0 min: 0 max: 1 per-process
Spin in blocking TCP receive calls until data arrives, the spin timeout expires or the
socket timeout expires (whichever is the sooner). If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_RST_DELAYED_CONN
Name: rst_delayed_conn default: 0 min: 0 max: 1 per-stack
This option tells Onload to reset TCP connections rather than allow data to be
transmitted late. Specifically, TCP connections are reset if the retransmit timeout fires.
(This usually happens when data is lost, and normally triggers a retransmit which results
in data being delivered hundreds of milliseconds late.) WARNING: This option is likely to
cause connections to be reset spuriously if ACK packets are dropped in the network.
EF_TCP_RX_CHECKS
Name: tcp_rx_checks default: 0 min: 0 max: 1 per-stack
Internal/debugging use only: perform extra debugging/consistency checks on received
packets.
EF_TCP_RX_LOG_FLAGS
Name: tcp_rx_log_flags default: 0 per-stack
Log received packets that have any of these flags set in the TCP header. Only active
when EF_TCP_RX_CHECKS is set.
EF_TCP_SEND_NONBLOCK_NO_PACKETS_MODE
Name: tcp_nonblock_no_pkts_mode default: 0 min: 0 max: 1 per-stack
This option controls how a non-blocking TCP send() call should behave if it is unable to
allocate sufficient packet buffers. By default Onload will mimic Linux kernel stack
behaviour and block for packet buffers to be available. If set to 1, this option will cause
Onload to return error ENOBUFS. Note this option can cause some applications (that
assume that a socket that is writeable is able to send without error) to malfunction.
EF_TCP_SEND_SPIN
Name: tcp_send_spin default: 0 min: 0 max: 1 per-process
Spin in blocking TCP send calls until the window is updated by the peer, the spin timeout expires
or the socket timeout expires (whichever is the sooner). If the spin timeout expires,
enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_SERVER_LOOPBACK
Name: tcp_server_loopback default: 0 min: 0 max:
CITP_TCP_LOOPBACK_ALLOW_ALIEN_IN_ACCEPTQ per-stack
Enable acceleration of TCP loopback connections on the listening (server) side: 0 - not
accelerated (default); 1 - accelerate if the connecting socket is in the same stack (you
should also set EF_TCP_CLIENT_LOOPBACK!=0); 2 - accelerate and allow the accepted
socket to be in another stack (this is necessary for clients with
EF_TCP_CLIENT_LOOPBACK=2,4).
EF_TCP_SNDBUF
Name: tcp_sndbuf_user default: 0 per-stack
Override SO_SNDBUF for TCP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_TCP_SNDBUF_ESTABLISHED_DEFAULT
Name: tcp_sndbuf_est_def default: 131072 per-stack
Overrides the OS default SO_SNDBUF value for TCP sockets in the ESTABLISHED state if
the OS default SO_SNDBUF value falls outside the bounds set with this option. This value is
used when the TCP connection transitions to the ESTABLISHED state, to avoid confusing
some applications such as netperf. The lower bound is set to this value and the upper bound
is set to 4 * this value. If the OS default SO_SNDBUF value is less than the lower bound,
then the lower bound is used. If the OS default SO_SNDBUF value is more than the
upper bound, then the upper bound is used. This variable overrides the OS default
SO_SNDBUF value only; it does not change SO_SNDBUF if the application explicitly sets
it (see the EF_TCP_SNDBUF variable, which overrides the application-supplied value).
EF_TCP_SNDBUF_MODE
Name: tcp_sndbuf_mode default: 1 min: 0 max: 2 per-stack
This option controls how the SO_SNDBUF limit is applied to TCP sockets. In the default
mode (1) the limit applies to the size of the send queue and retransmit queue combined.
When this option is set to 0 the limit applies to the send queue only. When this
option is set to 2, the SNDBUF size is automatically adjusted for each TCP socket to
match the window advertised by the peer (limited by
EF_TCP_SOCKBUF_MAX_FRACTION). If the application sets SO_SNDBUF explicitly then
automatic adjustment is not used for that socket. The limit is applied to the size of
the send queue and retransmit queue combined. You may also want to
set EF_TCP_RCVBUF_MODE to give automatic adjustment of RCVBUF.
EF_TCP_SOCKBUF_MAX_FRACTION
Name: tcp_sockbuf_max_fraction default: 1 min: 1 max: 10 per-stack
This option controls the maximum fraction of the TX buffers that may be allocated to a
single socket with EF_TCP_SNDBUF_MODE=2. It also controls the maximum fraction of
the RX buffers that may be allocated to a single socket with
EF_TCP_RCVBUF_MODE=1. The maximum allocation for a socket is
EF_MAX_TX_PACKETS/(2^N) for TX and EF_MAX_RX_PACKETS/(2^N) for RX, where N is
specified here.
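For example, with N=1 (the default) a single socket may use at most half of the pool, and with N=3 at most one eighth. A minimal sketch of the formula follows. The pool sizes used in the checks are arbitrary illustrative values rather than Onload defaults, and sockbuf_max_packets is a hypothetical name.

```c
/* Hypothetical helper: per-socket cap described for
 * EF_TCP_SOCKBUF_MAX_FRACTION, i.e. max_packets / 2^n, where max_packets
 * stands for EF_MAX_TX_PACKETS (TX) or EF_MAX_RX_PACKETS (RX).
 * Returns -1 if n is outside the documented range 1..10. */
long sockbuf_max_packets(long max_packets, int n)
{
    if (n < 1 || n > 10)
        return -1;
    return max_packets / (1L << n);
}
```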
EF_TCP_SYNCOOKIES
Name: tcp_syncookies default: 0 min: 0 max: 1 per-stack
Use TCP SYN cookies to protect against SYN flood attacks.
EF_TCP_SYNRECV_MAX
Name: tcp_synrecv_max default: 1024 max:
CI_CFG_NETIF_MAX_ENDPOINTS_MAX per-stack
Places an upper limit on the number of embryonic (half-open) connections in an Onload
stack; see also EF_TCP_BACKLOG_MAX. By default, EF_TCP_SYNRECV_MAX = 4 *
EF_TCP_BACKLOG_MAX.
EF_TCP_SYN_OPTS
Name: syn_opts default: 7 per-stack
A bitmask specifying the TCP options to advertise in SYN segments. Bit 0 (0x1) is set to 1
to enable PAWS and RTTM timestamps (RFC 1323), bit 1 (0x2) is set to 1 to enable
window scaling (RFC 1323), bit 2 (0x4) is set to 1 to enable SACK (RFC 2018), and bit 3 (0x8) is
set to 1 to enable ECN (RFC 3168).
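The default of 7 is 0x1 | 0x2 | 0x4, so timestamps, window scaling and SACK are advertised but ECN is not. The sketch below decodes a mask into the option names listed above. The macro names and describe_syn_opts are hypothetical and are not taken from Onload headers.

```c
#include <stddef.h>
#include <string.h>

/* Bit assignments as listed for EF_TCP_SYN_OPTS (macro names hypothetical). */
#define SYN_OPT_TIMESTAMPS 0x1u  /* PAWS and RTTM timestamps, RFC 1323 */
#define SYN_OPT_WSCALE     0x2u  /* window scaling, RFC 1323 */
#define SYN_OPT_SACK       0x4u  /* SACK, RFC 2018 */
#define SYN_OPT_ECN        0x8u  /* ECN, RFC 3168 */

/* Writes a comma-separated list of the options enabled in `mask` into buf,
 * truncating safely if buf is too small. */
void describe_syn_opts(unsigned mask, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } opts[] = {
        { SYN_OPT_TIMESTAMPS, "timestamps" },
        { SYN_OPT_WSCALE,     "wscale" },
        { SYN_OPT_SACK,       "sack" },
        { SYN_OPT_ECN,        "ecn" },
    };

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
        if (mask & opts[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, ",", len - strlen(buf) - 1);
            strncat(buf, opts[i].name, len - strlen(buf) - 1);
        }
    }
}
```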

EF_TCP_TCONST_MSL
Name: msl_seconds default: 25 per-stack
The Maximum Segment Lifetime in seconds (as defined by the TCP RFC). A smaller value causes
connections to spend less time in the TIME_WAIT state.
EF_TIMESTAMPING_REPORTING
Name: timestamping_reporting default: 0 min: 0 max: 1 per-stack
Controls timestamp reporting. Possible values: 0: report translated timestamps only
when the NIC clock has been set; 1: report translated timestamps only when the system
clock and the NIC clock are in sync (e.g. using ptpd). If the above conditions are not met,
Onload will only report raw (not translated) timestamps.
EF_TXQ_LIMIT
Name: txq_limit default: 268435455 min: 16 * 1024 max: 0xfffffff
per-stack
Maximum number of bytes to enqueue on the transmit descriptor ring.
EF_TXQ_RESTART
Name: txq_restart default: 268435455 min: 1 max: 0xfffffff per-stack
Level (in bytes) to which the transmit descriptor ring must fall before it will be filled
again.
EF_TXQ_SIZE
Name: txq_size default: 512 min: 512 max: 4096 per-stack
Set the size of the transmit descriptor ring. Valid values: 512, 1024, 2048 or 4096.
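The four valid sizes are exactly the powers of two between 512 and 4096, which gives a compact validity check. This is a sketch for checking a configuration value before exporting it; txq_size_valid is a hypothetical helper, not an Onload function.

```c
#include <stdbool.h>

/* Hypothetical helper: true for 512, 1024, 2048 or 4096, the only values
 * accepted by EF_TXQ_SIZE. (size & (size - 1)) == 0 tests for a power of two. */
bool txq_size_valid(unsigned size)
{
    return size >= 512 && size <= 4096 && (size & (size - 1)) == 0;
}
```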
EF_TX_MIN_IPG_CNTL
Name: tx_min_ipg_cntl default: 0 min: -1 max: 20 per-stack
Rate pacing value.

EF_TX_PUSH
Name: tx_push default: 1 min: 0 max: 1 per-stack
Enable low-latency transmit.
EF_TX_PUSH_THRESHOLD
Name: tx_push_thresh default: 100 min: 1 per-stack
Sets a threshold for the number of outstanding sends before we stop using TX
descriptor push. This has no effect if EF_TX_PUSH=0. This threshold is ignored, and
assumed to be 1, on pre-SFN7000-series hardware. It makes sense to set this value
similar to EF_SEND_POLL_THRESH.
EF_TX_QOS_CLASS
Name: tx_qos_class default: 0 min: 0 max: 1 per-stack
Set the QOS class for transmitted packets on this Onload stack. Two QOS classes are
supported: 0 and 1. By default both Onload accelerated traffic and kernel traffic are in
class 0. You can minimise latency by placing latency sensitive traffic into a separate QOS
class from bulk traffic.
EF_TX_TIMESTAMPING
Name: tx_timestamping default: 0 min: 0 max: 3 per-stack
Control of hardware timestamping of transmitted packets. Possible values: 0 - do not
do timestamping (default); 1 - request timestamping but continue if hardware is not
capable or it does not succeed; 2 - request timestamping and fail if hardware is capable
and it does not succeed; 3 - request timestamping and fail if hardware is not capable or
it does not succeed.
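The four modes reduce to a small decision table. The sketch below encodes it as described, returning whether the request should fail. The function name is hypothetical, and treating an out-of-range mode as a failure is an assumption of this sketch, not documented behaviour.

```c
#include <stdbool.h>

/* Hypothetical helper encoding the EF_TX_TIMESTAMPING modes described above.
 * Returns true if timestamping should be reported as a failure. */
bool tx_timestamping_fails(int mode, bool hw_capable, bool ts_succeeded)
{
    switch (mode) {
    case 0:  /* timestamping not requested */
    case 1:  /* requested, but always continue */
        return false;
    case 2:  /* fail only if capable hardware does not succeed */
        return hw_capable && !ts_succeeded;
    case 3:  /* fail if hardware is not capable or does not succeed */
        return !hw_capable || !ts_succeeded;
    default: /* assumption: out-of-range modes (valid are 0-3) are errors */
        return true;
    }
}
```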
EF_UDP
Name: ul_udp default: 1 min: 0 max: 1 per-process
Clear to disable acceleration of new UDP sockets.

EF_UDP_CONNECT_HANDOVER
Name: udp_connect_handover default: 1 min: 0 max: 1 per-stack
When a UDP socket is connected to an IP address that cannot be accelerated by
OpenOnload, hand the socket over to the kernel stack. When this option is disabled the
socket remains under the control of OpenOnload. This may be worthwhile because the
socket may subsequently be re-connected to an IP address that can be accelerated.
EF_UDP_FORCE_REUSEPORT
Name: udp_reuseports default: 0 per-process
This option specifies a comma-separated list of port numbers. UDP sockets that bind to
those port numbers will have SO_REUSEPORT automatically applied to them.
EF_UDP_PORT_HANDOVER2_MAX
Name: udp_port_handover2_max default: 1 per-stack
When set (together with EF_UDP_PORT_HANDOVER2_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER2_MIN
Name: udp_port_handover2_min default: 2 per-stack
When set (together with EF_UDP_PORT_HANDOVER2_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER3_MAX
Name: udp_port_handover3_max default: 1 per-stack
When set (together with EF_UDP_PORT_HANDOVER3_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.

EF_UDP_PORT_HANDOVER3_MIN
Name: udp_port_handover3_min default: 2 per-stack
When set (together with EF_UDP_PORT_HANDOVER3_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER_MAX
Name: udp_port_handover_max default: 1 per-stack
When set (together with EF_UDP_PORT_HANDOVER_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER_MIN
Name: udp_port_handover_min default: 2 per-stack
When set (together with EF_UDP_PORT_HANDOVER_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
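Each MIN/MAX pair (EF_UDP_PORT_HANDOVER, EF_UDP_PORT_HANDOVER2 and EF_UDP_PORT_HANDOVER3) defines one inclusive range. The sketch below models the check for a socket explicitly bound to a port. With the defaults (MIN=2, MAX=1) each range is empty, which appears to be how the pairs are disabled by default; that reading is an inference from the default values, not a documented statement. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* One MIN/MAX pair, e.g. EF_UDP_PORT_HANDOVER2_MIN / _MAX (hypothetical type). */
struct port_range { unsigned min, max; };

/* True if a UDP socket explicitly bound to `port` falls inside any of the
 * inclusive ranges and would therefore be handed over to the kernel stack.
 * A range with min > max (such as the defaults 2..1) matches nothing. */
bool udp_port_handed_over(unsigned port, const struct port_range *ranges, int n)
{
    for (int i = 0; i < n; i++)
        if (port >= ranges[i].min && port <= ranges[i].max)
            return true;
    return false;
}
```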
EF_UDP_RCVBUF
Name: udp_rcvbuf_user default: 0 per-stack
Override SO_RCVBUF for UDP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_UDP_RECV_SPIN
Name: udp_recv_spin default: 0 min: 0 max: 1 per-process
Spin in blocking UDP receive calls until data arrives, the spin timeout expires or the
socket timeout expires (whichever is the sooner). If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.

EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE
Name: udp_nonblock_no_pkts_mode default: 0 min: 0 max: 1 per-stack
This option controls how a non-blocking UDP send() call should behave if it is unable to
allocate sufficient packet buffers. By default Onload will mimic Linux kernel stack
behaviour and block for packet buffers to be available. If set to 1, this option will cause
Onload to return error ENOBUFS. Note this option can cause some applications (that
assume that a socket that is writeable is able to send without error) to malfunction.
EF_UDP_SEND_SPIN
Name: udp_send_spin default: 0 min: 0 max: 1 per-process
Spin in blocking UDP send calls until space becomes available in the socket buffer, the
spin timeout expires or the socket timeout expires (whichever is the sooner). If the spin
timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or
EF_POLL_USEC. Note: UDP sends usually complete very quickly, but can block if the
application does a large burst of sends at a high rate. This option reduces jitter when
such blocking is needed.
EF_UDP_SEND_UNLOCKED
Name: udp_send_unlocked default: 1 min: 0 max: 1 per-stack
Enables the 'unlocked' UDP send path. When enabled this option improves concurrency
when multiple threads are performing UDP sends.
EF_UDP_SEND_UNLOCK_THRESH
Name: udp_send_unlock_thresh default: 1500 per-stack
UDP message size below which we attempt to take the stack lock early. Taking the lock
early reduces overhead and latency slightly, but may increase lock contention in
multi-threaded applications.
EF_UDP_SNDBUF
Name: udp_sndbuf_user default: 0 per-stack
Override SO_SNDBUF for UDP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)

EF_UL_EPOLL
Name: ul_epoll default: 1 min: 0 max: 3 per-process
Choose the epoll implementation. The choices are: 0 - kernel (unaccelerated);
1 - user-level (accelerated, lowest latency); 2 - kernel-accelerated (best when there are
lots of sockets in the set and mode 3 is not suitable); 3 - user-level (accelerated, lowest
latency, scalable, supports socket caching). The default is the user-level
implementation (1). Mode 3 can offer benefits over mode 1, particularly with larger
sets. However, this mode has some restrictions. It does not support epoll sets that exist
across fork(). It does not support monitoring the readiness of the set's epoll fd via
another epoll/poll/select.
EF_UL_POLL
Name: ul_poll default: 1 min: 0 max: 1 per-process
Clear to disable acceleration of poll() calls at user-level.
EF_UL_SELECT
Name: ul_select default: 1 min: 0 max: 1 per-process
Clear to disable acceleration of select() calls at user-level.
EF_UNCONFINE_SYN
Name: unconfine_syn default: 1 min: 0 max: 1 per-stack
Accept TCP connections that cross into or out of a private network.
EF_UNIX_LOG
Name: log_level default: 3 per-process
A bitmask determining which kinds of diagnostic messages will be logged:
0x1 errors
0x2 unexpected
0x4 setup
0x8 verbose
0x10 select()
0x20 poll()
0x100 socket set-up
0x200 socket control
0x400 socket caching
0x1000 signal interception
0x2000 library enter/exit
0x4000 log call arguments
0x8000 context lookup
0x10000 pass-through
0x20000 very verbose
0x40000 verbose returned error
0x80000 very verbose errors: show 'ok' too
0x20000000 verbose transport control
0x40000000 very verbose transport control
0x80000000 verbose pass-through

EF_URG_RFC
Name: urg_rfc default: 0 min: 0 max: 1 per-stack
Choose between compliance with RFC 1122 (1) or BSD behaviour (0) regarding the
location of the urgent point in TCP packet headers.
EF_USE_DSACK
Name: use_dsack default: 1 min: 0 max: 1 per-stack
Whether or not to use DSACK (duplicate SACK).
EF_USE_HUGE_PAGES
Name: huge_pages default: 1 min: 0 max: 2 per-stack
Control of whether huge pages are used for packet buffers: 0 - no; 1 - use huge pages if
available (default); 2 - always use huge pages and fail if huge pages are not
available. Mode 1 prints a syslog message if there are not enough huge pages in the
system. Mode 2 guarantees only initially-allocated packets to be in huge pages. It is
recommended to use this mode together with EF_MIN_FREE_PACKETS, to control the
number of such guaranteed huge pages. All non-initial packets are allocated in huge
pages when possible; a syslog message is printed if the system is out of huge pages.
Non-initial packets may be allocated in non-huge pages without any warning in syslog for
both mode 1 and 2 even if the system has free huge pages.
EF_VALIDATE_ENV
Name: validate_env default: 1 min: 0 max: 1 per-stack
When set, this option validates Onload-related environment variables (starting with EF_).
EF_VFORK_MODE
Name: vfork_mode default: 1 min: 0 max: 2 per-process
This option dictates how the vfork() intercept should work. After a vfork(), parent and
child still share address space but not file descriptors. We have to be careful about
making changes in the child that can be seen in the parent. We offer three options here.
Different apps may require different options depending on their use of vfork(). If using
EF_VFORK_MODE=2, it is not safe to create sockets or pipes in the child before calling
exec(). 0 - old behavior: replace vfork() with fork(); 1 - replace vfork() with fork()
and block the parent until the child exits or execs; 2 - replace vfork() with vfork().

B Meta Options
B.1 Environment variables
There are several environment variables which act as meta-options and set several
of the options detailed in Appendix A. These are:
EF_POLL_USEC
Setting EF_POLL_USEC causes the following options to be set:
• EF_SPIN_USEC=EF_POLL_USEC
• EF_SELECT_SPIN=1
• EF_EPOLL_SPIN=1
• EF_POLL_SPIN=1
• EF_PKT_WAIT_SPIN=1
• EF_TCP_SEND_SPIN=1
• EF_UDP_RECV_SPIN=1
• EF_UDP_SEND_SPIN=1
• EF_TCP_RECV_SPIN=1
• EF_BUZZ_USEC=EF_POLL_USEC
• EF_SOCK_LOCK_BUZZ=1
• EF_STACK_LOCK_BUZZ=1
NOTE: If neither of the spinning options, EF_POLL_USEC or EF_SPIN_USEC, is set,
Onload will resort to the default interrupt-driven behavior because the EF_INT_DRIVEN
environment variable is enabled by default.
EF_BUZZ_USEC
Setting EF_BUZZ_USEC sets the following options:
• EF_SOCK_LOCK_BUZZ=1
• EF_STACK_LOCK_BUZZ=1
NOTE: If EF_POLL_USEC is set to value N, then EF_BUZZ_USEC is also set to N only if
N <= 100. If N > 100 then EF_BUZZ_USEC will be set to 100. This is deliberate, as
spinning for too long on internal locks may adversely affect performance. However,
the user can explicitly set the EF_BUZZ_USEC value, e.g.

export EF_POLL_USEC=10000
export EF_BUZZ_USEC=1000
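The capping rule in the note above can be written as a small function. This is a sketch of the documented behaviour, not Onload source, and implied_buzz_usec is a hypothetical name. With both exports shown, the explicit EF_BUZZ_USEC=1000 applies; with only EF_POLL_USEC=10000, EF_BUZZ_USEC would be 100.

```c
#include <stdbool.h>

/* Hypothetical helper: EF_BUZZ_USEC in effect for EF_POLL_USEC=poll_usec.
 * An explicitly set EF_BUZZ_USEC (buzz_set == true) is used as-is;
 * otherwise the poll value is inherited but capped at 100 microseconds. */
unsigned implied_buzz_usec(unsigned poll_usec, bool buzz_set, unsigned buzz_usec)
{
    if (buzz_set)
        return buzz_usec;
    return poll_usec <= 100 ? poll_usec : 100;
}
```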

C Build Dependencies
C.1 General
Before Onload network and kernel drivers can be built and installed, the target
platform must support the following capabilities:
• Support a general C build environment - i.e. has gcc, make, libc and libc-devel.
• From version 201502 the following are required: perl, autoconf, automake
and libtool.
• Can compile kernel modules - i.e. has the correct kernel-devel package for the
installed kernel version.
• If 32 bit applications are to be accelerated on 64 bit architectures the machine
must be able to build 32 bit applications.
NOTE: Onload builds have been tested against libtool versions 1.5.26 to 2.4.2. Users
experiencing build issues with other libtool versions should contact
support@solarflare.com.
Building Kernel Modules
The kernel must be built with CONFIG_NETFILTER enabled. Standard distributions
will already have this enabled, but it must also be enabled when building a custom
kernel. This option does not affect performance.
The following commands can be used to install kernel development headers.
• Debian based Distributions - including Ubuntu (any kernel):
apt-get install linux-headers-$(uname -r)
• For RedHat/Fedora (not for 32 bit Kernel):
- If the system supports a 32 bit Kernel and the kernel is PAE, then:
yum -y install kernel-PAE-devel
- otherwise:
yum -y install kernel-devel
• For SuSE:
yast -i kernel-source

onload
• binutils
• gettext
• gawk
• gcc
• sed
• make
• bash
• glibc-common
• automake
• libtool
• autoconf
onload_tcpdump
• libpcap
• libpcap-devel (see footnote 1)
solar_clusterd
• python-devel (see footnote 1)
Building 32 bit applications on 64 bit architecture platforms
The following commands can be used to install 32 bit libc development headers.
• Debian based Distributions - including Ubuntu:
apt-get install gcc-multilib libc6-dev-i386
• For RedHat/Fedora:
yum -y install glibc-devel.i586
• For SuSE:
yast -i glibc-devel-32bit
yast -i gcc-32bit
1. If these additional packages are not installed, the dependent component will not be
built, but the Onload build will succeed.

D Onload Extensions API
The Onload Extensions API allows the user to customize an application using
advanced features to improve performance.
The Extensions API does not create any runtime dependency on Onload and an
application using the API can run without Onload. The license for the API and
associated libraries is a BSD 2-Clause License.
This section covers the following topics:
• Common Components on page 189
• Stacks API on page 193
• Zero-Copy API on page 201
• Templated Sends on page 212
• Delegated Sends API on page 213
D.1 Source Code
The Onload source code is provided with the Onload distribution. Entry points for
the source code are:
• src/lib/transport/unix/onload_ext_intercept.c
• src/lib/transport/unix/zc_intercept.c
D.2 Common Components
For all applications employing the Extensions API the following components are
provided:
• #include <onload/extensions.h>
An application should include the header file containing function prototypes
and constant values required when using the API.
• libonload_ext.a, libonload_ext.so
This library provides stub implementations of the extended API. An application
that wishes to use the extensions API should link against this library.
When Onload is not present, the application will continue to function, but calls
to the extensions API will have no effect (unless documented otherwise).
To link to this library include the '-l' linker option on the compiler command line,
i.e.

-lonload_ext
onload_is_present
Description
If the application is linked with libonload_ext, but not running with Onload, this will
return 0. If the application is running with Onload this will return 1.
Definition
int onload_is_present(void)
Formal Parameters
None
Return Value
1 from the libonload.so library, or 0 from the libonload_ext.a library
onload_fd_stat
struct onload_stat
{
  int32_t stack_id;
  char* stack_name;
  int32_t endpoint_id;
  int32_t endpoint_state;
};
extern int onload_fd_stat(int fd, struct onload_stat* stat);
Description
Retrieves internal details about an accelerated socket.
Definition
See above
Formal Parameters
See above
Return Value
0 socket is not accelerated
1 socket is accelerated
-ENOMEM when memory cannot be allocated

Notes
When calling free() on stack_name use the (char *) because memory is allocated
using malloc.
This function will call malloc() and so should never be called from any other
function requiring a malloc lock.
onload_fd_check_feature
int onload_fd_check_feature(int fd, enum onload_fd_feature feature);
enum onload_fd_feature {
  /* Check whether this fd supports ONLOAD_MSG_WARM or not */
  ONLOAD_FD_FEAT_MSG_WARM
};
Description
Used to check whether the Onload file descriptor supports a feature or not.
Definition
See above
Formal Parameters
See above
Return Value
0 if the feature is supported but not on this fd
>0 if the feature is supported both by Onload and by this fd
<0 if the feature is not supported:
-ENOSYS if onload_fd_check_feature() is not supported.
-EOPNOTSUPP if the feature is not supported by Onload.
Notes
Onload-201509 and later versions support the
ONLOAD_FD_FEAT_UDP_TX_TS_HDR option. onload_fd_check_feature will return
1 to indicate that a recvmsg used to retrieve TX timestamps for UDP packets will
return the entire Ethernet header. When run on older versions of Onload this will
return -EOPNOTSUPP.
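Because the return value packs several outcomes into one int, callers typically branch on it. The sketch below classifies a return code as described above. The enum and function names are hypothetical, and it assumes the negative values are the standard ENOSYS and EOPNOTSUPP codes from <errno.h>. In an application the argument would be the result of a call such as onload_fd_check_feature(fd, ONLOAD_FD_FEAT_MSG_WARM).

```c
#include <errno.h>

/* Hypothetical classification of an onload_fd_check_feature() result. */
enum feature_status {
    FEATURE_ON_THIS_FD,        /* > 0: supported by Onload and by this fd */
    FEATURE_NOT_ON_THIS_FD,    /* 0: supported by Onload, but not on this fd */
    FEATURE_API_MISSING,       /* -ENOSYS: onload_fd_check_feature() unsupported */
    FEATURE_UNKNOWN_TO_ONLOAD, /* -EOPNOTSUPP: this Onload lacks the feature */
    FEATURE_OTHER_ERROR        /* any other negative value */
};

enum feature_status classify_feature_rc(int rc)
{
    if (rc > 0)
        return FEATURE_ON_THIS_FD;
    if (rc == 0)
        return FEATURE_NOT_ON_THIS_FD;
    if (rc == -ENOSYS)
        return FEATURE_API_MISSING;
    if (rc == -EOPNOTSUPP)
        return FEATURE_UNKNOWN_TO_ONLOAD;
    return FEATURE_OTHER_ERROR;
}
```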
onload_thread_set_spin
Description
For each thread, specify which operations should spin.

Definition
int onload_thread_set_spin(
  enum onload_spin_type type,
  unsigned spin)
Formal Parameters
type
Which operation to change the spin status of. The type must be one of the
following:
enum onload_spin_type {
  ONLOAD_SPIN_ALL,
  ONLOAD_SPIN_UDP_RECV,
  ONLOAD_SPIN_UDP_SEND,
  ONLOAD_SPIN_TCP_RECV,
  ONLOAD_SPIN_TCP_SEND,
  ONLOAD_SPIN_TCP_ACCEPT,
  ONLOAD_SPIN_PIPE_RECV,
  ONLOAD_SPIN_PIPE_SEND,
  ONLOAD_SPIN_SELECT,
  ONLOAD_SPIN_POLL,
  ONLOAD_SPIN_PKT_WAIT,
  ONLOAD_SPIN_EPOLL_WAIT
};
spin
A boolean which indicates whether the operation should spin or not.
Return Value
0 on success
-EINVAL if an unsupported type is specified.
Notes
Spin time (for all threads) is set using the EF_SPIN_USEC parameter.
Examples
The onload_thread_set_spin API can be used to control spinning on a per-thread
or per-API basis. The existing spin-related configuration options set the default
behavior for threads, and the onload_thread_set_spin API overrides the default.
Disable all sorts of spinning:
onload_thread_set_spin(ONLOAD_SPIN_ALL, 0);
Enable all sorts of spinning:
onload_thread_set_spin(ONLOAD_SPIN_ALL, 1);

Enable spinning only for certain threads:
1 Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2 In each thread that should spin, invoke onload_thread_set_spin().
Disable spinning only in certain threads:
1 Enable spinning by setting EF_POLL_USEC=<timeout>.
2 In each thread that should not spin, invoke onload_thread_set_spin().
NOTE: If a thread is set to NOT spin and then blocks, this may invoke an interrupt
for the whole stack. Interrupts occurring on moderately busy threads may
cause unintended and undesirable consequences.
Enable spinning for UDP traffic, but not TCP traffic:
1 Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2 In each thread that should spin (UDP only), do:
onload_thread_set_spin(ONLOAD_SPIN_UDP_RECV, 1);
onload_thread_set_spin(ONLOAD_SPIN_UDP_SEND, 1);
Enable spinning for TCP traffic, but not UDP traffic:
1 Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2 In each thread that should spin (TCP only), do:
onload_thread_set_spin(ONLOAD_SPIN_TCP_RECV, 1);
onload_thread_set_spin(ONLOAD_SPIN_TCP_SEND, 1);
onload_thread_set_spin(ONLOAD_SPIN_TCP_ACCEPT, 1);
D.3 Stacks API
Using the Onload Extensions API an application can bind selected sockets to specific
Onload stacks and in this way ensure that time-critical sockets are not starved of
resources by other non-critical sockets. The API allows an application to select
sockets which are to be accelerated, thus reserving Onload resources for
performance critical paths. This also prevents non-critical paths from creating jitter
for critical paths.
onload_set_stackname
Description
Select the Onload stack that new sockets are placed in.

Definition
int onload_set_stackname(
  int who,
  int scope,
  const char* name)
Formal Parameters
who
Must be one of the following:
- ONLOAD_THIS_THREAD - to modify the stack name in which all
subsequent sockets are created by this thread.
- ONLOAD_ALL_THREADS - to modify the stack name in which all
subsequent sockets are created by all threads in the current process.
ONLOAD_THIS_THREAD takes precedence over ONLOAD_ALL_THREADS.
scope
Must be one of the following:
- ONLOAD_SCOPE_THREAD - name is scoped with current thread
- ONLOAD_SCOPE_PROCESS - name is scoped with current process
- ONLOAD_SCOPE_USER - name is scoped with current user
- ONLOAD_SCOPE_GLOBAL - name is global across all threads, users and
processes.
- ONLOAD_SCOPE_NOCHANGE - undo the effect of a previous call to
onload_set_stackname(ONLOAD_THIS_THREAD, …), see Notes on
page 195.
name
One of the following:
- the stack name, up to 8 characters.
- an empty string to set no stack name
- the special value ONLOAD_DONT_ACCELERATE to prevent sockets created
in this thread, user or process from being accelerated.
Sockets identified by the options above will belong to the Onload stack until a
subsequent call using onload_set_stackname identifies a different stack or the
ONLOAD_SCOPE_NOCHANGE option is used.
Return Value
0 on success
-1 with errno set to ENAMETOOLONG if the name exceeds the permitted length
-1 with errno set to EINVAL if other parameters are invalid.

Notes
Note 1
This applies to stacks selected for sockets created by socket() and for pipe(); it
has no effect on accept(). Passively opened sockets created via accept() will
always be in the same stack as the listening socket that they are linked to. This means
that the following are functionally identical:
onload_set_stackname(foo)
socket
listen
onload_set_stackname(bar)
accept
and
onload_set_stackname(foo)
socket
listen
accept
onload_set_stackname(bar)
In both cases the listening socket and the accepted socket will be in stack foo.
Note 2
Scope defines the namespace to which a stack belongs. A stack name of foo in scope
user is not the same as a stack name of foo in scope thread. Scope restricts the
visibility of a stack to the current thread, current process or current user, or leaves it
unrestricted (global). This has the property that with, for example, process-based
scoping, two processes can have the same stack name without sharing a stack, as
the stack for each process has a different namespace.
Note 3
Scoping can be thought of as adding a suffix to the supplied name, e.g.
ONLOAD_SCOPE_THREAD: <stackname>-t<thread_id>
ONLOAD_SCOPE_PROCESS: <stackname>-p<process_id>
ONLOAD_SCOPE_USER: <stackname>-u<user_id>
ONLOAD_SCOPE_GLOBAL: <stackname>
This is an example only and the implementation is free to do something different,
such as maintaining different lists for different scopes.
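To make Note 3 concrete, the sketch below builds the illustrative suffixed names listed above. As the note says, this is only a mental model: the enum and function are hypothetical, are not the ONLOAD_SCOPE_* constants from onload/extensions.h, and Onload is free to track scoped names differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical scope kinds mirroring the four cases in Note 3. */
enum scope_kind { SCOPE_THREAD, SCOPE_PROCESS, SCOPE_USER, SCOPE_GLOBAL };

/* Writes the illustrative scoped name into buf; `id` is the thread, process
 * or user ID (ignored for SCOPE_GLOBAL). Returns snprintf()'s result, i.e.
 * the length of the full name, or -1 for an unknown scope. */
int scoped_stack_name(char *buf, size_t len, const char *name,
                      enum scope_kind scope, long id)
{
    switch (scope) {
    case SCOPE_THREAD:  return snprintf(buf, len, "%s-t%ld", name, id);
    case SCOPE_PROCESS: return snprintf(buf, len, "%s-p%ld", name, id);
    case SCOPE_USER:    return snprintf(buf, len, "%s-u%ld", name, id);
    case SCOPE_GLOBAL:  return snprintf(buf, len, "%s", name);
    }
    return -1;
}
```

Under this model, two processes that both request stack foo with process scope end up with distinct names (for example foo-p1234 and foo-p5678), which is why they do not share a stack.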
Note 4
ONLOAD_SCOPE_NOCHANGE will undo the effect of a previous call to
onload_set_stackname(ONLOAD_THIS_THREAD, …).
If you have previously used onload_set_stackname(ONLOAD_THIS_THREAD, …) and
want to revert to the behavior of threads that are using the ONLOAD_ALL_THREADS
configuration, without changing that configuration, you can do the following:

onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_NOCHANGE, "");
Related environment variables
Related environment variables are:
EF_DONT_ACCELERATE
Default: 0
Minimum: 0
Maximum: 1
Scope: Per-process
If this environment variable is set then acceleration for ALL sockets is disabled and
handed off to the kernel stack until the application overrides this state with a call to
onload_set_stackname().
EF_STACK_PER_THREAD
Default: 0
Minimum: 0
Maximum: 1
Scope: Per-process
If this environment variable is set, each socket created by the application will be
placed in a stack depending on the thread in which it is created. Stacks could, for
example, be named using the thread ID of the thread that creates the stack, but this
should not be relied upon.
A call to onload_set_stackname overrides this variable. EF_DONT_ACCELERATE
takes precedence over this variable.
EF_NAME
Default: none
Minimum: 0 chars
Maximum: 8 chars
Scope: per-stack
The environment variable EF_NAME will be honored to control Onload stack sharing.
However, a call to onload_set_stackname overrides this variable, and
EF_DONT_ACCELERATE and EF_STACK_PER_THREAD both take precedence over
EF_NAME.
onload_move_fd
Description
Move the file descriptor to the current stack. The target stack can be specified with
onload_set_stackname().

Definition
int onload_move_fd(int fd)
Formal Parameters
fd - the file descriptor to be moved to the current stack.
Return Value
0 on success
non-zero otherwise.
Notes
Useful for moving fds obtained by accept(), to move a new connection out of the
listening socket's stack.
Currently limited to TCP closed sockets and TCP accepted sockets. A socket to be
moved must have an empty send queue and an empty retransmit queue. A socket
which has had a send() operation cannot be moved.
Should not be used simultaneously with other I/O operations, such as poll(),
select() or recv(), on the file descriptor.
This function is not async-signal-safe and should never be called from a signal
handler.
onload_stackname_save
Description
Save the state of the current Onload stack identified by the previous call to
onload_set_stackname().
Definition
int onload_stackname_save(void)
Formal Parameters
none
Return Value
0 on success
-ENOMEM when memory cannot be allocated.

onload_stackname_restore
Description
Restore stack state saved with a previous call to onload_stackname_save(). All updates/changes to the state of the current stack will be deleted and all state previously saved will be restored. To avoid unexpected results, the stack should be restored in the same thread as was used to call onload_stackname_save().
Definition
int onload_stackname_restore(void)
Formal Parameters
none
Return Value
0 on success
non-zero if an error occurs.
Notes
The API stack name save and restore functions provide flexibility when binding sockets to an Onload stack.
Using a combination of onload_set_stackname(), onload_stackname_save() and onload_stackname_restore(), the user is able to create default stack settings which apply to one or more sockets, save this state and then create changed stack settings which are applied to other sockets. The original default settings can then be restored to apply to subsequent sockets.
D.4 Stacks API Usage
Using a combination of the EF_DONT_ACCELERATE environment variable and the function onload_set_stackname(), the user is able to control/select the sockets which are to be accelerated and isolate these performance-critical sockets and threads from the rest of the system.
onload_stack_opt_set_int
Description
Set/modify per-stack options that all subsequently created stacks will use instead of the existing global stack options.
Definition
int onload_stack_opt_set_int(
    const char* name,
    int64_t value)
Formal Parameters
name
Stack option to modify
value
New value for the stack option.
Example
onload_stack_opt_set_int("EF_DONT_ACCELERATE", 1);
Return Value
0 on success
-1 with errno set to EINVAL if the requested option is not found.
Notes
Cannot be used to modify options on existing stacks - only for new stacks.
Cannot be used to modify process options - only stack options.
Modified options will be used for all newly created stacks until onload_stack_opt_reset() is called.
onload_stack_opt_reset
Description
Revert to using global stack options for newly created stacks.
Definition
int onload_stack_opt_reset(void)
Formal Parameters
None.
Return Value
0 always
Notes
Should be called following a call to onload_stack_opt_set_int() to revert to using global stack options for all newly created stacks.
D.5 Stacks API - Examples
• This thread will use stack foo; other threads in the process will continue as before.
onload_set_stackname(ONLOAD_THIS_THREAD, ONLOAD_SCOPE_GLOBAL, "foo")
• All threads in this process will get their own stack called foo. This is equivalent to the EF_STACK_PER_THREAD environment variable.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_THREAD, "foo")
• All threads in this process will share a stack called foo. If another process made the same function call it would get its own stack.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_PROCESS, "foo")
• All threads in this process will share a stack called foo. If another process run by the same user did the same, it would share the same stack as the first process. If another process run by a different user did the same, it would get its own stack.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_USER, "foo")
• Equivalent to EF_NAME. All threads will use a stack called foo which is shared by any other process which does the same.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_GLOBAL, "foo")
• Equivalent to EF_DONT_ACCELERATE. New sockets/pipes will not be accelerated until another call to onload_set_stackname().
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_GLOBAL, ONLOAD_DONT_ACCELERATE)
onload_ordered_epoll_wait
For details of the Wire Order Delivery feature refer to Wire Order Delivery on page 61.
Description
If the epoll set contains accelerated sockets in only one stack, this function can be used instead of epoll_wait() to return events in the order these were received from the wire. There is no explicit check on sockets, so applications must ensure that the rules are applied to avoid mis-ordering of packets.
Definition
int onload_ordered_epoll_wait(
    int epfd,
    struct epoll_event *events,
    struct onload_ordered_epoll_event *oo_events,
    int maxevents,
    int timeout);
Formal Parameters
See the definition of epoll_wait().
Return Value
0 on success
non-zero otherwise.
Notes
Any file descriptors returned as ready without a valid timestamp (i.e. tv_sec = 0) should be considered un-ordered with respect to the rest of the set. This can occur for data received via the kernel, or for data returned without a hardware timestamp, i.e. from an interface that does not support hardware timestamping.
The environment variable EF_UL_EPOLL=1 must be set, and hardware timestamps are required. This feature is only available on the SFN7000 series adapters.
struct onload_ordered_epoll_event {
  /* The hardware timestamp of the first readable data */
  struct timespec ts;
  /* Number of bytes that may be read to maintain wire order */
  int bytes;
};
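The ordering rule in these notes can also be checked in application code. The sketch below is illustrative only: it assumes the application copies each returned timestamp and byte count into its own array, and `struct ordered_result`, `ts_earlier()` and `check_wire_order()` are hypothetical names, not part of the Onload API. It counts results that must be treated as un-ordered (tv_sec = 0) and reports whether the remaining timestamps are in non-decreasing order.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical application-side copy of one onload_ordered_epoll_event. */
struct ordered_result {
    struct timespec ts;   /* hardware timestamp of first readable data */
    int bytes;            /* bytes that may be read to keep wire order */
};

/* Returns non-zero if timestamp a is strictly earlier than timestamp b. */
static int ts_earlier(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}

/* Counts results with no valid timestamp (tv_sec == 0), which must be
 * treated as un-ordered, and sets *in_order to 1 if the timestamped
 * results never go backwards in time, 0 otherwise. */
static size_t check_wire_order(const struct ordered_result *res, size_t n,
                               int *in_order)
{
    const struct timespec *prev = NULL;
    size_t unordered = 0;

    *in_order = 1;
    for (size_t i = 0; i < n; ++i) {
        if (res[i].ts.tv_sec == 0) {
            ++unordered;          /* e.g. kernel data, no HW timestamp */
            continue;
        }
        if (prev != NULL && ts_earlier(&res[i].ts, prev))
            *in_order = 0;
        prev = &res[i].ts;
    }
    return unordered;
}
```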
D.6 Zero-Copy API
Zero-Copy can improve the performance of networking applications by eliminating intermediate buffers when transferring data between application and network adapter.
The Onload Extensions Zero-Copy API supports zero-copy of UDP received packet data and TCP transmit packet data.
The API provides the following components:
• #include <onload/extensions_zc.h>
In addition to the common components, an application should include this header file, which contains all function prototypes and constant values required when using the API.
This file includes comprehensive documentation, required data structures and function definitions.
Zero-Copy Data Buffers
To avoid the copy, data is passed to and from the application in special buffers described by a struct onload_zc_iovec. A message or datagram can consist of multiple iovecs using a struct onload_zc_msg. A single call to send may involve multiple messages using an array of struct onload_zc_mmsg.
/* A zc_iovec describes a single buffer */
struct onload_zc_iovec {
  void* iov_base;        /* Address within buffer */
  size_t iov_len;        /* Length of data */
  onload_zc_handle buf;  /* (opaque) buffer handle */
  unsigned iov_flags;    /* Not currently used */
};
/* A msg describes an array of iovecs that make up a datagram */
struct onload_zc_msg {
  struct onload_zc_iovec* iov;  /* Array of buffers */
  struct msghdr msghdr;         /* Message metadata */
};
/* An mmsg describes a message, the socket, and its result */
struct onload_zc_mmsg {
  struct onload_zc_msg msg;  /* Message */
  int rc;                    /* Result of send operation */
  int fd;                    /* Socket to send on */
};
Figure 17: Zero-Copy Data Buffers
Zero-Copy UDP Receive Overview
Figure 18 illustrates the difference between the normal UDP receive mode and the zero-copy method.
When using the standard POSIX socket calls, the adapter delivers packets to an Onload packet buffer which is described by a descriptor previously placed in the RX descriptor ring. When the application calls recv(), Onload copies the data from the packet buffer to an application-supplied buffer.
Using the zero-copy UDP receive API, the application calls the onload_zc_recv() function, supplying a callback function which will be called when data is ready. The callback can directly access the data inside the Onload packet buffer, avoiding a copy.
Figure 18: Traditional vs. Zero-Copy UDP Receive
A single call to the onload_zc_recv() function can result in multiple datagrams being delivered to the callback function. Each time the callback returns to Onload, the next datagram is delivered. Processing stops when the callback instructs Onload to cease delivery or there are no further received datagrams.
If the receiving application does not need to look at all received data (i.e. it is filtering), this can result in a considerable performance advantage because this data is not pulled into the processor's cache, thereby reducing the application cache footprint.
As a general rule, the callback function should avoid calling other system calls which attempt to modify or close the current socket.
Zero-copy UDP Receive is implemented within the Onload Extensions API.
Zero-Copy UDP Receive
The onload_zc_recv() function specifies a callback to invoke for each received UDP datagram. The callback is invoked in the context of the call to onload_zc_recv() (i.e. it blocks/spins waiting for data).
Before calling, the application must set the following in the struct onload_zc_recv_args:
cb - set to the callback function pointer.
user_ptr - set to point to application state; this is not touched by Onload.
msg.msghdr.msg_control, msg_controllen, msg_name, msg_namelen - the user application should set these to appropriate buffers and lengths (if required) as you would for recvmsg() (or NULL and 0 if not used).
flags - set to indicate behavior (e.g. ONLOAD_MSG_DONTWAIT).
typedef enum onload_zc_callback_rc
(*onload_zc_recv_callback)(struct onload_zc_recv_args *args, int flags);
struct onload_zc_recv_args
{
  struct onload_zc_msg msg;
  onload_zc_recv_callback cb;
  void* user_ptr;
  int flags;
};
int onload_zc_recv(int fd, struct onload_zc_recv_args *args);
Figure 19: Zero-Copy recv_args
The callback gets to examine the data, and can control what happens next: (i) whether or not the buffer(s) are kept by the callback or are immediately freed by Onload; and (ii) whether or not onload_zc_recv() will internally loop and invoke the callback with the next datagram, or immediately return to the application. The next action is determined by setting flags in the return code as follows:
ONLOAD_ZC_KEEP - the callback function can elect to retain ownership of received buffer(s) by returning ONLOAD_ZC_KEEP. Following this, the correct way to release retained buffers is to call onload_zc_release_buffers() to explicitly release the first buffer from each received datagram. Subsequent buffers pertaining to the same datagram will then be automatically released.
ONLOAD_ZC_CONTINUE - to suggest that Onload should loop and process more datagrams.
ONLOAD_ZC_TERMINATE - to insist that Onload immediately return from onload_zc_recv().
Flags can also be set by Onload:
ONLOAD_ZC_END_OF_BURST - Onload sets this flag to indicate that this is the last packet.
ONLOAD_ZC_MSG_SHARED - Packet buffers are read only.
If there is unaccelerated data on the socket from the kernel's receive path, this cannot be handled without copying. The application has two choices as follows:
ONLOAD_MSG_RECV_OS_INLINE - set this flag when calling onload_zc_recv(). Onload will deal with the kernel data internally and pass it to the callback.
Check the return code - check the return code from onload_zc_recv(). If it returns -ENOTEMPTY, the application must call onload_recvmsg_kernel() to retrieve the kernel data.
Zero-Copy Receive Example #1
#include <onload/extensions_zc.h>
#include <stdio.h>
#include <string.h>

/* Application-defined state passed to the callback via user_ptr */
struct zc_recv_state {
  long bytes;  /* bytes still to be received */
};

static enum onload_zc_callback_rc
zc_recv_callback(struct onload_zc_recv_args *args, int flags)
{
  size_t i;
  struct zc_recv_state *state = args->user_ptr;
  for (i = 0; i < args->msg.msghdr.msg_iovlen; ++i) {
    printf("zc callback iov %zu: %p, %zu\n", i,
           args->msg.iov[i].iov_base,
           args->msg.iov[i].iov_len);
    state->bytes -= (long) args->msg.iov[i].iov_len;
  }
  /* Stop once enough data has arrived, otherwise ask for more */
  if (state->bytes <= 0) return ONLOAD_ZC_TERMINATE;
  else return ONLOAD_ZC_CONTINUE;
}

/* --- caller --- */
struct onload_zc_recv_args args;
struct zc_recv_state state;
int rc;
state.bytes = bytes_to_wait_for;
/* Easy way to set msg_control* and msg_name* to zero */
memset(&args.msg, 0, sizeof(args.msg));
args.cb = &zc_recv_callback;
args.user_ptr = &state;
/* Let Onload pass any kernel data to the callback as well */
args.flags = ONLOAD_MSG_RECV_OS_INLINE;
rc = onload_zc_recv(fd, &args);
Figure 20: Zero-Copy Receive - example #1
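Zero-Copy Receive Example #2
#include <onload/extensions_zc.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

void handle_msg(void *data, size_t len);  /* application-defined */

/* Application-defined state shared with the callback */
struct user_info {
  int flags;     /* recv() flags passed by the caller */
  size_t size;   /* bytes requested by the caller */
  size_t zc_rc;  /* bytes received so far */
};

static enum onload_zc_callback_rc
zc_recv_callback(struct onload_zc_recv_args *args, int flags)
{
  struct user_info *zc_info = args->user_ptr;
  size_t i, zc_rc = 0;
  for (i = 0; i < args->msg.msghdr.msg_iovlen; ++i) {
    zc_rc += args->msg.iov[i].iov_len;
    handle_msg(args->msg.iov[i].iov_base,
               args->msg.iov[i].iov_len);
  }
  if (zc_rc == 0)
    return ONLOAD_ZC_TERMINATE;
  zc_info->zc_rc += zc_rc;
  /* Emulate MSG_WAITALL: keep going until the request is satisfied */
  if ((zc_info->flags & MSG_WAITALL) &&
      (zc_info->zc_rc < zc_info->size))
    return ONLOAD_ZC_CONTINUE;
  else
    return ONLOAD_ZC_TERMINATE;
}

ssize_t do_recv_zc(int fd, void *buf, size_t len, int flags)
{
  struct onload_zc_recv_args zc_args;
  struct user_info info;
  ssize_t rc;

  info.flags = flags;
  info.size = len;
  info.zc_rc = 0;
  memset(&zc_args, 0, sizeof(zc_args));
  zc_args.user_ptr = &info;
  zc_args.flags = 0;
  zc_args.cb = &zc_recv_callback;
  if (flags & MSG_DONTWAIT)
    zc_args.flags |= ONLOAD_MSG_DONTWAIT;
  rc = onload_zc_recv(fd, &zc_args);
  if (rc == -ENOTEMPTY) {
    /* Kernel data is pending: fetch it with a normal copy into buf */
    struct iovec iov = { buf, len };
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if ((rc = onload_recvmsg_kernel(fd, &msg, 0)) < 0)
      printf("onload_recvmsg_kernel failed\n");
  }
  else if (rc == 0) {
    /* zc_rc gets set by callback to bytes received, so we
     * can return that to appear like a standard recv call */
    rc = (ssize_t) info.zc_rc;
  }
  return rc;
}
Figure 21: Zero-Copy Receive - example #2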
NOTE: onload_zc_recv() only supports accelerated (Onloaded) sockets. For example, when bound to a broadcast address the socket fd is handed off to the kernel and this function will return ESOCKTNOSUPPORT.
Zero-Copy TCP Send Overview
Figure 22 illustrates the difference between the normal TCP transmit method and the zero-copy method.
When using standard POSIX socket calls, the application first creates the payload data in an application-allocated buffer before calling the send() function. Onload will copy the data to an Onload packet buffer in memory and post a descriptor to this buffer in the network adapter TX descriptor ring.
Using the zero-copy TCP transmit API, the application calls the onload_zc_alloc_buffers() function to request buffers from Onload. A pointer to a packet buffer is returned in response. The application places the data to send directly into this buffer and then calls onload_zc_send() to indicate to Onload that data is available to send.
Onload will post a descriptor for the packet buffer in the network adapter TX descriptor ring and ring the TX doorbell. The network adapter fetches the data for transmission.
Figure 22: Traditional vs. Zero-Copy TCP Transmit
NOTE: The socket used to allocate zero-copy buffers must be in the same stack as the socket used to send the buffers. When using TCP loopback, Onload can move a socket from one stack to another. Users must ensure that they ALWAYS USE BUFFERS FROM THE CORRECT STACK.
NOTE: The onload_zc_send function does not currently support the ONLOAD_MSG_MORE or TCP_CORK flags.
Zero-copy TCP transmit is implemented within the Onload Extensions API.
Zero-Copy TCP Send
The zero-copy send API supports the sending of multiple messages to different sockets in a single call. Data buffers must be allocated in advance and, for best efficiency, these should be allocated in blocks and off the critical path. The user should avoid simply moving the copy from Onload into the application, but where this is unavoidable, it should also be done off the critical path.
int onload_zc_send(struct onload_zc_mmsg* msgs, int mlen, int flags);
Figure 23: Zero-Copy send
int onload_zc_alloc_buffers(int fd,
                            struct onload_zc_iovec* iovecs,
                            int iovecs_len,
                            onload_zc_buffer_type_flags flags);
int onload_zc_release_buffers(int fd,
                              onload_zc_handle* bufs,
                              int bufs_len);
Figure 24: Zero-Copy allocate buffers
The onload_zc_send() function return value identifies how many of the onload_zc_mmsg array's rc fields are set. Each onload_zc_mmsg.rc returns how many bytes (or an error) were sent for that message. Refer to the table below.
rc = onload_zc_send()
rc < 0 - application error calling onload_zc_send(); rc is set to the error code.
rc == 0 - should not happen.
0 < rc <= n_msgs - rc is set to the number of messages whose status has been set in mmsg[i].rc.
rc == n_msgs - the normal case.
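The table can be expressed as a small classification helper. This is a sketch, not part of the Onload API: `classify_zc_send_rc()` and `enum zc_send_outcome` are hypothetical names, and `n_msgs` is assumed to be the `mlen` value passed to onload_zc_send(). Treating rc > n_msgs as unexpected is an added assumption, since the table does not list that case.

```c
/* Hypothetical classification of the onload_zc_send() return value. */
enum zc_send_outcome {
    ZC_SEND_CALL_ERROR,  /* rc < 0: error calling onload_zc_send() */
    ZC_SEND_UNEXPECTED,  /* rc == 0 (should not happen) or rc > n_msgs */
    ZC_SEND_PARTIAL,     /* 0 < rc < n_msgs: only mmsg[0..rc-1].rc are set */
    ZC_SEND_ALL          /* rc == n_msgs: the normal case */
};

static enum zc_send_outcome classify_zc_send_rc(int rc, int n_msgs)
{
    if (rc < 0)
        return ZC_SEND_CALL_ERROR;
    if (rc == 0 || rc > n_msgs)
        return ZC_SEND_UNEXPECTED;
    return rc == n_msgs ? ZC_SEND_ALL : ZC_SEND_PARTIAL;
}
```

Messages at index rc and above have no status set, so, as in the multiple-message example, their buffers are still owned by the application.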
Sent buffers are owned by Onload. Unsent buffers are owned by the application and must be freed or reused to avoid leaking.
Zero-Copy Send - Single Message, Single Buffer
struct onload_zc_iovec iovec;
struct onload_zc_mmsg mmsg;
int rc;

/* Allocate one buffer (real code should do this off the critical path) */
rc = onload_zc_alloc_buffers(fd, &iovec, 1,
                             ONLOAD_ZC_BUFFER_HDR_TCP);
assert(rc == 0);
assert(my_data_len <= iovec.iov_len);
memcpy(iovec.iov_base, my_data, my_data_len);
iovec.iov_len = my_data_len;

memset(&mmsg, 0, sizeof(mmsg));
mmsg.fd = fd;
mmsg.msg.iov = &iovec;
mmsg.msg.msghdr.msg_iovlen = 1;

rc = onload_zc_send(&mmsg, 1, 0);
if (rc <= 0) {
  /* Probably application bug */
  return rc;
} else {
  /* Only one message, so rc should be 1 */
  assert(rc == 1);
  /* rc == 1 so we can look at the first (only) mmsg.rc */
  if (mmsg.rc < 0)
    /* Error sending message: the buffer is still ours, release it */
    onload_zc_release_buffers(fd, &iovec.buf, 1);
  else
    /* Message sent, single msg, single iovec so
     * shouldn't worry about partial sends */
    assert(mmsg.rc == my_data_len);
}
Figure 25: Zero-Copy - Single Message, Single Buffer Example
The example above demonstrates error code handling. Note that it contains an example of bad practice, where buffers are allocated and populated on the critical path.
Zero-Copy Send - Multiple Message, Multiple Buffers
Each per-message result is interpreted as follows:
rc = mmsg[i].rc
rc < 0 - error sending this message; rc is set to the error code.
rc >= 0 - rc is set to the number of bytes that have been sent in this message. Compare to the message length to establish which buffers were sent.
#define N_BUFFERS 2
#define N_MSGS 2
struct onload_zc_iovec iovec[N_MSGS][N_BUFFERS];
struct onload_zc_mmsg mmsg[N_MSGS];
int i, j, rc;
size_t msg_len;

memset(mmsg, 0, sizeof(mmsg));
for (i = 0; i < N_MSGS; ++i) {
  rc = onload_zc_alloc_buffers(fd, iovec[i], N_BUFFERS,
                               ONLOAD_ZC_BUFFER_HDR_TCP);
  assert(rc == 0);
  /* TODO store data in iovec[i][j].iov_base,
   * set iovec[i][j].iov_len */
  mmsg[i].fd = fd; /* Could be different for each message */
  mmsg[i].msg.iov = iovec[i];
  mmsg[i].msg.msghdr.msg_iovlen = N_BUFFERS;
}
rc = onload_zc_send(mmsg, N_MSGS, 0);
if (rc <= 0) {
  /* Probably application bug */
  return rc;
} else {
  for (i = 0; i < N_MSGS; ++i) {
    if (i < rc) {
      /* mmsg[i].rc is set and we can use it */
      msg_len = 0;
      for (j = 0; j < N_BUFFERS; ++j)
        msg_len += iovec[i][j].iov_len;
      if (mmsg[i].rc < 0) {
        /* error sending this message - release buffers */
        for (j = 0; j < N_BUFFERS; ++j)
          onload_zc_release_buffers(fd, &iovec[i][j].buf, 1);
      } else if ((size_t) mmsg[i].rc < msg_len) {
        /* partial success */
        /* TODO use mmsg[i].rc to determine which buffers in
         * iovec[i] array are sent and which are still
         * owned by application */
      } else {
        /* Whole message sent, buffers now owned by Onload */
      }
    } else {
      /* mmsg[i].rc is not set, this message was not sent */
      for (j = 0; j < N_BUFFERS; ++j)
        onload_zc_release_buffers(fd, &iovec[i][j].buf, 1);
    }
  }
}
Figure 26: Zero-Copy - Multiple Messages, Multiple Buffers Example
The example above demonstrates error code handling and contains some examples of bad practice, where buffers are allocated and populated on the critical path.
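The partial-success branch in the multiple-message example is left as a TODO. One way to resolve it, sketched below under the assumption that the buffers of a message are sent in iovec array order, is to walk the buffer lengths and count how many leading buffers the byte count in mmsg[i].rc covers completely. `count_sent_buffers()` is a hypothetical helper that takes plain lengths so it can be shown without the Onload headers; who owns a buffer that was only partly sent is not specified here.

```c
#include <stddef.h>

/* Given the byte count returned in mmsg[i].rc and the lengths of the
 * message's buffers (in iovec array order), return how many leading
 * buffers were sent in full. *partial is set to 1 if the byte count
 * ends part-way through a buffer, 0 otherwise. */
static size_t count_sent_buffers(int sent_bytes, const size_t *iov_lens,
                                 size_t n_iov, int *partial)
{
    size_t remaining = sent_bytes > 0 ? (size_t)sent_bytes : 0;
    size_t i;

    *partial = 0;
    for (i = 0; i < n_iov; ++i) {
        if (remaining < iov_lens[i]) {
            *partial = remaining > 0;   /* stopped inside buffer i */
            break;
        }
        remaining -= iov_lens[i];       /* buffer i fully sent */
    }
    return i;
}
```

Buffers from the returned index onward (apart from any partly sent one) would still be owned by the application and could be reused or released with onload_zc_release_buffers().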
Zero-Copy Send - Full Example
#include <onload/extensions_zc.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define NUM_ZC_BUFFERS 8  /* example value */

/* Assumes iovec[] has been filled beforehand, e.g. by
 * onload_zc_alloc_buffers(fd, iovec, NUM_ZC_BUFFERS,
 *                         ONLOAD_ZC_BUFFER_HDR_TCP) */
static struct onload_zc_iovec iovec[NUM_ZC_BUFFERS];

static ssize_t do_send_zc(int fd, const void* buf, size_t len, int flags)
{
  size_t bytes_done;
  int rc, i, bufs_needed;
  struct onload_zc_mmsg mmsg;

  memset(&mmsg, 0, sizeof(mmsg));
  mmsg.fd = fd;
  mmsg.msg.iov = iovec;
  bytes_done = 0;
  mmsg.msg.msghdr.msg_iovlen = 0;
  /* Copy the payload into as many buffers as are needed (and available) */
  while (bytes_done < len &&
         mmsg.msg.msghdr.msg_iovlen < NUM_ZC_BUFFERS) {
    struct onload_zc_iovec* v = &iovec[mmsg.msg.msghdr.msg_iovlen];
    if (v->iov_len > (len - bytes_done))
      v->iov_len = (len - bytes_done);
    memcpy(v->iov_base, (const char*) buf + bytes_done, v->iov_len);
    bytes_done += v->iov_len;
    ++mmsg.msg.msghdr.msg_iovlen;
  }
  rc = onload_zc_send(&mmsg, 1, 0);
  if (rc != 1 /* Number of messages we sent */) {
    printf("onload_zc_send failed to process msg, %d\n", rc);
    return -1;
  } else {
    if (mmsg.rc < 0)
      printf("onload_zc_send message error %d\n", mmsg.rc);
    else {
      /* Iterate over the iovecs; any that were sent we must replenish.
       * bytes_done now counts the bytes covered by sent buffers. */
      i = 0; bufs_needed = 0; bytes_done = 0;
      while (i < (int) mmsg.msg.msghdr.msg_iovlen) {
        if (bytes_done == (size_t) mmsg.rc) {
          printf("onload_zc_send did not send iovec %d\n", i);
          /* In other buffer allocation schemes we would have to release
           * these buffers, but seems pointless as we guarantee at the
           * end of this function to have iovec array full, so do nothing.
           */
        } else {
          /* Buffer sent, now owned by Onload, so replenish iovec array */
          ++bufs_needed;
          bytes_done += iovec[i].iov_len;
        }
        ++i;
      }
      if (bufs_needed) /* replenish the iovec array */
        rc = onload_zc_alloc_buffers(fd, iovec, bufs_needed,
                                     ONLOAD_ZC_BUFFER_HDR_TCP);
    }
  }
  /* Set a return code that looks similar enough to send(). NB. we're
   * not setting (and neither does onload_zc_send()) errno */
  if (mmsg.rc < 0) return -1;
  else return (ssize_t) bytes_done;
}
Figure 27: Zero-Copy Send
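The fill loop in the full example divides the payload across the pre-allocated buffers. The same arithmetic can be sketched as a standalone function, assuming every buffer offers the same usable capacity (in practice each buffer's capacity is reported in iov_len by onload_zc_alloc_buffers()). `plan_zc_buffers()` is a hypothetical name used for illustration only.

```c
#include <stddef.h>

/* Fill lens[] with the number of payload bytes to place in each buffer
 * when len bytes are spread over buffers of buf_cap bytes each.
 * Returns the number of buffers used. Returns 0 if more than max_bufs
 * buffers would be needed or buf_cap is 0 (and also for len == 0). */
static size_t plan_zc_buffers(size_t len, size_t buf_cap,
                              size_t *lens, size_t max_bufs)
{
    size_t n = 0, done = 0;

    if (buf_cap == 0)
        return 0;
    while (done < len) {
        size_t chunk = len - done < buf_cap ? len - done : buf_cap;
        if (n == max_bufs)
            return 0;              /* payload does not fit in the array */
        lens[n++] = chunk;
        done += chunk;
    }
    return n;
}
```

Only the final buffer is short; every earlier buffer is filled to capacity, which matches how the example truncates iov_len on the last buffer it uses.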
D.7 Templated Sends
For a description of the templated sends feature, refer to Templated Sends on page 108. For a description of the packet template to be used by the templated sends feature, refer to the use notes and references to onload_msg_template in the [onload]/src/include/onload/extensions_zc.h file included in the Onload distribution.
MSG Template
struct oo_msg_template {
  /* To verify subsequent templated calls are used with the same socket */
  oo_sp omt_sock_id;
};
MSG Update
/* An update_iovec describes a single template update */
struct onload_template_msg_update_iovec {
  void*    otmu_base;    /* Pointer to new data */
  size_t   otmu_len;     /* Length of new data */
  off_t    otmu_offset;  /* Offset within template to update */
  unsigned otmu_flags;   /* For future use. Must be set to 0. */
};
MSG Allocation
/* Valid options for flags are: ONLOAD_TEMPLATE_FLAGS_PIO_RETRY */
extern int onload_msg_template_alloc(int fd, struct iovec* initial_msg,
                                     int mlen, onload_template_handle* handle,
                                     unsigned flags);
MSG Template Update
/* Valid options for flags are: ONLOAD_TEMPLATE_FLAGS_SEND_NOW,
 * ONLOAD_TEMPLATE_FLAGS_DONTWAIT
 */
extern int
onload_msg_template_update(int fd, onload_template_handle handle,
                           struct onload_template_msg_update_iovec* updates,
                           int ulen, unsigned flags);
MSG Template Abort
extern int onload_msg_template_abort(int fd, onload_template_handle handle);
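Each update passed to onload_msg_template_update() names an offset and a length within the template. Before issuing updates, an application may find it useful to check that every update lies inside the template it allocated. The sketch below assumes the template length is the total of the iov_len values passed as initial_msg to onload_msg_template_alloc(); `struct tmpl_update` and `updates_fit()` are hypothetical, mirroring only the otmu_len and otmu_offset fields.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical mirror of the length/offset fields of
 * struct onload_template_msg_update_iovec. */
struct tmpl_update {
    size_t len;     /* otmu_len */
    off_t offset;   /* otmu_offset */
};

/* Returns 1 if every update lies entirely within a template of
 * tmpl_len bytes, 0 otherwise. Written to avoid overflow in
 * offset + len. */
static int updates_fit(const struct tmpl_update *u, size_t n,
                       size_t tmpl_len)
{
    for (size_t i = 0; i < n; ++i) {
        if (u[i].offset < 0 || (size_t)u[i].offset > tmpl_len)
            return 0;
        if (u[i].len > tmpl_len - (size_t)u[i].offset)
            return 0;
    }
    return 1;
}
```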
D.8 Delegated Sends API
The delegated send API, supported by Solarflare SFN7000 series adapters, can lower the latency overhead incurred when calling send() on TCP sockets by controlling TCP socket creation and management through Onload, but allowing TCP sends directly through the Onload layer 2 ef_vi API or another similar API.
Description
An application using the delegated sends API will prepare a packet buffer with IP/TCP header data before adding payload data to the packet. The packet buffer can be prepared in advance and the payload added just before the send is required.
After each delegated send, the actual data sent (and the length of that data) is returned to Onload. This allows Onload to update the TCP internal state and have the data to hand if retransmissions are required on the socket.
This feature is intended for applications that make sporadic TCP sends, as opposed to large amounts of bi-directional TCP traffic. The API should be used with caution to send small amounts of TCP data. Although the packet buffer can be prepared in advance of the send, the idea is to complete the delegated send operation (onload_delegated_send_complete()) soon after the initial send to maintain the integrity of the TCP internal state.
The user is responsible for serialization when using the delegated send API. The first call should always be onload_delegated_send_prepare(). If a normal send is required following the prepare, the user should use onload_delegated_send_cancel().
For a given file descriptor, while a delegated send is in progress and until complete has been called, the user should NOT attempt any standard send(), write(), sendfile(), close() etc. operations.
Performance
For best latency the application should call onload_delegated_send_complete() as soon as a delegated send is complete. This allows Onload to continue if retransmissions are required - Onload cannot perform any retransmission until complete has been called.
When a link partner has already acknowledged data before complete has been called, Onload will not have to copy the sent data to the TCP retransmit queue. So delaying the complete call may avoid a data copy, but latency may suffer in the event of packet loss.
Example Code
The Onload-201502 distribution includes the efdelegated_server.c and efdelegated_client.c example applications to demonstrate the delegated sends API. Variable and constant definitions, including socket flags and function return codes required when using the API, can be found in the extensions.h header file.
onload_delegated_send_prepare
Description
Prepare to send up to size bytes. Allocate TCP headers and prepare them with Ethernet, IP and TCP header data.
Definition
enum onload_delegated_send_rc onload_delegated_send_prepare(
    int fd,
    int size,
    unsigned flags,
    struct onload_delegated_send*)
Formal Parameters
fd
File descriptor to send on
size
Size of payload data
flags
See below
struct onload_delegated_send*
See below
Return Value
0 on success
non-zero otherwise
Notes
This function can be called speculatively so that the packet buffer is prepared in advance: headers are added so that the packet payload data can be added immediately before the send is required.
This function assumes the packet payload length is equal to the MSS, in which case there is no need to call onload_delegated_send_tcp_update().
Flags are used for ARP resolution:
• default flags = 0
• ONLOAD_DELEGATED_SEND_FLAG_IGNORE_ARP - do not do an ARP lookup; the caller will provide the destination MAC address.
• ONLOAD_DELEGATED_SEND_FLAG_RESOLVE_ARP - if ARP information is not available, send a speculative TCP ACK to provoke the kernel into ARP resolution, then wait up to 1ms for ARP information to appear.
The TCP send window and congestion window must be respected during delegated sends.
See extensions.h for flags and return code values.
struct onload_delegated_send {
  void* headers;
  int headers_len;  /* buffer len on input, headers len on output */
  int mss;          /* one packet payload may not exceed this */
  int send_wnd;     /* send window */
  int cong_wnd;     /* congestion window */
  int user_size;    /* the "size" value from send_prepare() call */
  int tcp_seq_offset;
  int ip_len_offset;
  int ip_tcp_hdr_len;
  int reserved[5];
};
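The mss, send_wnd, cong_wnd and user_size fields bound how much data a delegated send may carry. The sketch below works on plain integers and shows one way to compute how many bytes may be sent now and how they split into MSS-sized packets. It assumes the limit is simply the smallest of user_size, send_wnd and cong_wnd; `struct delegated_plan` and `plan_delegated_send()` are hypothetical names, not part of extensions.h.

```c
/* Hypothetical plan for one burst of delegated sends. */
struct delegated_plan {
    int allowed;       /* bytes that may be sent now */
    int full_packets;  /* packets carrying exactly mss bytes */
    int last_len;      /* length of a final short packet, 0 if none */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Assumes the sendable amount is min(user_size, send_wnd, cong_wnd)
 * and that no packet payload may exceed mss. */
static struct delegated_plan plan_delegated_send(int user_size, int send_wnd,
                                                 int cong_wnd, int mss)
{
    struct delegated_plan p = {0, 0, 0};

    if (mss <= 0)
        return p;
    p.allowed = min_int(user_size, min_int(send_wnd, cong_wnd));
    if (p.allowed < 0)
        p.allowed = 0;
    p.full_packets = p.allowed / mss;
    p.last_len = p.allowed % mss;
    return p;
}
```

A non-zero last_len corresponds to a packet whose payload is not equal to the MSS, which is the case where onload_delegated_send_tcp_update() is needed; the last packet of the chunk is also where TCP_FLAG_PSH is expected.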
onload_delegated_send_tcp_update
Description
Update packet headers with payload length and flags.
Definition
void onload_delegated_send_tcp_update(
    struct onload_delegated_send*,
    int size,
    int flags)
Formal Parameters
struct onload_delegated_send*
The structure populated by onload_delegated_send_prepare().
size
Size of payload data
flags
See below
Return Value
None
Notes
This function is called when, during a send, the payload length is not equal to the MSS value. See onload_delegated_send_prepare on page 214.
The TCP_FLAG_PSH flag is expected to be set on the last packet when sending a large data chunk.
onload_delegated_send_tcp_advance
Description
Advance TCP headers after sending one TCP packet.
Definition
void onload_delegated_send_tcp_advance(
    struct onload_delegated_send*,
    int bytes)
Formal Parameters
struct onload_delegated_send*
The structure populated by onload_delegated_send_prepare().
bytes
Number of bytes sent
Return Value
None
Notes
When sending data using multiple sends, this function is called after each send to update the header data with the number of bytes sent.
The actual data sent is not returned to Onload until onload_delegated_send_complete() is called.
onload_delegated_send_complete
Description
Following a delegated send, this function is used to return the actual data sent (and the length of that data) to Onload, which will update the internal TCP state.
Definition
int onload_delegated_send_complete(
    int fd,
    const struct iovec*,
    int iovlen,
    int flags)
Formal Parameters
fd
The file descriptor.
struct iovec*
Pointer to the data sent
iovlen
Number of elements in the iovec array
flags
MSG_DONTWAIT | MSG_NOSIGNAL
Return Value
0 on success
non-zero if an error occurs.
Notes
Onload is unable to do any retransmit until this function has been called.
This function should be called even if some (but not all) bytes specified in the prepare function have been sent. The user must also call onload_delegated_send_cancel() if some of the bytes are not going to be sent, i.e. reserved-but-not-sent - see the onload_delegated_send_cancel() notes below.
This function can block because of the SO_SNDBUF limitation and will ignore the SO_SNDTIMEO value.
onload_delegated_send_cancel
Description
No more delegated sends are planned.
Normal send(), shutdown() or close() etc. can be called after this call.
Definition
int onload_delegated_send_cancel(int fd)
Formal Parameters
fd
The file descriptor on which the delegated send was prepared.
Return Value
0 on success
non-zero if an error occurs.
Notes
When TCP headers have been allocated with onload_delegated_send_prepare() but it is subsequently required to do a normal send, this function can be used to cancel the delegated send operation and do a normal send.
There is no need to call this function before calling onload_delegated_send_prepare().
There is no need to call this function if all the bytes specified in the onload_delegated_send_prepare() function have been sent.
If some, but not all, bytes have been sent, you must call onload_delegated_send_complete() for the sent bytes THEN call onload_delegated_send_cancel() for the remaining (reserved-but-not-sent) bytes. This applies even if the reason for not sending is that the window limits returned from the prepare function have been reached.
Normal send(), shutdown() or close() etc. can be called after this call.
E onload_stackdump
E.1 Introduction
The Solarflare onload_stackdump diagnostic utility is a component of the Onload distribution which can be used to monitor Onload performance, set tuning options and examine aspects of the system performance.
NOTE: To view data for all stacks, created by all users, the user must be root when running onload_stackdump. Non-root users can only view data for stacks created by themselves or made accessible to them via the EF_SHARE_WITH environment variable.
The following examples of onload_stackdump are demonstrated elsewhere in this user guide:
• Monitoring Using onload_stackdump on page 42
• Processing at User-Level on page 43
• As Few Interrupts as Possible on page 45
• Eliminating Drops on page 45
• Minimizing Lock Contention on page 46
E.2 General Use
The onload_stackdump tool can produce an extensive range of data, and it can be more useful to limit output to specific stacks or to specific aspects of the system performance for analysis purposes.
• For help, and to list all onload_stackdump commands and options:
onload_stackdump -?
• To list and read environment variable descriptions:
onload_stackdump doc
• For descriptions of statistics variables:
onload_stackdump describe_stats
Describes all statistics listed by the onload_stackdump lots command.
• To identify all stacks, by identifier and name, and all processes accelerated by Onload:
onload_stackdump
#stack-id stack-name pids
6         teststack  28570
• To limit the command/option to a specific stack, e.g. stack 4:
onload_stackdump 4 lots
List Onloaded Processes
The 'onload_stackdump processes' command will show the PID and name of processes being accelerated by Onload and the Onload stack being used by each process, e.g.
# onload_stackdump processes
#pid   stack-id cmdline
25587  3        ./sfnt-pingpong
Onloaded processes which have not created a socket are not displayed, but can be identified using the lsof command.
Identify Onloaded Processes Affinities
The 'onload_stackdump affinities' command will identify the task affinity for an accelerated process, e.g.
# onload_stackdump affinities
pid=25587
cmdline=./sfnt-pingpong
task25587: 80
The task affinity is shown as a hexadecimal CPU bitmask, i.e. 01 is CPU core 0, 02 is CPU core 1, 80 is CPU core 7 etc.
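Decoding the mask is a matter of testing each bit: bit n set means the task may run on CPU core n. The sketch below assumes the mask fits in an unsigned long; `mask_to_cpus()` is a hypothetical helper, not part of onload_stackdump.

```c
#include <stdlib.h>

/* Decode a hexadecimal CPU mask such as "80" into core numbers.
 * Stores up to max_cpus core numbers in cpus[] and returns how many
 * were stored, or -1 if the string is not valid hexadecimal. */
static int mask_to_cpus(const char *hex, int *cpus, int max_cpus)
{
    char *end;
    unsigned long mask = strtoul(hex, &end, 16);
    int n = 0;

    if (end == hex || *end != '\0')
        return -1;
    for (int bit = 0; mask != 0 && n < max_cpus; ++bit, mask >>= 1)
        if (mask & 1UL)
            cpus[n++] = bit;   /* bit n set => CPU core n */
    return n;
}
```

For the output above, mask_to_cpus("80", cpus, 64) returns 1 with cpus[0] == 7, i.e. the task is bound to CPU core 7.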
List Onload Environment Variables
The 'onload_stackdump env' command will identify Onloaded processes running in the current environment and list all Onload variables set in the current environment, e.g.
# EF_POLL_USEC=100000 EF_TXQ_SIZE=4096 EF_INT_DRIVEN=1 onload <application>
# onload_stackdump env
pid: 25587
cmdline: ./sfnt-pingpong
env: EF_POLL_USEC=100000
env: EF_TXQ_SIZE=4096
env: EF_INT_DRIVEN=1
TX PIO Counters
The onload_stackdump utility exposes counters to indicate how often TX PIO is being used - see Debug and Logging on page 67. To view PIO counters, run the following command:
$ onload_stackdump stats | grep pio
pio_pkts: 2485971
no_pio_err: 0
The values returned identify the number of packets sent via PIO and the number of times PIO was not used due to an error condition.
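Where these counters are collected by a monitoring tool, each `name: value` line can be parsed with sscanf(). A minimal sketch, assuming the input lines have the format shown above; `parse_counter()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Parse a line of the form "name: value", as printed by
 * "onload_stackdump stats". Returns 1 and stores the value if the line
 * names the requested counter; returns 0 (leaving *value unchanged)
 * otherwise. */
static int parse_counter(const char *line, const char *name,
                         unsigned long long *value)
{
    char key[64];
    unsigned long long v;

    if (sscanf(line, " %63[^:]: %llu", key, &v) != 2)
        return 0;
    if (strcmp(key, name) != 0)
        return 0;
    *value = v;
    return 1;
}
```

With both counters parsed, a rising no_pio_err relative to pio_pkts indicates that PIO is increasingly being bypassed because of error conditions.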
Send RST on a TCP Socket
To send a reset on an Onload-accelerated TCP socket, specify the stack and socket using the rst command:
# onload_stackdump <stack:socket> rst
Removing Zombie and Orphan Stacks
Onload stacks and sockets can remain active even after all processes using them have been terminated or have exited, for example to ensure sent data is successfully received by the TCP peer or to honor TCP TIME_WAIT semantics. Such stacks should always eventually self-destruct and disappear with no user intervention. However, in some instances these stacks cause problems for restarting applications; for example, the application may be unable to use the same port numbers when these are still being used by the persistent stack socket. Persistent stacks also retain resources such as packet buffers, which are then denied to other stacks.
Such stacks are termed 'zombie' or 'orphan' stacks, and their existence may be either desirable or undesirable.
• To list all persistent stacks:
# onload_stackdump -z all
No output to the console or syslog means that no such stacks exist.
• To list a specific persistent stack:
# onload_stackdump -z <stack ID>
• To display the state of persistent stacks:
# onload_stackdump -z dump
• To terminate persistent stacks:
# onload_stackdump -z kill
• To display all options available for zombie/orphan stacks:
# onload_stackdump --help
Snapshot vs. Dynamic Views
The onload_stackdump tool presents a snapshot view of the system when invoked. To monitor state and variable changes whilst an application is running, use onload_stackdump with the Linux watch command, e.g.
• snapshot: onload_stackdump netif
• dynamic: watch -d -n1 onload_stackdump netif
Some onload_stackdump commands also update periodically whilst monitoring a process. These commands usually have the watch_ prefix, e.g. watch_stats, watch_more_stats, watch_tcp_stats, watch_ip_stats etc.
Use the onload_stackdump -h option to list all commands.
MonitoringReceiveandTransmitPacketBuffers
onload_stackdumppackets
#onload_stackdumppackets
ci_netif_pkt_dump_all:id=1
pkt_sets:pkt_size=2048set_size=1024max=32alloc=2
pkt_set[0]:free=544
pkt_set[1]:free=437current
pkt_bufs:max=32768alloc=2048free=981async=0
pkt_bufs:rx=1067rx_ring=1001rx_queued=2pressure_pool=64
pkt_bufs:tx=0tx_ring=0tx_oflow=0
pkt_bufs:in_loopback=0in_sock=0
1003:0x200Rx
n_zero_refs=1045n_freepkts=981estimated_free_nonb=64
free_nonb=0nonb_pkt_pool=ffffffffffffffff
The onload_stackdump packets command can be useful to review packet buffer allocation, use and reuse within a monitored process.
The example above identifies that the process has a maximum of 32768 buffers (each of 2048 bytes) available. From this pool 2048 buffers have been allocated (2 packet sets of 1024 buffers each), and 981 of these (544 in pkt_set[0] plus 437 in pkt_set[1]) are currently free for reuse, which means they can be pushed onto the receive or transmit ring buffers ready to accept new incoming/outgoing data.
On the receive side of the stack, 1067 packet buffers are in use: 1001 have been pushed to the receive ring and are available for incoming packets, and 2 are currently in the receive queue for the application to process.
On the transmit side of the stack, no packet buffers are currently in use (tx=0): none are in the transmit ring (tx_ring=0) and none are in the overflow queue (tx_oflow=0). The remaining values are calculations based on the packet buffer values.
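To make the reading of the pkt_bufs lines explicit, the following Python sketch merges every pkt_bufs: line of the packets output and restates the counters in the terms used above. The summary key names are illustrative choices, not Onload field names, and the parser assumes only the "name=value" layout shown in the sample.

```python
import re


def parse_pkt_bufs(dump: str) -> dict:
    """Merge the integer fields of every "pkt_bufs:" line into one dict."""
    fields = {}
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("pkt_bufs:"):
            for key, value in re.findall(r"(\w+)=(\d+)\b", line):
                fields[key] = int(value)
    return fields


def summarize(fields: dict) -> dict:
    """Restate the pkt_bufs counters using the wording of the text above."""
    return {
        "max_buffers": fields["max"],          # buffers available to the stack
        "allocated": fields["alloc"],          # buffers allocated from that pool
        "free_for_reuse": fields["free"],      # allocated but currently free
        "allocated_not_free": fields["alloc"] - fields["free"],
        "rx_in_use": fields["rx"],
        "rx_on_ring": fields["rx_ring"],       # pushed to the receive ring
        "rx_awaiting_app": fields["rx_queued"],  # in the receive queue
        "tx_in_use": fields["tx"],
    }
```

For the sample output, allocated_not_free is 2048 - 981 = 1067, which happens to equal rx + tx (1067 + 0); this guide does not state whether that holds in general.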
Using the EF_PREFAULT_PACKETS environment variable, packets can be pre-allocated to the user process when an Onload stack is created. This can reduce latency jitter and improve Onload performance. For further details see Prefault Packet Buffers on page 42.
Packet Sets
A packet set is a 2MB chunk of packet buffers being used by an Onload application. An application might use buffers from a single set or from several sets, depending on its complexity and packet buffer requirements.
With the aim of further reducing TLB thrashing and eliminating packet drops, Onload will try to reuse buffers from the same set.

The onload_stackdump lots command in Onload 201509 will report on the current use of packet sets, e.g.:
$ onload_stackdump lots | grep pkt_set
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=442 current
In the above output there are 2 packet sets. The counters identify the number of free packet buffers in each set, and the current flag identifies the set currently being used.
The packet sets feature is not available to user applications using the ef_vi layer directly.
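The pkt_set lines can also be read programmatically. The Python sketch below is a hypothetical helper that assumes only the line layout shown above: it extracts the free count of each set and the index of the set marked current. The product pkt_size × set_size gives the size of one set, 2048 × 1024 bytes = 2MB, which matches the description of a packet set.

```python
import re
from typing import NamedTuple, Optional


class PktSets(NamedTuple):
    pkt_size: int            # bytes per packet buffer
    set_size: int            # packet buffers per set
    free_per_set: dict       # set index -> free buffers in that set
    current: Optional[int]   # index of the set marked "current", if any


_HEADER = re.compile(r"pkt_sets:\s*pkt_size=(\d+)\s+set_size=(\d+)")
_SET = re.compile(r"pkt_set\[(\d+)\]:\s*free=(\d+)(\s+current)?")


def parse_pkt_sets(dump: str) -> PktSets:
    """Parse the pkt_sets/pkt_set[n] lines from onload_stackdump output."""
    header = _HEADER.search(dump)
    if header is None:
        raise ValueError("no 'pkt_sets:' line in the input")
    free, current = {}, None
    for match in _SET.finditer(dump):
        index = int(match.group(1))
        free[index] = int(match.group(2))
        if match.group(3):
            current = index
    return PktSets(int(header.group(1)), int(header.group(2)), free, current)
```

Summing free_per_set gives the total number of free buffers across sets. In the earlier onload_stackdump packets sample, 544 + 437 = 981, the same value reported by pkt_bufs free=981.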
TCP Application STATS
The following onload_stackdump commands can be used to monitor accelerated TCP connections:
onload_stackdump tcp_stats
Field                 Description
tcp_active_opens      Number of socket connections initiated by the local end
tcp_passive_opens     Number of socket connections accepted by the local end
tcp_attempt_fails     Number of failed connection attempts
tcp_estab_resets      Number of established connections which were subsequently reset
tcp_curr_estab        Number of socket connections in the established or close_wait states
tcp_in_segs           Total number of received segments, including errored segments
tcp_out_segs          Total number of transmitted segments, excluding segments containing only retransmitted octets
tcp_retran_segs       Total number of retransmitted segments
tcp_in_errs           Total number of segments received with errors
tcp_out_rsts          Number of reset segments sent

onload_stackdump more_stats | grep tcp
Use the onload_stackdump -h command to list all TCP connection, stack and socket commands.
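Because the tcp_stats counters are running totals, a rate such as the retransmission ratio is most meaningful over an interval between two samples. The Python sketch below assumes the counters have already been collected into dictionaries keyed by the field names in the table (the exact text layout of the tcp_stats output is not shown here, so parsing is left out). The ratios are common derived metrics, not values reported by Onload.

```python
def interval_ratios(before: dict, after: dict) -> dict:
    """Derived TCP health figures over the interval between two samples.

    `before` and `after` map tcp_stats field names to counter values.
    Each figure uses the difference between samples, since the counters
    are cumulative totals.
    """
    def delta(name: str) -> int:
        return after[name] - before[name]

    def ratio(numerator: str, denominator: str) -> float:
        d = delta(denominator)
        return delta(numerator) / d if d > 0 else 0.0

    return {
        # tcp_out_segs excludes segments carrying only retransmitted octets,
        # so this is retransmissions relative to segments of new data.
        "retransmit_ratio": ratio("tcp_retran_segs", "tcp_out_segs"),
        "rx_error_ratio": ratio("tcp_in_errs", "tcp_in_segs"),
        "failed_opens": delta("tcp_attempt_fails"),
    }
```

A steadily rising retransmit_ratio or rx_error_ratio may be worth investigating further with the per-socket counters in the onload_stackdump lots output.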
The onload_stackdump LOTS Command
The onload_stackdump lots command will produce extensive data for all accelerated stacks and sockets. The command can also be restricted to a specific stack and its associated connections when the stack number is entered on the command line, e.g.:
onload_stackdump lots
onload_stackdump 2 lots
For descriptions of the statistics refer to the output from the following command:
onload_stackdump describe_stats
Field                   Description
tcp_has_recvq           Non-zero if receive queue has data ready
tcp_recvq_bytes         Total bytes in receive queue
tcp_recvq_pkts          Total packets in receive queue
tcp_has_recv_reorder    Non-zero if socket has out of sequence bytes
tcp_recv_reorder_pkts   Number of out of sequence packets received
tcp_has_sendq           Non-zero if send queues have data ready
tcp_sendq_bytes         Number of bytes currently in all send queues for this connection
tcp_sendq_pkts          Number of packets currently in all send queues for this connection
tcp_has_inflight        Non-zero if some data remains unacknowledged
tcp_inflight_bytes      Total number of unacknowledged bytes
tcp_inflight_pkts       Total number of unacknowledged packets
tcp_n_in_listenq        Number of sockets (summed across all listening sockets) where the local end has responded to SYN with a SYN_ACK, but this has not yet been acknowledged by the remote end
tcp_n_in_acceptq        Number of sockets (summed across all listening sockets) that are currently queued waiting for the local application to call accept()

The following tables describe the output from the onload_stackdump lots command for:
• TCP stack
• TCP established connection socket
• TCP listening socket
• UDP socket
Within the tables the following abbreviations are used:
• rx = receive (or receiver), tx = transmit (or send)
• pkts = packets, skts = sockets
• Max = maximum, num = number, seq = sequence number
Table 5: Stackdump Output: TCP Stack
Sample output    Description
onload_stackdump lots
    Command entered.
ci_netif_dump: stack=7 name=
    Stack id and stack name as set by EF_NAME.
ver=201310 uid=0 pid=21098
    Onload version, user id and process id of creator process.
lock=20000000 LOCKED nics=3 primed=1
    Internal stack lock status.
    nics = bitfield identifying the adapters used by this stack, e.g. 3 = binary 11, so the stack is using NICs 1 and 2.
    primed=1 means the event queue will generate an interrupt when the next event arrives.
sock_bufs: max=1024 n_allocated=4
    Max number of socket buffers which can be allocated, and number currently in use. Socket buffers are also used by pipes.
pkt_bufs: size=2048 max=32768 alloc=576 free=57 async=0
    Packet buffers: a total of 32768 pkt buffers (each of 2048 bytes) are available to this stack. 576 have been allocated, of which 57 are free and can be reused by either receive or transmit rings.
    async = buffers that are not free, not being used and not being reaped, i.e. in a state waiting to be returned for reuse.

pkt_bufs: rx=517 rx_ring=514 rx_queued=3
    Receive packet buffers: a total of 517 pkt buffers are currently in use, 514 have been pushed to the receive ring and 3 are in the application's receive queue.
    If the CRITICAL flag is displayed it indicates a memory pressure condition in which the number of packets in the receive socket buffers (rx=517) is approaching the EF_MAX_RX_PACKETS value.
    If the LOW flag is displayed it indicates a memory pressure condition in which there are not enough packet buffers available to refill the RX descriptor ring.
pkt_bufs: tx=2 tx_ring=1 tx_oflow=0 tx_other=1
    Transmit packet buffers: a total of 2 pkt buffers are currently in use, 1 remains in the transmit ring and 0 buffers have overflowed.
    tx_other = transmit pkt buffers that are in neither the tx_ring nor the tx_oflow queue.
time: netif=5eb5c61 poll=5eb5c61 now=5eb5c61 (diff=0.000sec)
    Internal timer values.
ci_netif_dump_vi: stack=7 intf=0 vi_instance=87 hw=0C0
    Data describing the stack's virtual interface to the NIC.
evq: cap=2048 current=16de30 is_32_evs=0 is_ev=0
    Event queue data:
    cap = max num of events the queue can hold
    current = current event queue location
    is_32_evs = 1 if there are 32 or more events pending
    is_ev = 1 if there are any events pending
rxq: cap=511 lim=511 spc=1 level=510 total_desc=93666
    Receive queue data:
    cap = total capacity
    lim = max fill level for receive descriptor ring, specified by EF_RXQ_LIMIT
    spc = amount of free space in the receive queue, i.e. how many descriptors could be added before the receive queue becomes full
    level = how full the receive queue currently is
    total_desc = total number of descriptors that have been pushed to the receive queue
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
    Transmit queue data:
    cap = total capacity
    lim = max fill level for transmit descriptor ring, specified by EF_TXQ_LIMIT
    spc = amount of free space in the transmit queue, i.e. how many descriptors could be added before the transmit queue becomes full
    level = how full the transmit queue currently is
    pkts = how many packets are represented by the descriptors in the transmit queue
    oflow_pkts = how many packets are in the overflow transmit queue (i.e. waiting for space in the NIC's transmit queue)
txq: tot_pkts=93669 bytes=0
    Total number of packets sent and number of packet bytes currently in the queue.
ci_netif_dump_extra: stack=7
    Additional data follows.
in_poll=0 post_poll_list_empty=1 poll_did_wake=0
    Stack polling status:
    in_poll = process is currently polling
    post_poll_list_empty = whether the list of tasks to be done once polling is complete is empty (1 = true, 0 = false)
    poll_did_wake = while polling, the process identified a socket which needs to be woken following the poll
rx_defrag_head=-1 rx_defrag_tail=-1
    Reassembly sequence numbers. -1 means no re-assembly has occurred.
tx_tcp_may_alloc=1 nonb_pool=1 send_may_poll=0 is_spinner=0
    TCP buffer data:
    tx_tcp_may_alloc = num pkt buffers TCP could use
    nonb_pool = number of pkt buffers available to the TCP process without holding the lock
    is_spinner = TRUE if a thread is spinning
hwport_to_intf_i=0,-1,-1,-1,-1,-1 intf_i_to_hwport=0,0,0,0,0,0
    Internal mapping of internal interfaces to hardware ports.
uk_intf_ver=03e89aa26d20b98fd08793e771f2cdd9
    md5 user/kernel interface checksum, computed by both the kernel and the user application to verify internal data structures.
ci_netif_dump_reap_list: stack=7
7:2
7:1
    Identifies sockets that have buffers which can be freed, e.g. 7:2 = stack 7 socket 2.
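The nics value in Table 5 is a bitfield with one bit per adapter. The Python sketch below decodes it using the numbering of the example in the table (nics=3, binary 11, means NICs 1 and 2, so bit 0 is reported as NIC 1). The helper name is hypothetical, and how these numbers map to interface names is not covered here.

```python
def nics_in_use(nics: int) -> list:
    """Return the NIC numbers (counting from 1) whose bits are set in `nics`."""
    if nics < 0:
        raise ValueError("nics must be non-negative")
    # (nics >> bit) & 1 tests each bit from least significant upwards.
    return [bit + 1 for bit in range(nics.bit_length()) if (nics >> bit) & 1]
```

For the sample line lock=20000000 LOCKED nics=3 primed=1, nics_in_use(3) returns [1, 2].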
Table 6: Stackdump Output: TCP Established Connection Socket
Sample output    Description
TCP 7:1 lcl=192.168.1.2:50773 rmt=192.168.1.1:34875 ESTABLISHED
    Socket configuration.
    stack:socket id, local and remote ip:port address; the TCP connection is ESTABLISHED.
lock: 10000000 UNLOCKED
    Internal stack lock status.
rx_wake=0000b6f4(RQ) tx_wake=00000002 flags:
    Internal sequence values that are incremented each time a queue is 'woken'.
addr_spc_id=fffffffffffffffe s_flags: REUSE BOUND
    Address space identifier in which this socket exists, and flags set on the socket.
    REUSE = allow bind to reuse local addresses.
rcvbuf=129940 sndbuf=131072 rx_errno=0 tx_errno=0 so_error=0
    Socket receive buffer size and send buffer size.
    rx_errno = ZERO whilst data can still arrive, otherwise contains the error code.
    tx_errno = ZERO if transmit can still happen, otherwise contains the error code.
    so_error = current socket error (0 = no error).
tcpflags: TSO WSCL SACK ESTAB
    TCP flags currently set for this socket.
TCP state: ESTABLISHED
    State of the TCP connection.
snd: up=b554bb86 una-nxt-max=b554bb86-b554bb87-b556b6a6 enq=b554bb87
    TCP sequence numbers.
    up = (urgent pointer) sequence of the byte following the OOB byte
    una-nxt-max = sequence number of the first unacknowledged byte, sequence number of the next byte we expect to be acknowledged, and max = sequence of the last byte in the current send window
    enq = sequence number of the last byte currently queued for transmit
send=0(0) pre=0 inflight=1(1) wnd=129824 unused=129823
    Send data.
    send = number of pkts (bytes) sent
    pre = number of pkts in the pre-send queue. A process can add data to the prequeue when it is prevented from sending the data immediately. The data will be sent when the current sending operation is complete.
    inflight = number of pkts (bytes) sent but not yet acknowledged
    wnd = receiver's advertised window size (bytes); unused = amount of free (unused) space (bytes) in that window
snd: cwnd=49733+0 used=0 ssthresh=65535 bytes_acked=0 Open
    Congestion window (cwnd).
    cwnd = congestion window size (bytes)
    used = portion of the cwnd currently in use
    ssthresh = slow start threshold, the number of bytes that have to be sent before the process can exit slow start
    bytes_acked = number of bytes acknowledged; this value is used to calculate the rate at which the congestion window is opened
    Open = current cwnd status
snd: Onloaded(Valid) if=6 mtu=1500 intf_i=0 vlan=0 encap=4
    Onloaded = can reach the destination via an accelerated interface.
    (Valid) = cached control plane information is up-to-date; Onload can send immediately using this information.
    (Old) = cached control plane information may be out-of-date. On the next send Onload will do a control plane lookup, which will add some latency.
rcv: nxt-max=0e9251fe-0e944d1d current=0e944d92 FASTSTART FAST
    Receiver data.
    nxt-max = next byte we expect to receive and last byte we expect to receive (because of window size)
    current = byte currently being processed
rob_n=0 recv1_n=2 recv2_n=0 wndadv=129823 cur=129940 usr=0
    Reorder buffer. Bytes received out of sequence are put into a reorder buffer, awaiting further bytes before reordering can occur.
    rob_n = num of bytes in the reorder buffer
    recv1_n = num of bytes in the general reorder buffer
    recv2_n = num of bytes in the urgent data reorder buffer
    wndadv = receiver advertised window size
    cur = current receive window size
    usr = current tcp stack user
async: rx_put=-1 rx_get=-1 tx_head=-1
    Asynchronous queue data.
eff_mss=1448 smss=1460 amss=1460 used_bufs=2 uid=0 wscl s=1 r=1
    Max segment size.
    eff_mss = effective MSS
    smss = sender MSS
    amss = advertised MSS
    used_bufs = number of transmit buffers used
    uid = user id that created this socket (0 = root)
    wscl s/r = parameters to the window scaling algorithm
srtt=01 rttvar=000 rto=189 zwins=0,0
    Round trip time (RTT); all values are in milliseconds.
    srtt = smoothed RTT value
    rttvar = RTT variation
    rto = current RTO timeout value
    zwins = zero windows, the number of times the advertised window has gone to zero size
retrans=0 dupacks=0 rtos=0 frecs=0 seqerr=0 ooo_pkts=0 ooo=0
    Re-transmissions.
    retrans = internal state, nearly always zero
    dupacks = number of duplicate acks received
    rtos = number of retransmission timeouts
    frecs = number of fast recoveries
    seqerr = number of sequence errors
    ooo_pkts = number of out of sequence pkts
    ooo = number of out of order events
timers:
    Currently active timers.
tx_nomac
    Number of TCP packets sent via the OS using raw sockets when up-to-date ARP data is not available.
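The sequence numbers in Table 6 are printed in hexadecimal. Subtracting them with modulo 2^32 arithmetic (TCP sequence numbers wrap around) reproduces several other counters in the sample: nxt - una = 1 (inflight=1(1)), max - una = 129824 (wnd=129824), max - nxt = 129823 (unused=129823), and on the receive side max - nxt = 129823 (wndadv=129823). This correspondence is an observation from the sample values, not a documented Onload formula. The Python sketch below makes the arithmetic explicit; the function names are illustrative.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_distance(later: int, earlier: int) -> int:
    """Forward distance from `earlier` to `later`, modulo 2**32."""
    return (later - earlier) % SEQ_MOD


def send_view(una_nxt_max: str) -> dict:
    """Byte counts derived from the snd: 'una-nxt-max' hex triple."""
    una, nxt, top = (int(part, 16) for part in una_nxt_max.split("-"))
    return {
        "unacked_bytes": seq_distance(nxt, una),  # compare inflight=1(1)
        "window_bytes": seq_distance(top, una),   # compare wnd=129824
        "unused_bytes": seq_distance(top, nxt),   # compare unused=129823
    }


def receive_view(nxt_max: str) -> int:
    """Bytes between the rcv: 'nxt' and 'max' values (compare wndadv)."""
    nxt, top = (int(part, 16) for part in nxt_max.split("-"))
    return seq_distance(top, nxt)
```

Using modular subtraction, rather than plain subtraction, keeps the result correct when a connection's sequence numbers wrap past 0xffffffff.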
Table 7: Stackdump Output: TCP Stack Listen Socket
Sample output    Description
TCP 7:3 lcl=0.0.0.0:50773 rmt=0.0.0.0:0 LISTEN
    Socket configuration.
    stack:socket id, LISTENING socket on port 50773. Local and remote addresses are not set, i.e. not bound to any IP address.
lock: 10000000 UNLOCKED
    Internal stack lock status.
rx_wake=00000000 tx_wake=00000000 flags:
    Internal sequence values that are incremented each time a queue is 'woken'.
addr_spc_id=fffffffffffffffe s_flags: REUSE BOUND PBOUND
    Address space identifier in which this socket exists, and flags set on the socket.
    REUSE = allow bind to reuse the local port.
rcvbuf=129940 sndbuf=131072 rx_errno=6b tx_errno=20 so_error=0
    Receive buffer.
    Socket receive buffer size and send buffer size.
    rx_errno = ZERO whilst data can still arrive, otherwise contains the error code.
    tx_errno = ZERO if transmit can still happen, otherwise contains the error code.
    so_error = current socket error (0 = no error).
tcpflags: WSCL SACK
    Flags advertised during the handshake.
listenq: max=1024 n=0
    Listen queue.
    Queue of half-open connections (SYN received and SYNACK sent, waiting for the final ACK).
    n = number of connections in the queue
acceptq: max=5 n=0 get=-1 put=-1 total=0
    Accept queue.
    Queue of open connections waiting for the application to call accept().
    max = max connections that can exist in the queue
    n = current number of connections
    get/put = indexes for queue access
    total = num of connections that have traversed this queue
epcache: n=0 cache=EMPTY pending=EMPTY
    Endpoint cache.
    n = number of endpoints currently known to this socket
    cache = EMPTY, or yes if endpoints are still cached
    pending = EMPTY, or yes if endpoints still have to be cached
defer_accept=0
    Number of times TCP_DEFER_ACCEPT kicked in; see TCP socket options.
l_overflow=0 l_no_synrecv=0 a_overflow=0 a_no_sock=0 ack_rsts=0 os=2
    l_overflow = number of times the listen queue was full and had to reject a SYN request
    l_no_synrecv = number of times unable to allocate an internal resource for a SYN request
    a_overflow = number of times unable to promote a connection to the accept queue, which is full
    a_no_sock = number of times unable to create a socket
    ack_rsts = number of times an ACK was received before the SYN, so the connection was reset
    os=2: there are 2 sockets being processed in the kernel
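When checking a busy listening socket, the counters in the last row of Table 7 and the acceptq occupancy are natural things to watch. The Python sketch below flags non-zero values of those counters and an accept queue whose n has reached max. Which values actually indicate a problem depends on the application; this threshold choice is an assumption, not Onload guidance.

```python
import re

# Counters from Table 7 that count SYNs or connections that were refused,
# dropped or reset.
_PROBLEM_COUNTERS = ("l_overflow", "l_no_synrecv", "a_overflow", "a_no_sock", "ack_rsts")


def listen_socket_warnings(dump: str) -> list:
    """Return human-readable warnings for one listening socket's lots output."""
    values = {key: int(val) for key, val in re.findall(r"(\w+)=(-?\d+)\b", dump)}
    warnings = [f"{name}={values[name]}"
                for name in _PROBLEM_COUNTERS if values.get(name, 0) > 0]
    accept = re.search(r"acceptq:\s*max=(\d+)\s+n=(\d+)", dump)
    if accept and int(accept.group(2)) >= int(accept.group(1)):
        warnings.append("accept queue full")
    return warnings
```

A full accept queue usually means the application is not calling accept() quickly enough, which is consistent with a rising a_overflow count.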
Table 8: Stackdump Output: UDP Socket
Sample output    Description
UDP 4:1 lcl=192.168.1.2:38142 rmt=192.168.1.1:42638 UDP
    Socket configuration.
    stack:socket id, UDP socket on port 38142. Local and remote addresses and ports.
lock: 20000000 LOCKED
    Stack internal lock status.
rx_wake=000e69b0 tx_wake=000e69b1 flags:
    Internal sequence values that are incremented each time a queue is 'woken'.
addr_spc_id=fffffffffffffffe s_flags: REUSE
    Address space identifier in which this socket exists, and flags set on the socket.
    REUSE = allow bind to reuse local addresses.
rcvbuf=129024 sndbuf=129024 rx_errno=0 tx_errno=0 so_error=0
    Buffers.
    Socket receive buffer size and send buffer size.
    rx_errno = ZERO whilst data can still arrive, otherwise contains the error code.
    tx_errno = ZERO if transmit can still happen, otherwise contains the error code.
    so_error = current socket error (0 = no error).
udpflags: FILT MCAST_LOOP RXOS
    Flags set on the UDP socket.
mcast_snd: intf=-1 ifindex=0 saddr=0.0.0.0 ttl=1 mtu=1500
    Multicast.
    intf = multicast hardware port id (-1 means the port was not set)
    ifindex = interface (port) identifier
    saddr = IP address
    ttl = time to live (default for multicast = 1)
    mtu = max transmission unit size
rcv: q_bytes=0 q_pkts=0 reap=2 tot_bytes=30225920 tot_pkts=944560
    Receive queue.
    q_bytes = num bytes currently in the rx queue
    q_pkts = num pkts currently in the rx queue
    tot_bytes = total bytes received
    tot_pkts = total pkts received
rcv: oflow_drop=0(0%) mem_drop=0 eagain=0 pktinfo=0 q_max_pkts=0
    Overflow buffer.
    oflow_drop = number of datagrams dropped due to overflow when the socket buffer is full
    mem_drop = number of datagrams dropped due to running out of packet buffer memory
    eagain = number of times the application tried to read from the socket when there was no data ready; this value can be ignored on the rcv side
    pktinfo = number of times the IP_PKTINFO control message was received
    q_max_pkts = max depth reached by the receive queue (packets)
rcv: os=0(0%) os_slow=0 os_error=0
    Number of datagrams received via:
    os = operating system
    os_slow = operating system slow socket
    os_error = recv() function call via the OS returned an error
snd: q=0+0 ul=944561 os=0(0%) os_slow=0(0%)
    Send values.
    q = number of bytes sent to the interface but not yet transmitted
    ul = number of datagrams sent via Onload
    os = number of datagrams sent via the OS
    os_slow = number of datagrams sent via the OS slow path
snd: cp_match=0(0%)
    Unconnected UDP send.
    cp_match = number of datagrams sent via the accelerated path, and the percentage this is of all unconnected send datagrams
snd: lk_poll=0(0%) lk_pkt=944561(100%) lk_snd=0(0%)
    Stack internal lock.
    lk_poll = number of times the lock was held while we polled the stack
    lk_pkt = number of pkts sent while holding the lock
    lk_snd = number of times the lock was held while sending data
snd: lk_defer=0(0%) cached_daddr=0.0.0.0
    Sending deferred to the process/thread currently holding the lock.
snd: eagain=0 spin=0 block=0
    eagain = count of the number of times the application tried to send data but the transmit queue was already full. A high value on the send side may indicate transmit issues.
    spin = number of times the process had to spin when the send queue was full
    block = number of times the process had to block when the send queue was full
snd: poll_avoids_full=0 fragments=0 confirm=0
    poll_avoids_full = number of times polling created space in the send queue
    fragments = number of (non-first) fragments sent
    confirm = number of datagrams sent with the MSG_CONFIRM flag
snd: os_late=0 unconnect_late=0
    os_late = number of pkts sent via the OS after copying
    unconnect_late = number of pkts silently dropped when the process/thread becomes disconnected during a send procedure

Following the stack and socket data, onload_stackdump lots will display a list of statistical data. For descriptions of the fields refer to the output from the following command:
onload_stackdump describe_stats
The final list produced by onload_stackdump lots shows the current values of all environment variables in the monitored process environment. For descriptions of the environment variables refer to Parameter Reference on page 146 or use the onload_stackdump doc command.
Remote Monitoring
Introduced in Onload-201502, the remote monitoring feature uses a simple client/server model to export the Onload stack and socket data to remote machines. The remote monitor (server) process is installed along with the Onload distribution. A simple example client process is also provided.
The server process (on the machine to be monitored) can be started from the following directory:
openonload-201502/src/tools/onload_remote_monitor
Start the monitor server process, identifying a port through which the server and client processes will connect:
# ./onload_remote_monitor <port>
The example client process can be found in the following directory:
openonload-201502/src/tests/onload/onload_remote_monitor
From the remote machine, start the client process, identifying the server host machine and port number:
# ./orm_example_client <server host>:<port>
In the initial release the remote_monitor server will export an extensive list of counters from the Onload stacks and sockets. Data is exported in JSON format for processing by a remote application.
Remote monitoring is an exploratory feature, and future development is planned to include data requested through direct customer input and feedback.
Customers interested in remote monitoring are asked to provide feedback and monitoring requirements by sending an email to support@solarflare.com.
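A remote application consuming the exported JSON will often want a flat list of counters, for example to feed a time-series database. This guide does not document the JSON schema or how messages are framed on the connection, so the Python sketch below starts from a single JSON document already received as a string, and the field names in the usage example are hypothetical.

```python
import json


def flatten_counters(node, prefix: str = "") -> dict:
    """Flatten nested JSON objects and arrays into {"a.b.c": number} pairs.

    Non-numeric leaves (strings, null, booleans) are ignored.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        is_number = isinstance(node, (int, float)) and not isinstance(node, bool)
        return {prefix: node} if is_number else {}
    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_counters(value, name))
    return flat


def counters_from_message(message: str) -> dict:
    """Parse one JSON document received from the monitor and flatten it."""
    return flatten_counters(json.loads(message))
```

For a hypothetical message such as {"stack": {"id": 1, "stats": {"rx_evs": 10}}}, the result is {"stack.id": 1, "stack.stats.rx_evs": 10}.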

F Solarflare sfnettest
F.1 Introduction
Solarflare sfnettest is a set of benchmark tools and test utilities supplied by Solarflare for benchmark and performance testing of network servers and network adapters. sfnettest is available in binary and source forms from:
http://www.openonload.org/
Download the sfnettest-<version>.tgz source file and unpack it using the tar command:
tar -zxvf sfnettest-<version>.tgz
Run the make utility from the sfnettest-<version>/src subdirectory to build the benchmark applications.
Refer to the README.sfnt-pingpong or README.sfnt-stream files in the distribution directory once sfnettest is installed.
sfnt-pingpong
Description
The sfnt-pingpong application measures TCP and UDP latency by creating a single socket between two servers and running a simple message pattern between them. The output identifies latency and statistics for increasing TCP/UDP packet sizes.
Usage
sfnt-pingpong [options] [<tcp|udp|pipe|unix_stream|unix_datagram> [<host[:port]>]]
Options
sfnt-pingpong options:
Option          Description
--port          server port
--sizes         single message size (bytes)
--connect       connect() UDP socket
--spin          spin on non-blocking recv()
--muxer         select, poll or epoll
--serv-muxer    none, select, poll or epoll (same as client by default)
--rtt           report round-trip-time
--raw           dump raw results to files
--percentile    percentile
--minmsg        minimum message size
--maxmsg        maximum message size
--minms         min time per msg size (ms)
--maxms         max time per msg size (ms)
--miniter       minimum iterations for result
--maxiter       maximum iterations for result
--mcast         use multicast addressing
--mcastintf     set the multicast interface. The client sends this parameter to the server.
                --mcastintf=eth2 both client and server use eth2
                --mcastintf='eth2;eth3' client uses eth2 and server uses eth3 (quotes are required for this format)
--mcastloop     IP_MULTICAST_LOOP
--bindtodev     SO_BINDTODEVICE
--forkboth      fork client and server
--n-pipe        include pipes in file descriptor set
--n-unix-d      include unix datagrams in the file descriptor set
--n-unix-s      include unix streams in the file descriptor set
--n-udp         include UDP sockets in file descriptor set
--n-tcpc        include TCP sockets in file descriptor set
--n-tcpl        include TCP listening sockets in file descriptor set
--tcp-serv      host:port for TCP connections
--timeout       socket SND/RECV timeout
--affinity      '<client-core>;<server-core>' Enclose values in quotes.
                This option should be set on the client side only. The client sends the <server-core> value to the server. The user must ensure that the identified server core is available on the server machine.
                This option will override any value set by taskset on the same command line.
--n-pings       number of ping messages
--n-pongs       number of pong messages
--nodelay       enable TCP_NODELAY
Standard options:
Option          Description
-? --help       this message
-q --quiet      quiet
-v --verbose    display more information
Examples
Example TCP latency command lines
[root@server]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[root@client]# onload --profile=latency taskset -c 1 ./sfnt-pingpong \
--maxms=10000 --affinity "1;1" tcp <server-ip>
Example UDP latency command lines
[root@server]# onload --profile=latency taskset -c 9 ./sfnt-pingpong
[root@client]# onload --profile=latency taskset -c 9 ./sfnt-pingpong \
--maxms=10000 --affinity "9;9" udp <server-ip>
Example output
# version: 1.4.0-modified
# src: 13b27e6b86132da11b727fbe552e2293
# date: Sat Apr 21 11:56:22 BST 2012
# uname: Linux server4.uk.level5networks.com 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
# cpu: model name : Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
# lspci: 05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
# lspci: 05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

# lspci: 83:00.0 Ethernet controller: Solarflare Communications SFC9020 [Solarstorm]
# lspci: 83:00.1 Ethernet controller: Solarflare Communications SFC9020 [Solarstorm]
# lspci: 85:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
# eth0: driver: igb
# eth0: version: 3.0.6-k
# eth0: bus-info: 0000:05:00.0
# eth1: driver: igb
# eth1: version: 3.0.6-k
# eth1: bus-info: 0000:05:00.1
# eth2: driver: sfc
# eth2: version: 3.2.1.6083
# eth2: bus-info: 0000:83:00.0
# eth3: driver: sfc
# eth3: version: 3.2.1.6083
# eth3: bus-info: 0000:83:00.1
# eth4: driver: e1000e
# eth4: version: 1.4.4-k
# eth4: bus-info: 0000:85:00.0
# virbr0: driver: bridge
# virbr0: version: 2.3
# virbr0: bus-info: N/A
# virbr0-nic: driver: tun
# virbr0-nic: version: 1.6
# virbr0-nic: bus-info: tap
# ram: MemTotal: 32959748 kB
# tsc_hz: 3099966880
# LD_PRELOAD=libonload.so
# server LD_PRELOAD=libonload.so
# onload_version=201205
# EF_TCP_FASTSTART_INIT=0
# EF_POLL_USEC=100000
# EF_TCP_FASTSTART_IDLE=0
#
# size  mean   min  median    max  %ile  stddev     iter
     1  2453  2380    2434  18288  2669      77  1000000
     2  2453  2379    2435  45109  2616      90  1000000
     4  2467  2380    2436  10502  2730      82  1000000
     8  2465  2383    2446   8798  2642      70  1000000
    16  2460  2380    2441   7494  2632      68  1000000
    32  2474  2399    2454   8758  2677      71  1000000
    64  2495  2419    2474  12174  2716      77  1000000
The output identifies the mean, minimum, median and maximum RTT/2 latency (in nanoseconds) for increasing message sizes, together with the 99th percentile and standard deviation for these results. For example, a message size of 32 bytes has a mean latency of approximately 2.5 microseconds (2474 ns) with a 99th percentile latency of less than 2.7 microseconds (2677 ns).

sfnt-stream
The sfnt-stream application measures RTT latency (not 1/2 RTT) for a fixed size message at increasing message rates. Latency is calculated from a sample of all messages sent. Message rates can be set with the --rates option, and the number of messages to sample with the --samples option.
Solarflare sfnt-stream only functions on UDP sockets. This limitation will be removed to support other protocols in the future.
Refer to the README.sfnt-stream file, which is part of the Onload distribution, for further information.
Usage
sfnt-stream [options] [tcp|udp|pipe|unix_stream|unix_datagram [host[:port]]]
Options
sfnt-stream options:
Option          Description
--msgsize       message size (bytes)
--rates         msg rates <min>-<max>[+<step>]
--millisec      time per test (milliseconds)
--samples       number of samples per test
--stop          stop when the TX rate achieved is below the given percentage of the target rate
--maxburst      maximum burst length
--port          server port number
--connect       connect() UDP socket
--spin          spin on non-blocking recv()
--muxer         select, poll, epoll or none
--rtt           report round-trip-time
--raw           dump raw results to file
--percentile    percentile
--mcast         set the multicast address

--mcastintf     set the multicast interface. The client sends this parameter to the server.
                --mcastintf=eth2 both client and server use eth2
                --mcastintf='eth2;eth3' client uses eth2 and server uses eth3 (quotes are required for this format)
--mcastloop     IP_MULTICAST_LOOP
--ttl           IP_TTL and IP_MULTICAST_TTL
--bindtodevice  SO_BINDTODEVICE
--n-pipe        include pipes in file descriptor set
--n-unix-d      include unix datagram in file descriptor set
--n-unix-s      include unix stream in file descriptor set
--n-udp         include UDP sockets in file descriptor set
--n-tcpc        include TCP sockets in file descriptor set
--n-tcpl        include TCP listening sockets in file descriptor set
--tcpc-serv     host:port for TCP connections
--nodelay       enable TCP_NODELAY
--affinity      "<client-tx>,<client-rx>;<server-core>" Enclose the values in double quotes, e.g. "4,5;3".
                This option should be set on the client side only. The client sends the <server-core> value to the server. The user must ensure that the identified server core is available on the server machine.
                This option will override any value set by taskset on the same command line.
--rtt-iter      iterations for RTT measurement
Standard options:
Option          Description
-? --help       this message
-q --quiet      quiet
-v --verbose    display more information
--version       display version information
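The --stop option ends a run when the achieved TX rate falls below a given percentage of the target rate. The Python sketch below restates that rule so it can be applied to recorded "mps target" and "mps send" results. It mirrors only the option description above; whether sfnt-stream compares exactly these two values, and whether it stops at the first failing rate, are assumptions made here.

```python
def below_stop_threshold(target_mps: float, achieved_mps: float, stop_percent: float) -> bool:
    """True if the achieved TX rate is below stop_percent % of the target rate."""
    if target_mps <= 0:
        raise ValueError("target rate must be positive")
    return achieved_mps < target_mps * stop_percent / 100.0


def rates_completed(results, stop_percent: float) -> list:
    """Target rates tested before the first one that misses the threshold.

    `results` is an iterable of (target_mps, achieved_mps) pairs in test order.
    """
    completed = []
    for target, achieved in results:
        if below_stop_threshold(target, achieved, stop_percent):
            break
        completed.append(target)
    return completed
```

For example, with a 90% threshold a target of 300000 msg/s that achieves only 240000 msg/s falls below 270000 and ends the sequence.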
Examples
Example command lines, client/server:
# ./sfnt-stream (server)
# ./sfnt-stream --affinity 1,1 udp <server-ip> (client)
# taskset -c 1 ./sfnt-stream --affinity="3,5;3" --mcastintf=eth4 udp \
<remote-ip> (client)
Bonded Interfaces: sfnt-stream
The following example configures a single bond, having two slave interfaces, on each machine. Both client and server machines use eth4 and eth5.
Client Configuration:
[root@client src]# ifconfig eth4 0.0.0.0 down
[root@client src]# ifconfig eth5 0.0.0.0 down
[root@client src]# modprobe bonding miimon=100 mode=1 xmit_hash_policy=layer2 primary=eth5
[root@client src]# ifconfig bond0 up
[root@client src]# echo +eth4 > /sys/class/net/bond0/bonding/slaves
[root@client src]# echo +eth5 > /sys/class/net/bond0/bonding/slaves
[root@client src]# ifconfig bond0 172.16.136.27/21
[root@client src]# onload --profile=latency taskset -c 3 ./sfnt-stream
sfnt-stream: server: waiting for client to connect...
sfnt-stream: server: client connected
sfnt-stream: server: client 0 at 172.16.136.28:45037
Server Configuration:
[root@server src]# ifconfig eth4 0.0.0.0 down
[root@server src]# ifconfig eth5 0.0.0.0 down
[root@server src]# modprobe bonding miimon=100 mode=1 xmit_hash_policy=layer2 primary=eth5
[root@server src]# ifconfig bond0 up
[root@server src]# echo +eth4 > /sys/class/net/bond0/bonding/slaves
[root@server src]# echo +eth5 > /sys/class/net/bond0/bonding/slaves
[root@server src]# ifconfig bond0 172.16.136.28/21
NOTE: the server sends to the IP address of the client bond.
[root@server src]# onload --profile=latency taskset -c 1 ./sfnt-stream --mcastintf=bond0 \
--affinity "1,1;3" udp 172.16.136.27
Output Fields
All time measurements are in nanoseconds unless otherwise stated.
Field             Description
mps target        Msg per sec target rate
mps send          Msg per sec actual rate
mps recv          Msg receive rate
latency mean      RTT mean latency
latency min       RTT minimum latency
latency median    RTT median latency
latency max       RTT maximum latency
latency %ile      RTT 99%ile
latency stddev    Standard deviation of sample
latency samples   Number of messages used to calculate the latency measurement
sendjit mean      Mean variance when sending messages
sendjit min       Minimum variance when sending messages
sendjit max       Maximum variance when sending messages
sendjit behind    Number of times the sender falls behind and is unable to keep up with the transmit rate
gaps n_gaps       Count of the number of gaps appearing in the stream
gaps n_drops      Count of the number of drops from the stream
gaps n_ooo        Count of the number of sequence numbers received out of order
Latency Profile - Spinning
Both sfnt-pingpong and sfnt-stream use scripts found in the onload_apps subdirectory which invoke the onload latency profile, thereby causing the application to 'spin'.
To run these test programs in an interrupt driven mode, replace the --profile=latency option on the command line with the --no-app-handler option.

Gonload_tcpdump
G.1Introduction
Bydefinition,Onloadisakernelbypasstechnologyandthispreventspacketsfrom
beingcapturedbypacketsniffingapplicationssuchastcpdump,netstatand
wireshark.
Onloadsupportstheonload_tcpdumpapplicationthatsupportspacketcapture
fromonloadstackstoafileortobedisplayedonstandardout(stdout).Packet
capturefilesproducedbyonload_tcpdumpcanthenbeimportedtotheregular
tcpdump,wiresharkorotherthirdpartyapplicationwhereuserscantakeadvantage
ofdedicatedsearchandanalysisfeatures.
Onload_tcpdumpallowsforthecaptureofallTCPandUDPunicastandmulticast
datasentorreceivedviaOnloadstacks‐includingsharedstacks.
G.2Buildingonload_tcpdump
Theonload_tcpdumpscriptissuppliedwiththeOnloaddistributionandislocated
intheOnload‐<version>/scriptssub‐directory.
NOTE:libpcapandlibpcap‐develmustbebuiltandinstalledbeforeOnloadis
installed.
G.3 Using onload_tcpdump
For help, use the ./onload_tcpdump -h command:
Usage:
onload_tcpdump [-o stack-(id|name) [-o stack ...]]
tcpdump_options_and_parameters
"man tcpdump" for details on tcpdump parameters.
You may use stack id number or shell-like pattern for the stack name
to specify the Onload stacks to listen on.
If you do not specify stacks, onload_tcpdump will monitor all onload
stacks.
If you do not specify interface via -i option, onload_tcpdump
listens on ALL interfaces instead of the first one.
For further information refer to the Linux man tcpdump pages.
Examples
• Capture all accelerated traffic from eth2 to a file called mycaps.pcap:
# onload_tcpdump -i eth2 -w mycaps.pcap
• If no file is specified, onload_tcpdump will direct output to stdout:
# onload_tcpdump -i eth2
• To capture accelerated traffic for a specific Onload stack (by name):
# onload_tcpdump -i eth4 -o stackname
• To capture accelerated traffic for a specific Onload stack (by ID):
# onload_tcpdump -o 7
• To capture accelerated traffic for Onload stacks where the name begins with “abc”:
# onload_tcpdump -o 'abc*'
• To capture accelerated traffic for Onload stack 1, the stack named “stack2” and all Onload stacks with names beginning with “ab”:
# onload_tcpdump -o 1 -o 'stack2' -o 'ab*'
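The -o argument accepts either a stack ID or a shell-like pattern for the stack name. Below is a minimal sketch of that selection logic. It assumes that a purely numeric argument is an ID and that anything else is a name pattern matched with POSIX fnmatch(). The helper stack_selected() is hypothetical and is not taken from the onload_tcpdump source.

```c
#include <fnmatch.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the stack with the given id and name is
 * selected by any of the -o arguments, 0 otherwise. */
static int stack_selected(int id, const char *name,
                          const char *const *args, int nargs)
{
    if (nargs == 0)
        return 1;                         /* no -o given: monitor all stacks */
    for (int i = 0; i < nargs; i++) {
        char *end;
        long v = strtol(args[i], &end, 10);
        if (end != args[i] && *end == '\0') {
            if (v == id)                  /* numeric argument: stack ID */
                return 1;
        } else if (fnmatch(args[i], name, 0) == 0) {
            return 1;                     /* otherwise: shell-like name pattern */
        }
    }
    return 0;
}
```

With the arguments from the last example ("1", "stack2", "ab*"), stack 1, a stack named "stack2" and a stack named "abc" are selected, while a stack named "xyz" with any other ID is not.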
Dependencies
The onload_tcpdump application requires libpcap and libpcap-devel to be installed on the server. If libpcap is not installed, the following message is reported when onload_tcpdump is invoked:
./onload_tcpdump
ci Onload was compiled without libpcap development package installed. You
need to install libpcap-devel or libpcap-dev package to run
onload_tcpdump.
tcpdump: truncated dump file; tried to read 24 file header bytes, only got 0
Hangup
If libpcap is missing, it can be downloaded from http://www.tcpdump.org/. Untar the compressed file on the target server and follow the build instructions in the INSTALL.txt file. The libpcap package must be installed before Onload is built and installed.
Limitations
• Currently onload_tcpdump captures only packets from Onload stacks and not from kernel stacks.
• The onload_tcpdump application monitors stack creation events and will attach to newly created stacks. However, there is a short period (normally only a few milliseconds) between stack creation and attachment during which packets sent or received will not be captured.
Known Issues
Users may notice that packets sent when the destination address is not in the host ARP table appear in both onload_tcpdump and (Linux) tcpdump.

SolarCapture
Solarflare’s SolarCapture is a packet capture application for Solarflare network adapters. It is able to capture received packets from the wire at line rate, assigning accurate timestamps to each packet. Packets are captured to a PCAP file or forwarded to user-supplied logic for processing. For details see the SolarCapture User Guide (SF-108469-CD) available from https://support.solarflare.com/.

H ef_vi
The Solarflare ef_vi API is a layer 2 API that grants an application direct access to the Solarflare network adapter datapath to deliver lower latency and reduced per-message processing overheads. ef_vi is the internal API used by Onload for sending and receiving packets. It can be used directly by applications that want the very lowest latency send and receive API and that do not require a POSIX socket interface.
• ef_vi is packaged with the Onload distribution.
• ef_vi is an OSI level 2 interface which sends and receives raw Ethernet frames.
• ef_vi supports a zero-copy interface because the user process has direct access to memory buffers used by the hardware to receive and transmit data.
• An application can use both ef_vi and Onload at the same time. For example, use ef_vi to receive UDP market data and Onload sockets for TCP connections for trading.
• The ef_vi API can deliver lower latency than Onload and incurs reduced per-message overheads.
• ef_vi is free software distributed under an LGPL license.
• The user application wishing to use the layer 2 ef_vi API must implement the higher layer protocols.
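Because ef_vi sends and receives raw Ethernet frames, the application has to build the protocol headers itself. The C sketch below does not use any ef_vi calls; it only shows what implementing the higher layers means for a simple IPv4/UDP packet written into a transmit buffer. The helper build_udp_frame() is hypothetical, and leaving the UDP checksum at zero (permitted for UDP over IPv4) is a simplifying assumption. With a 28-byte payload it produces a 70-byte frame (14 + 20 + 8 + 28), which is consistent with the "udp payload len: 28" and "frame len: 70" values printed by efpio in Appendix J.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14
#define IP4_HDR_LEN 20
#define UDP_HDR_LEN 8

/* Standard Internet checksum (one's complement sum of 16-bit words). */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);            /* network byte order: high byte first */
    p[1] = (uint8_t)(v & 0xff);
}

/* Writes an Ethernet + IPv4 + UDP frame into buf and returns its length, or 0
 * if buf is too small. MAC and IP addresses are raw network-order bytes. */
static size_t build_udp_frame(uint8_t *buf, size_t buflen,
                              const uint8_t dst_mac[6], const uint8_t src_mac[6],
                              const uint8_t src_ip[4], const uint8_t dst_ip[4],
                              uint16_t src_port, uint16_t dst_port,
                              const void *payload, uint16_t paylen)
{
    size_t frame_len = ETH_HDR_LEN + IP4_HDR_LEN + UDP_HDR_LEN + (size_t)paylen;
    if (buflen < frame_len)
        return 0;
    uint8_t *eth = buf, *ip = buf + ETH_HDR_LEN, *udp = ip + IP4_HDR_LEN;

    memcpy(eth, dst_mac, 6);
    memcpy(eth + 6, src_mac, 6);
    put16(eth + 12, 0x0800);             /* EtherType: IPv4 */

    memset(ip, 0, IP4_HDR_LEN);
    ip[0] = 0x45;                        /* version 4, header length 5 words */
    put16(ip + 2, (uint16_t)(IP4_HDR_LEN + UDP_HDR_LEN + paylen));
    ip[8] = 64;                          /* TTL */
    ip[9] = 17;                          /* protocol: UDP */
    memcpy(ip + 12, src_ip, 4);
    memcpy(ip + 16, dst_ip, 4);
    put16(ip + 10, ip_checksum(ip, IP4_HDR_LEN));  /* computed with field zeroed */

    put16(udp, src_port);
    put16(udp + 2, dst_port);
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + paylen));
    put16(udp + 6, 0);                   /* UDP checksum left unset */
    memcpy(udp + UDP_HDR_LEN, payload, paylen);
    return frame_len;
}
```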
H.1 Components
All components required to build and link a user application with the Solarflare ef_vi API are distributed with Onload. When Onload is installed, all required directories/files are located under the Onload distribution directory.
H.2 Compiling and Linking
Refer to the README.ef_vi file in the Onload directory for compile and link instructions.

H.3 Documentation
The ef_vi documentation is distributed in doxygen format with the Onload distribution. Documents in HTML and RTF format are generated by running doxygen in the following directory:
cd openonload-<version>/src/include/etherfabric/doxygen
doxygen doxyfile_ef_vi
Documents are generated in the HTML and RTF sub-directories.
The ef_vi user guide is also available in PDF format (SF-114063-CD) from the Solarflare download site.

I onload_iptables
I.1 Description
The Linux netfilter iptables feature provides filtering based on user-configurable rules, with the aim of managing access to network devices and preventing unauthorized or malicious passage of network traffic. Packets delivered to an application via the Onload accelerated path are not visible to the OS kernel and, as a result, these packets are not visible to the kernel firewall (iptables).
The onload_iptables feature allows the user to configure rules which determine which hardware filters Onload is permitted to insert on the adapter, and therefore which connections and sockets can bypass the kernel and, as a consequence, bypass iptables.
The onload_iptables command can convert a snapshot¹ copy of the kernel iptables rules into Onload firewall rules, which are used to determine whether sockets created by an Onloaded process are retained by Onload or handed off to the kernel network stack.
Additionally, user-defined filter rules can be added to the Onload firewall on a per-interface basis. The Onload firewall applies to the receive filter path only.
I.2 How it works
Before Onload accelerates a socket, it first checks the Onload firewall module. If the firewall module indicates that acceleration of the socket would violate a firewall rule, the acceleration request is denied and the socket is handed off to the kernel. Network traffic sent or received on that socket is not accelerated.
Onload firewall rules are parsed in ascending numerical order. The first rule to match the newly created socket (which may indicate either to accelerate or to decelerate the socket) is selected, and no further rules are parsed.
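The first-match evaluation described above can be sketched in C as follows. The structures and function names are hypothetical and model only the IP address rule form; two further assumptions are labeled in the comments: that protocol=ip matches both TCP and UDP sockets, and that the caller decides what to do when no rule matches (this section does not state the default).

```c
#include <stddef.h>
#include <stdint.h>

enum fw_action { FW_NONE, FW_ACCELERATE, FW_DECELERATE };
enum fw_proto { FW_PROTO_IP, FW_PROTO_TCP, FW_PROTO_UDP };

/* Hypothetical in-memory form of one IP address rule; addresses and masks
 * are in host byte order, port ranges are inclusive. */
struct fw_rule {
    enum fw_proto protocol;
    uint32_t local_ip, local_mask, remote_ip, remote_mask;
    uint16_t local_port_lo, local_port_hi, remote_port_lo, remote_port_hi;
    enum fw_action action;
};

/* The socket being considered for acceleration. */
struct fw_sock {
    enum fw_proto protocol;                 /* FW_PROTO_TCP or FW_PROTO_UDP */
    uint32_t local_ip, remote_ip;
    uint16_t local_port, remote_port;
};

static int fw_rule_matches(const struct fw_rule *r, const struct fw_sock *s)
{
    /* assumption: protocol=ip matches TCP and UDP sockets alike */
    if (r->protocol != FW_PROTO_IP && r->protocol != s->protocol)
        return 0;
    if ((s->local_ip & r->local_mask) != (r->local_ip & r->local_mask) ||
        (s->remote_ip & r->remote_mask) != (r->remote_ip & r->remote_mask))
        return 0;
    return s->local_port >= r->local_port_lo && s->local_port <= r->local_port_hi &&
           s->remote_port >= r->remote_port_lo && s->remote_port <= r->remote_port_hi;
}

/* rules[] is ordered by ascending rule number; the first match decides and no
 * further rules are examined. FW_NONE means that no rule matched. */
static enum fw_action fw_lookup(const struct fw_rule *rules, size_t n,
                                const struct fw_sock *s)
{
    for (size_t i = 0; i < n; i++)
        if (fw_rule_matches(&rules[i], s))
            return rules[i].action;
    return FW_NONE;
}
```

Because evaluation stops at the first match, a narrow DECELERATE rule placed before a broad ACCELERATE rule takes precedence for the sockets it covers.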
If the Onload firewall rules are an exact copy of the kernel iptables, i.e. with no additional rules added by the Onload user, then a socket handed off to the kernel because of an iptables rule violation will be unable to receive data through either path.
Changing rules using onload_iptables will not interrupt existing network connections.
NOTE: Onload firewall rules will not persist over network driver restarts.
1. Subsequent changes to kernel iptables will not be reflected in the Onload firewall.

NOTE: The onload_iptables “IP rules” will only block hardware IP filters from being inserted, and onload_iptables “MAC rules” will only block hardware MAC filters from being inserted. Therefore, if a rule is inserted to block a MAC address, the user may still be able to accept traffic from the specified host through Onload inserting an appropriate IP hardware filter.
Files
When the Onload drivers are loaded, firewall rules exist in the Linux proc pseudo file system at:
/proc/driver/sfc_resource
Within this directory the firewall_add, firewall_del and resources files will be present. These files are writable only by a root user. No attempt should be made to remove these files.
Once rules have been created for a particular interface, and only while these rules exist, a separate directory exists which contains the current firewall rules for the interface:
/proc/driver/sfc_resource/ethN/firewall_rules
I.3 Features
To get help:
# onload_iptables -h
I.4 Rules
The general format of an IP address rule is:
[rule=n] if=ethN protocol=(ip|tcp|udp) [local_ip=a.b.c.d[/mask]]
[remote_ip=a.b.c.d[/mask]] [local_port=a[-b]] [remote_port=a[-b]] [vlan=n]
action=(ACCELERATE|DECELERATE)
NOTE: Using the IP address rule form, the vlan identifier is effective only when using a Solarflare SFN7000 series adapter which is configured to use the full-featured firmware variant. On other Solarflare adapters the vlan identifier is ignored. The vlan identifier can only be specified with the vlan=n syntax and not on the interface.
The general format of a MAC address rule is:
[rule=n] if=ethN protocol=eth mac=xx:xx:xx:xx:xx:xx[/FF:FF:FF:FF:FF:FF]
[vlan=n] action=(ACCELERATE|DECELERATE)
NOTE: Using the MAC address rule form, the vlan identifier is effective when specified for any Solarflare adapter.
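The local_ip and remote_ip values accept the three address/mask formats listed for the "Invalid local_ip rule" error in Table 9: a.b.c.d, a.b.c.d/n and a.b.c.d/e.f.g.h. A minimal C sketch of validating such a value before writing a rule is shown below. The function parse_ip_mask() is hypothetical, and treating a bare address as a /32 host match is an assumption.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Parses "a.b.c.d", "a.b.c.d/n" or "a.b.c.d/e.f.g.h" into an address and
 * netmask in host byte order. Returns 0 on success, -1 otherwise. */
static int parse_ip_mask(const char *s, uint32_t *addr, uint32_t *mask)
{
    char buf[40];
    struct in_addr a;
    if (strlen(s) >= sizeof buf)
        return -1;
    strcpy(buf, s);

    char *slash = strchr(buf, '/');
    if (slash)
        *slash++ = '\0';                     /* split address and mask */
    if (inet_pton(AF_INET, buf, &a) != 1)    /* each octet must be 0-255 */
        return -1;
    *addr = ntohl(a.s_addr);

    if (!slash) {
        *mask = 0xffffffffu;                 /* assumption: bare address = /32 */
    } else if (strchr(slash, '.')) {
        struct in_addr m;                    /* dotted mask e.f.g.h */
        if (inet_pton(AF_INET, slash, &m) != 1)
            return -1;
        *mask = ntohl(m.s_addr);
    } else {
        char *end;                           /* prefix length n, 0-32 */
        long n = strtol(slash, &end, 10);
        if (end == slash || *end != '\0' || n < 0 || n > 32)
            return -1;
        *mask = n == 0 ? 0 : 0xffffffffu << (32 - n);
    }
    return 0;
}
```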

I.5 Preview firewall rules
Before creating the Onload firewall, run onload_iptables with the -v option to identify which rules will be adopted by the firewall and which will be rejected (a reason is given for rejection):
# onload_iptables -v
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5201
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=5201-5201
remote_ip=0.0.0.0/0 remote_port=0-65535 action=DECELERATE
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5201
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=5201-5201
remote_ip=0.0.0.0/0 remote_port=0-65535 action=DECELERATE
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpts:80:88
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=80-88
remote_ip=0.0.0.0/0 remote_port=0-65535 action=
tcp -- 0.0.0.0/0 0.0.0.0/0 tcp spt:800
=> Error parsing: Insuffcient arguments in rule.
The last rule is rejected because the action is missing.
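The preview output shows an iptables "dpt:5201" match becoming local_port=5201-5201 and "dpts:80:88" becoming local_port=80-88. The hypothetical C helper below mimics that port conversion only; it is not the onload_iptables implementation. Mapping "spt"/"spts" (source port) to remote_port is an assumption based on the firewall applying to the receive path, where the packet's source is the remote end.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Matches "<prefix>:<n>" (two == 0) or "<prefix>:<lo>:<hi>" (two == 1)
 * exactly, storing the inclusive port range in *lo and *hi. */
static int match_ports(const char *s, const char *prefix, int two,
                       unsigned *lo, unsigned *hi)
{
    size_t plen = strlen(prefix);
    int end = -1;
    if (strncmp(s, prefix, plen) != 0 || s[plen] != ':')
        return 0;
    s += plen + 1;
    if (two) {
        if (sscanf(s, "%u:%u%n", lo, hi, &end) != 2)
            return 0;
    } else {
        if (sscanf(s, "%u%n", lo, &end) != 1)
            return 0;
        *hi = *lo;                           /* single port: lo-lo */
    }
    /* reject trailing text, reversed ranges and out-of-range ports */
    return end >= 0 && s[end] == '\0' && *lo <= *hi && *hi <= 65535;
}

/* Converts an iptables port match into an Onload rule term:
 * dpt/dpts -> local_port, spt/spts -> remote_port (assumed). */
static int convert_port_match(const char *m, char *out, size_t len)
{
    unsigned lo, hi;
    const char *key;
    if (match_ports(m, "dpts", 1, &lo, &hi) || match_ports(m, "dpt", 0, &lo, &hi))
        key = "local_port";
    else if (match_ports(m, "spts", 1, &lo, &hi) || match_ports(m, "spt", 0, &lo, &hi))
        key = "remote_port";
    else
        return -1;
    int n = snprintf(out, len, "%s=%u-%u", key, lo, hi);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```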
NOTE: The -v option does not create firewall rules for any Solarflare interface, but allows the user to preview which Linux iptables rules will be accepted and which will be rejected by Onload.
To convert Linux iptables to Onload firewall rules
The Linux iptables can be applied to all or to individual Solarflare interfaces. Onload iptables are only applied to the receive filter path. The user can select the INPUT CHAIN or a user-defined CHAIN to parse from the iptables. The default CHAIN is INPUT. To adopt the rules from iptables even though some rules will be rejected, enter the following command, identifying the Solarflare interface the rules should be applied to:
# onload_iptables -i ethN -c
or, to apply the rules to all Solarflare interfaces:
# onload_iptables -a -c
Running the onload_iptables command with the -i (interface) or -a (all interfaces) option will overwrite existing rules in the Onload firewall.
NOTE: Applying the Linux iptables to a Solarflare interface is optional. The alternatives are to create user-defined firewall rules per interface, or not to apply any firewall rules per interface (the default behavior).
NOTE: onload_iptables will import all rules to the identified interface, even rules specified on another interface. To avoid importing rules specified on ‘other’ interfaces, use the --use-extended option.

To view rules for a specific interface:
When firewall rules exist for a Solarflare interface, and only while they exist, a directory for the interface will be created in:
/proc/driver/sfc_resource
Rules for a specific interface will be found in the firewall_rules file, e.g.:
cat /proc/driver/sfc_resource/eth3/firewall_rules
if=eth3 rule=0 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 remote_port=0-65535 action=DECELERATE
if=eth3 rule=1 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 remote_port=0-65535 action=DECELERATE
if=eth3 rule=2 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 remote_port=72-72 action=DECELERATE
if=eth3 rule=3 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=80-88 remote_port=0-65535 action=DECELERATE
To add a rule for a selected interface:
echo "rule=4 if=eth3 action=ACCELERATE protocol=udp local_port=7330-7340" \
> /proc/driver/sfc_resource/firewall_add
Rules can be inserted into any position in the table, and existing rule numbers will be adjusted to accommodate new rules. If a rule number is not specified, the rule will be appended to the existing rule list.
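Applications that manage rules programmatically may prefer to build the rule line in code before writing it to firewall_add. The hypothetical helper format_port_rule() below produces an IP address rule following the general format in I.4, omitting the optional rule=n prefix when rule is negative so that the rule is appended. It assumes that, as in the echo example above, keyword order within a rule is not significant.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats one IP address rule line (with trailing newline, as echo would
 * produce). Returns the line length, or -1 if it does not fit in buf. */
static int format_port_rule(char *buf, size_t len, int rule, const char *ifname,
                            const char *protocol, unsigned port_lo,
                            unsigned port_hi, int accelerate)
{
    char prefix[24] = "";
    if (rule >= 0)                       /* negative rule: append to the list */
        snprintf(prefix, sizeof prefix, "rule=%d ", rule);
    int n = snprintf(buf, len, "%sif=%s protocol=%s local_port=%u-%u action=%s\n",
                     prefix, ifname, protocol, port_lo, port_hi,
                     accelerate ? "ACCELERATE" : "DECELERATE");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, format_port_rule(line, sizeof line, 4, "eth3", "udp", 7330, 7340, 1) produces the same rule as the echo command above, with the keywords in the order of the general format.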
NOTE: Errors resulting from the add/delete commands will be displayed in dmesg.
To delete a rule from a selected interface:
To delete a single rule:
# echo "if=eth3 rule=2" > /proc/driver/sfc_resource/firewall_del
To delete all rules:
# echo "eth2 all" > /proc/driver/sfc_resource/firewall_del
When the last rule for an interface has been deleted, the interface firewall_rules file is removed from /proc/driver/sfc_resource. The interface directory will be removed only when completely empty.
Error Checking
The onload_iptables command does not log errors to stdout. Errors arising from add or delete commands will be logged in dmesg.
Interface & Port
Onload firewall rules are bound to an interface and not to a physical adapter port. It is possible to create rules for an interface in a configured/down state.

Virtual/Bonded Interface
On virtual or bonded interfaces, firewall rules are only applied and enforced on the ‘real’ interface.
I.6 Error Messages
Error messages relating to onload_iptables operations will appear in dmesg.
Table 9: Error messages for onload_iptables
Error Message                                Description
Internal error                               Internal condition - should not happen.
Unsupported rule                             Internal condition - should not happen.
Out of memory allocating new rule            Memory allocation error.
Seen multiple rule numbers                   Only a single rule number can be specified when adding/deleting rules.
Seen multiple interfaces                     Only a single interface can be specified when adding/deleting rules.
Unable to understand action                  The action specified when adding a rule is not supported. Note that there should be no spaces, i.e. action=ACCELERATE.
Unable to understand protocol                Non-supported protocol.
Unable to understand remainder of the rule   Non-supported parameters/syntax.
Failed to understand interface               The interface does not exist. Rules can be added to an interface that does not yet exist, but cannot be deleted from a non-existent interface.
Failed to remove rule                        The rule does not exist.
Error removing table                         Internal condition - should not happen.
Invalid local_ip rule                        Invalid address/mask format. Supported formats:
                                                 a.b.c.d
                                                 a.b.c.d/n
                                                 a.b.c.d/e.f.g.h
                                             where a, b, c, d, e, f, g, h are in the decimal range 0-255 and n is in the decimal range 0-32.
Invalid remote_ip rule                       Invalid address/mask format.
Invalid rule                                 A rule must identify at least an interface, a protocol, an action and at least one match criterion.
Invalid mac                                  Invalid MAC address/mask format. Supported formats:
                                                 xx:xx:xx:xx:xx:xx
                                                 xx:xx:xx:xx:xx:xx/xx:xx:xx:xx:xx:xx
                                             where x is a hex digit.

NOTE: A Linux limitation applicable to the /proc/ file system restricts a write operation to 1024 bytes. When writing to the /proc/driver/sfc_resource/firewall_[add|del] files, the user is advised to flush the write between lines so that no single write exceeds the 1024-byte limit.
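One way to respect the 1024-byte /proc write limit from a program is to issue a separate write() for each rule line rather than writing a whole rule set at once. The C sketch below does this for a file such as /proc/driver/sfc_resource/firewall_add or firewall_del. The helper write_rules() is hypothetical, and the assumption that one write() per rule line satisfies the flushing advice above is not confirmed by this guide; errors in the rules themselves are still reported only in dmesg.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PROC_WRITE_MAX 1024

/* Writes each rule line to path with its own write() call, so that no single
 * write exceeds PROC_WRITE_MAX bytes. Returns the number of lines written,
 * or -1 on error with errno set. */
static int write_rules(const char *path, const char *const *lines, int nlines)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    for (int i = 0; i < nlines; i++) {
        size_t len = strlen(lines[i]);
        if (len > PROC_WRITE_MAX) {          /* line alone is too long */
            close(fd);
            errno = EMSGSIZE;
            return -1;
        }
        ssize_t w = write(fd, lines[i], len);
        if (w < 0 || (size_t)w != len) {
            int saved = (w < 0) ? errno : EIO;   /* keep errno across close() */
            close(fd);
            errno = saved;
            return -1;
        }
    }
    close(fd);
    return nlines;
}
```

For example, write_rules("/proc/driver/sfc_resource/firewall_del", lines, 2) with the lines "if=eth3 rule=2\n" and "eth2 all\n" would issue the two delete commands shown earlier as two separate writes.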

J Solarflare efpio Test Application
The openonload distribution includes the command-line efpio test application to measure the latency of the Solarflare ef_vi layer 2 API with PIO. The efpio application is a single-threaded ping/pong. When all iterations are complete, the client side will display the round-trip time.
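A single-threaded ping/pong client typically times a fixed number of back-to-back exchanges and divides the elapsed time by the iteration count. The C sketch below shows that measurement pattern using CLOCK_MONOTONIC. It is not the efpio source: the exchange callback stands in for one ping/pong (send, then wait for the reply), and the assumption that efpio reports a mean over all iterations is not stated in this guide. count_exchange() is a placeholder used only to exercise the timer.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical callback performing one ping/pong exchange. */
typedef void (*exchange_fn)(void *ctx);

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Runs `iterations` exchanges back to back and returns the mean round-trip
 * time in microseconds (0.0 if iterations is 0). */
static double mean_rtt_us(exchange_fn exchange, void *ctx, unsigned iterations)
{
    if (iterations == 0)
        return 0.0;
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iterations; i++)
        exchange(ctx);
    return (double)(now_ns() - start) / iterations / 1000.0;
}

/* Placeholder exchange for illustration: only counts how often it is called. */
static void count_exchange(void *ctx)
{
    ++*(unsigned *)ctx;
}
```

In a real client the callback would send the preloaded or freshly copied packet and busy-wait (or sleep, with -w) until the reply arrives.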
By default, efpio downloads a packet to the adapter at start of day and transmits this same packet on every iteration of the test. The -c option can be used to test the latency of ef_vi using PIO to transfer a new transmit packet to the adapter on every iteration.
With the Onload distribution installed, efpio will be present in the following directory:
~/openonload-201310/build/gnu_x86_64/tests/ef_vi
J.1 efpio
./efpio --help
usage:
efpio [options] <ping|pong> <interface>
<local-ip-intf> <local-port>
<remote-mac> <remote-ip-intf> <remote-port>
options:
-n <iterations>    - set number of iterations
-s <message-size>  - set udp payload size
-w                 - sleep instead of busy wait
-v                 - use a VF
-p                 - physical address mode
-t                 - disable TX push
-c                 - copy on critical path
Table 10: efpio Options
Parameter         Description
interface         the local interface to use, e.g. eth2
local-ip-intf     local interface IP address/hostname
local-port        local interface IP port number to use
remote-mac        MAC address of the remote interface
remote-ip-intf    remote server IP address/hostname
remote-port       remote server port number

To run efpio
efpio must be started on the server (pong side) before the client (ping side) is run. Command line examples are shown below.
1. On the server side (server1):
taskset -c <M> ./efpio pong eth<N> <local-ip> 8001 <server2-mac> <server2-ip> 8001
# ef_vi_version_str: 201306-7122preview2
# udp payload len: 28
# iterations: 100000
# frame len: 70
2. On the client side (server2):
taskset -c <M> ./efpio ping eth<N> <local-ip> 8001 <server1-mac> <server1-ip> 8001
# ef_vi_version_str: 201306-7122preview2
# udp payload len: 28
# iterations: 100000
# frame len: 70
round-trip time: 2.848 µs
where M = CPU core and N = Solarflare adapter interface.