OpenOnload Onload User Guide (2015) SF 104474 CD 20 Issue


Issue 20 © Solarflare Communications 2015
Onload User Guide
Copyright © 2015 SOLARFLARE Communications, Inc. All rights reserved.
The software and hardware as applicable (the “Product”) described in this document, and this document, are protected by
copyright laws, patents and other intellectual property laws and international treaties. The Product described in this document is
provided pursuant to a license agreement, evaluation agreement and/or non-disclosure agreement. The Product may be used only
in accordance with the terms of such agreement. The software as applicable may be copied only in accordance with the terms of
such agreement.
Onload is licensed under the GNU General Public License (Version 2, June 1991). See the LICENSE file in the distribution for details.
The Onload Extensions Stub Library is Copyright licensed under the BSD 2-Clause License.
Onload contains algorithms and uses hardware interface techniques which are subject to Solarflare Communications Inc patent
applications. Parties interested in licensing Solarflare's IP are encouraged to contact Solarflare's Intellectual Property Licensing
Group at:
Director of Intellectual Property Licensing
Intellectual Property Licensing Group
Solarflare Communications Inc,
7505 Irvine Center Drive
Suite 100
Irvine, California 92618
You will not disclose to a third party the results of any performance tests carried out using Onload or EnterpriseOnload without
the prior written consent of Solarflare.
The furnishing of this document to you does not give you any rights or licenses, express or implied, by estoppel or otherwise, with
respect to any such Product, or any copyrights, patents or other intellectual property rights covering such Product, and this
document does not contain or represent any commitment of any kind on the part of SOLARFLARE Communications, Inc. or its
affiliates.
The only warranties granted by SOLARFLARE Communications, Inc. or its affiliates in connection with the Product described in this
document are those expressly set forth in the license agreement, evaluation agreement and/or non-disclosure agreement
pursuant to which the Product is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, NEITHER SOLARFLARE
COMMUNICATIONS, INC. NOR ITS AFFILIATES MAKE ANY REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED)
REGARDING THE PRODUCT OR THIS DOCUMENTATION AND HEREBY DISCLAIM ALL IMPLIED WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT, AND ANY WARRANTIES THAT MAY ARISE FROM COURSE OF
DEALING, COURSE OF PERFORMANCE OR USAGE OF TRADE. Unless otherwise expressly set forth in such agreement, to the extent
allowed by applicable law (a) in no event shall SOLARFLARE Communications, Inc. or its affiliates have any liability under any legal
theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or
consequential damages, even if advised of the possibility of such damages; and (b) the total liability of SOLARFLARE
Communications, Inc. or its affiliates arising from or relating to such agreement or the use of this document shall not exceed the
amount received by SOLARFLARE Communications, Inc. or its affiliates for that copy of the Product or this document which is the
subject of such liability.
The Product is not intended for use in medical, life-saving, life-sustaining, critical control or safety systems, or in nuclear facility
applications.
SF-104474-CD
Last Revised: October 2015
Issue 20
Table of Contents
1 What's New ... 1
2 Low Latency Quickstart Guide ... 4
3 Background ... 11
3.1 Introduction ... 11
4 Installation ... 15
4.1 Introduction ... 15
4.2 Onload Distributions ... 15
4.3 Hardware and Software Supported Platforms ... 16
4.4 Onload and the Network Adapter Driver ... 17
4.5 Removing Previously Installed Drivers ... 17
4.6 Pre-install Notes ... 18
4.7 EnterpriseOnload - Build and Install from SRPM ... 18
4.8 EnterpriseOnload - Debian Source Packages ... 20
4.9 OpenOnload DKMS Installation ... 20
4.10 Build OpenOnload Source RPM ... 21
4.11 OpenOnload - Installation ... 21
4.12 Onload Kernel Modules ... 22
4.13 Configuring the Network Interfaces ... 23
4.14 Installing Netperf ... 24
4.15 How to run Onload ... 24
4.16 Testing the Onload Installation ... 24
4.17 Apply an Onload Patch ... 24
5 Tuning Onload ... 26
5.1 Introduction ... 26
5.2 System Tuning ... 27
5.3 Standard Tuning ... 29
5.4 Onload Deployment on NUMA Systems ... 31
5.5 Interrupt Handling - Kernel Driver ... 33
5.6 Performance Jitter ... 39
5.7 Advanced Tuning ... 42
6 Onload Functionality ... 49
6.1 Onload Transparency ... 49
6.2 Onload Stacks ... 49
6.3 Virtual Network Interface (VNIC) ... 50
6.4 Functional Overview ... 50
6.5 Onload with Mixed Network Adapters ... 50
6.6 Maximum Number of Network Interfaces ... 51
6.7 Whitelist and Blacklist Interfaces ... 51
6.8 Onloaded PIDs ... 51
6.9 Onload and File Descriptors, Stacks and Sockets ... 52
6.10 System calls intercepted by Onload ... 52
6.11 Linux Sysctls ... 52
6.12 Changing Onload Control Plane Table Sizes ... 54
6.13 SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps) ... 55
6.14 SO_TIMESTAMPING (Hardware Receive Timestamps) ... 55
6.15 SO_TIMESTAMPING (Hardware Transmit Timestamps) ... 56
6.16 SO_BINDTODEVICE ... 57
6.17 Multiplexed I/O ... 57
6.18 Wire Order Delivery ... 61
6.19 Stack Sharing ... 62
6.20 Application Clustering ... 63
6.21 Bonding, Link aggregation and Failover ... 65
6.22 VLANS ... 66
6.23 Accelerated pipe() ... 66
6.24 Zero-Copy API ... 67
6.25 Debug and Logging ... 67
7 Onload - TCP ... 69
7.1 TCP Operation ... 69
7.2 TCP Handshake - SYN, SYNACK ... 69
7.3 TCP SYN Cookies ... 70
7.4 TCP Socket Options ... 70
7.5 TCP Level Options ... 72
7.6 TCP File Descriptor Control ... 73
7.7 TCP Congestion Control ... 74
7.8 TCP SACK ... 75
7.9 TCP QUICKACK ... 75
7.10 TCP Delayed ACK ... 75
7.11 TCP Dynamic ACK ... 75
7.12 TCP Loopback Acceleration ... 76
7.13 TCP Striping ... 77
7.14 TCP Connection Reset on RTO ... 78
7.15 ONLOAD_MSG_WARM ... 78
7.16 Listen/Accept Sockets ... 79
7.17 Socket Caching ... 80
7.18 Scalable Filters ... 82
7.19 Transparent Reverse Proxy Modes ... 84
7.20 Transparent Reverse Proxy on Multiple CPUs ... 85
8 Onload - UDP ... 86
8.1 UDP Operation ... 86
8.2 Socket Options ... 86
8.3 Source Specific Socket Options ... 88
8.4 UDP Send and Receive Paths ... 88
8.5 Fragmented UDP ... 89
8.6 User-Level recvmmsg for UDP ... 89
8.7 User-Level sendmmsg for UDP ... 90
8.8 Multicast Replication ... 90
8.9 Multicast Operation and Stack Sharing ... 91
8.10 Multicast Loopback ... 94
8.11 Hardware Multicast Loopback ... 94
8.12 IP_MULTICAST_ALL ... 96
9 Packet Buffers ... 97
9.1 Introduction ... 97
9.2 Network Adapter Buffer Table Mode ... 97
9.3 Large Buffer Table Support ... 97
9.4 Scalable Packet Buffer Mode ... 98
9.5 Allocating Huge Pages ... 98
9.6 How Packet Buffers Are Used by Onload ... 99
9.7 Configuring Scalable Packet Buffers ... 102
9.8 Physical Addressing Mode ... 106
9.9 Programmed I/O ... 107
9.10 Templated Sends ... 108
10 Onload and Virtualization ... 109
10.1 Introduction ... 109
10.2 Overview ... 109
10.3 Onload and Linux KVM ... 109
10.4 Onload and NIC Partitioning ... 111
10.5 Onload in a Docker Container ... 113
10.6 Pre-Installation ... 113
10.7 Installation ... 114
10.8 Create Onload Docker Image ... 115
10.9 Migration ... 115
10.10 Copying Files Between Host and Container ... 116
11 Limitations ... 117
11.1 Introduction ... 117
11.2 Changes to Behavior ... 117
11.3 Limits to Acceleration ... 119
11.4 epoll - Known Issues ... 122
11.5 Configuration Issues ... 124
12 Change History ... 129
12.1 Features ... 130
12.2 Environment Variables ... 135
12.3 Module Options ... 143
A Parameter Reference ... 146
A.1 Parameter List ... 146
B Meta Options ... 185
B.1 Environment variables ... 185
C Build Dependencies ... 187
C.1 General ... 187
D Onload Extensions API ... 189
D.1 Source Code ... 189
D.2 Common Components ... 189
D.3 Stacks API ... 193
D.4 Stacks API Usage ... 198
D.5 Stacks API - Examples ... 200
D.6 Zero-Copy API ... 201
D.7 Templated Sends ... 212
D.8 Delegated Sends API ... 213
E onload_stackdump ... 219
E.1 Introduction ... 219
E.2 General Use ... 219
F Solarflare sfnettest ... 238
F.1 Introduction ... 238
G onload_tcpdump ... 246
G.1 Introduction ... 246
G.2 Building onload_tcpdump ... 246
G.3 Using onload_tcpdump ... 246
H ef_vi ... 249
H.1 Components ... 249
H.2 Compiling and Linking ... 249
H.3 Documentation ... 250
I onload_iptables ... 251
I.1 Description ... 251
I.2 How it works ... 251
I.3 Features ... 252
I.4 Rules ... 252
I.5 Preview firewall rules ... 253
I.6 Error Messages ... 255
J Solarflare efpio Test Application ... 257
J.1 efpio ... 257
1 What's New
This issue of the user guide identifies changes introduced in OpenOnload-201509.
Refer to Change History on page 129 to confirm feature availability in the Enterprise
release.
For a complete list of features and enhancements refer to the release notes and the
release change log available from: http://www.openonload.org/download.html.
The changes and improvements in Onload-201509 are geared towards Internet-based
services, ISP load-balancing servers and CDN-based infrastructures such as
those fronted by very high connection-rate reverse proxy and transparent proxy
servers. The changes in Onload improve scalability by increasing socket connection
rates and by removing limitations on the number of listening sockets and active
open network connections that can be sustained.
Net driver and Firmware Updates
OpenOnload-201509 includes the 4.5.1.1026 net driver.
Users should refer to ReleaseNotes-sfc in the distribution package for details of
changes to the adapter driver. Many of the new features require firmware version
4.6 or later.
New Features OpenOnload-201509
Scalable Filters
On a selected interface, a MAC filter is used to receive all traffic to a single Onload
stack. The MAC filter overcomes the hardware limitations encountered when using
IP filters and allows a greater number of TCP listening sockets and active open
connections to be maintained.
This feature is enabled with the EF_SCALABLE_FILTERS environment variable. Refer
to Scalable Filters on page 82 for more details.
Active Socket Caching
Active socket caching speeds up socket creation by allowing Onload to reuse active
open sockets, which are recycled back to the Onload stack when an established TCP
connection has terminated. Passive socket caching was added in a previous Onload
release.
Refer to Socket Caching on page 80.
IP_TRANSPARENT Socket Option
Onload-201509 supports the IP_TRANSPARENT socket option on TCP sockets (Linux
since 2.6.24). Sockets that have set this option are able to bind to a non-local IP address.
This feature is added to support Onload deployment in transparent and reverse
proxy configurations. For more information see Transparent Reverse Proxy Modes
on page 84.
Teaming
Onload now supports bonds/teams configured with the Linux "teaming" kernel
module and "teamd" daemon. This is in addition to the long-standing support for
bonds configured using the standard Linux "bonding" module. teamd is distributed
with RHEL7 and other Linux OS variants.
ef_vi
The Onload layer 2 API now has support for IP protocol and Ethertype filters. These
are only supported on SFN7000 series adapters and require firmware version 4.6
or later. Further details are available in the ef_vi Doxygen
documentation. Refer to Appendix H for details of ef_vi.
UDP recvmsg
In previous releases, when using recvmsg() to retrieve TX timestamps for UDP
packets, Onload would only return the UDP payload. In the 201509 release, Onload
will return the entire Ethernet frame. This matches the behaviour of the Linux
kernel.
Packet Buffers
To further reduce TLB thrashing and eliminate packet drops, Onload
will attempt to reuse buffers from the same set of packet buffers. The onload_stackdump
utility can be used to identify the packet sets being used and their free buffer status.
See Packet Sets on page 222 for a wider description and more information.
Environment Variables
Changes have been made affecting the following Onload environment variables.
Updates may include changes to the default value, removal, or changes to the
variable definition. Users are advised to check by running the following command:
# onload_stackdump doc
EF_MAX_ENDPOINTS
EF_LOG
EF_PIPE_SIZE
EF_MAX_PINNED_PAGES
EF_SCALABLE_FILTERS
EF_SCALABLE_FILTERS_ENABLE
EF_SCALABLE_FILTERS_MODE
EF_TCP_CONNECT_SPIN
EF_TCP_SYNCRECV_MAX
EF_TCP_SNDBUF_MODE
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE
EF_TCP_SOCKBUF_MAX_FRACTION
EF_RETRANSMIT_THRESHOLD_ORPHAN
New environment variables are listed in Chapter 12, Environment Variables on
page 135.
Change History
The Change History section is updated with every revision of this document to
include the latest Onload features, changes or additions to environment variables
and changes or additions to Onload module options. Refer to Change History on
page 129.
2 Low Latency Quickstart Guide
Introduction
This section demonstrates how to achieve very low latency coupled with minimum
jitter on a system fitted with the Solarflare SFN7122F network adapter and using
Solarflare's kernel-bypass network acceleration middleware, OpenOnload.
The procedure will focus on the performance of the network adapter for TCP and
UDP applications running on Linux using the industry-standard Netperf network
benchmark application and the Solarflare-supplied open source sfnettest network
benchmark suite.
Please read the Solarflare LICENSE file regarding the disclosure of benchmark test
results.
Software Installation
Before running low latency benchmark tests, ensure that the correct driver and
firmware versions are installed. For example (minimum driver and firmware versions are
shown):
[root@server-N]# ethtool -i enp3s0f0
driver: sfc
version: 4.5.1.1020
firmware-version: 4.4.2.1011 rx1 tx1
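The version check above could be scripted along the following lines. This is only a sketch: the function names `version_ge` and `check_sfc_versions` are hypothetical (they are not part of Onload or ethtool), GNU `sort -V` is assumed to be available, and the minimum versions are simply the ones quoted in this section. The function reads `ethtool -i` style output from standard input, so it can be tried on captured output first.

```shell
#!/bin/bash
# Sketch only: helper names are illustrative, not part of any Solarflare tool.

# Succeed when dotted version $1 is greater than or equal to $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read `ethtool -i <interface>` output on stdin and report on the driver
# version, the firmware version and the firmware variant.
check_sfc_versions() {
    local min_driver="4.5.1.1020" min_fw="4.4.2.1011"
    local line driver="" fw="" variant=""
    while IFS= read -r line; do
        case "$line" in
            version:*)
                driver="${line#version: }" ;;
            firmware-version:*)
                fw="${line#firmware-version: }"
                variant="${fw#* }"                    # text after the version
                [ "$variant" = "$fw" ] && variant=""  # no variant text present
                fw="${fw%% *}" ;;                     # version number only
        esac
    done
    if version_ge "$driver" "$min_driver"; then
        echo "driver $driver: ok"
    else
        echo "driver $driver: below $min_driver"
    fi
    if version_ge "$fw" "$min_fw"; then
        echo "firmware $fw: ok"
    else
        echo "firmware $fw: below $min_fw"
    fi
    case "$variant" in
        *rx1*tx1*) echo "firmware variant: rx1 tx1 reported" ;;
        *)         echo "firmware variant: rx1 tx1 not reported" ;;
    esac
}
```

A possible invocation, assuming the interface name used above, is `ethtool -i enp3s0f0 | check_sfc_versions`.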
Firmware Variant
On SFN7000 series adapters, the adapter should use the ultra-low-latency firmware
variant as indicated by the presence of rx1 tx1 as shown above. Firmware variants
are selected with the sfboot utility from the Solarflare Linux Utilities package
(SF-107601-LS).
Netperf
Netperf can be downloaded from http://www.netperf.org/netperf/
Unpack the compressed tar file using the tar command:
[root@system-N]# tar -zxvf netperf-<version>.tar.gz
This will create a sub-directory called netperf-<version> from which the
configure and make commands can be run (as root):
./configure
make install
Following installation the netperf and netserver applications are located in the
src subdirectory.
Solarflare sfnettest
Download the sfnettest-<version>.tgz source file from www.openonload.org
Unpack the tar file using the tar command:
[root@system-N]# tar -zxvf sfnettest-<version>.tgz
Run the make utility from the sfnettest-<version>/src subdirectory to build the
sfnt-pingpong application.
Solarflare Onload
Before Onload network and kernel drivers can be built and installed, the system must
support a build environment capable of compiling kernel modules. Refer to Build
Dependencies on page 187 for more details.
Download the openonload-<version>.tgz file from www.openonload.org
Unpack the tar file using the tar command:
[root@system-N]# tar -zxvf openonload-<version>.tgz
Run the onload_install command from the openonload-<version>/scripts
subdirectory:
[root@system-N]# ./onload_install
Test Setup
The diagram below identifies the required physical configuration of two servers
equipped with Solarflare network adapters connected back-to-back in order to
measure the latency of the adapter, drivers and acceleration middleware. If
required, tests can be repeated with a 10G switch on the link to measure the
additional latency delta using a particular switch.
Requirements:
• Two servers are equipped with Solarflare network adapters and connected
with a single cable between the Solarflare interfaces.
• The Solarflare interfaces are configured with an IP address so that traffic can
pass between them. Use ping to verify the connection.
• Onload, netperf and sfnettest are installed on both machines.
[Diagram: two systems under test connected by a 10G link (direct attach or optical)]
Pre-Test Configuration
On both machines:
1. Isolate the CPU cores that will be used from the general SMP balancing and
scheduler algorithms. Add the following option to the kernel line in
/boot/grub/grub.conf:
isolcpus=<comma separated cpu list>
2. Stop the cpuspeed service to prevent power saving modes from reducing CPU
clock speed.
RHEL6  [root@system-N]# service cpuspeed stop
RHEL7  [root@system-N]# systemctl stop cpupower
3. Stop the irqbalance service to prevent the OS from rebalancing interrupts
between available CPU cores.
RHEL6  [root@system-N]# service irqbalance stop
RHEL7  [root@system-N]# systemctl stop irqbalance
4. Stop the iptables service to eliminate overheads incurred by the firewall.
Solarflare recommends this step on RHEL6 for improved latency when using the
kernel network driver.
RHEL6  [root@system-N]# service iptables stop
RHEL7  [root@system-N]# systemctl stop iptables
5. Disable interrupt moderation.
[root@system-N]# ethtool -C eth<N> rx-usecs 0 adaptive-rx off
where <N> is the identifier of the Solarflare adapter Ethernet interface.
6. Refer to the Reference System Specification below for BIOS features.
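As a rough sketch, steps 2 to 5 can be collected into a helper that prints the commands for review rather than executing them. The helper name `pretest_commands` and its two arguments (RHEL major version and interface name) are assumptions made for this illustration; on RHEL 7 the services are stopped with systemctl and the CPU frequency service is named cpupower.

```shell
#!/bin/bash
# Sketch only: prints (does not execute) the pre-test commands from steps 2-5.
# pretest_commands is a hypothetical helper name.
#   $1 = RHEL major version (6 or 7)
#   $2 = Solarflare interface name, e.g. eth2
pretest_commands() {
    local rhel="$1" iface="$2" svc
    for svc in cpuspeed irqbalance iptables; do
        if [ "$rhel" = "7" ]; then
            # RHEL 7 uses systemd; cpuspeed is replaced by cpupower there.
            [ "$svc" = "cpuspeed" ] && svc="cpupower"
            echo "systemctl stop $svc"
        else
            echo "service $svc stop"
        fi
    done
    # Step 5: disable interrupt moderation on the chosen interface.
    echo "ethtool -C $iface rx-usecs 0 adaptive-rx off"
}

pretest_commands 7 eth2
```

Printing the commands first makes it easy to check them against the steps above before running them as root.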
Reference System Specification
The following latency measurements were recorded on twin Intel® Sandy Bridge
servers. The specification of the test systems is as follows:
• DELL PowerEdge R210 servers equipped with Intel® Xeon® CPU E3-1280 V2
@ 3.60GHz, 2 x 2GB DIMMs.
• BIOS: Turbo mode ENABLED, cstates DISABLED, IOMMU DISABLED.
• Red Hat Enterprise Linux V7.0 (x86_64 kernel, version 3.10.0-123.el7.x86_64).
• Solarflare SFN7122F NIC (driver and firmware: see Software Installation).
Direct attach cable at 10G.
• Performance might be improved on some systems if the tuned service is
disabled. Users should experiment with tuned tuning profiles or disable the
tuned service.
• OpenOnload distribution: openonload-201502-u3.
It is expected that similar results will be achieved on any Intel-based, PCIe Gen 3
server or compatible system.
UDP Latency: Netperf
Run the netserver application on system-1:
[root@system-1]# pkill -f netserver
[root@system-1]# onload --profile=latency taskset -c 1 ./netserver
Run the netperf application on system-2:
[root@system-2]# onload --profile=latency taskset -c 1 ./netperf -t UDP_RR -H <system1-ip> -l 10 -- -r 32
Socket  Size    Request  Resp.  Elapsed  Trans.
Send    Recv    Size     Size   Time     Rate
bytes   Bytes   bytes    bytes  secs.    per sec
212992  212992  32       32     10.00    300351.00
300351 transactions/second means that each transaction takes 1/300351 seconds,
resulting in an RTT/2 latency of (1/300351)/2 or 1.66µs.
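The arithmetic above can be reproduced with a small helper. This is a sketch under simple assumptions: `half_rtt_us` is a hypothetical function name, and the calculation is just the reciprocal of the netperf transaction rate, halved and expressed in microseconds.

```shell
#!/bin/bash
# Sketch only: convert a netperf request/response transaction rate
# (transactions per second) into an RTT/2 latency in microseconds.
# half_rtt_us is a hypothetical helper name.
half_rtt_us() {
    awk -v rate="$1" 'BEGIN { printf "%.2f\n", (1000000 / rate) / 2 }'
}

half_rtt_us 300351.00   # the UDP_RR rate reported above; prints 1.66
```

The same helper can be applied to any netperf *_RR transaction rate.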
UDP Latency: sfnt-pingpong
Run the sfnt-pingpong application on both systems:
[root@system-1]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[root@system-2]# onload --profile=latency taskset -c 1 ./sfnt-pingpong --affinity "1;1" udp <system1-ip>
# size  mean  min   median  max    %ile  stddev  iter
0       1636  1571  1625    10584  1791  79      911000
1       1637  1573  1625    9865   1896  89      911000
2       1634  1570  1628    9852   1731  67      912000
4       1639  1572  1627    9917   2056  85      910000
8       1639  1571  1627    10073  2000  95      910000
16      1636  1573  1629    10194  1732  68      911000
32      1663  1591  1647    10021  2198  102     897000
64      1693  1611  1670    10212  2400  133     880000
128     1763  1670  1755    9897   1887  85      846000
256     1882  1779  1850    10043  2477  141     793000
The output identifies the mean, minimum, median and maximum (nanosecond) RTT/2
latency for increasing UDP packet sizes, including the 99th percentile and standard
deviation for these results. A message size of 32 bytes has a mean latency of 1.66µs
with a 99%ile latency under 2.2µs.
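A result listing like the one above can be condensed with awk. The sketch below assumes whitespace-separated columns in the order given by the header line (size, mean, min, median, max, %ile, stddev, iter, with latencies in nanoseconds); `summarise_pingpong` is a hypothetical helper name, and the two sample rows are taken from the UDP results above.

```shell
#!/bin/bash
# Sketch only: summarise sfnt-pingpong results read from stdin.
# Assumed column order: size mean min median max %ile stddev iter
# (latencies in nanoseconds). summarise_pingpong is a hypothetical name.
summarise_pingpong() {
    awk '
        /^#/ || NF < 8 { next }   # skip header/comment lines and short lines
        { printf "%s bytes: mean %.2f us, 99%%ile %.2f us\n", $1, $2 / 1000, $6 / 1000 }
    '
}

# Two rows from the UDP results, with the columns separated.
printf '%s\n' '# size mean min median max %ile stddev iter' \
              '0 1636 1571 1625 10584 1791 79 911000' \
              '32 1663 1591 1647 10021 2198 102 897000' | summarise_pingpong
```

Live output can be piped straight into the helper, for example by appending `| summarise_pingpong` to the sfnt-pingpong command line on system-2.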
TCP Latency: Netperf
Run the netserver application on system-1:
[root@system-1]# pkill -f netserver
[root@system-1]# onload --profile=latency taskset -c 1 ./netserver
Run the netperf application on system-2:
[root@system-2]# onload --profile=latency taskset -c 1 ./netperf -t TCP_RR -H <system1-ip> -l 10 -- -r 32
Socket  Size   Request  Resp.  Elapsed  Trans.
Send    Recv   Size     Size   Time     Rate
bytes   Bytes  bytes    bytes  secs.    per sec
16384   87380  32       32     10.00    274853.34
274853 transactions/second means that each transaction takes 1/274853 seconds,
resulting in an RTT/2 latency of (1/274853)/2 or 1.81µs.
TCP Latency: sfnt-pingpong
Run the sfnt-pingpong application on both systems:
[root@system-1]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[root@system-2]# onload --profile=latency taskset -c 1 ./sfnt-pingpong --affinity "1;1" tcp <system1-ip>
# size  mean  min   median  max    %ile  stddev  iter
1       1798  1697  1757    10165  2514  164     829000
2       1794  1687  1749    10561  2936  198     831000
4       1765  1690  1749    10301  1922  80      845000
8       1772  1699  1755    10583  1930  93      842000
16      1804  1694  1751    10241  2925  211     827000
32      1786  1710  1767    10523  1973  98      835000
64      1847  1754  1833    11266  2020  99      808000
128     1929  1823  1908    10552  2460  114     774000
256     2014  1923  1998    9757   2199  89      741000
The output identifies the mean, minimum, median and maximum (nanosecond) RTT/2
latency for increasing TCP packet sizes, including the 99th percentile and standard
deviation for these results. A message size of 32 bytes has a mean latency of 1.78µs
with a 99%ile latency under 2.0µs.
Layer 2 ef_vi Latency
The efpio UDP test application, supplied with the openonload package, can be used
to measure latency of the Solarflare ef_vi layer 2 API. efpio uses PIO.
Using the same back-to-back configuration described above, efpio latency tests
were recorded on DELL PowerEdge R210 servers.
# ef_vi_version_str: 2013067122preview2
# udp payload len: 28
# iterations: 100000
# frame len: 70
round-trip time: 2.65µs (1.32 RTT/2)
Solarflare efpio Test Application on page 257 describes the efpio application and its
command line options, and provides example command lines.
Comparative Data
Adapter Comparison
Table 1 shows a comparison between latency tests conducted on the
SFN6000 and the SFN7000 series adapters. Values shown are the RTT/2 value in
microseconds.
Testing Without Onload
The benchmark performance tests can be run without Onload using the regular
kernel network drivers. To do this, remove the onload --profile=latency part
from the command line.
To get the best response and comparable latency results using kernel drivers,
Solarflare recommends setting interrupt affinity such that interrupts and the
application are running on different CPU cores but on the same processor package
(examples below).
Use the following command to identify receive queues created for an interface, e.g.:
# cat /proc/interrupts | grep eth2
33: 0 0 0 0 IR-PCI-MSI-edge eth2-0
34: 0 0 0 0 IR-PCI-MSI-edge eth2-1
Direct IRQ 33 to CPU core 0 and IRQ 34 to CPU core 1:
# echo 1 > /proc/irq/33/smp_affinity
# echo 2 > /proc/irq/34/smp_affinity
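The values written to smp_affinity are hexadecimal bitmasks in which bit N selects CPU core N, which is why core 0 maps to 1 and core 1 maps to 2. A minimal sketch for computing the mask of a single core follows; `cpu_mask` is a hypothetical helper name, and the sketch is only intended for cores 0 to 31 (larger systems use comma-separated 32-bit groups, which it does not produce).

```shell
#!/bin/bash
# Sketch only: print the hexadecimal smp_affinity mask selecting one CPU core.
# Bit N of the mask corresponds to CPU core N. Intended for cores 0-31.
# cpu_mask is a hypothetical helper name.
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

cpu_mask 0   # prints 1
cpu_mask 1   # prints 2
cpu_mask 4   # prints 10
```

For example, `echo "$(cpu_mask 1)" > /proc/irq/34/smp_affinity` (run as root) would have the same effect as the second echo command above.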
Kernel latency has been measured at 3.66µs with UDP traffic on a 3.11 kernel
supporting the new kernel “busy poll” feature, where the following values are
recommended:
# sysctl net.core.busy_poll=50 && sysctl net.core.busy_read=50
Latency will be higher when busy poll is not applied or not supported in the kernel
version. Latency of less than 6µs can be measured without busy poll on a standard
RHEL 6.4 kernel.
Table 1: Latency Tests - Comparative Data
Test       SFN6000           SFN7000      Latency gain
UDP        2.2               1.6          27%
TCP        2.4               1.8          25%
ef_vi UDP  efpingpong - 2.0  efpio - 1.3  40%
Further Information
For installation of Solarflare adapters and performance tuning of the network driver
when not using Onload, refer to the Solarflare Server Adapter User Guide
(SF-103837-CD) available from https://support.solarflare.com/
Questions regarding Solarflare products, Onload and this user guide can be emailed
to support@solarflare.com.
3 Background
3.1 Introduction
NOTE: This guide should be read in conjunction with the Solarflare Server Adapter
User's Guide, SF-103837-CD, which describes procedures for hardware and
software installation of Solarflare network interface cards, network device drivers
and related software.
NOTE: Throughout this user guide the term Onload refers to both OpenOnload and
EnterpriseOnload unless otherwise stated.
Onload is the Solarflare accelerated network middleware. It is an implementation of
TCP and UDP over IP which is dynamically linked into the address space of user-mode
applications, and granted direct (but safe) access to the network adapter
hardware. The result is that data can be transmitted to and received from the
network directly by the application, without involvement of the operating system.
This technique is known as 'kernel bypass'.
Kernel bypass avoids disruptive events such as system calls, context switches and
interrupts and so increases the efficiency with which a processor can execute
application code. This also directly reduces the host processing overhead, typically
by a factor of two, leaving more CPU time available for application processing. This
effect is most pronounced for applications which are network intensive, such as:
• Market-data and trading applications
• Computational fluid dynamics (CFD)
• HPC (High Performance Computing)
• HP-MPI (High Performance Message Passing Interface); Onload is compatible
with MPICH1 and 2, HP-MPI, OpenMPI and SCALI
• Other physical models which are moderately parallelizable
• High-bandwidth video streaming
• Web caching, load balancing and Memcached applications
• Content Delivery Networks (CDN) and HTTP servers
• Other system hot-spots such as distributed lock managers or forced
serialization points
The Onload library dynamically links with the application at runtime using the
standard BSD sockets API, meaning that no modifications are required to the
application being accelerated. Onload is the first and only product to offer full kernel
bypass for POSIX socket-based applications over TCP/IP and UDP/IP protocols.
Contrasting with Conventional Networking
When using conventional networking, an application calls on the OS kernel to send
and receive data to and from the network. Transitioning from the application to the
kernel is an expensive operation, and can be a significant performance barrier.
When an application accelerated using Onload needs to send or receive data, it
need not access the operating system, but can directly access a partition on the
network adapter. The two schemes are shown in Figure 1.
Figure 1: Contrast with Conventional Networking.
An important feature of the conventional model is that applications do not get
direct access to the networking hardware and so cannot compromise system
integrity. Onload is able to preserve system integrity by partitioning the NIC at the
hardware level into many protected 'Virtual NICs' (VNICs). An application can be
granted direct access to a VNIC without the ability to access the rest of the system
(including other VNICs or memory that does not belong to the application). Thus
Onload with a Solarflare NIC allows optimum performance without compromising
security or system integrity.
In summary, Onload can significantly reduce network processing overheads.
How Onload Increases Performance
Onload can significantly reduce the costs associated with networking by reducing
CPU overheads and improving performance for latency, bandwidth and application
scalability.
Overhead
Transitioning into and out of the kernel from a user-space application is a relatively
expensive operation: the equivalent of hundreds or thousands of instructions. With
conventional networking such a transition is required every time the application
sends and receives data. With Onload, the TCP/IP processing can be done entirely
within the user process, eliminating expensive application/kernel transitions, i.e.
system calls. In addition, the Onload TCP/IP stack is highly tuned, offering further
overhead savings.
The overhead savings of Onload mean more of the CPU's computing power is
available to the application to do useful work.
Latency
Conventionally, when a server application is ready to process a transaction it calls
into the OS kernel to perform a 'receive' operation, where the kernel puts the calling
thread 'to sleep' until a request arrives from the network. When such a request
arrives, the network hardware 'interrupts' the kernel, which receives the request
and 'wakes' the application.
All of this overhead takes CPU cycles as well as increasing cache and translation
lookaside buffer (TLB) footprint. With Onload, the application can remain at user
level waiting for requests to arrive at the network adapter and process them
directly. The elimination of a kernel-to-user transition, an interrupt, and a
subsequent user-to-kernel transition can significantly reduce latency. In short,
reduced overheads mean reduced latency.
Bandwidth
Because Onload imposes less overhead, it can process more bytes of network traffic
every second. Along with specially tuned buffering and algorithms designed for 10
gigabit networks, Onload allows applications to achieve significantly improved
bandwidth.
Scalability
Modern multi-core systems are capable of running many applications
simultaneously. However, the advantages can be quickly lost when the multiple
cores contend on a single resource, such as locks in a kernel network stack or device
driver. These problems are compounded on modern systems with multiple caches
across many CPU cores and Non-Uniform Memory Architectures.
Onload results in the network adapter being partitioned and each partition being
accessed by an independent copy of the TCP/IP stack. The result is that with Onload,
doubling the cores really can result in doubled throughput as demonstrated by
Figure 2.
Figure 2: Onload Partitioned Network Adapter
Further Information
For detailed information refer to:
• Onload Functionality on page 49.
• Onload - TCP on page 69.
• Onload - UDP on page 86.
• Onload and Virtualization on page 109.
4 Installation
4.1 Introduction
This chapter covers the following topics:
• Onload Distributions on page 15
• Hardware and Software Supported Platforms on page 16
• Onload and the Network Adapter Driver on page 17
• Removing Previously Installed Drivers on page 17
• Pre-install Notes on page 18
• EnterpriseOnload - Build and Install from SRPM on page 18
• EnterpriseOnload - Debian Source Packages on page 20
• OpenOnload DKMS Installation on page 20
• Build OpenOnload Source RPM on page 21
• OpenOnload - Installation on page 21
• Onload Kernel Modules on page 22
• Configuring the Network Interfaces on page 23
• Installing Netperf on page 24
• Testing the Onload Installation on page 24
• Apply an Onload Patch on page 24
4.2 Onload Distributions
Onload is available in two distributions:
• "OpenOnload" is a free version of Onload available from
  http://www.openonload.org/ distributed as a source tarball under the GPLv2 license.
  OpenOnload is subject to a linear development cycle where major releases
  every 3-4 months include the latest development features.
• "EnterpriseOnload" is a commercial enterprise version of Onload distributed as
  a source RPM under the GPLv2 license. EnterpriseOnload differs from
  OpenOnload in that it is offered as a mature commercial product that is
  downstream from OpenOnload, having undergone a comprehensive software
  product test cycle resulting in tested, hardened and validated code.
The Solarflare product range offers a flexible and broad range of support options.
Users should consult their reseller for details and refer to the Solarflare Enterprise
Service and Support information at
http://www.solarflare.com/Enterprise-Service-Support.
4.3 Hardware and Software Supported Platforms
• Onload can be run on the following Solarflare adapters:
  - Solarflare Flareon Adapters
  - Onload Network Adapters
  - Solarflare mezzanine adapters
  - SFA6902F and SFA7942Q ApplicationOnload™ Engine.
  Refer to the Solarflare Server Adapter User Guide 'Product Specifications' for
  adapter details.
• Onload can run on all Intel and AMD x86 processors, 32-bit and 64-bit platforms.
Table 2 identifies supported operating systems/kernels.
Table 2: OS/Kernel Support
OS                                         Version              Notes
Red Hat Enterprise Linux                   6.4 - 7.2            RHEL 6 built-in Solarflare drivers may not support SFN7000 series adapters.
Red Hat Messaging Realtime and Grid        2.4, 2.5
Red Hat Enterprise Linux for Realtime      7.1
SuSE Linux Enterprise Server               11 sp2, sp3, sp4     Built-in Solarflare drivers may not support SFN7000 series adapters.
SuSE Linux Enterprise Realtime Extension   11
SuSE Linux Enterprise Server               12 base release
Canonical Ubuntu Server LTS                14.04
Canonical Ubuntu Server                    14.10, 15.04, 15.10
Debian 7 "Wheezy"                          7.x
Debian 8 "Jessie"                          8.0
The Onload QA test cycle predominantly focuses on the Linux OS versions
documented above. Other Linux variants such as CentOS, Gentoo and Fedora are not
formally supported, but Solarflare are not aware of any issues preventing Onload
installation on them. Some versions of Ubuntu and Debian earlier than those listed
above are also known to support Onload.
4.4 Onload and the Network Adapter Driver
The Solarflare network adapter driver, the "net driver", is generally available from
three sources:
• Download as source RPM from support.solarflare.com.
• Packaged 'in-box' in many Linux distributions, e.g. Red Hat Enterprise Linux.
• Packaged in the OpenOnload/EnterpriseOnload distribution.
When using Onload you must use the adapter driver distributed with that version of
Onload.
4.5 Removing Previously Installed Drivers
The Solarflare adapter driver (sfc.ko) is distributed as part of many Linux-based OS
distributions. This is often referred to as the 'boxed' driver or the 'in-tree' driver.
Depending on the OS version this driver may not support more recent Solarflare
adapters. Always check the driver release notes available from
https://support.solarflare.com/.
The 'in-tree' driver displays only Major and Minor revision numbers when displayed
by the ethtool command:
# ethtool -i enp3s0f0
driver: sfc
version: 4.0
Every Onload distribution includes a version of the net driver to support the
specific features of the Onload release, and this driver should always be used with
Onload. (The driver is installed along with the other Onload drivers.) Onload drivers
display detailed version information using the ethtool command:
# ethtool -i enp3s0f0
driver: sfc
version: 4.5.1.1020
Table 2: OS/Kernel Support (continued)
OS                                         Version              Notes
Linux kernels                              2.6.18 - 4.2
Solarflare aim to support the OS current and previous major release at the point
these are released (plus the latest long-term support release if this is not already
included). This includes all minor releases where the distributor has not yet
declared end of life/support.
To ensure the Onload driver is always loaded following system reboot, the 'in-tree'
driver can be removed from the OS entirely. Alternatively, any Onload startup script
should include the command to reload the Onload drivers:
# onload_tool reload
To remove the 'in-tree' driver (with Onload uninstalled or not yet installed):
# find /lib/modules/$(uname -r) -name 'sfc*.ko' | xargs rm -rf
# rmmod sfc
# update-initramfs -u -k <kernel-version>
initramfs commands may differ on different Linux-based OSs, e.g. on CentOS 7 the
following dracut command can be used (the final argument is the kernel version):
# dracut -f /boot/initramfs-<version>.x86_64.img <version>.x86_64
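The two ethtool outputs shown earlier in this section differ only in the shape of the version string, so a script can make a rough guess at which driver is loaded. The sketch below assumes, as the examples suggest, that an 'in-tree' driver reports exactly <major>.<minor> and that an Onload-distributed driver reports more components. The function names are hypothetical and are not part of Onload.

```shell
#!/bin/sh
# Classify an sfc driver version string as reported by `ethtool -i`.
# Assumption: in-tree drivers report <major>.<minor> only (e.g. 4.0), while
# drivers shipped with Onload report more fields (e.g. 4.5.1.1020).
sfc_driver_origin() {
    case "$1" in
        *.*.*) echo "onload" ;;     # three or more dot-separated fields
        *.*)   echo "in-tree" ;;    # exactly two fields
        *)     echo "unknown" ;;
    esac
}

# Read `ethtool -i <interface>` output on stdin and print the version field.
sfc_version_from_ethtool() {
    sed -n 's/^version:[[:space:]]*//p'
}

# Demonstration using the two version strings quoted in this section.
sfc_driver_origin 4.0
sfc_driver_origin 4.5.1.1020
```

On a server with a Solarflare interface, the same check could be fed from `ethtool -i <interface> | sfc_version_from_ethtool`.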
4.6 Pre-install Notes
NOTE: If Onload is to accelerate a 32-bit application on a 64-bit architecture, the
32-bit libc development headers should be installed before building Onload. Refer
to Appendix C for install instructions.
NOTE: You must remove any existing Solarflare RPM driver packages before
installing Onload.
NOTE: When migrating between Onload versions or between OpenOnload and
EnterpriseOnload, a previously installed version must first be removed using the
onload_uninstall command.
NOTE: The Solarflare drivers are currently classified as unsupported in SLES 11 and
SLES 12; the certification process is under way. To overcome this on SLES 11, add
allow_unsupported_modules 1 to the /etc/modprobe.d/unsupported-modules
file. For SLES 12 add the same to the /etc/modprobe.d/10-unsupported-modules.conf
file.
4.7 EnterpriseOnload - Build and Install from SRPM
The following steps identify the procedures to build and install EnterpriseOnload.
SRPMs can be built by the 'root' or a 'non-root' user, but the user must have
superuser privileges to install RPMs. Customers should contact their Solarflare
customer sales representative for access to the EnterpriseOnload SRPM resources.
Build the RPM
NOTE: Refer to Appendix C for details of build dependencies.
As root:
rpmbuild --rebuild enterpriseonload-<version>.src.rpm
Or as a non-root user:
It is advised to use _topdir to ensure that RPMs are built into a directory to which
the user has permissions. The directory structure must pre-exist for the rpmbuild
command to succeed.
mkdir -p /tmp/myrpm/{SOURCES,BUILD,RPMS,SRPMS}
rpmbuild --define "_topdir /tmp/myrpm" \
--rebuild enterpriseonload-<version>.src.rpm
NOTE: On some non-standard kernels the rpmbuild might fail because of build
dependencies. In this event retry, adding the --nodeps option to the command
line.
Building the source RPM will produce 2 binary RPM files which can be found:
• in the /usr/src/*/RPMS/ directory
• or, when built by a non-root user, in _topdir/RPMS
• or, when _topdir was defined in the rpmbuild command line, in
  /tmp/myrpm/RPMS/x86_64/
For example, the EnterpriseOnload user-space components:
/usr/src/redhat/RPMS/x86_64/enterpriseonload-<version>.x86_64.rpm
and the EnterpriseOnload kernel components:
/usr/src/redhat/RPMS/x86_64/enterpriseonload-kmod-2.6.18-92.el5-<version>.x86_64.rpm
Install the EnterpriseOnload RPM
The EnterpriseOnload RPM and the kernel RPM must both be installed for
EnterpriseOnload to function correctly.
rpm -ivh enterpriseonload-<version>.x86_64.rpm
rpm -ivh enterpriseonload-kmod-2.6.18-92.el5-<version>.x86_64.rpm
NOTE: EnterpriseOnload is now installed but the kernel modules are not yet loaded.
NOTE: The EnterpriseOnload kmod filename is specific to the kernel that it is built
for.
Installing the EnterpriseOnload Kernel Module
This will load the EnterpriseOnload kernel driver and other driver dependencies and
create any device nodes needed for EnterpriseOnload drivers and utilities. The
command should be run as root.
/etc/init.d/openonload start
Following successful execution this command produces no output, but the onload
script will identify that the kernel module is now loaded.
onload
EnterpriseOnload <version>
Copyright 2006-2013 Solarflare Communications, 2002-2005 Level 5 Networks
Built: Oct 15 2013 09:19:23 12:23:12 (release)
Kernel module: <version>
NOTE: At this point EnterpriseOnload is loaded, but until the network interface has
been configured and brought into service EnterpriseOnload will be unable to
accelerate traffic.
4.8 EnterpriseOnload - Debian Source Packages
From version 4.0, Debian install packages are available for EnterpriseOnload.
Packages are named in the following format:
enterpriseonload_<version>-debiansource.tgz
1 Untar source package
$ tar xf enterpriseonload_<version>-debiansource.tgz
2 Extract source
$ dpkg-source -x enterpriseonload_<version>-1.dsc
3 Build packages
$ cd enterpriseonload-<version>
$ debuild -i -uc -us
4 Install packages
$ sudo dpkg -i ../enterpriseonload-user_<version>-1_amd64.deb
$ sudo dpkg -i ../enterpriseonload-source_<version>-1_all.deb
5 Build and install modules
$ sudo m-a a-i enterpriseonload
4.9 OpenOnload DKMS Installation
OpenOnload DKMS packages are available by contacting support@solarflare.com.
1 DKMS must be installed on the server. DKMS can be downloaded from
http://linux.dell.com/dkms/ or from the OS distribution. To check this run the
following command, which will return nothing if DKMS is not installed:
# dkms --version
dkms: 2.2.0.3
2 Install the Onload DKMS package:
# rpm -i openonload-dkms-<version>.noarch.rpm
3 Ensure drivers and kernel module are loaded:
onload_tool reload
4.10 Build OpenOnload Source RPM
A source RPM can be built from the OpenOnload distribution tar file.
1 Download the required tar file from the following location:
http://www.openonload.org/download.html
Copy the file to a directory on the machine where the source RPM is to be
created.
2 As root, execute the following command:
rpmbuild -ts openonload-<version>.tgz
Wrote: /root/rpmbuild/SRPMS/openonload-<version>.src.rpm
The output identifies the location of the source RPM. Use the -ta option to get
a binary RPM.
4.11 OpenOnload - Installation
The following procedure demonstrates how to download, untar and install
OpenOnload.
Download and untar OpenOnload
1 Download the required tar file from the following location:
http://www.openonload.org/download.html
The compressed tar file (.tgz) should be downloaded/copied to a directory on
the machine on which it will be installed.
2 As root, unpack the tar file using the tar command.
tar -zxvf openonload-<version>.tgz
This will unpack the tar file and, within the current directory, create a
subdirectory called openonload-<version> which contains other subdirectories
including the scripts directory from which subsequent install commands can
be run.
Building and Installing OpenOnload
NOTE: Refer to Appendix C for details of build dependencies.
The following command will build and install OpenOnload and required drivers in
the system directories:
./onload_install
Successful installation will be indicated with the following output:
onload_install: Install complete
possibly followed by a warning that the sfc (net driver) driver is already installed.
NOTE: The onload_install script does not create RPMs.
Load Onload Drivers
Following installation it is necessary to load the Onload drivers:
onload_tool reload
When used with OpenOnload this command will replace any previously loaded
network adapter driver with the driver from the OpenOnload distribution.
Check that Solarflare drivers are loaded using the following commands:
lsmod | grep sfc
lsmod | grep onload
An alternative to the reload command is to reboot the system to load Onload
drivers.
Confirm Onload Installation
When the Onload installation is complete, run the onload command to confirm
installation of Onload software and kernel module:
[root@server1] onload
This will display the Onload product banner and usage:
OpenOnload 201405
Copyright 2006-2012 Solarflare Communications, 2002-2005 Level 5 Networks
Built: May 20 2014 16:46:33 (release)
Kernel module: 201405
usage:
  onload [options] <command> <command-args>
options:
  --profile=<profile>   -- comma sep list of config profile(s)
  --force-profiles      -- profile settings override environment
  --no-app-handler      -- do not use app-specific settings
  --app=<app-name>      -- identify application to run under onload
  --version             -- print version information
  -v                    -- verbose
  -h --help             -- this help message
4.12 Onload Kernel Modules
Driver Name       Description
sfc.ko            A Linux net driver which provides the interface between the Linux network stack and the Solarflare network adapter.
sfc_char.ko       Provides low-level access to the Solarflare network adapter virtualized resources. Supports direct access to the network adapter for applications that use the ef_vi user-level interface for maximum performance.
sfc_tune.ko       This is used to prevent the kernel during idle periods from putting the CPUs into a sleep state. Removed in openonload-201405.
sfc_aoe.ko        Solarflare ApplicationOnload™ Engine driver for the SFA6902F adapter.
sfc_affinity.ko   Used to direct traffic flow managed by a thread to the core the thread is running on; inserts packet filters that override the RSS behaviour.
sfc_resource.ko   Manages the virtualization resources of the adapter and shares the resources between other drivers.
onload.ko         The kernel component of Onload.
To identify Solarflare drivers already installed on the server:
modprobe -l | grep -e sfc -e onload
To unload any loaded drivers:
onload_tool unload
To remove the installed files of a previous Onload:
onload_uninstall
To load the Solarflare net driver (if not already loaded):
modprobe sfc
Reload drivers following upgrade or changed settings:
onload_tool reload
4.13 Configuring the Network Interfaces
Network interfaces should be configured according to the Solarflare Server Adapter
User's Guide.
When the interface(s) have been configured, the dmesg command will display
output similar to the following (one entry for each Solarflare interface):
sfc 0000:13:00.0: INFO: eth2 Solarflare Communications NIC PCI(1924:803)
sfc 0000:13:00.1: INFO: eth3 Solarflare Communications NIC PCI(1924:803)
NOTE: IP address configuration should be carried out using normal OS tools, e.g.
system-config-network (Red Hat) or yast (SUSE).
4.14 Installing Netperf
Refer to the Low Latency Quickstart Guide on page 4 for instructions to install
Netperf and Solarflare sfnettest applications.
4.15 How to run Onload
Once Onload has been installed there are different ways to accelerate applications.
Exporting LD_PRELOAD will mean that all applications started in the same
environment will be accelerated.
# export LD_PRELOAD=libonload.so
Prefixing the application command line with the onload command will accelerate
only that application.
# onload <app_name> [app_options]
4.16 Testing the Onload Installation
The Low Latency Quickstart Guide on page 4 demonstrates testing of Onload
with Netperf and the Solarflare sfnettest benchmark tools.
4.17 Apply an Onload Patch
Occasionally, the Solarflare Support Group may issue a software 'patch' which is
applied to Onload to resolve a specific bug or investigate a specific issue. The
following procedure describes how a patch should be applied to the installed
OpenOnload software.
1 Copy the patch to a directory on the server where Onload is already installed.
2 Go to the Onload directory and apply the patch, e.g.
cd openonload-<version>
[openonload-<version>]$ patch -p1 < ~/<path>/<name of patch file>.patch
3 Uninstall the old Onload drivers
[openonload-<version>]$ onload_uninstall
4 Build and re-install the Onload drivers
[openonload-<version>]$ ./scripts/onload_install
[openonload-<version>]$ onload_tool reload
The following procedure describes how a patch should be applied to the installed
EnterpriseOnload RPM. (This example patches EnterpriseOnload version 2.1.0.3.)
1 Copy the patch to the directory on the server where the EnterpriseOnload RPM
package exists and carry out the following commands:
rpm2cpio enterpriseonload-2.1.0.3-1.src.rpm | cpio -id
tar -xzf enterpriseonload-2.1.0.3.tgz
cd enterpriseonload-2.1.0.3
patch -p1 < $PATCHNAME
2 This can now be installed directly from this directory:
./scripts/onload_install
3 Or it can be repackaged as a new RPM:
cd ..
tar czf enterpriseonload-2.1.0.3.tgz enterpriseonload-2.1.0.3
rpmbuild -ts enterpriseonload-2.1.0.3.tgz
4 The rpmbuild procedure will display a 'Wrote' line identifying the location of
the built RPM, e.g.
Wrote: /root/rpmbuild/SRPMS/enterpriseonload-2.1.0.3-1.el6.src.rpm
5 Install the RPM in the usual way:
rpm -ivh /root/rpmbuild/SRPMS/enterpriseonload-2.1.0.3-1.el6.src.rpm
5 Tuning Onload
5.1 Introduction
This chapter documents the available tuning options for Onload, and the expected
results. The options can be split into the following categories:
• System Tuning
• Standard Latency Tuning
• Advanced Tuning driven from analysis of the Onload stack using
  onload_stackdump.
Most of the Onload configuration parameters, including tuning parameters, are set
by environment variables exported into the accelerated application's environment.
Environment variables can be identified throughout this manual as they begin with
EF_. All environment variables are described in Appendices A and B of this manual.
Examples throughout this guide assume the use of the bash or sh shells; other shells
may use different methods to export variables into the application's environment.
• System Tuning on page 27 describes tools and commands which can be used to
  tune the server and OS.
• Standard Tuning on page 29 describes how to perform standard heuristic
  tuning, which can help improve the application's performance. There are also
  benchmark examples running specific tests to demonstrate the improvements
  Onload can have on an application.
• Advanced Tuning on page 42 introduces advanced tuning options using
  onload_stackdump. There are worked examples to demonstrate how to
  achieve the application tuning goals.
NOTE: Onload tuning and kernel driver tuning are subject to different
requirements. This section describes the steps to tune Onload. For details on how
to tune the Solarflare kernel driver, refer to the 'Performance Tuning on Linux'
section of the Solarflare Server Adapter User Guide.
5.2 System Tuning
This section details steps to tune the server and operating system for lowest latency.
Sysjitter
The Solarflare sysjitter utility measures the extent to which the system introduces
jitter and so impacts on the user-level process. Sysjitter runs a thread on each
processor core and, when the thread is descheduled from the core, it measures for
how long. Sysjitter produces summary statistics for each processor core. The
sysjitter utility can be downloaded from www.openonload.org.
Sysjitter should be run on a system that is idle. When running on a system with
cpusets enabled, run sysjitter as root.
Refer to the sysjitter README file for further information on building and running
sysjitter.
The following is an example of the output from sysjitter on a single CPU socket
server with 4 CPU cores.
./sysjitter --runtime 10 200 | column -t
core_i:          0           1           2           3
threshold(ns):   200         200         200         200
cpu_mhz:         3215        3215        3215        3215
runtime(ns):     9987653973  9987652245  9987652070  9987652027
runtime(s):      9.988       9.988       9.988       9.988
int_n:           10001       10130       10012       10001
int_n_per_sec:   1001.336    1014.252    1002.438    1001.336
int_min(ns):     1333        1247        1299        1446
int_median(ns):  1390        1330        1329        1470
int_mean(ns):    1424        1452        1452        1502
int_90(ns):      1437        1372        1357        1519
int_99(ns):      1619        5046        2392        1688
int_999(ns):     5065        22977       15604       3694
int_9999(ns):    31260       39017       184305      36419
int_99999(ns):   40613       45065       347097      49998
int_max(ns):     40613       45065       347097      49998
int_total(ns):   14244846    14719972    14541991    15031294
int_total(%):    0.143       0.147       0.146       0.150
The table below describes the output fields of the sysjitter utility.
Field             Description
threshold(ns)     ignore any interrupts shorter than this period
cpu_mhz           CPU speed
runtime(ns)       runtime of sysjitter - nanoseconds
runtime(s)        runtime of sysjitter - seconds
int_n             number of interruptions to the user thread
int_n_per_sec     number of interruptions to the user thread per second
int_min(ns)       minimum time taken away from the user thread due to an interruption
int_median(ns)    median time taken away from the user thread due to an interruption
int_mean(ns)      mean time taken away from the user thread due to an interruption
int_90(ns)        90% percentile value
int_99(ns)        99% percentile value
int_999(ns)       99.9% percentile value
int_9999(ns)      99.99% percentile value
int_99999(ns)     99.999% percentile value
int_max(ns)       max time taken away from the user thread
int_total(ns)     total time spent not processing the user thread
int_total(%)      int_total(ns) as a percentage of total runtime
Timer (TSC) Stability
Onload uses the Time Stamp Counter (TSC) CPU registers to measure changes in
time with very low overhead. Modern CPUs support an "invariant TSC", which is
synchronized across different CPUs and ticks at a constant rate regardless of the
current CPU frequency and power saving mode. Onload relies on this to generate
accurate time calculations when running across multiple CPUs. If run on a system
which does not have an invariant TSC, Onload may calculate wildly inaccurate time
values and this can, in extreme cases, lead to some connections becoming stuck.
Users should consult their server vendor documentation and OS documentation to
ensure that servers can meet the invariant TSC requirement.
CPU Power Saving Mode
Modern processors utilize design features that enable a CPU core to drop into
lower power states when instructed by the operating system that the CPU core
is idle. When the OS schedules work on the idle CPU core (or when other CPU cores
or devices need to access data currently in the idle CPU core's data cache) the CPU
core is signaled to return to the fully-on power state. These changes in CPU core
power states create additional network latency and jitter.
Solarflare therefore recommend that customers wishing to achieve the lowest
latency and lowest jitter disable the "C1E power state" or "CPU power saving mode"
within the machine's BIOS.
Disabling the CPU power saving modes is required if the application is to realize low
latency with low jitter.
NOTE: To ensure C-states are not enabled, overriding the BIOS settings, it is
recommended to put the line intel_idle.max_cstate=0 idle=poll into the
kernel command line in /boot/grub/grub.conf. The settings will produce consistent
results and are particularly useful when benchmarking, but allowing some cores to
enable Turbo modes while others are idle can produce best latency in some servers.
Users should refer to vendor documentation and experiment with C-states for
different applications.
Customers should consult their system vendor and documentation for details
concerning the disabling of C1E, C-states or CPU power saving states.
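Whether the two kernel parameters named in the NOTE above are in effect can be checked against the running kernel's command line. The following is a minimal sketch. The function name is hypothetical, and the command line is passed in as an argument so that the check can be tried on any string.

```shell
#!/bin/sh
# Succeed only if both parameters from the NOTE above appear as whole words
# in the given kernel command line.
cstates_disabled_on_cmdline() {
    cmdline=" $1 "                      # pad so every word is space-delimited
    case "$cmdline" in
        *" intel_idle.max_cstate=0 "*) ;;
        *) return 1 ;;
    esac
    case "$cmdline" in
        *" idle=poll "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a live Linux system the current command line is in /proc/cmdline.
if [ -r /proc/cmdline ] && cstates_disabled_on_cmdline "$(cat /proc/cmdline)"; then
    echo "C-state parameters present"
else
    echo "C-state parameters not found"
fi
```

This only confirms what was passed to the kernel at boot; it does not inspect the BIOS settings.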
5.3 Standard Tuning
This section details standard tuning steps for Onload.
Spinning (busy-wait)
Conventionally, when an application attempts to read from a socket and no data is
available, the application will enter the OS kernel and block. When data becomes
available, the network adapter will interrupt the CPU, allowing the kernel to
reschedule the application to continue.
Blocking and interrupts are relatively expensive operations, and can adversely affect
bandwidth, latency and CPU efficiency.
Onload can be configured to spin on the processor in user mode for up to a specified
number of microseconds waiting for data from the network. If the spin period
expires the processor will revert to conventional blocking behavior. Non-blocking
sockets will always return immediately as these are unaffected by spinning.
Onload uses the EF_POLL_USEC environment variable to configure the length of the
spin timeout.
export EF_POLL_USEC=100000
will set the busy-wait period to 100 milliseconds. See Meta Options on page 185 for
more details.
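Because EF_POLL_USEC is expressed in microseconds, it is easy to be out by a factor of 1000 when thinking in milliseconds. The sketch below shows the conversion behind the example value. The helper name ms_to_usec is hypothetical and is not an Onload tool.

```shell
#!/bin/sh
# Convert a spin period in milliseconds to the microsecond value that
# EF_POLL_USEC expects (1 ms = 1000 us).
ms_to_usec() {
    echo $(( $1 * 1000 ))
}

# 100 ms, matching the example above.
EF_POLL_USEC="$(ms_to_usec 100)"
export EF_POLL_USEC
echo "EF_POLL_USEC=$EF_POLL_USEC"
```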
Enabling spinning
To enable spinning in Onload:
Set EF_POLL_USEC. This causes Onload to spin on the processor for up to the
specified number of microseconds before blocking. This setting is used in TCP and
UDP and also in recv(), select(), pselect(), poll(), ppoll(),
epoll_wait(), epoll_pwait() and onload_ordered_epoll_wait(). Use the
following command:
export EF_POLL_USEC=100000
NOTE: If neither of the spinning options EF_POLL_USEC and EF_SPIN_USEC is set,
Onload will resort to default interrupt-driven behavior because the EF_INT_DRIVEN
environment variable is enabled by default.
Setting the EF_POLL_USEC variable also sets the following environment variables:
EF_SPIN_USEC=EF_POLL_USEC
EF_SELECT_SPIN=1
EF_EPOLL_SPIN=1
EF_POLL_SPIN=1
EF_PKT_WAIT_SPIN=1
EF_TCP_SEND_SPIN=1
EF_UDP_RECV_SPIN=1
EF_UDP_SEND_SPIN=1
EF_TCP_RECV_SPIN=1
EF_BUZZ_USEC=EF_POLL_USEC
EF_SOCK_LOCK_BUZZ=1
EF_STACK_LOCK_BUZZ=1
Turn off adaptive moderation and set interrupt moderation to a high value
(microseconds) to avoid flooding the system with interrupts. Use the following
command:
/sbin/ethtool -C eth2 rx-usecs 60 adaptive-rx off
See Meta Options on page 185 for more details.
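To make the effect of the EF_POLL_USEC meta option concrete, the sketch below prints the settings that this section lists as implied by a given value. It only restates the list above and does not query Onload. The function name is hypothetical.

```shell
#!/bin/sh
# Print the environment settings that this guide lists as implied by
# EF_POLL_USEC=<usec>. Onload applies these itself when EF_POLL_USEC is set;
# this function only makes the expansion visible.
ef_poll_usec_implied() {
    usec=$1
    echo "EF_SPIN_USEC=$usec"
    for name in EF_SELECT_SPIN EF_EPOLL_SPIN EF_POLL_SPIN EF_PKT_WAIT_SPIN \
                EF_TCP_SEND_SPIN EF_UDP_RECV_SPIN EF_UDP_SEND_SPIN \
                EF_TCP_RECV_SPIN; do
        echo "$name=1"
    done
    echo "EF_BUZZ_USEC=$usec"
    echo "EF_SOCK_LOCK_BUZZ=1"
    echo "EF_STACK_LOCK_BUZZ=1"
}

ef_poll_usec_implied 100000
```

Seeing the expansion written out can help when deciding whether to override an individual setting instead of using the meta option.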
When to Use Spinning
The optimal setting is dependent on the nature of the application. If an application
is likely to find data soon after blocking, or the system does not have any other
major tasks to perform, spinning can improve latency and bandwidth significantly.
In general, an application will benefit from spinning if the number of active threads
is less than the number of available CPU cores. However, if the application has more
active threads than available CPU cores, spinning can adversely affect application
performance because a thread that is spinning (and therefore idle) takes CPU time
away from another thread that could be doing work. If in doubt, it is advisable to try
an application with a range of settings to discover the optimal value.
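The rule of thumb above (fewer active threads than available CPU cores) can be written as a simple check. This is a sketch only: the function name is hypothetical, the thread count of 4 is an illustrative figure that must come from knowledge of the application, and the result is a starting point for experiment, not a guarantee.

```shell
#!/bin/sh
# Succeed when the number of active threads is less than the number of
# available CPU cores, which is the case where spinning is said to help.
spinning_advisable() {
    threads=$1
    cores=$2
    [ "$threads" -lt "$cores" ]
}

# Number of online processors as reported by the OS.
cores=$(getconf _NPROCESSORS_ONLN)
if spinning_advisable 4 "$cores"; then
    echo "4 threads on $cores cores: spinning is worth trying"
else
    echo "4 threads on $cores cores: spinning may hurt performance"
fi
```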
Polling vs. Interrupts
Interrupts are useful because they allow the CPU to do other useful work while
simultaneously waiting for asynchronous events (such as the reception of packets
from the network). The historical alternative to interrupts was for the CPU to
periodically poll for asynchronous events, and on single-processor systems this could
result in greater latency than would be observed with interrupts. Historically it was
accepted that interrupts were "good for latency".
On modern, multi-core systems the trade-offs are different. It is often possible to
dedicate an entire CPU core to the processing of a single source of asynchronous
events (such as network traffic). The CPU dedicated to processing network traffic
can be spinning (aka busy-waiting), continuously polling for the arrival of packets.
When a packet arrives, the CPU can begin processing it almost immediately.
Contrast the polling model to an interrupt-driven model. Here the CPU is likely in its
"idle loop" when an interrupt occurs. The idle loop is interrupted and the interrupt
handler executes, typically marking a worker task as runnable. The OS scheduler will
then run and switch to the kernel thread that will process the incoming packet.
There is typically a subsequent task switch to a user-mode thread where the real
work of processing the event (e.g. acting on the packet payload) is performed.
Depending on the system, it can take on the order of a microsecond to respond to
an interrupt and switch to the appropriate thread context before beginning the real
work of processing the event. A dedicated CPU spinning in a polling loop can begin
processing the asynchronous event in a matter of nanoseconds.
It follows that spinning only becomes an option if a CPU core can be dedicated to
the asynchronous event. If there are more threads awaiting events than CPU cores
(i.e. if all CPU cores are oversubscribed to application worker threads), then spinning
is not a viable option (at least, not for all events). One thread will be spinning,
polling for the event, while another could be doing useful work. Spinning in such a
scenario can lead to (dramatically) increased latencies. But if a CPU core can be
dedicated to each thread that blocks waiting for network I/O, then spinning is the
best method to achieve the lowest possible latency.
5.4 Onload Deployment on NUMA Systems
When deployed on NUMA systems, application load throughput and latency
performance can be adversely affected unless due consideration is given to the
selection of the NUMA node, the allocation of cache memory and the affinitization
of drivers, processes and interrupts.
For best performance the accelerated application should always run on the NUMA
node nearest to the Solarflare adapter. The correct allocation of memory is
particularly important to ensure that packet buffers are allocated on the correct
NUMA node to avoid unnecessary increases in QPI traffic and to avoid dropped
packets.
Useful commands
• To identify NUMA nodes, socket memory and CPU core allocation:
# numactl -H
• To identify the NUMA node local to a Solarflare adapter:
# cat /sys/class/net/<interface>/device/numa_node
• To identify memory allocation and use on a particular NUMA node:
# cat /sys/devices/system/node/node<N>/numastat
Driver Loading
When loading, the Onload module will create a variety of common data structures.
To ensure that these are created on the NUMA node nearest to the Solarflare
adapter, onload_tool reload should be affinitized to a core on the correct NUMA
node.
# numactl --cpunodebind=1 onload_tool reload
Memory Policy
To guarantee that memory is appropriately allocated, and to ensure that memory
allocations do not fail, a memory policy that binds to a specific NUMA node should
be selected. When no policy is specified the system will generally use a default
policy allocating memory on the node on which a process is executing.
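The node lookup and the binding described above can be combined in a small helper that builds a numactl prefix for a given interface. This is a sketch under stated assumptions: the function names and the SYSFS_NET override are hypothetical conveniences for this example (the override lets the lookup be tried without Solarflare hardware), and using --membind as the memory policy is one possible choice, not a documented Onload requirement.

```shell
#!/bin/sh
# Print the NUMA node local to an interface, read from sysfs.
# SYSFS_NET is an override used only by this sketch; it defaults to the
# real sysfs location.
iface_numa_node() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/device/numa_node"
}

# Print a numactl prefix binding both CPU and memory to that node.
numactl_prefix_for_iface() {
    node=$(iface_numa_node "$1") || return 1
    # sysfs reports -1 when the platform exposes no NUMA locality.
    [ "$node" -ge 0 ] || return 1
    echo "numactl --cpunodebind=$node --membind=$node"
}
```

On a configured server the result could be used as a prefix, for example `$(numactl_prefix_for_iface eth2) onload <app_name>`, where eth2 is an illustrative interface name.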
Application Processing
The majority of processing by Onload occurs in the context of the Onloaded
application. Various methods can be used to affinitize the Onloaded process:
numactl, taskset or cpusets, or the CPU affinity can be set programmatically.
Workqueues
An Onloaded application will create two shared workqueues and one per-stack
workqueue. The implementation of the workqueue differs between Linux kernels,
and so does the method used to affinitize workqueues.
On more recent Linux kernels (3.10+) the Onload workqueues will be initially
affinitized to the node on which they are created. Therefore if the driver load is
affinitized and the Onloaded application is affinitized to the correct node, Onload
stacks will be created on the correct node and there will be no further work
required.
Specifying a cpumask via sysfs for a workqueue is NOT recommended as this can
break ordering requirements.
On older Linux kernels dedicated workqueue threads are created, and these can be
affinitized using taskset or cpusets. Identify the two workqueues shared by all
Onload stacks:
onload-wqueue
sfc_vi
Identify the per-stack workqueue, which has a name in the format
onload-wq:<stack id> (e.g. onload-wq:1 for stack 1).
Use the onload_stackdump command to identify Onload stacks and the PID of the
process that created the stack:
# onload_stackdump
#stack-id stack-name pids
0 - 106913
Use the Linux pidof command to identify the PIDs for Onload workqueues:
# pidof onload-wq:0 sfc_vi onload-wqueue
106930 105409 105431
It is recommended that the shared workqueues are affinitized immediately after the
driver is loaded and the per-stack queue immediately after stack creation.
Interrupts
When Onload is being used in an interrupt-driven mode (see Interrupt Handling -
Using Onload on page 38) interrupts should be affinitized to the same NUMA node
that is running the Onload application, but not to the same CPU core as the
application. When Onload is spinning (busy-wait) there will be few (if any)
interrupts, so it is not a real concern where these are handled.
Verification
The onload_stackdump lots command is used to verify that allocations occur on the
required NUMA node:
# onload_stackdump lots | grep numa
numa nodes: creation=0 load=0
numa node masks: packet alloc=1 sock alloc=1 interrupt=1
The CPU affinity of individual Onloaded threads can be identified with the following
command:
# onload_stackdump threads
5.5 Interrupt Handling - Kernel Driver
Default Behavior
Using the value identified from the rss_cpus option, the Solarflare NET driver will
create a number of receive (and transmit) queues (termed an “RSS channel”) for
each physical interface. By default the driver creates one RSS channel per CPU core
detected in the server, up to a maximum of 32.
The rss_cpus sfc driver module option can be set in a user-created file <sfc.conf> in
the /etc/modprobe.d directory. The driver must be reloaded before the option
becomes effective. For example, rss_cpus can be set to an integer value:
options sfc rss_cpus=4
In the above example 4 receive queues are created per Solarflare interface. The
default value is rss_cpus=cores. Other available options are rss_cpus=<int>,
rss_cpus=hyperthreads and rss_cpus=packages.
NOTE: If the sfc driver module parameter rss_numa_local is enabled, RSS will be
restricted to use cores/hyperthreads on the NUMA node local to the Solarflare
adapter.
Affinitizing RSS Channels to CPUs
As described in the previous section, the default behavior of the Solarflare network
driver is to create one RSS channel per CPU core. At load time the driver affinitizes
the interrupt associated with each RSS channel to a separate CPU core so the
interrupt load is evenly distributed over the available CPU cores.
NOTE: These initial interrupt affinities will be disrupted and changed if the Linux
IRQ balancer daemon is running. To stop the IRQ balancer use the following
command:
# service irqbalance stop
In the following example, we have 2 Solarflare dual-port adapters (a total of 4
network interfaces) installed in a server with 2 CPU sockets and 8 cores per socket
(hyperthreading is disabled).
If we set rss_cpus=4, each interface will create 4 RSS channels. The driver takes
care to spread the affinitized interrupts evenly over the CPU topology, i.e. evenly
between the two CPU sockets and evenly over shared L2/L3 caches.
The driver also attempts to spread the interrupt load of the multiple network
interfaces by using different CPU cores for different interfaces:
Table 3: Example RSS Channel Mapping
Interface   Num of rx queues   Map to cores
1           4                  0,1,2,3
2           4                  4,5,6,7
3           4                  8,9,10,11
4           4                  12,13,14,15
With 4 receive queues created per interface this results, on this machine, in the first
network interface mapping to the four lowest-numbered CPU cores, i.e. two cores
from each CPU socket as illustrated below. The next network interface uses the next
four CPUs, until each CPU core is loaded with a single RSS channel as illustrated in
Figure 3 below.
Figure 3: Mapping RSS Channels to CPU cores.
To identify the mapping of receive queues to CPU cores, use the following
command:
# cat /proc/interrupts | grep eth4
106: 19  0  0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-0
107:  0 11  0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-1
108:  0  0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-2
109:  0  0  0 2 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-3
Note that each receive queue has an assigned IRQ. Receive queue eth4-0 is served
by IRQ 106, eth4-1 by IRQ 107 etc.
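The mapping can also be read mechanically. The Python sketch below parses text in the /proc/interrupts layout (IRQ number, one count per CPU, controller type, then the queue name) and reports which CPU has serviced each queue most often. The function names are hypothetical, the sample is shortened to a 4-CPU machine for brevity, and queue names of the form eth4-0 are assumed.

```python
def irq_cpu_counts(proc_interrupts_text, interface):
    """Map each queue name starting with `interface` to (irq, per-CPU counts)."""
    result = {}
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        name = fields[-1]
        if not name.startswith(interface):
            continue
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break  # reached the interrupt controller column
            counts.append(int(field))
        result[name] = (fields[0].rstrip(":"), counts)
    return result


def busiest_cpu(counts):
    """Index of the CPU that has handled the most interrupts (first on a tie)."""
    return max(range(len(counts)), key=counts.__getitem__)


SAMPLE = """\
106: 19  0  0 0 IR-PCI-MSI-edge eth4-0
107:  0 11  0 0 IR-PCI-MSI-edge eth4-1
108:  0  0 10 0 IR-PCI-MSI-edge eth4-2
109:  0  0  0 2 IR-PCI-MSI-edge eth4-3
"""
```

On a live system the text would come from reading /proc/interrupts; the counts are cumulative, so the busiest CPU reflects where interrupts have been handled since boot rather than the current affinity setting.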
sfcaffinity_config
The OpenOnload distribution also includes the sfcaffinity_config script, which
can also be used to affinitize RSS channel interrupts. sfcaffinity_config has a
number of command line options, but a common way of running it is with the auto
command:
# sfcaffinity_config auto
Auto instructs sfcaffinity_config to set interrupt affinities to evenly spread the
RSS channels over the available CPU cores. Using the above scenario as an example,
where rss_cpus has been set to 4, the command will affinitize the interrupt
associated with each receive queue evenly over the CPU topology, in this case the
first four CPU cores.
sfcaffinity_config: INFO: eth4: Spreading 4 interrupts evenly over 2 shared caches
sfcaffinity_config: INFO: eth4: bind rxq 0 (irq 106) to core 1
sfcaffinity_config: INFO: eth4: bind rxq 1 (irq 107) to core 0
sfcaffinity_config: INFO: eth4: bind rxq 2 (irq 108) to core 3
sfcaffinity_config: INFO: eth4: bind rxq 3 (irq 109) to core 2
sfcaffinity_config: INFO: eth4: configure sfc_affinity n_rxqs=4
cpu_to_rxq=1,0,3,2,1,0,3,2,1,0,3,2,1,0,3,2
Figure 4: Mapping with sfcaffinity_config auto
In this example, after running the sfcaffinity_config auto command, interrupts
for the 4 receive queues from the 4 interfaces are now all directed to the same 4
cores 0,1,2,3 as illustrated by Figure 4.
NOTE: Running the sfcaffinity_config auto command also disables the kernel
IRQ balance service to prevent interrupts being redirected by the kernel to other
cores.
Restrict RSS to local NUMA node
The sfc driver module parameter rss_numa_local will restrict RSS to only use CPU
cores or hyperthreads (if hyperthreading is enabled) on the NUMA node local to the
Solarflare adapter.
rss_numa_local does NOT restrict the number of RSS channels created by the
driver. It instead works by restricting the RSS spreading so that only the channels
on the local NUMA node will receive kernel driver traffic.
In the default case (where rss_cpus=cores), one RSS channel is created per CPU
core. However, the driver adjusts the RSS settings such that only the RSS channels
affinitized to the local CPU socket receive traffic. It therefore has no effect on the
Onload allocation and use of receive queues and interrupts.
Figure 5 below identifies the receive queue interrupt spread when rss_cpus=4
and rss_numa_local=1. In this machine adapter #1 is attached to the PCIe bus on
socket #0, with adapter #2 attached to the PCIe bus on socket #1.
Figure 5: Mapping with rss_numa_local
Restrict RSS Receive Queues
The ethtool -X command can also be used to restrict the receive queues accessible
by RSS. In the following example rss_cpus=4 and ethtool -x identifies the 4
receive queues per interface:
# ethtool -x eth4
RX flow hash indirection table for eth4 with 4 RX ring(s):
0: 0 1 2 3 0 1 2 3
8: 0 1 2 3 0 1 2 3
16: 0 1 2 3 0 1 2 3
24: 0 1 2 3 0 1 2 3
32: 0 1 2 3 0 1 2 3
40: 0 1 2 3 0 1 2 3
48: 0 1 2 3 0 1 2 3
56: 0 1 2 3 0 1 2 3
64: 0 1 2 3 0 1 2 3
72: 0 1 2 3 0 1 2 3
80: 0 1 2 3 0 1 2 3
88: 0 1 2 3 0 1 2 3
96: 0 1 2 3 0 1 2 3
104: 0 1 2 3 0 1 2 3
112: 0 1 2 3 0 1 2 3
120: 0 1 2 3 0 1 2 3
To restrict RSS to spread receive flows evenly over the first 2 receive queues, use
ethtool -X and then display the indirection table again:
# ethtool -X eth4 equal 2
RX flow hash indirection table for eth4 with 4 RX ring(s):
0: 0 1 0 1 0 1 0 1
8: 0 1 0 1 0 1 0 1
16: 0 1 0 1 0 1 0 1
24: 0 1 0 1 0 1 0 1
32: 0 1 0 1 0 1 0 1
40: 0 1 0 1 0 1 0 1
48: 0 1 0 1 0 1 0 1
56: 0 1 0 1 0 1 0 1
64: 0 1 0 1 0 1 0 1
72: 0 1 0 1 0 1 0 1
80: 0 1 0 1 0 1 0 1
88: 0 1 0 1 0 1 0 1
96: 0 1 0 1 0 1 0 1
104: 0 1 0 1 0 1 0 1
112: 0 1 0 1 0 1 0 1
120: 0 1 0 1 0 1 0 1
Interrupt Handling - Using Onload
A thread accelerated by Onload will either be interrupt-driven or it will be spinning.
When the thread is interrupt-driven, a thread which calls into Onload to read from
its receive queue, and for which there are no received packets to be processed, will
‘sleep’ until an interrupt from the kernel informs it that there is more work to do.
When a thread is spinning, it is busy-waiting on its receive queue until packets are
received, in which case the packets are retrieved and the thread returns
immediately to the receive queue, or until the spin period expires. If the spin period
expires the thread will relinquish the CPU core and ‘sleep’ until an interrupt from the
kernel informs it that further packets have been received. If the spin period is set
greater than the packet inter-arrival time, the spinning thread can continue to spin
and retrieve packets without interrupts occurring. Even when spinning, an
application might experience a few interrupts.
As a general rule, when spinning, only a few interrupts will be expected, so
performance is typically insensitive as to which CPU core processes the interrupts.
However, when Onload is interrupt-driven, performance can be sensitive to where
the interrupts are handled and will typically benefit from handling them on the
same CPU socket as the application thread handling the socket I/O. To control the
CPU core processing Onload interrupts use the EF_IRQ_CORE or EF_IRQ_CHANNEL
environment variables.
Using EF_PACKET_BUFFER_MODE 0 or 2, an Onload stack will use one or more of the
interrupts assigned to the NET driver receive queues, where the CPU core handling
the interrupts is defined by the RSS mapping of receive queues to CPU cores.
Using EF_PACKET_BUFFER_MODE 1 or 3, the Onload stack creates dedicated
interrupts. See Table 4 below for details.
Another environment variable, EF_IRQ_CHANNEL, can be used to select the NET
driver receive channel that will be used to handle interrupts for an Onload stack.
Onload interrupts are handled by the same core assigned to the NET driver receive
channel.
When Onload is using a NET driver RSS channel for its source of interrupts, it can be
useful to dedicate this channel to Onload and prevent the driver from using this
channel for RSS traffic. See the above sections “Restrict RSS Receive Queues” and
“Restrict RSS to local NUMA node” for methods of achieving this.
5.6 Performance Jitter
On any system, reducing or eliminating jitter is key to gaining optimum performance.
However, the causes of jitter leading to poor performance can be difficult to define
and difficult to remedy. The following section identifies some key points that should
be considered.
• A first step towards reducing jitter should be to consider the configuration
settings specified in the Low Latency Quickstart Guide on page 4. This includes
the disabling of the irqbalance service, interrupt moderation settings and
measures to prevent CPU cores switching to power saving modes.
• Use isolcpus to isolate CPU cores that the application, or at least the critical
threads of the application, will use and prevent OS housekeeping tasks and
other non-critical tasks from running on these cores.
• Set an application thread running on one core and the interrupts for that
thread on a separate core, but on the same physical CPU package. Even when
spinning, interrupts may still occur, for example, if the application fails to call
into the Onload stack for extended periods because it is busy doing other work.
Table 4: Selecting Onload interrupts
EF_PACKET_BUFFER_MODE   EF_IRQ_CORE
0 (default) or 2        Onload interrupts are handled via the NET driver
                        receive channel interrupts.
                        It is only possible for interrupts to be handled on
                        the requested core if a NET driver interrupt is
                        assigned to the selected core.
1 or 3                  Onload creates dedicated interrupts for each
                        Onload stack and an interrupt is assigned to the
                        requested core.
• Ideally each spinning thread will be allocated a separate core so that, in the
event that it blocks or is descheduled, it will not prevent other important
threads from doing work. A common cause of jitter is more than one spinning
thread sharing the same CPU core. Jitter spikes may indicate that one thread is
being held off the CPU core by another thread.
• When EF_STACK_LOCK_BUZZ=1, threads will spin for the EF_BUZZ_USEC
period while they wait to acquire the stack lock. Lock buzzing can lead to
unfairness between threads competing for a lock, and so result in resource
starvation for one. Occurrences of this are counted in the 'stack_lock_buzz'
counter. EF_STACK_LOCK_BUZZ is enabled by default when EF_POLL_USEC
(spinning) is enabled.
• If a multithreaded application is doing lots of socket operations, stack lock
contention will lead to send/receive performance jitter. In such cases improved
performance can be had when each contending thread has its own stack. This
can be managed with EF_STACK_PER_THREAD, which creates a separate Onload
stack for the sockets created by each thread. If separate stacks are not an
option then it may be beneficial to reduce the EF_BUZZ_USEC period or to
disable stack lock buzzing altogether.
• It is always important that threads that need to communicate with each other
are running on the same CPU package so that these threads can share a
memory cache.
• Jitter may also be introduced when some sockets are accelerated and others
are not. Onload will ensure that accelerated sockets are given priority over
non-accelerated sockets. The resulting delay will only be in the region of a few
microseconds, not milliseconds, and the penalty will always be on the side of the
non-accelerated sockets. The environment variables EF_POLL_FAST_USEC and
EF_POLL_NONBLOCK_FAST_USEC can be configured to manage the extent of
priority of accelerated sockets over non-accelerated sockets.
• If traffic is sparse, spinning will deliver the same latency benefits, but the user
should ensure that the spin timeout period, configured using the
EF_POLL_USEC variable, is sufficiently long to ensure the thread is still spinning
when traffic is received.
• When applications only need to send and receive occasionally it may be
beneficial to implement a keepalive/heartbeat mechanism between peers.
This has the effect of retaining the process data in the CPU memory cache.
Calling send or receive after a delay can result in the call taking measurably
longer, due to cache effects, than if it is called in a tight loop.
• On some servers BIOS settings such as power and utilization monitoring can
cause unnecessary jitter by performing monitoring tasks on all CPU cores. The
user should check the BIOS and decide if periodic tasks (and the related SMIs)
can be disabled.
• The Solarflare sysjitter utility can be used to identify and measure jitter on all
cores of an idle system. Refer to Sysjitter on page 27 for details.
Using Onload Tuning Profiles
Environment variables set in the application user space can be used to configure and
control aspects of the accelerated application’s performance. These variables can be
exported using the Linux export command e.g.
export EF_POLL_USEC=100000
Onload supports tuning profile script files, which are used to group environment
variables within a single file to be called from the Onload command line.
The latency profile sets EF_POLL_USEC=100000, setting the busy-wait spin
timeout to 100 milliseconds. The profile also disables TCP faststart for new or idle
connections where additional TCP ACKs will add latency to the receive path. To use
the profile include it on the onload command line e.g.
onload --profile=latency netperf -H onload2sfc -t TCP_RR
Following Onload installation, profiles provided by Solarflare are located in the
following directory. This directory will be deleted by the onload_uninstall
command:
/usr/libexec/onload/profiles
User-defined environment variables can be written to a user-defined profile script
file (having a .opf extension) and stored in any directory on the server. The full path
to the file should then be specified on the onload command line e.g.
onload --profile=/tmp/myprofile.opf netperf -H onload2sfc -t TCP_RR
As an example the latency profile, provided by the Onload distribution, is shown
below:
# Onload low latency profile.
# Enable polling / spinning. When the application makes a blocking call
# such as recv() or poll(), this causes Onload to busy wait for up to 100ms
# before blocking.
onload_set EF_POLL_USEC=100000
# Disable FASTSTART when connection is new or has been idle for a while.
# The additional acks it causes add latency on the receive path.
onload_set EF_TCP_FASTSTART_INIT 0
onload_set EF_TCP_FASTSTART_IDLE 0
For a complete list of environment variables refer to Parameter Reference on
page 146.
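When reviewing a profile it can help to list exactly which variables it sets. The Python sketch below is a reading aid only, not the mechanism Onload uses to apply a profile: it collects the onload_set lines of a profile into a dictionary, accepting both the NAME=VALUE and NAME VALUE spellings that appear in the listing above. The function name parse_profile is hypothetical.

```python
def parse_profile(text):
    """Return {variable: value} for each onload_set line in a profile file."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] != "onload_set":
            continue
        rest = parts[1]
        if "=" in rest:
            name, value = rest.split("=", 1)
        else:
            pieces = rest.split(None, 1)
            if len(pieces) != 2:
                continue  # malformed line: variable without a value
            name, value = pieces
        settings[name.strip()] = value.strip()
    return settings


LATENCY_PROFILE = """\
# Onload low latency profile.
onload_set EF_POLL_USEC=100000
onload_set EF_TCP_FASTSTART_INIT 0
onload_set EF_TCP_FASTSTART_IDLE 0
"""
```

The resulting dictionary can be compared against the Parameter Reference, or against the environment of a running process, to confirm which settings a profile is expected to apply.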
Benchmark Testing
Benchmark procedures using Onload, netperf and sfnt_pingpong are described in
the Low Latency Quickstart Guide on page 4.
5.7 Advanced Tuning
Advanced tuning requires closer examination of the application performance. The
application should be tuned to achieve the following objectives:
• To have as much processing at user level as possible.
• To have as few interrupts as possible.
• To eliminate drops.
• To minimize lock contention.
Onload includes a diagnostic application called onload_stackdump, which can be
used to monitor Onload performance and to set tuning options.
The following sections demonstrate the use of onload_stackdump to examine
aspects of the system performance and set environment variables to achieve the
tuning objectives.
For further examples and use of onload_stackdump refer to onload_stackdump on
page 219.
Monitoring Using onload_stackdump
To use onload_stackdump, enter the following command:
onload_stackdump [command]
To list available commands and view documentation for onload_stackdump enter
the following commands:
onload_stackdump doc
onload_stackdump -h
A specific stack number can also be provided on the onload_stackdump command
line.
Worked Examples
Prefault Packet Buffers
The Onload environment variable EF_PREFAULT_PACKETS will cause the user
process to ‘touch’ the specified number of packet buffers when an Onload stack is
created. This means that memory for these packet buffers is preallocated and
memory-mapped into the user process address space.
Preallocation is advised to prevent latency jitter caused by the allocation and
memory-mapping overheads.
When deciding how many packets to prefault, the user should look at the alloc value
when the onload_stackdump packets command is run. The alloc value is a high
water mark identifying the maximum number of packets being used by the stack
at any single point in time. Setting EF_PREFAULT_PACKETS to at least this value is
recommended.
$ onload_stackdump packets
ci_netif_pkt_dump_all: id=6
pkt_bufs: size=2048 max=32768 alloc=576 free=50 async=0
pkt_bufs: rx=525 rx_ring=522 rx_queued=3
pkt_bufs: tx=1 tx_ring=0 tx_oflow=0 tx_other=1
509: 0x8000 Rx
1: 0x4000 Nonb
n_zero_refs=66 n_freepkts=50 estimated_free_nonb=16
free_nonb=0 nonb_pkt_pool=a39ffff
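The alloc value can be pulled out of this output with a short script. The following Python sketch reads the first pkt_bufs line and proposes a value for EF_PREFAULT_PACKETS; the function names and the headroom multiplier are hypothetical, and the max field is assumed here to reflect the stack's packet buffer limit.

```python
import re


def packet_buffer_stats(stackdump_text):
    """Return the key=value fields of the pkt_bufs line that carries alloc=."""
    for line in stackdump_text.splitlines():
        if line.strip().startswith("pkt_bufs:") and "alloc=" in line:
            return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    return {}


def suggested_prefault(stats, headroom=1.0):
    """High water mark scaled by an optional safety factor, capped at max."""
    wanted = int(stats["alloc"] * headroom)
    return min(wanted, stats["max"])


SAMPLE = """\
ci_netif_pkt_dump_all: id=6
pkt_bufs: size=2048 max=32768 alloc=576 free=50 async=0
pkt_bufs: rx=525 rx_ring=522 rx_queued=3
"""
```

Because alloc is a high water mark, it is most useful when sampled after the application has run under a representative load.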
NOTE: It is not possible to prefault a number of packets exceeding the current value
of EF_MAX_PACKETS, and attempts to do this will result in a warning similar to the
following:
ci_netif_pkt_prefault_reserve: Prefaulted only 63488 of 64000
The warning message is harmless. It informs the user that not all the requested
packets could be prefaulted (because some have already been allocated to receive
rings).
When deciding how many packets to prefault the user should consider that Onload
must allocate, from the EF_MAX_PACKETS pool, a number of packet buffers per
receive ring per interface. Once these have been allocated, any remainder can be
prefaulted.
Users who need to prefault the maximum possible number of available packets
can set EF_PREFAULT_PACKETS and EF_MAX_PACKETS to the same value and just
ignore the warnings from Onload:
EF_PREFAULT_PACKETS=64000 EF_MAX_PACKETS=64000 onload <myapplication>...
Refer to Appendix A on page 146 for details of the EF_PREFAULT_PACKETS variable.
CAUTION: Prefaulting packet buffers for one stack will reduce the number of
buffers available for others. Users should consider that over-allocation to
one stack might mean spare (redundant) packet buffer capacity that could be better
allocated elsewhere.
Processing at User Level
Many applications can achieve better performance when most processing occurs at
user level rather than kernel level. To identify how an application is performing,
enter the following command:
$ onload_stackdump lots | grep polls
k_polls: 673
u_polls: 41
The output identifies many more k_polls than u_polls, indicating that the
stack is operating mainly at kernel level and may not be achieving optimal
performance.
Solution
Terminate the application and set the EF_POLL_USEC parameter to 100000. Restart
the application and rerun onload_stackdump:
export EF_POLL_USEC=100000
$ onload_stackdump lots | grep polls
k_polls: 673
u_polls: 1289
The output identifies that the number of u_polls is far greater than the
number of k_polls, indicating that the stack is now operating mainly at
user level.
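The comparison of k_polls and u_polls can be expressed as a single ratio. The Python sketch below parses the two counters and computes the share of polls made from user level; the function names are hypothetical, and the ratio is only an illustrative way of summarizing the counters, not a metric defined by Onload.

```python
def poll_counters(text):
    """Collect every 'name: value' counter whose name ends in _polls."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name.endswith("_polls") and value.isdigit():
            counters[name] = int(value)
    return counters


def user_level_share(counters):
    """Fraction of (k_polls + u_polls) that were made from user level."""
    kernel = counters.get("k_polls", 0)
    user = counters.get("u_polls", 0)
    total = kernel + user
    return user / total if total else 0.0


BEFORE = "k_polls: 673\nu_polls: 41\n"
AFTER = "k_polls: 673\nu_polls: 1289\n"
```

For the two outputs above the share rises from roughly 6% to roughly 66%, which matches the conclusion that the stack has moved from mainly kernel-level to mainly user-level polling.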
Counter                   Description
k_polls                   Number of times the socket event queue was
                          polled from the kernel.
u_polls                   Number of times the socket event queue was
                          polled from user space.
periodic_polls            Number of times a periodic timer has polled for
                          events.
interrupt_polls           Number of times an interrupt polled for
                          network events.
deferred_polls            Number of times poll has been deferred to the
                          stack lock holder.
timeout_interrupt_polls   Number of times timeout interrupts polled for
                          network events.
As Few Interrupts as Possible
A tuned application will reach a balance between the number/rate of interrupts
processed and the amount of real work that gets done, e.g. process multiple packets
per interrupt rather than one. Even spinning applications can benefit from the
occasional interrupt, e.g. when a spinning thread has been descheduled from a
CPU, an interrupt will prod the thread back to action when further work has to be
done.
# onload_stackdump lots | grep ^interrupt
Counter                   Description
Interrupts                Total number of interrupts received for the stack.
Interrupt polls           Number of times the stack is polled, invoked by interrupt.
Interrupt evs             Number of events processed when invoked by an interrupt.
Interrupt wakes           Number of times the application is woken by interrupt.
Interrupt primes          Number of times interrupts are re-enabled (after spinning or
                          polling the stack).
Interrupt no events       Number of stack polls for which there was no event to
                          recover.
Interrupt lock contends   The application polled the stack and has the lock before an
                          interrupt fired.
Interrupt budget limited  Number of times, when handling a poll in an interrupt, the poll
                          was stopped when the NAPI budget was reached. Any remaining
                          events are then processed on the stack workqueue.
Solution
If an application is observed taking lots of interrupts it may be beneficial to increase
the spin time with the EF_POLL_USEC variable or to set a high interrupt
moderation value for the net driver using ethtool.
The number of interrupts on the system can also be identified from
/proc/interrupts.
Eliminating Drops
The performance of networks is impacted by any packet loss. This is especially
pronounced for reliable data transfer protocols that are built on top of unicast or
multicast UDP sockets.
First check to see if packets have been dropped by the network adapter before
reaching the Onload stack. Use ethtool to collect stats directly from the network
adapter:
# ethtool -S enps0f0 | grep drop
rx_noskb_drops: 0
port_rx_nodesc_drops: 0
port_rx_dp_di_dropped_packets: 681618610
Solution
If packet loss is observed at the network level due to a lack of receive buffering, try
increasing the size of the receive descriptor queue via EF_RXQ_SIZE. If packet
drops are observed at the socket level consult the application documentation. It
may also be worth experimenting with socket buffer sizes (see EF_UDP_RCVBUF).
Setting the EF_EVS_PER_POLL variable to a higher value may also improve efficiency.
Refer to Appendix A for a description of this variable.
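Adapter counters reported by ethtool -S are cumulative, so a single reading does not show whether drops are still occurring. The Python sketch below compares two readings taken some time apart and reports only the drop counters that increased. The function names are hypothetical, and the second sample contains made-up values purely for illustration.

```python
def drop_counters(ethtool_stats_text):
    """Return {name: value} for every counter whose name contains 'drop'."""
    counters = {}
    for line in ethtool_stats_text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and "drop" in name and value.isdigit():
            counters[name] = int(value)
    return counters


def drop_deltas(before, after):
    """Counters that increased between two readings, with the size of the increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


FIRST = """\
rx_noskb_drops: 0
port_rx_nodesc_drops: 0
port_rx_dp_di_dropped_packets: 681618610
"""
# Hypothetical second reading: only the descriptor drop counter has moved.
SECOND = """\
rx_noskb_drops: 0
port_rx_nodesc_drops: 5
port_rx_dp_di_dropped_packets: 681618610
"""
```

In this illustrative case only port_rx_nodesc_drops is still rising, which points at receive descriptor exhaustion rather than filter drops.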
The drop counters reported by ethtool are:
Counter                         Description
rx_noskb_drops                  Number of packets dropped when there are
                                no further socket buffers to use.
port_rx_nodesc_drops            Number of packets dropped when there are
                                no further descriptors in the rx ring buffer to
                                receive them.
port_rx_dp_di_dropped_packets   Number of packets dropped because filters
                                indicate the packets should be dropped. This
                                can happen when packets don’t match any
                                filter or the matched filter indicates the
                                packet should be dropped.
Minimizing Lock Contention
Lock contention can greatly affect performance. When threads share a stack, a
thread holding the stack lock will prevent another thread from doing useful work.
Applications with fewer threads may be able to create a stack per thread (see
EF_STACK_PER_THREAD and Stacks API on page 193).
Use onload_stackdump to identify instances of lock contention:
# onload_stackdump lots | egrep "(lock_)|(sleep)"
Counter                              Description
periodic_lock_contends               Number of times periodic timer could not get the
                                     stack lock.
interrupt_lock_contends              Number of times the user level got the stack lock.
timeout_interrupt_lock_contends      Number of times timeout interrupts could not lock
                                     the stack.
sock_sleeps                          Number of times a thread has blocked on a single
                                     socket.
sock_sleep_primes                    Number of times select/poll/epoll enabled
                                     interrupts.
unlock_slow                          Number of times the slow path was taken to unlock
                                     the stack lock.
unlock_slow_pkt_waiter               Number of times packet memory shortage
                                     provoked the unlock slow path.
unlock_slow_socket_list              Number of times the deferred socket list provoked
                                     the unlock slow path.
unlock_slow_need_prime               Number of times interrupt priming provoked the
                                     unlock slow path.
unlock_slow_wake                     Number of times the unlock slow path was taken to
                                     wake threads.
unlock_slow_swf_update               Number of times the unlock slow path was taken to
                                     update sw filters.
unlock_slow_close                    Number of times the unlock slow path was taken to
                                     close sockets/pipes.
unlock_slow_syscall                  Number of times a syscall was needed on the
                                     unlock slow path.
lock_wakes                           Number of times a thread is woken when blocked
                                     on the stack lock.
stack_lock_buzz                      Number of times a thread has spun waiting for the
                                     stack lock.
sock_lock_sleeps                     Number of times a thread has slept waiting for a
                                     sock lock.
sock_lock_buzz                       Number of times a thread has spun waiting for a
                                     sock lock.
tcp_send_ni_lock_contends            Number of times TCP sendmsg() contended the
                                     stack lock.
udp_send_ni_lock_contends            Number of times UDP sendmsg() contended the
                                     stack lock.
getsockopt_ni_lock_contends          Number of times getsockopt() contended the stack
                                     lock.
setsockopt_ni_lock_contends          Number of times setsockopt() contended the stack
                                     lock.
lock_dropped_icmps                   Number of dropped ICMP messages not processed
                                     due to contention.
Solution
Performance will be improved when stack contention is kept to a minimum. When
threads share a stack it is preferable for a thread to spin rather than sleep when
waiting for a stack lock. The EF_BUZZ_USEC value can be increased to reduce
‘sleeps’. Where possible use stacks per process.
6 Onload Functionality
This chapter provides detailed information about specific aspects of Solarflare
Onload operation and functionality.
6.1 Onload Transparency
Onload provides significantly improved performance without the need to rewrite or
recompile the user application, whilst retaining complete interoperability with the
standard TCP and UDP protocols.
In the regular kernel TCP/IP architecture an application is dynamically linked to the
libc library. This OS library provides support for the standard BSD sockets API via a
set of ‘wrapper’ functions, with real processing occurring at the kernel level. Onload
also supports the standard BSD sockets API. However, in contrast to the kernel
TCP/IP, Onload moves protocol processing out of the kernel space and into the
user-level Onload library itself.
As a networking application invokes the standard socket API function calls, e.g.
socket(), read(), write() etc, these are intercepted by the Onload library making
use of the LD_PRELOAD mechanism on Linux. For each function call, Onload will
examine the file descriptor to identify sockets using a Solarflare interface, which
are processed by the Onload stack, whilst those not using a Solarflare
interface are transparently passed to the kernel stack.
6.2 Onload Stacks
An Onload 'stack' is an instance of a TCP/IP stack. The stack includes transmit and
receive buffers, open connections and the associated port numbers and stack
options. Each stack has associated with it one or more Virtual NICs (typically one per
physical port that stack is using).
In normal usage, each accelerated process will have its own Onload stack shared by
all connections created by the process. It is also possible for multiple processes to
share a single Onload stack instance (refer to Stack Sharing on page 62), and for a
single application to have more than one Onload stack. Refer to Onload Extensions
API on page 189.
6.3 Virtual Network Interface (VNIC)
The Solarflare network adapter supports 1024 transmit queues, 1024 receive
queues, 1024 event queues and 1024 timer resources per network port. A VNIC
(virtual network interface) consists of one unique instance of each of these
resources, which gives the VNIC client, i.e. the Onload stack, an isolated and safe
mechanism for sending and receiving network traffic. Received packets are steered
to the correct VNIC by means of IP/MAC filter tables on the network adapter and/or
Receive Side Scaling (RSS). An Onload stack allocates one VNIC per Solarflare
network port so it has a dedicated send and receive channel from user mode.
Following a reset of the Solarflare network adapter driver, all virtual interface
resources including Onload stacks and sockets will be reinstated. The reset
operation will be transparent to the application, but traffic will be lost during the
reset.
6.4 Functional Overview
When establishing its first socket, an application is allocated an Onload stack which
allocates the required VNICs.
When a packet arrives, IP filtering in the adapter identifies the socket and the data
is written to the next available receive buffer in the corresponding Onload stack. The
adapter then writes an event to an “event queue” managed by Onload. If the
application is regularly making socket calls, Onload is regularly polling this event
queue, so processing events directly, rather than via interrupts, is the normal
means by which an application is able to rendezvous with its data.
User-level processing significantly reduces kernel/user-level context switching, and
interrupts are only required when the application blocks, since when the
application is making socket calls, Onload is busy processing the event queue and
picking up new network events.
6.5 Onload with Mixed Network Adapters
A server may be equipped with Solarflare network interfaces and non-Solarflare
network interfaces. When an application is accelerated, Onload reads the Linux
kernel routing table (Onload will only consider the kernel default routing table) to
identify which network interface is required to make a connection. If a
non-Solarflare interface is required to reach a destination, Onload will pass the
connection to the kernel TCP/IP stack. No additional configuration is required to
achieve this as Onload does this automatically by looking in the IP route table.
6.6 Maximum Number of Network Interfaces
Onload supports up to 8 Solarflare network interfaces by default. If an application
requires more Solarflare interfaces, the values CI_CFG_MAX_INTERFACES and
CI_CFG_MAX_REGISTER_INTERFACES can be altered in the source code header file
src/include/ci/internal/transport_config_opt.h.
Following changes to these values it is necessary to rebuild and reinstall Onload.
6.7 Whitelist and Blacklist Interfaces
By default Onload will use the first ‘N’ Solarflare network interfaces for network I/O,
where N is equal to CI_CFG_MAX_REGISTER_INTERFACES (default value 8).
Supported from Onload 201502, the user is able to select which Solarflare interfaces
are to be used by Onload.
The intf_white_list Onload module option is a space-separated list of Solarflare
network adapter interfaces that Onload will use for network I/O.
• Interfaces identified in the whitelist will always be accelerated by Onload.
• Interfaces NOT identified in the whitelist will not be accelerated by Onload.
• An empty whitelist means that ALL Solarflare interfaces will be accelerated.
The intf_black_list Onload module option is a space-separated list of Solarflare
network adapter interfaces that Onload will not use for network I/O.
When an interface appears in both lists, the blacklist takes priority. Renaming of
interfaces after Onload has started will not be reflected in the access lists, and
changes to lists will only affect Onload stacks created after such changes, not
currently running stacks.
Onload module options can be specified in a user-created file in the
/etc/modprobe.d directory:
options onload intf_white_list=eth4
options onload intf_black_list="eth5 eth6"
These options are applied globally and cannot be applied to individual stacks.
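The whitelist and blacklist rules above can be summarized in a small Python model. This is a sketch of the documented rules only, not Onload's implementation; the function name is hypothetical and the interface names are examples.

```python
def accelerated_interfaces(solarflare_ifaces, white_list="", black_list=""):
    """Apply the documented rules to a list of Solarflare interface names.

    white_list and black_list are space-separated strings, in the same form
    as the intf_white_list and intf_black_list module options.
    """
    white = white_list.split()
    black = set(black_list.split())
    # An empty whitelist means every Solarflare interface is a candidate.
    candidates = [name for name in solarflare_ifaces if not white or name in white]
    # The blacklist takes priority over the whitelist.
    return [name for name in candidates if name not in black]
```

For example, with interfaces eth4, eth5 and eth6, a whitelist of "eth4" and a blacklist of "eth5 eth6", only eth4 remains; an interface named in both lists is not accelerated.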
6.8 Onloaded PIDs
To identify processes accelerated by Onload use the onload_fuser command:
# onload_fuser -v
9886 ping
Only processes that have created an Onload stack are present. Processes which are
loaded under Onload, but have not created any sockets, are not present. The
onload_stackdump command can also list accelerated processes. See List
Onloaded Processes on page 220 for details.
6.9 Onload and File Descriptors, Stacks and Sockets
For an Onloaded process it is possible to identify the file descriptors, Onload stacks
and sockets being accelerated by Onload. Use the /proc/<PID>/fd file, supplying
the PID of the accelerated process, e.g.
# ls -l /proc/9886/fd
total 0
lrwx------ 1 root root 64 May 14 14:09 0 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 1 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 2 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 3 -> onload:[tcp:6:3]
lrwx------ 1 root root 64 May 14 14:09 4 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 5 -> /dev/onload
lrwx------ 1 root root 64 May 14 14:09 6 -> onload:[udp:6:2]
Accelerated file descriptors are listed as symbolic links to /dev/onload. Accelerated
sockets are described in [protocol:stack:socket] format.
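The listing above can also be read programmatically. The following is a minimal
sketch, assuming the link targets always follow the onload:[protocol:stack:socket]
form shown in the example; the helper names are hypothetical and are not part of
the Onload distribution.

```python
import os
import re

# Matches link targets such as "onload:[tcp:6:3]".
_ONLOAD_LINK = re.compile(r"^onload:\[(\w+):(\d+):(\d+)\]$")


def parse_onload_link(target):
    """Return (protocol, stack_id, socket_id), or None for other targets."""
    match = _ONLOAD_LINK.match(target)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


def onload_sockets(pid):
    """Map each accelerated fd of a process to its (protocol, stack, socket)."""
    fd_dir = "/proc/%d/fd" % pid
    found = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while scanning
        parsed = parse_onload_link(target)
        if parsed is not None:
            found[int(name)] = parsed
    return found
```

Applied to the example listing, descriptor 3 would be reported as a TCP socket and
descriptor 6 as a UDP socket, both in stack 6.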
6.10 System calls intercepted by Onload
System calls intercepted by the Onload library are listed in the following file:
[onload]/src/include/onload/declare_syscalls.h.tmpl
6.11 Linux Sysctls
The Linux directory /proc/sys/net/ipv4 contains default settings which tune
different parts of the IPv4 networking stack. In many cases Onload takes its default
settings from the values in this directory. In some cases the default can be
overridden, for a specified process or socket, using socket options or with Onload
environment variables. The following tables identify the default Linux values and
how Onload tuning parameters can override the Linux settings.
Kernel Value   tcp_slow_start_after_idle
Description    controls congestion window validation as per RFC 2861. This is
               off by default in Onload, as it's not usually useful in modern
               switched networks
Onload value   #define CI_CFG_CONGESTION_WINDOW_VALIDATION
Comments       in transport_config_opt.h - recompile after changing.

Kernel Value   tcp_congestion_control
Description    determines what congestion control algorithm is used by TCP.
               Valid settings include reno, bic and cubic
Onload value   no direct equivalent - see the section on TCP Congestion
               Control
Comments       see EF_CONG_AVOID_SCALE_BACK

Kernel Value   tcp_adv_win_scale
Description    defines how quickly the TCP window will advance
Onload value   no direct equivalent - see the section on TCP Congestion
               Control
Comments       see EF_TCP_ADV_WIN_SCALE_MAX

Kernel Value   tcp_rmem
Description    the default size of sockets' receive buffers (in bytes)
Onload value   defaults to the currently active Linux settings, but is ignored
               on TCP accepted sockets. Refer to
               EF_TCP_RCVBUF_ESTABLISHED_DEFAULT.
Comments       can be overridden with the SO_RCVBUF socket option.
               can be set with EF_TCP_RCVBUF

Kernel Value   tcp_wmem
Description    the default size of sockets' send buffers (in bytes)
Onload value   defaults to the currently active Linux settings
Comments       EF_TCP_SNDBUF overrides SO_SNDBUF which overrides
               tcp_wmem

Kernel Value   tcp_dsack
Description    allows TCP to send duplicate SACKS
Onload value   uses the currently active Linux settings
Comments

Kernel Value   tcp_fack
Description    enables fast retransmissions
Onload value   fast retransmissions are always enabled - Onload uses the
               currently active Linux setting
Comments

Kernel Value   tcp_sack
Description    enable TCP selective acknowledgements, as per RFC 2018
Onload value   enabled by default - Onload uses the currently active Linux
               setting
Comments       clear bit 2 of EF_TCP_SYN_OPTS to disable

Kernel Value   tcp_max_syn_backlog
Description    the maximum size of a listening socket's backlog
Onload value   set with EF_TCP_BACKLOG_MAX
Comments

Kernel Value   tcp_synack_retries
Description    the maximum number of retries of SYN-ACKs
Onload value   set with EF_RETRANSMIT_THRESHOLD_SYNACK
Comments       Default value 5

Refer to the Parameter Reference on page 146 for details of environment variables.
6.12 Changing Onload Control Plane Table Sizes
Onload supports the following runtime configurable options which determine the
size of control plane tables:

Option                  Description                                    Default
max_layer2_interfaces   Sets the maximum number of network             50
                        interfaces, including physical interfaces,
                        VLANs and bonds, supported in Onload's
                        control plane.
max_neighs              Sets the maximum number of rows in the         1024
                        Onload ARP/neighbour table. The value is
                        rounded up to a power of two.
max_routes              Sets the maximum number of entries in the      256
                        Onload route table.

The table above identifies the default values for the Onload control plane tables. The
default values are normally sufficient for the majority of applications and creating
larger tables may impact application performance. If non-default values are needed,
the user should create a file in the /etc/modprobe.d directory. The file must have a
.conf extension and Onload options can be added to the file, a single option per line,
in the following format:
options onload max_neighs=512
Following changes Onload should be restarted using the reload command:
onload_tool reload
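When several module options are managed together, the contents of the .conf file
can be generated rather than typed. The sketch below only renders the text in the
format shown above; the helper name is hypothetical, and writing the result to a
file under /etc/modprobe.d and running onload_tool reload is left to the
administrator.

```python
def modprobe_options(module, **options):
    """Render one 'options <module> <name>=<value>' line per option."""
    lines = []
    for name, value in options.items():
        text = str(value)
        if " " in text:
            # Space-separated lists are quoted, as in intf_black_list="eth5 eth6".
            text = '"%s"' % text
        lines.append("options %s %s=%s" % (module, name, text))
    return "\n".join(lines) + "\n"
```

For example, modprobe_options("onload", max_neighs=512) yields the single line
shown in the format example above.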
6.13 SO_TIMESTAMP and SO_TIMESTAMPNS (software timestamps)
Setting the SO_TIMESTAMP option using setsockopt() enables timestamping on
TCP or UDP sockets. Functions such as recvmsg() and recvmmsg() can then
recover timestamp data, delivered as control messages (cmsg), for packets
received at the socket.
Onload implements a microsecond resolution software timestamping mechanism,
which avoids the need for a per-packet system call, thereby reducing the normal
timestamp overheads.
The Solarflare adapter will always deliver received packets to the receive ring buffer
in the order that these arrive from the network. Onload will append a software
timestamp to the packet metadata when it retrieves a packet from the ring buffer -
before the packet is transferred to a waiting socket buffer. From a TCP stream the
timestamp returned is that for the first available byte. Due to retransmissions and
any reordering, timestamps may not be monotonically increasing as these are
delivered to the application.
When the Onload application is interrupt driven, a received packet is timestamped
when the event interrupt for the packet fires. If the Onload application is spinning,
a received packet is timestamped when the application calls receive. Spinning will
generally produce more accurate timestamps so long as the receiving application is
able to keep pace with the packet arrival rate.
The system call used to get a timestamp is clock_gettime() and the format of
timestamps is defined by struct timeval.
Applications preferring timestamps with nanosecond resolution can use
SO_TIMESTAMPNS in place of the normal (microsecond resolution) SO_TIMESTAMP
value.
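As an illustration of the socket calls described above, the following sketch
enables SO_TIMESTAMPNS on a UDP socket and reads the timestamp from the control
message returned by recvmsg(). It uses only the standard Linux socket API and
does not depend on Onload. The numeric value 35 for SO_TIMESTAMPNS is an
assumption taken from the common Linux ABI and should be checked against the
system headers; the function names are hypothetical.

```python
import socket
import struct

# Assumption: value from <asm-generic/socket.h> on common Linux architectures.
SO_TIMESTAMPNS = 35
SCM_TIMESTAMPNS = SO_TIMESTAMPNS


def parse_timespec(data):
    """Decode a native struct timespec into (seconds, nanoseconds)."""
    size = struct.calcsize("ll")
    return struct.unpack("ll", data[:size])


def open_timestamped_udp(port=0, host="127.0.0.1"):
    """Create a UDP socket with nanosecond receive timestamps enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPNS, 1)
    sock.bind((host, port))
    return sock


def recv_with_timestamp(sock, bufsize=2048):
    """Receive one datagram; return (payload, (sec, nsec)) or (payload, None)."""
    payload, ancdata, _flags, _addr = sock.recvmsg(bufsize, socket.CMSG_SPACE(16))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == SCM_TIMESTAMPNS:
            return payload, parse_timespec(cdata)
    return payload, None
```

The timestamp is returned as None when no timestamp control message accompanies
the datagram, so callers should be prepared for both cases.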
6.14 SO_TIMESTAMPING (Hardware Receive Timestamps)
Setting the SO_TIMESTAMPING option using setsockopt() enables hardware
timestamping on TCP or UDP sockets. Timestamps are generated by the adapter for
each received packet. Functions such as recvmsg() and recvmmsg() can then
recover hardware timestamps, delivered as control messages (cmsg), for packets
received from a socket.
• Supported only on Solarflare Flareon SFN7000 series adapters.
• An AppFlex license for hardware timestamps must be installed on the adapter.
  The PTP/timestamping license is installed on the SFN7322F during
  manufacture; such a license can be installed on other SFN7000 series adapters
  by the user.
• The Onload stack for the socket must have the environment variable
  EF_RX_TIMESTAMPING set - see Appendix A on page 146 for details.
• Received packets are timestamped when they enter the MAC on the SFN7000
  series adapter.
The format of timestamps is defined by struct timespec. Interested users should
read the kernel SO_TIMESTAMPING documentation for more details of how to use
this socket API. Kernel documentation can be found, for example, at:
https://www.kernel.org/doc/Documentation/networking/timestamping/
The Onload distribution includes an example application to demonstrate transmit
hardware timestamping:
/openonload-<version>/src/tests/onload/hwtimestamping
6.15 SO_TIMESTAMPING (Hardware Transmit Timestamps)
Onload from 201405 supports hardware timestamping of UDP and TCP packets
transmitted over a Solarflare interface.
Because the Linux kernel does not support hardware timestamps for TCP, Onload
provides an extension to the standard SO_TIMESTAMPING API with the
ONLOAD_SOF_TIMESTAMPING_STREAM socket option to support this. To receive
hardware timestamps for transmitted TCP packets, set the following socket options:
SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE |
ONLOAD_SOF_TIMESTAMPING_STREAM
To receive hardware timestamps for transmitted UDP packets, set the following
socket options:
SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE
Other socket flag combinations, not listed above, will be silently ignored.
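The two flag combinations above can be composed as follows. This is a sketch
only: the two SOF_ values are those defined in the Linux header
linux/net_tstamp.h, while the value of ONLOAD_SOF_TIMESTAMPING_STREAM is not
given in this guide, so it is passed in by the caller and should be taken from the
Onload extensions header. The function name is hypothetical.

```python
# Values from <linux/net_tstamp.h>.
SOF_TIMESTAMPING_TX_HARDWARE = 1 << 0
SOF_TIMESTAMPING_SYS_HARDWARE = 1 << 5


def tx_timestamping_flags(is_tcp, onload_stream_flag=None):
    """Build the SO_TIMESTAMPING flag word for hardware transmit timestamps.

    onload_stream_flag is the value of ONLOAD_SOF_TIMESTAMPING_STREAM from
    the Onload headers; it is required for TCP and unused for UDP.
    """
    flags = SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE
    if is_tcp:
        if onload_stream_flag is None:
            raise ValueError("TCP requires ONLOAD_SOF_TIMESTAMPING_STREAM")
        flags |= onload_stream_flag
    return flags
```

The resulting value would be passed to setsockopt() at the SOL_SOCKET level with
the SO_TIMESTAMPING option.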
To receive hardware transmit timestamps:
• Only supported on Solarflare Flareon™ SFN7000 series adapters.
• The adapter must have a PTP/HW timestamping license.
• The adapter must have a SolarCapture Pro license or Performance Monitoring
  license.
• Set EF_TX_TIMESTAMPING on stacks where transmit timestamping is required.
• Set EF_TIMESTAMPING_REPORTING to control the type of timestamp returned
  to the application. This is optional; by default Onload will report translated
  timestamps if the adapter clock has been fully synchronized to correct time by
  the Solarflare PTP daemon. In all cases Onload will always report raw
  timestamps. Refer to Parameter Reference on page 146 for full details of the
  EF_TIMESTAMPING_REPORTING variable.
• Solarflare PTP (sfptpd) must be running if timestamps are to be synchronized
  with an external PTP master clock.
For details of the SO_TIMESTAMPING API refer to the Linux documentation:
https://www.kernel.org/doc/Documentation/networking/timestamping/
The Onload distribution includes an example application to demonstrate transmit
hardware timestamping:
/openonload-<version>/src/tests/onload/hwtimestamping
6.16 SO_BINDTODEVICE
In response to the setsockopt() function call with SO_BINDTODEVICE, sockets
identifying non-Solarflare interfaces will be handled by the kernel and all sockets
identifying Solarflare interfaces will be handled by Onload. All sends from a socket
are sent via the bound interface and all TCP, UDP and Multicast packets received via
the bound interface are delivered only to the socket bound to the interface.
6.17 Multiplexed I/O
Linux supports three common methods for handling multiplexed I/O operation:
poll(), select() and the epoll set of functions.
The general behavior of the poll(), select() and epoll_wait() functions with
OpenOnload is as follows:
• If there are operations ready on any file descriptors, poll(), select() and
  epoll_wait() will return immediately. Refer to the Poll, Select and Epoll
  subsections for specific behavior details.
• If there are no file descriptors ready and spinning is not enabled, calls to
  poll(), select() and epoll_wait() will enter the kernel and block.
• In the cases of poll() and select(), when the set contains file descriptors
  that are not accelerated sockets, there is a slight latency overhead as Onload
  must make a system call to determine the readiness of these sockets. There is
  no such cost when using epoll_wait() and a system call is only needed when
  non-Onload descriptors become ready.
• If there are no file descriptors ready and spinning is enabled, OpenOnload will
  spin to ensure that accelerated sockets are polled a specified number of times
  before unaccelerated sockets are examined. This reduces the overhead
  incurred when OpenOnload has to call into the kernel and reduces latency on
  accelerated sockets.
The following subsections discuss the use of these I/O functions and OpenOnload
environment variables that can be used to manipulate behavior of the I/O
operation.
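For reference, a minimal epoll wait loop of the kind discussed in this section
might look like the sketch below. It uses only the standard Linux epoll interface
and contains nothing Onload-specific; whether the calls are handled at user level
or in the kernel is decided by the environment variables described in the
following subsections. The function name is hypothetical.

```python
import select
import socket


def wait_readable(socks, timeout=1.0):
    """Return the sockets from socks that become readable within timeout."""
    epoll = select.epoll()
    try:
        by_fd = {s.fileno(): s for s in socks}
        for fd in by_fd:
            epoll.register(fd, select.EPOLLIN)
        ready = epoll.poll(timeout)
        return [by_fd[fd] for fd, events in ready if events & select.EPOLLIN]
    finally:
        epoll.close()
```

If a descriptor is already readable the call returns immediately; otherwise it
blocks for up to the timeout, matching the first two cases in the list above when
spinning is not enabled.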
Poll, ppoll
The poll(), ppoll() file descriptor set can consist of both accelerated and
non-accelerated file descriptors. The environment variable EF_UL_POLL enables/
disables acceleration of the poll(), ppoll() function calls. Onload supports the
following options for the EF_UL_POLL variable:

Value   Behaviour
0       Disable acceleration at user level. Calls to poll(), ppoll() are
        handled by the kernel.
        Spinning cannot be enabled.
1       Enable acceleration at user level. Calls to poll(), ppoll() are
        processed at user level.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.

Additional environment variables can be employed to control the poll(), ppoll()
functions and to give priority to accelerated sockets over non-accelerated sockets
and other file descriptors. Refer to EF_POLL_FAST, EF_POLL_FAST_USEC and
EF_POLL_SPIN in Parameter Reference on page 146.
Select, pselect
The select(), pselect() file descriptor set can consist of both accelerated and
non-accelerated file descriptors. The environment variable EF_UL_SELECT enables/
disables acceleration of the select(), pselect() function calls. Onload supports
the following options for the EF_UL_SELECT variable:

Value   Behaviour
0       Disable acceleration at user level. Calls to select(), pselect() are
        handled by the kernel.
        Spinning cannot be enabled.
1       Enable acceleration at user level. Calls to select(), pselect() are
        processed at user level.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.

Additional environment variables can be employed to control the select(),
pselect() functions and to give priority to accelerated sockets over
non-accelerated sockets and other file descriptors. Refer to EF_SELECT_FAST and
EF_SELECT_SPIN in Parameter Reference on page 146.
Epoll
The epoll set of functions, epoll_create(), epoll_ctl(), epoll_wait(),
epoll_pwait(), are accelerated in the same way as poll and select. The
environment variable EF_UL_EPOLL enables/disables epoll acceleration. Refer to
the release change log for enhancements and changes to epoll behavior.
Using Onload an epoll set can consist of both Onload file descriptors and kernel file
descriptors. Onload supports the following options for the EF_UL_EPOLL
environment variable:

Value   Epoll Behaviour
0       Accelerated epoll is disabled and epoll_ctl(), epoll_wait() and
        epoll_pwait() function calls are processed in the kernel. Other
        function calls such as send() and recv() are still accelerated.
        Interrupt avoidance does not function and spinning cannot be enabled.
        If a socket is handed over to the kernel stack after it has been added to
        an epoll set, it will be dropped from the epoll set.
        onload_ordered_epoll_wait() is not supported.
1       Function calls to epoll_ctl(), epoll_wait(), epoll_pwait() are
        processed at user level.
        Delivers best latency except when the number of accelerated file
        descriptors in the epoll set is very large. This option also gives the best
        acceleration of epoll_ctl() calls.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.
        CPU overhead and latency increase with the number of file descriptors
        in the epoll set.
        onload_ordered_epoll_wait() is supported.
2       Calls to epoll_ctl(), epoll_wait(), epoll_pwait() are processed in
        the kernel.
        Delivers best performance for large numbers of accelerated file
        descriptors.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.
        CPU overhead and latency are independent of the number of file
        descriptors in the epoll set.
        onload_ordered_epoll_wait() is not supported.
3       Function calls to epoll_ctl(), epoll_wait(), epoll_pwait() are
        processed at user level.
        Delivers best acceleration latency for epoll_ctl() calls and scales well
        when the number of accelerated file descriptors in the epoll set is very
        large - and all sockets are in the same stack. The cost of the
        epoll_wait() is independent of the number of accelerated file
        descriptors in the set and depends only on the number of descriptors
        that become ready. The benefits will be less if sockets exist in different
        Onload stacks and in this case the recommendation is to use
        EF_UL_EPOLL=2.
        EF_UL_EPOLL=3 does not allow monitoring the readiness of the epoll
        file descriptors from another epoll/poll/select.
        EF_UL_EPOLL=3 cannot support epoll sets which exist across fork().
        Spinning can be enabled and interrupts are avoided until an application
        blocks.
        onload_ordered_epoll_wait() is supported.

The relative performance of epoll options 1 and 2 depends on the details of
application behavior as well as the number of accelerated file descriptors in the
epoll set. Behavior may also differ between earlier and later kernels and between
Linux realtime and non-realtime kernels. Generally the OS will allocate short time
slices to a user-level CPU intensive application, which may result in performance
degradation (latency spikes). A kernel-level CPU intensive process is less likely to be
descheduled, resulting in better performance. Solarflare recommend the user
evaluate options 1 and 2 for applications that manage many file descriptors, or try
option 3 (onload 201502 and later) when using very large sets and all sockets are in
the same stack.
Additional environment variables can be employed to control the epoll_ctl(),
epoll_wait() and epoll_pwait() functions and to give priority to accelerated
sockets over non-accelerated sockets and other file descriptors. Refer to
EF_EPOLL_CTL_FAST, EF_EPOLL_SPIN and EF_EPOLL_MT_SAFE in Parameter
Reference on page 146.
Refer to epoll - Known Issues on page 122.
6.18 Wire Order Delivery
When a TCP or UDP application is working with multiple network sockets
simultaneously it is difficult to ensure data is delivered to the application in the strict
order it was received from the wire across these sockets.
The onload_ordered_epoll_wait() API is an Onload alternative implementation
of epoll_wait() providing additional data allowing a receiving application to
recover in-order timestamped data from multiple sockets. To maintain wire order
delivery, only a specific number of bytes, as identified by the
onload_ordered_epoll_event, should be recovered from a ready socket.
• Ordering is done on a per-stack basis - for TCP and UDP sockets. Sockets must
  be in the same onload stack.
• Only data received from an Onload stack with a hardware timestamp will be
  ordered. The environment variable EF_RX_TIMESTAMPING should be enabled.
  File descriptors where timestamping information is not available may be
  included in the epoll set, but received data will be returned from these
  unordered.
• The application must use the epoll API and the
  onload_ordered_epoll_wait() function.
• The application must set the per-process environment variable
  EF_UL_EPOLL=1.
• EPOLLET and ONESHOT flags should NOT be used.
• A return value of zero from the wait function indicates there are no file
  descriptors ready with ordered data - unordered data may still be available.
Figure 6 demonstrates the Wire Order Delivery feature.
Figure 6: Wire Order Delivery
onload_ordered_epoll_wait() returning at point X would allow the following
data to be recovered:
• Socket A: timestamp of packet 1, bytes in packet 1.
• Socket B: timestamp of packet 2, bytes in packets 2 and 3.
onload_ordered_epoll_wait() returning again would recover timestamp of
packet 4 and bytes in packet 4.
The Wire Order Delivery feature is only available on Solarflare Flareon adapters
having a PTP/HW timestamping license. When receiving across multiple adapters,
Solarflare sfptpd (PTP) can ensure that adapters are closely synchronized with each
other and, if required, with an external PTP clock source.
Wire Order Delivery - Example API:
The Onload distribution includes example client/server applications to demonstrate
the wire order feature:
wire_order_server - uses onload_ordered_epoll_wait to receive ordered
data over a set of sockets. Received data is echoed back to the client on a single reply
socket.
wire_order_client - Sends sequenced data across the socket set, reads the reply
data from the server and ensures data is received in sequence.
Source code for the wire order API is available in:
openonload-<version>/src/tests/onload/wire_order
The example API is not compiled as part of the Onload install process. To build it,
do the following:
Ensure the mmake tool is in the current path (it can be found in the
openonload-<version>/scripts directory):
# export PATH=$PATH:/openonload-<version>/scripts
# cd /openonload-<version>/build/gnu_x86_64/tests/onload/wire_order
# USEONLOADEXT=1 make
To run the server:
# EF_RX_TIMESTAMPING=3 onload ./wire_order_server
To run the client:
# onload --profile=latency ./wire_order_client <ipserver>
By default the client will send data over 100 TCP sockets; the number of sockets is
controlled with the -s option. UDP can be selected using the -U option.
NOTE: To prevent sends being reordered between streams, the latency profile
should be used on the client side. The environment variable EF_RX_TIMESTAMPING
must be set on the server side.
6.19 Stack Sharing
By default each process using Onload has its own 'stack'. Refer to Onload Stacks for
definition. Several processes can be made to share a single stack, using the EF_NAME
environment variable. Processes with the same value for EF_NAME in their
environment will share a stack.
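A launcher that starts several processes in one shared stack could prepare their
environment as in the sketch below. The helper names, the stack name "feed" and
the application path are hypothetical; only the EF_NAME variable and the onload
launcher command come from this guide.

```python
import os


def shared_stack_env(stack_name, base_env=None):
    """Return a copy of the environment with EF_NAME set to stack_name."""
    env = dict(os.environ if base_env is None else base_env)
    env["EF_NAME"] = stack_name
    return env


def onload_command(app_argv):
    """Prefix an application command line with the onload launcher."""
    return ["onload"] + list(app_argv)
```

Two processes started with, for example,
subprocess.Popen(onload_command(["./app"]), env=shared_stack_env("feed"))
would carry the same EF_NAME value and so would be expected to share a stack,
subject to the UID and trust considerations described in this section.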
Stack sharing is one supported method to enable multiple processes using Onload
to be accelerated when receiving the same multicast stream or to allow one
application to receive a multicast stream generated locally by a second application.
Other methods to achieve this are Multicast Replication and Hardware Multicast
Loopback.
Stacks may also be shared by multiple processes in order to preserve and control
resources within the system. Stack sharing can be employed by processes handling
TCP as well as UDP sockets.
Stack sharing should only be requested if there is a trust relationship between the
processes. If two processes share a stack then they are not completely isolated: a
bug in one process may impact the other, or one process can gain access to the
other's privileged information (i.e. breach security). Once the EF_NAME variable is
set, any process on the local host can set the same value and gain access to the
stack.
By default Onload stacks can only be shared with processes having the same UID.
The EF_SHARE_WITH environment variable provides additional security while
allowing a different UID to share a stack. Refer to Parameter Reference on page 146
for a description of the EF_NAME and EF_SHARE_WITH variables.
Processes sharing an Onload stack should also not use huge pages. Onload will
issue a warning at startup and prevent the allocation of huge pages if
EF_SHARE_WITH identifies a UID of another process or is set to -1. If a process P1
creates an Onload stack, but is not using huge pages, and another process P2
attempts to share the Onload stack by setting EF_NAME, the stack options set by P1
will apply and allocation of huge pages in P2 will be prevented.
An alternative method of implementing stack sharing is to use the Onload
Extensions API and the onload_set_stackname() function which, through its
scope parameter, can limit stack access to the processes created by a particular user.
Refer to Onload Extensions API on page 189 for details.
6.20 Application Clustering
An application cluster is the set of Onload TCP or UDP stack sockets bound to the
same port. This feature dramatically improves the scaling of some applications
across multiple CPUs (especially those establishing many sockets from a TCP
listening socket).
Onload from version 201405 automatically creates a cluster using the
SO_REUSEPORT socket option. TCP or UDP processes running on RHEL 6.5 (and
later) setting this option can bind multiple sockets to the same TCP or UDP port.
NOTE: Some older Linux kernel/distributions do not have kernel support for
SO_REUSEPORT (introduced in the Linux 3.9 kernel). Onload contains experimental
support for SO_REUSEPORT on older kernel versions but this has yet to be fully
tested and verified by Solarflare. Users are free to try the Onload application
clustering feature on these kernels and report their findings via email to
support@solarflare.com.
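The socket calls an application makes to join a cluster are the ordinary
SO_REUSEPORT calls. The sketch below shows them using the standard socket API
on a kernel with SO_REUSEPORT support; it contains nothing Onload-specific, and
the function name is hypothetical.

```python
import socket


def reuseport_listener(port, host="127.0.0.1", backlog=16):
    """Create a TCP listening socket that shares its port via SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket before it is bound to the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Each thread or process in the cluster would call this with the same port number,
so that every member owns its own listening socket on that port.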
For TCP, clustering allows the established connections resulting from a listening
socket to be spread over a number of Onload stacks. Each thread/process creates its
own listening socket (using SO_REUSEPORT) on the same port, with each listening
socket residing in its own Onload stack. Handling of incoming new TCP connections
is spread via the adapter (using RSS) over the application cluster and therefore
over each of the listening sockets, resulting in each Onload stack, and therefore each
thread/process, handling a subset of the total traffic as illustrated in Figure 7 below.
Figure 7: Application Clustering - TCP
For UDP, clustering allows UDP unicast traffic to be spread over multiple applications
with each application receiving a subset of the total traffic load.
Existing applications that do not use SO_REUSEPORT can use the application
clustering feature without the need for recompilation by using the Onload
EF_TCP_FORCE_REUSEPORT or EF_UDP_FORCE_REUSEPORT environment variables
identifying the list of ports to which SO_REUSEPORT will be applied.
The size or number of socket members of a cluster in Onload is controlled with
EF_CLUSTER_SIZE. To create a cluster the application sets the cluster name with
EF_CLUSTER_NAME. A cluster of EF_CLUSTER_SIZE is then created.
NOTE: The number of socket members must equal the EF_CLUSTER_SIZE value
otherwise a portion of the received traffic will be lost.
The spread of received traffic between cluster sockets employs Receive Side Scaling
(RSS). For TCP the RSS hash is a function of the src_ip:src_port, dst_ip:dst_port. For
UDP the RSS hash is a function of the src_ip and dst_ip only.
The reception of traffic within a cluster is dependent on port numbers only. If two
sockets bind to the same port, but different IP addresses, a portion of traffic
destined for one socket can be received (but dropped by Onload) on the other
socket. For correct behavior, all cluster members should bind to the same IP address.
This limitation has been removed in the Onload 201509 release so that it is possible
to create multiple listening sockets bound to the same port but to different
addresses.
Restarting an application that includes cluster socket members can fail when orphan
stacks are still present. Use EF_CLUSTER_RESTART to force termination of orphaned
stacks allowing the creation of the new cluster.
Refer to Limitations on page 117 for details of Application Clustering limitations.
6.21 Bonding, Link aggregation and Failover
Bonding (aka teaming) allows for improved reliability and increased bandwidth by
combining physical ports from one or more Solarflare adapters into a bond. A bond
has a single IP address, single MAC address and functions as a single port or single
adapter to provide redundancy.
Onload monitors the OS configuration of the standard kernel bonding module and
accelerates traffic over bonds that are detected as suitable (see limitations). As a
result no special configuration is required to accelerate traffic over bonded
interfaces.
e.g. To configure an 802.3ad bond of two SFC interfaces (eth2 and eth3):
modprobe bonding miimon=100 mode=4 xmit_hash_policy=layer3+4
ifconfig bond0 up
Interfaces must be down before adding to the bond.
echo +eth2 > /sys/class/net/bond0/bonding/slaves
echo +eth3 > /sys/class/net/bond0/bonding/slaves
ifconfig bond0 192.168.1.1/24
The file /var/log/messages should then contain a line similar to:
[onload] Accelerating bond0 using Onload
Traffic over this interface will then be accelerated by Onload.
To disable Onload acceleration of bonds set CI_CFG_TEAMING=0 in the file
transport_config_opt.h at compile time.
In addition to the Linux "bonding" driver, Onload from the 201509 version also
supports the "teaming" driver and "teamd".
Refer to the Limitations section, Bonding, Link aggregation on page 120 for further
information.
6.22 VLANS
The division of a physical network into multiple broadcast domains or VLANs offers
improved scalability, security and network management.
Onload will accelerate traffic over suitable VLAN interfaces by default with no
additional configuration required.
e.g. to add an interface for VLAN 5 over an SFC interface (eth2)
modprobe onload
modprobe 8021q
vconfig add eth2 5
ifconfig eth2.5 192.168.1.1/24
Traffic over this interface will then be transparently accelerated by Onload.
Refer to the Limitations section, VLANs on page 120 for further information.
6.23 Accelerated pipe()
Onload supports the acceleration of pipes, providing an accelerated IPC mechanism
through which two processes on the same host can communicate using shared
memory at user level. Accelerated pipes do not invoke system calls. Accelerated
pipes therefore reduce the overheads for read/write operations and offer improved
latency over the kernel implementation.
To create a user-level pipe, and before the pipe() or pipe2() function is called, a
process must be accelerated by Onload and must have created an Onload stack. By
default, an accelerated process that has not created an Onload stack is granted only
a non-accelerated pipe. See EF_PIPE for other options.
The accelerated pipe is created from the pool of available packet buffers.
The following function calls, related to pipes, will be accelerated by Onload and will
not enter the kernel unless they block:
pipe()
read()
write()
readv()
writev()
send()
recv()
recvmsg()
sendmsg()
poll()
select()
epoll_ctl()
epoll_wait()
As with TCP/UDP sockets, the Onload tuning options such as EF_POLL_USEC and
EF_SPIN_USEC will also influence performance of the user-level pipe.
Refer also to EF_PIPE, EF_PIPE_RECV_SPIN, EF_PIPE_SEND_SPIN in Parameter
Reference on page 146.
NOTE: Only anonymous pipes created with the pipe() or pipe2() function calls
will be accelerated.
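The ordering requirement described in this section (stack first, then pipe) can be
sketched as follows. The sketch assumes that creating a socket is what causes an
accelerated process to create its Onload stack, as suggested by the description
of onload_fuser in Onloaded PIDs; without Onload the same code simply uses
kernel sockets and pipes. The function name is hypothetical.

```python
import os
import socket


def pipe_roundtrip(message):
    """Create a socket, then a pipe, and pass message through the pipe."""
    # Assumption: under Onload, creating this socket creates the stack, so
    # the pipe requested afterwards is eligible for acceleration.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, message)
        return os.read(read_fd, len(message))
    finally:
        os.close(read_fd)
        os.close(write_fd)
        sock.close()
```

The message should be small enough to fit in the pipe buffer, because the write
completes before the read is issued.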
6.24 Zero Copy API
The Onload Extensions API includes support for zero-copy of TCP transmit packets
and UDP receive packets. Refer to Zero Copy API on page 201 for detailed
descriptions and example source code of the API.
6.25 Debug and Logging
Onload supports various debug and logging options. Logging and debug information
will be displayed on an attached console or will be sent to the syslog. To force all
debug to the syslog set the Onload environment variable EF_LOG_VIA_IOCTL=1.
For more information about debug/logging environment variables refer to
Parameter Reference on page 146.
To enable debug and logging using the options below, Onload must be installed with
debug enabled e.g:
# onload_install --debug
If Onload is already installed, uninstall, then reinstall with the --debug option as
shown above.
Log Levels:
• EF_UNIX_LOG.
• EF_LOG.
• EF_LOG_FILE - When EF_LOG_VIA_IOCTL is unset, the user is able to redirect
  Onload output to a specified directory and file using the EF_LOG_FILE option.
  Timestamps can also be added to the log file when EF_LOG_TIMESTAMPS is also
  enabled.
  EF_LOG_FILE=<path/file>
  Note that kernel logging is still directed to the syslog.
• TP_LOG (bitmask) - useful for stack debugging. See Onload source code /src/
  include/ci/internal/ip_log.h for bit values.
• Onload module options:
  - oo_debug_bits=[bitmask] - useful for kernel logging and events not
    involving an onload stack. See src/include/onload/debug.h for bit
    values.
  - ci_tp_log=[bitmask] - useful for kernel logging and events involving an
    onload stack. See Onload source code /src/include/ci/internal/
    ip_log.h for details.
7 Onload - TCP
7.1 TCP Operation
The table below identifies the Onload TCP implementation RFC compliance.

RFC    Title                                                       Compliance
793    Transmission Control Protocol                               Yes
813    Window and Acknowledgement Strategy in TCP                  Yes
896    Congestion Control in IP/TCP                                Yes
1122   Requirements for Hosts                                      Yes
1191   Path MTU Discovery                                          Yes
1323   TCP Extensions for High Performance                         Yes
2018   TCP Selective Acknowledgment Options                        Yes
2581   TCP Congestion Control                                      Yes
2582   The NewReno Modification to TCP's Fast Recovery Algorithm   Yes
2883   An Extension to the Selective Acknowledgement (SACK)        Yes
       Option for TCP
2988   Computing TCP's Retransmission Timer                        Yes
3128   Protection Against a Variant of the Tiny Fragment Attack    Yes
3168   The Addition of Explicit Congestion Notification (ECN)      Yes
       to IP
3465   TCP Congestion Control with Appropriate Byte Counting       Yes
       (ABC)

7.2 TCP Handshake - SYN, SYNACK
During the TCP connection establishment 3-way handshake, Onload negotiates the
MSS, Window Scale, SACK permitted, ECN, PAWS and RTTM timestamps.
For TCP connections Onload will negotiate an appropriate MSS for the MTU
configured on the interface. However, when using jumbo frames, Onload will
currently negotiate an MSS value up to a maximum of 2048 bytes minus the number
of bytes required for packet headers. This is due to the fact that the size of buffers
passed to the Solarflare network interface card is 2048 bytes and the Onload stack
cannot currently handle fragmented packets on its TCP receive path.
TCP options advertised during the handshake can be selected using the
EF_TCP_SYN_OPTS environment variable. Refer to Parameter Reference on
page 146 for details of environment variables.
7.3 TCP SYN Cookies
The Onload environment variable EF_TCP_SYNCOOKIES can be enabled on a
per-stack basis to force the use of SYNCOOKIES, thereby providing a degree of
protection against the Denial of Service (DOS) SYN flood attack.
EF_TCP_SYNCOOKIES is disabled by default. Refer to Parameter Reference on
page 146 for details of environment variables.
7.4 TCP Socket Options
Onload TCP supports the following socket options which can be used in the
setsockopt() and getsockopt() function calls.

Option                          Description
SO_PROTOCOL                     retrieve the socket protocol as an integer.
SO_ACCEPTCONN                   determines whether the socket can accept incoming
                                connections - true for listening sockets. (Only valid as a
                                getsockopt()).
SO_BINDTODEVICE                 bind this socket to a particular network interface.
SO_CONNECT_TIME                 number of seconds a connection has been established.
                                (Only valid as a getsockopt()).
SO_DEBUG                        enable protocol debugging.
SO_DONTROUTE                    outgoing data should be sent on whatever interface the
                                socket is bound to and not routed via another interface.
SO_ERROR                        the errno value of the last error occurring on the
                                socket. (Only valid as a getsockopt()).
SO_EXCLUSIVEADDRUSE             prevents other sockets using the SO_REUSEADDR
                                option to bind to the same address and port.
SO_KEEPALIVE                    enable sending of keep-alive messages on connection
                                oriented sockets.
SO_LINGER                       when enabled, a close() or shutdown() will not
                                return until all queued messages for the socket have
                                been successfully sent or the linger timeout has been
                                reached. Otherwise the close() or shutdown()
                                returns immediately and sockets are closed in the
                                background.
SO_OOBINLINE                    indicates that out-of-band data should be returned
                                in-line with regular data. This option is only valid for
                                connection-oriented protocols that support out-of-band
                                data.
SO_PRIORITY                     set the priority for all packets sent on this socket.
                                Packets with a higher priority may be processed first
                                depending on the selected device queueing discipline.
SO_RCVBUF                       sets or gets the maximum socket receive buffer in
                                bytes. The value set is doubled by the kernel and by
                                Onload to allow for bookkeeping overheads when it is
                                set by the setsockopt() function call. Note that
                                EF_TCP_RCVBUF overrides this value and
                                EF_TCP_RCVBUF_ESTABLISHED_DEFAULT can also
                                override this value.
                                Setting SO_RCVBUF to a value < MTU can result in
                                poorer performance and is not recommended.
SO_RCVLOWAT                     sets the minimum number of bytes to process for
                                socket input operations.
SO_RCVTIMEO                     sets the timeout for input function to complete.
SO_RECVTIMEO                    sets the timeout in milliseconds for blocking receive
                                calls.
SO_REUSEADDR                    can reuse local port numbers i.e. another socket can
                                bind to the same port except when there is an active
                                listening socket bound to the port.
SO_REUSEPORT                    allows multiple sockets to bind to the same port.
SO_SNDBUF                       sets or gets the maximum socket send buffer in bytes.
                                The value set is doubled by the kernel and by Onload to
                                allow for bookkeeping overhead when it is set by the
                                setsockopt() function call. Note that
                                EF_TCP_SNDBUF, EF_TCP_SNDBUF_MODE and
                                EF_TCP_SNDBUF_ESTABLISHED_DEFAULT can override
                                this value.
SO_SNDLOWAT                     sets the minimum number of bytes to process for
                                socket output operations. Always set to 1 byte.
SO_SNDTIMEO           set the timeout for sending function to send before
                      reporting an error.
SO_TIMESTAMP          enable/disable receiving the SO_TIMESTAMP control
                      message.
SO_TIMESTAMPNS        enable/disable receiving the SO_TIMESTAMPNS control
                      message.
SO_TIMESTAMPING       enable/disable hardware timestamps for received
                      packets. See SO_TIMESTAMPING (Hardware Receive
                      Timestamps) on page 55.
SOF_TIMESTAMPING_TX_HARDWARE
                      obtain a hardware-generated transmit timestamp.
SOF_TIMESTAMPING_SYS_HARDWARE
                      obtain a hardware transmit timestamp adjusted to the
                      system time base.
SOF_TIMESTAMPING_OPT_CMSG
                      deliver timestamps using the cmsg API.
ONLOAD_SOF_TIMESTAMPING_STREAM
                      Onload extension to the standard SO_TIMESTAMPING
                      API to support hardware timestamps on TCP sockets.
SO_TYPE               returns the socket type (SOCK_STREAM or SOCK_DGRAM).
                      (Only valid as a getsockopt()).
IP_TRANSPARENT        this socket option allows the calling application to bind
                      the socket to a non-local IP address.

7.5 TCP Level Options
Onload TCP supports the following TCP options which can be used in the
setsockopt() and getsockopt() function calls.
Option                Description
TCP_CORK              stops sends on segments less than MSS size until the
                      connection is uncorked.
TCP_DEFER_ACCEPT      a connection is ESTABLISHED after handshake is
                      complete instead of leaving it in SYN-RECV until the
                      first real data packet arrives. The connection is placed
                      in the accept queue when the first data packet arrives.
TCP_INFO              populates an internal data structure with TCP statistic
                      values.
TCP_KEEPALIVE_ABORT_THRESHHOLD
                      how long to try to produce a successful keepalive
                      before giving up.
TCP_KEEPALIVE_THRESHHOLD
                      specifies the idle time for keepalive timers.
TCP_KEEPCNT           number of keepalives before giving up.
TCP_KEEPIDLE          idle time for keepalives.
TCP_KEEPINTVL         time between keepalives.
TCP_MAXSEG            gets the MSS size for this connection.
TCP_NODELAY           disables Nagle's Algorithm and small segments are sent
                      without delay and without waiting for previous
                      segments to be acknowledged.
TCP_QUICKACK          when enabled ACK messages are sent immediately
                      following reception of the next data packet. This flag
                      will be reset to zero following every use i.e. it is a
                      one-time option. Default value is 1 (enabled).

7.6 TCP File Descriptor Control
Onload supports the following options in socket() and accept() calls.

Option                Description
SOCK_NONBLOCK         supported in socket() and accept(). Sets the
                      O_NONBLOCK file status flag on the new open file
                      descriptor saving extra calls to fcntl(2) to achieve the
                      same result.
SOCK_CLOEXEC          supported in accept(). Sets the close-on-exec
                      (FD_CLOEXEC) flag on the new file descriptor.
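
The options in the tables above are set through the ordinary sockets API. The following is a minimal sketch, using only standard Linux calls, of creating a TCP socket with the file descriptor flags applied at creation time and then setting one TCP level and one socket level option. The helper names make_tcp_socket() and set_rcvbuf() are hypothetical and are not part of Onload; the sketch also passes SOCK_CLOEXEC to socket(), which standard Linux accepts.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking, close-on-exec TCP socket in a
 * single call and disable Nagle's algorithm on it.
 * Returns the descriptor, or -1 on error. */
int make_tcp_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: request a receive buffer size and return the size
 * that getsockopt() reports afterwards (typically larger than requested,
 * because the value set is doubled). Returns -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
        return -1;
    return effective;
}
```

Setting the flags in the socket() call avoids the extra fcntl(2) calls mentioned in the table. Environment variables such as EF_TCP_RCVBUF may still override the buffer size requested here.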
7.7 TCP Congestion Control
Onload TCP implements congestion control in accordance with RFC 3465 and
employs the NewReno algorithm with extensions for Appropriate Byte Counting
(ABC).
On new or idle connections and those experiencing loss, Onload employs a Fast
Start algorithm in which delayed acknowledgments are disabled, thereby creating
more ACKs and subsequently growing the congestion window rapidly. Two
environment variables, EF_TCP_FASTSTART_INIT and EF_TCP_FASTSTART_LOSS,
are associated with the fast start - refer to Parameter Reference on page 146 for
details.
During Slow Start, the congestion window is initially set to 2 x maximum segment
size (MSS) value. As each ACK is received the congestion window size is increased by
the number of bytes acknowledged up to a maximum 2 x MSS bytes. This allows
Onload to transmit the minimum of the congestion window and advertised window
size i.e.
transmission window (bytes) = min(CWND, receiver advertised window size)
If loss is detected - either by retransmission timeout (RTO), or the reception of
duplicate ACKs - Onload will adopt a congestion avoidance algorithm to slow the
transmission rate. In congestion avoidance the transmission window is halved from
its current size - but will not be less than 2 x MSS. If congestion avoidance was
triggered by an RTO the Slow Start algorithm is again used to restore the
transmit rate. If triggered by duplicate ACKs Onload employs a Fast Retransmit and
Fast Recovery algorithm.
If Onload TCP receives 3 duplicate ACKs this indicates that a segment has been lost
- rather than just received out of order - and causes the immediate retransmission of
the lost segment (Fast Retransmit). The continued reception of duplicate ACKs is an
indication that traffic still flows within the network and Onload will follow Fast
Retransmit with Fast Recovery.
During Fast Recovery Onload again resorts to the congestion avoidance (without
Slow Start) algorithm with the congestion window size being halved from its present
value.
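
The window arithmetic described in this section can be sketched as follows. This is only an illustrative model of the rules stated above (initial window of 2 x MSS, growth per ACK capped at 2 x MSS, halving on loss with a floor of 2 x MSS); it is not Onload's implementation, and all function names are hypothetical.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* transmission window (bytes) = min(CWND, receiver advertised window size) */
uint32_t tx_window(uint32_t cwnd, uint32_t advertised_window)
{
    return min_u32(cwnd, advertised_window);
}

/* Slow Start begins with a congestion window of 2 x MSS. */
uint32_t initial_cwnd(uint32_t mss)
{
    return 2 * mss;
}

/* Appropriate Byte Counting: on each ACK the window grows by the number of
 * bytes acknowledged, but by no more than 2 x MSS. */
uint32_t slow_start_on_ack(uint32_t cwnd, uint32_t bytes_acked, uint32_t mss)
{
    return cwnd + min_u32(bytes_acked, 2 * mss);
}

/* On loss the window is halved, but never drops below 2 x MSS. */
uint32_t window_after_loss(uint32_t window, uint32_t mss)
{
    uint32_t halved = window / 2;
    uint32_t floor = 2 * mss;
    return halved < floor ? floor : halved;
}
```

For example, with an MSS of 1460 bytes the initial window is 2920 bytes; an ACK covering 5000 bytes grows it by only 2920 bytes, and a loss at a window of 4000 bytes leaves it at the 2920 byte floor.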
Onload supports a number of environment variables that influence the behavior of
the congestion window and recovery algorithms. Refer to Parameter Reference on
page 146:
EF_TCP_INITIAL_CWND - sets the initial size (bytes) of congestion window.
EF_TCP_LOSS_MIN_CWND - sets the minimum size of the congestion window
following loss.
EF_CONG_AVOID_SCALE_BACK - slows down the rate at which the TCP
congestion window is opened to help reduce loss in environments already
suffering congestion and loss.
The congestion variables should be used with caution so as to avoid violating TCP
protocol requirements and degrading TCP performance.
7.8 TCP SACK
Onload will employ TCP Selective Acknowledgment (SACK) if the option has been
negotiated and agreed by both ends of a connection during the connection
establishment 3-way handshake. Refer to RFC 2018 for further information.

7.9 TCP QUICKACK
TCP will generally aim to defer the sending of ACKs in order to minimize the number
of packets on the network. Onload supports the standard TCP_QUICKACK socket
option which allows some control over this behavior. Enabling TCP_QUICKACK
causes an ACK to be sent immediately in response to the reception of the following
data packet. This is a one-shot operation and TCP_QUICKACK self-clears to zero
immediately after the ACK is sent.
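
Because the option clears itself after each use, an application that wants every received packet acknowledged promptly has to re-arm it. A minimal sketch using the standard setsockopt() call is shown below; the helper name request_quickack() is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask for the next received data packet to be
 * acknowledged immediately. The option is one-shot, so call this again
 * before each receive where an immediate ACK is wanted.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()). */
int request_quickack(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}
```

A typical pattern is to call request_quickack(fd) immediately before each recv() on the connection.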
7.10 TCP Delayed ACK
By default TCP stacks delay sending acknowledgments (ACKs) to improve efficiency
and utilization of a network link. Delayed ACKs also improve receive latency by
ensuring that ACKs are not sent on the critical path. However, if the sender of TCP
packets is using Nagle's algorithm, receive latency will be impaired by using delayed
ACKs.
Using the EF_DELACK_THRESH environment variable the user can specify how many
TCP segments can be received before Onload will respond with a TCP ACK. Refer to
the Parameter List on page 146 for details of the Onload environment delayed TCP
ACK variables.

7.11 TCP Dynamic ACK
The sending of excessive TCP ACKs can impair performance and increase
receive-side latency. Although TCP generally aims to defer the sending of ACKs,
Onload also supports a further mechanism. The EF_DYNAMIC_ACK_THRESH
environment variable allows Onload to dynamically determine when it is
non-detrimental to throughput and efficiency to send a TCP ACK. Onload will force
a TCP ACK to be sent if the number of TCP ACKs pending reaches the threshold value.
Refer to the Parameter List on page 146 for details of the Onload environment
delayed TCP ACK variables.
NOTE: When used together with EF_DELACK_THRESH or EF_DYNAMIC_ACK_THRESH,
the socket option TCP_QUICKACK will behave exactly as stated above. Both Onload
environment variables identify the maximum number of segments that can be
received before an ACK is returned. Sending an ACK before the specified maximum
is reached is allowed.
NOTE: TCP ACKs should be transmitted at a sufficient rate to ensure the remote end
does not drop the TCP connection.
7.12 TCP Loopback Acceleration
Onload supports the acceleration of TCP loopback connections, providing an
accelerated mechanism through which two processes on the same host can
communicate. Accelerated TCP loopback connections do not invoke system calls,
reduce the overheads for read/write operations and offer improved latency over the
kernel implementation.
The server and client processes that want to communicate using an accelerated TCP
loopback connection do not need to be configured to share an Onload stack.
However, the server and client TCP loopback sockets can only be accelerated if they
are in the same Onload stack. Onload has the ability to move a TCP loopback socket
between Onload stacks to achieve this.
TCP loopback acceleration is configured via the environment variables
EF_TCP_CLIENT_LOOPBACK and EF_TCP_SERVER_LOOPBACK. As well as enabling TCP
loopback acceleration these environment variables control Onload's behavior when
the server and client sockets do not originate in the same Onload stack. This gives
the user greater flexibility and control when establishing loopback on TCP sockets
either from the listening (server) socket or from the connecting (client) socket. The
connecting socket can use any local address or specify the loopback address.
The following diagram illustrates the client and server loopback options. Refer to
Parameter Reference on page 146 for a description of the loopback variables.
Figure 8: EF_TCP_CLIENT/SERVER_LOOPBACK
The client loopback option EF_TCP_CLIENT_LOOPBACK=4, when used with the
server loopback option EF_TCP_SERVER_LOOPBACK=2, differs from the other
loopback options in that, rather than moving sockets between existing stacks, it
creates an additional stack and moves the sockets from both ends of the TCP
connection into this new stack. This avoids the possibility of having many loopback
sockets sharing and contending for the resources of a single stack.
When client and server do not run under the same user ID (UID), set the environment
variable EF_SHARE_WITH to allow both processes to share the created stack.
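
As an illustration of the configuration described above, a launcher program could place the loopback variables in the environment before starting the accelerated application. The sketch below assumes that the variables must be present in the environment when the accelerated program starts; the helper name set_loopback_modes() is hypothetical, and only the value pair 4 and 2 is taken from this guide.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: export the TCP loopback modes so that a program
 * started afterwards (for example via exec) inherits them.
 * Returns 0 on success, -1 if the environment could not be updated. */
int set_loopback_modes(int client_mode, int server_mode)
{
    char value[16];

    snprintf(value, sizeof(value), "%d", client_mode);
    if (setenv("EF_TCP_CLIENT_LOOPBACK", value, 1) != 0)
        return -1;

    snprintf(value, sizeof(value), "%d", server_mode);
    if (setenv("EF_TCP_SERVER_LOOPBACK", value, 1) != 0)
        return -1;

    return 0;
}
```

Calling set_loopback_modes(4, 2) before launching both the client and the server selects the combination in which a new stack is created for the loopback connection.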
7.13 TCP Striping
Onload supports a Solarflare proprietary TCP striping mechanism that allows a
single TCP connection to use both physical ports of a network adapter. Using the
combined bandwidth of both ports means increased throughput for TCP streaming
applications. TCP striping can be particularly beneficial for Message Passing
Interface (MPI) applications.
If the TCP connection's source IP address and destination IP address are on the same
subnet as defined by EF_STRIPE_NETMASK then Onload will attempt to negotiate
TCP striping for the connection. Onload TCP striping must be configured at both
ends of the link.
TCP striping allows a single TCP connection to use the full bandwidth of both
physical ports on the same adapter. This should not be confused with link
aggregation/port bonding in which any one TCP connection within the bond can
only use a single physical port and therefore more than one TCP connection would
be required to realize the full bandwidth of two physical ports.
NOTE: TCP striping is disabled by default. To enable this feature set the parameter
CI_CFG_PORT_STRIPING=1 in the Onload distribution source directory
src/include/internal/transport_config_opt.h file.
7.14 TCP Connection Reset on RTO
Under certain circumstances it may be preferable to avoid resending TCP data to a
peer service when data delivery has been delayed. Once data has been sent, and for
which no acknowledgment has been received, the TCP retransmission timeout
period represents a considerable delay. When the retransmission timeout (RTO)
eventually expires it may be preferable not to retransmit the original data.
Onload can be configured to reset a TCP connection rather than attempt to
retransmit data for which no acknowledgment has been received.
This feature is enabled with the EF_TCP_RST_DELAYED_CONN per-stack environment
variable and applies to all TCP connections in the Onload stack. On any TCP
connection in the Onload stack, if the RTO timer expires before an ACK is received
the TCP connection will be reset.
7.15 ONLOAD_MSG_WARM
Applications that send data infrequently may see increased send latency compared
to an application that is making frequent sends. This is due to the send path and
associated data structures not being cache and TLB resident (which can occur even
if the CPU has been otherwise idle since the previous send call).
Onload therefore supports applications repeatedly calling send to keep the TCP fast
send path 'warm' in the cache without actually sending data. This is particularly
useful for applications that only send infrequently and helps to maintain low latency
performance for those TCP connections that do not send often. These "fake" sends
are performed by setting the ONLOAD_MSG_WARM flag when calling the TCP send calls.
The message warm feature does not transmit any packets.
    char buf[10];
    send(fd, buf, 10, ONLOAD_MSG_WARM);
Onload stackdump supports new counters to indicate the level of message warm
use:
warm_aborted is a count of the number of times a message warm send
function was called, but the send path was not exercised due to Onload locking
constraints.
warm is a count of the number of times a message warm send function was
called when the send path was exercised.
NOTE: If the ONLOAD_MSG_WARM flag is used on sockets which are not accelerated -
including those handed off to the kernel by Onload - it may cause the message warm
packets to be actually sent. This is due to a limitation in some Linux distributions
which appear to ignore this flag. The Onload extensions API can be used to check
whether a socket supports the MSG_WARM feature via the
onload_fd_check_feature() API (onload_fd_check_feature on page 191).
NOTE: Onload versions earlier than 201310 do not support the ONLOAD_MSG_WARM
socket flag, therefore setting the flag will cause message warm packets to be sent.
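
Given the two notes above, an application may prefer to guard its warming calls so that they are skipped whenever the socket is not known to support the feature. The following is a minimal sketch of that pattern. So that it compiles without the Onload headers, the flag value and the result of the feature check are passed in as parameters: warm_flag is expected to be ONLOAD_MSG_WARM and warm_supported is expected to come from onload_fd_check_feature(). The helper name warm_send_path() is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: exercise the send path without transmitting data.
 * If the socket is not known to support message warming, do nothing,
 * because the dummy payload could otherwise be sent on the wire.
 * Returns 0 when skipped, otherwise the return value of send(). */
ssize_t warm_send_path(int fd, int warm_flag, int warm_supported)
{
    char buf[10] = {0};

    if (!warm_supported)
        return 0;
    return send(fd, buf, sizeof(buf), warm_flag);
}
```

The application would call this helper periodically between real sends on a connection that transmits infrequently.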
7.16 Listen/Accept Sockets
TCP sockets accepted from a listening socket will share a wildcard filter with the
parent socket. The following Onload module options can be used to control
behavior when the parent socket is closed.
oof_shared_keep_thresh - default 100, is the number of accepted sockets sharing
a wildcard filter that will cause the filter to persist after the listening socket has
closed.
oof_shared_steal_thresh - default 200, is the number of sockets sharing a
wildcard filter that will cause the filter to persist even when a new listening socket
needs the filter.
If the listening socket is closed the behavior depends on the number of remaining
accepted sockets as follows:
Number of accepted sockets      Onload Action
> oof_shared_keep_thresh but    Retain the wildcard filter shared by all
< oof_shared_steal_thresh       accepted sockets.
                                If a new listening socket requires the filter,
                                Onload will install a full-match filter for each
                                accepted socket allowing the listening socket
                                to use the wildcard filter.
> oof_shared_steal_thresh       Retain the wildcard filter shared by all
                                accepted sockets.
                                A new listening socket can be created but a
                                filter cannot be installed meaning the socket
                                will receive no traffic until the number of
                                accepted connections is reduced.
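
The table can be read as a simple decision on the number of accepted sockets. The sketch below models that decision only; it is not Onload code and the names are hypothetical. The table does not state what happens at or below oof_shared_keep_thresh, or when the count exactly equals a threshold, so the handling of those cases here is an assumption.

```c
/* Hypothetical outcomes when a listening socket is closed. */
enum filter_action {
    FILTER_NOT_RETAINED,   /* assumption: count too low to keep the filter */
    FILTER_KEPT_MOVABLE,   /* kept; a new listener may take it over        */
    FILTER_KEPT_PINNED     /* kept; a new listener cannot install a filter */
};

/* Model of the table above. Counts equal to a threshold fall into the
 * lower category, which is an assumption of this sketch. */
enum filter_action on_listener_close(unsigned accepted,
                                     unsigned keep_thresh,
                                     unsigned steal_thresh)
{
    if (accepted > steal_thresh)
        return FILTER_KEPT_PINNED;
    if (accepted > keep_thresh)
        return FILTER_KEPT_MOVABLE;
    return FILTER_NOT_RETAINED;
}
```

With the default thresholds of 100 and 200, a listener closed with 150 accepted sockets leaves a filter that a new listening socket can take over, while 250 accepted sockets prevent a new listening socket from receiving traffic until the count falls.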
7.17 Socket Caching
Socket caching means Onload can further reduce the overhead of setting up new
TCP connections by reusing existing sockets instead of creating new ones.
A cached socket retains a file descriptor and socket buffer when it is returned to the
cache of the Onload stack from which it originated.
Socket caching is enabled when EF_SOCKET_CACHE_MAX is set to a value greater
than zero. Onload will decide whether to apply passive or active caching depending
on the type of sockets created by the user application.
EF_SOCKET_CACHE_MAX applies to both active and passive sockets, i.e. if set to 100
the cache limit is 100 of each socket type.

TCP Passive Socket Caching
Passive socket caching, supported from the Onload 201502 release, means Onload
will reuse socket buffers and file descriptors from passive open (listening sockets).
This can improve the accept rate of active open TCP connections and will benefit
processes which need to accept lots of connections from these listening sockets.

TCP Active Socket Caching
Active socket caching, supported from the Onload 201509 release, means Onload
will reuse socket buffers and file descriptors from active open sockets when an
established TCP connection has terminated.
Active open sockets setting the IP_TRANSPARENT socket option can be cached.

Caching Stackdump
Onload stackdump can be used to monitor caching activity on Onload stacks.
# onload_stackdump lots [| grep cache]

Counter                       Description
active cache:                 TCP socket caching:
hit=0                         hit = number of cache hits (were cached)
avail=0                       avail = number of sockets available for caching
cache=EMPTY                   current cache state
pending=EMPTY
sockcache_cached              Number of sockets cached over the lifetime of the stack
sockcache_contention          Number of sockets not cached due to lock contention
passive_sockcache_stacklim    Number of passive sockets not cached due to stack limit
active_sockcache_stacklim     Number of active sockets not cached due to stack limit
sockcache_socklim             Number of sockets not cached due to socket limit
sockcache_hit                 Number of socket cache hits (were cached)
sockcache_hit_reap            Number of socket cache hits (were cached) after reaping
sockcache_miss_intmismatch    Number of socket cache misses due to mismatched interfaces
activecache_cached            Number of active sockets cached over the lifetime of the stack
activecache_stacklim          Number of active sockets not cached due to stack limit
activecache_hit               Number of active socket cache hits (were cached)
activecache_hit_reap          Number of active socket cache hits (were cached) after reaping

Caching - Requirements
There are some necessary prerequisites when using socket caching:
• set EF_UL_EPOLL=3 and set EF_FDS_MT_SAFE=1
• socket caching is not supported after fork()
• sockets that have been dup()ed will not be cached
• sockets that use the O_ASYNC or O_APPEND modes will not be cached
• caching offers no benefit if a single socket accepts connections on multiple
  local addresses (applicable to passive caching only).
• Set O_NONBLOCK or O_CLOEXEC if required on the socket, when creating the
  socket.
When socket caching cannot be enabled, sockets will be processed as normal
Onload sockets.
Users should refer to details of the following environment variables:
• EF_SOCKET_CACHE_MAX
• EF_PER_SOCKET_CACHE_MAX
• EF_SOCKET_CACHE_PORTS
NOTE: Allowing more sockets to be cached than there are file descriptors available
can result in drastically reduced performance and users should consider that the
socket cache limit, EF_SOCKET_CACHE_MAX, applies per stack, unlike the
per-process EF_SOCKET_CACHE_PORTS limits.
Refer to Parameter Reference on page 146 for details of Onload environment
variables.
7.18 Scalable Filters
Using scalable filters, an Onload stack can install a MAC filter to receive all traffic
from a specified interface.
NOTE: Once the MAC filter is inserted on an interface, ARP, ICMP and IGMP traffic
is directed to the kernel, but all other traffic is directed to a single Onload stack.
Using scalable filters removes limitations on:
• the number of listening sockets in scalable filters passive mode
• the number of active open connections in scalable filters transparent active
  mode. This works only for sockets having the IP_TRANSPARENT option set. See
  Transparent Reverse Proxy Modes on page 84 below.
It is suggested that a dedicated interface is used by the stack inserting the MAC filter.
This allows the kernel stack or another application using scalable filters to use the
same physical port.
The Solarflare SFN7000 series adapter can be partitioned to expose up to 16 PCIe
physical functions (PF). Each PF is presented to the OS as a standard network
interface. The adapter is partitioned with the sfboot utility - see example below.
Once a MAC filter has been installed on a PF, other Onload stacks can still receive
other traffic on the same PF, but sockets will have to insert IP filters for the required
traffic. Apart from ARP, ICMP and IGMP packets, OS kernel sockets, using the same
PF, will not receive any traffic.
Per interface, the MAC filter can only be installed by a single Onload stack. If a
process creates multiple stacks, the EF_SCALABLE_FILTERS_ENABLE per-stack
variable can be used to enable/disable this feature for individual stacks using the
existing Onload extensions API e.g.
    onload_stack_opt_set_int("EF_SCALABLE_FILTERS_ENABLE", 1);
The MAC filter is inserted when the stack is created - i.e. before sockets are created,
and sockets need to be created to receive any traffic destined for this stack.
Scalable Filters - Restrictions
• Scalable filters are only used for TCP traffic.
• UDP traffic can be received and accelerated by Onload on interfaces where
  scalable filters are enabled, but kernel UDP sockets will not receive traffic.
• UDP fragmented frames cannot be received on interfaces where scalable filters
  are enabled. Users should avoid having fragmented frames on these interfaces.
• The adapter must use the full-feature or low-latency firmware variants.
• Minimum firmware version: 4.6.5.1000.
• Stack-per-thread options (EF_STACK_PER_THREAD) cannot be used with this
  feature.
• By default the scalable filters feature requires CAP_NET_RAW. Onload can be
  configured to avoid capability checks for this using the Onload module option
  scalable_filter_gid. See Module Options on page 143 for details.
Scalable Filters - Configuration
To enable scalable filters on a specific interface:
    EF_SCALABLE_FILTERS=enps0f0
Per interface, the MAC filter can only be installed by a single Onload stack. A cluster
(see Application Clustering on page 63) might have multiple stacks and each stack
could install a MAC filter on a different interface.
Sockets must be bound to the IP address of the interface.
This feature is targeted at TCP listening sockets only and connections accepted from
a listening socket will share the MAC filter.
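
Since sockets must be bound to the IP address of the interface rather than to a wildcard address, a listening socket would be set up along the following lines. This is a minimal sketch using standard sockets calls; the helper name listen_on_interface_ip() is hypothetical and the address passed in is assumed to be the one configured on the scalable filters interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listening socket bound to one specific
 * local IPv4 address (not INADDR_ANY).
 * Returns the listening descriptor, or -1 on error. */
int listen_on_interface_ip(const char *ip, uint16_t port, int backlog)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid dotted-quad address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```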
Partition the NIC
The sfboot utility is available in the Solarflare Linux Utilities package
(SF-107601-LS). The following example demonstrates how to partition the adapter
to expose more than one PF (a cold reboot of the server is needed after changes
using sfboot).
# sfboot pf-count=2 vf-count=0 switch-mode=partitioning
Scalable Filters and Bonding
Bonded interfaces - created with the standard Linux bonding or teaming driver - can
be used for scalable filters.
Every interface that is part of the bond must be present in the system when the
scalable filters stack is created. Removing the bond will cause the scalable filter to
stop receiving traffic. After a new bond interface is created, the application must be
restarted to use the bond.
7.19 Transparent Reverse Proxy Modes
Enhancements such as Scalable Filters, Socket Caching and support for the
IP_TRANSPARENT socket option support Onload with greater efficiency and
increased scalability in transparent reverse proxy mode server deployments.
These features reduce to a minimum the overheads associated with creating and
connecting transparent sockets. Onload can use up to 2 million transparent
active open sockets per Onload stack.
A transparent socket is created when a socket sets the IP_TRANSPARENT socket
option and explicitly binds to an IP address and port. The IP address can be on a
foreign host. IP_TRANSPARENT must be set before the bind.
The EF_SCALABLE_FILTERS variable is used to enable scalable filters and to configure
the transparent proxy mode.
Restrictions
• The IP_TRANSPARENT option must be set before the socket is bound.
• The IP_TRANSPARENT option cannot be cleared after bind on accelerated
  sockets.
• IP_TRANSPARENT sockets cannot be accelerated if they are bound to port 0 or
  to INADDR_ANY.
• IP_TRANSPARENT sockets cannot be passed to the kernel stack when bound to
  a port that is in the list specified by EF_FORCE_TCP_REUSEPORT.
• When using the rss:transparent_active mode (see below), EF_CLUSTER_NAME
  must be explicitly set by the process sharing the cluster AND the stack cannot
  be named by either EF_NAME or onload_set_stackname().
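
The ordering and addressing restrictions above can be sketched with standard Linux sockets calls as follows. The helper names are hypothetical. Note that on a kernel (non-accelerated) socket, setting IP_TRANSPARENT normally requires elevated privileges, so the setsockopt() call may fail for an unprivileged process.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical check mirroring the restriction that transparent sockets
 * bound to port 0 or to INADDR_ANY cannot be accelerated.
 * Returns 1 if the address is specific enough, 0 otherwise. */
int transparent_bind_is_acceleratable(const struct sockaddr_in *addr)
{
    return addr->sin_port != 0 &&
           addr->sin_addr.s_addr != htonl(INADDR_ANY);
}

/* Hypothetical helper: set IP_TRANSPARENT first, then bind.
 * Returns 0 on success, -1 on error. */
int bind_transparent(int fd, const struct sockaddr_in *addr)
{
    int one = 1;

    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0)
        return -1;
    return bind(fd, (const struct sockaddr *)addr, sizeof(*addr));
}
```

In a reverse proxy the address passed to bind_transparent() would typically be the downstream client's address and port, which is why it may belong to a foreign host.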
Config (example) Settings
Below are examples of configurations using the EF_SCALABLE_FILTERS environment
option to set transparent proxy modes.
• Enable scalable filters on interface p1p1 - this inserts a MAC address filter on
  the adapter. The filter is shared by all active open connections on the interface.
  Socket caching will be applied to the passive side of the TCP connection.
    EF_SCALABLE_FILTERS=p1p1=passive
• Enable scalable filters on enps0f0, then all sockets using this interface that have
  the IP_TRANSPARENT flag set will use the MAC filter, other sockets will
  continue to use normal IP filters on this interface. Socket caching will be applied
  to the active side of a TCP connection:
    EF_SCALABLE_FILTERS=enps0f0=transparent_active
• As for the example above, but uses symmetrical RSS to ensure that requests/
  responses between clients and backend servers are processed by the same
  thread.
    EF_SCALABLE_FILTERS=enps0f0=rss:transparent_active
• Enable scalable filters on enps0f0, then all sockets using this interface that have
  the IP_TRANSPARENT flag set will use the MAC filter, other sockets will
  continue to use normal IP filters on this interface. Socket buffers are cached
  from active and passive sides of the TCP connection.
    EF_SCALABLE_FILTERS=enps0f0=transparent_active:passive
7.20 Transparent Reverse Proxy on Multiple CPUs
Used together with Application Clustering, transparent scalable modes can deliver
linear scalability using multiple CPU cores.
This uses RSS to distribute traffic, both upstream and downstream of the proxy
application, mapping streams to the correct Onload stack. When each CPU core is
associated exclusively with a single clustered stack there can be no contention
between stacks.
For this use case to function correctly, the proxy application will use the downstream
client address:port on the upstream (to server) side of the TCP connection. In this
way RSS and hardware filters ensure that client side and server side are handled by
the same worker thread and traffic is directed to the correct stack.
In this scenario the client thinks it communicates directly with the server, and the
server thinks it communicates directly with the client - the transparent proxy server
is 'transparent'.
8 Onload - UDP
8.1 UDP Operation
The table below identifies the Onload UDP implementation RFC compliance.

RFC     Title                                  Compliance
768     User Datagram Protocol                 Yes
1122    Requirements for Hosts                 Yes
3678    Socket Interface Extensions for        Partial
        Multicast Source Filters               See Source Specific Socket Options
                                               on page 88

8.2 Socket Options
Onload UDP supports the following socket options which can be used in the
setsockopt() and getsockopt() function calls.

Option                Description
SO_PROTOCOL           retrieve the socket protocol as an integer.
SO_BINDTODEVICE       bind this socket to a particular network interface. See
                      SO_BINDTODEVICE on page 57.
SO_BROADCAST          when enabled datagram sockets can send and receive
                      packets to/from a broadcast address.
SO_DEBUG              enable protocol debugging.
SO_DONTROUTE          outgoing data should be sent on whatever interface the
                      socket is bound to and not routed via another interface.
SO_ERROR              the errno value of the last error occurring on the
                      socket. (Only valid as a getsockopt()).
SO_EXCLUSIVEADDRUSE   prevents other sockets using the SO_REUSEADDR
                      option to bind to the same address and port.
SO_LINGER             when enabled a close() or shutdown() will not return
                      until all queued messages for the socket have been
                      successfully sent or the linger timeout has been
                      reached. Otherwise the call returns immediately and
                      sockets are closed in the background.
SO_PRIORITY           set the priority for all packets sent on this socket.
                      Packets with a higher priority may be processed first
                      depending on the selected device queuing discipline.
SO_RCVBUF             sets or gets the maximum socket receive buffer in
                      bytes. The value set is doubled by the kernel and by
                      Onload to allow for bookkeeping overhead when it is
                      set by the setsockopt() function call. Note that
                      EF_UDP_RCVBUF overrides this value.
                      Setting SO_RCVBUF to a value < MTU can result in
                      poorer performance and is not recommended.
SO_RCVLOWAT           sets the minimum number of bytes to process for
                      socket input operations.
SO_RECVTIMEO          sets the timeout for input function to complete.
SO_REUSEADDR          can reuse local ports i.e. another socket can bind to the
                      same port number except when there is an active
                      listening socket bound to the port.
SO_REUSEPORT          allow multiple sockets to bind to the same port.
SO_SNDBUF             sets or gets the maximum socket send buffer in bytes.
                      The value set is doubled by the kernel and by Onload to
                      allow for bookkeeping overhead when it is set by the
                      setsockopt() function call. Note that EF_UDP_SNDBUF
                      overrides this value.
SO_SNDLOWAT           sets the minimum number of bytes to process for
                      socket output operations. Always set to 1 byte.
SO_SNDTIMEO           set the timeout for sending function to send before
                      reporting an error.
SO_TIMESTAMP          enable or disable receiving the SO_TIMESTAMP control
                      message (microsecond resolution). See below.
SO_TIMESTAMPNS        enable or disable receiving the SO_TIMESTAMPNS control
                      message (nanosecond resolution). See SO_TIMESTAMP
                      and SO_TIMESTAMPNS (software timestamps) on
                      page 55.
SO_TIMESTAMPING       enable/disable hardware timestamps for received
                      packets. See SO_TIMESTAMPING (Hardware Receive
                      Timestamps) on page 55.
SOF_TIMESTAMPING_TX_HARDWARE
                      obtain a hardware-generated transmit timestamp.
SOF_TIMESTAMPING_SYS_HARDWARE
                      obtain a hardware transmit timestamp adjusted to the
                      system time base.
SO_TYPE               returns the socket type (SOCK_STREAM or SOCK_DGRAM).
                      (Only valid as a getsockopt()).

8.3 Source Specific Socket Options
The following table identifies source specific socket options supported from Onload
201210-u1 onwards. Refer to release notes for Onload specific behavior regarding
these options.

Option                Description
IP_ADD_SOURCE_MEMBERSHIP
                      Join the supplied multicast group on the given interface
                      and accept data from the supplied source address.
IP_DROP_SOURCE_MEMBERSHIP
                      Drops membership to the given multicast group,
                      interface and source address.
MCAST_JOIN_SOURCE_GROUP
                      Join a source-specific group.
MCAST_LEAVE_SOURCE_GROUP
                      Leave a source-specific group.

8.4 UDP Send and Receive Paths
For each UDP socket, Onload creates both an accelerated socket and a kernel socket.
There is usually no file descriptor for the kernel socket visible in the user's file
descriptor table. When a UDP process is ready to transmit data, Onload will check a
cached ARP table which maps IP addresses to MAC addresses. A cache 'hit' results
in sending via the Onload accelerated socket. A cache 'miss' results in a syscall to
populate the user mode cached ARP table. If no MAC address can be identified via
this process the packet is sent via the kernel stack to provoke ARP resolution.
Therefore, it is possible that some UDP traffic will be sent occasionally via the kernel
stack.
Figure 9: UDP Send and Receive Paths
Figure 9 illustrates the UDP send and receive paths. Lighter arrows indicate the
accelerated 'kernel bypass' path. Darker arrows identify fragmented UDP packets
received by the Solarflare adapter and UDP packets received from a non-Solarflare
adapter. UDP packets arriving at the Solarflare adapter are filtered on source and
destination address and port number to identify a VNIC the packet will be delivered
to. Fragmented UDP packets are received by the application via the kernel UDP
socket. UDP packets received by a non-Solarflare adapter are always received via the
kernel UDP socket.
8.5 Fragmented UDP
When sending datagrams which exceed the MTU, the Onload stack will send
multiple Ethernet packets. On hosts running Onload, fragmented datagrams are
always received via the kernel stack.
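
To judge whether a given datagram size will be fragmented, and therefore received via the kernel stack at the far end, the number of packets can be estimated with standard IPv4 arithmetic. The sketch below is not Onload specific; it assumes IPv4 with a 20 byte IP header (no IP options) and an 8 byte UDP header, and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of IPv4 packets needed to carry one UDP
 * datagram with the given payload size over a link with the given MTU.
 * Returns 0 if the MTU is too small to carry any fragment data. */
size_t udp_ipv4_packet_count(size_t udp_payload, size_t mtu)
{
    const size_t ip_hdr = 20;   /* IPv4 header without options */
    const size_t udp_hdr = 8;   /* UDP header, carried in the first packet */

    if (mtu < ip_hdr + 8)
        return 0;

    size_t total = udp_payload + udp_hdr;   /* bytes of IP payload */
    if (total <= mtu - ip_hdr)
        return 1;                           /* fits without fragmentation */

    /* Every fragment except the last carries a multiple of 8 bytes. */
    size_t per_fragment = (mtu - ip_hdr) & ~(size_t)7;
    return (total + per_fragment - 1) / per_fragment;
}
```

With a 1500 byte MTU, a payload of up to 1472 bytes fits in a single packet, while a 1473 byte payload needs two packets and a 3000 byte payload needs three.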
8.6 User Level recvmmsg for UDP
The recvmmsg() function is intercepted for UDP sockets which are accelerated by
Onload.
The Onload user-level recvmmsg() is available to systems that do not have kernel/
libc support for this function. The recvmmsg() function is not supported for TCP
sockets.
8.7 User Level sendmmsg for UDP
The sendmmsg() function is intercepted for UDP sockets which are accelerated by
Onload.
The Onload user-level sendmmsg() is available to systems that do not have kernel/
libc support for this function. The sendmmsg() function is not supported for TCP
sockets.
8.8 Multicast Replication
The Solarflare SFN7000 series adapters support multicast replication where
received packets are replicated in hardware and delivered to multiple receive
queues. This feature allows any number of Onload clients, listening to the same
multicast data stream, to receive their own copy of the packets, without an
additional software copy and without the need to share Onload stacks. As illustrated
below, the packets are delivered multiple times by the controller to each receive
queue that has installed a hardware filter to receive the specified multicast stream.
Figure 10: Hardware Multicast Replication
Multicast replication is performed in the adapter transparently and does not need
to be explicitly enabled.
This feature removes the need to share Onload stacks using the EF_NAME
environment variable. Users using EF_NAME exclusively for sharing multicast traffic
can now remove EF_NAME from their configurations.
8.9 Multicast Operation and Stack Sharing
To illustrate shared stacks, the following examples describe Onload behavior when
two processes, on the same host, subscribe to the same multicast stream:
• Multicast Receive Using Different Onload Stacks on page 91
• Multicast Transmit Using Different Onload Stacks on page 92
• Multicast Receive Sharing an Onload Stack on page 92
• Multicast Transmit Sharing an Onload Stack on page 93
• Multicast Receive - Onload Stack and Kernel Stack on page 93.
NOTE: The following subsections use two processes to demonstrate Onload
behavior. In practice multiple processes can share the same Onload stack. Stack
sharing is not limited to multicast subscribers and can be employed by any TCP and
UDP applications.

Multicast Receive Using Different Onload Stacks
Running on SFN5000 or SFN6000 series adapters (for SFN7000 series - see Multicast
Replication above), Onload will notice if two Onload stacks on the same host
subscribe to the same multicast stream and will respond by redirecting the stream
to go through the kernel. Handing the stream to the kernel, though still using Onload
stacks, allows both subscribers to receive the datagrams, but user-space
acceleration is lost and the receive rate is lower than it could otherwise be. Figure 11
below illustrates the configuration. Arrows indicate the receive path and
fragmented UDP path.
Figure 11: Multicast Receive Using Different Onload Stacks.
The reason for this behavior is that the Solarflare NIC will not deliver a single
received multicast packet multiple times to multiple stacks; the packet is delivered
only once. If a received packet is delivered to kernel space, then the kernel TCP/IP
stack will copy the received data multiple times to each socket listening on the
corresponding multicast stream. If the received packet were delivered directly to
Onload, where the stacks are mapped to user space, it would only be delivered to a
single subscriber of the multicast stream.

Multicast Transmit Using Different Onload Stacks
Referring to Figure 11, if one process were to transmit multicast datagrams, these
would not be received by the second process. Onload is only able to accelerate
transmitted multicast datagrams when they do not need to be delivered to other
applications in the same host. Or more accurately, the multicast stream can only be
delivered within the same Onload stack.
Onload by default changes the default state of the IP_MULTICAST_LOOP socket
option to 0 rather than 1. This change allows Onload to accelerate multicast transmit
for most applications, but means that multicast traffic is not delivered to other
applications on the same host unless the subscriber sockets are in the same stack.
The normal behavior can be restored by setting EF_FORCE_SEND_MULTICAST=0, but
this limits multicast acceleration on transmit to sockets that have manually set the
IP_MULTICAST_LOOP socket option to zero.
Multicast Receive Sharing an Onload Stack
Setting the EF_NAME environment variable to the same string (max 8 chars) in both
processes means they can share an Onload stack. The stream is no longer redirected
through the kernel, resulting in a much higher receive rate than can be observed with
the kernel TCP/IP stack (or with separate Onload stacks where the data path is via
the kernel TCP/IP stack). This configuration is illustrated in Figure 12 below. Lighter
arrows indicate the accelerated (kernel bypass) path. Darker arrows indicate the
fragmented UDP path.
Figure 12: Sharing an Onload Stack
Multicast Transmit Sharing an Onload Stack
Referring to Figure 12, datagrams transmitted by one process would be received by
the second process because both processes share the Onload stack.
Multicast Receive - Onload Stack and Kernel Stack
If a multicast stream is being accelerated by Onload, and another application that is
not using Onload subscribes to the same stream, then the second application will
not receive the associated datagrams. Therefore if multiple applications subscribe
to a particular multicast stream, either all or none should be run with Onload.
To enable multiple applications accelerated with Onload to subscribe to the same
multicast stream, the applications must share the same Onload stack. Stack sharing
is achieved by using the EF_NAME environment variable (max 8 chars).
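As a minimal sketch of the stack-sharing setup described above, the following shell fragment checks that a candidate stack name fits the 8-character limit before exporting it as EF_NAME. The helper name valid_stack_name and the stack name mcast1 are illustrative assumptions and are not part of Onload.

```shell
# Hypothetical helper: succeed only if the name is 1 to 8 characters long,
# the EF_NAME limit stated in this guide.
valid_stack_name() {
    name=$1
    [ -n "$name" ] && [ "${#name}" -le 8 ]
}

# Illustrative stack name to be shared by both subscriber processes.
STACK=mcast1
if valid_stack_name "$STACK"; then
    export EF_NAME="$STACK"
    echo "EF_NAME=$EF_NAME"
else
    echo "stack name '$STACK' is not 1 to 8 characters long" >&2
fi
```

Each process that should share the stack would then be started from an environment carrying the same EF_NAME value, for example through the onload launcher (onload ./subscriber, where ./subscriber is a placeholder application).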
Multicast Receive and Multiple Sockets
When multiple sockets join the same multicast group, received packets are
delivered to these sockets in the order that they joined the group.
When multiple sockets are created by different threads and all threads are spinning
on recv(), the thread which is able to receive first will also deliver the packets to
the other sockets.
If a thread 'A' is spinning on poll(), and another thread 'B', listening to the same
group, calls recv() but does not spin, 'A' will notice a received packet first and
deliver the packet to 'B' without an interrupt occurring.
8.10 Multicast Loopback
The socket option IP_MULTICAST_LOOP controls whether multicast traffic sent on a
socket can be received locally on the machine. With Onload, the default value of the
IP_MULTICAST_LOOP socket option is 0 (the kernel stack defaults
IP_MULTICAST_LOOP to 1). Therefore by default with Onload multicast traffic sent
on a socket will not be received locally.
As well as setting IP_MULTICAST_LOOP to 1, receiving multicast traffic locally
requires both the sender and receiver to be using the same Onload stack. Therefore,
when a receiver is in the same application as the sender it will receive multicast
traffic. If sender and receiver are in different applications then both must be running
Onload and must be configured to share the same Onload stack.
For two processes to share an Onload stack both must set the same value for the
EF_NAME parameter (max 8 chars). If one local process is to receive the data sent by
a sending local process, EF_MCAST_SEND must be set to 1 or 3 on the thread creator
of the stack.
Users of earlier Onload versions and users of EF_MULTICAST_LOOP_OFF should refer
to the Parameter Reference table (Parameter Reference on page 146) for details of
deprecated features.
8.11 Hardware Multicast Loopback
Hardware Multicast Loopback, available from openonload-201405, is an alternative
to the Onload stack sharing scheme described in Multicast Loopback. It enables the
passing of multicast traffic between Onload stacks, allowing applications running on
the same server to benefit from Onload acceleration without the need to share an
Onload stack, thereby reducing the risk of stack lock and resource contention.
Figure 13: Hardware Multicast Loopback
• Only available on the Solarflare Flareon SFN7000 series adapters.
• Adapters must have a minimum firmware version v4.0.7.6710 and "full
featured" firmware must be selected using the firmware variant option via
the "sfboot" utility. Refer to the Solarflare Server User Guide "sfboot
parameters" for further details.
Hardware Multicast Loopback allows data generated by one process to be received
by another process on the same host. Multicast Replication does not support local
loopback.
Reception of looped back traffic is enabled by default on a per Onload stack basis. A
stack can choose not to receive looped back traffic by setting the environment
variable EF_MCAST_RECV_HW_LOOP=0.
NOTE: Hardware Multicast Loopback is enabled through a single hardware filter.
For this reason, if any single process chooses to receive multicast loopback traffic
by EF_MCAST_RECV_HW_LOOP=1, then all other processes joined to the same
multicast group will also receive the loopback traffic regardless of their setting for
EF_MCAST_RECV_HW_LOOP.
Sending of looped back traffic is disabled by default. On a per-stack basis this feature
can be enabled by setting the environment variable EF_MCAST_SEND to either 2 or 3.
Setting the socket option IP_MULTICAST_TTL=0 will disable the sending of traffic on
the normal network path and prevent traffic being looped back. The value of the socket
option IP_MULTICAST_LOOP has no effect on Hardware Multicast Loopback. Refer
to Onload and IP_MULTICAST_TTL on page 119 for differences in Linux kernel and
Onload behavior.
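The EF_MCAST_SEND values quoted in sections 8.10 and 8.11 (1 or 3 for loopback within a shared stack, 2 or 3 for hardware loopback) are consistent with a two-bit encoding. The sketch below assumes that interpretation; the function name mcast_send_value is hypothetical, and the Parameter Reference remains the authority on the meaning of each value.

```shell
# Hypothetical helper: derive an EF_MCAST_SEND value from two yes/no choices.
#   $1 = 1 to loop multicast back to sockets in the same Onload stack
#   $2 = 1 to loop multicast back through the adapter (hardware loopback)
# Assumes 0 = neither, 1 = same stack only, 2 = hardware only, 3 = both,
# as inferred from the values quoted in the text.
mcast_send_value() {
    echo $(( $1 + 2 * $2 ))
}

# Example: a sender stack that wants hardware loopback only.
EF_MCAST_SEND=$(mcast_send_value 0 1)
export EF_MCAST_SEND
echo "EF_MCAST_SEND=$EF_MCAST_SEND"
```

A receiving stack that should ignore looped back traffic would additionally set EF_MCAST_RECV_HW_LOOP=0, subject to the single-filter caveat in the NOTE above.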
8.12 IP_MULTICAST_ALL
For an accelerated socket, Onload will always behave as if IP_MULTICAST_ALL=0.
There is always the potential for messages to arrive at the host (perhaps from a
non-Solarflare interface or via the loopback interface) which will also be delivered
to the socket under normal UDP port matching rules, so the socket could receive
traffic for groups not explicitly joined on this socket.
9 Packet Buffers
9.1 Introduction
Packet buffers describe the memory used by the Onload stack (and Solarflare
adapter) to receive, transmit and queue network data. Packet buffers provide a
method for user-mode accessible memory to be directly accessed by the network
adapter without compromising system integrity.
Onload will request huge pages if these are available when allocating memory for
packet buffers. Using huge pages can lead to improved performance for some
applications by reducing the number of Translation Lookaside Buffer (TLB) entries
needed to describe packet buffers and therefore minimize TLB 'thrashing'.
NOTE: Onload huge page support should not be enabled if the application uses IPC
namespaces and the CLONE_NEWIPC flag.
Onload offers two configuration modes for network packet buffers:
9.2 Network Adapter Buffer Table Mode
Solarflare network adapters employ a proprietary hardware-based buffer address
translation mechanism to provide memory protection and translation to Onload
stacks accessing a VNIC on the adapter. This is the default packet buffer mode and
is suitable for the majority of applications using Onload.
This scheme employs a buffer table residing on the network adapter to control the
memory an Onload stack can use to send and receive packets.
While the adapter's buffer table is sufficient for the majority of applications, on
adapters prior to the SFN7000 series, it is limited to approximately 120,000 x 2 Kbyte
buffers which have to be shared between all Onload stacks.
If the total packet buffer requirements of all applications using Onload require more
than the number of packet buffers supported by the adapter's buffer table, the user
should consider changing to the Scalable Packet Buffers configuration.
9.3 Large Buffer Table Support
The Solarflare SFN7000 series adapters alleviate the packet buffer limitations of
previous generation Solarflare adapters and support many more than the 120,000
packet buffers without the need to switch to Scalable Packet Buffer Mode.
Each buffer table entry in the SFN7000 series adapter can describe a 4 Kbyte,
64 Kbyte, 1 Mbyte or 4 Mbyte block of memory, where each table entry is the page
size as directed by the operating system.
9.4 Scalable Packet Buffer Mode
Scalable Packet Buffer Mode is an alternative packet buffer mode which allows a
much higher number of packet buffers to be used by Onload. Using the Scalable
Packet Buffer Mode, Onload stacks employ Single Root I/O Virtualization (SR-IOV)
virtual functions (VF) to provide memory protection and translation. This
mechanism removes the 120K buffers limitation imposed by the Network Adapter
Buffer Table Mode.
For deployments where using SR-IOV and/or the IOMMU is not an option, Onload
also supports an alternative Scalable Packet Buffer Mode scheme called Physical
Addressing Mode. Physical addressing also removes the 120K packet buffer
limitation; however, physical addressing does not provide the memory protection
provided by SR-IOV and an IOMMU. For details of Physical Addressing Mode see
Physical Addressing Mode on page 106.
NOTE: Enabling SR-IOV, which is needed for Scalable Packet Buffer Mode, has a
latency impact which depends on the adapter model. For the SFN5000 adapter
series, latency increases by approximately 50 ns for the 1/2 RTT latency. The
SFN6000 adapter series has equivalent latency to the SFN5000 adapter series when
operating in this mode.
NOTE: MRG users should refer to Red Hat MRG 2 and SR-IOV on page 128.
For further details on SR-IOV configuration refer to Configuring Scalable Packet
Buffers on page 102.
9.5 Allocating Huge Pages
Using huge pages can lead to improved performance for some applications by
reducing the number of Translation Lookaside Buffer (TLB) entries needed to
describe packet buffers and therefore minimize TLB 'thrashing'. A huge page also
holds many packet buffers, but consumes only a single entry in the buffer table.
Explicit huge pages are recommended.
The current huge page allocation can be checked by inspection of /proc/meminfo:
cat /proc/meminfo | grep Huge
This should return something similar to:
AnonHugePages: 2048 kB
HugePages_Total: 2050
HugePages_Free: 2050
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
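The fields in that output can be combined to give the total size of the huge page pool. The following is a minimal sketch; the function name hugepage_pool_kb is hypothetical, and it assumes the whitespace-separated field layout that the Linux kernel uses in /proc/meminfo.

```shell
# Hypothetical helper: print the total size of the huge page pool in kB,
# computed as HugePages_Total * Hugepagesize.
#   $1 = file in /proc/meminfo format (defaults to /proc/meminfo)
hugepage_pool_kb() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END                 { print total * size_kb }
    ' "${1:-/proc/meminfo}"
}

# Usage on a live system:
#   hugepage_pool_kb
```

With the example values shown above (2050 pages of 2048 kB) the helper would report 4198400 kB.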
The total number of huge pages available on the system is the value
HugePages_Total. The following command can be used to dynamically set and/or
change the number of huge pages allocated on a system to <N> (where <N> is a
non-negative integer):
echo <N> > /proc/sys/vm/nr_hugepages
On a NUMA platform, the kernel will attempt to distribute the huge page pool over
the set of all allowed nodes specified by the NUMA memory policy of the task that
modifies nr_hugepages. The following command can be used to check the per-node
distribution of huge pages in a NUMA system:
cat /sys/devices/system/node/node*/meminfo | grep Huge
Huge pages can also be allocated on a per-NUMA node basis (rather than have the
huge pages allocated across multiple NUMA nodes). The following command can be
used to allocate <N> huge pages on NUMA node <M>:
echo <N> > /sys/devices/system/node/node<M>/hugepages/hugepages-2048kB/nr_hugepages
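When choosing a value for <N>, a rough lower bound can be derived from the number of packet buffers a stack may use. The sketch below assumes 2048-byte packet buffers (the size reported by onload_stackdump later in this chapter), 2048 kB huge pages as in the /proc/meminfo example, and no per-page overhead. The function name is hypothetical.

```shell
# Hypothetical helper: minimum number of huge pages needed to back
# a given number of packet buffers.
#   $1 = number of packet buffers
#   $2 = packet buffer size in bytes (default 2048)
#   $3 = huge page size in kB (default 2048)
hugepages_for_buffers() {
    bufs=$1
    buf_bytes=${2:-2048}
    page_kb=${3:-2048}
    per_page=$(( page_kb * 1024 / buf_bytes ))    # buffers held by one huge page
    echo $(( (bufs + per_page - 1) / per_page ))  # round up
}

# Example: the default EF_MAX_PACKETS value of 32768 quoted in this chapter.
hugepages_for_buffers 32768
```

Under these assumptions one huge page holds 1024 packet buffers, so the example prints 32. Other consumers of huge pages on the host would need to be added on top of this figure.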
9.6 How Packet Buffers Are Used by Onload
Each packet buffer is allocated to exactly one Onload stack and is used to receive,
transmit or queue network data. Packet buffers are used by Onload in the following
ways:
1 Receive descriptor rings. By default the RX descriptor ring will hold 512 packet
buffers at all times. This value is configurable using the EF_RXQ_SIZE (per
stack) variable.
2 Transmit descriptor rings. By default the TX descriptor ring will hold up to 512
packet buffers. This value is configurable using the EF_TXQ_SIZE (per stack)
variable.
3 To queue data held in receive and transmit socket buffers.
4 TCP sockets can also hold packet buffers in the socket's retransmit queue and
in the reorder queue.
5 User-level pipes also consume packet buffer resources.
Identifying Packet Buffer Requirements
When deciding the number of packet buffers required by an Onload stack,
consideration should be given to the resource needs of the stack to ensure that the
available packet buffers can be shared efficiently between all Onload stacks.
Example 1:
If we consider a hypothetical case of a single host:
- which employs multiple Onload stacks, e.g. 10
- each stack has multiple sockets, e.g. 6
- and each socket uses many packet buffers, e.g. 2000
This would require a total of 120000 packet buffers.
Example 2:
If on a stack the TCP receive queue is 1 Mbyte and the MSS value is 1472 bytes,
this would require at least 700 packet buffers (and a greater number if
segments smaller than the MSS were received).
Example 3:
A UDP receive queue of 200 Kbytes where received datagrams are each 200
bytes would hold 1000 packet buffers.
The examples above use only approximate calculated values. The
onload_stackdump command provides accurate measurements of packet buffer
allocation and usage.
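The arithmetic behind the three examples can be reproduced with shell integer arithmetic. This is only a sketch: it assumes one segment or datagram per packet buffer, takes 1 Mbyte as 1048576 bytes and 1 Kbyte as 1000 bytes, and the helper name ceil_div is hypothetical.

```shell
# Round-up integer division: ceil_div A B prints ceil(A / B).
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example 1: 10 stacks x 6 sockets x 2000 packet buffers per socket.
echo "example 1: $(( 10 * 6 * 2000 ))"

# Example 2: 1 Mbyte TCP receive queue filled with full 1472-byte segments.
echo "example 2: $(ceil_div 1048576 1472)"

# Example 3: 200 Kbyte UDP receive queue of 200-byte datagrams.
echo "example 3: $(ceil_div 200000 200)"
```

Under these assumptions the script prints 120000, 713 and 1000, in line with the approximate figures quoted in the examples.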
Consideration should be given to packet buffer allocation to ensure that each stack
is allocated the buffers it will require rather than a 'one size fits all' approach.
When using the Buffer Table Mode the system is limited to 120K packet buffers;
these are allocated symmetrically across all Solarflare interfaces.
NOTE: Packet buffers are accessible to all network interfaces and each packet buffer
requires an entry in every network adapter's buffer table. Adding more network
adapters, and therefore more interfaces, does not increase the number of packet
buffers available.
For large scale applications the Scalable Packet Buffer Mode removes the limitations
imposed by the network adapter buffer table. See Configuring Scalable Packet
Buffers on page 102 for details.
Running Out of Packet Buffers
When Onload detects that a stack is close to allocating all available packet buffers it
will take action to try and avoid packet buffer exhaustion. Onload will automatically
start dropping packets on receive and, where possible, will reduce the receive
descriptor ring fill level in an attempt to alleviate the situation. A 'memory pressure'
condition can be identified using the onload_stackdump lots command where
the pkt_bufs field will display the CRITICAL indicator. See Identifying Memory
Pressure below.
Complete packet buffer exhaustion can result in deadlock. In an Onload stack, if all
available packet buffers are allocated (for example currently queued in socket
buffers) the stack is prevented from transmitting further data as there are no packet
buffers available for the task.
If all available packet buffers are allocated then Onload will also fail to keep its
adapter's receive queues replenished. If the queues fall empty, further data received
by the adapter is instantly dropped. On a TCP connection packet buffers are used to
hold unacknowledged data in the retransmit queue, and dropping received packets
containing ACKs delays the freeing of these packet buffers back to Onload. Setting
the value of EF_MIN_FREE_PACKETS=0 can result in a stack having no free packet
buffers and this, in turn, can prevent the stack from shutting down cleanly.
Identifying Memory Pressure
The following extracts from the onload_stackdump command identify an Onload
stack under memory pressure.
The EF_MAX_PACKETS value identifies the maximum number of packet buffers that
can be used by the stack. EF_MAX_RX_PACKETS is the maximum number of packet
buffers that can be used to hold packets received. EF_MAX_TX_PACKETS is the
maximum number of packet buffers that can be used to hold packets to send. These
two values are always less than EF_MAX_PACKETS to ensure that neither the transmit
nor receive paths can starve the other of packet buffers. Refer to Parameter
Reference on page 146 for detailed descriptions of these per-stack variables.
The example Onload stack has the following default environment variable values:
EF_MAX_PACKETS: 32768
EF_MAX_RX_PACKETS: 24576
EF_MAX_TX_PACKETS: 24576
The onload_stackdump lots command identifies packet buffer allocation and the
onset of a memory pressure state:
pkt_bufs: size=2048 max=32768 alloc=24576 free=32 async=0 CRITICAL
pkt_bufs: rx=24544 rx_ring=9 rx_queued=24535
There are potentially 32768 packet buffers available and the stack has allocated
(used) 24576 packet buffers.
In the socket receive buffers there are 24544 packet buffers waiting to be
processed by the application. This is approaching the EF_MAX_RX_PACKETS limit and
is the reason the CRITICAL flag is present, i.e. the Onload stack is under memory
pressure. Only 9 packet buffers are available to the receive descriptor ring.
Onload will aim to keep the RX descriptor ring full at all times. If there are not
enough available packet buffers to refill the RX descriptor ring this is indicated by the
LOW memory pressure flag.
The onload_stackdump lots command will also identify the number of memory
pressure events and number of packets dropped as a result of memory pressure.
memory_pressure: 1
memory_pressure_drops: 22096
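For monitoring purposes, the pkt_bufs summary line can be picked out of the stackdump text with a small filter. The sketch below is illustrative only: the function name pkt_buf_state is hypothetical, and it assumes that the CRITICAL or LOW flag appears as a separate word on the same line as the free= field, as in the extract above.

```shell
# Hypothetical helper: read "onload_stackdump lots" style output on stdin and
# print the free packet buffer count followed by any memory pressure flag
# found on the pkt_bufs summary line (CRITICAL or LOW), or "ok" if none.
pkt_buf_state() {
    awk '
        /pkt_bufs:/ && /free=/ {
            flag = "ok"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^free=/) nfree = substr($i, 6)
                if ($i == "CRITICAL" || $i == "LOW") flag = $i
            }
            print nfree, flag
        }
    '
}

# Usage on a live system:
#   onload_stackdump lots | pkt_buf_state
```

Applied to the extract above, the filter would print "32 CRITICAL".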
Controlling Onload Packet Buffer Use
A number of environment variables control the packet buffer allocation on a per
stack basis. Refer to Parameter Reference on page 146 for a description of
EF_MAX_PACKETS.
Unless explicitly configured by the user, EF_MAX_RX_PACKETS and
EF_MAX_TX_PACKETS will be automatically set to 75% of the EF_MAX_PACKETS
value. This ensures that sufficient buffers are available to both receive and transmit.
The EF_MAX_RX_PACKETS and EF_MAX_TX_PACKETS are not typically configured by
the user.
If an application requires more packet buffers than the maximum configured, then
EF_MAX_PACKETS may be increased to meet demand. However, it should be
recognized that larger packet buffer queues increase cache footprint, which can lead
to reduced throughput and increased latency.
EF_MAX_PACKETS is the maximum number of packet buffers that could be used by
the stack. Setting EF_MAX_RX_PACKETS to a value greater than EF_MAX_PACKETS
effectively means that all packet buffers (EF_MAX_PACKETS) allocated to the stack
will be used for RX, with nothing left for TX. The safest method is to only increase
EF_MAX_PACKETS, which keeps the RX and TX packet buffer values at 75% of this
value.
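The 75% relationship can be checked before launching an application with an increased limit. A minimal sketch follows; the function name default_rx_tx_limit and the value 65536 are illustrative assumptions.

```shell
# Hypothetical helper: the RX/TX limit that follows from the 75% rule,
# given an EF_MAX_PACKETS value in $1.
default_rx_tx_limit() {
    echo $(( $1 * 3 / 4 ))
}

# Illustrative increased value; only EF_MAX_PACKETS is set, so the RX and TX
# limits are left to be derived automatically.
EF_MAX_PACKETS=65536
export EF_MAX_PACKETS
echo "EF_MAX_PACKETS=$EF_MAX_PACKETS RX/TX limit=$(default_rx_tx_limit "$EF_MAX_PACKETS")"
```

For the default EF_MAX_PACKETS of 32768 the same calculation gives 24576, matching the values shown under Identifying Memory Pressure.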
9.7 Configuring Scalable Packet Buffers
NOTE: SR-IOV, and therefore Scalable Packet Buffer Mode, is not currently supported
on the SFN7000 series adapter but will be available in a future release.
Using the Scalable Packet Buffer Mode, Onload stacks are bound to virtual functions
(VFs) and provide a PCI SR-IOV compliant means to provide memory protection and
translation. VFs employ the kernel IOMMU.
Refer to Chapter 11 and Scalable Packet Buffer Mode on page 127 for 32-bit kernel
limitations.
Procedure:
Step 1. Platform Support on page 102
Step 2. BIOS and Linux Kernel Configuration on page 103
Step 3. Update adapter firmware and enable SR-IOV on page 104
Step 4. Enable VFs for Onload on page 105
Step 5. Check PCIe VF Configuration on page 105
Step 6. Check VFs in onload_stackdump on page 105
Step 1. Platform Support
Scalable Packet Buffer Mode is implemented using SR-IOV, support for which is a
relatively recent addition to the Linux kernel. There were several kernel bugs in early
incarnations of SR-IOV support, up to and including kernel.org 2.6.34. The fixes have
been back-ported to recent Red Hat kernels. Users are advised to enable scalable
packet buffer mode on Red Hat kernel 2.6.32-131.0.15 or later, or kernel.org 2.6.35
or later. In other distributions, it is recommended that the most recent patched
kernel version is used.
• The system hardware must have an IOMMU and this must be enabled in the
BIOS.
• The kernel must be compiled with support for IOMMU and kernel command
line options are required to select the IOMMU mode.
• The kernel must be compiled with support for SR-IOV APIs (CONFIG_PCI_IOV).
• SR-IOV must be enabled on the network adapter using the sfboot utility.
• When more than 6 VFs are needed, the system hardware and kernel must
support PCIe Alternative Requester ID (ARI), a PCIe Gen 2 feature.
• The Onload option EF_PACKET_BUFFER_MODE=1 must be set in the environment.
• The sfc driver module option max_vfs should be set to the required number of
VFs.
NOTE: The Scalable Packet Buffer feature can be susceptible to known kernel issues
observed on RHEL6 and SLES11. (See http://www.spinics.net/lists/linux-pci/
msg10480.html for details.) The condition can result in an unresponsive server if
intel_iommu has been enabled in the grub.conf file, as per the procedure at Step
2. BIOS and Linux Kernel Configuration on page 103, and if the Solarflare
sfc_resource driver is reloaded. This issue has been addressed in newer kernels.
Step 2. BIOS and Linux Kernel Configuration
To use SR-IOV, hardware virtualization must be enabled. Refer to Red Hat Enabling
Intel VT-x and AMD-V Virtualization in BIOS for more information. Take care to
enable VT-d as well as VT on an Intel platform.
To verify that the extensions have been correctly enabled refer to Red Hat Verifying
virtualization extensions. For best kernel configuration performance and to avoid
kernel bugs exhibited when IOMMU is enabled for all devices, Solarflare
recommends that the kernel is configured to use the IOMMU in pass-through mode.
Append the following options to the kernel line in the /boot/grub/grub.conf file:
On an Intel system:
intel_iommu=on iommu=on,pt
On an AMD system:
amd_iommu=on iommu=on,pt
In pass-through mode the IOMMU is bypassed for regular devices. Refer to Red Hat:
PCI passthrough for more information.
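After rebooting, it can be useful to confirm that the running kernel was actually started with a pass-through option. The following is a hedged sketch rather than a Solarflare tool: the function name iommu_passthrough_requested is hypothetical, and it only inspects the command line text, not the state of the IOMMU itself.

```shell
# Hypothetical helper: succeed if a kernel command line string contains an
# iommu= option whose comma-separated value list includes "pt".
#   $1 = kernel command line (for example, the contents of /proc/cmdline)
iommu_passthrough_requested() {
    for opt in $1; do
        case $opt in
            iommu=pt|iommu=pt,*|iommu=*,pt|iommu=*,pt,*) return 0 ;;
        esac
    done
    return 1
}

# Usage on a live system:
#   iommu_passthrough_requested "$(cat /proc/cmdline)" && echo "pass-through requested"
```

The check treats intel_iommu=on or amd_iommu=on on their own as insufficient, since neither selects pass-through mode.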
NOTE: On Linux Red Hat 5 servers (2.6.18) it is necessary to also use the
iommu_type=2 option.
NOTE: Enterprise Onload v2.1.0.0 users and OpenOnload v201109-u2 (onwards)
users:
Recent kernels are compiled with support for IOMMUs by default, but
unfortunately the realtime (-rt) kernel patches are not currently compatible with
IOMMUs (Red Hat MRG kernels are compiled with CONFIG_PCI_IOV disabled). It is
possible to use scalable packet buffer mode on some systems without IOMMU
support, but in an insecure mode. In this configuration the IOMMU is bypassed, and
there is no checking of DMA addresses provided by Onload in user space. Bugs or
misbehavior of user-space code can compromise the system.
To enable this insecure mode, set the Onload module option
unsafe_sriov_without_iommu=1 for the sfc_resource kernel module.
Linux MRG users are urged to use MRG u2 and kernel 3.2.33-rt50.66.el6rt.x86_64
or later to avoid known issues and limitations of earlier versions.
The unsafe_sriov_without_iommu option is obsoleted in OpenOnload 201210. It
is replaced by physical addressing mode - see Physical Addressing Mode on
page 106 for details.
Step 3. Update adapter firmware and enable SR-IOV
1 Download the Solarflare Linux Utilities RPM from support.solarflare.com and
unzip the utilities file to reveal the RPM.
2 Install the RPM:
# rpm -Uvh sfutils-<version>.rpm
3 Identify the current firmware version on the adapter:
# sfupdate
4 Upgrade the adapter firmware with sfupdate:
# sfupdate --write
Full instructions on using sfupdate can be found in the Solarflare Network
Server Adapter User Guide.
5 Use sfboot to enable SR-IOV and enable the VFs. You can enable up to 127 VFs
per port, but the host BIOS may only be able to support a smaller number. The
following example will configure 16 VFs on each Solarflare port:
# sfboot sriov=enabled vf-count=16 vf-msix-limit=1
6 It is necessary to reboot the server following changes using sfboot and
sfupdate.
NOTE: Enabling all 127 VFs per port with more than one MSI-X interrupt per VF may
not be supported by the host BIOS. If the BIOS doesn't support this then you may
get 127 VFs on one port and no VFs on the other port. You should contact your BIOS
vendor for an upgrade or reduce the VF count.
NOTE: On Red Hat 5 servers the vf-count should not exceed 32.
Option                      Default Value   Description
sriov=<enabled|disabled>    Disabled        Enable/Disable hardware SR-IOV support
vf-count=<n>                127             Number of virtual functions advertised per port. See the note below.
vf-msix-limit=<n>           1               Number of MSI-X interrupts per VF
NOTE: VF allocation must be symmetric across all Solarflare interfaces.
Step 4. Enable VFs for Onload
# export EF_PACKET_BUFFER_MODE=1
The sfc driver module option max_vfs should specify the number of required VFs. The
driver module option can be set in a user-created file (e.g. sfc.conf) in the /etc/
modprobe.d directory:
options sfc max_vfs=N
Refer to Parameter Reference on page 146 for other values.
Step 5. Check PCIe VF Configuration
The network adapter sfc driver will initialize the VFs, which can be displayed by the
lspci command:
# lspci -d 1924:
05:00.0 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.1 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.2 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.3 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.4 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.5 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.6 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.7 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.0 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.1 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
The lspci example output above identifies one physical function per physical port
and the virtual functions (four for each port) of a single Solarflare dual-port network
adapter.
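To compare the number of VFs the kernel has enumerated against the configured VF count, the lspci listing can be counted rather than read by eye. A minimal sketch, assuming the "Virtual Function" wording shown in the output above; the function name count_vfs is hypothetical.

```shell
# Hypothetical helper: count virtual functions in "lspci -d 1924:" output
# supplied on stdin, by counting lines that mention "Virtual Function".
count_vfs() {
    grep -c 'Virtual Function'
}

# Usage on a live system:
#   lspci -d 1924: | count_vfs
```

For the example listing above the count would be 8, that is four VFs for each of the two ports. Note that grep -c still prints 0 but returns a non-zero exit status when no VF lines are present.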
Step 6. Check VFs in onload_stackdump
The onload_stackdump netif command will identify VFs being used by Onload
stacks as in the following example:
# onload_stackdump netif
ci_netif_dump: stack=0 name=
ver=201109 uid=0 pid=3354
lock=10000000 UNLOCKED nics=3 primed=3
sock_bufs: max=1024 n_allocated=4
pkt_bufs: size=2048 max=32768 alloc=1152 free=128 async=0
pkt_bufs: rx=1024 rx_ring=1024 rx_queued=0
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0 tx_other=0
time: netif=3df7d2 poll=3df7d2 now=3df7d2 (diff=0.000sec)
ci_netif_dump_vi: stack=0 intf=0 vi=67 dev=0000:05:01.0 hw=0C0
evq: cap=2048 current=8 is_32_evs=0 is_ev=0
rxq: cap=511 lim=511 spc=15 level=496 total_desc=0
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
txq: tot_pkts=0 bytes=0
ci_netif_dump_vi: stack=0 intf=1 vi=67 dev=0000:05:01.1 hw=0C0
evq: cap=2048 current=8 is_32_evs=0 is_ev=0
rxq: cap=511 lim=511 spc=15 level=496 total_desc=0
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
txq: tot_pkts=0 bytes=0
The output above corresponds to VFs advertised on the Solarflare network adapter
interface identified using the lspci command. Refer to Step 5 above.
9.8 Physical Addressing Mode
Physical addressing mode is a Scalable Packet Buffer Mode that also allows Onload
stacks to use large amounts of packet buffer memory (avoiding the limitations of the
address translation table on the adapter), but without the requirement to configure
and use SR-IOV virtual functions.
Physical addressing mode does, however, remove memory protection from the
network adapter's access of packet buffers. Unprivileged user-level code is provided
with, and directly handles, the raw physical memory addresses of packet buffers.
User-level code provides physical memory addresses directly to the adapter and
therefore has the ability to direct the adapter to read or write arbitrary memory
locations. A result of this is that a malicious or buggy application can compromise
system integrity and security. OpenOnload versions earlier than onload-201210 and
Enterprise Onload 2.1.0.0 are limited to 1 million packet buffers. This limit was
raised to 2 million packet buffers in 201210-u1 and Enterprise Onload 2.1.0.1.
To enable physical addressing mode:
1 Ignore configuration steps 1-4 above.
2 Put the following option into a user-created .conf file in the /etc/modprobe.d
directory:
options onload phys_mode_gid=<n>
Where setting <n> to be -1 allows all users to use physical addressing mode and
setting to an integer x restricts use of physical addressing mode to the specific
user group x.
3 Reload the Onload drivers:
onload_tool reload
4 Enable the Onload environment using EF_PACKET_BUFFER_MODE 2 or 3.
EF_PACKET_BUFFER_MODE=2 is equivalent to mode 0, but uses physical
addresses. Mode 3 uses SR-IOV VFs with physical addresses, but does not use
the IOMMU for memory translation and protection. Refer to Parameter
Reference on page 146 for a complete description of all
EF_PACKET_BUFFER_MODE options.
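The mode values mentioned across this chapter can be collected in one place. The sketch below is an informal summary inferred from the descriptions in sections 9.2, 9.7 and 9.8; the function name packet_buffer_mode_summary is hypothetical, and the Parameter Reference remains the definitive description.

```shell
# Hypothetical helper: summarize an EF_PACKET_BUFFER_MODE value ($1) using the
# descriptions given in this chapter. Returns non-zero for other values.
packet_buffer_mode_summary() {
    case $1 in
        0) echo "buffer table mode (default), adapter address translation" ;;
        1) echo "scalable mode, SR-IOV VFs with IOMMU protection" ;;
        2) echo "physical addressing, as mode 0 but without memory protection" ;;
        3) echo "physical addressing with SR-IOV VFs, IOMMU not used" ;;
        *) echo "unknown mode"; return 1 ;;
    esac
}

# Example: report what a chosen mode implies before exporting it.
packet_buffer_mode_summary 2
```

A launch script could call the helper and refuse to continue when it returns non-zero, so that a mistyped mode is caught before the application starts.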
9.9 Programmed I/O
PIO (programmed input/output) describes the process whereby data is directly
transferred by the CPU to or from an I/O device. It is an alternative to bus-master
DMA techniques where data is transferred without CPU involvement.
Solarflare 7000 series adapters support TX PIO, where packets on the transmit path
can be "pushed" to the adapter directly by the CPU. This improves the latency of
transmitted packets but can cause a very small increase in CPU utilization. TX PIO is
therefore especially useful for smaller packets.
The Onload TX PIO feature is enabled by default but can be disabled via the
environment variable EF_PIO. An additional environment variable,
EF_PIO_THRESHOLD, specifies the size of the largest packet that can use TX PIO.
PIO buffers on the adapter are limited to a maximum of 8 Onload stacks. For
optimum performance, PIO buffers should be reserved for critical processes and
other processes should set EF_PIO to 0 (zero).
The Onload stackdump utility provides additional counters to indicate the level of
PIO use - see TX PIO Counters on page 220 for details.
The Solarflare net driver will also use PIO buffers for non-accelerated sockets and
this will reduce the number of PIO buffers available to Onload stacks. To prevent this
set the driver module option piobuf_size=0.
When both accelerated and non-accelerated sockets are using PIO, the number of
PIO buffers available to Onload stacks can be calculated from the total of 16 available
PIO regions:
Parameter     Description                     Example value
piobuf_size   driver module parameter         256
rss_cpus      driver module parameter         4
region        a chunk of memory, 2048 bytes   2048 bytes
Using the above example values, each port on the adapter requires:
piobuf_size * rss_cpus / region size = 0.5 regions (rounded up, so each port needs 1
region).
On a dual-port adapter this leaves 16 - 2 = 14 regions for Onload stacks, which also
require one region per port, per stack. Therefore from our example we can have 7
Onload stacks using PIO buffers.
PIO buffers are allocated on a first-come, first-served basis. The following warning
might be observed when stacks cannot be allocated any more PIO buffers:
WARNING: all PIO bufs allocated to other stacks. Continuing without PIO.
Use EF_PIO to control this
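The PIO region arithmetic in this section can be sketched in shell as follows. The calculation assumes 16 regions of 2048 bytes and a per-port rounding up of the net driver's requirement, as in the worked example; the function name pio_stacks_available and the port-count argument are illustrative assumptions.

```shell
# Hypothetical helper: number of Onload stacks that can obtain PIO buffers.
#   $1 = piobuf_size (bytes), $2 = rss_cpus, $3 = number of adapter ports
# Assumes 16 PIO regions of 2048 bytes each.
pio_stacks_available() {
    piobuf_size=$1; rss_cpus=$2; ports=$3
    region_bytes=2048
    total_regions=16
    # Regions the net driver needs on each port, rounded up.
    drv=$(( (piobuf_size * rss_cpus + region_bytes - 1) / region_bytes ))
    left=$(( total_regions - drv * ports ))
    # Each Onload stack needs one region per port.
    echo $(( left / ports ))
}

# Worked example: piobuf_size=256, rss_cpus=4, dual-port adapter.
pio_stacks_available 256 4 2
```

The example prints 7. With piobuf_size=0 the net driver takes no regions and the same calculation gives 8 stacks on a dual-port adapter, in line with the maximum of 8 Onload stacks stated earlier in this section.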
To ensure more buffers are available for Onload, it is possible to prevent the net
driver from using PIO buffers. This can be done by setting the sfc driver module
option in a user-created file in the /etc/modprobe.d directory:
options sfc piobuf_size=0
Drivers should be reloaded for the changes to be effective:
# onload_tool reload
The per-stack EF_PIO variable can also be unset for stacks where PIO buffers are not
required.
9.10 Templated Sends
"Templated sends" is another SFN7000 series adapter feature that builds on top of
TX PIO to provide further transmit latency improvements. This can be used in
applications that know the majority of the content of packets in advance of when
the packet is to be sent. For example, a market feed handler may publish packets
that vary only in the specific value of certain fields, possibly different symbols and
price information, but are otherwise identical. Templated sends involve creating a
template of a packet on the adapter containing the bulk of the data prior to the time
of sending the packet. Then, when the packet is to be sent, the remaining data is
pushed to the adapter to complete and send the packet.
The Onload templated sends feature uses the Onload Extensions API to generate the
packet template which is then instantiated on the adapter ready to receive the
"missing" data before each transmission.
The API details are available in the Onload 201310 distribution at /src/include/
onload/extensions_zc.h
Refer to Onload Extensions API for further information on the use of packet
templates including code examples of using this feature.
10 Onload and Virtualization
10.1 Introduction
Using Onload 201502, accelerated applications are able to benefit from the inherent
security through isolation, ease of deployment through migration and increased
resource management supported by Linux virtualized environments.
This chapter identifies the following:
• Onload and Linux KVM on page 109
• Onload and NIC Partitioning on page 111
• Onload in a Docker Container on page 113
10.2 Overview
• Running Onload in a Virtual Machine (VM) or Docker Container means the
Onload accelerated application benefits from the inherent isolation policy of
the virtualized environment.
• There is minimal degradation of latency and throughput performance. Near
native network I/O performance is possible because there is direct hardware
access (no hardware emulation) with the guest kernel (and virtualization
platform hypervisor) being bypassed.
• Multiple containers/virtual machines can coexist on the same host and all are
isolated from each other.
10.3 Onload and Linux KVM
OpenOnload 201502 includes support to accelerate applications running within
Linux VMs on a KVM host. This feature is supported on Solarflare SFN7000 series
adapters where each physical interface on the adapter can be exposed to the host
as up to 16 PCIe physical functions (PF) and up to 240 virtual functions (VF). The
adapter also supports up to 2048 MSI-X interrupts.
This support requires a VF (or PF) to be exposed directly into the Linux VM. KVM
calls this network configuration "Network hostdev". Onload provides user-level
access to the adapter via the VF in exactly the same way as is achieved on a
non-virtualized Linux install. Firmware on the Solarflare SFN7000 series adapter
configures layer 2 switching capability that supports the transport of network
packets between PCI physical functions and virtual functions. This feature supports
thetransportofnetworktrafficbetweenOnloadapplicationsrunningindifferent
virtualmachines.Thisallowstraffictobereplicatedacrossmultiplefunctionsand
traffictransmittedfromoneVMcanbereceivedonanotherVM.
Figure14belowillustratesOnloaddeployedintotheLinuxKVMNetworkHostdev
architecturewhichexposesVirtualFunctions(VF)directlytotheVMguest.This
configurationallowstheOnloaddatapathtofullybypassthehostoperatingsystem
andprovidesmaximumaccelerationfornetworktraffic.
Figure14:OnloadandNetworkHostdevConfiguration
To deploy Onload in a Linux KVM:
• As detailed in the Solarflare Server Adapter User Guide (SF-103837-CD) chapter
  7 SR-IOV:
  - Install the Solarflare NET driver version 4.4.1.1017 (or later)
  - Ensure the adapter is using firmware version 4.4.2.1011 (or later)
  - Run sfboot to select the full-feature firmware variant, set the switch mode
    and identify the required number of VFs:
    # sfboot firmware-variant=full-feature switch-mode=sriov vf-count=4
  - Reboot the server, so the Linux KVM host can enumerate the VFs
• Follow the instructions in the Solarflare Server Adapter User Guide (SF-103837-CD)
  section KVM Libvirt network hostdev - Configuration to:
  - Create a VM
  - Configure the VFs
  - Unbind VFs from the host
  - Pass VFs to the VM
  Example virsh command line and XML file configuration instructions are
  provided.
• Install Onload in the VM as in a non-virtualized host - see OpenOnload -
  Installation on page 21.
• Set the sfc driver module option num_vis to create the number of virtual
  interfaces. A VI is needed for each Onload stack created on a VF. Driver module
  options should be set in a user-created file (e.g. sfc.conf) in the
  /etc/modprobe.d directory.
  options sfc num_vis=<NUM>
NOTE: When using Onload with multiple virtual functions (VF) it is necessary to
set the Onload module option oof_all_ports_required to zero. See Module
Options on page 143 for details.
The Solarflare Server Adapter User Guide is available from
https://support.solarflare.com/.
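The two module options described above could be kept together in one modprobe.d file. The following is a minimal sketch only, not a Solarflare-supplied tool: the helper name write_onload_vm_options is hypothetical, the file name sfc.conf follows the example above, and it is an assumption that oof_all_ports_required is set on a kernel module named onload (check Module Options on page 143).

```shell
# Hypothetical helper: write the module options discussed above.
# $1: number of virtual interfaces (num_vis)
# $2: target file (defaults to /etc/modprobe.d/sfc.conf)
write_onload_vm_options() {
    num_vis="$1"
    conf="${2:-/etc/modprobe.d/sfc.conf}"
    {
        printf 'options sfc num_vis=%s\n' "$num_vis"
        # Assumption: the Onload kernel module is named "onload".
        printf 'options onload oof_all_ports_required=0\n'
    } > "$conf"
}
```

For example, write_onload_vm_options 8 would request eight virtual interfaces. Module options are read when a driver is loaded, so the drivers need to be reloaded before the new values apply.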
10.4 Onload and NIC Partitioning
Each physical interface on the Solarflare SFN7000 series adapter can be exposed to
the host as multiple PCIe physical functions (PF). Up to 16 PFs, each having a unique
MAC address, are supported per adapter. To Onload, each PF represents a virtual
adapter.
Figure 15: Onload and NIC Partitioning
On the adapter each PF is backed by a virtual adapter and virtual port. These
components are created by the Solarflare NET driver when it finds a partitioned
adapter. The PFs can be configured to transparently place traffic on separate VLANs
(so each partition is on a separate broadcast domain).
To configure Onload to use the partitioned NIC:
• Ensure the adapter is using firmware version 4.4.2.1011 (minimum)
• Use sfboot to select the full-feature firmware variant
• Use sfboot to partition the NIC into multiple PFs
• Reboot the host so that the firmware can partition the NIC into multiple PFs
• To identify which physical port a network interface is using:
  # cat /sys/class/net/eth<N>/device/physical_port
For complete details of configuring NIC Partitioning refer to the Solarflare Server
Adapter User Guide (SF-103837-CD) chapter 7 SR-IOV, available from
https://support.solarflare.com/.
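On a partitioned adapter it can be convenient to list the physical port behind every interface at once rather than one interface at a time. The sketch below loops over the physical_port attribute shown above; the function name list_physical_ports and its optional directory argument are illustrative assumptions, and interfaces that do not expose the attribute are skipped.

```shell
# List each network interface together with its physical port number.
# $1: sysfs network class directory (defaults to /sys/class/net)
list_physical_ports() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        port_file="$dev/device/physical_port"
        # Skip interfaces that do not expose a physical_port attribute.
        [ -r "$port_file" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$port_file")"
    done
}
```

Calling list_physical_ports with no argument reads the live system. PFs that report the same port number share one physical port.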
10.5 Onload in a Docker Container
Figure 16 illustrates the Onload deployment in a Docker container environment.
Only the user-level components are created in the container. Onload in the
container uses the Onload drivers installed on the host for network I/O. Network
interfaces configured on the host are also visible and usable directly from the
container.
Figure 16: Onload in a Docker Container
In keeping with the containerization theory, it is envisaged that only a single Onload
instance will be running in each container; however, there are no restrictions
preventing multiple instances running in the same container.
10.6 Pre-Installation
This install procedure makes the following assumptions. Ensure these components
are created/installed before continuing:
• Docker is installed on the host server.
• Onload-201502 (or later version) is installed on the host. An identical
  version will be installed in the container.
NOTE: Onload does not currently support Linux namespaces. Support for Linux
Network namespaces may be added in a future release.
10.7 Installation
1 The docker run command will create a container named onload. The container
  is created from the centos:latest base image and a bash shell terminal will be
  started.
  [root@host]# docker run --net=host --device=/dev/onload \
      --device=/dev/onload_epoll --name=onload -it \
      -v /src/openonload-201502.tgz:/tmp/openonload-201502.tgz \
      centos:latest /bin/bash
  The -v option in the example above mounts the openonload-201502.tgz file from
  the /src directory on the host at /tmp in the container root file system. All
  subsequent commands are run inside the container unless host is specified.
2 Install required OS tools/packages in the container.
  # yum install perl autoconf automake libtool tar gcc make net-tools ethtool
  Different docker base images may require additional OS packages installed.
3 Unpack the tarball to build the openonload-<version> subdirectory in /tmp.
  # /usr/bin/tar -zxvf /tmp/openonload-201502.tgz -C /tmp
  Note: it is not possible to use tools/utilities (such as tar) from the host file
  system on files in the container file system.
4 Change directory to the openonload-<version>/scripts directory.
  # cd /tmp/openonload-201502/scripts
5 Build and install the Onload user-level components in the container:
  # ./onload_build --user
  If the build process identifies any missing dependencies, return to step 2 to
  install missing components.
  # ./onload_install --userfiles --nobuild
  The following warning may appear at the end of the install process, but it is not
  necessary to reload the drivers:
  onload_install: To load the newly installed drivers run: onload_tool reload
6 Check the Onload installation.
  # onload
  OpenOnload 201502
  Copyright 2006-2012 Solarflare Communications, 2002-2005 Level 5
  Networks
  Built: Feb 5 2015 12:41:04 (release)
  Kernel module: 201502
  usage:
    onload [options] <command> <command-args>
  options:
    --profile=<profile>  -- comma sep list of config profile(s)
    --force-profiles     -- profile settings override environment
    --no-app-handler     -- do not use app-specific settings
    --app=<app-name>     -- identify application to run under onload
    --version            -- print version information
    -v                   -- verbose
    -h --help            -- this help message
7 On the host, check that the container has been created and is running:
  # docker ps -a
  CONTAINER ID   IMAGE           COMMAND       CREATED          STATUS          PORTS   NAMES
  e2a12a635359   centos:latest   "/bin/bash"   15 seconds ago   Up 14 seconds           onload
8 Configure network interfaces.
  Configure network adapter interfaces in the host. Interfaces will also be visible
  and usable from the container:
  # ifconfig -a
9 Onload is now installed and ready to use in the container.
10.8 Create Onload Docker Image
To create a new docker image that includes the Onload installation prior to
migration, run the following commands on the host.
1 Identify the container (note CONTAINER ID or NAME)
  # docker ps -a
  CONTAINER ID   IMAGE           COMMAND       CREATED        STATUS   PORTS   NAMES
  35bfeceb7022   centos:latest   "/bin/bash"   24 hours ago   Exited           onload
2 Create new image (this example uses the NAME value)
  # docker commit -m "installed onload 201502" onload onload:v1
  89e95645d5ff1fa02880dee44b433ab577f5a2715daf944fd0b393620d8253f1
3 List images
  # docker images
  REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
  onload       v1       89e95645d5ff   28 seconds ago   486 MB
  centos       latest   dade6cb4530a   3 days ago       224 MB
10.9 Migration
The docker save command can be used to archive a docker image which includes
the Onload installation. This image can then be migrated to other servers having the
following configuration:
• Docker is installed and docker service is running
• Host operating system RHEL7
• The Onload version running on the host must be the same as the migrated
  image Onload version
• The target server does not need to have the same Solarflare adapter types
  installed.
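The version requirement above can be checked before migrating an image. The sketch below compares the release string reported on the host with the one reported inside the container. The helper names onload_release and same_onload_release are hypothetical, and it is an assumption that the first line of the onload --version banner has the form "OpenOnload <release>" as in the banner shown in the installation steps.

```shell
# Extract the release string from an Onload banner supplied on stdin.
# Assumes a line of the form "OpenOnload <release>".
onload_release() {
    awk '$1 == "OpenOnload" { print $2; exit }'
}

# Succeed only when both banners ($1 and $2) report the same release.
same_onload_release() {
    host_rel=$(printf '%s\n' "$1" | onload_release)
    cont_rel=$(printf '%s\n' "$2" | onload_release)
    [ -n "$host_rel" ] && [ "$host_rel" = "$cont_rel" ]
}
```

A possible use on the target server, assuming a running container named onload: same_onload_release "$(onload --version)" "$(docker exec onload onload --version)" || echo "Onload versions differ".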
1 Create a tar file of the container image:
  # docker save -o <dirpath to store image>/<name of image>.tar \
        <current name of image>
  Example (store image tar file in host /tmp directory):
  # docker save -o /tmp/dkonload201502.tar onload
2 The image tar file can then be copied to the target server where it can be
  loaded with the docker load command:
  # docker load -i /<path to transferred file>/dkonload201502.tar
  # docker images
  REPOSITORY   TAG   IMAGE ID       CREATED             VIRTUAL SIZE
  onload       v1    303ec2d3e2b5   About an hour ago   486 MB
3 Create/run a container from the transferred image.
  # docker run --net=host --device=/dev/onload --device=/dev/onload_epoll \
        --name=onload -it onload:v1 /bin/bash
When the container has been created, Onload will be running within it.
Onload Docker Images
Onload images are not currently available from the default docker registry hub.
Images may be made available if there is sufficient customer interest and
requirement for this feature.
10.10 Copying Files Between Host and Container
The following example demonstrates how to copy files from the host to a container.
All commands are run on the host.
1 Get the container Short Name (output truncated):
  [root@hostname]# docker ps -a
  CONTAINER ID
  bd1ea8d5526c
2 Discover the container Long Name:
  [root@hostname]# docker inspect -f '{{.Id}}' bd1ea8d5526c
  bd1ea8d5526c55df4740de9ba5afe14ed28ac3d127901ccb1653e187962c5156
  The container long name can also be discovered using the container name in
  place of the container identifier.
3 Copy a file to root file system (/tmp) on the container:
  [root@hostname]# cp myfile.txt /var/lib/docker/devicemapper/mnt/bd1ea8d5526c55df4740de9ba5afe14ed28ac3d127901ccb1653e187962c5156/rootfs/tmp/myfile.txt
11 Limitations
Users are advised to read the latest release_notes distributed with the Onload
release for a comprehensive list of Known Issues.
11.1 Introduction
This chapter outlines configurations that Onload does not accelerate and ways in
which Onload may change behavior of the system and applications. It is a key goal
of Onload to be fully compatible with the behavior of the regular kernel stack, but
there are some cases where behavior deviates.
11.2 Changes to Behavior
Multithreaded Applications Termination
As Onload handles networking in the context of the calling application's thread, it is
recommended that applications ensure all threads exit cleanly when the process
terminates. In particular, the exit() function causes all threads to exit immediately,
even those in critical sections. This can cause threads currently within the Onload
stack holding the per-stack lock to terminate without releasing this shared lock. This
is particularly important for shared stacks, where a process sharing the stack could
‘hang’ when Onload locks are not released.
An unclean exit can prevent the Onload kernel components from cleanly closing the
application's TCP connections. A message similar to the following will be observed:
[onload] Stack [0] released with lock stuck
and any pending TCP connections will be reset. To prevent this, applications should
always ensure that all threads exit cleanly.
Thread Cancellation
Unexpected behavior can result when an accelerated application uses a
pthread_cancel function. There is increased risk from multithreaded applications or
a PTHREAD_CANCEL_ASYNCHRONOUS thread calling a non-async-safe function.
Onload users are strongly advised that applications should not use pthread_cancel
functions.
Packet Capture
Packets delivered to an application via the accelerated path are not visible to the OS
kernel. As a result, diagnostic tools such as tcpdump and wireshark do not capture
accelerated packets. The Solarflare supplied onload_tcpdump does support capture
of UDP and TCP packets from Onload stacks. Refer to onload_tcpdump on page 246
for details.
Firewalls
Packets delivered to an application via the accelerated path are not visible to the OS
kernel. As a result, these packets are not visible to the kernel firewall (iptables) and
therefore firewall rules will not be applied to accelerated traffic. The
onload_iptables feature can be used to enforce Linux iptables rules as hardware
filters on the Solarflare adapter. Refer to onload_iptables on page 251.
NOTE: Hardware filtering on the network adapter will ensure that accelerated
applications receive traffic only on ports to which they are bound.
System Tools
With the exception of ‘listening’ sockets, TCP sockets accelerated by Onload are not
visible to the netstat tool. UDP sockets are visible to netstat.
Accelerated sockets appear in the /proc directory as symbolic links to /dev/onload.
Tools that rely on /proc will probably not identify the associated file
descriptors as being sockets. Refer to Onload and File Descriptors, Stacks and
Sockets on page 52 for more details.
Accelerated sockets can be inspected in detail with the Onload onload_stackdump
tool, which exposes considerably more information than the regular system tools.
For details of onload_stackdump refer to onload_stackdump on page 219.
Signals
If an application receives a SIGSTOP signal, it is possible for the processing of
network events to be stalled in an Onload stack used by the application. This
happens if the application is holding a lock inside the stack when the application is
stopped. If the application remains stopped for a long time, this may cause TCP
connections to time out.
A signal which terminates an application can prevent threads from exiting cleanly.
Refer to Multithreaded Applications Termination on page 117 for more information.
Undefined content may result when a signal handler uses the third argument
(ucontext) and the signal is postponed by Onload. To avoid this, use the Onload
module option safe_signals_and_exit=0 or use EF_SIGNALS_NOPOSTPONE to
prevent specific signals being postponed by Onload.
Onload and IP_MULTICAST_TTL
Onload will act in accordance with RFC 791 when it comes to the IP_MULTICAST_TTL
setting. Using Onload, if IP_MULTICAST_TTL=0, packets will never be transmitted on
the wire.
This differs from the Linux kernel, where the following behavior has been observed:
Kernel - IP_MULTICAST_TTL 0 - if there is a local listener, packets will not be
transmitted on the wire.
Kernel - IP_MULTICAST_TTL 0 - if there is NO local listener, packets will always be
transmitted on the wire.
Source/Policy Based Routing and Routing Metrics
Onload does not currently support source based or policy based routing. Whereas
the Linux kernel will select a route based on routing metrics, Onload will select any
of the valid routes to a destination that are available.
11.3 Limits to Acceleration
IP Fragmentation
Fragmented IP traffic is not accelerated by Onload on the receive side, and is instead
received transparently via the kernel stack. IP fragmentation is rarely seen with TCP,
because the TCP/IP stacks segment messages into MTU-sized IP datagrams. With
UDP, datagrams are fragmented by IP if they are too large for the configured MTU.
Refer to Fragmented UDP on page 89 for a description of Onload behavior.
Broadcast Traffic
Broadcast sends and receives function as normal but will not be accelerated.
Multicast traffic can be accelerated.
IPv6 Traffic
IPv6 traffic functions as normal but will not be accelerated.
Raw Sockets
Raw Socket sends and receives function as normal but will not be accelerated.
Socketpair and UNIX Domain Sockets
Onload will intercept, but does not accelerate, the socketpair() system call.
Sockets created with socketpair() will be handled by the kernel. Onload also does
not accelerate UNIX domain sockets.
Statically Linked Applications
Onload will not accelerate statically linked applications. This is due to the method by
which Onload intercepts libc function calls (using LD_PRELOAD).
Local Port Address
Onload is limited to OOF_LOCAL_ADDR_MAX number of local interface addresses. A
local address can identify a physical port or a VLAN, and multiple addresses can be
assigned to a single interface, where each address contributes to the maximum
value. Users can allocate additional local interface addresses by increasing the
compile time constant OOF_LOCAL_ADDR_MAX in the /src/lib/efthrm/oof_impl.h
file and rebuilding Onload. In onload-201205 OOF_LOCAL_ADDR_MAX
was replaced by the onload module option max_layer2_interfaces.
Bonding, Link aggregation
• Onload will only accelerate traffic over 802.3ad and active-backup bonds.
• Onload will not accelerate traffic if a bond contains any slave interfaces that are
  not Solarflare network devices. Adding a non-Solarflare network device to a
  bond that is currently accelerated by Onload may lead to unexpected results
  such as connections being reset.
• Acceleration of bonded interfaces in Onload requires a kernel configured with
  sysfs support and a bonding module version of 3.0.0 or later.
In cases where Onload will not accelerate the traffic it will continue to work via the
OS network stack.
For more information and details of configuration options refer to the Solarflare
Server Adapter User Guide section ‘Setting Up Teams’.
VLANs
• Onload will only accelerate traffic over VLANs where the master device is either
  a Solarflare network device, or a bonded interface that is accelerated, i.e.
  if the VLAN's master is accelerated, then so is the VLAN interface itself.
• Nested VLAN tags are not accelerated, but will function as normal.
• The ifconfig command will return inconsistent statistics on VLAN interfaces (not
  master interface).
• A Solarflare VLAN tagged interface that is subsequently placed in a bond will
  not be accelerated.
• Hardware filters installed by Onload on the Solarflare adapter will only consider
  the IP address and port, but not the VLAN identifier. Therefore if the same IP
  address:port combination exists on different VLAN interfaces, only the first
  interface to install the filter will receive the traffic.
In cases where Onload will not accelerate the traffic it will continue to work via the
OS network stack.
For more information and details of configuration options refer to the Solarflare
Server Adapter User Guide section ‘Setting Up VLANs’.
TCP RTO During Overload Conditions
Under very high load conditions an increased frequency of TCP retransmission
timeouts (RTOs) might be observed. This has the potential to occur when a thread
servicing the stack is descheduled by the CPU whilst still holding the stack lock, thus
preventing another thread from accessing/polling the stack. A stack not being
serviced means that ACKs are not received in a timely manner for packets sent and
results in RTOs for the unacknowledged packets.
Enabling the per-stack environment variable EF_INT_DRIVEN can reduce the
likelihood of this behavior by ensuring the stack is serviced promptly.
TCP with Jumbo Frames
When using jumbo frames with TCP, Onload will limit the MSS to 2048 bytes to
ensure that segments do not exceed the size of internal packet buffers.
This should present no problems unless the remote end of a connection is unable to
negotiate this lower MSS value.
Transmission Path - Packet Loss
Occasionally Onload needs to send a packet, which would normally be accelerated,
via the kernel. This occurs when there is no destination address entry in the ARP
table or to prevent an ARP table entry from becoming stale.
By default, the Linux sysctl, unres_qlen, will enqueue 3 packets per unresolved
address when waiting for an ARP reply, and on a server subject to a very high UDP
or TCP traffic load this can result in packet loss on the transmit path and packets
being discarded.
The unres_qlen value can be identified using the following command:
sysctl -a | grep unres_qlen
net.ipv4.neigh.eth2.unres_qlen = 3
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.default.unres_qlen = 3
Changes to the queue lengths can be made permanent in the /etc/sysctl.conf
file. Solarflare recommends setting the unres_qlen value to at least 50.
If packet discards are suspected, this extremely rare condition can be indicated by
the cp_defer counter produced by the onload_stackdump lots command on UDP
sockets or from the unresolved_discards counter in the Linux
/proc/net/stat/arp_cache file.
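The check against the recommended minimum can be automated by filtering the sysctl output shown above. This is a sketch under stated assumptions: the function name low_unres_qlen is hypothetical, the input is expected in the "name = value" form printed by sysctl -a, and the default threshold of 50 is the value recommended above.

```shell
# Read "sysctl -a" style lines on stdin and print every unres_qlen entry
# whose value is below the minimum given in $1 (default 50).
low_unres_qlen() {
    min="${1:-50}"
    awk -F'=' -v min="$min" '
        # Match unres_qlen only, not related keys such as unres_qlen_bytes.
        /\.unres_qlen[ \t]*=/ {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
            if (val + 0 < min + 0) print key, val
        }'
}
```

Typical use would be sysctl -a 2>/dev/null | low_unres_qlen. Each reported entry can then be raised for the current session with, for example, sysctl -w net.ipv4.neigh.default.unres_qlen=50.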
Application Clustering
• Onload matches the Linux kernel implementation such that clustering is not
  supported for multicast traffic, where setting SO_REUSEPORT has the
  same effect as SO_REUSEADDR.
• Calling connect() on a TCP socket which was previously subject to a bind()
  call is not currently supported. This will be supported in a future release.
• An application cluster will not persist over adapter/server/driver reset. Before
  restarting the server or resetting the adapter the Onload applications should be
  terminated. This limitation will be removed in a future release.
• The environment variable EF_CLUSTER_RESTART determines the behavior of
  the cluster when the application process is restarted - refer to
  EF_CLUSTER_RESTART in Parameter Reference on page 146.
• If the number of sockets in a cluster is less than EF_CLUSTER_SIZE, a portion of
  the received traffic will be lost.
• There is little benefit when clustering involves a TCP loopback listening socket
  as connections will not be distributed amongst all threads. A non-loopback
  listening socket, which might occasionally get some loopback connections, can
  benefit from Application Clustering.
11.4 epoll - Known Issues
Onload supports different implementations of epoll controlled by the EF_UL_EPOLL
environment variable - see Multiplexed I/O on page 57 for configuration details.
• When using EF_UL_EPOLL=1 or 3, it has been identified that the behavior of
  epoll_wait() differs from the kernel when the EPOLLONESHOT event is
  requested, resulting in two ‘wakeups’ being observed, one from the kernel and
  one from Onload. This behavior is apparent on SOCK_DGRAM and SOCK_STREAM
  sockets for all combinations of EPOLLONESHOT, EPOLLIN and EPOLLOUT events.
  This applies for TCP listening sockets and UDP sockets, but not for TCP
  connected sockets.
• EF_EPOLL_CTL_FAST is enabled by default and this modifies the semantics of
  epoll. In particular, it buffers up calls to epoll_ctl() and only applies them
  when epoll_wait() is called. This can break applications that do
  epoll_wait() in one thread and epoll_ctl() in another thread. The issue
  only affects EF_UL_EPOLL=2 and the solution is to set EF_EPOLL_CTL_FAST=0
  if this is a problem. The described condition does not occur if EF_UL_EPOLL=1
  or EF_UL_EPOLL=3.
• When EF_EPOLL_CTL_FAST is enabled and an application is testing the
  readiness of an epoll file descriptor without actually calling epoll_wait(), for
  example by doing epoll within epoll or epoll within select(), if one thread is
  calling select() or epoll_wait() and another thread is doing epoll_ctl(),
  then EF_EPOLL_CTL_FAST should be disabled. This applies when using
  EF_UL_EPOLL 1, 2 or 3.
  If the application is monitoring the state of the epoll file descriptor indirectly,
  e.g. by monitoring the epoll fd with poll, then EF_EPOLL_CTL_FAST can cause
  issues and should be set to zero.
• A socket should be removed from an epoll set only when all references to the
  socket are closed.
  With EF_UL_EPOLL=1 (default) or EF_UL_EPOLL=3, a socket is removed from
  the epoll set if the file descriptor is closed, even if other references to the
  socket exist. This can cause problems if file descriptors are duplicated using
  dup(). For example:
  s = socket();
  s2 = dup(s);
  epoll_ctl(epoll_fd, EPOLL_CTL_ADD, s, ...);
  close(s); /* socket referenced by s is removed from epoll set when using onload */
  The workaround is to set EF_UL_EPOLL=2.
• When Onload is unable to accelerate a connected socket, e.g. because no route
  to the destination exists which uses a Solarflare interface, the socket will be
  handed off to the kernel and is removed from the epoll set. Because the socket
  is no longer in the epoll set, attempts to modify the socket with epoll_ctl()
  will fail with the ENOENT (descriptor not present) error. The described condition
  does not occur if EF_UL_EPOLL=1 or 3.
• If an epoll file descriptor is passed to the read() or write() functions these
  will return a different error code than that reported by the kernel stack. This
  issue exists for all implementations of epoll.
• When EPOLLET is used and the event is ready, epoll_wait() is triggered by
  ANY event on the socket instead of the requested event. This issue should not
  affect application correctness. The problem exists for both implementations of
  epoll.
• Users should be aware that if a server is overclocked the epoll_wait()
  timeout value will increase as CPU MHz increases, resulting in unexpected
  timeout values. This has been observed on Intel based systems and when the
  Onload epoll implementation is EF_UL_EPOLL=1 or 3. Using EF_UL_EPOLL=2
  this behavior is not observed.
• On a spinning thread, if epoll acceleration is disabled by setting
  EF_UL_EPOLL=0, sockets on this thread will be handed off to the kernel, but
  latency will be worse than expected kernel socket latency.
11.5 Configuration Issues
Mixed Adapters Sharing a Broadcast Domain
Onload should not be used when Solarflare and non-Solarflare interfaces in the
same network server are configured in the same broadcast domain (a broadcast
domain can be a local network segment or VLAN), as depicted by the following
diagram.
When an originating server (S1) sends an ARP request to a remote server (S2) having
more than one interface within the same broadcast domain, ARP responses from S2
will be generated from all interfaces and it is non-deterministic which response the
originator uses. When Onload detects this situation, it causes a message
identifying 'duplicate claim of ip address' to appear in the (S1) host syslog
as a warning of potential problems.
Problem 1
Traffic from S1 to S2 may be delivered through either of the interfaces on S2,
irrespective of the IP address used. This means that if one interface is accelerated by
Onload and the other is not, you may or may not get acceleration.
To resolve the situation (for the current session) issue the following command:
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
or to resolve it permanently add the following line to the /etc/sysctl.conf file:
net.ipv4.conf.all.arp_ignore = 1
and run the sysctl command for this to take effect:
sysctl -p
These commands ensure that an interface will only respond to an ARP request when
the IP address matches its own. Refer to the Linux documentation
Linux/Documentation/networking/ip-sysctl.txt for further details.
Problem 2
A more serious problem arises if one interface on S2 carries Onload accelerated TCP
connections and another interface on the same host and same broadcast domain is
non-Solarflare:
A TCP packet received on the non-Solarflare interface can result in accelerated TCP
connections being reset by the kernel stack and therefore appear to the application
as if TCP connections are being dropped/terminated at random.
To prevent this situation the Solarflare and non-Solarflare interfaces should not be
configured in the same broadcast domain. The solution described for Problem 1
above can reduce the frequency of Problem 2, but does not eliminate it.
TCP packets can be directed to the wrong interface because:
• the originator S1 needs to refresh its ARP table for the destination IP address,
  so sends an ARP request and subsequently directs TCP packets to the
  non-Solarflare interface
• a switch within the broadcast domain broadcasts the TCP packets to all
  interfaces.
Virtual Memory on 32-Bit Systems
On 32-bit Linux systems the amount of allocated virtual address space defaults,
typically, to 128MB, which limits the number of Solarflare interfaces that can be
configured. Virtual memory allocation can be identified in the /proc/meminfo file
e.g.
grep Vmalloc /proc/meminfo
VmallocTotal: 122880 kB
VmallocUsed: 76380 kB
VmallocChunk: 15600 kB
The Onload driver will attempt to map all PCI Base Address Registers for each
Solarflare interface into virtual memory, where each interface requires 16MB.
Examination of the kernel logs in /var/log/messages at the point the Onload
driver is loading would reveal a memory allocation failure as in the following
extract:
allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
[sfc efrm] Failed (12) to map bar (16777216 bytes)
[sfc efrm] efrm_nic_add: ERROR: linux_efrm_nic_ctor failed (12)
One solution is to use a 64-bit kernel. Another is to increase the virtual memory
allocation on the 32-bit system by setting the vmalloc size on the ‘kernel line’ in the
/boot/grub/grub.conf file to 256, for example,
kernel /vmlinuz-2.6.18-238.el5 ro root=/dev/sda7 vmalloc=256M
The system must be rebooted for this change to take effect.
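The arithmetic behind the failure above can be checked directly from the Vmalloc fields. The following sketch reads /proc/meminfo text and reports the free vmalloc space and whether the largest contiguous chunk could hold one 16MB (16384 kB) mapping. The function name vmalloc_headroom and the output format are illustrative assumptions, and the result is only a rough guide on kernels that report meaningful VmallocUsed and VmallocChunk values.

```shell
# Read /proc/meminfo text on stdin and summarise vmalloc headroom.
# 16384 kB is the per-interface BAR mapping size quoted above.
vmalloc_headroom() {
    awk '
        $1 == "VmallocTotal:" { total = $2 }
        $1 == "VmallocUsed:"  { used  = $2 }
        $1 == "VmallocChunk:" { chunk = $2 }
        END {
            printf "free_kB=%d largest_chunk_kB=%d fits_16MB=%s\n",
                   total - used, chunk, (chunk >= 16384 ? "yes" : "no")
        }'
}
```

Run as vmalloc_headroom < /proc/meminfo. With the example values above, 122880 - 76380 leaves 46500 kB free, but the largest chunk of 15600 kB is smaller than the 16384 kB needed for one interface, which is consistent with the allocation failure shown.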
Hardware Resources
Onload uses certain physical resources on the network adapter. If these resources
are exhausted, it is not possible to create new Onload stacks and not possible to
accelerate new sockets. These physical resources include:
1 Virtual NICs. Virtual NICs provide the interface by which a user-level application
  sends and receives network traffic. When these are exhausted it is not possible
  to create new Onload stacks, meaning new applications cannot be accelerated.
  However, Solarflare network adapters support large numbers of Virtual NICs,
  and this resource is not typically the first to run out.
2 Filters. Filters are used to demultiplex packets received from the wire to the
  appropriate application. When these are exhausted it is not possible to create
  new accelerated sockets. Solarflare recommends that applications do not
  allocate more than 4096 filters.
3 Buffer table entries. The buffer table provides address protection and
  translation for DMA buffers. When these are exhausted it is not possible to
  create new Onload stacks, and existing stacks are not able to allocate more
  DMA buffers.
When any of these resources are exhausted, normal operation of the system should
continue, but it will not be possible to accelerate new sockets or applications.
Under severe conditions, after resources are exhausted, it may not be possible to
send or receive traffic, resulting in applications getting ‘stuck’. The
onload_stackdump utility should be used to monitor hardware resources.
IGMP Operation and Multicast Process Priority
It is important that processes using UDP multicast do not have a
higher priority than the kernel thread handling the management of multicast group
membership.
Failure to observe this could lead to the following situations:
1 Incorrect kernel IGMP operation.
2 The higher priority user process is able to effectively block the kernel thread
  and prevent it from identifying the multicast group to Onload, which will react
  by dropping packets received for the multicast group.
A combination of indicators may identify this:
• ethtool reports good packets being received while multicast mismatch does not
  increase.
• ifconfig identifies data is being received.
• onload_stackdump will show the rx_discard_mcast_mismatch counter
  increasing.
Lowering the priority of the user process will remedy the situation and allow the
multicast packets through Onload to the user process.
Dynamic Loading
If the onload library libonload is opened with dlopen() and closed with dlclose()
it can leave the application in an unpredictable state. Users are advised to use the
RTLD_NODELETE flag to prevent the library from being unloaded when dlclose() is
called.
Scalable Packet Buffer Mode
Support for SR-IOV is disabled on 32-bit kernels, therefore the following features are
not available on 32-bit kernels:
• Scalable Packet Buffer Mode (EF_PACKET_BUFFER_MODE=1)
• ef_vi with VFs
On some kernel versions, configuring the adapter to have a large number of VFs (via
sfboot) can cause kernel panics. This affects kernel versions in the range 3.0 to 3.3
inclusive and is due to the large netlink messages that include information about
network interfaces.
The problem can be avoided by limiting the total number of physical network
interfaces, including VFs, to a maximum of 30.
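One way to compare a host against the limit of 30 interfaces is to count the device-backed entries under /sys/class/net. This is a rough sketch: the function name count_hw_interfaces is hypothetical, and it assumes that interfaces with a device entry in sysfs correspond to PFs and VFs while purely virtual interfaces (such as lo) have none.

```shell
# Count network interfaces that are backed by a device (PF or VF).
# $1: sysfs network class directory (defaults to /sys/class/net)
count_hw_interfaces() {
    net_dir="${1:-/sys/class/net}"
    count=0
    for dev in "$net_dir"/*; do
        # Purely virtual interfaces have no "device" entry and are ignored.
        if [ -e "$dev/device" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, [ "$(count_hw_interfaces)" -le 30 ] || echo "more than 30 hardware interfaces" flags a configuration that exceeds the guideline on an affected kernel.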
SLES 11 SR-IOV
It has been noted that some SLES 11 kernels (3.1 and earlier) exhibit a bug, typically
seen when loading Onload drivers, when running OpenOnload with SR-IOV and Intel
IOMMUs. This bug has been fixed in more recent kernels, 3.2-stable and 3.6.
Huge Pages with IPC namespace
Huge page support should not be enabled if the application uses IPC namespaces
and the CLONE_NEWIPC flag. Failure to observe this may result in a segfault.
Huge Pages with Shared Stacks
Processes which share an Onload stack should not attempt to use huge pages. Refer
to Stack Sharing on page 62 for limitation details.
Huge Pages - Size
When using huge pages, it is recommended to avoid setting the page size greater
than 2 Mbyte. A failure to observe this could lead to Onload being unable to allocate
further buffer table space for packet buffers.
Huge Pages - AMD IOMMU
Due to the AMD IOMMU not returning aligned PCI addresses, the use of huge pages
on systems with AMD IOMMUs is not supported.
Huge Pages and shmmni
Users should ensure that the number of system-wide shared memory segments
(shmmni) exceeds the number of huge pages required.
• To identify current shmmni setting:
  # cat /proc/sys/kernel/shmmni
• To set (no reboot required - but not permanent):
  # echo 8000 > /proc/sys/kernel/shmmni
• To set (permanent - reboot required):
  # echo "kernel.shmmni=8000" >> /etc/sysctl.conf
For example, if 4000 huge pages are required, increase the current shmmni value by
4000.
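The sizing rule above is simple addition, sketched here so the new value can be computed from the live setting. The function name required_shmmni is hypothetical; it only prints the suggested value and does not change any system setting.

```shell
# Print the shmmni value suggested by the rule above:
# the current setting ($1) plus the number of huge pages required ($2).
required_shmmni() {
    current="$1"
    huge_pages="$2"
    echo $((current + huge_pages))
}
```

For example, required_shmmni "$(cat /proc/sys/kernel/shmmni)" 4000 prints the value to write to /proc/sys/kernel/shmmni or to place in /etc/sysctl.conf.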
Red Hat MRG 2 and SR-IOV
EnterpriseOnload from version 2.1.0.1 includes support for Red Hat MRG 2 update 3
and the 3.6.11-rt kernel. Solarflare does not recommend the use of SR-IOV or the
IOMMU when using Onload on these systems due to a number of known kernel
issues. The following Onload features should not be used on MRG 2u3:
• Scalable packet buffer mode (EF_PACKET_BUFFER_MODE=1)
• ef_vi with VFs
PowerPC Architecture
• 32-bit applications are known not to work correctly with onload-201310. This
  has been corrected in onload-201310-u1.
• SR-IOV is not supported by onload-201310 on PowerPC systems.
  The recommended setting is EF_PACKET_BUFFER_MODE=0 or 2, but not 1 or 3.
• PowerPC architectures do not currently support PIO for reduced latency.
  EF_PIO should be set to zero.
Java 7 Applications - use of vfork()
Onload accelerated Java 7 applications that call vfork() should set the
environment variable EF_VFORK_MODE=2 and thereafter the application should not
create sockets or accelerated pipes in the vfork() child before exec.
12 Change History
This chapter provides a brief history of changes, additions and removals to Onload
releases affecting Onload behavior and Onload environment variables.
• Features on page 130
• Environment Variables on page 135
• Module Options on page 143
The OOL column identifies the OpenOnload release supporting the feature. The EOL
column identifies the EnterpriseOnload release supporting the feature (NS = not
supported).
The following table maps major EnterpriseOnload releases to the closest
functionally equivalent OpenOnload release. Users should always also refer to the
Release notes and Change logs to identify feature support in the Enterprise release.
OpenOnload    EnterpriseOnload
201011-u1     1.0
201109-u2     2.0
201310-u2     3.0
201502-u2     4.0
12.1 Features
Feature  OOL  EOL  Description/Notes
4.5.1.1026 net driver  201509  NS  Adapter net driver.
Application Clustering  201405  NS  201509: Remove the same-port, same-address limitation.
CI_CFG_MAX_INTERFACES, CI_CFG_MAX_REGISTER_INTERFACES  ALL  NS  Increase default to 8 (previously 6). This remains a compile-time option.
onload_set_recv_filter()  201509  NS  UDP socket calls are deprecated in 201509.
Teaming driver  201509  NS  Accelerate links aggregated using teamd and the teaming driver.
Transparent Proxy  201509  NS  See Transparent Reverse Proxy Modes on page 84.
Scalable Filters  201509  NS  See Scalable Filters on page 82.
IP_TRANSPARENT  201509  NS  TCP socket option.
4.5.1.1010 net driver  201502u2  4.0  Adapter net driver.
4.4.1.1021 net driver  201502u1  NS  Adapter net driver.
SO_PROTOCOL  201502u2  4.0  Socket option to retrieve a socket protocol as an integer.
4.4.1.1017 net driver  201502  NS  Adapter net driver.
Linux Docker Containers  201502  4.0  See Onload in a Docker Container on page 113.
Onload in KVM  201502  4.0  See Onload and Linux KVM on page 109.
Socket caching  201502  4.0  See Listen/Accept Sockets on page 79.
Remote Monitoring  201502  4.0  See Remote Monitoring on page 236.
Blacklist/Whitelist  201502  4.0  See Whitelist and Blacklist Interfaces on page 51.
TCP delegated send  201502  4.0  See Listen/Accept Sockets on page 79.
Syn Cookies  201502  4.0
Receive queue drop counters  201502  4.0
Ubuntu/Debian supported  201502  4.0  See Hardware and Software Supported Platforms on page 16 for supported versions.
4.1.2.1003 net driver  201405u2, 201405u1  NS  Net driver supporting RHEL 7 and later kernels.
SIOCOUTQ  201405u1  4.0  TCP socket ioctl that returns the amount of data not yet acknowledged.
SIOCOUTQNSD  201405u1  4.0  TCP socket ioctl that returns the amount of data not yet sent.
ef_pd_interface_name()  201405u1  4.0  Identifies the interface used by a protection domain.
ef_vi_prime()  201405u1  4.0  Prime interrupts so that a process can block on a file descriptor (including any virtual interface) until events are ready to be processed.
ef_filter_spec_set_tx_port_sniff()  201405u1  4.0  New filter type to sniff TX traffic.
ONLOAD_SOF_TIMESTAMPING_STREAM  201405  4.0  Onload extension to the standard SO_TIMESTAMPING API to support hardware timestamps on TCP sockets.
onload_move_fd  201405  4.0  Move sockets between stacks.
SolarCapture Pro application clustering  201405  4.0  Onload distribution includes the solarclusterd daemon for the SolarCapture Pro application clustering feature.
4.1.0.6734 net driver  201405  3.0.0.8, 3.0.0.7, 3.0.0.6, 3.0.0.5, 3.0.0.4  Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, including SFN7x42Q.
SO_REUSEPORT  201405  4.0  Allow multiple sockets to bind to the same port. Supports the Application Clustering feature; see Application Clustering on page 63.
HW Multicast Loopback  201405  4.0  Refer to Hardware Multicast Loopback on page 94.
onload_ordered_epoll_wait(), onload_ordered_epoll_event  201405  4.0  Wire order delivery of packets. Refer to Wire Order Delivery on page 61.
TCP SYN cookies  201405  4.0  Force use of TCP SYN cookies to protect against a SYN flood attack.
onload_tool disable_cstates  201405  Removed along with the sfc_tune driver.
sfc_aoe driver  201405  NS  ApplicationOnload™ driver included in the Onload distribution.
4.0.2.6645 net driver  201310u2  3.0  Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, introducing hardware packet timestamps and PTP on 7xxx series adapters. SFN7142Q not supported.
SO_TIMESTAMPING  201310u1  3.0  Socket option to receive hardware timestamps for received packets.
onload_fd_check_feature()  201310u1  3.0  See onload_fd_check_feature on page 191.
4.0.2.6628 net driver  201310u1  NS  Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, introducing hardware packet timestamps and PTP on 7xxx series adapters.
4.0.0.6585 net driver  201310  3.0  Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters and Solarflare PTP and hardware packet timestamps.
Multicast Replication  201310  3.0  See Bonding, Link aggregation and Failover on page 65.
TX PIO  201310  3.0  See Debug and Logging on page 67.
Large Buffer Table Support  201310  3.0  See Large Buffer Table Support on page 97.
Templated Sends  201310  3.0  See Templated Sends on page 108.
ONLOAD_MSG_WARM  201310  3.0  See ONLOAD_MSG_WARM on page 78.
SO_TIMESTAMP, SO_TIMESTAMPNS  201310  3.0  Supported for TCP sockets.
dup3()  201310  3.0  Onload will intercept calls to create a copy of a file descriptor using dup3().
3.3.0.6262 net driver  NS  2.1.0.1  Support Solarflare Enhanced PTP (sfptpd).
IP_ADD_SOURCE_MEMBERSHIP  201210u1  3.0  Join the supplied multicast group on the given interface and accept data from the supplied source address.
IP_DROP_SOURCE_MEMBERSHIP  201210u1  3.0  Drops membership to the given multicast group, interface and source address.
MCAST_JOIN_SOURCE_GROUP  201210u1  3.0  Join a source-specific group.
MCAST_LEAVE_SOURCE_GROUP  201210u1  3.0  Leave a source-specific group.
3.3.0.6246 net driver  201210u1  NS  Support Solarflare Enhanced PTP (sfptpd).
Huge pages support  201210  3.0  Packet buffers use huge pages. Controlled by EF_USE_HUGE_PAGES. Default is 1 (use huge pages if available). See Limitations on page 117.
onload_iptables  201210  3.0  Apply Linux iptables firewall rules or user-defined firewall rules to Solarflare interfaces.
onload_stackdump processes  201210  3.0  Show all accelerated processes by PID.
onload_stackdump affinities  201210  3.0  Show the CPU core an accelerated process is running on.
onload_stackdump env  201210  3.0  Show environment variables - EF_VALIDATE_ENV.
Physical addressing mode  201210  3.0  Allows a process to use physical addresses rather than controlled I/O addresses. Enabled by EF_PACKET_BUFFER_MODE 2 or 3.
UDP sendmmsg()  201210  3.0  Send multiple msgs in a single function call.
I/O Multiplexing  201210  3.0  Support for ppoll(), pselect() and epoll_pwait().
DKMS  201210  NS  OpenOnload available in DKMS RPM binary format.
3.2.1.6222B net driver  201210  NS  OpenOnload only.
3.2.1.6110 net driver  NS  2.1.0.0  EnterpriseOnload only.
3.2.1.6099 net driver  201205u1  NS
Removing zombie stacks  201205u1  2.1.0.0  onload_stackdump -z kill will terminate stacks lingering after exit.
Compatibility  201205u1  2.1.0.0  Compatibility with RHEL 6.3 and Linux 3.4.0.
TCP striping  201205  2.1.0.0  A single TCP connection can use the full bandwidth of both ports on a Solarflare adapter.
TCP loopback acceleration  201205  2.1.0.0  EF_TCP_CLIENT_LOOPBACK & EF_TCP_SERVER_LOOPBACK
TCP delayed acknowledgments  201205  2.1.0.0  EF_DYNAMIC_ACK_THRESH
TCP reset following RTO  201205  2.1.0.0  EF_TCP_RST_DELAYED_CONN
Configure control plane tables  201205  2.1.0.0  max_layer2_interfaces, max_neighs, max_routes
Onload adapter support  201109u2  2.0.0.0  Onload support for SFN5322F & SFN6x22F.
Accelerate pipe2()  201109u2  2.0.0.0  Accelerate pipe2() function call.
SOCK_NONBLOCK, SOCK_CLOEXEC  201109u2  2.0.0.0  TCP socket types.
Extensions API  201109u2  2.0.0.0  Support for onload_thread_set_spin().
3.2 net driver  201109u1  2.0.0.0
Onload_tcpdump  201109  2.0.0.0
Scalable Packet Buffer  201109  2.0.0.0  EF_PACKET_BUFFER_MODE=1
Zero-Copy UDP RX  201109  2.0.0.0
Zero-Copy TCP TX  201109  2.0.0.0
Receive filtering  201109  2.0.0.0
TCP_QUICKACK  201109  2.0.0.0  setsockopt() option.
Benchmark tool sfnettest  201109  2.0.0.0  Support for sfnt-stream.
3.1 net driver  201104
Extensions API  201104  2.0.0.0  Initial publication.
SO_BINDTODEVICE, SO_TIMESTAMP, SO_TIMESTAMPNS  201104  2.0.0.0  setsockopt() and getsockopt() options.
Accelerated pipe()  201104  2.0.0.0  Accelerate pipe() function call.
UDP recvmmsg()  201104  2.0.0.0  Deliver multiple msgs in a single function call.
Benchmark tool sfnettest  201104  2.0.0.0  Supports only sfnt-pingpong.
12.2 Environment Variables
Variable  OOL  EOL  Changed  Notes
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE  201509  NS  Control behaviour of non-blocking UDP send() calls when insufficient buffers can be allocated.
EF_TCP_SYNRECV_MAX  201509  NS  Limit the number of half-open connections that can be created in an Onload stack.
EF_TCP_SOCKBUF_MAX_FRACTION  201509  NS  Control the fraction of total TX buffers allocated to a single socket.
EF_TCP_CONNECT_SPIN  201509  NS  Calls to connect() for TCP sockets will spin until a connection is established, the spin timeout expires or the socket timeout expires. Default = disabled.
EF_SCALABLE_FILTERS_ENABLE  201509  NS  Toggle scalable filters mode for a stack.
EF_SCALABLE_FILTERS_MODE  201509  NS  Stores the scalable filter mode set with EF_SCALABLE_FILTERS. NOT SET DIRECTLY.
EF_SCALABLE_FILTERS  201509  NS  Identify the interface to use and set mode for scalable listening sockets.
EF_RETRANSMIT_THRESHOLD_ORPHAN  201509  NS  Number of retransmit timeouts before a TCP connection is aborted in case of orphaned connection.
EF_MAX_EP_PINNED_PAGES  NS  1.0  201509  Not used in previous release and removed from 201509.
EF_OFE_ENGINE_SIZE  201502  NS  Size (bytes) of the Onload filter engine allocated when a new stack is created.
EF_TCP_SNDBUF_ESTABLISHED_DEFAULT  201502  4.0  Override OS default value for SO_SNDBUF for TCP sockets in the ESTABLISHED state.
EF_TCP_RCVBUF_STRICT  201502  4.0  Prevent TCP small segment attack by limiting the number of packets in a TCP receive queue and reorder buffer.
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT  201502  4.0  Override OS default value for SO_RCVBUF for TCP sockets in the ESTABLISHED state.
EF_SO_BUSY_POLL_SPIN  201502  4.0  Spin only if a spinning socket is present in the poll/select/epoll set.
EF_SELECT_NONBLOCK_FAST_USEC  201502  4.0  Non-accelerated sockets are polled only every N usecs.
EF_SELECT_FAST_USEC  201502  4.0  Accelerated sockets are polled for N usecs before unaccelerated sockets.
EF_PIPE_SIZE  201502  4.0  201509  Default size of a pipe. Default decreased to 229376 from 237568.
EF_SOCKET_CACHE_MAX  201502  4.0  Set the maximum number of TCP sockets to cache per stack.
EF_SOCKET_CACHE_PORTS  201502  4.0  Allow caching of sockets bound to specified ports.
EF_PER_SOCKET_CACHE_MAX  201502  4.0  Limit the size of a socket cache.
EF_COMPOUND_PAGES_MODE  201502  4.0  Control Onload use of compound pages.
EF_UL_EPOLL=3  201502  4.0
EF_ACCEPT_INHERIT_NODELAY  NS  3.0  201502/4.0  Removed (OOL) 201502, (EOL) 4.0.
EF_TCP_SEND_NONBLOCK_NO_PACKETS_MODE  201502  3.0.0.3  Control non-blocking TCP send() call behavior when unable to allocate sufficient packet buffers.
EF_CLUSTER_IGNORE  201405u1  4.0  Ignore attempts to use clusters.
EF_CLUSTER_RESTART  201405  4.0  Determine Onload cluster behavior following restart.
EF_CLUSTER_SIZE  201405  4.0  Size (number of socket members) of application cluster.
EF_CLUSTER_NAME  201405  4.0  Create an application cluster.
EF_UDP_FORCE_REUSEPORT  201405  4.0  Support Application clustering for legacy applications.
EF_TCP_FORCE_REUSEPORT  201405  4.0  Support Application clustering for legacy applications.
EF_MCAST_SEND  201405  4.0  Enable/Disable multicast loopback.
EF_MCAST_RECV_HW_LOOP  201405  4.0  Enable/Disable hardware multicast loopback - receive.
EF_TX_TIMESTAMPING  201405  4.0  Per-stack hardware timestamping control.
EF_TIMESTAMPING_REPORTING  201405  4.0  Control timestamp reporting.
EF_TCP_SYNCOOKIES  201405  4.0  Use TCP syncookies to protect against SYN flood attack.
EF_SYNC_CPLANE_AT_CREATE  201405  3.0  Synchronize control plane when a stack is created.
EF_MULTICAST_LOOP_OFF  3.0  201405  Deprecated in favor of EF_MCAST_SEND.
EF_TX_PUSH_THRESHOLD  201310u1  3.0  Improve EF_TX_PUSH low-latency transmit feature.
EF_RX_TIMESTAMPING  201310u1  3.0  Control of receive packet hardware timestamps.
EF_RETRANSMIT_THRESHOLD_SYNACK  201104  1.0.0.0  201310u1  Default changed from 4 to 5.
EF_PIO  201310  3.0  Enable/disable PIO. Default value 1.
EF_PIO_THRESHOLD  201310  3.0  Identifies the largest packet size that can use PIO. Default value is 1514.
EF_VFORK_MODE  201310  3.0  Dictates how vfork() intercept should work.
EF_FREE_PACKETS_LOW_WATERMARK  201310  3.0  201405u1  Level of free packets to be retained during runtime. Default changed to 0 (interpreted as EF_RXQ_SIZE/2) from 100.
EF_TCP_SNDBUF_MODE  201310  2.0.0.6  201502, 4.0, 201509  Limit TCP packet buffers used on the send queue and retransmit queue. Default changed to 1 from 0 in 201502/4.0. Added mode 2 in 201509.
EF_TXQ_SIZE  3.0  201310  Limited to 2048 for SFN7000 series.
EF_MAX_ENDPOINTS  201104  1.1.0.3  201310, 201509  Default changed to 1024 from 10. Default changes to 8192 from 1024. Min (default) changes to 4 from 0.
EF_SO_TIMESTAMP_RESYNC_TIME  201104  2.1.0.1  201310  Removed from OOL.
EF_SIGNALS_NOPOSTPONE  201210u1  2.1.0.1  Prevent the specified list of signals from being postponed by Onload.
EF_FORCE_TCP_NODELAY  201210  3.0  Force use of TCP_NODELAY.
EF_USE_HUGE_PAGES  201210  3.0  Enables huge pages for packet buffers.
EF_VALIDATE_ENV  201210  3.0  Will warn about obsolete or misspelled options in the environment. Default value 1.
EF_PD_VF  201205u1  2.1.0.0  201210  Allocate VIs within SR-IOV VFs to allocate unlimited memory. Replaced with new options on EF_PACKET_BUFFER_MODE.
EF_PD_PHYS_MODE  201205u1  2.1.0.0  201210  Allows a VI to use physical addressing rather than protected I/O addresses. Replaced with new options on EF_PACKET_BUFFER_MODE.
EF_MAX_PACKETS  20101111  1.0.0.0  201210  Onload will round the specified value up to the nearest multiple of 1024.
EF_EPCACHE_MAX  20101111  1.0.0.0  201210  Removed from OOL.
EF_TCP_MAX_SEQERR_MSGS  NS  201210  Removed.
EF_STACK_LOCK_BUZZ  20101111  1.0.0.0  201210  OOL changed to per-process from per-stack. EOL is per-stack.
EF_RFC_RTO_INITIAL  20101111  1.0.0.0  201210, 2.1.0.0  Change default to 1000 from 3000.
EF_DYNAMIC_ACK_THRESH  201205  2.1.0.0  201210  Default value changed to 16 from 32 in 201210.
EF_TCP_SERVER_LOOPBACK, EF_TCP_CLIENT_LOOPBACK  201205  2.1.0.0  201210  TCP loopback acceleration. Added option 4 for client loopback to cause both ends of a TCP connection to share a newly created stack. Option 4 is supported from EnterpriseOnload v3.0.
EF_TCP_RST_DELAYED  201205  2.1.0.0  Reset TCP connection following RTO expiry.
EF_SA_ONSTACK_INTERCEPT  201205  2.1.0.0  Default value 0.
EF_SHARE_WITH  201109u2  2.0.0.0
EF_EPOLL_CTL_HANDOFF  201109u2  2.0.0.0  Default value 1.
EF_CHECK_STACK_USER  NS  201109u2  Renamed EF_SHARE_WITH.
EF_POLL_USEC  201109u1  1.0.0.0
EF_DEFER_WORK_LIMIT  201109u1  2.0.0.0  Default value 32.
EF_POLL_FAST_LOOPS  20101111  1.0.0.0  201109u1, 2.0.0.0  Renamed EF_POLL_FAST_USEC.
EF_POLL_NONBLOCK_FAST_LOOPS  201104  2.0.0.0  201109u1, 2.0.0.1  Renamed EF_POLL_NONBLOCK_FAST_USEC.
EF_PIPE_RECV_SPIN  201104  2.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_PKT_WAIT_SPIN  20101111  1.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_PIPE_SEND_SPIN  201104  2.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_TCP_ACCEPT_SPIN  20101111  1.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_TCP_RECV_SPIN  20101111  1.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_TCP_SEND_SPIN  20101111  1.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_UDP_RECV_SPIN  20101111  1.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_UDP_SEND_SPIN  20101111  1.0.0.0  201109u1  Becomes per-process, was previously per-stack.
EF_EPOLL_NONBLOCK_FAST_LOOPS  201104u2  2.0.0.0  201109u1  Removed.
EF_POLL_AVOID_INT  20101111  1.0.0.0  201109u1  Removed.
EF_SELECT_AVOID_INT  20101111  1.0.0.0  201109u1  Removed.
EF_SIG_DEFER  20101111  1.0.0.0  201109u1  Removed.
EF_IRQ_CORE  201109  2.0.0.0  201109u2  Non-root user can now set it when using scalable packet buffer mode.
EF_IRQ_CHANNEL  201109  2.0.0.0
EF_IRQ_MODERATION  201109  2.0.0.0  Default value 0.
EF_PACKET_BUFFER_MODE  201109  2.0.0.0  201210  In 201210 options 2 and 3 enable physical addressing mode. EOL only supports option 1. EOL v3.0 supports options 2 and 3. Default - disabled.
EF_SIG_REINIT  201109  NS  201109u1  Default value 0. Removed in 201109u1.
EF_POLL_TCP_LISTEN_UL_ONLY  201104  2.0.0.0  201109  Removed.
EF_POLL_UDP  20101111  1.0.0.0  201109  Removed.
EF_POLL_UDP_TX_FAST  20101111  1.0.0.0  201109  Removed.
EF_POLL_UDP_UL_ONLY  201104  2.0.0.0  201109  Removed.
EF_SELECT_UDP  20101111  1.0.0.0  201109  Removed.
EF_SELECT_UDP_TX_FAST  20101111  1.0.0.0  201109  Removed.
EF_UDP_CHECK_ERRORS  20101111  1.0.0.0  201109  Removed.
EF_UDP_RECV_FAST_LOOPS  20101111  1.0.0.0  201109  Removed.
EF_UDP_RECV_MCAST_UL_ONLY  20101111  1.0.0.0  201109  Removed.
EF_UDP_RECV_UL_ONLY  20101111  1.0.0.0  201109  Removed.
EF_TX_QOS_CLASS  201104u2  2.0.0.0  Default value 0.
EF_TX_MIN_IPG_CNTL  201104u2  2.0.0.0  Default value 0.
EF_TCP_LISTEN_HANDOVER  201104u2  2.0.0.0  Default value 0.
EF_TCP_CONNECT_HANDOVER  201104u2  2.0.0.0  Default value 0.
EF_EPOLL_NONBLOCK_FAST_LOOPS  201104u2  2.0.0.0  201109u1  Default value 32. Removed in 201109u1.
EF_TCP_SNDBUF_MODE  2.0.0.6  Default value 0.
EF_UDP_PORT_HANDOVER2_MAX  201104u1  2.0.0.0  Default value 1.
EF_UDP_PORT_HANDOVER2_MIN  201104u1  2.0.0.0  Default value 2.
EF_UDP_PORT_HANDOVER3_MAX  201104u1  2.0.0.0  Default value 1.
EF_UDP_PORT_HANDOVER3_MIN  201104u1  2.0.0.0  Default value 2.
EF_STACK_PER_THREAD  201104u1  2.0.0.0  Default value 0.
EF_PREFAULT_PACKETS  20101111  1.0.0.0  201104u1  Enabled by default, was previously disabled.
EF_MCAST_RECV  201104u1  2.0.0.0  Default value 1.
EF_MCAST_JOIN_BINDTODEVICE  201104u1  2.0.0.0  Default value 0.
EF_MCAST_JOIN_HANDOVER  201104u1  2.0.0.0  Default value 0.
EF_DONT_ACCELERATE  201104u1  2.0.0.0  Default value 0.
EF_MULTICAST  20101111  1.0.0.0  201104u1  Removed.
EF_TX_PUSH  20101111u1  1.0.0.0  201104, 201109  201104: Enabled by default, was previously disabled. 201109: No longer set by the latency profile script.
12.3 Module Options
To list all onload module options:
# modinfo onload
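Module options are normally applied when the module is loaded. As a hedged sketch, the snippet below builds a standard modprobe "options" line for the onload module and, where modinfo is available, lists only the parameter descriptions. The helper name modprobe_options_line, the file /etc/modprobe.d/onload.conf and the value 2048 are illustrative assumptions; nothing is written to disk.

```shell
#!/bin/sh
# Hypothetical helper: format a modprobe.d "options" line for a module.
# Usage: modprobe_options_line MODULE NAME=VALUE [NAME=VALUE ...]
modprobe_options_line() {
    module=$1
    shift
    echo "options $module $*"
}

# Print a line that could be placed in a file such as
# /etc/modprobe.d/onload.conf (assumed path); this only prints it.
modprobe_options_line onload max_neighs=2048

# Show only the "parm" fields of the module description, if present.
if command -v modinfo >/dev/null 2>&1; then
    modinfo -F parm onload 2>/dev/null || echo "onload module not found"
fi
```

A line added this way takes effect the next time the module is loaded.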
Option  OOL  EOL  Changed  Notes
scalable_filter_gid  201509  NS  Set to a group identifier of users allowed to use the scalable filters feature. A value of -2 means that CAP_NET_RAW is required and checking is enforced. Set to -1 to avoid the capability (CAP_NET_RAW) check.
oof_shared_steal_thresh  See Listen/Accept Sockets on page 79.
oof_shared_keep_thresh  See Listen/Accept Sockets on page 79.
oof_all_ports_required  When set to 1, Onload will return an error if it is unable to install a filter on all required interfaces. Set this to 0 when using multiple PFs or VFs with Onload.
intf_white_list  201502  NS  See Whitelist and Blacklist Interfaces on page 51.
intf_black_list  201502  NS  See Whitelist and Blacklist Interfaces on page 51.
timesync_period  201502  NS  Period in milliseconds between synchronizing the Onload clock with the system clock.
max_packets_per_stack  201210  3.0  Limit the number of packet buffers that each Onload stack can allocate. This module option places an upper limit on the EF_MAX_PACKETS option.
epoll2_max_stacks  201210  3.0  Identifies the maximum number of stacks that an epoll file descriptor can handle when EF_UL_EPOLL=2.
phys_mod_gid  201210  3.0  sfc_char module parameter to restrict which ef_vi users can use physical addressing mode.
phys_mode_gid  201210  3.0  Enable physical addressing mode and restrict which users can use it.
shared_buffer_table  201210  NS  This option should be set to enable ef_vi applications that use the ef_iobufset API. Setting shared_buffer_table=10000 will make 10000 buffer table entries available for use with ef_iobufset.
safe_signals_and_exit  201205  2.1.0.0  When Onload intercepts a termination signal it will attempt a clean exit by releasing resources including stack locks etc. The default is (1) enabled, and it is recommended that this remains enabled unless signal handling problems occur, in which case it can be disabled (0).
max_layer2_interfaces  201205  2.1.0.0  Maximum number of network interfaces (includes physical, VLAN and bonds) supported in the control plane.
max_routes  201205  2.1.0.0  Maximum number of entries in the Onload route table. Default is 256.
max_neighs  201205  2.1.0.0  Maximum number of entries in the Onload ARP/neighbour table. Rounded up to a power of two value. Default is 1024.
unsafe_sriov_without_iommu  201209u2  2.0.0.0  201210  Removed, obsoleted by physical addressing modes and phys_mode_gid. Obsolete in EOL from v3.0.
buffer_table_min, buffer_table_max  2.0.0.0  201210  Obsolete - Removed. Obsolete in EOL from v3.0.
NOTE: The user should always refer to the Onload distribution Release Notes and
Change Log. These are available from http://www.openonload.org/download.html.
A Parameter Reference
A.1 Parameter List
The parameter list details the following:
• The environment variable used to set the parameter.
• Parameter name: the name used by onload_stackdump.
• The default, minimum and maximum values.
• Whether the variable scope applies per-stack or per-process.
• Description.
EF_ACCEPTQ_MIN_BACKLOG
Name: acceptq_min_backlog  default: 1  per-stack
Sets a minimum value to use for the 'backlog' argument to the listen() call. If the
application requests a smaller value, use this value instead.
EF_ACCEPT_INHERIT_NONBLOCK
Name: accept_force_inherit_nonblock  default: 0  min: 0  max: 1  per-process
If set to 1, TCP sockets accepted from a listening socket inherit the O_NONBLOCK flag
from the listening socket.
EF_BINDTODEVICE_HANDOVER
Name: bindtodevice_handover  default: 0  min: 0  max: 1  per-stack
Hand sockets over to the kernel stack that have the SO_BINDTODEVICE socket option
enabled.
EF_BURST_CONTROL_LIMIT
Name: burst_control_limit  default: 0  per-stack
If non-zero, limits how many bytes of data are transmitted in a single burst. This can be
useful to avoid drops on low-end switches which contain limited buffering or limited
internal bandwidth. This is not usually needed for use with most modern, high-
performance switches.
EF_BUZZ_USEC
Name: buzz_usec  default: 0  per-stack
Sets the timeout in microseconds for lock buzzing options. Set to zero to disable lock
buzzing (spinning). Will buzz forever if set to -1. Also set by the EF_POLL_USEC option.
EF_CLUSTER_IGNORE
Name: cluster_ignore  default: 0  min: 0  max: 1  per-stack
When set, this option instructs Onload to ignore attempts to use clusters and effectively
ignore attempts to set SO_REUSEPORT.
EF_CLUSTER_RESTART
Name: cluster_restart_opt  default: 0  min: 0  max: 1  per-process
This option controls the behaviour when recreating a stack (e.g. due to restarting a
process) in an SO_REUSEPORT cluster and it encounters a resource limitation such as an
orphan stack from the previous process: 0 - return an error. 1 - terminate the orphan to
allow the new process to continue.
EF_CLUSTER_SIZE
Name: cluster_size  default: 2  min: 2  per-process
If use of SO_REUSEPORT creates a cluster, this option specifies the size of the cluster to be
created. This option has no impact if use of SO_REUSEPORT joins a cluster that already
exists. Note that if fewer sockets than specified here join the cluster, then some traffic
will be lost. Refer to the SO_REUSEPORT section in the manual for more detail.
EF_COMPOUND_PAGES_MODE
Name: compound_pages  default: 0  min: 0  max: 2  per-stack
Debug option, not suitable for normal use. For packet buffers, allocate system pages in
the following way: 0 - try to use compound pages if possible (default); 1 - do not use
compound pages of high order; 2 - do not use compound pages at all.
EF_CONG_AVOID_SCALE_BACK
Name: cong_avoid_scale_back  default: 0  per-stack
When >0, this option slows down the rate at which the TCP congestion window is
opened. This can help to reduce loss in environments where there is lots of congestion
and loss.
EF_DEFER_WORK_LIMIT
Name: defer_work_limit  default: 32  per-stack
The maximum number of times that work can be deferred to the lock holder before we
force the unlocked thread to block and wait for the lock.
EF_DELACK_THRESH
Name: delack_thresh  default: 1  min: 0  max: 65535  per-stack
This option controls the delayed acknowledgement algorithm. A socket may receive up
to the specified number of TCP segments without generating an ACK. Setting this option
to 0 disables delayed acknowledgements. NB. This option is overridden by
EF_DYNAMIC_ACK_THRESH, so both options need to be set to 0 to disable delayed
acknowledgements.
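As a minimal sketch of the point above, the following shell fragment sets both thresholds to 0 before an application is launched. The check function delayed_acks_disabled is a hypothetical helper added for illustration; its fallback values 1 and 16 are the documented defaults of the two options.

```shell
#!/bin/sh
# Disable delayed acknowledgements: both thresholds must be 0,
# because EF_DYNAMIC_ACK_THRESH overrides EF_DELACK_THRESH.
export EF_DELACK_THRESH=0
export EF_DYNAMIC_ACK_THRESH=0

# Hypothetical check: succeeds only when both options are 0.
# Unset variables fall back to the documented defaults (1 and 16).
delayed_acks_disabled() {
    [ "${EF_DELACK_THRESH:-1}" -eq 0 ] && [ "${EF_DYNAMIC_ACK_THRESH:-16}" -eq 0 ]
}

if delayed_acks_disabled; then
    echo "delayed ACKs disabled for processes started from this shell"
fi
```

An accelerated application started from the same shell inherits both settings.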
EF_DONT_ACCELERATE
Name: dont_accelerate  default: 0  min: 0  max: 1  per-process
Do not accelerate by default. This option is usually used in conjunction with
onload_set_stackname() to allow individual sockets to be accelerated selectively.
EF_DYNAMIC_ACK_THRESH
Name: dynack_thresh  default: 16  min: 0  max: 65535  per-stack
If set to >0 this will turn on dynamic adaptation of the ACK rate to increase efficiency by
avoiding ACKs when they would reduce throughput. The value is used as the threshold
for number of pending ACKs before an ACK is forced. If set to zero then the standard
delayed ack algorithm is used.
EF_EPOLL_CTL_FAST
Name: ul_epoll_ctl_fast  default: 1  min: 0  max: 1  per-process
Avoid system calls in epoll_ctl() when using an accelerated epoll implementation.
System calls are deferred until epoll_wait() blocks, and in some cases removed
completely. This option improves performance for applications that call epoll_ctl()
frequently. CAVEATS:
• This option has no effect when EF_UL_EPOLL=0.
• Do not turn this option on if your application uses dup(), fork() or exec() in
conjunction with epoll file descriptors or with the sockets monitored by epoll.
• If you monitor the epoll fd in another poll, select or epoll set, and the effects of
epoll_ctl() are latency critical, then this option can cause latency spikes or even
deadlock.
• With EF_UL_EPOLL=2, this option is harmful if you are calling epoll_wait() and
epoll_ctl() simultaneously from different threads or processes.
EF_EPOLL_CTL_HANDOFF
Name: ul_epoll_ctl_handoff  default: 1  min: 0  max: 1  per-process
Allow epoll_ctl() calls to be passed from one thread to another in order to avoid lock
contention, in the EF_UL_EPOLL=1 or 3 case. This optimisation is particularly important
when epoll_ctl() calls are made concurrently with epoll_wait() and spinning is
enabled. This option is enabled by default. CAVEAT: This option may cause an error code
returned by epoll_ctl() to be hidden from the application when a call is deferred. In such
cases an error message is emitted to stderr or the system log.
EF_EPOLL_MT_SAFE
Name: ul_epoll_mt_safe  default: 0  min: 0  max: 1  per-process
This option disables concurrency control inside the accelerated epoll implementations,
reducing CPU overhead. It is safe to enable this option if, for each epoll set, all calls on
the epoll set and all calls that may modify a member of the epoll set are concurrency
safe. Calls that may modify a member are bind(), connect(), listen() and close(). This
option improves performance with EF_UL_EPOLL=1 or 3 and also with EF_UL_EPOLL=2
and EF_EPOLL_CTL_FAST=1.
EF_EPOLL_SPIN
Name: ul_epoll_spin  default: 0  min: 0  max: 1  per-process
Spin in epoll_wait() calls until an event is satisfied or the spin timeout expires
(whichever is the sooner). If the spin timeout expires, enter the kernel and block. The
spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_EVS_PER_POLL
Name: evs_per_poll  default: 64  min: 0  max: 0x7fffffff  per-stack
Sets the number of hardware network events to handle before performing other work.
The value chosen represents a trade-off: larger values increase batching (which
typically improves efficiency) but may also increase the working set size (which harms
cache efficiency).
EF_FDS_MT_SAFE
Name: fds_mt_safe  default: 1  min: 0  max: 1  per-process
This option allows less strict concurrency control when accessing the user-level file
descriptor table, resulting in increased performance, particularly for multi-threaded
applications. Single-threaded applications get a small latency benefit, but multi-
threaded applications benefit most due to decreased cache-line bouncing between CPU
cores. This option is unsafe for applications that make changes to file descriptors in one
thread while accessing the same file descriptors in other threads. For example, closing a
file descriptor in one thread while invoking another system call on that file descriptor in
a second thread. Concurrent calls that do not change the object underlying the file
descriptor remain safe. Calls to bind(), connect() and listen() may change the underlying
object. If you call such functions in one thread while accessing the same file descriptor
from another thread, this option is also unsafe. In some special cases, any function may
change the underlying object. Concurrent calls may also happen from signal handlers, so
set this to 0 if your signal handlers call bind(), connect(), listen() or close().
EF_FDTABLE_SIZE
Name: fdtable_size  default: 0  per-process
Limit the number of opened file descriptors by this value. If zero, the initial hard limit of
open files (`ulimit -n -H`) is used. Hard and soft resource limits for opened file
descriptors (help ulimit, man 2 setrlimit) are bound by this value.
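The rule above can be sketched in shell as follows. The helper effective_fdtable_size is hypothetical and only models the description (0 means "use the hard limit on open files"); it does not query Onload itself, and the value 8192 is an arbitrary example.

```shell
#!/bin/sh
# Hypothetical helper: the fd-table limit implied by EF_FDTABLE_SIZE.
# A value of 0 (the default) means "use the hard limit on open files".
effective_fdtable_size() {
    requested=${1:-0}
    if [ "$requested" -eq 0 ]; then
        ulimit -H -n
    else
        echo "$requested"
    fi
}

echo "hard limit on open files: $(ulimit -H -n)"
echo "EF_FDTABLE_SIZE=0    -> $(effective_fdtable_size 0)"
echo "EF_FDTABLE_SIZE=8192 -> $(effective_fdtable_size 8192)"
```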
EF_FDTABLE_STRICT
Name: fdtable_strict  default: 0  min: 0  max: 1  per-process
Enables more strict concurrency control for the user-level file descriptor table. Enabling
this option can reduce performance for applications that create and destroy many
connections per second.
EF_FORCE_SEND_MULTICAST
Name: force_send_multicast  default: 1  min: 0  max: 1  per-stack
This option causes all multicast sends to be accelerated. When disabled, multicast
sends are only accelerated for sockets that have cleared the IP_MULTICAST_LOOP
flag. This option disables loopback of multicast traffic to receivers on the same host,
unless (a) those receivers are sharing an OpenOnload stack with the sender (see
EF_NAME) and EF_MCAST_SEND is set to 1 or 3, or (b) prerequisites to support loopback
to other OpenOnload stacks are met (see EF_MCAST_SEND). See the OpenOnload
manual for further details on multicast operation.
EF_FORCE_TCP_NODELAY
Name: tcp_force_nodelay  default: 0  min: 0  max: 2  per-stack
This option allows the user to override the use of TCP_NODELAY. This may be useful in
cases where 3rd-party software is (not) setting this value and the user would like to
control its behaviour:
0 - do not override
1 - always set TCP_NODELAY
2 - never set TCP_NODELAY
EF_FORK_NETIF
Name: fork_netif  default: 3  min: CI_UNIX_FORK_NETIF_NONE  max:
CI_UNIX_FORK_NETIF_BOTH  per-process
This option controls behaviour after an application calls fork().
0 - Neither fork parent
nor child creates a new OpenOnload stack
1 - Child creates a new stack for new sockets
2 - Parent creates a new stack for new sockets
3 - Parent and child each create a new stack for new sockets
EF_FREE_PACKETS_LOW_WATERMARK
Name: free_packets_low  default: 0  per-stack
Keep the number of free packets at or above this value. EF_MIN_FREE_PACKETS defines
initialisation behaviour; this value applies during normal application runtime. In some
combinations of hardware and software, Onload is not able to allocate packets in every
context, so it makes sense to keep some spare packets. The default value 0 is interpreted as
EF_RXQ_SIZE/2
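The interpretation of the default can be written out as a short calculation. This is an illustrative sketch: effective_low_watermark is a hypothetical helper, and the EF_RXQ_SIZE value 512 is used only as an example input.

```shell
#!/bin/sh
# Hypothetical helper: effective low watermark as described in the text.
# $1 = EF_FREE_PACKETS_LOW_WATERMARK, $2 = EF_RXQ_SIZE
effective_low_watermark() {
    if [ "$1" -eq 0 ]; then
        echo $(( $2 / 2 ))    # 0 is interpreted as EF_RXQ_SIZE/2
    else
        echo "$1"             # any other value is used as given
    fi
}

effective_low_watermark 0 512
effective_low_watermark 100 512
```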
EF_HELPER_PRIME_USEC
Name: timer_prime_usec  default: 250  per-stack
Sets the frequency with which software should reset the countdown timer. Usually set
to a value that is significantly smaller than EF_HELPER_USEC to prevent the countdown
timer from firing unless needed. Defaults to (EF_HELPER_USEC / 2).
EF_HELPER_USEC
Name: timer_usec  default: 500  per-stack
Timeout in microseconds for the countdown interrupt timer. This timer generates an
interrupt if network events are not handled by the application within the given time. It
ensures that network events are handled promptly when the application is not invoking
the network, or is descheduled. Set this to 0 to disable the countdown interrupt timer.
It is disabled by default for stacks that are interrupt driven.
EF_INT_DRIVEN
Name: int_driven  default: 1  min: 0  max: 1  per-stack
Put the stack into an 'interrupt driven' mode of operation. When this option is not
enabled Onload uses heuristics to decide when to enable interrupts, and this can cause
latency jitter in some applications. So enabling this option can help avoid latency
outliers. This option is enabled by default except when spinning is enabled. This option
can be used in conjunction with spinning to prevent outliers caused when the spin
timeout is exceeded and the application blocks, or when the application is descheduled.
In this case we recommend that interrupt moderation be set to a reasonably high value
(e.g. 100us) to prevent too high a rate of interrupts.
EF_INT_REPRIME
Name: int_reprime  default: 0  min: 0  max: 1  per-stack
Enable interrupts more aggressively than the default.
EF_IRQ_CHANNEL
Name: irq_channel  default: 4294967295  min: -1  max: SMAX  per-stack
Set the net driver receive channel that will be used to handle interrupts for this stack.
The core that receives interrupts for this stack will be whichever core is configured to
handle interrupts for the specified net driver receive channel. This option only takes
effect with EF_PACKET_BUFFER_MODE=0 (default) or 2.
EF_IRQ_CORE
Name: irq_core  default: 4294967295  min: -1  max: SMAX  per-stack
Specify which CPU core interrupts for this stack should be handled on. With
EF_PACKET_BUFFER_MODE=1 or 3, Onload creates dedicated interrupts for each stack,
and the interrupt is assigned to the requested core. With EF_PACKET_BUFFER_MODE=0
(default) or 2, Onload interrupts are handled via net driver receive channel interrupts.
The sfc_affinity driver is used to choose which net driver receive channel is used. It is
only possible for interrupts to be handled on the requested core if a net driver interrupt
is assigned to the selected core. Otherwise a nearby core will be selected. Note that if
the IRQ balancer service is enabled it may redirect interrupts to other cores.
EF_IRQ_MODERATION
Name:irq_usec default:0 min:0 max:1000000 perstack
Interruptmoderationinterval,inmicroseconds.Thisoptiononlytakeseffectivewith
EF_PACKET_BUFFER_MODE=1or3.Otherwisetheinterruptmoderationsettingsofthe
kernelnetdrivertakeeffect.
EF_KEEPALIVE_INTVL
Name: keepalive_intvl  default: 75000  per-stack
Default interval between keepalives, in milliseconds.
EF_KEEPALIVE_PROBES
Name: keepalive_probes  default: 9  per-stack
Default number of keepalive probes to try before aborting the connection.
EF_KEEPALIVE_TIME
Name: keepalive_time  default: 7200000  per-stack
Default idle time before keepalive probes are sent, in milliseconds.
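As a rough worked example, the three keepalive options can be combined to estimate how long an idle, unresponsive connection survives. The sketch assumes the conventional TCP keepalive behaviour (the first probe is sent after the idle time, later probes follow at the keepalive interval, and the connection is aborted once the configured number of probes has gone unanswered). The function name is hypothetical and Onload's exact timing may differ.

```python
def keepalive_abort_ms(time_ms: int = 7_200_000,
                       intvl_ms: int = 75_000,
                       probes: int = 9) -> int:
    """Estimate the idle time (ms) after which a dead peer is detected.

    Assumes conventional keepalive timing; defaults are the documented
    EF_KEEPALIVE_TIME, EF_KEEPALIVE_INTVL and EF_KEEPALIVE_PROBES values.
    """
    return time_ms + probes * intvl_ms


total = keepalive_abort_ms()
print(total, "ms, i.e. about", round(total / 60_000, 2), "minutes")
```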
EF_LOAD_ENV
Name: load_env  default: 1  min: 0  max: 1  per-process
OpenOnload will only consult other environment variables if this option is set, i.e.
clearing this option will cause all other EF_ environment variables to be ignored.
EF_LOG
Name: log_category  default: 27  min: 0  per-stack
Designed to control how chatty Onload's informative/warning messages are. Specified
as a comma-separated list of options to enable and disable (with a minus sign). Valid
options are 'banner' (on by default), 'resource_warnings' (on by default),
'config_warnings' (on by default), 'conn_drop' (off by default) and 'usage_warnings' (on
by default). E.g.: To enable conn_drop: EF_LOG=conn_drop. E.g.: To enable conn_drop
and turn off resource warnings: EF_LOG=conn_drop,-resource_warnings
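The EF_LOG syntax described above (option names separated by commas, with a leading minus sign to disable an option) could be assembled programmatically along these lines. The function name ef_log_value is hypothetical and the list of valid names is copied from the description above.

```python
VALID_LOG_OPTIONS = (
    "banner", "resource_warnings", "config_warnings",
    "conn_drop", "usage_warnings",
)


def ef_log_value(enable=(), disable=()) -> str:
    """Build an EF_LOG string: enabled names as-is, disabled names with '-'."""
    for name in (*enable, *disable):
        if name not in VALID_LOG_OPTIONS:
            raise ValueError(f"unknown EF_LOG option: {name}")
    parts = list(enable) + ["-" + name for name in disable]
    return ",".join(parts)


print(ef_log_value(enable=["conn_drop"], disable=["resource_warnings"]))
```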
EF_LOG_FILE
Scope: per-stack
When EF_LOG_VIA_IOCTL is unset, the user can direct Onload debug and output data to
a directory/file instead of stdout and instead of the syslog.
EF_LOG_TIMESTAMPS
Name: EF_LOG_TIMESTAMPS  default: 0  min: 0  max: 1  global
If enabled this will add a timestamp to every Onload output log entry. Timestamps are
originated from the FRC counter.
EF_LOG_VIA_IOCTL
Name: log_via_ioctl  default: 0  min: 0  max: 1  per-process
Causes error and log messages emitted by OpenOnload to be written to the system log
rather than written to standard error. This includes the copyright banner emitted when
an application creates a new OpenOnload stack. By default, OpenOnload logs are written
to the application standard error if and only if it is a TTY. Enable this option when it is
important not to change what the application writes to standard error. Disable it to
guarantee that log goes to standard error even if it is not a TTY.
EF_MAX_ENDPOINTS
Name: max_ep_bufs  default: 8192  min: 4  max: CI_CFG_NETIF_MAX_ENDPOINTS_MAX  per-stack
This option places an upper limit on the number of accelerated endpoints (sockets,
pipes etc.) in an Onload stack. This option should be set to a power of two between 4
and 2^21. When this limit is reached listening sockets are not able to accept new
connections over accelerated interfaces. New sockets and pipes created via socket()
and pipe() etc. are handed over to the kernel stack and so are not accelerated. Note: ~4
syn-receive states consume one endpoint, see also EF_TCP_SYNRECV_MAX.
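The constraint on EF_MAX_ENDPOINTS (a power of two between 4 and 2^21) can be checked before launching an application. A minimal sketch, with a hypothetical function name:

```python
def valid_max_endpoints(n: int) -> bool:
    """True if n is a power of two in the documented range 4 .. 2**21."""
    in_range = 4 <= n <= 2 ** 21
    # A positive integer is a power of two when exactly one bit is set.
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    return in_range and is_power_of_two


for candidate in (8192, 5000, 2 ** 21, 2 ** 22):
    print(candidate, valid_max_endpoints(candidate))
```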
EF_MAX_PACKETS
Name: max_packets  default: 32768  min: 1024  per-stack
Upper limit on number of packet buffers in each OpenOnload stack. Packet buffers
require hardware resources which may become a limiting factor if many stacks are each
using many packet buffers. This option can be used to limit how much hardware
resource and memory a stack uses. This option has an upper limit determined by the
max_packets_per_stack onload module option. Note: When 'scalable packet buffer
mode' is not enabled (see EF_PACKET_BUFFER_MODE) the total number of packet
buffers possible in aggregate is limited by a hardware resource. The SFN5x series
adapters support approximately 120,000 packet buffers.
EF_MAX_RX_PACKETS
Name: max_rx_packets  default: 24576  min: 0  max: 1000000000  per-stack
The maximum number of packet buffers in a stack that can be used by the receive data
path. This should be set to a value smaller than EF_MAX_PACKETS to ensure that some
packet buffers are reserved for the transmit path.
EF_MAX_TX_PACKETS
Name: max_tx_packets  default: 24576  min: 0  max: 1000000000  per-stack
The maximum number of packet buffers in a stack that can be used by the transmit data
path. This should be set to a value smaller than EF_MAX_PACKETS to ensure that some
packet buffers are reserved for the receive path.
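The guidance that the receive and transmit limits should each be smaller than EF_MAX_PACKETS can be expressed as a simple pre-flight check. This is a sketch only; the function name is hypothetical and the defaults are the documented ones.

```python
def check_packet_budget(max_packets: int = 32768,
                        max_rx: int = 24576,
                        max_tx: int = 24576) -> list:
    """Return a list of warnings; an empty list means the guidance is met."""
    problems = []
    if max_rx >= max_packets:
        problems.append("EF_MAX_RX_PACKETS leaves no buffers reserved for transmit")
    if max_tx >= max_packets:
        problems.append("EF_MAX_TX_PACKETS leaves no buffers reserved for receive")
    return problems


print(check_packet_budget())              # documented defaults
print(check_packet_budget(max_rx=40000))  # receive limit too large
```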
EF_MCAST_JOIN_BINDTODEVICE
Name: mcast_join_bindtodevice  default: 0  min: 0  max: 1  per-stack
When a UDP socket joins a multicast group (using IP_ADD_MEMBERSHIP or similar), this
option causes the socket to be bound to the interface that the join was on. The benefit
of this is that it ensures the socket will not accidentally receive packets from other
interfaces that happen to match the same group and port. This can sometimes happen
if another socket joins the same multicast group on a different interface, or if the switch
is not filtering multicast traffic effectively. If the socket joins multicast groups on more
than one interface, then the binding is automatically removed.
EF_MCAST_JOIN_HANDOVER
Name: mcast_join_handover  default: 0  min: 0  max: 2  per-stack
When this option is set to 1, and a UDP socket joins a multicast group on an interface
that is not accelerated, the UDP socket is handed over to the kernel stack. This can be a
good idea because it prevents that socket from consuming Onload resources, and may
also help avoid spinning when it is not wanted. When set to 2, UDP sockets that join
multicast groups are always handed over to the kernel stack.
EF_MCAST_RECV
Name: mcast_recv  default: 1  min: 0  max: 1  per-stack
Controls whether or not to accelerate multicast receives. When set to zero, multicast
receives are not accelerated, but the socket continues to be managed by Onload. See
also EF_MCAST_JOIN_HANDOVER. See the OpenOnload manual for further details on
multicast operation.
EF_MCAST_RECV_HW_LOOP
Name: mcast_recv_hw_loop  default: 1  min: 0  max: 1  per-stack
When enabled allows UDP sockets to receive multicast traffic that originates from other
OpenOnload stacks. See the OpenOnload manual for further details on multicast
operation.
EF_MCAST_SEND
Name: mcast_send  default: 0  min: 0  max: 3  per-stack
Controls loopback of multicast traffic to receivers in the same and other OpenOnload
stacks. When set to 0 (default) disables loopback within the same stack as well as to
other OpenOnload stacks. When set to 1 enables loopback to the same stack. When set to
2 enables loopback to other OpenOnload stacks. When set to 3 enables loopback to the
same as well as other OpenOnload stacks. In respect to loopback to other OpenOnload
stacks the option is just a hint and the feature requires: (a) a 7000 series or newer
device, and (b) selecting a firmware variant with loopback support. See the OpenOnload
manual for further details on multicast operation.
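The four EF_MCAST_SEND values listed above behave like two independent flags: value 1 covers the same stack, value 2 covers other stacks, and 3 is both. The sketch below decodes a value on that basis. Treating the value as a bit field is an observation drawn from the listed values, not a documented encoding, and the function name is hypothetical.

```python
def mcast_send_loopback(value: int) -> dict:
    """Decode an EF_MCAST_SEND value into its two loopback targets."""
    if value not in (0, 1, 2, 3):
        raise ValueError("EF_MCAST_SEND must be 0, 1, 2 or 3")
    return {
        "same_stack": bool(value & 1),    # set for values 1 and 3
        "other_stacks": bool(value & 2),  # set for values 2 and 3 (a hint only)
    }


for v in range(4):
    print(v, mcast_send_loopback(v))
```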
EF_MIN_FREE_PACKETS
Name: min_free_packets  default: 100  min: 0  max: 1000000000  per-stack
Minimum number of free packets to reserve for each stack at initialisation. If Onload is
not able to allocate sufficient packet buffers to fill the RX rings and fill the free pool with
the given number of buffers, then creation of the stack will fail.
EF_MULTICAST_LOOP_OFF
Name: multicast_loop_off  default: 1  min: 0  max: 1  per-stack
EF_MULTICAST_LOOP_OFF is deprecated in favour of EF_MCAST_SEND. When set,
disables loopback of multicast traffic to receivers in the same OpenOnload stack. This
option only takes effect when EF_MCAST_SEND is not set and is equivalent to
EF_MCAST_SEND=1 or EF_MCAST_SEND=0 for values of 0 and 1 respectively. See the
OpenOnload manual for further details on multicast operation.
EF_NETIF_DTOR
Name: netif_dtor  default: 1  min: 0  max: 2  per-process
This option controls the lifetime of OpenOnload stacks when the last socket in a stack is
closed.
EF_NAME
Default: none  min: 8 chars  per-stack
The environment variable EF_NAME will be honored to control Onload stack sharing.
However, a call to onload_set_stackname overrides this variable, and
EF_DONT_ACCELERATE and EF_STACK_PER_THREAD both take precedence over
EF_NAME.
EF_NONAGLE_INFLIGHT_MAX
Name: nonagle_inflight_max  default: 50  min: 1  per-stack
This option affects the behaviour of TCP sockets with the TCP_NODELAY socket option.
Nagle's algorithm is enabled when the number of packets in flight (sent but not
acknowledged) exceeds the value of this option. This improves efficiency when sending
many small messages, while preserving low latency. Set this option to -1 to ensure that
Nagle's algorithm never delays sending of TCP messages on sockets with TCP_NODELAY
enabled.
EF_NO_FAIL
Name: no_fail  default: 1  min: 0  max: 1  per-process
This option controls whether failure to create an accelerated socket (due to resource
limitations) is hidden by creating a conventional unaccelerated socket. Set this option
to 0 to cause out-of-resources errors to be propagated as errors to the application, or to
1 to have Onload use the kernel stack instead when out of resources. Disabling this
option can be useful to ensure that sockets are being accelerated as expected (i.e. to find
out when they are not).
EF_PACKET_BUFFER_MODE
Name: packet_buffer_mode  default: 0  min: 0  max: 3  per-stack
This option affects how DMA buffers are managed. The default packet buffer mode
uses a limited hardware resource, and so restricts the total amount of memory that can
be used by Onload for DMA. Setting EF_PACKET_BUFFER_MODE!=0 enables 'scalable
packet buffer mode' which removes that limit. See details for each mode below.
1 - SR-IOV with IOMMU. Each stack allocates a separate PCI Virtual Function. IOMMU
guarantees that different stacks do not have any access to each other's data.
2 - Physical address mode. Inherently unsafe; no address space separation between
different stacks or net driver packets.
3 - SR-IOV with physical address mode. Each stack allocates a separate PCI Virtual
Function. IOMMU is not used, so this mode is unsafe in the same way as (2).
To use odd modes (1 and 3) SR-IOV must be enabled in the BIOS, OS kernel and on the
network adapter. In these modes you also get a faster interrupt handler which can
improve latency for some workloads. For mode (1) you also have to enable IOMMU
(also known as VT-d) in BIOS and in your kernel. For unsafe physical address modes (2)
and (3), you should tune the phys_mode_gid module parameter of the onload module.
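The prerequisites for each packet buffer mode can be summarised in a small lookup. The sketch below notes that the odd modes use SR-IOV and that modes 2 and 3 use physical addressing. Reading the mode number as two bits is an observation from the mode list, not a documented encoding, and the function name is hypothetical.

```python
def packet_buffer_mode_info(mode: int) -> dict:
    """Summarise what an EF_PACKET_BUFFER_MODE value implies."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("EF_PACKET_BUFFER_MODE must be 0..3")
    return {
        "scalable": mode != 0,                 # any non-zero mode
        "needs_sriov": bool(mode & 1),         # modes 1 and 3
        "physical_addressing": bool(mode & 2), # modes 2 and 3 (unsafe)
        "needs_iommu": mode == 1,              # only mode 1 relies on the IOMMU
    }


for m in range(4):
    print(m, packet_buffer_mode_info(m))
```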
EF_PER_SOCKET_CACHE_MAX
Name: per_sock_cache_max  default: 0  per-stack
When socket caching is enabled (i.e. when EF_SOCKET_CACHE_MAX > 0), this sets a
further limit on the size of the cache for each socket. If set to zero, no limit is set beyond
the global limit specified by EF_SOCKET_CACHE_MAX.
EF_PIO
Name: pio  default: 1  min: 0  max: 2  per-stack
Control of whether Programmed I/O is used instead of DMA for small packets: 0 - no
(use DMA); 1 - use PIO for small packets if available (default); 2 - use PIO for small
packets and fail if PIO is not available. Mode 1 will fall back to DMA if PIO is not currently
available. Mode 2 will fail to create the stack if the hardware supports PIO but PIO is not
currently available. On hardware that does not support PIO there is no difference
between mode 1 and mode 2. In all cases, PIO will only be used for small packets (see
EF_PIO_THRESHOLD) and if the VI's transmit queue is currently empty. If these
conditions are not met DMA will be used, even in mode 2. Note: PIO is currently only
available on x86_64 systems. Note: Mode 2 will not prevent a stack from operating
without PIO in the event that PIO allocation is originally successful but then fails
after an adapter is rebooted or hot-plugged while that stack exists.
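The rules above reduce to two decisions: whether stack creation succeeds, and whether an individual packet goes out via PIO or DMA. The following is a simplified model of that description, not Onload's implementation. Both function names and all parameters are hypothetical, and the default threshold is the documented EF_PIO_THRESHOLD default.

```python
def stack_creation_ok(ef_pio: int, hw_supports_pio: bool, pio_available: bool) -> bool:
    """Only mode 2 fails, and only if PIO is supported but currently unavailable."""
    return not (ef_pio == 2 and hw_supports_pio and not pio_available)


def uses_pio(ef_pio: int, pio_available: bool, packet_len: int,
             txq_empty: bool, threshold: int = 1514) -> bool:
    """True if a packet would be sent by PIO under the documented conditions."""
    if ef_pio == 0 or not pio_available:
        return False  # mode 0, or fallback to DMA
    # PIO is used only for packets up to the threshold on an empty transmit queue.
    return packet_len <= threshold and txq_empty


print(uses_pio(1, True, 64, True))    # small packet, empty queue
print(uses_pio(2, True, 64, False))   # queue not empty: DMA even in mode 2
```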
EF_PIO_THRESHOLD
Name: pio_thresh  default: 1514  min: 0  per-stack
Sets a threshold for the size of packet that will use PIO, if turned on using EF_PIO.
Packets up to the threshold will use PIO. Larger packets will not.
EF_PIPE
Name: ul_pipe  default: 2  min: CI_UNIX_PIPE_DONT_ACCELERATE  max: CI_UNIX_PIPE_ACCELERATE_IF_NETIF  per-process
0 - disable pipe acceleration, 1 - enable pipe acceleration, 2 - accelerate pipes only if an
Onload stack already exists in the process.
EF_PIPE_RECV_SPIN
Name: pipe_recv_spin  default: 0  min: 0  max: 1  per-process
Spin in pipe receive calls until data arrives or the spin timeout expires (whichever is the
sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set
by EF_SPIN_USEC or EF_POLL_USEC.
EF_PIPE_SEND_SPIN
Name: pipe_send_spin  default: 0  min: 0  max: 1  per-process
Spin in pipe send calls until space becomes available in the socket buffer or the spin
timeout expires (whichever is the sooner). If the spin timeout expires, enter the kernel
and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_PIPE_SIZE
Name: pipe_size  default: 229376  min: OO_PIPE_MIN_SIZE  max: CI_CFG_MAX_PIPE_SIZE  per-process
Default size of the pipe in bytes. Actual pipe size will be rounded up to the size of packet
buffer and subject to modifications by fcntl F_SETPIPE_SZ where supported.
EF_PKT_WAIT_SPIN
Name: pkt_wait_spin  default: 0  min: 0  max: 1  per-process
Spin while waiting for DMA buffers. If the spin timeout expires, enter the kernel and
block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_POLL_FAST
Name: ul_poll_fast  default: 1  min: 0  max: 1  per-process
Allow a poll() call to return without inspecting the state of all polled file descriptors
when at least one event is satisfied. This allows the accelerated poll() call to avoid a
system call when accelerated sockets are 'ready', and can increase performance
substantially. This option changes the semantics of poll(), and as such could cause
applications to misbehave. It effectively gives priority to accelerated sockets over
non-accelerated sockets and other file descriptors. In practice a vast majority of
applications work fine with this option.
EF_POLL_FAST_USEC
Name: ul_poll_fast_usec  default: 32  per-process
When spinning in a poll() call, causes accelerated sockets to be polled for N usecs before
unaccelerated sockets are polled. This reduces latency for accelerated sockets, possibly
at the expense of latency on unaccelerated sockets. Since accelerated sockets are
typically the parts of the application which are most performance-sensitive this is
typically a good tradeoff.
EF_POLL_NONBLOCK_FAST_USEC
Name: ul_poll_nonblock_fast_usec  default: 200  per-process
When invoking poll() with timeout==0 (non-blocking), this option causes
non-accelerated sockets to be polled only every N usecs. This reduces latency for accelerated
sockets, possibly at the expense of latency on unaccelerated sockets. Since accelerated
sockets are typically the parts of the application which are most performance-sensitive
this is often a good tradeoff. Set this option to zero to disable, or to a higher value to
further improve latency for accelerated sockets. This option changes the behaviour of
poll() calls, so could potentially cause an application to misbehave.
EF_POLL_ON_DEMAND
Name: poll_on_demand  default: 1  min: 0  max: 1  per-stack
Poll for network events in the context of the application calls into the network stack.
This option is enabled by default. This option can improve performance in
multi-threaded applications where the Onload stack is interrupt-driven (EF_INT_DRIVEN=1),
because it can reduce lock contention. Setting EF_POLL_ON_DEMAND=0 ensures that
network events are (mostly) processed in response to interrupts.
EF_POLL_SPIN
Name: ul_poll_spin  default: 0  min: 0  max: 1  per-process
Spin in poll() calls until an event is satisfied or the spin timeout expires (whichever is the
sooner). If the spin timeout expires, enter the kernel and block. The spin timeout is set
by EF_SPIN_USEC or EF_POLL_USEC.
EF_POLL_USEC
Name: ef_poll_usec_meta_option  default: 0  per-process
This option enables spinning and sets the spin timeout in microseconds. Setting this
option is equivalent to: setting EF_SPIN_USEC and EF_BUZZ_USEC, enabling spinning for
UDP sends and receives, TCP sends and receives, select, poll and epoll_wait(), and
enabling lock buzzing. Spinning typically reduces latency and jitter substantially, and can
also improve throughput. However, in some applications spinning can harm
performance; particularly applications that have many threads. When spinning is
enabled you should normally dedicate a CPU core to each thread that spins. You can use
the EF_*_SPIN options to selectively enable or disable spinning for each API and
transport. You can also use the onload_thread_set_spin() extension API to control
spinning on a per-thread and per-API basis.
EF_PREFAULT_PACKETS
Name: prefault_packets  default: 1  min: 0  max: 1000000000  per-stack
When set, this option causes the process to 'touch' the specified number of packet
buffers when the Onload stack is created. This causes memory for the packet buffers to
be pre-allocated, and also causes them to be memory-mapped into the process address
space. This can prevent latency jitter caused by allocation and memory-mapping
overheads. The number of packets requested is in addition to the packet buffers that are
allocated to fill the RX rings. There is no guarantee that it will be possible to allocate the
number of packet buffers requested. The default setting causes all packet buffers to be
mapped into the user-level address space, but does not cause any extra buffers to be
reserved. Set to 0 to prevent prefaulting.
EF_PROBE
Name: probe  default: 1  min: 0  max: 1  per-process
When set, file descriptors accessed following exec() will be 'probed' and OpenOnload
sockets will be mapped to user-land so that they can be accelerated. Otherwise
OpenOnload sockets are not accelerated following exec().
EF_RETRANSMIT_THRESHOLD
Name: retransmit_threshold  default: 15  min: 0  max: SMAX  per-stack
Number of retransmit timeouts before a TCP connection is aborted.
EF_RETRANSMIT_THRESHOLD_ORPHAN
Name: retransmit_threshold_orphan  default: 8  min: 0  max: SMAX  per-stack
Number of retransmit timeouts before a TCP connection is aborted in case of orphaned
connection.
EF_RETRANSMIT_THRESHOLD_SYN
Name: retransmit_threshold_syn  default: 4  min: 0  max: SMAX  per-stack
Number of times a SYN will be retransmitted before a connect() attempt will be
aborted.
EF_RETRANSMIT_THRESHOLD_SYNACK
Name: retransmit_threshold_synack  default: 5  min: 0  max: CI_CFG_TCP_SYNACK_RETRANS_MAX  per-stack
Number of times a SYN-ACK will be retransmitted before an embryonic connection will
be aborted.
EF_RFC_RTO_INITIAL
Name: rto_initial  default: 1000  per-stack
Initial retransmit timeout in milliseconds, i.e. the number of milliseconds to wait for an
ACK before retransmitting packets.
EF_RFC_RTO_MAX
Name: rto_max  default: 120000  per-stack
Maximum retransmit timeout in milliseconds.
EF_RFC_RTO_MIN
Name: rto_min  default: 200  per-stack
Minimum retransmit timeout in milliseconds.
EF_RXQ_LIMIT
Name: rxq_limit  default: 65535  min: CI_CFG_RX_DESC_BATCH  max: 65535  per-stack
Maximum fill level for the receive descriptor ring. This has no effect when it has a value
larger than the ring size (EF_RXQ_SIZE).
EF_RXQ_MIN
Name: rxq_min  default: 256  min: 2 * CI_CFG_RX_DESC_BATCH + 1  per-stack
Minimum initial fill level for each RX ring. If Onload is not able to allocate sufficient
packet buffers to fill each RX ring to this level, then creation of the stack will fail.
EF_RXQ_SIZE
Name: rxq_size  default: 512  min: 512  max: 4096  per-stack
Set the size of the receive descriptor ring. Valid values: 512, 1024, 2048 or 4096. A larger
ring size can absorb larger packet bursts without drops, but may reduce efficiency
because the working set size is increased.
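The interaction between EF_RXQ_SIZE and EF_RXQ_LIMIT can be sketched as follows: the ring size must be one of the four valid values, and a fill limit larger than the ring has no effect, so the effective limit is the smaller of the two. The function name is hypothetical; the defaults are the documented ones.

```python
VALID_RXQ_SIZES = (512, 1024, 2048, 4096)


def effective_rxq_fill_limit(rxq_size: int = 512, rxq_limit: int = 65535) -> int:
    """Return the fill level that actually applies to the receive ring."""
    if rxq_size not in VALID_RXQ_SIZES:
        raise ValueError(f"EF_RXQ_SIZE must be one of {VALID_RXQ_SIZES}")
    # A limit above the ring size has no effect, so clamp to the ring size.
    return min(rxq_limit, rxq_size)


print(effective_rxq_fill_limit())            # defaults: limited by the ring size
print(effective_rxq_fill_limit(4096, 1000))  # explicit limit below the ring size
```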
EF_RX_TIMESTAMPING
Name: rx_timestamping  default: 0  min: 0  max: 3  per-stack
Control of hardware timestamping of received packets, possible values:
0 - do not do timestamping (default);
1 - request timestamping but continue if hardware is not capable or it does not succeed;
2 - request timestamping and fail if hardware is capable and it does not succeed;
3 - request timestamping and fail if hardware is not capable or it does not succeed.
EF_SA_ONSTACK_INTERCEPT
Name: sa_onstack_intercept  default: 0  min: 0  max: 1  per-process
Intercept signals when signal handler is installed with SA_ONSTACK flag.
0 - Don't intercept. If you call socket-related functions such as send, or file-related
functions such as close or dup from your signal handler, then your application may
deadlock. (default)
1 - Intercept. There is no guarantee that SA_ONSTACK flag will really work, but
OpenOnload library will do its best.
EF_SCALABLE_FILTERS
Name: scalable_filter_ifindex  default: 0  min: 0  max: SMAX  per-stack
Specifies the interface on which to enable support for scalable filters, and configures the
scalable filter mode(s) to use. Scalable filters allow Onload to use a single hardware
MAC-address filter to avoid hardware limitations and overheads. This removes
restrictions on the number of simultaneous connections and increases performance of
active connect calls, but kernel support on the selected interface is limited to ARP/
DHCP/ICMP protocols and some Onload features that rely on unaccelerated traffic (such
as receiving fragmented UDP datagrams) will not work. Please see the Onload user
guide for full details. Depending on the mode selected this option will enable support
for: - scalable listening sockets; - IP_TRANSPARENT socket option. The interface specified
must be a SFN7000 or later NIC. Format of EF_SCALABLE_FILTERS variable is as follows:
EF_SCALABLE_FILTERS=<interface name>[=mode[:mode]] where mode is one of:
transparent_active, passive, rss. The following modes and their combinations can be
specified: transparent_active, passive, rss:transparent_active,
transparent_active:passive
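A value in the format described above can be composed and checked against the listed mode combinations as in the sketch below. The function name is hypothetical, and the interface name eth2 is only a placeholder for whichever SFN7000 or later interface is in use.

```python
ALLOWED_MODE_SPECS = (
    "transparent_active",
    "passive",
    "rss:transparent_active",
    "transparent_active:passive",
)


def scalable_filters_value(ifname: str, mode_spec: str = "") -> str:
    """Build '<interface name>[=mode[:mode]]' for EF_SCALABLE_FILTERS."""
    if not ifname:
        raise ValueError("an interface name is required")
    if not mode_spec:
        return ifname  # the mode part is optional in the documented format
    if mode_spec not in ALLOWED_MODE_SPECS:
        raise ValueError(f"unsupported mode combination: {mode_spec}")
    return f"{ifname}={mode_spec}"


print("EF_SCALABLE_FILTERS=" + scalable_filters_value("eth2", "rss:transparent_active"))
```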
EF_SCALABLE_FILTERS_ENABLE
Name: scalable_filter_enable  default: 0  min: 0  max: 1  per-stack
Turn the scalable filter feature on or off on a stack. If this is set to 1 then the
configuration selected in EF_SCALABLE_FILTERS will be used. If this is set to 0 then
scalable filters will not be used for this stack. If unset this will default to 1 if
EF_SCALABLE_FILTERS is configured.
EF_SCALABLE_FILTERS_MODE
Name: scalable_filter_mode  default: 4294967295  min: -1  max: 6  per-stack
Stores scalable filter mode set with EF_SCALABLE_FILTERS. To be set indirectly with
EF_SCALABLE_FILTERS variable.
EF_SELECT_FAST
Name: ul_select_fast  default: 1  min: 0  max: 1  per-process
Allow a select() call to return without inspecting the state of all selected file descriptors
when at least one selected event is satisfied. This allows the accelerated select() call to
avoid a system call when accelerated sockets are 'ready', and can increase performance
substantially. This option changes the semantics of select(), and as such could cause
applications to misbehave. It effectively gives priority to accelerated sockets over
non-accelerated sockets and other file descriptors. In practice a vast majority of
applications work fine with this option.
EF_SELECT_FAST_USEC
Name: ul_select_fast_usec  default: 32  per-process
When spinning in a select() call, causes accelerated sockets to be polled for N usecs
before unaccelerated sockets are polled. This reduces latency for accelerated sockets,
possibly at the expense of latency on unaccelerated sockets. Since accelerated sockets
are typically the parts of the application which are most performance-sensitive this is
typically a good tradeoff.
EF_SELECT_NONBLOCK_FAST_USEC
Name: ul_select_nonblock_fast_usec  default: 200  per-process
When invoking select() with timeout==0 (non-blocking), this option causes
non-accelerated sockets to be polled only every N usecs. This reduces latency for accelerated
sockets, possibly at the expense of latency on unaccelerated sockets. Since accelerated
sockets are typically the parts of the application which are most performance-sensitive
this is often a good tradeoff. Set this option to zero to disable, or to a higher value to
further improve latency for accelerated sockets. This option changes the behaviour of
select() calls, so could potentially cause an application to misbehave.
EF_SELECT_SPIN
Name: ul_select_spin  default: 0  min: 0  max: 1  per-process
Spin in blocking select() calls until the select set is satisfied or the spin timeout expires
(whichever is the sooner). If the spin timeout expires, enter the kernel and block. The
spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_SEND_POLL_MAX_EVS
Name: send_poll_max_events  default: 96  min: 1  max: 65535  per-stack
When polling for network events after sending, this places a limit on the number of
events handled.
EF_SEND_POLL_THRESH
Name: send_poll_thresh  default: 64  min: 0  max: 65535  per-stack
Poll for network events after sending this many packets. Setting this to a larger value
may improve transmit throughput for small messages by allowing batching. However,
such batching may cause sends to be delayed leading to increased jitter.
EF_SHARE_WITH
Name: share_with  default: 0  min: -1  max: SMAX  per-stack
Set this option to allow a stack to be accessed by processes owned by another user. Set
it to the UID of a user that should be permitted to share this stack, or set it to -1 to allow
any user to share the stack. By default stacks are not accessible by users other than
root. Processes invoked by root can access any stack. Setuid processes can only access
stacks created by the effective user, not the real user. This restriction can be relaxed by
setting the onload kernel module option allow_insecure_setuid_sharing=1. WARNING: A
user that is permitted to access a stack is able to: snoop on any data transmitted or
received via the stack; inject or modify data transmitted or received via the stack;
damage the stack and any sockets or connections in it; cause misbehaviour and crashes
in any application using the stack.
EF_SIGNALS_NOPOSTPONE
Name: signals_no_postpone  default: 67109952  min: 0  max: (ci_uint64)(-1)  per-process
Comma-separated list of signal numbers to avoid postponing of the signal handlers.
Your application will deadlock if one of the handlers uses a socket function. By default,
the list includes SIGBUS, SIGSEGV and SIGPROF. Please specify numbers, not string
aliases: EF_SIGNALS_NOPOSTPONE=7,11,27 instead of
EF_SIGNALS_NOPOSTPONE=SIGBUS,SIGSEGV,SIGPROF. You can set
EF_SIGNALS_NOPOSTPONE to an empty value to postpone all signal handlers in the same
way if you suspect the handlers for these signals call network functions.
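Because the option takes signal numbers rather than names, a short script can do the translation. The sketch below uses Python's standard signal module; on Linux x86 the three default signals map to 7, 11 and 27, as in the example above. It also shows that the listed default, 67109952, equals 2^6 + 2^10 + 2^26, which is consistent with a bitmask where bit (n - 1) stands for signal n. That reading is an inference from the arithmetic, not something stated in this guide. Both function names are hypothetical.

```python
import signal


def nopostpone_value(signals) -> str:
    """Format signals as the comma-separated numeric list the option expects."""
    return ",".join(str(int(s)) for s in signals)


def as_bitmask(numbers) -> int:
    """Assumed encoding: bit (n - 1) set for each signal number n."""
    return sum(1 << (n - 1) for n in numbers)


# Signal numbers are platform dependent; these names exist on Unix-like systems.
print(nopostpone_value([signal.SIGBUS, signal.SIGSEGV, signal.SIGPROF]))
print(as_bitmask([7, 11, 27]))
```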
EF_SOCKET_CACHE_MAX
Name: sock_cache_max  default: 0  per-stack
Sets the maximum number of TCP sockets to cache for this stack. When set > 0,
OpenOnload will cache resources associated with sockets in order to improve
connection set-up and tear-down performance. This improves performance for
applications that make new TCP connections at a high rate.
EF_SOCKET_CACHE_PORTS
Name: sock_cache_ports  default: 0  per-process
This option specifies a comma-separated list of port numbers. When set (and socket
caching is enabled), only sockets bound to the specified ports will be eligible to be
cached.
EF_SOCK_LOCK_BUZZ
Name: sock_lock_buzz  default: 0  min: 0  max: 1  per-process
Spin while waiting to obtain a per-socket lock. If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_BUZZ_USEC. The per-socket lock is taken
in recv() calls and similar. This option can reduce jitter when multiple threads invoke
recv() on the same socket, but can reduce fairness between threads competing for the
lock.
EF_SO_BUSY_POLL_SPIN
Name: so_busy_poll_spin  default: 0  min: 0  max: 1  per-process
Spin poll, select and epoll in a Linux-like way: enable spinning only if a spinning socket is
present in the poll/select/epoll set. See Linux documentation on SO_BUSY_POLL socket
option for details. You should also enable spinning via the EF_{POLL,SELECT,EPOLL}_SPIN
variable if you'd like to spin in poll, select or epoll correspondingly. The spin duration is
set via EF_SPIN_USEC, which is equivalent to the Linux sysctl.net.busy_poll value.
EF_POLL_USEC is an all-in-one variable to set for all 4 variables mentioned here. Linux never
spins in epoll, but Onload does. This variable does not affect epoll behaviour if
EF_UL_EPOLL=2.
EF_SPIN_USEC
Name: ul_spin_usec  default: 0  per-process
Sets the timeout in microseconds for spinning options. Set this to -1 to spin forever.
The spin timeout may also be set by the EF_POLL_USEC option. Spinning typically
reduces latency and jitter substantially, and can also improve throughput. However, in
some applications spinning can harm performance; particularly applications that have
many threads. When spinning is enabled you should normally dedicate a CPU core to
each thread that spins. You can use the EF_*_SPIN options to selectively enable or
disable spinning for each API and transport. You can also use the
onload_thread_set_spin() extension API to control spinning on a per-thread and per-API
basis.
EF_STACK_LOCK_BUZZ
Name: stack_lock_buzz  default: 0  min: 0  max: 1  per-process
Spin while waiting to obtain a per-stack lock. If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_BUZZ_USEC. This option reduces jitter
caused by lock contention, but can reduce fairness between threads competing for the
lock.
EF_STACK_PER_THREAD
Name: stack_per_thread  default: 0  min: 0  max: 1  per-process
Create a separate Onload stack for the sockets created by each thread.
EF_SYNC_CPLANE_AT_CREATE
Name: sync_cplane  default: 2  min: 0  max: 2  per-stack
When this option is set to 2 Onload will force a sync of control plane information from
the kernel when a stack is created. This can help to ensure up-to-date information is
used where a stack is created immediately following interface configuration. If this
option is set to 1 then Onload will only force a sync for the first stack created. This can
be used if stack creation time for later stacks is time critical. Setting this option to 0 will
disable forced sync. Synchronising data from the kernel will continue to happen
periodically.
EF_TCP
Name: ul_tcp  default: 1  min: 0  max: 1  per-process
Clear to disable acceleration of new TCP sockets.
EF_TCP_ACCEPT_SPIN
Name: tcp_accept_spin  default: 0  min: 0  max: 1  per-process
Spin in blocking TCP accept() calls until incoming connection is established, the spin
timeout expires or the socket timeout expires (whichever is the sooner). If the spin
timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or
EF_POLL_USEC.
EF_TCP_ADV_WIN_SCALE_MAX
Name: tcp_adv_win_scale_max  default: 14  min: 0  max: 14  per-stack
Maximum value for TCP window scaling that will be advertised.
EF_TCP_BACKLOG_MAX
Name: tcp_backlog_max  default: 256  per-stack
Places an upper limit on the number of embryonic (half-open) connections for one
listening socket; see also EF_TCP_SYNRECV_MAX. This value is overridden by
/proc/sys/net/ipv4/tcp_max_syn_backlog.
EF_TCP_CLIENT_LOOPBACK
Name: tcp_client_loopback  default: 0  min: 0  max: CITP_TCP_LOOPBACK_TO_NEWSTACK  per-stack
Enable acceleration of TCP loopback connections on the connecting (client) side:
0 - not accelerated (default);
1 - accelerate if the listening socket is in the same stack (you should also set
EF_TCP_SERVER_LOOPBACK!=0);
2 - accelerate and move accepted socket to the stack of the connecting socket (server
should allow this via EF_TCP_SERVER_LOOPBACK=2);
3 - accelerate and move the connecting socket to the stack of the listening socket
(server should allow this via EF_TCP_SERVER_LOOPBACK!=0);
4 - accelerate and move both connecting and accepted sockets to the new stack
(server should allow this via EF_TCP_SERVER_LOOPBACK=2).
NOTES: Options 3 and 4 break some applications using
epoll, fork and dup calls. Options 2 and 4 make accept() misbehave if the client exits
too early. Option 4 is not recommended on 32-bit systems because it can create a lot of
additional Onload stacks eating a lot of low memory.
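The pairing rules between EF_TCP_CLIENT_LOOPBACK and EF_TCP_SERVER_LOOPBACK given above can be checked with a small function. This is a sketch of the stated requirements only: it does not model the additional condition for mode 1 that the listening socket is in the same stack, and the function name is hypothetical.

```python
def server_permits_loopback(client_mode: int, server_mode: int) -> bool:
    """True if the server-side setting allows the requested client mode."""
    if client_mode == 0:
        return False  # client side does not request acceleration
    if client_mode in (1, 3):
        return server_mode != 0   # any non-zero server setting
    if client_mode in (2, 4):
        return server_mode == 2   # server must be set to exactly 2
    raise ValueError("EF_TCP_CLIENT_LOOPBACK must be 0..4")


for client in range(5):
    print(client, [server_permits_loopback(client, s) for s in (0, 1, 2)])
```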
EF_TCP_CONNECT_HANDOVER
Name: tcp_connect_handover  default: 0  min: 0  max: 1  per-stack
When an accelerated TCP socket calls connect(), hand it over to the kernel stack. This
option disables acceleration of active-open TCP connections.
EF_TCP_CONNECT_SPIN
Name: tcp_connect_spin  default: 0  min: 0  max: 1  per-process
Spin in blocking TCP connect() calls until connection is established, the spin timeout
expires or the socket timeout expires (whichever is the sooner). If the spin timeout
expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or
EF_POLL_USEC.
EF_TCP_FASTSTART_IDLE
Name: tcp_faststart_idle  default: 65536  min: 0  per-stack
The FASTSTART feature prevents Onload from delaying ACKs during times when doing
so may reduce performance. FASTSTART is enabled when a connection is new,
following loss and after the connection has been idle for a while. This option sets the
number of bytes that must be ACKed by the receiver before the connection exits
FASTSTART. Set to zero to prevent a connection entering FASTSTART after an idle
period.
EF_TCP_FASTSTART_INIT
Name: tcp_faststart_init  default: 65536  min: 0  per-stack
The FASTSTART feature prevents Onload from delaying ACKs during times when doing
so may reduce performance. FASTSTART is enabled when a connection is new,
following loss and after the connection has been idle for a while. This option sets the
number of bytes that must be ACKed by the receiver before the connection exits
FASTSTART. Set to zero to disable FASTSTART on new connections.
EF_TCP_FASTSTART_LOSS
Name: tcp_faststart_loss  default: 65536  min: 0  per-stack
The FASTSTART feature prevents Onload from delaying ACKs during times when doing
so may reduce performance. FASTSTART is enabled when a connection is new,
following loss and after the connection has been idle for a while. This option sets the
number of bytes that must be ACKed by the receiver before the connection exits
FASTSTART following loss. Set to zero to disable FASTSTART after loss.
EF_TCP_FIN_TIMEOUT
Name: fin_timeout  default: 60  per-stack
Time in seconds to wait for an orphaned connection to be closed properly by the
network partner (e.g. FIN in the TCP FIN_WAIT2 state; zero window opening to send our
FIN, etc).
EF_TCP_FORCE_REUSEPORT
Name: tcp_reuseports  default: 0  per-process
This option specifies a comma-separated list of port numbers. TCP sockets that bind to
those port numbers will have SO_REUSEPORT automatically applied to them.
EF_TCP_INITIAL_CWND
Name: initial_cwnd  default: 0  min: 0  max: SMAX  per-stack
Sets the initial size of the congestion window (in bytes) for TCP connections. Some care
is needed as, for example, setting smaller than the segment size may result in Onload
being unable to send traffic. WARNING: Modifying this option may violate the TCP
protocol.
EF_TCP_LISTEN_HANDOVER
Name: tcp_listen_handover  default: 0  min: 0  max: 1  per-stack
When an accelerated TCP socket calls listen(), hand it over to the kernel stack. This
option disables acceleration of TCP listening sockets and passively opened TCP
connections.
EF_TCP_LOSS_MIN_CWND
Name: loss_min_cwnd  default: 0  min: 0  max: SMAX  per-stack
Sets the minimum size of the congestion window for TCP connections following
loss. WARNING: Modifying this option may violate the TCP protocol.
EF_TCP_RCVBUF
Name: tcp_rcvbuf_user  default: 0  per-stack
Override SO_RCVBUF for TCP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT
Name: tcp_rcvbuf_est_def  default: 131072  per-stack
Overrides the OS default SO_RCVBUF value for TCP sockets in the ESTABLISHED state if
the OS default SO_RCVBUF value falls outside bounds set with this option. This value is
used when the TCP connection transitions to ESTABLISHED state, to avoid confusion of
some applications like netperf. The lower bound is set to this value and the upper bound
is set to 4 * this value. If the OS default SO_RCVBUF value is less than the lower bound,
then the lower bound is used. If the OS default SO_RCVBUF value is more than the
upper bound, then the upper bound is used. This variable overrides OS default
SO_RCVBUF value only, it does not change SO_RCVBUF if the application explicitly sets it
(see EF_TCP_RCVBUF variable which overrides application supplied value).
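The bounds rule for EF_TCP_RCVBUF_ESTABLISHED_DEFAULT can be sketched as a simple clamp. This is an illustration only, not Onload source code: the function name rcvbuf_established_default and its parameters are hypothetical, and the sketch assumes the bounds are applied exactly as worded above (lower bound = option value, upper bound = 4 * option value).

```c
#include <stdint.h>

/* Illustrative helper (not an Onload function): clamp the OS default
 * SO_RCVBUF into the window [est_def, 4 * est_def] described above. */
uint64_t rcvbuf_established_default(uint64_t os_default, uint64_t est_def)
{
    uint64_t lower = est_def;        /* lower bound is the option value */
    uint64_t upper = 4 * est_def;    /* upper bound is four times that  */

    if (os_default < lower)
        return lower;
    if (os_default > upper)
        return upper;
    return os_default;               /* already inside the bounds */
}
```

With the default option value of 131072, an OS default of 87380 bytes would be raised to 131072, an OS default of 262144 would be left alone, and an OS default of 1048576 would be reduced to 524288. The same arithmetic applies to EF_TCP_SNDBUF_ESTABLISHED_DEFAULT and SO_SNDBUF.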
EF_TCP_RCVBUF_STRICT
Name: tcp_rcvbuf_strict  default: 0  min: 0  max: 1  per-stack
This option protects against the TCP small segment attack. With this option set, Onload
limits the number of packets inside the TCP receive queue and TCP reorder buffer. In some
cases, this option causes a performance penalty. You probably want this option if your
application is connecting to an untrusted partner or over an untrusted network. Off by default.
EF_TCP_RECV_SPIN
Name: tcp_recv_spin  default: 0  min: 0  max: 1  per-process
Spin in blocking TCP receive calls until data arrives, the spin timeout expires or the
socket timeout expires (whichever is the sooner). If the spin timeout expires, enter the
kernelandblock.ThespintimeoutissetbyEF_SPIN_USECorEF_POLL_USEC.
EF_TCP_RST_DELAYED_CONN
Name:rst_delayed_conn default:0 min:0 max:1 perstack
ThisoptiontellsOnloadtoresetTCPconnectionsratherthanallowdatatobe
transmittedlate.Specifically,TCPconnectionsareresetiftheretransmittimeoutfires.
(Thisusuallyhappenswhendataislost,andnormallytriggersaretransmitwhichresults
indatabeingdeliveredhundredsofmillisecondslate).WARNING:Thisoptionislikelyto
causeconnectionstoberesetspuriouslyifACKpacketsaredroppedinthenetwork.
EF_TCP_RX_CHECKS
Name:tcp_rx_checks default:0 min:0 max:1 perstack
Internal/debugginguseonly:performextradebugging/consistencychecksonreceived
packets.
EF_TCP_RX_LOG_FLAGS
Name:tcp_rx_log_flags default:0 perstack
LogreceivedpacketsthathaveanyoftheseflagssetintheTCPheader.Onlyactive
whenEF_TCP_RX_CHECKSisset.
EF_TCP_SEND_NONBLOCK_NO_PACKETS_MODE
Name:tcp_nonblock_no_pkts_mode default:0 min:0 max:1 perstack
ThisoptioncontrolshowanonblockingTCPsend()callshouldbehaveifitisunableto
allocatesufficientpacketbuffers.BydefaultOnloadwillmimicLinuxkernelstack
behaviourandblockforpacketbufferstobeavailable.Ifsetto1,thisoptionwillcause
OnloadtoreturnerrorENOBUFS.Notethisoptioncancausesomeapplications(that
assumethatasocketthatiswriteableisabletosendwithouterror)tomalfunction.
EF_TCP_SEND_SPIN
Name:tcp_send_spin default:0 min:0 max:1 perprocess
SpininblockingTCPsendcallsuntilwindowisupdatedbypeer,thespintimeoutexpires
orthesockettimeoutexpires(whicheveristhesooner).Ifthespintimeoutexpires,
enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_SERVER_LOOPBACK
Name: tcp_server_loopback  default: 0  min: 0  max:
CITP_TCP_LOOPBACK_ALLOW_ALIEN_IN_ACCEPTQ  per-stack
Enable acceleration of TCP loopback connections on the listening (server) side: 0 - not
accelerated (default); 1 - accelerate if the connecting socket is in the same stack (you
should also set EF_TCP_CLIENT_LOOPBACK!=0); 2 - accelerate and allow accepted
socket to be in another stack (this is necessary for clients with
EF_TCP_CLIENT_LOOPBACK=2,4).
EF_TCP_SNDBUF
Name: tcp_sndbuf_user  default: 0  per-stack
Override SO_SNDBUF for TCP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_TCP_SNDBUF_ESTABLISHED_DEFAULT
Name: tcp_sndbuf_est_def  default: 131072  per-stack
Overrides the OS default SO_SNDBUF value for TCP sockets in the ESTABLISHED state if
the OS default SO_SNDBUF value falls outside bounds set with this option. This value is
used when the TCP connection transitions to ESTABLISHED state, to avoid confusion of
some applications like netperf. The lower bound is set to this value and the upper bound
is set to 4 * this value. If the OS default SO_SNDBUF value is less than the lower bound,
then the lower bound is used. If the OS default SO_SNDBUF value is more than the
upper bound, then the upper bound is used. This variable overrides OS default
SO_SNDBUF value only, it does not change SO_SNDBUF if the application explicitly sets
it (see EF_TCP_SNDBUF variable which overrides application supplied value).
EF_TCP_SNDBUF_MODE
Name: tcp_sndbuf_mode  default: 1  min: 0  max: 2  per-stack
This option controls how the SO_SNDBUF limit is applied to TCP sockets. In the default
mode the limit applies to the size of the send queue and retransmit queue combined.
When this option is set to 0 the limit applies to the send queue only. When this
option is set to 2, the SNDBUF size is automatically adjusted for each TCP socket to
match the window advertised by the peer (limited by
EF_TCP_SOCKBUF_MAX_FRACTION). If the application sets SO_SNDBUF explicitly then
automatic adjustment is not used for that socket. The limit is applied to the size of
the send queue and retransmit queue combined. You may also want to
set EF_TCP_RCVBUF_MODE to give automatic adjustment of RCVBUF.
EF_TCP_SOCKBUF_MAX_FRACTION
Name: tcp_sockbuf_max_fraction  default: 1  min: 1  max: 10  per-stack
This option controls the maximum fraction of the TX buffers that may be allocated to a
single socket with EF_TCP_SNDBUF_MODE=2. It also controls the maximum fraction of
the RX buffers that may be allocated to a single socket with
EF_TCP_RCVBUF_MODE=1. The maximum allocation for a socket is
EF_MAX_TX_PACKETS/(2^N) for TX and EF_MAX_RX_PACKETS/(2^N) for RX, where N is
specified here.
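The 2^N divisor can be worked through with a one-line helper. This is a sketch only: sockbuf_max_packets is a hypothetical name, and the packet count passed in stands for whatever EF_MAX_TX_PACKETS or EF_MAX_RX_PACKETS is set to on the system in question.

```c
#include <stdint.h>

/* Illustrative helper (not an Onload function): per-socket packet buffer
 * ceiling for a given EF_TCP_SOCKBUF_MAX_FRACTION value n. */
uint32_t sockbuf_max_packets(uint32_t max_packets, unsigned n)
{
    /* A right shift by n divides by 2^n; n is 1..10 per the range above. */
    return max_packets >> n;
}
```

For an example stack limit of 32768 packets, N=1 (the default) allows a single socket up to 16384 packets, while N=10 allows only 32.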
EF_TCP_SYNCOOKIES
Name: tcp_syncookies  default: 0  min: 0  max: 1  per-stack
Use TCP syncookies to protect from SYN flood attack.
EF_TCP_SYNRECV_MAX
Name: tcp_synrecv_max  default: 1024  max:
CI_CFG_NETIF_MAX_ENDPOINTS_MAX  per-stack
Places an upper limit on the number of embryonic (half-open) connections in an Onload
stack; see also EF_TCP_BACKLOG_MAX. By default, EF_TCP_SYNRECV_MAX = 4 *
EF_TCP_BACKLOG_MAX.
EF_TCP_SYN_OPTS
Name: syn_opts  default: 7  per-stack
A bitmask specifying the TCP options to advertise in SYN segments. bit 0 (0x1) is set to 1
to enable PAWS and RTTM timestamps (RFC 1323), bit 1 (0x2) is set to 1 to enable
window scaling (RFC 1323), bit 2 (0x4) is set to 1 to enable SACK (RFC 2018), bit 3 (0x8) is
set to 1 to enable ECN (RFC 3168).
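The EF_TCP_SYN_OPTS bit assignments can be written down as constants to make the default value easier to read. The enum and function names below are hypothetical and exist only for this sketch; only the bit values come from the description above.

```c
#include <stdbool.h>

/* Bit values as listed for EF_TCP_SYN_OPTS; the names are illustrative. */
enum syn_opt_bits {
    SYN_OPT_TIMESTAMPS = 0x1,  /* PAWS and RTTM timestamps */
    SYN_OPT_WSCALE     = 0x2,  /* window scaling           */
    SYN_OPT_SACK       = 0x4,  /* selective acknowledgment */
    SYN_OPT_ECN        = 0x8   /* explicit congestion notification */
};

/* Returns true if the given option bit is set in the bitmask. */
bool syn_opt_enabled(unsigned syn_opts, enum syn_opt_bits bit)
{
    return (syn_opts & (unsigned)bit) != 0;
}
```

The default of 7 therefore advertises timestamps, window scaling and SACK but not ECN; adding ECN would give a value of 15 (7 | 0x8).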
EF_TCP_TCONST_MSL
Name: msl_seconds  default: 25  per-stack
The Maximum Segment Lifetime (as defined by the TCP RFC). A smaller value causes
connections to spend less time in the TIME_WAIT state.
EF_TIMESTAMPING_REPORTING
Name: timestamping_reporting  default: 0  min: 0  max: 1  per-stack
Controls timestamp reporting, possible values: 0: report translated timestamps only
when the NIC clock has been set; 1: report translated timestamps only when the system
clock and the NIC clock are in sync (e.g. using ptpd). If the above conditions are not met
Onload will only report raw (not translated) timestamps.
EF_TXQ_LIMIT
Name: txq_limit  default: 268435455  min: 16 * 1024  max: 0xfffffff  per-stack
Maximum number of bytes to enqueue on the transmit descriptor ring.
EF_TXQ_RESTART
Name: txq_restart  default: 268435455  min: 1  max: 0xfffffff  per-stack
Level (in bytes) to which the transmit descriptor ring must fall before it will be filled
again.
EF_TXQ_SIZE
Name: txq_size  default: 512  min: 512  max: 4096  per-stack
Set the size of the transmit descriptor ring. Valid values: 512, 1024, 2048 or 4096.
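A launcher script or wrapper may wish to reject an invalid ring size before starting the application. The check below is a minimal sketch of the "valid values" rule only; txq_size_is_valid is a hypothetical name and how Onload itself reacts to an invalid value is not covered here.

```c
#include <stdbool.h>

/* Illustrative check (not an Onload function): EF_TXQ_SIZE accepts
 * exactly the four values listed above. */
bool txq_size_is_valid(unsigned size)
{
    return size == 512 || size == 1024 || size == 2048 || size == 4096;
}
```

Values inside the min/max range that are not in the list, such as 768 or 3000, fail this check.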
EF_TX_MIN_IPG_CNTL
Name: tx_min_ipg_cntl  default: 0  min: -1  max: 20  per-stack
Rate pacing value.
EF_TX_PUSH
Name: tx_push  default: 1  min: 0  max: 1  per-stack
Enable low-latency transmit.
EF_TX_PUSH_THRESHOLD
Name: tx_push_thresh  default: 100  min: 1  per-stack
Sets a threshold for the number of outstanding sends before we stop using TX
descriptor push. This has no effect if EF_TX_PUSH=0. This threshold is ignored, and
assumed to be 1, on pre-SFN7000-series hardware. It makes sense to set this value
similar to EF_SEND_POLL_THRESH.
EF_TX_QOS_CLASS
Name: tx_qos_class  default: 0  min: 0  max: 1  per-stack
Set the QOS class for transmitted packets on this Onload stack. Two QOS classes are
supported: 0 and 1. By default both Onload accelerated traffic and kernel traffic are in
class 0. You can minimise latency by placing latency-sensitive traffic into a separate QOS
class from bulk traffic.
EF_TX_TIMESTAMPING
Name: tx_timestamping  default: 0  min: 0  max: 3  per-stack
Control of hardware timestamping of transmitted packets, possible values: 0 - do not
do timestamping (default); 1 - request timestamping but continue if hardware is not
capable or it does not succeed; 2 - request timestamping and fail if hardware is capable
and it does not succeed; 3 - request timestamping and fail if hardware is not capable or
it does not succeed.
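The difference between modes 1, 2 and 3 is easiest to see as a decision function. The sketch below is one reading of the wording above, not Onload source code; the function name and the two boolean inputs are hypothetical.

```c
#include <stdbool.h>

/* Illustrative decision table (not an Onload function): returns true when
 * the EF_TX_TIMESTAMPING mode calls for a failure. */
bool tx_timestamping_should_fail(int mode, bool hw_capable, bool succeeded)
{
    switch (mode) {
    case 2:
        /* Fail only when capable hardware does not succeed. */
        return hw_capable && !succeeded;
    case 3:
        /* Fail when hardware is not capable or the request fails. */
        return !hw_capable || !succeeded;
    default:
        /* 0: timestamping not requested; 1: continue regardless. */
        return false;
    }
}
```

Under this reading, mode 2 tolerates hardware that cannot timestamp at all, whereas mode 3 treats that as a failure.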
EF_UDP
Name: ul_udp  default: 1  min: 0  max: 1  per-process
Clear to disable acceleration of new UDP sockets.
EF_UDP_CONNECT_HANDOVER
Name: udp_connect_handover  default: 1  min: 0  max: 1  per-stack
When a UDP socket is connected to an IP address that cannot be accelerated by
OpenOnload, hand the socket over to the kernel stack. When this option is disabled the
socket remains under the control of OpenOnload. This may be worthwhile because the
socket may subsequently be re-connected to an IP address that can be accelerated.
EF_UDP_FORCE_REUSEPORT
Name: udp_reuseports  default: 0  per-process
This option specifies a comma-separated list of port numbers. UDP sockets that bind to
those port numbers will have SO_REUSEPORT automatically applied to them.
EF_UDP_PORT_HANDOVER2_MAX
Name: udp_port_handover2_max  default: 1  per-stack
When set (together with EF_UDP_PORT_HANDOVER2_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER2_MIN
Name: udp_port_handover2_min  default: 2  per-stack
When set (together with EF_UDP_PORT_HANDOVER2_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER3_MAX
Name: udp_port_handover3_max  default: 1  per-stack
When set (together with EF_UDP_PORT_HANDOVER3_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER3_MIN
Name: udp_port_handover3_min  default: 2  per-stack
When set (together with EF_UDP_PORT_HANDOVER3_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER_MAX
Name: udp_port_handover_max  default: 1  per-stack
When set (together with EF_UDP_PORT_HANDOVER_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
EF_UDP_PORT_HANDOVER_MIN
Name: udp_port_handover_min  default: 2  per-stack
When set (together with EF_UDP_PORT_HANDOVER_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack. The
range is inclusive.
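The inclusive range test shared by the three MIN/MAX pairs can be sketched as follows. The function name is hypothetical, and the observation about the defaults is an inference from the listed values (MIN=2, MAX=1) rather than a statement from the guide: with those values no port can satisfy both comparisons, so the range matches nothing until both variables are set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper (not an Onload function): true if a bound UDP port
 * falls inside an inclusive [min, max] handover range. */
bool udp_port_in_handover_range(uint16_t port, uint16_t min, uint16_t max)
{
    return port >= min && port <= max;
}
```

For example, a range of 5000 to 5010 would hand over sockets bound to 5000 and to 5010, but not one bound to 5011.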
EF_UDP_RCVBUF
Name: udp_rcvbuf_user  default: 0  per-stack
Override SO_RCVBUF for UDP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_UDP_RECV_SPIN
Name: udp_recv_spin  default: 0  min: 0  max: 1  per-process
Spin in blocking UDP receive calls until data arrives, the spin timeout expires or the
socket timeout expires (whichever is the sooner). If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE
Name: udp_nonblock_no_pkts_mode  default: 0  min: 0  max: 1  per-stack
This option controls how a non-blocking UDP send() call should behave if it is unable to
allocate sufficient packet buffers. By default Onload will mimic Linux kernel stack
behaviour and block for packet buffers to be available. If set to 1, this option will cause
Onload to return error ENOBUFS. Note this option can cause some applications (that
assume that a socket that is writeable is able to send without error) to malfunction.
EF_UDP_SEND_SPIN
Name: udp_send_spin  default: 0  min: 0  max: 1  per-process
Spin in blocking UDP send calls until space becomes available in the socket buffer, the
spin timeout expires or the socket timeout expires (whichever is the sooner). If the spin
timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC or
EF_POLL_USEC. Note: UDP sends usually complete very quickly, but can block if the
application does a large burst of sends at a high rate. This option reduces jitter when
such blocking is needed.
EF_UDP_SEND_UNLOCKED
Name: udp_send_unlocked  default: 1  min: 0  max: 1  per-stack
Enables the 'unlocked' UDP send path. When enabled this option improves concurrency
when multiple threads are performing UDP sends.
EF_UDP_SEND_UNLOCK_THRESH
Name: udp_send_unlock_thresh  default: 1500  per-stack
UDP message size below which we attempt to take the stack lock early. Taking the lock
early reduces overhead and latency slightly, but may increase lock contention in
multi-threaded applications.
EF_UDP_SNDBUF
Name: udp_sndbuf_user  default: 0  per-stack
Override SO_SNDBUF for UDP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_UL_EPOLL
Name: ul_epoll  default: 1  min: 0  max: 3  per-process
Choose epoll implementation. The choices are:
0 - kernel (unaccelerated)
1 - user-level (accelerated, lowest latency)
2 - kernel-accelerated (best when there are lots of sockets in the set and mode 3 is
not suitable)
3 - user-level (accelerated, lowest latency, scalable, supports socket caching)
The default is the user-level implementation (1). Mode 3 can offer benefits over mode 1,
particularly with larger sets. However, this mode has some restrictions. It does not
support epoll sets that exist across fork(). It does not support monitoring the readiness
of the set's epoll fd via another epoll/poll/select.
EF_UL_POLL
Name: ul_poll  default: 1  min: 0  max: 1  per-process
Clear to disable acceleration of poll() calls at user-level.
EF_UL_SELECT
Name: ul_select  default: 1  min: 0  max: 1  per-process
Clear to disable acceleration of select() calls at user-level.
EF_UNCONFINE_SYN
Name: unconfine_syn  default: 1  min: 0  max: 1  per-stack
Accept TCP connections that cross into or out of a private network.
EF_UNIX_LOG
Name: log_level  default: 3  per-process
A bitmask determining which kinds of diagnostic messages will be logged.
0x1         errors
0x2         unexpected
0x4         setup
0x8         verbose
0x10        select()
0x20        poll()
0x100       socket setup
0x200       socket control
0x400       socket caching
0x1000      signal interception
0x2000      library enter/exit
0x4000      log call arguments
0x8000      context lookup
0x10000     pass-through
0x20000     very verbose
0x40000     Verbose returned error
0x80000     V.Verbose errors: show 'ok' too
0x20000000  verbose transport control
0x40000000  very verbose transport control
0x80000000  verbose pass-through
EF_URG_RFC
Name: urg_rfc  default: 0  min: 0  max: 1  per-stack
Choose between compliance with RFC 1122 (1) or BSD behaviour (0) regarding the
location of the urgent pointer in TCP packet headers.
EF_USE_DSACK
Name: use_dsack  default: 1  min: 0  max: 1  per-stack
Whether or not to use DSACK (duplicate SACK).
EF_USE_HUGE_PAGES
Name: huge_pages  default: 1  min: 0  max: 2  per-stack
Control of whether huge pages are used for packet buffers: 0 - no; 1 - use huge pages if
available (default); 2 - always use huge pages and fail if huge pages are not
available. Mode 1 prints a syslog message if there are not enough huge pages in the
system. Mode 2 guarantees only initially-allocated packets to be in huge pages. It is
recommended to use this mode together with EF_MIN_FREE_PACKETS, to control the
number of such guaranteed huge pages. All non-initial packets are allocated in huge
pages when possible; a syslog message is printed if the system is out of huge pages.
Non-initial packets may be allocated in non-huge pages without any warning in syslog for
both mode 1 and 2 even if the system has free huge pages.
EF_VALIDATE_ENV
Name: validate_env  default: 1  min: 0  max: 1  per-stack
When set this option validates Onload-related environment variables (starting with EF_).
EF_VFORK_MODE
Name: vfork_mode  default: 1  min: 0  max: 2  per-process
This option dictates how vfork() intercept should work. After a vfork(), parent and child
still share address space but not file descriptors. We have to be careful about making
changes in the child that can be seen in the parent. We offer three options here. Different
apps may require different options depending on their use of vfork(). If using
EF_VFORK_MODE=2, it is not safe to create sockets or pipes in the child before calling
exec().
0 - Old behavior. Replace vfork() with fork()
1 - Replace vfork() with fork() and block parent till child exits/execs
2 - Replace vfork() with vfork()
B Meta Options
B.1 Environment variables
There are several environment variables which act as meta-options and set several
of the options detailed in Appendix A. These are:
EF_POLL_USEC
Setting EF_POLL_USEC causes the following options to be set:
• EF_SPIN_USEC=EF_POLL_USEC
• EF_SELECT_SPIN=1
• EF_EPOLL_SPIN=1
• EF_POLL_SPIN=1
• EF_PKT_WAIT_SPIN=1
• EF_TCP_SEND_SPIN=1
• EF_UDP_RECV_SPIN=1
• EF_UDP_SEND_SPIN=1
• EF_TCP_RECV_SPIN=1
• EF_BUZZ_USEC=EF_POLL_USEC
• EF_SOCK_LOCK_BUZZ=1
• EF_STACK_LOCK_BUZZ=1
NOTE: If neither of the spinning options, EF_POLL_USEC and EF_SPIN_USEC, is set,
Onload will resort to default interrupt-driven behavior because the EF_INT_DRIVEN
environment variable is enabled by default.
EF_BUZZ_USEC
Setting EF_BUZZ_USEC sets the following options:
• EF_SOCK_LOCK_BUZZ=1
• EF_STACK_LOCK_BUZZ=1
NOTE: If EF_POLL_USEC is set to value N, then EF_BUZZ_USEC is also set to N only if
N <= 100. If N > 100 then EF_BUZZ_USEC will be set to 100. This is deliberate as
spinning for too long on internal locks may adversely affect performance. However
the user can explicitly set the EF_BUZZ_USEC value, e.g.
export EF_POLL_USEC=10000
export EF_BUZZ_USEC=1000
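The relationship between EF_POLL_USEC, EF_SPIN_USEC and EF_BUZZ_USEC described in this section can be sketched as below. This is an illustration of the documented rule only: the struct, the function name and the explicit-override flag are hypothetical and do not correspond to Onload internals.

```c
#include <stdbool.h>

/* Values derived from the EF_POLL_USEC meta-option (illustrative). */
struct spin_settings {
    unsigned spin_usec;   /* EF_SPIN_USEC */
    unsigned buzz_usec;   /* EF_BUZZ_USEC */
};

/* Applies the rule above: EF_SPIN_USEC follows EF_POLL_USEC, and
 * EF_BUZZ_USEC follows it too but is capped at 100 unless the user
 * set EF_BUZZ_USEC explicitly. */
struct spin_settings derive_from_poll_usec(unsigned poll_usec,
                                           bool buzz_is_explicit,
                                           unsigned explicit_buzz_usec)
{
    struct spin_settings s;

    s.spin_usec = poll_usec;
    if (buzz_is_explicit)
        s.buzz_usec = explicit_buzz_usec;
    else
        s.buzz_usec = poll_usec <= 100 ? poll_usec : 100;
    return s;
}
```

With EF_POLL_USEC=10000 alone, the sketch yields a buzz time of 100; with the two export lines shown above it yields 1000.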
C Build Dependencies
C.1 General
Before Onload network and kernel drivers can be built and installed, the target
platform must support the following capabilities:
• Support a general C build environment - i.e. has gcc, make, libc and
libc-devel.
• From version 201502 the following are required: perl, autoconf, automake
and libtool.
• Can compile kernel modules - i.e. has the correct kernel-devel package for the
installed kernel version.
• If 32-bit applications are to be accelerated on 64-bit architectures the machine
must be able to build 32-bit applications.
NOTE: Onload builds have been tested against libtool versions 1.5.26 to 2.4.2. Users
experiencing build issues with other libtool versions should contact
support@solarflare.com.
Building Kernel Modules
The kernel must be built with CONFIG_NETFILTER enabled. Standard distributions
will already have this enabled, but it must also be enabled when building a custom
kernel. This option does not affect performance.
The following commands can be used to install kernel development headers.
• Debian based Distributions - including Ubuntu (any kernel):
  apt-get install linux-headers-$(uname -r)
• For RedHat/Fedora (not for 32-bit Kernel):
  - If the system supports a 32-bit Kernel and the kernel is PAE, then:
    yum -y install kernel-PAE-devel
  - otherwise:
    yum -y install kernel-devel
• For SuSE:
  yast -i kernel-source
onload: binutils, gettext, gawk, gcc, sed, make, bash, glibc-common, automake,
libtool, autoconf.
onload_tcpdump: libpcap, libpcap-devel (see note 1)
solar_clusterd: python-devel (see note 1)
Building 32-bit applications on 64-bit architecture platforms
The following commands can be used to install 32-bit libc development headers.
• Debian based Distributions - including Ubuntu:
  apt-get install gcc-multilib libc6-dev-i386
• For RedHat/Fedora:
  yum -y install glibc-devel.i586
• For SuSE:
  yast -i glibc-devel-32bit
  yast -i gcc-32bit
1. If additional packages are not installed the dependent component will not be built, but the
Onload build will succeed.
D Onload Extensions API
The Onload Extensions API allows the user to customize an application using
advanced features to improve performance.
The Extensions API does not create any runtime dependency on Onload and an
application using the API can run without Onload. The license for the API and
associated libraries is a BSD 2-Clause License.
This section covers the following topics:
Common Components
Stacks API
Zero-Copy API
Templated Sends
Delegated Sends API
D.1 Source Code
The onload source code is provided with the Onload distribution. Entry points for
the source code are:
src/lib/transport/unix/onload_ext_intercept.c
src/lib/transport/unix/zc_intercept.c
D.2 Common Components
For all applications employing the Extensions API the following components are
provided:
• #include <onload/extensions.h>
An application should include the header file containing function prototypes
and constant values required when using the API.
• libonload_ext.a, libonload_ext.so
This library provides stub implementations of the extended API. An application
that wishes to use the extensions API should link against this library.
When Onload is not present, the application will continue to function, but calls
to the extensions API will have no effect (unless documented otherwise).
To link to this library include the '-l' linker option on the compiler command line
i.e.
-lonload_ext
onload_is_present
Description
If the application is linked with libonload_ext, but not running with Onload this will
return 0. If the application is running with Onload this will return 1.
Definition
int onload_is_present(void)
Formal Parameters
None
Return Value
1 from libonload.so library, or 0 from libonload_ext.a library
onload_fd_stat
struct onload_stat
{
  int32_t stack_id;
  char* stack_name;
  int32_t endpoint_id;
  int32_t endpoint_state;
};
extern int onload_fd_stat(int fd, struct onload_stat* stat);
Description
Retrieves internal details about an accelerated socket.
Definition
See above
Formal Parameters
See above
Return Value
0 socket is not accelerated
1 socket is accelerated
ENOMEM when memory cannot be allocated
Notes
When calling free() on stack_name, pass it as a (char *), because the memory is
allocated using malloc().
This function will call malloc() and so should never be called from any other
function requiring a malloc lock.
onload_fd_check_feature
int onload_fd_check_feature(int fd, enum onload_fd_feature feature);
enum onload_fd_feature {
  /* Check whether this fd supports ONLOAD_MSG_WARM or not */
  ONLOAD_FD_FEAT_MSG_WARM
};
Description
Used to check whether the Onload file descriptor supports a feature or not.
Definition
See above
Formal Parameters
See above
Return Value
0 if the feature is supported but not on this fd
>0 if the feature is supported both by onload and this fd
<0 if the feature is not supported:
-ENOSYS if onload_fd_check_feature() is not supported.
-ENOTSUPP if the feature is not supported by onload.
Notes
Onload 201509 and later versions support the
ONLOAD_FD_FEAT_UDP_TX_TS_HDR option. onload_fd_check_feature will return
1 to indicate that a recvmsg used to retrieve TX timestamps for UDP packets will
return the entire Ethernet header. When run on older versions of onload this will
return -EOPNOTSUPP.
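The three-way return convention of onload_fd_check_feature() can be wrapped in a small classifier so that callers do not repeat the sign checks. The sketch below does not call the Onload library; the enum and function names are hypothetical, and the input is simply the integer that onload_fd_check_feature() returned.

```c
#include <errno.h>

/* Illustrative result categories (not part of the Extensions API). */
enum feature_status {
    FEATURE_NOT_SUPPORTED,   /* negative return value             */
    FEATURE_NOT_ON_THIS_FD,  /* zero: supported, but not this fd  */
    FEATURE_AVAILABLE        /* positive: supported on this fd    */
};

/* Maps a return value of onload_fd_check_feature() to a category. */
enum feature_status classify_feature_result(int rc)
{
    if (rc > 0)
        return FEATURE_AVAILABLE;
    if (rc == 0)
        return FEATURE_NOT_ON_THIS_FD;
    return FEATURE_NOT_SUPPORTED;
}
```

An application would typically enable the dependent code path only for FEATURE_AVAILABLE and fall back to its default behavior otherwise.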
onload_thread_set_spin
Description
For each thread, specify which operations should spin.
Definition
int onload_thread_set_spin(
  enum onload_spin_type type,
  unsigned spin)
Formal Parameters
type
Which operation to change the spin status of. The type must be one of the
following:
enum onload_spin_type {
  ONLOAD_SPIN_ALL,
  ONLOAD_SPIN_UDP_RECV,
  ONLOAD_SPIN_UDP_SEND,
  ONLOAD_SPIN_TCP_RECV,
  ONLOAD_SPIN_TCP_SEND,
  ONLOAD_SPIN_TCP_ACCEPT,
  ONLOAD_SPIN_PIPE_RECV,
  ONLOAD_SPIN_PIPE_SEND,
  ONLOAD_SPIN_SELECT,
  ONLOAD_SPIN_POLL,
  ONLOAD_SPIN_PKT_WAIT,
  ONLOAD_SPIN_EPOLL_WAIT
};
spin
A boolean which indicates whether the operation should spin or not.
Return Value
0 on success
EINVAL if unsupported type is specified.
Notes
Spin time (for all threads) is set using the EF_SPIN_USEC parameter.
Examples
The onload_thread_set_spin API can be used to control spinning on a per-thread
or per-API basis. The existing spin-related configuration options set the default
behavior for threads, and the onload_thread_set_spin API overrides the default.
Disable all sorts of spinning:
onload_thread_set_spin(ONLOAD_SPIN_ALL, 0);
Enable all sorts of spinning:
onload_thread_set_spin(ONLOAD_SPIN_ALL, 1);
Enable spinning only for certain threads:
1. Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2. In each thread that should spin, invoke onload_thread_set_spin().
Disable spinning only in certain threads:
1. Enable spinning by setting EF_POLL_USEC=<timeout>.
2. In each thread that should not spin, invoke onload_thread_set_spin().
NOTE: If a thread is set to NOT spin and then blocks this may invoke an interrupt
for the whole stack. Interrupts occurring on moderately busy threads may
cause unintended and undesirable consequences.
Enable spinning for UDP traffic, but not TCP traffic:
1. Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2. In each thread that should spin (UDP only), do:
onload_thread_set_spin(ONLOAD_SPIN_UDP_RECV, 1)
onload_thread_set_spin(ONLOAD_SPIN_UDP_SEND, 1)
Enable spinning for TCP traffic, but not UDP traffic:
1. Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2. In each thread that should spin (TCP only), do:
onload_thread_set_spin(ONLOAD_SPIN_TCP_RECV, 1)
onload_thread_set_spin(ONLOAD_SPIN_TCP_SEND, 1)
onload_thread_set_spin(ONLOAD_SPIN_TCP_ACCEPT, 1)
D.3 Stacks API
Using the Onload Extensions API an application can bind selected sockets to specific
Onload stacks and in this way ensure that time-critical sockets are not starved of
resources by other non-critical sockets. The API allows an application to select
sockets which are to be accelerated thus reserving Onload resources for
performance-critical paths. This also prevents non-critical paths from creating jitter
for critical paths.
onload_set_stackname
Description
Select the Onload stack that new sockets are placed in.
Definition
int onload_set_stackname(
  int who,
  int scope,
  const char *name)
Formal Parameters
who
Must be one of the following:
- ONLOAD_THIS_THREAD - to modify the stack name in which all
subsequent sockets are created by this thread.
- ONLOAD_ALL_THREADS - to modify the stack name in which all
subsequent sockets are created by all threads in the current process.
ONLOAD_THIS_THREAD takes precedence over ONLOAD_ALL_THREADS.
scope
Must be one of the following:
- ONLOAD_SCOPE_THREAD - name is scoped with current thread
- ONLOAD_SCOPE_PROCESS - name is scoped with current process
- ONLOAD_SCOPE_USER - name is scoped with current user
- ONLOAD_SCOPE_GLOBAL - name is global across all threads, users and
processes.
- ONLOAD_SCOPE_NOCHANGE - undo effect of a previous call to
onload_set_stackname(ONLOAD_THIS_THREAD, …), see Note 4 in the Notes
below.
name
One of the following:
- the stack name, up to 8 characters.
- an empty string to set no stack name
- the special value ONLOAD_DONT_ACCELERATE to prevent sockets created
in this thread, user, process from being accelerated.
Sockets identified by the options above will belong to the Onload stack until a
subsequent call using onload_set_stackname identifies a different stack or the
ONLOAD_SCOPE_NOCHANGE option is used.
Return Value
0 on success
-1 with errno set to ENAMETOOLONG if the name exceeds permitted length
-1 with errno set to EINVAL if other parameters are invalid.
Notes
Note 1
This applies for stacks selected for sockets created by socket() and for pipe(), it
has no effect on accept(). Passively opened sockets created via accept() will
always be in the same stack as the listening socket that they are linked to, this means
that the following are functionally identical i.e.
onload_set_stackname(foo)
socket
listen
onload_set_stackname(bar)
accept
and
onload_set_stackname(foo)
socket
listen
accept
onload_set_stackname(bar)
In both cases the listening socket and the accepted socket will be in stack foo.
Note 2
Scope defines the namespace in which a stack belongs. A stack name of foo in scope
user is not the same as a stack name of foo in scope thread. Scope restricts the
visibility of a stack to either the current thread, current process, current user or is
unrestricted (global). This has the property that with, for example, process-based
scoping, two processes can have the same stack name without sharing a stack - as
the stack for each process has a different namespace.
Note 3
Scoping can be thought of as adding a suffix to the supplied name e.g.
ONLOAD_SCOPE_THREAD: <stackname>t<thread_id>
ONLOAD_SCOPE_PROCESS: <stackname>p<process_id>
ONLOAD_SCOPE_USER: <stackname>u<user_id>
ONLOAD_SCOPE_GLOBAL: <stackname>
This is an example only and the implementation is free to do something different
such as maintaining different lists for different scopes.
Note 4
ONLOAD_SCOPE_NOCHANGE will undo the effect of a previous call to
onload_set_stackname(ONLOAD_THIS_THREAD, …).
If you have previously used onload_set_stackname(ONLOAD_THIS_THREAD, …) and
want to revert to the behavior of threads that are using the ONLOAD_ALL_THREADS
configuration, without changing that configuration, you can do the following:
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_NOCHANGE, "");
Related environment variables
Related environment variables are:
EF_DONT_ACCELERATE
Default: 0
Minimum: 0
Maximum: 1
Scope: Per-process
If this environment variable is set then acceleration for ALL sockets is disabled and
handed off to the kernel stack until the application overrides this state with a call to
onload_set_stackname().
EF_STACK_PER_THREAD
Default: 0
Minimum: 0
Maximum: 1
Scope: Per-process
If this environment variable is set each socket created by the application will be
placed in a stack depending on the thread in which it is created. Stacks could, for
example, be named using the thread ID of the thread that creates the stack, but this
should not be relied upon.
A call to onload_set_stackname overrides this variable. EF_DONT_ACCELERATE
takes precedence over this variable.
EF_NAME
Default: none
Minimum: 0 chars
Maximum: 8 chars
Scope: per-stack
The environment variable EF_NAME will be honored to control Onload stack sharing.
However, a call to onload_set_stackname overrides this variable, and
EF_DONT_ACCELERATE and EF_STACK_PER_THREAD both take precedence over
EF_NAME.
onload_move_fd
Description
Move the file descriptor to the current stack. The target stack can be specified with
onload_set_stackname().
Definition
int onload_move_fd(int fd)
Formal Parameters
fd - the file descriptor to be moved to the current stack.
Return Value
0 on success
nonzero otherwise.
Notes
Useful for moving file descriptors obtained by accept(), so that a new connection
can be moved out of the stack used by the listening socket.
Currently limited to TCP closed sockets and TCP accepted sockets. A socket to be
moved must have an empty send queue and empty retransmit queue. A socket
which has had a send() operation cannot be moved.
Should not be used simultaneously with other I/O multiplex actions, i.e. poll(),
select(), recv() etc., on the file descriptor.
This function is not async-safe and should never be called from any process function
handling signals.
onload_stackname_save
Description
Save the state of the current Onload stack identified by the previous call to
onload_set_stackname().
Definition
int onload_stackname_save(void)
Formal Parameters
none
Return Value
0 on success
ENOMEM when memory cannot be allocated.
onload_stackname_restore
Description
Restore stack state saved with a previous call to onload_stackname_save(). All
updates/changes to the state of the current stack will be deleted and all state
previously saved will be restored. To avoid unexpected results, the stack should be
restored in the same thread as used to call onload_stackname_save().
Definition
int onload_stackname_restore(void)
Formal Parameters
none
Return Value
0 on success
nonzero if an error occurs.
Notes
The API stackname save and restore functions provide flexibility when binding
sockets to an Onload stack.
Using a combination of onload_set_stackname(), onload_stackname_save()
and onload_stackname_restore(), the user is able to create default stack settings
which apply to one or more sockets, save this state and then create changed stack
settings which are applied to other sockets. The original default settings can then be
restored to apply to subsequent sockets.
D.4 Stacks API Usage
Using a combination of the EF_DONT_ACCELERATE environment variable and the
function onload_set_stackname(), the user is able to control/select sockets which
are to be accelerated and isolate these performance-critical sockets and threads
from the rest of the system.
onload_stack_opt_set_int
Description
Set/modify per-stack options that all subsequently created stacks will use instead of
using the existing global stack options.
Definition
int onload_stack_opt_set_int(
const char* name,
int64_t value)
Formal Parameters
name
Stack option to modify
value
New value for the stack option.
Example
onload_stack_opt_set_int("EF_DONT_ACCELERATE", 1);
Return Value
0 on success
-1 with errno set to EINVAL if the requested option is not found.
Notes
Cannot be used to modify options on existing stacks - only for new stacks.
Cannot be used to modify process options - only stack options.
Modified options will be used for all newly created stacks until
onload_stack_opt_reset() is called.
onload_stack_opt_reset
Description
Revert to using global stack options for newly created stacks.
Definition
int onload_stack_opt_reset(void)
Formal Parameters
None.
Return Value
0 always
Notes
Should be called following a call to onload_stack_opt_set_int() to revert to
using global stack options for all newly created stacks.
D.5 Stacks API - Examples
• This thread will use stack foo, other threads in the stack will continue as before.
onload_set_stackname(ONLOAD_THIS_THREAD, ONLOAD_SCOPE_GLOBAL, "foo")
• All threads in this process will get their own stack called foo. This is equivalent
to the EF_STACK_PER_THREAD environment variable.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_THREAD, "foo")
• All threads in this process will share a stack called foo. If another process made
the same function call it would get its own stack.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_PROCESS, "foo")
• All threads in this process will share a stack called foo. If another process run by
the same user did the same, it would share the same stack as the first process.
If another process run by a different user did the same it would get its own stack.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_USER, "foo")
• Equivalent to EF_NAME. All threads will use a stack called foo which is shared by
any other process which does the same.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_GLOBAL, "foo")
• Equivalent to EF_DONT_ACCELERATE. New sockets/pipes will not be accelerated
until another call to onload_set_stackname().
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_GLOBAL, ONLOAD_DONT_ACCELERATE)
onload_ordered_epoll_wait
For details of the Wire Order Delivery feature refer to Wire Order Delivery on
page 61.
Description
If the epoll set contains accelerated sockets in only one stack, this function can be
used instead of epoll_wait() to return events in the order these were received from
the wire. There is no explicit check on sockets, so applications must ensure that the
rules are applied to avoid misordering of packets.
Definition
int onload_ordered_epoll_wait(
int epfd,
struct epoll_event *events,
struct onload_ordered_epoll_event *oo_events,
int maxevents,
int timeout);
Formal Parameters
See definition of epoll_wait().
Return Value
0 on success
nonzero otherwise.
Notes
Any file descriptors returned as ready without a valid timestamp, i.e. tv_sec = 0,
should be considered unordered with respect to the rest of the set. This can occur
for data received via the kernel or data returned without a hardware timestamp, i.e.
from an interface that does not support hardware timestamping.
The environment variable EF_UL_EPOLL=1 must be set. Hardware timestamps are
required. This feature is only available on the SFN7000 series adapters.
struct onload_ordered_epoll_event {
/* The hardware timestamp of the first readable data */
struct timespec ts;
/* Number of bytes that may be read to maintain wire order */
int bytes;
};
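The timestamp rule in the notes above can be checked with a small helper. The sketch below uses a local stand-in structure with the same two fields as struct onload_ordered_epoll_event so that it compiles without the Onload headers; the `sketch_` names are hypothetical.

```c
#include <time.h>

/* Local stand-in mirroring the two fields of the structure shown above. */
struct sketch_ordered_event {
  struct timespec ts;  /* hardware timestamp of the first readable data */
  int bytes;           /* bytes that may be read to maintain wire order */
};

/* Returns 1 if the event carries a valid timestamp, or 0 if it should be
 * treated as unordered with respect to the rest of the set. */
int sketch_event_is_ordered(const struct sketch_ordered_event *ev)
{
  return ev->ts.tv_sec != 0;
}

/* Compare two ordered events by timestamp. Returns a negative value if a
 * is earlier than b, a positive value if later, and 0 if equal. */
int sketch_event_cmp(const struct sketch_ordered_event *a,
                     const struct sketch_ordered_event *b)
{
  if (a->ts.tv_sec != b->ts.tv_sec)
    return a->ts.tv_sec < b->ts.tv_sec ? -1 : 1;
  if (a->ts.tv_nsec != b->ts.tv_nsec)
    return a->ts.tv_nsec < b->ts.tv_nsec ? -1 : 1;
  return 0;
}
```

An application might read at most the reported number of bytes from each ordered descriptor and handle unordered descriptors separately.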
D.6 Zero-Copy API
Zero-Copy can improve the performance of networking applications by eliminating
intermediate buffers when transferring data between application and network
adapter.
The Onload Extensions Zero-Copy API supports zero-copy of UDP received packet
data and TCP transmit packet data.
The API provides the following components:
#include <onload/extensions_zc.h>
In addition to the common components, an application should include this
header file which contains all function prototypes and constant values required
when using the API.
This file includes comprehensive documentation, required data structures and
function definitions.
Zero-Copy Data Buffers
To avoid the copy, data is passed to and from the application in special buffers
described by a struct onload_zc_iovec. A message or datagram can consist of
multiple iovecs using a struct onload_zc_msg. A single call to send may involve
multiple messages using an array of struct onload_zc_mmsg.
/* A zc_iovec describes a single buffer */
struct onload_zc_iovec {
  void* iov_base;        /* Address within buffer */
  size_t iov_len;        /* Length of data */
  onload_zc_handle buf;  /* (opaque) buffer handle */
  unsigned iov_flags;    /* Not currently used */
};
/* A msg describes array of iovecs that make up datagram */
struct onload_zc_msg {
  struct onload_zc_iovec* iov;  /* Array of buffers */
  struct msghdr msghdr;         /* Message metadata */
};
/* An mmsg describes a message, the socket, and its result */
struct onload_zc_mmsg {
  struct onload_zc_msg msg;  /* Message */
  int rc;                    /* Result of send operation */
  int fd;                    /* socket to send on */
};
Figure 17: Zero-Copy Data Buffers
Zero-Copy UDP Receive Overview
Figure 18 illustrates the difference between the normal UDP receive mode and the
zero-copy method.
When using the standard POSIX socket calls, the adapter delivers packets to an
Onload packet buffer which is described by a descriptor previously placed in the RX
descriptor ring. When the application calls recv(), Onload copies the data from the
packet buffer to an application-supplied buffer.
Using the zero-copy UDP receive API the application calls the onload_zc_recv()
function including a callback function which will be called when data is ready. The
callback can directly access the data inside the Onload packet buffer, avoiding a copy.
Figure 18: Traditional vs. Zero-Copy UDP Receive
A single call to the onload_zc_recv() function can result in multiple datagrams
being delivered to the callback function. Each time the callback returns to Onload
the next datagram is delivered. Processing stops when the callback instructs Onload
to cease delivery or there are no further received datagrams.
If the receiving application does not need to look at all data received (i.e. it is
filtering) this can result in a considerable performance advantage because this data
is not pulled into the processor's cache, thereby reducing the application cache
footprint.
As a general rule, the callback function should avoid calling other system calls which
attempt to modify or close the current socket.
Zero-copy UDP Receive is implemented within the Onload Extensions API.
Zero-Copy UDP Receive
The onload_zc_recv() function specifies a callback to invoke for each received
UDP datagram. The callback is invoked in the context of the call to
onload_zc_recv() (i.e. it blocks/spins waiting for data).
Before calling, the application must set the following in the struct
onload_zc_recv_args:
cb                        Set to the callback function pointer.
user_ptr                  Set to point to application state; this is not
                          touched by Onload.
msg.msghdr.msg_control    The user application should set these to
msg_controllen            appropriate buffers and lengths (if required) as
msg_name                  you would for recvmsg() (or NULL and 0 if not
msg_namelen               used).
flags                     Set to indicate behavior (e.g.
                          ONLOAD_MSG_DONTWAIT).
typedef enum onload_zc_callback_rc
(*onload_zc_recv_callback)(struct onload_zc_recv_args *args, int flags);
struct onload_zc_recv_args
{
  struct onload_zc_msg msg;
  onload_zc_recv_callback cb;
  void* user_ptr;
  int flags;
};
int onload_zc_recv(int fd, struct onload_zc_recv_args *args);
Figure 19: Zero-Copy recv_args
The callback gets to examine the data, and can control what happens next: (i)
whether or not the buffer(s) are kept by the callback or are immediately freed by
Onload; and (ii) whether or not onload_zc_recv() will internally loop and invoke
the callback with the next datagram, or immediately return to the application. The
next action is determined by setting flags in the return code as follows:
ONLOAD_ZC_KEEP            The callback function can elect to retain
                          ownership of received buffer(s) by returning
                          ONLOAD_ZC_KEEP. Following this, the correct
                          way to release retained buffers is to call
                          onload_zc_release_buffers() to explicitly
                          release the first buffer from each received
                          datagram. Subsequent buffers pertaining to the
                          same datagram will then be automatically
                          released.
ONLOAD_ZC_CONTINUE        To suggest that Onload should loop and process
                          more datagrams.
ONLOAD_ZC_TERMINATE       To insist that Onload immediately return from
                          the onload_zc_recv().
Flags can also be set by Onload:
ONLOAD_ZC_END_OF_BURST     Onload sets this flag to indicate that this is the
                           last packet.
ONLOAD_ZC_MSG_SHARED       Packet buffers are read only.
If there is unaccelerated data on the socket from the kernel's receive path this
cannot be handled without copying. The application has two choices as follows:
ONLOAD_MSG_RECV_OS_INLINE  Set this flag when calling onload_zc_recv().
                           Onload will deal with the kernel data internally
                           and pass it to the callback.
check return code          Check the return code from onload_zc_recv().
                           If it returns ENOTEMPTY then the application must
                           call onload_recvmsg_kernel() to retrieve the
                           kernel data.
Zero-Copy Receive Example #1
struct onload_zc_recv_args args;
struct zc_recv_state state;
int rc;
state.bytes = bytes_to_wait_for;
/* Easy way to set msg_control* and msg_name* to zero */
memset(&args.msg, 0, sizeof(args.msg));
args.cb = &zc_recv_callback;
args.user_ptr = &state;
args.flags = ONLOAD_MSG_RECV_OS_INLINE;
rc = onload_zc_recv(fd, &args);
//---
enum onload_zc_callback_rc
zc_recv_callback(struct onload_zc_recv_args *args, int flags)
{
  int i;
  struct zc_recv_state *state = args->user_ptr;
  /* Walk every buffer of this datagram and count down the bytes wanted */
  for (i = 0; i < args->msg.msghdr.msg_iovlen; ++i) {
    printf("zc callback iov %d: %p, %zu\n", i,
           args->msg.iov[i].iov_base,
           args->msg.iov[i].iov_len);
    state->bytes -= args->msg.iov[i].iov_len;
  }
  if (state->bytes <= 0) return ONLOAD_ZC_TERMINATE;
  else return ONLOAD_ZC_CONTINUE;
}
Figure 20: Zero-Copy Receive - example #1
Zero-Copy Receive Example #2
static enum onload_zc_callback_rc
zc_recv_callback(struct onload_zc_recv_args *args, int flag)
{
  struct user_info *zc_info = args->user_ptr;
  int i, zc_rc = 0;
  for (i = 0; i < args->msg.msghdr.msg_iovlen; ++i) {
    zc_rc += args->msg.iov[i].iov_len;
    handle_msg(args->msg.iov[i].iov_base,
               args->msg.iov[i].iov_len);
  }
  if (zc_rc == 0)
    return ONLOAD_ZC_TERMINATE;
  zc_info->zc_rc += zc_rc;
  if ((zc_info->flags & MSG_WAITALL) &&
      (zc_info->zc_rc < zc_info->size))
    return ONLOAD_ZC_CONTINUE;
  else return ONLOAD_ZC_TERMINATE;
}
ssize_t do_recv_zc(int fd, void *buf, size_t len, int flags)
{
  struct user_info info; int rc;
  struct onload_zc_recv_args zc_args;
  /* msghdr used only if kernel data has to be fetched by copying */
  struct iovec iov = { buf, len };
  struct msghdr msg;
  init_user_info(&info);
  memset(&zc_args, 0, sizeof(zc_args));
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  zc_args.user_ptr = &info;
  zc_args.flags = 0;
  zc_args.cb = &zc_recv_callback;
  if (flags & MSG_DONTWAIT)
    zc_args.flags |= ONLOAD_MSG_DONTWAIT;
  rc = onload_zc_recv(fd, &zc_args);
  if (rc == -ENOTEMPTY) {
    if ((rc = onload_recvmsg_kernel(fd, &msg, 0)) < 0)
      printf("onload_recvmsg_kernel failed\n");
  }
  else if (rc == 0) {
    /* zc_rc gets set by callback to bytes received, so we
     * can return that to appear like standard recv call */
    rc = info.zc_rc;
  }
  return rc;
}
Figure 21: Zero-Copy Receive - example #2
NOTE: onload_zc_recv() only supports accelerated (Onloaded) sockets. For
example, when bound to a broadcast address the socket fd is handed off to the
kernel and this function will return ESOCKTNOSUPPORT.
Zero-Copy TCP Send Overview
Figure 22 illustrates the difference between the normal TCP transmit method and
the zero-copy method.
When using standard POSIX socket calls, the application first creates the payload
data in an application-allocated buffer before calling the send() function. Onload
will copy the data to an Onload packet buffer in memory and post a descriptor to this
buffer in the network adapter TX descriptor ring.
Using the zero-copy TCP transmit API the application calls the
onload_zc_alloc_buffers() function to request buffers from Onload. A pointer
to a packet buffer is returned in response. The application places the data to send
directly into this buffer and then calls onload_zc_send() to indicate to Onload that
data is available to send.
Onload will post a descriptor for the packet buffer in the network adapter TX
descriptor ring and ring the TX doorbell. The network adapter fetches the data for
transmission.
Figure 22: Traditional vs. Zero-Copy TCP Transmit
NOTE: The socket used to allocate zero-copy buffers must be in the same stack as
the socket used to send the buffers. When using TCP loopback, Onload can move a
socket from one stack to another. Users must ensure that they ALWAYS USE
BUFFERS FROM THE CORRECT STACK.
NOTE: The onload_zc_send() function does not currently support the
ONLOAD_MSG_MORE or TCP_CORK flags.
Zero-copy TCP transmit is implemented within the Onload Extensions API.
Zero-Copy TCP Send
The zero-copy send API supports the sending of multiple messages to different
sockets in a single call. Data buffers must be allocated in advance and for best
efficiency these should be allocated in blocks and off the critical path. The user
should avoid simply moving the copy from Onload into the application, but where
this is unavoidable, it should also be done off the critical path.
int onload_zc_send(struct onload_zc_mmsg* msgs, int mlen, int flags);
Figure 23: Zero-Copy send
int onload_zc_alloc_buffers(int fd,
                            struct onload_zc_iovec* iovecs,
                            int iovecs_len,
                            onload_zc_buffer_type_flags flags);
int onload_zc_release_buffers(int fd,
                              onload_zc_handle* bufs,
                              int bufs_len);
Figure 24: Zero-Copy allocate buffers
The onload_zc_send() function return value identifies how many of the
onload_zc_mmsg array's rc fields are set. Each onload_zc_mmsg.rc returns how
many bytes were sent for that message (or an error). Refer to the table below.
rc = onload_zc_send()
rc < 0            Application error calling onload_zc_send(). rc is set to
                  the error code.
rc == 0           Should not happen.
0 < rc <= n_msgs  rc is set to the number of messages whose status has been
                  set in mmsgs[i].rc.
                  rc == n_msgs is the normal case.
Sent buffers are owned by Onload. Unsent buffers are owned by the application and
must be freed or reused to avoid leaking.
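The return code rules above can be turned into a small checker. The following sketch works on plain integers, where msg_rc[i] stands in for the rc field of the i-th struct onload_zc_mmsg, so it does not need the Onload headers; the function name is hypothetical.

```c
/* Interpret the result of a multi-message send.
 * send_rc:       value returned by the send call
 * msg_rc:        per-message results (stand-in for mmsgs[i].rc)
 * n_msgs:        number of messages passed to the send call
 * n_unprocessed: set to the number of messages whose rc was not set;
 *                their buffers still belong to the application
 * Returns the number of processed messages that report an error, or -1
 * if the call itself failed (send_rc <= 0). */
int sketch_count_failed(int send_rc, const int *msg_rc, int n_msgs,
                        int *n_unprocessed)
{
  int i, failed = 0;

  if (send_rc <= 0) {
    *n_unprocessed = n_msgs;  /* nothing was handed over to Onload */
    return -1;
  }
  if (send_rc > n_msgs)
    send_rc = n_msgs;         /* defensive clamp, should not be needed */
  for (i = 0; i < send_rc; ++i)
    if (msg_rc[i] < 0)
      ++failed;               /* buffers of this message must be released */
  *n_unprocessed = n_msgs - send_rc;
  return failed;
}
```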
Zero-Copy Send - Single Message, Single Buffer
struct onload_zc_iovec iovec;
struct onload_zc_mmsg mmsg;
rc = onload_zc_alloc_buffers(fd, &iovec, 1,
                             ONLOAD_ZC_BUFFER_HDR_TCP);
assert(rc == 0);
assert(my_data_len <= iovec.iov_len);
memcpy(iovec.iov_base, my_data, my_data_len);
iovec.iov_len = my_data_len;
mmsg.fd = fd;
mmsg.msg.iov = &iovec;
mmsg.msg.msghdr.msg_iovlen = 1;
rc = onload_zc_send(&mmsg, 1, 0);
if (rc <= 0) {
  /* Probably application bug */
  return rc;
} else {
  /* Only one message, so rc should be 1 */
  assert(rc == 1);
  /* rc == 1 so we can look at the first (only) mmsg.rc */
  if (mmsg.rc < 0)
    /* Error sending message */
    onload_zc_release_buffers(fd, &iovec.buf, 1);
  else
    /* Message sent, single msg, single iovec so
     * shouldn't worry about partial sends */
    assert(mmsg.rc == my_data_len);
}
Figure 25: Zero-Copy - Single Message, Single Buffer Example
The example above demonstrates error code handling. Note it contains an example
of bad practice where buffers are allocated and populated on the critical path.
rc = mmsg[i].rc
rc < 0   Error sending this message. rc is set to the error code.
rc >= 0  rc is set to the number of bytes that have been sent in this
         message. Compare to the message length to establish
         which buffers were sent.
Zero-Copy Send - Multiple Message, Multiple Buffers
#define N_BUFFERS 2
#define N_MSGS 2
struct onload_zc_iovec iovec[N_MSGS][N_BUFFERS];
struct onload_zc_mmsg mmsg[N_MSGS];
for (i = 0; i < N_MSGS; ++i) {
  rc = onload_zc_alloc_buffers(fd, iovec[i], N_BUFFERS,
                               ONLOAD_ZC_BUFFER_HDR_TCP);
  assert(rc == 0);
  /* TODO store data in iovec[i][j].iov_base,
   * set iovec[i][j].iov_len */
  mmsg[i].fd = fd; /* Could be different for each message */
  mmsg[i].msg.iov = iovec[i];
  mmsg[i].msg.msghdr.msg_iovlen = N_BUFFERS;
}
rc = onload_zc_send(mmsg, N_MSGS, 0);
if (rc <= 0) {
  /* Probably application bug */
  return rc;
} else {
  for (i = 0; i < N_MSGS; ++i) {
    if (i < rc) {
      /* mmsg[i].rc is set and we can use it */
      if (mmsg[i].rc < 0) {
        /* error sending this message - release buffers */
        for (j = 0; j < N_BUFFERS; ++j)
          onload_zc_release_buffers(fd, &iovec[i][j].buf, 1);
      } else if (mmsg[i].rc < sum_over_j(iovec[i][j].iov_len)) {
        /* partial success (sum_over_j is pseudocode for the total
         * length of the buffers in message i) */
        /* TODO use mmsg[i].rc to determine which buffers in
         * iovec[i] array are sent and which are still
         * owned by application */
      } else {
        /* Whole message sent, buffers now owned by Onload */
      }
    } else {
      /* mmsg[i].rc is not set, this message was not sent */
      for (j = 0; j < N_BUFFERS; ++j)
        onload_zc_release_buffers(fd, &iovec[i][j].buf, 1);
    }
  }
}
Figure 26: Zero-Copy - Multiple Messages, Multiple Buffers Example
The example above demonstrates error code handling and contains some examples
of bad practice where buffers are allocated and populated on the critical path.
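The partial-success case left as a TODO above comes down to comparing the byte count reported for a message with the lengths of its buffers. A minimal sketch of that bookkeeping follows, using a plain array of lengths in place of the iov_len fields; the function name is hypothetical, and whether a buffer can be only partly sent is not stated here, so the remainder is simply reported to the caller.

```c
#include <stddef.h>

/* Work out how many leading buffers of one message were fully sent.
 * lens:       length of each buffer in the message, in send order
 * n:          number of buffers
 * bytes_sent: byte count reported for the message
 * partial:    if not NULL, receives the number of bytes consumed from
 *             the first buffer that was not fully sent (0 if the send
 *             stopped exactly on a buffer boundary)
 * Returns the number of buffers fully covered by bytes_sent. */
size_t sketch_buffers_sent(const size_t *lens, size_t n, size_t bytes_sent,
                           size_t *partial)
{
  size_t i = 0;

  while (i < n && bytes_sent >= lens[i]) {
    bytes_sent -= lens[i];
    ++i;
  }
  if (partial != NULL)
    *partial = (i < n) ? bytes_sent : 0;
  return i;
}
```

Buffers at index i and beyond (where i is the returned count) would still be owned by the application and need to be released or reused.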
Zero-Copy Send - Full Example
static struct onload_zc_iovec iovec[NUM_ZC_BUFFERS];
static ssize_t do_send_zc(int fd, const void* buf, size_t len, int flags)
{
  int bytes_done, rc, i, bufs_needed;
  struct onload_zc_mmsg mmsg;
  mmsg.fd = fd;
  mmsg.msg.iov = iovec;
  bytes_done = 0;
  mmsg.msg.msghdr.msg_iovlen = 0;
  /* Copy the payload into the pre-allocated buffers, one iovec at a time */
  while (bytes_done < len) {
    i = mmsg.msg.msghdr.msg_iovlen;
    if (iovec[i].iov_len > (len - bytes_done))
      iovec[i].iov_len = (len - bytes_done);
    memcpy(iovec[i].iov_base, (const char*)buf + bytes_done,
           iovec[i].iov_len);
    bytes_done += iovec[i].iov_len;
    ++mmsg.msg.msghdr.msg_iovlen;
  }
  rc = onload_zc_send(&mmsg, 1, 0);
  if (rc != 1 /* Number of messages we sent */) {
    printf("onload_zc_send failed to process msg, %d\n", rc);
    return -1;
  } else {
    if (mmsg.rc < 0)
      printf("onload_zc_send message error %d\n", mmsg.rc);
    else {
      /* Iterate over the iovecs; any that were sent we must replenish. */
      i = 0; bufs_needed = 0;
      bytes_done = 0; /* now counts the bytes accounted for as sent */
      while (i < mmsg.msg.msghdr.msg_iovlen) {
        if (bytes_done == mmsg.rc) {
          printf("onload_zc_send did not send iovec %d\n", i);
          /* In other buffer allocation schemes we would have to release
           * these buffers, but seems pointless as we guarantee at the
           * end of this function to have iovec array full, so do nothing.
           */
        } else {
          /* Buffer sent, now owned by Onload, so replenish iovec array */
          ++bufs_needed;
          bytes_done += iovec[i].iov_len;
        }
        ++i;
      }
      if (bufs_needed) /* replenish the iovec array */
        rc = onload_zc_alloc_buffers(fd, iovec, bufs_needed,
                                     ONLOAD_ZC_BUFFER_HDR_TCP);
    }
  }
  /* Set a return code that looks similar enough to send(). NB. we're
   * not setting (and neither does onload_zc_send()) errno */
  if (mmsg.rc < 0) return -1;
  else return bytes_done;
}
Figure 27: Zero-Copy Send
D.7 Templated Sends
For a description of the templated sends feature, refer to Templated Sends on
page 108. For a description of the packet template to be used by the templated
sends feature refer to the use notes and references to onload_msg_template in the
[onload]/src/include/onload/extensions_zc.h file included with the Onload
distribution.
MSG Template
struct oo_msg_template {
  /* To verify subsequent templated calls are used with the same socket */
  oo_sp oomt_sock_id;
};
MSG Update
/* An update_iovec describes a single template update */
struct onload_template_msg_update_iovec {
  void* otmu_base;      /* Pointer to new data */
  size_t otmu_len;      /* Length of new data */
  off_t otmu_offset;    /* Offset within template to update */
  unsigned otmu_flags;  /* For future use. Must be set to 0. */
};
MSG Allocation
/* Valid options for flags are: ONLOAD_TEMPLATE_FLAGS_PIO_RETRY */
extern int onload_msg_template_alloc(int fd, struct iovec* initial_msg,
                                     int mlen, onload_template_handle* handle,
                                     unsigned flags);
MSG Template Update
/* Valid options for flags are: ONLOAD_TEMPLATE_FLAGS_SEND_NOW,
 * ONLOAD_TEMPLATE_FLAGS_DONTWAIT
 */
extern int
onload_msg_template_update(int fd, onload_template_handle handle,
                           struct onload_template_msg_update_iovec* updates,
                           int ulen, unsigned flags);
MSG Template Abort
extern int onload_msg_template_abort(int fd, onload_template_handle handle);
D.8 Delegated Sends API
The delegated send API, supported by Solarflare SFN7000 series adapters, can lower
the latency overhead incurred when calling send() on TCP sockets by controlling
TCP socket creation and management through Onload, but allowing TCP sends
directly through the Onload layer 2 ef_vi API or other similar API.
Description
An application using the delegated sends API will prepare a packet buffer with IP/
TCP header data, before adding payload data to the packet. The packet buffer can
be prepared in advance and payload added just before the send is required.
After each delegated send, the actual data sent (and length of that data) is returned
to Onload. This allows Onload to update the TCP internal state and have the data to
hand if retransmissions are required on the socket.
This feature is intended for applications that make sporadic TCP sends as opposed
to large amounts of bidirectional TCP traffic. The API should be used with caution
to send small amounts of TCP data. Although the packet buffer can be prepared in
advance of the send, the idea is to complete the delegated send operation
(onload_delegated_send_complete()) soon after the initial send to maintain the
integrity of the TCP internal state.
The user is responsible for serialization when using the delegated send API. The first
call should always be onload_delegated_send_prepare(). If a normal send is
required following the prepare, the user should use
onload_delegated_send_cancel().
For a given file descriptor, while a delegated send is in progress, and until complete
has been called, the user should NOT attempt any standard send(), write(),
sendfile(), close() etc. operations.
Performance
For best latency the application should call onload_delegated_send_complete()
as soon as a delegated send is complete. This allows Onload to continue if
retransmissions are required - Onload cannot perform any retransmission until
complete has been called.
When a link partner has already acknowledged data before complete has been
called, Onload will not have to copy the sent data to the TCP retransmit queue. So
delaying the complete call may avoid a data copy but latency may suffer in the event
of packet loss.
Example Code
The Onload-201502 distribution includes the efdelegated_server.c and
efdelegated_client.c example applications to demonstrate the delegated sends
API. Variable and constant definitions, including socket flags and function return
codes required when using the API, can be found in the extensions.h header file.
onload_delegated_send_prepare
Description
Prepare to send up to size bytes. Allocate TCP headers and prepare them with
Ethernet IP/TCP header data.
Definition
enum onload_delegated_send_rc onload_delegated_send_prepare(
int fd,
int size,
unsigned flags,
struct onload_delegated_send*)
Formal Parameters
fd
File descriptor to send on
size
Size of payload data
flags
See below
struct onload_delegated_send*
See below
Return Value
0 on success
nonzero otherwise
Notes
This function can be called speculatively so that the packet buffer is prepared in
advance: headers are added so that the packet payload data can be added
immediately before the send is required.
This function assumes the packet length is equal to MSS, in which case there is no
need to call onload_delegated_send_tcp_update().
Flags are used for ARP resolution:
• default flags = 0
• ONLOAD_DELEGATED_SEND_FLAG_IGNORE_ARP - do not do ARP lookup, the
caller will provide destination MAC address.
• ONLOAD_DELEGATED_SEND_FLAG_RESOLVE_ARP - if ARP information is not
available, send a speculative TCP_ACK to provoke kernel into ARP resolution -
wait up to 1ms for ARP information to appear.
TCP send window/congestion windows must be respected during delegated
sends.
See extensions.h for flags and return code values.
struct onload_delegated_send {
  void* headers;
  int headers_len;     /* buffer len on input, headers len on output */
  int mss;             /* one packet payload may not exceed this */
  int send_wnd;        /* send window */
  int cong_wnd;        /* congestion window */
  int user_size;       /* the "size" value from send_prepare() call */
  int tcp_seq_offset;
  int ip_len_offset;
  int ip_tcp_hdr_len;
  int reserved[5];
};
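The window and MSS limits reported by the prepare call determine how much payload can be sent and how it splits into packets. The sketch below shows that arithmetic with a local stand-in structure holding only the four fields it reads, so it compiles without the Onload headers; the `sketch_` names are hypothetical and no packet is actually sent.

```c
/* Local stand-in with only the fields this sketch reads from
 * struct onload_delegated_send. */
struct sketch_delegated_send {
  int mss;        /* one packet payload may not exceed this */
  int send_wnd;   /* send window */
  int cong_wnd;   /* congestion window */
  int user_size;  /* the "size" value passed to the prepare call */
};

/* Largest payload that may be sent now: the requested size capped by
 * both the send window and the congestion window. */
int sketch_sendable_bytes(const struct sketch_delegated_send *ds)
{
  int n = ds->user_size;

  if (ds->send_wnd < n)
    n = ds->send_wnd;
  if (ds->cong_wnd < n)
    n = ds->cong_wnd;
  return n < 0 ? 0 : n;
}

/* Payload size of the next packet when 'remaining' bytes are still to be
 * sent: a full MSS where possible, otherwise the remainder. */
int sketch_next_segment(const struct sketch_delegated_send *ds,
                        int remaining)
{
  if (remaining <= 0 || ds->mss <= 0)
    return 0;
  return remaining < ds->mss ? remaining : ds->mss;
}
```

A packet shorter than the MSS is the case in which the headers would need adjusting with onload_delegated_send_tcp_update() before sending, and any bytes reserved but not sent because of the window limits are the case covered by the onload_delegated_send_cancel() notes.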
onload_delegated_send_tcp_update
Description
Update packet headers with payload length and flags.
Definition
void onload_delegated_send_tcp_update(
struct onload_delegated_send*,
int size,
int flags)
Formal Parameters
struct onload_delegated_send*
See below
size
Size of payload data
flags
See below
Return Value
None
Notes
This function is called when, during a send, the payload length is not equal to the
MSS value. See onload_delegated_send_prepare on page 214.
Flag TCP_FLAG_PSH is expected to be set on the last packet when sending a large
data chunk.
onload_delegated_send_tcp_advance
Description
Advance TCP headers after sending one TCP packet.
Definition
void onload_delegated_send_tcp_advance(
struct onload_delegated_send*,
int bytes)
Formal Parameters
struct onload_delegated_send*
See below
bytes
Number of bytes sent
Return Value
None
Notes
When sending a packet using multiple sends, the function is called to update the
header data with the number of bytes after each send.
The actual data sent is not returned to Onload until
onload_delegated_send_complete() is called.
onload_delegated_send_complete
Description
Following a delegated send, this function is used to return the actual data sent (and
length of that data) to Onload which will update the internal TCP state.
Definition
int onload_delegated_send_complete(
int fd,
const struct iovec*,
int iovlen,
int flags)
Formal Parameters
fd
The file descriptor.
struct iovec
Pointer to the data sent
iovlen
Size (bytes) of the data sent
flags
[MSG_DONTWAIT | MSG_NOSIGNAL]
Return Value
0 on success
nonzero if an error occurs.
Notes
Onload is unable to do any retransmit until this function has been called.
This function should be called even if some (but not all) bytes specified in the
prepare function have been sent. The user must also call
onload_delegated_send_cancel() if some of the bytes are not going to be sent,
i.e. reserved but not sent - see onload_delegated_send_cancel() notes below.
This function can block because of the SO_SNDBUF limitation and will ignore the
SO_SNDTIMEO value.
onload_delegated_send_cancel
Description
No more delegated send is planned.
Normal send(), shutdown() or close() etc. can be called after this call.
Definition
int onload_delegated_send_cancel(int fd)
Formal Parameters
fd
The file descriptor to be closed.
Return Value
0 on success
nonzero if an error occurs.
Notes
When TCP headers have been allocated with onload_delegated_send_prepare(), but
it is subsequently required to do a normal send, this function can be used to cancel
the delegated send operation and do a normal send.
There is no need to call this function before calling
onload_delegated_send_prepare().
There is no need to call this function if all the bytes specified in the
onload_delegated_send_prepare() function have been sent.
If some, but not all bytes have been sent, you must call
onload_delegated_send_complete() for the sent bytes THEN call
onload_delegated_send_cancel() for the remaining (reserved but not
sent) bytes. This applies even if the reason for not sending is that the window limits
returned from the prepare function have been reached.
Normal send(), shutdown() or close() etc. can be called after this call.
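The rules in the notes for finishing a delegated send can be condensed into one decision. The sketch below only computes which of the two calls are needed from the byte counts; it makes no Onload calls, and the names are hypothetical.

```c
/* Bit flags describing which calls are needed to finish a delegated send. */
enum {
  SKETCH_CALL_COMPLETE = 1,  /* onload_delegated_send_complete() */
  SKETCH_CALL_CANCEL   = 2   /* onload_delegated_send_cancel() */
};

/* prepared: number of bytes reserved by the prepare call
 * sent:     number of those bytes actually sent
 * Returns a bitmask of SKETCH_CALL_* values. When both bits are set the
 * complete call comes first, followed by the cancel call. */
int sketch_finish_calls(int prepared, int sent)
{
  int calls = 0;

  if (sent > 0)
    calls |= SKETCH_CALL_COMPLETE;  /* return the sent data to Onload */
  if (sent < prepared)
    calls |= SKETCH_CALL_CANCEL;    /* give up the reserved, unsent bytes */
  return calls;
}
```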
E onload_stackdump
E.1 Introduction
The Solarflare onload_stackdump diagnostic utility is a component of the Onload
distribution which can be used to monitor Onload performance, set tuning options
and examine aspects of the system performance.
NOTE: To view data for all stacks, created by all users, the user must be root when
running onload_stackdump. Non-root users can only view data for stacks created
by themselves and accessible to them via the EF_SHARE_WITH environment
variable.
The following examples of onload_stackdump are demonstrated elsewhere in this
user guide:
Monitoring Using onload_stackdump on page 42
Processing at User-Level on page 43
As Few Interrupts as Possible on page 45
Eliminating Drops on page 45
Minimizing Lock Contention on page 46
E.2 General Use
The onload_stackdump tool can produce an extensive range of data and it can be
more useful to limit output to specific stacks or to specific aspects of the system
performance for analysis purposes.
• For help, and to list all onload_stackdump commands and options:
onload_stackdump -?
• To list and read environment variables descriptions:
onload_stackdump doc
• For descriptions of statistics variables:
onload_stackdump describe_stats
Describes all statistics listed by the onload_stackdump lots command.
• To identify all stacks, by identifier and name, and all processes accelerated by
Onload:
onload_stackdump
#stack-id stack-name pids
6 teststack 28570
• To limit the command/option to a specific stack e.g. (stack 4).
onload_stackdump 4 lots
List Onloaded Processes
The onload_stackdump processes command will show the PID and name of
processes being accelerated by Onload and the Onload stack being used by each
process e.g.
# onload_stackdump processes
#pid stack-id cmdline
25587 3 ./sfnt-pingpong
Onloaded processes which have not created a socket are not displayed, but can be
identified using the lsof command.
Identify Onloaded Processes Affinities
The onload_stackdump affinities command will identify the task affinity for an
accelerated process e.g.
# onload_stackdump affinities
pid=25587
cmdline=./sfnt-pingpong
task 25587: 80
The task affinity is identified from an 8-bit mask i.e. 01 is CPU core 0, 02 is CPU core
1, 80 is CPU core 7 etc.
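Decoding the affinity mask by hand is error-prone when more than one bit is set. The helper below is a sketch that parses the hexadecimal mask text printed after the task ID and lists the CPU cores whose bits are set; the function name is hypothetical, and it accepts masks wider than 8 bits as a convenience.

```c
#include <stdlib.h>

/* Convert a hexadecimal affinity mask string such as "80" into a list of
 * CPU core numbers. Bit 0 of the mask is core 0, bit 1 is core 1, and so
 * on. At most max_cores entries are written to cores[].
 * Returns the number of cores written, or -1 if the string is not a
 * valid hexadecimal number. */
int sketch_mask_to_cores(const char *hex, int *cores, int max_cores)
{
  char *end;
  unsigned long mask;
  int n = 0, bit;

  mask = strtoul(hex, &end, 16);
  if (end == hex || *end != '\0')
    return -1;
  for (bit = 0; mask != 0 && n < max_cores; ++bit, mask >>= 1)
    if (mask & 1UL)
      cores[n++] = bit;
  return n;
}
```

For the output shown above, the mask 80 decodes to a single entry, core 7.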
List Onload Environment variables
The onload_stackdump env command will identify onloaded processes running
in the current environment and list all Onload variables set in the current
environment e.g.
# EF_POLL_USEC=100000 EF_TXQ_SIZE=4096 EF_INT_DRIVEN=1 onload <application>
# onload_stackdump env
pid: 25587
cmdline: ./sfnt-pingpong
env: EF_POLL_USEC=100000
env: EF_TXQ_SIZE=4096
env: EF_INT_DRIVEN=1
TX PIO Counters
The Onload stackdump utility exposes counters to indicate how often TX PIO is being
used - see Debug and Logging on page 67. To view PIO counters run the following
command:
$ onload_stackdump stats | grep pio
pio_pkts: 2485971
no_pio_err: 0
The values returned will identify the number of packets sent via PIO and the number
of times when PIO was not used due to an error condition.
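A monitoring script or test harness may want these counters as numbers. The sketch below pulls a named counter out of captured text, assuming each counter is printed as a name followed by a colon and a decimal value; the function name is hypothetical and is not part of onload_stackdump.

```c
#include <stdlib.h>
#include <string.h>

/* Find "name: value" in text and store the value.
 * The name must start the text, a line, or follow a space, so that
 * "pio_err" does not match inside "no_pio_err".
 * Returns 0 on success, -1 if the counter is not present. */
int sketch_read_counter(const char *text, const char *name, long *value)
{
  size_t len = strlen(name);
  const char *p = text;

  if (len == 0)
    return -1;
  while ((p = strstr(p, name)) != NULL) {
    int at_start = (p == text || p[-1] == '\n' || p[-1] == ' ');
    if (at_start && p[len] == ':') {
      char *end;
      long v = strtol(p + len + 1, &end, 10);
      if (end != p + len + 1) {  /* at least one digit was parsed */
        *value = v;
        return 0;
      }
    }
    p += len;  /* keep searching after this occurrence */
  }
  return -1;
}
```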
SendRSTonaTCPSocket
TosendaresetonanOnloadacceleratedTCPsocket,specifythestackandsocket
usingtherstcommand:
#onload_stackdump<stack:socket>rst
Removing Zombie and Orphan Stacks
Onload stacks and sockets can remain active even after all processes using them
have been terminated or have exited, for example to ensure sent data is successfully
received by the TCP peer or to honor TCP TIME_WAIT semantics. Such stacks should
always eventually self-destruct and disappear with no user intervention. However,
these stacks can, in some instances, cause problems for restarting applications, for
example the application may be unable to use the same port numbers while these
are still being used by the persistent stack socket. Persistent stacks also retain
resources such as packet buffers which are then denied to other stacks.
Such stacks are termed 'zombie' or 'orphan' stacks and, depending on the
circumstances, their continued existence may or may not be desirable.
• To list all persistent stacks:
# onload_stackdump -z all
No output to the console or syslog means that no such stacks exist.
• To list a specific persistent stack:
# onload_stackdump -z <stack ID>
• To display the state of persistent stacks:
# onload_stackdump -z dump
• To terminate persistent stacks:
# onload_stackdump -z kill
• To display all options available for zombie/orphan stacks:
# onload_stackdump --help
Snapshot vs. Dynamic Views
The onload_stackdump tool presents a snapshot view of the system when invoked.
To monitor state and variable changes whilst an application is running use
onload_stackdump with the Linux watch command, e.g.
• snapshot: onload_stackdump netif
• dynamic: watch -d -n1 onload_stackdump netif
Some onload_stackdump commands also update periodically whilst monitoring a
process. These commands usually have the watch_ prefix, e.g.
watch_stats, watch_more_stats, watch_tcp_stats, watch_ip_stats etc.
Use the onload_stackdump -h option to list all commands.
Monitoring Receive and Transmit Packet Buffers
onload_stackdump packets
# onload_stackdump packets
ci_netif_pkt_dump_all: id=1
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=437 current
pkt_bufs: max=32768 alloc=2048 free=981 async=0
pkt_bufs: rx=1067 rx_ring=1001 rx_queued=2 pressure_pool=64
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0
pkt_bufs: in_loopback=0 in_sock=0
1003: 0x200 Rx
n_zero_refs=1045 n_freepkts=981 estimated_free_nonb=64
free_nonb=0 nonb_pkt_pool=ffffffffffffffff
The onload_stackdump packets command can be useful to review packet buffer
allocation, use and reuse within a monitored process.
The example above identifies that the process has a maximum of 32768 buffers
(each of 2048 bytes) available. From this pool 2048 buffers have been allocated and
981 from that allocation are currently free for reuse - that means they can be pushed
onto the receive or transmit ring buffers ready to accept new incoming/outgoing
data.
On the receive side of the stack, 1067 packet buffers have been allocated, 1001 have
been pushed to the receive ring - and are available for incoming packets, 2 are
currently in the receive queue for the application to process, and the remaining 64
are reported under pressure_pool.
On the transmit side of the stack, no packet buffers are currently allocated. A
transmit buffer which is not currently in the transmit ring and is not in an overflow
buffer is counted as tx_other. The remaining values are calculations based on the
packet buffer values.
Using the EF_PREFAULT_PACKETS environment variable, packets can be pre-allocated
to the user process when an Onload stack is created. This can reduce
latency jitter and improve Onload performance - for further details see Prefault
Packet Buffers on page 42.
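The buffer accounting described above can be checked mechanically. The following Python sketch parses the pkt_bufs lines of the sample output and confirms that, in this sample, the receive-side counters add up to the rx total. The helper name parse_pkt_bufs is hypothetical, the whitespace between fields is an assumption about the raw output format, and the sum is an observation about this sample rather than a documented guarantee.

```python
import re

# pkt_bufs lines from the sample above (field spacing is an assumption).
SAMPLE = """\
pkt_bufs: max=32768 alloc=2048 free=981 async=0
pkt_bufs: rx=1067 rx_ring=1001 rx_queued=2 pressure_pool=64
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0
"""

def parse_pkt_bufs(text):
    """Collect every integer key=value pair found on pkt_bufs lines."""
    counters = {}
    for line in text.splitlines():
        if line.startswith("pkt_bufs:"):
            for key, value in re.findall(r"(\w+)=(\d+)", line):
                counters[key] = int(value)
    return counters

c = parse_pkt_bufs(SAMPLE)
rx_accounted = c["rx_ring"] + c["rx_queued"] + c["pressure_pool"]
print("allocated:", c["alloc"], "of max", c["max"])
print("free for reuse:", c["free"], "of", c["alloc"])
print("rx buffers accounted for:", rx_accounted, "of", c["rx"])
```

Run against successive snapshots, the same parsing can show whether the free count is trending towards zero.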
Packet Sets
A packet set is a 2MB chunk of packet buffers being used by an Onload application.
An application might use buffers from a single set or from several sets depending on
its complexity and packet buffer requirements.
With an aim to further reduce TLB thrashing and eliminate packet drops, Onload
will try to reuse buffers from the same set.
The onload_stackdump lots command in Onload 201509 will report on the current
use of packet sets, e.g.
$ onload_stackdump lots | grep pkt_set
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=442 current
In the above output there are 2 packet sets. The counters identify the number of free
packet buffers in each set and identify the set currently being used.
The packet sets feature is not available to user applications using the ef_vi layer
directly.
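The 2MB figure follows directly from the pkt_sets line: 1024 buffers of 2048 bytes each. A minimal Python sketch of that arithmetic, using the values from the output above (the function name packet_set_summary is hypothetical):

```python
def packet_set_summary(pkt_size, set_size, free_per_set):
    """Derive totals from the pkt_sets / pkt_set[n] counters."""
    total_bufs = set_size * len(free_per_set)   # buffers across all allocated sets
    free_bufs = sum(free_per_set)
    return {
        "set_bytes": pkt_size * set_size,       # size of one packet set in bytes
        "total_bufs": total_bufs,
        "free_bufs": free_bufs,
        "in_use": total_bufs - free_bufs,
    }

# pkt_size=2048 set_size=1024, pkt_set[0] free=544, pkt_set[1] free=442
summary = packet_set_summary(2048, 1024, [544, 442])
print("one set is", summary["set_bytes"] // (1024 * 1024), "MiB")
print("buffers in use:", summary["in_use"], "of", summary["total_bufs"])
```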
TCP Application STATS
The following onload_stackdump commands can be used to monitor accelerated
TCP connections:
onload_stackdump tcp_stats
Field                   Description
tcp_active_opens        Number of socket connections initiated by the local end
tcp_passive_opens       Number of socket connections accepted by the local end
tcp_attempt_fails       Number of failed connection attempts
tcp_estab_resets        Number of established connections which were
                        subsequently reset
tcp_curr_estab          Number of socket connections in the established or
                        close_wait states
tcp_in_segs             Total number of received segments - includes errored
                        segments
tcp_out_segs            Total number of transmitted segments - excluding
                        segments containing only retransmitted octets
tcp_retran_segs         Total number of retransmitted segments
tcp_in_errs             Total number of segments received with errors
tcp_out_rsts            Number of reset segments sent
onload_stackdump more_stats | grep tcp
Field                   Description
tcp_has_recvq           Non-zero if receive queue has data ready
tcp_recvq_bytes         Total bytes in receive queue
tcp_recvq_pkts          Total packets in receive queue
tcp_has_recv_reorder    Non-zero if socket has out of sequence bytes
tcp_recv_reorder_pkts   Number of out of sequence packets received
tcp_has_sendq           Non-zero if send queues have data ready
tcp_sendq_bytes         Number of bytes currently in all send queues for this
                        connection
tcp_sendq_pkts          Number of packets currently in all send queues for this
                        connection
tcp_has_inflight        Non-zero if some data remains unacknowledged
tcp_inflight_bytes      Total number of unacknowledged bytes
tcp_inflight_pkts       Total number of unacknowledged packets
tcp_n_in_listenq        Number of sockets (summed across all listening
                        sockets) where the local end has responded to SYN
                        with a SYN_ACK, but this has not yet been
                        acknowledged by the remote end
tcp_n_in_acceptq        Number of sockets (summed across all listening
                        sockets) that are currently queued waiting for the
                        local application to call accept()
Use the onload_stackdump -h command to list all TCP connection, stack and
socket commands.
The onload_stackdump LOTS Command
The onload_stackdump lots command will produce extensive data for all
accelerated stacks and sockets. The command can also be restricted to a specific
stack and its associated connections when the stack number is entered on the
command line, e.g.
onload_stackdump lots
onload_stackdump 2 lots
For descriptions of the statistics refer to the output from the following command:
onload_stackdump describe_stats
The following tables describe the output from the onload_stackdump lots
command for:
• TCP stack
• TCP established connection socket
• TCP listening socket
• UDP socket
Within the tables the following abbreviations are used:
rx = receive (or receiver), tx = transmit (or send)
pkts = packets, skts = sockets
Max = maximum, num = number, seq = sequence number
Table 5: Stackdump Output: TCP Stack
Sample output    Description
onload_stackdump lots
    Command entered.
ci_netif_dump: stack=7 name=
    Stack id and stack name as set by EF_NAME.
ver=201310 uid=0 pid=21098
    Onload version, user id and process id of creator process.
lock=20000000 LOCKED nics=3 primed=1
    Internal stack lock status.
    nics = bitfield identifies adapters used by this stack, e.g. 3 = binary 11 -
    so stack is using NICs 1 and 2.
    primed=1 means the event queue will generate an interrupt when the next
    event arrives.
sock_bufs: max=1024 n_allocated=4
    Max number of socket buffers which can be allocated, and number currently
    in use. Socket buffers are also used by pipes.
pkt_bufs: size=2048 max=32768 alloc=576 free=57 async=0
    Packet buffers:
    A total of 32768 (each of 2048 bytes) pkt buffers are available to this stack.
    576 have been allocated of which 57 are free and can be reused by either
    receive or transmit rings.
    async = buffers that are not free, not being used, not being reaped - i.e. in
    a state waiting to be returned for reuse.
pkt_bufs: rx=517 rx_ring=514 rx_queued=3
    Receive packet buffers:
    A total of 517 pkt buffers are currently in use, 514 have been pushed to the
    receive ring, 3 are in the application's receive queue.
    If the CRITICAL flag is displayed it indicates a memory pressure condition in
    which the number of packets in the receive socket buffers (rx=517) is
    approaching the EF_MAX_RX_PACKETS value.
    If the LOW flag is displayed it indicates a memory pressure condition when
    there are not enough packet buffers available to refill the RX descriptor ring.
pkt_bufs: tx=2 tx_ring=1 tx_oflow=0 tx_other=1
    Transmit packet buffers:
    A total of 2 pkt buffers are currently in use, 1 remains in the transmit ring,
    0 buffers have overflowed.
    tx_other = pkt buffers not in use by transmit and not in tx_ring or tx_oflow
    queue.
time: netif=5eb5c61 poll=5eb5c61 now=5eb5c61 (diff=0.000sec)
    Internal timer values.
ci_netif_dump_vi: stack=7 intf=0 vi_instance=87 hw=0C0
    Data describes the stack's virtual interface to the NIC.
evq: cap=2048 current=16de30 is_32_evs=0 is_ev=0
    Event queue data:
    cap - max num of events queue can hold
    current - current event queue location
    is_32_evs - is 1 if there are 32 or more events pending
    is_ev - is 1 if there are any events pending
rxq: cap=511 lim=511 spc=1 level=510 total_desc=93666
    Receive queue data:
    cap - total capacity
    lim - max fill level for receive descriptor ring, specified by EF_RXQ_LIMIT
    spc - amount of free space in receive queue - how many descriptors could be
    added before the receive queue becomes full
    level - how full the receive queue currently is
    total_desc - total number of descriptors that have been pushed to the
    receive queue
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
    Transmit queue data:
    cap - total capacity
    lim - max fill level for transmit descriptor ring, specified by EF_TXQ_LIMIT
    spc - amount of free space in the transmit queue - how many descriptors
    could be added before the transmit queue becomes full
    level - how full the transmit queue currently is
    pkts - how many packets are represented by the descriptors in the transmit
    queue
    oflow_pkts - how many packets are in the overflow transmit queue (i.e.
    waiting for space in the NIC's transmit queue)
txq: tot_pkts=93669 bytes=0
    Total number of packets sent and number of packet bytes currently in the
    queue.
ci_netif_dump_extra: stack=7
    Additional data follows.
in_poll=0 post_poll_list_empty=1 poll_did_wake=0
    Stack Polling Status:
    in_poll = process is currently polling
    post_poll_list_empty = 1 (1=true, 0=false) - whether the list of tasks to be
    done once polling is complete is empty
    poll_did_wake = while polling, the process identified a socket which needs
    to be woken following the poll
rx_defrag_head=-1 rx_defrag_tail=-1
    Reassembly sequence numbers. -1 means no re-assembly has occurred.
tx_tcp_may_alloc=1 nonb_pool=1 send_may_poll=0 is_spinner=0
    TCP buffer data:
    tx_tcp_may_alloc = num pkt buffers tcp could use
    nonb_pool = number of pkt buffers available to tcp process without holding
    lock
    send_may_poll = 0
    is_spinner = TRUE if a thread is spinning
hwport_to_intf_i=0,1,1,1,1,1 intf_i_to_hwport=0,0,0,0,0,0
    Internal mapping of internal interfaces to hardware ports.
uk_intf_ver=03e89aa26d20b98fd08793e771f2cdd9
    md5 user/kernel interface checksum computed by both kernel and user
    application to verify internal data structures.
ci_netif_dump_reap_list: stack=7
7:2
7:1
    Identifies sockets that have buffers which can be freed, e.g. 7:2 = stack 7
    socket 2.
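As an illustration of how the rxq and txq rows of Table 5 can be read, the Python sketch below splits a queue line into named integer fields and reports the fill level relative to lim. The helper names parse_fields and fill_percent are hypothetical, the spacing of the raw lines is assumed, and the observation that spc equals lim minus level is taken from the sample values only.

```python
import re

def parse_fields(line):
    """Split 'name: k=v k=v ...' into (name, {k: int(v)})."""
    name, _, rest = line.partition(":")
    fields = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", rest)}
    return name.strip(), fields

def fill_percent(fields):
    """Current fill level as a percentage of the configured limit."""
    return 100.0 * fields["level"] / fields["lim"]

LINES = (
    "rxq: cap=511 lim=511 spc=1 level=510 total_desc=93666",
    "txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0",
)

for line in LINES:
    name, f = parse_fields(line)
    consistent = f["spc"] == f["lim"] - f["level"]
    print(name, "fill: %.1f%%" % fill_percent(f), "spc matches lim - level:", consistent)
```

A receive queue that stays close to its limit, as here, indicates the ring is being kept well stocked with buffers; a persistently low receive level would be worth investigating alongside the LOW flag described above.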
Table 6: Stackdump Output: TCP Established Connection Socket
Sample output    Description
TCP 7:1 lcl=192.168.1.2:50773 rmt=192.168.1.1:34875 ESTABLISHED
    Socket Configuration.
    Stack:socket id, local and remote ip:port address, TCP connection is
    ESTABLISHED.
lock: 10000000 UNLOCKED
    Internal stack lock status.
rx_wake=0000b6f4(RQ) tx_wake=00000002 flags:
    Internal sequence values that are incremented each time a queue is 'woken'.
addr_spc_id=fffffffffffffffe s_flags: REUSE BOUND
    Address space identifier in which this socket exists and flags set on the
    socket.
    REUSE = allow bind to reuse local addresses.
rcvbuf=129940 sndbuf=131072 rx_errno=0 tx_errno=0 so_error=0
    Socket receive buffer size, send buffer size.
    rx_errno = ZERO whilst data can still arrive, otherwise contains error code.
    tx_errno = ZERO if transmit can still happen, otherwise contains error code.
    so_error = current socket error (0 = no error).
tcpflags: TSO WSCL SACK ESTAB
    TCP flags currently set for this socket.
TCP state: ESTABLISHED
    State of the TCP connection.
snd: up=b554bb86 una-nxt-max=b554bb86-b554bb87-b556b6a6 enq=b554bb87
    TCP sequence numbers.
    up = (urgent pointer) sequence of byte following the OOB byte
    una-nxt-max = sequence number of first unacknowledged byte, sequence
    number of next byte we expect to be acknowledged and max = sequence of
    last byte in the current send window
    enq = sequence number of last byte currently queued for transmit
send=0(0) pre=0 inflight=1(1) wnd=129824 unused=129823
    Send Data.
    send = number of pkts (bytes) sent
    pre = number of pkts in pre-send queue. A process can add data to the
    prequeue when it is prevented from sending the data immediately. The data
    will be sent when the current sending operation is complete
    inflight = number of pkts (bytes) sent but not yet acknowledged
    wnd = receiver's advertised window size (bytes)
    unused = amount of free space (bytes) in that window
snd: cwnd=49733+0 used=0 ssthresh=65535 bytes_acked=0 Open
    Congestion window (cwnd).
    cwnd = congestion window size (bytes)
    used = portion of the cwnd currently in use
    ssthresh = slow start threshold - number of bytes that have to be sent
    before process can exit slow start
    bytes_acked = number of bytes acknowledged - this value is used to
    calculate the rate at which the congestion window is opened
    current cwnd status = OPEN
snd: Onloaded(Valid) if=6 mtu=1500 intf_i=0 vlan=0 encap=4
    Onloaded = can reach the destination via an accelerated interface.
    (Valid) = cached control plane information is up to date, can send
    immediately using this information.
    (Old) = cached control plane information may be out of date. On next send
    Onload will do a control plane lookup - this will add some latency.
rcv: nxt-max=0e9251fe-0e944d1d current=0e944d92 FASTSTART FAST
    Receiver Data.
    nxt-max = next byte we expect to receive and last byte we expect to
    receive (because of window size)
    current = byte currently being processed
rob_n=0 recv1_n=2 recv2_n=0 wndadv=129823 cur=129940 usr=0
    Reorder buffer.
    Bytes received out of sequence are put into a reorder buffer awaiting
    further bytes before reordering can occur.
    rob_n = num of bytes in reorder buffer
    recv1_n = num of bytes in general reorder buffer
    recv2_n = num of bytes in urgent data reorder buffer
    wndadv = receiver advertised window size
    cur = current receive window size
    usr = current tcp stack user
async: rx_put=1 rx_get=1 tx_head=1
    Asynchronous queue data.
eff_mss=1448 smss=1460 amss=1460 used_bufs=2 uid=0 wscl s=1 r=1
    Max Segment Size.
    eff_mss = effective mss
    smss = sender mss
    amss = advertised mss
    used_bufs = number of transmit buffers used
    uid = user id that created this socket (0 = root)
    wscl s/r = parameters to window scaling algorithm
srtt=01 rttvar=000 rto=189 zwins=0,0
    Round trip time (RTT) - all values are milliseconds.
    srtt = smoothed RTT value
    rttvar = RTT variation
    rto = current RTO timeout value
    zwins = zero windows, times when advertised window has gone to zero size
retrans=0 dupacks=0 rtos=0 frecs=0 seqerr=0 ooo_pkts=0 ooo=0
    Retransmissions.
    retrans = internal state, nearly always zero
    dupacks = number of duplicate acks received
    rtos = number of retrans timeouts
    frecs = number of fast recoveries
    seqerr = number of sequence errors
    ooo_pkts = number of out of sequence pkts
    ooo = number of out of order events
timers:
    Currently active timers.
tx_nomac
    Number of TCP packets sent via the OS using raw sockets when up to date
    ARP data is not available.
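The sequence numbers in the Table 6 sample are related to the window counters shown alongside them. The Python sketch below works through that arithmetic using 32-bit modular subtraction, as is usual for TCP sequence numbers. The helper name seq_diff is hypothetical, and the relationships shown are inferred from the sample values in the table, not taken from a statement by Onload about how the counters are computed.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32

def seq_diff(later_hex, earlier_hex):
    """Distance from earlier to later sequence number, allowing for wrap."""
    return (int(later_hex, 16) - int(earlier_hex, 16)) % MOD

# snd: una, nxt and max from the sample send line
una, nxt, snd_max = "b554bb86", "b554bb87", "b556b6a6"
inflight_bytes = seq_diff(nxt, una)       # sent but not yet acknowledged
send_window = seq_diff(snd_max, una)      # compare with wnd=129824
unused = send_window - inflight_bytes     # compare with unused=129823

# rcv: nxt and max from the sample receive line
rcv_window = seq_diff("0e944d1d", "0e9251fe")   # compare with wndadv=129823

print("inflight bytes:", inflight_bytes)
print("send window:", send_window, "unused:", unused)
print("receive window:", rcv_window)
```

The modulo keeps the result correct when a connection's sequence numbers wrap past ffffffff between the two values being compared.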
Table 7: Stackdump Output: TCP Stack Listen Socket
Sample output    Description
TCP 7:3 lcl=0.0.0.0:50773 rmt=0.0.0.0:0 LISTEN
    Socket configuration.
    stack:socket id, LISTENING socket on port 50773, local and remote
    addresses not set - not bound to any IP addr.
lock: 10000000 UNLOCKED
    Internal stack lock status.
rx_wake=00000000 tx_wake=00000000 flags:
    Internal sequence values that are incremented each time a queue is 'woken'.
addr_spc_id=fffffffffffffffe s_flags: REUSE BOUND PBOUND
    Address space identifier in which this socket exists and flags set on the
    socket.
    REUSE = allow bind to reuse local port.
rcvbuf=129940 sndbuf=131072 rx_errno=6b tx_errno=20 so_error=0
    Receive Buffer.
    Socket receive buffer size, send buffer size.
    rx_errno = ZERO whilst data can still arrive, otherwise contains error code.
    tx_errno = ZERO if transmit can still happen, otherwise contains error code.
    so_error = current socket error (0 = no error).
tcpflags: WSCL SACK
    Flags advertised during handshake.
listenq: max=1024 n=0
    Listen Queue.
    Queue of half open connects (SYN received and SYN-ACK sent - waiting for
    final ACK).
    n = number of connections in the queue
acceptq: max=5 n=0 get=1 put=1 total=0
    Accept Queue.
    Queue of open connections, waiting for application to call accept().
    max = max connections that can exist in the queue
    n = current number of connections
    get/put = indexes for queue access
    total = num of connections that have traversed this queue
epcache: n=0 cache=EMPTY pending=EMPTY
    Endpoint cache.
    n = number of endpoints currently known to this socket
    cache = EMPTY or yes if endpoints are still cached
    pending = EMPTY or yes if endpoints still have to be cached
defer_accept=0
    Number of times TCP_DEFER_ACCEPT kicked in - see TCP socket options.
l_overflow=0 l_no_synrecv=0 a_overflow=0 a_no_sock=0 ack_rsts=0 os=2
    l_overflow = number of times listen queue was full and had to reject a SYN
    request
    l_no_synrecv = number of times unable to allocate internal resource for SYN
    request
    a_overflow = number of times unable to promote connection to the accept
    queue which is full
    a_no_sock = number of times unable to create socket
    ack_rsts = number of times received an ACK before SYN so the connection
    was reset
    os=2 there are 2 sockets being processed in the kernel
Table 8: Stackdump Output: UDP Socket
Sample output    Description
UDP 4:1 lcl=192.168.1.2:38142 rmt=192.168.1.1:42638 UDP
    Socket Configuration.
    stack:socket id, UDP socket on port 38142.
    Local and remote addresses and ports.
lock: 20000000 LOCKED
    Stack internal lock status.
rx_wake=000e69b0 tx_wake=000e69b1 flags:
    Internal sequence values that are incremented each time a queue is 'woken'.
addr_spc_id=fffffffffffffffe s_flags: REUSE
    Address space identifier in which this socket exists and flags set on the
    socket.
    REUSE = allow bind to reuse local addresses.
rcvbuf=129024 sndbuf=129024 rx_errno=0 tx_errno=0 so_error=0
    Buffers.
    Socket receive buffer size, send buffer size.
    rx_errno = ZERO whilst data can still arrive, otherwise contains error code.
    tx_errno = ZERO if transmit can still happen, otherwise contains error code.
    so_error = current socket error (0 = no error).
udpflags: FILT MCAST_LOOP RXOS
    Flags set on the UDP socket.
mcast_snd: intf=-1 ifindex=0 saddr=0.0.0.0 ttl=1 mtu=1500
    Multicast.
    intf = multicast hardware port id (-1 means port was not set)
    ifindex = interface (port) identifier
    saddr = IP address
    ttl = time to live (default for multicast = 1)
    mtu = max transmission unit size
rcv: q_bytes=0 q_pkts=0 reap=2 tot_bytes=30225920 tot_pkts=944560
    Receive Queue.
    q_bytes = num bytes currently in rx queue
    q_pkts = num pkts currently in rx queue
    tot_bytes = total bytes received
    tot_pkts = total pkts received
rcv: oflow_drop=0(0%) mem_drop=0 eagain=0 pktinfo=0 q_max_pkts=0
    Overflow Buffer.
    oflow_drop = number of datagrams in the overflow queue when the socket
    buffer is full
    mem_drop = number of datagrams dropped due to running out of packet
    buffer memory
    eagain = number of times the application tried to read from a socket when
    there is no data ready - this value can be ignored on the rcv side
    pktinfo = number of times IP_PKTINFO control message was received
    q_max_pkts = max depth reached by the receive queue (packets)
rcv: os=0(0%) os_slow=0 os_error=0
    Number of datagrams received via:
    os = operating system
    os_slow = operating system slow socket
    os_error = recv() function call via OS returned an error
snd: q=0+0 ul=944561 os=0(0%) os_slow=0(0%)
    Send values.
    q = number of bytes sent to the interface but not yet transmitted
    ul = number of datagrams sent via onload
    os = number of datagrams sent via OS
    os_slow = number of datagrams sent via OS slow path
snd: cp_match=0(0%)
    Unconnected UDP send.
    cp_match = number dgrams sent via accelerated path and percent this is of
    all unconnected send dgrams
snd: lk_poll=0(0%) lk_pkt=944561(100%) lk_snd=0(0%)
    Stack internal lock.
    lk_poll = number of times the lock was held while we poll the stack
    lk_pkt = number of pkts sent while holding the lock
    lk_snd = number of times the lock was held while sending data
snd: lk_defer=0(0%) cached_daddr=0.0.0.0
    Sending deferred to the process/thread currently holding the lock.
snd: eagain=0 spin=0 block=0
    eagain = count of the number of times the application tried to send data,
    but the transmit queue is already full. A high value on the send side may
    indicate transmit issues.
    spin = number of times process had to spin when the send queue was full
    block = number of times process had to block when the send queue was full
snd: poll_avoids_full=0 fragments=0 confirm=0
    poll_avoids_full = number of times polling created space in the send queue
    fragments = number of (non-first) fragments sent
    confirm = number of datagrams sent with MSG_CONFIRM flag
snd: os_late=0 unconnect_late=0
    os_late = number of pkts sent via OS after copying
    unconnect_late = number of pkts silently dropped when process/thread
    becomes disconnected during a send procedure
Following the stack and socket data, onload_stackdump lots will display a list of
statistical data. For descriptions of the fields refer to the output from the following
command:
onload_stackdump describe_stats
The final list produced by onload_stackdump lots shows the current values of all
environment variables in the monitored process environment. For descriptions of
the environment variables refer to Parameter Reference on page 146 or use the
onload_stackdump doc command.
Remote Monitoring
Introduced in Onload 201502, the remote monitoring feature uses a simple client/
server model to export the Onload stack and socket data to a remote server(s). The
remote monitor (server) process is installed along with the Onload distribution. A
simple example client process is also provided.
The server process (on the machine to be monitored) can be started from the
following directory:
openonload-201502/src/tools/onload_remote_monitor
Start the monitor server process identifying a port through which server/client
processes will connect:
# ./onload_remote_monitor <port>
The example client process can be found in the following directory:
openonload-201502/src/tests/onload/onload_remote_monitor
From the remote machine, start the client process identifying the server host
machine and port number:
# ./orm_example_client <serverhost>:<port>
In the initial release the remote_monitor server will export an extensive list of
counters from the Onload stacks and sockets. Data is exported in JSON format for
processing by a remote application.
Remote monitoring is an exploratory feature and it is planned that future
continuous development will include data requested by direct customer input and
feedback.
Customers interested in remote monitoring are asked to provide feedback and
monitoring requirements by sending an email to support@solarflare.com.
F Solarflare sfnettest
F.1 Introduction
Solarflare sfnettest is a set of benchmark tools and test utilities supplied by
Solarflare for benchmark and performance testing of network servers and network
adapters. sfnettest is available in binary and source forms from:
http://www.openonload.org/
Download the sfnettest-<version>.tgz source file and unpack using the tar
command.
tar -zxvf sfnettest-<version>.tgz
Run the make utility from the /sfnettest-<version>/src subdirectory to build
the benchmark applications.
Refer to the README.sfnt-pingpong or README.sfnt-stream files in the
distribution directory once sfnettest is installed.
sfnt-pingpong
Description
The sfnt-pingpong application measures TCP and UDP latency by creating a single
socket between two servers and running a simple message pattern between them.
The output identifies latency and statistics for increasing TCP/UDP packet sizes.
Usage
sfnt-pingpong [options] [<tcp|udp|pipe|unix_stream|unix_datagram>
[<host[:port]>]]
Options
sfnt-pingpong options:
Option          Description
--port          server port
--sizes         single message size (bytes)
--connect       connect() UDP socket
--spin          spin on non-blocking recv()
--muxer         select, poll or epoll
--servmuxer     none, select, poll or epoll (same as client by default)
--rtt           report round-trip time
--raw           dump raw results to files
--percentile    percentile
--minmsg        minimum message size
--maxmsg        maximum message size
--minms         min time per msg size (ms)
--maxms         max time per msg size (ms)
--miniter       minimum iterations for result
--maxiter       maximum iterations for result
--mcast         use multicast addressing
--mcastintf     set the multicast interface. The client sends this parameter
                to the server.
                --mcastintf=eth2 both client and server use eth2
                --mcastintf='eth2;eth3' client uses eth2 and server uses
                eth3 (quotes are required for this format)
--mcastloop     IP_MULTICAST_LOOP
--bindtodev     SO_BINDTODEVICE
--forkboth      fork client and server
--npipe         include pipes in file descriptor set
--nunixd        include unix datagrams in the file descriptor set
--nunixs        include unix streams in the file descriptor set
--nudp          include UDP sockets in file descriptor set
--ntcpc         include TCP sockets in file descriptor set
--ntcpl         include TCP listening sockets in file descriptor set
--tcpserv       host:port for TCP connections
--timeout       socket SND/RECV timeout
--affinity      '<client_core>;<server_core>' Enclose values in quotes.
                This option should be set on the client side only. The client
                sends the <server_core> value to the server. The user must
                ensure that the identified server core is available on the
                server machine.
                This option will override any value set by taskset on the
                same command line.
--npings        number of ping messages
--npongs        number of pong messages
--nodelay       enable TCP_NODELAY
Standard options:
Option          Description
-? --help       this message
-q --quiet      quiet
-v --verbose    display more information
Examples
Example TCP latency command lines
[root@server]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[root@client]# onload --profile=latency taskset -c 1 ./sfnt-pingpong \
--maxms=10000 --affinity "1;1" tcp <server_ip>
Example UDP latency command lines
[root@server]# onload --profile=latency taskset -c 9 ./sfnt-pingpong
[root@client]# onload --profile=latency taskset -c 9 ./sfnt-pingpong \
--maxms=10000 --affinity "9;9" udp <server_ip>
Example output
# version: 1.4.0-modified
# src: 13b27e6b86132da11b727fbe552e2293
# date: Sat Apr 21 11:56:22 BST 2012
# uname: Linux server4.uk.level5networks.com 2.6.32-220.el6.x86_64 #1 SMP
Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
# cpu: model name : Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
# lspci: 05:00.0 Ethernet controller: Intel Corporation I350 Gigabit
Network Connection (rev 01)
# lspci: 05:00.1 Ethernet controller: Intel Corporation I350 Gigabit
Network Connection (rev 01)
# lspci: 83:00.0 Ethernet controller: Solarflare Communications SFC9020
[Solarstorm]
# lspci: 83:00.1 Ethernet controller: Solarflare Communications SFC9020
[Solarstorm]
# lspci: 85:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
Network Connection
# eth0: driver: igb
# eth0: version: 3.0.6-k
# eth0: bus-info: 0000:05:00.0
# eth1: driver: igb
# eth1: version: 3.0.6-k
# eth1: bus-info: 0000:05:00.1
# eth2: driver: sfc
# eth2: version: 3.2.1.6083
# eth2: bus-info: 0000:83:00.0
# eth3: driver: sfc
# eth3: version: 3.2.1.6083
# eth3: bus-info: 0000:83:00.1
# eth4: driver: e1000e
# eth4: version: 1.4.4-k
# eth4: bus-info: 0000:85:00.0
# virbr0: driver: bridge
# virbr0: version: 2.3
# virbr0: bus-info: N/A
# virbr0-nic: driver: tun
# virbr0-nic: version: 1.6
# virbr0-nic: bus-info: tap
# ram: MemTotal: 32959748 kB
# tsc_hz: 3099966880
# LD_PRELOAD=libonload.so
# server LD_PRELOAD=libonload.so
# onload_version=201205
# EF_TCP_FASTSTART_INIT=0
# EF_POLL_USEC=100000
# EF_TCP_FASTSTART_IDLE=0
#
#  size   mean    min  median     max   %ile  stddev     iter
      1   2453   2380    2434   18288   2669      77  1000000
      2   2453   2379    2435   45109   2616      90  1000000
      4   2467   2380    2436   10502   2730      82  1000000
      8   2465   2383    2446    8798   2642      70  1000000
     16   2460   2380    2441    7494   2632      68  1000000
     32   2474   2399    2454    8758   2677      71  1000000
     64   2495   2419    2474   12174   2716      77  1000000
The output identifies mean, minimum, median and maximum (nanosecond) RTT/2
latency for increasing packet sizes including the 99th percentile and standard
deviation for these results. A message size of 32 bytes has a mean latency of
approximately 2.5 microseconds with a 99%ile latency less than 2.7 microseconds.
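To post-process results like these, the data rows can be read as whitespace-separated integers and converted from nanoseconds to microseconds. The Python sketch below is an illustration under that assumption; the function name parse_results and the column key pct (standing in for the %ile column) are hypothetical, and only two rows from the example output are included.

```python
COLUMNS = ("size", "mean", "min", "median", "max", "pct", "stddev", "iter")

# Two data rows from the example output, assuming whitespace-separated columns.
SAMPLE = """\
#  size   mean    min  median     max   %ile  stddev     iter
     32   2474   2399    2454    8758   2677      71  1000000
     64   2495   2419    2474   12174   2716      77  1000000
"""

def parse_results(text):
    """Return one dict per data row, skipping '#' header/comment lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [int(v) for v in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

rows = parse_results(SAMPLE)
for r in rows:
    # Latency columns are in nanoseconds; divide by 1000 for microseconds.
    print("%d bytes: mean %.3f us, %%ile %.3f us"
          % (r["size"], r["mean"] / 1000.0, r["pct"] / 1000.0))
```

Because every metadata line in the real output starts with '#', the same function can be pointed at a complete results file without first trimming the header.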
sfnt-stream
The sfnt-stream application measures RTT latency (not 1/2 RTT) for a fixed size
message at increasing message rates. Latency is calculated from a sample of all
messages sent. Message rates can be set with the --rates option and the number of
messages to sample using the --samples option.
Solarflare sfnt-stream only functions on UDP sockets. This limitation will be
removed to support other protocols in the future.
Refer to the README.sfnt-stream file which is part of the Onload distribution for
further information.
Usage
sfnt-stream [options] [tcp|udp|pipe|unix_stream|unix_datagram [host[:port]]]
Options
sfnt-stream options:
Option          Description
--msgsize       message size (bytes)
--rates         msg rates <min>-<max>[+<step>]
--millisec      time per test (milliseconds)
--samples       number of samples per test
--stop          stop when TX rate achieved is below given percentage of
                target rate
--maxburst      maximum burst length
--port          server port number
--connect       connect() UDP socket
--spin          spin on non-blocking recv()
--muxer         select, poll, epoll or none
--rtt           report round-trip time
--raw           dump raw results to file
--percentile    percentile
--mcast         set the multicast address
--mcastintf     set multicast interface. The client sends this parameter to
                the server.
                --mcastintf=eth2 both client and server use eth2
                --mcastintf='eth2;eth3' client uses eth2 and server uses
                eth3 (quotes are required for this format)
--mcastloop     IP_MULTICAST_LOOP
--ttl           IP_TTL and IP_MULTICAST_TTL
--bindtodevice  SO_BINDTODEVICE
--npipe         include pipes in file descriptor set
--nunixd        include unix datagram in file descriptor set
--nunixs        include unix stream in file descriptor set
--nudp          include UDP sockets in file descriptor set
--ntcpc         include TCP sockets in file descriptor set
--ntcpl         include TCP listening sockets in file descriptor set
--tcpcserv      host:port for TCP connections
--nodelay       enable TCP_NODELAY
--affinity      "<client_tx>,<client_rx>;<server_core>" enclose the values
                in double quotes, e.g. "4,5;3". This option should be set on
                the client side only. The client sends the <server_core>
                value to the server. The user must ensure that the
                identified server core is available on the server machine.
                This option will override any value set by taskset on the
                same command line.
--rttiter       iterations for RTT measurement
Standard options:
Option          Description
-? --help       this message
-q --quiet      quiet
-v --verbose    display more information
--version       display version information
Examples
Example command lines client/server
# ./sfnt-stream (server)
# ./sfnt-stream --affinity 1,1 udp <server_ip> (client)
# taskset -c 1 ./sfnt-stream --affinity="3,5;3" --mcastintf=eth4 udp \
<remote_ip> (client)
Bonded Interfaces: sfnt-stream
The following example configures a single bond, having two slave interfaces, on
each machine. Both client and server machines use eth4 and eth5.
Client Configuration:
[root@client src]# ifconfig eth4 0.0.0.0 down
[root@client src]# ifconfig eth5 0.0.0.0 down
[root@client src]# modprobe bonding miimon=100 mode=1 xmit_hash_policy=layer2 primary=eth5
[root@client src]# ifconfig bond0 up
[root@client src]# echo +eth4 > /sys/class/net/bond0/bonding/slaves
[root@client src]# echo +eth5 > /sys/class/net/bond0/bonding/slaves
[root@client src]# ifconfig bond0 172.16.136.27/21
[root@client src]# onload --profile=latency taskset -c 3 ./sfnt-stream
sfnt-stream: server: waiting for client to connect...
sfnt-stream: server: client connected
sfnt-stream: server: client 0 at 172.16.136.28:45037
Server Configuration:
[root@server src]# ifconfig eth4 0.0.0.0 down
[root@server src]# ifconfig eth5 0.0.0.0 down
[root@server src]# modprobe bonding miimon=100 mode=1 xmit_hash_policy=layer2 primary=eth5
[root@server src]# ifconfig bond0 up
[root@server src]# echo +eth4 > /sys/class/net/bond0/bonding/slaves
[root@server src]# echo +eth5 > /sys/class/net/bond0/bonding/slaves
[root@server src]# ifconfig bond0 172.16.136.28/21
NOTE: server sends to IP address of client bond
[root@server src]# onload --profile=latency taskset -c 1 ./sfnt-stream --mcastintf=bond0 \
--affinity "1,1;3" udp 172.16.136.27
Output Fields
All time measurements are in nanoseconds unless otherwise stated.
Field             Description
mps target        Msg per sec target rate
mps send          Msg per sec actual rate
mps recv          Msg receive rate
latency mean      RTT mean latency
latency min       RTT minimum latency
latency median    RTT median latency
latency max       RTT maximum latency
latency %ile      RTT 99%ile
latency stddev    Standard deviation of sample
latency samples   Number of messages used to calculate latency
                  measurement
sendjit mean      Mean variance when sending messages
sendjit min       Minimum variance when sending messages
sendjit max       Maximum variance when sending messages
sendjit behind    Number of times the sender falls behind and is unable to
                  keep up with the transmit rate
gaps n_gaps       Count the number of gaps appearing in the stream
gaps n_drops      Count the number of drops from stream
gaps n_ooo        Count the number of sequence numbers received out of
                  order
Latency Profile - Spinning
Both sfnt-pingpong and sfnt-stream use scripts found in the onload_apps
subdirectory which invoke the onload latency profile, thereby causing the
application to ‘spin’.
To run these test programs in an interrupt-driven mode, replace the
--profile=latency option on the command line with the --no-app-handler option.
G onload_tcpdump
G.1 Introduction
By definition, Onload is a kernel-bypass technology and this prevents packets from
being captured by packet-sniffing applications such as tcpdump, netstat and
wireshark.
Onload provides the onload_tcpdump application, which captures packets
from Onload stacks to a file or displays them on standard out (stdout). Packet
capture files produced by onload_tcpdump can then be imported into the regular
tcpdump, wireshark or other third-party application where users can take advantage
of dedicated search and analysis features.
onload_tcpdump allows for the capture of all TCP and UDP unicast and multicast
data sent or received via Onload stacks, including shared stacks.
G.2 Building onload_tcpdump
The onload_tcpdump script is supplied with the Onload distribution and is located
in the Onload-<version>/scripts subdirectory.
NOTE: libpcap and libpcap-devel must be built and installed before Onload is
installed.
G.3 Using onload_tcpdump
For help use the ./onload_tcpdump -h command:
Usage:
onload_tcpdump [-o stack-(id|name) [-o stack ...]]
tcpdump_options_and_parameters
"man tcpdump" for details on tcpdump parameters.
You may use stack id number or shell-like pattern for the stack name
to specify the Onload stacks to listen on.
If you do not specify stacks, onload_tcpdump will monitor all onload
stacks.
If you do not specify interface via -i option, onload_tcpdump
listens on ALL interfaces instead of the first one.
For further information refer to the Linux man tcpdump pages.
Examples
• Capture all accelerated traffic from eth2 to a file called mycaps.pcap:
# onload_tcpdump -i eth2 -w mycaps.pcap
• If no file is specified onload_tcpdump will direct output to stdout:
# onload_tcpdump -i eth2
• To capture accelerated traffic for a specific Onload stack (by name):
# onload_tcpdump -i eth4 -o stackname
• To capture accelerated traffic for a specific Onload stack (by ID):
# onload_tcpdump -o 7
• To capture accelerated traffic for Onload stacks where name begins with “abc”:
# onload_tcpdump -o 'abc*'
• To capture accelerated traffic for onload stack 1, stack named “stack2” and all
onload stacks with name beginning with “ab”:
# onload_tcpdump -o 1 -o 'stack2' -o 'ab*'
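The -o option accepts a shell-like pattern for the stack name. The sketch below shows how such a pattern selects names, using the shell's own case matching. It is an illustration only: the function match_stacks and the stack names are made up, and this is not a description of how onload_tcpdump is implemented.

```shell
#!/bin/sh
# Hypothetical illustration: print the names from a list that match a
# shell-like pattern such as 'ab*'.
match_stacks() {
    pattern=$1
    shift
    for name in "$@"; do
        case $name in
            $pattern) echo "$name" ;;   # unquoted, so '*' acts as a wildcard
        esac
    done
}

# Made-up stack names; only those beginning with "ab" are printed.
match_stacks 'ab*' abc1 stack2 abx zzz
```

Single quotes around the pattern on the command line stop the invoking shell from expanding '*' against file names in the current directory before the tool sees it.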
Dependencies
The onload_tcpdump application requires libpcap and libpcap-devel to be
installed on the server. If libpcap is not installed the following message is reported
when onload_tcpdump is invoked:
./onload_tcpdump
ci Onload was compiled without libpcap development package installed. You
need to install libpcap-devel or libpcap-dev package to run
onload_tcpdump.
tcpdump: truncated dump file; tried to read 24 file header bytes, only got
0
Hangup
If libpcap is missing it can be downloaded from http://www.tcpdump.org/
Untar the compressed file on the target server and follow build instructions in the
INSTALL.txt file. The libpcap package must be installed before Onload is built and
installed.
Limitations
• Currently onload_tcpdump captures only packets from onload stacks and not
from kernel stacks.
• The onload_tcpdump application monitors stack creation events and will
attach to newly created stacks. However, there is a short period (normally only
a few milliseconds) between stack creation and the attachment during which
packets sent/received will not be captured.
Known Issues
Users may notice that packets sent when the destination address is not in the
host ARP table appear in both onload_tcpdump and (Linux)
tcpdump.
SolarCapture
Solarflare’s SolarCapture is a packet capture application for Solarflare network
adapters. It is able to capture received packets from the wire at line rate, assigning
accurate timestamps to each packet. Packets are captured to PCAP file or forwarded
to user-supplied logic for processing. For details see the SolarCapture User Guide
(SF-108469-CD) available from https://support.solarflare.com/.
H ef_vi
The Solarflare ef_vi API is a layer 2 API that grants an application direct access to the
Solarflare network adapter datapath to deliver lower latency and reduced
per-message processing overheads. ef_vi is the internal API used by Onload for sending
and receiving packets. It can be used directly by applications that want the very
lowest latency send and receive API and that do not require a POSIX socket
interface.
• ef_vi is packaged with the Onload distribution.
• ef_vi is an OSI level 2 interface which sends and receives raw Ethernet frames.
• ef_vi supports a zero-copy interface because the user process has direct access
to memory buffers used by the hardware to receive and transmit data.
• An application can use both ef_vi and Onload at the same time. For example,
use ef_vi to receive UDP market data and Onload sockets for TCP connections
for trading.
• The ef_vi API can deliver lower latency than Onload and incurs reduced
per-message overheads.
• ef_vi is free software distributed under an LGPL license.
• A user application that uses the layer 2 ef_vi API must itself implement the
higher layer protocols.
H.1 Components
All components required to build and link a user application with the Solarflare ef_vi
API are distributed with Onload. When Onload is installed all required directories/
files are located under the Onload distribution directory.
H.2 Compiling and Linking
Refer to the README.ef_vi file in the Onload directory for compile and link
instructions.
H.3 Documentation
The ef_vi documentation is distributed in doxygen format with the Onload
distribution. Documents in HTML and RTF format are generated by running doxygen
in the following directory:
cd openonload-<version>/src/include/etherfabric/doxygen
doxygen doxyfile_ef_vi
Documents are generated in the HTML and RTF subdirectories.
The ef_vi user guide is also available in PDF format (SF-114063-CD) from the
Solarflare download site.
I onload_iptables
I.1 Description
The Linux netfilter iptables feature provides filtering based on user-configurable
rules with the aim of managing access to network devices and preventing
unauthorized or malicious passage of network traffic. Packets delivered to an
application via the Onload accelerated path are not visible to the OS kernel and, as
a result, these packets are not visible to the kernel firewall (iptables).
The onload_iptables feature allows the user to configure rules which determine
which hardware filters Onload is permitted to insert on the adapter and therefore
which connections and sockets can bypass the kernel and, as a consequence, bypass
iptables.
The onload_iptables command can convert a snapshot[1] copy of the kernel iptables
rules into Onload firewall rules used to determine if sockets, created by an Onloaded
process, are retained by Onload or handed off to the kernel network stack.
Additionally, user-defined filter rules can be added to the Onload firewall on a
per-interface basis. The Onload firewall applies to the receive filter path only.
I.2 How it works
Before Onload accelerates a socket it first checks the Onload firewall module. If the
firewall module indicates the acceleration of the socket would violate a firewall rule,
the acceleration request is denied and the socket is handed off to the kernel.
Network traffic sent or received on the socket is not accelerated.
Onload firewall rules are parsed in ascending numerical order. The first rule to match
the newly created socket, which may indicate to accelerate or decelerate the socket,
is selected and no further rules are parsed.
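The first-match behaviour described above can be sketched in shell as follows. This is a simplified model of the rule ordering only, not Onload's implementation: the rule table is a hypothetical list of "number low-high ACTION" lines, matching is done on a single port value, and the function name first_match is made up for the example.

```shell
#!/bin/sh
# Model only: each rule is "<rule-number> <low>-<high> <ACTION>".
# Rules are visited in ascending rule-number order and evaluation stops
# at the first rule whose port range contains the given port.
first_match() {
    port=$1
    printf '%s\n' "$2" | sort -n | while read -r num range action; do
        low=${range%-*}
        high=${range#*-}
        if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
            echo "rule=$num action=$action"
            break               # no further rules are considered
        fi
    done
}

# Hypothetical rule table, deliberately listed out of order.
rules='2 80-88 DECELERATE
0 5201-5201 DECELERATE
1 5000-6000 ACCELERATE'

first_match 5201 "$rules"   # rule 0 wins even though rule 1 also matches
first_match 5500 "$rules"
```

Because evaluation stops at the first match, a broad rule with a low number hides any narrower rule that follows it, so the order in which rules are numbered matters. The sketch prints nothing when no rule matches.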
If the Onload firewall rules are an exact copy of the kernel iptables, i.e. with no
additional rules added by the Onload user, then a socket handed off to the kernel
because of an iptables rule violation will be unable to receive data through either
path.
Changing rules using onload_iptables will not interrupt existing network
connections.
NOTE: Onload firewall rules will not persist over network driver restarts.
1. Subsequent changes to kernel iptables will not be reflected in the Onload firewall.
NOTE: The onload_iptables “IP rules” will only block hardware IP filters from being
inserted and onload_iptables “MAC rules” will only block hardware MAC filters
from being inserted. Therefore it is possible that if a rule is inserted to block a MAC
address, the user is still able to accept traffic from the specified host by Onload
inserting an appropriate IP hardware filter.
Files
When the Onload drivers are loaded, firewall rules exist in the Linux proc pseudo
file system at:
/proc/driver/sfc_resource
Within this directory the firewall_add, firewall_del and resources files will be
present. These files are writable only by a root user. No attempt should be made to
remove these files.
Once rules have been created for a particular interface, and only while these rules
exist, a separate directory exists which contains the current firewall rules for the
interface:
/proc/driver/sfc_resource/ethN/firewall_rules
I.3 Features
To get help:
# onload_iptables -h
I.4 Rules
The general format of the rule is:
[rule=n] if=ethN protocol=(ip|tcp|udp) [local_ip=a.b.c.d[/mask]]
[remote_ip=a.b.c.d[/mask]] [local_port=a[-b]] [remote_port=a[-b]] [vlan=n]
action=(ACCELERATE|DECELERATE)
NOTE: Using the IP address rule form, the vlan identifier is effective only when using
a Solarflare SFN7000 series adapter which is configured to use the full-featured
firmware variant. On other Solarflare adapters the vlan identifier is ignored. The
vlan identifier can only be specified with the vlan=n syntax and not on the interface.
[rule=n] if=ethN protocol=eth mac=xx:xx:xx:xx:xx:xx[/FF:FF:FF:FF:FF:FF]
[vlan=n] action=(ACCELERATE|DECELERATE)
NOTE: Using the MAC address rule form, the vlan identifier is effective when
specified for any Solarflare adapter.
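As an illustration of the IP address rule form, the following shell sketch assembles a rule string from its parts and rejects protocols and actions outside the sets given in the general format. The helper name make_ip_rule is hypothetical and is not supplied with Onload; it only prints a string and does not write to any firewall file. It also assumes that a port range is written as low-high.

```shell
#!/bin/sh
# Hypothetical helper: build an IP-form rule string.
# Usage: make_ip_rule <interface> <protocol> <action> <criterion> [criterion ...]
make_ip_rule() {
    ifname=$1
    proto=$2
    action=$3
    shift 3
    case $proto in
        ip|tcp|udp) ;;
        *) echo "unsupported protocol: $proto" >&2; return 1 ;;
    esac
    case $action in
        ACCELERATE|DECELERATE) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    # The sketch insists on at least one match criterion (e.g. local_port=...).
    [ $# -ge 1 ] || { echo "no match criterion given" >&2; return 1; }
    rule="if=$ifname protocol=$proto"
    for criterion in "$@"; do
        rule="$rule $criterion"
    done
    echo "$rule action=$action"
}

make_ip_rule eth3 udp DECELERATE local_port=7330-7340
```

Keeping each key=value pair free of embedded spaces mirrors the rule syntax, in which fields are separated by spaces.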
I.5 Preview firewall rules
Before creating the Onload firewall, run the onload_iptables -v option to identify
which rules will be adopted by the firewall and which will be rejected (a reason is
given for rejection):
# onload_iptables -v
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5201
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=5201-5201
remote_ip=0.0.0.0/0 remote_port=0-65535 action=DECELERATE
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5201
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=5201-5201
remote_ip=0.0.0.0/0 remote_port=0-65535 action=DECELERATE
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpts:80:88
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=80-88
remote_ip=0.0.0.0/0 remote_port=0-65535 action=
tcp -- 0.0.0.0/0 0.0.0.0/0 tcp spt:800
=> Error parsing: Insuffcient arguments in rule.
The last rule is rejected because the action is missing.
NOTE: The -v option does not create firewall rules for any Solarflare interface, but
allows the user to preview which Linux iptables rules will be accepted and which
will be rejected by Onload.
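The preview output pairs each iptables destination-port match with an Onload local_port range. The shell sketch below reproduces just that one step of the translation as an illustration. It is not the onload_iptables script: the function name port_match_to_range is hypothetical, and it assumes that ranges are written low-high and that a single port becomes a range with equal bounds.

```shell
#!/bin/sh
# Hypothetical sketch: turn an iptables destination-port match
# ("dpt:N" or "dpts:A:B") into local_port range syntax.
port_match_to_range() {
    case $1 in
        dpt:*)
            port=${1#dpt:}
            echo "local_port=$port-$port"          # single port: low == high
            ;;
        dpts:*)
            range=${1#dpts:}
            echo "local_port=${range%%:*}-${range#*:}"
            ;;
        *)
            echo "unrecognised match: $1" >&2
            return 1
            ;;
    esac
}

port_match_to_range dpt:5201
port_match_to_range dpts:80:88
```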
To convert Linux iptables to Onload firewall rules
The Linux iptables can be applied to all or individual Solarflare interfaces.
Onload iptables are only applied to the receive filter path. The user can select the
INPUT CHAIN or a user-defined CHAIN to parse from the iptables. The default CHAIN
is INPUT. To adopt the rules from iptables even though some rules will be rejected,
enter the following command identifying the Solarflare interface the rules should be
applied to:
# onload_iptables -i ethN -c
# onload_iptables -a -c
Running the onload_iptables command will overwrite existing rules in the Onload
firewall when used with the -i (interface) or -a (all interfaces) options.
NOTE: Applying the Linux iptables to a Solarflare interface is optional. The
alternatives are to create user-defined firewall rules per interface or not to apply
any firewall rules per interface (default behavior).
NOTE: onload_iptables will import all rules to the identified interface, even rules
specified on another interface. To avoid importing rules specified on ‘other’
interfaces, use the --use-extended option.
To view rules for a specific interface:
When firewall rules exist for a Solarflare interface, and only while they exist, a
directory for the interface will be created in:
/proc/driver/sfc_resource
Rules for a specific interface will be found in the firewall_rules file e.g.
# cat /proc/driver/sfc_resource/eth3/firewall_rules
if=eth3 rule=0 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 remote_port=0-65535 action=DECELERATE
if=eth3 rule=1 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 remote_port=0-65535 action=DECELERATE
if=eth3 rule=2 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 remote_port=72-72 action=DECELERATE
if=eth3 rule=3 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0 local_port=80-88 remote_port=0-65535 action=DECELERATE
To add a rule for a selected interface:
# echo "rule=4 if=eth3 action=ACCEPT protocol=udp local_port=7330-7340" \
> /proc/driver/sfc_resource/firewall_add
Rules can be inserted into any position in the table and existing rule numbers will be
adjusted to accommodate new rules. If a rule number is not specified the rule will
be appended to the existing rule list.
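The renumbering described above can be modelled with a short shell sketch. It is a model only, under the assumption that rules at and after the insertion point each move up by one; it does not touch /proc, and the function name insert_rule and the rule bodies are hypothetical.

```shell
#!/bin/sh
# Model only: the existing table arrives on stdin, one rule body per line,
# and line N (counting from 0) is rule N. The new rule is placed at the
# requested position, or appended when the position is empty or past the end.
insert_rule() {
    position=$1
    new_rule=$2
    n=0
    inserted=no
    while IFS= read -r rule; do
        if [ "$n" = "$position" ]; then
            echo "rule=$n $new_rule"
            n=$((n + 1))
            inserted=yes
        fi
        echo "rule=$n $rule"        # later rules are shifted up by one
        n=$((n + 1))
    done
    [ "$inserted" = yes ] || echo "rule=$n $new_rule"
}

# Insert a UDP rule at position 1 of a two-rule table.
printf '%s\n' \
    'protocol=tcp local_port=5201-5201 action=DECELERATE' \
    'protocol=tcp local_port=80-88 action=DECELERATE' |
    insert_rule 1 'protocol=udp local_port=7330-7340 action=ACCELERATE'
```

Since rules are evaluated in ascending order and the first match wins, the position chosen for a new rule determines which existing rules it can override.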
NOTE: Errors resulting from the add/delete commands will be displayed in dmesg.
To delete a rule from a selected interface:
To delete a single rule:
# echo "if=eth3 rule=2" > /proc/driver/sfc_resource/firewall_del
To delete all rules:
# echo "eth2 all" > /proc/driver/sfc_resource/firewall_del
When the last rule for an interface has been deleted the interface firewall_rules file
is removed from /proc/driver/sfc_resource. The interface directory will be
removed only when completely empty.
Error Checking
The onload_iptables command does not log errors to stdout. Errors arising from add
or delete commands will be logged in dmesg.
Interface & Port
Onload firewall rules are bound to an interface and not to a physical adapter port. It
is possible to create rules for an interface in a configured/down state.
Virtual/Bonded Interface
On virtual or bonded interfaces firewall rules are only applied and enforced on the
‘real’ interface.
I.6 Error Messages
Error messages relating to onload_iptables operations will appear in dmesg.
Table 9: Error messages for onload_iptables
Error Message                        Description
Internal error                       Internal condition - should not happen.
Unsupported rule                     Internal condition - should not happen.
Out of memory allocating new rule    Memory allocation error.
Seen multiple rule numbers           Only a single rule number can be
                                     specified when adding/deleting rules.
Seen multiple interfaces             Only a single interface can be specified
                                     when adding/deleting rules.
Unable to understand action          The action specified when adding a
                                     rule is not supported. Note that there
                                     should be no spaces i.e.
                                     action=ACCELERATE.
Unable to understand protocol        Non-supported protocol.
Unable to understand remainder of    Non-supported parameters/syntax.
the rule
Failed to understand interface       The interface does not exist. Rules can
                                     be added to an interface that does not
                                     yet exist, but cannot be deleted from
                                     a nonexistent interface.
Failed to remove rule                The rule does not exist.
Error removing table                 Internal condition - should not happen.
Invalid local_ip rule                Invalid address/mask format.
                                     Supported formats:
                                     a.b.c.d
                                     a.b.c.d/n
                                     a.b.c.d/e.f.g.h
                                     where a, b, c, d, e, f, g, h are decimal
                                     in the range 0-255 and n is decimal in
                                     the range 0-32.
Invalid remote_ip rule               Invalid address/mask format.
Invalid rule                         A rule must identify at least an
                                     interface, a protocol, an action and at
                                     least one match criterion.
Invalid mac                          Invalid MAC address/mask format.
                                     Supported formats:
                                     xx:xx:xx:xx:xx:xx
                                     xx:xx:xx:xx:xx:xx/xx:xx:xx:xx:xx:xx
                                     where x is a hex digit.
NOTE: A Linux limitation applicable to the /proc/ filesystem restricts a write
operation to 1024 bytes. When writing to /proc/driver/sfc_resource/
firewall_[add|del] files the user is advised to flush the write between lines which
exceed the 1024 byte limit.
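The address and mask formats that Table 9 lists for local_ip and remote_ip can be checked before a rule is written, which avoids a round trip through dmesg. The following shell sketch is one way to do that. The function names are hypothetical and the check is not part of Onload; it only encodes the three formats and numeric ranges stated in the table.

```shell
#!/bin/sh
# Hypothetical checker for the forms a.b.c.d, a.b.c.d/n and a.b.c.d/e.f.g.h.

# Succeeds if $1 is four dot-separated decimal numbers, each in 0-255.
is_octets() {
    case $1 in
        ''|*[!0-9.]*|*.) return 1 ;;    # empty, stray characters, trailing dot
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1                           # split on dots into $1..$n
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ -n "$octet" ] || return 1
        [ "$octet" -le 255 ] || return 1
    done
}

# Succeeds if $1 is a valid address with an optional /n or /e.f.g.h mask.
is_valid_rule_addr() {
    is_octets "${1%%/*}" || return 1
    case $1 in
        */*) mask=${1#*/} ;;
        *)   return 0 ;;                # plain a.b.c.d
    esac
    case $mask in
        *.*)         is_octets "$mask" ;;       # a.b.c.d/e.f.g.h
        ''|*[!0-9]*) return 1 ;;
        *)           [ "$mask" -le 32 ] ;;      # a.b.c.d/n with n in 0-32
    esac
}

for candidate in 10.0.0.1 10.0.0.0/8 10.0.0.0/255.0.0.0 10.0.0.256 10.0.0.0/33; do
    if is_valid_rule_addr "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: invalid"
    fi
done
```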
J Solarflare efpio Test Application
The openonload distribution includes the command line efpio test application to
measure latency of the Solarflare ef_vi layer 2 API with PIO. The efpio application is
a single-threaded ping/pong. When all iterations are complete the client side will
display the round-trip time.
By default efpio downloads a packet to the adapter at start of day and transmits this
same packet on every iteration of the test. The -c option can be used to test the
latency of ef_vi using PIO to transfer a new transmit packet to the adapter on every
iteration.
With the onload distribution installed efpio will be present in the following
directory:
~/openonload-201310/build/gnu_x86_64/tests/ef_vi
J.1 efpio
./efpio --help
usage:
efpio [options] <ping|pong> <interface>
<local-ip-intf> <local-port>
<remote-mac> <remote-ip-intf> <remote-port>
options:
-n <iterations>    - set number of iterations
-s <message-size>  - set udp payload size
-w                 - sleep instead of busy wait
-v                 - use a VF
-p                 - physical address mode
-t                 - disable TX push
-c                 - copy on critical path
Table 10: efpio Options
Parameter        Description
interface        the local interface to use e.g. eth2
local-ip-intf    local interface IP address/host name
local-port       local interface IP port number to use
remote-mac       MAC address of the remote interface
remote-ip-intf   remote server IP address/host name
remote-port      remote server port number
To run efpio
efpio must be started on the server (pong side) before the client (ping side) is
run. Command line examples are shown below.
1 On the server side (server1)
taskset -c <M> ./efpio pong eth<N> <local-ip> 8001 <server2-mac>
<server2-ip> 8001
# ef_vi_version_str: 2013067122preview2
# udp payload len: 28
# iterations: 100000
# frame len: 70
2 On the client side (server2)
taskset -c <M> ./efpio ping eth<N> <local-ip> 8001 <server1-mac>
<server1-ip> 8001
# ef_vi_version_str: 2013067122preview2
# udp payload len: 28
# iterations: 100000
# frame len: 70
round trip time: 2.848 µs
M = cpu core, N = Solarflare adapter interface.
