Onload User Guide SF 104474 CD 22


Issue 22 © Solarflare Communications 2017
Onload User Guide
Copyright © 2017 SOLARFLARE Communications, Inc. All rights reserved.
The software and hardware as applicable (the “Product”) described in this document, and this document, are protected by
copyright laws, patents and other intellectual property laws and international treaties. The Product described in this document is
provided pursuant to a license agreement, evaluation agreement and/or non-disclosure agreement. The Product may be used only
in accordance with the terms of such agreement. The software as applicable may be copied only in accordance with the terms of
such agreement.
Onload is licensed under the GNU General Public License (Version 2, June 1991). See the LICENSE file in the distribution for details.
The Onload Extensions Stub Library is Copyright licensed under the BSD 2-Clause License.
Onload contains algorithms and uses hardware interface techniques which are subject to Solarflare Communications Inc patent
applications. Parties interested in licensing Solarflare's IP are encouraged to contact Solarflare's Intellectual Property Licensing
Group at:
Director of Intellectual Property Licensing
Intellectual Property Licensing Group
Solarflare Communications Inc,
7505 Irvine Center Drive
Suite 100
Irvine, California 92618
You will not disclose to a third party the results of any performance tests carried out using Onload or EnterpriseOnload without
the prior written consent of Solarflare.
The furnishing of this document to you does not give you any rights or licenses, express or implied, by estoppel or otherwise, with
respect to any such Product, or any copyrights, patents or other intellectual property rights covering such Product, and this
document does not contain or represent any commitment of any kind on the part of SOLARFLARE Communications, Inc. or its
affiliates.
The only warranties granted by SOLARFLARE Communications, Inc. or its affiliates in connection with the Product described in this
document are those expressly set forth in the license agreement, evaluation agreement and/or non-disclosure agreement
pursuant to which the Product is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, NEITHER SOLARFLARE
COMMUNICATIONS, INC. NOR ITS AFFILIATES MAKE ANY REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED)
REGARDING THE PRODUCT OR THIS DOCUMENTATION AND HEREBY DISCLAIM ALL IMPLIED WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT, AND ANY WARRANTIES THAT MAY ARISE FROM COURSE OF
DEALING, COURSE OF PERFORMANCE OR USAGE OF TRADE. Unless otherwise expressly set forth in such agreement, to the extent
allowed by applicable law (a) in no event shall SOLARFLARE Communications, Inc. or its affiliates have any liability under any legal
theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or
consequential damages, even if advised of the possibility of such damages; and (b) the total liability of SOLARFLARE
Communications, Inc. or its affiliates arising from or relating to such agreement or the use of this document shall not exceed the
amount received by SOLARFLARE Communications, Inc. or its affiliates for that copy of the Product or this document which is the
subject of such liability.
The Product is not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility
applications.
A list of patents associated with this product can be found at: http://www.solarflare.com/patent
SF-104474-CD Last Revised: February 2017
Issue 22
Trademark
OpenOnload® and EnterpriseOnload® are registered trademarks of Solarflare Communications Inc
in the United States and other countries.
Table of Contents
1 What’s New
1.1 New features in OpenOnload 201606-u1
1.2 New features in OpenOnload 201606
1.3 Change history
2 Low Latency Quickstart Guide
2.1 Introduction
2.2 Software Installation
2.3 Test Setup
2.4 Latency
2.5 Testing Without Onload
2.6 Further Information
3 Background
3.1 Introduction
4 Installation
4.1 Introduction
4.2 Onload Distributions
4.3 Hardware and Software Supported Platforms
4.4 Onload and the Network Adapter Driver
4.5 Removing Previously Installed Drivers
4.6 Migrating Between Onload Versions - Upgrade/Downgrade
4.7 EnterpriseOnload - Build and Install from SRPM
4.8 EnterpriseOnload - Debian Source Packages
4.9 OpenOnload DKMS Installation
4.10 Build OpenOnload Source RPM
4.11 OpenOnload - Installation
4.12 Onload Kernel Modules
4.13 Configuring the Network Interfaces
4.14 Installing Netperf
4.15 How to run Onload
4.16 Testing the Onload Installation
4.17 Apply an Onload Patch
5 Tuning Onload
5.1 Introduction
5.2 System Tuning
5.3 Standard Tuning
5.4 Onload Deployment on NUMA Systems
5.5 Interrupt Handling - Kernel Driver
5.6 Performance Jitter
5.7 Advanced Tuning
6 Onload Functionality
6.1 Onload Transparency
6.2 Onload Stacks
6.3 Virtual Network Interface (VNIC)
6.4 Functional Overview
6.5 Onload with Mixed Network Adapters
6.6 Maximum Number of Network Interfaces
6.7 Whitelist and Blacklist Interfaces
6.8 Onloaded PIDs
6.9 Onload and File Descriptors, Stacks and Sockets
6.10 System calls intercepted by Onload
6.11 Linux Sysctls
6.12 Changing Onload Control Plane Table Sizes
6.13 SO_BINDTODEVICE
6.14 Multiplexed I/O
6.15 Wire Order Delivery
6.16 Stack Sharing
6.17 Application Clustering
6.18 Bonding, Link aggregation and Failover
6.19 Teaming
6.20 VLANS
6.21 Accelerated pipe()
6.22 Zero-Copy API
6.23 Debug and Logging
7 Timestamps
7.1 Introduction
7.2 Software Timestamps
7.3 Hardware Timestamps
7.4 Timestamping - Example Applications
8 Onload - TCP
8.1 TCP Operation
8.2 TCP Handshake - SYN, SYNACK
8.3 TCP SYN Cookies
8.4 TCP Socket Options
8.5 TCP Level Options
8.6 TCP File Descriptor Control
8.7 TCP Congestion Control
8.8 TCP SACK
8.9 TCP QUICKACK
8.10 TCP Delayed ACK
8.11 TCP Dynamic ACK
8.12 TCP Loopback Acceleration
8.13 TCP Striping
8.14 TCP Connection Reset on RTO
8.15 ONLOAD_MSG_WARM
8.16 Listen/Accept Sockets
8.17 Socket Caching
8.18 Scalable Filters
8.19 Transparent Reverse Proxy Modes
8.20 Transparent Reverse Proxy on Multiple CPUs
9 Onload - UDP
9.1 UDP Operation
9.2 Socket Options
9.3 Source Specific Socket Options
9.4 Onload Sockets vs. Kernel Sockets
9.5 UDP Sockets - Send and Receive Paths
9.6 Fragmented UDP
9.7 User Level recvmmsg for UDP
9.8 User Level sendmmsg for UDP
9.9 UDP sendfile
9.10 Multicast Replication
9.11 Multicast Operation and Stack Sharing
9.12 Multicast Loopback
9.13 Hardware Multicast Loopback
9.14 IP_MULTICAST_ALL
10 Packet Buffers
10.1 Introduction
10.2 Network Adapter Buffer Table Mode
10.3 Large Buffer Table Support
10.4 Scalable Packet Buffer Mode
10.5 Allocating Huge Pages
10.6 How Packet Buffers Are Used by Onload
10.7 Configuring Scalable Packet Buffers
10.8 Physical Addressing Mode
10.9 Programmed I/O
11 Onload and Virtualization
11.1 Introduction
11.2 Overview
11.3 Onload and Linux KVM
11.4 Onload and NIC Partitioning
11.5 Onload in a Docker Container
11.6 Pre-Installation
11.7 Installation
11.8 Create Onload Docker Image
11.9 Migration
11.10 Copying Files Between Host and Container
12 Limitations
12.1 Introduction
12.2 Changes to Behavior
12.3 Limits to Acceleration
12.4 epoll - Known Issues
12.5 Configuration Issues
13 Change History
13.1 Mapping EnterpriseOnload/OpenOnload
13.2 Onload - Adapter Net Drivers
13.3 Features
13.4 Environment Variables
13.5 Module Options
13.6 Onload - Adapter Net Drivers
A Parameter Reference
A.1 Parameter List
B Meta Options
B.1 Environment variables
C Build Dependencies
C.1 General
D Onload Extensions API
D.1 Source Code
D.2 Java Native Interface - Wrapper
D.3 Common Components
D.4 Stacks API
D.5 Stacks API Usage
D.6 Stacks API - Examples
D.7 Zero-Copy API
D.8 Templated Sends
D.9 Delegated Sends API
E onload_stackdump
E.1 Introduction
E.2 General Use
F Solarflare sfnettest
F.1 Introduction
G onload_tcpdump
G.1 Introduction
G.2 Building onload_tcpdump
G.3 Using onload_tcpdump
H ef_vi
H.1 Components
H.2 Compiling and Linking
H.3 Documentation
I onload_iptables
I.1 Description
I.2 How it works
I.3 Features
I.4 Rules
I.5 Preview firewall rules
I.6 Error Messages
J Solarflare eflatency Test Application
J.1 eflatency
1 What’s New
This issue of the user guide identifies changes introduced in the OpenOnload®
update release, 201606-u1, and EnterpriseOnload® release 5.0.
Refer to Change History on page 144 to confirm feature availability in the Enterprise
release.
For a complete list of features and enhancements refer to the Release Notes and the
Release Change Log available from: http://www.openonload.org/download.html.
The changes and improvements in Onload 201606 and EnterpriseOnload 5.0 include
support for the new SFN8000 series adapters, some additional configuration
options, and extensions to the ef_vi API. See below for details.
OpenOnload 201606-u1 and EnterpriseOnload 5.0 include the 4.10.0.1011 net
driver.
Users should refer to ReleaseNotes-sfc in the distribution package for details of
changes to the adapter driver. Many of the new features require firmware version
4.7 or later.
1.1 New features in OpenOnload 201606-u1
TCPDirect API
The onload-201606-u1 release and EnterpriseOnload 5.0 distribution include the
full version of Solarflare’s TCPDirect ultra-low latency API. For details of changes
since the preview release users should refer to the release notes. Users of the
Solarflare SFN7000 and SFN8000 series adapters can obtain an AppFlex license for
this feature.
Onload Extensions API
A new function, onload_socket_nonaccel(), allows an Onload application to
allocate sockets which are not accelerated by Onload.
Extensions API details can be found in the /openonload-<version>/src/
include/onload/extensions.h file, and in the Onload distribution Release Notes
and Change Logs.
Configuration Options
• EF_WODA_SINGLE_INTERFACE removes the constraint on ordering being
correct across interfaces when using onload_ordered_epoll_wait().
Latency can be improved when this option is enabled because traffic is ordered
only with respect to other traffic on the same interface.
• EF_TCP_SHARED_LOCAL_PORTS improves the performance of TCP active-open
connections by reducing the cost of blocking and non-blocking connect()
calls.
• EF_ONLOAD_FD_BASE identifies a base file descriptor value that Onload will use
for internal-use file descriptors, thereby reducing the risk of applications
running into Onload file descriptor space.
1.2 New features in OpenOnload 201606
Support for SFN8000 series adapters
This release adds full support for the new range of Solarflare Flareon Ultra SFN8000
series adapters. Users interested in learning more about the improved performance
and features available can contact sales@solarflare.com.
New configuration options
Some new configuration options have been added:
• EF_TCP_LISTEN_REPLIES_BACK forces the reply to an incoming SYN to ignore
routes and reply to the originating network interface. See
EF_TCP_LISTEN_REPLIES_BACK on page 199.
• EF_HIGH_THROUGHPUT_MODE optimizes for throughput at the cost of latency.
See EF_HIGH_THROUGHPUT_MODE on page 171.
TCPDirect
The onload-201606 release contains a preview of Solarflare’s new TCPDirect ultra-
low latency kernel bypass solution. TCPDirect is a new lightweight TCP/IP stack from
Solarflare that exposes an API architected and dedicated for minimal latency.
The preview package includes TCP and UDP latency benchmarks programmed to the
TCPDirect API to provide an indication of expected performance (see TCPDirect
Latency on page 10).
The production version of TCPDirect is available in onload-201606-u1 and
EnterpriseOnload 5.0.
ef_vi
ef_vi is the Onload layer 2 API. Refer to Appendix H for details of ef_vi.
This release extends the ef_vi API. These changes are summarized below. Further
details are available in the ef_vi User Guide, supplied as Doxygen source embedded
in the header files, and also as a pre-built PDF file.
TX alternatives API
This release adds a new method for sending data to the ef_vi API. Using the
ef_vi_transmit_alt*() functions, a set of alternative packets can be queued in
the adapter ready to be sent. Once the decision about which alternative to send is
made, the correct packet is released onto the network with extremely low latency.
This feature is supported on SFN8000 series adapters.
Capabilities API
This release adds support for an ef_vi API that allows users to query at runtime
what features and capabilities are available on the hardware as it is currently
configured. This allows applications to be built once, then run on a variety of
Solarflare adapters and tailor the feature set to the current hardware capabilities.
IPv6 filtering
This release adds support for IPv6 filters in ef_vi.
This feature is supported on SFN7000 series and SFN8000 series adapters.
onload_thread_get_spin extension API
This release adds a new extensions API call to query per-thread spin settings, to
complement the existing onload_thread_set_spin().
onload_thread_get_spin() retrieves the spin settings previously configured via
onload_thread_set_spin().
Refer to src/include/onload/extensions.h for the API documentation. Details
are also documented in onload_thread_get_spin on page 226.
sfc_resource /proc file name change
To avoid a naming collision when interfaces are renamed, the files that used to be at
/proc/driver/sfc_resource/ethX/pdY have moved to /proc/driver/
sfc_resource/devices/0000:0Z.00.0/pdY
New onload_cplane module
This release of Onload adds a new binary module named onload_cplane.
This new module provides some of the module parameters that were previously
handled by other Solarflare modules. It also provides some new parameters:
• max_local_addrs controls the size of the IP address table.
• cplane_debug_bits controls the level of debug logging produced.
A number of /proc files have been changed, typically by moving them from the
/proc/driver/onload directory to the new /proc/driver/onload_cplane
directory. See the Release Notes for full details.
Some build-time configuration options have been effectively removed as they are
set at build time of the onload_cplane module:
• CI_CFG_MAX_REGISTER_INTERFACES
• CI_CFG_TEAMING
sfc_aoe module
The sfc_aoe module is no longer included in the Onload distribution. In future it
will be distributed separately to allow independent releases of these two
products. If this causes issues please contact support@solarflare.com.
1.3 Change history
The Change History section is updated with every revision of this document to
include the latest Onload features, changes or additions to environment variables
and changes or additions to Onload module options. Refer to Change History on
page 144.
2 Low Latency Quickstart Guide
2.1 Introduction
This section demonstrates how to achieve very low latency coupled with minimum
jitter on a system fitted with the Solarflare SFN8000 series network adapter and
using Solarflare’s kernel-bypass network acceleration middleware, OpenOnload.
The procedure will focus on the performance of the network adapter for TCP and
UDP applications running on Linux using the industry-standard Netperf network
benchmark application and the Solarflare-supplied open source sfnettest network
benchmark test tools.
NOTE: Please read the Solarflare LICENSE file regarding the disclosure of
benchmark test results.
2.2 Software Installation
Before running low latency benchmark tests, ensure that the correct driver and
firmware versions are installed. The minimum driver and firmware versions are
shown in this example:
[root@server-N]# ethtool -i <interface>
driver: sfc
version: 4.10.0.1011
firmware-version: 6.2.0.1016 rx1 tx1
Firmware Variant
On SFN7000 and SFN8000 series adapters, the adapter should use the ultra-low
latency firmware variant as indicated by the presence of rx1 tx1 as shown above.
Firmware variants are selected with the sfboot utility from the Solarflare Linux
Utilities package (SF-107601-LS).
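The firmware variant check above can be scripted. The following is a minimal sketch, assuming the ethtool output has the layout shown in the example; the function name has_ull_firmware is hypothetical and is not part of any Solarflare tool.

```shell
# has_ull_firmware: reads `ethtool -i <interface>` output on stdin and
# succeeds (exit status 0) only if the firmware-version line carries the
# "rx1 tx1" marker that the guide associates with the ultra-low latency
# firmware variant. The function name is illustrative only.
has_ull_firmware() {
    grep -Eq '^firmware-version:.*rx1[[:space:]]*tx1'
}

# Usage on a live system (eth2 is a placeholder interface name):
#   ethtool -i eth2 | has_ull_firmware && echo "ultra-low latency variant"
```

The function only inspects text, so it can be tried against saved ethtool output before running it on a test host.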
Netperf
Netperf can be downloaded from http://www.netperf.org/netperf/
Unpack the compressed tar file using the tar command:
# tar -zxvf netperf-<version>.tar.gz
This will create a subdirectory called netperf-<version> from which the
configure and make commands can be run (as root):
./configure
make install
Following installation the netperf and netserver applications are located in the
src subdirectory.
Solarflare sfnettest
Download the sfnettest-<version>.tgz source file from www.openonload.org
Unpack the tar file using the tar command:
# tar -zxvf sfnettest-<version>.tgz
Run the make utility from the sfnettest-<version>/src subdirectory to build the
sfnt-pingpong and other test applications.
Solarflare Onload
Before Onload network and kernel drivers can be built and installed, the system must
support a build environment capable of compiling kernel modules. Refer to Build
Dependencies on page 219 for more details.
Download the openonload-<version>.tgz file from www.openonload.org
Unpack the tar file using the tar command:
# tar -zxvf openonload-<version>.tgz
Run the onload_install command from the openonload-<version>/scripts
subdirectory:
# ./onload_install
Refer to Driver Loading - NUMA Node on page 36 to ensure that drivers are
affinitized to a core on the correct NUMA node.
2.3 Test Setup
The diagram below identifies the required physical configuration of two servers
equipped with Solarflare network adapters connected back-to-back. If required,
tests can be repeated with a 10G switch on the link to measure the additional
latency delta using a particular switch.
Requirements:
• Two servers are equipped with Solarflare network adapters and connected
with a single cable between the Solarflare interfaces.
• The Solarflare interfaces are configured with an IP address so that traffic can
pass between them. Use ping to verify connection.
• Onload, netperf and sfnettest are installed on both machines.
[Figure: two systems under test connected by a 10G link (direct attach or optical)]
Pre-Test Configuration
The following configuration options are applicable to RHEL7 systems. On both
machines:
1. Add the following options to the kernel config line in /boot/grub/grub.conf:
isolcpus=<comma separated cpu list> nohz=off iommu=off intel_iommu=off
mce=ignore_ce nmi_watchdog=0
2. Stop the following services on the server:
systemctl stop cpupower
systemctl stop cpuspeed
systemctl stop cpufreqd
systemctl stop powerd
systemctl stop irqbalance
systemctl stop firewalld
3. Allocate huge pages. Refer to Allocating Huge Pages on page 109.
4. On a NUMA-aware system, latency can be affected unless consideration is
given to the selection of the NUMA node. Refer to Onload Deployment on
NUMA Systems on page 36.
5. Disable interrupt moderation.
# ethtool -C <interface> rx-usecs 0 adaptive-rx off
6. Enable PIO in the Onload environment.
EF_PIO=1
7. Refer to the Reference System Specification below for BIOS features.
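Steps 2 and 5 above are the same on both machines, so it may help to generate the commands once and review them before running them. The following is a sketch only: the function name pretest_commands is hypothetical, and it prints the commands from the list above rather than executing them.

```shell
# pretest_commands: prints the service-stop commands from step 2 and the
# interrupt moderation command from step 5 for the given interface.
# Nothing is executed; the output is meant to be reviewed first.
# The function name is illustrative only.
pretest_commands() {
    iface=$1
    for svc in cpupower cpuspeed cpufreqd powerd irqbalance firewalld; do
        printf 'systemctl stop %s\n' "$svc"
    done
    printf 'ethtool -C %s rx-usecs 0 adaptive-rx off\n' "$iface"
}

# Usage (eth2 is a placeholder interface name):
#   pretest_commands eth2          # review the list
#   pretest_commands eth2 | sh     # run it as root once reviewed
```

Some of the listed services may not exist on a given installation, in which case the corresponding systemctl command reports an error and the remaining commands still run.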
Reference System Specification
The following latency measurements were recorded on Intel® Haswell servers. The
specification of the test systems is as follows:
• DELL PowerEdge R630 servers equipped with dual Intel® Xeon® CPU E5-2637 v3
@ 3.50GHz, 2 x 4GB DIMMs.
BIOS: Turbo mode ENABLED, cstates ENABLED, IOMMU DISABLED,
Virtualization DISABLED.
• Red Hat Enterprise Linux V7.3 (x86_64 kernel, version 3.10.0-493.el7.x86_64).
• Solarflare SFN8522 NIC (driver and firmware: see Software Installation).
Direct attach cable at 10G.
• OpenOnload distribution: openonload-201606-u1.
It is expected that similar results will be achieved on any Intel-based, PCIe Gen 3
server or compatible system.
2.4 Latency
Onload Latency
UDP Latency: Netperf
Run the netserver application on system-1:
[system-1]# pkill -f netserver
[system-1]# onload --profile=latency taskset -c 4 ./netserver
Run the netperf application on system-2:
[system-2]# onload --profile=latency taskset -c 4 ./netperf -t UDP_RR -H
<system1-ip> -l 10 -- -r 32
Socket  Size    Request  Resp.   Elapsed  Trans.
Send    Recv    Size     Size    Time     Rate
bytes   Bytes   bytes    bytes   secs.    per sec
212992  212992  32       32      10.00    366588.0
366588 transactions/second means that each transaction takes 1/366588 seconds
resulting in a ½ RTT latency of (1/366588)/2 or 1.36µs.
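The conversion from a netperf transaction rate to ½ RTT latency can be reproduced with a one-line calculation. This is a minimal sketch using awk for the floating-point arithmetic; the function name half_rtt_us is an assumption for illustration.

```shell
# half_rtt_us: converts a netperf request/response transaction rate
# (transactions per second) into a half round-trip latency in
# microseconds, rounded to two decimal places.
# One transaction is one full round trip, so:
#   half RTT = (1 / rate) / 2 seconds
# The function name is illustrative only.
half_rtt_us() {
    awk -v rate="$1" 'BEGIN { printf "%.2f\n", (1 / rate) / 2 * 1000000 }'
}

# Usage with the UDP_RR transaction rate reported above:
#   half_rtt_us 366588
```

The same function applies to any request/response rate that netperf reports, since the arithmetic does not depend on the protocol.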
UDP Latency: sfnt-pingpong
Run the sfnt-pingpong application on both systems:
[system-1]# onload --profile=latency taskset -c 4 ./sfnt-pingpong
[system-2]# onload --profile=latency taskset -c 4 ./sfnt-pingpong --
affinity "4;4" udp <system1-ip>
#  size   mean    min  median    max   %ile  stddev     iter
      0   1374   1317    1346  14280   1415     354  1000000
      1   1374   1311    1346  13346   1467     354  1000000
      2   1373   1318    1345  12355   1403     354  1000000
      4   1372   1320    1345   8866   1396     353  1000000
      8   1373   1317    1345  14856   1424     354  1000000
     16   1373   1316    1344   8129   1460     354  1000000
     32   1403   1340    1373   9458   1451     367  1000000
     64   1435   1377    1405  15179   1475     363  1000000
    128   1513   1454    1481   8704   1562     378  1000000
    256   1664   1599    1630   9668   1794     388  1000000
The output identifies mean, minimum, median and maximum (nanosecond) ½ RTT
latency for increasing UDP message sizes, including the 99th percentile and standard
deviation for these results. A message size of 32 bytes has a mean latency of 1.40µs
with a 99%ile latency of approximately 1.45µs.
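To pull the figures for one message size out of saved sfnt-pingpong output, a small awk filter is enough. This sketch assumes the whitespace-separated column order shown above (size, mean, min, median, max, %ile, stddev, iter); the function name pingpong_latency_us is hypothetical.

```shell
# pingpong_latency_us: reads sfnt-pingpong output on stdin and prints the
# mean and %ile half-RTT latency, in microseconds, for one message size.
# Assumes the column order size, mean, min, median, max, %ile, stddev,
# iter, with latencies in nanoseconds. Header lines start with "#" and
# never match a numeric size. The function name is illustrative only.
pingpong_latency_us() {
    awk -v size="$1" '$1 == size { printf "%.2f %.2f\n", $2 / 1000, $6 / 1000 }'
}

# Usage (results.txt is a placeholder for saved sfnt-pingpong output):
#   pingpong_latency_us 32 < results.txt
```

Keeping the raw nanosecond output in a file and filtering it afterwards avoids rerunning the benchmark when a different message size is of interest.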
TCP Latency: Netperf
Run the netserver application on system-1:
[system-1]# pkill -f netserver
[system-1]# onload --profile=latency taskset -c 4 ./netserver
Run the netperf application on system-2:
[system-2]# onload --profile=latency taskset -c 4 ./netperf -t
TCP_RR -H <system1-ip> -l 10 -- -r 32
Socket  Size    Request  Resp.   Elapsed  Trans.
Send    Recv    Size     Size    Time     Rate
bytes   Bytes   bytes    bytes   secs.    per sec
16384   87380   32       32      10.00    327572.09
327572 transactions/second means that each transaction takes 1/327572 seconds
resulting in a ½ RTT latency of (1/327572)/2 or approximately 1.53µs.
TCP Latency: sfnt-pingpong
Run the sfnt-pingpong application on both systems:
[system-1]# onload --profile=latency taskset -c 4 ./sfnt-pingpong
[system-2]# onload --profile=latency taskset -c 4 ./sfnt-pingpong --
affinity "4;4" tcp <system1-ip>
#  size   mean    min  median    max   %ile  stddev     iter
      1   1494   1448    1478  14249   1630      60  1000000
      2   1494   1449    1477   4342   1630      58  1000000
      4   1496   1448    1479   3615   1632      55  1000000
      8   1500   1454    1484   3225   1636      58  1000000
     16   1508   1460    1491   3905   1645      56  1000000
     32   1524   1477    1507   3453   1660      56  1000000
     64   1575   1526    1558   5067   1715      60  1000000
    128   1653   1604    1637   3563   1794      59  1000000
    256   1793   1740    1776   3995   1932      57  1000000
The output identifies mean, minimum, median and maximum (nanosecond) ½ RTT
latency for increasing TCP message sizes, including the 99th percentile and standard
deviation for these results. A message size of 32 bytes has a mean latency of 1.52µs
with a 99%ile latency of 1.66µs.
TCPDirect Latency
TCPDirect is a feature available for the SFN7000 and SFN8000 series adapters, which
must have an Onload license and a TCPDirect license installed.
The TCPDirect test applications require huge pages to be configured on the test
hosts. For example, to configure 1024 huge pages:
# sysctl -w vm.nr_hugepages=1024
To make this change persistent, update /etc/sysctl.conf. For example:
# echo "vm.nr_hugepages=1024" >> /etc/sysctl.conf
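After setting vm.nr_hugepages it can be useful to confirm how many huge pages the kernel actually allocated, since the request can be only partly satisfied when memory is fragmented. A minimal sketch, assuming the standard HugePages_Total field of /proc/meminfo; the function name hugepages_total is hypothetical.

```shell
# hugepages_total: reads /proc/meminfo-formatted text on stdin and prints
# the number of huge pages currently allocated (the HugePages_Total
# field). The function name is illustrative only.
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }'
}

# Usage on a test host:
#   hugepages_total < /proc/meminfo
```

If the value printed is lower than the value requested, the setting is usually applied more reliably early after boot.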
UDP Latency: zfudppingpong
Run the zfudppingpong application on both systems. It can be found in the
/openonload-<version>/build/gnu_x86_64/tests/zf_apps/static directory.
[system-1]# ZF_ATTR="interface=enp1s0f0" taskset -c 4 ./zfudppingpong -s 1
pong <system1-ip>:20000 <system2-ip>:20000
ZF library initialized
[system-2]# ZF_ATTR="interface=enp1s0f0" taskset -c 4 ./zfudppingpong -s 1
ping <system2-ip>:20000 <system1-ip>:20000
ZF library initialized
mean round-trip time: 2.192 usec
The mean round-trip time of 2.192µsec corresponds to a mean TCPDirect ½ RTT
latency of 1.096µsec for a 1 byte UDP message.
TCP Latency: zftcppingpong
Run the zftcppingpong application on both systems. It can be found in the
/openonload-<version>/build/gnu_x86_64/tests/zf_apps/static directory.
[system-1]# ZF_ATTR="interface=enp1s0f0" taskset -c 4 ./zftcppingpong -s 1
pong <system1-ip>:20000
Waiting for incoming connection
Connection established
[system-2]# ZF_ATTR="interface=enp1s0f0" taskset -c 4 ./zftcppingpong -s 1
ping <system1-ip>:20000
Connecting to ponger
Connection established
mean round-trip time: 2.202 usec
The mean round-trip time of 2.202µsec corresponds to a mean TCPDirect ½ RTT
latency of 1.101µsec for a 1 byte TCP message.
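The TCPDirect test applications report a full round-trip time, so the ½ RTT figures quoted above are that value divided by two. The sketch below extracts and halves it; it assumes the result line reads "mean round-trip time: <value> usec" as in the examples, and the function name half_of_mean_rtt is hypothetical.

```shell
# half_of_mean_rtt: reads test application output on stdin, finds the
# "mean ... time: <value> usec" line and prints half of the value
# (the half-RTT latency in microseconds) to three decimal places.
# The value is taken as the second-to-last field of that line.
# The function name is illustrative only.
half_of_mean_rtt() {
    awk '/^mean .*time:/ { printf "%.3f\n", $(NF - 1) / 2 }'
}

# Usage (ping.log is a placeholder for saved ping-side output):
#   half_of_mean_rtt < ping.log
```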
Layer 2 ef_vi Latency
The eflatency UDP test application, supplied with the openonload package, can be
used to measure latency of the Solarflare ef_vi layer 2 API. eflatency uses the
lowest latency method that is available on the adapter.
[system-1]# taskset -c 4 ./eflatency -s 32 pong enp4s0f0
[system-2]# taskset -c 4 ./eflatency -s 32 ping enp4s0f0
# ef_vi_version_str: 201606-u1
# udp payload len: 32
# iterations: 100000
# warmups: 10000
# frame len: 74
# mode: Alternatives
mean round-trip time: 1.990 usec
The mean round-trip time of 1.990µsec corresponds to a mean ef_vi ½ RTT latency
of 995ns.
NOTE: The TX alternatives mode has been selected for the lowest latency. On a 4-
port adapter, the port mode option must be set to 2x10G or 2x40G. This feature is
not available when used in 4x10G mode. The port mode option can be configured
with the sfboot utility from the Solarflare Linux Utilities package.
Solarflare eflatency Test Application on page 302 describes the eflatency
application and its command line options, and provides example command lines.
2.5 Testing Without Onload
The benchmark performance tests can be run without Onload using the regular
kernel network drivers. To do this, remove the onload --profile=latency part
from the command line.
Core Affinity
Set interrupt affinity such that interrupts and the application are running on
different CPU cores but on the same processor package - examples below.
Use the following command to identify receive queues created for an interface, e.g.:
# cat /proc/interrupts | grep eth2
33: 0 0 0 0 IR-PCI-MSI-edge eth2-0
34: 0 0 0 0 IR-PCI-MSI-edge eth2-1
Direct IRQ 33 to CPU core 0 and IRQ 34 to CPU core 1:
# echo 1 > /proc/irq/33/smp_affinity
# echo 2 > /proc/irq/34/smp_affinity
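The values written to smp_affinity are hexadecimal bitmasks in which bit N selects CPU core N, so core 0 is 1 and core 1 is 2. A minimal sketch for deriving the mask of a single core follows. The function name cpu_mask is hypothetical, and the sketch assumes core numbers below 32 so that the mask fits in one 32-bit group.

```shell
#!/bin/sh
# cpu_mask CORE
# Print the hexadecimal affinity bitmask that selects only CPU core CORE.
# Assumes CORE is below 32 (a single 32-bit mask group).
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

# Example (as root), directing IRQ 34 to CPU core 1:
#   echo "$(cpu_mask 1)" > /proc/irq/34/smp_affinity
```

For example, core 4 yields the mask 10 (hexadecimal), which matches the taskset -c 4 core used in the latency tests.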
Tuned Profile
On RHEL 7 servers, the tuned network-latency profile produces better kernel latency
results:
# tuned-adm profile network-latency
Busy Poll
Enable the kernel "busy poll" feature to disable interrupts and allow polling of the
socket receive queue. The following values are recommended:
# sysctl net.core.busy_poll=50 && sysctl net.core.busy_read=50
Results
Kernel latency has been measured at 2.7 µs with UDP traffic on a 3.11 kernel.
NOTE: Latency will be higher when busy poll is not applied or not supported in the
kernel version. Latency of less than 6 µs can be measured without busy poll on a
standard RHEL 6.4 kernel.
2.6 Further Information
For installation of Solarflare adapters and performance tuning of the network driver
when not using Onload, refer to the Solarflare Server Adapter User Guide
(SF-103837-CD) available from https://support.solarflare.com/
Questions regarding Solarflare products, Onload and this user guide can be emailed
to support@solarflare.com.
3 Background
3.1 Introduction
NOTE: This guide should be read in conjunction with the Solarflare Server Adapter
User's Guide, SF-103837-CD, which describes procedures for hardware and
software installation of Solarflare network interface cards, network device drivers
and related software.
NOTE: Throughout this user guide the term Onload refers to both OpenOnload and
EnterpriseOnload unless otherwise stated.
Onload is the Solarflare accelerated network middleware. It is an implementation of
TCP and UDP over IP which is dynamically linked into the address space of user
mode applications, and granted direct (but safe) access to the network adapter
hardware. The result is that data can be transmitted to and received from the
network directly by the application, without involvement of the operating system.
This technique is known as 'kernel bypass'.
Kernel bypass avoids disruptive events such as system calls, context switches and
interrupts and so increases the efficiency with which a processor can execute
application code. This also directly reduces the host processing overhead, typically
by a factor of two, leaving more CPU time available for application processing. This
effect is most pronounced for applications which are network intensive, such as:
• Market data and trading applications
• Computational fluid dynamics (CFD)
• HPC (High Performance Computing)
• HPMPI (High Performance Message Passing Interface). Onload is compatible
with MPICH1 and 2, HPMPI, OpenMPI and SCALI
• Other physical models which are moderately parallelizable
• High-bandwidth video streaming
• Web caching, load balancing and Memcached applications
• Content Delivery Networks (CDN) and HTTP servers
• Other system hot-spots such as distributed lock managers or forced
serialization points
The Onload library dynamically links with the application at runtime using the
standard BSD sockets API, meaning that no modifications are required to the
application being accelerated. Onload is the first and only product to offer full kernel
bypass for POSIX socket-based applications over TCP/IP and UDP/IP protocols.
Contrasting with Conventional Networking
When using conventional networking, an application calls on the OS kernel to send
and receive data to and from the network. Transitioning from the application to the
kernel is an expensive operation, and can be a significant performance barrier.
When an application accelerated using Onload needs to send or receive data, it
need not access the operating system, but can directly access a partition on the
network adapter. The two schemes are shown in Figure 1.
Figure 1: Contrast with Conventional Networking.
An important feature of the conventional model is that applications do not get
direct access to the networking hardware and so cannot compromise system
integrity. Onload is able to preserve system integrity by partitioning the NIC at the
hardware level into many, protected 'Virtual NICs' (VNIC). An application can be
granted direct access to a VNIC without the ability to access the rest of the system
(including other VNICs or memory that does not belong to the application). Thus
Onload with a Solarflare NIC allows optimum performance without compromising
security or system integrity.
In summary, Onload can significantly reduce network processing overheads.
How Onload Increases Performance
Onload can significantly reduce the costs associated with networking by reducing
CPU overheads and improving performance for latency, bandwidth and application
scalability.
Overhead
Transitioning into and out of the kernel from a user-space application is a relatively
expensive operation: the equivalent of hundreds or thousands of instructions. With
conventional networking such a transition is required every time the application
sends and receives data. With Onload, the TCP/IP processing can be done entirely
within the user process, eliminating expensive application/kernel transitions, i.e.
system calls. In addition, the Onload TCP/IP stack is highly tuned, offering further
overhead savings.
The overhead savings of Onload mean more of the CPU's computing power is
available to the application to do useful work.
Latency
Conventionally, when a server application is ready to process a transaction it calls
into the OS kernel to perform a 'receive' operation, where the kernel puts the calling
thread 'to sleep' until a request arrives from the network. When such a request
arrives, the network hardware 'interrupts' the kernel, which receives the request
and 'wakes' the application.
All of this overhead takes CPU cycles as well as increasing cache and translation
lookaside buffer (TLB) footprint. With Onload, the application can remain at user
level waiting for requests to arrive at the network adapter and process them
directly. The elimination of a kernel-to-user transition, an interrupt, and a
subsequent user-to-kernel transition can significantly reduce latency. In short,
reduced overheads mean reduced latency.
Bandwidth
Because Onload imposes less overhead, it can process more bytes of network traffic
every second. Along with specially tuned buffering and algorithms designed for 10
gigabit networks, Onload allows applications to achieve significantly improved
bandwidth.
Scalability
Modern multi-core systems are capable of running many applications
simultaneously. However, the advantages can be quickly lost when the multiple
cores contend on a single resource, such as locks in a kernel network stack or device
driver. These problems are compounded on modern systems with multiple caches
across many CPU cores and Non-Uniform Memory Architectures.
Onload results in the network adapter being partitioned and each partition being
accessed by an independent copy of the TCP/IP stack. The result is that with Onload,
doubling the cores really can result in doubled throughput as demonstrated by
Figure 2.
Figure 2: Onload Partitioned Network Adapter
Further Information
For detailed information refer to:
Onload Functionality on page 54.
Onload - TCP on page 80.
Onload - UDP on page 98.
Onload and Virtualization on page 120.
4 Installation
4.1 Introduction
This chapter covers the following topics:
Onload Distributions on page 18
Hardware and Software Supported Platforms on page 19
Onload and the Network Adapter Driver on page 20
Removing Previously Installed Drivers on page 20
onload_uninstall Pre-install Notes on page 22
EnterpriseOnload - Build and Install from SRPM on page 23
EnterpriseOnload - Debian Source Packages on page 24
OpenOnload DKMS Installation on page 25
Build OpenOnload Source RPM on page 25
OpenOnload - Installation on page 25
Onload Kernel Modules on page 27
Configuring the Network Interfaces on page 28
Installing Netperf on page 28
Testing the Onload Installation on page 28
Apply an Onload Patch on page 28
4.2 Onload Distributions
Onload is available in two distributions:
• "OpenOnload" is a free version of Onload available from
http://www.openonload.org/ distributed as a source tarball under the GPLv2 license.
OpenOnload is subject to a linear development cycle where major releases
every 3-4 months include the latest development features.
• "EnterpriseOnload" is a commercial enterprise version of Onload distributed as
a source RPM under the GPLv2 license. EnterpriseOnload differs from
OpenOnload in that it is offered as a mature commercial product that is
downstream from OpenOnload, having undergone a comprehensive software
product test cycle resulting in tested, hardened and validated code.
The Solarflare product range offers a flexible and broad range of support options.
Users should consult their reseller for details and refer to the Solarflare Enterprise
Service and Support information at http://www.solarflare.com/EnterpriseServiceSupport.
Onload Licenses
Users are advised to read the following license files in the Onload distribution:
LICENSE
lwIPLICENSE
ONLOADLICENSE
TCPDLICENSE
4.3 Hardware and Software Supported Platforms
• Onload can be run on the following Solarflare adapters:
  - Solarflare Flareon SFN7000 and SFN8000 series Adapters
  - Onload Network Adapters
  - Solarflare mezzanine adapters
  - SFA6902F and SFA7942Q ApplicationOnload™ Engine.
  Refer to the Solarflare Server Adapter User Guide 'Product Specifications' for
  adapter details.
• Onload can run on all Intel and AMD x86 processors, 32-bit and 64-bit platforms.
Table 1 identifies supported operating systems/kernels.
Table 1: OS/Kernel Support
OS Version                                          Notes
Red Hat Enterprise Linux 6.4 - 6.8, and 7.0 - 7.2   RHEL 6 built-in Solarflare drivers
                                                    may not support SFN7000 and
                                                    SFN8000 series adapters.
Red Hat Messaging, Realtime and Grid 2.4, 2.5
SuSE Linux Enterprise Server 11 sp2, sp3, sp4       Built-in Solarflare drivers may
                                                    not support SFN7000 and
                                                    SFN8000 series adapters.
SuSE Linux Enterprise Realtime Extension 11
SuSE Linux Enterprise Server 12 base and sp1
Canonical Ubuntu Server LTS 14.04, 16.04
Canonical Ubuntu Server 16.10
Whilst the Onload QA test cycle predominantly focuses on the Linux OS versions
documented above, Solarflare are not aware of any issues preventing Onload
installation on other Linux variants such as CentOS and Fedora, although these are
not formally supported. Some versions of Ubuntu and Debian earlier than those
listed above are also known to support Onload.
4.4 Onload and the Network Adapter Driver
The Solarflare network adapter driver, the "net driver", is generally available from
three sources:
• Download as source RPM from support.solarflare.com.
• Packaged 'in-box' in many Linux distributions, e.g. Red Hat Enterprise Linux.
• Packaged in the OpenOnload/EnterpriseOnload distribution.
When using Onload you must use the adapter driver distributed with that version of
Onload.
4.5 Removing Previously Installed Drivers
The Solarflare adapter driver (sfc.ko) is distributed as part of many Linux-based OS
distributions - this is often referred to as the 'boxed' driver or the 'in-tree' driver.
Depending on the OS version this driver may not support more recent Solarflare
adapters. Always check the driver release notes available from
https://support.solarflare.com/.
Table 1: OS/Kernel Support (continued)
OS Version                    Notes
Debian 7 "Wheezy" 7.x
Debian 8 "Jessie" 8.0
Linux kernels 2.6.32 - 4.8
The net driver has been tested as a VF driver using KVM, ESXi 5.5 and ESXi 6.0
hypervisors. Supported guest OS:
  RHEL 6.5, 6.6, 6.7
  RHEL 7.0, 7.1
  SLES 11 (sp4)
  SLES 12 base release
Solarflare aim to support the OS current and previous major release at the point
these are released (plus the latest long-term support release if this is not already
included). This includes all minor releases where the distributor has not yet
declared end of life/support.
The 'in-tree' driver displays only Major and Minor revision numbers when queried
with the ethtool command:
# ethtool -i enp3s0f0
driver: sfc
version: 4.0
Every revised Onload distribution includes a version of the net driver to support the
specific features of the Onload release and this driver should always be used with
Onload. (The driver is installed along with the other Onload drivers.) Onload drivers
display detailed version information using the ethtool command:
# ethtool -i enp3s0f0
driver: sfc
version: 4.5.1.1020
To ensure the Onload driver is always loaded following system reboot, the 'in-tree'
driver can be removed from the OS entirely. Alternatively any Onload startup script
should include the command to reload the Onload drivers:
# onload_tool reload
To remove the 'in-tree' driver (with Onload uninstalled or not yet installed):
# find /lib/modules/$(uname -r) -name 'sfc*.ko' | xargs rm -rf
# rmmod sfc
# update-initramfs -u -k <kernelversion>
initramfs commands may differ on different Linux-based OS, e.g. on CentOS 7 the
following dracut command can be used:
# dracut -f
4.6 Migrating Between Onload Versions - Upgrade/Downgrade
When migrating between Onload versions or between OpenOnload and
EnterpriseOnload, a previously installed version must first be unloaded using the
onload_tool unload command and then removed using the onload_uninstall
command.
# onload_tool unload
# onload_uninstall
In some specific cases it may be necessary to manually remove onload driver
modules before upgrading to a more recent version. To do this, list the modules and
remove each dependency before removing the modules:
# lsmod | grep onload
onload          580599  3
sfc_char         47419  1 onload
sfc_resource    162351  2 onload,sfc_char
sfc             431807  4 sfc_resource,onload,sfc_char,sfc_affinity
onload_cplane   144142  3 onload
# lsmod | grep sfc
sfc_char         47419  1 onload
sfc_resource    162351  2 onload,sfc_char
sfc_affinity     17948  1 sfc_resource
sfc             431807  4 sfc_resource,onload,sfc_char,sfc_affinity
To remove modules:
# rmmod onload
# rmmod sfc_char
Repeat the rmmod command for each module.
Remove RPM
For some EnterpriseOnload packages, it may also be necessary to remove installed
RPM packages:
rpm -qa | grep 'enterpriseonload' | xargs rpm -e
rpm -qa | grep 'onload' | xargs rpm -e
rpm -qa | grep 'sfc' | xargs rpm -e
rpm -qa | grep 'sfutils' | xargs rpm -e
onload_uninstall Pre-install Notes
NOTE: If Onload is to accelerate a 32-bit application on a 64-bit architecture, the
32-bit libc development headers should be installed before building Onload. Refer
to Appendix C for install instructions.
NOTE: You must remove any existing Solarflare RPM driver packages before
installing Onload.
NOTE: The Solarflare drivers are currently classified as unsupported in SLES 11 and
12; the certification process is underway. To overcome this (SLES 11) add
allow_unsupported_modules 1 to the /etc/modprobe.d/unsupported-modules
file. For SLES 12 add the same to the
/etc/modprobe.d/10-unsupported-modules.conf file.
4.7 EnterpriseOnload - Build and Install from SRPM
The following steps identify the procedures to build and install EnterpriseOnload.
SRPMs can be built by the 'root' or 'non-root' user, but the user must have
superuser privileges to install RPMs. Customers should contact their Solarflare
customer sales representative for access to the EnterpriseOnload SRPM resources.
Build the RPM
NOTE: Refer to Appendix C for details of build dependencies.
As root:
rpmbuild --rebuild enterpriseonload-<version>.src.rpm
Or as a non-root user:
It is advised to use _topdir to ensure that RPMs are built into a directory to which
the user has permissions. The directory structure must pre-exist for the rpmbuild
command to succeed.
mkdir -p /tmp/myrpm/{SOURCES,BUILD,RPMS,SRPMS}
rpmbuild --define "_topdir /tmp/myrpm" \
--rebuild enterpriseonload-<version>.src.rpm
NOTE: On some non-standard kernels the rpmbuild might fail because of build
dependencies. In this event retry, adding the --nodeps option to the command
line.
Building the source RPM will produce 2 binary RPM files which can be found:
• in the /usr/src/*/RPMS/ directory
• or, when built by a non-root user, in _topdir/RPMS
• or, when _topdir was defined in the rpmbuild command line, in
/tmp/myrpm/RPMS/x86_64/
for example the EnterpriseOnload user-space components:
/usr/src/redhat/RPMS/x86_64/enterpriseonload-<version>.x86_64.rpm
and the EnterpriseOnload kernel components:
/usr/src/redhat/RPMS/x86_64/enterpriseonload-kmod-2.6.18-92.el5-<version>.x86_64.rpm
Install the EnterpriseOnload RPM
The EnterpriseOnload RPM and the kernel RPM must be installed for
EnterpriseOnload to function correctly.
rpm -ivf enterpriseonload-<version>.x86_64.rpm
rpm -ivf enterpriseonload-kmod-2.6.18-92.el5-<version>.x86_64.rpm
NOTE: EnterpriseOnload is now installed but the kernel modules are not yet loaded.
NOTE: The EnterpriseOnload-kmod filename is specific to the kernel that it is built
for.
Installing the EnterpriseOnload Kernel Module
This will load the EnterpriseOnload kernel driver and other driver dependencies and
create any device nodes needed for EnterpriseOnload drivers and utilities. The
command should be run as root.
/etc/init.d/openonload start
Following successful execution this command produces no output, but the onload
script will identify that the kernel module is now loaded.
onload
EnterpriseOnload <version>
Copyright 2006-2013 Solarflare Communications, 2002-2005 Level 5 Networks
Built: Oct 15 2013 09:19:23 12:23:12 (release)
Kernel module: <version>
NOTE: At this point EnterpriseOnload is loaded, but until the network interface has
been configured and brought into service EnterpriseOnload will be unable to
accelerate traffic.
4.8 EnterpriseOnload - Debian Source Packages
From version 4.0, Debian install packages are available for EnterpriseOnload.
Packages are named in the following format:
enterpriseonload_<version>-debiansource.tgz
1 Untar source package
$ tar xf enterpriseonload_<version>-debiansource.tgz
2 Extract source
$ dpkg-source -x enterpriseonload_<version>-1.dsc
3 Build packages
$ cd enterpriseonload-<version>
$ debuild -i -uc -us
4 Install packages
$ sudo dpkg -i ../enterpriseonload-user_<version>-1_amd64.deb
$ sudo dpkg -i ../enterpriseonload-source_<version>-1_all.deb
5 Build and install modules
$ sudo m-a a-i enterpriseonload
4.9 OpenOnload DKMS Installation
OpenOnload DKMS packages are available by contacting support@solarflare.com.
1 DKMS must be installed on the server. DKMS can be downloaded from
http://linux.dell.com/dkms/ or from the OS distribution. To check this, run the
following command, which will return nothing if DKMS is not installed:
# dkms --version
dkms: 2.2.0.3
2 Install the Onload dkms package:
# rpm -i openonload-dkms-<version>.noarch.rpm
3 Ensure drivers and kernel module are loaded:
onload_tool reload
4.10 Build OpenOnload Source RPM
A source RPM can be built from the OpenOnload distribution tar file.
1 Download the required tar file from the following location:
http://www.openonload.org/download.html
2 As root, execute the following command:
rpmbuild -ts openonload-<version>.tgz*
x86_64 Wrote: /root/rpmbuild/SRPMS/openonload-<version>.src.rpm
The output identifies the location of the source RPM. Use the -ta option to
generate a binary RPM.
4.11 OpenOnload - Installation
The following procedure demonstrates how to download, untar and install
OpenOnload.
Download and untar OpenOnload
1 Download the required tar file from the following location:
http://www.openonload.org/download.html
The compressed tar file (.tgz) should be downloaded/copied to a directory on
the machine on which it will be installed.
2 As root, unpack the tar file using the tar command.
tar -zxvf openonload-<version>.tgz
This will unpack the tar file and, within the current directory, create a
sub-directory called openonload-<version> which contains other sub-directories
including the scripts directory from which subsequent install commands can
be run.
Building and Installing OpenOnload
NOTE: Refer to Appendix C for details of build dependencies.
The following command will build and install OpenOnload and required drivers in
the system directories:
./onload_install
Successful installation will be indicated with the following output:
onload_install: Install complete
possibly followed by a warning that the sfc (net driver) driver is already installed.
NOTE: The onload_install script does not create RPMs.
Load Onload Drivers
Following installation it is necessary to load the Onload drivers:
onload_tool reload
When used with OpenOnload this command will replace any previously loaded
network adapter driver with the driver from the OpenOnload distribution.
Check that Solarflare drivers are loaded using the following commands:
lsmod | grep sfc
lsmod | grep onload
An alternative to the reload command is to reboot the system to load Onload
drivers.
Confirm Onload Installation
When the Onload installation is complete, run the onload command to confirm
installation of Onload software and kernel module:
# onload
This will display the Onload product banner and usage:
OpenOnload 201405
Copyright 2006-2012 Solarflare Communications, 2002-2005 Level 5 Networks
Built: May 20 2014 16:46:33 (release)
Kernel module: 201405
usage:
  onload [options] <command> <command-args>
options:
  --profile=<profile>   -- comma sep list of config profile(s)
  --force-profiles      -- profile settings override environment
  --no-app-handler      -- do not use app-specific settings
  --app=<app-name>      -- identify application to run under onload
  --version             -- print version information
  -v                    -- verbose
  -h --help             -- this help message
4.12 Onload Kernel Modules
To identify Solarflare drivers already installed on the server:
find /lib/modules/`uname -r` -type f -name '*.ko' -printf '%f\n' | grep -E 'sfc|onload'
To unload any loaded drivers:
onload_tool unload
To remove the installed files of a previous Onload:
onload_uninstall
To load the Solarflare net driver (if not already loaded):
modprobe sfc
Reload drivers following upgrade or changed settings:
onload_tool reload
Driver Name        Description
sfc.ko             A Linux net driver that provides the interface between the
                   Linux network stack and the Solarflare network adapter.
sfc_char.ko        Provides low-level access to the Solarflare network adapter
                   virtualized resources. Supports direct access to the network
                   adapter for applications that use the ef_vi user-level
                   interface for maximum performance.
sfc_tune.ko        This is used to prevent the kernel during idle periods from
                   putting the CPUs into a sleep state.
                   Removed in openonload-201405.
sfc_aoe.ko         Solarflare ApplicationOnload™ Engine driver for the
                   SFA6902F adapter.
                   Note that this is now distributed separately from Onload, to
                   allow independent update.
sfc_affinity.ko    Used to direct traffic flow managed by a thread to the core
                   the thread is running on; inserts packet filters that override
                   the RSS behaviour.
sfc_resource.ko    Manages the virtualization resources of the adapter and
                   shares the resources between other drivers.
onload.ko          The kernel component of Onload.
onload_cplane.ko   The control plane component of Onload.
4.13 Configuring the Network Interfaces
Network interfaces should be configured according to the Solarflare Server Adapter
User's Guide.
When the interface(s) have been configured, the dmesg command will display
output similar to the following (one entry for each Solarflare interface):
sfc 0000:13:00.0: INFO: eth2 Solarflare Communications NIC PCI(1924:803)
sfc 0000:13:00.1: INFO: eth3 Solarflare Communications NIC PCI(1924:803)
NOTE: IP address configuration should be carried out using normal OS tools e.g.
system-config-network (Red Hat) or yast (SUSE).
4.14 Installing Netperf
Refer to the Low Latency Quickstart Guide on page 5 for instructions to install
Netperf and Solarflare sfnettest applications.
4.15 How to run Onload
Once Onload has been installed there are different ways to accelerate applications.
• Prefixing the application command line with the onload command will
accelerate the application.
# onload <app_name> [app_options]
• Exporting LD_PRELOAD to the environment will mean that all applications
started in the same environment will be accelerated.
# export LD_PRELOAD=libonload.so
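Because an exported LD_PRELOAD affects every application started from that environment, it can be useful to limit the preload to a single command. The following is a minimal sketch using ordinary shell behavior, where a variable assignment placed before a command applies to that command only. The function name run_preloaded and the ONLOAD_LIB override are hypothetical and are not part of Onload.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Onload): run one command with the
# library preloaded, without exporting LD_PRELOAD into the calling shell.
# ONLOAD_LIB is an assumed override; it defaults to libonload.so as above.
run_preloaded() {
    LD_PRELOAD="${ONLOAD_LIB:-libonload.so}" "$@"
}

# Example: run_preloaded <app_name> [app_options]
```

Commands started later from the same shell are not affected, because the variable is set only in the environment of the wrapped command.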
4.16 Testing the Onload Installation
The Low Latency Quickstart Guide on page 5 demonstrates testing of Onload with
Netperf and the Solarflare sfnettest benchmark tools.
4.17 Apply an Onload Patch
Occasionally, the Solarflare Support Group may issue a software 'patch' which is
applied to onload to resolve a specific bug or investigate a specific issue. The
following procedure describes how a patch should be applied to the installed
OpenOnload software.
1 Copy the patch to a directory on the server where onload is already installed.
2 Go to the onload directory and apply the patch e.g.
cd openonload-<version>
[openonload-<version>]$ patch -p1 < ~/<path>/<name of patch file>.patch
3 Uninstall the old onload drivers
[openonload-<version>]$ onload_uninstall
4 Build and reinstall the onload drivers
[openonload-<version>]$ ./scripts/onload_install
[openonload-<version>]$ onload_tool reload
The following procedure describes how a patch should be applied to the installed
EnterpriseOnload RPM. (This example patches EnterpriseOnload version 2.1.0.3.)
1 Copy the patch to the directory on the server where the EnterpriseOnload RPM
package exists and carry out the following commands:
rpm2cpio enterpriseonload-2.1.0.3-1.src.rpm | cpio -id
tar -xzf enterpriseonload-2.1.0.3.tgz
cd enterpriseonload-2.1.0.3
patch -p1 < $PATCHNAME
2 This can now be installed directly from this directory:
./scripts/onload_install
3 Or it can be repackaged as a new source RPM:
cd ..
tar -czf enterpriseonload-2.1.0.3.tgz enterpriseonload-2.1.0.3
rpmbuild -ts enterpriseonload-2.1.0.3.tgz
4 The rpmbuild procedure will display a 'Wrote' line identifying the location of
the source RPM e.g.
Wrote: /root/rpmbuild/SRPMS/enterpriseonload-2.1.0.3-1.el6.src.rpm
5 Tuning Onload
5.1 Introduction
This chapter documents the available tuning options for Onload, and the expected
results. The options can be split into the following categories:
• System Tuning
• Standard Latency Tuning.
• Advanced Tuning driven from analysis of the Onload stack using
onload_stackdump.
Most of the Onload configuration parameters, including tuning parameters, are set
by environment variables exported into the accelerated application's environment.
Environment variables can be identified throughout this manual as they begin with
EF_. All environment variables are described in Appendices A and B of this manual.
Examples throughout this guide assume the use of the bash or sh shells; other shells
may use different methods to export variables into the application's environment.
System Tuning on page 31 describes tools and commands which can be used to
tune the server and OS.
Standard Tuning on page 33 describes how to perform standard heuristic
tuning, which can help improve the application's performance. There are also
benchmark examples running specific tests to demonstrate the improvements
Onload can have on an application.
Advanced Tuning on page 47 introduces advanced tuning options using
onload_stackdump. There are worked examples to demonstrate how to
achieve the application tuning goals.
NOTE: Onload tuning and kernel driver tuning are subject to different
requirements. This section describes the steps to tune Onload. For details on how
to tune the Solarflare kernel driver, refer to the 'Performance Tuning on Linux'
section of the Solarflare Server Adapter User Guide.
5.2 System Tuning
This section details steps to tune the server and operating system for lowest latency.
Sysjitter
The Solarflare sysjitter utility measures the extent to which the system introduces
jitter and so impacts on the user-level process. Sysjitter runs a thread on each
processor core and when the thread is descheduled from the core it measures for
how long. Sysjitter produces summary statistics for each processor core. The
sysjitter utility can be downloaded from www.openonload.org
Sysjitter should be run on a system that is idle. When running on a system with
cpusets enabled, run sysjitter as root.
Refer to the sysjitter README file for further information on building and running
sysjitter.
The following is an example of the output from sysjitter on a single CPU socket
server with 4 CPU cores.
./sysjitter --runtime 10 200 | column -t
core_i:          0           1           2           3
threshold(ns):   200         200         200         200
cpu_mhz:         3215        3215        3215        3215
runtime(ns):     9987653973  9987652245  9987652070  9987652027
runtime(s):      9.988       9.988       9.988       9.988
int_n:           10001       10130       10012       10001
int_n_per_sec:   1001.336    1014.252    1002.438    1001.336
int_min(ns):     1333        1247        1299        1446
int_median(ns):  1390        1330        1329        1470
int_mean(ns):    1424        1452        1452        1502
int_90(ns):      1437        1372        1357        1519
int_99(ns):      1619        5046        2392        1688
int_999(ns):     5065        22977       15604       3694
int_9999(ns):    31260       39017       18430       536419
int_99999(ns):   40613       45065       34709       749998
int_max(ns):     40613       45065       34709       749998
int_total(ns):   14244846    14719972    14541991    15031294
int_total(%):    0.143       0.147       0.146       0.150
The table below describes the output fields of the sysjitter utility.
Timer (TSC) Stability
Onload uses the Time Stamp Counter (TSC) CPU registers to measure changes in
time with very low overhead. Modern CPUs support an "invariant TSC", which is
synchronized across different CPUs and ticks at a constant rate regardless of the
current CPU frequency and power saving mode. Onload relies on this to generate
accurate time calculations when running across multiple CPUs. If run on a system
which does not have an invariant TSC, Onload may calculate wildly inaccurate time
values and this can, in extreme cases, lead to some connections becoming stuck.
Users should consult their server vendor documentation and OS documentation to
ensure that servers can meet the invariant TSC requirement.
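As a quick first check on Linux x86 systems, the CPU flags reported in /proc/cpuinfo can be inspected. The sketch below assumes that the constant_tsc and nonstop_tsc flags together indicate an invariant TSC; this is an assumption about the Linux flag names and does not replace the vendor documentation. The function name has_invariant_tsc is hypothetical.

```shell
#!/bin/sh
# has_invariant_tsc FLAGS
# Succeed if the space-separated flag list contains both constant_tsc and
# nonstop_tsc (assumed here to indicate an invariant TSC on Linux x86).
has_invariant_tsc() {
    case " $1 " in *" constant_tsc "*) ;; *) return 1 ;; esac
    case " $1 " in *" nonstop_tsc "*) ;; *) return 1 ;; esac
    return 0
}

# Example on Linux:
#   has_invariant_tsc "$(grep -m1 '^flags' /proc/cpuinfo)" && echo "invariant TSC flags present"
```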
Field             Description
threshold(ns)     ignore any interruptions shorter than this period
cpu_mhz           CPU speed
runtime(ns)       runtime of sysjitter - nanoseconds
runtime(s)        runtime of sysjitter - seconds
int_n             number of interruptions to the user thread
int_n_per_sec     number of interruptions to the user thread per second
int_min(ns)       minimum time taken away from the user thread due to an
                  interruption
int_median(ns)    median time taken away from the user thread due to an
                  interruption
int_mean(ns)      mean time taken away from the user thread due to an
                  interruption
int_90(ns)        90th percentile value
int_99(ns)        99th percentile value
int_999(ns)       99.9th percentile value
int_9999(ns)      99.99th percentile value
int_99999(ns)     99.999th percentile value
int_max(ns)       max time taken away from the user thread
int_total(ns)     total time spent not processing the user thread
int_total(%)      int_total(ns) as a percentage of total runtime
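The relationship between the last two fields can be checked by hand. The following sketch recomputes int_total(%) from int_total(ns) and runtime(ns). The function name int_total_pct is hypothetical, and the sample values are those read from the first core of the example output above.

```shell
#!/bin/sh
# int_total_pct INT_TOTAL_NS RUNTIME_NS
# Print the interruption time as a percentage of the total runtime.
int_total_pct() {
    awk -v total="$1" -v runtime="$2" \
        'BEGIN { printf "%.3f\n", 100 * total / runtime }'
}

# First core of the example output: int_total(ns) and runtime(ns)
int_total_pct 14244846 9987653973    # prints 0.143
```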
CPU Power Saving Mode
Modern processors utilize design features that enable a CPU core to drop into
lower power states when instructed by the operating system that the CPU core
is idle. When the OS schedules work on the idle CPU core (or when other CPU cores
or devices need to access data currently in the idle CPU core's data cache) the CPU
core is signaled to return to the fully-on power state. These changes in CPU core
power states create additional network latency and jitter.
Solarflare therefore recommend that customers wishing to achieve the lowest
latency and lowest jitter disable the "C1E power state" or "CPU power saving mode"
within the machine's BIOS.
Disabling the CPU power saving modes is required if the application is to realize low
latency with low jitter.
NOTE: To ensure C-states are not enabled, overriding the BIOS settings, it is
recommended to put the line intel_idle.max_cstate=0 idle=poll into the
kernel command line in /boot/grub/grub.conf. The settings will produce consistent
results and are particularly useful when benchmarking.
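After a reboot, the running kernel's command line can be inspected to confirm that both parameters took effect. The following is a minimal sketch; the function name has_idle_overrides is hypothetical.

```shell
#!/bin/sh
# has_idle_overrides CMDLINE
# Succeed if the kernel command line string contains both parameters
# recommended in the note above.
has_idle_overrides() {
    case " $1 " in *" intel_idle.max_cstate=0 "*) ;; *) return 1 ;; esac
    case " $1 " in *" idle=poll "*) ;; *) return 1 ;; esac
    return 0
}

# Example on Linux:
#   has_idle_overrides "$(cat /proc/cmdline)" || echo "C-state overrides not active"
```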
Allowing some cores to enable Turbo modes while others are idle can produce
better latency in some servers. For this, use idle=mwait and enable C-states in the
BIOS.
Alternatively, on later Linux versions, the tuned service can be enabled and used
with the network-latency profile.
Users should refer to vendor documentation and experiment with C-states for
different applications.
Customers should consult their system vendor and documentation for details
concerning the disabling of C1E, C-states or CPU power saving states.
5.3 Standard Tuning
This section details standard tuning steps for Onload.
Spinning (busy-wait)
Conventionally, when an application attempts to read from a socket and no data is
available, the application will enter the OS kernel and block. When data becomes
available, the network adapter will interrupt the CPU, allowing the kernel to
reschedule the application to continue.
Blocking and interrupts are relatively expensive operations, and can adversely affect
bandwidth, latency and CPU efficiency.
Onload can be configured to spin on the processor in user mode for up to a specified
number of microseconds waiting for data from the network. If the spin period
expires the processor will revert to conventional blocking behavior. Non-blocking
sockets will always return immediately as these are unaffected by spinning.
OnloadusestheEF_POLL_USECenvironmentvariabletoconfigurethelengthofthe
spintimeout.
exportEF_POLL_USEC=100000
willsetthebusywaitperiodto100milliseconds.SeeMetaOptionsonpage217for
moredetails.
Enabling spinning
To enable spinning in Onload:
Set EF_POLL_USEC. This causes Onload to spin on the processor for up to the
specified number of microseconds before blocking. This setting is used in TCP and
UDP and also in recv(), select(), pselect(), poll(), ppoll(),
epoll_wait(), epoll_pwait() and onload_ordered_epoll_wait(). Use the
following command:
export EF_POLL_USEC=100000
NOTE: If neither of the spinning options EF_POLL_USEC and EF_SPIN_USEC is set,
Onload will resort to default interrupt-driven behavior because the EF_INT_DRIVEN
environment variable is enabled by default.
Setting the EF_POLL_USEC variable also sets the following environment variables.
EF_SPIN_USEC=EF_POLL_USEC
EF_SELECT_SPIN=1
EF_EPOLL_SPIN=1
EF_POLL_SPIN=1
EF_PKT_WAIT_SPIN=1
EF_TCP_SEND_SPIN=1
EF_UDP_RECV_SPIN=1
EF_UDP_SEND_SPIN=1
EF_TCP_RECV_SPIN=1
EF_BUZZ_USEC=MIN(EF_POLL_USEC,100)
EF_SOCK_LOCK_BUZZ=1
EF_STACK_LOCK_BUZZ=1
Turn off adaptive moderation and set interrupt moderation to a high value
(microseconds) to avoid flooding the system with interrupts. Use the following
command:
/sbin/ethtool -C eth2 rx-usecs 60 adaptive-rx off
See Meta Options on page 217 for more details.
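The settings implied by EF_POLL_USEC, as listed in this section, can be previewed with a small shell helper. This is an illustrative sketch only: spin_settings is a hypothetical function, not part of Onload, and it simply restates the list above for a given EF_POLL_USEC value, including the MIN(EF_POLL_USEC, 100) rule for EF_BUZZ_USEC.

```shell
# spin_settings EF_POLL_USEC_VALUE
# Hypothetical helper (not an Onload tool): prints the settings that this
# guide lists as being implied by EF_POLL_USEC.
spin_settings() {
    poll_usec=$1
    # EF_BUZZ_USEC is the smaller of EF_POLL_USEC and 100.
    if [ "$poll_usec" -lt 100 ]; then
        buzz_usec=$poll_usec
    else
        buzz_usec=100
    fi
    echo "EF_SPIN_USEC=$poll_usec"
    for name in EF_SELECT_SPIN EF_EPOLL_SPIN EF_POLL_SPIN EF_PKT_WAIT_SPIN \
                EF_TCP_SEND_SPIN EF_UDP_RECV_SPIN EF_UDP_SEND_SPIN \
                EF_TCP_RECV_SPIN; do
        echo "$name=1"
    done
    echo "EF_BUZZ_USEC=$buzz_usec"
    echo "EF_SOCK_LOCK_BUZZ=1"
    echo "EF_STACK_LOCK_BUZZ=1"
}

spin_settings 100000
```

The helper only prints values; Onload itself applies these settings when EF_POLL_USEC is set.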
When to Use Spinning
The optimal setting is dependent on the nature of the application. If an application
is likely to find data soon after blocking, or the system does not have any other
major tasks to perform, spinning can improve latency and bandwidth significantly.
In general, an application will benefit from spinning if the number of active threads
is less than the number of available CPU cores. However, if the application has more
active threads than available CPU cores, spinning can adversely affect application
performance because a thread that is spinning (and therefore idle) takes CPU time
away from another thread that could be doing work. If in doubt, it is advisable to try
an application with a range of settings to discover the optimal value.
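The rule of thumb above can be expressed as a simple check. The following is a minimal sketch: spin_hint is a hypothetical helper, and the three labels it prints are illustrative names rather than Onload terminology. When the core count is omitted it falls back to nproc, which is assumed to be available.

```shell
# spin_hint ACTIVE_THREADS [CPU_CORES]
# Hypothetical helper: prints a rough hint based on the thread/core rule.
spin_hint() {
    threads=$1
    cores=${2:-$(nproc)}   # default: cores available to this process
    if [ "$threads" -lt "$cores" ]; then
        echo "spin"        # fewer active threads than cores
    elif [ "$threads" -gt "$cores" ]; then
        echo "avoid"       # oversubscribed: spinning takes CPU time from workers
    else
        echo "measure"     # borderline: try a range of settings
    fi
}

spin_hint 4 16
```

The hint is only a starting point; measuring the application with a range of settings remains the recommended approach.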
Polling vs. Interrupts
Interrupts are useful because they allow the CPU to do other useful work while
simultaneously waiting for asynchronous events (such as the reception of packets
from the network). The historical alternative to interrupts was for the CPU to
periodically poll for asynchronous events and on single-processor systems this could
result in greater latency than would be observed with interrupts. Historically it was
accepted that interrupts were “good for latency”.
On modern, multi-core systems the trade-offs are different. It is often possible to
dedicate an entire CPU core to the processing of a single source of asynchronous
events (such as network traffic). The CPU dedicated to processing network traffic
can be spinning (aka busy-waiting), continuously polling for the arrival of packets.
When a packet arrives, the CPU can begin processing it almost immediately.
Contrast the polling model to an interrupt-driven model. Here the CPU is likely in its
“idle loop” when an interrupt occurs. The idle loop is interrupted, the interrupt
handler executes, typically marking a worker task as runnable. The OS scheduler will
then run and switch to the kernel thread that will process the incoming packet.
There is typically a subsequent task switch to a user-mode thread where the real
work of processing the event (e.g. acting on the packet payload) is performed.
Depending on the system, it can take on the order of a microsecond to respond to
an interrupt and switch to the appropriate thread context before beginning the real
work of processing the event. A dedicated CPU spinning in a polling loop can begin
processing the asynchronous event in a matter of nanoseconds.
It follows that spinning only becomes an option if a CPU core can be dedicated to
the asynchronous event. If there are more threads awaiting events than CPU cores
(i.e. if all CPU cores are oversubscribed to application worker threads), then spinning
is not a viable option (at least, not for all events). One thread will be spinning,
polling for the event while another could be doing useful work. Spinning in such a
scenario can lead to (dramatically) increased latencies. But if a CPU core can be
dedicated to each thread that blocks waiting for network I/O, then spinning is the
best method to achieve the lowest possible latency.
5.4 Onload Deployment on NUMA Systems
When deployed on NUMA systems, application load throughput and latency
performance can be adversely affected unless due consideration is given to the
selection of the NUMA node, the allocation of cache memory and the affinitization
of drivers, processes and interrupts.
For best performance the accelerated application should always run on the NUMA
node nearest to the Solarflare adapter. The correct allocation of memory is
particularly important to ensure that packet buffers are allocated on the correct
NUMA node to avoid unnecessary increases in QPI traffic and to avoid dropped
packets.
Useful commands
• To identify NUMA nodes, socket memory and CPU core allocation:
# numactl -H
• To identify the NUMA node local to a Solarflare adapter:
# cat /sys/class/net/<interface>/device/numa_node
• To identify memory allocation and use on a particular NUMA node:
# cat /sys/devices/system/node/node<N>/numastat
• To identify NUMA node mapping to cores, use one of the following:
# numactl --hardware
# cat /sys/devices/system/node/node<N>/cpulist
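When scripting these checks, the adapter's local NUMA node can be read through a small wrapper around the sysfs file shown above. This is a sketch under stated assumptions: adapter_numa_node is a hypothetical helper, and the optional second argument (an alternative sysfs root) exists only so the function can be exercised without real hardware.

```shell
# adapter_numa_node INTERFACE [SYSFS_ROOT]
# Hypothetical helper: prints the NUMA node reported for an interface,
# or "unknown" (with a non-zero status) if the sysfs file is not readable.
adapter_numa_node() {
    iface=$1
    root=${2:-/sys}
    file="$root/class/net/$iface/device/numa_node"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
        return 1
    fi
}

# Example (the interface name is a placeholder):
#   node=$(adapter_numa_node eth4) && numactl --cpunodebind="$node" <command>
```

On systems where the kernel has no NUMA information for the device, the file typically contains -1, so the value should be checked before it is passed to numactl.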
Driver Loading - NUMA Node
When loading, the Onload module will create a variety of common data structures.
To ensure that these are created on the NUMA node nearest to the Solarflare
adapter, onload_tool reload should be affinitized to a core on the correct NUMA
node.
# numactl --cpunodebind=1 onload_tool reload
When there is more than one Solarflare adapter in the same server, on different
NUMA nodes, the user must select one node over the other when loading the driver,
but also make sure that interrupt IRQs are affinitized to the correct local CPU node
for each adapter.
onload_tool reload is single-threaded, so running with “--cpunodebind=0,1”, for
example, means the command could run on either node, which is not identifiable by
the user until after the command has completed.
Memory Policy
To guarantee that memory is appropriately allocated - and to ensure that memory
allocations do not fail - a memory policy that binds to a specific NUMA node should
be selected. When no policy is specified the system will generally use a default
policy allocating memory on the node on which a process is executing.
Application Processing
The majority of processing by Onload occurs in the context of the Onloaded
application. Various methods can be used to affinitize the Onloaded process:
numactl, taskset or cpusets, or the CPU affinity can be set programmatically.
Workqueues
An Onloaded application will create two shared workqueues and one per-stack
workqueue. The implementation of the workqueue differs between Linux kernels -
and so does the method used to affinitize workqueues.
On more recent Linux kernels (3.10+) the Onload workqueues will be initially
affinitized to the node on which they are created. Therefore if the driver load is
affinitized and the Onloaded application affinitized to the correct node, Onload
stacks will be created on the correct node and there will be no further work
required.
Specifying a cpumask via sysfs for a workqueue is NOT recommended as this can
break ordering requirements.
On older Linux kernels dedicated workqueue threads are created - and these can be
affinitized using taskset or cpusets. Identify the two workqueues shared by all
Onload stacks:
onload-wqueue
sfc_vi
Identify the per-stack workqueue which has a name in the format
onload-wq<stackid> (e.g. onload-wq:1 for stack 1).
Use the onload_stackdump command to identify Onload stacks and the PID of the
process that created the stack:
# onload_stackdump
#stack-id stack-name pids
0 - 106913
Use the Linux pidof command to identify the PIDs for Onload workqueues:
# pidof onload-wq:0 sfc_vi onload-wqueue
106930 105409 105431
It is recommended that the shared workqueues are affinitized immediately after the
driver is loaded and the per-stack queue immediately after stack creation.
Interrupts
When Onload is being used in an interrupt-driven mode (see Interrupt Handling -
Using Onload on page 43) interrupts should be affinitized to the same NUMA node
running the Onload application, but not on the same CPU core as the application.
When Onload is spinning (busy-wait) there will be few (if any) interrupts, so it is not
a real concern where these are handled.
Verification
The onload_stackdump lots command is used to verify that allocations occur on the
required NUMA node:
# onload_stackdump lots | grep numa
numa nodes: creation=0 load=0
numa node masks: packet alloc=1 sock alloc=1 interrupt=1
The load parameter identifies the node where the adapter driver has been loaded.
The creation parameter identifies the node allocating memory for the Onload stack.
The numa node masks identify which NUMA nodes allocate memory for packets and
for sockets, and the nodes on which interrupts have actually occurred. A mask value
of 1 identifies node 0, a value of 2 identifies node 1, a value of 3 identifies both
nodes 0 and 1 etc.
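The mask values are bitmaps in which bit N represents node N. The decoding can be sketched in shell as follows; decode_node_mask is a hypothetical helper, and it assumes the mask is supplied as a decimal number as in the examples above.

```shell
# decode_node_mask MASK
# Hypothetical helper: prints the NUMA node numbers whose bits are set
# in MASK (bit 0 = node 0, bit 1 = node 1, and so on).
decode_node_mask() {
    mask=$1
    node=0
    nodes=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask % 2)) -eq 1 ]; then
            nodes="${nodes:+$nodes }$node"   # append node, space-separated
        fi
        mask=$((mask / 2))
        node=$((node + 1))
    done
    echo "$nodes"
}

decode_node_mask 3
```

A mask of 0 produces an empty line, meaning that no node has been recorded.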
For most purposes it is best when load and creation identify the same node, which
is also the node local to the Solarflare adapter. To identify the local node use the
following:
# cat /sys/class/net/<interface>/device/numa_node
The cpu affinity of individual Onloaded threads can be identified with the following
command:
# onload_stackdump threads
5.5 Interrupt Handling - Kernel Driver
Default Behavior
Using the value identified from the rss_cpus option, the Solarflare NET driver will
create a number of receive (and transmit) queues (termed an “RSS channel”) for
each physical interface. By default the driver creates one RSS channel per CPU core
detected in the server up to a maximum of 32.
The rss_cpus sfc driver module option can be set in a user-created file <sfc.conf> in
the /etc/modprobe.d directory. The driver must be reloaded before the option
becomes effective. For example, rss_cpus can be set to an integer value:
options sfc rss_cpus=4
In the above example 4 receive queues are created per Solarflare interface. The
default value is rss_cpus=cores. Other available options are rss_cpus=<int>,
rss_cpus=hyperthreads and rss_cpus=packages.
NOTE: If the sfc driver module parameter rss_numa_local is enabled, RSS will be
restricted to use cores/hyperthreads on the NUMA node local to the Solarflare
adapter.
Affinitizing RSS Channels to CPUs
As described in the previous section, the default behavior of the Solarflare network
driver is to create one RSS channel per CPU core. At load time the driver affinitizes
the interrupt associated with each RSS channel to a separate CPU core so the
interrupt load is evenly distributed over the available CPU cores.
NOTE: These initial interrupt affinities will be disrupted and changed if the Linux
IRQ balancer daemon is running. To stop the IRQ balancer use the following
command:
# service irqbalance stop
In the following example, we have 2 Solarflare dual-port adapters
(a total of 4 network interfaces) installed in a server with 2 CPU sockets and 8 cores
per socket (hyperthreading is disabled).
If we set rss_cpus=4, each interface will create 4 RSS channels. The driver takes
care to spread the affinitized interrupts evenly over the CPU topology, i.e. evenly
between the two CPU sockets and evenly over shared L2/L3 caches.
The driver also attempts to spread the interrupt load of the multiple network
interfaces by using different CPU cores for different interfaces.
With 4 receive queues created per interface this results, on this machine, in the first
network interface mapping to the four lowest-numbered CPU cores, i.e. two cores from
each CPU socket as illustrated below. The next network interface uses the next four
CPUs until each CPU core is loaded with a single RSS channel as illustrated in
Figure 3 below.
Table 2: Example RSS Channel Mapping
Interface    Num of rx queues    Map to cores
1            4                   0,1,2,3
2            4                   4,5,6,7
3            4                   8,9,10,11
4            4                   12,13,14,15
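The mapping in Table 2 follows a simple pattern on this example machine: interface i uses the next block of rss_cpus cores. The sketch below reproduces only that table. rss_cores is a hypothetical helper; it does not model how the driver really chooses cores, which depends on CPU topology, and it ignores the case where there are more channels than cores.

```shell
# rss_cores INTERFACE_INDEX RSS_CPUS
# Hypothetical helper: prints the comma-separated cores that Table 2
# associates with the given interface (interfaces are numbered from 1).
rss_cores() {
    index=$1
    count=$2
    first=$(( (index - 1) * count ))
    last=$(( first + count - 1 ))
    out=""
    core=$first
    while [ "$core" -le "$last" ]; do
        out="${out:+$out,}$core"
        core=$((core + 1))
    done
    echo "$out"
}

rss_cores 3 4
```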
Figure 3: Mapping RSS Channels to CPU cores.
To identify the mapping of receive queues to CPU cores, use the following
command:
# cat /proc/interrupts | grep eth4
106: 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-0
107: 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-1
108: 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-2
109: 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth4-3
Note that each receive queue has an assigned IRQ. Receive queue eth4-0 is served
by IRQ 106, eth4-1 by IRQ 107 etc.
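On servers with many cores the per-CPU columns of /proc/interrupts are hard to read by eye. The following sketch summarizes, for each matching line, the IRQ number, the queue name and the CPU column with the highest count. irq_busiest_cpu is a hypothetical helper; it assumes the usual layout of an IRQ number, one count per CPU, then the interrupt type and device name.

```shell
# irq_busiest_cpu PATTERN   (reads /proc/interrupts-style text on stdin)
# Hypothetical helper: for each line whose last field matches PATTERN,
# prints "<irq> <name> <cpu>", where <cpu> is the zero-based column with
# the highest interrupt count (-1 if all counts are zero).
irq_busiest_cpu() {
    awk -v pat="$1" '
        $NF ~ pat {
            irq = $1
            sub(/:$/, "", irq)
            best = 0
            cpu = -1
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) break      # end of per-CPU counts
                if ($i + 0 > best) { best = $i + 0; cpu = i - 2 }
            }
            print irq, $NF, cpu
        }'
}

# Example:
#   irq_busiest_cpu eth4 < /proc/interrupts
```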
sfcaffinity_config
The OpenOnload distribution also includes the sfcaffinity_config script which
can also be used to affinitize RSS channel interrupts. sfcaffinity_config has a
number of command line options but a common way of running it is with the auto
command:
# sfcaffinity_config auto
Auto instructs sfcaffinity_config to set interrupt affinities to evenly spread the
RSS channels over the available CPU cores. Using the above scenario as an example,
where rss_cpus has been set to 4, the command will affinitize the interrupt
associated with each receive queue evenly over the CPU topology - in this case the
first four CPU cores.
sfcaffinity_config: INFO: eth4: Spreading 4 interrupts evenly over 2 shared caches
sfcaffinity_config: INFO: eth4: bind rxq 0 (irq 106) to core 1
sfcaffinity_config: INFO: eth4: bind rxq 1 (irq 107) to core 0
sfcaffinity_config: INFO: eth4: bind rxq 2 (irq 108) to core 3
sfcaffinity_config: INFO: eth4: bind rxq 3 (irq 109) to core 2
sfcaffinity_config: INFO: eth4: configure sfc_affinity n_rxqs=4
cpu_to_rxq=1,0,3,2,1,0,3,2,1,0,3,2,1,0,3,2
Figure 4: Mapping with sfcaffinity_config auto
In this example, after running the sfcaffinity_config auto command, interrupts
for the 4 receive queues from the 4 interfaces are now all directed to the same 4
cores 0,1,2,3 as illustrated by Figure 4.
NOTE: Running the sfcaffinity_config auto command also disables the kernel
IRQ balance service to prevent interrupts being redirected by the kernel to other
cores.
Restrict RSS to local NUMA node
The sfc driver module parameter rss_numa_local will restrict RSS to only use CPU
cores or hyperthreads (if hyperthreading is enabled) on the NUMA node local to the
Solarflare adapter.
rss_numa_local does NOT restrict the number of RSS channels created by the
driver; instead it works by restricting the RSS spreading so only the channels on the
local NUMA node will receive kernel driver traffic.
In the default case (where rss_cpus=cores), one RSS channel is created per CPU
core. However, the driver adjusts the RSS settings such that only the RSS channels
affinitized to the local CPU socket receive traffic. It therefore has no effect on the
Onload allocation and use of receive queues and interrupts.
Figure 5 below identifies the receive queue interrupts spread when rss_cpus=4
and rss_numa_local=1. In this machine adapter #1 is attached to the PCIe bus on
socket #0 with adapter #2 attached to the PCIe bus on socket #1.
Figure 5: Mapping with rss_numa_local
Restrict RSS Receive Queues
The ethtool -X command can also be used to restrict the receive queues accessible
by RSS. In the following example rss_cpus=4 and ethtool -x identifies the 4
receive queues per interface:
# ethtool -x eth4
RX flow hash indirection table for eth4 with 4 RX ring(s):
0: 0 1 2 3 0 1 2 3
8: 0 1 2 3 0 1 2 3
16: 0 1 2 3 0 1 2 3
24: 0 1 2 3 0 1 2 3
32: 0 1 2 3 0 1 2 3
40: 0 1 2 3 0 1 2 3
48: 0 1 2 3 0 1 2 3
56: 0 1 2 3 0 1 2 3
64: 0 1 2 3 0 1 2 3
72: 0 1 2 3 0 1 2 3
80: 0 1 2 3 0 1 2 3
88: 0 1 2 3 0 1 2 3
96: 0 1 2 3 0 1 2 3
104: 0 1 2 3 0 1 2 3
112: 0 1 2 3 0 1 2 3
120: 0 1 2 3 0 1 2 3
To restrict RSS to spread receive flows evenly over the first 2 receive queues, use
ethtool -X:
# ethtool -X eth4 equal 2
RX flow hash indirection table for eth4 with 4 RX ring(s):
0: 0 1 0 1 0 1 0 1
8: 0 1 0 1 0 1 0 1
16: 0 1 0 1 0 1 0 1
24: 0 1 0 1 0 1 0 1
32: 0 1 0 1 0 1 0 1
40: 0 1 0 1 0 1 0 1
48: 0 1 0 1 0 1 0 1
56: 0 1 0 1 0 1 0 1
64: 0 1 0 1 0 1 0 1
72: 0 1 0 1 0 1 0 1
80: 0 1 0 1 0 1 0 1
88: 0 1 0 1 0 1 0 1
96: 0 1 0 1 0 1 0 1
104: 0 1 0 1 0 1 0 1
112: 0 1 0 1 0 1 0 1
120: 0 1 0 1 0 1 0 1
Interrupt Handling - Using Onload
A thread accelerated by Onload will either be interrupt-driven or it will be spinning.
When the thread is interrupt-driven, a thread which calls into Onload to read from
its receive queue, and for which there are no received packets to be processed, will
‘sleep’ until an interrupt(s) from the kernel informs it that there is more work to do.
When a thread is spinning, it is busy-waiting on its receive queue until packets are
received - in which case the packets are retrieved and the thread returns
immediately to the receive queue - or until the spin period expires. If the spin period
expires the thread will relinquish the CPU core and ‘sleep’ until an interrupt from the
kernel informs it that further packets have been received. If the spin period is set
greater than the packet inter-arrival time, the spinning thread can continue to spin
and retrieve packets without interrupts occurring. Even when spinning, an
application might experience a few interrupts.
As a general rule, when spinning, only a few interrupts will be expected so
performance is typically insensitive as to which CPU core processes the interrupts.
However, when Onload is interrupt-driven, performance can be sensitive to where
the interrupts are handled and will typically benefit when they are handled on the
same CPU socket as the application thread handling the socket I/O. The method
required depends on the setting of the EF_PACKET_BUFFER_MODE environment variable:
• If EF_PACKET_BUFFER_MODE=0 or 2, an Onload stack will use one or more of the
interrupts assigned to the NET driver receive queues. The CPU core handling
the interrupts is defined by the RSS mapping of receive queues to CPU cores:
- If sfcaffinity_config has been used to affinitize RSS channel
interrupts, the interrupt handling core for the stack can be set using the
EF_IRQ_CORE environment variable.
It is only possible for interrupts to be handled on the requested core if a
NET driver interrupt is assigned to the selected core.
See Affinitizing RSS Channels to CPUs on page 39.
- Otherwise, the interrupt handling core for the stack can be set using the
EF_IRQ_CHANNEL environment variable. Onload interrupts are handled by
the same core assigned to the NET driver receive channel.
• If EF_PACKET_BUFFER_MODE=1 or 3, the Onload stack creates dedicated
interrupts. The interrupt handling core for the stack can be set using the
EF_IRQ_CORE environment variable.
For more information about these environment variables, see:
EF_IRQ_CHANNEL on page 172
EF_IRQ_CORE on page 172
EF_PACKET_BUFFER_MODE on page 180.
When Onload is using a NET driver RSS channel for its source of interrupts, it can be
useful to dedicate this channel to Onload and prevent the driver from using this
channel for RSS traffic. See Restrict RSS to local NUMA node on page 41 and Restrict
RSS Receive Queues on page 42 for methods of how to achieve this.
5.6 Performance Jitter
On any system, reducing or eliminating jitter is key to gaining optimum performance.
However, the causes of jitter leading to poor performance can be difficult to define
and difficult to remedy. The following section identifies some key points that should
be considered.
• A first step towards reducing jitter should be to consider the configuration
settings specified in the Low Latency Quickstart Guide on page 5 - this includes
the disabling of the irqbalance service, interrupt moderation settings and
measures to prevent CPU cores switching to power saving modes.
• Use isolcpus to isolate CPU cores that the application - or at least the critical
threads of the application - will use and prevent OS housekeeping tasks and
other non-critical tasks from running on these cores.
• Set an application thread running on one core and the interrupts for that
thread on a separate core - but on the same physical CPU package. Even when
spinning, interrupts may still occur, for example, if the application fails to call
into the Onload stack for extended periods because it is busy doing other work.
• Ideally each spinning thread will be allocated a separate core so that, in the
event that it blocks or is descheduled, it will not prevent other important
threads from doing work. A common cause of jitter is more than one spinning
thread sharing the same CPU core. Jitter spikes may indicate that one thread is
being held off the CPU core by another thread.
• When EF_STACK_LOCK_BUZZ=1, threads will spin for the EF_BUZZ_USEC period
while they wait to acquire the stack lock. Lock buzzing can lead to unfairness
between threads competing for a lock, and so result in resource starvation for
one. Occurrences of this are counted in the 'stack_lock_buzz' counter.
EF_STACK_LOCK_BUZZ is enabled by default when EF_POLL_USEC (spinning) is
enabled.
• If a multithreaded application is doing lots of socket operations, stack lock
contention will lead to send/receive performance jitter. In such cases improved
performance can be had when each contending thread has its own stack. This
can be managed with EF_STACK_PER_THREAD which creates a separate Onload
stack for the sockets created by each thread. If separate stacks are not an
option then it may be beneficial to reduce the EF_BUZZ_USEC period or to
disable stack lock buzzing altogether.
• It is always important that threads that need to communicate with each other
are running on the same CPU package so that these threads can share a
memory cache.
• Jitter may also be introduced when some sockets are accelerated and others
are not. Onload will ensure that accelerated sockets are given priority over
non-accelerated sockets. The resulting delay will only be in the region of a few
microseconds - not milliseconds - and the penalty will always be on the side of the
non-accelerated sockets. The environment variables EF_POLL_FAST_USEC and
EF_POLL_NONBLOCK_FAST_USEC can be configured to manage the extent of
priority of accelerated sockets over non-accelerated sockets.
• If traffic is sparse, spinning will deliver the same latency benefits, but the user
should ensure that the spin timeout period, configured using the
EF_POLL_USEC variable, is sufficiently long to ensure the thread is still spinning
when traffic is received.
• When applications only need to send and receive occasionally it may be
beneficial to implement a keepalive-heartbeat mechanism between peers.
This has the effect of retaining the process data in the CPU memory cache.
Calling send or receive after a delay can result in the call taking measurably
longer, due to the cache effects, than if this is called in a tight loop.
• On some servers BIOS settings such as power and utilization monitoring can
cause unnecessary jitter by performing monitoring tasks on all CPU cores. The
user should check the BIOS and decide if periodic tasks (and the related SMIs)
can be disabled.
• The Solarflare sysjitter utility can be used to identify and measure jitter on all
cores of an idle system - refer to Sysjitter on page 31 for details.
Using Onload Tuning Profiles
Environment variables set in the application user-space can be used to configure and
control aspects of the accelerated application’s performance. These variables can be
exported using the Linux export command e.g.
export EF_POLL_USEC=100000
Onload supports tuning profile script files which are used to group environment
variables within a single file to be called from the Onload command line.
The latency profile sets EF_POLL_USEC=100000, setting the busy-wait spin
timeout to 100 milliseconds. The profile also disables TCP faststart for new or idle
connections where additional TCP ACKs will add latency to the receive path. To use
the profile include it on the onload command line e.g.
onload --profile=latency netperf -H onload2sfc -t TCP_RR
Following Onload installation, profiles provided by Solarflare are located in the
following directory - this directory will be deleted by the onload_uninstall
command:
/usr/libexec/onload/profiles
User-defined environment variables can be written to a user-defined profile script
file (having a .opf extension) and stored in any directory on the server. The full path
to the file should then be specified on the onload command line e.g.
onload --profile=/tmp/myprofile.opf netperf -H onload2sfc -t TCP_RR
As an example the latency profile, provided by the Onload distribution, is shown
below:
# Onload low latency profile.
# Enable polling / spinning. When the application makes a blocking call
# such as recv() or poll(), this causes Onload to busy wait for up to 100ms
# before blocking.
onload_set EF_POLL_USEC 100000
# Disable FASTSTART when connection is new or has been idle for a while.
# The additional acks it causes add latency on the receive path.
onload_set EF_TCP_FASTSTART_INIT 0
onload_set EF_TCP_FASTSTART_IDLE 0
For a complete list of environment variables refer to Parameter Reference on
page 163.
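A user-defined profile can be created in the same way. The following sketch writes a small profile file from the shell. The file name, the location and the chosen values are examples only, and the space-separated onload_set <variable> <value> form is assumed from the FASTSTART lines of the latency profile shown above.

```shell
# Write a hypothetical user-defined profile; all values are illustrative.
profile="${TMPDIR:-/tmp}/myprofile.opf"
cat > "$profile" <<'EOF'
# Example user profile.
# Busy-wait for up to 50 ms before blocking.
onload_set EF_POLL_USEC 50000
# Disable FASTSTART for idle connections, as the latency profile does.
onload_set EF_TCP_FASTSTART_IDLE 0
EOF
echo "wrote $profile"

# The profile would then be named on the onload command line, e.g.:
#   onload --profile="$profile" <application>
```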
Benchmark Testing
Benchmark procedures using Onload, netperf and sfnt_pingpong are described in
the Low Latency Quickstart Guide on page 5.
5.7 Advanced Tuning
Advanced tuning requires closer examination of the application performance. The
application should be tuned to achieve the following objectives:
• To have as much processing at user level as possible.
• To have as few interrupts as possible.
• To eliminate drops.
• To minimize lock contention.
Onload includes a diagnostic application called onload_stackdump, which can be
used to monitor Onload performance and to set tuning options.
The following sections demonstrate the use of onload_stackdump to examine
aspects of the system performance and set environment variables to achieve the
tuning objectives.
For further examples and use of onload_stackdump refer to onload_stackdump on
page 261.
Monitoring Using onload_stackdump
To use onload_stackdump, enter the following command:
onload_stackdump [command]
To list available commands and view documentation for onload_stackdump enter
the following commands:
onload_stackdump doc
onload_stackdump -h
A specific stack number can also be provided on the onload_stackdump command
line.
Worked Examples
Prefault Packet Buffers
The Onload environment variable EF_PREFAULT_PACKETS will cause the user
process to ‘touch’ the specified number of packet buffers when an Onload stack is
created. This means that memory for these packet buffers is pre-allocated and
memory-mapped into the user-process address space.
Pre-allocation is advised to prevent latency jitter caused by the allocation and
memory-mapping overheads.
When deciding how many packets to prefault, the user should look at the alloc value
when the onload_stackdump packets command is run. The alloc value is a high
water mark identifying the maximum number of packets being used by the stack
at any one time. Setting EF_PREFAULT_PACKETS to at least this value is
recommended.
$ onload_stackdump packets
ci_netif_pkt_dump_all: id=0
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=446 current
pkt_bufs: max=32768 alloc=2048 free=990 async=0
pkt_bufs: rx=1058 rx_ring=992 rx_queued=2 pressure_pool=64
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0
pkt_bufs: in_loopback=0 in_sock=0
994: 0x200 Rx
n_zero_refs=1054 n_freepkts=1 estimated_free_nonb=1053
free_nonb=0 nonb_pkt_pool=ffffffffffffffff
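The value can be extracted from this output in a script. The sketch below assumes that the relevant figure is the alloc field on the first pkt_bufs line (2048 in the output above); the guide does not state which alloc field is meant, so this should be checked against the onload_stackdump documentation. pkt_bufs_alloc is a hypothetical helper.

```shell
# pkt_bufs_alloc   (reads "onload_stackdump packets" output on stdin)
# Hypothetical helper: prints the alloc= value from the first pkt_bufs
# line, which is ASSUMED here to be the packet buffer high water mark.
pkt_bufs_alloc() {
    sed -n 's/^[[:space:]]*pkt_bufs:.*[[:space:]]alloc=\([0-9][0-9]*\).*/\1/p' |
        head -n 1
}

# Example:
#   n=$(onload_stackdump packets | pkt_bufs_alloc)
#   EF_PREFAULT_PACKETS=$n onload <application>
```

When more than one stack is present only the first stack's value is reported.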
NOTE: It is not possible to prefault a number of packets exceeding the current value
of EF_MAX_PACKETS and attempts to do this will result in a warning similar to the
following:
ci_netif_pkt_prefault_reserve: Prefaulted only 63488 of 64000
The warning message is harmless; it informs the user that not all the requested
packets could be prefaulted (because some have already been allocated to receive
rings).
When deciding how many packets to prefault the user should consider that Onload
must allocate, from the EF_MAX_PACKETS pool, a number of packet buffers per receive
ring per interface. Once these have been allocated, any remainder can be
prefaulted.
Users who need to prefault the maximum possible number of available packets
can set EF_PREFAULT_PACKETS and EF_MAX_PACKETS to the same value and just
ignore the warnings from Onload:
EF_PREFAULT_PACKETS=64000 EF_MAX_PACKETS=64000 onload <myapplication>...
Refer to Appendix A on page 163 for details of the EF_PREFAULT_PACKETS variable.
CAUTION: Prefaulting packet buffers for one stack will reduce the number of
buffers available for others. Users should consider that over-allocation to
one stack might mean spare (redundant) packet buffer capacity that could be better
allocated elsewhere.
Processing at User Level
Many applications can achieve better performance when most processing occurs at
user level rather than kernel level. To identify how an application is performing,
enter the following command:
$ onload_stackdump lots | grep polls
k_polls: 673
u_polls: 41
The output identifies many more k_polls than u_polls indicating that the stack is
operating mainly at kernel level and may not be achieving optimal performance.
Solution
Terminate the application and set the EF_POLL_USEC parameter to 100000. Restart
the application and re-run onload_stackdump:
export EF_POLL_USEC=100000
$ onload_stackdump lots | grep polls
k_polls: 673
u_polls: 1289
The output identifies that the number of u_polls is far greater than the number of
k_polls indicating that the stack is now operating mainly at user level.
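The comparison of the two counters can be automated when several stacks or runs need to be checked. The following sketch reads the counter lines and reports which level dominates. poll_summary is a hypothetical helper and the wording of its output is illustrative; it assumes one name: value pair per line, as in the output above.

```shell
# poll_summary   (reads "onload_stackdump lots | grep polls" output on stdin)
# Hypothetical helper: compares u_polls with k_polls and prints a summary.
poll_summary() {
    awk '
        BEGIN { k = 0; u = 0 }
        {
            n = split($0, kv, ":")
            if (n < 2) next
            name = kv[1]
            gsub(/[[:space:]]/, "", name)       # tolerate indentation
            if (name == "k_polls") k = kv[2] + 0
            if (name == "u_polls") u = kv[2] + 0
        }
        END {
            if (u > k) level = "user"; else level = "kernel"
            print "mostly " level " level: u_polls=" u " k_polls=" k
        }'
}

# Example:
#   onload_stackdump lots | grep polls | poll_summary
```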
Counter                   Description
k_polls                   Number of times the socket event queue was polled from the kernel.
u_polls                   Number of times the socket event queue was polled from user space.
periodic_polls            Number of times a periodic timer has polled for events.
interrupt_polls           Number of times an interrupt polled for network events.
deferred_polls            Number of times poll has been deferred to the stack lock holder.
timeout_interrupt_polls   Number of times timeout interrupts polled for network events.
As Few Interrupts as Possible
A tuned application will reach a balance between the number/rate of interrupts
processed and the amount of real work that gets done e.g. process multiple packets
per interrupt rather than one. Even spinning applications can benefit from the
occasional interrupt, e.g. when a spinning thread has been descheduled from a
CPU, an interrupt will prod the thread back to action when further work has to be
done.
# onload_stackdump lots | grep ^interrupt
Solution
If an application is observed taking lots of interrupts it may be beneficial to increase
the spin time with the EF_POLL_USEC variable or to set a higher interrupt
moderation value for the net driver using ethtool.
The number of interrupts on the system can also be identified from
/proc/interrupts.
Counter                    Description
Interrupts                 Total number of interrupts received for the stack.
Interrupt polls            Number of times the stack is polled - invoked by interrupt.
Interrupt evs              Number of events processed when invoked by an interrupt.
Interrupt wakes            Number of times the application is woken by interrupt.
Interrupt primes           Number of times interrupts are re-enabled (after spinning or polling the stack).
Interrupt no events        Number of stack polls for which there was no event to recover.
Interrupt lock contends    The application polled the stack and has the lock before an interrupt fired.
Interrupt budget limited   Number of times, when handling a poll in an interrupt, the poll was stopped when the NAPI budget was reached. Any remaining events are then processed on the stack workqueue.
Eliminating Drops
The performance of networks is impacted by any packet loss. This is especially
pronounced for reliable data transfer protocols that are built on top of unicast or
multicast UDP sockets.
First check to see if packets have been dropped by the network adapter before
reaching the Onload stack. Use ethtool to collect stats directly from the network
adapter:
# ethtool -S enps0f0 | grep drop
rx_noskb_drops: 0
port_rx_nodesc_drops: 0
port_rx_dp_di_dropped_packets: 681618610
Solution
If packet loss is observed at the network level due to a lack of receive buffering try
increasing the receive descriptor queue size via EF_RXQ_SIZE. If packet
drops are observed at the socket level consult the application documentation - it
may also be worth experimenting with socket buffer sizes (see EF_UDP_RCVBUF).
Setting the EF_EVS_PER_POLL variable to a higher value may also improve efficiency
- refer to Appendix A for a description of this variable.
Counter                         Description
rx_noskb_drops                  Number of packets dropped when there are no further socket buffers to use.
port_rx_nodesc_drops            Number of packets dropped when there are no further descriptors in the rx ring buffer to receive them.
port_rx_dp_di_dropped_packets   Number of packets dropped because filters indicate the packets should be dropped - this can happen when packets don’t match any filter or the matched filter indicates the packet should be dropped.
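When many statistics are reported it can help to list only the drop counters that are non-zero. The following is a minimal sketch that filters ethtool -S output; nonzero_drops is a hypothetical helper and it assumes the name: value line format used by ethtool.

```shell
# nonzero_drops   (reads "ethtool -S <interface>" output on stdin)
# Hypothetical helper: prints name=value for every counter whose name
# contains "drop" and whose value is greater than zero.
nonzero_drops() {
    awk -F: '
        /drop/ {
            name = $1
            gsub(/[[:space:]]/, "", name)       # strip indentation
            if ($2 + 0 > 0) print name "=" ($2 + 0)
        }'
}

# Example (the interface name is a placeholder):
#   ethtool -S enps0f0 | nonzero_drops
```

Counters are cumulative, so comparing two samples taken some time apart shows whether drops are still occurring.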
Minimizing Lock Contention
Lock contention can greatly affect performance. When threads share a stack, a
thread holding the stack lock will prevent another thread from doing useful work.
Applications with fewer threads may be able to create a stack per thread (see
EF_STACK_PER_THREAD and Stacks API on page 227).
Use onload_stackdump to identify instances of lock contention:
# onload_stackdump lots | egrep "(lock_)|(sleep)"
Counter                            Description
periodic_lock_contends             Number of times periodic timer could not get
                                   the stack lock.
interrupt_lock_contends            Number of times the interrupt handler could
                                   not get the stack lock because it is already
                                   held by user-level or other context.
timeout_interrupt_lock_contends    Number of times timeout interrupts could not
                                   lock the stack.
sock_sleeps                        Number of times a thread has blocked on a
                                   single socket.
sock_sleep_primes                  Number of times select/poll/epoll enabled
                                   interrupts.
unlock_slow                        Number of times the slow path was taken to
                                   unlock the stack lock.
unlock_slow_pkt_waiter             Number of times packet memory shortage
                                   provoked the unlock slow path.
unlock_slow_socket_list            Number of times the deferred socket list
                                   provoked the unlock slow path.
unlock_slow_need_prime             Number of times interrupt priming provoked
                                   the unlock slow path.
unlock_slow_wake                   Number of times the unlock slow path was
                                   taken to wake threads.
unlock_slow_swf_update             Number of times the unlock slow path was
                                   taken to update sw filters.
unlock_slow_close                  Number of times the unlock slow path was
                                   taken to close sockets/pipes.
unlock_slow_syscall                Number of times a syscall was needed on the
                                   unlock slow path.
lock_wakes                         Number of times a thread is woken when
                                   blocked on the stack lock.
stack_lock_buzz                    Number of times a thread has spun waiting
                                   for the stack lock.
sock_lock_sleeps                   Number of times a thread has slept waiting
                                   for a sock lock.
sock_lock_buzz                     Number of times a thread has spun waiting
                                   for a sock lock.
tcp_send_ni_lock_contends          Number of times TCP sendmsg() contended the
                                   stack lock.
udp_send_ni_lock_contends          Number of times UDP sendmsg() contended the
                                   stack lock.
getsockopt_ni_lock_contends        Number of times getsockopt() contended the
                                   stack lock.
setsockopt_ni_lock_contends        Number of times setsockopt() contended the
                                   stack lock.
lock_dropped_icmps                 Number of dropped ICMP messages not
                                   processed due to contention.

Solution
Performance will be improved when stack contention is kept to a minimum. When
threads share a stack it is preferable for a thread to spin rather than sleep when
waiting for a stack lock. The EF_BUZZ_USEC value can be increased to reduce
'sleeps'. Where possible use stacks per process.
6 Onload Functionality
This chapter provides detailed information about specific aspects of Solarflare
Onload operation and functionality.
6.1 Onload Transparency
Onload provides significantly improved performance without the need to rewrite or
recompile the user application, whilst retaining complete interoperability with the
standard TCP and UDP protocols.
In the regular kernel TCP/IP architecture an application is dynamically linked to the
libc library. This OS library provides support for the standard BSD sockets API via a
set of 'wrapper' functions with real processing occurring at the kernel level. Onload
also supports the standard BSD sockets API. However, in contrast to the kernel
TCP/IP, Onload moves protocol processing out of the kernel space and into the
user-level Onload library itself.
As a networking application invokes the standard socket API function calls, e.g.
socket(), read(), write() etc., these are intercepted by the Onload library making
use of the LD_PRELOAD mechanism on Linux. From each function call, Onload will
examine the file descriptor, identifying those sockets using a Solarflare interface,
which are processed by the Onload stack, whilst those not using a Solarflare
interface are transparently passed to the kernel stack.
6.2 Onload Stacks
An Onload 'stack' is an instance of a TCP/IP stack. The stack includes transmit and
receive buffers, open connections and the associated port numbers and stack
options. Each stack has associated with it one or more Virtual NICs (typically one per
physical port that stack is using).
In normal usage, each accelerated process will have its own Onload stack shared by
all connections created by the process. It is also possible for multiple processes to
share a single Onload stack instance (refer to Stack Sharing on page 67), and for a
single application to have more than one Onload stack. Refer to Onload Extensions
API on page 221.
6.3 Virtual Network Interface (VNIC)
The Solarflare network adapter supports 1024 transmit queues, 1024 receive
queues, 1024 event queues and 1024 timer resources per network port. A VNIC
(virtual network interface) consists of one unique instance of each of these
resources, which allows the VNIC client, i.e. the Onload stack, an isolated and safe
mechanism of sending and receiving network traffic. Received packets are steered
to the correct VNIC by means of IP/MAC filter tables on the network adapter and/or
Receive Side Scaling (RSS). An Onload stack allocates one VNIC per Solarflare
network port so it has a dedicated send and receive channel from user mode.
Following a reset of the Solarflare network adapter driver, all virtual interface
resources including Onload stacks and sockets will be reinstated. The reset
operation will be transparent to the application, but traffic will be lost during the
reset.
6.4 Functional Overview
When establishing its first socket, an application is allocated an Onload stack which
allocates the required VNICs.
When a packet arrives, IP filtering in the adapter identifies the socket and the data
is written to the next available receive buffer in the corresponding Onload stack. The
adapter then writes an event to an "event queue" managed by Onload. If the
application is regularly making socket calls, Onload is regularly polling this event
queue, and processing events directly, rather than relying on interrupts, is the normal
means by which an application is able to rendezvous with its data.
User-level processing significantly reduces kernel/user-level context switching, and
interrupts are only required when the application blocks, since when the
application is making socket calls, Onload is busy processing the event queue picking
up new network events.
6.5 Onload with Mixed Network Adapters
A server may be equipped with Solarflare network interfaces and non-Solarflare
network interfaces. When an application is accelerated, Onload reads the Linux
kernel routing table (Onload will only consider the kernel default routing table) to
identify which network interface is required to make a connection. If a
non-Solarflare interface is required to reach a destination, Onload will pass the
connection to the kernel TCP/IP stack. No additional configuration is required to
achieve this as Onload does this automatically by looking in the IP route table.
6.6 Maximum Number of Network Interfaces
A maximum of 32 network interfaces can be registered with the Onload driver.
By default, Onload supports up to 8 network interfaces per stack. This limit can be
changed by altering the CI_CFG_MAX_INTERFACES value in the
src/include/ci/internal/transport_config_opt.h header file within the source code.
Following changes to this value it is necessary to rebuild and reinstall Onload.
6.7 Whitelist and Blacklist Interfaces
Supported from Onload 201502, the user is able to select which Solarflare interfaces
are to be used by Onload.
The intf_white_list Onload module option is a space-separated list of Solarflare
network adapter interfaces that Onload will use for network I/O.
• Interfaces identified in the whitelist will always be accelerated by Onload.
• Interfaces NOT identified in the whitelist will not be accelerated by Onload.
• An empty whitelist means that ALL Solarflare interfaces will be accelerated.
The intf_black_list Onload module option is a space-separated list of Solarflare
network adapter interfaces that Onload will not use for network I/O.
When an interface appears in both lists, the blacklist takes priority. Renaming of
interfaces after Onload has started will not be reflected in the access lists, and
changes to the lists will only affect Onload stacks created after such changes, not
currently running stacks.
Onload module options can be specified in a user-created file in the
/etc/modprobe.d directory:
options onload intf_white_list=eth4
options onload intf_black_list="eth5 eth6"
Use double quotes and a space separator when specifying multiple interfaces.
These options are applied globally and cannot be applied to individual stacks.
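
The whitelist and blacklist rules in section 6.7 can be summarized as a small decision function. The Python sketch below is only a model of the rules as stated in this section, not Onload code; the names parse_interface_list and is_accelerated are hypothetical, and the interface passed in is assumed to be a Solarflare interface.

```python
def parse_interface_list(option_value):
    """Split a space-separated module option value into a set of interface names."""
    return set(option_value.split())


def is_accelerated(interface, white_list="", black_list=""):
    """Model of the section 6.7 rules for a single Solarflare interface."""
    white = parse_interface_list(white_list)
    black = parse_interface_list(black_list)
    if interface in black:
        # The blacklist takes priority when an interface is in both lists.
        return False
    if not white:
        # An empty whitelist means all Solarflare interfaces are accelerated.
        return True
    return interface in white


# Values mirror the modprobe.d example in section 6.7.
print(is_accelerated("eth4", white_list="eth4", black_list="eth5 eth6"))
print(is_accelerated("eth5", white_list="eth4", black_list="eth5 eth6"))
```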
6.8 Onloaded PIDs
To identify processes accelerated by Onload use the onload_fuser command:
# onload_fuser -v
9886 ping
Only processes that have created an Onload stack are present. Processes which are
loaded under Onload, but have not created any sockets, are not present. The
onload_stackdump command can also list accelerated processes. See List
Onloaded Processes on page 262 for details.
6.9 Onload and File Descriptors, Stacks and Sockets
For an Onloaded process it is possible to identify the file descriptors, Onload stacks
and sockets being accelerated by Onload. Use the /proc/<PID>/fd file, supplying
the PID of the accelerated process, e.g.
# ls -l /proc/9886/fd
total 0
lrwx------ 1 root root 64 May 14 14:09 0 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 1 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 2 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 3 -> onload:[tcp:6:3]
lrwx------ 1 root root 64 May 14 14:09 4 -> /dev/pts/0
lrwx------ 1 root root 64 May 14 14:09 5 -> /dev/onload
lrwx------ 1 root root 64 May 14 14:09 6 -> onload:[udp:6:2]
Accelerated file descriptors are listed as symbolic links to /dev/onload. Accelerated
sockets are described in [protocol:stack:socket] format.
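
The [protocol:stack:socket] link format lends itself to simple scripting. The following Python sketch shows one possible way to extract the protocol, stack and socket identifiers from the symbolic link targets in /proc/<PID>/fd. The function names are hypothetical, and the pattern is an assumption based only on the two link targets shown in the listing above; other link forms may exist.

```python
import os
import re

# Pattern assumed from the listing, e.g. "onload:[tcp:6:3]".
ONLOAD_LINK = re.compile(r"^onload:\[(\w+):(\d+):(\d+)\]$")


def parse_onload_link(target):
    """Return (protocol, stack_id, socket_id), or None for non-Onload targets."""
    match = ONLOAD_LINK.match(target)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


def accelerated_sockets(pid):
    """Map fd number -> (protocol, stack_id, socket_id) for one process."""
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while we were scanning
        parsed = parse_onload_link(target)
        if parsed is not None:
            result[int(name)] = parsed
    return result


# Link targets taken from the example listing.
for target in ("onload:[tcp:6:3]", "onload:[udp:6:2]", "/dev/pts/0"):
    print(target, "->", parse_onload_link(target))
```

Calling accelerated_sockets(9886) on the host from the example would be expected to report descriptors 3 and 6; reading another user's /proc/<PID>/fd directory normally requires appropriate privileges.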
6.10 System calls intercepted by Onload
System calls intercepted by the Onload library are listed in the following file:
[onload]/src/include/onload/declare_syscalls.h.tmpl
6.11 Linux Sysctls
The Linux directory /proc/sys/net/ipv4 contains default settings which tune
different parts of the IPv4 networking stack. In many cases Onload takes its default
settings from the values in this directory. In some cases the default can be
overridden, for a specified process or socket, using socket options or with Onload
environment variables. The following tables identify the default Linux values and
how Onload tuning parameters can override the Linux settings.

Kernel Value    tcp_slow_start_after_idle
Description     Controls congestion window validation as per RFC 2861.
Onload value    Off by default in Onload, as it is not usually useful in
                modern switched networks.
Comments        #define CI_CFG_CONGESTION_WINDOW_VALIDATION in
                transport_config_opt.h. Recompile after changing.

Kernel Value    tcp_congestion_control
Description     Determines what congestion control algorithm is used by TCP.
                Valid settings include reno, bic and cubic.
Onload value    No direct equivalent. See the section on TCP Congestion
                Control.
Comments        See EF_CONG_AVOID_SCALE_BACK.

Kernel Value    tcp_adv_win_scale
Description     Defines how quickly the TCP window will advance.
Onload value    No direct equivalent. See the section on TCP Congestion
                Control.
Comments        See EF_TCP_ADV_WIN_SCALE_MAX.

Kernel Value    tcp_rmem
Description     The default size of sockets' receive buffers (in bytes).
Onload value    Defaults to the currently active Linux settings, but is ignored
                on TCP accepted sockets. Refer to
                EF_TCP_RCVBUF_ESTABLISHED_DEFAULT.
Comments        Can be overridden with the SO_RCVBUF socket option.
                Can be set with EF_TCP_RCVBUF.

Kernel Value    tcp_wmem
Description     The default size of sockets' send buffers (in bytes).
Onload value    Defaults to the currently active Linux settings.
Comments        EF_TCP_SNDBUF overrides SO_SNDBUF, which overrides
                tcp_wmem.

Kernel Value    tcp_dsack
Description     Allows TCP to send duplicate SACKs.
Onload value    Uses the currently active Linux settings.
Comments

Kernel Value    tcp_fack
Description     Enables forward acknowledgment algorithm.
Onload value    Enabled by default. Onload uses the currently active Linux
                setting.
Comments

Kernel Value    tcp_sack
Description     Enable TCP selective acknowledgments, as per RFC 2018.
Onload value    Enabled by default. Onload uses the currently active Linux
                setting.
Comments        Clear bit 2 of EF_TCP_SYN_OPTS to disable.

Kernel Value    tcp_max_syn_backlog
Description     The maximum size of a listening socket's backlog.
Onload value    Set with EF_TCP_BACKLOG_MAX.
Comments

Kernel Value    tcp_synack_retries
Description     The maximum number of retries of SYN-ACKs.
Onload value    Set with EF_RETRANSMIT_THRESHOLD_SYNACK.
Comments        Default value 5.

Refer to the Parameter Reference on page 163 for details of environment variables.
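
When comparing Onload behavior against the kernel defaults listed in section 6.11, it can be convenient to read the live values from /proc/sys/net/ipv4. The Python sketch below is one way this might be done; the helper names are hypothetical. Multi-valued entries such as tcp_rmem are returned as a list of integers, and textual entries such as tcp_congestion_control are left as strings.

```python
import os


def parse_sysctl_value(text):
    """Convert the raw text of a sysctl file into a list of ints and/or strings."""
    values = []
    for field in text.split():
        # Numeric fields (optionally negative) become ints; others stay as text.
        values.append(int(field) if field.lstrip("-").isdigit() else field)
    return values


def read_ipv4_sysctl(name, base="/proc/sys/net/ipv4"):
    """Read one entry, e.g. read_ipv4_sysctl("tcp_rmem")."""
    with open(os.path.join(base, name)) as handle:
        return parse_sysctl_value(handle.read())


# Example raw contents; the numbers are illustrative, not recommended settings.
print(parse_sysctl_value("4096\t87380\t6291456\n"))
print(parse_sysctl_value("cubic\n"))
```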
6.12 Changing Onload Control Plane Table Sizes
Onload supports the following runtime configurable options which determine the
size of control plane tables:

Option                   Description                                    Default
max_layer2_interfaces    Sets the maximum number of network             50
                         interfaces, including physical interfaces,
                         VLANs and bonds, supported in Onload's
                         control plane.
max_local_addrs          Sets the maximum number of local network       256
                         addresses supported in Onload's control
                         plane.
max_neighs               Sets the maximum number of rows in the         1024
                         Onload ARP/neighbour table. The value is
                         rounded up to a power of two.
max_routes               Sets the maximum number of entries in the      256
                         Onload route table.

The table above identifies the default values for the Onload control plane tables. The
default values are normally sufficient for the majority of applications and creating
larger tables may impact application performance. If non-default values are needed,
the user should create a file in the /etc/modprobe.d directory. The file must have
the .conf extension and Onload options can be added to the file, a single option per
line, in the following format:
options onload max_neighs=512
Following changes, Onload should be restarted using the reload command:
# onload_tool reload
6.13 SO_BINDTODEVICE
In response to the setsockopt() function call with SO_BINDTODEVICE, sockets
identifying non-Solarflare interfaces will be handled by the kernel and all sockets
identifying Solarflare interfaces will be handled by Onload. All sends from a socket
are sent via the bound interface, and all TCP, UDP and Multicast packets received via
the bound interface are delivered only to the socket bound to the interface.
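
To illustrate the setsockopt() call with SO_BINDTODEVICE referred to in section 6.13, the following Python sketch binds a UDP socket to a named interface. The interface name "eth2" and the helper names are hypothetical. The option is Linux-specific and may require elevated privileges on some kernels, so the call is wrapped in an error handler.

```python
import socket

# SO_BINDTODEVICE is Linux-specific; fall back to its Linux value (25) if this
# Python build does not export the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def device_optval(ifname):
    """Encode an interface name as the NUL-terminated option value."""
    return ifname.encode("ascii") + b"\0"


def bind_to_device(sock, ifname):
    """Bind sock to ifname; raises OSError if the bind is refused."""
    sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, device_optval(ifname))


if __name__ == "__main__":
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        bind_to_device(udp, "eth2")  # "eth2" is a hypothetical interface name
        print("bound to eth2")
    except OSError as err:
        # Typical causes: no such device, or insufficient privileges.
        print("could not bind to device:", err)
    finally:
        udp.close()
```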
6.14 Multiplexed I/O
Linux supports three common methods for handling multiplexed I/O operation:
poll(), select() and the epoll set of functions.
The general behavior of the poll(), select() and epoll_wait() functions with
OpenOnload is as follows:
• If there are operations ready on any file descriptors, poll(), select() and
  epoll_wait() will return immediately. Refer to the Poll, Select and Epoll
  subsections for specific behavior details.
• If there are no file descriptors ready and spinning is not enabled, calls to
  poll(), select() and epoll_wait() will enter the kernel and block.
• In the cases of poll() and select(), when the set contains file descriptors
  that are not accelerated sockets, there is a slight latency overhead as Onload
  must make a system call to determine the readiness of these sockets. There is
  no such cost when using epoll_wait() and a system call is only needed when
  non-Onload descriptors become ready.
  To ensure that non-accelerated (kernel) file descriptors are checked when there
  are no events ready on accelerated (onload) descriptors, disable the following
  options:
  EF_SELECT_FAST and EF_POLL_FAST - setting both to zero.
  EF_POLL_FAST_USEC and EF_SELECT_FAST_USEC - setting both to zero.
• If there are no file descriptors ready and spinning is enabled, OpenOnload will
  spin to ensure that accelerated sockets are polled a specified number of times
  before unaccelerated sockets are examined. This reduces the overhead
  incurred when OpenOnload has to call into the kernel and reduces latency on
  accelerated sockets.
The following subsections discuss the use of these I/O functions and OpenOnload
environment variables that can be used to manipulate behavior of the I/O
operation.
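
The ready and not-ready cases described in section 6.14 can be seen with any epoll-based program. The Python sketch below shows the generic calling pattern only: it uses a local socketpair() as a stand-in for application sockets (such UNIX-domain sockets are not expected to be accelerated), and nothing in it is specific to Onload.

```python
import select
import socket

# A connected pair of local sockets stands in for application sockets.
reader, writer = socket.socketpair()
reader_fd = reader.fileno()

epoll = select.epoll()
epoll.register(reader_fd, select.EPOLLIN)

# Nothing has been written yet, so a zero-timeout wait reports no ready
# descriptors.
idle_events = epoll.poll(0)

# Once data is queued, the wait returns immediately with the ready descriptor.
writer.send(b"ping")
ready_events = epoll.poll(1.0)
ready_fds = [fd for fd, mask in ready_events if mask & select.EPOLLIN]
payload = reader.recv(16)

epoll.close()
reader.close()
writer.close()
print(idle_events, ready_fds == [reader_fd], payload)
```

Because Onload intercepts the underlying library calls, a program of this shape needs no source changes to run under Onload; whether a given descriptor is accelerated depends on the interface it uses.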
Poll, ppoll
The poll(), ppoll() file descriptor set can consist of both accelerated and
non-accelerated file descriptors. The environment variable EF_UL_POLL enables/
disables acceleration of the poll(), ppoll() function calls. Onload supports the
following options for the EF_UL_POLL variable:

Value   Behaviour
0       Disable acceleration at user-level. Calls to poll(), ppoll() are
        handled by the kernel.
        Spinning cannot be enabled.
1       Enable acceleration at user-level. Calls to poll(), ppoll() are
        processed at user-level.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.

Additional environment variables can be employed to control the poll(), ppoll()
functions and to give priority to accelerated sockets over non-accelerated sockets
and other file descriptors. Refer to EF_POLL_FAST, EF_POLL_FAST_USEC and
EF_POLL_SPIN in Parameter Reference on page 163.
Select, pselect
The select(), pselect() file descriptor set can consist of both accelerated and
non-accelerated file descriptors. The environment variable EF_UL_SELECT enables/
disables acceleration of the select(), pselect() function calls. Onload supports
the following options for the EF_UL_SELECT variable:

Value   Behaviour
0       Disable acceleration at user-level. Calls to select(), pselect() are
        handled by the kernel.
        Spinning cannot be enabled.
1       Enable acceleration at user-level. Calls to select(), pselect() are
        processed at user-level.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.

Additional environment variables can be employed to control the select(),
pselect() functions and to give priority to accelerated sockets over
non-accelerated sockets and other file descriptors. Refer to EF_SELECT_FAST and
EF_SELECT_SPIN in Parameter Reference on page 163.
Epoll
The epoll set of functions, epoll_create(), epoll_ctl(), epoll_wait(),
epoll_pwait(), are accelerated in the same way as poll and select. The
environment variable EF_UL_EPOLL enables/disables epoll acceleration. Refer to
the release change log for enhancements and changes to epoll behavior.
Using Onload an epoll set can consist of both Onload file descriptors and kernel file
descriptors. Onload supports the following options for the EF_UL_EPOLL
environment variable:

Value   Epoll Behaviour
0       Accelerated epoll is disabled and epoll_ctl(), epoll_wait() and
        epoll_pwait() function calls are processed in the kernel. Other
        function calls such as send() and recv() are still accelerated.
        Interrupt avoidance does not function and spinning cannot be enabled.
        If a socket is handed over to the kernel stack after it has been added to
        an epoll set, it will be dropped from the epoll set.
        onload_ordered_epoll_wait() is not supported.
1       Function calls to epoll_ctl(), epoll_wait(), epoll_pwait() are
        processed at user-level.
        Delivers best latency except when the number of accelerated file
        descriptors in the epoll set is very large. This option also gives the best
        acceleration of epoll_ctl() calls.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.
        CPU overhead and latency increase with the number of file descriptors
        in the epoll set.
        onload_ordered_epoll_wait() is supported.
2       Calls to epoll_ctl(), epoll_wait(), epoll_pwait() are processed in
        the kernel.
        Delivers best performance for large numbers of accelerated file
        descriptors.
        Spinning can be enabled and interrupts are avoided until an application
        blocks.
        CPU overhead and latency are independent of the number of file
        descriptors in the epoll set.
        onload_ordered_epoll_wait() is not supported.
3       Function calls to epoll_ctl(), epoll_wait(), epoll_pwait() are
        processed at user-level.
        Delivers best acceleration latency for epoll_ctl() calls and scales well
        when the number of accelerated file descriptors in the epoll set is very
        large and all sockets are in the same stack. The cost of the
        epoll_wait() is independent of the number of accelerated file
        descriptors in the set and depends only on the number of descriptors
        that become ready. The benefits will be less if sockets exist in different
        Onload stacks and in this case the recommendation is to use
        EF_UL_EPOLL=2.
        EF_UL_EPOLL=3 does not allow monitoring the readiness of the epoll
        file descriptors from another epoll/poll/select.
        EF_UL_EPOLL=3 cannot support epoll sets which exist across fork().
        Spinning can be enabled and interrupts are avoided until an application
        blocks.
        onload_ordered_epoll_wait() is supported.

The relative performance of epoll options 1 and 2 depends on the details of
application behavior as well as the number of accelerated file descriptors in the
epoll set. Behavior may also differ between earlier and later kernels and between
Linux real-time and non-real-time kernels. Generally the OS will allocate short time
slices to a user-level CPU intensive application, which may result in performance
degradation (latency spikes). A kernel-level CPU intensive process is less likely to be
descheduled, resulting in better performance. Solarflare recommend the user evaluate
options 1 and 2 for applications that manage many file descriptors, or try option 3
(onload-201502 and later) when using very large sets and all sockets are in the same
stack.
Additional environment variables can be employed to control the epoll_ctl(),
epoll_wait() and epoll_pwait() functions and to give priority to accelerated
sockets over non-accelerated sockets and other file descriptors. Refer to
EF_EPOLL_CTL_FAST, EF_EPOLL_SPIN and EF_EPOLL_MT_SAFE in Parameter
Reference on page 163.
Refer to epoll - Known Issues on page 137.
6.15 Wire Order Delivery
When a TCP or UDP application is working with multiple network sockets
simultaneously it is difficult to ensure data is delivered to the application in the strict
order it was received from the wire across these sockets.
The onload_ordered_epoll_wait() API is an Onload alternative implementation
of epoll_wait() providing additional data allowing a receiving application to
recover in-order timestamped data from multiple sockets. To maintain wire order
delivery, only a specific number of bytes, as identified by the
onload_ordered_epoll_event, should be recovered from a ready socket.
• Ordering is done on a per-stack basis for TCP and UDP sockets. Sockets must
  be in the same Onload stack.
• Only data received from an Onload stack with a hardware timestamp will be
  ordered. The environment variable EF_RX_TIMESTAMPING should be enabled.
  File descriptors where timestamping information is not available may be
  included in the epoll set, but received data will be returned from these
  unordered.
• The application must use the epoll API and the
  onload_ordered_epoll_wait() function.
• The application must set the per-process environment variable
  EF_UL_EPOLL=1 or EF_UL_EPOLL=3.
• EPOLLET and ONESHOT flags should NOT be used.
• A return value of zero from the wait function indicates there are no file
  descriptors ready with ordered data. Unordered data may still be available.
Figure 6 demonstrates the Wire Order Delivery feature.
Figure 6: Wire Order Delivery
onload_ordered_epoll_wait() returning at point X would allow the following
data to be recovered:
• Socket A: timestamp of packet 1, bytes in packet 1.
• Socket B: timestamp of packet 2, bytes in packets 2 and 3.
onload_ordered_epoll_wait() returning again would recover timestamp of
packet 4 and bytes in packet 4.
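
The bookkeeping behind the Figure 6 example can be modelled in a few lines. The Python sketch below is a simplified model, not the Onload implementation or its API: it takes a list of (timestamp, socket, bytes) tuples, sorts them into wire order and merges consecutive packets on the same socket into a single read. The timestamps and sizes are invented, and placing packet 4 on socket A is an assumption, since the figure is not reproduced here.

```python
from itertools import groupby


def ordered_reads(packets):
    """Return [(socket, first_timestamp, total_bytes)] in wire order.

    packets is an iterable of (timestamp, socket_name, nbytes) tuples.
    Consecutive packets on the same socket are merged, because they can be
    read together without overtaking data on another socket.
    """
    in_wire_order = sorted(packets, key=lambda pkt: pkt[0])
    reads = []
    for sock, run in groupby(in_wire_order, key=lambda pkt: pkt[1]):
        run = list(run)
        reads.append((sock, run[0][0], sum(pkt[2] for pkt in run)))
    return reads


# Hypothetical arrival pattern loosely following the Figure 6 description.
packets = [
    (1.0, "A", 100),  # packet 1
    (2.0, "B", 200),  # packet 2
    (3.0, "B", 50),   # packet 3
    (4.0, "A", 80),   # packet 4 (socket assumed)
]
for read in ordered_reads(packets):
    print(read)
```

In this model the application would read 100 bytes from socket A, then 250 bytes from socket B, before returning to socket A, which mirrors the rule that only the indicated number of bytes should be recovered from each ready socket.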
The Wire Order Delivery feature is only available on Solarflare Flareon adapters
having a PTP/HW timestamping license. When receiving across multiple adapters,
Solarflare sfptpd (PTP) can ensure that adapters are closely synchronized with each
other and, if required, with an external PTP clock source.
Wire Order Delivery - Example API:
The Onload distribution includes example client/server applications to demonstrate
the wire order feature:
wire_order_server - Uses onload_ordered_epoll_wait to receive ordered
data over a set of sockets. Received data is echoed back to the client on a single reply
socket.
wire_order_client - Sends sequenced data across the socket set, reads the reply
data from the server and ensures data is received in sequence.
Source code for the wire order API is available in:
openonload-<version>/src/tests/onload/wire_order
The example API is not compiled as part of the Onload install process. To build it,
do the following:
Ensure the mmake tool is in the current path (it can be found in the
openonload-<version>/scripts directory):
# export PATH=$PATH:/openonload-<version>/scripts
# cd /openonload-<version>/build/gnu_x86_64/tests/onload/wire_order
# USEONLOADEXT=1 make
To run the server:
# EF_RX_TIMESTAMPING=3 onload ./wire_order_server
To run the client:
# onload --profile=latency ./wire_order_client <ipserver>
By default the client will send data over 100 TCP sockets, controlled with the -s
option. UDP can be selected using the -U option.
NOTE: To prevent sends being reordered between streams, the latency profile
should be used on the client side. The environment variable EF_RX_TIMESTAMPING
must be set on the server side.
6.16 Stack Sharing
By default each process using Onload has its own 'stack'. Refer to Onload Stacks for
definition. Several processes can be made to share a single stack, using the EF_NAME
environment variable. Processes with the same value for EF_NAME in their
environment will share a stack.
Stack sharing is one supported method to enable multiple processes using Onload
to be accelerated when receiving the same multicast stream or to allow one
application to receive a multicast stream generated locally by a second application.
Other methods to achieve this are Multicast Replication and Hardware Multicast
Loopback.
Stacks may also be shared by multiple processes in order to preserve and control
resources within the system. Stack sharing can be employed by processes handling
TCP as well as UDP sockets.
Stack sharing should only be requested if there is a trust relationship between the
processes. If two processes share a stack then they are not completely isolated: a
bug in one process may impact the other, or one process can gain access to the
other's privileged information (i.e. breach security). Once the EF_NAME variable is
set, any process on the local host can set the same value and gain access to the
stack.
By default Onload stacks can only be shared with processes having the same UID.
The EF_SHARE_WITH environment variable provides additional security while
allowing a different UID to share a stack. Refer to Parameter Reference on page 163
for a description of the EF_NAME and EF_SHARE_WITH variables.
Processes sharing an Onload stack should also not use huge pages. Onload will
issue a warning at startup and prevent the allocation of huge pages if
EF_SHARE_WITH identifies a UID of another process or is set to -1. If a process P1
creates an Onload stack, but is not using huge pages, and another process P2
attempts to share the Onload stack by setting EF_NAME, the stack options set by P1
will apply and allocation of huge pages in P2 will be prevented.
An alternative method of implementing stack sharing is to use the Onload
Extensions API and the onload_set_stackname() function which, through its
scope parameter, can limit stack access to the processes created by a particular user.
Refer to Onload Extensions API on page 221 for details.
6.17 Application Clustering
An application cluster is the set of Onload TCP or UDP stack sockets bound to the
same port. This feature dramatically improves the scaling of some applications
across multiple CPUs (especially those establishing many sockets from a TCP
listening socket).
Onload from version 201405 automatically creates a cluster using the
SO_REUSEPORT socket option. TCP or UDP processes running on RHEL 6.5 (and
later) setting this option can bind multiple sockets to the same TCP or UDP port.
NOTE: Some older Linux kernel/distributions do not have kernel support for
SO_REUSEPORT (introduced in the Linux 3.9 kernel). Onload contains experimental
support for SO_REUSEPORT on older kernel versions but this has yet to be fully
tested and verified by Solarflare. Users are free to try the Onload application
clustering feature on these kernels and report their findings via email to
support@solarflare.com.
For TCP, clustering allows the established connections resulting from a listening
socket to be spread over a number of Onload stacks. Each thread/process creates its
own listening socket (using SO_REUSEPORT) on the same port, with each listening
socket residing in its own Onload stack. Handling of incoming new TCP connections
is spread via the adapter (using RSS) over the application cluster and therefore
over each of the listening sockets, resulting in each Onload stack, and therefore each
thread/process, handling a subset of the total traffic as illustrated in Figure 7 below.
Figure 7: Application Clustering - TCP
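
The socket-level pattern that application clustering builds on is the standard SO_REUSEPORT option. The Python sketch below shows two listening sockets bound to the same TCP port. It is a generic Linux example using the loopback address, which is not expected to be accelerated; under Onload each listener would typically be created by a separate thread or process so that it resides in its own stack. The helper name reuseport_listener is hypothetical.

```python
import socket


def reuseport_listener(host, port):
    """Create a TCP listening socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


# Port 0 lets the kernel choose a free port for the first listener; the second
# listener then binds to that same port.
first = reuseport_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = reuseport_listener("127.0.0.1", port)

ports = (first.getsockname()[1], second.getsockname()[1])
first.close()
second.close()
print("both listeners bound to the same port:", ports[0] == ports[1])
```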
For UDP, clustering allows UDP unicast traffic to be spread over multiple applications
with each application receiving a subset of the total traffic load.
Existing applications that do not use SO_REUSEPORT can use the application
clustering feature without the need for recompilation by using the Onload
EF_TCP_FORCE_REUSEPORT or EF_UDP_FORCE_REUSEPORT environment variables
identifying the list of ports to which SO_REUSEPORT will be applied.
The size or number of socket members of a cluster in Onload is controlled with
EF_CLUSTER_SIZE. To create a cluster the application sets the cluster name with
EF_CLUSTER_NAME. A cluster of EF_CLUSTER_SIZE is then created.
NOTE: The number of socket members must equal the EF_CLUSTER_SIZE value
otherwise a portion of the received traffic will be lost.
The spread of received traffic between cluster sockets employs Receive Side Scaling
(RSS). For TCP the RSS hash is a function of the src_ip:src_port, dst_ip:dst_port. For
UDP the RSS hash is a function of the src_ip and dst_ip only.
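
One practical consequence of the hash inputs described above is that all UDP traffic between a given pair of IP addresses lands on the same cluster member, whereas TCP connections between the same pair of hosts can be spread by port. The Python sketch below illustrates this with a stand-in hash (CRC32). This is emphatically not the hash used by the adapter; only the choice of input fields follows the text, and all function names and addresses are hypothetical.

```python
import zlib


def cluster_member(fields, cluster_size):
    """Map a tuple of header fields to a cluster member index (stand-in hash)."""
    key = "|".join(str(field) for field in fields).encode("ascii")
    return zlib.crc32(key) % cluster_size


def tcp_member(src_ip, src_port, dst_ip, dst_port, cluster_size):
    # TCP: the hash covers addresses and ports.
    return cluster_member((src_ip, src_port, dst_ip, dst_port), cluster_size)


def udp_member(src_ip, src_port, dst_ip, dst_port, cluster_size):
    # UDP: ports are ignored, only the two IP addresses are hashed.
    return cluster_member((src_ip, dst_ip), cluster_size)


CLUSTER_SIZE = 4
SOURCE_PORTS = range(40000, 40016)

udp_members = {udp_member("10.0.0.1", port, "10.0.0.2", 5000, CLUSTER_SIZE)
               for port in SOURCE_PORTS}
tcp_members = {tcp_member("10.0.0.1", port, "10.0.0.2", 5000, CLUSTER_SIZE)
               for port in SOURCE_PORTS}

print("distinct UDP members:", len(udp_members))
print("distinct TCP members:", len(tcp_members))
```

When testing a UDP cluster, traffic therefore needs to come from several source addresses for the load to be shared between cluster members.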
The reception of traffic within a cluster is dependent on port numbers only. If two
sockets bind to the same port, but different IP addresses, a portion of traffic
destined for one socket can be received (but dropped by Onload) on the other
socket. For correct behavior, all cluster members should bind to the same IP address.
This limitation has been removed in the Onload 201509 release so that it is possible
to create multiple listening sockets bound to the same port but to different
addresses.
Restarting an application that includes cluster socket members can fail when orphan
stacks are still present. Use EF_CLUSTER_RESTART to force termination of orphaned
stacks, allowing the creation of the new cluster.
Refer to Limitations on page 129 for details of Application Clustering limitations.
6.18 Bonding, Link aggregation and Failover
Bonding (aka teaming) allows for improved reliability and increased bandwidth by
combining physical ports from one or more Solarflare adapters into a bond. A bond
has a single IP address, single MAC address and functions as a single port or single
adapter to provide redundancy.
Onload monitors the OS configuration of the standard kernel bonding module and
accelerates traffic over bonds that are detected as suitable (see limitations). As a
result no special configuration is required to accelerate traffic over bonded
interfaces.
e.g. To configure an 802.3ad bond of two SFC interfaces (eth2 and eth3):
modprobe bonding miimon=100 mode=4 xmit_hash_policy=layer3+4
ifconfig bond0 up
Interfaces must be down before adding to the bond.
echo +eth2 > /sys/class/net/bond0/bonding/slaves
echo +eth3 > /sys/class/net/bond0/bonding/slaves
ifconfig bond0 192.168.1.1/24
The file /var/log/messages should then contain a line similar to:
[onload] Accelerating bond0 using Onload
Traffic over this interface will then be accelerated by Onload.
Refer to the Limitations section, Bonding, Link aggregation on page 134 for further
information.
To disable Onload acceleration of bonds, please contact support@solarflare.com.
6.19 Teaming
In addition to traditional Linux bonding, Onload also supports link aggregation using
the Linux teaming driver that is introduced in RHEL 7, SLES 12, and other recent
distributions. There are various methods to configure teaming. The example below
demonstrates the use of the NetworkManager CLI which creates the ifcfg files in
the /etc/sysconfig/network-scripts directory. Using nmcli, teams persist
across server reboots.
1 Create the team:
# nmcli connection add type team ifname teamA
Connection 'team-teamA' (b7c39a10-84ac-4840-85f2-66adb5e71183) successfully added.
2 List the created team:
# nmcli con show
NAME        UUID                                  TYPE            DEVICE
eno2        4efeb125-d489-4a06-9d8a-407bf03fcc77  802-3-ethernet  --
eno1        f270807d-9904-452e-bbd2-0d6b48840c80  802-3-ethernet  eno1
enp1s0f1    16192f4d-7a97-4154-924b-02ca905c8cd7  802-3-ethernet  --
team-teamA  b7c39a10-84ac-4840-85f2-66adb5e71183  team            teamA
virbr0      ceb1a683-7db9-4721-8ca9-38a577ff5d77  bridge          virbr0
enp1s0f0    cbc92b25-9855-451c-8c6b-186b6d5db9f6  802-3-ethernet  --
3 View default settings for the newly created team:
# cat /etc/sysconfig/network-scripts/ifcfg-team-teamA
DEVICE=teamA
DEVICETYPE=Team
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=team-teamA
UUID=b7c39a10-84ac-4840-85f2-66adb5e71183
ONBOOT=yes
4 Add primary interface to the team:
# nmcli con add type team-slave con-name teamA-port1 ifname enp1s0f0 master teamA
Connection 'teamA-port1' (015f09d7-3f2a-4578-aaea-7ff89a2769f7) successfully added.
5 Add a second interface to the team:
# nmcli con add type team-slave con-name teamA-port2 ifname enp1s0f1 master teamA
Connection 'teamA-port2' (92dfe561-860a-4906-842d-b7ebdf263dbe) successfully added.
6 Bring up the team ports:
# nmcli connection up teamA-port1
Connection successfully activated (D-Bus active path: /org/freedesktop/
NetworkManager/ActiveConnection/6)
Repeat command for other team ports.
7 Assign team IP addresses via ifcfg files or command line as required.
NOTE: Teams created with the teamd daemon are non-persistent. Teams created
with nmcli are persistent across server reboots.
To disable Onload acceleration of teaming, please contact support@solarflare.com.
6.20 VLANS
The division of a physical network into multiple broadcast domains or VLANs offers
improved scalability, security and network management.
Onload will accelerate traffic over suitable VLAN interfaces by default with no
additional configuration required.
e.g. to add an interface for VLAN 5 over an SFC interface (eth2):
modprobe onload
modprobe 8021q
vconfig add eth2 5
ifconfig eth2.5 192.168.1.1/24
Traffic over this interface will then be transparently accelerated by Onload.
Refer to the Limitations section, VLANs on page 134 for further information.
6.21 Accelerated pipe()
Onload supports the acceleration of pipes, providing an accelerated IPC mechanism
through which two processes on the same host can communicate using shared
memory at user level. Accelerated pipes do not invoke system calls. They
therefore reduce the overheads for read/write operations and offer improved
latency over the kernel implementation.
To create a user-level pipe, and before the pipe() or pipe2() function is called, a
process must be accelerated by Onload and must have created an Onload stack. By
default, an accelerated process that has not created an Onload stack is granted only
a non-accelerated pipe. See EF_PIPE for other options.
The accelerated pipe is created from the pool of available packet buffers.
The following function calls, related to pipes, will be accelerated by Onload and will
not enter the kernel unless they block:
pipe()
read()
write()
readv()
writev()
send()
recv()
recvmsg()
sendmsg()
poll()
select()
epoll_ctl()
epoll_wait()
As with TCP/UDP sockets, the Onload tuning options such as EF_POLL_USEC and
EF_SPIN_USEC will also influence performance of the user-level pipe.
Refer also to EF_PIPE, EF_PIPE_RECV_SPIN, EF_PIPE_SEND_SPIN in Parameter
Reference on page 163.
NOTE: Only anonymous pipes created with the pipe() or pipe2() function calls
will be accelerated.
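As a minimal sketch of the calls listed above, the following C fragment creates an anonymous pipe, writes a message and reads it back. The helper name pipe_roundtrip is hypothetical and used only for illustration. Nothing in the code is Onload-specific: whether the pipe is accelerated depends on the process being run under Onload and having already created an Onload stack, as described above.

```c
#include <string.h>
#include <unistd.h>

/* Write msg into an anonymous pipe and read it back into out.
 * Returns the number of bytes read, or -1 on error.
 * (Hypothetical helper, for illustration only.) */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_len)
{
    int fds[2];
    ssize_t n;

    /* Only anonymous pipes from pipe()/pipe2() are candidates for acceleration. */
    if (pipe(fds) != 0)
        return -1;

    n = write(fds[1], msg, strlen(msg));   /* fds[1] is the write end */
    if (n >= 0)
        n = read(fds[0], out, out_len);    /* fds[0] is the read end */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The same binary runs unmodified with or without Onload; without an Onload stack the process is simply given a kernel pipe.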
6.22 Zero-Copy API
The Onload Extensions API includes support for zero-copy of TCP transmit packets
and UDP receive packets. Refer to Zero-Copy API on page 236 for detailed
descriptions and example source code of the API.
6.23 Debug and Logging
Onload supports various debug and logging options. Logging and debug information
will be displayed on an attached console or will be sent to the syslog. To force all
debug to the syslog set the Onload environment variable EF_LOG_VIA_IOCTL=1.
For more information about debug/logging environment variables refer to
Parameter Reference on page 163.
To enable debug and logging using the options below, Onload must be installed with
debug enabled, e.g.:
# onload_install --debug
If Onload is already installed, uninstall, then reinstall with the --debug option as
shown above.
Log Levels:
• EF_UNIX_LOG.
• EF_LOG.
• EF_LOG_FILE - When EF_LOG_VIA_IOCTL is unset, the user is able to redirect
Onload output to a specified directory and file using the EF_LOG_FILE option.
Timestamps can also be added to the log file when EF_LOG_TIMESTAMPS is also
enabled.
EF_LOG_FILE=<path/file>
Note that kernel logging is still directed to the syslog.
• TP_LOG (bitmask) - useful for stack debugging. See Onload source code
/src/include/ci/internal/ip_log.h for bit values.
• Control plane module option:
  - cplane_debug_bits=[bitmask] - useful for kernel logging and events
    involving the control plane. See src/include/cplane/debug.h for bit
    values.
• Onload module options:
  - ci_tp_log=[bitmask] - useful for kernel logging and events involving an
    onload stack. See Onload source code /src/include/ci/internal/ip_log.h
    for details.
  - oo_debug_bits=[bitmask] - useful for kernel logging and events not
    involving an onload stack or the control plane. See
    src/include/onload/debug.h for bit values.
7 Time Stamps
7.1 Introduction
This section identifies options for using software and hardware timestamps.
7.2 Software Timestamps
Setting the SO_TIMESTAMP or SO_TIMESTAMPNS options using setsockopt()
enables software timestamping on TCP or UDP sockets. Functions such as recvmsg()
and recvmmsg(), together with the cmsg(3) macros, can then recover timestamps for
packets received at the socket.
Onload implements a microsecond resolution software timestamping mechanism,
which avoids the need for a per-packet system call, thereby reducing the normal
timestamp overheads.
The Solarflare adapter will always deliver received packets to the receive ring buffer
in the order that these arrive from the network. Onload will append a software
timestamp to the packet metadata when it retrieves a packet from the ring buffer -
before the packet is transferred to a waiting socket buffer.
Software Timestamp - TCP Stream
From a TCP stream the timestamp returned is that for the first available byte. Due
to retransmissions and any reordering, timestamps may not be monotonically
increasing as these are delivered to the application.
Software Timestamp - Interrupt Driven
When the Onload application is interrupt driven, a received packet is timestamped
when the event interrupt for the packet fires.
Software Timestamp - Spinning
If the Onload application is spinning, a received packet is timestamped when the
stack is polled, at which point the packet is placed on the socket receive queue.
Spinning will generally produce more accurate timestamps so long as the receiving
application is able to keep pace with the packet arrival rate.
Software Timestamp - Format
The format of timestamps is defined by struct timeval.
Applications preferring timestamps with nanosecond resolution can use
SO_TIMESTAMPNS in place of the normal (microsecond resolution) SO_TIMESTAMP
value.
7.3 Hardware Timestamps
Setting the SO_TIMESTAMPING option using setsockopt() enables hardware
timestamping on TCP or UDP sockets. Timestamps are generated by the adapter for
each received packet. Functions such as recvmsg() and recvmmsg(), together with
the cmsg(3) macros, can then recover hardware timestamps for packets received
from a socket.
Hardware Timestamp - Requirements
• Supported only on Solarflare Flareon SFN7000 and SFN8000 series adapters.
• An AppFlex license for hardware timestamps must be installed on the adapter:
  - The PTP/timestamping license is installed during manufacture on the
    SFN7322F adapter and on the PLUS variants of SFN8000 series adapters.
  - An appropriate license can be installed on other SFN7000 and SFN8000
    series adapters by the user.
Hardware Timestamp - Format
The format of timestamps is defined by struct timespec. Interested users should
read the kernel SO_TIMESTAMPING documentation for more details of how to use
this socket API. Kernel documentation can be found, for example, at:
https://www.kernel.org/doc/Documentation/networking/timestamping/
Hardware Timestamp - Received Packets
• The Onload stack for the socket must have the environment variable
  EF_RX_TIMESTAMPING set - see Appendix A on page 163 for details.
• Received packets are timestamped when they enter the MAC on the SFN7000
  or SFN8000 series adapter.
Hardware Timestamp - Transmit Packets
Onload from 201405 supports hardware timestamping of UDP and TCP packets
transmitted over a Solarflare interface.
Because the Linux kernel does not support hardware timestamps for TCP, Onload
provides an extension to the standard SO_TIMESTAMPING API with the
ONLOAD_SOF_TIMESTAMPING_STREAM socket option to support this. To recover
hardware timestamps for transmitted TCP packets, set the following
SO_TIMESTAMPING flags:
SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE |
ONLOAD_SOF_TIMESTAMPING_STREAM
To recover hardware timestamps for transmitted UDP packets, set the following
SO_TIMESTAMPING flags:
SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE
Other socket flag combinations, not listed above, will be silently ignored.
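The flag combinations above are passed as the integer value of the SO_TIMESTAMPING socket option. The following is a minimal sketch, not taken from the Onload distribution: the helper names are hypothetical, and the value of ONLOAD_SOF_TIMESTAMPING_STREAM is deliberately passed in by the caller (who would obtain it from the Onload headers) rather than assumed here.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Flag combination listed above for transmitted UDP packets. */
int udp_tx_timestamp_flags(void)
{
    return SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_SYS_HARDWARE;
}

/* Flag combination listed above for transmitted TCP packets.
 * onload_stream_flag is expected to be ONLOAD_SOF_TIMESTAMPING_STREAM;
 * its numeric value is not assumed in this sketch. */
int tcp_tx_timestamp_flags(int onload_stream_flag)
{
    return udp_tx_timestamp_flags() | onload_stream_flag;
}

/* Apply a flag combination to a socket. Returns 0 on success, -1 on error. */
int request_tx_timestamps(int fd, int flags)
{
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

A UDP sender would call request_tx_timestamps(fd, udp_tx_timestamp_flags()) once after creating the socket. The licensing and environment variable requirements listed in this section still apply.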
To receive hardware transmit timestamps:
• Only supported on Solarflare Flareon™ SFN7000 and SFN8000 series adapters.
• The adapter must have a PTP/HW timestamping license.
• The adapter must have a SolarCapture Pro license or Performance Monitoring
  license.
• Set EF_TX_TIMESTAMPING on stacks where transmit timestamping is required.
• Set EF_TIMESTAMPING_REPORTING to control the type of timestamp returned
  to the application. This is optional. By default, Onload will report translated
  timestamps if the adapter clock has been fully synchronized to correct time by
  the Solarflare PTP daemon. Onload will always report raw timestamps. Refer to
  Parameter Reference on page 163 for full details of the
  EF_TIMESTAMPING_REPORTING variable.
• Solarflare PTP (sfptpd) must be running if timestamps are to be synchronized
  with an external PTP master clock.
For details of the SO_TIMESTAMPING API refer to the Linux documentation:
https://www.kernel.org/doc/Documentation/networking/timestamping/
Zeroed Timestamps
If timestamps returned from the adapter are zeroed, refer to Setting Adapter Clock
Time on page 77.
Synchronizing Time
Solarflare Enhanced PTP can be enabled to synchronize the time across all clocks
within a server or between multiple servers.
The sfptpd daemon supports clock synchronization with external NTP and PTP time
sources and includes optional PTP/NTP fallback configurations.
For details of Solarflare PTP refer to the Solarflare Enhanced PTP User Guide
(SF-109110-CD) available from https://support.solarflare.com/.
7.4 Timestamping - Example Applications
The onload distribution includes example applications to demonstrate receive and
transmit hardware timestamping. With Onload installed, source code is located in
the following subdirectory:
/openonload-<version>/src/tests/onload/hwtimestamping
Build Examples
Following the onload_install, the example applications rx_timestamping and
tx_timestamping are located in the following directory:
/openonload-<version>/build/gnu_x86_64/tests/onload/hwtimestamping
Using earlier versions of Onload the user should run the make command in the
following directory to build example timestamping applications:
/openonload-<version>/src/tests/onload/hwtimestamping
Run Examples
The following conditions are required to run the example applications:
• The server must have a Solarflare SFN7000 or SFN8000 series adapter.
• The adapter must have a PTP/HW timestamping license.
• The connection from which packets are to be timestamped must be routed
  over the timestamping adapter.
• To receive TX timestamps, the adapter must have a SolarCapture Pro license or
  Performance Monitoring license.
• The Onload environment variable EF_RX_TIMESTAMPING or
  EF_TX_TIMESTAMPING must be enabled in the Onload environment.
NOTE: Users should also read the specific requirements from the RX/TX
timestamping sections above.
Setting Adapter Clock Time
It may be necessary to 'seed' the adapter clock time - otherwise timestamps may be
zeroed or reported as 01 Jan 1970. This can be done by briefly running Solarflare PTP
(sfptpd) as a slave - the adapter clock is seeded from the system clock.
Running sfptpd in freerun mode will achieve the same result. It is not required to
actually receive any PTP packets to seed the adapter clock and sfptpd can be
terminated after a few seconds as it is only required to 'seed' the adapter clock.
Users who wish to synchronize the adapter clock with an external time source
should refer to the Solarflare Enhanced PTP User Guide (SF-109110-CD).
rx_timestamping Example
The following command sets the EF_RX_TIMESTAMPING environment variable and
starts the rx_timestamping example application.
On Server1:
# EF_RX_TIMESTAMPING=2 onload ./rx_timestamping --proto tcp
oo:rx_timestamping[31250]: Using OpenOnload 201509 Copyright 2006-2015
Solarflare Communications, 2002-2005 Level 5 Networks [0]
Socket created, listening on port 9000
Socket accepted
Selecting hardware timestamping mode.
Packet 1 - 27 bytes timestamps 1460374944.990960465
1460374944.993421129 1460374944.993421129
Packet 2 - 27 bytes timestamps 1460374966.478980336
1460374966.481623531 1460374966.481623531
Packet 3 - 0 bytes no timestamp
recvmsg returned 0 - end of stream
On Server2:
This example uses the Linux netcat utility to send packets to server1:
# nc <server1ip> 9000
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
tx_timestamping Example
The following command sets the EF_TX_TIMESTAMPING environment variable and
starts the tx_timestamping example application.
On Server1:
# EF_TX_TIMESTAMPING=3 onload ./tx_timestamping --proto tcp --ioctl eth4
oo:tx_timestamping[16139]: Using OpenOnload 201509 Copyright 2006-2015
Solarflare Communications, 2002-2005 Level 5 Networks [4]
TCP listening on port 9000
TCP connection accepted
Accepted SIOCHWTSTAMP ioctl.
Selecting hardware timestamping mode.
Packet 1 - 27 bytes
Timestamp for 27 bytes:
First sent timestamp 1453436034.615029223
Last sent timestamp 0.000000000
On Server2:
This example uses the Linux netcat utility to send a packet to server1 which is then
echoed back to the sender:
# nc <server1ip> 9000
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz (echoed back from server1)
Example UDP Commands
On Server1:
# EF_RX_TIMESTAMPING=2 onload ./rx_timestamping --proto udp --port 9000
On Server2:
# nc -u <server1_ipaddr> 9000
Displayed Timestamp Order
Timestamps in the example applications are displayed in the following order:
• System: software timestamp from the system clock.
• Transformed: hardware timestamp converted to software timestamp. This can
  be ignored because the adapter is using UTC time and transformation is not
  required. Transformed timestamps are identical to Raw timestamps.
• Raw: hardware timestamp generated by the adapter clock.
Zeroed Timestamps
If timestamps returned from the example applications are zeroed, refer to Setting
Adapter Clock Time on page 77.
8 Onload - TCP
8.1 TCP Operation
The table below identifies the Onload TCP implementation RFC compliance.
RFC    Title                                                                 Compliance
793    Transmission Control Protocol                                         Yes
813    Window and Acknowledgement Strategy in TCP                            Yes
896    Congestion Control in IP/TCP                                          Yes
1122   Requirements for Hosts                                                Yes
1191   Path MTU Discovery                                                    Yes
1323   TCP Extensions for High Performance                                   Yes
2018   TCP Selective Acknowledgment Options                                  Yes
2581   TCP Congestion Control                                                Yes
2582   The NewReno Modification to TCP's Fast Recovery Algorithm             Yes
2883   An Extension to the Selective Acknowledgement (SACK) Option for TCP   Yes
2988   Computing TCP's Retransmission Timer                                  Yes
3128   Protection Against a Variant of the Tiny Fragment Attack              Yes
3168   The Addition of Explicit Congestion Notification (ECN) to IP          Yes
3465   TCP Congestion Control with Appropriate Byte Counting (ABC)           Yes
8.2 TCP Handshake - SYN, SYN-ACK
During the TCP connection establishment 3-way handshake, Onload negotiates the
MSS, Window Scale, SACK permitted, ECN, PAWS and RTTM timestamps.
For TCP connections Onload will negotiate an appropriate MSS for the MTU
configured on the interface. However, when using jumbo frames, Onload will
currently negotiate an MSS value up to a maximum of 2048 bytes minus the number
of bytes required for packet headers. This is due to the fact that the size of buffers
passed to the Solarflare network interface card is 2048 bytes and the Onload stack
cannot currently handle fragmented packets on its TCP receive path.
TCP options advertised during the handshake can be selected using the
EF_TCP_SYN_OPTS environment variable. Refer to Parameter Reference on page 163
for details of environment variables.
8.3 TCP SYN Cookies
The Onload environment variable EF_TCP_SYNCOOKIES can be enabled on a per
stack basis to force the use of SYN cookies, thereby providing a degree of protection
against the Denial of Service (DoS) SYN flood attack. EF_TCP_SYNCOOKIES is
disabled by default. Refer to Parameter Reference on page 163 for details of
environment variables.
8.4 TCP Socket Options
Onload TCP supports the following socket options which can be used in the
setsockopt() and getsockopt() function calls.
Option                          Description
SO_PROTOCOL                     retrieve the socket protocol as an integer.
SO_ACCEPTCONN                   determines whether the socket can accept incoming
                                connections - true for listening sockets. (Only valid as a
                                getsockopt()).
SO_BINDTODEVICE                 bind this socket to a particular network interface.
SO_CONNECT_TIME                 number of seconds a connection has been established.
                                (Only valid as a getsockopt()).
SO_DEBUG                        enable protocol debugging.
SO_ERROR                        the errno value of the last error occurring on the
                                socket. (Only valid as a getsockopt()).
SO_EXCLUSIVEADDRUSE             prevents other sockets using the SO_REUSEADDR
                                option to bind to the same address and port.
SO_KEEPALIVE                    enable sending of keep-alive messages on connection
                                oriented sockets.
SO_LINGER                       when enabled, a close() or shutdown() will not
                                return until all queued messages for the socket have
                                been successfully sent or the linger timeout has been
                                reached. Otherwise the close() or shutdown()
                                returns immediately and sockets are closed in the
                                background.
SO_OOBINLINE                    indicates that out-of-band data should be returned
                                in-line with regular data. This option is only valid for
                                connection-oriented protocols that support out-of-band
                                data.
SO_PRIORITY                     set the priority for all packets sent on this socket.
                                Packets with a higher priority may be processed first
                                depending on the selected device queueing discipline.
SO_RCVBUF                       sets or gets the maximum socket receive buffer in
                                bytes.
                                Note that EF_TCP_RCVBUF overrides this value and
                                EF_TCP_RCVBUF_ESTABLISHED_DEFAULT can also override
                                this value.
                                Setting SO_RCVBUF to a value < MTU can result in
                                poorer performance and is not recommended.
SO_RCVLOWAT                     sets the minimum number of bytes to process for
                                socket input operations.
SO_RCVTIMEO                     sets the timeout for input function to complete.
SO_RECVTIMEO                    sets the timeout in milliseconds for blocking receive
                                calls.
SO_REUSEADDR                    can reuse local port numbers i.e. another socket can
                                bind to the same port except when there is an active
                                listening socket bound to the port.
SO_REUSEPORT                    allows multiple sockets to bind to the same port.
SO_SNDBUF                       sets or gets the maximum socket send buffer in bytes.
                                The value set is doubled by the kernel and by Onload to
                                allow for bookkeeping overhead when it is set by the
                                setsockopt() function call. Note that EF_TCP_SNDBUF,
                                EF_TCP_SNDBUF_MODE and
                                EF_TCP_SNDBUF_ESTABLISHED_DEFAULT can override this
                                value.
                                When EF_TCP_SNDBUF_MODE is set to 2, the
                                SNDBUF size is automatically adjusted for each TCP
                                socket to match the window advertised by the peer.
SO_SNDLOWAT                     sets the minimum number of bytes to process for
                                socket output operations. Always set to 1 byte.
SO_SNDTIMEO                     set the timeout for sending function to send before
                                reporting an error.
SO_TIMESTAMP                    enable/disable receiving the SO_TIMESTAMP control
                                message.
SO_TIMESTAMPNS                  enable/disable receiving the SO_TIMESTAMPNS control
                                message.
SO_TIMESTAMPING                 enable/disable hardware timestamps for received
                                packets.
SOF_TIMESTAMPING_TX_HARDWARE    obtain a hardware generated transmit timestamp.
SOF_TIMESTAMPING_SYS_HARDWARE   obtain a hardware transmit timestamp adjusted to the
                                system time base.
SOF_TIMESTAMPING_OPT_CMSG       deliver timestamps using the cmsg API.
ONLOAD_SOF_TIMESTAMPING_STREAM  Onload extension to the standard SO_TIMESTAMPING
                                API to support hardware timestamps on TCP sockets.
SO_TYPE                         returns the socket type (SOCK_STREAM or SOCK_DGRAM).
                                (Only valid as a getsockopt()).
IP_TRANSPARENT                  this socket option allows the calling application to bind
                                the socket to a non-local IP address.
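The SO_SNDBUF entry notes that the value passed to setsockopt() is doubled to allow for bookkeeping overhead, so the value later returned by getsockopt() need not equal the value requested. A minimal sketch of observing this follows. The helper name sndbuf_set_and_get is hypothetical, and the exact value reported depends on system limits and on any EF_TCP_SNDBUF settings, so the sketch does not assume a particular result.

```c
#include <sys/socket.h>

/* Request a send buffer size, then return the size the stack reports.
 * Returns -1 on error. (Hypothetical helper, for illustration only.) */
int sndbuf_set_and_get(int fd, int requested)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        return -1;

    /* 'actual' is typically larger than 'requested' because of the doubling. */
    return actual;
}
```

Applications that size their own queues from the socket buffer should read the value back in this way instead of relying on the value they requested.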
8.5 TCP Level Options
Onload TCP supports the following TCP options which can be used in the
setsockopt() and getsockopt() function calls.
Option                          Description
TCP_CORK                        stops sends on segments less than MSS size until the
                                connection is uncorked.
TCP_DEFER_ACCEPT                a connection is ESTABLISHED after handshake is
                                complete instead of leaving it in SYN-RECV until the
                                first real data packet arrives. The connection is placed
                                in the accept queue when the first data packet arrives.
TCP_INFO                        populates an internal data structure with tcp statistic
                                values.
TCP_KEEPALIVE_ABORT_THRESHHOLD  how long to try to produce a successful keepalive
                                before giving up.
TCP_KEEPALIVE_THRESHHOLD        specifies the idle time for keepalive timers.
TCP_KEEPCNT                     number of keepalives before giving up.
TCP_KEEPIDLE                    idle time for keepalives.
TCP_KEEPINTVL                   time between keepalives.
TCP_MAXSEG                      gets the MSS size for this connection.
TCP_NODELAY                     disables Nagle's Algorithm and small segments are sent
                                without delay and without waiting for previous
                                segments to be acknowledged.
TCP_QUICKACK                    when enabled ACK messages are sent immediately
                                following reception of the next data packet. This flag
                                will be reset to zero following every use i.e. it is a one
                                time option. Default value is 1 (enabled).
8.6 TCP File Descriptor Control
Onload supports the following options in socket() and accept() calls.
Option                          Description
SOCK_NONBLOCK                   supported in socket() and accept(). Sets the
                                O_NONBLOCK file status flag on the new open file
                                descriptor, saving extra calls to fcntl(2) to achieve the
                                same result.
SOCK_CLOEXEC                    supported in accept(). Sets the close-on-exec
                                (FD_CLOEXEC) flag on the new file descriptor.
8.7 TCP Congestion Control
Onload TCP implements congestion control in accordance with RFC 3465 and
employs the NewReno algorithm with extensions for Appropriate Byte Counting
(ABC).
On new or idle connections and those experiencing loss, Onload employs a Fast
Start algorithm in which delayed acknowledgments are disabled, thereby creating
more ACKs and subsequently 'growing' the congestion window rapidly. Two
environment variables, EF_TCP_FASTSTART_INIT and EF_TCP_FASTSTART_LOSS,
are associated with the fast start - refer to Parameter Reference on page 163 for
details.
During Slow Start, the congestion window is initially set to 2 x maximum segment
size (MSS) value. As each ACK is received the congestion window size is increased by
the number of bytes acknowledged up to a maximum 2 x MSS bytes. This allows
Onload to transmit the minimum of the congestion window and advertised window
size i.e.
transmission window (bytes) = min(CWND, receiver advertised window size)
If loss is detected - either by retransmission timeout (RTO), or the reception of
duplicate ACKs, Onload will adopt a congestion avoidance algorithm to slow the
transmission rate. In congestion avoidance the transmission window is halved from
its current size - but will not be less than 2 x MSS. If congestion avoidance was
triggered by an RTO timeout the Slow Start algorithm is again used to restore the
transmit rate. If triggered by duplicate ACKs Onload employs a Fast Retransmit and
Fast Recovery algorithm.
If Onload TCP receives 3 duplicate ACKs this indicates that a segment has been lost
- rather than just received out of order - and causes the immediate retransmission of
the lost segment (Fast Retransmit). The continued reception of duplicate ACKs is an
indication that traffic still flows within the network and Onload will follow Fast
Retransmit with Fast Recovery.
During Fast Recovery Onload again resorts to the congestion avoidance (without
Slow Start) algorithm with the congestion window size being halved from its present
value.
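The window arithmetic described in this section can be written out as a small worked example. The code below is an illustrative model of the rules as stated in the text (initial window of 2 x MSS, growth per ACK capped at 2 x MSS, halving on loss with a floor of 2 x MSS, and the min() rule for the transmission window). It is not Onload source code, all function names are hypothetical, and applying the halving rule to the congestion window is a simplifying assumption.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Slow Start: the congestion window starts at 2 x MSS. */
uint32_t cwnd_initial(uint32_t mss)
{
    return 2 * mss;
}

/* Slow Start: each ACK grows the window by the bytes acknowledged,
 * up to a maximum of 2 x MSS per ACK. */
uint32_t cwnd_on_ack(uint32_t cwnd, uint32_t acked, uint32_t mss)
{
    return cwnd + min_u32(acked, 2 * mss);
}

/* Loss detected: halve the window, but never go below 2 x MSS. */
uint32_t cwnd_on_loss(uint32_t cwnd, uint32_t mss)
{
    return max_u32(cwnd / 2, 2 * mss);
}

/* transmission window = min(CWND, receiver advertised window size) */
uint32_t tx_window(uint32_t cwnd, uint32_t rwnd)
{
    return min_u32(cwnd, rwnd);
}
```

For example, with an MSS of 1460 bytes the initial window is 2920 bytes. An ACK covering 1460 bytes grows it to 4380 bytes, while an ACK covering 5000 bytes grows it by only 2920 bytes because of the per-ACK cap.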
Onload supports a number of environment variables that influence the behavior of
the congestion window and recovery algorithms identified below. Refer to
Parameter Reference on page 163:
EF_TCP_INITIAL_CWND - sets the initial size (bytes) of congestion window.
EF_TCP_LOSS_MIN_CWND - sets the minimum size of the congestion window
following loss.
EF_CONG_AVOID_SCALE_BACK - slows down the rate at which the TCP
congestion window is opened to help reduce loss in environments already
suffering congestion and loss.
CAUTION: The congestion variables should be used with caution so as to avoid
violating TCP protocol requirements and degrading TCP performance.
8.8 TCP SACK
Onload will employ TCP Selective Acknowledgment (SACK) if the option has been
negotiated and agreed by both ends of a connection during the connection
establishment 3-way handshake. Refer to RFC 2018 for further information.
8.9 TCP QUICKACK
TCP will generally aim to defer the sending of ACKs in order to minimize the number
of packets on the network. Onload supports the standard TCP_QUICKACK socket
option which allows some control over this behavior. Enabling TCP_QUICKACK
causes an ACK to be sent immediately in response to the reception of the following
data packet. This is a one-shot operation and TCP_QUICKACK self-clears to zero
immediately after the ACK is sent.
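Because TCP_QUICKACK is described here as a one-shot option, an application that wants an immediate ACK for every received packet has to set it again before each receive. A minimal sketch using the standard socket calls follows. The helper names request_quickack and recv_with_quickack are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask for the next received data packet to be ACKed immediately.
 * Returns 0 on success, -1 on error. */
int request_quickack(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}

/* Re-arm TCP_QUICKACK, then receive. The option self-clears after use,
 * so it is set again on every call. */
ssize_t recv_with_quickack(int fd, void *buf, size_t len)
{
    if (request_quickack(fd) != 0)
        return -1;
    return recv(fd, buf, len, 0);
}
```

Re-arming costs one extra setsockopt() call per receive, so it is only worthwhile where the sender's latency depends on prompt ACKs.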
8.10 TCP Delayed ACK
By default TCP stacks delay sending acknowledgments (ACKs) to improve efficiency
and utilization of a network link. Delayed ACKs also improve receive latency by
ensuring that ACKs are not sent on the critical path. However, if the sender of TCP
packets is using Nagle's algorithm, receive latency will be impaired by using delayed
ACKs.
Using the EF_DELACK_THRESH environment variable the user can specify how many
TCP segments can be received before Onload will respond with a TCP ACK. Refer to
the Parameter List on page 163 for details of the Onload environment delayed TCP
ACK variables.
8.11 TCP Dynamic ACK
The sending of excessive TCP ACKs can impair performance and increase receive
side latency. Although TCP generally aims to defer the sending of ACKs, Onload also
supports a further mechanism. The EF_DYNAMIC_ACK_THRESH environment variable
allows Onload to dynamically determine when it is non-detrimental to throughput
and efficiency to send a TCP ACK. Onload will force a TCP ACK to be sent if the
number of TCP ACKs pending reaches the threshold value.
Refer to the Parameter List on page 163 for details of the Onload environment
delayed TCP ACK variables.
NOTE: When used together with EF_DELACK_THRESH or EF_DYNAMIC_ACK_THRESH,
the socket option TCP_QUICKACK will behave exactly as stated above. Both onload
environment variables identify the maximum number of segments that can be
received before an ACK is returned. Sending an ACK before the specified maximum
is reached is allowed.
NOTE: TCP ACKs should be transmitted at a sufficient rate to ensure the remote end
does not drop the TCP connection.
8.12 TCP Loopback Acceleration
Onload supports the acceleration of TCP loopback connections, providing an
accelerated mechanism through which two processes on the same host can
communicate. Accelerated TCP loopback connections do not invoke system calls,
reduce the overheads for read/write operations and offer improved latency over the
kernel implementation.
The server and client processes that want to communicate using an accelerated TCP
loopback connection do not need to be configured to share an Onload stack.
However, the server and client TCP loopback sockets can only be accelerated if they
are in the same Onload stack. Onload has the ability to move a TCP loopback socket
between Onload stacks to achieve this.
TCP loopback acceleration is configured via the environment variables
EF_TCP_CLIENT_LOOPBACK and EF_TCP_SERVER_LOOPBACK. As well as enabling TCP
loopback acceleration these environment variables control Onload's behavior when
the server and client sockets do not originate in the same Onload stack. This gives
the user greater flexibility and control when establishing loopback on TCP sockets
either from the listening (server) socket or from the connecting (client) socket. The
connecting socket can use any local address or specify the loopback address.
The following diagram illustrates the client and server loopback options. Refer to
Parameter Reference on page 163 for a description of the loopback variables.
Figure 8: EF_TCP_CLIENT/SERVER_LOOPBACK
The client loopback option EF_TCP_CLIENT_LOOPBACK=4, when used with the
server loopback option EF_TCP_SERVER_LOOPBACK=2, differs from other loopback
options in that, rather than moving sockets between existing stacks, it creates
an additional stack and moves sockets from both ends of the TCP connection into this
new stack. This avoids the possibility of having many loopback sockets sharing and
contending for the resources of a single stack.
When client and server are not the same UID, set the environment variable
EF_SHARE_WITH to allow both processes to share the created shared stack.
8.13 TCP Striping
Onload supports a Solarflare proprietary TCP striping mechanism that allows a
single TCP connection to use both physical ports of a network adapter. Using the
combined bandwidth of both ports means increased throughput for TCP streaming
applications. TCP striping can be particularly beneficial for Message Passing
Interface (MPI) applications.
If the TCP connection's source IP address and destination IP address are on the same
subnet as defined by EF_STRIPE_NETMASK then Onload will attempt to negotiate
TCP striping for the connection. Onload TCP striping must be configured at both
ends of the link.
TCP striping allows a single TCP connection to use the full bandwidth of both
physical ports on the same adapter. This should not be confused with link
aggregation/port bonding in which any one TCP connection within the bond can
only use a single physical port and therefore more than one TCP connection would
be required to realize the full bandwidth of two physical ports.
NOTE: TCP striping is disabled by default. To enable this feature set the parameter
CI_CFG_PORT_STRIPING=1 in the onload distribution source directory
src/include/internal/transport_config_opt.h file.
8.14 TCP Connection Reset on RTO
Under certain circumstances it may be preferable to avoid resending TCP data to a
peer service when data delivery has been delayed. Once data has been sent, and for
which no acknowledgment has been received, the TCP retransmission timeout
period represents a considerable delay. When the retransmission timeout (RTO)
eventually expires it may be preferable not to retransmit the original data.
Onload can be configured to reset a TCP connection rather than attempt to
retransmit data for which no acknowledgment has been received.
This feature is enabled with the EF_TCP_RST_DELAYED_CONN per stack environment
variable and applies to all TCP connections in the onload stack. On any TCP
connection in the onload stack, if the RTO timer expires before an ACK is received
the TCP connection will be reset.
8.15 ONLOAD_MSG_WARM
Applications that send data infrequently may see increased send latency compared
to an application that is making frequent sends. This is due to the send path and
associated data structures not being cache and TLB resident (which can occur even
if the CPU has been otherwise idle since the previous send call).
Onload therefore supports applications repeatedly calling send to keep the TCP fast
send path 'warm' in the cache without actually sending data. This is particularly
useful for applications that only send infrequently and helps to maintain low latency
performance for those TCP connections that do not send often. These "fake" sends
are performed by setting the ONLOAD_MSG_WARM flag when calling the TCP send calls.
The message warm feature does not transmit any packets.
char buf[10];
send(fd, buf, 10, ONLOAD_MSG_WARM);
Onload stackdump supports new counters to indicate the level of message warm
use:
• warm_aborted is a count of the number of times a message warm send
  function was called, but the send path was not exercised due to Onload locking
  constraints.
• warm is a count of the number of times a message warm send function was
  called when the send path was exercised.
NOTE: If the ONLOAD_MSG_WARM flag is used on sockets which are not accelerated -
including those handed off to the kernel by Onload, it may cause the message warm
packets to be actually sent. This is due to a limitation in some Linux distributions
which appear to ignore this flag. The Onload extensions API can be used to check
whether a socket supports the MSG_WARM feature via the
onload_fd_check_feature() API (onload_fd_check_feature on page 223).
NOTE: When using the MSG_WARM feature, Onload does not attempt to split large
packets into multiple segments and for this reason, the size of data passed to
Onload when using the MSG_WARM feature must not exceed the MSS value.
NOTE: Onload versions earlier than 201310 do not support the ONLOAD_MSG_WARM
socket flag, therefore setting the flag will cause message warm packets to be sent.
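The MSS restriction in the notes above can be checked by the application before each warm send. The following is a minimal sketch under stated assumptions: it reads the current MSS with the standard TCP_MAXSEG socket option, the helper names fits_in_mss and warm_send are hypothetical, and the warm flag is passed in by the caller (as ONLOAD_MSG_WARM from the Onload headers) so that its value is not assumed here. The sketch does not check whether the socket is accelerated. That check is the job of onload_fd_check_feature(), described above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns 1 if len does not exceed the socket's current MSS,
 * 0 if it does, and -1 if the MSS could not be read. */
int fits_in_mss(int fd, size_t len)
{
    int mss = 0;
    socklen_t optlen = sizeof(mss);

    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &optlen) != 0)
        return -1;
    if (mss <= 0)
        return -1;
    return len <= (size_t)mss ? 1 : 0;
}

/* Perform a warm send only when the payload fits within the MSS.
 * warm_flag is expected to be ONLOAD_MSG_WARM, supplied by the caller. */
ssize_t warm_send(int fd, const void *buf, size_t len, int warm_flag)
{
    if (fits_in_mss(fd, len) != 1)
        return -1;
    return send(fd, buf, len, warm_flag);
}
```

An application would typically call warm_send() from an idle loop or timer, using a buffer of the same size as its real messages.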
8.16 Listen/Accept Sockets
TCP sockets accepted from a listening socket will share a wildcard filter with the
parent socket. The following Onload module options can be used to control
behavior when the parent socket is closed.
oof_shared_keep_thresh - default 100, is the number of accepted sockets sharing
a wildcard filter that will cause the filter to persist after the listening socket has
closed.
oof_shared_steal_thresh - default 200, is the number of sockets sharing a
wildcard filter that will cause the filter to persist even when a new listening socket
needs the filter.
If the listening socket is closed the behavior depends on the number of remaining
accepted sockets as follows:
Number of accepted sockets      Onload Action
> oof_shared_keep_thresh but    Retain the wildcard filter shared by all
< oof_shared_steal_thresh       accepted sockets.
                                If a new listening socket requires the filter,
                                Onload will install a full-match filter for each
                                accepted socket allowing the listening socket
                                to use the wildcard filter.
> oof_shared_steal_thresh       Retain the wildcard filter shared by all
                                accepted sockets.
                                A new listening socket can be created but a
                                filter cannot be installed meaning the socket
                                will receive no traffic until the number of
                                accepted connections is reduced.
8.17 Socket Caching
Socket caching means Onload can further reduce the overhead of setting up new
TCP connections by reusing existing sockets instead of creating from new.
A cached socket retains a file descriptor and socket buffer when it is returned to the
cache of the Onload stack from which it originated.
Socket caching is enabled when EF_SOCKET_CACHE_MAX is set to a value greater
than zero. Onload will apply passive or active caching as appropriate for the type of
sockets created by the user application.
EF_SOCKET_CACHE_MAX applies to both active and passive sockets, i.e. if set to 100
the cache limit is 100 of each socket type.
TCPPassiveSocketCaching
Passivesocketcaching,supportedfromtheOnload201502release,meansOnload
willreusesocketbuffersandfiledescriptorsfrompassiveopen(listeningsockets).
ThiscanimprovetheacceptrateofactiveopenTCPconnectionsandwillbenefit
processeswhichneedtoacceptlotsofconnectionsfromtheselisteningsockets.
TCPActiveSocketCaching
Activesocketcaching,supportedfromtheOnload201509release,meansOnload
willreusesocketbuffersandfiledescriptorsfromactiveopensocketswhenan
establishedTCPconnectionhasterminated.
ActiveopensocketssettingtheIP_TRANSPARENTsocketoptioncanbecached.
CachingStackdump
OnloadstackdumpcanbeusedtomonitorcachingactivityonOnloadstacks.
#onload_stackdumplots[|grepcache]
Counter Description
activecache:hit=0
avail=0cache=EMPTY
pending=EMPTY
TCPsocketcaching:
hit=numberofcachehits(werecached)
avail=numberofsocketsavailableforcaching
currentcachestate
sockcache_cached Numberofsocketscachedoverthelifetimeofthe
stack
sockcache_
contention
Numberofsocketsnotcachedduetolock
contention
passive_sockcache_
stacklim
Numberofpassivesocketsnotcachedduetostack
limit
active_sockcache_
stacklim
Numberofactivesocketsnotcachedduetostack
limit
sockcache_socklim Numberofsocketsnotcachedduetosocketlimit
sockcache_hit Numberofsocketcachehits(werecached)
sockcache_hit_reap Numberofsocketcachehits(werecached)after
reaping
sockcache_miss_
intmismatch
Numberofsocketcachemissesduetomismatched
interfaces
OnloadUserGuide
Onload‐TCP
Issue22 ©SolarflareCommunications2017 93
Caching - Requirements
There are some necessary prerequisites when using socket caching:
• set EF_UL_EPOLL=3 and set EF_FDS_MT_SAFE=1
• socket caching is not supported after fork()
• sockets that have been dup()ed will not be cached
• sockets that use the O_ASYNC or O_APPEND modes will not be cached
• caching offers no benefit if a single socket accepts connections on multiple
  local addresses (applicable to passive caching only)
• set O_NONBLOCK or O_CLOEXEC, if required on the socket, when creating the
  socket.
When socket caching cannot be enabled, sockets will be processed as normal
Onload sockets.
Users should refer to details of the following environment variables:
EF_SOCKET_CACHE_MAX
EF_PER_SOCKET_CACHE_MAX
EF_SOCKET_CACHE_PORTS
NOTE: Allowing more sockets to be cached than there are file descriptors available
can result in drastically reduced performance. Users should consider that the
socket cache limit, EF_SOCKET_CACHE_MAX, applies per stack, unlike the
per-process EF_SOCKET_CACHE_PORTS limits.
Refer to Parameter Reference on page 163 for details of Onload environment
variables.
Caching Stackdump counters (continued):
Counter                       Description
activecache_cached            Number of active sockets cached over the lifetime
                              of the stack
activecache_stacklim          Number of active sockets not cached due to stack
                              limit
activecache_hit               Number of active socket cache hits (were cached)
activecache_hit_reap          Number of active socket cache hits (were cached)
                              after reaping
8.18 Scalable Filters
Using scalable filters, an Onload stack can install a MAC filter to receive all traffic
from a specified interface.
NOTE: Once the MAC filter is inserted on an interface, ARP, ICMP and IGMP traffic
is directed to the kernel, but all other traffic is directed to a single Onload stack.
Using scalable filters removes limitations on:
• the number of listening sockets in scalable filters passive mode
• the number of active open connections in scalable filters transparent active
  mode. This works only for sockets having the IP_TRANSPARENT option set. See
  Transparent Reverse Proxy Modes on page 95 below.
It is suggested that a dedicated interface is used by the stack inserting the MAC filter.
This allows the kernel stack or another application using scalable filters to use the
same physical port.
Solarflare SFN7000 and SFN8000 series adapters can be partitioned to expose up to
16 PCIe physical functions (PF). Each PF is presented to the OS as a standard network
interface. The adapter is partitioned with the sfboot utility - see example below.
Once a MAC filter has been installed on a PF, other Onload stacks can still receive
other traffic on the same PF, but sockets will have to insert IP filters for the required
traffic. Apart from ARP, ICMP and IGMP packets, OS kernel sockets using the same
PF will not receive any traffic.
Per interface, the MAC filter can only be installed by a single Onload stack. If a
process creates multiple stacks, the EF_SCALABLE_FILTERS_ENABLE per stack
variable can be used to enable/disable this feature for individual stacks using the
existing Onload extensions API e.g.
onload_stack_opt_set_int("EF_SCALABLE_FILTERS_ENABLE", 1);
The MAC filter is inserted when the stack is created - i.e. before sockets are created,
and sockets need to be created to receive any traffic destined for this stack.
Scalable Filters - Restrictions
• Scalable filters are only used for TCP traffic.
• UDP traffic can be received and accelerated by Onload on interfaces where
  scalable filters are enabled, but kernel UDP sockets will not receive traffic.
• UDP fragmented frames cannot be received on interfaces where scalable filters
  are enabled. Users should avoid having fragmented frames on these interfaces.
• The adapter must use the full-feature or low-latency firmware variants.
• Minimum firmware version: 4.6.5.1000.
• Stack per thread options (EF_STACK_PER_THREAD) cannot be used with this
  feature.
• By default the scalable filters feature requires CAP_NET_RAW. Onload can be
  configured to avoid capability checks for this using the Onload module option
  scalable_filter_gid. See Module Options on page 158 for details.
Scalable Filters - Configuration
To enable scalable filters on a specific interface:
EF_SCALABLE_FILTERS=enps0f0
Per interface, the MAC filter can only be installed by a single Onload stack. A cluster
(see Application Clustering on page 68) might have multiple stacks and each stack
could install a MAC filter on a different interface.
Sockets must be bound to the IP address of the interface.
This feature is targeted at TCP listening sockets only and connections accepted from
a listening socket will share the MAC filter.
Partition the NIC
The sfboot utility is available in the Solarflare Linux Utilities package (SF-107601-LS).
The following example demonstrates how to partition the adapter to expose more
than one PF (a cold reboot of the server is needed after changes using sfboot).
# sfboot pf-count=2 vf-count=0 switch-mode=partitioning
Scalable Filters and Bonding
Bonded interfaces, created with the standard Linux bonding or teaming driver, can
be used for scalable filters.
Every interface that is part of the bond must be present in the system when the
scalable filters stack is created. Removing the bond will cause the scalable filter to
stop receiving traffic. After a new bond interface is created, the application must be
restarted to use the bond.
8.19 Transparent Reverse Proxy Modes
Enhancements such as Scalable Filters, Socket Caching and support for the
IP_TRANSPARENT socket option allow Onload to deliver greater efficiency and increased
scalability in transparent reverse proxy mode server deployments.
These features reduce to a minimum the overheads associated with creating and
connecting transparent sockets. Onload can use up to 2 million transparent
active open sockets per Onload stack.
A transparent socket is created when a socket sets the IP_TRANSPARENT socket
option and explicitly binds to an IP address and port. The IP address can be on a
foreign host. IP_TRANSPARENT must be set before the bind.
The EF_SCALABLE_FILTERS variable is used to enable scalable filters and to
configure the transparent proxy mode.
Restrictions
• The IP_TRANSPARENT option must be set before the socket is bound.
• The IP_TRANSPARENT option cannot be cleared after bind on accelerated
  sockets.
• IP_TRANSPARENT sockets cannot be accelerated if they are bound to port 0 or
  to INADDR_ANY.
• IP_TRANSPARENT sockets cannot be passed to the kernel stack when bound to
  a port that is in the list specified by EF_FORCE_TCP_REUSEPORT.
• When using the rss:transparent_active mode (see below), EF_CLUSTER_NAME
  must be explicitly set by the process sharing the cluster and the stack
  cannot be named by either EF_NAME or onload_set_stackname().
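The ordering rules in the restrictions above can be illustrated with a short Python sketch. It is not taken from the Onload distribution. The helper name open_transparent_socket and the sock_factory parameter are hypothetical, and rejecting port 0 and INADDR_ANY up front is a choice made for this example only (the guide says only that such sockets cannot be accelerated). The numeric fallback 19 is the Linux value of IP_TRANSPARENT, used when the Python build does not export the constant.

```python
import socket

# Linux value of IP_TRANSPARENT; used if this Python build lacks the constant.
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)


def open_transparent_socket(ip, port, sock_factory=socket.socket):
    """Create a TCP socket, set IP_TRANSPARENT, then bind, in that order.

    Hypothetical helper: sock_factory exists only so the call order can be
    exercised without privileges.
    """
    if port == 0 or ip in ("", "0.0.0.0"):
        # Sockets bound to port 0 or INADDR_ANY cannot be accelerated,
        # so this sketch refuses them (an assumption of the example).
        raise ValueError("bind to an explicit IP address and a non-zero port")
    sock = sock_factory(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind(); on Linux this normally needs
    # elevated privileges (for example CAP_NET_ADMIN).
    sock.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
    sock.bind((ip, port))
    return sock
```

With a real socket the call needs suitable privileges. The address may belong to a foreign host, as described for transparent sockets.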
Config (example) Settings
Below are examples of configurations using the EF_SCALABLE_FILTERS
environment option to set transparent proxy modes.
• Enable scalable filters on interface p1p1 - this inserts a MAC address filter on
  the adapter. The filter is shared by all active open connections on the interface.
  Socket caching will be applied to the passive side of the TCP connection.
EF_SCALABLE_FILTERS=p1p1=passive
• Enable scalable filters on enps0f0, then all sockets using this interface that have
  the IP_TRANSPARENT flag set will use the MAC filter; other sockets will
  continue to use normal IP filters on this interface. Socket caching will be applied
  to the active side of a TCP connection:
EF_SCALABLE_FILTERS=enps0f0=transparent_active
• As for the example above, but uses symmetrical RSS to ensure that requests/
  responses between clients and backend servers are processed by the same
  thread.
EF_SCALABLE_FILTERS=enps0f0=rss:transparent_active
• Enable scalable filters on enps0f0, then all sockets using this interface that have
  the IP_TRANSPARENT flag set will use the MAC filter; other sockets will
  continue to use normal IP filters on this interface. Socket buffers are cached
  from active and passive sides of the TCP connection.
EF_SCALABLE_FILTERS=enps0f0=transparent_active:passive
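As a minimal sketch, a launcher script could assemble the EF_SCALABLE_FILTERS value from an interface name and a mode. The function name scalable_filters_value is hypothetical. The set of accepted modes is limited to the strings that appear in the examples above, and Onload may accept other combinations not listed here.

```python
# Mode strings copied from the examples above; this list is not claimed
# to be exhaustive.
KNOWN_MODES = (
    "passive",
    "transparent_active",
    "rss:transparent_active",
    "transparent_active:passive",
)


def scalable_filters_value(interface, mode=None):
    """Return a value for EF_SCALABLE_FILTERS (hypothetical helper).

    With no mode the value is just the interface name, matching the
    basic 'enable on a specific interface' form.
    """
    if not interface:
        raise ValueError("an interface name is required")
    if mode is None:
        return interface
    if mode not in KNOWN_MODES:
        raise ValueError("mode not shown in the guide's examples: %r" % mode)
    return "%s=%s" % (interface, mode)
```

The result would typically be placed in the environment of the accelerated process, for example env["EF_SCALABLE_FILTERS"] = scalable_filters_value("p1p1", "passive").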
8.20 Transparent Reverse Proxy on Multiple CPUs
Used together with Application Clustering, transparent scalable modes can deliver
linear scalability using multiple CPU cores.
This uses RSS to distribute traffic, both upstream and downstream of the proxy
application, mapping streams to the correct Onload stack. When each CPU core is
associated exclusively with a single clustered stack there can be no contention
between stacks.
For this use case to function correctly, the proxy application will use the
downstream client address:port on the upstream (to server) side of the TCP
connection. In this way RSS and hardware filters ensure that client side and server
side are handled by the same worker thread and traffic is directed to the correct
stack.
In this scenario the client thinks it communicates directly with the server, and the
server thinks it communicates directly with the client - the transparent proxy server
is 'transparent'.
9 Onload - UDP
9.1 UDP Operation
The table below identifies the Onload UDP implementation RFC compliance.
RFC     Title                                 Compliance
768     User Datagram Protocol                Yes
1122    Requirements for Hosts                Yes
3678    Socket Interface Extensions for       Partial. See Source Specific Socket
        Multicast Source Filters              Options on page 100
9.2 Socket Options
Onload UDP supports the following socket options which can be used in the
setsockopt() and getsockopt() function calls.
Option                      Description
SO_PROTOCOL                 retrieve the socket protocol as an integer.
SO_BINDTODEVICE             bind this socket to a particular network interface. See
                            SO_BINDTODEVICE on page 60.
SO_BROADCAST                when enabled datagram sockets can send and receive
                            packets to/from a broadcast address.
SO_DEBUG                    enable protocol debugging.
SO_ERROR                    the errno value of the last error occurring on the
                            socket. (Only valid as a getsockopt()).
SO_EXCLUSIVEADDRUSE         prevents other sockets using the SO_REUSEADDR
                            option to bind to the same address and port.
SO_LINGER                   when enabled a close() or shutdown() will not return
                            until all queued messages for the socket have been
                            successfully sent or the linger timeout has been
                            reached. Otherwise the call returns immediately and
                            sockets are closed in the background.
SO_PRIORITY                 set the priority for all packets sent on this socket.
                            Packets with a higher priority may be processed first
                            depending on the selected device queuing discipline.
SO_RCVBUF                   sets or gets the maximum socket receive buffer in
                            bytes.
                            Note that EF_UDP_RCVBUF overrides this value.
                            Setting SO_RCVBUF to a value < MTU can result in
                            poorer performance and is not recommended.
SO_RCVLOWAT                 sets the minimum number of bytes to process for
                            socket input operations.
SO_RCVTIMEO                 sets the timeout for input function to complete.
SO_REUSEADDR                can reuse local ports i.e. another socket can bind to the
                            same port number except when there is an active
                            listening socket bound to the port.
SO_REUSEPORT                allow multiple sockets to bind to the same port.
SO_SNDBUF                   sets or gets the maximum socket send buffer in bytes.
                            The value set is doubled by the kernel and by Onload to
                            allow for bookkeeping overhead when it is set by the
                            setsockopt() function call. Note that EF_UDP_SNDBUF
                            overrides this value.
SO_SNDLOWAT                 sets the minimum number of bytes to process for
                            socket output operations. Always set to 1 byte.
SO_SNDTIMEO                 set the timeout for sending function to send before
                            reporting an error.
SO_TIMESTAMP                enable or disable receiving the SO_TIMESTAMP control
                            message (microsecond resolution). See below.
SO_TIMESTAMPNS              enable or disable receiving the SO_TIMESTAMP control
                            message (nanosecond resolution). See
                            SO_BINDTODEVICE on page 60.
SO_TIMESTAMPING             enable/disable hardware timestamps for received
                            packets.
SOF_TIMESTAMPING_TX_HARDWARE
                            obtain a hardware generated transmit timestamp.
SOF_TIMESTAMPING_SYS_HARDWARE
                            obtain a hardware transmit timestamp adjusted to the
                            system time base.
SO_TYPE                     returns the socket type (SOCK_STREAM or SOCK_DGRAM).
                            (Only valid as a getsockopt()).
9.3 Source Specific Socket Options
The following table identifies source specific socket options supported from Onload
201210-u1 onwards. Refer to release notes for Onload specific behavior regarding
these options.
Option                      Description
IP_ADD_SOURCE_MEMBERSHIP    Join the supplied multicast group on the given interface
                            and accept data from the supplied source address.
IP_DROP_SOURCE_MEMBERSHIP   Drops membership to the given multicast group,
                            interface and source address.
MCAST_JOIN_SOURCE_GROUP     Join a source specific group.
MCAST_LEAVE_SOURCE_GROUP    Leave a source specific group.
9.4 Onload Sockets vs. Kernel Sockets
For each UDP socket, Onload creates both an accelerated socket and a kernel socket.
Onload will always give priority to the Onload sockets over any kernel sockets.
This is important because if there is always traffic arriving at the Onload receive
queue, Onload might never get to process any packets delivered via the kernel
socket (e.g. if traffic arrives from a non-Solarflare interface).
9.5 UDP Sockets - Send and Receive Paths
For each UDP socket, Onload creates both an accelerated socket and a kernel socket.
There is usually no file descriptor for the kernel socket visible in the user's file
descriptor table. When a UDP process is ready to transmit data, Onload will check a
cached ARP table which maps IP addresses to MAC addresses. A cache 'hit' results
in sending via the Onload accelerated socket. A cache 'miss' results in a syscall to
populate the user mode cached ARP table. If no MAC address can be identified via
this process the packet is sent via the kernel stack to provoke ARP resolution.
Therefore, it is possible that some UDP traffic will be sent occasionally via the kernel
stack.
Figure 9: UDP Send and Receive Paths
Figure 9 illustrates the UDP send and receive paths. Lighter arrows indicate the
accelerated 'kernel bypass' path. Darker arrows identify fragmented UDP packets
received by the Solarflare adapter and UDP packets received from a non-Solarflare
adapter. UDP packets arriving at the Solarflare adapter are filtered on source and
destination address and port number to identify a VNIC the packet will be delivered
to. Fragmented UDP packets are received by the application via the kernel UDP
socket. UDP packets received by a non-Solarflare adapter are always received via the
kernel UDP socket.
9.6 Fragmented UDP
When sending datagrams which exceed the MTU, the Onload stack will send
multiple Ethernet packets. On hosts running Onload, fragmented datagrams are
always received via the kernel stack.
9.7 User Level recvmmsg for UDP
The recvmmsg() function is intercepted for UDP sockets which are accelerated by
Onload.
The Onload user level recvmmsg() is available to systems that do not have kernel/
libc support for this function. The recvmmsg() function is not supported for TCP sockets.
9.8 User Level sendmmsg for UDP
The sendmmsg() function is intercepted for UDP sockets which are accelerated by
Onload.
The Onload user level sendmmsg() is available to systems that do not have kernel/
libc support for this function. The sendmmsg() function is not supported for TCP sockets.
9.9 UDP sendfile
The UDP sendfile() method is not currently accelerated by Onload. When an
Onload accelerated application calls sendfile() this will be handled seamlessly by
the kernel.
9.10 Multicast Replication
The Solarflare SFN7000 and SFN8000 series adapters support multicast replication
where received packets are replicated in hardware and delivered to multiple receive
queues. This feature allows any number of Onload clients, listening to the same
multicast data stream, to receive their own copy of the packets, without an
additional software copy and without the need to share Onload stacks. As illustrated
below, the packets are delivered multiple times by the controller to each receive
queue that has installed a hardware filter to receive the specified multicast stream.
Figure 10: Hardware Multicast Replication
Multicast replication is performed in the adapter transparently and does not need
to be explicitly enabled.
This feature removes the need to share Onload stacks using the EF_NAME
environment variable. Users using EF_NAME exclusively for sharing multicast traffic
can now remove EF_NAME from their configurations.
9.11 Multicast Operation and Stack Sharing
To illustrate shared stacks, the following examples describe Onload behavior when
two processes, on the same host, subscribe to the same multicast stream:
• Multicast Receive Using Different Onload Stacks on page 103
• Multicast Transmit Using Different Onload Stacks on page 104
• Multicast Receive Sharing an Onload Stack on page 104
• Multicast Transmit Sharing an Onload Stack on page 105
• Multicast Receive - Onload Stack and Kernel Stack on page 105.
NOTE: The following subsections use two processes to demonstrate Onload
behavior. In practice multiple processes can share the same Onload stack. Stack
sharing is not limited to multicast subscribers and can be employed by any TCP and
UDP applications.
Multicast Receive Using Different Onload Stacks
Running on SFN5000 or SFN6000 series adapters (for SFN7000 and SFN8000 series,
see Multicast Replication on page 102), Onload will notice if two Onload stacks on
the same host subscribe to the same multicast stream and will respond by
redirecting the stream to go through the kernel. Handing the stream to the kernel,
though still using Onload stacks, allows both subscribers to receive the datagrams,
but user space acceleration is lost and the receive rate is lower than it could
otherwise be. Figure 11 below illustrates the configuration. Arrows indicate the
receive path and fragmented UDP path.
Figure 11: Multicast Receive Using Different Onload Stacks.
The reason for this behavior is that the Solarflare NIC will not deliver a single
received multicast packet multiple times to multiple stacks - the packet is delivered
only once (this does not apply to SFN7000 and SFN8000 series adapters). If a received
packet is delivered to kernel space, then the kernel TCP/IP stack will copy the
received data multiple times to each socket listening on the corresponding multicast
stream. If the received packet were delivered directly to Onload, where the stacks
are mapped to user space, it would only be delivered to a single subscriber of the
multicast stream.
Multicast Transmit Using Different Onload Stacks
Referring to Figure 11, if one process were to transmit multicast datagrams, these
would not be received by the second process. Onload is only able to accelerate
transmitted multicast datagrams when they do not need to be delivered to other
applications in the same host. Or more accurately, the multicast stream can only be
delivered within the same Onload stack.
Multicast Receive Sharing an Onload Stack
Setting the EF_NAME environment variable to the same string (max 8 chars) in both
processes means they can share an Onload stack. The stream is no longer redirected
through the kernel, resulting in a much higher receive rate than can be observed with
the kernel TCP/IP stack (or with separate Onload stacks where the data path is via
the kernel TCP/IP stack). This configuration is illustrated in Figure 12 below. Lighter
arrows indicate the accelerated (kernel bypass) path. Darker arrows indicate the
fragmented UDP path.
Figure 12: Sharing an Onload Stack
Multicast Transmit Sharing an Onload Stack
Referring to Figure 12, datagrams transmitted by one process would be received by
the second process because both processes share the Onload stack.
Multicast Receive - Onload Stack and Kernel Stack
If a multicast stream is being accelerated by Onload, and another application that is
not using Onload subscribes to the same stream, then the second application will
not receive the associated datagrams. Therefore if multiple applications subscribe
to a particular multicast stream, either all or none should be run with Onload.
To enable multiple applications accelerated with Onload to subscribe to the same
multicast stream, the applications must share the same Onload stack. Stack sharing
is achieved by using the EF_NAME environment variable (max 8 chars).
Multicast Receive and Multiple Sockets
When multiple sockets join the same multicast group, received packets are
delivered to these sockets in the order that they joined the group.
When multiple sockets are created by different threads and all threads are spinning
on recv(), the thread which is able to receive first will also deliver the packets to
the other sockets.
If a thread 'A' is spinning on poll(), and another thread 'B', listening to the same
group, calls recv() but does not spin, 'A' will notice a received packet first and
deliver the packet to 'B' without an interrupt occurring.
9.12 Multicast Loopback
The socket option IP_MULTICAST_LOOP controls whether multicast traffic sent on a
socket can be received locally on the machine. Receiving multicast traffic locally
requires both the sender and receiver to be using the same Onload stack. Therefore,
when a receiver is in the same application as the sender it will receive multicast
traffic. If sender and receiver are in different applications then both must be running
Onload and must be configured to share the same Onload stack.
For two processes to share an Onload stack both must set the same value for the
EF_NAME parameter (max 8 chars). If one local process is to receive the data sent by
a sending local process, EF_MCAST_SEND must be set to 1 or 3 on the thread creator
of the stack.
Users of earlier Onload versions and users of EF_MULTICAST_LOOP_OFF should refer
to the Parameter Reference table (Parameter Reference on page 163) for details of
deprecated features.
9.13 Hardware Multicast Loopback
Hardware Multicast Loopback, available from openonload-201405, is an alternative to
the Onload stack sharing scheme described in Multicast Loopback. It enables the
passing of multicast traffic between Onload stacks, allowing applications running on
the same server to benefit from Onload acceleration without the need to share an
Onload stack, thereby reducing the risk of stack lock and resource contention.
Figure 13: Hardware Multicast Loopback
• Only available on the Solarflare Flareon SFN7000 and SFN8000 series adapters.
• Adapters must have a minimum firmware version v4.0.7.6710 and "full
  featured" firmware must be selected using the firmware-variant option via
  the sfboot utility. Refer to the Solarflare Server User Guide 'sfboot
  parameters' for further details.
Hardware Multicast Loopback allows data generated by one process to be received
by another process on the same host - Multicast Replication does not support local
loopback.
Reception of looped back traffic is enabled by default on a per Onload stack basis. A
stack can choose not to receive looped back traffic by setting the environment
variable EF_MCAST_RECV_HW_LOOP=0.
NOTE: Hardware Multicast Loopback is enabled through a single hardware filter.
For this reason, if any single process chooses to receive multicast loopback traffic
by EF_MCAST_RECV_HW_LOOP=1, then all other processes joined to the same
multicast group will also receive the loopback traffic regardless of their setting for
EF_MCAST_RECV_HW_LOOP.
Sending of looped back traffic is disabled by default. On a per stack basis this feature
can be enabled by setting the environment variable EF_MCAST_SEND to either 2 or 3.
Setting the socket option IP_MULTICAST_TTL=0 will disable the sending of traffic on the
normal network path and prevent traffic being looped back. The value of the socket
option IP_MULTICAST_LOOP has no effect on Hardware Multicast Loopback. Refer
to Onload and IP_MULTICAST_TTL on page 132 for differences in Linux kernel and
Onload behavior.
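The EF_MCAST_SEND values quoted in this chapter (1 or 3 for loopback within a shared stack, 2 or 3 for hardware loopback) are consistent with a two-bit flag. That reading is an inference from the text, not a statement from the Parameter Reference. Under that assumption, a launcher might derive the environment settings as sketched below. The function name multicast_loopback_env and its parameters are hypothetical, and the meaning of the value 0 (neither form of send loopback) is also assumed.

```python
def multicast_loopback_env(same_stack_send, hardware_send, hardware_recv=True):
    """Build environment settings for multicast loopback (illustrative only).

    Assumes EF_MCAST_SEND behaves as a bit field: bit 0 for loopback within
    the sending stack, bit 1 for hardware loopback to other stacks.
    """
    send = (1 if same_stack_send else 0) | (2 if hardware_send else 0)
    return {
        "EF_MCAST_SEND": str(send),
        # Reception of looped back traffic is enabled by default (1).
        "EF_MCAST_RECV_HW_LOOP": "1" if hardware_recv else "0",
    }
```

The returned dictionary can be merged into the environment of the process before it creates its Onload stack. Consult the Parameter Reference for the authoritative definition of each value.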
9.14 IP_MULTICAST_ALL
For an accelerated socket, Onload will always behave as if IP_MULTICAST_ALL=0.
There is always the potential for messages to arrive at the host - perhaps from a
non-Solarflare interface or via the loopback interface - which will also be delivered
to the socket under normal UDP port matching rules, so the socket could receive
traffic for groups not explicitly joined on this socket.
10 Packet Buffers
10.1 Introduction
Packet buffers describe the memory used by the Onload stack (and Solarflare
adapter) to receive, transmit and queue network data. Packet buffers provide a
method for user mode accessible memory to be directly accessed by the network
adapter without compromising system integrity.
Onload will request huge pages if these are available when allocating memory for
packet buffers. Using huge pages can lead to improved performance for some
applications by reducing the number of Translation Lookaside Buffer (TLB) entries
needed to describe packet buffers and therefore minimize TLB 'thrashing'.
NOTE: Onload huge page support should not be enabled if the application uses IPC
namespaces and the CLONE_NEWIPC flag.
Onload offers two configuration modes for network packet buffers:
10.2 Network Adapter Buffer Table Mode
Solarflare network adapters employ a proprietary hardware based buffer address
translation mechanism to provide memory protection and translation to Onload
stacks accessing a VNIC on the adapter. This is the default packet buffer mode and
is suitable for the majority of applications using Onload.
This scheme employs a buffer table residing on the network adapter to control the
memory an Onload stack can use to send and receive packets.
While the adapter's buffer table is sufficient for the majority of applications, on
adapters prior to the SFN7000 series, it is limited to approximately 120,000 x 2Kbyte
buffers which have to be shared between all Onload stacks.
If the total packet buffer requirements of all applications using Onload require more
than the number of packet buffers supported by the adapter's buffer table, the user
should consider changing to the Scalable Packet Buffers configuration.
10.3 Large Buffer Table Support
The Solarflare SFN7000 and SFN8000 series adapters alleviate the packet buffer
limitations of previous generation Solarflare adapters and support many more than
the 120,000 packet buffers without the need to switch to Scalable Packet Buffer
Mode.
Each buffer table entry in the SFN7000 and SFN8000 series adapter can describe a
4Kbyte, 64Kbyte, 1Mbyte or 4Mbyte block of memory where each table entry is the
page size as directed by the operating system.
10.4 Scalable Packet Buffer Mode
Scalable Packet Buffer Mode is an alternative packet buffer mode which allows a
much higher number of packet buffers to be used by Onload. Using the Scalable
Packet Buffer Mode, Onload stacks employ Single Root I/O Virtualization (SR-IOV)
virtual functions (VF) to provide memory protection and translation.
For deployments where using SR-IOV and/or the IOMMU is not an option, Onload
also supports an alternative Scalable Packet Buffer Mode scheme called Physical
Addressing Mode. Physical addressing does not provide the memory protection
provided by SR-IOV and an IOMMU. For details of Physical Addressing Mode see
Physical Addressing Mode on page 117.
NOTE: MRG users should refer to Red Hat MRG 2 and SR-IOV on page 143.
For further details on SR-IOV configuration refer to Configuring Scalable Packet
Buffers on page 113.
10.5 Allocating Huge Pages
Using huge pages can lead to improved performance for some applications by
reducing the number of Translation Lookaside Buffer (TLB) entries needed to
describe packet buffers and therefore minimize TLB 'thrashing'. A huge page also
delivers many packet buffers, but consumes only a single entry in the buffer table.
Explicit huge pages are recommended.
The current huge page allocation can be checked by inspection of /proc/meminfo:
cat /proc/meminfo | grep Huge
This should return something similar to:
AnonHugePages: 2048 kB
HugePages_Total: 2050
HugePages_Free: 2050
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
The total number of huge pages available on the system is the value
HugePages_Total. The following command can be used to dynamically set and/or
change the number of huge pages allocated on a system to <N> (where <N> is a
non-negative integer):
echo <N> > /proc/sys/vm/nr_hugepages
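The huge page fields reported by /proc/meminfo can also be read from a script, for example to check the pool before starting an Onload application. The following is a minimal sketch. The function names parse_hugepage_info and read_hugepage_info are hypothetical and are not part of any Onload tool.

```python
def parse_hugepage_info(meminfo_text):
    """Return the huge page related fields of /proc/meminfo as integers.

    Sizes keep the unit used by the kernel (kB); counts are plain numbers.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or "huge" not in key.lower():
            continue  # not a "Name: value" line about huge pages
        info[key.strip()] = int(rest.split()[0])
    return info


def read_hugepage_info(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as meminfo:
        return parse_hugepage_info(meminfo.read())
```

For example, read_hugepage_info()["HugePages_Free"] gives the number of huge pages currently unused on the running system.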
On a NUMA platform, the kernel will attempt to distribute the huge page pool over
the set of all allowed nodes specified by the NUMA memory policy of the task that
modifies nr_hugepages. The following command can be used to check the per node
distribution of huge pages in a NUMA system:
cat /sys/devices/system/node/node*/meminfo | grep Huge
Huge pages can also be allocated on a per NUMA node basis (rather than have the
huge pages allocated across multiple NUMA nodes). The following command can be
used to allocate <N> huge pages on NUMA node <M>:
echo <N> > /sys/devices/system/node/node<M>/hugepages/hugepages-2048kB/nr_hugepages
10.6 How Packet Buffers Are Used by Onload
Each packet buffer is allocated to exactly one Onload stack and is used to receive,
transmit or queue network data. Packet buffers are used by Onload in the following
ways:
1 Receive descriptor rings. By default the RX descriptor ring will hold 512 packet
  buffers at all times. This value is configurable using the EF_RXQ_SIZE (per
  stack) variable.
2 Transmit descriptor rings. By default the TX descriptor ring will hold up to 512
  packet buffers. This value is configurable using the EF_TXQ_SIZE (per stack)
  variable.
3 To queue data held in receive and transmit socket buffers.
4 TCP sockets can also hold packet buffers in the socket's retransmit queue and
  in the reorder queue.
5 User level pipes also consume packet buffer resources.
Identifying Packet Buffer Requirements
When deciding the number of packet buffers required by an Onload stack,
consideration should be given to the resource needs of the stack to ensure that the
available packet buffers can be shared efficiently between all Onload stacks.
Example 1:
If we consider a hypothetical case of a single host:
- which employs multiple Onload stacks e.g. 10
- each stack has multiple sockets e.g. 6
- and each socket uses many packet buffers e.g. 2000
This would require a total of 120000 packet buffers.
Example 2:
If on a stack the TCP receive queue is 1Mbyte and the MSS value is 1472 bytes,
this would require at least 700 packet buffers (and a greater number if
segments smaller than the MSS were received).
Example 3:
A UDP receive queue of 200Kbytes where received datagrams are each 200
bytes would hold 500 packet buffers because each packet buffer is 2048 bytes even
if the data is less than 2048 bytes.
The examples above use only approximate calculated values. The
onload_stackdump command provides accurate measurements of packet buffer
allocation and usage.
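The arithmetic behind Examples 1 and 2 can be sketched as follows. The function names are hypothetical and the results are estimates of the same kind as the examples, not measurements. Taking 1 Mbyte as 1048576 bytes is an assumption, and it yields 713 buffers for Example 2, in line with the "at least 700" stated above.

```python
def buffers_for_host(stacks, sockets_per_stack, buffers_per_socket):
    """Total packet buffers for a host where every socket has the same need."""
    return stacks * sockets_per_stack * buffers_per_socket


def min_buffers_for_queue(queue_bytes, payload_bytes_per_buffer):
    """Fewest packet buffers needed to hold queue_bytes of payload.

    Each buffer carries at most one segment, so this is a ceiling division.
    Smaller segments would need more buffers than this lower bound.
    """
    return -(-queue_bytes // payload_bytes_per_buffer)


# Example 1: 10 stacks, 6 sockets per stack, 2000 buffers per socket.
example_1 = buffers_for_host(10, 6, 2000)

# Example 2: 1 Mbyte TCP receive queue filled with 1472 byte segments.
example_2 = min_buffers_for_queue(1024 * 1024, 1472)
```

Because every buffer occupies a full 2048 byte slot regardless of payload, workloads with small datagrams consume buffers much faster than the queue size in bytes suggests.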
Consideration should be given to packet buffer allocation to ensure that each stack
is allocated the buffers it will require rather than a 'one size fits all' approach.
When using the Buffer Table Mode the system is limited to 120K packet buffers -
these are allocated symmetrically across all Solarflare interfaces.
NOTE: Packet buffers are accessible to all network interfaces and each packet buffer
requires an entry in every network adapter's buffer table. Adding more network
adapters - and therefore more interfaces - does not increase the number of packet
buffers available.
For large scale applications the Scalable Packet Buffer Mode removes the limitations
imposed by the network adapter buffer table. See Configuring Scalable Packet
Buffers on page 113 for details.
Running Out of Packet Buffers
When Onload detects that a stack is close to allocating all available packet buffers it
will take action to try and avoid packet buffer exhaustion. Onload will automatically
start dropping packets on receive and, where possible, will reduce the receive
descriptor ring fill level in an attempt to alleviate the situation. A 'memory pressure'
condition can be identified using the onload_stackdump lots command where
the pkt_bufs field will display the CRITICAL indicator. See Identifying Memory
Pressure below.
Complete packet buffer exhaustion can result in deadlock. In an Onload stack, if all
available packet buffers are allocated (for example currently queued in socket
buffers) the stack is prevented from transmitting further data as there are no packet
buffers available for the task.
If all available packet buffers are allocated then Onload will also fail to keep its
adapter's receive queues replenished. If the queues fall empty, further data received
by the adapter is instantly dropped. On a TCP connection packet buffers are used to
hold unacknowledged data in the retransmit queue, and dropping received packets
containing ACKs delays the freeing of these packet buffers back to Onload. Setting
the value of EF_MIN_FREE_PACKETS=0 can result in a stack having no free packet
buffers and this, in turn, can prevent the stack from shutting down cleanly.
Identifying Memory Pressure
The following extracts from the onload_stackdump command identify an Onload
stack under memory pressure.
The EF_MAX_PACKETS value identifies the maximum number of packet buffers that
can be used by the stack. EF_MAX_RX_PACKETS is the maximum number of packet
buffers that can be used to hold packets received. EF_MAX_TX_PACKETS is the
maximum number of packet buffers that can be used to hold packets to send. These
two values are always less than EF_MAX_PACKETS to ensure that neither the transmit
nor the receive path can starve the other of packet buffers. Refer to Parameter
Reference on page 163 for detailed descriptions of these per stack variables.
The example Onload stack has the following default environment variable values:
EF_MAX_PACKETS: 32768
EF_MAX_RX_PACKETS: 24576
EF_MAX_TX_PACKETS: 24576
The onload_stackdump lots command identifies packet buffer allocation and the
onset of a memory pressure state:
pkt_bufs: size=2048 max=32768 alloc=24576 free=32 async=0 CRITICAL
pkt_bufs: rx=24544 rx_ring=9 rx_queued=24535
There are potentially 32768 packet buffers available and the stack has allocated
(used) 24576 packet buffers.
In the socket receive buffers there are 24544 packet buffers waiting to be
processed by the application - this is approaching the EF_MAX_RX_PACKETS limit and
is the reason the CRITICAL flag is present i.e. the Onload stack is under memory
pressure. Only 9 packet buffers are available to the receive descriptor ring.
Onload will aim to keep the RX descriptor ring full at all times. If there are not
enough available packet buffers to refill the RX descriptor ring this is indicated by the
LOW memory pressure flag.
The onload_stackdump lots command will also identify the number of memory
pressure events and number of packets dropped when Onload fails to allocate a
packet buffer on the receive path.
memory_pressure_enter: 1
memory_pressure_drops: 22096
The memory_pressure enter/exit counters count the number of times Onload
enters/exits a state in which it is trying to refill the receive queue (rxq) when the adapter
runs out of packet buffers.
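For monitoring scripts, the pkt_bufs lines shown above can be split into numeric fields and flags. The sketch below assumes the layout printed in this section (a pkt_bufs: label followed by space separated key=value tokens and optional bare flags such as CRITICAL or LOW). The function name parse_pkt_bufs is hypothetical, and the exact spacing of real onload_stackdump output may differ.

```python
def parse_pkt_bufs(line):
    """Split one 'pkt_bufs:' line into ({name: int}, [flags])."""
    label, sep, rest = line.partition(":")
    if not sep or label.strip() != "pkt_bufs":
        raise ValueError("not a pkt_bufs line: %r" % line)
    fields, flags = {}, []
    for token in rest.split():
        key, eq, value = token.partition("=")
        if eq:
            fields[key] = int(value)
        else:
            flags.append(token)  # bare word such as CRITICAL or LOW
    return fields, flags


summary, summary_flags = parse_pkt_bufs(
    "pkt_bufs: size=2048 max=32768 alloc=24576 free=32 async=0 CRITICAL")
rx, rx_flags = parse_pkt_bufs("pkt_bufs: rx=24544 rx_ring=9 rx_queued=24535")

# In the sample, ring and queued buffers together account for all RX buffers.
rx_accounted = rx["rx_ring"] + rx["rx_queued"] == rx["rx"]
under_pressure = "CRITICAL" in summary_flags
```

A monitoring job could raise an alert whenever under_pressure is true or when free falls below a site specific threshold.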
Controlling Onload Packet Buffer Use
A number of environment variables control the packet buffer allocation on a per
stack basis. Refer to Parameter Reference on page 163 for a description of
EF_MAX_PACKETS.
Unless explicitly configured by the user, EF_MAX_RX_PACKETS and
EF_MAX_TX_PACKETS will be automatically set to 75% of the EF_MAX_PACKETS
value. This ensures that sufficient buffers are available to both receive and
transmit. EF_MAX_RX_PACKETS and EF_MAX_TX_PACKETS are not typically configured by
the user.
If an application requires more packet buffers than the maximum configured, then
EF_MAX_PACKETS may be increased to meet demand. However, it should be
recognized that larger packet buffer queues increase cache footprint, which can
lead to reduced throughput and increased latency.
EF_MAX_PACKETS is the maximum number of packet buffers that could be used by
the stack. Setting EF_MAX_RX_PACKETS to a value greater than EF_MAX_PACKETS
effectively means that all packet buffers (EF_MAX_PACKETS) allocated to the stack
will be used for RX, with nothing left for TX. The safest method is to only increase
EF_MAX_PACKETS, which keeps the RX and TX packet buffer values at 75% of this
value.
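The 75% rule can be illustrated with a short calculation. This is a sketch only:
the function name packet_buffer_limits and the returned dictionary are hypothetical
and simply restate the rule described above; they are not an Onload interface.

```python
def packet_buffer_limits(max_packets, max_rx=None, max_tx=None):
    """Return the RX/TX limits that apply for a given EF_MAX_PACKETS value.

    When max_rx or max_tx is None, the 75% default described in the text is
    used. A user-supplied value is returned unchanged.
    """
    default = max_packets * 3 // 4  # 75% of EF_MAX_PACKETS
    rx = default if max_rx is None else max_rx
    tx = default if max_tx is None else max_tx
    return {
        "EF_MAX_PACKETS": max_packets,
        "EF_MAX_RX_PACKETS": rx,
        "EF_MAX_TX_PACKETS": tx,
        # True when RX alone could consume every buffer, leaving none for TX.
        "rx_can_starve_tx": rx >= max_packets,
    }


defaults = packet_buffer_limits(32768)
risky = packet_buffer_limits(32768, max_rx=40000)
```

With EF_MAX_PACKETS=32768 the default RX limit is 24576, which is the limit that the
earlier CRITICAL example (rx=24544) was approaching.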
10.7 Configuring Scalable Packet Buffers
Using the Scalable Packet Buffer Mode, Onload stacks are bound to virtual functions
(VFs) and provide a PCI SR-IOV compliant means to provide memory protection and
translation. VFs employ the kernel IOMMU.
Refer to Chapter 12 and Scalable Packet Buffer Mode on page 141 for 32-bit kernel
limitations.
Procedure:
Step 1. Platform Support on page 113
Step 2. BIOS and Linux Kernel Configuration on page 114
Step 3. Update adapter firmware and enable SR-IOV on page 115
Step 4. Enable VFs for Onload on page 116
Step 5. Check PCIe VF Configuration on page 116
Step 6. Check VFs in onload_stackdump on page 116
Step 1. Platform Support
Scalable Packet Buffer Mode is implemented using SR-IOV. There were several
kernel bugs in early incarnations of SR-IOV support, up to and including kernel.org
2.6.34. The fixes have been back-ported to recent Red Hat kernels. Users are advised
to enable scalable packet buffer mode on Red Hat kernel 2.6.32-131.0.15 or later, or
kernel.org 2.6.35 or later.
• The system hardware must have an IOMMU and this must be enabled in the
  BIOS.
• The kernel must be compiled with support for IOMMU, and kernel command
  line options are required to select the IOMMU mode.
• The kernel must be compiled with support for SR-IOV APIs (CONFIG_PCI_IOV).
• SR-IOV must be enabled on the network adapter using the sfboot utility.
• When more than 6 VFs are needed, the system hardware and kernel must
  support PCIe Alternative Routing-ID Interpretation (ARI), a PCIe Gen 2 feature.
• The Onload option EF_PACKET_BUFFER_MODE=1 must be set in the environment.
• The sfc driver module option max_vfs should be set to the required number of
  VFs.
NOTE: The Scalable Packet Buffer feature can be susceptible to known kernel issues
observed on RHEL6 and SLES 11. (See
http://www.spinics.net/lists/linux-pci/msg10480.html for details.) The condition
can result in an unresponsive server if intel_iommu has been enabled in the
grub.conf file, as per the procedure at Step 2. BIOS and Linux Kernel
Configuration on page 114, and if the Solarflare sfc_resource driver is reloaded.
This issue has been addressed in newer kernels.
Step 2. BIOS and Linux Kernel Configuration
To use SR-IOV, hardware virtualization must be enabled. Refer to Red Hat Enabling
Intel VT-x and AMD-V Virtualization in BIOS for more information. Take care to
enable VT-d as well as VT on an Intel platform.
To verify that the extensions have been correctly enabled refer to Red Hat Verifying
virtualization extensions. For best performance, and to avoid kernel bugs exhibited
when the IOMMU is enabled for all devices, Solarflare recommends that the kernel is
configured to use the IOMMU in pass-through mode. Append the following options to
the kernel line in the /boot/grub/grub.conf file:
On an Intel system:
intel_iommu=on iommu=on,pt
On an AMD system:
amd_iommu=on iommu=on,pt
In pass-through mode the IOMMU is bypassed for regular devices. Refer to Red Hat:
PCI passthrough for more information.
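After rebooting, the kernel command line can be inspected to confirm that the
options took effect. The sketch below is illustrative only: the function name
iommu_passthrough_status is hypothetical, and it assumes the options appear as
whitespace-separated tokens exactly as written in the grub.conf example above.

```python
def iommu_passthrough_status(cmdline):
    """Return (vendor_iommu_enabled, passthrough_requested) for a command line.

    cmdline is the text of the kernel command line, for example the contents
    of /proc/cmdline on a running Linux system.
    """
    tokens = cmdline.split()
    vendor_on = any(t in ("intel_iommu=on", "amd_iommu=on") for t in tokens)
    passthrough = False
    for token in tokens:
        if token.startswith("iommu="):
            # The value is a comma-separated list such as "on,pt".
            passthrough = "pt" in token[len("iommu="):].split(",")
    return vendor_on, passthrough


configured = iommu_passthrough_status(
    "ro root=/dev/sda1 intel_iommu=on iommu=on,pt")
unconfigured = iommu_passthrough_status("ro root=/dev/sda1 quiet")
```

On a live system the same function could be given the text read from
/proc/cmdline.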
NOTE: Real-time (rt) kernel patches are not currently compatible with IOMMUs
(Red Hat MRG kernels are compiled with CONFIG_PCI_IOV disabled). It is possible
to use scalable packet buffer mode on some systems without IOMMU support, but
in an insecure mode. In this configuration the IOMMU is bypassed, and there is no
checking of DMA addresses provided by Onload in user-space. Bugs or misbehavior
of user-space code can compromise the system.
To enable this insecure mode, set the Onload module option
unsafe_sriov_without_iommu=1 for the sfc_resource kernel module.
Linux MRG users are urged to use MRGu2 and kernel 3.2.33-rt50.66.el6rt.x86_64
or later to avoid known issues and limitations of earlier versions.
The unsafe_sriov_without_iommu option is obsoleted in OpenOnload 201210. It
is replaced by physical addressing mode - see Physical Addressing Mode on
page 117 for details.
Step 3. Update adapter firmware and enable SR-IOV
1 Download the Solarflare Linux Utilities package from
  support.solarflare.com and unzip the utilities file to reveal the RPM.
2 Install the RPM:
  # rpm -Uvh sfutils-<version>.rpm
3 Identify the current firmware version on the adapter:
  # sfupdate
4 Upgrade the adapter firmware with sfupdate:
  # sfupdate --write
  Full instructions on using sfupdate can be found in the Solarflare Network
  Server Adapter User Guide.
5 Use sfboot to enable SR-IOV and enable the VFs. You can enable up to 127 VFs
  per port, but the host BIOS may only be able to support a smaller number. The
  following example will configure 16 VFs on each Solarflare port:
  # sfboot sriov=enabled vf-count=16 vf-msix-limit=1

  Option                     Default Value   Description
  sriov=<enabled|disabled>   Disabled        Enable/disable hardware SR-IOV
                                             support
  vf-count=<n>               127             Number of virtual functions
                                             advertised per port. See the
                                             notes below.
  vf-msix-limit=<n>          1               Number of MSI-X interrupts per VF

6 It is necessary to reboot the server following changes using sfboot and
  sfupdate.
NOTE: Enabling all 127 VFs per port with more than one MSI-X interrupt per VF may
not be supported by the host BIOS. If the BIOS doesn't support this then you may
get 127 VFs on one port and no VFs on the other port. You should contact your BIOS
vendor for an upgrade or reduce the VF count.
NOTE: On Red Hat 5 servers the vf-count should not exceed 32.
NOTE: VF allocation must be symmetric across all Solarflare interfaces.
Step 4. Enable VFs for Onload
# export EF_PACKET_BUFFER_MODE=1
The sfc driver module option max_vfs should specify the number of required VFs.
The driver module option can be set in a user-created file (e.g. sfc.conf) in the
/etc/modprobe.d directory:
options sfc max_vfs=N
Refer to Parameter Reference on page 163 for other values.
Step 5. Check PCIe VF Configuration
The network adapter sfc driver will initialize the VFs, which can be displayed by
the lspci command:
# lspci -d 1924:
05:00.0 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.1 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.2 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.3 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.4 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.5 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.6 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:00.7 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.0 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.1 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
The lspci example output above identifies one physical function per physical port
and the virtual functions (four for each port) of a single Solarflare dual-port
network adapter.
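Counting the functions by eye becomes tedious with larger VF counts. The following
sketch separates physical and virtual functions in lspci text. It is an
illustration only: classify_functions is a hypothetical helper, and it assumes each
device occupies one line and that VFs are marked with the words "Virtual Function"
as in the example output above.

```python
SAMPLE = """\
05:00.0 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.1 Ethernet controller: Solarflare Communications SFC9020 [Solarflare]
05:00.2 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
05:01.1 Ethernet controller: Solarflare Communications SFC9020 Virtual Function [Solarflare]
"""


def classify_functions(lspci_output):
    """Return (physical_functions, virtual_functions) as lists of PCI addresses."""
    pfs, vfs = [], []
    for line in lspci_output.splitlines():
        line = line.strip()
        if not line or "Solarflare" not in line:
            continue  # skip blank lines and other vendors' devices
        address = line.split()[0]
        if "Virtual Function" in line:
            vfs.append(address)
        else:
            pfs.append(address)
    return pfs, vfs


pfs, vfs = classify_functions(SAMPLE)
```

The number of VFs per port can then be compared against the vf-count value
configured with sfboot.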
Step 6. Check VFs in onload_stackdump
The onload_stackdump netif command will identify VFs being used by Onload
stacks as in the following example:
# onload_stackdump netif
ci_netif_dump: stack=0 name=
ver=201109 uid=0 pid=3354
lock=10000000 UNLOCKED nics=3 primed=3
sock_bufs: max=1024 n_allocated=4
pkt_bufs: size=2048 max=32768 alloc=1152 free=128 async=0
pkt_bufs: rx=1024 rx_ring=1024 rx_queued=0
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0 tx_other=0
time: netif=3df7d2 poll=3df7d2 now=3df7d2 (diff=0.000sec)
ci_netif_dump_vi: stack=0 intf=0 vi=67 dev=0000:05:01.0 hw=0C0
evq: cap=2048 current=8 is_32_evs=0 is_ev=0
rxq: cap=511 lim=511 spc=15 level=496 total_desc=0
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
txq: tot_pkts=0 bytes=0
ci_netif_dump_vi: stack=0 intf=1 vi=67 dev=0000:05:01.1 hw=0C0
evq: cap=2048 current=8 is_32_evs=0 is_ev=0
rxq: cap=511 lim=511 spc=15 level=496 total_desc=0
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
txq: tot_pkts=0 bytes=0
The output above corresponds to VFs advertised on the Solarflare network adapter
interface identified using the lspci command - refer to Step 5 above.
10.8 Physical Addressing Mode
Physical addressing mode is a Scalable Packet Buffer Mode that also allows Onload
stacks to use large amounts of packet buffer memory (avoiding the limitations of the
address translation table on the adapter), but without the requirement to configure
and use SR-IOV virtual functions.
Physical addressing mode does, however, remove memory protection from the
network adapter's access of packet buffers. Unprivileged user-level code is provided
with, and directly handles, the raw physical memory addresses of packet buffers.
User-level code provides physical memory addresses directly to the adapter and
therefore has the ability to direct the adapter to read or write arbitrary memory
locations. A result of this is that a malicious or buggy application can compromise
system integrity and security.
OpenOnload versions earlier than onload-201210 and EnterpriseOnload 2.1.0.0 are
limited to 1 million packet buffers. This limit was raised to 2 million packet
buffers in 201210-u1 and EnterpriseOnload 2.1.0.1.
To enable physical addressing mode:
1 Ignore configuration steps 1-4 above.
2 Put the following option into a user-created .conf file in the /etc/modprobe.d
  directory:
  options onload phys_mode_gid=<n>
  Setting <n> to -1 allows all users to use physical addressing mode. Setting
  <n> to an integer x restricts use of physical addressing mode to the specific
  user group x.
3 Reload the Onload drivers:
  onload_tool reload
4 Enable the Onload environment using EF_PACKET_BUFFER_MODE 2 or 3.
  EF_PACKET_BUFFER_MODE=2 is equivalent to mode 0, but uses physical
  addresses. Mode 3 uses SR-IOV VFs with physical addresses, but does not use
  the IOMMU for memory translation and protection. Refer to Parameter
  Reference on page 163 for a complete description of all
  EF_PACKET_BUFFER_MODE options.
10.9 Programmed I/O
PIO (programmed input/output) describes the process whereby data is directly
transferred by the CPU to or from an I/O device. It is an alternative to bus master
DMA techniques where data are transferred without CPU involvement.
Solarflare SFN7000 series adapters support TX PIO, where packets on the transmit
path can be "pushed" to the adapter directly by the CPU. This improves the latency
of transmitted packets but can cause a very small increase in CPU utilization. TX PIO
is therefore especially useful for smaller packets.
The Onload TX PIO feature is enabled by default but can be disabled via the
environment variable EF_PIO. An additional environment variable,
EF_PIO_THRESHOLD, specifies the size of the largest packet that can use TX PIO.
The number of PIO buffers available depends on the adapter type being used and the
number of PCIe Physical Functions (PF) exposed per port.
For optimum performance, PIO buffers should be reserved for critical processes and
other processes should set EF_PIO to 0 (zero).
The Onload stackdump utility provides additional counters to indicate the level of
PIO use - see TX PIO Counters on page 262 for details.
The Solarflare net driver will also use PIO buffers for non-accelerated sockets and
this will reduce the number of PIO buffers available to Onload stacks. To prevent
this, set the driver module option piobuf_size=0. Driver module options can be set
in a user-created file (sfc.conf) in the /etc/modprobe.d directory:
options sfc piobuf_size=0
An Onload stack requires one PIO buffer for each VI it creates. An Onload stack will
create one VI for each physical interface that it uses.

Solarflare adapter   Total PIO buffers   Maximum per PF   PIO buffer size
SFN7x02              16                  16               2KB
SFN7x22              16                  16               2KB
SFN7x24              32                  16               2KB
SFN7x42              32                  16               2KB
SFN8522              16                  16               4KB
When both accelerated and non-accelerated sockets are using PIO, the number of
PIO buffers available to Onload stacks can be calculated from the available PIO
regions:

Term          Description                                        Example value
piobuf_size   driver module parameter                            256
rss_cpus      driver module parameter                            4
region        a chunk of memory 2048 bytes                       2048 bytes
PF            PCIe physical function. The SFN7000 or SFN8000     Default 1 PF
              series adapter can be partitioned to expose up
              to 8 PFs per physical port. Refer to Onload and
              NIC Partitioning on page 123 for details.

Using the above example values applied to a SFN7x22 adapter, each PF on the
adapter requires:
piobuf_size * rss_cpus * num_PFs / region size = 0.5 regions (round up, so each
port needs 1 region).
This leaves 16 - 2 = 14 regions for Onload stacks, which also require one region per
port, per stack. Therefore from our example we can have 7 Onload stacks using PIO
buffers.
PIO buffers are allocated on a first-come, first-served basis. The following warning
might be observed when stacks cannot be allocated any more PIO buffers:
WARNING: all PIO bufs allocated to other stacks. Continuing without PIO.
Use EF_PIO to control this
To ensure more buffers are available for Onload, it is possible to prevent the net
driver from using PIO buffers. This can be done by setting the sfc driver module
option in a user-created file in the /etc/modprobe.d directory:
options sfc piobuf_size=0
Drivers should be reloaded for the changes to be effective:
# onload_tool reload
The per-stack EF_PIO variable can also be set to 0 for stacks where PIO buffers are
not required.
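The PIO region arithmetic in this section can be reproduced with a few lines of
code. This is a sketch under stated assumptions rather than a definitive model: the
function name onload_pio_stacks is hypothetical, the adapter is assumed to have two
ports (as implied by the "16 - 2 = 14" step for the SFN7x22), and every Onload
stack is assumed to use every port.

```python
import math


def onload_pio_stacks(total_pio_buffers, ports, piobuf_size, rss_cpus,
                      num_pfs=1, region_size=2048):
    """Estimate how many Onload stacks can obtain PIO buffers.

    The net driver needs piobuf_size * rss_cpus * num_pfs bytes per port,
    rounded up to whole regions. Each Onload stack then needs one region
    per port.
    """
    regions_per_port = math.ceil(piobuf_size * rss_cpus * num_pfs / region_size)
    net_driver_regions = regions_per_port * ports
    remaining = max(total_pio_buffers - net_driver_regions, 0)
    return remaining // ports


# Example values from the text: SFN7x22, piobuf_size=256, rss_cpus=4, 1 PF.
stacks = onload_pio_stacks(total_pio_buffers=16, ports=2,
                           piobuf_size=256, rss_cpus=4)

# With piobuf_size=0 the net driver takes no regions at all.
stacks_without_net_driver = onload_pio_stacks(16, 2, piobuf_size=0, rss_cpus=4)
```

The second call shows the effect of setting the piobuf_size=0 module option: all 16
regions remain for Onload, which allows one additional stack on a two-port adapter.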
11 Onload and Virtualization
11.1 Introduction
Using Onload from release 201502, accelerated applications are able to benefit from
the inherent security through isolation, ease of deployment through migration and
increased resource management supported by Linux virtualized environments.
This chapter identifies the following:
• Onload and Linux KVM on page 120
• Onload and NIC Partitioning on page 123
• Onload in a Docker Container on page 124
11.2 Overview
• Running Onload in a Virtual Machine (VM) or Docker Container means the
  Onload accelerated application benefits from the inherent isolation policy of
  the virtualized environment.
• There is minimal degradation of latency and throughput performance. Near
  native network I/O performance is possible because there is direct hardware
  access (no hardware emulation) with the guest kernel (and virtualization
  platform hypervisor) being bypassed.
• Multiple containers/virtual machines can coexist on the same host and all are
  isolated from each other.
11.3 Onload and Linux KVM
This feature is supported on Solarflare SFN7000 and SFN8000 series adapters.
OpenOnload includes support to accelerate applications running within Linux VMs
on a KVM host. Each physical interface on the adapter can be exposed to the host as
up to 16 PCIe physical functions (PF) and up to 240 virtual functions (VF). The
adapter also supports up to 2048 MSI-X interrupts.
This support requires a VF (or PF) to be exposed directly into the Linux VM. KVM
calls this network configuration “Network hostdev”. Onload provides user-level
access to the adapter via the VF in exactly the same way as is achieved on a
non-virtualized Linux install. Firmware on the Solarflare SFN7000 and SFN8000
series adapter configures layer 2 switching capability that supports the transport
of network packets between PCI physical functions and virtual functions. This
feature supports the transport of network traffic between Onload applications
running in different virtual machines. This allows traffic to be replicated across
multiple functions, and traffic transmitted from one VM can be received on another
VM.
Figure 14 below illustrates Onload deployed into the Linux KVM Network Hostdev
architecture which exposes Virtual Functions (VF) directly to the VM guest. This
configuration allows the Onload data path to fully bypass the host operating system
and provides maximum acceleration for network traffic.
Figure 14: Onload and Network Hostdev Configuration
To deploy Onload in a Linux KVM:
• As detailed in the Solarflare Server Adapter User Guide (SF-103837-CD) chapter
  7 SR-IOV:
  - Install the Solarflare NET driver version 4.4.1.1017 (or later)
  - Ensure the adapter is using firmware version 4.4.2.1011 (or later)
  - Run sfboot to select the full-feature firmware variant, set the switch mode
    and identify the required number of VFs:
    # sfboot firmware-variant=full-feature switch-mode=sriov vf-count=4
  - Reboot the server, so the Linux KVM host can enumerate the VFs
• Follow the instructions in Solarflare Server Adapter User Guide (SF-103837-CD)
  section KVM Libvirt network hostdev - Configuration to:
  - Create a VM
  - Configure the VFs
  - Unbind VFs from the host
  - Pass VFs to the VM
  Example virsh command line and XML file configuration instructions are
  provided.
• Install Onload in the VM as in a non-virtualized host - see OpenOnload -
  Installation on page 25.
• Set the sfc driver module option num_vis to create the number of virtual
  interfaces. A VI is needed for each Onload stack created on a VF. Driver module
  options should be set in a user-created file (e.g. sfc.conf) in the
  /etc/modprobe.d directory.
  options sfc num_vis=<NUM>
NOTE: When using Onload with multiple virtual functions (VF) it is necessary to
set the Onload module option oof_all_ports_required to zero. See Module
Options on page 158 for details.
The Solarflare Server Adapter User Guide is available from
https://support.solarflare.com/.
11.4 Onload and NIC Partitioning
Each physical interface on a Solarflare SFN7000 and SFN8000 series adapter can be
exposed to the host as multiple PCIe physical functions (PF). Up to 16 PFs, each
having a unique MAC address, are supported per adapter. To Onload, each PF
represents a virtual adapter.
Figure 15: Onload and NIC Partitioning
On the adapter each PF is backed by a virtual adapter and virtual port - these
components are created by the Solarflare NET driver when it finds a partitioned
adapter. The PFs can be configured to transparently place traffic on separate VLANs
(so each partition is on a separate broadcast domain).
To configure Onload to use the partitioned NIC:
• Ensure the adapter is using firmware version 4.4.2.1011 (minimum)
• Use sfboot to select the full-feature firmware variant
• Use sfboot to partition the NIC into multiple PFs
• Reboot the host so that the firmware can partition the NIC into multiple PFs.
• To identify which physical port a network interface is using:
  # cat /sys/class/net/eth<N>/device/physical_port
For complete details of configuring NIC Partitioning refer to the Solarflare Server
Adapter User Guide (SF-103837-CD) chapter 7 SR-IOV available from
https://support.solarflare.com/.
11.5 Onload in a Docker Container
Figure 16 illustrates the Onload deployment in a Docker container environment.
Only the user-level components are created in the container. Onload in the
container uses the Onload drivers installed on the host for network I/O. Network
interfaces configured on the host are also visible and usable directly from the
container.
Figure 16: Onload in a Docker Container
In keeping with the containerization theory, it is envisaged that only a single
Onload instance will be running in each container. However, there are no
restrictions preventing multiple instances running in the same container.
11.6 Pre-Installation
This install procedure makes the following assumptions - ensure these components
are created/installed before continuing:
• Docker is installed on the host server.
• Onload 201502 (or later version) must be installed on the host. An identical
  version will be installed in the container.
• The Onload installation in a container must match the Onload installation on
  the host. Configuration options such as any CI_CFG_* options set in one
  environment must match those set in the other.
NOTE: Onload does not currently support Linux namespaces. Support for Linux
Network namespaces may be added in a future release.
11.7 Installation
1 The docker run command will create a container named onload. The container
  is created from the centos:latest base image and a bash shell terminal will be
  started.
  [root@host]# docker run --net=host --device=/dev/onload --device=/dev/onload_cplane
  --device=/dev/onload_epoll --name=onload -it -v /src/openonload-201502.tgz:/tmp/
  openonload-201502.tgz centos:latest /bin/bash
  The -v option in the example above makes the openonload-201502.tgz file from
  the /src directory on the host available in /tmp in the container root file
  system. All subsequent commands are run inside the container unless host is
  specified.
  NOTE: The directive --device=/dev/onload_cplane is required when used with
  onload-201606 and later releases.
2 Install required OS tools/packages in the container.
  # yum install perl autoconf automake libtool tar gcc make net-tools ethtool
  Different docker base images may require additional OS packages installed.
3 Unpack the tarball to build the openonload-<version> subdirectory.
  # /usr/bin/tar -zxvf /tmp/openonload-201502.tgz
  Note: it is not possible to use tools/utilities (such as tar) from the host file
  system on files in the container file system.
4 Change directory to the openonload-<version>/scripts directory
  # cd /tmp/openonload-201502/scripts
5 Build and install the Onload user-level components in the container:
  # ./onload_build --user
  If the build process identifies any missing dependencies, return to step 2 to
  install missing components.
  # ./onload_install --userfiles --nobuild
  The following warning may appear at the end of the install process, but it is not
  necessary to reload the drivers:
  onload_install: To load the newly installed drivers run: onload_tool reload
6 Check Onload installation
  # onload
  OpenOnload 201502
  Copyright 2006-2012 Solarflare Communications, 2002-2005 Level 5 Networks
  Built: Feb 5 2015 12:41:04 (release)
  Kernel module: 201502
  usage:
    onload [options] <command> <command-args>
  options:
    --profile=<profile>   -- comma sep list of config profile(s)
    --force-profiles      -- profile settings override environment
    --no-app-handler      -- do not use app-specific settings
    --app=<app-name>      -- identify application to run under onload
    --version             -- print version information
    -v                    -- verbose
    -h --help             -- this help message
7 On the host, check that the container has been created and is running:
  # docker ps -a
  CONTAINER ID   IMAGE           COMMAND       CREATED          STATUS          PORTS   NAMES
  e2a12a635359   centos:latest   "/bin/bash"   15 seconds ago   Up 14 seconds           onload
8 Configure network interfaces.
  Configure network adapter interfaces in the host. Interfaces will also be visible
  and usable from the container:
  # ifconfig -a
9 Onload is now installed and ready to use in the container.
11.8 Create Onload Docker Image
Use this procedure to create a new docker image that includes the Onload
installation prior to migration. All commands are run on the host.
1 Identify the container (note CONTAINER ID or NAME)
  # docker ps -a
  CONTAINER ID   IMAGE           COMMAND       CREATED        STATUS   PORTS   NAMES
  35bfeceb7022   centos:latest   "/bin/bash"   24 hours ago   Exited           onload
2 Create new image (this example uses the NAME value)
  # docker commit -m "installed onload 201502" onload onload:v1
  89e95645d5ff1fa02880dee44b433ab577f5a2715daf944fd0b393620d8253f1
3 List images
  # docker images
  REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
  onload       v1       89e95645d5ff   28 seconds ago   486 MB
  centos       latest   dade6cb4530a   3 days ago       224 MB
11.9 Migration
The docker save command can be used to archive a docker image which includes
the Onload installation. This image can then be migrated to other servers having the
following configuration:
• Docker is installed and docker service is running
• Host operating system RHEL 7
• The Onload version running on the host must be the same as the migrated
  image Onload version
• The target server does not need to have the same Solarflare adapter types
  installed.
1 Create a tar file of the container image:
  # docker save -o <dir path to store image>/<name of image>.tar
  <current name of image>
  Example (store image tar file in host /tmp directory):
  # docker save -o /tmp/dkonload201502.tar onload
2 The image tar file can then be copied to the target server where it can be
  loaded with the docker load command:
  # docker load -i /<path to transferred file>/dkonload201502.tar
  # docker images
  REPOSITORY   TAG   IMAGE ID       CREATED             VIRTUAL SIZE
  onload       v1    303ec2d3e2b5   About an hour ago   486 MB
3 Create/run a container from the transferred image.
  # docker run --net=host --device=/dev/onload --device=/dev/
  onload_epoll --name=onload -it onload:v1 /bin/bash
  When the container has been created, Onload will be running within it.
  NOTE: The directive --device=/dev/onload_cplane is required when used with
  onload-201606 and later releases.
Onload Docker Images
Onload images are not currently available from the default docker registry hub.
Images may be made available if there is sufficient customer interest and
requirement for this feature.
11.10 Copying Files Between Host and Container
The following example demonstrates how to copy files from the host to a container.
All commands are run on the host.
1 Get the container Short Name (output truncated):
  [root@hostname]# docker ps -a
  CONTAINER ID
  bd1ea8d5526c
2 Discover the container Long Name:
  [root@hostname]# docker inspect -f '{{.Id}}' bd1ea8d5526c
  bd1ea8d5526c55df4740de9ba5afe14ed28ac3d127901ccb1653e187962c5156
  The container long name can also be discovered using the container name in
  place of the container identifier.
3 Copy a file to the root file system (/tmp) on the container:
  [root@hostname]# cp myfile.txt /var/lib/docker/devicemapper/mnt/
  bd1ea8d5526c55df4740de9ba5afe14ed28ac3d127901ccb1653e187962c5156/
  rootfs/tmp/myfile.txt
12 Limitations
Users are advised to read the latest release_notes distributed with the Onload
release for a comprehensive list of Known Issues.
12.1 Introduction
This chapter outlines configurations that Onload does not accelerate and ways in
which Onload may change behavior of the system and applications. It is a key goal
of Onload to be fully compatible with the behavior of the regular kernel stack, but
there are some cases where behavior deviates.
Resources
Onload uses certain physical resources on the network adapter. If these resources
are exhausted, it is not possible to create new Onload stacks and not possible to
accelerate new sockets or applications. The onload_stackdump utility should be
used to monitor hardware resources. Physical resources include:
Virtual NICs
Virtual NICs provide the interface by which a user-level application sends and
receives network traffic. When these are exhausted it is not possible to create new
Onload stacks, meaning new applications cannot be accelerated. However,
Solarflare network adapters support large numbers of Virtual NICs, and this
resource is not typically the first to become unavailable.
Endpoints
Onload represents sockets and pipes as structures called endpoints. The maximum
number of accelerated endpoints permitted by each Onload stack is set with the
EF_MAX_ENDPOINTS variable. The stack limit can be reached sooner than expected
when syn-receive states (the number of half-open connections) also consume
endpoint buffers. Four syn-receive states consume one endpoint. The maximum
number of syn-receive states can be limited using the EF_TCP_SYNRECV_MAX
variable.
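The effect of half-open connections on the endpoint budget can be estimated as
follows. This is an illustrative sketch only: endpoints_remaining is a hypothetical
helper, the input values are invented for the example, and rounding a partial group
of four syn-receive states up to a whole endpoint is an assumption.

```python
def endpoints_remaining(max_endpoints, open_endpoints, synrecv_states):
    """Estimate the endpoints still available in one Onload stack.

    max_endpoints corresponds to EF_MAX_ENDPOINTS, open_endpoints is the
    number of sockets and pipes already in use, and synrecv_states is the
    number of half-open connections.
    """
    # Four syn-receive states consume one endpoint (rounded up: assumption).
    used_by_synrecv = (synrecv_states + 3) // 4
    return max_endpoints - open_endpoints - used_by_synrecv


# Hypothetical stack: limit of 1024, 1000 endpoints open, 40 half-open.
remaining = endpoints_remaining(1024, 1000, 40)
```

A burst of incoming SYNs therefore reduces the number of endpoints available to
the application, which is the reason EF_TCP_SYNRECV_MAX exists as a separate limit.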
Filters
Filters are used to deliver packets received from the wire to the appropriate
application. When filters are exhausted it is not possible to create new accelerated
sockets. The general recommendation is that applications do not allocate more than
4096 filters - or applications should not create more than 4096 outgoing
connections.
The limit does not apply to inbound connections to a listening socket.
Buffer Table
The buffer table provides address protection and translation for DMA buffers. When
all buffer resources are exhausted it is not possible to create new Onload stacks, and
existing stacks are not able to allocate more DMA buffers.
When hardware resources are exhausted, normal operation of the system should
continue, but it will not be possible to accelerate new sockets or applications.
TX, RX Ring Buffer Size
Onload does not obey RX, TX ring sizes set in the kernel, but instead uses the values
specified by EF_RXQ_SIZE and EF_TXQ_SIZE, which both default to 512.
12.2 Changes to Behavior
Multithreaded Applications Termination
As Onload handles networking in the context of the calling application's thread, it
is recommended that applications ensure all threads exit cleanly when the process
terminates. In particular the exit() function causes all threads to exit immediately
- even those in critical sections. This can cause threads currently within the Onload
stack holding the per-stack lock to terminate without releasing this shared lock.
This is particularly important for shared stacks where a process sharing the stack
could 'hang' when Onload locks are not released.
An unclean exit can prevent the Onload kernel components from cleanly closing the
application's TCP connections. A message similar to the following will be observed:
[onload] Stack [0] released with lock stuck
and any pending TCP connections will be reset. To prevent this, applications should
always ensure that all threads exit cleanly.
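One common way to meet this recommendation is to signal worker threads to stop and
then join them before the process terminates. The sketch below shows the general
pattern in Python; it is not Onload-specific, and the names worker and
run_and_shutdown are illustrative.

```python
import threading


def worker(stop, results, index):
    """Run until asked to stop, then record a clean exit."""
    # A real worker would service its sockets inside this loop.
    while not stop.wait(timeout=0.01):
        pass
    results[index] = "exited cleanly"


def run_and_shutdown(n_threads=4):
    """Start workers, request shutdown, and wait for every thread to finish."""
    stop = threading.Event()
    results = [None] * n_threads
    threads = [threading.Thread(target=worker, args=(stop, results, i))
               for i in range(n_threads)]
    for thread in threads:
        thread.start()
    stop.set()              # ask every worker to leave its loop
    for thread in threads:
        thread.join()       # do not terminate the process until all are done
    return results, [thread.is_alive() for thread in threads]


results, still_alive = run_and_shutdown()
```

The key point is that the process only terminates after every thread has returned
from its own function, so no thread is stopped while holding a stack lock.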
Thread Cancellation
Unexpected behavior can result when an accelerated application uses a
pthread_cancel function. There is increased risk from multi-threaded applications or
a PTHREAD_CANCEL_ASYNCHRONOUS thread calling a non-async-safe function.
Onload users are strongly advised that applications should not use pthread_cancel
functions.
Packet Capture
Packets delivered to an application via the accelerated path are not visible to the OS
kernel. As a result, diagnostic tools such as tcpdump and wireshark do not capture
accelerated packets. The Solarflare supplied onload_tcpdump does support capture
of UDP and TCP packets from Onload stacks - refer to onload_tcpdump on page 291
for details.
Firewalls
Packets delivered to an application via the accelerated path are not visible to the OS
kernel. As a result, these packets are not visible to the kernel firewall (iptables) and
therefore firewall rules will not be applied to accelerated traffic. The
onload_iptables feature can be used to enforce Linux iptables rules as hardware
filters on the Solarflare adapter; refer to onload_iptables on page 296.
NOTE: Hardware filtering on the network adapter will ensure that accelerated
applications receive traffic only on ports to which they are bound.
System Tools - Socket Visibility
With the exception of 'listening' sockets, TCP sockets accelerated by Onload are not
visible to the netstat tool. UDP sockets are visible to netstat.
Accelerated sockets appear in the /proc directory as symbolic links to
/dev/onload. Tools that rely on /proc will probably not identify the associated file
descriptors as being sockets. Refer to Onload and File Descriptors, Stacks and
Sockets on page 57 for more details.
Accelerated sockets can be inspected in detail with the Onload onload_stackdump
tool, which exposes considerably more information than the regular system tools.
For details of onload_stackdump refer to onload_stackdump on page 261.
Signals
If an application receives a SIGSTOP signal, it is possible for the processing of
network events to be stalled in an Onload stack used by the application. This
happens if the application is holding a lock inside the stack when the application is
stopped, and if the application remains stopped for a long time, this may cause TCP
connections to time out.
A signal which terminates an application can prevent threads from exiting cleanly.
Refer to Multithreaded Applications Termination on page 130 for more information.
Undefined content may result when a signal handler uses the third argument
(ucontext) and the signal is postponed by Onload. To avoid this, use the Onload
module option safe_signals_and_exit=0 or use EF_SIGNALS_NOPOSTPONE to
prevent specific signals being postponed by Onload.
Onload and IP_MULTICAST_TTL
Onload will act in accordance with RFC 791 when it comes to the IP_MULTICAST_TTL
setting. Using Onload, if IP_MULTICAST_TTL=0, packets will never be transmitted on
the wire.
This differs from the Linux kernel where the following behavior has been observed:
Kernel - IP_MULTICAST_TTL 0 - if there is a local listener, packets will not be
transmitted on the wire.
Kernel - IP_MULTICAST_TTL 0 - if there is NO local listener, packets will always be
transmitted on the wire.
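For reference, IP_MULTICAST_TTL is a standard socket option set by the application
on its sending socket. The sketch below uses the Python standard library to set and
read back the option. It only shows where the value comes from; it does not send
any traffic, and the helper name make_multicast_sender is illustrative.

```python
import socket


def make_multicast_sender(ttl):
    """Create a UDP socket whose multicast datagrams carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


# TTL 0: under Onload such packets are never transmitted on the wire.
sender = make_multicast_sender(0)
configured_ttl = sender.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL)
sender.close()
```

An application that relies on the kernel behavior described above for TTL 0 should
be reviewed before it is accelerated.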
Source/Policy Based Routing and Routing Metrics
Onload does not currently support source-based or policy-based routing. Whereas
the Linux kernel will select a route and interface based on routing metrics, Onload
will select any of the valid routes and Onload interfaces to a destination that are
available.
The EF_TCP_LISTEN_REPLIES_BACK environment variable provides a pseudo
source-based routing solution. This option forces a reply to an incoming SYN to
ignore routes and reply to the originating network interface.
Enabling this option will allow new TCP connections to be set up, but does not
guarantee that all replies from an Onloaded application will go via the receiving
Solarflare interface. Some reordering of the routing table may be needed to
guarantee this, OR an explicit route (to go via the Solarflare interface) should be
added to the routing table.
Onload, from version 201606, introduced support for routing table metrics.
Therefore, if two entries in the routing table will route traffic to the destination
address, the entry with the best metric will be selected even if that means routing
over a non-Solarflare interface.
See EF_TCP_LISTEN_REPLIES_BACK on page 199.
Reverse Path Filtering
Onload does not support Reverse Path Filtering. When Onload cannot route traffic
to a remote endpoint over a Solarflare interface (no suitable route table entry), the
traffic will be handled via the kernel.
SO_REUSEPORT
Onload vs. kernel behaviour is described in Chapter 6 on page 68.
Thread Safe
Onload assumes that file descriptor modifications are thread safe and that file
descriptors are not concurrently modified by different threads. Concurrent access
should not cause problems. This is different from kernel behaviour and users should
set EF_FDS_MT_SAFE=0 if the application is not considered thread safe.
Similar consideration should be given when using epoll(), where concurrency
control is disabled by default in Onload. In that case users should set
EF_EPOLL_MT_SAFE=0.
12.3 Limits to Acceleration
IP Fragmentation
Fragmented IP traffic is not accelerated by Onload on the receive side, and is instead
received transparently via the kernel stack. IP fragmentation is rarely seen with TCP,
because the TCP/IP stacks segment messages into MTU-sized IP datagrams. With
UDP, datagrams are fragmented by IP if they are too large for the configured MTU.
Refer to Fragmented UDP on page 101 for a description of Onload behavior.
Broadcast Traffic
Broadcast sends and receives function as normal but will not be accelerated.
Multicast traffic can be accelerated.
IPv6 Traffic
IPv6 traffic functions as normal but will not be accelerated.
Raw Sockets
Raw Socket sends and receives function as normal but will not be accelerated.
Socketpair and UNIX Domain Sockets
Onload will intercept, but does not accelerate, the socketpair() system call.
Sockets created with socketpair() will be handled by the kernel. Onload also does
not accelerate UNIX domain sockets.
UDP sendfile()
The UDP sendfile() method is not currently accelerated by Onload. When an
Onload accelerated application calls sendfile() this will be handled seamlessly by
the kernel.
Statically Linked Applications
Onload will not accelerate statically linked applications. This is due to the method by
which Onload intercepts libc function calls (using LD_PRELOAD).
Local Port Address
Onload is limited to OOF_LOCAL_ADDR_MAX number of local interface addresses. A
local address can identify a physical port or a VLAN, and multiple addresses can be
assigned to a single interface where each address contributes to the maximum
value. Users can allocate additional local interface addresses by increasing the
compile time constant OOF_LOCAL_ADDR_MAX in the
/src/lib/efthrm/oof_impl.h file and rebuilding Onload. In onload-201205
OOF_LOCAL_ADDR_MAX was replaced by the onload module option
max_layer2_interfaces.
Bonding, Link aggregation
• Onload will only accelerate traffic over 802.3ad and active-backup bonds.
• Onload will not accelerate traffic if a bond contains any slave interfaces that are
not Solarflare network devices.
• Adding a non-Solarflare network device to a bond that is currently accelerated
by Onload may produce unexpected results such as connections being reset.
• Acceleration of bonded interfaces in Onload requires a kernel configured with
sysfs support and a bonding module version of 3.0.0 or later.
In cases where Onload will not accelerate the traffic it will continue to work via the
OS network stack.
For more information and details of configuration options refer to the Solarflare
Server Adapter User Guide section ‘Setting Up Teams’.
VLANs
• Onload will only accelerate traffic over VLANs where the master device is either
a Solarflare network device, or a bonded interface that is accelerated, i.e.
if the VLAN's master is accelerated, then so is the VLAN interface itself.
• Nested VLAN tags are not accelerated, but will function as normal.
• The ifconfig command will return inconsistent statistics on VLAN interfaces (not
master interface).
• When a Solarflare VLAN-tagged interface is subsequently placed in a bond, the
interface will continue to be accelerated, but the bond is not accelerated.
• Using Solarflare SFN5000 and SFN6000 series adapters, or using SFN7000 and
SFN8000 series adapters with the low-latency firmware variant, the following
limitation applies:
Hardware filters installed by Onload on the adapter will only act on the IP
address and port, but not the VLAN identifier. Therefore if the same IP
address:port combination exists on different VLAN interfaces, only the first
interface to install the filter will receive the traffic.
This limitation does not apply to SFN7000 and SFN8000 series adapters using
the full-feature firmware variant.
In cases where Onload will not accelerate the traffic it will continue to work via the
OS network stack.
For more information and details of configuration options refer to the Solarflare
Server Adapter User Guide section ‘Setting Up VLANs’.
Ethernet Bridge Configuration
Onload does not currently support acceleration of interfaces added to an Ethernet
bridge configured/added with the Linux brctl command.
TCP RTO During Overload Conditions
Using Onload, under very high load conditions an increased frequency of TCP
retransmission timeouts (RTOs) might be observed. This can occur when a thread
servicing the stack is descheduled by the CPU while still holding the stack lock,
preventing another thread from accessing/polling the stack. A stack
not being serviced means that ACKs are not received in a timely manner for packets
sent, resulting in RTOs for the unacknowledged packets and increased jitter on the
Onload stack.
Enabling the per-stack environment variable EF_INT_DRIVEN can reduce the
likelihood of this behavior and reduce jitter by ensuring the stack is serviced
promptly.
TCP with Jumbo Frames
When using jumbo frames with TCP, Onload will limit the MSS to 2048 bytes to
ensure that segments do not exceed the size of internal packet buffers.
This should present no problems unless the remote end of a connection is unable to
negotiate this lower MSS value.
Transmission Path - Packet Loss
Occasionally Onload needs to send a packet, which would normally be accelerated,
via the kernel. This occurs when there is no destination address entry in the ARP
table or to prevent an ARP table entry from becoming stale.
By default, the Linux sysctl, unres_qlen, will enqueue 3 packets per unresolved
address when waiting for an ARP reply, and on a server subject to a very high UDP
or TCP traffic load this can result in packet loss on the transmit path and packets
being discarded.
The unres_qlen value can be identified using the following command:
sysctl -a | grep unres_qlen
net.ipv4.neigh.eth2.unres_qlen = 3
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.default.unres_qlen = 3
Changes to the queue lengths can be made permanent in the /etc/sysctl.conf
file. Solarflare recommends setting the unres_qlen value to at least 50.
If packet discards are suspected, this extremely rare condition can be indicated by
the cp_defer counter produced by the onload_stackdump lots command on UDP
sockets or from the unresolved_discards counter in the Linux
/proc/net/stat/arp_cache file.
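The check against the recommended minimum can be automated. The following C sketch is illustrative only: the function name unres_qlen_too_small() and the macro UNRES_QLEN_RECOMMENDED are hypothetical, and the parser assumes the `name = value` line format shown in the sysctl output above. It deliberately ignores similarly named entries such as unres_qlen_bytes.

```c
#include <stdio.h>
#include <string.h>

/* Minimum value recommended in the text above. */
#define UNRES_QLEN_RECOMMENDED 50

/* Hypothetical helper: inspect one line of `sysctl -a` output.
 * Returns  1 if the line is an unres_qlen entry below the recommended value,
 *          0 if it is an unres_qlen entry at or above that value,
 *         -1 if the line is not an unres_qlen entry or cannot be parsed. */
int unres_qlen_too_small(const char *line)
{
    static const char key[] = ".unres_qlen";
    const char *p = strstr(line, key);
    const char *eq;
    int value;

    if (p == NULL)
        return -1;
    p += strlen(key);
    /* Reject longer names such as unres_qlen_bytes. */
    if (*p != ' ' && *p != '=')
        return -1;
    eq = strchr(p, '=');
    if (eq == NULL || sscanf(eq + 1, "%d", &value) != 1)
        return -1;
    return value < UNRES_QLEN_RECOMMENDED;
}
```

Each line read from the sysctl output can be passed to the function; any line for which it returns 1 names an interface whose queue length is below the recommendation.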
TCP - Unsupported Routing, Timed out Connections
If TCP packets are received over an Onload accelerated interface, but Onload cannot
find a suitable Onload accelerated return route, no response will be sent, resulting in
the connection timing out.
Application Clustering
For details of Application Clustering, refer to Application Clustering on page 68.
• Onload matches the Linux kernel implementation such that clustering is not
supported for multicast traffic and where setting of SO_REUSEPORT has the
same effect as SO_REUSEADDR.
• Calling connect() on a TCP socket which was previously subject to a bind()
call is not currently supported. This will be supported in a future release.
• An application cluster will not persist over adapter/server/driver reset. Before
restarting the server or resetting the adapter the Onload applications should be
terminated.
• The environment variable EF_CLUSTER_RESTART determines the behavior of
the cluster when the application process is restarted - refer to
EF_CLUSTER_RESTART in Parameter Reference on page 163.
• If the number of sockets in a cluster is less than EF_CLUSTER_SIZE, a portion of
the received traffic will be lost.
• There is little benefit when clustering involves a TCP loopback listening socket
as connections will not be distributed amongst all threads. A non-loopback
listening socket - which might occasionally get some loopback connections - can
benefit from Application Clustering.
12.4 epoll - Known Issues
Onload supports different implementations of epoll controlled by the EF_UL_EPOLL
environment variable - see Multiplexed I/O on page 61 for configuration details.
There are various limitations and differences in Onload vs. kernel behaviour - refer
to Chapter 6 on page 61 for details.
• When using EF_UL_EPOLL=1 or 3, it has been identified that the behavior of
epoll_wait() differs from the kernel when the EPOLLONESHOT event is
requested, resulting in two ‘wakeups’ being observed, one from the kernel and
one from Onload. This behavior is apparent on SOCK_DGRAM and SOCK_STREAM
sockets for all combinations of EPOLLONESHOT, EPOLLIN and EPOLLOUT events.
This applies for all types of accelerated sockets.
• EF_EPOLL_CTL_FAST is enabled by default and this modifies the semantics of
epoll. In particular, it buffers up calls to epoll_ctl() and only applies them when
epoll_wait() is called. This can break applications that do epoll_wait() in one
thread and epoll_ctl() in another thread. The issue only affects EF_UL_EPOLL=2
and the solution is to set EF_EPOLL_CTL_FAST=0 if this is a problem. The described
condition does not occur if EF_UL_EPOLL=1 or EF_UL_EPOLL=3.
• When EF_EPOLL_CTL_FAST is enabled and an application is testing the
readiness of an epoll file descriptor without actually calling epoll_wait(), for
example by doing epoll within epoll() or epoll within select(), if one thread
is calling select() or epoll_wait() and another thread is doing
epoll_ctl(), then EF_EPOLL_CTL_FAST should be disabled. This applies
when using EF_UL_EPOLL 1, 2 or 3.
If the application is monitoring the state of the epoll file descriptor indirectly,
e.g. by monitoring the epoll fd with poll, then EF_EPOLL_CTL_FAST can cause
issues and should be set to zero.
To force Onload to follow the kernel behaviour when using the epoll_wait()
call, the following variables should be set:
EF_UL_EPOLL=2
EF_EPOLL_CTL_FAST=0
EF_EPOLL_CTL_HANDOFF=0 (when using EF_UL_EPOLL=1)
• A socket should be removed from an epoll set only when all references to the
socket are closed.
With EF_UL_EPOLL=1 (default) or EF_UL_EPOLL=3, a socket is removed from
the epoll set if the file descriptor is closed, even if other references to the
socket exist. This can cause problems if file descriptors are duplicated using
dup(), dup2() or fork(). For example:
s = socket();
s2 = dup(s);
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, s, ...);
close(s); /* socket referenced by s is removed from epoll set when using onload */
The workaround is to set EF_UL_EPOLL=2.
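The kernel semantics that an application may be relying on can be checked with a short C sketch. It is an illustration under stated assumptions, not Onload code: the helper name event_survives_close_of_dup() is hypothetical, and it uses a UNIX domain socket pair, which Onload does not accelerate, so it exercises the kernel epoll behaviour only. On a Linux kernel the function is expected to return 1, because the duplicate descriptor keeps the socket open and its epoll registration alive.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a socket registered with epoll is still
 * reported after the registered descriptor is closed while a dup() of it
 * stays open, 0 if it is no longer reported, -1 on a setup error. */
int event_survives_close_of_dup(void)
{
    int sv[2];
    int s2 = -1, epoll_fd = -1, result = -1, n;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    s2 = dup(sv[0]);
    epoll_fd = epoll_create1(0);
    if (s2 < 0 || epoll_fd < 0)
        goto done;

    ev.data.fd = sv[0];
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto done;

    close(sv[0]);                    /* s2 still refers to the same socket */
    sv[0] = -1;

    if (write(sv[1], "x", 1) != 1)   /* make the socket readable */
        goto done;

    n = epoll_wait(epoll_fd, &out, 1, 100);   /* wait up to 100 ms */
    if (n >= 0)
        result = (n > 0);

done:
    if (sv[0] >= 0)
        close(sv[0]);
    if (s2 >= 0)
        close(s2);
    if (epoll_fd >= 0)
        close(epoll_fd);
    close(sv[1]);
    return result;
}
```

An application that depends on a return value of 1 here is relying on the kernel behaviour described above, and is a candidate for the EF_UL_EPOLL=2 workaround when run with Onload.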
• When Onload is unable to accelerate a connected socket, e.g. because no route
to the destination exists which uses a Solarflare interface, the socket will be
handed off to the kernel and is removed from the epoll set. Because the socket
is no longer in the epoll set, attempts to modify the socket with epoll_ctl()
will fail with the ENOENT (descriptor not present) error. The described condition
does not occur if EF_UL_EPOLL=1 or 3.
• If an epoll file descriptor is passed to the read() or write() functions these
will return a different error code than that reported by the kernel stack. This
issue exists for all implementations of epoll.
• When EPOLLET is used and the event is ready, epoll_wait() is triggered by
ANY event on the socket instead of the requested event. This issue should not
affect application correctness. The problem exists for both implementations of
epoll.
• Users should be aware that if a server is overclocked the epoll_wait()
timeout value will increase as CPU MHz increases, resulting in unexpected
timeout values. This has been observed on Intel based systems and when the
Onload epoll implementation is EF_UL_EPOLL=1 or 3. Using EF_UL_EPOLL=2
this behavior is not observed.
• On a spinning thread, if epoll acceleration is disabled by setting
EF_UL_EPOLL=0, sockets on this thread will be handed off to the kernel, but
latency will be worse than expected kernel socket latency.
• To ensure that non-accelerated file descriptors are checked in poll and select
functions, the following options should be disabled (set to zero):
EF_SELECT_FAST and EF_POLL_FAST
• When using poll() and select() calls, to ensure that non-accelerated file
descriptors are checked when there are no events on any accelerated
descriptors, set the following options:
EF_POLL_FAST_USEC and EF_SELECT_FAST_USEC, setting both to zero.
Spinning - Timing Issues
Onload users should consider that as different software is being run, timings will be
affected which can result in unexpected scheduling behaviour and memory use.
Spinning applications, in particular, require a dedicated core per spinning Onload
thread.
12.5 Configuration Issues
Mixed Adapters Sharing a Broadcast Domain
Onload should not be used when Solarflare and non-Solarflare interfaces in the
same network server are configured in the same broadcast domain (1), as depicted by
the following diagram.
When an originating server (S1) sends an ARP request to a remote server (S2) having
more than one interface within the same broadcast domain, ARP responses from S2
will be generated from all interfaces and it is non-deterministic which response the
originator uses. When Onload detects this situation, it causes a message
identifying 'duplicate claim of ip address' to appear in the (S1) host syslog
as a warning of potential problems.
Problem 1
Traffic from S1 to S2 may be delivered through either of the interfaces on S2,
irrespective of the IP address used. This means that if one interface is accelerated by
Onload and the other is not, you may or may not get acceleration.
To resolve the situation (for the current session) issue the following command:
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
or to resolve it permanently add the following line to the /etc/sysctl.conf file:
net.ipv4.conf.all.arp_ignore = 1
and run the sysctl command for this to take effect:
sysctl -p
These commands ensure that an interface will only respond to an ARP request when
the IP address matches its own. Refer to the Linux documentation
Linux/Documentation/networking/ip-sysctl.txt for further details.
1. A broadcast domain can be a local network segment or VLAN.
Problem 2
A more serious problem arises if one interface on S2 carries Onload accelerated TCP
connections and another interface on the same host and same broadcast domain is
non-Solarflare:
A TCP packet received on the non-Solarflare interface can result in accelerated TCP
connections being reset by the kernel stack and therefore appear to the application
as if TCP connections are being dropped/terminated at random.
To prevent this situation the Solarflare and non-Solarflare interfaces should not be
configured in the same broadcast domain. The solution described for Problem 1
above can reduce the frequency of Problem 2, but does not eliminate it.
TCP packets can be directed to the wrong interface because:
• the originator S1 needs to refresh its ARP table for the destination IP address -
so sends an ARP request and subsequently directs TCP packets to the
non-Solarflare interface
• a switch within the broadcast domain broadcasts the TCP packets to all
interfaces.
Virtual Memory on 32 Bit Systems
On 32-bit Linux systems the amount of allocated virtual address space defaults,
typically, to 128MB which limits the number of Solarflare interfaces that can be
configured. Virtual memory allocation can be identified in the /proc/meminfo file
e.g.
grep Vmalloc /proc/meminfo
VmallocTotal: 122880 kB
VmallocUsed: 76380 kB
VmallocChunk: 15600 kB
The Onload driver will attempt to map all PCI Base Address Registers for each
Solarflare interface into virtual memory where each interface requires 16MB.
Examination of the kernel logs in /var/log/messages at the point the Onload
driver is loading would reveal a memory allocation failure as in the following
extract:
allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
[sfc efrm] Failed (12) to map bar (16777216 bytes)
[sfc efrm] efrm_nic_add: ERROR: linux_efrm_nic_ctor failed (12)
One solution is to use a 64-bit kernel. Another is to increase the virtual memory
allocation on the 32-bit system by setting vmalloc size on the ‘kernel line’ in the
/boot/grub/grub.conf file to 256, for example,
kernel /vmlinuz-2.6.18-238.el5 ro root=/dev/sda7 vmalloc=256M
The system must be rebooted for this change to take effect.
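The figures above can be cross-checked with a little arithmetic. The following C sketch is illustrative only: the function names are hypothetical, and it assumes that VmallocChunk reports the largest contiguous free block of vmalloc space and that each mapping needs the 16777216 bytes shown in the log extract. Under those assumptions the VmallocChunk value of 15600 kB in the example is smaller than one mapping, which is consistent with the failure shown.

```c
#include <stdbool.h>

/* Size of one BAR mapping in kB, from the log extract: 16777216 bytes. */
#define BAR_MAPPING_KB (16777216L / 1024)

/* Hypothetical helper: true if the largest free vmalloc block
 * (VmallocChunk, in kB) can hold one more BAR mapping. */
bool bar_mapping_fits(long vmalloc_chunk_kb)
{
    return vmalloc_chunk_kb >= BAR_MAPPING_KB;
}

/* Hypothetical helper: upper bound on the number of further mappings given
 * VmallocTotal and VmallocUsed in kB. Fragmentation is ignored, so the
 * real limit can be lower. */
long max_additional_mappings(long total_kb, long used_kb)
{
    long free_kb = total_kb - used_kb;
    return free_kb > 0 ? free_kb / BAR_MAPPING_KB : 0;
}
```

With the values from the /proc/meminfo extract, max_additional_mappings(122880, 76380) gives 2, yet bar_mapping_fits(15600) is false, so free space alone is not a sufficient test.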
IGMP Operation and Multicast Process Priority
It is important that processes using UDP multicast do not have a
higher priority than the kernel thread handling the management of multicast group
membership.
Failure to observe this could lead to the following situations:
1. Incorrect kernel IGMP operation.
2. The higher priority user process is able to effectively block the kernel thread
and prevent it from identifying the multicast group to Onload, which will react
by dropping packets received for the multicast group.
A combination of indicators may identify this:
• ethtool reports good packets being received while multicast mismatch does not
increase.
• ifconfig identifies data is being received.
• onload_stackdump will show the rx_discard_mcast_mismatch counter
increasing.
Lowering the priority of the user process will remedy the situation and allow the
multicast packets through Onload to the user process.
Dynamic Loading
If the onload library libonload is opened with dlopen() and closed with dlclose()
it can leave the application in an unpredictable state. Users are advised to use the
RTLD_NODELETE flag to prevent the library from being unloaded when dlclose() is
called.
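A minimal C sketch of this advice follows. The helper names open_library_pinned() and library_is_resident() are hypothetical, and the library path is supplied by the caller rather than assumed. The sketch relies on the glibc flags RTLD_NODELETE and RTLD_NOLOAD; on older glibc versions the program may need to be linked with -ldl.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open a shared library so that a later dlclose()
 * does not unload it. Returns the handle, or NULL on failure. */
void *open_library_pinned(const char *path)
{
    return dlopen(path, RTLD_NOW | RTLD_NODELETE);
}

/* Hypothetical helper: returns 1 if the library is currently resident in
 * the process, 0 otherwise. RTLD_NOLOAD only probes; it never loads. */
int library_is_resident(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);

    if (handle == NULL)
        return 0;
    dlclose(handle);   /* drop the reference taken by the probe */
    return 1;
}
```

An application would pass the path of the Onload library to open_library_pinned(). After a subsequent dlclose() on the returned handle, library_is_resident() is expected to still report 1 for that path.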
Scalable Packet Buffer Mode
Support for SR-IOV is disabled on 32-bit kernels, therefore the following features are
not available on 32-bit kernels.
• Scalable Packet Buffer Mode (EF_PACKET_BUFFER_MODE=1)
• ef_vi with VFs
On some kernel versions, configuring the adapter to have a large number of VFs (via
sfboot) can cause kernel panics. This affects kernel versions in the range 3.0 to 3.3
inclusive, and is due to the large netlink messages that include information about
network interfaces.
The problem can be avoided by limiting the total number of physical network
interfaces, including VFs, to a maximum of 30.
SLES 11 SR-IOV
It has been noted that some SLES 11 kernels (3.1 and earlier) exhibit a bug, typically
seen when loading Onload drivers, when running OpenOnload with SR-IOV and Intel
IOMMUs. This bug has been fixed in more recent kernels 3.2-stable and 3.6.
Huge Pages with IPC namespace
Huge page support should not be enabled if the application uses IPC namespaces
and the CLONE_NEWIPC flag. Failure to observe this may result in a segfault.
Huge Pages with Shared Stacks
Processes which share an Onload stack should not attempt to use huge pages. Refer
to Stack Sharing on page 67 for limitation details.
Huge Pages - Size
When using huge pages, it is recommended to avoid setting the page size greater
than 2 Mbyte. A failure to observe this could lead to Onload being unable to allocate
further buffer table space for packet buffers.
Huge Pages - AMD IOMMU
Due to the AMD IOMMU not returning aligned PCI addresses, the use of huge pages
on systems with AMD IOMMUs is not supported.
Huge Pages and shmmni
Users should ensure that the number of system-wide shared memory segments
(shmmni) exceeds the number of huge pages required.
• To identify current shmmni setting:
# cat /proc/sys/kernel/shmmni
• To set (no reboot required - but not permanent):
# echo 8000 > /proc/sys/kernel/shmmni
• To set (permanent - reboot required):
# echo "kernel.shmmni=8000" >> /etc/sysctl.conf
For example, if 4000 huge pages are required, increase the current shmmni value by
4000.
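The rule above can be expressed as a short C sketch. The function names are hypothetical and the code is an illustration only: it reads the current setting from /proc/sys/kernel/shmmni and applies the "increase by the number of huge pages" rule from the text. The starting value of 4096 used in the note below is an assumed example, not a figure from this guide.

```c
#include <stdio.h>

/* Hypothetical helper: read the current kernel.shmmni value.
 * Returns -1 if the file cannot be read or parsed. */
long read_shmmni(void)
{
    long value = -1;
    FILE *f = fopen("/proc/sys/kernel/shmmni", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: value to configure so that the existing allowance
 * is kept and one extra segment is available per required huge page. */
long recommended_shmmni(long current_shmmni, long huge_pages_required)
{
    return current_shmmni + huge_pages_required;
}

/* Hypothetical helper: 1 if shmmni exceeds the number of huge pages. */
int shmmni_is_sufficient(long shmmni, long huge_pages_required)
{
    return shmmni > huge_pages_required;
}
```

For instance, with an assumed current value of 4096 and 4000 huge pages required, recommended_shmmni() gives 8096, which is the value to write using the commands shown above.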
Red Hat MRG 2 and SR-IOV
Solarflare do not recommend the use of SR-IOV or the IOMMU when using Onload
on MRG2 systems due to a number of known kernel issues. Additionally, the
following Onload features should not be used on MRG2u3:
• Scalable packet buffer mode (EF_PACKET_BUFFER_MODE=1)
• ef_vi with VFs
PowerPC Architecture
• SR-IOV is not supported on PowerPC systems. Recommended setting is
EF_PACKET_BUFFER_MODE=0 or 2, but not 1 or 3.
• PowerPC architectures do not currently support PIO for reduced latency.
EF_PIO should be set to zero.
Java 7 Applications - use of vfork()
Onload accelerated Java 7 applications that call vfork() should set the
environment variable EF_VFORK_MODE=2 and thereafter the application should not
create sockets or accelerated pipes in the vfork() child before exec.
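The constraint on the vfork() child can be illustrated in C, independently of Java. The helper name run_with_vfork() is hypothetical and the sketch makes no claim about how the JVM itself is implemented; it only shows a child that goes straight to exec without creating sockets or pipes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` in a vfork() child.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_with_vfork(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: no socket(), pipe() or similar calls before exec. */
        execv(path, argv);
        _exit(127);          /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Any descriptors the new program needs should be created by the parent before vfork(), or by the new program itself after exec.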
PIO not supported in KVM/ESXi
Due to limitations with write-combine mapping in a virtual guest environment, PIO
is not currently supported for Onload applications running in a virtual machine in
KVM or ESXi.
Users should ensure that EF_PIO is set to 0 for all Onload stacks running in VMs.
13 Change History
This chapter provides a brief history of changes, additions and removals to Onload
releases affecting Onload behavior and Onload environment variables.
Features on page 145
Environment Variables on page 150
Module Options on page 158
Onload - Adapter Net Drivers on page 161
The OOL column identifies the OpenOnload release supporting the feature. The EOL
column identifies the Enterprise Onload release supporting the feature (NS = not
supported).
13.1 Mapping Enterprise Onload/OpenOnload
The following table maps major Enterprise Onload releases to the closest
functionally equivalent OpenOnload release. Users should always also refer to the
Release Notes and Change Logs to identify feature support in the Enterprise release.
OpenOnload   Enterprise Onload
201011u1     1.0
201109u2     2.0
201310u2     3.0
201502u2     4.0
201606u1     5.0
13.2 Onload - Adapter Net Drivers
Refer to Onload - Adapter Net Drivers on page 161 for a list of net drivers used in
OpenOnload and Enterprise Onload distributions.
13.3Features
Feature OOL EOL Description/Notes
TCPDirectGAversion 201606u1 5.0 LightweightultralowlatencyTCP/IPstack.
ExtensionsAPI 201606u1 5.0 Supportforonload_socket_nonaccel()‐
allocateasocketnotacceleratedbyonload.
Routingtablemetrics 201606 5.0 Onloadwilluseroutingtablemetrics.
Onloadadaptersupport 201606 5.0 OnloadsupportforSFN8000seriesadapters
TCPDirectpreviewversion 201606 NS LightweightultralowlatencyTCP/IPstack
ExtensionsAPI 201606 5.0 Supportforonload_thread_get_spin()
Controlplane 201606 5.0 Nowsuppliedasaseparatebinarymodule.
CI_CFG_TEAMING,CI_CFG_
MAX_REGISTER_INTERFACES
201606 5.0 Effectivelyremoved,astheyaresetatbuild
timeofthebinarycontrolplanemodule.
sfc_aoedriver 201606 5.0 ApplicationOnload™drivernolongerincluded
intheOnloaddistributionorEOL5.0
distribution.
ApplicationClustering 201405 4.0 201509Removethesameport,sameaddress
limitation.
CI_CFG_MAX_INTERFACES
CI_CFG_MAX_REGISTER_
INTERFACES
201509 4.0 Increasedefaultto8(previously6).This
remainsacompiletimeoption.
onload_set_recv_filter() 201509 4.0 Calls on UDP sockets are deprecated in 201509 and
EOL 5.0.
Teamingdriver 201509 5.0 Acceleratelinksaggregatedusingteamdand
theteamingdriver.
TransparentProxy 201509 5.0 SeeTransparentReverseProxyModeson
page95.
ScalableFilters 201509 5.0 SeeScalableFiltersonpage93.
IP_TRANSPARENT 201509 5.0 TCPsocketoptiontoallowasockettobebound
toanonlocaladdress.SeeScalableFilter
modes.
SO_PROTOCOL 201502u2 4.0 Socketoptiontoretrieveasocketprotocolasan
integer.
LinuxDockerContainers 201502 4.0 SeeOnloadinaDockerContaineronpage124
OnloadinKVM 201502 4.0 OnloadandLinuxKVMonpage120
Socketcaching 201502 4.0 SeeListen/AcceptSocketsonpage91
RemoteMonitoring 201502 4.0 SeeRemoteMonitoringonpage281
Blacklist/Whitelist 201502 4.0 SeeWhitelistandBlacklistInterfaceson
page56
TCPdelegatedsend 201502 4.0 SeeListen/AcceptSocketsonpage91
SynCookies 201502 4.0
Receivequeuedropcounters 201502 4.0
Ubuntu/Debiansupported 201502 4.0 SeeHardwareandSoftwareSupported
Platformsonpage19forsupportedversions.
SIOCOUTQ 201405u1 4.0 TCPsocketioctlthatreturnstheamountofdata
notyetacknowledged.
SIOCOUTQNSD 201405u1 4.0 TCPsocketioctlthatreturnstheamountofdata
notyetsent.
ef_pd_interface_name() 201405u1 4.0 Identifiestheinterfaceusedbyaprotection
domain.
ef_vi_prime() 201405u1 4.0 Primeinterruptssocanblockonafile
descriptor(includinganyvirtualinterface)until
eventsarereadytobeprocessed.
ef_filter_spec_set_tx_
port_sniff() 201405u1 4.0 NewfiltertypetosniffTXtraffic.
ONLOAD_SOF_TIMESTAMPING_
STREAM
201405 4.0 OnloadextensiontothestandardSO_
TIMESTAMPINGAPItosupporthardware
timestampsonTCPsockets.
onload_move_fd 201405 4.0 Movesocketsbetweenstacks.
SolarCapturePro‐
applicationclustering
201405 4.0 Onloaddistributionincludesthesolarclusterd
daemonforSolarCaptureProapplication
clusteringfeature.
SO_REUSEPORT 201405 4.0 Allowmultiplesocketstobindtothesameport
‐supportstheApplicationClusteringfeature‐
seeApplicationClusteringonpage68.
HWMulticastLoopback 201405 4.0 RefertoHardwareMulticastLoopbackon
page106.
onload_ordered_epoll_
wait()
onload_ordered_epoll_
event()
201405 4.0 Wireorderdeliveryofpackets.
RefertoWireOrderDeliveryonpage65.
TCPSYNcookies 201405 4.0 ForceuseofTCPSYNcookiestoprotectagainst
aSYNfloodattack.
onload_tooldisable_cstates 201405 Removedalongwiththesfc_tunedriver.
sfc_aoedriver 201405 NS ApplicationOnload™driverincludedinthe
Onloaddistribution.
SO_TIMESTAMPING 201310u1 3.0 Socketoptiontoreceivehardwaretimestamps
forreceivedpackets.
onload_fd_check_
feature() 201310u1 3.0 onload_fd_check_featureonpage223
MulticastReplication 201310 3.0 Bonding,LinkaggregationandFailoveron
page69
TXPIO 201310 3.0 DebugandLoggingonpage73
LargeBufferTableSupport 201310 3.0 LargeBufferTableSupportonpage108
TemplatedSends 201310 3.0 TheperstackEF_PIOvariablecanalsobe
unsetforstackswherePIObuffersarenot
required.onpage119
ONLOAD_MSG_WARM 201310 3.0 ONLOAD_MSG_WARMonpage90
SO_TIMESTAMP
SO_TIMESTAMPNS
201310 3.0 SupportedforTCPsockets
dup3() 201310 3.0 Onloadwillinterceptcallstocreateacopyofa
filedescriptorusingdup3().
IP_ADD_SOURCE_
MEMBERSHIP
201210u1 3.0 Jointhesuppliedmulticastgrouponthegiven
interfaceandacceptdatafromthesupplied
sourceaddress.
IP_DROP_SOURCE_
MEMBERSHIP
201210u1 3.0 Dropsmembershiptothegivenmulticast
group,interfaceandsourceaddress.
MCAST_JOIN_SOURCE_
GROUP
201210u1 3.0 Joinasourcespecificgroup.
MCAST_LEAVE_SOURCE_
GROUP
201210u1 3.0 Leaveasourcespecificgroup.
Hugepagessupport 201210 3.0 Packetbuffersusehugepages.Controlledby
EF_USE_HUGE_PAGES
Defaultis1‐usehugepagesifavailable
SeeLimitationsonpage129
onload_iptables 201210 3.0 ApplyLinuxiptablesfirewallrulesoruser
definedfirewallrulestoSolarflareinterfaces
onload_stackdump
processes
onload_stackdumpthreads
onload_stackdumpenv
201210 3.0 ShowallacceleratedprocessesbyPID
ShowCPUcoreacceleratedprocessisrunning
on
Showenvironmentvariables‐EF_VALIDATE_
ENV
Theaffinitiesoptionhasbeenreplacedwiththe
threadsoption.
UDPsendmmsg() 201210 3.0 Sendmultiplemsgsinasinglefunctioncall
I/OMultiplexing 201210 3.0 Supportforppoll(),pselect()andepoll_
pwait()
DKMS 201210 NS OpenOnloadavailableinDKMSRPMbinary
format
Removingzombiestacks 201205u1 2.1.0.0 onload_stackdump‐zkillwillterminate
stackslingeringafterexit
Compatibility 201205u1 2.1.0.0 CompatibilitywithRHEL6.3andLinux3.4.0
TCPstriping 201205 2.1.0.0 SingleTCPconnectioncanusethefull
bandwidthofbothportsonaSolarflareadapter
TCPloopbackacceleration 201205 2.1.0.0 EF_TCP_CLIENT_LOOPBACK&EF_TCP_SERVER_
LOOPBACK
TCPdelayed
acknowledgments
201205 2.1.0.0 EF_DYNAMIC_ACK_THRESH
TCPresetfollowingRTO 201205 2.1.0.0 EF_TCP_RST_DELAYED_CONN
Configurecontrolplane
tables
201205 2.1.0.0 max_layer_2_interface
max_neighs
max_routes
Onloadadaptersupport 201109u2 2.0.0.0 OnloadsupportforSFN5322F&SFN6x22F
Acceleratepipe2() 201109u2 2.0.0.0 Acceleratepipe2()functioncall
SOCK_NONBLOCK
SOCK_CLOEXEC
201109u2 2.0.0.0 TCPsockettypes
ExtensionsAPI 201109u2 2.0.0.0 Supportforonload_thread_set_spin()
Onload_tcpdump 201109 2.0.0.0
ScalablePacketBuffer 201109 2.0.0.0 EF_PACKET_BUFFER_MODE=1
ZeroCopyUDPRX 201109 2.0.0.0
ZeroCopyTCPTX 201109 2.0.0.0
Receivefiltering 201109 2.0.0.0
TCP_QUICKACK 201109 2.0.0.0 setsockopt()option
Benchmarktoolsfnettest 201109 2.0.0.0 Supportforsfntstream
ExtensionsAPI 201104 2.0.0.0 Initialpublication
SO_BINDTODEVICE
SO_TIMESTAMP
SO_TIMESTAMPNS
201104 2.0.0.0 setsockopt()andgetsockopt()options
Acceleratedpipe() 201104 2.0.0.0 Acceleratepipe()functioncall
UDPrecvmmsg() 201104 2.0.0.0 Delivermultiplemsgsinasinglefunctioncall
Benchmarktoolsfnettest 201104 2.0.0.0 Supportsonlysfntpingpong
13.4EnvironmentVariables
Variable OOL EOL Changed Notes
EF_WODA_SINGLE_
INTERFACE
201606u1 5.0 Trafficwillonlybeordered
relativetoothertrafficarriving
onthesameinterface.
EF_TCP_SHARED_LOCAL_
PORTS_MAX
201606u1 4.0.5 Setmaxsizeforthepooloflocal
sharedports.
EF_TCP_SHARED_LOCAL_PORTS 201606u1 4.0.5 Improve performance for TCP
active open connections.
EF_ONLOAD_FD_BASE 201606u1 5.0 BasevalueforOnloadinternal
usefiledescriptors.
EF_TCP_LISTEN_
REPLIES_BACK
201606 5.0 ForcereplytoanincomingSYN
toignoreroutesandreplytothe
originatingnetworkinterface.
EF_HIGH_THROUGHPUT_
MODE
201606 5.0 Optimizeforthroughputatthe
costoflatency
EF_UDP_SEND_NONBLOCK_
NO_PACKETS_MODE
201509 4.0.3 Controlbehaviorofnonblock
UDPsend()callswhen
insufficientbufferscanbe
allocated.
EF_TCP_SYNRECV_MAX 201509 5.0 Limitthenumberofhalfopen
connectionsthatcanbecreated
inanOnloadstack.
EF_TCP_SOCKBUF_MAX_
FRACTION
201509 5.0 ControlthefractionoftotalTX
buffersallocatedtoasingle
socket.
EF_TCP_CONNECT_SPIN 201509 5.0 Callstoconnect()forTCP
socketswillspinuntila
connectionisestablishedorthe
spintimeoutexpiresorthe
sockettimeoutexpires.
Default=disabled.
EF_SCALABLE_FILTERS_ENABLE 201509 5.0 Toggle scalable filters mode for
a stack.
EF_SCALABLE_FILTERS_MODE 201509 5.0 Stores the scalable filter mode
set with EF_SCALABLE_FILTERS. Not set directly.
EF_SCALABLE_FILTERS 201509 5.0 Identifytheinterfacetouseand
setmodeforscalablelistening
sockets.
EF_RETRANSMIT_
THRESHOLD_ORPHAN
201509 5.0 Numberofretransmittimeouts
beforeaTCPconnectionis
abortedincaseoforphaned
connection.
EF_MAX_EP_PINNED_
PAGES
NS 1.0 201509 Notusedinpreviousrelease
andremovedfrom201509.
EF_OFE_ENGINE_SIZE 201502 4.0 Size(bytes)oftheOnloadfilter
engineallocatedwhenanew
stackiscreated.
EF_TCP_SNDBUF_
ESTABLISHED_DEFAULT
201502 4.0 OverrideOSdefaultvaluefor
SO_SNDBUFforTCPsocketsin
theESTABLISHEDstate.
EF_TCP_RCVBUF_STRICT 201502 4.0 PreventTCPsmallsegment
attackbylimitingnumberof
packetsinaTCPreceivequeue
andreorderbuffer.
EF_TCP_RCVBUF_
ESTABLISHED_DEFAULT
201502 4.0 OverrideOSdefaultvaluefor
SO_RCVBUFforTCPsocketsin
theESTABLISHEDstate.
EF_SO_BUSY_POLL_SPIN 201502 4.0 Spinonlyifaspinningsocketis
presentinthepoll/select/epoll
set.
EF_SELECT_NONBLOCK_
FAST_USEC
201502 4.0 Nonacceleratedsocketsare
polledonlyeveryNusecs.
EF_SELECT_FAST_USEC 201502 4.0 Acceleratedsocketsarepolled
forNusecsbefore
unacceleratedsockets.
EF_PIPE_SIZE 201502 4.0 201509
EOL4.0.3
Defaultsizeofapipe.
Defaultdecreasedto229376
from237568.
Default237568.
EF_SOCKET_CACHE_MAX 201502 4.0 Setthemaximumnumberof
TCPsocketstocacheperstack.
EF_SOCKET_CACHE_PORTS 201502 4.0 Allowcachingofsocketsbound
tospecifiedports.
EF_PER_SOCKET_CACHE_
MAX
201502 4.0 Limitthesizeofasocketcache.
EF_COMPOUND_PAGES_
MODE
201502 4.0 ControlOnloaduseof
compoundpages.
EF_UL_EPOLL=3 201502 4.0 Mode2supportedinEOL
versionsbefore4.0
EF_ACCEPT_INHERIT_
NODELAY
NS 3.0 201502/4.0 Removed(OOL)201502,(EOL)
4.0.
EF_TCP_SEND_NONBLOCK_
NO_PACKETS_MODE
201502 3.0.0.3 ControlnonblockingTCP
send()callbehaviorwhen
unabletoallocatesufficient
packetbuffers.
EF_CLUSTER_IGNORE 201405u1 4.0 Ignoreattemptstouseclusters
EF_CLUSTER_RESTART 201405 4.0 DetermineOnloadcluster
behaviorfollowingrestart.
EF_CLUSTER_SIZE 201405 4.0 Size(numberofsocket
members)ofapplication
cluster.
EF_CLUSTER_NAME 201405 4.0 Createanapplicationcluster.
EF_UDP_FORCE_
REUSEPORT
201405 4.0 SupportApplicationclustering
forlegacyapplications.
EF_TCP_FORCE_
REUSEPORT
201405 4.0 SupportApplicationclustering
forlegacyapplications.
EF_MCAST_SEND 201405 4.0 Enable/Disablemulticast
loopback.
EF_MCAST_RECV_HW_LOOP 201405 4.0 Enable/Disablehardware
multicastloopback‐receive.
EF_TX_TIMESTAMPING 201405 4.0 Perstackhardware
timestampingcontrol.
EF_TIMESTAMPING_
REPORTING
201405 4.0 Controltimestampreporting.
EF_TCP_SYNCOOKIES 201405 4.0 UseTCPsyncookiestoprotect
againstSYNfloodattack.
EF_SYNC_CPLANE_AT_
CREATE
201405 3.0 Synchronizecontrolplanewhen
astackiscreated.
EF_MULTICAST_LOOP_OFF 3.0 201405 DeprecatedinfavorofEF_
MCAST_SEND
EF_TX_PUSH_THRESHOLD 201310_u1 3.0 ImproveEF_TX_PUSHlow
latencytransmitfeature.
EF_RX_TIMESTAMPING 201310_u1 3.0 Controlofreceivepacket
hardwaretimestamps.
EF_RETRANSMIT_
THRESHOLD_SYNACK
201104 1.0.0.0 201310u1 Defaultchangedfrom4to5.
EF_PIO 201310 3.0 Enable/disablePIO
Defaultvalue1.
EF_PIO_THRESHOLD 201310 3.0 Identifiesthelargestpacketsize
thatcanusePIO.Defaultvalue
is1514.
EF_VFORK_MODE 201310 3.0 Dictateshowvfork()intercept
shouldwork.
EF_FREE_PACKETS_LOW_
WATERMARK
201310 3.0 201405u1 Leveloffreepacketstobe
retainedduringruntime.
Defaultchangedto0
(interpretedasEF_RXQ_SIZE/2 )
from100.
EF_TCP_SNDBUF_MODE 201310 2.0.0.6 201502
4.0
201509
LimitTCPpacketbuffersused
onthesendqueueand
retransmitqueue.
Defaultchangedto1from0in
201502/4.0.
Addedmode2in201509.
EF_TXQ_SIZE 3.0 201310 Limitedto2048forSFN7000
andSFN8000seriesadapters.
EF_MAX_ENDPOINTS  201104  1.1.0.3  201310, 201509, 201509u1, EOL 4.0.3  Default changed to 1024 from 10. Default changes to 8192 from 1024. Min (default) changes to 4 from 0. Default 8192. Min 4. Default 1024. Min 0.
EF_SO_TIMESTAMP_RESYNC_TIME  201104  2.1.0.1  201310  Removed from OOL.
EF_SIGNALS_NOPOSTPONE  201210u1  2.1.0.1  201606u1  Prevent the specified list of signals from being postponed by Onload. From 201606u1 list also includes SIGFPE.
EF_FORCE_TCP_NODELAY  201210  3.0  Force use of TCP_NODELAY.
EF_USE_HUGE_PAGES  201210  3.0  Enables huge pages for packet buffers.
EF_VALIDATE_ENV  201210  3.0  Will warn about obsolete or misspelled options in the environment. Default value 1.
EF_PD_VF  201205u1  2.1.0.0  201210  Allocate VIs within SR-IOV VFs to allocate unlimited memory. Replaced with new options on EF_PACKET_BUFFER_MODE
EF_PD_PHYS_MODE  201205_u1  2.1.0.0  201210  Allows a VI to use physical addressing rather than protected I/O addresses. Replaced with new options on EF_PACKET_BUFFER_MODE
EF_MAX_PACKETS  20101111  1.0.0.0  201210  Onload will round the specified value up to the nearest multiple of 1024.
EF_EPCACHE_MAX  20101111  1.0.0.0  201210  Removed from OOL
EF_TCP_MAX_SEQERR_MSGS  NS  201210  Removed
EF_STACK_LOCK_BUZZ  20101111  1.0.0.0  201210  OOL change to per_process, from per_stack. EOL is per stack.
EF_RFC_RTO_INITIAL  20101111  1.0.0.0  201210, 2.1.0.0  Change default to 1000 from 3000
EF_DYNAMIC_ACK_THRESH  201205  2.1.0.0  201210  Default value changed to 16 from 32 in 201210
EF_TCP_SERVER_LOOPBACK, EF_TCP_CLIENT_LOOPBACK  201205  2.1.0.0  201210  TCP loopback acceleration. Added option 4 for client loopback to cause both ends of a TCP connection to share a newly created stack. Option 4 is supported from EnterpriseOnload v3.0.
EF_TCP_RST_DELAYED  201205  2.1.0.0  Reset TCP connection following RTO expiry
EF_SA_ONSTACK_INTERCEPT  201205  2.1.0.0  Default value 0
EF_SHARE_WITH  201109u2  2.0.0.0
EF_EPOLL_CTL_HANDOFF  201109u2  2.0.0.0  Default value 1
EF_CHECK_STACK_USER  NS  201109u2  Renamed EF_SHARE_WITH
EF_POLL_USEC  201109u1  1.0.0.0
EF_DEFER_WORK_LIMIT  201109u1  2.0.0.0  Default value 32
EF_POLL_FAST_LOOPS  20101111  1.0.0.0  201109u1, 2.0.0.0  Renamed EF_POLL_FAST_USEC
EF_POLL_NONBLOCK_FAST_LOOPS  201104  2.0.0.0  201109u1, 2.0.0.1  Renamed EF_POLL_NONBLOCK_FAST_USEC
EF_PIPE_RECV_SPIN  201104  2.0.0.0  201109u1  Becomes per process, was previously per stack
EF_PKT_WAIT_SPIN  20101111  1.0.0.0  201109u1  Becomes per process, was previously per stack
EF_PIPE_SEND_SPIN  201104  2.0.0.0  201109u1  Becomes per process, was previously per stack
EF_TCP_ACCEPT_SPIN  20101111  1.0.0.0  201109u1  Becomes per process, was previously per stack
EF_TCP_RECV_SPIN  20101111  1.0.0.0  201109u1  Becomes per process, was previously per stack
EF_TCP_SEND_SPIN  20101111  1.0.0.0  201109u1  Becomes per process, was previously per stack
EF_UDP_RECV_SPIN  20101111  1.0.0.0  201109u1  Becomes per process, was previously per stack
EF_UDP_SEND_SPIN  20101111  1.0.0.0  201109u1  Becomes per process, was previously per stack
EF_EPOLL_NONBLOCK_FAST_LOOPS  201104u2  2.0.0.0  201109u1  Removed
EF_POLL_AVOID_INT  20101111  1.0.0.0  201109u1  Removed
EF_SELECT_AVOID_INT  20101111  1.0.0.0  201109u1  Removed
EF_SIG_DEFER  20101111  1.0.0.0  201109u1  Removed
EF_IRQ_CORE  201109  2.0.0.0  201109u2  Non-root user can now set it when using scalable packet buffer mode
EF_IRQ_CHANNEL  201109  2.0.0.0
EF_IRQ_MODERATION  201109  2.0.0.0  Default value 0
EF_PACKET_BUFFER_MODE  201109  2.0.0.0  201210  In 201210 options 2 and 3 enable physical addressing mode. EOL only supports option 1. EOL v3.0 supports options 2 and 3. Default: disabled
EF_SIG_REINIT  201109  NS  201109u1  Default value 0. Removed in 201109u1
EF_POLL_TCP_LISTEN_UL_ONLY  201104  2.0.0.0  201109  Removed
EF_POLL_UDP  20101111  1.0.0.0  201109  Removed
EF_POLL_UDP_TX_FAST  20101111  1.0.0.0  201109  Removed
EF_POLL_UDP_UL_ONLY  201104  2.0.0.0  201109  Removed
EF_SELECT_UDP  20101111  1.0.0.0  201109  Removed
EF_SELECT_UDP_TX_FAST  20101111  1.0.0.0  201109  Removed
EF_UDP_CHECK_ERRORS  20101111  1.0.0.0  201109  Removed
EF_UDP_RECV_FAST_LOOPS  20101111  1.0.0.0  201109  Removed
EF_UDP_RECV_MCAST_UL_ONLY  20101111  1.0.0.0  201109  Removed
EF_UDP_RECV_UL_ONLY  20101111  1.0.0.0  201109  Removed
EF_TX_QOS_CLASS  201104u2  2.0.0.0  Default value 0
EF_TX_MIN_IPG_CNTL  201104u2  2.0.0.0  Default value 0
EF_TCP_LISTEN_HANDOVER  201104u2  2.0.0.0  Default value 0
EF_TCP_CONNECT_HANDOVER  201104u2  2.0.0.0  Default value 0
EF_EPOLL_NONBLOCK_FAST_LOOPS  201104u2  2.0.0.0  201109u1  Default value 32. Removed in 201109u1
EF_TCP_SNDBUF_MODE  2.0.0.6  Default value 0
EF_UDP_PORT_HANDOVER2_MAX  201104u1  2.0.0.0  Default value 1
EF_UDP_PORT_HANDOVER2_MIN  201104u1  2.0.0.0  Default value 2
EF_UDP_PORT_HANDOVER3_MAX  201104u1  2.0.0.0  Default value 1
EF_UDP_PORT_HANDOVER3_MIN  201104u1  2.0.0.0  Default value 2
EF_STACK_PER_THREAD  201104u1  2.0.0.0  Default value 0
EF_PREFAULT_PACKETS  20101111  1.0.0.0  201104u1  Enabled by default, was previously disabled
EF_MCAST_RECV  201104u1  2.0.0.0  Default value 1
EF_MCAST_JOIN_BINDTODEVICE  201104u1  2.0.0.0  Default value 0
EF_MCAST_JOIN_HANDOVER  201104u1  2.0.0.0  Default value 0
EF_DONT_ACCELERATE  201104u1  2.0.0.0  Default value 0
EF_MULTICAST  20101111  1.0.0.0  201104u1  Removed
EF_TX_PUSH  20101111u1  1.0.0.0  201104, 201109  Enabled by default, was previously disabled. No longer set by the latency profile script
13.5 Module Options
To list all onload module options:
# modinfo onload
Option OOL EOL Changed Notes
max_local_addrs  201606  5.0  Maximum number of network addresses supported in the control plane.
scalable_filter_gid  201509  5.0  Set to a group identifier of users allowed to use the scalable filters feature. Set to -2 means that CAP_NET_RAW is required and checking is enforced. Set to -1 to avoid capability (CAP_NET_RAW) check.
oof_shared_steal_thresh  201502  4.0  See Listen/Accept Sockets on page 91
oof_shared_keep_thresh  201502  4.0  See Listen/Accept Sockets on page 91
oof_all_ports_required  201502  4.0  When set to 1, Onload will return an error if it is unable to install a filter on all required interfaces. Set this to 0 when using multiple PFs or VFs with Onload.
intf_white_list  201502  4.0  See Whitelist and Blacklist Interfaces on page 56
intf_black_list  201502  4.0  See Whitelist and Blacklist Interfaces on page 56
timesync_period  201502  4.0  Period in milliseconds between synchronizing the Onload clock with the system clock.
max_packets_per_stack  201210  3.0  Limit the number of packet buffers that each Onload stack can allocate. This module option places an upper limit on the EF_MAX_PACKETS option
epoll2_max_stacks  201210  3.0  201310  Identifies the maximum number of stacks that an epoll file descriptor can handle when EF_UL_EPOLL=2. Renamed epoll_max_stacks and removed from later releases.
phys_mod_gid  201210  3.0  sfc_char module parameter to restrict which ef_vi users can use physical addressing mode.
phys_mode_gid  201210  3.0  Enable physical addressing mode and restrict which users can use it
shared_buffer_table  201210  NS  This option should be set to enable ef_vi applications that use the ef_iobufset API. Setting shared_buffer_table=10000 will make 10000 buffer table entries available for use with ef_iobufset.
NOTE: The user should always refer to the Onload distribution Release Notes and
Change Log. These are available from http://www.openonload.org/download.html.
safe_signals_and_exit  201205  2.1.0.0  When Onload intercepts a termination signal it will attempt a clean exit by releasing resources including stack locks etc. The default is (1) enabled and it is recommended that this remains enabled unless signal handling problems occur, when it can be disabled (0).
max_layer2_interfaces  201205  2.1.0.0  Maximum number of network interfaces (includes physical, VLAN and bonds) supported in the control plane.
max_routes  201205  2.1.0.0  201205  Maximum number of entries in the Onload route table. Default is 256. Replaced the OOF_LOCAL_ADDR_MAX setting.
max_neighs  201205  2.1.0.0  Maximum number of entries in Onload ARP/neighbour table. Rounded up to power of two value. Default is 1024.
unsafe_sriov_without_iommu  201209u2  2.0.0.0  201210  Removed, obsoleted by physical addressing modes and phys_mode_gid. Obsolete in EOL from v3.0.
buffer_table_min, buffer_table_max  2.0.0.0  201210  Obsolete - Removed. Obsolete in EOL from v3.0.
13.6 Onload - Adapter Net Drivers
The following table identifies the Solarflare adapter net driver included in the
Onload release.
OOL  EOL  Net Driver  Notes
201606u1  5.0  4.10.0.1011  Linux kernels: 2.6.18 to 4.9rc1
201606  NS  4.8.2.1004
NS  4.0.7  4.7.0.1043  Linux kernels 2.6.18 to 4.3
NS  4.0.5, 4.0.6  4.7.0.1039
NS  4.0.4  4.7.0.1035
NS  4.0.3  4.7.0.1035  Support up to 4.3 Linux kernel and RHEL 7.2
201509u1  NS  4.5.1.1037
201509  4.0.2  4.5.1.1026
NS  4.0.1  4.5.1.1020
201502u2  4.0.0  4.5.1.1010
201502u1  NS  4.4.1.1021
201502  NS  4.4.1.1017
201405u2  NS  4.1.2.1003
201405u1  NS  4.1.2.1003  Supports RHEL 7.
201405  3.0.0.8, 3.0.0.7, 3.0.0.6, 3.0.0.5, 3.0.0.4, 3.0.0.3, 3.0.0.2  4.1.0.6734  Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, including SFN7x42Q.
201310u2  3.0.0.0, 3.0.0.1  4.0.2.6645  Net driver supporting SFN5xxx, 6xxx and 7xxx series adapters, introducing hardware packet timestamps and PTP on 7xxx series adapters. SFN7142Q not supported.
201310u1  NS  4.0.2.6625
201310  NS  4.0.0.6585  Supports HW timestamps, PTP on SFN7000 series adapters.
NS  2.1.0.1  3.3.0.6262  Supports sfptpd.
201210u1  NS  3.3.0.6246  Supports sfptpd.
201210  NS  3.2.1.6222B
NS  2.1.0.0  3.2.1.6110
201205u1  NS  3.2.1.6099
201109u1  2.0.0.0  3.2
201104  NS  3.1
A Parameter Reference
A.1 Parameter List
The parameter list details the following:
• The environment variable used to set the parameter.
• Parameter name: the name used by onload_stackdump.
• The default, minimum and maximum values.
• Whether the variable scope applies per stack or per process.
• Description.
EF_ACCEPTQ_MIN_BACKLOG
Name: acceptq_min_backlog
Default: 1
Scope: per stack
Sets a minimum value to use for the 'backlog' argument to the listen() call. If the
application requests a smaller value, use this value instead.
EF_ACCEPT_INHERIT_NONBLOCK
Name: accept_force_inherit_nonblock
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
If set to 1, TCP sockets accepted from a listening socket inherit the O_NONBLOCK flag
from the listening socket.
EF_BINDTODEVICE_HANDOVER
Name: bindtodevice_handover
Default: 0
Minimum: 0
Maximum: 1
Scope: per stack
Hand sockets over to the kernel stack that have the SO_BINDTODEVICE socket option
enabled.
EF_BURST_CONTROL_LIMIT
Name: burst_control_limit
Default: 0
Scope: per stack
If non-zero, limits how many bytes of data are transmitted in a single burst. This can
be useful to avoid drops on low-end switches which contain limited buffering or
limited internal bandwidth. This is not usually needed for use with most modern,
high-performance switches.
EF_BUZZ_USEC
Name: buzz_usec
Default: 0
Scope: per stack
Sets the timeout in microseconds for lock buzzing options. Set to zero to disable lock
buzzing (spinning). Will buzz forever if set to -1. Also set by the EF_POLL_USEC
option.
EF_CLUSTER_IGNORE
Name: cluster_ignore
Default: 0
Minimum: 0
Maximum: 1
Scope: per stack
When set, this option instructs Onload to ignore attempts to use clusters and
effectively ignore attempts to set SO_REUSEPORT.
EF_CLUSTER_RESTART
Name: cluster_restart_opt
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
This option controls the behavior when recreating a stack (e.g. due to restarting a
process) in an SO_REUSEPORT cluster and it encounters a resource limitation such as
an orphan stack from the previous process:
• 0 - return an error
• 1 - terminate the orphan to allow the new process to continue.
EF_CLUSTER_SIZE
Name: cluster_size
Default: 2
Minimum: 2
Scope: per process
If use of SO_REUSEPORT creates a cluster, this option specifies the size of the cluster
to be created. This option has no impact if use of SO_REUSEPORT joins a cluster that
already exists. Note that if fewer sockets than specified here join the cluster, then
some traffic will be lost. Refer to the SO_REUSEPORT section in the manual for more
detail. This should always be a power of two value when used on SFN5000 or
SFN6000 series adapters.
EF_COMPOUND_PAGES_MODE
Name: compound_pages
Default: 0
Minimum: 0
Maximum: 2
Scope: per stack
Debug option, not suitable for normal use.
For packet buffers, allocate system pages in the following way:
• 0 - try to use compound pages if possible (default)
• 1 - do not use compound pages of high order
• 2 - do not use compound pages at all.
EF_CONG_AVOID_SCALE_BACK
Name: cong_avoid_scale_back
Default: 0
Scope: per stack
When > 0, this option slows down the rate at which the TCP congestion window is
opened. This can help to reduce loss in environments where there is a lot of
congestion and loss.
EF_DEFER_WORK_LIMIT
Name: defer_work_limit
Default: 32
Scope: per stack
The maximum number of times that work can be deferred to the lock holder before
the unlocked thread is forced to block and wait for the lock.
EF_DELACK_THRESH
Name: delack_thresh
Default: 1
Minimum: 0
Maximum: 65535
Scope: per stack
This option controls the delayed acknowledgment algorithm. A socket may receive
up to the specified number of TCP segments without generating an ACK. Setting this
option to 0 disables delayed acknowledgments.
NOTE: This option is overridden by EF_DYNAMIC_ACK_THRESH, so both options
need to be set to 0 to disable delayed acknowledgments.
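As a minimal sketch of the NOTE above, the following Python helper builds a process environment in which both thresholds are zero. The helper name is hypothetical; only the two variable names come from this guide.

```python
import os


def env_without_delayed_acks(base_env=None):
    """Return a copy of the environment with delayed ACKs disabled.

    Hypothetical helper: both EF_DELACK_THRESH and EF_DYNAMIC_ACK_THRESH
    are set to 0, because the guide states that the dynamic threshold
    overrides the static one.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["EF_DELACK_THRESH"] = "0"
    env["EF_DYNAMIC_ACK_THRESH"] = "0"
    return env
```

The returned mapping could be passed as the env argument of subprocess.run() when starting the application under Onload.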
EF_DONT_ACCELERATE
Name: dont_accelerate
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
Do not accelerate by default. This option is usually used in conjunction with
onload_set_stackname() to allow individual sockets to be accelerated selectively.
EF_DYNAMIC_ACK_THRESH
Name: dynack_thresh
Default: 16
Minimum: 0
Maximum: 65535
Scope: per stack
If set to > 0 this will turn on dynamic adaptation of the ACK rate to increase efficiency
by avoiding ACKs when they would reduce throughput. The value is used as the
threshold for number of pending ACKs before an ACK is forced. If set to zero then
the standard delayed ack algorithm is used.
EF_EPOLL_CTL_FAST
Name: ul_epoll_ctl_fast
Default: 1
Minimum: 0
Maximum: 1
Scope: per process
Avoid system calls in epoll_ctl() when using an accelerated epoll
implementation. System calls are deferred until epoll_wait() blocks, and in some
cases removed completely. This option improves performance for applications that
call epoll_ctl() frequently.
Caveats:
• This option has no effect when EF_UL_EPOLL=0.
• Do not turn this option on if your application uses dup(), fork() or exec() in
conjunction with epoll file descriptors or with the sockets monitored by epoll.
• If you monitor the epoll fd in another poll, select or epoll set, and the effects of
epoll_ctl() are latency critical, then this option can cause latency spikes or
even deadlock.
• With EF_UL_EPOLL=2, this option is harmful if you are calling epoll_wait()
and epoll_ctl() simultaneously from different threads or processes.
EF_EPOLL_CTL_HANDOFF
Name: ul_epoll_ctl_handoff
Default: 1
Minimum: 0
Maximum: 1
Scope: per process
Allow epoll_ctl() calls to be passed from one thread to another in order to avoid
lock contention, in the EF_UL_EPOLL=1 or 3 case. This optimization is particularly
important when epoll_ctl() calls are made concurrently with epoll_wait() and
spinning is enabled.
This option is enabled by default.
Caveat:
• This option may cause an error code returned by epoll_ctl() to be hidden
from the application when a call is deferred. In such cases an error message is
emitted to stderr or the system log.
EF_EPOLL_MT_SAFE
Name: ul_epoll_mt_safe
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
This option disables concurrency control inside the accelerated epoll
implementations, reducing CPU overhead. It is safe to enable this option if, for each
epoll set, all calls on the epoll set and all calls that may modify a member of the epoll
set are concurrency safe. Calls that may modify a member are bind(), connect(),
listen() and close().
This option improves performance with EF_UL_EPOLL=1 or 3 and also with
EF_UL_EPOLL=2 and EF_EPOLL_CTL_FAST=1.
EF_EPOLL_SPIN
Name: ul_epoll_spin
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
Spin in epoll_wait() calls until an event is satisfied or the spin timeout expires
(whichever is the sooner). If the spin timeout expires, enter the kernel and block.
The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_EVS_PER_POLL
Name: evs_per_poll
Default: 64
Minimum: 0
Maximum: 0x7fffffff
Scope: per stack
Sets the number of hardware network events to handle before performing other
work. The value chosen represents a trade-off: larger values increase batching
(which typically improves efficiency) but may also increase the working set size
(which harms cache efficiency).
EF_FDS_MT_SAFE
Name: fds_mt_safe
Default: 1
Minimum: 0
Maximum: 1
Scope: per process
This option allows less strict concurrency control when accessing the user-level file
descriptor table, resulting in increased performance, particularly for multi-threaded
applications. Single-threaded applications get a small latency benefit, but
multi-threaded applications benefit most due to decreased cache-line bouncing between
CPU cores.
This option is unsafe for applications that make changes to file descriptors in one
thread while accessing the same file descriptors in other threads. For example,
closing a file descriptor in one thread while invoking another system call on that file
descriptor in a second thread. Concurrent calls that do not change the object
underlying the file descriptor remain safe.
Calls to bind(), connect() and listen() may change the underlying object. If you call
such functions in one thread while accessing the same file descriptor from another
thread, this option is also unsafe. In some special cases, any function may change
the underlying object.
Also, concurrent calls may happen from signal handlers, so set this to 0 if your signal
handlers call bind(), connect(), listen() or close().
EF_FDTABLE_SIZE
Name: fdtable_size
Default: 0
Scope: per process
Limit the number of opened file descriptors to this value. If zero, the initial hard limit
of open files (`ulimit -n -H`) is used. Hard and soft resource limits for opened file
descriptors (help ulimit, man 2 setrlimit) are bound by this value.
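The zero-means-hard-limit rule described above can be sketched in Python as follows. The function name is hypothetical, and the sketch reads the current hard limit of the calling process rather than the "initial" limit that Onload records at start-up.

```python
import resource


def effective_fdtable_size(ef_fdtable_size):
    """Model the documented EF_FDTABLE_SIZE rule (illustrative only).

    A value of 0 falls back to the hard limit on open files, which is
    what `ulimit -n -H` reports; any other value is used as given.
    """
    if ef_fdtable_size == 0:
        _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        return hard
    return ef_fdtable_size
```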
EF_FDTABLE_STRICT
Name: fdtable_strict
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
Enables more strict concurrency control for the user-level file descriptor table.
Enabling this option can reduce performance for applications that create and
destroy many connections per second.
EF_FORCE_SEND_MULTICAST
Name: force_send_multicast
Default: 1
Minimum: 0
Maximum: 1
Scope: per stack
This option causes all multicast sends to be accelerated. When disabled, multicast
sends are only accelerated for sockets that have cleared the IP_MULTICAST_LOOP
flag.
This option disables loopback of multicast traffic to receivers on the same host,
unless (a) those receivers are sharing an OpenOnload stack with the sender (see
EF_NAME) and EF_MCAST_SEND is set to 1 or 3, or (b) prerequisites to support
loopback to other OpenOnload stacks are met (see EF_MCAST_SEND).
See the OpenOnload manual for further details on multicast operation.
EF_FORCE_TCP_NODELAY
Name: tcp_force_nodelay
Default: 0
Minimum: 0
Maximum: 2
Scope: per stack
This option allows the user to override the use of TCP_NODELAY. This may be useful
in cases where 3rd party software is (not) setting this value and the user would like
to control its behavior:
• 0 - do not override
• 1 - always set TCP_NODELAY
• 2 - never set TCP_NODELAY
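The three override modes listed above can be sketched as a small Python function. This is an illustrative model of the documented behavior, not Onload code, and the function name is hypothetical.

```python
def effective_tcp_nodelay(app_requested, force_mode):
    """Return whether TCP_NODELAY ends up set on a socket.

    app_requested: what the application asked for via setsockopt().
    force_mode: the EF_FORCE_TCP_NODELAY value (0, 1 or 2).
    """
    if force_mode == 0:      # do not override
        return bool(app_requested)
    if force_mode == 1:      # always set TCP_NODELAY
        return True
    if force_mode == 2:      # never set TCP_NODELAY
        return False
    raise ValueError("EF_FORCE_TCP_NODELAY must be 0, 1 or 2")
```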
EF_FORK_NETIF
Name: fork_netif
Default: 3
Minimum: CI_UNIX_FORK_NETIF_NONE
Maximum: CI_UNIX_FORK_NETIF_BOTH
Scope: per process
This option controls behavior after an application calls fork():
• 0 - Neither fork parent nor child creates a new OpenOnload stack
• 1 - Child creates a new stack for new sockets
• 2 - Parent creates a new stack for new sockets
• 3 - Parent and child each create a new stack for new sockets.
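The four values above read naturally as two independent flags (1 for the child, 2 for the parent). That reading is an inference from the list, not a statement about how Onload stores the option; the sketch below, with a hypothetical function name, simply decodes it that way.

```python
def fork_netif_behavior(mode):
    """Decode an EF_FORK_NETIF value into per-process behavior.

    Assumes, from the documented list only, that bit 0 covers the child
    and bit 1 covers the parent.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("EF_FORK_NETIF must be between 0 and 3")
    return {
        "child_new_stack": bool(mode & 1),
        "parent_new_stack": bool(mode & 2),
    }
```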
EF_FREE_PACKETS_LOW_WATERMARK
Name: free_packets_low
Default: 0
Scope: per stack
Keep the number of free packets at or above this value. EF_MIN_FREE_PACKETS defines
initialization behavior, whereas this value applies during normal application runtime.
In some combinations of hardware and software, Onload is not able to allocate packets
in every context, so it makes sense to keep some spare packets. The default value 0 is
interpreted as EF_RXQ_SIZE/2.
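As a small illustration of the default rule, the following Python sketch resolves the watermark from the configured value and the receive queue size. The function name is hypothetical, integer division is an assumption, and the receive queue size is passed in explicitly because its default is not stated in this entry.

```python
def effective_free_packets_low(free_packets_low, rxq_size):
    """Resolve EF_FREE_PACKETS_LOW_WATERMARK as documented.

    A configured value of 0 is interpreted as EF_RXQ_SIZE/2; any other
    value is used unchanged.
    """
    if free_packets_low == 0:
        return rxq_size // 2
    return free_packets_low
```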
EF_HELPER_PRIME_USEC
Name: timer_prime_usec
Default: 250
Scope: per stack
Sets the frequency with which software should reset the count-down timer. Usually
set to a value that is significantly smaller than EF_HELPER_USEC to prevent the
count-down timer from firing unless needed. Defaults to (EF_HELPER_USEC / 2).
EF_HELPER_USEC
Name: timer_usec
Default: 500
Scope: per stack
Timeout in microseconds for the count-down interrupt timer. This timer generates
an interrupt if network events are not handled by the application within the given
time. It ensures that network events are handled promptly when the application is
not invoking the network, or is descheduled.
Set this to 0 to disable the count-down interrupt timer. It is disabled by default for
stacks that are interrupt driven.
EF_HIGH_THROUGHPUT_MODE
Name: rx_merge_mode
Default: 0
Minimum: 0
Maximum: 1
Scope: per stack
This option causes Onload to optimize for throughput at the cost of latency.
EF_INT_DRIVEN
Name: int_driven
Default: 1
Minimum: 0
Maximum: 1
Scope: per stack
Put the stack into an 'interrupt driven' mode of operation. When this option is not
enabled Onload uses heuristics to decide when to enable interrupts, and this can
cause latency jitter in some applications. So enabling this option can help avoid
latency outliers.
This option is enabled by default except when spinning is enabled.
This option can be used in conjunction with spinning to prevent outliers caused
when the spin timeout is exceeded and the application blocks, or when the
application is descheduled. In this case we recommend that interrupt moderation
be set to a reasonably high value (e.g. 100us) to prevent too high a rate of interrupts.
EF_INT_REPRIME
Name: int_reprime
Default: 0
Minimum: 0
Maximum: 1
Scope: per stack
Enable interrupts more aggressively than the default.
EF_IRQ_CHANNEL
Name: irq_channel
Default: 4294967295
Minimum: 1
Maximum: SMAX
Scope: per stack
Set the net driver receive channel that will be used to handle interrupts for this
stack. The core that receives interrupts for this stack will be whichever core is
configured to handle interrupts for the specified net driver receive channel.
This option only takes effect if EF_PACKET_BUFFER_MODE=0 (default) or 2.
EF_IRQ_CORE
Name: irq_core
Default: 4294967295
Minimum: 1
Maximum: SMAX
Scope: per stack
Specify which CPU core interrupts for this stack should be handled on.
With EF_PACKET_BUFFER_MODE=1 or 3, Onload creates dedicated interrupts for
each stack, and the interrupt is assigned to the requested core.
With EF_PACKET_BUFFER_MODE=0 (default) or 2, Onload interrupts are handled via
net driver receive channel interrupts. The sfc_affinity driver is used to choose which
net driver receive channel is used. It is only possible for interrupts to be handled on
the requested core if a net driver interrupt is assigned to the selected core.
Otherwise a nearby core will be selected.
NOTE: If the IRQ balancer service is enabled it may redirect interrupts to other
cores.
EF_IRQ_MODERATION
Name: irq_usec
Default: 0
Minimum: 0
Maximum: 1000000
Scope: per stack
Interrupt moderation interval, in microseconds.
This option only takes effect with EF_PACKET_BUFFER_MODE=1 or 3. Otherwise
the interrupt moderation settings of the kernel net driver take effect.
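The EF_IRQ_CHANNEL, EF_IRQ_CORE and EF_IRQ_MODERATION entries each depend on EF_PACKET_BUFFER_MODE. The following Python sketch collects those statements in one place; the function and key names are hypothetical and the result is only a summary of the text above.

```python
def irq_options_in_effect(packet_buffer_mode):
    """Summarize which IRQ options apply for a packet buffer mode."""
    if packet_buffer_mode not in (0, 1, 2, 3):
        raise ValueError("EF_PACKET_BUFFER_MODE must be between 0 and 3")
    dedicated = packet_buffer_mode in (1, 3)
    return {
        # Honored only in modes 0 and 2.
        "EF_IRQ_CHANNEL": not dedicated,
        # Honored only in modes 1 and 3; otherwise the kernel net
        # driver's moderation settings apply.
        "EF_IRQ_MODERATION": dedicated,
        # EF_IRQ_CORE is honored in every mode, but by different means.
        "irq_core_via": ("dedicated per-stack interrupt" if dedicated
                         else "net driver receive channel"),
    }
```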
EF_KEEPALIVE_INTVL
Name: keepalive_intvl
Default: 75000
Scope: per stack
Default interval between keepalives, in milliseconds.
EF_KEEPALIVE_PROBES
Name: keepalive_probes
Default: 9
Scope: per stack
Default number of keepalive probes to try before aborting the connection.
EF_KEEPALIVE_TIME
Name: keepalive_time
Default: 7200000
Scope: per stack
Default idle time before keepalive probes are sent, in milliseconds.
EF_LOAD_ENV
Name: load_env
Default: 1
Minimum: 0
Maximum: 1
Scope: per process
OpenOnload will only consult other environment variables if this option is set, i.e.
clearing this option will cause all other EF_ environment variables to be ignored.
EF_LOG
Name: log_category
Default: 27
Minimum: 0
Scope: per stack
Designed to control how chatty Onload's informative/warning messages are.
Specified as a comma-separated list of options to enable and disable (with a minus
sign). Valid options are:
• 'banner' (on by default)
• 'resource_warnings' (on by default)
• 'config_warnings' (on by default)
• 'conn_drop' (off by default)
• 'usage_warnings' (on by default).
For example:
• To enable conn_drop:
EF_LOG=conn_drop
• To enable conn_drop and turn off resource warnings:
EF_LOG=conn_drop,-resource_warnings
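The list syntax described above (names enable a category, a leading minus sign disables it) can be sketched in Python as below. The sketch assumes that listed items adjust the documented defaults rather than replace them; the function name is hypothetical and this is not Onload's own parser.

```python
DEFAULT_ON = {"banner", "resource_warnings", "config_warnings", "usage_warnings"}
VALID = DEFAULT_ON | {"conn_drop"}


def parse_ef_log(value):
    """Return the set of log categories enabled by an EF_LOG string."""
    enabled = set(DEFAULT_ON)
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        disable = item.startswith("-")
        name = item[1:] if disable else item
        if name not in VALID:
            raise ValueError("unknown EF_LOG option: " + name)
        if disable:
            enabled.discard(name)
        else:
            enabled.add(name)
    return enabled
```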
EF_LOG_FILE
Scope: per stack
When EF_LOG_VIA_IOCTL is unset, the user can direct Onload debug and output
data to a directory/file instead of stdout and instead of the syslog.
EF_LOG_TIMESTAMPS
Default: 0
Minimum: 0
Maximum: 1
Scope: global
If enabled this will add a timestamp to every Onload output log entry. Timestamps
originate from the FRC counter.
EF_LOG_VIA_IOCTL
Name: log_via_ioctl
Default: 0
Minimum: 0
Maximum: 1
Scope: per process
Causes error and log messages emitted by OpenOnload to be written to the system
log rather than written to standard error. This includes the copyright banner emitted
when an application creates a new OpenOnload stack.
By default, OpenOnload logs are written to the application standard error if and only
if it is a TTY.
Enable this option when it is important not to change what the application writes to
standard error.
Disable it to guarantee that log goes to standard error even if it is not a TTY.
EF_MAX_ENDPOINTS
Name: max_ep_bufs
Default: 8192
Minimum: 4
Maximum: CI_CFG_NETIF_MAX_ENDPOINTS_MAX
Scope: per stack
This option places an upper limit on the number of accelerated endpoints (sockets,
pipes etc.) in an Onload stack. This option should be set to a power of two between
4 and 2^21. When this limit is reached listening sockets are not able to accept new
connections over accelerated interfaces. New sockets and pipes created via
socket() and pipe() etc. are handed over to the kernel stack and so are not
accelerated.
NOTE: ~4 syn-receive states consume one endpoint, see also
EF_TCP_SYNRECV_MAX.
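The "power of two between 4 and 2^21" recommendation above can be checked before a value is exported. The following Python function is a minimal sketch with a hypothetical name; it only tests the stated constraint and says nothing about how Onload treats other values.

```python
def is_valid_max_endpoints(value):
    """Check the documented EF_MAX_ENDPOINTS recommendation.

    True when value is a power of two in the range 4 to 2**21 inclusive.
    """
    in_range = 4 <= value <= 2 ** 21
    power_of_two = value > 0 and (value & (value - 1)) == 0
    return in_range and power_of_two
```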
EF_MAX_PACKETS
Name: max_packets
Default: 32768
Minimum: 1024
Scope: per stack
Upper limit on number of packet buffers in each OpenOnload stack. Packet buffers
require hardware resources which may become a limiting factor if many stacks are
each using many packet buffers. This option can be used to limit how much
hardware resource and memory a stack uses. This option has an upper limit
determined by the max_packets_per_stack onload module option.
NOTE: When 'scalable packet buffer mode' is not enabled (see
EF_PACKET_BUFFER_MODE) the total number of packet buffers possible in aggregate
is limited by a hardware resource. The SFN5x series adapters support
approximately 120,000 packet buffers.
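The Change History table notes that Onload rounds the EF_MAX_PACKETS value up to the nearest multiple of 1024, and the entry above says the max_packets_per_stack module option acts as an upper limit. The Python sketch below combines the two; the function names are hypothetical, and applying the limit after rounding is an assumption.

```python
def round_up_max_packets(requested, multiple=1024):
    """Round a requested packet count up to the next multiple of 1024."""
    if requested < 0:
        raise ValueError("requested must be non-negative")
    return ((requested + multiple - 1) // multiple) * multiple


def effective_max_packets(requested, max_packets_per_stack):
    """Apply rounding first, then the module-level cap (assumed order)."""
    return min(round_up_max_packets(requested), max_packets_per_stack)
```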
EF_MAX_RX_PACKETS
Name: max_rx_packets
Default: 24576
Minimum: 0
Maximum: 1000000000
Scope: per stack
The maximum number of packet buffers in a stack that can be used by the receive
data path. This should be set to a value smaller than EF_MAX_PACKETS to ensure that
some packet buffers are reserved for the transmit path.
EF_MAX_TX_PACKETS
Name: max_tx_packets
Default: 24576
Minimum: 0
Maximum: 1000000000
Scope: per stack
The maximum number of packet buffers in a stack that can be used by the transmit
data path. This should be set to a value smaller than EF_MAX_PACKETS to ensure that
some packet buffers are reserved for the receive path.
EF_MCAST_JOIN_BINDTODEVICE
Name: mcast_join_bindtodevice
Default: 0
Minimum: 0
Maximum: 1
Scope: per stack
When a UDP socket joins a multicast group (using IP_ADD_MEMBERSHIP or similar),
this option causes the socket to be bound to the interface that the join was on. The
benefit of this is that it ensures the socket will not accidentally receive packets from
other interfaces that happen to match the same group and port. This can sometimes
happen if another socket joins the same multicast group on a different interface, or
if the switch is not filtering multicast traffic effectively. If the socket joins multicast
groups on more than one interface, then the binding is automatically removed.
EF_MCAST_JOIN_HANDOVER
Name: mcast_join_handover
Default: 0
Minimum: 0
Maximum: 2
Scope: per stack
When this option is set to 1, and a UDP socket joins a multicast group on an interface
that is not accelerated, the UDP socket is handed over to the kernel stack. This can
be a good idea because it prevents that socket from consuming Onload resources,
and may also help avoid spinning when it is not wanted.
When set to 2, UDP sockets that join multicast groups are always handed over to the
kernel stack.
EF_MCAST_RECV
Name: mcast_recv
Default: 1
Minimum: 0
Maximum: 1
Scope: per stack
Controls whether or not to accelerate multicast receives. When set to zero,
multicast receives are not accelerated, but the socket continues to be managed by
Onload.
See also EF_MCAST_JOIN_HANDOVER.
See the OpenOnload manual for further details on multicast operation.
EF_MCAST_RECV_HW_LOOP
Name: mcast_recv_hw_loop
Default: 1
Minimum: 0
Maximum: 1
Scope: per stack
When enabled allows UDP sockets to receive multicast traffic that originates from
other OpenOnload stacks.
See the OpenOnload manual for further details on multicast operation.
EF_MCAST_SEND
Name: mcast_send
Default: 0
Minimum: 0
Maximum: 3
Scope: per stack
Controls loopback of multicast traffic to receivers in the same and other
OpenOnload stacks.
• When set to 0 (default) disables loopback within the same stack as well as to
other OpenOnload stacks.
• When set to 1 enables loopback to the same stack.
• When set to 2 enables loopback to other OpenOnload stacks.
• When set to 3 enables loopback to the same as well as other OpenOnload
stacks.
With respect to loopback to other OpenOnload stacks the option is just a hint, and the
feature requires all of the following:
• 7000 series or newer device
• selecting firmware variant with loopback support.
See the OpenOnload manual for further details on multicast operation.
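The four EF_MCAST_SEND values above can be read as two flags: 1 for the same stack and 2 for other stacks. The Python sketch below decodes a value on that reading; the function name is hypothetical, and for other stacks the result is only the requested behavior, since the guide describes that part as a hint with hardware and firmware prerequisites.

```python
def mcast_send_loopback(mode):
    """Decode EF_MCAST_SEND into the loopback targets it requests."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("EF_MCAST_SEND must be between 0 and 3")
    return {
        "same_stack": bool(mode & 1),
        "other_stacks": bool(mode & 2),   # a hint only, per the guide
    }
```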
EF_MIN_FREE_PACKETS
Name: min_free_packets
Default: 100
Minimum: 0
Maximum: 1000000000
Scope: per stack
Minimum number of free packets to reserve for each stack at initialization. If Onload
is not able to allocate sufficient packet buffers to fill the RX rings and fill the free pool
with the given number of buffers, then creation of the stack will fail.
EF_MULTICAST_LOOP_OFF
Name: multicast_loop_off
Default: 1
Minimum: 0
Maximum: 1
Scope: per stack
EF_MULTICAST_LOOP_OFF is deprecated in favor of EF_MCAST_SEND. When set,
disables loopback of multicast traffic to receivers in the same OpenOnload stack.
This option only takes effect when EF_MCAST_SEND is not set and is equivalent to
EF_MCAST_SEND=1 or EF_MCAST_SEND=0 for values of 0 and 1 respectively.
See the OpenOnload manual for further details on multicast operation.
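The precedence between the deprecated EF_MULTICAST_LOOP_OFF and EF_MCAST_SEND can be sketched in Python as follows. The function name is hypothetical; the mapping (0 becomes EF_MCAST_SEND=1, 1 becomes EF_MCAST_SEND=0) and the fallback default are taken from the two entries in this guide.

```python
def effective_mcast_send(env):
    """Resolve the multicast send mode from an environment mapping.

    EF_MCAST_SEND wins when present. Otherwise the deprecated
    EF_MULTICAST_LOOP_OFF is translated; with neither set, the
    documented EF_MCAST_SEND default of 0 applies.
    """
    if "EF_MCAST_SEND" in env:
        return int(env["EF_MCAST_SEND"])
    if "EF_MULTICAST_LOOP_OFF" in env:
        loop_off = int(env["EF_MULTICAST_LOOP_OFF"])
        return 0 if loop_off == 1 else 1
    return 0
```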
EF_NAME
Default: none
Maximum: 8 chars
Scope: per stack
The environment variable EF_NAME will be honored to control Onload stack sharing.
However, a call to onload_set_stackname() overrides this variable, and
EF_DONT_ACCELERATE and EF_STACK_PER_THREAD both take precedence over
EF_NAME.
EF_NETIF_DTOR
Name: netif_dtor
Default: 1
Minimum: 0
Maximum: 2
Scope: per process
This option controls the lifetime of OpenOnload stacks when the last socket in a
stack is closed.
EF_NONAGLE_INFLIGHT_MAX
Name: nonagle_inflight_max
Default: 50
Minimum: 1
Scope: per stack
This option affects the behavior of TCP sockets with the TCP_NODELAY socket option.
Nagle's algorithm is enabled when the number of packets in flight (sent but not
acknowledged) exceeds the value of this option. This improves efficiency when
sending many small messages, while preserving low latency.
Set this option to -1 to ensure that Nagle's algorithm never delays sending of TCP
messages on sockets with TCP_NODELAY enabled.
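The rule above can be written as a simple predicate. This Python sketch is an illustrative model only, with a hypothetical function name; it applies to sockets that have TCP_NODELAY set and treats -1 as "never", as the entry describes.

```python
def nagle_applies(packets_in_flight, nonagle_inflight_max=50):
    """Return True when Nagle's algorithm would apply to a NODELAY socket.

    packets_in_flight: packets sent but not yet acknowledged.
    nonagle_inflight_max: the EF_NONAGLE_INFLIGHT_MAX value; -1 means
    Nagle's algorithm never delays sends on TCP_NODELAY sockets.
    """
    if nonagle_inflight_max == -1:
        return False
    return packets_in_flight > nonagle_inflight_max
```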
EF_NO_FAIL
Name: no_fail
Default: 1
Minimum: 0
Maximum: 1
Scope: per process
This option controls whether failure to create an accelerated socket (due to resource
limitations) is hidden by creating a conventional unaccelerated socket. Set this
option to 0 to cause out-of-resources errors to be propagated as errors to the
application, or to 1 to have Onload use the kernel stack instead when out of
resources.
Disabling this option can be useful to ensure that sockets are being accelerated as
expected (i.e. to find out when they are not).
EF_ONLOAD_FD_BASE
Name: fd_base
Default: 4
Scope: per process
Onload uses fds internally that are not visible to the application. This can cause
problems for applications that make assumptions about their use of the fd space, for
example by doing dup2/3 onto a specific file descriptor.
If this is done on an fd that is internally used by Onload then an error of the form
'citp_ep_dup3(29, 3): target is reserved, see EF_ONLOAD_FD_BASE' will be
generated.
This option specifies a base file descriptor value that Onload should try to keep its
internal file descriptors greater than or equal to. This allows the application to direct
Onload to a part of the fd space that it is not expecting to explicitly use.
EF_PACKET_BUFFER_MODE
Name: packet_buffer_mode
Default: 0
Minimum: 0
Maximum: 3
Scope: per stack
This option affects how DMA buffers are managed. The default packet buffer mode
uses a limited hardware resource, and so restricts the total amount of memory that
can be used by Onload for DMA.
Setting EF_PACKET_BUFFER_MODE != 0 enables 'scalable packet buffer mode' which
removes that limit. See details for each mode below:
• 1 - SR-IOV with IOMMU.
EachstackallocatesaseparatePCIVirtualFunction.IOMMUguaranteesthat
differentstacksdonothaveanyaccesstoeachotherdata.
•2‐Physicaladdressmode.
Inherentlyunsafe,withnoaddressspaceseparationbetweendifferentstacks
ornetdriverpackets.
•3‐SRIOVwithphysicaladdressmode.
EachstackallocatesaseparatePCIVirtualFunction.IOMMUisnotused,sothis
modeisunsafeinthesamewayas(2).
Touseoddmodes(1and3)SRIOVmustbeenabledintheBIOS,OSkernelandon
thenetworkadapter.Inthesemodesyoualsogetfasterinterrupthandlerwhichcan
improvelatencyforsomeworkloads.
Formode(1)youalsohavetoenableIOMMU(alsoknownasVTd)inBIOSandin
yourkernel.
Forunsafephysicaladdressmodes(2)and(3),youshouldtunephys_mode_gid
moduleparameteroftheonloadmodule.
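The per-mode requirements above can be summarised as a small lookup. The following
Python sketch is illustrative only: the function name and the dictionary keys are
hypothetical and are not part of Onload.

```python
def packet_buffer_mode_requirements(mode: int) -> dict:
    """Summarise what an EF_PACKET_BUFFER_MODE value implies, per the text above.

    Hypothetical helper for illustration; Onload does not provide it.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("EF_PACKET_BUFFER_MODE must be between 0 and 3")
    return {
        # Any non-zero mode enables 'scalable packet buffer mode'.
        "scalable": mode != 0,
        # Odd modes need SR-IOV in the BIOS, OS kernel and adapter.
        "needs_sriov": mode in (1, 3),
        # Only mode 1 relies on the IOMMU.
        "needs_iommu": mode == 1,
        # Modes 2 and 3 use physical addresses and are unsafe.
        "unsafe_physical_addressing": mode in (2, 3),
    }
```

For example, packet_buffer_mode_requirements(3) reports that SR-IOV is needed, that
the IOMMU is not used, and that the mode is unsafe.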
EF_PER_SOCKET_CACHE_MAX
Name:per_sock_cache_max
Default:0
Scope:perstack
When socket caching is enabled (i.e. when EF_SOCKET_CACHE_MAX > 0), this sets
a further limit on the size of the cache for each socket. If set to zero, no limit is set
beyond the global limit specified by EF_SOCKET_CACHE_MAX.
EF_PIO
Name:pio
Default:1
Minimum:0
Maximum:2
Scope:perstack
Control of whether Programmed I/O is used instead of DMA for small packets:
• 0 - no (use DMA)
• 1 - use PIO for small packets if available (default)
Mode 1 will fall back to DMA if PIO is not currently available.
• 2 - use PIO for small packets and fail if PIO is not available.
Mode 2 will fail to create the stack if the hardware supports PIO but PIO is not
currently available.
On hardware that does not support PIO there is no difference between mode 1 and
mode 2. In all cases, PIO will only be used for small packets (see EF_PIO_THRESHOLD)
and if the VI's transmit queue is currently empty. If these conditions are not met
DMA will be used, even in mode 2.
NOTE: PIO is currently only available on x86_64 systems.
NOTE: Mode 2 will not prevent a stack from operating without PIO in the event that
PIO allocation is originally successful but then fails after an adapter is rebooted or
hotplugged while that stack exists.
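The conditions above can be read as two separate decisions: whether stack creation
fails, and whether a given packet is sent by PIO. The Python sketch below models
both. The function and parameter names are hypothetical, the default threshold is the
documented EF_PIO_THRESHOLD default, and treating the threshold as inclusive is an
assumption.

```python
def stack_creation_fails(mode: int, hw_supports_pio: bool, pio_available: bool) -> bool:
    """Only mode 2 fails, and only when the hardware supports PIO
    but PIO is not currently available."""
    return mode == 2 and hw_supports_pio and not pio_available


def packet_uses_pio(mode: int, pio_available: bool, packet_len: int,
                    txq_empty: bool, threshold: int = 1514) -> bool:
    """Return True if a packet would be sent with PIO rather than DMA.

    Assumption: a packet whose length equals the threshold still uses PIO.
    """
    if mode == 0:
        return False  # PIO disabled, always DMA
    # In modes 1 and 2 PIO needs a small packet and an empty transmit queue;
    # otherwise DMA is used, even in mode 2.
    return pio_available and packet_len <= threshold and txq_empty
```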
EF_PIO_THRESHOLD
Name:pio_thresh
Default:1514
Minimum:0
Scope:perstack
Sets a threshold for the size of packet that will use PIO, if turned on using EF_PIO.
Packets up to the threshold will use PIO. Larger packets will not.
EF_PIPE
Name:ul_pipe
Default:2
Minimum:CI_UNIX_PIPE_DONT_ACCELERATE
Maximum:CI_UNIX_PIPE_ACCELERATE_IF_NETIF
Scope:perprocess
• 0 - disable pipe acceleration
• 1 - enable pipe acceleration
• 2 - accelerate pipes only if an Onload stack already exists in the process.
EF_PIPE_RECV_SPIN
Name:pipe_recv_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in pipe receive calls until data arrives or the spin timeout expires (whichever is
the sooner). If the spin timeout expires, enter the kernel and block. The spin timeout
is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_PIPE_SEND_SPIN
Name:pipe_send_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in pipe send calls until space becomes available in the socket buffer or the spin
timeout expires (whichever is the sooner). If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_PIPE_SIZE
Name:pipe_size
Default:229376
Minimum:OO_PIPE_MIN_SIZE
Maximum:CI_CFG_MAX_PIPE_SIZE
Scope:perprocess
Default size of the pipe in bytes. The actual pipe size will be rounded up to the size of
a packet buffer, and is subject to modification by fcntl F_SETPIPE_SZ where supported.
EF_PKT_WAIT_SPIN
Name:pkt_wait_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin while waiting for DMA buffers. If the spin timeout expires, enter the kernel and
block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_POLL_FAST
Name:ul_poll_fast
Default:1
Minimum:0
Maximum:1
Scope:perprocess
Allow a poll() call to return without inspecting the state of all polled file
descriptors when at least one event is satisfied. This allows the accelerated poll()
call to avoid a system call when accelerated sockets are 'ready', and can increase
performance substantially.
This option changes the semantics of poll(), and as such could cause applications
to misbehave. It effectively gives priority to accelerated sockets over
non-accelerated sockets and other file descriptors. In practice the vast majority of
applications work fine with this option.
EF_POLL_FAST_USEC
Name:ul_poll_fast_usec
Default:32
Scope:perprocess
When spinning in a poll() call, causes accelerated sockets to be polled for N usecs
before unaccelerated sockets are polled. This reduces latency for accelerated
sockets, possibly at the expense of latency on unaccelerated sockets. Since
accelerated sockets are typically the parts of the application which are most
performance sensitive, this is typically a good tradeoff.
EF_POLL_NONBLOCK_FAST_USEC
Name:ul_poll_nonblock_fast_usec
Default:200
Scope:perprocess
When invoking poll() with timeout==0 (non-blocking), this option causes
non-accelerated sockets to be polled only every N usecs.
This reduces latency for accelerated sockets, possibly at the expense of latency on
unaccelerated sockets. Since accelerated sockets are typically the parts of the
application which are most performance sensitive, this is often a good tradeoff.
Set this option to zero to disable, or to a higher value to further improve latency for
accelerated sockets.
This option changes the behavior of poll() calls, so could potentially cause an
application to misbehave.
EF_POLL_ON_DEMAND
Name:poll_on_demand
Default:1
Minimum:0
Maximum:1
Scope:perstack
Poll for network events in the context of the application calls into the network stack.
This option is enabled by default.
This option can improve performance in multithreaded applications where the
Onload stack is interrupt driven (EF_INT_DRIVEN=1), because it can reduce lock
contention. Setting EF_POLL_ON_DEMAND=0 ensures that network events are
(mostly) processed in response to interrupts.
EF_POLL_SPIN
Name:ul_poll_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in poll() calls until an event is satisfied or the spin timeout expires (whichever
is the sooner). If the spin timeout expires, enter the kernel and block. The spin
timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_POLL_USEC
Name:ef_poll_usec_meta_option
Default:0
Scope:perprocess
This option enables spinning and sets the spin timeout in microseconds.
Setting this option is equivalent to: setting EF_SPIN_USEC and EF_BUZZ_USEC,
enabling spinning for UDP sends and receives, TCP sends and receives, select(), poll()
and epoll_wait(), and enabling lock buzzing.
Spinning typically reduces latency and jitter substantially, and can also improve
throughput. However, in some applications spinning can harm performance,
particularly applications that have many threads. When spinning is enabled you
should normally dedicate a CPU core to each thread that spins.
You can use the EF_*_SPIN options to selectively enable or disable spinning for each
API and transport. You can also use the onload_thread_set_spin() extension API
to control spinning on a per-thread and per-API basis.
EF_PREFAULT_PACKETS
Name:prefault_packets
Default:1
Minimum:0
Maximum:1000000000
Scope:perstack
When set, this option causes the process to 'touch' the specified number of packet
buffers when the Onload stack is created. This causes memory for the packet buffers
to be preallocated, and also causes them to be memory-mapped into the process
address space. This can prevent latency jitter caused by allocation and memory
mapping overheads.
The number of packets requested is in addition to the packet buffers that are
allocated to fill the RX rings. There is no guarantee that it will be possible to allocate
the number of packet buffers requested.
The default setting causes all packet buffers to be mapped into the user-level
address space, but does not cause any extra buffers to be reserved. Set to 0 to
prevent prefaulting.
EF_PROBE
Name:probe
Default:1
Minimum:0
Maximum:1
Scope:perprocess
When set, file descriptors accessed following exec() will be 'probed' and
OpenOnload sockets will be mapped to user-land so that they can be accelerated.
Otherwise OpenOnload sockets are not accelerated following exec().
EF_RETRANSMIT_THRESHOLD
Name:retransmit_threshold
Default:15
Minimum:0
Maximum:SMAX
Scope:perstack
Number of retransmit timeouts before a TCP connection is aborted.
EF_RETRANSMIT_THRESHOLD_ORPHAN
Name:retransmit_threshold_orphan
Default:8
Minimum:0
Maximum:SMAX
Scope:perstack
Number of retransmit timeouts before a TCP connection is aborted, in the case of
an orphaned connection.
EF_RETRANSMIT_THRESHOLD_SYN
Name:retransmit_threshold_syn
Default:4
Minimum:0
Maximum:SMAX
Scope:perstack
Number of times a SYN will be retransmitted before a connect() attempt will be
aborted.
EF_RETRANSMIT_THRESHOLD_SYNACK
Name:retransmit_threshold_synack
Default:5
Minimum:0
Maximum:CI_CFG_TCP_SYNACK_RETRANS_MAX
Scope:perstack
Number of times a SYN-ACK will be retransmitted before an embryonic connection
will be aborted.
EF_RFC_RTO_INITIAL
Name:rto_initial
Default:1000
Scope:perstack
Initial retransmit timeout in milliseconds, i.e. the number of milliseconds to wait for
an ACK before retransmitting packets.
EF_RFC_RTO_MAX
Name:rto_max
Default:120000
Scope:perstack
Maximum retransmit timeout in milliseconds.
EF_RFC_RTO_MIN
Name:rto_min
Default:200
Scope:perstack
Minimum retransmit timeout in milliseconds.
EF_RXQ_LIMIT
Name:rxq_limit
Default:65535
Minimum:CI_CFG_RX_DESC_BATCH
Maximum:65535
Scope:perstack
Maximum fill level for the receive descriptor ring. This has no effect when it has a
value larger than the ring size (EF_RXQ_SIZE).
EF_RXQ_MIN
Name:rxq_min
Default:256
Minimum:2*CI_CFG_RX_DESC_BATCH+1
Scope:perstack
Minimum initial fill level for each RX ring. If Onload is not able to allocate sufficient
packet buffers to fill each RX ring to this level, then creation of the stack will fail.
EF_RXQ_SIZE
Name:rxq_size
Default:512
Minimum:512
Maximum:4096
Scope:perstack
Set the size of the receive descriptor ring. Valid values: 512, 1024, 2048 or 4096.
A larger ring size can absorb larger packet bursts without drops, but may reduce
efficiency because the working set size is increased.
EF_RX_TIMESTAMPING
Name:rx_timestamping
Default:0
Minimum:0
Maximum:3
Scope:perstack
Control of hardware timestamping of received packets, possible values:
• 0 - do not do timestamping (default)
• 1 - request timestamping but continue if hardware is not capable or it does not
succeed
• 2 - request timestamping and fail if hardware is capable and it does not succeed
• 3 - request timestamping and fail if hardware is not capable or it does not
succeed.
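The four values differ only in how a failed or unsupported timestamping request is
handled. The Python sketch below encodes that table. The function name, argument
names and the returned strings are hypothetical labels chosen for illustration.

```python
def rx_timestamping_outcome(mode: int, hw_capable: bool, request_succeeds: bool) -> str:
    """Map an EF_RX_TIMESTAMPING value and hardware state to an outcome.

    Returns one of 'off', 'on', 'continue' (carry on without timestamps)
    or 'fail'. Illustrative only; not an Onload API.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("EF_RX_TIMESTAMPING must be between 0 and 3")
    if mode == 0:
        return "off"
    if hw_capable and request_succeeds:
        return "on"
    if mode == 1:
        return "continue"  # tolerate both incapable hardware and failure
    if mode == 2:
        # Fail only when capable hardware did not succeed.
        return "fail" if hw_capable else "continue"
    return "fail"  # mode 3: incapable hardware or failure both fail
```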
EF_SA_ONSTACK_INTERCEPT
Name:sa_onstack_intercept
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Intercept signals when a signal handler is installed with the SA_ONSTACK flag.
• 0 - Don't intercept (default). If you call socket-related functions such as send, or
file-related functions such as close or dup, from your signal handler, then your
application may deadlock.
• 1 - Intercept. There is no guarantee that the SA_ONSTACK flag will really work,
but the OpenOnload library will do its best.
EF_SCALABLE_FILTERS
Name:scalable_filter_ifindex
Default:0
Minimum:0
Maximum:SMAX
Scope:perstack
Specifies the interface on which to enable support for scalable filters, and configures
the scalable filter mode(s) to use. Scalable filters allow Onload to use a single
hardware MAC address filter to avoid hardware limitations and overheads. This
removes restrictions on the number of simultaneous connections and increases
performance of active connect calls, but kernel support on the selected interface is
limited to ARP/DHCP/ICMP protocols and some Onload features that rely on
unaccelerated traffic (such as receiving fragmented UDP datagrams) will not work.
Please see the Onload user guide for full details.
Depending on the mode selected this option will enable support for:
• scalable listening sockets
• IP_TRANSPARENT socket option.
The interface specified must be an SFN7000 or later adapter.
Format of the EF_SCALABLE_FILTERS variable is as follows:
EF_SCALABLE_FILTERS=<interface-name>[=mode[:mode]]
where mode is one of:
transparent_active
passive
rss
The following modes and their combinations can be specified:
transparent_active
passive
rss:transparent_active
transparent_active:passive
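The format above can be checked before a process is launched. The Python sketch
below splits a value into interface and modes and accepts only the combinations
listed. It is a hypothetical helper, not an Onload tool. It assumes that modes must
appear in the order shown in the list, and the interface name eth2 used in the usage
note is a placeholder.

```python
# Mode combinations listed in the text, in the order they are written there.
ALLOWED_MODE_COMBINATIONS = {
    ("transparent_active",),
    ("passive",),
    ("rss", "transparent_active"),
    ("transparent_active", "passive"),
}


def parse_scalable_filters(value: str):
    """Parse '<interface-name>[=mode[:mode]]' into (interface, modes)."""
    interface, separator, mode_part = value.partition("=")
    if not interface:
        raise ValueError("an interface name is required")
    if not separator:
        return interface, ()  # no mode given
    modes = tuple(mode_part.split(":"))
    if modes not in ALLOWED_MODE_COMBINATIONS:
        raise ValueError("unsupported mode combination: " + mode_part)
    return interface, modes
```

For example, parse_scalable_filters("eth2=rss:transparent_active") yields the
interface name together with the two modes, while "eth2=rss:passive" is rejected
because that combination is not in the list above.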
EF_SCALABLE_FILTERS_ENABLE
Name:scalable_filter_enable
Default:0
Minimum:0
Maximum:1
Scope:perstack
Turn the scalable filter feature on or off on a stack. If this is set to 1 then the
configuration selected in EF_SCALABLE_FILTERS will be used. If this is set to 0 then
scalable filters will not be used for this stack. If unset this will default to 1 if
EF_SCALABLE_FILTERS is configured.
EF_SCALABLE_FILTERS_MODE
Name:scalable_filter_mode
Default:4294967295
Minimum:1
Maximum:6
Scope:perstack
Stores the scalable filter mode set with EF_SCALABLE_FILTERS. To be set indirectly
with the EF_SCALABLE_FILTERS variable.
EF_SELECT_FAST
Name:ul_select_fast
Default:1
Minimum:0
Maximum:1
Scope:perprocess
Allow a select() call to return without inspecting the state of all selected file
descriptors when at least one selected event is satisfied. This allows the accelerated
select() call to avoid a system call when accelerated sockets are 'ready', and can
increase performance substantially.
This option changes the semantics of select(), and as such could cause
applications to misbehave. It effectively gives priority to accelerated sockets over
non-accelerated sockets and other file descriptors. In practice the vast majority of
applications work fine with this option.
EF_SELECT_FAST_USEC
Name:ul_select_fast_usec
Default:32
Scope:perprocess
When spinning in a select() call, causes accelerated sockets to be polled for N
usecs before unaccelerated sockets are polled. This reduces latency for accelerated
sockets, possibly at the expense of latency on unaccelerated sockets. Since
accelerated sockets are typically the parts of the application which are most
performance sensitive, this is typically a good tradeoff.
EF_SELECT_NONBLOCK_FAST_USEC
Name:ul_select_nonblock_fast_usec
Default:200
Scope:perprocess
When invoking select() with timeout==0 (non-blocking), this option causes
non-accelerated sockets to be polled only every N usecs.
This reduces latency for accelerated sockets, possibly at the expense of latency on
unaccelerated sockets. Since accelerated sockets are typically the parts of the
application which are most performance sensitive, this is often a good tradeoff.
Set this option to zero to disable, or to a higher value to further improve latency for
accelerated sockets.
This option changes the behavior of select() calls, so could potentially cause an
application to misbehave.
EF_SELECT_SPIN
Name:ul_select_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in blocking select() calls until the select set is satisfied or the spin timeout
expires (whichever is the sooner). If the spin timeout expires, enter the kernel and
block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_SEND_POLL_MAX_EVS
Name:send_poll_max_events
Default:96
Minimum:1
Maximum:65535
Scope:perstack
When polling for network events after sending, this places a limit on the number of
events handled.
EF_SEND_POLL_THRESH
Name:send_poll_thresh
Default:64
Minimum:0
Maximum:65535
Scope:perstack
Poll for network events after sending this many packets.
Setting this to a larger value may improve transmit throughput for small messages
by allowing batching. However, such batching may cause sends to be delayed,
leading to increased jitter.
EF_SHARE_WITH
Name:share_with
Default:0
Minimum:1
Maximum:SMAX
Scope:perstack
Set this option to allow a stack to be accessed by processes owned by another user.
Set it to the UID of a user that should be permitted to share this stack, or set it to
-1 to allow any user to share the stack. By default stacks are not accessible by users
other than root.
Processes invoked by root can access any stack. Setuid processes can only access
stacks created by the effective user, not the real user. This restriction can be relaxed
by setting the onload kernel module option allow_insecure_setuid_sharing=1.
WARNING: A user that is permitted to access a stack is able to: snoop on any data
transmitted or received via the stack; inject or modify data transmitted or received
via the stack; damage the stack and any sockets or connections in it; cause
misbehavior and crashes in any application using the stack.
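The sharing rule can be sketched as a simple predicate. The Python function below
is a hypothetical model for illustration. It assumes that the user who created the
stack can always access it, and it does not model the setuid restriction or the
allow_insecure_setuid_sharing module option.

```python
def may_access_stack(caller_uid: int, owner_uid: int, share_with: int = 0) -> bool:
    """Model of the EF_SHARE_WITH rule described above (illustrative only)."""
    if caller_uid == 0:
        return True  # processes invoked by root can access any stack
    if caller_uid == owner_uid:
        return True  # assumption: the creating user keeps access
    if share_with == -1:
        return True  # any user may share the stack
    # Otherwise only the single UID named in EF_SHARE_WITH is admitted.
    return share_with == caller_uid
```

With the default of 0 no additional user is admitted, because UID 0 is root and root
already has access.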
EF_SIGNALS_NOPOSTPONE
Name:signals_no_postpone
Default:67110080
Minimum:0
Maximum:(ci_uint64)(1)
Scope:perprocess
Comma-separated list of signal numbers for which the signal handlers should not be
postponed. Your application will deadlock if one of these handlers uses a socket
function. By default, the list includes SIGBUS, SIGSEGV and SIGPROF.
Please specify numbers, not string aliases: EF_SIGNALS_NOPOSTPONE=7,11,27
instead of EF_SIGNALS_NOPOSTPONE=SIGBUS,SIGSEGV,SIGPROF.
You can set EF_SIGNALS_NOPOSTPONE to an empty value to postpone all signal
handlers in the same way, if you suspect that the handlers for these signals call
network functions.
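Because the option takes numbers rather than names, it can be convenient to derive
the value from the platform's own signal table. The Python sketch below does this with
the standard signal module. The function name is hypothetical, and signal numbers
other than SIGSEGV vary between platforms, so the result should be generated on the
machine that will run the application.

```python
import signal


def signals_nopostpone_value(names):
    """Build an EF_SIGNALS_NOPOSTPONE value from signal names.

    Each name is looked up in the standard signal module and converted
    to its number on the current platform.
    """
    return ",".join(str(int(getattr(signal, name))) for name in names)
```

On a typical x86 Linux system, passing ["SIGBUS", "SIGSEGV", "SIGPROF"] should give
the same numbers as the example above.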
EF_SOCKET_CACHE_MAX
Name:sock_cache_max
Default:0
Scope:perstack
Sets the maximum number of TCP sockets to cache for this stack. When set > 0,
OpenOnload will cache resources associated with sockets in order to improve
connection set-up and tear-down performance. This improves performance for
applications that make new TCP connections at a high rate.
EF_SOCKET_CACHE_PORTS
Name:sock_cache_ports
Default:0
Scope:perprocess
This option specifies a comma-separated list of port numbers. When set (and socket
caching is enabled), only sockets bound to the specified ports will be eligible to be
cached.
EF_SOCK_LOCK_BUZZ
Name:sock_lock_buzz
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin while waiting to obtain a per-socket lock. If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_BUZZ_USEC.
The per-socket lock is taken in recv() calls and similar. This option can reduce jitter
when multiple threads invoke recv() on the same socket, but can reduce fairness
between threads competing for the lock.
EF_SO_BUSY_POLL_SPIN
Name:so_busy_poll_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in poll, select and epoll in a Linux-like way: enable spinning only if a spinning
socket is present in the poll/select/epoll set. See the Linux documentation on the
SO_BUSY_POLL socket option for details.
You should also enable spinning via the EF_POLL_SPIN, EF_SELECT_SPIN or
EF_EPOLL_SPIN variable if you would like to spin in poll, select or epoll respectively.
The spin duration is set via EF_SPIN_USEC, which is equivalent to the Linux
net.core.busy_poll sysctl value. EF_POLL_USEC is an all-in-one variable that sets all
4 variables mentioned here.
Linux never spins in epoll, but Onload does. This variable does not affect epoll
behavior if EF_UL_EPOLL=2.
EF_SPIN_USEC
Name:ul_spin_usec
Default:0
Scope:perprocess
Sets the timeout in microseconds for spinning options. Set this to -1 to spin forever.
The spin timeout may also be set by the EF_POLL_USEC option.
Spinning typically reduces latency and jitter substantially, and can also improve
throughput. However, in some applications spinning can harm performance,
particularly applications that have many threads. When spinning is enabled you
should normally dedicate a CPU core to each thread that spins.
You can use the EF_*_SPIN options to selectively enable or disable spinning for each
API and transport. You can also use the onload_thread_set_spin() extension API
to control spinning on a per-thread and per-API basis.
EF_STACK_LOCK_BUZZ
Name:stack_lock_buzz
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin while waiting to obtain a per-stack lock. If the spin timeout expires, enter the
kernel and block. The spin timeout is set by EF_BUZZ_USEC.
This option reduces jitter caused by lock contention, but can reduce fairness
between threads competing for the lock.
EF_STACK_PER_THREAD
Name:stack_per_thread
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Create a separate Onload stack for the sockets created by each thread.
EF_SYNC_CPLANE_AT_CREATE
Name:sync_cplane
Default:2
Minimum:0
Maximum:2
Scope:perstack
When this option is set to 2 Onload will force a sync of control plane information
from the kernel when a stack is created. This can help to ensure up-to-date
information is used where a stack is created immediately following interface
configuration.
If this option is set to 1 then Onload will only force a sync for the first stack created.
This can be used if stack creation time for later stacks is time critical.
Setting this option to 0 will disable forced sync. Synchronizing data from the kernel
will continue to happen periodically.
EF_TCP
Name:ul_tcp
Default:1
Minimum:0
Maximum:1
Scope:perprocess
Clear to disable acceleration of new TCP sockets.
EF_TCP_ACCEPT_SPIN
Name:tcp_accept_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in blocking TCP accept() calls until an incoming connection is established, the
spin timeout expires or the socket timeout expires (whichever is the sooner). If the
spin timeout expires, enter the kernel and block. The spin timeout is set by
EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_ADV_WIN_SCALE_MAX
Name:tcp_adv_win_scale_max
Default:14
Minimum:0
Maximum:14
Scope:perstack
Maximum value for TCP window scaling that will be advertised.
EF_TCP_BACKLOG_MAX
Name:tcp_backlog_max
Default:256
Scope:perstack
Places an upper limit on the number of embryonic (half-open) connections for one
listening socket. See also EF_TCP_SYNRECV_MAX.
This value is overridden by /proc/sys/net/ipv4/tcp_max_syn_backlog.
EF_TCP_CLIENT_LOOPBACK
Name:tcp_client_loopback
Default:0
Minimum:0
Maximum:CITP_TCP_LOOPBACK_TO_NEWSTACK
Scope:perstack
Enable acceleration of TCP loopback connections on the connecting (client) side:
• 0 - not accelerated (default)
• 1 - accelerate if the listening socket is in the same stack (you should also set
EF_TCP_SERVER_LOOPBACK!=0)
• 2 - accelerate and move the accepted socket to the stack of the connecting socket
(server should allow this via EF_TCP_SERVER_LOOPBACK=2)
• 3 - accelerate and move the connecting socket to the stack of the listening
socket (server should allow this via EF_TCP_SERVER_LOOPBACK!=0)
• 4 - accelerate and move both connecting and accepted sockets to a new stack
(server should allow this via EF_TCP_SERVER_LOOPBACK=2).
NOTE: Options 3 and 4 break some applications using epoll(), fork() and dup()
calls.
NOTE: Options 2 and 4 make accept() misbehave if the client exits too early.
NOTE: Option 4 is not recommended on 32-bit systems because it can create a lot
of additional Onload stacks, consuming a lot of low memory.
EF_TCP_CONNECT_HANDOVER
Name:tcp_connect_handover
Default:0
Minimum:0
Maximum:1
Scope:perstack
When an accelerated TCP socket calls connect(), hand it over to the kernel stack.
This option disables acceleration of active-open TCP connections.
EF_TCP_CONNECT_SPIN
Name:tcp_connect_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in blocking TCP connect() calls until the connection is established, the spin
timeout expires or the socket timeout expires (whichever is the sooner). If the spin
timeout expires, enter the kernel and block. The spin timeout is set by
EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_FASTSTART_IDLE
Name:tcp_faststart_idle
Default:65536
Minimum:0
Scope:perstack
The FASTSTART feature prevents Onload from delaying ACKs during times when
doing so may reduce performance. FASTSTART is enabled when a connection is new,
following loss and after the connection has been idle for a while.
This option sets the number of bytes that must be ACKed by the receiver before the
connection exits FASTSTART. Set to zero to prevent a connection entering FASTSTART
after an idle period.
EF_TCP_FASTSTART_INIT
Name:tcp_faststart_init
Default:65536
Minimum:0
Scope:perstack
The FASTSTART feature prevents Onload from delaying ACKs during times when
doing so may reduce performance. FASTSTART is enabled when a connection is new,
following loss and after the connection has been idle for a while.
This option sets the number of bytes that must be ACKed by the receiver before the
connection exits FASTSTART. Set to zero to disable FASTSTART on new connections.
EF_TCP_FASTSTART_LOSS
Name:tcp_faststart_loss
Default:65536
Minimum:0
Scope:perstack
The FASTSTART feature prevents Onload from delaying ACKs during times when
doing so may reduce performance. FASTSTART is enabled when a connection is new,
following loss and after the connection has been idle for a while.
This option sets the number of bytes that must be ACKed by the receiver before the
connection exits FASTSTART following loss. Set to zero to disable FASTSTART after
loss.
EF_TCP_FIN_TIMEOUT
Name:fin_timeout
Default:60
Scope:perstack
Time in seconds to wait for an orphaned connection to be closed properly by the
network partner (e.g. FIN in the TCP FIN_WAIT2 state, zero window opening to send
our FIN, etc).
EF_TCP_FORCE_REUSEPORT
Name:tcp_reuseports
Default:0
Scope:perprocess
This option specifies a comma-separated list of port numbers. TCP sockets that bind
to those port numbers will have SO_REUSEPORT automatically applied to them.
EF_TCP_INITIAL_CWND
Name:initial_cwnd
Default:0
Minimum:0
Maximum:SMAX
Scope:perstack
Sets the initial size of the congestion window (in bytes) for TCP connections. Some
care is needed as, for example, setting it smaller than the segment size may result in
Onload being unable to send traffic.
WARNING: Modifying this option may violate the TCP protocol.
EF_TCP_LISTEN_HANDOVER
Name:tcp_listen_handover
Default:0
Minimum:0
Maximum:1
Scope:perstack
When an accelerated TCP socket calls listen(), hand it over to the kernel stack.
This option disables acceleration of TCP listening sockets and passively opened TCP
connections.
EF_TCP_LISTEN_REPLIES_BACK
Name:tcp_listen_replies_back
Default:0
Minimum:0
Maximum:1
Scope:perstack
When a TCP listening socket replies to an incoming SYN, this option forces Onload to
ignore the route table and to reply via the same network interface the SYN was
received from. This mode could be considered a poor man's source-routing
replacement.
EF_TCP_LOSS_MIN_CWND
Name:loss_min_cwnd
Default:0
Minimum:0
Maximum:SMAX
Scope:perstack
Sets the minimum size of the congestion window for TCP connections following loss.
WARNING: Modifying this option may violate the TCP protocol.
EF_TCP_RCVBUF
Name:tcp_rcvbuf_user
Default:0
Scope:perstack
Override SO_RCVBUF for TCP sockets. (Note: the actual size of the buffer is double
the amount requested, mimicking the behavior of the Linux kernel.)
EF_TCP_RCVBUF_ESTABLISHED_DEFAULT
Name:tcp_rcvbuf_est_def
Default:131072
Scope:perstack
Overrides the OS default SO_RCVBUF value for TCP sockets in the ESTABLISHED state
if the OS default SO_RCVBUF value falls outside bounds set with this option. This
value is used when the TCP connection transitions to the ESTABLISHED state, to avoid
confusion of some applications like netperf.
The lower bound is set to this value and the upper bound is set to 4 * this value. If
the OS default SO_RCVBUF value is less than the lower bound, then the lower bound
is used. If the OS default SO_RCVBUF value is more than the upper bound, then the
upper bound is used.
This variable overrides the OS default SO_RCVBUF value only; it does not change
SO_RCVBUF if the application explicitly sets it (see the EF_TCP_RCVBUF variable, which
overrides the application-supplied value).
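The bounds rule above is a clamp of the OS default into the range from the option
value to four times the option value. The Python sketch below shows the arithmetic.
The function name is hypothetical, and the OS default values used in the usage note
are examples rather than values taken from any particular system.

```python
def established_default_bufsize(os_default: int, option_value: int = 131072) -> int:
    """Clamp the OS default buffer size into [option_value, 4 * option_value].

    option_value defaults to the documented default of
    EF_TCP_RCVBUF_ESTABLISHED_DEFAULT.
    """
    lower = option_value
    upper = 4 * option_value
    return max(lower, min(os_default, upper))
```

With the default setting, an OS default of 87380 bytes is raised to 131072, a value of
200000 is left unchanged, and a value of 1000000 is reduced to 524288.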
EF_TCP_RCVBUF_MODE
Name:tcp_rcvbuf_mode
Default:0
Minimum:0
Maximum:1
Scope:perstack
This option controls how the RCVBUF is set for TCP.
Mode 0 (default) gives a fixed-size RCVBUF.
Mode 1 will enable automatic tuning of RCVBUF using Dynamic Right Sizing. If
SO_RCVBUF is explicitly set by the application this value will be used.
EF_TCP_SOCKBUF_MAX_FRACTION can be used to control the maximum size of the
buffer for an individual socket.
The effect of EF_TCP_RCVBUF_STRICT is independent of this setting.
EF_TCP_RCVBUF_STRICT
Name:tcp_rcvbuf_strict
Default:0
Minimum:0
Maximum:1
Scope:perstack
This option prevents TCP small segment attacks. With this option set, Onload limits
the number of packets inside the TCP receive queue and the TCP reorder buffer. In
some cases, this option causes a performance penalty. You probably want this option
if your application is connecting to an untrusted partner or over an untrusted network.
Off by default.
EF_TCP_RECV_SPIN
Name:tcp_recv_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in blocking TCP receive calls until data arrives, the spin timeout expires or the
socket timeout expires (whichever is the sooner). If the spin timeout expires, enter
the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_TCP_RST_DELAYED_CONN
Name:rst_delayed_conn
Default:0
Minimum:0
Maximum:1
Scope:perstack
This option tells Onload to reset TCP connections rather than allow data to be
transmitted late. Specifically, TCP connections are reset if the retransmit timeout
fires. (This usually happens when data is lost, and normally triggers a retransmit
which results in data being delivered hundreds of milliseconds late.)
WARNING: This option is likely to cause connections to be reset spuriously if ACK
packets are dropped in the network.
EF_TCP_RX_CHECKS
Name:tcp_rx_checks
Default:0
Minimum:0
Maximum:1
Scope:perstack
Internal/debugging use only: perform extra debugging/consistency checks on
received packets.
EF_TCP_RX_LOG_FLAGS
Name:tcp_rx_log_flags
Default:0
Scope:perstack
Log received packets that have any of these flags set in the TCP header. Only active
when EF_TCP_RX_CHECKS is set.
EF_TCP_SEND_NONBLOCK_NO_PACKETS_MODE
Name:tcp_nonblock_no_pkts_mode
Default:0
Minimum:0
Maximum:1
Scope:perstack
This option controls how a non-blocking TCP send() call should behave if it is unable
to allocate sufficient packet buffers. By default Onload will mimic Linux kernel stack
behavior and block for packet buffers to be available. If set to 1, this option will
cause Onload to return error ENOBUFS. Note this option can cause some applications
(that assume that a socket that is writable is able to send without error) to
malfunction.
EF_TCP_SEND_SPIN
Name:tcp_send_spin
Default:0
Minimum:0
Maximum:1
Scope:perprocess
Spin in blocking TCP send calls until the window is updated by the peer, the spin
timeout expires or the socket timeout expires (whichever is the sooner). If the spin
timeout expires, enter the kernel and block. The spin timeout is set by EF_SPIN_USEC
or EF_POLL_USEC.
EF_TCP_SERVER_LOOPBACK
Name:tcp_server_loopback
Default:0
Minimum:0
Maximum:CITP_TCP_LOOPBACK_ALLOW_ALIEN_IN_ACCEPTQ
Scope:perstack
Enable acceleration of TCP loopback connections on the listening (server) side:
• 0 - not accelerated (default)
• 1 - accelerate if the connecting socket is in the same stack (you should also set
EF_TCP_CLIENT_LOOPBACK!=0)
• 2 - accelerate and allow the accepted socket to be in another stack (this is
necessary for clients with EF_TCP_CLIENT_LOOPBACK=2,4).
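The client and server loopback options have to agree. The Python sketch below checks
a pair of settings against the requirements stated for EF_TCP_CLIENT_LOOPBACK and
EF_TCP_SERVER_LOOPBACK. The function name is hypothetical, and the check covers only
the option values; it does not model the same-stack condition of client mode 1.

```python
def server_allows_client_mode(client_mode: int, server_mode: int) -> bool:
    """Return True if EF_TCP_SERVER_LOOPBACK permits what
    EF_TCP_CLIENT_LOOPBACK asks for (illustrative only)."""
    if client_mode not in (0, 1, 2, 3, 4):
        raise ValueError("EF_TCP_CLIENT_LOOPBACK must be between 0 and 4")
    if server_mode not in (0, 1, 2):
        raise ValueError("EF_TCP_SERVER_LOOPBACK must be between 0 and 2")
    if client_mode == 0:
        return True  # client does not request acceleration
    if client_mode in (1, 3):
        return server_mode != 0  # any accelerated server mode will do
    return server_mode == 2  # client modes 2 and 4 need server mode 2
```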
EF_TCP_SHARED_LOCAL_PORTS
Name:tcp_shared_local_ports
Default:0
Minimum:0
Scope:perstack
This feature improves the performance of TCP active-opens. It reduces the cost of
both blocking and non-blocking connect() calls, reduces the latency to establish new
connections, and enables scaling to large numbers of active-open connections. It
also reduces the cost of closing these connections.
These improvements are achieved by sharing a set of local port numbers amongst
active-open sockets, which saves the cost and scaling limits associated with
installing packet steering filters for each active-open socket. Shared local ports are
only used when the local port is not explicitly assigned by the application. Set this
option to >=1 to enable local port sharing.
The value set gives the initial number of local ports to allocate when the Onload
stack is created. More shared local ports are allocated on demand as needed, up to
the maximum given by EF_TCP_SHARED_LOCAL_PORTS_MAX.
NOTE: Typically only one local shared port is needed, as different local
ports are only needed when multiple connections are made to the same remote
IP:port.
EF_TCP_SHARED_LOCAL_PORTS_MAX
Name:tcp_shared_local_ports_max
Default:100
Minimum:0
Scope:perstack
This option sets the maximum size of the pool of local shared ports. See
EF_TCP_SHARED_LOCAL_PORTS for details.
EF_TCP_SNDBUF
Name:tcp_sndbuf_user
Default:0
Scope:perstack
Override SO_SNDBUF for TCP sockets. (Note: the actual size of the buffer is double the
amount requested, mimicking the behavior of the Linux kernel.)
EF_TCP_SNDBUF_ESTABLISHED_DEFAULT
Name:tcp_sndbuf_est_def
Default:131072
Scope:perstack
Overrides the OS default SO_SNDBUF value for TCP sockets in the ESTABLISHED state
if the OS default SO_SNDBUF value falls outside bounds set with this option. This
value is used when the TCP connection transitions to the ESTABLISHED state, to avoid
confusion of some applications like netperf.
The lower bound is set to this value and the upper bound is set to 4 * this value. If
the OS default SO_SNDBUF value is less than the lower bound, then the lower bound
is used. If the OS default SO_SNDBUF value is more than the upper bound, then the
upper bound is used.
This variable overrides the OS default SO_SNDBUF value only; it does not change
SO_SNDBUF if the application explicitly sets it (see the EF_TCP_SNDBUF variable, which
overrides the application-supplied value).
EF_TCP_SNDBUF_MODE
Name:tcp_sndbuf_mode
Default:1
Minimum:0
Maximum:2
Scope:perstack
This option controls how the SO_SNDBUF limit is applied to TCP sockets. In the
default mode the limit applies to the size of the send queue and retransmit queue
combined. When this option is set to 0 the limit applies to the send queue only.
When this option is set to 2, the SNDBUF size is automatically adjusted for each TCP
socket to match the window advertised by the peer (limited by
EF_TCP_SOCKBUF_MAX_FRACTION). If the application sets SO_SNDBUF explicitly then
automatic adjustment is not used for that socket. The limit is applied to the size of
the send queue and retransmit queue combined. You may also want to set
EF_TCP_RCVBUF_MODE to give automatic adjustment of RCVBUF.
EF_TCP_SOCKBUF_MAX_FRACTION
Name: tcp_sockbuf_max_fraction
Default: 1
Minimum: 1
Maximum: 10
Scope: per-stack
This option controls the maximum fraction of the TX buffers that may be allocated
to a single socket with EF_TCP_SNDBUF_MODE=2.
It also controls the maximum fraction of the RX buffers that may be allocated to a
single socket with EF_TCP_RCVBUF_MODE=1.
The maximum allocation for a socket is EF_MAX_TX_PACKETS/(2^N) for TX and
EF_MAX_RX_PACKETS/(2^N) for RX, where N is specified here.
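As a worked illustration of the formula above, the per-socket limit is the packet budget divided by 2^N. The helper name and the packet count used in the note below are assumptions chosen for the example, not values taken from this guide.

```c
/* Per-socket packet limit: max_packets / 2^n, where n is the
 * EF_TCP_SOCKBUF_MAX_FRACTION value (1..10). Hypothetical helper. */
unsigned long max_socket_packets(unsigned long max_packets, unsigned n)
{
    return max_packets >> n;  /* right shift by n divides by 2^n */
}
```

For example, with a budget of 32768 packets, N=1 allows a single socket up to 16384 packets and N=10 allows up to 32.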
EF_TCP_SYNCOOKIES
Name: tcp_syncookies
Default: 0
Minimum: 0
Maximum: 1
Scope: per-stack
Use TCP syncookies to protect from SYN flood attack.
EF_TCP_SYNRECV_MAX
Name: tcp_synrecv_max
Default: 1024
Maximum: CI_CFG_NETIF_MAX_ENDPOINTS_MAX
Scope: per-stack
Places an upper limit on the number of embryonic (half-open) connections in an
Onload stack. See also EF_TCP_BACKLOG_MAX.
By default, EF_TCP_SYNRECV_MAX = 4 * EF_TCP_BACKLOG_MAX.
EF_TCP_SYN_OPTS
Name: syn_opts
Default: 7
Scope: per-stack
A bitmask specifying the TCP options to advertise in SYN segments:
• bit 0 (0x1) is set to 1 to enable PAWS and RTTM timestamps (RFC 1323)
• bit 1 (0x2) is set to 1 to enable window scaling (RFC 1323)
• bit 2 (0x4) is set to 1 to enable SACK (RFC 2018)
• bit 3 (0x8) is set to 1 to enable ECN (RFC 3168).
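The bitmask can be composed and tested as in the sketch below. The constant and function names are hypothetical labels for the bit values listed above; they are not identifiers from the Onload headers.

```c
/* Bit values as listed for EF_TCP_SYN_OPTS (names are illustrative). */
enum {
    SYN_OPT_TIMESTAMPS = 0x1,  /* PAWS and RTTM timestamps */
    SYN_OPT_WSCALE     = 0x2,  /* window scaling */
    SYN_OPT_SACK       = 0x4,  /* selective acknowledgement */
    SYN_OPT_ECN        = 0x8   /* explicit congestion notification */
};

/* The documented default (7) enables timestamps, window scaling and SACK. */
unsigned syn_opts_default(void)
{
    return SYN_OPT_TIMESTAMPS | SYN_OPT_WSCALE | SYN_OPT_SACK;
}

/* Returns non-zero if the given option bit is set in the mask. */
int syn_opt_enabled(unsigned mask, unsigned opt)
{
    return (mask & opt) != 0;
}
```

Setting EF_TCP_SYN_OPTS=15 (0xF) would additionally advertise ECN.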
EF_TCP_TCONST_MSL
Name: msl_seconds
Default: 25
Scope: per-stack
The Maximum Segment Lifetime (as defined by the TCP RFC). A smaller value causes
connections to spend less time in the TIME_WAIT state.
EF_TIMESTAMPING_REPORTING
Name: timestamping_reporting
Default: 0
Minimum: 0
Maximum: 1
Scope: per-stack
Controls timestamp reporting, possible values:
• 0: report translated timestamps only when the NIC clock has been set
• 1: report translated timestamps only when the system clock and the NIC clock
are in sync (e.g. using ptpd)
If the above conditions are not met Onload will only report raw (not translated)
timestamps.
EF_TXQ_LIMIT
Name: txq_limit
Default: 268435455
Minimum: 16*1024
Maximum: 0xfffffff
Scope: per-stack
Maximum number of bytes to enqueue on the transmit descriptor ring.
EF_TXQ_RESTART
Name: txq_restart
Default: 268435455
Minimum: 1
Maximum: 0xfffffff
Scope: per-stack
Level (in bytes) to which the transmit descriptor ring must fall before it will be filled
again.
EF_TXQ_SIZE
Name: txq_size
Default: 512
Minimum: 512
Maximum: 4096
Scope: per-stack
Set the size of the transmit descriptor ring. Valid values: 512, 1024, 2048 or 4096.
EF_TX_MIN_IPG_CNTL
Name: tx_min_ipg_cntl
Default: 0
Minimum: -1
Maximum: 20
Scope: per-stack
Rate pacing value.
EF_TX_PUSH
Name: tx_push
Default: 1
Minimum: 0
Maximum: 1
Scope: per-stack
Enable low-latency transmit.
EF_TX_PUSH_THRESHOLD
Name: tx_push_thresh
Default: 100
Minimum: 1
Scope: per-stack
Sets a threshold for the number of outstanding sends before we stop using TX
descriptor push. This has no effect if EF_TX_PUSH=0. This threshold is ignored, and
assumed to be 1, on pre-SFN7000 series hardware. It makes sense to set this value
similar to EF_SEND_POLL_THRESH.
EF_TX_QOS_CLASS
Name: tx_qos_class
Default: 0
Minimum: 0
Maximum: 1
Scope: per-stack
Set the QOS class for transmitted packets on this Onload stack. Two QOS classes are
supported: 0 and 1. By default both Onload accelerated traffic and kernel traffic are
in class 0. You can minimize latency by placing latency-sensitive traffic into a
separate QOS class from bulk traffic.
EF_TX_TIMESTAMPING
Name: tx_timestamping
Default: 0
Minimum: 0
Maximum: 3
Scope: per-stack
Control of hardware timestamping of transmitted packets, possible values:
• 0 - do not do timestamping (default)
• 1 - request timestamping but continue if hardware is not capable or it does not
succeed
• 2 - request timestamping and fail if hardware is capable and it does not succeed
• 3 - request timestamping and fail if hardware is not capable or it does not
succeed.
EF_UDP
Name: ul_udp
Default: 1
Minimum: 0
Maximum: 1
Scope: per-process
Clear to disable acceleration of new UDP sockets.
EF_UDP_CONNECT_HANDOVER
Name: udp_connect_handover
Default: 1
Minimum: 0
Maximum: 1
Scope: per-stack
When a UDP socket is connected to an IP address that cannot be accelerated by
OpenOnload, hand the socket over to the kernel stack.
When this option is disabled the socket remains under the control of OpenOnload.
This may be worthwhile because the socket may subsequently be reconnected to
an IP address that can be accelerated.
EF_UDP_FORCE_REUSEPORT
Name: udp_reuseports
Default: 0
Scope: per-process
This option specifies a comma-separated list of port numbers. UDP sockets that bind
to those port numbers will have SO_REUSEPORT automatically applied to them.
EF_UDP_PORT_HANDOVER2_MAX
Name: udp_port_handover2_max
Default: 1
Scope: per-stack
When set (together with EF_UDP_PORT_HANDOVER2_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack.
The range is inclusive.
EF_UDP_PORT_HANDOVER2_MIN
Name: udp_port_handover2_min
Default: 2
Scope: per-stack
When set (together with EF_UDP_PORT_HANDOVER2_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack.
The range is inclusive.
EF_UDP_PORT_HANDOVER3_MAX
Name: udp_port_handover3_max
Default: 1
Scope: per-stack
When set (together with EF_UDP_PORT_HANDOVER3_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack.
The range is inclusive.
EF_UDP_PORT_HANDOVER3_MIN
Name: udp_port_handover3_min
Default: 2
Scope: per-stack
When set (together with EF_UDP_PORT_HANDOVER3_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack.
The range is inclusive.
EF_UDP_PORT_HANDOVER_MAX
Name: udp_port_handover_max
Default: 1
Scope: per-stack
When set (together with EF_UDP_PORT_HANDOVER_MIN), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack.
The range is inclusive.
EF_UDP_PORT_HANDOVER_MIN
Name: udp_port_handover_min
Default: 2
Scope: per-stack
When set (together with EF_UDP_PORT_HANDOVER_MAX), this causes UDP sockets
explicitly bound to a port in the given range to be handed over to the kernel stack.
The range is inclusive.
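The inclusive range test described for the three MIN/MAX pairs can be sketched as follows. The function name is hypothetical, and reading the defaults (MIN=2, MAX=1) as an empty range that matches no port is an interpretation rather than something this guide states.

```c
/* Returns non-zero if a UDP socket bound to `port` falls inside the
 * inclusive handover range [min, max]. Hypothetical helper. */
int port_handed_over(unsigned port, unsigned min, unsigned max)
{
    return port >= min && port <= max;
}
```

With MIN=5000 and MAX=5010, ports 5000 and 5010 both match and 5011 does not. With the defaults, min is greater than max, so this test matches no port.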
EF_UDP_RCVBUF
Name: udp_rcvbuf_user
Default: 0
Scope: per-stack
Override SO_RCVBUF for UDP sockets. (Note: the actual size of the buffer is double
the amount requested, mimicking the behavior of the Linux kernel.)
EF_UDP_RECV_SPIN
Name: udp_recv_spin
Default: 0
Minimum: 0
Maximum: 1
Scope: per-process
Spin in blocking UDP receive calls until data arrives, the spin timeout expires or the
socket timeout expires (whichever is the sooner). If the spin timeout expires, enter
the kernel and block. The spin timeout is set by EF_SPIN_USEC or EF_POLL_USEC.
EF_UDP_SEND_NONBLOCK_NO_PACKETS_MODE
Name: udp_nonblock_no_pkts_mode
Default: 0
Minimum: 0
Maximum: 1
Scope: per-stack
This option controls how a non-blocking UDP send() call should behave if it is
unable to allocate sufficient packet buffers. By default Onload will mimic Linux
kernel stack behavior and block for packet buffers to be available. If set to 1, this
option will cause Onload to return error ENOBUFS. Note this option can cause some
applications (that assume that a socket that is writable is able to send without error)
to malfunction.
EF_UDP_SEND_SPIN
Name: udp_send_spin
Default: 0
Minimum: 0
Maximum: 1
Scope: per-process
Spin in blocking UDP send calls until space becomes available in the socket buffer,
the spin timeout expires or the socket timeout expires (whichever is the sooner). If
the spin timeout expires, enter the kernel and block. The spin timeout is set by
EF_SPIN_USEC or EF_POLL_USEC.
NOTE: UDP sends usually complete very quickly, but can block if the application
does a large burst of sends at a high rate. This option reduces jitter when such
blocking is needed.
EF_UDP_SEND_UNLOCKED
Name: udp_send_unlocked
Default: 1
Minimum: 0
Maximum: 1
Scope: per-stack
Enables the 'unlocked' UDP send path. When enabled this option improves
concurrency when multiple threads are performing UDP sends.
EF_UDP_SEND_UNLOCK_THRESH
Name: udp_send_unlock_thresh
Default: 1500
Scope: per-stack
UDP message size below which we attempt to take the stack lock early. Taking the
lock early reduces overhead and latency slightly, but may increase lock contention
in multithreaded applications.
EF_UDP_SNDBUF
Name: udp_sndbuf_user
Default: 0
Scope: per-stack
Override SO_SNDBUF for UDP sockets. (Note: the actual size of the buffer is double
the amount requested, mimicking the behavior of the Linux kernel.)
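The doubling behavior mentioned in the note can be observed on an ordinary (non-accelerated) Linux UDP socket with the standard setsockopt()/getsockopt() calls. The sketch below uses only POSIX socket calls; the helper name effective_sndbuf is hypothetical, and the value read back depends on the operating system and its configured limits.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Request an SO_SNDBUF size on a fresh UDP socket and return the size the
 * stack reports back, or -1 if any call fails. Hypothetical helper. */
int effective_sndbuf(int requested)
{
    int actual = -1;
    socklen_t len = sizeof(actual);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        actual = -1;
    close(fd);
    return actual;
}
```

On a typical Linux kernel, effective_sndbuf(8192) is expected to report roughly twice the requested size, subject to the system's buffer limits.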
EF_UL_EPOLL
Name: ul_epoll
Default: 1
Minimum: 0
Maximum: 3
Scope: per-process
Choose epoll implementation. The choices are:
• 0 - kernel (unaccelerated)
• 1 - user-level (accelerated, lowest latency)
• 2 - kernel-accelerated (best when there are lots of sockets in the set and mode
3 is not suitable)
• 3 - user-level (accelerated, lowest latency, scalable, supports socket caching).
The default is the user-level implementation (1).
Mode 3 can offer benefits over mode 1, particularly with larger sets. However, this
mode has some restrictions:
• It does not support epoll sets that exist across fork().
• It does not support monitoring the readiness of the set's epoll fd via another
epoll/poll/select.
EF_UL_POLL
Name: ul_poll
Default: 1
Minimum: 0
Maximum: 1
Scope: per-process
Clear to disable acceleration of poll() calls at user-level.
EF_UL_SELECT
Name: ul_select
Default: 1
Minimum: 0
Maximum: 1
Scope: per-process
Clear to disable acceleration of select() calls at user-level.
EF_UNCONFINE_SYN
Name: unconfine_syn
Default: 1
Minimum: 0
Maximum: 1
Scope: per-stack
Accept TCP connections that cross into or out of a private network.
EF_UNIX_LOG
Name: log_level
Default: 3
Scope: per-process
A bitmask determining which kinds of diagnostic messages will be logged:
• 0x1 errors
• 0x2 unexpected
• 0x4 setup
• 0x8 verbose
• 0x10 select()
• 0x20 poll()
• 0x100 socket setup
• 0x200 socket control
• 0x400 socket caching
• 0x1000 signal interception
• 0x2000 library enter/exit
• 0x4000 log call arguments
• 0x8000 context lookup
• 0x10000 pass-through
• 0x20000 very verbose
• 0x40000 verbose returned error
• 0x80000 very verbose errors: show 'ok' too
• 0x20000000 verbose transport control
• 0x40000000 very verbose transport control
• 0x80000000 verbose pass-through
EF_URG_RFC
Name: urg_rfc
Default: 0
Minimum: 0
Maximum: 1
Scope: per-stack
Choose between compliance with RFC 1122 (1) or BSD behavior (0) regarding the
location of the urgent point in TCP packet headers.
EF_USE_DSACK
Name: use_dsack
Default: 1
Minimum: 0
Maximum: 1
Scope: per-stack
Whether or not to use DSACK (duplicate SACK).
EF_USE_HUGE_PAGES
Name: huge_pages
Default: 1
Minimum: 0
Maximum: 2
Scope: per-stack
Control of whether huge pages are used for packet buffers:
• 0 - no
• 1 - use huge pages if available (default)
• 2 - always use huge pages and fail if huge pages are not available.
Mode 1 prints a syslog message if there are not enough huge pages in the system.
Mode 2 guarantees huge pages only for the initially allocated packets. It is
recommended to use this mode together with EF_MIN_FREE_PACKETS, to control
the number of such guaranteed huge pages. All non-initial packets are allocated in
huge pages when possible. A syslog message is printed if the system is out of huge
pages.
In both mode 1 and mode 2, non-initial packets may be allocated in non-huge pages
without any warning in syslog, even if the system has free huge pages.
EF_VALIDATE_ENV
Name: validate_env
Default: 1
Minimum: 0
Maximum: 1
Scope: per-stack
When set this option validates Onload-related environment variables (starting with
EF_).
EF_VFORK_MODE
Name: vfork_mode
Default: 1
Minimum: 0
Maximum: 2
Scope: per-process
This option dictates how vfork() intercept should work. After a vfork(), parent
and child still share address space but not file descriptors. We have to be careful
about making changes in the child that can be seen in the parent. We offer three
options here. Different apps may require different options depending on their use
of vfork(). If using EF_VFORK_MODE=2, it is not safe to create sockets or pipes in the
child before calling exec().
• 0 - Old behavior. Replace vfork() with fork()
• 1 - Replace vfork() with fork() and block parent till child exits/execs
• 2 - Replace vfork() with vfork().
EF_WODA_SINGLE_INTERFACE
Name: woda_single_if
Default: 0
Minimum: 0
Maximum: 1
Scope: per-process
This option alters the behaviour of onload_ordered_epoll_wait(). This function
would normally ensure correct ordering across multiple interfaces. However, this
impacts latency, as only events arriving before the first interface polled can be
returned and still guarantee ordering.
If the traffic being ordered is only arriving on a single interface then this additional
constraint is not necessary. When this option is enabled, traffic will only be ordered
relative to other traffic arriving on the same interface.
B Meta Options
B.1 Environment variables
There are several environment variables which act as meta-options and set several
of the options detailed in Appendix A. These are:
EF_POLL_USEC
Setting EF_POLL_USEC causes the following options to be set:
EF_SPIN_USEC=EF_POLL_USEC
EF_SELECT_SPIN=1
EF_EPOLL_SPIN=1
EF_POLL_SPIN=1
EF_PKT_WAIT_SPIN=1
EF_TCP_SEND_SPIN=1
EF_UDP_RECV_SPIN=1
EF_UDP_SEND_SPIN=1
EF_TCP_RECV_SPIN=1
EF_BUZZ_USEC=EF_POLL_USEC
EF_SOCK_LOCK_BUZZ=1
EF_STACK_LOCK_BUZZ=1
NOTE: If neither of the spinning options, EF_POLL_USEC and EF_SPIN_USEC, is set,
Onload will resort to default interrupt-driven behavior because the EF_INT_DRIVEN
environment variable is enabled by default.
NOTE: When EF_POLL_USEC or EF_SPIN_USEC is greater than zero,
EF_INT_DRIVEN will be zero.
EF_BUZZ_USEC
Setting EF_BUZZ_USEC sets the following options:
• EF_SOCK_LOCK_BUZZ=1
• EF_STACK_LOCK_BUZZ=1
NOTE: If EF_POLL_USEC is set to value N, then EF_BUZZ_USEC is also set to N only if
N <= 100. If N > 100 then EF_BUZZ_USEC will be set to 100. This is deliberate, as
spinning for too long on internal locks may adversely affect performance. However,
the user can explicitly set the EF_BUZZ_USEC value, e.g.
export EF_POLL_USEC=10000
export EF_BUZZ_USEC=1000
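The rule in the note, for the case where EF_BUZZ_USEC is not set explicitly, amounts to capping the derived value at 100. A minimal sketch of that arithmetic follows; the function name is hypothetical and this is not Onload source code.

```c
/* EF_BUZZ_USEC value derived from EF_POLL_USEC when the user has not set
 * EF_BUZZ_USEC explicitly: N if N <= 100, otherwise 100. Hypothetical helper. */
unsigned derived_buzz_usec(unsigned poll_usec)
{
    return poll_usec <= 100 ? poll_usec : 100;
}
```

So EF_POLL_USEC=50 implies EF_BUZZ_USEC=50, while EF_POLL_USEC=10000 implies EF_BUZZ_USEC=100 unless EF_BUZZ_USEC is exported separately as shown above.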
C Build Dependencies
C.1 General
Before Onload network and kernel drivers can be built and installed, the target
platform must support the following capabilities:
• Support a general C build environment - i.e. has gcc, make, libc and
libc-devel.
• From version 201502 the following are required: perl, autoconf, automake
and libtool.
• Can compile kernel modules - i.e. has the correct kernel-devel package for the
installed kernel version.
• If 32-bit applications are to be accelerated on 64-bit architectures the machine
must be able to build 32-bit applications.
NOTE: Onload builds have been tested against libtool versions 1.5.26 to 2.4.2. Users
experiencing build issues with other libtool versions should contact
support@solarflare.com.
Building Kernel Modules
The kernel must be built with CONFIG_NETFILTER enabled. Standard distributions
will already have this enabled, but it must also be enabled when building a custom
kernel. This option does not affect performance.
The following commands can be used to install kernel development headers.
• Debian-based Distributions - including Ubuntu (any kernel):
apt-get install linux-headers-$(uname -r)
• For RedHat/Fedora (not for 32-bit Kernel):
- If the system supports a 32-bit Kernel and the kernel is PAE, then:
yum -y install kernel-PAE-devel
- otherwise:
yum -y install kernel-devel
• For SuSE:
yast -i kernel-source
onload:
binutils, gettext, gawk, gcc, sed, make, bash, glibc-common, automake, libtool,
autoconf.
onload_tcpdump:
libpcap, libpcap-devel (1)
solar_clusterd:
python-devel (1)
(1) If additional packages are not installed the dependent component will not be built, but the
Onload build will succeed.
Building 32-bit applications on 64-bit architecture platforms
The following commands can be used to install 32-bit libc development headers.
• Debian-based Distributions - including Ubuntu:
apt-get install gcc-multilib libc6-dev-i386
• For RedHat/Fedora:
yum -y install glibc-devel.i586
• For SuSE:
yast -i glibc-devel-32bit
yast -i gcc-32bit
D Onload Extensions API
The Onload Extensions API allows the user to customize an application using
advanced features to improve performance.
The Extensions API does not create any runtime dependency on Onload and an
application using the API can run without Onload. The license for the API and
associated libraries is a BSD 2-Clause License.
This section covers the following topics:
• Common Components
• Stacks API
• Zero-Copy API
• Templated Sends
• Delegated Sends API
D.1 Source Code
The onload source code is provided with the Onload distribution. Entry points for
the source code are:
src/lib/transport/unix/onload_ext_intercept.c
src/lib/transport/unix/zc_intercept.c
D.2 Java Native Interface - Wrapper
The Onload distribution includes a JNI wrapper for use with the extension APIs. Java
users should also refer to the files:
• /openonload-<version>/src/tools/jni
D.3 Common Components
For all applications employing the Extensions API the following components are
provided:
• #include <onload/extensions.h>
An application should include the header file containing function prototypes
and constant values required when using the API.
• libonload_ext.a, libonload_ext.so
This library provides stub implementations of the extended API. An application
that wishes to use the extensions API should link against this library.
When Onload is not present, the application will continue to function, but calls
to the extensions API will have no effect (unless documented otherwise).
To link to this library include the '-l' linker option on the compiler command line
i.e.
-lonload_ext
When linking against the onload_ext.a static library it is necessary to also link
with the dynamic library by adding the '-ldl' option to the compiler command
line.
-ldl -lonload_ext
onload_is_present
Description
If the application is linked with libonload_ext, but not running with Onload, this will
return 0. If the application is running with Onload this will return 1.
Definition
int onload_is_present(void)
Formal Parameters
None
Return Value
1 from the libonload.so library, or 0 from the libonload_ext.a library
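A minimal sketch of how an application might branch on onload_is_present() follows. The HAVE_ONLOAD_EXT macro and the transport_description() helper are hypothetical; the fallback definition only stands in for the stub library so that the sketch compiles on a machine without the Onload headers.

```c
#ifdef HAVE_ONLOAD_EXT              /* hypothetical build flag */
#include <onload/extensions.h>
#else
/* Stand-in mirroring the documented stub-library result (0: not running
 * with Onload). For illustration only. */
static int onload_is_present(void) { return 0; }
#endif

/* Describe which transport the process is using. Hypothetical helper. */
const char *transport_description(void)
{
    return onload_is_present() ? "Onload accelerated" : "kernel stack";
}
```

An application could log this string at start-up to confirm whether it was launched under Onload.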
onload_fd_stat
struct onload_stat
{
  int32_t stack_id;
  char* stack_name;
  int32_t endpoint_id;
  int32_t endpoint_state;
};
extern int onload_fd_stat(int fd, struct onload_stat* stat);
Description
Retrieves internal details about an accelerated socket.
Definition
See above
Formal Parameters
See above
Return Value
0 socket is not accelerated
1 socket is accelerated
-ENOMEM when memory cannot be allocated
Notes
When calling free() on stack_name, pass it as a (char *), because the memory is
allocated using malloc().
This function will call malloc() and so should never be called from any other
function requiring a malloc lock.
onload_fd_check_feature
int onload_fd_check_feature(int fd, enum onload_fd_feature feature);
enum onload_fd_feature {
  /* Check whether this fd supports ONLOAD_MSG_WARM or not */
  ONLOAD_FD_FEAT_MSG_WARM
};
Description
Used to check whether the Onload file descriptor supports a feature or not.
Definition
See above
Formal Parameters
See above
Return Value
0 if the feature is supported but not on this fd
>0 if the feature is supported both by onload and this fd
<0 if the feature is not supported:
-ENOSYS if onload_fd_check_feature() is not supported.
-ENOTSUPP if the feature is not supported by onload.
Notes
Onload 201509 and later versions support the
ONLOAD_FD_FEAT_UDP_TX_TS_HDR option. onload_fd_check_feature will return
1 to indicate that a recvmsg used to retrieve TX timestamps for UDP packets will
return the entire Ethernet header.
NOTE: When run on older versions of onload this will return -EOPNOTSUPP.
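The return-value rules for onload_fd_check_feature() can be folded into a small classifier. The enum and function names below are hypothetical; the sketch treats -ENOSYS as the documented "check itself not supported" case and groups every other negative value as "feature not supported".

```c
#include <errno.h>

/* Illustrative classification of an onload_fd_check_feature() result. */
enum feature_status {
    FEATURE_ON_THIS_FD,         /* > 0: supported by Onload and by this fd */
    FEATURE_NOT_ON_THIS_FD,     /* 0: supported by Onload, but not this fd */
    FEATURE_CHECK_UNAVAILABLE,  /* -ENOSYS: the check call is unsupported */
    FEATURE_UNSUPPORTED         /* any other negative value */
};

enum feature_status classify_feature_result(int rc)
{
    if (rc > 0)
        return FEATURE_ON_THIS_FD;
    if (rc == 0)
        return FEATURE_NOT_ON_THIS_FD;
    if (rc == -ENOSYS)
        return FEATURE_CHECK_UNAVAILABLE;
    return FEATURE_UNSUPPORTED;
}
```

A caller would pass the value returned by onload_fd_check_feature(fd, feature) and fall back to a non-accelerated code path for anything other than FEATURE_ON_THIS_FD.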
onload_thread_set_spin
Description
For a thread calling this function, onload_thread_set_spin() sets the per-thread
spinning actions. It is not per-stack and not per-socket.
Definition
int onload_thread_set_spin(
  enum onload_spin_type type,
  unsigned spin)
Formal Parameters
type
Which operation to change the spin status of. The type must be one of the
following:
enum onload_spin_type {
  ONLOAD_SPIN_ALL, /* enable or disable all spin options */
  ONLOAD_SPIN_UDP_RECV,
  ONLOAD_SPIN_UDP_SEND,
  ONLOAD_SPIN_TCP_RECV,
  ONLOAD_SPIN_TCP_SEND,
  ONLOAD_SPIN_TCP_ACCEPT,
  ONLOAD_SPIN_PIPE_RECV,
  ONLOAD_SPIN_PIPE_SEND,
  ONLOAD_SPIN_SELECT,
  ONLOAD_SPIN_POLL,
  ONLOAD_SPIN_PKT_WAIT,
  ONLOAD_SPIN_EPOLL_WAIT,
  ONLOAD_SPIN_STACK_LOCK,
  ONLOAD_SPIN_SOCK_LOCK,
  ONLOAD_SPIN_SO_BUSY_POLL,
  ONLOAD_SPIN_TCP_CONNECT,
  ONLOAD_SPIN_MAX /* special value to mark largest valid input */
};
spin
A boolean which indicates whether the operation should spin or not.
Return Value
0 on success
-EINVAL if unsupported type is specified.
Notes
Spin time (for all threads) is set using the EF_SPIN_USEC parameter.
Examples
The onload_thread_set_spin API can be used to control spinning on a per-thread
or per-API basis. The existing spin-related configuration options set the default
behavior for threads, and the onload_thread_set_spin API overrides the default
for the thread calling this function.
Disable all sorts of spinning:
onload_thread_set_spin(ONLOAD_SPIN_ALL, 0);
Enable all sorts of spinning:
onload_thread_set_spin(ONLOAD_SPIN_ALL, 1);
Enable spinning only for certain threads:
1. Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2. In each thread that should spin, invoke onload_thread_set_spin().
Disable spinning only in certain threads:
1. Enable spinning by setting EF_POLL_USEC=<timeout>.
2. In each thread that should not spin, invoke onload_thread_set_spin().
WARNING: If a thread is set to NOT spin and then blocks this may invoke an
interrupt for the whole stack. Interrupts occurring on moderately busy threads may
cause unintended and undesirable consequences.
Enable spinning for UDP traffic, but not TCP traffic:
1. Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2. In each thread that should spin (UDP only), do:
onload_thread_set_spin(ONLOAD_SPIN_UDP_RECV, 1);
onload_thread_set_spin(ONLOAD_SPIN_UDP_SEND, 1);
Enable spinning for TCP traffic, but not UDP traffic:
1. Set the spin timeout by setting EF_SPIN_USEC, and disable spinning by default
by setting EF_POLL_USEC=0.
2. In each thread that should spin (TCP only), do:
onload_thread_set_spin(ONLOAD_SPIN_TCP_RECV, 1);
onload_thread_set_spin(ONLOAD_SPIN_TCP_SEND, 1);
onload_thread_set_spin(ONLOAD_SPIN_TCP_ACCEPT, 1);
Spinning and sockets:
When a thread calls onload_thread_set_spin() it sets the spinning actions
applied when the thread accesses any socket - irrespective of whether the socket is
created by this thread.
If a socket is created by thread A and is accessed by thread B, calling
onload_thread_set_spin(ONLOAD_SPIN_ALL, 1) only from thread B will enable
spinning for thread B, but not for thread A. In the same scenario, if
onload_thread_set_spin(ONLOAD_SPIN_ALL, 1) is called only from thread A, then
spinning is enabled only for thread A, but not for thread B.
The onload_thread_set_spin() function sets the per-thread spinning action.
onload_thread_get_spin
Description
For the current thread, identify which operations should spin.
Definition
int onload_thread_get_spin(
  unsigned *state)
Formal Parameters
state
Location at which to write the spin status as a bitmask. Bit n of the mask is set
if spinning has been enabled for spin type n (see onload_thread_set_spin).
Return Value
0 on success
Notes
Spin time (for all threads) is set using the EF_SPIN_USEC parameter.
Examples
Determine if spinning is enabled for UDP receive:
unsigned state;
onload_thread_get_spin(&state);
if (state & (1 << ONLOAD_SPIN_UDP_RECV)) {
  // spinning is enabled for UDP receive
}
D.4 Stacks API
Using the Onload Extensions API an application can bind selected sockets to specific
Onload stacks and in this way ensure that time-critical sockets are not starved of
resources by other non-critical sockets. The API allows an application to select
sockets which are to be accelerated thus reserving Onload resources for
performance-critical paths. This also prevents non-critical paths from creating jitter
for critical paths.
onload_set_stackname
Description
Select the Onload stack that new sockets are placed in. A socket can exist only in a
single stack. A socket can be moved to a different stack - see onload_move_fd()
below.
Moving a socket to a different stack does not create a copy of the socket in originator
and target stacks.
Definition
int onload_set_stackname(
  int who,
  int scope,
  const char *name)
Formal Parameters
who
Must be one of the following:
- ONLOAD_THIS_THREAD - to modify the stack name in which all
subsequent sockets are created by this thread.
- ONLOAD_ALL_THREADS - to modify the stack name in which all
subsequent sockets are created by all threads in the current process.
ONLOAD_THIS_THREAD takes precedence over ONLOAD_ALL_THREADS.
scope
Must be one of the following:
- ONLOAD_SCOPE_THREAD - name is scoped with current thread
- ONLOAD_SCOPE_PROCESS - name is scoped with current process
- ONLOAD_SCOPE_USER - name is scoped with current user
- ONLOAD_SCOPE_GLOBAL - name is global across all threads, users and
processes.
- ONLOAD_SCOPE_NOCHANGE - undo effect of a previous call to
onload_set_stackname(ONLOAD_THIS_THREAD, …), see Notes.
name
One of the following:
- the stack name, up to 8 characters.
- an empty string to set no stack name
- the special value ONLOAD_DONT_ACCELERATE to prevent sockets created
in this thread, user, process from being accelerated.
Sockets identified by the options above will belong to the Onload stack until a
subsequent call using onload_set_stackname identifies a different stack or the
ONLOAD_SCOPE_NOCHANGE option is used.
Return Value
0 on success
-1 with errno set to ENAMETOOLONG if the name exceeds permitted length
-1 with errno set to EINVAL if other parameters are invalid.
Notes
Note 1
This applies for stacks selected for sockets created by socket() and for pipe(); it
has no effect on accept(). Passively opened sockets created via accept() will
always be in the same stack as the listening socket that they are linked to. This means
that the following are functionally identical, i.e.
onload_set_stackname(foo)
socket
listen
onload_set_stackname(bar)
accept
and
onload_set_stackname(foo)
socket
listen
accept
onload_set_stackname(bar)
In both cases the listening socket and the accepted socket will be in stack foo.
Note 2
Scope defines the namespace in which a stack belongs. A stack name of foo in scope
user is not the same as a stack name of foo in scope thread. Scope restricts the
visibility of a stack to either the current thread, current process, current user or is
unrestricted (global). This has the property that with, for example, process-based
scoping, two processes can have the same stack name without sharing a stack - as
the stack for each process has a different namespace.
Note 3
Scoping can be thought of as adding a suffix to the supplied name e.g.
ONLOAD_SCOPE_THREAD: <stackname>-t<thread_id>
ONLOAD_SCOPE_PROCESS: <stackname>-p<process_id>
ONLOAD_SCOPE_USER: <stackname>-u<user_id>
ONLOAD_SCOPE_GLOBAL: <stackname>
This is an example only and the implementation is free to do something different
such as maintaining different lists for different scopes.
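The suffix model in Note 3 can be sketched as below. This mirrors the illustrative naming scheme only. As the note says, the real implementation is free to work differently. The enum, the function name and the "-t", "-p", "-u" separators are assumptions made for this example.

```c
#include <stdio.h>

/* Illustrative scopes; these are not the ONLOAD_SCOPE_* constants. */
enum name_scope { SCOPE_THREAD, SCOPE_PROCESS, SCOPE_USER, SCOPE_GLOBAL };

/* Write the scoped form of `name` into `out`. Returns the length that
 * snprintf() reports, or -1 for an unknown scope. Hypothetical helper. */
int scoped_stack_name(char *out, size_t out_len, const char *name,
                      enum name_scope scope, long id)
{
    switch (scope) {
    case SCOPE_THREAD:  return snprintf(out, out_len, "%s-t%ld", name, id);
    case SCOPE_PROCESS: return snprintf(out, out_len, "%s-p%ld", name, id);
    case SCOPE_USER:    return snprintf(out, out_len, "%s-u%ld", name, id);
    case SCOPE_GLOBAL:  return snprintf(out, out_len, "%s", name);
    }
    return -1;
}
```

Under this model, two processes with ids 1234 and 5678 that both ask for "foo" with process scope end up with the distinct names foo-p1234 and foo-p5678, which is why they do not share a stack.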
Note 4
ONLOAD_SCOPE_NOCHANGE will undo the effect of a previous call to
onload_set_stackname(ONLOAD_THIS_THREAD, …).
If you have previously used onload_set_stackname(ONLOAD_THIS_THREAD, …) and
want to revert to the behavior of threads that are using the ONLOAD_ALL_THREADS
configuration, without changing that configuration, you can do the following:
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_NOCHANGE, "");
Related environment variables
Related environment variables are:
EF_DONT_ACCELERATE
Default: 0
Minimum: 0
Maximum: 1
Scope: per-process
If this environment variable is set then acceleration for ALL sockets is disabled and
they are handed off to the kernel stack until the application overrides this state with
a call to onload_set_stackname().
EF_STACK_PER_THREAD
Default: 0
Minimum: 0
Maximum: 1
Scope: per-process
If this environment variable is set each socket created by the application will be
placed in a stack depending on the thread in which it is created. Stacks could, for
example, be named using the thread ID of the thread that creates the stack, but this
should not be relied upon.
A call to onload_set_stackname overrides this variable. EF_DONT_ACCELERATE
takes precedence over this variable.
EF_NAME
Default: none
Minimum: 0 chars
Maximum: 8 chars
Scope: per-stack
The environment variable EF_NAME will be honored to control Onload stack sharing.
However, a call to onload_set_stackname overrides this variable, and
EF_DONT_ACCELERATE and EF_STACK_PER_THREAD both take precedence over
EF_NAME.
onload_move_fd
Description
Move the file descriptor to the current stack. The target stack can be specified with
onload_set_stackname(), then use onload_move_fd() to put the socket into the
target stack.
A socket can exist only in a single stack. Moving a socket to a different stack does not
create a copy of the socket in originator and target stacks.
Definition
int onload_move_fd(int fd)
Formal Parameters
fd - the file descriptor to be moved to the current stack.
Return Value
0 on success
non-zero otherwise.
Notes
• Useful for moving fds obtained by accept(), so that a new connection is moved
out of the stack of the listening socket.
• Limited to TCP closed or accepted sockets only.
• Cannot be used on actively opened connections, although it is possible to use
onload_set_stackname() before calling connect() to achieve the same
result.
• The socket must have empty send and retransmit queues (i.e. send not called
on this socket).
• The socket must have a simple receive queue (no loss, reordering, etc).
• The fd must not yet be in an epoll set.
• The onload_move_fd function should not be used if SO_TIMESTAMPING is set
to a non-zero value for the originating socket.
• Should not be used simultaneously with other I/O multiplex actions i.e.
poll(), select(), recv() etc on the file descriptor.
• This function is not async-safe and should never be called from any process
function handling signals.
• This function cannot be used to hand sockets over to the kernel. It is not
possible to use onload_set_stackname(ONLOAD_DONT_ACCELERATE) and
then onload_move_fd().
NOTE: The onload_move_fd function does not check whether a destination stack
has either RX or TX timestamping enabled.
onload_stackname_save
Description
Save the state of the current onload stack identified by the previous call to
onload_set_stackname().
Definition
int onload_stackname_save(void)
Formal Parameters
none
Return Value
0 on success
-ENOMEM when memory cannot be allocated.
onload_stackname_restore
Description
Restore stack state saved with a previous call to onload_stackname_save(). All
updates/changes to state of the current stack will be deleted and all state previously
saved will be restored. To avoid unexpected results, the stack should be restored in
the same thread as used to call onload_stackname_save().
Definition
int onload_stackname_restore(void)
Formal Parameters
none
Return Value
0 on success
non-zero if an error occurs.
Notes
The API stackname save and restore functions provide flexibility when binding
sockets to an Onload stack.
Using a combination of onload_set_stackname(), onload_stackname_save()
and onload_stackname_restore(), the user is able to create default stack settings
which apply to one or more sockets, save this state and then create changed stack
settings which are applied to other sockets. The original default settings can then be
restored to apply to subsequent sockets.
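The save, change and restore sequence described above might look like the following sketch. The stack names "dflt" and "fast", the HAVE_ONLOAD_EXT macro and the function open_sockets_in_two_stacks() are hypothetical. The fallback definitions are placeholders so the sketch compiles without the Onload headers, and their enum values are not the real ones.

```c
#include <sys/socket.h>
#include <netinet/in.h>

#ifdef HAVE_ONLOAD_EXT              /* hypothetical build flag */
#include <onload/extensions.h>
#else
/* Placeholder stand-ins for illustration only; values are arbitrary. */
enum { ONLOAD_THIS_THREAD, ONLOAD_ALL_THREADS };
enum { ONLOAD_SCOPE_THREAD, ONLOAD_SCOPE_PROCESS };
static int onload_set_stackname(int who, int scope, const char *name)
{ (void)who; (void)scope; (void)name; return 0; }
static int onload_stackname_save(void) { return 0; }
static int onload_stackname_restore(void) { return 0; }
#endif

/* Create one socket under the default stack name, one under a temporary
 * name, then one more after restoring the saved default. Returns 0 if all
 * stack-name calls succeed, -1 otherwise; fds may be -1 if socket() fails. */
int open_sockets_in_two_stacks(int *default_fd, int *fast_fd, int *later_fd)
{
    if (onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_PROCESS, "dflt") != 0)
        return -1;
    *default_fd = socket(AF_INET, SOCK_STREAM, 0);  /* placed in "dflt" */

    if (onload_stackname_save() != 0)               /* remember "dflt" */
        return -1;
    if (onload_set_stackname(ONLOAD_THIS_THREAD, ONLOAD_SCOPE_THREAD, "fast") != 0)
        return -1;
    *fast_fd = socket(AF_INET, SOCK_STREAM, 0);     /* placed in "fast" */

    if (onload_stackname_restore() != 0)            /* back to "dflt" */
        return -1;
    *later_fd = socket(AF_INET, SOCK_STREAM, 0);    /* placed in "dflt" */
    return 0;
}
```

The save and restore calls are made from the same thread, in line with the recommendation given for onload_stackname_restore().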
D.5 Stacks API Usage
Using a combination of the EF_DONT_ACCELERATE environment variable and the
function onload_set_stackname(), the user is able to control/select sockets which
are to be accelerated and isolate these performance-critical sockets and threads
from the rest of the system.
onload_stack_opt_set_int
Description
Set/modify per-stack options that all subsequently created stacks will use instead of
using the existing global stack options.
Definition
int onload_stack_opt_set_int(
  const char *name,
  int64_t value)
Formal Parameters
name
Stack option to modify
value
New value for the stack option.
Example
onload_stack_opt_set_int("EF_SCALABLE_FILTERS_ENABLE", 1);
Return Value
0 on success
errno set to EINVAL if the requested option is not found, or to ENOMEM.
Notes
Cannot be used to modify options on existing stacks - only for new stacks.
Cannot be used to modify process options - only stack options.
Modified options will be used for all newly created stacks until
onload_stack_opt_reset() is called.
onload_stack_opt_reset
Description
Revert to using global stack options for newly created stacks.
Definition
int onload_stack_opt_reset(void)
Formal Parameters
None.
Return Value
0 always
Notes
Should be called following a call to onload_stack_opt_set_int() to revert to
using global stack options for all newly created stacks.
D.6 Stacks API - Examples
• This thread will use stack foo, other threads in the process will continue as before.
onload_set_stackname(ONLOAD_THIS_THREAD, ONLOAD_SCOPE_GLOBAL, "foo")
• All threads in this process will get their own stack called foo. This is equivalent
to the EF_STACK_PER_THREAD environment variable.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_THREAD, "foo")
• All threads in this process will share a stack called foo. If another process did
the same function call it will get its own stack.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_PROCESS, "foo")
• All threads in this process will share a stack called foo. If another process run by
the same user did the same, it would share the same stack as the first process.
If another process run by a different user did the same it would get its own stack.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_USER, "foo")
• Equivalent to EF_NAME. All threads will use a stack called foo which is shared by
any other process which does the same.
onload_set_stackname(ONLOAD_ALL_THREADS, ONLOAD_SCOPE_GLOBAL, "foo")
• Equivalent to EF_DONT_ACCELERATE. New sockets/pipes will not be accelerated
until another call to onload_set_stackname().
OnloadUserGuide
OnloadExtensionsAPI
Issue22 ©SolarflareCommunications2017 235
onload_set_stackname(ONLOAD_ALL_THREADS,ONLOAD_SCOPE_GLOBAL,ONLOAD_DONT_ACCELERATE)

onload_ordered_epoll_wait
For details of the Wire Order Delivery feature refer to Wire Order Delivery on
page 65.
Description
If the epoll set contains accelerated sockets in only one stack, this function
can be used instead of epoll_wait() to return events in the order in which
they were received from the wire. There is no explicit check on sockets, so
applications must ensure that the rules are applied to avoid misordering of
packets.
Definition
int onload_ordered_epoll_wait(
  int epfd,
  struct epoll_event *events,
  struct onload_ordered_epoll_event *oo_events,
  int maxevents,
  int timeout);
Formal Parameters
See the definition of epoll_wait().
Return Value
0 on success.
Nonzero otherwise.
Notes
Any file descriptors returned as ready without a valid timestamp, i.e.
tv_sec = 0, should be considered unordered with respect to the rest of the
set. This can occur for data received via the kernel or data returned without
a hardware timestamp, i.e. from an interface that does not support hardware
timestamping.
The environment variable EF_UL_EPOLL=1 must be set. Hardware timestamps are
required. This feature is only available on the SFN7000 and SFN8000 series
adapters.
struct onload_ordered_epoll_event {
  /* The hardware timestamp of the first readable data */
  struct timespec ts;
  /* Number of bytes that may be read to maintain wire order */
  int bytes;
};
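
The timestamp rule in the Notes above can be sketched as a small helper. This is a minimal sketch, not Onload code: struct ordered_event is a local stand-in that mirrors the two fields of struct onload_ordered_epoll_event so that the example builds without the Onload headers, and the helper names event_is_ordered() and count_ordered_events() are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Local stand-in mirroring struct onload_ordered_epoll_event (assumption:
 * defined here only so the sketch builds without the Onload headers). */
struct ordered_event {
  struct timespec ts;   /* hardware timestamp of the first readable data */
  int bytes;            /* bytes that may be read to maintain wire order */
};

/* Returns 1 if the event carries a valid hardware timestamp, or 0 if it
 * must be treated as unordered with respect to the rest of the set. */
static int event_is_ordered(const struct ordered_event *ev)
{
  return ev->ts.tv_sec != 0;
}

/* Counts how many of the n returned events have a valid timestamp. */
static int count_ordered_events(const struct ordered_event *evs, int n)
{
  int i, ordered = 0;
  for( i = 0; i < n; ++i )
    if( event_is_ordered(&evs[i]) )
      ++ordered;
  return ordered;
}
```

An application would typically apply such a check to each entry of the oo_events array after the call returns, and handle entries without a timestamp separately.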

D.7 Zero-Copy API
Zero-Copy can improve the performance of networking applications by
eliminating intermediate buffers when transferring data between application
and network adapter.
The Onload Extensions Zero-Copy API supports zero-copy of UDP received packet
data and TCP transmit packet data.
The API provides the following components:
#include <onload/extensions_zc.h>
  In addition to the common components, an application should include this
  header file which contains all function prototypes and constant values
  required when using the API. The header file also includes comprehensive
  documentation, required data structures and function definitions.

Zero-Copy Data Buffers
To avoid the copy, data is passed to and from the application in special
buffers described by a struct onload_zc_iovec. A message or datagram can
consist of multiple iovecs using a struct onload_zc_msg. A single call to send
may involve multiple messages using an array of struct onload_zc_mmsg.

/* A zc_iovec describes a single buffer */
struct onload_zc_iovec {
  void* iov_base;          /* Address within buffer */
  size_t iov_len;          /* Length of data */
  onload_zc_handle buf;    /* (opaque) buffer handle */
  unsigned iov_flags;      /* Not currently used */
};
/* A msg describes array of iovecs that make up datagram */
struct onload_zc_msg {
  struct onload_zc_iovec* iov;  /* Array of buffers */
  struct msghdr msghdr;         /* Message metadata */
};
/* An mmsg describes a message, the socket, and its result */
struct onload_zc_mmsg {
  struct onload_zc_msg msg;  /* Message */
  int rc;                    /* Result of send operation */
  int fd;                    /* socket to send on */
};
Figure 17: Zero-Copy Data Buffers

Zero-Copy UDP Receive Overview
Figure 18 illustrates the difference between the normal UDP receive mode and
the zero-copy method.
When using the standard POSIX socket calls, the adapter delivers packets to an
Onload packet buffer which is described by a descriptor previously placed in
the RX descriptor ring. When the application calls recv(), Onload copies the
data from the packet buffer to an application-supplied buffer.
Using the zero-copy UDP receive API, the application calls the
onload_zc_recv() function including a callback function which will be called
when data is ready. The callback can directly access the data inside the
Onload packet buffer, avoiding a copy.
Figure 18: Traditional vs. Zero-Copy UDP Receive
A single call to the onload_zc_recv() function can result in multiple
datagrams being delivered to the callback function. Each time the callback
returns to Onload the next datagram is delivered. Processing stops when the
callback instructs Onload to cease delivery or there are no further received
datagrams.
If the receiving application does not need to examine all received data (i.e.
it is filtering), this can result in a considerable performance advantage
because this data is not pulled into the processor's cache, thereby reducing
the application cache footprint.
As a general rule, the callback function should avoid making system calls
which attempt to modify or close the current socket.
Zero-copy UDP Receive is implemented within the Onload Extensions API.

Zero-Copy UDP Receive
The onload_zc_recv() function specifies a callback to invoke for each received
UDP datagram. The callback is invoked in the context of the call to
onload_zc_recv() (i.e. it blocks/spins waiting for data).
Before calling, the application must set the following in the struct
onload_zc_recv_args:

typedef enum onload_zc_callback_rc
(*onload_zc_recv_callback)(struct onload_zc_recv_args *args, int flags);

struct onload_zc_recv_args {
  struct onload_zc_msg msg;
  onload_zc_recv_callback cb;
  void* user_ptr;
  int flags;
};

int onload_zc_recv(int fd, struct onload_zc_recv_args *args);
Figure 19: Zero-Copy recv_args

cb
  Set to the callback function pointer.
user_ptr
  Set to point to application state; this is not touched by Onload.
msg.msghdr.msg_control, msg_controllen, msg_name, msg_namelen
  The user application should set these to appropriate buffers and lengths
  (if required) as for recvmsg(), or to NULL and 0 if not used.
flags
  Set to indicate behavior (e.g. ONLOAD_MSG_DONTWAIT).

The callback gets to examine the data, and can control what happens next: (i)
whether or not the buffer(s) are kept by the callback or are immediately freed
by Onload; and (ii) whether or not onload_zc_recv() will internally loop and
invoke the callback with the next datagram, or immediately return to the
application. The next action is determined by setting flags in the return code
as follows:

ONLOAD_ZC_KEEP
  The callback function can elect to retain ownership of received buffer(s)
  by returning ONLOAD_ZC_KEEP. Following this, the correct way to release
  retained buffers is to call onload_zc_release_buffers() to explicitly
  release the first buffer from each received datagram. Subsequent buffers
  pertaining to the same datagram will then be automatically released.
ONLOAD_ZC_CONTINUE
  Suggests that Onload should loop and process more datagrams.
ONLOAD_ZC_TERMINATE
  Insists that Onload immediately return from onload_zc_recv().

Flags can also be set by Onload:

ONLOAD_ZC_END_OF_BURST
  Onload sets this flag to indicate that this is the last packet.
ONLOAD_ZC_MSG_SHARED
  Packet buffers are read only.

If there is unaccelerated data on the socket from the kernel's receive path,
this cannot be handled without copying. The application has two choices as
follows:

ONLOAD_MSG_RECV_OS_INLINE
  Set this flag when calling onload_zc_recv(). Onload will deal with the
  kernel data internally and pass it to the callback.
Check return code
  Check the return code from onload_zc_recv(). If it returns -ENOTEMPTY then
  the application must call onload_recvmsg_kernel() to retrieve the kernel
  data.

Zero-Copy Receive Example #1

struct onload_zc_recv_args args;
struct zc_recv_state state;
int rc;

state.bytes = bytes_to_wait_for;
/* Easy way to set msg_control* and msg_name* to zero */
memset(&args.msg, 0, sizeof(args.msg));
args.cb = &zc_recv_callback;
args.user_ptr = &state;
args.flags = ONLOAD_MSG_RECV_OS_INLINE;
rc = onload_zc_recv(fd, &args);

//---

enum onload_zc_callback_rc
zc_recv_callback(struct onload_zc_recv_args *args, int flags)
{
  int i;
  struct zc_recv_state *state = args->user_ptr;

  /* Print each buffer of the datagram and count its bytes */
  for( i = 0; i < args->msg.msghdr.msg_iovlen; ++i ) {
    printf("zc callback iov %d: %p, %d\n", i,
           args->msg.iov[i].iov_base,
           (int)args->msg.iov[i].iov_len);
    state->bytes -= args->msg.iov[i].iov_len;
  }
  if( state->bytes <= 0 ) return ONLOAD_ZC_TERMINATE;
  else return ONLOAD_ZC_CONTINUE;
}
Figure 20: Zero-Copy Receive - example #1

Zero-Copy Receive Example #2

static enum onload_zc_callback_rc
zc_recv_callback(struct onload_zc_recv_args *args, int flag)
{
  struct user_info *zc_info = args->user_ptr;
  int i, zc_rc = 0;

  for( i = 0; i < args->msg.msghdr.msg_iovlen; ++i ) {
    zc_rc += args->msg.iov[i].iov_len;
    handle_msg(args->msg.iov[i].iov_base,
               args->msg.iov[i].iov_len);
  }
  if( zc_rc == 0 )
    return ONLOAD_ZC_TERMINATE;
  zc_info->zc_rc += zc_rc;
  if( (zc_info->flags & MSG_WAITALL) &&
      (zc_info->zc_rc < zc_info->size) )
    return ONLOAD_ZC_CONTINUE;
  else
    return ONLOAD_ZC_TERMINATE;
}

struct onload_zc_recv_args zc_args;

ssize_t do_recv_zc(int fd, void *buf, size_t len, int flags)
{
  struct user_info info;
  struct iovec iov = { buf, len };  /* used only for the kernel fallback */
  struct msghdr msg;
  int rc;

  init_user_info(&info);
  memset(&zc_args, 0, sizeof(zc_args));
  zc_args.user_ptr = &info;
  zc_args.flags = 0;
  zc_args.cb = &zc_recv_callback;
  if( flags & MSG_DONTWAIT )
    zc_args.flags |= ONLOAD_MSG_DONTWAIT;

  rc = onload_zc_recv(fd, &zc_args);
  if( rc == -ENOTEMPTY ) {
    /* Unaccelerated data is pending: read it through the kernel path */
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if( (rc = onload_recvmsg_kernel(fd, &msg, 0)) < 0 )
      printf("onload_recvmsg_kernel failed\n");
  }
  else if( rc == 0 ) {
    /* zc_rc gets set by callback to bytes received, so we
     * can return that to appear like standard recv call */
    rc = info.zc_rc;
  }
  return rc;
}
Figure 21: Zero-Copy Receive - example #2

NOTE: onload_zc_recv() only supports accelerated (Onloaded) sockets. For
example, when bound to a broadcast address the socket fd is handed off to the
kernel and this function will return ESOCKTNOSUPPORT.

Zero-Copy TCP Send Overview
Figure 22 illustrates the difference between the normal TCP transmit method
and the zero-copy method.
When using standard POSIX socket calls, the application first creates the
payload data in an application-allocated buffer before calling the send()
function. Onload will copy the data to an Onload packet buffer in memory and
post a descriptor to this buffer in the network adapter TX descriptor ring.
Using the zero-copy TCP transmit API, the application calls the
onload_zc_alloc_buffers() function to request buffers from Onload. A pointer
to a packet buffer is returned in response. The application places the data to
send directly into this buffer and then calls onload_zc_send() to indicate to
Onload that data is available to send.
Onload will post a descriptor for the packet buffer in the network adapter TX
descriptor ring and ring the TX doorbell. The network adapter fetches the data
for transmission.
Figure 22: Traditional vs. Zero-Copy TCP Transmit
NOTE: The socket used to allocate zero-copy buffers must be in the same stack
as the socket used to send the buffers. When using TCP loopback, Onload can
move a socket from one stack to another. Users must ensure that they ALWAYS
USE BUFFERS FROM THE CORRECT STACK.
NOTE: The onload_zc_send() function does not currently support the
ONLOAD_MSG_MORE or TCP_CORK flags.
Zero-copy TCP transmit is implemented within the Onload Extensions API.

Zero-Copy TCP Send
The zero-copy send API supports the sending of multiple messages to different
sockets in a single call. Data buffers must be allocated in advance and for
best efficiency these should be allocated in blocks and off the critical path.
The user should avoid simply moving the copy from Onload into the application,
but where this is unavoidable, it should also be done off the critical path.

int onload_zc_send(struct onload_zc_mmsg* msgs, int mlen, int flags);
Figure 23: Zero-Copy send

int onload_zc_alloc_buffers(int fd,
                            struct onload_zc_iovec* iovecs,
                            int iovecs_len,
                            onload_zc_buffer_type_flags flags);
int onload_zc_release_buffers(int fd,
                              onload_zc_handle* bufs,
                              int bufs_len);
Figure 24: Zero-Copy allocate buffers

The onload_zc_send() function return value identifies how many of the
onload_zc_mmsg array's rc fields are set. Each onload_zc_mmsg.rc returns how
many bytes were sent for that message, or an error. Refer to the table below.
Sent buffers are owned by Onload. Unsent buffers are owned by the application
and must be freed or reused to avoid leaking.
NOTE: Buffers sent with the ONLOAD_MSG_WARM feature enabled are not actually
sent; ownership remains with the user, who is responsible for freeing these
buffers.

rc = onload_zc_send()
  rc < 0             Application error calling onload_zc_send(). rc is set to
                     the error code.
  rc == 0            Should not happen.
  0 < rc <= n_msgs   rc is set to the number of messages whose status has
                     been set in mmsgs[i].rc. rc == n_msgs is the normal
                     case.
rc = mmsg[i].rc
  rc < 0             Error sending this message. rc is set to the error code.
  rc >= 0            rc is set to the number of bytes that have been sent in
                     this message. Compare to the message length to establish
                     which buffers were sent.

Zero-Copy Send - Single Message, Single Buffer

struct onload_zc_iovec iovec;
struct onload_zc_mmsg mmsg;

rc = onload_zc_alloc_buffers(fd, &iovec, 1,
                             ONLOAD_ZC_BUFFER_HDR_TCP);
assert(rc == 0);
assert(my_data_len <= iovec.iov_len);
memcpy(iovec.iov_base, my_data, my_data_len);
iovec.iov_len = my_data_len;
mmsg.fd = fd;
mmsg.msg.iov = &iovec;
mmsg.msg.msghdr.msg_iovlen = 1;
rc = onload_zc_send(&mmsg, 1, 0);
if( rc <= 0 ) {
  /* Probably application bug */
  return rc;
} else {
  /* Only one message, so rc should be 1 */
  assert(rc == 1);
  /* rc == 1 so we can look at the first (only) mmsg.rc */
  if( mmsg.rc < 0 )
    /* Error sending message */
    onload_zc_release_buffers(fd, &iovec.buf, 1);
  else
    /* Message sent, single msg, single iovec so
     * shouldn't worry about partial sends */
    assert(mmsg.rc == my_data_len);
}
Figure 25: Zero-Copy - Single Message, Single Buffer Example

The example above demonstrates error code handling. Note that it contains an
example of bad practice, in that buffers are allocated and populated on the
critical path.
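
The two-level return code convention of onload_zc_send() can be sketched as a pure decision function. This is a minimal sketch under stated assumptions: enum zc_send_outcome and both function names are hypothetical and not part of the Onload API, and the inputs are plain integers (the call's return value, the message index, that message's rc field and its total length) so that the example builds without the Onload headers.

```c
#include <assert.h>

/* Hypothetical outcome labels for this sketch; not part of the Onload API. */
enum zc_send_outcome {
  ZC_SEND_CALL_ERROR,    /* onload_zc_send() itself returned rc <= 0 */
  ZC_SEND_NOT_PROCESSED, /* message index >= rc, so mmsg[i].rc is not set */
  ZC_SEND_MSG_ERROR,     /* mmsg[i].rc < 0 */
  ZC_SEND_PARTIAL,       /* 0 <= mmsg[i].rc < message length */
  ZC_SEND_COMPLETE       /* mmsg[i].rc == message length */
};

/* Classify the result for message msg_index, given the return value of the
 * send call (send_rc), that message's rc field and its total length. */
static enum zc_send_outcome
classify_zc_send(int send_rc, int msg_index, int msg_rc, int msg_len)
{
  if( send_rc <= 0 )          return ZC_SEND_CALL_ERROR;
  if( msg_index >= send_rc )  return ZC_SEND_NOT_PROCESSED;
  if( msg_rc < 0 )            return ZC_SEND_MSG_ERROR;
  if( msg_rc < msg_len )      return ZC_SEND_PARTIAL;
  return ZC_SEND_COMPLETE;
}

/* Returns 1 when nothing from the message was sent, so the application
 * still owns every buffer of that message and must free or reuse them. */
static int app_owns_all_buffers(enum zc_send_outcome outcome)
{
  return outcome != ZC_SEND_PARTIAL && outcome != ZC_SEND_COMPLETE;
}
```

Keeping the classification separate from the buffer release calls makes the ownership rule (sent buffers belong to Onload, unsent buffers to the application) easy to check in isolation.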

Zero-Copy Send - Multiple Messages, Multiple Buffers

#define N_BUFFERS 2
#define N_MSGS 2

struct onload_zc_iovec iovec[N_MSGS][N_BUFFERS];
struct onload_zc_mmsg mmsg[N_MSGS];

for( i = 0; i < N_MSGS; ++i ) {
  rc = onload_zc_alloc_buffers(fd, iovec[i], N_BUFFERS,
                               ONLOAD_ZC_BUFFER_HDR_TCP);
  assert(rc == 0);
  /* TODO store data in iovec[i][j].iov_base,
   * set iovec[i][j].iov_len */
  mmsg[i].fd = fd; /* Could be different for each message */
  mmsg[i].msg.iov = iovec[i];
  mmsg[i].msg.msghdr.msg_iovlen = N_BUFFERS;
}
rc = onload_zc_send(mmsg, N_MSGS, 0);
if( rc <= 0 ) {
  /* Probably application bug */
  return rc;
} else {
  for( i = 0; i < N_MSGS; ++i ) {
    if( i < rc ) {
      /* mmsg[i].rc is set and we can use it */
      if( mmsg[i].rc < 0 ) {
        /* error sending this message - release buffers */
        for( j = 0; j < N_BUFFERS; ++j )
          onload_zc_release_buffers(fd, &iovec[i][j].buf, 1);
      } else if( mmsg[i].rc < sum_over_j(iovec[i][j].iov_len) ) {
        /* partial success */
        /* TODO use mmsg[i].rc to determine which buffers in
         * iovec[i] array are sent and which are still
         * owned by application */
      } else {
        /* Whole message sent, buffers now owned by Onload */
      }
    } else {
      /* mmsg[i].rc is not set, this message was not sent */
      for( j = 0; j < N_BUFFERS; ++j )
        onload_zc_release_buffers(fd, &iovec[i][j].buf, 1);
    }
  }
}
Figure 26: Zero-Copy - Multiple Messages, Multiple Buffers Example

The example above demonstrates error code handling and contains some examples
of bad practice where buffers are allocated and populated on the critical
path.
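
The partial-success branch above leaves the byte accounting as a TODO. One possible way to do that arithmetic is sketched below. It is only a sketch: count_sent_buffers() is a hypothetical helper, it works on a plain array of buffer lengths rather than on struct onload_zc_iovec, and how Onload treats ownership of a buffer that was only partly sent is not stated in this section, so such a buffer is reported separately instead of being counted as sent.

```c
#include <assert.h>
#include <stddef.h>

/* Given the iov_len of each buffer in one message and the byte count taken
 * from mmsg[i].rc, return how many leading buffers were sent in full.
 * If the byte count ends part-way through a buffer, the number of bytes
 * consumed from that buffer is stored in *partial_bytes (otherwise 0);
 * that buffer is not included in the returned count. */
static int count_sent_buffers(const size_t *iov_lens, int n_bufs,
                              size_t bytes_sent, size_t *partial_bytes)
{
  int sent = 0;
  size_t remaining = bytes_sent;

  while( sent < n_bufs && remaining >= iov_lens[sent] ) {
    remaining -= iov_lens[sent];
    ++sent;
  }
  *partial_bytes = (sent < n_bufs) ? remaining : 0;
  return sent;
}
```

Buffers at indexes below the returned count are candidates for replenishing; buffers at or above it were not sent in full.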

Zero-Copy Send - Full Example

static struct onload_zc_iovec iovec[NUM_ZC_BUFFERS];

static ssize_t do_send_zc(int fd, const void *buf, size_t len, int flags)
{
  int bytes_done, rc, i, bufs_needed;
  struct onload_zc_mmsg mmsg;

  mmsg.fd = fd;
  mmsg.msg.iov = iovec;
  bytes_done = 0;
  mmsg.msg.msghdr.msg_iovlen = 0;
  /* Copy the caller's data into the pre-allocated buffers */
  while( bytes_done < len ) {
    i = mmsg.msg.msghdr.msg_iovlen;
    if( iovec[i].iov_len > (len - bytes_done) )
      iovec[i].iov_len = (len - bytes_done);
    memcpy(iovec[i].iov_base, (const char *)buf + bytes_done,
           iovec[i].iov_len);
    bytes_done += iovec[i].iov_len;
    ++mmsg.msg.msghdr.msg_iovlen;
  }

  rc = onload_zc_send(&mmsg, 1, 0);
  if( rc != 1 /* Number of messages we sent */ ) {
    printf("onload_zc_send failed to process msg, %d\n", rc);
    return -1;
  } else {
    if( mmsg.rc < 0 )
      printf("onload_zc_send message error %d\n", mmsg.rc);
    else {
      /* Iterate over the iovecs; any that were sent we must replenish. */
      i = 0; bufs_needed = 0;
      bytes_done = 0; /* recount the bytes actually sent against mmsg.rc */
      while( i < mmsg.msg.msghdr.msg_iovlen ) {
        if( bytes_done == mmsg.rc ) {
          printf("onload_zc_send did not send iovec %d\n", i);
          /* In other buffer allocation schemes we would have to release
           * these buffers, but seems pointless as we guarantee at the
           * end of this function to have iovec array full, so do
           * nothing. */
        } else {
          /* Buffer sent, now owned by Onload, so replenish iovec
           * array */
          ++bufs_needed;
          bytes_done += iovec[i].iov_len;
        }
        ++i;
      }
      if( bufs_needed ) /* replenish the iovec array */
        rc = onload_zc_alloc_buffers(fd, iovec, bufs_needed,
                                     ONLOAD_ZC_BUFFER_HDR_TCP);
    }
  }
  /* Set a return code that looks similar enough to send(). NB. we're
   * not setting (and neither does onload_zc_send()) errno */
  if( mmsg.rc < 0 ) return -1;
  else return bytes_done;
}
Figure 27: Zero-Copy Send

D.8 Templated Sends
"Templated sends" is a feature for the SFN7000 and SFN8000 series adapters
that builds on top of TX PIO to provide further transmit latency improvements.
Refer to Programmed I/O on page 118 for details of TX PIO.
Description
Templated sends can be used in applications that know the majority of the
content of packets in advance of when the packet is to be sent. For example, a
market feed handler may publish packets that vary only in the specific value
of certain fields, possibly different symbols and price information, but are
otherwise identical.
The Onload templated sends feature uses the Onload Extensions API to generate
the packet template which is then instantiated on the adapter ready to receive
the "missing" data before each transmission.
Templated sends involve allocating a template of a packet on the adapter
containing the bulk of the data prior to the time of sending the packet. Then,
when the packet is to be sent, the remaining data is pushed to the adapter to
complete and send the packet.
When the socket associated with an allocated template is shut down or closed,
allocated templates are freed and subsequent calls to access these templates
will return an error.
The API details are available in the Onload distribution at:
/src/include/onload/extensions_zc.h

MSG Template
struct oo_msg_template {
  /* To verify subsequent templated calls are used with the same socket */
  oo_sp oomt_sock_id;
};

MSG Update
/* An update_iovec describes a single template update */
struct onload_template_msg_update_iovec {
  void* otmu_base;      /* Pointer to new data */
  size_t otmu_len;      /* Length of new data */
  off_t otmu_offset;    /* Offset within template to update */
  unsigned otmu_flags;  /* For future use. Must be set to 0. */
};

MSG Allocation
Description
Populated from an array of iovecs to specify the initial packet data. This
function is called once to allocate the packet template and populate the
template with the bulk of the payload data.
Definition
extern int onload_msg_template_alloc(
  int fd,
  struct iovec* initial_msg,
  int iovlen,
  onload_template_handle* handle,
  unsigned flags);
Formal Parameters
fd
  File descriptor to send on.
initial_msg
  Array of iovecs which are the bulk of the payload.
iovlen
  Length of initial_msg.
handle
  Template handle, used to refer to this template.
flags
  See notes below. Can also be set to zero.
Return Value
0 on success.
Nonzero otherwise.
Notes
The initial iovec array passed to onload_msg_template_alloc() must have at
least one element having a valid address and nonzero length.
If PIO allocation fails, then onload_msg_template_alloc() will fail. Setting
the flags to ONLOAD_TEMPLATE_FLAGS_PIO_RETRY will force allocation without
PIO, while attempting to allocate the PIO in later calls to
onload_msg_template_update().

MSG Template Update
Description
Takes an array of onload_template_msg_update_iovec to describe changes to the
base packet populated by the onload_msg_template_alloc() function. Each of the
update iovecs should describe a single change. The update function is used to
overwrite existing template content or to send the complete template content
when the ONLOAD_TEMPLATE_FLAGS_SEND_NOW flag is set.
Definition
extern int onload_msg_template_update(
  int fd,
  onload_template_handle* handle,
  struct onload_template_msg_update_iovec* updates,
  int ulen,
  unsigned flags);
Formal Parameters
fd
  File descriptor to send on.
handle
  Template handle, returned from the alloc function.
updates
  Array of onload_template_msg_update_iovec, each of which is a change to the
  template payload.
ulen
  Length of the updates array (i.e. the number of changes).
flags
  See notes below. Can also be set to zero.
Return Value
0 on success.
Nonzero otherwise.
Notes
If the ONLOAD_TEMPLATE_FLAGS_SEND_NOW flag is set, ownership of the template
is passed to Onload.
This function can be called multiple times and changes are cumulative.
Flags:
ONLOAD_TEMPLATE_FLAGS_SEND_NOW
  Perform the template update, send the template contents and pass ownership
  of the template to Onload.
  To send without updating template contents, set updates = NULL, ulen = 0
  and set the send now flag.
ONLOAD_TEMPLATE_FLAGS_DONTWAIT (same as MSG_DONTWAIT)
  Do not block.
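
The cumulative overwrite semantics of template updates can be modelled in ordinary memory. This is a minimal sketch, not the adapter-side implementation: struct update_iovec is a local stand-in mirroring struct onload_template_msg_update_iovec, apply_updates() is a hypothetical helper, and rejecting out-of-range or flagged updates with -1 is an assumption made for the sketch rather than documented Onload behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Local stand-in mirroring struct onload_template_msg_update_iovec
 * (assumption: defined here so the sketch builds without Onload headers). */
struct update_iovec {
  void    *otmu_base;    /* pointer to new data */
  size_t   otmu_len;     /* length of new data */
  off_t    otmu_offset;  /* offset within template to update */
  unsigned otmu_flags;   /* must be 0 */
};

/* Apply ulen updates, in order, to a template held in ordinary memory.
 * All updates are validated first, so on failure the template is left
 * unchanged. Returns 0 on success, -1 if any update is invalid. */
static int apply_updates(unsigned char *tmpl, size_t tmpl_len,
                         const struct update_iovec *updates, int ulen)
{
  int i;

  for( i = 0; i < ulen; ++i ) {
    const struct update_iovec *u = &updates[i];
    if( u->otmu_flags != 0 || u->otmu_offset < 0 ||
        (size_t)u->otmu_offset > tmpl_len ||
        u->otmu_len > tmpl_len - (size_t)u->otmu_offset )
      return -1;
  }
  for( i = 0; i < ulen; ++i )
    memcpy(tmpl + updates[i].otmu_offset, updates[i].otmu_base,
           updates[i].otmu_len);
  return 0;
}
```

In a market feed handler of the kind described earlier, each update would typically carry one changed field, such as a symbol or a price, at a fixed offset within the packet payload.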

MSG Template Abort
Description
Abort use of the template without sending the template and free the template
resources including the template handle and PIO region.
Definition
extern int onload_msg_template_abort(
  int fd,
  onload_template_handle* handle);
Formal Parameters
fd
  File descriptor owning the template.
handle
  Template handle, used to refer to this template.
Return Value
0 on success, nonzero otherwise.

D.9 Delegated Sends API
The delegated send API, supported by Solarflare SFN7000 and SFN8000 series
adapters, can lower the latency overhead incurred when calling send() on TCP
sockets by controlling TCP socket creation and management through Onload, but
allowing TCP sends directly through the Onload layer 2 ef_vi API or other
similar API.
Description
An application using the delegated sends API will prepare a packet buffer with
IP/TCP header data, before adding payload data to the packet. The packet
buffer can be prepared in advance and payload added just before the send is
required.
After each delegated send, the actual data sent (and length of that data) is
returned to Onload. This allows Onload to update the TCP internal state and
have the data to hand if retransmissions are required on the socket.
This feature is intended for applications that make sporadic TCP sends as
opposed to large amounts of bidirectional TCP traffic. The API should be used
with caution to send small amounts of TCP data. Although the packet buffer can
be prepared in advance of the send, the idea is to complete the delegated send
operation (onload_delegated_send_complete()) soon after the initial send to
maintain the integrity of the TCP internal state, i.e. so that
sequence/acknowledgment numbers are correct.
The user is responsible for serialization when using the delegated send API.
The first call should always be onload_delegated_send_prepare(). If a normal
send is required following the prepare, the user should use
onload_delegated_send_cancel().
NOTE: For a given file descriptor, while a delegated send is in progress, and
until complete has been called, the user should NOT attempt any standard
send(), write(), sendfile(), close() etc. operations.
Performance
For best latency the application should call onload_delegated_send_complete()
as soon as a delegated send is complete. This allows Onload to continue if
retransmissions are required.
WARNING: Onload cannot perform any retransmission until complete has been
called.
When a link partner has already acknowledged data before complete has been
called, Onload will not have to copy the sent data to the TCP retransmit
queue. So delaying the complete call may avoid a data copy, but latency may
suffer in the event of packet loss.

Standard send vs. Delegated Send
The following figure shows the sequence of events for a normal TCP send and
for a delegated send.
Figure 28: Standard vs. Delegated send.
Normal send:
1. Application calls send()
2. Packet enqueued for sending
3. TCP/IP stack generates network headers
4. Packet sent by TCP/IP stack, added to retransmit queue
5. TCP/IP state updated for subsequent headers
6. Peer replies
7. Packet removed from retransmit queue and TCP/IP state updated for
   subsequent headers
Delegated send:
1. Application calls onload_delegated_send_prepare()
2. TCP/IP stack generates network headers
3. Packet sent via ef_vi
4. Application calls onload_delegated_send_complete()
5. Packet added to retransmit queue
6. TCP/IP state updated for subsequent headers
7. Peer replies
8. Packet removed from retransmit queue and TCP/IP state updated for
   subsequent headers
A packet could be delayed before sending when the receiver or network is not
ready. When this occurs using delegated send, the
onload_delegated_send_prepare() function will return zero values in the
cong/send window fields of the delegated send state and the caller can elect
to send with the standard method.

Example Code
The Onload distribution includes the efdelegated_server.c and
efdelegated_client.c example applications to demonstrate the delegated sends
API. Variable and constant definitions, including socket flags and function
return codes required when using the API, can be found in the extensions.h
header file.
• /openonload-<version>/src/tests/ef_vi
• /openonload-<version>/build/gnu_x86_64/tests/ef_vi
Run client/server
[server1]# onload --profile=latency ./efdelegated_server -s -p 20003 <interface>
oo:efdelegated_serv[17921]: Using OpenOnload 201606-u1 Copyright 2006-2016
Solarflare Communications, 2002-2005 Level 5 Networks [3]
Waiting for client to connect
Accepted client connection
Starting event loop
n_lost_msgs: 0
n_samples: 50000
latency_mean: 5827
latency_min: 5493
latency_max: 343472
[server2]# onload --profile=latency ./efdelegated_client -d -p 20003 <interface> <server1 interface ip>
oo:efdelegated_clie[16950]: Using OpenOnload 201606-u1 Copyright 2006-2016
Solarflare Communications, 2002-2005 Level 5 Networks [1]
n_normal_sends: 4
n_delegated_sends: 54996

struct onload_delegated_send
struct onload_delegated_send {
  void* headers;
  int headers_len;   /* buffer len on input, headers len on output */
  int mss;           /* one packet payload may not exceed this */
  int send_wnd;      /* send window */
  int cong_wnd;      /* congestion window */
  int user_size;     /* the "size" value from send_prepare() call */
  int tcp_seq_offset;
  int ip_len_offset;
  int ip_tcp_hdr_len;
  int reserved[5];
};

onload_delegated_send_prepare
Description
Prepare to send up to size bytes. Allocate TCP headers and prepare them with
Ethernet, IP and TCP header data, including the current sequence number and
acknowledgment number.
Definition
enum onload_delegated_send_rc onload_delegated_send_prepare(
  int fd,
  int size,
  uint flags,
  struct onload_delegated_send*)
Formal Parameters
fd
  File descriptor to send on.
size
  Size of payload data.
flags
  See below.
struct onload_delegated_send*
  See struct onload_delegated_send.
Return Value
0 on success.
Nonzero otherwise.
Notes
This function can be called speculatively so that the packet buffer is
prepared in advance. Headers are added so that the packet payload data can be
added immediately before the send is required.
This function assumes the packet length is equal to MSS, in which case there
is no need to call onload_delegated_send_tcp_update().
Flags are used for ARP resolution:
• Default flags = 0.
• ONLOAD_DELEGATED_SEND_FLAG_IGNORE_ARP - do not do ARP lookup, the caller
  will provide the destination MAC address.
• ONLOAD_DELEGATED_SEND_FLAG_RESOLVE_ARP - if ARP information is not
  available, send a speculative TCP_ACK to provoke the kernel into ARP
  resolution and wait up to 1ms for ARP information to appear.
NOTE: TCP send window/congestion windows must be respected during delegated
sends.
See extensions.h for flags and return code values.
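
Respecting the send and congestion windows amounts to clamping the planned byte count against the values that the prepare call reports. The sketch below shows that arithmetic only. It assumes a local stand-in, struct ds_state, that copies four fields of struct onload_delegated_send so that the example builds without the Onload headers; the helper names are hypothetical.

```c
#include <assert.h>

/* Local stand-in holding the fields of struct onload_delegated_send that
 * this sketch reads (assumption: not the real Onload type). */
struct ds_state {
  int mss;        /* one packet payload may not exceed this */
  int send_wnd;   /* send window */
  int cong_wnd;   /* congestion window */
  int user_size;  /* the "size" value from the prepare call */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Largest number of bytes that may go out by delegated send without
 * exceeding either window or the size that was reserved. A result of 0
 * means the caller should fall back to a standard send. */
static int delegated_bytes_allowed(const struct ds_state *ds)
{
  int bytes = min_int(ds->send_wnd, ds->cong_wnd);
  bytes = min_int(bytes, ds->user_size);
  return bytes > 0 ? bytes : 0;
}

/* Payload length of the next packet when bytes_left remain to be sent. */
static int next_packet_len(const struct ds_state *ds, int bytes_left)
{
  return min_int(bytes_left, ds->mss);
}
```

A caller would evaluate delegated_bytes_allowed() immediately after a successful prepare and cancel the delegated send when the result is zero.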

onload_delegated_send_tcp_update
Description
This function does not send TCP data, but is called to update packet headers
with the sequence number and flags following successive sends via the
onload_delegated_send_tcp_advance() function.
NOTE: This function does not update the ACK number.
Definition
void onload_delegated_send_tcp_update(
  struct onload_delegated_send*,
  int size,
  int flags)
Formal Parameters
struct onload_delegated_send*
  See struct onload_delegated_send.
size
  Size of payload data.
flags
  See below.
Return Value
None.
Notes
This function is called when, during a send, the payload length is not equal
to the MSS value. See onload_delegated_send_prepare on page 256.
Flag TCP_FLAG_PSH is expected to be set on the last packet when sending a
large data chunk.

onload_delegated_send_tcp_advance
Description
Advance TCP headers after sending a TCP packet. This function is good for:
- sending a few small packets in rapid succession
- sending a large data chunk (> MSS) over multiple packets
The sequence number is updated for each outgoing packet. When a packet has
been sent, the application must call onload_delegated_send_tcp_update() to
update packet headers with the payload length, thereby ensuring that the
sequence number is correct for the next send.
This function does not update the ACK number in outgoing packets. The ACK
number in successive outgoing packets is the value from the last call to the
onload_delegated_send_prepare() function.
The advance function is used to send a small number of successive outgoing
packets, but the application should then call
onload_delegated_send_complete() to return control to Onload in order to
maintain sequence/acknowledgment number integrity and allow Onload to remove
sent data from the retransmit queue.
Definition
void onload_delegated_send_tcp_advance(
  struct onload_delegated_send*,
  int bytes)
Formal Parameters
struct onload_delegated_send*
  See struct onload_delegated_send.
bytes
  Number of bytes sent.
Return Value
None.
Notes
When sending a packet using multiple sends, the function is called to update
the header data with the number of bytes after each send.
The actual data sent is not returned to Onload until the function
onload_delegated_send_complete() is called.
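
Sending a chunk larger than the MSS over multiple packets comes down to splitting it into MSS-sized pieces and marking the final piece, which is where TCP_FLAG_PSH is expected. The sketch below plans that split without touching any Onload state. struct segment and plan_segments() are hypothetical names introduced for illustration, and the actual header handling between packets remains the job of the update and advance functions described above.

```c
#include <assert.h>

/* Hypothetical plan entry for one outgoing packet. */
struct segment {
  int offset;   /* offset of this packet's payload within the chunk */
  int len;      /* payload length of this packet, never more than mss */
  int is_last;  /* 1 for the final packet, where PSH is expected */
};

/* Split total bytes into packets of at most mss bytes each.
 * Returns the number of packets planned, 0 if there is nothing to send,
 * or -1 if more than max_out packets would be needed. */
static int plan_segments(int total, int mss, struct segment *out, int max_out)
{
  int n = 0, off = 0;

  if( total <= 0 || mss <= 0 )
    return 0;
  while( off < total ) {
    if( n == max_out )
      return -1;
    out[n].offset = off;
    out[n].len = (total - off < mss) ? total - off : mss;
    off += out[n].len;
    out[n].is_last = (off == total);
    ++n;
  }
  return n;
}
```

After each planned packet is sent, the application would advance the headers by that packet's len, and once the whole chunk is out it would pass the data to onload_delegated_send_complete().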

onload_delegated_send_complete
Description
Following a delegated send, this function is used to return the actual data
sent (and length of that data) to Onload, which will update the internal TCP
state, i.e. sequence numbers, and remove packets from the retransmit queue
(when appropriate ACKs are received).
Definition
int onload_delegated_send_complete(
  int fd,
  const struct iovec*,
  int iovlen,
  int flags)
Formal Parameters
fd
  The file descriptor.
struct iovec
  Pointer to the data sent.
iovlen
  Size (bytes) of the iovec array.
flags
  (MSG_DONTWAIT | MSG_NOSIGNAL)
Return Value
Number of bytes accepted, or -1 if an error occurs.
Notes
Onload is unable to do any retransmit until this function has been called.
This function should be called even if some (but not all) bytes specified in
the prepare function have been sent. The user must also call
onload_delegated_send_cancel() if some of the bytes are not going to be sent,
i.e. reserved but not sent. See the onload_delegated_send_cancel() notes
below.
This function can block because of the SO_SNDBUF limitation and will ignore
the SO_SNDTIMEO value.

onload_delegated_send_cancel
Description
No more delegated send is planned.
Normal send(), shutdown() or close() etc. can be called after this call.
Definition
int onload_delegated_send_cancel(int fd)
Formal Parameters
fd
  The file descriptor to be closed.
Return Value
0 on success.
Nonzero if an error occurs.
Notes
When TCP headers have been allocated with onload_delegated_send_prepare(), but
it is subsequently required to do a normal send, this function can be used to
cancel the delegated send operation and do a normal send.
There is no need to call this function before calling
onload_delegated_send_prepare().
There is no need to call this function if all the bytes specified in the
onload_delegated_send_prepare() function have been sent.
If some, but not all, bytes have been sent, you must call
onload_delegated_send_complete() for the sent bytes THEN call
onload_delegated_send_cancel() for the remaining (reserved but not sent)
bytes. This applies even if the reason for not sending is that the window
limits returned from the prepare function have been reached.
E onload_stackdump
E.1 Introduction
The Solarflare onload_stackdump diagnostic utility is a component of the Onload
distribution which can be used to monitor Onload performance, set tuning options
and examine aspects of the system performance.
NOTE: To view data for all stacks, created by all users, the user must be root when
running onload_stackdump. Non-root users can only view data for stacks created
by themselves and accessible to them via the EF_SHARE_WITH environment
variable.
The following examples of onload_stackdump are demonstrated elsewhere in this
user guide:
Monitoring Using onload_stackdump on page 47
Processing at User Level on page 49
As Few Interrupts as Possible on page 50
Eliminating Drops on page 51
Minimizing Lock Contention on page 52
E.2 General Use
The onload_stackdump tool can produce an extensive range of data, and it can be
more useful to limit output to specific stacks or to specific aspects of the system
performance for analysis purposes.
• For help, and to list all onload_stackdump commands and options:
onload_stackdump --help
• To list and read environment variable descriptions:
onload_stackdump doc
• For descriptions of statistics variables:
onload_stackdump describe_stats
Describes all statistics listed by the onload_stackdump lots command.
• To identify all stacks, by identifier and name, and all processes accelerated by
Onload:
onload_stackdump
#stack-id stack-name pids
6 teststack 28570
• To limit the command/option to a specific stack, e.g. stack 4:
onload_stackdump 4 lots
List Onloaded Processes
The onload_stackdump processes command will show the PID and name of
processes being accelerated by Onload and the Onload stack being used by each
process e.g.
# onload_stackdump processes
#pid stack-id cmdline
25587 3 ./sfnt-pingpong
Onloaded processes which have not created a socket are not displayed, but can be
identified using the lsof command.
Onloaded Threads, Priority, Affinity
The onload_stackdump threads command will identify threads within each
Onloaded process, the CPU affinity of the thread and runtime priority.
# onload_stackdump threads | column -t
#pid thread affinity priority realtime
12606 12606 00000002 0 0
List Onload Environment Variables
The onload_stackdump env command will identify onloaded processes running
in the current environment and list all Onload variables set in the current
environment e.g.
# EF_POLL_USEC=100000 EF_TXQ_SIZE=4096 EF_INT_DRIVEN=1 onload <application>
# onload_stackdump env
pid: 25587
cmdline: ./sfnt-pingpong
env: EF_POLL_USEC=100000
env: EF_TXQ_SIZE=4096
env: EF_INT_DRIVEN=1
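Output in this form is straightforward to post-process. The Python sketch below groups the pid, cmdline and env lines into one record per process. It assumes every line has the "key: value" shape shown in the example; the function name parse_env_output() is hypothetical, and real output may contain other line types, which the sketch ignores.

```python
def parse_env_output(text):
    """Group 'pid:', 'cmdline:' and 'env:' lines into one record per process."""
    processes = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        key, value = key.strip(), value.strip()
        if key == "pid":
            processes.append({"pid": int(value), "cmdline": "", "env": {}})
        elif key == "cmdline" and processes:
            processes[-1]["cmdline"] = value
        elif key == "env" and processes:
            name, _, setting = value.partition("=")
            processes[-1]["env"][name] = setting
    return processes


sample = """pid: 25587
cmdline: ./sfnt-pingpong
env: EF_POLL_USEC=100000
env: EF_TXQ_SIZE=4096
env: EF_INT_DRIVEN=1"""

for proc in parse_env_output(sample):
    print(proc["pid"], proc["cmdline"], proc["env"])
```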
TX PIO Counters
The Onload stackdump utility exposes counters to indicate how often TX PIO is being
used - see Debug and Logging on page 73. To view PIO counters run the following
command:
$ onload_stackdump stats | grep pio
pio_pkts: 2485971
no_pio_err: 0
The values returned will identify the number of packets sent via PIO and the number of
times when PIO was not used due to an error condition.
Send RST on a TCP Socket
To send a reset on an Onload accelerated TCP socket, specify the stack and socket
using the rst command:
# onload_stackdump <stack:socket> rst
Removing Zombie and Orphan Stacks
Onload stacks and sockets can remain active even after all processes using them
have been terminated or have exited, for example to ensure sent data is successfully
received by the TCP peer or to honor TCP TIME_WAIT semantics. Such stacks should
always eventually self-destruct and disappear with no user intervention. However,
these stacks, in some instances, cause problems for restarting applications, for
example the application may be unable to use the same port numbers when these
are still being used by the persistent stack socket. Persistent stacks also retain
resources such as packet buffers which are then denied to other stacks.
Such stacks are termed ‘zombie’ or ‘orphan’ stacks and it may be undesirable or
desirable that they exist.
• To list all persistent stacks:
# onload_stackdump -z all
No output to the console or syslog means that no such stacks exist.
• To list a specific persistent stack:
# onload_stackdump -z <stackID>
• To display the state of persistent stacks:
# onload_stackdump -z dump
• To terminate persistent stacks:
# onload_stackdump -z kill
• To display all options available for zombie/orphan stacks:
# onload_stackdump --help
Snapshot vs. Dynamic Views
The onload_stackdump tool presents a snapshot view of the system when invoked.
To monitor state and variable changes whilst an application is running use
onload_stackdump with the Linux watch command e.g.
• snapshot: onload_stackdump netif
• dynamic: watch -d -n1 onload_stackdump netif
Some onload_stackdump commands also update periodically whilst monitoring a
process. These commands usually have the watch_ prefix e.g.
watch_stats, watch_more_stats, watch_tcp_stats, watch_ip_stats etc.
Use the onload_stackdump -h option to list all commands.
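Where watch is not convenient, two snapshots can be compared offline. The following Python sketch assumes counters are printed one per line as "name: value", as in the PIO counter example; it reports only the counters whose values changed between two captures. The helper names and the second snapshot are invented for illustration and are not real captured output.

```python
def parse_counters(text):
    """Collect 'name: value' lines whose value is an integer."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            counters[name.strip()] = int(value)
    return counters


def changed_counters(before, after):
    """Return {name: delta} for counters that differ between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {name: new[name] - old[name]
            for name in new
            if name in old and new[name] != old[name]}


first = "pio_pkts: 2485971\nno_pio_err: 0"
second = "pio_pkts: 2486100\nno_pio_err: 0"  # made-up later snapshot
print(changed_counters(first, second))
```

Counters that stay the same are omitted, which mirrors the way watch -d highlights only the values that differ between refreshes.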
Monitoring Receive and Transmit Packet Buffers
onload_stackdump packets
# onload_stackdump packets
ci_netif_pkt_dump_all: id=1
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=437 current
pkt_bufs: max=32768 alloc=2048 free=981 async=0
pkt_bufs: rx=1067 rx_ring=1001 rx_queued=2 pressure_pool=64
pkt_bufs: tx=0 tx_ring=0 tx_oflow=0
pkt_bufs: in_loopback=0 in_sock=0
1003: 0x200 Rx
n_zero_refs=1045 n_freepkts=981 estimated_free_nonb=64
free_nonb=0 nonb_pkt_pool=ffffffffffffffff
The onload_stackdump packets command can be useful to review packet buffer
allocation, use and reuse within a monitored process.
The example above identifies that the process has a maximum of 32768 buffers
(each of 2048 bytes) available. From this pool 2048 buffers have been allocated and
981 from that allocation are currently free for reuse - that means they can be
pushed onto the receive or transmit ring buffers ready to accept new incoming/
outgoing data.
On the receive side of the stack, 1067 packet buffers have been allocated, 1001 have
been pushed to the receive ring - and are available for incoming packets, and 2 are
currently in the receive queue for the application to process.
On the transmit side of the stack, zero buffers are currently allocated or being used.
The remaining values are calculations based on the packet buffer values.
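The arithmetic behind the sample figures can be checked with a short Python sketch. It parses the key=value pairs from three of the lines shown above and recomputes the derived values. The pressure pool relation, pressure_pool = rx - (rx_ring + rx_queued), is the one given in the onload_stackdump lots tables later in this appendix; the relations between packet sets and buffer totals are an observation about this sample rather than a documented guarantee, and the helper name parse_fields() is hypothetical.

```python
import re


def parse_fields(line):
    """Return the decimal key=value pairs found on one line of output."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", line)}


sets = parse_fields("pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2")
bufs = parse_fields("pkt_bufs: max=32768 alloc=2048 free=981 async=0")
rx = parse_fields("pkt_bufs: rx=1067 rx_ring=1001 rx_queued=2 pressure_pool=64")

# Buffers available in total and buffers allocated so far, derived from sets.
max_buffers = sets["max"] * sets["set_size"]
allocated_buffers = sets["alloc"] * sets["set_size"]

# Receive buffers that are neither on the ring nor queued for the application.
pressure_pool = rx["rx"] - (rx["rx_ring"] + rx["rx_queued"])

print(max_buffers, allocated_buffers, pressure_pool)
```

For the sample output all three recomputed values agree with the reported max, alloc and pressure_pool fields.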
Using the EF_PREFAULT_PACKETS environment variable, packets can be
pre-allocated to the user process when an Onload stack is created. This can reduce
latency jitter and improve Onload performance - for further details see Prefault
Packet Buffers on page 47.
Packet Sets
A packet set is a 2MB chunk of packet buffers being used by an Onload application.
An application might use buffers from a single set or from several sets depending on
its complexity and packet buffer requirements.
With an aim to further reduce TLB thrashing and eliminate packet drops, Onload
will try to reuse buffers from the same set.
The onload_stackdump lots command in Onload 201509 will report on the current
use of packet sets e.g:
$ onload_stackdump lots | grep pkt_set
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=442 current
In the above output there are 2 packet sets; the counters identify the number of free
packet buffers in each set and identify the set currently being used.
The packet sets feature is not available to user applications using the ef_vi layer
directly.
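As a sketch of how the packet set lines might be summarised, the Python function below extracts the size of one set in bytes, the free count of each set, and the index of the set marked current. It assumes the line layout shown in the example above; the function name summarize_packet_sets() is hypothetical.

```python
import re


def summarize_packet_sets(text):
    """Summarise pkt_sets / pkt_set[n] lines from onload_stackdump output."""
    summary = {"set_bytes": None, "free": {}, "current": None}
    for line in text.splitlines():
        header = re.match(r"pkt_sets:\s*pkt_size=(\d+)\s+set_size=(\d+)", line)
        if header:
            # Bytes per set = buffer size multiplied by buffers per set.
            summary["set_bytes"] = int(header.group(1)) * int(header.group(2))
            continue
        entry = re.match(r"pkt_set\[(\d+)\]:\s*free=(\d+)(\s+current)?", line)
        if entry:
            index = int(entry.group(1))
            summary["free"][index] = int(entry.group(2))
            if entry.group(3):
                summary["current"] = index
    return summary


sample = """pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
pkt_set[0]: free=544
pkt_set[1]: free=442 current"""

info = summarize_packet_sets(sample)
print(info["set_bytes"], info["free"], info["current"])
```

With 2048-byte buffers and 1024 buffers per set, the computed set size is 2097152 bytes, which is the 2MB chunk described above.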
TCP Application STATS
The following onload_stackdump commands can be used to monitor accelerated
TCP connections:
onload_stackdump tcp_stats
onload_stackdump more_stats | grep tcp
Field                  Description
tcp_active_opens       Number of socket connections initiated by the local end
tcp_passive_opens      Number of socket connections accepted by the local end
tcp_attempt_fails      Number of failed connection attempts
tcp_estab_resets       Number of established connections which were
                       subsequently reset
tcp_curr_estab         Number of socket connections in the established
                       or close_wait states
tcp_in_segs            Total number of received segments - includes
                       errored segments
tcp_out_segs           Total number of transmitted segments - excluding
                       segments containing only retransmitted octets
tcp_retran_segs        Total number of retransmitted segments
tcp_in_errs            Total number of segments received with errors
tcp_out_rsts           Number of reset segments sent
Field                  Description
tcp_has_recvq          Nonzero if receive queue has data ready
tcp_recvq_bytes        Total bytes in receive queue
tcp_recvq_pkts         Total packets in receive queue
tcp_has_recv_reorder   Nonzero if socket has out of sequence bytes
tcp_recv_reorder_pkts  Number of out of sequence packets received
tcp_has_sendq          Nonzero if send queues have data ready
tcp_sendq_bytes        Number of bytes currently in all send queues for
                       this connection
tcp_sendq_pkts         Number of packets currently in all send queues for
                       this connection
tcp_has_inflight       Nonzero if some data remains unacknowledged
tcp_inflight_bytes     Total number of unacknowledged bytes
tcp_inflight_pkts      Total number of unacknowledged packets
tcp_n_in_listenq       Number of sockets (summed across all listening
                       sockets) where the local end has responded to
                       SYN, with a SYN_ACK, but this has not yet been
                       acknowledged by the remote end
tcp_n_in_acceptq       Number of sockets (summed across all listening
                       sockets) that are currently queued waiting for the
                       local application to call accept()
Use the onload_stackdump -h command to list all TCP connection, stack and
socket commands.
The onload_stackdump LOTS Command
The onload_stackdump lots command will produce extensive data for all
accelerated stacks and sockets. The command can also be restricted to a specific
stack and its associated connections when the stack number is entered on the
command line e.g.
onload_stackdump lots
onload_stackdump 2 lots
For descriptions of the statistics refer to the output from the following command:
onload_stackdump describe_stats
The following tables describe the output from the onload_stackdump lots
command for:
• TCP stack
• TCP established connection socket
• TCP listening socket
• UDP socket
Within the tables the following abbreviations are used:
rx = receive (or receiver), tx = transmit (or send)
pkts = packets, skts = sockets
Max = maximum, num = number, seq = sequence number
Table 3: Stackdump Output: TCP Stack
Sample output
    Description
onload_stackdump lots
    Command entered.
ci_netif_dump_to_logger: stack=7 name=
    Stack id and stack name as set by EF_NAME.
ver=201606 uid=0 pid=1930 ns_flags=0
    Onload version, user id and process id of creator process.
lock=20000000 LOCKED nics=3 primed=1
    Internal stack lock status.
    nics = bitfield identifies adapters used by this stack,
    e.g. 3 = 0b11 - so stack is using NICs 1 and 2.
    primed=1 means the event queue will generate an interrupt when the
    next event arrives.
sock_bufs: max=8192 n_allocated=4
    Max number of socket buffers which can be allocated, and number
    currently in use.
aux_bufs: max=8192 allocated=7 free=6
    Buffers used by partially opened TCP connections (incoming
    connections) before they are established and promoted to use socket
    buffers. Aux buffers limited to EF_TCP_SYNRECV_MAX * 2.
pkt_sets: pkt_size=2048 set_size=1024 max=32 alloc=2
    Packet buffers are 2KB in size. There are 1024 buffers in each packet set.
    A maximum of 32 packet sets are available to this stack.
    2 packet sets are currently allocated.
pkt_set[0]: free=112
    Packet set 0 with 112 free pkt buffers.
pkt_set[1]: free=880 current
    Packet set 1 with 880 free 2KB pkt buffers - this is the pkt_set
    currently being used.
pkt_bufs: max=32768 alloc=576 free=57 async=0
    Packet buffers:
    This stack is limited to allocating a maximum of 32768 packet buffers
    (each of 2048 bytes). 576 have been allocated of which 57 are free and
    can be reused by either receive or transmit rings.
    async = packet buffers used by Onload in one of its asynchronous queues
pkt_bufs: rx=1056 rx_ring=992 rx_queued=0 pressure_pool=64
    Receive packet buffers:
    A total of 1056 pkt buffers are currently in use, 992 have been pushed
    to the receive ring, 0 are in the application's receive queue.
    pressure_pool = rx - (rx_ring + rx_queued) - is a pool of pkt buffers
    used when the stack is under memory pressure.
    If the CRITICAL flag is displayed it indicates a memory pressure
    condition in which the number of packets in the receive socket buffers
    is approaching the EF_MAX_RX_PACKETS value.
    If the LOW flag is displayed it indicates a memory pressure condition
    when there are not enough packet buffers available to refill the RX
    descriptor ring.
pkt_bufs: tx=2 tx_ring=1 tx_oflow=0
    Transmit packet buffers:
    A total of 2 pkt buffers are currently in use, 1 remains in the
    transmit ring.
    tx_oflow = the number of extra packets that are ready to send to the
    transmit queue, but that the transmit queue doesn't have space to accept.
pkt_bufs: in_loopback=0 in_sock=991
    Number of pkt_bufs currently used in TCP loopback connection or by a
    TCP socket.
time: netif=5eb5c61 poll=5eb5c61 now=5eb5c61 (diff=0.000sec)
    Internal timer values.
active cache: hit=0 avail=0 cache=EMPTY pending=EMPTY
    TCP socket caching.
    hit = number of cache hits (were cached)
    avail = number of sockets available for caching
    current cache state
ci_netif_dump_vi: stack=7 intf=0 dev=(pci address) hw=0C0
vi=240 pd_owner=1 channel=0 oo_vi_flags=3
    Data describes the stack's virtual interface to the NIC.
    vi identifies the VI in use by the stack.
    protection domain owner will be zero when using physical addressing
    mode.
    channel identifies the rx queue being used on this interface.
    bitmask - VI is using adapters 1 and 2
evq: cap=2048 current=16de30 is_32_evs=0 is_ev=0
    Event queue data:
    cap - max num of events queue can hold
    current - the current event queue location
    is_32_evs - is 1 if there are 32 or more events pending
    is_ev - is 1 if there are any events pending
rxq: cap=511 lim=511 spc=1 level=510 total_desc=93666
    Receive queue data:
    cap - total capacity
    lim - max fill level for receive descriptor ring, specified by
    EF_RXQ_LIMIT
    spc - amount of free space in receive queue - how many descriptors
    could be added before the receive queue becomes full
    level - how full the receive queue currently is
    total_desc - total number of descriptors that have been pushed to the
    receive queue
txq: cap=511 lim=511 spc=511 level=0 pkts=0 oflow_pkts=0
    Transmit queue data:
    cap - total capacity
    lim - max fill level for transmit descriptor ring, specified by
    EF_TXQ_LIMIT
    spc - amount of free space in the transmit queue - how many descriptors
    could be added before the transmit queue becomes full
    level - how full the transmit queue currently is
    pkts - how many packets are represented by the descriptors in the
    transmit queue
    oflow_pkts - how many packets are in the overflow transmit queue
    (i.e. waiting for space in the NIC's transmit queue)
txq: pio_buf_size=2048 tot_pkts=93669 bytes=0
    Total number of packets sent and number of packet bytes currently in
    the queue.
ci_netif_dump_extra: stack=7
    Additional data follows.
in_poll=0 post_poll_list_empty=1 poll_did_wake=0
    Stack Polling Status:
    in_poll = process is currently polling
    post_poll_list_empty = 1 (1=true, 0=false) - tasks to be done once
    polling is complete
    poll_did_wake = while polling, the process identified a socket which
    needs to be woken following the poll
rx_defrag_head=1 rx_defrag_tail=1
    Reassembly sequence numbers. -1 means no re-assembly has occurred.
tx_may_alloc=1 can=1 nonb_pool=1 send_may_poll=0 is_spinner=0
    TCP buffer data:
    tx_may_alloc = num pkt buffers tcp could use
    nonb_pool = number of pkt buffers available to tcp process without
    holding lock
    send_may_poll = 0
    is_spinner = TRUE if a thread is spinning
hwport_to_intf_i=0,1,1,1,1,1
intf_i_to_hwport=0,0,0,0,0,0
    Internal mapping of internal interfaces to hardware ports.
uk_intf_ver=03e89aa26d20b98fd08793e771f2cdd9
    md5 user/kernel interface checksum computed by both kernel and user
    application to verify internal data structures.
deferred count 0/32
numa nodes: creation=0 load=0
numa node masks: packet alloc=1 sock alloc=1 interrupt=1
    NUMA node parameters - refer to Onload Deployment on NUMA Systems
    on page 36.
pids: 14025
    List of processes being accelerated by Onload on this stack.
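Two of the fields in Table 3 lend themselves to a small worked example. The Python sketch below decodes the nics bitfield into adapter numbers, following the table's own example in which the value 3 means NICs 1 and 2, and recomputes the free space in a descriptor ring. The relation spc = lim - level is inferred from the sample rxq and txq values rather than stated in this guide, and both function names are hypothetical.

```python
def nics_from_bitfield(nics):
    """List the 1-based adapter numbers whose bits are set in the nics field."""
    return [bit + 1 for bit in range(nics.bit_length()) if (nics >> bit) & 1]


def queue_space(lim, level):
    """Descriptors that can still be added before the ring reaches its limit.

    Assumption: free space is the fill limit minus the current fill level.
    """
    return lim - level


print(nics_from_bitfield(3))               # nics=3 from the sample stack
print(queue_space(lim=511, level=510))     # sample rxq values
print(queue_space(lim=511, level=0))       # sample txq values
```

For the sample output the computed space values match the reported spc fields (1 for the receive queue and 511 for the transmit queue).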
Table 4: Stackdump Output: TCP Established Connection Socket
Sample output
    Description
TCP 7:1 lcl=192.168.1.2:50773 rmt=192.168.1.1:34875 ESTABLISHED
    Socket Configuration.
    Stack:socket id, local and remote ip:port address, TCP connection is
    ESTABLISHED.
lock: 10000000 UNLOCKED
    Internal stack lock status.
rx_wake=0000b6f4(RQ) tx_wake=00000002 flags:
    Internal sequence values that are incremented each time a queue is
    ‘woken’.
ul_poll: 301326900 spin cycles 100000 usec
uid=0 s_flags:
rcvbuf=129940 sndbuf=131072 bindtodev=1(1,0:0) ttl=64
    Socket receive buffer size, send buffer size,
rcvtimeo_ms=0 sndtimeo_ms=0 sigown=0 cmsg=
    Timeout values (milliseconds) before an error is generated for
    send/receive functions as set by SO_RCVTIMEO, SO_SNDTIMEO.
    sigown identifies the PID receiving signals from this socket.
rx_errno=0 tx_errno=0 so_error=0 os_sock=0,TX
    rx_errno = ZERO whilst data can still arrive, otherwise contains error
    code.
    tx_errno = ZERO if transmit can still happen, otherwise contains error
    code.
    so_error = current socket error (0 = no error)
    os_sock = identify socket handled by the OS and not by Onload.
    0 means handled by Onload.
epoll3: ready_list_id 0
    List of ready sockets from the epoll3 set.
tcpflags: TSO WSCL SACK ESTAB PASSIVE
local_peer: -1
    TCP flags currently set for this socket.
    local_peer = Identify the peer socket in a local loopback connection.
snd: up=b554bb86 una-nxt-max=b554bb86-b554bb87-b556b6a6 enq=b554bb87
    TCP sequence numbers.
    up = (urgent pointer) sequence of byte following the OOB byte
    una-nxt-max = sequence number of first unacknowledged byte, sequence
    number of next byte we expect to be acknowledged and max = sequence
    of last byte in the current send window
    enq = sequence number of last byte currently queued for transmit
snd: send=0(0) pre=0 inflight=1(1) wnd=129824 unused=129823
    Send Data.
    send = number of pkts (bytes) sent
    pre = number of pkts in pre-send queue. A process can add data to the
    prequeue when it is prevented from sending the data immediately. The
    data will be sent when the current sending operation is complete.
    inflight = number of pkts (bytes) sent but not yet acknowledged
    wnd = receiver's advertised window size (bytes) and number of free
    (unused) space (bytes) in that window
snd: cwnd=49733+0 used=0 ssthresh=65535 bytes_acked=0 Open
    Congestion window (cwnd).
    cwnd = congestion window size (bytes)
    used = portion of the cwnd currently in use
    slow start thresh - number of bytes that have to be sent before
    process can exit slow start
    bytes_acked = number of bytes acknowledged - this value is used to
    calculate the rate at which the congestion window is opened
    current cwnd status = OPEN
snd: sndbuf_pkts=136 Onloaded(Valid) if=6 mtu=1500 intf_i=0 vlan=0 encap=4
    Onloaded = can reach the destination via an accelerated interface.
    sndbuf_pkts = size of the send buffer (pkts). Send buffer is
    calculated as bytes.
    (Valid) = cached control plane information is up to date, can send
    immediately using this information.
    (Old) = cached control plane information may be out of date. On next
    send Onload will do a control plane lookup - this will add some latency.
snd: limited rwnd=0 cwnd=0 nagle=0 more=0 app=412548
    Counts of reasons transmission was stopped by:
    rwnd = receive window size
    cwnd = congestion window size
    nagle = Nagle's algorithm
    more = more (CORK, MSG_MORE)
    app = TX stopped because TXQ is empty.
rcv: nxt-max=0e9251fe-0e944d1d wnd adv=129823 cur=0e944d92 FASTSTART FAST
    Receiver Data.
    nxt-max = next byte we expect to receive and last byte we expect to
    receive (because of window size)
    wnd adv = receiver advertised window size
    cur = byte currently being processed
rcv: bytes=13201600 rob_pkts=0 q_pkts=2+0 usr=0
    Reorder buffer.
    Bytes received out of sequence are put into a reorder buffer awaiting
    further bytes before reordering can occur.
    usr = number of bytes of received data available to the user
eff_mss=1448 smss=1460 amss=1460 used_bufs=2
    Max Segment Size.
    eff_mss = effective_mss
    smss = sender mss
    amss = advertised mss
    used_bufs = number of transmit buffers used
srtt=01 rttvar=000 rto=189 zwins=0,0
    Round trip time (RTT) - all values are milliseconds.
    srtt = smoothed RTT value
    rttvar = RTT variation
    rto = current RTO timeout value
    zwins = zero windows, times when advertised window has gone to zero size.
curr_retrans=0 total_retrans=0 dupacks=0
    Retransmissions.
    curr_retrans
    total_retrans
    dupacks = number of duplicate acks received
rtos=0 frecs=0 seqerr=0,0 ooo_pkts=0 ooo=0
    rtos = number of retrans timeouts
    frecs = number of fast recoveries
    seqerr = number of sequence errors
    ooo_pkts = number of out of sequence pkts
    ooo = number of out of order events
tx: defer=0 nomac=0 warm=0 warm_aborted=0
    defer = number of pkts where send is deferred to stack lock holder.
    nomac = number of pkts sent via the OS using raw sockets when up to
    date ARP data is not available.
    warm = number of pkts sent using MSG_WARM.
    warm_aborted = number of times a message warm send function was
    called, but not sent due to onload lock constraints.
timers:
    Currently active timers.
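The sequence number fields in Table 4 can be related to the window counters with 32-bit modular arithmetic. The Python sketch below is an illustration under the assumption that the una, nxt and max values are 32-bit TCP sequence numbers printed in hexadecimal; the helper names seq_diff() and send_window() are hypothetical. The relations it computes are observations about the sample values in the table rather than documented guarantees.

```python
SEQ_MODULUS = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_diff(later, earlier):
    """Distance from earlier to later in 32-bit sequence space."""
    return (later - earlier) % SEQ_MODULUS


def send_window(una_hex, nxt_hex, max_hex):
    """Derive in-flight bytes, window size and unused window from una, nxt, max."""
    una, nxt, top = (int(value, 16) for value in (una_hex, nxt_hex, max_hex))
    return {
        "inflight": seq_diff(nxt, una),  # sent but not yet acknowledged
        "wnd": seq_diff(top, una),       # whole send window
        "unused": seq_diff(top, nxt),    # window space still available
    }


# Values from the snd: and rcv: rows of the sample output.
print(send_window("b554bb86", "b554bb87", "b556b6a6"))
print(seq_diff(int("0e944d1d", 16), int("0e9251fe", 16)))
```

For the sample socket the derived send values are 1, 129824 and 129823, matching the inflight, wnd and unused fields, and the receive side difference of 129823 matches the advertised window. The modulo keeps the result correct when a sequence number wraps past zero.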
Table 5: Stackdump Output: TCP Stack Listen Socket
Sample output
    Description
TCP 7:3 lcl=0.0.0.0:50773 rmt=0.0.0.0:0 LISTEN
    Socket configuration.
    stack:socket id, LISTENING socket on port 50773
    local and remote addresses not set - not bound to any IP addr
lock: 10000000 UNLOCKED
    Internal stack lock status.
rx_wake=00000000 tx_wake=00000000 flags:
    Internal sequence values that are incremented each time a queue is
    ‘woken’.
addr_spc_id=fffffffffffffffe s_flags: REUSE BOUND PBOUND
    Address space identifier in which this socket exists and flags set on
    the socket.
    Allow bind to reuse local port.
rcvbuf=129940 sndbuf=131072
rx_errno=6b tx_errno=20 so_error=0
    Receive Buffer.
    socket receive buffer size, send buffer size,
    rx_errno = ZERO whilst data can still arrive, otherwise contains error
    code.
    tx_errno = ZERO if transmit can still happen, otherwise contains error
    code.
    so_error = current socket error (0 = no error)
tcpflags: WSCL SACK
    Flags advertised during handshake.
listenq: max=1024 n=0
    Listen Queue.
    queue of half open connects (SYN received and SYNACK sent - waiting
    for final ACK)
    n - number of connections in the queue
acceptq: max=5 n=0 get=1 put=1 total=0
    Accept Queue.
    queue of open connections, waiting for application to call accept().
    max = max connections that can exist in the queue
    n = current number of connections
    get/put = indexes for queue access
    total = num of connections that have traversed this queue
epcache: n=0 cache=EMPTY pending=EMPTY
    Endpoint cache.
    n = number of endpoints currently known to this socket
    cache = EMPTY or yes if endpoints are still cached
    pending = EMPTY or yes if endpoints still have to be cached
defer_accept=0
    Number of times TCP_DEFER_ACCEPT kicked in - see TCP socket options.
l_overflow=0 l_no_synrecv=0
a_overflow=0 a_no_sock=0 ack_rsts=0 os=2
    l_overflow = number of times listen queue was full and had to reject
    a SYN request
    l_no_synrecv = number of times unable to allocate internal resource
    for SYN request
    a_overflow = number of times unable to promote connection to the
    accept queue which is full
    a_no_sock = number of times unable to create socket
    ack_rsts = number of times received an ACK before SYN so the
    connection was reset
    os=2 there are 2 sockets being processed in the kernel
Table 6: Stackdump Output: UDP Socket
Sample output
    Description
UDP 4:1 lcl=192.168.1.2:38142 rmt=192.168.1.1:42638 UDP
    Socket Configuration.
    stack:socket id, UDP socket on port 38142
    Local and remote addresses and ports
lock: 20000000 LOCKED
    Stack internal lock status.
rx_wake=000e69b0 tx_wake=000e69b1 flags:
    Internal sequence values that are incremented each time a queue is
    ‘woken’.
ul_poll: 0 spin cycles 0 (usec)
    ul_poll
    spin_cycles is the spin duration.
uid=0 s_flags: FILTER
rcvbuf=129024 sndbuf=129024 bindtodev=1(01,0:0) ttl=64
    Buffers.
    socket receive buffer size, send buffer size,
rcvtimeo_ms=0 sndtimeo_ms=0 sigown=0 cmsg=
    Timeout values (milliseconds) for send/receive functions as set by
    SO_RCVTIMEO, SO_SNDTIMEO.
    sigown identifies the PID receiving signals from this socket.
rx_errno=0 tx_errno=0 so_error=0 os_sock=0,TX
    rx_errno = ZERO whilst data can still arrive, otherwise contains error
    code.
    tx_errno = ZERO if transmit can still happen, otherwise contains error
    code.
    so_error = current socket error (0 = no error)
epoll3: ready_list_id 0
    List of file descriptors ready in the epoll set.
udpflags: FILT MCAST_LOOP RXOS
    Flags set on the UDP socket.
mcast_snd: intf=1 ifindex=0 saddr=0.0.0.0 ttl=1 mtu=1500
    Multicast.
    intf = multicast hardware port id (1 means port was not set)
    ifindex = interface (port) identifier
    saddr = IP address
    ttl = time to live (default for multicast = 1)
    mtu = max transmission unit size
rcv: q_bytes=0 q_pkts=0 reap=2 tot_bytes=30225920 tot_pkts=944560
    Receive Queue.
    q_bytes = num bytes currently in rx queue
    q_pkts = num pkts currently in rx queue
    tot_bytes = total bytes received on this socket
    tot_pkts = total pkts received
rcv: oflow_drop=0(0%) mem_drop=0 eagain=0 pktinfo=0 q_max_pkts=0
    Overflow Buffer.
    oflow_drop = number of packets dropped because the buffer is full.
    mem_drop = number of datagrams dropped due to running out of packet
    buffer memory.
    eagain = number of times the application tried to read from a socket
    when there is no data ready - this value can be ignored on the rcv side
    pktinfo = number of times IP_PKTINFO control message was received
    q_max_pkts = max depth reached by the receive queue (packets)
rcv: os=0(0%) os_slow=0 os_error=0
    Number of datagrams received via:
    os = operating system
    os_slow = operating system slow socket
    os_error = recv() function call via OS returned an error
snd: q=0+0 ul=944561 os=0(0%) os_slow=0(0%)
    Send values.
    q = number of bytes sent to the interface but not yet transmitted
    ul = number of datagrams sent via onload
    os = number of datagrams sent via OS
    os_slow = number of datagrams sent via OS slow path
snd: cp_match=0(0%)
    Unconnected UDP send.
    cp_match = number of dgrams sent via accelerated path and percent this
    is of all unconnected send dgrams
snd: lk_poll=0(0%) lk_pkt=944561(100%) lk_snd=0(0%)
    Stack internal lock.
    lk_poll = number of times the lock was held while we poll the stack
    lk_pkt = number of pkts sent while holding the lock
    lk_snd = number of times the lock was held while sending data
snd: lk_defer=0(0%) cached_daddr=0.0.0.0
    Sending deferred to the process/thread currently holding the lock.
snd: LOCK cp=1(0%) pkt=737815(99%) snd=3(0%) poll=0(0%) defer=1(0%)
    cp = count lock held while updating control plane
    pkt = count locks to get pkt buffer
    snd = count locks held when sending
    poll = count lock held to poll stack
    defer = count - sends deferred to lock holder
snd: MCAST if=9 src=172.16.128.28 ttl=1
    Details of the interfaces being used by the UDP stack.
snd: TO n=737820 match=737819(99%) lookup=1+0(0%) Onloaded(Valid)
    total number of UDP packets sent on this socket
    --
    --
    UDP traffic is accelerated by Onload
snd: TO if=9 mtu=1500 intf_i=0 vlan=0 encap=4
    Details of the interface being used to send UDP traffic.
snd: TO 172.16.128.28:34645 => 224.1.2.3:8001
    UDP send multicast source address:port and multicast address:port
snd: CON n=0 lookup=0 NoRoute(Old)
snd CON if=0 mtu=0 intf_i=1 vlan=0 encap=0
    Details of the interface being used to send UDP traffic.
snd: eagain=0 spin=0 block=0
    eagain = count of the number of times the application tried to send
    data, but the transmit queue is already full. A high value on the send
    side may indicate transmit issues.
    spin = number of times process had to spin when the send queue was full
    block = number of times process had to block when the send queue was
    full
snd: poll_avoids_full=0 fragments=0 confirm=0
    poll_avoids_full = number of times polling created space in the send
    queue
    fragments = number of (non-first) fragments sent
    confirm = number of datagrams sent with MSG_CONFIRM flag
snd: os_slow=1 os_late=0 unconnect_late=0 nomac=0(0%)
    os_slow = number of packets sent on the os_slow path
    os_late = number of pkts sent via OS after copying
    unconnect_late = number of pkts silently dropped when process/thread
    becomes disconnected during a send procedure
    nomac = count number of times when no MAC address was known, so ARP
    was required before delivering traffic.
Following the stack and socket data, onload_stackdump lots will display a list of
statistical data. For descriptions of the fields refer to the output from the following
command:
onload_stackdump describe_stats
The final list produced by onload_stackdump lots shows the current values of all
environment variables in the monitored process environment. For descriptions of
the environment variables refer to Parameter Reference on page 163 or use the
onload_stackdump doc command.
Remote Monitoring
Introduced in Onload 201502, the remote monitoring feature uses a simple client/
server model to export the Onload stack and socket data to a remote server(s). The
remote monitor (server) process is installed along with the Onload distribution. A
simple example client process is also provided.
The server process (on the machine to be monitored) can be started from the
following directory:
openonload-201502/src/tools/onload_remote_monitor
Start the monitor server process identifying a port through which server/client
processes will connect:
# ./onload_remote_monitor <port>
The example client process can be found in the following directory:
openonload-201502/src/tests/onload/onload_remote_monitor
From the remote machine, start the client process identifying the server host
machine and port number:
# ./orm_example_client <serverhost>:<port>
There is very little overhead to onload_remote_monitor. An additional thread is
created and works in the same way as onload_stackdump, without taking any stack
locks and reading data from shared memory.
In the initial release the remote_monitor server will export an extensive list of
counters from the Onload stacks and sockets. Data is exported in JSON format for
processing by a remote application.
Remote monitoring is an exploratory feature and it is planned that future
continuous development will include data requested by direct customer input and
feedback.
Customers interested in remote monitoring are asked to provide feedback and
monitoring requirements by sending an email to support@solarflare.com.
F Solarflare sfnettest
F.1 Introduction
Solarflare sfnettest is a set of benchmark tools and test utilities supplied by
Solarflare for benchmark and performance testing of network servers and network
adapters. The sfnettest is available in binary and source forms from:
http://www.openonload.org/
Download the sfnettest-<version>.tgz source file and unpack using the tar
command.
tar -zxvf sfnettest-<version>.tgz
Run the make utility from the /sfnettest-<version>/src subdirectory to build
the benchmark applications.
Refer to the README.sfnt-pingpong or README.sfnt-stream files in the
distribution directory once sfnettest is installed.
sfnt-pingpong
Description
The sfnt-pingpong application measures TCP and UDP latency by creating a single
socket between two servers and running a simple message pattern between them.
The output identifies latency and statistics for increasing TCP/UDP packet sizes.
Usage
sfnt-pingpong [options] [<tcp|udp|pipe|unix_stream|unix_datagram>
[<host[:port]>]]
Options
sfnt-pingpong options:
Option        Description
--port        server port
--sizes       single message size (bytes)
--connect     connect() UDP socket
--spin        spin on nonblocking recv()
--muxer       select, poll or epoll
--servmuxer   none, select, poll or epoll (same as client by default)
--rtt         report round trip time
--raw         dump raw results to files
--percentile  percentile
--minmsg      minimum message size
--maxmsg      maximum message size
--minms       min time per msg size (ms)
--maxms       max time per msg size (ms)
--miniter     minimum iterations for result
--maxiter     maximum iterations for result
--mcast       use multicast addressing
--mcastintf   set the multicast interface. The client sends this parameter
              to the server.
              --mcastintf=eth2 both client and server use eth2
              --mcastintf='eth2;eth3' client uses eth2 and server uses
              eth3 (quotes are required for this format)
--mcastloop   IP_MULTICAST_LOOP
--bindtodev   SO_BINDTODEVICE
--forkboth    fork client and server
--npipe       include pipes in file descriptor set
--nunixd      include unix datagrams in the file descriptor set
--nunixs      include unix streams in the file descriptor set
--nudp        include UDP sockets in file descriptor set
--ntcpc       include TCP sockets in file descriptor set
--ntcpl       include TCP listening sockets in file descriptor set
--tcpserv     host:port for TCP connections
--timeout     socket SND/RECV timeout
--affinity    '<client_core>;<server_core>' Enclose values in quotes.
              This option should be set on the client side only. The client
              sends the <server_core> value to the server. The user must
              ensure that the identified server core is available on the
              server machine.
              This option will override any value set by taskset on the
              same command line.
--npings      number of ping messages
--npongs      number of pong messages
--nodelay     enable TCP_NODELAY
Standard options:
Option        Description
-? --help     this message
-q --quiet    quiet
-v --verbose  display more information
Examples
Example TCP latency command lines
[server]# onload --profile=latency taskset -c 1 ./sfnt-pingpong
[client]# onload --profile=latency taskset -c 1 ./sfnt-pingpong \
--maxms=10000 --affinity "1;1" tcp <server_ip>
Example UDP latency command lines
[server]# onload --profile=latency taskset -c 9 ./sfnt-pingpong
[client]# onload --profile=latency taskset -c 9 ./sfnt-pingpong \
--maxms=10000 --affinity "9;9" udp <server_ip>
Example output
# version: 1.5.0
# src: 8dc3b027d85b28bedf9fd731362e4968
# date: Tue 9 Feb 13:15:46 GMT 2016
# uname: Linux dellr210g2q.uk.level5networks.com 3.10.0-327.el7.x86_64 #1
SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
# cpu: model name : Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz
# lspci: 05:00.0 Ethernet controller: Intel Corporation I350 Gigabit
Network Connection (rev 01)
# lspci: 05:00.1 Ethernet controller: Intel Corporation I350 Gigabit
Network Connection (rev 01)
#lspci:83:00.0Ethernetcontroller:SolarflareCommunicationsSFC9020
[Solarstorm]
#lspci:83:00.1Ethernetcontroller:SolarflareCommunicationsSFC9020
[Solarstorm]
# lspci: 85:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
Network Connection
# eth0: driver: igb
# eth0: version: 3.0.6-k
# eth0: bus-info: 0000:05:00.0
# eth1: driver: igb
# eth1: version: 3.0.6-k
# eth1: bus-info: 0000:05:00.1
# eth2: driver: sfc
# eth2: version: 3.2.1.6083
# eth2: bus-info: 0000:83:00.0
# eth3: driver: sfc
# eth3: version: 3.2.1.6083
# eth3: bus-info: 0000:83:00.1
# eth4: driver: e1000e
# eth4: version: 1.4.4-k
# eth4: bus-info: 0000:85:00.0
# virbr0: driver: bridge
# virbr0: version: 2.3
# virbr0: bus-info: N/A
# virbr0-nic: driver: tun
# virbr0-nic: version: 1.6
# virbr0-nic: bus-info: tap
# ram: MemTotal: 32959748 kB
# tsc_hz: 3099966880
# LD_PRELOAD=libonload.so
# server LD_PRELOAD=libonload.so
# onload_version=201205
# EF_TCP_FASTSTART_INIT=0
# EF_POLL_USEC=100000
# EF_TCP_FASTSTART_IDLE=0
#
# size  mean  min   median  max    %ile  stddev  iter
  1     2453  2380  2434    18288  2669  77      1000000
  2     2453  2379  2435    45109  2616  90      1000000
  4     2467  2380  2436    10502  2730  82      1000000
  8     2465  2383  2446    8798   2642  70      1000000
  16    2460  2380  2441    7494   2632  68      1000000
  32    2474  2399  2454    8758   2677  71      1000000
  64    2495  2419  2474    12174  2716  77      1000000
The output identifies mean, minimum, median and maximum (nanosecond) ½ RTT
latency for increasing packet sizes, including the 99th percentile and standard
deviation for these results. A message size of 32 bytes has a mean latency of 2.4
microseconds with a 99%ile latency less than 2.7 microseconds.
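The result rows can be post-processed with a short script. The following is a minimal sketch, assuming the rows are whitespace-separated integers in the column order shown in the header line of the example output; the names PingpongRow and parse_pingpong are illustrative and are not part of sfnettest.

```python
from typing import List, NamedTuple


class PingpongRow(NamedTuple):
    size: int        # message size in bytes
    mean_ns: int     # latency columns are 1/2 RTT in nanoseconds
    min_ns: int
    median_ns: int
    max_ns: int
    pctile_ns: int
    stddev_ns: int
    iterations: int


def parse_pingpong(text: str) -> List[PingpongRow]:
    """Parse result rows, skipping '#' header lines and blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        rows.append(PingpongRow(*(int(f) for f in fields)))
    return rows


# Two rows taken from the example output (assumed column separation).
SAMPLE = """\
# size mean min median max %ile stddev iter
32 2474 2399 2454 8758 2677 71 1000000
64 2495 2419 2474 12174 2716 77 1000000
"""

rows = parse_pingpong(SAMPLE)
for row in rows:
    # Convert nanoseconds to microseconds for display.
    print(f"{row.size:>4} bytes: mean {row.mean_ns / 1000:.3f} us, "
          f"%ile {row.pctile_ns / 1000:.3f} us")
```

Keeping the values as integer nanoseconds and converting only for display avoids rounding the reported figures.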
sfnt-stream
The sfnt-stream application measures RTT latency (not 1/2 RTT) for a fixed-size
message at increasing message rates. Latency is calculated from a sample of all
messages sent. Message rates are set with the --rates option and the number of
messages to sample with the --samples option.
Solarflare sfnt-stream only functions on UDP sockets. This limitation will be
removed to support other protocols in the future.
Refer to the README.sfnt-stream file, which is part of the Onload distribution, for
further information.
Usage
sfnt-stream [options] [tcp|udp|pipe|unix_stream|unix_datagram [host[:port]]]
Options
sfnt-stream options:
Option          Description
--msgsize       message size (bytes)
--rates         msg rates <min>-<max>[+<step>]
--millisec      time per test (milliseconds)
--samples       number of samples per test
--stop          stop when TX rate achieved is below given percentage of
                target rate
--maxburst      maximum burst length
--port          server port number
--connect       connect() UDP socket
--spin          spin on non-blocking recv()
--muxer         select, poll, epoll or none
--rtt           report round-trip time
--raw           dump raw results to file
--percentile    percentile
--mcast         set the multicast address
--mcastintf     set multicast interface. The client sends this parameter to
                the server.
                --mcastintf=eth2 both client and server use eth2
                --mcastintf='eth2;eth3' client uses eth2 and server uses
                eth3 (quotes are required for this format)
--mcastloop     IP_MULTICAST_LOOP
--ttl           IP_TTL and IP_MULTICAST_TTL
--bindtodevice  SO_BINDTODEVICE
--npipe         include pipes in file descriptor set
--nunixd        include unix datagram in file descriptor set
--nunixs        include unix stream in file descriptor set
--nudp          include UDP sockets in file descriptor set
--ntcpc         include TCP sockets in file descriptor set
--ntcpl         include TCP listening sockets in file descriptor set
--tcpcserv      host:port for TCP connections
--nodelay       enable TCP_NODELAY
--affinity      "<client_tx>,<client_rx>;<server_core>" enclose the values
                in double quotes e.g. "4,5;3". This option should be set on
                the client side only. The client sends the <server_core>
                value to the server. The user must ensure that the
                identified server core is available on the server machine.
                This option will override any value set by taskset on the
                same command line.
--rttiter       iterations for RTT measurement
Standard options:
Option          Description
-? --help       this message
-q --quiet      quiet
-v --verbose    display more information
--version       display version information
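The structure of the sfnt-stream --affinity value can be illustrated with a small parser. This is a sketch only: it assumes the "<client_tx>,<client_rx>;<server_core>" form described in the options table and treats the server core as optional, as in the "1,1" command-line example in this section. StreamAffinity and parse_stream_affinity are hypothetical names, and the parsing performed by sfnt-stream itself may differ.

```python
from typing import NamedTuple, Optional


class StreamAffinity(NamedTuple):
    client_tx: int              # core for the client transmit thread
    client_rx: int              # core for the client receive thread
    server_core: Optional[int]  # core sent to the server, if given


def parse_stream_affinity(value: str) -> StreamAffinity:
    """Split '<tx>,<rx>[;<server>]' into its three core numbers."""
    client_part, semicolon, server_part = value.partition(";")
    tx, comma, rx = client_part.partition(",")
    if not comma:
        raise ValueError(f"expected '<tx>,<rx>' before ';' in {value!r}")
    server_core = int(server_part) if semicolon else None
    return StreamAffinity(int(tx), int(rx), server_core)


full = parse_stream_affinity("4,5;3")
client_only = parse_stream_affinity("1,1")
print(full)
print(client_only)
```

The semicolon is a shell metacharacter, which is why the value must be quoted on the command line.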
Examples
Example command lines client/server
# ./sfnt-stream (server)
# ./sfnt-stream --affinity 1,1 udp <server_ip> (client)
# taskset -c 1 ./sfnt-stream --affinity="3,5;3" --mcastintf=eth4 udp \
<remote_ip> (client)
Bonded Interfaces: sfnt-stream
The following example configures a single bond, having two slave interfaces, on
each machine. Both client and server machines use eth4 and eth5.
Client Configuration:
[root@client src]# ifconfig eth4 0.0.0.0 down
[root@client src]# ifconfig eth5 0.0.0.0 down
[root@client src]# modprobe bonding miimon=100 mode=1 xmit_hash_policy=layer2 primary=eth5
[root@client src]# ifconfig bond0 up
[root@client src]# echo +eth4 > /sys/class/net/bond0/bonding/slaves
[root@client src]# echo +eth5 > /sys/class/net/bond0/bonding/slaves
[root@client src]# ifconfig bond0 172.16.136.27/21
[root@client src]# onload --profile=latency taskset -c 3 ./sfnt-stream
sfnt-stream: server: waiting for client to connect...
sfnt-stream: server: client connected
sfnt-stream: server: client 0 at 172.16.136.28:45037
Server Configuration:
[root@server src]# ifconfig eth4 0.0.0.0 down
[root@server src]# ifconfig eth5 0.0.0.0 down
[root@server src]# modprobe bonding miimon=100 mode=1 xmit_hash_policy=layer2 primary=eth5
[root@server src]# ifconfig bond0 up
[root@server src]# echo +eth4 > /sys/class/net/bond0/bonding/slaves
[root@server src]# echo +eth5 > /sys/class/net/bond0/bonding/slaves
[root@server src]# ifconfig bond0 172.16.136.28/21
NOTE: server sends to IP address of client bond
[root@server src]# onload --profile=latency taskset -c 1 ./sfnt-stream --mcastintf=bond0 \
--affinity "1,1;3" udp 172.16.136.27
Output Fields
All time measurements are nanoseconds unless otherwise stated.
Field             Description
mps target        Msg per sec target rate
mps send          Msg per sec actual rate
mps recv          Msg receive rate
latency mean      RTT mean latency
latency min       RTT minimum latency
latency median    RTT median latency
latency max       RTT maximum latency
latency %ile      RTT 99%ile
latency stddev    Standard deviation of sample
latency samples   Number of messages used to calculate latency
                  measurement
sendjit mean      Mean variance when sending messages
sendjit min       Minimum variance when sending messages
sendjit max       Maximum variance when sending messages
sendjit behind    Number of times the sender falls behind and is unable to
                  keep up with the transmit rate
gaps n_gaps       Count the number of gaps appearing in the stream
gaps n_drops      Count the number of drops from stream
gaps n_ooo        Count the number of sequence numbers received out of
                  order
Latency Profile - Spinning
Both sfnt-pingpong and sfnt-stream use scripts found in the onload_apps
subdirectory which invoke the onload latency profile, thereby causing the
application to 'spin'.
To run these test programs in an interrupt-driven mode, replace the
--profile=latency option on the command line with the --no-app-handler option.
G onload_tcpdump
G.1 Introduction
By definition, Onload is a kernel bypass technology and this prevents packets from
being captured by packet sniffing applications such as tcpdump, netstat and
wireshark.
Onload provides the onload_tcpdump application, which captures packets
from Onload stacks to a file or displays them on standard out (stdout). Packet
capture files produced by onload_tcpdump can then be imported into the regular
tcpdump, wireshark or other third-party application where users can take advantage
of dedicated search and analysis features.
onload_tcpdump allows for the capture of all TCP and UDP unicast and multicast
data sent or received via Onload stacks, including shared stacks.
G.2 Building onload_tcpdump
The onload_tcpdump script is supplied with the Onload distribution and is located
in the Onload-<version>/scripts subdirectory.
NOTE: libpcap and libpcap-devel must be built and installed before Onload is
installed.
G.3 Using onload_tcpdump
For help use the ./onload_tcpdump -h command:
Usage:
onload_tcpdump [-o stack-(id|name) [-o stack ...]]
 tcpdump_options_and_parameters
"man tcpdump" for details on tcpdump parameters.
You may use stack id number or shell-like pattern for the stack name
to specify the Onload stacks to listen on.
If you do not specify stacks, onload_tcpdump will monitor all onload
stacks.
If you do not specify interface via -i option, onload_tcpdump
listens on ALL interfaces instead of the first one.
For further information refer to the Linux man tcpdump pages.
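The stack selection semantics described in the help text can be modelled as follows. This sketch only illustrates "stack id number or shell-like pattern" matching using Python's fnmatch module; it is not the implementation used by onload_tcpdump, and the select_stacks helper and the example stack names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Sequence


def select_stacks(stacks: Dict[int, str], selectors: Sequence[str]) -> List[int]:
    """Return the ids of stacks matched by any selector.

    stacks maps stack id to stack name. Each selector is either a
    decimal stack id or a shell-like pattern matched against the name.
    With no selectors, every stack is selected.
    """
    if not selectors:
        return sorted(stacks)
    chosen = []
    for stack_id, name in sorted(stacks.items()):
        for selector in selectors:
            by_id = selector.isdigit() and int(selector) == stack_id
            if by_id or fnmatchcase(name, selector):
                chosen.append(stack_id)
                break
    return chosen


# Hypothetical set of running stacks: id -> name.
running = {1: "feed", 2: "stack2", 3: "abc", 4: "other"}

print(select_stacks(running, ["1", "stack2", "ab*"]))
print(select_stacks(running, []))
```

On a shell command line a pattern such as 'ab*' should be quoted so that the shell does not expand it before onload_tcpdump sees it.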
NOTE: onload_tcpdump only accepts separate command line options. Combined
options will be ignored by the application parser:
The following example will work:
onload_tcpdump -n -i <interface>
The following example will not work:
onload_tcpdump -ni <interface>
Examples
• Capture all accelerated traffic from eth2 to a file called mycaps.pcap:
# onload_tcpdump -i eth2 -w mycaps.pcap
• If no file is specified onload_tcpdump will direct output to stdout:
# onload_tcpdump -i eth2
• To capture accelerated traffic for a specific Onload stack (by name):
# onload_tcpdump -i eth4 -o stackname
• To capture accelerated traffic for a specific Onload stack (by ID):
# onload_tcpdump -o 7
• To capture accelerated traffic for Onload stacks where name begins with “abc”:
# onload_tcpdump -o 'abc*'
• To capture accelerated traffic for onload stack 1, stack named “stack2” and all
onload stacks with name beginning with “ab”:
# onload_tcpdump -o 1 -o 'stack2' -o 'ab*'
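When onload_tcpdump is launched from a script, the command line can be assembled as an argument list so that every option stays separate. The sketch below is one possible helper under that assumption; onload_tcpdump_argv is a hypothetical name, and only the -o, -i and -w options shown in the examples above are generated. The check for combined short options is deliberately conservative and also rejects an option with an attached value.

```python
from typing import List, Optional, Sequence


def onload_tcpdump_argv(interface: Optional[str] = None,
                        stacks: Sequence[str] = (),
                        outfile: Optional[str] = None,
                        extra: Sequence[str] = ()) -> List[str]:
    """Build an onload_tcpdump argument list with separate options."""
    argv = ["onload_tcpdump"]
    for stack in stacks:
        argv += ["-o", str(stack)]     # one -o per stack id or pattern
    if interface:
        argv += ["-i", interface]
    if outfile:
        argv += ["-w", outfile]
    for opt in extra:
        # Reject combined short options such as "-ni".
        if opt.startswith("-") and not opt.startswith("--") and len(opt) > 2:
            raise ValueError(f"pass short options separately, not {opt!r}")
        argv.append(opt)
    return argv


capture_to_file = onload_tcpdump_argv("eth2", outfile="mycaps.pcap")
by_stack = onload_tcpdump_argv(stacks=["1", "stack2", "ab*"], extra=["-n"])
print(capture_to_file)
print(by_stack)
```

Passing the resulting list to subprocess.run without shell=True also avoids shell expansion of patterns such as ab*.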
Dependencies
The onload_tcpdump application requires libpcap and libpcap-devel to be
installed on the server. If libpcap is not installed the following message is reported
when onload_tcpdump is invoked:
./onload_tcpdump
ci Onload was compiled without libpcap development package installed. You
need to install libpcap-devel or libpcap-dev package to run
onload_tcpdump.
tcpdump: truncated dump file; tried to read 24 file header bytes, only got
0
Hangup
If libpcap is missing it can be downloaded from http://www.tcpdump.org/
Untar the compressed file on the target server and follow the build instructions in the
INSTALL.txt file. The libpcap package must be installed before Onload is built and
installed.
Limitations
• Currently onload_tcpdump captures only packets from Onload stacks and not
from kernel stacks.
• onload_tcpdump delivers timestamps with microsecond resolution.
onload_tcpdump does not support nanosecond precision.
• The onload_tcpdump application monitors stack creation events and will
attach to newly created stacks. However, there is a short period (normally only
a few milliseconds) between stack creation and the attachment during which
packets sent/received will not be captured.
Known Issues
• Packets sent when the destination address is not in the host ARP table
appear in both onload_tcpdump and (Linux) tcpdump captures.
• Users should not attempt to accelerate onload_tcpdump i.e. the following
command should not be used:
onload onload_tcpdump -i <interface>
• onload_tcpdump will also be accelerated if LD_PRELOAD is exported in the
Onload environment, so the following method should not be used:
# export LD_PRELOAD=libonload.so
# onload_tcpdump -i <interface>
SolarCapture
Solarflare’s SolarCapture is a packet capture application for Solarflare network
adapters. It is able to capture received packets from the wire at line rate, assigning
accurate nanosecond precision timestamps to each packet. Packets are captured to
PCAP file or forwarded to user-supplied logic for processing. For details see the
SolarCapture User Guide (SF-108469-CD) available from
https://support.solarflare.com/.
H ef_vi
The Solarflare ef_vi API is a layer 2 API that grants an application direct access to the
Solarflare network adapter datapath to deliver lower latency and reduced per-message
processing overheads. ef_vi is the internal API used by Onload for sending
and receiving packets. It can be used directly by applications that want the very
lowest latency send and receive API and that do not require a POSIX socket
interface.
• ef_vi is packaged with the Onload distribution.
• ef_vi is an OSI level 2 interface which sends and receives raw Ethernet frames.
• ef_vi supports a zero-copy interface because the user process has direct access
to memory buffers used by the hardware to receive and transmit data.
• An application can use both ef_vi and Onload at the same time. For example,
use ef_vi to receive UDP market data and Onload sockets for TCP connections
for trading.
• The ef_vi API can deliver lower latency than Onload and incurs reduced per-message
overheads.
• ef_vi is free software distributed under an LGPL license.
• The user application wishing to use the layer 2 ef_vi API must implement the
higher layer protocols.
H.1 Components
All components required to build and link a user application with the Solarflare ef_vi
API are distributed with Onload. When Onload is installed all required directories/
files are located under the Onload distribution directory.
H.2 Compiling and Linking
Refer to the README.ef_vi file in the Onload directory for compile and link
instructions.
H.3 Documentation
The ef_vi documentation is distributed in doxygen format with the Onload
distribution. Documents in HTML and RTF format are generated by running doxygen
in the following directory:
# cd openonload-<version>/src/include/etherfabric/doxygen
# doxygen doxyfile_ef_vi
Documents are generated in the HTML and RTF subdirectories.
The ef_vi user guide is also available in PDF format (SF-114063-CD) from the
Solarflare download site.
I onload_iptables
I.1 Description
The Linux netfilter iptables feature provides filtering based on user-configurable
rules with the aim of managing access to network devices and preventing
unauthorized or malicious passage of network traffic. Packets delivered to an
application via the Onload accelerated path are not visible to the OS kernel and, as
a result, these packets are not visible to the kernel firewall (iptables).
The onload_iptables feature allows the user to configure rules which determine
which hardware filters Onload is permitted to insert on the adapter and therefore
which connections and sockets can bypass the kernel and, as a consequence, bypass
iptables.
The onload_iptables command can convert a snapshot(1) copy of the kernel iptables
rules into Onload firewall rules used to determine if sockets, created by an Onloaded
process, are retained by Onload or handed off to the kernel network stack.
Additionally, user-defined filter rules can be added to the Onload firewall on a
per-interface basis. The Onload firewall applies to the receive filter path only.
I.2 How it works
Before Onload accelerates a socket it first checks the Onload firewall module. If the
firewall module indicates the acceleration of the socket would violate a firewall rule,
the acceleration request is denied and the socket is handed off to the kernel.
Network traffic sent or received on the socket is not accelerated.
Onload firewall rules are parsed in ascending numerical order. The first rule to match
the newly created socket, which may indicate to accelerate or decelerate the socket,
is selected and no further rules are parsed.
If the Onload firewall rules are an exact copy of the kernel iptables i.e. with no
additional rules added by the Onload user, then a socket handed off to the kernel,
because of an iptables rule violation, will be unable to receive data through either
path.
Changing rules using onload_iptables will not interrupt existing network
connections.
NOTE: Onload firewall rules will not persist over network driver restarts.
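The first-match evaluation described in this section can be sketched as below. This is a simplified model rather than the Onload firewall implementation: it matches only on protocol and local port, the Rule type and decide function are hypothetical, and accelerating a socket when no rule matches is an assumption made for the sketch.

```python
from typing import NamedTuple, Sequence


class Rule(NamedTuple):
    number: int      # rules are evaluated in ascending number order
    protocol: str    # "tcp" or "udp" in this simplified model
    port_lo: int     # inclusive local port range
    port_hi: int
    action: str      # "ACCELERATE" or "DECELERATE"


def decide(rules: Sequence[Rule], protocol: str, local_port: int,
           default: str = "ACCELERATE") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol == protocol and rule.port_lo <= local_port <= rule.port_hi:
            return rule.action   # first match wins; later rules are ignored
    return default               # assumption: unmatched sockets are accelerated


rules = [
    Rule(1, "tcp", 0, 65535, "ACCELERATE"),
    Rule(0, "tcp", 5201, 5201, "DECELERATE"),
]

print(decide(rules, "tcp", 5201))
print(decide(rules, "tcp", 80))
print(decide(rules, "udp", 53))
```

Because evaluation stops at the first match, a narrow rule must have a lower rule number than any broader rule that would otherwise match the same socket.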
1. Subsequent changes to kernel iptables will not be reflected in the Onload firewall.
NOTE: The onload_iptables “IP rules” will only block hardware IP filters from being
inserted and onload_iptables “MAC rules” will only block hardware MAC filters
from being inserted. Therefore it is possible that if a rule is inserted to block a MAC
address, the user is still able to accept traffic from the specified host by Onload
inserting an appropriate IP hardware filter.
Files
When the Onload drivers are loaded, firewall rules exist in the Linux proc pseudo
file system at:
/proc/driver/sfc_resource
Within this directory the firewall_add, firewall_del and resources files will be
present. These files are writable only by a root user. No attempt should be made to
remove these files.
Once rules have been created for a particular interface, and only while these rules
exist, a separate directory exists which contains the current firewall rules for the
interface:
/proc/driver/sfc_resource/ethN/firewall_rules
I.3 Features
To get help
# onload_iptables -h
I.4 Rules
The general format of the rule is:
[rule=n] if=ethN protocol=(ip|tcp|udp) [local_ip=a.b.c.d[/mask]]
[remote_ip=a.b.c.d[/mask]] [local_port=a[-b]] [remote_port=a[-b]] [vlan=n]
action=(ACCELERATE|DECELERATE)
NOTE: Using the IP address rule form, the vlan identifier is effective only when using
a Solarflare SFN7000 or SFN8000 series adapter which is configured to use the
full-feature firmware variant. On other Solarflare adapters the vlan identifier is
ignored. The vlan identifier can only be specified with the vlan=n syntax and not on
the interface.
[rule=n] if=ethN protocol=eth mac=xx:xx:xx:xx:xx:xx[/FF:FF:FF:FF:FF:FF]
[vlan=n] action=(ACCELERATE|DECELERATE)
NOTE: Using the MAC address rule form, the vlan identifier is effective when
specified for any Solarflare adapter.
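The IP address rule form above can be assembled programmatically. The following sketch builds a rule string from keyword arguments; ip_rule is a hypothetical helper, the hyphen used as the port range separator is an assumption, and the requirement for at least one match criterion follows the "Invalid rule" description in the error message table of this appendix.

```python
from typing import Optional, Tuple

ACTIONS = ("ACCELERATE", "DECELERATE")
IP_PROTOCOLS = ("ip", "tcp", "udp")


def _port_range(ports: Tuple[int, int]) -> str:
    lo, hi = ports
    if not 0 <= lo <= hi <= 65535:
        raise ValueError(f"invalid port range {ports!r}")
    return f"{lo}-{hi}"   # assumption: ranges are written as a-b


def ip_rule(interface: str, protocol: str, action: str,
            rule: Optional[int] = None,
            local_ip: Optional[str] = None,
            remote_ip: Optional[str] = None,
            local_port: Optional[Tuple[int, int]] = None,
            remote_port: Optional[Tuple[int, int]] = None,
            vlan: Optional[int] = None) -> str:
    """Return an IP-form rule string in the field order of the general format."""
    if protocol not in IP_PROTOCOLS:
        raise ValueError(f"unsupported protocol {protocol!r}")
    if action not in ACTIONS:
        raise ValueError(f"unsupported action {action!r}")
    match = []
    if local_ip is not None:
        match.append(f"local_ip={local_ip}")
    if remote_ip is not None:
        match.append(f"remote_ip={remote_ip}")
    if local_port is not None:
        match.append(f"local_port={_port_range(local_port)}")
    if remote_port is not None:
        match.append(f"remote_port={_port_range(remote_port)}")
    if vlan is not None:
        match.append(f"vlan={vlan}")
    if not match:
        raise ValueError("a rule needs at least one match criterion")
    parts = [] if rule is None else [f"rule={rule}"]
    parts += [f"if={interface}", f"protocol={protocol}"] + match
    parts.append(f"action={action}")   # no spaces around '='
    return " ".join(parts)


example = ip_rule("eth3", "udp", "ACCELERATE", rule=4, local_port=(7330, 7340))
print(example)
```

Validating the action and protocol before writing the rule avoids having to look in dmesg for the parse error afterwards.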
I.5 Preview firewall rules
Before creating the Onload firewall, run the onload_iptables -v option to identify
which rules will be adopted by the firewall and which will be rejected (a reason is
given for rejection):
# onload_iptables -v
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5201
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=5201-5201
remote_ip=0.0.0.0/0 remote_port=0-65535 action=DECELERATE
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5201
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=5201-5201
remote_ip=0.0.0.0/0 remote_port=0-65535 action=DECELERATE
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpts:80:88
=> if=None protocol=tcp local_ip=0.0.0.0/0 local_port=80-88
remote_ip=0.0.0.0/0 remote_port=0-65535 action=
tcp -- 0.0.0.0/0 0.0.0.0/0 tcp spt:800
=> Error parsing: Insuffcient arguments in rule.
The last rule is rejected because the action is missing.
NOTE: The -v option does not create firewall rules for any Solarflare interface, but
allows the user to preview which Linux iptables rules will be accepted and which
will be rejected by Onload.
To convert Linux iptables to Onload firewall rules
The Linux iptables can be applied to all or individual Solarflare interfaces.
Onload iptables are only applied to the receive filter path. The user can select the
INPUT CHAIN or a user-defined CHAIN to parse from the iptables. The default CHAIN
is INPUT. To adopt the rules from iptables, even though some rules will be rejected,
enter one of the following commands, identifying the Solarflare interface the rules
should be applied to:
# onload_iptables -i ethN -c
# onload_iptables -a -c
Running the onload_iptables command will overwrite existing rules in the Onload
firewall when used with the -i (interface) or -a (all interfaces) options.
NOTE: Applying the Linux iptables to a Solarflare interface is optional. The
alternatives are to create user-defined firewall rules per interface or not to apply
any firewall rules per interface (default behavior).
NOTE: onload_iptables will import all rules to the identified interface, even rules
specified on another interface. To avoid importing rules specified on ‘other’
interfaces, use the --useextended option.
To view rules for a specific interface:
When firewall rules exist for a Solarflare interface, and only while they exist, a
directory for the interface will be created in:
/proc/driver/sfc_resource
Rules for a specific interface will be found in the firewall_rules file e.g.
cat /proc/driver/sfc_resource/eth3/firewall_rules
if=eth3 rule=0 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0
local_port=5201-5201 remote_port=0-65535 action=DECELERATE
if=eth3 rule=1 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0
local_port=5201-5201 remote_port=0-65535 action=DECELERATE
if=eth3 rule=2 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0
local_port=5201-5201 remote_port=72-72 action=DECELERATE
if=eth3 rule=3 protocol=tcp local_ip=0.0.0.0/0.0.0.0 remote_ip=0.0.0.0/0.0.0.0
local_port=80-88 remote_port=0-65535 action=DECELERATE
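The firewall_rules listing is a sequence of key=value fields, so it can be read into dictionaries for inspection. The sketch below assumes one rule per line in the file (the listing above is wrapped for the page) and space-separated fields; parse_rule_line and read_rules are hypothetical helper names.

```python
from typing import Dict, List


def parse_rule_line(line: str) -> Dict[str, str]:
    """Split one firewall rule into a {field: value} dictionary."""
    fields = {}
    for token in line.split():
        key, equals, value = token.partition("=")
        if not equals:
            raise ValueError(f"expected key=value, got {token!r}")
        fields[key] = value
    return fields


def read_rules(path: str) -> List[Dict[str, str]]:
    """Read all rules from a firewall_rules file, skipping blank lines."""
    with open(path) as handle:
        return [parse_rule_line(line) for line in handle if line.strip()]


sample = ("if=eth3 rule=0 protocol=tcp local_ip=0.0.0.0/0.0.0.0 "
          "remote_ip=0.0.0.0/0.0.0.0 local_port=5201-5201 "
          "remote_port=0-65535 action=DECELERATE")
parsed = parse_rule_line(sample)
print(parsed["rule"], parsed["local_port"], parsed["action"])
```

For example, read_rules("/proc/driver/sfc_resource/eth3/firewall_rules") would return one dictionary per rule on a host where rules exist for eth3.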
To add a rule for a selected interface:
# echo "rule=4 if=eth3 action=ACCELERATE protocol=udp local_port=7330-7340" \
> /proc/driver/sfc_resource/firewall_add
Rules can be inserted into any position in the table and existing rule numbers will be
adjusted to accommodate new rules. If a rule number is not specified the rule will
be appended to the existing rule list.
NOTE: Errors resulting from the add/delete commands will be displayed in dmesg.
To delete a rule from a selected interface:
To delete a single rule:
# echo "if=eth3 rule=2" > /proc/driver/sfc_resource/firewall_del
To delete all rules:
# echo "eth2 all" > /proc/driver/sfc_resource/firewall_del
When the last rule for an interface has been deleted the interface firewall_rules file
is removed from /proc/driver/sfc_resource. The interface directory will be
removed only when completely empty.
Error Checking
The onload_iptables command does not log errors to stdout. Errors arising from add
or delete commands will be logged in dmesg.
Interface & Port
Onload firewall rules are bound to an interface and not to a physical adapter port. It
is possible to create rules for an interface in a configured/down state.
Virtual/Bonded Interface
On virtual or bonded interfaces firewall rules are only applied and enforced on the
‘real’ interface.
I.6 Error Messages
Error messages relating to onload_iptables operations will appear in dmesg.
Table 7: Error messages for onload_iptables
Error Message                      Description
Internal error                     Internal condition - should not happen.
Unsupported rule                   Internal condition - should not happen.
Out of memory allocating new rule  Memory allocation error.
Seen multiple rule numbers         Only a single rule number can be
                                   specified when adding/deleting rules.
Seen multiple interfaces           Only a single interface can be specified
                                   when adding/deleting rules.
Unable to understand action        The action specified when adding a
                                   rule is not supported. Note that there
                                   should be no spaces i.e.
                                   action=ACCELERATE.
Unable to understand protocol      Non-supported protocol.
Unable to understand remainder of  Non-supported parameters/syntax.
the rule
Failed to understand interface     The interface does not exist. Rules can
                                   be added to an interface that does not
                                   yet exist, but cannot be deleted from
                                   a nonexistent interface.
Failed to remove rule              The rule does not exist.
Error removing table               Internal condition - should not happen.
Invalid local_ip rule              Invalid address/mask format.
                                   Supported formats:
                                   a.b.c.d
                                   a.b.c.d/n
                                   a.b.c.d/e.f.g.h
                                   where a, b, c, d, e, f, g, h are decimal
                                   values in the range 0-255 and n is a
                                   decimal value in the range 0-32.
Invalid remote_ip rule             Invalid address/mask format.
Invalid rule                       A rule must identify at least an
                                   interface, a protocol, an action and at
                                   least one match criterion.
Invalid mac                        Invalid mac address/mask format.
                                   Supported formats:
                                   xx:xx:xx:xx:xx:xx
                                   xx:xx:xx:xx:xx:xx/xx:xx:xx:xx:xx:xx
                                   where x is a hex digit.
NOTE: A Linux limitation applicable to the /proc/ filesystem restricts a write
operation to 1024 bytes. When writing multiple lines to the
/proc/driver/sfc_resource/firewall_[add|del] files the user is advised to flush
the write between lines so that a single write operation does not exceed the
1024 byte limit.
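One way to respect the 1024 byte write limit when adding several rules from a script is to issue one write per rule. The sketch below assumes that each rule is terminated by a newline (as with echo) and that opening the file once per rule is acceptable; write_rules is a hypothetical helper, and the demonstration writes to a temporary file rather than to /proc.

```python
import os
import tempfile
from typing import Iterable

PROC_WRITE_LIMIT = 1024   # bytes per write operation, per the NOTE above


def write_rules(path: str, rules: Iterable[str]) -> int:
    """Write each rule with its own write() call; return the rule count."""
    written = 0
    for rule in rules:
        data = (rule.strip() + "\n").encode("ascii")
        if len(data) > PROC_WRITE_LIMIT:
            raise ValueError(f"rule exceeds {PROC_WRITE_LIMIT} bytes")
        fd = os.open(path, os.O_WRONLY)
        try:
            sent = os.write(fd, data)   # a single unbuffered write per rule
        finally:
            os.close(fd)
        if sent != len(data):
            raise OSError("short write")
        written += 1
    return written


# Demonstration against a temporary file instead of firewall_add/firewall_del.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
demo_count = write_rules(demo_path, ["if=eth3 rule=2"])
with open(demo_path, "rb") as handle:
    demo_contents = handle.read()
os.remove(demo_path)
print(demo_count, demo_contents)
```

Using os.write on an unbuffered file descriptor ensures each rule reaches the kernel as its own write operation instead of being merged in a user-space buffer.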
J Solarflare eflatency Test Application
The openonload distribution includes the command line eflatency test application
to measure latency of the Solarflare ef_vi layer 2 API.
eflatency is a single-threaded ping/pong application. When all iterations are complete
the client side will display the round-trip time.
eflatency determines the lowest latency mode that it is possible to use, from the
following:
• TX alternatives
• PIO
• DMA.
By default, eflatency sends 10000 warm-up packets to fill caches and stabilize the
system, before measuring statistics over 100000 iterations of packets with no
payload. Payload size and numbers of iterations can be configured.
With the Onload distribution installed, eflatency will be present in the following
directory:
~/openonload-<version>/build/gnu_x86_64/tests/ef_vi
J.1 eflatency
./eflatency --help
usage:
 eflatency [options] <ping|pong> <interface>
options:
 -n <iterations> - set number of iterations
 -s <message-size> - set udp payload size
 -w <iterations> - set number of warmup iterations
Table 8: eflatency Options
Parameter   Description
interface   the local interface to use e.g. eth2
To run eflatency
eflatency must be started on the server (pong side) before the client (ping side)
is run. Command line examples are shown below.
1. On the server side (server1)
taskset -c <M> ./eflatency -s 28 pong eth<N>
# ef_vi_version_str: <onload version>
# udp payload len: 28
# iterations: 100000
# warmups: 10000
# frame len: 70
# mode: Alternatives
where:
- <M> is the CPU core
- <N> is the Solarflare adapter interface.
2. On the client side (server2)
taskset -c <M> ./eflatency -s 28 ping eth<N>
# ef_vi_version_str: <onload version>
# udp payload len: 28
# iterations: 100000
# warmups: 10000
# frame len: 70
# mode: Alternatives
mean round-trip time: <n.nnn> usec
where:
- <M> is the CPU core
- <N> is the Solarflare adapter interface
- <n.nnn> is the reported mean round-trip time for a 28 byte payload.
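The frame length of 70 reported for a 28 byte UDP payload is consistent with standard header sizes. The short calculation below is a sketch that assumes an untagged Ethernet frame, an IPv4 header without options, and a frame length that excludes the frame check sequence; frame_len is an illustrative name and not part of eflatency.

```python
ETH_HEADER = 14    # destination MAC + source MAC + EtherType, no VLAN tag
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def frame_len(udp_payload_len: int) -> int:
    """Frame length in bytes for a given UDP payload size."""
    return ETH_HEADER + IPV4_HEADER + UDP_HEADER + udp_payload_len


for payload in (0, 28):
    print(f"payload {payload:>2} bytes: frame {frame_len(payload)} bytes")
```

The same arithmetic gives the frame size for any payload passed with the -s option, under the stated assumptions.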
