Digital Technical Journal, Volume 5, Number 1
Page Count: 147
DECnet Open Networking

Digital Technical Journal
Digital Equipment Corporation
Volume 5 Number 1
Winter 1993

Editorial
Jane C. Blake, Editor
Helen L. Patterson, Associate Editor
Kathleen M. Stetson, Associate Editor

Circulation
Catherine M. Phillips, Administrator

Production
Terri Autieri, Production Editor
Anne S. Katzeff, Typographer
Peter R. Woodbury, Illustrator

Advisory Board
Samuel H. Fuller, Chairman
Richard W. Beane
Donald Z. Harbert
Richard J. Hollingsworth
Alan G. Nemeth
Jeffrey H. Rudy
Stan Smits
Michael C. Thurk
Gayn B. Winters

The Digital Technical Journal is a refereed journal published quarterly by Digital Equipment Corporation, 146 Main Street ML01-3/B68, Maynard, Massachusetts 01754-2571. Subscriptions to the Journal are $40.00 for four issues and must be prepaid in U.S. funds. University and college professors and Ph.D. students in the electrical engineering and computer science fields receive complimentary subscriptions upon request. Orders, inquiries, and address changes should be sent to the Digital Technical Journal at the published-by address. Inquiries can also be sent electronically to DTJ@CRL.DEC.COM. Single copies and back issues are available for $16.00 each from Digital Press of Digital Equipment Corporation, 129 Parker Street, Maynard, MA 01754.

Digital employees may send subscription orders on the ENET to RDVAX::JOURNAL or by interoffice mail to mailstop ML01-3/B68. Orders should include badge number, site location code, and address. All employees must advise of changes of address.

Comments on the content of any paper are welcomed and may be sent to the editor at the published-by or network address.

Copyright © 1993 Digital Equipment Corporation. Copying without fee is permitted provided that such copies are made for use in educational institutions by faculty members and are not distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. All rights reserved.
The information in the Journal is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in the Journal.

ISSN 0898-901X
Documentation Number EY-M770E-DP

The following are trademarks of Digital Equipment Corporation: ADVANTAGE NETWORKS, Alpha AXP, the Alpha AXP logo, AXP, Bookreader, DEC, DEC 3000 AXP, DEC FDDIcontroller, DEC OSF/1 AXP, DEC LANcontroller, DEC WANcontroller, DECbridge, DECchip 21064, DECconcentrator, DEChub, DECmcc, DECnet, DECnet/SNA, DECnet-VAX, DECnet/OSI for OpenVMS, DECnet/OSI for ULTRIX, DECNIS 500/600, DECstation, DECthreads, DECUS, Digital, the Digital logo, DNA, LANbridge, LAT, OpenVMS, OpenVMS on Alpha AXP, POLYCENTER, POLYCENTER Network Manager 200, POLYCENTER Network Manager 400, POLYCENTER SNA Manager, RS232, ThinWire, TURBOchannel, ULTRIX, VAX, VMS, and VMScluster.

Advanced System Management and SOLVE:Connect for EMA are trademarks of System Center, Inc.
AppleTalk is a registered trademark of Apple Computer, Inc.
BSD is a trademark of the University of California at Berkeley.
FastPacket, StrataCom, and IPX are registered trademarks of StrataCom, Inc.
IBM and NetView are registered trademarks of International Business Machines Corporation.
Motif, OSF, and OSF/1 are registered trademarks of Open Software Foundation, Inc.
NetWare and Novell are registered trademarks of Novell, Inc.
NFS is a registered trademark of Sun Microsystems, Inc.
Prestoserve is a trademark of Legato Systems, Inc.
System V is a trademark of American Telephone and Telegraph Company.
UNIX is a registered trademark of UNIX System Laboratories, Inc.
X/Open is a trademark of X/Open Company Limited.

Book production was done by Quantic Communications, Inc.

Cover Design
Our cover illustrates an image of the simplicity of data sharing as experienced by system users interconnected through a global network; papers in this issue describe the depth and complexity of technologies and products that make the simplicity of data exchange possible. The cover design is by Deb Anderson of Quantic Communications, Inc.

Contents

Foreword
Anthony G. Lauck

DECnet Open Networking

Overview of Digital's Open Networking
John Harper

The DECnet/OSI for OpenVMS Version 5.5 Implementation
Lawrence Yetto, Dorothy Noren Millbrandt, Yanick Pouffary, Daniel J. Ryan, Jr., and David J. Sullivan

The ULTRIX Implementation of DECnet/OSI
Kim A. Buxton, Edward J. Ferris, and Andrew K. Nash

High-performance TCP/IP and UDP/IP Networking in DEC OSF/1 for Alpha AXP
Chran-Ham Chang, Richard Flower, John Forecast, Heather Gray, William R. Hawe, K. K. Ramakrishnan, Ashok P. Nadkarni, Uttam N. Shikarpur, and Kathleen M. Wilde

Routing Architecture
Radia J. Perlman, Ross W. Callon, and I. Michael C. Shand

Digital's Multiprotocol Routing Software Design
Graham R. Cobb and Elliot C. Gerberg

The DECNIS 500/600 Multiprotocol Bridge/Router and Gateway
Stewart F. Bryant and David L.A. Brash

Frame Relay Networks
Robert J. Roden and Deborah Tayler

An Implementation of the OSI Upper Layers and Applications
David C. Robinson, Lawrence N. Friedman, and Scott A. Wattum

Network Management
Mark W. Sylor, Francis Dolan, and David G. Shurtleff

Design of the DECmcc Management Director
Colin Strutt and James A. Swist

Editor's Introduction

Jane C. Blake
Editor

Ten years ago, a network of 200 nodes was considered very large with uncertain manageability. Today, Digital's networks accommodate 100,000 nodes in open, distributed system environments and resolve the complexities of incompatibility among multivendor systems. Ten years from today, network systems comprising a million-plus nodes will be built based upon the Digital architectures and technologies described in this issue.

John Harper provides an informative overview of advances made with each phase of the Digital Network Architecture, now in Phase V. He describes the architectural layers and distinguishes Digital's approach to network services and management from that of others in the industry. His paper offers context for those that follow.

The Phase V architecture provides the migration to open systems from previous phases of DECnet. In implementing Phase V, designers of two DECnet products for the OpenVMS and ULTRIX operating systems shared several goals: extend network access in a multivendor environment, use standard protocols, and protect customers' software investments. Larry Yetto, Dotsie Millbrandt, Yanick Pouffary, Dan Ryan, and David Sullivan describe the DECnet/OSI for OpenVMS implementation and give details of the significantly different design of Phase V network management. In their paper on DECnet/OSI for ULTRIX, Kim Buxton, Ed Ferris, and Andrew Nash stress the importance of the protocol switch tables in a multiprotocol environment. DECnet/OSI for ULTRIX incorporates OSI, TCP/IP, and X.25.

In the broadly accepted TCP/IP protocol area, Digital has developed a high-performance TCP/IP implementation that takes advantage of the full FDDI bandwidth. K.K. Ramakrishnan and members of the development team review the characteristics of the Alpha AXP workstation, OSF/1 operating system, the protocols, and the network interface. They then detail the optimizations made for high performance.

Routing data through networks with thousands of nodes is a very difficult task. Radia Perlman, Ross Callon, and Mike Shand describe how the Phase V routing architecture addresses routing complexity. Focusing on the IS-IS protocol, they pose problems a routing protocol could experience, present alternative solutions, and explain the IS-IS approach.

The challenges in developing multiprotocol routing software for internetworking across LANs, WANs, and dial-up networks are presented in the paper by Graham Cobb and Elliot Gerberg. They highlight the importance of the stability of the routing algorithms, using the DEC WANrouter and DECNIS products as a basis for discussing alternative designs.

Stewart Bryant and David Brash then focus on details of the high-performance DECNIS 500/600 bridge/router and gateway. They discuss the architecture and the algorithm for distributed forwarding that increases scalable performance. Both the hardware and the software are described.

In addition to routing, the subject of data transfer of high-speed, bursty traffic using a simplified form of packet switching is described. Robert Roden and Deborah Tayler discuss frame relay networks, their unique characteristics, and the care needed in protocol selection and congestion handling.

The above discussions of data transfer and routing occur at the lower layers of the network architecture. Dave Robinson, Larry Friedman, and Scott Wattum present an overview of the upper layers and describe implementations that maximize throughput and minimize connection delays.

Network management is critical to the reliable function of the network. As Mark Sylor, Frank Dolan, and Dave Shurtleff tell us in their paper, Phase V management is based on a new architecture that encompasses management of the network and systems. They explain the decision to move management responsibility to the subsystem architecture, and also describe the entity model. The next paper elaborates on the director portion of the management architecture, called the DECmcc Management Director. Colin Strutt and Jim Swist review the design of this platform for developing management capabilities, the modularity of which allows future modules to be added dynamically.

The editors thank John Harper for his help in selecting the content of this issue.

Biographies

David L.A. Brash
David Brash, a consultant engineer, joined Digital's Networks Engineering Group in 1985 to lead the hardware development of the MicroServer communications server (DEMSA). As the technical leader for the DECNIS 500/600 hardware platforms, David contributed to the architecture, backplane specification, module and ASIC designs and monitored correctness. He was an active member of the IEEE Futurebus+ working group. He is currently leading a group supporting Alpha design wins in Europe. David holds a B.Sc. in electrical and electronic engineering from the University of Strathclyde.

Stewart F. Bryant
A consulting engineer with Networks and Communications in Reading, England, Stewart Bryant worked on the advanced development program that developed the DECNIS 600 architecture. During the last six months of the program, he was its technical leader, focusing on implementation issues. Prior to this work, Stewart was the hardware and firmware architect for the MicroServer hardware platform. He earned a Ph.D. in physics from Imperial College in 1978. He is a member of the Institute of Electrical Engineers and has been a Chartered Engineer since 1985.

Kim A. Buxton
Kim Buxton is a principal software engineer in the Networks and Communications Group. During the past seven years, Kim has been working on DECnet and OSI for UNIX operating systems. She is currently the project leader of the DECnet/OSI for DEC OSF/1 AXP release. Prior to assuming the role of project leader, Kim worked on network management, session control, and transport protocols for DECnet-ULTRIX products.
She has worked in the area of networks and communications since joining Digital in 1980. She earned her B.S. degree in mathematics and secondary education from the University of Lowell.

Ross W. Callon
As a member of Digital's Network Architecture Group from 1988 to 1993, Ross Callon worked on routing algorithm and addressing issues. He was a primary author of the Integrated IS-IS protocol and of the guidelines for using NSAP addresses in the Internet. Previously, he was employed by Bolt Beranek and Newman as a senior scientist and helped develop the ISO CLNP protocol. Ross received a B.Sc. (1969) in mathematics from MIT and an M.Sc. (1977) in operations research from Stanford University. He is currently employed as a consulting engineer at Wellfleet Communications.

Chran-Ham Chang
Chran-Ham Chang is a principal software engineer in the UNIX System Engineering Group and a member of the FAST TCP/IP project team. Since joining Digital in 1987, Chran has contributed to the development of various Ethernet and FDDI device drivers on both the ULTRIX and DEC OSF/1 AXP systems. He was also involved in the ULTRIX network performance analysis and tools design. Prior to this, Chran worked as a software specialist in Taiwan for a distributor of Digital's products. He received an M.S. in computer science from the New Jersey Institute of Technology.

Graham R. Cobb
Graham Cobb is a consulting engineer in the Internet Products Engineering Group and was software project leader for the DECNIS 500/600 router development. Graham holds an M.A. in mathematics from the University of Cambridge and joined Digital as a communications software engineer in 1982. He has worked on many Digital communications products, including X.25 products and routers, and was a major contributor to the DEC WANrouter 100/500 software immediately prior to leading the DECNIS development. Most recently, Graham has been working on new-generation routing software.
Francis Dolan
Frank Dolan is a consultant engineer with Digital's Telecommunication Business Group Engineering in Valbonne, France. He is currently the project manager and technical leader of the GDMO translator, a tool being developed to support the DECmcc/TeMIP OSI access module and OSI agent presentation module. Prior to this work, Frank was the architect of several Phase V DNA specifications, including DDCMP network management, OSI transport, and network routing accounting. He was also an active member of OSI management standards committees. Frank has filed one European patent application.

Edward J. Ferris
Ed Ferris is a principal engineer in the Networks and Communications Group. During the past seven years, Ed has been working on DECnet-ULTRIX. He is currently one of the technical leaders of the DECnet/OSI for DEC OSF/1 AXP release. Ed has primarily worked at the data link and network protocol layers. He has worked on networks and communication products since joining Digital in 1982. Ed earned a B.A. in English from the University of Massachusetts and a B.S. in computer engineering from Boston University.

Richard Flower
Richard Flower works on system performance issues in multiprocessors, networking, distributed systems, workstations, and memory hierarchies. The need for accurate time-stamping of events across multiple systems led him to develop the QUIPU performance monitor. The use of this monitor led to performance improvements in networking, drivers, and RPC. Richard earned a B.S.E.E. from Stanford University (with great distinction) and a Ph.D. in computer science from MIT. Prior to joining Digital, he was a professor at the University of Illinois. Richard is a member of Phi Beta Kappa and Tau Beta Pi.

John Forecast
A software consultant engineer with the Networks Engineering Advanced Development Group, John Forecast addresses network performance issues associated with the transmission of audio and video data through existing networks. John joined Digital in the United Kingdom in 1974 and moved to the United States to help design DECnet-RSX Phase 2 products, DECnet Phase IV, and DECnet implementations on ULTRIX and System V UNIX. John also worked on file servers for VMS and a prototype public key authentication system. He holds a Ph.D. from the University of Essex.

Lawrence N. Friedman
Principal engineer Lawrence Friedman is a technical leader in the OSI Applications Group. He joined Digital in 1989 and is the project leader for ULTRIX FTAM V1.0 and V1.1. In addition to his project responsibilities, Larry is Digital's representative to the National Institute of Standards and Technologies (NIST) FTAM SIG and was the editor of the NIST FTAM SIG Phase 2 and Phase 3 documents from 1990 to 1992. He is currently the editor for the FTAM File Store Management International Standard Profile. Larry holds a B.A. (1978) in music from Boston University.

Elliot C. Gerberg
Elliot Gerberg is a senior engineering manager in Digital's Networks Engineering Division, managing the Routing Engineering Group (USA). Since joining Digital in 1977, he has worked on numerous projects including the DEUNA, Digital's first LAN adapter; the DECserver 100, Digital's first low-cost terminal server; the SGEC, a high-performance Ethernet semiconductor interface; and various multiprotocol routers. Elliot has a B.S. in physics from SUNY and an M.S. in computer science from Boston University. He holds professional memberships with the IEEE, the ACM, and the Internet Society.

Heather Gray
A principal engineer in the UNIX Software Group (USG), Heather Gray is the technical leader for networking performance on the DEC OSF/1 AXP product family.
Heather's current focus is the development of IP multicast on DEC OSF/1 AXP. She has been involved with the development of Digital networking software (TCP/IP, DECnet, and OSI) since 1986. Prior to joining USG, Heather was project leader for the Internet Portal V1.2 product. She came to Digital in 1984, after working on communication and process control systems at Broken Hill Proprietary Co., Ltd. (BHP) in Australia.

John Harper
As technical director of the Corporate Backbone Networks Group in NAC, John Harper directed the development of the DECnet Phase V architecture. Until last year John also chaired the ISO Committee JTC1/SC6/WG2, which deals with standards for the OSI network layer. He joined Digital in 1974 after receiving a degree in computer studies (1st class honors) from the University of Lancaster. John has ten patents (filed or issued) on computer networks and has published several conference papers on that subject. He has made numerous contributions to standards for computer networks.

William R. Hawe
A senior consulting engineer, Bill Hawe manages the LAN Architecture Group. He is involved in designing architectures for new networking technologies. Bill helped design the FDDI and extended LAN architectures. While in the Corporate Research Group, he worked on the Ethernet design with Xerox and Intel and analyzed the performance of new communications technologies. Before joining Digital in 1980, Bill taught electrical engineering and networking at the University of Massachusetts, where he earned a B.S.E.E. and an M.S.E.E. He has published numerous papers and holds several patents.

Dorothy Noren Millbrandt
Dotsie Millbrandt is a principal software engineer and a co-project leader for Common Network Management. Currently she is developing management components that will work across all the DECnet/OSI platforms: OpenVMS, OSF/1, and ULTRIX.
Dotsie was the project leader for the MOP component and the trace facility and has worked on OST transport and con figuration software. Prior to this work, she was a project leader and microcode developer for DSB32 and KJ.\1\'11 synchronous communications controllers in the CSS Network Systems Group. Ashok P. Nadkarni A principal software engineer in the Windows NT Systems Group, Ashok Nadkarni is working on a port of native Novell NetWare to Alpha �'Show Rou t i ng r t g$0002 Node DEC : . z ko . l l i um C i rcui t LAN L a n-D has received a l l portions of the response from Adj acency - Add r e s s EMA A , encoded them into C M I P, and transmit ted them back to the requ esting node, the CML image The command contains the node e n t i t y name, DEC : . zko. I I iu m ; the module ent ity with i n the node, rou ti ng; the name of the c i rc u i t subentity of rout i ng, lan-0; the name of the adjacency subentity of circuit, rtg$0002; and fi nally the attribute name. To issue management commands from a DECnet/ OS! fo r OpenVMS system, a user i nvokes the N C L u t i l i ty. N C L parses commands i nto fragments cal led tokens, containing ASCII stri ngs. It uses the data dic t i onary to transl ate these i n to management codes for directives, entit ies, and attribu tes. NCL then con structs a network i tem l is t from this i n formation and i nvokes the CMIP requester send fu nction. C M I P requester fu nctions are i m p lemented as a set of l i brary ro u t i nes t hat are l i nked w i t h the N C L u t i l ity. Und erneath t h is ca l l e r i nterface, the C M J P rou t i nes establish a connection over D N A session control to the destination node's CM I P l istener. The d i rective is then encoded i n to a CMIP message and passed to the destination. NCL now posts the first CMIP requester receive cal l . More than o ne receive c a l l may be needed to obtain a l l the response data. 
As soon as a partial response is ava i l able, the receive fu nction decodes the CMIP messages i n to network item l ists and passes them back to NCL. NCL transl ates these i n to displayable text ami val ues and d irects the output to the user's ter m i n a l o r a log fi l e . If the partial response is not complete, N C L then loops and issues a nother ca l l to the C M I P requester receive fu ncti on. The CMIP requester fu nctions are optimized for the local node case. If the dest i nation node is speci fied as "0" (the local node), the CMIP requester func t ions i n terface d i rectly to the EMAA i n terface, s k ipp i ng t he C M I P encod i ng, decod i ng, and the round trip across the network. termi nates. EMAA, the EMA Agent The m anagement struc ture imposed by EMA contains com m o n d i rect ives that must be supported by a l l e n t i t ies. A design goal for EMAA was to provide a common management facil i t y with support fo r common operations such as show o r set. EMAA can p erform these fu nctions aga i nst a n entity 's manageme n t data structures, thereby fre e i ng each e n tity from separately i mple menting them and s impl ify i ng the entity's code requ irements. This approach was successfu l l y implemented, though a t t h e cost of a more complex agent i m plementation and macro instructions a set of registration col loqu ial ly known as the " macros from hel l ." The above i n terface between EMA A and the enti· t ies is k n own as the fu l l i nterface. Not all develop ment groups' coding entit ies were i nterested in t h is approac h ; thus, EMA A a lso provides a basic i n ter face. An entity specifies wh ich i n terface to use d u r i ng its i n itial ization when it registers with EMA A . F o r a n e n t i t y t h a t u s e s the basic i n terface, E M A A si mply passes t h e d i rective i n formation t o the des ignated ent ity and expects response data returned. 
The choice of i n terface must be made by the modu le-level e n t i t y. If the entity uses the fu l l i n ter face, it m ust register its management structure, i ncluding a l l subentities and at tributes, with EMA A . For t hese e n t i t ies, EMA A processes the network i tem l is t passed by CML. I t creates a data structure for each subentity instance, specifying the a t tribu tes, any val ues supplied, and the actions to be performed . EMA A passes t h is to the designated e n t ity, whic h uses tables set up d u ri ng i n i t i a l i zation to cal l the appropriate action rou t i n e for the d i rec tive . By defa u l t , these action rou t i nes are set u p as c a l l backs i n to EMAA itse l f, thereby a l lowing Etvi.A A to perform the task . Wi th either the basic o r t h e ful l The CMIP Listener The CMIP l istener is imple interface, a separate response is required fo r each mented as a server process, s i m i l a r to the Phase IV suben tity i nstance specified by a d irective. EMAA network management l istener. When an i ncoming cal l s CML i terative l y t h rough a corout i n e cal l to connection request fo r CML is received, a process is pass response data back to CML. created to r u n the CML i mage. The CML i m age u t i l i zes the DNA session control i n terface to accept The Event Dispatcher Phase IV event logging the connection and receive the C M I P encoded a l lowed eve n t s to be sent to a s i n k o n one node. In d irective. It then uses the data d ictionary to decode Phase the message i n to a network item l is t . EMA A is then sinks that can be local o r on any n u mber of remote i nvoked to process the d i rective and return any nodes. Event filtering can be applied on the out required bound streams of events, fi ltering events before response from Digital Tee/mica/ jourt�al the e n t i t y. lkJI. 5 1\'o. 
1 Once CML Winter t9'J3 V, the event d ispatcher supports m u l tiple 25 DECnet Open Networking they are t ransm i t ted to a sink. This provides a mech implement ation of MOP was designed for m u l t i anism to direct different types of events to d ifferent t h readed operation . This means t h e re is o n l y one sinks. MOP process per node, and i t processes mu ltiple A n eve n t sink is the destination for a n event mes concurrent operations by crea ting a separate sage. A node can have m u l t iple si nks, each accept thread for each management d irective, program i ng events from any number of remote nodes. Event request, o r d u m p request received. Moreover, a l l fi ltering can be appl ied to the inbou n d streams of management data required t o service MOP requests passes is sent to the sink, which uses the data d ic structures, designed to be searched qu ickly. When a tionary to fo rmat it i n to ASCII character strings. I t is request is received, MOP can prom p t ly ascertain then ou tpu t to the sin k client, wh ich may be a con whether the required information to service the sole, printer, or file. request is avai lable and make a response. events at the event sink. An event message that is conta ined in MOP-specific m a nagement data An optimization is used when a n event is gener ated on a node and the destination sink is on the Session Control Implementation same node. In this case, the event bypasses the out The design of the DECnet/051 for Open VMS session bound stream and is que ued d i rectly to the eve n t con trol layer is based o n goa l s defined by both the sink. T h e DECnet/051 for O p e nYMS product, i n t h e session control arch i tect u re and the DECnet user defa u l t configuration for a l o c a l node, defines one com m u nity. These goals include outbou nd stream d i rected to a s i n k o n the loca l node and defines the console as the sink cl ient . 
• large customer base with major investments in An event relay provides compat i b i l i ty w i t h Phase DNA appl ications. The session control layer sup IV nodes. This important function permits a Phase V event sink to log messages from Phase IV or Phase V ports t hese applications without requiring a rei ink of the object code. DECnet systems. Event relay is a sessio n c o n t rol a p p l ication that l istens for DECnet Phase IV eve n t Compatibility. The DECnet -YAX product has a • Performance. Transmit and receive operations messages. I t encapsul ates e a c h P h a s e r v event mes across the network must be as efficient as possi V eve n t message and posts it to t he ble. M i n i m a l overhead is i n troduced by the ses event dispatcher, using the same service that other sion control layer in making each transport DECnet/051 for OpenVMS entities use to post events. protocol available to appl ications. sage in a Phase lvlaintenance Operations Protocol The N ET$MOP • tation of the DNA m a i ntenance operations proto • col. MOP uses the services of the local and wide New features. The session control layer takes fu I I advantage o f the new nam ing and addressing area data link device d rivers to perfor m l ow-level capabil i ties of Phase n e t wo r k operations. MOP can down-l i ne load a n operat ing system i mage to a YMScluster satell i te Extensible. The session contro l layer design al lows fo r future additions to the arch i tecture. process is the DECnet/051 for Open VMS i mplemen • Improved V DNA. ma nagement. The session control respond to remote requests from a layer compl ies with EMA, a l lowing it to be man network device to down - l i ne load or u p - l ine clump aged from anywhere t h roughout the network . node and an image . 
MOP also supports management directives that allow a system manager to load or boot a remote device, monitor system identification messages, perform data link loopback tests, or open a terminal I/O communications channel to a device's console program.

The primary design goal of the MOP implementation was to respond quickly and with low system overhead to remote requests from devices to down-line load an image. In some network configurations, a power failure and restoration can cause hundreds of devices to request a down-line load at the same time. The Phase IV implementation was known to have difficulty handling this, so the new

Session Control Design

The session control layer is divided into several logical components: $QIO, $IPC, NET$ACP, common services, and network management. $QIO and $IPC provide the APIs required to communicate across the network. $QIO is fully compatible with all Phase IV DECnet-VAX applications; however, it does not allow access to the full set of features available in DECnet/OSI for OpenVMS. These new features, and any future additions, are available only through the new $IPC interface.

The two APIs are consumers of session control services provided by the common services component. This component provides all the network functions defined in Phase V to the APIs above it. In order to do this, the common services component makes use of both the NET$ACP and network management portions of the session control layer.

Figure 3 shows the session layer components and their relationships to each other.

Figure 3  Session Design

Session Control APIs

DECnet Phase IV restricted node names to six characters in length. In DECnet-VAX the $QIO interface was the only means by which an application could make calls to the session control layer. This interface also enforced the six-character name limit.

With the advent of Phase V, this restriction no longer applies. It is possible for a node running Phase V to be unreachable by a Phase IV-style six-character node name. As a consequence, the $QIO interface was extended to allow full name representations of a node.

The $IPC interface is a new interface that incorporates all the functions of the $QIO interface, along with extensions made to the session control architecture. This item-list-driven interface provides a cleaner, more extensible interface and allows for easy conversion of $QIO applications. The $QIO interface uses a network control block (NCB) and a network function block (NFB) to hold data. This data is easily mapped to items in a network item list. Also, the function codes used by $QIO can be easily mapped to $IPC function codes. As new requirements arise, supported items can be added to the list without impacting the existing values.

The $IPC interface also supplies some new features not available in $QIO. Phase V DNA uses the Digital Distributed Name Service (DECdns) to store information about nodes and applications in a global namespace. Once an application declares itself in the global namespace, $IPC enables session control to maintain its address attribute. This address attribute contains all the information necessary to define where the application resides on the network. $IPC can then be used by the client side of an application to connect to a server through a single global name, instead of using a node name and application name pair. This enables the client side of an application to communicate with its server without knowing where the server currently resides.

$IPC supports a new means of accessing a node by its address. In Phase IV, addresses were limited to 63 areas with 1,023 nodes in each area. The address of each node could be represented with a 16-bit integer. The $QIO interface supports a form of node name in which the 16-bit address is converted into the ASCII representation of the decimal equivalent. This is not sufficient to address all Phase V nodes, so a new function called "connect-by-address tower" is available through $IPC. The address tower is discussed further in the Common Services Component section.

Yet another feature of $IPC is the ability to translate a node's address into the name of the node as registered in the global namespace. In Phase IV the address-to-name translation was a management function. Furthermore, the translation was local to the node on which it was performed.

Session Control Network Management

The session control layer makes use of the full EMA entity interface to support all entities defined by the session control architecture. These include the session control entity itself, as well as the application, transport service, port, and tower maintenance subentities. Each of these entities contains timers, flags, and other control information used by the session control layer during its operation. They also contain counters for the events generated by the session control layer.

The application subentity is of special interest. This entity is the equivalent of the Phase IV object database. It allows the system manager to register an application with session control to make it available for incoming connections. This entity is also used to control the operation of the application and select the types of connections that can be sent or received by it.
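As a rough illustration of the Phase IV addressing limits described under Session Control APIs, the sketch below packs an area and node number into one 16-bit integer and renders it as the decimal string form of node name. The 6-bit area and 10-bit node split is the conventional Phase IV layout and is an assumption here, as are all function names; the text itself states only the ranges and the 16-bit size.

```python
AREA_BITS = 6    # assumed: 63 usable areas fit in 6 bits
NODE_BITS = 10   # assumed: 1,023 usable nodes fit in 10 bits


def phase4_address(area: int, node: int) -> int:
    """Pack a Phase IV area.node pair into a single 16-bit integer."""
    if not 1 <= area <= 63:
        raise ValueError("area must be in the range 1..63")
    if not 1 <= node <= 1023:
        raise ValueError("node must be in the range 1..1023")
    return (area << NODE_BITS) | node


def decimal_node_name(area: int, node: int) -> str:
    """ASCII decimal form of the 16-bit address, usable as a node-name string."""
    return str(phase4_address(area, node))


def split_address(address: int) -> tuple:
    """Recover (area, node) from a packed 16-bit address."""
    return address >> NODE_BITS, address & ((1 << NODE_BITS) - 1)
```

Under these assumptions every Phase IV address fits in 16 bits, which is consistent with the statement that this form cannot cover all Phase V nodes and that a tower-based connect function was added to $IPC.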
Common Services Component

The common services component is the hub for session control. It is responsible for performing tasks that are not specific to the $IPC or $QIO interfaces. These tasks include managing transport connections on behalf of session control users, mapping from a DECdns object name to addresses, selecting communication protocols supported by both the local and remote end systems, maintaining the protocol and address information corresponding to local objects in the namespace, and activating (or creating) processes to service incoming connect requests.

The NET$ACP process is used to provide the common services component with process context. The NET$ACP image itself is nothing more than a set of queues and an idle loop. When the session control layer is loaded, it creates user-mode and kernel-mode tasks. A queue is assigned for each task, and the NET$ACP process attaches to the task when it is started. When the session component needs to execute in the context of a process and not on the interrupt stack, it builds a work queue entry, queues it to the appropriate task queue, and wakes up the NET$ACP. The NET$ACP finds the address of the desired routine in the work queue entry and executes it. This routine can be located anywhere that is addressable by the process, but it is usually contained within the session control loadable image.

The common services component makes use of the NET$ACP for reading files, creating network processes, and making calls to the DECdns clerk. It also makes use of the process for functions that require large amounts of memory. By performing these tasks in the NET$ACP process, session control is able to use process virtual memory even though it is implemented as an executive loadable image.

The tower set data structure plays a key role in session control. A tower set consists of one or more towers. Each tower represents a protocol stack and is composed of three or more floors, as shown in Figure 4. The lowest floors in the tower correspond to the DNA routing, transport, and session control layers; they are used to identify protocol and associated address information to be used at that layer. When viewed as a whole, the tower set describes a combination of protocols supported on a node. The session control layer on every DECnet/OSI for OpenVMS system not only uses this information to communicate with remote nodes, but is also responsible for building a tower set to represent that local system. Once built, this tower set is placed in the namespace as the attribute for the node.

Floor N   Application-defined floors
Floor 3   Session protocol     Session address information
Floor 2   Transport protocol   Transport address information
Floor 1   Routing protocol     Routing address information

Figure 4  Tower Design

The session control interfaces allow the user to specify a node in many ways. A node can be specified as a Phase IV-style node name, a Phase IV-style address, a DECdns full name, or a tower set. The three forms of name representations are mapped to the corresponding tower set by making calls to the DECdns clerk to obtain the node's tower set attribute. Once the tower set is in hand, it can be used to communicate with the session control layer on the remote node.

The tower set for a remote node and the tower set for the local node are used in conjunction to determine if both nodes support a common tower. If a common tower is found, session control attempts to establish a connection to the remote node using that tower. If the connection fails, the comparison continues. If another matching tower is found, the connection attempt is repeated. This continues until the connection is established or the tower sets are exhausted.
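The search for a common tower can be sketched as a loop over the remote node's towers. This is only an illustration: the tuple representation of floors, the rule that two towers match when their protocol sequences are equal, and the names `protocol_stack` and `connect_by_tower_set` are all assumptions, not the product's data structures.

```python
from typing import Callable, List, Optional, Tuple

Floor = Tuple[str, str]      # (protocol identifier, address information)
Tower = Tuple[Floor, ...]    # floors listed from floor 1 upward


def protocol_stack(tower: Tower) -> Tuple[str, ...]:
    """The sequence of protocols in a tower, ignoring address information."""
    return tuple(protocol for protocol, _address in tower)


def connect_by_tower_set(
    local: List[Tower],
    remote: List[Tower],
    try_connect: Callable[[Tower], bool],
) -> Optional[Tower]:
    """Try each remote tower whose protocol stack the local node also supports.

    Returns the first tower on which try_connect succeeds, or None when the
    tower sets are exhausted without a successful connection.
    """
    supported = {protocol_stack(tower) for tower in local}
    for tower in remote:
        # A failed attempt simply lets the comparison continue.
        if protocol_stack(tower) in supported and try_connect(tower):
            return tower
    return None
```

Passing the connection attempt in as a callable keeps the matching logic separate from the transport, so the loop can be exercised without a network.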
Use of DECdns

The session control layer uses DECdns objects for all global naming. These objects are used in two different ways: they can represent a node or a global application. A node object is a global representation of a node in a DECdns namespace. Each node object contains attributes that identify the location of a node. Foremost in this list of attributes is the DNA$Towers attribute. The DNA$Towers attribute contains the tower set for the node and is written automatically by the session control layer when DECnet/OSI for OpenVMS is configured and started. Once created, this attribute is updated by session control to reflect the current supported towers for the node.

When the session control layer builds the tower set for the DECdns node object, it creates a tower for each combination of supported protocols and network addresses on the node. If the node supports two transports and three network addresses, the tower set is generated with six towers. It always places the CML application protocol floor on top of the session control floor. The address information for the session control floor is then set to address the CML application. The transport address information is set to address DNA session control, and the routing information of each tower in the set is set to one of the supported network addresses for the node.

The node object DNA$Towers attribute contains data that completely describes the node. Since session control supports node addresses and Phase IV-style node names, soft links are created in the namespace to map from a Phase V network service access point (NSAP) or a Phase IV-style node name (node synonym) to the node object. These links can then be used by the session control layer as alternate paths to the node object.

An application object is a global representation of an application. The DNA$Towers attribute of this object contains a set of address towers used to address the application. The routing and transport floors for each tower in this set are used in the same manner as for the node object. The address information in the session floor, however, addresses the application, not CML. Once set, the information in this tower set is not maintained unless the application issues a register object call through the $IPC interface. If this is done, session control maintains the tower in the same manner as it does for the node object.
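The rule that a node tower set holds one tower per combination of transport and network address can be sketched with a Cartesian product. The floor contents below follow the description of the node object's towers; the tuple layout, the identifier strings, and the function name are hypothetical placeholders rather than the real encoding.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Floor = Tuple[str, Optional[str]]   # (protocol identifier, address information)
Tower = Tuple[Floor, ...]           # floors listed from floor 1 upward


def build_node_tower_set(
    transports: Sequence[str],
    network_addresses: Sequence[str],
    routing_protocol: str = "ROUTING",   # hypothetical identifier
) -> List[Tower]:
    """Return one tower per (transport, network address) combination."""
    towers = []
    for transport, address in product(transports, network_addresses):
        towers.append((
            (routing_protocol, address),       # floor 1: one of the node's addresses
            (transport, "SESSION-CONTROL"),    # floor 2: addresses DNA session control
            ("SESSION-CONTROL", "CML"),        # floor 3: addresses the CML application
            ("CML", None),                     # top floor: CML application protocol
        ))
    return towers
```

With two transports and three network addresses this yields the six towers mentioned in the text.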
Transport Implementation

The DECnet/OSI for OpenVMS product supports two transport protocols: the open systems interconnection transport protocol (OSI TP) and the network service protocol (NSP). Each transport protocol, or group of logically associated protocols, is bundled as a separate but equivalent VAX communication module. This approach accomplishes many goals. The more notable ones include

• Isolating each module as a pure transport engine

• Defining and enforcing a common transport user interface to all transports

• Providing extensible constructs for future transport protocols, i.e., providing a set of transport service libraries

• Eliminating previous duplication in adjacent layers (session and network routing layers)

• Providing backward compatibility with existing Phase IV transport protocol engines (NETDRIVER/NSP and VAX OSI transport service)

Transport Layer Design

A transport VAX communication module has two components, a protocol engine and the transport service libraries. The service libraries are common code between modules and are linked together with each engine to form an executive loadable image. The three elements of DECnet/OSI for OpenVMS transport, the NSP protocol engine, the OSI protocol engine, and the transport service libraries, are linked into two images. Figure 5 shows the relationship of these elements.

Figure 5  Logical Transport Components

The specific functions provided by a transport engine depend on the protocol. The generic role of NSP and the OSI transport layer is to provide a reliable, sequential, connection-oriented service for use by a session control layer. The design provides a common transport interface to both NSP and the OSI transport layer. This enables NSP and OSI transport (class 4) to be used interchangeably as a DNA transport. As future transport protocols are developed, they can be easily added into this design.

The DECnet/OSI for OpenVMS transport design places common functions in the service libraries for use by any protocol engine that needs them. Any functions that are not specific to a protocol are performed in these libraries. Separating these functions enables new protocols to be implemented more quickly and allows operating-system-specific details to be hidden from the engines.

The NSP transport VAX communication module operates only in the DNA stack and supports only DNA session control. Due to an essentially unchanged wire protocol, NSP is completely compatible with Phase IV implementations.

The OSI transport VAX communication module implements the International Organization for Standardization (ISO) 8073 classes 0, 2, and 4 protocols. It can operate on a pure OSI stack in a multivendor environment. The OSI transport is also completely compatible with the Phase IV VAX OSI transport service implementation and operates on the DNA stack supporting DNA session control.

Transport Engines

The transport VAX communication modules provide a transport connection (logical link) service to the session layer. The connection management is designed to ensure that data on each logical link is handled independently from data on other logical links. Data belonging to different transport connections is never mixed, nor does a blockage of data flow on one connection prevent data from being handled on another.

The transport VAX communication modules are state table driven. Each transport engine uses a state/event matrix to determine the address of an appropriate action routine to execute for any state/event combination. As a transport connection changes state, it keeps a histogram of state transitions and events processed.
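A state-table-driven engine of the kind described under Transport Engines can be sketched as a lookup from (state, event) to an action routine, with a per-connection histogram. The states, events, and routines below are invented for illustration and are far simpler than the NSP or ISO 8073 state machines.

```python
from collections import Counter
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    CONNECTING = auto()
    OPEN = auto()


class Event(Enum):
    CONNECT_REQUEST = auto()
    CONNECT_CONFIRM = auto()
    DISCONNECT = auto()


class Connection:
    """One transport connection with its own state and history."""

    def __init__(self) -> None:
        self.state = State.CLOSED
        self.history = Counter()   # histogram of (state, event) pairs processed


# Action routines: each would do the protocol work, then return the next state.
def start_connect(conn): return State.CONNECTING
def complete_connect(conn): return State.OPEN
def close(conn): return State.CLOSED
def ignore(conn): return conn.state


# State/event matrix; combinations not listed fall back to `ignore`.
MATRIX = {
    (State.CLOSED, Event.CONNECT_REQUEST): start_connect,
    (State.CONNECTING, Event.CONNECT_CONFIRM): complete_connect,
    (State.CONNECTING, Event.DISCONNECT): close,
    (State.OPEN, Event.DISCONNECT): close,
}


def handle(conn: Connection, event: Event) -> None:
    """Look up the action routine for the current state and event, then run it."""
    action = MATRIX.get((conn.state, event), ignore)
    conn.history[(conn.state, event)] += 1
    conn.state = action(conn)
```

Because each `Connection` carries its own state and history, events on one connection never affect another, which mirrors the independence requirement stated above.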
Service Libraries

The following functions are common to many protocols and are implemented in the service libraries.

• Transfer of normal data and expedited data from transmit buffers to receive buffers

• Fragmentation of large messages into smaller messages for transmission and the reconstruction of the complete message from the received fragments

• Detection and recovery from loss, duplication, corruption, and misordering introduced by lower layers

The key transport service library is the data transfer library. This library gives a transport engine the capability to perform data segmentation and reassembly. Segmentation is the process of breaking a large user data message into multiple, smaller messages (segments) for transmission. Reassembly is the process of reconstructing a complete user data message from the received segments. To use the data transfer library, a protocol engine must provide a set of action routines. These action routines hold the protocol-specific logic to be applied to the data handling process.
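Segmentation and reassembly as defined above can be sketched in a few lines. The segment layout (sequence number, end-of-message flag, payload) and the function names are assumptions made for the example; the sketch tolerates duplicated and misordered segments but does not attempt corruption detection or retransmission.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, bool, bytes]   # (sequence number, end-of-message flag, payload)


def segment(message: bytes, max_payload: int) -> List[Segment]:
    """Break a user message into numbered segments of at most max_payload bytes."""
    if max_payload < 1:
        raise ValueError("max_payload must be at least 1")
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    last = len(chunks) - 1
    return [(seq, seq == last, chunk) for seq, chunk in enumerate(chunks)]


def reassemble(segments: Iterable[Segment]) -> bytes:
    """Rebuild the message from segments that may be duplicated or out of order."""
    payloads = {}
    last = None
    for seq, end_of_message, payload in segments:
        payloads.setdefault(seq, payload)   # a duplicate segment is dropped
        if end_of_message:
            last = seq
    if last is None or any(seq not in payloads for seq in range(last + 1)):
        raise ValueError("message is incomplete")
    return b"".join(payloads[seq] for seq in range(last + 1))
```

In a real engine the protocol-specific parts, such as how a segment is framed on the wire, would live in the action routines the engine supplies to the library.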
To use cations, the data transfer l ibrary, a protocol engine must OpenVMS VMScluster Alias. 30 and Vol. 5 No. I participation Winter 1993 in DECnet/OSI for Digital Technical journal The DECnet/05/for OpenVMS Version 5. 5 /mplementation The two main differences between the Phase IV for scripts to contain information for numerous and Phase V NSP implementations are conformance to the EMA management model, and, once again, entities. For example, the NSP transport i n i tial iza flow control . In Phase IV, NSP does not request flow instance of the session control transport service tion script contains commands to create an control and uses an XON/XOFF mechanism. This provider entity, which enables the session layer to resu lts in large fluctuations in throughput. Phase V use the protocol. The procedure can extract infor NSP has been en hanced to request segment flow mation about the control. This mechanism enables each side of a NET$CONVERT_DATABASE utility to translate an transport to determine when it can send data seg existing Phase IV configuration contained i n the ments. Due to this d ifference in flow control policy, configuration by using the Phase IV permanent databases. Alternatively, it can Phase V NSP throughput converges to a maximum prompt the user for the information needed to value. al low basic operation of the node. Future Direction of Transports the questions, except for the node's full name and The DECnet/OSl for OpenVMS transport design pro its Phase IV address, have defa u l t choices. If the vides a common transport user interface to a l l defaults are chosen, the node operates properly The first time NET$CONFIGURE is executed, all transports and places common functions i n the once the network has started. When appropriate, transport service l ibraries. 
Configuration

Design on the new configuration tools was started by collecting user comments about the Phase IV tools and desirable features for the new tool. This data was collected from customer communication at DECUS, through internal notes files, and through internet news groups.

The first goal agreed upon was to create configuration files that are easy to read; inexperienced Phase V network managers may be required to read and understand these files. Next, the tool must be structured. The configuration is divided into several files with recognizable file names rather than one potentially unmanageable one. Each file contains the initialization commands needed to initialize one network entity. Finally, the configuration tool should be extensible. New commands, entities, or other information can easily be added to the configuration.

Configuration Design

The main configuration tool is a DCL command procedure (NET$CONFIGURE.COM). This procedure generates NCL script files, which are executed during network start-up, to initialize the network. In general, each script file initializes one entity within DECnet/OSI for OpenVMS. It is possible, however, for scripts to contain information for numerous entities. For example, the NSP transport initialization script contains commands to create an instance of the session control transport service provider entity, which enables the session layer to use the protocol. The procedure can extract information about the configuration by using the NET$CONVERT_DATABASE utility to translate an existing Phase IV configuration contained in the Phase IV permanent databases. Alternatively, it can prompt the user for the information needed to allow basic operation of the node.

The first time NET$CONFIGURE is executed, all the questions, except for the node's full name and its Phase IV address, have default choices. If the defaults are chosen, the node operates properly once the network has started. When appropriate, NET$CONFIGURE also calls other configuration tools to configure the DECdns client and the Digital Distributed Time Service (DECdts), and to perform various network transition functions.

Once the initial configuration has been performed, customization of components is available. Subsequent execution of the NET$CONFIGURE procedure will present the user with a menu that allows specific subsections of the configuration to be done, for example, adding or deleting MOP clients or session control applications, changing the name of the node, or controlling the use of communications devices.

General help is available while running NET$CONFIGURE. If the user does not understand any individual query, responding with a "?" (question mark) provides a brief explanation.

The scripts created by NET$CONFIGURE are computed. NET$CONFIGURE computes a checksum for each script file, and the checksum is stored in a database along with the answers entered for all other configuration questions. This allows the NET$CONFIGURE procedure to detect whether a script has been modified by an outside source. If this condition is detected, NET$CONFIGURE warns the user that user-specific changes made to the particular script may be lost.

If a user has modified the NCL scripts, NET$CONFIGURE cannot guarantee that the information will be retained after future executions of the procedure. An attempt is made to maintain the changes across new versions. In all cases, previous scripts are renamed before the new scripts are generated. This allows the user to verify that a customized change was transferred to the new script. If not, the saved version can be used to manually replace the change.
Node Configuration

NET$CONFIGURE asks only one question that is directly related to the node entity. It asks for the node's DECdns full name and sets the node's name. Since the namespace nickname is a required component of the full name answer, it also allows the procedure to determine the namespace in which to configure DECdns.

The node synonym default is generated by using the first six characters of the last simple name of the node's full name. If the user entered the full name USN:.Norfolk.Destroyer.Spruance.DD125, the synonym default would be DD125. The user is free to change this information as long as the response is a legal Phase IV-style name. If present, the transition tools make use of this synonym when the node is registered in the DECdns namespace.
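The synonym default rule lends itself to a short sketch. The parsing below assumes that the namespace nickname is everything before the first colon and that simple names are separated by periods; it does not check that the result is a legal Phase IV-style name, and both function names are hypothetical.

```python
def namespace_nickname(full_name: str) -> str:
    """Return the part of a DECdns full name that precedes the first colon."""
    nickname, separator, _rest = full_name.partition(":")
    if not separator:
        raise ValueError("full name has no namespace nickname")
    return nickname


def default_synonym(full_name: str) -> str:
    """Return the first six characters of the last simple name in a full name."""
    last_simple_name = full_name.rsplit(".", 1)[-1]
    return last_simple_name[:6]
```

For the example in the text, `default_synonym("USN:.Norfolk.Destroyer.Spruance.DD125")` gives `DD125`, and a longer last simple name such as `Spruance` would be cut to `Spruan`.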
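Data Link/Routing

The NET$CONFIGURE procedure contains a table of all valid data link devices supported by DECnet/OSI for OpenVMS. When the data link/routing configuration module is called, the system configuration is scanned. Any valid devices found on the system are presented to the user for addition to the configuration. The only exceptions are asynchronous data link devices. The user must specifically request asynchronous support for these devices to be configured.

Configuration is mandatory for broadcast data link media since these devices are shareable and users other than DECnet/OSI for OpenVMS may request the device. For synchronous devices, the user has the choice to configure the device for use by DECnet/OSI for OpenVMS. If a device is configured, a choice between the Digital data communications message protocol (DDCMP) or high-level data link control (HDLC) as data link protocol must also be made.

Each data link device configured requires a name for the device and a name for the corresponding routing circuit. The defaults for these names are generated by using the protocol name, e.g., carrier sense multiple access-collision detection (CSMA-CD), HDLC, or DDCMP, along with a unit number. The user may override the default with any valid simple name. This allows the user to set the data link and routing circuit names to be more descriptive in their environment; for example, HDLC_SYNC_TO_BOSTON for a data link and CONNECTION_TO_BOSTON_DR500 for a routing circuit.

Transport/Session Control

NET$CONFIGURE supports the NSP and OSI transports. The procedure configures both transports by default, but allows the user to select only one. Commands are generated in the start-up scripts to initialize both the transports and the session control transport service provider entity instances, which allow the session control layer to use them.

If OSI transport is configured, default templates are created to allow the installation verification procedures for the OSI applications to operate successfully. The user also has the option of creating specific connection option templates for use with OSI applications.

All default session control applications, e.g., file access listener (FAL), mail, or phone, are configured in the same way as they are with the DECnet-VAX Phase IV configuration tool. The user has the option to allow access to each application through a default account or not. The only queries made by the configuration tool are about the creation of the user account for the application.

DECdts Configuration

The DECdts configuration is performed by a call to the DTSS$CONFIGURE procedure. DTSS$CONFIGURE prompts the user to choose between universal coordinated time (UTC) or local time, which is UTC plus or minus the time-zone differential factor (TDF). If local time is chosen, then the procedure prompts for the continent and time zone on that continent to use. This information is needed to compute the TDF. The DTSS$CONFIGURE tool creates its own NCL scripts. These scripts are not maintained by NET$CONFIGURE, and no checksums are computed or stored for them.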
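The relationship between UTC, the TDF, and local time can be written down directly. This is a minimal sketch using the Python standard library; the function name is hypothetical, and the example offset of minus five hours is only an illustration, not a value taken from DTSS$CONFIGURE.

```python
from datetime import datetime, timedelta, timezone


def local_time(utc: datetime, tdf: timedelta) -> datetime:
    """Local time is UTC shifted by the time-zone differential factor (TDF)."""
    if utc.tzinfo is None:
        raise ValueError("expected a timezone-aware UTC datetime")
    return utc.astimezone(timezone(tdf))
```

The returned value denotes the same instant as the input; only the wall-clock reading and the attached offset differ.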
DECdns Configuration

To configure DECdns, the network software must be in operation so that the DECdns software may use it. The NET$CONFIGURE procedure attempts to start the network once it has created the necessary scripts. Once the network has been started, the NET$CONFIGURE procedure calls DNS$CONFIGURE, passing it the node full name that was entered by the user. The full name contains the namespace nickname that the user wishes to use. DNS$CONFIGURE then uses the DECdns advertiser to listen on the broadcast media for a name server that is advertising the same namespace nickname. If a match is made, DECdns creates an initialization NCL script with the needed instructions to configure the DECdns clerk at the next system boot. It then tells the advertiser to configure against the same namespace.

If the namespace nickname cannot be matched, the user is given alternatives. First, a list of the current namespaces advertised on the broadcast media, along with the LOCAL: namespace, is offered. LOCAL: is a special case used in lieu of the standard client-server configuration. The LOCAL namespace makes use of the client cache to store a small number of nodes locally.

If a choice is not made from the list, the user is queried to see if an attempt should be made to configure to a name server that may be located on a data link other than the broadcast media. If yes, then a valid address must be provided to the DNS$CONFIGURE tool so that it may connect to the name server on the remote node.

If no options are chosen at this point, a final choice of creating a name server on the local node is presented. Since DECnet/OSI for OpenVMS must configure the DECdns clerk, if the answer is still no, the procedure returns to the original list of known namespaces and starts again.

Transition Tools

Once DECdns is configured, the transition tools are used to create the correct namespace directory configuration. If a new namespace has been created and selected for use, the tools populate the directories with the node information from the Phase IV DECnet database found on the system. Most often, the tools simply register the node with the DECdns name server along with the node synonym that was provided by the user during the node configuration portion of NET$CONFIGURE.

The transition tools also assist the user when renaming the node or changing from one namespace to another. They copy subdirectory information from the node's old DECdns directory to the new directory structure on the new namespace or within the same namespace, if the user only changed the node's name.

Summary

The DECnet/OSI for OpenVMS version 5.5 product implements all layers of the DNA Phase V architecture. This extends the OpenVMS system to a new degree of network access by supplying standard OSI protocols. The product also protects the large investment in network software that OpenVMS users currently hold. This is done by fully supporting the extensive selection of applications available for Phase IV DECnet-VAX. In addition, the design of DECnet/OSI for OpenVMS is structured in a way that will ease the introduction of new standards as they become available.

Acknowledgments

Throughout the course of this project, many people have helped in the design, implementation, and documentation of the product. We would like to thank all those people for all their help. We would also like to extend a special thanks to all members of the bobsled team. Without them, this product would never have come to be.

References

1. J. Harper, "Overview of Digital's Open Networking," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 12-21.

2. M. Sylor, F. Dolan, and D. Shurtleff, "Network Management," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 117-129.

Kim A. Buxton
Edward J. Ferris
Andrew K. Nash

The ULTRIX Implementation of DECnet/OSI

The DECnet/OSI for ULTRIX software was developed to allow the ULTRIX operating system and ULTRIX workstation software systems to operate in a multivendor, multiprotocol network based on open standards. It operates in a complex networking environment that includes OSI, DECnet Phase IV, X.25, and TCP/IP protocols. BSD sockets and protocol switch tables provide the entry points that define interfaces for protocol modules. The DECnet/OSI for ULTRIX software incorporates Digital's Enterprise Management Architecture, which provides a framework on which to consistently manage the various components of a distributed system. The DECnet/OSI for ULTRIX software provides a set of powerful tools and a system that can be extended to include new functions as they are incorporated in the OSI standard.

DECnet/OSI for ULTRIX is an end system implementation that supports the open systems interconnection (OSI) protocol through the Digital Networking Architecture (DNA) Phase V software. This implementation provides several features and programming environments that are consistent with the UNIX system philosophy of networking. Ease of use, extensibility, and portability were key design goals during product development. Operation of DECnet/OSI for ULTRIX software in a complex networking environment provides coexistence and interaction with the transmission control protocol/internet protocol (TCP/IP), DECnet Phase IV, X.25, and multivendor OSI networks. The paper "Overview of Digital's Open Networking" (in this issue) provides a suitable introduction to DECnet/OSI concepts.[1] For more details concerning standard Berkeley Software Distribution (BSD) networking concepts, the reader is referred to the general references listed at the end of this paper.

This paper provides an overview of DECnet/OSI for ULTRIX software. It discusses some of the design decisions made during product development, including the use of protocol switch tables. It describes the system's five communication domains, emphasizing the X.25, data link, and OSI application domains. The paper continues with a discussion of programming interfaces, interfaces into kernel modules, and a network management interface established for extensibility. It concludes with a description of network management and network configuration.

System Overview

DECnet/OSI for ULTRIX is an end system implementation of the OSI network architecture and Digital's DNA Phase V. The DNA Phase V architecture provides a framework for incorporating OSI protocols as defined by the International Organization for Standardization (ISO) into DECnet/OSI products. DECnet/OSI for ULTRIX software is integrated into the kernel and layered on existing interfaces. This software allows the ULTRIX operating system and ULTRIX workstation software (UWS) systems to operate in a multivendor, multiprotocol network based on open standards.

The DECnet/OSI for ULTRIX software provides the following network services:

• Base networking software, which includes transport services, network layer services, X.25, and local area and wide area device driver support as described in the ISO Reference Model and DNA.[2]

• Network management software, incorporating the Digital Enterprise Management Architecture.

• Application programming interfaces to support user development of distributed applications.

• DECnet application software. DNA session control bridges DECnet applications such as file transfer (dcp, dls, drm), remote login (dlogin), and mail to transport layer services.

• DECdns, Digital's distributed name service, which provides a location-independent naming facility. This service is used by DNA session control to provide node name-to-address translations.[3]

• Digital's distributed time service, DECdts. This time synchronization service is required by many distributed applications such as DECdns to maintain a consistent time base for their operations.

• OSI applications software, including file transfer, access, and management (FTAM) and virtual terminal protocol (VTP) support.

System Goals and Development

A major goal of DECnet/OSI for ULTRIX was to support large multivendor, multiprotocol networks, including coexistence of OSI and TCP/IP on an ULTRIX UWS system. Coexistence includes the ability to share system resources and to provide a consistent set of services to users of both the OSI and internet protocols. Another goal was to provide connectivity between OSI and TCP/IP networks through the implementations of gateways and hybrid stacks.

Interoperability between DECnet/OSI and DECnet Phase IV products was required to maintain connectivity during network transition to OSI. A framework for the development of new OSI applications such as FTAM was another requirement. As in the DECnet-ULTRIX Phase IV implementation, programming and user interfaces needed to be consistent with the ULTRIX and UNIX systems environment.

Wherever possible, code was to be shared with other development projects. For this reason, software development engineers used the C programming language and aimed to produce a portable implementation. This was particularly important for the X.25 implementation, which would be used in other products. The code was structured to minimize system-specific references and dependencies.

DECnet/OSI for ULTRIX development began with a collection of eight distinct projects, each with its own goals, schedules, and priorities. These projects were developed across engineering organizations, and spanned three continents. They consisted of X.25, wide area device drivers, FTAM, VTP, DECdts, DECdns, OSI applications kernel (OSAK), and the DECnet/OSI base components.

Early in development, it was realized that no individual project could be successful without achieving success at a systems level for the DECnet/OSI for ULTRIX product. This realization caused a change in the way the DECnet/OSI for ULTRIX projects approached engineering development. Our focus switched to providing a common set of goals and one integrated schedule. Priorities for individual projects were reevaluated in the context of the system goals and schedule. It was critical to have a set of well-defined interfaces; any change to these interfaces could have a major system impact. Communication between all projects was essential. A significant amount of time was built into the schedule for system integration, as well as component integration.

Kernel Networking Environment

The DECnet/OSI for ULTRIX kernel implementation was designed to be consistent with other ULTRIX networking implementations such as the TCP/IP and Local Area Transport (LAT). The networking structure is based on the BSD networking subsystem.[4] The ULTRIX networking environment allows protocol components to be insulated from each other. One important aspect of this networking system is the use of protocol switch tables. These tables contain the entry points for various protocol modules in the system, as shown in Figure 1. DECnet/OSI for ULTRIX uses these entry points to define interfaces for each protocol module. This means that there are
no direct calls from one protocol component into Code that i n terfaced d irectly to the BSD system was another, an i mportant consideration when new isolated in separate modu les, and use of system layers must be in tegrated. Moreover, one protocol specific devices such as timers and buffers was hid module does not access another's databases. Infor den behind generic macros or subroutines. In addi tion, the software was designed to be mation is accessed from a mod ule only through the defined interface . extensible so that fu ture OSI p rotocols could be Insu lating protocol modu les from each other is added . To achieve extensibility, interfaces were advantageous for various reasons. As long as a pro components. tocol mod u le supports a generic i n terface, it can These include appl ication programming interfaces, established between the various act as a service provider for m u ltiple users, which in terfaces i n to each kernel mod u le, and a network a l lows a system to support multiple configurations. management interface. New protocols could be For example, X .25 or h igh-level data l in k control more easily added by supporting these in terfaces. (HDLC) may be configured i n to the kernel only Digital Technical journal Vol. 5 No. I Winter I'J93 35 DECnet Open Networking PROTOCOL SWITCH TABLE ELEMENT O: �------, SOCKET TYPE PROTOCOL FAMILY PROTOCOL NUMBER DOMAIN LIST ... DOMAIN FAMILY DOMAIN NAME POINTER TO BEGINNING OF DOMAIN PROTOCOL SWITCH TABLE POINTER TO END OF DOMAIN PROTOCOL SWITCH TABLE POINTER TO NEXT DOMAIN ENTRY FUNCTION ENTRY POINTS: pr_inpul() pr_oulpul() pr_ctlinput() pr_ctloutput() pr_usrreq() pr_init() pr_fastimo() pr_slowtimo() pr_drain() ELEMENT N.· I ... I SOCKET TYPE PROTOCOL FAMILY PROTOCOL NUMBER ... FUNCTION ENTRY POINTS pr_input() pr_output() pr_ctlinput() pr_ctloutput() pr_usrreq() pr_init() pr_rastimo() pr_slowtimo() pr_drain() t Figure 1 Domains and Protocol Switch Tables when those services are needed. 
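The indirection shown in Figure 1 can be illustrated with a small model. The sketch below is written in Python rather than kernel C and is not the ULTRIX code: the class names, the "seqpacket" socket type label, the protocol number, and the reduced set of entry points are all assumptions made for illustration. It shows a client reaching a provider only through a table lookup, and a lookup failing cleanly when a service has not been configured.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ProtoSwitchEntry:
    """One protocol switch table element, loosely following Figure 1."""
    sock_type: str                     # hypothetical label, e.g. "seqpacket"
    family: str                        # e.g. "AF_X25"
    protocol: int                      # hypothetical protocol number
    pr_output: Optional[Callable[[bytes], int]] = None
    pr_input: Optional[Callable[[bytes], int]] = None


@dataclass
class Domain:
    """A domain entry that owns its own protocol switch table."""
    family: str
    name: str
    protosw: List[ProtoSwitchEntry] = field(default_factory=list)


class DomainList:
    """Registry of configured domains; callers never touch module internals."""

    def __init__(self) -> None:
        self._domains: Dict[str, Domain] = {}

    def add(self, domain: Domain) -> None:
        self._domains[domain.family] = domain

    def find(self, family: str, sock_type: str,
             protocol: int) -> Optional[ProtoSwitchEntry]:
        domain = self._domains.get(family)
        if domain is None:
            return None                # service not configured into the system
        for entry in domain.protosw:
            if entry.sock_type == sock_type and entry.protocol == protocol:
                return entry
        return None


def build_example_domains() -> DomainList:
    """Configure a single hypothetical X.25 provider."""
    sent: List[bytes] = []

    def x25_output(data: bytes) -> int:
        sent.append(data)              # stand-in for handing data to a driver
        return len(data)

    domains = DomainList()
    domains.add(Domain("AF_X25", "x25", [
        ProtoSwitchEntry("seqpacket", "AF_X25", 1, pr_output=x25_output),
    ]))
    return domains


if __name__ == "__main__":
    domains = build_example_domains()
    entry = domains.find("AF_X25", "seqpacket", 1)
    print(entry.pr_output(b"hello"))         # the call goes through the table
    print(domains.find("AF_DLI", "raw", 7))  # this domain was never configured
```

Because the caller holds only the table entry, a provider can be replaced or left out without changing the caller, which is the property the text attributes to the protocol switch design.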
New protocol modules can be easily added. If token ring support is added as one of the broadcast devices, using the same interface as the carrier sense multiple access with collision detection (CSMA/CD) and fiber distributed data interface (FDDI) modules, little or no change will be required to the network layer.

Modularity is another advantage. Complexity can be reduced and problems can be isolated more easily when interfaces between each protocol module are carefully defined. For example, defining a network management interface for each protocol removes the requirement for network management to access protocol module databases directly. Network management code does not need to understand the internal organization of a module or the locking strategies that may be required to access the data.

To make use of the protocol switch table entry points, some minor enhancements were required. An extension was made to the control output interface to allow requests from kernel-level protocol modules and network management. The interface was further extended to allow protocol modules to use a port option to identify themselves as clients of the service provider, to acquire information from the service provider, or to modify the service provider's behavior. Network management uses a different option passed through the control output interface to manage kernel entities.

The control input interface was also enhanced. This interface provides two arguments: a request and a pointer to one or more arguments to be interpreted as a function of the request. Originally, this routine was used to notify IP of events, where each event had its own unique request value. To allow DECnet/OSI protocols to use this interface without adding several new request values, a general purpose request was introduced.
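A rough model of these two extensions is sketched below in Python. It is only an illustration under stated assumptions: the option name OPT_PORT, the request name PRC_GENERIC, the "action" keys, and the "window" setting are invented here, since the paper does not give the real option or request values. The provider accepts a port option through its control output routine (register as a client, acquire information, modify behavior) and reports changes to every registered client through a single general purpose control input request whose argument says what happened.

```python
from typing import Callable, Dict, List

# Hypothetical names; the real option and request values are not in the paper.
OPT_PORT = "port"
PRC_GENERIC = "generic-service-change"

CtlInput = Callable[[str, dict], None]


class ServiceProvider:
    """A lower-layer module reached only through its control entry points."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.addresses: List[str] = []
        self.settings: Dict[str, int] = {}
        self._clients: Dict[str, CtlInput] = {}

    def ctloutput(self, option: str, args: dict):
        """Control output: handle the port option for kernel-level clients."""
        if option != OPT_PORT:
            raise ValueError(f"unsupported option: {option}")
        action = args["action"]
        if action == "register":       # identify the caller as a client
            self._clients[args["client"]] = args["ctlinput"]
            return True
        if action == "get":            # acquire information from the provider
            return list(self.addresses)
        if action == "set":            # modify the provider's behavior
            self.settings[args["name"]] = args["value"]
            return True
        raise ValueError(f"unsupported action: {action}")

    def change_addresses(self, addresses: List[str]) -> None:
        """Internal change, reported with one general purpose request."""
        self.addresses = list(addresses)
        for ctlinput in self._clients.values():
            ctlinput(PRC_GENERIC, {"event": "addresses-changed",
                                   "addresses": list(self.addresses)})


class TransportClient:
    """A client module that records the events it is told about."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[str] = []

    def ctlinput(self, request: str, args: dict) -> None:
        # The meaning of args depends on the request value.
        if request == PRC_GENERIC:
            self.events.append(args["event"])


if __name__ == "__main__":
    provider = ServiceProvider("network-layer")
    client = TransportClient("osi-transport")
    provider.ctloutput(OPT_PORT, {"action": "register",
                                  "client": client.name,
                                  "ctlinput": client.ctlinput})
    provider.change_addresses(["addr-1", "addr-2"])
    print(client.events)
    print(provider.ctloutput(OPT_PORT, {"action": "get"}))
```

Carrying the event type inside the argument, rather than defining a new request value per event, keeps the set of request values small as new providers are added.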
The general purpose request is used by a service provider to interrupt one or more of its clients to inform them of a change in service. As part of the argument list, the service provider passes a value indicating the exact nature of the event being communicated. As an example, the network layer uses this mechanism to inform the transport layer modules of a change to the set of network addresses. Similarly, X.25 uses this interface to provide status about specific network connections.

The ULTRIX/BSD networking system organizes protocols into communication domains. The purpose of a communication domain is to group together common properties necessary for process-to-process communication. As an example, the X.25 domain was designed to provide a full set of X.25 services that can be selected by client protocols. It includes the socket and protocol switch table interfaces necessary for user-level and kernel-level clients, X.25 accounting, profile loading, and trace utilities.

The components of DECnet/OSI for ULTRIX may be combined in different ways depending on the configuration requirements of individual customers. A multiple domain approach was chosen to allow the various products and their development to be separated from one another. For example, network management software was placed in a separate domain to allow the X.25 and wide area network device driver (WANDD) products to be managed without installing DECnet/OSI for ULTRIX. Similarly, the OSI domain protocols may operate without the X.25 or WANDD products configured into the system. Five domains were established:

1. The DECnet domain (AF_DECnet) is retained to provide backward compatibility to existing DECnet-ULTRIX Phase IV applications.

2. The data link domain (AF_DLI) contains all the link protocols, including Logical Link Control (ISO 8802-2), CSMA/CD, FDDI, and HDLC. For DECnet/OSI for ULTRIX, the AF_DLI domain provides access to the drivers for kernel modules as well as user applications.

3. The X.25 domain (AF_X25) contains the protocols necessary to access X.25 networks.

4. The OSI domain (AF_OSI) contains the higher level DECnet/OSI protocols, i.e., DNA session control, network services protocol (NSP), OSI transport, and DNA Phase V routing.

5. The network management domain (AF_NETMAN) contains all the network management functions. These functions can be used to manage any DNA networking product.

Data Link Domain

Under DECnet-ULTRIX Phase IV, the routing protocol module accessed the drivers directly. In the OSI implementation, data link interface (DLI) modules interface to the device drivers and act as service providers to network layer clients such as routing. This decision was made to minimize specific DECnet/OSI support needed in the ULTRIX operating system device drivers. This allows changes to be made more easily, and it provides a central location for common data link protocol code as well as network management code.

The AF_DLI domain provides a common interface to broadcast data links such as CSMA/CD and FDDI. Modules implementing new broadcast data link technologies can be added at any time by conforming to the DLI interface. DLI provides support for ISO 802.2 class I, type 1 functions; these may be used by any broadcast module. Other 802.2 classes are handled by passing frames directly to the client module.

The point-to-point protocols consist of HDLC and the Digital data communications message protocol (DDCMP). ULTRIX relies on the DDCMP support provided by hardware devices. However, a DDCMP software module exists to interface these devices to network management. HDLC, on the other hand, is entirely implemented as a software module operating over a device driver. Similar interfaces are provided by each protocol.

X.25 Domain

To ensure consistency with the goals and requirements of DECnet/OSI for ULTRIX, several design alternatives were considered for integrating X.25 into ULTRIX, including porting a previous Digital implementation of X.25, the VAX Packet Switch Interconnect. These alternatives were rejected because they were not consistent with the DECnet/OSI for ULTRIX implementation and BSD networking in general. A new version of X.25 was implemented in the C language using the protocol switch table infrastructure. This approach provided enough flexibility to allow the ULTRIX X.25 code to be easily ported to other product environments such as the WANrouter 250.

The X.25 components of DECnet/OSI for ULTRIX are provided as part of a wider X.25 strategy that can support multiple protocol suites, such as DECnet/OSI, TCP/IP, and International Business Machines Corporation's Systems Network Architecture (SNA). Under DECnet/OSI for ULTRIX, X.25 is used in two configurations. It provides the connection oriented network services (CONS) support to the OSI transport layer (ISO 8208, ISO 8878), and it can be used as a subnetwork for the connectionless network service (CLNS) layer. When used with TCP/IP networks, X.25 can be used as a subnetwork for the IP (Request for Comment [RFC] 877).

The interface to X.25 services was designed to be accessed by other kernel components. The protocol switch table was used to implement this interface. Components such as OSI connectionless network protocol and OSI transport make direct use of the kernel protocol switch interface with no intervening software layer.

Access by user-level applications to X.25 occurs through the BSD socket interface. The processing requirements of the socket layer and the kernel layer provided by the protocol switch are considerably different. To reduce the complexity of the kernel interface, an X.25 socket converter module was provided. The socket converter module manages issues such as queuing data at the socket interface and converting between protocol switch table routines and socket-layer calls. The converter module is treated as a client of the kernel interface.

Direct access to the X.25 kernel interface from IP was not possible due to TCP/IP development constraints. Instead, an IP device converter was supplied with ULTRIX X.25. This X.25-IP interface module appears as a device driver to IP. Furthermore, IP can be configured to use X.25 without requiring changes to the TCP/IP software. The pseudo-driver establishes an X.25 call when data is sent to the X.25 device. After the IP data has been transmitted, the X.25 connection is maintained to reduce the overhead and cost of X.25 call setup when the next IP data packet is sent. Configuration of the X.25 IP device is performed using standard ifconfig management commands.

OSI Domain

The AF_OSI domain contains the routing module, the transport modules, and DNA session control. The routing module is an end system implementation that adheres to the Digital Network Architecture (Phase V) Network Routing Layer Functional Specification, version 3.0.0. It provides support for the ISO Connectionless Network Service (ISO 8473), End System to Intermediate System Routing Exchange Protocol (ISO 9542), and Phase IV routing. "Ping," a network loopback function specified in Amendment X: Addition of an Echo Function to ISO 8473 and in RFC 1139, is provided as a diagnostic tool to test network access to a node.

Routing can be configured to operate over the data link entities previously mentioned as well as X.25. As an end system, DECnet/OSI for ULTRIX does not route protocol data units (PDUs). It can, however, operate over multiple circuits simultaneously, which allows load balancing across circuits and network redundancy. Phase V routing is capable of autoconfiguring to one or more network addresses. [5]

OSI transport (ISO 8072, ISO 8073) and NSP are the two transport modules supported. Both can be configured to operate over CLNS. However, only OSI transport can be configured to operate over CONS/X.25. OSI transport class 4 is supported over CLNS, and classes 0, 2, and 4 are supported over CONS/X.25. OSI transport also provides a connectionless transport service (CLTS) to its users. CLTS is a datagram service that operates over CLNS.

OSI transport supports two client interfaces and NSP supports one. Both support an interface to DNA session control supplied by the protocol switch table entry points. OSI user applications directly access OSI transport through X/Open transport interface (XTI). [6] XTI specifies a transport service interface that is independent of the transport provider. On the ULTRIX implementation, XTI is a library interface implemented using the socket layer. It is discussed in more detail later in the section Application Programming Interfaces.

OSI transport can have multiple clients, and it identifies each client by an address called the transport selector. When OSI transport processes an incoming connect request, it uses the selector to determine which client should receive notification of the request.

The DNA session control protocol engine was implemented as part of NSP for the DECnet-ULTRIX Phase IV release. It is now implemented as a separate entity to allow operation over multiple transports (NSP and OSI transport). This modification created a subtle problem. DNA session control resides between the transport layers and the socket layer. However, both transport modules and DNA session control need access to the socket.
DNA session control needs access when performing connection control, and the transport modules need access when appending transmit or receive buffers to the socket queues. Since the socket is actually open to DNA session control, a mechanism was created to relay the socket pointer to the transport modules. This information is passed through the control output interface as part of the port option.

Application Programming Interfaces

To ease the transition of applications from Phase IV to DECnet/OSI, the Phase IV socket interface and programming library were retained. Applications using these interfaces will continue to work. This allows programmers time to modify their applications to use the new interfaces and the capabilities provided with DECnet/OSI for ULTRIX.

New application programming interfaces (APIs) were developed. These APIs include a DNA Phase V session control programming library, an X.25 programming library, an X.25 socket interface, and an XTI interface. They allow programmers to write network applications that use DECnet/OSI capabilities.

DNA Session Control Library

Through the use of the DNA Phase V session control library and DECdns, applications can provide location-independent services to the network. DNA session control stores information about an application and its services in an object in the DECdns namespace. Client applications can access these services by referencing the object name without knowing the current location of the service.

DNA Phase V session control applications also have the option of operating over various transport services and network services. The library gives the application programmer the flexibility of specifying the particular combination of services to be used. As an alternative, the library can determine the possible combinations of protocols that are supported on both the local and remote systems. This is done by accessing the addressing information stored in DECdns for each of these systems. If any combinations of protocols exist, DNA session control tries each of them in succession until a connection is established.

The DNA Phase V session control programming library is designed to be extensible. Instead of using a calling sequence with numerous parameters, one parameter is passed on all calls. This parameter is an extensible data structure that consists of both input and output arguments. It allows new arguments to be added by appending fields to the end of the data structure.

The library is designed to support multithreaded application development. If a threads programming interface is supported on the ULTRIX operating system, programmers are able to write applications that have multiple control paths executing in parallel. This is useful in writing a network server application that frequently needs to handle requests from multiple clients. A single server application can process requests in parallel instead of creating additional processes to service each request. Multithreaded support in the library was accomplished by removing the use of static and global data by the library. Information is returned in dynamically allocated memory, which the applications are responsible for freeing.

X.25 Interfaces

Two programming interfaces are provided for the X.25 component. A socket interface is provided for full access to X.25 features in a manner compatible with BSD UNIX. This allows applications to make use of a direct socket interface to both TCP/IP and X.25.

An X.25 programming library was created to provide a portable programming interface that could be used for access to X.25 across current and future Digital implementations. The format of calls to the X.25 library was constructed on lines more compatible with the interface defined in the DNA X.25 access architecture than that available through the socket interface.

XTI Library

The XTI library has been extended to provide a framework for developing OSI applications. XTI provides a transport-independent programming interface that is standard across UNIX operating systems. On ULTRIX, XTI was implemented to provide a portable interface for writing TCP/IP applications. In DECnet/OSI for ULTRIX, the implementation was extended to provide support for OSI transport, including both connection oriented transport service (COTS) and CLTS. In addition to supporting the standard XTI calls, service routines were implemented. These routines provide a mechanism to build and access addressing information needed within XTI. The addressing information consists of transport selectors, network addresses, and internet ports.

Support for the Internet RFC 1006 specification was also added to the XTI library. [7] This specification allows OSI applications to run over the TCP/IP protocol suite. RFC 1006 defines a mechanism for OSI transport class 0 (TP0) messages to be mapped across a TCP connection. OSI applications can be written to communicate over either TCP/IP networks or OSI networks, using the same API.

An RFC 1006 daemon was implemented to work in conjunction with the XTI library to handle incoming connection establishment. To allow multiple OSI applications to bind to the same RFC 1006 TCP port, a simple protocol exchanges file descriptors and a few basic messages between the XTI library and the daemon, using UNIX domain sockets. RFC 1006 specifies that a TCP connection be completed and a TP0 connect request be received before an OSI application server can be selected to process the incoming connect. The daemon hides the TCP connection and effectively blocks the OSI application server until the TP0 connect request occurs.

Network Management

DECnet/OSI network management is completely different from the management provided for DECnet Phase IV. It is based on the Enterprise Management Architecture (EMA), which provides a framework to consistently manage the various components making up a distributed system. [8] DECnet/OSI for ULTRIX network management consists of a director, an event logger, an agent access module, and an agent for each manageable protocol entity. Figure 2 shows the network management environment.

[Figure 2: Network Management. The figure shows NCL, user CML, user EVL, and user-level entities (service provider and agent) above the socket layer, with AF_NETMAN, kernel CML, kernel EVL, and kernel-level entities (service provider and agent) below it.]

The director, network control language (NCL), provides the user interface that allows network management commands to be entered. NCL encodes the network management commands using the common management information protocol (CMIP). The encoded directives are passed to the common management listener (CML). CML, in turn, passes the directives to the appropriate agent in a form the agent can understand. On the ULTRIX implementation, when the connection between NCL and CML is local, a pipe is used. When NCL needs to connect to a remote CML, an OSI network connection is established.

The event logger (EVL) takes event messages generated by agents and sends them to either a local sink or a remote event sink. A local sink is a process that is executing locally, but a remote event sink is executing on a system elsewhere in the network. In the latter case, the CMIP protocol is used to convey the event message. Events are typically displayed on the console or in a file.

The DECnet/OSI for ULTRIX network management implementation is designed to be modular and extensible. The data dictionary, a key component, describes all the management attributes of each entity. The data dictionary is a dynamically extensible database and is used by all network management applications. NCL uses the data dictionary to parse command lines and display output. CML uses the data dictionary to decode/encode CMIP protocol messages from/to NCL, and EVL uses it to display an event locally. Information about new attributes or entire entities can be added to the data dictionary without modifying the network management applications. Thus layered products can easily add support for new manageable objects.

The network management environment in DECnet/OSI for ULTRIX is essentially a message passing scheme, as shown in Figure 2. Like the data dictionary, it was designed to be extensible and generic. All manageable, DNA-architected entities use this environment. At the core is a switch, kernel CML. Kernel CML passes messages between user CML and any DNA entity. User CML and kernel CML communicate through the socket layer. User-level agents, in turn, communicate with CML using the socket-layer interface, and kernel-level agents communicate with CML through the control output routine for the entity.

User-level agents can send multiple responses to a single request, but kernel-level agents can send only one response per request. Because user-level agents reside in process space and are separated by the socket layer, their transactions can be asynchronous. Transactions of kernel-level agents, on the other hand, must be synchronous. When called, they must process the request and return a single response. Whenever multiple responses are to be returned, as in a wild-card operation, the agent relies on being invoked again by kernel CML for each of the response messages. This programming precludes the possibility of exhausting system buffers while conveying information about a large number of subentities. Kernel CML stops requesting additional responses from a kernel entity when it detects that the socket receive queue is full. Once there is more room on the queue, it resumes the wild-card operation.

The network management environment provides a core set of routines as an aid to processing and building the syntax for each message. It also provides routines that assist in wild-card processing. Agents that make use of these routines need not be aware of the physical structure of each message. This has several benefits. It provides a common set of code that is not duplicated from entity to entity. If there is a problem, it is corrected in one location instead of several. Also, it makes the implementation more portable. The message passing scheme uses the local operating system's network buffers. When changing from one operating system to another, the buffering needs to change only in the common code and not in each of the agents.

Entities may need to originate event messages bound for the EVL. The mechanism providing this support is basically the same as the message passing scheme previously described. A kernel EVL switch receives event messages from either a user-level or kernel-level agent and passes the event up to its counterpart through the socket layer. With this mechanism, however, messages flow in only one direction, from the entity to the event logger.

In DECnet/OSI, some significant architectural changes were made to the maintenance operations protocol (MOP). As in Phase IV, the current implementation supports down-line loading and up-line dumping over FDDI and CSMA/CD devices. These functions are now performed by using the MOP version 4.0 protocol over ISO 8802-2 or MOP version 3.0 over Ethernet. As part of implementing the new protocol, support for down-line loading CMIP scripts was added. These are used by remote systems such as DECnet/OSI routers to perform network management initialization. Client information is kept in a MOP-specific database. By keeping entity-specific information modular and distinct, the DECnet/OSI for ULTRIX MOP implementation is consistent with EMA. This contrasts with the DECnet-ULTRIX Phase IV implementation, which stores MOP client information in the DECnet nodes database.

Applications Supported

The DECnet Phase IV applications continue to be provided with the DECnet/OSI for ULTRIX product. These include the file transfer utility, dcp, the remote terminal utility, dlogin, and the mail utility. These DECnet applications have been modified to use the DECnet/OSI for ULTRIX programming interface and to take advantage of the new DNA Phase V capabilities. They can accept DECdns full names for node names and run over both the NSP and OSI transport. The DECnet-internet gateway is also provided as part of the product. The gateway provides bidirectional network access between DECnet and internet systems. It allows DECnet and TCP/IP users to communicate through their respective file transfer, remote login, and mail facilities.

New OSI applications were written to provide similar capabilities to the DECnet applications. They allow users to access files and terminal emulation in a multivendor environment. These OSI applications include FTAM, VTP, and X.29 terminal support. Just as the DECnet-internet gateway is provided, OSI applications provide their own gateways to link OSI and internet. [9]

ULTRIX X.25 includes X.29 terminal support. A packet assembler/disassembler (PAD) provides outgoing access. Thus PAD allows terminal emulation for X.25 connections to remote hosts in much the same way that the VTP does in a full OSI stack. For incoming X.29 calls, a UNIX daemon creates an X.29 login process or activates an application based on X.29.
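The wild-card handling described in the Network Management section, where a kernel-level agent returns exactly one response per call and kernel CML paces further calls against the socket receive queue, can be modeled in a few lines. The Python sketch below is an illustration only, not the ULTRIX implementation: the class names, the integer cursor used to resume the operation, the queue limit, and the "show ..." response strings are all assumptions.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class KernelAgent:
    """A synchronous agent: each call returns exactly one response."""

    def __init__(self, subentities: List[str]) -> None:
        self._subentities = list(subentities)

    def first_cursor(self) -> Optional[int]:
        return 0 if self._subentities else None

    def request(self, cursor: int) -> Tuple[str, Optional[int]]:
        """Answer for one subentity and say where to resume, if anywhere."""
        response = f"show {self._subentities[cursor]}"
        following = cursor + 1
        if following >= len(self._subentities):
            return response, None
        return response, following


class KernelCML:
    """Re-invokes the agent per response; pauses while the queue is full."""

    def __init__(self, agent: KernelAgent, queue_limit: int) -> None:
        self._agent = agent
        self._limit = queue_limit
        self._cursor: Optional[int] = None
        self.receive_queue: Deque[str] = deque()

    def start_wildcard(self) -> None:
        self._cursor = self._agent.first_cursor()
        self._pump()

    def _pump(self) -> None:
        # Ask for more responses only while the receive queue has room.
        while (self._cursor is not None
               and len(self.receive_queue) < self._limit):
            response, self._cursor = self._agent.request(self._cursor)
            self.receive_queue.append(response)

    def user_read(self) -> Optional[str]:
        """User CML takes one message; the freed space resumes the operation."""
        if not self.receive_queue:
            return None
        message = self.receive_queue.popleft()
        self._pump()
        return message


if __name__ == "__main__":
    agent = KernelAgent([f"circuit-{n}" for n in range(1, 6)])
    cml = KernelCML(agent, queue_limit=2)
    cml.start_wildcard()
    message = cml.user_read()
    while message is not None:
        print(message, "| queued:", len(cml.receive_queue))
        message = cml.user_read()
```

However many subentities match, the queue never holds more than the configured limit, which is the buffer-exhaustion property the text describes.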
Installation and Configuration

DECnet/OSI for ULTRIX networking software allows the use of OSI addressing and access to global naming services. It provides new network management utilities and the ability to configure a network stack in many different ways. For example, in configuring X.25, many attributes can be set to allow conformance to many public and private packet-switched data networks. The new capabilities add a degree of complexity to the process of configuring the networking software. To simplify this process, configuration was separated from installation. Installation occurs when files are moved from the distribution media to the target system. Configuration is the process of providing information to make the networking subsystem operational.

The ULTRIX DECnet/OSI and X.25 setup utilities provide two modes of configuration, basic and advanced. The DECnet/OSI for ULTRIX setup basic configuration process asks a limited number of questions and is designed for the user who will be installing DECnet/OSI for ULTRIX on a workstation connected to a local area network. The advanced configuration process and X.25 setup utility provide more configuration choices for the network manager who will be installing DECnet/OSI for ULTRIX in a server configuration, or who will require more detailed network configurations. The X.25 and wide area network device driver setup utilities supply a mechanism for configuring TCP/IP or DECnet/OSI for ULTRIX to run over X.25 or synchronous data links. For a more unified approach to configuring an OSI stack, these setup utilities are integrated with the DECnet/OSI for ULTRIX setup advanced process. These setup utilities add a logical abstraction above the EMA, which helps to reduce complexity. For each manageable entity on the system, NCL scripts are generated through default assumptions and responses to configuration questions.

Network configuration is accomplished with shell scripts and network management scripts. These mechanisms initialize manageable entities. At system start-up, the decnetstartup script is executed from within rc.local. This invokes the various NCL scripts to configure the networking software. One or more NCL scripts can be modified independently of the configuration utilities to change attributes of the manageable entities. As an alternative, the setup utilities can be rerun to modify the scripts. In addition, responses to configuration questions are stored in a file to provide default answers to simplify subsequent reconfiguration.

Summary

The design of DECnet/OSI for ULTRIX was a challenging endeavor that resulted in a rich set of capabilities and a system on which to build new functions. It operates in a complex networking environment that includes OSI, DECnet Phase IV, X.25, and TCP/IP protocols. DECnet/OSI for ULTRIX software allows OSI applications to function in TCP/IP networks. RFC 1006 supports the operation of OSI applications using TCP/IP connections, and RFC 877 allows TCP/IP to be configured over X.25. In addition, a set of gateways allows intercommunication between DECnet/OSI and TCP/IP networks.

The DECnet/OSI for ULTRIX system was also designed to be extended to include new functions as they are incorporated into the OSI standards. New protocol components can be added and used without changing existing components or network management. In addition, the software was designed to be portable. The DECnet/OSI for ULTRIX software has been ported to the DEC OSF/1 AXP operating system, and DECnet/OSI version 1.0 for DEC OSF/1 AXP was released in March 1993.

DECnet/OSI for ULTRIX demonstrates Digital's continuing commitment to provide the OSI protocol on platforms based on open systems. The ULTRIX system was the first end system to include products that followed the DNA OSI strategy. These systems can interoperate with either DECnet Phase IV systems or other OSI systems. As with DECnet Phase IV, DECnet/OSI for ULTRIX continues to provide a set of components consistent with the UNIX philosophy of networking.

Acknowledgments

The authors would like to thank the people, past and present, who contributed to the design and development of the DECnet/OSI for ULTRIX product. Special thanks go to members of the following teams for their dedication and hard work: DECnet ULTRIX, ULTRIX FTAM, ULTRIX VT, OSAK, DECdns, DECdts, ULTRIX X.25, and ULTRIX Wide Area Device Drivers.

References

1. J. Harper, "Overview of Digital's Open Networking," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 12-20.

2. Information Processing Systems - Open Systems Interconnection - Basic Reference Model, ISO 7498 (New York: American National Standards Institute, 1984).

3. S. Martin, J. McCann, and D. Oran, "Development of the VAX Distributed Name Service," Digital Technical Journal, vol. 1, no. 9 (June 1989): 9-15.

4. S. Leffler, W. Joy, and R. Fabry, "4.2BSD Networking Implementation Notes," (Berkeley, CA: University of California Technical Report, 1983).

5. R. Perlman, R. Callon, and M. Shand, "Routing Architecture," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 62-69.

6. X/Open Company, Ltd., X/Open Portability Guide, Networking Services (Englewood Cliffs, NJ: Prentice-Hall, 1988).

7. M. Rose and D. Cass, "Request for Comments: RFC 1006, ISO Transport Services on Top of the TCP, Version 3," May 1987.

8. M. Sylor, F. Dolan, and D. Shurtleff, "Network Management," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 117-129.

9. D. Robinson, L. Friedman, and S. Wattum, "An Implementation of the OSI Upper Layers and Applications," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 107-116.

General References

D. Comer, Internetworking with TCP/IP: Principles, Protocols and Architecture (Englewood Cliffs, NJ: Prentice-Hall, 1988).

S. Leffler, M. McKusick, M. Karels, and J. Quarterman, The Design and Implementation of the 4.3 BSD UNIX Operating System (Reading, MA: Addison-Wesley Publishing Company, May 1989).

S. Leffler, W. Joy, and R. Fabry, "A 4.2BSD Interprocess Communication Primer," (Berkeley, CA: University of California Technical Report, 1983).

High-performance TCP/IP and UDP/IP Networking in DEC OSF/1 for Alpha AXP

Chran-Ham Chang, Richard Flower, John Forecast, Heather Gray, William R. Hawe, K. K. Ramakrishnan, Ashok P. Nadkarni, Uttam N. Shikarpur, Kathleen M. Wilde

The combination of the Alpha AXP workstations, the DEC FDDIcontroller/TURBOchannel network interface, the DEC OSF/1 operating system, and a streamlined implementation of the TCP/IP and UDP/IP delivers to user applications almost the full FDDI bandwidth of 100 Mb/s. This combination eliminates the network I/O bottleneck for distributed systems. The TCP/IP implementation includes extensions to TCP such as support for large transport windows for higher performance. This is particularly desirable for higher-speed networks and/or large delay networks. The DEC FDDIcontroller/TURBOchannel network interface delivers full bandwidth to the system using DMA, and it supports the patented point-to-point, full-duplex FDDI mode. Measurement results show UDP performance is comparable to TCP. Unlike typical BSD-derived systems, the UDP receive throughput to user applications is also maintained at high load.
We have seen significant increases in the bandwidth available for computer communication networks in the recent past. Commercially available local area networks (LANs) operate at 100 megabits per second (Mb/s), and research networks are running at greater than 1 gigabit per second (Gb/s). Processor speeds have also seen dramatic increases at the same time. The ultimate throughput delivered to the user application, however, has not increased as rapidly. This has led researchers to say that network I/O at the end system is the next bottleneck. [1]

One reason that network I/O to the application has not scaled up as rapidly as communication link bandwidth or CPU processing speeds is that memory bandwidth has not scaled up as rapidly even though memory costs have fallen dramatically. Network I/O involves operations that are memory intensive due to data movement and error checking. Scaling up memory bandwidth, by making memory either wider or faster, is expensive. The result has been an increased focus on the design and implementation of higher-performance network interfaces, the re-examination of the implementation of network I/O, and the consideration of alternative network protocols to achieve higher performance. [2,3,4]

This paper describes the work we did to remove the end system network I/O bottleneck for current commercially available high-speed data links, such as the fiber distributed data interface (FDDI). [5,6] We used the conventional internet protocol suite of transmission control protocol/internet protocol (TCP/IP) and the user datagram protocol/internet protocol (UDP/IP) on Alpha AXP hardware and software platforms. [7,8,9] The specific hardware platform was the DEC 3000 AXP Model 500 workstation with the DEC FDDIcontroller/TURBOchannel adapter (DEFTA). The software platform was the DEC OSF/1 operating system version 1.2 using the TCP and UDP transport protocols. The combination of the Alpha AXP workstations, the DEFTA adapter, the DEC OSF/1 operating system, and a streamlined implementation of the TCP/IP and UDP/IP delivers to user applications essentially the full FDDI bandwidth of 100 Mb/s.

While the DEC FDDIcontroller/TURBOchannel network interface is lower cost than previous FDDI controllers, it also delivers full bandwidth to the system using direct memory access (DMA). In addition, it supports the patented point-to-point, full-duplex FDDI mode. This allows a link to be used with 100 Mb/s in each direction simultaneously, which increases throughput in some cases and reduces latency compared to the standard FDDI ring mode.

Incremental work for data movement and checksums has been optimized to take advantage of the Alpha AXP workstation architecture, including 64-bit support, wider cache lines, and the coherence of cache blocks with DMA. Included in the TCP/IP implementation are extensions to TCP recently recommended by the Internet Engineering Task Force (IETF), such as support for large transport windows for higher performance. [10] This is particularly desirable for high-speed networks and/or large delay networks.

We feel that good overload behavior is also important. Workstations as well as hosts acting as servers see substantial load due to network I/O. Typical implementations of UDP/IP in systems based on the UNIX operating system are prone to degradation in throughput delivered to the application as the received load of traffic to the system increases beyond its capacity. Even when transmitting UDP/IP packets from a peer transmitter with similar capabilities, the receiver experiences considerable packet loss. In some cases, systems reach receive "livelock," a situation in which a station is only involved in processing interrupts for received packets or only partially processing received packets without making forward progress in delivering packets to the user application. [11] Changes to the implementation of UDP/IP and algorithms incorporated in the DEFTA device driver remove this type of congestion loss at the end system under heavy receive load. These changes also eliminate unfairness in allocation of processing resources, which results in starvation (e.g., starving the transmit path of resources).

The next section of this paper discusses the characteristics of the Alpha AXP workstations, the DEC OSF/1 operating system, and the two primary transport protocols in the internet protocol suite, TCP and UDP. We provide an overview of the implementation of network I/O in a typical UNIX system using the Berkeley Software Distribution (BSD) to motivate several of the implementation enhancements described in the paper. [12]

The section on Performance Enhancements and Measurements Results then describes the specific implementation enhancements incorporated in the DEC OSF/1 operating system version 1.2 to improve the performance of TCP and UDP. This section also provides measurement results for TCP and UDP with DEC 3000 AXP workstations running DEC OSF/1 version 1.2 in a few different configurations. Also included are measurements with TCP and UDP with Digital's patented full-duplex mode for FDDI, which can potentially increase throughput and reduce latency in FDDI LANs with point-to-point links (which can also be used in switched FDDI LANs). A few implementation ideas currently under study are also presented in the section on Experimental Work.

System Characteristics

The project to improve the implementation of Digital's TCP/IP and UDP/IP (the internet protocol suite) networking was targeted on the DEC 3000 AXP Model 500 workstation, running the DEC OSF/1 operating system version 1.2. Since we were interested in achieving the highest performance possible on a commercially available data link, we chose FDDI, and used the DEC FDDIcontroller/TURBOchannel adapter (DEFTA) to communicate between the Alpha AXP workstations. In this section, we describe the features of the workstations, relevant characteristics of FDDI, the internet protocol suite, and the DEC OSF/1 operating system itself, relative to the networking implementation. The architectural features of the Alpha AXP workstations as well as the DEC FDDIcontroller/TURBOchannel adapter are shown in Figure 1.

The Alpha AXP System

The Alpha AXP workstation, DEC 3000 AXP Model 500, was chosen for our research. The system is built around Digital's 21064 64-bit, reduced instruction set computer (RISC) microprocessor.

Digital's 21064 Microprocessor

The DECchip 21064 CPU chip is a RISC microprocessor that is fully pipelined and capable of issuing two instructions per clock cycle. [13,14] The DECchip 21064 microprocessor can execute up to 400 million operations per second. The chip includes

• An 8-kB direct-mapped instruction cache with a 32-byte line size
• An 8-kB direct-mapped data cache with a 32-byte line size
• Two associated translation buffers
• A four-entry (32-byte-per-entry) write buffer
• A pipelined 64-bit integer execution unit with a 32-entry register file
• A pipelined floating-point unit with an additional 32 registers

[Figure 1: The Alpha AXP Workstation: CPU, Memory Subsystem, and the FDDIcontroller/TURBOchannel Adapter. Blocks shown: DECchip 21064 CPU, 512-KB secondary cache, system crossbar, main memory, system I/O bus (TURBOchannel), TURBOchannel bus interface, and the FDDI adapter with its DMA engine and packet memory.]

The DEC 3000 AXP Model 500 Workstation

The DEC 3000 AXP Model 500 workstation is built around the DECchip 21064 microprocessor running at 150 megahertz (MHz). [15] In addition to the on-chip caches, there is an on-board second-level cache of 512 kilobytes (kB). Main memory can be from 32 MB to 256 MB (1 GB with 16 MB dynamic random-access memories [DRAMs]). The memory bus is 256 bits plus error-correcting code (ECC) wide and has a bandwidth of 114 MB/s. Standard on the system is also a 10-Mb/s Ethernet interface (LANCE). For connection to external peripherals there is an on-board small computer systems interface (SCSI)-2 interface and six TURBOchannel slots with a maximum I/O throughput of 100 MB/s. One of the TURBOchannel slots is occupied by the graphics adapter.

The system uses the second-level cache to help minimize the performance penalty of misses and write throughs in the two relatively smaller primary caches in the DECchip 21064 processor. The second-level cache is a direct-mapped, write-back cache with a block size of 32 bytes, chosen to match the block size of the primary caches. The cache block allocation policy allocates on both read misses and write misses. Hardware keeps the cache coherent on DMAs; DMA reads probe the second-level cache, and DMA writes update the second-level cache, while invalidating the primary data cache. More details of the DEC 3000 AXP Model 500 workstation may be obtained from "The Design of the DEC 3000 AXP Systems, Two High-performance Workstations." [15]

DEC OSF/1 Operating System

DEC OSF/1 operating system version 1.2 for Alpha AXP systems is an implementation of the Open Software Foundation (OSF) OSF/1 version 1.0 and version 1.1 technology. The operating system is a 64-bit kernel architecture based on Carnegie Mellon University's Mach version 2.5 kernel. Components from 4.3 BSD are included, in addition to UNIX System Laboratories System V interface compatibility.

Digital's version of OSF/1 offers both reliability and high performance. The standard TCP/IP and UDP/IP networking software, interfaces, and protocols remain the same to ensure full multivendor interoperability. The software has been tuned and new enhancements have been added that improve performance. The interfaces between the user application and the internet protocols include both the BSD socket interface and the X/Open Transport Interface. [12] The internet implementation conditionally conforms to RFC 1122 and RFC 1123. [16,17] Some of the networking utilities included are Telnet; file transfer protocol (FTP); the Berkeley "r" utilities (rlogin, rcp, etc.); serial line internet protocol (SLIP) with optional compression; Local Area Transport (LAT); screend, which is a filter for controlling network access to systems when DEC OSF/1 is used as a gateway; and prestoserve, a file system accelerator that uses nonvolatile RAM to improve Network File System (NFS) server response time. The implementation also provides a STREAMS interface, the transport layer interface, and allows for STREAMS (SVID2) and sockets to coexist at the data link layer. There is support for STREAMS drivers to socket protocol stacks and support for BSD drivers to STREAMS protocol stacks via the data link provider interface.

The OSF/1 Network Protocol Implementation

The overall performance of network I/O of a workstation depends on a variety of components: the processor speed, the memory subsystem, the host bus characteristics, the network interface, and finally, and probably the most important, software structuring of the network I/O functions. To understand the ways in which each of these aspects influences performance, it is helpful to understand the structuring of the software for network I/O and the characteristics of the computer system (processor, memory, system bus). We focus here on the structuring of the end system networking code related to the internet protocol suite in the DEC OSF/1 operating system, following the design of the networking code (4.3 BSD-Reno) in the Berkeley UNIX distribution. [8,9,12]

A user process typically interfaces to the network through the socket layer. The protocol modules for UDP, TCP (transport layers) and IP (network layer) are below the socket layer in the kernel of the operating system. Data is passed between user processes and the protocol modules through socket buffers. On message transmission, the data is typically moved by the host processor from user space to kernel memory for the protocol layers to packetize and deliver to the data link device driver for transmission. The boundary crossing from user to kernel memory space is usually needed in a general purpose operating system for protection purposes. Figure 2 shows where the incremental overhead for packet processing, based on packet size, occurs in a typical BSD 4.3 distribution.

The kernel memory is organized as buffers of various types. These are called mbufs. They are the primary means for carrying data (and protocol headers) through the protocol layers. The protocol modules organize the data into a packet, compute its checksum, and pass the packet (which is a set of mbufs chained together by pointers) to the data link driver for transmission. From these kernel mbufs, the data has to be moved to the buffers on the adapter across the system bus. Once the adapter has a copy of the header and data, it may return an indication of transmit completion to the host. This allows the device driver to release the kernel mbufs to be reused by the higher layers for transmitting or for receiving packets (if buffers are shared between transmit and receive).

While receiving packets, the adapter moves the received data into the host's kernel mbufs using DMA. The adapter then interrupts the host processor, indicating the reception of the packet. The data link driver then executes a filter function to enable posting the packet to the appropriate protocol processing queue. The data remains in the same kernel mbufs during protocol processing. Buffer pointers are manipulated to pass references to the data between the elements processing each of the protocol layers. Finally, on identifying the user process of the received message, the data is moved from the kernel mbufs to the user's address space.

Another important incremental operation performed in the host is that of computing the checksum of the data on receive or transmit. Every byte of the packet data has to be examined by the processor for errors, adding overhead in both CPU processing and memory bandwidth. One desirable characteristic of doing the checksum after the data is in memory is that it provides end-to-end protection for the data between the two communicating end systems. Because data movement and checksum operations are frequently performed and exercise components of the system architecture (memory) that are difficult to speed up significantly, we looked at these in detail as candidates for optimization.

The Internet Protocol Suite: TCP/IP and UDP/IP

The protocols targeted for our efforts were TCP/IP and UDP/IP, part of what is conventionally known as the internet protocol suite. [8,9]
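Both TCP and UDP carry the 16-bit one's-complement Internet checksum, so the per-byte checksum cost described in the previous section applies to either protocol. As a rough illustration of why this operation touches every byte of a packet, the following C sketch computes that checksum in the style of RFC 1071. The function name inet_checksum is a hypothetical choice for this example; this is not the optimized DEC OSF/1 routine, which the paper does not list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name chosen for this sketch): the 16-bit
 * one's-complement Internet checksum over len bytes, with the data
 * read as big-endian 16-bit words as in RFC 1071. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    assert(data != NULL || len == 0);

    /* Every byte is read once: this is the memory traffic the text
     * identifies as a candidate for optimization. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For even-length data, appending the returned value in network byte order makes the checksum of the whole buffer come out as zero, which is the check a receiver performs.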
1 Winter 1993 47 DEC net Open Networking TRANSMIT USER - -, (2) _ _ ; - - - ��;:�' - -----�T� ;, COPY COPY � I J � I I CHECKSUM _ l I _ _ I _ TCP DATA - - - (1 ) I - � - - (2) CHECKSUM _ _ - -iP - :�I� RECEIVE USER _ _ TRANSPORT LAYER _ _ _ _ l I iP - - NETWO R K LAYER DATA LINK LAYER � I _ FOOl _ _ _ TCP l I iP I _ DATA TC P I G _ DMA _ _ _ _ I DATA _ _ I _ (1) _ FOOl Figure 2 The incremental data operations occur in three places: (1) when the data is moved using D1l1A between the kernel and the network adapter memory, (2) when a checksum is computed for the data, and ()) when the data is copied between the user process and the kernel. TCP is a rel iable, connection-oriented , end sma l l, or one leading to nonbalanced sender and to-end transport protocol that provides flow receiver buffer sizes, can result in u nnecessary controlled data transfer. A TCP connection contains blocking and subsequent inefficient use of available a sequenced stream of data octets exchanged bandwidt h . rel iability TCP d ivides a stream o f d a t a i nto segments for through positive acknowledgment and retransmis between two peers. TCP ach ieves transmission . The maximum segment size (MSS) is sion. It achieves flow control and promotes effi negotiated at the time of connection establishment. cient movement of data through a sliding window In the case of connections within the local net scheme. The sliding window scheme al lows the work, TCP negotiates an MSS based on the maximum transmission of multiple packets while awaiting the transmission unit (MTU) size of the underlying receipt of an acknowledgment. The number of media . (For IP over FDDI the MTU is constrained to bytes that can be transmitted prior to receiving a n 4, 352 octets based on the recom mendation in RFC acknowledgment i s constrained by t h e offered win 1390. 18) TCP calculates the MSS to offer, by subtract dow on the TCP connection . 
The window ind icates how much buffering the receiver has available for ing from this MTU, the number of octets required the TCP connection (the receiver exercises the flow for the most common IP and TCP header sizes. control). This window size also reflects how much The implemen tation of TCP/IP in DEC OSF/1 fol lows the 4 . 3 BSD-Reno implementation of TCP. data a sender s hould be prepared to bu ffer if I ncluded is the use of dynamic round-trip time retransmission of data is required. The size of the measurements by TCP, which maintains a timer offered window can vary over the life of a connec per connection and uses adaptive time-outs for set tion. As with BSD systems, DEC OSF/ l currently ting retransmission timers. The implementation mai ntai ns a one- to-one correspondence between window size and bu ffer size al located at the socket incl udes slow start for reacting to congestive loss and op timizations such as header prediction and erroneous choice of window size, such as o ne too performance. 19 DEC OSF/1 version 1 . 2 also incl u des layer in the end systems for the TCP connection. An 48 delayed acknowledgments important for network Vol. 5 No. I Winter 1993 Digital Technical journal High-performance TCP!IP and UDP/IP Networking in DEC OSF/1 for Alpha AXP recent extensions to TCP for accommodating protocol that does n o t provide reliable delivery or higher-speed networks 10 TC P's perform ance m ay flow contro l. The receive socket bu ffer size for UDP depend upon the window size used by the two l imits the amount of data that may be received ami peer entities of the TCP connection. The product of buffered before it is copied to the user's address the transfe r rate (bandwidth) and the round-trip space. Since there is no flow control, the UDP delay measu res the window size that is needed to receiver may have to discard the packet i f i t receives maxim ize throughput on a connection. 
a large burst of messages and there is no socket I n the TCP specification RFC 793, the TCP header buffer space. contains a 16 -bit window size field wh ich is the If the receiver is fast enough to al low the user receive window size reported to the sender 9 Since application to consume the data, the Joss rate is the field is only 16 bits, the largest window size that very low. However, most BSD-derived systems today is supported is 64K bytes. Enhancing the original experience heavy packet loss for UDP even when specification, RFC 1323 defines a new TCP option, window scale, to a l l ow for larger windows. 1 0 Th is the receiving processor is the same speed as the option contains a scale value that is used to increase control , there is no mechanism to assure that a l l the window size value found in the TCP header. transmitted data wil l b e received when the trans transm i t ter. Furthermore, since UDP has no flow The window scale option is often recommended mitter is faster than the receiver. We describe our to improve throughput for networks with high i mplementation of UDP to avoid this behavior, so bandwidth and/or large delays (networks with large that packet loss is minimized. bandwidth-delay products). However, it also can lead to higher throughput on LANs such as an FDDI Data Link Characteristics: FDDI token ring . Increased throughpu t was observed FDDI is a 100 M b/s LAN standard that is being with window sizes larger than 64K bytes on an FDDI deployed commercia l. l y. network. access method and al lows up to 500 stations to be It uses a timed-token The TCP window scale extension maps the 16-bit connected with a total fiber length of 200 ki lo window size field to a 32-bit value. It then uses the meters. 
I t a llows fo r both synchronous and asyn TCP window scale option value to bit-shift this chronous traffic simu l t aneous ly and provides a value, resulting in a new m aximum receive window bound for the access t i me to the channel for both size value. The extension al lows for windows of up these classes of traffic. to I gigabyte (GB). To fac i l itate backward compati The timed-token access method ensures that all b i l i t y with existing implementations, both peers stations on the ring agree to a target token rot a t ion must offer the window scale option to enable win time (TTRT) and l i mit their transmissions to this tar dow scal ing in either direction . Window scale is get. 20 With asynchronous mode (the most widely au tomatical ly turned on if the receive socket bu ffer used mode in the industry at present), a node can size is greater than 64K bytes. A user program can transmit only if the actual token rotation t ime (TRT) set a l arger socket buffer size via the setsockopt( ) is less than the target. system ca I I . Based on the socket buffer size, the ker n e l implementation can determine the appropriate window sca l e factor. The basic algorithm is that each station on the ring measures the time since it last received the token. The time interval between two successive Similar to the choice of large window sizes, the receptions of the token is called the TRT. On a use of l a rge TCP segments, i . e . , those approaching token arrival, if a station wants to transmit, it com the size of the negotiated MSS, could give better pu tes a token holding time (THT) as: THT = TTRT - performance than smaller segments. For a given TRT. The TTRT is agreed to by all the stations on the amount of data, fewer segments are needed (and ring at the l ast time that the ring was initial ized (typ therefore fewer packets). 
Hence the total cost of protocol processing overhead at the end system is less than with smaller segments.

The internet protocol suite also supports the user datagram protocol, or UDP. UDP performance is important because it is the underlying protocol for network services such as the NFS. UDP is a connection-less, message-oriented transport layer

ically happens when stations enter or leave the ring) and is the minimum of the values requested by the stations on the ring. If THT is positive, the station can transmit for this interval. At the end of transmission, the station releases the token. If a station does not use the entire THT allowed, other stations on the ring can use the remaining time by using the same algorithm.

A number of papers relating to FDDI have appeared in the literature, and the reader is encouraged to refer to "Performance Analysis of FDDI Token Ring Networks: Effect of Parameters and Guidelines for Setting TTRT" for more details.[21]

Network Adapter Characteristics

The DEC FDDIcontroller/TURBOchannel adapter, DEFTA, is designed to be a high-performance adapter capable of meeting the full FDDI bandwidth. It provides DMA capability in both the receive and transmit directions, and it performs scatter-gather on transmit. The adapter has 1 MB of packet buffering. By default, half the memory is used for receive buffering; one-fourth of the memory is allocated for transmit buffering; and the remaining memory is allocated for miscellaneous functions, including buffering for FDDI's station management (SMT). The memory itself is not partitioned, and the adapter uses only as much memory as necessary for the packets. It avoids internal fragmentation and does not waste any memory.

The receive and transmit DMA operations are handled by state machines, and no processor is involved in data movement. The DMA engine is based on the model reported by Wenzel.[22] The main concept of this model is that of circular queues addressed by producer and consumer indices. These indices are used by the driver and the adapter for synchronization between themselves; they indicate to each other the availability of buffers. For example, for receiving packets into kernel memory, the device driver produces empty buffers. By writing the producer index, it indicates to the adapter the address of the last buffer produced and placed in the circular queue for receiving. The adapter consumes the empty buffer for receiving an incoming packet and updates the consumer index to indicate to the driver the last buffer that it has consumed in the circular queue. The adapter is capable of full-duplex FDDI operation. Finally, FDDI's SMT processing is performed by a processor on board the adapter, with the adapter's receive and transmit state machines maintaining separate queues for SMT requests and responses.

To obtain high performance, communication adapters also try to minimize the amount of overhead involved in transferring the data. To improve performance, the DEFTA FDDI port interface (the interface between the hardware and the operating system's device driver) makes efficient use of host memory data structures, minimizes overhead related to the port interface, and minimizes interrupts to the host system.

The Port Architecture contains several unique features that optimize adapter/host system performance. These features include the elimination of much of the control and status information transferred between the host and adapter; the organization of data in host memory in such a way as to provide efficient access by the adapter and the host; and the use of an interrupt mechanism that eliminates unnecessary interrupts to the host.

The design also optimizes performance through careful organization of data in host memory. Other than the data buffers, the only areas of host memory that are shared by the host and the adapter are the queues of buffer descriptors and the area in which the adapter writes the consumer indices. The adapter only reads the buffer descriptors; it never writes to this area of host memory. Thus the impact on host performance of the adapter writing to an area in memory, which may be in cache memory, is eliminated. On the other hand, the area in host memory where the adapter writes its consumer indices is only written by the adapter and only read by the host. Both the receive data consumer index and transmit data consumer index are written to the same longword in host memory, thus possibly eliminating an extra read by the host of information that is not in cache memory. Furthermore, the producer and consumer indices are maintained in different sections of memory (different cache lines) to avoid thrashing in the cache when the host and the adapter access these indices.

The device driver is also designed to achieve high performance. It avoids several of the problems associated with overload behavior observed in the past.[23] We describe some of these enhancements in the next section.

Performance Enhancements and Measurement Results

We describe in this section the various performance enhancements included in the DEC OSF/1 operating system version 1.2 for Alpha AXP systems. In particular, we describe the optimizations for data movement and checksum validation, the implementation details to provide good overload behavior within the device driver, the TCP enhancements for high bandwidth-delay product networks, and the UDP implementation enhancements.

We also present measurement results showing the effectiveness of the enhancements.
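The producer and consumer indices described for the DMA engine can be illustrated with a small model. The following is a minimal sketch in Python, not the DEFTA port interface: the class name `DescriptorRing`, its methods, and the ring size are assumptions made for illustration. It shows only the division of labour the text describes, in which the driver alone advances the producer index and the adapter alone advances the consumer index.

```python
class DescriptorRing:
    """Toy model of a receive ring shared by a driver and an adapter.

    All names here are hypothetical. One slot is kept empty so that the
    full and empty states can be told apart from the two indices alone.
    """

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.producer = 0  # written only by the driver
        self.consumer = 0  # written only by the adapter

    def free_slots(self):
        # Number of empty buffers the driver may still post.
        return (self.consumer - self.producer - 1) % self.size

    def driver_post(self, buf):
        """Driver side: post an empty buffer, then advance the producer index."""
        if self.free_slots() == 0:
            return False  # ring is full; nothing is posted
        self.slots[self.producer] = buf
        self.producer = (self.producer + 1) % self.size
        return True

    def adapter_fill(self, packet):
        """Adapter side: use one posted buffer, then advance the consumer index."""
        if self.consumer == self.producer:
            return None  # no buffer available; the packet is not delivered
        buf = self.slots[self.consumer]
        buf[: len(packet)] = packet
        self.consumer = (self.consumer + 1) % self.size
        return buf
```

Because each index has a single writer, neither side needs a lock to read the other's index. This is the same property the text relies on when it keeps the two sets of indices in different cache lines.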
In most cases the measurement environment consisted of two Alpha AXP workstations (DEC 3000 AXP Model 500) on a private FDDI token ring, with a DEC FDDI concentrator. The tests run were similar to the well-known ttcp test suite, with the primary change being the use of the slightly more efficient send and receive system calls instead of read and write system calls. We call this tool inett within Digital. The throughputs obtained were at the user application level, measured by sending at least 10,000 user messages of different sizes. With UDP, these are sent as distinct messages. With TCP, algorithms used by TCP may concatenate multiple messages into a single packet. Time was measured using the system clock with system calls for resource usage. We also monitored CPU utilization with these system calls, and made approximate (often only for relative comparison) conclusions on the usage of resources with a particular implementation alternative.

Optimizations for bcopy() and in_checksum() Routines

In TCP/UDP/IP protocol implementations, every byte of data generally must pass through the bcopy() and in_checksum() routines when there is no assistance provided in the network interfaces. There are some exceptions: the NFS implementations on DEC OSF/1 avoid the bcopy() on transmit by passing a pointer to the buffer cache entry directly to the network device driver, and UDP may be configured not to compute a checksum on the data. Digital's implementations turn on the UDP checksum by default. Even with the above exceptions, it is important that the bcopy() and in_checksum() routines operate as efficiently as possible.

To write efficient Alpha AXP code for these routines, we used the following guidelines:

• Operate on data in the largest units possible

• Try to maintain concurrent operation of as many independent processor units (CPU, memory reads, write buffers) as possible

• Keep to a minimum the number of scoreboarding delays that arise because the data is not yet available from the memory subsystem

• Wherever possible, try to make use of the Alpha AXP chip's capability for dual issue of instructions

For network I/O, the bcopy() routine is called to transfer data between kernel mbuf data structures and user-supplied buffers to read()/write()/send()/recv() calls.

The bcopy() routine was written in assembler. This routine always attempts to transfer data in the largest units possible consistent with the alignment of the supplied buffers. For the optimal case, this would be one quadword (64 bits) at a time. The routine uses a simple load/store/decrement count loop that iterates across the data buffer as

    1:  ldq   t1, 0(a0)   ; get next quadword (64 bits)
        addq  a0, 8       ; move on source pointer
        stq   t1, 0(a1)   ; store quadword
        addq  a1, 8       ; move on dest pointer
        subq  t2, 8       ; reduce byte count
        bne   t2, 1b      ; loop till done

Several attempts were made to improve the performance of this simple loop. One design involved unrolling the loop further to perform 64 bytes of copying at a time, while reading ahead on the second cache line. Another involved operating on four cache lines at once, based on concerns that a second quadword read of a cache line may incur the same number of clock delays as the first cache miss if the second read is performed too soon after the first read. However, neither of these approaches produced a copy routine that was faster than the simple loop described above.

The TCP/UDP/IP suite defines a 16-bit one's complement checksum (in_checksum()), which can be performed by adding up each 16-bit element and adding in any carries. Every message must have its checksum computed on transmission and validated on reception (the checksum is optional for UDP).

As with bcopy(), performance can be improved by operating on the largest units possible (i.e., quadwords). The Alpha AXP architecture does not include a carry bit, so we have to check if a carry has occurred. Because of the nature of the one's complement addition algorithm, it is not necessary to add the carry in at each stage; we just accumulate the carries and add them all in at the end. By operating on two cache lines at a time, we may start the next computation while the carry computation is under way, accumulate all the carries together, then add them all into the result (with another check for carry) at the end of processing the two cache lines. This results in four cycles per quadword with the addition of some end-of-loop computation to process the accumulated carries. Interleaving the checksum computation across two cache lines also allows for some dual issue effects that allow us to absorb the extra end-of-loop computation.
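The deferred-carry idea in the checksum discussion can be shown in a few lines. The sketch below is written in Python rather than Alpha assembler and makes no attempt to model cache lines or dual issue; the function names are hypothetical. It sums the data one quadword at a time in an emulated 64-bit register, counts the carries instead of adding them in immediately, and folds everything to 16 bits at the end. A plain 16-bit version is included only for comparison.

```python
MASK64 = (1 << 64) - 1


def in_checksum_sketch(data: bytes) -> int:
    """16-bit one's complement checksum, accumulated 64 bits at a time."""
    if len(data) % 8:
        data = data + b"\x00" * (8 - len(data) % 8)  # pad to a whole quadword
    total = 0
    carries = 0
    for i in range(0, len(data), 8):
        total += int.from_bytes(data[i:i + 8], "big")
        if total > MASK64:      # no carry bit available: detect the overflow
            total &= MASK64     # keep the register at 64 bits
            carries += 1        # remember the carry, add it in later
    total += carries            # all accumulated carries are added at the end
    while total >> 16:          # fold down to 16 bits, absorbing any new carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def reference_checksum(data: bytes) -> int:
    """Straightforward 16-bit-at-a-time version, used only as a cross-check."""
    if len(data) % 2:
        data = data + b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

The two functions agree because a carry out of bit 64, like a carry out of bit 16, is worth exactly one unit in one's complement arithmetic, so the point at which it is added back does not change the result.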
DEFTA Device Driver Enhancements

Preliminary measurements performed with the DEC FDDIcontroller/TURBOchannel adapter (DEFTA) and the OSF/1 device driver combination on DEC 3000 AXP Model 500 workstations indicated that we were able to receive the full FDDI bandwidth and deliver these packets in memory to the data link user. Although we show in this paper that the DEC OSF/1 for Alpha AXP system is also able to deliver the data to the user application, we ensure that the solutions provided by the driver are general enough to perform well even on a significantly slower machine. When executing on such a slow system, resources at the higher protocol layers (buffering, processing) may be inadequate to receive packets arriving at the maximum FDDI bandwidth, and the device driver has to deal with the overload. One of the primary contributions of the DEFTA device driver is that it avoids receive livelocks under very heavy receive load.

First, the queues associated with the different protocols are increased to a much larger value (512) instead of the typical size of 50 entries. This allows us to ride out transient overloads. Second, to manage extended overload periods, the driver uses the capabilities in the adapter to efficiently manage receive interrupts. The driver ensures that packets are dropped in the adapter when the host is starved of resources to receive subsequent packets. This minimizes wasted work by the host processor. The device driver also tends to trade off memory for computing resources. The driver allocates page-size mbufs (8K bytes) so that we minimize the overhead of memory allocation, particularly for large messages.

For transmitting packets, the driver takes advantage of the DEFTA's ability to gather data from different pieces of memory to be transmitted as a single packet. Up to 255 mbufs in a chain (although typically the chain is small, less than 5) may be transmitted as a packet. In the unusual case that a chain of mbufs is even longer than 255, we copy the last set of mbufs into a single large page-size mbuf, and then hand the packet to the device for transmission. This enables applications to have considerable flexibility, without resulting in extraneous data movement operations to place data in contiguous memory locations.

In addition, the driver implements a policy to achieve transmit fairness. Although the operating system's scheduling provides fairness at a higher level, the policies within the driver allow for progress on transmits even under very heavy receive overload. Although the Alpha AXP systems are capable of receiving the full FDDI bandwidth, the enhanced transmit fairness may still be a benefit under bursty receive loads during which timely transmission is still desirable. In addition, as transmission links become faster, this feature will be valuable.

Wherever possible, all secondary activities (excluding the transmit and receive paths) have been implemented using threads. Scheduling secondary activity at a lower priority does not impact the latency of transmit and receive paths.

Improvements to the TCP/IP Protocol and Implementation

The initial TCP window size is set to a default or to the modified value set by the application through socket options. TCP in BSD 4.3 performed a rounding of the socket buffer, and hence the offered window size, to some multiple of the maximum segment size (MSS). The implementation in BSD 4.3 performed a rounding down to the nearest multiple of the MSS. The MSS value is adjusted, when it is greater than the page size, to a factor of the page size.

When using a socket buffer size of 16K bytes, the rounding down to a multiple of the MSS on FDDI results in the number of TCP segments outstanding never exceeding three. Depending on the application message size, and influenced by the silly window syndrome avoidance algorithms, the delayed acknowledgment mechanism, or both, throughput penalties can be incurred.

Our choice in this area was to perform a rounding up of the socket buffer, and hence window size. This enabled existing applications to maintain performance regardless of changes to the buffering performed by the underlying protocol. For example, applications coded before the rounding of the buffer was implemented may have specified a buffer size at some power of 2. We believe it also allows better performance when interoperating with other vendors' systems and provides behavior that is more consistent to the user (they get at least as much buffering as they request).
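The effect of the two rounding policies is easy to check numerically. The sketch below is illustrative only; the function names are hypothetical, and the segment size of 4,312 bytes is the maximum-sized FDDI TCP segment quoted elsewhere in the paper.

```python
def round_down_to_mss(bufsize, mss):
    """BSD 4.3 style: largest multiple of the MSS that fits in the buffer."""
    return (bufsize // mss) * mss


def round_up_to_mss(bufsize, mss):
    """Rounding up: smallest multiple of the MSS that covers the buffer."""
    return -(-bufsize // mss) * mss  # ceiling division


def segments(window, mss):
    """Number of full segments that fit in a window."""
    return window // mss
```

For a 16K-byte socket buffer and a 4,312-byte MSS, rounding down leaves room for three segments, while rounding up gives the application at least the 16K bytes it asked for and room for a fourth segment.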
segment. increase this bu ffer to 16K bytes for U LTRJX support Therefore the window scale value is fi..,-xed when the of FDD I . With a socket buffer of 16K bytes, even connection is opened. Since the window scale when rounding up is applied, the amount of data is option is negotiated at initial ization time, only a bit limited to 17,248 octets per round-trip time. We shift to the window is added to the established path found that the throughput over FDDI is l imited by processing and has l i t tle effect on the overal l cost the window size. Th is is due to the effects of of processing a segment. schedu l ing data packet processing and acknowl Changes made to the OSF/ 1 TCP implementation edgments (ACKs), the interactions with window for using the window scale option include the addi flow contro l , and FOOl's token access protocol (described below). 13. 15 receive window shift -count field to the TCP control tion of the send window shift -count field and With memory costs decreasing considerably, block. TCP processing was mod ified : the receive we no longer consider the 16K byte default to window shift-count value was computed based on be an appropriate trade-off between memory the receive socket buffer size, and the window and throughpu t. Based on measurements for d i.f. scale option is sent with the receive window shift ferent values of the window size, we feel t hat the count. A mod ification at connection initialization default window size of 32 K bytes is reasonable . time a llows the received shift-cou nt value to be Increasing the window size from 16K bytes to stored in t he send window s h ift -count, if TCP 32K bytes resu l ts in an increase of the peak receives an segment containing a window throughput over FDDI from approximately 40 M b/s scale option. The receive window shift -count field to approximately 75 Mb/s. 
However, increasing the is assigned to the window scale option that is sent window size beyond 32K bytes a l l owed us to o n the segment. When the TCP enters increase the throughput even further, which led us to the incorporation of the TCP window sca l e extension. established state for the connection, window scale is turned on if both sides have sent seg ments with window scale. For every incoming seg ment, the window field in the TCP header is The imple Window Scale Extensions for TCP mentation of TCP in DEC OSF/ 1 version 1 .2 is based on the BSD 4.3 Reno d istribution. In addition, we incorporated the TCP window scale extensions left-shifted by the send window shift -count. For every outgoing segment, the window field in the TCP header is right-shifted by the receive window shift-count. based on the model proposed in RFC 1323. 10 Our work fol lowed the implementation placed in the Measurement Results witb TCP with Alpha AXP public domain by Thomas Skibo of the University of Workstations I l l i nois. the throughput with TCP on the DEC OSF/ 1 operat We used the inett tool to measure The TCP window scale extension maps the 16-bit ing system between two DEC 3000 AXP Model 500 window size to a 32-bit value. The TCP window workstations on a private FDDI ring. We observed scale option occupies 3 bytes and contains the type that as the window size increaseu from 32K bytes to of option (window scale), the length of the option (3 bytes), and the "shift-count." The window scale message sizes greater than 3,072 bytes. For example, value is a power of 2 encoded logarithmical ly. The for a user message size of 8, 192 bytes, the through shift -count is the number of bits that the receive window value is right-shifted before transmission . 150K bytes, the throughput general ly increased for put with a window size of 32K bytes was 72.6 M b/s For example, a window shift-count of 3 and a win and increased to 78.3 Mb/s for a window size of 64K bytes. 
The TCP throughput rose to 94.5 M b/s for a dow s ize of 16K would inform the sender that window size of 150K bytes. For window sizes the receive window size was 128K bytes. The beyond 150K bytes, we did not see a substantial, shift -cou nt value for window scale is l imited to 14. This a l lows for windows of (2 1 6+ 2 14) zw 1 GB. consistent i mprovement in throughput between To facil itate backward compatibility with existing We believe that window scale is requ ired to implementations, both peers must offer the win- achieve higher throughputs-even in a l imited = Digital Technical journal Vol. 5 No. 1 = Winter 1993 the two user appl ications in this environment. 53 DEC net Open Netvvorking FODI token ring of two stations-based on the inter 100 ....- ···....... - . . . ·- - - - - - - - - - - - - - - - - - - - - actions that occur between the token holding time, the scheduling of activities in the operating system, - - - - - - ----------- - - - - - - - - and the behavior of TCP. The defaul t value fo r 'TTRT is set to 8 mill iseconds 21 The end system is able to transmit packets at essentially the fu ll FODI band width of 100 Mb/s, thus poten t i a l ly consu ming about 350 microseconds (including CPU and network interface times) to transm it a maximum sized FDDI TCP segment of 4,31 2 bytes. D u ring the 8 mil l iseconds, the source is able to complete the 0 entire protocol processing of about 23 to 24 seg ments (approximately lOOK bytes). Further overlap of user data and protOcol pro cessing of packets can occur while the data l i n k is transmitting and the sink is generating acknowledg KEY: 40000 20000 60000 USER MESSAGE SIZE (BYTES) 80000 WINDOW SIZE = 32K BYTES WINDOW SIZE = 64K BYTES WINDOW SIZE = 1 50K BYTES ments, if there is adequate socket buffer space in the source system. 
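Two pieces of arithmetic from this part of the paper can be checked with a short sketch: the window shifts used by the window scale option, and the number of segments that can be processed during one TTRT. The code below is illustrative Python, not the OSF/1 implementation. The function names are hypothetical, and the rule for choosing the receive shift-count (the smallest shift that lets the 16-bit field cover the socket buffer) is an assumption; the paper says only that the value is computed from the receive socket buffer size.

```python
TCP_MAXWIN = 65535       # largest value of the 16-bit window field
TCP_MAX_WINSHIFT = 14    # limit on the shift-count given in the text


def receive_shift_for(bufsize):
    """Smallest shift that lets a 16-bit field describe bufsize (assumed rule)."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (TCP_MAXWIN << shift) < bufsize:
        shift += 1
    return shift


def window_field_out(window, rcv_shift):
    """Outgoing segment: right-shift the real window into the header field."""
    return min(window >> rcv_shift, TCP_MAXWIN)


def window_in(field, snd_shift):
    """Incoming segment: left-shift the header field to recover the window."""
    return field << snd_shift


def segments_per_ttrt(ttrt_us=8000, per_segment_us=350, mss=4312):
    """Segments, and the bytes they carry, that fit into one TTRT."""
    count = ttrt_us / per_segment_us
    return count, count * mss
```

With the paper's figures (8 milliseconds, about 350 microseconds per segment, 4,312-byte segments), `segments_per_ttrt()` gives roughly 23 segments and a little under 100K bytes, in line with the estimate in the text. Under the assumed rule, a 150K-byte socket buffer needs a shift-count of 2, while the 32K-byte default needs none.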
Thus, with the additional window of approximately 20K bytes to 30K bytes, the source system is able to pre-process enough segments and provide them to the adapter. The adapter may begin transmitting when the token is returned to the sender (after it receives a set of acknowledgments), while the source CPU is processing the acknowledgments and packetizing additional user data. With up to 150K bytes of socket buffer (and hence window), there is maximal overlap in processing between the CPU, the adapter, and the FDDI token ring, which results in higher throughput. This also explains why no further increases in the window size resulted in any significant increase in throughput.

Figure 3 shows the throughput with TCP between two DEC 3000 AXP Model 500 workstations on an isolated FDDI token ring for different message sizes for socket buffer sizes of 32K, 64K, and 150K bytes. For 150K bytes of socket buffer, the peak throughput achieved was 94.5 Mb/s. For all message sizes, we believe that the CPU was not fully utilized. Application message sizes that are slightly larger than the maximum transmission unit size traditionally display some small throughput degradation due to additional overhead incurred for segmentation and the subsequent extra packet processing. We do not see this in Figure 3 because the CPU is not saturated (e.g., approximately 60 percent utilized at message sizes of 8K bytes), and therefore the overhead for segmentation does not result in lower throughput.

Figure 3 TCP Throughput as a Function of Window Size: Two DEC 3000 AXP Model 500 Workstations on an Isolated FDDI Ring (throughput plotted against user message size in bytes, for window sizes of 32K, 64K, and 150K bytes)

So too, application message sizes that are larger than the discrete memory buffer sizes provided by the memory allocator should incur small amounts of extra overhead due to the necessity of chaining such buffers. Figure 3 also shows that the throughput degradation in this case is small.

Improvements to the UDP/IP Protocol Implementation and Measurement Results

UDP is a connection-less, message-oriented transport, with no assurances of reliable delivery. It also does not provide flow control. Unlike TCP, the UDP transmitter does not buffer user data. Therefore user messages are transmitted directly as packets on the FDDI. When user messages are larger than the MTU size of the data link (4,352 bytes), IP fragments the data into multiple packets. To provide data integrity, UDP uses the one's complement checksum for both the data and the UDP header.

In our experience, the receive throughput to applications using UDP/IP with BSD-derived systems is quite poor due to many reasons, including the lack of flow control. Looking at the receive path of incoming data for UDP, we see that packets (potentially fragments) of a UDP message generate a high-priority interrupt on the receiver, and the packet is placed on the network layer (IP) queue by the device driver. The priority is reduced, and a new thread is executed that processes the packet at the IP layer. Subsequently, fragments are reassembled and placed in the receiver's socket buffer. There is a finite IP queue and also a finite amount of socket buffer space. If space does not exist in either of these queues, packets are dropped. Provided space exists, the user process is then woken up to copy the data from the kernel to the user's space. If the receiver is fast enough to allow the user application to consume the data, the loss rate is low.
However, as a result of the way processing is scheduled in UNIX-like systems, receivers experience substantial loss. CPU and memory cycles are consumed by UDP checksums, which we enable by default for OSF/1. This overhead, in addition to the overhead for data movement, contributes to the receiver's loss rate.

Table 1 shows the receive throughput and message loss rate with the original UDP implementation of OSF/1 for different message sizes. We modified the way in which processing is performed for UDP in the receiver in DEC OSF/1 version 1.2. We reorder the processing steps for UDP to avoid the detrimental effects of priority-driven scheduling, wasted work, and the resulting excessive packet loss. Not only do we save CPU cycles in processing, we also speed up the user application's ability to consume data, particularly as we go to larger message sizes. Table 1 also gives the receive throughput and message loss rate with DEC OSF/1 version 1.2 incorporating the changes in UDP processing we have implemented.

Table 1 UDP Receive Characteristics with Peer Transmitter Transmitting at Maximum Rate

                     Receive Before Changes       Receive After Changes
    UDP Message      Throughput    Message        Throughput    Message
    Size (bytes)     (Mb/s)        Loss Rate      (Mb/s)        Loss Rate
    128              0.086         98.8%          0.64          83.1%
    512              0.354         98.5%          15.14         35.15%
    1024             0.394         99.16%         23.77         46.86%
    4096             9.5           90.26%         96.91         0.56%
    8192             NA*           NA*            97.01         1.08%

    * NA: Benchmark did not finish because of significant packet loss in that experiment.

UDP throughput was measured between user applications transmitting and receiving different size messages. Figure 4 shows the throughput at the transmitter, which is over 96 Mb/s for all message sizes over 6,200 bytes and achieves 97.56 Mb/s for the message size of 8K bytes used by NFS. During these measurements, the transmitting CPU was still not saturated and the FDDI link was perceived to be the bottleneck. Therefore, to stress the source system further, we used two FDDI adapters in the system to transmit to two different receivers on different rings. Figure 4 also shows the aggregate transmit throughput of a single DEC 3000 AXP Model 500 workstation transmitting over two FDDI rings simultaneously to two different sinks. The source system is capable of transmitting significantly over the FDDI bandwidth of 100 Mb/s. For the typical NFS message size of 8,192 bytes, the aggregate transmit throughput was over 149 Mb/s. The throughput of the two streams for the different message sizes indicates that, for the most part, their individual throughputs were similar. This showed that the resources in the transmitter were being divided fairly between the two applications.

Figure 4 UDP Transmit Throughput: Single DEC 3000 AXP Model 500 Workstation Transmitting as Fast as Possible to Single Ring and Receiver and Two Receivers on Different Rings (throughput plotted against user message size in bytes, for transmission on a single ring to one receiver and on two rings to two receivers)

Measurements of TCP/IP and UDP/IP with FDDI Full-duplex Mode

Earlier we observed that the behavior of TCP in particular depended on the characteristics of the timed-token nature of FDDI. One of the modes of operation of FDDI that we believe will become popular with the deployment of switches and the use of point-to-point FDDI is that of full-duplex FDDI. Digital's full-duplex FDDI technology, which is being licensed to other vendors, provides the ability to send and receive simultaneously, resulting in significantly higher aggregate bandwidth to the station (200 Mb/s). More important, we see this technology reducing latency for point-to-point connections. There is no token rotating on the ring, and the station does not await receipt of the token to begin transmission. A station has no restrictions based on the token-holding time, and therefore it is not constrained as to when it can transmit on the data link. The DEC FDDIcontroller/TURBOchannel adapter (DEFTA) provides the capability of full-duplex operation. We interconnected two DEC 3000 AXP Model 500 workstations on a point-to-point link using the DEFTAs and repeated several of the measurements reported above.

One of the characteristics observed was that the maximum throughput with TCP/IP between the two Alpha AXP workstations, even when using the default 32K bytes window size, reached 94.47 Mb/s. Figure 5 shows the behavior of TCP throughput with full-duplex FDDI operation for different window sizes of 32K, 64K, and 150K bytes (when window scale is used). The throughput is relatively insensitive to the variation in the window size. For all these measurements, however, we maintained the value of the maximum socket buffer size to be 150K bytes. When using a smaller value of the maximum socket buffer size (64K bytes), the throughput drops to 76 Mb/s (for a window size of 32K bytes) as shown in Figure 5.

Figure 5 TCP Throughput as a Function of Window Size: Two DEC 3000 AXP Model 500 Workstations with Full-duplex FDDI (throughput plotted against user message size in bytes, for window sizes of 32K, 64K, and 150K bytes, and for a 32K-byte window with a maximum socket buffer of 64K bytes)

Although we removed one of the causes of limiting the throughput (token-holding times), full-duplex operation still exhibits limitations due to scheduling the ACK and data packet processing and the resulting lack of parallelism in the different components in the overall pipe (the two CPUs of the stations, the adapters, and the data link) with small socket buffers. Increasing the maximum socket buffer allows for the parallelism of the work involved to provide data to the protocol modules on the transmitter.

Observing the UDP/IP throughput between the DEC 3000 AXP Model 500 workstations, we found a slight increase in the transmit throughput over the normal FDDI mode. For example, the UDP transmit throughput for 8K messages was 97.93 Mb/s as compared to 97.56 Mb/s using a single ring in normal FDDI mode. This improvement is due to the absence of small delays for token rotation through the stations as a result of using the full-duplex FDDI mode.

Experimental Work

We have continued to work on further enhancing the implementation of TCP and UDP for DEC OSF/1 for Alpha AXP. We describe some of the experimental work in this section.

Experiments to Enhance the Transmit and Receive Paths for TCP/IP

The bcopy() and in_checksum() routine optimizations minimize the incremental overhead for packet processing based on packet sizes. The protocol processing routines (e.g., TCP and IP) also minimize the fixed per-packet processing costs.

All TCP output goes through a single routine, tcp_output(), which often follows the TCP pseudocode in RFC 793 very closely.[9] A significant portion of its implementation is weighed down by code that is useful only during connection start-up and shutdown, flow control, congestion, retransmissions and persistence, processing out-of-band data, and so on. Although the actual code that handles these cases is not executed every time, the checks for these special cases are made on every pass through the routine and can be a nontrivial overhead.

Rather than check each case separately, the TCP/IP code was modified to maintain a bit mask. Each bit in the mask is associated with a special condition (e.g., retransmit, congestion, connection shutdown, etc.).
The bit is set whenever the corresponding condition occurs (e.g., retransmit time-out) and reset when the condition goes away. If the bit mask is 0, the TCP/IP code executes straightline code with minimal tests or branches, thus optimizing the common case. Otherwise, it simply calls the original routine, tcp_output, to handle the special conditions. Since the conditions occur rarely, setting and resetting the bits incurs less overhead than performing the tests explicitly every time a packet is transmitted. Similar ideas have been suggested by Van Jacobson.

Additional efficiency is achieved by precomputing packet fields that are common across all packets transmitted on a single connection. For example, instead of computing the header checksum every time, it is partially precomputed and incrementally updated with only the fields that differ on a packet-by-packet basis.

Another example is the data link header computation. The original path involved a common routine for all devices, which queues the packet to the appropriate driver, incurs the overhead of multiplexing multiple protocols, looking up address resolution protocol (ARP) tables, determining the data link formats, and then building the header. For TCP, once the connection is established, the data link header rarely changes for the duration of the connection. Hence at connection setup time, the data link header is prebuilt and remembered in the TCP protocol control block. When a packet is transmitted, the data link header is prefixed to the IP header, and the packet is directly queued to the appropriate interface driver. This avoids the overhead associated with the common routine.

Network topology changes (e.g., link failures) may require the data link header to be changed. This is handled through retransmission time-outs.
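The bit-mask test described earlier in this section can be sketched in a few lines. This is an illustration in Python under stated assumptions, not the OSF/1 code: the condition names, the `Connection` class, and the two send functions are hypothetical, and the paper does not list the actual conditions or their bit assignments.

```python
# Hypothetical condition bits, one per special case.
COND_RETRANSMIT = 1 << 0
COND_CONGESTION = 1 << 1
COND_SHUTDOWN = 1 << 2


class Connection:
    def __init__(self):
        self.special = 0   # bit mask of special conditions currently pending
        self.path_log = [] # records which path handled each send

    def set_condition(self, bit):
        self.special |= bit

    def clear_condition(self, bit):
        self.special &= ~bit


def general_output(conn, payload):
    """Stand-in for the original routine that checks every special case."""
    conn.path_log.append("general")
    return len(payload)


def send(conn, payload):
    """Common case first: a single test of the mask replaces many checks."""
    if conn.special == 0:
        conn.path_log.append("fast")
        return len(payload)
    return general_output(conn, payload)
```

The cost of maintaining the mask is paid only when a condition starts or ends, which is rare, whereas the saving applies to every packet sent on the fast path.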
Whenever a retransm i t time-out occurs, t h e preb u i l t header is d iscarded and rebuilt the next time a packet has to be sent. Some parameters are passed from TCP to 1 P through fields i n the mbufs. Combi n ing the layers eliminates the overhead of passing parameters and val idating them. Passing parameters is a nontrivial Digital Technical journal Vol. 5 No. 1 Winter 1993 cost, since i n the original i mplementation, some data was passed as fields in the mbuf structure. Because these were format ted in network byte order, building and extracting them incu rred over head . Moreover, the I P layer does not have to per for m checks for special cases that are not appJ icable to the TCP connection. For example, no fragmenra tion check is needed since the code for TCP has al ready taken care to build a packet within the al lowed size l imits. In a similar fashion to the transmit path, a common-case fast path code was implemented for the receive side. This mimics the most frequently executed portions of the TCP/IP input rout ines, an , the IS-IS protocol is upd ated to independently flooded to the other routers. Only in announce the addresses that are reachable by the route computation is any connection made means of that protocol suite. For example, to acid IP between the fragments of a router's LSP. support to IS-IS, a new field i s defined i n the LSPs to Digital Teclmica l journal Vol. 5 No. 1 Winter 1.993 67 DECnet Open Networking annou nce IP addresses, expressed in ordered pairs of the form (IP address, subnet mask). This allows IP addresses and OS! (i.e., DECnet Phase V ) addresses t o b e assigned independently, while still a l lowing most of the overhead fu nctions required by a routing protocol, such as check ing l ink status and propagating the information, to be performed only once for a l l supported protocol suites. I f all routers support a particular protocol , the data packets for that protocol can be transmitted in native mode, i . e . 
, no additional header is required. If some routers do not support a particular protocol, then the packet must be encapsulated in a network layer header for a network layer protocol that all the IS-IS routers do support. In DECnet Phase V, all the routers support both IP and CLNP, so these two protocols are transmitted in native mode. However, if support for another protocol is added, for instance AppleTalk support, then the routers that have AppleTalk neighbors need to be able to parse AppleTalk packets. However, other routers will not need to be modified.

To facilitate knowing when to encapsulate, IS-IS routers announce which protocols they support in their IS-IS packets. Also, routers that support the AppleTalk protocol and have AppleTalk neighbors list in their LSPs that they can reach certain AppleTalk destinations. The IS-IS packets are encoded such that a router can ignore information pertaining to protocol suites that the router does not support but can correctly interpret the rest of the IS-IS packet.

Assume that R1 and R2 are the only two routers in an area that support the AppleTalk protocol. R1 and R2 therefore announce in their LSPs which AppleTalk destinations they can reach. R1 and R2 use a format for including AppleTalk information in IS-IS LSPs that other routers in the same area can forward but will otherwise ignore. Assume R2 receives an AppleTalk packet for forwarding with destination D3, reachable through R1. Then R2 encapsulates the packet as data inside a CLNP (or IP) packet with destination R1. When R1 receives the packet, it removes the CLNP header and forwards the packet to D3. If R1 and R2 are adjacent, or if all the routers along the path from R2 to R1 support the AppleTalk protocol, then encapsulation of AppleTalk packets inside CLNP packets would not be necessary.
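The encapsulation rule in this example can be written down as a short sketch. The function and variable names are hypothetical, and R3 is an assumed intermediate router that supports only CLNP and IP; the sketch models only the decision itself, not packet formats or the IS-IS route computation.

```python
from typing import Dict, List, Set, Tuple


def needs_encapsulation(path: List[str], supports: Dict[str, Set[str]],
                        protocol: str) -> bool:
    """True if any router on the path cannot forward `protocol` natively."""
    return any(protocol not in supports[router] for router in path)


def forwarding_mode(protocol: str, destination: str, path: List[str],
                    supports: Dict[str, Set[str]],
                    carrier: str = "CLNP") -> Tuple[str, str]:
    """Return (protocol on the wire, address the packet is sent to).

    `path` lists the routers from the forwarding router to the router that
    announced the destination, inclusive.
    """
    if needs_encapsulation(path, supports, protocol):
        # Carry the packet as data inside a protocol every router supports,
        # addressed to the router that can reach the real destination.
        return (carrier, path[-1])
    return (protocol, destination)  # native mode, no extra header


# Assumed protocol support: only R1 and R2 understand AppleTalk.
supports = {
    "R1": {"CLNP", "IP", "AppleTalk"},
    "R2": {"CLNP", "IP", "AppleTalk"},
    "R3": {"CLNP", "IP"},
}

via_r3 = forwarding_mode("AppleTalk", "D3", ["R2", "R3", "R1"], supports)
adjacent = forwarding_mode("AppleTalk", "D3", ["R2", "R1"], supports)
assert via_r3 == ("CLNP", "R1")
assert adjacent == ("AppleTalk", "D3")
```

The same check covers both cases named in the text: adjacent routers give a path with no intermediate hops, and a path of AppleTalk-capable routers passes the test for every hop.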
Thus, encapsulation occurs automatically only when needed for transmission through routers that do not support the protocol of the data packet to be forwarded.

Using one integrated routing protocol to route packets from multiple protocol suites has significant advantages over using a separate routing protocol for each suite. Probably the most important advantage is that only one routing protocol needs to be managed and configured. A single coordinated routing protocol can respond to network problems, such as link failures, in an efficient manner, improve bandwidth utilization, and minimize the CPU and memory requirements in routers. Also, integrated routing allows automatic encapsulation and eliminates the need for manual configuration of where and when to encapsulate.

Summary
IS-IS is a powerful and robust routing protocol. Many aspects are innovative and have been copied by other routing protocols. When looked at as a whole, the algorithms may appear complex, but when examined individually, the designated router election, the LSP propagation, and the LSP database overload procedure, for example, are all quite simple. IS-IS provides efficient routing for a variety of protocol suites, currently including DECnet Phase IV, CLNP/DECnet Phase V, and IP.

References
1. Information Processing Systems, Data Communications: Protocol for Providing the Connectionless-Mode Network Service, ISO 8473 (Geneva: International Organization for Standardization, 1988).
2. J. Postel, "Internet Protocol," Internet Engineering Task Force RFC 791 (September 1981).
3. Information Processing Systems, Telecommunications and Information Exchange between Systems: End System to Intermediate System Routeing Exchange Protocol for Use in Conjunction with the Protocol for Providing the Connectionless-Mode Network Service (ISO 8473), ISO 9542 (Geneva: International Organization for Standardization, 1988).
4. D.
Plummer, "Ethernet Address Resolution Protocol," Internet Engineering Task Force RFC 826 (November 1982).
5. J. Postel, "Internet Control Message Protocol," Internet Engineering Task Force RFC 792 (September 1981).
6. Information Technology, Telecommunications and Information Exchange between Systems: Intermediate System to Intermediate System Intra-Domain Routeing Exchange Protocol for Use in Conjunction with the Protocol for Providing the Connectionless-Mode Network Service (ISO 8473), ISO/IEC 10589 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1992).
7. G. Sidhu, R. Andrews, and A. Oppenheimer, Inside AppleTalk, Second Edition (Reading, MA: Addison-Wesley, 1990).
8. C. Hedrick, "Routing Information Protocol," Internet Engineering Task Force RFC 1058 (June 1988).
9. J. McQuillan, I. Richer, and E. Rosen, "ARPANET Routing Algorithm Improvements, First Semiannual Technical Report," BBN Report 3803 (April 1978).
10. E. Rosen et al., "ARPANET Routing Algorithm Improvements, Volume 1," BBN Report 4473 (August 1980).
11. R. Perlman, Interconnections: Bridges and Routers (Reading, MA: Addison-Wesley, 1992).
12. E. Dijkstra, "A Note on Two Problems in Connection with Graphs," Numerical Mathematics, vol. 1 (1959): 269-271.
13. R. Callon, "Use of OSI IS-IS for Routing in TCP/IP and Dual Environments," Internet Engineering Task Force RFC 1195 (December 1990).

Graham R. Cobb
Elliot C. Gerberg

Digital's Multiprotocol Routing Software Design

The implementation of Digital's multiprotocol routing strategy required addressing various technical design issues, principally the stability of the distributed routing algorithms, network management, performance, and interactions between routing and bridging.
Developers of Digital's DEC WANrouter and DECNIS products enhanced real-time kernel software, implemented performance-centered protocol software, and used high-coverage, high-quality testing and simulation methods to solve problems related to these issues. In particular, a packet management strategy ensured that queuing requirements were met to guarantee the stability of the routing algorithms. Also, network management costs were minimized by down-line loading software, using a menu-driven configuration program, and careful monitoring. Router performance was optimized by maximizing the packet forwarding rate while minimizing the transit delay.

Digital's implementation of multiprotocol routing software enables internetworking across complex topologies including local and wide area networks (LANs and WANs) and dial-up networks. Evolving from Digital's successful tradition in DECnet Phase IV networks, the implementation of multiprotocol routing currently supports numerous protocol and packet types including

• DECnet Phase IV
• Transmission control protocol/internet protocol (TCP/IP)
• Novell NetWare internetwork packet exchange (IPX) protocol
• AppleTalk protocol suite
• OSI CLNS, the open systems interconnection protocol for providing the connectionless-mode network service
• X.25, the packet switching standard specified by the Comité Consultatif Internationale de Télégraphique et Téléphonique (CCITT)

Additional extensions for Digital's DECnet Phase V and ADVANTAGE-NETWORKS architecture requirements are also supported by Digital's multiprotocol routers.1,2 Many of these routers incorporate bridging technology, thus providing integrated bridging routers. This paper describes the most significant technical problems encountered and the solutions implemented when many internetworking operations are integrated into Digital's multiprotocol router system designs.

Digital's Router Product Overview
Digital's multiprotocol router products comprise two types: (1) access routers, which allow access to WAN services from branch offices for large LAN and WAN integration networks, and (2) backbone routers, which provide high-speed packet switching services for the network backbone of multiple types of high-speed media. Backbone sites offer a backbone network that often consolidates high-speed WAN lines, e.g., T1, T3, and SMDS. For high-speed local sites, backbone routers provide high-speed switching for many LAN ports and types, i.e., Ethernet, fiber distributed data interface (FDDI), and token ring. This section briefly discusses some of Digital's access routers (the DEC WANrouter 500, DEC WANrouter 250, and DEC WANrouter 90 products) and backbone routers (the DECNIS 500 and DECNIS 600 products).

The DEC WANrouter 500 is one of Digital's access routers and has been available in the marketplace since 1986. Originally a DECnet Phase IV-only router, this router has been upgraded and now offers multiprotocol routing that includes DECnet Phase IV, TCP/IP, and OSI. Additional support exists in this access router for common WAN services such as X.25 and frame relay. The DEC WANrouter 500 is a fixed-port configuration router offering one T1 WAN port and one Ethernet LAN port. This configuration permits branch office LANs to interconnect to backbone routers over relatively high-speed T1 lines. The DEC WANrouter 500 has an important place in router industry history as it was the first router ever to support the integrated intermediate system-to-intermediate system (Integrated IS-IS) routing algorithm.

The DEC WANrouter 250, another of Digital's access routers, is significant due to its high density of WAN ports and its support for asynchronous WAN data link protocols. These two major features combine with the multiprotocol routing software to provide a router for the newly emerging computer networking needs of mobile computers. The increasing use of personal computers, including mobile laptop computers, has led to the development of new techniques for networking such remote computers. The DEC WANrouter 250 provides eight WAN ports with dial-in access for the internetworking of such remote and mobile computers.

The introduction of LAN hub technology has produced a need for new small router products for these platforms. Digital's DEChub 90 Ethernet backplane product set includes the DEC WANrouter 90 access router shown in Figure 1. One feature of the DEChub 90 technology is that this router can be configured to reside within the hub itself or as a standalone module. In addition, this router is completely self-contained and extremely small (i.e., similar in size to a VHS videocassette). Many WAN access services, such as X.25 network access, are provided for the DEChub 90 with the DEC WANrouter 90 router.

Figure 1 DEC WANrouter 90 Access Router

The DECNIS 500 and DECNIS 600 (see Figure 2) bridging and routing products are Digital's highest performing and most flexible platforms. These backbone routing systems offer the power and interfaces necessary to meet the bridging and routing requirements of complex, high-speed networks, e.g., Ethernet, FDDI, T1/E1, and T3/SMDS.

Figure 2 DECNIS 600 Backbone Router

Router Software Development Methods
Software development for routing systems requires real-time kernel software, performance-centered protocol software development implementation, and high-coverage, high-quality testing and simulation methods. This section briefly describes some key techniques used in these development areas for the DEC WANrouter and DECNIS engineering programs.

Kernel Software
Digital has developed and refined different kernels with common interfaces to address the real-time software design environments required for their routers. A common router interface model has permitted different kernels to be tuned to specific platforms as required. In some cases, a common portable kernel was developed that permitted quick retargeting of the total router software in support of short time-to-market development needs.

Software Implementation
The following techniques were used in the development of the DEC WANrouter and DECNIS router software:

1. Implementing software directly from proprietary or standards-based architecture specifications
2. Licensing software from suppliers, e.g., external corporate software providers and government-funded university software projects
3. Importation of software from other implementations, i.e., host sources such as the ULTRIX, Open Software Foundation (OSF), and OpenVMS systems

Digital has developed special-purpose, high-performance implementations of the Integrated IS-IS routing protocol. In addition, specific software kernels provide control and extensions for the special features required. Engineers enhanced the real-time software kernels with software interfaces commonly found in public domain software (e.g., the Berkeley Software Distribution [BSD] UNIX socket model and system services). The inclusion of such interfaces has accelerated the addition of new software from external sources.

Common router software has been developed for use across Digital's many internetworking platforms. The majority of this routing software, which is independent of the underlying hardware, has been developed to support the evolving standards of portability. For each platform, the performance-intensive and hardware-specific code has been customized to maximize the design center for each instance of a router product architecture.
Router Software Design Issues
Many technical problems had to be resolved when building Digital's multiprotocol routers. The following sections describe the most significant issues and how they were addressed in the DECNIS 600 backbone router, as an example of router software design. These issues were

1. Stability of the distributed routing algorithms
2. Network management
3. Performance
4. Interactions between routing and bridging

Memory size and usage and congestion control are also key issues. However, this paper does not describe them in depth. Briefly, the amount of memory available is a major constraint on any router implementer. Usually, memory is largely consumed by code and by the databases the router must maintain to calculate the best route. In the case of routers that also perform connection-oriented functions (e.g., X.25 gateways and terminal servers), significant amounts of memory may be taken up by the per-connection state and counter information. Since it is essential for routers in the network to agree on the best route to a destination, all such routers must be able to handle the route database for that network. Digital's router designs have an automatic shutdown mechanism that takes effect should a router run out of memory in which to store routing information. This mechanism prevents routing loops.

To control OSI congestion, the router must determine whether or not a packet experienced congestion by calculating the average transmission queue length over time. This calculation must be performed in an efficient real-time manner. Thus, for the DEC WANrouter and DECNIS products, Digital designed and implemented algorithms specific to the particular queue structures and hardware design.

Stability of the Distributed Routing Algorithms
Distributed routing algorithm stability was the most important issue considered in the design of Digital's router systems.
A system design must guarantee successful results in support of routing control protocols even when the router is operating under a high load.

Whatever protocol is used, dynamic routing requires that all nodes that make decisions on how to forward data should agree on the correct path. Otherwise, data packets will be discarded (e.g., if sent to a node that does not know how to reach the destination) or may loop (e.g., if two routers each believe the other is the correct next node on the path to the ultimate destination, then the packet will loop between the two routers). If network configurations never changed, and lines and routers never got overloaded, then guaranteeing successful results would be easy.

Unfortunately, actual networks are complex. In practice, for each protocol, the correct path agreement is reached using an algorithm distributed between multiple independent routers and operating on ever-changing data.

The distributed algorithm must converge rapidly so that when network conditions change, the new route is agreed upon quickly. However, the algorithm must also be stable. When changes occur at a fast rate or when the algorithm is trying to complete or has just completed, the algorithm must still converge to a consistent state between all the routers involved. In this way, the network remains useful. In addition, while the network is changing, a router or a line may suddenly be presented with an excessive load of packets to forward (e.g., because a routing loop occurred transiently). This situation must not be allowed to disturb the stability of the routing algorithm.

The stability of a well-designed routing algorithm is directly related to how well the algorithm meets the following main requirements:

• Line speed. The effective speed of lines between routers (allowing for error correction by the data link protocol or the modem) must be high enough to allow the routers to rapidly exchange routing control information. The maximum bandwidth required for routing control traffic can be calculated from the size of the network. In a network of 4,000 end nodes, 100 level 1 routers, and 400 level 2 routers, approximately one Link State Packet (LSP) will be received every second. This LSP may contain 1,500 bytes, which would use a line bandwidth of 12,000 bits per second. This aspect of stability is under the control of the network designer; line speeds and network size must be continuously monitored and related.

• Processing power. The router CPU must be fast enough to forward routing updates to neighboring routers with minimum delay and must be able to recalculate the forwarding database quickly. Of course, this requirement relates only to that portion of the CPU time available for routing functions. A router that is also doing another job (e.g., acting as a file server) will have less CPU power available, unless routing is given priority over the other functions. Consequently, most networks now use dedicated routers instead of attempting to have routing tasks share the CPU with other functions.

• Queuing. The most important stability factor is to make sure that the systems are self-stabilizing. As the problem gets worse, progress to the solution should not become slower. For example, as the network configuration changes more rapidly, the calculation of the best route must not get slower. To meet this requirement, the routers must be careful about queuing data and routing control messages internally so that excessive or unusual data forwarding loads do not affect the processing of routing control messages. Otherwise, when a network problem overloads a router, the routing algorithm may never converge to fix the problem.

Figure 3 illustrates a case where an incorrectly designed router (one that gives priority to data forwarding over routing control message reception and processing) could cause a permanent routing loop and thus isolate a portion of the network. In this example, node A is sending a large amount of data to node F over the high-speed T1 line. The lower-speed (64 kilobit-per-second [kb/s]) line is available as a backup line. Because the backup line runs at only 64 kb/s, node C need only be a low power router. For example, a router rated at 128 packets per second would be sufficient because a fully saturated full-duplex 64-kb/s line with 128-byte packets handles 128 packets per second.

Figure 3 Network Instability

Consider what happens if the T1 line fails. Router B notices immediately and begins to forward data to router C. Initially, however, router C still believes the best route to node F is over the T1 line and so forwards the data back to router B. B resends the data to C and so on; a routing loop has been created. This problem is common during routing transitions. The loop will be broken as soon as router C runs the decision process and updates its routing tables. However, if router C is incorrectly designed and gives priority to forwarding data, then the unexpectedly large amount of data will "swamp" the router and prevent it from running the decision process. In addition, since router C is only a low-speed router, it will be forced to discard many data packets.

Eventually, the transport connections between node A and node F will fail, because packets are not being delivered (presumably causing the applications to fail). This situation will reduce the number of packets being introduced into the loop. However, each packet can go around the loop many times, thus generating a high load.
In this example, if nodes are set up such that a packet can travel the loop 64 times (a common value), then introducing only two packets into the loop per second will continue to swamp router C. Any node on the LAN might be sending those packets to discover when access to the remote LAN is restored. The effect is a long-lived routing loop that isolates the whole LAN, even though there was supposed to be a backup link available.

• Memory usage. Activities less important than routing should not consume the memory necessary for routing control processes to carry out their function. Even in a dedicated router, some lesser activities will be in progress. For example, network management and accounting are important activities, but they are not as critical as maintaining network stability; without a stable network, network management and accounting will fail. Therefore, other activities should not starve the routing control processes of memory. Consequently, traditional memory pools are not an appropriate way to allocate critical memory within the router; routing memory usage must be preallocated.

The remainder of this section describes the impact of the requirements on processing power, queuing, and memory allocation on the design of the DEC WANrouter and DECNIS products.

Requirements on Processing Power
The Digital Network Architecture (DNA) routing architecture requires that routing updates be propagated within 1 second of arriving and that the forwarding database calculation take no more than 5 seconds. The forwarding database calculation is CPU-intensive, but the time is proportional to the number of links reported in LSPs. To meet the DNA requirement, various measurements were made for each product to determine the number of links the decision process could handle per second. This information indicates, for each product, the maximum number of links allowed in the network.
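The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below is only a worked check and not part of any router software. It assumes that 64 kb/s means 64 x 1024 bits per second, the reading under which the figure of 128 packets per second comes out exactly (with 64,000 bits per second the result is 125). It also takes the loop example at face value, so that each packet passes router C 64 times, and the rate of 2,000 links per second is a made-up value standing in for a measured one.

```python
def packets_per_second(line_bps: float, packet_bytes: int,
                       full_duplex: bool = True) -> float:
    """Packet rate of a saturated line, counting both directions if duplex."""
    per_direction = line_bps / (packet_bytes * 8)
    return per_direction * (2 if full_duplex else 1)


def max_links(links_per_second: float, deadline_seconds: float = 5) -> float:
    """Largest link count the decision process can cover within the deadline."""
    return links_per_second * deadline_seconds


# Line speed: one 1,500-byte LSP received every second.
lsp_bandwidth_bps = 1 * 1500 * 8

# Backup line: 64 kb/s, 128-byte packets, full duplex.
backup_line_pps = packets_per_second(64 * 1024, 128)

# Routing loop: two new packets per second, each passing router C 64 times.
loop_load_pps = 2 * 64

# Processing power: hypothetical measured rate of 2,000 links per second.
link_limit = max_links(2000)

print(lsp_bandwidth_bps, backup_line_pps, loop_load_pps, link_limit)
```

Under these assumptions the loop alone generates as many packet arrivals at router C as the router is rated to forward, which is consistent with the statement that two packets per second are enough to keep it swamped.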
Note that this number does not directly limit the number of nodes permitted in the network; a large network with an efficient connection strategy may have fewer links than a small network in which every node is connected directly to every other.

The update process latency requirement means that the CPU time must be fairly allocated between the decision process and the update process. If the update process were required to wait until the decision process had completed, then the delays on forwarding LSPs would be too large (i.e., 6 seconds). We considered three possible solutions.

1. Process priorities. Give the update process a strictly higher priority than the decision process so that the database can be updated as required. The main issues to resolve are synchronizing access to the shared LSP database and allowing the decision process to complete, if a faulty router generates LSPs at an excessive rate.

2. Timeslicing. As in a traditional timesharing system, allow both processes to run simultaneously, thus sharing the CPU. This solution also requires synchronizing access to the LSP database.

3. Voluntary preemption. The decision process periodically checks to see if the update process is required and, if so, dispatches to it. This check can occur at time intervals frequent enough to meet the latency requirements and at times convenient to the decision process so that no synchronization problems occur.

To avoid the synchronization problems, Digital's DECNIS 600 software developers chose the third solution for two reasons.

1. Synchronization issues often cause problems that are serious and difficult to debug in complex systems. By avoiding these issues entirely, we simplified the software and increased its reliability.

2. The addition of synchronization mechanisms for parallel tasks can decrease the performance of the total system (for example by causing excessive rescheduling operations). Using voluntary preemption allowed a very efficient solution that still met the architectural requirements.

Requirements on Queuing
Queuing constraints ensure that high loads do not cause routing control information to be discarded. Initially, separating the data for forwarding from routing control messages might appear to be the logical solution to preserving routing control information. However, this solution works only if the router can process all the routing control messages without getting behind. Many practical routers, including the DEC WANrouter products, do not have a CPU that is fast enough to guarantee such processing performance. Digital's routers can guarantee to meet the timing requirements on the decision and update processes (even under worst-case loads), but if that load is combined with a flood of End-node Hello messages, Router Hello messages, and other control traffic, then some of those messages have to be discarded or queued for later processing. Since there might be 1,000 or more nodes on the LAN, the worst situation would be if all these nodes were to decide to send Hello messages at the same time. Careful software design means that the routers can meet the network stability requirements and still not lose connectivity to end nodes on the LAN.

For the DEC WANrouter software, Digital designed and implemented a packet management policy that differentiates between routing packet types to meet their respective processing requirements for network stability. The following list summarizes the classes of packet types:

• Data
• End-node Hello messages
• Router Hello messages
• Link State Packets and their acknowledgments, Sequence Number Packets (SNPs) and Complete Sequence Number Packets (CSNPs)

The parameters controlling the minimum and maximum numbers of packets to be used for each differentiated type are carefully calculated based on their architected behavior and the network configurations supported by each product. For example, a router's architected design center for supporting a given maximum number of adjacent routers on an attached LAN will affect the policy selected for managing the Router Hello message queues and packet buffers. Such mechanisms are implemented to guarantee that, for network stability, forwarding performance, and network convergence, the minimum levels of forward progress per packet type are met.

This packet management policy uses both buffer pools and queuing to implement the required policies. Inbound traffic is placed on queues that are serviced using variants of round-robin algorithms. These algorithms give different weightings to each queue to ensure that progress is made for every packet type, although at different rates. For example, for every data packet processed, the router may process 5 LSPs, 5 End-node Hellos, and 10 Router Hellos. The actual weightings used are selected when the software is designed and depend on the performance characteristics and expected network configuration of each product.

Some alternatives that were considered are

• Alternative buffer pools. A completely separate pool can be used for each of the different types of packets. The disadvantage is that in small configurations or ones that are not under heavy stress, the pool of buffers available for forwarding is limited unnecessarily.

• Strict priorities. Setting strict priorities for processing different types of routing control messages is undesirable, because a flood of one type of routing control message could cause another type to be ignored for a long time. In such a case, it is better to process some of each type of message than to give one type absolute priority.

In the DECNIS routers, several queues exist at the boundaries between the different DECNIS processors. Digital designed a mechanism for these queues similar to that described for the DEC WANrouter products. When the network interface cards, i.e., linecards, receive a packet destined to be passed to the management processor card (MPC), they analyze the packet and tell the MPC whether it is data, routing control, bridging control, or system control (which includes linecard responses to commands from the MPC). Thus, queues analogous to those described for the DEC WANrouters are used at all the interfaces within the system. For example, the assistance processor on the MPC recognizes the different types of messages and queues them on separate internal queues.

Requirements on Memory Allocation
Routers must have sufficient buffer space to handle the routing control messages. Consequently, all of Digital's router products guarantee this memory allocation. To preserve these buffers, the DECNIS MPC implements buffer swapping between layers, as illustrated in Figure 4. The data link layer must never be starved of buffers; otherwise, packets regarded as important by routing may be discarded without ever being received. To ensure that an adequate number of buffers is available to the data link layer, the MPC gives the data link a certain number of buffers and maintains that number. Every time a buffer is passed from the data link layer to the routing layer, another buffer is swapped back in return. If routing currently has no free buffers, it selects a less important packet to discard (freeing up the buffer containing the packet). In this way, the data link layer always has buffers available.

In the DECNIS linecard buffers, the arrangements are similar to those just described, but the details differ. The linecards and the MPC perform buffer swapping among themselves.

Figure 4 Buffer Swapping between Routing Module and Data Link Module

Network Management
Some of the highest costs involved in running a network are those related to obtaining and maintaining trained and experienced network managers and operators. Minimizing these costs requires routers that can be easily and efficiently managed. The major network management issues are

• Installation/loading. How are software updates distributed and installed? How long does the router take to load after a power failure?

• Configuration. How is the software told about changes to the lines or the network parameters? Does the network require a reboot to change the information?

• Monitoring. How does the manager get immediate reports of problems and unexpected changes, and long-term reports of traffic patterns and usage for network planning?

• Control. How can the manager shut down a line or even a whole router?

• Problem solving. What tools are available to detect the problem and then to investigate and correct the problem?

In all networks, though, a remote management capability is essential. Skilled network management staff may not be available at all sites (e.g., a small branch office). In fact, some sites may have no staff at all (e.g., a lights-out computing center).

Installation and Loading
All DEC WANrouter and DECNIS products update their software by down-line loading new software over the network.
Figure 4  Buffer Swapping between Routing Module and Data Link Module (diagram labels: full data buffer, empty replacement buffer)

In the case of the DECNIS, the software is stored in nonvolatile memory and so does not need to be reloaded on each boot. However, the DEC WANrouter products down-line load the software each time they are booted. Digital considered two other alternatives.

• Read-only memory (ROM). This means of distribution has the disadvantage of being expensive to modify and difficult to replace remotely.
• Floppy disk or other interface on the router. This mechanism increases cost and reduces reliability. Loading from a floppy disk may also be slower than loading over a network. Again, remote updating may not be possible, and physical security issues (e.g., preventing unauthorized users from supplying uncontrolled router software) may be introduced.

For the DECNIS product, Digital chose to use nonvolatile memory, e.g., flash random-access memory (RAM), for fast and reliable loading combined with backup down-line load operation when software updates are required. The down-line load can be from a DECnet system using the maintenance operations protocol (MOP) or from a TCP/IP host using the boot protocol (BOOTP) and the trivial file transfer protocol (TFTP).[1] The down-line load provides an easy way to update software when required; the software can be installed on a load host using any of the standard software distribution mechanisms (e.g., CD-ROM, magnetic tape, or the network).

Configuration

Configuring a router is notoriously difficult. Therefore, Digital developed a tool to assist the network manager with configuration. Each of the DEC WANrouter and DECNIS products comes with a configuration program. This menu-driven program leads the network manager through a series of forms to define the information needed to configure the router or to modify an existing configuration. On-line help is available, and steps may be retraced. Consequently, the network manager has no need to learn the network control language (NCL).

Digital used formal human factors testing during the design and development of the configurators to ensure that these tools were of high quality. Human interface testing continued through the router's customer field trials and provided additional feedback on our configurators' ease of use.

One thing that Digital did not originally anticipate is that users now tend to see the configurators as the user interface for the product. The configurator is often a customer's main means of interacting with the router and thus is an essential part of the product. Once people have used the configurator, they no longer regard it as an optional feature.

Monitoring

Digital's routers are fully manageable using Phase V network management. They respond to all NCL commands and can be managed using the DECmcc program, Digital's Enterprise Management Architecture (EMA)-compliant director. Therefore, all DECmcc added-value functional modules are available for performance analysis and historical data recording. The DECmcc design enables these functions to work without changing the router design.

Many users, however, are now investing in management stations that use the simple network management protocol (SNMP). Thus, for monitoring purposes, Digital already implements basic read-only SNMP management, which is being enhanced over time to add more information.

Control

Whether managed by the NCL or the DECmcc director, access is controlled using passwords. In addition, Digital is focused on offering full SNMP management for the router products. As well as providing the standard public management information, Digital is defining private management information to allow unique features of the routers to be controlled. We designed the internal management interfaces of the routers to allow us to write modules that are manageable from both the SNMP and the common management information protocol (CMIP), with minimal effort and duplication.

Problem Solving

One of the most time-consuming, and hence expensive, parts of a network manager's job is problem solving. Fortunately, many of the tools and techniques used for this task were required for debugging and testing router implementations and thus already exist.

Building initially on debugging and testing experience, and later on discussions with users, Digital has produced problem-solving guides for each DEC WANrouter and DECNIS product. These guides take the user through a step-by-step description of how to isolate and fix a problem. We have conducted human factors testing on these guides and have investigated different modes of making this information available. The DECNIS guide is currently available in hard copy and also in an on-line Bookreader form that allows moving through the flow to be automated using hot spots. Digital is currently evaluating Hypertext technology to further improve the usability.

One main tool for problem solving is the common trace facility (CTF), a software tool that causes the router to record and display packets that are sent and received. Analysis routines automatically format the packets. Having the CTF is comparable to having a built-in line or LAN analyzer.
The CTF is the main diagnostic tool used by Digital's service engineers when investigating a problem and also by the development engineers when debugging software.

Digital's routers also include diagnostic and maintenance facilities, which include loopback testing over all interfaces and low-level, limited, remote management directly at the data link layer. The remote management capabilities allow monitoring of counters from an adjacent node and also allow an adjacent node to force a reboot if a suitable password is supplied. This latter operation is referred to as a MOP boot (previously known as a MOP trigger in DECnet Phase IV).[1]

A MOP boot command may be the final attempt by a network manager to fix a problem with a router without having to go physically on site. For that reason, the command must be recognized and acted upon regardless of what else may be happening in the router. In the DECNIS routers, the MOP boot command is recognized by the linecards. In the DEC WANrouter, the MOP boot command is specially actioned by the lower layers of the software to make sure it is honored even if the higher layers have failed in some way or if the system is under an enormous load.

We also support the "TCP/IP ping" utility (more formally, ICMP Echo) and the similar "OSI ping" utility. These tools are commonly used for diagnosing reachability problems.

Router Performance

Today's large-scale computer data networks rely on bridge router components for the networks' total level of performance and quality of service. As such, data network designers and network managers must be knowledgeable about their chosen router platform's performance characteristics. This section of the paper discusses the performance aspects of Digital's routers.

Performance Metrics

In support of developing common metrics across the internetworking router industry, the Internet Engineering Task Force (IETF) has set up a Benchmarking Methodology Working Group, which has developed definitions for router performance.[7] Three key metrics defined by this group provide the background for our discussion of Digital's router software design.

• Throughput: the maximum (forwarding) rate at which none of the offered frames (packets) are dropped by the device (i.e., packets per second)
• Frame loss rate: the percent of frames (packets) that should have been forwarded by the network device (router) while under a constant load but were not forwarded due to lack of resources (i.e., percent packets lost)
• Latency: for store-and-forward devices (i.e., routers), the time interval beginning when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port (i.e., units of time)

In the design of Digital's router software and systems, a balance has been targeted between maximizing the packet throughput forwarding rates and minimizing the packet latency. Some vendors mistakenly compare loss-free throughput rates with forwarding rates that have high loss rates. Such comparisons must be studied carefully, because they do not compare router performance measures of equal impact to the total network. To reiterate, the throughput forwarding rate occurs only at the point when the frame loss rate is zero percent. Digital's routers target throughput designs which, as much as possible, run at "wire speed" with zero frame loss rates. Regardless of the throughput value quoted, router comparison should reference common packet loss rates because network applications need to retransmit any packets that are lost by the routers. In general, the throughput, loss-free forwarding rate is the optimum value for discussions of router forwarding performance.

The other critical value is the stability of the router under heavy overload. A "receive livelock" condition occurs when the offered load, i.e., input packets received for subsequent forwarding by a given router, reaches the point where the delivered throughput, i.e., packets actually forwarded, decreases to zero.[8, 9] Real-time systems, such as routers, have the potential to livelock under traffic loads above their throughput peaks. However, it is extremely important that implementations avoid such responses to post-throughput saturation. In the case of Digital's routers, in all architectures and products, the routers do not livelock but remain stable even when the applied input load to a router exceeds the peak throughput forwarding packet rate. This key performance measure of router devices remains an underlying design characteristic of all Digital DECNIS and DEC WANrouter network devices.

Packet Throughput/Forwarding Rate

Digital's routing platforms offer a range of throughput measures. For each platform, the throughput is the most often quoted value used to characterize the router's aggregate capabilities. In the case of the DECNIS 600, an aggregate throughput of 80,000 packets per second is offered.[10] In smaller routers, the WAN line interface rates (i.e., 64 kb/s and T1) are often the limiting factor for the aggregate throughput. The software in all cases is optimized for the given router platform's mix of WAN and LAN interfaces.

Since the forwarding rate is the most important performance metric for a router, Digital carefully optimized the designs of its multiprotocol routers to allow data forwarding to occur as fast as possible. On the DEC WANrouter products, we handle all the forwarding on a central CPU with little hardware assistance. In the DECNIS products, forwarding and filtering operations are handled by linecards. A hardware assist for the performance-critical forwarding function's address lookup is used on DECNIS routers in support of requirements for very high-speed packet switching.[4] On each linecard, a streamlined software kernel has been developed along with all the required software. The linecard software kernel and modules were carefully constructed to have the minimum number of instructions and the lowest number of execution cycles necessary to perform the high-speed forwarding and filtering operations. On the DECNIS MPC, the software kernel is also fully capable of the routing forwarding operations. However, this kernel is mainly required to provide the software processing for the remaining non-performance-intensive operations of the router's software (i.e., the processing of updates to the router topology database and the network management commands/received packets). This partitioning of processing of received packets in the DECNIS router system permits such routers, and the networks that they comprise, to remain highly stable when traffic overloads occur.

For the DEC WANrouter software, the forwarding operation has no hardware assist. Software lookup assist algorithms have been researched and implemented to help meet the performance-intensive requirement. As in the microcoded DECNIS linecard adapter software, the software is highly tuned for performance. To minimize the additional maintenance overhead associated with highly tuned software, the amount of such code is kept to a minimum. The DEC WANrouter software design is an example of how Digital carefully balanced product performance requirements and product development and maintenance costs to meet the required price/performance goals for its router product family.

Packet Latency (Transit Delay)

The next most frequently specified performance requirement is packet latency or packet transit delay. For bridge/router devices, this measurement clearly depends on software and hardware timings. However, the definition of latency utilized corresponds directly to the chosen system's design. The previously quoted IETF definition for store-and-forward devices can be further refined to accommodate differing device designs. The IETF working group clarifies the difference between a "store-and-forward device" and a "bit-forwarding device" internal design model for a router. The latter design model is often referred to as a "cut-through" design and requires a different definition than that previously listed for store-and-forward devices. The definition of latency used for this cut-through model is the time interval starting when the end of the first bit of the input frame reaches the input port and ending when the start of the first bit of the output frame is seen on the output port.[7]

The issue that distinguishes the two models is whether or not processing starts prior to the packet being completely received. However, another key point is whether or not the packet received can be sent out for transmission prior to complete reception. When reception, forwarding, and transmission can occur in parallel, the design is referred to as cut-through. For Digital's router designs, the DECNIS does process reception and forwarding in parallel prior to a packet being completely received. However, the DECNIS does not start transmission until a packet is completely received. Thus, the DECNIS latency model uses the original store-and-forward definition of the IETF.

In the case of the DEC WANrouter software, the model and definition used is again store-and-forward. The factors that control the packet latency in the DEC WANrouter design are as follows:

1. Receiving the packet. The packet must be completely received.
2. Performing the forwarding operation. This factor includes packet verification, analyzing the packet, performing any required address lookup, performing any required packet modifications, and queuing the packet for transmission on the destination interface.
3. Congestion queuing. If the destination interface is not idle, the packet will have to be queued before transmission. Some transit delay measurements use only uncongested media interfaces connected to the router. However, latency measurements must be made to measure the potential latency delays due to congestion at the router output interface. The packet latency due to queue occupation delays is also included here. Congestion avoidance algorithms have been implemented to minimize this congestion delay.
4. Transmitting the packet. This factor is usually dominated by the time taken to clock the bits of the packet out of the interface but also includes media access times, i.e., delays due to another node already using a common connection.

We now examine how the DEC WANrouter and DECNIS routers separately minimize the transit delay. The DEC WANrouters minimize the packet reception and transmission portions by allowing hardware to perform these functions using direct memory access (DMA). Because these systems have only a single processor, the forwarding delay is minimized by the same fast-path optimizations used to improve the forwarding rate.

On the other hand, the optimizations for the DECNIS routers are slightly different for the various linecards. The DEC WANcontroller 622 card has no DMA, and the linecard on-board processor is involved in receiving each byte of the packet. We parse the header as soon as there is enough information to do so. For example, the data link packet type field is decoded before the network address bytes have been received, and the network address lookup is initiated as soon as the address has been received (i.e., before the data has been received). The address lookup is then performed by the address recognition engine hardware without further involvement from the software.

The DEC WANcontroller 618 card and the DEC LANcontroller 601 and 602 cards all receive packets one segment at a time. Internally, these cards use small fixed-size buffers that are linked together as necessary to store a whole packet. Again, they perform the analysis and forwarding lookup as soon as the data is available (i.e., when the first segment is received). Thus, for a large packet, the entire forwarding decision will have been made before the last byte has been received. However, note that until the last byte has been received, it is not known whether the cyclic redundancy check (CRC) is correct or the packet has been corrupted. So the packet is not actually passed to the destination linecard until that check has been completed. As discussed before, this design is still store-and-forward, rather than cut-through. The DECNIS design goals were easily met without using cut-through; however, Digital has used the cut-through design on a number of LAN host-based adapters.

When a packet is to be transmitted, certain changes must be made in the data. For example, the IP and OSI protocols require that time-to-live fields and, in some cases, other options be modified. Bridged packets may need address bits modified or conversion between Ethernet and IEEE 802 forms. As with reception, all DEC WANcontrollers perform these operations as the data is transmitted. All cards have hardware assistance for recalculating header checksums and CRCs.

These features are designed to reduce the forwarding delay as much as possible, so that the transit delay is mainly controlled by the time it takes to receive and send the packet. The type of architecture that best describes the DECNIS design is a data flow, which blends traditional store-and-forward designs with newer cut-through designs. This data flow architecture processes packets in a distributed manner (i.e., linecards process packets) without transmitting packets prior to complete reception and validation of these packets. This design limits the forwarding of packets that are found to be in error, whereas the similar full cut-through design would propagate invalid packets.

Interaction between Routing and Bridging

Designing a combined router and bridge product is complicated by the relationship between the routing and bridging functions.[11] A received packet must be subjected to either the bridge forwarding or the routing forwarding process (or maybe both).
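One simple way to make the either/or decision in the preceding sentence is to split traffic by protocol type, so that each received frame goes to exactly one of the two processes. The Python sketch below is illustrative only and is not DECNIS code. The EtherType values and the OSI LSAP value are the standard assignments, but the function name, the choice of which protocols count as routed, and the handling of IEEE 802.3 length-field frames are assumptions made for the example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800       # Internet Protocol version 4
ETHERTYPE_DECNET_IV = 0x6003  # DECnet Phase IV routing
ETHERTYPE_LAT = 0x6004        # Local Area Transport, not routable
LSAP_OSI = 0xFE               # ISO network layer over IEEE 802.2 LLC

# Hypothetical configuration: which protocols this box routes.
ROUTED_ETHERTYPES = {ETHERTYPE_IPV4, ETHERTYPE_DECNET_IV}
ROUTED_LSAPS = {LSAP_OSI}


def classify(frame: bytes) -> str:
    """Return "route" or "bridge" for a frame that starts at the
    destination MAC address (no preamble, no trailing CRC)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (type_or_length,) = struct.unpack_from("!H", frame, 12)
    if type_or_length >= 0x0600:
        # Ethernet framing: the field is a protocol type.
        routed = type_or_length in ROUTED_ETHERTYPES
    else:
        # IEEE 802.3 framing: the field is a length and the protocol is
        # identified by the destination LSAP in the LLC header.
        if len(frame) < 15:
            raise ValueError("frame too short to hold an LLC header")
        routed = frame[14] in ROUTED_LSAPS
    return "route" if routed else "bridge"
```

Under these assumptions an IP frame and an OSI frame are handed to routing, while a LAT frame, which has no network layer to route on, is handed to bridging.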
Several designs are possible and are illustrated in Figure 5.

(a) Protocol split. In this design, some protocols are bridged, e.g., Local Area Transport (LAT), and others are routed, e.g., TCP/IP. The bridging and routing functions are completely separate; they merely share line interfaces. Every packet received is passed to either routing (if intended for a protocol that is being routed) or bridging.

(b) Integrated with one interface. In this design, the routing function is modeled as being layered on top of the bridging function. Theoretically, packets are subjected to the bridging process and then, if they are addressed to the router, subjected to the routing process. In this form of the model, the router uses a single logical interface seemingly connected to a private LAN contained within the bridge/router.

(c) Integrated with multiple interfaces. This design is similar to the integrated design with one interface, but the router uses all the available interfaces and logically connects to the same extended LAN multiple times.

Figure 5  Bridge/Routing Design
(a) Protocol split. Some protocols are passed to the bridging functions, others to the routing functions.
(b) Integrated with one interface. The routing function uses a single LAN address and a single logical interface to the extended LAN.
(c) Integrated with multiple interfaces. The routing function uses all interfaces to attach to the extended LAN multiple times.

Each design model has advantages and disadvantages, and we considered all three models for the design of the DECNIS routers. The protocol splitting model has the advantage of simplicity. The major disadvantage is that any particular protocol must be either bridged or routed. The integrated models have the disadvantage of requiring specific management to prevent a routed protocol from also being bridged. In most cases, a protocol is being routed specifically to avoid the problems associated with bridging. The model with one interface also has the disadvantage that the network manager may get confused attempting to work out which interface is being used for routing. We chose the protocol-splitting model because of its effectiveness and ease of use.

Special Considerations of the DECNIS Architecture

We have discussed special features of the DECNIS system architecture. Now we present some additional DECNIS software design issues.

Control and Management of Linecards

Each linecard is a separate software environment and must be managed and controlled by the management processor. The main tasks required are

• "Watchdog" polling. In a standalone network server product, it is necessary to guard against the software getting caught in an infinite loop and hence not responding to management and control messages. The management processor is protected by a hardware watchdog timer, but the linecards do not have a timer. To protect the linecard software, we designed the management processor software to poll each linecard every 400 milliseconds (ms). If there is no response, we reset the card.
• Counters. The network interface cards handle data forwarding and therefore must maintain the required counters (e.g., the number of data bytes received).
However, to avoid requiring the linecard to maintain 64-bit counters (which costs memory and requires 64-bit arithmetic), the management processor maintains the full counters and polls the linecards frequently enough to guarantee that the on-card counters do not wrap. Each counter is sized to support the design of the management processor polling every 400 ms.
• Control. When a data link protocol or a routing protocol is started or stopped on an interface, the management processor receives the network management command and issues appropriate control messages to the network interface card.

Distributed Forwarding

Each linecard normally handles the forwarding of bridged and routed data without involving the management processor. This design requires a different approach to meeting the stability requirements from that described for the DEC WANrouter devices. For example, the DEC WANrouter products discard data packets to meet the routing stability requirements. This discard is limited by the packet management mechanisms to guarantee a minimum level of forwarding performance for the other routing packets, even under worst-case conditions such as those caused by network topology changes.

The DECNIS routers do not normally have to discard packets, because the network interface cards can continue to forward data while the management processor handles the routing protocol operations. In addition, correctly designed linecard software guarantees that control traffic is passed to the MPC, even in cases where the software is also passing large amounts of data traffic to the MPC.

Conclusion

This paper describes the complex nature of the design decisions required in the development of Digital's multiprotocol router systems and software. The issues and solutions discussed show how many conflicting technical requirements can be addressed. One example of such a conflict is related to the design goals for the performance of Digital's multiprotocol routers. While on one hand achieving extremely high system throughput (i.e., the DECNIS 600 router supports a forwarding throughput rate of over 80,000 packets per second), the DECNIS 600 design also addresses the equally critical metric of router stability (i.e., the DECNIS 600 product remains stable under extreme network loads).[10] This balancing of requirements is key to justifying Digital's approach toward router product engineering. As summarized in his recent book on computer systems performance analysis, Raj Jain states that

The performance of a network ... is measured by the speed (throughput and delay), accuracy (error rate) and availability of the packets sent.[12]

Routers that can forward packets but cannot remain stable under heavy loads, or meet the requirements for bursty packet rates as required by many of the newer network applications (e.g., packet-based videoconferencing systems such as Digital's DECspin product), will fail to satisfy customers.[13] As such, Digital provides a well-tuned, optimized total network solution with DECNIS 600 routers and DECspin products. This synergy of Digital's network applications and network infrastructure components is the ultimate justification for the multiprotocol router design decisions outlined in this paper.

Acknowledgments

Many engineers in Australia, England, Ireland, and the United States participated in the design and implementation of Digital's multiprotocol routers. We wish to thank all of them.

References

1. DECnet Digital Network Architecture (Phase V) General Description (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNAPV-GD-001, 1987).
2. J. Martin and J. Leben, DECnet Phase V (Englewood Cliffs, NJ: Prentice-Hall, Inc., 1992).
3. R. Perlman, R. Callon, and M. Shand, "Routing Architecture," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 62-69.
4. S. Bryant and D. Brash, "The DECNIS 500/600 Multiprotocol Bridge Router and Gateway," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 84-98.
5. DECnet Digital Network Architecture (Phase V) Network Routing Layer Functional Specification (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA03-FS-001, 1991).
6. E. Coffman, Jr., and P. Denning, Operating Systems Theory (Englewood Cliffs, NJ: Prentice-Hall, Inc., 1973): 169.
7. S. Bradner, "Benchmarking Terminology for Network Interconnection Devices," Internet Engineering Task Force RFC 1242 (July 1991).
8. K. Ramakrishnan and W. Hawe, "The Workstation on the Network: Performance Considerations for the Communications Interface," IEEE Computer Society Technical Committee on Operating Systems, vol. 3, no. 3 (Fall 1989): 29-32.
9. K. Ramakrishnan, "Scheduling Issues for Interfacing for High Speed Networks," Proceedings of Globecom '92, IEEE Global Telecommunications Conference, Session 18.04, Orlando, FL (December 1992): 622-626.
10. S. Bradner, "Interop Fall 1992 Router Performance Study," technical presentation, Harvard University, 1992.
11. W. Hawe, M. Kempf, and A. Kirby, "The Extended Local Area Network Architecture and LANBridge 100," Digital Technical Journal, vol. 1, no. 3 (September 1986): 54-72.
12. R. Jain, The Art of Computer Systems Performance Analysis, ISBN 0-471-50336-3 (New York: John Wiley & Sons, 1991): 23.
13. R. Palmer and L. Palmer, "DECspin: Networked Multimedia Conferencing for the Desktop," Digital Technical Journal, vol. 5, no. 2 (Spring 1993, forthcoming).
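As a supplementary sketch for the counter polling described under Control and Management of Linecards: a narrow on-card counter can be extended to a full 64-bit counter as long as it is polled before it can advance by a full wrap. The Python below is an illustration under stated assumptions, not the DECNIS implementation. The class and function names are hypothetical, and the 16-bit width and the event rate used in the usage note are example figures rather than values taken from the design.

```python
class PolledCounter:
    """Full-width counter kept by the management processor for one
    narrow, wrapping counter on a linecard (hypothetical names)."""

    def __init__(self, card_bits: int) -> None:
        self.modulus = 1 << card_bits  # range of the on-card counter
        self.last_raw = 0              # raw value seen at the previous poll
        self.total = 0                 # accumulated 64-bit value

    def poll(self, raw: int) -> int:
        """Fold the latest on-card reading into the 64-bit total."""
        # Modular subtraction gives the true increment even if the card
        # counter wrapped once, provided it advanced by less than modulus.
        delta = (raw - self.last_raw) % self.modulus
        self.total = (self.total + delta) & 0xFFFFFFFFFFFFFFFF
        self.last_raw = raw
        return self.total


def min_counter_bits(max_events_per_second: int, poll_interval_ms: int) -> int:
    """Smallest counter width that cannot wrap fully between two polls."""
    # Worst-case number of events in one poll interval, rounded up.
    worst_case = (max_events_per_second * poll_interval_ms + 999) // 1000
    # 2 ** bit_length(n) is always greater than n, so the delta stays unambiguous.
    return max(1, worst_case.bit_length())
```

For example, a 16-bit card counter read as 65000 and then as 200 has advanced by 736, so the management processor's total becomes 65736. With a 400 ms poll interval and an assumed peak of 80,000 events per second, at most 32,000 events occur between polls, which a 15-bit counter can represent without ambiguity.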
Stewart F. Bryant
David L.A. Brash

The DECNIS 500/600 Multiprotocol Bridge/Router and Gateway

The DECNIS 500/600 high-performance multiprotocol bridge/router and gateway are described. The issues affecting the design of routers with this class of performance are outlined, along with a description of the architecture and implementation. The system described uses a distributed forwarding algorithm and a distributed buffer management algorithm executed on plug-in linecards to achieve scalable performance. An overview of the currently available linecards is provided, along with performance results achieved during system test.

The DEC Network Integration Server 500 and 600 (DECNIS 500/600) products are general-purpose communications servers integrating multiprotocol routing, bridging, and gateway functions over an evolving set of local and wide area interfaces. The product family is designed to be flexible, offering a wide range of performance and functionality.

The basic system consists of a Futurebus+ based backplane, a management processor card (MPC), and a packet random-access memory (PRAM) card with a centralized address recognition engine (ARE) for forwarding routed and bridged traffic. Network interface cards or linecards are added to provide network attachment. The DECNIS 500 provides two linecard slots, and the DECNIS 600 provides seven linecard slots. The applications run from local memory on the MPC and linecards. PRAM is used to buffer packets in transit or destined to the system itself.

The system was developed around distributed forwarding on the linecards to maximize performance. Software provides forwarding on the linecard for internet protocol (IP), DECnet, and open systems interconnection (OSI) traffic using integrated IS-IS (intermediate system to intermediate system) routing, along with bridging functionality for other traffic. The management processor controls the system, including loading and dumping of the linecards, administering the routing and bridging databases, generating routing and bridging control traffic, and network management. X.25 functionality, both for routing data and as an X.25 gateway, and routing for AppleTalk and IPX are supported on the management processor.

Performance measurements on a system configured with 14 Ethernets have demonstrated a forwarding performance of 80,000 packets per second as a router or a bridge.

This paper discusses the issues involved in the design of a fast bridge/router. It presents the processing considerations that led us to design the distributed forwarding system used in the DECNIS 500/600 products. The paper then details the hardware and software design and concludes with a performance summary.

Fast Bridge/Router Design Issues

There are a number of conflicting constraints on the design of a bridge/router. It must simultaneously forward packets, participate in the process of maintaining a global view of the network topology, and at all times be responsive to network management. This requires a sophisticated hardware and/or software design capable of striking the correct balance between the demands imposed by these constraints.

The need to make optimum use of the transmission technology is emphasized by the high link tariffs in Europe and the throughput demands of modern high-performance computing equipment. Therefore, the router designer must find methods of forwarding packets in the minimum number of CPU instructions in order to use modern transmission technology to best advantage. In addition to high performance, low system latency is required. The applications that run across networks are often held up pending the transfer of data. As CPU performance increases, the effects of network delay play an increasingly significant role in determining the overall application performance.

Another aspect of forwarding that requires attention is data integrity. Many protocols used in the local area network (LAN) have no data protection other than that provided by the data link checksum. Thus careful attention must be paid to the design of the data paths to minimize the periods when the data is unprotected. The normal technique in bridging is to leave the checksum intact from input to output. However, more advanced techniques are needed, as this simple approach is not possible when translating between dissimilar LAN types.

Two particular operations that constrain the performance of the forwarding process are packet parsing and address lookup. In a multiprotocol router, a variety of address formats need to be validated and looked up in the forwarding table.

part of the world could cause incorrect network operation in a different geographical region. A bridge/router must therefore be designed to process all network control traffic, and not export its local congestion problems to other parts of the network: a "good citizenship" constraint. To achieve this, the router needs to provide processing and filtering of the received traffic at line rates, in order to extract the network control traffic from the data traffic under worst-case conditions. In some cases, careful software design can accomplish this; however, as line speeds increase, hardware support may be required. Once the control traffic has been extracted, adequate processing power must be provided to ensure that the
The network converges quickly. This requires a sui table most powerfu l address format in popu lar use is the task schedul ing scheme. OS! NSAP (network service access point), bllt this is Another requirement of a bridge/router is that it the most complex to parse, with up to 20 octets to remain manageable u nder all circumstances. If the be analyzed as a longest-match sequence extracted rou ter is being overloaded by a malfu nctioning from padd ing fields. In a bridge, supporting the node i n the network, the only way to rel ieve the sit rapid learn ing of med ia access control (M AC) uation is to shut down the circu i t causing the over aclclresses is another requ irement. To provide con load. To do this, it must be able to extract and sistently high performance, t hese processes bene process t he network management packets despite fit from hardware assistance. the overload si tuation. Cobb and Gerberg give more Although the purpose of the network is the trans mission of data packets, the most critical packets are the network control packets. These packets are used to determine topological informat ion and to communicate i t to the other network components. information o n routing issues. ' Architecture To address the requirements of a high-performance m u l tiprotocol bridge/rou ter with the technology I f a data packet is lost, the transport service retrans cu rrently available, we split the fu nctional requ i re mits the packer at a small inconvenience to the ments i nto two sets: those best hand led in a dis appl ication. However, if an excessive number of tribu ted fashion and those best hand led central ly. 
network control packets are lost, the apparent The data I ink and forwarding functions represent topology, and hence the apparent opt i m u m paths, the h ighest processing load and operate in suffi frequently change, lead ing to the formation of rou t ciently local context that they can be d istributed to i ng loops and the generation of further control packets describing the new paths. Th is increased traffic exacerbates the network congestion. Taken to the extreme, a processor associated with a I ine or a group of l ines. The processing requirements associated with these functions scale linearly with both l ine speed posi tive feedback loop occurs, i n and number of l ines attached to the system. Some which the only traffic flowing i s messages trying to bring the network back to stabi l ity. aspects of these per- l i ne functions, such as l i n k i ni As a result, two requirements are rtaced on the require information that is available only centra l ly a tial ization and processing of exception packets, router. First, the router must be able to identify and or need a soph isticated processing environment. process the network control packets under all over However, these may be decoupled from the critical load cond i t ions, even at the expense of data traffic. processing path and moved to the central process Second, the rou ter must be able to process these i ng function. packets quickly enough to enable t he network to In contrast to the lower- level functions, the man converge on a consistent view of the network agement of the system and the calculation of the topology. forwarding database are best hand led as a central As networks grow to global scale, the possibility i zed function , since these processes operate in emerges that an underperforming rou ter in one the context of the bridge/rou ter as a whole. The D igital Technical ]ourual Vi!/. 5 No. 
I Winter 1993 85 DECnet Open Networking processor workload is proportional to the size of req u i red , we would need hardware assistance in the network and not the speed of the l i nks. parsi ng Network protocols are designed to reduce the Considerati o ns of economy of hardware cost, and look ing up network addresses. amount of this type of process ing, both to minimize board area, and bus bandwidth Jed us to a single ARE contro l traffic bandwidth and to permit the con shared among all J inecards. This ackl ress parser has struction of relatively simple low-performance s u fficient p erformance to support a DECNIS 600 routers i n some parts of the network. server fu l ly pop u l ated with l inecards that support These processi ng considerations led us to design the DECN!S 500/600 as a set of per- l ine forward ing processors, co m m u n icati ng on a peer-to-peer basis to forward the normal packets that comprise the each l ink with a bandwidth of up to 2 x 10 megabits per seco n d . Above this speed, local address caches are requ ired . majority of the network traffic, pl us a central Distribu ted Forwarding management processor. Although this processor In u ndersta nd i ng the distributed torward i ng pro behaves, i n essence, l i ke a normal monoprocessing cess used on the DEC IS '500/600, it is convenient to bridge/router, its i nvolvement in forward ing is l i m first consider the forward ing of rou t i ng packets, ited to u nusual types of packet. Having spl i t the fu nctional ity between the and then to extend tl1i s descrip tion to the p rocess i ng of other packet types. In the rou ti ng forward i ng peer- to-peer forwarding processors and the m a n process, as shown in F igure 1 , the i ncom ing packets agement processor, we designed a b u ffer and are made u p of three components: the data l i n k contro l system to efficiently cou ple these pro header, the routing header, and the packet body. cessors together. 
The DECN!S 500/600 products The receive process (RXP) term in ates the data use a central PRAM of 256-byte bu ffers, shared l in k layer, stripp ing the data I i n k header from the among the l inecards. Ownersh i p of bu ffers is passed from one l inecard to another by a swap, packet. The rou ting header is parsed and copied i n to P"R.ANI u n mo d ified . Any required changes are empty m ade when the packet is subsequently transm itted. one. This algorithm improved both the fa irness The information needed fo r this is placed in a clara which exchanges a fu l l buffer for an of buffer al location and the performance of the structure ca l led a packet descriptor, which is writ bu ffer ownership transfer mechanism. Frac t ional ten i nto space left at the fron t of the first packet bu ffers much s m a l ler than the maxi m u m p acket b u ffe r. The packet body is copied i nto the packer sizes were used, even though this makes the sys b uffer, tem m ore com p l ic ated. The consequential econ required. omy of memory, however, made t h i s an a ttractive p roposition. Ana lysis of the fo rward ing function i ndicated that to achieve the levels of performance we continu ing in other packet b u ffers i f The desti nation network add ress is copied to the ARE, which i s also given instruct ions on which address type needs to be parsed. The H X P is now free to starr processing another in coming packet. DESTINATION NETWORK ADDRESS FORWARDING PARAMETERS PACKET (DATA LINK HEADER, ROUTING HEADER, ,-1.--� PACKET BODY) PACKET (DESCRIPTOR, ROUTING HEADER, PACKET BODY) PRAM PACKET (DESCRIPTOR, ROUTING HEADER, PACKET BODY) • I PACKET (DATA LINK HEADER, ROUTING HEADER, PACKET BODY) TXP .___...._ ..,.. ..� RING VECTOR (BUFFER POINTER, QUEUING INFORMATION) Figure 1 86 Distributed Forwarding Vol 5 No. J Winter !')' ).) 
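The buffer handling described in the Architecture section can be illustrated with a small model. The following Python sketch is not the DECNIS implementation. The `Linecard` class, the `swap_transfer` and `buffers_needed` functions, the refusal behavior when the receiver has no empty buffer, and the 32-byte descriptor size are all assumptions made for illustration. Only the 256-byte buffer size, the descriptor at the front of the first buffer, and the full-for-empty swap come from the text.

```python
from collections import deque

BUFFER_SIZE = 256  # bytes per PRAM buffer, as stated in the text


class Linecard:
    """Hypothetical model of a linecard that owns a fixed quota of PRAM buffers."""

    def __init__(self, name, buffer_ids):
        self.name = name
        self.free = deque(buffer_ids)  # empty buffers this card currently owns
        self.tx_queue = deque()        # full buffers waiting to be transmitted

    def owned(self):
        """Total number of buffers (empty or full) this card currently owns."""
        return len(self.free) + len(self.tx_queue)


def swap_transfer(sender, receiver, full_buffer):
    """Hand a full buffer to `receiver` in exchange for one of its empty ones.

    Assumed policy: if the receiver has no empty buffer to give back, nothing
    is transferred and False is returned.
    """
    if not receiver.free:
        return False
    empty = receiver.free.popleft()
    receiver.tx_queue.append(full_buffer)
    sender.free.append(empty)
    return True


def buffers_needed(packet_len, descriptor_len=32):
    """Buffers required for one packet; descriptor_len is an assumed value."""
    total = packet_len + descriptor_len
    return -(-total // BUFFER_SIZE)  # ceiling division


card_a = Linecard("A", [0, 1, 2, 3])
card_b = Linecard("B", [4, 5, 6, 7])
filled = card_a.free.popleft()           # card A fills buffer 0 with a packet
swap_transfer(card_a, card_b, filled)    # ownership passes to card B
print(card_a.owned(), card_b.owned())    # each card still owns four buffers
```

Because every transfer is an exchange, the number of buffers owned by each card stays constant in this model, which is one way to read the fairness property the authors mention. Under the assumed descriptor size, `buffers_needed(1500)` evaluates to 6, showing how a large packet spans several fractional buffers.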
When the address lookup process has completed, the

Frame Relay Networks

Figure 1 Typical Frame Relay Configuration

• Frame relay addressing, using headers of 2, 3, or 4 octets in length. Figure 2 shows the frame relay header formats. An extended address (E/A) bit is reserved in each octet to indicate whether or not the octet is the last one in the header.

Most of the header represents the data link connection identifier (DLCI), which identifies the frame's virtual circuit. The header may also contain a DLCI or control indicator (D/C) to indicate whether the remaining six bits are to be interpreted as lower DLCI bits or as control bits. For alignment with LAP-D, the header also contains a bit to discriminate between commands and responses (C/R). This bit is not used for supporting frame relay access.

The DLCI influences the routing of the frame to the desired destination. The DLCI is also used to multiplex PVCs onto the physical link and enables each endpoint to communicate with multiple destinations by means of a single network access. DLCIs may have either global or local significance in the network. In the global case, the scope of the DLCI extends throughout the network such that a particular DLCI always identifies the same destination, thus making the frame relay network look more like a LAN. In the local case, the scope of the DLCI is limited to the particular interface. When local DLCIs are used, the same DLCI can be reused at another interface to represent a different connection.

• Congestion control and avoidance information. The frame relay header also contains the forward explicit congestion notification (FECN) bit, the backward explicit congestion notification (BECN) bit, and the discard eligibility (DE) indicator, which are discussed in the Congestion Avoidance section.

Figure 2 Frame Relay Header Formats
2-octet header: DLCI (6 high-order bits), C/R, E/A = 0; DLCI (4 low-order bits), FECN, BECN, DE, E/A = 1
3-octet header: DLCI (6 high-order bits), C/R, E/A = 0; DLCI (4 bits), FECN, BECN, DE, E/A = 0; DLCI or control (6 low-order bits), D/C, E/A = 1
4-octet header: DLCI (6 high-order bits), C/R, E/A = 0; DLCI (4 bits), FECN, BECN, DE, E/A = 0; DLCI (7 bits), E/A = 0; DLCI or control (6 low-order bits), D/C, E/A = 1

Permanent Virtual Circuit Control Procedures

Frame relay PVCs provide point-to-point connections between users. Although the PVCs are set up for long periods of time, they can still be considered virtual connections because network resources (i.e., buffers and bandwidth) are not consumed unless data is being transferred.

For interface management purposes, the frame relay interface includes control procedures based on the LMI definition contained in the original multivendor specification. These procedures use messages carried over a separate PVC identified by an in-channel signaling DLCI. The management messages are transferred across the interface using data link unnumbered information frames, as defined in CCITT Recommendation Q.922.6 The messages use a format similar to that defined in CCITT Recommendation Q.931 for ISDN signaling in support of call control and feature invocation.7 Each message is formed from a set of standardized information elements defining the message type and associated parameters. The control procedures perform three main functions:

• Link integrity verification initiated by the user device and maintained on a continuous basis. This function allows each entity to be confident that the other is operational and that the physical link is intact.

• When requested by the user, full status network report providing details of all PVCs. The user would normally request such a report at start-up and then periodically.

• Notification by the network of changes in individual PVC status, including the addition of a PVC and a change in PVC state (active/inactive).

The management protocol is defined in Annex D of ANSI T1.617, with equivalent functionality also defined in CCITT Recommendation Q.933, Annex A.8,9

Effect on Higher-level Protocols

Frame relay provides a multiplexed PVC interface and, with regard to routing software, can be modeled as a set of point-to-point links. However, the characteristics of the frame relay service differ from normal point-to-point links. The major differences are as follows:

• Round-trip delay across a frame relay network is normally longer than the delay across a dedicated point-to-point link.

• PVC throughput can be as high as 2 megabits per second (Mb/s), whereas many existing leased lines operate at lower rates.

• A single frame relay interface can have multiple virtual connections (each one going to a different destination) as compared to the traditional point-to-point link, which supports a single connection.

Given the specific characteristics just described, a frame relay interface may have many more packets in transit than a conventional point-to-point link. Consequently, an acknowledged data link protocol whose procedures include retransmission of data frames is of limited use in this environment. For a large number of virtual connections, the memory required to store the data frames pending acknowledgment would be prohibitive. In addition, if frames are being discarded due to congestion in the frame relay subnetwork, the retransmission policy would increase, rather than recover from, this congestion.
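The memory argument can be made concrete with a rough bandwidth-delay estimate. The sketch below is only an illustration: the function name `retransmit_buffer_bytes`, the 100-millisecond round-trip delay, and the count of 200 PVCs are assumed values, not figures from the paper. The 2 Mb/s rate is the upper PVC throughput quoted above.

```python
def retransmit_buffer_bytes(rate_bps, round_trip_ms, pvc_count):
    """Bytes held pending acknowledgment if every PVC keeps a full
    bandwidth-delay product of data in flight (a worst-case assumption)."""
    bits_in_flight = rate_bps * round_trip_ms * pvc_count // 1000
    return bits_in_flight // 8


# One PVC at 2 Mb/s with an assumed 100 ms round trip
print(retransmit_buffer_bytes(2_000_000, 100, 1))    # 25000
# The same link characteristics across an assumed 200 PVCs
print(retransmit_buffer_bytes(2_000_000, 100, 200))  # 5000000
```

Under these assumptions a single PVC needs about 25 kilobytes of retransmission storage, and 200 PVCs need about 5 megabytes, which suggests why storing unacknowledged frames per connection scales poorly.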
Instead, an unacknowledged data link layer should be used.

Using an unacknowledged data link protocol has implications for the routing layer operating over frame relay. In particular, the data link can no longer be considered reliable, and the routing protocol must accommodate this characteristic.

Congestion Avoidance

When a frame relay network becomes congested, network devices have no option but to drop frames once their buffers become full. With an unacknowledged data link layer, the user device will not be informed if a data frame is lost. This lack of explicit signaling when operating over frame relay networks places a requirement on the higher protocol layers in the end-system equipment. The OSI transport layer protocol demonstrates how to deal with this type of characteristic. The destination end system's transport implementation detects data loss and requests the source to retransmit the frame. The implementation reduces the source's credit to one, thus closing the source's transmit window and, in effect, reducing traffic through the congested path.

Frame relay networks are prone to congestion. Consider the scenario shown in Figure 3. Note that the committed information rate (CIR) represents minimum guaranteed throughput. In the configuration shown, the network device can support two PVCs: one running at 64 kilobits per second (kb/s) and the other at 128 kb/s. With no back pressure applied across the frame relay interface, in the worst case, the network device will become congested. The router can send frames into the network on a particular PVC at 1 Mb/s that will then be forwarded at a much slower rate. Once the network device's buffers are full, it will discard frames. As a result, routing and bridging control messages may be lost, thus causing the routing topology to become unstable. Since this, in turn, will likely lead to looping packets, a network meltdown could result.
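A short calculation shows how quickly the scenario above leads to frame loss. The sketch is a simplified fluid model, not a description of any real frame relay node: the function names and the 32,000-byte buffer size are assumptions, while the 1 Mb/s access rate and the 64 kb/s CIR are taken from the scenario.

```python
def seconds_until_full(buffer_bytes, offered_bps, drain_bps):
    """Time for a buffer to fill when traffic arrives faster than it drains.

    Returns None when the offered load does not exceed the drain rate.
    """
    if offered_bps <= drain_bps:
        return None
    return buffer_bytes * 8 / (offered_bps - drain_bps)


def steady_state_loss(offered_bps, drain_bps):
    """Fraction of offered traffic discarded once the buffer stays full."""
    return max(0.0, 1 - drain_bps / offered_bps)


# Router offers 1 Mb/s on the PVC whose CIR is 64 kb/s; buffer size is assumed
fill_time = seconds_until_full(32_000, 1_000_000, 64_000)
loss = steady_state_loss(1_000_000, 64_000)
print(round(fill_time, 3), round(loss, 3))
```

With the assumed buffer, the node fills in roughly a quarter of a second, after which about 94 percent of the offered frames on that PVC are discarded. Routing and bridging control messages share that fate unless they are protected in some way.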
In addition, if data frames are lost, the higher layer protocols in the end system (e.g., the OSI transport layer) discover this situation and retransmit the lost frames. Repeated transmission of the same data causes the effective end-to-end throughput to drop well below the minimum guaranteed throughput.

Figure 3 Example Configuration of Frame Relay Interface Rate and Permanent Virtual Circuit Throughput (a router, the user device, on a LAN segment is attached by a 1 Mb/s line across the frame relay interface to a frame relay node, the network device, which serves one 64 kb/s CIR line and one 128 kb/s CIR line)

The frame relay header has several mechanisms that can be used to apply the appropriate back pressure to prevent congestion.

• The FECN bit is set by the network when a frame experiences congestion as it traverses the network. In OSI and DECnet Phase V environments, this bit can be mapped onto the congestion-experienced bit in the header of the network layer PDU. This PDU, when subsequently delivered to the destination, allows the destination to discover that the path is congested and to notify the source transport to decrease its window and thus place less demand on the network. Standardization work is currently under way to add similar support to the transmission control protocol/internet protocol (TCP/IP).

• The BECN bit is set by the network when a frame traverses a congested virtual circuit in the opposite direction. This indicator is not perfect, because there is no guarantee that traffic will be generated in this direction on the virtual circuit. A source that detects it is transmitting on a congested path is expected to reduce its offered load.

• The DE bit, if set, indicates that during congestion the frame should be the first discarded. The procedures for deciding to set this bit are not clearly defined. This bit could be set by (1) the entry node of the network, e.g., when the input offered load is too high, or (2) the source user equipment, e.g., to discriminate data frames from the more important routing control messages.

Other methods can be used to avoid the consequences of congestion and hence frame loss. The LMI defined in the multivendor frame relay specification contained an optional extension that included a threshold notification bit in the PVC status information element of one of the messages. The threshold notification bit provided a means of allowing a network device to asynchronously inform a user device that a particular PVC connection was congested. The user device could then stop transmitting data on the connection until the network device informed it that the congestion was alleviated.

Since the loss of routing control messages can cause network instability, an alternative approach is to adopt manual configuration. Static network configurations use reachable addresses to provide routing information such that the transmission of routing control traffic is not required. Consequently, the routing behavior is independent of the performance of the network.

In addition, the user device could implement rate-based transmission to ensure that virtual circuits are not congested. However, a means of notifying the user device of the CIR of a virtual circuit was included only as an optional extension in the LMI specification, and use of such a method would destroy one of the major benefits of frame relay, i.e., the capability to allocate bandwidth on demand.

In practice, network devices have limited internal buffering to store frames; this is reflected in the CIR assigned to PVCs. Consequently, data loss occurs if user devices consistently transmit data on a PVC faster than its associated CIR. Adequate procedures and CPLs that cope with congested situations have yet to be developed and standardized. As a result, such situations may lead to unfairness in a multivendor environment where those users who support congestion avoidance will lose bandwidth to those who do not.

Products

Below we describe examples of frame relay products: the StrataCom IPX FastPacket equipment, which provides the frame relay network; and Digital's DECNIS 500/600 and WANrouter 100/500, which support the frame relay service by accessing the interface as user equipment.

The StrataCom IPX FastPacket Product Family

The StrataCom IPX FastPacket product family can be used to build networks that support both circuit-mode voice and data as well as frame relay. Within the network, the StrataCom IPX FastPacket nodes communicate using a technique based on cell switching, which involves the transmission of small, fixed-length cells. Additional, high-level functions provide services on top of the basic transmission network. StrataCom uses a hardware-based switching technique resulting in very high-speed switching (100,000 to 1,000,000 cells per second). With such high throughputs and low delays, these networks have been used for carrying voice, video, and data traffic.

The StrataCom IPX FastPacket network is configured by network management to provide the required virtual circuits between users.
The StrataCom cell switching mechanism adopts a single cell format for the transmission of all types of information, with each cell containing addressing information. Routing tables within the network nodes use this addressing information to forward the traffic along the desired virtual circuit. Since in any particular connection the path used for the sequence of cells is always the same, cell ordering is maintained. Intelligent interfaces at the edge of the network provide the functions required for specific services such as voice and data.

Figure 4 illustrates the concept employed by StrataCom of building service-specific functions on top of a common cell switching technology. The figure shows examples of various types of external interfaces.

For the frame relay interface, StrataCom supports the optional features defined to address congestion. The IPX FastPacket node provides the optional explicit congestion indicators defined in the frame header, which are set based on averaging queues that build up in the IPX FastPacket nodes in the network. Support is also provided for the optional threshold notification feature defined as part of the LMI; the actual threshold values, together with buffer configuration, can be configured by the network manager.

Frame Relay Support in Digital's Family of Multiprotocol Routers

Digital has provided frame relay support in its family of multiprotocol routers that employ the OSI intermediate system-to-intermediate system (IS-IS) routing protocol. Frame relay user device functionality is implemented in the DECNIS V2.1 software for either the DECNIS 500 or the DECNIS 600 hardware units, and in the DEC WANrouter V1.0 software for either the DEMSB or the DEMSA hardware units.

Part of the development of the frame relay support involved cooperating with StrataCom to produce a working frame relay specification.
In particular, extensions were added to the LMI to provide appropriate congestion control procedures. Digital's software supports the Frame Relay Specification with Extensions, Revision 1.0, written by StrataCom and the relevant ANSI T1S1 standards. The software has been tested and is known to be compatible with the StrataCom IPX FastPacket 16/32 equipment with Frame Relay Interface Card Software.

Figure 4 Sample StrataCom Network Configuration (service-specific functions, namely frame relay, data circuit mode, and voice circuit mode interfaces, built on common cell switching over digital transmission, with a computer, routers, and private branch exchanges attached at the edges)

The DECNIS and WANrouter implementations use the point-to-point protocol (PPP) for the transmission of multiprotocol datagrams over point-to-point links. PPP is defined in Requests for Comment (RFCs) 1331 and 1332, with bridging extensions specified in RFC 1220; support for DECnet Phase IV is defined in RFC 1376 and for OSI in RFC 1377.10-14

Congestion avoidance procedures include support for both the threshold notification signal in the LMI (when available) and the FECN. The threshold notification signal causes the end system to modify its rate of data transmission. Receipt of a frame with the FECN bit set causes the equivalent bit in the network layer PDU header to be set, which in turn causes the end systems to reduce their offered traffic. The BECN and DE bits are never set or examined.

Related Activities

Various committees are involved in activities related to the frame relay technology. These activities include standards work, specifications, and efforts to address technical issues such as interoperability.

Standards

The overall frame relay network architecture is defined in ANSI T1.606, Frame Relay Bearer Service - Architectural Framework and Service Description.1 Access is provided by the frame relay interface, which is defined in various ANSI standards for both permanent and switched virtual circuits. ANSI T1.618, DSS1 - Core Aspects of Frame Protocol for Use with Frame Relay Bearer Service contains a definition of the protocol for exchanging frames across the interface, as well as annexes concerned with local management (e.g., notification of PVC status).5 Although all implementations to date have focused on a PVC-based interface, SVC access is defined in ANSI T1.617, DSS1 - Signaling Specification for Frame Relay Bearer Service.8 Each of these T1S1 standards has an equivalent CCITT recommendation, as shown in Table 1.

Table 1 Current Status of Frame Relay Standardization

Standard                           ANSI                         Status     CCITT                                    Status     Remarks
Architecture and SVC Description   T1.606                       Standard   I.233                                    Standard   Replaces I.222
Congestion Management Principles   Addendum to T1.606           Standard   I.370                                    Standard
Data Transfer Core Aspects         T1.618                       Standard   Q.922 (Annex A corresponds to T1.618)    Standard   Most important frame relay standard
Access Signaling for SVCs          T1.617                       Standard   Q.933                                    Standard
Management Procedures for PVCs     Included in T1.617 Annex D   Standard   Included in Q.933 Annex A                Standard   Concepts accepted in CCITT

Other Current Activities

The Internet Engineering Task Force (IETF) is developing specifications for RFCs related to the frame relay technology. A specification called Multiprotocol Interconnect over Frame Relay defines an encapsulation mechanism for supporting multiple protocols over frame relay networks. To allow use of the simple network management protocol (SNMP), an experimental management information base (MIB) for frame relay DTEs is also under development.

To promote the frame relay technology, a Frame Relay Forum has been set up in both North America and Europe. A technical committee has been established to address issues related to the technology in terms of its interoperability and evolution in multivendor environments. This committee actively participates with the standards bodies and develops implementation agreements and interoperability test procedures. Work continues to define a network-to-network control interface, multicasting capabilities, multiple protocol encapsulation, and interworking with other technologies, such as the switched multimegabit data service (SMDS) defined by Bell Communications Research, Inc.15

The cell switching adopted by StrataCom within their network is expected to change over time to conform with emerging CCITT recommendations for broadband ISDN.16 These recommendations cover asynchronous transfer mode (ATM), which defines a standard cell structure and ATM adaptation layers (AALs) for particular higher-level functions.

Summary

Frame relay is a simplified form of packet-mode switching that, at least in theory, provides access to high bandwidth on demand, direct connectivity to all other points in the network, and consumption of only the bandwidth actually used. Thus, to the customer, the frame relay technology offers a reduction in the cost of transmission lines and equipment and improved performance and response time.

Routers connected to a frame relay network can consider the multiplexed PVC interface as a set of point-to-point links. The special characteristics of a frame relay network require that special care be taken in selecting the data link protocols and in handling congestion.

Acknowledgments

The authors thank StrataCom, Inc. for providing significant input on cell switching technology and its use in their IPX FastPacket equipment. The authors would also like to acknowledge Cliff Didcock of the Computer Integrated Telephony Development Group who consulted in Digital's initial frame relay implementation.

References

1. ANSI T1.606: Frame Relay Bearer Service - Architectural Framework and Service Description (New York: American National Standards Institute, Inc., 1990).

2. CCITT Recommendation X.25: Interface between Data Terminal Equipment (DTE) and Data Circuit-terminating Equipment (DCE) for Terminals Operating in the Packet Mode and Connected to Public Data Networks by Dedicated Circuit (Geneva: International Telecommunications Union, 1988).

3. Frame Relay Specification with Extensions, Revision 1.0, Cisco Systems, Digital Equipment Corporation, Northern Telecom, Inc., and StrataCom, Inc. (September 1990).

4. CCITT Recommendation V.35: Data Transmission at 48 Kilobits per Second Using 60-108 kHz Group Band Circuits (Geneva: International Telecommunications Union, 1976).

5. ANSI T1.618: DSS1 - Core Aspects of Frame Protocol for Use with Frame Relay Bearer Service (New York: American National Standards Institute, Inc., 1990).

6. CCITT Recommendation Q.922: ISDN Data Link Layer Specification for Frame Mode Bearer Services (Geneva: International Telecommunications Union, 1991).

7. CCITT Recommendation Q.931: ISDN User-Network Interface Layer 3 Specification for Basic Call Control (Geneva: International Telecommunications Union, 1991).

8. ANSI T1.617: DSS1 - Signaling Specification for Frame Relay Bearer Service (New York: American National Standards Institute, Inc., 1990).

9. CCITT Draft Recommendation Q.933: ISDN Signalling Specification for Frame Mode Bearer Services (Geneva: International Telecommunications Union, 1991).

10. Point-to-Point Protocol for the Transmission of Multi-protocol Datagrams over Point-to-Point Links, Internet Engineering Task Force RFC 1331 (May 1992).

11. The PPP Internet Protocol Control Protocol (IPCP), Internet Engineering Task Force RFC 1332 (May 1992).

12. Point-to-Point Protocol Extensions for Bridging, Internet Engineering Task Force RFC 1220 (April 1991).

13. PPP DECnet Phase IV Control Protocol (DNCP), Internet Engineering Task Force RFC 1376 (November 1992).

14. PPP OSI Network Layer Control Protocol (OSINLCP), Internet Engineering Task Force RFC 1377 (November 1992).

15. Bellcore TR-TSV-000772, Generic System Requirements in Support of Switched Multi-Megabit Data Service, Bell Communications Research, Inc. (May 1991).

16. CCITT Draft Recommendation I.121: Broadband Aspects of ISDN (Geneva: International Telecommunications Union, 1991).

David C. Robinson
Lawrence N. Friedman
Scott A. Wattum

An Implementation of the OSI Upper Layers and Applications

Above the transport layer, the open systems interconnection (OSI) basic reference model describes several application standards supported by a common upper layer protocol stack.
Digital's high-performance implementation of the upper layers of the protocol stack concentrates on maximizing data throughput while minimizing connection establishment delay. An additional benefit derived from the implementation is that, for normal data exchanges, the delivery delay is also minimized. The implementation features of Digital's two OSI applications, file transfer, access, and management (FTAM) and virtual terminal (VT), include the use of common code to facilitate portability and efficient buffer management to improve performance.

The open systems interconnection (OSI) basic reference model defined in the International Organization for Standardization standard ISO 7498-1 specifies a layered protocol model consisting of seven layers.[1] By convention, the first four layers (physical, data link, network, and transport) are referred to as the lower layers.[2] These layers provide a basic communication service by reliably transferring unstructured user data through one or more networks. The remaining layers (session, presentation, and application) build on the lower layers to provide services that structure data exchanges and maintain information in data exchanges to support distributed applications. These three layers are known collectively as the upper layers.

This paper first gives an overview of the OSI upper layers and of two application standards: file transfer, access, and management (FTAM) and virtual terminal (VT). The discussion that follows concentrates on the features of Digital's implementation of the upper layers and the two applications, with emphasis on novel implementation approaches.

Summary of OSI Upper Layer Standards

The application-independent parts of the OSI upper layers are defined in the following standards:

• ISO 8326 and ISO 8327 - Session Connection Oriented Service and Protocol
• ISO 8822 and ISO 8823 - Presentation Connection Oriented Service and Protocol
• ISO 8824 - Abstract Syntax Notation One (ASN.1)
• ISO 8825 - Basic Encoding Rules (BER)
• ISO 8649 and ISO 8650 - Association Control Service Element (ACSE)

This section gives an overview of the services defined in these standards. The later sections File Transfer, Access, and Management Implementation and Virtual Terminal Implementation discuss two application-specific standards.

Session Layer

The transport layer service facilitates the exchange of unstructured bytes (i.e., octets) of data. However, exchanges between components of a distributed application are often structured. The function of the session layer is to standardize some of the common exchanges by supplying services that add structure to the transport layer exchanges.

The session-connection-oriented service has the three phases typical of all connection-oriented services: connection establishment, data transfer, and connection release. All structuring of the data exchanges occurs in the data transfer phase and is accomplished by using either tokens or synchronization. Hence, the connection establishment and release phases are not discussed further in this paper.

Tokens are used to control which peer session user of a session connection is permitted to invoke a particular service or group of services. The session layer also provides services to exchange tokens between peer session users. There are four types of tokens.

1. Data, for controlling half-duplex data exchanges
2. Release, for controlling which session user can initiate the release of a session connection
3. Synchronize-minor, for controlling the issuing of the minor synchronization service
4. Major/Activity, for controlling the issuing of major synchronization and activity services

For example, when the data token has been negotiated on a session connection, session data can be sent only by the end that currently has the token. Exchanging the data token between the session users provides a half-duplex data service.

The data transfer phase provides synchronization by allowing session users to insert major and minor synchronization points into the data being transmitted. Optionally, each direction of flow can have its own set of synchronization points. Figure 1 illustrates a data exchange structured as a single dialog unit. A dialog unit begins at a major synchronization point and terminates either at a new major synchronization point or by the release of the session connection. Further structure is possible within the dialog unit by inserting minor synchronization points.

Figure 1  Data Exchange Structured as a Dialog Unit (a dialog unit bounded by two major synchronization points, with three minor synchronization points between them)

The session synchronization services allow applications to insert synchronization points into their data exchanges. These points are application specific. The session service also provides a resynchronization service to allow a session user to request its peer to resynchronize to an earlier synchronization point, for example, to a previous point in a file transfer.

Activities provide an additional structuring service. An activity represents a logical piece of work. At any moment in time, there is at most one activity per session connection. However, several activities can exist during the lifetime of a session connection, and an activity can span session connections. The synchronization services can be used with activities services.

Presentation Layer

Different computer architectures and compilers use different internal representations (i.e., concrete syntax) for data values. Therefore, conversion between representations is necessary when communicating between dissimilar architectures. The intent of the presentation layer is to allow communicating peers to negotiate the data representation to be used on a presentation connection.

The presentation standards, ISO 8822 and ISO 8823, distinguish between abstract syntax and transfer syntax. Abstract syntax is the definition of a data type independent of its representation. Typically, data types are defined using the ASN.1 standard, ISO 8824, which was developed for this purpose. ASN.1 has a number of primitive data types, including INTEGER, REAL, and BOOLEAN, as well as a collection of constructed data types, including SET and SEQUENCE OF. These primitive and constructed data types can be used to define the abstract syntax of complex data types such as application protocol data units.

A transfer syntax is the external communication representation of an abstract syntax. Values from the abstract syntax are encoded according to the rules defined in the transfer syntax. A common way to define a transfer syntax is in terms of encoding rules. For example, these rules may indicate how an INTEGER value is represented or how to encode a SEQUENCE OF data type. A widely used transfer syntax is the basic encoding rules specification, ISO 8825.

An abstract syntax can be encoded using different transfer syntaxes, of which there are many. The role of the presentation layer is to negotiate the set of abstract syntaxes to be used on a particular presentation connection and to select a compatible transfer syntax for each of these abstract syntaxes. This process ensures that both peers agree on the data representation to be used in data exchanges.
Application Layer

The application layer supports distributed interactive processing, that is, the communication aspects of distributed applications such as FTAM (defined by ISO 8571), directory service (defined by ISO 9594), and VT (defined by ISO 9040 and ISO 9041). Unlike for the session and presentation layers, numerous application layer protocols and services exist, at least as many as there are distributed applications.

The application layer structure specified in ISO 9545 defines a model for combining these protocols in the same system. The functions for a particular application are grouped together to form an application service element (ASE). FTAM, VT, and directory service are examples of ASEs and are the basic building blocks of the application layer. One or more ASEs are combined to form an application entity (AE). An AE represents a set of communication resources and can be thought of as a program on a disk. An invocation of an AE (i.e., execution of the program) can contain one or more instances of an ASE with one or more application associations, i.e., application layer connections. The AE specification also defines the rules for interaction between ASEs operating over the same association as well as interactions between associations.

An ASE required by all applications is called the association control service element (ACSE). The ACSE, defined by ISO 8649 and ISO 8650, is the service and protocol required to establish an application association. Therefore, an AE always contains at least the ACSE.

An application association is mapped onto a presentation connection; no other application association can share this presentation connection. In this way, applications gain access to the presentation and session data phase services.

New OSI Upper Layer Implementation

Digital's implementation of the OSI upper layers, namely OSAK, includes session, presentation, and ACSE services. Users of OSAK can thus establish application associations and use session and presentation services during the data transfer phase.

Aims

In 1988, when Digital decided to produce a new version of OSAK, three aims were considered paramount: high performance, maintainability, and portability.

Performance

High performance of the OSI upper layers is essential to producing competitive OSI products. Because all OSI applications use these upper layers, the performance of OSAK affects these applications. Therefore, OSAK aims to maximize data throughput and to minimize connection establishment delays. This improved performance is achieved by maximizing the use of the communication pipe and minimizing the local processing requirements. The process involves

1. Amalgamating upper layer state tables. The services provided by the presentation and session layers are similar. Also, connection establishment and release in the ACSE is basically the same as in the other two upper layers. Therefore, the three state tables can be combined into a single state table, thus improving performance by reducing the overhead. This amalgamation eliminates the need to manage links between state tables, requires all predicates to be tested in only one place, and generates only one state transition or action per inbound event.

2. Treating the presentation service P-DATA as a special case. The presentation service P-DATA is the most frequently used service, and hence, its performance has the greatest impact on data throughput. By fast-laning the processing of the P-DATA service, the normal overheads associated with the combined state table processing are avoided.

3. Good buffer management. The new application programming interface (API) to OSAK enables efficient use of buffers. We eliminated all copying of user data within OSAK by taking advantage of user buffers. On an outbound service, an OSAK user is requested to leave space at the start of the user data. If there is sufficient space, we add the OSI upper layer protocol control information (PCI) to the user buffer. This buffer is then sent to the transport provider. Otherwise, we allocate an OSAK-specific buffer using a user-supplied memory allocation routine.

   Before receiving an inbound service, the user must pass at least one user buffer to OSAK. This buffer is used to receive the inbound transport event (both user data and upper layer PCI). The upper layer PCI is decoded before the user buffers are returned. In addition to being extremely efficient, this approach has the advantage of allowing OSAK users to exert inbound flow control; if OSAK is not given any buffers, no transport events will be received. Also, this buffering scheme simplifies resource management in OSAK. As OSAK does not have any of its own resources, they all come from OSAK users. One OSAK user cannot interfere with the operation of another OSAK user by consuming all OSAK resources.

4. Parsing only the upper layer headers. The presentation layer standards model the mapping between concrete (internal) and transfer (external) representation of data values. In particular, the presentation state tables contain predicates to verify that all user data is from a current presentation context. Since the best place for encoding and decoding is in the application itself, OSAK does not implement these predicates. Rather, OSAK assumes that its users have correctly encoded their own protocol and will detect any problems when decoding.

5. Trading memory for performance. All encoding and decoding of upper layer PCI is done with in-line code. More compact coding is possible using subroutines but at the cost of performance.

6. Minimizing parameter checking. Most parameters are pointers to user buffers. To check the validity of all pointers is time-consuming and, consequently, costly. Therefore, OSAK assumes that the pointers do indeed point to the user's memory.

Maintainability

The code for the new version of OSAK is easier to maintain than the previous code. As stated earlier in this section, a major step in improving the maintainability was the use of amalgamated state tables. A single state table eliminates links between tables, reduces the amount of maintenance required, and thus simplifies the code. In addition, using a single table makes it easier to serialize events. With multiple state tables, an inbound transport event can trigger a conflicting state change in the session state table at the same time a user request is changing the presentation state table. Using a single state table for a particular connection ensures that only one event (i.e., either a user or a transport event) is active in the state table at any given time.

The state tables are written in M4 macroprocessor notation. Thus, the OSAK state table definition is similar to an OSI protocol specification; this improves readability. Macros are also used extensively to handle common buffer manipulation and the encode and decode functions. Although macros are preferred over subroutines to improve performance, macros can be converted, at the expense of slower performance, should a more compact version of OSAK be required.

Portability

The new version of OSAK is designed to facilitate portability both of applications that use the OSAK API and of OSAK itself. The new OSAK API is designed to be common across all platforms and thus assists porting applications between platforms. The only major difference between the versions for the ULTRIX and the OpenVMS operating systems is the way events are signaled. The ULTRIX implementation supports both a polling model and an event-driven or blocking model. With the polling model, the OSAK user repeatedly calls OSAK routines to test for completion of an event; the routines used are osak_collect_pb() or osak_get_event(). In the blocking model, the OSAK user blocks awaiting the event, with the osak_select() routine. These three routines are available to OpenVMS applications. In addition, the OpenVMS implementation supports event notification by asynchronous system traps (ASTs). Also, the OSAK API is similar to XAP, the X/Open API to the OSI upper layers.

To support OSAK on multiple platforms, as far as possible, OSAK code is common to all platforms. The main differences are the interface to the transport layer and the OpenVMS support for ASTs. Over 90 percent of the code is common to the ULTRIX and the OpenVMS versions.
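The outbound half of the buffer scheme in item 3 of the Performance list can be sketched in a few lines. The Python below is a simplified model under stated assumptions, not OSAK code: the class name `OutboundBuffer`, its methods, and the bracketed header strings are all hypothetical. It shows how reserving space ahead of the user data lets each layer add its header in place, so the user data is never copied.

```python
class OutboundBuffer:
    """User-owned buffer with reserved space ahead of the user data."""

    def __init__(self, headroom: int, user_data: bytes):
        # The user fills the buffer once, leaving `headroom` octets free at the start.
        self._storage = bytearray(headroom) + bytearray(user_data)
        self._start = headroom  # index of the first octet to transmit

    def prepend(self, header: bytes) -> bool:
        """Write a header into the headroom; report False if it does not fit."""
        if len(header) > self._start:
            return False  # caller must fall back to a separately allocated buffer
        self._start -= len(header)
        self._storage[self._start:self._start + len(header)] = header
        return True

    def wire_view(self) -> memoryview:
        """Return the octets to hand to the transport, without copying."""
        return memoryview(self._storage)[self._start:]


if __name__ == "__main__":
    buf = OutboundBuffer(headroom=8, user_data=b"user-data")
    # Headers are added innermost first, so the outermost ends up in front.
    for header in (b"[P]", b"[S]"):
        assert buf.prepend(header)
    print(bytes(buf.wire_view()))
```

When `prepend` returns False, the sketch mirrors the fallback the text describes: a separate buffer would have to be obtained through the user-supplied allocation routine.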
Performance Measurements

Two performance metrics, throughput and connection establishment delay, were measured between two DECstation 3100 workstations connected by a lightly loaded Ethernet communications network. The DECstation machines were running ULTRIX V4.2 with DECnet-ULTRIX V5.1. OSAK accessed OSI transport through the X/Open transport interface (XTI) in nonblocking mode.

For throughput measurements, two programs were used: an initiator and a responder. The initiator

1. Establishes an association.
2. Reads the system time.
3. Transmits 2,000 buffers of data as quickly as possible. These user buffers contain sufficient space for the upper layer headers. When a send request fails due to flow control, the sender waits using the ULTRIX system call select(2) until the flow control is removed. The sender then collects the user buffers with the osak_collect_pb() routine before continuing with the send loop.
4. Reads the system time and calculates the time required to transmit the 2,000 buffers.
5. Releases the association.

The responder

1. Accepts an association request
2. Loops, waiting for a transport event using the ULTRIX system call select(2), and then collects the data using the osak_get_event() routine until all 2,000 buffers have been received
3. Responds to the request to release the association

Table 1 records the throughput measurements for various buffer sizes ranging from 10 to 16,000 (16K) octets per buffer.

The data presented in Table 1 indicates that for small buffers, the throughput is poor. This low performance is due to the system overhead associated with processing a send request, independent of the amount of data to be transmitted. However, the throughput rapidly improves until the buffer size reaches 4K octets. From this size on, the throughput measurement is almost flat at between 507K and 528K octets per second. The variation is due to fragmentation in the lower layers. The number of send requests flow controlled represents the number of times a send request was delayed because of flow control by the transport service in the course of transmitting the 2,000 buffers.

We profiled the initiator and the responder. For buffers ranging in size from 10 to 16K octets, the initiator spent more than 90 percent of the time in transport. For the responder, the percent of time spent in transport varied between 60 percent for 10-octet buffers and 92 percent for 8K-octet buffers. The remaining time was spent primarily in select(2), waiting for and processing the next inbound event. Also, for the small buffers, a significant amount of time is consumed by initializing the user parameter block before returning it to the user.

We also used the throughput program to measure the connection establishment time. The program read the system time before and after the association establishment phase; the average connection establishment time was 0.08 seconds.

In addition, tests on the new OpenVMS implementation indicate that throughput improved two to threefold as compared to the OSAK code in the previously existing OpenVMS implementations.

Both the throughput and profile data indicate that the transport performance dominates the performance of OSAK. Therefore, OSAK has met its design goal of reducing the overhead of the OSI upper layers to a very low level. Meeting this goal was necessary because poor OSAK performance would impact all OSI applications supported by OSAK. While further reductions in overhead are possible, such savings would be at the expense of OSI upper layer functionality.
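The initiator's timing procedure can be summarized as a short loop. The Python sketch below is a hedged illustration, not the measurement program itself: `send` and `wait_until_writable` are hypothetical callables standing in for the OSAK send request and the select(2) wait, and the function name `run_initiator` is invented here. It returns both the throughput and the count of send requests that were delayed by flow control.

```python
import time
from typing import Callable, Tuple


def run_initiator(
    send: Callable[[bytes], bool],
    wait_until_writable: Callable[[], None],
    buffer_size: int,
    count: int = 2000,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, int]:
    """Send `count` buffers; return (octets per second, delayed send requests)."""
    payload = bytes(buffer_size)
    delayed_requests = 0
    start = clock()
    for _ in range(count):
        delayed = False
        while not send(payload):      # False means the transport is flow controlled
            delayed = True
            wait_until_writable()     # stand-in for blocking in select(2)
        if delayed:
            delayed_requests += 1
    elapsed = clock() - start
    return count * buffer_size / elapsed, delayed_requests
```

Counting a request once, however many times it has to wait, follows the definition given in the text: the number of times a send request was delayed because of flow control.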
Table 1  Throughput Measurements for Digital's OSI Upper Layer Implementation

Buffer Size    Throughput        Number of Send Requests
(Octets)       (Kilooctets/s)    Flow Controlled
     10            6.60                2
    100           56.80                4
    512          216.00               35
  1,024          266.60              794
  2,048          372.60              862
  4,096          453.70            1,151
  6,000          507.00            1,217
  8,124          528.80              596
  8,125          507.10              651
 10,000          527.20              751
 13,000          522.20            1,101
 16,000          505.27            1,279

File Transfer, Access, and Management Implementation

This section presents a summary of the ISO FTAM standard and details of Digital's implementation of this standard.

Summary of the ISO FTAM Standard

ISO 8571 File Transfer, Access, and Management (FTAM) is a five-part standard consisting of a general introduction, a definition of the virtual file store, the file service, the file protocol definitions, and the protocol implementation conformance statement proforma. The FTAM standard defines an ASE for transferring files and defines a framework for file access and file management.

Initiator and Responder

FTAM service and protocol actions are based on a client-server model. In the FTAM standard, the client is referred to as the initiator, and the server is referred to as the responder.

The initiator is responsible for starting file service activity and controls the protocol actions that take place during the dialog (or FTAM association) between two FTAM applications. For example, the initiator has to request that an FTAM association be established, that a file be opened on a remote system, and that a file be read from a remote system.

The responder passively reacts to the requests of the peer initiator. The responder is responsible for managing the virtual file store and mapping any virtual file attributes into local file attributes.

Virtual File Store

Many architectures and implementations of file systems exist, and storing and accessing data can differ from one system to another. Therefore, a mechanism is needed to describe files and their attributes independent of any particular architecture or implementation. The mechanism used in the FTAM is called the virtual file store.

The FTAM virtual file store model consists of file attributes, activity attributes, file access structure, and document types. File attributes describe the properties of the file, which include the size and the date of creation. FTAM file attributes also define the types of actions that can be performed on a file. Read access or create access are examples of file actions.

Activity attributes are properties of the file, which are in effect for only the duration of the FTAM association. Examples of activity attributes are current access request, current initiator identity, and current concurrency control. Current access request conveys the access control applied to the file, e.g., read or write access. Current initiator identity conveys the name of the initiator accessing the virtual file store. Current concurrency control conveys the status of the locks applied by the initiator.

The FTAM file access structure is hierarchical and produces an ordered tree that consists of one or more nodes. This file access structure is defined in ASN.1 and can be used to convey the structure of a wide variety of files.

In the FTAM virtual file store model, document types specify the semantics of a file's contents. The FTAM standard defines four document types.

• FTAM-1, unstructured text files
• FTAM-2, sequential text files
• FTAM-3, unstructured binary files
• FTAM-4, sequential binary files

The virtual file store model provides a framework for defining many different file types, including those not supported by the standardized document types. The U.S. National Institute of Standards and Technology (NIST) has used the virtual file store model to define document types to support various file types, such as indexed files.

FTAM File Service

The FTAM file service is a functional base for remote file operations. Functionality defined by the FTAM file service is broken down into subsets of related services. The subsets of functionality are called functional units. Functional units are used by the FTAM protocol to convey a user's requirements. For example, the standard defines the read functional unit, which allows an implementation to read whole files, and the file access unit, which allows an implementation to access records in the file.

In addition, the FTAM standard defines the following classes of file service: transfer, management, transfer and management, access, and unconstrained. Each service class is composed of a set of functional units. For example, an FTAM implementation that supports the transfer service class will be able to either read or write files.

New FTAM Standard Work

Modifications to the FTAM standard are in progress in the ISO. The most important modification is the file store management addendum, which specifies how wild cards, directories, and references (links) to files are to be handled in an OSI environment. The addendum also specifies how to manipulate groups of files. In the current version of the standard, only one file can be selected at a time.

Digital's FTAM Implementation

Digital's FTAM products, available for the OpenVMS and ULTRIX operating systems, support FTAM applications in both the role of initiator and the role of responder. The initiator applications allow users to copy, delete, rename, list, and append files. In the OpenVMS version, the initiator applications are integrated into the Digital Command Language (DCL) so that the user can continue to use the COPY, DELETE, DIRECTORY, and RENAME commands. Where the FTAM service and protocol is used to support these commands, the additional qualifier /APPLICATION=FTAM is required. In the ULTRIX version, the same functionality is provided using the set of commands ocp, orm, ols, ocat, and omv. These commands have the same semantics as the corresponding ULTRIX commands cp, rm, ls, cat, and mv, respectively, and are similar to the set of DECnet file transfer utilities of dcp, drm, dls, and dcat. (Note that the set does not include dmv.)

The responder applications allow users to create, read, write, delete, and rename files. File access, i.e., the location of specific records in a file, is also supported by the responder applications. The OpenVMS responder application supports file locking and recoverable file transfer.
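The responder's job of relating local file attributes to the virtual file store, described in the summary above, might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `VirtualFileAttributes` type, its field names, and the two action labels are hypothetical and are not the attribute set defined by ISO 8571 or used in Digital's products.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class VirtualFileAttributes:
    filename: str
    size_octets: int
    last_modified: datetime
    permitted_actions: FrozenSet[str]


def to_virtual_attributes(path: str) -> VirtualFileAttributes:
    """Describe a local file in system-independent terms."""
    info = os.stat(path)
    actions = set()
    if os.access(path, os.R_OK):
        actions.add("read")
    if os.access(path, os.W_OK):
        actions.add("write")
    return VirtualFileAttributes(
        filename=os.path.basename(path),
        size_octets=info.st_size,
        # st_mtime stands in for a date attribute; creation time is not portable.
        last_modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        permitted_actions=frozenset(actions),
    )
```

The point of the sketch is the direction of the mapping: the peer only ever sees the system-independent description, so the same initiator can work against responders on quite different file systems.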
character as an indication of the intent to delete, whereas other systems may expect the user to enter a different character. VT resolves these differences by translating the local action into a virtual action. The action in our example becomes the virtual actions of decrementing the current cursor position and erasing the character at the current location. A cooperating implementation would then translate these virtual actions into an appropriate local action.

The VT protocol is very powerful in the respect that the protocol definition provides many options and features that allow the support of complex terminal models. During association establishment, cooperating implementations agree on the subset of the protocol and the terminal model to be used. The protocol subset and terminal model are referred to as the profile. In addition, VT provides two modes of operation: asynchronous (A-mode), which may be thought of as full-duplex operation, and synchronous (S-mode), which may be thought of as half-duplex operation.

The ISO base standards define two basic profiles, one for each mode. Additional profiles have also been defined (or are being prepared) by the regional OSI workshops. Currently, the OpenVMS and ULTRIX implementations of the VT protocol both support the following profiles:

1. TELNET-1988, which mimics the basic functionality found in the transmission control protocol/internet protocol teletype network (TCP/IP TELNET) environment
2. Transparent, which allows the sending and receiving of uninterpreted data
3. A-mode-default, which provides basic A-mode functionality

Digital's VT Implementation

Digital's VT implementation provides both initiator and responder capabilities. In addition to describing the features of the implementation, this section compares the VT protocol with other network terminal protocols.

Initiator and Responder

The VT implementation for both the ULTRIX and the OpenVMS systems provides the capability to act as either an initiator (a terminal implementation) or a responder (a host implementation). The initiator is responsible for establishing an association with the responder based on information provided by the user, such as the desired profile. The responder is responsible for accepting the peer association request and for creating an interactive context for the remote peer user.

On the OpenVMS system, the VT protocol initiator is invoked by the DCL command SET HOST/VTP; on the ULTRIX system, the VT protocol initiator is invoked using the ologin command.

Implementation Features

The VT implementation uses the OSAK interface outlined earlier in the paper. The goals of the VT implementation were to provide highly portable, very efficient, and easily extensible code.

To achieve the goal of portability, the implementation was divided into two major components: the interface to the OSI environment and the non-OSI interfaces (e.g., to terminals). The OSI component is completely portable to multiple platforms. The non-OSI component is platform specific and must be rewritten for each unique platform. The interface between these components consists of six basic functions, which must be supported on all platforms.

• Attach/detach, to attach and detach the non-OSI environment
• Open/close, to open or close a specific connection into the non-OSI environment
• Read/write, to read or write data between the OSI and the non-OSI environments

Because each function is simple and clearly defined, the amount of platform-specific code required for implementation is minimal. For example, the read function on the ULTRIX implementation is only 10 lines of code. The implementation is therefore highly extensible to different platforms.

Performance of the VT protocol implementation is enhanced by using preallocated buffer pools. This approach to buffer management eliminates the overhead of dynamically allocating buffers.

Our VT protocol implementation not only implements the ISO VT protocol but also provides a gateway to and from other terminal protocol environments. We provide gateways to TELNET and to the Local Area Transport (LAT) on both the OpenVMS and the ULTRIX versions. In addition, we have a VT/command terminal (VT/CTERM) gateway on the ULTRIX version.
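The split between a portable OSI component and a small platform-specific component can be pictured as an abstract interface with six operations. The Python sketch below is only an analogy under stated assumptions: the class and method names are hypothetical, the in-memory terminal is a toy, and the `exchange` callable stands in for the whole OSI side of the association.

```python
from abc import ABC, abstractmethod
from typing import Callable


class TerminalPlatform(ABC):
    """Platform-specific side: six basic functions (names are illustrative)."""

    @abstractmethod
    def attach(self) -> None: ...
    @abstractmethod
    def detach(self) -> None: ...
    @abstractmethod
    def open(self) -> int: ...
    @abstractmethod
    def close(self, connection: int) -> None: ...
    @abstractmethod
    def read(self, connection: int, max_octets: int) -> bytes: ...
    @abstractmethod
    def write(self, connection: int, data: bytes) -> None: ...


class InMemoryTerminal(TerminalPlatform):
    """Toy platform used only to exercise the interface."""

    def __init__(self, typed: bytes):
        self._pending = bytearray(typed)
        self.displayed = bytearray()
        self.attached = False
        self.open_connections = set()

    def attach(self) -> None:
        self.attached = True

    def detach(self) -> None:
        self.attached = False

    def open(self) -> int:
        connection = len(self.open_connections) + 1
        self.open_connections.add(connection)
        return connection

    def close(self, connection: int) -> None:
        self.open_connections.discard(connection)

    def read(self, connection: int, max_octets: int) -> bytes:
        data = bytes(self._pending[:max_octets])
        del self._pending[:max_octets]
        return data

    def write(self, connection: int, data: bytes) -> None:
        self.displayed += data


def relay_session(platform: TerminalPlatform, exchange: Callable[[bytes], bytes]) -> None:
    """Portable side: uses only the six functions, never platform details."""
    platform.attach()
    connection = platform.open()
    try:
        while True:
            typed = platform.read(connection, 16)
            if not typed:
                break
            platform.write(connection, exchange(typed))
    finally:
        platform.close(connection)
        platform.detach()
```

Porting in this picture means writing one new subclass; `relay_session` and everything behind `exchange` stay untouched, which is the property the text attributes to the real design.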
We provide gateways to TELNET and to the Local Area Transport (LAT) on both the The VT implementation OpenVMS and the ULTRIX versions. Jn addition, we for both the U LTRIX and the OpenVMS systems pro have a VT;com mand terminal (VT/CTERJ\1) gateway vides the capability to act as either an initiator (a on the ULTRIX version. Initiator and Responder terminal implementation) or a responder (a host Comparison establishing an association with the responder Network Terminal Protocols based on information provided by the user, such as with network terminal protocols deal with echo 114 Vol. 5 No. I of the VT Protocol implementation). The initiator is responsible for Winter 1993 with Other Most comparisons Digital Technical journal An implementation of the OSi Upper Layers and Applications response time, that is, how long it takes for a char acter to echo to a d isplay after being typed a t the Acknowledgments The au thors wou ld l ike to thank their colleagues for keyboard. YT, l i ke TELNET and CTERJVI , c a n operate reviewing previous drafts of this paper. In particu in two different echo modes: rem ote, where the lar, we wou ld l i k e to thank Chris Gu nner and Nick echo is achieved by means of the remote host; and Emery, who were instrumenta l in revising the OSAK loca l , where the echo is accompl ished through the API, local host. A number of factors contribute to advanced development code into the product. and the OSAK team, who converted the response time in a remote echo situation. includ ing protocol overhead and l ine speed. TELNET has l it t l e protocol overhead; in fac t, for m o s t situatio ns, transferring normal data requires no additional overhead. VT protocol overhead is approxim ately 30 to 1 for a typical A-mode profi le, that is, 30 octets are required to carry 1 octet of user data. VT over head may seem excessive when compared with TELNET. 
However, the VT protocol provides many add i t ional capabi lities that TELNET does not, such as the abi l ity to accurately model d ifferent terminal environments. Additional ly, the 30 octets of over head does not increase significantly when larger amounts of user data are transferred . The largest gains for the VT are in the area of S-mode profiles. S-mode profiles enable most char acter echoing to be done locally By using an app ro priate S-mode profile, the VT implementation can provide sophisticated local terminal operations. References 1 . ]. Harper, "Overview of Digit a l 's Open Net working," Digital Technical jo urnal, vol. 5, no. 1 ( Winter 1993, this issue): 1 2 - 20. 2. L. Yet to et a l . , "The D ECnet/051 for OpenVMS Version 5.5 Implementation," Digital Technical jou rnal, vol . 5, no. 1 ( Winter 1993, this issue): 21- 33. 3. P Lawrence a nd C. Makemson, "G uide to ISO Virtual Te rminal Standards," Information Tech nology Standards Unit (UK), Department of Trade and Industry (March 1988). Genera/ References ancl then to transmit it a l l at once to the remote lnfonnation Processing Systems, Open Systerns Interconnection, Part 1 : Basic Reference Model host. The ability to process large amounts of termi reference no. ISO 7498-1 , 1984). Thus, it is possible to edit an entire screen of text nal input as batch jobs has many advantages, incl ud ( I nternational Orga nization fo r Standardization, ing reduced network bandwidth requirements, Information Technology, Open Systems Intercon reduced CPU requirements of the remote host nection: Connection Oriented Session Service (since the remote host is no longer involved in char Definition ( I n ternational Organization for Stan acter echo), and i ncreased user satisfaction (since dardization, reference no. ISO 8326, 1987). users experience no network delays fo r character echo). 
Information Technology, Open .�ystems intercon nection: Connection Oriented Session Protocol Definition ( International Organization for Stan Summary dardization, reference no. 150 8327, 1987). G oals common to the OSAK, FTAM, ami YT protocol projects included good performance and portabil ity of implementation. Perform ance is espec i a l l y important for OSAK, because i t supports a l l othe r OS[ applications. Maxim izing the use of common code and reducing system dependencies i n the three projects significantly reduced the engineer ing effort to port an implemen tation from one p lat form to another. This savings in human resources is Information Processing -�)'Stems, Open .�ystems Interconnection, File Transje1; Access, and Man agement: Part 1 , General Introduction; Part 2, Virtual File Store; Part 3, File Service Definition; Part 4, File Protocol .'ijJecification; and Part 5, Protocol Implementation Conformance State ment Proforma ( I nternationa l Organization for Standardization, reference no. ISO 8571 , 1988). necessa ry, given the growing set of hardware and information Processing Systems, Open Systems operating platforms supported by Digital. Equally Interconnection: Seruice Definition for the Associ important is the integration of OS! applications with ation their non -OSI cou nterparts, fo r examp le, the ocp Organization for Standardization, reference no. ISO and ologin functions and the protocol gateways. 8649, 1988). Digital Techuicttljournal Vol. 5 No. J Winter 1993 Control Service Elem ent ( I n ternational 1 15 DEC net Open Networking information Processing Systerns, Open Systems Information Processing Systems, Open Systems Interconnection: Protocol Specification for the Interconnection: Specification of Basic .Encoding Control Service Element (Interna Rules fo r Abstract Syn tax Notation One (ASN. l) tional Organization for Standardization, reference ( Internationa l O rganization for Standardization, no. ISO 8650, 1988) reference no. 
ISO 8825, 1987). Association Information Processing Systems, Open 5j,stems Interconnection: Connection Oriented Presenta tion Service Definition (International Organization fo r Standardiza tion, reference no. ISO 8822, 1988). Information Processing Systems, Open 5)stems Interconnection: Connection Oriented Presenta tion Protocol Specification ( International Organi zation for Standardization, reference no. ISO 8823, 1988) Information Technology, Open .s:vstems Intercon nection: Virtual Terminal Basic Class Service ( I n ternational Organization for Standardization , reference no. ISO 9040, 1990). Information Technology, Open 5)stems Intercon nection: Virtual Terminal Basic Class Protocol (Intern a t ional O rganization for Standardization, reference no. ISO 904 1 , 1990) Information Processing Systems, Open Systems Information Processing Systems, Open Systems Interconnection: Spenfication of Abstract Syn tax Interconnection: Notation One (A SN. l) ( I nternational Organization for Standard ization , reference no. ISO 8824, 1987). 1 I6 Application Lc�yer Structure (Internation a l Organization for Standardization, reference no. ISO 9545, 1989). Vol 5 No. I lrlirller IYY. J Digital Technical journal Mark W. Sylor Francis Dolan David G. Shurtlef f Network Management DECnet/051 Phase V incorporates a new network management architecture based on Digital's Ente1prise Manage1nent Architecture (El11A). The ElltlA entity model was developed to manage all entities in a consistent manner, structuring any manage able component regardless of its internal comple.:'(ity. The DNA CMIP management protocol was developed in conjunction with the model to express the basic concepts in the entity model. Phase V network management is extensible; the Phase V management architecture transparently assimilates new deuices and technolo gies. Phase V was designed to be em open architecture. Management ofDECnet/051 Phase V components is effective in a multi vendor network. 
Network management has been an integral part of DECnet since 1976 when Phase II was developed.[1] Even at that early stage of the DECnet architecture, an effective management capability was recognized as an essential part of an organized approach to networking. Now in DECnet Phase V, the DECnet network management architecture has undergone a major revision based on Digital's Enterprise Management Architecture (EMA). This paper gives an overview of some of the key features and functions of EMA and of DECnet Phase V network management. See the "Overview of Digital's Open Networking" paper in this issue for an overview of the guiding principles, background, and architecture of DECnet Phase V.[2]

Our initial work on Phase V indicated that changes were needed in the network management architecture to support the broad range of networking functions planned for Phase V. First, network managers would have to be able to manage all the Phase V components in a consistent manner. A method was needed to build Phase V management components that would give the same general look and feel and the same modeling approach to all components.

Second, Phase V network management would have to be extensible. The Phase V network architecture was being designed to allow the use of multiple modules that would provide the same or similar services at each layer and to simultaneously support multiple-layer protocols in a network. Therefore, we designed the Phase V management architecture to transparently assimilate new devices and technologies. Our management architecture had to become as extensible as the network architecture.

Finally, since Phase V was designed to be an open architecture, management of Phase V components would have to be effective in a multivendor network. Our design had to ensure that the ability to provide effective management of network components was independent of the vendors supplying them.

The individual management mechanisms used in Phase IV could have been extended to accommodate all the changes planned for Phase V. However, we felt it was time to revisit the basic network management architecture to see if we could find a unified approach that would provide a superior solution.

Enterprise Management Architecture

We began our Phase V development project by examining in detail the requirements for a new network management architecture. Our goal was to design an open architecture that allowed for consistent management of an extensible array of network components in a multivendor environment. As we identified the specific requirements that would have to be addressed to meet this goal, we realized that we had the opportunity to develop an architecture that went beyond management of Phase V networks. We realized that we could provide an architecture for the management of both networks and systems. The architecture eventually became known as the Enterprise Management Architecture or EMA.

Early in the project, we recognized that the conceptual separation of manageable components from the software that manages them was a fundamental design principle. EMA therefore distinguished entities, the basic components of the network that had to be managed, from directors, the software systems and accompanying applications used by managers to manage the components, as shown in Figure 1.

Formally, an entity was further split into a service element, a managed object, and an agent.
The service element is the portion of the entity that performs the primary function of the entity, e.g., a data link layer protocol module whose primary purpose is communication with a peer protocol module on another machine. The managed object encapsulates the software that implements the functions supported by the entity for its own management. For example, it responds to management requests for the current values of state variables or to requests for the values of certain configuration variables to be set to new values. The agent is the software that provides the interface between the director and the managed object. The agent encodes and decodes protocol messages it exchanges with the director and passes requests to and receives responses from the managed object. Informally, we generally equate the managed object and the entity because the managed object defines what the manager can monitor and control in the entity.

Figure 1 The Basic Entity/Director Split

A director was modeled as a layered software system that provides a management-specific environment to management applications. A director was split into a framework, a management information repository (MIR), and separate configurable software modules called management modules. The director kernel provides common routines useful for the layered software modules, including services such as dispatch (location-transparent exchange of management requests and responses with entities), encoding/decoding, data access, data dictionary access, and event management.
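The entity-side split described above, with an agent that decodes requests and a managed object that carries them out, can be illustrated with a small sketch. Everything here is hypothetical: the class names, the `op`/`attribute`/`value` message fields, and the use of JSON as the wire encoding are assumptions made for illustration, and do not represent the DNA CMIP encoding or any Digital interface.

```python
import json


class ManagedObject:
    """Hypothetical managed object: the attributes an entity exposes for management."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def get(self, name):
        # Report the current value of a state or configuration variable.
        return self._attributes[name]

    def set(self, name, value):
        # Only attributes the entity already defines may be modified.
        if name not in self._attributes:
            raise KeyError(name)
        self._attributes[name] = value
        return value


class Agent:
    """Hypothetical agent: decodes requests, calls the managed object, encodes replies."""

    def __init__(self, managed_object):
        self._managed_object = managed_object

    def handle(self, encoded_request: bytes) -> bytes:
        # JSON stands in for the real protocol encoding (an assumption).
        request = json.loads(encoded_request.decode("utf-8"))
        try:
            if request["op"] == "get":
                value = self._managed_object.get(request["attribute"])
            elif request["op"] == "set":
                value = self._managed_object.set(request["attribute"], request["value"])
            else:
                raise ValueError("unsupported operation: " + str(request["op"]))
            response = {"status": "ok", "value": value}
        except (KeyError, ValueError) as exc:
            response = {"status": "error", "reason": str(exc)}
        return json.dumps(response).encode("utf-8")
```

In this sketch the director never touches the managed object directly; it only exchanges encoded messages with the agent, which mirrors the separation the text describes.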
Taken together, the director kernel and the agent provide a framework for managed objects and management applications to interact. The framework provides an application programming interface (API) to managed object and management module developers. The MIR contains data about particular entities as well as information about the structure and other properties of entity classes, which the director software also knows.

Management modules were distinguished as presentation, function, or access modules. Presentation modules implement user or software access to the director management modules that is device independent and style dependent. Function modules provide value-added management functions that are partially or completely entity independent, such as network fault diagnosis, event or alarm handling, or historical data recording. Access modules provide a consistent interface to the basic management functions performed by entities. In addition, they include one portion that maps operations on entities into the appropriate protocol primitives and another portion that implements the protocol engine for the relevant management protocol. Figure 2 shows the components of a director and an entity.

Figure 2 A Framework View of EMA

Although users can conveniently interact with systems through graphical user interfaces (GUIs), sophisticated users wished to preserve a command line interface (CLI) they could use to specify complex management requests quickly. Therefore, we developed a single, extensible command language that would allow human operators or software programs to communicate requests to management modules and (ultimately) entities in a consistent fashion.
This work developed into the network control language (NCL). An NCL command specifies an entity, an operation to be performed by the entity, a list of arguments (if any), and a list of qualifiers (for specifying users, passwords, paths, filtering values, etc.).

Digital's DECmcc Management Director is an implementation of an EMA director. The DECmcc product provides a platform for the development of new management capabilities and offers specific Phase V management capabilities as well as a number of generic network management tools. The DECmcc director supports both GUI and NCL CLI user interfaces.

Entity Model

To manage all entities in a consistent manner, we required a single, consistent method for structuring any manageable component (regardless of its internal complexity) and for describing its management properties: the operations that it can perform, the variables it makes available for its management, the critical occurrences it can report to managers, etc. The EMA entity model was developed to answer these needs. The structure of a manageable component in this model is shown in Figure 3.

Essentially, the entity model defines techniques for specifying an object-oriented view of an entity. Each entity has the following properties:

• A position within an entity hierarchy. To ease management of networks with large numbers of complex components, entity classes are organized into logical structures that reflect the relationship of their corresponding components; individual entities are named in terms of that structure. The name of the top-level entity in each structure is globally unique, and it is referred to as a global entity. All its child entities, however, have names that are unique only within the context of their level in the structure. Therefore, they are referred to as local entities.

• A hierarchically structured name. An individual entity's local name is constructed by concatenating its class name to its instance identifier. The class name is a keyword that uniquely identifies the class (object type) of an entity. The instance identifier is the value of an identifying attribute used for naming instances of the entity's class, for which each instance of the class has a unique value. A target entity's globally unique name is constructed by concatenating its local name (a class name and instance identifier pair) to the local names of each of its ancestors in turn, beginning with the containing global entity and ending with the target entity's immediate parent. The construction of an entity's name and the containment hierarchy are shown in Figure 4.

• A collection of internal state variables, called attributes, that can be read and/or modified as a result of management operations. Attributes have names unique within the context of the entity. Attributes have a type that defines the values the attribute can have.

• A collection of operations that can be performed by the entity. Operations allow managers to read attributes, modify attributes, and perform actions supported by the entity. Actions are entity-specific operations that result in changes of state in the entity or cause the entity to perform an operation that has a defined effect.

• A collection of events that can be reported asynchronously by the entity. An event is some normal or abnormal condition within an entity, usually the result of a state transition observed by its service element or its agent. Event reports are sent asynchronously to the manager; they indicate the type of (entity-specific) event that occurred and may also contain arguments that further describe or qualify the event. For example, arguments could indicate the number of times the event occurred before a report was sent to announce that a threshold was reached, or give the old and new states in an event that reports a state transition.

• A specification of the behavior of the entity in relationship to the functions that the entity's service element provides. This is usually specified as some abstract state machine, through pseudocode, or as a set of preconditions, postconditions, and invariants.

Figure 3 Structure of a Managed Object

Figure 4 Managed Object Naming Hierarchy (the global entity NODE DEC:.UK.REO.MARVIN, with class NODE, name DEC:.UK.REO.MARVIN, and state ON; its child entities ROUTING and NODE DEC:.UK.REO.MARVIN OSI TRANSPORT, the latter with class OSI TRANSPORT and maximum window 32; and the entity NODE DEC:.UK.REO.MARVIN OSI TRANSPORT PORT %X0175A8D9, with class PORT, name %X0175A8D9, and protocol class 4)

The entity model provides specific requirements and recommendations about the way entities can be modeled in terms of these properties. These restrictions, placed on entity class definitions for purposes of both internal and global consistency, take several forms: (1) restrictions on the types and ranges of attributes that can be used for various purposes (e.g., as identifying or counter attributes); (2) constraints on operations (e.g., examine operations can have no side effects on the value of attributes whose values they report); or (3) restrictions on events (e.g., all events and event reports must have an associated time stamp and unique identifier).

Readers familiar with open systems interconnection (OSI) management will find the entity model very similar to OSI's structure of management information (SMI) standard. This is no coincidence.
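Returning to the naming rule given in the list of entity properties, the construction of a globally unique name can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, local names are joined with single spaces, and a class such as OSI TRANSPORT is assumed to carry no instance identifier, which is how the names in Figure 4 appear.

```python
from typing import List, Optional, Tuple

# A local name is a (class name, instance identifier) pair; the identifier is
# None for a class that is shown without one (an assumption drawn from Figure 4).
LocalName = Tuple[str, Optional[str]]


def local_name(class_name: str, instance_id: Optional[str] = None) -> str:
    """Concatenate the class name and the instance identifier."""
    if instance_id is None:
        return class_name
    return class_name + " " + instance_id


def global_name(path: List[LocalName]) -> str:
    """Concatenate local names from the global entity down to the target entity."""
    if not path:
        raise ValueError("a name needs at least the global entity")
    return " ".join(local_name(cls, ident) for cls, ident in path)


# The port entity from Figure 4: global entity first, target entity last.
port_path = [
    ("NODE", "DEC:.UK.REO.MARVIN"),
    ("OSI TRANSPORT", None),
    ("PORT", "%X0175A8D9"),
]
print(global_name(port_path))
# prints: NODE DEC:.UK.REO.MARVIN OSI TRANSPORT PORT %X0175A8D9
```

Only the first element of the path needs to be globally unique; every later element is unique only among its siblings, which is what makes the child entities local entities.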
During the early development of Phase V and the entity model, we recognized the need for an open management architecture. Portions of the technology were therefore contributed to ISO/IEC JTC 1 SC21/WG4, a working group of the International Organization for Standardization (ISO) that is responsible for efforts to define standards for OSI management. Although some details of OSI SMI and the corresponding EMA features diverged slightly from each other during their evolution, the EMA entity model and OSI SMI are still compatible. At this writing, work is under way to align certain parts of the EMA entity model with the final international standard (IS) versions of OSI SMI.

Entities

The EMA entity model describes how to specify the management of an architected subsystem. However, for Phase V, we chose to make the management specification of a subsystem a part of the subsystem's specification. As described in the Modules section, that may have been the most important decision made in the network management architecture.

As the entities for DECnet/OSI Phase V were defined, a collection of folklore grew on how typical design issues could or should be solved. As with any folklore, these guidelines were passed from one architect to another, either verbally, or as selected portions of the management specifications were copied from one subsystem to another. This folklore is continually changing, as new and better solutions are found. Much of the folklore has already been described.[6] Some other guidelines are described below.

The Network Management Specification describes the central structure of Phase V network management, and in particular defines the node entity class. In the following sections, we describe the properties of the node entity class and, as a representative example, the OSI transport module entity class.

Node Entity Class

A computer system in the DECnet/OSI network is called a node. The bounds of that system depend on the system's architecture; a personal computer (PC), a single-processor workstation, a multiprocessor mainframe, a diskless system, even a VAXcluster system can be considered a single node. Nodes are modeled by the node entity class. A node entity has only a few functions in management.

• A node is a global entity that is the parent for many subsystems and provides an agent for all of them.

• A node has an identity, a name, and an address that allow it to be managed remotely.

• A node plays a major role in system initialization and start-up.

Identity

The following attributes identify a node:

• An address, the application layer a [...] provides all the information needed by a director to connect to the node's agent and to issue management directives to the node or any of its children.

Users and network managers rarely refer to nodes by their addresses. First, it is difficult to remember the addresses and second, moving the node from one place to another in the network generally changes its address. Thus each node has a single name, a DECdns full name. The node knows its name and address. Each node's name is stored as a DECdns entry, and one of the entry's DECdns attributes holds the node's address. Thus, any director can look up the node's name in the DECdns and the address associated with it, and then use any one of the towers to connect to the node's agent.

To ensure backward compatibility with DECnet Phase IV, a node also has an attribute called its synonym, which is a six-character, Phase IV-style node name. If a node has a synonym name, that name is entered in a special directory in the DECdns name space as a soft link to the node's Phase V name. A soft link is a form of alias or indirect pointer, from one name to another, that allows an entry to be reached by more than one name.

Each network layer address of the node (a node can have more than one) is encoded in a standard way as a soft link to the node's name. This allows a manager (or director) to translate a node address into the equivalent node name, making many diag [...]

OSI Transport Module

In DECnet/OSI Phase V, the OSI transport module contains port, template, local network service access point (NSAP) address, and manufacturing automation protocol (MAP) entities. A local NSAP entity contains remote NSAP entities. The containment hierarchy is shown in Figure 6.

Figure 6 Containment Hierarchy for OSI Transport Module

The OSI transport module has characteristic attributes. A manager can change the configuration of the module by modifying its characteristic attributes. This is done for several reasons, including

• To limit the maximum permissible number of active transport connections at any one time

• To control the maximum credit window that may be granted on an individual transport connection

• To control the maximum number of transport connections that can be multiplexed on any single network connection, when the OSI transport protocol is operating over the connection-mode network service

Modification of these attributes is needed only if the manager requires anything other than a standard configuration; working default values are defined for all characteristic attributes.

Status attributes show the current operating state of the module, e.g., the number of transport connections currently active. Status attributes cannot be modified directly by a manager. To start the [...]

In the DECnet/OSI Phase V architecture, a port entity represents the interface between layers, making visible to a manager how one layer (a client) is using the services of a lower layer. Ports are not created by a manager; they are created when a client of the service requests use of the service (by "opening a port"). The exact information held in a port entity varies for each subsystem. In general, a port entity contains attributes that identify the client and the service being used, and how that service is being used (e.g., as usage counters). The port entity is an example of how the EMA evolved through feedback from the subsystem architects. Before being adopted as a general mechanism in the overall management architecture, the concept was first developed and used in subsystem architectures.

In the case of the OSI transport module, the port entity also corresponds to the local end of a transport connection (TC), and it provides a window to the status information associated with the TC. For example, the OSI transport port status attributes give

• The name of the user of the OSI transport service

• Local and remote NSAP addresses and transport selectors

• The protocol class being operated on the TC

In addition, a port entity has counter attributes that record the total number of times something of interest occurred on the TC. For example, there are counters recording the number of octets and protocol data units (PDUs) sent and received. A management station can poll these and determine usage over time.
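Polling counters to determine usage over time amounts to taking the difference between two samples and dividing by the elapsed time. The sketch below shows that arithmetic only; the `PortCounters` type, its field names, and the way samples are obtained are all hypothetical and are not taken from the Phase V specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortCounters:
    """One hypothetical poll of a transport port's usage counters."""
    timestamp: float        # seconds, from any monotonically increasing clock
    octets_sent: int
    octets_received: int
    pdus_sent: int
    pdus_received: int


COUNTER_FIELDS = ("octets_sent", "octets_received", "pdus_sent", "pdus_received")


def usage_rates(earlier: PortCounters, later: PortCounters) -> dict:
    """Return per-second rates between two polls of the same port."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    rates = {}
    for field in COUNTER_FIELDS:
        delta = getattr(later, field) - getattr(earlier, field)
        if delta < 0:
            # Counters only count upward, so a decrease means the samples
            # do not belong to the same connection (or a counter was reset).
            raise ValueError("counter " + field + " decreased between samples")
        rates[field + "_per_second"] = delta / elapsed
    return rates
```

For example, two polls taken 10 seconds apart in which `octets_sent` rises from 1000 to 6000 give a rate of 500 octets per second for that counter.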
A port entity also maintains counters for both duplicated transport PDUs (TPDUs) detected and retransmitted TPDUs. Taken with the usage counters, these can be used to calculate error ratios and rates on the TC.

When a client opens a port onto a service, the client can then use the service interface to select options such as which features to use or which profiles. Maximum flexibility, however, also poses a problem. In many cases, a client has little or no knowledge or understanding of the service options available in an underlying layer. Further, it would be unrealistic to expect all clients of a service (or, ultimately, an end user) to acquire this in-depth knowledge.

One alternative was to provide default values for all the service options. However, a single set of default values satisfies only a single subset of uses. Instead we adopted the template, which is an entity that represents a set of related option values. A manager can create as many templates as required for different sets of related option values. A client needs to be configured only with the single name of the template to use, not the details of every service option. The OSI management standards groups have adopted the template concept in the form of their initial value managed object (IVMO).

A template in the OSI transport module is a collection of characteristic attributes used to supply default values for certain parameters that influence the operation of a TC. When a port is opened to the OSI transport service, a template name may be specified by the client. The characteristic attributes in the template are then used as default values for TC parameters not supplied by the user, including, for example,

• The value of the window timer

• The set of classes of protocol that may be negotiated for use on a TC

• The use of checksums that might be negotiated for a TC that operates the class 4 protocol, a variant of the OSI transport protocol defined in ISO 8073

A default template is automatically created and used if no template is specified when a port is opened.

There is one local NSAP entity for each NSAP address used by the OSI transport. A local NSAP entity is automatically created when an NSAP address used by the OSI transport is added to the network routing subsystem (the adjacent lower layer).

The remote NSAP entity is a subentity of a local NSAP entity. Each remote NSAP entity maintains counter attributes resulting from interactions between the superior local NSAP and a remote transport service provider. Events are defined for the remote NSAP entity, to provide immediate notification to the manager of error conditions. For example,

• A checksum failure event occurs whenever checksum validation fails when performed on a received TPDU

• An invalid TPDU received event occurs whenever a TPDU received from the remote NSAP is in violation of the transport protocol

Consider this second example. Whenever an invalid TPDU received event is generated, a counter is incremented. Thus, even if the manager has configured event logging to filter out these events, an indication that they are happening remains, prompting the manager to change the filtering criteria. The event contains a number of arguments as well. All events identify the generating entity and the time the event occurred. The invalid TPDU received event also has arguments that give

• A reason code, indicating in what way the TPDU was invalid, as specified in the ISO 8073 standard

• The part of the TPDU header that was invalid

• A specific Digital Network Architecture (DNA) error code, which was added to qualify the ISO 8073 reason code and to help customers diagnose problems

The MAP places a number of requirements upon implementations of the OSI transport protocol beyond simple conformance to ISO 8073. The MAP entity contains the additional management needed to meet these extra requirements. The MAP entity is optional; implementations with no business requirement to support MAP would not provide the MAP entity.

Supporting Mechanisms

Network management in DECnet/OSI is built on a number of supporting services. Wherever possible, management uses the services of the network to manage the network. This approach minimizes the number of special mechanisms we had to define specifically for network management. Some key services used by network management include

• Session control

• DECdns name service

• Digital's distributed time service (DECdts)

• A unique identifier service (UID)

A few services were developed specifically to support network management. Most had existed in earlier phases of DNA.

• DNA CMIP

• Event logging

• MOP down-line load protocol

• Application loopback

In the following sections, we describe DNA CMIP and event logging.

Digital Network Architecture Common Management Information Protocol

The entity model describes what an entity can do. Those concepts must be expressed in the management protocol. DNA CMIP, the management protocol for DECnet/OSI Phase V, is an evolution of the Phase IV management protocol (called NICE). The two protocols are remarkably similar. Both include [...] operations. The main differences between the two protocols are in the following areas.

• Treatment of other operations. In NICE, each operation required a new kind of message; in CMIP, a general extension mechanism, the action, is provided.

• Naming. NICE supported a limited number of entity classes (eight) and provided a rudimentary naming hierarchy based on the notion of "qualifying attributes."

• First and foremost was timing. DNA CMIP was developed before the OSI CMIP was standardized. The inevitable changes to the standard led to many minor differences in the protocols. Still, because the concepts in the EMA entity model and OSI's SMI are aligned, the DNA and OSI CMIP protocols are fundamentally the same. The authors are currently migrating DNA CMIP to OSI IS CMIP. The change will be transparent to any user.

• [...] on Phase IV systems to manage Phase V systems.

DNA CMIP can be viewed as two separate protocols. One protocol, management information control exchange (MICE), is used by a director to invoke a directive (get, set, action, etc.) on an entity (or entities). The other protocol, management event notification (MEN), is used by an entity (or entities) to report events to a director. The two protocols operate over separate connections for important reasons.

• [...] connected differ. A MEN association is brought up [...] thus controlled by the agent. A MICE association, however, is brought up when a director (or manager) wishes to invoke an operation on an entity, and is thus controlled by the director. Attempting to share control of association establishment was not worth the complexity.

• Whenever an association is shared by two different users, the problem of allocating resources fairly to the two users must be addressed. Since transport connections deal with this issue between connections, the addition of a multi
CMIP supports h i e rarchi plexing protocol at the appl i cation level (with a n cal entity names a n d is essen t i a l l y u n l i m ited in attendant flow control mechanism) was again the nu mber of entities with which it can dea l . considered to be too complex. Transport con S i m i l a rly, nections are not (or shou ld not be) expensive. CMIP is m u ch m o re extensible in n a m ing at tribu tes, a t tribu te gro u ps, a n d event reports. • The times at wh ich the associations are con when a n entity wishes to repo r t an even t , and is the set, show (also cal led get), and event report • Second, DNA CMIP operates over a DNA protocol stack, not a pure ISO stack. This al lows directors Encoding. CMIP uses ISO Abstract Syn t a x N o t a t i o n I (ASN . l ), a standard t a g , lengt h , v a l u e (TLV) encod ing of a t t ribu tes and arguments, a n d N I C E used a private TLV encoding. DNA CM IP is not qu ite the same as the rs ve rsion of OS! C M I P, a lthough it was based on the second Event Logging The e n t i t y e m i ts a n event report to the m a nager when an event occurs in an ent ity The event logging m od u l e provides a service that tran smits event reports from the repor t i ng e n t i t ies to one o r m ore sink appl ications, which are considered to be a cer tain kin d of d i rector in EMA. Event logging i n Phase V i s based o n concepts similar to those provided by d raft proposal of the C M I P standard. There are t wo Phase IV. Because the principal use of event logging reasons fo r this. is fo r repo rt ing fa u l ts, event logging does n o t 1 26 Vr;/. 5 No. I Winter 1993 Digital Tee/mica/ jounwl Network Management guarantee del ivery of event reports to the sink a col lection of entities and events that are either appl ication. F igure 7 shows the event logging architectur e . 1 7 passed through the fil ter or blocked by the filter. When a n event occurs within a n entity (E) i n a in an event queue within the ou tbound stream. 
source node, the entity invokes the PostEvent ser Each outbound stream 's queue server sends events Event reports passing through the fil ter are p laced vice provided by the event dispatcher (a part of the to a corresponding i nbound stream i n the sink node's agent). When posting an event , the entity appl ication. M u ltiple ou tbound streams can be set supplies its name, the type of the event, all the argu u p by the manager, al lowing events to be sent to ments related to the event, a time stamp of when many sink appl ications. Ou tbou nd streams are the event occurred, and a U I D assigned to the event. modeled as entities i n their own right, and standard UIDs ensure that each event can be u niquely identi management operations (create, get, set) are used fied, so that if a sink appl ication receives more than to configure them. one copy of an event report, it can detect the dupli Each i nbound stream i n a s i n k appli cation has an cat ion. Time stamps a llow the event reports t o be event receiver ( R) . I nbound streams are genera lly ordered in t ime (an i mportant step in determining created when a connection request is received causal ity). A time service (DECdts) is used to syn from an ou tbound stream. Events received by the chronize clocks across the network. It provides a receiver are compared against a sink fi lter and consistent view of time for correlating observations. queued to the s i n k appl ication. Thus the events An important feature for ma nagement is the i n c l u from m u l tiple inboun_, streams are merged. sion of an inaccuracy bound on the time stamp. The protocol used between the ou tbou nd stream The PostEvent service formats an event report and the inbound stream is the CMIP M E N protocol, and places it in an event queue (Q). Event queues which operates over a connection (using either the are l imited in the amount of memory they use; thus DECnet transport layer protocol or OSI transport). 
they l i mit the number of events that can be held i n The use of a connection lowers the probabil i ty that the queue. Because events can b e placed i n the a n event report wil l be lost, since the connection queue at a rate faster than the queue server (S) can hand les acknowledt5ments and retransmissions. It process them, the queue can fi l l , and any new does nor guarantee delivery, however, and events events placed in the queue w i l l be lost. The events may stil l be lost due to fa il ures of the sink appl ica lost event is recorded as a pseudo-event in the tion or the source node. queue (it appears as an event report from the entity holding the queue). The events lost event carries an Conclusions argument that records the number of events that Our approach to Phase V management worked were lost in a row. well . Defi n i ng the EMA entity model first provided a The queue server for the event dispatcher framework of consistency among all the architec compares each event report against a filter ( F ) tures. Developing a management protocol (CMIP) associated with a n ou tbound stream. The fi l ter lists expressing the basic concepts i n the entity model S I N K D I R ECTOR SOURCE NODE EVENT DI SPATCHER SINK A P PLICATION 00 F OUTBOUND STREAM INBOUND STREAM �F - � "-011@: . � LE;J s)ll o�F -11\ R I NBOUND STREAM DNA CMIP MEN PROTOCOL OUTBOUND STREAM .� R . ""'----I I ANOTHER S I N K DIRECTOR Figure 7 Digital Technical journal Vol. 5 No. 1 Winte-r 1993 ANOTHER S O U R C E NODE Event Logging 1 27 DECnet Open Networking in conj u nction with the model placed the protocol 6. in a position to meet t he needs of the model. Giving M. Sylor, "Gu ideli nes for Structuring Manage able E n t ities," integnlled Network Manage responsibility for defi ning the management of a ment /, B . Meandzija and ]. Westcott (eels.), subsystem to the architects of that subsystem made (Amsterdam: each subsystem more comrlete and coherent. As 1989): 169-183. 
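The event path described above (a bounded event queue with an "events lost" pseudo-event, a per-stream filter, and UID-based duplicate detection at the sink) can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class and function names (EventReport, EventQueue, OutboundStream, serve, Sink), the use of uuid4 for UIDs, and the choice to reserve one extra slot for the pseudo-event are all hypothetical and are not taken from the DNA specifications.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field

EVENTS_LOST = "events lost"  # hypothetical name for the pseudo-event type


@dataclass
class EventReport:
    entity: str        # name of the reporting entity
    event_type: str
    args: dict
    timestamp: float
    # Assumption: any globally unique value can stand in for a DNA UID.
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)


class EventQueue:
    """Bounded queue; overflow is recorded as one pseudo-event with a run count."""

    def __init__(self, owner, capacity):
        self.owner = owner          # entity holding the queue
        self.capacity = capacity    # real slots; one extra slot is reserved for the pseudo-event
        self.items = deque()

    def post(self, report):
        tail = self.items[-1] if self.items else None
        overflowing = tail is not None and tail.event_type == EVENTS_LOST
        if len(self.items) - overflowing >= self.capacity:
            if overflowing:
                tail.args["count"] += 1   # extend the current run of lost events
            else:
                self.items.append(
                    EventReport(self.owner, EVENTS_LOST, {"count": 1}, report.timestamp))
            return False
        self.items.append(report)
        return True


class OutboundStream:
    def __init__(self, name, passes):
        self.name = name
        self.passes = passes   # filter: (entity, event type) pairs allowed through
        self.sent = []         # reports queued for the corresponding inbound stream


def serve(queue, streams):
    """Queue server: compare each queued report against every stream's filter."""
    while queue.items:
        report = queue.items.popleft()
        for stream in streams:
            if (report.entity, report.event_type) in stream.passes:
                stream.sent.append(report)


class Sink:
    """Sink application: merges inbound reports and drops duplicates by UID."""

    def __init__(self):
        self.seen = set()
        self.log = []

    def receive(self, report):
        if report.uid in self.seen:
            return False
        self.seen.add(report.uid)
        self.log.append(report)
        return True
```

Sorting a sink's log by timestamp would give the time ordering mentioned in the text; the sketch leaves out the inaccuracy bound that DECdts attaches to each time stamp.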
Conclusions

Our approach to Phase V management worked well. Defining the EMA entity model first provided a framework of consistency among all the architectures. Developing a management protocol (CMIP) expressing the basic concepts in the entity model in conjunction with the model placed the protocol in a position to meet the needs of the model. Giving responsibility for defining the management of a subsystem to the architects of that subsystem made each subsystem more complete and coherent. As problems were found in the model, based on lessons learned during the specification of entities, any needed changes to the entity model were applied to correct those problems.

However, some things did not go as well. The number of entities, attributes, and operations in Phase V was beyond anyone's expectations. This reflects the overall complexity and feature-richness of Phase V over Phase IV as well as the increased control that the manager is given. This burden is eased somewhat by the use of intelligent defaults, autoconfiguration, and self-management. Still, simplifying the management of a Phase V network is an important area for continual improvement.

The biggest success of EMA/Phase V management is its general applicability. EMA is being applied to more than the traditional network management areas. Systems, networks, and applications are all managed by EMA.

References

1. N. LaPelle, M. Seger, and M. Sylor, "The Evolution of Network Management Products," Digital Technical Journal, vol. 1, no. 3 (September 1986): 117-128.
2. J. Harper, "Overview of Digital's Open Networking," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 12-20.
3. C. Strutt and J. Swist, "Design of the DECmcc Management Director," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 130-142.
4. DECmcc System Reference Manual, 2 volumes (Maynard, MA: Digital Equipment Corporation, Order Nos. AA-PD5LC-TE, AA-PE55C-TE, 1992).
5. OSI Management Information Services - Structure of Management Information Part 1: Management Information Model, ISO/IEC DIS 10165-1 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1990).
6. M. Sylor, "Guidelines for Structuring Manageable Entities," Integrated Network Management I, B. Meandzija and J. Westcott (eds.), (Amsterdam: Elsevier Science Publishers, 1989): 169-183.
7. OSI Management Information Services - Structure of Management Information Part 4: Guidelines for the Definition of Managed Objects, ISO/IEC DIS 10165-4 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1992).
8. DNA Network Management Functional Specification, V5.0.0 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA02-FS-001, 1991).
9. DNA Naming Service Functional Specification, V2.0.0 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNANS-FS-002, 1991).
10. DNA Unique Identifier Functional Specification, V1.0.0 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA1-FS-001, 1992).
11. DNA Maintenance Operations Protocol Functional Specification, V4.0.0 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA11-FS-001, 1992).
12. Digital Network Architecture (Phase V) Documentation Kit No. 1 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNAP1-DK-001, forthcoming 1993).
13. Digital Network Architecture (Phase V) Documentation Kit No. 2 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNAP2-DK-001, 1993).
14. Digital Network Architecture (Phase V) Documentation Kit No. 3 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNAP3-DK-001, forthcoming 1993).
15. Digital Network Architecture (Phase V) Documentation Kit No. 4 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNAP4-DK-001, 1993).
16. Information Technology - Telecommunications and Information Exchange Between Systems - Connection Oriented Transport Protocol Specification, ISO/IEC 8073 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1989).
17. DNA Event Logging Functional Specification, V1.0.0 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA09-FS-001, 1992).

General References

EMA Entity Model (Maynard, MA: Digital Equipment Corporation, Order No. AA-PY7KA-TE, 1991).

DNA Network Command Language Functional Specification, V1.0.0 (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA05-FS-001, 1991).

S. Martin, J. McCann, and D. Oran, "Development of the VAX Distributed Name Service," Digital Technical Journal, vol. 1, no. 9 (June 1989): 9-15.

C. Strutt and D. Shurtleff, "Architecture for an Integrated, Extensible Enterprise Management System," Integrated Network Management I, B. Meandzija and J. Westcott (eds.), (Amsterdam: Elsevier Science Publishers, 1989): 61-72.

M. Sylor, "Managing DECnet Phase V: The Entity Model," IEEE Networks (March 1988): 30-36.

L. Fehskens, "An Architectural Strategy for Enterprise Network Management," Integrated Network Management I, B. Meandzija and J. Westcott (eds.), (Amsterdam: Elsevier Science Publishers, 1989): 41-60.

Colin Strutt
James A. Swist

Design of the DECmcc Management Director

The DECmcc product family represents a significant achievement in the development of enterprise management capabilities. DECmcc embodies the director portion of Digital's Enterprise Management Architecture (EMA) and is both a platform for the development of new management capabilities and a vehicle for aiding customers to manage their computing and communications environments. Initially, the DECmcc director was intended to facilitate sophisticated management of evolving networks. In addition to network management, DECmcc has been adapted to the needs of system, applications, data, environment, and telecommunications management. The first implementations contained the DECmcc kernel, a developer's toolkit, and various management modules.

Development of the DECmcc director has been a multiyear effort involving many groups within Digital.
When the DECmcc design was initiated in 1987, there was no equivalent management software in the industry. Most companies, Digital included, provided one or more independent, focused products. Each of these dealt with managing a specific set of components such as a single vendor's local area network (LAN) bridges or providing a specific management application such as equipment inventory.

Digital's network management capabilities within DECnet Phase IV were reaching their limit, and the incorporation of newer communications technologies in a seamless way was becoming increasingly difficult. As part of the DECnet Phase V development, work was started to rationalize management of distributed systems. This effort led to the formal definition of such concepts as the director/entity relationship, the entity model, and the common management information protocol (CMIP). These ideas formed the basis for management in Phase V and were Digital's contributions to the open systems interconnection (OSI) management model from the International Organization for Standardization (ISO).

The original vision of network management in Phase V included the concept of two management directors. The first, a sophisticated director referred to as the management control center (MCC), would handle the more complex, yet user-oriented, management tasks. The second, a simple command line director referred to as network control language, would address the needs of more experienced managers who prefer a command line environment.

Conceived primarily as a DECnet management director, the DECmcc product evolved to address the broader problems associated with managing a complete computing and communications environment. This evolution is not yet finished and arguably will never finish as network environments continue to change.

Since the development of DECmcc in 1987, the simple network management protocol (SNMP) has become widely implemented. DECmcc has adapted to handle SNMP as well. In addition, the DECmcc product, once a tool for the VAX VMS architecture, is now implemented on multiple platforms, such as the ULTRIX and UNIX System V Release 4 operating systems.

In this paper, we look at the development of the DECmcc director. We start by discussing our initial design ideas taken in the perspective of the industry at the time. We then describe the initial implementation of DECmcc. We also present the effects of the changing industry and how DECmcc has adapted over time. We conclude with some of the opportunities for future work.

Historical Perspective

Digital's first network management capability was delivered in 1978 as part of the release of DECnet Phase II software. Much of the DECnet product was then manageable, both configuring the software for installation as well as the operational aspects. The main program used to perform management was the network control program (NCP). At that time management mostly consisted of looking at information and then changing it as needed. DECnet Phase II, however, could perform sophisticated diagnostic loopback tests, both nonintrusive as well as intrusive, to diagnose connectivity problems at various layers of the protocol stack.

Management formed a significant part of the DECnet Phase III and DECnet Phase IV networking products. Each major release contained many changes to manage the new functionality. However, the DECnet management structure in place in the 1970s was becoming more difficult to adapt to the requirements of the mid-1980s. For example, support was added for X.25 during Phase III and for Ethernet during Phase IV. These releases required quite different management approaches than the one used for Phase II. With the advent of the significant changes to DECnet Phase V to include support for the OSI protocol stack, another management approach was needed.

Thus in conjunction with Phase V network development, an effort was started to provide a new architectural approach to management of Phase V. One of the key requirements was to provide the Phase V management needs in a way that would extend their adaptability to the future. This work was referred to as distributed systems management because it addresses management of the computing environment as well as management of the communications that DECnet comprises. Most of the initial work in distributed systems management concerned itself with the aspects that applied to DECnet and the changes needed to provide manageability of DECnet in Phase V. The primary underlying concepts were articulated.

• Directors are management programs used by human managers to effect management. Entities represent managed components to directors through software referred to as agents [6].
• The entity model is the underlying model for managed entities defined in terms of an object-based approach.
• The formal specification for the classes of entities is defined in terms of Modula-2+ like specifications and is called management definition language (MD).
• A command language, network control language (NCL), was formally defined to be unambiguous even with new entities and their definitions; an associated primitive director of the same name, part of every Phase V package, replaces the NCP of previous phases [5].
• A management protocol called the common management information protocol (CMIP) was used to communicate between directors and entities.

CMIP was named common and presumed to handle the common aspects of management across a wide variety of management applications. Some developers suggested the possible need for a small number of specialized management information protocols (SMIPs), perhaps one for each of the management functional areas (configuration, performance, fault, security, and accounting). However, CMIP proved to be sufficiently expressive and powerful to support management applications covering the management functional areas.

At the time the distributed systems management work was initiated, Digital's networking and communications product line was expanding to encompass more than the DECnet networking hardware and software. Along with each product came its own management software, some of which was tailored along the lines of the DECnet standard NCP. In addition, the Network Management Development Group was building some fairly sophisticated management applications that went far beyond the capabilities of NCP in DECnet. The developers necessarily took a different approach to management.

Thus, by the late 1980s Digital had developed a number of distinct management products. Many of these employed private protocols, for example

• NCP for managing DECnet, based on a command line user interface
• NMCC/DECnet monitor, a wide-area DECnet monitoring tool, based on a graphical user interface
• NMCC/ETHERnim, an Ethernet monitoring/inventory test program, based on a graphical user interface
• RBMS, Remote Bridge Monitoring Software for managing Digital's bridge family, based on a command line user interface similar to NCP
• TSM, Terminal Server Manager for managing Digital's terminal server family, based on a command line user interface similar to that used in the terminal servers
• LTM, LAN traffic monitor for understanding the traffic usage and patterns of Ethernet segments, based on a graphical user interface

Other manufacturers also provided management software capable of managing their devices. Some vendors provided particular management applications that were not tied to any specific network device. These applications performed a single function, such as maintaining an inventory of equipment on behalf of a manager.

The plethora of management capabilities from many vendors created many choices for end users. At the same time, the diverse applications were perceived as carrying significant drawbacks. Each application provided its own user interface. Each had its own database for storing management information. Each dealt with different management information. In addition, each tool provided its own, often rudimentary, independent management application.

End users viewed these many products as creating a series of problems: (1) A manager needed multiple management terminals, one per product. (2) Separate training was required to use each product. (3) Confusion occurred when the user switched between multiple products. (4) Different information was available from each product, or worse, the same information was available in a different form. (5) There was no ability to share information between products. (6) It became difficult to diagnose problems that spanned multiple technologies. Other aspects of the system management perspective in 1986 have been described.

At that time, standards for network management had not progressed very far; SNMP did not yet exist. In fact, agreement on the overall concepts had only begun within the OSI management committees. It is with this background, then, that the design of DECmcc as a management director was undertaken.

Opportunities

Of all the situations that existed in customer networks in the mid-1980s, probably the most important was the realization that networks no longer consisted of equipment from a single vendor. In addition, different technologies were commonly used to improve a given customer's network. With each technology came its own management protocol, along with its own management structure. As networks became larger, more than one network manager was typically needed.

The opportunity existed to provide complete, integrated network management that could be adapted to the changing needs of management. Our product goals were

• To provide a consistent, integrated user interface, permitting management of any component in the enterprise to be performed in a style that does not depend on the specific component
• To provide integration of the management data (contained in the components as seen by the director) and management information (as constructed by the director using the management data)
• To provide a consistent, extensible means of storing management information and of allowing it to be accessed conveniently by multiple independent management applications
• To provide an application programming interface (API) to support management applications

Obviously, an approach necessary to solve these nontrivial problems was not to be a small undertaking; an architected approach was appropriate [6].

Design Approach

The solution to the problems outlined was seen to be a distributed applications environment, tailored to the specific needs of management. Quite quickly, the idea of defining a modular and extensible environment was selected. Management capabilities could be added in a straightforward fashion based on an applications kernel, which could either be replicated as needed around a network, or considered as multiple, cooperating kernels supporting a distributed management environment. Hence a kernel with modules that can be added dynamically, much as applications are added to an operating system, is fundamental to the design of DECmcc.

The next consideration concerned the composition of the modules themselves. One approach to the support of multiple technologies had one module access each different sort of component to be managed. Since a number of management application functions were desirable, one might have a module for each such function. Also one might have a module for each form of user interface to accommodate the different user interface styles, such as command line or windowing. Thus, we arrived at the concept of distinguishing form, function, and access. Furthermore, we defined management modules based on presentation modules (PMs) for user interface, function modules (FMs) for management functions, and access modules (AMs) for accessing each distinct technology. The DECmcc director structure is shown in Figure 1.

We observed that the EMA entity model, defined initially to meet the needs of management of entities, provided generalized structuring concepts that would be appropriate for the director environment as well. Indeed, choosing the same model to handle the needs of the director removed the need for a translation between the entity environment and the director environment for EMA entities, which has proved to be advantageous for the implementations. Hence the following entity model concepts were also used in the director.

• An object-oriented approach: encapsulating objects (entities) and their operations
• A class structure: defining attributes, operations, and events for each class and specifying management information using a management specification language

As we studied the needs for stored management information in the director, we identified four different sorts of information, distinguished by the storage needs, nature of the contents, and the access patterns.

1. Class data: the dictionary of all management operations, attributes, notifications, and their related definitions categorized by class, updated infrequently, but read often
2. Instance data: the configuration information, stored in a global naming service, changing often, but read from many places simultaneously
3. Historical data: information about specific entity instances stored over time, written incrementally and read sporadically according to the needs of applications using such data
4. Miscellaneous data: other data needed for specific modules, such as tariff information or the definition of rules specifying alarm conditions

The complete logical information store was termed the management information repository (MIR).

The kernel defines an execution environment that is suitable for management modules and supports the MIR. This was initially implemented in terms of technology provided completely within the director kernel. Many of the kernel services, however, were subsequently replaced with distributed systems services, including multithread support, naming/directory service, time service, and remote procedure call (RPC).

It is, perhaps, interesting to note that the decision to use a multithreaded approach in DECmcc was not unanimous. The alternate approach proposed an asynchronous message-passing scheme. Although the decision to use a multithreaded environment has proved to be implementable, we did not appreciate how the performance of the multithreading implementations would affect the ability to support the needs of application environments such as DECmcc.
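As an illustration of the class data kept in the MIR, the following Python sketch models a tiny dictionary of entity classes and a check of a request against it. It is a minimal sketch under stated assumptions: the names EntityClass, DICTIONARY, define, and is_valid are hypothetical, and the real dictionary is populated from management specification language definitions rather than from code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityClass:
    """One dictionary entry: what a class of managed entity supports."""
    name: str
    attributes: dict        # attribute name -> data type name
    operations: frozenset   # operation (verb) names
    events: frozenset       # event (notification) names


DICTIONARY = {}  # class name -> EntityClass; updated infrequently, read often


def define(entity_class):
    """Add or replace a class definition in the dictionary."""
    DICTIONARY[entity_class.name] = entity_class


def is_valid(verb, class_name, attribute=None):
    """Check a request against the dictionary, with no per-class code."""
    entity_class = DICTIONARY.get(class_name)
    if entity_class is None or verb not in entity_class.operations:
        return False
    return attribute is None or attribute in entity_class.attributes
```

Because the check is driven entirely by the dictionary contents, a module written this way needs no change when a new entity class is defined.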
Figure 1 DECmcc Director Structure [diagram labels: interface, management kernel, function modules, access modules, management information repository]

Invoking Module Services

As we looked at how management modules would call each other, we chose a fairly straightforward approach. User interactions with a PM would cause the PM to invoke an FM, the FM to then invoke the appropriate AM, and the AM to communicate with the desired entity. The response would then be transmitted through the AM, FM, and PM, with the result presented to the user. Thus the simple procedure call paradigm between modules, as shown in Figure 2, supported the needs of applications geared toward monitoring and control operations.

Figure 2 Management Module Calling Hierarchy [diagram: management users, then within the management director the presentation modules, function modules, and access modules, then the managed entities]

However, one must consider the increase in the total number of management modules over time, and the even greater increase in the total number of available management services (defined by specific operations on classes of entities). Thus, it became clear that the intermodule procedure calls could not use named procedures, as administering the names of ever-increasing numbers of procedures would be a burden. Instead we chose an approach whereby modules invoked each other's services by referring to the operations and the objects, using a service invocation procedure known as "mcc_call." We defined the interfaces provided by the management modules entirely in terms of operations on objects, an object-oriented approach, but this approach did not require the use of object-oriented languages or databases.

We further observed that one could decompose a management application into a number of smaller, potentially reusable services.
Hence FMs could invoke other FMs in performing their services much in the same way that applications on UNIX systems pipe results from one component to another. Given the generally extensible nature of DECmcc and the supporting mcc_call structure, this led to the concept of generic applications. Being run-time driven from the class dictionary, these applications could work over a wide range of managed objects and perform the same service for each of them without a priori knowledge of the objects.

For example, one might have an FM that provides performance-related services, turning error counters (obtained directly from the managed objects) into error rates (by simply polling for two counter values, subtracting one from the other, and dividing by the time interval between polls). A different FM might provide alarm services by notifying users of particular (user-specifiable) conditions, such as when a particular counter exceeds a defined threshold. Of course, managers are often more interested in error rates exceeding a given threshold. The same alarms FM could be primed to look for an error rate; the request would be passed on to the performance FM, which in turn would calculate the rate by looking at successive polls of the error counter. The alarms FM does not need to be aware whether the data it needs comes from the performance FM or directly from the managed object via the appropriate AM. The disposition of the methods among modules is hidden by the service invocation mechanism.

Furthermore, the alarms FM tracks the number of times a user is notified of a problem, and this counter is available as management data. One might then want to determine the rate of user notifications (using exactly the same generic performance FM as before), and use the same alarms FM to notify a different user when the rate of notifications exceeds a defined threshold. This threshold might indicate that one manager is being overloaded.
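The composition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DECmcc API: the Dispatcher class, its register and call methods, the "show" verb, the attribute names, the simulated bridge counter, and the fake clock are all hypothetical names chosen for the example. It shows services being reached by operation and object rather than by procedure name, a generic rate service computing (c2 - c1) / (t2 - t1) from two polls, and an alarms module whose own firing counter is fed back through the same rate service.

```python
"""Sketch of operation-and-object dispatch. All names are illustrative, not the DECmcc API."""
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]        # (verb, entity class, attribute)
Service = Callable[[str], float]  # takes an instance name, returns a value


class Dispatcher:
    """Hypothetical stand-in for a service invocation procedure such as mcc_call."""

    def __init__(self) -> None:
        self._services: Dict[Key, Service] = {}

    def register(self, verb: str, entity_class: str, attribute: str, service: Service) -> None:
        self._services[(verb, entity_class, attribute)] = service

    def call(self, verb: str, entity_class: str, attribute: str, instance: str) -> float:
        # The caller names an operation on an object, never a procedure.
        return self._services[(verb, entity_class, attribute)](instance)


class FakeClock:
    """Deterministic clock (an assumption) so the example is repeatable."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


def make_rate_service(d: Dispatcher, clock: FakeClock, entity_class: str,
                      counter: str, interval: float) -> Service:
    """Generic performance service: poll a counter twice and divide by the interval."""
    def rate(instance: str) -> float:
        t1, c1 = clock.now, d.call("show", entity_class, counter, instance)
        clock.advance(interval)  # stands in for waiting between polls
        t2, c2 = clock.now, d.call("show", entity_class, counter, instance)
        return (c2 - c1) / (t2 - t1)
    return rate


class AlarmsFM:
    """Generic alarm service: test any attribute against a threshold."""

    def __init__(self, d: Dispatcher) -> None:
        self.d = d
        self.firings = 0  # itself exposed as management data in demo()

    def check(self, entity_class: str, attribute: str, instance: str, threshold: float) -> bool:
        # The alarms module cannot tell whether an FM or an AM supplies the value.
        value = self.d.call("show", entity_class, attribute, instance)
        if value > threshold:
            self.firings += 1
            return True
        return False


def demo() -> Tuple[float, bool, int, float]:
    d, clock = Dispatcher(), FakeClock()
    counters = {"bridge-1": 0.0}

    def bridge_errors(instance: str) -> float:
        # Simulated access module: 30 more errors are seen at every poll.
        counters[instance] += 30.0
        return counters[instance]

    d.register("show", "bridge", "errors", bridge_errors)
    d.register("show", "bridge", "error_rate",
               make_rate_service(d, clock, "bridge", "errors", 10.0))
    alarms = AlarmsFM(d)
    d.register("show", "alarms", "firings", lambda _inst: float(alarms.firings))
    d.register("show", "alarms", "firing_rate",
               make_rate_service(d, clock, "alarms", "firings", 10.0))

    rate = d.call("show", "bridge", "error_rate", "bridge-1")
    fired = alarms.check("bridge", "error_rate", "bridge-1", 2.0)
    firing_rate = d.call("show", "alarms", "firing_rate", "alarms-fm")
    return rate, fired, alarms.firings, firing_rate


if __name__ == "__main__":
    print(demo())
```

With the simulated counter and a 10-second interval, demo() returns (3.0, True, 1, 0.0): the error rate is 3 per second, the alarm fires once, and no further firing occurs between the two polls of the firing counter. The point of the design is that adding a new attribute only means registering another entry; neither the rate service nor the alarms module changes.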
Thus, in this scenario we have a number of modules involved in a calling hierarchy, with the same modules appearing more than once. Figure 3 shows the reuse of software using generic function modules in DECmcc.

Figure 3 Data/Control Flow for Multiple FMs (an alarms FM gets the alarm firing rate from a performance FM, which calculates it from two successive firing counter values obtained from a second alarms FM; that alarms FM gets the error rate from a performance FM, which calculates it from two successive error counter values returned from the entity by the access module; each alarms FM tests its value against a threshold and, if exceeded, emits a notification and increments its alarm firing counter)

Management Specification Language

The entity model's management definition language, originally intended for the specification of management agents, was modified and applied to the director environment. Director-oriented information was added to the management specification, such as user interface tags for automatically generated forms and menus. This information was named the management specification language (MSL). An MSL compiler was defined to convert MSL to an on-line form, available as metadata through an on-line dictionary, the MIR class data.

With the management specification information available to management modules, modules could adapt their behavior as new modules were added; this is especially important for generic modules. Thus the same MSL that was used to help the entity agent developers was also useful for the management director to drive the extensible management modules.

This dictionary information spurred the definition and development of the generic management modules. The generic PMs provide an extensible user interface that is capable of adapting as new managed objects or applications are added. The generic FMs provide consistent functions over a broad set of managed objects. Finally, the generic AMs support extensible management protocols, allowing the dynamic addition of new sorts of managed objects.

The design of the DECmcc director led to a number of possibilities in the type and application of the different sorts of modules. Initially AMs were conceived as being one per management protocol, which usually translated to one AM per type of device (such as bridge, terminal server, DECnet node). Since the advent of standard protocols, such as SNMP from the Internet community and CMIP for OSI management, AMs are now more typically generic and extensible.[8, 9, 12] A single AM covers many different types of device with one protocol.

For FMs, we originally envisioned two sorts of modules: the generic FM providing the same function over a wide variety of managed objects, and a specific FM providing a set of functions for a single class of managed object. Today, we believe one may have two different sorts of generic FM: one that is specific to a technology (such as network management related), and another, truly generic, which is completely independent of the technology being managed (such as an alarms FM).

For PMs, we recognized the need to handle device-specific aspects as well as user interface style-specific aspects. Normally one would have generic PMs provide user interface capabilities over a broad variety of managed objects and applications. However, to support the specific needs of generic FMs, specific PMs might be used to provide the appropriate user interface. PMs that are specific to an FM are less useful since they do not provide a consistent user interface "look and feel."

During the design of the DECmcc director, a number of smaller, but nonetheless important, design decisions were made. The concept of management domains was defined as a general container mechanism for entities, which could include domains themselves. Domains therefore provide a flexible, user-specifiable organizational structure for both visual representation at the user interface, as well as a means to organize the stored management information and associated background processing.[14] The need to provide a consistent approach to the naming of objects within the director was established. This was initially based on Digital's distributed name service, DECdns, providing globally unique names and network-wide access to those names.[15] Finally, the concept of time, including the scheduling of operations as well as scope of interest for information retrieval, was included in the mcc_call API. The time concept allows management applications to be developed that can operate on historically stored information as easily as they can on data retrieved directly from the network.[16]

A more detailed report on the design of DECmcc has been published.[17]

Some other aspects of the DECmcc program, while not part of the technical design, had a major part to play in its evolution. First was the need to provide published, open definitions of the DECmcc API, based on existing standards. This allows other vendors and end users to develop their own management capabilities to add to DECmcc. Second was the establishment of a strategic vendor program within Digital to work with other vendors, particularly those that provided network technologies that complemented Digital's own offerings, to help them develop to the DECmcc platform. Finally a design center program was instituted whereby the design of DECmcc would be validated, as it evolved, against the needs of some major customers to ensure that it continued to address the management problems of those customers.

Broadening the Scope

Since DECmcc was designed to be able to manage anything that could be described by the entity model, and since the entity model is a general object-oriented framework, it follows that it is feasible to extend DECmcc to classes of managed object and applications beyond the traditional network-oriented view of nodes, hosts, bridges, routers, etc. Some of the new classes of managed objects and new applications that we have seen developed using DECmcc include

1. Management of applications such as transaction processors and databases
2. Applications in traditional system management, such as user management, disk backup, software installation, configuration maintenance, and performance monitoring
3. Management of objects in the telecommunications field, such as PBX machines, multiplexers, and switches[18]
4. Management of noncomputer hardware, such as air conditioners and building-environment controls

Note that the implementation of these extensions generally involves a relatively small investment, at which point the power of existing generic applications is automatically provided. For example, in the easiest case, a new object that is manageable through SNMP need only have its management information base (MIB) translated to MSL and loaded into the DECmcc dictionary, at which point it is accessible by the existing SNMP AM as well as the standard generic applications.

In other cases, such as the air conditioning example, it is only necessary to code an AM that communicates to the air conditioning controller through its private protocol. Functions such as alarms, notifications, historical data recording, and graphing are automatically provided by existing FMs and PMs upon recognition of the new object class.

In complex cases, object-specific FMs are written to perform such tasks as software installation and disk backup control. Yet even in these cases, all these functions are automatically accessible through the generic PMs.

The potential for interdisciplinary applications is now becoming possible by the normalization of the interfaces to objects traditionally handled by totally separate applications. For example, given the extensions described above, it is possible to write an application that activates an emergency disk backup and switches telephone trunk traffic to another building if an air conditioning failure occurs. In fact, depending on how the various objects are defined, it may even be possible to create such an application simply by writing a single alarm rule.

Evolution to Open Systems

With recent industry trends toward open systems environments, as well as the realization that almost any enterprise now comprises multiple hardware and software platforms from multiple vendors, it was clear that DECmcc had to evolve to this new world. Among the requirements to be met were not only the management of objects existing on various platforms, but also the execution of the director itself on different hardware and operating system platforms.

These requirements dictated two basic design goals:

1. Portability of the director kernel itself to environments other than VAX VMS
2. Portability of plug-in management modules to a DECmcc director running on any supported platform, and in particular, source compatibility to the greatest extent possible with the considerable suite of management modules that existed when the porting effort started

Many of the fundamental requirements for portability had already been met. All existing management modules were coded to the API defined in the DECmcc System Reference Manual (SRM), and the SRM had little code that was inherently specific to VAX or VMS.[19] In fact, only the documented SRM routines were used to access DECmcc services, as well as many other common operating system services such as data storage and thread control. Consequently, the kernel implementation team had the flexibility to implement these services differently on various platforms without impacting management module source code. This was particularly true with the all-important mcc_call service, which provided the API for intermodule communication in a platform-independent context such that a wide variety of interprocess or intraprocess communications mechanisms could be chosen for the underlying implementation.

In the initial porting effort, which was from VAX VMS to RISC (reduced instruction set computer) and VAX ULTRIX, some of the more important changes in underlying implementations were

1. The MIR was implemented over the ndbm hash database manager. An earlier version of the MIR was also implemented over ULTRIX SQL, which provided some large-capacity database features at the expense of significant performance.
2. The operating system time interfaces were migrated to the distributed time service of the Open Software Foundation distributed computing environment (OSF DCE).
3. The multithreading services were migrated to the DECthreads component of the DCE.
4. The intermodule communication mechanisms (mcc_call) were implemented using RPC technology, with management modules running as independent RPC server processes. This allowed run-time extensibility without requiring the operating system to support a merged image activation function, a feature of the VMS implementation.
5. Through the use of various wrapper routines in the DECmcc development toolkit, we were able to allow the management module developer to code entry points to the management modules without distinction to whether they were being run in an image merge or an independent process context.

Despite these major changes, 85 percent of the kernel code is in fact platform independent, and we are maintaining a single source pool for DECmcc regardless of the number of platforms. To minimize the operating-system-dependent code we must maintain and to provide backward compatibility, we are also porting to VMS a number of the above technologies such as those built on DCE.

At the present time we continue to broaden our open systems focus by additional ports to UNIX System V, OpenVMS on Alpha AXP, OSF/1 on Alpha AXP, as well as other operating systems.

Implementation

In late 1990 and early 1991, Digital delivered the first two versions of DECmcc. Version 1.0 was written to allow other vendors to start building their management modules; version 1.1 added some components for network managers. Both releases ran on VAX VMS systems, either workstations or hosts.

In the middle of 1992, Digital released version 1.2 of DECmcc, which added significant capabilities and runs on RISC ULTRIX. Later in 1992, Digital delivered POLYCENTER SNA Manager. In conjunction with DECmcc and the SOLVE:Connect for EMA, a product from System Center, Inc., it allows bidirectional management between IBM SNA hosts and DECmcc systems.[20]

In early 1993, Digital released version 1.3 of DECmcc under the new product family name of POLYCENTER, with the POLYCENTER Framework, which is the basis for POLYCENTER Network Manager 200 and POLYCENTER Network Manager 400. This new version adds ways to provide simpler, yet powerful, integration of management capabilities; uses an OSF/Motif graphical user interface; and provides additional development tools.

These versions contain the DECmcc kernel, a corresponding developer's toolkit, and a series of management modules, which are outlined in Table 1. The SRM provided the API definitions for management modules, as provided by the kernel. Figure 4 shows a sample screen from DECmcc being used to manage a portion of a network.

Table 1 DECmcc Director Management Modules

Presentation Modules

Forms and Command Line PM: Provides a command line user interface based on the NCL definition, together with a full-screen mode for video terminal devices. This PM also executes DECmcc command scripts.

Iconic Map PM: Provides an iconographic display based on OSF/Motif. It supports all the capabilities of the command line, but with a more usable graphical representation of the network and pull-down menu support. This PM also provides on-line graphing of management information. In addition, this PM can launch management applications that are not strictly part of the DECmcc environment, to provide a visual integration for the manager.

Notification PM: Provides an interactive management display of event or alarm firing conditions based on OSF/Motif. Flexible filtering of information is used to minimize the information displayed to the manager, but the manager can search for and display information using various criteria such as severity level, managed object, and date and time.

Function Modules

Registration FM: Provides a means for registering entities with the director and for maintaining reference information on behalf of the entities.

Domain FM: Maintains the definitions of the various management domains, their membership, and their relationships.

Historian FM: Enables the capture and storage of user-specified management attributes from any entity in the network. Retrieval of the stored information by management modules is provided directly by the mcc_call API.

Exporter FM: Allows extraction of user-specified on-line or stored management information into a relational database for processing by SQL-based information management tools, such as reports.

Alarms FM: Permits managers to specify, through rules, the set of conditions about the network in which they are interested. When the alarms FM detects a condition (the rule fires), various notification techniques may be employed. These include invoking a command script, sending mail, calling a manager using an electronic beeper, or modifying an icon on the iconic map display.

Performance Analyzer FM: Calculates statistics for DECnet, transmission control protocol/internet protocol (TCP/IP), and LAN bridges, based on error and traffic utilization or other information.

Diagnostic Assistant FM: Helps the manager diagnose faults in a TCP/IP network, based on some of the more frequently occurring TCP/IP network problems.

Autoconfiguration FMs: Determine automatically the configuration and topology of specific portions of the network. Included are FMs to determine the configuration and topology of DECnet Phase IV networks, IP subnetworks, fiber distributed data interface (FDDI) ring maps, and LAN bridge spanning trees.

Access Modules

SNMP AM: Provides access to objects that implement the SNMP protocol. It is a generic AM in the sense that it can adapt to new object definitions using information in the DECmcc dictionary. New MIB definitions are provided in a standard form and translated by a MIB translation utility into the DECmcc dictionary.

DECnet Phase IV AM: Provides access to the DECnet Phase IV implementations, be they hosts or servers such as routers. This AM implements the network information and control exchange (NICE) protocol.

DECnet/OSI Phase V AM: Provides access to the DECnet/OSI Phase V implementations, hosts, and servers. It implements the CMIP protocol used in Phase V.

Bridge AM: Supports Digital's family of LAN bridges, including the LANbridge 100, LANbridge 150 and LANbridge 200, and the DECbridge family. It implements the RBMS protocol, which is used by the original management product of the same name.

FDDI AM: Supports Digital's FDDI DECconcentrator products and other devices that support the standard station management protocol (SMT).

Terminal Server AM: Supports Digital's family of terminal servers, implementing management through the maintenance operations protocol (MOP).

Ethernet Station AM: Supports all Ethernet and IEEE 802.3 stations that implement either, or both, the Digital MOP protocol or the IEEE 802.2 XID and TEST messages.

Circuit AM: Uses the services of other AMs to provide management of the network circuits that connect systems together, based on DECnet nodes, TCP/IP hosts, or network management forum definitions. Such circuits might be simple point-to-point or could represent complex multichannel circuits.

SNA AM and Agent PM: Permit bidirectional management of the SNA environment and the DECmcc management environment through a component that resides on an SNA host (either IBM's NetView or System Center's Advanced System Management).

Data Collector AM: Provides a means to allow other software, such as applications, to send events into DECmcc so they may be processed and analyzed along with events from devices or applications that have access modules.

Script AM: Allows invocation of existing or custom shell scripts or command procedures from DECmcc, and information to be returned from the scripts into DECmcc for processing and analysis by other modules.

Since the DECmcc kernel is indifferent to the specific type of any management module, it is quite convenient to package different modules together, providing for a flexible packaging scheme. Each DECmcc can therefore be tailored to include the set of modules appropriate for managing the environment in which it is situated. In addition, modules from other vendors can be integrated by the customer without involvement from Digital.

As new management modules are added, the powerful generic capabilities of DECmcc allow many existing functions to be used without change. When an AM is added for a new class of resource, or when an existing generic AM is enhanced by adding new supporting definitions in the dictionary, one can immediately perform the following functions.

• Identify specific resource instances uniquely
• Make the resources known to all DECmcc directors in the network
• Represent the resources on an iconic display in one or more management domains
• Examine management attributes from these resources
• Modify management attributes in these resources
• Apply management actions to these resources
• Display event information from these resources
• Create alarm rules that can be triggered on particular conditions (polled or unsolicited) about these resources
• Have the relevant icons change color when the alarms fire
• Store, periodically, management data or information about these resources in the DECmcc historical data store, or export the information to a relational database
• View the stored historical data
• Process the relational data using standard information management tools, for example, to provide management reports

Figure 4 Screen Display of DECmcc Version 1.3 (POLYCENTER Graph window for Node4 BILFSH, plotting user bytes received and user bytes sent, in bytes, against time in minutes:seconds)

Future Work

Of course, work on a major software system such as the DECmcc director is never complete. There are many areas of opportunity for additional development. For example, DECmcc can be ported to other industry platforms (both hardware and software). New objects can be managed, not only in network management but also in system management, application management, data management, environment management, telecommunications management, and so on. Commensurate with each of these general areas are technology-specific applications. In addition, further technology-independent generic applications can be developed. A recent paper describes how DECmcc can be considered as a distributed application and some additional work to make use of the DECmcc concepts in a distributed environment.[21]

DECmcc is not the only management director in the industry. Thus interoperability between DECmcc and other management systems is another area of opportunity. DECmcc already has links to other management systems, not the least being to manage IBM SNA systems.

Recent advances in object-oriented technology can be incorporated to enhance the object orientation of DECmcc.

Finally, new standard industry management protocols, new managed objects, and management framework innovations are always becoming available. DECmcc will be taking all of these evolutions in its stride. The distributed management environment (DME), still under development by OSF, promises to bring yet more technology to which DECmcc will adapt readily.

Summary

This paper has explained aspects of the design of DECmcc in the context of the state of the industry at the time. DECmcc has been a large undertaking, but we have been able to build and ship significant, consistent, integrated, and yet extensible, management capabilities covering a broad range of managed objects. The ability for DECmcc to adapt to the changing management environments underscores the benefit of adopting an architected approach to implementation.

Acknowledgments

The authors would like to acknowledge the work of the many people in the groups, past and present, responsible for bringing the ideas presented in this paper into practical reality in the DECmcc product set. Also, the detailed comments of two anonymous reviewers were very helpful.

References

1. M. Sylor, F. Dolan, and D. Shurtleff, "Network Management," Digital Technical Journal, vol. 5, no. 1 (Winter 1993, this issue): 117-129.
2. M. Sylor, "Managing DECnet Phase V: The EMA Entity Model," IEEE Networks (March 1988): 30-36.
3. Entity Model (Maynard, MA: Digital Equipment Corporation, Order No. AA-PV7KA-TE, January 1993).
4. DNA (Phase V) Common Management Information Protocol Functional Specification (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA01-FS-001, July 1991).
5. DNA (Phase V) Network Control Language Functional Specification (Maynard, MA: Digital Equipment Corporation, Order No. EK-DNA05-FS-001, July 1991).
6. L. Fehskens, "An Architectural Strategy for Enterprise Management," IFIP Proceedings of the First Symposium on Integrated Network Management (May 1989): 41-60.
7. M. Sylor, "Guidelines for Structuring Manageable Entities," IFIP Proceedings of the First Symposium on Integrated Network Management (May 1989): 169-183.
8. Information Technology: Open Systems Interconnection: Common Management Information Service Definition, ISO/IEC 9595 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1990).
9. Information Technology: Open Systems Interconnection: Common Management Information Protocol Specification, Part 1, ISO/IEC 9596-1 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1990).
10. N. La Pelle, M. Seger, and M. Sylor, "The Evolution of Network Management Products," Digital Technical Journal, vol. 1, no. 3 (September 1986): 117-128.
11. D. Shurtleff and C. Strutt, "Extensibility of an Enterprise Management Director," Network Management and Control, A. Kershenbaum, M. Malek, and M. Wall (eds.) (New York: Plenum Press, 1990): 129-141.
12. J. Case, M. Fedor, M. Schoffstall, and J. Davin, "A Simple Network Management Protocol (SNMP)," RFC 1157 (May 1990).
13. G. Stone, "Integrated Management Technologies," AT&T UNIX Systems Management Symposium, Spring 1991.
14. C. Strutt, "Dealing with Scale in an Enterprise Management Director," IFIP Proceedings of the Second Symposium on Integrated Network Management (April 1991): 577-593.
15. S. Martin, J. McCann, and D. Oran, "Development of the VAX Distributed Name Service," Digital Technical Journal, vol. 1, no. 9 (June 1989): 9-15.
16. A. Shvartsman, "An Historical Object Base in an Enterprise Management Director," IFIP Proceedings of the Third Symposium on Integrated Network Management (April 1993): 123-134.
17. C. Strutt and D. Shurtleff, "Architecture for an Integrated, Extensible Enterprise Management Director," IFIP Proceedings of the First Symposium on Integrated Network Management (May 1989): 61-72.
18. J. Borden, "Digital's Telecommunications Network Management Program," Network Operations and Management Symposium (New York: The Institute of Electrical and Electronics Engineers, 1992): 102-111.
19. DECmcc System Reference Manual, 2 volumes (Maynard, MA: Digital Equipment Corporation, Order No. AA-PD5LC-TE, AA-PE55C-TE, April 1992).
20. J. Fernandez and K. Winkler, "Modeling SNA Networks using the Structure of Management Information," IEEE Communications (May 1993).
21. C. Strutt, "Distribution in an Enterprise Management Director," IFIP Proceedings of the Third Symposium on Integrated Network Management (April 1993): 223-234.

Recent Digital U.S. Patents

The following patents were recently issued to Digital Equipment Corporation. Titles and names supplied to us by the U.S. Patent and Trademark Office are reproduced exactly as they appear on the original published patent.

5,117,352; L. H. Falek; Mechanism for Fail-Over Notification
5,119,043; R. W. Brown, M. D. Leis, and E. C. Simmons; Auto-Centered Phase-Locked Loop
5,119,402; S. A. Ginzburg and J. M. Rieger; Method and Apparatus for Transmission of Local Area Network Signals over Unshielded Twisted Pairs
5,119,465; M. L. Jack and R. T. Gumbel; System for Selectively Converting Plurality of Source Data Structures through Corresponding Source Intermediate Structures, and Target Intermediate Structures into Selected Target Structure
5,119,483; W. C. Madden, D. E. Sanders, G. M. Uhler, and W. R. Wheeler; Application of State Silos for Recovery from Memory Management Exceptions
5,119,484; T. F. Fox; Selections between Alternate Control Word and Current Instruction Generated Control Word for ALU in Respond to ALU Output and Current Instruction
5,120,603; P. H. Schmidt; Magneto-Optic Recording Medium with Oriented Langmuir Blodgett Protective Layer
5,121,085; R. W. Brown; Dual-Charge-Pump Bandwidth-Switched Phase-Locked-Loop
5,121,260; G. J. Asakawa, R. Y. Noguchi, and J. Rinaldis; Read Channel Optimization System
5,121,382; H. S. Yang, M. W. Carrafiello, W. Hawe, and R. W. Graham; Station-to-Station Full Duplex Communication in a Communications Network
5,123,091; B. E. Newman; Data Processing System and Method for Packetizing Data from Peripherals.
5,123,306; N. S. Saunders and D. J. Moretti; Pin Pulling Tool
5,125,083; D. B. Fite, T. Fossum, R. C. Hetherington, J. E. Murray, Jr., and D. A. Webb; Method and Apparatus for Resolving a Variable Number of Potential Memory Access Conflicts in a Pipelined Computer System
5,125,086; F. L. Perazzoli, Jr.; Virtual Memory Paging Apparatus with Variable Size In-Page Clusters
5,126,964; J. H. Zurawski; High Performance Bit-Sliced Multiplier Circuit
5,127,006; K. Subramanian and M. A. Billmers; Fault Diagnostic System
5,136,700; C. P. Thacker; Apparatus and Method for Reducing Interference in Two-Level Cache Memories
5,150,197; W. R. Hamburgen; Die Attach Structure and Method
5,150,360; R. J. Perlman, W. R. Hawe, and A. G. Lauck; Utilization of Redundant Links in Bridges Networks
5,161,193; B. T. Lampson, W. R. Hawe, A. Gupta, and B. A. Spinney; Pipelined Cryptography Processor and Method for its Use in Communication Networks
5,179,577; N. Ilyadis; Dynamic Threshold Data Receiver for Local Area Networks
5,185,537; T. Creedon, J. Nolan, and E. O'Neill; Gate Efficient Digital Glitch Filter for Multiple Input Applications
5,193,151; R. Jain; Delay-Based Congestion Avoidance in Computer Networks
5,195,181; S. Bryant and M. Seaman; Message Processing System Having Separate Message Receiving and Transmitting Processors with Message Processing Being Distributed Between the Separate Processors

Referees, April 1992 to December 1992

The editors acknowledge and thank the referees who have participated in a peer review of the papers submitted for publication in the Digital Technical Journal. The referees' detailed reports have helped ensure that papers published in the Journal offer relevant and informative discussions of computer technologies and products. The referees are computer science and engineering professionals from academia and industry, including Digital's consulting engineers.
Anant Agarwal, Massachusetts Institute of Technology
Brian Allison, Digital
Paul Beck, Digital
Lisa Bender, Digital
Brian Bershad, Carnegie-Mellon University
Dileep Bhandarkar, Digital
Meyer Billmers, Digital
Verell Boaen, Digital
Scott Bradner, Harvard University
Bevin Brett, Digital
Preston Briggs, Rice University
Dean Brock, University of North Carolina
Mark R. Brown, Digital
Randal E. Bryant, Carnegie-Mellon University
Lyman Chapin, Bolt, Beranek and Newman
John DiMarco, University of Toronto
James Duckworth, Worcester Polytechnic Institute
Hugh Durdan, Digital
Philip Enslow, Georgia Institute of Technology
Deborah Estrin, University of Southern California
Len Fehskens, Digital
David Fenwick, Digital
David Fite, Digital
John Forecast, Digital
Tryggve Fossum, Digital
Mark S. Fox, University of Toronto
Rodney Gamache, Digital
Rick Gillett, Digital
Michael Greenwald, Stanford University
Stephen Greenwood, Digital
James Grochmal, Digital
Robert Hagens, University of Wisconsin
Alf Hansen, Sintef
Steve Hardcastle-Kille, Isode
John Hauser, University of California
Bill Herrick, Digital
Hai Huang, Digital
Raj Jain, Digital
Ashok Joshi, Digital
Alberto Leon-Garcia, University of Toronto
Jeff Kalb, Maspar Computer Corporation
Kim Kappel, Georgia Institute of Technology
Paul Kinzelman, Digital
James Kirkley, Digital
Jeffery Kuskin, Stanford University
Paul Kyzivat, Digital
Mike Leary, Digital
Ian Leslie, University of Cambridge
Tom Levergood, Digital
David Lomet, Digital
Frank McCabe, Digital
John McDermott, Digital
Paul McJones, Digital
William Michalson, Worcester Polytechnic Institute
Peter Mierswa, Digital
Charles Mitchell, Digital
David Mitton, Digital
Fanya Montalvo, Digital
J. Eliot Moss, University of Massachusetts
Trevor Mudge, University of Michigan
Bill Noyce, Digital
Dave Patterson, University of California
Larry Peterson, University of Arizona
David Piscitello, Bellcore
George Polyzos, University of California
Brian Porter, Digital
James J. Quinn, Digital
Farshad Rafii, Babson College
Hemant Rotithor, Worcester Polytechnic Institute
Paul Rubinfeld, Digital
Peter Savage, Digital
Michael Schroeder, Digital
Will Sherwood, Digital
Robert Simcoe, Digital
Richard Sites, Digital
Richard Stockdale, Digital
David Stone, Digital
Joseph Tardo, Digital
Bob Taylor, Digital
Mike Uhler, Digital
Jake VanNoy, Digital
Wolf-Dietrich Weber, Stanford University
Kathrin Winkler, Digital

ISSN 0898-901X Printed in U.S.A. EY-M770E-DP/93 05 02 18.0 Copyright © Digital Equipment Corporation. All Rights Reserved.