Apple MacOSXServer Xgrid Administration And High Performance Computing User Manual Mac OSXServerv10.5 Administrationand Admin HPC V10.5
User Manual: Apple MacOSXServer MacOSXServerv10.5-XgridAdministrationandHighPerformanceComputing
Open the PDF directly: View PDF .
Page Count: 124
Download | |
Open PDF In Browser | View PDF |
Mac OS X Server Xgrid Administration and High Performance Computing For Version 10.5 Leopard Apple Inc. © 2007 Apple Inc. All rights reserved. The owner or authorized user of a valid copy of Mac OS X Server software may reproduce this publication for the purpose of learning to use such software. No part of this publication may be reproduced or transmitted for commercial purposes, such as selling copies of this publication or for providing paid-for support services. Every effort has been made to ensure that the information in this manual is accurate. Apple Inc. is not responsible for printing or clerical errors. Apple 1 Infinite Loop Cupertino, CA 95014-2084 408-996-1010 www.apple.com Use of the “keyboard” Apple logo (Option-Shift-K) for commercial purposes without the prior written consent of Apple may constitute trademark infringement and unfair competition in violation of federal and state laws. AirPort, Apple, the Apple logo, Bonjour, FireWire, iPod, Mac, Macintosh, Mac OS, Xgrid, Xsan, and Xserve are trademarks of Apple Inc., registered in the U.S. and other countries. Apple Remote Desktop and Finder are trademarks of Apple Inc. Intel, Intel Core, and Xeon are trademarks of Intel Corp. in the U.S. and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. UNIX is a registered trademark of The Open Group. Other company and product names mentioned herein are trademarks of their respective companies. Mention of third-party products is for informational purposes only and constitutes neither an endorsement nor a recommendation. Apple assumes no responsibility with regard to the performance or use of these products. 019-0946/2007-09-01 1 Preface 9 9 9 10 10 11 12 12 13 13 Part I Contents About This Guide What’s New in Xgrid Administration What’s in This Guide Using This Guide Using Onscreen Help Advanced Server Administration Guides Viewing PDF Guides on Screen Printing PDF Guides Getting Documentation Updates Getting Additional Information Xgrid Administration Chapter 1 17 17 18 20 20 21 21 22 23 23 24 24 24 Introducing Xgrid Service About Xgrid and Computational Grids How Xgrid Works Common Types of Grids and Grid Computing Styles Xgrid Clusters Local Grids Distributed Grids Xgrid Components Agent Client Controller Jobs Requirements and Capacities Chapter 2 25 25 26 26 27 Setting Up and Configuring Xgrid Service Setup Overview Before Setting Up Xgrid Service Authentication Methods for Xgrid Single Sign-On (SSO) 3 Chapter 3 4 27 27 28 28 28 29 29 30 30 30 31 32 33 34 34 34 35 35 36 37 37 37 38 Password-Based Authentication No Authentication Hosting the Grid Controller Turning Xgrid Service On Configuring Xgrid with the Xgrid Service Configuration Assistant Configuring Xgrid to Host a Grid Using the Xgrid Service Configuration Assistant Configuring Xgrid to Join a Grid Using Xgrid Service Configuration Assistant Setting Up Xgrid Service Xgrid and Multiple Network Interfaces Configuring Controller Settings Starting Xgrid Service Configuring an Xgrid Agent (Mac OS X Server) Configuring an Xgrid Agent (Mac OS X) Setting Up Grid Authentication Setting Up Kerberos for Xgrid Setting Passwords for Xgrid Managing Client Access Setting SACL Permissions for Users and Groups Setting SACL Permissions for Administrators Managing Xgrid Service Viewing Xgrid Service Status Viewing Xgrid Service Logs Stopping Xgrid Service 39 39 40 40 40 41 41 41 42 43 43 44 44 44 44 45 45 45 46 Managing a Grid Using Xgrid Admin Status Indicators in Xgrid Admin Managing the Xgrid Controller Connecting to an Xgrid Controller Disconnecting from an Xgrid Controller Adding an Xgrid Controller Removing an Xgrid Controller Managing Agents Viewing a List of Agents Adding an Agent Deleting an Agent Managing Jobs Viewing a List of Jobs Stopping a Job Repeating or Restarting a Job Deleting a Job Adding a Grid Deleting a Grid Contents 46 Monitoring Grid Activity Chapter 4 47 47 47 48 48 48 49 49 Planning and Submitting Xgrid Jobs Structuring Jobs for Xgrid About Job Styles About Job Failure Submitting a Job Examples of Xgrid Job Submission and Results Retrieval Viewing Job Status Retrieving Job Results Chapter 5 51 51 51 52 52 53 53 53 53 54 55 Solving Xgrid Problems If Your Agents Can’t Connect to the Xgrid Controller If You Use Xgrid over SSH If You Run Tasks on Multi-CPU Machines If You Submit a Large Number of Jobs If You Want to Use Xgrid on Other Platforms If the Xgrid Controller Must Be Restarted If Xgrid Has Crashed If You Are Trying to Submit Jobs over 2 GB If You Want to Enable Kerberos/SSO for Xgrid For More Information Part II Configuring High Performance Computing Chapter 6 59 59 59 60 60 60 62 Introducing High Performance Computing Understanding HPC Apple and HPC Mac OS X Server Xserve Clusters Xserve 64-Bit Architecture Support of Loosely Coupled Computations Chapter 7 63 64 Reviewing the Cluster Setup Process Cluster Setup Overview Chapter 8 67 67 67 67 68 68 72 Identifying Prerequisites and System Requirements Prerequisites Expertise Xserve Configuration System Requirements Infrastructure Requirements Software Requirements Contents 5 6 72 73 Private Network Requirements Static IP Address and Hostname Requirements Chapter 9 75 75 78 Preparing the Cluster for Configuration Preparing the Cluster Nodes for Software Configuration (Optional) Setting Up the Management Computer Chapter 10 81 81 84 85 86 86 87 88 90 90 90 91 92 93 94 Setting Up the Cluster Controller Setting Up Server Software on the Cluster Controller Configuring DNS Service Verifying DNS Settings Configuring Open Directory Service Configuring the Cluster Controller as an Open Directory Master Configuring DHCP Service Configuring Firewall Settings on the Cluster Controller Configuring NAT Settings on the Cluster Controller Configuring NFS Configuring VPN Service Configuring Xgrid Service Preparing the Data Drive as a Mirrored RAID set Creating a Home Directory Automount Share Point Creating User Accounts Chapter 11 95 95 98 98 99 101 101 102 Setting Up Compute Nodes Creating an Auto Server Setup Record for Compute Nodes Verifying LDAP Record Creation Setting Up Compute Nodes Configuring Cluster Nodes Creating and Verifying a VPN Connection Joining a Remote Client to the Kerberos Realm Verifying Remote Client Access to the Kerberos Realm Chapter 12 103 103 104 105 106 Testing Your Cluster Checking Your Cluster Using Xgrid Admin Testing Your Xgrid Cluster Verifying Your Xgrid Configuration Verifying Your SSH Connection Appendix A 107 Cluster Setup Checklist Appendix B 111 111 112 112 Automating Compute Node Configuration Naming Multiple Cluster Nodes Joining Multiple Cluster Nodes to the Kerberos Realm Configuring Xgrid Agent Settings Using Apple Remote Desktop Contents 114 Glossary 115 Index 121 Using SSH Without Passwords Contents 7 8 Contents Preface About This Guide This guide describes the Xgrid components included in Mac OS X Server and tells you how to configure and use them in computational grids. Xgrid in Mac OS X Server version 10.5 includes a controller for computational grids and an agent that allows the server’s processor to work on jobs submitted to a grid. The agent is also available in computers using Mac OS X v10.3 or v10.4. What’s New in Xgrid Administration Xgrid service, Xgrid Admin, and high performance computing (HPC) in Mac OS X Server v10.5 Leopard include the following valuable new features.  Improved security with Xgrid superuser access controls  New Xgrid service configuration assistant  Logging improvements What’s in This Guide This guide is organized as follows:  Part I—Xgrid Administration. The chapters in this part of the guide introduce you to Xgrid service and the applications and tools available for administering xgrid.  Part II—Configuring High Performance Computing. The chapters in this part of the guide introduce you to HPC and the applications and tools available for administering HPC. Note: Because Apple frequently releases new versions and updates to its software, images shown in this book may be different from what you see on your screen. 9 Using This Guide The following list contains suggestions for using this guide:  Read the guide in its entirety. Subsequent sections might build on information and recommendations discussed in prior sections.  The instructions in this guide should always be tested in a nonoperational environment before deployment. This nonoperational environment should simulate, as much as possible, the environment where the computer will be deployed. Using Onscreen Help You can get task instructions on screen in Help Viewer while you’re managing Leopard Server. You can view help on a server or an administrator computer. (An administrator computer is a Mac OS X computer with Leopard Server administration software installed on it.) To get help for an advanced configuration of Leopard Server: m Open Server Admin or Workgroup Manager and then:  Use the Help menu to search for a task you want to perform.  Choose Help > Server Admin or Help > Workgroup Manager to browse and search the help topics. The help for Server Admin and Workgroup Manager contains instructions taken from Server Administration and other advanced administration guides described in “Advanced Server Administration Guides,” next. To see the latest server help topics: m Make sure the server or administrator computer is connected to the Internet while you’re getting help. Help Viewer automatically retrieves and caches the latest server help topics from the Internet. When not connected to the Internet, Help Viewer displays cached help topics. 10 Preface About This Guide Advanced Server Administration Guides Getting Started covers basic installation and initial setup methods for a standard, workgroup, or advanced configuration of Leopard Server. An advanced guide, Server Administration, covers advanced planning, installation, setup, and more. A suite of additional guides, listed below, covers advanced planning, setup, and management of individual services. You can get these guides in PDF format from the Mac OS X Server documentation website at www.apple.com/server/documentation. This guide ... tells you how to: Getting Started and Mac OS X Server Worksheet Install Mac OS X Server and set it up for the first time. Command-Line Administration Install, set up, and manage Mac OS X Server using UNIX commandline tools and configuration files. File Services Administration Share selected server volumes or folders among server clients using the AFP, NFS, FTP, and SMB/CIFS protocols. iCal Service Administration Set up and manage iCal shared calendar service. iChat Service Administration Set up and manage iChat instant messaging service. Mac OS X Security Configuration Make Mac OS X computers (clients) more secure, as required by enterprise and government customers. Mac OS X Server Security Configuration Make Mac OS X Server and the computer it’s installed on more secure, as required by enterprise and government customers. Mail Service Administration Set up and manage IMAP, POP, and SMTP mail services on the server. Network Services Administration Set up, configure, and administer DHCP, DNS, VPN, NTP, IP firewall, NAT, and RADIUS services on the server. Open Directory Administration Set up and manage directory and authentication services, and configure clients to access directory services. Podcast Producer Administration Set up and manage Podcast Producer service to record, process, and distribute podcasts. Print Service Administration Host shared printers and manage their associated queues and print jobs. QuickTime Streaming and Broadcasting Administration Capture and encode QuickTime content. Set up and manage QuickTime streaming service to deliver media streams live or on demand. Server Administration Perform advanced installation and setup of server software, and manage options that apply to multiple services or to the server as a whole. System Imaging and Software Update Administration Use NetBoot, NetInstall, and Software Update to automate the management of operating system and other software used by client computers. Upgrading and Migrating Use data and service settings from an earlier version of Mac OS X Server or Windows NT. User Management Create and manage user accounts, groups, and computers. Set up managed preferences for Mac OS X clients. Preface About This Guide 11 This guide ... tells you how to: Web Technologies Administration Set up and manage web technologies, including web, blog, webmail, wiki, MySQL, PHP, Ruby on Rails, and WebDAV. Xgrid Administration and High Performance Computing Set up and manage computational clusters of Xserve systems and Mac computers. Mac OS X Server Glossary Learn about terms used for server and storage products. Viewing PDF Guides on Screen While reading the PDF version of a guide on screen:  Show bookmarks to see the guide’s outline, and click a bookmark to jump to the corresponding section.  Search for a word or phrase to see a list of places where it appears in the document. Click a listed place to see the page where it occurs.  Click a cross-reference to jump to the referenced section. Click a web link to visit the website in your browser. Printing PDF Guides If you want to print a guide:  Save ink or toner by not printing the cover page.  Save color ink on a color printer by looking in the panes of the Print dialog for an option to print in grays or black and white.  Maximize the printed page image by changing the Scale setting in the Page Setup dialog. Try 122% with Paper Size set to US Letter. (PDF pages are 7.5 by 9 inches except Getting Started, which is CD size, 125 by 125 mm.)  Reduce the bulk of the printed document and save paper by printing more than one page per sheet of paper. In the Print dialog, choose Layout from the untitled pop-up menu. If your printer supports two-sided (duplex) printing, select one of the TwoSided options. Otherwise, choose 2 from the Pages per Sheet pop-up menu, and optionally choose Single Hairline from the Border menu. 12 Preface About This Guide Getting Documentation Updates Periodically, Apple posts revised help pages and new editions of guides. Some revised help pages update the latest editions of the guides.  To view new onscreen help topics for a server application, make sure your server or administrator computer is connected to the Internet and click “Latest help topics” or “Staying current” in the main help page for the application.  To download the latest guides in PDF format, go to the Mac OS X Server documentation website: www.apple.com/server/documentation Getting Additional Information For more information, consult these resources:  Read Me documents—important updates and special information. Look for them on the server discs.  Mac OS X Server website (www.apple.com/macosx/server)—gateway to extensive product and technology information.  Apple Service & Support website (www.apple.com/support)—access to hundreds of articles from Apple’s support organization.  Apple customer training (train.apple.com)—instructor-led and self-paced courses for honing your server administration skills.  Apple discussion groups (discussions.info.apple.com)—a way to share questions, knowledge, and advice with other administrators.  Apple mailing list directory (www.lists.apple.com)—subscribe to mailing lists so you can communicate with other administrators using email.  Open Source website (developer.apple.com/darwin/)—Access to Darwin open source code, developer information, and FAQs. Preface About This Guide 13 14 Preface About This Guide Part I: Xgrid Administration I Use the chapters in this part of the guide to learn about Xgrid service and the applications and tools available for administering Xgrid. Chapter 1 Introducing Xgrid Service Chapter 2 Setting Up and Configuring Xgrid Service Chapter 3 Managing a Grid Chapter 4 Planning and Submitting Xgrid Jobs Chapter 5 Solving Xgrid Problems 1 Introducing Xgrid Service 1 Use this chapter to learn about what Xgrid is and how it can help you. You use Xgrid to create grids of multiple computers and distribute complex jobs among them for high-throughput computing. Xgrid, a technology in Mac OS X Server and Mac OS X, simplifies deployment and management of computational grids. Xgrid enables administrators to group computers in grids or clusters, and enables users to easily submit complex computations to groups of computers (local, remote, or both), as either an ad hoc grid or a centrally managed cluster. About Xgrid and Computational Grids Xgrid makes it easy to turn an ad hoc group of Mac systems into a low-cost supercomputer. Xgrid is ideal for individual researchers, specialized collaborators, and application developers. For example:  Scientists can search biological databases on a cluster of Xserve systems.  Engineers can perform finite element analyses on their workgroup’s desktops.  Animators can render images using Mac systems across multiple corporate locations.  Research teams can enlist colleagues and interested laypeople in Internet-scale volunteer grids to perform long-running scientific calculations.  Anyone needing to perform CPU-intensive calculations can simultaneously run a single job across multiple computers, dramatically improving throughput and responsiveness. With Xgrid functionality integrated into Mac OS X Server, system administrators can quickly enable Xgrid on Mac systems throughout their company, turning idle CPU cycles into a productive cluster at no incremental cost. 17 How Xgrid Works Xgrid creates multiple tasks for each job and distributes those tasks among multiple nodes. These nodes can be desktop computers running Mac OS X v10.3 or later, or server computers running Mac OS X Server v10.4 or later. Many desktop computers sit idle during the day, in evenings, and on weekends. The assembly of these systems into a computational grid is known as desktop recovery. This method of grid construction enables you to vastly improve your computational capacity without purchasing extra hardware, and Xgrid makes the software configuration a straightforward task. For a server to function as a controller, Xgrid requires Mac OS X Server v10.4 or later, with a minimum of 256 MB of RAM. To operate as an agent in a grid, Xgrid requires Mac OS X v10.3 or later, with a minimum of 128 MB of RAM (256 MB advisable). All Xgrid participants must have a network connection. As always, the more RAM a system has, the better it performs, particularly for high-performance computing applications. A grid is a group of computers working together to solve a single problem. The systems in a grid can be loosely coupled, geographically dispersed and, to some extent, heterogeneous. In contrast, systems in a cluster are often homogeneous, collocated, and strictly managed. Highly dispersed grids, such as SETI@Home, enable individuals to donate their spare processor cycles to a cause. In office environments, large rendering or simulation jobs can be distributed across all the systems left idle overnight. These can even be used to augment a dedicated computational cluster, which is available to Xgrid clients at all times. These distinct grid configurations are explained in “Common Types of Grids and Grid Computing Styles” on page 20. 18 Chapter 1 Introducing Xgrid Service The illustration below gives an example of how a grid handles a job. Distributed agents 1 Client submits job to Controller 2 Controller splits job into tasks, then submits tasks to Agents 3 Agents execute tasks Dedicated Desktop Controller Client Dedicated Server 5 Controller collects tasks and returns job results to Client 4 Agents return tasks to Controller Part-Time Desktop Xgrid has no limitations on the amount of computational power it can support. The performance of the grid depend on the systems participating, the software running, and the network, among other factors. However, individual applications strongly influence the performance of the grid. You determine if an application is improved by being deployed on a computational grid. In the best case, application performance may scale linearly with the size of the grid. In the worst case, the addition of agents to a grid can cause a job to complete in even more time than if there were fewer agents. (In such a situation, tasks become so small that the overhead associated with distributing the increased number of tasks supersedes the performance gain of using more agents.) You should be aware of these considerations. Many proprietary projects enable you to participate in a large computational grid. Often these projects, such as SETI@Home and FightAIDS@Home, are tied to a specific scientific purpose. They usually have easy-to-install software that enables any volunteer to participate in that particular project, and they frequently take the form of a screen saver or background process. Chapter 1 Introducing Xgrid Service 19 You don’t need to think in terms of thousands or millions of seldom-used computers to see the significance of a computational grid. For example, computers used by university students or corporate employees often work fewer hours than the hours they sit idle at night or on weekends. These computers could contribute productively to the work of a grid without diminishing their usefulness to the students or employees. Other grid projects are designed for large-scale computational grids, such as the Globus Alliance (a group founded by universities and researchers), with flexible resource management tools and more intelligent grid deployment methods. Instead of developing neatly packaged applications for a specific grid, such projects provide comprehensive frameworks for application deployment. Xgrid enables users to participate in a computational grid of their choice while still providing the flexibility of a more generic framework for grid developers when deploying grid applications. Xgrid provides the primary benefits of both. The advantages of the Xgrid technology include:  Easy grid configuration and deployment  Straightforward yet flexible job submission  Automatic controller discovery by agents and clients  Flexible architecture based on open standards  Support for the UNIX security model, including Kerberos single sign-on or regular password authentication  Choice between a command-line interface or an API-based model for grid interaction Common Types of Grids and Grid Computing Styles Xgrid can be used in tightly coupled clusters, worldwide grids, and everything in between. This immense flexibility enables you to deploy grids of almost any nature. Three main topologies are commonly used for Xgrid deployments, discussed as follows:  “Xgrid Clusters” on page 20  “Local Grids” on page 21  “Distributed Grids” on page 21 Xgrid Clusters Computational clusters are sets of systems dedicated to computation. In a cluster, systems are typically co-located in a rack, connected using gigabit Ethernet or another high-performance network, and strictly managed for maximum performance. Cluster systems are often entirely homogeneous: their operating systems are the same versions, they have the same software installed, and they generally have the same processor, disk, and RAM configurations. 20 Chapter 1 Introducing Xgrid Service Xgrid enables administrators to easily configure the distributed resource management functionality of the cluster. Each server in the system runs the agent software, and the head node in the cluster runs the controller software. Xgrid distributes tasks across the cluster. In clusters, failure rates are generally very low. Systems are rarely, if ever, offline, and their resources are not shared with general user tasks. Clusters are the most efficient but most expensive model of distributed computing. Local Grids Systems that are under common administration in a company, university computer lab, or other managed environment can often be easily assembled into a grid for desktop recovery. These systems are often on a local area network (LAN) and they are generally managed by a single organization. As a result, they provide good network performance and offer substantial manageability. Because these systems are often also used as day-to-day workstations, users can easily interrupt grid tasks by moving the mouse, resetting the system, or even accidentally disconnecting the system from the network. In such cases, a task might fail as part of an Xgrid job the Xgrid controller eventually reassigns the failed task to another agent, and the job completes successfully. In local grids, performance is limited by such situations and by the varying performance of any given agent on the grid. Distributed Grids When a system is permitted to donate its time, a distributed grid is formed. The Xgrid agent enables a user to specify any IP address or host name for its controller. By specifying a grid, a user can dedicate his or her CPU time to that grid no matter where the controller is located. The manager of the controller has no direct management control or knowledge of the agent system but is nonetheless able to harness its CPU time. Distributed grids have very high failure rates for jobs but place a very low burden for the grid administrator. With very, very large jobs, high task failure rates may not substantially affect the performance of the grid if such failures can be rapidly reassigned to other available agents. Network performance can also be a consideration because data is sent over the Internet, rather than over a local network, to agents connected to a grid. The monetary cost of such distributed grids is extremely low. Chapter 1 Introducing Xgrid Service 21 Xgrid Components The Xgrid three-tier architecture simplifies the distribution of complicated tasks. Its user clients, grid controllers, and computational agents work together to streamline the process of assembling nodes, submitting jobs, and retrieving results. The illustration below gives an example of the Xgrid components and the process of auto configuration for a grid. Distributed agents 4 Client submits using mDNS, DNS, or name/address 2 Agents locate Controller using mDNS, DNS, or name/address 1 Controller advertises via mDNS Dedicated Desktop Controller Client 5 Clients and Controller mutually authenticate using passwords or single sign-on 3 Agents and Controller mutually authenticate using passwords or single sign-on Dedicated Server Part-time Desktop The primary components of a computational grid perform the following functions:  An agent runs one task at a time per CPU; a dual-processor computer can run two tasks simultaneously.  A controller queues tasks, distributes those tasks to agents, and handles task reassignment.  A client submits jobs to the Xgrid controller in the form of multiple tasks. (A client can be any computer running Mac OS X v10.4 or later or Mac OS X Server v10.4 or later.) In principle, the agent, controller, and client can run on the same server, but it is often more efficient to have a dedicated controller node. 22 Chapter 1 Introducing Xgrid Service Agent Xgrid agents run the computational tasks of a job. In Mac OS X Server, the agent is turned off by default. When an agent is turned on and becomes active at startup, it registers with a controller. (An agent can be connected to only one controller at a time.) The controller sends instructions and data to the agent as needed for the controller’s jobs. After it receives instructions from the controller, the agent performs its assigned tasks and sends the results back to the controller. By default, agents seek to bind to the first available controller on the LAN. Alternatively, you can specify that it bind to a specific controller. You can also specify whether an agent is always available or is available only when the computer is idle. A computer is considered idle when it has no mouse or keyboard input and ignores CPU and network activity. If a user returns to a computer that is running a grid task, the computer continues to run the task until it is finished. By default, the agent on Mac OS X Server is dedicated and the agent on a Mac OS X computer (not a server) is configured to accept tasks only when the computer has had no user input for 15 minutes. For details about configuring an agent, see “Configuring an Xgrid Agent (Mac OS X Server)” on page 32. For information about managing agents, see “Managing Agents” on page 42. Client Any system can be an Xgrid client if it is running Mac OS X v10.4 or later and has a network connection to the Xgrid controller system. In general, the client can connect to only a single controller. Depending on how a controller is configured, the client must supply a password or be authenticated by Kerberos (single sign-on) before submitting a job to the grid. A user submits a job to the controller from a system running the Xgrid client software, usually a command-line tool accessed with the Terminal application. The job can specify the controller or use multicast DNS (mDNS) to dynamically discover the first available controller. When the job is complete, the controller notifies the client and the client can retrieve the results of the job. For information about client authentication to the controller, see “Setting Up Grid Authentication” on page 34. Chapter 1 Introducing Xgrid Service 23 Controller The Xgrid controller manages the communications among the computational resources of a grid. The controller requires Mac OS X Server v10.4 or later. The controller accepts network connections from clients and agents. It receives job submissions from clients, divides the jobs into tasks, dispatches tasks to agents, and returns results to the clients. Although there can be more than one Xgrid controller running on a subnet, there can only be one controller per logical grid. Each controller can have an arbitrary number of agents connected, but Apple has tested 128 agents per controller. However, there is no software limitation on the number of agents, and users of Xgrid can choose to exceed 128 agents on a controller at their own risk, with a theoretical maximum equal to the number of available sockets on the controller system. For details about setting up an Xgrid controller, see “Configuring Controller Settings” on page 30. For information about managing controllers and grids, see “Managing the Xgrid Controller” on page 40. Jobs A job is a collection of execution instructions that can include data and executables. Xgrid can run scripts, utilities, and custom software (anything that doesn’t require user interaction). A client submits a job to the grid. The controller accepts the job and its associated files, divides the job into tasks, and then distributes the tasks to agents. Agents accept the tasks, perform the calculations, and return the results to the controller, which aggregates them and returns them to the clients. For more information about jobs, see “Structuring Jobs for Xgrid” on page 47 and “Submitting a Job” on page 48. Requirements and Capacities Xgrid is designed to scale from small clusters of a few computers up to large organization-wide grids. Xgrid supports up to 128 agents, any number of jobs comprising up to 100,000 queued tasks, up to 128 MB of submitted data per job, and up to 128 MB of results per job. These are recommended limits and are not enforced by the software. You may choose to exceed these limits at your own risk. 24 Chapter 1 Introducing Xgrid Service 2 Setting Up and Configuring Xgrid Service 2 Use this chapter to plan your grid and set up the Xgrid agent and controller. Xgrid simplifies deployment and management of computational grids. Using Server Admin you can configure Xgrid to set up computer groups (grids or clusters) and allow users to easily submit complex computations to these grids (local, remote, or both), as either an ad hoc grid or a centrally managed cluster. Setup Overview Here is an overview of the steps for setting up Xgrid service: Step 1: Before you begin See “Before Setting Up Xgrid Service” on page 26. Identify the Xgrid environment you need. Before configuring Xgrid, you must make some decisions about the grid. Step 2: Turn Xgrid service on Prior to configuring, turn on Xgrid service. See “Turning Xgrid Service On” on page 28. Step 3: (Optional) Use the Xgrid service configuration assistant to configure Xgrid If you choose to, you can configure Xgrid using the Xgrid service configuration assistant. This assistant helps with Xgrid configuration by automating many of the settings you make. See “Configuring Xgrid with the Xgrid Service Configuration Assistant” on page 28. Step 4: Configure Xgrid controller settings Configure your server as an Xgrid controller using Server Admin. See “Configuring Controller Settings” on page 30. Step 5: Start Xgrid service Start Xgrid service on the server using Server Admin. See “Starting Xgrid Service” on page 31. 25 Step 6: Configure Xgrid agent settings (Mac OS X Server) Configure your server as an Xgrid agent using Server Admin. See “Configuring an Xgrid Agent (Mac OS X Server)” on page 32. Step 7: Configuring Xgrid agent settings (Mac OS X) Configure computers as Xgrid agents by using Sharing Preferences. See “Configuring an Xgrid Agent (Mac OS X)” on page 33. Before Setting Up Xgrid Service Before configuring Xgrid service, you must define the grid environment you’ll create. In particular, you must decide the following:  The kind of authentication to use. See “Authentication Methods for Xgrid” on page 26.  Where to host your controller. See “Hosting the Grid Controller” on page 28.  How you will manage the controller. See “Managing Xgrid Service” on page 37 and “Monitoring Grid Activity” on page 46. Authentication Methods for Xgrid You can configure Xgrid with or without authentication. If you choose to require authentication of controllers to mutually authenticate with clients and agents, you can choose Single Sign-On or Password-Based Authentication. The following authentication options are available:  Single Sign-On  Password-Based Authentication  No Authentication You set up an Xgrid controller using Server Admin. You can specify the type of authentication for agents and clients. The passwords entered in Server Admin for the controller must match those entered for each agent and client. Consider these points when establishing passwords for agents and clients:  Kerberos authentication (single sign-on or SSO). If you use Kerberos authentication for agents or clients, the server that’s the Xgrid controller must be configured for Kerberos, in the same realm as the server running the Kerberos domain controller (KDC) system, and bound to the Open Directory master. The agent uses the host principal found in the /etc/krb5.keytab file. The controller uses the Xgrid service principal found in the /etc/krb5.keytab file.  Agents. The agent determines the authentication method. The controller must conform to that method and password (if a password is used). When an agent is configured with a standard password (not SSO), you must use the same password for agents when you configure the controller. If the agent has specified SSO, the correct service principal and host principals must be available. 26 Chapter 2 Setting Up and Configuring Xgrid Service  Clients. If your server is the controller for a grid, be sure that Mac OS X and Mac OS X Server clients use the correct authentication method for the controller. A client cannot submit a job to the controller unless the user chooses the correct authentication method and enters their password correctly, or has the correct ticketgranting ticket from Kerberos. For more information, see “Setting Up Grid Authentication” on page 34. Single Sign-On (SSO) SSO is the most powerful and flexible form of authentication. It leverages the Open Directory and Kerberos infrastructures in Mac OS X Server to manage authentication behind the scenes, without user intervention. Each Xgrid participant must have a Kerberos principal. The clients and agents obtain ticket-granting tickets for their principal, which is used to obtain a service ticket for the controller service principal. The controller looks at the ticket granted to the client to determine the user’s principal and verifies it with the relevant service access control lists (SACLs) and groups to determine privileges. Generally, you should use this option if any of the following conditions are true:  You already have SSO in your environment.  You have administrator control over all agents and clients in use.  Jobs must run with special privileges (such as for local, network, or SAN file system access). Password-Based Authentication When you can’t use SSO, you can require password authentication. You may not be able to use SSO if:  Potential Xgrid clients are not trusted by your SSO domain (or you don’t have one)  You want to use agents across the Internet or that are outside your control  It is an ad hoc grid, without the ability to prearrange a web of trust In these situations, your best option is to specify a password. You have two distinct password settings: one for controller-client and one for controller-agent. For security reasons these should be different passwords. Note: You can also create hybrid environments, such as with client-controller authentication done using passwords but controller-agent authentication done using SSO (or vice versa). No Authentication This option is suitable only for testing a private network in a home or a lab that is inaccessible from any untrusted computer, or when none of the jobs or the computers contain sensitive or important information. Chapter 2 Setting Up and Configuring Xgrid Service 27 Otherwise, do not use this option. It creates a potential security hole (because anyone can connect or run a job) and should never be used on a system exposed to the Internet, especially when potentially sensitive data is involved. If you choose to use no authentication, agents can join the grid and clients can submit jobs to the grid without authenticating. Hosting the Grid Controller The primary requirement for a controller is that it must be network-accessible to clients and agents. In some cases this may mean the controller must be placed outside an organizational firewall (or inside a buffer zone); otherwise, you would need to open up port 4111 so the controller can be contacted. It is much simpler (though not essential) for the controller to be on the same subnet as the agents and usual clients, so they can discover each other using Bonjour. If that’s not feasible, host the controller on a server with a fixed IP address and fully qualified DNS name (or alternatively, using Dynamic DNS and a service lookup entry) so that agents and clients know where to find it. Turning Xgrid Service On Before you can configure Xgrid settings, you must turn Xgrid service on in Server Admin. To turn Xgrid service on: 1 Open Server Admin and connect to the server. 2 Click Settings. 3 Click Services. 4 Select the Xgrid checkbox. 5 Click Save. Configuring Xgrid with the Xgrid Service Configuration Assistant You can set up Xgrid service by configuring the controller and agent using the Xgrid service configuration assistant. This optional configuration assistant guides you through setting up a server to host a grid or join an existing grid. Before this assistant proceeds, your server must have access to a directory server that provides Kerberos services. 28 Chapter 2 Setting Up and Configuring Xgrid Service Configuring Xgrid to Host a Grid Using the Xgrid Service Configuration Assistant Use the Xgrid service configuration assistant to configure the Xgrid agent and controller to run on this server. This also configures a network file system. To set up Xgrid to host a grid using the Xgrid service configuration assistant: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click Overview. 5 Click Configure Xgrid Service (at the lower right). This opens the Xgrid service configuration assistant. 6 Click Continue. 7 Choose “Host a grid,” then click Continue. 8 Enter the username and password for the directory administrator to authenticate with the directory domain displayed, then click Continue. 9 Review and confirm your configuration settings, then click Continue. This restarts Xgrid service using your settings. 10 Click Close. Configuring Xgrid to Join a Grid Using Xgrid Service Configuration Assistant Use the Xgrid service configuration assistant to configure the Xgrid agent to run on this server. Joining a grid means that an agent is set up on this server and is bound to an existing controller. To join a grid using the Xgrid service configuration assistant: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click Overview. 5 Click Configure Xgrid Service (at the lower right). This opens the Xgrid service configuration assistant. 6 Click Continue. 7 Choose “Join a grid,” then click Continue. Chapter 2 Setting Up and Configuring Xgrid Service 29 8 Specify the controller you want to bind your agent to. Select “Browse Bonjour-discoverable controllers” to view and select from available controllers. Select “Use controller with hostname” to enter the hostname of a specific controller. 9 Click Continue. 10 Review and confirm your configuration settings, then click Continue. This restarts Xgrid service using your settings. 11 Click Close. Setting Up Xgrid Service You set up Xgrid service by configuring two groups of settings on the Settings pane for Xgrid service in Server Admin:  Controller. Use to configure your server as an Xgrid controller and set client and agent authentication.  Agent. Use to configure your server as an Xgrid agent, to specify the controller, and to set controller authentication. The following section describes how to configure these settings. An additional section tells you how to start Xgrid service when you finish. (By default, the Xgrid controller and agent are disabled.) Important: If you specify a password, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For information about authentication options, see “Setting Passwords for Xgrid” on page 34. Xgrid and Multiple Network Interfaces On a server with multiple network interfaces, Mac OS X Server makes Xgrid service available over all interfaces. You can’t configure Xgrid service separately for each interface. Configuring Controller Settings You use Server Admin to configure an Xgrid controller. When configuring the controller, you can also set a password for any agent using the grid and for any client that submits a job to the grid. To configure an Xgrid controller: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 30 Chapter 2 Setting Up and Configuring Xgrid Service 4 Click Settings. 5 Click Controller. 6 Click “Enable controller service.” 7 From the Client Authentication pop-up menu, choose one of the following authentication options for clients and enter the password.  Password requires that the agent and controller use the same password.  Kerberos uses SSO authentication for the agent’s administrator.  None does not require a password for the agent. This option provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data. For details about password options, see “Setting Up Grid Authentication” on page 34. 8 From the Agent Authentication pop-up menu, choose from the following authentication options for agents and enter the password.     Password requires that the agent and controller use the same password. Kerberos uses SSO authentication for the agent’s administrator. Any uses any authentication available for the agent’s administrator. None does not require a password for the agent. This option provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data. For information about password options, see “Setting Up Grid Authentication” on page 34. 9 Click Save. Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For information about authentication options, see “Setting Up Grid Authentication” on page 34. Starting Xgrid Service Use Server Admin to start Xgrid service. The Xgrid service must be running for your server to control a grid or participate in a grid as an agent. For details about using the server as an agent and controller, see “Configuring an Xgrid Agent (Mac OS X Server)” on page 32 and “Configuring Controller Settings” on page 30. After you start Xgrid, it restarts when the server is restarted. To start Xgrid service: 1 Open Server Admin and connect to the server. Chapter 2 Setting Up and Configuring Xgrid Service 31 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click the Start Xgrid button (below the Servers list). Configuring an Xgrid Agent (Mac OS X Server) You use Server Admin to set up your server as an Xgrid agent. In addition, you can associate the agent with a specific controller or permit it to join a grid, specify when the agent accepts tasks, and set a password that the controller must recognize. To configure an Xgrid agent on the server: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click Settings. 5 Click Agent. 6 Click “Enable agent service.” 7 Specify a controller by choosing its name in the Controller pop-up menu or by entering the controller name. By default, the agent uses the first available controller. Note: An agent can find a controller in one of three ways: a specific hostname or IP address, the first available controller that advertises on Bonjour on the local subnet, or to a specific Bonjour service name. 8 Specify when the agent will accept tasks. Tasks can be accepted when the computer is idle or always. A computer is considered idle when it has no mouse or keyboard input and ignores CPU and network activity. If a user returns to a computer that is running a grid task, the computer continues to run the task until it is finished. 9 From the pop-up menu, choose one of the following authentication options and enter the password. For details, see “Setting Up Grid Authentication” on page 34.  Password requires that the agent and controller use the same password.  Kerberos uses SSO authentication for the agent’s administrator.  None does not require a password for the agent. This option provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data. 32 Chapter 2 Setting Up and Configuring Xgrid Service 10 Click Save. Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos SSO. For details about authentication option, see “Setting Up Grid Authentication” on page 34. Configuring an Xgrid Agent (Mac OS X) You use Sharing preferences to set up client computers as Xgrid agents. In addition, you can associate the agent with a specific controller or permit it to join any grid, specify when the agent accepts tasks, and set a password that the controller must recognize. To configure an Xgrid agent on a client: 1 On the client computer, open Sharing preferences and click Services. 2 Click Xgrid and then click Configure. 3 Specify a controller by choosing its name in the Controller pop-up menu or by entering the controller name. By default, the agent uses the first available controller. Note: An agent can find a controller in one of three ways: a specific hostname or IP address, the first available controller that advertises on Bonjour on the local subnet, or to a specific Bonjour service name. 4 Specify when the agent will accept tasks. Tasks can be accepted when the computer is idle or always. A computer is considered idle when it has no mouse or keyboard input and ignores CPU and network activity. If a user returns to a computer that is running a grid task, the computer continues to run the task until it is finished. 5 Choose one of the following authentication options from the pop-up menu and enter the password. For more information, see “Setting Up Grid Authentication” on page 34.  Password requires that the agent and controller use the same password.  Kerberos uses SSO authentication for the agent’s administrator.  None does not require a password for the agent. This option provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data. 6 Click OK. Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For more information about authentication options, see “Setting Up Grid Authentication” on page 34. 7 Click Start to turn Xgrid sharing on. Chapter 2 Setting Up and Configuring Xgrid Service 33 Setting Up Grid Authentication You can configure Xgrid to require authentication of controllers, clients, and agents. For more information, see “Authentication Methods for Xgrid” on page 26. Setting Up Kerberos for Xgrid You use Server Admin to configure Kerberos as the authentication method for your Xgrid. Kerberos authentication uses SSO. To configure Kerberos authentication: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 In the expanded Servers list, click Xgrid. 4 Click Settings. 5 Click Agent. 6 Click “Enable agent service.” 7 For the authentication option for the agent, choose Kerberos from the Controller Authentication pop-up menu. 8 Click Controller. 9 Click “Enable controller service.” 10 For the authentication option for the client, choose Kerberos from the Client Authentication pop-up menu. 11 For the authentication option for the agent, choose Kerberos from the Agent Authentication pop-up menu. 12 Click Save and restart the service. Setting Passwords for Xgrid You use Server Admin to configure your Xgrid controllers to authenticate clients and agents using password authentication. Password authentication requires that the agent and controller use the same password. You specify password options in Server Admin as part of configuring the agent and controller. See “Configuring an Xgrid Agent (Mac OS X Server)” on page 32 and “Configuring Controller Settings” on page 30. To configure password authentication: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 34 Chapter 2 Setting Up and Configuring Xgrid Service 3 In the expanded Servers list, click Xgrid. 4 Click Settings. 5 Click Agent. 6 Click “Enable agent service.” 7 For the authentication option for the agent, choose Password from the Controller Authentication pop-up menu and enter a password. 8 Click Controller. 9 Click “Enable controller service.” 10 For the authentication option for the client, choose Password from the Client Authentication pop-up menu and enter a password. 11 For the authentication option for the agent, choose Password from the Agent Authentication pop-up menu and enter a password. You can also choose Any from the Agent Authentication pop-up menu to permit any method of authentication. Note: Password authentication requires that the agent and controller use the same password. 12 Click Save and restart the service. Managing Client Access Server Admin in Mac OS X Server enables you to configure service access control lists (SACLs), which enable you to specify which users and groups have access to Xgrid and which administrators can manage it. Using SACLs enables you to add another layer of access control in addition to password and Kerberos authentication. Only users and groups listed in an SACL have access to its corresponding service. Setting SACL Permissions for Users and Groups You use Server Admin to set SACL permissions for users and groups to access Xgrid service. To set user and group SACL permissions for Xgrid service: 1 Open Server Admin and connect to the server. 2 Click Settings. 3 Click Access. 4 Click Services. 5 Select the level of restriction you want for the services: Chapter 2 Setting Up and Configuring Xgrid Service 35 To restrict access to all services, select “For all services.” To set access permissions for individual services, select “For selected services below,” then select a service from the Service list. 6 To provide unrestricted access to services, click “Allow all users and groups.” 7 To restrict access to users and groups: a Select “Allow only users and groups below.” b Click the Add (+) button to open the Users and Groups drawer. c Drag users and groups from the Users and Groups drawer to the list. 8 Click Save. Setting SACL Permissions for Administrators Use Server Admin to set SACL permissions for administrators to monitor and manage Xgrid service. To set administrator SACL permissions for Xgrid service: 1 Open Server Admin and connect to the server. 2 Click Settings. 3 Click Access. 4 Click Administrators. 5 Select the level of restriction you want for the services: To restrict access to all services, select “For all services.” To set access permissions for individual services, select “For selected services below,” then select a service from the Service list. 6 Open the Users and Groups drawer by clicking the Add (+) button. 7 From the Users and Groups drawer, drag users and groups to the list. 8 Set user permissions: To grant administrator access, choose Administer from the Permission pop-up menu next to the user name. To grant monitoring access, choose Monitor from the Permission pop-up menu next to the user name. 9 Click Save. 36 Chapter 2 Setting Up and Configuring Xgrid Service Managing Xgrid Service This section describes typical day-to-day tasks you might perform after you set up Xgrid service on your server. For information about initial setup, see “Setting Up Xgrid Service” on page 30. You can monitor and manage grids using Xgrid Admin. For more information, see Chapter 3, “Managing a Grid.” Viewing Xgrid Service Status You can use Server Admin to view the status of Xgrid service. To view Xgrid service status: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 From the expanded Servers list, select Xgrid. 4 Click Overview to see whether the service is running, when it started, agent and controller information, the number of jobs running and pending, and the amount of processor power available and used. 5 Click Logs to review the system, controller, and agent logs. Use the View pop-up menu to choose which log to view. Viewing Xgrid Service Logs You can use Server Admin to view the Xgrid system, controller, and agent logs for Xgrid service. To view logs: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 From the expanded Servers list, select Xgrid. 4 Click Logs, then use the Show pop-up menu to choose System Log (Xgrid), Xgrid Controller Log, or Xgrid Agent Log. To search for specific entries, use the filter field above the log. From the Command Line You can also view the Xgrid service log at /var/log/system.log using the cat or tail commands in Terminal. Chapter 2 Setting Up and Configuring Xgrid Service 37 Stopping Xgrid Service You use Server Admin to stop Xgrid service. To stop Xgrid service: 1 Open Server Admin and connect to the server. 2 Click the triangle to the left of the server. The list of services appears. 3 From the expanded Servers list, select Xgrid. 4 Click the Stop Xgrid button (below the Servers list). From the Command Line You can also stop Xgrid service immediately by using the serveradmin command in Terminal. 38 Chapter 2 Setting Up and Configuring Xgrid Service 3 Managing a Grid 3 Use this chapter to learn how to use the Xgrid Admin application to manage grids, add controllers and agents, and work with jobs. After you set up an Xgrid controller, you can use Xgrid Admin to manage a grid. You can use Xgrid Admin on the server or on a remote computer that is running Mac OS X v10.4 or later. You can manage one or more computational grids with Xgrid Admin. A computational grid is a fixed group of agents with a dedicated queue. There can be multiple grids per controller but an agent can belong to only one grid. You cannot move an agent between grids while a job (or a task) is running. Using Xgrid Admin Xgrid Admin is a tool you use to monitor one or more grids and manage agents and jobs. With Xgrid Admin, you can:  Check the status of a grid and its activity, including the number of agents working and available, processing power in use and available, and the number of jobs running and pending  Add or remove controllers and grids to manage  See a list of agents in a grid and the CPU power available and in use for each agent  Add or remove agents in a grid  See a list of jobs in a grid, the date and time each job was submitted, its progress, and the active CPU power for the job  Remove jobs in a grid  Stop a job in progress  Restart a job that was stopped or is complete 39 Xgrid Admin provides controls in its graphical interface and menu commands for all of its options. Note: You can also use the Xgrid command-line tool to perform these tasks. For more information about using the command-line tool, see Chapter 4, “Planning and Submitting Xgrid Jobs.” Status Indicators in Xgrid Admin Xgrid Admin provides status indicators, which are small color bubbles indicating the status of controllers, agents, and jobs. The color indicators are:  Colorless = controller or agent is offline, job is pending  Gray = job is submitting  Green = controller is connected, agent is working, job is running  Yellow = agent is available but not running  Red = agent is unavailable, job is failed or canceled  Blue = job is complete Managing the Xgrid Controller In general, you manage the Xgrid controller like any other service running on Mac OS X Server, using Server Admin to manage which processes are running and using Xgrid Admin to manage the agent and job queues on the controller. The amount of management required also depends on how many queues you have and the number (and temperament) of the users who submit jobs. Xgrid uses a simple first-in, first-out (FIFO) queue for scheduling each grid, which means that as the administrator you must obtain your colleagues’ cooperation to make sure resources are allocated correctly among multiple users. For more information, see the following sections: Connecting to an Xgrid Controller You use Xgrid Admin to connect to an Xgrid controller. The controller must be reachable on any network by the administrative computer running Xgrid Admin. After Xgrid Admin is connected to the controller, you can view the status of its grid and manage its agents and jobs. 40 Chapter 3 Managing a Grid To connect to an Xgrid controller: 1 Open Xgrid Admin and do one of the following:  From the pop-up menu, choose the controller or enter its name and click Connect.  In the Controllers and Grids list, select the controller name and click Connect. 2 If necessary, select the correct authentication option, enter a password, and then click OK. Disconnecting from an Xgrid Controller You use Xgrid Admin to disconnect froman Xgrid controller in the Controllers and Grids list. To disconnect an Xgrid controller: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select a controller. 3 Click Disconnect. Adding an Xgrid Controller You use Xgrid Admin to add an Xgrid controller to the Controllers and Grids list. To add an Xgrid controller to the monitoring list: 1 Open Xgrid Admin. 2 Click Add Controller. 3 From the pop-up menu, choose a controller or enter its name and click Connect. 4 If necessary, select the correct authentication option, enter a password, and then click OK. Removing an Xgrid Controller You can easily remove an Xgrid controller from the Controllers and Grids list in Xgrid Admin. To remove an Xgrid controller: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select a controller. 3 Click Remove Controller. Chapter 3 Managing a Grid 41 Managing Agents Use Xgrid Admin to view, add, or delete agents. Xgrid Admin also uses status indicators to display the status of agents. Although Server Admin provides a simple interface for enabling Xgrid services on one server or across a rack of Xserve systems, it doesn’t provide a way to configure Xgrid on desktop computers running Mac OS X v10.3 or later. If you are relying on volunteers to provide desktop agents, you can send instructions for enabling Xgrid from the Sharing pane of System Preferences. If the volunteers are using Mac OS X v10.3, you must first download the Xgrid Agent for Mac OS X v10.3 and then use the Xgrid pane of System Preferences. You can download the Xgrid Agent for Mac OS X v10.3 from: www.apple.com/server/macosx/xgrid.html If you administer a group of computers and want the computers to participate in a grid using Xgrid, you can use the following methods:  Apple Remote Desktop  SSH  NetBoot or NetInstall Apple Remote Desktop Apple Remote Desktop (ARD) v2.1 is a separate product available from Apple that integrates common administrative tasks across multiple computers (such as screen sharing, software installation, running UNIX scripts, and so on). You can use ARD to remotely run System Preferences on each computer but it is usually simpler to change the preferences once and then push the new preferences file (/Library/Preferences/com.apple.xgrid.agent.plist) to all relevant nodes. For more information, see the Apple Remote Desktop Administration guide at www.apple.com/server/documentation. SSH If you don’t have ARD but you’ve set up SSH logins, you can do the same thing as ARD using the scp command-line tool (or rsync, if you’ve set that up). You can also use the xgridctl tool with the following command: $ ssh root@remotehost xgridctl agent start For more details, see the man pages for SSH, SCP, SFTP, or rsync in the Terminal application. 42 Chapter 3 Managing a Grid NetBoot or Network Install For large networks, it often makes sense to use a common system image that is mounted or installed by each agent to configure the agents. Although Xgrid isn’t reason enough to use NetBoot, consider whether using Network Install would simplify your general administrator’s tasks. If you use Netboot with Xgrid, all agents must have unique hostnames and must keep all files intact between reboots. For more information, see System Imaging and Software Update Administration at www.apple.com/server/documentation. Viewing a List of Agents You can see a list of agents for a controller in Xgrid Admin. To see a list of agents for an Xgrid controller: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the grid. 3 Click Agents. 4 Select an agent in the list to see information about the CPU power and processors it uses. The color bubble to the left of the name shows each agent’s status. For details, see “Status Indicators in Xgrid Admin” on page 40. Adding an Agent You can add an agent to a controller in Xgrid Admin. You can add agents that are offline. The agents will be available to the controller when the computers are online or when the controller administrator makes the agents active. To add an agent: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Agents. 4 Click the Add (+) button below the list of agents. 5 Enter a name for the agent and click OK. The agent is added to the list. The color bubble to the left of the name shows the agent’s status. For details, see “Status Indicators in Xgrid Admin” on page 40. Chapter 3 Managing a Grid 43 Deleting an Agent You can delete an agent for an Xgrid controller in Xgrid Admin. To delete an agent: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Agents. 4 Click the Delete (–) button below the list of agents. Note: If you delete an agent that you know is on the local subnet and is configured to attach to that controller, wait a few moments and it will reappear in the list. If the agent doesn’t reappear, use the Add (+) button and enter its name to retrieve it. Managing Jobs You use Xgrid Admin to manage jobs after they are submitted by a client. You cannot move a job between grids. Viewing a List of Jobs You can see a list of jobs in Xgrid Admin. To see a list of jobs: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Jobs. 4 Select a job in the list to see details of that job. Stopping a Job You can stop a job in Xgrid Admin. To stop a job: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Jobs. 4 Select the job you want to stop. 5 Click the Stop button below the list of jobs. 44 Chapter 3 Managing a Grid Repeating or Restarting a Job You can repeat a job or restart a stopped job in Xgrid Admin. To repeat or restart a job: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Jobs. 4 Select the job you want to repeat or restart. 5 Click the Start button below the list of jobs. Deleting a Job You can delete a job in Xgrid Admin. To delete a job: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the controller. 3 Click Jobs. 4 Select the job you want to delete. 5 Click the Delete (–) button below the list of jobs. Adding a Grid You use Xgrid Admin to add a grid to an Xgrid controller in the Controllers and Grids list. To add a grid: 1 Open Xgrid Admin. 2 Select the Xgrid controller you want to add the grid to. 3 Click the Add (+) button below the Controller and Grids list. 4 In the pop-up menu, enter a name for the new grid and click OK. Chapter 3 Managing a Grid 45 Deleting a Grid You use Xgrid Admin to remove a grid from an Xgrid controller in the Controllers and Grids list. To delete a grid: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the grid. 3 Click the Action pop-up menu below the Controller and Grids list and select Remove Grid. 4 Click OK. Monitoring Grid Activity You can quickly view the activity of a grid in Xgrid Admin. You can also view agents and job activity using Xgrid Admin. For more information, see “Viewing a List of Agents” on page 43 and “Viewing a List of Jobs” on page 44. To monitor the activity of a grid: 1 Open Xgrid Admin. 2 In the Controllers and Grids list, select the Xgrid controller. 3 Click Overview to see the number of agents, the amount of processor power available and used, and the number of jobs running and pending. 46 Chapter 3 Managing a Grid 4 Planning and Submitting Xgrid Jobs 4 Use this chapter to learn how to use Xgrid command-line tools and the Terminal application to submit jobs to a grid and to get information about jobs. After you configure an Xgrid controller and add agents to a grid, you can use the Terminal application to send a job to the grid. Structuring Jobs for Xgrid Carefully planning and structuring a job can result in efficient use of the grid. For example, the best structure for a job that requires multiple searches of a large database may be to divide the database into multiple sections and provide a section to each agent in the grid. About Job Styles Different styles of jobs often require different handling. Similarly, the way a job is structured influences how efficiently the grid completes it. Consider the following job styles:  Everything in one single large job, with numerous small tasks.  Everything divided into medium-sized jobs, where each job has roughly as many tasks as there are nodes in the grid. (This type of job is usually created by a meta job script, which divides the job into smaller chunks, each of which is a job in itself.)  An entire workflow composed of several interrelated jobs. Deciding how to structure a job can involve experimentation to discover the best way to complete it. For example, you might create a simple, small version of a job in two styles, such as by planning all tasks in one job or by subdividing into multiple tiny jobs. Running both experimental jobs under similar conditions in the grid will give you a good idea of which job style is better suited to those conditions. 47 About Job Failure Xgrid jobs can rely on message-passing interface (MPI) APIs. For jobs that rely on MPI, if a single task fails, the entire job fails and must be resubmitted. Therefore you should not use MPI-based jobs on grids with high task-failure rates. Jobs that are more parallel in nature are generally unaffected by occasional task failures. Tasks are typically reassigned to other available agents to complete the job. Most jobs fall into this category. Submitting a Job You submit jobs to a grid using the command-line tool and Terminal. Example code is available on the Apple developer website (developer.apple.com) for alternative methods of submitting jobs. Also If you have Developer Tools installed you can view the examples located in /Developer/Examples/Xgrid/. For more information about the syntax and options for the xgrid command-line tool, see the xgrid man pages. Some developers and organizations offer specialized applications for submitting jobs to a grid. Or you can create such an application using Apple’s developer tools for Xgrid. When determining whether to use the xgrid command-line tool or another method for submitting jobs, consider these points:  If the job is simple, use the command-line tool.  If you use a shell script, use the command-line tool.  If you want to use Xgrid as part of an application with a graphical user interface (GUI), use the Xgrid API to create the GUI or incorporate it in an existing application. For more information about the API, see the Xgrid Reference at: developer.apple.com/documentation Examples of Xgrid Job Submission and Results Retrieval The following Terminal commands are examples of jobs a client can submit to the controller. $ xgrid -h-p -job submit /bin/echo "Hello, World!" This job runs /bin/echo on the controller and agent systems with the “Hello, World!” parameter. $ xgrid -h -p -job results -id This command shows the results of the job with the id indicated. For an executable shell script marked hello.sh: #!/bin/sh /bin/echo "Hello, World!" 48 Chapter 4 Planning and Submitting Xgrid Jobs The following command copies the shell script hello.sh to the Xgrid controller and agent systems and runs the script. /bin/echo must be installed on the agent system. The hello.sh script must have its executable bit set before it can execute. xgrid -h -p -job submit hello.sh Viewing Job Status You can monitor jobs in Xgrid Admin (for details, see “Managing Jobs” on page 44) or with the command-line tool. The following commands in Terminal provide job status: $ xgrid -h -p -job list $ xgrid -h -p -job attributes -id Retrieving Job Results You can retrieve job results using the command-line tool. The following commands in Terminal retrieve job results. $ xgrid -h -p -job results $ xgrid -h -p -job results id Chapter 4 Planning and Submitting Xgrid Jobs 49 50 Chapter 4 Planning and Submitting Xgrid Jobs 5 Solving Xgrid Problems 5 Use this chapter to help solve common problems you might encounter and questions you might have while working with Xgrid service. This section contains answers to common problems and questions. If Your Agents Can’t Connect to the Xgrid Controller If an agent is a server, make sure the agent service is enabled and the Xgrid service is started. The Xgrid controller is the only component of Xgrid that has an open port (port 4111) and requires a firewall opening. This means the Xgrid controller is the only component that advertises on or responds to queries over Bonjour. When enabling the controller, make sure firewall port 4111 is open on your computer’s firewall (enabled in the Sharing Pane of System Preferences) or your corporate firewall (if accepting agents or clients outside your organization). Agents and clients access the controller through a Bonjour lookup or an explicit hostname/IP address, then they initiate a connection to the controller over a user port, avoiding the need to perform privileged operation or opening the firewall. If You Use Xgrid over SSH The simplest way to secure Xgrid using SSH is to create a tunnel from the client or the agent to the controller: $ ssh user@controller.hostname.com -L 4111:controller.hostname.com:4111 Then, have the agent or client connect to localhost instead of the controller. By doing this, SSH tunnels to the remote connection. You can use other ports on the local machine and even tunnel through an intermediary host. 51 To run an Xgrid agent over an SSH tunnel as a particular user: Using Terminal, enter the following: $ ssh -R 20000:192.168.1.100:4111 user@192.168.1.102 /usr/libexec/xgrid/ GridAgent -ServiceName localhost:20000 -RequireControllerPassword NO UsesRendezvous NO -OnlyWhenIdle NO -BindToFirstAvailable NO is the port to tunnel through the ssh connection, 192.168.1.100:4111 is the address and port number of the controller, user is the name of the user to connect, and 192.168.1.102 is the address of the remote computer to run the agent. 20000 If You Run Tasks on Multi-CPU Machines By default, each Xgrid agent (one per machine) accepts as many tasks as there are CPUs on that host, as reported by $ sysctl hw.ncpu. Agents assume that tasks are single-threaded, so they will run two tasks to make best use of a dual-CPU system. To run multithreaded tasks that take up both CPUs, edit the agent configuration file /Library/Preferences/com.apple.xgrid.agent.plist. To make it always only accept a single task, change the MaximumTaskCount line to: MaximumTaskCount=1 Note: This must be done explicitly for each agent, and is permanent until reversed. You can’t specify this kind of constraint as part of a job submission. If You Submit a Large Number of Jobs GridStuffer is a third-party Cocoa application created by Charles Parnot of Stanford to manage multitask jobs. It provides a friendly GUI for many common Xgrid tasks. GridStuffer is available at: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/html/goodies/GridStuffer-info.html A companion command-line tool, xgridstatus, provides an easy way to retrieve information about your grid and jobs. Xgridstatus is available at: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/html/goodies/xgridstatus-info.html 52 Chapter 5 Solving Xgrid Problems If You Want to Use Xgrid on Other Platforms Third-party agents are available that run Xgrid jobs on non-Mac platforms. You are responsible for ensuring that your tasks contain and call appropriate platform-specific code. There is no intrinsic support for heterogeneous execution, although there is nothing that relies on Mac-specific technology. The primary technical requirement is a sufficiently functional BEEP protocol stack. Several open source implementations are available, of varying quality. Two cross-platform Xgrid agents are available:  Curtis Campbell’s java agent, at: http://sourceforge.net/projects/xgridagent-java/  Daniel Cote’s Linux/UNIX agent (not yet updated for Mac OS X v10.4), at: http://www.novajo.ca/xgridagent/ If the Xgrid Controller Must Be Restarted When the Xgrid controller is restarted, by Server Admin, xgridctl tool, a power-outage, or a kernel panic, the following occurs:  Clients and agents are disconnected.  Tasks running when the controller restarted are stopped.  Partial data from killed tasks is discarded.  data from finished tasks is saved and can be retrieved as usual.  Queued jobs and tasks are saved and run as usual.  Tasks are started/restarted as agents reconnect and become available. If Xgrid Has Crashed The Xgrid controller and agent should restart automatically if they crash. CrashReporter logs can be found in /Library/Logs/CrashReporter. Xgrid logs notices, warnings, and errors to the console as well as to log files in /Library/Logs/Xgrid If You Are Trying to Submit Jobs over 2 GB The Xgrid controller is a 32-bit process and keeps most job input and output data in memory. This means that the controller can crash if your jobs require a large amount of input or produce a large amount of output. This limitation might change in the future. We recommend using a shared filesystem (such as Xsan or NFS) if you need to share large amounts of data between distributed processes. Chapter 5 Solving Xgrid Problems 53 If You Want to Enable Kerberos/SSO for Xgrid For Xgrid to use SSO, you need the following:  The agent must have the host’s user principal in the system keytab.  The Kerberos database on the KDC must contain the agent’s principal.  The controller’s realm must be the default realm on the agent computer. The agent’s principal is created in the KDC and is put in the agent’s keytab if the agent computer is bound to the OD master using _AUTHENTICATED BINDING_ with Directory access. Otherwise, you must use kadmin to create the principal in the KDC and export it to the keytab. For example, the computer hosting the agent must have the host’s user principal in the system keytab, as shown here: $ hostname:~ user $ sudo klist -k $ Password: $ Keytab name: FILE:/etc/krb5.keytab KVNO Principal ---- -------------------------------------------------------------1 hostname.apple.com@XGRIDTEST.APPLE.COM 1 hostname.apple.com@XGRIDTEST.APPLE.COM 1 hostname.apple.com@XGRIDTEST.APPLE.COM The Kerberos database on the KDC must contain the agent’s principal, as in the following: $ sudo kadmin.local -q "get_principal hostname.apple.com" Authenticating as principal root/admin@XGRIDTEST.APPLE.COM with password. Principal: hostname.apple.com@XGRIDTEST.APPLE.COM Expiration date: [never] Last password change: Tue Apr 12 17:46:41 PDT 2005 Password expiration date: [none] Maximum ticket life: 0 days 10:00:00 Maximum renewable life: 7 days 00:00:00 Last modified: Tue Apr 12 17:46:41 PDT 2005 (root/admin@XGRIDTEST.APPLE.COM) Last successful authentication: [never] Last failed authentication: [never] Failed password attempts: 0 Number of keys: 4 Key: vno 1, Triple DES cbc mode with HMAC/sha1, no salt Key: vno 1, ArcFour with HMAC/md5, no salt Key: vno 1, DES cbc mode with CRC-32, no salt Key: vno 1, DES cbc mode with CRC-32, Version 4 Attributes: REQUIRES_PRE_AUTH Policy: [none] 54 Chapter 5 Solving Xgrid Problems The controller’s realm must be the default realm on the agent computer, as shown: $ cat /Library/Preferences/edu.mit.Kerberos # WARNING This file is automatically created, if you wish to make changes # delete the next two lines # autogenerated from : /LDAPv3/xgridtest.apple.com # generation_id : 1637891359 [libdefaults] default_realm = XGRIDTEST.APPLE.COM [realms] XGRIDTEST.APPLE.COM = { kdc = xgridtest.apple.com admin_server = xgridtest.apple.com } [domain_realm] apple.com = XGRIDTEST.APPLE.COM .apple.com = XGRIDTEST.APPLE.COM For More Information If you’re an experienced server administrator or even a novice server administrator working with Xgrid, you can review the Xgrid FAQ site. The FAQ site will provide you with access to news, posted questions and threads, and the ability to post your own Xgrid questions. The site is at http://lists.apple.com/faq/pub/xgrid_users/. For more information about advanced configuration options, see the xgridctl man page. Chapter 5 Solving Xgrid Problems 55 56 Chapter 5 Solving Xgrid Problems Part II: Configuring High Performance Computing II Use the chapters in this part of the guide to learn about high performance computing and the applications and tools available for administering it. Chapter 6 Introducing High Performance Computing Chapter 7 Reviewing the Cluster Setup Process Chapter 8 Identifying Prerequisites and System Requirements Chapter 9 Preparing the Cluster for Configuration Chapter 10 Setting Up the Cluster Controller Chapter 11 Setting Up Compute Nodes Chapter 12 Testing Your Cluster 6 Introducing High Performance Computing 6 Use this chapter to learn about high performance computing (HPC) and how it’s supported by Apple technology. With high performance computing, you can speed the processing of complex computations by using Xserve computers with the Xgrid service. Understanding HPC HPC refers to the use of high-end computer systems to solve computationally intensive problems. HPC includes large supercomputers, symmetric multiprocessing (SMP) systems, cluster computers, and other hardware and software architectures. In recent years, developers have made it feasible for standard off-the-shelf computer systems to achieve supercomputer-scale performance by clustering them in efficient ways. Apple and HPC Apple’s hardware and software facilitate HPC in unique and meaningful ways. Although many hardware and software architectures can be used for cluster computing, Mac OS X Server v10.5 and Xserve have specific features that enhance the performance and manageability of cluster installations. The integration of Xserve with Mac OS X Server provides unparalleled ease of use, performance, and manageability. Because Apple makes the hardware and the software, the benefits of tight integration are immediately evident in the quality of the user experience with a Macintosh-based cluster. 59 Mac OS X Server Mac OS X Server v10.5 is Apple’s award-winning UNIX server operating system. Mac OS X Server can compile and run UNIX 03-complaint code, and runs 64-bit applications alongside 32-bit applications at native performance. The Mach kernel provides preemptive multitasking for outstanding performance, protected system memory for stability, and modern SMP locking for efficient use of multi processor and multi core systems. Mac OS X Server also includes highly optimized math libraries that enable software developers to take maximum advantage of the G5 or Intel-based processor without the use of difficult programming techniques or expensive development tools. Mac OS X Server also includes Xgrid, an integrated distributed resource manager for both grids and clusters. Xserve Clusters Using a combination of Xserve systems, you can build clusters that aggregate the power of these systems to provide HPC solutions at comparatively low cost. An Xserve cluster consists of at least 2nodes: a cluster controller and one or more compute nodes, as shown in the following illustration: Controller Compute nodes Xserve cluster Xserve 64-Bit Architecture The 64-bit architecture of Xserve systems is ideal for HPC applications. It provides 64bit math precision, higher data throughput, and very large memory space. 60 Chapter 6 Introducing High Performance Computing Memory Space The 64-bit architecture provides four billion times the memory space available in a 32bit architecture, which puts the theoretical address space available to Mac OS X Server applications at 16 exabytes. Xserve G5 systems support 8 GB of memory. Xserve Intel systems support 32 GB of memory. Libraries Mac OS X Server provides the following highly optimized libraries for developing HPC applications. In addition to standard libraries like libSystem, numerical libraries like BLAS, LAPACK, and others provide industry-standard routines that have been handtuned for the G5 or Intel processor. Developers can make efficient use of the system architecture without writing computer code or vector code. Library Description libSystem A collection of core system libraries libMathCommon A common math functions library vDSP A library that provides mathematical functions for applications that operate on real and complex data types BLAS A library of basic linear algebra subprograms, which are a standard set of building blocks for vector and matrix operations LAPACK The linear algebra package, which is a standard library for solving simultaneous linear equations vForce A library of highly-optimized single- and double-precision mathematical intrinsic functions vBasicOps A collection of basic operations that complement the vector processor’s basic operations up to 128 bits vBigNum A library of optimized arithmetic operations for 256-, 512-, and 1024-bit operands Easy Porting of UNIX Applications Mac OS X Server is now an Open Brand UNIX 03 Registered Product, conforming to the SUSv3 and POSIX 1003.1 specifications for the C API, shell utilities, and threads. It can compile and run all your existing UNIX 03-compliant code. Chapter 6 Introducing High Performance Computing 61 Support of Loosely Coupled Computations You can use Xserve clusters to perform most types of loosely coupled or embarrassingly parallel computations. Embarrassingly parallel computations consist of somewhat independent computational tasks that can be run in parallel on many different processors to achieve faster results. Here are examples of loosely coupled computations that you can accelerate using the setup described in this guide:  Image rendering. Different rendering tasks, such as ray tracing, reflection mapping, and radiosity, can be accelerated by parallel processing.  Bioinformatics. The throughput of bioinformatics applications like BLAST and HMMER can be greatly enhanced by running them on a cluster. Note: The Apple Workgroup Cluster is a preconfigured cluster solution that has everything you need to get up and running quickly. It includes qualified, integrated hardware components and easy-to-use management tools. You can add clusteraware commercial applications, such as iNquiry or gridMathematica, or develop your own custom applications using Xcode. For more information, see http:// www.apple.com/science/solutions/workgroupcluster.html.  Cryptography. Brute-force key search is a classic example of a cryptography application that can be greatly accelerated when run on a computer cluster.  Data mining. High performance computing is essential in data mining because of the amount of data that is analyzed. Note: This guide assumes that the cluster nodes communicate over gigabit Ethernet. Although the network latency of Gigabit Ethernet is low enough for most loosely coupled computations, those that require lower latency may benefit from another interconnect technology. 62 Chapter 6 Introducing High Performance Computing 7 Reviewing the Cluster Setup Process 7 Use this chapter to learn about the process of setting up a high performance cluster. You will use multiple server tools to configure services, a cluster controller, compute nodes, and users when setting up a high performance cluster. The following chapters provide a step-by-step process to assemble and configure a computational cluster. The resulting cluster will consist of a controller and a number of compute nodes. The compute nodes will be connected to the controller via a private (isolated) Ethernet network switch. The controller will be connected to both the private Ethernet network and a public network, potentially the Internet. The controller will also provide a shared file system to compute nodes. The controller will provide a number of services to the compute nodes:  A Firewall will isolate the controller and compute nodes from the public network, protecting against unwanted access. Access to the private network from outside the firewall will require remote users to use SSH for command-line access or VPN to use or manage cluster resources with graphical applications or administrative tools such as Apple Remote Desktop.  Network services such as DHCP, DNS, and NAT will allow the compute nodes to communicate with each other and external networks.  Open Directory will contain user account information, including usernames and passwords, and make these accounts available to compute nodes. Using Kerberos with Open Directory provides single sign-on capability, reducing the number of times a user will need to enter passwords to access cluster resources.  Open Directory will also publish network file system (NFS) share points, providing automatic file sharing between compute nodes and controller. A shared network home directory, containing home folders for each cluster user, will be mounted on each compute node.  The controller will host the Xgrid controller service. 63 Cluster Setup Overview Here is a summary of what you’ll be doing to set up and test an HPC cluster. Step 1: Before you begin Before setting up your cluster, understand the expectations and requirements that you must fulfill. See Chapter 8, “Identifying Prerequisites and System Requirements.” Step 2: Prepare the cluster for configuration Prepare your cluster nodes for configuration by setting up the hardware and connecting your nodes to a network. See Chapter 9, “Preparing the Cluster for Configuration.” Step 3: Enable, configure, and start services After your cluster is assembled and ready, start by setting up and configuring the cluster controller. Use Server Assistant to set up the server software on the cluster controller. See Chapter 10, “Setting Up the Cluster Controller.” Use Server Admin to configure and start the following services:  DNS service. See “Configuring DNS Service” on page 84.  Open Directory service. See “Configuring Open Directory Service” on page 86.  DHCP service. See “Configuring DHCP Service” on page 87.  Firewall service. See “Configuring Firewall Settings on the Cluster Controller” on page 88.  NAT service. See “Configuring NAT Settings on the Cluster Controller” on page 90.  NFS service. See “Configuring NFS” on page 90.  VPN service. See “Configuring VPN Service” on page 90.  Xgrid service. See “Configuring Xgrid Service” on page 91. Step 4: (Optional) Prepare the data drive Use Disk Utility to configure the data drive. See “Preparing the Data Drive as a Mirrored RAID set” on page 92. Step 5: Create an automounted network share Use Server Admin to create an automounted network share. See “Creating a Home Directory Automount Share Point” on page 93. Step 6: Create network user accounts Use Workgroup Manager to create network user accounts for cluster users. See “Creating User Accounts” on page 94. 64 Chapter 7 Reviewing the Cluster Setup Process Step 7: Create an Auto Server Setup record for the compute nodes Use Server Assistant to save configuration settings to a file or Open Directory record. This allows cluster nodes to automatically configure themselves when they start up for the first time. See “Creating an Auto Server Setup Record for Compute Nodes” on page 95 and “Verifying LDAP Record Creation” on page 98. Step 8: Set up compute nodes Start compute nodes to begin the Auto Server Setup process. They’ll automatically configure themselves and then restart. See “Setting Up Compute Nodes” on page 98. Step 9: Finish compute node configuration Use Server Admin to name the compute nodes, join them to the Kerberos realm, and configure their Xgrid agent software. Step 10: Test your cluster setup After configuring the controller and compute nodes, test your cluster with Xgrid Admin and a sample Xgrid application. See Chapter 12, “Testing Your Cluster.” Chapter 7 Reviewing the Cluster Setup Process 65 66 Chapter 7 Reviewing the Cluster Setup Process 8 Identifying Prerequisites and System Requirements 8 Before setting up your cluster, read the prerequisites and requirements in this chapter and familiarize yourself with the setup process. To make sure that your cluster is successfully set up, read this chapter to familiarize yourself with the expectations and requirements you must meet before starting the setup procedure. Then read the last section, which provides an overview of the cluster setup process. Prerequisites This guide assumes you have the expertise needed to set up and manage the cluster, perform the initial configuration of the cluster nodes, and carry out the types of computations you can perform on the cluster. Expertise To set up and deploy clusters, you should have a good understanding of how Mac OS X Server works and you should have a fundamental understanding of UNIX, Xgrid, and TCP/IP networking. Xserve Configuration This guide assumes that you’ll be using new, out-of-the-box Xserve systems running Mac OS X Server v10.5 or later. If not, you must install a clean version of Mac OS X Server v10.5 or later on your systems. 67 System Requirements Take time to define the requirements needed to make sure the cluster setup is successful. System requirements are categorized as infrastructure, software, and private network requirements. Infrastructure Requirements This section describes the most important hardware infrastructure requirements. Consult with your system administrator about other requirements. For example, you might need one or more uninterruptible power supplies (UPSs) to provide backup power to key cluster components. Another requirement might be a physical security system to protect the cluster from unauthorized access to sensitive information. Infrastructure requirements are divided into the following subcategories:  “General Hardware Requirements” on page 68  “Power Requirements” on page 68  “Cooling Requirements” on page 69  “Weight Requirements” on page 70  “Space Requirements” on page 70  “Network Access Requirements” on page 71 General Hardware Requirements To set up a cluster, you should have the necessary hardware infrastructure in place. This includes:  Racks  Electrical power  Cooling system  Network access points and switches Power Requirements When setting up the physical infrastructure for your cluster, consider the following power consumption figures:  Rated power consumption. This figure represents the maximum power consumption of a given system’s power supply.  Typical power consumption. This figure represents the typical power consumption of a server under normal operating conditions. Note: This section focuses only on the rated power consumption figure because it guarantees that your circuit won’t be overloaded at any time—unlike the typical power consumption figure, which doesn’t protect your circuit from abnormal surges in power consumption. 68 Chapter 8 Identifying Prerequisites and System Requirements To obtain power consumption figures for cluster nodes, see the following articles on the AppleCare Service & Support website:  Article 86694, “Xserve G5: Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n86694  Article 75383, “Xserve: Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n75383  Article 86251, “Xserve (Slot Load): Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n86251  Article 304887, “Xserve (Late 2006): Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n304887 Although the rated current load covers your cluster nodes, you must also consider the power consumption of other devices connected to your circuit. For large clusters, speak with an Apple Systems Engineer to determine the correct power infrastructure. For information about Apple consulting services and service and support plans, see the Apple Server Service and Support website at http://www.apple.com/server/support. WARNING: The formulas in this section help you estimate your power requirements. These estimates may not be high enough, depending on your configuration. For example, if your cluster uses one or more Xserve RAID systems, or other thirdparty hardware, you must include their power consumption requirements. Cooling Requirements It’s very important to keep your Xserve computers running at normal operating temperatures (see www.apple.com/xserve/specs.html). If your servers overheat they will shut down and any work being done will be lost. You can also damage or shorten the life span of your servers by running them at high temperatures. To obtain thermal output figures for cluster nodes, see the following articles on the AppleCare Service & Support website:  Article 86694, “Xserve G5: Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n86694  Article 75383, “Xserve: Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n75383  Article 86251, “Xserve (Slot Load): Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n86251  Article 304887, “Xserve (Late 2006): Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n304887 Chapter 8 Identifying Prerequisites and System Requirements 69 Consider the thermal output of other devices, such as the management computer, Xserve RAID systems, monitors, and other heat-generating devices used in the same room. As always, consult with your system administrator to determine the necessary level of cooling that your cluster and its associated hardware require for safe and effective operation. Weight Requirements For Xserve and cluster node weight information, see the Apple Xserve website at www.apple.com/xserve. Also include the weight of the rack if you’re bringing in a dedicated rack, and the weight of other devices used by the cluster. If you mount cluster nodes in a rack with casters, set up the rack where you’ll keep the cluster and then mount the systems. A heavy rack is difficult to move, particularly across carpet. In addition, vibrations caused by moving your cluster long distances when racked might damage your hardware. After determining weight requirements, consult with your facilities personnel to make sure the room where the cluster will be installed meets the weight requirements. Space Requirements You should have enough space to house the cluster and enable easy access to it to perform routine maintenance tasks. Also, locate the cluster where it doesn’t affect and isn’t affected by other hardware in your server room. Consider the following when choosing a location for your cluster:  Don’t place the cluster next to an air vent, air intake, or heat source.  Don’t place the cluster directly under a sprinkler head.  Don’t obstruct doors (especially emergency exit doors) with your cluster.  Leave enough room in front of, beside, and especially behind your cluster.  Make sure air can flow around the cluster. The room might be very well cooled, but if air can’t easily flow around the cluster, your computers can still overheat. 70 Chapter 8 Identifying Prerequisites and System Requirements If you’re housing your cluster in a computer room, make sure you have at least 18 inches of clearance in front and behind your systems. If you’re housing it in an office or other unmanaged space, make sure your cluster has at least 18 inches of clearance on all sides of the rack, as shown in the following illustration: 18” 18” 18” 18” You should have enough space to open the rack’s door, slide out systems, and perform other routine maintenance tasks. Network Access Requirements Your cluster requires access to two networks:  Private network. This is a high performance Gigabit Ethernet network. You’ll need at least a 1-Gigabit switch.  Public network. This network connects the cluster controller to the client computers that submit jobs to your cluster. This guide uses a number of 10.0.2.x addresses as examples for your public network connections. Do not use these example addresses when configuring your cluster. When you see a 10.0.2.x address, substitute the IP address appropriate for your organization’s network. The following illustration shows a configuration of a cluster connected through a switch creating a private network. The illustration also shows the headnode connected to the public and private network. Public Network Gigabit Switch Private Network Chapter 8 Identifying Prerequisites and System Requirements 71 Software Requirements You need:  A site-licensed copy of Mac OS X Server v10.5 or later.  One or more copies of Apple Remote Desktop v3 or later (recommended).  The latest version of Server Tools. Volume-Licensed Serial Number To run multiple copies of Mac OS X Server, you should obtain a volume-licensed serial number. If you haven’t obtained a volume-license serial number, contact your local Apple sales representative. Note: The format of the server serial number is xsvr-999-999-x-xxx-xxx-xxx-xxx-xxx-xxx-x, where x is a letter and 9 is a digit. The first element (xsvr) and the fourth (x) must be lowercase. Apple Remote Desktop Configuration and administration of your cluster will be greatly enhanced with Apple Remote Desktop v3 or later. You can use Apple Remote Desktop to configure, monitor, and control your cluster, as well as rapidly install software. Server Tools If you are using a management computer, you must install Server Tools on your management computer. The Server Tools suite includes:  Server Assistant  Server Admin  Server Monitor  Xgrid Admin You use these tools to remotely manage the cluster. Install these tools using the Server Admin Tools CD, which is included with Xserve and Mac OS X Server. Private Network Requirements The compute nodes will be connected through a private Ethernet network, separate from your organization’s primary (public) network. The cluster controller will be connected to the private and public networks and will act as a gateway, allowing users connected to the public network (or the Internet) to use the cluster’s resources, and allowing the compute nodes to use resources outside the private network. Private network requirements include the following:  A range of IP addresses should be reserved for the private network. A number of non routable IP address ranges are available for use with private networks. These addresses cannot be used with the Internet without Network Address Translation (NAT), which will be provided by the cluster controller. 72 Chapter 8 Identifying Prerequisites and System Requirements  Addresses in ranges such as 192.168.x.x, 10.0.x.x, and 172.16.x.x are commonly used for private networks. Because the first two are used more commonly with NAT devices used in the home, and because your users may want to connect to your cluster from behind one of these devices, it is best to choose a range less likely to exist on your user’s networks. This guide uses the range 172.16.1.1 - 172.16.1.254 (subnet mask 255.255.255.0). You can use this range for your cluster, or use a different one if you prefer.  You need a Domain Name System (DNS) server that will be used to assign names to network addresses so you don’t need to remember IP addresses. Your private network can use a DNS domain name that is not in use on (and is not valid with) the Internet. This guide uses the .cluster domain. You can use this domain with your cluster as well. WARNING: Where you see the DNS domain .example.com, you should substitute the DNS domain used for your organization’s public network. Static IP Address and Hostname Requirements Your cluster requires a single static IP address and a matching fully qualified and reverse resolvable DNS entry for the cluster controller. By using a static IP address rather than a dynamic one you can maintain a consistent address that clients can always use. Note: Initiate the process of requesting an IP address and a hostname as early as you can before setting up the cluster, to account for the lead time typically required. Chapter 8 Identifying Prerequisites and System Requirements 73 74 Chapter 8 Identifying Prerequisites and System Requirements 9 Preparing the Cluster for Configuration 9 Use this chapter to mount the systems on the rack, connect the systems to a power source and the private network, and configure the optional management computer. To prepare the cluster nodes for configuration, you mount them in racks and connect them to the power source and private network. You also set up the management computer by installing Apple Remote Desktop and Server Tools. Preparing the Cluster Nodes for Software Configuration After you prepare the physical infrastructure for hosting the cluster, the next step is to mount the cluster nodes and prepare them for software configuration. To prepare the cluster for configuration: 1 Unpack the computers and mount them in the rack. For more information, use the instructions provided with your hardware. Note: If you’re using existing Xserve computers, you must perform a clean installation of Mac OS X Server v10.5 or later to restore the systems to default settings. 2 Record each computer’s serial number and keep the information in a safe place. When recording the serial numbers, do it in a way that makes it easy for you to tell which serial number belongs to each computer. For example, use a table to map a system’s serial number to the name on a label on the system’s front panel. Serial Number Name serial_number_0 Cluster controller serial_number_1 Compute node 1 serial_number_2 Compute node 2 ... ... 75 You can find the serial number of an Xserve computer in four places:  The unit’s back panel: Serial number label  The unit’s interior If you look for the serial number on the unit’s interior, don’t confuse the serial number for the server with the serial number for the optical drive—these are different numbers. The Xserve computer’s serial number is denoted by “Serial#” (not “S/N”) followed by 11 characters. Serial number label  The large pull-out plastic tab on Xserve computers with Intel processors  The cardboard shipping box You can use a barcode scanner on the box label to get the serial number. 3 Use the following guidelines, connect the cluster computers to a power source:  Power cables. Use the long power cables with a horizontal power distribution unit (PDU) and the short cables with a vertical PDU. When using the long cables, connect the servers so you can tell which cable belongs to which node. Consider labeling cables to make it easier to map a cable to a node.  Connection to the uninterruptible power supply (UPS). Connect the cluster controller, storage devices used by the cluster, and the private network switch to a UPS unit to protect against data loss in case of a power outage. If your UPS is connected to the controller through USB, you can use the UPS configuration settings in System Preferences. Note: If you are using a UPS, the UPS low power shutdown script is available for additional advanced power options. This script is located at /usr/libexec/upsshutdown. 76 Chapter 9 Preparing the Cluster for Configuration  UPS connection to wall outlet. Make sure the electrical outlets support the UPS plug shape.  Power cord retainer clips. To prevent power cables from slipping out, use the power cord retainer clips that come with your Xserve systems.  Air flow. Don’t permit a mass of power cables to obstruct air flow. 4 Connect the two Ethernet ports (shown in the illustration below) by connecting port 1 on the cluster controller to the public network and port 2 to the private network. Ethernet port 2 (private network) Ethernet port 1 (public network) 5 Connect Ethernet port 1 on the remaining nodes in the cluster to the private network, in order. Use the last port on the switch for the cluster controller, the first port for the first compute node, the second port for the second compute node, and so on. Connecting the Ethernet cables to the switch in order helps you identify which cluster node a cable belongs to. Chapter 9 Preparing the Cluster for Configuration 77 (Optional) Setting Up the Management Computer You can use the management computer to remotely set up, configure, and administer your cluster. To set up the management computer: 1 Connect the management computer to the private network (as shown) using the second-to-last switch port. Optional Management computer Private Network 2 Start the management computer. 3 Disable AirPort and any network connection other than the one you’ll be using to connect to your private network. 4 If they aren’t installed, install the latest version of the Mac OS X Server tools and applications from the Mac OS X Server Administration Tools CD, which is included with the Mac OS X Server installation kit. The Mac OS X Server tools and applications are installed into /Applications/Server/. 5 Configure the management computer’s network address. If your cluster controller is not connected to a keyboard, video display, and mouse, or if you prefer to set up the cluster from a management computer, you will connect the management computer to the private network and disable all other network connections. Until the controller is assigned an IP address on the private network, configure your management computer to use DHCP. After the controller is assigned an IP address, you should configure your management computer to use a static address in the range reserved for your private network, but outside the range reserved for compute nodes. 78 Chapter 9 Preparing the Cluster for Configuration If you are adopting the IP address range that is used in this guide (172.16.1.1 - 172.16.1.199 for compute nodes, 172.16.1.254 for the controllers), you can configure your management computer to use 172.16.1.253. After you connect to the private network, the server administration tools mentioned in this guide (Server Assistant, Server Admin, Workgroup Manager, and Xgrid Admin) can be installed and used on your management computer, connecting via IP address to the cluster controller (and later the compute nodes). You can also use Apple Remote Desktop, or the screen sharing feature included with Mac OS X v10.5, to control the nodes via the network, using the server administration tools directly on the remote nodes. Chapter 9 Preparing the Cluster for Configuration 79 80 Chapter 9 Preparing the Cluster for Configuration 10 Setting Up the Cluster Controller 10 Use this chapter to set up server software on the cluster controller and configure the services running on it. You use Server Assistant, Server Admin, and Apple Remote Desktop (optional) to set up and configure the cluster controller. Setting Up Server Software on the Cluster Controller To set up the cluster controller, use Server Assistant (located in /Applications/Server/). To set up the cluster controller: 1 Start the cluster controller. The cluster controller should have two Ethernet cables, with Ethernet port 1 connected to the public network switch and Ethernet port 2 connected to the private network switch. Only the cluster controller should be running on the private network. If you are using a management computer, use Server Assistant to connect to the controller. For more information about using Server Assistant remotely, see Server Administration. If you are using the Apple Remote Desktop to manage the controller, connect to the controller and initiate a screen control session. For more information, see the Apple Remote Desktop Guide. 2 In the Welcome screen, click Continue. 3 In the Server Configuration screen: a Select Advanced. b Click Continue. 4 In the Keyboard screen: a Select the keyboard layout for the server. b Click Continue. 81 5 In the Serial Number screen: a Enter a volume license Mac OS X Server serial number. b Click Continue. 6 In the Registration Information screen, fill out the form or press Command-Q and click Skip. 7 In the Administrator Account screen: a Create the user account you’ll use to administer the cluster controller (for example, Administrator). b Click Continue. 8 In the Network Address screen: a Choose “No, configure network settings manually.” b Click Continue. 9 In the Network Interfaces screen: a Enable TCP/IP only for Ethernet 1 and Ethernet 2 by selecting the checkboxes for both Ethernet 1 and Ethernet 2. b Click Continue. 10 In the TCP/IP Connection screen for the Ethernet 1 port: a From the Configure pop-up menu, choose Manually. b In the IP Address field, enter the public IP address of the cluster controller (for example, 10.0.2.199). c In the Subnet Mask field, enter the public subnet mask of the cluster controller (for example, 255.255.255.0). d In the Router field, enter the IP address of the router for the public network (for example, 10.0.2.1). e Leave the DNS Servers field blank. f Leave the Search Domains field blank. g Click Configure IPv6. h From the Configure IPv6 pop-up menu, choose Off. i Click OK, then click Continue. 11 In the TCP/IP Connection screen for the Ethernet 2 port: a From the Configure pop-up menu, choose Manually. b In the IP Address field, enter the private IP address of the cluster controller (for example, 172.16.1.254). c In the Subnet Mask field, enter the private subnet mask of the cluster controller (for example, 255.255.255.0). d In the Router field, enter the private IP address of the cluster controller (for example, 172.16.1.254). 82 Chapter 10 Setting Up the Cluster Controller e f g h i Leave the DNS Servers field blank. Leave the Search Domains field blank. Click Configure IPv6. From the Configure IPv6 pop-up menu, choose Off. Click OK, then click Continue. 12 In the Network Names screen: a Enter the primary DNS name and computer name. The cluster controller has a public and a private DNS name. Use the controller’s private names. For example, use controller.cluster for the primary DNS name and controller for the computer name. A warning may appear saying the server’s address resolves to another name. Click OK. b Verify that the Enable Remote Management checkbox is selected. c Click Continue. 13 In the Time Zone screen: a In the Closest City pop-up menu, choose your time zone. b Click Continue. 14 In the Directory Usage screen: a From the “Set directory usage to” pop-up menu, choose Standalone Server. b Click Continue. 15 In the Confirm Settings screen: a Review the settings. b Click Apply. c Wait for your settings to be applied. 16 Click Start Now, wait until Server Admin launches, and then (if prompted) enter the administrator user name and password. 17 When prompted, click Start Now; then when Server Admin launches, connect using the administrator user name and password. 18 Select the checkboxes to enable the following services: DHCP, DNS, Firewall, NAT, NFS, Open Directory, VPN, and Xgrid. 19 Click Save. 20 To reveal the enabled services, expand the triangle next to the controller in the Servers list. Chapter 10 Setting Up the Cluster Controller 83 Configuring DNS Service Use Server Admin on the cluster controller to create a local DNS zone and add records to map cluster nodes to their corresponding IP addresses. To configure DNS service: 1 Open Server Admin if it is not already open. 2 If necessary, click the triangle to the left of the controller to view a list of services. 3 Click DNS in the expanded Servers list. 4 Click Settings. 5 Click the Add (+) button below the “Forwarder IP Addresses” list, then enter the network address of your public DNS server (for example, 10.0.2.201). 6 Click Save. 7 Click Zones. 8 Click the Add Zone button, then select “Add Primary Zone (Master).” A default zone named example.com is created. 9 Select the default example.com zone. 10 Change the primary zone name to your private DNS domain. The primary zone name must end with a period (for example, “cluster.”). 11 Set Admin Email to the mail address of the person who should be notified of DNS errors (for example, administrator@example.com). 12 Double-click the first entry in the Nameservers list and change it to the private DNS hostname of the cluster controller (for example, controller). 13 Click Save. 14 Select the cluster DNS zone. 15 Click the triangle to the left of the cluster DNS zone. 16 Click Add Record, then select “Add Machine (A).” 17 Select the newly created newMachine. 18 Change the Machine Name field to the private hostname of the controller (for example, cluster). 19 Double-click the first IP address in the IP Address list and then change the first IP address to the public IP address for the controller (for example, 10.0.2.199). 20 Click Save. 21 Repeat steps 16 through 20 for each compute node using the private IP address reserved for them. For example, the name of the first compute node is node1 assigned to 172.16.1.1, node2 assigned to 172.16.1.2, and so on. 84 Chapter 10 Setting Up the Cluster Controller 22 Click the Start DNS button (below the Servers list). The DNS service status indicator turns green when the service starts. 23 From the Apple Menu open System Preferences (/Applications/System Preferences). 24 Click Network. 25 Select the Ethernet 1 interface. 26 In the DNS Server field enter the public IP address of the controller (for example, 10.0.2.199). 27 In the Search Domains field enter the private DNS domain (for example, cluster). 28 Click Apply. 29 Quit System Preferences. Verifying DNS Settings Open Directory requires correct configuration of the DNS service. Before configuring the Open Directory Master, verify your DNS settings carefully. Any incomplete or incorrect Open Directory configuration prevents the cluster from functioning. To verify DNS settings: 1 From the Dock on the cluster controller open the Terminal application. 2 Verify the fully qualified DNS name of the cluster controller using the hostname command. For example, entering hostname returns controller.cluster. $ hostname controller.cluster 3 Verify that the hostname of the cluster controller matches its assigned IP address in DNS using the host command. For example, entering host controller returns 10.0.2.199. $ host controller controller.cluster has address 10.0.2.199 4 Verify that the fully-qualified DNS name of the cluster controller matches its public IP address using the host command. For example, entering host controller.cluster returns 10.0.2.199. $ host controller.cluster controller.cluster has address 10.0.2.199 5 Verify that the reverse DNS record of the controller matches its fully-qualified DNS name using the host command. For example, entering host 10.0.2.199 returns controller.cluster. $ host 10.0.2.199 199.2.0.10.in-addr.arpa domain name pointer controller.cluster Chapter 10 Setting Up the Cluster Controller 85 If any DNS lookups do not match, repeat the process to create the DNS zone and entry for the controller. Do not continue the cluster setup process until DNS resolves correctly. 6 Quit Terminal. Configuring Open Directory Service The Open Directory service is responsible for authenticating users, publishing server setup configurations, and publishing network share automount records. Configuring the Cluster Controller as an Open Directory Master Use Server Admin to configure the Open Directory service on the cluster controller. To configure Open Directory settings: 1 Open Server Admin if it is not already open. 2 In the controller’s list of services, click Open Directory. 3 Click Settings, click General, then click Change. This opens the Open Directory service configuration assistant. 4 Select Open Directory Master, then click Continue. 5 Create a Directory Administrator account, then click Continue. Name, Short Name, User ID, Password: The Directory Administrator account administers the Open Directory domain that all nodes share. You can use the default Name, Short Name, and User ID. Choose a unique password. 6 Enter the Master Domain information, then click Continue. Kerberos Realm: This field is preset to be the same as the server’s private fully qualified DNS name converted to capital letters. Use the preset Kerberos Realm (for example, CONTROLLER.CLUSTER). Search Base: This field is preset to a search base suffix for the new LDAP directory, derived from the private DNS name of the cluster controller. Use the preset LDAP search base (for example, dc=controller,dc=cluster). WARNING: If these fields are not prepopulated, it might indicate your DNS settings were not configured properly. If so, click the Cancel button and redo the steps listed in “Configuring DNS Service” on page 84. 7 Confirm settings, then click Continue. 8 When the service configuration assistant completes, click Close. 9 Verify the Role is set to Open Directory Master. 86 Chapter 10 Setting Up the Cluster Controller Note: You can click Logs and monitor the log file /Library/Logs/slapconfig.log for errors while the Open Directory domain is being created. You can also use the Console (located in /Applications/Utilities/) or Terminal with the command “tail -f/Library/Logs/ slapconfig.log.” In the log, warnings such as the following can be ignored: WARNING: no policy specified for [...] defaulting to no policy After the Open Directory domain is created, the Open Directory service starts and the status icon turns green. Configuring DHCP Service Using Server Admin, configure DHCP service on the cluster controller to provide LDAP and DNS information to the compute nodes. To configure DHCP service: 1 Open Server Admin if it is not already open. 2 In the controller’s list of services, click DHCP. 3 Click Subnets. 4 Remove all subnets. 5 Create a new subnet for Ethernet port 2. 6 Click General and do the following: a In the Subnet Name field, enter a subnet name (for example, Cluster Private Network). b In the Starting IP Address field, enter the first IP address in the private network range available for compute nodes (for example, 172.16.1.1). c In the Ending IP Address field, enter the last IP address in the private network range available for compute nodes (for example, 172.16.1.99). Note: Leave some addresses unused at the end of the range for other devices and VPN connections. d In the Subnet Mask field, enter the subnet mask for your private network (for example, 255.255.255.0). e From the Network Interface pop-up menu, select en1 if it is not already selected. This menu shows the UNIX name for the port. The UNIX name for Ethernet 2 should be en1. f In the Router field, enter the private IP address of the cluster controller (for example, 172.16.1.254). g Set the lease time for the IP addresses served by the DHCP service to at least 1 month. 7 Click Save. Chapter 10 Setting Up the Cluster Controller 87 8 Click DNS below the Subnets list. 9 In the DNS Servers field, enter the public address of the cluster controller (for example, 10.0.2.199). 10 In the Default Search Domain field, enter the DNS domain for your private network (for example, cluster). 11 Click Save. 12 Click LDAP. 13 In the Server Name field, enter the fully qualified DNS name of the cluster controller (for example, controller.cluster). 14 In the Search Base field, enter the LDAP search base for your shared Open Directory domain (for example, dc=controller, dc=cluster). This entry should match the LDAP search base entry you made when you created the Open Directory domain. Note: Verify the Server Name and Search Base fields. Errors in the LDAP configuration of your DHCP service prevent proper autoconfiguration of cluster nodes, automounting of network directories, and use of network user accounts. To avoid typographical errors, copy and paste the search base settings from the Open Directory service search base settings. 15 Select the Enable checkbox to the left of the subnet you just created. 16 Click Save. 17 Click the Start DHCP button (below the Servers list). Configuring Firewall Settings on the Cluster Controller The firewall on the controller is configured to enable access to all protocols from the public and private networks, but more limited access (for SSH and VPN) from external networks, including the Internet. You can adjust these rules to narrow or expand access to your controller. To configure firewall settings on the cluster controller: 1 In the controller’s list of services, click Firewall. 2 Click Settings, then click Address Groups. 3 From the IP Address Groups list, remove all entries except for “any.” 4 Click the Add (+) button. 5 In the Group name field, enter the name of your public network (for example, example.com). 6 In the “Addresses in group” field, change the first entry to match your public IP network in CIDR notation. 88 Chapter 10 Setting Up the Cluster Controller For a subnet mask of 255.255.255.0, use “/24” after the network address (for example, 10.0.2.0/24). 7 Verify that the address range for the list accurately describes the address range used by your public network. 8 Click OK. 9 Click the Add (+) button to add another IP address group. 10 In the “Group name” field, name the group with your private DNS domain name (for example, cluster). 11 In the “Addresses in group” field, change the first entry to match your private IP network in CIDR notation. For a subnet mask of 255.255.255.0, use “/24” after the network address (for example, 172.16.1.0/24). 12 Click OK. 13 Click Save. 14 Click Services. 15 From the “Edit Services for” pop-up menu, choose “any.” 16 Select “Allow only traffic from ‘any’ to these ports.” 17 Select the following ports (in addition to what’s already selected):  ESP - Encapsulating Security Payload protocol  IKE NAT Traversal  VPN ISAKMP/IKE (500)  VPN PPTP—Point-to-Point Tunneling Protocol (1723) Note: Enabling SSH and VPN ports on the controller allows remote access to the controller from your public network. Your public network can also be protected by a firewall service or device. If you plan to access your cluster from outside your public network (for example, using the Internet), talk to your system administrator about enabling the same ports on that firewall as well. 18 Click Save. 19 From the “Edit Services for” pop-up menu, choose the public network that was created in step 5 (for example, example.com). 20 Select “Allow all traffic from .” 21 Click Save. 22 From the “Edit Services for” pop-up menu, choose the private network that was created in step 10 (for example, cluster). 23 Select “Allow all traffic from .” 24 Click Save. Chapter 10 Setting Up the Cluster Controller 89 25 Click the Start Firewall button (below the Servers list). Configuring NAT Settings on the Cluster Controller Network Address Translation (NAT) allows compute nodes to share the controller’s connection to the public network. To configure NAT: 1 In the controller’s list of services, click NAT. 2 Click Settings, then verify that IP Forwarding and Network Address Translation (NAT) is selected. 3 Verify that the “External network interface” pop-up menu is set to your public Ethernet interface (for example, Ethernet 1). 4 Verify that the Enable NAT Port Mapping Protocol checkbox is selected. 5 Click the Start NAT button (below the Servers list). Configuring NFS Using Server Admin, configure the NFS service on the cluster controller. NFS is used for file sharing and network home directory mounts. To configure NFS service: 1 In the controller’s list of services, click NFS. 2 Click Settings. 3 In the “Use__server threads” field, enter a number to specify the maximum number of NFS threads, or daemons, you want to run at one time. An nfsd daemon is a server process that runs continuously behind the scenes and processes read and write requests from clients. The more threads that are available, the more concurrent clients can be served. 4 Click Save. 5 Click the Start NFS button (below the Servers list). Configuring VPN Service Configure the VPN service to enable secure connections from computers on remote networks. To configure VPN service: 1 In the controller’s list of services, click VPN. 2 Click Settings, then click PPTP. 3 Select the Enable PPTP checkbox. 90 Chapter 10 Setting Up the Cluster Controller 4 In the Starting IP address field, enter the first private IP address you want to assign to remote VPN clients (for example, 172.16.1.200). 5 In the Ending IP address field, enter the last private IP address you want to assign to remote VPN clients (for example, 172.16.1.229). 6 Click Save. 7 Click the Start VPN button (below the Servers list). Configuring Xgrid Service Using Server Admin on the cluster controller, configure it as an Xgrid controller and then start Xgrid service. Note: Because the cluster controller is also responsible for authentication, NFS sharing, network services, and possibly other critical services, it is not advisable for a cluster controller to run the Xgrid agent. To configure the Xgrid service: 1 In the controller’s list of services, click Xgrid. 2 Click Overview. 3 Click Configure Xgrid Service. The service configuration assistant will launch. 4 Click Continue. 5 Select “Host a grid,” then click Continue. 6 Enter the directory administrator’s user name and password. This is the directory administrator account you created when you enabled the Open Directory service. 7 Click Continue. 8 Verify that the Xgrid settings include the correct Kerberos realm (for example, CONTROLLER CLUSTER). 9 Click Continue. 10 Once the Xgrid service is configured, click Close. 11 Click Settings. 12 Click Agent, then deselect Enable Agent Service. 13 Click Save. 14 When prompted to restart Xgrid, click Restart. Chapter 10 Setting Up the Cluster Controller 91 Preparing the Data Drive as a Mirrored RAID set When preparing your data drive you should protect your data by using a mirrored RAID set, also referred to as RAID 1. You can use the Disk Utility application to create the mirrored RAID set. To create a mirrored RAID set you must have two or more disks. Note: Your network share points should be located on a different drive than your operating system, ideally on a mirrored RAID set. To prepare the data drive as a mirrored RAID set: 1 Open the Disk Utility application (in /Applications/Utilitie). 2 From the drive list on the left, click one of the two drives to be used in the RAID. 3 Click RAID. 4 Enter a name for the RAID set (for example, Data). 5 Drag the disks you want to mirror from the left side of the pane to the disk list at the center of the pane. 6 For each disk you dragged to the disk list, verify the disk type is set to “Raid Slice.” To use the disk as a mirror at all times, select RAID Slice. To use the disk as a mirror only when another disk fails, select Spare. 7 To automatically rebuild mirror data, click Options, select “Automatically rebuild RAID mirror sets,” and then click OK. 8 Select the RAID set from the disk list and then from the Volume Format pop-up menu choose either “Mac OS Extended (Journaled)” or “Mac OS Extended (Case-sensitive, Journaled)”. If you plan to work with applications or source code that was designed for other UNIX operating systems, choose the case-sensitive option. 9 From the RAID Type pop-up menu, choose Mirrored RAID Set. 10 Click Create. 11 Select the mirrored RAID that will host your data volume. 12 Use the cluster administrator username and password to authenticate. 13 Verify that the RAID set has the correct format. 14 Quit the Disk Utility application. 92 Chapter 10 Setting Up the Cluster Controller Creating a Home Directory Automount Share Point Use Server Admin to configure an automount share point on the cluster controller. To create an automount home directory share point: 1 Open Server Admin and select the controller in the Servers list. 2 Click File Sharing, then click Volumes. 3 Select the volume you want to contain the home directory share point (for example, Data). 4 Click Browse. 5 Click New Folder, name the folder “home,” then click Create. 6 Click Save. 7 Select the home folder you created. 8 Click Share, then click Share Point. 9 Select Enable Automount. The Automount configuration screen appears. 10 Verify that the directory is set to /LDAPv3/127.0.0.1. 11 From the protocol pop-up menu choose NFS. 12 Verify that “Use for” is set to User home folders. 13 Click OK. 14 When prompted, enter the directory administrator’s user name and password. 15 Deselect “Enable Spotlight searching.” 16 From Share Point, click Protocol Options. The Protocol Options screen appears. 17 Click NFS. 18 Select the “Export this item and its contents to” checkbox, then choose Subnet from the pop-up menu. 19 Set the Subnet address field to your private network address (for example, 172.16.1.0). 20 Set the Subnet mask field to your private network subnet mask (for example, 255.255.255.0). 21 Verify that the mapping pop-up menu is set to “Root to Nobody.” 22 Click OK. 23 Click Save. 24 Restart the controller (Apple Menu > Restart). Chapter 10 Setting Up the Cluster Controller 93 Creating User Accounts Use Workgroup Manager to create user accounts. To create user accounts: 1 If you did not restart the cluster controller at the end of the previous section (“Creating a Home Directory Automount Share Point” on page 93), restart it now. 2 Log in using your administrator account. 3 Open Workgroup Manager (located at /Applications/Server/). You can also open Workgroup Manager from the Dock. 4 Connect to the cluster controller using its hostname and your administrator user name and password. 5 On the right side of the Workgroup Manager window, click the lock button. 6 Authenticate with the directory administrator username and password. 7 Click Accounts. 8 Select the users icon tab above the accounts listing. 9 Click New User. 10 In the Name field, enter the full name for a user (for example, “Tom C”). 11 In the Short Names list box, enter a short username for the user (for example, “tac”). 12 In the Password field, enter a password for the user. 13 In the Verify field, reenter the password for the user. 14 Click Save. 15 Click Advanced. 16 From the Login Shell pop-up menu, choose the preferred shell for the user. 17 Click Home. 18 From the list, select the NFS automount share point (home). 19 Click Create Home Now. 20 Click Save. 21 Repeat this process for each cluster user. 22 Quit Workgroup Manager. 94 Chapter 10 Setting Up the Cluster Controller 11 Setting Up Compute Nodes 11 Simplify the compute node setup process by creating Auto Server Setup records. An Auto Server Setup record is an XML property list with values that can be used to automatically complete the Server Assistant for newly installed Mac OS X servers. Auto Server Setup records can be accessed using external storage (for example a CD, USB drive, or iPod) or over a network using Open Directory. For more information about creating and using Auto Server Setup records, see Server Administration. You can accomplish additional automation of compute node configuration by using scripts executed with SSH or Apple Remote Desktop software. Creating an Auto Server Setup Record for Compute Nodes To automate the process of setting up compute nodes, use Server Assistant to save the compute node configuration to a file or Open Directory record. To create an Auto Server Setup record: 1 On the cluster controller, open Server Assistant (located in /Applications/Server/). 2 In the Welcome screen: a Select “Save advanced setup information in a file or directory record.” b Click Continue. 3 In the Language screen: a Select the language you want to use to administer the server. b Click Continue. 4 In the Keyboard screen: a Select the keyboard layout for the server. b Click Continue. 95 5 In the Serial Number screen: a Enter a site-licensed Mac OS X Server serial number. Note: If you don’t have a site-licensed number you must manually enter unique serial numbers for each compute node after it has been configured. b Click Continue. 6 In the Administrator Account screen: a Create the account you’ll use to administer compute nodes. b Click Continue. 7 In the Network Interfaces screen: a b c d e f Click Add. In the Port Name field, enter “Ethernet 1.” In the Device Name field, enter “en0” and leave the Ethernet Address field blank. Click OK. Enable TCP/IP for Ethernet 1. Click Continue. 8 In the TCP/IP Connection screen for the Built-in Ethernet 1 port: a From the Configure pop-up menu, choose Using DHCP. b Leave the other fields blank. c Click Continue. 9 In the Network Names screen: a b c d Leave the Primary DNS Name field blank. Leave the Computer Name field blank. Verify that the “Enable Remote Management” checkbox is selected. Click Continue. A warning appears indicating you left some fields blank. e Click Continue. 10 In the Time Zone screen: a From the Closest City pop-up menu, choose your time zone. b Click Continue. 96 Chapter 11 Setting Up Compute Nodes 11 In the Directory Usage screen: a From the “Set directory usage to” pop-up menu, choose “Connected to a Directory System”. b From the Connect pop-up menu, choose “Open Directory Server.” c In the IP Address or DNS Name field, enter the private DNS name of the cluster controller (for example, controller.cluster). d Click Continue. 12 In the Confirm Settings screen: a Read the configuration summary to confirm that you have made the correct settings. b Click Save As. 13 In Save settings, use the following to choose whether to save your setting to a configuration file or Open Directory record. If you use a configuration file, it should be named generic.plist and saved to a CD, DVD, USB drive, iPod, or other removable drive. It should be located in a folder called Auto Server Setup at the top level of the removable file system. The file is used if the removable drive is present when an unconfigured compute node starts for the first time. If you save your settings to an Open Directory record, an unconfigured compute node discovers the record via DHCP and configures itself accordingly. Save the record to the LDAPv3/127.0.0.1 domain and name it generic. When asked, specify an Open Directory server using the controller’s DNS name (for example, controller.cluster) or IP address (for example, 10.0.2.199). Saving settings to an Open Directory record without encryption will require the use of password (.pass) files. Saving them without encryption will expose the administrator password to anyone with access to the Open Directory domain. For more information about the creation and use of Auto Server Setup record and encryption, see Server Administration. a Select Directory Record. b If creating a Directory Record, choose /LDAPv3/127.0.0.1 from the Directory Domain pop-up menu. c Decide if you want to encrypt the record. d In the Record Name field, enter “generic.” e Click OK and then authenticate using the directory admin login and password you created when you configured Open Directory. 14 Click OK. 15 Quit Server Assistant. Chapter 11 Setting Up Compute Nodes 97 Verifying LDAP Record Creation To verify the creation of the LDAP directory record that will be used by compute nodes to autoconfigure, use the slapcat command on the cluster controller. To verify the LDAP record creation: 1 Open a Terminal window on the cluster controller and enter the following command: $ sudo slapcat | grep generic 2 When prompted enter the administrator password . This command displays the generic records in the LDAP database on the cluster controller. In this case, there should only be one record—the one you created in the previous section. dn: cn=generic,cn=autoserversetup,dc=controller,dc=cluster cn: generic Setting Up Compute Nodes Setting up compute nodes involves obtaining IP addresses for each compute node connected to your private network. This section provides useful tips for setting up compute nodes depending on your cluster configuration. To set up compute nodes: 1 Make sure compute nodes are connected to the private network through Ethernet port 1. 2 Start the first compute node. The DHCP service hosted on the cluster controller provides IP addresses to nodes when they start, beginning with the first address in the range and incrementing the address for each request. The DHCP lease time specified in the Server Admin settings for the DHCP service determines how long this address is reserved for a computer. It is advisable for each node in a cluster to use sequential IP addresses that correspond to their physical position in a rack and the names they have been assigned. Node1 would have an address that ends in 1 (for example, 172.16.1.1) and node199 would have an address that ends with 199 (for example, 172.16.1.199). If you set up your cluster in this manner, start the first node and wait until you verify its IP address before starting the next one. You can check DHCP IP address assignments in the DHCP Clients pane of Server Admin. Because Server Admin does not maintain a persistent connection to the servers it administers, you might need to click the Refresh button in the toolbar to update the client listing immediately. 98 Chapter 11 Setting Up Compute Nodes If an Auto Server Setup record is available to the compute node through a removable drive or Open Directory record, it will configure itself and reboot. After you verify that the first node has completed this process, start the remaining compute nodes sequentially, allowing time for them to obtain sequential IP addresses from the DHCP server and for autoconfiguration. Do not disconnect or remove disks until you are sure the server has applied the settings. 3 Select the DHCP service and view client connections. Static Maps in the DHCP Static Maps pane of Server Admin enable you to guarantee that an IP address is always reserved for a specific node, regardless of how much time has elapsed since it was assigned its address. In addition to providing the IP address assignment, the DHCP service on the cluster controller provides the IP address and search base for the Open Directory domain on the cluster controller. Configuring Cluster Nodes When configuring cluster nodes, use Server Admin to name cluster nodes, join them to the Kerberos realm, and join them to a grid. To configure cluster nodes: 1 Open Server Admin. 2 Click the Add Server (+) button below the Servers list. 3 Connect to the cluster node using its IP address. If you used an Auto Server Setup record to configure the nodes, use the administrator user name and password you created with that record. 4 In the Servers list, click the cluster node. 5 Click Settings. Note: If the Mac OS X Server serial number is not valid, Server Admin doesn’t permit you to administer services. If you did not supply a volume license serial number when creating the Auto Server Setup file, you must enter a valid serial number for each node before you can continue. Click General to verify the serial number. 6 Click Network. 7 In the Computer Name and Local Hostname fields, enter the computer name and hostname of the cluster node (for example, node1). 8 Click Save. 9 Click Services. 10 Select the Open Directory checkbox. 11 Select the Xgrid checkbox. Chapter 11 Setting Up Compute Nodes 99 12 Click Save. 13 Repeat steps 2 through 12 for each compute node. You can also use Apple Remote Desktop to set the names of all cluster nodes at once. For more information, see “Naming Multiple Cluster Nodes” on page 111. 14 Select the node’s Open Directory service. 15 Click Settings, then click General. 16 Verify the role is set to “Connected to a Directory System.” 17 Click Join Kerberos. A Join Kerberos Realm screen appears. Set the realm to your Kerberos realm (for example, CONTROLLER.CLUSTER). 18 Enter the Open Directory administrator user name and password. 19 Click Refresh below the Servers list. If the node has joined the Kerberos realm, the Join Kerberos button and associated text will disappear. 20 In the Servers list select the node’s Xgrid service. 21 Click Overview. 22 Click Configure Xgrid Service. The Xgrid Service Configuration Assistant appears. 23 Click Continue, then select “Join a grid.” 24 Click Continue. 25 In the “Use controller with hostname” field, enter the controller’s private DNS name (for example, controller.cluster). 26 Click Continue. 27 Confirm the settings. The Directory Server entry should be an LDAPv3 path based on the controller’s DNS name (for example, /LDAPv3/controller.cluster). The Kerberos realm should be the same as the controller’s DNS name in all capital letters (for example, CONTROLLER.CLUSTER). 28 Click Continue. 29 Click Close. You can automate steps. For more information, see Appendix B, “Automating Compute Node Configuration.” 100 Chapter 11 Setting Up Compute Nodes Creating and Verifying a VPN Connection Remote clients can connect to the private network of the cluster securely using SSH and VPN. VPN access allows graphical applications (like the GridMandelbrot sample Xgrid application) to run on remote systems, but use the cluster for computation. VPN access also allows administrative tools, such as Apple Remote Desktop, to manage compute nodes from a remote system. The following instructions are for VPN configuration for Mac OS X v10.5 clients. For other operating systems, or older versions of Mac OS X, consult the appropriate documentation using the values provided in the following. To create and verify a VPN connection: 1 Open System Preferences, then click Network. 2 Click the Add (+) button at the bottom of the network connection services list and then choose VPN from the Interface pop-up menu. 3 From the VPN Type pop-up menu, choose PPTP. 4 In the Service Name field, enter a descriptive name (for example, Cluster VPN)) and click Create. 5 In the Server Address field, enter the public IP address for the controller (for example, 10.0.2.199). 6 In the Account Name field, enter the short username for a user you created on the controller using Workgroup Manager. For more information, see “Creating User Accounts” on page 94. 7 Click Apply and then click Connect. 8 Verify that the network connection services list has an active VPN (PPTP) connection to the cluster controller and that you’re getting a private network address. Joining a Remote Client to the Kerberos Realm Because the firewall has been configured to block most types of incoming network access, a VPN connection is necessary to use Kerberos from remote clients. For your client computer to use Kerberos, you must join it to the Kerberos realm of the controller. To join a remote client to the Kerberos realm: 1 Open the Kerberos application located in the /System/Library/CoreServices/ folder. 2 Select Edit > Edit Realms. 3 Click the Add (+) button below the Realm list. 4 In the Realm Name field, enter the Kerberos Realm of the controller (for example, CONTROLLER.CLUSTER). Chapter 11 Setting Up Compute Nodes 101 5 Click Servers, then click the Add (+) button (below the Servers list). 6 Verify that the new entry in the Type column is listed as “KDC.” 7 Enter the private DNS name for your controller in the Server column (for example, controller.cluster). 8 Click Domain, then click the Add (+) button (below the Domain list). 9 Enter the private DNS zone preceded by a period (for example, .cluster). 10 Click the Add (+) button (below the Domain list). 11 Enter the private DNS zone (for example, cluster). 12 Click OK. 13 Authenticate using administrator credentials for you client computer. Verifying Remote Client Access to the Kerberos Realm After the remote client is configured to join the Kerberos realm, verify that you have received a Ticket Granting Ticket (TGT) from the controller. To verify remote client access to the Kerberos realm: 1 Open the Kerberos application located in the /System/Library/CoreServices/ folder. 2 Click New. 3 Verify that the Realm is set to the Kerberos Realm of the controller (for example, CONTROLLER.CLUSTER). 4 Enter the user name and password for an account created in the Open Directory domain of the controller. 5 Click OK. 6 Verify the entry in the Ticket Cache list. 7 Verify the entry of the TGT for your user in the Ticket list (for example, krbtgt/ CONTROLLER.CLUSTER@CONTROLLER.CLUSTER). Note: When an application that supports Kerberos is used and the Kerberos TGT does not exist or has expired, the Kerberos authentication dialog appears. You do not need to use the Kerberos application each time you want to obtain a ticket. 102 Chapter 11 Setting Up Compute Nodes 12 Testing Your Cluster 12 Use this chapter to make sure you’ve successfully configured your cluster before performing HPC. Use Xgrid Admin to verify that you can see the Xgrid agents in your cluster. Then use sample Xgrid tasks to test your cluster. Checking Your Cluster Using Xgrid Admin Use Xgrid Admin to verify that Xgrid agents are running on the compute nodes. To use Xgrid Admin to check your cluster: 1 From the management computer, a VPN client, or the controller, open Xgrid Admin (located in /Applications/Server/). 2 Click Add Controller. 3 From the pop-up menu, choose the controller and click Connect. 4 In the authentication sheet: a Select “Use Single Sign On Credentials.” b Click OK. c If prompted, enter a cluster account username, the Kerberos realm (for example, CONTROLLER.CLUSTER), and password. d Click OK. 5 In the Controllers and Grids list, select the cluster. 6 Click Overview. Overview shows the number of agents, which should equal the number of compute nodes you configured. This also shows the number of available, unavailable, and working processors, and the number of jobs running and jobs pending. 7 View the status of the Xgrid agents by clicking Agents. 103 8 Verify that you can see a list of all nodes in your cluster. If you don’t see all agents you were expecting, see “If Your Agents Can’t Connect to the Xgrid Controller” on page 51. 9 Monitor the progress of Xgrid jobs as they are being processed by clicking Jobs. 10 Quit Xgrid Admin. Testing Your Xgrid Cluster To test your cluster, use GridSample, a sample Cocoa application that comes with Developer Tools for Mac OS X v10.5, to submit Xgrid tasks to the controller. This application provides you with an easy-to-use GUI for Xgrid. On any system that has the Mac OS X developer tools installed, the example code for the application is at: /Developer/Examples/Xgrid/GridSample/GridSample.xcodeproj Using this application, you can generate the monthly calendars of the year 2007 across the cluster. Although this application is trivial, it enables you to test the cluster and it illustrates the simplicity of Xgrid job submission. Note: You can also submit Xgrid tasks using the xgrid command-line tool. For more information, see the tool’s man page and Command-Line Administration. To test your cluster using GridSample: 1 Open GridSample.xcodeproj by using Xcode (located in /Developer/Applications/). 2 Set the active executable to Xgrid Feeder Sample by choosing Project > Set Active Executable > Xgrid Feeder Sample. 3 Build and run the project by clicking “Build and Go.” The application starts running and prompts you for an Xgrid controller to connect to. 4 Enter the address of the controller and click Connect. 5 Click “Use password,” enter the password for the controller, and click OK. 6 Click New Job. 7 In the Job Name field, enter “2007 Calendars.” 8 Make sure the Command field is set to /usr/bin/cal. 9 From the Argument 1 pop-up menu, choose Range. 10 For argument 1, enter 1 in the From field, 12 in the “to” field, and 1 in the “by” field. This range tells the application to generate the 2007 monthly calendars from January through December. 11 To add another argument below Argument 1, click the Add (+) button. 12 From the Argument 2 pop-up menu, choose Literal. 104 Chapter 12 Testing Your Cluster 13 For argument 2, enter “2007.” Note: Instead of specifying one year, you could specify a range of years, and Xgrid would create a separate set of tasks for each year. 14 Click Submit. The Xgrid controller on the controller prepares the tasks and sends them to Xgrid agents running on the cluster nodes. When the job is done, the status of the job changes to Finished in the Xgrid Feeder Sample window. 15 To see the results of each task, click Show Results. Note: To test image-rendering on your cluster, use Xcode to build and run the example application GridMandelbrot.xcodeproj (located in /Developer/Examples/Xgrid/ GridMandelbrot/). Just as you did earlier, build and run the project, connect to the Xgrid controller, and submit the job. The application renders Mandelbrot images across your cluster. Verifying Your Xgrid Configuration Verify that Xgrid is configured and works. To verify your Xgrid service: 1 Install and configure Xcode developer tools. Xcode is included with the Mac OS X Server Installation disc. The latest version of Xcode can also be downloaded from the Apple Developer Connection (ADC) at www.apple.com/developer. 2 Compile and launch the Xgrid Mandelbrot example application (located in /Developer/ Examples/Xgrid/GridMandelbrot). 3 From the “Enter or choose a controller to connect to” pop-up menu, choose your controller and click Connect. 4 Select “Use Single Sign On credentials” and click OK. 5 Enter a cluster user name and password to authenticate with Kerberos, then click OK. You can monitor your cluster’s performance with the Xgrid Admin application in /Application/Server/. Chapter 12 Testing Your Cluster 105 Verifying Your SSH Connection Verify that SSH is running on the controller by using Terminal. To verify your SSH connection: 1 From a remote system, open Terminal (located in /Applications/Utilities/). 2 Open an SSH connection to your controller by logging in with a user account name and password created in Workgroup Manager and by using the public IP address or public DNS name for your controller (for example, ssh tomclark@10.0.2.199). Enter the following command to obtain a Kerberos Ticket Granting Ticket (TGT) and when prompted for a password use the same password used for your SSH connection. By using a TGT you are not required to enter passwords for access to cluster resources. $ kinit Please enter the password for tomclark@CONTROLLER.CLUSTER: After the connection to the controller is made, you can connect directly to the compute nodes using their private DNS name (for example, ssh tomclark@node1.cluster or ssh tomclark@node1). 106 Chapter 12 Testing Your Cluster Cluster Setup Checklist A Appendix A Use the checklist in this appendix to guide you through the cluster setup procedure. Print this checklist and use it to make sure you have performed all setup steps. The steps in this checklist are in order only within each section. For information about this step Go to Physical Setup N Power source meets minimum requirements “Power Requirements” on page 68 N Cooling system meets minimum requirements “Cooling Requirements” on page 69 N Facility housing the cluster meets minimum weight requirements “Weight Requirements” on page 70 N Space around the cluster meets minimum requirements “Space Requirements” on page 70 N Network switches support Gigabit Ethernet and have enough ports “Network Access Requirements” on page 71 N Mount cluster nodes on the rack “Network Access Requirements” on page 71 N Connect cluster nodes to a power source “Preparing the Cluster Nodes for Software Configuration” on page 75 N Connect cluster nodes to the private network “Preparing the Cluster Nodes for Software Configuration” on page 75 Software Setup N Obtain a static IP address and related network and DNS information N Obtain a site-licensed serial number “Volume-Licensed Serial Number” on page 72 N Obtain a copy of Apple Remote Desktop “Apple Remote Desktop” on page 72 N Record the serial numbers of cluster nodes “Preparing the Cluster Nodes for Software Configuration” on page 75 “Network Access Requirements” on page 71 107 For information about this step Go to Management Computer Setup (Optional) N Disable AirPort and other public network connections “(Optional) Setting Up the Management Computer” on page 78 N Install the latest version of Mac OS X Server tools “(Optional) Setting Up the Management Computer” on page 78 N Install Apple Remote Desktop “(Optional) Setting Up the Management Computer” on page 78 Controller Setup N Connect the controller to the public and private network N Run Server Assistant and configure public “Setting Up Server Software on the Cluster Controller” network settings on page 81 N Configure DNS service “Configuring DNS Service” on page 84 N Configure Open Directory service “Configuring the Cluster Controller as an Open Directory Master” on page 86 N Configure DHCP service “Configuring DHCP Service” on page 87 N Configure Firewall service “Configuring Firewall Settings on the Cluster Controller” on page 88 N Configure NAT service “Configuring NAT Settings on the Cluster Controller” on page 90 N Configure NFS service “Configuring NFS” on page 90 N Configure VPN service “Configuring VPN Service” on page 90 N Configure Xgrid service “Configuring Xgrid Service” on page 91 N Prepare data drive “Preparing the Data Drive as a Mirrored RAID set” on page 92 N Create home directory “Creating a Home Directory Automount Share Point” on page 93 N Create user accounts “Creating User Accounts” on page 94 “Setting Up Server Software on the Cluster Controller” on page 81 Compute Node Setup N Create auto server setup records N Set up compute nodes “Setting Up Compute Nodes” on page 98 N Configure cluster nodes “Configuring Cluster Nodes” on page 99 N Create and verify VPN connection “Creating and Verifying a VPN Connection” on page 101 “Creating an Auto Server Setup Record for Compute Nodes” on page 95 Cluster Testing 108 N Check the cluster using Xgrid Admin “Checking Your Cluster Using Xgrid Admin” on page 103 N Test Xgrid cluster “Testing Your Xgrid Cluster” on page 104 Appendix A Cluster Setup Checklist For information about this step Go to N Verify Xgrid configuration “Verifying Your Xgrid Configuration” on page 105 N Verify your SSH connection “Verifying Your SSH Connection” on page 106 Appendix A Cluster Setup Checklist 109 110 Appendix A Cluster Setup Checklist Automating Compute Node Configuration B Appendix B Use this appendix to learn about alternative ways of completing tasks documented earlier in this guide. For large clusters, some tasks in this guide can be completed quickly and efficiently using Apple Remote Desktop. Naming Multiple Cluster Nodes Using the Send UNIX Command in Apple Remote Desktop, you can rename all cluster nodes at once. The shell script used in the following steps causes each node to set its Computer name and Bonjour name to “node” followed by the last digit of its IP address. For example, a node with the IP address of “172.16.1.2” will be named “node2.” To name multiple cluster nodes: 1 Open Apple Remote Desktop. 2 Select the nodes to be configured. 3 From the Manage pop-up menu, select “Send UNIX Command.” 4 In the first field, enter the following shell script, noting the use of double quotes (“) and backquotes (`). theNodeNumber=`ipconfig getifaddr en0 | cut -d . -f 4` /System/Library/ServerSetup/serversetup -setComputerName "node$theNodeNumber" /System/Library/ServerSetup/serversetup -setBonjourName "node$theNodeNumber" 5 Select button next to User. 6 In the User field, enter “root.” 7 Click Send. For each node that sets its name, an entry is created in the results window followed by two lines containing a zero. 111 8 Close the Send UNIX Command results window. All nodes should now show their hostname in the Remote Desktop list. Joining Multiple Cluster Nodes to the Kerberos Realm To send commands to join the nodes to the Kerberos realm, use Apple Remote Desktop’s Send UNIX Command. To join multiple cluster nodes to the Kerberos realm: 1 Open Apple Remote Desktop. 2 Select the nodes you want to join. 3 From the Manage pop-up menu, choose Send UNIX Command. 4 In the first field, enter the following command: sso_util configure -r CONTROLLER.CLUSTER -a diradmin -p diradminpassword all This command sets each cluster node to join the Kerberos realm “CONTROLLER.CLUSTER” using the directory administrator account “diradmin” and the password “diradminpassword.” 5 Select the button next to User. 6 In the User field, enter “root”. 7 Click Send. For each node joining the Kerberos realm, there is an entry in the results window. 8 Close the Send UNIX Command results window. Configuring Xgrid Agent Settings Using Apple Remote Desktop To send commands to compute nodes to configure their Xgrid agent settings, use Apple Remote Desktop’s Send UNIX Command. To configure Xgrid agent settings: 1 Open Apple Remote Desktop. 2 From the pop-up menu, click Scanner and choose Network Range. 3 Enter the starting and ending addresses of the address range used by the compute nodes. 4 Select the compute nodes from the list and choose Manage > Send UNIX Command. 112 Appendix B Automating Compute Node Configuration 5 In the text field, enter the following commands: serveradmin settings xgrid:XgridKerberosInfo:ReadyForAgentRoleBasedSetup = yes serveradmin settings xgrid:XgridKerberosInfo:ReadyForControllerRoleBasedSetup = yes serveradmin settings xgrid:AgentSettings:Enabled = yes serveradmin settings xgrid:AgentSettings:ControllerPassword = "" serveradmin settings xgrid:AgentSettings:prefs:ControllerName = "controller" serveradmin settings xgrid:AgentSettings:prefs:SuspendWhenNotIdle = no serveradmin settings xgrid:AgentSettings:prefs:OnlyWhenIdle = no serveradmin settings xgrid:AgentSettings:prefs:ResolveNameAsNetService = yes serveradmin settings xgrid:AgentSettings:prefs:ControllerAuthentication = "Kerberos" serveradmin settings xgrid:AgentSettings:prefs:BindToFirstAvailable = no serveradmin settings xgrid:ControllerSettings:ClientPassword = "" serveradmin settings xgrid:ControllerSettings:Enabled = no serveradmin settings xgrid:ControllerSettings:prefs:AgentAuthentication = "Kerberos" serveradmin settings xgrid:ControllerSettings:prefs:ClientAuthentication = "Kerberos" serveradmin settings xgrid:ControllerSettings:AgentPassword = "" xgridctl agent start Replace “controller” with the fully qualified private name of the controller (for example, controller.cluster). 6 Select User and enter “root” in the text field. 7 Select “Display all output.” 8 Click Send. These commands configure the Xgrid agent on compute nodes to bind to the controller and then start the Xgrid service. The compute nodes can now receive Xgrid tasks. Appendix B Automating Compute Node Configuration 113 Using SSH Without Passwords Users on your cluster can generate authentication keys in their home folders that enable them to use SSH to connect to other cluster nodes without entering their password again. To use SSH without passwords: 1 Make an SSH connection to the controller. If connecting from a remote system, access the public IP address or DNS name of the controller (For example, ssh mab@10.0.2.199). 2 In your home directory on the controller, enter the following commands in sequence: mkdir .ssh chmod 700 .ssh ssh-keygen -t dsa -f .ssh/id_dsa -C "Enter a comment here" You are prompted twice to enter a passphrase. Leave this blank and press Return each time. chmod 600 .ssh/id_dsa* cat .ssh/id_dsa.pub >> .ssh/authorized_keys You can test the authentication keys by attempting to make an SSH connection from the controller to a cluster node (for example, ssh mab@node2.cluster). The first time you connect to any cluster node, SSH prompts you to establish the authenticity of that node by entering “yes” at the prompt. After the authenticity of the node is established, a record is stored in the ~/.ssh/known_hosts file of your home folder and you are not prompted for that host again. 114 Appendix B Automating Compute Node Configuration Glossary Glossary address A number or other identifier that uniquely identifies a computer on a network, a block of data stored on a disk, or a location in a computer’s memory. See also IP address, MAC address. administrator A user with server or directory domain administration privileges. Administrators are always members of the predefined “admin” group. AFP Apple Filing Protocol. A client/server protocol used by Apple file service to share files and network services. AFP uses TCP/IP and other protocols to support communication between computers on a network. aggregation Combining similar objects or resources (such as disks or network connections) into a single logical resource in order to achieve increased performance. For example, two or more disks can be aggregated into a single logical disk to provide a single volume with increased capacity. Apple Filing Protocol See AFP. AppleScript A scripting language with English-like syntax, used to write script files that can control your computer. AppleScript is part of the Mac operating system and is included on every Macintosh. automatic backup A backup triggered by an event (such as a scheduled time, or the exceeding of a storage limit) rather than by a human action. automatic failover Failover that occurs without human intervention. availability The amount of time that a system is available during those time periods when it’s expected to be available. See also high availability. back up (verb) The act of creating a backup. backup (noun) A collection of data that’s stored for the purpose of recovery in case the original copy of data is lost or becomes inaccessible. bit A single piece of information, with a value of either 0 or 1. 115 bit rate The speed at which bits are transmitted over a network, usually expressed in bits per second. byte A basic unit of measure for data, equal to eight bits (or binary digits). client A computer (or a user of the computer) that requests data or services from another computer, or server. cluster A collection of computers interconnected in order to improve reliability, availability, and performance. Clustered computers often run special software to coordinate the computers’ activities. See also computational cluster. command-line interface A way of interacting with the computer (for example, to run programs or modify file system permissions) by entering text commands at a shell prompt. See also shell; shell prompt. computational cluster A group of computers or servers that are grouped together to share the processing of a task at a high level of performance. A computational cluster can perform larger tasks than a single computer would be able to complete, and such a grouping of computers (or “nodes”) can achieve high performance comparable to a supercomputer. data rate The amount of information transmitted per second. default The automatic action performed by a program unless the user chooses otherwise. deploy To place configured computer systems into a specific environment or make them available for use in that environment. disk A rewritable data storage device. See also disk drive, logical disk. disk drive A device that contains a disk and reads and writes data to the disk. disk image A file that, when opened, creates an icon on a Mac OS X desktop that looks and acts like an actual disk or volume. Using NetBoot, client computers can start up over the network from a server-based disk image that contains system software. Disk image files have a filename extension of either .img or .dmg. The two image formats are similar and are represented with the same icon in the Finder. The .dmg format cannot be used on computers running Mac OS 9. DNS Domain Name System. A distributed database that maps IP addresses to domain names. A DNS server, also known as a name server, keeps a list of names and the IP addresses associated with each name. DNS domain A unique name of a computer used in the Domain Name System to translate IP addresses and names. Also called a domain name. 116 Glossary DNS name A unique name of a computer used in the Domain Name System to translate IP addresses and names. Also called a domain name. domain Part of the domain name of a computer on the Internet. It does not include the top-level domain designator (for example, .com, .net, .us, .uk). Domain name “www.example.com” consists of the subdomain or host name “www,” the domain “example,” and the top-level domain “com.” domain name See DNS name. Domain Name System See DNS. Ethernet A common local area networking technology in which data is transmitted in units called packets using protocols such as TCP/IP. Ethernet adapter An adapter that connects a device to an Ethernet network. Usually called an Ethernet card or Ethernet NIC. See also NIC. Fibre Channel The architecture on which most SAN implementations are built. Fibre Channel is a technology standard that allows data to be transferred from one network node to another at very high speeds. file system A scheme for storing data on storage devices that allows applications to read and write files without having to deal with lower-level details. GB Gigabyte. 1,073,741,824 (230) bytes. Gigabit Ethernet A group of Ethernet standards in which data is transmitted at 1 gigabit per second (Gbit/s). Abbreviated GbE. gigabyte See GB. high availability The ability of a system to perform its function continuously, without interruption. host name A unique name for a computer, historically referred to as the UNIX hostname. HTTP Hypertext Transfer Protocol. The client/server protocol for the World Wide Web. HTTP provides a way for a web browser to access a web server and request hypermedia documents created using HTML. Hypertext Transfer Protocol See HTTP. image See disk image. Internet A set of interconnected computer networks communicating through a common protocol (TCP/IP). The Internet is the most extensive publicly accessible system of interconnected computer networks in the world. Glossary 117 Internet Protocol See IP. IP Internet Protocol. Also known as IPv4. A method used with Transmission Control Protocol (TCP) to send data between computers over a local network or the Internet. IP delivers data packets and TCP keeps track of data packets. IP address A unique numeric address that identifies a computer on the Internet. KB Kilobyte. 1,024 (210) bytes. kilobyte See KB. link An active physical connection (electrical or optical) between two nodes on a network. link aggregation Configuring several physical network links as a single logical link to improve the capacity and availablility of network connections. With link aggregation, all ports are assigned the same ID. Compare to multipathing, in which each port keeps its own address. load balancing The process of distributing client computers’ requests for network services across multiple servers to optimize performance. log in (verb) To start a session with a computer (often by authenticating as a user with an account on the computer) in order to obtain services or access files. Note that logging in is separate from connecting, which merely entails establishing a physical link with the computer. logical disk A storage device that appears to a user as a single disk for storing files, even though it might actually consist of more than one physical disk drive. An Xsan volume, for example, is a logical disk that behaves like a single disk even though it consists of multiple storage pools that are, in turn, made up of multiple LUNs, each of which contains multiple disk drives. See also physical disk. Mac OS X The latest version of the Apple operating system. Mac OS X combines the reliability of UNIX with the ease of use of Macintosh. Mac OS X Server An industrial-strength server platform that supports Mac, Windows, UNIX, and Linux clients out of the box and provides a suite of scalable workgroup and network services plus advanced remote management tools. MB Megabyte. 1,048,576 (220) bytes. MB/s Abbreviation for megabytes per second. Mbit Abbreviation for megabit. Mbit/s Abbreviation for megabits per second. 118 Glossary megabyte See MB. name server A server on a network that keeps a list of names and the IP addresses associated with each name. See also DNS, WINS. Network File System See NFS. network interface Your computer’s hardware connection to a network. This includes (but isn’t limited to) Ethernet connections, AirPort cards, and FireWire connections. network interface card See NIC. NFS Network File System. A client/server protocol that uses Internet Protocol (IP) to allow remote users to access files as though they were local. NFS can export shared volumes to computers based on IP address, and also supports single sign-on (SSO) authentication through Kerberos. nfsd daemon An NFS server process that runs continuously behind the scenes and processes NFS protocol and mount protocol requests from clients. nfsd can have multiple threads. The more NFS server threads, the better concurrency. NIC Network interface card. An adapter that connects a computer or other device to a network. NIC is usually used to refer to adapters in Ethernet networking; in Fibre Channel networking, the interface is usually called a host bus adapter (HBA). Open Directory The Apple directory services architecture, which can access authoritative information about users and network resources from directory domains that use LDAP, Active Directory protocols, or BSD configuration files, and network services. open source A term for the cooperative development of software by the Internet community. The basic principle is to involve as many people as possible in writing and debugging code by publishing the source code and encouraging the formation of a large community of developers who will submit modifications and enhancements. port A sort of virtual mail slot. A server uses port numbers to determine which application should receive data packets. Firewalls use port numbers to determine whether data packets are allowed to traverse a local network. “Port” usually refers to either a TCP or UDP port. port name A unique identifier assigned to a Fibre Channel port. protocol A set of rules that determines how data is sent back and forth between two applications. Glossary 119 RAID Redundant Array of Independent (or Inexpensive) Disks. A grouping of multiple physical hard disks into a disk array, which either provides high-speed access to stored data, mirrors the data so that it can be rebuilt in case of disk failure, or both. The RAID array is presented to the storage system as a single logical storage unit. See also RAID array, RAID level. RAID 1 A RAID scheme that creates a pair of mirrored drives with identical copies of the same data. It provides a high level of data availability. RAID array A group of physical disks organized and protected by a RAID scheme and presented by RAID hardware or software as a single logical disk. In Xsan, RAID arrays appear as LUNs, which are combined to form storage pools. RAID level A storage allocation scheme used for storing data on a RAID array. Specified by a number, as in RAID 3 or RAID 0+1. router A computer networking device that forwards data packets toward their destinations. A router is a special form of gateway which links related network segments. In the small office or home, the term router often means an Internet gateway, often with Network Address Translation (NAT) functions. Although generally correct, the term router more properly refers to a network device with dedicated routing hardware. server A computer that provides services (such as file service, mail service, or web service) to other computers or network devices. Server Message Block See SMB. SMB Server Message Block. A protocol that allows client computers to access files and network services. It can be used over TCP/IP, the Internet, and other network protocols. SMB services use SMB to provide access to servers, printers, and other network resources. switch Networking hardware that connects multiple nodes (or computers) together. Switches are used in both Ethernet and Fibre Channel networking to provide fast connections between devices. 120 Glossary Index Index A B access administrator permissions 36 LDAP 86, 98 managing client 35 accounts 94 ACLs (access control lists) 35 administrator 36, 42 agents adding 43 authentication 26 controllers 23, 30, 32, 91 deleting 44 distributed grids 21 functions of 22 grid workload 19 list of 43 management of 42, 43 mobility of 39 overview 23 requirements 18 setup 32, 33, 42, 43, 112 troubleshooting 51 airflow for hardware 77 Apple Remote Desktop (ARD) agent settings 112 clusters 72 features 42 Apple Workgroup Cluster 62 applications grid performance 19 Xgrid support 53 Xserve support 61 See also specific applications ARD. See Apple Remote Desktop authentication cluster 86, 112, 114 options 26, 27, 31 passwords 26, 27, 30, 34 setup 33, 34 troubleshooting 54 See also Kerberos bioinformatics 62 C cat tool 37 client computers, agent setup 33 clients access control 35 authentication 27 overview 23 remote, joining to Kerberos 101 verifying remote access 102 See also client computers; users clusters authentication 86, 112, 114 checklist for setup 107 connections 71 controllers 81, 91, 98 data drives 92 definition 20 DHCP 87, 98 DNS 84 domain for 99 high performance computing 59, 60 homogeneity of 20 management computer 78, 81 NFS 90, 93 Open Directory 86 requirements 67, 68, 73 setup overview 63, 64 testing 103, 104, 105, 106 user accounts 94 VPN 90 vs. grids 18 Xgrid capacity 24 Xgrid service 91 Xserve 60, 62 See also nodes command-line tools Server Admin 38 SSH login 42 viewing logs 37 121 Xgrid 42, 48, 104 computational grids. See grids, computational computers agent setup 42 client 33 idle status 32 management 78, 81 configuration agents 32, 33, 42, 43, 112 authentication 33, 34, 86, 112 automatic grid 22 controller 30, 81, 91 hosting 28 joining 29 remote preferences 42 Service Configuration Assistant 28 See also clusters; nodes controllers and agents 23, 30, 32, 91 cluster 81, 91, 98 connections 40, 41 hosting considerations 28 management of 40 NAT settings 90 nodes 22 overview 24 requirements 18 security 88 setup 30, 81, 91 cooling requirements 69 cross-platform Xgrid agents 53 cryptography 62 D data drive setup 92 data mining 62 desktop recovery 18 DHCP (Dynamic Host Configuration Protocol) service 87, 98 directory services 84, 86, 99 disk images 43 disks, cluster preparation 92 Disk Utility 92 distributed computing architecture 21 See also Xgrid DNS (Domain Name System) service 28, 84, 85 documentation 11, 12, 13 Domain Name System. See DNS domains, directory 84, 86, 99 drives. See disks Dynamic Host Configuration Protocol. See DHCP E embarrassingly parallel computations 62 Ethernet 122 Index Gigabit Ethernet 62, 71 ports for 77, 81 F failure rates 21, 48 file services 90, 93 firewall service 28, 51, 88 G Gigabit Ethernet 62, 71 grids, computational automatic configuration 22 definition 18, 39 functions 22 management of 39, 45, 46 overview 17 performance 19 types 21 vs. clusters 18 See also Xgrid GridSample 104 H hardware requirements 67, 68, 69, 70 head node 21 help, using 10 highly dispersed grids 18 high performance computing (HPC) Apple’s role in 59, 60, 62 overview 59 homogeneity of clusters 20 host names 73, 86 HPC. See high performance computing I images disk 43 rendering of 62, 105 indicators, status 40 installation NetInstall 43 IP addresses DHCP setup 87, 98 DNS service 84 hosting controller 28 static 73 VPN setup 91 J jobs definition 24 deleting 45 failure of 48 list of 44 overview 18, 19, 22, 23, 24 restarting 45 results 49 status checking 49 stopping 44 structuring 47 styles 47 submitting 48 K Kerberos cluster setup 86, 112 joining remote clients 101 verifying remote client access 102 Xgrid administration 26, 27, 34, 54 L LDAP (Lightweight Directory Access Protocol) service 86, 98 libraries, code 61 local grids 21 login, SSH 42 logs 37 loosely coupled computations 62 M Mac OS X agent setup 33, 42 Mac OS X Server agent setupauthentication options 32 high performance computing 59, 60 software requirements 72 management computer 78, 81 memory Xgrid requirements 18 Xserve systems 61 message-passing interface. See MPI mounting cluster nodes 75 MPI (message-passing interface) 48 N name server 88 See also DNS naming conventions, nodes 99, 111 NAT (Network Address Translation) 90 NetBoot service 43 NetInstall 43 Network File System. See NFS networks cluster connections 71 controller hosting 28 grid type 21 private 71, 90, 101 public 71 Index See also Ethernet network services DHCP 87, 98 DNS 28, 84, 85 NAT 90 VPN 90, 101 See also IP addresses NFS (Network File System) 90, 93 nfsd daemon 90 nodes cluster arrangement 60 controller 22 firewall settings 88 head 21 joining to Kerberos realm 112 LDAP record 98 mounting 75 naming 99, 111 NAT settings 90 overview 18 setup 98 VPN connection 101 O Open Directory 84, 86, 99 Open Directory master 26, 86 P passwords 26, 27, 30, 34 PDUs (power distribution units) 76 permissions, administrator 36 ports Ethernet 77, 81 firewall 51, 88 power considerations 68, 76 power distribution units. See PDUs preferences remote setup 42 Sharing 33 System Preferences 42 private network 71 See also VPN privileges, administrator 36 problems. See troubleshooting protocols DHCP 87, 98 LDAP 86, 98 public network 71 R RAM (random-access memory) 18 rated power consumption 68 realms. See Kerberos remote server administration 42 See also Apple Remote Desktop 123 rendering images 62, 105 requirements cluster 67, 68, 73 hardware 67, 68, 69, 70 software 72 Xgrid administration 18, 24 research-related grid projects 18, 19 S SACLs (service access control lists) 35 scp tool 42 search base, LDAP 86 secure SHell. See SSH security administrator permissions 36 controllers 88 firewall service 28, 51, 88 See also access; authentication serial number 72, 75, 99 Server Admin 99 serveradmin tool 38 Server Assistant 81 servers, remote 42 See also Apple Remote Desktop Server Tools 72 service access control lists. See SACLs Service Configuration Assistant 28, 29 setup procedures. See configuration; installation share points, location of 92 Sharing preferences 33 single sign-on (SSO) authentication 26, 27, 34, 54 software cluster setup 81 requirements 72 space requirements 70 SSH (secure SHell host) 42, 51, 114 Static Maps 99 subnets 24, 87 supercomputing 17 System Preferences 42 T tail tool 37 tasks 18, 22, 52, 104 See also jobs temperature, operating 69 troubleshooting agents 51 authentication 54 124 Index firewall ports 51 multi-CPU machines 52 platform considerations 53 SSH 51 typical power consumption 68 U uninterruptible power supply. See UPS UNIX 53, 61 UPS (uninterruptible power supply) 76 user accounts, setup 94 User Datagram Protocol. See UDP users management of 40 volunteer grid projects 18, 19 See also clients; user accounts V ventilation of hardware 77 Virtual Private Network. See VPN VPN (Virtual Private Network) 90, 101 X Xcode 105 Xgrid advantages 20 application support 53 components 22, 23, 24 introduction 17, 18, 20, 21 management of 37 overview 9 planning for 26 requirements 18, 24 setup 25, 30, 91 starting 28, 31 status checking 37 stopping 38 See also agents; clusters; grids, computational; jobs Xgrid Admin agents 42, 43, 44 grid management 45, 46 jobs 44, 45 overview 39 status indicators 40 testing clusters 103, 105, 106 xgridctl tool 42 xgrid tool 48, 104 Xserve 60, 62
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.6 Linearized : Yes Page Mode : UseOutlines XMP Toolkit : 3.1-701 Producer : Acrobat Distiller 7.0.5 for Macintosh Modify Date : 2007:10:02 17:41:38-07:00 Creator Tool : FrameMaker 6.0 Create Date : 2007:10:02 17:33:46Z Metadata Date : 2007:10:02 17:41:38-07:00 Format : application/pdf Creator : Apple Inc. Title : Xgrid Administration and High Performance Computing Description : Mac OS X Server v10.5 Leopard Document ID : uuid:33f41734-7149-11dc-8538-00145161d7da Instance ID : uuid:67089d00-7149-11dc-a578-00145161d7da Page Count : 124 Page Layout : SinglePage Subject : Mac OS X Server v10.5 Leopard Author : Apple Inc.EXIF Metadata provided by EXIF.tools