Windows NT Device Driver Book A Guide For Programmers
Page Count: 544
The Windows NT Device Driver Book: A Guide for Programmers

Art Baker
Cydonix Corporation

To join a Prentice Hall PTR Internet mailing list, point to: http://www.prenhall.com/register

Prentice Hall PTR
Upper Saddle River, New Jersey 07458
http://www.prenhall.com

Library of Congress Cataloging-in-Publication Data

Baker, Art (Arthur H.)
The Windows NT Device Driver Book: A Guide for Programmers / Art Baker
p. cm.
Includes index.
ISBN 0-13-184474-1
1. Microsoft Windows NT device drivers (computer programs) I. Title.
QA76.76.D46B355 1996
005.7'126--dc20 96-22449 CIP

Editorial/production supervision and Interior Design: Joanne Anzalone
Manufacturing manager: Alexis R. Heydt
Acquisitions editor: Mike Meehan
Marketing Manager: Stephen Soloman
Editorial assistant: Kate Hargett
Cover design: Design Source
Cover design director: Jerry Votta

© 1997 by Prentice Hall PTR
Prentice-Hall, Inc.
A Simon & Schuster Company
Upper Saddle River, New Jersey 07458

The publisher offers discounts on this book when ordered in bulk quantities. For more information, contact:

Corporate Sales Department
Prentice Hall PTR
1 Lake Street
Upper Saddle River, NJ 07458
Phone: 800-382-3419, Fax: 201-236-7141
E-mail: corpsales@prenhall.com

All product names mentioned herein are the trademarks of their respective owners.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
10 9 8 7 6 5 4

ISBN 0-13-184474-1

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro

Contents

PREFACE
ACKNOWLEDGMENTS

CHAPTER 1 INTRODUCTION TO WINDOWS NT DRIVERS
1.1 OVERALL SYSTEM ARCHITECTURE
    Design Goals for Windows NT
    Hardware Privilege Levels in Windows NT
    Base Operating System Components
    What's in the Executive
    Extensions to the Base Operating System
    More about the Win32 Subsystem
1.2 KERNEL-MODE I/O COMPONENTS
    Design Goals for the I/O Subsystem
    Layered Drivers in Windows NT
    SCSI Drivers
    Network Drivers
1.3 SPECIAL DRIVER ARCHITECTURES
    Video Drivers
    Printer Drivers
    Multimedia Drivers
    Drivers for Legacy 16-bit Applications
1.4 SUMMARY

CHAPTER 2 THE HARDWARE ENVIRONMENT
2.1 HARDWARE BASICS
    Device Registers
    Accessing Device Registers
    Device Interrupts
    Data Transfer Mechanisms
    Direct Memory Access (DMA) Mechanisms
    Device-Dedicated Memory
    Requirements for Autoconfiguration
2.2 BUSES AND WINDOWS NT
    ISA - The Industry Standard Architecture
    MCA - The Micro Channel Architecture
    EISA - The Extended Industry Standard Architecture
    PCI - The Peripheral Component Interconnect
2.3 HINTS FOR WORKING WITH HARDWARE
    Learn about the Hardware
    Make Use of Hardware Intelligence
    Test the Hardware
2.4 SUMMARY

CHAPTER 3 KERNEL-MODE I/O PROCESSING
3.1 HOW KERNEL-MODE CODE EXECUTES
    Exceptions
    Interrupts
    Kernel-Mode Threads
3.2 USE OF INTERRUPTS BY NT
    CPU Priority Levels
    Interrupt Processing Sequence
    Software-Generated Interrupts
3.3 DEFERRED PROCEDURE CALLS (DPCs)
    Operation of a DPC
    Behavior of DPCs
3.4 ACCESS TO USER BUFFERS
    Buffer-Access Mechanisms
3.5 STRUCTURE OF A KERNEL-MODE DRIVER
    Driver Initialization and Cleanup Routines
    I/O System Service Dispatch Routines
    Data Transfer Routines
    Resource Synchronization Callbacks
    Other Driver Routines
3.6 I/O PROCESSING SEQUENCE
    Request Preprocessing by NT
    Request Preprocessing by the Driver
    Data Transfer
    Postprocessing by the Driver
    Postprocessing by the I/O Manager
3.7 SUMMARY

CHAPTER 4 DRIVERS AND KERNEL-MODE OBJECTS
4.1 DATA OBJECTS AND WINDOWS NT
    Windows NT and OOP
    NT Objects and Win32 Objects
4.2 I/O REQUEST PACKETS (IRPs)
    Layout of an IRP
    Manipulating IRPs
4.3 DRIVER OBJECTS
    Layout of a Driver Object
4.4 DEVICE OBJECTS AND DEVICE EXTENSIONS
    Layout of a Device Object
    Manipulating Device Objects
    Device Extensions
4.5 CONTROLLER OBJECTS AND CONTROLLER EXTENSIONS
    Layout of a Controller Object
    Manipulating Controller Objects
    Controller Extensions
4.6 ADAPTER OBJECTS
    Layout of an Adapter Object
    Manipulating Adapter Objects
4.7 INTERRUPT OBJECTS
    Layout of an Interrupt Object
    Manipulating Interrupt Objects
4.8 SUMMARY

CHAPTER 5 GENERAL DEVELOPMENT ISSUES
5.1 DRIVER DESIGN STRATEGIES
    Use Formal Design Models
    Use Incremental Development
    Use the Sample Drivers
5.2 CODING CONVENTIONS AND TECHNIQUES
    General Recommendations
    Naming Conventions
    Header Files
    Status Return Values
    NT Driver Support Routines
    Discarding Initialization Routines
    Controlling Driver Paging
5.3 DRIVER MEMORY ALLOCATION
    Memory Available to Drivers
    Working with the Kernel Stack
    Working with the Pool Areas
    System Support for Memory Suballocation
5.4 UNICODE STRINGS
    Unicode String Datatypes
    Working with Unicode
5.5 INTERRUPT SYNCHRONIZATION
    The Problem
    Interrupt Blocking
    Rules for Blocking Interrupts
    Synchronization Using Deferred Procedure Calls
5.6 SYNCHRONIZING MULTIPLE CPUs
    How Spin Locks Work
    Using Spin Locks
    Rules for Using Spin Locks
5.7 LINKED LISTS
    Singly-Linked Lists
    Doubly-Linked Lists
    Removing Blocks from a List
5.8 SUMMARY

CHAPTER 6 INITIALIZATION AND CLEANUP ROUTINES
6.1 WRITING A DRIVERENTRY ROUTINE
    Execution Context
    What a DriverEntry Routine Does
    Initializing DriverEntry Points
    Creating Device Objects
    Choosing a Buffering Strategy
    NT and Win32 Device Names
6.2 CODE EXAMPLE: DRIVER INITIALIZATION
    INIT.C
6.3 WRITING REINITIALIZE ROUTINES
    Execution Context
    What a Reinitialize Routine Does
6.4 WRITING AN UNLOAD ROUTINE
    Execution Context
    What an Unload Routine Does
6.5 CODE EXAMPLE: DRIVER CLEANUP
    UNLOAD.C
6.6 WRITING SHUTDOWN ROUTINES
    Execution Context
    What a Shutdown Routine Does
    Enabling Shutdown Notification
6.7 TESTING THE DRIVER
    Testing Procedure
    The WINOBJ Utility
6.8 SUMMARY

CHAPTER 7 HARDWARE INITIALIZATION
7.1 FINDING AUTO-DETECTED HARDWARE
    How Auto-Detection Works
    Auto-Detected Hardware and the Registry
    Querying the Hardware Database
    What a ConfigCallback Routine Does
    Using Configuration Data
    Translating Configuration Data
7.2 CODE EXAMPLE: LOCATING AUTO-DETECTED HARDWARE
    AUTOCON.C
7.3 FINDING UNRECOGNIZED HARDWARE
    Adding Driver Parameters to the Registry
    Retrieving Parameters from the Registry
    Other Sources of Device Information
7.4 CODE EXAMPLE: QUERYING THE REGISTRY
    REGCON.C
7.5 ALLOCATING AND RELEASING HARDWARE
    How Resource Allocation Works
    How to Claim Hardware Resources
    How to Release Hardware
    Mapping Device Memory
    Loading Device Microcode
7.6 CODE EXAMPLE: ALLOCATING HARDWARE
    RESALLOC.C
7.7 SUMMARY

CHAPTER 8 DRIVER DISPATCH ROUTINES
8.1 ENABLING DRIVER DISPATCH ROUTINES
    I/O Request Dispatching Mechanism
    Enabling Specific Function Codes
    Deciding Which Function Codes to Support
8.2 EXTENDING THE DISPATCH INTERFACE
    Defining Private IOCTL Values
    IOCTL Argument-Passing Methods
    Writing IOCTL Header Files
8.3 WRITING DRIVER DISPATCH ROUTINES
    Execution Context
    What Dispatch Routines Do
    Exiting the Dispatch Routine
8.4 PROCESSING SPECIFIC KINDS OF REQUESTS
    Processing Read and Write Requests
    Processing IOCTL Requests
    Managing IOCTL Buffers
8.5 TESTING DRIVER DISPATCH ROUTINES
    Testing Procedure
    Sample Test Program
8.6 SUMMARY

CHAPTER 9 PROGRAMMED I/O DATA TRANSFERS
9.1 HOW PROGRAMMED I/O WORKS
    What Happens during Programmed I/O
    Synchronizing Various Driver Routines
9.2 DRIVER INITIALIZATION AND CLEANUP
    Initializing the Start I/O Entry Point
    Initializing a DpcForIsr Routine
    Connecting to an Interrupt Source
    Disconnecting from an Interrupt Source
9.3 WRITING A START I/O ROUTINE
    Execution Context
    What the Start I/O Routine Does
9.4 WRITING AN INTERRUPT SERVICE ROUTINE (ISR)
    Execution Context
    What the Interrupt Service Routine Does
9.5 WRITING A DPCFORISR ROUTINE
    Execution Context
    What the DpcForIsr Routine Does
    Priority Increments
9.6 SOME HARDWARE: THE PARALLEL PORT
    How the Parallel Port Works
    Device Registers
    Interrupt Behavior
    A Driver for the Parallel Port
9.7 CODE EXAMPLE: PARALLEL PORT DRIVER
    XXDRIVER.H
    INIT.C
    TRANSFER.C
9.8 TESTING THE DATA TRANSFER ROUTINES
    Testing Procedure
9.9 SUMMARY

CHAPTER 10 TIMERS
10.1 HANDLING DEVICE TIMEOUTS
    How I/O Timer Routines Work
    How to Catch Device Timeout Conditions
10.2 CODE EXAMPLE: CATCHING DEVICE TIMEOUTS
    XXDRIVER.H
    INIT.C
    TRANSFER.C
    TIMER.C
10.3 MANAGING DEVICES WITHOUT INTERRUPTS
    Working with Noninterrupting Devices
    How CustomTimerDpc Routines Work
    How to Set Up a CustomTimerDpc Routine
    How to Specify Expiration Times
    Other Uses for CustomTimerDpc Routines
10.4 CODE EXAMPLE: A TIMER-BASED DRIVER
  XXDRIVER.H
  INIT.C
  TRANSFER.C
10.5 SUMMARY
CHAPTER 11 FULL-DUPLEX DRIVERS
11.1 DOING TWO THINGS AT ONCE
  Do You Need to Process Concurrent IRPs?
  How the Modified Driver Architecture Works
  Data Structures for a Full-Duplex Driver
  Implementing the Alternate Path
11.2 USING DEVICE QUEUE OBJECTS
  How Device Queue Objects Work
  How to Use Device Queue Objects
11.3 WRITING CUSTOMDPC ROUTINES
  How to Use a CustomDpc Routine
  Execution Context
11.4 CANCELING I/O REQUESTS
  How IRP Cancellation Works
  Synchronization Issues
  What a Cancel Routine Does
  What a Dispatch Cleanup Routine Does
11.5 SOME MORE HARDWARE: THE 16550 UART
  What the 16550 UART Does
  Device Registers
  Interrupt Behavior
11.6 CODE EXAMPLE: FULL-DUPLEX UART DRIVER
  What to Expect
  DEVICE_EXTENSION in XXDRIVER.H
  DISPATCH.C
  DEVQUEUE.C
  INPUT.C
  ISR.C
  CANCEL.C
11.7 SUMMARY
CHAPTER 12 DMA DRIVERS
12.1 HOW DMA WORKS UNDER WINDOWS NT
  Hiding DMA Hardware Variations with Adapter Objects
  Solving the Scatter/Gather Problem with Mapping Registers
  Managing I/O Buffers with Memory Descriptor Lists
  Maintaining Cache Coherency
  Categorizing DMA Drivers
  Limitations of the NT DMA Architecture
12.2 WORKING WITH ADAPTER OBJECTS
  Finding the Right Adapter Object
  Acquiring and Releasing the Adapter Object
  Setting Up the DMA Hardware
  Flushing the Adapter Object Cache
12.3 WRITING A PACKET-BASED SLAVE DMA DRIVER
  How Packet-Based Slave DMA Works
  Splitting DMA Transfers
12.4 CODE EXAMPLE: A PACKET-BASED SLAVE DMA DRIVER
  XXDRIVER.H
  REGCON.C
  TRANSFER.C
12.5 WRITING A PACKET-BASED BUS MASTER DMA DRIVER
  Setting Up Bus Master Hardware
  Hardware with Scatter/Gather Support
  Building Scatter/Gather Lists with IoMapTransfer
12.6 WRITING A COMMON BUFFER SLAVE DMA DRIVER
  Allocating a Common Buffer
  Using Common Buffer Slave DMA to Maintain Throughput
12.7 WRITING A COMMON BUFFER BUS MASTER DMA DRIVER
  How Common-Buffer Bus Master DMA Works
12.8 SUMMARY
CHAPTER 13 LOGGING DEVICE ERRORS
13.1 EVENT-LOGGING IN WINDOWS NT
  Deciding What to Log
  How Event Logging Works
13.2 WORKING WITH MESSAGES
  How Message Codes Work
  Writing Message Definition Files
  A Small Example: XXMSG.MC
  Compiling a Message Definition File
  Adding Message Resources to a Driver
  Registering a Driver as an Event Source
13.3 GENERATING LOG ENTRIES
  Preparing a Driver for Error Logging
  Allocating an Error-Log Packet
  Logging the Error
13.4 CODE EXAMPLE: AN ERROR-LOGGING ROUTINE
  EVENTLOG.C
13.5 SUMMARY
CHAPTER 14 SYSTEM THREADS
14.1 SYSTEM THREADS
  When to Use Threads
  Creating and Terminating System Threads
  Managing Thread Priority
  System Worker Threads
14.2 THREAD SYNCHRONIZATION
  Time Synchronization
  General Synchronization
14.3 USING DISPATCHER OBJECTS
  Event Objects
  Sharing Events between Drivers
  Mutex Objects
  Semaphore Objects
  Timer Objects
  Thread Objects
  Variations on the Mutex
  Synchronization Deadlocks
14.4 CODE EXAMPLE: A THREAD-BASED DRIVER
  How the Driver Works
  The DEVICE_EXTENSION Structure in XXDRIVER.H
  The XxCreateDevice Function in INIT.C
  The XxDispatchReadWrite Function in DISPATCH.C
  THREAD.C
  TRANSFER.C
14.5 SUMMARY
CHAPTER 15 HIGHER-LEVEL DRIVERS
15.1 AN OVERVIEW OF INTERMEDIATE DRIVERS
  What Are Intermediate Drivers?
  Should You Use a Layered Architecture?
15.2 WRITING LAYERED DRIVERS
  How Layered Drivers Work
  Initialization and Cleanup in Layered Drivers
  Code Fragment: Connecting to Another Driver
  Other Initialization Concerns for Layered Drivers
  I/O Request Processing in Layered Drivers
  Code Fragment: Calling a Lower-Level Driver
15.3 WRITING I/O COMPLETION ROUTINES
  Requesting an I/O Completion Callback
  Execution Context
  What I/O Completion Routines Do
  Code Fragment: An I/O Completion Routine
15.4 ALLOCATING ADDITIONAL IRPs
  The IRP's I/O Stack Revisited
  Controlling the Size of the IRP Stack
  Creating IRPs with IoBuildSynchronousFsdRequest
  Creating IRPs with IoBuildAsynchronousFsdRequest
  Creating IRPs with IoBuildDeviceIoControlRequest
  Creating IRPs from Scratch
  Setting Up Buffers for Lower Drivers
  Keeping Track of Driver-Allocated IRPs
15.5 WRITING FILTER DRIVERS
  How Filter Drivers Work
  Initialization and Cleanup in Filter Drivers
  What Happens behind the Scenes
  Making the Attachment Transparent
15.6 CODE EXAMPLE: A FILTER DRIVER
  YYDRIVER.H - Driver Data Structures
  INIT.C - Initialization Code
  DISPATCH.C - Filter Dispatch Routines
  COMPLETE.C - I/O Completion Routines
15.7 WRITING TIGHTLY COUPLED DRIVERS
  How Tightly Coupled Drivers Work
  Initialization and Cleanup in Tightly Coupled Drivers
  I/O Request Processing in Tightly Coupled Drivers
15.8 SUMMARY
CHAPTER 16 BUILDING AND INSTALLING DRIVERS
16.1 BUILDING DRIVERS
  What BUILD Does
  How to Build a Driver
  Writing a SOURCES File
  Log Files Generated by BUILD
  Recursive BUILD Operations
16.2 MISCELLANEOUS BUILD-TIME ACTIVITIES
  Using Precompiled Headers
  Including Version Information in a Driver
  Including Nonstandard Components in a BUILD
  Moving Driver Symbol Data into .DBG Files
16.3 INSTALLING DRIVERS
  How to Install a Driver by Hand
  Driver Registry Entries
  End-User Installation of Standard Drivers
  End-User Installation of Nonstandard Drivers
16.4 CONTROLLING DRIVER LOAD SEQUENCE
  Changing the Driver's Start Value
  Creating Explicit Dependencies between Drivers
  Establishing Global Group Dependencies
  Controlling Load Sequence within a Group
16.5 SUMMARY
CHAPTER 17 TESTING AND DEBUGGING DRIVERS
17.1 SOME GUIDELINES FOR DRIVER TESTING
  The General Approach to Testing Drivers
  Using the Microsoft Hardware Compatibility Tests (HCTs)
17.2 SOME THOUGHTS ABOUT DRIVER BUGS
  Categories of Driver Errors
  Reproducing Driver Errors
  Coding Strategies That Reduce Debugging
  Keeping Track of Driver Bugs
17.3 READING CRASH SCREENS
  What Happens When the System Crashes
  Layout of a STOP Message
  Deciphering STOP Messages
17.4 AN OVERVIEW OF WINDBG
  The Key to Source-Code Debugging
  A Few WINDBG Commands
17.5 ANALYZING A CRASH DUMP
  Goals of the Analysis
  Starting the Analysis
  Tracing the Stack
  Indirect Methods of Investigation
  Analyzing Crashes with DUMPEXAM
17.6 INTERACTIVE DEBUGGING
  Starting and Stopping a Debug Session
  Setting Breakpoints
  Setting Hard Breakpoints
17.7 WRITING WINDBG EXTENSIONS
  How WINDBG Extensions Work
  Initialization and Version-Checking Functions
  Writing Extension Commands
  WINDBG Helper Functions
  Building and Using an Extension DLL
17.8 CODE EXAMPLE: A WINDBG EXTENSION
  XXDBG.C
  XXDBG.DEF
  SOURCES file
  Sample Output
17.9 MISCELLANEOUS DEBUGGING TECHNIQUES
  Leaving Debug Code in the Driver
  Catching Incorrect Assumptions
  Using BugCheck Callbacks
  Catching Memory Leaks
  Using Counters, Bits, and Buffers
17.10 SUMMARY
CHAPTER 18 DRIVER PERFORMANCE
18.1 GENERAL GUIDELINES
  Know Where You're Going
  Get to Know the Hardware
  Explore Creative Driver Designs
  Optimize Code Creatively
  Measure Everything You Do
18.2 PERFORMANCE MONITORING IN WINDOWS NT
  Some Terminology
  How Performance Monitoring Works
  How Drivers Export Performance Data
18.3 ADDING COUNTER NAMES TO THE REGISTRY
  Counter Definitions in the Registry
  Writing LODCTR Command Files
  Using LODCTR and UNLODCTR
18.4 THE FORMAT OF PERFORMANCE DATA
  Overall Structure of Performance Data
  Types of Counters
  Objects with Multiple Instances
18.5 WRITING THE DATA-COLLECTION DLL
  Contents of the Data-Collection DLL
  Error Handling in a Data-Collection DLL
  Installing the DLL
18.6 CODE EXAMPLE: A DATA-COLLECTION DLL
  XXPERF.C
  Building and Installing this Example
18.7 SUMMARY
APPENDIX A THE DEVELOPMENT ENVIRONMENT
A.1 HARDWARE AND SOFTWARE REQUIREMENTS
  Connecting the Host and Target
A.2 DEBUG SYMBOL FILES
A.3 ENABLING CRASH DUMPS ON THE TARGET SYSTEM
  If You Don't Get Any Crash Dump Files
A.4 ENABLING THE TARGET SYSTEM'S DEBUG CLIENT
APPENDIX B COMMON BUGCHECK CODES
B.1 GENERAL PROBLEMS WITH DRIVERS
B.2 SYNCHRONIZATION PROBLEMS
B.3 CORRUPTED DRIVER DATA STRUCTURES
B.4 MEMORY PROBLEMS
B.5 HARDWARE FAILURES
B.6 CONFIGURATION MANAGER AND REGISTRY PROBLEMS
B.7 FILE SYSTEM PROBLEMS
B.8 SYSTEM INITIALIZATION FAILURES
B.9 INTERNAL SYSTEM FAILURES
BIBLIOGRAPHY
ABOUT THE AUTHOR
INDEX

Preface

In case you haven't guessed, this book explains how to write, install, and debug kernel-mode device drivers for Windows NT. If you're in the process of designing or coding an NT driver, or if you're porting an existing driver from some other operating system, this book is a valuable companion to the Microsoft DDK documentation.

This book might also have something to say to you if you just need a little more insight into the workings of Windows NT, particularly the I/O subsystem. Perhaps you're trying to decide if NT is a reasonable platform for some specific purpose. Or you may be studying operating systems, and you want to see how theory gets applied in the real world.

And of course, we mustn't discount the power of morbid curiosity. The same fascination that forces us to slow down as we drive past a car accident can also motivate us to pull a volume off the bookstore shelf.

What You Should Already Know

Throughout this book, I make several assumptions about what you already know. First of all, you need to have all the basic Windows NT user skills such as logging in and running various utilities. Since driver installation requires you to have administrator-level privileges, you can trash things pretty badly if you don't know how to use the system.

Second, you'll need decent C-language programming skills. I've tried to avoid the use of "cleverness" in my code examples, but you still have to be able to read them.

Next, some experience with Win32 user-mode programming is helpful, but it isn't really required.
If you haven't worked with the Win32 API, you might want to browse through volume two of the Win32 Programmers Reference. This is the one that describes system services. Take a look at the chapters on the I/O primitives (CreateFile, ReadFile, WriteFile, and DeviceIoControl) and the thread model. See the bibliography for other books on Win32 programming.

Finally, you need to understand something about hardware in order to write drivers. It would be helpful if you already had some experience working with hardware, but if not, Chapter 2 will give you a basic introduction. Again, the bibliography will point you toward other, more-detailed sources for this kind of information.

What You'll Find Here

One of the most difficult choices any author has to make is deciding what to write about and what to leave out. In general, I've attempted to focus on core issues that are crucial to kernel-mode driver development. I've also tried to provide enough background information so that you'll be able to read the sample code supplied with the NT DDK, and make intelligent design choices for your own drivers.

The overall flow of the book goes from the theoretical to the practical, with earlier chapters providing the underpinnings for later topics. Here's what's covered:

Chapters 1-5
The first part of this book provides the basic foundation you'll need if you plan to write drivers. This includes a general examination of the Windows NT driver architecture, a little bit about hardware, and a rather detailed look at the NT I/O Manager and its data structures. This group of topics ends with some general kernel-mode coding guidelines and techniques.

Chapters 6-13
These eight chapters form the nucleus of the book and present all the details of writing kernel-mode NT device drivers. You'll also find discussions here of full-duplex driver architectures, handling timeout conditions, and logging device errors.
Unless you're already familiar with NT's driver architecture, you should probably read these chapters in order.

Chapters 14 and 15
The next two chapters deal with alternative driver architectures supported by Windows NT. This includes the use of kernel-mode threads in drivers and higher-level drivers.

Chapters 16-18
The final part of the book deals with various practical details of writing NT drivers. Chapter 16 takes a look at all the things your mother never told you about the BUILD utility. Chapter 17 covers various aspects of testing and debugging drivers, including how to analyze crash dumps and how to really get WINDBG to work. If you're actually writing a driver while you read this book, you may want to read these chapters out of order. Chapter 18 examines the crucial issue of driver performance and how to tie your driver into NT's performance monitoring mechanisms.

Appendices
The appendices cover various topics that people in my classes have asked about. The first one deals with the mechanics of setting up a driver development environment. The second appendix contains a list of the bugcheck codes you're most likely to encounter, along with descriptions of their various parameters. Used in conjunction with the material in Chapter 17, this may help you track down the cause of a blue screen or two.

What You Won't Find

I excluded topics from this book for several reasons. Some subjects were just too large to cover. Others addressed the needs of too small a segment of the driver-writing community. Finally, some areas of driver development are simply unsupported by Microsoft. Specifically, you won't find anything here about the following items:

File system drivers
At the time this book went to press, Microsoft still hadn't released any kind of developer's kit for NT file system drivers. In fact, there seemed to be a great deal of resistance to the idea within Microsoft.
Until this situation changes, there's not much point in talking about the architecture of file system drivers.

Net-card and network protocol drivers
NDIS and TDI drivers are both very large topics - large enough to fill a book of their own. Unfortunately, there just wasn't enough room for all of it here. I can offer one bit of consolation: The material in this book will give you much of the background you need in order to understand what's happening inside the NDIS/TDI framework.

SCSI miniport and class drivers
Although SCSI HBA miniport drivers are vital system components, the number of people actually writing them is (I suspect) rather small. Consequently, the only reference to SCSI miniports is the overview material in Chapter 1.

I would have liked to include a discussion of SCSI class drivers in this book, but unfortunately there just wasn't any time to write it. The material on developing intermediate drivers in Chapter 15 will give you much of the necessary background. From there, take a look at the sample SCSI class driver for CD-ROMs that comes with the NT DDK.

Video, display, and printer drivers
This is another area where I had to make a tradeoff between the number of people writing these kinds of drivers and the time available to finish the book. Unfortunately, graphics drivers for video and hardcopy devices didn't make the cut this time. Perhaps in a later, expanded version of the book...

Virtual DOS device drivers
In my opinion, the best way to run 16-bit MS-DOS and Windows applications under Windows NT is to port the source code to Win32. In any event, the Microsoft documentation does a decent job describing the mechanics of writing VDDs so I haven't included anything about them here.

About the Sample Code

There's a great deal of sample driver code scattered throughout this book. You'll find all of it on the accompanying floppy disk.
I've created separate directories on the floppy for each chapter, and where appropriate, subdirectories for each component or driver in the chapter.

Coding style
Since the purpose of this book is instruction, I've done a couple of things to improve the clarity of the samples. First, I've adopted a coding style that avoids smart tricks. Some of the examples could probably have been written in fewer lines of code, but I don't think they would have been as easy to understand.

Also in the name of clarity, I've eliminated everything except the bare essentials from each sample. For example, most of the drivers don't contain any error-logging or debugging code, although a real driver ought to include these things. These topics have their own chapters, and you shouldn't have too much trouble back-fitting the code into other sample drivers.

Naming conventions
You'll notice that almost all the sample drivers appearing in this book are called "XXDRIVER." (The only exception is the higher-level driver in Chapter 15. Its name is "YYDRIVER.") This makes it somewhat easier to interchange the parts of different samples. It also reduces the amount of clutter that you'll be adding to the Registry while you're playing with these drivers.

Within any particular driver, I've also adopted the convention of adding the prefix Xx to the names of any driver-defined functions. Similarly, device registers, driver structures, and constants are also prefixed with XX_. This makes it easy to see which things you have to write and which ones come from the folks at Microsoft.

Platform dependencies
It's worth mentioning that these samples have been targeted to run on Intel 80x86 platforms. In particular, the drivers all assume that device registers live in I/O space rather than being memory-mapped. This is relatively easy to fix with a little bit of coding and some modifications to each driver's hardware-specific header file.
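
As a rough illustration of the naming convention and the I/O-space assumption just described, here is a minimal sketch of how a hardware-specific header might isolate register access. It is not taken from the book's floppy disk. The register offsets, the XX_MEMORY_MAPPED switch, and every Xx function name are hypothetical, and the arrays are stand-ins for real hardware so the code can run as an ordinary program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout for an imaginary device (XX_ prefix). */
#define XX_REG_STATUS    0u     /* offset of the status register          */
#define XX_REG_DATA      1u     /* offset of the data register            */
#define XX_REG_COUNT     2u     /* number of simulated registers          */
#define XX_STATUS_READY  0x01u  /* "device ready" bit in the status reg   */

/*
 * Assumption: the choice between I/O space and memory-mapped registers
 * is made in one place. A real NT driver would call HAL routines such as
 * READ_PORT_UCHAR or READ_REGISTER_UCHAR here instead of touching an array.
 */
#ifdef XX_MEMORY_MAPPED
static volatile uint8_t XxRegisterWindow[XX_REG_COUNT]; /* simulated mapped memory */
#define XX_REGISTER(offset) (XxRegisterWindow[(offset)])
#else
static volatile uint8_t XxPortSpace[XX_REG_COUNT];      /* simulated I/O ports */
#define XX_REGISTER(offset) (XxPortSpace[(offset)])
#endif

/* Read one simulated device register. */
uint8_t XxReadRegister(unsigned offset)
{
    assert(offset < XX_REG_COUNT);
    return XX_REGISTER(offset);
}

/* Write one simulated device register. */
void XxWriteRegister(unsigned offset, uint8_t value)
{
    assert(offset < XX_REG_COUNT);
    XX_REGISTER(offset) = value;
}

/* Returns 1 when the ready bit is set, 0 otherwise. */
int XxDeviceIsReady(void)
{
    return (XxReadRegister(XX_REG_STATUS) & XX_STATUS_READY) != 0;
}

/* Writes one byte if the device is ready; returns 1 on success, 0 if busy. */
int XxWriteByte(uint8_t value)
{
    if (!XxDeviceIsReady())
        return 0;
    XxWriteRegister(XX_REG_DATA, value);
    return 1;
}
```

Because the rest of the code goes through XxReadRegister and XxWriteRegister, switching the sketch to the other register model only means defining XX_MEMORY_MAPPED in the header.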
To build and run the examples
You'll need several tools if you plan to do any driver development for Windows NT. First, get yourself a Level II subscription to the Microsoft Developer Network CDs. This is the only source for the NT DDK and the Win32 SDK.

You'll also need a C compiler. I've chosen to use the Microsoft compiler for developing and testing all the code in this book. Your mileage may vary if you're using some other vendor's tools. See Appendix A for more information on setting up your driver development environment.

Training and Consulting Services

The material in this book is based on classes that I've been delivering for several years through Cydonix Corporation - a training and consulting firm whose goal is to help its clients develop device drivers and other high-performance Windows NT software. Cydonix offers services that range from formal classroom training to direct participation in software design and coding.

For the past three years, Cydonix has been helping companies like Adaptec, AT&T, Compaq Computers, Hewlett-Packard, and Intel to learn more about the workings of Windows NT. We have training available in a number of areas including:

• Windows NT device driver programming
• Win32 system service programming
• Advanced server development techniques

Cydonix offers both onsite training at customer facilities and open enrollment classes that are available to the general public. The public classes are hosted by training vendors in several geographic areas.

For more information about training and consulting from Cydonix Corporation, visit our Web site at http://www.cydonix.com or send email to info@cydonix.com. You can also contact us through more earthbound means using this postal address:

Cydonix Corporation
Suite 304
2117 L Street, N.W.
Washington, DC 20037

Acknowledgments

Many people have kindly contributed to the creation of this volume.
First and foremost, I want to thank David Lucas (to whom this book is dedicated) for his steadfast friendship and unfaltering faith in me over the years. David, so many things have been possible in my life only because of you...

My gratitude also goes to the editorial and production staff at Prentice Hall. Mike Meehan and Joanne Anzalone have shown infinite patience while I tried to balance my training and consulting schedule with the demands of writing a book. I'm sure you're glad it's over.

I would be remiss if I didn't acknowledge all the people who've been students in my various driver classes over the last twelve years. Your questions and insights have helped me understand how to communicate this kind of material to others, and I'm grateful.

Finally, I'm very pleased to say that all crash sequences were performed by stunt doubles and no programmers or other small animals were actually harmed.

CHAPTER 1
Introduction to Windows NT Drivers

Tradition demands that any book about writing device drivers starts out by answering the question, "What is a driver?" Unfortunately, asking this question in Windows NT is a little like asking "What color is plaid?" because there are at least a dozen different software components that can rightfully be called drivers. This chapter takes a roundabout look at the different kinds of drivers supported by Windows NT, and along the way, presents some of the design philosophy that makes this operating system such an intriguing beast.

1.1 OVERALL SYSTEM ARCHITECTURE

Windows NT drivers don't live in isolation, of course. Rather, they are just one part of a large and complex operating system. This section takes you on a quick tour of the Windows NT architecture and points out those features that will be of most interest to driver writers.

Design Goals for Windows NT

Like every other commercial operating system, Windows NT is the result of a complex interaction between idealized goals and market-driven realities.
The Windows NT design team set their sights on the following:

• Compatibility - The operating system should support a wide range of existing software and legacy hardware.
• Robustness and reliability - The operating system has to resist the attacks of naive or malicious users, and individual applications should be as isolated from one another as possible.
• Portability - The operating system should be able to run on a wide variety of current and future hardware platforms.
• Extendibility - It should be possible to add new features and support new I/O devices without perturbing the existing code base.
• Performance - The operating system should be able to give reasonable performance on commonly available hardware. It should also be able to take advantage of features like multiprocessing hardware.

Trying to balance all these goals with a reasonable time to market was a complex process. The rest of this section describes the solution that the system designers came up with - beginning with a look at the protection mechanisms that keep the operating system safe.

Hardware Privilege Levels in Windows NT

There are any number of things that application programs shouldn't be allowed to do in a multitasking environment. Fooling with the memory management hardware or halting the processor are just two examples of actions that would cause serious problems. Rather than depending on the kindness of strange applications, Windows NT takes advantage of hardware-enforced privilege checking mechanisms to guarantee system integrity.

To avoid hardware dependencies, Windows NT uses a simplified model to describe hardware privileges. This model then maps onto whatever privilege checking mechanisms are available on a given CPU. A CPU must be able to operate in two modes if it's going to support the Windows NT hardware privilege model.

Kernel mode
Anything goes when the CPU runs in kernel mode.
A task can execute privileged instructions, and it has complete access to any I/O devices. It can also touch any virtual address and fiddle with the virtual memory hardware. This mode corresponds to Ring 0 on an Intel 80x86.

User mode
In this mode, the hardware prevents execution of privileged instructions and performs access checks on references to memory and I/O space. This allows the operating system to restrict a task's access to various I/O operations, and trap any other behavior that might violate system integrity. Code running in user mode can't get itself into kernel mode without going through some kind of gate mechanism in the operating system. On an Intel 80x86 processor, this mode corresponds to Ring 3.

Base Operating System Components

The base components of Windows NT implement a general operating system platform on which to build more complex environments. As you can see from Figure 1.1, these base components consist of three major blocks of kernel-mode code.

[Figure 1.1 Overall architecture of the NT kernel-mode components. Layers shown: NT Executive, Kernel, Hardware Abstraction Layer (HAL), Hardware Platform. Copyright © 1994 by Cydonix Corporation.]

Hardware Abstraction Layer (HAL)
The HAL is a thin layer of software that presents the rest of the system with an abstract model of any hardware that's not part of the CPU itself. The HAL exposes a well-defined set of functions that manage such items as:

• Off-chip caches
• Timers
• I/O buses
• Device registers
• Interrupt controllers
• DMA controllers

Various system components use these HAL functions to interact with off-CPU hardware. This essentially hides platform-specific details from the rest of the system and removes the need to have different versions of the operating system for platforms from different system vendors. In particular, the use of HAL routines makes the Kernel and device drivers binary-compatible across platforms with the same CPU architecture.
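
The idea of hiding platform-specific details behind a fixed set of functions can be sketched in a few lines of C. This is only an illustration under stated assumptions: the HalInterface structure, both "platforms," and their timer rates are invented for the example, and the function table is used simply to keep two hypothetical platforms in one file. It does not describe how the real HAL is built or what it exports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical HAL-style interface: the only timer calls portable code may use. */
typedef struct HalInterface {
    uint32_t (*ticks_per_second)(void);  /* timer frequency on this platform */
    uint64_t (*read_tick_count)(void);   /* current value of the tick counter */
} HalInterface;

/* Imaginary platform A: a 100 Hz timer, simulated by a plain variable. */
static uint64_t platform_a_ticks;
static uint32_t platform_a_rate(void) { return 100u; }
static uint64_t platform_a_read(void) { return platform_a_ticks; }
static const HalInterface hal_platform_a = { platform_a_rate, platform_a_read };

/* Imaginary platform B: a 1024 Hz timer, simulated the same way. */
static uint64_t platform_b_ticks;
static uint32_t platform_b_rate(void) { return 1024u; }
static uint64_t platform_b_read(void) { return platform_b_ticks; }
static const HalInterface hal_platform_b = { platform_b_rate, platform_b_read };

/*
 * Portable code: converts the ticks elapsed since start_ticks into
 * milliseconds. It never needs to know which platform it is running on.
 */
uint64_t elapsed_milliseconds(const HalInterface *hal, uint64_t start_ticks)
{
    assert(hal != 0);
    uint64_t delta = hal->read_tick_count() - start_ticks;
    return (delta * 1000u) / hal->ticks_per_second();
}
```

The point of the sketch is that elapsed_milliseconds compiles once and works unchanged against either table, which is the same property the text attributes to drivers that go through HAL routines.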
Kernel
Where the HAL is an abstraction of the platform, the Kernel presents an idealized view of the CPU itself. Among other things, the Kernel provides mechanisms for

• Interrupt and exception dispatching
• Thread scheduling and synchronization
• Multiprocessor synchronization
• Time keeping

By using these Kernel services, upper layers of the operating system can (for the most part) ignore the architecture of the underlying CPU. This makes it possible for drivers and higher-level operating system components to be source-code portable across different CPU architectures.

An interesting feature of the Kernel is that it presents an object-based interface to its clients. When other parts of the operating system need help from the Kernel, they request its services by calling functions that create and manipulate various kinds of objects. These Kernel objects fall into two main categories:

• Dispatcher objects - These are used primarily for managing and synchronizing threads.
• Control objects - These objects affect the behavior of the operating system itself in some way.

Device drivers don't have much use for dispatcher objects. Those that do are described in Chapter 14. Control objects are another matter, however. In particular, device drivers make frequent use of Deferred Procedure Call objects and Interrupt objects (described in Chapters 3 and 4 respectively).

Executive
The Executive is by far the largest and most complex kernel-mode component in Windows NT. Its job is to implement many of the basic functions normally associated with an operating system. Like the Kernel, the Executive uses the HAL to interact with any off-CPU hardware and so becomes binary-compatible across platforms from different system vendors. By relying on Kernel objects, the Executive gains the additional advantage of being source-code portable across different CPU architectures.
Because it's such a key part of Windows NT, it's worth exploring the Executive a little more.

Footnote 1: It also means that much of the work of porting Windows NT to a new CPU is really a matter of rewriting the Kernel. To make this process easier, Microsoft has adopted a microkernel approach that tries to keep the Kernel as small as possible.

What's in the Executive

As you can see from Figure 1.2, the Executive actually consists of several distinct software components that offer their services both to user-mode processes and to one another. These Executive components are completely independent and communicate only through well-defined interfaces. This modularity makes it possible to replace an existing Executive component without perturbing any other parts of the operating system. As long as the replacement exposes the same interface, the change will be transparent. The remainder of this subsection gives cursory descriptions of the various Executive modules.

[Figure 1.2 Detailed view of the Executive. Components shown: System Service Interface, Object Mgr, Process Mgr, Security Monitor, Config Mgr, I/O Mgr, Virtual Memory Mgr, Local Proc Call. Copyright © 1994 by Cydonix Corporation.]

System service interface
All operating systems have to give user-mode processes a limited ability to execute kernel-mode code. In particular, there must be a controlled path from user to kernel mode that applications can follow when they call system services. In Windows NT, the system service dispatcher uses a technique based on the CPU's hardware exception mechanism to give user-mode code access to Executive services.

Object Manager
The Executive offers its services to user-mode processes through an object-based interface. These Executive objects represent things such as files, processes, threads, and shared memory segments. This use of objects provides a unified mechanism for tracking resources and enforcing security.
The Object Manager does all the grunt work of managing these Executive objects. This includes creating and deleting objects, maintaining the global object namespace, and keeping track of how many outstanding references there are to any given object.

Configuration Manager
From a driver writer's perspective, the main job of the Configuration Manager is to maintain a model of all the hardware and software installed on the machine. It does this using a database called the Registry.

As you read through the rest of this book, you'll see that drivers are linked to the Registry through an intricate web of connections. Among other things, drivers use the Registry to

• Identify themselves as trusted system components
• Find and allocate peripheral hardware
• Set up error-logging message files
• Enable driver-performance measurement

Process Manager
A process is the unit of resource-tracking and security access checking in Windows NT. Along with any resources it might be holding, each process has its own virtual address space and security identity. A process also contains one or more executable entities called threads. It is the thread (and not the process) that receives ownership of a CPU and does actual work.

The Process Manager is the Executive component that handles the creation, management, and deletion of processes and threads. It also provides a standard set of services for synchronizing the activities of threads. Most of the features exposed by the Process Manager are just fancy versions of mechanisms implemented by the Kernel.

Security Reference Monitor
This Executive component enforces the system's security policies. The Security Reference Monitor doesn't actually define security policy; that job belongs to the Local Security Authority subsystem (described later in this chapter).
Rather, the Security Reference Monitor simply provides a set of primitives that both kernel- and user-mode components can call to validate access to objects, check for user privileges, and generate audit messages.

For the most part, device drivers don't concern themselves with security issues. Device drivers normally don't do much with the Security Reference Monitor. The I/O Manager handles those kinds of details before it calls any routines in your driver.

Virtual Memory Manager
Under Windows NT, each process has a flat 4-gigabyte virtual address space. The lower half of this space contains process-private code and data along with the process's stack and heap space. It also holds any File Mapping objects and DLLs the process is using. The upper half of every process's address space contains nothing but kernel-mode code. One of the jobs of the Executive's Virtual Memory Manager is to maintain this illusion of a huge address space using demand-paged virtual memory management techniques.

From a driver writer's point of view, the Virtual Memory Manager is more important as a memory allocator because it maintains the system heap areas. The Virtual Memory Manager also builds and manipulates various buffer descriptors that are crucial to the operation of DMA drivers. Both these topics are covered in more detail later.

Local Procedure Call facility
The Local Procedure Call (LPC) facility is a message-passing mechanism used for communication between processes on the same machine. LPCs are used primarily by protected subsystems (described later) and their clients. Device drivers have no access to the LPC facility.

I/O Manager
This Executive component converts I/O requests from user- and kernel-mode threads into properly sequenced calls to various driver routines. Through the use of a well-defined formal interface, the I/O Manager is able to communicate with all drivers the same way.
This makes it unnecessary for the I/O Manager to know anything about the underlying hardware managed by a given driver. The rest of this book describes the operation of the I/O Manager in gory detail.

Extensions to the Base Operating System

The Executive components of Windows NT present a fairly neutral face to the world. They don't implement a user interface nor do they define any external policies like security. They don't even offer a programming interface since the Executive's system service calls are not publicly documented. The base kernel-mode components simply provide a generic operating system platform.

Defining the look and feel of the operating system - both to users and programmers - is the job of some extended components known collectively as protected subsystems. Rather than dealing directly with the Executive, users and programmers of Windows NT interact with these subsystems.

In the original architecture of Windows NT, protected subsystems were implemented entirely as a group of privileged user-mode processes. This rather elegant design made it possible to extend the base operating system without risking any damage to the underlying kernel-mode components. For performance reasons, Windows NT 4.0 has moved away from this pure user-mode model and shifted some subsystem components into kernel mode.

Depending on the kind of work they do, all protected subsystems can be divided into two major categories. The following subsections describe each category in more detail.

Integral subsystems
An integral subsystem performs some necessary system function. The responsibilities of these subsystems actually cover quite a lot of territory. The following are just a few examples of what they do.

• Together with the Security Accounts Manager and the Logon process, the Local Security Authority defines security policy for the system.
• The Service Control Manager loads, supervises, and unloads trusted system components like services and drivers.
• The RPC Locator and RPC Service processes give support to distributed applications that use remote procedure calls.

Environment subsystems
The other kind of protected subsystem is called an environment subsystem. The job of an environment subsystem is to provide a programming interface and execution environment for application programs native to some specific operating system. Currently, Windows NT provides the following subsystems:

• The Win32 subsystem implements the native-mode programming interface for Windows NT. A more detailed description of this subsystem appears below.
• The Virtual DOS Machine (VDM) subsystem allows 16-bit MS-DOS applications to run under Windows NT. Unlike other subsystems, the VDM software is actually part of the process where the MS-DOS application is running.
• The Windows on Windows (WOW) subsystem supports the execution of 16-bit Windows applications. The default behavior of the WOW subsystem is to run all 16-bit Windows applications as separate threads within the address space of a single VDM process. This helps to mimic the 16-bit Windows environment more closely.
• The POSIX subsystem provides API support for programs conforming to the POSIX 1003.1 source-code standard. Because POSIX 1003.1 is not a binary standard, applications must be compiled and linked on Windows NT in order to run under this subsystem.
• The OS/2 subsystem creates an execution environment for 16-bit OS/2 applications. This subsystem is available only for the 80x86 version of Windows NT.

A given application is always tightly coupled to one specific subsystem and can use only the features of that subsystem. For example, a POSIX application can't make calls to Win32 API functions. Also keep in mind that applications running under any subsystem other than Win32 will experience some performance degradation. These other subsystems are provided mainly for compatibility.
More about the Win32 Subsystem

All environment subsystems are not created equal. In particular, the services provided by the Win32 subsystem are crucial to the operation of Windows NT. The duties of this subsystem include the following:

• As the owner of the screen, keyboard, and mouse, it manages all console and GUI I/O for the entire system. This includes I/O for other subsystems as well as user applications.
• The Win32 subsystem implements the GUI seen by programmers and users. As the screen and window manager for Windows NT, it defines GUI policy and style for the whole system.
• It exposes the Win32 API that both application programs and other subsystems use to interact with the Executive.

Because of its special status, the Win32 subsystem is implemented in a different way from any of the others. Figure 1.3 shows the organization of the Win32 subsystem.

[Figure 1.3 The Win32 subsystem has both user- and kernel-mode components. User mode: Win32 Client, Win32 API DLL. Kernel mode: NT System Service Interface, WIN32K.SYS, LPC Facility. Copyright © 1996 by Cydonix Corporation.]

Unlike its counterparts, the Win32 subsystem doesn't run entirely in user mode. Instead, it consists of both user- and kernel-mode components. To understand how it all fits together, you need to know a little bit about the organization of the Win32 API itself. Broadly speaking, you can divide Win32 functions into three categories:

• The USER functions manage GUI objects like menus and buttons.
• The GDI functions perform low-level drawing operations on graphical devices like displays and printers.
• The KERNEL functions manage such things as processes, threads, synchronization objects, shared memory, and files. They map very directly onto the system services provided by the Executive.

In the original design of Windows NT, one of the goals was to confine all GUI policy-making code to the Win32 server process, CSRSS.
The developers believed this would make the system more robust and easier to modify. As a result, calls to many USER and GDI functions required some interaction with the CSRSS process. This is a rather expensive operation since it involves a process context switch between the Win32 client and the CSRSS server. By comparison, KERNEL functions could be handled in the context of the calling process. Their only overhead was the transition to and from kernel mode.

This architecture has been replaced in Windows NT 4.0 because of the performance limitations it put on graphically-based Win32 programs. Now, a new kernel-mode component called WIN32K.SYS has taken over most of the work formerly done by CSRSS. With this approach, calls to USER and GDI functions can execute in the context of the calling process. The result is that the speed of graphically intensive applications improves significantly.

This shift from user- to kernel-mode graphic support also had implications for the architecture of video and printer drivers under Windows NT. The next section of this chapter will provide some more details on this subject.

1.2 KERNEL-MODE I/O COMPONENTS

Here we're going to take a look at the general layered driver model used by the kernel-mode portions of Windows NT. We'll also be examining variations on this architecture that support specific kinds of I/O devices.

Design Goals for the I/O Subsystem

In addition to the general Windows NT design goals, there were several additional requirements that the I/O subsystem had to satisfy:

• Ease of development - It shouldn't take unreasonable amounts of work to provide support for a new device.
• Portability - It should be relatively easy to move drivers to new platforms. In the best case, this would mean simply compiling and linking the driver.
• Extendibility - It should be easy to add support for new devices and file systems without breaking anything that already works.
• Robustness - The I/O architecture should offer clean, well-defined interfaces and minimize the use of backdoor mechanisms.
• Security - It must be possible to allow or deny various kinds of access to I/O objects on a user-by-user basis.
• Multithreaded operation - Drivers should be able to handle overlapping requests from multiple threads, even if the threads are running simultaneously on multiple CPUs.
• Performance - I/O throughput must be consistent with the needs of large-scale client-server applications.

As if all this isn't enough, the I/O architecture has to work with all the legacy devices that people have been attaching to PCs for the last decade. Some of these devices have characteristics that don't blend well with modern, large-scale operating systems.

Layered Drivers in Windows NT

In most operating systems, the term driver refers to a piece of code that manages some peripheral device. Windows NT takes a more flexible approach which allows several driver layers (shown in Figure 1.4) to exist between an application program and a piece of hardware. This layering permits Windows NT to define a driver in much broader terms that include file systems, logical volume managers, and various network components as well as physical device drivers.

[Figure 1.4 Layered kernel-mode drivers. Labels: I/O Manager, File System Driver, Intermediate Driver, Device Driver.]

Device drivers

These are the drivers that manage actual data transfer and control operations for a specific type of physical device. This includes starting and completing I/O operations, handling interrupts, and performing any error processing required by the device.

Intermediate drivers

Windows NT allows you to layer any number of intermediate drivers on top of a physical device driver.
These intermediate layers provide a way of extending the capabilities of the I/O system without having to modify the drivers below them. For example, the fault-tolerant disk driver in Windows NT Server is implemented as a layer that sits between the file system and the drivers for any physical disks.

Another use for intermediate drivers is to separate hardware-specific operations from more general management issues. In this kind of arrangement, the intermediate driver is referred to as a class driver and the hardware driver is called a port driver. For example, the keyboard class driver handles general keystroke processing while the keyboard port driver worries about the details of specific keyboard controllers. The use of separate class and port drivers makes it easier to target a wider range of hardware since only the port driver needs to be rewritten.

File-system drivers (FSDs)

This kind of driver is generally responsible for maintaining the on-disk structures needed by various file systems. For design reasons, some other system components are implemented as file-system drivers, even though they aren't file systems as such. Microsoft currently supplies the following FSDs:

• FAT - Windows 95 extended MS-DOS file system
• NTFS - Windows NT high reliability file system
• HPFS - OS/2 high performance file system
• CDFS - ISO 9660 CD-ROM file system
• MSFS - Mailslot file system
• NPFS - Named pipe file system
• RDR - LAN Manager redirector

Unfortunately, you can't develop file-system drivers using the standard NT DDK. Microsoft released a beta version of a file system developer's kit at a conference in 1994, but at the time of this writing, they hadn't committed to any release date for the final version of this kit.

SCSI Drivers

The Windows NT SCSI architecture uses layered drivers to separate the management of specific devices from the control of the SCSI host bus adapter (HBA) itself.
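The idea of separating device management from adapter control can be pictured in ordinary user-mode C. The sketch below only illustrates the layering principle; it is not the NT SCSI interface. The names `port_ops`, `fake_hba`, `fake_hba_submit`, and `class_disk_read` are hypothetical, and the "adapter" simply records the request it was handed.

```c
#include <stddef.h>

/* Hypothetical request handed from the class layer to the port layer. */
struct request {
    unsigned char opcode;   /* device command chosen by the class layer */
    unsigned long block;    /* starting block number */
    unsigned long count;    /* number of blocks to transfer */
};

/* Hypothetical port-layer interface: all a class layer ever sees. */
struct port_ops {
    int (*submit)(void *adapter, const struct request *req);
    void *adapter;          /* adapter-specific state, opaque to callers */
};

/* Stand-in adapter that just remembers the last request it received. */
struct fake_hba {
    struct request last;
    unsigned long submitted;
};

static int fake_hba_submit(void *adapter, const struct request *req)
{
    struct fake_hba *hba = adapter;
    hba->last = *req;
    hba->submitted++;
    return 0;               /* 0 means success in this sketch */
}

#define OP_READ 0x28        /* opcode of the SCSI READ(10) command */

/* Class-layer routine: knows about disks, knows nothing about the adapter. */
static int class_disk_read(const struct port_ops *port,
                           unsigned long block, unsigned long count)
{
    struct request req;

    if (port == NULL || port->submit == NULL || count == 0)
        return -1;          /* reject unusable arguments */

    req.opcode = OP_READ;
    req.block = block;
    req.count = count;
    return port->submit(port->adapter, &req);
}
```

Because the class routine reaches the adapter only through the function pointer in `port_ops`, a different adapter could be supported in this sketch by supplying another `submit` routine and leaving the class code untouched.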
Figure 1.5 shows the components of the Windows NT SCSI architecture.

[Figure 1.5 Architecture of Windows NT SCSI drivers. Labels: Filter Driver, Class Driver, NT SCSI Port Driver, Miniport Driver, SCSI Adapter, SCSI Device.]

SCSI port and miniport drivers

The port driver is a Microsoft-supplied component that acts as an interface between a SCSI miniport driver and the operating system. By handling common SCSI grunt work and hiding the details of the local operating system, the SCSI port driver makes it easier to write drivers for new SCSI HBAs. It also reduces the overall size of a miniport and makes it easier to move the miniport to other operating systems (like Windows 95).

SCSI miniports supply the port driver with routines that perform any HBA-specific control operations. Generally, the only people writing SCSI miniport drivers are HBA vendors who want to sell their products in the Windows NT marketplace.

SCSI class drivers

Class drivers manage all the SCSI devices of a particular type, regardless of what HBA they're attached to. For example, there are SCSI class drivers for tapes, disks, and CD-ROM drives. Separating device control from HBA control makes it possible to mix and match SCSI devices and adapters from different vendors. If you have a device that attaches to a SCSI bus, this is the only kind of driver you'll need to write.

SCSI filter drivers

Filters are optional SCSI components that intercept and modify requests sent to a SCSI class driver. This allows you to take advantage of existing class driver capabilities without writing everything from scratch. Filters are useful if you're developing a class driver for hardware that's similar to some other device.

Network Drivers

In an effort to get better performance, many of the networking components in Windows NT are implemented as kernel-mode drivers.
As you can see from Figure 1.6, Windows NT uses driver layering to disengage network protocol management from actual network data transfers. The result is much greater flexibility and support for a wider range of network protocols and hardware.

[Figure 1.6 Architecture of kernel-mode networking components in Windows NT. Labels: Sockets Emulator, NetBEUI Emulator, Other kernel-mode TDI clients, Transport Driver Interface (TDI), Legacy Protocol Driver, Media-Aware Protocol Driver, NDIS Intermediate Driver, NDIS Miniport Driver, NDIS Library.]

Network interface card (NIC) drivers

At the bottom of the stack are the NIC drivers that manage the actual networking hardware. NIC drivers present a standard interface at their top edge that allows higher-level drivers to send and receive packets, to reset or halt the NIC, and to query and set the characteristics of the NIC. The interface to a NIC driver is defined by the network driver interface specification (NDIS).

NDIS NIC drivers rely heavily on the services provided by the NDIS interface library. This library (sometimes referred to as the NDIS wrapper) handles many of the nasty details involved in managing asynchronous communications across a network. The NDIS library also exports a complete set of kernel-mode system functions so that a properly written NDIS driver doesn't need to deal with the operating system.

Based on the amount of help they get from the NDIS interface library, you can classify NIC drivers as either miniports or full drivers. NIC miniports perform only those hardware-specific operations needed to manage a particular NIC. Code in the NDIS library takes care of issues common to all NIC miniports such as synchronization, notification of packet arrival, and queuing of outgoing packets. This is the preferred type of NIC driver for any new hardware.

By comparison, full NIC drivers do almost everything on their own.
This makes them much harder to write and debug and often slower than NIC miniports. Originally introduced in the first release of Windows NT, full NIC drivers are supported only to maintain backward compatibility. No one in their right mind is developing full NIC drivers anymore.

NDIS intermediate drivers

Version 4.0 of NDIS (the one included with Windows NT 4.0) includes a new kind of component: the NDIS intermediate driver. NDIS intermediate drivers are sandwiched between transport drivers and NDIS NIC miniports. To the transport driver, they appear to be NDIS miniports while to the NIC driver, they look like transport drivers.

NDIS intermediate layers are useful if you have a legacy transport driver and you want to connect it to some new type of media unknown to the transport driver. In this situation, the intermediate driver performs any necessary translations between the transport driver and the NIC miniport managing the new media.

Transport drivers

A transport driver is responsible for implementing a specific network protocol such as TCP/IP or IPX/SPX. It is independent of the underlying network hardware and uses NDIS NIC or intermediate drivers to transfer packets over one or more physical network connections.

All Windows NT transport drivers offer their services to kernel-mode networking clients through the transport driver interface (TDI). The TDI specification defines a low-level interface that supports both connection-based and connectionless (i.e., datagram) protocols. Having all transport drivers expose a single, common interface simplifies the development of both the transport drivers and the clients they support.

Kernel-mode networking clients

Various kernel-mode components that access the network use the TDI interface to communicate with protocol drivers.
These kernel-mode TDI clients fall into two broad categories: First, there are system components whose operation is transparent to user-mode applications. One example would be the Server and Redirector that handle requests for remote file access.

The other kind of TDI client is an emulator that exposes some well-known programming interface. User-mode applications access the network through one of these standard APIs rather than working directly with TDI. This approach makes it easier to port existing software to Windows NT and prevents the needless proliferation of networking APIs. Windows NT currently supports interfaces for sockets, NetBIOS calls, named pipes, and mailslots.

1.3 SPECIAL DRIVER ARCHITECTURES

Along with the relatively straightforward kernel-mode drivers described in section 1.2, Windows NT depends on a number of very specialized driver architectures. The following subsections describe each of them in detail.

Video Drivers

Video support in Windows NT is complicated by the fact that Win32 applications can use three different graphics APIs. First, there's the graphical device interface (GDI). This API provides a set of device-independent rendering functions for generating two-dimensional output on display or hardcopy devices. Most Win32 applications use this programming interface because it simplifies the task of producing identical display and printer output.

For programs that need to produce three-dimensional graphics, Win32 also supports the OpenGL API. These functions generate the kind of high-quality output needed by CAD software or scientific visualization tools. In return for the quality of the output, however, the OpenGL API demands a great deal of CPU horsepower or hardware rendering assistance.

Finally, for consumer applications (i.e., games), Windows NT supports a subset of the DirectDraw API included in Windows 95. DirectDraw is one piece of Microsoft's DirectX game-programming architecture.
Its goal is to give user-mode applications more direct access to video and audio hardware without compromising the integrity of the system.

Supporting multiple APIs on video hardware from multiple vendors is a complex problem. Solving it in a flexible and portable manner requires the interaction of a number of software components. Figure 1.7 shows what they are.

[Figure 1.7 Architecture of NT kernel-mode video drivers. Labels: System Service Interface, GDI Rendering Engine, I/O Manager, DirectDraw HAL, Video Port, Video Miniport Driver, Video Hardware.]

GDI engine

The GDI engine is the key to Windows NT's device-independent output strategy. This Microsoft-supplied component provides full software rendering support for Win32 GDI calls. In response to a Win32 drawing request, the GDI engine uses the appropriate display or printer driver to generate commands for a specific piece of hardware.

Display drivers

Display drivers are vendor-supplied components that do the actual work of drawing on the display screen. By selectively overriding the rendering functions in the GDI engine, they also give Win32 access to any hardware acceleration features provided by the video card.[2] Along with a display driver for a specific piece of video hardware, vendors need to provide a corresponding video miniport (described below).

[2] In earlier versions of Windows NT, both the GDI engine and the display driver were user-mode components running in the context of the Win32 subsystem process. To improve graphics performance, this code runs in kernel mode in Windows NT 4.0.

DirectDraw HAL

This vendor-supplied component exposes an abstract version of the video hardware. This includes the video frame buffer plus any hardware acceleration mechanisms supported by the DirectDraw API. Any features of
the DirectDraw hardware model not supported by the video device are emulated by Microsoft's DirectDraw software.

Video port and miniport drivers

The main responsibility of these two drivers is to manage state changes in the system's video hardware. The video port and miniport do not take part in any drawing operations. The work of these drivers includes doing such things as:

• Finding and initializing the video controller.
• Managing any cursor or pointer hardware located on the video card.
• Handling mode-set and palette operations when a full-screen MS-DOS session is running. (This only applies to 80x86 platforms.)
• Making the video frame buffer available to user-mode processes.

The video port and miniport are actually a tightly-coupled pair of drivers. The port driver is a Microsoft-supplied framework that simplifies the task of writing video drivers. It contains only generic, hardware-independent code that is common to all video drivers.

The miniport is a vendor-supplied driver whose job is to manage a specific type of video card. In response to calls from the video port driver, it is the miniport that actually changes the state of the device. This division of labor between the port and miniport makes it easier to add support for new video cards to Windows NT.

Printer Drivers

In Windows NT, hardcopy devices are considered to be just another kind of graphical output hardware. Unlike display devices, however, there can be more than one printer on the system, and these printers may not all use the same kind of physical connection. Some of them may even be located somewhere else on the network. The Windows NT printing architecture (pictured in Figure 1.8) is an attempt to deal with all this variety.

Printer drivers

A printer driver is very much like a display driver in that it runs in kernel mode and helps the GDI engine convert Win32 API graphics calls into rendering commands.
The difference is that a printer driver sends its output to the spooler (described below) rather than to a video device.

A printer driver is responsible for supporting a particular printer or family of printers. The Windows NT DDK contains sample drivers for raster-based printers, PostScript printers, and plotters. Most printers available today fall into one of these categories. Unless your printer uses some completely alien technology, it's unlikely that you'd need to write an entire driver from scratch.

[Figure 1.8 Architecture of the Windows NT printing components. Labels: Spooler, Config DLL, Print Proc, Spool API, Lang Monitor, Graphics API, System Service Interface, GDI Engine, Printer Driver, Serial/Parallel/Network Device Driver.]

For raster-based printers, most of the rendering operation is simply a matter of converting a specific drawing command into the proper set of printer escape codes. Because this is such a well-defined problem, you can use a Microsoft-supplied framework called the Unidriver to do most of the work. In this case, you only need to write the device-specific pieces of code in the form of a miniprint driver. Adding support for printers based on a page description language like PostScript is a more complicated task.

Configuration DLL

To support a printer under Windows NT, it's not enough to write a printer driver. You also have to supply a user-mode configuration DLL. The job of this DLL is to display the property-sheet dialog box that changes the printer's settings. Application programs use the configuration DLL to set up the printing environment for specific documents. It also appears when you select one of the icons in the Windows NT shell's Printers folder.

Spooler

The spooler is the central component of Windows NT's printing mechanism.
It takes the output generated by a printer driver and either sends it to the appropriate printer or stores it in a temporary file for later printing. The spooler works with either local or networked printers.

The spooler is one of the integral subsystem processes that starts when the operating system loads. Its architecture is very modular so that it can accommodate a wide variety of printing devices and environments. Printer vendors can customize the spooler by supplying three different kinds of components: print processors, language monitors, and port monitors.

Print processor DLL

A print processor is a DLL that reads the spooled data produced by a specific printer driver and converts it into actual output. At its upper edge, the print processor DLL exposes a standard set of functions to the spooler. It generates output using the services provided by a language or port monitor.

The standard printer drivers can spool their output as text, as raw data (already rendered by the GDI engine), or as a series of enhanced metafile (EMF) commands to be rendered by the spooler.[3] Microsoft supplies a print processor that can interpret any of these three data formats. If you write a printer driver that uses a proprietary format for spooled data, you'll also have to write a print processor for it.

Language monitor DLL

In workgroup situations, it's very common for several users to be sharing a single printer or print server. Consequently, it's important to keep their jobs clearly separated and to be able to determine the status of a particular job at any point in time. It also may be necessary to set up a different printing environment for each job being output.

To meet these kinds of needs, many vendors offer smart, bidirectional printers that accept commands and report status over the same connection on which they receive output data.
Normally, these command and status messages are in some kind of control language defined by the printer's manufacturer. For example, Hewlett Packard LaserJet printers use something called the Printer Job Language (PJL).

A language monitor is a DLL that allows the spooler to communicate with a bidirectional printer in a standardized way. It exposes a well-defined set of functions that the spooler can call to control and monitor a job on one of these printers. The language monitor then converts these requests into the proper stream of job language commands and uses the port monitor (described below) to send them to the printer.

Windows NT comes with a language monitor for the Hewlett Packard PJL language. If your printer uses some home-brew set of commands, you'll need to write a language monitor for it.

[3] The use of EMF data for printing allows the program generating the output to finish its print request more quickly since the rendering operation takes place later in the context of the spooler process. Raw data slows the application because it's rendered before being sent to the spooler.

Port monitor DLL

A port monitor is a DLL that manages a particular kind of output channel on behalf of the spooler subsystem. The monitor exposes a standard set of functions which the spooler invokes in order to generate output. The port monitor then converts these calls into the appropriate set of Win32 I/O requests.

Allowing the spooler to work with an abstraction of the output device makes it easier to add support for a variety of printer connections. Microsoft supplies the following port monitors with Windows NT:

• The local port monitor that communicates with the parallel and serial ports as well as printing data to a file.
• The LPR monitor that manages LPD printers and print-servers using a TCP/IP network connection.
• Port monitors from Hewlett Packard, Apple, and Digital Equipment Corporation that control network-based printers and print-servers from these vendors.

Normally, you won't need to write a port monitor unless you've developed some new and strange way to link a printer to a computer. For example, an output device connected to a SCSI controller would need a new port monitor.

Multimedia Drivers

Multimedia is going to change our lives one day - if only someone can figure out how. For those who'd like to try, Windows NT supports a wide range of multimedia devices, including:

• Waveform audio hardware that samples and reconstructs analog audio signals
• MIDI ports that connect to external musical devices like keyboards, synthesizers, and drum machines
• Onboard MIDI synthesizers that are part of the computer itself
• Video capture devices that digitize either single frame or continuous video signals
• Related devices like CD players, video-disk players, and joysticks

Most application programs don't interact with multimedia hardware by calling such functions as CreateFile or DeviceIoControl. Instead they use some of the special-purpose multimedia functions provided by Win32. This indirect approach reduces their dependency on hardware from a specific vendor. Figure 1.9 shows the components involved in multimedia operations.

WINMM

To meet the requirements of different kinds of software, Win32 actually contains two separate multimedia APIs. The media control interface (MCI) functions provide high-level access to a wide variety of multimedia devices while hiding many of the details from the programmer. MCI is the interface used by most applications. For software needing more direct hardware control, Win32 also provides a group of low-level audio functions. Programs such as MIDI sequencers or waveform editors are more likely to use this low-level interface.
Support for both sets of multimedia functions comes from the WINMM system component. WINMM is a user-mode DLL that acts as a translation layer between the application and the vendor-supplied drivers that actually control the multimedia hardware. To do its job, WINMM relies on three kinds of drivers.

MCI drivers

An MCI driver is just a user-mode DLL that WINMM loads at runtime to process MCI commands for a specific device. In response to calls from a multimedia application, WINMM sends various messages to the proper MCI driver. Depending on the device, the MCI driver then uses either the low-level audio interface (described below) or Win32 I/O functions to control the hardware.

[Figure 1.9 Multimedia driver architecture. Labels: Application, WINMM DLL, MCI Driver, Low-Level Audio Driver, System Service Interface, I/O Manager, Multimedia Device Driver.]

Low-level audio drivers

When an application calls a low-level audio function, WINMM loads a vendor-supplied user-mode DLL (the low-level audio driver) and sends it various messages. The low-level audio driver then uses Win32 I/O functions to communicate with the audio hardware. This is very similar to the operation of the MCI drivers described previously.

Kernel-mode device drivers

Management of the multimedia hardware itself comes from a kernel-mode device driver. This includes data transfer operations, handling interrupts, processing errors, and so on.

Drivers for Legacy 16-bit Applications

When Microsoft first introduced Windows NT, a vast amount of software already existed for MS-DOS and 16-bit Windows. Any new operating system hoping to be a commercial success would have to be able to run the majority of this code without modification. At the same time, it would be necessary to protect system integrity by denying these 16-bit programs the kind of unlimited hardware access they enjoyed under MS-DOS and Windows.
As you saw earlier in this chapter, Microsoft's solution was to run 16-bit code in the context of one or more virtual DOS machine (VDM) processes.

To meet the challenge of allowing VDMs to perform I/O without giving them direct access to any hardware, Windows NT uses a piece of software called a virtual DOS driver (VDD). Figure 1.10 shows the relationship of such a VDD to the other parts of the operating system.

[Figure 1.10 Relationship of VDDs and kernel-mode drivers. Labels: VDM, MS-DOS App, 16-bit MS-DOS Emulation, 32-bit MS-DOS Emulation, Instruction Emulation, Virtual Device Drivers, Win32 API calls, System Service Interface, I/O Manager, Device Driver.]

The VDD essentially acts as a translation layer between a 16-bit application and some custom piece of hardware. Whenever the application tries to touch the hardware directly, the VDD intercepts the request and turns it into a series of Win32 calls. These Win32 calls are then processed by a standard Windows NT kernel-mode driver.

A VDD can intercept a 16-bit program's attempts to access I/O ports and specific ranges of memory. It also has the ability to perform DMA transfers on behalf of the application, read and set the contents of CPU registers, and simulate the arrival of interrupts. All this makes it possible to fool the 16-bit application into thinking it's still running under MS-DOS or Windows.

The advantage of this approach is that the original 16-bit executable doesn't need to be modified to run under Windows NT. The disadvantage is that the extra layer of software can add significant amounts of processing overhead. Since you have to write a kernel-mode driver to support the underlying hardware, the real solution is to port the application to the Win32 environment.

One other point to make here: This technique supports the execution of MS-DOS programs that touch hardware directly.
It also supports 16-bit DLLs that play with hardware (a common form of driver in the 16-bit Windows environment). It does not allow you to run Windows or Windows 95 VxDs under Windows NT.

1.4 SUMMARY

As you can see, Windows NT's rich architecture and multiple API environments add a certain amount of complexity to I/O processing. In particular, Windows NT uses a much broader definition of what constitutes a driver than many other operating systems. If you're in the process of adding support for a specific piece of hardware, you should have a good idea at this point of just what kind of driver(s) you'll need to write.

In the next chapter we'll start our descent into kernel-mode driver development by examining some of the hardware issues facing NT driver writers.

CHAPTER 2 The Hardware Environment

For some people (you know who you are), hot solder is the only true programming language. If you're not in that category, this chapter will give you a gentle introduction to those aspects of hardware that have an impact on writing drivers. You'll also find here a quick tour of the major bus architectures supported by Windows NT, and a few words to the wise about dealing with hardware in general.

2.1 HARDWARE BASICS

There are a number of things you need to know about a peripheral device before you can design a driver for it. At the very least, the following items are important:

• How to use the device's control and status registers
• What causes the device to generate an interrupt
• How the device transfers data
• Whether the device uses any dedicated memory
• Whether the device can be autoconfigured

The following subsections discuss each of these topics in a general way.

Device Registers

Drivers communicate with a peripheral by reading and writing various bits in a group of registers associated with the device.
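In practice, reading and writing bits in a register comes down to a handful of mask operations. The sketch below shows them on an ordinary value standing in for an 8-bit device register. The bit names (`CTL_START`, `CTL_IRQ_ENABLE`, `STS_BUSY`) and their positions are hypothetical and don't describe any real device.

```c
#include <stdint.h>

/* Hypothetical bit assignments for an 8-bit register. */
#define CTL_START       0x01u   /* bit 0: start an operation */
#define CTL_IRQ_ENABLE  0x10u   /* bit 4: allow the device to interrupt */
#define STS_BUSY        0x80u   /* bit 7: device is busy */

/* Return the value with the masked bits set and all others unchanged. */
static uint8_t reg_set_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value | mask);
}

/* Return the value with the masked bits cleared and all others unchanged. */
static uint8_t reg_clear_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value & (uint8_t)~mask);
}

/* Return 1 if every bit in the mask is set in the value, otherwise 0. */
static int reg_test_bits(uint8_t value, uint8_t mask)
{
    return (value & mask) == mask;
}
```

A driver would typically read the current register contents, apply one of these operations, and write the result back, so that bits it doesn't own keep their previous state.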
Each of these device registers will generally perform one of the following functions:

• Command - Setting and clearing bits in command registers causes the device to start an operation or change its behavior in some way.
• Status - The bits in a status register contain information about the current state of the device.
• Data buffer - Output devices accept data to be transmitted when it's written to their output buffer registers. Data coming from an input device will appear in the device's input buffer register.

Simple devices (like the parallel port interface in Table 2.1) have only a few registers, while complex hardware (like a graphics adapter or a network card) has a large set of registers. In the absence of any industry standard, the engineer designing the interface card is the one who decides how these registers are going to be used. So, if you expect to write a device driver, you'll need detailed information about all the device's control and data registers.

Table 2.1 These registers control a parallel port interface

Parallel port registers
Offset  Register    Access  Description
0       Data        R/W     Data byte transferred through parallel port
1       Status      R/O     Current parallel port status
          Bits 0-1            Reserved
          Bit 2               0 - interrupt has been requested by port
          Bit 3               0 - an error has occurred
          Bit 4               1 - printer is selected
          Bit 5               1 - printer is out of paper
          Bit 6               0 - acknowledge
          Bit 7               0 - printer is busy
2       Control     R/W     Commands sent to parallel port
          Bit 0               1 - strobe data to/from parallel port
          Bit 1               1 - automatic line feed
          Bit 2               0 - initialize printer
          Bit 3               1 - select printer
          Bit 4               1 - enable interrupts
          Bits 5-7            Reserved

Accessing Device Registers

Once you know what a set of device registers does, you still need two additional pieces of information before you can work with the device:

• The address of the device's first register
• The address space where these registers live

Since a given device's registers usually occupy consecutive locations, the
address of the first register will get you to all the others. Unfortunately, finding the register base address is a rather involved process that will have to wait for Chapter 7. That still doesn't answer the question of where these registers live. As you can see from Figure 2.1, device registers can occupy either of two different address spaces. The following subsections describe each of them. 1/0 space registers Some CPU architectures map device registers into a set of addresses known as I/0 space. These 1/0 space addresses (often referred to as ports) are not part of the memory space seen by the CPU, and they can only be accessed with special machine instructions. For example, the 80x86 architecture has a 64-kilobyte 1/0 space, and IN and OUT instructions for reading and writ ing I/ 0 ports. One extra twist: To promote platform independence, an NT driver shouldn't actually use hardware instructions to touch I/ 0 ports. Instead, it ought to use the HAL functions listed in Table 2.2. Device Register LOAD/STORE CPU IN/OUT '"'";, «'l'i'ii '.! i�i!ij1m�iii&iili!i: i:1:11Ji!il!lii!!1�: ri);�1Jj Copyright © 1 994 b y Cydonix Corporation. 940028a.vsd Figure 2.1 Memory-mapped device registers and I / 0 space ports Sec. 2.1 Hardware Basics Table 2.2 27 Use these HAL functions to access ports in 1/0 space HAL 1/0 space functions Function Description READ_PORT_XXX WRITE_PORT_XXX READ_PORT_BUFFER_XXX WRITE_pORT_BUFFER_XXX Read a single value from an I/O port Write a single value to an I/ 0 port Read an array of values from consecutive I/ 0 ports Write an array of values to consecutive I/O ports Substitute one of the following for XXX: UCHAR, USHORT, or ULONG. Memory-mapped registers CPU architectures without a separate I/0 space generally map device registers into some range of physical memory addresses. 
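To make the difference between the two address spaces concrete, here is a minimal sketch in plain C. Nothing in it comes from the HAL: the sim_ names, the array standing in for I/O space, the three-register block, and the port number used to exercise it are all assumptions invented for illustration, so the code can run as an ordinary user-mode program. Port registers are reachable only through accessor functions, while a memory-mapped register is reached through a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-kilobyte I/O space. A real NT driver
   would call the HAL port functions rather than index an array. */
static uint8_t sim_io_space[0x10000];

/* Port-style access: the only way in is through an accessor, much as
   I/O space can only be reached with special machine instructions. */
static uint8_t sim_read_port(uint16_t port)
{
    return sim_io_space[port];
}

static void sim_write_port(uint16_t port, uint8_t value)
{
    sim_io_space[port] = value;
}

/* Memory-mapped style: the registers are ordinary addresses, so a
   volatile pointer and plain loads and stores are enough. */
struct sim_regs {
    volatile uint8_t data;
    volatile uint8_t status;
    volatile uint8_t control;
};

static struct sim_regs sim_device;   /* pretend this is device memory */

/* Set bits in the control register with a read-modify-write cycle. */
static void sim_set_control_bits(struct sim_regs *regs, uint8_t mask)
{
    assert(regs != 0);
    regs->control = (uint8_t)(regs->control | mask);
}
```

A real NT driver wouldn't index arrays or dereference raw pointers like this; it would go through the HAL functions for whichever address space the device uses.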
Access to these memory-mapped device registers is accomplished with the same load and store instructions used for normal memory operations (for example, MOV on the 80x86 platform).

Even on CPUs with a separate I/O space, some peripherals memory-map their control registers anyway. This improves the performance of high-speed devices with large register sets, since I/O instructions are typically much slower than memory-access instructions. For example, many SVGA video adapters for 80x86 machines can use memory addresses not only for their video buffers, but for their control registers as well.

Once again, the HAL provides a set of support functions (listed in Table 2.3) for accessing memory-mapped registers. Notice that these are not the same functions you use on a CPU with a separate I/O space. So, if you plan to support your driver on both kinds of architecture, you'll need to take this difference into account. Chapter 5 presents some coding techniques that make this easier to do.

Table 2.3 Use these HAL functions to access memory-mapped device registers

HAL memory-mapped register functions

Function                     Description
READ_REGISTER_XXX            Read a single value from an I/O register
WRITE_REGISTER_XXX           Write a single value to an I/O register
READ_REGISTER_BUFFER_XXX     Read an array of values from consecutive I/O registers
WRITE_REGISTER_BUFFER_XXX    Write an array of values to consecutive I/O registers

Substitute one of the following for XXX: UCHAR, USHORT, or ULONG.

Device Interrupts

Most reasonable pieces of hardware generate an interrupt request when they need some kind of attention from the CPU. This request takes the form of an electrical signal on the interrupt lines in the bus. A device might yank on its interrupt line for any number of reasons, including:

• The device has completed a previously requested input or output operation and is now idle.
• A buffer or FIFO associated with the device is almost full (for input operations) or almost empty (for output operations). The device uses an interrupt to notify the driver that it must process the buffer if it wants the I/O to continue without a pause.
• The device encountered some kind of error during an I/O operation.

Some legacy devices don't use interrupts at all. Drivers for this kind of hardware usually have to poll their devices until some kind of interesting event occurs. Under single-tasking operating systems like MS-DOS, this behavior wasn't a problem, but in an environment like Windows NT, it would seriously degrade system performance. Chapters 10 and 14 will present some techniques you can use with non-interrupting hardware.

The various bus architectures supported by Windows NT take slightly different approaches to interrupts. Nonetheless, they all share several common features, which are described below.

Interrupt priorities. When several devices are connected to the same bus, the CPU needs some way to rank the importance of their interrupt requests. This allows devices that need immediate servicing to access the CPU ahead of devices that can afford to wait. Although the exact mechanism depends on the bus, this ranking generally works by assigning a priority value to each of the interrupt request lines.

When the CPU accepts an interrupt request, it blocks out any further interrupts at or below the same priority and transfers control to an interrupt service routine. Until the interrupt service routine handles and dismisses the interrupt, only requests of a higher priority can take control of the CPU. Lower-priority requests remain pending until the more important activity is finished.

Interrupt vectors. An interrupt vector is a unique, bus-relative number which allows the CPU to identify the source of an interrupt and call the appropriate service routine.
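The priority rule and the role of the vector can be modeled in a few lines of C. This is only a simulation under stated assumptions: the sim_ names, the 16-entry table, and the convention that a larger number means a higher priority are all made up for the example, and real interrupt controllers do this work in hardware. The two sample service routines exist only to show one request preempting and another being held back.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_VECTORS 16

typedef void (*sim_isr_t)(void);

static sim_isr_t sim_vector_table[SIM_VECTORS]; /* vector -> service routine */
static int sim_pending[SIM_VECTORS];            /* requests left waiting */
static int sim_in_service = -1;                 /* -1: nothing being serviced */
static int sim_trace[8];                        /* order in which routines ran */
static int sim_trace_len;

/* Present an interrupt request. Returns 1 if the service routine ran,
   or 0 if the request was blocked and left pending. */
static int sim_request(int vector, int priority)
{
    assert(vector >= 0 && vector < SIM_VECTORS);
    assert(sim_vector_table[vector] != NULL);

    /* Requests at or below the priority in service are blocked. */
    if (priority <= sim_in_service) {
        sim_pending[vector] = 1;
        return 0;
    }

    int saved = sim_in_service;
    sim_in_service = priority;
    sim_vector_table[vector]();   /* the vector selects the routine */
    sim_in_service = saved;       /* routine has dismissed the interrupt */
    return 1;
}

/* Hypothetical high-priority routine: just records that it ran. */
static void sim_high_isr(void)
{
    sim_trace[sim_trace_len++] = 2;
}

/* Hypothetical low-priority routine that is interrupted while running. */
static void sim_low_isr(void)
{
    sim_trace[sim_trace_len++] = 1;
    sim_request(2, 5);   /* higher priority: preempts this routine */
    sim_request(1, 3);   /* same priority: stays pending */
}
```

Installing sim_low_isr at vector 1 and sim_high_isr at vector 2 and then requesting vector 1 at priority 3 runs both routines, and leaves the second request for vector 1 marked pending.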
The interrupt controller usually passes this vector to the CPU when it accepts an interrupt request. The CPU then uses the vector as an index into a table containing the addresses of interrupt service routines.

Signaling mechanisms. Hardware designers have developed two basic strategies that devices can use when they want to generate an interrupt. The older mechanism defines an interrupt request as a transition from zero to one on the interrupt signal line. These are called edge-triggered (or latched) interrupts because they depend only on the leading edge of the pulse.

Unfortunately, this scheme has two problems. First, it's very sensitive to electrical noise: a random spike can easily be mistaken for an interrupt request. Second, if an interrupt arrives while another one is being serviced at the same priority, the second interrupt will be ignored. This limits sharing to situations where simultaneous interrupts will never occur on the same line.

These limitations led to the development of another signaling mechanism called a level-sensitive (or level-triggered) interrupt. This approach requires the device to send a continuous signal down the wire until the interrupt service routine explicitly dismisses the interrupt. In addition to greater noise immunity, this scheme makes it possible for multiple devices to share the same interrupt request line.

Processor affinity. To improve overall performance, multiprocessor platforms often contain special interrupt-routing hardware. The purpose of this hardware is to distribute interrupt requests from a given device to one or more specific CPUs. If a particular CPU can service interrupts from a device, those interrupts are said to have affinity for that CPU.

Data Transfer Mechanisms

Hardware designers have three basic options when it comes to moving data between a peripheral and memory:

• Programmed I/O
• Direct memory access
• Shared buffers

The transfer mechanism used by a given device usually depends on the device's speed, the amount of data it needs to transfer, and any applicable industry standards. In some cases, a complex piece of hardware may actually use more than one of these techniques.

The following subsections explain the differences between programmed I/O and direct memory access (illustrated in Figure 2.2). Shared memory buffers are covered later in the discussion of device-specific memory.

Programmed I/O (PIO). PIO devices need the help of the CPU to perform data transfers. Their drivers are responsible for sending or receiving each byte of data, keeping track of the buffer in memory, and maintaining a running count of the number of bytes transferred.

PIO devices typically generate an interrupt after each byte or word of data is transferred. Some PIO devices have an internal buffer or a hardware FIFO that helps to reduce the interrupt count. Even so, lengthy transfers need a lot of attention from the CPU and produce a flood of interrupts. This can lead to very poor system performance.

[Figure 2.2: Paths followed by data in DMA and programmed I/O transfers. Recoverable labels: DMA Controller, Count Register, Address Register, Device, Data Register. Copyright © 1994 by Cydonix Corporation.]

This style of I/O is best suited to slower devices that don't move large amounts of data in a single operation. Parallel ports, pointing devices, and the keyboard are all examples of PIO hardware. Chapter 9 will explain how to work with PIO devices.

Direct memory access (DMA). DMA devices take advantage of special hardware called a DMA controller (DMAC). A DMAC is actually a very simple auxiliary processor with just enough intelligence to transfer a specified number of bytes between a peripheral device and memory.
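One way to picture a DMAC is as a pair of registers plus a copy step. The sketch below models that in ordinary C. The sim_dmac structure and its function names are hypothetical, and the "device" is just a byte handed in by the caller. It assumes an input transfer and a controller that raises its completion interrupt when the count reaches zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a DMA controller. */
struct sim_dmac {
    uint8_t *address;    /* address register: next memory location */
    size_t   count;      /* count register: bytes still to move */
    int      interrupt;  /* becomes 1 when the transfer completes */
};

/* Driver side: load the address and count before starting the device. */
static void sim_dmac_program(struct sim_dmac *d, uint8_t *buffer, size_t count)
{
    assert(d != NULL && buffer != NULL);
    d->address = buffer;
    d->count = count;
    d->interrupt = 0;
}

/* Controller side: move one byte from the device's data register into
   memory, advance the address, and count down. Extra cycles after the
   count reaches zero are ignored. */
static void sim_dmac_cycle(struct sim_dmac *d, uint8_t device_data)
{
    assert(d != NULL);
    if (d->count == 0) {
        return;
    }
    *d->address++ = device_data;
    if (--d->count == 0) {
        d->interrupt = 1;   /* signal completion */
    }
}
```

The point of the model is that the CPU only touches the controller once, in sim_dmac_program; every later byte moves without driver code running.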
At the beginning of an I/O operation, the driver loads a transfer count and a memory address into the DMAC and then starts the device. All by itself, the DMAC moves data to or from successive memory locations, and when the transfer is complete, it generates an interrupt request. During the actual operation, the driver is suspended and the CPU can work on other tasks.

High-speed devices that perform large transfers generally use DMA because it significantly reduces driver overhead and system interrupt activity. Disks, sound samplers, and network cards are examples of DMA devices.

Direct Memory Access (DMA) Mechanisms

Chapter 12 will have a lot more to say about the mechanics of working with this kind of hardware. There are a number of twists and turns that aren't relevant here. At this point, it's only necessary to draw a distinction between two general kinds of DMA.

System DMA. Some devices are connected to the shared DMACs on the motherboard. These controllers each have a fixed number of data-transfer paths (called channels) that can all work simultaneously. More than one device can be attached to the same channel, but only one device at a time can transfer data over the channel. This is known as system DMA or slave DMA.

By sharing hardware, slave DMA devices have a simpler architecture and lower chip count. On the downside, they may have to wait for a DMA channel to become available before they can start an operation. The floppy controller on most PCs is a slave DMA device.

Bus master DMA. Other devices (called bus masters) have their own DMAC hardware built into the peripheral card itself. This guarantees that high-speed devices won't have to wait for a system DMA channel to become free. The AHA-1742 SCSI controller from Adaptec is one example of a bus mastering device.

Device-Dedicated Memory

Some devices insist on having a private range of addresses in physical memory.
There are several reasons why a peripheral card might need dedicated address space:

• Its control registers might be memory-mapped.
• It might have an internal ROM containing start-up code and data. For the CPU to execute this code, it has to appear somewhere in memory address space.
• It might use a block of memory as a temporary buffer for data that's being sent or received. High-speed devices like video capture boards and Ethernet adapters often use this technique.

Peripheral cards generally take one of two approaches to dedicated memory. Some insist on using a specific range of physical addresses. For example, VGA cards expect a 128-kilobyte block of addresses beginning at 0xA0000 to belong to them. Alternatively, the card might have an address register that holds the base physical address of its dedicated memory. During initialization, the driver for the card will load this register with a pointer to some block of available memory. Figure 2.3 illustrates each of these two possible designs.

Regardless of which approach a card takes, it's important to remember that the card will be working with physical addresses. Since the only addresses available to a device driver are virtual addresses, drivers have to map any device memory somewhere into system virtual space before they can access it. Chapter 7 explains how all this works.

[Figure 2.3: How drivers access device memory. Recoverable labels: Contiguous Buffer (shown twice). Copyright © 1994 by Cydonix Corporation.]

Requirements for Autoconfiguration

Ever since the first add-on card hit the market, PC users have been struggling with ports, IRQs, and DMA channel assignments. In the beginning, things weren't too bad, and it usually didn't take too long to find an appropriate combination of DIP-switch and jumper settings. However, as people started attaching more and more optional equipment to their PCs, getting everything to work became a real nightmare.
To get around these problems, some bus architectures support various levels of automatic hardware recognition and configuration. The next section of this chapter will describe specific autoconfiguration capabilities of the major buses. Here, it's enough to introduce the kinds of features that make autoconfiguration possible.

Device resource lists. At the very least, a device must identify itself and provide the system with a description of the resources it needs. In the ideal case, this resource list contains the following information:

• Manufacturer ID
• Device type ID
• I/O space requirements
• Interrupt requirements
• DMA channels
• Device memory requirements

No jumpers or switches. Self-identification isn't enough, however. For true autoconfiguration, a device must be able to change its port, interrupt, and DMA channel assignments dynamically under software control. This allows a driver or some other part of the operating system to arbitrate resource conflicts among competing devices.

Change notification. Finally, the highest level of support also requires the bus to generate a notification signal whenever a card is plugged in or removed. Without this kind of mechanism, it's not possible to implement any of the Plug and Play hot-swapping features. Since the current release of Windows NT doesn't support Plug and Play, this isn't an issue right now. But it will be in the future.

2.2 BUSES AND WINDOWS NT

A bus is just a collection of data, address, and control lines that allows a peripheral device to communicate with memory and the CPU. The specification for a bus defines such things as the shape and size of physical connectors, the functions performed by each of the lines in the bus, and the timing and signaling protocols used by devices attached to the bus.

Over the last decade, hardware vendors have developed a wide variety of bus architectures with differing electrical and logical characteristics.
As of version 4.0, Windows NT supports many of these buses. What follows are brief descriptions of the buses you're most likely to encounter. For more detailed information, see some of the books listed in the bibliography.

ISA - The Industry Standard Architecture

This is the old standby that made its first appearance on the IBM PC/AT. It was derived from the original IBM PC bus by adding extra data and address lines and increasing the number of IRQ levels and DMA channels. Both 16-bit ISA cards and the older IBM PC 8-bit cards fit into ISA sockets. Figure 2.4 shows the organization of an ISA-based machine.

The ISA bus isn't especially fast. To maintain backward compatibility with the IBM PC, the ISA bus clock rate is limited to 8.33 MHz. In the best case, a 16-bit transfer takes two clock cycles, so the maximum data rate is only about 8 MB/sec. This limit applies regardless of the clock rate of the CPU itself. That's why the CPU and memory communicate over a high-speed local bus (sometimes called the X bus).

[Figure 2.4: Layout of an ISA system. Recoverable labels: Local Bus, ISA Card. Copyright © 1996 by Cydonix Corporation.]

Register access. There are very few rules when it comes to the layout of I/O space on ISA systems. Beyond some industry conventions, there aren't any real standards for the kinds of registers an ISA card should implement, nor what addresses they should use. Generally, I/O addresses between 0x0000 and 0x00FF belong only to devices on the system board, while the territory between 0x0100 and 0x03FF is available for add-on cards. The space used by expansion cards is doled out in 32-byte chunks.

Unfortunately, many ISA add-on cards don't pay attention to all 16 I/O address bits. Instead, they look only at bits 5-9 to see if an I/O space reference belongs to them. If it does, they decode bits 0-4 to determine the exact register.
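To see what this kind of partial decoding does, consider the small sketch below. It assumes a hypothetical card that compares bits 5-9 of the I/O address against its own base and uses bits 0-4 as the register offset, ignoring bits 10-15 entirely. The function names and the base address used to exercise it are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Does a card with the given base respond to this I/O address?
   Only bits 5-9 take part in the comparison; bits 10-15 are ignored.
   The base must be a 32-byte-aligned address no higher than 0x03FF. */
static int sim_card_selected(uint16_t io_addr, uint16_t card_base)
{
    assert((card_base & 0x1F) == 0 && card_base <= 0x03FF);
    return ((io_addr >> 5) & 0x1F) == ((card_base >> 5) & 0x1F);
}

/* Which of the card's 32 registers does the address name? (bits 0-4) */
static unsigned sim_card_register(uint16_t io_addr)
{
    return io_addr & 0x1Fu;
}
```

With a base of 0x0300, for instance, sim_card_selected reports a match for 0x0300 and also for 0x0700, because the two addresses differ only in bit 10.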
Cards like this are a problem because they respond to multiple addresses in the 64-kilobyte I/O space, which can lead to some nasty behavior. The only way to prevent conflicts on a system with ISA boards is to avoid these alias addresses altogether. [1]

[1] In other words, the control registers of any cards using the range above 0x03FF have to use I/O space addresses with zeroes in bits 8 and 9.

Interrupt mechanisms. Interrupts on an ISA bus are normally handled by a pair of Intel 8259A programmable interrupt controller (PIC) chips, each of which provides eight levels of interrupt priority. These two chips are tied together in a master-slave configuration that leaves fifteen available priority levels. Table 2.4 lists the ISA priority levels and describes how they are normally used.

The 8259A chip can be programmed to respond to either edge-triggered or level-sensitive interrupts. This choice must be made for the entire chip; it can't be set on an IRQ-by-IRQ basis. The power-on self-test (POST) code in the ISA BIOS programs both chips to use edge-triggered interrupts. This means that multiple ISA cards cannot normally share the same IRQ levels.

Table 2.4 Interrupt priorities on ISA systems

ISA interrupt priority sequence

Priority  IRQ line  Controller  Used for...
Highest   0         Master      System timer
          1         Master      Keyboard
          2         Master      (Unavailable - pass-through from slave)
          8         Slave       Real-time clock alarm
          9         Slave       (Available)
          10        Slave       (Available)
          11        Slave       (Available)
          12        Slave       (Available - usually the mouse)
          13        Slave       Error output of numeric coprocessor
          14        Slave       (Available - usually the hard disk)
          15        Slave       (Available)
          3         Master      2nd serial port
          4         Master      1st serial port
          5         Master      2nd parallel port
          6         Master      Floppy disk controller
Lowest    7         Master      1st parallel port

DMA capabilities. The standard implementation of ISA DMA uses a pair of Intel 8237 DMAC chips (or their functional equivalent). Each of these chips provides four independent DMA channels. When they're ganged together in a master-slave configuration, the first slave channel (number 4) serves as a pass-through and becomes unavailable. Table 2.5 describes the capabilities of these DMA channels.

When several DMA channels request the bus simultaneously, the DMAC chips use a software-selected arbitration scheme to resolve the conflict. The ISA BIOS POST code normally programs the DMACs for fixed-priority arbitration. This means that channel 0 always gets first crack at the bus, and channel 7 always goes last.

Also notice from Table 2.5 that the lower channels transfer individual bytes, while the upper ones move data only in words. Since the DMAC uses a 16-bit count register in both cases, the upper channels can transfer twice as much data in a single operation.

Table 2.5 DMA architecture on the ISA bus

ISA DMA channels

Channel  Controller  Transfers...    Max transfer
0-3      Master      Bytes only      64 kilobytes
4        Slave       (Unavailable)
5-7      Slave       Words only      128 kilobytes

One other significant item about DMA operations: The ISA bus has only 24 address lines. This means that DMACs can access only the first 16 megabytes of system memory. Any DMA buffers outside this range are unavailable. In Chapter 12 you'll see how NT deals with this complication.

Device memory. The 24 address lines on the ISA bus have an impact on device memory as well as DMA buffers. Any device-dedicated memory must live in the first 16 megabytes of physical address space. This applies to any onboard ROM as well.

Autoconfiguration. Unfortunately, the ISA specification says nothing about autoconfiguration. ISA devices don't identify themselves (either by manufacturer or device type), nor do they provide a resource list. Since ISA cards aren't required to have any software configuration registers, users normally have to configure the card with DIP switches and jumpers.
Sometimes it's possible to make educated guesses about the presence of a particular device by tickling various addresses in I/O space and listening for an appropriate giggle from a device. This is generally not a very reliable way to do things. Even if you do manage to locate a piece of hardware using this technique, you still don't know anything about its DMA or interrupt settings.

The proposed Plug and Play extensions to ISA are intended to correct such problems. Until these extensions become available, you'll have to use some of the cruder methods described in Chapter 7.

MCA - The Micro Channel Architecture

IBM developed the Micro Channel architecture as a replacement for the aging ISA bus. In a bold move, they dumped ISA altogether and proposed a vastly improved architecture. Progress isn't cheap, however, and the cost of adopting this new design was that all legacy ISA or IBM PC adapter cards would have to be trashed. Most people were unconvinced, and the MCA bus hasn't achieved great popularity among hardware vendors. [2] Figure 2.5 shows the organization of a typical MCA system.

Since they weren't constrained by the 8.33-MHz clock rate of the ISA bus, IBM was able to design a pretty snappy architecture. Although the original MCA implementation [3] only supported data transfer rates of 10 megabytes/sec, later versions of the bus specification incorporated a streaming data protocol that raised this number by a factor of 16. Table 2.6 summarizes the data rates available from the MCA bus.

[2] Political problems also contributed to the failure of MCA. IBM patented the architecture and tried to impose licensing conditions that many hardware vendors found objectionable.
[3] This was the 16-bit version used for the original IBM PS/2.

[Figure 2.5: Layout of a Micro Channel system. Recoverable labels: Serial & Parallel, VGA, Keyboard, MCA Slot 0, MCA Slot 7. Copyright © 1996 by Cydonix Corporation.]

Register access. An MCA bus can have at most eight card sockets, referred to as slots. [4] Each slot has an associated set of programmable option select (POS) registers that are used to configure the card. These POS registers replace the jumpers and DIP switches found on ISA devices. At the very least, an MCA card must implement a POS register that identifies the card.

Other than the POS registers (which are always at a fixed location), I/O space under MCA is just about as chaotic as it is on an ISA system. (The problem with ISA alias addresses doesn't occur, however.) At the option of the designer, MCA cards can have either fixed or programmable register addresses in I/O space. The only requirement is that if more than one of the same card will be plugged into an MCA bus, the card must have a 3-bit POS field for setting the card's base register address.

[4] Additional devices can live on the motherboard itself.

Table 2.6 MCA buses support a wide range of transfer speeds

MCA data transfer speed

Protocol   Data width  Transfer rate
Basic      16 bits     10 MB/sec
           32 bits     20 MB/sec
Streaming  16 bits     20 MB/sec
           32 bits     40 MB/sec
           64 bits     80 MB/sec
           64 bits     160 MB/sec

Interrupt mechanisms. The Micro Channel architecture supports 15 interrupt request levels. Their functions and relative priorities follow the same pattern used by the ISA bus (refer back to Table 2.4). The only improvement is that MCA cards use level-sensitive interrupt signals, thus allowing more than one device to share a single IRQ line.

DMA capabilities. The MCA bus was designed to be shared. The system board can support up to eight system DMA channels, and there's room on the bus for an additional seven bus masters. Six of the system DMA channels follow a fixed-priority arbitration scheme, while channels 0 and 4 have assignable priorities.
The seven bus masters also have assignable priorities, although they will always defer to the system DMA hardware.

Older implementations of the system DMAC were limited to 16-bit transfers (even though the bus itself has a 32-bit data path), and buffers had to fall in the first 16 megabytes of physical memory. (Bus master cards didn't have this limitation.) Proposed improvements to the MCA specification allowed for 32- and even 64-bit data transfers. [5] These changes also gave the system DMAC access to a full 4-gigabyte address range.

[5] The extra 32 bits came from multiplexing the address lines on the MCA bus.

Device memory. The MCA specification dictates that any device with onboard ROM must use 4 bits in one of its POS registers to select a starting address for the ROM. This gives card designers the option of mapping the ROM into any of 16 separate locations in physical memory. Since the MCA bus has 32 address lines, device memory can exist anywhere in a 4-gigabyte address space.

Autoconfiguration. MCA autoconfiguration involves the POS registers and a card-specific script called an adapter description file (ADF). Whenever an MCA system bootstraps, it checks each slot to see what's there. If it finds a previously configured card, it downloads configuration data from nonvolatile RAM (NVRAM) into the card's POS registers.

If something appears in a slot that had previously been empty, the bootstrap configuration program uses the card's POS ID register to generate the name of the device's ADF file. After prompting the user for the floppy containing the ADF, the configuration program selects resource assignments for the new card that don't conflict with the resources used by any existing cards. These assignments are copied into NVRAM.

Windows NT can recognize many kinds of MCA devices all by itself. If you need to touch MCA slots directly, you can use HalGetBusData and HalSetBusData to access them.

EISA - The Extended Industry Standard Architecture

The PC industry responded to IBM's Micro Channel architecture with the EISA bus. Most people simply weren't willing to throw away all their old hardware. The EISA bus reflects this sentiment by removing some of the ISA limitations while still allowing the use of legacy devices.

However, EISA's emphasis on compatibility limits the architecture in certain ways. For example, even though the bus supports 32-bit data transfers, the bus clock still runs at 8.33 MHz so the maximum transfer rate is only about 33 megabytes/sec. Also, since EISA sockets had to be able to accept ISA cards, it was impossible to fix some of the electrical noise problems caused by the layout of the ISA wiring. See Figure 2.6 for the layout of a typical EISA system.

Register access. Like MCA, the EISA bus contains a number of slots, each of which corresponds to one physical socket on the bus. As you can see from Table 2.7, each of the 15 EISA sockets has its own particular range of addresses in I/O space. Within the 4-kilobyte area assigned to a particular slot, four 256-byte ranges are guaranteed to be available to the card in that socket.

Interrupt mechanisms. EISA's interrupt capabilities are a superset of the ISA mechanisms. Although EISA interrupt controllers provide the same 15 levels available on the ISA bus (see Table 2.4), each IRQ line can be individually programmed for edge-triggered or level-sensitive behavior. This allows both ISA cards and EISA cards to coexist on the same bus.

[Figure 2.6: Layout of a typical EISA system. Recoverable labels: Memory, EISA Slot 15.]

[...]

Chapter 3: Kernel-Mode I/O Processing

Buffered I/O (BIO). Under this scheme, the I/O Manager allocates a buffer from nonpaged pool at the start of each I/O operation and passes the address of this buffer to the driver. The driver uses this buffer for any data transfer operations to or from the device.
For output requests, the I/O Manager copies the contents of the user's buffer into the system buffer before passing it to the driver. For input requests, the driver fills the system buffer with data from the device, and the I/O Manager copies it back into user space at the end of the operation.

There are two disadvantages to this technique. One is that all the memory-to-memory copying of data can slow things down, particularly for devices that transfer large amounts of data on a frequent basis. The other is that it can use up a lot of nonpaged pool. So, drivers should limit the use of buffered I/O to slow devices that don't transfer a lot of data at one time. For these reasons, you should never use Buffered I/O to perform transfers larger than one page of memory.

Direct I/O (DIO). This scheme avoids the need for copying user data by giving the driver direct access to the physical pages of memory where the user buffer lives. At the beginning of an I/O operation, the I/O Manager locks the entire user buffer into memory to prevent deadly page faults. It then builds a list that identifies the physical pages making up the user buffer. The driver uses this list to perform an I/O operation using the actual pages of the user's buffer. When the I/O operation is complete, the I/O Manager will unlock the pages.

You should use Direct I/O for high-speed devices that need to transfer large amounts of data at once, particularly devices that perform DMA. The mechanics of Direct I/O are described in Chapter 12.

3.5 STRUCTURE OF A KERNEL-MODE DRIVER

One of the biggest differences between a driver and an application program is the driver's control structure. Application programs run from beginning to end under the control of a main or WinMain function that determines the sequence in which various subroutines are called. A kernel-mode driver, on the other hand, has no main or WinMain function.
Instead, it's just a collection of subroutines that are called as needed by the I/O Manager. Depending on the driver, the I/O Manager might call a driver routine in any of the following situations:

• When a driver is being loaded
• When the driver is being unloaded or the system is shutting down
• When a user-mode program issues an I/O system service call
• When a shared hardware resource becomes available to the driver
• At various points during an actual device operation

The remainder of this section briefly describes the major categories of routines making up a kernel-mode driver.

Driver Initialization and Cleanup Routines

Before any driver can begin processing I/O requests, there are a number of initialization tasks it must perform. Likewise, drivers need to clean things up when they leave the system. There are several routines a driver can use to perform these operations.

DriverEntry routine. The I/O Manager calls this routine when it loads the driver, either at system boot time if the driver is loaded automatically, or later if you load the driver manually from the Control Panel. The DriverEntry routine performs a wide range of initialization functions, including setting up pointers to other driver routines, finding and allocating any hardware resources used by the driver, and making the name of the device visible to the rest of the system.

Reinitialize routine. Some drivers may not be able to complete their initialization during the DriverEntry routine. This could happen if the driver depended on some other driver that wasn't yet loaded, or if the driver needed to initialize itself during different phases of the system boot. These kinds of drivers can use Reinitialize routines to spread out their initialization functions over time.

Unload routine. The I/O Manager calls a driver's Unload routine when a driver is unloaded manually using the Control Panel.
The Unload routine is responsible for undoing everything that was done by the DriverEntry routine, including deallocating any hardware resources belonging to the driver and destroying any kernel objects that belong to the driver.

Shutdown routine: When the system goes through a user-initiated shutdown, the I/O Manager will call the Shutdown routines registered by any currently loaded drivers. The primary purpose of a Shutdown routine is to put the hardware into a known state. System resource cleanup is not as important here because the system is about to disappear anyway.

Bugcheck callback routine: If a driver needs to get control in the event of a system crash, it can register a Bugcheck callback routine with the Kernel. This mechanism gives the driver a chance to put its devices into a known state, and perhaps record some state information that will be helpful in debugging the crash.

I/O System Service Dispatch Routines

When the I/O Manager gets a request, it uses the function code of the request to call one of several Dispatch routines in the driver. The Dispatch routine verifies the request and may have the I/O Manager send it to the device for processing.

Open and close operations: All drivers must provide a Dispatch routine that handles Win32 CreateFile requests. Drivers that need to perform cleanup operations can supply a routine to handle CloseHandle calls, as well as separate Dispatch routines that perform special processing when the last handle on a shared device is closed.

Device operations: Depending on the device, a driver may have one or more Dispatch routines for handling actual data transfer and control operations. The I/O Manager calls these routines in response to Win32 ReadFile, WriteFile, and DeviceIoControl requests, or in response to an I/O request from a higher-level driver.
These routines perform any final verification of the request and then pass it to the driver's device management routines for actual processing.

Data Transfer Routines

Device operations involve a number of different driver routines, depending on the nature and complexity of the device.

Start I/O routine: The I/O Manager calls the driver's Start I/O routine when it's time to begin a device operation. This routine allocates any resources needed to process the request and sets the device in motion. The I/O Manager provides simplified support for half-duplex drivers that only need a single Start I/O routine. Drivers of full-duplex devices that have to manage simultaneous input and output requests need a somewhat more complex architecture.

Interrupt Service routine (ISR): The Kernel's interrupt dispatcher calls a driver's Interrupt Service routine whenever the driver's device generates an interrupt. The ISR is responsible for acknowledging the device, gathering any volatile state information needed by other parts of the driver, and asking the I/O Manager to execute a DPC routine.

DPC routine(s): A driver can have one or more DPC routines that clean up after a device operation. Depending on the driver, this can involve releasing various system resources, reporting errors, handing completed I/O requests back to the I/O Manager, and starting the next device operation if one is waiting. If you can do everything with a single DPC, the I/O Manager provides a simplified mechanism called a DpcForIsr routine. However, some drivers are easier to write and maintain if they have separate DPC routines for different kinds of processing. For example, drivers that perform full-duplex I/O might have one DPC routine that completes input operations, and another DPC routine for outputs. At your option, your driver can have any number of these CustomDpc routines.
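The division of labor between the ISR and the DPC routine described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: ModelDevice, model_isr, and model_run_dpcs are hypothetical names invented for illustration, nothing here is an NT DDK call, and real interrupt levels are not simulated. The point it shows is that the ISR does the minimum (check, capture volatile state, acknowledge, request deferred work), while completion happens later in the deferred routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device and its per-device state. */
typedef struct ModelDevice {
    int  status_register;    /* volatile hardware state, lost once acknowledged */
    bool interrupt_expected; /* set when an operation has been started */
    int  saved_status;       /* captured by the ISR for the DPC to examine */
    bool dpc_queued;         /* deferred work has been requested */
    int  completed;          /* requests completed by the DPC */
} ModelDevice;

/* ISR model: returns false if the interrupt was not expected. */
bool model_isr(ModelDevice *dev)
{
    assert(dev != NULL);
    if (!dev->interrupt_expected)
        return false;
    dev->saved_status = dev->status_register; /* gather volatile state */
    dev->status_register = 0;                 /* acknowledge the device */
    dev->interrupt_expected = false;
    dev->dpc_queued = true;                   /* ask for deferred processing */
    return true;
}

/* DPC dispatcher model: runs the deferred cleanup if one was requested. */
void model_run_dpcs(ModelDevice *dev)
{
    assert(dev != NULL);
    if (!dev->dpc_queued)
        return;
    dev->dpc_queued = false;
    dev->completed++;                         /* hand the request back */
}
```

In this model the request is not counted as completed until model_run_dpcs runs, which mirrors the idea that postprocessing is deferred out of the interrupt path.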
Resource Synchronization Callbacks

As an extension of the I/O Manager, a driver must be ready to run as needed at the request of more than one user-mode process. For example, it could be asked to send data to one device while waiting for a previous operation to complete on the same or another device. Since there's only one copy of the driver in memory, it has to handle any contention issues that might result from processing overlapping requests.

The I/O Manager makes it easier for drivers to handle these kinds of problems through the use of various synchronization callback routines. When a driver needs to access some shared resource, it queues a request for that resource. When the resource becomes available, the I/O Manager invokes a driver callback routine associated with the request. This has the effect of serializing access to the resource and avoiding collisions. There are three types of synchronization callback routines a driver might use.

ControllerControl routine: If a peripheral card supports multiple physical devices, it's important that only one hardware operation is being performed at a time. Before doing anything to the controller's registers, the Start I/O routine requests exclusive ownership of the controller. If ownership is granted, the ControllerControl callback routine executes; otherwise the ownership request waits until the current owner releases the controller.

AdapterControl routine: DMA hardware is another shared resource that must be passed around from driver to driver. Before doing any DMA operations, the driver requests ownership of the proper DMA hardware. If ownership is granted, the AdapterControl callback routine executes; otherwise the ownership request waits until the current owner releases the DMA hardware.

SynchCritSection routines: The parts of your driver that service device interrupts run at DIRQL while other pieces of driver code execute at or below DISPATCH_LEVEL.
If these low-IRQL sections of code need to touch any resources used by the Interrupt Service routine, they perform the operation inside a SynchCritSection routine. Resources in this category include all device control registers and any other context or state information shared with the Interrupt Service routine.

Other Driver Routines

In addition to the basic set of routines described above, your driver may contain some of the following additional functions.

Timer routines: Drivers that need to keep track of the passage of time during a device operation can do so using either an I/O Timer or a CustomTimerDpc routine. Chapter 10 describes both these mechanisms.

I/O Completion routines: Higher-level drivers may want to receive notification when a request they've sent to a lower-level driver has completed. This notification will come in the form of a call to the higher-level driver's I/O Completion routine. Chapter 15 discusses these routines in more detail.

Cancel I/O routines: Any driver that holds on to pending requests for a long time must attach a Cancel I/O routine to the request. If the request is canceled, the I/O Manager calls the Cancel I/O routine to perform any necessary cleanup operations. Chapter 11 describes the operation of these routines.

3.6 I/O PROCESSING SEQUENCE

When a user-mode thread requests an I/O operation, the request goes through several processing stages:
• Request preprocessing by NT and the I/O Manager
• Driver-specific preprocessing
• Device activation and interrupt servicing
• Driver-specific postprocessing
• Request postprocessing by the I/O Manager

The following sections describe these stages in more detail.

Request Preprocessing by NT

This phase takes care of all the device-independent setup and verification required by an I/O request.
1. The Win32 subsystem converts the request into a native NT system service call.
This triggers a change to kernel mode which is trapped by NT's system service dispatcher. Eventually, the call ends up inside the I/O Manager.
2. The I/O Manager allocates a data structure called an I/O Request Packet (IRP). Subsequent chapters will have a lot to say about IRPs, but for now, just think of them as work orders that describe what the driver is supposed to do. The I/O Manager fills in the IRP with various pieces of information including a function code indicating what operation the user requested.
3. The I/O Manager performs a number of validity checks on the arguments supplied by the caller. This involves verifying the file handle, checking access rights to the file object, making sure the device supports the requested function, and probing any input or output buffer addresses passed by the caller.
4. If this is a Buffered I/O operation, the I/O Manager allocates a nonpaged pool buffer, and for outputs, copies data from user space into the system buffer. If this is a Direct I/O operation, the user's buffer pages are faulted into memory and locked down, and the I/O Manager builds a list of the buffer's physical pages.
5. The I/O Manager calls one of the driver's Dispatch routines.

Request Preprocessing by the Driver

Each driver provides a dispatch table that controls the device-dependent preprocessing of I/O requests. The I/O Manager uses the function code of the requested operation as an index into this table and calls the corresponding driver Dispatch routine. These routines might perform any of the following operations:
• Do any device-dependent parameter validation. An example would be testing whether the size of the request falls within the range of any limitations imposed by the device itself.
• If the request is such that it can be handled without any device activity, the Dispatch routine completes the request and sends it back to the I/O Manager.
• If device operation is required, the Dispatch routine marks the request as pending and tells the I/O Manager to send it to the driver's Start I/O routine.

Data Transfer

Data transfers and other device operations are managed by the driver's Start I/O and Interrupt Service routines.

Start I/O: When a Dispatch routine tells the I/O Manager to start a device operation, the I/O Manager checks to see if the target device is currently busy. If it is, the request is queued to the device for later processing. Otherwise, the I/O Manager calls the driver's Start I/O routine. Depending on the device, the driver's Start I/O routine performs some or all of the following steps:
1. It checks the IRP function (read, write, device control, etc.) and performs any setup work specific to that type of operation.
2. If the device is attached to a multiunit controller, it asks for exclusive ownership of the controller hardware; the ControllerControl routine runs once ownership is granted.
3. If the operation is a DMA transfer, it asks for the DMA adapter resources; the AdapterControl routine runs once they have been allocated.
4. It uses a SynchCritSection routine to start the device.
5. It returns control to the I/O Manager and waits for a device interrupt.

ISR: When an interrupt occurs, the Kernel's interrupt dispatcher calls the driver's ISR. Depending on the device, the ISR performs some of the following steps:
1. It checks to see if the interrupt was expected.
2. It stops the device from interrupting.
3. If this is a programmed I/O operation and more data remains to be transferred, it moves the next chunk of data to or from the device and waits for the next interrupt.
4. If this is a DMA operation and more data remains to be transferred, it queues a DPC request to set up the DMA hardware for the next chunk of data.
5. If an error occurs or the data transfer is complete, it queues a DPC request to perform I/O postprocessing at a lower IRQL.
6. It dismisses the interrupt.
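Looking back at the Start I/O description in this section, the "start now if the device is idle, otherwise queue" decision can be sketched as a user-space model. The names below (ModelDevice, model_start_packet, model_start_next_packet) are hypothetical and only imitate the roles the text assigns to the I/O Manager; requests are plain integers rather than IRPs, and the fixed-size ring is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_MAX 8

/* Hypothetical model of a device with a queue of waiting requests. */
typedef struct ModelDevice {
    bool busy;              /* an operation is in progress */
    int  current;           /* request currently on the hardware */
    int  queue[QUEUE_MAX];  /* FIFO ring of waiting request ids */
    int  head;
    int  count;
    int  started;           /* number of times Start I/O has run */
} ModelDevice;

/* Stands in for the driver's Start I/O routine. */
static void start_io(ModelDevice *dev, int request)
{
    assert(!dev->busy);     /* only one operation at a time */
    dev->busy = true;
    dev->current = request;
    dev->started++;
}

/* Role of the Dispatch-time call: start at once if idle, else queue.
   Returns false only if the model's fixed-size queue is full. */
bool model_start_packet(ModelDevice *dev, int request)
{
    if (!dev->busy) {
        start_io(dev, request);
        return true;
    }
    if (dev->count == QUEUE_MAX)
        return false;
    dev->queue[(dev->head + dev->count) % QUEUE_MAX] = request;
    dev->count++;
    return true;
}

/* Role of the postprocessing call: finish the current request and
   start the next one if any is waiting. */
void model_start_next_packet(ModelDevice *dev)
{
    dev->busy = false;
    if (dev->count > 0) {
        int next = dev->queue[dev->head];
        dev->head = (dev->head + 1) % QUEUE_MAX;
        dev->count--;
        start_io(dev, next);
    }
}
```

Because the Start I/O stand-in only ever runs when the device is idle, requests for a single device are serialized without the routine itself having to take a lock in this model.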
Postprocessing by the Driver

The Kernel's DPC dispatcher eventually calls the driver's DPC routine to perform device-specific postprocessing operations, including some or all of the following:
1. If this is a DMA operation and more data remains to be transferred, it sets up the DMA hardware for the next piece of data, starts the device, and waits for an interrupt. It then returns to the I/O Manager without performing any of the following steps.
2. If there was an error or timeout, the DPC routine might record it in the system event log and either retry or abort the I/O request.
3. It releases any DMA and controller resources being held by the driver.
4. Next, the DPC routine puts the size of the transfer and final status information into the IRP.
5. Finally, it tells the I/O Manager to complete the current request and start the next one, if one is waiting in the queue for this device.

Postprocessing by the I/O Manager

Once the driver's DPC routine releases an IRP, the I/O Manager performs various device-independent cleanup operations. These include the following.
1. If this was a Buffered I/O output operation, the I/O Manager releases the nonpaged pool buffer used during the transfer.
2. If this was a Direct I/O operation, it unlocks the user's buffer pages.
3. It queues a request to the original thread for a kernel-mode asynchronous procedure call (APC). This APC will execute a piece of I/O Manager code in the context of the thread that issued the original I/O request.
4. When the kernel-mode APC runs, it copies status and transfer-size information back into user space.
5. If this was a buffered input, the APC routine copies the contents of the nonpaged pool buffer into the caller's user-space buffer. Then it frees the system buffer.
6. If the original request was for an overlapped operation, the APC routine sets the associated Event object into the signaled state.
7.
If the original request included a completion routine (for example, from a ReadFileEx or WriteFileEx call), the kernel-mode APC requests a user-mode APC to execute the completion routine.

3.7 SUMMARY

That completes our quick tour of NT and the I/O subsystem. At this point, you should have a good sense of how various driver routines interact with the I/O Manager. Later chapters will explain how to apply this understanding.

Keeping track of all the details involved in I/O processing obviously requires a lot of bookkeeping. In the next chapter, we'll take a look at the data structures used by the I/O Manager and your driver.

CHAPTER 4
Drivers and Kernel-Mode Objects

Data structures are the lifeblood of most operating systems, and Windows NT is no exception. What's interesting about NT is its use of object technology to manage all this data. After a quick look at NT's approach to objects, this chapter introduces the major structures involved in processing I/O requests. Later chapters will introduce additional data objects as they become necessary.

4.1 DATA OBJECTS AND WINDOWS NT

Just in case you've been living on Mars for the last decade, object-oriented programming (OOP) is one of the currently fashionable software design methodologies. In this scheme, data structures are viewed as black boxes (objects) whose contents are invisible, and any interaction with these data structures occurs through a limited set of access functions (methods). The goal is to improve the reliability and robustness of the resulting software by hiding implementation details from the users of an object, and by reducing unplanned dependencies between software modules.

Windows NT and OOP

Using a strict definition of OOP, the design of NT isn't truly object-oriented. Rather, you should think of it as being object-based, because it manages its internal data structures in an objectlike way. In particular, the Kernel and the
various Executive modules each define their own sets of data structures, along with a corresponding group of access functions. All other modules are expected to use those access functions to manipulate the contents of the structure. The data structures themselves are supposed to be opaque outside the module that defines them.

That's the idea anyway. When it comes to drivers, things get a little fuzzy since a driver is essentially a trusted add-on component of the I/O Manager. Because of this special status, a driver is allowed to touch some object fields directly but must use access functions for other operations on the object. So, I/O Manager objects available to a driver are partially opaque. Objects defined by other NT components are entirely opaque.

NT Objects and Win32 Objects

If you compare internal NT objects with the Win32 user-mode objects, you'll see a couple of differences. First, with a couple of exceptions, most of these NT objects have no externally visible names. This is because these objects aren't being exported to user mode and don't need to be managed by the Object Manager.

Second, you don't use handles to access internal NT objects. Instead, you use a pointer to the object body itself. In some cases, NT will create the object for you and give you the pointer. In other cases, you'll need to allocate and initialize storage for the object.

4.2 I/O REQUEST PACKETS (IRPs)

Almost all I/O is packet-driven under Windows NT. Each separate I/O transaction is described by a work order that tells the driver what to do and tracks the progress of the request through the I/O subsystem. These work orders take the form of a data structure called an I/O Request Packet (IRP), and this is how they're used.
1. The I/O Manager allocates an IRP from nonpaged system memory in response to an I/O request. Based on the I/O function specified by the user, it passes the IRP to the appropriate driver Dispatch routine.
2.
The Dispatch routine checks the parameters of the request, and if they're valid, passes the IRP to the driver's Start I/O routine.
3. The Start I/O routine uses the contents of the IRP to set up a device operation.
4. When the operation is complete, the driver's DpcForIsr routine stores a final status code in the IRP and sends it back to the I/O Manager.
5. The I/O Manager uses the information in the IRP to complete the request and send the user the final status.

This describes what happens when requests are being sent to a lowest-level driver. If the initial request is sent to a higher-level driver, things get a little more complex, and a single IRP may travel through several layers of drivers before the request is finished. Higher-level drivers can also create additional IRPs and send them to other drivers.

Layout of an IRP

An IRP is a variable-sized structure allocated from nonpaged pool. As you can see from Figure 4.1, an IRP has two sections:
• A header area containing general bookkeeping information
• One or more parameter blocks called I/O stack locations

IRP header: This area of the IRP holds various pieces of information about the overall I/O request. Some parts of the header are directly accessible to your driver, while other pieces are the exclusive property of the I/O Manager. Table 4.1 lists the fields in the header that your driver is allowed to touch.

The IoStatus member holds the final status of the I/O operation. When your driver is ready to complete the processing of an IRP, it sets the Status field of this block to a STATUS_XXX value. At the same time, your driver should set the Information field of the status block either to 0 (if there's an error) or to a function-code-specific value (for example, the number of bytes transferred).
Figure 4.1 The structure of an IRP

Table 4.1 Externally visible fields in an IRP header

Field                               Description
IO_STATUS_BLOCK IoStatus            Contains status of the I/O request
PVOID AssociatedIrp.SystemBuffer    Points to a system space buffer if device performs Buffered I/O
PMDL MdlAddress                     Points to a Memory Descriptor List for a user-space buffer if device performs Direct I/O
PVOID UserBuffer                    User-space address of I/O buffer
BOOLEAN Cancel                      Indicates the IRP has been canceled

The AssociatedIrp.SystemBuffer, MdlAddress, and UserBuffer fields play various roles in managing the driver's access to data buffers. Later chapters will explain how to use these fields when your driver performs either Buffered or Direct I/O.

I/O stack locations: The main purpose of an I/O stack location is to hold the function code and parameters for an I/O request. By examining the MajorFunction field of the stack location, a driver can decide what operation to perform and how to interpret the Parameters union. Table 4.2 describes some of the commonly used members of an I/O stack location.

For requests sent directly to a lowest-level driver, the corresponding IRP will have only one I/O stack location. For requests sent to a higher-level driver, the I/O Manager creates an IRP with separate I/O stack locations for each driver layer. Every driver in the hierarchy is allowed to touch only its own stack location, and if it's not at the bottom of the pile, to set up the stack location for the next driver beneath it.

When a driver passes an IRP to a lower-level driver, the I/O Manager automatically "pushes" the I/O stack pointer so that it points at the I/O stack location belonging to the lower driver. When the lower driver releases the IRP, the I/O stack pointer is "popped" so that it again points to the stack location of the higher driver. Chapter 15 will explain how to work with this mechanism.
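The push and pop behavior of the I/O stack pointer can be pictured with a small user-space model. This is a sketch, not DDK code: ModelIrp, ModelStackLocation, and the irp_* functions are hypothetical names, slot 0 is assumed to belong to the highest-level driver, and the real IRP layout and indexing may differ. It shows only the rule stated above: a driver touches its own slot, may set up the slot beneath it, and the current slot moves down on a call and back up on release.

```c
#include <assert.h>

#define MAX_LAYERS 4

/* Hypothetical, simplified I/O stack location. */
typedef struct ModelStackLocation {
    int major;   /* function code for this layer */
    int length;  /* example parameter */
} ModelStackLocation;

/* Hypothetical, simplified IRP: a header plus one slot per driver layer. */
typedef struct ModelIrp {
    int stack_count;                 /* number of driver layers */
    int current;                     /* index of the owning layer, -1 = none yet */
    ModelStackLocation stack[MAX_LAYERS];
} ModelIrp;

void irp_init(ModelIrp *irp, int layers)
{
    assert(layers > 0 && layers <= MAX_LAYERS);
    irp->stack_count = layers;
    irp->current = -1;
}

/* The slot owned by the driver that currently holds the IRP. */
ModelStackLocation *irp_current(ModelIrp *irp)
{
    assert(irp->current >= 0);
    return &irp->stack[irp->current];
}

/* The slot of the next lower driver, which the caller may set up. */
ModelStackLocation *irp_next(ModelIrp *irp)
{
    assert(irp->current + 1 < irp->stack_count);
    return &irp->stack[irp->current + 1];
}

/* "Push": passing the IRP down makes the lower driver's slot current. */
void irp_call_driver(ModelIrp *irp)
{
    assert(irp->current + 1 < irp->stack_count);
    irp->current++;
}

/* "Pop": when the lower driver releases the IRP, the higher slot is current again. */
void irp_release(ModelIrp *irp)
{
    assert(irp->current >= 0);
    irp->current--;
}
```

A higher-level driver in this model would fill in irp_next() and then call irp_call_driver(); after irp_release(), irp_current() again returns the higher driver's own, unmodified slot.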
Manipulating IRPs

IRP access functions fall into two general categories: those that operate on the IRP as a whole, and those that deal specifically with the IRP's I/O stack locations. The following subsections describe each of these groups.

IRPs as a whole: The I/O Manager exports a variety of functions that work with IRPs. Table 4.3 lists the most common ones. Later chapters will explain how to use them.

Table 4.2 Selected contents of an IRP stack location

IO_STACK_LOCATION, *PIO_STACK_LOCATION

Field                               Contents
UCHAR MajorFunction                 IRP_MJ_XXX function specifying the operation
UCHAR MinorFunction                 Used by file system and SCSI drivers
union Parameters                    Typed union keyed to MajorFunction code
  struct Read                       Parameters for IRP_MJ_READ
                                    • ULONG Length
                                    • ULONG Key
                                    • LARGE_INTEGER ByteOffset
  struct Write                      Parameters for IRP_MJ_WRITE
                                    • ULONG Length
                                    • ULONG Key
                                    • LARGE_INTEGER ByteOffset
  struct DeviceIoControl            Parameters for IRP_MJ_DEVICE_CONTROL and IRP_MJ_INTERNAL_DEVICE_CONTROL
                                    • ULONG OutputBufferLength
                                    • ULONG InputBufferLength
                                    • ULONG IoControlCode
                                    • PVOID Type3InputBuffer
  struct Others                     Available to driver
                                    • PVOID Argument1-Argument4
PDEVICE_OBJECT DeviceObject         Target device for this I/O request
PFILE_OBJECT FileObject             File object for this request, if any

Note: See NTDDK.H for additional members of the Parameters union.

Table 4.3 Functions that work with the whole IRP

Function             Description                           Called by...
IoStartPacket        Sends IRP to Start I/O routine        Dispatch
IoCompleteRequest    Indicates that all processing is done DpcForIsr
IoStartNextPacket    Sends next IRP to Start I/O           DpcForIsr
IoCallDriver*        Sends IRP to another driver           Dispatch
IoAllocateIrp*       Requests additional IRPs              Dispatch
IoFreeIrp*           Releases driver-allocated IRPs        I/O Completion

*These functions are used primarily by layered drivers.
IRP stack locations: The I/O Manager also provides several functions that drivers can use to access an IRP's stack locations. These functions are listed in Table 4.4.

Table 4.4 IO_STACK_LOCATION access functions

Function                      Description                                                                 Called by...
IoGetCurrentIrpStackLocation  Gets pointer to caller's stack slot                                         (Various)
IoMarkIrpPending              Marks caller's stack slot as needing further processing                     Dispatch
IoGetNextIrpStackLocation*    Gets pointer to stack slot for next lower driver                            Dispatch
IoSetNextIrpStackLocation*    Pushes the I/O stack pointer one location                                   Dispatch
IoSetCompletionRoutine*       Attaches I/O Completion routine to the next lower driver's I/O stack slot  Dispatch

*These functions are used primarily by layered drivers.

4.3 DRIVER OBJECTS

DriverEntry is the only driver routine with an exported name. When the I/O Manager needs to locate other driver functions, it uses the Driver object associated with a specific device. This object is basically a catalog that contains pointers to various driver functions. Here's how it works.
1. The I/O Manager creates a Driver object whenever it loads a driver. If the driver fails during initialization, the I/O Manager deletes the object.
2. During initialization, the DriverEntry routine loads pointers to other driver functions into the Driver object.
3. When an IRP is sent to a specific device, the I/O Manager uses the associated Driver object to find the right Dispatch routine.
4. If a request involves an actual device operation, the I/O Manager uses the Driver object to locate the driver's Start I/O routine.
5. If the driver is unloaded, the I/O Manager uses the Driver object to find an Unload routine. When the Unload routine is done, the I/O Manager deletes the Driver object.
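The "catalog of function pointers" idea in the steps above can be sketched as a user-space model. Everything named here (ModelDriverObject, ModelIrp, the MJ_* and ST_* constants, model_driver_entry, model_send_irp) is hypothetical and merely imitates the roles of DriverEntry and the I/O Manager; it is not the DDK's DRIVER_OBJECT, and rejecting an empty slot is a choice made for this model only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function codes and status values for the model. */
enum { MJ_CREATE, MJ_CLOSE, MJ_READ, MJ_WRITE, MJ_COUNT };
enum { ST_SUCCESS = 0, ST_INVALID_REQUEST = -1 };

typedef struct ModelIrp {
    int major;   /* requested operation */
    int status;  /* final status, set by the Dispatch routine */
} ModelIrp;

typedef int (*DispatchFn)(ModelIrp *irp);

/* The Driver object as a catalog of Dispatch routines. */
typedef struct ModelDriverObject {
    DispatchFn major_function[MJ_COUNT];
} ModelDriverObject;

static int dispatch_create(ModelIrp *irp)
{
    irp->status = ST_SUCCESS;
    return irp->status;
}

static int dispatch_read(ModelIrp *irp)
{
    irp->status = ST_SUCCESS;
    return irp->status;
}

/* Role of DriverEntry: load routine pointers into the Driver object. */
void model_driver_entry(ModelDriverObject *drv)
{
    for (int i = 0; i < MJ_COUNT; i++)
        drv->major_function[i] = NULL;
    drv->major_function[MJ_CREATE] = dispatch_create;
    drv->major_function[MJ_READ]   = dispatch_read;
}

/* Role of the I/O Manager: use the function code as an index into the table. */
int model_send_irp(const ModelDriverObject *drv, ModelIrp *irp)
{
    if (irp->major < 0 || irp->major >= MJ_COUNT ||
        drv->major_function[irp->major] == NULL) {
        irp->status = ST_INVALID_REQUEST;  /* no routine registered */
        return irp->status;
    }
    return drv->major_function[irp->major](irp);
}
```

The caller never names a Dispatch routine directly; it only supplies a function code, which is why DriverEntry can be the single exported entry point in this arrangement.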
Figure 4.2 Structure of a Driver object

Layout of a Driver Object

There is a unique Driver object for each driver currently loaded in the system. Figure 4.2 illustrates the structure of a Driver object. As you can see, the Driver object also contains a pointer to a linked list of devices serviced by this driver. A driver's Unload routine can use this list to locate any devices it needs to delete.

Unlike other objects, there are no access functions for modifying Driver objects. Instead, the DriverEntry routine sets various fields directly. Table 4.5 lists the fields you're allowed to touch.

Table 4.5 Externally visible fields of a Driver object

Field                                Description
PDRIVER_STARTIO DriverStartIo        Address of driver's Start I/O routine
PDRIVER_UNLOAD DriverUnload          Address of driver's Unload routine
PDRIVER_DISPATCH MajorFunction[ ]    Table of driver's Dispatch routines, indexed by I/O operation code
PDEVICE_OBJECT DeviceObject          Linked list of Device objects created by this driver

4.4 DEVICE OBJECTS AND DEVICE EXTENSIONS

Both the I/O Manager and the driver need to know what's going on with an I/O device at all times. Device objects make this possible by keeping information about a device's characteristics and state. There is one Device object for each virtual, logical, and physical device on the system. Here's how they're used.
1. The DriverEntry routine creates a Device object for each of its devices.
2. The I/O Manager uses a pointer in the Device object to locate the corresponding Driver object. There it can find driver routines to operate on I/O requests. It also maintains a queue of current and pending IRPs attached to the Device object.
3.
Various driver routines use the Device object to locate the corresponding Device Extension. As an I/O request is processed, the driver uses the Extension to store any device-specific state information.
4. The driver's Unload routine deletes the Device object when the driver is unloaded.

Physical device drivers aren't the only ones who use these objects. Chapter 15 describes the way higher-level drivers use Device objects.

Layout of a Device Object

Figure 4.3 illustrates the structure of a Device object and its relation to other structures.

Figure 4.3 Structure of a Device object

Although the Device object contains a lot of data, much of it is the exclusive property of the I/O Manager. Your driver should limit its access to only those fields listed in Table 4.6.

Table 4.6 Externally visible fields of a Device object

Field                            Description
PVOID DeviceExtension            Points to Device Extension structure
PDRIVER_OBJECT DriverObject      Points to Driver object for this device
ULONG Flags                      Specifies buffering strategy for device
                                 • DO_BUFFERED_IO
                                 • DO_DIRECT_IO
PDEVICE_OBJECT NextDevice        Points to next device belonging to this driver
CCHAR StackSize                  Minimum number of I/O stack locations needed by IRPs sent to this device
ULONG AlignmentRequirement       Memory alignment required for buffers

Manipulating Device Objects

Table 4.7 lists many of the I/O Manager functions that operate on Device objects. The I/O Manager also passes a Device object pointer as an argument to most of the routines in your driver.

Table 4.7 Access functions for a Device object

Function                     Description                                      Called by...
IoCreateDevice               Creates a Device object                          DriverEntry
IoCreateSymbolicLink         Makes Device object visible to Win32             DriverEntry
IoAttachDevice*              Attaches a filter to a Device object             DriverEntry
IoAttachDeviceByPointer*     Attaches a filter to a Device object             DriverEntry
IoGetDeviceObjectPointer*    Layers one driver on top of another              DriverEntry
IoCallDriver*                Sends request to another driver                  Dispatch
IoDetachDevice*              Disconnects from a lower driver                  Unload
IoDeleteSymbolicLink         Removes Device object from the Win32 namespace   Unload
IoDeleteDevice               Removes Device object from system                Unload

*These functions are used primarily by layered drivers.

Device Extensions

Connected to the Device object is another important data structure, the Device Extension. The Extension is simply a block of nonpaged pool that the I/O Manager automatically attaches to any Device object you create. You choose both the size and the contents of the Device Extension. Typically, you use it to hold any information associated with a particular device.

Drivers have to be fully reentrant, so global or static variables are a very bad idea. Any information that you might be tempted to keep in global or static storage probably belongs in the Device Extension. Other things you might want to store in the Extension include
• A back pointer to the Device object
• Any device state or driver context information
• A pointer to an Interrupt object and an interrupt-expected flag
• A pointer to a Controller object
• A pointer to an Adapter object and a count of mapping registers

Since the Device Extension is driver-specific, you'll need to define its structure in one of your driver's header files.
Although the Extension's exact contents will depend on what your driver does, its general layout will look something like this:

    typedef struct _DEVICE_EXTENSION {
        PDEVICE_OBJECT DeviceObject;
        // Other driver-specific declarations
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

In later chapters of this book, you'll see a great many uses for the Device Extension.

4.5 CONTROLLER OBJECTS AND CONTROLLER EXTENSIONS

Some peripheral adapters manage more than one physical device using the same set of control registers. The floppy disk controller is one example of this architecture. This kind of hardware causes the following synchronization problem: If the driver tries to perform simultaneous operations on more than one of the connected devices without first synchronizing its access to the shared register space, the control registers may get trashed.

To help with this problem, the I/O Manager provides Controller objects. The Controller object is a kind of token that can be owned by only one device at a time. Before accessing any device registers, the driver asks that ownership of the Controller object be given to a specific device. If the hardware is free, ownership is granted. If not, the device's request is put on hold until the current owner releases the hardware. By passing the Controller object around this way, the I/O Manager guarantees that multiple devices will access the hardware in an orderly manner. Here's a little more detail about how Controller objects are used.
1. The DriverEntry routine creates the Controller object and usually stores its address in a field of each device's Device Extension.
2. Before it starts a device operation, the Start I/O routine asks for exclusive ownership of the Controller object on behalf of a specific device.
3. When the Controller object becomes available, the I/O Manager grants ownership and calls the driver's ControllerControl routine.
This routine sets up the device's registers and starts the I/O operation. As long as this device owns the Controller object, any further requests for ownership will block at step 2 until the object is released. 4. When the device operation is finished, the driver's DpcForlsr routine releases the Controller object, making it available for use by other pending requests. 5. The driver's Unload routine deletes the Controller object when the driver is unloaded. Obviously, not all drivers need a Controller object. If your interface card supports only one physical or virtual device, or if multiple devices on the same card don't share any control registers then you can ignore Controller objects. Layout of a Controller Object Figure 4.4 shows the relationship of a Controller object to other system data structures. The only externally visible field in a Controller object is the PVOID Control lerExtension field, which contains a pointer to the extension block. Manipulating Controller Objects The I/O Manager exports four functions that operate on Controller objects. These functions are listed in Table 4.8. Controller Extensions Like Device objects, Controller objects contain a pointer to an Extension structure that you can use to hold any controller-specific data. The Extension is also a place to store any information that's global to all the devices attached to a controller. Finally, if the controller (rather than individual devices) is the source of Sec. 4.5 Controller Objects and Controller Extensions �··· Device Object Device Exten sion 73 @ Device Object @· ControllerExtension Copyright © 1 994 by Cydonix Corporation. 940036a. vsd Figure 4.4 Structure of a Controller object interrupts, it makes sense to store pointers to Interrupt and Adapter objects in the Controller Extension. Since the Controller Extension is driver-specific, you'll need to define its structure in one of your driver's header files. 
Although the Extension's exact contents will depend on what your driver does, its general layout will look something like this:

    typedef struct _CONTROLLER_EXTENSION {
        PCONTROLLER_OBJECT ControllerObject;
        // Other driver-specific declarations
    } CONTROLLER_EXTENSION, *PCONTROLLER_EXTENSION;

Table 4.8 Access functions for a Controller object

Controller object access functions
Function               Description                                   Called by...
IoCreateController     Creates a Controller object                   DriverEntry
IoAllocateController   Requests exclusive ownership of controller    Start I/O
IoFreeController       Releases ownership of controller              DpcForIsr
IoDeleteController     Removes Controller object from the system     Unload

4.6 ADAPTER OBJECTS

Just as multiple devices on the same controller need to coordinate their hardware access, so devices that perform DMA need an orderly way to share system DMA resources. The I/O Manager uses Adapter objects to prevent arguments over DMA hardware. There is one Adapter object for each DMA data transfer channel on the system.

Like a Controller object, an Adapter object can be owned by only one device at a time. Before starting a DMA transfer, the Start I/O routine asks for ownership of the Adapter object. If the hardware is free, ownership is granted. If not, the device's request is put on hold until the current owner releases the hardware. Obviously, if your device supports only programmed I/O, you don't need to bother with Adapter objects.

Here's how Adapter objects work.

1. The HAL creates Adapter objects for any DMA data channels detected at bootstrap time.
2. The DriverEntry routine locates the Adapter object for its device and stores a pointer in the Device or Controller Extension. Adapter objects for unrecognized DMA hardware may be created on the fly at this point.
3. The Start I/O routine requests ownership of the Adapter object on behalf of a specific device.
4. When ownership is granted, the I/O Manager calls the driver's Adapter Control routine. This routine then uses the Adapter object to set up a DMA transfer.
5. The driver's DpcForIsr routine may use the Adapter object to perform additional operations in the case of a split transfer. When a transfer is finished, DpcForIsr releases the Adapter object.

Another important function of the Adapter object is to manage some things called mapping registers. The HAL uses these registers to map the scattered physical pages of a user's buffer onto the contiguous range of addresses required by most DMA hardware. If that statement doesn't make any sense to you, don't worry. We'll be looking at the mechanics of DMA transfers in much greater detail in Chapter 12.

Layout of an Adapter Object

Figure 4.5 illustrates the relationship of Adapter objects to other structures. As you can see from the diagram, the Adapter object is completely opaque and has no externally visible fields. If you're working with DMA devices, you should store the pointer to your Adapter object, as well as the number of mapping registers it supports, in either a Device or a Controller Extension.

[Figure 4.5 Structure of an Adapter object (fields shown: AdapterPtr, MapRegCount). Copyright © 1994 by Cydonix Corporation.]

Manipulating Adapter Objects

Both the HAL and the I/O Manager export functions that you can use to manipulate Adapter objects. Table 4.9 lists the ones you're most likely to encounter.

Table 4.9 Access functions for an Adapter object

Adapter object access functions
Function                   Description                                    Called by...
HalGetAdapter              Gets a pointer to an Adapter object            DriverEntry
IoAllocateAdapterChannel   Requests exclusive ownership of DMA hardware   StartIo (Controller Control)
IoMapTransfer              Sets up DMA hardware for a data transfer       Adapter Control / DpcForIsr
IoFlushAdapterBuffers      Flushes data after partial transfers           DpcForIsr
IoFreeMapRegisters         Releases map registers                         DpcForIsr
IoFreeAdapterChannel       Releases Adapter object                        DpcForIsr

4.7 INTERRUPT OBJECTS

That brings us to the last of the NT objects we'll be looking at in this chapter, the Interrupt object. Interrupt objects simply give the Kernel's interrupt dispatcher a way to find the right service routine when an interrupt occurs. Here's how Interrupt objects are used.

1. The DriverEntry routine creates an Interrupt object for each interrupt vector supported by the device or the Controller.
2. When an interrupt occurs, the Kernel's interrupt dispatcher uses the Interrupt object to locate the Interrupt Service routine.
3. The Unload routine deletes the Interrupt object after disabling interrupts from the device.

Other than creating and deleting them, your driver has very little direct interaction with Interrupt objects. You will, however, need to store a pointer to the Interrupt object in a convenient place like the Device or Controller Extension.

Layout of an Interrupt Object

Figure 4.6 illustrates the structure of an Interrupt object. Like Adapter objects, Interrupt objects are completely opaque and have no externally visible fields.

[Figure 4.6 Structure of an Interrupt object (field shown: InterruptPtr). Copyright © 1994 by Cydonix Corporation.]

Manipulating Interrupt Objects

Several system components export functions that work with Interrupt objects. Table 4.10 lists the most common ones.

Table 4.10 Access functions for an Interrupt object

Interrupt object access functions
Function                 Description                                                           Called by...
HalGetInterruptVector    Converts bus-relative interrupt vector to systemwide value            DriverEntry
IoConnectInterrupt       Associates Interrupt Service routine with a system interrupt vector   DriverEntry
KeSynchronizeExecution   Synchronizes driver routines that run at different IRQLs              (Various)
IoDisconnectInterrupt    Removes Interrupt object                                              Unload

4.8 SUMMARY

Although it may seem as if there are a lot of objects involved in I/O processing, they're all necessary and important. If you're feeling a little overwhelmed with all this background material, you can relax. The next chapter will show you how to put this information to work as we start writing some actual driver code.

CHAPTER 5
General Development Issues

Writing kernel-mode code is not the same as writing an application program. Because your driver is a trusted component of the system, you have to be much more careful about how you behave. This chapter is a short manual of good etiquette for driver writers.

5.1 DRIVER DESIGN STRATEGIES

Like most other kinds of software, drivers benefit from an organized approach to development. This section gives some guidelines that may help shorten development time.

Use Formal Design Methods

There's a certain cowboy mentality that pervades the driver-writing world. For some reason, it's easy to think that you can just sit down, scribble a flowchart on an old candy wrapper, and start coding. Unfortunately, when you're dealing with a full-duplex driver for some asynchronous communication device, such ad hoc methods just don't work. So many things are going on that it becomes impossible to verify the flow of control.

A better approach is to use techniques that have proven helpful in other areas of real-time design. Some suggestions follow.
• Data flow diagrams can help you break your driver into discrete functional units. These diagrams make it easier to visualize how the functional units in your driver relate to each other, and how they transform input data into output data.
• State-machine models are another good way to describe the flow of control in a driver, especially one that manages an elaborate hardware or software protocol. In the process of verifying the state machine, you can also ferret out synchronization issues within the driver.
• An analysis of expected data repetition rates or mandatory input-to-output response will give you a set of quantitative timing requirements. These are important when it comes time to tune the driver.
• Another useful tool is an explicit list of external events and the driver actions these events should trigger. This list ought to include both hardware events from the device and I/O requests from users.

Using these techniques will help you to separate your driver into well-defined functional units, which makes the driver easier to develop. In some cases, this might even mean breaking a single driver into a pair of port and class drivers that handle hardware-dependent and hardware-independent functions. In any event, the time you spend analyzing and designing your driver at the start of the project will more than pay for itself in reduced debugging and maintenance.

Use Incremental Development

Once you've completed your initial analysis and design, it's time to start the actual development. Following the steps below can reduce your debugging time by helping you to detect problems while they're still easy to find.

1. Decide which kinds of kernel-mode objects your driver will need.
2. Decide on any additional context or state information your driver will need, and decide where you're going to store it.
3. Write the DriverEntry and Unload routines. To test the driver at this point, see if you can load and unload it using the Control Panel.
4. Add code that finds and allocates the driver's hardware, as well as code to deallocate the hardware when the driver unloads. Again, the test is just whether you can load and unload the driver using the Control Panel. You can also use the Registry editor (REGEDT32) to see whether your driver is allocating and deallocating its resources properly.
5. Add driver Dispatch routines that process IRP_MJ_CREATE, IRP_MJ_CLOSE, and any other operations that don't require device access. You can test the driver with a simple Win32 program that calls CreateFile and CloseHandle.
6. Add Dispatch routines that process any other IRP_MJ_XXX function codes. Also, add the Start I/O logic but complete each I/O request without starting the device. Test these new code paths with a simple Win32 program that makes ReadFile and WriteFile calls, as appropriate.
7. Finally, implement the real Start I/O logic, the Interrupt Service routine, and the DPC routine. Now you can test the driver using live data.

Another tip: If you're unsure about the exact behavior of the hardware, add a DeviceIoControl function that gives you direct access to the device registers. This will allow you to find out how the device really works by writing a few simple Win32 programs. Just remember to disable this function when you ship the final version of the driver.

Use the Sample Drivers

The Windows NT device driver kit (DDK) contains a huge body of sample code in the \DDK\SRC directory tree. There are many ways you can use all this code to make driver development easier. At the very least, you should read it for hints, clues, and comments. You might also want to be more direct about cutting and pasting helpful chunks of code (a procedure encouraged by Microsoft). The usual warning: If you do decide to cut and paste, make sure you thoroughly understand the code you're grabbing.
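
If the state-machine and event-list suggestions from earlier in this section seem abstract, here is a minimal sketch of how the two can be combined into a transition table before any real driver code exists. The states, the events, and the XxNextState helper are all hypothetical; they describe an imaginary device and are not taken from the DDK. The sketch is plain C, so you can compile and exercise it in user mode.

```c
#include <assert.h>

/* Hypothetical states and events for an imaginary device. None of
   these names come from the DDK; they exist only for illustration. */
typedef enum { ST_IDLE, ST_BUSY, ST_TIMED_OUT, ST_COUNT } XxState;
typedef enum {
    EV_START_REQUEST,   /* an I/O request arrives from a user      */
    EV_INTERRUPT,       /* the device signals completion           */
    EV_TIMEOUT,         /* the device failed to respond in time    */
    EV_RESET,           /* the driver reinitializes the device     */
    EV_COUNT
} XxEvent;

/* One row per state, one column per event. Each entry names the next
   state. Events that make no sense in a state leave it unchanged. */
static const XxState XxTransition[ST_COUNT][EV_COUNT] = {
    /*                 START_REQUEST  INTERRUPT     TIMEOUT       RESET   */
    /* ST_IDLE      */ { ST_BUSY,      ST_IDLE,      ST_IDLE,      ST_IDLE },
    /* ST_BUSY      */ { ST_BUSY,      ST_IDLE,      ST_TIMED_OUT, ST_IDLE },
    /* ST_TIMED_OUT */ { ST_TIMED_OUT, ST_TIMED_OUT, ST_TIMED_OUT, ST_IDLE },
};

/* Look up the state that follows 'current' when 'event' occurs. */
static XxState XxNextState(XxState current, XxEvent event)
{
    assert(current < ST_COUNT && event < EV_COUNT);
    return XxTransition[current][event];
}
```

Writing the table out this way forces you to decide what every event means in every state, which is where many of the synchronization questions first show up.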
5.2 CODING CONVENTIONS AND TECHNIQUES

Writing a trusted kernel-mode component is not the same as writing an application program. This section presents some basic conventions and techniques that will make it easier to code in this environment.

General Recommendations

First of all, here are some general recommendations for things you should keep in mind when you're writing a driver:

• Avoid the use of assembly language in your driver. It makes the code hard to read, nonportable, and difficult to maintain. In those rare situations where it's unavoidable, isolate the code in its own module. Whatever you do, don't go sprinkling inline assembly throughout your driver.
• If you have any platform-specific code, either put it in its own module, or at the very least bracket it with #ifdef/#endif statements.
• Don't link your driver with the standard C runtime library. Some of those routines may hold state or context information in ways that are not driver safe. Instead use the RtlXxx support routines supplied for drivers.
• Commenting code is a religious issue. Some people swear by it; others think that out-of-date comments are worse than no comments at all.¹
• Manage your driver project with some kind of source-code control program. This is especially important for larger drivers, or drivers being developed by several people.

Naming Conventions

It's a good idea to adopt some standard naming convention for the routines in your driver. This makes it easier to debug and test the driver during its initial development. It also simplifies maintenance of the driver should you have to reacquaint yourself with the code after being away from it for a year. Microsoft recommends the following:

• Add a driver-specific prefix to each of your routines. Choose one prefix for standard driver routines and another, shorter prefix for any internal functions.
• Give the routine itself a name that describes what it does.
For example, the mouse class driver supplied with the NT DDK adds the prefix MouseClass to all its standard routines, which gives names like MouseClassStartIo and MouseClassUnload. The same class driver uses the prefix Mou for any internal routines like MouConfiguration and MouConnectToPort.

Regardless of whether you follow these conventions or come up with some of your own, it's important that you establish some consistent way of naming your driver routines. When you come back to a driver that you haven't looked at for six months, uniform naming will make it easier to figure out what you originally had in mind.

Header Files

NTDDK.H defines all the data types, structures, and constants used by base-level kernel-mode drivers. SCSI, network, and video drivers use other header files. Be sure you've included the appropriate headers in your driver.

You can use private header files to hide various hardware and platform dependencies. For example, on 80x86 systems you can address each byte in I/O space, but on other architectures, I/O registers may need to be aligned on 4-byte or 8-byte boundaries. Hiding these differences in a header file means you can move your driver to a new platform just by redefining some symbols and rebuilding the driver.

¹ Personally, I attend services at the Church of the Detailed Comment.

Even if your driver doesn't face any of these issues, writing a few register access macros can make the driver itself easier to read. The following code fragment is an example of some hardware beautification macros for a parallel port device. This example assumes that some initialization code in the driver has put the address of the first device register in the PortBase field of the Device Extension.

    //
    // Define device registers as relative offsets
    //
    #define PAR_DATA        0
    #define PAR_STATUS      1
    #define PAR_CONTROL     2

    //
    // Define access macros for registers. Each macro takes
    // a pointer to a Device Extension as an argument
    //
    #define ParWriteData( pDevExt, bData )              \
        (WRITE_PORT_UCHAR(                              \
            (pDevExt)->PortBase + PAR_DATA, (bData) ))

    #define ParReadStatus( pDevExt )                    \
        (READ_PORT_UCHAR(                               \
            (pDevExt)->PortBase + PAR_STATUS ))

    #define ParWriteControl( pDevExt, bData )           \
        (WRITE_PORT_UCHAR(                              \
            (pDevExt)->PortBase + PAR_CONTROL, (bData) ))

Status Return Values

The kernel-mode portions of the NT operating system use 32-bit status values to describe the outcome of any particular operation. The data type of these codes is NTSTATUS. There are three situations in which you'll need to use these status codes:

• When you call one of the internal NT functions, it will communicate its displeasure at something you're trying to do by returning an NTSTATUS value.
• When NT calls some driver-specific callback routines, the routines often have to return an NTSTATUS value to the system.
• When you complete the processing of an I/O request, you need to mark it with an NTSTATUS value. This value will ultimately be mapped onto a Win32 ERROR_XXX code.²

NTSTATUS.H defines symbolic names for a large number of NTSTATUS values. These names all have the form STATUS_XXX, where XXX describes the actual status message. STATUS_SUCCESS, STATUS_NAME_EXISTS, and STATUS_INSUFFICIENT_RESOURCES are all examples of these names.

When you call a system routine that returns an NTSTATUS value, you can either check for specific values, or you can use the NT_SUCCESS macro to test for general success or failure. The following code fragment illustrates this technique.

    NTSTATUS status;

    status = IoCreateDevice( ... );
    if( !NT_SUCCESS( status ))
    {
        // clean up and exit with failure
    }

Always, always, always check the return values you get from any system routines you call. If you just assume that the call succeeded, your driver may damage the system somewhere down the line.
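
As a slightly fuller sketch of the same habit, the fragment below checks the status of each step in a two-step initialization and undoes the first step if the second fails. It's written as plain user-mode C so it can be compiled anywhere: the NTSTATUS typedef, the two status values, and NT_SUCCESS are stand-ins that mirror the DDK definitions, while XxCreateResource, XxDeleteResource, and XxInitialize are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* User-mode stand-ins so the sketch builds outside the kernel.
   In a real driver these definitions come from the DDK headers. */
typedef int32_t NTSTATUS;
#define STATUS_SUCCESS                 ((NTSTATUS)0x00000000L)
#define STATUS_INSUFFICIENT_RESOURCES  ((NTSTATUS)0xC000009AL)
#define NT_SUCCESS(Status)             (((NTSTATUS)(Status)) >= 0)

/* Hypothetical resource-creation step. 'live' counts the resources
   currently held; 'should_fail' forces the failure path. */
static NTSTATUS XxCreateResource(int *live, int should_fail)
{
    assert(live != 0);
    if (should_fail)
        return STATUS_INSUFFICIENT_RESOURCES;
    (*live)++;
    return STATUS_SUCCESS;
}

/* Hypothetical cleanup step that releases one resource. */
static void XxDeleteResource(int *live)
{
    (*live)--;
}

/* Two-step initialization. Every status is checked, and step 1 is
   undone if step 2 fails, so nothing is left behind on an error. */
static NTSTATUS XxInitialize(int *live, int fail_first, int fail_second)
{
    NTSTATUS status;

    status = XxCreateResource(live, fail_first);
    if (!NT_SUCCESS(status))
        return status;                 /* nothing to clean up yet */

    status = XxCreateResource(live, fail_second);
    if (!NT_SUCCESS(status)) {
        XxDeleteResource(live);        /* undo step 1 */
        return status;
    }
    return STATUS_SUCCESS;
}
```

The point of the shape is that each failure path releases exactly what the earlier steps acquired and passes the original status back to the caller unchanged.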
If you're lucky, an unchecked failure will crash the system and draw attention to itself; if not, it may just produce sporadic, hard-to-find errors.

² NTSTATUS codes and Win32 error codes are not the same thing. The knowledge base that comes with the NT DDK has an article that shows the mapping between NTSTATUS values and their corresponding Win32 ERROR_XXX codes. It's worth taking a look at this article because the mappings from STATUS_XXX to ERROR_XXX codes don't always make a lot of sense.

NT Driver Support Routines

The I/O Manager and other kernel-mode components of NT export a large number of support functions that your driver can call. The reference section of the NT DDK documentation describes these functions, and you'll see plenty of examples of their use throughout this book. For the moment, it's enough to point out that these support routines fall into categories based on the NT module that exports them. Table 5.1 gives a brief overview of the kinds of support that each NT module provides.

Table 5.1 Categories of support routines available to drivers

NT driver support routines
Category           Supports...                      Function names
Executive          Memory allocation                ExXxx()
                   Interlocked queues
                   Zones
                   Lookaside lists
                   System worker threads
HAL                Device register access           HalXxx()
                   Bus access
I/O Manager        General driver support           IoXxx()
Kernel             Synchronization                  KeXxx()
                   DPC
Memory Manager     Virtual-to-physical mapping      MmXxx()
                   Memory allocation
Object Manager     Handle management                ObXxx()
Process Manager    System thread management         PsXxx()
Runtime library    String manipulation              RtlXxx() (mostly)
                   Large integer arithmetic
                   Registry access
                   Security functions
                   Time and date functions
                   Queue and list support
Security Monitor   Privilege checking               SeXxx()
                   Security descriptor functions
(All)              Internal system services         ZwXxx()

The ZwXxx functions need a little explanation. These are actually an internal calling interface for all the NtXxx user-mode system services. The difference between the user- and kernel-mode interfaces is that the ZwXxx functions don't perform any argument checking. Although there are a large number of these functions, the NT DDK reference material describes only a few of them. Microsoft may eventually tell us about the rest, but for now, limit yourself to using the ones that show up in the documentation.

One final point. To make life easier for driver writers, the I/O Manager provides several convenience functions that are really just wrappers around one or more lower-level calls to other NT modules. These wrappers usually offer a simpler interface than their low-level counterparts, and you should use them whenever you can.

Discarding Initialization Routines

Some compilers support the option of declaring certain functions as discardable. Functions in this category will disappear from memory after your driver has finished loading, making your driver smaller. If your development environment offers this feature, you should use it.

Good candidates for discardable functions are DriverEntry and any subroutines called only by DriverEntry. The following code fragment shows how to take advantage of discardable code.

    #ifdef ALLOC_PRAGMA
    #pragma alloc_text( init, DriverEntry )
    #pragma alloc_text( init, XxStuffCalledByDriverEntry )
    #pragma alloc_text( init, XxAlsoCalledByDriverEntry )
    #endif

The alloc_text pragma must appear after the function name is declared, but before the function itself is defined, so remember to prototype the function at the top of the code module (or in a suitable header file). Also, functions referenced in the pragma statement must be defined in the same compilation unit as the pragma. If you don't follow these rules, things break.

Controlling Driver Paging

Nonpaged system memory is a precious resource. You can further reduce the burden your driver puts on nonpaged memory by putting appropriate routines in paged memory. Any function that executes only at PASSIVE_LEVEL IRQL can be paged. This includes Reinitialize routines, Unload and Shutdown routines, Dispatch routines, thread functions, and any helper functions running exclusively at PASSIVE_LEVEL IRQL. Once again, it's the alloc_text pragma that performs this little miracle. Here's an example:

    #ifdef ALLOC_PRAGMA
    #pragma alloc_text( page, XxUnload )
    #pragma alloc_text( page, XxShutdown )
    #pragma alloc_text( page, XxDispatchRead )
    #pragma alloc_text( page, XxDispatchHelper )
    #endif

Finally, there's another trick you can play if you have a seldom-used device driver and you want to get it out of the way. By calling the MmPageEntireDriver function, you can override a driver's declared memory management attributes and make the whole thing temporarily paged. Call this function at the end of the DriverEntry routine and from the Dispatch routine for IRP_MJ_CLOSE when there are no more open handles to any of your devices. Call MmResetDriverPaging from the IRP_MJ_CREATE Dispatch routine to make the driver's page attributes revert to normal.

If you use this technique, watch out for two things.
First, make sure there aren't any IRPs being processed by high-IRQL portions of the driver when you make everything paged. Second, be certain that no device interrupts will arrive while the driver's ISR is paged. Handling these details is left as an exercise for the reader.

5.3 DRIVER MEMORY ALLOCATION

Just like application programs, drivers may need to allocate temporary storage from time to time. Unfortunately, drivers don't have the luxury of making simple calls to malloc and free. Instead, they have to be extremely careful about what kind of memory they allocate and how much of it they use. Drivers must also be sure to release any memory they may be holding, since there's no automatic cleanup mechanism for kernel-mode code. This section describes techniques your driver can use to work with temporary storage.

Memory Available to Drivers

You have three options when you need to allocate temporary storage in a driver. Which one you select will depend on how long you plan to keep the data around and what IRQL your code is running at. You can choose from the following:

• Kernel stack - The kernel stack provides limited amounts of nonpaged storage for local variables during the execution of specific driver routines.
• Paged pool - Driver routines running below DISPATCH_LEVEL IRQL can use a heap area called paged pool. As its name implies, memory in this area is pageable, and a page fault can occur when you touch it.
• Nonpaged pool - Driver routines running at elevated IRQLs need to allocate temporary storage from another heap area called nonpaged pool. The system guarantees that the virtual memory in nonpaged pool is always physically resident. The Device and Controller Extensions created by the I/O Manager come from this pool area.

Global variables are absent from this list because they introduce major synchronization problems.
The problem is that everyone using a given driver is sharing the same copy of the driver's code and global data. Since a driver might be processing multiple requests at the same time, the contents of unprotected global variables can become unpredictable.

Local static variables in a driver subroutine are just as bad. Don't try using them to maintain state information between calls to a function. There's no guarantee that two successive calls to a driver routine will be made in the context of the same I/O request.

After saying that, it's worth pointing out that global variables can be helpful for storing read-only parameters that affect the overall behavior of the driver. For example, your DriverEntry routine might pull a value from the Registry that controls the amount of detail you report to the error log. Storing this value in a global variable is acceptable since it will essentially be constant for the life of the driver. You could use a similar strategy for turning the collection of driver performance data on and off.

Working with the Kernel Stack

On 80x86 and MIPS platforms, the kernel stack is only 12 kilobytes long. On Alpha and PowerPC systems, the size is 16 kilobytes. This isn't a lot of space, so be careful how you use the kernel stack. Dreadful things will happen if you run out of space. You can avoid kernel stack overflow by following these guidelines.

• Don't design your driver in such a way that internal routines need to make deeply nested calls to one another. Try to keep the calling tree as flat as possible.
• If any of your routines call themselves recursively, make sure you limit the depth of recursion. Drivers are not the place to be calculating Fibonacci numbers.
• Don't build large temporary data structures on the kernel stack. Use one of the pool areas instead.

Another characteristic of the kernel stack is that it lives in cached memory.
This means you shouldn't use temporary buffers on the stack for DMA operations. Instead, your driver should allocate the buffer from nonpaged pool. Chapter 12 will describe DMA caching issues in more detail.

Working with the Pool Areas

Remember that kernel-mode drivers can't allocate memory by making calls to malloc. Instead, they have to use the ExAllocatePool and ExFreePool functions. These functions allocate the following kinds of memory:

• NonPagedPool - Memory available to driver routines running at or above DISPATCH_LEVEL IRQL.
• NonPagedPoolMustSucceed - Temporary memory that is crucial to the driver's continuing operation. If the allocation fails, the system will bugcheck. Use this memory for emergencies only and release it as quickly as possible.
• NonPagedPoolCacheAligned - Memory that's guaranteed to be aligned on the natural boundary of a CPU data-cache line. A driver might use this kind of memory for a permanent I/O buffer.
• NonPagedPoolCacheAlignedMustS - Storage for a temporary I/O buffer that is crucial to the operation of the driver.
• PagedPool - Memory available only to driver routines running below DISPATCH_LEVEL IRQL. Normally, this includes the driver's initialization, cleanup, and Dispatch routines and any system threads the driver is using.
• PagedPoolCacheAligned - I/O buffer memory used by file system drivers.

There are several things to keep in mind when you're working with the system memory areas. First and foremost, the pools are precious system resources, and you shouldn't be too extravagant in their use. This is especially true of the NonPaged and MustSucceed pool areas. Second, your driver must be executing at or below DISPATCH_LEVEL IRQL when you allocate or free nonpaged memory, and at or below APC_LEVEL IRQL to allocate or free paged pool. Finally, release any memory you've allocated as soon as you have finished using it.
Otherwise, the system may start to perform badly because of low memory conditions. In particular, be very sure to give back any pool memory when your driver is unloaded.

System Support for Memory Suballocation

Generally, you should avoid driver designs that constantly allocate and release blocks of pool memory smaller than PAGE_SIZE bytes. This kind of behavior causes fragmentation of the pool areas and can make it impossible for other parts of NT to allocate memory. Instead, if your driver needs to create and destroy lots of little dynamic data structures, you should allocate a single, large chunk of pool and write your own suballocation routines to carve it up.

Some kinds of drivers need to manage a collection of small, fixed-size memory blocks. For example, SCSI class drivers maintain a supply of SCSI Request Blocks (SRBs) which they use repeatedly to send commands to any devices under their control. If your driver needs to do something similar, the system provides two different mechanisms you can use to handle all the details of suballocation.

Zone buffers

A zone buffer is just a chunk of driver-allocated pool. By calling various Executive routines, your driver can use the zone buffer to manage collections of fixed-size blocks in paged or nonpaged memory. If you plan to access a zone buffer at or above DISPATCH_LEVEL IRQL, you must also set up an Executive spin lock to guard it and use the interlocked versions of the zone management functions. Zone buffers used only below DISPATCH_LEVEL IRQL can be guarded with a Fast Mutex.³ In this case, use the noninterlocked set of functions.

To set up a zone buffer, you must declare a structure of type ZONE_HEADER. You may also need to declare and initialize a spin lock or Fast Mutex object. Then follow these steps to manage the zone buffer.

1. Call ExAllocatePool to claim space for the zone buffer itself. Then initialize the zone buffer with ExInitializeZone. Both these steps are normally performed in your DriverEntry routine.
2. To allocate a block from a zone, call either ExAllocateFromZone or ExInterlockedAllocateFromZone. The interlocked version of the function uses a spin lock to synchronize access to the zone buffer. The noninterlocked function leaves synchronization entirely up to your driver.
3. To release a block back to the zone, use either ExFreeToZone or ExInterlockedFreeToZone. Again, the interlocked version of the function synchronizes access to the zone, while the noninterlocked version does not.
4. In your driver Unload routine, use ExFreePool to release the memory used for the zone buffer. Your driver has to make sure that no blocks from the zone buffer are in use when you deallocate the zone buffer.

Zone buffers that are too large put a strain on the system's memory resources, so don't make a zone buffer any bigger than necessary. Try to pick a size that will allow your driver to handle the I/O demand level you expect on an average system. This is a more system-friendly approach than making the zone buffer big enough to handle the worst possible case.

If you're feeling really clever, you can try to base the size of your zone buffer on the characteristics of the local system. MmQuerySystemSize will give you a hint about the total amount of memory available. Systems with more memory can support larger zone buffers. MmIsThisAnNtAsSystem will tell you whether your driver is running under Windows NT Workstation or Server. Servers are likely to have more memory and higher I/O demand levels. Calling these functions in your DriverEntry routine may help you pick an appropriate zone buffer size.

If you try to allocate a block from a zone buffer and the allocation fails, your driver should use ExAllocatePool (or ExAllocatePoolWithTag) to get the block from one of the pool areas instead.
To use this strategy, you'll need some kind of flag bit in the allocated structure to indicate whether it came from the zone buffer or from the general pool; otherwise you won't know what function to call when you want to release the block.

[3] Spin locks are described later in this chapter. Fast Mutexes show up in Chapter 14.

You can make an existing zone buffer larger by calling ExExtendZone or ExInterlockedExtendZone, but this is generally a bad thing to do. If you enlarge a zone buffer this way, the additional memory that the system gives to the zone will not be reclaimed until the next bootstrap. Don't do this unless the performance gains from using zone allocation (compared to repeated ExAllocatePool calls) significantly outweigh the damage it does to the system.

Lookaside lists

Windows NT 4.0 provides a more efficient mechanism called a lookaside list for managing driver-allocated memory. A lookaside list is a linked list of fixed-size memory blocks. Unlike zone buffers, lookaside lists can grow and shrink dynamically in response to changing system conditions. Therefore, properly-sized lookaside lists are less likely to waste memory than zone buffers are.

Compared to zone buffers, the synchronization mechanism used with lookaside lists is also more efficient. If the CPU architecture has an 8-byte compare exchange instruction, the Executive uses it to guard access to the list. On platforms without such an instruction, it reverts to using a spin lock for lookaside lists in nonpaged pool and a Fast Mutex for lists in paged pool. Since most common platforms do have the necessary compare exchange instruction, lookaside lists have lower synchronization latency than zone buffers.

To use a lookaside list, you need to declare a header structure of type NPAGED_LOOKASIDE_LIST or PAGED_LOOKASIDE_LIST (depending on whether your list will be nonpaged or paged). Then follow these steps to manage the lookaside list.

1. Use one of the ExInitializeXxxLookasideList functions to initialize the list header structure.[4] Normally, this is done in your DriverEntry routine.
2. Call ExAllocateFromXxxLookasideList to allocate a block from a lookaside list.
3. Call ExFreeToXxxLookasideList when you want to release a block.
4. Use ExDeleteXxxLookasideList to release any resources associated with the lookaside list. Usually, this is something you do in the driver's Unload routine.

[4] In this series of instructions, replace the Xxx in the function name with either NPaged or Paged, depending on the location of the list.

The operation of lookaside lists is rather interesting and deserves a little attention. For starters, the ExInitializeXxxLookasideList functions just set up the list header; they don't actually allocate any memory for the list. When you call one of these initialization functions, you can specify the maximum number of blocks that the list can hold. (This is referred to as the depth of the list.) You can also pass pointers to memory allocation and deallocation routines in your driver. The system will call these functions when it needs to add or remove memory from the list.[5]

Later, when you call one of the ExAllocateFromXxxLookasideList functions, the system allocates memory as needed. As you release blocks with ExFreeToXxxLookasideList, they are added to the lookaside list until it reaches its maximum allowable depth. At that point, any additional calls to ExFreeToXxxLookasideList result in memory being released back to the system. This behavior guarantees that, after a while, the number of available blocks in the lookaside list will tend to remain near the depth of the list.

You should choose the depth value very carefully. If it's too shallow, the system will be performing expensive allocation and deallocation operations too often. If it's too deep, you'll be wasting memory by tying it up in the list and not using it.
The statistics maintained in the list header structure can help you determine a proper value for the depth of the list.

[5] If you don't pass the addresses of driver-defined memory management functions, the system uses ExAllocatePoolWithTag and ExFreePool by default.

5.4 UNICODE STRINGS

All character strings in the NT operating system are stored internally as Unicode. The Unicode scheme uses 16 bits to represent each character and makes it easier to move NT to language environments not based on the Latin alphabet. Unless otherwise noted, any character strings your driver sends to or receives from NT will be Unicode.[6]

[6] Note that this does not include data passed between a user's buffer and a device - unless the device specifically works with Unicode.

Unicode String Datatypes

When you're working with Unicode, remember to do the following:

• Prefix Unicode string constants with the letter L to let the compiler know you want wide characters. For example, L"some text" generates Unicode text, whereas "some text" produces 8-bit ANSI.
• Use the WCHAR data type for Unicode characters and PWSTR to point to an array of Unicode characters.
• Use the constant UNICODE_NULL to terminate a Unicode string.

Many NT system routines work with counted Unicode strings described by a UNICODE_STRING structure (see Table 5.2 for the contents).

Table 5.2 This structure defines the basic string object used by drivers

UNICODE_STRING, *PUNICODE_STRING
Field                    Contents
USHORT Length            Current string length, in bytes
USHORT MaximumLength     Maximum string length, in bytes
PWSTR Buffer             Address of driver-allocated buffer holding the string

It's up to you to allocate memory for the string buffer itself. If the Buffer field points to a NULL-terminated string, the Length field does not include the NULL character. Notice that the two length fields in the UNICODE_STRING structure specify a count in bytes, not characters.
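The bytes-versus-characters distinction is easy to get wrong, so here is a minimal user-mode sketch in portable C. It does not use the DDK: MOCK_UNICODE_STRING, mock_init_unicode_string, and mock_char_count are hypothetical stand-ins that only mirror the layout in Table 5.2 and the behavior described for RtlInitUnicodeString, and uint16_t stands in for the 16-bit WCHAR type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for UNICODE_STRING (see Table 5.2).
   uint16_t plays the role of the 16-bit WCHAR type. */
typedef struct {
    uint16_t  Length;         /* current length in BYTES, terminator excluded */
    uint16_t  MaximumLength;  /* buffer capacity in BYTES */
    uint16_t *Buffer;         /* caller-owned storage; nothing is copied */
} MOCK_UNICODE_STRING;

/* Assumed to imitate RtlInitUnicodeString: point the descriptor at an
   existing NUL-terminated 16-bit string and fill in both byte counts. */
void mock_init_unicode_string(MOCK_UNICODE_STRING *dst, uint16_t *src)
{
    size_t chars = 0;

    assert(dst != NULL && src != NULL);
    while (src[chars] != 0) {
        chars++;
    }
    dst->Length = (uint16_t)(chars * sizeof(uint16_t));
    dst->MaximumLength = (uint16_t)((chars + 1) * sizeof(uint16_t));
    dst->Buffer = src;
}

/* The character count is derived from the byte count, never assumed. */
size_t mock_char_count(const MOCK_UNICODE_STRING *s)
{
    return s->Length / sizeof(uint16_t);
}
```

For the two-character string "NT" stored as 16-bit units, this sketch reports a Length of 4 bytes, a MaximumLength of 6 bytes (room for the terminator), and a character count of 2.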
Working with Unicode

The NT runtime library provides a number of functions for working with ANSI and Unicode strings. Table 5.3 presents a few of them. See the documentation for a complete list. Some of these functions have restrictions on the IRQL levels from which they can be called, so be careful when you're using them.

Table 5.3 The NT runtime library provides these Unicode manipulation functions

Unicode string manipulation functions
Function                          Description
RtlInitUnicodeString              Initializes a UNICODE_STRING from a NULL-terminated Unicode string
RtlAnsiStringToUnicodeSize        Calculates number of bytes required to hold a converted ANSI string
RtlAnsiStringToUnicodeString      Converts ANSI string to Unicode
RtlIntegerToUnicodeString         Converts an integer to Unicode text
RtlAppendUnicodeStringToString    Concatenates two Unicode strings
RtlCopyUnicodeString              Copies a source string to a destination
RtlUpcaseUnicodeString            Converts Unicode string to uppercase
RtlCompareUnicodeString           Compares two Unicode strings
RtlEqualUnicodeString             Tests equality of two Unicode strings

If you've never worked with Unicode before, you may have some programming habits that will cause you problems. Most of them result from making the assumption that a character and a byte are the same size. Watch out for the following when you start working with Unicode:

• Remember that the number of characters in a Unicode string is not the same as the number of bytes. Be very careful about any arithmetic you do that calculates the length of a Unicode string.
• Don't assume anything about the collating sequence of the characters or the relationship of upper- and lowercase characters.
• Don't assume that a table with 256 entries is large enough to hold the entire character set.

5.5 INTERRUPT SYNCHRONIZATION

Writing code that executes at multiple IRQL levels requires some attention to proper synchronization.
This section examines the issues that arise in this kind of environment.

The Problem

If code executing at two different IRQLs attempts to access the same data structure simultaneously, the structure can be corrupted. Figure 5.1 illustrates the details of this synchronization problem.

[Figure 5.1: Data structures can be corrupted by unsynchronized access. Copyright © 1994 by Cydonix Corporation.]

To see the exact problem, consider this sequence of events:

1. Imagine that some piece of code executing at a low IRQL decides to modify several fields in the foo data structure. It gets as far as setting the field foo.x to 1.
2. Suddenly an interrupt occurs, and a higher-IRQL piece of code gets control of the CPU. This code also decides to modify foo, and it sets foo.x to 10 and foo.y to 20.
3. The higher-IRQL code dismisses its interrupt, and control returns to the lower-IRQL routine, which finishes its modifications to foo by setting foo.y to 2. The lower-IRQL code is completely unaware that it was interrupted.
4. The foo structure is now corrupted, with 10 in x and 2 in y.

In the following sections, you'll see some techniques your driver can use to avoid these kinds of collisions.

Interrupt Blocking

In the previous example, the lower-IRQL routine could have avoided these synchronization problems by preventing itself from being interrupted. It can do this by temporarily raising the IRQL of the CPU and then lowering it back to its initial level after completing the modification. This technique is called interrupt blocking. If you look at Table 5.4, you'll see the Kernel functions that your driver can use to manipulate a CPU's IRQL value.
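The raise-then-restore pattern can be sketched with a small single-CPU, user-mode model. This is not kernel code: mock_raise_irql and mock_lower_irql are hypothetical stand-ins that only imitate the calling pattern of the Kernel's raise and lower functions, the "interrupt" is an ordinary function whose delivery is deferred while the simulated IRQL is at or above its level, and the numeric levels are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char MOCK_IRQL;
enum { LOW_LEVEL = 0, DEVICE_LEVEL = 5 };   /* illustrative values only */

struct foo_t { int x; int y; };
struct foo_t foo;

MOCK_IRQL current_irql = LOW_LEVEL;
bool interrupt_pending = false;

/* The higher-IRQL code from the example: it rewrites both fields. */
void high_irql_handler(void)
{
    foo.x = 10;
    foo.y = 20;
}

/* Deliver the pending interrupt only if the CPU is below its level. */
void deliver_if_unblocked(void)
{
    if (interrupt_pending && current_irql < DEVICE_LEVEL) {
        interrupt_pending = false;
        high_irql_handler();
    }
}

/* The device requests an interrupt; it may have to wait. */
void request_interrupt(void)
{
    interrupt_pending = true;
    deliver_if_unblocked();
}

void mock_raise_irql(MOCK_IRQL new_irql, MOCK_IRQL *old_irql)
{
    assert(new_irql >= current_irql);   /* raising must never lower */
    *old_irql = current_irql;
    current_irql = new_irql;
}

void mock_lower_irql(MOCK_IRQL new_irql)
{
    assert(new_irql <= current_irql);   /* only restore downward */
    current_irql = new_irql;
    deliver_if_unblocked();             /* anything held off runs now */
}

/* Low-IRQL update of foo; an interrupt arrives between the two stores.
   Returns true if foo held x == 1 and y == 2 when the update finished. */
bool update_foo(bool block_interrupts)
{
    MOCK_IRQL old_irql = current_irql;
    bool consistent;

    if (block_interrupts) mock_raise_irql(DEVICE_LEVEL, &old_irql);
    foo.x = 1;
    request_interrupt();                /* simulated interrupt mid-update */
    foo.y = 2;
    consistent = (foo.x == 1 && foo.y == 2);
    if (block_interrupts) mock_lower_irql(old_irql);
    return consistent;
}
```

In this model, update_foo(false) reproduces the corruption from the sequence above (foo ends up with 10 in x and 2 in y), while update_foo(true) finishes its update intact and the deferred handler runs only after the IRQL is restored. The sketch models one CPU only; it says nothing about protection against other processors.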
Rules for Blocking Interrupts

If you plan to use any of these functions to block interrupts, there are certain rules you need to follow:

• Every piece of code touching a protected data structure has to agree on the IRQL to use for synchronization and must only touch the structure when it's running at the chosen IRQL.
• Drivers using this technique shouldn't spend too much time at the elevated IRQL level. Depending on the blocking level, this can have a negative impact on NT's ability to service other interrupts quickly.
• Although your driver can raise the CPU's IRQL to a higher level and reduce it back to its previous value, you must never drop the CPU's IRQL below the level where you found it. Disobeying this rule will compromise the entire interrupt priority mechanism.

Table 5.4 These Kernel functions control the CPU's IRQL level

Interrupt Blocking Functions
Function            Description
KeRaiseIrql         Changes the CPU IRQL to a specified value, blocking interrupts at or below that IRQL level
KeLowerIrql         Lowers the CPU IRQL value
KeGetCurrentIrql    Returns the IRQL value of the CPU on which this call is made

5.6 SYNCHRONIZING MULTIPLE CPUS

But everything is not yet safe. Modifying the IRQL of one CPU has no effect on other CPUs in a multiprocessor system. Consequently, IRQLs provide only local protection to shared data. To prevent corruption of data structures accessed by multiple CPUs, NT uses synchronization objects called spin locks.

How Spin Locks Work

A spin lock is simply a mutual-exclusion object that you associate with a specific group of data structures. When a piece of kernel-mode code wants to touch any of the guarded data structures, it must first request ownership of the associated spin lock. Since only one CPU at a time can own the spin lock, the data structure is safe from collisions. Any CPU requesting an already-owned spin lock will busy-wait until the spin lock becomes available.
Look at Figure 5.2 to see how this works.

A given spin lock is always acquired and released at a specific IRQL level. This has the effect of blocking potentially dangerous interrupts on the local CPU and preventing the synchronization problems we saw in the last section. While a CPU is waiting for a spin lock, all activity at or below the IRQL of the spin lock is blocked on that CPU. Once the IRQL level has been raised, the CPU can request ownership of the spin lock, which will guarantee protection against other CPUs. Fortunately, all these details are hidden inside the Kernel's spin lock routines.

Using Spin Locks

There are two major kinds of spin locks provided by the Kernel. They are distinguished by the IRQL level at which you use them.

• Interrupt spin locks - These synchronize access to driver data structures shared by multiple driver routines. Interrupt spin locks are acquired at the DIRQL associated with the device.
• Executive spin locks - These guard various operating system data structures and their associated IRQL is DISPATCH_LEVEL.

[Figure 5.2: How spin locks synchronize multiple CPUs. Each CPU raises IRQL, repeatedly requests the spin lock for "foo" until it is acquired, updates foo.x and foo.y, releases the spin lock, and restores IRQL. Copyright © 1994 by Cydonix Corporation.]

When your driver uses Interrupt spin locks, most of the work happens behind the scenes. When we look at KeSynchronizeExecution in Chapter 9, you'll see the exact details. Executive spin locks are another story. When you use them, you'll need to follow these steps:

1. Decide what data items you need to guard and how many spin locks to use. The tradeoff is that a larger number of spin locks may allow more of your driver to run in parallel, but it increases the chance of deadlocking if you need to acquire multiple locks at the same time.
2. Declare a data item of type KSPIN_LOCK for each lock. Storage for the spin lock must be permanently resident. Usually, you store spin locks in the Device or Controller Extension.
3. Initialize the spin lock once by calling KeInitializeSpinLock. You can call this function from any IRQL level, though most often you set up all your spin locks in the DriverEntry routine.
4. Call KeAcquireSpinLock before you touch any resource guarded by a spin lock. This function raises IRQL to DISPATCH_LEVEL, acquires the spin lock, and returns the previous IRQL value to you. To call this function, you must be at or below DISPATCH_LEVEL IRQL. If you're already running at DISPATCH_LEVEL, you can save some work by calling KeAcquireSpinLockAtDpcLevel instead.
5. When you've finished using the protected resource, call the KeReleaseSpinLock function to let go of the lock. You call this function from DISPATCH_LEVEL IRQL and it restores IRQL to its previous value. If you were already at DISPATCH_LEVEL when you acquired the lock, you can save some work by calling KeReleaseSpinLockFromDpcLevel, which releases the lock but doesn't change IRQL.

Some other driver support routines (like the interlocked lists and queues described in the next section) use Executive spin locks for protection. In these cases, your only responsibility is to initialize the spin lock object. The routines that manage the interlocked object will acquire and release the spin lock itself on your behalf.

Rules for Using Spin Locks

Spin locks aren't terribly difficult to use, but you do have to keep a few things in mind when you're working with them:

• Be sure to release a spin lock as quickly as possible, because while you're holding it, you may be blocking all activity on other CPUs. The official recommendation is not to hold a spin lock for more than about 25 microseconds.
• Don't cause any hardware or software exceptions while you're holding a spin lock.
This is a sure way to crash the system.
• Don't try to access any paged code or data while you're holding a spin lock. This may result in a page fault exception, which is another quick way to crash the system.
• Don't try to acquire a spin lock that your CPU already owns. This will lead to a deadlock situation where the CPU freezes up waiting for itself to release the spin lock.
• Avoid driver designs that depend on holding multiple spin locks at the same time. Unless you're careful, this can also lead to deadlocks. If you must use multiple spin locks, be sure that everyone agrees to acquire them in a fixed order and release them in reverse order.
• Don't call any routines that violate the above rules.

5.7 LINKED LISTS

Drivers sometimes need to maintain various kinds of linked lists. You'll see examples of this in later chapters. The following subsections describe the support available from NT for managing singly- and doubly-linked lists.

Singly-Linked Lists

To use singly-linked lists, begin by declaring a list head of type SINGLE_LIST_ENTRY. This is also the data type of the link pointer itself. You need to initialize the list by setting the head to NULL, as demonstrated in the following code fragment.

    typedef struct _DEVICE_EXTENSION {
        SINGLE_LIST_ENTRY listHead;    // Declare head pointer
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    pDevExt->listHead.Next = NULL;     // Initialize the list

To add or remove entries from the front of the list, call PushEntryList and PopEntryList. Depending on how you're using the list, the actual entries can be in either paged or nonpaged memory. Just remember that these functions don't perform any synchronization of their own.

NT also provides convenient support for singly-linked lists guarded by an Executive spin lock. This kind of protection is important if you're sharing a linked list among driver routines running at or below DISPATCH_LEVEL IRQL.
To use one of these lists, set up the list head in the usual way, and then initialize an Executive spin lock that will guard the list.

    typedef struct _DEVICE_EXTENSION {
        SINGLE_LIST_ENTRY listHead;    // Declare head pointer
        KSPIN_LOCK listLock;           // and the lock
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    KeInitializeSpinLock( &pDevExt->listLock );
    pDevExt->listHead.Next = NULL;

You pass a pointer to this spin lock as an explicit argument to ExInterlockedPushEntryList and ExInterlockedPopEntryList. To make these interlocked calls, you must be running at or below DISPATCH_LEVEL IRQL. The list entries themselves must reside in nonpaged memory, since the system will be linking and unlinking them from DISPATCH_LEVEL IRQL.

Doubly-Linked Lists

To use doubly-linked lists, declare a list head of type LIST_ENTRY. This is also the data type of the link pointer itself. You need to initialize the list head, as demonstrated in the following code fragment.

    typedef struct _DEVICE_EXTENSION {
        LIST_ENTRY listHead;           // Declare head pointer
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    InitializeListHead( &pDevExt->listHead );

To add entries to the list, call InsertHeadList or InsertTailList, and to pull entries out, call RemoveHeadList or RemoveTailList. You can determine if there's anything in a list by calling IsListEmpty. Again, the entries can be paged or nonpaged, but these functions don't perform any synchronization.

Not surprisingly, NT supports interlocked doubly-linked lists. To use these, set up the list head in the usual way, and then initialize an Executive spin lock that will guard the list.

    typedef struct _DEVICE_EXTENSION {
        LIST_ENTRY listHead;           // Declare head pointer
        KSPIN_LOCK listLock;           // and the lock
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    KeInitializeSpinLock( &pDevExt->listLock );
    InitializeListHead( &pDevExt->listHead );

You pass this spin lock in calls to ExInterlockedInsertTailList, ExInterlockedInsertHeadList, and ExInterlockedRemoveHeadList. To make these interlocked calls, you must be running at or below DISPATCH_LEVEL IRQL. Just like their singly-linked cousins, entries for doubly-linked interlocked lists have to live in nonpaged memory.

Removing Blocks from a List

When you pull a block out of a list, what the system gives you is a pointer to the LIST_ENTRY or SINGLE_LIST_ENTRY structure within the block. What you probably want is the address of the block itself. If the XXX_LIST_ENTRY structure is at the top of the block, everything is easy. If it's buried in the block somewhere, you need to do a little arithmetic to get the address of the containing structure. Fortunately, NT provides a macro to make this easier. See Table 5.5 for the details.

Table 5.5 CONTAINING_RECORD macro arguments

CONTAINING_RECORD
Parameter       Description
Address         Address of a field within a data structure
Type            The data type of the structure
Field           Field in structure pointed at by the Address argument
Return value    Base address of structure containing Field

The following code fragment shows how to use this macro. It assumes you're using the Tail.Overlay.ListEntry field of an IRP to maintain your own linked list of IRPs, and that the listHead field of your Device Extension points to the beginning of this list.

    PIRP pIrp;
    PLIST_ENTRY pEntry;

    pEntry = RemoveHeadList( &pDevExt->listHead );
    pIrp = CONTAINING_RECORD( pEntry, IRP, Tail.Overlay.ListEntry );

5.8 SUMMARY

In this chapter we've looked at some general guidelines for designing and coding your driver. We've also covered a number of basic techniques that will show up again and again throughout this book. This is all just foundation material for the work ahead. In the next chapter, we'll start to implement some actual driver routines.

CHAPTER 6
Initialization and Cleanup Routines

Everything has to start somewhere. In the case of an NT kernel-mode driver, the starting point is a function called DriverEntry. This chapter will show you how to write a DriverEntry routine along with various other pieces of initialization and cleanup code. By the time you finish this chapter, you'll be able to write a minimal driver that you can actually load into the system.

6.1 WRITING A DRIVERENTRY ROUTINE

Every NT kernel-mode driver, regardless of its purpose, has to expose a routine whose name is DriverEntry. This routine initializes various driver data structures and prepares the environment for all the other driver components.

Execution Context

The I/O Manager calls your DriverEntry routine once when it loads your driver. As you can see from Table 6.1, the DriverEntry routine runs at PASSIVE_LEVEL IRQL, which means it has access to paged system resources. The DriverEntry routine receives a pointer to its own Driver object, which it must initialize. It also gets a UNICODE_STRING containing the path to the driver's service key in the Registry.
This string takes the form, HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\DriverName, and DriverEntry can use it to extract any driver-specific parameters stored in the Registry.[1]

[1] Chapter 7 explains how to extract these parameters from a driver's service key.

Table 6.1 Function prototype for a DriverEntry routine

NTSTATUS DriverEntry                IRQL == PASSIVE_LEVEL
Parameter                           Description
IN PDRIVER_OBJECT DriverObject      Driver object for this driver
IN PUNICODE_STRING RegistryPath     Registry path string for this driver's key
Return value                        • STATUS_SUCCESS - success
                                    • STATUS_XXX - some error code

What a DriverEntry Routine Does

Although the exact details will vary slightly from driver to driver, in general you should perform the following steps in your DriverEntry routine.

1. If you're writing a device driver, start by finding and allocating any hardware that the driver is supposed to manage.
2. Initialize the Driver object with pointers to other driver entry points.
3. If your driver manages a multiunit controller, call IoCreateController to create a Controller object and then initialize its Controller Extension.
4. Call IoCreateDevice to create a Device object and then initialize its Device Extension.
5. Make the device visible to the Win32 subsystem by calling IoCreateSymbolicLink.
6. Connect the device to an Interrupt object and initialize any DPC objects needed by the driver.
7. Repeat steps 3-6 for all controllers and devices that belong to your driver.
8. Return STATUS_SUCCESS to the I/O Manager.

If you run into problems during initialization, your DriverEntry routine should release any system resources it may have allocated and return an appropriate NTSTATUS failure code to the I/O Manager.

The following sections describe some of these steps in greater detail. The process of finding and allocating hardware is complex enough that it needs to wait until the next chapter. We'll also have to postpone the discussion of interrupt processing and DPCs until we look at data transfer routines in Chapter 9.

Initializing DriverEntry Points

The I/O Manager is able to locate the DriverEntry routine because it has a well-known name. Other driver routines don't have fixed names, so the I/O Manager needs some other way to find them. It does this by looking in the Driver object for pointers to specific functions. Your DriverEntry routine is responsible for setting up these function pointers.

These function pointers fall into two categories:

• Functions with explicit slots in the Driver object.
• IRP Dispatch functions that are listed in the Driver object's MajorFunction array. These are discussed in more detail in Chapter 8.

The following code fragment shows how a DriverEntry routine initializes both kinds of function pointers.

    pDO->DriverStartIo = XxStartIo;
    pDO->DriverUnload = XxUnload;
    //
    // Initialize the function dispatch array
    //
    pDO->MajorFunction[ IRP_MJ_CREATE ] = XxDispatchCreate;
    pDO->MajorFunction[ IRP_MJ_CLOSE ] = XxDispatchClose;

Creating Device Objects

Once you've found and allocated all your hardware, you need to create a Device object for each physical or virtual device you want to expose to the rest of the system. Most of the work is done by the IoCreateDevice function, which takes a description of your device and returns a Device object, complete with an attached Device Extension. IoCreateDevice also links the new Device object into the list of devices managed by this Driver object. Table 6.2 contains a description of this function.

Take a look at the NTDDK.H header file to see the standard definitions for the DeviceType argument. Try to choose a value that's as close as possible to your device.
If you truly believe your nuclear-powered laser retroscope is unlike any existing device, you can define a private device type value. Just remember that Microsoft reserves values in the range 0-32767 and leaves numbers between 32768 and 65535 for you. They also leave the bookkeeping up to you, so there's no guarantee that the number you choose for your retroscope won't be used by some other driver to refer to its microwave popcorn warmer.

Table 6.2 Function prototype for IoCreateDevice

NTSTATUS IoCreateDevice                 IRQL == PASSIVE_LEVEL
Parameter                               Description
IN PDRIVER_OBJECT DriverObject          Pointer to Driver object
IN ULONG DeviceExtensionSize            Desired size of Device Extension in bytes
IN PUNICODE_STRING DeviceName           NT device name (see below)
IN DEVICE_TYPE DeviceType               FILE_DEVICE_XXX (see NTDDK.H)
IN ULONG DeviceCharacteristics          Characteristics for mass-storage device
                                        • FILE_REMOVABLE_MEDIA
                                        • FILE_READ_ONLY_DEVICE
                                        • FILE_FLOPPY_DISKETTE
                                        • FILE_WRITE_ONCE_MEDIA
                                        • FILE_REMOTE_DEVICE
IN BOOLEAN Exclusive                    TRUE if device is nonshareable
OUT PDEVICE_OBJECT *DeviceObject        Variable that receives Device object
Return value                            • STATUS_SUCCESS - success
                                        • STATUS_XXX - some failure code

One final point about creating Device objects. Although the vast majority of drivers call IoCreateDevice from their DriverEntry routines, it is possible to make this call from a Dispatch routine instead. For example, a driver that managed pseudo-devices could use this technique to dynamically create Device objects in response to a driver-defined DeviceIoControl request.

If you do create Device objects somewhere other than in your DriverEntry routine, you have to reset the DO_DEVICE_INITIALIZING bit in the Flags field of the object. In the normal course of events, the I/O Manager automatically resets this bit for a driver's Device objects when the DriverEntry routine is finished.
Until this bit is cleared, the Device object can't be used, and CreateFile calls referencing it will fail. The following code fragment shows what you need to do.

    pDevObj->Flags &= ~DO_DEVICE_INITIALIZING;

Don't clear this bit until the Device object is actually initialized and ready to process requests.

Choosing a Buffering Strategy

If the IoCreateDevice call succeeds, you need to let the I/O Manager know whether you want to do Buffered or Direct I/O with this device. You make this choice by ORing one of the following bits into the Flags field of the new Device object:[2]

• DO_BUFFERED_IO - If you want the I/O Manager to copy data back and forth between user and system-space buffers.
• DO_DIRECT_IO - If you want the I/O Manager to lock user buffers into physical memory for the duration of an I/O, and build a descriptor list of the pages in the buffer.

Chapter 8 will explain how to work with user buffers in both of these cases. If you don't set either of these bits, the I/O Manager will assume that you're handling everything yourself. Making user data available to a driver is a nasty process, so it's best to let the I/O Manager do the work for you.

NT and Win32 Device Names

Just like T. S. Eliot's cats, NT devices have more than one name. The one you specify to IoCreateDevice is the name by which the device is known to the NT Executive itself. If you want to make the device available to the Win32 subsystem, the Win16 subsystem, and virtual DOS machines, you have to give the device a DOS name as well.

These two types of names live in different parts of the Object Manager's namespace. You'll find NT device names dangling beneath the \Device section of the tree, while the Win32 name appears beneath the \DosDevices area. Notice that the DOS name is actually a symbolic link that connects it to the NT device. Figure 6.1 illustrates this relationship.
Also notice that NT and DOS follow different device naming conventions. NT device names tend to be longer, and they always end in a zero-based number (FloppyDisk0, FloppyDisk1, etc.). DOS devices follow the usual pattern of A through Z for file-system devices, and names ending in a one-based number for any other devices (LPT1, LPT2, etc.).

2 Make sure you use a bitwise OR to set the Flags field of the Device object. The I/O Manager uses other bits in this field to synchronize its own operation, and if you accidentally clear some of them, bad things will happen.

6.2 CODE EXAMPLE: DRIVER INITIALIZATION

This example shows how a basic device driver initializes itself. You can find the code for this example in the CH06 directory on the disk that accompanies this book.

[Figure 6.1 NT and Win32 device names in the Object Manager's namespace. The diagram shows the \Device directory holding Xx0 and the \DosDevices directory holding XX1, which is a symbolic link to Xx0.]

INIT.C

The functions in this module perform all the essential setup tasks needed to manage one or more physical devices. Although the code supports multiple devices, it assumes they are all on separate controllers, so it doesn't create any Controller objects.

DriverEntry

This particular implementation isn't very forgiving of initialization errors. If anything fails along the way, the whole driver refuses to load. A real driver might take a more flexible approach.

    //
    // Header files...
    //
    #include "xxdriver.h"                           // ❶

    //
    // Forward declarations of local functions
    //
    static NTSTATUS
    XxCreateDevice(
        IN PDRIVER_OBJECT DriverObject,
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PDEVICE_BLOCK DeviceBlock,
        IN ULONG NtDeviceNumber
        );

    //
    // If the platform can handle it, make the DriverEntry
    // routine discardable, so that it doesn't waste space
    //
    #ifdef ALLOC_PRAGMA
    #pragma alloc_text( init, DriverEntry )         // ❷
    #pragma alloc_text( init, XxCreateDevice )
    #endif

    //++
    // Function:
    //      DriverEntry
    //
    // Description:
    //      This function initializes the driver,
    //      locates and claims hardware resources,
    //      and creates various NT objects needed
    //      to process I/O requests.
    //
    // Arguments:
    //      Pointer to the Driver object
    //      Registry path string for driver service key
    //
    // Return Value:
    //      NTSTATUS signaling success or failure
    //--
    NTSTATUS
    DriverEntry(
        IN PDRIVER_OBJECT DriverObject,
        IN PUNICODE_STRING RegistryPath
        )
    {
        PCONFIG_ARRAY ConfigList;                   // ❸
        PCONFIG_ARRAY ConfigArray;
        ULONG NtDeviceNumber;
        NTSTATUS status;
        ULONG i;

        //
        // Load up the Config list...
        //
        status = XxGetHardwareInfo(                 // ❹
                    RegistryPath,
                    &ConfigList );
        if( !NT_SUCCESS( status )) {
            return status;
        }

        //
        // Allocate the hardware...
        //
        status = XxReportHardwareUsage(
                    DriverObject,
                    ConfigList );
        if( !NT_SUCCESS( status )) {
            XxReleaseHardwareInfo( ConfigList );
            return status;
        }

        //
        // Export other driver entry points...
        //
        DriverObject->DriverUnload = XxDriverUnload;
        DriverObject->MajorFunction[ IRP_MJ_CREATE ] = XxDispatchOpen;
        DriverObject->MajorFunction[ IRP_MJ_CLOSE ] = XxDispatchClose;
        DriverObject->MajorFunction[ IRP_MJ_WRITE ] = XxDispatchWrite;
        DriverObject->MajorFunction[ IRP_MJ_READ ] = XxDispatchRead;

        //
        // Initialize a Device object for each piece
        // of hardware we've found
        //
        ConfigArray = ConfigList;
        NtDeviceNumber = 0;

        while( ConfigArray != NULL ) {
            for( i = 0; i < ConfigArray->Count; i++ ) {
                status = XxCreateDevice(
                            DriverObject,
                            ConfigArray->BusType,
                            ConfigArray->BusNumber,
                            &ConfigArray->Device[i],
                            NtDeviceNumber );
                if( !NT_SUCCESS( status )) break;
                NtDeviceNumber++;
            }
            if( !NT_SUCCESS( status )) break;

            //
            // Get next array in the chain
            //
            ConfigArray = ConfigArray->NextConfigArray;
        }

        //
        // On failure, undo whatever was set up; either way
        // the Config list is no longer needed.
        //
        if( !NT_SUCCESS( status )) {
            XxReleaseHardware( DriverObject );
        }
        XxReleaseHardwareInfo( ConfigList );
        return status;
    }

❶ This header includes both the system-supplied NTDDK.H and our private HARDWARE.H file. It also contains definitions of any driver-defined structures.

❷ NT will discard these routines after DriverEntry executes. You should also include any functions called only by the DriverEntry routine. Do not discard any code needed after driver initialization.

❸ The Config list is a driver-defined data structure that will follow us through the DriverEntry routine. It holds information about any hardware that this driver manages. Chapter 7 will show you how to use this structure.

❹ We'll see this routine in the next chapter. It uses one of two techniques to locate any hardware this driver is responsible for and put a description of that hardware into the Config list.

XxCreateDevice

This is a helper function that does all the grunt work. It creates and initializes a single Device object using one of the hardware descriptions in the Config list.
    static NTSTATUS
    XxCreateDevice(
        IN PDRIVER_OBJECT DriverObject,
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PDEVICE_BLOCK DeviceBlock,
        IN ULONG NtDeviceNumber
        )
    {
        NTSTATUS status;
        PDEVICE_OBJECT pDevObj;
        PDEVICE_EXTENSION pDevExt;

        UNICODE_STRING deviceName;
        WCHAR deviceNameBuffer[ XX_MAX_NAME_LENGTH ];
        UNICODE_STRING linkName;
        WCHAR linkNameBuffer[ XX_MAX_NAME_LENGTH ];
        UNICODE_STRING number;
        WCHAR numberBuffer[10];

        number.Buffer = numberBuffer;
        number.MaximumLength = 10;

        //
        // Form the base NT device name...
        //
        deviceName.Buffer = deviceNameBuffer;
        deviceName.MaximumLength = XX_MAX_NAME_LENGTH;
        deviceName.Length = 0;
        RtlAppendUnicodeToString(
            &deviceName,
            XX_NT_DEVICE_NAME );

        //
        // Convert the device number into a string and
        // attach it to the end of the device name.
        //
        number.Length = 0;
        RtlIntegerToUnicodeString(
            NtDeviceNumber,
            10,
            &number );
        RtlAppendUnicodeStringToString(
            &deviceName,
            &number );

        //
        // Create a Device object for this device.
        //
        status = IoCreateDevice(
                    DriverObject,
                    sizeof( DEVICE_EXTENSION ),
                    &deviceName,
                    FILE_DEVICE_UNKNOWN,            // ❶
                    0,
                    TRUE,
                    &pDevObj );
        if( !NT_SUCCESS( status )) {
            return status;
        }

        pDevObj->Flags |= DO_BUFFERED_IO;           // ❷

        //
        // Initialize the Device Extension
        //
        pDevExt = pDevObj->DeviceExtension;
        pDevExt->DeviceObject = pDevObj;
        pDevExt->NtDeviceNumber = NtDeviceNumber;

        //
        // Copy things from Device Block            // ❸
        //
        pDevExt->PortBase = DeviceBlock->PortBase;

        //
        // Prepare a DPC object for later use
        //
        IoInitializeDpcRequest( pDevObj, XxDpcForIsr );

        //
        // Form the Win32 symbolic link name.
        //
        linkName.Buffer = linkNameBuffer;
        linkName.MaximumLength = XX_MAX_NAME_LENGTH;
        linkName.Length = 0;
        RtlAppendUnicodeToString(
            &linkName,
            XX_WIN32_DEVICE_NAME );

        //
        // Reset the number string and do another
        // conversion. Win32 device numbers are one
        // greater than the NT equivalent.
        //
        number.Length = 0;
        RtlIntegerToUnicodeString(
            NtDeviceNumber + 1,
            10,
            &number );
        RtlAppendUnicodeStringToString(
            &linkName,
            &number );

        //
        // Create a symbolic link so our device is
        // visible to Win32...
        //
        status = IoCreateSymbolicLink(
                    &linkName,
                    &deviceName );

        //
        // See if the symbolic link was created...
        //
        if( !NT_SUCCESS( status )) {
            IoDeleteDevice( pDevObj );
            return status;
        }

        //
        // Make sure device interrupts are OFF
        //
        XxDisableInterrupts( pDevExt );

        //
        // Connect to an Interrupt object...        // ❹
        //
        status = IoConnectInterrupt(
                    &pDevExt->pInterrupt,
                    XxIsr,
                    pDevExt,
                    NULL,
                    DeviceBlock->SystemVector,
                    DeviceBlock->Dirql,
                    DeviceBlock->Dirql,
                    DeviceBlock->InterruptMode,
                    DeviceBlock->ShareVector,
                    DeviceBlock->Affinity,
                    DeviceBlock->FloatingSave );
        if( !NT_SUCCESS( status )) {
            IoDeleteSymbolicLink( &linkName );
            IoDeleteDevice( pDevObj );
            return status;
        }

        //
        // Initialize the hardware and enable interrupts
        //
        KeSynchronizeExecution(
            pDevExt->pInterrupt,
            XxInitDevice,
            pDevExt );

        return status;
    }

❶ Choose a FILE_DEVICE_XXX value that's as close as possible to the type of device your driver manages.

❷ Select an I/O method for data transfer operations. In this case, we'll let the I/O Manager copy things to and from user space for us.

❸ The Config list will be going away soon, so we need to move anything important into the Device Extension. At the least, this includes the control register base address; for DMA devices it would also include the Adapter object pointer and count of mapping registers. More on this in Chapter 12.

❹ Chapters 7 and 9 will explain more about interrupt processing.

6.3 WRITING REINITIALIZE ROUTINES

Intermediate-level drivers loading at system boot time may need to delay their initialization until one or more lower-level drivers have finished loading. If all the drivers belong to you, you can determine their load sequence by setting various Registry entries at installation. But if you don't own all the underlying drivers, your intermediate driver will need a Reinitialize routine.

Execution Context

If your DriverEntry routine discovers that it can't finish its initialization because system bootstrapping hasn't yet gone far enough, it can register a Reinitialize routine by calling IoRegisterDriverReinitialization. The I/O Manager will call the Reinitialize routine at some later point during the bootstrap. As you can see from Table 6.3, the Reinitialize routine runs at PASSIVE_LEVEL IRQL, which means it has access to paged system resources. Reinitialize routines are useful only for drivers that load automatically at system boot.

What a Reinitialize Routine Does

The Reinitialize routine can perform any driver initialization that the DriverEntry routine was unable to complete. If the Reinitialize routine discovers that the environment still isn't suitable, it can call IoRegisterDriverReinitialization to register itself again.
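The register-again pattern is easier to see in a small model. The sketch below is a user-mode simulation, not kernel code: SimRegisterReinit and SimRunPendingReinit are hypothetical stand-ins for IoRegisterDriverReinitialization and for the I/O Manager reaching "some later point" in the bootstrap, and DriverState is an invented structure. It assumes that a registration is good for one call only and that Count is zero-based, as Table 6.3 describes.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of a Reinitialize routine in this model (no Driver object). */
typedef void (*ReinitRoutine)(void *context, unsigned long count);

/* Hypothetical stand-in for the I/O Manager's bookkeeping. */
typedef struct {
    ReinitRoutine routine;   /* pending Reinitialize routine, or NULL */
    void *context;           /* context supplied at registration */
    unsigned long calls;     /* how many times the routine has run */
} SimIoManager;

static SimIoManager g_io;

/* Models IoRegisterDriverReinitialization: remember one pending call. */
static void SimRegisterReinit(ReinitRoutine routine, void *context)
{
    assert(routine != NULL);
    g_io.routine = routine;
    g_io.context = context;
}

/* Models one later point in the bootstrap. Runs the pending routine
   once and returns 1, or returns 0 if nothing is registered. */
static int SimRunPendingReinit(void)
{
    ReinitRoutine r = g_io.routine;

    if (r == NULL)
        return 0;
    g_io.routine = NULL;              /* a registration is one-shot */
    r(g_io.context, g_io.calls++);
    return 1;
}

/* Invented driver state: the lower-level driver becomes available
   only after a given number of reinitialization passes. */
typedef struct {
    unsigned long lowerDriverReadyAfter;
    int initialized;
} DriverState;

static void XxReinitialize(void *context, unsigned long count)
{
    DriverState *st = context;

    if (count < st->lowerDriverReadyAfter) {
        /* Environment still not suitable: register ourselves again. */
        SimRegisterReinit(XxReinitialize, st);
        return;
    }
    st->initialized = 1;              /* finish the deferred setup */
}
```

In this model, a driver whose lower-level driver appears after two passes is called three times in total: twice to discover that it must wait, and once to finish. A real routine would probably also give up after some number of attempts rather than re-register forever.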
Table 6.3 Function prototype for a Reinitialize routine

VOID XxReinitialize                      IRQL == PASSIVE_LEVEL

Parameter                                Description
IN PDRIVER_OBJECT DriverObject           Pointer to Driver object
IN PVOID Context                         Context block specified at registration
IN ULONG Count                           Zero-based count of reinitialization calls
Return value                             (none)

6.4 WRITING AN UNLOAD ROUTINE

By default, once a driver is loaded, it remains in the system until a reboot occurs. To make a driver unloadable, you need to write an Unload routine and store a pointer to the routine in your Driver object's DriverUnload field. The I/O Manager will then call this routine in response to an unload request from the Control Panel's Devices applet. If your driver will never be unloaded, then you can forget about this routine.

Execution Context

The I/O Manager calls your Unload routine once when it unloads the driver, usually because someone is playing with the Control Panel Devices applet. As you can see from Table 6.4, the Unload routine runs at PASSIVE_LEVEL IRQL, which means it has access to paged system resources.

Table 6.4 Function prototype for an Unload routine

VOID XxUnload                            IRQL == PASSIVE_LEVEL

Parameter                                Description
IN PDRIVER_OBJECT DriverObject           Pointer to Driver object for this driver
Return value                             (none)

What an Unload Routine Does

Although the exact details will vary slightly from driver to driver, in general you should perform the following steps in your Unload routine:

1. For some kinds of hardware, you may need to save the state of the device in the Registry. That way, you'll be able to put the device back in the same state the next time your DriverEntry routine executes. For example, an audio card driver might save the current volume setting of the card.
2. Disable interrupts from the device and disconnect the device from its Interrupt object. It's crucial that the device not generate any interrupt requests once the Interrupt object is gone.
3. Deallocate any hardware belonging to your driver.
4. Use IoDeleteSymbolicLink to remove the device from the Win32 namespace.
5. Remove the Device object itself using IoDeleteDevice.
6. If you're managing multiunit controllers, repeat steps 4 and 5 for each device attached to the controller. Then remove the Controller object itself using IoDeleteController.
7. Repeat steps 4-6 for all controllers and devices that belong to your driver.
8. Deallocate any pool memory held by the driver.

Keep in mind that your Unload routine will not be called at system shutdown time. If you need to do any special work at system shutdown, you'll need to write a Shutdown routine.

6.5 CODE EXAMPLE: DRIVER CLEANUP

This example shows how a simple driver removes itself from the system. You can find the complete code for this example in the CH06 directory on the disk that accompanies this book.

UNLOAD.C

The functions in this module basically just undo the work that was performed in the DriverEntry code. Again, the code assumes there aren't any Controller objects to deal with.

XxUnload

In this case, the Unload routine is just a wrapper for calling XxReleaseHardware.

    VOID
    XxDriverUnload(
        IN PDRIVER_OBJECT DriverObject
        )
    {
        //
        // Stop interrupt processing and release hardware
        //
        XxReleaseHardware( DriverObject );
    }

XxReleaseHardware

The real cleanup work done by the driver happens in this routine. It's been separated out as a helper routine because parts of the driver initialization code need to perform the same kinds of cleanup.
    VOID
    XxReleaseHardware(
        IN PDRIVER_OBJECT DriverObject
        )
    {
        PDEVICE_OBJECT pDevObj;
        PDEVICE_EXTENSION pDevExt;
        UNICODE_STRING linkName;
        WCHAR linkNameBuffer[ XX_MAX_NAME_LENGTH ];
        UNICODE_STRING number;
        WCHAR numberBuffer[10];
        CM_RESOURCE_LIST ResList;
        BOOLEAN bConflict;

        linkName.Buffer = linkNameBuffer;
        linkName.MaximumLength = XX_MAX_NAME_LENGTH;
        number.Buffer = numberBuffer;
        number.MaximumLength = 10;

        pDevObj = DriverObject->DeviceObject;       // ❶

        //
        // Traverse the list of Device objects
        // and clean up each one in turn...
        //
        while( pDevObj != NULL ) {
            pDevExt = pDevObj->DeviceExtension;

            //
            // Add code here to save the state of
            // the hardware in the Registry and/or
            // to set the hardware into a known condition.
            //

            //
            // Stop handling interrupts from device
            //
            XxDisableInterrupts( pDevExt );
            IoDisconnectInterrupt( pDevExt->pInterrupt );

            //
            // Deallocate hardware resources belonging     // ❷
            // only to this Device object...
            //
            ResList.Count = 0;      // Build an empty list
            IoReportResourceUsage(
                NULL,               // Default class name
                DriverObject,       // Ptr to Driver object
                NULL,               // No driver resources
                0,
                pDevObj,            // Ptr to Device object
                &ResList,           // Device resources
                sizeof( ResList ),
                FALSE,              // Don't override conflicts
                &bConflict );       // Junk, but required

            //
            // Form the Win32 symbolic link name.
            //
            linkName.Length = 0;
            RtlAppendUnicodeToString(
                &linkName,
                XX_WIN32_DEVICE_NAME );

            //
            // Attach Win32 device number to the
            // end of the name; DOS device numbers
            // are one greater than NT numbers...
            //
            number.Length = 0;
            RtlIntegerToUnicodeString(
                pDevExt->NtDeviceNumber + 1,
                10,
                &number );
            RtlAppendUnicodeStringToString(
                &linkName,
                &number );

            //
            // Remove symbolic link from Object
            // namespace...
            //
            IoDeleteSymbolicLink( &linkName );

            //
            // Get address of next Device object
            // and get rid of the current one.
            //
            pDevObj = pDevObj->NextDevice;
            IoDeleteDevice( pDevExt->DeviceObject );
        }

        //
        // Deallocate hardware resources owned             // ❷
        // by the Driver object...
        //
        ResList.Count = 0;          // Build an empty list
        IoReportResourceUsage(
            NULL,                   // Default class name
            DriverObject,           // Pointer to Driver object
            &ResList,               // Driver resources
            sizeof( ResList ),
            pDevObj,                // Pointer to Device object
            NULL,                   // Device resources
            0,
            FALSE,                  // Don't override conflicts
            &bConflict );           // Junk, but required
    }

❶ We're going to run the linked list of Device objects in order to do our cleanup. Get the first Device object from the Driver object.

❷ The mechanics of actually releasing allocated hardware will be the subject of Chapter 7. For the moment, just treat these two calls to IoReportResourceUsage as a piece of necessary magic.

6.6 WRITING SHUTDOWN ROUTINES

If your driver has any special processing to do before the operating system disappears, you'll need to write a Shutdown routine.

Execution Context

The I/O Manager calls your Shutdown routine once during system shutdown. As you can see from Table 6.5, the Shutdown routine runs at PASSIVE_LEVEL IRQL, which means it has access to paged system resources.
Table 6.5 Function prototype for a Shutdown routine

NTSTATUS XxShutdown                      IRQL == PASSIVE_LEVEL

Parameter                                Description
IN PDEVICE_OBJECT DeviceObject           Pointer to the Device object that registered for shutdown notification
IN PIRP Irp                              Pointer to shutdown IRP
Return value                             • STATUS_SUCCESS - success
                                         • STATUS_XXX - appropriate error code

What a Shutdown Routine Does

The main purpose of a Shutdown routine is to put the device into a known state and perhaps store some device information in the Registry. Again, saving the current volume settings from a sound card is a good example of something a Shutdown routine would do.

Unlike the driver's Unload routine, Shutdown routines don't have to worry about releasing driver resources because the operating system is about to disappear anyway.

Enabling Shutdown Notification

If you examine the fields in the Driver object, it won't be obvious where the address of your Shutdown routine should go. That's because shutdown notifications are delivered to your driver in the form of an I/O request whose function code is IRP_MJ_SHUTDOWN. This means that your Shutdown routine is really a Dispatch routine which needs to be added to the Driver object's MajorFunction array. Like any other Dispatch routine, it receives a Device object and an IRP.

But wait, it doesn't stop there. You also need to tell the I/O Manager that you're interested in receiving shutdown notifications. You do this by calling IoRegisterShutdownNotification with one of your Device objects. The following code fragment, taken from a DriverEntry routine, shows how to enable shutdown notifications in your driver.

    NTSTATUS
    DriverEntry(
        IN PDRIVER_OBJECT pDO,
        IN PUNICODE_STRING RegistryPath
        )
    {
        ...
        pDO->MajorFunction[ IRP_MJ_SHUTDOWN ] = XxShutdown;

        //
        // pDevObj is a Device object created earlier
        // with IoCreateDevice; the registration is
        // per Device object, not per Driver object.
        //
        IoRegisterShutdownNotification( pDevObj );
        ...
    }

6.7 TESTING THE DRIVER

Even though your driver is far from being complete, there are still a few things you can do at this point to verify its operation.
In particular, you can test the driver to be sure that it

• Compiles and links successfully
• Loads and unloads without crashing the system
• Creates Device objects and Win32 symbolic links
• Releases any resources when it unloads

These goals may not seem very ambitious, but once you've reached them, you know you have a solid base on which to build the rest of your driver.

Testing Procedure

You can use the following procedure to test your driver. If any of the steps fail, or if you crash the system, find and correct the problem before going on to the next phase of the test.

1. Write a SOURCES file for your driver.
2. Use the BUILD utility to create the driver file.
3. Move the driver to its target destination.
4. Install the driver using REGEDT32. Specify manual loading.
5. Reboot the system.
6. Use the Control Panel Devices applet to load and start the driver.
7. Use WINOBJ to see if your driver has created a Device object and its Win32 symbolic link.
8. Stop the driver using the Control Panel Devices applet.
9. Examine the Object Manager's namespace with WINOBJ to be certain the driver has removed any objects it created.

The WINOBJ Utility

WINOBJ is a tool that comes with the Win32 SDK (not the DDK). This little gem lets you view the NT Object Manager's namespace and determine whether your driver has created its Device object and symbolic link. Microsoft supplies executable versions of WINOBJ for the Alpha, Intel, and MIPS architectures. Unfortunately, you won't find any source code for WINOBJ since it makes direct calls to some native NT system services.

To use WINOBJ, just run the executable. The program will display the window pictured in Figure 6.2. The left pane shows the NT object directory in the form of file folders. Double-clicking on a particular folder will show its contents in the right window pane.
Double-clicking on some objects in the right-hand pane will display additional information about the object.3 As a driver writer, you'll be mainly interested in the driver, DosDevices, and device directories.

3 WINOBJ is a little "throw-away" application that someone at Microsoft wrote. It doesn't know how to display information about all object types, nor do all of its informational displays make sense. Unfortunately, because it uses some of the "secret" NtXxx system calls, its source code isn't included with the SDK.

[Figure 6.2 Main window of the WINOBJ utility. The left pane lists the Object Manager's directories; the right pane lists the objects in the selected device directory.]

6.8 SUMMARY

At this point, your driver is on its way. It can initialize itself and present both NT and Win32 devices to the system. Depending on your specific needs, it may also be able to perform various cleanup operations, either when it's unloaded manually or when the system shuts down.

Unfortunately, your driver still can't locate the hardware it's supposed to be managing. This is a serious deficiency for a device driver, and it's one we'll see how to remedy in the next chapter.

CHAPTER 7

Hardware Initialization

One of the first things a device driver does is to locate any devices it has to manage. This means finding their control registers, determining their DMA capabilities and the IRQ levels at which they interrupt, and locating any device-specific memory.
In other words, the driver has to come up with a list of the hardware resources used by its devices. This turns out to be a much easier task if the hardware is auto-detectable. This chapter explains how to determine the resources needed by a device regardless of whether it auto-detects or not.

However, it's not enough to know what resources a device uses. Device drivers also have to claim ownership of any hardware resources they plan to use, in order to avoid collisions with other drivers. At the end of this chapter, you'll learn how to allocate and deallocate system hardware.

7.1 FINDING AUTO-DETECTED HARDWARE

During system bootstrap, NT goes to a lot of trouble to figure out what kinds of peripherals are attached to the system. This section explains how the process works and how your driver can access auto-detected hardware information.

How Auto-Detection Works

The exact mechanism used for detecting hardware depends on the platform architecture. On 80x86 systems, a bootstrap component called NTDETECT gathers information about the hardware environment, while on RISC-based machines, the ARC firmware performs a similar function. In either case, the detection component makes this hardware data available to the operating system loader, which in turn writes it into the \HARDWARE\DESCRIPTION area of the Registry. Later, device drivers can use this information to control their initialization.

The detection components use whatever methods they can to determine the identity and characteristics of a given system. This includes both interrogating the hardware directly and using information in the ROM BIOS to draw conclusions about devices attached to the system. Among other things, auto-detection tries to determine:

• The number and type of any I/O buses on the system
• Extended information about the bootstrap device itself
• Information about the monitor and video adapter used to display bootstrap messages
• The presence and location of keyboard and mouse hardware
• The number and location of serial and parallel controllers and any recognizable printers or terminals attached to them
• The presence and identity of any network cards
• Information about any other devices on each I/O bus

The specific kinds of data that auto-detection searches for include the address and number of a device's control registers, hardware interrupt levels used by the device, information about a device's DMA capabilities, and any ranges of physical memory used by the device. If the hardware offers any device-specific data, auto-detection will collect that as well.

This is a wonderful scheme, and it promises to make the lives of driver writers much easier in the long run. Later releases of Windows NT will use this strategy as a basis for supporting Plug and Play capabilities. At the moment, however, most ISA devices don't have a lot to say for themselves and therefore don't show up during auto-detection. This means that drivers of ISA devices have to use other means for locating their hardware. Fortunately, PCI, native EISA, and MCA devices are much more talkative.

Auto-Detected Hardware and the Registry

Regardless of how NT auto-detects a given piece of hardware, Registry information about the hardware always has a standard format. This isolates drivers from any bus or platform peculiarities and generally makes life easier for driver writers.

Figure 7.1 shows a portion of the Registry's hardware description area. The keys and subkeys below \System form a tree-structured model of any auto-detectable hardware. Keys with alphanumeric names correspond to general classes of hardware. Hanging from each of these keys will be one or more subkeys whose names are integers. These numeric subkeys identify specific instances of a CPU, a floating-point unit, a bus, a controller, or a device. In the figure, the MultifunctionAdapter key represents a category of buses (in this case ISA), and the subkey 0 below it represents the first actual instance of such a bus. DiskController\0 is connected to this bus, and FloppyPeripheral\0 is attached to this controller.

    HKEY_LOCAL_MACHINE
      HARDWARE
        DESCRIPTION
          System
            MultifunctionAdapter
              0
                DiskController
                  0
                    FloppyPeripheral
                      0
                        ComponentInformation
                        ConfigurationData
                        Identifier

Figure 7.1 Auto-detected hardware data in the Registry

Tucked away in the numeric subkeys, you'll find value items containing any information that NT was able to auto-detect. Three value items can show up in one of these numeric subkeys:

• ComponentInformation - This is binary data that (hopefully) the driver will know how to interpret.
• ConfigurationData - This names the resources needed by the hardware in the form of a REG_FULL_RESOURCE_DESCRIPTOR item.
• Identifier - This is an identifier string generated by the hardware or the system BIOS. It's converted to Unicode when it goes into the Registry.

You can use the Registry editor, REGEDT32, to browse through this auto-detected hardware data. This is very helpful if you're trying to resolve conflicts or make sure that something is auto-detecting properly. Once you've selected a controller or peripheral's numeric subkey, double-clicking on the ComponentInformation value will bring up a display of the resources needed by that piece of hardware.
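The alternating pattern of class key and instance number in Figure 7.1 can be sketched as a path-building helper. This is a user-mode illustration only. The function name XxFormatHardwarePath is made up for this example, the path is built as an ordinary C string rather than a UNICODE_STRING, and the \Registry\Machine prefix is the kernel-mode name for HKEY_LOCAL_MACHINE.

```c
#include <assert.h>
#include <stdio.h>

/* Build the Registry path of an auto-detected peripheral from its
   bus, controller, and peripheral class keys and instance numbers.
   Returns 0 on success, -1 if the buffer is too small. */
static int XxFormatHardwarePath(char *buf, size_t size,
                                const char *busKey,
                                unsigned long busNumber,
                                const char *controllerKey,
                                unsigned long controllerNumber,
                                const char *peripheralKey,
                                unsigned long peripheralNumber)
{
    int n;

    assert(buf != NULL && busKey != NULL);
    assert(controllerKey != NULL && peripheralKey != NULL);

    /* Each level is a class key followed by a numeric instance subkey. */
    n = snprintf(buf, size,
                 "\\Registry\\Machine\\HARDWARE\\DESCRIPTION\\System"
                 "\\%s\\%lu\\%s\\%lu\\%s\\%lu",
                 busKey, busNumber,
                 controllerKey, controllerNumber,
                 peripheralKey, peripheralNumber);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Called with MultifunctionAdapter, 0, DiskController, 0, FloppyPeripheral, and 0, the helper yields the path of the floppy drive shown in the figure. In practice a driver rarely needs to build these paths by hand, because the I/O Manager can search this part of the Registry on its behalf.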
Querying the Hardware Database

Although you're free to wander through the hardware description area using RtlXxx and ZwXxx routines, IoQueryDeviceDescription (shown in Table 7.1) makes the process a little less painful. You give this function a pattern describing the kind of hardware information you want, and a callback routine. IoQueryDeviceDescription will then rummage around in the Registry and invoke your callback routine each time it finds something that matches the pattern.

You tell IoQueryDeviceDescription what level of detail you want by using the XxxType arguments listed in Table 7.2. Only the following combinations will work:

• BusType alone gets just bus-level information.1
• BusType and ControllerType together get bus and controller information.
• BusType, ControllerType, and PeripheralType together will give you device-level information.

Table 7.1 Prototype for IoQueryDeviceDescription

NTSTATUS IoQueryDeviceDescription        IRQL == PASSIVE_LEVEL

Parameter                                     Description
IN PINTERFACE_TYPE BusType                    Desired bus architecture (see below)
IN PULONG BusNumber                           Zero-based bus number
IN PCONFIGURATION_TYPE ControllerType         Desired controller type (see below)
IN PULONG ControllerNumber                    Zero-based controller number
IN PCONFIGURATION_TYPE PeripheralType         Desired device type (see below)
IN PULONG PeripheralNumber                    Zero-based device number
IN PIO_QUERY_DEVICE_ROUTINE Callback          Address of ConfigCallback routine
IN PVOID Context                              Address of driver's configuration buffer
Return value                                  • STATUS_OBJECT_NAME_NOT_FOUND
                                              • STATUS_XXX from ConfigCallback

1 To get information about all the buses on a machine, call IoQueryDeviceDescription in a loop and iterate the BusType from zero to MaximumInterfaceType. Alternatively, you can use the HalQuerySystemInformation function to get an explicit list of the buses on the machine.
Table 7.2 Bus, controller, and peripheral types for IoQueryDeviceDescription

XxxType arguments for IoQueryDeviceDescription

BusType          ControllerType         PeripheralType
CBus             AudioController        DiskPeripheral
Eisa             CdromController        FloppyDiskPeripheral
Internal         DiskController         KeyboardPeripheral
Isa              DisplayController      LinePeripheral
MicroChannel     KeyboardController     ModemPeripheral
MPIBus           NetworkController      MonitorPeripheral
MPSABus          ParallelController     NetworkPeripheral
NuBus            PointerController      PointerPeripheral
PCIBus           SerialController       PrinterPeripheral
PCMCIABus        TapeController         TapePeripheral
TurboChannel     WormController         TerminalPeripheral
VMEBus           OtherController        OtherPeripheral

Notice that the XxxType arguments are pointers to variables and not the values themselves. You pass a NULL pointer to indicate that you don't want a particular kind of information.

You can get data about specific buses, controllers, or devices using one or more of the XxxNumber parameters. These arguments are pointers to variables containing the number of the bus, controller, or device that you're asking about. Passing a NULL pointer causes the I/O Manager to enumerate all items of a particular type.

To see how this works, suppose you call IoQueryDeviceDescription and specify BusType as Eisa, BusNumber as 0, ControllerType as DiskController, and NULL for the ControllerNumber. The I/O Manager will call your ConfigCallback routine once for each disk controller on EISA bus 0. With each invocation, the callback will receive data about EISA bus 0 and one particular controller, but nothing about any devices connected to that controller. Since multiple disk controllers can be attached to a single bus, the ConfigCallback might get the same bus information more than once, even though the controller information will be different each time.
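The NULL-means-enumerate rule is easy to model outside the kernel. The sketch below is a toy, not the real I/O Manager logic: SimQueryControllers, SimController, and the three-entry hardware table are all invented for illustration, and the function simply reports how many times it invoked the callback.

```c
#include <assert.h>
#include <stddef.h>

/* One auto-detected controller in a toy hardware database. */
typedef struct {
    unsigned long bus;          /* zero-based bus number */
    unsigned long controller;   /* zero-based controller number */
} SimController;

typedef void (*SimCallback)(void *context,
                            unsigned long bus,
                            unsigned long controller);

/* Invented inventory: two controllers on bus 0, one on bus 1. */
static const SimController g_db[] = {
    { 0, 0 }, { 0, 1 }, { 1, 0 },
};

/* A NULL number pointer means "enumerate every item of this kind";
   a non-NULL pointer restricts the search to that one number.
   Returns the number of callback invocations. */
static int SimQueryControllers(const unsigned long *busNumber,
                               const unsigned long *controllerNumber,
                               SimCallback callback,
                               void *context)
{
    size_t i;
    int matches = 0;

    assert(callback != NULL);
    for (i = 0; i < sizeof g_db / sizeof g_db[0]; i++) {
        if (busNumber != NULL && g_db[i].bus != *busNumber)
            continue;
        if (controllerNumber != NULL &&
            g_db[i].controller != *controllerNumber)
            continue;
        callback(context, g_db[i].bus, g_db[i].controller);
        matches++;
    }
    return matches;
}

/* Example callback: tally the invocations in an int counter. */
static void CountCallback(void *context,
                          unsigned long bus,
                          unsigned long controller)
{
    (void)bus;
    (void)controller;
    ++*(int *)context;
}
```

Passing a pointer to 0 for the bus and NULL for the controller visits both controllers on bus 0, which mirrors the EISA example above: the bus part of the answer repeats while the controller part changes.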
Now, suppose you make the same call to IoQueryDeviceDescription, but this time you further restrict the search by specifying PeripheralType as FloppyDiskPeripheral and NULL for the PeripheralNumber. In this case, your ConfigCallback will be called for each floppy drive on EISA bus 0. Along with bus and controller data, each call will receive information about a different floppy disk device. In this case, both the bus and controller information may be repeated for multiple calls (because several floppies can share the same controller).

If IoQueryDeviceDescription can't find anything in the Registry that matches your request, it returns STATUS_OBJECT_NAME_NOT_FOUND without invoking the ConfigCallback routine. Otherwise, it continues to execute your callback until it runs out of matching items, or until your callback returns a value other than STATUS_SUCCESS. Either way, it's supposed to return the last NTSTATUS value sent back by your callback routine.

That's the theory. In practice, if you pass a NULL BusNumber parameter, you always get STATUS_OBJECT_NAME_NOT_FOUND from IoQueryDeviceDescription. This value comes back regardless of whether your callback was invoked, and it supersedes whatever status value your callback might have returned. This problem doesn't occur with the other two XxxNumber arguments. For this reason, the code example in the next section manually iterates both BusType and BusNumber.

What a ConfigCallback Routine Does

Each time IoQueryDeviceDescription invokes your ConfigCallback routine, it passes the arguments listed in Table 7.3.
Table 7.3 Function prototype for a configuration callback

NTSTATUS XxConfigCallback                                  IRQL == PASSIVE_LEVEL

Parameter                                                  Description
IN PVOID Context                                           Address of configuration buffer
IN PUNICODE_STRING PathName                                Registry path for bus, controller, or device information
IN INTERFACE_TYPE BusType                                  Bus architecture
IN ULONG BusNumber                                         Zero-based bus number
IN PKEY_VALUE_FULL_INFORMATION *BusInformation             Pointer to Registry information
IN CONFIGURATION_TYPE ControllerType                       Controller type
IN ULONG ControllerNumber                                  Zero-based controller number
IN PKEY_VALUE_FULL_INFORMATION *ControllerInformation      Pointer to Registry information
IN CONFIGURATION_TYPE PeripheralType                       Device type
IN ULONG PeripheralNumber                                  Zero-based device number
IN PKEY_VALUE_FULL_INFORMATION *PeripheralInformation      Pointer to Registry information
Return value                                               • STATUS_SUCCESS
                                                           • STATUS_XXX - error code

These arguments are valid only within the ConfigCallback routine itself, so you have to store any configuration data that you'll need later in a temporary buffer. Usually, you allocate this buffer somewhere in your DriverEntry routine and pass its address as the Context argument to IoQueryDeviceDescription.

Although the specific steps will depend on the hardware you're working with, a ConfigCallback routine generally does the following:

1. It scans the Registry information for base-register address, count of registers, interrupt level and vector information, and DMA channel requirements.
2. The ConfigCallback then stores the Registry values in the Config block allocated by DriverEntry.
3. It translates the Registry's bus-specific values into systemwide values that your driver can use and stores these values in the Config block as well.

Each time IoQueryDeviceDescription calls your ConfigCallback routine, you repeat this procedure for a new controller or device that matches your query.
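The reason for copying everything into the Context buffer can be shown with a small user-mode model. This is a sketch under stated assumptions, not driver code: ToyRecord, ToyConfig, toy_config_callback, and toy_enumerate are hypothetical names, and the enumerator only imitates two documented behaviors (the data handed to the callback is transient, and enumeration stops when the callback returns a nonzero status).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DEVICES 4   /* hypothetical capacity of the config buffer */

/* Hypothetical per-device data handed to the callback. */
typedef struct {
    unsigned long port;
    unsigned long irq;
} ToyRecord;

/* Caller-owned buffer, playing the role of the Context argument. */
typedef struct {
    size_t    count;
    ToyRecord device[TOY_MAX_DEVICES];
} ToyConfig;

typedef int (*ToyCallback)(void *context, const ToyRecord *transient);

/* Copy by value: the transient pointer is valid only during this call. */
static int toy_config_callback(void *context, const ToyRecord *transient)
{
    ToyConfig *cfg = context;

    assert(cfg != NULL && transient != NULL);
    if (cfg->count >= TOY_MAX_DEVICES)
        return 0;   /* no room left: skip this device, keep enumerating */
    cfg->device[cfg->count++] = *transient;
    return 0;       /* 0 plays the role of STATUS_SUCCESS */
}

/* Toy enumerator: reuses one scratch record and stops on a nonzero status. */
static int toy_enumerate(const ToyRecord *found, size_t n,
                         ToyCallback cb, void *context)
{
    ToyRecord scratch;
    int status = 0;

    for (size_t i = 0; i < n && status == 0; i++) {
        scratch = found[i];
        status = cb(context, &scratch);
        memset(&scratch, 0, sizeof scratch);   /* data vanishes after the call */
    }
    return status;
}
```

Had the callback saved the transient pointer instead of the contents, every slot would end up referring to the same wiped scratch record. Copying by value into the caller's buffer avoids that.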
Using Configuration Data

Your main sources of information in a ConfigCallback routine come from the various XxxType, XxxNumber, and XxxInformation arguments. The meaning of the XxxType and XxxNumber items should be pretty obvious, but the XxxInformation arguments need some explanation.

Each XxxInformation argument is actually a pointer which may or may not be NULL, depending on what you've asked for. If you follow this pointer, you come to an array of three items. Use one of these predefined constants to index into this array:

• IoQueryDeviceIdentifier - Points to any auto-detected hardware name information stored in the Registry as a Unicode string.
• IoQueryDeviceConfigurationData - Points to any bus-relative Registry information about the bus, controller, or device that was discovered during auto-detection.
• IoQueryDeviceComponentInformation - Points to information about a device's subcomponents.

Of these, IoQueryDeviceConfigurationData is probably the most helpful. Using this constant as an index into one of the XxxInformation arrays gets you a pointer to a KEY_VALUE_FULL_INFORMATION structure which, in turn, contains the actual Registry data about a bus, controller, or device. Figure 7.2 shows how this works for the ControllerInformation argument to a ConfigCallback routine.

Figure 7.2 Hardware information given to a configuration callback
[Diagram: ControllerInformation[IoQueryDeviceConfigurationData] leads, by way of DataOffset, to a CM_FULL_RESOURCE_DESCRIPTOR, which contains a CM_PARTIAL_RESOURCE_LIST holding the CM_PARTIAL_RESOURCE_DESCRIPTOR items.]

The group of CM_PARTIAL_RESOURCE_DESCRIPTOR items hanging from the bottom of this whole mess contains the actual hardware information you're looking for. As you can see from Table 7.4, each descriptor identifies one kind of hardware resource used by the device. To extract this data, you need to do a little pointer arithmetic and then examine each of the partial resource descriptors.

Table 7.4 Contents of a partial resource descriptor

CM_PARTIAL_RESOURCE_DESCRIPTOR

Field                            Description
UCHAR Type                       Identifies resource being described:
                                 • CmResourceTypePort
                                 • CmResourceTypeInterrupt
                                 • CmResourceTypeDma
                                 • CmResourceTypeMemory
                                 • CmResourceTypeDeviceSpecificData
UCHAR ShareDisposition           Level of sharing for this resource:
                                 • CmResourceShareDeviceExclusive
                                 • CmResourceShareDriverExclusive
                                 • CmResourceShareShared
USHORT Flags                     Type-specific values
union u                          Union based on Type field
  struct Port                    • Control register address and span
  struct Interrupt               • Interrupt level and vector
  struct Dma                     • DMA channel and port
  struct Memory                  • Device memory address and span
  struct DeviceSpecificData      • Device-specific information

There's something you need to be aware of when you start pulling information from Partial Resource Descriptors: The partial descriptors are in no particular order, so you need to walk through all of them to find the information you want. The only exception to this is device-specific data, which if present, will always be the last partial descriptor.²

Translating Configuration Data

After you've pulled all this data from the Registry, there's still one more step. The information in the partial descriptors is all bus-relative, just the way the auto-detection component found it. To use these values in your driver, you need to translate them into their systemwide equivalents.
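Because the partial descriptors arrive in no particular order, the scan has to visit every entry and remember what it has seen. The sketch below models that walk in portable C with hypothetical stand-in types (ToyResType, ToyPartialDesc, ToyDeviceInfo, toy_scan). It is not the kernel's CM_PARTIAL_RESOURCE_DESCRIPTOR layout, and it leaves out the bus-relative to systemwide translation step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CmResourceTypeXxx values. */
enum ToyResType {
    ToyPort = 1, ToyInterrupt, ToyDma, ToyMemory, ToyDeviceSpecific
};

/* Simplified descriptor: one type tag plus two type-specific values. */
typedef struct {
    enum ToyResType type;
    unsigned long   a;   /* port start,  or interrupt level  */
    unsigned long   b;   /* port length, or interrupt vector */
} ToyPartialDesc;

/* What the scan collects for one device. */
typedef struct {
    unsigned long portBase, portSpan;
    unsigned long level, vector;
    int foundPort, foundInterrupt;
} ToyDeviceInfo;

/*
 * Walk ALL descriptors (the order is not guaranteed), pick out the port and
 * interrupt entries, and ignore anything unrecognized. Returns nonzero only
 * if both required resources were found.
 */
static int toy_scan(const ToyPartialDesc *list, size_t count,
                    ToyDeviceInfo *out)
{
    assert(out != NULL);
    assert(list != NULL || count == 0);

    out->foundPort = 0;
    out->foundInterrupt = 0;

    for (size_t i = 0; i < count; i++) {
        switch (list[i].type) {
        case ToyPort:
            out->portBase = list[i].a;
            out->portSpan = list[i].b;
            out->foundPort = 1;
            break;
        case ToyInterrupt:
            out->level = list[i].a;
            out->vector = list[i].b;
            out->foundInterrupt = 1;
            break;
        default:
            break;   /* not needed for this device */
        }
    }
    return out->foundPort && out->foundInterrupt;
}
```

The two "found" flags are the important design choice: a device is accepted only when every required resource has turned up somewhere in the list, regardless of position.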
To perform the translation, you need to call some of the following functions:

• HalTranslateBusAddress - Converts device memory and register addresses from bus-relative to systemwide values.
• HalGetInterruptVector - Converts bus-specific interrupt information into system-assigned values for the vector, DIRQL, and affinity mask. Chapter 9 explains how to use these values to connect to an Interrupt object.
• HalGetAdapter - Locates an Adapter object your driver can use to perform DMA operations with a specific device. Chapter 12 explains how to use this function.

It's worth mentioning that, in some environments, some of these translations may not do very much, but for portability, you need to perform them anyway.

² This is because device-specific data is variable in length. Another implication is that there can be only one device-specific data item in a group of partial resource descriptors.

7.2 CODE EXAMPLE: LOCATING AUTO-DETECTED HARDWARE

This rather long example shows how to pull auto-detected hardware information from the Registry. Specifically, it looks for all the hardware of type ParallelController. You can find these files in the CH07 directory on the disk that accompanies this book.

XXDRIVER.H

The following excerpts from the driver's header file show the driver-defined data structures involved in hardware configuration.³

DEVICE_BLOCK
This temporary structure is carved out of paged pool and is used only during driver initialization. It holds information about one specific piece of hardware. Some of the items in this block will later be copied into the Device Extension block for safekeeping.
    typedef struct _DEVICE_BLOCK {
        //
        // Original values pulled from the Registry
        //
        PHYSICAL_ADDRESS OriginalPortBase;
        ULONG PortSpan;
        ULONG OriginalIrql;
        ULONG OriginalVector;
        KINTERRUPT_MODE InterruptMode;
        BOOLEAN ShareVector;
        BOOLEAN FloatingSave;
        ULONG OriginalDmaChannel;
        //
        // Converted values that will be used by
        // the driver
        //
        PUCHAR PortBase;        // First control register
        ULONG SystemVector;
        KIRQL Dirql;
        KAFFINITY Affinity;
    } DEVICE_BLOCK, *PDEVICE_BLOCK;

CONFIG_ARRAY
This structure is an array of DEVICE_BLOCKs that hold temporary information about all the hardware belonging to the driver on one particular bus. In theory, multiple devices might show up on different buses, in which case there would be a linked list of CONFIG_ARRAYs. The Count field keeps track of how many DEVICE_BLOCKs actually contain valid data.

    typedef struct _CONFIG_ARRAY {
        //
        // We keep a list of these arrays, one
        // for each bus-type/bus-number combination
        // where we find our hardware.
        //
        struct _CONFIG_ARRAY *NextConfigArray;
        //
        // The bus to which all the devices in this
        // array are attached.
        //
        INTERFACE_TYPE BusType;
        ULONG BusNumber;
        //
        // Number of devices in this array
        //
        ULONG Count;
        //
        // One array-element for each device
        //
        DEVICE_BLOCK Device[ XX_MAXIMUM_DEVICES ];
    } CONFIG_ARRAY, *PCONFIG_ARRAY;

³ You'll notice some DMA-related fields in the following structures. Since the parallel port doesn't perform any DMA, these won't be used. Chapter 12 will show you how to fill them in.

DEVICE_EXTENSION
This driver-defined structure is created from nonpaged pool by IoCreateDevice and automatically attached to our Device object. It holds information that will be needed throughout the life of the driver.
    typedef struct _DEVICE_EXTENSION {
        PDEVICE_OBJECT DeviceObject;    // Back pointer
        ULONG NtDeviceNumber;           // Zero-based device num
        PUCHAR PortBase;                // First control register
        PKINTERRUPT pInterrupt;         // Interrupt object
        PADAPTER_OBJECT pAdapter;       // DMA Adapter object
        ULONG cMapRegs;                 // Count of mapping regs
        UCHAR DeviceStatus;             // Most recent status
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

AUTOCON.C

This group of functions scans the Registry's hardware description map for all the parallel controllers. It fills in a separate DEVICE_BLOCK for each piece of hardware it finds. The result is a linked list of CONFIG_ARRAYs describing all the parallel controllers on all buses in this machine.

XxGetHardwareInfo
This routine just loops through all the known bus types and checks to see if one or more of our devices live on each bus. This is mainly a harness for the call to IoQueryDeviceDescription.

    NTSTATUS
    XxGetHardwareInfo(
        IN PUNICODE_STRING RegistryPath,    // (unused)
        OUT PCONFIG_ARRAY *ConfigList
        )
    {
        INTERFACE_TYPE InterfaceType;
        ULONG InterfaceNumber;
        CONFIGURATION_TYPE CtrlrType = ParallelController;  // (1)
        PCONFIG_ARRAY ConfigArray;
        NTSTATUS status;

        *ConfigList = NULL;     // No devices located yet
        //
        // Run through all the various bus types and
        // see if our device is on any of them...
        //
        for( InterfaceType = 0;
             InterfaceType < MaximumInterfaceType;
             InterfaceType++ )
        {
            InterfaceNumber = 0;
            do {
                status = IoQueryDeviceDescription(          // (2)
                            &InterfaceType,
                            &InterfaceNumber,
                            &CtrlrType,
                            NULL,
                            NULL,
                            NULL,
                            XxConfigCallback,
                            ConfigList );
                //
                // Return to caller if a real
                // error occurs
                //
                if( !NT_SUCCESS( status )                   // (3)
                    && status != STATUS_OBJECT_NAME_NOT_FOUND )
                {
                    XxReleaseHardwareInfo( *ConfigList );
                    *ConfigList = NULL;
                    return status;
                }
                InterfaceNumber++;
            } while( status != STATUS_OBJECT_NAME_NOT_FOUND );
        } // end of for-loop

        if( *ConfigList == NULL )
            return STATUS_NO_SUCH_DEVICE;
        else
            return STATUS_SUCCESS;
    }

(1) This is the hardware category. Notice that the parallel port is considered to be a controller rather than a device.
(2) Since we're specifying a controller type, our callback will be invoked once for each piece of hardware on the current bus that matches the ParallelController type.
(3) STATUS_OBJECT_NAME_NOT_FOUND simply means there is no such item on the current bus, so we keep looking. Other kinds of errors cause us to abort.

XxConfigCallback
This routine gets called by the I/O Manager once for each device that matches the category ParallelController. We have to scan through the Registry data for information about I/O port addresses and interrupt behavior.

    static NTSTATUS
    XxConfigCallback(
        IN PVOID Context,
        IN PUNICODE_STRING PathName,
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PKEY_VALUE_FULL_INFORMATION *BusInfo,
        IN CONFIGURATION_TYPE CtrlrType,
        IN ULONG CtrlrNumber,
        IN PKEY_VALUE_FULL_INFORMATION *CtrlrInfo,
        IN CONFIGURATION_TYPE DeviceType,
        IN ULONG DeviceNumber,
        IN PKEY_VALUE_FULL_INFORMATION *DeviceInfo
        )
    {
        //
        // So we don't have to typecast the context.
        //
        PCONFIG_ARRAY *ConfigList = Context;
        //
        // Short-hand pointers to resource data
        //
        PCM_FULL_RESOURCE_DESCRIPTOR pFrd;
        PCM_PARTIAL_RESOURCE_DESCRIPTOR pPrd;
        PCONFIG_ARRAY ConfigArray;
        PDEVICE_BLOCK DeviceBlock;
        //
        // These booleans will tell us whether we got
        // all the information that we needed.
        //
        BOOLEAN bFoundPort = FALSE;
        BOOLEAN bFoundInterrupt = FALSE;
        NTSTATUS status;
        ULONG i;    // Generic loop control
        //
        // Locate the Config Array for this bus
        //
        status = XxFindMatchingConfigArray(                 // (1)
                    BusType,
                    BusNumber,
                    ConfigList,
                    &ConfigArray );
        if( !NT_SUCCESS( status ))
        {
            return status;
        }
        //
        // See if there's any room left in the Config
        // Array; if not, just drop this device on the
        // floor
        //
        if( ConfigArray->Count >= XX_MAXIMUM_DEVICES )
        {
            return STATUS_SUCCESS;
        }
        //
        // Make it easier to refer to the slot in the
        // Config Array belonging to this device
        //
        DeviceBlock =
            &ConfigArray->Device[ ConfigArray->Count ];
        //
        // Get pointer to beginning of configuration
        // data for this device in the Registry
        //
        pFrd = (PCM_FULL_RESOURCE_DESCRIPTOR)               // (2)
            (((PUCHAR)CtrlrInfo[ IoQueryDeviceConfigurationData ]) +
             CtrlrInfo[ IoQueryDeviceConfigurationData ]->DataOffset );
        //
        // Loop through all Partial Resource Descriptors
        // looking for Port and Interrupt information
        //
        for( i = 0;                                         // (3)
             i < pFrd->PartialResourceList.Count;
             i++ )
        {
            pPrd =
                &pFrd->PartialResourceList.PartialDescriptors[i];
            //
            // Switch on the various partial resource
            // types. Pull out the pieces we need...
            //
            switch( pPrd->Type )                            // (4)
            {
                case CmResourceTypePort:
                    bFoundPort = XxGetPortInfo(
                                    pPrd,
                                    BusType,
                                    BusNumber,
                                    DeviceBlock );
                    break;

                case CmResourceTypeInterrupt:
                    bFoundInterrupt = XxGetInterruptInfo(
                                    pPrd,
                                    BusType,
                                    BusNumber,
                                    DeviceBlock );
                    break;

                default:
                    break;
            } // end of switch
        } // end of for-loop

        if( !( bFoundPort && bFoundInterrupt ))             // (5)
        {
            return STATUS_NO_SUCH_DEVICE;
        }
        //
        // Account for the slot that we've just
        // filled up...
        //
        ConfigArray->Count++;                               // (6)
        return STATUS_SUCCESS;
    }

(1) XxFindMatchingConfigArray is a helper function that locates the Config Array for a specific bus type and number combination. If this is the first time a particular bus has been encountered, it creates an empty Config Array and links it into the caller-supplied Config List.
(2) Create a pointer to the Full Resource Descriptor for this device. To do this, we need to skip over the header information by adding the DataOffset field to the starting address of the block.
(3) The Partial Resource Descriptors are in no particular order, so we have to loop through all of them looking for information about ports and interrupts. Anything we don't recognize, we ignore.
(4) Switch on the Partial Resource type and call a helper function to extract the useful information from it. The parallel controller needs only port and interrupt data; for other devices you might need to add cases for CmResourceTypeDma, CmResourceTypeMemory, or CmResourceTypeDeviceSpecificData.
(5) When the entire scan is complete, check to be sure that all the components have been found. If anything is missing, signal an error.
(6) Each time we successfully locate a device, we use up one more slot in the Config Array. The Count field keeps track of this.

XxGetPortInfo and XxGetInterruptInfo
Here are the two helper functions. Each one simply pulls information out of a specific kind of Partial Resource Descriptor and stores it in the appropriate fields of a DEVICE_BLOCK. They also translate bus-specific values into their systemwide equivalents.
    //++
    // Function: XxGetPortInfo
    //
    // Description:
    //     This function pulls I/O Port information
    //     from a Partial Resource Descriptor
    //
    // Arguments:
    //     Pointer to a Partial Resource Descriptor
    //     Bus type for this device
    //     Bus number for this device
    //     Pointer to this device's slot in Config Array
    //
    // Return Value:
    //     This function returns TRUE if we found the
    //     data we wanted, FALSE otherwise.
    //--
    static BOOLEAN
    XxGetPortInfo(
        IN PCM_PARTIAL_RESOURCE_DESCRIPTOR pPrd,
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PDEVICE_BLOCK DeviceBlock
        )
    {
        PHYSICAL_ADDRESS TranslatedPortBase;
        ULONG uAddressSpace = 1;

        DeviceBlock->OriginalPortBase =
            pPrd->u.Port.Start;
        DeviceBlock->PortSpan =
            pPrd->u.Port.Length;

        if( !HalTranslateBusAddress(
                BusType,
                BusNumber,
                DeviceBlock->OriginalPortBase,
                &uAddressSpace,
                &TranslatedPortBase ))
        {
            return FALSE;
        }

        DeviceBlock->PortBase =
            (PUCHAR)TranslatedPortBase.LowPart;
        return TRUE;
    }

    //++
    // Function: XxGetInterruptInfo
    //
    // Description:
    //     This function pulls Interrupt information
    //     from a Partial Resource Descriptor
    //
    // Arguments:
    //     Pointer to a Partial Resource Descriptor
    //     Bus type for this device
    //     Bus number for this device
    //     Pointer to this device's slot in Config Array
    //
    // Return Value:
    //     This function returns TRUE if we found the
    //     data we wanted, FALSE otherwise.
    //--
    static BOOLEAN
    XxGetInterruptInfo(
        IN PCM_PARTIAL_RESOURCE_DESCRIPTOR pPrd,
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PDEVICE_BLOCK DeviceBlock
        )
    {
        if( pPrd->Flags == CM_RESOURCE_INTERRUPT_LATCHED )
            DeviceBlock->InterruptMode = Latched;
        else
            DeviceBlock->InterruptMode = LevelSensitive;

        DeviceBlock->OriginalIrql =
            pPrd->u.Interrupt.Level;
        DeviceBlock->OriginalVector =
            pPrd->u.Interrupt.Vector;
        DeviceBlock->ShareVector = FALSE;
        DeviceBlock->FloatingSave = FALSE;

        DeviceBlock->SystemVector =
            HalGetInterruptVector(
                BusType,
                BusNumber,
                pPrd->u.Interrupt.Level,
                pPrd->u.Interrupt.Vector,
                &DeviceBlock->Dirql,
                &DeviceBlock->Affinity );
        return TRUE;
    }

7.3 FINDING UNRECOGNIZED HARDWARE

If your device doesn't show up under auto-detection, or if you just need to supplement the auto-detected information, you can hard-code additional information into the Registry. This section explains how.

Adding Driver Parameters to the Registry

One way to tell your driver about hardware is to hard-code the information in a nonvolatile area of the Registry. Although this doesn't seem like a very elegant solution, in the absence of any auto-detection capabilities, it may be your only option. Many ISA devices will require the use of this technique.

The standard convention is to store device information in one or more value entries beneath a subkey called Parameters, which dangles off the driver's service key in the Registry. Figure 7.3 shows how this works.

It's usually up to the driver's installation procedure to set up the Parameters area. For example, suppose your driver works with a device that the user has to configure manually with DIP switches. When the driver's installation program runs, it displays a dialog box asking the user for the port address, IRQ, and DMA settings selected on the device.
It then stores this information in the Parameters area where the driver can find it.

There are no particular standards for the format of driver-specific parameter data. You simply need to store the same kinds of information that your device would generate if it auto-detected. As we've already seen, this can include the addresses of any control registers, the IRQ level used by the device, information about its DMA capabilities, and the address and span of any device memory.

If your driver supports multiple devices, it's probably a good idea to create separate subkeys underneath Parameters for each individual device. In Figure 7.3, these are the Device0 and Device1 subkeys.

Figure 7.3 Registry path for driver-specific parameters
[Diagram: Registry tree running from HKEY_LOCAL_MACHINE through CurrentControlSet, Services, and XxDriver to Parameters, with Device0 and Device1 subkeys; the value entries shown are three REG_DWORD items holding 0x378, 0x3, and 0x7.]

Retrieving Parameters from the Registry

You use RtlQueryRegistryValues (described in Table 7.5) to retrieve values from the Parameters subkey of your driver's Registry key. This is a very powerful function, and if you're going to be doing anything fancy with the Registry, you should become familiar with all its capabilities. For our purposes, we won't need to do much with it except translate a few value names.

Table 7.5 Prototype for RtlQueryRegistryValues function

NTSTATUS RtlQueryRegistryValues                  IRQL == PASSIVE_LEVEL

Parameter                                        Description
IN ULONG RelativeTo                              Specifies beginning of Registry path
                                                 • RTL_REGISTRY_ABSOLUTE
                                                 • RTL_REGISTRY_SERVICES
                                                 • RTL_REGISTRY_CONTROL
                                                 • RTL_REGISTRY_WINDOWS_NT
                                                 • RTL_REGISTRY_DEVICE_MAP
                                                 • RTL_REGISTRY_USER
                                                 • RTL_REGISTRY_OPTIONAL
                                                 • RTL_REGISTRY_HANDLE
IN PWSTR Path                                    Identifies an absolute or relative path
IN PRTL_QUERY_REGISTRY_TABLE QueryTable          Address of a table describing the query
IN PVOID Context                                 Context passed to a QueryRoutine
IN PVOID Environment                             Environment block used to expand any REG_EXPAND_SZ registry entries
Return value                                     • STATUS_SUCCESS
                                                 • STATUS_INVALID_PARAMETER
                                                 • STATUS_OBJECT_NAME_NOT_FOUND

To work with RtlQueryRegistryValues, you need to construct a query table describing the values you want to translate. The query table is an array of RTL_QUERY_REGISTRY_TABLE items terminated with an entry containing NULL QueryRoutine and Name fields. Table 7.6 shows the format of the individual items.

Table 7.6 Query table entries

RTL_QUERY_REGISTRY_TABLE

Field                                            Description
PRTL_QUERY_REGISTRY_ROUTINE QueryRoutine         Optional query routine to be called for each item found in the Registry
ULONG Flags                                      Control interpretation of other fields
                                                 • RTL_QUERY_REGISTRY_SUBKEY
                                                 • RTL_QUERY_REGISTRY_TOPKEY
                                                 • RTL_QUERY_REGISTRY_REQUIRED
                                                 • RTL_QUERY_REGISTRY_NOVALUE
                                                 • RTL_QUERY_REGISTRY_NOEXPAND
                                                 • RTL_QUERY_REGISTRY_DIRECT
PWSTR Name                                       Name of the value caller wants to query
PVOID EntryContext                               32-bit value to be passed to QueryRoutine
ULONG DefaultType                                Type of data
PVOID DefaultData                                Data item to be used if queried item not present
ULONG DefaultLength                              Default length of data item

As with auto-detected hardware information, it's a good idea to store the Registry data in a configuration buffer that other parts of your DriverEntry routine can use. That way, you can move the driver to an auto-detecting environment without having to rewrite too much code. Also remember that values from the Registry still must be translated into systemwide values.

Other Sources of Device Information

Before we look at an example of using the Registry, it's worth mentioning some other sources of hardware information. The first is the HalGetBusData function, which allows you to interrogate a specific slot on a specific bus. This function returns a buffer containing any device-specific data available from a device. HalGetBusData is only useful if you're working with buses like PCI or EISA that generate a lot of information.

Also, the I/O Manager keeps a data structure that tracks the number of disk, tape, floppy, SCSI-HBA, serial, and parallel Device objects that have been created by various drivers. Calling IoGetConfigurationInformation returns a pointer to this structure, which you can use to pick an appropriate number for a new device name. It's also your responsibility to increment the counts in this structure if you create any of the device types listed above.

Finally, if none of the techniques we've looked at will work, you may have no alternative but to locate your hardware by poking various control register addresses. This is a potentially dangerous and error-prone way to do things. If you take this approach, make sure you temporarily allocate the hardware before you fiddle with it. If the allocation fails, don't touch the hardware. Otherwise, you may be doing something that confuses an already-loaded driver that owns the hardware and has put it into a specific state.

7.4 CODE EXAMPLE: QUERYING THE REGISTRY

Here is another hardware locator.
This one pulls information about ISA cards from the Parameters subkey of the driver's service key. You can find this code in the CH07 directory on the disk that accompanies this book.

REGCON.C

This group of functions scans the driver's Parameters key looking for subkeys with names like Device0, Device1, and so on. Each time it finds one, it fills out another DEVICE_BLOCK using values from the Registry.

XxGetHardwareInfo
This routine checks for the existence of an ISA bus on the machine; if no ISA bus shows up, it checks for an EISA bus where the ISA card might live. If neither type of bus exists on this machine, the routine fails. This indirect approach is necessary because ISA cards don't give any feedback about their presence.

    NTSTATUS
    XxGetHardwareInfo(
        IN PUNICODE_STRING RegistryPath,
        OUT PCONFIG_ARRAY *ConfigList
        )
    {
        NTSTATUS status;
        PCONFIG_ARRAY ConfigArray;
        INTERFACE_TYPE BusType;
        ULONG BusNumber;
        UNICODE_STRING TempString;
        //
        // Check for a bus we can use. Look for an ISA bus
        // first, then look for an EISA bus. If neither one
        // shows up, quit.
        //
        BusType = Isa;
        BusNumber = 0;
        status = XxCheckForBus( Isa, BusNumber );
        if( !NT_SUCCESS( status ))
        {
            BusType = Eisa;
            status = XxCheckForBus( Eisa, BusNumber );
        }
        if( !NT_SUCCESS( status ))
        {
            *ConfigList = NULL;
            return STATUS_NO_SUCH_DEVICE;
        }
        //
        // We found a compatible bus. Allocate
        // space for the (single) Config array
        // that we'll be passing back to the
        // caller.
        //
        if(( ConfigArray = ExAllocatePool(
                            PagedPool,
                            sizeof( CONFIG_ARRAY ))) == NULL )
        {
            *ConfigList = NULL;
            return STATUS_INSUFFICIENT_RESOURCES;
        }
        RtlZeroMemory(
            ConfigArray,
            sizeof( CONFIG_ARRAY ));

        *ConfigList = ConfigArray;
        ConfigArray->BusType = BusType;
        ConfigArray->BusNumber = BusNumber;
        //
        // Make a copy of the Registry path name
        // and be sure it has a terminator at the
        // end...
        //
        TempString.Length = 0;                              // (1)
        TempString.MaximumLength =
            RegistryPath->Length + sizeof( UNICODE_NULL );
        if(( TempString.Buffer = ExAllocatePool(
                            PagedPool,
                            TempString.MaximumLength )) == NULL )
        {
            *ConfigList = NULL;
            ExFreePool( ConfigArray );
            return STATUS_INSUFFICIENT_RESOURCES;
        }
        RtlCopyUnicodeString( &TempString, RegistryPath );
        //
        // Length is a byte count, so convert it to a
        // character index before storing the terminator
        //
        TempString.Buffer[ TempString.Length / sizeof( WCHAR ) ] =
            UNICODE_NULL;
        //
        // Keep looping until we run out of device
        // slots or Registry entries, or until an
        // error occurs.
        //
        ConfigArray->Count = 0;
        while( ConfigArray->Count < XX_MAXIMUM_DEVICES )    // (2)
        {
            status = XxFindNextDevice(
                        BusType,
                        BusNumber,
                        &TempString,
                        ConfigArray );
            if( !NT_SUCCESS( status )) break;
            ConfigArray->Count++;
        } // end while-loop

        ExFreePool( TempString.Buffer );

        if( !NT_SUCCESS( status ) &&                        // (3)
            status != STATUS_OBJECT_NAME_NOT_FOUND )
        {
            *ConfigList = NULL;
            ExFreePool( ConfigArray );
            return status;
        }
        //
        // See if we found anything after all
        // that work
        //
        if( ConfigArray->Count == 0 )                       // (4)
        {
            *ConfigList = NULL;
            ExFreePool( ConfigArray );
            return STATUS_NO_SUCH_DEVICE;
        }
        //
        // Everything worked...
        //
        return STATUS_SUCCESS;
    }
(1) We need to go through all these shenanigans because the RegistryPath argument is a counted UNICODE_STRING object, but the Registry query function wants a NULL-terminated array of Unicode characters.
(2) This loop keeps going until we run out of slots in the Configuration block, or until we don't find a matching entry in the Registry. The organization of this routine means that all the DeviceN subkeys must be consecutive.
(3) STATUS_OBJECT_NAME_NOT_FOUND means we ran out of DeviceN subkeys, but it's not really an error.
(4) There must have been at least one valid set of parameter information, or there's a problem somewhere.

XxFindNextDevice
This function extracts information about one device from the driver's service key and stores it in a slot in the Configuration block.

    static NTSTATUS
    XxFindNextDevice(
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PUNICODE_STRING RegistryPath,
        IN PCONFIG_ARRAY ConfigArray
        )
    {
        UNICODE_STRING SubPath;
        WCHAR PathNameBuffer[30];
        UNICODE_STRING Number;
        WCHAR NumberBuffer[10];
        RTL_QUERY_REGISTRY_TABLE Table[5];                  // (1)
        NTSTATUS status;
        PDEVICE_BLOCK pDevice =
            &ConfigArray->Device[ ConfigArray->Count ];
        //
        // Prepare to interrogate the Registry by
        // setting up the query-table
        //
        RtlZeroMemory( Table, sizeof( Table ));
        //
        // Create a name string for the
        // query table. Start by forming
        // the base path name
        //
        SubPath.Buffer = PathNameBuffer;                    // (2)
        SubPath.MaximumLength = sizeof( PathNameBuffer );
        SubPath.Length = 0;

        RtlAppendUnicodeToString(
            &SubPath,
            L"Parameters\\Device" );
        //
        // Convert the device number into a string and
        // attach it to the end of the path name.
        //
        Number.Buffer = NumberBuffer;
        Number.MaximumLength = sizeof( NumberBuffer );
        Number.Length = 0;

        RtlIntegerToUnicodeString(
            ConfigArray->Count,
            10,                 // base-10 conversion
            &Number );

        RtlAppendUnicodeStringToString(
            &SubPath, &Number );
        //
        // Fabricate the query
        //
        Table[0].Name = SubPath.Buffer;
        Table[0].Flags = RTL_QUERY_REGISTRY_SUBKEY;         // (3)

        Table[1].Name = L"PORT";        // I/O port addr
        Table[1].Flags = RTL_QUERY_REGISTRY_DIRECT;
        Table[1].EntryContext = &pDevice->OriginalPortBase;

        Table[2].Name = L"SPAN";        // Number of ports
        Table[2].Flags = RTL_QUERY_REGISTRY_DIRECT;
        Table[2].EntryContext = &pDevice->PortSpan;

        Table[3].Name = L"IRQ";         // IRQ level
        Table[3].Flags = RTL_QUERY_REGISTRY_DIRECT;
        Table[3].EntryContext = &pDevice->OriginalIrql;
        //
        // Query the Registry...
        //
        status = RtlQueryRegistryValues(                    // (4)
                    RTL_REGISTRY_ABSOLUTE,
                    RegistryPath->Buffer,
                    Table,
                    NULL,
                    NULL );
        if( !NT_SUCCESS( status )) return status;
        //
        // Fix up and translate the information
        // from the Registry
        //
        status = XxGetPortInfo(                             // (5)
                    BusType,
                    BusNumber,
                    pDevice );
        if( !NT_SUCCESS( status )) return status;

        status = XxGetInterruptInfo(
                    BusType,
                    BusNumber,
                    pDevice );
        return status;
    }

(1) We need four entries in the query table for our own use, plus one extra to terminate the query request.
(2) We need to create a string that looks like "Parameters\DeviceN" to represent the subkey under the driver's service entry.
(3) This query just moves us down a level in the Registry so that all future queries will be taken from the Parameters\DeviceN subkey.
(4) One call to RtlQueryRegistryValues does it all. It adds the subkey to the end of the driver's service key name, looks for the three value items (PORT, SPAN, and IRQ), and dumps their contents back into the Configuration block.
(5) From here on, we use some helper functions to make the data from the Registry usable.

XxGetPortInfo and XxGetInterruptInfo

Here are the helper functions again. You'll notice that XxGetInterruptInfo has to do some fix-up work on the data it gets from the Registry.

//++
// Function:
//      XxGetPortInfo
//
// Description:
//      This function fixes up I/O port information
//      pulled from the driver's Registry service key
//
// Arguments:
//      Bus type
//      Bus number
//      Pointer to this device's slot in Config Array
//
// Return Value:
//      STATUS_SUCCESS
//      STATUS_XXX if error
//--
static NTSTATUS
XxGetPortInfo(
    IN INTERFACE_TYPE BusType,
    IN ULONG BusNumber,
    IN PDEVICE_BLOCK pDevice
    )
{
    ULONG AddressSpace;
    PHYSICAL_ADDRESS TranslatedPortBase;

    //
    // Convert bus-relative port-information into NT
    // system-mapped values, and save the results..
    //
    AddressSpace = 1;   // Ports should be in I/O space..

    if( !HalTranslateBusAddress(
            BusType,
            BusNumber,
            pDevice->OriginalPortBase,
            &AddressSpace,
            &TranslatedPortBase ) )
    {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    pDevice->PortBase =
        (PUCHAR)TranslatedPortBase.LowPart;

    return STATUS_SUCCESS;
}

//++
// Function:
//      XxGetInterruptInfo
//
// Description:
//      This function fixes up IRQ information
//      pulled from the driver's Registry service key
//
// Arguments:
//      Bus type
//      Bus number
//      Pointer to this device's slot in Config array
//
// Return Value:
//      STATUS_SUCCESS
//      STATUS_XXX if error
//--
static NTSTATUS
XxGetInterruptInfo(
    IN INTERFACE_TYPE BusType,
    IN ULONG BusNumber,
    IN PDEVICE_BLOCK pDevice
    )
{
    //
    // Fill in the gaps by providing values for things
    // that aren't in the Registry...
    //
    pDevice->InterruptMode = Latched;
    pDevice->OriginalVector = pDevice->OriginalIrql;
    pDevice->ShareVector = FALSE;
    pDevice->FloatingSave = FALSE;

    //
    // Convert bus-relative interrupt information into
    // NT system-mapped values, and save the results...
    //
    pDevice->SystemVector =
        HalGetInterruptVector(
            BusType,
            BusNumber,
            pDevice->OriginalIrql,
            pDevice->OriginalVector,
            &pDevice->Dirql,
            &pDevice->Affinity );

    return STATUS_SUCCESS;
}

XxCheckForBus and XxBusCallback

These little functions allow you to check for the existence of a particular bus on the system. They make use of IoQueryDeviceDescription to test for the presence of the bus.

//++
// Function:
//      XxCheckForBus
//
// Description:
//      This function verifies the existence of a
//      particular bus-type and number.
//
// Arguments:
//      BusType -- Isa, Eisa, etc
//      BusNumber -- 0, 1, etc
//
// Return Value:
//      STATUS_SUCCESS or some error condition.
//--
static NTSTATUS
XxCheckForBus(
    IN INTERFACE_TYPE BusType,
    IN ULONG BusNumber
    )
{
    return( IoQueryDeviceDescription(
                &BusType,
                &BusNumber,
                NULL,
                NULL,
                NULL,
                NULL,
                XxBusCallback,
                NULL ) );
}

//++
// Function:
//      XxBusCallback
//
// Description:
//      This is a dummy function. The fact that the
//      system calls it means that the bus type and
//      number both exist, so all that's necessary
//      is to return STATUS_SUCCESS.
//
// Arguments:
//      (Unused)
//
// Return Value:
//      This function always returns STATUS_SUCCESS
//--
static NTSTATUS
XxBusCallback(
    IN PVOID Context,
    IN PUNICODE_STRING PathName,
    IN INTERFACE_TYPE BusType,
    IN ULONG BusNumber,
    IN PKEY_VALUE_FULL_INFORMATION *BusInfo,
    IN CONFIGURATION_TYPE CtrlrType,
    IN ULONG CtrlrNumber,
    IN PKEY_VALUE_FULL_INFORMATION *CtrlrInfo,
    IN CONFIGURATION_TYPE DeviceType,
    IN ULONG DeviceNumber,
    IN PKEY_VALUE_FULL_INFORMATION *DeviceInfo
    )
{
    return STATUS_SUCCESS;
}

7.5 ALLOCATING AND RELEASING HARDWARE

At this point, your driver has gone to a lot of trouble to locate some hardware. Before you can use any of it, though, you have to make sure the hardware doesn't belong to any other driver. This section explains how to allocate hardware for your driver's exclusive use.

How Resource Allocation Works

NT maintains a central database of all currently owned hardware in the ...\HARDWARE\RESOURCEMAP section of the Registry. Before touching any hardware resources, a driver checks this map to be sure someone else isn't using them. If everything is free, the driver claims the hardware by adding a description of its resource requirements to the resource map. If the resources aren't free, the driver must leave them alone. [4]

Resources owned by a particular driver are recorded in a key with the same name as the driver. In the resource map, these resource keys are organized in arbitrary classes. Your driver has the option of declaring its own class, using an existing class declared by another driver, or using the default resource class called OtherDrivers. Resource classes are purely decorative and have no effect on resource allocation or conflict detection.

Within a driver's resource key, there are two values called .Raw and .Translated. Each of these items is a list describing the resources owned by the driver.
The raw list contains bus-specific information returned by routines like IoQueryDeviceDescription, while the translated list holds the systemwide numbers returned by the HalTranslateXxx functions.

Drivers can also declare some resources as the property of the whole driver, and others as belonging to individual devices. In this case, resources shared by multiple devices go into the driver's .Raw and .Translated values, while device-specific resources have their own value items in the resource key. These device-specific values are called \Device\DeviceName.Raw and \Device\DeviceName.Translated. Figure 7.4 shows how all this works.

[4] For the stability of the operating system, it's vital that all device drivers abide by this arbitration scheme. Because a driver is a trusted kernel-mode component, no one can stop it from touching hardware without allocating it. However, this can lead to confusing, unpredictable interactions between multiple drivers that think they each have exclusive access to a piece of hardware.

[Figure 7.4: Registry tree under HKEY_LOCAL_MACHINE\HARDWARE\RESOURCEMAP. The XxDriver key sits in the private class XX DRIVER RESOURCES and the YyDriver key sits in OtherDrivers; each key holds driver-wide values (.Raw, .Translated) and device-specific values such as \Device\Xx0.Raw and \Device\Xx0.Translated.]

Figure 7.4 Format of hardware-allocation data in the Registry

In the figure, XXDRIVER has declared a private class (called XX DRIVER RESOURCES) for its resource list. Some resources are allocated to the driver itself, while others belong only to the device Xx0. YYDRIVER, being somewhat more shy, doesn't use a private class for its resources, so its resource key ends up in the OtherDrivers class. Again, some resources belong to the entire driver while others have been claimed only for one device.

As before, the Registry editor, REGEDT32, gives you an easy way to poke around in the system resource map.
In the initial phases of driver development, you can use this tool to make sure your driver is allocating all the right resources. REGEDT32 also lets you verify that an unloadable driver has released whatever hardware it may have claimed.

How to Claim Hardware Resources

To claim hardware, your driver needs to build a list of the resources it wants to allocate. Figure 7.5 shows one of these lists. At the very top is a structure called a CM_RESOURCE_LIST. As you can see, a Resource List is basically an array of the CM_FULL_RESOURCE_DESCRIPTOR structures that you saw back in Figure 7.2. Each Full Resource Descriptor in this array identifies all the resources used by the driver on a single bus type and bus number. Collectively, all the Full Resource Descriptors in a single Resource List describe the resources used on multiple buses.

As with the data passed to a ConfigCallback routine, individual resources are identified by Partial Resource Descriptors. The only difference is that the information given to a ConfigCallback routine is about one specific device or controller. When you fabricate a Full Resource Descriptor to allocate hardware, you have to group together the Partial Descriptors for all resources on one bus in the same Full Resource Descriptor. [5]

[Figure 7.5: a CM_RESOURCE_LIST holding one CM_FULL_RESOURCE_DESCRIPTOR per bus (1st bus, 2nd bus); each Full Resource Descriptor contains a CM_PARTIAL_RESOURCE_LIST made up of CM_PARTIAL_RESOURCE_DESCRIPTORs.]

Figure 7.5 Structures passed to IoReportResourceUsage

You request ownership of the items in a CM_RESOURCE_LIST by passing the list to IoReportResourceUsage (described in Table 7.7). This function checks for any conflicts with previously allocated hardware and adds your claims to the Registry's resource map.
When you call this function, it completely replaces any existing resource list associated with the specified Driver or Device object. If you include a class-name string, the I/O Manager will create a private class key for your driver's resources. Passing NULL puts your driver's resource key in the OtherDrivers class. If you allocate resources using a private class, you'll also need to specify the class name when you release these resources.

[5] It's also worth emphasizing that these Partial Resource Descriptors contain the original bus-relative values for such things as the I/O port base and the IRQ level, not the translated values returned by functions like HalTranslateBusAddress.

Remember that you can associate a resource list either with the Driver object itself or with a particular Device object. Any resources being used by multiple devices should be in the DriverList, while device-dedicated resources should go in the DeviceList. If you break your resources up this way, you'll need to call IoReportResourceUsage several times: once for the DriverList and once for each DeviceList.

If IoReportResourceUsage returns STATUS_SUCCESS, you have to check the value returned in the ConflictDetected boolean. If this variable is TRUE, it means that one or more items in your resource list already belong to someone else. In this case, your driver mustn't use any of the hardware in the list.

Table 7.7 Prototype for IoReportResourceUsage

NTSTATUS IoReportResourceUsage             IRQL == PASSIVE_LEVEL

Parameter                             Description
IN PUNICODE_STRING ClassName          Optional class name for driver
IN PDRIVER_OBJECT DriverObject        Driver object associated with this driver
IN PCM_RESOURCE_LIST DriverList       Resources used by all driver's devices
IN ULONG DriverListSize               Size of list in bytes
IN PDEVICE_OBJECT DeviceObject        Device that will own the resources
IN PCM_RESOURCE_LIST DeviceList       Resources used by a single device
IN ULONG DeviceListSize               Size of list in bytes
IN BOOLEAN OverrideConflict           • TRUE - ignore resource conflicts
                                      • FALSE - return error if conflict
OUT PBOOLEAN ConflictDetected         • TRUE - resources already claimed
                                      • FALSE - no conflict
Return value                          • STATUS_SUCCESS
                                      • STATUS_INSUFFICIENT_RESOURCES

The OverrideConflict parameter determines the behavior of IoReportResourceUsage when it detects a conflict. If you pass FALSE, the function makes no changes to the Registry's resource map. Instead, it puts a message in the event log [6] identifying the conflicting resources and their current owner.

If OverrideConflict is TRUE, IoReportResourceUsage does add your resource list to the resource map but doesn't send a message to the system event log. However, even though your resource list is in the Registry, your driver mustn't touch any hardware in the list; someone else thinks they own it.

One odd bit of behavior is worth mentioning: Sometimes when there's a resource conflict, IoReportResourceUsage returns an unsuccessful status code that has no corresponding Win32 error number. The sample code in the next section shows how to handle this situation properly.
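The claim, conflict, and replace rules described above can be tried out in a small user-mode model. This is only a sketch under stated assumptions: TOY_RANGE, TOY_OWNER, ToyMap, ToyReportUsage, and ToyMayUseHardware are hypothetical names invented for illustration, the model tracks I/O port ranges only, and returning -1 for a conflict that isn't overridden is a modeling choice rather than the documented behavior of IoReportResourceUsage.

```c
#include <stddef.h>
#include <string.h>

/* Toy user-mode model. None of these names come from the DDK. */
#define TOY_MAX_OWNERS 4
#define TOY_MAX_RANGES 4

typedef struct { unsigned Start, Length; } TOY_RANGE;

typedef struct {
    const char *Owner;                  /* NULL means the slot is free */
    TOY_RANGE   Ranges[TOY_MAX_RANGES];
    int         Count;
} TOY_OWNER;

static TOY_OWNER ToyMap[TOY_MAX_OWNERS];    /* stands in for RESOURCEMAP */

static int ToyOverlap(TOY_RANGE a, TOY_RANGE b)
{
    return a.Start < b.Start + b.Length && b.Start < a.Start + a.Length;
}

/* Nonzero if a requested range collides with a range owned by someone else. */
static int ToyConflict(const char *owner, const TOY_RANGE *req, int n)
{
    int o, i, j;
    for (o = 0; o < TOY_MAX_OWNERS; o++) {
        if (ToyMap[o].Owner == NULL || strcmp(ToyMap[o].Owner, owner) == 0)
            continue;
        for (i = 0; i < ToyMap[o].Count; i++)
            for (j = 0; j < n; j++)
                if (ToyOverlap(ToyMap[o].Ranges[i], req[j]))
                    return 1;
    }
    return 0;
}

/*
 * Replace the owner's whole list with req[0..n-1]. An empty list (n == 0)
 * releases everything. Returns 0 if the map was updated, -1 if it was not.
 */
static int ToyReportUsage(const char *owner, const TOY_RANGE *req, int n,
                          int overrideConflict, int *conflictDetected)
{
    TOY_OWNER *slot = NULL;
    int o;

    *conflictDetected = 0;
    if (n < 0 || n > TOY_MAX_RANGES)
        return -1;

    *conflictDetected = ToyConflict(owner, req, n);
    if (*conflictDetected && !overrideConflict)
        return -1;                      /* map left untouched */

    for (o = 0; o < TOY_MAX_OWNERS; o++) {
        if (ToyMap[o].Owner != NULL && strcmp(ToyMap[o].Owner, owner) == 0) {
            slot = &ToyMap[o];          /* an existing entry wins */
            break;
        }
        if (slot == NULL && ToyMap[o].Owner == NULL)
            slot = &ToyMap[o];          /* remember the first free slot */
    }
    if (slot == NULL)
        return -1;                      /* table full */

    slot->Owner = (n > 0) ? owner : NULL;
    slot->Count = n;
    if (n > 0)
        memcpy(slot->Ranges, req, (size_t)n * sizeof(TOY_RANGE));
    return 0;
}

/* The rule from the text: touch the hardware only if the call succeeded
   AND no conflict was reported. */
static int ToyMayUseHardware(int rc, int conflictDetected)
{
    return rc == 0 && !conflictDetected;
}
```

In this model, a second driver that overrides a conflict does get its list recorded, but ToyMayUseHardware still says no, which is the point the text makes about OverrideConflict.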
How to Release Hardware

When you want to free up resources held by your driver, you build an empty resource list and call IoReportResourceUsage. Since the new list completely replaces the previous one, this has the effect of releasing any resources described in the old list. If you allocated hardware on a device-specific or driver-wide basis, you need to release it the same way. Also, if you used a private class name to allocate the hardware, you'll need to use the same class name to free it.

[6] Your driver has to be identified in the Registry as a system event logging component in order for the Event Viewer to display these messages. Chapter 13 explains how to set this up. These messages can be very helpful for debugging resource conflicts.

The following code fragment shows how a driver's Unload routine might release hardware resources associated with a specific Device object.

CM_RESOURCE_LIST ResList;
BOOLEAN bConflict;

ResList.Count = 0;

IoReportResourceUsage(
    NULL,               // Default class name
    pDriverObject,      // Pointer to Driver object
    NULL,               // No driver-wide resources
    0,
    pDeviceObject,      // Pointer to Device object
    &ResList,           // Device-specific resources
    sizeof( ResList ),
    FALSE,              // Don't override conflict
    &bConflict );       // Junk, but required

Mapping Device Memory

If your device uses a range of dedicated memory addresses, your driver will need to make that memory available during initialization. Depending on the architecture of the device, your driver will need to perform one of the following two procedures.

Driver-chosen addresses

Some devices (like Ethernet adapters) have a control register that specifies the starting address of a device-specific memory area. In this case, your driver needs to allocate memory for the device and let the device know where the memory is located. [7] Follow these steps to set up this memory area:

1. Call IoReportResourceUsage to allocate the device's control registers.
2. Call HalGetAdapter to find the Adapter object associated with your device.
3. Call HalAllocateCommonBuffer to allocate buffer space for your device's memory. This function returns both a system virtual address and a physical address.
4. Save the system virtual address of this buffer somewhere in your Device Extension. Use this virtual address from within your driver whenever you need to reference the device's memory area.
5. Write the buffer's physical address into whatever device registers control access to the device memory.
6. When your driver unloads, call HalFreeCommonBuffer to release the buffer.

[7] This is actually just a special case of something called common buffer bus master DMA, which is described in Chapter 12.

Hard-wired addresses

Some pieces of hardware (like VGA controllers) have very specific ideas about where their shared buffers should be located. If your device needs to use a particular range of physical addresses for device memory, follow these steps to make the memory available to your driver:

1. Call IoReportResourceUsage to request exclusive ownership of the range of physical addresses belonging to the device.
2. Call HalTranslateBusAddress to convert the device's bus-relative physical addresses into systemwide values.
3. Call MmMapIoSpace to map the device's memory into system virtual space. Save the address returned by this function and use it to access device memory from within your driver.
4. When your driver unloads, call MmUnmapIoSpace to break the connection between the device's memory and system virtual space.

Loading Device Microcode

As part of their initialization, some complex devices need to have microcode loaded into them from a disk file. If the quantity of microcode is small, you can store it as a REG_BINARY value in the driver's Parameters subkey.
For a device that needs large amounts of microcode, this may not be feasible. Fortunately, NT provides several functions that give drivers handle-based access to files and directories. As you can see from Table 7.8, these routines bear a strong resemblance to the Win32 user-mode file API. Using these functions, a driver could load vast quantities of microcode into a device without overburdening the Configuration Manager. In this case, only the path-name for the microcode file would need to be stored in the driver's Parameters subkey.

There are three important things to keep in mind if you decide to use these functions. First, you can only call them from parts of your code running at PASSIVE_LEVEL IRQL. This effectively limits their use to DriverEntry, the Unload routine, Dispatch routines, and any thread-based parts of your driver.

Second, you can't access any files with these calls until the file-system driver for the target volume has finished initializing itself. If your driver loads during system bootstrap, you can guarantee that it loads after any file systems by setting up proper group dependencies in the Registry. Chapter 16 explains how to do this.

Finally, avoid the temptation to store driver initialization parameters in disk files. That kind of thing belongs only in the Registry. The proliferation of .INI files in earlier versions of Windows was a bad thing; don't litter NT with them.

Table 7.8 Kernel-mode code can access files using these functions

ZwXxx file functions                             IRQL == PASSIVE_LEVEL

IF you want to...                                THEN call...
Create or open a file, device, or directory      ZwCreateFile
Read data into memory from a file                ZwReadFile
Write data from memory to a file                 ZwWriteFile
Get file size, position, attribute information   ZwQueryInformationFile
Set file size, position, attribute information   ZwSetInformationFile
Close an open file handle                        ZwClose

For more information about the functions listed in Table 7.8, take a look at the online documentation in the NT DDK. The DDK also contains some sample code that shows how to use these routines.

7.6 CODE EXAMPLE: ALLOCATING HARDWARE

This example illustrates the hardware allocation techniques we've just been looking at. It assumes that the device uses a DMA channel, but no device-specific memory or other device-specific data. You can find this code in the CH07 directory on the disk that accompanies this book.

RESALLOC.C

The functions in this file allocate a group of resources for exclusive use by a specific Driver object.

XxReportHardwareUsage

Given a linked list of CONFIG_ARRAYs, this routine builds a Resource List and marks the resources as belonging to the entire Driver object. No resources are tagged as belonging to specific Device objects.

NTSTATUS
XxReportHardwareUsage(
    IN PDRIVER_OBJECT DriverObject,
    IN PCONFIG_ARRAY ConfigList
    )
{
    ULONG ListSize;
    PCM_RESOURCE_LIST ResourceList;
    PCM_FULL_RESOURCE_DESCRIPTOR Frd;
    PCM_PARTIAL_RESOURCE_DESCRIPTOR Prd;
    PCONFIG_ARRAY CurrentArray;
    BOOLEAN bConflictDetected;
    NTSTATUS status;
    ULONG i;

    //
    // Calculate size of resource list              (1)
    //
    ListSize =
        FIELD_OFFSET( CM_RESOURCE_LIST, List[0] );

    CurrentArray = ConfigList;
    while( CurrentArray != NULL )
    {
        ListSize +=
            sizeof( CM_FULL_RESOURCE_DESCRIPTOR ) +
            ((( CurrentArray->Count *
                XX_RESOURCE_ITEMS_PER_DEVICE ) - 1 ) *
              sizeof( CM_PARTIAL_RESOURCE_DESCRIPTOR ));

        CurrentArray = CurrentArray->NextConfigArray;
    }

    //
    // Try and allocate paged memory for the resource
    // list. If it works, zero out the list.
    //
    ResourceList =
        ExAllocatePool( PagedPool, ListSize );      // (2)

    if( ResourceList == NULL ) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    RtlZeroMemory( ResourceList, ListSize );

    CurrentArray = ConfigList;                      // (3)
    Frd = &ResourceList->List[0];

    while( CurrentArray != NULL )
    {
        ResourceList->Count++;
        Frd->InterfaceType = CurrentArray->BusType;
        Frd->BusNumber = CurrentArray->BusNumber;

        //
        // Set the number of Partial Resource
        // Descriptors in this FRD.
        //
        Frd->PartialResourceList.Count =
            CurrentArray->Count *
            XX_RESOURCE_ITEMS_PER_DEVICE;

        //
        // Get pointer to first Partial Resource
        // Descriptor in this FRD.
        //
        Prd = &Frd->PartialResourceList.
                    PartialDescriptors[0];

        for( i = 0; i < CurrentArray->Count; i++ )
        {
            Prd = XxBuildPartialDescriptors(        // (4)
                      &CurrentArray->Device[i],
                      Prd );
        }

        //
        // Point to beginning of next Full Resource
        // Descriptor. (Written as an assignment
        // because a cast is not a portable lvalue.)
        //
        Frd = (PCM_FULL_RESOURCE_DESCRIPTOR)
              ( (PUCHAR)Frd +
                (( Frd->PartialResourceList.Count - 1 ) *
                  sizeof( CM_PARTIAL_RESOURCE_DESCRIPTOR )) +
                sizeof( CM_FULL_RESOURCE_DESCRIPTOR ) );

        //
        // Get next Config array from linked-list
        //
        CurrentArray = CurrentArray->NextConfigArray;
    }

    status = IoReportResourceUsage(                 // (5)
                NULL,
                DriverObject,
                ResourceList,
                ListSize,
                NULL,
                NULL,
                0,
                FALSE,      // Don't override
                &bConflictDetected );

    ExFreePool( ResourceList );

    if( !NT_SUCCESS( status ) || bConflictDetected )
        return STATUS_INSUFFICIENT_RESOURCES;
    else
        return STATUS_SUCCESS;
}

(1) Start by accounting for header space between the beginning of the Resource List and first Full Resource Descriptor (FRD). For the whole Resource List, we need one FRD per bus type and bus number. We have to run the Config List to find them all. Each FRD contains a separate group of Partial Resource Descriptors (PRDs) for each device we're allocating. Since an FRD has one PRD already embedded in it, we subtract one from the total PRD count for each FRD.

(2) Once the hideous calculations are complete, we allocate a chunk of paged pool that's large enough to hold the whole thing. As always, it's important to zero out any memory allocated from the system pool areas. You don't know where they've been.

(3) Run the Config List again. This time, build a separate FRD for each Config Array in the list.

(4) Loop through all the Device Blocks in the current Config Array. For each Device Block, call a helper function to create PRDs for any resources used by that device.

(5) Once the Resource List is complete, call IoReportResourceUsage to request ownership of the hardware. Afterwards, release the pool memory used for the Resource List.

XxBuildPartialDescriptors

Given a Device Block and a pointer to the first available Partial Resource Descriptor in an FRD, this function adds all the PRDs for one device to the current FRD.

static PCM_PARTIAL_RESOURCE_DESCRIPTOR
XxBuildPartialDescriptors(
    IN PDEVICE_BLOCK Device,
    IN PCM_PARTIAL_RESOURCE_DESCRIPTOR Prd
    )
{
    //
    // Set up PRD for control registers
    //
    Prd->Type = CmResourceTypePort;
    Prd->ShareDisposition =
        CmResourceShareDriverExclusive;
    Prd->Flags = CM_RESOURCE_PORT_IO;               // (1)
    Prd->u.Port.Start = Device->OriginalPortBase;
    Prd->u.Port.Length = Device->PortSpan;

    Prd++;                                          // (2)

    //
    // Set up PRD for Interrupt resource
    //
    Prd->Type = CmResourceTypeInterrupt;
    Prd->ShareDisposition =
        CmResourceShareDriverExclusive;

    if( Device->InterruptMode == Latched )
        Prd->Flags = CM_RESOURCE_INTERRUPT_LATCHED;
    else
        Prd->Flags =
            CM_RESOURCE_INTERRUPT_LEVEL_SENSITIVE;

    Prd->u.Interrupt.Level = Device->OriginalIrql;  // (3)
    Prd->u.Interrupt.Vector = Device->OriginalVector;

    Prd++;

    return Prd;
}

(1) This example assumes that device control registers are always in I/O space. A truly general driver would need to take a more flexible approach.

(2) Point to the beginning of the next PRD. (C is a wonderful language.)

(3) The setup operations for all the PRDs are very similar; just fill in the necessary fields of the PRD. Remember to use the original values, and not the ones returned by translation functions such as HalGetInterruptVector or HalTranslateBusAddress.

7.7 SUMMARY

In this chapter, we've looked at various techniques your driver can use to locate the hardware it has to manage. For some kinds of devices, the hardware will identify itself and provide the system with a lot of information. Other devices (including most ISA cards) are very shy, so you'll need to supplement any auto-detected information with other data sources, including hard-wired Registry values. Whatever method you use to find your hardware, you absolutely must claim it for your driver's exclusive use.

Now that we have a driver that loads and unloads without crashing the system, the next step is to make a connection with the NT system service dispatcher. That's the subject of Chapter 8.

CHAPTER 8

Driver Dispatch Routines

When an I/O request begins its arduous journey through the NT I/O subsystem, the first challenge it faces is to get by one of your driver's Dispatch routines.
The Dispatch routine decides whether the request should go any further, or whether it should be sent back to the original caller in disgrace. This chapter will help you set up your Dispatch routines and explain how these routines should behave in various situations. It also fills in some of the details involved in processing buffered and direct I/O requests.

8.1 ENABLING DRIVER DISPATCH ROUTINES

Before your driver can receive I/O requests, you need to tell the I/O Manager what kinds of operations the driver supports. This section describes the I/O Manager's dispatching mechanism and explains how to enable receipt of specific I/O function codes. It also presents some guidelines for deciding which function codes your driver needs to support.

I/O Request Dispatching Mechanism

Recall from earlier chapters that most I/O operations under NT are packet driven. When a user-mode application issues an I/O request, the I/O Manager first builds an IRP to keep track of the request. Among other things, it stores an IRP_MJ_XXX code in the MajorFunction field of the IRP's I/O stack location to identify the exact operation being performed.

[Figure 8.1: an IRP carrying IRP_MJ_WRITE indexes the Driver object's MajorFunction[] table; the IRP_MJ_WRITE slot points to XxDispatchWrite, while the other slots shown point to _IopInvalidDeviceRequest.]

Figure 8.1 How the I/O Manager selects Dispatch routines

When it's time to process the IRP, the I/O Manager uses the IRP_MJ_XXX value as an index into the Driver object's MajorFunction table. From the table, it gets a pointer to a routine that handles this specific IRP_MJ_XXX code, which it then calls. If the driver doesn't support the requested operation, the table entry points to the I/O Manager's internal _IopInvalidDeviceRequest function, which returns an error to the original caller. If the driver does support the operation, the table entry points to one of the driver's own Dispatch routines.
Figure 8.1 illustrates this process.

Enabling Specific Function Codes

To enable dispatching for a specific IRP_MJ_XXX function code, your DriverEntry routine must put the address of a Dispatch routine into the MajorFunction table of the Driver object. You use the I/O function code itself as an index into the dispatching table. The following code fragment illustrates how to do this.

NTSTATUS
DriverEntry(
    IN PDRIVER_OBJECT pDO,
    IN PUNICODE_STRING RegistryPath
    )
{
    pDO->MajorFunction[ IRP_MJ_CREATE ] = XxDispCreate;
    pDO->MajorFunction[ IRP_MJ_CLOSE ] = XxDispClose;
    pDO->MajorFunction[ IRP_MJ_CLEANUP ] = XxDispCleanup;
    pDO->MajorFunction[ IRP_MJ_READ ] = XxDispRead;
    pDO->MajorFunction[ IRP_MJ_WRITE ] = XxDispWrite;

    return STATUS_SUCCESS;
}

Note that you can use the same Dispatch routine to service more than one I/O function code. The choice of how many Dispatch routines to implement is entirely up to you. Also, you can ignore MajorFunction table entries corresponding to function codes your driver doesn't support. By the time the I/O Manager calls your DriverEntry routine, it has already filled every entry in the table with pointers to _IopInvalidDeviceRequest, so any slots you don't explicitly fill will appear as unsupported device operations.

Deciding Which Function Codes to Support

All drivers must support the IRP_MJ_CREATE function code, since this is the one generated by a Win32 CreateFile call. If you don't process this function code, Win32 programs will have no way to get a handle to your device. The choice of other function codes will depend on the nature of your device and the kinds of operations it can perform. Use Table 8.1 to decide which IRP function codes might be appropriate.

If you're writing an intermediate driver, you must provide Dispatch entry points for all the I/O function codes supported by any drivers below yours in the chain.
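The table-driven dispatching described in section 8.1 can be modeled in ordinary user-mode C. The sketch below is not DDK code: every TOY_ name is a hypothetical stand-in, the function-code numbering is arbitrary, and ToyInvalidRequest merely plays the part that the text assigns to _IopInvalidDeviceRequest. It shows the table being pre-filled with a default handler, one routine serving two function codes, and an unfilled slot falling through to the default.

```c
/* Toy user-mode stand-ins; the real definitions live in NTDDK.H. */
enum {
    TOY_MJ_CREATE,
    TOY_MJ_CLOSE,
    TOY_MJ_READ,
    TOY_MJ_WRITE,
    TOY_MJ_DEVICE_CONTROL,
    TOY_MJ_COUNT
};

#define TOY_SUCCESS                   0
#define TOY_INVALID_DEVICE_REQUEST  (-1)

typedef struct { int MajorFunction; } TOY_IRP;
typedef int (*TOY_DISPATCH)(TOY_IRP *Irp);
typedef struct { TOY_DISPATCH MajorFunction[TOY_MJ_COUNT]; } TOY_DRIVER;

/* Default handler for every slot the driver leaves alone. */
static int ToyInvalidRequest(TOY_IRP *Irp)
{
    (void)Irp;
    return TOY_INVALID_DEVICE_REQUEST;
}

/* One Dispatch routine shared by two function codes. */
static int XxDispCreateClose(TOY_IRP *Irp)
{
    (void)Irp;
    return TOY_SUCCESS;
}

static int XxDispRead(TOY_IRP *Irp)
{
    (void)Irp;
    return TOY_SUCCESS;
}

/* Done on the driver's behalf before its entry routine runs:
   every slot starts out pointing at the default handler. */
static void ToyPrepareDriver(TOY_DRIVER *drv)
{
    int i;
    for (i = 0; i < TOY_MJ_COUNT; i++)
        drv->MajorFunction[i] = ToyInvalidRequest;
}

/* The driver fills in only the slots it supports. */
static int ToyDriverEntry(TOY_DRIVER *drv)
{
    drv->MajorFunction[TOY_MJ_CREATE] = XxDispCreateClose;
    drv->MajorFunction[TOY_MJ_CLOSE]  = XxDispCreateClose;
    drv->MajorFunction[TOY_MJ_READ]   = XxDispRead;
    return TOY_SUCCESS;
}

/* Use the function code in the request as an index into the table. */
static int ToyDispatch(TOY_DRIVER *drv, TOY_IRP *Irp)
{
    return drv->MajorFunction[Irp->MajorFunction](Irp);
}
```

Because the default handler is installed first, this toy driver never has to mention the write or device-control slots; requests for them simply come back as invalid.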
If you're writing a driver for one of the standard system devices, or if you're writing a layered driver that sits on top of such a device, it's important that you support a specific set of required IRP function codes. Part II of the Windows NT DDK Kernel-mode Driver Reference contains extensive descriptions of the IRP_MJ_XXX function codes your driver must process if it supports one of the standard devices.

Table 8.1 Commonly used IRP function codes and their Win32 functions

IRP_MJ_XXX function codes

Function code                     Description                               Win32 function
IRP_MJ_CREATE                     Request for a handle.                     • CreateFile
IRP_MJ_CLEANUP                    Cancel pending IRPs when handle closes.   • CloseHandle
IRP_MJ_CLOSE                      Close the handle.                         • CloseHandle
IRP_MJ_READ                       Get data from device.                     • ReadFile
IRP_MJ_WRITE                      Send data to device.                      • WriteFile
IRP_MJ_DEVICE_CONTROL             Control operation available to user       • DeviceIoControl
                                  or kernel-mode clients.
IRP_MJ_INTERNAL_DEVICE_CONTROL    Control operation only available to       (No Win32 call)
                                  kernel-mode clients.
IRP_MJ_QUERY_INFORMATION          Get length of file.                       • GetFileSize
IRP_MJ_SET_INFORMATION            Set length of file.                       • SetEndOfFile
IRP_MJ_FLUSH_BUFFERS              Write output buffers or discard           • FlushFileBuffers
                                  input buffers.                            • FlushConsoleInputBuffer
                                                                            • PurgeComm
IRP_MJ_SHUTDOWN                   System shutting down.                     • InitiateSystemShutdown

Note: See NTDDK.H or the online documentation for a complete list of IRP_MJ_XXX codes.

8.2 EXTENDING THE DISPATCH INTERFACE

What do you do if you need to perform a device operation other than the ones listed in Table 8.1? The I/O Manager doesn't permit you to add any new IRP function codes, so that's not an option. Fortunately, two of the standard IRP_MJ_XXX values are escape codes that allow you to define any number of driver-specific operations:

• IRP_MJ_DEVICE_CONTROL - Lets you define functions that are available to user-mode clients through the Win32 DeviceIoControl function. Other drivers can also issue these control requests by building appropriate IRPs.

• IRP_MJ_INTERNAL_DEVICE_CONTROL - Lets you define functions that are only available to kernel-mode clients (usually other drivers). There is no user-mode API function that can generate one of these requests.

Both these functions pass a driver-defined 32-bit value as a parameter in the IRP. This value is referred to as an I/O control code (IOCTL), and your driver uses it to determine just what operation it should perform. The rest of this section explains how this interface works. Later in the chapter, you'll see how to process these functions when they appear in an IRP.

Defining Private IOCTL Values

The IOCTL values passed to your driver have a very specific structure. Figure 8.2 illustrates the fields that make up one of these codes. Although you can fabricate these control codes by hand, it's much easier to generate them using the CTL_CODE macro that comes with the DDK. As you can see from Table 8.2, the arguments to this macro parallel the fields of an IOCTL code.

IOCTL Argument-Passing Methods

In many situations, you'll want to define IOCTL codes that either need additional arguments from the caller, or that need to pass information back to the caller. For example, an IOCTL that queried a driver for performance data would need some way to return the data. The Win32 DeviceIoControl function solves this problem by letting the user specify a pair of input and output buffer addresses along with the IOCTL code. The question then becomes: Does the I/O Manager pass these buffers to your driver using Buffered or Direct I/O?

You may be tempted to think that the buffering method used for IOCTLs will be the same one you specified with the DO_BUFFERED_IO or DO_DIRECT_IO flags in the Device object. However, the method used for a device's IOCTLs is not necessarily the same as the method used for data transfers.
For greater flexibility, the I/O Manager uses a field in the IOCTL code itself to determine the buffering method. This allows you to choose different buffering methods for each individual IOCTL.

Figure 8.2  Layout of an IOCTL code (Copyright © 1996 by Cydonix Corporation)

Bits 31-16     Bits 15-14         Bits 13-2       Bits 1-0
Device Type    Required Access    Control Code    Transfer Type

Table 8.2  Use the CTL_CODE macro to define IOCTL codes

CTL_CODE macro

Parameter         Description
----------------  ------------------------------------------------------------------
DeviceType        FILE_DEVICE_XXX value given to IoCreateDevice
                  • 0x0000 to 0x7FFF - reserved for Microsoft
                  • 0x8000 to 0xFFFF - available for customer device types
ControlCode       Driver-defined IOCTL code
                  • 0x000 to 0x7FF - reserved for Microsoft
                  • 0x800 to 0xFFF - available for customer IOCTLs
TransferType      Buffer-passing mechanism for this control code (see below)
                  • METHOD_BUFFERED
                  • METHOD_IN_DIRECT
                  • METHOD_OUT_DIRECT
                  • METHOD_NEITHER
RequiredAccess    Access that must be requested when user calls Win32 CreateFile
                  • FILE_ANY_ACCESS
                  • FILE_READ_DATA
                  • FILE_WRITE_DATA
                  • FILE_READ_DATA | FILE_WRITE_DATA

As you can see from Figure 8.2, the TransferType field is located in the lowest two bits of the IOCTL code. It can take on one of the following values:

• METHOD_BUFFERED - The I/O Manager moves IOCTL data to and from the driver using an intermediate nonpaged pool buffer.
• METHOD_IN_DIRECT - IOCTL data coming from the caller is passed using Direct I/O; data going from the driver back to the caller is passed through an intermediate system-space buffer.
• METHOD_OUT_DIRECT - Data coming from the caller passes through a system-space buffer; data going back to the caller is passed using Direct I/O.
• METHOD_NEITHER - The I/O Manager simply gives the driver raw user-space addresses for the caller's incoming and outgoing IOCTL buffers.

If your driver supports a public IOCTL defined by Windows NT, it has to use the method embedded in the IOCTL.[1]
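As a rough illustration of the layout in Figure 8.2, the following user-mode C sketch packs the four fields into a 32-bit value and pulls them back out. It is not DDK code: the xx_ functions and XX_ constants are made up for this example, and a real driver would use the CTL_CODE macro and the DDK's own constants instead of local copies. The numeric values assumed here (0x22 for FILE_DEVICE_UNKNOWN, 0 through 3 for the four METHOD_ values, 0 for FILE_ANY_ACCESS) are the ones found in the DDK headers.

```c
#include <stdint.h>

/* Hypothetical local stand-ins for DDK definitions (assumed values). */
#define XX_FILE_DEVICE_UNKNOWN  0x22u
#define XX_METHOD_BUFFERED      0u
#define XX_METHOD_IN_DIRECT     1u
#define XX_METHOD_OUT_DIRECT    2u
#define XX_METHOD_NEITHER       3u
#define XX_FILE_ANY_ACCESS      0u

/* Pack the four fields using the bit positions shown in Figure 8.2:
 * device type in bits 31-16, required access in bits 15-14,
 * control code in bits 13-2, transfer type in bits 1-0. */
static uint32_t xx_make_ioctl(uint32_t device_type, uint32_t control_code,
                              uint32_t transfer_type, uint32_t required_access)
{
    return (device_type << 16) | (required_access << 14) |
           (control_code << 2) | transfer_type;
}

/* Field extractors: shift the field down, then mask off its width. */
static uint32_t xx_ioctl_device_type(uint32_t ioctl)
{
    return ioctl >> 16;
}

static uint32_t xx_ioctl_required_access(uint32_t ioctl)
{
    return (ioctl >> 14) & 0x3u;
}

static uint32_t xx_ioctl_control_code(uint32_t ioctl)
{
    return (ioctl >> 2) & 0xFFFu;
}

static uint32_t xx_ioctl_transfer_type(uint32_t ioctl)
{
    return ioctl & 0x3u;
}
```

Under these assumptions, a private IOCTL with device type FILE_DEVICE_UNKNOWN, control code 0x801, METHOD_BUFFERED, and FILE_ANY_ACCESS packs to 0x00222004. Masking that value with 0x3 recovers the transfer type, which is the same two-bit field the I/O Manager consults to pick a buffering method.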
For private IOCTLs, you can choose the I/O method that makes the most sense for the operation. The guidelines for choosing an IOCTL buffering method are the same as those for choosing a data transfer buffering method. Buffered I/O is suitable for small amounts of data (less than PAGE_SIZE bytes), while Direct I/O is a better approach for large buffers or DMA operations.

[1] For a complete list of public IOCTLs, see the header file MSTOOLS\H\WINIOCTL.H.

Writing IOCTL Header Files

It's a good idea to write a separate header file for your control-code definitions. This header file should also contain any structures that describe the contents of the IOCTL's input or output buffers. You'll need to include this header file in both the driver and any user-mode programs that issue Win32 DeviceIoControl calls to the driver.[2] The following is an example of an IOCTL header file:

    #define IOCTL_XXDEVICE_AIM CTL_CODE(    \
                FILE_DEVICE_UNKNOWN,        \
                0x801,                      \
                METHOD_BUFFERED,            \
                FILE_ANY_ACCESS )           /* [3] */
    //
    // Structures used by IOCTL_XXDEVICE_AIM
    //
    typedef struct _XX_AIM_IN_BUFF {
        ULONG Longitude;
        ULONG Latitude;
    } XX_AIM_IN_BUFF, *PXX_AIM_IN_BUFF;

    typedef struct _XX_AIM_OUT_BUFF {
        ULONG ExtendedStatus;
    } XX_AIM_OUT_BUFF, *PXX_AIM_OUT_BUFF;

    #define IOCTL_XXDEVICE_LAUNCH CTL_CODE( \
                FILE_DEVICE_UNKNOWN,        \
                0x802,                      \
                METHOD_NEITHER,             \
                FILE_ANY_ACCESS )

[2] Additionally, the Win32 program will need to include WINIOCTL.H and the driver will need to include DEVIOCTL.H to get the definition of the CTL_CODE macro. These header files need to be included before you include the file with your IOCTL definitions.

[3] Microsoft recommends that the names you give to private IOCTLs look like IOCTL_Device_Function, where Device identifies the device that supports the IOCTL, and Function describes the effect of the IOCTL.

8.3 WRITING DRIVER DISPATCH ROUTINES

Once you've chosen an appropriate set of I/O function codes, you need to write the Dispatch routines themselves. This section explains how to code these routines.

Execution Context

By the time it calls your Dispatch routine, the I/O Manager has already checked the accessibility of the caller's buffer. If this is a Buffered I/O operation, it has also allocated a system buffer from nonpaged pool, and for output (write) requests, copied the caller's data into the system buffer. For Direct I/O operations, the caller's buffer has been faulted into physical memory and locked down.

Like your driver's initialization and cleanup routines, Dispatch routines run at PASSIVE_LEVEL IRQL, which means they can access paged system resources. Table 8.3 shows the prototype for a Dispatch routine.

Normally, a Dispatch routine works only with the contents of the IRP. If a Dispatch routine touches any data structures shared with other parts of the driver, it has to synchronize itself properly. This means using a spin lock to coordinate with driver routines running at DISPATCH_LEVEL IRQL and KeSynchronizeExecution to synchronize with the Interrupt Service code.

Never forget that you're sharing the IRP with the I/O Manager. In particular, the system uses various fields in the Parameters union to clean up after I/O operations. For example, after a Buffered I/O, it eventually needs to deallocate its nonpaged pool buffer. A field in the IRP gives it the location of this buffer. Changing the contents of the IRP can lead to unspecified (but dreadful) results when the I/O Manager tries to finish processing the request. If you need to modify any IRP fields, make working copies in local variables or in the Device Extension. Modify these working copies and not the data in the IRP.

The only exceptions to this rule are the I/O status block and the Others structure in the Parameters union.
Chapter 15 will discuss the use of this structure by higher-level drivers.

Table 8.3  Function prototype for a Dispatch routine

NTSTATUS XxDispatch                IRQL == PASSIVE_LEVEL

Parameter                          Description
---------------------------------  ------------------------------------------
IN PDEVICE_OBJECT DeviceObject     Pointer to target device for this request
IN PIRP Irp                        Pointer to IRP describing this request
Return value                       • STATUS_SUCCESS - request complete
                                   • STATUS_PENDING - request pending
                                   • STATUS_XXX - appropriate error code

What Dispatch Routines Do

Keep in mind that the exact behavior of a Dispatch routine will depend on the function code it supports. However, the general responsibilities of these routines include the following:

1. Call IoGetCurrentIrpStackLocation to get a pointer to the IRP stack location belonging to this driver.
2. Perform any additional sanity checking or parameter validation specific to this function code and device.
3. If this is an intermediate-level driver, and there are limitations on the underlying physical device (for example, its maximum transfer size), the Dispatch routine may need to split the caller's request into multiple requests to the device driver. Chapter 15 explains how to do this.
4. Continue processing the IRP until one of three exit conditions occurs.

The following subsections describe some of these steps in greater detail.

Exiting the Dispatch Routine

When a Dispatch routine processes an IRP, there are only three possible outcomes:

• The IRP's request parameters don't pass whatever validation tests you're applying and you need to reject the request.
• You can complete the request entirely in the Dispatch routine without performing any device operations.
• You need to start a device operation in order to complete the request.

Signaling an error: If your Dispatch routine uncovers a problem with the IRP parameters, you need to send the request back to the caller with a nasty message. Follow these steps to reject an IRP:

1. Put an appropriate error code in the Status field of the IRP's I/O status block and clear the Information field.
2. Call IoCompleteRequest to release the IRP with no priority increment.
3. When you exit the Dispatch routine, return the same error code you put in the IRP.

The code fragment below shows how a Dispatch routine rejects an I/O request.

    NTSTATUS
    XxDispWhatever(
        IN PDEVICE_OBJECT pDO,
        IN PIRP Irp )
    {
        Irp->IoStatus.Status = STATUS_BADVIBES;   /* [4] */
        Irp->IoStatus.Information = 0;
        IoCompleteRequest( Irp, IO_NO_INCREMENT );
        return STATUS_BADVIBES;
    }

[4] No, STATUS_BADVIBES isn't a real NTSTATUS code.

Completing a request: You can process some kinds of IRP function codes without actually performing any device operations. Opening a handle to a device, or returning information stored in the Device object are examples of these kinds of requests. To complete a request in the Dispatch routine, do the following:

1. Put a successful completion code in the Status field of the IRP's I/O status block, and set the Information field to some appropriate value.
2. Call IoCompleteRequest to release the IRP with no priority increment.
3. Exit the Dispatch routine with a value of STATUS_SUCCESS.

The code fragment below shows how a Dispatch routine completes a request.

    NTSTATUS
    XxDispClose(
        IN PDEVICE_OBJECT pDO,
        IN PIRP Irp )
    {
        Irp->IoStatus.Status = STATUS_SUCCESS;
        Irp->IoStatus.Information = 0;
        IoCompleteRequest( Irp, IO_NO_INCREMENT );
        return STATUS_SUCCESS;
    }

Starting a device operation: The last possibility is that the IRP is requesting an actual device operation. This could be either a data transfer, a control function, or an informational query. In this case, the Dispatch routine has to pass the IRP to the driver's Start I/O routine. To start a device operation, do the following:

1. Call IoMarkIrpPending so that the I/O Manager won't try to complete the request.
2. Call IoStartPacket to send the request to your driver's Start I/O routine. If you manage your own IRP queues, call your driver's internal routine to start the I/O.
3. Exit the Dispatch routine with a value of STATUS_PENDING.

The following code fragment shows how a Dispatch routine starts a device operation.

    NTSTATUS
    XxDispWrite(
        IN PDEVICE_OBJECT pDO,
        IN PIRP Irp )
    {
        IoMarkIrpPending( Irp );
        IoStartPacket( pDO, Irp, 0, NULL );
        return STATUS_PENDING;
    }

It's a little-known fact that the I/O Manager automatically completes any IRP that isn't marked pending as soon as the Dispatch function returns. Unfortunately, this automatic mechanism doesn't work the same way as an explicit call to IoCompleteRequest. In particular, it doesn't include calling any I/O Completion routines attached to the IRP by higher-level drivers. Consequently, it's important that your driver either marks an IRP as pending or completes it explicitly with IoCompleteRequest.

8.4 PROCESSING SPECIFIC KINDS OF REQUESTS

The previous section described the general kinds of processing done by a driver's Dispatch routines. These routines may also need to perform various operations that depend on the IRP's function code and the buffering strategy used with the device. This section discusses some of these request-specific issues. This material is also relevant to the Start I/O routine and other parts of a driver, but it appears here because this is the first place where you might run into it.

Processing Read and Write Requests

Chapter 6 explained how to create Device objects, which included setting the DO_BUFFERED_IO or DO_DIRECT_IO bits in the Device object's Flags field. These bits control the I/O Manager's behavior for all IRP_MJ_READ and IRP_MJ_WRITE requests sent to the device. Here's what happens once you've set these flags.
Buffered I/O: At the start of both read and write requests, the I/O Manager checks the accessibility of the user buffer. It then allocates a piece of nonpaged pool as big as the caller's buffer and puts its address in the AssociatedIrp.SystemBuffer field of the IRP. This is the buffer your driver should use for the actual data transfer.

For IRP_MJ_READ operations, the I/O Manager also sets the IRP's UserBuffer field to the user-space address of the caller's buffer. Later, when the request is completed, it will use this address to copy data from the driver's system-space buffer back to the caller's buffer. For an IRP_MJ_WRITE request, the I/O Manager sets the IRP's UserBuffer field to NULL and copies the contents of the user buffer into the system buffer.

Direct I/O: The I/O Manager checks the accessibility of the user buffer and locks it in physical memory. It then builds a Memory Descriptor List (MDL) for the buffer and stores the address of the MDL in the IRP's MdlAddress field. Both the AssociatedIrp.SystemBuffer and UserBuffer fields are set to NULL.

Normally, you use the MDL to set up a DMA operation, as you'll see in Chapter 12. If you're performing Direct I/O with a programmed I/O device, you can use the MmGetSystemAddressForMdl function to get a system-space address for the user buffer. This function doubly maps the caller's buffer into a range of nonpaged system space. (In effect, the buffer lives at two virtual addresses at one time.) When your driver completes the I/O request, the system automatically unmaps the buffer from system space.[5]

[5] Drivers ought to avoid this technique, because releasing the doubly-mapped pages causes every CPU in the system to flush its data cache. This is terrible for system performance.

Neither method: If you specify neither Buffered nor Direct I/O when you create a Device object, it's up to your driver to decide how to handle buffering issues. The I/O Manager simply puts the user-space address of the caller's buffer into the IRP's UserBuffer field. In this case, the IRP's AssociatedIrp.SystemBuffer and MdlAddress fields have no meaning and are set to NULL.

Be very careful about accessing the caller's buffer in user space with the UserBuffer field of the IRP, even if the buffer is locked down. Since IRPs are processed asynchronously, there's no guarantee that the calling process will still be mapped into user space by the time your driver executes. The only exception to this rule is that the Dispatch routines (and only the Dispatch routines) of a highest-level driver can use UserBuffer to access the caller's buffer. This is because these routines always run in the context of the thread issuing the I/O request. Other routines in a highest-level driver (and any routine in a lower driver) don't have this guarantee.

Processing IOCTL Requests

Once your driver has filled in either the IRP_MJ_DEVICE_CONTROL or the IRP_MJ_INTERNAL_DEVICE_CONTROL slots in the Driver object's MajorFunction table, the I/O Manager starts passing these requests to the associated Dispatch routines. At this point, your driver has to decide what to do with the request.

Other than buffer access checking (described later), the I/O Manager does no validation of either the IOCTL control code itself or the contents of the caller's buffers. (For example, the FILE_DEVICE_XXX field of the IOCTL does not have to match that of the target Device object.) The caller could pass any random number as an IOCTL code, and it would find its way to your IOCTL Dispatch routine. So, it's up to you to do any necessary sanity checking.

IOCTL dispatchers usually turn into one of those horrendous switch statements that Microsoft finds so intriguing. The following skeleton of code shows the general layout of a Dispatch routine that processes IOCTL requests.
    NTSTATUS
    XxDispIoControl(                                    // (1)
        IN PDEVICE_OBJECT pDO,
        IN PIRP Irp )
    {
        PIO_STACK_LOCATION IrpStack;
        ULONG ControlCode;
        ULONG InputLength, OutputLength;
        NTSTATUS Status;

        IrpStack = IoGetCurrentIrpStackLocation( Irp );
        //
        // Extract useful information from the I/O stack
        //
        ControlCode =
            IrpStack->Parameters.DeviceIoControl.IoControlCode;
        InputLength =
            IrpStack->Parameters.DeviceIoControl.InputBufferLength;
        OutputLength =
            IrpStack->Parameters.DeviceIoControl.OutputBufferLength;

        switch( ControlCode )
        {
            case IOCTL_XXDEVICE_AIM:                    // (2)
                //
                // Check buffer sizes and fail if
                // not enough space...
                //
                if(( InputLength <                      // (3)
                        sizeof( XX_AIM_IN_BUFF )) ||
                   ( OutputLength <
                        sizeof( XX_AIM_OUT_BUFF )))
                {
                    Status = STATUS_INVALID_BUFFER_SIZE;
                    break;
                }
                //
                // Everything's OK; pass IRP to Start I/O
                //
                IoMarkIrpPending( Irp );                // (4)
                IoStartPacket( pDO, Irp, 0, NULL );
                return STATUS_PENDING;

            case IOCTL_XXDEVICE_LAUNCH:
                if( InputLength > 0 ||                  // (5)
                    OutputLength > 0 )
                {
                    Status = STATUS_INVALID_PARAMETER;
                    break;
                }
                //
                // Same kind of processing as the case
                // above
                //
                IoMarkIrpPending( Irp );
                IoStartPacket( pDO, Irp, 0, NULL );
                return STATUS_PENDING;

            //
            // It's not a recognized control code...
            //
            default:
                Status = STATUS_INVALID_DEVICE_REQUEST;
                break;
        }
        //
        // We only wind up here if there's an error
        //
        Irp->IoStatus.Status = Status;                  // (6)
        Irp->IoStatus.Information = 0;
        IoCompleteRequest( Irp, IO_NO_INCREMENT );
        return Status;
    }

(1) If you support both external IRP_MJ_DEVICE_CONTROL and internal IRP_MJ_INTERNAL_DEVICE_CONTROL (kernel-mode only) interfaces, you'll probably want individual IOCTL Dispatch routines for each major function code.
(2) Include a separate case for each IOCTL code that might appear. Any code that isn't supported will end up in the default case and fail.
(3) You have to make sure that any buffers associated with the IOCTL are big enough. This has to be checked individually for each IOCTL code, since different control codes may have different input and output structures.
(4) If the IOCTL makes it through all the validation checks, it gets sent to the driver's Start I/O routine. This assumes that the IOCTL causes some kind of device operation. For IOCTLs that don't require device activity, you can perform the operation and complete the IRP successfully from the XxDispIoControl routine.
(5) If you're not expecting any buffers for a particular IOCTL code, you might want to return STATUS_INVALID_PARAMETER and fail. This isn't really an error, but it makes you wonder if the caller is missing a clue or two.
(6) If something is wrong with this IOCTL request, fail the IRP using whatever status value was generated by the switch statement.

Managing IOCTL Buffers

IOCTL requests can involve both an input buffer coming from the caller and an output buffer being returned to the caller. As a result, they act like a combination of a write operation followed by a read. From previous sections of this chapter, you know that the buffering strategy used for an IOCTL request is determined by the low-order 2 bits of the IOCTL code itself. The following paragraphs describe how the various buffering methods work.

METHOD_BUFFERED: The I/O Manager starts by allocating a single chunk of nonpaged pool that's big enough to hold either the caller's input or output buffer (whichever is larger). It puts the address of the nonpaged pool buffer in the IRP's AssociatedIrp.SystemBuffer field. It then copies the IOCTL's input data into the system buffer and sets the UserBuffer field of the IRP to the user-space output buffer address. When your driver completes the IOCTL IRP, the I/O Manager copies the contents of the system buffer back into the caller's output buffer.
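The METHOD_BUFFERED sequence described above can be modeled in ordinary user-mode C. This is only a sketch of the copy-in and copy-out behavior, not kernel code: xx_buffered_ioctl, xx_handler_fn, and xx_sum_handler are hypothetical names, malloc stands in for nonpaged pool, and the handler's return value plays the role of a byte count reported by the driver when it completes the request (an assumption of this model).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: works on the shared system buffer and
 * returns the number of output bytes it produced. */
typedef size_t (*xx_handler_fn)(void *system_buffer,
                                size_t input_length,
                                size_t output_length);

/* Model of the I/O Manager's side of a METHOD_BUFFERED IOCTL.
 * Returns 0 on success, -1 if the buffer could not be allocated. */
static int xx_buffered_ioctl(xx_handler_fn handler,
                             const void *in, size_t in_len,
                             void *out, size_t out_len,
                             size_t *bytes_returned)
{
    /* One buffer, big enough for the larger of the two user buffers. */
    size_t size = (in_len > out_len) ? in_len : out_len;
    void *system_buffer = malloc(size ? size : 1);
    size_t produced;

    if (system_buffer == NULL)
        return -1;

    /* Copy the caller's input data into the system buffer. */
    if (in_len > 0)
        memcpy(system_buffer, in, in_len);

    /* The "driver" sees only the system buffer. */
    produced = handler(system_buffer, in_len, out_len);
    if (produced > out_len)
        produced = out_len;

    /* On completion, copy the results back to the caller's buffer. */
    if (produced > 0)
        memcpy(out, system_buffer, produced);

    free(system_buffer);
    *bytes_returned = produced;
    return 0;
}

/* Example handler: the input is two values, the output is their sum.
 * It copies the input into locals before writing any output, because
 * input and output occupy the same buffer. */
static size_t xx_sum_handler(void *system_buffer,
                             size_t input_length, size_t output_length)
{
    unsigned long in[2];
    unsigned long sum;

    if (input_length < sizeof in || output_length < sizeof sum)
        return 0;

    memcpy(in, system_buffer, sizeof in);      /* read the input first */
    sum = in[0] + in[1];
    memcpy(system_buffer, &sum, sizeof sum);   /* then overwrite it */
    return sizeof sum;
}
```

The handler's read-then-write order is the part worth noticing. If it wrote the sum first, it would clobber its own first operand, since both live at the start of the one shared buffer.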
Since the same piece of nonpaged pool is being used for both the input and output buffers, your driver should read all incoming data before it writes any output data to the buffer.

METHOD_IN_DIRECT: The I/O Manager checks the accessibility of the caller's input buffer and locks it into physical memory. It then builds an MDL for the input buffer and stores a pointer to the MDL in the MdlAddress field of the IRP.

It also allocates an output buffer from nonpaged pool and stores the address of this buffer in the IRP's AssociatedIrp.SystemBuffer field. The IRP's UserBuffer field is set to the original caller's output buffer address. When the IOCTL IRP is completed, the contents of the system buffer will be copied back into the caller's original output buffer.

METHOD_OUT_DIRECT: The I/O Manager checks the accessibility of the caller's output buffer and locks it into physical memory. It then builds an MDL for the output buffer and stores a pointer to the MDL in the MdlAddress field of the IRP.

The I/O Manager also allocates an input buffer from nonpaged pool and stores its address in the IRP's AssociatedIrp.SystemBuffer field. It copies the contents of the caller's original input buffer into the system buffer and sets the IRP's UserBuffer field to NULL.

METHOD_NEITHER: The I/O Manager puts the address of the caller's input buffer in the Parameters.DeviceIoControl.Type3InputBuffer field of the IRP's current I/O stack location. It stores the address of the output buffer in the IRP's UserBuffer field. Both of these are user-space addresses.

8.5 TESTING DRIVER DISPATCH ROUTINES

Your driver still has a long way to go, but once again, you can verify some aspects of its operation.
In particular, you can test the driver to be sure that it

• Opens and closes a handle to the device
• Supports Win32 I/O function calls that return successfully
• Manages requests from multiple callers

Still not very ambitious goals, but if you complete these tests successfully, your driver will be one step closer to full operation.

Testing Procedure

The following procedure will let you check all the code paths through your driver's Dispatch routines.

1. Write IRP_MJ_CREATE and IRP_MJ_CLOSE Dispatch routines for your driver.
2. Test the driver with a simple Win32 console program that gets a handle to your device and then closes the handle.
3. Write other Dispatch routines but modify them so that they always call IoCompleteRequest rather than starting any device operations.
4. Modify your Win32 test program to make ReadFile, WriteFile, and DeviceIoControl calls that exercise each driver Dispatch routine.
5. If your device is shareable, run several copies of the test program at once to be sure the driver works with multiple open handles.
6. If your driver supports multiple physical devices, repeat the tests with each device unit.

Sample Test Program

This is an example of the kind of test program you can use to verify the code paths through a driver's Dispatch routines.

    #include <windows.h>
    #include <stdio.h>

    VOID main( VOID )
    {
        HANDLE hDevice;
        BOOL status;

        hDevice = CreateFile( "\\\\.\\XX1" ... );
        status = ReadFile( hDevice ... );
        status = WriteFile( hDevice ... );
        status = DeviceIoControl( hDevice ... );
        status = CloseHandle( hDevice );
    }

8.6 SUMMARY

In this chapter, you've seen the beginning of the I/O processing cycle. By now, you should have a good idea of what IRP function codes your driver will need to support. If some of these functions include IOCTLs, the information in this chapter will help you implement them correctly.
If you're writing a higher-level driver, that may be the end of the story. For device drivers, however, there's still more to do. In the next chapter, you'll see how to perform actual data transfer operations.

CHAPTER 9

Programmed I/O Data Transfers

Devices that do programmed I/O need a great deal of attention from the CPU while they transfer data. Usually, these are slow devices (like the mouse or keyboard) that don't move large amounts of data in a single operation. This chapter explains how to write the data transfer sections of drivers for this kind of hardware.

9.1 HOW PROGRAMMED I/O WORKS

This section describes the events that occur during a programmed I/O operation, as well as describing some of the other issues a driver will have to face.

What Happens during Programmed I/O

In a programmed I/O operation, the CPU transfers each unit of data to or from the device in response to an interrupt. Referring to Figure 9.1, the following sequence of events takes place:

1. The Start I/O routine performs any necessary preprocessing and setup based on the IRP_MJ_XXX function code in the IRP. It then starts the device.
2. Eventually, the device generates an interrupt which the Kernel passes to the driver's Interrupt Service routine.
3. If there is any more data, the Interrupt Service routine starts the next transfer. Steps 2 and 3 may repeat any number of times until the operation is complete.
4. When the operation completes, either because there's no more data or because an error occurs, the Interrupt Service routine queues a request to fire off the driver's DpcForIsr routine.
5. The DPC dispatcher eventually runs the DpcForIsr which releases the current IRP back to the I/O Manager. If there are any more IRPs waiting, the DpcForIsr sends the next packet to the driver's Start I/O routine, and the whole cycle repeats.

Figure 9.1  Sequence of events in a programmed I/O (diagram labels: Dispatch, start device, Interrupt, Interrupt Service, IoRequestDpc, DpcForIsr, IoCompleteRequest, IoStartNextPacket; Copyright © 1994 by Cydonix Corporation)

Synchronizing Various Driver Routines

Driver routines running at an IRQL below DIRQL must synchronize their access to any device registers or memory areas shared with the driver's Interrupt Service routine. Without this protection, an interrupt might arrive while a low IRQL routine was using the shared resource, and the outcome would be unpredictable (but probably nothing good).

You solve this synchronization problem by putting code that touches these shared resources in a SynchCritSection routine. Table 9.1 shows you the prototype for one of these routines.

Table 9.1  Function prototype for a SynchCritSection routine

BOOLEAN XxSynchCritSection         IRQL == DIRQL

Parameter                          Description
---------------------------------  ----------------------------------------------------
IN PVOID Context                   Pointer to context passed to KeSynchronizeExecution
Return value                       • TRUE - success
                                   • FALSE - something failed

When you need to execute a SynchCritSection routine, you pass its address as an argument to KeSynchronizeExecution (see Table 9.2). This function raises IRQL to the DIRQL level of the Interrupt object, acquires the object's Interrupt spin lock and then calls your SynchCritSection routine. While it's running, your SynchCritSection code is guaranteed not to be interrupted by the device associated with the Interrupt object. When your routine finishes, KeSynchronizeExecution releases the spin lock, drops IRQL back to its original level, and returns to the caller.

Table 9.2  Function prototype for KeSynchronizeExecution

BOOLEAN KeSynchronizeExecution        IRQL < DIRQL

Parameter                             Description
------------------------------------  -------------------------------------------
IN PKINTERRUPT Interrupt              Address of an Interrupt object
IN PKSYNCHRONIZE_ROUTINE Routine      SynchCritSection callback routine
IN PVOID Context                      Argument for SynchCritSection routine
Return value                          Value returned by SynchCritSection routine
Notice that you're allowed to pass some context information to the SynchCritSection routine. Typically, this will be a pointer to the Device or Controller Extension structure.

9.2 DRIVER INITIALIZATION AND CLEANUP

Along with the general initialization and cleanup issues we've seen in previous chapters, there are some specific things that a programmed I/O device driver needs to take care of. The following subsections describe them in detail.

Initializing the Start I/O Entry Point

If your driver has a Start I/O routine, you need to let the I/O Manager know where to find it. You do this by putting the address of the Start I/O routine into the DriverStartIo field of the Driver object, as in the following code fragment:

    NTSTATUS
    DriverEntry(
        IN PDRIVER_OBJECT DriverObject,
        IN PUNICODE_STRING RegistryPath )
    {
        //
        // Export other driver entry points
        //
        DriverObject->DriverStartIo = XxStartIo;
        DriverObject->DriverUnload = XxDriverUnload;
        ...
        DriverObject->MajorFunction[ IRP_MJ_CREATE ] =
                            XxDispatchOpenClose;
        ...
    }

If you forget to initialize this entry point, you'll get an access violation (and a bright blue screen) when your Dispatch routines call IoStartPacket.

Initializing a DpcForIsr Routine

The I/O Manager provides you with a simplified version of the DPC mechanism. Tucked away inside each Device object is a single DPC object. To use it, your DriverEntry routine just calls IoInitializeDpcRequest and associates a DpcForIsr callback with the Device object. Later, your driver's Interrupt Service routine can trigger this DPC by calling IoRequestDpc.

For some kinds of drivers, this simplified mechanism is too limited. In Chapter 11, you'll see how to set up your own DPC objects if you need the flexibility of multiple DPCs.
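One consequence of having a single DPC object per Device object is that a DPC that is already waiting to run is not queued a second time. The user-mode sketch below models that behavior under stated assumptions. Every xx_ name is hypothetical and nothing here is DDK code. In particular, the real IoRequestDpc is not something to test for a return value; the int returned by xx_request_dpc exists only to make the model's behavior visible.

```c
#include <stddef.h>

/* Hypothetical user-mode model; none of these types come from the DDK. */
typedef void (*xx_dpc_routine)(void *context);

typedef struct {
    xx_dpc_routine routine;   /* set once during initialization */
    void          *context;   /* argument for the next run */
    int            queued;    /* nonzero while the DPC is waiting to run */
} xx_device_dpc;

/* Models IoInitializeDpcRequest: associate one callback with the device. */
static void xx_initialize_dpc_request(xx_device_dpc *dpc,
                                      xx_dpc_routine routine)
{
    dpc->routine = routine;
    dpc->context = NULL;
    dpc->queued = 0;
}

/* Models an ISR asking for the DPC. Returns 1 if the DPC was queued,
 * 0 if it was already waiting (it is never queued twice). */
static int xx_request_dpc(xx_device_dpc *dpc, void *context)
{
    if (dpc->queued)
        return 0;
    dpc->context = context;
    dpc->queued = 1;
    return 1;
}

/* Models the DPC dispatcher running the callback some time later. */
static void xx_run_pending_dpc(xx_device_dpc *dpc)
{
    if (dpc->queued) {
        dpc->queued = 0;
        dpc->routine(dpc->context);
    }
}

/* Example DpcForIsr stand-in: counts how many times it has run. */
static void xx_count_dpc(void *context)
{
    ++*(int *)context;
}
```

In this model, two requests made before the dispatcher runs produce a single callback. A DpcForIsr written against this mechanism therefore cannot assume one call per interrupt, and has to work out from the device or the Device Extension how much work is actually pending.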
Connecting to an Interrupt Source

Before you can process interrupts, you have to establish a connection between your device's interrupt vector and an Interrupt Service routine in your driver. You do this by calling the IoConnectInterrupt[1] function described in Table 9.3. Given an Interrupt Service routine and some of the translated information generated by your hardware location code, this function adds your ISR to the Kernel's list of interrupt handlers.

[1] If you recall, we first bumped into this function in the driver initialization code in Chapter 6, where we treated it as a necessary bit of magic.

Table 9.3  Function prototype for IoConnectInterrupt

NTSTATUS IoConnectInterrupt              IRQL == PASSIVE_LEVEL

Parameter                                Description
---------------------------------------  ----------------------------------------------------------------
OUT PKINTERRUPT *InterruptObject         Address of variable that receives pointer to Interrupt object
IN PKSERVICE_ROUTINE ServiceRoutine      ISR that handles this interrupt
IN PVOID ServiceContext                  Context argument passed to ISR; usually the Device Extension
IN PKSPIN_LOCK SpinLock                  Initialized spin lock (see below)
IN ULONG Vector                          Translated interrupt vector value
IN KIRQL Irql                            DIRQL value for device
IN KIRQL SynchronizeIrql                 Usually same as DIRQL (see below)
IN KINTERRUPT_MODE InterruptMode         • LevelSensitive
                                         • Latched
IN BOOLEAN ShareVector                   If TRUE, identifies this vector as shareable
IN KAFFINITY ProcessorEnableMask         Set of CPUs on which device interrupt can occur
IN BOOLEAN FloatingSave                  If TRUE, save the state of the FPU during an interrupt
Return value                             • STATUS_SUCCESS
                                         • STATUS_INVALID_PARAMETER
                                         • STATUS_INSUFFICIENT_RESOURCES

If it works, IoConnectInterrupt returns a pointer to an Interrupt object. You should store this pointer in your Device or Controller Extension because you'll need it in order to disconnect from the interrupt source or to execute any SynchCritSection routines.

Three things are worth mentioning about IoConnectInterrupt.
First, if your ISR handles more than one interrupt vector, or if your driver has more than one ISR, you need to supply the system with a spin lock to prevent collisions over the ISR's ServiceContext. If you're not doing either of those things, then this spin lock is unnecessary.[2]

[2] Normally, you declare storage space for this spin lock in the Device or Controller Extension. Remember to call KeInitializeSpinLock before you connect to an interrupt source.

Second, if the ISR manages more than one interrupt vector, or your driver has more than one ISR, make sure that the value you specify for SynchronizeIrql is the highest DIRQL value of any of the vectors you're using.

Finally, your driver's Interrupt Service routine must be ready to run as soon as you call this function. Interrupts from your device (or from other devices at the same IRQL) may preempt any additional initialization done by your driver, and the ISR has to handle these interrupts correctly. So, make sure all the necessary driver setup work is done before you connect to an interrupt. In general, you should follow this kind of sequence:

1. Call IoInitializeDpcRequest to initialize the Device object's DPC and perform any initialization needed to make the DpcForIsr routine execute properly.
2. Disable interrupts from the device by setting appropriate bits in the device's control registers.
3. Perform any driver initialization required by the ISR in order for it to run properly.
4. Call IoConnectInterrupt to attach your ISR to an interrupt source and store the address of the Interrupt object in the Device Extension.
5. Use a SynchCritSection routine to put the device into a known initial state and enable device interrupts.

Disconnecting from an Interrupt Source

If your driver is unloadable, you need to detach its Interrupt Service routine from the Kernel's list of interrupt handlers before the driver is removed from memory.
If you forget to do this and your device generates an interrupt after the driver is unloaded, the Kernel will try to call the address in nonpaged pool where your ISR used to live. Nothing good will happen.

Disconnecting from an interrupt is a two-step procedure. First, use KeSynchronizeExecution and a SynchCritSection routine to disable the device and prevent it from generating any further interrupts. Second, remove your ISR from the Kernel's list of handlers by passing the device's Interrupt object to IoDisconnectInterrupt.

9.3 WRITING A START I/O ROUTINE

In the rest of this chapter, we'll be developing a programmed I/O driver for a parallel port. To keep things simple, this driver ignores many of the details you'd have to consider if you were writing a commercial driver. Take a look at the sample driver that comes with the NT DDK to see what's involved in managing these devices.

Execution Context

The I/O Manager calls your Start I/O routine (described in Table 9.4) either when a Dispatch routine calls IoStartPacket (if the device was idle), or when some other part of the driver calls IoStartNextPacket. In either case, Start I/O runs at DISPATCH_LEVEL IRQL, so it mustn't do anything that causes a page fault.

Table 9.4  Function prototype for a Start I/O routine

VOID XxStartIo                            IRQL == DISPATCH_LEVEL

Parameter                                 Description
IN PDEVICE_OBJECT DeviceObject            Target device for this request
IN PIRP Irp                               IRP describing the request
Return value                              (none)

What the Start I/O Routine Does

Your driver's Start I/O routine is responsible for doing any function-code-specific processing needed by the current IRP and then starting the actual device operation. In general terms, a Start I/O routine will do the following:

1. Call IoGetCurrentIrpStackLocation to get a pointer to the IRP's stack location.
2. If your device supports more than one IRP_MJ_XXX function code, examine the I/O stack location's MajorFunction field to determine the operation.
3. Make working copies of the system buffer pointer and byte count stored in the IRP. The Device Extension is the best place to keep these items.
4. Set a flag in the Device Extension indicating that you expect an interrupt.
5. Begin the actual device operation.

To guarantee proper synchronization, any of these steps that access data shared with the ISR should be performed inside a SynchCritSection routine rather than in Start I/O itself.

9.4 WRITING AN INTERRUPT SERVICE ROUTINE (ISR)

Once a device operation begins, the actual data transfer is driven by the arrival of hardware interrupts. When an interrupt arrives, the driver's Interrupt Service routine acknowledges the request and either transfers the next piece of data or invokes a DPC routine.

Execution Context

When the Kernel gets a device interrupt, it uses its collection of Interrupt objects to locate an ISR willing to service the event. It does this by running through all the Interrupt objects attached to the DIRQL of the interrupt and calling ISRs until one of them claims the interrupt.

The Kernel interrupt dispatcher calls your ISR at the synchronization IRQL you specified in the call to IoConnectInterrupt. Usually this will be the DIRQL of the device. The Kernel dispatcher also acquires and releases the device spin lock for you.

Running at such a high IRQL, there are lots of things your ISR isn't allowed to do. In addition to the usual warning about page faults, your ISR shouldn't try to allocate or free various system resources (like memory). If you plan to call any system support routines from your ISR, check for restrictions on the level at which they can run. You may need to perform those kinds of operations in a DPC routine rather than in the ISR itself.
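The dispatching behavior described at the start of this section, where the Kernel calls each ISR connected at the interrupt's DIRQL until one of them claims the interrupt, can be modeled in ordinary user-mode C. The following is only a sketch under stated assumptions: every name in it (interrupt_object, toy_device, toy_isr, dispatch_interrupt) is hypothetical and is not part of the NT DDK, and no real IRQLs or spin locks are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one connected Interrupt object. */
typedef bool (*isr_fn)(void *context);

struct interrupt_object {
    isr_fn isr;      /* routine to call                        */
    void  *context;  /* plays the role of ServiceContext       */
};

/* Toy device: 'pending' models the device's interrupt-request bit. */
struct toy_device {
    bool pending;
    int  serviced;   /* number of interrupts this device has claimed */
};

/* A well-behaved ISR: reject quickly if the interrupt is not ours. */
static bool toy_isr(void *context)
{
    struct toy_device *dev = context;

    if (!dev->pending)
        return false;        /* not our interrupt; let the next ISR try */
    dev->pending = false;    /* acknowledge the interrupt */
    dev->serviced++;
    return true;             /* claimed */
}

/*
 * Model of the dispatcher: call each ISR sharing this level, in order,
 * until one claims the interrupt. Returns the index of the claiming
 * entry, or -1 if nobody claimed it.
 */
static int dispatch_interrupt(const struct interrupt_object *chain, size_t count)
{
    assert(chain != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (chain[i].isr(chain[i].context))
            return (int)i;
    }
    return -1;
}
```

In this model, every ISR ahead of the claiming one runs its ownership check first, which is why a slow "not mine" path delays everyone queued behind it.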
As you can see from Table 9.5, the Kernel passes you a pointer to whatever context information you identified in IoConnectInterrupt. Most often, this will be a pointer to the Device or Controller Extension.

Table 9.5  Function prototype for an Interrupt Service routine

BOOLEAN XxISR                             IRQL == DIRQL

Parameter                                 Description
IN PKINTERRUPT Interrupt                  Interrupt object generating the interrupt
IN PVOID ServiceContext                   Context area passed to IoConnectInterrupt
Return value                              • TRUE - interrupt was serviced by XxISR
                                          • FALSE - interrupt not serviced

What the Interrupt Service Routine Does

The Interrupt Service routine is the real workhorse in a programmed I/O driver. In general, one of these routines will do the following:

1. Determine if the interrupt belongs to this driver. If not, immediately return a value of FALSE.
2. Perform any operations needed by the device to acknowledge the interrupt.
3. Determine if any more data remains to be transferred. If there is, start the next device operation. This will eventually result in another interrupt.
4. If all the data has been transferred (or if a device error occurred), queue up a DPC request by calling IoRequestDpc.
5. Return a value of TRUE.

Always code an ISR for speed. Any work that isn't absolutely essential should go in a DPC routine. It's especially important that your ISR doesn't drag its feet while determining whether or not to service an interrupt. There may be any number of other ISRs waiting in line behind yours for a given interrupt, and if you do a lot of processing before you decide not to handle the event, you can slow them down.

9.5 WRITING A DPCFORISR ROUTINE

Your driver's DpcForIsr routine is responsible for determining a final status for the current request, completing the IRP, and starting the next one.
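The handoff between the ISR of Section 9.4 and the DpcForIsr routine can be pictured with a small user-mode model. This is a hedged sketch, not driver code: toy_port and the toy_* functions are hypothetical names, the "device" is assumed to raise one interrupt for each byte it accepts, and a plain flag stands in for the IoRequestDpc call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a byte-at-a-time port and its driver state. */
struct toy_port {
    const unsigned char *next;   /* next byte to send                    */
    size_t remaining;            /* bytes still to send                  */
    size_t bytes_on_wire;        /* bytes handed to the device so far    */
    bool   transfer_in_progress; /* set while a request is active        */
    bool   irq_pending;          /* models the device's interrupt line   */
    bool   dpc_requested;        /* stands in for a call to IoRequestDpc */
};

/* Hand one byte to the device; it interrupts when the byte is accepted. */
static void toy_send_byte(struct toy_port *p)
{
    assert(p->remaining > 0);
    p->next++;
    p->remaining--;
    p->bytes_on_wire++;
    p->irq_pending = true;
}

/* Rough analogue of Start I/O: record the request and send the first byte. */
static void toy_start(struct toy_port *p, const unsigned char *buf, size_t len)
{
    p->next = buf;
    p->remaining = len;
    p->bytes_on_wire = 0;
    p->dpc_requested = false;
    p->irq_pending = false;
    p->transfer_in_progress = (len > 0);
    if (len > 0)
        toy_send_byte(p);
    else
        p->dpc_requested = true;  /* nothing to send; finish right away */
}

/* Analogue of the ISR, following the five steps listed in Section 9.4. */
static bool toy_isr(struct toy_port *p)
{
    if (!p->irq_pending)
        return false;                    /* 1. not our interrupt          */
    p->irq_pending = false;              /* 2. acknowledge                */
    if (!p->transfer_in_progress)
        return true;                     /* spurious; claim and ignore    */
    if (p->remaining > 0) {
        toy_send_byte(p);                /* 3. start next operation       */
    } else {
        p->transfer_in_progress = false;
        p->dpc_requested = true;         /* 4. leave cleanup to the DPC   */
    }
    return true;                         /* 5. interrupt was serviced     */
}
```

The model keeps all per-byte work in the ISR and leaves only completion for the deferred step, which mirrors the division of labor this chapter describes.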
Execution Context

In response to the ISR's call to IoRequestDpc, your driver's DpcForIsr routine (described in Table 9.6) is added to the DPC dispatch queue. When the CPU's IRQL value drops below DISPATCH_LEVEL, the DPC dispatcher calls the DpcForIsr routine. Your DpcForIsr routine runs at DISPATCH_LEVEL IRQL, which means it has no access to pageable addresses.

Once you call IoRequestDpc for a given device, the I/O Manager ignores any further IoRequestDpc calls for that device until the DpcForIsr routine executes. This is standard behavior for DPC objects. If your driver design is such that you might issue overlapping DPC requests for the same device, then it's up to you to handle this situation properly. You'll need to keep track of the pending requests and have the DPC routine perform the work for all of them each time it executes.

Table 9.6  Function prototype for a DpcForIsr routine

VOID XxDpcForIsr                          IRQL == DISPATCH_LEVEL

Parameter                                 Description
IN PKDPC Dpc                              DPC object responsible for this call
IN PDEVICE_OBJECT DeviceObject            Target device for I/O request
IN PIRP Irp                               IRP describing the current request
IN PVOID Context                          Context passed to IoRequestDpc
Return value                              (none)

What the DpcForIsr Routine Does

Since most of the work happens during interrupt processing, the DpcForIsr routine in a programmed I/O driver doesn't have a lot to do. In particular, this routine should

1. Set the IRP's I/O status block. Put an appropriate STATUS_XXX code in the Status field and the actual number of bytes transferred in the Information field.
2. Call IoCompleteRequest to complete the IRP with an appropriate priority boost. Once you've made this call, don't touch the IRP again.
3. Call IoStartNextPacket to send the next IRP to Start I/O.

Priority Increments

The NT thread scheduler uses a priority-boosting strategy to keep the CPU and I/O devices as busy as possible.
As you can see from the boost values listed in Table 9.7, priority increments are weighted so as to favor threads working with interactive devices like the mouse and keyboard. As part of this strategy, your driver should compensate any thread that waits for an actual device operation by giving it a priority boost. Choose an appropriate increment from the table and specify it as an argument to IoCompleteRequest.

Table 9.7  Specify one of these values when you complete an I/O request

Priority increment values

Symbol                      Boost   Use when completing...
IO_NO_INCREMENT             0       Requests involving no device I/O
IO_CD_ROM_INCREMENT         1       CD-ROM input
IO_DISK_INCREMENT           1       Disk I/O
IO_PARALLEL_INCREMENT       1       Parallel-port I/O
IO_VIDEO_INCREMENT          1       Video output
IO_MAILSLOT_INCREMENT       2       Mailslot I/O
IO_NAMED_PIPE_INCREMENT     2       Named pipe I/O
IO_NETWORK_INCREMENT        2       Network I/O
IO_SERIAL_INCREMENT         2       Serial-port I/O
IO_MOUSE_INCREMENT          6       Pointing-device input
IO_KEYBOARD_INCREMENT       6       Keyboard input
IO_SOUND_INCREMENT          8       Sound board I/O

9.6 SOME HARDWARE: THE PARALLEL PORT

Before we walk through an example of a programmed I/O driver, it will be helpful to look at some actual hardware. This serves the dual purpose of showing you what kinds of devices tend to perform programmed I/O and of giving us something to control with our driver.

How the Parallel Port Works

The parallel interface found on most PCs is based on an ancient standard from the Centronics Company. Although its original purpose was to communicate with printers, clever people have found ways of attaching everything from disks to optical scanners to the parallel port. The DB-25 connector on this port carries a number of signals, the most important ones being:

• Initialize: The CPU sends a pulse down this line when it wants to initialize the printer.
• Data: The CPU uses these eight lines to send one byte of data to the printer. On systems with extended parallel interfaces, these lines can also be used for input.
• Strobe#: The CPU pulses this line once to let the printer know that valid information is available on the data lines.³
• Busy: The printer uses this line to let the CPU know that it can't accept any data.
• Ack#: The printer sends a single pulse down this line when it is no longer busy.
• Errors: The printer can use several lines to indicate a variety of not-ready and error conditions to the CPU.

The following sequence of events occurs during a data transfer from the CPU to a printer attached to the parallel port:

1. The CPU places a byte on the eight data lines and lets the data settle for at least half a microsecond.
2. The CPU grounds the STROBE# line for at least half a microsecond and then raises it again. This is the signal to the printer that it should latch the byte on the data lines.
3. In response to the new data, the printer raises the BUSY line and starts to process the byte. This usually means moving the byte to an internal buffer.
4. After it processes the character (which may take microseconds or seconds, depending on how full the printer's buffer is), the printer lowers the BUSY line and pulses the ACK# wire by grounding it briefly.⁴

You can see from this description that the parallel port offers a very low-level interface to the outside world. Most of the signaling protocol involved in a data transfer has to be implemented by the CPU itself. This is going to have a major impact on the design of our driver.

³ Following the standard convention, a line with # in its name means that ground indicates a logic-1, while presence of a signal on the line indicates a logic-0.
⁴ Yes, using two lines to indicate a ready status is redundant.

Device Registers

The software interface to the parallel port is through a set of three registers, described in Table 9.8.
Since the parallel port is one of the things detected by autoconfiguration (even on an ISA system), our driver will be able to use the Configuration Manager to find the base address of the data register.

If you look at the bit settings in Table 9.8, you'll notice that some of the bits have the opposite polarity from the signals they represent. For example, you need to set the STROBE bit to 1 if you want to ground the STROBE# wire and get the printer to accept your data. Also, the BUSY wire going to ground causes the BUSY bit in the status register to set itself, so it's really a NOT-BUSY bit. The "solder people" may have a good explanation for all this, but it's usually best to hide these oddities in a hardware header file.⁵

Table 9.8  These registers control a parallel port interface

Parallel port registers

Offset  Register  Access  Description
0       Data      R/W     Data byte transferred through parallel port
1       Status    R/O     Current parallel port status
                            Bits 0-1   Reserved; normally contain a 1
                            Bit 2      0 - interrupt has been requested by port
                            Bit 3      0 - an error occurred
                            Bit 4      1 - printer is selected
                            Bit 5      1 - printer is out of paper
                            Bit 6      0 - acknowledge
                            Bit 7      0 - printer is busy
2       Control   R/W     Commands sent to parallel port
                            Bit 0      1 - strobe data to/from parallel port
                            Bit 1      1 - automatic line feed
                            Bit 2      0 - initialize printer
                            Bit 3      1 - select printer
                            Bit 4      1 - enable interrupts
                            Bit 5      1 - read data from parallel port*
                            Bits 6-7   Reserved; must be 1

*Only valid for extended parallel ports; otherwise this must be 0.

⁵ See the HARDWARE.H header file included in the on-disk version of the sample source code that accompanies this chapter.

Interrupt Behavior

On ISA machines, the parallel port designated as LPT1 normally uses IRQ 7 and LPT2 uses IRQ 5. A device connected to a parallel port generates an interrupt by grounding the ACK# line momentarily.
Most printers yank on this line for any of the following reasons:

• The printer has finished initializing itself.
• The printer has processed one character and is now ready for another.
• Power to the printer has been switched off.
• The printer has gone offline or has run out of paper.

There's some variability in the way different printers implement these features. For example, not all of them generate an interrupt when they've completed their initialization, nor do all printers interrupt when they go offline or run out of paper. The driver developed later in this chapter assumes that all these conditions produce interrupts.

A Driver for the Parallel Port

So, just what is it about the parallel port that makes it a good candidate for programmed I/O? Looking at the device's behavior, one clue is that each byte sent to the device has to be transferred through the CPU. DMA devices work independently of the CPU and don't demand this much attention.

Another hint is that it generates an interrupt after each byte is accepted by the device. This means a large number of interrupts will probably occur before an operation is complete. DMA devices typically generate only a single interrupt when a transfer is complete.

9.7 CODE EXAMPLE: PARALLEL PORT DRIVER

This example shows how to write a basic programmed I/O driver for the parallel port. You can find the code for this example in the CH09\DRIVER directory on the disk that accompanies this book.

XXDRIVER.H

This version of the main header file builds on the ones seen in previous chapters. Only one structure from this file is of much interest.

DEVICE_EXTENSION

The following excerpt shows the changes in the Device Extension needed to support the parallel port.

    typedef struct _DEVICE_EXTENSION {
        ...
        ULONG   FifoSize;            // Bytes to send at once
        ULONG   BytesRequested;      // Requested transfer size
        ULONG   BytesRemaining;      // Chars left to transfer  ❶
        PUCHAR  pBuffer;             // Next char to send
        BOOLEAN TransferInProgress;  //                          ❷
        UCHAR   DeviceStatus;        // Most recent status      ❸
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

❶ These two fields are working copies of the requested transfer size and the system buffer pointer taken from the IRP. They are used to keep track of where we are in the transfer. Modifying the IRP itself would be a disaster because the I/O Manager uses it to clean up after the request.
❷ This flag is used to detect spurious interrupts. It's set at the beginning of a transfer and cleared when the request is completed.
❸ This field keeps track of the most recent status of the parallel port. The DpcForIsr routine uses it to figure out what kind of status to give back to the caller.

INIT.C

Most of the code in this module is the same as it was in Chapter 6. The changes have to do with some hardware-specific initialization.

XxCreateDevice

This excerpt shows the proper sequence of operations for enabling interrupts and initializing a piece of hardware.

    static NTSTATUS
    XxCreateDevice(
        IN PDRIVER_OBJECT DriverObject,
        IN PCONFIG_BLOCK pConfig,      // Config block
        IN ULONG uNum                  // Device number
        )
    {
        ...
        status = IoCreateSymbolicLink( &linkName, &deviceName );
        //
        // See if the symbolic link was created.
        //
        if( !NT_SUCCESS( status )) {
            IoDeleteDevice( pDevObj );
            return status;
        }
        ...
        //
        // Make sure device interrupts are OFF
        //
        XxWriteControl(                                  // ❶
            pDevExt,
            XX_CTL_DEFAULT | XX_CTL_NOT_INI );
        //
        // Connect to an Interrupt object...
        //
        status = IoConnectInterrupt(                     // ❷
                    &pDevExt->pInterrupt,
                    XxIsr,
                    pDevExt,
                    NULL,
                    pConfig->Device[uNum].SystemVector,
                    pConfig->Device[uNum].Dirql,
                    pConfig->Device[uNum].Dirql,
                    pConfig->Device[uNum].InterruptMode,
                    pConfig->Device[uNum].ShareVector,
                    pConfig->Device[uNum].Affinity,
                    pConfig->Device[uNum].FloatingSave );
        if( !NT_SUCCESS( status )) {
            IoDeleteSymbolicLink( &linkName );
            IoDeleteDevice( pDevObj );
            return status;
        }
        //
        // Initialize the hardware and enable interrupts
        //
        KeSynchronizeExecution(                          // ❸
            pDevExt->pInterrupt,
            XxInitDevice,
            pDevExt );
        return status;
    }

❶ It's important to put the device into a known state. This includes disabling interrupts from the port.
❷ The driver uses values recovered by XxGetHardwareInfo to attach its Interrupt Service Routine to the device's interrupt vector.
❸ Finally, the driver uses a Synch Critical Section routine to initialize the device, including turning on its interrupts. Keep in mind that the Interrupt Service Routine may actually get called as soon as the KeSynchronizeExecution function returns.

XxInitDevice

This function cycles the INIT line, causing the printer to start initializing itself. This will eventually produce an interrupt. The function then sets the SELECT line and enables interrupts from the port. This might result in an immediate interrupt. However, since this function is being called by KeSynchronizeExecution, it's not in any danger of being disturbed by parallel port interrupts.
    static BOOLEAN
    XxInitDevice(
        IN PVOID SynchContext
        )
    {
        PDEVICE_EXTENSION pDE =
            (PDEVICE_EXTENSION) SynchContext;

        XxWriteControl( pDE, XX_CTL_DEFAULT );           // ❶
        KeStallExecutionProcessor( 60 );

        XxWriteControl(                                  // ❷
            pDE,
            XX_CTL_DEFAULT |
            XX_CTL_NOT_INI |
            XX_CTL_SELECT  |
            XX_CTL_INTENB );
        KeStallExecutionProcessor( 60 );
        return TRUE;
    }

❶ Clear the NOT_INIT bit. This begins the printer's initialization cycle. The driver waits 60 microseconds to be sure the signal has stabilized.
❷ To complete the cycle, the driver sets the NOT_INIT bit. It also enables interrupts and tells the printer to select itself. Again, it's necessary to wait a little while the signals stabilize.

TRANSFER.C

The routines in this file do the actual work of transferring data out to the parallel port. This includes starting each operation, handling interrupts, and cleaning up with a DPC.

XxStartIo

This function does any preprocessing needed by the current IRP and then starts the actual device operation.

    VOID
    XxStartIo(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PDEVICE_EXTENSION pDE =
            DeviceObject->DeviceExtension;

        switch( IrpStack->MajorFunction ) {              // ❶
            //
            // Use a SynchCritSection routine to
            // start the write operation...
            //
            case IRP_MJ_WRITE:
                //
                // Set up counts and byte pointer        // ❷
                //
                pDE->BytesRequested =
                    IrpStack->Parameters.Write.Length;
                pDE->BytesRemaining = pDE->BytesRequested;
                pDE->pBuffer =
                    Irp->AssociatedIrp.SystemBuffer;

                if( !KeSynchronizeExecution(             // ❸
                        pDE->pInterrupt,
                        XxTransmitBytes,
                        pDE ))
                    XxDpcForIsr(
                        NULL, DeviceObject, Irp, pDE );
                break;

            default:                                     // ❹
                Irp->IoStatus.Status =
                    STATUS_NOT_SUPPORTED;
                Irp->IoStatus.Information = 0;
                IoCompleteRequest( Irp, IO_NO_INCREMENT );
                IoStartNextPacket( DeviceObject, FALSE );
                break;
        }
    }

❶ Since all requests get funneled through a single Start I/O routine, it's necessary to switch on the major-function code if you have to do any function-specific preprocessing.
❷ These are the private copies of the pointer and byte counts that the driver uses to keep track of its place in the system buffer.
❸ The driver tries to send some number of bytes out to the device. If anything goes wrong, it calls XxDpcForIsr as a regular subroutine to complete the request.
❹ The driver should never get to the default case, because unsupported functions have been filtered out by the I/O Manager during the dispatching process. But it's better to be safe than sorry.

XxTransmitBytes

This function sends as many bytes as possible to the parallel port. This will be either one FIFO's worth, or as many as are left in the system buffer. Both XxStartIo and XxIsr call this function. In either case, it expects to be running at DIRQL, synchronized with the driver's ISR.

    static BOOLEAN
    XxTransmitBytes(
        IN PVOID Context     // Pointer to the Device Extension
        )
    {
        PDEVICE_EXTENSION pDE =
            (PDEVICE_EXTENSION) Context;
        ULONG XferSize;
        UCHAR Control = XxReadControl( pDE );

        pDE->DeviceStatus = XxReadStatus( pDE );         // ❶

        if(( pDE->BytesRemaining == 0 )                  // ❷
            || !XX_OK( pDE->DeviceStatus )) {
            pDE->TransferInProgress = FALSE;
            return FALSE;
        }
        //
        // A transfer is happening. Calculate the number
        // of bytes to send in one bunch.
        //
        pDE->TransferInProgress = TRUE;
        if( pDE->BytesRemaining < pDE->FifoSize )        // ❸
            XferSize = pDE->BytesRemaining;
        else
            XferSize = pDE->FifoSize;
        //
        // Send as many bytes to the device as it
        // can handle. Each byte must be strobed
        // out.
        //
        while( XferSize > 0 ) {                          // ❹
            //
            // Make sure the STROBE bit is off
            //
            XxWriteControl(
                pDE, Control & ~XX_CTL_STROBE );
            //
            // Send a byte and hold it for at least
            // 500 nanoseconds
            //
            XxWriteData( pDE, *pDE->pBuffer );
            KeStallExecutionProcessor( 1 );
            //
            // Turn on the STROBE bit and hold it
            // for at least 500 nanoseconds
            //
            XxWriteControl(
                pDE, Control | XX_CTL_STROBE );
            KeStallExecutionProcessor( 1 );
            //
            // Turn off the STROBE line
            //
            XxWriteControl(
                pDE, Control & ~XX_CTL_STROBE );
            KeStallExecutionProcessor( 1 );
            //
            // Update pointer and counters
            //
            pDE->pBuffer++;
            XferSize--;
            pDE->BytesRemaining--;
        }
        return TRUE;
    }

❶ The XxDpcForIsr routine will use this status field to figure out what happened during the I/O processing cycle.
❷ If all the bytes have been sent, or there was a problem with the printer, just return FALSE and quit.
❸ Send either one FIFO's worth of data, or as many bytes as are left in the buffer, whichever is less.
❹ This loop sends out one bucketful of data to the port. The body of the loop incorporates the strobing protocol required for sending data to the parallel port.

XxIsr

The Kernel calls this function in response to a device interrupt. If XxIsr processes the interrupt, it returns TRUE; otherwise, FALSE. It runs at DIRQL, holding the Interrupt spin lock for this device.

    BOOLEAN
    XxIsr(
        IN PKINTERRUPT Interrupt,
        IN PVOID ServiceContext    // Ptr to Device Extension
        )
    {
        PDEVICE_EXTENSION pDE = ServiceContext;
        PDEVICE_OBJECT pDevice = pDE->DeviceObject;
        PIRP Irp = pDevice->CurrentIrp;
        UCHAR Status = XxReadStatus( pDE );

        if(( Status & XX_STS_NOT_IRQ ) != 0 )            // ❶
            return FALSE;

        if( pDE->TransferInProgress )                    // ❷
            if( !XxTransmitBytes( pDE ))
                IoRequestDpc(
                    pDevice, Irp, (PVOID) pDE );
        return TRUE;
    }

❶ Check the parallel port to see if it generated an interrupt. Not all parallel devices support this bit, but the ones that don't hold it at 0. If the device didn't request an interrupt, leave the ISR as soon as possible.
❷ The port interrupted. If there's no transfer in progress, just ignore the interrupt; otherwise try to send the next chunk of data. If XxTransmitBytes fails, it means either an error occurred, or there are no more bytes to send.

XxDpcForIsr

Once the data transfer finishes, this function performs any required cleanup operations. XxStartIo also calls this function if it needs to fail an IRP before starting a transfer. XxDpcForIsr runs at DISPATCH_LEVEL IRQL.

    VOID
    XxDpcForIsr(
        IN PKDPC Dpc,
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context     // Pointer to Device Extension
        )
    {
        PDEVICE_EXTENSION pDE = Context;

        Irp->IoStatus.Information =                      // ❶
            pDE->BytesRequested - pDE->BytesRemaining;
        //
        // Figure out what the final status
        // should be
        //
        if( XX_OK( pDE->DeviceStatus ))                  // ❷
            Irp->IoStatus.Status = STATUS_SUCCESS;
        else if( XX_POWERED_OFF( pDE->DeviceStatus ))
            Irp->IoStatus.Status =
                STATUS_DEVICE_POWERED_OFF;
        else if( XX_NOT_CONNECTED( pDE->DeviceStatus ))
            Irp->IoStatus.Status =
                STATUS_DEVICE_NOT_CONNECTED;
        else if( XX_OFF_LINE( pDE->DeviceStatus ))
            Irp->IoStatus.Status =
                STATUS_DEVICE_OFF_LINE;
        else if( XX_PAPER_EMPTY( pDE->DeviceStatus ))
            Irp->IoStatus.Status =
                STATUS_DEVICE_PAPER_EMPTY;
        else
            Irp->IoStatus.Status =
                STATUS_DEVICE_DATA_ERROR;
        //
        // If we're being called directly from Start I/O,
        // don't give the user any priority boost.
        //
        if( Dpc == NULL )                                // ❸
            IoCompleteRequest( Irp, IO_NO_INCREMENT );
        else
            IoCompleteRequest(
                Irp, IO_PARALLEL_INCREMENT );

        IoStartNextPacket( DeviceObject, FALSE );        // ❹
    }

❶ The Information field should contain the number of bytes actually transferred when the IRP goes back to the I/O Manager.
❷ This section of code uses several macros defined in HARDWARE.H to figure out what the final status should be.
❸ It's necessary to know whether this function is being called directly from XxStartIo or by the system DPC dispatcher. In the former case, the original thread gets no priority boost. The NULL DPC argument means XxDpcForIsr is being called from XxStartIo.
❹ Once the current IRP is completed, it's necessary to tell the I/O Manager to start the next one.

9.8 TESTING THE DATA TRANSFER ROUTINES

At this point, you've got a real driver to work with and you can do serious testing. Among other things, you can verify that the driver

• Sends IRPs from its Dispatch routines to its Start I/O routine
• Responds to device interrupts
• Transfers data successfully
• Completes requests
• Manages requests from multiple callers

Testing Procedure

The following procedure will let you check all the code paths through your driver's data transfer routines.

1. Write a minimal Start I/O routine that simply completes each IRP as soon as it arrives. This will allow you to test the linkage between the driver's Dispatch and Start I/O routines.
2. Write the real Start I/O routine, the ISR, and the DpcForIsr routine. If the driver supports both read and write operations, implement and test each path separately.
3. Exercise all the data transfer paths through the driver with a simple Win32 program that makes ReadFile, WriteFile, and DeviceIoControl calls.
4. Stress test the driver with a program that generates large numbers of I/O requests as quickly as possible. Run this test on a busy system.
5. If your device is shareable, run several copies of the test program at once to be sure the driver works with multiple open handles.
6. If your driver supports multiple physical devices, repeat the tests with each device unit.
7. If possible, repeat steps 4-6 on a multiprocessor system to verify SMP synchronization.

9.9 SUMMARY

At this point, it looks as if you have all the components of a working driver. Its Start I/O routine is setting up each request, its ISR is servicing interrupts, and its DpcForIsr is handling all the details of I/O postprocessing. What more could you want?

Unfortunately, the little parallel port driver we built in this chapter isn't ready for prime-time distribution. In particular, it doesn't handle device timeouts, so if an interrupt never arrives, the request will simply lock up. In the next chapter, you'll see how to remedy this situation.

CHAPTER 10

Timers

It's a sad fact, but true: Hardware is perverse stuff that doesn't necessarily behave the way it should. For example, error conditions may prevent a device from generating an interrupt when you're expecting one. Even worse, some devices don't even use interrupts to signal interesting state changes. Handling these situations often requires some kind of timer or polling mechanism, and that's just what we're going to look at in this chapter.

10.1 HANDLING DEVICE TIMEOUTS

Your driver should never assume that an expected device interrupt will arrive. The device might be offline, it might be waiting for some kind of operator intervention, or perhaps it's just broken. This section explains how to use I/O Timer routines to detect unresponsive devices.

How I/O Timer Routines Work

An I/O Timer routine is an optional piece of driver code that your DriverEntry routine attaches to a specific Device object. After you start the Device object's timer, the I/O Manager begins calling the I/O Timer routine once every second. These calls continue until you stop the timer.
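One common use for a once-per-second callback like this is a countdown that other parts of the driver arm and disarm. The following user-mode sketch models only that bookkeeping, under stated assumptions: toy_timeout and the toy_* functions are hypothetical names rather than DDK routines, a value of -1 is assumed to mean "idle", and the sketch ignores the synchronization a real driver needs when several routines share the counter.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TIMER_IDLE (-1L)     /* assumed "do nothing" value */

/* Hypothetical per-device timeout state. */
struct toy_timeout {
    long time_remaining;         /* seconds left, or TOY_TIMER_IDLE */
    bool timed_out;              /* set when the countdown expires  */
};

/* Arm the countdown, e.g. when a device operation is started. */
static void toy_arm(struct toy_timeout *t, long seconds)
{
    assert(seconds > 0);
    t->time_remaining = seconds;
    t->timed_out = false;
}

/* Disarm the countdown, e.g. when the expected interrupt arrives. */
static void toy_disarm(struct toy_timeout *t)
{
    t->time_remaining = TOY_TIMER_IDLE;
}

/* Model of the callback that runs once every second. */
static void toy_timer_tick(struct toy_timeout *t)
{
    if (t->time_remaining == TOY_TIMER_IDLE)
        return;                              /* nothing to watch */

    if (--t->time_remaining == 0) {
        t->time_remaining = TOY_TIMER_IDLE;  /* stop counting */
        t->timed_out = true;                 /* caller handles the timeout */
    }
}
```

Using a sentinel value keeps the callback cheap: while the device is idle, each tick is a single comparison.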
Table 10.1 lists the functions available for working with 1/0 timers. Table 10.2 shows the prototype for the 1/0 Timer routine itself. When it exe cutes, it receives a pointer to the associated Device object and whatever context 203 Chapter 10 204 Timers Using 1/0 timers Table 1 0.1 How to use an 1/0 Timer routine I F you want to ... THEN cal l ... I RQL Attach a timer routine to a device Start a device's timer Stop a device's timer IolnitialzeTimer IoStartTimer IoStopTimer PASSIVE_LEVEL :::;; DISPATCH_LEVEL :::;; DISPATCH_LEVEL Table 1 0.2 Function prototype of an 1/0 Timer routine VOID XxloTimer IRQL Parameter Description IN PDEVICE_OBJECT DeviceObject IN PVOID Context Return value Device object whose timer just fired Context passed to IolnitializeTimer == DISPATCH_LEVEL information you passed to IolnitializeTimer. As always, the address of the Device Extension is a good choice for context. How to Catch Device Timeout Conditions In general terms, a driver that wants to catch device timeouts should do the following: 1. Its DriverEntry routine calls IolnitializeTimer t o associate a n I/OTimer rou tine with a specific device. 2. When a user-mode program attaches a handle to the device by calling Create File, the Dispatch routine for IRP_MJ_CREATE calls IoStartTimer. As long as this handle is open, the device will receive I/O Timer calls. This same Dis patch routine also sets a timeout counter in the Device Extension to -1 a "do nothing" value. - 3. When the Start I/O routine starts the device, it also sets the timeout counter to the maximum number of seconds the driver is willing to wait for an interrupt. 4. The ISR will do one of two things when an interrupt arrives. If there's more data, it resets the timeout counter to its maximum value and transfers the next piece of data. Otherwise, it sets the timeout counter to -1 and issues a DPC request to complete the IRP. 5. Meanwhile the system is calling the driver 's I/O Timer routine once every second. 
When it executes, the I/O Timer routine checks the timeout counter. A value of -1 means "ignore the I/O Timer call." A positive value causes the I/O Timer routine to decrement the device's timeout counter. If the counter reaches zero before an interrupt arrives, the I/O Timer routine stops the device, sets the timeout counter to -1, and processes the request as a timed-out operation.
6. When the user-mode program calls CloseHandle, the Dispatch routine for IRP_MJ_CLOSE calls IoStopTimer and disables I/O Timer callbacks for the device.

Notice that the Start I/O and I/O Timer routines (running at DISPATCH_LEVEL IRQL) and the ISR (running at DIRQL) all have access to the timeout counter in the Device Extension. This can lead to problems unless these driver routines synchronize themselves. The code example that appears later in this chapter shows how to do this properly.

It's also worth pointing out that not all drivers use their Dispatch routines to start and stop the I/O Timer calls. Some drivers just start a device's I/O Timer in DriverEntry and stop it in the Unload routine. While the driver is loaded, it simply ignores I/O Timer callbacks whenever the timeout counter is set to -1. The only disadvantage of this scheme is that it incurs some system overhead even when the device isn't being used.

Your driver has a number of options for processing a request that has timed out. Some of the common things drivers do include:

• Retrying the device operation some fixed number of times before failing the IRP that generated it.
• Failing the IRP by calling IoCompleteRequest with an appropriate final status value.[1]
• Logging a timeout error for the device in the system event log. This can help system administrators to track down flaky hardware.

[1] Watch out if you're tempted to use STATUS_IO_TIMEOUT as the final status for a timed-out IRP. Unfortunately, this status code maps onto ERROR_SEM_TIMEOUT in Win32. The message for this code ("The semaphore timeout period has expired.") may be a little confusing to users of your driver, so it's usually best to find some other NT status code.

10.2 CODE EXAMPLE: CATCHING DEVICE TIMEOUTS

This example shows how to add timeout support to the simple parallel port driver developed in the previous chapter. You can find the code for this example in the CH10\TIME-OUT\DRIVER directory on the disk that accompanies this book.

XXDRIVER.H

This version of the main header file builds on the ones seen in previous chapters. Only one structure from this file is of much interest.

DEVICE_EXTENSION

The following excerpt shows the changes in the Device Extension needed to catch parallel port timeout errors.

    typedef struct _DEVICE_EXTENSION {
        PUCHAR pBuffer;          // Working buffer pointer
        LONG   TimeRemaining;    // Seconds until timeout (1)
        UCHAR  DeviceStatus;     // Most recent status byte
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

(1) This counter keeps track of the number of seconds remaining until the driver declares a timeout condition. If it's set to -1, I/O Timer callbacks are ignored. Anyone accessing this variable needs to be synchronized with the ISR.

INIT.C

Here's an excerpt from the driver initialization code. Only a few changes are necessary to prepare for I/O Timer support.

XxCreateDevice

In this modified version of the function that creates Device objects, notice the addition of code to set up the I/O timer.
    static NTSTATUS
    XxCreateDevice(
        IN PDRIVER_OBJECT DriverObject,
        IN PCONFIG_BLOCK pConfig,       // Config block
        IN ULONG uNum                   // Device number
        )
    {
        //
        // Initialize the device extension structure
        //
        pDevExt = pDevObj->DeviceExtension;
        pDevExt->DeviceObject = pDevObj;
        pDevExt->NtDeviceNumber = uNum;
        pDevExt->FifoSize = XX_FIFO_SIZE;
        pDevExt->TimeRemaining = -1;                        // (1)

        //
        // Prepare the device's DPC object for later use
        //
        IoInitializeDpcRequest( pDevObj, XxDpcForIsr );

        //
        // Initialize the device's timeout clock
        //
        IoInitializeTimer( pDevObj, XxIoTimer, pDevExt );   // (2)
    }

(1) Set the initial value of the timeout counter to its "do nothing" state. There's no need to synchronize here because the driver's ISR hasn't been activated yet with a call to IoConnectInterrupt.
(2) Associate the Device object with the driver's I/O Timer routine. Each time XxIoTimer is called, pass it a pointer to the Device Extension.

TRANSFER.C

Most of the changes in these versions of the data transfer routines involve checking and setting the state of the timeout counter.

XxTransmitBytes

For proper synchronization, this function expects to be holding the Interrupt spin lock when it runs. This means it must be called either from XxIsr or as a Synch Critical Section routine.

    static BOOLEAN
    XxTransmitBytes( IN PVOID Context )
    {
        PDEVICE_EXTENSION pDE = (PDEVICE_EXTENSION) Context;
        UCHAR Control;
        ULONG i;

        pDE->DeviceStatus = XxReadStatus( pDE );

        //
        // If all the bytes have been sent or the device
        // is unhappy, inhibit any further processing of
        // this request and just quit.
        //
        if( ( pDE->BytesRemaining == 0 ) ||
            !XX_OK( pDE->DeviceStatus ) )
        {
            pDE->TimeRemaining = -1;                        // (1)
            return FALSE;
        }

        //
        // Send as many bytes to the device as it
        // can handle. Each byte must be strobed
        // out.
        //
        Control = XxReadControl( pDE );

        for( i = 0; i < pDE->XferSize; i++ )                // (2)
        {
            //
            // Make sure the STROBE line is off
            //
            XxWriteControl( pDE, Control & ~XX_CTL_STROBE );

            //
            // Update pointer and counters
            //
            pDE->pBuffer++;
        }

        //
        // Start the timeout clock and wait
        // for an interrupt
        //
        pDE->TimeRemaining = XX_TIMEOUT_VALUE;              // (3)
        return TRUE;
    }

(1) If the device is unhappy or there are no more bytes to transfer, this is the end of the request. Disable the timeout counter.
(2) There's no danger of the timeout routine failing the IRP during the data transfer loop. This is because the I/O Timer routine won't access the timeout counter variable until it acquires the Interrupt spin lock.
(3) Now that more data has been sent, reset the timeout counter and wait for the next interrupt to arrive.

XxIsr

This function responds to interrupts from the parallel port. It differs from the previous version in that it uses the timeout counter variable to determine if a transfer is currently in progress.

    BOOLEAN
    XxIsr(
        IN PKINTERRUPT Interrupt,
        IN PVOID ServiceContext
        )
    {
        PDEVICE_EXTENSION pDE = ServiceContext;
        PDEVICE_OBJECT pDevice = pDE->DeviceObject;
        UCHAR Status = XxReadStatus( pDE );

        //
        // See if this device requested an interrupt
        //
        if( ( Status & XX_STS_NOT_IRQ ) != 0 )
            return FALSE;

        if( pDE->TimeRemaining == -1 )                      // (1)
            return TRUE;

        //
        // Otherwise, try to send the next bunch of
        // bytes. If XxTransmitBytes fails, it means
        // either an error occurred or there's no more
        // data to send.
        //
        pDE->BytesRemaining -= pDE->XferSize;

        if( !XxTransmitBytes( pDE ) )                       // (2)
        {
            IoRequestDpc(
                pDevice,
                pDevice->CurrentIrp,
                (PVOID) pDE );
        }
        return TRUE;
    }

(1) If the timeout clock is -1, either there's no transfer in progress, or the device has already timed out. In either case, there's nothing to be done here.
(2) After the return from XxTransmitBytes, the timeout counter has either been set to its maximum value (if the next piece of data has been sent), or -1 (if there was no more data to send or the device had an error).

TIMER.C

Here are the routines that actually process the timer events. In this particular driver, the Dispatch routine for IRP_MJ_CREATE starts the device's timer, and the Dispatch routine for IRP_MJ_CLOSE stops it. While the timer is running, the contents of the timeout counter variable determine the behavior of the I/O Timer routines.

XxIoTimer

As long as the I/O Timer for a device is running, the system will call this routine once every second.

    VOID
    XxIoTimer(
        IN PDEVICE_OBJECT DeviceObject,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION pDE = Context;

        if( pDE->TimeRemaining == -1 )                      // (1)
            return;
        else if( !KeSynchronizeExecution(                   // (2)
                        pDE->pInterrupt,
                        XxProcessTimerEvent,
                        pDE ) )
        {
            //
            // Call the DPC routine to figure out a
            // final status and complete the IRP.
            //
            XxDpcForIsr(                                    // (3)
                NULL,
                DeviceObject,
                DeviceObject->CurrentIrp,
                pDE );
        }
    }

(1) Do a quick check of the timeout counter. Either there's no data transfer in progress, or an expected interrupt has arrived. Making this quick check at DISPATCH_LEVEL avoids needless trips up to DIRQL.
(2) The timeout counter appears to contain some value other than -1. To process the timer event safely, synchronize with XxIsr using a Synch Critical Section routine.
(3) The Synch Critical Section routine returns FALSE if the current IRP has timed out. In this case, we just fail the IRP.
Other options might include retrying the operation a fixed number of times, logging an error, and so forth.

XxProcessTimerEvent

This function does the real work of processing timer events. It runs as a Synch Critical Section routine because it has to synchronize its access to the timeout counter with XxIsr.

    static BOOLEAN
    XxProcessTimerEvent( IN PDEVICE_EXTENSION pDE )
    {
        if( pDE->TimeRemaining == -1 )                      // (1)
            return TRUE;

        //
        // Decrement and test the timer.
        //
        if( --pDE->TimeRemaining > 0 )                      // (2)
            return TRUE;

        //
        // A timeout has occurred. Prevent further
        // processing of this request
        //
        pDE->TimeRemaining = -1;                            // (3)
        pDE->DeviceStatus = XxReadStatus( pDE );
        return FALSE;
    }

(1) It's necessary to test the timeout counter again because the ISR may have changed it while we were waiting for the Interrupt spin lock. If that's the case, do nothing.
(2) The timeout counter contains something other than -1. In this case, decrement the count. If it's still above 0, the IRP hasn't timed out yet.
(3) The counter hit 0, so the IRP has timed out. Setting the timeout counter to -1 blocks further processing of this request by the ISR (should an interrupt just happen to arrive). Returning FALSE will cause the IRP to be completed by XxIoTimer.

10.3 MANAGING DEVICES WITHOUT INTERRUPTS

Some devices don't generate interrupts every time they make a significant state change. Legacy ISA devices can be especially bad about this kind of thing. This section presents alternative ways of working with noninterrupting devices.

Working with Noninterrupting Devices

Under operating systems like MS-DOS, a driver managing a noninterrupting device could simply poll the device or busy-wait until it has changed state. However, this kind of behavior would cause serious performance problems for NT.
Instead, NT drivers can use one of the following techniques for suspending their execution during a repeated polling operation:

• Driver routines running at PASSIVE_LEVEL IRQL can call KeDelayExecutionThread to introduce a time delay. This method can only be used by the driver's initialization and cleanup code, or any Kernel-mode threads the driver has created.
• If you occasionally have to delay execution for intervals less than about 50 microseconds, you can call KeStallExecutionProcessor. This is better than busy-waiting,[2] because the delay interval doesn't depend on a specific CPU architecture.
• If parts of your driver running at DISPATCH_LEVEL IRQL need to introduce a time delay, you can use a CustomTimerDpc routine.

If your device needs to be polled repeatedly and the delay interval between each polling operation is over 50 microseconds, base your driver design on the use of system threads (discussed in Chapter 14).

How CustomTimerDpc Routines Work

A CustomTimerDpc routine is just a DPC routine that you associate with a Kernel Timer object. You get the CustomTimerDpc routine to run by setting the Timer's timeout value. When it expires, the Kernel automatically queues your DPC routine for execution. Eventually, the Kernel's DPC dispatcher pulls your request from the queue and executes the CustomTimerDpc routine. Keep in mind that, depending on system activity, there could be some delay between the moment the Timer object expires and the actual execution of the DPC routine.

In earlier versions of Windows NT, a CustomTimerDpc routine would fire only once. If you wanted one of these routines to execute repeatedly, you had to manually reset the Timer object each time it fired. With NT 4.0, you have the option of specifying a repetition interval when you set the Timer object's initial timeout value.
Each time it fires, the Timer object will automatically reset itself to fire again when the repetition interval has elapsed.[3]

Like all other DPC routines, a CustomTimerDpc runs at DISPATCH_LEVEL IRQL. Table 10.3 shows the prototype for one of these routines. Notice that a CustomTimerDpc routine always receives two junk arguments from the system. The contents of these two system arguments are undefined, so don't use them.[4] With CustomTimerDpc routines, you're limited to just a single context argument that is permanently associated with the DPC object.

[2] Don't use KeStallExecutionProcessor too often. It essentially freezes the CPU on which it's called at whatever IRQL level it's called from.
[3] If you need to implement a repeating CustomTimerDpc routine, it's generally a good idea to use the Timer object's automatic repetition feature rather than resetting the Timer yourself each time it fires. It's more efficient because your driver won't be making so many calls to Kernel support routines. It also guarantees that there won't be any skewing of the timeout interval.

Table 10.3  Function prototype of a CustomTimerDpc routine

VOID XxCustomTimerDpc                IRQL == DISPATCH_LEVEL
Parameter                            Description
IN PKDPC Dpc                         DPC object generating the request
IN PVOID Context                     Context passed to KeInitializeDpc
IN PVOID SystemArg1                  (Not used - contents unspecified)
IN PVOID SystemArg2                  (Not used - contents unspecified)
Return value                         (none)

It's worth comparing CustomTimerDpc routines with the I/O Timers you saw in the first part of this chapter. Although both mechanisms operate with time, they differ in several significant ways. In particular:

• Unlike I/O Timer routines, a CustomTimerDpc is not associated with any particular Device object. You can have as many or as few of them as you like.
• The minimum resolution of an I/O Timer is one second; you specify the expiration time of a CustomTimerDpc in units of 100 nanoseconds.
• The I/O Timer always uses a one-second interval. You can specify different expiration intervals each time you start a CustomTimerDpc.
• The storage for an I/O Timer object is automatically part of the Device object. You need to declare nonpaged storage for both a KDPC and a KTIMER object if you want to use a CustomTimerDpc.

[4] Regular CustomDpc routines (not associated with a Timer object) can make use of these arguments. The discussion of CustomDpc routines in the next chapter shows how to use them.

How to Set Up a CustomTimerDpc Routine

Working with CustomTimerDpc routines is very straightforward. Your driver simply needs to follow these steps:

1. Allocate nonpaged storage (usually in a Device or Controller Extension) for both a KDPC and a KTIMER object.
2. DriverEntry calls KeInitializeDpc to associate a DPC routine and a context item with the DPC object. This context item will be passed to your CustomTimerDpc routine when it fires. The address of the Device or Controller Extension is a good choice for the context item.
3. DriverEntry also calls KeInitializeTimer just once to set up the Timer object.
4. To start a one-shot Timer, call KeSetTimer; to set up a repeating Timer, use KeSetTimerEx instead. If you call these functions using a Timer object that is currently active, the previous request is canceled and the new expiration time replaces the old one.

If you want to keep a Timer from firing, call KeCancelTimer before the Timer object expires. This also cancels a repeating Timer. If you need to find out whether a Timer has already expired, call KeReadStateTimer.

You must be executing at PASSIVE_LEVEL IRQL when you initialize the DPC and Timer object. To set, cancel, or read the state of a Timer, you must be running at or below DISPATCH_LEVEL IRQL. In general, you should avoid calling KeInsertQueueDpc with a DPC object being used for a CustomTimerDpc routine. This can lead to race conditions in your driver.
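
If the rules for re-arming and canceling a Timer seem abstract, the following sketch may help. It is a user-mode C model, not kernel code: ModelTimer, ModelSetTimer, ModelCancelTimer, and ModelAdvance are hypothetical names made up for this illustration, and time is just a counter in arbitrary units. The model assumes the behavior described above (setting an active Timer replaces its expiration time, and canceling also stops a repeating Timer) and mirrors the documented convention that KeSetTimer and KeCancelTimer return TRUE when the Timer was already queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode stand-in for a Kernel Timer object. */
typedef struct {
    bool      queued;   /* true while an expiration is pending       */
    long long due;      /* model time of the next expiration         */
    long long period;   /* 0 = one-shot, otherwise repeat interval   */
    int       fired;    /* how often the "DPC" would have been queued */
} ModelTimer;

/* Models KeSetTimer/KeSetTimerEx: arm (or re-arm) the timer.
   Returns true if the timer was already active, in which case the
   old expiration time is simply replaced. */
static bool ModelSetTimer(ModelTimer *t, long long now,
                          long long delay, long long period)
{
    bool wasQueued;
    assert(t != NULL && delay >= 0 && period >= 0);
    wasQueued = t->queued;
    t->queued = true;
    t->due    = now + delay;
    t->period = period;
    return wasQueued;
}

/* Models KeCancelTimer: returns true if an expiration was pending. */
static bool ModelCancelTimer(ModelTimer *t)
{
    bool wasQueued;
    assert(t != NULL);
    wasQueued = t->queued;
    t->queued = false;
    return wasQueued;
}

/* Advance model time to 'now', firing the timer each time it is due. */
static void ModelAdvance(ModelTimer *t, long long now)
{
    assert(t != NULL);
    while (t->queued && t->due <= now) {
        t->fired++;
        if (t->period > 0)
            t->due += t->period;   /* repeating: schedule next expiration */
        else
            t->queued = false;     /* one-shot: done */
    }
}
```

In this model, arming a one-shot timer at time 0 for 100 units and then re-arming it at time 50 for another 100 units moves the expiration to 150, so nothing fires at time 120. A repeating timer keeps firing once per period until ModelCancelTimer is called.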
How to Specify Expiration Times

Internally, NT maintains the current system time by counting the number of 100-nanosecond intervals since January 1, 1601. This is a very big number, so NT defines a 64-bit data type called a LARGE_INTEGER to hold it. Table 10.4 lists the functions drivers can use to work with time values.

Table 10.4  Functions that operate on system time values

Time functions
Function                             Description
KeQuerySystemTime                    Return 64-bit absolute system time
RtlTimeToTimeFields                  Break 64-bit time into date and time fields
RtlTimeFieldsToTime                  Convert date and time into 64-bit system time
KeQueryTickCount                     Return number of clock interrupts since boot
KeQueryTimeIncrement                 Return number of 100-nanosecond units added to system time for each clock interrupt
RtlConvertLongToLargeInteger         Create a signed LARGE_INTEGER
RtlConvertUlongToLargeInteger        Create a positive LARGE_INTEGER
RtlLargeIntegerXxx                   Perform various arithmetic and logical operations on LARGE_INTEGERs

Note: Callers of these functions can be running at any IRQL level.

When you call KeSetTimer to start the clock ticking on a Timer object, you can specify the expiration time in one of two ways:

• A positive LARGE_INTEGER value represents an absolute system time at which you want the Timer to expire. Absolute times correspond to some exact moment in the future, like "February 23, 2051 at 6:45 PM."
• A negative LARGE_INTEGER value represents the length of an interval measured from the current moment, like "10 seconds from now." This is the form you're most likely to use.

This fragment of code shows how to set a Timer object to expire after an interval of 75 microseconds. It assumes that pDE holds a pointer to a Device Extension, and that the Extension contains initialized Timer and DPC objects.
    LARGE_INTEGER DueTime;

    DueTime = RtlConvertLongToLargeInteger( -75 * 10 );
    KeSetTimer( &pDE->Timer, DueTime, &pDE->DPC );

Since the number is negative, the system will interpret it as a relative time value. Scaling the number by ten is necessary because the basic unit of system time is 100 nanoseconds (or 0.1 microseconds).

Other Uses for CustomTimerDpc Routines

In the next section, you'll see an example of a driver that performs data transfers using a CustomTimerDpc instead of device interrupts. It's worth pointing out that, in some situations, you might want to use this kind of technique even with devices that do generate interrupts. This could be helpful if your device generates so many interrupts that it overwhelms the Kernel's interrupt dispatcher and degrades system performance.

The sample parallel port driver that comes with the NT DDK is one example of a driver that uses this technique. This driver monitors the arrival rate of interrupts for its device. When a flood of interrupts threatens to drown the system, the driver intentionally disables parallel port interrupts and uses a CustomTimerDpc to send data to the device. Depending on the device you're working with, this kind of adaptive behavior might be something you want to consider.

10.4 CODE EXAMPLE: A TIMER-BASED DRIVER

This modified version of the parallel port driver disables interrupts and uses a CustomTimerDpc routine to transfer data at fixed intervals. You can find the code for this example in the CH10\POLLING\DRIVER directory on the disk that accompanies this book.

XXDRIVER.H

This version of the main header file builds on the ones seen in previous chapters. Only one structure from this file is of much interest.

DEVICE_EXTENSION

The following excerpt shows the changes in the Device Extension needed to support polling.
    typedef struct _DEVICE_EXTENSION {
        PDEVICE_OBJECT DeviceObject;    // Back pointer
        ULONG NtDeviceNumber;           // Zero-based device num
        PUCHAR PortBase;                // First control register (1)

        KDPC PollingDpc;                // Components of the (2)
        KTIMER PollingTimer;            // polling mechanism
        LARGE_INTEGER PollingInterval;  // (3)

        ULONG FifoSize;                 // Bytes to send at once (4)
        ULONG BytesRequested;           // Requested transfer size
        ULONG BytesRemaining;           // Chars left to transfer
        PUCHAR pBuffer;                 // Next char to send
        UCHAR DeviceStatus;             // Most recent status
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

(1) While we need to have access to the device's control registers, we're not keeping a pointer to an Interrupt object in this driver. All interrupts from this device will be turned off.
(2) The Dpc and Timer objects together will activate the CustomTimerDpc routine.
(3) The PollingInterval field holds the expiration interval for the polling timer. For convenience in this driver, we'll keep the value in microseconds rather than tenths of microseconds.
(4) The rest of the structure is the same as the interrupt-driven version.

INIT.C

Here is a tiny excerpt from the driver's initialization code. The rest of it is the same boilerplate we've been looking at for several chapters. Not shown (but equally important) is the hardware initialization code that disables interrupts from the parallel port.

XxCreateDevice

This function sets up the Device object. It differs from the interrupt-driven version in that it never calls IoConnectInterrupt, and it has to initialize the polling timer.

    static NTSTATUS
    XxCreateDevice(
        IN PDRIVER_OBJECT DriverObject,
        IN PCONFIG_BLOCK pConfig,       // Config block
        IN ULONG uNum                   // Device number
        )
    {
        //
        // Copy things from Config block
        //
        pDevExt->PortBase = pConfig->Device[ uNum ].PortBase;

        //
        // Calculate the polling interval
        //
        pDevExt->PollingInterval =
            RtlConvertLongToLargeInteger(
                XX_POLLING_INTERVAL * -10 );                // (1)

        //
        // Prepare the polling timer and its DPC object
        //
        KeInitializeTimer( &pDevExt->PollingTimer );        // (2)

        KeInitializeDpc(                                    // (3)
            &pDevExt->PollingDpc,
            XxPollingTimerDpc,
            (PVOID) pDevObj );

        //
        // Form the Win32 symbolic link name.
        //
    }

(1) We use an RtlXxx convenience function to create the polling interval. Since the number is negative, the timeout will be measured relative to the moment the Timer object is started. Multiplying the value by ten allows us to specify XX_POLLING_INTERVAL in microseconds.
(2) Get the Kernel to turn the blob of memory into a Timer object.
(3) Attach the CustomTimerDpc routine to the DPC object. Pass a pointer to the Device object each time the CustomTimerDpc is called.

TRANSFER.C

Since this driver uses polling rather than interrupts to send data, you won't find any Interrupt Service routine here.

XxStartIo

This function is called to begin the processing of each IRP. It looks very much like the interrupt-driven version.

    VOID
    XxStartIo(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PDEVICE_EXTENSION pDE =
            DeviceObject->DeviceExtension;

        switch( IrpStack->MajorFunction ) {

            case IRP_MJ_WRITE:                              // (1)
                pDE->BytesRequested =
                    IrpStack->Parameters.Write.Length;
                pDE->BytesRemaining = pDE->BytesRequested;
                pDE->pBuffer = Irp->AssociatedIrp.SystemBuffer;

                if( !XxTransmitBytes( pDE ) )               // (2)
                {
                    XxFinishCurrentRequest(
                        DeviceObject, pDE, Irp,
                        IO_NO_INCREMENT );
                }
                break;

            //
            // Should never get here -- just get rid
            // of the packet...
            //
            default:
                Irp->IoStatus.Status = STATUS_NOT_SUPPORTED;
                Irp->IoStatus.Information = 0;
                IoCompleteRequest( Irp, IO_NO_INCREMENT );
                IoStartNextPacket( DeviceObject, FALSE );
                break;
        }
    }

(1) If this turns out to be an IRP_MJ_WRITE request, just set up the necessary counters and pointers, and try to send the first bunch of bytes to the device.
(2) Notice that XxTransmitBytes is being called directly from DISPATCH_LEVEL IRQL. There's no need to synchronize it using a Synch Critical Section routine because there is no interrupt activity from the device.

XxTransmitBytes

This routine sends a fixed number of bytes out to the parallel port. If the device has a personal problem or there are no more bytes left in the buffer, it returns FALSE.

    static BOOLEAN
    XxTransmitBytes( IN PDEVICE_EXTENSION pDE )
    {
        ULONG XferSize;
        UCHAR Control = XxReadControl( pDE );

        pDE->DeviceStatus = XxReadStatus( pDE );

        //
        // If all the bytes have been sent or the
        // device is unhappy, just quit
        //
        if( ( pDE->BytesRemaining == 0 ) ||
            !XX_OK( pDE->DeviceStatus ) )
        {
            return FALSE;
        }

        //
        // Calculate the number of bytes to
        // send in one bunch.
        //
        if( pDE->BytesRemaining < pDE->FifoSize )
            XferSize = pDE->BytesRemaining;
        else
            XferSize = pDE->FifoSize;

        while( XferSize > 0 )                               // (1)
        {
            //
            // Make sure the STROBE line is off
            //
            XxWriteControl( pDE, Control & ~XX_CTL_STROBE );

            //
            // Update pointer and counters
            //
            pDE->pBuffer++;
            XferSize--;
            pDE->BytesRemaining--;
        }

        //
        // Start the polling timer
        //
        KeSetTimer(                                         // (2)
            &pDE->PollingTimer,
            pDE->PollingInterval,
            &pDE->PollingDpc );

        return TRUE;
    }

(1) Send as many bytes to the device as it can handle. Since this is a parallel port device, each byte has to be strobed out.
(2) Start the polling Timer object. When the Timer expires, the associated DPC routine will be queued automatically.
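
To see how the chunking and the interval arithmetic play out, here is a minimal user-mode sketch of the polling scheme. It is only a model under stated assumptions: PollModel, RelativeDueTimeFromMicroseconds, ModelTransmitChunk, and ModelCountPolls are hypothetical names invented for this illustration, there is no real device behind them, and the device status check and the strobing of individual bytes are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the polling-related Device Extension fields. */
typedef struct {
    size_t    fifoSize;        /* bytes the device accepts per poll  */
    size_t    bytesRemaining;  /* bytes still waiting to be sent     */
    long long dueTime;         /* relative due time in 100-ns units  */
} PollModel;

/* Convert microseconds to a relative due time. A negative value means
   "measured from now", and one microsecond is ten 100-ns units. */
static long long RelativeDueTimeFromMicroseconds(long microseconds)
{
    assert(microseconds > 0);
    return -10LL * microseconds;
}

/* Models one call to XxTransmitBytes. Returns 0 (FALSE) when there is
   nothing left to send; otherwise "sends" one chunk and returns 1 (TRUE),
   the point at which the real driver would restart the polling timer. */
static int ModelTransmitChunk(PollModel *m, size_t *sent)
{
    size_t xfer;
    assert(m != NULL && sent != NULL && m->fifoSize > 0);
    *sent = 0;
    if (m->bytesRemaining == 0)
        return 0;
    /* Send a full FIFO's worth, or whatever is left if that is less. */
    xfer = (m->bytesRemaining < m->fifoSize) ? m->bytesRemaining
                                             : m->fifoSize;
    m->bytesRemaining -= xfer;
    *sent = xfer;
    return 1;
}

/* Counts how many successful transmit calls it takes to drain a request. */
static int ModelCountPolls(PollModel *m)
{
    size_t sent;
    int polls = 0;
    while (ModelTransmitChunk(m, &sent))
        polls++;
    return polls;
}
```

With a 4-byte FIFO and a 10-byte request, the model sends chunks of 4, 4, and 2 bytes, and the next call reports that the request is finished. A 75-microsecond polling interval becomes a due time of -750.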
XxPollingTimerDpc

This function runs each time the polling timer expires. It replaces both the ISR and the DpcForIsr routines in the interrupt-driven version of this driver.

    VOID
    XxPollingTimerDpc(
        IN PKDPC Dpc,
        IN PVOID Context,
        IN PVOID SystemArgument1,                           // (1)
        IN PVOID SystemArgument2
        )
    {
        PDEVICE_OBJECT DeviceObject = Context;

        if( !XxTransmitBytes(                               // (2)
                DeviceObject->DeviceExtension ) )
        {
            XxFinishCurrentRequest(                         // (3)
                DeviceObject,
                DeviceObject->DeviceExtension,
                DeviceObject->CurrentIrp,
                IO_PARALLEL_INCREMENT );
        }
    }

(1) Remember that the contents of the two system arguments are undefined in a CustomTimerDpc routine.
(2) Try to send the next bunch of bytes. If XxTransmitBytes fails, it means either an error occurred, or there is no more data to send. If it succeeds, it restarts the polling timer, which will eventually result in another call to XxPollingTimerDpc.
(3) Call XxFinishCurrentRequest to come up with an appropriate status code and complete the IRP. Again, notice that everything is happening at DISPATCH_LEVEL IRQL. XxFinishCurrentRequest runs in response to a regular function call, not a DPC request.

10.5 SUMMARY

This chapter has presented two different aspects of using time in your driver. Handling device timeouts is something that will always be important, while the use of CustomTimerDpc routines may only be useful for certain kinds of devices. One important use of CustomTimerDpc routines is to implement various polling strategies.

You now have enough tools to write reasonable drivers for many simple pieces of hardware. In the next chapter, we'll look at some additional techniques for managing full-duplex devices and devices that generate asynchronous events.

CHAPTER 11
Full-Duplex Drivers

The driver model described in the last few chapters has one significant limitation: It allows you to process only a single IRP at a time per Device object.
While this is fine for many situations, it doesn't cut it if your driver has to perform both input and output operations simultaneously.

This chapter presents a modified driver architecture that lifts the single-IRP restriction. To implement this architecture, it uses several new techniques (like CustomDpc and Cancel routines) that can be helpful in any kind of driver. At the end of the chapter, sample code for a tiny serial port driver will tie all the loose ends together.

11.1 DOING TWO THINGS AT ONCE

Just what is it about the standard driver architecture that prevents a single Device object from processing two IRPs at once? The problem becomes clear if you consider what happens when a Dispatch routine sends an IRP to IoStartPacket.

Calling IoStartPacket with a pointer to an idle Device object makes the object busy and invokes the driver's Start I/O routine. From then on, any calls to IoStartPacket targeting the same Device object result in IRPs being added to the object's queue of pending requests. This continues until the Start I/O or DpcForIsr routines call IoStartNextPacket to mark the Device object as being idle. This kind of behavior makes it very difficult to start another IRP before the current one is completed.

Do You Need to Process Concurrent IRPs?

The first thing to ask yourself is whether your driver really needs to process multiple IRPs concurrently. This is actually a question about the kind of software interface your driver is going to expose. For purposes of this discussion, you can divide driver interfaces into the following categories:

• Simplex interface: These drivers can transfer data only in one direction.
• Half-duplex interface: These drivers manage hardware that transfers data in both directions, but (for whatever reason) the drivers only process one request at a time.
• Full-duplex interface: Here, the driver can perform both inputs and outputs simultaneously.

The standard driver model easily supports both the simplex and half-duplex cases. Unfortunately, since it can't handle two requests at the same time, you can't use this model to provide a full-duplex driver interface.

An important factor in choosing a software interface is the behavior of the underlying hardware. Usually, this will tell you what kind of driver is most appropriate. Broadly speaking, you can divide hardware into three families.

Simplex devices
These devices can transfer data in only one direction. The standard parallel port and the mouse are both examples of simplex hardware. It's very unlikely that you'd need a full-duplex driver for a simplex device.

Half-duplex devices
This type of device can transfer data in both directions, but only one transfer can take place at a time. Disk controllers and Ethernet cards are both examples of half-duplex hardware. The choice of driver interface will depend on how the device is used. It's natural for disk controllers to process only one request at a time. Network cards need to give the appearance of performing simultaneous input and output operations, even though the device itself can only send or receive one packet at a time.

Full-duplex devices
These devices can perform simultaneous input and output operations. The standard serial port exhibits this kind of behavior. A full-duplex driver is almost always a necessity for this type of device.

How the Modified Driver Architecture Works

In a nutshell, if you want a single Device object to process two concurrent IRPs, you need to establish a complete secondary path through your driver. IRPs taking this alternate route will be processed in parallel with those following the standard path. To do this, you must:

1. Divide the IRP_MJ_XXX functions supported by your driver into two categories: Those to be processed by the standard Start I/O routine (the primary IRPs) and those that will travel the alternate path (the alternate IRPs).
2. Set up various bookkeeping structures to handle IRPs with alternate function codes. This involves maintaining a queue of alternate IRPs, as well as keeping track of the current alternate IRP. In this chapter, we'll be using Device Queue objects to hold the alternate IRPs.
3. Duplicate some of the logic in the I/O Manager's IoStartPacket and IoStartNextPacket functions. Your versions of these routines will be responsible for controlling the flow of IRPs along the alternate path.
4. Write additional Start I/O and DPC routines to handle alternate IRPs.
5. Modify the Interrupt Service routine so that it can process both primary and alternate DPC functions.

Data Structures for a Full-Duplex Driver

In Chapter 4 you saw that a standard Device object contains a CurrentIrp field that keeps track of the primary IRP being processed. Although it wasn't discussed in any detail, you also saw that the Device object contains an embedded Device Queue object for holding primary IRPs that arrive after the Device object has become busy.

In a full-duplex driver, you need to set up parallel structures to manage the alternate IRPs. Normally, this bookkeeping happens in the Device Extension, as shown in Figure 11.1.

[Figure 11.1 A full-duplex driver uses these data structures. Diagram labels: Primary IRPs, Alternate IRPs, Current IRP, Alternate Queue.]

Along with the alternate IRP pointer and the Device Queue object, there are some other changes to the Device Extension. If you're doing Buffered I/O, you'll need two sets of buffer pointers and counters to keep track of your progress through an I/O request.
In addition, the strategy adopted in this chapter uses separate DPC routines for completing primary and alternate IRPs, so you'll need to leave room in the Device Extension for a KDPC object.

Implementing the Alternate Path

Setting up the alternate path requires changes to several parts of your driver's code. The following subsections describe the modifications you'll need to make.

Dispatch routines: In a full-duplex driver, Dispatch routines for the alternate IRP_MJ_XXX function codes don't use the IoStartPacket function. Instead, they call the driver-defined start-packet routine to send IRPs down the alternate processing path.

Start I/O routines: The modified driver architecture is going to use two Start I/O routines: one for IRPs with primary IRP_MJ_XXX function codes and another for the IRPs with alternate codes. Implementing these functions as separate pieces of code usually makes them easier to manage.

Interrupt Service routine: When an interrupt arrives, the Interrupt Service routine has to perform different kinds of processing for primary and alternate operations. It needs to send primary and alternate IRPs to different DPC routines for postprocessing.

DPC routines: Although you could write a full-duplex driver with only a single DpcForIsr routine, it's usually easier to have a separate CustomDpc routine for the alternate IRPs. When this CustomDpc routine completes an IRP, it calls the driver-defined version of IoStartNextPacket to begin processing the next alternate IRP.

11.2 USING DEVICE QUEUE OBJECTS

A full-duplex driver needs some way to keep track of pending IRPs that arrive while the driver is already processing an alternate IRP. Although there are several ways to handle this situation, the driver model developed in this chapter is going to use a Device Queue object to hold on to pending alternate IRPs. This is the same strategy that the I/O Manager uses for the driver's primary IRPs.
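The flow of IRPs along the alternate path can be pictured with a small user-mode model. This is only a sketch under stated assumptions, not kernel code: ModelIrp, ModelDevExt, and the three ModelAlt routines are hypothetical names invented for illustration, a plain linked list stands in for the Device Queue object, and no locking is shown. It illustrates how a driver-defined start-packet routine either starts an IRP at once or queues it, and how the matching start-next-packet routine feeds the alternate Start I/O routine one IRP at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-mode model only: these types stand in for an IRP and a Device Extension. */
typedef struct ModelIrp {
    int id;
    struct ModelIrp *next;       /* link used while the IRP waits in the alternate queue */
} ModelIrp;

typedef struct ModelDevExt {
    ModelIrp *AlternateIrp;      /* alternate IRP currently being processed, or NULL */
    ModelIrp *QueueHead;         /* pending alternate IRPs, oldest first */
    ModelIrp *QueueTail;
    int       StartCount;        /* how many times the alternate Start I/O routine ran */
} ModelDevExt;

/* Hypothetical alternate Start I/O routine: it only records that work began. */
static void ModelAltStartIo(ModelDevExt *ext, ModelIrp *irp)
{
    ext->AlternateIrp = irp;
    ext->StartCount++;
}

/* Driver-defined counterpart of IoStartPacket for the alternate path. */
static void ModelAltStartPacket(ModelDevExt *ext, ModelIrp *irp)
{
    if (ext->AlternateIrp == NULL) {         /* alternate path idle: start right away */
        ModelAltStartIo(ext, irp);
        return;
    }
    irp->next = NULL;                        /* alternate path busy: append to the queue */
    if (ext->QueueTail != NULL)
        ext->QueueTail->next = irp;
    else
        ext->QueueHead = irp;
    ext->QueueTail = irp;
}

/* Driver-defined counterpart of IoStartNextPacket for the alternate path. */
static void ModelAltStartNextPacket(ModelDevExt *ext)
{
    ModelIrp *next = ext->QueueHead;

    ext->AlternateIrp = NULL;                /* the current alternate IRP is finished */
    if (next == NULL)
        return;                              /* nothing waiting: the path goes idle */
    ext->QueueHead = next->next;
    if (ext->QueueHead == NULL)
        ext->QueueTail = NULL;
    next->next = NULL;
    ModelAltStartIo(ext, next);
}
```

In a real driver the Dispatch routine for an alternate function code would play the role of the caller of ModelAltStartPacket, and the CustomDpc routine would call the equivalent of ModelAltStartNextPacket after completing an IRP.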
How Device Queue Objects Work

A Device Queue is a Kernel object that contains a linked list guarded by an embedded Executive spin lock. Although a Device Queue object can hold any structure with a KDEVICE_QUEUE_ENTRY in it, it is most commonly used to store a Device object's pending IRPs.

A Device Queue object is always in one of two states: Busy if there's been at least one attempt to insert an entry into the queue and Not Busy if there's been an attempt to remove an entry from an empty queue. Table 11.1 shows how Device Queue state transitions work.

The basic pattern is fairly simple: If you try to insert an entry into a Device Queue that isn't Busy, the insertion fails but the queue becomes Busy. Once it has become Busy, insertion operations succeed. Removing entries from a Busy Device Queue causes no change in the object's state. Once the Device Queue has no more entries, the next attempt to remove one causes the object to return to the Not-Busy state.

The IoStartPacket and IoStartNextPacket functions use the state of the Device object's built-in Device Queue to guarantee that a driver's Start I/O routine receives only one IRP at a time per Device object. The Device Queue is Not Busy if the associated Device object is ready to process another IRP, and Busy if the Device object is currently working on an IRP.

How to Use Device Queue Objects

It's fairly easy to work with Device Queue objects. The code example appearing later in this chapter will show you the specific details. In general, what you do is:

1. Declare a KDEVICE_QUEUE item in your Device Extension structure.
2. In your DriverEntry routine, call KeInitializeDeviceQueue. This sets up both the Device Queue object and its associated Executive spin lock.
3. Use the functions in Table 11.2 to insert or remove IRPs. These routines automatically acquire and release the Executive spin lock hidden in the Device Queue object.
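The Busy and Not-Busy transitions described above can be checked with a short user-mode model. This is a sketch, not the Kernel's implementation: ModelDeviceQueue, ModelEntry, and the three functions are hypothetical stand-ins for KDEVICE_QUEUE, KDEVICE_QUEUE_ENTRY, KeInitializeDeviceQueue, KeInsertDeviceQueue, and KeRemoveDeviceQueue, and the embedded spin lock is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-mode model only: no spin lock, no IRQL rules. */
typedef struct ModelEntry {
    struct ModelEntry *next;
} ModelEntry;

typedef struct ModelDeviceQueue {
    bool        busy;            /* the Busy / Not Busy state */
    ModelEntry *head;
    ModelEntry *tail;
} ModelDeviceQueue;

static void ModelInitQueue(ModelDeviceQueue *q)
{
    q->busy = false;
    q->head = NULL;
    q->tail = NULL;
}

/* Returns true if the entry was inserted. Inserting into a Not-Busy queue
   fails, but the attempt makes the queue Busy. */
static bool ModelInsert(ModelDeviceQueue *q, ModelEntry *e)
{
    if (!q->busy) {
        q->busy = true;
        return false;
    }
    e->next = NULL;
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    return true;
}

/* Returns the first entry, or NULL if the queue is empty. Removing from an
   empty queue returns the queue to the Not-Busy state. */
static ModelEntry *ModelRemove(ModelDeviceQueue *q)
{
    ModelEntry *e = q->head;

    if (e == NULL) {
        q->busy = false;
        return NULL;
    }
    q->head = e->next;
    if (q->head == NULL)
        q->tail = NULL;
    e->next = NULL;
    return e;
}
```

The failed first insertion is what lets a start-packet routine tell "the device was idle, start this IRP now" apart from "the device is busy, the IRP has been queued".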
There are two things to notice about Device Queue objects. First, you must be at DISPATCH_LEVEL IRQL in order to call the functions that insert and remove Device Queue entries. To call these functions from some part of your driver running at PASSIVE_LEVEL IRQL, you have to change levels by calling KeRaiseIrql and KeLowerIrql. Second, Device Queue objects must live in nonpaged storage. Since you normally declare them as part of your Device Extension structure, this poses no particular problem.

Table 11.1 State transitions in Device Queue objects

Device Queue state transitions
Initial state   Action                   Final state   Entry is ...
Not Busy        Insert into empty        Busy          Not inserted
Busy            Insert into empty        Busy          Inserted
Busy            Insert into non-empty    Busy          Inserted
Busy            Remove from non-empty    Busy          Removed
Busy            Remove from empty        Not Busy

Table 11.2 Use these functions to work with Device Queue objects

How to use Device Queue objects
IF you want to ...          THEN call ...               IRQL
Create a Device Queue       KeInitializeDeviceQueue     PASSIVE_LEVEL
Insert an IRP at the end    KeInsertDeviceQueue         DISPATCH_LEVEL
Insert IRP in sort-order    KeInsertByKeyDeviceQueue    DISPATCH_LEVEL
Remove first IRP            KeRemoveDeviceQueue         DISPATCH_LEVEL
Remove specific IRP         KeRemoveEntryDeviceQueue    DISPATCH_LEVEL

To link an IRP into a Device Queue, you use a predefined Device Queue entry that's a standard part of the IRP. The code looks like this:

    KeInsertDeviceQueue(
        &pDevExt->AlternateIrpQueue,
        &Irp->Tail.Overlay.DeviceQueueEntry );

Here, AlternateIrpQueue is a KDEVICE_QUEUE structure that's part of the Device Extension. When you remove an item from a Device Queue, you get a pointer to the queue entry.
As this fragment of code illustrates, you still need to use the CONTAINING_RECORD macro to convert this entry back into the address of an IRP:

    PIRP Irp;
    PKDEVICE_QUEUE_ENTRY QueueEntry;

    QueueEntry = KeRemoveDeviceQueue(
                     &pDevExt->AlternateIrpQueue );
    if( QueueEntry != NULL )
    {
        Irp = CONTAINING_RECORD(
                  QueueEntry,
                  IRP,
                  Tail.Overlay.DeviceQueueEntry );
        // Do something with the IRP
    }

Also remember to check for a NULL return value. There's always the possibility that the queue might be empty.

11.3 WRITING CUSTOM DPC ROUTINES

Chapter 3 briefly introduced DPC objects as a general-purpose way for high-IRQL code to perform less-important processing at a lower IRQL level. All the drivers you've seen since then have taken advantage of the I/O Manager's DpcForIsr mechanism to simplify the use of DPCs. For many situations, this may provide all the functionality you'll need. In the case of full-duplex drivers, however, funneling everything through a single DpcForIsr routine adds unnecessary complications to the design of the software.

This section explains how to work directly with Kernel DPC objects using CustomDpc routines. Although the main focus will be on their use in full-duplex drivers, CustomDpc routines can be valuable in any situation where a driver's Interrupt Service routine needs to perform some action that isn't allowed at DIRQL.

How to Use a CustomDpc Routine

Working directly with Kernel DPC objects isn't terribly difficult. This is what you need to do:

1. When you define your Device or Controller Extension, declare a separate KDPC item for each CustomDpc routine you plan to use.
2. In your DriverEntry routine, initialize each KDPC object by calling KeInitializeDpc. This sets up a correspondence between the KDPC object and a specific CustomDpc routine in your driver.
3. When you want to fire off the CustomDpc routine (usually from the driver's ISR), call KeInsertQueueDpc (see Table 11.3). To cancel a pending DPC request, you can call KeRemoveQueueDpc.

Table 11.3 Function prototype for KeInsertQueueDpc

BOOLEAN KeInsertQueueDpc    IRQL >= DISPATCH_LEVEL
Parameter               Description
IN PKDPC Dpc            Address of initialized DPC object to be queued
IN PVOID SystemArg1     First call-specific DPC parameter
IN PVOID SystemArg2     Second call-specific DPC parameter
Return value            • TRUE - the DPC was successfully queued
                        • FALSE - the DPC is already in the queue

Table 11.4 Prototype for a CustomDpc routine

VOID XxCustomDpc            IRQL == DISPATCH_LEVEL
Parameter               Description
IN PKDPC Dpc            DPC object that generated the call
IN PVOID Context        Context parameter passed to KeInitializeDpc
IN PVOID SystemArg1     1st DPC parameter passed to KeInsertQueueDpc
IN PVOID SystemArg2     2nd DPC parameter passed to KeInsertQueueDpc
Return value            (none)

Remember that you can't queue a DPC object that's already in the queue. If you try, KeInsertQueueDpc will return FALSE. This kind of thing might happen if your device has such a high interrupt rate that the DPC routine doesn't get a chance to run before the next interrupt arrives. In this case, it's up to your driver to decide what to do. Depending on the design of your driver, one solution might be to initialize a pool of DPCs for the ISR to use. In any event, remember that it's your responsibility to take care of this situation.

Execution Context

The Kernel's DPC dispatcher eventually removes your DPC object from the queue and calls the associated CustomDpc routine. Table 11.4 shows the prototype for the DPC routine itself.

Notice that you can pass three driver-specific parameters to a CustomDpc routine. Along with the Context item that KeInitializeDpc associates with the DPC object, you can pass two additional parameters each time you call KeInsertQueueDpc.
This is a little more flexible than the I/O Manager's DpcForIsr mechanism, which always passes the Device object, the IRP, and one call-specific argument. Depending on what you're trying to do, this can be useful.

11.4 CANCELING I/O REQUESTS

One issue we haven't addressed yet is how to deal with I/O requests that get abandoned. Although there's nothing about full-duplex drivers that makes them more prone to canceled requests, this is as good a place as any to bring up the subject. Specifically, a driver has to be prepared for any of the following situations:

• A thread issues one or more overlapped I/O requests to a Device object. Before the driver processes these IRPs, the thread either terminates or closes its handle to the Device object.
• A thread issues one or more overlapped I/O requests and then calls some other Win32 function that cancels any previous requests. For example, one side-effect of calling SetupComm is that it automatically cancels all pending IRPs.
• A higher-level driver allocates an IRP and sends it to another driver using IoCallDriver. Before the IRP completes, the higher-level driver calls the IoCancelIrp function to cancel the request. [1]

[1] Incidentally, a driver is only allowed to cancel IRPs that it has allocated and sent to a lower-level driver. It must not try to cancel any IRPs sent to it by the I/O Manager or by a higher-level driver.

In all three of these cases, the I/O Manager will notify the driver that the IRPs involved in the I/O need to be cancelled. Once it's been notified, the driver's job is to complete the affected IRPs with an IoStatus.Status value of STATUS_CANCELLED and an IoStatus.Information value of zero.

This section explains the mechanics of canceling I/O requests. You'll see examples of how to implement cancellation routines in the sample UART driver at the end of the chapter.

How IRP Cancellation Works

In general, any driver that's going to hold IRPs in a pending state for a long time needs to support cancellation. This really includes most device drivers, since any Device object can have multiple overlapped requests waiting in its Device Queue for the Start I/O routine. Cancellation support is also necessary in any driver that stores IRPs temporarily in a driver-defined queue during the course of processing.

Some of the issues will become a little more clear if you think about just what might be going on when a cancel notification arrives. An IRP can be in one of the following states at the time it gets cancelled:

• It might be in a queue waiting for the driver to get to it. This could be the Device object's Device Queue of pending requests (waiting for the Start I/O routine), or some private, internal queue managed by the driver.
• The IRP might have been removed from a queue, but the driver hasn't started to work on it yet. For example, an IRP might have become the Device object's current IRP but the Start I/O routine hasn't quite begun processing it.
• It might have been removed from a queue, and the next driver routine has begun processing it.

The I/O Manager's philosophy is that if an IRP is waiting in a queue when a cancellation request occurs, then the driver should dequeue the IRP and cancel it. Similarly, if the IRP has just been removed from a queue but processing hasn't begun, the driver should cancel it. On the other hand, if the IRP has already been started (and if it won't take too long to complete), then the driver should finish processing the request the normal way.

The I/O Manager provides two independent mechanisms for cancelling IRPs. First of all, a driver can attach a Cancel routine to an IRP before it puts the IRP into any queue.
If the IRP is cancelled while it's still in the queue, the Cancel routine dequeues the IRP and performs the cancellation. If there's no cancellation request, some other part of the driver eventually dequeues the IRP, removes its Cancel routine, and continues processing it. This allows individual IRPs to be cancelled selectively.

Second, a driver can have a Cleanup Dispatch routine that processes the IRP_MJ_CLEANUP major function code. The I/O Manager automatically sends an IRP with this function code whenever a thread terminates or closes a handle. The job of the Cleanup Dispatch is to cancel any queued IRPs belonging to the thread. This is a more general mechanism that all drivers ought to support.

Synchronization Issues

Keep in mind that a driver's I/O processing, Cleanup Dispatch, and Cancel routines all execute asynchronously. On a multi-processor system, they could literally be running at the same time. As a result, various driver routines have to coordinate their activities with care. Otherwise, there's always the chance one part of a driver might keep working on an IRP that another part of the driver has already cancelled.

For example, imagine that an IRP with an attached Cancel routine is sitting in the Device Queue of some Device object. The I/O Manager dequeues the IRP, makes it the Device object's current IRP, and then calls the driver's Start I/O routine. Start I/O gets control, but before it can remove the IRP's Cancel routine, the IRP is cancelled (perhaps on another CPU) and the Cancel routine executes. Now, the Start I/O routine will begin processing the IRP and the Cancel routine will cancel it. [2]

[2] This race condition is not limited to the Start I/O routine. Any time a queued IRP has an attached Cancel routine, there's the chance that the Cancel routine may execute between the moment when the IRP is dequeued and the moment when its Cancel routine is disabled.

Or consider the case where a driver's Cleanup Dispatch routine is in the process of cancelling an IRP with an attached Cancel routine. If the Cancel routine starts running before the Cleanup Dispatch function can disable it, again there will be a very nasty collision.

The I/O Manager uses two mechanisms to prevent these kinds of synchronization problems. The following subsections describe how they work.

The Cancel spin lock: The I/O Manager's Cancel spin lock is the primary safeguard against collisions during IRP cancellation. Ownership of this spin lock guarantees exclusive access to any IRP fields involved in cancelling a request. It also protects the IRP from the time it leaves a Device object's Device Queue until it becomes the current IRP.

Any driver-defined data structures that are shared among the Cleanup Dispatch routine, a Cancel routine, and some other driver routine should also be guarded by this spin lock. This includes any internal queues where the driver might be holding IRPs.

To use this lock, you need to call IoAcquireCancelSpinLock before you touch any of the various CancelXxx fields of the IRP and IoReleaseCancelSpinLock when you're finished. This is an Executive spin lock, so you have to be at or below DISPATCH_LEVEL IRQL when you acquire it. During the time you actually hold the Cancel spin lock, you'll be running at DISPATCH_LEVEL IRQL, so it's important not to cause any page faults.

Two important points about working with the Cancel spin lock: First, make sure you release it before you call IoCompleteRequest. If you break this rule, you can cause a system deadlock.

Second, remember that there's only one of these locks for the whole system, so don't hold on to it for too long. Doing so can prevent other drivers from running, which can degrade system performance.
The IRP Cancel flag: Any time a driver removes an IRP with a Cancel routine from a queue, there's always the danger that the Cancel routine will execute in the brief interval before it can be disabled. This would lead to a situation where the driver continued processing an IRP that had already been completed by the Cancel function.

To avoid this problem, each IRP contains a Boolean Cancel flag. By setting this flag to TRUE before it calls the IRP's Cancel routine, the I/O Manager lets other parts of the driver know that the IRP has already been completed. Like other cancellation fields in the IRP, the Cancel flag is guarded by the Cancel spin lock.

A driver's processing routines check the Cancel flag after they remove an IRP from a queue. If the flag is TRUE, it means the Cancel routine has already grabbed the IRP and nothing more should be done with it. If the flag is FALSE, the processing routine sets the CancelRoutine field of the IRP to NULL using IoSetCancelRoutine and starts to work on it. [3] From this point on, the Cancel routine can't run anymore, so there's no more danger.

[3] Calling IoSetCancelRoutine requires that you first become the owner of the Cancel spin lock.

What a Cancel Routine Does

Whenever a driver puts an IRP into a queue where it might remain for an indefinite time, the driver should give the I/O Manager the option of canceling the IRP. To do this, the driver attaches a queue-specific Cancel routine to the IRP by calling IoSetCancelRoutine.

The Cancel routine is responsible for doing whatever is necessary to cancel the IRP. The exact actions it takes will depend on where the IRP is in its processing cycle. If the driver has multiple internal queues, it can attach different Cancel routines to an IRP at different stages of processing.

A driver can have a Cancel routine attached to an IRP only while the driver actually owns the IRP.
In other words, the IRP is only cancelable between the time the driver receives the IRP and when it either completes the IRP or sends it to another driver with IoCallDriver. Before releasing an IRP, the driver must set its CancelRoutine field to NULL using IoSetCancelRoutine.

As described at the beginning of this section, the I/O Manager will call an IRP's Cancel routine if the thread issuing the request terminates or closes its handle before the request completes. The Cancel routine will also execute if a higher-level driver explicitly cancels the request with IoCancelIrp.

An IRP's Cancel routine runs at DISPATCH_LEVEL IRQL. As input, it receives a pointer to the Device object and a pointer to the IRP being cancelled. Before calling a Cancel routine, the I/O Manager acquires the Cancel spin lock, sets the IRP's Cancel flag to TRUE and its CancelRoutine field to NULL. The Cancel routine has to release the Cancel lock before it returns.

The specific actions taken by a Cancel routine will depend on the state of the IRP at the time it gets cancelled. The following subsections describe each of the possibilities.

IRP is in the Device Queue: If the IRP has not become current yet, then it must still be in the Device object's Device Queue. In this case, the Cancel routine takes the following actions:

1. It calls KeRemoveEntryDeviceQueue to pull the IRP from the Device Queue.
2. The Cancel routine then calls IoReleaseCancelSpinLock to let go of the Cancel lock.
3. Next, it puts STATUS_CANCELLED in the IRP's IoStatus.Status field and 0 in its IoStatus.Information field.
4. The Cancel routine calls IoCompleteRequest to give the IRP back to the I/O Manager. The priority boost is set to IO_NO_INCREMENT.

There's no need to call IoStartNextPacket since the IRP was canceled while it was still in the Device Queue and hadn't yet entered the Start I/O routine.
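The ordering of the four steps above matters, in particular releasing the Cancel lock before completing the IRP. The following user-mode model is a sketch under stated assumptions and not kernel code: ModelIrp, ModelDevice, and every function and constant beginning with Model or MODEL_ are hypothetical, a boolean stands in for the system-wide Cancel spin lock, and a singly linked list stands in for the Device Queue. The comments name the real routine from the text that each step corresponds to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_STATUS_PENDING = 0, MODEL_STATUS_CANCELLED = 1 };

typedef struct ModelIrp {
    struct ModelIrp *next;
    bool          Cancel;        /* set to true before the Cancel routine is called */
    int           Status;
    unsigned long Information;
    bool          Completed;
} ModelIrp;

typedef struct ModelDevice {
    ModelIrp *QueueHead;         /* pending IRPs (stands in for the Device Queue) */
    bool      CancelLockHeld;    /* stands in for the Cancel spin lock */
} ModelDevice;

/* Unlinks irp from the pending list; returns false if it was not there. */
static bool ModelRemoveEntry(ModelDevice *dev, ModelIrp *irp)
{
    ModelIrp **link = &dev->QueueHead;

    while (*link != NULL) {
        if (*link == irp) {
            *link = irp->next;
            irp->next = NULL;
            return true;
        }
        link = &(*link)->next;
    }
    return false;
}

static void ModelCompleteRequest(ModelDevice *dev, ModelIrp *irp)
{
    assert(!dev->CancelLockHeld);    /* the lock must be released before completion */
    irp->Completed = true;
}

/* Model of a Cancel routine for an IRP that is still waiting in the queue. */
static void ModelCancelQueuedIrp(ModelDevice *dev, ModelIrp *irp)
{
    bool wasQueued;

    assert(dev->CancelLockHeld && irp->Cancel);   /* state on entry, per the text */

    wasQueued = ModelRemoveEntry(dev, irp);       /* step 1: KeRemoveEntryDeviceQueue */
    dev->CancelLockHeld = false;                  /* step 2: IoReleaseCancelSpinLock */
    if (!wasQueued)
        return;                                   /* other IRP states are not modelled here */

    irp->Status = MODEL_STATUS_CANCELLED;         /* step 3: fill in IoStatus */
    irp->Information = 0;
    ModelCompleteRequest(dev, irp);               /* step 4: IoCompleteRequest */
}
```

The assertion inside ModelCompleteRequest is the model's way of expressing the rule that the Cancel spin lock must not be held when the request is completed.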
IRP is current: A Cancel routine might run in the brief interval after the I/O Manager has put an IRP's address into a Device object's CurrentIrp field but before the Start I/O routine has set the IRP's CancelRoutine field to NULL. The Cancel routine normally checks to see if the IRP being cancelled is the Device object's current IRP, and if it is, it does the following:

1. It calls IoReleaseCancelSpinLock to let go of the Cancel lock.
2. The Cancel routine next sets the IRP's IoStatus.Status field to STATUS_CANCELLED and its IoStatus.Information field to 0.
3. Next, it calls IoCompleteRequest to give the IRP back to the I/O Manager. The priority boost is set to IO_NO_INCREMENT.
4. Finally, the Cancel routine calls IoStartNextPacket to make the driver start the next IRP.

IRP is in some other queue: A driver can always maintain its own private queue of IRPs. If an IRP is in such a queue at the time it gets cancelled, its queue-specific Cancel routine does the following:

1. It calls RemoveEntryList to dequeue it. [4]
2. The Cancel routine then calls IoReleaseCancelSpinLock to let go of the Cancel lock.
3. Next, it puts STATUS_CANCELLED in the IRP's IoStatus.Status field, and zero in its IoStatus.Information field.
4. The Cancel routine calls IoCompleteRequest to give the IRP back to the I/O Manager. The priority boost is set to IO_NO_INCREMENT.
5. Depending on the design of the driver, it may be necessary to call IoStartNextPacket to get the driver working on the next request.

[4] This assumes the driver-defined queue is protected by the Cancel spin lock. The RemoveEntryList function is not interlocked.

What a Dispatch Cleanup Routine Does

At the time a user-mode thread terminates (either normally or abnormally), it may still have incomplete I/O requests associated with one or more Device objects. Similarly, it's possible for a thread to close a Device object handle with requests pending. In both these cases, the I/O Manager will try to clean up the outstanding I/O requests by doing two things: It sends the Device object an IRP with the major function code IRP_MJ_CLEANUP and it calls the Cancel routine of any IRPs associated with the thread.

After this, the I/O Manager delays execution of the thread, giving the driver time to process the IRPs. If the driver completes the IRPs, the I/O Manager responds by sending an IRP_MJ_CLOSE IRP to the Device object.

If the IRPs aren't completed during the timeout interval (which can last more than five minutes), things get ugly. In this case, the I/O Manager displays a message box for each IRP (naming the offending driver) and detaches the IRP from the thread. These zombie IRPs are lost to the system, as is any system buffer space associated with them. Another side-effect is that the driver can't be unloaded since it still has outstanding IRPs. No IRP_MJ_CLOSE ever gets sent.

From this description, you can see how important it is for a driver to clean up pending I/O requests. As you already know, one way to do this is to attach Cancel routines to every IRP. For some drivers, this may be overkill, and a simpler method is just to ask for cleanup notifications. To receive these notifications, the Driver object has to have a Dispatch routine registered for IRP_MJ_CLEANUP in its MajorFunction table.

The job of the Cleanup Dispatch routine is to cancel any queued requests associated with a specific Device object. For nonshareable Device objects, this means flushing all IRPs out of the Device Queue and any other driver-defined queues where they may be hiding. Depending on the nature of the device, the driver might also abort a request in progress, or let it complete normally.

If a Device object is shareable, cleanup involves a little more work.
In this case, the Cleanup Dispatch routine must cancel only those IRPs associated with the specific user-mode handle being closed. To do this, it uses the File object pointer stored in the I/O stack location of each IRP. This pointer uniquely identifies the user-mode handle that issued the request. The Cleanup Dispatch routine simply has to compare the File object pointer in each queued IRP with the pointer in the IRP_MJ_CLEANUP IRP. If they match, the queued IRP needs to be cancelled.

Like any Dispatch routine, the Cleanup Dispatch executes at PASSIVE_LEVEL IRQL. Although the specific steps will depend on the driver, a Cleanup Dispatch routine generally has to do the following:

1. It calls IoAcquireCancelSpinLock to acquire the Cancel spin lock. Unlike a Cancel routine, the Cleanup Dispatch routine doesn't automatically hold this spin lock when it's called.
2. Next, it scans the Device Queue of the target Device object looking for IRPs whose File object pointer matches the File object pointer of the IRP_MJ_CLEANUP IRP itself.
3. The Dispatch Cleanup routine removes each matching IRP from the Device Queue and sets the IRP's CancelRoutine field to NULL. It also sets the IRP's Cancel flag to TRUE and its CancelIrql field to DISPATCH_LEVEL. The IRP is then added to a list of requests to be cancelled.
4. If the driver maintains any private queues where IRPs might be held, the Dispatch Cleanup routine performs a similar scan. Any IRPs with matching File object pointers are removed from these queues, their various CancelXxx fields are modified and they are also put in the list of requests to be cancelled.
5. After releasing the Cancel spin lock, the Dispatch Cleanup routine completes all the IRPs in its cancellation list with a status of STATUS_CANCELLED and a boost of IO_NO_INCREMENT.
6. Finally, it completes the IRP_MJ_CLEANUP request itself with a status of STATUS_SUCCESS and a priority boost value of IO_NO_INCREMENT.
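The two-phase shape of these steps (collect matching IRPs while the lock is held, complete them after it is released) can be sketched in a user-mode model. Everything here is an assumption made for illustration: ModelIrp and the two functions are hypothetical names, the FileObject member stands in for the File object pointer kept in the I/O stack location, and the locking itself is only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelIrp {
    struct ModelIrp *next;
    const void *FileObject;      /* identifies the handle that issued the request */
    bool        Cancel;          /* stands in for the IRP's Cancel flag */
    bool        Cancelled;       /* true once completed with a cancelled status */
} ModelIrp;

/* Phase 1 (the Cancel lock would be held here): unlink every IRP in *queue whose
   FileObject matches and return those IRPs as a separate list, in queue order. */
static ModelIrp *ModelCollectForCleanup(ModelIrp **queue, const void *fileObject)
{
    ModelIrp  *cancelHead = NULL;
    ModelIrp **cancelTail = &cancelHead;
    ModelIrp **link = queue;

    while (*link != NULL) {
        ModelIrp *irp = *link;

        if (irp->FileObject == fileObject) {
            *link = irp->next;               /* remove from the pending queue */
            irp->next = NULL;
            irp->Cancel = true;              /* mark it as being cancelled */
            *cancelTail = irp;               /* append to the cancellation list */
            cancelTail = &irp->next;
        } else {
            link = &irp->next;               /* leave other handles' IRPs alone */
        }
    }
    return cancelHead;
}

/* Phase 2 (the lock would already be released): complete each collected IRP.
   Returns the number of IRPs completed. */
static int ModelCompleteCancelled(ModelIrp *list)
{
    int count = 0;

    while (list != NULL) {
        ModelIrp *next = list->next;

        list->next = NULL;
        list->Cancelled = true;
        count++;
        list = next;
    }
    return count;
}
```

Building a private cancellation list first keeps the time spent holding the lock short and avoids completing requests while it is held.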
11.5 SOME MORE HARDWARE: THE 16550 UART

This section describes the operation of the 16550 UART (Universal Asynchronous Receiver/Transmitter), a typical full-duplex device. Knowing how this hardware works will make it easier to understand the sample driver in the next section.

What the 16550 UART Does

The 16550 UART is an integrated circuit that performs serial input and output. Normally the UART is coupled to some kind of line-driver chip that interfaces with the outside world. For example, this is how the RS-232 serial ports on most computers are implemented.

The beauty of the UART is that it hides all the unpleasant details of framing the data with START and STOP bits, as well as generating parity and making sure all the bits are shifted out at the proper rate. To perform serial data transfers, you just move individual bytes to or from the UART's buffer registers.

On output, you send a data byte to the UART's Transmit Data register from which it is moved into a 16-byte FIFO on the chip. When the data byte makes it to the other end of the FIFO, it goes into a shift register that sends it out over the serial line one bit at a time. When the FIFO empties out, the UART sets its TBE (transmit buffer empty) flag.

Meanwhile, the UART's receiver section is constantly monitoring the serial line for input. As bits appear, they are added to a shift register that assembles them into a single byte of data. When the byte is complete, it goes into the input FIFO and the UART sets its RxRDY (receive data ready) flag to indicate that data is available. This flag stays up as long as any data remains in the FIFO. You pull data bytes one by one from the UART's Receive Data register.

Device Registers

You interact with the 16550 UART by reading and writing a set of one-byte registers, which are described briefly in Table 11.5.
Although this chapter gives you enough information to talk intelligently about the 16550 at a dinner party, you should read the data sheets from National Semiconductor if you want the whole story. 5 If you count carefully, you'll notice that there are twelve registers sand wiched into an eight-byte span. How can this be? Actually, it's the hardware 5 Joe Campbell's definitive book on serial communications (listed in the bibliography) is another excellent source of information. Sec. 1 1 .5 Some More Hardware: The 16550 UART Table 1 1 .5 237 Control and status registers for a 1 6550 UART UART register definitions Offset Register Access Description 0 Receive Data Transmit Data Baud rate LSB Interrupt Enable Bit 0 Bit 1 Bit 2 Bit 3 Bits 4-7 Baud rate MSB Interrupt ID Bit O Bits 1-2 Bit 3 Bits 4-5 Bits 6-7 FIFO Control Bit 0 Bit 1 Bit 2 Bit 3 Bits 4-5 Bits 6-7 Line Control Bits 0-1 Bit 2 Bits 3-5 Bit 6 Bit 7 Modem C qntrol Line Status Modem Status Scratch-pad RIO WIO RIW RIW Fetches first byte from input FIFO Sends byte to output FIFO Low byte of baud rate divisor* Enables various interrupts Received data ready Transmit buffer empty Error or BREAK RS-232 input has changed state Always zero High byte of baud rate divisor* Identifies source of an interrupt If set, no interrupts pending Source of interrupt (see below) FIFO timeout interrupt Reserved Set if FIFOs are enabled Controls FIFO behavior Enable both FIFOs Clear all bytes from Receive FIFO Clear all bytes from Transmit FIFO Enable DMA support Reserved Trigger-level of Receive FIFO Controls data bits, stop bits, parity Number of data bits Number of STOP bits Parity control BREAK control Divisor latch access bit (DLAB) Controls state of DTR and RTS lines Reports status of 1/0 operation Reports state changes in DTR, RTS Unused, possibly not implemented 1 2 3 4 5 6 7 RIW RIO WIO RIW RIW RIW RIW RIW *Accessible only when DLAB in the Line Control register is 1 . 
people playing those little tricks they like so much. The first trick is that some addresses go to different registers on the UART depending on whether you're reading or writing them. For example, if you read from offset 0, you get the contents of the Receive Data register, but if you write to 0, your byte goes to the Transmit Data register instead.

That accounts for ten registers, but what about the remaining two? The other trick is that when you set the DLAB bit of register 3, the low and high bytes of the baud-rate control mysteriously appear at offsets 0 and 1. You restore things to normal by clearing the bit. Since you're not likely to change baud rates frequently, this doesn't cause much of a problem.

One other thing to watch out for is register 7. Although the official data sheets say you should be able to use it as a one-byte store for anything you like, the truth is that it may not work. National Semiconductor licenses this UART design to a number of other manufacturers, and they don't all implement the scratch-pad.

Interrupt Behavior

The 16550 UART uses interrupts to let the CPU know about a number of interesting conditions. Specifically, it generates an interrupt whenever:

• A framing error or a BREAK occurs.
• The Receive FIFO reaches the trigger level set by the FIFO Control register.
• There is at least one character in the Receive FIFO, no other characters have arrived recently, and the CPU hasn't read the Receive Data register for a while. This FIFO timeout interrupt prevents data from wasting away in the FIFO.
• The transmit FIFO is empty. Usually, this is the signal to send more data. A single, spurious FIFO empty interrupt can occur when you first enable transmitter interrupts.
• Any of the RS-232 input lines changes state.

Your interrupt service routine determines the cause of the interrupt by examining the UART's Interrupt ID register.
Notice the use of negative logic in this register: The UART clears the low-order bit when an interrupt occurs and sets it when all pending interrupts have been serviced. The remaining bits in this register describe the exact source of the interrupt. See Table 11.6 for more information about UART interrupts.

Since several of these conditions might occur simultaneously, the UART imposes a priority arbitration scheme on interrupt events. When an interrupt occurs, the 16550 locks out UART events of equal or lower priority until the current interrupt has been dismissed.

Table 11.6  Determining the cause of a 16550 interrupt

UART interrupts

Cause                        Priority  ID register
---------------------------  --------  -----------
(No interrupt)                  -           1
Error or BREAK                  0           6
FIFO receiver trigger level     1           4
Receive-FIFO timeout            1          12
Transmitter buffer empty        2           2
RS-232 input                    3           0

Priority 0 is the most important, priority 3 the least.

The Interrupt ID register only shows you the highest-priority UART event. After you service this event, any other pending interrupts appear in the ID register in order of priority. This means that when you service a single UART interrupt, you need to check for any other events that might be pending before you dismiss the interrupt. Your service routine isn't really finished until the UART sets the low bit of the ID register.

The action your service routine takes to clear an interrupt depends on the cause of the interrupt. Table 11.7 shows how to clear various UART interrupts. Notice that you can clear Transmit interrupts either by sending more data, or (if this is the end of the I/O operation) simply by reading the ID register again.

Table 11.7  Clearing interrupts on the 16550 UART

Clearing UART interrupts

Interrupt source           To clear it...
-------------------------  ------------------------------------
Receiver error or BREAK    Read the Line Status register
Received data              Read data from the Receiver register
Transmit buffer empty      • Write to the Transmit buffer
                           • Read the Interrupt ID register
RS-232 input               Read the RS-232 Status register

11.6 CODE EXAMPLE: FULL-DUPLEX UART DRIVER

This is an example of a simple driver that performs simultaneous input and output operations using a 16550 UART. Because the driver is rather large, only selected pieces will appear here. You can find the complete code for this example in the CH11\DRIVER directory on the disk accompanying this book.

What to Expect

As you're poking around in the code, keep in mind that this is a toy driver whose real purpose is to illustrate the techniques presented earlier in this chapter. As a result, it ignores a number of issues that a real serial port driver needs to worry about.[6] Before examining the driver itself, it's a good idea to describe some of the things it doesn't do.

Perhaps this driver's biggest limitation is that it doesn't handle unsolicited input. In other words, it only accepts data from the device when an IRP_MJ_READ IRP is pending. Data arriving at any other time is simply dropped on the floor. In a real serial port driver, the Interrupt Service routine would probably dump unsolicited input into a type-ahead buffer, where it could be used to satisfy IRP_MJ_READ requests as they arrived.

Secondly, this driver uses a very simple signaling protocol between the sender and the receiver: It relies on the timeout interrupt from the UART's input FIFO to terminate a read request. If the sender slows down enough to trigger this interrupt, the receiver will essentially ignore the rest of the transmission. Conversely, if the sender doesn't leave enough of a gap between successive transmissions, the receiver will run them together. This is the only kind of flow-control supported by the driver.
Finally, as a concession to simplicity, this driver doesn't worry about device operations that time out. Since this can lead to situations where an IRP never gets completed, it's definitely something you'd want to handle in a real driver. The first code example in Chapter 10 shows how to deal with device time-outs.

[6] If you want to see what really goes into managing a standard COM port, take a look at the serial port driver source code included in the NT DDK.

DEVICE_EXTENSION in XXDRIVER.H

The following excerpt from the driver-specific header file shows the layout of the Device Extension.

    typedef struct _DEVICE_EXTENSION {
        PDEVICE_OBJECT DeviceObject;     // Back pointer
        ULONG NtDeviceNumber;            // Zero-based device number
        PUCHAR PortBase;                 // First control register
        PKINTERRUPT pInterrupt;          // Interrupt object

        //
        // Current UART settings
        //
        ULONG InputFifoTriggerLevel;
        ULONG DataBits;
        ULONG StopBits;
        ULONG Parity;

        KDEVICE_QUEUE AlternateIrpQueue; // ❶
        KDPC AlternateDpc;
        PIRP CurrentAlternateIrp;

        ULONG OutputFifoSize;            // Bytes to send at once ❷
        ULONG OutputBytesRequested;      // Output buffer size
        ULONG OutputBytesRemaining;      // Chars left to send
        PUCHAR pOutputBuffer;            // Next char to send
        BOOLEAN OutputInterruptsValid;

        ULONG InputFifoSize;             // Count of bytes ❸
        ULONG InputBytesRequested;       // Input buffer size
        ULONG InputBytesRemaining;       // Space left in buffer
        PUCHAR pInputBuffer;             // Next available slot
        BOOLEAN InputInterruptsValid;

        UCHAR DeviceStatus;              // Most recent status
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

❶ The Device Queue object and CurrentAlternateIrp pointer keep track of input requests. In this driver, all input operations will follow the alternate processing path. The DPC object is used to request I/O postprocessing of alternate IRPs.
❷ Here are the bookkeeping items used for output requests. Since the driver is using Buffered I/O, it has to keep a count of the bytes left to be transferred and a pointer to the location of the next output byte in the system buffer. The OutputInterruptsValid flag is set to TRUE whenever an output operation is in progress.

❸ These items do the bookkeeping for input requests. Notice how they parallel the output items.

DISPATCH.C

This portion of the example shows the Dispatch routines for writing, reading, and performing IRP cleanup operations.

XxDispatchWrite

This function processes Win32 WriteFile calls by sending the IRP along the standard driver processing path.

    NTSTATUS
    XxDispatchWrite(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);

        if (IrpStack->Parameters.Write.Length == 0)  // ❶
        {
            Irp->IoStatus.Status = STATUS_SUCCESS;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_SUCCESS;
        }

        //
        // Start device operation
        //
        IoMarkIrpPending(Irp);

        IoStartPacket(                               // ❷
            DeviceObject,
            Irp,
            0,
            XxCancelPrimaryIrp);                     // ❸

        return STATUS_PENDING;
    }

❶ This driver doesn't consider zero-length transfers to be an error, so the IRP is just completed immediately.

❷ To send an IRP along the standard processing path, the driver calls IoStartPacket.

❸ While the IRP is waiting in the Device object's pending queue, this Cancel routine will be responsible for canceling it.

XxDispatchRead

This function processes Win32 ReadFile calls by sending the IRP along the alternate driver processing path.

    NTSTATUS
    XxDispatchRead(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);

        //
        // Check for zero-length transfers
        //
        if (IrpStack->Parameters.Read.Length == 0)
        {
            Irp->IoStatus.Status = STATUS_SUCCESS;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_SUCCESS;
        }

        IoMarkIrpPending(Irp);                       // ❶

        XxAlternateStartPacket(                      // ❷
            DeviceObject,
            Irp,
            XxCancelAlternateIrp);                   // ❸

        return STATUS_PENDING;
    }

❶ Begin the device operation. As always, the IRP must be marked pending.

❷ Unlike the previous Dispatch routine, this one uses a driver-defined function to send the IRP along the alternate processing path.

❸ Once again, there's a Cancel routine to process the IRP if it should be canceled before the driver actually starts working on it.

XxDispatchCleanup

This Dispatch routine gets called when a thread that opened a handle either calls CloseHandle or terminates. Its job is to pull any IRPs associated with the handle from the two Device Queues and cancel them.

    NTSTATUS
    XxDispatchCleanup(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION CleanupIrpStack =
            IoGetCurrentIrpStackLocation(Irp);
        PDEVICE_EXTENSION DeviceExtension =
            DeviceObject->DeviceExtension;

        XxCleanupDeviceQueue(                        // ❶
            &DeviceObject->DeviceQueue,
            CleanupIrpStack->FileObject);

        XxCleanupDeviceQueue(                        // ❷
            &DeviceExtension->AlternateIrpQueue,
            CleanupIrpStack->FileObject);

        Irp->IoStatus.Status = STATUS_SUCCESS;       // ❸
        Irp->IoStatus.Information = 0;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return STATUS_SUCCESS;
    }

❶ XxCleanupDeviceQueue, a helper function that appears later in this example, does the actual work.
Here, it's being called to cancel IRP_MJ_WRITE IRPs waiting in the Device object's primary queue. The File object pointer identifies the handle to look for when canceling IRPs.

❷ Here, XxCleanupDeviceQueue will cancel IRP_MJ_READ IRPs associated with the handle.

❸ Finally, the IRP_MJ_CLEANUP IRP itself is completed. Once this IRP is passed back to the I/O Manager, it will be followed by an IRP_MJ_CLOSE request for the same handle.

DEVQUEUE.C

The routines in this file manage the Device Queue object used for processing alternate IRPs.

XxAlternateStartPacket

Given an IRP, this function either sends it to the alternate Start I/O routine or queues it for later processing if the alternate path is busy. In many ways, this function resembles the I/O Manager's IoStartPacket routine.

    VOID
    XxAlternateStartPacket(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PDRIVER_CANCEL CancelFunction
        )
    {
        KIRQL OldIrql;
        PDEVICE_EXTENSION DeviceExtension =
            DeviceObject->DeviceExtension;

        IoAcquireCancelSpinLock(&OldIrql);               // ❶
        IoSetCancelRoutine(Irp, CancelFunction);

        if (KeInsertDeviceQueue(                         // ❷
                &DeviceExtension->AlternateIrpQueue,
                &Irp->Tail.Overlay.DeviceQueueEntry))
        {
            IoReleaseCancelSpinLock(OldIrql);
        }
        else                                             // ❸
        {
            DeviceExtension->CurrentAlternateIrp = Irp;  // ❹
            IoReleaseCancelSpinLock(OldIrql);

            KeRaiseIrql(DISPATCH_LEVEL, &OldIrql);       // ❺
            XxAlternateStartIo(DeviceObject, Irp);
            KeLowerIrql(OldIrql);
        }
    }

❶ It's necessary to be holding the Cancel spin lock in order to modify the IRP's CancelRoutine field. This driver also uses the Cancel spin lock to guard the alternate IRP queue and the pointer to the alternate IRP currently being processed.

❷ Try to put the IRP into the alternate queue.
If the Device Queue object was already busy, the IRP will be inserted and the driver will simply release the Cancel spin lock.

❸ If the Device Queue was not-busy, KeInsertDeviceQueue will fail, and the Device Queue will flip into the busy state. In that case, it's necessary to start processing the IRP.

❹ The first step is to record the IRP as the current alternate IRP. Once this is done, it's safe to release the Cancel spin lock.

❺ The next step is to call the alternate Start I/O routine. Since XxAlternateStartPacket runs at PASSIVE_LEVEL IRQL, and the alternate Start I/O routine runs at DISPATCH_LEVEL, it's necessary to raise the CPU's IRQL before the call and lower it again afterward.

XxAlternateStartNextPacket

This routine does the same job as the I/O Manager's IoStartNextPacket function. If there is an available IRP in the queue of pending alternate IRPs, this function sends it to the alternate Start I/O entry point. This piece of code expects to run at DISPATCH_LEVEL IRQL only.

    VOID
    XxAlternateStartNextPacket(
        IN PDEVICE_OBJECT DeviceObject,
        IN BOOLEAN Cancelable
        )
    {
        PDEVICE_EXTENSION DeviceExtension =
            DeviceObject->DeviceExtension;
        PKDEVICE_QUEUE_ENTRY QueueEntry;
        PIRP Irp;
        KIRQL OldIrql;

        if (Cancelable)
            IoAcquireCancelSpinLock(&OldIrql);            // ❶

        QueueEntry = KeRemoveDeviceQueue(
            &DeviceExtension->AlternateIrpQueue);         // ❷

        if (QueueEntry != NULL)
        {
            Irp = CONTAINING_RECORD(                      // ❸
                    QueueEntry,
                    IRP,
                    Tail.Overlay.DeviceQueueEntry);

            DeviceExtension->CurrentAlternateIrp = Irp;

            if (Cancelable)
                IoReleaseCancelSpinLock(OldIrql);         // ❹

            XxAlternateStartIo(DeviceObject, Irp);
        }
        else
        {
            DeviceExtension->CurrentAlternateIrp = NULL;  // ❺

            if (Cancelable)
                IoReleaseCancelSpinLock(OldIrql);
        }
    }

❶ In imitation of the I/O Manager's routine, this function uses an explicit argument to decide whether the whole operation should be protected by the Cancel spin lock. Since this driver always attaches a Cancel routine to an alternate IRP, this argument will always be TRUE.

❷ Try to get the next pending IRP from the alternate Device Queue. If the queue was empty, KeRemoveDeviceQueue sets the Device Queue's state to not-busy and returns NULL.

❸ There was something in the queue. Reconstitute the address of the IRP itself and make it the new current IRP for the alternate path.

❹ If necessary, let go of the Cancel spin lock, then call the driver's alternate Start I/O entry point.

❺ If the queue was empty, the only work to do is to clear out the current-IRP slot for the alternate path and drop the Cancel spin lock.

INPUT.C

In this driver, IRP_MJ_READ requests are sent down the alternate path. This file contains routines that process these alternate IRPs. You'll find similar code for handling IRP_MJ_WRITE requests in OUTPUT.C.

XxAlternateStartIo

Like any Start I/O routine, this one is responsible for setting up various bookkeeping values and then starting the actual device operation.

    VOID
    XxAlternateStartIo(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        KIRQL OldIrql;
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);
        PDEVICE_EXTENSION DeviceExtension =
            DeviceObject->DeviceExtension;

        IoAcquireCancelSpinLock(&OldIrql);           // ❶
        if (Irp->Cancel)
        {
            IoReleaseCancelSpinLock(OldIrql);        // ❷
            return;
        }
        else                                         // ❸
        {
            IoSetCancelRoutine(Irp, NULL);
            IoReleaseCancelSpinLock(OldIrql);
        }

        switch (IrpStack->MajorFunction)
        {
            case IRP_MJ_READ:                        // ❹
                DeviceExtension->InputBytesRequested =
                    IrpStack->Parameters.Read.Length;

                DeviceExtension->InputBytesRemaining =
                    DeviceExtension->InputBytesRequested;

                DeviceExtension->pInputBuffer =
                    Irp->AssociatedIrp.SystemBuffer;

                if (!KeSynchronizeExecution(         // ❺
                        DeviceExtension->pInterrupt,
                        XxReceiveBytes,
                        DeviceExtension))
                {
                    XxDpcForInputs(
                        NULL,
                        DeviceObject,
                        Irp,
                        DeviceExtension);
                }
                break;

            //
            // Should never get here -- just get rid
            // of the packet...
            //
            default:
                //
                // Fail the IRP and start the next one.
                //
                Irp->IoStatus.Status = STATUS_NOT_SUPPORTED;
                Irp->IoStatus.Information = 0;
                IoCompleteRequest(Irp, IO_NO_INCREMENT);
                XxAlternateStartNextPacket(DeviceObject, TRUE);
                break;
        }
    }

❶ Before starting the operation, see if the Cancel routine has run between the time the IRP was removed from the Device Queue and now. This requires ownership of the Cancel spin lock.

❷ If the Cancel flag is set, it means the IRP has already been processed by the Cancel routine. In this case, the only thing to do is to release the spin lock and return immediately.

❸ The Cancel flag is clear. Remove the IRP from the cancelable state by setting its Cancel routine to NULL, then start to process it.
From this point on, only normal completion or an error can stop this request.

❹ Set up various pointers and counters in preparation for the data transfer operation.

❺ Next, start the device. If something goes wrong, use the DPC routine to fail the IRP. Pass NULL for the DPC object argument to let the DPC routine know that it's been called early and not as part of a normal I/O completion.

XxDpcForInputs

Here's the CustomDpc routine used for inputs. It does the usual work of putting a final status in the IRP, completing the current request, and trying to start another.

    VOID
    XxDpcForInputs(
        IN PKDPC Dpc,
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION DeviceExtension = Context;

        Irp->IoStatus.Information =                      // ❶
            DeviceExtension->InputBytesRequested -
            DeviceExtension->InputBytesRemaining;

        Irp->IoStatus.Status = STATUS_SUCCESS;           // ❷

        if (Dpc == NULL)                                 // ❸
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
        else
            IoCompleteRequest(Irp, IO_SERIAL_INCREMENT);

        XxAlternateStartNextPacket(DeviceObject, TRUE);  // ❹
    }

❶ Calculate the number of bytes actually transferred.

❷ Come up with a final status code for the IRP. A real driver would probably use the last recorded contents of the device's status register (stored in the Device Extension) to produce a real status value.

❸ If this routine is being called directly from the alternate Start I/O routine, the DPC argument will be NULL. This means the IRP is being failed before it got started. In that case, don't give the calling thread a priority boost.

❹ This request is done. Use a driver-defined routine to start the next alternate IRP (if there is one).

ISR.C

This file contains the interrupt service code for the UART driver. To make things a little more readable, processing for input events happens in some auxiliary subroutines.
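
The priority scheme from Table 11.6, together with the rule that a service routine must keep going until the UART sets the low bit of the ID register, can be tried out in user mode before reading the real ISR. What follows is only a minimal sketch and not part of the driver: the SIM_UART structure, the EV_* bits, and the sim_* functions are hypothetical stand-ins for the hardware, and the relative order of the two priority-1 receive events is an assumption. Only the Interrupt ID values themselves come from Table 11.6.

```c
#include <stddef.h>
#include <stdint.h>

/* Interrupt ID register values, taken from Table 11.6. */
#define SIM_IIR_NO_INTERRUPT 0x01  /* low bit set: nothing pending     */
#define SIM_IIR_ERR          0x06  /* error or BREAK        (priority 0) */
#define SIM_IIR_RDA          0x04  /* receive trigger level (priority 1) */
#define SIM_IIR_FIFO_TMO     0x0C  /* receive FIFO timeout  (priority 1) */
#define SIM_IIR_TBE          0x02  /* transmit buffer empty (priority 2) */
#define SIM_IIR_RS232        0x00  /* RS-232 input change   (priority 3) */
#define SIM_IIR_ID_MASK      0x0E

/* Hypothetical "pending event" bits inside the simulated chip. */
#define EV_ERR   0x01u
#define EV_RDA   0x02u
#define EV_TMO   0x04u
#define EV_TBE   0x08u
#define EV_RS232 0x10u

typedef struct {
    unsigned pending;    /* events still waiting for service      */
    uint8_t  log[8];     /* interrupt IDs in the order serviced   */
    size_t   log_count;
} SIM_UART;

/* Report only the highest-priority pending event, as the UART does. */
static uint8_t sim_read_int_id(const SIM_UART *u)
{
    if (u->pending & EV_ERR)   return SIM_IIR_ERR;
    if (u->pending & EV_RDA)   return SIM_IIR_RDA;
    if (u->pending & EV_TMO)   return SIM_IIR_FIFO_TMO;
    if (u->pending & EV_TBE)   return SIM_IIR_TBE;
    if (u->pending & EV_RS232) return SIM_IIR_RS232;
    return SIM_IIR_NO_INTERRUPT;
}

/* Returns 1 if the simulated UART was interrupting, 0 otherwise. */
static int sim_isr(SIM_UART *u)
{
    uint8_t id = sim_read_int_id(u);

    if (id & SIM_IIR_NO_INTERRUPT)
        return 0;                      /* negative logic: not ours */

    do {
        id &= SIM_IIR_ID_MASK;
        if (u->log_count < sizeof u->log)
            u->log[u->log_count++] = id;

        /* "Servicing" an event here just clears its pending bit. */
        switch (id) {
        case SIM_IIR_ERR:      u->pending &= ~EV_ERR;   break;
        case SIM_IIR_RDA:      u->pending &= ~EV_RDA;   break;
        case SIM_IIR_FIFO_TMO: u->pending &= ~EV_TMO;   break;
        case SIM_IIR_TBE:      u->pending &= ~EV_TBE;   break;
        case SIM_IIR_RS232:    u->pending &= ~EV_RS232; break;
        }

        id = sim_read_int_id(u);       /* anything else in line? */
    } while ((id & SIM_IIR_NO_INTERRUPT) == 0);

    return 1;
}
```

Seeding the simulated chip with several events at once shows them being handled in priority order within a single call, which is the behavior the real service routine below relies on.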

XxIsr

The Kernel's interrupt dispatcher calls this function at DIRQL, holding the Interrupt spin lock for the device. Since the UART can request multiple kinds of interrupts at the same time, XxIsr has to keep checking for possible interrupts until nothing more shows up.

    BOOLEAN
    XxIsr(
        IN PKINTERRUPT Interrupt,
        IN PVOID ServiceContext
        )
    {
        PDEVICE_EXTENSION pDE = ServiceContext;
        PDEVICE_OBJECT pDevice = pDE->DeviceObject;
        UCHAR InterruptId = XxReadIntId(pDE);

        if ((InterruptId & XX_IIR_NO_INTERRUPT) != 0)   // ❶
            return FALSE;

        do {
            InterruptId &= XX_IIR_INTERRUPT_ID_MASK;    // ❷

            switch (InterruptId)
            {
                case XX_IIR_ERR:
                    XxReadLineStatus(pDE);              // ❸
                    break;

                case XX_IIR_RDA:
                    XxHandleInputFifoTrigger(pDE);      // ❹
                    break;

                case XX_IIR_FIFO_TMO:
                    XxHandleInputFifoTimeOut(pDE);      // ❺
                    break;

                case XX_IIR_TBE:
                    if (pDE->OutputInterruptsValid)     // ❻
                        if (!XxTransmitBytes(pDE))
                            IoRequestDpc(
                                pDevice,
                                pDevice->CurrentIrp,
                                (PVOID)pDE);
                    break;

                case XX_IIR_RS232:
                    XxReadModemStatus(pDE);             // ❼
                    break;
            }

            InterruptId = XxReadIntId(pDE);             // ❽

        } while ((InterruptId & XX_IIR_NO_INTERRUPT) == 0);

        return TRUE;
    }

❶ If the low-order bit of the Interrupt ID register is set, then this device didn't generate an interrupt. Return control to the Kernel's interrupt dispatcher.

❷ The UART interrupted. Enter a loop that will keep processing interrupts until there's nothing left to do. Begin by masking out any irrelevant bits, then switch on the interrupt-type.

❸ This driver doesn't process any device errors. Just read the status register to clear the pending interrupt.

❹ This interrupt means that the input FIFO hit its trigger level. Call a helper routine to get the input characters from the FIFO.
❺ This interrupt means there's been a little data (less than the trigger level) sitting and aging in the input FIFO. For this driver, that's a signal to end an input operation. Call a helper function to empty the FIFO and complete the IRP.

❻ During an output operation, this interrupt means that it's time to refill the output FIFO and send more data. The interrupt-valid flag in the Device Extension prevents the driver from responding to spurious Transmit Buffer Empty interrupts when no output request is being processed.

❼ This driver ignores modem events, but it's still necessary to read the Modem Status register in order to clear the interrupt.

❽ That ends the processing for the first UART interrupt. There might be more waiting in line behind it. Read the Interrupt ID register to get the next one and do the whole thing over again. If there is no other interrupt pending in the UART, drop out of the loop and return.

XxHandleInputFifoTrigger

This function is called by XxIsr during an input operation to get the next bunch of characters from the UART.

    static VOID
    XxHandleInputFifoTrigger(
        IN PDEVICE_EXTENSION pDE
        )
    {
        ULONG i;

        //
        // Read one less than the number of bytes in
        // the FIFO; this guarantees a FIFO time-out
        // which will end the read request.
        //
        for (i = 0; i < pDE->InputFifoSize - 1; i++)    // ❶
        {
            if (pDE->InputInterruptsValid &&
                pDE->InputBytesRemaining > 0)           // ❷
            {
                *pDE->pInputBuffer++ = XxReadDataBuffer(pDE);
                pDE->InputBytesRemaining--;
            }
            else
                XxReadDataBuffer(pDE);
        }
    }

❶ This loop reads one less than the number of bytes in the FIFO. This last lonely byte, pining away in the FIFO, will eventually generate a FIFO timeout interrupt and terminate the input operation.

❷ If an input operation is in progress, and if there's room left in the buffer, move a byte from the FIFO to the input buffer. Otherwise, drop the byte on the floor.
This behavior throws away both excess characters and unsolicited input.

XxHandleInputFifoTimeOut

This function is called from XxIsr when some bytes have been languishing in the input FIFO for more than four character periods. In this driver, the FIFO timeout interrupt signals the end of an input operation.

    static VOID
    XxHandleInputFifoTimeOut(
        IN PDEVICE_EXTENSION pDE
        )
    {
        while (XxReadLineStatus(pDE) & XX_LSR_DATA_RDY)  // ❶
        {
            if (pDE->InputInterruptsValid &&
                pDE->InputBytesRemaining > 0)
            {
                *pDE->pInputBuffer++ = XxReadDataBuffer(pDE);
                pDE->InputBytesRemaining--;
            }
            else
                XxReadDataBuffer(pDE);
        }

        if (pDE->InputInterruptsValid)                   // ❷
        {
            pDE->InputInterruptsValid = FALSE;

            KeInsertQueueDpc(                            // ❸
                &pDE->AlternateDpc,
                (PVOID)pDE->CurrentAlternateIrp,
                (PVOID)pDE);
        }
    }

❶ Read bytes from the FIFO until it's empty. If this is a genuine input operation and there's still some room left in the buffer, store the bytes. Otherwise, drop them on the floor.

❷ If this was a spurious interrupt, there's nothing more to do. If an input operation really was in progress, clear the interrupt-valid flag (so additional interrupts will be ignored). Then complete the current input IRP.

❸ Input operations use a CustomDpc routine to complete the IRP.

CANCEL.C

This file contains routines that support IRP cancellation.

XxCleanupDeviceQueue

This function is called by the driver's Cleanup Dispatch routine. Its job is to cancel any IRPs in a Device Queue whose File object pointer matches the one passed as an argument.

    VOID
    XxCleanupDeviceQueue(
        IN PKDEVICE_QUEUE DeviceQueue,
        IN PFILE_OBJECT FileObject
        )
    {
        KIRQL OldIrql;
        PIRP CancelIrp;
        PIRP RequeueIrp;
        PIO_STACK_LOCATION CancelIrpStack;
        LIST_ENTRY CancelList;
        LIST_ENTRY RequeueList;
        PLIST_ENTRY ListHead;
        PKDEVICE_QUEUE_ENTRY QueueEntry;

        InitializeListHead(&CancelList);                 // ❶
        InitializeListHead(&RequeueList);

        IoAcquireCancelSpinLock(&OldIrql);

        if (IsListEmpty(&DeviceQueue->DeviceListHead))   // ❷
        {
            IoReleaseCancelSpinLock(OldIrql);
            return;
        }

        while ((QueueEntry =
                    KeRemoveDeviceQueue(DeviceQueue)) != NULL)  // ❸
        {
            CancelIrp = CONTAINING_RECORD(
                            QueueEntry,
                            IRP,
                            Tail.Overlay.DeviceQueueEntry);

            CancelIrpStack =
                IoGetCurrentIrpStackLocation(CancelIrp);

            if (CancelIrpStack->FileObject == FileObject)   // ❹
            {
                CancelIrp->Cancel = TRUE;
                CancelIrp->CancelIrql = OldIrql;
                CancelIrp->CancelRoutine = NULL;

                InsertTailList(
                    &CancelList,
                    &CancelIrp->Tail.Overlay.ListEntry);
            }
            else                                            // ❺
            {
                InsertTailList(
                    &RequeueList,
                    &CancelIrp->Tail.Overlay.ListEntry);
            }
        }

        while (!IsListEmpty(&RequeueList))
        {
            ListHead = RemoveHeadList(&RequeueList);

            RequeueIrp = CONTAINING_RECORD(
                            ListHead,
                            IRP,
                            Tail.Overlay.ListEntry);

            //
            // The drained queue is not-busy, so the first
            // insert only flips it to busy. Repeat the call
            // to actually queue the IRP.
            //
            if (!KeInsertDeviceQueue(
                    DeviceQueue,
                    &RequeueIrp->Tail.Overlay.DeviceQueueEntry))
            {
                KeInsertDeviceQueue(
                    DeviceQueue,
                    &RequeueIrp->Tail.Overlay.DeviceQueueEntry);
            }
        }

        //
        // Then release the Cancel spin lock
        //
        IoReleaseCancelSpinLock(OldIrql);

        //
        // Run the length of the holding queue and
        // complete every IRP that we found in it.
        //
        while (!IsListEmpty(&CancelList))
        {
            ListHead = RemoveHeadList(&CancelList);

            CancelIrp = CONTAINING_RECORD(
                            ListHead,
                            IRP,
                            Tail.Overlay.ListEntry);

            CancelIrp->IoStatus.Status = STATUS_CANCELLED;
            CancelIrp->IoStatus.Information = 0;

            IoCompleteRequest(CancelIrp, IO_NO_INCREMENT);
        }
    }

❶ These temporary work-lists will hold IRPs that are chosen for cancellation and for requeuing. The list-heads need to be initialized. It's also necessary to acquire the Cancel spin lock and hold it until all the IRPs in the Device Queue have been processed.

❷ See if there are any IRPs in the Device Queue. If it's empty, there's no work to do, so just quit.

❸ Loop until every IRP has been removed from the Device Queue. For each IRP, decide whether to cancel it or requeue it. At the end of this loop, the Device Queue has been emptied, hence its state will be Not Busy.

❹ If the IRP's File object pointer is the same as the one in the IRP_MJ_CLEANUP IRP, set the IRP's various CancelXxx fields. Then put the IRP into a holding queue of requests to be canceled.

❺ If the File object pointer doesn't match, this IRP should not be canceled. In that case, add it to the list of IRPs to be put back in the Device Queue.

XxCancelAlternateIrp

This is the Cancel routine that XxDispatchRead attaches to IRPs sent along the alternate path.[7]

[7] CANCEL.C contains a similar function for canceling IRP_MJ_WRITE IRPs.

    VOID
    XxCancelAlternateIrp(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PDEVICE_EXTENSION DeviceExtension =
            DeviceObject->DeviceExtension;

        if (Irp == DeviceExtension->CurrentAlternateIrp)  // ❶
        {
            IoReleaseCancelSpinLock(Irp->CancelIrql);     // ❷

            Irp->IoStatus.Status = STATUS_CANCELLED;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);

            XxAlternateStartNextPacket(                   // ❸
                DeviceObject, TRUE);
        }
        else                                              // ❹
        {
            KeRemoveEntryDeviceQueue(
                &DeviceExtension->AlternateIrpQueue,
                &Irp->Tail.Overlay.DeviceQueueEntry);

            IoReleaseCancelSpinLock(Irp->CancelIrql);

            Irp->IoStatus.Status = STATUS_CANCELLED;
            Irp->IoStatus.Information = 0;

            //
            // Complete this IRP, but don't start the
            // next one.
            //
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
        }
    }

❶ If the IRP is already in the CurrentAlternateIrp slot, but not yet started, it can still be canceled.

❷ Release the Cancel spin lock before completing the IRP. Notice that the I/O Manager has loaded the CancelIrql field of the IRP with the IRQL to which the driver should return when it releases the lock.

❸ Since the current alternate IRP has been removed, it's necessary to see if another one is waiting in the wings.

❹ The IRP wasn't current, so it must still be sitting in the Device Queue. Simply remove it from the queue and complete it. In this case, the driver doesn't try to start the next IRP.

11.7 SUMMARY

This chapter has presented a slightly different driver architecture that allows you to process more than one IRP at a time. Implementing this architecture required that we set up a Device Queue object to hold alternate IRPs. CustomDpc and Cancel routines also proved helpful, although their usefulness goes far beyond full-duplex drivers.

So much for drivers that manage Programmed I/O devices. The next step is to see what kind of support NT provides for DMA hardware. That will be the subject of the coming chapter.

CHAPTER 12

DMA Drivers

One way or another, all the drivers we've seen so far have depended on the CPU to move data between memory and the peripheral device. This technique is fine for slower hardware, but for fast devices that transfer large amounts of data, it would introduce too much overhead. Such devices are usually capable of directly accessing system memory and transferring data without the CPU's intervention. This chapter explains how to write drivers for these kinds of devices.

12.1 HOW DMA WORKS UNDER WINDOWS NT

As you saw in Chapter 1, insulating drivers from CPU- and platform-dependencies was a major design goal of the NT I/O subsystem. One way that NT does this is by using an abstract model of DMA operations.
Drivers that perform DMA work within the framework of this abstract model and can ignore many of the hardware-specific aspects of what's going on. This section presents the major features of the NT DMA framework.

Hiding DMA Hardware Variations with Adapter Objects

The purpose of using DMA is to minimize the CPU's involvement in data transfer operations. To do this, DMA devices use an auxiliary processor (called a DMA controller) to move data between memory and the peripheral device. This allows the CPU to continue doing other useful work in parallel with the I/O operation.

Although the exact details will vary, most DMA controllers have a very similar architecture. In its simplest form, this consists of an address register for the starting address of the DMA buffer, and a count register for the number of bytes or words to transfer. When you set these registers and start the attached device, the DMA controller begins moving data on its own. With each transfer, it increments the memory address register and decrements the count register. When the count register empties out, the DMA controller generates an interrupt, and the device is ready for another transfer.

Unfortunately, the needs of real-world hardware design complicate this simple picture. Consider the DMA implementation on ISA-based machines, described back in Chapter 2. These systems use a pair of Intel 8237 controller chips cascaded to provide four primary and three secondary DMA data channels. The primary channels (identified as zero through three) can perform single-byte transfers, while the secondary channels (five through seven) always transfer two bytes at a time. Since the 8237 uses a 16-bit transfer counter, the primary and secondary channels can handle only 64K bytes or 128K bytes per operation, respectively. Due to limitations of the ISA architecture, the DMA buffer must be located in the first sixteen megabytes of physical memory.
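
The ISA limits just described boil down to a few lines of arithmetic. The following is a minimal sketch that encodes only the figures quoted above (64K bytes on channels 0 through 3, 128K bytes on channels 5 through 7, two-byte transfers on the secondary channels, and a buffer below 16 megabytes). The function names are hypothetical, physical addresses are assumed to fit in 32 bits, and any other restrictions real ISA hardware may impose are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ISA_DMA_ADDRESS_LIMIT (16u * 1024u * 1024u)   /* first 16 MB */

/* Largest single transfer for a channel, or 0 if the channel is not
 * one of the usable data channels listed in the text. */
static uint32_t isa_dma_max_transfer(unsigned channel)
{
    if (channel <= 3)
        return 64u * 1024u;      /* primary: 16-bit count of bytes */
    if (channel >= 5 && channel <= 7)
        return 128u * 1024u;     /* secondary: 16-bit count of words */
    return 0;
}

/* Check one proposed transfer against the limits described above. */
static bool isa_dma_transfer_ok(unsigned channel,
                                uint32_t phys_addr,
                                uint32_t length)
{
    uint32_t max = isa_dma_max_transfer(channel);

    if (max == 0 || length == 0 || length > max)
        return false;

    /* Secondary channels always move two bytes at a time. */
    if (channel >= 5 && (length & 1u) != 0)
        return false;

    /* The whole buffer has to sit below the 16 MB line. */
    if (phys_addr >= ISA_DMA_ADDRESS_LIMIT)
        return false;

    return length <= ISA_DMA_ADDRESS_LIMIT - phys_addr;
}
```

A driver written directly against the 8237 would need checks like these scattered through its code, which is exactly the kind of platform knowledge the Adapter object is meant to hide.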
Contrast this with the DMA architecture used by EISA systems. The Intel 82357 EISA I/O controller extends ISA capabilities by supporting one-, two-, or four-byte transfers on any DMA channel, as well as allowing DMA buffers to be located anywhere in a 32-bit address space. In addition, EISA introduces three new DMA bus-cycle formats (known as types A, B, and C) that give peripheral designers the ability to work with faster devices.

Even on the same ISA or EISA bus, different devices can use different DMA techniques. Remember the discussion of DMA slaves and bus masters from Chapter 2. Slave devices compete for shareable system DMA hardware on the motherboard, while bus masters avoid bottlenecks by using their own built-in DMA controllers.

The problem with all this variety is that it tends to make DMA drivers very platform dependent. To avoid this trap, NT drivers don't manipulate DMA hardware directly. Instead, they work with an abstract representation of the hardware in the form of an NT Adapter object. Chapter 4 briefly introduced these objects and said they help with orderly sharing of system DMA resources. It turns out that Adapter objects also simplify the task of writing platform-independent drivers by hiding many of the details of setting up the DMA hardware. The rest of this section will explain more about what Adapter objects do and how to use them in a driver.

Solving the Scatter/Gather Problem with Mapping Registers

Although virtual memory simplifies the lives of application developers, it introduces two major complications for DMA-based drivers. The first problem is that the buffer address passed to the I/O Manager is a virtual address. Since the DMA controller works with physical addresses, DMA drivers need some way to determine the physical pages making up a virtual buffer. You'll see how this works when we look at Memory Descriptor Lists in the next section.
The other problem (illustrated in Figure 12.1) is that a process doesn't necessarily occupy consecutive pages of physical memory, and what appears to be a contiguous buffer in virtual space is probably scattered throughout physical memory. The NT Virtual Memory Manager uses the platform's address translation hardware (represented by a generic page table in the diagram) to give the process the illusion of a single, unbroken virtual address space. Unfortunately, the DMA controller doesn't participate in this illusion.

Since most DMA controllers can only generate sequential physical addresses, buffers that span virtual page boundaries present a serious challenge. Consider what happens if a DMA controller starts at the top of a multi-page buffer and simply increments its way through successive pages of physical memory. It's unlikely that any page after the first will actually correspond to one of the caller's virtual buffer pages. In fact, the pages touched by the DMA controller probably won't even belong to the process issuing the I/O request.

All virtual memory systems have to deal with the problem of scattering and gathering physical buffer pages during a DMA operation. Support for scatter/gather capabilities can come either from system DMA hardware or from hardware built into a smart bus master device. Once again, NT tries to simplify things by presenting drivers with a unified, abstract view of whatever scatter/gather hardware happens to exist on the system. This model consists of a contiguous range of addresses (called logical space) used by the DMA hardware and a set of mapping registers that translate logical space addresses into physical space addresses.

[Figure 12.1 Address spaces involved in DMA operations. Regions shown: Virtual Space, Physical Space, Logical Space. Copyright © 1994 by Cydonix Corporation.]

Here's how it works.
Referring to Figure 12.1, each mapping register corresponds to one page of DMA logical space, and a group of consecutively numbered registers represents a contiguous range of logical addresses. To perform a DMA transfer, a driver first allocates enough contiguous mapping registers to account for all the pages in the caller's buffer. It then loads consecutive mapping registers with the physical addresses of the caller's buffer pages. This has the effect of mapping the physically noncontiguous user buffer into a contiguous area of logical space. Finally, the driver loads the DMA controller with the starting address of the buffer in logical space and starts the device. While the operation is in progress, the DMA controller generates sequential, logical addresses that the scatter/gather hardware maps to appropriate physical page references.

So much for the conceptual view of mapping registers. Like the DMA controller, the actual implementation depends on the platform, the bus, and the I/O device. To minimize the driver's awareness of these details, NT lumps the mapping registers into the Adapter object and provides a set of routines for managing them.

Managing I/O Buffers with Memory Descriptor Lists

As you've just seen, loading physical addresses into mapping registers is an important part of setting up a DMA transfer. To make this process easier, the I/O Manager uses a structure called a Memory Descriptor List (MDL). An MDL keeps track of the physical pages associated with a virtual buffer. The buffer described by an MDL can be in either user- or system-address space.

Direct I/O operations are one place where MDLs play a major role. If a Device object has the DO_DIRECT_IO bit set in its Flags field, the I/O Manager automatically builds an MDL describing the caller's buffer each time an I/O request is sent to the device. It stores the address of this MDL in the IRP's MdlAddress field, and the driver uses it to prepare the DMA hardware for a transfer.
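The conceptual mapping-register model described earlier in this section can be made concrete with a small simulation. This is a user-mode sketch under stated assumptions, not how any HAL actually implements the registers. The 4096-byte page size, the register count, and every type and function name are hypothetical. The page list passed in plays the role of the physical page array that an MDL carries.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES    4096u   /* assumed page size                 */
#define MAP_REGISTER_COUNT 8u      /* hypothetical size of the register set */

/* One register per page of logical space: register i holds the physical
 * base address of the page backing logical page i. */
typedef struct {
    uint32_t phys_page[MAP_REGISTER_COUNT];
} MapRegisters;

/* Load consecutive registers, starting at first_reg, with the physical
 * pages of a buffer. Returns the logical address of the buffer's first
 * byte, which is what the driver would give to the DMA controller. */
static uint32_t load_map_registers(MapRegisters *regs, unsigned first_reg,
                                   const uint32_t *pages, unsigned page_count,
                                   uint32_t byte_offset)
{
    assert(first_reg + page_count <= MAP_REGISTER_COUNT);
    assert(byte_offset < PAGE_SIZE_BYTES);

    for (unsigned i = 0; i < page_count; i++)
        regs->phys_page[first_reg + i] = pages[i];

    return first_reg * PAGE_SIZE_BYTES + byte_offset;
}

/* What the scatter/gather hardware does on every access: pick the
 * register for this logical page and add the offset within the page. */
static uint32_t logical_to_physical(const MapRegisters *regs, uint32_t logical)
{
    uint32_t reg = logical / PAGE_SIZE_BYTES;

    assert(reg < MAP_REGISTER_COUNT);
    return regs->phys_page[reg] + logical % PAGE_SIZE_BYTES;
}
```

With three scattered physical pages loaded into registers 2 through 4, the controller can count straight through logical space while each access lands on the correct physical page. A real driver never touches the registers this way. It asks the Adapter object routines to do the loading.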
As you can see from Figure 12.2, the MDL consists of a header describing the virtual buffer, followed by an array that lists the physical pages associated with the buffer. Given a virtual address within the buffer, it's possible to determine the corresponding physical page. Some of the fields in the header help clarify the use of an MDL.

[Figure 12.2 Structure of a Memory Descriptor List (MDL). Labels shown: Virtual Space, Physical Memory, StartVa, ByteOffset, ByteCount, Size, Process, MappedSystemVa, Phys Addr 1, Phys Addr 2, Phys Addr 3. Copyright © 1996 by Cydonix Corporation.]

StartVa and ByteOffset  The StartVa field contains the address of the buffer described by the MDL, rounded down to the nearest virtual page boundary. Since the buffer doesn't necessarily start on a page boundary, the ByteOffset field specifies the distance from this page boundary to the actual beginning of the buffer. Keep in mind that if the buffer is in user space, your driver can use the StartVa field to calculate indexes into the MDL but not as an actual address pointer.

MappedSystemVa  If the buffer described by the MDL is in user space and you need to access the contents of the buffer itself, you first have to map the buffer into system space with MmGetSystemAddressForMdl. This field of the MDL is used to hold the system-space address where the user-space buffer has been mapped.[1]

ByteCount and Size  These fields contain the number of bytes in the buffer described by the MDL and the size of the MDL itself, respectively.

Process  If the buffer lives in user space, the Process field points to the Process object that owns the buffer. The I/O Manager will use this information when it cleans up the I/O operation.

Keep in mind that MDLs are opaque data objects defined by the NT Virtual Memory Manager. Their actual contents may vary from platform to platform and they might also change in future versions of NT. Consequently, you must access an MDL using system support functions.
Any other approach could lead to disaster. Table 12.1 lists the MDL functions you're most likely to encounter in a driver. See the DDK documentation for others. It's worth pointing out that some of the functions in this table are implemented as macros for speed.

[1] Using doubly-mapped buffers is generally a bad idea. Unmapping the buffer can cause a great deal of system overhead.

Table 12.1  Functions that work with Memory Descriptor Lists

MDL access functions

Function                      Description
IoAllocateMdl                 Allocates an empty MDL
IoFreeMdl                     Releases MDL allocated by IoAllocateMdl
MmBuildMdlForNonPagedPool     Builds MDL for an existing nonpaged pool buffer
MmGetSystemAddressForMdl      Returns a nonpaged system space address for the
                              buffer described by an MDL
IoBuildPartialMdl             Builds an MDL describing part of a buffer
MmGetMdlByteCount             Returns count of bytes in buffer described by MDL
MmGetMdlByteOffset            Returns page-offset of buffer described by MDL
MmGetMdlVirtualAddress        Returns starting VA of buffer described by MDL

MDLs give drivers a convenient, platform-independent way of describing buffers located either in user- or system-address space. For drivers that perform DMA operations, MDLs are important because they make it easier to set up an Adapter object's mapping registers. Later parts of this chapter will show you how to use MDLs to set up DMA transfers.

Maintaining Cache Coherency

The final thing we need to consider is the impact of various caches on DMA operations. During a DMA transfer, data may be getting cached in various places, and if everything isn't coordinated properly, someone might end up with stale data. Figure 12.3 shows who the players are in this drama.

CPU data cache  Modern CPUs support both on-chip and external caches for holding copies of recently-used data. When the CPU wants something from physical memory, it first looks for the data in the cache.
If the CPU finds what it wants, it doesn't have to make the long, slow trip down the system memory bus. For write operations, data moves from the CPU to the cache, where (depending on the caching policy) it may stay for a while before making its way out to main memory.

The problem is that, on some architectures (primarily RISC platforms), the CPU's cache controller and the DMA hardware are unaware of each other. This lack of awareness can lead to incoherent views of memory. For instance, if the CPU cache is holding part of a buffer and that buffer is overwritten in physical memory by a DMA input, the CPU cache will contain stale data. Similarly, if modified data hasn't been flushed from the CPU cache when a DMA output begins, the DMA controller will be sending stale data from physical memory out to the device.

[Figure 12.3 Caches involved in DMA processing. Labels shown: DMA Buffer, Duplicate DMA Buffer, Adapter Object Cache. Copyright © 1996 by Cydonix Corporation.]

One way of handling this problem is to make sure that any portions of a DMA buffer residing in the CPU's data cache are flushed before a DMA operation begins.[2] Your driver can do this by calling KeFlushIoBuffers and giving it the MDL describing the DMA buffer. This function flushes any pages in the MDL from the data cache of every processor on the system. The code example later in this chapter shows how this works.

If you know something about hardware, you may be horrified by the overhead of flushing every CPU's data cache before every DMA transfer. It's important to emphasize that the cache coherency problem described above is only an issue on some platforms. On machines that automatically maintain cache coherency, KeFlushIoBuffers is a no-op.
You should always call it, however, just in case your driver ends up on a platform that doesn't handle caching properly.

Adapter object cache  The Adapter object is another place where data may be cached during a DMA transfer. Unlike the CPU cache, which is always a real piece of hardware, the Adapter object's cache is an abstraction representing platform-dependent hardware or software. It might be an actual cache in a system DMA controller or a software buffer maintained by the I/O Manager. In fact, for some combinations of hardware, there might not even be a cache, but your driver has to act as if there were in order to guarantee portability.

[2] Another option is to use non-cached memory for your DMA buffers.

If this sounds strange, consider a DMA controller attached to an ISA bus. Such a controller can access only the first sixteen megabytes of physical memory. If the pages of a user buffer are outside this range, the I/O Manager allocates another buffer in low memory when your driver sets up its DMA mapping registers. If you're setting up an output operation, the I/O Manager also copies the contents of the user buffer pages into this Adapter object buffer.

You need to flush the Adapter object cache of this ISA DMA controller in two cases. First, after an input operation, your driver must tell the I/O Manager to copy data from the Adapter buffer back to the user buffer. Second, when you complete any data transfer, you have to let the I/O Manager know that it can release the memory in the Adapter buffer. The function that does the work is IoFlushAdapterBuffers.

Categorizing DMA Drivers

The NT DMA model divides drivers into two categories, based on the location of the DMA buffer itself. In packet-based DMA, data moves directly between the device and the locked-down pages of a user-space buffer. This is the type of DMA associated with Direct I/O operations.
The main thing to notice here is that each new I/O request will probably use a different set of physical pages for its buffer. This has an impact on the kinds of setup and cleanup steps the driver will have to take for each I/O.

The other possibility is that the driver sets up a single nonpaged buffer in system space and uses it for all DMA transfers. This is referred to as common-buffer DMA.

Packet-based and common-buffer DMA are not mutually exclusive categories. Some complex devices perform both kinds of DMA. One example is the Adaptec AHA-1742 controller, which uses packet-based DMA to transfer data between SCSI devices and user buffers. This same controller exchanges command and status information with its driver using a set of mailboxes kept in a common buffer area.

Although DMA drivers are all rather similar, certain implementation details will depend on whether you're performing packet-based or common-buffer DMA. Later sections of this chapter will present the specifics of writing each kind of driver.

Limitations of the NT DMA Architecture

Although NT's use of an abstract DMA model makes some things easier, it does have its drawbacks. For one thing, it tends to favor the notion of shared-system DMA controllers. Much of the setup that goes on in an NT DMA driver is based on the idea of passing a shared DMA channel from driver to driver. In an age of dumb peripherals, this made sense, but as more bus-mastering devices have appeared, the slave DMA model has become a little out of date.

A more significant problem is that NT doesn't allow you to perform DMA operations directly from device to device. Instead, you have to read data from one device, buffer it in system memory, and from there write it out to another device.[3]
This puts severe limitations on the available bandwidth and wastes one of the main architectural features of modern buses like PCI. Sadly, Microsoft appears to be adamantly opposed to direct device-to-device data transfers.

[3] Part of the problem here is that you can only build MDLs for physical memory that's known to the system at bootstrap time. There's simply no way to create an MDL describing memory that's actually located on a peripheral or that's just a range of address space on some bus.

12.2 WORKING WITH ADAPTER OBJECTS

Although the specific details will vary according to the nature of the device and the architecture of the driver, DMA drivers generally have to perform several kinds of operations on Adapter objects.

• Locate the Adapter object associated with a specific device.
• Acquire and release ownership of Adapter objects and their mapping registers.
• Load the Adapter object's mapping registers at the start of a transfer.
• Flush the Adapter object's cache after a transfer completes.

The following subsections discuss these topics in general terms. Later sections of this chapter will add more detail.

Finding the Right Adapter Object

All DMA drivers need to locate an Adapter object before they can perform any I/O operations. To find the right one, a driver's initialization code needs to call the HalGetAdapter function described in Table 12.2.

Given a description of some DMA hardware, HalGetAdapter returns a pointer to the corresponding Adapter object and a count of the maximum number of mapping registers available for a single transfer. The driver needs to save both these items in nonpaged storage (usually the Device or Controller Extension) for later use.

Table 12.2  Function prototype for HalGetAdapter

PADAPTER_OBJECT HalGetAdapter                  IRQL == PASSIVE_LEVEL

Parameter                                      Description
IN PDEVICE_DESCRIPTION DeviceDescription       Points to a structure describing device
                                               capabilities
IN OUT PULONG NumberOfMapRegisters             • IN - requested number of registers
                                               • OUT - maximum allowable number
Return value                                   • Non-NULL - address of Adapter object
                                               • NULL - no such Adapter object available

The main input to HalGetAdapter is the DEVICE_DESCRIPTION block pictured in Table 12.3. It's important to set up this structure correctly, since most of the failures of HalGetAdapter are due to bogus device descriptions. Also be sure to clear the structure with RtlZeroMemory before you fill it in.

Table 12.3  The DEVICE_DESCRIPTION structure describes a piece of DMA hardware

DEVICE_DESCRIPTION, *PDEVICE_DESCRIPTION

Field                          Contents
ULONG Version                  • DEVICE_DESCRIPTION_VERSION
                               • DEVICE_DESCRIPTION_VERSION1
BOOLEAN Master                 • TRUE - device is a bus master
                               • FALSE - device uses system DMA
BOOLEAN ScatterGather          Slave device supports scatter/gather
BOOLEAN DemandMode             Slave device uses demand-mode
BOOLEAN AutoInitialize         Slave device uses autoinitialize mode
BOOLEAN Dma32BitAddresses      DMA logical space uses 32-bit addressing
BOOLEAN IgnoreCount            Platform's DMA controller doesn't maintain an
                               accurate DMA count*
BOOLEAN Reserved1              -*
BOOLEAN Reserved2              -*
ULONG BusNumber                System-assigned bus number
ULONG DmaChannel               Slave device DMA channel number
INTERFACE_TYPE InterfaceType   Bus architecture
                               • Internal
                               • Isa
                               • Eisa
                               • MicroChannel
                               • PCIBus
DMA_WIDTH DmaWidth             Width of a single transfer operation
                               • Width8Bits
                               • Width16Bits
                               • Width32Bits
DMA_SPEED DmaSpeed             DMA bus-cycle speed
                               • Compatible
                               • TypeA
                               • TypeB
                               • TypeC
ULONG MaximumLength            Largest transfer size (in bytes) device can perform
ULONG DmaPort                  Micro Channel DMA port number

*Requires the use of DEVICE_DESCRIPTION_VERSION1

Most of these fields are self-explanatory, but the following ones may need a little clarification.

ScatterGather  For bus master devices, this says that the hardware has some sort of built-in support for transferring data to and from noncontiguous ranges of physical memory.
A later section of this chapter will explain how to write drivers that can take advantage of these capabilities.

For slave devices, setting this field to TRUE implies that the device can stop and wait in the middle of a transfer while the I/O Manager reprograms the DMA controller. Since the system DMA controllers on some platforms have only one mapping register per channel, setting ScatterGather to TRUE would mean stopping after each page of memory is transferred.

Demand transfer mode  Some devices need to stop and "catch their breath" during a DMA transfer. This gives them the chance to finish working with one chunk of data before the next comes through. If your device behaves this way, the DMA controller has to be programmed to work in demand mode. Otherwise, the system DMA controller won't stop, no matter how much the device screams.

Autoinitialization  System DMA channels can be programmed to reinitialize themselves when a transfer completes. In this mode, the DMA controller's count and address registers are automatically reloaded from a pair of base count and address registers at the end of each operation. This causes another transfer to begin immediately. Typically, drivers using this mode of operation will also use a common buffer for the data transfer.

IgnoreCount  Setting this field to TRUE says that the platform's DMA hardware doesn't maintain an accurate running count of the number of bytes transferred. This forces the HAL to do some extra work during DMA operations, which slows things down.

Acquiring and Releasing the Adapter Object

There's no guarantee that the DMA resources needed for a transfer will be free when a driver's Start I/O routine runs. For example, a slave device's DMA channel may already be in use by another device, or there may not be enough mapping registers to handle the request.
Consequently, all packet-based DMA drivers and drivers for common-buffer slave devices have to request ownership of the Adapter object before starting a data transfer.

Since a Start I/O routine runs at DISPATCH_LEVEL IRQL, there's no way it can stop and wait for the Adapter object. Instead, it calls the IoAllocateAdapterChannel function (see Table 12.4) and then returns control to the I/O Manager.

Table 12.4  Prototype for IoAllocateAdapterChannel

NTSTATUS IoAllocateAdapterChannel              IRQL == DISPATCH_LEVEL

Parameter                                      Description
IN PADAPTER_OBJECT AdapterObject               Adapter object from HalGetAdapter
IN PDEVICE_OBJECT DeviceObject                 Target device for DMA operation
IN ULONG NumberOfMapRegisters                  Count of map registers to allocate
IN PDRIVER_CONTROL ExecutionRoutine            Address of XxAdapterControl
IN PVOID Context                               Argument for XxAdapterControl
Return value                                   • STATUS_SUCCESS
                                               • STATUS_INSUFFICIENT_RESOURCES

When the requested DMA resources become available, the I/O Manager notifies the driver by calling its Adapter Control routine. It's important to keep in mind that this is an asynchronous callback. It may happen as soon as Start I/O calls IoAllocateAdapterChannel or it may not occur until some other driver releases the Adapter resources.

Notice that you have to be at DISPATCH_LEVEL IRQL when you call this function. Since you normally call it from the Start I/O routine, this poses no problem. However, if you're using it in some weird way and you happen to be at PASSIVE_LEVEL, make sure you use KeRaiseIrql and KeLowerIrql before and after your call to IoAllocateAdapterChannel.

The Adapter Control routine in a DMA driver is responsible for calling IoMapTransfer to set up the DMA hardware and starting the actual device operation. Table 12.5 contains a prototype of the Adapter Control callback.

The MapRegisterBase argument is an opaque value that identifies the mapping registers assigned to your I/O request.
In a sense, it's a kind of handle to a specific group of registers. You use this handle to set up the DMA hardware for the transfer. Normally, you should save this value in the Device or Controller extension because you'll need it in later parts of the DMA operation.

Table 12.5  Function prototype for an Adapter Control routine

IO_ALLOCATION_ACTION XxAdapterControl          IRQL == DISPATCH_LEVEL

Parameter                                      Description
IN PDEVICE_OBJECT DeviceObject                 Target device for DMA operation
IN PIRP Irp                                    IRP describing this operation
IN PVOID MapRegisterBase                       Handle to a group of mapping registers
IN PVOID Context                               Driver-determined context
Return value                                   • DeallocateObjectKeepRegisters
                                               • KeepObject

Watch out for the Irp argument. The IRP address sent to your Adapter Control routine comes from the CurrentIrp field of the Device object. Since the CurrentIrp field only gets set when the Start I/O routine is called, you can only use this passed IRP pointer if IoAllocateAdapterChannel is called from the Start I/O routine. If you're calling it from some other context, this pointer will be NULL. In that case, you'll have to find another way to pass the IRP (and its associated MDL address) to the Adapter Control routine.

After it programs the DMA controller and starts the data transfer, the Adapter Control routine gives control back to the I/O Manager. Drivers of slave devices should return a value of KeepObject from this function so that no one else will be able to use the Adapter object until this request is finished. Bus master drivers return DeallocateObjectKeepRegisters instead.

When the DpcForIsr routine in a DMA driver completes an I/O request, it needs to release any Adapter resources it owns. Drivers of slave devices do this by calling IoFreeAdapterChannel; bus master drivers call IoFreeMapRegisters.
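The ownership handoff described in this subsection (request the Adapter, get called back when it is free, keep or release it, and free it when the request completes) can be modeled in ordinary user-mode C. The sketch below is only a model. None of these types or functions belong to the DDK, the queuing policy inside the real I/O Manager is an assumption, and mapping registers are left out entirely. Its one purpose is to show why the callback may run immediately or only after another owner lets go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-only stand-ins for the two return values discussed above. */
typedef enum { MODEL_KEEP_OBJECT, MODEL_DEALLOCATE_KEEP_REGISTERS } ModelAction;

typedef ModelAction (*ModelControlRoutine)(void *context);

#define MODEL_MAX_WAITERS 4

typedef struct {
    bool                busy;                        /* channel currently owned? */
    ModelControlRoutine routine[MODEL_MAX_WAITERS];  /* FIFO of pending requests */
    void               *context[MODEL_MAX_WAITERS];
    size_t              head, count;
} ModelAdapter;

/* Run queued callbacks for as long as the channel is free. */
static void model_dispatch(ModelAdapter *a)
{
    while (!a->busy && a->count > 0) {
        ModelControlRoutine fn  = a->routine[a->head];
        void               *ctx = a->context[a->head];

        a->head = (a->head + 1) % MODEL_MAX_WAITERS;
        a->count--;
        /* A "slave" callback keeps the channel; a "bus master" does not. */
        a->busy = (fn(ctx) == MODEL_KEEP_OBJECT);
    }
}

/* Analogue of requesting the Adapter: never waits, only queues. */
static bool model_allocate_channel(ModelAdapter *a, ModelControlRoutine fn,
                                   void *ctx)
{
    assert(a != NULL && fn != NULL);
    if (a->count == MODEL_MAX_WAITERS)
        return false;                    /* no room to queue the request */

    size_t tail = (a->head + a->count) % MODEL_MAX_WAITERS;
    a->routine[tail] = fn;
    a->context[tail] = ctx;
    a->count++;
    model_dispatch(a);                   /* may call fn right now */
    return true;
}

/* Analogue of releasing the Adapter when a request completes. */
static void model_free_channel(ModelAdapter *a)
{
    a->busy = false;
    model_dispatch(a);                   /* next waiter, if any, runs here */
}

/* Example callback: stands in for programming the DMA hardware and
 * starting the device, then keeps ownership as a slave driver would. */
static ModelAction model_slave_control(void *context)
{
    *(int *)context = 1;
    return MODEL_KEEP_OBJECT;
}
```

If two requests are issued back to back against an idle adapter, the first callback runs inside the allocate call. The second runs only when the first owner calls the free routine, which matches the asynchronous behavior the text warns about.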
Setting Up the DMA Hardware

All packet-based drivers, as well as common-buffer drivers for slave devices, have to program the DMA hardware at the beginning of each data transfer. In terms of the abstract DMA model used by NT, this means loading the Adapter object's mapping registers with physical-page addresses taken from the MDL. This setup work is done by the IoMapTransfer function described in Table 12.6.

Table 12.6  Prototype for IoMapTransfer

PHYSICAL_ADDRESS IoMapTransfer                 IRQL <= DISPATCH_LEVEL

Parameter                                      Description
IN PADAPTER_OBJECT AdapterObject               Allocated Adapter object
IN PMDL Mdl                                    Memory Descriptor List for DMA buffer
IN PVOID MapRegisterBase                       Handle to a group of mapping registers
IN PVOID CurrentVa                             Virtual address of buffer within the MDL
IN OUT PULONG Length                           • IN - count of bytes to be mapped
                                               • OUT - actual count of bytes mapped
IN BOOLEAN WriteToDevice                       • TRUE - send data to device
                                               • FALSE - read data from device
Return value                                   DMA logical address of the mapped region

IoMapTransfer uses the CurrentVa and Length arguments to figure out what physical page addresses to put into the mapping registers. These values must fall somewhere within the range of addresses described by the MDL.

Keep in mind that IoMapTransfer may actually move the contents of a DMA output buffer from one place to another in memory. For example, on an ISA machine, if the pages in the MDL are outside the 16-megabyte DMA limit, calling this function results in data being copied to a buffer in low physical memory. Similarly, if a DMA input buffer is out of range, IoMapTransfer will allocate a buffer in low memory for the transfer. On buses that support 32-bit DMA addresses, no copying or duplicate buffers are required.

Drivers of bus master devices also need to call IoMapTransfer. In this case, however, the function behaves a little differently, since it doesn't know how to program the bus master's control registers.
Instead, IoMapTransfer simply returns address and length values that your driver then loads into the device's registers. For bus masters with built-in scatter/gather support, this same mechanism allows your driver to create a scatter/gather list for the device. Later sections of this chapter will explain how all this works.

Flushing the Adapter Object Cache

At the end of a data transfer, all packet-based DMA drivers and drivers for common-buffer slave devices have to call IoFlushAdapterBuffers (see Table 12.7). For devices using the system DMA controller, this function flushes any hardware caches associated with the Adapter object.

In the case of ISA devices doing packet-based DMA, this call releases any low memory used for auxiliary buffers. For input operations, it also copies data back to the physical pages of the caller's input buffer. Refer back to the section on cache coherency for a discussion of this process.

Table 12.7  Prototype for IoFlushAdapterBuffers

BOOLEAN IoFlushAdapterBuffers                  IRQL <= DISPATCH_LEVEL

Parameter                                      Description
IN PADAPTER_OBJECT AdapterObject               Adapter object used for this I/O
IN PMDL Mdl                                    MDL describing the buffer
IN PVOID MapRegisterBase                       Handle passed to XxAdapterControl
IN PVOID CurrentVa                             Starting VA where I/O operation took place
IN ULONG Length                                Length of buffer
IN BOOLEAN WriteToDevice                       • TRUE - operation was an output
                                               • FALSE - operation was an input
Return value                                   • TRUE - Adapter buffers flushed
                                               • FALSE - an error occurred

12.3 WRITING A PACKET-BASED SLAVE DMA DRIVER

In packet-based slave DMA, the device transfers data to or from the locked-down pages of the caller's buffer using a shared DMA controller on the motherboard. The system is also responsible for providing scatter/gather support.

How Packet-Based Slave DMA Works

Although the specifics will depend on the nature of your device, most packet-based slave DMA drivers conform to a very similar pattern.
The following subsections describe what goes on in the routines making up one of these drivers.

DriverEntry routine  Along with its usual duties, the DriverEntry routine has some extra work to do:

1. It finds the DMA channel used by the device. This can come either from autodetected hardware information in the Registry or it can be hard-coded in the Parameters subkey of the driver's service key.
2. DriverEntry uses its hardware information to build a DEVICE_DESCRIPTION structure and calls HalGetAdapter to locate the Adapter object associated with the device.
3. It saves the address of the Adapter object and the count of mapping registers returned by HalGetAdapter for later use. Usually these are stored in the Device Extension.
4. It sets the DO_DIRECT_IO bit in the Flags field of any Device objects it creates. This causes the I/O Manager to lock user buffers in memory and create MDLs for them.

Start I/O routine  Unlike its counterpart in a programmed I/O driver, this Start I/O routine doesn't actually start the device. Instead, it just requests ownership of the Adapter object and leaves the rest of the work to the Adapter Control callback routine. Specifically, the Start I/O routine does the following:

1. It calls KeFlushIoBuffers to flush data from the CPU's cache out to physical memory.
2. Start I/O decides how many mapping registers to request. Initially, it calculates the number of registers needed to cover the entire user buffer. If this turns out to be more mapping registers than the Adapter object has, it will ask for as many as are available.
3. Based on the number of mapping registers and the size of the user buffer, Start I/O calculates the number of bytes to transfer in the first device operation. This may be the entire buffer or it may be only the first portion of a split transfer.
4. Next, it calls MmGetMdlVirtualAddress to recover the virtual address of the user buffer from the MDL.
It stores this address in the Device Extension. Later parts of the driver will use this address as an offset into the MDL to set up the actual DMA transfer.
5. Start I/O then calls IoAllocateAdapterChannel to request ownership of the Adapter object. If this function succeeds, the rest of the setup work will be done by the AdapterControl routine, so Start I/O simply returns control to the I/O Manager.
6. If IoAllocateAdapterChannel returns an error, Start I/O puts an error code in the IRP's IoStatus block, calls IoCompleteRequest, and starts processing the next IRP.

Adapter Control routine  The I/O Manager calls the Adapter Control routine whenever the necessary Adapter resources have become available. Its job is to initialize the DMA controller for the transfer and start the device itself. This routine does the following:

1. It stores the value of the MapRegisterBase argument in the Device Extension for later use.
2. The Adapter Control routine then calls IoMapTransfer to load the Adapter object's mapping registers. To make this call, it uses the buffer's virtual address and the transfer size calculated by the Start I/O routine.
3. Next, it sends appropriate commands to the device to begin the transfer operation.
4. Finally, the Adapter Control routine returns the value KeepObject to retain ownership of the Adapter object.

At this point, the transfer is actually in progress, and the system can go off and do other things until an interrupt arrives from the device.

Interrupt Service routine  Compared to a programmed I/O driver, the ISR in a packet-based DMA driver is not very complicated. Unless hardware limitations force the driver to split a large transfer request across several device operations, there will be only a single interrupt to service when the whole transfer completes. When this interrupt arrives, the ISR does the following:

1. It issues whatever commands are necessary to acknowledge the device and prevent it from generating any more interrupts.
2. The ISR then stores device status (and any relevant error information) in the Device Extension.
3. It calls IoRequestDpc to continue processing the request in the driver's DpcForIsr routine.
4. The ISR returns a value of TRUE to indicate that it serviced the interrupt.

DpcForIsr routine
The DpcForIsr routine is triggered by the ISR at the end of each partial data transfer operation. Its job is to start the next partial transfer (if there is one) or to complete the current request. Specifically, the DpcForIsr routine in a packet-based DMA driver does the following:
1. It calls IoFlushAdapterBuffers to force any remaining data from the Adapter object's cache.
2. The DpcForIsr routine checks the Device Extension to see if there were any errors during the operation. If there were, it completes the request with an appropriate status code and length, and starts the next request.
3. Otherwise, it decrements the count of bytes remaining by the size of the last transfer. If the whole buffer has been processed, it completes the current request and starts the next.
4. If more data remains, the DpcForIsr routine increments the user-buffer address pointer (stored in the Device Extension) by the size of the last operation. It then calculates the number of bytes to transfer in the next device operation, calls IoMapTransfer to reset the mapping registers, and starts the device.
If the DpcForIsr routine started another partial transfer, the I/O Manager will return control to the driver again when the device generates an interrupt.

Splitting DMA Transfers
When a packet-based DMA driver receives a buffer, it may not be able to transfer all the data in a single device operation. It could be that the Adapter object doesn't have enough mapping registers to handle the whole thing at once, or there could be limitations on the device itself. In any event, the driver has to be prepared to split the request across multiple data-transfer operations.
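To make the mapping-register limit concrete, the following user-mode sketch models how one request gets carved into partial transfers. It is not driver code: MODEL_PAGE_SIZE, span_pages, next_transfer_size, and count_transfers are hypothetical names introduced only for illustration, and the 4 KB page size is an assumption. The span_pages helper mirrors the arithmetic that the ADDRESS_AND_SIZE_TO_SPAN_PAGES macro performs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* Number of pages touched by the byte range [va, va + size).
   Same arithmetic as ADDRESS_AND_SIZE_TO_SPAN_PAGES. */
static uint32_t span_pages(uintptr_t va, uint32_t size)
{
    uint32_t offset = (uint32_t)(va & (MODEL_PAGE_SIZE - 1));
    return (offset + size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Size of the next partial transfer: everything that is left, or as
   much as the available mapping registers can cover, whichever is less. */
static uint32_t next_transfer_size(uintptr_t va,
                                   uint32_t bytes_remaining,
                                   uint32_t map_regs_available)
{
    assert(map_regs_available > 0);
    if (span_pages(va, bytes_remaining) <= map_regs_available)
        return bytes_remaining;
    /* The first mapped page is only partly used when va is unaligned. */
    return map_regs_available * MODEL_PAGE_SIZE
           - (uint32_t)(va & (MODEL_PAGE_SIZE - 1));
}

/* Walk the buffer the way Start I/O and the DpcForIsr routine would,
   and count the device operations needed to move all of it. */
static uint32_t count_transfers(uintptr_t va,
                                uint32_t length,
                                uint32_t map_regs_available)
{
    uint32_t operations = 0;
    while (length > 0) {
        uint32_t chunk = next_transfer_size(va, length, map_regs_available);
        va += chunk;        /* advance the buffer pointer */
        length -= chunk;    /* fewer bytes left to process */
        operations++;
    }
    return operations;
}
```

Under these assumptions, a 20,000-byte buffer that starts 0x100 bytes into a page and is limited to two mapping registers goes out as 7,936 + 8,192 + 3,872 bytes, three device operations in all. Only the first chunk is shortened by the page offset; every later chunk starts on a page boundary.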
There are two solutions to this problem. One is to have the driver reject any requests that it can't handle in a single I/O. With this approach, anyone using the driver is responsible for breaking the request into chunks that are small enough to process. Of course, the driver will have to provide some mechanism for letting its clients know the maximum allowable buffer size (an IOCTL, for example). If you decide to do things this way, you might want to write a higher-level driver that sits on top of the DMA device driver and splits the requests. This has the advantage of shielding application programs from the details of splitting the request.

Another approach is to write a single, monolithic driver that accepts requests of any size and splits them into several I/O operations. This is the strategy used by the sample driver in the next section of this chapter.

To do things this way, you need to maintain a pointer that tracks your position in the user buffer as you transfer successive chunks of data. You also need to maintain a count of the number of bytes left to process, and to calculate the amount of data to transfer in the current I/O operation. The following subsections explain how to initialize and update these data items during an I/O request.

First transfer
The Start I/O routine normally sets things up for the first transfer. Initially, it tries to grab enough mapping registers to do everything in one I/O. If the Adapter object doesn't have enough mapping registers for this to work, Start I/O asks for as many as it can get and sets up the current transfer accordingly. The following code fragment shows how it's done.
    pDE->TransferVA = MmGetMdlVirtualAddress( Irp->MdlAddress );
    pDE->BytesRemaining = MmGetMdlByteCount( Irp->MdlAddress );

    pDE->TransferSize = pDE->BytesRemaining;
    MapRegsNeeded =
        ADDRESS_AND_SIZE_TO_SPAN_PAGES(
            pDE->TransferVA,
            pDE->TransferSize );

    if( MapRegsNeeded > pDE->MapRegsAvailable )
    {
        MapRegsNeeded = pDE->MapRegsAvailable;
        pDE->TransferSize =
            MapRegsNeeded * PAGE_SIZE -
            MmGetMdlByteOffset( Irp->MdlAddress );
    }
    IoAllocateAdapterChannel( ... );

Additional transfers
After each interrupt, the DpcForIsr checks to see if there's any data left to process. If there is, it calculates the number of mapping registers needed to transfer all the remaining bytes in a single I/O operation. If there aren't enough mapping registers available, it sets up another partial transfer. The following code fragment illustrates the procedure.

    pDE->BytesRemaining -= pDE->TransferSize;
    if( pDE->BytesRemaining > 0 )
    {
        pDE->TransferVA += pDE->TransferSize;

        pDE->TransferSize = pDE->BytesRemaining;
        MapRegsNeeded =
            ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                pDE->TransferVA,
                pDE->TransferSize );

        if( MapRegsNeeded > pDE->MapRegsAvailable )
        {
            MapRegsNeeded = pDE->MapRegsAvailable;
            pDE->TransferSize =
                MapRegsNeeded * PAGE_SIZE -
                BYTE_OFFSET( pDE->TransferVA );
        }
        IoMapTransfer( ... );
    }

12.4 CODE EXAMPLE: A PACKET-BASED SLAVE DMA DRIVER

This example is a skeleton of a packet-based driver for a generic slave DMA device. Although it doesn't actually manage a specific kind of hardware, it will help you to understand how these drivers work. You can find the complete code for this example in the CH12\PACKT-S directory on the disk that accompanies this book.

XXDRIVER.H
This excerpt from the driver-specific header file shows the changes that need to be made to support a DMA device.
DEVICE_EXTENSION
The modified Device Extension structure contains some extra items that are necessary for packet-based DMA.

    typedef struct _DEVICE_EXTENSION {
        PDEVICE_OBJECT DeviceObject;    // Back pointer
        ULONG NtDeviceNumber;           // Zero-based device num
        PUCHAR PortBase;                // First control register
        PKINTERRUPT pInterrupt;         // Interrupt object

        PADAPTER_OBJECT AdapterObject;  // ❶
        ULONG MapRegisterCount;
        PVOID MapRegisterBase;          // ❷

        ULONG BytesRequested;           // ❸
        ULONG BytesRemaining;
        ULONG TransferSize;
        PUCHAR TransferVA;

        BOOLEAN WriteToDevice;          // ❹
        UCHAR DeviceStatus;
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

❶ These are returned by HalGetAdapter. They identify the specific Adapter object and its maximum transfer size.
❷ This identifies a particular group of mapping registers that have been assigned to our driver during the course of an I/O request.
❸ These bookkeeping fields keep track of our progress through a split transfer operation.
❹ These items hold the direction of the current data transfer and the status of the DMA device itself.

REGCON.C
This sample uses the version of XxGetHardwareInfo that extracts hard-coded information from the Parameters subkey of the driver's service key. You could just as easily use auto-detected information.

XxGetDmaInfo
This function uses information pulled from the Registry, supplemented with a few assumptions about the hardware, to find the device's Adapter object.

    static NTSTATUS
    XxGetDmaInfo(
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PDEVICE_BLOCK pDevice
        )
    {
        DEVICE_DESCRIPTION Descrip;

        RtlZeroMemory(                              // ❶
            &Descrip,
            sizeof( DEVICE_DESCRIPTION ));

        Descrip.Version = DEVICE_DESCRIPTION_VERSION1;

        Descrip.Master            = FALSE;          // ❷
        Descrip.ScatterGather     = FALSE;
        Descrip.DemandMode        = FALSE;
        Descrip.AutoInitialize    = FALSE;
        Descrip.Dma32BitAddresses = FALSE;

        Descrip.InterfaceType = BusType;
        Descrip.BusNumber     = BusNumber;

        Descrip.DmaChannel    = pDevice->DmaChannel;
        Descrip.MaximumLength = XX_MAX_DMA_LENGTH;
        Descrip.DmaWidth      = Width16Bits;
        Descrip.DmaSpeed      = Compatible;

        pDevice->MapRegisterCount =
            ( XX_MAX_DMA_LENGTH / PAGE_SIZE ) + 2;  // ❸

        pDevice->AdapterObject =
            HalGetAdapter(                          // ❹
                &Descrip,
                &pDevice->MapRegisterCount );

        if( pDevice->AdapterObject == NULL )        // ❺
            return STATUS_INSUFFICIENT_RESOURCES;
        else
            return STATUS_SUCCESS;
    }

❶ It's important to make sure that there aren't any spurious bits set in the DEVICE_DESCRIPTION structure.
❷ From this point on, start to build a description of the DMA device. In this case, it's a slave device that performs 16-bit transfers and needs an ISA-compatible bus cycle speed.
❸ Calculate the number of mapping registers that correspond to the largest possible transfer the device can handle. In the worst case, a buffer could occupy some integral number of pages plus one byte before the first page and one byte after the last page. To account for this possibility, request two additional mapping registers.
❹ Try to find the Adapter object for the device. Later parts of the driver will need a pointer to the object and information about the maximum number of available mapping registers.
❺ If HalGetAdapter fails, it usually means that the DEVICE_DESCRIPTION had some inconsistencies.

TRANSFER.C
This portion of the example performs the actual data transfers. If an I/O request is too large for a single device operation, these routines split the request over several transfers.

XxStartIo
This function gets control at the beginning of each request. It calculates the size of the first data transfer and requests ownership of the Adapter object.

    VOID
    XxStartIo(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PDEVICE_EXTENSION pDE =
            DeviceObject->DeviceExtension;
        PMDL Mdl = Irp->MdlAddress;
        ULONG MapRegsNeeded;
        NTSTATUS status;

        switch( IrpStack->MajorFunction ) {

            case IRP_MJ_WRITE:
            case IRP_MJ_READ:
                pDE->BytesRequested =
                    MmGetMdlByteCount( Mdl );       // ❶
                pDE->BytesRemaining =
                    pDE->BytesRequested;
                pDE->TransferVA =
                    MmGetMdlVirtualAddress( Mdl );
                //
                // Set the direction flag
                //
                if( IrpStack->MajorFunction
                        == IRP_MJ_WRITE )
                    pDE->WriteToDevice = TRUE;
                else
                    pDE->WriteToDevice = FALSE;

                pDE->TransferSize =
                    pDE->BytesRemaining;            // ❷
                MapRegsNeeded =
                    ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                        pDE->TransferVA,
                        pDE->TransferSize );

                if( MapRegsNeeded > pDE->MapRegisterCount )
                {
                    MapRegsNeeded = pDE->MapRegisterCount;
                    pDE->TransferSize =
                        MapRegsNeeded * PAGE_SIZE -
                        MmGetMdlByteOffset( Mdl );
                }

                status = IoAllocateAdapterChannel(  // ❸
                            pDE->AdapterObject,
                            DeviceObject,
                            MapRegsNeeded,
                            XxAdapterControl,
                            pDE );

                if( !NT_SUCCESS( status ))          // ❹
                {
                    Irp->IoStatus.Status = status;
                    Irp->IoStatus.Information = 0;
                    IoCompleteRequest(
                        Irp, IO_NO_INCREMENT );
                    IoStartNextPacket(
                        DeviceObject, FALSE );
                }
                break;
            //
            // Should never get here -- just get rid
            // of the packet...
            //
            default:
                Irp->IoStatus.Status =
                    STATUS_NOT_SUPPORTED;
                Irp->IoStatus.Information = 0;
                IoCompleteRequest(
                    Irp, IO_NO_INCREMENT );
                IoStartNextPacket(
                    DeviceObject, FALSE );
                break;
        } // end switch
    }

❶ Set up various bookkeeping values. The size and address of the user buffer come from the MDL built by the I/O Manager. Keep in mind that you can use the virtual address as an index into the user buffer but you can't actually dereference it.
❷ This section calculates the size of the first partial transfer. First, the driver tries to transfer everything in a single DMA. If there aren't enough mapping registers to handle the whole buffer, the driver asks for as many mapping registers as it can get. Based on this smaller number, it calculates a smaller size for the current transfer.
❸ Ask for the Adapter object using an asynchronous call. The Adapter Control routine will execute when the DMA channel is available. It will start the actual device operation.
❹ If the call to IoAllocateAdapterChannel fails, it usually means there aren't enough mapping registers. In that case, the driver simply fails the IRP and starts the next request.

XxAdapterControl
This function programs the system DMA hardware and starts the device itself. The I/O Manager calls it when the Adapter object belongs to our device and there are enough mapping registers to handle the request.

    static IO_ALLOCATION_ACTION
    XxAdapterControl(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,                                // ❶
        IN PVOID MapRegisterBase,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION pDE = Context;

        pDE->MapRegisterBase = MapRegisterBase;     // ❷

        KeFlushIoBuffers(                           // ❸
            Irp->MdlAddress,
            !pDE->WriteToDevice,
            TRUE );

        IoMapTransfer(                              // ❹
            pDE->AdapterObject,
            Irp->MdlAddress,
            pDE->MapRegisterBase,
            pDE->TransferVA,
            &pDE->TransferSize,
            pDE->WriteToDevice );
        //
        // Start the device
        //
        XxWriteControl(
            pDE,
            XX_CTL_INTENB | XX_CTL_DMA_GO );

        return KeepObject;                          // ❺
    }

❶ The I/O Manager gets this IRP pointer from the CurrentIrp field of the Device object. Normally, this field gets set when your driver uses IoStartPacket or IoStartNextPacket to call a standard Start I/O routine.
If your driver doesn't have a Start I/O routine, it's up to you to make sure that the CurrentIrp field gets set before you call IoAllocateAdapterChannel. Otherwise, you'll need some other way of getting the IRP pointer (and its associated MDL address) into the Adapter Control routine.
❷ Save the value of the MapRegisterBase argument for use by later parts of the driver.
❸ Flush any processor caches that might be holding parts of the DMA buffer. This is a no-op on CPUs that handle their own cache coherency. Notice the perverse way that the direction argument for this function is TRUE for a read. Other I/O Manager functions use TRUE for write requests.
❹ Set up the system DMA channel associated with the device.
❺ Return a value of KeepObject in order to retain ownership of the Adapter object until the whole buffer has been transferred.

XxIsr
This function processes interrupts from the device. Normally, there will be a single interrupt at the end of each partial transfer, or when an error occurs.

    BOOLEAN
    XxIsr(
        IN PKINTERRUPT Interrupt,
        IN PVOID ServiceContext
        )
    {
        PDEVICE_EXTENSION pDE = ServiceContext;
        PDEVICE_OBJECT DeviceObject =
            pDE->DeviceObject;
        UCHAR Status = XxReadStatus( pDE );
        UCHAR Control;
        //
        // See if this device requested an interrupt
        //
        if(( Status & XX_STS_IRQ ) == 0 )
            return FALSE;

        Control = XxReadControl( pDE );             // ❶
        Control &= ~( XX_CTL_INTENB | XX_CTL_DMA_GO );
        XxWriteControl( pDE, Control );

        pDE->DeviceStatus = Status;                 // ❷

        IoRequestDpc(                               // ❸
            DeviceObject,
            DeviceObject->CurrentIrp,
            (PVOID)pDE );

        return TRUE;
    }

❶ When an interrupt arrives, issue some device-specific commands to acknowledge the interrupt and prevent any further ones from coming in.
❷ Save the status of the hardware so that the DpcForIsr routine can figure out whether the transfer was successful.
❸ There's not much more that can be done up at DIRQL.
Issue a DPC request and let the rest of the work happen at DISPATCH_LEVEL IRQL.

XxDpcForIsr
This function executes after the Interrupt Service routine runs. It either sets up the next partial transfer or it completes the current request and starts the next one.

    VOID
    XxDpcForIsr(
        IN PKDPC Dpc,
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION pDE = Context;
        ULONG MapRegsNeeded;
        PMDL Mdl = Irp->MdlAddress;

        IoFlushAdapterBuffers(                      // ❶
            pDE->AdapterObject,
            Mdl,
            pDE->MapRegisterBase,
            pDE->TransferVA,
            pDE->TransferSize,
            pDE->WriteToDevice );

        if( !XX_STS_OK( pDE->DeviceStatus ))        // ❷
        {
            IoFreeAdapterChannel( pDE->AdapterObject );
            Irp->IoStatus.Status =
                STATUS_DEVICE_DATA_ERROR;
            Irp->IoStatus.Information =
                pDE->BytesRequested -
                pDE->BytesRemaining;
            //
            // Complete this request and
            // start the next
            //
            IoCompleteRequest( Irp, IO_NO_INCREMENT );
            IoStartNextPacket( DeviceObject, FALSE );
            return;
        }

        pDE->BytesRemaining -= pDE->TransferSize;
        if( pDE->BytesRemaining > 0 )               // ❸
        {
            //
            // Update the pointer and try to
            // do all of it in one operation
            //
            pDE->TransferVA += pDE->TransferSize;
            pDE->TransferSize = pDE->BytesRemaining;
            MapRegsNeeded =
                ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                    pDE->TransferVA,
                    pDE->TransferSize );
            //
            // If the remainder of the buffer is more
            // than we can handle in one I/O, reduce
            // our expectations.
            //
            if( MapRegsNeeded > pDE->MapRegisterCount )
            {
                MapRegsNeeded = pDE->MapRegisterCount;
                pDE->TransferSize =
                    MapRegsNeeded * PAGE_SIZE -
                    BYTE_OFFSET( pDE->TransferVA );
            }

            IoMapTransfer(                          // ❹
                pDE->AdapterObject,
                Mdl,
                pDE->MapRegisterBase,
                pDE->TransferVA,
                &pDE->TransferSize,
                pDE->WriteToDevice );

            XxWriteControl(
                pDE,
                XX_CTL_INTENB | XX_CTL_DMA_GO );
        }
        else                                        // ❺
        {
            IoFreeAdapterChannel( pDE->AdapterObject );
            Irp->IoStatus.Status = STATUS_SUCCESS;
            Irp->IoStatus.Information =
                pDE->BytesRequested;
            IoCompleteRequest(
                Irp, IO_DISK_INCREMENT );           // ❻
            IoStartNextPacket( DeviceObject, FALSE );
        }
    }

❶ Flush any data out of the Adapter object's cache. On platforms with DMA address limitations (ISA buses, for example), this may result in data being copied from place to place in memory.
❷ Check for device errors. This driver simply fails the IRP if an error occurred. A real driver might retry the operation some number of times before failing it.
❸ At this point, the driver can assume the previous operation was a success. It checks to see if there are any bytes left in the buffer, and if there are, it sets up the next partial transfer. The logic here is similar to what goes on in the Start I/O routine: Try to transfer all the remaining bytes, or as much as the Adapter object can handle, whichever is less.
❹ Set up the system DMA controller for the next partial transfer, then start the device.
❺ This else clause executes when the entire user buffer has been transferred. It simply completes the IRP and starts the next one.
❻ Pick a priority-boost value that's appropriate for your device. Slower devices can probably get by with IO_DISK_INCREMENT, while faster hardware may need a heftier boost.

12.5 WRITING A PACKET-BASED BUS MASTER DMA DRIVER

In packet-based bus master DMA, the device transfers data to or from the locked-down pages of the caller's buffer using DMA hardware that's part of the device itself. Depending on the capabilities of the device, it might be providing its own scatter/gather support as well.
The architecture of a packet-based bus master driver is almost identical to that of a driver for a slave device. The only difference is the way the driver sets up the bus master hardware. The following subsections describe these differences.

Setting Up Bus Master Hardware
A bus master device complicates things because the system doesn't know how to program the device's onboard DMA controller. The most the I/O Manager can do is to give the driver two things: an address in DMA logical space where a contiguous segment of the buffer begins, and a count indicating the number of bytes in that segment. It then becomes the driver's responsibility to load this information into the address and length registers of the device and start the transfer.

The function that performs this little miracle is none other than our old friend, IoMapTransfer. When you pass NULL for its AdapterObject pointer, its return value will be the address in DMA logical space that corresponds to the CurrentVa and Mdl arguments. You put this logical address into the device's address register.

Furthermore, when AdapterObject is NULL, Length becomes both an input and output argument. On input, you ask it to map all the bytes remaining between CurrentVa and the end of the buffer. On output, Length contains the number of contiguous bytes starting at the logical address returned by IoMapTransfer. This number goes into your device's count register. Figure 12.4 shows how this works.

Figure 12.4 For bus masters, IoMapTransfer scans for contiguous buffer segments

Supporting bus master devices requires some changes to the driver's Adapter Control and DpcForIsr routines. The following subsections contain fragments of these routines.
Compare them with the corresponding routines in the packet-based slave DMA driver in the previous section of this chapter.

Adapter Control routine
Being optimistic, the Adapter Control routine asks IoMapTransfer to map the entire buffer at the start of the first transfer. Instead, IoMapTransfer tells the driver how much contiguous memory is actually available in the first segment of the buffer.

    PHYSICAL_ADDRESS DmaAddress;

    pDE->TransferVA =
        MmGetMdlVirtualAddress( Irp->MdlAddress );
    pDE->BytesRemaining =
        MmGetMdlByteCount( Irp->MdlAddress );

    pDE->TransferSize = pDE->BytesRemaining;
    DmaAddress =
        IoMapTransfer(
            NULL,
            Irp->MdlAddress,
            pDE->MapRegisterBase,
            pDE->TransferVA,
            &pDE->TransferSize,
            pDE->WriteRequest );

    XxWriteAddress( pDE, (PUCHAR)DmaAddress.LowPart );
    XxWriteCount( pDE, pDE->TransferSize );
    XxWriteControl( pDE, XX_CTL_DMA_GO );

    return DeallocateObjectKeepRegisters;

DpcForIsr routine
After each partial transfer, the DpcForIsr routine increments the CurrentVa pointer by the previously returned Length value. It then calls IoMapTransfer with this updated pointer and asks to map all the bytes remaining in the buffer. IoMapTransfer returns another logical address and a new Length value indicating the size of the next contiguous buffer segment. This continues until the whole buffer has been processed.
    PHYSICAL_ADDRESS DmaAddress;

    IoFlushAdapterBuffers(
        NULL,
        Irp->MdlAddress,
        pDE->MapRegisterBase,
        pDE->TransferVA,
        pDE->TransferSize,
        pDE->WriteRequest );

    pDE->BytesRemaining -= pDE->TransferSize;
    if( pDE->BytesRemaining > 0 )
    {
        pDE->TransferVA += pDE->TransferSize;

        pDE->TransferSize = pDE->BytesRemaining;
        DmaAddress =
            IoMapTransfer(
                NULL,
                Irp->MdlAddress,
                pDE->MapRegisterBase,
                pDE->TransferVA,
                &pDE->TransferSize,
                pDE->WriteRequest );

        XxWriteAddress( pDE, (PUCHAR)DmaAddress.LowPart );
        XxWriteCount( pDE, pDE->TransferSize );
        XxWriteControl( pDE, XX_CTL_DMA_GO );
    }

Hardware with Scatter/Gather Support
Some bus master devices contain multiple pairs of address and length registers, each one describing a single contiguous buffer segment. This allows the device to perform I/O using buffers that are scattered throughout DMA address space. These multiple address and count registers are often referred to as a scatter/gather list, but you can also think of these bus masters as having their own built-in mapping registers. Figure 12.5 shows how this works.

Figure 12.5 Some bus masters have their own scatter/gather hardware

Before each transfer, the driver loads as many pairs of address and count registers as there are segments in the buffer. When the device is started, it walks through the scatter/gather list entries in sequence, filling or emptying each segment of the buffer and then moving on to the next. When all the list entries have been processed, the device generates an interrupt.
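The idea of one address/length pair per contiguous segment can be modelled in user mode. The sketch below is only an illustration under stated assumptions: ModelBuffer, SgEntry, map_segment, and build_sg_list are hypothetical names, the page size is assumed to be 4 KB, and map_segment merely imitates the "trim Length to the contiguous run" behavior described above for IoMapTransfer. It is not the real kernel routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

typedef struct {
    uint32_t address;   /* logical address where the segment starts */
    uint32_t length;    /* contiguous bytes in the segment */
} SgEntry;

typedef struct {
    const uint32_t *page_frames; /* frame number behind each buffer page */
    uint32_t byte_offset;        /* buffer start within its first page */
    uint32_t byte_count;         /* total buffer length in bytes */
} ModelBuffer;

/* Model of the bus master mapping step: return the logical address of
   byte 'pos' and shrink *length to the physically contiguous run. */
static uint32_t map_segment(const ModelBuffer *buf, uint32_t pos,
                            uint32_t *length)
{
    uint32_t abs_off = buf->byte_offset + pos;
    uint32_t page    = abs_off / MODEL_PAGE_SIZE;
    uint32_t in_page = abs_off % MODEL_PAGE_SIZE;
    uint32_t start   = buf->page_frames[page] * MODEL_PAGE_SIZE + in_page;
    uint32_t run     = MODEL_PAGE_SIZE - in_page;

    assert(pos + *length <= buf->byte_count);

    /* Extend the run while the next page is physically adjacent. */
    while (run < *length &&
           buf->page_frames[page + 1] == buf->page_frames[page] + 1) {
        run += MODEL_PAGE_SIZE;
        page++;
    }
    if (run < *length)
        *length = run;
    return start;
}

/* Fill up to max_entries list entries starting at byte 'pos'.
   Returns the number of entries used; *bytes_mapped gets their total. */
static size_t build_sg_list(const ModelBuffer *buf, uint32_t pos,
                            SgEntry *list, size_t max_entries,
                            uint32_t *bytes_mapped)
{
    size_t   used = 0;
    uint32_t left = buf->byte_count - pos;

    *bytes_mapped = 0;
    while (used < max_entries && left > 0) {
        uint32_t segment = left;          /* ask for everything left */
        list[used].address = map_segment(buf, pos, &segment);
        list[used].length  = segment;     /* what was actually contiguous */
        used++;
        pos           += segment;
        left          -= segment;
        *bytes_mapped += segment;
    }
    return used;
}
```

For a 10,000-byte buffer that starts 0x100 bytes into frame 10 and continues through frames 11 and 7, this model produces two entries: 7,936 bytes at logical address 41,216 (frames 10 and 11 are adjacent) and 2,064 bytes at 28,672. If the hypothetical device had only one entry, the first call would map 7,936 bytes and the remainder would wait for the next operation.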
Building Scatter/Gather Lists with IoMapTransfer
Once again, IoMapTransfer will be used to find contiguous segments of the DMA buffer. In this case, however, the driver will call it several times before each data transfer operation, once for each entry in the hardware scatter/gather list. These fragments of an Adapter Control and a DpcForIsr routine show how it's done.

Adapter Control routine
Before the first transfer operation, the Adapter Control routine loads the hardware scatter/gather list and starts the device. The remainder of the buffer will be handled by the ISR and DpcForIsr routines.

    PHYSICAL_ADDRESS DmaAddress;
    ULONG BytesLeftInBuffer;
    ULONG SegmentSize;
    PUCHAR SegmentVA;

    pDE->TransferVA =
        MmGetMdlVirtualAddress( Irp->MdlAddress );
    pDE->BytesRemaining =
        MmGetMdlByteCount( Irp->MdlAddress );

    pDE->TransferSize = 0;
    BytesLeftInBuffer = pDE->BytesRemaining;
    SegmentVA = pDE->TransferVA;

    XxClearSgList( pDE );

    while( pDE->AvailableSgEntries > 0 &&
           BytesLeftInBuffer > 0 )
    {
        SegmentSize = BytesLeftInBuffer;
        //
        // Map starting at the current segment,
        // not at the start of the transfer
        //
        DmaAddress =
            IoMapTransfer(
                NULL,
                Irp->MdlAddress,
                pDE->MapRegisterBase,
                SegmentVA,
                &SegmentSize,
                pDE->WriteRequest );

        XxAddToSgList(
            pDE,
            DmaAddress.LowPart,
            SegmentSize );

        pDE->TransferSize += SegmentSize;
        SegmentVA += SegmentSize;
        BytesLeftInBuffer -= SegmentSize;
        pDE->AvailableSgEntries--;
    }

    XxWriteControl( pDE, XX_CTL_DMA_GO );

    return DeallocateObjectKeepRegisters;

DpcForIsr routine
After each transfer is finished, the ISR issues a DPC request. The DpcForIsr routine flushes the previous request, and if there are more bytes left to transfer, it rebuilds the scatter/gather list.
    PHYSICAL_ADDRESS DmaAddress;
    ULONG BytesLeftInBuffer;
    ULONG SegmentSize;
    PUCHAR SegmentVA;

    IoFlushAdapterBuffers(
        NULL,
        Irp->MdlAddress,
        pDE->MapRegisterBase,
        pDE->TransferVA,
        pDE->TransferSize,
        pDE->WriteRequest );

    pDE->BytesRemaining -= pDE->TransferSize;
    if( pDE->BytesRemaining > 0 )
    {
        pDE->TransferVA += pDE->TransferSize;

        pDE->TransferSize = 0;
        BytesLeftInBuffer = pDE->BytesRemaining;
        SegmentVA = pDE->TransferVA;

        XxClearSgList( pDE );

        while( pDE->AvailableSgEntries > 0 &&
               BytesLeftInBuffer > 0 )
        {
            SegmentSize = BytesLeftInBuffer;
            //
            // Map starting at the current segment,
            // not at the start of the transfer
            //
            DmaAddress =
                IoMapTransfer(
                    NULL,
                    Irp->MdlAddress,
                    pDE->MapRegisterBase,
                    SegmentVA,
                    &SegmentSize,
                    pDE->WriteRequest );

            XxAddToSgList(
                pDE,
                DmaAddress.LowPart,
                SegmentSize );

            pDE->TransferSize += SegmentSize;
            SegmentVA += SegmentSize;
            BytesLeftInBuffer -= SegmentSize;
            pDE->AvailableSgEntries--;
        } // end while

        XxWriteControl( pDE, XX_CTL_DMA_GO );
    }
    else
    {
        IoFreeMapRegisters( ... );
        IoCompleteRequest( ... );
        IoStartNextPacket( ... );
    }

12.6 WRITING A COMMON BUFFER SLAVE DMA DRIVER

In common buffer slave DMA, the device transfers data to or from a contiguous buffer in nonpaged pool using a system DMA channel. Although originally intended for devices that use the system DMA controller's autoinitialize mode, common buffers can also improve throughput for some types of ISA-based slave devices.

Allocating a Common Buffer
Memory for a common buffer has to be physically contiguous and visible in the DMA logical space of a specific device. To guarantee that both these conditions are met, you use the HalAllocateCommonBuffer function described in Table 12.8 to allocate memory for the buffer.
Notice the CacheEnabled argument to this function. It's usually a good idea to request non-cached memory for the common buffer since it eliminates the need to call KeFlushIoBuffers. On some platforms, this can improve the performance of both your driver and the system.

Table 12.8 Prototype for HalAllocateCommonBuffer

    PVOID HalAllocateCommonBuffer           IRQL == PASSIVE_LEVEL

    Parameter                               Description
    IN PADAPTER_OBJECT AdapterObject        Adapter object associated with DMA device
    IN ULONG Length                         Requested size of buffer in bytes
    OUT PPHYSICAL_ADDRESS LogicalAddress    Address of the common buffer in the
                                            DMA controller's logical space
    IN BOOLEAN CacheEnabled                 TRUE - memory is cacheable by the CPU
                                            FALSE - memory is not cached
    Return value                            Non-NULL - system VA of common buffer
                                            NULL - error

In the case of common buffer slave DMA, you'll need to build an MDL for the buffer.⁴ This MDL is a required argument for IoMapTransfer and IoFlushAdapterBuffers. To set up the MDL, call IoAllocateMdl followed by MmBuildMdlForNonPagedPool. When your driver unloads, call IoFreeMdl to release the memory used for the MDL.

Using Common Buffer Slave DMA to Maintain Throughput
Common buffer slave DMA is useful if a driver can't afford to have IoMapTransfer copy a DMA buffer from one place to another during a data transfer. On ISA buses, this kind of copying is always a possibility with packet-based DMA. Since common buffers are guaranteed to be accessible by their associated DMA devices, there's never any danger of IoMapTransfer moving data from one place to another.

For example, drivers of some ISA-based tape drives need to maintain very high throughput if they want to keep the tape streaming. They won't be able to do this if a buffer copy happens during a call to IoMapTransfer. To prevent this, the driver uses a ring of common buffers for the actual DMA operation.
Other, less time-critical portions of the driver move data between these common buffers and the actual user buffers.

To see how this might work, let's consider the operation of a driver for a hypothetical ISA output device. To maintain a high DMA data rate, it uses a series of common buffers that are shared between the driver's Dispatch and DpcForIsr routines. The Dispatch routine copies user-output data into an available common buffer and attaches the buffer to a queue of pending DMA requests. Once a DMA is in progress, the DpcForIsr removes buffers from the queue and processes them as fast as it can. Figure 12.6 shows the organization of this driver, and the subsections below describe various driver routines.

⁴ The MDL is unnecessary if you plan to use the common buffer for bus master DMA.

Figure 12.6 Using common buffers allows some ISA drivers to maintain higher throughput
(Diagram boxes: Dispatch, Adapter Control, Interrupt Service, and DpcForIsr routines, linked by the free list and work queue of common buffers.)

DriverEntry routine
As always, the DriverEntry routine has to find and allocate the driver's hardware. Along with its usual responsibilities, DriverEntry also does the following:
1. When it creates its Device object, it sets the DO_BUFFERED_IO bit in the Flags field. Although the underlying common buffers will be processed using DMA, the user data will initially be copied into system-space buffers.
2. DriverEntry initializes two queues in the Device Extension. One holds a list of free common buffers. The other is for work requests in progress.
3. Next, it creates separate spin locks to guard each queue.
The spin lock for the work list also protects a flag in the Device Extension called DmalnProgress. 4. Then, DriverEntry calls HalGetAdapter to find the Adapter object associated with its device. It uses the count of mapping registers returned by this func tion to determine the size of its common buffers. 5. It allocates some number of common buffers and adds them to the free list in the Device Extension. (As an implementation detail, some of the space in each common buffer is used for a linked-list pointer, a pointer to the IRP associated with this request, and a pointer to the MDL for the common buffer.) For each 294 Chapter 12 DMA Drivers buffer, it also calls IoAllocateMdl and MmBuildMdlForNonPagedPool to create an MDL. Finally, DriverEntry initializes a Semaphore object and sets its initial count to the number of common buffers it has just created. 6. Dispatch routine The Dispatch routine of this driver works differently than the ones you've seen so far. Since the driver has no Start 1/0 routine, the Dis patch routine is actually responsible for queuing or starting each request. This is what the Dispatch routine does to process an output request: 1. I t calls KeWaitForSingleObj ect to wait for the Semaphore object associated with the driver 's list of free buffers. The thread issuing the call will freeze until there's at least one buffer in the queue. 5 2. The Dispatch routine removes an available common buffer from the free list and (since we're only considering outputs here) uses RtlMoveMemory to fill it with data from the user 's buffer. 3. It prevents the 1/0 Manager from completing the request by calling IoMark IrpPending. 4. Next, it acquires the spin lock associated with the queue of active requests. As a side-effect, acquiring the spin lock raises IRQL up to DISPATCH_LEVEL. After it owns the spin lock, the Dispatch routine adds the new request to the list of buffers to be output. 5. 
Still holding the spin lock, the Dispatch routine checks an internal Dmaln Progress flag to see if other parts of the driver are already doing an output. If the flag is TRUE, it simply releases the spin lock. If the flag is FALSE, the Dis patch routine sets it to TRUE and calls IoAllocateAdapterChannel to start the device. It then releases the spin lock. 6. Finally, it returns a value of STATUS_PENDING. At this point, the work request for this buffer has been either started or queued. The next phase of the transfer will take place after the device generates an interrupt. Adapter Control routine If the device was idle, the Adapter Control is called to get it going. This is what it does: 1. 5 It removes the first request from the work queue and saves its address in the Device Extension as the current request. Chapter 14 will explain how to use Semaphore objects. If you're familiar with Win32 programming, you already have a good idea of how they work. Sec. 12.6 Writing a Common Buffer Slave DMA Driver 295 2. Next, the Adapter Control routine saves the value of the MapRegisterBase argument in the Device Extension for later use. 3. It then calls IoMapTransfer to load the system DMA controller with the address of the current request's common buffer. 4. Finally, the Adapter Control routine starts the device and returns a value of KeepObj ect. Once the driver owns the Adapter object, it will hold on to it as long as there are work requests in the queue. Interrupt Service routine As with packet-based DMA, the ISR in a com mon-buffer driver for a slave device just saves hardware status in the Device Exten sion. It then calls IoRequestDpc to continue processing at DISPATCH_LEVEL IRQL. DpcForlsr routine In this driver, the DpcForlsr routine sets up each addi tional work request after the first. Here's how it works: 1. I t calls IoFlushAdapterBuffers t o flush any data from the system DMA con troller 's hardware cache. 2. 
The DpcForlsr routine tries to remove the next 1/0 request from the work queue. If there is another request, the driver makes it the new .current request, maps its buffer with IoMapTransfer, and starts the device. On the other hand, if the work queue is empty, the driver calls IoFreeAdapterChan nel to release the Adapter object and clears the DmalnProgress flag in the Device Extension. 3. Next, it puts appropriate status information in the IRP for the just-completed request and calls IoCompleteRequest to give it back to the 1/0 Manager. 4. Finally, the DpcForlsr routine puts the just-completed common buffer back in the free list and calls KeRleaseSemaphore to increment the count of available buffers. Each completed DMA operation causes another interrupt that brings the driver back through the DpcForlsr routine. This loop continues until all the requests in the work queue have been processed. Unload routine When a common buffer bus master driver is unloaded, it first needs to stop the device from trying to use the buffer. Once the device is silent, the Unload routine calls HalFreeCommonBuffer to release the memory associated with the ring of buffers. It also calls IoFreeMdl to release memory used for each buffer 's MDL. Chapter 12 296 OMA Drivers 1 2 . 7 WRITING A COMMON-BU FFER Bus MASTER OMA DRIVER In common-buffer bus master OMA, the device transfers data t o o r from a contig uous nonpaged pool buffer using a OMA controller that's part of the device itself. Frequently, this kind of hardware will treat the common buffer as a mailbox for exchanging control and status messages with the driver. How Common-Buffer Bus Master OMA Works The exact operation of a common-buffer bus master driver will depend on the whims of the hardware designer. The description that follows is based on a typical architecture. It assumes the device uses one mailbox for commands and another to return status information. Figure 12.7 illustrates this arrangement. 
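As a rough picture of this two-mailbox arrangement, the common buffer can be treated as a C structure that both sides read and write. The sketch below runs in user mode, with an ordinary function standing in for the device. The structure layout, field names, and command codes are all hypothetical, since a real device dictates its own mailbox format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox formats; real hardware defines its own. */
typedef struct {
    uint32_t Opcode;     /* command for the device */
    uint32_t Argument;   /* command-specific parameter */
} XX_COMMAND_MAILBOX;

typedef struct {
    uint32_t Status;     /* completion code written by the device */
    uint32_t Residual;   /* e.g. bytes the device did not transfer */
} XX_STATUS_MAILBOX;

/* Both mailboxes live in one common buffer. */
typedef struct {
    XX_COMMAND_MAILBOX Command;
    XX_STATUS_MAILBOX  Status;
} XX_COMMON_BUFFER;

#define XX_CMD_RESET           1u
#define XX_STATUS_OK           0u
#define XX_STATUS_BAD_COMMAND  1u

/* Stand-in for the hardware: read the command mailbox and
 * answer in the status mailbox. */
void XxFakeDeviceProcess(volatile XX_COMMON_BUFFER *buf)
{
    buf->Status.Status = (buf->Command.Opcode == XX_CMD_RESET)
                             ? XX_STATUS_OK
                             : XX_STATUS_BAD_COMMAND;
    buf->Status.Residual = 0;
}

/* Driver side: build a command, "notify" the device, read the status.
 * A real driver would set a bit in a device control register here and
 * pick up the status later, from its Interrupt Service routine. */
uint32_t XxSendCommand(volatile XX_COMMON_BUFFER *buf,
                       uint32_t opcode, uint32_t argument)
{
    assert(buf != NULL);
    buf->Command.Opcode = opcode;
    buf->Command.Argument = argument;
    XxFakeDeviceProcess(buf);
    return buf->Status.Status;
}
```

In this simulation the call completes synchronously. In the driver described next, the command is posted by the Start I/O routine and the status mailbox is read only after the device interrupts.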
DriverEntry routine

The DriverEntry routine does the following to set up a common buffer:

1. It calls HalGetAdapter to find an Adapter object for the device.
2. DriverEntry next calls HalAllocateCommonBuffer to get a block of contiguous, nonpaged memory that both the driver and the device can access. It usually simplifies things if the common buffer is allocated from non-cached memory.
3. It stores the virtual address of the common buffer in the Device Extension for later use.
4. DriverEntry also makes the device itself aware of the common buffer. This usually means storing the logical address and size of the buffer in a pair of device control registers.

Figure 12.7 The driver and the device exchange messages using a common buffer

Start I/O routine

When it wants to send a command to the device, the Start I/O routine does the following:

1. It builds a command structure in the common buffer using the virtual address stored in the Device Extension.
2. If DriverEntry specified TRUE for the CacheEnabled parameter of HalAllocateCommonBuffer, Start I/O needs to call KeFlushIoBuffers to force data from the CPU's cache out to physical memory.
3. Finally, Start I/O sets a bit in a device control register to notify the device that there is a command waiting for it.

In response to the notification bit being set, the device begins processing the command in the common buffer.

Interrupt Service routine

When the device has finished processing the command in the common buffer, it puts a message in the status mailbox and generates an interrupt. In response to this interrupt, the driver's Interrupt Service routine does the following:

1. It copies the contents of the status mailbox into various fields of the Device Extension.
2. If necessary, the ISR sets another bit in the device control register to acknowledge that it has read the status message.
3. It calls IoRequestDpc to continue processing the request at a lower IRQL.

Unload routine

When a common-buffer bus master driver is unloaded, it first needs to stop the device from trying to use the buffer. Once the device is silent, the Unload routine calls HalFreeCommonBuffer to release the memory associated with the buffer.

12.8 SUMMARY

Without a doubt, drivers for DMA devices are more complicated than drivers for programmed I/O hardware. In return for this added complexity, the system achieves greater throughput by overlapping CPU activity with data transfers. The I/O Manager tries to simplify things by providing a generic framework in which to perform DMA. This chapter has presented the details of NT's abstract DMA model and shown how to perform various styles of DMA.

So far, we've been assuming that things have gone well during device operations. But suppose something terrible happens? Something so terrible, in fact, that you think the system administrator should hear about it. In the next chapter, you'll see how to add error-logging capabilities to a driver.

CHAPTER 13
Logging Device Errors

System administrators are a nervous and paranoid lot. Like small mammals in the Jurassic period, they scurry about - imagining the worst and waiting for it to happen. Adding to their anxiety may seem cruel, but if you're writing a commercial-quality driver, you really should tell someone when serious hardware and software errors occur. This chapter explains how to generate these notifications using NT's event-logging mechanism.

13.1 EVENT LOGGING IN WINDOWS NT

Built into Windows NT is a mechanism that allows software components to keep a record of interesting events. This event-logging capability can help you monitor the behavior of a piece of software that's under development. It can also give support personnel crucial information once the software is out in the field.
The remainder of this section presents guidelines for deciding what information to log and then describes how event logging works.

Deciding What to Log

For the most part, error logging is something that's best done by lowest-level device drivers. Higher-level drivers usually don't have anything to say that's worth putting in the log file, except possibly startup and shutdown notifications. There are several kinds of events that a device driver might log:

• Hard device errors that result in an IRP failing
• Soft errors that are corrected after some number of retries
• Device timeouts
• Driver startup and shutdown

Along with various pieces of standard information, you're allowed to add your own data to the messages in the event log. Useful items to include are

• The contents of any device control or status registers that might indicate the cause of the problem
• Any fields from the Device or Controller Extension that indicate the state of the driver when the error occurred
• Any additional information about the request that would help with the diagnosis. For example, logging the transfer size might lead you to discover that large requests always fail.

Two points are worth mentioning. First, don't get carried away with the idea of adding driver-specific data to event-log messages. The amount of space available for private data in a kernel-mode event-log message is rather limited. So, stick to the essentials and only add things to your log packets that will be of true diagnostic value.

Second, hardware that's on its last legs can generate a lot of error messages as it fails and can easily overwhelm the log file. It's important to have some strategy for dealing with this situation. For example, you might keep track of how many messages a device is generating, and if it exceeds some threshold, reduce the level of detail reported by your driver.
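One way to picture the threshold strategy is a small per-device counter that lowers the logging detail once a device becomes noisy. The following is only a user-mode sketch: the structure, the level names, and the function names are hypothetical, and a real driver would keep this state in its Device Extension and protect it with whatever synchronization its logging path requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical detail levels, lowest to highest. */
enum { XX_LOG_NONE = 0, XX_LOG_ERRORS_ONLY = 1, XX_LOG_FULL = 2 };

typedef struct {
    uint32_t MessageCount;  /* messages logged so far for this device */
    uint32_t Threshold;     /* once reached, cut back on detail */
    int      DetailLevel;   /* current XX_LOG_xxx level */
} XX_LOG_THROTTLE;

void XxInitThrottle(XX_LOG_THROTTLE *t, uint32_t threshold)
{
    assert(t != NULL);
    t->MessageCount = 0;
    t->Threshold = threshold;
    t->DetailLevel = XX_LOG_FULL;
}

/* requiredLevel is the lowest detail level at which this message is
 * still worth reporting. Returns 1 if the caller should log it. */
int XxShouldLog(XX_LOG_THROTTLE *t, int requiredLevel)
{
    assert(t != NULL);
    if (requiredLevel > t->DetailLevel)
        return 0;                    /* too detailed for the current level */

    t->MessageCount++;
    if (t->MessageCount >= t->Threshold &&
        t->DetailLevel > XX_LOG_ERRORS_ONLY)
        t->DetailLevel = XX_LOG_ERRORS_ONLY;   /* noisy device: less detail */

    return 1;
}
```

With a threshold of three, the first three full-detail messages are logged, after which only messages that qualify at the errors-only level get through. Resetting the counter periodically, or dropping to XX_LOG_NONE after a second threshold, would be natural variations.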
How Event Logging Works

The developers of Windows NT had several goals for the event-logging architecture. The first was to provide application programs, drivers, and the operating system with a unified framework for recording information. This framework includes a simple yet flexible standard for the binary format of event-log entries.

Another goal was to give system administrators an easy way to view these messages. As part of this goal, viewing utilities must be able to display event messages in the currently selected national language. Under the American version of NT, the message text should appear in English, while the French version of NT should display French text.

Figure 13.1 shows how it all works. The following describes what happens when a kernel-mode driver decides to log an error. The process is similar for a user-mode Win32 application, although the specific API calls are different. [1]

[1] The data-collection DLL in Chapter 18 contains an example of using the Win32 event-logging API.

Figure 13.1 NT event-logging components
(Diagram labels: Driver, Logging Thread, Event Log File, Message File.)

1. All event messages take the form of packets in Windows NT. When a kernel-mode driver wants to log an event, it first calls the I/O Manager to allocate a message packet from nonpaged pool.
2. The driver fills in this packet with various pieces of descriptive information. One of the key items is a 32-bit message code number that identifies the text to be displayed for this packet. Once the packet's ready, the driver gives it back to the I/O Manager.
3. The I/O Manager takes the message packet and sends it to the system event-logging thread. This thread accumulates packets and periodically writes them to the proper event-log file. [2]
4. The Event Viewer utility reads binary packets from the log files. To translate a packet's 32-bit message code into text, the Viewer goes to the Registry. There it finds the path names of one or more message files associated with the packet. These message files contain the actual message text (possibly in multiple languages) which the Viewer displays.

[2] If the system crashes before a group of log packets have been written out, you can still see them by using WINDBG's !errlog command. See Chapter 17 for more details.

13.2 WORKING WITH MESSAGES

As you've just seen, your driver doesn't include the actual text for its messages in an event-log entry. Instead, it identifies messages using code numbers. The text associated with these code numbers takes the form of a message resource stored somewhere on disk. This section describes how these message codes work and explains how to generate your own message resources.

How Message Codes Work

The code number identifying a specific message is a 32-bit value consisting of several fields. Figure 13.2 shows the layout of a message code. Table 13.1 gives a little more detail about the meaning of each of these fields. Although you'll probably never need to decode these fields on sight, it's always nice to be able to impress your friends.

Figure 13.2 Layout of a message-code number
(Bits 31-30: Severity. Bit 29: Customer. Bits 28-16: Facility. Bits 15-0: Error Code.)

Table 13.1 The meaning of message-code fields

Message-code fields
Field       Bits     Description
Code        0-15     Code number identifying the error
Facility    16-28    Software component generating the message
Customer    29       If set, this is a customer-generated (non-Microsoft) message
Severity    30-31    One of the following:
                     • 0 - success
                     • 1 - information
                     • 2 - warning
                     • 3 - error

The I/O Manager provides a number of standard messages that your driver can use. The header file, NTIOLOGC.H, defines symbolic names for these message codes, all of which begin with IO_ERR_ (for example, IO_ERR_TIMEOUT or IO_ERR_NOT_READY). Browse through this header file for a complete list of standard messages.

If you want to use these standard messages, you have to add your driver to the list of event-logging system components in the Registry. You also have to identify the file where the text for these messages is located (%SystemRoot%\SYSTEM32\IOLOGMSG.DLL). The procedure for doing this is described a little later in this chapter.

If the standard messages don't meet all your needs, you can supplement them with driver-defined messages. To do this, you need to follow these steps:

1. Write a message definition file that associates your message codes with specific text strings.
2. Compile this file using the message compiler (MC) utility.
3. Incorporate the message resources generated by MC into your driver.
4. Register your driver as an event-logging system component and identify the driver executable as the file containing the text for these private messages.

Writing Message Definition Files

To use the MC utility, you first need to write a definition file describing all your messages. This definition file is divided into two major sections.

Header section

Keywords in the header define names for values that will be used in the actual message definitions. Table 13.2 contains the keywords that you can use in the header section of a message definition file.

Message section

This portion of the message definition file contains the actual text of the messages. Each message begins with the keywords listed in Table 13.3.
Table 13.2 Keywords used in the header section of a message definition file

Header section keywords
Keyword                                      Description
MessageIdTypedef = DataType                  Typecast applied to all message codes
SeverityNames = ( name=number[:name] )       Up to four severity values used in the Message section
FacilityNames = ( name=number[:name] )       Facility names used in the Message section
LanguageNames = ( name=number:filename )     Language names used in the Message section

Table 13.3 Keywords used in the message section of a message definition file

Message section keywords
Keyword                             Description
MessageId = [number | +number]      16-bit value assigned to this message*
Severity = SeverityName             Severity level of this message
Facility = FacilityName             Facility generating the message
SymbolicName = SymbolName           Name of message code in generated header file
Language = LanguageName             Language ID associated with the message
*Required.

The message text itself begins after the last keyword. The text of a message can occupy several lines. You end a message with a line containing only a single period character.

The message compiler ignores any whitespace or carriage returns in a message definition. If you want explicit control over the appearance of a message when the Event Viewer displays it, you can include various escape sequences (listed in Table 13.4) in the body of the message.

The %1-%99 escape codes represent Unicode strings (embedded in the event log packet) that will be inserted in the message when the Event Viewer displays it. If a kernel-mode driver associates an event packet with a Device object, %1 will automatically contain the NT name of the device; if the driver associates the packet with the Driver object, %1 will be blank. In either case, your first real insertion string will be %2, your second one will be %3, and so on. The code example appearing later in this chapter will explain how to add insertion strings to an event packet.
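To make the numbering of insertion strings concrete, here is a small user-mode sketch of the substitution step. It is not the Event Viewer's actual code: the function name is hypothetical, it handles only the %1-%99 placeholders, and it uses ordinary char strings for brevity where the real packet carries Unicode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand %1..%99 in text, taking strings[0] for %1, strings[1] for %2,
 * and so on. Placeholders with no matching string are copied through
 * unchanged. Returns the number of characters written (not counting
 * the terminating NUL), or -1 if out is too small. */
int XxExpandInsertions(const char *text,
                       const char *const *strings, int count,
                       char *out, size_t outSize)
{
    size_t used = 0;

    assert(text != NULL && out != NULL && outSize > 0);

    while (*text != '\0') {
        const char *piece = text;   /* default: copy one literal character */
        size_t pieceLen = 1;
        size_t consumed = 1;

        if (text[0] == '%' && text[1] >= '1' && text[1] <= '9') {
            int index = text[1] - '0';
            size_t width = 2;

            if (text[2] >= '0' && text[2] <= '9') {   /* two-digit index */
                index = index * 10 + (text[2] - '0');
                width = 3;
            }
            if (index <= count) {
                piece = strings[index - 1];
                pieceLen = strlen(piece);
                consumed = width;
            }
        }

        if (used + pieceLen >= outSize)   /* keep room for the NUL */
            return -1;
        memcpy(out + used, piece, pieceLen);
        used += pieceLen;
        text += consumed;
    }

    out[used] = '\0';
    return (int)used;
}
```

For a packet associated with a Device object, the caller would place the device name first in the array (the %1 slot) and the driver's own strings after it, so that "Merging DNA from %2 and %3 in %1." picks up the driver's first two strings and then the device name.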
Table 13.4 The effects of various escape codes on displayed message text

Message formatting escape codes
IF you use...    THEN it's replaced with...
%b               A single space character
%t               A single tab character
%r%n             Carriage return and linefeed
%1-%99           An insertion string [3]

[3] Remember that these insertion strings will always be displayed as raw text. There's no way for the Event Viewer to translate them into the local language.

A Small Example: XXMSG.MC

Here is the message definition file for the example that goes with this chapter. You can find it in the CH13\DRIVER directory on the floppy that accompanies this book.

Header section

The first part of the message definition file contains header information.

    MessageIdTypedef = NTSTATUS                                 (1)

    SeverityNames = (
        Success       = 0x0:STATUS_SEVERITY_SUCCESS
        Informational = 0x1:STATUS_SEVERITY_INFORMATIONAL
        Warning       = 0x2:STATUS_SEVERITY_WARNING
        Error         = 0x3:STATUS_SEVERITY_ERROR
        )

    FacilityNames = (                                           (2)
        System     = 0x0
        RpcRuntime = 0x2:FACILITY_RPC_RUNTIME
        RpcStubs   = 0x3:FACILITY_RPC_STUBS
        Io         = 0x4:FACILITY_IO_ERROR_CODE
        XXDriver   = 0x7:FACILITY_XX_ERROR_CODE
        )

(1) The definitions of any symbolic names generated by MC will include a typecast to NTSTATUS.
(2) You can find codes for Microsoft-defined facilities in the NTSTATUS.H header file. For your own facility number, pick something that isn't in use.

Message section

Here's the message section of the file. It defines the actual text to be associated with each message code number.

    MessageId=0x0001                                            (1)
    Facility=XXDriver
    Severity=Informational
    SymbolicName=XX_MSG_LOGGING_ENABLED                         (2)
    Language=English
    Event logging enabled for XxDriver.                         (3)
    .

    MessageId=+1                                                (4)
    Facility=XXDriver
    Severity=Informational
    SymbolicName=XX_MSG_DRIVER_STARTING
    Language=English
    XxDriver has successfully initialized.
    .

    MessageId=+1
    Facility=XxDriver
    Severity=Informational
    SymbolicName=XX_MSG_DRIVER_STOPPING
    Language=English
    XxDriver has unloaded.
    .

    MessageId=+1
    Facility=XxDriver
    Severity=Informational
    SymbolicName=XX_MSG_OPENING_HANDLE
    Language=English
    Opening handle to %1.
    .

    MessageId=+1
    Facility=XxDriver
    Severity=Informational
    SymbolicName=XX_MSG_CLOSING_HANDLE
    Language=English
    Closing handle to %1.
    .

    MessageId=+1
    Facility=XxDriver
    Severity=Warning
    SymbolicName=XX_MSG_MULTIPLE_OCCUPANCY
    Language=English
    %1 contains multiple life-forms.
    Data specifies number of occupants.
    .

    MessageId=+1
    Facility=XxDriver
    Severity=Informational
    SymbolicName=XX_MSG_MERGING_DNA
    Language=English
    Merging DNA from %2 and %3 in %1.                           (5)
    .

(1) The MessageId keyword is required at the start of a message. This form of the keyword assigns an absolute number to the 16-bit Code field of the generated message code.
(2) This keyword tells the message compiler to define a symbol called XX_MSG_LOGGING_ENABLED in the header file it generates.
(3) The actual message text begins after the last keyword. A line containing only a single period character ends the text.
(4) This form of the MessageId keyword assigns a Code value to the message that's one greater than the previous message.
(5) This message contains placeholders for insertion strings. %1 will become the device name; %2 and %3 will be replaced with whatever insertion strings are embedded in the event-log packet.

Compiling a Message Definition File

Once you've written the message definition file, you use the message compiler (MC) to process it. MC is another quirky little command-line utility that comes with the Win32 SDK and Visual C++. [4] Table 13.5 shows the syntax of the MC command.
Table 13.5 Syntax of the MC command

MC [-?cdosvw] [-herx argument] [-uU] filename.MC
Parameter       Description
-c              Set Customer bit in all message codes.
-d              Use decimal definitions of facility and severity codes in header.
-o              Generate OLE2 header file.
-s              Insert symbolic name as first line of each message.
-v              Generate verbose output.
-w              Give warning if message text is not OS/2 compatible.
-h pathname     Location of generated header file. (Default is current directory.)
-e extension    One- to three-character extension for header file.
-r pathname     Location of generated RC and binary message files.
-x pathname     Location of generated debug file.
-u              Input file is Unicode.
-U              Message text in binary output file should be Unicode.
filename        Name of the message definition file to compile.

[4] Documentation for MC is rather sparse. One of the best sources is the MC.HLP help file that comes with the compiler.

When you run the message compiler, it automatically generates the following files:

• filename.RC - This is a resource control script that identifies all the languages used in the message definition file. For each language, it also identifies the binary message file containing the message text.
• filename.H - This header file contains #define statements for all the message code numbers in the MC input file. The compiler also puts a lot of inline commentary in the header, including the text of the corresponding message.
• MSGnnnnn.BIN - This binary file holds all the text for messages in one language. MC will generate separate files (beginning with MSG00001.BIN) for each national language used in the message definition file.

Although you can specify the paths where the header and RC files will go, the actual names of these files will always be the same as the name of the message definition file. You have no control over the names of the binary message files.
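Each #define in the generated header is a single 32-bit value assembled from the fields in Table 13.1. As a sanity check on that layout, the sketch below packs and unpacks the fields with shifts and masks. The helper names are hypothetical and are not part of any NT header; the field positions and widths are taken directly from Table 13.1.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions from Table 13.1. */
#define XX_SEVERITY_SHIFT  30   /* bits 30-31 */
#define XX_CUSTOMER_SHIFT  29   /* bit 29     */
#define XX_FACILITY_SHIFT  16   /* bits 16-28 */
#define XX_FACILITY_MASK   0x1FFFu
#define XX_CODE_MASK       0xFFFFu

/* Pack the four fields into one 32-bit message code. */
uint32_t XxMakeMessageCode(uint32_t severity, uint32_t customer,
                           uint32_t facility, uint32_t code)
{
    assert(severity <= 3 && customer <= 1);
    assert(facility <= XX_FACILITY_MASK && code <= XX_CODE_MASK);

    return (severity << XX_SEVERITY_SHIFT) |
           (customer << XX_CUSTOMER_SHIFT) |
           (facility << XX_FACILITY_SHIFT) |
           code;
}

/* Unpack individual fields from a message code. */
uint32_t XxSeverityOf(uint32_t msg) { return msg >> XX_SEVERITY_SHIFT; }
uint32_t XxCustomerOf(uint32_t msg) { return (msg >> XX_CUSTOMER_SHIFT) & 1u; }
uint32_t XxFacilityOf(uint32_t msg) { return (msg >> XX_FACILITY_SHIFT) & XX_FACILITY_MASK; }
uint32_t XxCodeOf(uint32_t msg)     { return msg & XX_CODE_MASK; }
```

Taking XX_MSG_LOGGING_ENABLED from XXMSG.MC as an example (severity Informational = 1, facility XXDriver = 0x7, code 0x0001), this layout gives 0x40070001. If the file were compiled with the -c switch, the Customer bit would also be set, giving 0x60070001.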
Adding Message Resources to a Driver

After you run the message compiler, you still need to do something with the binary message resources it generates. You could put them in a separate DLL, the way the I/O Manager does with IOLOGMSG.DLL, but for most drivers it makes more sense to add the message resources to the driver executable itself. That way, you won't have to worry about keeping track of multiple files when you send your driver out into the world.

The BUILD utility (described in Chapter 16) understands how to process resource control scripts. So, all you have to do is to add the name of the script to the list of source files making up the driver. BUILD will then run the resource compiler and link the resulting resources into your driver.

For example, if you've just compiled a message definition file called XXMSG.MC, you'll have a resource script called XXMSG.RC. The following excerpt from a BUILD SOURCES file shows how you would add this resource script to your driver.

    SOURCES= init.c     \
             unload.c   \
             dispatch.c \
             eventlog.c \
             xxmsg.rc

There's one glitch in all this. BUILD doesn't know what to do with message definition files, so you can't just add XXMSG.MC itself to the list of driver sources. This means you need to run the message compiler by hand any time you modify your message definition file. Fortunately, there's a way to extend the capabilities of BUILD so that it will automatically maintain message resources for you. Chapter 17 explains how to perform this little bit of magic.

Registering a Driver as an Event Source

So now you have a header file containing message codes, and a bunch of message resources stuffed into your driver. But there's still a question: Just how does the system know that it should look in your driver executable when it wants to translate a particular message code into text? Once again, we're saved by the Registry.
Any software component that plans to generate log entries must identify itself to the system as an event source. Further, every event source has to specify the location of the message files needed to translate any message codes appearing in its log entries. Figure 13.3 shows the Registry entries that identify a driver as an event source. [5] To register your driver as an event source, make the following changes to the Registry:

1. Under Services\EventLog\System, add the name of your driver's executable (without the extension) to the REG_MULTI_SZ value called Sources.
2. Under Services\EventLog\System, add a key with the same name as your driver.
3. In this key, create a value called EventMessageFile. This is a REG_EXPAND_SZ containing the full path names of any message files used by your driver. If your driver uses multiple files, separate them with a semicolon. If you're using standard messages defined in NTIOLOGC.H, you'll also need to add IOLOGMSG.DLL to this list.
4. In this same key, create a value called TypesSupported. This is a REG_DWORD bit mask identifying the types of messages generated by your driver. A value of 0x7 gets everything.

Figure 13.3 Registering a kernel-mode driver as an event source

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services
        EventLog
            System
                Sources: REG_MULTI_SZ: XXDRIVER YYDRIVER ...
                XXDRIVER
                    EventMessageFile: REG_EXPAND_SZ: %SystemRoot%\System32\IOLOGMSG.DLL;%SystemRoot%\System32\Drivers\XXDRIVER.SYS
                    TypesSupported: REG_DWORD: 0x7

[5] These entries apply only to kernel-mode event sources. Chapter 18 shows how to register a user-mode component as an event source.

13.3 GENERATING LOG ENTRIES

The final piece of the puzzle is to add code to your driver that actually generates event-log entries.
This is a relatively straightforward process that involves allocating an empty packet, filling it in, and sending it off to the system logging thread. The rest of this section describes the major steps along the way.

Preparing a Driver for Error Logging

If you plan to support error logging, there are a few small changes you'll want to make to your driver. In particular, it's a good idea to add the following items to your Device Extension:

• A sequence number field that your driver increments for each IRP processed by the device. This value should remain constant for the life of the request.
• A retry count for the current request, if you retry device operations when an error occurs. Set it to zero each time you start processing an IRP and increment it for each repeated attempt.
• Copies of any device registers that would help diagnose the error. If your ISR decides to log an error, it should take a snapshot of the hardware registers for the logging routine.

You should also adopt some convention that assigns a unique identifying number to each stage of processing an IRP. This number becomes part of the error log information, and it will help you figure out where in your driver the error occurred. This fragment of a driver's header file shows how you might do this:

    #define XX_ERRORLOG_STARTIO               1
    #define XX_ERRORLOG_CONTROLLER_CONTROL    2
    #define XX_ERRORLOG_ADAPTER_CONTROL       3
    #define XX_ERRORLOG_ISR                   4
    #define XX_ERRORLOG_DPC_FOR_ISR           5

Finally, you might want to define a value in the Parameters subkey of your driver's Registry service key to control driver error logging. This could either be a Boolean that simply enables and disables logging, or it could be an actual value that determines the level of logging detail. The code example appearing later in this chapter uses a value called EventLogLevel to control the quantity of event messages it generates.
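The bookkeeping suggested above amounts to a few extra fields and two small update rules. The following user-mode sketch shows one possible shape for it; the structure and function names are hypothetical, and in a real driver these fields would simply be members of the Device Extension, updated by the Start I/O and retry paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logging-related fields for a Device Extension. */
typedef struct {
    uint32_t SequenceNumber;   /* one value per IRP, constant for its life */
    uint8_t  RetryCount;       /* retries of the current request so far */
    uint32_t StatusSnapshot;   /* copy of a device status register */
    uint32_t EventLogLevel;    /* would come from the Parameters subkey */
} XX_LOG_STATE;

/* Call once when the driver starts processing a new IRP. */
void XxBeginRequest(XX_LOG_STATE *s)
{
    assert(s != NULL);
    s->SequenceNumber++;
    s->RetryCount = 0;
}

/* Call each time the same request is attempted again, passing the
 * register contents that prompted the retry. */
void XxNoteRetry(XX_LOG_STATE *s, uint32_t statusRegister)
{
    assert(s != NULL);
    s->RetryCount++;
    s->StatusSnapshot = statusRegister;
}
```

When an error is eventually logged, SequenceNumber and RetryCount are ready to be copied into the packet, and StatusSnapshot is a natural candidate for the driver-specific data.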
Allocating an Error-Log Packet

When your driver uncovers some terrible sin that needs reporting, it has to prepare an error-log packet. There are three sections to an error-log packet:

• A standard header
• An array of driver-defined ULONGs (referred to as dump data)
• One or more NULL-terminated Unicode insertion strings [6]

Both the dump-data and insertion strings are variable in length and are optional. Figure 13.4 shows the structure of an error-log packet.

Figure 13.4 Layout of an error-log packet
(Diagram labels: ErrorCode, DumpDataSize, NumberOfStrings, StringOffset, DumpData[ ], "First Unicode insertion string \0", "Second Unicode insertion string \0".)

[6] Don't confuse these with the counted UNICODE_STRING data structures used in other parts of NT.

Before you can allocate an error-log packet, you need to determine how big the packet should be. Remember to leave room for any dump-data and insertion strings. You can calculate the size of the packet using a variation on the following piece of code:

    PacketSize = sizeof( IO_ERROR_LOG_PACKET ) +
                 ( sizeof( ULONG ) * ( DumpDataCount - 1 )) +
                 sizeof( InsertionStrings );

Here, DumpDataCount is the number of driver-specific ULONG data items, and InsertionStrings are any driver-supplied UNICODE strings to be inserted in the error message. The requested size of the packet cannot exceed ERROR_LOG_MAXIMUM_SIZE.

Use the IoAllocateErrorLogEntry function (described in Table 13.6) to allocate the packet. As you can see from the table, you're allowed to associate the packet either with the Driver object or with a particular Device object. Your choice will determine how the Event Viewer utility displays your message. Overall initialization and shutdown are good choices for Driver-level messages, while problems involving specific IRPs or pieces of hardware ought to be associated with a Device object.
Low memory conditions could make it impossible for the system to get a packet for you, so don't assume that your allocation request will always succeed. One easy way to handle these situations is just to forget about logging the error, with the hope that it will happen again when the system isn't so pressed for memory.

Finally, notice that you have to be at or below DISPATCH_LEVEL IRQL when you allocate error-log packets. This means that if your ISR decides to log an error (a common occurrence), you'll need a CustomDpc routine to do the actual work.

Table 13.6 Use this function to allocate an error-log packet

    PVOID IoAllocateErrorLogEntry           IRQL <= DISPATCH_LEVEL

    Parameter              Description
    IN PVOID IoObject      • Address of a Device object generating an error
                           • Address of a Driver object reporting an error
    IN UCHAR EntrySize     Size in bytes of packet to be allocated
    Return value           • PIO_ERROR_LOG_PACKET - success
                           • NULL - allocation failure

Logging the Error

Once you've allocated the packet, you need to fill in all the relevant fields. In addition to the fields listed in Table 13.7, you should also copy any driver-specific data and strings into the packet.

Table 13.7 Layout of an IO_ERROR_LOG_PACKET

    IO_ERROR_LOG_PACKET, *PIO_ERROR_LOG_PACKET

    Field                          Description
    UCHAR MajorFunctionCode        IRP_MJ_XXX code of current IRP
    UCHAR RetryCount               Zero-based count of consecutive retries
    USHORT DumpDataSize            Bytes of driver-specific data
    USHORT NumberOfStrings         Number of insertion strings
    USHORT StringOffset            Byte offset of first insertion string
    USHORT EventCategory           Event category from driver's message file
    NTSTATUS ErrorCode             IO_ERR_XXX (see NTIOLOGC.H)
    ULONG UniqueErrorValue         Indicates where in the driver the error occurred
    NTSTATUS FinalStatus           STATUS_XXX value from the IRP
    ULONG SequenceNumber           Driver-assigned number for current IRP
    ULONG IoControlCode            IOCTL_XXX if this is a DeviceIoControl request
    LARGE_INTEGER DeviceOffset     Device offset where error occurred, or zero
    ULONG DumpData[1]              Driver-specific data if DumpDataSize is nonzero

When the packet is ready, call IoWriteErrorLogEntry to send it to the system logging thread. The packet doesn't belong to you once you call this function, so don't touch it again. As with packet allocation, you can only write an error-log packet if you're at or below DISPATCH_LEVEL IRQL.

13.4 CODE EXAMPLE: AN ERROR-LOGGING ROUTINE

This example illustrates how to log event messages from a kernel-mode driver. The complete example includes a driver that uses these event-logging functions, as well as a test program that exercises the driver. You can find all of this in the CH13 directory on the disk that accompanies this book.

EVENTLOG.C

This module provides a general event-logging mechanism that any driver can use. In addition to the functions listed below, EVENTLOG.C also defines a global variable called LogLevel that determines logging verbosity. Although globals are generally a bad idea in drivers, this one's okay because its value doesn't change once driver initialization is done.
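The role of a verbosity setting like LogLevel can be sketched in a few lines of portable C. This is a minimal model under assumptions: only the names LogLevel, LOG_LEVEL_NONE, and LOG_LEVEL_DEBUG come from the example driver; the numeric values, the two intermediate levels, and both helper functions are hypothetical.

```c
#include <assert.h>

/* Assumed verbosity levels. The numeric values and the ERROR and
   NORMAL levels are illustrative, not taken from the driver source. */
enum {
    LOG_LEVEL_NONE   = 0,
    LOG_LEVEL_ERROR  = 1,  /* assumed */
    LOG_LEVEL_NORMAL = 2,  /* assumed */
    LOG_LEVEL_DEBUG  = 3
};

/* Written once during initialization and only read afterwards. */
static unsigned long LogLevel = LOG_LEVEL_NONE;

void XxSetLogLevel(unsigned long level)
{
    assert(level <= LOG_LEVEL_DEBUG);
    LogLevel = level;
}

/* Nonzero when a message at messageLevel should reach the log. */
int XxShouldLog(unsigned long messageLevel)
{
    if (LogLevel == LOG_LEVEL_NONE)
        return 0;                    /* logging is switched off entirely */
    return messageLevel <= LogLevel; /* drop messages that are too chatty */
}
```

Because the variable is written only while the driver initializes, later readers at any IRQL see a stable value without needing a lock, which is the property the text relies on.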
XxInitializeEventLog

This function is called from DriverEntry to set up the driver's event-logging mechanism. Its main purpose is to retrieve a value called EventLogLevel from the driver's Registry service key and store it in the LogLevel variable.

    VOID
    XxInitializeEventLog(
        IN PDRIVER_OBJECT DriverObject
        )
    {
        RTL_QUERY_REGISTRY_TABLE QueryTable[2];          // (1)

        //
        // Fabricate a Registry query.
        //
        RtlZeroMemory( QueryTable, sizeof( QueryTable )); // (2)

        QueryTable[0].Name = L"EventLogLevel";
        QueryTable[0].Flags = RTL_QUERY_REGISTRY_DIRECT;
        QueryTable[0].EntryContext = &LogLevel;

        //
        // Look for the EventLogLevel value
        // in the Registry.
        //
        if( !NT_SUCCESS(
                RtlQueryRegistryValues(                  // (3)
                    RTL_REGISTRY_SERVICES,
                    XX_DRIVER_NAME L"\\Parameters",
                    QueryTable,
                    NULL, NULL )))
        {
            LogLevel = DEFAULT_LOG_LEVEL;
        }

        //
        // Log a message saying that logging
        // is enabled.
        //
        XxReportEvent(                                   // (4)
            LOG_LEVEL_DEBUG,
            XX_MSG_LOGGING_ENABLED,
            XX_ERRORLOG_INIT,
            (PVOID)DriverObject,
            NULL,       // No IRP
            NULL, 0,    // No dump data
            NULL, 0 );  // No strings
    }

(1) This function uses our old friend RtlQueryRegistryValues to set the event-logging verbosity level. We need a query table with one entry for the value and another (NULL) entry for a terminator.

(2) It's a good idea to clear the table before using it. Otherwise, you can get some strange error messages resulting from random bit settings.

(3) Query the Registry. RTL_REGISTRY_SERVICES says that the path name (xxdriver\Parameters) should be treated as a subkey of the ...\Services key.

(4) If verbose logging is enabled, log a message indicating that logging is enabled.

XxReportEvent

This function does the actual grunt work of allocating an error-log packet, filling it in, and sending it off to the system logging thread.
You can only call this function at or below DISPATCH_LEVEL IRQL.

    BOOLEAN
    XxReportEvent(
        IN ULONG MessageLevel,
        IN NTSTATUS ErrorCode,
        IN ULONG UniqueErrorValue,
        IN PVOID IoObject,
        IN PIRP Irp,
        IN ULONG DumpData[],
        IN ULONG DumpDataCount,
        IN PWSTR Strings[],
        IN ULONG StringCount
        )
    {
        PIO_ERROR_LOG_PACKET Packet;
        PDEVICE_EXTENSION pDE;
        PIO_STACK_LOCATION IrpStack;
        PUCHAR pInsertionString;
        UCHAR PacketSize;
        UCHAR StringSize[ XX_MAX_INSERTION_STRINGS ];
        ULONG i;

        if( ( LogLevel == LOG_LEVEL_NONE ) ||              // (1)
            ( MessageLevel > LogLevel ))
            return TRUE;

        PacketSize = sizeof( IO_ERROR_LOG_PACKET );        // (2)

        if( DumpDataCount > 0 )                            // (3)
            PacketSize +=
                (UCHAR)( sizeof( ULONG ) * ( DumpDataCount - 1 ));

        if( StringCount > 0 )                              // (4)
        {
            if( StringCount > XX_MAX_INSERTION_STRINGS )
                StringCount = XX_MAX_INSERTION_STRINGS;

            for( i = 0; i < StringCount; i++ )             // (5)
            {
                StringSize[i] =
                    (UCHAR)XxGetStringSize( Strings[i] );
                PacketSize += StringSize[i];
            }
        }

        //
        // Try to allocate the packet
        //
        Packet = IoAllocateErrorLogEntry( IoObject, PacketSize );
        if( Packet == NULL ) return FALSE;

        //
        // Fill in standard parts of the packet
        //
        Packet->ErrorCode = ErrorCode;
        Packet->UniqueErrorValue = UniqueErrorValue;

        if( Irp != NULL )                                  // (6)
        {
            IrpStack = IoGetCurrentIrpStackLocation( Irp );
            pDE = (PDEVICE_EXTENSION)
                    ((PDEVICE_OBJECT)IoObject)->DeviceExtension;

            Packet->MajorFunctionCode = IrpStack->MajorFunction;
            Packet->RetryCount = pDE->IrpRetryCount;
            Packet->FinalStatus = Irp->IoStatus.Status;
            Packet->SequenceNumber = pDE->IrpSequenceNumber;

            if( IrpStack->MajorFunction ==
                    IRP_MJ_DEVICE_CONTROL ||
                IrpStack->MajorFunction ==
                    IRP_MJ_INTERNAL_DEVICE_CONTROL )
            {
                Packet->IoControlCode =
                    IrpStack->Parameters.
                        DeviceIoControl.IoControlCode;
            }
            else Packet->IoControlCode = 0;
        }
        else // No IRP
        {
            Packet->MajorFunctionCode = 0;
            Packet->RetryCount = 0;
            Packet->FinalStatus = 0;
            Packet->SequenceNumber = 0;
            Packet->IoControlCode = 0;
        }

        //
        // Add the dump data
        //
        if( DumpDataCount > 0 )
        {
            Packet->DumpDataSize =
                (USHORT)( sizeof( ULONG ) * DumpDataCount );

            for( i = 0; i < DumpDataCount; i++ )
                Packet->DumpData[i] = DumpData[i];
        }
        else Packet->DumpDataSize = 0;

        //
        // Add the insertion strings
        //
        Packet->NumberOfStrings = (USHORT)StringCount;

        if( StringCount > 0 )
        {
            //
            // With no dump data, the strings start right
            // after the header; don't compute (0 - 1).
            //
            Packet->StringOffset =
                (USHORT)sizeof( IO_ERROR_LOG_PACKET );
            if( DumpDataCount > 0 )
                Packet->StringOffset +=
                    (USHORT)(( DumpDataCount - 1 ) * sizeof( ULONG ));

            pInsertionString =                             // (7)
                (PUCHAR)Packet + Packet->StringOffset;

            for( i = 0; i < StringCount; i++ )             // (8)
            {
                //
                // Add each new string to the end
                // of the existing stuff
                //
                RtlCopyBytes(
                    pInsertionString,
                    Strings[i],
                    StringSize[i] );

                pInsertionString += StringSize[i];
            }
        }

        //
        // Log the message
        //
        IoWriteErrorLogEntry( Packet );
        return TRUE;
    }

(1) If we're not logging or the message is out of range, return without doing anything.

(2) Begin calculating the packet size. Start with the minimum required number of bytes.

(3) Add in any dump data. Remember that the standard error-log packet already has one slot in its DumpData array.

(4) Determine the total space needed for any insertion strings. If the caller has sent too many strings, process only as many as this function can handle.

(5) Build a table containing the length of each individual string using XxGetStringSize, a local helper function. This table will be used again later to copy the insertion strings into the error-log packet. Also add the size of each string to the total packet requirement.

(6) If there's an IRP, then the IoObject argument must point to a Device object. In that case, use the IRP and the Device Extension to fill in additional parts of the error-log packet. If there's no IRP, then set the additional fields to 0.

(7) Insertion strings always go just after the DumpData array in the error-log packet. After setting the offset of the first string, calculate the address where the first string should go in the packet.

(8) This loop simply adds each new string to the end of the packet using RtlCopyBytes. It takes advantage of the table of string sizes generated earlier in the routine.

XxGetStringSize

This little helper function calculates the amount of space needed by a NULL-terminated Unicode string. The size includes space for the (2-byte) UNICODE_NULL at the end of the string.

    ULONG
    XxGetStringSize(
        IN PWSTR String
        )
    {
        UNICODE_STRING TempString;

        //
        // Use an RTL routine to get the length
        //
        RtlInitUnicodeString( &TempString, String );

        //
        // Size is actually two greater because
        // of the UNICODE_NULL at the end.
        //
        return ( TempString.Length + sizeof( WCHAR ));
    }

13.5 SUMMARY

This chapter has presented NT's event-logging mechanisms. As you can see, it isn't terribly difficult for drivers to leave a little trail when devices start generating errors. These audit trails can be a useful diagnostic aid to system administrators.

This chapter also finishes our look at basic kernel-mode device driver techniques. In the next chapter, you'll see the first of several variations on the driver architecture we've developed so far.

CHAPTER 14

System Threads

Some types of legacy hardware can have a bad effect on system performance if you manage them using the driver model we've developed so far. System threads give you a way to keep these devices out of everyone's way.

14.1 SYSTEM THREADS

A system thread is a thread that runs exclusively in kernel mode. It has no user-mode context and can't access user address space. Just like a Win32 thread, a system thread executes at or below APC_LEVEL IRQL and it competes for use of the CPU based on its scheduling priority.

When to Use Threads

There are several reasons why you might use threads in a driver. The first possibility is that you're working with a piece of hardware that has the following characteristics:

• The device is slow and infrequently accessed.
• It takes a long time (more than 50 microseconds) for the device to make a state transition, and the driver has to wait for the transition to occur.
• The device needs to make several state transitions in order to complete a single operation.
• The device doesn't generate interrupts for some kinds of interesting state transitions, and the driver has to poll the device for extended periods.

You could, of course, manage a device like this using a CustomTimerDpc routine. Depending on the amount of device activity, this approach could clog up the DPC queues and slow down other drivers. Threads, on the other hand, run at PASSIVE_LEVEL and won't interfere with DPC routines.

Fortunately, there aren't too many categories of hardware that behave this rudely, and most of them are legacy devices that date from the early days of the personal computer. The most notable examples are floppy disks and QIC tapes attached to floppy controllers.

The second possibility is that you've got a device which takes a very long time to initialize itself, and which your driver has to monitor throughout the initialization. Certain kinds of optical jukeboxes behave this way. So might a computer-controlled pottery kiln. This kind of behavior is a problem because the Service Control Manager gives a driver only about 30 seconds to execute its DriverEntry routine.
If DriverEntry hasn't returned by then, the Service Control Manager forcibly unloads the driver. The only solution is to put the long-running device start-up code in a separate thread, and return immediately from the DriverEntry routine with STATUS_SUCCESS. [1]

[1] Of course, you'll have to figure out what to do if the device fails to initialize successfully. Once DriverEntry has returned, there's no way for a driver to unload itself, so any cleanup will have to be done by the thread itself. This includes things like deleting Device objects, freeing resources, etc. If the driver finds it has no initialized devices, it might also make itself entirely paged in order to reduce its impact on the system.

Finally, you might need to perform some kind of operation that will only work at PASSIVE_LEVEL IRQL. For example, if your driver had to access the Registry on a regular basis, or write something to a file, a thread might be the answer.

Creating and Terminating System Threads

Call PsCreateSystemThread, described in Table 14.1, when you want to create a system thread. Since you can only call this function at PASSIVE_LEVEL IRQL, you will usually create driver threads in your DriverEntry routine.

When your driver unloads, it must kill any system threads it may have created. The only way to do this is to have the thread itself call PsTerminateSystemThread with an appropriate exit status. Unlike Win32 user-mode threads, there is no way to forcibly terminate a system thread. This means you need to set up some kind of signaling mechanism to let a thread know that it should exit. As you'll see later in this chapter, Event objects provide a convenient way to do this.

Table 14.1 Prototype for function that creates a system thread

    NTSTATUS PsCreateSystemThread              IRQL == PASSIVE_LEVEL

    Parameter                          Description
    OUT PHANDLE ThreadHandle           Handle of new thread
    IN ULONG DesiredAccess             0 for a driver-created thread
    IN POBJECT_ATTRIBUTES Attrib       NULL for a driver-created thread
    IN HANDLE ProcessHandle            NULL for a driver-created thread
    OUT PCLIENT_ID ClientId            NULL for a driver-created thread
    IN PKSTART_ROUTINE StartAddr       Entry point for thread
    IN PVOID Context                   Argument passed to thread routine
    Return value                       • STATUS_SUCCESS - thread was created
                                       • STATUS_XXX - an error code

Managing Thread Priority

In general, system threads running in a driver should set their thread priority to the low end of the real-time range. The following code fragment shows how to do this.

    VOID
    ThreadStartRoutine( PVOID Context )
    {
        KeSetPriorityThread(
            KeGetCurrentThread(),
            LOW_REALTIME_PRIORITY );
    }

Remember that real-time threads have no quantum timeout. This means that they only give up the CPU when they voluntarily go into a wait state, or when they're preempted by a thread of higher priority. So don't design any drivers that depend on automatic round-robin thread scheduling.

System Worker Threads

For occasional, quick operations at PASSIVE_LEVEL IRQL, creating and terminating a separate thread may not be very efficient. The alternative is to have one of NT's system worker threads perform the task. These threads use a callback mechanism to do work on behalf of any driver.

It's not difficult to use system worker threads. First, allocate storage for a WORK_QUEUE_ITEM structure. The system will use this block to keep track of your work request. Next, call ExInitializeWorkItem to associate a callback function in your driver with the WORK_QUEUE_ITEM.

Later, when you want a system thread to execute your callback function, call ExQueueWorkItem to insert the request block into one of the system work queues.
You can choose to have your request executed either by a worker thread with a real-time priority, or by one with a variable priority.

Keep in mind that all drivers are sharing the same group of system worker threads. Requests that take a very long time to complete may delay the execution of requests from other drivers. If you need to perform tasks involving lengthy operations or long time delays, use a private driver thread rather than the system work queues.

14.2 THREAD SYNCHRONIZATION

Like user-mode threads in a Win32 application, system threads may need to suspend their execution until some other condition has been satisfied. This section describes the basic synchronization techniques available to system threads.

Time Synchronization

The simplest kind of synchronization involves stopping a thread's execution until a specific time interval elapses. Although you can use the Timer objects described later in this chapter, the Kernel provides a convenience function (described in Table 14.2) that's easier to use.

Table 14.2 Prototype for the KeDelayExecutionThread function

    NTSTATUS KeDelayExecutionThread            IRQL == PASSIVE_LEVEL

    Parameter                          Description
    IN KPROCESSOR_MODE WaitMode        KernelMode for drivers
    IN BOOLEAN Alertable               FALSE for drivers
    IN PLARGE_INTEGER Interval         Absolute or relative due time
    Return value                       STATUS_SUCCESS - wait completed

General Synchronization

System threads can synchronize their activities in more general ways by waiting for things called dispatcher objects. Thread synchronization depends on the fact that a dispatcher object is always in either the Signaled or Nonsignaled state. When a thread asks to wait for a Nonsignaled dispatcher object, the thread's execution stops until the object becomes Signaled. (Waiting for a dispatcher object that's already Signaled is a no-op.) There are two different functions you can use to wait for a dispatcher object.
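Both the Interval argument of KeDelayExecutionThread and the Timeout arguments of the wait functions take a 64-bit time value. NT expresses these in units of 100 nanoseconds, and a negative value means "relative to now" while a positive value is an absolute system time. The following user-mode sketch shows the conversion arithmetic only; the helper names are hypothetical, and in a driver the result would be stored in the QuadPart member of a LARGE_INTEGER.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 100-nanosecond units in one millisecond. */
#define XX_100NS_PER_MS 10000LL

/* Convert a delay in milliseconds into a relative NT time value.
   Relative times are negative by convention. */
int64_t XxRelativeIntervalMs(uint32_t milliseconds)
{
    int64_t interval = -((int64_t)milliseconds * XX_100NS_PER_MS);

    assert(interval <= 0);  /* a relative value is never positive */
    return interval;
}

/* Nonzero if the value would be interpreted as relative to "now". */
int XxIsRelativeInterval(int64_t interval)
{
    return interval < 0;
}
```

Forgetting the sign is a common slip: a small positive value is an absolute time far in the past, so the delay or wait expires immediately.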
KeWaitForSingleObject

This function, described in Table 14.3, puts the calling thread into a wait state until a specific dispatcher object is set to the Signaled state. Optionally, you can also specify a timeout value that will cause the thread to awaken even if the dispatcher object is Nonsignaled. If you don't pass a timeout argument, KeWaitForSingleObject will wait indefinitely.

Table 14.3 Prototype for the single-object wait function

    NTSTATUS KeWaitForSingleObject

    Parameter                          Description
    IN PVOID Object                    Pointer to an initialized dispatcher object
    IN KWAIT_REASON Reason             Executive for drivers
    IN KPROCESSOR_MODE WaitMode        KernelMode for drivers
    IN BOOLEAN Alertable               FALSE for drivers
    IN PLARGE_INTEGER Timeout          • Absolute or relative timeout value
                                       • NULL for an infinite wait
    Return value                       • STATUS_SUCCESS
                                       • STATUS_ALERTED
                                       • STATUS_TIMEOUT

KeWaitForMultipleObjects

This function, described in Table 14.4, puts the calling thread into a wait state until any or all of a group of dispatcher objects are set to the Signaled state. Again, you have the option of specifying a timeout value for the wait.

Table 14.4 Prototype for the multiple-object wait function

    NTSTATUS KeWaitForMultipleObjects

    Parameter                          Description
    IN ULONG Count                     Number of objects to wait for
    IN PVOID Object[]                  Array of pointers to dispatcher objects
    IN WAIT_TYPE WaitType              • WaitAll - wait until all are Signaled
                                       • WaitAny - wait until one is Signaled
    IN KWAIT_REASON Reason             Executive for drivers
    IN KPROCESSOR_MODE WaitMode        KernelMode for drivers
    IN BOOLEAN Alertable               FALSE for drivers
    IN PLARGE_INTEGER Timeout          • Absolute or relative timeout value
                                       • NULL for an infinite wait
    IN PKWAIT_BLOCK WaitBlocks[]       Array of wait blocks for this operation
    Return value                       • STATUS_SUCCESS
                                       • STATUS_ALERTED
                                       • STATUS_TIMEOUT

Be aware that there are limits on how many objects your thread can wait for at one time.
Each thread has a built-in array of Wait blocks that it uses for concurrent wait operations. The thread can use this array to wait for up to THREAD_WAIT_OBJECTS objects. If you need to wait for more than this number of objects, you must supply your own array of Wait blocks when you call KeWaitForMultipleObjects. In either case, the number of objects you wait for cannot exceed MAXIMUM_WAIT_OBJECTS.

You can call the KeWaitForXxx functions either from PASSIVE_LEVEL or DISPATCH_LEVEL IRQL. If you call them from DISPATCH_LEVEL IRQL, however, you must specify a zero timeout value. [2] This can be useful when your real goal is to cause some side effect produced by the KeWaitForXxx functions.

[2] Keep in mind that specifying a timeout value of 0 is not the same as passing a NULL pointer for the Timeout argument.

14.3 USING DISPATCHER OBJECTS

Except for Thread objects, it's up to you to allocate storage for any dispatcher objects you plan to use. The objects must be permanently resident, so you have to put them in the Device or Controller Extension, or in some other piece of nonpaged memory.

You also have to initialize the dispatcher object once with the proper KeInitializeXxx function before you use it. Since you can only call these functions at PASSIVE_LEVEL IRQL, you should usually initialize all dispatcher objects in your DriverEntry routine. The following subsections describe each category of dispatcher object in greater detail.

Event Objects

An Event is a dispatcher object that must be explicitly set to the Signaled or Nonsignaled state. Events are useful for notifying one or more threads of some specific occurrence. You can see this behavior in Figure 14.1, where thread A awakens B, C, and D by setting an Event object.

[Figure 14.1: How Event objects synchronize system threads. Thread A sets an Event, releasing the waiting threads B, C, and D.]

These objects actually come in two different flavors: Notification Events and Synchronization Events. You choose the type when you initialize the object. These two types of Events exhibit different behavior when they're put into the Signaled state. As long as a Notification Event remains Signaled, all threads waiting for the Event come out of their wait state. You have to explicitly reset a Notification Event to put it into the Nonsignaled state. This is the same behavior exhibited by Win32 manual-reset Events.

When you put a Synchronization Event into the Signaled state, it remains there only long enough for one call to KeWaitForXxx to be satisfied. It then resets itself to the Nonsignaled state automatically. In other words, the gate stays open until one thread passes through, and then it slams shut. This is equivalent to a Win32 auto-reset Event.

To use an Event, you need to declare some nonpaged storage for an item of type KEVENT, and then call the functions listed in Table 14.5. Notice that you can use either of two functions to put an Event object into the Nonsignaled state. The difference is that KeResetEvent returns the state of the Event before it became Nonsignaled, and KeClearEvent does not. KeClearEvent is somewhat faster, so you should use it unless you specifically need to know the previous state of the Event.

Table 14.5 Use these functions to work with Event objects

    How to use Event objects

    IF you want to ...         THEN call ...                     IRQL
    Create an Event            KeInitializeEvent                 PASSIVE_LEVEL
    Create a named Event       IoCreateSynchronizationEvent      PASSIVE_LEVEL
                               IoCreateNotificationEvent
    Modify Event state         KeSetEvent                        <= DISPATCH_LEVEL
                               KeClearEvent
                               KeResetEvent
    Wait for an Event          KeWaitForSingleObject             PASSIVE_LEVEL
                               KeWaitForMultipleObjects
    Interrogate an Event       KeReadStateEvent                  <= DISPATCH_LEVEL

The driver that we'll be examining later in this chapter provides a good example of using Events. It has a worker thread that needs to pause until an interrupt arrives, so the thread waits for an Event object.
The driver's DpcForIsr routine sets the Event into the Signaled state, waking up the worker thread.

Sharing Events between Drivers

Normally, it's rather awkward for two unrelated drivers to share an Event object created with KeInitializeEvent. These Event objects are referenced only by pointer, and without some kind of explicit agreement (an internal IOCTL, for example), there's no simple way to pass a pointer from one driver to another. Even then, there's the issue of making sure that the driver creating the Event object doesn't unload while some other driver is using the object. Overall, it's a very messy problem.

The IoCreateNotificationEvent and IoCreateSynchronizationEvent functions make things easier by allowing you to create named Event objects. As long as two drivers use the same Event name, they will be able to get pointers to the same Event object.

Both IoCreateXxxEvent functions behave very much like the Win32 CreateEvent system service. In other words, the first driver to make a call with a specific Event name causes the Event object to be created. Each additional call using the same name simply returns a handle to the existing Event object.

There are two things to notice when you use the IoCreateXxxEvent functions. First, you don't supply any memory to hold the KEVENT object itself. Storage for these objects is provided by the system. When everyone using the Event releases it, the system deletes the object automatically.

The second little twist is that IoCreateXxxEvent calls return a handle to the Event object. If you want to use the Event object in calls to the KeXxx functions listed in Table 14.5, you need a pointer to the object rather than a handle. To convert a handle into an object pointer, do the following:

1. First, call ObReferenceObjectByHandle. This function gives you a pointer to the Event object itself and increments the object's pointer reference count.
2. If you don't need the handle for anything (and you probably don't), call ZwClose to release it. This reduces the object's handle reference count. (Don't do this until after you increment the pointer count; otherwise the object may be deleted.)
3. When you have finished using the Event object (normally in the driver's Unload routine), call ObDereferenceObject to decrement the Event object's pointer reference count and possibly delete the Event object.

You can call these functions only from PASSIVE_LEVEL IRQL, which limits the places in your driver where you can use them.

Mutex Objects

A Mutex (short for mutual exclusion) is a dispatcher object that can be owned by only one thread at a time. The object becomes Nonsignaled when a thread owns it and Signaled when it's available. Mutexes provide an easy mechanism for coordinating mutually exclusive access to some shared resource, usually memory.

Figure 14.2 shows threads B, C, and D waiting for a Mutex owned by thread A. When A releases the Mutex, one of the waiting threads will wake up and become its new owner.

[Figure 14.2: How Mutex objects synchronize system threads. Thread A releases a Mutex for which threads B, C, and D are waiting.]

To use a Mutex, you need to declare some nonpaged storage for an item of type KMUTEX, and then call the functions listed in Table 14.6. Be aware that when you initialize a Mutex, it is always set to the Signaled state.

Table 14.6 Use these functions to work with Mutex objects

    How to use Mutex objects

    IF you want to ...           THEN call ...                   IRQL
    Create a Mutex               KeInitializeMutex               PASSIVE_LEVEL
    Request Mutex ownership      KeWaitForSingleObject           PASSIVE_LEVEL
                                 KeWaitForMultipleObjects
    Give up Mutex ownership      KeReleaseMutex                  PASSIVE_LEVEL
    Interrogate Mutex            KeReadStateMutex                <= DISPATCH_LEVEL

If a thread calls KeWaitForXxx on a Mutex it already owns, the thread never waits.
Instead, the Mutex increments an internal counter to record the fact that this thread is making recursive ownership requests. When the thread wants to free the Mutex, it has to call KeReleaseMutex as many times as it requested ownership. Only then will the Mutex go into the Signaled state. This is the same behavior exhibited by Win32 Mutex objects.

It's also crucial that your driver release any Mutexes it might be holding before it makes a transition back into user mode. The NT Kernel will bugcheck if any of your driver threads attempt to return control to the I/O Manager while owning a Mutex. So, for example, a DriverEntry or Dispatch routine isn't allowed to acquire a Mutex which would later be released by some other Dispatch routine or by a system thread.

Semaphore Objects

A Semaphore is a dispatcher object that maintains a count. The object remains Signaled as long as its count is greater than zero, and Nonsignaled when the count is zero.

Figure 14.3 shows the operation of a Semaphore. Threads B, C, and D are all waiting for a Semaphore whose count is zero. When thread A calls KeReleaseSemaphore twice, the count increments to two, and two of the waiting threads are allowed to resume execution. Waking up the threads also causes the Semaphore to decrement back to zero.

[Figure 14.3: How Semaphore objects synchronize system threads. Thread A releases a Semaphore (Count == 2) for which threads B, C, and D are waiting.]

Again, the driver in Section 14.4 provides a good example. Its Dispatch routines increment a Semaphore each time they add an IRP to an internal work queue. As a worker thread removes IRPs from the queue, it decrements the Semaphore and finally goes into a wait state when the queue is empty.

To use a Semaphore, you need to allocate some storage for an item of type KSEMAPHORE, then call the functions listed in Table 14.7.

Table 14.7 Use these functions to work with Semaphore objects

    How to use Semaphore objects

    IF you want to ...         THEN call ...                     IRQL
    Create a Semaphore         KeInitializeSemaphore             PASSIVE_LEVEL
    Decrement Semaphore        KeWaitForSingleObject             PASSIVE_LEVEL
                               KeWaitForMultipleObjects
    Increment Semaphore        KeReleaseSemaphore                <= DISPATCH_LEVEL
    Interrogate Semaphore      KeReadStateSemaphore              Any

Timer Objects

A Timer is a dispatcher object with a timeout value. When you start a Timer, it goes into the Nonsignaled state until its timeout value expires. At that point, it becomes Signaled. In Chapter 10, you saw that Timer objects can cause CustomTimerDpc routines to execute. Since they are just Kernel dispatcher objects, you can also use them in calls to KeWaitForXxx.

Figure 14.4 illustrates the operation of a Timer object. Thread A starts a Timer and then calls KeWaitForSingleObject. The thread blocks until the Timer expires. At that point, the Timer goes into the Signaled state and the thread wakes up.

[Figure 14.4: How Timer objects synchronize system threads. Thread A sets a Timer, waits, stays blocked until the Timer expires, and then continues.]

Timer objects actually come in two different flavors: Notification Timers and Synchronization Timers. You choose the type when you initialize the object. Although both types of Timer go into the Signaled state when their timeout value expires, their behavior from that point on is different.

When a Notification Timer times out, it remains in the Signaled state until it's explicitly reset. While the Timer is Signaled, all threads waiting for the Timer are awakened. Earlier versions of Windows NT supported only Notification Timers.

When a Synchronization Timer expires, it remains in the Signaled state only long enough to satisfy a single KeWaitForXxx request.
At that point, the Timer becomes Nonsignaled automatically. Synchronization Timers are a new feature of Windows NT 4.0.

To use a Timer, you need to allocate some storage for an item of type KTIMER and then call the functions listed in Table 14.8.

Table 14.8  Use these functions to work with Timer objects

How to use Timer objects
IF you want to...          THEN call...                  IRQL
Create a Timer             KeInitializeTimerEx           PASSIVE_LEVEL
Start a one-shot Timer     KeSetTimer                    ≤ DISPATCH_LEVEL
Start a repeating Timer    KeSetTimerEx                  ≤ DISPATCH_LEVEL
Stop a Timer               KeCancelTimer                 ≤ DISPATCH_LEVEL
Wait for a Timer           KeWaitForSingleObject         PASSIVE_LEVEL
                           KeWaitForMultipleObjects
Interrogate a Timer        KeReadStateTimer              ≤ DISPATCH_LEVEL

Thread Objects

System threads are also dispatcher objects, which means they have a signal state. When a system thread terminates, its Thread object changes from the Nonsignaled to the Signaled state. This allows your driver to synchronize its cleanup operations by waiting for the Thread object.

One thing to notice is that when you call PsCreateSystemThread, you get a handle to the Thread object. If you want to use a Thread object in a call to KeWaitForXxx, you need a pointer to the object rather than a handle. To convert a handle into an object pointer, do the following:

1. Call ObReferenceObjectByHandle. This function gives you a pointer to the Thread object itself and increments the object's pointer reference count.
2. If you don't need the handle for anything (and you probably don't), call ZwClose to release it. This decrements the object's handle reference count.
3. After the thread terminates, call ObDereferenceObject to decrement the Thread object's pointer reference count and possibly delete the Thread object.

You can call these functions only from PASSIVE_LEVEL IRQL, which limits the places in your driver where you can use them.

Variations on the Mutex

The NT Executive supports two variations on Mutex objects.
The following subsections describe them briefly. In general, using these objects instead of Kernel Mutexes can result in better driver performance. See the NT DDK documentation for more complete information.

Fast Mutexes

A Fast Mutex is a synchronization object that acts like a Kernel Mutex, except that it doesn't allow recursive ownership requests. By removing this feature, the Fast Mutex doesn't have to do as much work and its speed improves.

The Fast Mutex itself is an object of type FAST_MUTEX that you associate with one or more data items needing protection. Any code touching the data items must acquire ownership of the corresponding FAST_MUTEX first. Use the functions listed in Table 14.9 to work with Fast Mutexes. Notice that these objects have their own functions for requesting ownership. You can't use KeWaitForXxx to acquire Fast Mutexes.

Table 14.9  Use these functions to work with Fast Mutexes

How to use Fast Mutexes
IF you want to...               THEN call...             IRQL
Create a Fast Mutex             ExInitializeFastMutex    ≤ DISPATCH_LEVEL
Request Fast Mutex ownership    ExAcquireFastMutex       < DISPATCH_LEVEL
Give up Fast Mutex ownership    ExReleaseFastMutex       < DISPATCH_LEVEL

Executive Resources

Another synchronization object that behaves very much like a Kernel Mutex is an Executive Resource. Here, the main difference is that a Resource can either be owned exclusively by a single thread, or shared by multiple threads for read access. Since it's common (in the real world) for multiple readers to request simultaneous access to a resource, Executive Resource objects provide better throughput than standard Kernel Mutexes.

The Executive Resource itself is just an object of type ERESOURCE that you associate with one or more data items needing protection. Any code planning to touch the data items has to acquire ownership of the corresponding ERESOURCE first. Table 14.10 lists the functions that work with Executive Resources.
Notice that these objects have their own functions for requesting ownership. You can't use KeWaitForXxx to acquire Executive Resources.

Table 14.10  Use these functions to work with Executive Resources

How to use Executive Resources
IF you want to...    THEN call...                           IRQL
Create               ExInitializeResourceLite               ≤ DISPATCH_LEVEL
Acquire              ExAcquireResourceExclusiveLite         < DISPATCH_LEVEL
                     ExAcquireResourceSharedLite            < DISPATCH_LEVEL
                     ExTryToAcquireResourceExclusiveLite    < DISPATCH_LEVEL
                     ExConvertExclusiveToSharedLite         < DISPATCH_LEVEL
Release              ExReleaseResourceForThreadLite         ≤ DISPATCH_LEVEL
Interrogate          ExIsResourceAcquiredSharedLite         ≤ DISPATCH_LEVEL
                     ExIsResourceAcquiredExclusiveLite      ≤ DISPATCH_LEVEL
Delete               ExDeleteResourceLite                   ≤ DISPATCH_LEVEL

Synchronization Deadlocks

Deadlock situations can occur whenever multiple threads compete for simultaneous ownership of multiple resources. Figure 14.5 shows the simplest form of this problem:

1. Thread A acquires resource X.
2. Thread B acquires resource Y.
3. Thread A requests ownership of resource Y and goes into a wait state until B releases Y.
4. Thread B then requests ownership of resource X. This causes B to go into a wait state until A releases X. Deadlock.

Figure 14.5  How a multiple-resource deadlock occurs (diagram labels: Resource X, Resource Y)

You can cause this kind of deadlock using Events, Mutexes, or Semaphores. Even Thread objects can deadlock waiting for each other to terminate. There are two general approaches to solving deadlock problems:

• Use the Timeout argument of the KeWaitForXxx functions to limit the time you wait. While this technique may help you detect a deadlock, it doesn't really correct the underlying problem.
• Force all the threads using a given set of resources to acquire them in the same order.
  In the previous example, if A and B had both gone after resource X first and then Y second, there would have been no deadlock.

Mutex objects give you some protection against deadlocks through the use of level numbers. When you initialize a Mutex, you have to assign a level number to it. Later, when a thread attempts to acquire the Mutex, the Kernel will not grant ownership if that thread is holding any Mutex with a lower level number. By enforcing this policy, the Kernel avoids deadlocks involving multiple Mutexes.

14.4 CODE EXAMPLE: A THREAD-BASED DRIVER

This section presents a modified version of the packet-based slave DMA driver that you saw back in Chapter 12. What's different about this driver is that it uses a system thread to do most of the I/O processing. As a result, it spends very little time at DISPATCH_LEVEL IRQL or DIRQL and doesn't interfere as much with other system components. You can find the code for this example in the CH14\DRIVER directory on the disk that accompanies this book.

How the Driver Works

The driver you're about to see is unlike anything that's appeared so far in this book. Figure 14.6 gives a high-level view of its inner workings. One of the first things to notice is that the driver has no Start I/O routine. When a user-mode I/O request arrives, one of the driver's Dispatch routines simply adds the IRP to a work queue associated with the Device object. Then the Dispatch routine calls KeReleaseSemaphore to increment a Semaphore object that keeps track of the number of IRPs in the work queue.

Each Device object has its own system thread that processes these I/O requests. This thread is in an endless loop that begins with a call to KeWaitForSingleObject on the Semaphore. If the Semaphore object has a nonzero count, the thread will remove an IRP from the work queue and perform the I/O operation. On the other hand, if the count is zero, the thread will go into a wait state until the Dispatch routine inserts another IRP in the queue.
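
The counting protocol between the Dispatch routine and the worker thread can be modeled outside the kernel. The following is only a user-mode, single-threaded sketch: every name in it (ModelWorkQueue, model_dispatch, model_worker_step, and so on) is hypothetical and not part of the NT DDK, the integers stand in for IRPs, and a return value of 0 from model_worker_step marks the point where a real thread would block in KeWaitForSingleObject.

```c
#include <stddef.h>

/* Hypothetical user-mode model; none of these names are NT kernel APIs. */
#define MODEL_QUEUE_CAPACITY 8

typedef struct {
    long count;                        /* "Signaled" while count > 0 */
} ModelSemaphore;

typedef struct {
    int items[MODEL_QUEUE_CAPACITY];   /* stand-ins for queued IRPs */
    size_t head;                       /* next item to remove */
    size_t tail;                       /* next free slot */
    ModelSemaphore sem;                /* counts the queued items */
} ModelWorkQueue;

void model_queue_init(ModelWorkQueue *q)
{
    q->head = 0;
    q->tail = 0;
    q->sem.count = 0;
}

/* Plays the role of the Dispatch routine: queue the request, then
   increment the semaphore. Returns 0 if the model queue is full. */
int model_dispatch(ModelWorkQueue *q, int request)
{
    if (q->tail - q->head == MODEL_QUEUE_CAPACITY)
        return 0;
    q->items[q->tail % MODEL_QUEUE_CAPACITY] = request;
    q->tail++;
    q->sem.count++;
    return 1;
}

/* Plays the role of one pass through the worker loop. Returns 1 and
   stores the request if the count was nonzero; returns 0 where a real
   thread would go into a wait state. */
int model_worker_step(ModelWorkQueue *q, int *request)
{
    if (q->sem.count == 0)
        return 0;
    q->sem.count--;                    /* a satisfied wait decrements the count */
    *request = q->items[q->head % MODEL_QUEUE_CAPACITY];
    q->head++;
    return 1;
}
```

In this model the semaphore count always equals the number of queued requests, which is the invariant the driver relies on when it pairs each queue insertion with one KeReleaseSemaphore call.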
When the thread needs to perform a data transfer, it starts the device and then uses KeWaitForSingleObject to wait for an Event object. The driver's DpcForIsr routine will set this Event into the Signaled state after an interrupt arrives.

Figure 14.6  Architecture of the thread-based DMA driver (diagram labels: Dispatch Routine, Queue, Semaphore, Thread, Wait, Interrupt Service Routine, DPC Routine, Event)

When the driver's Unload routine needs to kill the system thread, it sets a flag in the Device Extension and increments the Semaphore object. If the thread was asleep waiting for the Semaphore object, it will wake up, see the flag, and terminate itself. If it's in the middle of an I/O operation, it won't see the flag until it completes the current IRP.

The DEVICE_EXTENSION Structure in XXDRIVER.H

This file contains all the usual driver-defined data structures. The following excerpt shows only those fields that the driver needs in order to manage the system thread and its work queue. Other fields are identical to those in the packet-based slave DMA example of Chapter 12.

    typedef struct _DEVICE_EXTENSION {
        PETHREAD ThreadObject;              // (1)
        BOOLEAN ThreadShouldStop;

        KEVENT AdapterObjectIsAcquired;     // (2)
        KEVENT DeviceOperationComplete;

        KSEMAPHORE IrpQueueSemaphore;       // (3)
        LIST_ENTRY IrpQueueListHead;
        KSPIN_LOCK IrpQueueSpinLock;
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

(1) Once the thread is running, other parts of the driver can use the Thread object pointer to synchronize with it. The BOOLEAN flag tells the thread when it's time to shut down.

(2) The thread waits for these Event objects at appropriate places in its processing cycle. Other parts of the driver set them into the Signaled state when interesting things happen.
(3) The work queue consists of a doubly-linked list guarded by a spin lock and a Semaphore object that keeps track of the number of IRPs in the queue.

The XxCreateDevice Function in INIT.C

This portion of the example shows the initialization code for the Thread object, the work queue, and the various synchronization objects used to process an I/O request. Remember that DriverEntry calls XxCreateDevice once for each Device object.

    static NTSTATUS
    XxCreateDevice(
        IN PDRIVER_OBJECT DriverObject,
        IN INTERFACE_TYPE BusType,
        IN ULONG BusNumber,
        IN PDEVICE_BLOCK DeviceBlock,
        IN ULONG NtDeviceNumber
        )
    {
        ...
        KeInitializeSpinLock( &pDevExt->IrpQueueSpinLock );     // (1)
        InitializeListHead( &pDevExt->IrpQueueListHead );
        KeInitializeSemaphore(
            &pDevExt->IrpQueueSemaphore,
            0,
            MAXLONG );

        KeInitializeEvent(                                      // (2)
            &pDevExt->AdapterObjectIsAcquired,
            SynchronizationEvent,
            FALSE );
        KeInitializeEvent(
            &pDevExt->DeviceOperationComplete,
            SynchronizationEvent,
            FALSE );

        pDevExt->ThreadShouldStop = FALSE;

        status = PsCreateSystemThread(                          // (3)
            &ThreadHandle,
            (ACCESS_MASK)0,
            NULL,
            (HANDLE)0,
            NULL,
            XxThreadMain,
            pDevExt );
        if( !NT_SUCCESS( status )) {
            IoDeleteSymbolicLink( &linkName );
            IoDeleteDevice( pDevObj );
            return status;
        }

        ObReferenceObjectByHandle(                              // (4)
            ThreadHandle,
            THREAD_ALL_ACCESS,
            NULL,
            KernelMode,
            &pDevExt->ThreadObject,
            NULL );
        ZwClose( ThreadHandle );

        IoConnectInterrupt( ... );
        ...
    }

(1) This section of code sets up the work queue used by the thread.

(2) These calls initialize the Event objects that signal ownership of the Adapter object and the arrival of a device interrupt. Notice that they're both synchronization (i.e., auto-reset) Events.

(3) The call to PsCreateSystemThread starts the thread.
The entry point function is XxThreadMain, and it will receive a pointer to the Device Extension as its Context argument. Because this is an asynchronous operation, the status of PsCreateSystemThread is only telling you that the thread was started successfully. It says nothing about what happens to the thread afterwards.

(4) PsCreateSystemThread gives back a handle to the Thread rather than a pointer to the Thread object itself. This section of code gets a pointer to the object and then releases the (unneeded) handle.

The XxDispatchReadWrite Function in DISPATCH.C

This portion of the example shows how the Dispatch routine of this driver works. Its operation is relatively straightforward: After checking for a zero-length transfer, it puts the IRP into the pending state and inserts it into the work queue attached to the target Device object. It then increments the count in the work queue's Semaphore object. Notice that there are no calls to IoStartPacket because there is no Start I/O routine.

    NTSTATUS
    XxDispatchReadWrite(
        IN PDEVICE_OBJECT pDO,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PDEVICE_EXTENSION pDE = pDO->DeviceExtension;

        //
        // Check for zero-length transfers
        //
        if( IrpStack->Parameters.Read.Length == 0 ) {
            Irp->IoStatus.Status = STATUS_SUCCESS;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest( Irp, IO_NO_INCREMENT );
            return STATUS_SUCCESS;
        }

        //
        // Start device operation
        //
        IoMarkIrpPending( Irp );

        //
        // Add the IRP to the thread's work queue
        //
        ExInterlockedInsertTailList(
            &pDE->IrpQueueListHead,
            &Irp->Tail.Overlay.ListEntry,
            &pDE->IrpQueueSpinLock );

        KeReleaseSemaphore(
            &pDE->IrpQueueSemaphore,
            0,          // No priority boost
            1,          // Increment semaphore by 1
            FALSE );    // No WaitForXxx after this call

        return STATUS_PENDING;
    }

THREAD.C

This module contains the main thread function and any routines needed to manage the thread.

XxThreadMain

Here is the IRP-processing engine itself. Its job is to pull I/O requests from the work queue in the Device Extension and perform the data transfer operation. This function continues to wait for new IRPs until the Unload routine tells it to shut down.

    VOID
    XxThreadMain(
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION DevExtension = Context;
        PDEVICE_OBJECT DeviceObject =
            DevExtension->DeviceObject;
        PLIST_ENTRY ListEntry;
        PIRP Irp;
        CCHAR PriorityBoost;

        KeSetPriorityThread(                            // (1)
            KeGetCurrentThread(),
            LOW_REALTIME_PRIORITY );

        //
        // Now enter the main IRP-processing loop
        //
        while( TRUE )
        {
            KeWaitForSingleObject(                      // (2)
                &DevExtension->IrpQueueSemaphore,
                Executive,
                KernelMode,
                FALSE,
                NULL );

            if( DevExtension->ThreadShouldStop )        // (3)
                PsTerminateSystemThread( STATUS_SUCCESS );

            //
            // It must be a real request. Get an IRP
            //
            ListEntry = ExInterlockedRemoveHeadList(
                &DevExtension->IrpQueueListHead,
                &DevExtension->IrpQueueSpinLock );

            Irp = CONTAINING_RECORD(
                ListEntry,
                IRP,
                Tail.Overlay.ListEntry );

            PriorityBoost = XxPerformDataTransfer(      // (4)
                DeviceObject,
                Irp );

            IoCompleteRequest( Irp, PriorityBoost );
        }
    }

(1) System threads normally start running down in the variable priority range. The usual practice is to move the thread to the lowest of the time-critical scheduling priorities.

(2) The thread will wait here indefinitely for an IRP to appear in the work queue or for the Unload routine to stop the thread.

(3) When the thread awakens, it has to see whether the wake-up call was the result of an I/O request or a thread shutdown signal. The flag in the Device Extension will give a clue.
(4) This function processes the IRP. This is a synchronous call which doesn't return until the data transfer operation is done. It returns a priority boost value which the thread then uses when it completes the IRP. After releasing the IRP, the thread goes back to the top of the loop and waits for the Semaphore object again.

XxKillThread

This function notifies the thread associated with a particular Device object that it's time to quit. To simplify things, this function stops and waits until the target thread is gone. Consequently, it can only be called from PASSIVE_LEVEL IRQL.

    VOID
    XxKillThread(
        IN PDEVICE_EXTENSION pDE
        )
    {
        //
        // Set the Stop flag
        //
        pDE->ThreadShouldStop = TRUE;

        //
        // Make sure the thread wakes up
        //
        KeReleaseSemaphore(
            &pDE->IrpQueueSemaphore,
            0,          // No priority boost
            1,          // Increment semaphore by 1
            TRUE );     // WaitForXxx after this call

        //
        // Wait for the thread to terminate.
        // ThreadObject already holds a pointer
        // to the Thread object, so pass it as is.
        //
        KeWaitForSingleObject(
            pDE->ThreadObject,
            Executive,
            KernelMode,
            FALSE,
            NULL );

        ObDereferenceObject( pDE->ThreadObject );
    }

TRANSFER.C

This portion of the example contains the support routines that perform I/O operations. A great deal of what's in here is derived from the packet-based slave DMA driver in Chapter 12. Consequently, only those features that differ significantly will be described in detail.

The main thing to notice is that very little work actually happens inside the Adapter Control or DpcForIsr routines. Instead of doing their usual jobs, these functions just set Event objects to signal the thread's data transfer routines that they can proceed.

XxPerformDataTransfer

This function moves an entire buffer of data to or from the device. This may include splitting the transfer over several device operations if there aren't enough mapping registers to handle it all at once.
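
The splitting arithmetic can be tried out in ordinary user-mode C. The sketch below is a model only: the names (model_span_pages, model_next_transfer_size, model_count_transfers) are hypothetical, addresses are plain 32-bit integers, a 4 KB page size is assumed, and the caller is assumed to pass at least one mapping register. The page-span formula mirrors what the DDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES computes.

```c
#include <stdint.h>

/* Assumption: 4 KB pages, as on x86. Names below are hypothetical. */
#define MODEL_PAGE_SIZE  4096u
#define MODEL_PAGE_SHIFT 12u

/* Number of pages touched by the byte range [va, va + size). */
uint32_t model_span_pages(uint32_t va, uint32_t size)
{
    uint32_t offset = va & (MODEL_PAGE_SIZE - 1u);
    return (offset + size + MODEL_PAGE_SIZE - 1u) >> MODEL_PAGE_SHIFT;
}

/* Size of the next partial transfer. If the remainder spans more pages
   than there are mapping registers (map_regs >= 1 assumed), trim it so
   that it ends on the last page the registers can cover. */
uint32_t model_next_transfer_size(uint32_t va, uint32_t remaining,
                                  uint32_t map_regs)
{
    if (model_span_pages(va, remaining) <= map_regs)
        return remaining;
    return map_regs * MODEL_PAGE_SIZE - (va & (MODEL_PAGE_SIZE - 1u));
}

/* Walks the buffer and returns how many device operations are needed. */
uint32_t model_count_transfers(uint32_t va, uint32_t total,
                               uint32_t map_regs)
{
    uint32_t count = 0;
    while (total > 0) {
        uint32_t size = model_next_transfer_size(va, total, map_regs);
        va += size;        /* advance the transfer address */
        total -= size;     /* bytes still to move */
        count++;
    }
    return count;
}
```

For a 20,000-byte buffer that starts 0x100 bytes into a page, with two mapping registers available, this model produces three partial transfers of 7,936, 8,192, and 3,872 bytes. Only the first piece is shortened by the page offset; after it, every piece starts on a page boundary.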
XxPerformDataTransfer runs at PASSIVE_LEVEL IRQL and doesn't return to the caller until everything is done.

    CCHAR
    XxPerformDataTransfer(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PDEVICE_EXTENSION pDE =
            DeviceObject->DeviceExtension;
        PMDL Mdl = Irp->MdlAddress;
        ULONG MapRegsNeeded;
        NTSTATUS status;

        //
        // Set the I/O direction flag
        //
        if( IrpStack->MajorFunction == IRP_MJ_WRITE )
            pDE->WriteToDevice = TRUE;
        else
            pDE->WriteToDevice = FALSE;

        //
        // Set up bookkeeping values
        //
        pDE->BytesRequested = MmGetMdlByteCount( Mdl );
        pDE->BytesRemaining = pDE->BytesRequested;
        pDE->TransferVA = MmGetMdlVirtualAddress( Mdl );

        //
        // Flush CPU cache if necessary
        //
        KeFlushIoBuffers(
            Irp->MdlAddress,
            !pDE->WriteToDevice,
            TRUE );

        //
        // Calculate size of first partial
        // transfer
        //
        pDE->TransferSize = pDE->BytesRemaining;

        MapRegsNeeded = ADDRESS_AND_SIZE_TO_SPAN_PAGES(
            pDE->TransferVA,
            pDE->TransferSize );

        if( MapRegsNeeded > pDE->MapRegisterCount )
        {
            MapRegsNeeded = pDE->MapRegisterCount;
            pDE->TransferSize =
                MapRegsNeeded * PAGE_SIZE -
                MmGetMdlByteOffset( Mdl );
        }

        //
        // Acquire the adapter object.
        //
        status = XxAcquireAdapterObject(            // (1)
            pDE,
            MapRegsNeeded );
        if( !NT_SUCCESS( status )) {
            Irp->IoStatus.Status = status;
            Irp->IoStatus.Information = 0;
            return IO_NO_INCREMENT;
        }

        //
        // Try to perform the first partial
        // transfer
        //
        status = XxPerformSynchronousTransfer(      // (2)
            DeviceObject,
            Irp );
        if( !NT_SUCCESS( status )) {
            IoFreeAdapterChannel( pDE->AdapterObject );
            Irp->IoStatus.Status = status;
            Irp->IoStatus.Information = 0;
            return IO_NO_INCREMENT;
        }

        //
        // It worked. Update the bookkeeping
        // information.
        //
        pDE->TransferVA += pDE->TransferSize;
        pDE->BytesRemaining -= pDE->TransferSize;

        while( pDE->BytesRemaining > 0 )            // (3)
        {
            //
            // Try to do all of it in one operation
            //
            pDE->TransferSize = pDE->BytesRemaining;

            MapRegsNeeded = ADDRESS_AND_SIZE_TO_SPAN_PAGES(
                pDE->TransferVA,
                pDE->TransferSize );

            //
            // If the remainder of the buffer is more
            // than we can handle in one I/O, reduce
            // our expectations.
            //
            if( MapRegsNeeded > pDE->MapRegisterCount )
            {
                MapRegsNeeded = pDE->MapRegisterCount;
                pDE->TransferSize =
                    MapRegsNeeded * PAGE_SIZE -
                    BYTE_OFFSET( pDE->TransferVA );
            }

            //
            // Try to perform a device operation.
            //
            status = XxPerformSynchronousTransfer(
                DeviceObject,
                Irp );
            if( !NT_SUCCESS( status )) break;

            //
            // It worked. Update the bookkeeping
            // information for the next cycle.
            //
            pDE->TransferVA += pDE->TransferSize;
            pDE->BytesRemaining -= pDE->TransferSize;
        }

        IoFreeAdapterChannel( pDE->AdapterObject );     // (4)

        Irp->IoStatus.Status = status;                  // (5)
        Irp->IoStatus.Information =
            pDE->BytesRequested - pDE->BytesRemaining;

        //
        // Since there has been at least one I/O
        // operation, give the IRP a priority boost.
        //
        return IO_DISK_INCREMENT;                       // (6)
    }

(1) Before starting a data transfer, the Device object has to acquire its Adapter object. The thread calls this synchronous helper function to grab the Adapter object. This is different from the callback model used by the DMA driver in Chapter 12.

(2) Once the Adapter object is secured, the driver can try to perform the first partial data transfer.
Again, since this code is running in the context of a system thread, it can stop and wait for the I/O operation to complete. If there's an error, processing stops and the IRP is sent back with no priority boost.

(3) If there's more data to transfer, continue to step through the buffer and perform partial DMA transfers.

(4) When the last partial transfer is done, release the DMA Adapter object.

(5) The final status of the IRP will be the status of the last data transfer operation. Also calculate the number of bytes actually transferred.

(6) Tell the caller to apply a priority boost to the IRP. This makes sense since there has been at least one actual device operation.

XxAcquireAdapterObject and XxAdapterControl

These two functions work together to give the thread a synchronous mechanism for acquiring ownership of the Adapter object. XxAcquireAdapterObject runs in the context of a system thread so it can stop and wait for a nonzero time interval.

    static NTSTATUS
    XxAcquireAdapterObject(
        IN PDEVICE_EXTENSION pDE,
        IN ULONG MapRegsNeeded
        )
    {
        KIRQL OldIrql;
        NTSTATUS status;

        KeRaiseIrql( DISPATCH_LEVEL, &OldIrql );        // (1)
        status = IoAllocateAdapterChannel(
            pDE->AdapterObject,
            pDE->DeviceObject,
            MapRegsNeeded,
            XxAdapterControl,
            pDE );
        KeLowerIrql( OldIrql );

        //
        // If the call failed, it's because there
        // weren't enough mapping registers.
        //
        if( !NT_SUCCESS( status )) {
            return status;
        }

        KeWaitForSingleObject(                          // (2)
            &pDE->AdapterObjectIsAcquired,
            Executive,
            KernelMode,
            FALSE,
            NULL );

        return STATUS_SUCCESS;
    }

    static IO_ALLOCATION_ACTION
    XxAdapterControl(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID MapRegisterBase,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION pDE = Context;

        pDE->MapRegisterBase = MapRegisterBase;         // (3)

        KeSetEvent(                                     // (4)
            &pDE->AdapterObjectIsAcquired,
            0,
            FALSE );

        return KeepObject;                              // (5)
    }

(1) Only code running at DISPATCH_LEVEL IRQL can request ownership of the Adapter object. Consequently, this routine raises its IRQL before calling IoAllocateAdapterChannel. Once it makes the call, it returns to PASSIVE_LEVEL IRQL.

(2) The function then stops and waits for the Adapter Control routine to set a synchronization Event. That will be the signal that the Adapter object has been acquired.

(3) It's important for the Adapter Control routine to store the mapping register handle because the thread will need it to set up any DMA data transfers.

(4) Next, let the waiting thread know that it can use the DMA hardware.

(5) Finally, return a value of KeepObject in order to hold on to the Adapter object.

XxPerformSynchronousTransfer

Running in the context of the system thread, this function performs a single data transfer operation. It doesn't return to the caller until the transfer finishes. The main thing to notice here is that the function uses an Event object to wait for the arrival of a device interrupt.

    static NTSTATUS
    XxPerformSynchronousTransfer(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PDEVICE_EXTENSION pDE =
            DeviceObject->DeviceExtension;

        //
        // Set up the system DMA controller
        // attached to this device.
        //
        IoMapTransfer(
            pDE->AdapterObject,
            Irp->MdlAddress,
            pDE->MapRegisterBase,
            pDE->TransferVA,
            &pDE->TransferSize,
            pDE->WriteToDevice );

        //
        // Start the device
        //
        XxWriteControl(
            pDE,
            XX_CTL_INTENB | XX_CTL_DMA_GO );

        //
        // The DPC routine will set an Event
        // object when the I/O operation is
        // done. Stop here and wait for it.
        //
        KeWaitForSingleObject(
            &pDE->DeviceOperationComplete,
            Executive,
            KernelMode,
            FALSE,
            NULL );

        //
        // Flush data out of the Adapter
        // object cache.
        //
        IoFlushAdapterBuffers(
            pDE->AdapterObject,
            Irp->MdlAddress,
            pDE->MapRegisterBase,
            pDE->TransferVA,
            pDE->TransferSize,
            pDE->WriteToDevice );

        //
        // Check for device errors
        //
        if( !XX_STS_OK( pDE->DeviceStatus ))
            return STATUS_DEVICE_DATA_ERROR;
        else
            return STATUS_SUCCESS;
    }

XxDpcForIsr

When the device generates an interrupt, the Interrupt Service routine (not shown here) saves the status of the hardware and requests a DPC. Eventually, XxDpcForIsr executes and just sets an Event object into the Signaled state. XxPerformSynchronousTransfer (which has been waiting for this Event object) wakes up and continues processing the current IRP.

    VOID
    XxDpcForIsr(
        IN PKDPC Dpc,
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION pDE = Context;

        KeSetEvent(
            &pDE->DeviceOperationComplete,
            0,
            FALSE );

        return;
    }

14.5 SUMMARY

This chapter has presented you with an alternative driver architecture based on the use of system threads. Although it's not a good choice for most drivers, this model can be useful if you're trying to manage certain kinds of legacy devices, or devices that would interfere with normal system operation if you used the standard interrupt-driven architecture.

Now that you have a good understanding of how to work at the hardware level, it's time to see how higher-level drivers are organized. That's the subject of the next chapter.

CHAPTER 15

Higher-Level Drivers

One of the I/O Manager's nifty features is that it lets you stack drivers on top of one another. This permits one driver to use another as a prepackaged component and send requests to it just as a user-mode thread might. As you saw back in Chapter 1, NT's SCSI and network driver architectures both rely on this building-block approach. This chapter describes the techniques you need to use if you want to design your own driver hierarchies.

15.1 AN OVERVIEW OF INTERMEDIATE DRIVERS

Before getting into a discussion of writing intermediate drivers, it's a good idea to define just what they are. This section also explores some of the trade-offs inherent in using a hierarchical driver architecture.

What Are Intermediate Drivers?

For the purposes of this chapter, an intermediate driver is any kernel-mode driver that issues I/O requests to another driver. Intermediate drivers are not usually responsible for any direct, register-level manipulation of hardware resources. Instead, they often depend on a lower-level device driver to perform hardware operations. This may seem like an overly broad definition, but the truth is that intermediate drivers can assume a wide variety of shapes.

From an implementation standpoint, you can classify an intermediate driver according to its relationship with the driver directly below it. Taking this approach, you end up with three distinct groups:

• Layered drivers: This generic category includes just about any driver that uses the I/O Manager's standard calling mechanism to send requests to another driver.
• Filter drivers: This is a special category of intermediate drivers that transparently intercept requests intended for some other driver. These drivers also use the I/O Manager's standard calling mechanism.
• Tightly coupled drivers: This category includes any pair of drivers that define a private interface between themselves, one that doesn't use the I/O Manager's calling mechanism for the bulk of the communication.

Later parts of this chapter will explain how to develop drivers in each of these families.

Should You Use a Layered Architecture?

One important thing to decide is whether your driver design would benefit from being broken into a series of layers, or whether it should be structured as a single monolithic unit. The following will help you understand the trade-offs of taking a layered approach.
Why you should

Depending on your goals, using multiple driver layers can provide a number of benefits. For example, it allows you to separate higher-level protocol issues from management of the specific underlying hardware. This makes it possible to support a wider variety of hardware without having to rewrite large amounts of code. It also promotes flexibility by allowing the same protocol driver to plug into different hardware drivers at runtime. This is the approach taken by NT network drivers.

If several different kinds of peripherals can all be attached to the same controller (as in the case of a SCSI adapter), layering allows you to decouple management of the peripheral from management of the controller. To do this, you write a single device driver for the controller (the port driver) and separate higher-level class drivers for each type of attached peripheral. The two main benefits here are that the class drivers are smaller and simpler and (assuming a well-defined protocol) the class and port drivers can come from different vendors. [1]

[1] This is exactly what NT's SCSI architecture does. Expect to see more of this kind of thing in future versions of Windows NT when buses like the IEEE 1394 bus and the Universal Serial Bus make their appearance.

Layering also makes it possible to hide hardware limitations from users of a device, or to add features not supported by the hardware itself. For example, if a given piece of hardware can only handle transfers of a certain size, you might stack another driver on top of it that would break oversized transfers into smaller pieces. Users of the device would be unaware of the device's shortcomings.

Inserting driver layers gives you a transparent way to add or remove features from a product without having to maintain multiple code bases for the same product. NT's fault-tolerant disks are one example of this.
They're imple mented as a separate driver layer which is shipped with NT Server but not with NT Workstation. Why you shouldn't Of course, there are costs you have to consider if you're thinking about a layered architecture. First of all, 1 / 0 requests incur some extra overhead because each IRP has to take a trip through the 1/0 Manager every time it passes from one driver to another. To some extent, you can reduce this overhead by defining a private interdriver interface that partially bypasses the I/ 0 Manager. It also takes somewhat more design effort to make sure that the separate driver components fit together seamlessly. In the absence of an external standard, this can be especially painful if some of the drivers are coming from different vendors. Since the overall functionality is no longer contained in a single driver exe cutable, there's somewhat more bookkeeping involved in managing the drivers. This also has some impact on maintaining version compatibility between various members of the hierarchy. Finally, installing layered drivers is a little more involved since each one will need its own area in the Registry. In addition, it's necessary to set up dependency relationships among the various drivers in the hierarchy to make sure they start in the proper order. 2 1 5 .2 WRITING LAYERED DRIVERS Layered drivers are the most general type of intermediate driver. They depend for their operation on a well-defined interdriver calling mechanism provided by the 1/0 Manager. This is the first of three sections that explain how this mechanism works, and what a driver needs to do if it wants to use another driver as a component. How Layered Drivers Work As you can see from Figure 15.l, a layered driver exposes one or more named Device objects to which clients send 1/0 requests. When an IRP repre senting one of these requests arrives, the layered driver can process it in two dif ferent ways: In some cases, it might send the IRP directly to a lower-level driver. 
Alternatively, the layered driver might hold the IRP in a pending state while it allocates additional IRPs and sends them to one or more lower-level drivers.

[2] See Chapter 16 for more information about creating startup dependencies among drivers.

[Figure 15.1 How a layered driver works. Diagram labels: IRP, IoCallDriver, return, IoCompleteRequest. Copyright © 1996 by Cydonix Corporation.]

If the layered driver needs to regain control after a lower-level driver finishes with an IRP, it can attach an I/O Completion routine to the IRP. This routine will execute when the lower driver calls IoCompleteRequest.

Initialization and Cleanup in Layered Drivers

Like every other kernel-mode driver, a layered driver must have a main entry point called DriverEntry. If the driver is to be unloaded while the system is running, it needs an Unload routine as well. The following subsections describe what these routines have to do.

DriverEntry routine

The initialization steps performed by a layered driver are similar to those of a regular device driver. The main difference is that a layered driver doesn't have any direct contact with hardware, so all the hardware detection and allocation code that you saw in Chapter 7 will be missing. In general, the DriverEntry routine of a layered driver will do the following:

1. It uses IoCreateDevice to build the upper-level Device object that will be seen by the outside world. Like the Device objects created by hardware drivers, this one has its own unique name.
2. DriverEntry then calls IoGetDeviceObjectPointer. Given a device name, this function returns the address of the target Device object and a pointer to a File object associated with the target Device. Normally, DriverEntry saves the target Device object pointer in the Device Extension of the upper-level Device object.
3. Next, it increments the pointer reference count on the target Device object by calling ObReferenceObjectByPointer. This is necessary because IoGetDeviceObjectPointer automatically increments the reference count on the File object pointer, but not the reference count on the target Device object.
4. Then, DriverEntry calls ObDereferenceObject to decrement the pointer reference count on the File object associated with the target Device object.
5. If the layered driver forwards incoming IRPs to the target Device object, DriverEntry should set the layered Device object's StackSize field to a value one greater than the StackSize field of the target Device object. This guarantees that there will be enough stack slots for all the drivers in the hierarchy.
6. If the lower-level driver requires it, DriverEntry can fabricate an IRP with IRP_MJ_CREATE as its major function code and send it to the target Device object.
7. If the Device object is going to be visible to Win32 applications, DriverEntry calls IoCreateSymbolicLink to add its Win32 name to the \DosDevices area of the Object Manager's namespace.

The layered driver can now use the target Device object pointer to make calls to the lower-level driver.

Unload routine

When a layered driver unloads itself, it basically reverses the sequence of operations it performed at initialization time. Once again, since the driver is not working directly with the hardware, it won't need to release any hardware resources. Although the exact steps may vary, a layered driver's Unload routine will generally do the following:

1. It calls IoDeleteSymbolicLink to remove the upper-level Device object's Win32 name from the Object Manager's namespace.
2. If the lower-level driver requires it, the layered driver's Unload routine can fabricate an IRP with IRP_MJ_CLOSE as its major function code and send it to the target Device object.
3. Next, the Unload routine decrements the target Device object's pointer reference count by calling ObDereferenceObject. This effectively breaks the connection with the target Device object.
4. Finally, it destroys the upper-level Device object by calling IoDeleteDevice.

Code Fragment: Connecting to Another Driver

The following code fragment (taken from somewhere in the flow of a DriverEntry routine) shows how one driver might layer itself on top of another. In this example, the lower-level driver XXDRIVER exposes a device called (what else) XX0 and the layered driver (YYDRIVER) exposes YY0.

    UNICODE_STRING    UpperDeviceName;
    PDEVICE_OBJECT    UpperDeviceObject;
    PDEVICE_EXTENSION UpperExtension;
    UNICODE_STRING    LowerDeviceName;
    PDEVICE_OBJECT    LowerDeviceObject;
    PFILE_OBJECT      LowerFileObject;
    NTSTATUS          status;

    RtlInitUnicodeString(                        /* (1) */
        &UpperDeviceName,
        L"\\Device\\YY0" );
    RtlInitUnicodeString(
        &LowerDeviceName,
        L"\\Device\\XX0" );

    /* The printed fragment shows only the name and the returned   */
    /* pointer; the remaining arguments are illustrative guesses.  */
    status = IoCreateDevice(
                 DriverObject,                   /* DriverEntry's argument (assumed name) */
                 sizeof( DEVICE_EXTENSION ),
                 &UpperDeviceName,
                 FILE_DEVICE_UNKNOWN,
                 0,
                 FALSE,
                 &UpperDeviceObject );
    if( !NT_SUCCESS( status ))
        return status;

    UpperExtension = UpperDeviceObject->DeviceExtension;

    status = IoGetDeviceObjectPointer(           /* (2) */
                 &LowerDeviceName,
                 FILE_ALL_ACCESS,
                 &LowerFileObject,
                 &LowerDeviceObject );
    if( !NT_SUCCESS( status )) {
        IoDeleteDevice( UpperDeviceObject );
        return status;
    }

    ObReferenceObjectByPointer(                  /* (3) */
        LowerDeviceObject,
        FILE_ALL_ACCESS,
        NULL,
        KernelMode );
    ObDereferenceObject( LowerFileObject );

    UpperExtension->LowerDevice = LowerDeviceObject;        /* (4) */

    UpperDeviceObject->StackSize =                          /* (5) */
        LowerDeviceObject->StackSize + 1;
    UpperDeviceObject->Flags |=
        ( LowerDeviceObject->Flags &
          ( DO_BUFFERED_IO | DO_DIRECT_IO ));
    UpperDeviceObject->AlignmentRequirement =
        LowerDeviceObject->AlignmentRequirement;

(1) The upper driver prepares Unicode names for both the upper and lower devices. Be careful: These names are case-sensitive.
(2) It then retrieves a pointer to the lower Device object. This function returns pointers to both a Device object and a File object.
(3) IoGetDeviceObjectPointer doesn't increment the pointer count on the Device object. The upper driver has to do that itself. Then, it decrements the pointer count on the lower driver's File object, since this isn't needed anymore.
(4) The upper driver needs to save the address of the lower Device object in its own Device Extension so that other routines will be able to find it.
(5) If the upper driver plans to forward IRPs directly to the lower one, these IRPs have to have enough I/O stack locations for all the drivers in the hierarchy. In this case, it's also important for the upper driver to duplicate the buffering strategy and alignment of the lower driver.

Other Initialization Concerns for Layered Drivers

You've just seen the general steps a layered driver needs to perform if it wants to connect to another driver. Depending on how the layered driver operates, there may be some other issues that the initialization code has to deal with. There are basically two cases to consider.

Transparent layer

Some layered drivers are intended to slip transparently between some lower-level driver and its clients. Here, it's important for the Device objects exposed by the layered driver to mimic the behavior of the lower driver's Device objects. NT Server's fault-tolerant disk driver is one example of a transparent layer.

To guarantee that the layered driver can be added or removed transparently, its DriverEntry routine needs to perform the following extra initialization: [3]

• It should copy the DeviceType and Characteristics fields from the target Device object to the layered Device object.
• DriverEntry should also copy the DO_DIRECT_IO and DO_BUFFERED_IO bits from the target Device's Flags field. This ensures that the layered Device object will use the same buffering strategy as the target.
• It should copy the AlignmentRequirement field from the target to the upper-level Device object.
• Finally, the MajorFunction table in the layered Driver object has to support the exact same set of IRP_MJ_XXX function codes as the lower-level Driver object.

[3] The sample filter driver that appears later in this chapter shows how to set up a layered driver's MajorFunction table dynamically.

Virtual or logical device layer

The other possibility is that the layered driver exposes virtual or logical Device objects. [4] For example, NT's TDI network protocol drivers present Device objects that have no particular similarity to the network interface cards below them. Likewise, SCSI class drivers export Device objects whose characteristics are those of the peripheral attached to the SCSI bus - not those of the SCSI interface card.

In this case, the layered driver should pick appropriate values for the Type and Characteristics fields of the layered Device object. Also, the exact set of IRP_MJ_XXX functions supported by the layered driver will be ones appropriate to the layered Device object. There's also no requirement for the layered and target Device objects to use the same buffering strategy.

I/O Request Processing in Layered Drivers

Since layered drivers don't directly manage any hardware, they don't need any Start I/O, Interrupt Service, or DPC routines. Instead, most of the code in a layered driver consists of Dispatch routines and I/O Completion routines. Because they deserve some extra attention, I/O Completion routines get their own section later in this chapter.

The subsections below describe the operation of a layered driver's Dispatch routines. When one of these Dispatch routines receives an IRP, it can do one of three things.

Complete the original IRP

The simplest case is the one where the Dispatch routine is able to process the request all by itself and return either success or failure notification to the original caller.
The Dispatch routine does the following:

1. It calls IoGetCurrentIrpStackLocation to get a pointer to this driver's I/O stack slot.
2. The Dispatch routine processes the request using various fields in the IRP and the I/O stack location.
3. It puts an appropriate value in the IoStatus.Information field of the IRP.
4. The Dispatch routine also fills the IoStatus.Status field of the IRP with a suitable STATUS_XXX code.
5. Then, it calls IoCompleteRequest with a priority-boost value of IO_NO_INCREMENT to send the IRP back to the I/O Manager.
6. As its return value, the Dispatch routine passes back the same STATUS_XXX code that it put into the IRP.

[4] A virtual device is one whose behavior is not tied to the characteristics of the underlying peripheral hardware. This also includes things like RAM disks which have no associated peripheral device. A logical device is a temporary construct that maintains the context for a specific series of transactions - usually occurring over a shared communication medium. For example, when a client requests a connection to a Named Pipe object, the pipe driver creates a separate instance of the pipe just for that client. This pipe instance is a logical device. Logical devices normally have a limited lifespan; the driver creates them when a series of transactions begins, and destroys them when the last transaction is finished.

There's nothing at all mysterious going on here. In fact, it's the same procedure any Dispatch routine follows when it wants to end the processing of a request.

Pass the IRP to another driver

The second possibility is that the layered driver's Dispatch routine needs to pass the IRP to the next lower driver. The Dispatch routine does the following:

1. It calls IoGetCurrentIrpStackLocation to get a pointer to its own I/O stack location.
2. The Dispatch routine also calls IoGetNextIrpStackLocation to retrieve a pointer to the I/O stack location belonging to the next lower driver.
3. It sets up the next lower driver's I/O stack location, including the MajorFunction field and various members of the Parameters union.
4. The Dispatch routine calls IoSetCompletionRoutine to associate an I/O Completion routine with the IRP. At the very least, this I/O Completion routine is going to be responsible for marking the IRP as pending.
5. It sends the IRP to a lower-level driver using IoCallDriver. This is an asynchronous call that returns immediately regardless of whether the lower-level driver completed the IRP.
6. As its return value, the Dispatch routine passes back whatever status code was returned by IoCallDriver. This will be either STATUS_SUCCESS, STATUS_PENDING, or some STATUS_XXX error code.

Notice that the Dispatch routine does not call IoMarkIrpPending to put the original IRP in the pending state before sending it to the lower driver. This is because the Dispatch routine doesn't know whether the IRP should be marked pending until after IoCallDriver returns. Unfortunately, by that time IoCallDriver has already pushed the I/O stack pointer in the IRP, so a call to IoMarkIrpPending (which always works with the current stack slot) would mark the wrong stack location. The solution is to call IoMarkIrpPending in an I/O Completion routine, after the IRP stack pointer has been reset to the proper level.

Allocate additional IRPs

Finally, the layered driver's Dispatch routine may need to allocate one or more additional IRPs which it then sends to lower-level drivers. The Dispatch routine has the option of waiting for these additional IRPs to complete, or of issuing asynchronous requests to the lower driver. In the asynchronous case, cleanup of the additional IRPs occurs in an I/O Completion routine.
The discussion of driver-allocated IRPs (appearing later in this chapter) will explain how to use both these techniques.

Code Fragment: Calling a Lower-Level Driver

The code fragment below shows how the Dispatch routine in one driver might forward an IRP to a lower-level driver. For purposes of example, it also shows how the upper driver could store some context (in this case, a retry count) in an unused field of its own I/O stack location.

    NTSTATUS
    YyDispatchRead(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PDEVICE_EXTENSION Extension =
            DeviceObject->DeviceExtension;
        PIO_STACK_LOCATION ThisIrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PIO_STACK_LOCATION NextIrpStack =
            IoGetNextIrpStackLocation( Irp );

        *NextIrpStack = *ThisIrpStack;               /* (1) */

        ThisIrpStack->Parameters.Read.Key =          /* (2) */
            YY_RETRY_COUNT_MAXIMUM_VALUE;

        IoSetCompletionRoutine(                      /* (3) */
            Irp,
            YyReadCompletion,
            NULL,
            TRUE, TRUE, TRUE );

        return IoCallDriver(                         /* (4) */
                   Extension->LowerDevice,
                   Irp );
    }

(1) In this simple example, the upper driver just copies the entire I/O stack location from its own slot to the slot of the next lower driver. This is essentially just a pass-through operation.
(2) The upper driver's I/O Completion routine is going to use the count stored in the Parameters.Read.Key field of the upper driver's I/O stack slot to keep track of attempted retries. Since the upper driver isn't using this field for its intended purpose, it can get away with this trick.
(3) To recapture this IRP after the lower driver completes it, the upper driver attaches an I/O Completion routine. Since all three InvokeOnXxx arguments are TRUE, the I/O Manager will call this routine no matter what happens to the IRP.
(4) Finally, the upper driver sends the IRP to the lower driver.
Notice that the return value of IoCallDriver becomes the return value of the Dispatch routine. Also, notice that the Dispatch routine doesn't call IoMarkIrpPending with the IRP; that will happen in the I/O Completion routine.

15.3 WRITING I/O COMPLETION ROUTINES

An I/O Completion routine is an I/O Manager callback that lets you recapture an IRP after a lower-level driver has completed it. This section explains how to use I/O Completion routines in intermediate drivers.

Requesting an I/O Completion Callback

If you want to regain control of an IRP after it's been processed, you need to call IoSetCompletionRoutine (described in Table 15.1). This function puts the address of an I/O Completion routine in the IRP stack location associated with the next lower driver. When some lower-level driver calls IoCompleteRequest, the I/O Completion routine will execute as the IRP bubbles its way back to the top of the driver hierarchy.

Except for the driver on the bottom, each driver in the hierarchy can attach its own I/O Completion routine to an IRP. This allows everyone to receive notification when an IRP completes. The I/O Completion routines will execute in driver-stacking order, from bottom to top.

Also notice the three BOOLEAN InvokeOnXxx arguments. These allow you to specify the situations in which a particular I/O Completion routine will run. The I/O Manager uses the IoStatus.Status field of the IRP to decide whether it should call the I/O Completion routine.

Table 15.1 Function prototype for IoSetCompletionRoutine

VOID IoSetCompletionRoutine                         IRQL <= DISPATCH_LEVEL

Parameter                                           Description
IN PIRP Irp                                         Address of IRP the driver wants to track
IN PIO_COMPLETION_ROUTINE CompletionRoutine         Routine to call when a lower driver completes the IRP
IN PVOID Context                                    Argument passed to I/O Completion routine
IN BOOLEAN InvokeOnSuccess                          Call routine if IRP completes successfully
IN BOOLEAN InvokeOnError                            Call routine if IRP completes with error
IN BOOLEAN InvokeOnCancel                           Call routine if IRP is canceled
Return value                                        (none)

Execution Context

By the time it calls your I/O Completion routine, the I/O Manager has already popped the I/O stack pointer, so that the current stack location is the one belonging to your driver. Table 15.2 lists the arguments passed to an I/O Completion routine.

One tricky item is the IRQL level at which an I/O Completion routine executes. If the lower-level driver calls IoCompleteRequest from PASSIVE_LEVEL IRQL, then higher-level I/O Completion routines will also run at PASSIVE_LEVEL. On the other hand, if the lower-level driver completes the request from DISPATCH_LEVEL IRQL (from a DPC routine, for example), then higher-level I/O Completion routines will execute at DISPATCH_LEVEL. Since DISPATCH_LEVEL IRQL has more restrictions associated with it than PASSIVE_LEVEL IRQL, it's a good idea to limit the actions of an I/O Completion routine to things that can safely be done at DISPATCH_LEVEL. [5]

When an I/O Completion routine is finished, it should return one of two status codes. Returning STATUS_SUCCESS causes the IRP to continue its journey back toward the original caller. This includes the execution of any other I/O Completion routines attached by drivers above this one. This is normally the appropriate value to use if this is the original IRP that came from some caller outside the driver.
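The callback rules described so far (a routine is stored in the next lower stack slot, the stack pointer is popped before the routine runs, routines run from bottom to top, and the InvokeOnXxx choices filter which outcomes trigger a call) can be sketched as a small user-mode model. This is only an illustration and not kernel code: every model_xxx name is hypothetical, the slot numbering is an arbitrary choice, and tracking cancellation with a separate flag is an assumption, since the text above only says that the decision is driven by IoStatus.Status.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SLOTS 4

typedef int32_t model_status;               /* negative means failure, as with NTSTATUS */
#define MODEL_OK(s) ((s) >= 0)

struct model_irp;
typedef void (*model_completion)(struct model_irp *irp, void *context);

typedef struct {
    model_completion routine;               /* NULL when nothing is attached */
    void *context;
    bool on_success, on_error, on_cancel;   /* the three InvokeOnXxx choices */
} model_slot;

typedef struct model_irp {
    model_status status;                    /* stands in for IoStatus.Status */
    bool cancelled;                         /* assumption: cancel tracked separately */
    int count;                              /* number of stack slots (<= MODEL_MAX_SLOTS) */
    int current;                            /* slot owned by the driver holding the IRP */
    model_slot slots[MODEL_MAX_SLOTS];      /* slot 0 belongs to the lowest driver */
    int trace[MODEL_MAX_SLOTS];             /* 'current' as seen by each routine that ran */
    int trace_len;
} model_irp;

/* A fresh IRP starts out owned by the topmost driver. */
void model_irp_init(model_irp *irp, int count)
{
    *irp = (model_irp){0};
    irp->count = count;
    irp->current = count - 1;
}

/* Like IoSetCompletionRoutine: store the routine in the next lower slot. */
bool model_set_completion(model_irp *irp, model_completion fn, void *context,
                          bool on_success, bool on_error, bool on_cancel)
{
    if (irp->current == 0)
        return false;                       /* the bottom driver has no lower slot */
    model_slot *next = &irp->slots[irp->current - 1];
    next->routine = fn;
    next->context = context;
    next->on_success = on_success;
    next->on_error = on_error;
    next->on_cancel = on_cancel;
    return true;
}

/* Like IoCallDriver: push the stack pointer down one slot. */
void model_call_driver(model_irp *irp)
{
    if (irp->current > 0)
        irp->current--;
}

static bool model_should_invoke(const model_irp *irp, const model_slot *slot)
{
    if (slot->routine == NULL)
        return false;
    return (MODEL_OK(irp->status) && slot->on_success) ||
           (!MODEL_OK(irp->status) && slot->on_error) ||
           (irp->cancelled && slot->on_cancel);
}

/* Like IoCompleteRequest: pop one slot at a time, bottom to top. */
void model_complete(model_irp *irp, model_status status)
{
    irp->status = status;
    while (irp->current < irp->count) {
        model_slot *slot = &irp->slots[irp->current];
        irp->current++;                     /* pop before the callback runs */
        if (model_should_invoke(irp, slot))
            slot->routine(irp, slot->context);
    }
}

/* Sample routine: record which slot is current when it runs. */
void model_record(model_irp *irp, void *context)
{
    (void)context;
    irp->trace[irp->trace_len++] = irp->current;
}
```

With three stacked drivers, a routine attached by the middle driver for errors only is skipped when the request succeeds, while a routine attached by the top driver for all outcomes still runs, and it sees its own slot as the current one. The model leaves out the routine's return value, so it cannot stop completion processing part of the way up.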
To stop any further processing of this IRP, an I/O Completion routine can return STATUS_MORE_PROCESSING_REQUIRED. This value blocks the execution of any higher-level I/O Completion routines attached to the IRP. It also prevents the original caller from receiving notification that the IRP has completed. An I/O Completion routine should return this code if it either plans to send the IRP back down to a lower-level driver (as in the case of split transfer), or if the IRP was allocated by this driver and the I/O Completion routine is going to deallocate it.

Table 15.2 Function prototype for an I/O Completion routine

NTSTATUS XxIoCompletion                 IRQL == PASSIVE_LEVEL or DISPATCH_LEVEL

Parameter                               Description
IN PDEVICE_OBJECT DeviceObject          Device object that just completed the request
IN PIRP Irp                             The IRP that's being completed
IN PVOID Context                        Context that was passed to IoSetCompletionRoutine
Return value                            One of the following:
                                        • STATUS_MORE_PROCESSING_REQUIRED
                                        • STATUS_SUCCESS

[5] For example, don't mark any I/O Completion routines as paged in an alloc_text pragma.

What I/O Completion Routines Do

An intermediate driver can attach an I/O Completion routine to any IRP it sends to another driver. This includes the original IRP that the driver received from some outside caller, as well as any IRPs that the driver allocates on its own. When an I/O Completion routine executes, there are three general kinds of tasks it may need to perform.

Release the original IRP

If the completed IRP is one that came from an outside caller, it may require some driver-specific cleanup. At the very least, the I/O Completion routine for one of these IRPs needs to do the following:

1. It tests the value of the IRP's PendingReturned flag.
2. If this flag is TRUE, the I/O Completion routine puts the current I/O stack location into the pending state with a call to IoMarkIrpPending.
3. Finally, it returns a value of STATUS_SUCCESS to allow completion processing to continue.

Deallocate the IRP

If the IRP was allocated by the driver, the I/O Completion routine may be responsible for releasing it. Once again, this is a rather involved topic because the I/O Manager supports several different IRP allocation strategies. The next section of this chapter will explain all the gory details of releasing driver-allocated IRPs.

Recycle the IRP

Some intermediate drivers have to split a transfer into smaller pieces before sending it to a lower-level driver. Normally, the most efficient way to do this is to send each partial transfer to the lower driver by reusing the same IRP. To recycle an IRP, the I/O Completion routine does the following:

1. It checks the context information stored with the IRP to see if this was the last partial transfer. If the whole transfer is finished and the IRP came from an outside caller, the driver performs any necessary cleanup and returns STATUS_SUCCESS to allow further completion processing.
2. If the whole transfer is finished and this is a driver-allocated IRP, the I/O Completion routine performs any necessary cleanup, frees the IRP, and returns STATUS_MORE_PROCESSING_REQUIRED to prevent any further completion processing.
3. If there's more work to be done, the I/O Completion routine calls IoGetNextIrpStackLocation and sets up the I/O stack slot for the next lower driver.
4. Next, it uses IoSetCompletionRoutine to attach the address of this I/O Completion routine to the IRP.
5. It passes the IRP to the target Device object using IoCallDriver.
6. Finally, it returns STATUS_MORE_PROCESSING_REQUIRED to prevent any further completion processing of this IRP.

An implementation detail: During each partial transfer, an intermediate driver has to keep track of how much of the original caller's request has been satisfied.
One clever way to maintain this context information is to store it in unused fields of the intermediate driver's I/O stack location. For example, if the intermediate driver doesn't need the ByteOffset or Key fields, it can use them to hold three longwords of context data. Of course, if your driver does use these fields for their intended purpose, you can always allocate a private block and pass it as the Context argument to IoSetCompletionRoutine.

Code Fragment: An I/O Completion Routine

Below you'll find a fragment of an I/O Completion routine. It complements the YyDispatchRead function presented in the previous section of this chapter. If the request completed normally, it sends it back to the original caller. If something failed at a lower level, it retries the operation a fixed number of times.

    NTSTATUS
    YyReadCompletion(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context
        )
    {
        PIO_STACK_LOCATION ThisIrpStack =
            IoGetCurrentIrpStackLocation( Irp );
        PIO_STACK_LOCATION NextIrpStack =
            IoGetNextIrpStackLocation( Irp );
        PDEVICE_EXTENSION Extension =
            DeviceObject->DeviceExtension;

        if( NT_SUCCESS( Irp->IoStatus.Status ) ||
            ( ThisIrpStack->Parameters.Read.Key == 0 ))      /* (1) */
        {
            if( Irp->PendingReturned )                       /* (2) */
                IoMarkIrpPending( Irp );
            return STATUS_SUCCESS;
        }

        ThisIrpStack->Parameters.Read.Key--;                 /* (3) */

        *NextIrpStack = *ThisIrpStack;
        NextIrpStack->Parameters.Read.Key = 0;

        IoSetCompletionRoutine(                              /* (4) */
            Irp,
            YyReadCompletion,
            NULL,
            TRUE, TRUE, TRUE );

        IoCallDriver( Extension->LowerDevice, Irp );         /* (5) */

        return STATUS_MORE_PROCESSING_REQUIRED;
    }

(1) If the lower driver completed the IRP with a successful status code, or if the IRP failed and it has run out of retries, this driver is about to send it on its way back up the driver hierarchy.
(2) It's necessary to see if the current I/O stack location should be marked pending. Because of the asynchronous nature of IoCallDriver, this can't be done until the completion routine runs.
(3) The lower driver failed the IRP but it still has some retries left. At this point, the upper driver decrements the retry count and prepares to send the IRP back down for another try.
(4) The I/O Completion routine address has to be reset each time the IRP is recycled.
(5) Finally, the I/O Completion routine sends the IRP back to the lower driver. As its return value, the I/O Completion routine sends back STATUS_MORE_PROCESSING_REQUIRED. This prevents the I/O Manager from continuing to complete the IRP.

15.4 ALLOCATING ADDITIONAL IRPS

There are some situations where an intermediate driver may need to allocate additional IRPs to send to another driver. For example, the initialization code in one driver might want to query the capabilities of a lower-level driver by issuing an IOCTL request. The filter driver appearing later in this chapter does exactly this.

Or, for purposes of fault tolerance, the intermediate driver might want to duplicate an incoming request and send redundant copies to multiple lower-level drivers. The fault-tolerant disk driver that comes with NT Server uses this technique.

Finally, a command exposed by an intermediate driver might require lower-level drivers to perform a complex sequence of operations. For example, the class driver for a particular kind of SCSI device has to issue a whole series of commands to the SCSI port driver to implement one of the class driver's operations.

The IRP's I/O Stack Revisited

When you start to allocate additional IRPs, it's important to have a clear understanding of just how the IRP's I/O stack works. As you already know, when any driver receives an IRP from an outside caller, the I/O stack pointer points to the stack location belonging to that driver.
To retrieve this pointer, the driver simply calls IoGetCurrentIrpStackLocation.

If an intermediate driver plans to pass an incoming IRP to a lower-level driver, it has to set up the I/O stack location for the lower driver. To get a pointer to the lower driver's I/O stack slot, the intermediate driver makes a call to IoGetNextIrpStackLocation. After setting up the lower stack slot, the intermediate driver uses IoCallDriver to pass the IRP on. This function automatically pushes the I/O stack pointer so that when the lower driver calls IoGetCurrentIrpStackLocation, it will get the right address.

When the lower driver calls IoCompleteRequest, the completed IRP's I/O stack is popped. This allows an I/O Completion routine belonging to the intermediate driver to call IoGetCurrentIrpStackLocation if it needs to access its own stack location. As the IRP bubbles its way back up to the original caller, the I/O stack is automatically popped again for each driver in the hierarchy. Table 15.3 summarizes the effects of these functions on an IRP's I/O stack pointer.

To maintain consistent behavior with driver-allocated IRPs, the I/O Manager plays a little trick. When a driver allocates an IRP, the I/O Manager initializes the new IRP's I/O stack pointer so that it points at a nonexistent slot one location beyond the end of the stack. This guarantees that when the driver passes the IRP to a lower-level driver, IoCallDriver's push operation will set the stack pointer to the first real slot in the stack. This means the higher-level driver must call IoGetNextIrpStackLocation to retrieve a pointer to the I/O stack slot intended for the target driver.

Controlling the Size of the IRP Stack

When a driver receives an IRP from an outside caller, the number of I/O stack slots is determined by the StackSize field of the driver's Device object.
If an intermediate driver plans to pass incoming IRPs to a lower-level driver, it needs to set this field equal to one more than the StackSize value of the lower driver. This ensures that there will be enough I/O stack for all the drivers in the hierarchy.

Table 15.3 What various functions do to the IRP's I/O stack pointer

Working with the IRP stack pointer

Function                                Effect on the IRP stack pointer
IoGetCurrentIrpStackLocation            No change
IoGetNextIrpStackLocation               No change
IoSetNextIrpStackLocation               Pushes stack pointer one location
IoCallDriver                            Pushes stack pointer one location
IoCompleteRequest                       Pops stack pointer one location

If an intermediate driver calls IoBuildAsynchronousFsdRequest, IoBuildDeviceIoControlRequest, or IoBuildSynchronousFsdRequest to create an IRP, the I/O Manager uses the StackSize field of the target Device object (passed as an argument to all three functions) to determine the number of I/O stack locations in the new IRP. These IRPs will have enough I/O stack slots for the target driver and any drivers below it. There will not be a slot in the I/O stack for the intermediate driver itself.

If an intermediate driver uses IoAllocateIrp, ExAllocatePool, or some privately managed memory to create an IRP, the driver must explicitly specify the number of I/O stack slots in the new IRP. Again, the common practice is to use the StackSize field of the target Device object to determine the proper number of slots.

Ordinarily, an intermediate driver won't need a stack slot for itself in any IRPs it allocates. The one exception would be if the intermediate driver needed to associate some per-request context with the IRP. In that case, the driver could allocate an IRP with one extra stack slot and use the extra slot for holding private context data.
This code fragment shows how it's done:

    NewIrp = IoAllocateIrp(
                 (CCHAR)( LowerDevice->StackSize + 1 ),
                 FALSE );   // ChargeQuota: assumed value, not shown in the printed fragment
    //
    // Push the I/O stack pointer so that it points
    // at the first valid slot. Use this slot to hold
    // context information needed by the upper driver.
    //
    IoSetNextIrpStackLocation( NewIrp );
    ContextArea = IoGetCurrentIrpStackLocation( NewIrp );
    NextDriverSlot = IoGetNextIrpStackLocation( NewIrp );
    //
    // Set up next driver's I/O stack slot
    //
    NextDriverSlot->MajorFunction = IRP_MJ_XXX;
    //
    // Attach an I/O Completion routine and
    // send the IRP to someone else
    //
    IoSetCompletionRoutine(
        NewIrp,
        YyIoCompletion,
        NULL,
        TRUE, TRUE, TRUE );
    IoCallDriver( LowerDevice, NewIrp );

Creating IRPs with IoBuildSynchronousFsdRequest

The I/O Manager provides three convenience functions that simplify the process of building IRPs for standard kinds of I/O request. The first one is IoBuildSynchronousFsdRequest, and it fabricates read, write, flush, or shutdown IRPs. See Table 15.4 for a description of this function.

The number of I/O stack locations in IRPs created with this function is equal to the StackSize field of the TargetDevice argument. There's no straightforward way to leave room in the I/O stack for the intermediate driver itself.

The Buffer, Length, and StartingOffset arguments to this function are required for read and write operations. They must be NULL, 0, and NULL (respectively) for flush or shutdown operations.

IoBuildSynchronousFsdRequest automatically sets up various fields in the Parameters area of the next lower I/O stack location, so there's rarely any need to touch the I/O stack. For read or write requests, this function also allocates system buffer space or builds an MDL, depending on whether the TargetDevice does Buffered or Direct I/O.
For buffered outputs, it also copies the contents of the caller's buffer into the system buffer; at the end of a buffered input, data is automatically copied from the system buffer to the caller's buffer.

Table 15.4 Function prototype for IoBuildSynchronousFsdRequest

PIRP IoBuildSynchronousFsdRequest       IRQL == PASSIVE_LEVEL
Parameter                               Description
IN ULONG MajorFunction                  One of the following:
                                        • IRP_MJ_READ
                                        • IRP_MJ_WRITE
                                        • IRP_MJ_FLUSH_BUFFERS
                                        • IRP_MJ_SHUTDOWN
IN PDEVICE_OBJECT TargetDevice          Device object where IRP will be sent
IN OUT PVOID Buffer                     Address of I/O buffer
IN ULONG Length                         Length of buffer in bytes
IN PLARGE_INTEGER StartingOffset        Device offset where I/O will begin
IN PKEVENT Event                        Event object used to signal I/O completion
OUT PIO_STATUS_BLOCK Iosb               Receives final status of I/O operation
Return value                            • Non-NULL - address of new IRP
                                        • NULL - IRP could not be allocated

As the function name suggests, you make requests for synchronous I/O operations with the IRPs returned by IoBuildSynchronousFsdRequest. In other words, the thread that calls IoCallDriver normally blocks itself until the I/O operation completes. To do this, just pass the address of an initialized Event object when you allocate the IRP. Then, after sending the IRP to a lower-level driver with IoCallDriver, use KeWaitForSingleObject to wait for the Event object. When a lower-level driver completes the IRP, the I/O Manager will put this Event object into the Signaled state, which will awaken your driver. The I/O status block will tell you whether everything worked.

Keep two points in mind when an intermediate driver issues synchronous I/O requests to other drivers. First, drivers that perform blocking I/O can be rather sluggish because they prevent the calling thread from overlapping its I/O operations. This is contrary to the philosophy of the NT I/O architecture, so you shouldn't do it unless you really need to.
Second, the Event object used to wait for I/O completion needs to be synchronized properly or there could be a nasty collision. Consider the case where two threads in the same process issue a read request using the same handle. The YyDispatchRead routine executes in the context of the first thread and blocks itself waiting for the Event object. Then the same YyDispatchRead routine executes in the context of the other thread and reuses the same Event object to issue a second request. When the IRP for either request completes, the Event object will be set, both threads will awaken, and nothing good will happen (footnote 6). The solution is to guard the Event object with a Fast Mutex.

The I/O Manager automatically cleans up and deallocates IRPs created with IoBuildSynchronousFsdRequest after their completion processing is done. This includes releasing any system buffer space or MDL attached to the IRP. To trigger this cleanup, a lower-level driver simply has to call IoCompleteRequest.

Normally, there won't be any need to attach an I/O Completion routine to one of these IRPs, unless you need to do some driver-specific postprocessing. If you do attach an I/O Completion routine, it should return STATUS_SUCCESS when it's done. This lets the I/O Manager free the IRP.

Creating IRPs with IoBuildAsynchronousFsdRequest

The second convenience function, IoBuildAsynchronousFsdRequest, is quite similar to the first. It lets you build read, write, flush, and shutdown requests without worrying about too many of the details. The main difference is that you have to process these IRPs asynchronously. You don't have the option of stopping and waiting for the I/O to complete. Table 15.5 contains the prototype for this function.

As with IoBuildSynchronousFsdRequest, the Buffer, Length, and StartingOffset parameters to IoBuildAsynchronousFsdRequest are required for read and write operations. They must be NULL, 0, and NULL (respectively) for flush or shutdown operations.
Footnote 6: This problem isn't limited to threads in the same process, by the way. If the intermediate driver's Device object is shareable, the same issue arises if threads in two separate processes issue simultaneous requests that travel through the YyDispatchRead routine.

Table 15.5 Function prototype for IoBuildAsynchronousFsdRequest

PIRP IoBuildAsynchronousFsdRequest      IRQL <= DISPATCH_LEVEL
Parameter                               Description
IN ULONG MajorFunction                  One of the following:
                                        • IRP_MJ_READ
                                        • IRP_MJ_WRITE
                                        • IRP_MJ_FLUSH_BUFFERS
                                        • IRP_MJ_SHUTDOWN
IN PDEVICE_OBJECT TargetDevice          Device object where IRP will be sent
IN OUT PVOID Buffer                     Address of I/O buffer
IN ULONG Length                         Length of buffer in bytes
IN PLARGE_INTEGER StartingOffset        Device offset where I/O will begin
OUT PIO_STATUS_BLOCK Iosb               Receives final status of I/O operation
Return value                            • Non-NULL - address of new IRP
                                        • NULL - IRP could not be allocated

Notice that you can call IoBuildAsynchronousFsdRequest at or below DISPATCH_LEVEL IRQL. IoBuildSynchronousFsdRequest works only at PASSIVE_LEVEL.

Unlike the IRPs from IoBuildSynchronousFsdRequest, the ones from this function are not released automatically when a lower-level driver completes them. Instead, you must attach an I/O Completion routine to any IRP created with IoBuildAsynchronousFsdRequest. The I/O Completion routine calls IoFreeIrp, which releases the system buffer or MDL associated with the IRP and then deallocates the IRP itself. The return value of the I/O Completion routine should be STATUS_MORE_PROCESSING_REQUIRED.

Creating IRPs with IoBuildDeviceIoControlRequest

The last convenience function, IoBuildDeviceIoControlRequest (described in Table 15.6), simplifies the task of building IOCTL IRPs. This is useful because it's a fairly common practice for drivers of odd pieces of hardware to expose an interface composed almost entirely of IOCTLs. Some higher-level drivers (like NT's TDI network protocol drivers) take this same approach.
Table 15.6 Function prototype for IoBuildDeviceIoControlRequest

PIRP IoBuildDeviceIoControlRequest      IRQL == PASSIVE_LEVEL
Parameter                               Description
IN ULONG IoControlCode                  IOCTL code recognized by target driver
IN PDEVICE_OBJECT TargetDevice          Device object where IRP will be sent
IN PVOID InputBuffer                    Buffer of data passed to lower driver
IN ULONG InputBufferLength              Size of data buffer in bytes
OUT PVOID OutputBuffer                  Data buffer filled by lower driver
IN ULONG OutputBufferLength             Size of data buffer in bytes
IN BOOLEAN InternalDeviceIoControl      (See below)
IN PKEVENT Event                        Event object used to signal I/O completion
OUT PIO_STATUS_BLOCK Iosb               Receives final status of I/O operation
Return value                            • Non-NULL - address of new IRP
                                        • NULL - IRP could not be allocated

The InternalDeviceIoControl argument lets you specify the major function code in the target driver's I/O stack slot. FALSE produces an IRP with IRP_MJ_DEVICE_CONTROL, while TRUE causes it to be set to IRP_MJ_INTERNAL_DEVICE_CONTROL.

Also notice that you can make either synchronous or asynchronous calls with the IRPs returned by this function. If you want your Dispatch routine to stop and wait until an I/O control operation completes, simply pass the address of an initialized Event object when you allocate the IRP. Then, after sending the IRP to a lower-level driver with IoCallDriver, use KeWaitForSingleObject to wait for the Event object. When a lower-level driver completes the IRP, the I/O Manager will put this Event object into the Signaled state, which awakens your driver. The I/O status block will tell you how everything went. As with IoBuildSynchronousFsdRequest, you have to be careful about multiple threads using this Event object at the same time.

The I/O Manager automatically cleans up and deallocates IRPs created with IoBuildDeviceIoControlRequest after their completion processing is done. This includes releasing any system buffer space or MDL attached to the IRP.
To trigger this cleanup, a lower-level driver simply has to call IoCompleteRequest. Normally, there's no need to attach an I/O Completion routine to one of these IRPs, unless you need to do some driver-specific post-processing. If you do attach an I/O Completion routine, it should return STATUS_SUCCESS when it's done. This lets the I/O Manager free the IRP.

The one problem with this function is the way it handles the buffering method bits embedded in the IOCTL code. If an IOCTL code contains METHOD_BUFFERED, IoBuildDeviceIoControlRequest allocates a nonpaged pool buffer and copies the contents of the InputBuffer to it; when the IRP completes, the contents of the nonpaged pool buffer are automatically copied to OutputBuffer. So far, it behaves exactly like a Win32 DeviceIoControl call coming from a user-mode application.

But, if you specify an IOCTL code containing one of the Direct I/O methods, a nasty bug appears: IoBuildDeviceIoControlRequest always builds an MDL for the OutputBuffer address and always uses a nonpaged pool buffer for the InputBuffer address, regardless of whether the IOCTL code specifies METHOD_IN_DIRECT or METHOD_OUT_DIRECT.

Creating IRPs from Scratch

The I/O Manager routines described above are the most convenient way to work with driver-allocated IRPs. Every once in a while, however, they may not be the right thing to use. For example, if you're trying to issue a request other than read, write, flush, shutdown, or device I/O control, these functions aren't very helpful. At that point, your only option is to allocate a blank IRP and set it up by hand. The following subsections describe several ways to do this.

IRPs from IoAllocateIrp

The IoAllocateIrp function will allocate an IRP from an I/O Manager zone buffer and perform certain basic kinds of initialization (footnote 7). Your driver has to fill in the I/O stack location for the target driver and set up whatever kind of buffer the target driver is expecting to find.
The following code fragment illustrates the use of this function.

    PMDL NewMdl;
    PIRP NewIrp;
    PIO_STACK_LOCATION NextIrpStack;

    // IoAllocateIrp takes a second argument, ChargeQuota;
    // FALSE means no quota is charged to the current process.
    NewIrp = IoAllocateIrp( LowerDevice->StackSize, FALSE );

    NewMdl = IoAllocateMdl(
        MmGetMdlVirtualAddress( OriginalIrp->MdlAddress ),
        XX_SIZE_OF_BIGGEST_TRANSFER,
        FALSE,    // Primary buffer
        FALSE,    // No quota charge
        NewIrp );

    IoBuildPartialMdl(
        OriginalIrp->MdlAddress,
        NewMdl,
        MmGetMdlVirtualAddress( OriginalIrp->MdlAddress ),
        XX_SIZE_OF_BIGGEST_TRANSFER );

    NextIrpStack = IoGetNextIrpStackLocation( NewIrp );
    NextIrpStack->MajorFunction = IRP_MJ_XXX;
    NextIrpStack->Parameters.Xxx.Length = XX_SIZE_OF_BIGGEST_TRANSFER;

    NewIrp->Tail.Overlay.Thread = OriginalIrp->Tail.Overlay.Thread;

    IoSetCompletionRoutine(
        NewIrp,
        YyIoCompletion,
        NULL,
        TRUE, TRUE, TRUE );
    IoCallDriver( LowerDevice, NewIrp );

Footnote 7: There's a very serious error in the DDK documentation that's worth knowing about: The documentation clearly states that you must pass any IRPs created with IoAllocateIrp to IoInitializeIrp before you can use them. This turns out to be a lie. If you pass an IRP returned from IoAllocateIrp to IoInitializeIrp, the system will crash when your driver tries to release the IRP. So, don't do that.

One thing to mention here: If the new IRP is targeted at a disk device or a device with removable media, the intermediate driver needs to copy the contents of the original IRP's Tail.Overlay.Thread field into the new IRP. This guarantees that the system will be able to pop up a dialog box for the user if the underlying device driver calls IoSetHardErrorOrVerifyDevice.

Your driver is responsible for releasing any IRPs created with IoAllocateIrp. It also has to free any other resources (MDLs or system buffers, for example) associated with the IRP.
Normally, this cleanup occurs in the IRP's I/O Completion routine. The following code fragment shows what you need to do.

    NTSTATUS
    YyIoCompletion(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context )
    {
        IoFreeMdl( Irp->MdlAddress );
        IoFreeIrp( Irp );
        return STATUS_MORE_PROCESSING_REQUIRED;
    }

IRPs from ExAllocatePool

If, for some odd reason, you'd prefer to get your IRPs directly from nonpaged pool, you can allocate them with the standard ExAllocatePool function. Once you have the block of pool, you still need to turn it into an IRP using IoInitializeIrp. (This is the correct place to call this function.) Filling in the I/O stack location and setting up appropriate buffers or MDLs is still left to you. Here's an example of what to do; in this fragment, the lower Device object is expecting a nonpaged pool buffer rather than an MDL.

    NewIrp = ExAllocatePool(
        NonPagedPool,
        IoSizeOfIrp( LowerDevice->StackSize ) );

    IoInitializeIrp(
        NewIrp,
        IoSizeOfIrp( LowerDevice->StackSize ),
        LowerDevice->StackSize );

    NextIrpStack = IoGetNextIrpStackLocation( NewIrp );
    NextIrpStack->MajorFunction = IRP_MJ_XXX;
    NextIrpStack->Parameters.Xxx.Length = XX_BUFFER_SIZE;

    NewIrp->AssociatedIrp.SystemBuffer =
        ExAllocatePool( NonPagedPool, XX_BUFFER_SIZE );

    NewIrp->Tail.Overlay.Thread = OriginalIrp->Tail.Overlay.Thread;

    IoSetCompletionRoutine(
        NewIrp,
        YyIoCompletion,
        NULL,
        TRUE, TRUE, TRUE );
    IoCallDriver( LowerDevice, NewIrp );

Once again, it's the job of the I/O Completion routine attached to the IRP to do all the cleanup and release the IRP. The following code fragment shows you how.

    NTSTATUS
    YyIoCompletion(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context )
    {
        ExFreePool( Irp->AssociatedIrp.SystemBuffer );
        IoFreeIrp( Irp );
        return STATUS_MORE_PROCESSING_REQUIRED;
    }

Notice that you use IoFreeIrp to get rid of the IRP, even though you allocated it with ExAllocatePool. This is because a field in the IRP tells the I/O Manager whether this IRP came directly from the pool, or whether it came from the I/O Manager's private zone buffer.

IRPs from driver-managed memory

Finally, there's always the chance that you're keeping a private collection of IRPs that you've carved out of a driver-specific zone buffer or a look-aside list. This is really the same as the case where you allocate IRPs using ExAllocatePool, in that you still need to initialize each IRP using IoInitializeIrp.

The big difference is the way you release these privately managed IRPs. Since the I/O Manager doesn't know anything about your driver's memory management strategy for these IRPs, the IoFreeIrp function wouldn't know what to do with one of them. So, instead of calling IoFreeIrp, the I/O Completion routine needs to call whatever internal driver function is responsible for releasing the IRP.

Setting Up Buffers for Lower Drivers

If you use any of the preceding techniques to create IRPs from scratch, it's also your responsibility to initialize and clean up any buffers needed by those IRPs (footnote 8). How you do this will depend on whether the target Device object does Buffered or Direct I/O.

Footnote 8: This is one of the arguments in favor of using the convenience routines to build IRPs, since they handle all this nastiness on their own.

Buffered I/O requests

Here, the Dispatch routine in the intermediate driver has to call ExAllocatePool to allocate the buffer. It stores the address of this buffer in the AssociatedIrp.SystemBuffer field of the driver-allocated IRP. Later, an I/O Completion routine attached to the IRP has to release the buffer with a call to ExFreePool.

Direct I/O requests

Handling these requests means the intermediate driver has to set up an MDL describing the I/O buffer. In this case, the intermediate driver's Dispatch routine would do the following:

1. It calls IoAllocateMdl to create an empty MDL large enough to map the buffer. It stores the address of this MDL in the MdlAddress field of the driver-allocated IRP.
2. The Dispatch routine fills in the MDL. To map a portion of the buffer associated with the original caller's IRP, it calls IoBuildPartialMdl. To map system memory into the MDL, it uses MmBuildMdlForNonPagedPool.
3. It then attaches an I/O Completion routine to the driver-allocated IRP using IoSetCompletionRoutine.
4. Finally, the Dispatch routine sends the IRP to a lower-level driver with IoCallDriver.

When the lower-level driver completes the IRP, the intermediate driver's I/O Completion routine uses IoFreeMdl to release the MDL.

Keeping Track of Driver-Allocated IRPs

Intermediate drivers have to be careful about how they handle incoming I/O requests that result in multiple IRPs being sent simultaneously to some other drivers. In particular, it's important for the original incoming IRP not to be completed until all the allocated IRPs have finished their work. Exactly how the intermediate driver does this will depend on whether it performs synchronous or asynchronous I/O with the driver-allocated IRPs.

Synchronous I/O

This is the simpler of the two cases, since the intermediate driver's Dispatch routine just has to stop and wait until all the allocated IRPs have been completed. In general, the Dispatch routine would do the following:

1. It calls IoBuildSynchronousFsdRequest to create some number of driver-allocated IRPs.
2. Next, the Dispatch routine uses IoCallDriver to pass all the driver-allocated IRPs to other drivers.
3. It then calls KeWaitForMultipleObjects and freezes until all the allocated IRPs have completed.
4. Finally, it calls IoCompleteRequest with the original IRP to send it back to the caller.
Notice here that, since the original request is blocking inside the Dispatch routine itself, there's no need to mark the original IRP pending.

Asynchronous I/O

This is a somewhat more complex case because there's no central point of control where the driver can stop and wait for everything to finish. Instead, the intermediate driver has to attach an I/O Completion routine to each driver-allocated IRP, and the completion routine will have to decide whether it's time to complete the original caller's IRP.

Here's what happens in the Dispatch routine of an intermediate driver using this kind of freewheeling approach:

1. It puts the original caller's IRP in the pending state by calling IoMarkIrpPending.
2. Next, the Dispatch routine uses one of the methods described in the previous section to allocate some additional IRPs.
3. It attaches an I/O Completion routine to each of these IRPs with IoSetCompletionRoutine. When it makes this call, the Dispatch routine passes a pointer to the original caller's IRP as the Context argument.
4. The Dispatch routine stores a count of outstanding allocated IRPs in an unused field of the original IRP. The Key field in the current I/O stack location's Parameters union is one possible place.
5. Next, it uses IoCallDriver to pass all the IRPs to other drivers.
6. Finally, the Dispatch routine passes back STATUS_PENDING as its return value. This is necessary because the original IRP isn't yet ready for completion processing.

As each of the other drivers completes one of these IRPs, the intermediate driver's I/O Completion routine executes. That routine does the following:

1. First, it performs whatever cleanup is necessary and deletes the driver-allocated IRP.
2. The I/O Completion routine calls ExInterlockedDecrementLong to decrement the count of outstanding IRPs contained in the original caller's IRP. (Remember, it received a pointer to this original IRP as its Context argument.)
3. If the count equals zero, then this is the last outstanding driver-allocated IRP. In that case, the I/O Completion routine completes the original IRP by calling IoCompleteRequest.
4. Finally, it returns STATUS_MORE_PROCESSING_REQUIRED to prevent any further completion processing of the driver-allocated IRP (which has just been deleted).

15.5 WRITING FILTER DRIVERS

A filter driver is a special type of intermediate driver. What sets filters apart from the layered drivers described earlier in this chapter is that they are invisible. They sit on top of some other driver and intercept requests directed at the lower driver's Device objects. Users of the lower driver are completely unaware that this is going on. Some of the things you can do with filters include the following:

• Filters let you modify some aspects of an existing driver's behavior without rewriting the whole thing. SCSI filter drivers (described back in Chapter 1) work this way.
• They make it easier to hide the limitations of lower-level device drivers. For example, a filter could split large transfers into smaller pieces before passing them on to a driver with transfer size limits.
• Filters allow you to add features like compression or encryption to a device without modifying the underlying device driver or the programs that use the device.
• They let you add or remove expensive behavior like performance monitoring that you don't want included in a driver all the time. The disk performance monitoring tools in NT work this way.

The rest of this section explains how to write filter drivers. As you read it, keep in mind that things like driver-allocated IRPs and I/O Completion routines work the same way in a filter driver as they do in a regular layered driver.

How Filter Drivers Work

The main difference between filter drivers and other layered drivers is in the Device objects they create.
Whereas a layered driver exposes Device objects with their own unique names, a filter driver's Device objects have no names at all. Filter drivers work by attaching one of these nameless Device objects to a Device object created by some lower-level driver. Figure 15.2 illustrates this relationship. In the diagram, YYDRIVER has attached a filter Device object to XX0, one of XXDRIVER's Device objects. Any IRPs sent to XX0 are automatically rerouted to the Dispatch routines in YYDRIVER.

[Figure 15.2 How filter drivers work]

Here's how it works.

1. The DriverEntry routine in the filter driver creates an invisible Device object and attaches it to a named Device object belonging to another driver.
2. A client of the lower-level driver opens a connection to XX0. This can be a user-mode program calling CreateFile to get a handle, or a kernel-mode client calling IoGetDeviceObjectPointer. In either case, the I/O Manager actually opens a connection between the client and the filter driver's invisible Device object.
3. When the client sends an I/O request to XX0, the I/O Manager sends it to the filter driver's unnamed Device object instead. The I/O Manager uses the MajorFunction table of the filter's Driver object to select an appropriate Dispatch routine.
4. The Dispatch routines in the filter driver either process the IRP on their own and complete it immediately, or they send the IRP down to XX0 with IoCallDriver. If the filter driver needs to regain control of the IRP when the lower-level driver completes it, the filter can associate an I/O Completion routine with the IRP.

Filters can also be layered above other filters. If you try to attach a filter to an already filtered Device object, the new filter simply gets layered on top of the highest existing filter. So, you can have essentially any number of filter levels.
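The layering rule in the last paragraph can be illustrated with a small user-mode model. This is only a sketch under stated assumptions, not the I/O Manager's actual implementation: SIM_DEVICE, SimTopOfStack, and SimAttach are hypothetical names invented for the illustration, and the single AttachedDevice link stands in for whatever bookkeeping the system really keeps. It shows why attaching to an already filtered device lands the new filter above the highest existing one, and why a request aimed at the named device ends up at the topmost filter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Device object (user-mode model). */
typedef struct SIM_DEVICE {
    const char        *Name;            /* NULL models an unnamed filter device */
    struct SIM_DEVICE *AttachedDevice;  /* device layered directly above, if any */
} SIM_DEVICE;

/* Walk upward from Target and return the highest device layered above it.
 * In this model, that is the device a new request would be routed to. */
static SIM_DEVICE *SimTopOfStack(SIM_DEVICE *Target)
{
    while (Target->AttachedDevice != NULL) {
        Target = Target->AttachedDevice;
    }
    return Target;
}

/* Attach Filter above Target. If Target is already filtered, the new filter
 * goes on top of the highest existing filter. Returns the device that the
 * filter actually ended up sitting on. */
static SIM_DEVICE *SimAttach(SIM_DEVICE *Filter, SIM_DEVICE *Target)
{
    SIM_DEVICE *Top = SimTopOfStack(Target);

    Top->AttachedDevice = Filter;
    Filter->AttachedDevice = NULL;
    return Top;
}
```

In this model, attaching a first filter to XX0 places it directly on XX0, while a second attach request that also names XX0 places the new filter on the first one, so requests for XX0 reach the newest filter first.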
Initialization and Cleanup in Filter Drivers

Like every other kernel-mode driver, a filter driver must have a main entry point called DriverEntry. If the driver is to be unloaded while the system is running, it needs an Unload routine as well. The following subsections describe what these routines have to do.

DriverEntry routine

The initialization sequence in a filter driver will follow one of two basic patterns. The first possibility is that the filter needs to intercept IRPs directed at all the Device objects created by a lower-level driver. In that case, the filter's DriverEntry routine will perform these steps:

1. It calls IoGetDeviceObjectPointer to get a pointer to one of the Device objects belonging to the lower-level driver.
2. From this Device object, the filter's DriverEntry routine gets a pointer to the target Driver object. It uses this pointer to scan the MajorFunction table of the target Driver object and make sure that every function code supported by the target is also supported by the filter driver.
3. Next, DriverEntry uses the DeviceObject field of the target Driver object to get the first target Device object.
4. The filter calls IoCreateDevice to create a filter Device object for this target Device object. This filter Device object has no NT name, nor does it have a symbolic link to give it a Win32 name.
5. It then calls IoAttachDeviceByPointer to attach the new filter Device object to the target Device object.
6. It stores the address of the target Device object in the Device Extension of the filter Device object. Other parts of the filter driver will need this pointer to call the target driver.
7. Next, DriverEntry copies the DeviceType and Characteristics fields from the target Device object to the filter Device object. It also copies the DO_DIRECT_IO and DO_BUFFERED_IO bits from the target Device object's Flags field.
This guarantees that the filter will look the same and have the same buffering strategy as the target driver.
8. It uses the NextDevice field of the target Device object to get the next Device object in the chain and repeats steps 4-7.
9. Finally, it calls ObDereferenceObject to decrement the reference count on the File object returned by IoGetDeviceObjectPointer.

The second possibility is that the filter driver only wants to capture I/O requests sent to a specific Device object belonging to a lower-level driver. In that case, the filter's DriverEntry routine performs the following steps.

1. It calls IoCreateDevice to create a filter Device object. This object has no NT name, nor does it have a symbolic link to give it a Win32 name.
2. DriverEntry uses IoAttachDevice to connect the filter Device object to a specific target Device object. This function takes the case-sensitive NT name of the target device (for example, \Device\XX0) and a pointer to the filter Device object. After making the attachment, it returns a pointer to the target Device object.
3. It stores the address of the target Device object in the Device Extension of the filter Device object.
4. Next, DriverEntry copies the DeviceType and Characteristics fields from the target Device object to the filter Device object. It also copies the DO_DIRECT_IO and DO_BUFFERED_IO bits from the target Device object's Flags field.
5. From the target Device object, the filter's DriverEntry routine gets a pointer to the target Driver object. It uses this pointer to scan the MajorFunction table of the target Driver object and make sure that every function code supported by the target is also supported by the filter driver.

Unload routine

A filter driver's Unload routine has to disconnect the filter and target Device objects. It does this by calling IoDetachDevice and passing a pointer to the target Device object.
Once the filter Device object has been detached, the Unload routine calls IoDeleteDevice to get rid of it. If the filter driver has attached itself to a number of target Device objects, it needs to repeat this procedure for each filter Device object.

What Happens behind the Scenes

A lot of undocumented activity occurs when a filter driver attaches itself to a target Device object. In response to an IoAttachDeviceByPointer call, the I/O Manager performs the following steps.

1. It sends an IRP to the target Device object. This IRP contains the function code IRP_MJ_CREATE. There are enough I/O stack locations in this IRP for the target driver plus any other drivers layered beneath it. This IRP does not pass through the filter driver's MajorFunction dispatch table.
2. Next, the I/O Manager sets the filter Device object's StackSize field to one greater than the StackSize field of the target Device object. This guarantees that IRPs created for the filter will have enough I/O stack locations for any lower-level drivers in the hierarchy.
3. It also sets the AlignmentRequirement field of the filter Device object equal to the AlignmentRequirement field of the target Device object.
4. The I/O Manager then sends an IRP to the filter Device object. This IRP contains the function code IRP_MJ_CLOSE. Regardless of what Dispatch routines are registered in the filter driver's MajorFunction table, this IRP_MJ_CLOSE IRP is not preceded by an IRP_MJ_CLEANUP IRP.
5. Finally, the I/O Manager returns the address of the target Device object to the caller of IoAttachDeviceByPointer.

Unlike the attach function, the IoDetachDevice function doesn't send any self-generated IRPs to the target Device object, nor does it reset the StackSize field of the filter Device object.
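Steps 2 and 3 of the attach sequence amount to a small piece of field bookkeeping, which the following user-mode sketch models. It is an illustration under assumptions rather than real kernel code: SIM_DEVICE_FIELDS and SimAttachBookkeeping are hypothetical names, and the struct carries only the two fields the text mentions, using plain C types in place of the DDK's own.

```c
#include <assert.h>

/* Hypothetical, user-mode stand-in for the two Device object fields involved. */
typedef struct SIM_DEVICE_FIELDS {
    signed char   StackSize;             /* I/O stack slots an IRP needs at this level */
    unsigned long AlignmentRequirement;  /* buffer alignment value */
} SIM_DEVICE_FIELDS;

/* Model of the attach-time bookkeeping described in steps 2 and 3:
 * the filter needs one more stack slot than its target, and it
 * inherits the target's alignment requirement unchanged. */
static void SimAttachBookkeeping(SIM_DEVICE_FIELDS *Filter,
                                 const SIM_DEVICE_FIELDS *Target)
{
    Filter->StackSize = (signed char)(Target->StackSize + 1);
    Filter->AlignmentRequirement = Target->AlignmentRequirement;
}
```

Applying the same rule each time a filter is stacked on another filter makes StackSize grow by one per level, which is why an IRP built for the topmost filter has a slot for every driver beneath it.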
Making the Attachment Transparent

Once a filter has attached itself to a target driver, any I/O requests sent to the target have to pass through the Dispatch routines of the filter driver first. If the MajorFunction table of the filter Driver object doesn't support the same set of IRP_MJ_XXX codes as the target driver, clients of the target may experience problems when the filter is attached. Specifically, some types of requests that work without the filter will be rejected as illegal operations when the filter is in place.

To avoid this kind of inconsistency, the filter driver's MajorFunction table must contain a Dispatch routine for every IRP_MJ_XXX function supported by the target driver. Even if the filter isn't interested in modifying a particular major function code, it still has to supply a dummy Dispatch routine that just passes the IRP on to the target driver.

The best way to set this up is for the filter driver to scan the MajorFunction table of the target Driver object. If an entry in the target driver's table contains a pointer to _IopInvalidDeviceRequest (footnote 9), then the corresponding IRP_MJ_XXX code is unsupported; if it contains anything else, then the target driver supports the function code. In that case, the filter driver has to put a Dispatch routine in the corresponding MajorFunction slot of its own Driver object. The sample driver in the next section shows how to do this.

15.6 CODE EXAMPLE: A FILTER DRIVER

This example shows how a basic filter driver (called YYDRIVER) intercepts all requests intended for a lower-level driver (XXDRIVER). The purpose of the filter is to hide the lower driver's limited output transfer size. To do this, it breaks large outputs into smaller pieces. It also overrides an IOCTL from the lower driver that returns the maximum size of an output buffer. All other major function codes supported by the lower driver are simply passed through from the filter.
You can find the code for this example in the CH15\FILTER\DRIVER directory on the disk that accompanies this book. Code for the dummy device driver sitting below it is in CH15\LOWER\DRIVER.

YYDRIVER.H - Driver Data Structures

Here's the Device Extension used by the filter driver. Notice that it contains a pointer to the lower driver's Device object. The filter uses this to send IRPs to the lower driver.

    typedef struct _DEVICE_EXTENSION {
        PDEVICE_OBJECT DeviceObject;    // Back pointer
        PDEVICE_OBJECT TargetDevice;
        XX_BUFFER_SIZE_INFO BufferInfo;
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

INIT.C - Initialization Code

Initialization in this filter follows the pattern described in the previous section of this chapter. This driver takes the general approach of intercepting I/O requests for all the Device objects created by the lower driver.

DriverEntry

This function is responsible for driver-level initialization. It uses one of the lower driver's Device objects to locate all Device objects belonging to the lower driver. It uses a helper function to attach filter Device objects to each one. It also sets up the filter's MajorFunction table by scanning the slots in the lower driver's table.

Footnote 9: Remember from Chapter 8 that this is the I/O Manager routine that rejects an IRP with an unwanted function code. This is the default value for any slot in the MajorFunction table.

    NTSTATUS
    DriverEntry(
        IN PDRIVER_OBJECT DriverObject,
        IN PUNICODE_STRING RegistryPath )
    {
        PDEVICE_OBJECT TargetDevice;
        UNICODE_STRING TargetDeviceName;
        PDRIVER_OBJECT TargetDriver;
        PDRIVER_DISPATCH EmptyDispatchValue;
        XX_BUFFER_SIZE_INFO BufferInfo;
        PFILE_OBJECT FileObject;
        NTSTATUS status;
        ULONG i;

        EmptyDispatchValue =                                    // (1)
            DriverObject->MajorFunction[ IRP_MJ_CREATE ];

        //
        // Export other driver entry points...
        //
        DriverObject->DriverUnload = YyDriverUnload;

        DriverObject->MajorFunction[ IRP_MJ_WRITE ] =           // (2)
            YyDispatchWrite;
        DriverObject->MajorFunction[ IRP_MJ_DEVICE_CONTROL ] =
            YyDispatchDeviceIoControl;

        RtlInitUnicodeString(
            &TargetDeviceName,
            TARGET_DEVICE_NAME );

        status = IoGetDeviceObjectPointer(                      // (3)
            &TargetDeviceName,
            FILE_ALL_ACCESS,
            &FileObject,
            &TargetDevice );
        if( !NT_SUCCESS( status ))
        {
            return status;
        }

        YyGetBufferLimits( TargetDevice, &BufferInfo );

        TargetDriver = TargetDevice->DriverObject;

        for( i = 0; i <= IRP_MJ_MAXIMUM_FUNCTION; i++ )         // (4)
        {
            if(( TargetDriver->MajorFunction[i] != EmptyDispatchValue ) &&
               ( DriverObject->MajorFunction[i] == EmptyDispatchValue ))
            {
                DriverObject->MajorFunction[i] = YyDispatchPassThrough;
            }
        }

        TargetDevice = TargetDriver->DeviceObject;              // (5)
        while( TargetDevice != NULL )
        {
            status = YyAttachFilter(
                DriverObject,
                TargetDevice,
                &BufferInfo );
            if( !NT_SUCCESS( status ))
            {
                YyDriverUnload( DriverObject );
                break;
            }
            TargetDevice = TargetDevice->NextDevice;
        }

        ObDereferenceObject( FileObject );                      // (6)
        return status;
    }

(1) The first step is to get the contents of an empty slot in the filter's MajorFunction table. This is actually the address of an internal system routine called _IopInvalidDeviceRequest. We can find its current value by looking in any slot of the filter's own table that it hasn't filled in yet.
(2) Next, overwrite slots in the filter's MajorFunction table that correspond to functions the filter wants to intercept and modify. In this driver, only write and IOCTL functions are being fooled with.
(3) Using the NT name of any device belonging to the lower driver, get a pointer to the Device object itself.
It doesn't really matter which one, since it's only being used to query buffer size limits and to get a pointer to the lower Driver object. 0 In this loop, see which IRP_MJ_XXX function codes the lower driver responds to. If the lower driver processes a given code and the filter Chapter 15 384 Higher-Level Drivers doesn't explicitly intercept that code, fill the corresponding slot in the fil ter 's MajorFunction table with the address of a generic pass-through Dis patch routine. 0 Now, run the list of all Device objects attached to the lower Driver object. For each one, create and attach an invisible filter Device object. © Finally, decrement the reference count on the unused File object and return the most recent status value. This is either STATUS_SUCCESS or some error code from YyAttachFilter. YyAttachFilter This is a little helper function that does the grunt work associated with creating and attaching a filter Device object to a specific lower level Device object. s tat i c NTSTATUS YyAt tachF i l te r ( IN PDRIVER_OBJECT F i l t erDriver , IN PDEVICE_OBJECT TargetDevi c e , IN PXX_BUFFER_S I Z E_INFO Bu f f erinfo ) PDEVICE_OBJECT F i l t erDevi c e ; PDEVICE_EXTENS ION F i l terExt ens i on ; ULONG TargetMethod ; NTSTATUS s tatus ; s tatus = I oCreat eDevi c e ( 0 F i l t e rDriver , s i z e o f ( DEVICE_EXTENS ION ) , NULL , F ILE_DEVICE_UNKNOWN , 0' TRUE , &Fi l terDevice ) ; i f ( ! NT_SUCCE S S ( s tatus ) ) { return s tatus ; } s tatus = I oAttachDeviceByPo inter ( @ F i l t erDevi c e , TargetDevi c e ) ; i f ( ! NT_SUCCESS ( s tatus ) ) { Sec. 15.6 Code Example: A Filter Driver 385 I oDe l e t eDevi c e ( F i l t erDevice ) ; re turn s tatus ; F i l te rExtens i on = F i l t erDevi c e - >Devi c eExtens i on ; @ F i l terDevi c e ; F i l te rExtens i on- >Devi c eObj ect F i l terExtens i on - > TargetDevice = TargetDevi ce ; F i l terExt ens i on - > Bu f f erinf o . 
MaxWri teLength Buf f erinfo- >MaxWri teLength ; F i l t erExtens ion-> Bu f f erinf o . MaxReadLength = Bu f fe r i n f o - >MaxReadLength ; F i l te rDevi c e - >DeviceType = TargetDevi c e - >Devic eType ; 0 F i l terDevi c e - >Charac t e r i s ti c s = Targe tDevi c e - >Charac t er i s t i c s ; F i l t erDevi c e - > F l ags I = ( TargetDevic e - > F l ags & ( DO_BUFFERED_IO I DO_DIRECT_IO ) ) ; 0 re turn STATUS_SUCCES S ; 0 Create a Device object without an NT name. It doesn't matter what its type or characteristics are, since they'll be copied from the lower-level Device object. @ Attach the invisible Device object to the lower-level Device object. See the previous section in this chapter for a description of all the things that hap pen when you make this call. @ Set up the filter Device object's Device Extension structure. This includes storing the transfer size limitations queried from the lower driver. 0 Copy various items from the lower-level Device object into the filter Device object. This is necessary to make the presence of the filter as trans parent as possible. 0 Last, select the same buffering strategy as the one used by the lower-level Device object. YyGetBufferlimits This is an even tinier helper function that queries the lower-level driver for information about its buffer size limits. It shows how to make a synchronous IOCTL call from one driver to another. 
    static VOID
    YyGetBufferLimits(
        IN PDEVICE_OBJECT TargetDevice,
        IN OUT PXX_BUFFER_SIZE_INFO BufferInfo
        )
    {
        KEVENT IoctlComplete;
        IO_STATUS_BLOCK Iosb;
        PIRP Irp;
        NTSTATUS status;

        KeInitializeEvent(
            &IoctlComplete,
            NotificationEvent,
            FALSE);

        Irp = IoBuildDeviceIoControlRequest(
                IOCTL_XX_GET_MAX_BUFFER_SIZE,
                TargetDevice,
                NULL,
                0,
                BufferInfo,
                sizeof(XX_BUFFER_SIZE_INFO),
                FALSE,
                &IoctlComplete,
                &Iosb);

        IoCallDriver(TargetDevice, Irp);

        KeWaitForSingleObject(
            &IoctlComplete,
            Executive,
            KernelMode,
            FALSE,
            NULL);
    }

DISPATCH.C - Filter Dispatch Routines

Here are the Dispatch routines for the filter driver. Only two major function codes are actually modified by the filter. All the others are passed directly to the lower-level driver.

YyDispatchWrite

The lower driver has a limit on the maximum size of an output operation. The filter hides this by breaking writes into smaller pieces. This Dispatch routine and the corresponding I/O Completion routine do the work of splitting the transfer.

    NTSTATUS
    YyDispatchWrite(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PDEVICE_EXTENSION FilterExtension =
            DeviceObject->DeviceExtension;
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);
        PIO_STACK_LOCATION NextIrpStack =
            IoGetNextIrpStackLocation(Irp);
        ULONG MaxTransfer =
            FilterExtension->BufferInfo.MaxWriteLength;
        ULONG BytesRequested =
            IrpStack->Parameters.Write.Length;

        if (BytesRequested == 0)                            /* (1) */
        {
            Irp->IoStatus.Status = STATUS_SUCCESS;
            Irp->IoStatus.Information = 0;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_SUCCESS;
        }

        if (BytesRequested <= MaxTransfer)                  /* (2) */
        {
            return YyDispatchPassThrough(DeviceObject, Irp);
        }

        NextIrpStack->MajorFunction = IRP_MJ_WRITE;         /* (3) */
        NextIrpStack->Parameters.Write.Length = MaxTransfer;

        IrpStack->Parameters.Write.ByteOffset.HighPart =    /* (4) */
            BytesRequested;

        IrpStack->Parameters.Write.ByteOffset.LowPart =     /* (5) */
            (ULONG) Irp->AssociatedIrp.SystemBuffer;

        IoSetCompletionRoutine(                             /* (6) */
            Irp,
            YyWriteCompletion,
            NULL,
            TRUE, TRUE, TRUE);

        //
        // Pass the IRP to the target device
        //
        return IoCallDriver(                                /* (7) */
                    FilterExtension->TargetDevice,
                    Irp);
    }

(1) Check for zero-length transfers and complete them right here.

(2) If the requested length is within the lower driver's acceptable limits, just send the IRP right on through.

(3) Otherwise, set up the lower driver's I/O stack location in this IRP to transfer as much as possible in a single operation.

(4) Use the high-order part of the ByteOffset field in the filter driver's I/O stack location to hold the number of bytes remaining in the original caller's request. This is all right because this field isn't being used for anything else in this driver. Initially, this is the same as the number of bytes requested in the whole transfer.

(5) Save the original system buffer address in the low-order (unsigned) part of the ByteOffset field.

(6) Set up an I/O Completion routine to continue working on the split transfer. All the necessary context is stored somewhere in the IRP, so there's no need to pass any other context block.

(7) Finally, pass the IRP to the lower-level driver and begin the first partial transfer operation.
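The arithmetic behind the split transfer can be tried out in ordinary user-mode C. The following is only a sketch under stated assumptions: WRITE_LOG, LowerWrite, and SplitWrite are hypothetical names that do not appear in the sample driver, and the loop is synchronous, whereas the real filter issues each piece from its I/O Completion routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what the "lower driver" was asked to do. */
typedef struct {
    size_t calls;    /* number of partial writes issued */
    size_t largest;  /* largest single write length seen */
    size_t total;    /* total bytes written */
} WRITE_LOG;

/* Hypothetical stand-in for the lower driver's write path. */
static void LowerWrite(WRITE_LOG *log, const unsigned char *buf, size_t len)
{
    (void)buf;                      /* the data itself is not inspected here */
    log->calls++;
    log->total += len;
    if (len > log->largest) {
        log->largest = len;
    }
}

/*
 * Break one request into pieces no larger than max_transfer.
 * Returns the number of bytes handed to the lower level.
 */
static size_t SplitWrite(WRITE_LOG *log, const unsigned char *buf,
                         size_t requested, size_t max_transfer)
{
    size_t remaining = requested;

    assert(max_transfer > 0);       /* a zero limit would never make progress */

    while (remaining > 0) {
        /* Assume the rest fits in one piece; shrink it if that is too optimistic. */
        size_t chunk = remaining < max_transfer ? remaining : max_transfer;

        LowerWrite(log, buf, chunk);
        buf += chunk;               /* advance past the data just sent */
        remaining -= chunk;
    }
    return requested - remaining;
}
```

With a limit of 4 bytes, a 10-byte request goes down as three pieces of 4, 4, and 2 bytes, which mirrors the bookkeeping the filter keeps in its own I/O stack location.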
YyDispatchDeviceIoControl

To further hide the limitations of the lower-level driver, the filter intercepts IOCTL queries about the driver's maximum transfer size. Instead of returning the lower-level driver's limit values, it lies and says there are no limits. Any other kind of IOCTL function is passed through.

    NTSTATUS
    YyDispatchDeviceIoControl(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);
        PXX_BUFFER_SIZE_INFO BufferInfo;

        if (IrpStack->Parameters.DeviceIoControl.IoControlCode
                == IOCTL_XX_GET_MAX_BUFFER_SIZE)            /* (1) */
        {
            BufferInfo = (PXX_BUFFER_SIZE_INFO)
                Irp->AssociatedIrp.SystemBuffer;

            BufferInfo->MaxWriteLength = XX_NO_BUFFER_LIMIT;
            BufferInfo->MaxReadLength = XX_NO_BUFFER_LIMIT;

            Irp->IoStatus.Information =
                sizeof(XX_BUFFER_SIZE_INFO);
            Irp->IoStatus.Status = STATUS_SUCCESS;
            IoCompleteRequest(Irp, IO_NO_INCREMENT);
            return STATUS_SUCCESS;
        }
        else                                                /* (2) */
        {
            return YyDispatchPassThrough(DeviceObject, Irp);
        }
    }

(1) Intercept the buffer-size IOCTL code used by the lower-level driver and tell the caller that there are no size limits.

(2) If it's any other kind of IOCTL, just send it on to the lower driver for processing.

YyDispatchPassThrough

This is the "none of the above" Dispatch routine. It simply passes everything on to the lower-level driver. It attaches a generic I/O Completion routine to handle making the IRP pending.
    NTSTATUS
    YyDispatchPassThrough(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp
        )
    {
        PDEVICE_EXTENSION FilterExtension =
            DeviceObject->DeviceExtension;
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);
        PIO_STACK_LOCATION NextIrpStack =
            IoGetNextIrpStackLocation(Irp);
        NTSTATUS status;

        //
        // Copy args to next level
        //
        *NextIrpStack = *IrpStack;

        //
        // Set up Completion routine to handle
        // marking the IRP pending.
        //
        IoSetCompletionRoutine(
            Irp,
            YyGenericCompletion,
            NULL,
            TRUE, TRUE, TRUE);

        //
        // Pass the IRP to the target
        //
        return IoCallDriver(
                    FilterExtension->TargetDevice,
                    Irp);
    }

COMPLETE.C - I/O Completion Routines

The functions in this file handle all the I/O completion performed by the filter driver.

YyWriteCompletion

This is the real workhorse routine. Its job is to perform all the additional partial transfers needed to satisfy the original caller's request. If there's an error, or when the whole transfer is finished, it allows the IRP to continue its journey back up the driver stack. Otherwise, it sets up the IRP for another small transfer and sends it to the lower driver.

    NTSTATUS
    YyWriteCompletion(
        IN PDEVICE_OBJECT DeviceObject,
        IN PIRP Irp,
        IN PVOID Context
        )
    {
        PDEVICE_EXTENSION FilterExtension =
            DeviceObject->DeviceExtension;
        PIO_STACK_LOCATION IrpStack =
            IoGetCurrentIrpStackLocation(Irp);
        PIO_STACK_LOCATION NextIrpStack =
            IoGetNextIrpStackLocation(Irp);
        ULONG TransferSize = Irp->IoStatus.Information;
        ULONG BytesRequested =
            IrpStack->Parameters.Write.Length;
        ULONG BytesRemaining = (ULONG)
            IrpStack->Parameters.Write.ByteOffset.HighPart;
        ULONG MaxTransfer =
            FilterExtension->BufferInfo.MaxWriteLength;
        NTSTATUS status;

        if (NT_SUCCESS(Irp->IoStatus.Status))               /* (1) */
        {
            BytesRemaining -= TransferSize;
            IrpStack->Parameters.Write.ByteOffset.HighPart =
                BytesRemaining;
        }

        if (NT_SUCCESS(Irp->IoStatus.Status)                /* (2) */
            && BytesRemaining > 0)
        {
            // Advance the buffer pointer without casting an lvalue
            Irp->AssociatedIrp.SystemBuffer =               /* (3) */
                (PUCHAR) Irp->AssociatedIrp.SystemBuffer
                    + TransferSize;

            TransferSize = BytesRemaining;                  /* (4) */
            if (TransferSize > MaxTransfer)
            {
                TransferSize = MaxTransfer;
            }

            NextIrpStack->MajorFunction = IRP_MJ_WRITE;
            NextIrpStack->Parameters.Write.Length =
                TransferSize;

            IoSetCompletionRoutine(                         /* (5) */
                Irp,
                YyWriteCompletion,
                NULL,
                TRUE, TRUE, TRUE);

            // The target Device object lives in the Device Extension
            IoCallDriver(FilterExtension->TargetDevice, Irp);
            return STATUS_MORE_PROCESSING_REQUIRED;
        }
        else                                                /* (6) */
        {
            Irp->AssociatedIrp.SystemBuffer = (PVOID)       /* (7) */
                IrpStack->Parameters.Write.ByteOffset.LowPart;

            Irp->IoStatus.Information =                     /* (8) */
                BytesRequested - BytesRemaining;

            if (Irp->PendingReturned)                       /* (9) */
            {
                IoMarkIrpPending(Irp);
            }
            return STATUS_SUCCESS;
        }
    }

(1) If the current transfer worked, reduce the count of bytes left to send and save the new count in an unused part of the filter driver's I/O stack location.

(2) If there's more data left to transfer, set up the next partial output operation.

(3) Increment the pointer into the system buffer to account for the data transfer that's just completed.

(4) Calculate the size of the next partial transfer. Start by assuming it can all be done in a single operation. Reduce that expectation if it proves to be too optimistic.

(5) After setting up the I/O stack location for the lower-level driver, attach this I/O Completion routine to catch the operation when it finishes.

        ...
        if (Irp->PendingReturned)
        {
            IoMarkIrpPending(Irp);
        }
        return STATUS_SUCCESS;
    }

15.7 WRITING TIGHTLY COUPLED DRIVERS

Unlike layered and filter drivers, tightly coupled drivers don't use the I/O Manager's IoCallDriver function for most of their communications. Instead, they define some kind of private calling interface. The advantage of this approach is that it's usually faster than the IRP-passing model supported by the I/O Manager. In trade for improved performance, however, you have to pay much more attention to the mechanics of the interface. Also, unless the details of the interface are well documented, it's difficult for drivers from different vendors to work with each other this way.

How Tightly Coupled Drivers Work

Since the interface between two tightly coupled drivers is completely determined by the driver designer, it's impossible to give a single, unified description of how all tightly coupled drivers work. Instead, this subsection presents some general architectural guidelines.[10]

[Figure 15.3: How tightly coupled drivers work]

[10] For some concrete examples, see the source code for the mouse and keyboard drivers that comes with the DDK.

Figure 15.3 shows one common method of tightly coupling a pair of drivers. In this picture, the lower driver has exposed a special setup function in the form of an IRP_MJ_INTERNAL_DEVICE_CONTROL IOCTL. During the upper driver's initialization, it calls this IOCTL function to retrieve a table of function pointers from the lower driver. When the upper driver needs the services of the lower driver, it calls one of the functions in this table directly, rather than using IoCallDriver.
Before unloading, the upper driver calls another function in the function table to disconnect it from the lower driver.

Initialization and Cleanup in Tightly Coupled Drivers

The following subsections describe in general terms how a pair of tightly coupled drivers might initialize and unload. Of course, the exact steps will depend on the architecture chosen by the driver designer.

Lower DriverEntry routine

Assuming the lower driver manages some specific piece of hardware, its DriverEntry routine will perform the following steps.

1. Using the techniques described in Chapter 7, it finds and allocates any hardware for which it is responsible.

2. DriverEntry adds an IRP_MJ_INTERNAL_DEVICE_CONTROL Dispatch routine to the Driver object's MajorFunction table. One of the IOCTLs supported by this function code will be to export a table of pointers to various functions in the lower driver.

3. Next, it calls IoCreateDevice to build a Device object. Although this object has an NT name, it does not have a Win32 symbolic link. This Device object is used by the upper driver to establish its initial connection with the lower driver.

4. Finally, DriverEntry does any other driver-specific initialization. For example, it might set up a ring of buffers that it will share with its higher-level clients.

Upper DriverEntry routine

The upper driver makes its initial contact with the lower driver using the standard I/O Manager interface described earlier in this chapter. This is what its DriverEntry routine does.

1. It calls IoGetDeviceObjectPointer to get a pointer to the lower driver's Device object. As with a layered driver, this is followed by a call to ObReferenceObjectByPointer to increment the pointer reference count of the lower Device object, and a call to ObDereferenceObject to decrement the reference count of the File object returned by IoGetDeviceObjectPointer.

2. Next, DriverEntry issues a synchronous IOCTL request to the lower Device object. This IOCTL returns the address of the lower driver's table of exported functions.

3. It creates one or more Device objects with IoCreateDevice. If the upper driver is exposing these objects to user-mode applications, it calls IoCreateSymbolicLink to give them Win32 names.

4. Finally, DriverEntry stores the address of the lower driver's function table in the Device Extension of the upper Device objects.

Upper Unload routine

When the upper driver is stopped, its Unload routine should perform the following general steps.

1. It releases any resources it might have acquired from the lower driver. For example, if it received a buffer from the lower driver, it returns it.

2. Next, the Unload routine issues a synchronous IOCTL to the lower Device object. This notifies the lower driver that the upper one is disconnecting and gives the lower driver a chance to release resources acquired from the upper driver.

3. It then calls ObDereferenceObject to decrement the pointer reference count on the lower Device object. This effectively breaks the connection with the lower driver.

4. Finally, the Unload routine performs the usual cleanup tasks, such as deleting its own Device objects and symbolic links.

Lower Unload routine

There's nothing particularly exciting about the lower driver's Unload routine. It simply releases any hardware it might be holding, releases any other system resources it has allocated, and deletes the Device object that it exposed to the upper driver.

I/O Request Processing in Tightly Coupled Drivers

When a client of the upper driver issues an I/O request, the I/O Manager sends an IRP representing the transaction to one of the upper driver's Dispatch routines. Rather than using IoCallDriver to send this IRP to the lower driver, the Dispatch routine directly calls one or more functions in the lower driver to service the request.
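The function-table exchange described in this section can be illustrated with a small user-mode sketch. Everything here is an assumption made for illustration: XX_FUNCTION_TABLE, XxConnect, XxTransfer, XxDisconnect, and XxGetFunctionTable are hypothetical names, and a plain function call stands in for the IRP_MJ_INTERNAL_DEVICE_CONTROL IOCTL that a real lower driver would use to hand out the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of entry points exported by the lower driver. */
typedef struct _XX_FUNCTION_TABLE {
    int  (*Connect)(void);
    int  (*Transfer)(const void *buffer, size_t length);
    void (*Disconnect)(void);
} XX_FUNCTION_TABLE;

/* State private to the "lower driver" in this sketch. */
static int    g_connections;   /* number of upper drivers connected */
static size_t g_bytes;         /* bytes accepted so far */

static int XxConnect(void)
{
    g_connections++;
    return 0;                  /* 0 means success in this sketch */
}

static int XxTransfer(const void *buffer, size_t length)
{
    assert(buffer != NULL || length == 0);
    g_bytes += length;
    return 0;
}

static void XxDisconnect(void)
{
    g_connections--;
}

static const XX_FUNCTION_TABLE g_table = {
    XxConnect, XxTransfer, XxDisconnect
};

/* Stand-in for the internal IOCTL that returns the table's address. */
static const XX_FUNCTION_TABLE *XxGetFunctionTable(void)
{
    return &g_table;
}
```

An upper driver built along these lines would fetch the table once during initialization, keep the pointer (for example, in its Device Extension), call Transfer directly from its Dispatch routines, and call Disconnect before unloading.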
The exact processing sequence will depend on whether the request is handled synchronously or asynchronously.

Synchronous I/O

For input operations, the upper driver uses a GetBuffer function in the lower driver to dequeue a buffer of data from the ring of shared buffers. Following the model described in Chapter 14, this queue has a Semaphore object that keeps track of the number of full buffers. If the queue of ready buffers is empty, the Semaphore will be in the Non-signaled state, and the upper driver's Dispatch routine will wait. When the lower driver adds a full buffer to the queue, it increments the Semaphore, which awakens the waiting Dispatch routine. The Dispatch routine then formats and copies data from the shared buffer into the buffer associated with the original caller's IRP, completes the IRP, and releases the shared buffer using a PutBuffer function exposed by the lower driver.

Synchronous output operations just reverse the sequence. Here, the upper driver's Dispatch routine calls a GetBuffer function in the lower driver to get an empty buffer from the queue. Again, the queue has an attached Semaphore object that counts the number of available buffers. If there are no empty buffers, the upper driver's Dispatch routine waits until the lower driver adds one to the queue and increments the Semaphore.

Once it gets an empty buffer, the upper driver fills it with data from the buffer associated with the original IRP. It then calls a PutBuffer function exposed by the lower driver. The PutBuffer function begins the actual data transfer and then waits for a synchronization Event object embedded in the buffer. This causes the upper driver's Dispatch routine to go to sleep. When the transfer operation completes, some other part of the lower driver (a DPC routine, for example) sets the Event object and returns the buffer to the queue of available blocks. At that point, the upper driver's Dispatch routine wakes up and completes the original caller's IRP.

Asynchronous I/O

In this case, the upper driver's Dispatch routine calls IoMarkIrpPending to put the original caller's IRP into the pending state. It then calls a QueueRequest function exported by the lower driver. As arguments, this function takes the address of the original IRP and a pointer to a callback routine in the upper driver. QueueRequest stores the IRP address and callback pointer in a driver-defined context block and adds it to a private queue of pending requests. It then returns control to the upper driver, and the upper driver's Dispatch routine returns STATUS_PENDING to the I/O Manager.

Meanwhile, the lower driver is busily pulling context blocks from its private queue and performing I/O requests. As each one finishes, the lower driver invokes the upper driver's callback routine and passes it the address of the processed IRP. The callback routine in the upper driver does any postprocessing needed by the request and calls IoCompleteRequest with the original caller's IRP.

15.8 SUMMARY

The layered architecture in Windows NT allows you to simplify the design of drivers that might otherwise be extremely complex. Breaking a monolithic driver into smaller, logically distinct pieces makes implementation and maintenance easier, reduces debugging time, and increases the likelihood that some of the software will be reusable.

In this chapter, you've seen a number of different ways to stack drivers on top of one another. Most of these techniques depend on the I/O Manager's standard calling mechanism to send IRPs from one driver to another. If this proves not to be fast enough, you can also define private interfaces between a pair of drivers. In general, these privately defined interfaces are a bad idea because they make the design more fragile and harder to maintain.
Regardless of how your drivers communicate with one another, you still have to guarantee that they load in the proper order. Getting that to happen is one of the topics discussed in the next chapter.

CHAPTER 16

Building and Installing Drivers

There's always a certain amount of grunt work associated with any interesting activity. This chapter is about the mundane details of building drivers and installing them on a system. Some of this information is pretty straightforward stuff. Other bits of it have been teased painfully from various header files, online sources, and tedious experimentation. So, even if you're familiar with the DDK documentation, you may find something of value here.

16.1 BUILDING DRIVERS

One difficult aspect of writing drivers for Windows NT is that you need to maintain separate versions of the driver for each hardware platform that you support. Generating and keeping track of multiple binaries is especially troublesome because you may need different sets of compiler and linker options for each platform. The BUILD utility supplied with the NT DDK insulates you from most of these platform dependencies.

What BUILD Does

The BUILD utility is just an elaborate wrapper around NMAKE. Using a set of keywords, you describe the operation you want to perform. BUILD then scans your source files for dependencies and constructs an appropriate set of NMAKE commands. Next, it runs NMAKE to execute these commands, and the result is one or more binary output files (referred to as BUILD products). Figure 16.1 shows how this process works.

[Figure 16.1: How the BUILD utility works. Inputs: SOURCES file, environment variables, command options. Outputs: free build, checked build.]

BUILD itself is actually a rather simple-minded piece of software. Most of the build process is controlled by a set of standard command files that BUILD passes to NMAKE. These files contain all the platform-specific rules and option settings needed to create a BUILD product. Keeping these rules in a separate file allows Microsoft to modify the build process without having to rewrite the whole BUILD utility. Currently, BUILD uses these command files (located in ...\DDK\INC):

• MAKEFILE.DEF is the master control file. It uses several other files to do some of its work.

• MAKEFILE.PLT selects the target platform for a build operation.

• I386MK.INC, ALPHAMK.INC, MIPSMK.INC, and PPCMK.INC contain platform-specific compiler and linker switches for Intel, Alpha, MIPS, and PowerPC systems.

BUILD helps you manage multiplatform projects by separating binary files according to their platform type. To do this, it uses different directories for Intel, MIPS, Alpha, and PowerPC binaries. If you have cross-hosted compilers and linkers, you can produce the binaries for all the supported platforms on one system using a single BUILD command. Figure 16.2 shows the directory structure that BUILD uses.

[Figure 16.2: Directory structure for BUILD products. Platform directories ALPHA, I386, MIPS, and PPC; I386 is shown with CHECKED and FREE subdirectories, each containing XXDRIVER.SYS.]

Notice that BUILD also uses separate directories for the checked and free versions of your binaries. In the checked version, compiler optimization is disabled, extra debugging information is added to the file, and the DBG symbol is defined as 1 (allowing you to include conditional debugging code in your driver). By contrast, free BUILD products are compiled with optimization turned on and the DBG symbol is defined as 0. Checked builds are useful when you're debugging; free builds are generally smaller and faster and should be used for the commercial release of a driver.
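As a rough illustration of the conditional debugging code that the DBG symbol makes possible, here is a minimal user-mode sketch. The names XX_TRACE, XX_ASSERT, g_traceCount, and XxSetTransferSize are hypothetical, and fprintf and the C library assert stand in for whatever tracing and assertion facilities a real kernel-mode driver would use.

```c
#include <assert.h>
#include <stdio.h>

#ifndef DBG
#define DBG 0   /* assumption: default to a free build when DBG isn't supplied */
#endif

static int g_traceCount;    /* counts trace messages emitted (checked builds only) */

#if DBG
/* Checked build: tracing and assertions are compiled in. */
#define XX_TRACE(msg) \
    (g_traceCount++, fprintf(stderr, "XXDRIVER: %s\n", (msg)))
#define XX_ASSERT(cond) assert(cond)
#else
/* Free build: both macros compile away to nothing. */
#define XX_TRACE(msg)   ((void)0)
#define XX_ASSERT(cond) ((void)0)
#endif

/* Hypothetical routine that uses the debug-only macros. */
static int XxSetTransferSize(unsigned long size, unsigned long *out)
{
    XX_ASSERT(out != NULL);
    XX_TRACE("setting transfer size");
    *out = size;
    return 0;
}
```

Compiling the same source with DBG defined as 1 adds the trace output and the assertion; with DBG defined as 0 the routine carries no debugging overhead, which is the split between checked and free BUILD products described above.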
One of BUILD's odd little quirks is that, while it creates the platform-specific directories automatically, for some reason it doesn't create the CHECKED and FREE subdirectories. This results in an error message from the linker when it tries to create your driver. The easiest solution is to set up the directory structure by hand.

How to Build a Driver

Once you have some source code ready, follow these steps to generate your driver. You only need to perform steps 1-3 the first time you build the driver.

1. In the directory where you keep your driver source code, create a file called SOURCES that identifies the components of the final driver. A discussion of what to put in this file appears later in this section.

2. In the same directory, create a file called MAKEFILE that contains only the following line:

       !INCLUDE $(NTMAKEENV)\MAKEFILE.DEF

   This stub invokes the standard makefile needed by any driver created with BUILD. Don't edit this stub makefile. If you want to add more source files to this driver, add them to the SOURCES file.

3. Use the File Manager or the MKDIR command to set up the directory tree for your BUILD products. Refer back to Figure 16.2.

4. In the Program Manager group for the Windows NT DDK, double-click on the icon for either the Checked Build or the Free Build environment. A command window will appear with the appropriate BUILD environment variables set for a debug or release version of your driver. It's important that you run the BUILD utility only from one of these windows.

5. When the Checked or Free command window opens, its default directory is the same as the installation directory for the NT DDK itself. Use the CD command to move to the directory where your driver's SOURCES file is located.

6. Run the BUILD utility to create the driver executable. If all goes well, your driver will be in the CHECKED or FREE subdirectory of the appropriate platform directory. If something goes awry, look at the various BUILD log files to determine the problem.

You might be wondering whether you can build NT drivers on a Windows 95 system. The VC++ tools all run under Windows 95, so in theory it should work. Unfortunately, when BUILD spawns NMAKE, it uses a command line that's too long for Windows 95 to handle and the operation fails. Consequently, you have to do your BUILDing on a Windows NT system.

Writing a SOURCES File

You describe your BUILD operation using a series of keywords. These keywords specify things like the type of driver you want to generate, the source files making up the BUILD product, and the directories for various files. Although you can pass these keywords to BUILD as command-line options or environment variables, the usual procedure is to put them in a SOURCES file. Keep the following points in mind when you write one of these files:

• The filename must be SOURCES (without any extension).

• The file should contain some number of commands, each having the following format:

      keyword=value

• You can break a single BUILD command over multiple lines in the SOURCES file by putting a \ character at the end of each line except the last.

• The value of a BUILD keyword must be pure text. BUILD itself does only very limited processing of NMAKE macros and doesn't handle conditional statements at all.

• Make sure you don't leave any whitespace between a BUILD keyword and the = character. Whitespace after the = is acceptable.

• You can put comments in a SOURCES file by starting the line with a # character.

Table 16.1 lists the SOURCES keywords that you're most likely to use for building drivers. If you're the sort of person who enjoys going to the dentist for root-canal work, you may want to use the BUILD utility for maintaining user-mode applications as well as drivers. In that case, see the BUILD documentation for a list of additional keywords.
Table 16.1 BUILD utility keywords for maintaining drivers and libraries

Selected BUILD keywords

    Keyword               Meaning
    --------------------  ----------------------------------------------------
    INCLUDES              List of paths containing header files
    SOURCES               List of source files making up the BUILD product*
    TARGETPATH            Top-level directory for BUILD product tree*
    TARGETNAME            Name of the BUILD product, without an extension*
    TARGETEXT             File extension for the BUILD product
    TARGETTYPE            Case-sensitive keyword describing BUILD product*
                            • DRIVER
                            • GDI_DRIVER
                            • MINIPORT
                            • LIBRARY (for static libraries)
                            • DYNLINK (for DLLs)
    TARGETLIBS            List of libraries to be linked with the driver
    LINKER_FLAGS          Linker options of the form -flag:value
                            Example: -MAP:XXDRIVER.MAP
    PRECOMPILED_INCLUDE   File containing #include directives
    NTTARGETFILE0         List of nonstandard components to be built with
                            MAKEFILE.INC after initial dependency scan
    NTTARGETFILE1         List of nonstandard components to be built with
                            MAKEFILE.INC before linking
    NTTARGETFILES         List of nonstandard components to be built with
                            MAKEFILE.INC both before and after the link

    *Required.

The following is an example of a minimal SOURCES file for building a kernel-mode driver.

    TARGETNAME=XXDRIVER
    TARGETTYPE=DRIVER
    TARGETPATH=
    INCLUDES=$(BASEDIR)\inc;..\inc
    SOURCES=init.c config.c resalloc.c \
            dispatch.c xfer.c unload.c

One item to point out in this file is the INCLUDES= keyword. For some reason, neither the DDK installation procedure nor the Free/Checked build icons add the DDK header directory to the INCLUDE-path environment variable. By naming it explicitly in SOURCES, you can avoid a number of miscellaneous BUILD error messages.

Log Files Generated by BUILD

In addition to its screen output, the BUILD utility generates several text files that you can use to determine the status of a BUILD product. These files are:

• BUILD.LOG - Lists the commands invoked by NMAKE.

• BUILD.WRN - Contains any warnings generated during the build.

• BUILD.ERR - Contains a list of errors generated during the build.

BUILD puts these files in the same directory as the SOURCES file. The warning and error files appear only if something bad happened during the BUILD operation.

One other point worth mentioning is BUILD's nasty habit of filtering out some compiler and linker messages. These filtered messages don't appear on the screen display, but they will show up in the log files. For that reason, it's important to check the log files after each BUILD.

Recursive BUILD Operations

You can use BUILD to maintain an entire source code tree by creating a file called DIRS. You put this file in a directory that contains nothing but subdirectories. Each subdirectory can be a source directory (containing a SOURCES file) or the root of another source tree (containing another DIRS file). When you run BUILD from the topmost DIRS directory, it creates all the BUILD products described in each SOURCES file.

The rules for writing a DIRS file are the same as those for a SOURCES file, with the restriction that you're only allowed to use the following two keywords:

• DIRS - Lists subdirectories that should always be built. Entries in this list are separated by spaces or tabs.

• OPTIONAL_DIRS - Lists subdirectories that should be built only if they are named on the original BUILD command line.

This recursive BUILD feature can be useful for maintaining things like video drivers that have both a user-mode and a kernel-mode component.

16.2 MISCELLANEOUS BUILD-TIME ACTIVITIES

Along with the basic operations of getting your driver to compile and link, there are several other kinds of activities that you may want to perform at BUILD time. This section presents the ones that have proven to be the most useful.

Using Precompiled Headers

Much of the time consumed by a BUILD operation is spent compiling various large header files.
During a normal development cycle, your driver's code will change frequently, but these headers will be relatively static. This leads to a lot of wasted time as the headers are compiled again and again. By taking advantage of the C compiler's precompiled header feature, you can significantly reduce the BUILD time of your driver (at the expense of some disk space).

To use precompiled headers, you'll need to make some changes to your driver sources and add a new keyword to the BUILD control file. Follow these steps:

1. Create a header file containing nothing but #include directives for any other headers used by your driver. For example, if you called this file PRECOMP.H, it would contain the following:

    #include
    #include "xxdriver.h"
    #include "hardware.h"

2. In all your other driver source files, replace all #include directives with

    #include "precomp.h"

3. Add the following statement to your SOURCES file:

    PRECOMPILED_INCLUDE=PRECOMP.H

When you run BUILD for the first time, the C compiler will save the precompiled header information in a binary file called PRECOMP.PCH. As long as you don't change the contents of your headers, the compiler will be able to save itself some work by reusing the precompiled binary version.

Including Version Information in a Driver

How much time have you spent tracking down weird bugs, only to find that the real problem was a software version mismatch? This can be a real time waster, especially if you're trying to support a commercial product used by hundreds of customers. You can avoid this situation altogether by putting explicit version information in your drivers and checking it before you start looking for more complex explanations.

You add version information to a driver using a resource script that defines a version structure. An example later in this section shows how to do this, but the basic steps you need to follow are:

1.
Separate your version data into two categories: things that relate to your company as a whole (like the company name), and things that are product specific.
2. Use the generic company information to write a header that can be included in the version resource scripts of all your products.
3. Write a resource script for your driver that contains product-specific version information. This file should be updated each time you release a version of your driver for testing.
4. Add the name of the resource script to the list of driver components identified by the SOURCES keyword in your SOURCES file.

When you want to examine the driver's version data, you can use the File Manager's File Properties... menu item. To display this information in a more complete form, you could also write a little Win32 program to read the version data. The following Win32 API calls are relevant.

• GetFileVersionInfoSize - This tells you the number of bytes of version data associated with the driver.
• GetFileVersionInfo - This returns a buffer of version data.
• VerQueryValue - This extracts a specific piece of version information from the buffer returned by GetFileVersionInfo.

To make all this more concrete, here are examples of a vendor header file and the corresponding product resource script.

Vendor information file. This header file contains version information common to all the products from one vendor. Although you could include this stuff in the RC file itself, if you're maintaining several products, it's less work to keep it in one place for all of them. Below is a copy of CYDNXVER.H, the vendor information file for Cydonix Corporation.

    #define VER_COMPANYNAME_STR       "Cydonix Corporation"
    #define VER_LEGALTRADEMARKS_STR   \
    "Cydonix\256 is a trademark of Cydonix Corporation."
    #define VER_LEGALCOPYRIGHT_YEARS  "1994-1995"
    #define VER_LEGALCOPYRIGHT_STR    \
    "Copyright \251 Cydonix Corp. " VER_LEGALCOPYRIGHT_YEARS

    /* default is nodebug */
    #if DBG
    #define VER_DEBUG                 VS_FF_DEBUG
    #else
    #define VER_DEBUG                 0
    #endif

    /* default is release */
    #if BETA
    #define VER_PRERELEASE            VS_FF_PRERELEASE
    #else
    #define VER_PRERELEASE            0
    #endif

    #define VER_FILEFLAGSMASK         VS_FFI_FILEFLAGSMASK
    #define VER_FILEOS                VOS_NT_WINDOWS32
    #define VER_FILEFLAGS             (VER_PRERELEASE | VER_DEBUG)

Product information file. This is the actual resource control script that sets product-specific fields in the version resource. Notice that it includes the vendor default values defined above. The actual version resource is built by including the system-supplied COMMON.VER file. Any version information not defined by the time you include COMMON.VER will be filled in with Microsoft-specific information. The following is a copy of XXDRIVER.RC, the version resource script for XXDRIVER.

    #include

    /*---------------------------------------------------*/
    /* Include default values for generic vendor info    */
    /*                                                   */
    /*---------------------------------------------------*/
    #include "cydnxver.h"

    /*---------------------------------------------------*/
    /* The following values should be modified only by   */
    /* the official builder, and they should be updated  */
    /* for each release                                  */
    /*---------------------------------------------------*/
    #define VER_PRODUCTBUILD          42
    #define VER_PRODUCTVERSION_STR    "1.01"
    #define VER_PRODUCTVERSION        1,01,VER_PRODUCTBUILD,1
    #define VER_PRODUCTBETA_STR       ""

    /*---------------------------------------------------*/
    /* Include product-specific default values           */
    /*                                                   */
    /*---------------------------------------------------*/
    #define VER_PRODUCTNAME_STR       "XXDRIVER"
    #define VER_FILETYPE              VFT_DRV
    #define VER_FILESUBTYPE           VFT2_UNKNOWN
    #define VER_FILEDESCRIPTION_STR   "Driver for XX"
    #define VER_INTERNALNAME_STR      "xxdriver.sys"
    #define VER_ORIGINALFILENAME_STR  "xxdriver.sys"

    /*---------------------------------------------------*/
    /* Define the version resource itself                */
    /*                                                   */
    /*---------------------------------------------------*/
    #include <common.ver>

Including Nonstandard Components in a BUILD

Even though BUILD is the epitome of software maintenance technology, there are still some things it doesn't do very well. For example, if you have a nonstandard driver component (like a custom message file), BUILD won't know what to do. It's your job to help BUILD out of these sticky situations by writing an auxiliary makefile that tells it how to process the nonstandard components. These are the steps you need to follow:

1. Decide what nonstandard target files need to be part of the driver.
2. In the same directory as the SOURCES file for your driver, create a makefile called MAKEFILE.INC. This makefile describes the dependencies among your driver's nonstandard components and gives instructions for building these components.
3. For each nonstandard component, decide when during the BUILD operation the component should be created.
4. Add the component to the list of files in the NTTARGETFILE0, NTTARGETFILE1, or NTTARGETFILES keyword of your BUILD control file. See Table 16.1 for a description of these keywords.
5. Run the BUILD utility.

Back in Chapter 13, you saw an example of a driver that defined some private messages for logging events.
Here are the auxiliary NMAKE and BUILD control files that generate this driver's executable. You can find the complete example in the CH13\DRIVER directory on the floppy that accompanies this book.

MAKEFILE.INC. Recall from Chapter 13 that the message compiler generates a tiny resource script along with a binary message file and a header. You include this stub resource script in the driver's main resource file, which leads to the following dependencies in the auxiliary makefile:

    xxmsg.rc xxmsg.h msg00001.bin: xxmsg.mc
        mc -v -c xxmsg.mc

SOURCES. Since the dependent files must be generated before BUILD runs the resource compiler or the C compiler, you use the NTTARGETFILE0 keyword. Identifying any one of the dependent files is enough to get BUILD to invoke MAKEFILE.INC.

    TARGETTYPE=DRIVER
    TARGETNAME=xxdriver
    TARGETPATH=
    INCLUDES=$(BASEDIR)\inc;..\inc;.
    SOURCES=init.c unload.c \
            dispatch.c      \
            eventlog.c      \
            xxmsg.rc
    NTTARGETFILE0=xxmsg.h

Moving Driver Symbol Data into .DBG Files

Contrary to what the DDK documentation claims, both checked and free versions of your driver contain symbol data, which greatly increases the size of your driver executable. This section explains how to strip symbols from your driver and put them into a separate file. Follow this procedure.

1. Use the following command to examine the header information in your driver's executable:

    DUMPBIN /HEADERS XXDRIVER.SYS | MORE

2. In the OPTIONAL HEADER VALUES section, look for the image base address. Usually this will be 0x10000 for kernel-mode drivers.
3. Strip symbol information from your driver and put it in a separate file using this command:

    REBASE -B 0x10000 -X .\SYMBOLS XXDRIVER.SYS

   The B option specifies the new base address for the driver (in this case, the same as the original value). The X option identifies the directory where the symbol file should go. The symbol file will have the same name as the driver executable, with the extension .DBG.
4. To use the symbol file for debugging, move it to the directory where you keep other .DBG files on the host machine.

If you look at Table 16.2, you'll see the impact symbol data can have on the size of a driver. This table compares the sizes of checked and free builds of the standard NT serial port driver with and without symbols.

Table 16.2 Effect of removing symbols on driver file sizes

Driver sizes with and without symbols

Version          Before REBASE     After REBASE
Checked build    376,476 bytes     96,544 bytes
Free build       77,600 bytes      46,368 bytes

16.3 INSTALLING DRIVERS

This section explains how to install a driver by hand, which is something you'll need to do while you're developing your driver. It also presents some guidelines for automating the driver installation process once the retail version is ready for the world.

How to Install a Driver by Hand

Installing an NT driver is just a matter of copying some files to the right directory and making a few entries in the system Registry. These are the basic steps you need to follow:

1. Copy the driver to the %SystemRoot%\SYSTEM32\DRIVERS directory on the target system.
2. Add appropriate entries to the Registry of the target system using the REGEDT32 utility. These entries are described below.
3. Reboot the target system to make the Service Control Manager aware of the new driver. If the driver's Registry entries specify automatic startup, the driver will load during system boot.
4. If the driver's Registry entries specify manual startup, use the Control Panel Devices applet to start the driver.

If you find a nonfatal bug in your driver, you can load a corrected copy without rebooting the system. Just use the Control Panel Devices applet to stop the driver.
Then, overwrite the driver executable in the \DRIVERS directory and restart it using the Devices applet. Of course, this only works if the driver has an XxUnload routine and if it isn't crucial to the operation of the system.

Driver Registry Entries

During system bootstrap, NT builds a list of available drivers by scanning the Registry. This list identifies both the drivers that start automatically and those that need to be started manually. To add your driver to this list, you need to build the Registry entries that appear in Figure 16.3. Table 16.3 describes these Registry keys and values.

To bring a driver online, you only need the driver's service key plus the Start, Type, and ErrorControl values. The service key should have the same name as the driver executable, without the file extension. As you saw in Chapter 7, the Parameters subkey is normally used for device information that doesn't auto-detect, although you can really put anything in it.

    HKEY_LOCAL_MACHINE
      System
        CurrentControlSet
          Services
            XXDRIVER
              ErrorControl: REG_DWORD: 0x1
              Start: REG_DWORD: 0x3
              Type: REG_DWORD: 0x1
              Parameters

Figure 16.3 Structure of a driver's Registry service key

Table 16.3 Kernel-driver Registry entries

Driver service key Registry entries

Name             Data type       Description
XXDRIVER         (Key)           Driver service key*
Type             REG_DWORD       What kind of driver this is*
                                   • 1 - kernel-mode driver
                                   • 2 - file-system driver
Start            REG_DWORD       When to start the driver (see below)*
ErrorControl     REG_DWORD       System response if driver fails to load*
                                   • 0 - log error and ignore
                                   • 1 - log error and put up a message box
                                   • 2 - log error and reboot with last-known good configuration
                                   • 3 - log error and fail if already using last-known good configuration
Group            REG_SZ          Driver's group name (see below)
DependOnGroup    REG_MULTI_SZ    Drivers needed by this one (see below)
Tag              REG_DWORD       Driver load order within a group (see below)
Parameters       (Key)           Key to hold driver-specific parameters

*These entries are required.

End-User Installation of Standard Drivers

Manual installation is fine while you're still developing a driver, but once your code is ready for commercial release, it's a good idea to automate the whole procedure. If your driver manages a standard piece of hardware (like a video or network card), you can take advantage of NT's built-in driver installation mechanisms. These built-in mechanisms run in three different situations.

During text setup. When end users perform a full installation of Windows NT, the first piece of setup software runs in text mode. During this text phase, the setup program installs drivers for the keyboard, the mouse, SCSI HBAs, and video devices. If it can't find a driver for one of these devices (or if the user chooses to replace the standard driver), the setup program will prompt the user for an installation diskette.

The diskette contains a copy of the driver itself and a control script called TXTSETUP.OEM. This script is just a text file that identifies the type of hardware supported by the driver, lists the files that need to be copied from the floppy, and names the keys and values that should be added to the Registry. The Windows NT DDK Programmer's Guide describes the exact contents and format of a TXTSETUP.OEM file.

During GUI setup. Once the text phase of Windows NT installation finishes, a GUI-based setup program takes over. This GUI setup program can install drivers for the keyboard and mouse, video and network cards, tape drives, and SCSI HBAs. Just like its text-based counterpart, the GUI setup program prompts the user for the location of any drivers it can't find; it also allows the user to supply replacements for the standard drivers.
To install a driver during GUI setup, once again you'll need to write a control file. This one is called OEMSETUP.INF, and it uses a much more full-featured scripting language than TXTSETUP.OEM. The GUI scripting language supports dialog boxes, message text in multiple national languages, elaborate flow control, and commands for a variety of common installation tasks. If the built-in commands aren't enough, you can call functions in DLLs or run external programs from within the script. See the Windows NT DDK Programmer's Guide for a description of the GUI scripting language.

After NT installation. Users can also install drivers for standard devices after NT itself has been set up. This is referred to as maintenance mode installation, and it uses the same OEMSETUP.INF script as the GUI setup phase of NT. Depending on the type of hardware, the end user will have to run either the Windows NT Setup program or a Control Panel applet to execute the script. Table 16.4 shows the various options.

End-User Installation of Nonstandard Drivers

If your device isn't one of the types supported by TXTSETUP.OEM or OEMSETUP.INF, you'll have to provide your own installation program. You can either use commercial installation software, or you can roll your own using some of the following Win32 API calls:

• CopyFile to move the driver file to the appropriate directory.
• RegCreateKeyEx and RegSetValueEx to set up the proper keys and values in the Registry.
• CreateProcess to run any external programs needed during installation.
• CreateService and StartService if you want to bring the driver online without rebooting the system.
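As a rough illustration of the last item in the list above, the following sketch registers a kernel-mode driver with the Service Control Manager and starts it. It is a minimal sketch, not the DDK's own sample: the helper names make_driver_path and install_and_start are hypothetical, the driver file is assumed to have been copied into the DRIVERS directory already, and the service type, start type, and error-control constants are chosen to match the Type, Start, and ErrorControl values shown in Figure 16.3. The Win32-specific part is compiled only on Windows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<sysdir>\drivers\<name>.sys" in out.
 * Returns 0 on success, -1 if the buffer is too small. */
int make_driver_path(const char *sysdir, const char *name,
                     char *out, size_t out_len)
{
    int n;

    assert(sysdir != NULL && name != NULL && out != NULL);
    n = snprintf(out, out_len, "%s\\drivers\\%s.sys", sysdir, name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: register the driver as a demand-start kernel
 * service and start it. Returns 0 on success, -1 on failure. */
int install_and_start(const char *name, const char *path)
{
    SC_HANDLE scm, svc;
    int rc = -1;

    scm = OpenSCManagerA(NULL, NULL, SC_MANAGER_ALL_ACCESS);
    if (scm == NULL)
        return -1;

    svc = CreateServiceA(scm, name, name, SERVICE_ALL_ACCESS,
                         SERVICE_KERNEL_DRIVER,  /* Type = 0x1 */
                         SERVICE_DEMAND_START,   /* Start = 0x3 */
                         SERVICE_ERROR_NORMAL,   /* ErrorControl = 0x1 */
                         path, NULL, NULL, NULL, NULL, NULL);

    /* If the service key already exists, just open it. */
    if (svc == NULL && GetLastError() == ERROR_SERVICE_EXISTS)
        svc = OpenServiceA(scm, name, SERVICE_ALL_ACCESS);

    if (svc != NULL) {
        if (StartServiceA(svc, 0, NULL) ||
            GetLastError() == ERROR_SERVICE_ALREADY_RUNNING)
            rc = 0;
        CloseServiceHandle(svc);
    }
    CloseServiceHandle(scm);
    return rc;
}
#endif
```

An installer built along these lines would typically obtain the system directory with GetSystemDirectory, pass it to make_driver_path, and hand the resulting path to install_and_start. Error reporting and cleanup on partial failure are left out to keep the sketch short.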
See the INSTDRV sample that comes with the NT DDK for an example of using the Service Control Manager API to install a driver without forcing the user to reboot.

As you've seen elsewhere in this book, you can customize the behavior of your driver using values stored in the Parameters subkey of the driver's Registry service key. If you have many of these parameters and you expect end users to change them, you should consider writing either a Control Panel applet or a standalone program to modify the Registry. This is much safer than asking an end user to work with REGEDT32.

Finally, you'll make everyone's life easier if you supply software that allows users to remove your driver from the system. This means cleaning up the Registry as well as deleting any relevant files.

Table 16.4 How to install standard drivers in maintenance mode

Installation tools for standard drivers

Type of driver                       Installation tool
Keyboard                             Windows NT Setup
Mouse                                Windows NT Setup
Multimedia device                    Control Panel Drivers applet
Net-card and network protocol        Control Panel Network applet
SCSI HBA                             Windows NT Setup
Tape drive                           Windows NT Setup
Video                                Control Panel Display applet

16.4 CONTROLLING DRIVER LOAD SEQUENCE

There are times when you may need to control the sequence in which NT loads multiple drivers. For example, class drivers usually have to be loaded after the port drivers that manage their underlying hardware. If your drivers load automatically when the system boots, you can use various Registry entries to control their load sequence. This section explains how.

Changing the Driver's Start Value

You can control when a driver loads by setting the Start value in the driver's Registry service key. The number you assign to Start corresponds to one of the Service startup types recognized by the NT Service Control Manager. Currently, Start can take one of the following values.
0x0 (SERVICE_BOOT_START). This value specifies that a driver should be started by the operating system loader. Since much of the system isn't available, this value should be used only for drivers that are necessary to the bootstrap operation itself (for example, the driver for the boot device).

0x1 (SERVICE_SYSTEM_START). This value identifies drivers that should be started after the operating system has been loaded, but while it is still initializing itself.

0x2 (SERVICE_AUTO_START). Drivers with this Start value are loaded by the Service Control Manager after the entire system is up and running. Unless your driver is crucial to the system bootstrap or initialization, this is probably the most appropriate value to choose.

0x3 (SERVICE_DEMAND_START). These drivers have to be started manually, either by using the Control Panel Devices applet or by making direct calls to the Win32 Service Control Manager API.

0x4 (SERVICE_DISABLED). Disabled drivers cannot be started until their Start value is changed to something else. Again, you change this value using the Control Panel Devices applet or the Service Control Manager API, or by modifying the Registry directly.

NT guarantees that drivers with lower Start values will be loaded ahead of drivers with higher values. So all drivers with a value of 0 will load ahead of any drivers with values of 1 or 2. Keep in mind that this only works for Start values of 0, 1, or 2, because drivers with other Start values require some kind of manual intervention to get them going.

Creating Explicit Dependencies between Drivers

Setting Start values is fine if your drivers need to be loaded during different phases of system startup, but what if you need to control the load order of multiple drivers with the same Start value? For example, a SCSI class driver won't be able to load successfully until all the SCSI miniport HBA drivers are available.
One solution to this problem is to use the Group and DependOnGroup values in the driver service keys. These are the steps you should follow if you want to establish an explicit load-order dependency between two drivers:

1. Decide which driver needs to load first and choose a group name for this driver. In some cases (like the SCSI miniport), you may need to use a standard, system-defined group name. Otherwise, use a name of your own choosing.
2. Add a value called Group to the service key of the driver that loads first. The Group value is a REG_SZ containing the group name you've assigned to this driver.
3. Add a value called DependOnGroup to the service key of the driver that should load second. The DependOnGroup value is a REG_MULTI_SZ containing the names of any groups on which this driver depends. At least one driver in each named group must be started before the system will start any dependent driver.

Keep in mind that you can have as many drivers as you like with the same Group value. This guarantees that all the members of the group will get a chance to load ahead of any drivers depending on that group name. Again, SCSI miniports are a good example.

To see how all this works, imagine that you have two drivers, XXDRIVER and YYDRIVER, and that XXDRIVER is a member of the group called "Group W." If you wanted XXDRIVER to load ahead of YYDRIVER, you'd need to set up the following Registry entries:

    HKEY_LOCAL_MACHINE\...\Services\XXDRIVER
        Start: REG_DWORD: 2
        Group: REG_SZ: Group W

    HKEY_LOCAL_MACHINE\...\Services\YYDRIVER
        Start: REG_DWORD: 2
        DependOnGroup: REG_MULTI_SZ: Group W

With these values, both drivers will load during the final stages of system startup, after everything is running. Further, all the drivers in "Group W" will be given a chance to load before YYDRIVER.
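The combined effect of Start, Group, and DependOnGroup can be modeled in a few lines of C. The sketch below is a toy model for reasoning about Registry settings, not the algorithm NT actually uses: the DriverEntry structure and the model_load_order function are hypothetical, each driver is limited to a single DependOnGroup entry, and a dependent driver simply waits until every automatically started member of the named group has had its chance to load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DRIVERS 32

/* Hypothetical description of one driver service key. */
typedef struct {
    const char *name;
    unsigned    start;    /* Start value; 0, 1, and 2 load automatically */
    const char *group;    /* Group value, or NULL if none */
    const char *depends;  /* a single DependOnGroup entry, or NULL */
} DriverEntry;

/* Nonzero if some not-yet-loaded driver that starts in this phase
 * (or an earlier one) still belongs to the named group. */
static int group_pending(const DriverEntry *d, const int *loaded, size_t n,
                         const char *group, unsigned phase)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!loaded[i] && d[i].start <= phase && d[i].group != NULL &&
            strcmp(d[i].group, group) == 0)
            return 1;
    }
    return 0;
}

/* Fill order[] with indexes into d[] in modeled load order and return
 * how many drivers were placed. Drivers with Start values above 2, or
 * with a dependency that never clears, are left out. */
size_t model_load_order(const DriverEntry *d, size_t n, size_t *order)
{
    int loaded[MAX_DRIVERS] = {0};
    size_t count = 0, i;
    unsigned phase;
    int progress;

    assert(n <= MAX_DRIVERS);
    for (phase = 0; phase <= 2; phase++) {      /* lower Start loads first */
        do {
            progress = 0;
            for (i = 0; i < n; i++) {
                if (loaded[i] || d[i].start != phase)
                    continue;
                if (d[i].depends != NULL &&
                    group_pending(d, loaded, n, d[i].depends, phase))
                    continue;                   /* group not finished yet */
                loaded[i] = 1;
                order[count++] = i;
                progress = 1;
            }
        } while (progress);
    }
    return count;
}
```

Feeding the model the XXDRIVER and YYDRIVER entries from the example above places XXDRIVER ahead of YYDRIVER regardless of their order in the input array, and any driver with a Start value of 0 ahead of both.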
Establishing Global Group Dependencies

Another way to control the load order of your drivers is to modify the ServiceGroupOrder key in the Registry. This key contains a single REG_MULTI_SZ value called List that identifies group names in the order that they will be loaded. The earlier a driver's group name appears in this list, the sooner it loads. NT will try to load all the drivers in an earlier group ahead of any driver in a later group.

Figure 16.4 shows an excerpt of this part of the Registry. In this example, drivers in the group "SCSI class" load after all drivers in the group "Primary disk" and before any drivers in the group "SCSI CDROM class."

    HKEY_LOCAL_MACHINE
      System
        CurrentControlSet
          Control
            ServiceGroupOrder
              List: REG_MULTI_SZ:
                System Bus Extender
                SCSI miniport
                port
                Primary disk
                SCSI class
                SCSI CDROM class
                filter

Figure 16.4 The layout of the ServiceGroupOrder Registry key

Although you could achieve the same results using DependOnGroup, this technique is useful for situations where you don't want to modify the Registry values of some of the drivers. For example, if you wanted one of your drivers to load earlier than a particular system-supplied driver group, you could simply modify the ServiceGroupOrder key. There would be no need to change the DependOnGroup value of each system-supplied driver.

The ServiceGroupOrder list is actually scanned several times during system startup. First, at bootstrap time, all drivers with a Start value of 0 load according to their ServiceGroupOrder sequence. Next, during system initialization, drivers with a Start value of 1 load. Finally, when the system is up and running, any drivers with a Start value of 2 are loaded. So, drivers with lower Start values load before any drivers with higher Start values, no matter what their positions in the ServiceGroupOrder list.

As an example, suppose you had a SCSI disk that needed a special driver. Unfortunately, the standard SCSI disk class driver is going to allocate anything that looks like a SCSI disk, including yours. The only way to prevent this is to make sure that your driver loads ahead of the standard driver. You can do this by modifying the ServiceGroupOrder list.

First, add a Group value to the Registry key for the driver that manages the special disk. If this driver were XXDRIVER, and you wanted to add it to "Group W," the Registry key would be

    HKEY_LOCAL_MACHINE\...\Services\XXDRIVER
        Group: REG_SZ: Group W
        Start: REG_DWORD: 0

Examining the Registry service key for the standard SCSI disk driver (SCSIDISK), you find that it belongs to the group "SCSI class." So, you need to edit the ServiceGroupOrder list and add "Group W" ahead of "SCSI class." The Registry would then look like this:

    HKEY_LOCAL_MACHINE\...\Control\ServiceGroupOrder
        List: REG_MULTI_SZ:
            System Bus Extender
            SCSI miniport
            Group W
            SCSI class

Controlling Load Sequence within a Group

The techniques presented so far allow you to set up load-order relationships among groups of drivers, but they make no promises about the load order of drivers in the same group. By adding Tag values to the Registry keys of drivers within a group, you can control their loading sequence. Here's what you need to do:

1. Modify the \CurrentControlSet\Control\GroupOrderList key in the Registry by adding a value with the same name as your driver group. Give this value a data type of REG_BINARY and make sure its contents follow the pattern described below. This value defines a series of tag numbers and their sequence.
2. Add a REG_DWORD value called Tag to the Registry service key of each driver in the group. Set this value to one of the tag numbers you defined for your group in GroupOrderList.
Within a single group, NT will load drivers according to the sequence of their Tag values, as defined in the GroupOrderList. Drivers without a Tag value (and drivers whose Tag value is not in the GroupOrderList) load after the drivers with valid Tag values. For these drivers, the order of loading is not guaranteed, other than that all drivers in a group load before the next group loads.

    Count (1 byte) | 1st Tag (4 bytes) | ... | Nth Tag (4 bytes) | Filler (3 bytes)

Figure 16.5 Layout of a tag definition in the GroupOrderList key

The tag definitions in the GroupOrderList are REG_BINARY data, and their format needs a little explanation. As you can see from Figure 16.5, each definition contains several fields. The first field is a 1-byte count of the number of tag values to follow. Next come the tag numbers themselves, each one taking up a DWORD. These are followed by 3 null bytes that round the whole entry up to an integral number of DWORDs. The following example of one of these values defines two tags: one with a value of 0x44 and another with a value of 0x28.

    02 00 00 00 44 00 00 00 28 00 00 00

Note that it's the sequence of the tags (and not their actual numerical values) that determines driver load order. With the example above, drivers in this group with a Tag of 0x44 would load ahead of those with a Tag value of 0x28.

As an example of using these tags, imagine that you have two drivers, XXDRIVER and YYDRIVER, both belonging to "Group W," and you want XXDRIVER to load ahead of YYDRIVER. The first step is to add a value to the GroupOrderList that defines the tags:

    HKEY_LOCAL_MACHINE\...\Control\GroupOrderList
        Group W: REG_BINARY: 02 00 00 00 44 00 00 00 28 ...

Next, modify the service keys for XXDRIVER and YYDRIVER by adding Tag values to them. The Registry entries would look like this:

    HKEY_LOCAL_MACHINE\...\Services\XXDRIVER
        Start: REG_DWORD: 2
        Group: REG_SZ: Group W
        Tag: REG_DWORD: 0x44

    HKEY_LOCAL_MACHINE\...\Services\YYDRIVER
        Start: REG_DWORD: 2
        Group: REG_SZ: Group W
        Tag: REG_DWORD: 0x28

One final point: Not every group shows up in the GroupOrderList key. When a group is not in the GroupOrderList, the order in which drivers load within the group is undetermined.

16.5 SUMMARY

This chapter has presented a variety of different topics, all of which had to do with building a driver and getting it online. But what if the driver has personal problems? What if, in an occasional psychotic fit, it crashes the system or mutilates some data? In the next chapter, you'll see some techniques you can use to track down and eliminate bugs from your driver.

CHAPTER 17

Testing and Debugging Drivers

Where do they come from, these driver bugs? Do they hide beneath the bed like mutant dust bunnies, scheming and plotting, waiting for nightfall so they can sneak into our code? No, driver bugs are not random events. Instead, they represent some coding or logic error, or some lack of understanding about how the hardware or the system actually works. This chapter presents a number of testing and debugging techniques you can use to catch both catastrophic and subtle flaws in your driver.

17.1 SOME GUIDELINES FOR DRIVER TESTING

As in other areas of software development, a great deal of thought has gone into the practice of software testing over the last three decades. It's a good idea to take advantage of this thinking when you start to design a testing strategy for your driver. The following sections present some of the major issues you should consider. (See the Bibliography for some other references on software testing.)

The General Approach to Testing Drivers

The first thing to do is to accept the hopelessness of your situation.
It's simply not possible to verify that a driver is free of bugs. To begin with, even trivial pieces of software can have so many code paths that there's just no way to exercise every one of them. Add to that all the various hardware and system-load conditions your driver might encounter in the real world, and your chances of catching every bug disappear pretty quickly.

As a tester, the best you can do is to show that a driver doesn't exhibit any of the bugs detectable by your tests. If your tests represent a reasonable model of conditions in the driver's target environment, then you'll probably be in good shape. This points to the fact that designing good tests is just as important as designing a good driver.

When to do the testing. Experience shows that it's more effective to test individual driver components as they're developed, rather than waiting until the whole driver is written to perform a single "big bang" test. Although incremental testing means writing a larger number of small test programs, this strategy makes it much easier to locate the source of a problem. The tiny test programs are also helpful when you want to make sure that changes to a driver's code base haven't introduced any new bugs.

Another advantage of testing during development is that it can point out basic design flaws in the driver which might otherwise go undetected until the end of the project. Correcting these kinds of fundamental errors late in the project cycle is usually much more expensive than catching them early.

What to test. Later in this chapter, you'll see some specific types of driver failures to watch out for, but you can generally divide driver tests into the following categories:

• Hardware tests. These verify the operation of the hardware. This is especially important if both the device and the driver are being developed together. In some cases, this may actually mean using a logic analyzer to see what's going on.
• Normal response tests. These confirm that the driver executes the full range of commands it will have to perform once it's out in the real world.
• Error response tests. These check the reaction of the driver to bad input from a user program, as well as to device errors and timeout conditions.
• Boundary tests. If the device has any limitations on its maximum transfer size or speed, these tests make sure that the driver can handle them.
• Stress tests. These subject the driver and its devices to high levels of sustained activity. This category also includes tests where the overall system experiences high levels of CPU, memory, and I/O activity, or where resources like memory are in very short supply.

How to develop the tests. Writing test software is an art. Good tests must be thorough enough to have a high probability of actually uncovering errors in the driver. This means you need to analyze the kinds of errors you think the driver might generate, and then write a test suite that will produce them.

Good test software also gives the tester enough information to pinpoint the cause of the failure easily. The output generated by a test program should be easy to read and should be formatted in such a way that important details aren't hidden somewhere in a pile of extraneous information.

Finally, test software needs to be complex enough to model a real-world situation, yet simple enough that it's easy to develop. If a test program is too complex, it may take a long time just to write and debug the test itself.

How to perform the tests. It's important to automate the test procedure itself. This makes it easier to guarantee that the same sequence of tests is being performed each time.

It's also a good idea to do regression testing. In other words, if you fix something in the driver, run the tests again to make sure you haven't broken anything else. This is another good reason to automate the test procedure.
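One way to get a repeatable, logged test run is a small table-driven test program. The following is only a sketch, written as ordinary user-mode C so that it runs anywhere: xx_transfer is a hypothetical stand-in for whatever call actually reaches your driver, and the 4096-byte limit is an invented device maximum, not something taken from a real device.

```c
#include <stdio.h>
#include <stddef.h>

#define XX_MAX_TRANSFER 4096   /* hypothetical device limit, for illustration */

/* Hypothetical stand-in for a request sent to the driver.
 * Returns 0 on success and -1 if the request is rejected. */
static int xx_transfer(size_t nbytes)
{
    if (nbytes == 0 || nbytes > XX_MAX_TRANSFER)
        return -1;
    return 0;
}

/* One row per test: a readable name, the input, and the expected result. */
struct test_case {
    const char *name;
    size_t      nbytes;
    int         expected;
};

static const struct test_case cases[] = {
    { "normal: 512 bytes",       512,                 0 },
    { "boundary: maximum size",  XX_MAX_TRANSFER,     0 },
    { "boundary: maximum + 1",   XX_MAX_TRANSFER + 1, -1 },
    { "error: zero length",      0,                   -1 },
};

/* Runs every case in the same order each time, writes one line per
 * case to the log, and returns the number of failed cases. */
static int run_tests(FILE *log)
{
    int    failures = 0;
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int got = xx_transfer(cases[i].nbytes);
        int ok  = (got == cases[i].expected);

        fprintf(log, "%-4s %s (expected %d, got %d)\n",
                ok ? "PASS" : "FAIL", cases[i].name,
                cases[i].expected, got);
        if (!ok)
            failures++;
    }
    fprintf(log, "%d failure(s)\n", failures);
    return failures;
}
```

Calling run_tests(stdout) from main gives you a quick check at the console. Passing a file opened with fopen instead gives you output you can keep and compare from one run to the next. Because the cases live in a table, adding a regression test for a newly fixed bug is a one-line change.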
When you run the tests, log the results and keep the output. This will give you a good idea of whether you're actually getting closer to fixing things.

Who should do the testing. Remember that the goal of testing is to tear the driver to shreds. To find bugs lurking under every line of code. To prove that only angelic intervention keeps the driver working at all. This is very different from the goal of the driver writer, who generally assumes that what he or she is producing will work properly.

Because coding and testing have this kind of adversarial relationship, it's usually best if these jobs are performed by different people. It's almost always unreasonable to expect a single person to be objective about their own code.

Using the Microsoft Hardware Compatibility Tests (HCTs)

The hardware compatibility test suite (or simply, the HCTs) is a collection of programs which allow platform vendors to see whether their systems will run Windows NT. This suite contains a number of different components, including

• General system tests that exercise the FPU, the onboard serial and parallel ports, the keyboard interface, and the HAL.
• Tests that exercise drivers for specific kinds of hardware like video adapters, multimedia devices, network interface cards, tape drives, SCSI devices, etc.
• General stress tests that put unusually high loads on system resources and I/O bandwidth.
• A GUI-based test manager that automates test execution and data collection.

Even if you're not developing a driver for one of the types of hardware with its own test, you can use the HCTs as part of your stress-testing strategy.

You can find the HCTs in the \HCT... directory tree on the CD containing the NT DDK. Although they're distributed with the DDK, the HCTs are not automatically installed. For installation instructions, see the README.TXT file in the \HCT directory.
Remember to put the HCTs on the target machine (where your driver will be running), not on the host. For more information about using the HCTs, look in the \HCT\DOC directory on the DDK CD. This directory contains all the HCT documentation in Word for Windows format.

Finally, if you're writing a driver for a commercial product and you want it to be logo-branded by Microsoft, you'll need to send your driver (and its hardware) to the Microsoft Compatibility Labs for testing. Microsoft offers Windows NT certification programs for several hardware categories including video cards, network adapters, SCSI adapters, multimedia audio cards, and printers. Once a driver passes the Microsoft certification tests, it's added to the driver library that's distributed with Windows NT. At that point, you're allowed to display a special logo on any product packaging. Contact your friends at Microsoft for details and pricing.

17.2 SOME THOUGHTS ABOUT DRIVER BUGS

As you saw in the last section, successful testing and debugging depend on figuring out ahead of time what might go wrong. The goal of this section is to get you thinking about the specific kinds of problems drivers can have. It also presents some techniques that can make bugs easier to detect and manage.

Categories of Driver Errors

Drivers can fail in any number of interesting ways. Although it's not possible to give a complete list, the following subsections describe some of the more common types of driver pathology.

Hardware problems. There's always a chance that the hardware itself might be causing problems. This becomes even more likely if both the device and the driver are being developed at the same time. Symptoms of hardware problems include

• Errors occurring during data transmission.
• Device status codes indicating an error.
• Interrupts not arriving.
• The device not responding properly to commands.
The cause might be as simple as undocumented behavioral quirks in the device (for example, some kind of restriction on command timing or sequencing). If it's a complex device, it might have bugs in its firmware (there simply is no bug-free SCSI firmware in the world). It could also be the result of some low-level bus contention or external signal noise. The device might just be broken.

The best approach to these problems is to make the error reproducible and then get as much information as you can. See if the manufacturer has any more information on the behavior of the device, or on known bugs. Use any available hardware diagnostics to verify that the device itself is working properly.

System crashes. It's easy for failures in kernel-mode code to kill the entire system. Many kinds of driver logic errors can produce a crash, although the most common problem seems to be access violations caused by a bogus pointer. It's also possible for things like bad DMA addresses to corrupt system memory. The next section of this chapter will have more to say about interpreting system crashes.

Resource leaks. The system doesn't perform any resource tracking or automatic cleanup for kernel-mode components. When a driver unloads, it's responsible for releasing whatever it may have allocated. This includes both memory from the pool areas plus any hardware the driver manages.

Even while a driver is running, it can leak memory if it regularly grabs pool space for temporary use and doesn't release it. Higher-level drivers can also be a source of leaks if they allocate their own IRPs and forget to free them. These kinds of driver errors can lead to bad system performance, as the pools slowly dry up, or to a complete system crash.

You can use the pool-tagging mechanism and sanity counters (described later in this chapter) to catch pool leakage and lost IRPs.
By examining the RESOURCEMAP section of the Registry with REGEDT32, you can check for hardware allocation problems.

Thread hangs. Another kind of failure involves synchronous I/O requests that don't return. In this case, the user-mode thread issuing the request is blocked forever and never comes out of its wait state. This type of behavior can result from several different driver problems.

The most obvious cause is not calling IoCompleteRequest to send the IRP back to the I/O Manager. Not so obvious is the need to call IoStartNextPacket. Even if there are no pending requests to be processed, your driver has to call this function because it marks the Device object as idle. Without this call, all new IRPs will go into the pending queue, rather than going to the Start I/O routine.

The calling thread can hang in a driver's Dispatch routine if the driver is trying to recursively acquire a Fast Mutex or an Executive Resource. Similarly, if a kernel-mode thread acquires a Mutex or Executive Resource without releasing it, Dispatch routines may hang up if they try to acquire the same object.

DMA drivers that don't release the Adapter object or its mapping registers can prevent IRPs from being processed. In the case of slave DMA devices, the offending driver might even cause other drivers using the same DMA channel to lock up.

Drivers that manage multiunit controllers can cause similar trouble by not releasing the Controller object. In this case, new IRPs sent to any Device object using the Controller object will freeze up.

Unfortunately, there's no convenient way to see who currently owns Adapter or Controller objects, Mutexes, or Executive Resources. About the best you can do is to use a counter to make sure you're releasing these objects as many times as you're acquiring them. In some cases, the checked build of NT may flag some of these errors with a crash.
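The counter idea might look something like the following sketch. It is ordinary user-mode C so that it can be compiled and run anywhere, and every name in it (xx_sanity, xx_acquire, xx_release, xx_check_balanced) is hypothetical. In a real driver, the wrappers would surround the actual acquire and release calls, and the counters would have to be updated with an interlocked operation or while holding a lock.

```c
#include <stdio.h>

/* Hypothetical bookkeeping for one kind of object the driver acquires. */
struct xx_sanity {
    const char *what;      /* label used in the debug message   */
    long        acquired;  /* number of successful acquisitions */
    long        released;  /* number of releases                */
};

/* Wrapper that would surround the real acquire call. */
static void xx_acquire(struct xx_sanity *s)
{
    /* ...the real acquisition of the object would go here... */
    s->acquired++;
}

/* Wrapper that would surround the real release call. */
static void xx_release(struct xx_sanity *s)
{
    s->released++;
    /* ...the real release of the object would go here... */
}

/* Number of acquisitions that have not been matched by a release. */
static long xx_outstanding(const struct xx_sanity *s)
{
    return s->acquired - s->released;
}

/* Returns 1 if every acquire has been matched by a release.
 * Otherwise prints a message and returns 0. */
static int xx_check_balanced(const struct xx_sanity *s)
{
    long n = xx_outstanding(s);

    if (n != 0) {
        printf("%s: %ld acquire(s), %ld release(s), imbalance %ld\n",
               s->what, s->acquired, s->released, n);
        return 0;
    }
    return 1;
}
```

A natural place to call xx_check_balanced is at a point where the driver should own nothing, such as the end of request processing or the Unload routine. A nonzero imbalance there tells you which kind of object to go looking for.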
System hangs. Occasionally, a driver error can cause the entire system to lock up. For example, deadly embraces involving multiple spin locks (or attempts to acquire the same spin lock multiple times on the same CPU) will bring everything to a grinding halt. Endless loops in a driver's Interrupt Service routine or a DPC routine could cause a similar failure.

Once this kind of collapse occurs, it's difficult (if not impossible) to regain control of the system. The best approach is usually to debug the driver interactively and see if you can trace the exact sequence of steps that led to the hang.

Reproducing Driver Errors

One of the keys to correcting a driver bug is being able to reproduce the problem. Intermittent errors are the bane of a driver writer's existence. Be as meticulous as possible in recording the exact circumstances at the time a bug appears, so that you can track and correct it. Several factors can make bugs intermittent.

Time dependencies. Some kinds of problems only show themselves when a driver is running at full speed. This could mean large numbers of I/O requests per second, high data rates, or both. Stress testing is usually a good way to make these kinds of bugs appear.

Multiprocessor dependencies. Things don't behave the same way on single- and multiprocessor systems. For example, ISR, DPC, and I/O Timer routines can all run simultaneously on an SMP machine. This can lead to various problems that don't show up on a single CPU. For this reason, it's important to make multiprocessor testing part of your driver verification strategy. One warning: SMP debugging is very painful, so it's a good idea to do the initial debugging on a single processor.

Multithreading dependencies. If your driver manages shareable Device objects, it's important to see what happens when multiple threads are issuing requests at the same time.

Miscellaneous causes. Finally, intermittent errors can depend on a whole universe of other factors.
This includes sensitivity to system load conditions, problems caused by specific combinations of hardware on the same machine, or specific combinations of devices on the same bus. Once again, a detailed log is your best hope of determining the factors that make the bug appear.

Coding Strategies That Reduce Debugging

There are several things you can do during the coding phase of driver development that will reduce debugging time. Here are some of them:

• Get someone else to look at your code. It's amazing how quickly an unbiased eye can sometimes see the cause of a problem that you haven't been able to find.
• Use assertions (described later in this chapter) to check for various kinds of inconsistencies.
• Leave the debug code in your driver, surrounded with appropriate #if and #endif statements.
• Add a version resource to the driver so that you can determine exactly which version of the driver is having problems. Chapter 16 explains how to do this.
• If you're working on a large driver project with other people, using version control software will help to maintain everyone's sanity.

Keeping Track of Driver Bugs

Research has shown that bugs are not evenly distributed throughout a piece of code. Rather, they tend to cluster in a few specific routines. Usually, this will be some very complex piece of code, or code with complex (or questionable) logic. A bug log can help you track these errors by drawing your attention to the places where your driver tends to fail.

Such a log can also help you spot patterns of system loading or driver usage that result in failures. Finally, you can use the bug log to decide which errors are worth fixing (not all of them are) and to keep track of which errors have already been corrected.

Individual needs vary, but at the very least, you should keep the following kinds of information in a bug log:

• An exact description of the failure.
• As much detail as possible about the prevailing conditions at the time of the failure. This includes the version of the operating system and the driver and a description of the hardware configuration.
• The importance of fixing this bug.
• Current status of the bug.

17.3 READING CRASH SCREENS

System crashes (which Microsoft documentation euphemistically calls "STOP messages") are perhaps the most dramatic sign that your driver has a bug. This section describes how STOP messages are generated and explains how to get useful information from them.

What Happens When the System Crashes

In spite of its name, a system crash is really a very orderly thing. It is NT's way of telling you that something in the operating system has become so unstable that rebooting is the only safe thing to do. Oddly enough, a crash actually improves NT's reliability by preventing further damage to the system, and by drawing attention to problems that might otherwise go unnoticed.

Two different sequences of events can lead to a system crash. In the first scenario, some kernel-mode component happens to notice a horribly inconsistent state of affairs and decides to take the system down. For example, the I/O Manager might discover that a driver is trying to pass an already completed IRP to IoCompleteRequest. The I/O Manager responds by initiating a crash.

The second path to a system crash is less direct. Here, a kernel-mode component causes an exception which it does not or cannot handle. Code in the Kernel traps the exception and initiates a crash. For example, a buggy driver that generated an access violation would produce this kind of crash. So would a driver that caused a page fault at an elevated IRQL level.
Regardless of who decides to crash the system, the deed is done by making one of the following calls: [1]

    VOID KeBugCheck( Code );
    VOID KeBugCheckEx( Code, Arg1, Arg2, Arg3, Arg4 );

These functions generate the STOP screen itself and (optionally) save a crash file to disk. Then, depending on various system settings, they either reboot, halt the system, or start up the Kernel's debug client.

The Code argument to KeBugCheck and KeBugCheckEx identifies the cause of the crash. KeBugCheckEx takes an additional four arguments that appear as part of the STOP message. KeBugCheck sets these values to zero.

The BUGCODES.H header file in the DDK defines all the standard bugcheck codes. You'll find descriptions of the more common codes and their parameters in Appendix B of this book.

[1] You can also call KeBugCheck and KeBugCheckEx in your own code if you discover some terrible error. If you do make these functions part of your debugging strategy, use conditional compilation to keep them out of the retail version of the driver. Very, very few situations are serious enough to warrant a system crash in a commercial driver.

Layout of a STOP Message

It's hard to miss the bright blue, character-mode screen on which STOP messages appear. If you look at one of these "blue screens of death," you'll see four distinct sections.

Bugcheck information. The first part of the display identifies the cause of the crash. This includes the bugcheck code, zero to four bugcheck parameters, and (if the bugcheck code is one of Microsoft's) the symbolic name associated with the error. Here's a sample:

    *** STOP: 0x0000000A (0x00000000,0x00000002,0x00000000,0xFCE10796)
    IRQL_NOT_LESS_OR_EQUAL*** Address fce10796 has base at fce10000 - XxDriver.SYS
    p4-300 irql:1f SYSVER 0xf0000522

In this example, the bugcheck code is 0x0000000A and the associated symbolic name is IRQL_NOT_LESS_OR_EQUAL. Fine, but just what does it mean? If you look in Appendix B, you'll find that 0x0000000A is saying that the driver caused a page fault at or above DISPATCH_LEVEL IRQL.

The four numbers in parentheses after the bugcheck code are the extra arguments passed to KeBugCheckEx. Their significance depends on the bugcheck code itself. Again consulting Appendix B, you'll see that the first parameter contains the paged address (0), the second is the IRQL level at the time of the reference (2), the third indicates the type of access (0 means "read"), and the fourth is the address of the instruction that caused the fault (0xFCE10796). Very thoughtfully, the display tells us that this address falls within the range of the XXDRIVER.SYS module.

Next comes a line that seems to say something about the IRQL level of the crash. This would be very useful to know, if it were correct. Sadly, KeBugCheck always raises IRQL to HIGH_LEVEL for synchronization purposes, so the value in a STOP message is always 0x1F.

On this same line, the SYSVER field tells you what version of NT was running. This is just the build number in hex, with a 0xF or a 0xC in the highest nibble to indicate whether it's the free or checked build of NT. In the sample above, converting 0x522 to decimal says that this crash occurred under the free version of build 1314.

Most of the useful information comes from this section of the STOP message. All by itself, it's often enough to give you a good idea of what caused the crash. You should always take note of this part of the STOP screen before rebooting the system.

[2] Under some conditions, the Kernel won't be able to display the entire screen. This usually means that the services it needs to output some of the information are not available.
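If you keep a crash log, it can be handy to decode the SYSVER value mechanically. The following sketch does what the text describes: it treats the highest nibble as the build type and the rest as the build number. The type and function names are hypothetical, and masking off the low 28 bits is an assumption based only on the description above, not on any documented field layout.

```c
#include <stdio.h>

/* Build type as indicated by the highest nibble of SYSVER. */
enum nt_build_type { NT_BUILD_FREE, NT_BUILD_CHECKED, NT_BUILD_UNKNOWN };

struct sysver_info {
    enum nt_build_type type;   /* free (0xF) or checked (0xC)        */
    unsigned long      build;  /* build number, e.g. 1314 for 0x522  */
};

/* Splits a SYSVER value from a STOP screen into build type and
 * build number. Assumes the build number occupies the low 28 bits. */
static struct sysver_info decode_sysver(unsigned long sysver)
{
    struct sysver_info info;
    unsigned long top = (sysver >> 28) & 0xFUL;

    if (top == 0xFUL)
        info.type = NT_BUILD_FREE;
    else if (top == 0xCUL)
        info.type = NT_BUILD_CHECKED;
    else
        info.type = NT_BUILD_UNKNOWN;

    info.build = sysver & 0x0FFFFFFFUL;
    return info;
}

/* Prints the decoded value in a form suitable for a crash log. */
static void print_sysver(unsigned long sysver)
{
    struct sysver_info info = decode_sysver(sysver);

    printf("SYSVER 0x%08lx: %s build %lu\n", sysver,
           info.type == NT_BUILD_FREE    ? "free" :
           info.type == NT_BUILD_CHECKED ? "checked" : "unknown",
           info.build);
}
```

For the sample STOP message, decode_sysver(0xF0000522UL) yields the free build type and build number 1314, which matches the conversion worked out by hand above.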
Module list. Next comes a two-column display naming all the operating system modules and drivers loaded at the time of the crash. It also lists each module's base address in memory and a date-stamp indicating the module's file date.

    Dll Base DateStmp - Name              Dll Base DateStmp - Name
    80100000 2fc653bc - ntoskrnl.exe      80400000 2fb24f4a - hal.dll
    80010000 2faae8b0 - Atdisk.sys        80686000 2fc15d19 - Fastfat.sys
    fcc20000 00000000 - Floppy.SYS        fcc30000 00000000 - Fs_rec.SYS
    fcc40000 00000000 - Null.SYS          fcc50000 00000000 - Beep.SYS
    fcc60000 2faae8d9 - Sermouse.SYS      fcc70000 2faae8b2 - i8042prt.SYS
    fcc80000 2faae8b5 - Mouclass.SYS      fcc90000 2faae8b4 - Kbdclass.SYS
    fccb0000 2faae88d - VIDEOPRT.SYS      fcca0000 2faae892 - vga.sys
    fccc0000 2faae8fd - Msfs.SYS          fccd0000 2faae8ec - Npfs.SYS
    fccf0000 2fc12af6 - NDIS.SYS          fcce0000 2faae92d - am1500t.sys
    fcd30000 2faae945 - TDI.SYS           fcd10000 2fae6a5f - nbf.sys
    fcd40000 2faae94f - netbios.sys       fcd50000 00000000 - Parport.SYS
    fcd60000 00000000 - Parallel.SYS      fcd70000 2faae8d8 - Serial.SYS
    fcd80000 00000000 - afd.sys           fcd90000 2fba6818 - rdr.sys
    fcdd0000 2fc3e3eb - srv.sys           fce10000 316aa594 - XxDriver.SYS

Occasionally, this part of the display can help you detect hostile interactions between drivers. If driver X crashes the system if (and only if) driver Y is loaded, there may be something going on between them. A written crash log will help you to see these kinds of patterns.

Stack trace. The third part is a listing of the function calls on the stack that preceded the STOP message.

    Address  dword dump   Build [1314]                           - Name
    ff416d18 fce10796 fce10796 ff4f9c10 e1304018 801862e3 00000246 - XxDriver.SYS
    ff416d24 801862e3 801862e3 00000246 801316e6 ff416d4c ff4f9c10 - ntoskrnl.exe
    ff416d2c 801316e6 801316e6 ff416d4c ff4f9c10 80175de6 ff538288 - ntoskrnl.exe
    ff416d38 80175de6 80175de6 ff538288 ff416f04 00000000 00000000 - ntoskrnl.exe
    ff416d84 fce10796 fce10796 fce10008 00010246 ff567ee8 00000000 - XxDriver.SYS
    ff416d88 fce10008 fce10008 00010246 ff567ee8 00000000 ff58bc40 - XxDriver.SYS
    ff416da4 fce1061f fce1061f 00000004 ff567ee8 00000000 ff58bc40 - XxDriver.SYS
    ff416db8 80119b69 80119b69 ff58bcf8 ff567f58 ff416de4 80114d69 - ntoskrnl.exe
    ff416dc8 80114d69 80114d69 ff58bc40 ff567ee8 ff567ee8 ff58bc40 - ntoskrnl.exe
    ff416de8 fce105d3 fce105d3 ff58bc40 ff567ee8 00000000 00000000 - XxDriver.SYS
    ff416e08 804042ac 804042ac 80102f48 ff58bc40 ff567ee8 ff567ee8 - hal.dll
    ff416e0c 80102f48 80102f48 ff58bc40 8010d544 ff567ee8 804042a0 - ntoskrnl.exe
    ff416e1c 804042a0 804042a0 8015b943 ff416ed8 ff593d8c 00403054 - hal.dll
    ff416e20 8015b943 8015b943 ff416ed8 ff593d8c 00403054 ff567f64 - ntoskrnl.exe
    ff416e3c 8010d544 8010d544 8015a348 ff58bc40 ff567ee8 ff4f9c28 - ntoskrnl.exe
    ff416e40 8015a348 8015a348 ff58bc40 ff567ee8 ff4f9c28 00000001 - ntoskrnl.exe
    ff416e68 80159c9c 80159c9c 00000000 00120196 01040864 ff416e08 - ntoskrnl.exe
    ff416e84 80134f30 80134f30 80100c60 ffffffff ff416ed0 8015612d - ntoskrnl.exe
    ff416e88 80100c60 80100c60 ffffffff ff416ed0 8015612d 0012ff30 - ntoskrnl.exe
    ff416e94 8015612d 8015612d 0012ff30 40100080 00120196 0012ff14 - ntoskrnl.exe
    ff416ecc 80134f30 80134f30 80100ec0 ffffffff ff416f04 80137fb5 - ntoskrnl.exe

Each line in this display represents one frame on the stack, with the most recent frame being at the top of the display. This top frame is the one that was active at the time of the crash. Reading down the display gives you a history of the function calls that led to the crash.

On each line, the first column is the address of the stack frame itself. The second and third columns both contain the return address of the function. The remaining columns are the first four DWORD arguments passed to the function. If a particular function takes more arguments, you won't see anything beyond the fourth. If it takes fewer than four DWORDs, the information in some of the rightmost columns will be bogus. The last column identifies the module in which the return address (from column two) falls.

In the crash pictured above, you can see that code somewhere around 0xFCE10796 in XXDRIVER.SYS was executing at the time of the crash. This code was called by a routine in NTOSKRNL.EXE (at 0x801862E3), which in turn was called by another system routine at 0x801316E6. Unfortunately, without a linker map, there's no way to turn these hideous addresses back into function names. This seriously limits the value of this display.

Also keep in mind that the call frames on the stack show you where the problem was detected, not necessarily where it was caused. It's possible for a driver to do horrible damage to a seldom-used part of the system and be long gone before NT discovers it and crashes.

Recovery instructions. There's very little useful information in this part of the display.
It basically confirms the communication settings of the Kernel's debug client (if it's enabled), lets you know when the crash dump is finished, and recommends a response to the STOP message.

    Beginning dump of physical memory
    Physical memory dump complete. Contact your system administrator or
    technical support group.

The actual text of this message will depend on the current option settings selected for the system. For example, if you have disabled crash dumps, you'll see a slightly different display.

Deciphering STOP Messages

If the truth be told, there's not all that much helpful information in a STOP message. The top few lines, containing the bugcheck information, are perhaps the most useful things to know. The stack trace (which at first glance looks so promising) actually has very little to say, unless you can determine the identities of the functions listed in the trace.

To do this, you need linker maps for the modules containing the functions. This means you're out of luck if the functions are located in a Microsoft module like NTOSKRNL.EXE or HAL.DLL, since these linker maps don't come with the DDK. You can, however, generate a linker map for your own driver using the following BUILD command:

    BUILD -cef -nmake LINKER_FLAGS=-MAP:xxdriver.map

This is all a rather tedious process, and it still doesn't give you a great deal of information. Fortunately, if you have a crash file handy, you can find out much more with far less work. The next two sections of this chapter will explain how to work with crash files.

17.4 AN OVERVIEW OF WINDBG

WINDBG is a kernel-mode debugger you can use to analyze both crash files and running driver code. This section gives a brief overview of WINDBG. For more information, see WINDBG's online help and the NT DDK Programmer's Reference.

Although WINDBG is a helpful tool, it does have some problems.
For one thing, it's actually an amalgamation of an older console-based kernel-mode debugger (KD) and a GUI-based source-code debugger that came with early versions of the Win32 SDK. This double ancestry can make WINDBG a little confusing to use, since there may be a console command, a menu option, and a toolbar button that all do the same thing.

You may also experience occasional unexplainable WINDBG crashes, as well as several other kinds of quirky behavior. For a complete list of known (or at least, acknowledged) WINDBG bugs, look for an article on the Microsoft Developers CD in the Win32 SDK Knowledge Base. [3]

[3] Search for "WINDBG near bug" to find this article.

The Key to Source-Code Debugging

One of WINDBG's most powerful features is its ability to debug kernel-mode components at the source-code level. Sadly, the documentation isn't real clear about how to accomplish this little miracle. Proper configuration of two sets of directories is the key to making it all work.

Symbol directories. WINDBG gets very cranky if it can't find the symbol files for the modules you're trying to debug. This includes both the symbols for your driver and those for various operating system modules. See Appendix A for a description of how to set up WINDBG symbol directories.

Source code directories. On the machine where you're running WINDBG, the directory path to your driver's source code must exactly match the source-code path on the machine where the driver was compiled and linked. Even the drive letter has to be the same. The Linker stores this path information in the driver executable, and WINDBG uses it to find the source code. [4]

If you don't know the original source-code path for a kernel-mode component, don't worry. As long as you have a checked build of your driver (and its symbols haven't been stripped out), you can use the DUMPBIN utility to find the path names. The command looks like this:

    DUMPBIN /SYMBOLS XXDRIVER.SYS | MORE

This generates a lot of output. The important information is at the top of the listing. The following excerpts show the things you should look for.

    Dump of file xxdriver.sys

    File Type: EXECUTABLE IMAGE

    COFF SYMBOL TABLE
    000 0000000B DEBUG  notype  Filename  | .file
        D:\users\art\drivers\ch18\crash\driver\crash.c
    00B 00000015 DEBUG  notype  Filename  | .file
        D:\users\art\drivers\ch18\crash\driver\transfer.c
    015 0000001D DEBUG  notype  Filename  | .file
        D:\users\art\drivers\ch18\crash\driver\dispatch.c
    01D 00000023 DEBUG  notype  Filename  | .file
        D:\users\art\drivers\ch18\crash\driver\unload.c
    023 00000000 DEBUG  notype  Filename  | .file
        D:\users\art\drivers\ch18\crash\driver\init.c

A Few WINDBG Commands

Although WINDBG is a GUI program, you really can't avoid using its command-line window. This text-based interface supports several dozen built-in commands, and as you'll see later, you can add extensions of your own. Table 17.1 gives an overview of the more helpful WINDBG commands. See the online documentation and the WINDBG Help file for more information.

[4] WINDBG has a menu option that supposedly lets you change the source-code path, but it doesn't seem to work.
Table 17.1 Some useful WINDBG commands

WINDBG commands and extensions

Command                    Description
help                       Print help on basic WINDBG commands
k, kb, and kv              Print a trace of the current kernel-mode stack
dd address                 Dump the contents of memory
ln                         Print symbol names nearby a given value
.logopen                   Open a log file, replacing a previous version
.logappend                 Add new log information to an existing file
.logclose                  Close the debug log file
!help                      Print help on standard WINDBG extensions
!handle 0 3 CID            Print verbose information about process handles
!process 0 0               List all processes on system
!process address flags     Print information about a process object
!process CID -1            Print detailed information about specific process
!thread address            Print information about a thread
!pcr                       Print context information for 80x86 CPU
!vm                        Print virtual memory statistics
!sysptes 1                 Print summary of system page table usage
!drivers                   List currently loaded kernel-mode modules
!irpzone                   List IRPs in use in NT's IRP zone buffer
!irpzone full              (Same as above, but with more information)
!errlog                    List any unflushed messages in errorlog buffer
!bugdump ComponentName     Dump contents of bugcheck callback buffer
!irp address               Print formatted contents of an IRP
!devobj address            Print formatted contents of a Device object
!drvobj address            Print formatted contents of a Driver object
!srb address               Print formatted contents of an SRB
!trap address              Print formatted contents of 80x86 trap frame
!poolfind Tag              Print information about pool with a given tag
!poolused                  Print information about tagged pool
!reload                    Reload a particular module
!load ExtensionName        Load an extension DLL
!unload ExtensionName      Unload an extension DLL

17.5 ANALYZING A CRASH DUMP

When a crash occurs, Windows NT can save the state of the system in a dump file on the boot partition.[5] Crash dumps allow you to reboot almost immediately and determine the cause of the crash at a later time.
This section explains how to analyze a system crash dump.

[5] See Appendix A for instructions on enabling crash dumps.

Goals of the Analysis

Using WINDBG, you can poke around in the remains of a dead system and find out almost as much as if it were still running. This kind of forensic pathology can help you develop a convincing explanation of what led to the crash. Some of the questions you should ask when you're analyzing a crash include:

• Was my driver executing at the time of the crash?
• Was my driver responsible for the crash?
• What was the sequence of events that led to the crash?
• What operation was the driver trying to perform when the system crashed?
• Is there any information in the Device Extension that might tell me what was going on?
• What Device object was it working with?

Starting the Analysis

To begin analyzing a crash file, run WINDBG from the command line with the -y and -z options. These specify the location of the crash symbols and the dump file. For example,

    WINDBG -y c:\wnt\symbols -z c:\wnt\memory.dmp

For the crash that produced the STOP message you saw earlier, the initial output from WINDBG looks like this:

    Thread Create:  Process=0, Thread=0
    Module Load: d:\users\art\drivers\symbols\free\NTOSKRNL.DBG  (symbols loaded)
    Kernel Debugger connection established for G:\WINNT\MEMORY.DMP
    Kernel Version 1314 Free loaded @ 0x80100000
    Bugcheck 0000000a : 00000000 00000002 00000000 fce10796
    Stopped at an unexpected exception: code=80000003 addr=8013b416
    Hard coded breakpoint hit

You'll recognize some of this information from the STOP message. The bugcheck code is 0xA (IRQL_NOT_LESS_OR_EQUAL), which means the fourth parameter (0xFCE10796 in this case) is the address of the instruction where the problem occurred.
To see where this instruction is in your source code, choose Goto Address from the View menu, and enter the address from the bugcheck parameter. In this particular crash, 0xFCE10796 turns out to be in a function called XxTryToCrash.

The second parameter for bugcheck 0xA is the IRQL in effect at the time of the crash. From NTDDK.H, two turns out to be DISPATCH_LEVEL, which gives us a hint about what parts of the driver might have been executing at the time of the crash.

One point: Don't be misled by the message about the unexpected exception with a code of 0x80000003. This is just the breakpoint used by KeBugCheck itself to halt the system, so it has no significance.

Tracing the Stack

The stack trace is like a timeline, showing you the sequence of function calls leading up to the crash. By reading the trace from the oldest frame (at the bottom) to the crash frame (at the top), you can come up with a coherent story describing what happened. The trick is to find the stack.

High-IRQL crashes
If the system crashed while it was running at or above DISPATCH_LEVEL IRQL, you can use the k command to get a trace of the stack at the time of the bugcheck.

    KDx86> k
    ff416d1c fce10796 NT!KiTrap0E+0x252
    ff416da0 fce1061f XXDRIVER!XxTryToCrash+0x26 (0x00000004)
    ff416dc4 80114d69 XXDRIVER!XxStartIo+0x2f (0xFF58BC40, 0xFF567EE8)
    ff416de4 fce105d3 NT!IoStartPacket+0x9b
    ff416e08 80102f48 XXDRIVER!XxDispatchWrite+0x43 (0xFF58BC40, 0xFF567EE8)
    ff416e1c 8015b943 NT!@IofCallDriver@8+0x38
    ff416e3c 8015a348 NT!IopSynchronousServiceTail+0x6f
    ff416ed8 80137fb5 NT!NtWriteFile+0x6ac
    ff416ed8 77f89427 NT!KiSystemService+0xa5
    0012ff6c 00000000 NTDLL!ZwWriteFile+0xb

Each line shows the address of the stack frame, the return address of the function, the name of the function, and (in parentheses) the arguments passed to the function. You generally won't see any arguments for system functions. To make them show up, use the kv version of the stack-trace command.

In this trace, a call to ZwWriteFile eventually found its way to XXDRIVER's XxDispatchWrite routine. The first argument for a Dispatch routine is always the Device object (here, 0xFF58BC40) and the second (0xFF567EE8) is the IRP. XxDispatchWrite called IoStartPacket, which called the Start I/O routine in XXDRIVER. Just before it died, XxStartIo made a call to XxTryToCrash and passed it an argument with a value of four.

Another way to see the current stack is by selecting Calls from the WINDBG Window menu. Double-clicking on one of the frames in the Calls display will take you to the line of source code where the call originated. Once you've entered a stack frame this way, you can examine the function's local variables at the time of the crash by selecting Locals from the WINDBG Window menu.

Crashes below DISPATCH_LEVEL
When the system crashes because of an unhandled exception below DISPATCH_LEVEL IRQL, the stack trace from the k command won't tell you much about what was going on.[6] Instead, you need to find the trap frame associated with the crash. On 80x86 platforms, you can find the trap frame by using the kb command. First, look for the stack frame associated with a function called KiDispatchException.[7]

[6] If you have WINDBG connected to the target system and it catches the exception, the stack trace will give you useful information.
[7] This sample output is from a different crash than the one we've been examining.

    KDx86> kb
    FramePtr  RetAddr   Param1   Param2   Param3   Function Name
    fccccab8  80138f59  fccccad4 fccccb28 00000000 NT!KiDispatchException+0x366
    fccccc10  8015c542  80102f48 ff564410 ff4ebc88 NT!CommonDispatchException+0x4d
    fccccb28  ff56c860  ff50c1e8 00000000 00000006 NT!IopErrorLogQueueRequest+0x5c

Next, look down the left-hand column (the one labeled "FramePtr") for the address of the frame two entries below the KiDispatchException frame. In this crash, the frame of interest has the address 0xFCCCCB28. What you've just found is called the trap frame, and you can use the !trap command to format it.

    KDx86> !trap fccccb28
    eax=00000000 ebx=00000000 ecx=fccccb88 edx=00000000 esi=ff4ebec0 edi=ff567ee8
    eip=fce10796 esp=fccccb9c ebp=fccccbac iopl=0         nv up ei pl zr na po nc
    vip=0 vif=0
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000             efl=00010246
    ErrCode = 00000000

From the formatted trap frame, note the contents of the EBP (0xFCCCCBAC), ESP (0xFCCCCB9C), and EIP (0xFCE10796) registers. Use these values in the k command to specify the stack address. This displays the true stack trace at the time of the crash.

    KDx86> k = fccccbac fccccb9c fce10796
    fccccbac fce1046a XXDRIVER!XxTryToCrash+0x26 (0x00000002)
    fccccbc4 80102f48 XXDRIVER!XxDispatchOpenClose+0x1a (0xFF4EBEC0, 0xFF567EE8)
    fccccbd8 8015ccca NT!@IofCallDriver@8+0x38
    fccccc9c 80179b00 NT!IopParseDevice+0x77e
    fccccd0c 80175cf6 NT!ObpLookupObjectName+0x480
    fccccde4 80151e33 NT!ObOpenObjectByName+0xa2
    fcccce90 8015612d NT!IoCreateFile+0x43d
    fcccced0 80137fb5 NT!NtCreateFile+0x2f6
    fcccced0 77f889b3 NT!KiSystemService+0xa5
    fccccb98 ff567ee8 0x77f889b3

In this trace, it's obvious that the problem occurred while the driver's XxDispatchOpenClose function was handling a call to NtCreateFile.

Using trap frames
Another way to find the proper stack on 80x86 machines is to use the kv command. This displays a more detailed view of each frame. Look for a function with KiTrap in its name. Next to this function, you'll find the address of the trap frame.

    KDx86> kv
    ff416d1c fce10796 NT!KiTrap0E+0x252 (FPO: [0,0] TrapFrame @ ff416d1c)
    ff416da0 fce1061f XXDRIVER!XxTryToCrash+0x26 (0x00000004)
    ff416dc4 80114d69 XXDRIVER!XxStartIo+0x2f (0xFF58BC40, 0xFF567EE8)
    ff416de4 fce105d3 NT!IoStartPacket+0x9b

On the line for KiTrap, you'll find the address of the trap frame (in this case, 0xFF416D1C). Use the !trap command to format its contents.

    KDx86> !trap ff416d1c
    eax=00000000 ebx=ff58bc40 ecx=ff416d7c edx=00000000 esi=00000000 edi=ff567ee8
    eip=fce10796 esp=ff416d90 ebp=ff416da0 iopl=0         nv up ei pl zr na po nc
    vip=0 vif=0
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000             efl=00010246
    ErrCode = 00000000
    0xfce10796 8a00    mov al, byte ptr [eax]

From the trap frame, note the contents of the EBP (0xFF416DA0), ESP (0xFF416D90), and EIP (0xFCE10796) registers. Use these values in the k command to specify the stack address. This displays the true stack trace at the time of the crash.

    KDx86> k = ff416da0 ff416d90 fce10796
    ff416da0 fce1061f XXDRIVER!XxTryToCrash+0x26 (0x00000004)
    ff416dc4 80114d69 XXDRIVER!XxStartIo+0x2f (0xFF58BC40, 0xFF567EE8)
    ff416de4 fce105d3 NT!IoStartPacket+0x9b
    ff416e08 80102f48 XXDRIVER!XxDispatchWrite+0x43 (0xFF58BC40, 0xFF567EE8)
    ff416e1c 8015b943 NT!@IofCallDriver@8+0x38
    ff416e3c 8015a348 NT!IopSynchronousServiceTail+0x6f
    ff416ed8 80137fb5 NT!NtWriteFile+0x6ac
    ff416ed8 77f89427 NT!KiSystemService+0xa5
    ff416d8c ff567ee8 0x77f89427

You can see that this display matches the one generated by the k command, verifying that we've found the right stack.

Indirect Methods of Investigation

If your driver wasn't running at the time of the crash, the stack trace won't contain any useful information and you'll need to take a more indirect approach to find the problem. The goal is to gather as much information as possible about what the driver was doing when the system crashed. This involves a certain amount of creativity and imagination.

Finding I/O requests
One approach is to track down any IRPs the driver was processing at the time it died, and then try to puzzle out what was happening. Begin by getting a list of all the active IRPs on the system with the !irpzone command:

    KDx86> !irpzone
    Small Irp list
    ff567ee8 Thread ff599be0 current stack belongs to \Driver\XxDriver
    ff56a708 Thread ff519620 current stack belongs to \Driver\Mouclass
    ff56ab08 Thread ff547a60 current stack belongs to \Driver\Kbdclass
    ff56bd08 Thread ff500500 current stack belongs to \FileSystem\Rdr
    Large Irp list

From this list, select the IRPs currently belonging to your driver. Next, use the !irp command to format each one (this can be a rather tedious process if there are a lot of IRPs). This is what the formatted IRP looks like:

    KDx86> !irp ff567ee8
    Irp is from zone and active with 1 stacks 1 is current
     No Mdl System buffer = ff593d88 Thread ff599be0:  Irp stack trace.
         cmd flg cl Device   File     Completion-Context
     >     4   0  1 ff58bc40 ff4f9c28 00000000-00000000    pending
                \Driver\XxDriver
                    Args: 00000004 00000000 00000000 00000000

The cmd field shows the major function, and the Args field displays the Parameters union of the I/O stack location. The flg and cl fields show the stack location flags and control bits, which you can find in NTDDK.H. Here, you can see that the function code was a four (IRP_MJ_WRITE) and Parameters.Write.Length was 4 bytes. Furthermore, no Completion routine (or completion context) was associated with this I/O stack location, and it had already been marked pending at the time of the crash. Finally, there is a system buffer associated with the IRP (at location 0xFF593D88), which you can examine using the dd command or the Memory option in the WINDBG Window menu. This tells us that the Device object is doing Buffered I/O.

To see exactly which device the IRP was sent to, use the !devobj command on the address of the Device object from the IRP display. Here you can see that the target device was Crash0, and that the IRP had already been made current when the system crashed.

    KDx86> !devobj ff58bc40
    Device object is for:
      Crash0 \Driver\XxDriver DriverObject ff53e1d0
    Current Irp ff567ee8 RefCount 1 Type 00000022 DevExt ff58bcf8
    DeviceQueue:

Sometimes, you can find out even more information about what was going on by dumping the contents of the Device Extension with the dd command. Later in this chapter, you'll see how to write a WINDBG extension that makes the Device Extension easier to dump.
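Returning to the cmd field of the !irp display: when you have many IRPs to read, it can help to translate the numeric major function code into its symbolic name. The following is only a minimal sketch, not part of the book's sample code. The table and the function name irp_major_name are hypothetical, and the values are the IRP_MJ_xxx codes as defined in NTDDK.H (only the first nineteen are listed).

```c
#include <stddef.h>

/* Names of the IRP major function codes, indexed by code value.
   Values follow the IRP_MJ_xxx definitions in NTDDK.H. */
static const char *const irp_major_names[] = {
    "IRP_MJ_CREATE",                   /* 0x00 */
    "IRP_MJ_CREATE_NAMED_PIPE",        /* 0x01 */
    "IRP_MJ_CLOSE",                    /* 0x02 */
    "IRP_MJ_READ",                     /* 0x03 */
    "IRP_MJ_WRITE",                    /* 0x04 */
    "IRP_MJ_QUERY_INFORMATION",        /* 0x05 */
    "IRP_MJ_SET_INFORMATION",          /* 0x06 */
    "IRP_MJ_QUERY_EA",                 /* 0x07 */
    "IRP_MJ_SET_EA",                   /* 0x08 */
    "IRP_MJ_FLUSH_BUFFERS",            /* 0x09 */
    "IRP_MJ_QUERY_VOLUME_INFORMATION", /* 0x0a */
    "IRP_MJ_SET_VOLUME_INFORMATION",   /* 0x0b */
    "IRP_MJ_DIRECTORY_CONTROL",        /* 0x0c */
    "IRP_MJ_FILE_SYSTEM_CONTROL",      /* 0x0d */
    "IRP_MJ_DEVICE_CONTROL",           /* 0x0e */
    "IRP_MJ_INTERNAL_DEVICE_CONTROL",  /* 0x0f */
    "IRP_MJ_SHUTDOWN",                 /* 0x10 */
    "IRP_MJ_LOCK_CONTROL",             /* 0x11 */
    "IRP_MJ_CLEANUP"                   /* 0x12 */
};

/* Hypothetical helper: return the symbolic name for a major function
   code taken from the cmd column, or "(unknown)" if it is not listed. */
const char *irp_major_name(unsigned int code)
{
    size_t count = sizeof(irp_major_names) / sizeof(irp_major_names[0]);

    return (code < count) ? irp_major_names[code] : "(unknown)";
}
```

For the IRP shown above, the cmd value of 4 maps to IRP_MJ_WRITE, which agrees with the reading given in the text.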
Of course, this doesn't give us nearly as much information as the stack trace, but it does tell us that the driver was trying to process a Buffered I/O IRP_MJ_WRITE command. Since the IRP had been made current, we know that it got at least as far as the driver's Start I/O routine. Often the best approach in this case is to set up the system for interactive debugging and try to make the error repeat.

Examining processes
Occasionally, it's helpful to know what processes were running on a system at the time of a crash. This could help you spot patterns of system usage or even specific user programs that cause your driver to fail. For general information, you can use the !process command like this:

    KDx86> !process 0 0
    **** NT ACTIVE PROCESS DUMP ****
    PROCESS ff578940  Cid: 0002  Peb: 00000000  ParentCid: 0000
        DirBase: 00030000  ObjectTable: e1000f88  TableSize: 64.
        Image: System
    PROCESS ff554360  Cid: 0013  Peb: 7ffdf000  ParentCid: 0002
        DirBase: 012ec000  ObjectTable: e10017c8  TableSize: 48.
        Image: smss.exe
    PROCESS ff58b6c0  Cid: 0090  Peb: 7ffdf000  ParentCid: 007b
        DirBase: 003b9000  ObjectTable: e11feee8  TableSize: 16.
        Image: Xxtest.exe

For more information, you can use the CID number of a specific process and increase the level of verbosity with some flags.[8]

[8] Kernel-mode threads always run in the process whose CID is 2.

    KDx86> !process 90 -1
    Searching for Process with Cid == 90
    PROCESS ff58b6c0  Cid: 0090  Peb: 7ffdf000  ParentCid: 007b
        DirBase: 003b9000  ObjectTable: e11feee8  TableSize: 16.
        Image: Xxtest.exe
        VadRoot ff4fa668 Clone 0 Private 33. Modified 0. Locked 0.
        FF58B87C MutantState Signalled OwningThread 0
        Token                             e1304030
        ElapsedTime                       0:00:00.0110
        UserTime                          0:00:00.0020
        KernelTime                        0:00:00.0030
        QuotaPoolUsage[PagedPool]         6892
        QuotaPoolUsage[NonPagedPool]      1096
        Working Set Sizes (now,min,max)   (146, 50, 345)
        PeakWorkingSetSize                153
        VirtualSize                       8 Mb
        PeakVirtualSize                   8 Mb
        PageFaultCount                    159
        MemoryPriority                    FOREGROUND
        BasePriority                      9
        CommitCharge                      38

        THREAD ff599be0  Cid 90.88  Teb: 7ffde000  Win32Thread: 801448c0 RUNNING
        IRP List:
            ff567ee8: (0006,0094) Flags: 00000a30  Mdl: 00000000
        Not impersonating
        Owning Process ff58b6c0
        WaitTime (seconds)      107578
        Context Switch Count    12
        UserTime                0:00:00.0010
        KernelTime              0:00:00.0030
        Start Address 0x77f270a4
        Initial Sp ff417000 Current Sp ff416bec
        Priority 9 BasePriority 9 PriorityDecrement 0 DecrementCount 124

        ChildEBP RetAddr  Args to Child
        0012f750 00000000 00000000 00000000 00000000

For multithreaded processes, this form of the !process command will tell you things about all the threads, including any objects they might be waiting for. It also gives information about the I/O requests issued by a given thread, so if a thread seems to be getting hung, you can see what IRPs it issued.

Analyzing Crashes with DUMPEXAM

DUMPEXAM is a command-line utility that you can use to analyze a crash dump file. When you run this utility, it uses the kernel-mode debugger to execute a standard series of commands and produces an output file called MEMORY.TXT. The analysis performed by DUMPEXAM is intended to give support personnel a fairly detailed snapshot of the state of the system at the time of the crash. This can be useful if you're trying to support a driver out in the field.

You'll find DUMPEXAM on the Windows NT distribution CD in the \SUPPORT\DEBUG\ directory.
Along with the DUMPEXAM executable, you have to install the KD EXTS.DLL extension DLL for the target platform. Normally, these DLLs are copied along with everything else when you install WINDBG from the Win32 SDK. You also need to copy IMAGEHLP.DLL from the Windows NT distribution CD. It's in the same directory as the DUMPEXAM executable.

Finally, make sure you mirror the debug symbol tree that's on the CD when you run DUMPEXAM. Unfortunately, this tool isn't smart enough to handle the situation where everything is in the same directory.

17.6 INTERACTIVE DEBUGGING

Poking around in the remains of a dead system can tell you a great deal, but some problems are easier to diagnose while a driver is still running. This section briefly describes how to debug driver code interactively.

Starting and Stopping a Debug Session

WINDBG is the primary tool for interactive debugging. To use it, you'll need to set up host and target systems as described in Appendix A. As with crash dump analysis, make certain the source-code path on the host exactly matches the source-code path on the machine where the driver was built. Once everything is configured, follow these steps to begin an interactive debug session:

1. Move a copy of your driver's executable (or the corresponding .DBG symbol file) into the symbol directory on the host. Repeat this step each time you rebuild the driver, or the symbols will be out of sync.
2. From the command line, run WINDBG using the -k and -y options, for example,

       WINDBG -k i386 com1 9600 -y c:\wnt\symbols ntoskrnl.exe

3. From the WINDBG Run menu, select Go. You'll see a message in the WINDBG command window saying that WINDBG is waiting to connect.
4. Reboot the target machine with the Kernel's debug client enabled. As the system boots, you'll see it trying to make a connection with the debugger on the host.
   When the systems connect, there will be a lot of activity in WINDBG's command window.

Once you've established a connection between the host and target machines, you have a wide range of commands available to you. For the most part, the interactive WINDBG commands are a superset of the ones you use to analyze a crash. You also have the added capability of setting breakpoints on the target and single-stepping through target code.

After you've completed a debugging session, you should follow these steps to disconnect the host and the target:

1. If you've set any breakpoints in your driver, pause the target system by typing CTRL+C in the WINDBG command window. (Alternatively, you can press the SYSREQ key on the target itself.)
2. From the Debug menu, choose Breakpoints. When the breakpoint dialog appears, click on Clear All and OK.
3. From the Run menu, choose Go (or use the toolbar button) to let the target machine continue.
4. From the File menu, choose Exit.

After WINDBG has exited, the target machine may pause for 30 seconds or so the first time it hits a KdPrint macro. This delay is the time it takes the Kernel's debug client to realize there's no debugger to talk to. It occurs only once.

Setting Breakpoints

One of the great things about WINDBG is its ability to set source-code breakpoints in a driver. This can be immensely helpful for figuring out the exact nature of a bug. To set a breakpoint with WINDBG, do the following:

1. If the target machine is currently running, type CTRL+C in the WINDBG command window to pause the target. (Alternatively, you can press the SYSREQ key on the target.) You can't set breakpoints if the target is running.
2. From the File menu, choose Open. The Open File dialog box will appear. Navigate to the directory containing your driver's source code. Double-click on a source file to open it.
3. Move the cursor to the source code line where you want to set the breakpoint.
   If you're breaking on a multiline C statement, make sure you position the cursor on the line containing the semicolon.
4. Click on the breakpoint button in the toolbar. (It's the one that looks like a little hand.) If your driver is currently loaded in memory, the source-code line will turn red; if it hasn't been loaded yet, the source line will turn magenta.
5. Click on the Go button in the toolbar to let the target machine continue.

When the target machine hits the breakpoint, it will stop and the source-code line in WINDBG will turn green.

To remove a breakpoint, simply pause the target machine, select the source-code line containing the breakpoint, and click on the toolbar's breakpoint button. You can also use the Debug Breakpoints menu item to remove multiple breakpoints.

Breakpoints highlight another of WINDBG's little quirks. If you set several breakpoints in a driver that hasn't been loaded yet, WINDBG won't be able to resolve the first one that it hits. Instead it will display a dialog box asking you how it should handle the breakpoint. You should select the Defer option. This will cause WINDBG to instantiate all the breakpoints in the driver and proceed. When WINDBG hits the next breakpoint, it will work correctly. (In fact, even if it hits the first breakpoint again, it will work properly.) Breakpoints that you set after the driver is loaded don't seem to have this problem.

This odd behavior can make it difficult to set breakpoints in the DriverEntry routine. The easiest solution is just to set an extra (dummy) breakpoint somewhere at the beginning of DriverEntry. This one will cause the others to behave properly.

Setting Hard Breakpoints

With WINDBG, there aren't too many compelling reasons for putting hard breakpoints into your driver.
If you do find such a need, you can use the following two calls:

    VOID DbgBreakPoint( VOID );
    VOID KdBreakPoint( VOID );

KdBreakPoint is just a macro that wraps a conditional compilation directive around DbgBreakPoint. KdBreakPoint becomes a no-op if you build a free version of your driver.

Beware: NT will crash with a KMODE_EXCEPTION_NOT_HANDLED error if your driver hits a hard-coded breakpoint and the Kernel's debug client isn't enabled. If the debug client is enabled but there's no debugger on the other end of the serial line when your driver hits a breakpoint, NT will hang the target machine. You can recover from the hang by starting up WINDBG on the host machine.

Using Print Statements

Debugging code by peppering it with printf statements has a long and honorable history. You can continue the tradition by calling either DbgPrint or KdPrint. Both allow you to send a debug string from your driver (on the target system) to the WINDBG command window (on the host machine). These calls have the following syntax:

    ULONG DbgPrint( FormatString, arg1, arg2 ... );
    ULONG KdPrint(( FormatString, arg1, arg2 ... ));

DbgPrint and KdPrint take the same arguments as the standard printf function. Since KdPrint is actually a macro (defined in NTDDK.H), you have to include an extra set of parentheses in order to pass it a variable-length list of arguments. KdPrint also becomes a no-op in free builds of a driver.

17.7 WRITING WINDBG EXTENSIONS

One of WINDBG's strengths is that you can expand its capabilities by writing extension commands for it. This can be very helpful, particularly for printing out the contents of driver-defined data structures. Unfortunately, the documentation and sample extension code that come with the NT DDK are incorrect. This section explains how to add extension commands to WINDBG.

How WINDBG Extensions Work

A WINDBG extension is just a user-mode DLL that exports various commands in the form of DLL functions.
The extension DLL also contains several support routines that perform initialization and version-checking operations.

One of the tricky aspects of writing a WINDBG extension is gaining access to memory in the target system (whether it's a crash file or a live machine). To make this easy, WINDBG supplies a set of callback routines that the extension DLLs use to touch the debug target. This means the DLL has the same view of the target system's memory as WINDBG itself. In particular, extension commands can't access anything that is paged out at the time a crash or breakpoint occurs.

Initialization and Version-Checking Functions

When you write an extension DLL for WINDBG, there are two required initialization functions that you must include. At your option, you can also include a third version-checking function. These are described in the following subsections.

WinDbgExtensionDllInit
WINDBG calls this function when the user loads the extension DLL. Its job is to save the address of the callback table so that other parts of the DLL can use it. This function (shown in Table 17.2) is required.

Table 17.2 Function prototype for WinDbgExtensionDllInit

VOID WinDbgExtensionDllInit

Parameter                                 Description
PWINDBG_EXTENSION_APIS lpExtensionApis    Address of table containing pointers to WINDBG callback functions
USHORT MajorVersion                       • 0xF for free build of NT
                                          • 0xC for checked build of NT
USHORT MinorVersion                       Build-number of NT
Return value                              (None)

ExtensionApiVersion
WINDBG calls this function when you try to load an extension DLL. Its job is to convince WINDBG that the extension DLL has the same version as WINDBG itself. It does this by returning a pointer to the version structure associated with the extension DLL. This function (shown in Table 17.3) is required.
Table 17.3 Function prototype for ExtensionApiVersion

LPEXT_API_VERSION ExtensionApiVersion

Parameter       Description
VOID            (None)
Return value    Address of the DLL's EXT_API_VERSION structure

CheckVersion
Each time WINDBG executes a command in the DLL, it calls this function before calling the command routine. CheckVersion's job is to make sure that the version of the extension DLL is compatible with the version of NT being debugged. If not, it should complain loudly (and perhaps set a global DLL variable to inhibit command execution). This function (shown in Table 17.4) is optional.

Table 17.4 Function prototype for CheckVersion

VOID CheckVersion

Parameter       Description
VOID            (None)
Return value    (None)

Writing Extension Commands

Each command in your extension DLL is implemented as a separate function. Define these command functions using the DECLARE_API macro, like this:

    DECLARE_API( command_name )
    {
        //
        // Your code
        //
        ...
    }

DECLARE_API gives your command function the prototype shown in Table 17.5. Be sure the names of your commands are entirely lower-case, or WINDBG won't be able to find them.

Table 17.5 Commands declared with DECLARE_API have this prototype

VOID command_name

Parameter                    Description
IN HANDLE hCurrentProcess    Handle of current process on target machine
IN HANDLE hCurrentThread     Handle of current thread on target machine
IN ULONG dwCurrentPc         Current value of program counter
IN ULONG dwProcessor         Number of current CPU
IN PCSTR args                Argument string passed to the command
Return value                 (None)

These extension commands can perform any sort of operation that will make debugging easier. Their most common use is to format and print the contents of various driver-defined data structures, like the Device Extension.
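To make the shape of a command function concrete, here is a minimal sketch that compiles outside the DDK build environment. It is not the book's sample code. The type definitions, the DECLARE_API expansion (written to match Table 17.5), and mock_dprintf are local stand-ins for what WDBGEXTS.H and WINDBG would normally supply, and the command name xxdevext and the helper xx_parse_address are hypothetical. The sketch parses its argument with strtoul; a real extension would more likely hand the string to WINDBG's GetExpression helper.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without the DDK. In a real extension
   these come from the DDK headers and WDBGEXTS.H (where ULONG is 32 bits). */
typedef void *HANDLE;
typedef unsigned long ULONG;
typedef const char *PCSTR;
#define VOID void
#define IN

/* Assumed expansion, written to match the prototype in Table 17.5. */
#define DECLARE_API(name)                                          \
    VOID name(IN HANDLE hCurrentProcess, IN HANDLE hCurrentThread, \
              IN ULONG dwCurrentPc, IN ULONG dwProcessor, IN PCSTR args)

/* Mock output routine: keeps the last message and echoes it to stdout.
   In WINDBG, dprintf writes to the command window instead. */
char mock_output[128];

void mock_dprintf(const char *format, ...)
{
    va_list ap;

    va_start(ap, format);
    vsnprintf(mock_output, sizeof(mock_output), format, ap);
    va_end(ap);
    fputs(mock_output, stdout);
}
#define dprintf mock_dprintf

/* Hypothetical helper: read a hexadecimal address from the argument
   string. Returns 1 on success, 0 if no hex digits were found. */
int xx_parse_address(PCSTR args, ULONG *address)
{
    char *end;

    if (args == NULL)
        return 0;
    *address = strtoul(args, &end, 16);
    return end != args;
}

/* Hypothetical command, invoked as "!xxdevext address". Note the
   all-lower-case name. */
DECLARE_API(xxdevext)
{
    ULONG address;

    (void)hCurrentProcess;
    (void)hCurrentThread;
    (void)dwCurrentPc;
    (void)dwProcessor;

    if (!xx_parse_address(args, &address)) {
        dprintf("usage: !xxdevext <address>\n");
        return;
    }

    /* A real command would now read the structure from the target
       and print its fields one by one. */
    dprintf("Device Extension at 0x%08lx\n", address);
}
```

Keeping the argument parsing in its own small function means it can be exercised without a debugger attached, which is handy given how awkward it is to test an extension DLL inside WINDBG.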
Finally, if one of your extension commands is going to take a long time to execute, or if it's going to generate a lot of output, it should periodically check to see if the WINDBG user has typed CTRL+C. Otherwise, the user won't have any way to abort the command until it completes. One of the WINDBG helper functions described next lets you make this check.

WINDBG Helper Functions

Your extension DLL gains access to the system being debugged by calling various helper functions exported by WINDBG itself. These functions also give your DLL access to the WINDBG command window for input and output. Table 17.6 contains a brief description of these helper functions.

Table 17.6 A WINDBG extension DLL can call these helper functions

WINDBG helper functions

Function            Description
dprintf             Print formatted text in WINDBG command window
CheckControlC       See if WINDBG user has typed CTRL+C
GetExpression       Convert a C expression into a DWORD value
GetSymbol           Locate name of symbol nearest a given address
Disassm             Generate string representation of machine instruction
StackTrace          Return stack-trace of current process
GetKDContext        Return current CPU number and count of CPUs
GetContext          Return CPU context of process being debugged
SetContext          Modify CPU context of process being debugged
ReadControlSpace    Get platform-specific CPU information
ReadMemory          Copy data from system virtual space into buffer
WriteMemory*        Copy data from buffer to system virtual space
ReadIoSpace*        Read I/O port
WriteIoSpace*       Write I/O port
ReadIoSpaceEx*      Read I/O port on specific bus-type and number (Alpha only)
WriteIoSpaceEx*     Write I/O port on specific bus-type and number (Alpha only)
ReadPhysical        Copy data from physical memory into buffer
WritePhysical*      Copy data from buffer to specific physical addresses

*These functions can only be used during an interactive debugging session.

The only complete documentation on these helper functions is in the WINDBG online help.
To find it, do the following:

1. From the WINDBG help Contents screen, click on the KD button.
2. Click on the "Creating Extensions" topic.
3. Scroll about halfway down this topic and you'll find a list of helper functions.
4. Click on the name of a function to see its prototype and a description.

Building and Using an Extension DLL

Although a WINDBG extension is just a user-mode DLL, you still need to compile and link it using the BUILD utility. This is because it incorporates the DDK header files, and it needs all the compile-time symbol definitions provided by BUILD. Consequently, using Visual C++ projects to create an extension DLL isn't easy. The example in the next section contains a SOURCES file that builds one of these DLLs.

To use an extension DLL, you first load it using WINDBG's !load command. Then you execute one of its functions with a command of the form !function. The !unload command allows you to unload an extension DLL.

WINDBG allows you to have up to 32 extension DLLs loaded at one time. When you execute a !function command, WINDBG searches the list of currently loaded extensions, starting with the most recently loaded and going back to the earliest.

17.8 CODE EXAMPLE: A WINDBG EXTENSION

This example shows how to write a simple WINDBG extension DLL. You can find the code for this example in the CH17\XXDBG directory on the disk that accompanies this book.

XXDBG.C

All the code for this extension DLL is in a single file. The following subsections break it into easily digestible pieces.

Headers

This part of the code contains all the headers and definitions needed to make everything work. Warning: There is some odd stuff going on here. Don't change the sequence of anything between (1) and (3).

    #include <ntddk.h>                                  // (1)
    #include <...>

    #define LMEM_FIXED          0x0000                  // (2)
    #define LMEM_MOVEABLE       0x0002
    #define LMEM_NOCOMPACT      0x0010
    #define LMEM_NODISCARD      0x0020
    #define LMEM_ZEROINIT       0x0040
    #define LMEM_MODIFY         0x0080
    #define LMEM_DISCARDABLE    0x0F00
    #define LMEM_VALID_FLAGS    0x0F72
    #define LMEM_INVALID_HANDLE 0x8000

    #define LPTR (LMEM_FIXED | LMEM_ZEROINIT)

    #define WINBASEAPI

    WINBASEAPI HLOCAL WINAPI LocalAlloc( UINT uFlags, UINT uBytes );
    WINBASEAPI HLOCAL WINAPI LocalFree( HLOCAL hMem );

    #define CopyMemory RtlCopyMemory
    #define FillMemory RtlFillMemory
    #define ZeroMemory RtlZeroMemory

    #include <wdbgexts.h>                               // (3)

    //
    // Other header files...
    //
    #include <stdlib.h>
    #include <string.h>
    #include "..\driver\xxdriver.h"                     // (4)

(1) This is the beginning of some magic. The problem is that we're trying to build a Win32 user-mode DLL, but we need access to things defined in NTDDK.H and XXDRIVER.H. It takes a little trickery to get all the header files to live together.

(2) The various definitions that follow are taken from WINBASE.H in the Win32 SDK. The WINDBG extension definitions from WDBGEXTS.H won't work without them. Unfortunately, NTDDK.H and WINBASE.H can't coexist in the same source file. The only solution is to cut the required pieces from WINBASE.H and include them here.

(3) Now it's safe to bring in the WINDBG extension definitions. This header is located in MSTOOLS\H in the Win32 SDK. Here ends the magical sequence of headers and definitions.

(4) Finally, bring in the driver-specific data structures and definitions.

Globals

These global variables are necessary for the proper operation of the extension library.
    static EXT_API_VERSION ApiVersion =
        { 3, 5, EXT_API_VERSION_NUMBER, 0 };            // (1)
    static WINDBG_EXTENSION_APIS ExtensionApis;         // (2)
    static USHORT SavedMajorVersion;                    // (3)
    static USHORT SavedMinorVersion;

(1) This structure identifies the version of WINDBG that this particular extension library works with. WINDBG won't allow you to load an incompatible extension DLL.

(2) This will hold a copy of the table of WINDBG callback functions. The access macros defined in WDBGEXTS.H assume that this variable is called ExtensionApis, so don't change the name.

(3) These variables will hold information about the version of NT that is being debugged. You can use this information to verify that your library is compatible with that version.

Required functions

These functions perform various kinds of initialization and version-checking.

    VOID
    WinDbgExtensionDllInit(
        PWINDBG_EXTENSION_APIS lpExtensionApis,
        USHORT MajorVersion,
        USHORT MinorVersion
        )
    {
        //
        // Save the WINDBG callback table
        // and the NT version information
        //
        ExtensionApis = *lpExtensionApis;

        SavedMajorVersion = MajorVersion;
        SavedMinorVersion = MinorVersion;

        return;
    }

    VOID
    CheckVersion( VOID )
    {
        //
        // Replace this with your
        // version-checking code
        //
        dprintf(
            "CheckVersion called... [%1x; %d]\n",
            SavedMajorVersion,
            SavedMinorVersion );
    }

    LPEXT_API_VERSION
    ExtensionApiVersion( VOID )
    {
        return &ApiVersion;
    }

Command routines

Here is the code for a command that formats and prints the contents of the Device Extension. It illustrates how to access memory on the system being debugged.

    DECLARE_API( devext )
    {
        DWORD dwBytesRead;
        DWORD dwAddress;
        PDEVICE_OBJECT pDevObj;
        PDEVICE_EXTENSION pDevExt;

        if(( pDevObj =
                malloc( sizeof( DEVICE_OBJECT ))) == NULL )         // (1)
        {
            dprintf( "Can't allocate buffer.\n" );
            return;
        }

        dwAddress = GetExpression( args );                          // (2)

        if( !ReadMemory(
                dwAddress,
                pDevObj,
                sizeof( DEVICE_OBJECT ),
                &dwBytesRead ))                                     // (3)
        {
            dprintf( "Can't get Device object.\n" );
            free( pDevObj );
            return;
        }

        if(( pDevExt =
                malloc( sizeof( DEVICE_EXTENSION ))) == NULL )      // (4)
        {
            dprintf( "Can't allocate buffer.\n" );
            free( pDevObj );
            return;
        }

        if( !ReadMemory(
                (DWORD)pDevObj->DeviceExtension,
                pDevExt,
                sizeof( DEVICE_EXTENSION ),
                &dwBytesRead ))                                     // (5)
        {
            dprintf( "Can't get Device Extension.\n" );
            free( pDevExt );
            free( pDevObj );
            return;
        }

        dprintf(                                                    // (6)
            "BytesRequested: %d\n"
            "BytesRemaining: %d\n"
            "TimeoutCounter: %d\n"
            "DeviceObject: %8x\n",
            pDevExt->BytesRequested,
            pDevExt->BytesRemaining,
            pDevExt->TimeoutCounter,
            pDevExt->DeviceObject );

        free( pDevExt );                                            // (7)
        free( pDevObj );
    }

(1) Allocate memory for a copy of the Device object.

(2) Get the address of the Device object from the command line using a WINDBG callback function.

(3) Use another WINDBG callback function to get a copy of the Device object from the system being debugged.

(4) Allocate another buffer to hold the Device Extension.

(5) Get the address of the Device Extension (on the target system) from the Device object. Copy the Extension from the target system into the buffer.

    KDx86> !load xxdbg                                  (1)
    Debugger extension library [xxdbg] loaded
    KDx86> !devext ff58bc40                             (2)
    CheckVersion called... [f; 1057]
    BytesRequested: 0
    BytesRemaining: 0
    TimeoutCounter: 0
    DeviceObject: ff58bc40
    KDx86> !unload                                      (3)
    Extension dll xxdbg unloaded

(1) The !load command brings XXDBG into memory and makes it the default extension library. For this to work, XXDBG.DLL must be in one of the directories where the system looks for DLLs.
(2) To execute a command, just prefix the command name with an exclamation point.

(3) The !unload command unloads the current default extension library. To unload some other extension DLL, specify the name of the library as an argument to the command.

17.9 MISCELLANEOUS DEBUGGING TECHNIQUES

Often the main problem in correcting driver bugs is just getting enough information to make an accurate diagnosis. This section presents a grab bag of techniques that may help.

Leaving Debug Code in the Driver

In general, it's a good idea to leave debugging code in place, even when you think the driver is ready for release. That way, you can reuse it when you have to modify the driver at some later date. Conditional compilation makes this easy to do.

The BUILD utility defines a compile-time symbol called DBG that you can use to conditionally add debugging code to your driver. In the checked BUILD environment, DBG has a value of one; in the free environment it has a value of zero. Several of the macros described below use this symbol to suppress the generation of extraneous debugging code in free versions of drivers. If you're adding your own debugging code to a driver, you should wrap it in #if DBG and #endif statements.

Catching Incorrect Assumptions

As in real life, making unfounded assumptions in kernel-mode drivers is a dangerous practice. For example, assuming that some function argument will always be non-NULL, or that a piece of code will only be called at a specific IRQL level, can lead to disaster if these expectations aren't met.

To catch unforeseen conditions that could lead to driver failure, you need to do two things. First, you have to document the explicit assumptions made by your code. Second, you need to verify that these assumptions are actually true at runtime. The ASSERT and ASSERTMSG macros will help you with both these tasks.
They have the following syntax:

    ASSERT( Expression );
    ASSERTMSG( Message, Expression );

If Expression evaluates to FALSE, ASSERT writes a message to WINDBG's command window. The message contains the source code of the failing expression, plus the file name and line number where the ASSERT macro was called. It then gives you the option of taking a breakpoint at the point of the ASSERT, ignoring the assertion failure, or terminating the process or thread in which the assertion occurred.

ASSERTMSG exhibits the same behavior, except that it includes the text of the Message argument with its output. Don't try getting too fancy with the Message argument; it's just a simple string. Unlike the debug print functions described earlier, ASSERTMSG doesn't allow you to include any printf-style substitutions.

Several things are worth mentioning here. First, both assertion macros compile conditionally and disappear altogether in free builds of your driver. This means it's a very bad idea to put any executable code in the Expression argument.

Another little twist is that RtlAssert (the underlying function used by these macros) is a no-op in the free version of Windows NT itself. So, if you want to see any assertion failures, you'll have to run a checked build of your driver under the checked version of Windows NT.

Finally, a warning is in order: The checked build of Windows NT will crash with a KMODE_EXCEPTION_NOT_HANDLED error if an assertion fails and the Kernel's debug client isn't enabled. If the debug client is enabled, but there's no debugger on the other end of the serial line, the target machine will simply hang when an assertion fails. You can recover from the hang by starting up WINDBG on the host machine, but you won't see the text of the assertion that failed.

Using Bugcheck Callbacks

A bugcheck callback is an optional driver routine that gets called by the Kernel when the system begins to crash.
These routines give you a convenient way to capture debugging information at the time of a crash. You can also use them to put a piece of hardware in a known state before the system goes away. Here's how they work.

1. In DriverEntry, call KeInitializeCallbackRecord to set up a KBUGCHECK_CALLBACK_RECORD structure. The space for this opaque structure must be nonpaged, and must be left alone until you call KeDeregisterBugCheckCallback.
2. Also in DriverEntry, call KeRegisterBugCheckCallback to request notification when a bugcheck occurs. The arguments to this function include the bugcheck-callback record, the address of a callback routine, the address and size of a driver-defined crash buffer, and a string that will be used to identify this driver's crash buffer. As with the bugcheck-callback record, memory for the driver's crash buffer must be nonpaged and can't be touched until the driver calls KeDeregisterBugCheckCallback.
3. Call KeDeregisterBugCheckCallback in your driver's Unload routine to disconnect from the bugcheck notification mechanism.
4. If a bugcheck occurs, the system will call the driver's bugcheck-callback routine and pass it the address and size of the driver's crash buffer. The job of the callback routine is to fill the crash buffer with any information that would not otherwise end up in the dump file (like the contents of device registers).
5. When you analyze the crash with WINDBG, use the !bugdump command to view the contents of the crash buffer.

There are some restrictions on what a bugcheck callback is allowed to do. When it runs, the callback routine can't allocate any system resources (like memory). It also can't use spin locks or any other synchronization mechanisms.[9] It is allowed to call Kernel routines that don't violate these restrictions, as well as the HAL functions that access device registers.
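The callback described in step 4 can be sketched in ordinary C. What follows is a user-mode stand-in under stated assumptions, not DDK code: the XX_CRASH_RECORD layout, the XX_DEVICE_STATE structure, and the XxBugCheckCallback name are hypothetical, and the registration calls named in the text appear only in comments. The sketch is meant to show the shape of the routine: it touches only memory that already exists, takes no locks, and never writes past the length it was given.

```c
#include <stddef.h>

/* Hypothetical crash record; a real driver would define its own layout. */
typedef struct XX_CRASH_RECORD {
    unsigned long Signature;      /* written last, so a partial record is detectable */
    unsigned long StatusReg;      /* last device status the driver saw */
    unsigned long IrpsStarted;    /* counters worth preserving across the crash */
    unsigned long IrpsCompleted;
} XX_CRASH_RECORD;

#define XX_CRASH_SIGNATURE 0x58584352UL   /* arbitrary marker value */

/* Hypothetical driver state that the callback snapshots. */
typedef struct XX_DEVICE_STATE {
    unsigned long StatusReg;
    unsigned long IrpsStarted;
    unsigned long IrpsCompleted;
} XX_DEVICE_STATE;

static XX_DEVICE_STATE XxState;

/*
 * In a real driver this buffer must be nonpaged. It would be handed to
 * KeRegisterBugCheckCallback in DriverEntry (steps 1 and 2) and released
 * with KeDeregisterBugCheckCallback in the Unload routine (step 3).
 */
static XX_CRASH_RECORD XxCrashBuffer;

/*
 * Step 4: the routine receives the address and size of the crash buffer.
 * It allocates nothing, uses no synchronization, and does nothing at all
 * if the buffer is missing or too small for the record.
 */
static void XxBugCheckCallback(void *Buffer, unsigned long Length)
{
    XX_CRASH_RECORD *rec = (XX_CRASH_RECORD *)Buffer;

    if (Buffer == NULL || Length < sizeof(XX_CRASH_RECORD))
        return;

    rec->StatusReg     = XxState.StatusReg;
    rec->IrpsStarted   = XxState.IrpsStarted;
    rec->IrpsCompleted = XxState.IrpsCompleted;
    rec->Signature     = XX_CRASH_SIGNATURE;
}
```

Calling XxBugCheckCallback(&XxCrashBuffer, sizeof XxCrashBuffer) after setting the fields of XxState leaves a signed copy of those fields in the buffer, which is the data a command like !bugdump would later show. Writing the signature last is a design choice of this sketch: a record without it was never completely filled in.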
[9] Synchronization shouldn't be a problem, though, since nothing else is allowed to run while the bugcheck callback is executing.

Catching Memory Leaks

A memory leak is one of the nastier kinds of driver pathology. Drivers that allocate pool space and then forget to release it may just degrade system performance over time, or they can lead to actual system crashes. You can use NT's built-in pool-tagging mechanism to determine if your driver leaks memory. Here's how it works.

1. Replace calls to ExAllocatePool with ExAllocatePoolWithTag calls. The extra 4-byte tag argument to this function will be used to mark the block of memory allocated by your driver.
2. Run your driver under the checked build of NT. Keeping track of pool tags is an expensive activity, so it only works under the checked version of NT.[10]
3. When you're analyzing a crash, or when your driver is at a breakpoint, use the !poolused or !poolfind commands in WINDBG to examine the state of the pool areas. These commands sort the pool areas by tag value and display various memory statistics for each tag.

One easy way to use pool tagging is to replace the ExAllocatePool function with ExAllocatePoolWithTag using conditional compilation. This way, you can turn tagging on and off without too much trouble. Add something like the following to your driver's header file:

    #if DBG
    #define ExAllocatePool( type, size ) \
                ExAllocatePoolWithTag( (type), (size), 'DCBA' )
    #endif

The tag argument to ExAllocatePoolWithTag consists of four case-sensitive ANSI characters. Because of the way things work on little-endian machines, you need to specify the characters in reverse order. Hence, the DCBA in the example will become ABCD in the pool tag display.

In this example, we used the same tag value for all the allocations made by a single driver.
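The reversal described above is easier to see with the bytes written out. The sketch below is plain user-mode C with hypothetical names (XX_MAKE_TAG, XxTagToDisplay). It assumes that a four-character constant such as 'DCBA' places its first character in the most significant byte; the C language leaves that value implementation-defined, so treat it as an assumption about the compiler. A tag display reads the four bytes in memory order, which on a little-endian machine means least significant byte first.

```c
#include <stdint.h>

/* Builds the value 'abcd' is assumed to have: first character in the top byte. */
#define XX_MAKE_TAG(a, b, c, d)                    \
    (((uint32_t)(unsigned char)(a) << 24) |        \
     ((uint32_t)(unsigned char)(b) << 16) |        \
     ((uint32_t)(unsigned char)(c) <<  8) |        \
      (uint32_t)(unsigned char)(d))

/*
 * Writes the tag the way a little-endian memory dump shows it:
 * lowest byte first, followed by a terminating NUL.
 */
static void XxTagToDisplay(uint32_t tag, char out[5])
{
    int i;

    for (i = 0; i < 4; i++) {
        out[i] = (char)(tag & 0xFF);
        tag >>= 8;
    }
    out[4] = '\0';
}
```

Passing XX_MAKE_TAG('D', 'C', 'B', 'A') to XxTagToDisplay yields the string "ABCD", matching the behavior the text describes for the tag in the example macro.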
For some situations, you might also want to use different tag values for different kinds of data structures, or for allocations made by different parts of your driver. These kinds of strategies might help you see exactly what's been leaking out of your driver.

The POOLMON utility that comes with the NT DDK also lets you look at the pool tags dynamically, without the need for WINDBG. You run this command-line utility on the target machine and it outputs a continuously updated display of the pool tags. See Chapter 6 of the DDK Programmer's Guide for details on running POOLMON.

[10] Chapter 6 of the DDK Programmer's Guide claims that you can enable this feature in the free build of NT by ORing the FLG_POOL_ENABLE_TAGGING bit into the GlobalFlag value of the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager key of the Registry. Unfortunately, none of the currently available documentation or header files defines what this value is.

Using Counters, Bits, and Buffers

There's no question that interactive driver debugging is a wonderful thing. Unfortunately, some kinds of bugs are time-dependent, and they disappear when you use breakpoints or single step through the code. This subsection presents several techniques that may help you catch these bugs.

Sanity counters

You can use pairs of counters to perform several kinds of sanity checks in your driver. For example, you might count how many IRPs arrive at your driver and how many you send to IoCompleteRequest. Or, in a higher-level driver, you could track the number of IRPs allocated versus the number released. Checks like these can help you find subtle inconsistencies in the behavior of your driver. The only disadvantage of sanity counters is that they don't necessarily tell you where the problem is occurring.

Implementing a counter is very simple.
Just declare a ULONG variable in your Device Extension for each counter and then add appropriate code to increment the counters throughout your driver. As with all debugging support, it's a good idea to wrap sanity-counter code in conditional compilation statements that depend on the DBG symbol.

If you're feeling really ambitious, you can write a WINDBG extension to display the counters. As a simple alternative, your driver can force a bugcheck after it has collected enough data, and simply use a bugcheck callback to save the counter values.

Event bits

Another useful technique is to keep a collection of bit flags that track the occurrence of significant events in your driver. Each bit represents one specific event, and when that event happens, your driver sets the corresponding bit. Where sanity counters tell you about global driver behavior, event bits can give you an idea of what parts of your code have executed.

One of the decisions you'll have to make is whether to clear the event variable during DriverEntry, during the Dispatch routine for IRP_MJ_CREATE, or when you begin processing each new IRP. Each of these options can be useful in different situations.

Trace buffers

The problem with event bits and counters is that they don't give you any idea of the sequence of execution of your code. To get around this limitation, you can add a simple tracing mechanism that makes entries in a special buffer as different parts of your driver execute.

Trace buffers can be very useful for tracking down unexpected interactions in asynchronous or full-duplex drivers. On the downside, this extra information isn't free. Trace buffers use more CPU time than counters or event bits, and this could have an effect on time-sensitive bugs.

Implementing a trace buffer mechanism takes a little more work than the other techniques we've looked at. Here are the basic steps you need to follow:

1. Add trace buffer data structures to your driver.
   Normally, you should put these structures in the Device Extension so you can trace things on a device-by-device basis. Every once in a while, you might find some value in a global buffer that traces the entire driver.
2. Define a macro to make entries in the trace buffer. As with other pieces of debug code, it's a good idea to bracket the trace macro with conditional compilation statements.
3. Insert calls to the trace macro at various strategic places in your driver.
4. Write a debugger extension to dump the contents of the trace buffer.

The trace buffer itself is just an array, coupled with a counter that keeps track of the next free slot. The following code fragment illustrates the structure of a basic trace buffer.

    typedef struct _DEVICE_EXTENSION {
    #if DBG
        ULONG TraceCount;
        ULONG TraceBuffer[ XX_TRACE_BUFFER_SIZE ];
    #endif
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

Again, depending on what you're looking for, you can initialize the TraceCount field once in your DriverEntry routine, each time you get an IRP_MJ_CREATE request, or with each new IRP.

Adding entries to the buffer is just a matter of storing an item in the array and incrementing the counter.[11] This code fragment shows how to implement a basic trace macro.

    #if DBG
    #define XXTRACE( pDE, Tag )                                      \
        do {                                                         \
            if( (pDE)->TraceCount >= XX_TRACE_BUFFER_SIZE )          \
                (pDE)->TraceCount = 0;                               \
            (pDE)->TraceBuffer[ (pDE)->TraceCount++ ] =              \
                                            (ULONG)(Tag);            \
        } while( FALSE )
    #else
    #define XXTRACE( pDE, Tag ) do { } while( FALSE )
    #endif

Notice that this implementation ignores all the synchronization issues that arise when you call XXTRACE from multiple IRQL levels (potentially on multiple CPUs). Since the whole purpose of using trace buffers is to catch errors that are sensitive to timing, putting synchronization mechanisms into XXTRACE would probably make it useless. So, just how do you prevent the trace macro from trashing itself?
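Before turning to that question, it may help to see the recording and dumping steps side by side. The following is a user-mode sketch, not driver code: it repeats the wrap-around logic of XXTRACE as a function and adds a hypothetical XxTraceDump routine of the kind step 4 calls for, one that lists entries oldest first. The XX_TRACE structure, the Wrapped flag, and the buffer size of 4 are assumptions made for illustration; the fragment in the text does not record whether the buffer has wrapped.

```c
#include <stddef.h>

#define XX_TRACE_BUFFER_SIZE 4   /* tiny on purpose, to show wrap-around */

typedef struct XX_TRACE {
    unsigned long TraceCount;    /* index of the next free slot */
    unsigned long Wrapped;       /* nonzero once old entries have been overwritten */
    unsigned long TraceBuffer[XX_TRACE_BUFFER_SIZE];
} XX_TRACE;

/* Same logic as the trace macro, plus the Wrapped flag. No synchronization. */
static void XxTrace(XX_TRACE *t, unsigned long tag)
{
    if (t->TraceCount >= XX_TRACE_BUFFER_SIZE) {
        t->TraceCount = 0;
        t->Wrapped = 1;
    }
    t->TraceBuffer[t->TraceCount++] = tag;
}

/*
 * Copies the surviving entries to out[] in the order they were recorded,
 * oldest first, and returns how many were copied. out[] must have room
 * for XX_TRACE_BUFFER_SIZE entries.
 */
static size_t XxTraceDump(const XX_TRACE *t, unsigned long *out)
{
    size_t n = 0;
    size_t i;

    if (t->Wrapped) {
        /* Slots from TraceCount to the end hold the oldest survivors. */
        for (i = t->TraceCount; i < XX_TRACE_BUFFER_SIZE; i++)
            out[n++] = t->TraceBuffer[i];
    }
    for (i = 0; i < t->TraceCount; i++)
        out[n++] = t->TraceBuffer[i];

    return n;
}
```

Recording the tags 1 through 6 into this four-slot buffer leaves 3, 4, 5, 6 in the dump; entries 1 and 2 have been overwritten. A real debugger extension would first copy the Device Extension from the target with ReadMemory and then walk the array in the same order.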
One solution is to call XXTRACE only from places in your driver where synchronization won't be a problem. For example, if you call XXTRACE from DPC routines, synchronization is already being handled as part of the larger structure of the driver itself. Similarly, if you call it from an ISR and a SyncCritSection routine, synchronization is already guaranteed. If you can't live with these restrictions, you'll have to add explicit synchronization to XXTRACE.

[11] If you have a large enough trace buffer and an accurate idea of how many events will be traced, you can save some time by eliminating the test for a full buffer. This is a very dangerous optimization, so use it with care.

17.10 SUMMARY

When you write a driver, very few limits are placed on what you can do to the system. With all this power comes the heavy burden of making sure that your driver doesn't compromise system integrity. You need to correct not only overt, catastrophic errors, but also subtle problems that may over time damage the system. This chapter has presented some techniques you can use to diagnose and eliminate bugs, both early in the development cycle, and later when the driver is out in the world.

But suppose bugs aren't the problem. Suppose the driver works, but it just isn't fast enough. The next chapter examines the important area of driver performance.

CHAPTER 18
Driver Performance

There's a certain feverish look - a kind of glassy stare - that comes into the eyes of a programmer about to start tuning a piece of code. You can almost hear their thoughts: "If I just squeeze out a few cycles here and there, make this loop a little tighter, optimize the code by hand, maybe even use some assembly language . . . " Through some kind of magic, everything will run twice as fast. Unfortunately, the results seldom meet these expectations, and after a lot of effort, the code runs only a few percent faster.
The problem is that no amount of optimization or tuning will make up for an inherently slow design. Performance is something you have to think about all the way through the development cycle. If you've done that, then you can use the techniques described in this chapter to verify that your driver meets its performance goals.

18.1 GENERAL GUIDELINES

Acceptable driver performance can mean different things in different situations. As a result, the guidelines given in this section are necessarily a little fuzzy. Hopefully, they'll act as a springboard for your own thinking on the subject.

Know Where You're Going

You have to know where you're trying to go or else you won't know when you've gotten there. In the case of driver tuning, this means you should have some specific performance targets in mind when you start. These targets can be the result of a number of things:

• The device itself may have some timing needs. For example, it might need to be serviced within a certain minimum interval, or it may generate data at some particular rate. Understanding your device and how it will be used are important factors in setting performance targets.
• Application programs may have expectations of how quickly the device will respond, or how many transactions per second it should be able to handle.
• The user's perception may be the determining factor in choosing performance targets. The drivers of video cards, sound boards, and even pointing devices are judged by how they feel to the user more than anything else.

Very early in the design process, formulate your performance goals in the most concrete terms possible. Come up with numbers if you can. Then look at your overall driver design and see where these performance needs will have the biggest impact.

Get to Know the Hardware

Learn as much as you can about the hardware your driver is managing. Does it have any weird quirks that might impact driver performance?
Are there any specific sequences of operations that make things go faster or slower? Are you making the most of any built-in processing capabilities of the device itself? If you're working with a multiunit controller, does it support overlapped operations on several devices at the same time? The more you know about the hardware you're driving, the better you'll be able to see what your options are.

Explore Creative Driver Designs

Some of the most powerful optimizations come, not from tweaking code, but from looking for a whole different approach to the problem. NT has a very well-defined driver architecture, but it may not always be suitable for what you're trying to do.

For example, look at the way video and display drivers work. Display speed would be abysmal if Win32 went all the way through the I/O Manager every time it touched the video hardware, so the drivers use a nonstandard architecture. In some cases, it may make sense to map device registers or device memory into user space if that's the only way to achieve acceptable performance. Real-time device control might demand this kind of design.

The mouse class and port drivers provide another example of nonstandard interfaces. In this case, the class driver gives the mouse port driver a pointer to a function that it should call when mouse events arrive. This allows the port driver to pass data using a common buffer and greatly reduces the system's overhead in processing large numbers of events.

The downside of all this is that you may end up compromising system integrity. Don't abandon the standard NT driver architecture right off the bat, but if it's clear that nothing else will give you good performance, go for it.

Optimize Code Creatively

This is where everyone wants to look first, when in fact it's probably the last place to focus your attention. It's worth repeating that no amount of clever optimization will make up for an inherently bad design.
If you do need to squeeze more performance out of your code, here are some things to think about.

First, be very clear about what you're trying to accomplish. Your goal should be to find new ways of doing things, not just ways to tweak existing code. Most decent C compilers do a wonderful job of tweaking code. Your advantage as a human is that you know the context in which the code will run. This allows you to look for entirely different ways of accomplishing a particular task. Don't waste this gift by turning yourself into a glorified peep-hole optimizer.

Also, focus your attention on the relatively small areas of code that really determine overall performance. It's often the case that one or two tiny subroutines, comprising maybe 10 percent of your overall driver, will be the gate that controls the speed of the driver. Try to find those hot spots or critical code paths and make them as fast as possible. The code paths through your driver's most frequently executed operations are a good place to look.

Finally, don't assume that an optimization will have the same impact on all NT platforms. Some kinds of optimizations may work only on a specific type of CPU. If you plan to support your driver on more than one CPU or bus architecture, be sure that improvements work equally well everywhere. At the very least, make certain that an optimization on one configuration doesn't degrade performance anywhere else.

Measure Everything You Do

Concrete measurement forms the basis of all good science. It's amazing how much faster a piece of code can seem just because you've put several hours of work into optimizing it. Don't get caught in the trap of wishful thinking; measure the impact of everything you do. If you don't have any quantitative data to go by, you won't know if you're helping or hurting.

Later in this chapter, you'll see one way to analyze a driver's behavior using the PERFMON utility.
You can also measure the speed of specific routines using the profiling timer available in NT. The only limitation is that this counter's resolution on 80x86 machines is only one microsecond, and on a 100 MHz Pentium, a lot of instructions can flow by in that time.

18.2 PERFORMANCE MONITORING IN WINDOWS NT

One of your options for observing a driver's behavior is to tie into NT's performance monitoring system. The advantage of this technique is that you or anyone else can use the PERFMON utility to collect and display data about your driver. This section presents the overall architecture of NT's performance monitor system.

Some Terminology

Like other parts of NT, the performance system uses an object-based model to describe its operation. Before we look at the actual steps involved in using the performance system, it's a good idea to define some of the terms appearing in the discussion.

Performance object

This is any object that makes performance data available through the Registry. System components, drivers, and services can all export various performance objects. For example, the system exposes objects like memory and CPU, and drivers can expose separate performance objects for each device they support.

Performance counter

Data about a given performance object takes the form of counters. Although the name seems to imply the summing of discrete events, these counters can actually represent a wide variety of measurements: an absolute number of events, a rate of occurrence, a ratio of quantities, the average availability of a resource, and so forth. For example, NT's Memory object exposes counters representing the number of available bytes and the number of page faults per second.

Object instance

There may be more than one instance of some kinds of objects on the system. For example, there can be several CPUs and several disk drives.
To distinguish among members of a set of identical objects, performance monitoring components usually represent these objects as separate instances of the object type. CPU performance data would show up as information about CPU0, CPU1, CPU2, and so on.

Counter instance

When a performance object supports multiple object instances, each instance will have its own complete set of counters. Referring back to the CPU object, there are separate interrupt rate counters for each CPU object instance.

How Performance Monitoring Works

Windows NT provides a common set of interfaces that drivers and application programs can use if they want to participate in performance monitoring operations. Figure 18.1 shows how these interfaces work.

[Figure 18.1 NT performance monitoring components. Elements shown: PERFMON application, Win32 Registry API, data-collection DLL, File Mapping object, user-mode driver, DeviceIoControl, kernel-mode driver. Copyright © 1994 by Cydonix Corporation.]

The following describes what happens when you run the PERFMON utility (located in the Administrative Tools program group). The process would be the same for any application program curious about system performance data.

1. The PERFMON utility uses the Win32 RegQueryValueEx function to access the HKEY_PERFORMANCE_DATA key.
2. The Registry API scans HKEY_LOCAL_MACHINE\...\Services for drivers and services with a Performance subkey. Having this subkey marks a driver or service as a performance monitoring component. Values contained in the Performance subkey identify a data-collection DLL that acts as an interface between the Registry API and the objects being monitored.
3. The Registry API maps these interface DLLs into the process requesting performance data. It then calls the Open and Collect functions in each DLL to determine what objects and counters the DLL supports.
4. Each time PERFMON wants updated performance information, it calls the RegQueryValueEx again.
This results in calls to the Collect function in each performance component's data-collection DLL. The Collect function gets a raw data sample from the object being monitored and sends it back to PERFMON.
5. When PERFMON closes the HKEY_PERFORMANCE_DATA key with RegCloseKey, the Registry API calls the DLL's Close function to do any necessary cleanup. It then unmaps the DLL from the process.

You can see from this description that performance information isn't actually stored in the Registry in the same way that hardware or software configuration data is. Rather, the Win32 Registry API calls gather performance data at the time someone asks for it.

How Drivers Export Performance Data

Drivers that support monitoring have to maintain performance data about themselves. They make this data available to their data-collection DLL using either of two different techniques:

• IOCTLs - Kernel-mode drivers make their performance data available through a privately defined IOCTL function.
• File Mapping objects - User-mode drivers expose performance data through a File Mapping object (i.e., shared memory) that has a well-known name.

The example appearing later in this chapter shows how to implement a data-collection DLL for a kernel-mode driver. A similar example in the NT DDK illustrates how to set up monitoring for a user-mode system component.

18.3 ADDING COUNTER NAMES TO THE REGISTRY

One of the goals of NT's performance monitoring architecture was to make the display names of performance objects and counters independent of any particular national language. If you have the American version of NT installed, performance monitoring tools should display counter names in English, while the French version of NT should use French names.

To accomplish this, both the data-collection DLL and the PERFMON utility refer to performance objects and counters using index numbers rather than names.
These index numbers are assigned when a driver is installed on a given machine, and they are globally unique on that system. These object and counter indexes are stored in the Registry along with their corresponding display names. Tools like PERFMON use this area of the Registry to convert an object or counter index into text. A similar mechanism allows PERFMON to display help text (in the appropriate language) about a given counter.

Counter Definitions in the Registry

As you can see from Figure 18.2, individual counter definitions are stored under the Perflib key, grouped according to their language ID. This scheme allows you to support counter names and help text in multiple languages without having to modify your driver.

[Figure 18.2 Counter definition area in the Registry: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Perflib, holding the Last Counter and Last Help values plus one subkey per language ID (American English and other languages), each with its own Counters and Help values.]

Look at Table 18.1 for a more detailed view of the individual Registry entries. As you can see, each performance object or counter is coupled with a unique, even integer. These pairs are stored under the Counters subkey for each language. Help text for a given counter has an odd-numbered index one greater than the index for the counter itself. Help text definitions are stored under the Help subkey of each language.

Although you could do something disgusting such as using REGEDT32 to add your counter definitions to the Registry, there is an easier way.
The NT DDK contains two utilities, LODCTR and UNLODCTR, that add and remove counter definitions for you.

Table 18.1 Registry entries that define counter names and help text

Perflib Registry entries
Entry | Contents | Example
\nnn | Names and help text for a specific language ID | 009 is the language ID for American English
\nnn\Counters | REG_MULTI_SZ string composed of index / name pairs | 2\0System\0 4\0Memory\0 6\0% Processor time\0\0
\nnn\Help | REG_MULTI_SZ string composed of index / help text pairs | 3\0The System object type...\0 5\0The Memory object type...\0 7\0% Processor time is...\0\0
Last Counter | Highest assigned name index | 0x330
Last Help | Highest assigned help index | 0x331

In order to add counters with LODCTR, you need to do the following:

1. Write a LODCTR command file.
2. Write a counter-offset header file.
3. Add a subkey called Performance to your driver's Registry service key.
4. Run the LODCTR utility to install the counter definitions.

Writing LODCTR Command Files

To use the LODCTR utility, you first need to write a command file describing the objects, counters, and help text you want to add to the Registry. The command file is divided into three sections and can contain the keywords listed in Table 18.2.

Table 18.2 Section names and keywords in a LODCTR command file

LODCTR command file
Section | Keywords | Description
[info] | DRIVERNAME=DriverName | Name, if driver
[info] | APPLICATIONNAME=ProgName | Name, if service
[info] | SYMBOLFILE=FileName.H | Counter-offset definition file
[languages] | langid=LanguageName | IDs of languages in this file (LanguageName is ignored)
[text] | symbol_langid_NAME=Name text | Name of one object or counter
[text] | symbol_langid_HELP=Help text | Single line of explanatory text

The LODCTR utility uses the Win32 profile functions to parse its command file, so it should come as no surprise that these files usually have the extension INI.
Let's look at an example of the command and header files needed to define some performance counters.

COUNTERS.INI
The following example of a LODCTR command file adds one object with two counters to the Registry. It supports only American English versions of the counters.

    [info]
    drivername=XXDRIVER
    symbolfile=COUNTERS.H

    [languages]
    009=English

    [text]
    XXDEVICE_009_NAME=XX Device
    XXDEVICE_009_HELP=The XX Device does whatever it does.
    INTERRUPTS_009_NAME=Interrupts/sec
    INTERRUPTS_009_HELP=Measures the interrupt rate.
    OPERATIONS_009_NAME=Operations/sec
    OPERATIONS_009_HELP=Measures device activity.

COUNTERS.H
You also need to write a header file containing the relative index values of each object and counter that you plan to add to the Registry. This header file defines relative offsets for the XXDEVICE object and its two counters.

    #define XXDEVICE    0
    #define INTERRUPTS  2
    #define OPERATIONS  4

These indexes must be even numbers starting at zero. The names in the header file have to match the names in the [text] section of the LODCTR command file, and they are case-sensitive. This header file will also be included in your data-collection DLL.

Using LODCTR and UNLODCTR

To add your counter names to the Registry, run LODCTR from the command line and give it the name of the command file, like this:

    LODCTR COUNTERS.INI

When you run LODCTR, it uses the Last Counter and Last Help values in the Perflib Registry key to assign absolute index numbers to your objects, counters, and help text items. It also stores the first and last counter and help indexes assigned to your driver in the Performance subkey of the driver's Registry service key.

A single command file can contain object and counter definitions in more than one language. However, LODCTR will only install counter definitions for language IDs already listed under the Perflib Registry key.
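The relationship between the relative offsets in COUNTERS.H and the absolute indexes stored in the Registry can be sketched in a few lines of C. This is only an illustration of the arithmetic, not LODCTR's own code: the function name absolute_index is hypothetical, and the base values used in the usage note are made up for the example. A real data-collection DLL reads its bases from the First Counter and First Help values in the driver's Performance subkey.

```c
/* Relative offsets, as in the COUNTERS.H example above. */
#define XXDEVICE    0
#define INTERRUPTS  2
#define OPERATIONS  4

/*
 * Hypothetical helper: an absolute index is the base value recorded
 * for the driver plus the relative offset from the header file.
 * Pass the First Counter base to get a name index, or the
 * First Help base to get the matching help-text index.
 */
static unsigned long absolute_index(unsigned long base,
                                    unsigned long relativeOffset)
{
    return base + relativeOffset;
}
```

With an assumed First Counter of 0x332 and First Help of 0x333, absolute_index(0x332, INTERRUPTS) gives 0x334 for the counter name and absolute_index(0x333, INTERRUPTS) gives 0x335 for its help text, which keeps names on even indexes and help text on the odd index one greater.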
To remove all the objects, counters, and help text associated with a particular driver or service, run the UNLODCTR utility. Its only argument is the name of the driver or service that you specified in the [info] section of the INI file.

    UNLODCTR XXDRIVER

If you want to modify the object and counter names associated with a particular driver, you have to remove the existing counter definitions for the driver with UNLODCTR and run LODCTR again. LODCTR performs only minimal error checking, and if you run it twice for the same driver, the results are unpredictable.

18.4 THE FORMAT OF PERFORMANCE DATA

When the Registry API calls your data-collection DLL, it expects you to return counter information in a very specific format. This data format is one of the more Byzantine things in NT, so it deserves a little motivating explanation.

Along with the goal of language-independent object and counter names, the NT architects also wanted to make performance data totally self-descriptive. This means that programs like PERFMON should be able to process and display a block of performance data using only the contents of the block itself. This open-ended, extensible architecture allows standard tools to monitor objects that they know nothing about.

Unfortunately, data that's totally self-descriptive is also very complicated. The following subsections describe the Registry's performance data format.

Overall Structure of Performance Data

Figure 18.3 illustrates the overall structure of the information returned by your data-collection DLL.
For each performance object in the DLL, you have to provide

• Information about the object itself
• Definitions for each counter the object exposes
• A header for all the counter data
• A block containing the counters themselves

[Figure 18.3 Structure of performance data for objects with single instances: the data for Object Type 1 through Object Type N, each consisting of a PERF_OBJECT_TYPE, PERF_COUNTER_DEFINITION 1 through M, and a PERF_COUNTER_BLOCK followed by Counter 1 through Counter M.]

The following subsections describe these structures in more detail. You can find additional information in the WINPERF.H header file that comes with the Win32 SDK.[1]

PERF_OBJECT_TYPE
This structure acts as a header for information about a single object type. You must provide one of these structures for each object being exposed by your performance DLL. Table 18.3 lists the fields in this structure.
Table 18.3 Contents of a PERF_OBJECT_TYPE structure

PERF_OBJECT_TYPE, *PPERF_OBJECT_TYPE
Field | Contents
DWORD TotalByteLength | sizeof( PERF_OBJECT_TYPE ) + NumCounters * sizeof( PERF_COUNTER_DEFINITION ) + sizeof( PERF_COUNTER_BLOCK ) + sizeof( allCounters )
DWORD DefinitionLength | sizeof( PERF_OBJECT_TYPE ) + NumCounters * sizeof( PERF_COUNTER_DEFINITION )
DWORD HeaderLength | sizeof( PERF_OBJECT_TYPE )
DWORD ObjectNameTitleIndex | Index of this object's name in the title database
LPWSTR ObjectNameTitle | NULL
DWORD ObjectHelpTitleIndex | Index of object's description in the help database
LPWSTR ObjectHelpTitle | NULL
DWORD DetailLevel | Complexity level of information: PERF_DETAIL_NOVICE, PERF_DETAIL_ADVANCED, PERF_DETAIL_EXPERT, or PERF_DETAIL_WIZARD
DWORD NumCounters | Number of counters in each counter block
DWORD DefaultCounter | Default to display, or -1
DWORD NumInstances | Number of instances of this object, or -1 if no separate instances
DWORD CodePage | 0 for drivers
LARGE_INTEGER PerfTime | Current value, in counts, of the high-resolution performance counter
LARGE_INTEGER PerfFreq | Current frequency, in counts per second, of the high-resolution performance counter

[1] This header also contains a great deal of descriptive commentary. I recommend reading it if you're going to be working with the performance subsystem.

PERF_COUNTER_DEFINITION
You must supply a separate counter definition for each counter in your DLL. This block (described in Table 18.4) pinpoints the size and position of the counter data itself, as well as defining the type of information the counter represents.

Table 18.4 Contents of a PERF_COUNTER_DEFINITION structure

PERF_COUNTER_DEFINITION, *PPERF_COUNTER_DEFINITION
Field | Contents
DWORD ByteLength | sizeof( PERF_COUNTER_DEFINITION )
DWORD CounterNameTitleIndex | Index of this counter's name in the title database
LPWSTR CounterNameTitle | NULL
DWORD CounterHelpTitleIndex | Index of this counter's description in the help database
LPWSTR CounterHelpTitle | NULL
DWORD DefaultScale | Scaling factor for display, expressed as a power of 10
DWORD DetailLevel | Complexity level of information: PERF_DETAIL_NOVICE, PERF_DETAIL_ADVANCED, PERF_DETAIL_EXPERT, or PERF_DETAIL_WIZARD
DWORD CounterType | (See below)
DWORD CounterSize | Size of counter in bytes
DWORD CounterOffset | Offset from start of PERF_COUNTER_BLOCK structure to the first byte of this counter

PERF_COUNTER_BLOCK
This block (described in Table 18.5) is simply a header for the raw counter data. The counters come immediately after it.

Table 18.5 Contents of a PERF_COUNTER_BLOCK structure

PERF_COUNTER_BLOCK, *PPERF_COUNTER_BLOCK
Field | Contents
DWORD ByteLength | sizeof( PERF_COUNTER_BLOCK ) + sizeof( allCounters )

Types of Counters

The CounterType field of the counter definition block specifies the kind of information represented by the counter. WINPERF.H contains a number of predefined types, most of which are listed in Table 18.6. Your choice of a counter type will determine not only the data you have to supply, but also how the Performance Monitor displays that data.

Table 18.6 Use these values for the CounterType field of a PERF_COUNTER_DEFINITION

Predefined CounterType values
Counter type | Description | Suffix
PERF_COUNTER_COUNTER | 32-bit event rate. ΔCount / ΔTime | /sec
PERF_COUNTER_TIMER | 64-bit Timer. ΔCount / ΔTime | %
PERF_COUNTER_QUEUELEN_TYPE | Average queue length. ΔCount / ΔTime |
PERF_COUNTER_BULK_COUNT | 64-bit event rate. ΔCount / ΔTime | /sec
PERF_COUNTER_TEXT | Unicode text |
PERF_COUNTER_RAWCOUNT | 32-bit counter. No time averaging |
PERF_SAMPLE_FRACTION | % Busy counter numerator. 1 or 0 on each sampling interrupt. ΔCount / ΔTime | %
PERF_SAMPLE_BASE | % Busy counter denominator. Directly follows numerator counter. |
PERF_SAMPLE_COUNTER | Sampled counter. ΔCount / ΔTime |
PERF_COUNTER_NODATA | Label only; no data |
PERF_COUNTER_TIMER_INV | 64-bit Timer inverse. Measure % idle but display % busy. 100 - ( ΔCount / ΔTime ) | %
PERF_AVERAGE_BULK | A bulk count which, when divided (typically) by the number of operations, gives (typically) the number of bytes per operation. Count / Base |
PERF_AVERAGE_TIMER | A timer which, when divided by an average base, produces a time in seconds which is the average time of some operation. This timer times total operations, and the base is the number of operations. Timer / Base | sec
PERF_AVERAGE_BASE | Denominator of time or count averages. Directly follows numerator counter. |
PERF_100NSEC_TIMER | 64-bit Timer in 100 nsec units. ΔCount / ΔTime | %
PERF_100NSEC_TIMER_INV | 64-bit Timer inverse. 100 - ( ΔCount / ΔTime ) | %
PERF_COUNTER_MULTI_TIMER | 64-bit multi-instance Timer. ΔCount / ΔTime. Result can exceed 100% | %
PERF_COUNTER_MULTI_TIMER_INV | 64-bit multi-instance Timer inverse. 100 * MULTI_BASE - ( ΔCount / ΔTime ). Result can exceed 100%. Followed by a MULTI_BASE counter | %
PERF_COUNTER_MULTI_BASE | Number of instances to which the preceding _MULTI_..._INV counter applies |
PERF_100NSEC_MULTI_TIMER | 64-bit multi-instance 100 nSec Timer. ΔCount / ΔTime. Result can exceed 100% | %
PERF_100NSEC_MULTI_TIMER_INV | 64-bit Timer inverse. 100 * _MULTI_BASE - ΔCount / ΔTime. Result can exceed 100%. Followed by a MULTI_BASE counter | %
PERF_RAW_FRACTION | Counter is a fraction of base. No time averaging. Count / Base | %
PERF_RAW_BASE | Base for the preceding counter |

Objects with Multiple Instances

If your data-collection DLL reports data separately for each instance of an object, you need to use a slightly different data format. As you can see from Figure 18.4, the main change is that you have to supply a name for each object instance and separate instances of each counter.

[Figure 18.4 Modified structure of performance data for objects with multiple instances: a PERF_OBJECT_TYPE and PERF_COUNTER_DEFINITION 1 through M, followed by Instance 1 through Instance P, each consisting of a PERF_INSTANCE_DEFINITION, a Unicode instance name, and a PERF_COUNTER_BLOCK with Counter 1 through Counter M.]

You need to calculate slightly different values for two fields in the PERF_OBJECT_TYPE if you're using multiple object instances. Table 18.7 lists these changes.

The other new item for multi-instance objects is a block that describes each object instance. See Table 18.8 for the contents of this block. Notice that you can identify an instance either by a Unicode name or by a number. If you use a name, the name string immediately follows the instance definition block. Keep in mind that, since this Unicode name string is embedded in the data, it won't be translated into the local language.
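As a rough illustration of how a named instance is laid out, the following C sketch writes an instance definition followed immediately by its Unicode name into a caller-supplied buffer. It is a sketch under stated assumptions rather than production code: INSTANCE_DEF and NO_UNIQUE_ID are hypothetical stand-ins that mirror the fields listed in Table 18.8 (real code would use PERF_INSTANCE_DEFINITION and PERF_NO_UNIQUE_ID from WINPERF.H), write_instance is an invented helper name, the name length is assumed to include the terminating null, and no alignment padding is added after the name.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for PERF_INSTANCE_DEFINITION (fields as in Table 18.8). */
typedef struct {
    uint32_t ByteLength;             /* header plus name, in bytes   */
    uint32_t ParentObjectTitleIndex; /* 0 if no hierarchy            */
    uint32_t ParentObjectInstance;
    int32_t  UniqueID;
    uint32_t NameOffset;             /* from start of this structure */
    uint32_t NameLength;             /* in bytes                     */
} INSTANCE_DEF;

#define NO_UNIQUE_ID (-1)  /* stand-in for PERF_NO_UNIQUE_ID */

/*
 * Writes one instance definition and its UTF-16 name into buf.
 * nameChars does not count the terminating null.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t write_instance(uint8_t *buf, size_t bufSize,
                             const uint16_t *name, size_t nameChars)
{
    const uint16_t terminator = 0;
    size_t nameBytes = (nameChars + 1) * sizeof(uint16_t);
    size_t total = sizeof(INSTANCE_DEF) + nameBytes;
    INSTANCE_DEF def;

    if (bufSize < total)
        return 0;

    def.ByteLength = (uint32_t)total;
    def.ParentObjectTitleIndex = 0;
    def.ParentObjectInstance = 0;
    def.UniqueID = NO_UNIQUE_ID;     /* identified by name instead */
    def.NameOffset = (uint32_t)sizeof(INSTANCE_DEF);
    def.NameLength = (uint32_t)nameBytes;

    /* The name string immediately follows the definition block. */
    memcpy(buf, &def, sizeof def);
    memcpy(buf + sizeof def, name, nameChars * sizeof(uint16_t));
    memcpy(buf + sizeof def + nameChars * sizeof(uint16_t),
           &terminator, sizeof terminator);
    return total;
}
```

The per-instance PERF_COUNTER_BLOCK and its counters would be written directly after the bytes this helper returns. Production code may also need to round each piece up to an alignment boundary, which this sketch leaves out.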
Table 18.7 These fields of PERF_OBJECT_TYPE are different for multi-instance objects

PERF_OBJECT_TYPE fields
Field | Contents
TotalByteLength | sizeof( PERF_OBJECT_TYPE ) + NumCounters * sizeof( PERF_COUNTER_DEFINITION ) + NumInstances * sizeof( PERF_INSTANCE_DEFINITION ) + sizeof( allInstanceNames ) + NumInstances * sizeof( PERF_COUNTER_BLOCK ) + NumInstances * sizeof( allCounters )
NumInstances | A value other than -1

Table 18.8 Contents of a PERF_INSTANCE_DEFINITION structure

PERF_INSTANCE_DEFINITION, *PPERF_INSTANCE_DEFINITION
Field | Contents
DWORD ByteLength | sizeof( PERF_INSTANCE_DEFINITION ) + sizeof( InstanceNameString )
DWORD ParentObjectTitleIndex | Index in the title database of the object type which is this object's parent, or 0 if no hierarchy
DWORD ParentObjectInstance | Index, starting at 0, into the instances being reported for the parent object type
DWORD UniqueID | Zero-based numerical identifier used in place of a name; PERF_NO_UNIQUE_ID if none
DWORD NameOffset | sizeof( PERF_INSTANCE_DEFINITION )
DWORD NameLength | sizeof( InstanceNameString ) or 0 if no name

18.5 WRITING THE DATA-COLLECTION DLL

As we've already seen, the data-collection DLL acts as an interface between the driver and the Registry API. This section describes the contents of the DLL and explains what you have to do to make the DLL visible to the system.

Contents of the Data-Collection DLL

The data-collection DLL consists of three major functions. You can call these routines anything you like, since their names will be recorded in the Performance subkey of your driver's Registry service key. The following subsections describe each of these functions.

Open
The Open function queries the Registry to determine the proper index values for each object and counter exported by the DLL. It also initializes the static versions of PERF_OBJECT_TYPE and PERF_COUNTER_DEFINITION structures used by the DLL's Collect function.
Finally, it establishes a connection with the specific devices being monitored. Table 18.9 contains the prototype for the Open function.

Table 18.9 Prototype for data collector's Open function

DWORD XxPerfOpen
Parameter | Description
IN LPWSTR lpDeviceNames | Unicode strings naming each device managed by this driver, or NULL
Return value | ERROR_SUCCESS - function succeeded; ERROR_XXX - some Win32 error code

Collect
The Collect function (described in Table 18.10) is called once when the DLL is opened to get a list of all the objects supported by the DLL. From then on, it is called periodically to retrieve current counter values from each object being monitored.

Table 18.10 Prototype for data collector's Collect function

DWORD XxPerfCollect
Parameter | Description
IN LPWSTR lpwszValue | Unicode string identifying requested data: Global - data about all objects; index1 index2 ... - data about specific objects; Foreign ComputerName; Foreign ComputerName index1 index2 ...; Costly - data that's expensive to collect
IN OUT LPVOID *lppData | IN: Pointer to buffer pointer for returned data. OUT SUCCESS: Updated pointer. OUT ERROR: Unchanged from input
IN OUT LPDWORD lpcbBytes | IN: Pointer to DWORD containing buffer size. OUT SUCCESS: Number of data bytes in buffer. OUT ERROR: 0
OUT LPDWORD lpcObjectTypes | OUT SUCCESS: Count of ObjectTypes. OUT ERROR: 0
Return value | ERROR_MORE_DATA - buffer too small; ERROR_SUCCESS - all other cases

The first argument to this function is a NULL-terminated Unicode string describing the kind of data that the caller wants to receive. This argument can either be a specific keyword (like Global), or it can be a list of index numbers that identify particular object types. Your Collect function will need to parse this string to see if it can provide data about any of the objects the caller is interested in.
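One way to approach the parsing described above is sketched below in portable C. The names QueryType, classify_query, and is_number_in_list are hypothetical and are not the helpers used in the book's sample code. The sketch assumes that the keywords are matched case-sensitively, that a NULL or empty string is treated as a Global request, and that index lists are whitespace-separated decimal numbers.

```c
#include <wchar.h>
#include <wctype.h>

typedef enum {
    QUERY_GLOBAL,   /* data about all objects          */
    QUERY_ITEMS,    /* list of specific object indexes */
    QUERY_FOREIGN,  /* request for another computer    */
    QUERY_COSTLY    /* data that's expensive to gather */
} QueryType;

/* True if str begins with keyword followed by end-of-string or a space. */
static int starts_with_keyword(const wchar_t *str, const wchar_t *keyword)
{
    size_t len = wcslen(keyword);
    return wcsncmp(str, keyword, len) == 0 &&
           (str[len] == L'\0' || iswspace((wint_t)str[len]));
}

/* Classify the value string passed to the Collect function. */
static QueryType classify_query(const wchar_t *value)
{
    if (value == NULL || *value == L'\0')
        return QUERY_GLOBAL;          /* assumption: nothing means Global */
    if (starts_with_keyword(value, L"Global"))
        return QUERY_GLOBAL;
    if (starts_with_keyword(value, L"Foreign"))
        return QUERY_FOREIGN;
    if (starts_with_keyword(value, L"Costly"))
        return QUERY_COSTLY;
    return QUERY_ITEMS;
}

/* Returns 1 if index appears in a whitespace-separated decimal list. */
static int is_number_in_list(unsigned long index, const wchar_t *list)
{
    const wchar_t *p = list;

    while (*p != L'\0') {
        wchar_t *end;
        unsigned long n;

        while (iswspace((wint_t)*p))
            p++;
        if (*p == L'\0')
            break;

        n = wcstoul(p, &end, 10);
        if (end == p) {
            /* Not a number: skip this token. */
            while (*p != L'\0' && !iswspace((wint_t)*p))
                p++;
            continue;
        }
        if (n == index)
            return 1;
        p = end;
    }
    return 0;
}
```

A Collect function could call classify_query first and, for an item list, return an empty result unless is_number_in_list finds the absolute name index of one of its own objects.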
Close
This function is called when it's time to close the connection with the monitored devices and release any resources held by the DLL. The prototype for this function appears in Table 18.11.

Table 18.11 Prototype for data collector's Close function

DWORD XxPerfClose
Parameter | Description
VOID |
Return value | ERROR_SUCCESS

Error Handling in a Data-Collection DLL

It's a good idea for your data-collection DLL to record any problems it encounters in the Event Log. That way, you or a system administrator can poke around with the Event Viewer utility if your driver's performance objects aren't showing up in PERFMON for some reason.

Since a data-collection DLL is running in user mode, it doesn't use the kernel-mode event-logging interface described in Chapter 13. Instead, it works with the Win32 event logging functions, RegisterEventSource, ReportEvent, and DeregisterEventSource. The code example that accompanies this chapter shows how to use these functions.

Another implication of the data-collection DLL's user-mode environment is that you have to record its error message file (which is usually the DLL itself) in a slightly different part of the Registry. Rather than dangling beneath Services\EventLog\System, the DLL's message file is recorded in Services\EventLog\Application.[2] Figure 18.5 shows how this works.

It's also polite behavior to give system administrators the ability to control the amount of event logging your DLL performs. One way to do this is to put a REG_DWORD value called EventLogLevel under the Parameters subkey of the driver's Registry service key. The DLL's Open function retrieves this value from the Registry and uses it as a logging threshold. The higher the number, the more event-logging detail the DLL generates.
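The logging threshold described above amounts to a simple comparison, sketched here in C. The level names, the g_eventLogLevel variable, and should_log are all hypothetical. The sketch assumes that 0 means no logging and that a message is reported when its level does not exceed the configured EventLogLevel; the chapter's sample code may organize this differently.

```c
/* Hypothetical detail levels; higher numbers mean more detail. */
enum {
    EVENT_LEVEL_NONE    = 0,
    EVENT_LEVEL_NORMAL  = 1,
    EVENT_LEVEL_VERBOSE = 2,
    EVENT_LEVEL_DEBUG   = 3
};

/*
 * Threshold that the Open function would load from the
 * EventLogLevel value in the Registry (assumed default here).
 */
static unsigned long g_eventLogLevel = EVENT_LEVEL_NORMAL;

/* Returns nonzero if a message at messageLevel should be reported. */
static int should_log(unsigned long messageLevel)
{
    return messageLevel != EVENT_LEVEL_NONE &&
           messageLevel <= g_eventLogLevel;
}
```

A logging helper would call should_log before handing the message to ReportEvent, so raising EventLogLevel in the Registry turns on progressively more detailed messages without rebuilding the DLL.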
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services
    EventLog
        Application
            Sources: REG_MULTI_SZ: XXPERF ...
            XXPERF
                EventMessageFile: REG_EXPAND_SZ: %SystemRoot%\System32\XXPERF.DLL
                TypesSupported: REG_DWORD: 0x7

Figure 18.5 Adding a data-collection DLL's message file to the Registry

[2] This also means the DLL's event messages will show up in the Application log rather than the System log when you use the Event Viewer utility.

HKEY_LOCAL_MACHINE
    System
        CurrentControlSet
            Services
                XxDriver
                    Performance
                        Library: REG_SZ: XXPERF.DLL
                        Open: REG_SZ: XxPerfOpen
                        Collect: REG_SZ: XxPerfCollect
                        Close: REG_SZ: XxPerfClose
                        First Counter: ...
                        First Help: ...
                        Last Counter: ...
                        Last Help: ...

Figure 18.6 Contents of a driver's Performance subkey

Installing the DLL

Once you've built the data-collection DLL itself, you need to move it to the %SystemRoot%\SYSTEM32 directory. To make NT aware of your DLL, you have to add several values to the Performance subkey of your driver's Registry service key. Figure 18.6 shows the structure of these Registry entries, and Table 18.12 describes them in detail.

The First Counter, Last Counter, First Help, and Last Help values were put there by LODCTR. The data-collection DLL retrieves the two First values and uses them to calculate the proper index numbers for each of its objects, counters, and help text items. You only need to add the values that identify the DLL and its entry points.
Table 18.12 Values in a driver's Performance subkey

Performance subkey values
Value | Description | Example
Library | Full path name of data-collection DLL | XXPERF.DLL
Open | Name of DLL's (optional) Open function | XxPerfOpen
Collect | Name of DLL's Collect function | XxPerfCollect
Close | Name of DLL's (optional) Close function | XxPerfClose

18.6 CODE EXAMPLE: A DATA-COLLECTION DLL

This example shows how to set up a data-collection DLL. It also illustrates the modifications you'd need to make to a kernel-mode driver in order to retrieve performance data from it.

It takes a fair amount of code to implement all the pieces of this example. Unfortunately, not all of it will fit here. The complete code for all the components can be found in the CH18 directory on the disk that accompanies this book. In this directory, you'll find three subdirectories:

• Driver - This directory contains a version of XXDRIVER that supports an IOCTL_XX_GET_PERF_DATA I/O control code. The driver itself is just a stub that illustrates how to pass performance data back to the collection DLL. The performance measurements generated by the driver are all bogus.
• Ioctl - The only file in this directory is XXIOCTL.H, which contains the IOCTL definitions and structures used by both the driver and the collection DLL.
• Library - The files in this directory implement the data-collection DLL itself. This includes support for event logging, parsing the argument string of the DLL's Open function, and gathering and formatting performance data.

Again, because of space limitations, only selected portions of the data-collection DLL will appear here.

XXPERF.C

This file of the example contains the Open, Collect, and Close functions that interface with the Win32 Registry API calls.

Preamble area
This section of the data-collection DLL's source code contains header files, data definitions, and function prototypes necessary to the proper operation of the DLL.
    //
    // All-inclusive header file
    //
    #include "xxperf.h"                          // (1)

    //
    // Data global to this module (2)
    //
    static HANDLE hDevice;
    static DWORD  dwOpenCount  = 0;
    static BOOL   bInitialized = FALSE;

    //
    // Initialized object header defined
    // in data.c
    //
    extern XX_HEADER_DEFINITION XxObjectHeader;  // (3)

    //
    // Forward declarations of routines (4)
    //
    PM_OPEN_PROC    XxPerfOpen;
    PM_COLLECT_PROC XxPerfCollect;
    PM_CLOSE_PROC   XxPerfClose;

(1) The master header file includes WINPERF.H from the Win32 SDK. This Win32 header defines all the performance data structures.
(2) Multiple functions in this source module need access to the device handle, the count of threads using the library, and the initialization flag. The easiest way to deal with this is to make the variables global.
(3) The modules DATA.C and DATA.H contain a single copy of all the static parts of the object-type and counter-definition data.
(4) The three exported functions in the DLL must be identified using these specific forward declarations if you want everything to work properly.

XxPerfOpen
This function sets up the DLL. This includes getting a handle to the target device and calculating the absolute index values for each object and counter exported by the DLL. To simplify the collection process, the DLL keeps a single, statically initialized copy of the data header information in a global structure defined in DATA.C and DATA.H.
    DWORD XxPerfOpen( LPWSTR lpDeviceNames )
    {
        HKEY  hKeyDriverPerf;
        DWORD dwFirstCounter;
        DWORD dwFirstHelp;
        DWORD dwType;
        DWORD dwSize;
        DWORD dwStatus;

        if( dwOpenCount == 0 )                   // (1)
        {
            XxOpenEventLog();                    // (2)

            hDevice = CreateFile(                // (3)
                        XX_WIN32_DEVICE_NAME,
                        GENERIC_READ,
                        FILE_SHARE_READ | FILE_SHARE_WRITE,
                        NULL,
                        OPEN_EXISTING,
                        FILE_ATTRIBUTE_NORMAL,
                        NULL );

            if( hDevice == INVALID_HANDLE_VALUE )
            {
                dwStatus = GetLastError();
                XxLogErrorWithData(
                    LOG_LEVEL_NORMAL,
                    XXPERF_CANT_OPEN_DEVICE_HANDLE,
                    &dwStatus,
                    sizeof( dwStatus ));
                XxCloseEventLog();
                return dwStatus;
            }

            //
            // Open the Performance subkey of the
            // driver's service key in the Registry.
            //
            dwStatus = RegOpenKeyEx(             // (4)
                        HKEY_LOCAL_MACHINE,
                        "SYSTEM\\CurrentControlSet"
                        "\\Services\\XxDriver"
                        "\\Performance",
                        0L,
                        KEY_ALL_ACCESS,
                        &hKeyDriverPerf );

            if( dwStatus != ERROR_SUCCESS )
            {
                XxLogErrorWithData(
                    LOG_LEVEL_NORMAL,
                    XXPERF_CANT_OPEN_DRIVER_KEY,
                    &dwStatus,
                    sizeof( dwStatus ));
                CloseHandle( hDevice );
                XxCloseEventLog();
                return dwStatus;
            }

            //
            // Get base index of first object or counter
            //
            dwSize = sizeof( DWORD );
            dwStatus = RegQueryValueEx(
                        hKeyDriverPerf,
                        "First Counter",
                        0L,
                        &dwType,
                        (LPBYTE)&dwFirstCounter,
                        &dwSize );

            if( dwStatus != ERROR_SUCCESS )
            {
                XxLogErrorWithData(
                    LOG_LEVEL_NORMAL,
                    XXPERF_CANT_READ_FIRST_COUNTER,
                    &dwStatus,
                    sizeof( dwStatus ));
                RegCloseKey( hKeyDriverPerf );
                CloseHandle( hDevice );
                XxCloseEventLog();
                return dwStatus;
            }

            //
            // Get base index of first help text
            //
            dwSize = sizeof( DWORD );
            dwStatus = RegQueryValueEx(
                        hKeyDriverPerf,
                        "First Help",
                        0L,
                        &dwType,
                        (LPBYTE)&dwFirstHelp,
                        &dwSize );

            if( dwStatus != ERROR_SUCCESS )
            {
                XxLogErrorWithData(
                    LOG_LEVEL_NORMAL,
                    XXPERF_CANT_READ_FIRST_HELP,
                    &dwStatus,
                    sizeof( dwStatus ));
                RegCloseKey( hKeyDriverPerf );
                CloseHandle( hDevice );
                XxCloseEventLog();
                return dwStatus;
            }

            //
            // Don't need Registry handle anymore
            //
            RegCloseKey( hKeyDriverPerf );

            //
            // Initialize PERF_OBJECT_TYPE struct (5)
            //
            XxObjectHeader.XxDevice.ObjectNameTitleIndex =
                dwFirstCounter + XXDEVICE;
            XxObjectHeader.XxDevice.ObjectHelpTitleIndex =
                dwFirstHelp + XXDEVICE;

            //
            // Initialize 1st PERF_COUNTER_DEFINITION
            //
            XxObjectHeader.Interrupts.CounterNameTitleIndex =
                dwFirstCounter + INTERRUPTS;
            XxObjectHeader.Interrupts.CounterHelpTitleIndex =
                dwFirstHelp + INTERRUPTS;

            //
            // Initialize 2nd PERF_COUNTER_DEFINITION
            //
            XxObjectHeader.Operations.CounterNameTitleIndex =
                dwFirstCounter + OPERATIONS;
            XxObjectHeader.Operations.CounterHelpTitleIndex =
                dwFirstHelp + OPERATIONS;

            //
            // Mark DLL as successfully initialized
            //
            bInitialized = TRUE;
        }

        //
        // One way or another, there's one more
        // thread using the DLL.
        //
        dwOpenCount++;
        return ERROR_SUCCESS;
    }

(1) If the DLL is being called by SCREG from a remote computer, there may be multiple threads accessing it at the same time. Therefore the DLL needs to keep a count of how many times it's been opened. The first call causes the DLL to initialize itself; the rest simply bump the count.
(2) Any errors that occur will go to the Event Log. This helper function (defined in EVENTLOG.C) manages the details of setting up the connection.
(3) The kernel-mode driver will give performance data to the DLL in response to a special IOCTL code. To issue that IOCTL, the DLL needs a handle to the device. This handle is stored in a global variable (hDevice) where the rest of the DLL can get to it.
(4) This next section of code gets a handle to the Performance subkey below XXDRIVER's Registry service key. Then it recovers the base index number for XXDRIVER's objects and counters (from the First Counter value), and the base index number for help text (from the First Help value).
(5) Once the base index values are recovered, it's necessary to calculate the index number of every object, counter, and help text item supported by this DLL. The resulting indexes are put into the various ...TitleIndex fields of the statically initialized object header defined in DATA.C.

XxPerfCollect
The Collect function retrieves one sample of data from the object being monitored. After copying the static data header into the caller's buffer, it uses an IOCTL to put the current counter values there as well.
DWORD XxPerfCollect(
        IN LPWSTR lpValueName,
        IN OUT LPVOID *lppData,
        IN OUT LPDWORD lpcbTotalBytes,
        IN OUT LPDWORD lpNumObjectTypes )
{
    DWORD dwQueryType;
    DWORD dwStatus;
    DWORD dwBytesReturned;
    PPERF_COUNTER_BLOCK pPerfCounterBlock;
    PXX_HEADER_DEFINITION pXxObjectHeader;

    if( !bInitialized ) ❶
    {
        *lpcbTotalBytes = (DWORD) 0;
        *lpNumObjectTypes = (DWORD) 0;
        return ERROR_SUCCESS;
    }

    dwQueryType = XxGetPerfQueryType( lpValueName ); ❷

    if( dwQueryType == PERF_QUERY_TYPE_FOREIGN )
    {
        //
        // Can't service foreign requests.
        //
        *lpcbTotalBytes = (DWORD) 0;
        *lpNumObjectTypes = (DWORD) 0;
        return ERROR_SUCCESS;
    }

    if( dwQueryType == PERF_QUERY_TYPE_ITEMS )
    {
        if( !XxIsNumberInList( ❸
                XxObjectHeader.XxDevice.ObjectNameTitleIndex,
                lpValueName ))
        {
            *lpcbTotalBytes = (DWORD) 0;
            *lpNumObjectTypes = (DWORD) 0;
            return ERROR_SUCCESS;
        }
    }

    if( *lpcbTotalBytes < ❹
            ( sizeof( XX_HEADER_DEFINITION ) +
              sizeof( XX_PERF_DATA )))
    {
        *lpcbTotalBytes = (DWORD) 0;
        *lpNumObjectTypes = (DWORD) 0;
        return ERROR_MORE_DATA;
    }

    pXxObjectHeader = ❺
        (PXX_HEADER_DEFINITION) *lppData;

    memmove(
        pXxObjectHeader,
        &XxObjectHeader,
        sizeof( XX_HEADER_DEFINITION ));

    pPerfCounterBlock =

[...]

...\SYMBOLS on the NT distribution CD, copy various symbol files to \SYMBOLS\FREE on the host. At a minimum, you'll need EXE\NTOSKRNL.DBG, DLL\NTDLL.DBG, and DLL\HAL.DBG.

3. Copy the checked versions of the same symbol files from \CHECKED\SUPPORT\DEBUG\...\SYMBOLS on the NT distribution CD to ...\SYMBOLS\CHECKED on the host. You'll need these symbols when you run your driver under the checked build of NT.

4. Each time you rebuild your driver, copy the driver's symbol file into these directories.
Refer back to Chapter 16 for an explanation of creating the driver's debug symbol file.

One thing to watch out for: Installing an NT service pack changes all the symbol information. So, if you've upgraded NT on the target system with a service pack, you have to get the operating system symbol files from the service pack CD. The symbols on the standard distribution CD won't work. The symbol directory paths on the service pack CD are the same as those on the NT distribution disk.

A.3 ENABLING CRASH DUMPS ON THE TARGET SYSTEM

Crash dump files can be very helpful when you're tracking down bugs in a kernel-mode driver. Refer back to Chapter 17 for information on reading these files. Follow these steps on the target system if you want Windows NT to dump crash information after a bugcheck.

1. In the Control Panel, double-click on the System applet.

2. Click on the Recovery button. The Recovery dialog box will appear.

3. Select the Write Debugging Information To check box. You can enter a path and filename for the crash file in the text box, or accept the default value (%SystemRoot%\MEMORY.DMP).

4. Select the Overwrite Any Existing File check box if you want new crashes to overwrite an existing dump file with the same name. If this check box is clear, you won't get any crash information if a dump file with the same name already exists.

5. Reboot the system to have these options take effect.

When a crash occurs, the system copies an image of physical memory into the paging file located on the system root partition. During the reboot after a crash, NT copies the crash image from the paging file to the target file specified in the Recovery dialog.

If You Don't Get Any Crash Dump Files

Several things can prevent the system from creating a dump file after a crash. If you're having trouble, here's what to look for.
Premature reboot: Make sure you don't hit the reboot switch until NT has finished dumping memory into the crash file. If you reboot before the dump is complete, you won't get any crash information. You can tell when NT has finished by looking at the message at the bottom of the blue screen display.

Paging file issues: NT can only use the paging file on the system root partition for storing the crash image. If you don't have a paging file there, NT won't be able to save crash information. Also, make sure there's enough space in this paging file. It must be big enough to hold all of physical memory plus one additional megabyte. If the file is too small, you won't get any crash information.

Lack of disk space: There has to be enough space on the system root partition to hold the dump file itself. Although you can specify any target directory for the dump file, NT initially creates it in the %SystemRoot% directory and then copies it to its final destination. If there isn't enough free space, NT won't be able to create the file.

Hardware issues: Certain specific hardware configurations have problems generating crash files. Most of them (though not all) involve SCSI disk controllers. If you search the Knowledge Base section of the Microsoft Developer CD for a title containing the name of your system (or SCSI controller) and MEMORY.DMP, you may find a helpful bug report. Other than getting some new hardware, there's not much you can do in this case.

Even if your system isn't one of the ones with known problems, the lack of a dump file may indicate that you're using an out-of-date driver for your system disk. See if there's a newer version available.

A.4 ENABLING THE TARGET SYSTEM'S DEBUG CLIENT

Both the retail and checked versions of Windows NT include a debugging client that allows NT to communicate over a serial line with the WINDBG debugger.
However, you have to enable this debugging client on the target system if you want to debug the target system interactively with WINDBG.

Depending on the CPU architecture, you follow different procedures to enable kernel-mode debugging on the target system. On RISC machines, you need to modify the OSLOADOPTIONS environment variable in the ARC firmware. See your system documentation for an explanation of how to do this.

To enable the debugger on 80x86-based machines, you edit the BOOT.INI file located in the root directory of the boot partition. This is a hidden system file that tells the NT loader what operating systems are available for booting. Follow these steps to modify BOOT.INI:

1. Remove the read-only, hidden, and system attributes from the file using this command:

       attrib -r -h -s BOOT.INI

2. Open BOOT.INI for editing with your favorite text editor.

3. In the [operating systems] section, add appropriate options to the boot command line for the free and checked versions of Windows NT.

4. Save the changes and close the file.

5. Use the following command (or its File Manager equivalent) to restore the file's original attributes:

       attrib +r +h +s BOOT.INI

Regardless of the machine architecture, you can specify the options listed in Table A.1. Keep the following things in mind when you're selecting bootstrap options.

• If you specify NODEBUG, then DEBUGPORT, BAUDRATE, and CRASHDEBUG are ignored.

• If you specify BAUDRATE, kernel debugging is enabled; you do not also have to specify DEBUG. Select the highest baud rate that works for both machines.

• On 80x86 machines, COM2 is the default debugger communications port, if it exists and if it isn't being used. In all other cases, COM1 is the default.

• The MAXMEM option can be useful for stress testing your driver in a low-memory environment. For example, you can limit a 24-megabyte machine to using only 12 megabytes.

Table A.1  Debugging options for BOOT.INI files or OSLOADOPTIONS

    Option                Description
    DEBUG                 Enables kernel-mode debugging.
    NODEBUG               Disables kernel-mode debugging. This is the default.
    DEBUGPORT=PortName    Specifies debug serial port used by target machine.
    BAUDRATE=BaudRate     Specifies baud rate used by target machine.
    CRASHDEBUG            Causes debugger to activate only when the system bugchecks.
    MAXMEM=SizeInMB       Specifies the amount of memory to be made available to the system.
    SOS                   Displays the name of each module being loaded during system bootstrap.

The following example of a BOOT.INI file offers three choices at boot time: a nondebugging, free version of NT; a free version of NT with the debugger enabled; and a checked version of NT with debugging enabled. The checked version is also restricted to a 12 MB environment.

    [boot loader]
    timeout=30
    default=c:\

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\winnt="NT Free"
    multi(0)disk(0)rdisk(0)partition(1)\winnt="NT Free" /DEBUGPORT=COM1
    multi(0)disk(0)rdisk(0)partition(1)\wntchk="NT Check" /DEBUG /DEBUGPORT=COM1 /MAXMEM=12

APPENDIX B  Common Bugcheck Codes

B.1 GENERAL PROBLEMS WITH DRIVERS

A variety of driver errors can produce the bugchecks in Table B.1. The accompanying notes may help you locate the source of the problem.

Table B.1  General errors (bugchecks caused by general driver problems)

IRQL_NOT_LESS_OR_EQUAL (0x0A)
    1 - Address that was referenced
    2 - IRQL at time of reference
    3 - Type of access
          0 - Read
          1 - Write
    4 - Address where reference occurred
    CAUSE: A driver touched paged memory at or above DISPATCH_LEVEL IRQL.
    ACTION: The driver may be using a bogus pointer. Use the fourth bugcheck parameter to find the offending source code line.
KMODE_EXCEPTION_NOT_HANDLED (0x1E)
    1 - The exception code [1]
    2 - Address of the failing instruction
    3 - First exception parameter
    4 - Second exception parameter
    CAUSE: A driver generated an exception.
    ACTION: Use the second bugcheck parameter to locate the offending source code line.

UNEXPECTED_KERNEL_MODE_TRAP (0x7F)
    1 - Code number of trap [2]
    CAUSE: On Intel platforms, this means the CPU generated a trap that it can't handle in kernel mode.
    ACTION: From WINDBG, find the trap frame address with kb. Use !trap to format the frame. [3] The contents of EIP will show where the trap was taken.

PANIC_STACK_SWITCH (0x2B)
    CAUSE: The kernel-mode stack has overflowed. This can mean other operating system data structures have been damaged.
    ACTION: In the stack trace, look for a driver that's using too much stack space. [4]

PAGE_FAULT_WITH_INTERRUPTS_OFF (0x49)
    Same as 0x0A (above).

IRQL_NOT_DISPATCH_LEVEL (0x08)
IRQL_NOT_GREATER_OR_EQUAL (0x09)
IRQL_GT_ZERO_AT_SYSTEM_SERVICE (0x4A)
    CAUSE: Miscellaneous problems with IRQL level.
    ACTION: Use the stack trace to locate the code causing the crash.

INVALID_SOFTWARE_INTERRUPT (0x07)
SYSTEM_SERVICE_EXCEPTION (0x3B)
INVALID_DATA_ACCESS_TRAP (0x04)
NO_EXCEPTION_HANDLING_SUPPORT (0x0B)
TRAP_CAUSE_UNKNOWN (0x12)
LAST_CHANCE_CALLED_FROM_KMODE (0x15)
    CAUSE: Miscellaneous problems with exceptions.
    ACTION: Use the stack trace to locate the code executing at the time of the crash.

[1] You can determine what kind of exception it is by searching NTSTATUS.H for this number. A common exception code is 0x80000003. This means the system hit a hard-coded breakpoint or ASSERT while it was booted with the /NODEBUG switch. Connect a debugger and reboot with the /DEBUG switch to locate the problem. Another popular error is 0xC0000005, which is an access violation. In this case, argument 4 (the second exception parameter) is the address your driver was trying to touch.

[2] See the Intel486 Processor Family Programmer's Reference (listed in the bibliography) for a list of CPU trap codes.

[3] On Intel platforms, the frame will be associated with a procedure called NT!KiTrap.

[4] Keep in mind that the driver whose stack operation generated the bugcheck is not necessarily the driver that's using too much stack space.

B.2 SYNCHRONIZATION PROBLEMS

The bugchecks in Table B.2 are caused by improper use of various NT synchronization mechanisms.

Table B.2  Synchronization problems (bugchecks caused by synchronization problems)

SPIN_LOCK_INIT_FAILURE (0x81)
SPIN_LOCK_ALREADY_OWNED (0xF)
SPIN_LOCK_NOT_OWNED (0x10)
NO_SPIN_LOCK_AVAILABLE (0x1D)
    CAUSE: Misuse of spin locks.
    ACTION: Use the stack trace to locate the code executing at the time of the crash.

MAXIMUM_WAIT_OBJECTS_EXCEEDED (0x0C)
THREAD_NOT_MUTEX_OWNER (0x11)
SYSTEM_EXIT_OWNED_MUTEX (0x39)
    CAUSE: Improper use of Mutexes in kernel mode.
    ACTION: Fix the driver logic error causing the problem.

MUTEX_LEVEL_NUMBER_VIOLATION (0xD)
    1 - Current thread's Mutex level
    2 - Mutex level of requested Mutex
    CAUSE: A driver thread has requested ownership of a Mutex that violates the level number sequence.
    ACTION: Use the stack trace to identify the driver. Use the level numbers to identify the Mutexes. [1]

[1] If the Mutexes belong to NT, use EXLEVELS.H to figure out which ones they are.

B.3 CORRUPTED DRIVER DATA STRUCTURES

The bugchecks in Table B.3 are caused by problems with various I/O Manager data structures. In general, these problems indicate some kind of serious logic error in a driver.
Table B.3  Driver data structure problems (bugchecks caused by data structure problems)

DEVICE_REFERENCE_COUNT_NOT_ZERO (0x36)
    1 - Address of Device object
    CAUSE: A driver has called IoDeleteDevice with a Device object that still has a nonzero reference count.
    ACTION: Locate the driver logic error leading to this situation.

NO_MORE_IRP_STACK_LOCATIONS (0x35)
    1 - Address of the IRP
    CAUSE: A higher-level driver has tried to pass an IRP to a lower-level driver using IoCallDriver, but there are no more stack locations in the IRP. [1]
    ACTION: If your driver allocated the IRP, examine how you're calculating the number of stack slots. If the IRP is being passed to you, your Device object's StackSize field is too small.

INCONSISTENT_IRP (0x2A)
    1 - Address of the IRP
    CAUSE: The I/O Manager has found an IRP with fields that are not internally consistent. [2]
    ACTION: Make sure your driver isn't writing over the contents of the IRP.

MULTIPLE_IRP_COMPLETE_REQUESTS (0x44)
    1 - Address of the IRP
    CAUSE: A driver has called IoCompleteRequest with an IRP that's already been completed. Either one driver is trying to complete the same IRP twice, or two drivers both think they own the IRP. [3]
    ACTION: The DeviceObject field of the IRP's stack locations will show you who was using the IRP. This may help.

CANCEL_STATE_IN_COMPLETED_IRP (0x48)
    1 - Address of the IRP
    CAUSE: A driver has called IoCompleteRequest with an IRP that still has a cancel routine.
    ACTION: This is a driver logic error. Take the IRP out of the cancelable state before you try to complete it.

DEVICE_QUEUE_NOT_BUSY (0x02)
    CAUSE: A Device Queue object is in an inconsistent state.
    ACTION: The Device Queue object is probably getting corrupted by inappropriate access or because of bogus use of pointers.

[1] This is really a disaster, since the higher-level driver thinks it has filled in the IRP parameter fields for the lower-level driver. However, there was no room in the IRP for these parameters, so the higher-level driver has actually written off the end of the IRP and mangled some unrelated piece of memory.

[2] For example, an IRP that was being completed but was still marked as being attached to a driver's Device Queue object.

[3] Finding the two drivers is difficult, since the identity of the first one has already been covered up by the time the second driver makes the failing call to IoCompleteRequest.

B.4 MEMORY PROBLEMS

The bugchecks in Table B.4 are caused by driver memory problems. Drivers can cause many subtle (and not so subtle) system failures through improper use of memory.

Table B.4  Memory problems (bugchecks caused by memory problems)

NO_MORE_SYSTEM_PTES (0x3F)
    CAUSE: There are no system page table entries left. This often means a driver isn't cleaning up after itself.
    ACTION: The !sysptes command may give some insight.

TARGET_MDL_TOO_SMALL (0x40)
    CAUSE: A driver has called IoBuildPartialMdl and passed a target MDL that isn't large enough to map the entire range of addresses requested.
    ACTION: Locate the call to IoBuildPartialMdl in the stack trace. Its arguments identify the bad MDL. Also use the stack trace to see who called this function.

MUST_SUCCEED_POOL_EMPTY (0x41)
    1 - Size of unsatisfied request
    2 - Number of pages used of nonpaged pool
    3 - Number of too large PAGE_SIZE requests from nonpaged pool
    4 - Number of pages available
    CAUSE: There isn't enough memory to satisfy a request from one of the XxxMustSucceed pool areas.
    ACTION: Look for a driver that's leaking memory.

NO_PAGES_AVAILABLE (0x4D)
    1 - Number of dirty pages
    2 - Number of physical pages in machine
    3 - Extended commit value in pages
    4 - Total commit value in pages
    CAUSE: The system has run out of free pages.
    ACTION: Look for processes or drivers that are leaking memory.

PFN_LIST_CORRUPT (0x4E)
    1 - 1
    2 - ListHead value that was corrupt
    3 - Number of pages available
    4 - 0
    - OR -
    1 - 2
    2 - Entry in list being removed
    3 - Highest physical page number
    4 - Reference count of entry being removed
    CAUSE: A driver has probably corrupted an MDL.
    ACTION: Trace backward on the stack from the system routine that detected the error to the driver routine that passed the MDL. This may be the driver that corrupted the MDL.

PROCESS_HAS_LOCKED_PAGES (0x76)
    1 - Process address
    2 - Number of locked pages
    3 - Number of private pages
    4 - 0
    CAUSE: A driver hasn't released some locked pages at the end of an I/O operation.
    ACTION: Look for a driver that isn't cleaning up after an I/O.

BAD_POOL_HEADER (0x19)
MEMORY_MANAGEMENT (0x1A)
PFN_SHARE_COUNT (0x1B)
PFN_REFERENCE_COUNT (0x1C)
PAGE_FAULT_IN_NONPAGED_AREA (0x50)
INSUFFICIENT_SYSTEM_MAP_REGS (0x45)
    CAUSE: Miscellaneous memory errors.
    ACTION: Look for drivers active at the time of the crash. One of them may be corrupting memory.

B.5 HARDWARE FAILURES

The bugchecks in Table B.5 are the result of various hardware failures. Try to locate and correct the problem.

Table B.5  Hardware problems (bugchecks caused by hardware problems)

KERNEL_STACK_INPAGE_ERROR (0x77)
    1 - 0
    2 - 0
    3 - PTE value at time of error
    4 - Address of Kernel stack signature
    - OR -
    1 - Status code
    2 - I/O status code
    3 - Page file number
    4 - Offset into page file
    CAUSE: A page of the kernel-mode stack couldn't be read because of a bad block in the paging file or a disk controller error.
    ACTION: If the first two parameters are zero, there is a hardware error. Otherwise, look at the status code:
          C000009C or C000016A: bad block
          C0000185: SCSI cable or termination problem
          C000009A: insufficient nonpaged pool

KERNEL_DATA_INPAGE_ERROR (0x7A)
    1 - Lock type that was held:
          Value 1, 2, 3
          PTE address
    2 - Error status
    3 - Current process
    4 - Virtual address that could not be read
    CAUSE: A page of kernel-mode data couldn't be read because of a bad block in the paging file or a disk controller error.
    ACTION: See error 0x77 (above).

DATA_BUS_ERROR (0x2E)
    1 - Virtual address that caused the fault
    2 - Physical address that caused the fault
    3 - Processor status register (PSR)
    4 - Faulting instruction register (FIR)
    CAUSE: Either there is a parity error in system memory or a driver is accessing a nonexistent system-space address.
    ACTION: If a memory test succeeds, then use the stack trace to locate the driver making the reference.

MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED (0x3E)
    CAUSE: NT has detected that all the CPUs in a multiprocessor system are not identical. This is not a supported configuration.
    ACTION: Correct the asymmetry.

INSTALL_MORE_MEMORY (0x7D)
    1 - Number of physical pages found
    2 - Lowest physical page
    3 - Highest physical page
    4 - 0
    CAUSE: There isn't enough memory available to boot the system.
    ACTION: Install more memory.

NMI_HARDWARE_FAILURE (0x80)
INSTRUCTION_BUS_ERROR (0x2F)
DATA_COHERENCY_EXCEPTION (0x55)
INSTRUCTION_COHERENCY_EXCEPTION (0x56)
    CAUSE: Miscellaneous hardware failures.
    ACTION: Use hardware diagnostics to locate and correct the problem.

B.6 CONFIGURATION MANAGER AND REGISTRY PROBLEMS

The bugchecks in Table B.6 result from problems with crucial Registry information. If the failure occurs only when your driver is running, you may be able to trace the problem back to bad calls to Registry functions.
Since the Registry is mapped into system space, drivers can also corrupt the Registry by using bogus address pointers.

Table B.6  Registry problems (bugchecks caused by Registry problems)

CONFIG_INITIALIZATION_FAILED (0x67)
    1 - 5
    2 - Location where failure occurred
    CAUSE: Configuration Manager couldn't get enough paged pool for the Registry. [1]
    ACTION: Get a stack trace and call Microsoft.

CONFIG_LIST_FAILED (0x73)
    1 - 5
    2 - 2
    3 - Index of hive
    4 - Pointer to UNICODE_STRING containing filename of hive
    CAUSE: One of the core system Registry hives (SOFTWARE, SECURITY, or SAM) is unreadable or corrupted.
    ACTION: Get a stack trace and call Microsoft.

BAD_SYSTEM_CONFIG_INFO (0x74)
    CAUSE: Either the SYSTEM hive is corrupted, or various crucial keys and values are missing.
    ACTION: Try booting from the Last Known Good configuration. If that fails, use the emergency repair disk. If that fails, reinstall NT.

CANNOT_WRITE_CONFIGURATION (0x75)
    CAUSE: There is no room on the disk to increase the size of the SYSTEM hive files.
    ACTION: Free up space in the system partition.

REGISTRY_ERROR (0x51)
    1 - Indicates where error occurred
    2 - Indicates where error occurred
    3 - Pointer to hive
    4 - Internal error return code
    CAUSE: Something is seriously wrong with the Registry. It may be the result of an I/O error or file system corruption.
    ACTION: Try rebooting using the Last Known Good option or the emergency repair disk.

[1] This error should never occur, since Registry setup happens early enough during system initialization that there should always be enough pool space.

B.7 FILE SYSTEM PROBLEMS

The bugchecks in Table B.7 result from failures in a file-system driver or a related component. Since Microsoft doesn't currently support customer-written FSDs, there is little you can do to diagnose these problems.

Table B.7  File system problems (bugchecks caused by file system problems)

CACHE_MANAGER (0x34)
FILE_SYSTEM (0x22)
FAT_FILE_SYSTEM (0x23)
NTFS_FILE_SYSTEM (0x24)
NPFS_FILE_SYSTEM (0x25)
CDFS_FILE_SYSTEM (0x26)
RDR_FILE_SYSTEM (0x27)
MAILSLOT_FILE_SYSTEM (0x52)
PINBALL_FILE_SYSTEM (0x59)
LM_SERVER_INTERNAL_ERROR (0x54)
    CAUSE: Internal problems with a Microsoft-supplied file-system driver.
    ACTION: Get a stack trace and call Microsoft.

APC_INDEX_MISMATCH (0x01)
    CAUSE: This internal error could be the result of file system problems.
    ACTION: Get a stack trace and call Microsoft.

KERNEL_APC_PENDING_DURING_EXIT (0x20)
    1 - Address of pending APC
    2 - The thread's APC disable count
    3 - The current IRQL
    CAUSE: This indicates a logic error in a file system driver.
    ACTION: See if any third-party file system drivers were installed at the time of the crash. Be suspicious of them.

B.8 SYSTEM INITIALIZATION FAILURES

The bugchecks in Table B.8 occur only during system initialization. Some of them are the result of mismatched software components, while others indicate problems that can only be diagnosed by Microsoft.

Table B.8  Bootstrap and initialization failures (bugchecks caused by bootstrap problems)

MISMATCHED_HAL (0x79)
    1 - 1 (Release levels don't match)
    2 - Release level of Kernel
    3 - Release level of HAL
    - OR -
    1 - 2 (Build types don't match)
    2 - Kernel build type
          0 - Free multiprocessor-enabled build
          1 - Checked multiprocessor-enabled build
          2 - Free uniprocessor build
    3 - HAL build type
    - OR -
    1 - 3 (MCA HAL required)
    2 - Machine type detected at bootstrap
          2 means MCA
    3 - HAL type
    CAUSE: The HAL revision level and HAL configuration type do not match those of the Kernel or the machine type. [1]
    ACTION: Make sure the proper versions of the HAL and NTOSKRNL are installed.

FTDISK_INTERNAL_ERROR (0x58)
    CAUSE: The system is trying to boot from the wrong copy of a mirrored partition.
    ACTION: Reboot from the shadow copy of the partition.

INACCESSIBLE_BOOT_DEVICE (0x7B)
    1 - Pointer to boot Device object
    - OR -
    1 - Pointer to UNICODE_STRING structure containing ARC name of volume that can't be mounted
    CAUSE: Either the device driver for the boot device failed to initialize, or the file system driver for the boot device didn't recognize the file structures on the volume.
    ACTION: Be sure the right device driver is installed for the boot device, and that the system is trying to boot from the correct location.

PHASE0_EXCEPTION (0x78)
    CAUSE: Failure during initialization of a system component.
    ACTION: Get a stack trace and call Microsoft.

SESSION1_INITIALIZATION_FAILED (0x6D)
SESSION2_INITIALIZATION_FAILED (0x6E)
SESSION3_INITIALIZATION_FAILED (0x6F)
SESSION4_INITIALIZATION_FAILED (0x70)
SESSION5_INITIALIZATION_FAILED (0x71)
    1 - NT status code at time of failure
    CAUSE: Failure during initialization of a system component.
    ACTION: Get a stack trace and call Microsoft.

PHASE0_INITIALIZATION_FAILED (0x31)
PHASE1_INITIALIZATION_FAILED (0x32)
HAL_INITIALIZATION_FAILED (0x5C)
HEAP_INITIALIZATION_FAILED (0x5D)
OBJECT_INITIALIZATION_FAILED (0x5E)
SECURITY_INITIALIZATION_FAILED (0x5F)
PROCESS_INITIALIZATION_FAILED (0x60)
HAL1_INITIALIZATION_FAILED (0x61)
OBJECT1_INITIALIZATION_FAILED (0x62)
SECURITY1_INITIALIZATION_FAILED (0x63)
SYMBOLIC_INITIALIZATION_FAILED (0x64)
MEMORY1_INITIALIZATION_FAILED (0x65)
CACHE_INITIALIZATION_FAILED (0x66)
FILE_INITIALIZATION_FAILED (0x68)
IO1_INITIALIZATION_FAILED (0x69)
LPC_INITIALIZATION_FAILED (0x6A)
PROCESS1_INITIALIZATION_FAILED (0x6B)
REFMON_INITIALIZATION_FAILED (0x6C)
    1 - NT status code describing the failure
    2 - Indicator of location where failure occurred
WINDOWS_NT_BANNER (0x4000007E)
    CAUSE: Failure during initialization of a system component.
    ACTION: Get a stack trace and call Microsoft.

[1] This error probably means that someone has manually updated either NTOSKRNL.EXE or HAL.DLL. It can also result from mixing a uniprocessor HAL with a multiprocessor Kernel, or vice versa.

B.9 INTERNAL SYSTEM FAILURES

The bugchecks in Table B.9 all come from fatal errors within a Microsoft-supplied software component. For the most part, there's little you can do to track these errors.

Table B.9  Internal system errors (bugchecks caused by internal system problems)

PORT_DRIVER_INTERNAL (0x2C)
SCSI_DISK_DRIVER_INTERNAL (0x2D)
FLOPPY_INTERNAL_ERROR (0x37)
SERIAL_DRIVER_INTERNAL (0x38)
ATDISK_DRIVER_INTERNAL (0x42)
    CAUSE: Miscellaneous errors from a system-supplied driver.
    ACTION: Get a stack trace and call Microsoft.

STREAMS_INTERNAL_ERROR (0x4B)
NDIS_INTERNAL_ERROR (0x4F)
XNS_INTERNAL_ERROR (0x57)
    CAUSE: Internal errors from system-supplied networking components.
    ACTION: Get a stack trace and call Microsoft.

CORRUPT_ACCESS_TOKEN (0x28)
SECURITY_SYSTEM (0x29)
    CAUSE: Internal security subsystem errors.
    ACTION: Get a stack trace and call Microsoft.

Bibliography

Books about Software Development

Hatley, Derek J., and Pirbhai, Imtiaz A. Strategies for Real-Time System Specification. New York, NY. Dorset House Publishing, 1988.
    Device drivers are complex pieces of real-time software. The techniques in this book can help in their design.

Kaner, Cem, et al. Testing Computer Software, 2nd ed. New York, NY. Van Nostrand Reinhold, 1993.
    This book gives a good overview of the software testing process. If you're responsible for finding and fixing the bugs, this is a good place to start.

Books about Windows NT and Win32

Custer, Helen. Inside Windows NT. Redmond, WA. Microsoft Press, 1993.
    This book (although getting rather long in the tooth at this point) contains a good high-level overview of the original Windows NT architecture.
    Unfortunately, it's somewhat lacking in specific implementation details.

Microsoft Corporation. Windows NT 3.5 Resource Kit. Redmond, WA. Microsoft Press, 1994.
    These volumes have been updated for NT 3.51 and presumably will be for NT 4.0 as well.

Richter, Jeffrey. Advanced Windows NT. Redmond, WA. Microsoft Press, 1994.
    This book will give you a good background in Win32 user-mode programming.

Books about Bus Architectures

Anderson, Don. PCMCIA System Architecture, 2nd ed. Reading, MA. Addison-Wesley Publishing Company, Inc., 1995.
    I can't say enough good things about this series of hardware books from Shanley and Anderson. They're accurate, readable, and detailed enough to give driver writers a comprehensive introduction to various bus and system architectures. To top it all off, they're not even terribly expensive.

Bowlds, Pat A. Micro Channel Architecture. New York, NY. Van Nostrand Reinhold, 1991.
    If you're in the unenviable position of writing a driver for an MCA device, this is one of the few sources of information available. It's a little bit fluffy.

Schmidt, Friedhelm. The SCSI Bus and IDE Interface. Reading, MA. Addison-Wesley Publishing Company, Inc., 1995.
    This provides a good introduction to the SCSI bus. It's worth reading before you dive into the ANSI SCSI specification itself.

Shanley, Tom. Plug and Play System Architecture. Reading, MA. Addison-Wesley Publishing Company, Inc., 1995.

Shanley, Tom and Anderson, Don. ISA System Architecture, 3rd ed. Reading, MA. Addison-Wesley Publishing Company, Inc., 1995.

Shanley, Tom and Anderson, Don. EISA System Architecture, 2nd ed. Reading, MA. Addison-Wesley Publishing Company, Inc., 1995.

Shanley, Tom and Anderson, Don. PCI System Architecture, 3rd ed. Reading, MA. Addison-Wesley Publishing Company, Inc., 1995.

Shanley, Tom and Anderson, Don. CardBus System Architecture. Reading, MA. Addison-Wesley Publishing Company, Inc., 1996.

Books about CPU Architectures

Heinrich, Joe. MIPS R4000 User's Manual. Englewood Cliffs, NJ. Prentice Hall Inc., 1993.

Intel Corporation. Intel486 Processor Family Programmer's Reference. Beaverton, OR. Intel Corporation, 1992.

Shanley, Tom. PowerPC 601 System Architecture. Reading, MA. Addison-Wesley Publishing Company, Inc., 1994.

Sites, Richard and Witek, Richard. Alpha AXP Architecture Reference Manual, 2nd ed. Newton, MA. Digital Press, 1995.

Books about Miscellaneous Hardware

Campbell, Joe. Programmer's Guide to Serial Communications, 2nd ed. Indianapolis, IN. SAMS Publishing, 1994.
    This is an incredibly comprehensive source of information about the operation of UARTs and related devices.

Ferraro, Richard. Programmer's Guide to the EGA, VGA, and Super VGA Cards, 3rd ed. Reading, MA. Addison-Wesley Publishing Company, Inc., 1994.
    This is probably the most comprehensive source of information about PC video hardware and how to program it.

Intel Corporation. Intel Peripheral Components Handbook. Beaverton, OR. Intel Corporation, 1993.
    This handbook has information about common peripheral interface chips such as the programmable interrupt controller and DMA controller.

About the Author

Art Baker has spent over twenty-five years in the computer industry, where he's worked on everything from compilers to real-time data gathering software. In 1984, he changed the focus of his career and began writing and teaching technical training classes for Digital Equipment Corporation. His broad technical background and good communication skills made him a consistent favorite with students and won him several awards for instructor excellence. After leaving Digital, Mr. Baker founded Cydonix Corporation, a Washington, DC training and consulting firm. In his spare time, Mr. Baker is an accomplished classical pianist, and an avid collector of old science fiction movies. He lives in Washington, DC.

INDEX

A
AdapterControl routine, 57 Adapter description file (ADF), 38 Adapter object cache, 264-65 flushing, 271 Adapter objects, 74-75, 266-71 access functions for, 75 acquiring/releasing, 268-70 DMA hardware, setting up, 270-71 finding/locating, 266-68 layout of, 74-75 manipulating, 75 structure of, 75 AlignmentRequirement field, 380 Allocating hardware, 152-62 code example, 158-62 RESALLOC.C, 158-62 XxBuildPartialDescriptors, 161-62 XxReportHardwareUsage, 158-61 alloc_text pragma, 85 AlternateCurrentIrp pointer, 241 AlternateIrpQueue, 227-28 Alternate IRPs, 224 ASSERT, 453 ASSERTMSG, 453 AssociatedIrp.SystemBuffer field, 65, 374 Asynchronous procedure call (APC), 61 AUTOCON.C, 132-37 Autoconfiguration: EISA bus, 40-41 ISA bus, 36 MCA bus, 38 PCI bus, 44-45 requirements, 32-33 Auto-detected hardware: code example, 130-39 AUTOCON.C, 132-37 CONFIG_ARRAY, 131-32 DEVICE_BLOCK, 131 DEVICE_EXTENSION, 132 XxConfigCallback, 134-37 XXDRIVER.H, 131-32 XxGetHardwareInfo, 132-34 XxGetInterruptInfo, 137-39 XxGetPortInfo, 137-39 ConfigCallback routine, 127-30 configuration data, translating, 130 finding, 122-30 hardware database, querying, 125-27 how auto-detection worl

D
Data buffer, 25 Data-collection DLL, 474-87 Close function, 475 code example, 478-87 building/installing, 486-87 preamble area, 47 79 XXPERF.C, 478-86 XxPerfClose, 486 XxPerfCollect, 483-86 XxPerfOpen, 479-83 Collect function, 475 contents of, 474-75 error handling in, 476 installing, 477 Open function, 474 writing, 474-87 Data objects, 62-63 Data transfer, 59-60 Data transfer mechanisms, 29-30, 46 Data transfer routines, 57 See also Programmed I/O data transfers DbgBreakPoint, 442 DbgPrint, 442 DDK (Windows NT), 17-18, 80 Debugging: coding strategies that reduce, 425 miscellaneous techniques, 452-58 catching incorrect assumptions, 453
catching memory leaks, 454-55 event bits, 456 leaving debugging code in place, 452 sanity counters, 456 trace buffers, 456-58 using bugcheck callbacks, 453-54 using counters/bits/buffers, 455-58 Debug symbol files, 490 Deferred procedure calls (DPCs), 4, 51-53 behavior of, 52-53 and interrupt synchronization, 95 operation of, 52-53 Demand transfer mode, 268 DependOnGroup, 415 DeregisterEventSource, 476 Development environment, 488-93 debug symbol files, 490 enabling crash dumps on target system, 490-92 enabling target system's debug client, 492-93 hardware/software requirements, 488-90 connecting host and target, 489-90 host system, 488 target system, 488-89 Development issues, 78-100 coding conventions/techniques, 80-86 driver design strategies, 78-80 driver memory allocation, 86-91 interrupt synchronization, 93-95 linked lists, 98-100 multiple CPUs, synchronizing, 95-97 Unicode strings, 91-93 DEVICE_BLOCK, 131 Device-dedicated memory, 31 Device drivers, 11 DEVICE_EXTENSION, 132 Device extensions, 71 Device interrupts, 27-29 interrupt priorities, 28 interrupt vectors, 28 processor affinity, 29 signaling mechanisms, 28-29 DeviceIoControl, 80, 104, 370 Device memory, 46 Device objects, 69-71 access functions for, 70 device extensions, 71 externally visible fields for, 70 layout of, 69-70 manipulating, 70 structure of, 69 Device operations, 56 Device queue objects, 225-28 how they work, 225-26 using, 226-28 Device registers, 25 accessing, 26-27 Device resource lists, 32 Device timeouts: catching, 204-5 code example, 205-11 INIT.C, 206-7 TRANSFER.C, 207-11 XXDRIVER.H, 206 XxIoTimer, 210 XxIsr, 208-9 XxProcessTimerEvent, 211 XxTransmitBytes, 207-8 handling, 203-5 DirectDraw HAL, 15, 16-17 Direct I/O (DIO), 54 Direct memory access (DMA), 30 mechanisms, 30-31 DIRS (keyword), 404 DriverEntry routine, device objects, creating, 103-4 Dispatch cleanup routine, 234-36 Dispatcher objects, 4, 323, 325-34 Event objects, 325-26 Executive Resource, 332-33 Mutex
objects, 327-29 Semaphore objects, 329-30 sharing events between drivers, 327 Thread object, 331-32 Timer objects, 330-31 DISPATCH_LEVEL, crashes below, 435-36 Display drivers, 16 DMA controller (DMAC), 30, 35-36 DMA drivers, 258-98 adapter objects, 266-71 cache coherency, maintaining, 263-65 categorizing, 265 common buffer slave DMA driver, 291-95 DMA hardware variations, hiding with adapter objects, 258-59 I/O buffers, managing with MDLs, 261-63 NT DMA architecture, limitations of, 265-66 packet-based bus master, 285-91 packet-based slave, 272-85 scatter/gather problem, solving with mapping registers, 259-61 See also Common buffer slave DMA driver; Packet-based bus master DMA drivers; Packet-based slave DMA drivers DmaInProgress, 293 Doubly-linked lists, 99 DpcForIsr routine, 56, 72, 74 function of, 188-89 writing, 188-89 execution context, 188 priority increments, 189 DPC routine, 56 Driver bugs, keeping track, 425 Driver-chosen addresses, 156-57 Driver cleanup: code example, 115-18 UNLOAD.C, 115-18 XxReleaseHardware, 115-18 XxUnload, 115 Driver design strategies, 78-80 formal design methods, 79 incremental development, 79-80 sample drivers, 80 Driver dispatch routines, 163-79 dispatch interface, extending, 165-69 execution context, 170 exiting, 171-73 completing a request, 172 signaling an error, 171-72 starting a device operation, 172-73 IOCTL argument-passing methods, 167-69 IOCTL buffers, managing, 177 IOCTL header files, writing, 169 IOCTL requests, processing, 174-76 I/O request dispatching mechanism, 163-64 IRP_MJ_DEVICE_CONTROL, 165 IRP_MJ_INTERNAL_DEVICE_CONTROL, 166-67 METHOD_BUFFERED, 168, 177 METHOD_IN_DIRECT, 168, 177 METHOD_NEITHER, 168, 177 METHOD_OUT_DIRECT, 168, 177 private IOCTL values, defining, 167 read and write requests, processing, 173-74 specific function codes: deciding which to support, 165 enabling, 164-65 testing, 178-79 sample test program, 178-79 testing procedure, 178 what they do, 170-71 writing, 169-73
Driver entry points, initializing, 103 DriverEntry routine, 55, 85, 157, 293, 295, 296-97, 353-54, 378, 395-96, 454, 456 buffer strategy, choosing, 104-5 Driver entry points, initializing, 103 function of, 102-3 NT/Win32 device names, 105 writing, 101-5 execution context, 101-2 Driver errors: categories of, 422-24 hardware problems, 422-23 resource leaks, 423 system crashes, 423 system hangs, 424 thread hangs, 423-24 coding strategies that reduce, 425 keeping track of, 425 reproducing, 424-25 miscellaneous causes, 424-25 multiprocessor dependencies, 424 multithreading dependencies, 424 time dependencies, 424 Driver initialization: and cleanup routines, 55-56 code example, 105-13 DriverEntry routine, 106-9 INIT.C, 106-13 XxCreateDevice, 109-13 Driver load sequence, controlling, 413-18 changing driver start value, 413-14 controlling load sequence within a group, 416-18 creating explicit dependencies between drivers, 414-15 establishing global dependencies, 415-16 Driver memory allocation, 86-91 kernel stack, 86, 87 lookaside lists, 90-91 memory suballocation, system support for, 88-91 nonpaged pool, 86, 87-88 paged pool, 86, 87-88 zone buffers, 88-90 Driver objects, 67-68 externally visible fields of, 68 layout of, 68 structure of, 68 Driver paging, controlling, 85-86 Driver performance, 459-87 counter definitions in the Registry, 464-66 counter names, adding to the Registry, 464-66 COUNTERS.H, 467 COUNTERS.INI, 466-67 data-collection DLL, 474-87 Close function, 475 code example, 478-87 Collect function, 475 contents of, 474-75 error handling in, 476 installing, 477 Open function, 474 writing, 474-87 general guidelines, 459-61 concrete measurement, 461 explore creative driver designs, 460-61 know the hardware, 460 know where you're going, 459-60 optimize code, 461 LODCTR utility, 466-67 performance data: format of, 468-74 objects with multiple instances, 472-74 overall structure of, 468 PERF_COUNTER_BLOCK, 470
PERF_COUNTER_DEFINITION, 470 PERF_OBJECT_TYPE, 469 types of counters, 470-72 performance monitoring, 462-64 how drivers export performance data, 464 how it works, 462-M terminology, 462 UNLODCTR utility, 467 Drivers: building, 398-409 installing, 409-13 testing/debugging, 419-58 Driver symbol data, moving into .DBG files, 408-9 Driver testing, 419-58 crash dump analysis, 433-40 developing tests, 420-21 driver errors: categories of, 422-24 reproducing, 424-25 general approach to, 419-21 how to perform tests, 421 interactive debugging, 440-42 Microsoft Hardware Compatibility Tests (HCTs), 421-22 miscellaneous techniques, 452-58 catching incorrect assumptions, 453 catching memory leaks, 454-55 event bits, 456 leaving debugging code in place, 452 sanity counters, 456 trace buffers, 456-58 using bugcheck callbacks, 453-54 using counters/bits/buffers, 455-58 system crashes, 426-30 what to test, 420 when to test, 420 who should do testing, 421 WINDBG, 430-32 extensions, 442-46 See also Crash dump analysis; Driver errors; Interactive debugging; WINDBG utility DriverUnload field, 114 DUMPEXAM, analyzing crashes with, 439

E
Edge-triggered interrupts, 28-29 EISA bus, 39-41 autoconfiguration, 40-41 device memory, 40 DMA capabilities, 40 interrupt mechanisms, 39-40 register access, 39 Environment subsystems, 7-8 Error logging, 312-13 code example, 313-19 EVENTLOG.C, 313-19 XxGetStringSize, 319 XxInitializeEventLog, 313-15 XxReportEvent, 315-18 preparing a driver for, 310-11 Error reporting, 46 Error response tests, 420 EVENTLOG.C, 313-19 Event logging, 299-301 deciding what to log, 299-300 process, 300-301 EventLogLevel, 311, 476, 487 Event Viewer utility, 301 ExAllocateFromXxxLookasideList, 90-91 ExAllocateFromZone, 89 ExAllocatePool, 87, 89, 366, 372-74, 454-55 ExAllocatePoolWithTag, 89, 454-55 Exceptions, 48-49 ExDeleteXxxLookasideList, 90 Executive, 4-7, 84 Configuration Manager, 5-6 I/O Manager, 7, 49, 55-72, 84, 101 Local Procedure Call (LPC) facility, 6
Object Manager, 5, 84 Process Manager, 6, 84 Security Reference Monitor, 6 system service interface, 5 Virtual Memory Manager, 6, 84 Executive Resources, 332-33 functions that work with, 333 ExExtendZone, 90 ExFreePool, 87, 374 ExFreeToXxxLookasideList, 90-91 ExFreeToZone, 89 ExInitializeWorkItem, 322 ExInitializeXxxLookasideList, 90 ExInitializeZone, 89 ExInterlockedAllocateFromZone, 89 ExInterlockedDecrementLong, 376 ExInterlockedExtendZone, 90 ExInterlockedFreeToZone, 89 ExInterlockedInsertHeadList, 99 ExInterlockedInsertTailList, 99 ExInterlockedPopEntryList, 98 ExInterlockedPushEntryList, 98 ExInterlockedRemoveHeadList, 99 Expiration times, specifying, 214-15 ExQueueWorkItem, 323 Extendibility, Windows NT, 2 ExtensionApis, 448 ExtensionApiVersion, 443

F
Fast Mutex, 332 File mapping objects, 464 Filename.H, 308 Filename.RC, 308 File-system drivers (FSDs), 11-12 File system problems, 503 Filter drivers, 351, 376-93 code example, 381-93 COMPLETE.C, 390-93 DISPATCH.C, 386-90 DriverEntry routine, 381-84 INIT.C, 381-86 YyAttachFilter, 384-85 YyDispatchDeviceIoControl, 388-89 YyDispatchPassThrough, 389-90 YyDispatchWrite, 386-88 YYDRIVER.H, 381 YyGenericCompletion, 393 YyGetBufferLimits, 385-86 YyWriteCompletion, 390-93 how they work, 377-78 initialization/cleanup in, 378-79 DriverEntry routine, 378-79 Unload routine, 379 MajorFunction table, 380-81 making the attachment transparent, 380-81 SCSI, 13 undocumented activity, 380 Flags field, 104, 272, 293 Formal design methods, 79 Full-duplex devices, 223 Full-duplex drivers, 222-57 alternate path, implementing, 225 CustomDpc routines, writing, 228-29 data structures for, 224-25 device queue objects, 225-28 dispatch cleanup routine, 234-36 I/O requests, canceling, 229-36 modified driver architecture, 223-24 16550 UART, 236-57 See also 16550 UART

G
GDI engine, 16 GDI functions, Win32 subsystem, 9 GetBuffer, 396-97 GetFileVersionInfo, 405 GetFileVersionInfoSize, 405 Graphical device interface (GDI), 15 GroupOrderList,
416-18

H
HAL, See Hardware Abstraction Layer (HAL) HalAllocateCommonBuffer, 156, 291, 296-97 HalAssignSlotResources, 45 HAL.DLL, 429 Half-duplex devices, 223 HalFreeCommonBuffer, 157, 295, 297 HalGetAdapter, 75, 156, 266-69, 272, 277-78, 293, 296 HalGetBusData, 38, 41, 45, 141-42 HalGetInterruptVector, 77 HalSetBusData, 38, 41, 45 HalTranslateBusAddress, 157 Hardware, 24-47 allocating, 152-62 autoconfiguration requirements, 32-33 basics of, 24-33 bus architecture, understanding, 45 buses, 33-45 control registers, understanding, 45-46 data transfer mechanisms, 29-30 understanding, 46 device-dedicated memory, 31 device memory, understanding, 46 device microcode, loading, 157-58 device registers, 25 accessing, 26-27 direct memory access (DMA) mechanisms, 30-31 error and status reporting, understanding, 46 hints for working with, 45-47 problems with, 422-23 releasing, 155-56 testing, 46-47 See also Auto-detected hardware Hardware Abstraction Layer (HAL), 3, 74, 84 Hardware database, querying, 125-27 Hardware failures, 500-501 Hardware initialization, 122-62 allocating/releasing hardware, 152-62 auto-detected hardware, finding, 122-30 device memory, mapping, 156-57 unrecognized hardware, finding, 139-52 Hardware resources, claiming, 153-55 Hardware tests, 420 Hard-wired addresses, 157 Header, 44 Header files, 81-82 Higher-level drivers, 350-97 allocating additional IRPs, 364-76 buffered I/O requests, 374 creating IRPs from scratch, 371-74 driver-managed memory, 374 ExAllocatePool, 372-74 IoAllocateIrp, 371-72 direct I/O requests, 374-75 driver-allocated IRPs, 375-76 asynchronous I/O, 375-76 synchronous I/O, 375 filter drivers, 376-93 intermediate drivers, 350-52 IoBuildAsynchronousFsdRequest, creating IRPs with, 368-69 IoBuildDeviceIoControlRequest, creating IRPs with, 369-71 IoBuildSynchronousFsdRequest, creating IRPs with, 367-68 I/O completion routines, 360-64 code example, 363-64 execution context, 361 requesting I/O completion callback, 360 what they
do, 362-63 IRP stack, controlling size of, 365-66 tightly coupled drivers, 394-97 High-IRQL crashes, 434-35

I
Identifier (subkey), 124 IgnoreCount field, 268 IMAGEHLP.DLL, 439 Incremental development, 79-80 Information field, 64 Initialization and cleanup routines, 101-21 driver cleanup example, 115-18 DriverEntry routine, writing, 101-5 driver initialization example, 105-13 reinitialize routines, writing, 113-14 shutdown routines, writing, 118-19 testing the driver, 119-21 unload routine, writing, 114-15 Initialization routines, discarding, 84-85 InsertHeadList, 99 InsertTailList, 99 Installing drivers, 409-13 by hand, 409-10 driver Registry entries, 410 end-user installation of nonstandard drivers, 412-13 end-user installation of standard drivers, 410-12 after NT installation, 412 during GUI setup, 411-12 during text setup, 410-11 Integral subsystems, 7 Interactive debugging, 440-42 breakpoints, setting, 441 debug session, starting/stopping, 440-41 hard breakpoints, setting, 442 print statements, using, 442 Intermediate drivers, 11, 350-52 definition of, 350 and layered architecture, 351-52 Interrupt behavior, 46 Interrupt objects, 76-77 layout of, 76 manipulation of, 77 Interrupt priorities, 28 Interrupt request level (IRQL), 49-51, 60, 88 Interrupts, 49 CPU priority levels, 49 interrupt processing sequence, 50-51 interrupt request level (IRQL), 49 software-generated interrupts, 51 Interrupt Service routine (ISR), 56, 297 function of, 187-88 writing, 186-88 execution context, 186-87 Interrupt synchronization, 93-95 and DPCs, 95 interrupt blocking, 94-95 Interrupt vectors, 28 IoAcquireCancelSpinLock, 232, 235 IoAllocateAdapterChannel, 75, 268-69, 273, 281, 294 IoAllocateController, 73 IoAllocateErrorLogEntry, 312 IoAllocateIrp, 66, 366, 371-72 IoAllocateMdl, 263, 292, 294, 374 IoAttachDevice, 70 IoAttachDeviceByPointer, 70, 378, 380 IoBuildAsynchronousFsdRequest, creating IRPs with, 366-69 IoBuildDeviceIoControlRequest, creating IRPs with, 366-71
IoBuildPartialMdl, 263 IoBuildSynchronousFsdRequest, creating IRPs with, 366-68, 375 IoCallDriver, 66, 70, 230, 233, 358, 364-65, 367, 374-76, 394 IoCancelIrp, 230 IoCompleteRequest, 66, 205, 232-34, 273, 353, 357, 360-61, 365, 368, 370, 375, 376, 397, 423, 426 I/O completion routines, 58, 360-64 code example, 363-64 execution context, 361 requesting I/O completion callback, 360 what they do, 362-63 IoConnectInterrupt, 77, 207 IoCreateController, 73, 102 IoCreateDevice, 70, 102-5, 132, 353, 378-79, 395 IoCreateNotificationEvent, 326 IoCreateSymbolicLink, 70, 326 IoCreateSynchronizationEvent, 326 IOCTL argument-passing methods, 167-69 IOCTL buffers, Driver dispatch routines, managing, 177 IOCTL header files, writing, 169 IOCTL requests, processing, 174-76 IOCTLs, 464 IoDeleteController, 73 IoDeleteDevice, 70, 354, 379 IoDeleteSymbolicLink, 70, 102, 354 IoDetachDevice, 70, 379 IoDisconnectInterrupt, 77 IoFlushAdapterBuffers, 75, 265, 271, 274, 295 IoFreeAdapterChannel, 75, 270, 295 IoFreeController, 73 IoFreeIrp, 66, 369, 373-74 IoFreeMapRegisters, 75, 270 IoFreeMdl, 263, 295 IoGetConfigurationInformation, 142 IoGetCurrentIrpStackLocation, 67, 357, 365 IoGetDeviceObjectPointer, 70, 353, 356, 378-79, 395 IoGetNextIrpStackLocation, 67, 358, 362, 365 lolnitializel 97, 372, 374 IoInitializeTimer, 204 IOLOGMSG.DLL, 308 I/O Manager, 7, 49, 55-72, 84, 101, 460 IoMapTransfer, 75, 269-71, 273, 274, 286-87, 292, 294, 295 IoMarkIrpPending, 67, 294, 358, 362, 375 IoQueryDeviceComponentInformation, 128 IoQueryDeviceConfigurationData, 128 IoQueryDeviceDescription, 125-28, 150, 152 IoQueryDeviceIdentifier, 128 IoRegisterDriverReinitialization, 113 IoRegisterShutdownNotification, 119 IoReleaseCancelSpinLock, 232-34 IoReportResourceUsage, 154-57 I/O request dispatching mechanism, 163-64 IoRequestDpc, 274, 295, 297 I/O request packets (IRPs), 58, 63-67 alternate IRPs, 224 IRP header, 64-65 externally visible fields in, 65 layout of, 64-65 manipulating, 65-67 primary IRPs,
224 stack locations, 65, 67 structure of, 64 I/O requests: canceling, 229-36 cancel routine, 232-34 Cancel spin lock, 231-32 IRP Cancel flag, 232 synchronization issues, 231-32 IoSetCancelRoutine, 233 IoSetCompletionRoutine, 67, 358, 360, 362, 363, 374, 375 IoSetHardErrorOrVerifyDevice, 372 IoSetNextIrpStackLocation, 67 I/O space, definition of, 26 I/O space registers, 26 I/O stack locations, 64 IoStartNextPacket, 66, 224-26, 233-34, 242, 244, 282, 423 IoStartPacket, 66, 222, 224-26, 282, 338, 434 IoStartTimer, 204 IoStatus.Information, 230, 357 IoStatus member, 64, 273 IoStopTimer, 205 I/O system service dispatch routines, 55-56 IoWriteErrorLogEntry, 313 ISA bus, 33-36 autoconfiguration, 36 device memory, 36 DMA capabilities, 34-36 interrupt mechanisms, 34-35 register access, 33-34 ISR, 60

K
KdPrint, 442 KeAcquireSpinLock, 97 KeBugCheck, 426, 427, 434 KeBugCheckEx, 426, 427 KeCancelTimer, 214 KeClearEvent, 326 KeDelayExecutionThread, 212 KeDeregisterBugCheckCallback, 454 KeFlushIoBuffers, 264, 272, 291, 297 KeGetCurrentIrql, 94 KeInitializeCallbackRecord, 454 KeInitializeDeviceQueue, 226, 227, 228 KeInitializeDpc, 213, 229 KeInitializeEvent, 326 KeInitializeMutex, 328 KeInitializeSpinLock, 97 KeInitializeTimer, 214 KeInsertByKeyDeviceQueue, 227 KeInsertDeviceQueue, 227, 245 KeInsertQueueDpc, 228-29 KeLowerIrql, 94, 227, 269 KeQuerySystemTime, 214 KeQueryTickCount, 214 KeQueryTimeIncrement, 214 KeRaiseIrql, 94, 227, 269 KeReadStateMutex, 328 KeReadStateTimer, 214 KeRegisterBugCheckCallback, 454 KeReleaseMutex, 328-29 KeReleaseSemaphore, 295, 329 KeReleaseSpinLock, 97 KeReleaseSpinLockFromDpcLevel, 97 KeRemoveDeviceQueue, 227, 246 KeRemoveEntryDeviceQueue, 227, 233 KeRemoveQueueDpc, 228 KeResetEvent, 326 Kernel, 3-4 KERNEL functions, Win32 subsystem, 9 Kernel mode, 48-61 control objects, 4 data transfer, 59-60 deferred procedure calls (DPCs), 51-53 device drivers, 11 dispatcher objects, 4 exceptions, 48-49 Executive, 4-7 file-system drivers (FSDs), 11-12 Hardware Abstraction Layer (HAL), 3 intermediate drivers, 11 interrupts, 49-51 I/O components, 10-15 I/O subsystem design goals, 10 I/O processing sequence, 58-61 kernel, 3-4 kernel-mode threads, 49 layered drivers, 10-11 network drivers, 13-15 postprocessing: by the driver, 60 by the I/O Manager, 60-61 request preprocessing: by the driver, 59 by NT, 58-59 SCSI drivers, 12-13 user buffer access, 53-54 Windows NT, 2 Kernel-mode drivers, 21 data transfer routines, 56 driver initialization and cleanup routines, 55 I/O system service dispatch routines, 55-56 resource synchronization callbacks, 57 structure of, 54-58 Kernel-mode threads, 49 Kernel stack, 86, 87 KeSetTimer, 214-15 KeSetTimerEx, 214 KeStallExecutionProcessor, 212 KeSynchronizeExecution, 77, 96 KeWaitForMultipleObjects, 324-25, 328, 375 KeWaitForSingleObject, 324, 328, 330, 334, 368, 370 KiTrap, 436

L
Language monitor DLL, and printer drivers, 19 Latched interrupts, 28-29 Layered drivers, 10-11, 351, 352-60 code example, 354-56 how they work, 352-53 initialization/cleanup in, 353-54 DriverEntry routine, 353-54 Unload routine, 354 IRPs in, 357-58 lower-level driver, calling, 359-60 transparent layer, 356 virtual/logical device layer, 357 Legacy 16-bit applications, drivers for, 21-22 Level-sensitive (level-triggered) interrupt, 29 Linked lists, 98-100 doubly-linked lists, 99 removing blocks from, 99-100 singly-linked lists, 98 ListHead field, 100 LODCTR utility, 466-67 Logging device errors, 299-319 error logging, 312-13 code example, 313-19 preparing a driver for, 310-11 error-log packet, allocating, 311-12 event logging in Windows NT, 299-301 generating log entries, 310-13 messages, 301-10 adding message resources to a driver, 308-9 message codes, 302-3 message definition files, 303-8 registering drivers as event sources, 309-10 XXMSG.MC, 305-7 Logical
space, 260 Local Procedure Call (LPC) facility, 6 Lookaside lists, 90-91 Low-level audio drivers, 21

M
MajorFunction field, 65 MajorFunction table, 235, 356, 378-81, 383-84, 395 filter driver object, 380-81 MAKEFILE.INC, 408 Mapping registers, 74 MCA bus, 36-38 autoconfiguration, 38 device memory, 38 DMA capabilities, 38 interrupt mechanisms, 38 register access, 37-38 MCI drivers, 20-21 MC utility, 307-8 MdlAddress field, 65, 374 Memory Descriptor Lists (MDLs), 260 managing I/O buffers with, 261-63 Memory-mapped registers, 27 Memory suballocation, system support for, 88-91 MEMORY.TXT, 439 Message-code fields, meaning of, 302 Message definition files, 303-8 compiling, 307-8 header section, 303 keywords used in, 303 MC utility, 307-8 message section, 303-4 MessageId keyword, 307 Messages, 301-10 message codes, 302-3 message definition files, 303-8 compiling, 307-8 header section, 303 keywords used in header section of, 303 MC utility, 307-8 message section, 303-4 message resources, adding to a driver, 308-9 registering drivers as event sources, 309-10 XXMSG.MC, 305-7 METHOD_BUFFERED, 168, 177 METHOD_IN_DIRECT, 168, 177 METHOD_NEITHER, 168, 177 METHOD_OUT_DIRECT, 168, 177 Microsoft Hardware Compatibility Tests (HCTs), 421-22 MmBuildMdlForNonPagedPool, 263, 292, 294 MmGetMdlByteCount, 263 MmGetMdlByteOffset, 263 MmGetMdlVirtualAddress, 263, 273 MmGetSystemAddressForMdl, 263 MmGetSystemAddressForMdl field, 262 MmIsThisAnNtAsSystem, 89 MmMapIoSpace, 157 MmPageEntireDriver, 85 MmQuerySystemSize, 89 MmResetDriverPaging, 85 MmUnmapIoSpace, 157 Module list, 428 MouConfiguration, 81 MouConnectToPort, 81 MouseClassStartIo, 81 MouseClassUnload, 81 MSGnnnnn.BIN, 308 Multimedia drivers, 20-22 kernel-mode device drivers, 21 low-level audio drivers, 21 MCI drivers, 20-21 WINMM, 20 Multiple CPUs, synchronizing, 95-97 Multiprocessor dependencies, 424 Multithreading dependencies, 424 Mutex objects, 327-29 Fast Mutex, 332, 423-24

N
NDIS intermediate drivers, 14 Network driver interface
specification (NDIS), 13-14 Network drivers, 13-15 kernel-mode networking clients, 15 NDIS intermediate drivers, 14 network interface card (NIC) drivers, 13-14 transport drivers, 14-15 NIC drivers, 13-14 Nonpaged pool, 86 NonPagedPool, 87 NonPagedPoolCacheAligned, 87 NonPagedPoolCacheAlignedMustS, 88 NonPagedPoolMustSucceed, 87 Nonpaged system memory, controlling, 85-86 Nonsignaled dispatcher objects, 323 Normal response tests, 420 NTDDK.H, 81, 103, 434, 442, 448 NTDETECT, 122-23 NT driver support routines, 83-84 NTOSKRNL.EXE, 429 NTSTATUS, 82-83, 102 NTSTATUS.H, 83, 437 NtXxx, 83

O
ObDereferenceObject, 327, 331, 354, 379, 395-96 Object instance, 462 Object Manager, 5, 84 ObReferenceObjectByHandle, 327, 331, 354, 395 OEMSETUP.INF (control script), 411-12 OOP, and Windows NT, 62-63 Open and close operations, 56 OpenGL API, 15 OPTIONAL_DIRS (keyword), 404 OS/2 subsystem, 8 OtherDrivers class, 152-54 OutputInterruptsValid flag, 241 Overall system architecture, Windows NT, 1-10 OverrideConflict parameter, 155

P
Packet-based bus master DMA drivers, 285-91 Adapter Control routine: and bus master hardware, 287 and scatter/gather lists, 289-90 bus master hardware, setting up, 286-88 DpcForIsr routine: and bus master hardware, 287-88 and scatter/gather support, 290-91 hardware with scatter/gather support, 288-89 scatter/gather lists, 288-91 building with IoMapTransfer, 289-91 Packet-based slave DMA drivers, 272-85 Adapter Control routine, 273 code example, 276-85 DEVICE_EXTENSION, 276-77 TRANSFER.C, 278-85 XxAdapterControl, 281-82 XxDpcForIsr, 283-85 XXDRIVER.H, 276 XxGetDmaInfo, 277-78 XxIsr, 282-83 XxStartIo, 278-81 DMA transfers, splitting, 274-76 DpcForIsr routine, 274 DriverEntry routine, 272 Interrupt Service routine, 273-74 Start I/O routine, 272-73 Paged pool, 86, 87-88 PagedPool, 88 PagedPoolCacheAligned, 88 Parallel port, 189-201 code example, 192-201 DEVICE_EXTENSION, 192-93 INIT.C, 193-95 TRANSFER.C, 193-201 XxCreateDevice, 193-95 XxDpcForIsr,
200-201 XXDRIVER.H, 192-93 XxInitHardware, 195 XxIsr, 199-200 XxStartIo, 195-97 XxTransmitBytes, 197-99 device registers, 191 driver for, 192 how it works, 189-90 interrupt behavior, 192 Partial resource descriptor, contents of, 129 PCI bus, 41-45 autoconfiguration, 44-45 device memory, 44 DMA capabilities, 43-44 interrupt mechanisms, 43 register access, 43 PCI (peripheral component interconnect), See PCI bus PERF_COUNTER_BLOCK, 470 PERF_COUNTER_DEFINITION, 470 Perflib key, 464, 467 PERFMON utility, 461, 463, 464 PERF_OBJECT_TYPE, 469 Performance, Windows NT, 2 Performance counter, 462 Physical addresses, 31, 260 PollingInterval field, 216 PopEntryList, 98 Portability, Windows NT, 2 PortBase field, 82 Port drivers, 11 SCSI, 13 video, 17 Port monitor DLL, 19-20 Ports, definition of, 26 POSIX subsystem, 8 Precompiled headers, using, 404 Primary IRPs, 224 Printer drivers, 17-20 configuration DLL, 18 DDK, 17-18 language monitor DLL, 19 port monitor DLL, 19-20 print processor DLL, 18-19 spooler, 18 Printer Job Language (PJL), 19 Print processor DLL, 18-19 Process field, 262-63 Process Manager, 6, 84 Product information file, 406-7 Programmed I/O data transfers, 180-202 driver initialization/cleanup, 182-85 connecting to interrupt source, 183-85 disconnecting from interrupt source, 185 initializing DpcForIsr routine, 183 initializing Start I/O entry point, 182-83 interrupt service routine, writing, 186-88 parallel port, 189-201 code example, 192-201 device registers, 191 driver for, 192 how it works, 189-90 interrupt behavior, 192 Start I/O routine, writing, 185-86 synchronizing driver routines, 181-82 testing, 201-2 what happens during, 180-81 Programmed I/O (PIO), 29-30 Protected subsystems, 7 PsCreateSystemThread, 331, 337 PushEntryList, 98 PutBuffer, 396-97 PVOID ControllerExtension field, 72

Q
QueueRequest, 397

R
Read and write requests, processing, 173-74 buffered I/O, 173-74 direct I/O, 174 neither method, 174 Recursive BUILD operations, 403-4 REGCON.C,
143-52 RegCreateKeyEx, 412 RegisterEventSource, 476 Registry: adding counter names to, 464-66 adding driver parameters to, 140 and auto-detected hardware, 123-25 and Configuration Manager, 501-2 counter definitions in, 464-66 driver entries, 410 RegQueryValueEx, 463 RegSetValueEx, 412 Reinitialize routines, 55, 113-14 writing, 113-14 execution context, 113 what it does, 113 Reliability, Windows NT, 1 RemoveHeadList, 99 RemoveTailList, 99 ReportEvent, 476 RESALLOC.C, 158-62 Resource leaks, 423 Resource allocation, 152-53 Resource lists, 32 Resource synchronization callbacks, 57 Robustness, Windows NT, 1 RtlConvertLongToLargeInteger, 214 RtlConvertUlongToLargeInteger, 214 RtlCopyBytes, 318 RtlLargeIntegerXx, 214 RtlMoveMemory, 294 RtlQueryRegistryValues, 140-41, 148, 314 RtlTimeFieldsToTime, 214 RtlTimeToTimeFields, 214 RtlZeroMemory, 268 Runtime library, 84

S
Sample drivers, 80 ScatterGather, 268 Scatter/gather list, 288 SCSI drivers, 12-13 class drivers, 13 filter drivers, 13 port and miniport drivers, 13 SCS Request Bloc1